Job Description
Opportunity:
We are seeking a highly skilled and experienced AI Infrastructure Engineer (or MLOps Engineer) to design, build, and maintain the robust, scalable AI/ML platforms that power our cutting-edge asset allocation strategies. In this critical role, you will be instrumental in enabling our AI Researchers and Quantitative Developers to efficiently develop, deploy, and monitor machine learning models in a high-performance, secure, and regulated financial environment. You will bridge the gap between research and production, ensuring our AI initiatives run smoothly and reliably.

Responsibilities:
Platform Design & Development: Architect, implement, and maintain the end-to-end AI/ML infrastructure, including data pipelines, feature stores, model training environments, inference serving platforms, and monitoring systems.
Environment Setup & Management: Configure and optimize AI/ML development and production environments, ensuring access to the necessary compute resources (CPUs, GPUs), software libraries, and data.
MLOps Best Practices: Implement and advocate for MLOps best practices, including version control for models and data, automated testing, continuous integration/continuous deployment (CI/CD) pipelines for ML models, and robust model monitoring.
Resource Optimization: Manage and optimize compute resources for AI/ML workloads, whether in the cloud (AWS, Azure, GCP) or on-premises, balancing cost-efficiency and performance.
Data Management: Collaborate with data engineers to ensure seamless ingestion, storage, and accessibility of high-quality financial and alternative datasets for AI/ML research and production.
Tooling & Automation: Select, implement, and integrate MLOps tools and platforms (e.g., Kubeflow, MLflow, SageMaker, DataRobot, Vertex AI, Airflow, Jenkins, GitLab CI/CD) to streamline the ML lifecycle.
Security & Compliance: Ensure that all AI/ML infrastructure and processes adhere to strict financial-industry security standards, regulatory requirements, and data governance policies.
Troubleshooting & Support: Provide expert support and troubleshooting for AI/ML infrastructure issues, resolving bottlenecks and ensuring system stability.
Collaboration: Work closely with AI Researchers, Data Scientists, Software Engineers, and DevOps teams to translate research prototypes into scalable production systems.
Documentation: Create and maintain comprehensive documentation for all AI/ML infrastructure components, processes, and best practices.

Qualifications:
Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
Experience: 3+ years in a dedicated MLOps, AI Infrastructure, DevOps, or Site Reliability Engineering role, preferably in the financial services industry.
Proven experience designing, building, and maintaining scalable data and AI/ML pipelines and platforms.
Strong proficiency in cloud platforms (AWS, Azure, GCP), including services relevant to AI/ML (e.g., EC2, S3, SageMaker, Lambda, Azure ML, Google AI Platform).
Expertise in containerization (Docker) and orchestration platforms (Kubernetes).
Solid understanding of CI/CD principles and tools (Jenkins, GitLab CI/CD, CircleCI, Azure DevOps).
Proficiency in scripting languages such as Python (preferred) or Bash.
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, Ansible).
Familiarity with distributed computing frameworks (e.g., Spark, Dask) is a plus.
Understanding of machine learning concepts and the model lifecycle, even if you do not develop models directly.

Technical Skills:
Deep knowledge of Linux/Unix operating systems.
Strong understanding of networking, security, and database concepts.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
Familiarity with data warehousing and data lake concepts.

Preferred Candidate Profile:
Exceptional problem-solving and debugging skills.
Proactive and self-driven, with a strong sense of ownership.
Excellent communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.
Ability to prioritize and manage multiple tasks in a fast-paced environment.
A keen interest in applying technology to solve complex financial problems.