Machine Learning Operations Manager

6 years

0 Lacs

Posted: 2 days ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

This role is for one of Weekday's clients.

Min Experience: 6 years

Location: Remote (India)

Job Type: Full-time

As the Machine Learning Operations Manager, you will oversee the end-to-end ML lifecycle, from model training and deployment to monitoring and optimization. You will lead a small, high-performing team of engineers while remaining hands-on in building scalable, reliable, and efficient ML infrastructure. This role combines strategic leadership with deep technical expertise to ensure smooth collaboration between research, engineering, and operations teams.

Requirements

Key Responsibilities:

  • End-to-End ML Lifecycle: Manage training infrastructure, experiment tracking, deployment, and continuous optimization.
  • Collaboration with Researchers: Partner with research teams to streamline training, evaluation, and fine-tuning workflows.
  • Team Leadership: Mentor and guide a small team of ML engineers (3-4) while contributing as an individual contributor.
  • Performance Optimization: Improve latency, throughput, and cost efficiency; ensure robust packaging and runtime reliability.
  • Automation & Reliability: Develop systems for CI/CD, versioning, rollback, A/B testing, monitoring, and alerting.
  • Infrastructure Management: Maintain scalable, secure, and compliant AI environments across training and inference stages.
  • Cloud & AI Integration: Collaborate with cloud providers (AWS, GCP, Azure) and AI platforms to enhance tooling and optimize costs.
  • Cross-Functional Collaboration: Support GenAI and AI-driven projects across teams beyond core MLOps responsibilities.
  • Architecture & Roadmap: Contribute to architectural planning, documentation, and the continuous evolution of the ML stack.
  • Best Practices: Promote automation, MLOps standards, and operational excellence throughout the ML lifecycle.

Requirements:

  • 5+ years of hands-on experience in MLOps or ML/AI Engineering.
  • Strong understanding of ML/DL concepts and applied experience in model training and deployment infrastructure.
  • Proficiency with cloud-native ML tools (e.g., GCP Vertex AI, AWS SageMaker, Kubernetes).
  • Experience working across both model training and inference systems.
  • Familiarity with model optimization methods such as quantization, distillation, TensorRT, or FasterTransformer.
  • Demonstrated ability to lead complex technical projects independently.
  • Excellent communication and collaboration skills with a cross-functional mindset.
  • Ownership-oriented approach with comfort in driving clarity in ambiguous situations.

Skills:

MLOps, ML Engineering, Machine Learning Infrastructure, Model Deployment, Model Monitoring, CI/CD, Vertex AI, AWS SageMaker, GCP AI Platform, Kubernetes, Docker, MLflow, Kubeflow.
