MLOps Engineer

7 years

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

About the Role

We’re looking for an MLOps Engineer to build and operate reliable, secure, and scalable ML/LLM infrastructure—from data ingestion and training pipelines to model serving, monitoring, and continuous improvement. You’ll partner with Data Science, Platform, and Security teams to ship models to production with strong SLAs, observability, and cost control.

Responsibilities
  • Productionize models end-to-end: automate data ingestion, feature engineering, training, evaluation, packaging, and deployment (batch & real-time).
  • Model serving & orchestration: design and operate low-latency model endpoints and batch jobs using Kubernetes, Docker, job schedulers, and serving frameworks.
  • CI/CD for ML: implement reproducible pipelines (code, data, features, models) with unit/integration tests, approvals, and canary/blue-green rollouts.
  • Monitoring & reliability: build drift, performance, and data-quality monitors; set up alerts and on-call runbooks; drive incident response and postmortems.
  • Observability: instrument tracing, logging, and metrics (e.g., OpenTelemetry, Prometheus, Grafana) across data flows and model requests.
  • Model registry & governance: manage lineage, versioning, approvals, and audit trails; enforce security (IAM, secrets management) and compliance controls.
  • Cost & capacity management: optimize GPU/CPU usage, autoscaling, caching, batching, quantization, and instance right-sizing.
  • LLM & RAG pipelines (nice to have, if applicable): stand up vector databases, retrieval flows, prompt/version management, guardrails, and evaluations.
  • Collaboration & enablement: create templates, docs, and self-service tooling for data scientists and app teams.

Required Qualifications
  • 3–7 years in MLOps/Platform/DevOps/SRE roles supporting ML in production.
  • Strong with Python and one of Go/TypeScript/Bash; proficiency in Docker and Kubernetes.
  • Experience building ML pipelines with tools like Airflow/Prefect/Kedro/Flyte/Metaflow.
  • CI/CD expertise (GitHub Actions/GitLab/Jenkins/Argo), including artifact/version management and automated testing.
  • Data stack: object storage (S3/GCS/Azure Blob), data warehouses/lakes, message queues/streams (Kafka/PubSub), and caching layers.
  • Monitoring/observability: Prometheus, Grafana, ELK/EFK, alerting (PagerDuty/VictorOps), tracing (OpenTelemetry/Jaeger).
  • Security fundamentals: IAM, network policies, secrets (Vault/SSM), image signing, SBOMs.
  • Solid understanding of ML lifecycle: data versioning, feature stores, experiment tracking, evaluation, and rollback.

