Posted: 1 week ago
Hybrid
Full Time
Experience: 8 to 10 years
Location: Bengaluru

Key Responsibilities

Design, implement, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.
Build and manage LLMOps pipelines for fine-tuning, evaluating, and deploying large language models (e.g., OpenAI, HuggingFace Transformers, custom LLMs).
Use Kubeflow and Kubernetes to orchestrate reproducible, scalable ML/LLM workflows.
Implement CI/CD pipelines for ML projects using GitHub Actions, Argo Workflows, or Jenkins.
Automate infrastructure provisioning using Terraform, Helm, or similar IaC tools.
Integrate model registry and artifact management with tools like MLflow, Weights & Biases, or DVC.
Manage containerization with Docker and container orchestration via Kubernetes.
Set up monitoring, logging, and alerting for production models using tools like Prometheus, Grafana, and ELK Stack.
Collaborate closely with Data Scientists and DevOps engineers to ensure seamless integration of models into production systems.
Ensure model governance, reproducibility, auditability, and compliance with enterprise and legal standards.
Conduct performance profiling, load testing, and cost optimization for LLM inference endpoints.

Required Skills and Experience

Core MLOps/LLMOps Expertise

5+ years of hands-on experience in MLOps/DevOps for AI/ML.
2+ years working with LLMs in production (e.g., fine-tuning, inference optimization, safety evaluations).
Strong experience with Kubeflow Pipelines, KServe, and MLflow.
Deep knowledge of CI/CD pipelines with GitHub Actions, GitLab CI, or CircleCI.
Expert in Kubernetes, Helm, and Terraform for container orchestration and infrastructure as code.

Programming & Frameworks

Proficient in Python, with experience in ML libraries such as scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers.
Familiarity with FastAPI, Flask, or gRPC for building ML model APIs.

Cloud & DevOps

Hands-on with AWS, Azure, or GCP (preferred: EKS, S3, SageMaker, Vertex AI, Azure ML).
Knowledge of model serving using Triton Inference Server, TorchServe, or ONNX Runtime.

Monitoring & Logging

Tools: Prometheus, Grafana, ELK, OpenTelemetry, Sentry.
Model drift detection and A/B testing in production environments.

Soft Skills

Strong problem-solving and debugging skills.
Ability to mentor junior engineers and collaborate with cross-functional teams.
Clear communication, documentation, and Agile/Scrum proficiency.

Preferred Qualifications

Experience with LLMOps platforms like Weights & Biases, TruEra, PromptLayer, and LangSmith.
Experience with multi-tenant LLM serving or agentic systems (LangChain, Semantic Kernel).
Prior exposure to Responsible AI practices (bias detection, explainability, fairness).