Senior DevOps Engineer - AI/ML Infrastructure

7 - 12 years

15 - 27 Lacs

Posted: Just now | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Position Overview:

Key Responsibilities:

  • Design and implement CI/CD pipelines for AI applications, including model deployment and agent workflows
  • Build and maintain Kubernetes clusters optimized for AI workloads, including GPU resource management
  • Implement comprehensive monitoring and observability for AI systems, including custom metrics for model performance
  • Develop infrastructure-as-code solutions for scalable AI service deployments
  • Establish reliability engineering practices including SLA management and incident response for AI systems
  • Optimize cloud infrastructure costs with a focus on GPU utilization and LLM API usage
  • Implement security and compliance frameworks for AI applications and data pipelines
  • Collaborate with development teams to ensure production readiness of AI agents and RAG systems
  • Manage multi-cloud deployments and vendor integrations for AI services

Required Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related technical field
  • 7-10 years of DevOps/Infrastructure experience with demonstrated production system ownership
  • Strong expertise in Kubernetes orchestration and container management (Docker)
  • Proficient in Python scripting and automation
  • Extensive experience with Linux system administration and performance tuning
  • Hands-on experience with Jenkins or similar CI/CD platforms
  • Production experience with cloud platforms (AWS, GCP, or Azure)
  • Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, or similar)

AI/ML Infrastructure Requirements:

  • Experience deploying and managing AI/ML workloads in production environments
  • Understanding of RAG system infrastructure requirements and vector database operations
  • Knowledge of LLM API integration patterns and rate limiting strategies
  • Experience with GPU cluster management and resource optimization
  • Familiarity with AI agent workflows and their operational characteristics

Site Reliability Engineering Skills:

  • Production monitoring and alerting experience with tools such as Prometheus, Grafana, or Datadog
  • Incident response and post-mortem experience with complex distributed systems
  • Capacity planning and performance optimization for high-traffic applications
  • Experience with log aggregation and distributed tracing systems
  • Understanding of reliability patterns including circuit breakers and graceful degradation

Preferred Qualifications:

  • Experience with MLOps practices and model deployment pipelines
  • Knowledge of AI-specific monitoring including model drift detection and performance metrics
  • Experience with cost optimization strategies for AI workloads
  • Background in financial services, gaming, or other high-availability environments
  • Certification on a major cloud platform (AWS Solutions Architect, GCP Professional, etc.)
  • Experience with service mesh technologies (Istio, Linkerd)

Technical Environment:

  • Multi-cloud infrastructure with a primary focus on AWS/GCP
  • Kubernetes-based container orchestration
  • Modern observability stack with custom AI metrics
  • GitOps workflows and infrastructure automation
  • Integration with enterprise security and compliance frameworks

Eteam

Information Technology and Services

Irvine