DevOps Engineer - AI/ML Infrastructure

Experience: 5 years
Salary: 20 Lacs
Posted: 2 weeks ago | Platform: Glassdoor
Work Mode: Remote
Job Type: Part Time

Job Description

Job Information

    Date Opened: 27/08/2025
    Job Type: Permanent
    Work Experience: 5+ years
    Industry: IT Services
    Salary: 15-20 LPA
    City: Bangalore
    Province: Karnataka
    Country: India
    Postal Code: 560052

Position Overview


We are seeking an experienced DevOps Engineer to architect and manage the infrastructure backbone of our revolutionary AI startup. This role offers an exceptional opportunity to build scalable, secure, and efficient systems that power next-generation AI applications. You'll work directly with our founding team to establish DevOps practices that will scale from MVP to enterprise-level solutions.


Key Responsibilities

Infrastructure & Cloud Management

  • Design and implement scalable cloud infrastructure on AWS, Azure, or GCP for AI/ML workloads
  • Architect and manage Kubernetes clusters optimized for ML training and inference
  • Build and maintain infrastructure as code using Terraform, CloudFormation, or Pulumi
  • Implement auto-scaling solutions for variable AI compute demands
  • Manage GPU clusters and specialized hardware for deep learning workloads

MLOps & AI Pipeline Management

  • Design and implement CI/CD pipelines specifically for machine learning model deployment
  • Build automated model training, validation, and deployment workflows
  • Implement model versioning, experiment tracking, and artifact management systems
  • Set up monitoring and alerting for ML model performance and data drift detection
  • Create disaster recovery and rollback strategies for AI model deployments

Platform Engineering

  • Develop internal developer platforms and self-service tools for the engineering team
  • Implement secure API gateways and microservices architecture for AI applications
  • Build and maintain data pipelines for real-time and batch processing
  • Design secrets management and security policies for sensitive AI data and models
  • Establish logging, monitoring, and observability across all systems

Security & Compliance

  • Implement security best practices for AI systems and sensitive data handling
  • Design and maintain network security, firewalls, and VPN configurations
  • Establish backup and disaster recovery procedures for critical AI infrastructure
  • Ensure compliance with data protection regulations and industry standards
  • Conduct regular security audits and vulnerability assessments

Performance Optimization

  • Monitor and optimize infrastructure costs, especially for expensive GPU resources
  • Implement caching strategies for AI inference and data processing
  • Optimize container orchestration for maximum resource utilization
  • Tune the performance of databases and storage systems for AI workloads
  • Establish SLA monitoring and capacity planning procedures

Required Qualifications

Technical Expertise

  • Cloud Platforms: 4+ years hands-on experience with AWS, Azure, or GCP
  • Containerization: Expert-level Docker and Kubernetes skills with production experience
  • Infrastructure as Code: Proficiency with Terraform, Ansible, or similar tools
  • CI/CD: Experience building robust pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps
  • Programming: Strong scripting skills in Python and Bash, plus familiarity with Go or Java

AI/ML Infrastructure Knowledge

  • Experience deploying and managing ML models in production environments
  • Understanding of GPU computing, CUDA, and specialized AI hardware
  • Familiarity with ML frameworks (TensorFlow, PyTorch, Scikit-learn) and their deployment requirements
  • Knowledge of data engineering tools and big data processing (Spark, Kafka, Airflow)
  • Experience with ML model serving platforms (MLflow, Kubeflow, Seldon, or TensorFlow Serving)

DevOps Fundamentals

  • 5-8 years of DevOps/SRE experience with demonstrated expertise in production systems
  • Strong Linux administration skills and system performance optimization
  • Experience with monitoring tools (Prometheus, Grafana, ELK/EFK stack, Datadog)
  • Database management experience (PostgreSQL, MongoDB, Redis) with backup/recovery
  • Network engineering knowledge including load balancers, CDNs, and service meshes

Preferred Qualifications

  • Previous experience in AI/ML startups or high-growth technology companies
  • Certifications in cloud platforms (AWS Solutions Architect, Azure DevOps Engineer, etc.)
  • Experience with edge computing and distributed AI inference systems
  • Knowledge of data privacy frameworks and federated learning infrastructure
  • Familiarity with FinOps practices for cloud cost optimization
  • Experience with service mesh technologies (Istio, Linkerd, Consul Connect)

What We Offer

Compensation & Benefits

  • Competitive salary up to ₹20,00,000 per annum
  • Comprehensive health insurance with family coverage and wellness benefits

Technical Growth

  • Access to cutting-edge AI infrastructure and latest cloud technologies
  • Opportunity to shape the technical architecture of a groundbreaking AI product
  • Direct collaboration with world-class AI researchers and engineers
  • Mentorship from experienced startup founders and tech leaders

Work Environment

  • Flexible working arrangements with hybrid and remote options
  • Modern office in Bengaluru with high-end development workstations
  • Unlimited learning resources and access to cloud credits for experimentation
  • Fast-paced, innovation-driven culture with direct impact on product success
  • Regular tech talks, hackathons, and team building activities

Career Impact

  • Ground-floor opportunity in a stealth-mode AI company with massive potential
  • Chance to build infrastructure that will serve millions of users
  • Direct reporting to CTO/Founders with significant decision-making authority
  • Opportunity to lead and build the DevOps team as the company scales
  • Potential for international expansion and technology leadership roles

About This Opportunity


Join us at the most exciting phase of our journey. As one of our first DevOps hires, you'll have unprecedented influence over our technical infrastructure and engineering culture. This role is perfect for someone who wants to combine deep technical expertise with entrepreneurial impact in the rapidly evolving AI landscape.


You'll work on challenging problems like:

  • Scaling AI training from single GPUs to multi-node clusters
  • Implementing real-time AI inference at global scale
  • Building secure, compliant infrastructure for sensitive AI applications
  • Optimizing costs while maintaining high performance for variable AI workloads

Required Mindset

  • Strong problem-solving skills with ability to debug complex distributed systems
  • Excellent communication skills for cross-functional collaboration
  • Passion for automation, efficiency, and engineering excellence
  • Interest in AI/ML technology and its infrastructure challenges


Note: Due to our stealth mode status, specific product and technology details will be shared during the interview process with qualified candidates who execute appropriate NDAs.


Application Requirements


Please submit:

  • Detailed resume highlighting relevant DevOps and AI infrastructure experience
  • GitHub/GitLab profile showcasing infrastructure code and automation projects
  • Brief cover letter explaining your interest in AI DevOps and startup environments
  • Any relevant cloud certifications, case studies, or technical blog posts


We are committed to building a diverse and inclusive team. All qualified applicants will receive equal consideration regardless of race, gender, age, religion, sexual orientation, disability status, or veteran status.
