AI/ML Ops Engineer

5 - 8 years

20 - 32 Lacs

Posted: Just now | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Role

- We are seeking a seasoned AI/ML Engineer to join our Cloud Operations (CloudOps) team. The ideal candidate should have strong expertise in designing, developing, and deploying AI/ML models and automation solutions that optimize cloud infrastructure management and operational efficiency.

Primary Responsibilities

- Design, develop, and deploy machine learning models and AI algorithms to automate cloud infrastructure, monitoring, fault detection, and predictive maintenance.

- Collaborate with Cloud Engineers, ML Engineers, Data Scientists, and DevOps teams to integrate AI/ML solutions into cloud orchestration and management platforms.

- Build scalable data pipelines and workflows using cloud-native services (preferably AWS or GCP) for real-time and batch ML model training and inference.

- Analyze large volumes of cloud telemetry data (logs, metrics, traces) to extract actionable insights using statistical methods and ML techniques.

- Implement anomaly detection, capacity forecasting, resource optimization, and automated remediation solutions.

- Develop APIs and microservices to expose AI/ML capabilities for CloudOps automation.

- Work with SecOps to ensure ML models comply with privacy and governance standards.

- Optimize existing AI/ML workflows for cost, performance, and accuracy.

- Stay updated with the latest trends in AI/ML, cloud computing, and infrastructure automation.

Skills & Requirements

- 6+ years of professional experience in AI/ML engineering, preferably in cloud infrastructure or operations environments.

- Strong proficiency in Python (including libraries such as pandas and NumPy), R, or a similar programming language used in AI/ML development.

- Hands-on experience with machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, or similar.

- Hands-on experience with MCP servers, AI agents, and A2A (Agent-to-Agent) communication.

- Experience with Large Language Model operations (LLMOps).

- Expertise in cloud platforms (AWS, Azure, or GCP) and their AI/ML services (e.g., SageMaker, Bedrock, Anthropic, Azure ML, OpenAI, Vertex AI).

- Strong understanding of data structures, algorithms, and statistical modeling.

- Experience building and maintaining data pipelines using tools like Apache Spark, Kafka, Airflow, or cloud-native alternatives.

- Knowledge of containerization and orchestration (Docker, Kubernetes) to deploy ML models in production.

- Familiarity with infrastructure monitoring tools (Prometheus, Grafana, ELK Stack) and cloud management platforms.

- Experience with CI/CD pipelines and automation tools in cloud environments.

- Excellent problem-solving skills and ability to work independently as well as in cross-functional teams.

- Strong communication skills for collaborating with technical and non-technical stakeholders.

- Knowledge of Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.

- Experience with cybersecurity principles related to cloud and AI/ML systems.

Automation Anywhere

Robotic Process Automation (RPA)

San Jose
