Your Title: AI Operations - Engineer
Job Location: Chennai
Our Department: CMS
Are you passionate about deploying, monitoring, and scaling machine learning systems in production environments, and eager to contribute to robust AI infrastructure within a collaborative team?
What You Will Do
This role offers an exciting opportunity to begin a career in AI/ML operations engineering, working within a dynamic team that values reliability and continuous improvement. The successful candidate will contribute to the deployment and maintenance of AI/ML systems in production, gaining hands-on experience with MLOps best practices and infrastructure automation. This position provides a structured environment for developing core competencies in ML system operations, DevOps practices, and production ML monitoring, with direct guidance from experienced professionals.
- Assist in the deployment and maintenance of machine learning models in production environments under direct supervision, while learning containerization and orchestration technologies such as Docker and Kubernetes.
- Support CI/CD pipeline development for ML workflows, including model versioning, automated testing, and deployment processes using tools like Jenkins, GitLab CI, or GitHub Actions.
- Monitor ML model performance, data drift, and system health in production environments, implementing basic alerting and logging solutions.
- Contribute to infrastructure automation and configuration management for ML systems, learning Infrastructure as Code (IaC) practices with tools like Terraform or CloudFormation.
- Collaborate with ML engineers and data scientists to operationalize models, ensuring scalability, reliability, and adherence to established MLOps procedures and best practices.
What Skills & Experience You Should Bring
Required:
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related technical field. Trimble's Professional ladder typically requires four or more years of formal education.
- Foundational knowledge of DevOps principles and practices, with an understanding of CI/CD concepts and basic system administration.
- Proficiency in at least one relevant programming language (e.g., Python, Bash), with a focus on automation scripting and system integration.
- Understanding of containerization technologies (Docker) and basic orchestration concepts (Kubernetes fundamentals).
- Familiarity with version control systems (Git) and collaborative development workflows.
- Basic understanding of machine learning concepts and the ML model lifecycle from development to production.
Preferred:
- Experience with cloud computing platforms (AWS, Azure, GCP) and their ML/AI services (SageMaker, Azure ML, Vertex AI).
- Familiarity with MLOps tools and frameworks (MLflow, Kubeflow, DVC, or similar).
- Basic experience with monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Understanding of Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Experience with Linux system administration and command-line tools.
- Knowledge of database systems and data pipeline technologies.
- Exposure to model serving frameworks (TensorFlow Serving, TorchServe, ONNX Runtime).
- Basic understanding of security best practices for ML systems and data governance.