*Job Description: Cloud/Platform/MLOps Engineer*
Location: Mumbai
Position: Onsite
*Position Overview:*
We are seeking a skilled and experienced *Platform/MLOps Engineer* to join our team. The ideal
candidate will play a critical role in designing, implementing, and maintaining scalable and reliable
machine learning operations infrastructure to support our data science and machine learning teams.
You will work closely with cross-functional teams to enable the seamless deployment, monitoring,
and optimization of ML models and pipelines in production environments.
*Key Responsibilities:*
1. *MLOps & Model Deployment:*
- Design and implement robust MLOps pipelines for deploying, monitoring, and retraining ML
models.
- Automate model training, testing, and deployment workflows using tools like *Kubeflow*,
*Airflow*, and *MLflow*.
- Ensure seamless integration of CI/CD pipelines for ML models to accelerate deployment cycles.
2. *Infrastructure Management:*
- Manage and optimize cloud-based infrastructure on *Azure* (Kubernetes, Blob Storage, Virtual
Machines, and Databases).
- Implement infrastructure as code (IaC) solutions using *Terraform* or *Azure Resource Manager
(ARM) templates*.
- Optimize resource utilization for cost-efficiency and performance in Azure and, optionally, *AWS*
environments.
3. *Pipeline Orchestration & Automation:*
- Build and maintain data and ML pipelines using *Apache Airflow* or similar workflow
orchestration tools.
- Collaborate with data engineering teams to ensure reliable data pipelines and preprocessing
systems.
4. *Monitoring & Observability:*
- Set up logging, monitoring, and alerting for ML pipelines and deployed models using tools like
*Prometheus*, *Grafana*, or *Azure Monitor*.
- Establish best practices for performance tracking, drift detection, and model versioning.
5. *Development & Collaboration:*
- Write efficient, maintainable, and well-documented code in *Python* for automation and
infrastructure management.
- Collaborate with data scientists, data engineers, and DevOps teams to ensure seamless
integration between tools, pipelines, and applications.
- Use *Git* for version control and manage repositories in collaborative workflows.
6. *Security & Compliance:*
- Manage authentication and authorization for cloud services and pipelines using tools like *Azure
AD* and *IAM policies*.
*Required Skills & Qualifications:*
- Strong experience with *MLOps* tools and frameworks such as *Kubeflow*, *MLflow*, and
*Apache Airflow*.
- Proficiency in *CI/CD pipelines* for machine learning workflows (e.g., GitHub Actions, Azure
DevOps, Jenkins).
- Hands-on experience with *Azure Cloud* (Kubernetes, Blob Storage, Databases, VMs).
- Solid programming skills in *Python* for scripting, automation, and pipeline development.
- Familiarity with containerization and orchestration tools such as *Docker* and *Kubernetes*.
- Experience with *Git* for version control and collaborative workflows.
- Understanding of monitoring and observability tools like *Prometheus*, *Grafana*, or *Azure
Monitor*.
*Preferred Skills:*
- Knowledge of *AWS Cloud* services (S3, EC2, SageMaker, etc.).
- Experience with IaC tools like *Terraform* or *CloudFormation*.
- Understanding of distributed systems and parallel computing for ML pipelines.
*Soft Skills:*
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Adaptability to a fast-paced, dynamic work environment.
*Educational Qualifications:*
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Certifications in Azure, AWS, or Kubernetes (e.g., Azure Solutions Architect, CKA) are a plus.