Job Description
We are looking for a skilled MLOps Engineer with 2-3 years of experience to join our team. As an MLOps Engineer, you will collaborate with data scientists, software engineers, and IT teams to ensure smooth deployment, scaling, and monitoring of machine learning models.

Your responsibilities will include designing, developing, and maintaining automated pipelines for continuous integration and deployment (CI/CD) of machine learning models. You will manage model versioning, deployment, and monitoring in production environments and optimize the performance and scalability of machine learning models post-deployment. Collaboration with data science teams will be essential to improve model reproducibility, experiment tracking, and data workflows. You will implement monitoring, alerting, and logging solutions to ensure model performance and detect anomalies in production. Managing and scaling infrastructure required for model training and inference, whether on-premise or cloud, will be part of your daily tasks. Working closely with DevOps teams to integrate MLOps practices seamlessly into existing development workflows is also a key responsibility. Implementing security and compliance practices for AI/ML pipelines, including data governance, will be crucial. Troubleshooting issues in production environments and ensuring high availability of models will also be part of your role.

Qualifications:
- Education: Bachelor's degree in Computer Science, Engineering, or a related field.
- Experience: 2-3 years of hands-on experience in MLOps, DevOps, or related fields.
- Experience with machine learning lifecycle management tools such as MLflow, Kubeflow, or TFX.
- Strong knowledge of cloud platforms such as AWS, Google Cloud, or Azure (experience in setting up AI/ML services is a plus).
- Proficiency in scripting and automation (Python, Bash, etc.).
- Experience with containerization (Docker) and orchestration tools (Kubernetes).
- Familiarity with CI/CD tools like Jenkins, CircleCI, or GitLab CI for deploying machine learning models.
- Knowledge of version control systems (e.g., Git) and infrastructure-as-code (e.g., Terraform, CloudFormation).
- Understanding of monitoring and logging frameworks (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with data engineering tools (e.g., Apache Airflow, Kafka) is a plus.