Lead Software Engineer ML Ops & System Engineer

5 - 9 years

0 Lacs

Posted: 1 month ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Lead Software Engineer specializing in ML Ops and System Engineering, your role will involve designing, building, and scaling high-performance infrastructure while leading initiatives in software engineering, system reliability, and machine learning operations to deliver robust, production-ready solutions.

Key Responsibilities:

- Design and develop scalable, secure, and reliable microservices using Golang and Python.
- Build and maintain containerized environments using Docker and orchestrate them with Kubernetes.
- Implement CI/CD pipelines with Jenkins for automated testing, deployment, and monitoring.
- Manage ML workflows with MLflow, ensuring reproducibility, versioning, and deployment of machine learning models.
- Leverage Temporal for orchestrating complex workflows and ensuring fault-tolerant execution of distributed systems.
- Work with AWS cloud services (EC2, S3, IAM, basics of networking) to deploy and manage scalable infrastructure.
- Collaborate with data science and software teams to bridge the gap between ML research and production systems.
- Ensure system reliability and observability through monitoring, logging, and performance optimization.
- Mentor junior engineers and lead best practices for ML Ops, DevOps, and system design.

Qualifications Required:

- Minimum 5+ years of experience.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Strong programming skills in Golang and Python.
- Hands-on experience with Kubernetes and Docker in production environments.
- Proven experience in microservices architecture and distributed systems design.
- Good understanding of AWS fundamentals (EC2, S3, IAM, networking basics).
- Experience with MLflow for ML model tracking, management, and deployment.
- Proficiency in CI/CD tools (preferably Jenkins).
- Knowledge of Temporal or similar workflow orchestration tools.
- Strong problem-solving and debugging skills in distributed systems.
- Excellent communication and leadership skills with experience mentoring engineers.
