Job Description
This Senior MLOps Engineer role requires hands-on experience building and managing ML systems in production, along with strong MLOps skills. The position is responsible for building machine learning pipelines and ML application services, and for providing machine learning engineering support to Production teams. The role operates at the crossroads of Machine Learning, Software/Data Engineering, and DevOps, drawing on best practices from each domain to optimize the deployment and management of machine learning models in production environments. The role is also expected to enhance the technology stack, focusing on continuous improvements in the reliability, automation, and efficiency of the Platform Solutions.

Roles and Responsibilities

ML Platform Management: Manage ML Platform infrastructure to automate and accelerate model development and deployment.
Pipeline Development: Define and build scalable ML pipelines that enable data scientists to build better models.
Monitoring & Optimization: Build dashboards and other monitoring tools to optimize performance and infrastructure costs.
Production Support: Monitor and support infrastructure and production systems.
Performance Optimization: Identify potential issues and performance bottlenecks in the system and optimize accordingly.
Team Mentorship: Mentor team members through code review and knowledge sharing.
Team Leadership: Lead, mentor, and manage a team of MLOps engineers, fostering a collaborative and high-performance environment.
Project Oversight: Oversee the team's project management, ensuring timely and successful delivery of data solutions.
Model Design & Development: Design, develop, and maintain data science models and algorithms on the ML Platforms, together with the Data Science team, to drive business insights and decision-making.
CI/CD Implementation: Implement and manage CI/CD pipelines for Core Solution Packages and Applications.
Containerization: Containerize applications and workflows using Docker and App Service for consistent environments and deployment.
Cross-functional Alignment: Work closely with various teams, including data science, engineering, and business units, to align the ML Platform solution with client goals.
Communication & Collaboration: Facilitate communication and collaboration between team members and other departments to ensure cohesive project execution.
Technical Support: Provide technical support and insights to cross-functional teams as needed.
Security Compliance: Ensure ML systems comply with security standards and best practices in a cloud environment.
Technical Debt Management: Identify and address technical debt in current ML projects, incorporating MLOps best practices.

Preferred Experience

Overall Experience: 8+ years of hands-on experience at scale in data science / ML engineering.
Programming Skills: Excellent hands-on skills in writing clean, structured SQL, Python, and Shell programs.
Infrastructure Knowledge: Good experience with infrastructure, including Cloud Computing, Linux OS, Networks, Kubernetes, Docker, Infrastructure as Code, RDBMS, and NoSQL Databases.
MLOps Understanding: Good understanding of Machine Learning concepts and MLOps best practices.
Production ML Models: Solid experience serving real-time, production-level machine learning models.
Leadership: Proven experience leading and managing a team.
Communication: Excellent communication skills.
Azure Cloud Technologies: 5+ years of hands-on experience with Azure Cloud Technologies such as Azure Databricks, Azure DevOps, Azure App Service, Docker, Azure Key Vault, and other managed Azure services.
AWS Technologies: Proficiency in AWS technologies, including Athena, Glue, ECS, EKS, and VPC, as well as AWS SageMaker for deploying machine learning models, improving automation, and implementing essential checks.
Python Proficiency: Proficiency in Python for developing automation scripts and pipelines.
Infrastructure as Code (IaC): Familiarity with IaC tools such as Terraform or AWS CloudFormation for managing cloud infrastructure.
ML Lifecycle & Frameworks: Basic familiarity with the Data Science and Machine Learning lifecycle, as well as frameworks like scikit-learn, PyTorch, and TensorFlow.