Job Description
As an MLOps Engineer at Qualcomm India Private Limited, you will contribute to the development and maintenance of the ML platform, both on premises and on AWS Cloud. Your role will involve architecting, deploying, and optimizing the ML and data platform that supports the training of machine learning models on NVIDIA DGX clusters and the Kubernetes platform. You will collaborate with various teams to ensure the smooth operation and scalability of the ML infrastructure.

**Responsibilities:**

- Architect, develop, and maintain the ML platform for training and inference of ML models.
- Design and implement scalable infrastructure solutions for NVIDIA clusters on premises and on AWS Cloud.
- Collaborate with data scientists and software engineers to integrate ML and data workflows.
- Optimize platform performance and scalability, focusing on GPU resource utilization, data ingestion, model training, and deployment.
- Monitor system performance, identify and resolve issues, and ensure platform availability.
- Implement CI/CD pipelines for automated model training, evaluation, and deployment.
- Implement a monitoring stack using Prometheus and Grafana.
- Manage the AWS services that support the platform.
- Implement logging and monitoring solutions using AWS CloudWatch.

**Qualifications:**

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience as an MLOps Engineer with expertise in ML and data infrastructure and GPU clusters.
- Strong knowledge of NVIDIA DGX clusters and the Kubernetes platform.
- Proficiency in Python, Go, and ML frameworks such as TensorFlow and PyTorch.
- Understanding of distributed computing, parallel computing, and GPU acceleration.
- Experience with Docker, orchestration tools, CI/CD pipelines, and AWS services.
- Strong problem-solving and communication skills for effective teamwork.
**Additional Information:** Qualcomm India Private Limited is committed to providing an accessible process for individuals with disabilities. For accommodation during the application/hiring process, you may contact disability-accommodations@qualcomm.com. Qualcomm expects its employees to adhere to all applicable policies and procedures, including security requirements for protecting confidential information. If you are an individual seeking a job at Qualcomm, please note that staffing and recruiting agencies are not authorized to submit profiles, applications, or resumes through the Qualcomm Careers Site. Unsolicited submissions from agencies will not be accepted. For more information about the MLOps Engineer role, please contact Qualcomm Careers.