Posted: 3 months ago
Work from Office
Full Time
5+ years of experience

As the Infrastructure and Ops Engineer, you will work on operations related to our UAIS AI Studio (enterprise AI/ML platform), in particular the AI/ML training initiative that supports thousands of learners on the platform. This individual contributor (IC) role requires experience working on large-scale AI/ML platforms and guaranteeing their stability, reliability, scalability, and performance. Experience with modern infrastructure and DevOps tools and paradigms, as well as hands-on knowledge of major cloud services such as Azure, AWS, and GCP, is a must.

Primary Responsibilities:
Continuous support: Provide continuous SRE support to thousands of geographically distributed learners on the UAIS platform: respond to tickets, triage support requests, and liaise with customers.
Automation & DevOps: Improve existing Infrastructure as Code (IaC) according to DevOps best practices.
Systems Monitoring: Develop and maintain monitoring frameworks for the UAIS infrastructure that supports the AI/ML training program.
Security & Compliance: Collaborate with cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats.
Capacity Planning & Cost Optimization: Forecast and manage capacity requirements for the AI/ML training environment, while identifying opportunities to reduce costs without compromising performance.

Required Qualifications:
Bachelor's degree in computer science, information technology, or a related field.
5+ years of infrastructure experience: Proven experience working on large-scale, cloud-based, enterprise-level software platforms and a deep understanding of multi-cloud architectures, specifically Azure, AWS, and GCP, with hands-on experience in cloud management.
3+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like.
2+ years of practical experience with containerization and orchestration technologies (Kubernetes, Docker).
2+ years of practical experience in scripting and automation: Advanced proficiency in scripting languages such as Python and Bash to support automation and system integration efforts.

Preferred Qualifications:
Security & Compliance Knowledge: Strong understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks.
Machine Learning and LLM Operations: Exposure to modern tools and techniques in the MLOps and LLMOps fields. Exposure to AI/ML-specific infrastructure tools (e.g., MLflow, Kubeflow) for managing and deploying models at scale.
Exposure to a Regulated Industry: Experience working within healthcare or another regulated industry, with a solid understanding of its unique challenges and compliance requirements.
Ability to work independently, manage multiple projects simultaneously, and adapt to changing priorities in a fast-paced environment.

Location: Onsite, at any of the following: Delhi NCR, Bangalore, Chennai, Pune, Kolkata, Ahmedabad, Mumbai, Hyderabad
VAK Consulting LLC