Posted: 3 weeks ago
Work from Office
Full Time
5+ years of experience
As the Infrastructure and Ops Engineer, you will work on operations related to our UAIS AI Studio (enterprise AI/ML platform), in particular the AI/ML training initiative that supports thousands of learners on the platform. This individual contributor (IC) role requires experience working on large-scale AI/ML platforms and ensuring their stability, reliability, scalability, and performance. Experience with modern infrastructure and DevOps tools and paradigms, as well as hands-on knowledge of major cloud services such as Azure, AWS, and GCP, is a must.
Primary Responsibilities:
Continuous Support: Provide continuous SRE support to thousands of geographically distributed learners on the UAIS platform: respond to tickets, triage support requests, and liaise with customers.
Automation & DevOps: Improve existing Infrastructure as Code (IaC) according to DevOps best practices.
Systems Monitoring: Develop and maintain monitoring frameworks for the UAIS infrastructure that supports the AI/ML training program.
Security & Compliance: Collaborate with cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats.
Capacity Planning & Cost Optimization: Forecast and manage capacity requirements for the AI/ML training environment, while identifying opportunities to reduce costs without compromising performance.
Required Qualifications:
Bachelor's degree in computer science, information technology, or a related field.
5+ years of infrastructure experience: Proven experience working on large-scale, cloud-based, enterprise-level software platforms and a deep understanding of multi-cloud architectures, specifically Azure, AWS, and GCP, with hands-on experience in cloud management.
3+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like.
2+ years of practical experience with containerization and orchestration technologies (Docker, Kubernetes).
2+ years of practical experience in scripting and automation: Advanced proficiency in scripting languages such as Python and Bash to support automation and system integration efforts.
Preferred Qualifications:
Security & Compliance Knowledge: Strong understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks.
Machine Learning and LLM Operations: Exposure to modern tools and techniques in the MLOps and LLMOps fields, including AI/ML-specific infrastructure tools (e.g., MLflow, Kubeflow) for managing and deploying models at scale.
Exposure to a Regulated Industry: Experience working in healthcare or another regulated industry, with a solid understanding of its unique challenges and compliance requirements.
Ability to work independently, manage multiple projects simultaneously, and adapt to changing priorities in a fast-paced environment.
VAK Consulting LLC