Experience: 9 years

Salary: 8-10 Lacs

Posted: 1 week ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Part Time

Job Description

Overview

We are seeking a highly skilled and proactive AI Solutions SRE Lead to oversee the maintenance, optimization, and ongoing performance of deployed AI/ML systems and solutions. In this role, you'll act as the bridge between innovation and operations, ensuring our AI solutions consistently deliver value and operate seamlessly in real-world environments. You will lead efforts to monitor deployments, troubleshoot issues, and define best practices for sustaining AI systems throughout their lifecycle.

Responsibilities

Monitoring & Sustenance:
  • Lead the post-deployment lifecycle of AI solutions, ensuring continued functionality, reliability, and scalability.
  • Establish monitoring frameworks to track performance, usage, and other key metrics for AI/ML models and APIs.
  • Detect anomalies in AI systems, troubleshoot operational issues, and initiate timely corrective actions.
Performance Optimization:
  • Continuously assess and optimize the performance of AI models to maintain efficiency and accuracy in production environments.
  • Collaborate with data scientists and engineers to refine algorithms, retrain models, and update solutions as needed.
  • Implement automation where possible to streamline maintenance processes.
Stakeholder Collaboration:
  • Work with cross-functional teams (engineering, product, operations, etc.) to ensure alignment of AI sustainment activities with business goals.
  • Communicate effectively with stakeholders to provide updates on system health, risks, and improvements.
Governance & Best Practices:
  • Define and implement best practices for sustaining AI solutions, including documentation, testing protocols, and version control.
  • Ensure compliance with ethical AI standards, regulatory guidelines, and established governance frameworks.
  • Manage and mitigate risks associated with model drift, data shifts, and system vulnerabilities.
Incident Management:
  • Lead responses to critical incidents involving AI systems by performing root cause analysis and deploying solutions for quick resolution.
  • Advocate for proactive risk prevention and early detection strategies.
  • Mentor and develop junior team members, building their skills in AI observability and their domain knowledge of ML, Computer Vision, and Generative AI.

Qualifications

Required:
  • Bachelor's degree in Computer Science, Engineering, Data Science, or related field; advanced degree preferred.
  • 9+ years of experience in machine learning, data science, or software engineering roles, with significant exposure to Computer Vision and Generative AI projects.
  • 4+ years of experience specifically focused on developing and sustaining AI/ML applications and solutions.
  • Strong programming skills in languages such as Python, Java, or Go.
  • Extensive experience with AI/ML frameworks (e.g., TensorFlow, PyTorch, scikit-learn) and cloud platforms (e.g., AWS, Azure, GCP).
  • Proficiency in data visualization tools and techniques (e.g., Grafana, Tableau, D3.js).
  • Deep understanding of AI/ML concepts, including model training, evaluation, and deployment, with specific knowledge of Computer Vision and Generative AI techniques.
  • Experience with monitoring and observability tools such as Prometheus, ELK stack, or similar systems.
  • Excellent problem-solving skills and ability to troubleshoot complex AI systems across various domains.
  • Proven track record of mentoring and developing junior team members in AI-related roles.
Preferred:
  • Experience with MLOps practices and tools, particularly for large-scale AI systems.
  • Familiarity with AI ethics and responsible AI principles, especially as they relate to Generative AI.
  • Knowledge of relevant AI regulations and compliance requirements, including those specific to Computer Vision applications.
  • Experience with distributed systems and large-scale data processing for AI applications.
  • Contributions to open-source projects or research publications on production-scale AI solutions.
  • Previous experience with large-scale AI/ML solutions in production environments.
  • Knowledge of DevOps principles and CI/CD pipelines specific to AI/ML systems.
Key Competencies
  • Strong analytical and critical thinking skills
  • Excellent communication and collaboration abilities
  • Proactive and self-motivated work ethic
  • Ability to explain complex technical concepts to both technical and non-technical audiences
  • Adaptability and willingness to learn in a rapidly evolving field
  • Strong mentorship and leadership skills
  • Deep curiosity and passion for AI, particularly in ML, Computer Vision, and Generative AI domains

We are looking for a passionate and innovative individual who can help us build robust, transparent, and reliable AI systems while nurturing the growth of our team. If you have a strong background in AI/ML, with specific expertise in Computer Vision and Generative AI, and a keen interest in observability and system reliability, we encourage you to apply.
