
3 OpenLLMetry Jobs

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

5 - 10 years

7 - 12 Lacs

Bengaluru

Work from Office

Naukri logo

About the Job: The Data Development Insights & Strategy (DDIS) team is seeking a Senior AI Engineer to design, scale, and maintain our AI model lifecycle framework within Red Hat's OpenShift AI and RHEL AI infrastructures. As a Senior AI Engineer, you will help manage and optimize large-scale AI models, collaborating with cross-functional teams to ensure high availability, continuous monitoring, and efficient integration of new model updates, while driving innovation through emerging AI technologies.

In this role, you will apply your expertise in AI, MLOps/LLMOps, cloud computing, and distributed systems to improve model performance, scalability, and operational efficiency. You'll work in close collaboration with the Products & Global Engineering (P&GE) and IT AI Infra teams, ensuring seamless model deployment and maintenance in a secure, high-performance environment. This is an exciting opportunity to drive AI model advancements and contribute to the operational success of mission-critical applications.

What you will do:
- Develop and maintain the lifecycle framework for AI models within Red Hat's OpenShift and RHEL AI infrastructure, ensuring security, scalability, and efficiency throughout the process.
- Design, implement, and optimize CI/CD pipelines and automation for deploying AI models at scale using tools like Git, Jenkins, and Terraform, ensuring zero disruption during updates and integration.
- Continuously monitor and improve model performance using tools such as OpenLLMetry, Splunk, and Catchpoint, responding to performance degradation and model-related issues.
- Work closely with cross-functional teams, including Products & Global Engineering (P&GE) and IT AI Infra, to integrate new models or model updates into production systems with minimal downtime and disruption.
- Enable a structured process for handling feature requests (RFEs), prioritization, and resolution, ensuring transparent communication and timely resolution of model issues.
- Assist in fine-tuning and enhancing large-scale models, including foundational models like Mistral and Llama, while ensuring computational resources are optimally allocated (GPU management, cost-management strategies).
- Drive performance improvements, model updates, and releases on a quarterly basis, ensuring that all RFEs are processed and resolved within 30 days.
- Collaborate with stakeholders to align AI model updates with evolving business needs, data changes, and emerging technologies.
- Mentor junior engineers, fostering a collaborative and innovative environment.

What you will bring:
- A bachelor's or master's degree in Computer Science, Data Science, Machine Learning, or a related technical field. Hands-on experience that demonstrates your ability and interest in AI engineering and MLOps will be considered in lieu of formal degree requirements.
- Strong programming experience in Python, with a solid understanding of machine learning frameworks and tools.
- Experience with cloud platforms such as AWS, GCP, or Azure, including deploying and maintaining AI models at scale in these environments.
- Experience with large-scale distributed systems and infrastructure, especially production environments where AI and LLM models are deployed and maintained. You should be comfortable troubleshooting, optimizing, and automating workflows related to AI model deployment, monitoring, and lifecycle management. We value a strong ability to debug and optimize model performance and to automate manual tasks wherever possible.
- Proficiency in managing AI model infrastructure with containerization technologies like Kubernetes and OpenShift, plus hands-on experience with performance monitoring tools (e.g., OpenLLMetry, Splunk, Catchpoint).
- A solid understanding of GPU-based computing and resource optimization, with a background in high-performance computing (e.g., CUDA, vLLM, MIG, TGI, TEI).
- Experience working in Agile development environments; working collaboratively within cross-functional teams to solve complex problems and drive AI model updates will be key to your success in this role.

Desired skills:
- 5+ years of experience in AI or MLOps, with a focus on deploying, maintaining, and optimizing large-scale AI models in production.
- Expertise in deploying and managing models in cloud environments (AWS, GCP, Azure) and containerized platforms like OpenShift or Kubernetes.
- Familiarity with large-scale distributed systems and experience managing their performance and scalability.
- Experience with performance monitoring and analysis tools such as OpenLLMetry, Prometheus, or Splunk.
- Deep understanding of GPU-based deployment strategies and computational cost management.
- Strong experience managing model lifecycle processes, from training to deployment, monitoring, and updates.
- Ability to mentor junior engineers and promote knowledge sharing across teams.
- Excellent communication skills, both verbal and written, with the ability to engage technical and non-technical stakeholders.
- A passion for innovation and continuous learning in the rapidly evolving field of AI and machine learning.
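The monitoring responsibility described in this listing (watching for performance degradation via tools like OpenLLMetry or Splunk) reduces, at its core, to comparing recent metrics against a baseline. Here is a minimal stdlib-only sketch of that idea; it is illustrative only, not Red Hat's actual tooling, and the function name and threshold are invented for the example:

```python
from statistics import mean

def degraded(baseline_ms, recent_ms, threshold=1.5):
    """Flag degradation when the recent average latency exceeds
    the baseline average by more than `threshold`x.
    In a real deployment the samples would come from a telemetry
    backend such as OpenLLMetry, Splunk, or Catchpoint."""
    if not baseline_ms or not recent_ms:
        return False  # not enough data to judge
    return mean(recent_ms) > threshold * mean(baseline_ms)

# Steady latency: no alert.
print(degraded([120, 130, 125], [128, 131, 127]))  # False
# Latency has roughly doubled: alert.
print(degraded([120, 130, 125], [240, 260, 250]))  # True
```

A production check would add percentile-based comparisons and alert routing, but the windowed baseline-vs-recent comparison is the common starting point.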

Posted 2 months ago

Apply

10 - 14 years

12 - 16 Lacs

Bengaluru

Work from Office


About the Job: The Data Development Insights & Strategy (DDIS) team is seeking a Principal AI Engineer to lead the design, development, and optimization of AI model lifecycle frameworks within Red Hat's OpenShift AI and RHEL AI infrastructures. As a Principal AI Engineer, you will play a key leadership role in setting the strategic direction of AI model deployment and lifecycle management, collaborating across teams to ensure seamless integration, scalability, and performance of mission-critical AI models.

In this role, you will drive the development of innovative solutions for the AI model lifecycle, applying your deep expertise in MLOps/LLMOps, cloud computing, and distributed systems. You will be a technical leader who mentors and guides teams in collaboration with Products & Global Engineering (P&GE) and IT AI Infra to ensure efficient model deployment and maintenance in secure, scalable environments. This is an exciting opportunity for someone who wants to influence the strategic direction of Red Hat's AI innovations, driving the optimization of AI models and technologies.

What you will do:
- Lead the design and development of scalable, efficient, and secure AI model lifecycle frameworks within Red Hat's OpenShift and RHEL AI infrastructures, ensuring models are deployed and maintained with minimal disruption and optimal performance.
- Define and implement the strategy for optimizing AI model deployment, scaling, and integration across hybrid cloud environments (AWS, GCP, Azure), working with cross-functional teams to ensure consistently high availability and operational excellence.
- Spearhead the creation and optimization of CI/CD pipelines and automation for AI model deployments, leveraging tools such as Git, Jenkins, and Terraform, ensuring zero disruption during updates and integration.
- Champion the use of advanced monitoring tools (e.g., OpenLLMetry, Splunk, Catchpoint) to monitor and optimize model performance, responding to issues and leading the troubleshooting of complex problems related to AI and LLM models.
- Collaborate with the Products & Global Engineering (P&GE) and IT AI Infra teams to ensure seamless integration of new models or model updates into production systems, adhering to best practices and minimizing downtime.
- Define and oversee the structured process for handling feature requests (RFEs), prioritization, and resolution, ensuring transparency and timely delivery of updates and enhancements.
- Lead and influence the adoption of new AI technologies, tools, and frameworks to keep Red Hat at the forefront of AI and machine learning advancements.
- Drive performance improvements, model updates, and releases on a quarterly basis, ensuring RFEs are processed and resolved within agreed-upon timeframes and driving business adoption.
- Oversee the fine-tuning and enhancement of large-scale models, including foundational models like Mistral and Llama, ensuring the optimal allocation of computational resources (GPU management, cost-management strategies).
- Lead a team of engineers, mentoring junior and senior talent, fostering an environment of collaboration and continuous learning, and driving the technical growth of the team.
- Contribute to strategic discussions with leadership, influencing the direction of AI initiatives and ensuring alignment with broader business goals and technological advancements.

What you will bring:
A bachelor's or master's degree in Computer Science, Data Science, Machine Learning, or a related technical field is required. Hands-on experience and demonstrated leadership in AI engineering and MLOps will be considered in lieu of formal degree requirements.
10+ years of experience in AI or MLOps, with at least 3 years in a technical leadership role managing the deployment, optimization, and lifecycle of large-scale AI models. You should have deep expertise in cloud platforms (AWS, GCP, Azure) and containerized environments (OpenShift, Kubernetes), with a proven track record of scaling and managing AI infrastructure in production. You should also have experience optimizing large-scale distributed AI systems, automating deployment pipelines with CI/CD tools like Git, Jenkins, and Terraform, and leading performance monitoring with tools such as OpenLLMetry, Splunk, or Catchpoint, along with a strong background in GPU-based computing and resource optimization (e.g., CUDA, MIG, vLLM) and comfort in high-performance computing environments.

Your leadership skills will be key, as you will mentor and guide engineers while fostering a collaborative, high-performance culture. You should also have a demonstrated ability to drive innovation, solve complex technical challenges, and work cross-functionally to deliver AI model updates that align with evolving business needs. A solid understanding of Agile development processes and excellent communication skills are essential for this role. Lastly, a passion for AI, continuous learning, and staying ahead of industry trends will be vital to your success at Red Hat.

Desired skills:
- 10+ years of experience in AI, MLOps, or related fields, with a substantial portion spent in technical leadership roles driving the strategic direction of AI infrastructure and model lifecycle management.
- Extensive experience with foundational models such as Mistral, Llama, and GPT, including their deployment, tuning, and scaling in production environments.
- Proven ability to influence and drive AI and MLOps roadmaps, shaping technical strategy and execution in collaboration with senior leadership.
- In-depth experience with performance monitoring, resource optimization, and troubleshooting of AI models in complex distributed environments.
- Strong background in high-performance distributed systems and container orchestration, particularly for AI/ML workloads.
- Proven experience guiding and mentoring engineering teams, fostering a culture of continuous improvement and technical innovation.

As a Principal AI Engineer at Red Hat, you will have the opportunity to drive major strategic AI initiatives, influence the future of AI infrastructure, and lead a high-performing engineering team. This is a unique opportunity for a seasoned AI professional to shape the future of AI model lifecycle management at scale. If you're ready to take on a technical leadership role with a high level of responsibility and impact, we encourage you to apply.
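The GPU resource-optimization background this role calls for (CUDA, MIG, vLLM) usually starts with back-of-envelope memory sizing before any serving stack is chosen. A rough, illustrative sketch of weight-memory estimates at different precisions follows; the function and the 7B example are assumptions for illustration, and real deployments also need headroom for KV cache, activations, and framework overhead:

```python
def weight_memory_gib(n_params_billion, bytes_per_param):
    """Approximate GPU memory needed for model weights alone.
    bytes_per_param: 4 (fp32), 2 (fp16/bf16), 1 (int8), 0.5 (int4).
    KV cache, activations, and runtime overhead are extra."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# A 7B-parameter model (roughly Mistral-7B scale) at several precisions:
for label, bpp in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(7, bpp):.1f} GiB")
```

Estimates like these inform whether a model fits on one GPU, needs a MIG slice, or must be sharded, which is the kind of cost-management trade-off the listing alludes to.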

Posted 3 months ago

Apply

5 - 8 years

7 - 10 Lacs

Bengaluru

Work from Office


About the Job: The Data Development Insights & Strategy (DDIS) team is seeking a Senior AI Engineer to design, scale, and maintain our AI model lifecycle framework within Red Hat's OpenShift AI and RHEL AI infrastructures. As a Senior AI Engineer, you will help manage and optimize large-scale AI models, collaborating with cross-functional teams to ensure high availability, continuous monitoring, and efficient integration of new model updates, while driving innovation through emerging AI technologies.

In this role, you will apply your expertise in AI, MLOps/LLMOps, cloud computing, and distributed systems to improve model performance, scalability, and operational efficiency. You'll work in close collaboration with the Products & Global Engineering (P&GE) and IT AI Infra teams, ensuring seamless model deployment and maintenance in a secure, high-performance environment. This is an exciting opportunity to drive AI model advancements and contribute to the operational success of mission-critical applications.

What you will do:
- Develop and maintain the lifecycle framework for AI models within Red Hat's OpenShift and RHEL AI infrastructure, ensuring security, scalability, and efficiency throughout the process.
- Design, implement, and optimize CI/CD pipelines and automation for deploying AI models at scale using tools like Git, Jenkins, and Terraform, ensuring zero disruption during updates and integration.
- Continuously monitor and improve model performance using tools such as OpenLLMetry, Splunk, and Catchpoint, responding to performance degradation and model-related issues.
- Work closely with cross-functional teams, including Products & Global Engineering (P&GE) and IT AI Infra, to integrate new models or model updates into production systems with minimal downtime and disruption.
- Enable a structured process for handling feature requests (RFEs), prioritization, and resolution, ensuring transparent communication and timely resolution of model issues.
- Assist in fine-tuning and enhancing large-scale models, including foundational models like Mistral and Llama, while ensuring computational resources are optimally allocated (GPU management, cost-management strategies).
- Drive performance improvements, model updates, and releases on a quarterly basis, ensuring that all RFEs are processed and resolved within 30 days.
- Collaborate with stakeholders to align AI model updates with evolving business needs, data changes, and emerging technologies.
- Mentor junior engineers, fostering a collaborative and innovative environment.

What you will bring:
- A bachelor's or master's degree in Computer Science, Data Science, Machine Learning, or a related technical field. Hands-on experience that demonstrates your ability and interest in AI engineering and MLOps will be considered in lieu of formal degree requirements.
- Strong programming experience in Python, with a solid understanding of machine learning frameworks and tools.
- Experience with cloud platforms such as AWS, GCP, or Azure, including deploying and maintaining AI models at scale in these environments.
- Experience with large-scale distributed systems and infrastructure, especially production environments where AI and LLM models are deployed and maintained. You should be comfortable troubleshooting, optimizing, and automating workflows related to AI model deployment, monitoring, and lifecycle management. We value a strong ability to debug and optimize model performance and to automate manual tasks wherever possible.
- Proficiency in managing AI model infrastructure with containerization technologies like Kubernetes and OpenShift, plus hands-on experience with performance monitoring tools (e.g., OpenLLMetry, Splunk, Catchpoint).
- A solid understanding of GPU-based computing and resource optimization, with a background in high-performance computing (e.g., CUDA, vLLM, MIG, TGI, TEI).
- Experience working in Agile development environments; working collaboratively within cross-functional teams to solve complex problems and drive AI model updates will be key to your success in this role.

Desired skills:
- 5+ years of experience in AI or MLOps, with a focus on deploying, maintaining, and optimizing large-scale AI models in production.
- Expertise in deploying and managing models in cloud environments (AWS, GCP, Azure) and containerized platforms like OpenShift or Kubernetes.
- Familiarity with large-scale distributed systems and experience managing their performance and scalability.
- Experience with performance monitoring and analysis tools such as OpenLLMetry, Prometheus, or Splunk.
- Deep understanding of GPU-based deployment strategies and computational cost management.
- Strong experience managing model lifecycle processes, from training to deployment, monitoring, and updates.
- Ability to mentor junior engineers and promote knowledge sharing across teams.
- Excellent communication skills, both verbal and written, with the ability to engage technical and non-technical stakeholders.
- A passion for innovation and continuous learning in the rapidly evolving field of AI and machine learning.
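This listing commits to RFEs being "processed and resolved within 30 days," which operationally is a simple SLA check over request timestamps. A hedged stdlib sketch of such a check follows; the RFE record shape is invented for illustration, and a real tracker (e.g., Jira) would differ:

```python
from datetime import date, timedelta

SLA = timedelta(days=30)

def sla_breaches(rfes, today):
    """Return the IDs of open RFEs older than the 30-day SLA.
    `rfes` is a list of (id, opened, resolved_or_None) tuples —
    a made-up shape for this sketch."""
    return [rid for rid, opened, resolved in rfes
            if resolved is None and today - opened > SLA]

rfes = [
    ("RFE-1", date(2025, 1, 2), date(2025, 1, 20)),  # resolved in time
    ("RFE-2", date(2025, 1, 5), None),               # open, 40 days old
    ("RFE-3", date(2025, 2, 1), None),               # open, within SLA
]
print(sla_breaches(rfes, today=date(2025, 2, 14)))  # ['RFE-2']
```

Wired into a weekly report, a check like this is what makes the 30-day commitment auditable rather than aspirational.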

Posted 3 months ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot


Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies