1154 Prometheus Jobs - Page 7


4.0 - 8.0 years

0 Lacs

maharashtra

On-site

As a Kubernetes Administrator/DevOps Senior Consultant, you will be responsible for designing, provisioning, and managing Kubernetes clusters for applications based on microservices and event-driven architectures. Your role will involve ensuring seamless integration of applications with Kubernetes orchestrated environments and configuring and managing Kubernetes resources such as pods, services, deployments, and namespaces. Monitoring and troubleshooting Kubernetes clusters to identify and resolve performance issues, system errors, and other operational challenges will be a key aspect of your responsibilities. You will also be required to implement infrastructure as code (IaC) using tools like Ansible and Terraform for configuration management. Furthermore, you will design and implement cluster and application monitoring using tools like Prometheus, Grafana, OpenTelemetry, and Datadog. Managing and optimizing AWS cloud resources and infrastructure for managed containerized environments (ECR, EKS, Fargate, EC2) will be a part of your daily tasks. Ensuring high availability, scalability, and security of all infrastructure components, monitoring system performance, identifying bottlenecks, and implementing necessary optimizations are also crucial responsibilities. Your role will involve troubleshooting and resolving complex issues related to the DevOps stack, developing and maintaining documentation for DevOps processes and best practices, and staying current with industry trends and emerging technologies to drive continuous improvement. Creating and managing DevOps pipelines, IaC, CI/CD, and Cloud Platforms will also be part of your duties.

**Required Skills:**
- 4-5 years of extensive hands-on experience in Kubernetes Administration, Docker, Ansible/Terraform, AWS, EKS, and corresponding cloud environments.
- Hands-on experience in designing and implementing Service Discovery, Service Mesh, and Load Balancers.
- Extensive experience in defining and creating declarative files in YAML for provisioning.
- Experience in troubleshooting containerized environments using a combination of monitoring tools and logs.
- Scripting and automation skills (e.g., Bash, Python) for managing Kubernetes configurations and deployments.
- Hands-on experience with Helm charts, API gateways, ingress/egress gateways, and service meshes (Istio, etc.).
- Hands-on experience in managing Kubernetes networking (Services, Endpoints, DNS, Load Balancers) and storage (PV, PVC, Storage Classes, Provisioners).
- Design, enhance, and implement additional services for centralized Observability Platforms, ensuring efficient log management based on the Elastic Stack, and effective monitoring and alerting powered by Prometheus.
- Design and implement CI/CD pipelines; hands-on experience in IaC, Git, and monitoring tools like Prometheus, Grafana, Kibana, etc.

**Good to Have Skills:**
- Relevant certifications (e.g., Certified Kubernetes Administrator CKA / CKAD) are a plus.
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their managed Kubernetes services.
- Perform capacity planning for Kubernetes clusters and optimize costs in On-Prem and cloud environments.

**Preferred Experience:**
- 4-5 years of experience in Kubernetes, Docker/Containerization.

Posted 1 week ago


8.0 - 12.0 years

35 - 75 Lacs

Bengaluru

Work from Office

Job Summary: As a Software Engineering Manager, you will manage a team responsible for driving product development and strategy. This role is a key contributor throughout the entire product lifecycle, from conception to deployment, and involves working on advanced distributed microservices systems that handle petabytes of data, providing essential insights for our enterprise offerings. You will also contribute to designing, implementing, and maintaining robust, scalable, and secure cloud DevOps infrastructure to support our software development lifecycle of resilient, enterprise-level systems that operate effectively within hybrid and multi-cloud environments, ensuring scalability and reliability to meet our customers' complex needs.

Job Requirements:
- Experience working in cloud environments (AWS/Azure/GCP).
- Proficiency in infrastructure-as-code tools (e.g., Terraform, Ansible, CloudFormation).
- Expertise in microservice architecture and programming languages (e.g., K8, Python, Golang).
- Familiarity with monitoring tools (e.g., Prometheus, Grafana, Jarvis).
- Knowledge of version control systems (e.g., Git).
- Understanding of networking, security, and database management.

Responsibilities:
- Team Leadership: Manage, mentor, and grow a team of software engineers, fostering a collaborative and high-performing team culture.
- Infrastructure Management: Oversee the design, implementation, and maintenance of cloud-based infrastructure as code (IaC) to ensure scalability, reliability, and security.
- Automation: Drive automation of infrastructure provisioning, configuration management, and deployment processes to improve efficiency and reduce manual errors.
- Monitoring: Implement and manage monitoring, logging, and alerting systems to ensure high availability of applications and infrastructure.
- Collaboration: Work closely with other software development and product teams to align DevOps processes with development goals and ensure seamless integration.
- Security and Compliance: Ensure infrastructure and processes comply with security best practices and industry standards.
- Cost Optimization: Monitor and optimize cloud resource usage to balance performance and cost efficiency.
- Incident Management: Lead incident response efforts, including root cause analysis and post-mortem reviews, to minimize downtime and prevent recurrence.
- Strategic Planning: Develop and execute a strategy that supports organizational goals, including technology evaluations and tool selection.

Education: A minimum of 10-15 years of related experience, of which at least 5 years as a people manager, is required. A Bachelor of Science degree in Electrical Engineering or Computer Science, a Master's degree, a PhD, or equivalent experience is required.

Posted 1 week ago


7.0 - 12.0 years

0 Lacs

goa

On-site

As a Senior Backend Developer at Siemens in Goa, India, you will be part of a passionate group of solution innovators, UX devotees, techies, data scientists/AI experts, software lovers, AR/VR experts, visual artists, and architects working in a lean startup setting. Your role will involve solving complex problems in domains ranging from industry, energy, mobility, and buildings to smart cities by leveraging data analytics, artificial intelligence, simulations, and interactive visualization.

Your responsibilities will include designing, developing, and maintaining robust backend services and APIs. You will collaborate with architects and product owners to translate requirements into technical solutions. It will be essential for you to implement clean, maintainable, and testable code following best practices and coding standards. You will also participate in system design discussions, code reviews, and performance tuning to ensure integration with frontend components and external systems.

To qualify for this role, you should possess a Master's/Bachelor's degree in Computer Science or a related discipline from a reputed institute, along with 7-12 years of experience in backend development for enterprise-grade applications. Proficiency in backend technologies such as Java Spring Boot, Python, and Node.js is required, as well as a deep understanding of SOLID principles, design patterns, and system design. Experience with SQL and NoSQL databases, including handling large-scale and time-series data, is essential. Moreover, you should have a strong grasp of backend methodologies such as RESTful API design and familiarity with event-driven systems using MQTT, WebSocket, or Pub/Sub. Exposure to cloud-native development, CI/CD pipelines, and cloud platforms like AWS is beneficial. Knowledge of unit testing, mocking, test automation frameworks, version control systems like Git, and maintaining code quality through tools like SonarQube is also necessary. Understanding security architecture, data privacy compliance, and DevOps culture will be advantageous for this role. Your ability to work effectively in agile, globally distributed teams, along with strong debugging, problem-solving, and communication skills, will be crucial.

Siemens values diversity and equality, welcoming applications that reflect the diversity of the communities it works in across Gender, LGBTQ+, Abilities & Ethnicity. If you are passionate about shaping the future with your technical expertise, join Siemens in making a real impact in the world.

Posted 1 week ago


4.0 - 6.0 years

6 - 15 Lacs

Chennai

Work from Office

Job Title: Spring Web Services and Microservices Developer
Location: Chennai
Experience Level: 5-6 years
Employment Type: Full-Time (Work from Office)

Job Description: We are seeking a skilled and experienced Spring Web Services and Microservices Developer to join our dynamic development team. The ideal candidate will have 5-6 years of hands-on experience in building scalable, high-performance, and resilient backend systems using Spring technologies, microservices architecture, and RESTful services. You will collaborate closely with cross-functional teams to design, develop, and maintain solutions that meet the needs of the business while ensuring performance, reliability, and security.

Skill Requirements: Spring Boot, Spring MVC, Spring JPA, Microservices, Spring Cloud, Azure, AWS, Google Cloud Platform, Spring Data REST, Spring REST Docs, Spring Reactive Programming (Spring WebFlux), RabbitMQ, Spring AMQP, Kafka, Spring Microservices Architecture, Log4J2, Splunk, Grafana, Prometheus, Kubernetes, Docker, API Security (OAuth2, JWT), MongoDB, MySQL, Azure SQL

Key Responsibilities:
- Design and Development: Design, develop, and maintain RESTful APIs and microservices using Spring Boot and Spring Cloud. Build and maintain web services and microservices to support business requirements. Implement scalable solutions, ensuring that microservices are loosely coupled and highly available.
- Integration and Communication: Integrate microservices with various backend systems and external APIs. Collaborate with front-end developers and business stakeholders to ensure seamless integration and a cohesive user experience.
- Optimization and Performance: Optimize the performance of services and applications by implementing best practices in caching, monitoring, and database optimization. Ensure the responsiveness and performance of all web service applications.
- Testing and Deployment: Write unit tests and integration tests to ensure the quality and robustness of code. Deploy applications to cloud environments (e.g., AWS, Azure) or on-premises solutions using CI/CD pipelines.
- Collaboration and Leadership: Work closely with architects and other developers to design and implement microservices-based architecture. Mentor junior developers and contribute to the continuous improvement of team practices.
- Documentation and Maintenance: Document service architecture, code structure, and application workflows. Provide ongoing support and maintenance for deployed applications and services.

Required Skills and Experience:
- Programming Languages: Strong proficiency in Java and object-oriented programming.
- Spring Framework: 5+ years of experience with Spring Boot, Spring MVC, Spring Security, and Spring Cloud.
- Microservices Architecture: Extensive experience in developing, deploying, and managing microservices architecture.
- Web Services: Expertise in building RESTful APIs and integrating with SOAP web services.
- Databases: Experience with relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).
- Cloud and DevOps: Familiarity with cloud platforms such as AWS, Azure, or GCP; knowledge of Docker, Kubernetes, and CI/CD pipelines.
- API Security: Knowledge of OAuth2, JWT, and API gateway configurations.
- Testing: Experience with JUnit, Mockito, and integration testing frameworks.
- Tools: Proficiency in using version control systems like Git, build tools like Maven/Gradle, and IDEs like IntelliJ or Eclipse.

Preferred Qualifications:
- Familiarity with event-driven architectures (Kafka, RabbitMQ).
- Experience in containerization using Docker and orchestration using Kubernetes.
- Understanding of serverless technologies and deployment strategies.
- Knowledge of performance tuning and monitoring tools such as Prometheus, Grafana, and ELK stack.

Soft Skills:
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration abilities.
- Ability to work in an Agile environment and adapt to changing requirements.

Posted 1 week ago


5.0 - 10.0 years

15 - 30 Lacs

Noida

Hybrid

Lead Site Reliability Engineer

Lead Site Reliability Engineers at UKG are critical team members that have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering and auto remediation. Lead Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an “automate everything” mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

Job Responsibilities:
- Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning
- Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks
- Support services, product, and engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response
- Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
- Collaborate closely with engineering professionals within the organization to deliver reliable services
- Increase operational efficiency, effectiveness, and quality of services by treating operational challenges as a software engineering problem (reduce toil)
- Guide junior team members and serve as a champion for Site Reliability Engineering
- Actively participate in incident response, including on-call responsibilities
- Partner with stakeholders to influence and help drive the best possible technical and business outcomes

Required Qualifications:
- Engineering degree, or a related technical discipline, or equivalent work experience
- Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
- Knowledge of cloud-based applications and containerization technologies
- Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing
- Working experience with industry standards like Terraform and Ansible
- Demonstrable fundamentals in 2 of the following: Computer Science, Cloud architecture, Security, or Network Design fundamentals

(Experience, Education, Certification, License and Training)
- Must have at least 5 years of hands-on experience working in Engineering or Cloud
- Minimum 5 years' experience with public cloud platforms (e.g. GCP, AWS, Azure)
- Minimum 3 years' experience in configuration and maintenance of applications and/or systems infrastructure for a large-scale, customer-facing company
- Experience with distributed system design and architecture

Posted 1 week ago


0.0 - 2.0 years

6 - 8 Lacs

Pune

Work from Office

Job Title: Systems Engineer
Location: Pune, India

About This Role: The Command Center team is a group of our engineers who keep Comscore’s products and infrastructure running by tracking and resolving issues, supporting systems, and responding to client queries. We’re responsible for supporting the servers, cloud services, network, and storage devices across all of Comscore’s data centers and cloud environments and responding to both internal and client queries 24/7. The Command Center team is hiring a System Engineer – Command Center to provide 24x7 support. This role requires someone with strong technical and communication skills in order to be effective. This team will monitor and troubleshoot issues within the IT Infrastructure environment. These environments exist on-premises as well as in AWS and are a mixture of physical and virtual. This position will also be responsible for working with multiple teams globally to create support and escalation procedures based on the impact on the business.

What You’ll Do:
- Remote management and monitoring of 24x7 Command Center operations.
- Incident Management for all company-wide Major Incidents.
- Monitor the data ingestion, data processing, and delivery processes for all Comscore’s products.
- Provide L1 support for servers hosted on AWS and on-premises.
- Manage Grafana and Prometheus across clouds.
- Patch management for Windows and Linux servers, including SQL servers.
- Ability to work with SOPs.
- Troubleshoot, prioritize, and escalate issues to concerned technical teams.
- Ability to communicate severe issues.
- Provide on-call support with excellent English language communication skills, both verbally and in writing.
- User Management in LDAP.
- Experience and/or comfort with communicating with employees and clients around the world, primarily in the US, at all levels of the organization.
- Help in process improvement and documentation.
What You’ll Need:
- Bachelor’s Degree in Computer Science or a related field.
- 2 to 4 years of infrastructure or product operations experience.
- Excellent written and verbal communication skills.
- Experience in monitoring and service management tools like Nagios, JIRA, and PagerDuty.
- Basic experience working on AWS-related services like CloudWatch, Athena, S3, EMR, etc.
- Basic experience in managing/administering observability tools like Grafana and Prometheus.
- Basic experience in working with any database and query language.
- Ability to automate different server admin tasks using PowerShell, Bash, etc.
- Experience in Microsoft-based server operating systems – 2012, 2016, 2019.
- Understanding of ITIL – Incident Management, Problem Management.
- Attention to detail.
- Ability to follow complex and detailed instructions.
- Proactive problem-solving skills.
- Certifications in the related field are a plus.

Benefits:
- Medical: Comscore offers a collective Private Medical Insurance scheme which is 100% covered by Comscore. The benefit is applicable to employees, an employee’s spouse, up to two children and parents.
- Pension (Provident Fund): Comscore bears both the employee and employer contribution.
- Annual Leave: Comscore offers market competitive annual leave of 26 Annual Leave Days (8 Casual and 18 Privilege), following local guidelines and practices.
- National Holidays and Festival Holidays: 10 Days.
- Sick Leave: 10 Days.
- Additional Leave: Paternity, Bereavement, Marriage, Maternity, Additional Pregnancy / Birth Related Leave.
- Christmas / New Year Paid Leave: Comscore offers a week of Company paid leave over the Christmas / New Year period.
- Summer Hours: Comscore has a culture that rewards employees for their hard work. When you work hard, you need time to recharge and refresh. Early releases on Fridays are subject to manager approval.
- Internal Career Development Opportunities (minimum of 6 months tenure in the current position and in discussion with supervisors).
- Access to hundreds of professional e-learning courses, specifically created for Comscore.
- Be creative: You don’t have to follow the norm to be successful – we encourage you to think outside the box. Our culture is built on encouraging innovative ideas, communication and joint success.
- Informal Work Atmosphere: We believe in getting the job done in a comfortable, casual environment!
- The ability to become a truly global engineer, with exposure to markets across the world. With more than 30 offices around the world, many Comscore teams work together across locations.

About Comscore: At Comscore, we’re pioneering the future of cross-platform media measurement, arming organizations with the insights they need to make decisions with confidence. Central to this aim are our people who work together to simplify the complex on behalf of our clients & partners. Though our roles and skills are varied, we’re united by our commitment to five underlying values: Integrity, Velocity, Accountability, Teamwork, and Servant Leadership. If you’re motivated by big challenges and interested in helping some of the largest and most important media properties and brands navigate the future of media, we’d love to hear from you.

This will be a foundational role on our Pune-based Engineering team during a time of exponential growth for Comscore in Pune. The candidate will work with Comscore teams around the world on work vital to the future of Comscore and our clients.

Comscore (NASDAQ: SCOR) is a trusted partner for planning, transacting, and evaluating media across platforms. With a data footprint that combines digital, linear TV, over-the-top, and theatrical viewership intelligence with advanced audience insights, Comscore allows media buyers and sellers to quantify their multiscreen behavior and make business decisions with confidence.
A proven leader in measuring digital and set-top box audiences and advertising at scale, Comscore is the industry’s emerging, third-party source for reliable and comprehensive cross-platform measurement. To learn more about Comscore, please visit Comscore.com.

Comscore is committed to creating an inclusive culture, encouraging diversity.

Posted 1 week ago


3.0 - 7.0 years

17 - 20 Lacs

Pune

Work from Office

Project description: ACQA is built on Microsoft Azure cloud computing technology. It aims to deliver:
- Scalable cost-efficient infrastructure, using cloud PaaS components.
- Single core platform, open architecture, designed for change, itemised $cost metrics, automated data lineage.
- Shared across Front Office, Finance and Risk, improving regulatory compliance.
- One-Platform / One-Experience: fast to train, easy to operate, retaining talent.

The ACQA platform is made up of a series of components providing the next generation valuation and risk management services.

Responsibilities:
- Perform functional/configuration changes to improve automation and reduce maintenance effort
- Build and maintain CI/CD pipeline automation
- Management of monitoring systems (Nagios, Prometheus, Grafana)
- Migration of applications between two banking organizations
- Certificate renewals
- Cleanups, removal of redundant applications, functions, and data

Skills:
Must have: Azure Cloud, FinOps / Cloud cost efficiencies, Azure CosmosDB / SQL, Terraform / IaC, PowerShell / Bash, Linux, DevOps skills, CI/CD Automation, UBS processes / tooling, Grid DataSynapse
Nice to have: SDLC

Posted 1 week ago


3.0 - 7.0 years

17 - 20 Lacs

Hyderabad

Work from Office

Project description: ACQA is built on Microsoft Azure cloud computing technology. It aims to deliver:
- Scalable cost-efficient infrastructure, using cloud PaaS components.
- Single core platform, open architecture, designed for change, itemised $cost metrics, automated data lineage.
- Shared across Front Office, Finance and Risk, improving regulatory compliance.
- One-Platform / One-Experience: fast to train, easy to operate, retaining talent.

The ACQA platform is made up of a series of components providing the next generation valuation and risk management services.

Responsibilities:
- Perform functional/configuration changes to improve automation and reduce maintenance effort
- Build and maintain CI/CD pipeline automation
- Management of monitoring systems (Nagios, Prometheus, Grafana)
- Migration of applications between two banking organizations
- Certificate renewals
- Cleanups, removal of redundant applications, functions, and data

Skills:
Must have: Azure Cloud, FinOps / Cloud cost efficiencies, Azure CosmosDB / SQL, Terraform / IaC, PowerShell / Bash, Linux, DevOps skills, CI/CD Automation, UBS processes / tooling, Grid DataSynapse
Nice to have: SDLC

Posted 1 week ago


3.0 - 6.0 years

4 - 8 Lacs

Mumbai, Hyderabad

Work from Office

Objectives of this Role:
- Design and maintain CI/CD pipelines that support rapid and reliable deployments.
- Automate infrastructure and operations tasks using Shell, Bash, and Python scripts.
- Manage and optimize cloud deployments on AWS and Azure.
- Ensure seamless integration between backend services, APIs, and frontend apps (e.g., React).
- Drive adoption of DevOps best practices, including infrastructure as code, monitoring, and version control.
- Collaborate cross-functionally to support the development, QA, and deployment teams.

Primary Skills (DevOps & Automation):
- CI/CD: Hands-on experience with Jenkins and other automation tools.
- Scripting: Strong in Shell, Bash, and Python scripting for automation and task orchestration.
- Cloud Platforms: Proficient in AWS and/or Azure services (EC2, Lambda, S3, IAM, etc.).
- Containerization: Docker and basic understanding of Kubernetes.
- Version Control: Proficiency in Git and Git-based workflows.
- Infrastructure as Code: Experience with Terraform / Ansible is a plus.
- Monitoring & Logging: Familiarity with tools like Prometheus, Grafana, CloudWatch, etc.

Bonus Skills:
- Exposure to serverless architectures
- Basic understanding of React app deployments
- Experience with PowerShell scripting
- Familiarity with microservices architecture and DevSecOps principles
- Working knowledge of performance monitoring and alerting setups

Other Skills:
- Problem-solving mindset with a focus on scalability and automation
- Strong collaboration and communication skills
- Willingness to learn and work across different tech stacks
- Basic understanding of security and compliance in cloud-based deployments

Posted 1 week ago


3.0 - 8.0 years

5 - 9 Lacs

Bengaluru

Work from Office

Project Role: Application Developer
Project Role Description: Design, build and configure applications to meet business process and application requirements.
Must have skills: Kubernetes, AWS Administration, Prometheus Event Monitoring System, OpenShift Virtualization
Good to have skills: NA
Minimum 3 year(s) of experience is required
Educational Qualification: 15 years full time education

Summary: As an Application Developer, you will design, build, and configure applications to meet business process and application requirements. A typical day involves collaborating with team members to understand project needs, developing application features, and ensuring that the applications function seamlessly within the existing infrastructure. You will also engage in troubleshooting and optimizing applications to enhance performance and user experience, while adhering to best practices in software development.

Roles & Responsibilities:
- Strong communication and documentation skills.
- Ability to think out-of-the-box during outages or performance issues.
- Works well under pressure and collaborates with cross-functional teams (infra, app, network, security).

Professional & Technical Skills:
- Must-have skills: Proficiency in Kubernetes, OpenShift Virtualization, AWS Administration, Prometheus Event Monitoring System.
- Strong understanding of container orchestration and management.
- Strong experience with OpenShift: managing pods, deployments, services, ConfigMaps, secrets, and persistent volumes.
- Proficient in handling init containers, probes, scaling, and troubleshooting pod-level issues.
- Familiarity with Namespaces, ResourceQuotas, and limits.
- Solid understanding of OpenShift networking, including Ingress, routes, HAProxy configurations, and DNS resolution.
- Proficiency in Tekton Pipelines for CI/CD: managing pipeline runs, triggers, tasks, and workspaces within OpenShift.
- Strong knowledge of Ansible used for configuration management, provisioning, and environment setup automation. Ability to automate repetitive tasks and deployments across dev, test, and production environments.
- Experience with:
  - Prometheus & Metricbeat for metrics collection and service monitoring.
  - Fluentd for log aggregation and forwarding.
  - Grafana or similar tools for dashboarding.
- Capable of building alerting rules and actionable observability pipelines in Elasticsearch.
- Strong skills in Shell scripting and Ansible, working with YAML files.
- Familiarity working with CLI tools like oc, kubectl, ansible, and container tooling.
- Ability to troubleshoot across the stack: container runtime, middleware, integration protocols.
- Experienced in handling critical production incidents with minimal downtime.
- Skilled in interpreting logs, metrics, and traces to perform root cause analysis quickly.
- Experience with cloud infrastructure and services.
- Familiarity with application deployment and monitoring tools.
- Knowledge of scripting languages for automation.

Additional Information:
- The candidate should have a minimum of 3 years of experience in Kubernetes.
- This position is based at our Bengaluru office.
- 15 years of full time education is required.

Posted 1 week ago


5.0 - 7.0 years

35 - 40 Lacs

Ahmedabad

Remote

Location: Remote
Type: Full-Time
Experience Level: Senior (minimum 5+ years of relevant experience)
Industry: Cloud Infrastructure, AI/ML Ops, Kubernetes

About the Role: We're looking for a Senior OpenShift Platform Engineer with 5+ years of hands-on experience in OpenShift and Kubernetes. In this strategic role, you'll lead the design and implementation of MLOps / LLMOps systems on OpenShift, mentor engineers, and help scale secure, high-performance AI infrastructure in collaboration with cross-functional teams.

Key Responsibilities:

Platform Leadership
- Architect, install, upgrade, and manage OpenShift clusters, both on bare metal and VMware.
- Lead the deployment of MLOps / LLMOps workflows in OpenShift AI environments.
- Implement production-grade solutions for model deployment, monitoring, and validation pipelines.

Infrastructure Excellence
- Set up robust monitoring (Prometheus, Thanos, Grafana), logging, and backup strategies.
- Drive improvements in scalability, performance, and reliability of containerized platforms.
- Ensure secure and efficient configurations for RBAC, networking, and persistent storage (NetApp preferred).

Collaboration & Communication
- Translate customer and product requirements into technical solutions.
- Lead architectural and code reviews across distributed engineering teams.
- Mentor team members and promote a high standard of engineering practices.

Incident Management & Governance
- Own and lead root cause analysis (RCA) and post-mortem follow-ups.
- Define and enforce platform standards, technical governance, and compliance practices.

Must-Have Qualifications:
- Minimum 5 years of experience in OpenShift cluster installation, management, and lifecycle operations.
- Proven experience designing highly available and scalable systems in enterprise environments.
- Hands-on experience with bare metal or VMware-based OpenShift deployments.
- Deep understanding of Kubernetes/OpenShift security, RBAC, networking, and persistent storage (NetApp preferred).
- Expertise in setting up monitoring, logging, and backup solutions.
- Proficiency with CI/CD DevOps tools such as GitLab and ArgoCD.
- Solid experience with observability tools like Prometheus, Thanos, and Grafana.
- Strong communication and presentation skills to engage both technical and non-technical stakeholders.
- Ability to juggle multiple projects and deliver with minimal oversight.

Bonus Skills (Highly Desirable):
- Experience with OpenShift AI and deployment of MLOps/LLMOps workflows.
- Familiarity with OpenShift Virtualization or KubeVirt.
- Open-source contributions in Kubernetes/MLOps communities.
- Experience leading customer-facing technical discussions and workshops.

Posted 1 week ago

Apply

3.0 - 6.0 years

4 - 7 Lacs

Ahmedabad, Vadodara

Work from Office

AI/ML Engineer (2-3 positions) Job Summary: We are seeking a highly skilled and motivated AI/ML Engineer with a specialization in Computer Vision and Unsupervised Learning to join our growing team. You will be responsible for building, optimizing, and deploying advanced video analytics solutions for smart surveillance applications, including real-time detection, facial recognition, and activity analysis. This role combines the core competencies of AI/ML modelling with the practical skills required to deploy and scale models in real-world production environments, both in the cloud and on edge devices. Key Responsibilities: AI/ML Development & Computer Vision: Design, train, and evaluate models for face detection and recognition; object/person detection and tracking; intrusion and anomaly detection; and human activity or pose recognition/estimation. Work with models such as YOLOv8, DeepSORT, RetinaNet, Faster-RCNN, and InsightFace. Perform data preprocessing, augmentation, and annotation using tools like LabelImg, CVAT, or custom pipelines. Surveillance System Integration: Integrate computer vision models with live CCTV/RTSP streams for real-time analytics. Develop components for motion detection, zone-based event alerts, person re-identification, and multi-camera coordination. Optimize solutions for low-latency inference on edge devices (Jetson Nano, Xavier, Intel Movidius, Coral TPU). Model Optimization & Deployment: Convert and optimize trained models using ONNX, TensorRT, or OpenVINO for real-time inference. Build and deploy APIs using FastAPI, Flask, or TorchServe. Package applications using Docker and orchestrate deployments with Kubernetes. Automate model deployment workflows using CI/CD pipelines (GitHub Actions, Jenkins). Monitor model performance in production using Prometheus, Grafana, and log management tools. Manage model versioning, rollback strategies, and experiment tracking using MLflow or DVC.
As an AI/ML Engineer, you should be well-versed in AI agent development and have fine-tuning experience. Collaboration & Documentation: Work closely with backend developers, hardware engineers, and DevOps teams. Maintain clear documentation of ML pipelines, training results, and deployment practices. Stay current with emerging research and innovations in AI vision and MLOps. Required Qualifications: Bachelor's or master's degree in Computer Science, Artificial Intelligence, Data Science, or a related field. 3-6 years of experience in AI/ML, with a strong portfolio in computer vision and machine learning. Hands-on experience with deep learning frameworks (PyTorch, TensorFlow), image/video processing (OpenCV, NumPy), and detection and tracking frameworks (YOLOv8, DeepSORT, RetinaNet). Solid understanding of deep learning architectures (CNNs, Transformers, Siamese Networks). Proven experience with real-time model deployment on cloud or edge environments. Strong Python programming skills and familiarity with Git, REST APIs, and DevOps tools. Preferred Qualifications: Experience with multi-camera synchronization and NVR/DVR systems. Familiarity with ONVIF protocols and camera SDKs. Experience deploying AI models on Jetson Nano/Xavier, Intel NCS2, or Coral Edge TPU. Background in face recognition systems (e.g., InsightFace, FaceNet, Dlib). Understanding of security protocols and compliance in surveillance systems. Tools & Technologies (by category): Languages & AI: Python, PyTorch, TensorFlow, OpenCV, NumPy, Scikit-learn. Model Serving: FastAPI, Flask, TorchServe, TensorFlow Serving, REST/gRPC APIs. Model Optimization: ONNX, TensorRT, OpenVINO, Pruning, Quantization. Deployment: Docker, Kubernetes, Gunicorn, MLflow, DVC. CI/CD & DevOps: GitHub Actions, Jenkins, GitLab CI. Cloud & Edge: AWS SageMaker, Azure ML, GCP AI Platform, Jetson, Movidius, Coral TPU. Monitoring: Prometheus, Grafana, ELK Stack, Sentry. Annotation Tools: LabelImg, CVAT, Supervisely.

Posted 1 week ago

Apply

10.0 - 16.0 years

30 - 45 Lacs

Bengaluru

Remote

- AWS & SaaS architecture - Monitoring tools (Datadog, New Relic, Prometheus, Grafana) - Incident management (PagerDuty, ServiceNow, Zendesk, Opsgenie) - Experience running a 24x7 Cloud Ops team - DevOps processes, CI/CD pipelines, IaC tools (Terraform, Ansible)

Posted 1 week ago

Apply

3.0 - 5.0 years

4 - 7 Lacs

Varanasi

Work from Office

Responsibilities: * Implement AI models using Python libraries like scikit-learn, TensorFlow. * Develop REST APIs with Django/Flask frameworks and ORMs. * Collaborate on code reviews and testing strategies.

Posted 1 week ago

Apply

3.0 - 7.0 years

15 - 20 Lacs

Pune

Hybrid

Hi everyone, I am looking for a Site Reliability Engineer for a leading product-based MNC in Yerwada, Pune. Kindly refer to the JD below and share your resume at pallavi.ag@peoplefy.com. Job description: Candidates with application or production support experience. .NET or Java. Expertise in MS SQL Server. ITIL process. Monitoring tools. Should be comfortable with rotational shifts. Thank you!

Posted 1 week ago

Apply

12.0 - 16.0 years

75 - 95 Lacs

Bengaluru

Hybrid

Job Objective: As VP Architect, lead the design and development of scalable, reliable, and high-performance architecture for Zwayam. Job Description: In this role you will: • Hands-on Coding & Code Review: Actively participate in coding and code reviews, ensuring adherence to best practices, coding standards, and performance optimization. • High-Level and Low-Level Design: Create comprehensive architectural documentation that guides the development team and ensures the scalability and security of the system. • Security Best Practices: Implement security strategies, including data encryption, access control, and threat detection, ensuring the platform adheres to the highest security standards. • Compliance Management: Oversee compliance with regulatory requirements such as GDPR, including data protection, retention policies, and audit readiness. • Disaster Recovery & Business Continuity: Design and implement disaster recovery strategies to ensure the reliability and continuity of the system in case of failures or outages. • Scalability & Performance Optimization: Ensure the system architecture can scale seamlessly and optimize performance as business needs grow. • Monitoring & Alerting: Set up real-time monitoring and alerting systems to ensure proactive identification and resolution of performance bottlenecks, security threats, and system failures. • Cross-Platform Deployment: Architect flexible, cloud-agnostic solutions and manage deployments on Azure and AWS platforms. • Containerization & Orchestration: Use Kubernetes and Docker Swarm for container management and orchestration to achieve a high degree of automation and reliability in deployments. • Data Management: Manage database architecture using MySQL, MongoDB, and ElasticSearch to ensure efficient storage, retrieval, and management of data. • Message Queuing Systems: Design and manage asynchronous communication using Kafka and Redis for event-driven architecture.
• Collaboration & Leadership: Work closely with cross-functional teams including developers, product managers, and other stakeholders to deliver high-quality solutions on time. • Mentoring & Team Leadership: Mentor, guide, and lead the engineering team, fostering technical growth and maintaining adherence to architectural and coding standards. Required Skills: • Experience: 12+ years of experience in software development and architecture, with at least 3 years in an architect role. Technical Expertise: • Proficient in Java and related frameworks like Spring Boot. • Experience with databases like MySQL, MongoDB, ElasticSearch, and message queuing systems like Kafka, Redis. • Proficiency with containerization (Docker, Docker Swarm) and orchestration (Kubernetes). • Solid experience with cloud platforms (Azure, AWS, GCP). • Experience with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and alerting systems for real-time issue detection and resolution. Compliance & Security: • Hands-on experience in implementing security best practices. • Familiarity with compliance frameworks such as GDPR and DPDP. • Architecture & Design: Proven experience in high-level and low-level architectural design. • Problem-Solving: Strong analytical and problem-solving skills, with the ability to handle complex and ambiguous situations. • Leadership: Proven ability to lead teams, influence stakeholders, and drive change. • Communication: Excellent verbal and written communication skills. Our Ideal Candidate: The ideal candidate should possess a deep understanding of the latest architectural patterns, cloud-native design, and security practices. They should be adept at translating business requirements into scalable and efficient technical solutions. A proactive, hands-on approach to problem-solving and a passion for innovation are essential. Strong leadership and mentoring skills are crucial to drive a high-performance team and foster technical excellence.
Why Join Us: This is a unique opportunity to work on innovative and disruptive technologies that are shaping the future of the industry. We are looking for candidates who are willing to work passionately in a fast-paced environment and are ready to enhance their skills by learning something new. Being a part of the Info Edge team, you will be engaged in innovations, product development, integration with mobile and social media, technology, research and development, quality assurance, sales and marketing.

Posted 1 week ago

Apply

7.0 - 10.0 years

17 - 18 Lacs

Thiruvananthapuram

Work from Office

DevOps Lead • Open-minded, with good communication and interpersonal skills • Fast learner and curious • At least 5 years of total relevant experience, with hands-on experience in: public cloud stack (AWS/Azure); container technology (Docker, Kubernetes); automation (Ansible, Terraform); CI/CD (GitHub Actions/Azure DevOps); observability tools (e.g., Prometheus, Grafana, Dynatrace, ELK stack) • Able to lead the DevOps practice within the team • Has supported production systems/services based on the above technologies for at least 2 years • Good experience with scripting/automation • Has a good understanding of the traditional enterprise technology stack and enterprise networking concepts • Has a basic understanding of Agile software development frameworks (e.g., Scrum) and tools (e.g., JIRA) • Has worked on security enhancements within the overall infra and operations scope • Previous experience owning and handling the release management process. Preferred/Additional Technical and Professional Expertise: • Working with change management of a production-grade application • Basic understanding of regulations like DORA • At least 1 year of programming experience in any general-purpose programming language • At least 1 year of experience in relational database operations • Experience with or sound knowledge of managing blockchain-based solutions would be a big plus. Skills/Specific Tasks/Activities performed: Lead the DevOps practice within the team (9); Work with architects and development teams to enhance DevOps toolchain (9); Own the regular release activities (9); Manage the entire lifecycle of infrastructure, platforms, and application (8); Optimize performance monitoring and tuning (8); Build and maintain tools for operational stability and efficiency (9); Maintain up-to-date documentation for processes and configurations (7); Monitor system health using observability tools.

Posted 1 week ago

Apply

3.0 - 8.0 years

8 - 18 Lacs

Kochi, Thiruvananthapuram

Hybrid

Job Description • Design and execute load, stress, soak, and spike tests for APIs, web apps, and backend services • Develop and maintain performance test scripts using tools like JMeter, k6, Gatling, or LoadRunner • Analyze test results to identify bottlenecks in application layers, database, or infrastructure. • Simulate production-like traffic and user behavior for accurate testing • Integrate performance tests into CI/CD pipelines to enable continuous performance monitoring • Use APM and monitoring tools (New Relic, Grafana, Prometheus, etc.) to gather performance metrics • Prepare detailed performance test reports with recommendations and tuning suggestions

Posted 1 week ago

Apply

7.0 - 12.0 years

10 - 20 Lacs

Bengaluru

Work from Office

8+ years of experience in database technologies: AWS Aurora-PostgreSQL, NoSQL, DynamoDB, MongoDB, Erwin data modeling. Experience in pg_stat_statements and query execution plans. Experience in Apache Kafka, AWS Kinesis, Airflow, Talend, AWS. Experience in CloudWatch, Prometheus, Grafana. Required Candidate Profile: Experience in GDPR, SOC2, Role-Based Access Control (RBAC), encryption standards. Experience in AWS Multi-AZ, read replicas, failover strategies, backup automation. Experience in Erwin, Lucidchart, Confluence, JIRA.

Posted 1 week ago

Apply

3.0 - 5.0 years

10 - 20 Lacs

Pune

Work from Office

Required Skills and Qualifications: 3+ years of backend development experience in Java (Java 8+) and Spring Boot. Strong understanding of REST APIs, JPA/Hibernate, and SQL databases (e.g., PostgreSQL, MySQL). Knowledge of software engineering principles and design patterns. Experience with testing frameworks like JUnit and Mockito. Familiarity with Docker and CI/CD tools. Good communication and team collaboration skills. Roles and Responsibilities: Develop and maintain backend systems using Java (Spring Boot). Build RESTful APIs and integrate with databases and third-party services. Write unit and integration tests to ensure code quality. Participate in code reviews and collaborate with peers and senior engineers. Follow clean code principles and best practices in microservices design. Support CI/CD deployment pipelines and container-based workflows. Continuously learn and stay updated with backend technologies. Nice to Have: Exposure to Kubernetes and cloud platforms (AWS, GCP, etc.). Familiarity with messaging systems like Kafka or RabbitMQ. Awareness of security standards and authentication protocols (OAuth2, JWT). Interest in DevOps practices and monitoring tools (Prometheus, ELK, etc.).

Posted 1 week ago

Apply

4.0 - 5.0 years

8 - 12 Lacs

Gurugram

Work from Office

Experience Level: Mid Level. Position Overview: We are looking for a Mid-Level Kubernetes Administrator to support and maintain our on-premises container orchestration infrastructure built on open-source Rancher Kubernetes. This role will focus on day-to-day cluster operations, deployment support, and working closely with DevOps, Infra, and Application teams. Roles and Responsibilities: - Manage Rancher-based Kubernetes clusters in an on-premises environment. - Deploy and monitor containerized applications using Helm and Rancher UI/CLI. - Support pod scheduling, resource allocation, and namespace management. - Handle basic troubleshooting of workloads, networking, and storage issues. - Monitor and report cluster health using Prometheus, Grafana, or similar tools. - Manage users, roles, and access using Rancher-integrated RBAC. - Participate in system patching, cluster upgrades, and capacity planning. - Document standard operating procedures, deployment guides, and issue resolutions. Must Have Skills: - 4-5 years of experience in Kubernetes administration in on-prem environments. - Hands-on experience with Rancher for managing K8s clusters. - Working knowledge of Linux system administration and networking. - Experience in Docker, Helm, and basic YAML scripting. - Exposure to CI/CD pipelines and Git-based deployment workflows. - Experience with monitoring/logging stacks (Prometheus, Grafana). Good to Have Skills: - Certified Kubernetes Administrator (CKA). - Familiarity with RKE (Rancher Kubernetes Engine). - Experience with bare metal provisioning, VM infrastructure, or storage systems. Qualification: BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field or equivalent practical experience.

Posted 1 week ago

Apply

2.0 - 4.0 years

6 - 10 Lacs

Chennai, Bengaluru

Work from Office

Location: Bangalore, India Experience: 2 to 4 Years Employment Type: Full-Time Job Description: We are looking for a skilled DevOps Engineer with hands-on experience in GitLab to join our team in Bangalore. The ideal candidate should have a strong understanding of CI/CD pipelines, infrastructure automation, and cloud technologies. If you are passionate about DevOps and want to work in a dynamic and fast-paced environment, we would love to hear from you! Key Responsibilities: Customer Engagement & Implementation: Work directly with enterprise customers to understand their DevOps landscape and GitLab implementation needs. Lead the design, installation, and configuration of GitLab Self-Managed (OnPrem) environments across cloud and on-premise infrastructure. Translate customer requirements into scalable GitLab deployment architectures. CI/CD Pipeline Enablement: Architect and set up secure and scalable GitLab CI/CD pipelines aligned with customer release workflows. Integrate GitLab with third-party tools such as Kubernetes, Docker, Terraform, Jenkins, and Prometheus. Automation & Infrastructure as Code (IaC): Leverage Ansible, Terraform, and Helm charts for environment provisioning and GitLab automation. Manage GitLab runners and their configuration across distributed infrastructures. Monitoring & Optimization: Implement observability using tools like Prometheus, Loki, Grafana, and GitLab metrics dashboards. Optimize performance, ensure high availability (HA), backup, disaster recovery (DR), and auto-scaling. Knowledge Transfer & Documentation: Deliver technical documentation, operational runbooks, and knowledge transfer sessions for client upskilling. Assist clients in building internal GitLab usage guidelines, governance models, and compliance checks. Collaboration & Support: Coordinate closely with DevOps, Development, Support, and Infrastructure teams to ensure smooth rollouts and version upgrades. 
Troubleshoot GitLab issues including user management, access controls, LDAP/SAML integration, and runner performance. Required Skills & Experience: 2 to 5 years of hands-on experience in DevOps engineering, preferably in customer-facing roles. Proven expertise in GitLab Self-Managed (OnPrem) setup, configuration, upgrade, and maintenance. Strong experience with CI/CD tools, Docker, Kubernetes, and cloud platforms (Azure, AWS, GCP). Proficiency in Infrastructure-as-Code using Terraform, Ansible, and Helm. Experience in monitoring stacks: Prometheus, Loki, Grafana, and OpenTelemetry. Working knowledge of scripting (e.g., Python, Bash) and Linux system administration. Experience implementing GitLab RBAC, GitOps principles, and GitLab security scans is a plus. Preferred Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. GitLab Certified Associate or GitLab CI/CD Specialist certification is a plus. Exposure to Agile/Scrum practices and experience leading technical deliverables. Experience in customer environments requiring high uptime and regulatory compliance. Why Join Us? • Opportunity to work on cutting-edge DevOps technologies. • Collaborative and innovative work environment. • Competitive salary and benefits. • Career growth and learning opportunities. If you are an experienced DevOps Engineer with GitLab expertise and are ready to join immediately, apply now!

Posted 1 week ago

Apply

4.0 - 5.0 years

8 - 12 Lacs

Gurugram

Work from Office

Position Overview: We are seeking an SRE to join our high-impact platform engineering team. You will maintain SLAs for real-time services deployed across hybrid clouds and Kubernetes clusters, contributing to automation, observability, and availability goals. Roles and Responsibilities: - Monitor application and infrastructure metrics; build dashboards and alerts (Prometheus, Grafana, ELK). - Automate health checks, incident remediation, and reliability guardrails. - Manage on-call rotations, conduct root cause analysis, and implement postmortem action plans. - Define and track SLOs, SLIs, and error budgets. - Use chaos engineering and resilience testing to ensure fault tolerance. Must Have Skills: - 4-5 years of experience in managing production-grade Kubernetes clusters and cloud-native platforms. - Proficiency in Linux system internals, containers, and networking. - Scripting/automation expertise in Python/Go/Shell. - Familiarity with incident management, runbooks, and observability standards. - Exposure to service discovery, DNS routing, and load balancing is a bonus. Qualification: BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field or equivalent practical experience. Location: Gurugram / Onsite. About Nomiso: Our mission is to Empower and Enhance the lives of our customers, through efficient solutions for their At Nomiso we encourage entrepreneurial spirit to learn, grow and improve. A great workplace thrives on ideas and opportunities. We're in pursuit of colleagues who share similar passions, are nimble and thrive when challenged. We offer a positive, stimulating and fun environment with opportunities to grow, a fast-paced approach to innovation, and a place where your views are valued and encouraged. We are an equal opportunity employer and are committed to diversity, equity, and inclusion.
We do not discriminate on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or any other protected characteristics.

Posted 1 week ago

Apply

5.0 - 9.0 years

24 - 42 Lacs

Gurugram

Work from Office

* Expert in monitoring solutions: OpenShift, OCP, Docker, Kubernetes, Ansible, Terraform, Helm, ELK. * Expert in CI/CD & automation * Compliance standards: RBAC, SCCs, network segmentation, CIS hardening. * CLI: oc, kubectl, Helm, Kustomize * DevOps. Health insurance, annual bonus, provident fund.

Posted 1 week ago

Apply

3.0 - 8.0 years

15 - 22 Lacs

Hyderabad

Work from Office

Urgent hiring: Genpact (on role) for MLOps Engineer. Permanent role. Location: Hyderabad. Shift Timing: 12 PM to 10 PM IST. Looking for candidates who can join immediately or are serving a notice period of 30-45 days maximum, with the right combination of ML lifecycle management, cloud infrastructure, and generative AI expertise. Breakdown: Role keywords: MLOps, LLM Ops, ML Engineer. Technical skills: Cloud platforms: AWS, Azure. Automation skills: CI/CD. Containerization: Docker, Kubernetes. Monitoring tools: Prometheus, Grafana. With a startup spirit and 115,000+ curious and courageous minds, we have the expertise to go deep with the world's biggest brands, and we have fun doing it. We dream in digital, dare in reality, and reinvent the ways companies work to make an impact far bigger than just our bottom line. We're harnessing the power of technology and humanity to create meaningful transformation that moves us forward in our pursuit of a world that works better for people. Now, we’re calling upon the thinkers and doers, those with a natural curiosity and a hunger to keep learning, keep growing. People who thrive on fearlessly experimenting, seizing opportunities, and pushing boundaries to turn our vision into reality. And as you help us create a better world, we will help you build your own intellectual firepower. Welcome to the relentless pursuit of better. Inviting applications for the role of Lead Consultant, ML Ops Engineer! We are seeking a highly skilled and experienced ML Ops / LLM Ops Engineer to join our team. You will play a crucial role in building and maintaining the infrastructure and pipelines for our cutting-edge Generative AI applications, working closely with the Generative AI Full Stack Architect. Your expertise in automating and streamlining the ML lifecycle will be instrumental in ensuring the efficiency, scalability, and reliability of our Generative AI models in production.
Responsibilities: Design, develop, and implement ML/LLM pipelines for generative AI models, encompassing data ingestion, pre-processing, training, deployment, and monitoring. Conduct research and experimentation to explore new generative AI techniques and stay up to date with industry trends. Integrate GenAI APIs and microservices into existing applications and systems. Develop solutions leveraging GenAI technologies, integrating advanced AI capabilities into cloud native architectures to enhance system functionality and scalability. Automate ML tasks across the model lifecycle, leveraging tools like GitOps, CI/CD pipelines, and containerization technologies (e.g., Docker, Kubernetes). Provide MLOps support and maintenance on ML platforms (Dataiku/SageMaker). Implement version control, CI/CD pipelines, and containerization techniques to streamline ML and LLM workflows. Design and implement monitoring and alerting systems to track model performance, data drift, and other key metrics. Conduct ground truth analysis to evaluate the accuracy and effectiveness of LLM outputs compared to known, correct data. Work closely with infrastructure and DevOps teams to provision and manage resources for ML and LLM development and deployment. Develop and maintain robust monitoring and alerting systems for generative AI models in production, ensuring proactive identification and resolution of issues. Collaborate with the Generative AI Full Stack Architect and other engineers to optimize model performance and resource utilization. Manage and maintain cloud infrastructure (e.g., AWS, Azure) for ML workloads, ensuring cost-efficiency and scalability. Stay up to date on the latest advancements in MLOps and incorporate them into our platform and processes. Communicate effectively with technical and non-technical stakeholders about the health and performance of generative AI models. Qualifications we seek in you!
Minimum Qualifications: Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field, or equivalent experience. Deep understanding of generative AI concepts, including neural networks, recurrent neural networks, and GANs. Experience in MLOps or related areas, such as DevOps, data engineering, or ML infrastructure. Proven experience in ML Ops, LLM Ops, or related roles, with hands-on experience deploying and managing machine learning and large language model pipelines. Expertise in cloud platforms (e.g., AWS, Azure) for ML workloads. Strong understanding of CI/CD principles and containerization technologies like Docker and Kubernetes. Familiarity with monitoring and alerting tools for ML systems (e.g., Prometheus, Grafana). Excellent communication, collaboration, and problem-solving skills. Ability to work independently and as part of a team. Passion for Generative AI and its potential to revolutionize various industries. Genpact is an Equal Opportunity Employer and considers applicants for all positions without regard to race, color, religion or belief, sex, age, national origin, citizenship status, marital status, military/veteran status, genetic information, sexual orientation, gender identity, physical or mental disability or any other characteristic protected by applicable laws. Genpact is committed to creating a dynamic work environment that values diversity and inclusion, respect and integrity, customer focus, and innovation. For more information, visit www.genpact.com. Follow us on Twitter, Facebook, LinkedIn, and YouTube. Furthermore, please do note that Genpact does not charge fees to process job applications and applicants are not required to pay to participate in our hiring process in any other way. Examples of such scams include purchasing a 'starter kit,' paying to apply, or purchasing equipment or training.

Posted 1 week ago

Apply
Featured Companies