
1162 Prometheus Jobs - Page 22


4.0 - 6.0 years

6 - 8 Lacs

Bengaluru

Work from Office

We are seeking a skilled DevOps Engineer with strong experience in Google Cloud Platform (GCP) to support AI/ML project infrastructure. The ideal candidate will work closely with data scientists, ML engineers, and developers to build and manage scalable, secure, and automated pipelines for AI/ML model training, testing, and deployment.

Responsibilities:
- Design and manage cloud infrastructure to support AI/ML workloads on GCP.
- Develop and maintain CI/CD pipelines for ML models and applications.
- Automate model training, validation, deployment, and monitoring using tools like Kubeflow, Vertex AI, Cloud Composer, and Airflow.
- Set up and manage infrastructure as code (IaC) using tools such as Terraform or Deployment Manager.
- Implement robust security, monitoring, logging, and alerting using Cloud Monitoring, Cloud Logging, Prometheus, Grafana, etc.
- Collaborate with ML engineers and data scientists to optimize compute environments (e.g., GPU/TPU instances, notebooks).
- Manage and maintain containerized environments using Docker and Kubernetes (GKE).
- Ensure cost-efficient cloud resource utilization and governance.

Required Skills:
- Bachelor's degree in engineering or a relevant field
- 4+ years of proven experience as a DevOps Engineer, with at least 1 year on GCP
- Strong experience with DevOps tools and methodologies in production environments
- Proficiency in scripting with Python, Bash, or shell
- Experience with Terraform, Ansible, or other IaC tools
- Deep understanding of Docker, Kubernetes, and container orchestration
- Knowledge of CI/CD pipelines, automated testing, and model deployment best practices
- Familiarity with ML lifecycle tools such as MLflow, Kubeflow Pipelines, or TensorFlow Extended (TFX)
- Experience designing conversational flows for AI agents/chatbots
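The automated train/validate/deploy flow described above usually hinges on a promotion gate: a pipeline step that deploys a candidate model only when its evaluation metrics clear agreed thresholds. A minimal sketch in plain Python; the metric name, thresholds, and function name are illustrative assumptions, not from the posting:

```python
# Illustrative promotion gate for an ML CI/CD pipeline. In a real GCP
# setup this logic would run as a pipeline step (e.g. a Kubeflow
# component or Cloud Build stage); here it is plain Python.

def should_promote(candidate: dict, baseline: dict,
                   min_accuracy: float = 0.90,
                   max_regression: float = 0.01) -> bool:
    """Promote only if the candidate clears an absolute quality floor
    and does not regress against the current production baseline."""
    if candidate["accuracy"] < min_accuracy:
        return False                     # fails the absolute quality bar
    if baseline["accuracy"] - candidate["accuracy"] > max_regression:
        return False                     # noticeably worse than production
    return True

baseline = {"accuracy": 0.93}
print(should_promote({"accuracy": 0.94}, baseline))  # clears both checks
print(should_promote({"accuracy": 0.89}, baseline))  # below the floor
```

The same gate shape works for any scalar metric (AUC, F1, latency); only the comparison direction changes.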

Posted 3 weeks ago


7.0 - 12.0 years

0 - 1 Lacs

Dhule

Work from Office

Key Responsibilities:

AI Model Deployment & Integration:
- Deploy and manage AI/ML models, including traditional machine learning and GenAI solutions (e.g., LLMs, RAG systems).
- Implement automated CI/CD pipelines for seamless deployment and scaling of AI models.
- Ensure efficient model integration into existing enterprise applications and workflows in collaboration with AI Engineers.
- Optimize AI infrastructure for performance and cost efficiency in cloud environments (AWS, Azure, GCP).

Monitoring & Performance Management:
- Develop and implement monitoring solutions to track model performance, latency, drift, and cost metrics.
- Set up alerts and automated workflows to manage performance degradation and retraining triggers.
- Ensure responsible AI by monitoring for issues such as bias, hallucinations, and security vulnerabilities in GenAI outputs.
- Collaborate with Data Scientists to establish feedback loops for continuous model improvement.

Automation & MLOps Best Practices:
- Establish scalable MLOps practices to support the continuous deployment and maintenance of AI models.
- Automate model retraining, versioning, and rollback strategies to ensure reliability and compliance.
- Utilize infrastructure as code (Terraform, CloudFormation) to manage AI pipelines.

Security & Compliance:
- Implement security measures to prevent prompt injections, data leakage, and unauthorized model access.
- Work closely with compliance teams to ensure AI solutions adhere to privacy and regulatory standards (HIPAA, GDPR).
- Regularly audit AI pipelines for ethical AI practices and data governance.

Collaboration & Process Improvement:
- Work closely with AI Engineers, Product Managers, and IT teams to align AI operational processes with business needs.
- Contribute to the development of AI Ops documentation, playbooks, and best practices.
- Continuously evaluate emerging GenAI operational tools and processes to drive innovation.
Qualifications & Skills:

Education:
- Bachelor's or Master's degree in Computer Science, Data Engineering, AI, or a related field.
- Relevant certifications in cloud platforms (AWS, Azure, GCP) or MLOps frameworks are a plus.

Experience:
- 3+ years of experience in AI/ML operations, MLOps, or DevOps for AI-driven solutions.
- Hands-on experience deploying and managing AI models, including LLMs and GenAI solutions, in production environments.
- Experience working with cloud AI platforms such as Azure AI, AWS SageMaker, or Google Vertex AI.

Technical Skills:
- Proficiency in MLOps tools and frameworks such as MLflow, Kubeflow, or Airflow.
- Hands-on experience with monitoring tools (Prometheus, Grafana, ELK Stack) for AI performance tracking.
- Experience with containerization and orchestration tools (Docker, Kubernetes) to support AI workloads.
- Familiarity with automation scripting using Python, Bash, or PowerShell.
- Understanding of GenAI-specific operational challenges such as response monitoring, token management, and prompt optimization.
- Knowledge of CI/CD pipelines (Jenkins, GitHub Actions) for AI model deployment.
- Strong understanding of AI security principles, including data privacy and governance considerations.
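The drift monitoring and retraining triggers called out above often reduce to comparing a live feature or score distribution against a training-time reference. A minimal sketch using only the Python standard library; the 3-sigma threshold and sample values are illustrative assumptions, and real systems use richer tests (PSI, KS test):

```python
# Illustrative drift check: flag a feature when its live mean moves more
# than `k` reference standard deviations away from the training-time mean.
from statistics import mean, stdev

def drifted(reference: list[float], live: list[float], k: float = 3.0) -> bool:
    ref_mean, ref_sd = mean(reference), stdev(reference)
    return abs(mean(live) - ref_mean) > k * ref_sd

reference = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1]
print(drifted(reference, [10.0, 10.3, 9.9]))   # close to training data
print(drifted(reference, [14.9, 15.2, 15.1]))  # mean has shifted far away
```

A retraining trigger is then just this predicate evaluated on a schedule, with an alert or pipeline kick-off on the True branch.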

Posted 3 weeks ago


5.0 - 8.0 years

12 - 18 Lacs

Mumbai, Hyderabad, Chennai

Work from Office

We are seeking an experienced AWS Platform Engineer/Developer to architect and manage secure, scalable AWS environments in compliance with industry regulations such as GDPR, FCA, and PRA. The role involves deploying and maintaining EKS clusters, Istio service mesh, and Kong API Gateway; implementing robust security measures using Dynatrace, Fortigate, and AWS-native security services (Security Hub, GuardDuty, WAF); and automating infrastructure provisioning with Terraform and CloudFormation. Responsibilities also include enforcing Privileged Access Management (PAM) policies, integrating observability tools (Dynatrace, Grafana, Prometheus), and collaborating with teams on container orchestration using Kubernetes and Docker. Experience with serverless technologies like AWS Lambda and API Gateway, as well as container security scanning tools such as Trivy and Aqua Security, is preferred.

Posted 3 weeks ago


5.0 - 7.0 years

7 - 9 Lacs

Bengaluru

Work from Office

A skilled DevOps Engineer to manage and optimize both on-premises and AWS cloud infrastructure. The ideal candidate will have expertise in DevOps tools, automation, system administration, and CI/CD pipeline management while ensuring security, scalability, and reliability.

Key Responsibilities:

1. AWS & On-Premises Solution Architecture:
- Design, deploy, and manage scalable, fault-tolerant infrastructure across both on-premises and AWS cloud environments.
- Work with AWS services like EC2, IAM, VPC, CloudWatch, GuardDuty, AWS Security Hub, Amazon Inspector, AWS WAF, and Amazon RDS with Multi-AZ.
- Configure Auto Scaling groups and implement load balancing with ALB and NLB.
- Optimize cost and performance leveraging Elastic Load Balancing and EFS.
- Implement logging and monitoring with CloudWatch, CloudTrail, and on-premises monitoring solutions.

2. DevOps Automation & CI/CD:
- Develop and maintain CI/CD pipelines using Jenkins and GitLab for seamless code deployment across cloud and on-premises environments.
- Automate infrastructure provisioning using Ansible and CloudFormation.
- Implement CI/CD pipeline setups using GitLab, Maven, and Gradle, and deploy on Nginx and Tomcat.
- Ensure code quality and coverage using SonarQube.
- Monitor and troubleshoot pipelines and infrastructure using Prometheus, Grafana, Nagios, and New Relic.

3. System Administration & Infrastructure Management:
- Manage and maintain Linux and Windows systems across cloud and on-premises environments, ensuring timely updates and security patches.
- Configure and maintain application servers like Apache Tomcat, web servers like Nginx, and Node.js runtimes.
- Implement robust security measures, SSL/TLS configurations, and secure communications.
- Configure DNS and SSL certificates.
- Maintain and optimize on-premises storage, networking, and compute resources.

4. Collaboration & Documentation:
- Collaborate with development, security, and operations teams to optimize deployment and infrastructure processes.
- Provide best practices and recommendations for hybrid cloud and on-premises architecture, DevOps, and security.
- Document infrastructure designs, security configurations, and disaster recovery plans for both environments.

Required Skills & Qualifications:
- Cloud & On-Premises Expertise: Extensive knowledge of AWS services (EC2, IAM, VPC, RDS, etc.) and experience managing on-premises infrastructure.
- DevOps Tools: Proficiency in SCM tools (Git, GitLab), CI/CD (Jenkins, GitLab CI/CD), and containerization.
- Code Quality & Monitoring: Experience with SonarQube, Prometheus, Grafana, Nagios, and New Relic.
- Operating Systems: Experience managing Linux/Windows servers and working with CentOS, Fedora, Debian, and Windows platforms.
- Application & Web Servers: Hands-on experience with Apache Tomcat, Nginx, and Node.js.
- Security & Networking: Expertise in DNS configuration, SSL/TLS implementation, and AWS security services.
- Soft Skills: Strong problem-solving abilities, effective communication, and proactive learning.

Preferred Qualifications:
- AWS certifications (Solutions Architect, DevOps Engineer) and a bachelor's degree in Computer Science or a related field.
- Experience with hybrid cloud environments and on-premises infrastructure automation.
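Monitoring stacks like Prometheus typically fire an alert only when a threshold is breached continuously for a configured duration, which avoids paging on transient spikes. A small stdlib-only sketch of that "threshold plus hold time" behaviour; the sample values, threshold, and window length are illustrative assumptions:

```python
# Fire only when every sample in the most recent `hold` readings breaches
# the threshold -- a simplified analogue of a Prometheus alerting rule's
# `for:` clause.

def alert_firing(samples: list[float], threshold: float, hold: int) -> bool:
    if len(samples) < hold:
        return False                      # not enough data to sustain an alert
    return all(s > threshold for s in samples[-hold:])

cpu = [40, 95, 50, 96, 97, 98]            # one spike, then a sustained breach
print(alert_firing(cpu, 90, 3))           # last three samples all above 90
print(alert_firing([40, 95, 50], 90, 3))  # a lone spike does not fire
```

Tuning `hold` trades detection latency against false pages, which is the core of the alerting-technique work these roles describe.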

Posted 3 weeks ago


3.0 - 8.0 years

5 - 10 Lacs

Bengaluru

Work from Office

• Primary Skills: Prometheus, Grafana, Datadog, Alerting Techniques, Alert Triage and Incident Management, Application Issues RCA/Debugging, SQL. • Proven L3-level experience in managing large-scale, distributed systems in production environments. Required Candidate Profile: Drive SRE transformations by building frameworks and migrating traditional IT support to modern SRE practices. Collaborate closely with development and operations teams to improve system observability.

Posted 3 weeks ago


1.0 - 3.0 years

3 - 5 Lacs

Chennai

Work from Office

Responsibilities:
- Design and develop backend components and RESTful APIs using Java (11+) and Spring Boot
- Build and maintain scalable microservices with a strong emphasis on clean architecture
- Write reliable and efficient SQL queries; work with relational and optionally NoSQL (MongoDB) databases
- Apply DSA fundamentals in solving problems, optimizing code, and building performant features
- Follow and advocate for SOLID principles, clean code, and test-driven development
- Collaborate across product, design, and QA to build meaningful, high-quality features
- Contribute to internal tools or AI-powered enhancements to accelerate workflows
- Participate in code reviews, peer discussions, and technical design sessions

What We're Looking For:
- 1-2 years of backend development experience using Java and Spring Boot
- Solid understanding and application of Data Structures and Algorithms in real-world scenarios
- Strong foundation in Object-Oriented Programming and adherence to SOLID principles
- Hands-on experience with SQL databases and understanding of performance tuning
- Familiarity with MongoDB or other NoSQL databases (good to have)
- Curiosity or exposure to AI/ML, generative APIs, or automation use cases
- Good communication skills, debugging ability, and a mindset for continuous learning

Bonus Points For:
- Familiarity with cloud environments (AWS)
- Experience with Git and CI/CD pipelines (e.g., GitHub Actions, Jenkins)
- Exposure to monitoring/logging tools like Prometheus, Grafana, or ELK
- Past experience in competitive programming, hackathons, or personal projects

Posted 3 weeks ago


2.0 - 7.0 years

4 - 9 Lacs

Pune, Coimbatore

Work from Office

Job Summary: We are seeking a skilled Erlang Developer to join our backend engineering team. The ideal candidate will have a strong background in Erlang, with working experience in Elixir and RabbitMQ. You will play a key role in designing, building, and maintaining scalable, fault-tolerant systems used in high-availability environments.

Key Responsibilities:
- Design, develop, test, and maintain scalable Erlang-based backend applications.
- Collaborate with cross-functional teams to understand requirements and deliver efficient solutions.
- Integrate messaging systems such as RabbitMQ to ensure smooth communication between services.
- Write reusable, testable, and efficient code in Erlang and Elixir.
- Monitor system performance and troubleshoot issues in production.
- Ensure high availability and responsiveness of services.
- Participate in code reviews and contribute to best practices in functional programming.

Required Skills:
- Proficiency in Erlang with hands-on development experience.
- Working knowledge of Elixir and the Phoenix framework.
- Strong experience with RabbitMQ and messaging systems.
- Good understanding of distributed systems and concurrency.
- Experience with version control systems like Git.
- Familiarity with CI/CD pipelines and containerization (Docker is a plus).

Preferred Qualifications:
- Experience working in telecom, fintech, or real-time systems.
- Knowledge of OTP (Open Telecom Platform) and BEAM VM internals.
- Familiarity with monitoring tools like Prometheus, Grafana, etc.
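RabbitMQ's role in a stack like this is to decouple services through message passing, which is also the model Erlang processes use natively. A stdlib-only Python sketch of the producer/consumer shape (queue contents and the sentinel convention are illustrative; a real deployment would publish through a RabbitMQ client such as pika):

```python
# Minimal producer/consumer over an in-process queue, standing in for a
# RabbitMQ channel. A sentinel value tells the consumer to stop.
import queue
import threading

tasks: queue.Queue = queue.Queue()
results: list[str] = []
STOP = None                             # sentinel marking end of stream

def consumer() -> None:
    while True:
        msg = tasks.get()
        if msg is STOP:
            break
        results.append(msg.upper())     # stand-in for real message handling
        tasks.task_done()

worker = threading.Thread(target=consumer)
worker.start()
for payload in ["order.created", "order.paid"]:
    tasks.put(payload)                  # "publish" to the queue
tasks.put(STOP)
worker.join()
print(results)                          # ['ORDER.CREATED', 'ORDER.PAID']
```

The producer never waits on the consumer, so either side can be scaled or restarted independently; that decoupling is what the posting's "smooth communication between services" refers to.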

Posted 3 weeks ago


3.0 - 8.0 years

30 - 35 Lacs

Bengaluru

Work from Office

The IT AI Application Platform team is seeking a Senior Site Reliability Engineer (SRE) to develop, scale, and operate our AI Application Platform based on Red Hat technologies, including OpenShift AI (RHOAI) and Red Hat Enterprise Linux AI (RHEL AI). As an SRE you will contribute to running core AI services at scale by enabling customer self-service, making our monitoring system more sustainable, and eliminating toil through automation. On the IT AI Application Platform team, you will have the opportunity to influence the complex challenges of scale which are unique to Red Hat IT managed AI platform services, while using your skills in coding, operations, and large-scale distributed system design. We develop, deploy, and maintain Red Hat's next-generation AI application deployment environment for custom applications and services across a range of hybrid cloud infrastructures. We are a global team operating on-premise and in the public cloud, using the latest technologies from Red Hat and beyond.

Red Hat relies on teamwork and openness for its success. We are a global team and strive to cultivate a transparent environment that makes room for different voices. We learn from our failures in a blameless environment to support the continuous improvement of the team. At Red Hat, your individual contributions have more visibility than at most large companies, and visibility means career opportunities and growth.

What you will do:
The day-to-day responsibilities of an SRE involve working with live systems and coding automation. As an SRE you will be expected to:
- Build and manage our large-scale infrastructure and platform services, including public cloud, private cloud, and datacenter-based
- Automate cloud infrastructure through use of technologies (e.g. auto scaling, load balancing), scripting (Bash, Python, and Golang), and monitoring and alerting solutions (e.g. Splunk, Splunk IM, Prometheus, Grafana, Catchpoint)
- Design, develop, and become expert in AI capabilities leveraging emerging industry standards
- Participate in the design and development of software like Kubernetes operators, webhooks, and CLI tools
- Implement and maintain intelligent infrastructure and application monitoring designed to enable application engineering teams
- Ensure the production environment is operating in accordance with established procedures and best practices
- Provide escalation support for high-severity and critical platform-impacting events
- Provide feedback around bugs and feature improvements to the various Red Hat Product Engineering teams
- Contribute software tests and participate in peer review to increase the quality of our codebase
- Help develop peers' capabilities through knowledge sharing, mentoring, and collaboration
- Participate in a regular on-call schedule, supporting the operational needs of our tenants
- Practice sustainable incident response and blameless postmortems
- Work within a small agile team to develop and improve SRE methodologies, support your peers, plan, and self-improve

What you will bring:
A bachelor's degree in Computer Science or a related technical field involving software or systems engineering is required. However, hands-on experience that demonstrates your ability and interest in Site Reliability Engineering is valuable to us and may be considered in lieu of degree requirements. You must have some experience programming in at least one of these languages: Python, Golang, Java, C, C++, or another object-oriented language. You must have experience working with public clouds such as AWS, GCP, or Azure. You must also have the ability to collaboratively troubleshoot and solve problems in a team setting. As an SRE you will be most successful if you have some experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.) and some experience working with complex distributed systems.
We like to see a demonstrated ability to debug, optimize code, and automate routine tasks. We are Red Hat, so you need a basic understanding of Unix/Linux operating systems.

Desired skills:
- 3+ years of experience using cloud providers and technologies (Google, Azure, Amazon, OpenStack, etc.)
- 1+ years of experience administering a Kubernetes-based production environment
- 2+ years of experience with enterprise systems monitoring
- 2+ years of experience with enterprise configuration management software like Ansible by Red Hat, Puppet, or Chef
- 2+ years of experience programming with at least one object-oriented language; Golang, Java, or Python are preferred
- 2+ years of experience delivering a hosted service
- Demonstrated ability to quickly and accurately troubleshoot system issues
- Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
- Demonstrated comfort with collaboration, open communication, and reaching across functional boundaries
- Passion for understanding users' needs and delivering outstanding user experiences
- Independent problem-solving and self-direction
- Works well alone and as part of a global team
- Experience working with Agile development methodologies

About Red Hat:
Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.
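Delivering a hosted service against an availability target, as this role describes, usually means tracking an error budget: the fraction of requests an SLO allows to fail over a window. A small stdlib-only illustration; the 99.9% target and request counts are assumptions for the example:

```python
# Error-budget arithmetic for a request-based availability SLO. With a
# 99.9% target, 0.1% of requests in the window may fail before the
# budget is exhausted.

def error_budget_remaining(total: int, failed: int, slo: float = 0.999) -> float:
    """Fraction of the error budget still unspent (negative = overspent)."""
    allowed = total * (1 - slo)          # failures the SLO tolerates
    return 1 - failed / allowed

# 1,000,000 requests this window; the SLO permits ~1,000 failures.
print(error_budget_remaining(1_000_000, 250))    # roughly 0.75: budget left
print(error_budget_remaining(1_000_000, 1_200))  # negative: budget blown
```

Teams commonly gate risky changes on this number: plenty of budget left means ship, a blown budget means freeze releases and pay down reliability work.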
Inclusion at Red Hat: Red Hat's culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village. Equal Opportunity Policy (EEO): Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law. Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.

Posted 3 weeks ago


4.0 - 8.0 years

15 - 19 Lacs

Bengaluru

Work from Office

We are seeking a DevOps & Odoo Tech Lead (India) to spearhead the rollout and support of the Services QT tool within Odoo, ensuring robust infrastructure, configuration, and operational excellence. You'll design and implement APIs for seamless integration with Nokia's service automation platforms and external systems; architect and manage the DevOps environment, including CI/CD pipelines, containerization, infrastructure as code, high-availability Odoo deployments, monitoring, and automation; and resolve complex performance and integration issues. As team leader, you will coach and mentor DevOps staff, manage agile release cycles, and drive best practices for operational stability, scalability, and security.

You have:
- Deep understanding of Odoo architecture (frontend, backend, database structure) and proficiency in Linux (Ubuntu, Debian, CentOS), as Odoo primarily runs on Linux-based environments
- Experience installing, configuring, and optimizing Odoo (Community and Enterprise editions), and system monitoring using tools like Prometheus, Grafana, or the ELK stack
- Knowledge of Odoo modules, customization, and development (Python, XML, JavaScript), and the ability to manage Odoo scaling (multi-instance, multi-database)
- Expertise in Odoo performance tuning (load balancing, caching, database optimization)
- Experience with Git and GitHub/GitLab CI/CD for version control and deployment automation
- Experience setting up and managing virtual machines (VMs), bare-metal servers, and containers, and automating deployments using Ansible, Terraform, or shell scripting

It would be nice if you also had:
- Expertise in PostgreSQL (Odoo's database)
- Experience with AWS, Google Cloud, Azure, or DigitalOcean for cloud-based Odoo hosting
- Expertise in network security, firewalls, and VPNs

Responsibilities:
- Define, design, and oversee the development of APIs required from Nokia products (and other new-tech vendors) to enable seamless integration with Nokia's service automation platforms.
- Act as the primary technical liaison for both internal and external service software teams, guiding effective integration with Nokia's service automation components.
- Diagnose and resolve complex performance and reliability issues within service operations automation using deep expertise in DevOps, infrastructure, and Odoo tuning.
- Use in-depth business domain knowledge to align architectural and DevOps strategies with service automation goals and customer objectives.
- Provide structured mentoring, best practices, and real-time guidance to Managed Services DevOps staff, taskforces, and workteams.
- Coordinate task allocation, monitor progress, and coach team members, contributing feedback for formal performance evaluations.
- Lead release management within Scrum/Agile cycles, including planning, execution, regression testing, and post-release reviews to meet customer requirements.
- Administer and optimize Odoo deployments on Linux or cloud platforms, handling installation, configuration, performance tuning, HA, and backups, while implementing CI/CD pipelines, containerization, infrastructure automation, monitoring, and security best practices.

Posted 3 weeks ago


4.0 - 8.0 years

20 - 25 Lacs

Mumbai

Work from Office

Required Qualification: BE/B.Tech/MCA

Skill, Knowledge & Trainings:
- Own and manage the CI/CD pipelines for automated build, test, and deployment.
- Design and implement robust deployment strategies for microservices and web applications.
- Set up and maintain monitoring, alerting, and logging frameworks (e.g., Prometheus, Grafana, ELK).
- Build automations that help optimize software delivery.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Responsible for availability, latency, performance efficiency, change management, monitoring, emergency response, and capacity planning.
- Create services for automatic provisioning of test environments, automation of the release management process, pre-emptive monitoring of logs, and dashboards for metrics visualisation.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Run our infrastructure with GitLab CI/CD, Kubernetes, Kafka, NGINX, and the ELK stack.
- Coordinate with infra teams and developers to improve the incident management process.
- Responsible for L1 support as well.
- Good communication and presentation skills.

Core Competencies (Must Have):
- Elastic, Logstash, Kibana or AppDynamics
- CI/CD: GitLab/Jenkins

Other Key Skills:
- SSO technologies
- Ansible
- Python
- Linux administration

Additional Competencies (Nice to Have):
- Kubernetes
- Kafka, MQ
- NGINX or APIGEE
- Redis
- Experience working with outsourced vendor teams for application development
- Appreciation of enterprise functional architecture in capital markets

Job Purpose: We are looking for a skilled and proactive Site Reliability Engineer (SRE) with strong expertise in deployment automation, monitoring, and infrastructure reliability. The ideal candidate will be responsible for managing the end-to-end deployment lifecycle, ensuring the availability, scalability, and performance of our production and non-production environments.

Key Responsibilities by Area:
- Deployment & Release Management: Own and manage the CI/CD pipelines for automated build, test, and deployment. Design and implement robust deployment strategies for microservices and web applications. Monitor and troubleshoot deployment issues and rollbacks, ensuring zero-downtime deployment where possible.
- System Reliability & Performance: Set up and maintain monitoring, alerting, and logging frameworks (e.g., Prometheus, Grafana, ELK).

Any Other Requirements: Should be a good team player. Will be required to work with multiple projects/teams concurrently.
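Zero-downtime deployment strategies like those above commonly gate a rollout on a canary comparison: ship the new version to a small slice of traffic, compare its error rate to the stable baseline, and roll back if the canary is noticeably worse. A stdlib-only sketch of that decision; the tolerance and request counts are illustrative assumptions:

```python
# Illustrative canary gate: roll back when the canary's error rate
# exceeds the baseline's by more than an absolute tolerance.

def should_rollback(baseline_errors: int, baseline_total: int,
                    canary_errors: int, canary_total: int,
                    tolerance: float = 0.005) -> bool:
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    return canary_rate - baseline_rate > tolerance

# Baseline: 20 errors in 10,000 requests (0.2%).
print(should_rollback(20, 10_000, 3, 1_000))   # canary at 0.3%: within tolerance
print(should_rollback(20, 10_000, 15, 1_000))  # canary at 1.5%: roll back
```

Production systems add statistical significance checks and latency comparisons, but the rollback trigger keeps this shape.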

Posted 3 weeks ago


8.0 - 10.0 years

6 - 10 Lacs

Bengaluru

Work from Office

Company Overview: Maximus is a leading innovator in the government space, providing transformative solutions in the management and service delivery of government health and human services programs. We pride ourselves on our commitment to excellence, innovation, and a customer-first approach, driven by our core values. This has fostered our continual support of public programs and improved access to government services for citizens. Maximus continues to grow its Digital Solutions organization to better serve the needs of our organization and our customers in the government, health, and human services space, while improving access to government services for citizens. We use an approach grounded in design thinking, lean, and agile to help solve complicated problems and turn bold ideas into delightful solutions. Job Description: We are seeking a hands-on and strategic Lead DevOps Engineer to architect, implement, and lead the automation and CI/CD practices across our cloud infrastructure. This role demands deep expertise in cloud-native technologies and modern DevOps tooling, with a strong emphasis on AWS, Kubernetes, ArgoCD, and Infrastructure as Code.
The ideal candidate is also expected to be a motivated self-starter with a proactive approach to resolving problems and issues with minimal supervision.

Key Responsibilities:
- Design and manage scalable infrastructure across AWS and Azure using Terraform (IaC)
- Define and maintain reusable Terraform modules to enforce infrastructure standards and best practices
- Implement secrets management, configuration management, and automated environment provisioning
- Architect and maintain robust CI/CD pipelines using Jenkins and ArgoCD
- Implement GitOps workflows for continuous delivery and environment promotion
- Automate testing, security scanning, and deployment processes across multiple environments
- Design and manage containerized applications with Docker
- Deploy and manage scalable, secure workloads using Kubernetes (EKS/ECS/GKE/AKS/self-managed)
- Create and maintain Helm charts, Kustomize configs, or other manifest templating tools
- Manage Git repositories, branching strategies, and code review workflows
- Promote version control best practices, including commit hygiene and semantic release tagging
- Set up and operate observability stacks: any of Prometheus, Grafana, ELK, Loki, Alertmanager
- Define SLAs, SLOs, and SLIs for critical services
- Lead incident response, perform root cause analysis, and publish post-mortem documentation
- Integrate security tools and checks directly into CI/CD workflows
- Manage access control and secrets, and ensure compliance with standards such as FedRAMP
- Mentor and guide DevOps engineers to build a high-performing team
- Collaborate closely with software engineers, QA, product managers, and security teams
- Promote a culture of automation, reliability, and continuous improvement

Qualifications:
- Bachelor's degree in Computer Science, Information Security, or a related field (or equivalent experience).
- 8+ years of experience in DevOps or a similar role, with a strong security focus.
- Preferred: AWS Certified Cloud Practitioner, AWS Certified DevOps Engineer – Professional, AWS Certified Solutions Architect, or similar.
- Knowledge of cloud platforms (AWS; Azure good to have) and containerization technologies (Docker, Kubernetes), with a key focus on AWS, EKS, and ECS.
- Experience with infrastructure as code (IaC) tools such as Terraform.
- Proficiency in CI/CD tools like AWS CodePipeline, Jenkins, and Azure DevOps Server.
- Familiarity with programming and scripting languages (e.g., Python, Bash, Go).
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Strong communication skills, with the ability to convey complex security concepts to technical and non-technical stakeholders.

Preferred Qualifications:
- Strong understanding of and working experience with enterprise applications and containerized application workloads.
- Knowledge of networking concepts.
- Knowledge of network security principles and technologies (e.g., firewalls, VPNs, IDS/IPS).

Posted 3 weeks ago


6.0 - 10.0 years

7 - 11 Lacs

Mumbai

Work from Office

We are looking for an experienced DevOps Engineer (Level 2/3) to design, automate, and optimize cloud infrastructure. You will play a key role in CI/CD automation, cloud management, observability, and security, ensuring scalable and reliable systems.

Key Responsibilities:
- Design and manage AWS environments using Terraform/Ansible.
- Build and optimize deployment pipelines (Jenkins, ArgoCD, AWS CodePipeline).
- Deploy and maintain EKS and ECS clusters.
- Implement OpenTelemetry, Prometheus, and Grafana for logs, metrics, and tracing.
- Manage and scale cloud-native microservices efficiently.

Required Skills:
- Proven experience in DevOps, system administration, or software development.
- Strong knowledge of AWS.
- Programming languages: Python, Go, and Bash are good to have.
- Experience with IaC tools like Terraform and Ansible.
- Solid understanding of CI/CD tools (Jenkins, ArgoCD, AWS CodePipeline).
- Experience with containers and orchestration tools like Kubernetes (EKS).
- Understanding of the OpenTelemetry observability stack (logs, metrics, traces).

Good to have:
- Experience with container orchestration platforms (e.g., EKS, ECS).
- Familiarity with serverless architecture and tools (e.g., AWS Lambda).
- Experience using monitoring tools like Datadog/New Relic, CloudWatch, or Prometheus/Grafana.
- Experience managing 20+ cloud-native microservices.
- Previous experience working in a startup.

Education & Experience:
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
- Relevant years of experience in DevOps or a similar role.

Posted 3 weeks ago

Apply

4.0 - 7.0 years

18 - 20 Lacs

Bengaluru

Work from Office

Design, develop, deploy, and maintain RESTful microservices using Java 8+ and Spring Boot, optimized for media delivery and high-volume traffic. Integrate video processing, DRM (e.g., Widevine), and CDN systems to enable adaptive streaming. Implement backend services for content ingestion, encoding pipelines, metadata management, and playback support. Collaborate with frontend (web/app) teams to define APIs for playback, user profiles, watchlists, and recommendations. Build and optimize services for scalability using AWS (EC2, S3, Lambda, CloudFront, Elastic Transcoder). Containerize services with Docker; deploy on ECS or Kubernetes/EKS. Set up CI/CD pipelines (Jenkins, GitHub Actions, AWS CodePipeline). Monitor video service health and performance using CloudWatch, Prometheus, and Grafana. Ensure fault-tolerance with patterns like circuit-breakers (e.g., Netflix Hystrix/Eureka), retries, and auto-scaling. Collaborate with ML teams to support recommendation engines and personalized metadata features. Enforce security standards (authentication, authorization, encryption, IAM). Mentor junior engineers; drive code reviews and architecture discussions. Hands-on with AWS services, especially for content delivery and scalability.
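The fault-tolerance patterns this listing names (circuit breakers, retries) are language-agnostic; a minimal, deliberately simplified Python sketch of the circuit-breaker idea follows. All names are illustrative, and the state machine omits half-open recovery for brevity:

```python
class CircuitBreaker:
    """Fail fast after repeated downstream failures (simplified sketch)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.state = "closed"  # "closed" = calls pass through, "open" = fail fast

    def call(self, fn, *args):
        if self.state == "open":
            # Short-circuit instead of hammering a failing dependency.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.state = "open"
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A production breaker (Hystrix, Resilience4j, and similar libraries) additionally moves from open to a half-open state after a timeout and probes the dependency before closing again.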

Posted 3 weeks ago

Apply

3.0 - 7.0 years

4 - 8 Lacs

Chennai

Work from Office

Role & responsibilities Collaborate with stakeholders to clarify problem statements, narrow down the analysis scope, and use findings to improve product outcomes. Create replicable data analyses using open-source technologies, summarize findings effectively, and communicate assumptions clearly. Build data pipelines to transition ad-hoc analyses into production-ready dashboards for fellow engineers' use. Develop, deploy, and maintain metrics, applications, and tools to empower engineers to access and utilize data insights independently. Write well-structured and thoroughly tested code, ensuring maintainability and scalability for future enhancements. Stay updated on relevant technologies and propose new ones for team adoption.

Posted 3 weeks ago

Apply

3.0 - 8.0 years

8 - 18 Lacs

Mumbai, Thane, Ahmedabad

Work from Office

Department: App Modernization Job Overview: We are looking for skilled Node.js Developers to join our team supporting TATA AIA's digital initiatives. The selected candidate(s) will be involved in designing and developing scalable microservices-based applications with a focus on Node.js, JavaScript, PostgreSQL, and cloud-native DevOps practices. The role offers opportunities to work on a modern technology stack including Prometheus, Grafana, Azure, and CI/CD pipelines, while contributing to real-time digital transformation projects. Key Responsibilities: Backend Development: Design and implement RESTful APIs and backend services using Node.js (Express/Fastify). Develop scalable microservices with performance and reliability in mind. Monitoring and Performance: Implement observability using Prometheus and Grafana for metrics collection, alerting, and monitoring. Database & Data Handling: Work with PostgreSQL or other relational databases for data modeling and efficient query handling. CI/CD & DevOps: Participate in Azure DevOps workflows, including build, deploy, and testing pipelines. Ensure zero-touch deployment and follow modern CI/CD practices. Collaboration & Version Control: Collaborate using JIRA, Bitbucket, and Git, and contribute to an Agile development environment. Qualifications: Education: Bachelor's degree in Computer Science, Information Technology, or a related field. Experience: 3 to 10 years of hands-on experience in Node.js development based on the role band. Skills: Primary: Node.js, Prometheus, Microservices, Grafana. Good Knowledge In: PostgreSQL, Azure DevOps. Additional Skills: Proficiency in project management tools like JIRA. Experience with version control systems such as Bitbucket and Git. Familiarity with CI/CD pipelines and automation tools.

Posted 3 weeks ago

Apply

5.0 - 10.0 years

8 - 18 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

SRE (Site Reliability Engineer – Node.js Focus) Location: Hyderabad Interview Date: 10th July, 5:00 PM – 6:00 PM Core Skills Required: Node.js backend expertise. Site Reliability Engineering best practices. Monitoring: Prometheus, Grafana, Azure Monitor. CI/CD automation, Infrastructure as Code. Incident management and production support. Good to Have: Experience with Kubernetes / Docker. Load testing and capacity planning. Scripting (Python, PowerShell, Bash). Additional Info: Interviews will be conducted virtually. Candidates must be B.E. / B.Tech graduates from recognized institutions. Strong communication and client-facing skills are essential. Immediate joiners or short-notice candidates will be prioritized. How to Apply: Send your CV to: careers@gigaswartechnologies.com Or apply here: tinyurl.com/2kansjhe
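Behind "Site Reliability Engineering best practices" sits simple error-budget arithmetic: an availability SLO translates into an allowed amount of downtime per period. A stdlib-only sketch of that calculation (numbers are illustrative, not from any posting):

```python
def error_budget_minutes(slo_pct, days=30):
    """Allowed downtime (in minutes) for an availability SLO over a window."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - slo_pct / 100)

# A 99.9% ("three nines") monthly SLO leaves roughly 43 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))  # 43.2
```

Teams typically alert on error-budget burn rate (how fast the budget is being consumed) rather than on the raw budget itself.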

Posted 3 weeks ago

Apply

7.0 - 12.0 years

15 - 25 Lacs

Bengaluru

Work from Office

Preferred candidate profile: DevOps & Cloud Infrastructure Engineer to lead the design, implementation, and optimization of scalable, secure, and cost-effective infrastructure in Microsoft Azure. The ideal candidate will have deep expertise in Kubernetes, Docker, Terraform, CI/CD, and monitoring tools like Datadog, Grafana, and Prometheus, along with experience in SonarQube setup, Azure AD integration, and multi-tenancy architecture. Key Responsibilities: Design and implement scalable, secure, and cost-efficient infrastructure on Azure. Set up and manage Kubernetes clusters and containerized applications using Docker. Automate infrastructure provisioning using Terraform. Build and maintain robust CI/CD pipelines for continuous integration and deployment. Design and implement multi-tenancy architecture to support multiple clients or business units securely and efficiently.

Posted 3 weeks ago

Apply

15.0 - 20.0 years

5 - 9 Lacs

Chennai

Work from Office

Project Role: Application Developer Project Role Description: Design, build, and configure applications to meet business process and application requirements. Must have skills: DevOps Good to have skills: NA Minimum 7.5 year(s) of experience is required Educational Qualification: 15 years full time education Summary: As an Application Developer, you will design, build, and configure applications to meet business process and application requirements in a fast-paced environment, ensuring seamless integration and functionality. Roles & Responsibilities: - Expected to be an SME, collaborate, and manage the team to perform. - Responsible for team decisions. - Engage with multiple teams and contribute on key decisions. - Expected to provide solutions to problems that apply across multiple teams. - Lead the development and implementation of software solutions. - Collaborate with cross-functional teams to define, design, and ship new features. - Ensure the best possible performance, quality, and responsiveness of applications. - Identify bottlenecks and bugs, and devise solutions to mitigate and address these issues. Professional & Technical Skills: - Must To Have Skills: Proficiency in DevOps. - Strong understanding of continuous integration and continuous deployment (CI/CD) pipelines. - Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation. - Knowledge of containerization technologies such as Docker and Kubernetes. - Hands-on experience with monitoring and logging tools like Prometheus and the ELK stack. Additional Information: - The candidate should have a minimum of 12 years of experience in DevOps. - This position is based at our Chennai office. - A 15 years full-time education is required. Qualification: 15 years full time education

Posted 3 weeks ago

Apply

15.0 - 20.0 years

5 - 10 Lacs

Hyderabad

Work from Office

Project Role: DevOps Engineer Project Role Description: Responsible for building and setting up new development tools and infrastructure utilizing knowledge in continuous integration, delivery, and deployment (CI/CD), cloud technologies, container orchestration, and security. Build and test end-to-end CI/CD pipelines, ensuring that systems are safe against security threats. Must have skills: DevSecOps Good to have skills: Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC) Minimum 7.5 year(s) of experience is required Educational Qualification: 15 years full time education Summary: As a DevSecOps Engineer, you will be responsible for building and setting up new development tools and infrastructure. A typical day involves utilizing your knowledge in continuous integration, delivery, and deployment, as well as cloud technologies and container orchestration. You will also focus on ensuring that systems are secure against potential threats while collaborating with various teams to enhance the development process and improve overall efficiency.
Roles & Responsibilities: - Expected to be an SME. - Collaborate and manage the team to perform. - Responsible for team decisions. - Engage with multiple teams and contribute on key decisions. - Provide solutions to problems for their immediate team and across multiple teams. - Facilitate knowledge sharing sessions to enhance team capabilities. - Monitor and optimize CI/CD pipelines for performance and security. - Oversee the development, maintenance, and testing of HashiCorp Terraform modules for infrastructure as code (IaC). - Ensure the design, implementation, and management of Sentinel policies as code to enforce security and compliance standards. - Collaborate with cross-functional teams to integrate security practices into the CI/CD pipeline. - Drive the automation of infrastructure provisioning, configuration management, and application deployment processes. - Monitor and troubleshoot infrastructure and application issues, ensuring high availability and performance. - Conduct regular security assessments and audits to identify vulnerabilities and implement remediation measures. - Stay up to date with the latest industry trends, tools, and best practices in DevSecOps, Terraform, and Sentinel. - Foster a culture of continuous improvement, innovation, and collaboration within the team. - Develop and implement strategies to enhance the team's efficiency, productivity, and overall performance. - Report on team progress, challenges, and achievements to senior management. Professional & Technical Skills: - Must To Have Skills: Proficiency in DevSecOps. - Good To Have Skills: Experience with Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC). - Strong understanding of continuous integration and continuous deployment methodologies. - Experience with container orchestration tools such as Kubernetes or Docker Swarm. - Familiarity with security best practices in software development and deployment. - Proven experience in a leadership role within a DevSecOps or similar environment. - Strong expertise in HashiCorp Terraform and infrastructure as code (IaC) principles. - Proficiency in developing and managing Sentinel policies as code. - Experience with CI/CD tools such as GitHub, GitHub Actions, Jenkins, and the JFrog Platform. - Solid understanding of cloud platforms, specifically Google Cloud Platform (GCP) and Microsoft Azure. - Knowledge of containerization technologies (Docker, Kubernetes) and orchestration. - Familiarity with security frameworks and compliance standards (e.g., NIST, ISO 27001). - Certifications in Terraform, GCP, or Azure (e.g., HashiCorp Certified: Terraform Associate, Google Cloud Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert). - Experience with scripting languages (Python, Bash, PowerShell). - Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack). Additional Information: - The candidate should have minimum 7.5 years of experience in DevSecOps. - This position is based at our Hyderabad office. - A 15 years full time education is required. Qualification: 15 years full time education

Posted 3 weeks ago

Apply

7.0 - 12.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

Dear candidate, Greetings from Wipro! We are hiring DevOps SRE with Python scripting – Bangalore/Hyderabad/Chennai. Exp: 7 to 15 years. Job location: Bangalore, Hyderabad, and Chennai. Note: please apply only if you can join within 0 to 15 days. JD: SRE – Very good in Unix, Jenkins, and Python scripting. Should be proficient in creating workflows in Jenkins and Ansible playbooks. Should have an understanding of monitoring tools like Grafana, Splunk, Epic, and Nginx. Should understand databases like MySQL/Oracle/Cassandra. Very good in DevOps processes and troubleshooting issues. Experience in production deployment and on-call support. Good to have knowledge of Spinnaker. Excellent analytical, troubleshooting, and problem-solving skills. Experience in solving problems and working with a team to resolve large-scale production environment issues. To drive the team during production maintenances, outages, and load test activities. Please share your profile to kasturi.mettin@wipro.com with the below details: Total exp: Rel: CTC: ECTC: NP: Current Location: Pref Location: Interview Time: Thanks, Kasturi Mettin kasturi.mettin@wipro.com
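The production troubleshooting and scripting duties in listings like this one often reduce to small resilience helpers. A hedged, stdlib-only sketch of retry with exponential backoff (all names illustrative, not from the posting):

```python
import time

def retry(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... the base delay between successive attempts
            time.sleep(base_delay * (2 ** attempt))
```

Real deployment scripts usually also add jitter to the delay and retry only on transient error classes (timeouts, connection resets), not on every exception.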

Posted 3 weeks ago

Apply

2.0 - 6.0 years

3 - 8 Lacs

Bengaluru

Work from Office

2+ years of hands-on experience in DevOps, with strong expertise in infrastructure automation and cloud-native technologies. Proficient in Terraform for infrastructure provisioning and Argo CD for GitOps-based continuous deployment. Solid understanding of cloud platforms including GCP, AWS, and Azure; Azure experience is a strong plus. Must have experience in setting up and managing monitoring and alerting using tools like Prometheus and Grafana. Responsible for ensuring high system uptime, continuous monitoring, and timely detection and notification of system anomalies. Collaborate with product managers to define and execute the DevOps roadmap for Salesken's services. Drive end-to-end execution of DevOps projects and report on progress and system health at an executive level. Design, implement, and enhance CI/CD pipelines to support reliable and frequent deployments. Perform root cause analysis of operational issues and work closely with development teams to implement fixes and improvements. Manage capacity planning and lead infrastructure enhancement projects, including design, budgeting, and execution. Build and maintain platforms for log processing, metrics collection, and data visualization to support observability and performance tracking. Cloud certifications are a plus.
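The log-processing and anomaly-detection duties above come down to extracting signals from raw logs and comparing them to a threshold. An illustrative stdlib-only sketch: count HTTP 5xx responses in access-log lines and compute the error rate. The log format and field positions are assumptions, not a real system's schema:

```python
def error_rate(log_lines):
    """Fraction of requests whose HTTP status code is 5xx."""
    statuses = [int(line.split()[-1]) for line in log_lines]
    errors = sum(1 for s in statuses if s >= 500)
    return errors / len(statuses)

# Hypothetical "METHOD PATH STATUS" access-log lines.
logs = [
    "GET /api/items 200",
    "GET /api/items 500",
    "POST /api/checkout 200",
    "GET /health 200",
]
print(error_rate(logs))  # 0.25
```

A monitoring stack would evaluate the same ratio continuously over a sliding window (e.g., a Prometheus recording rule) and page when it exceeds the alert threshold.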

Posted 3 weeks ago

Apply

5.0 - 10.0 years

11 - 16 Lacs

Pune

Work from Office

What You'll Do We are looking for experienced Machine Learning Engineers with a background in software development and a deep enthusiasm for solving complex problems. You will lead a dynamic team dedicated to designing and implementing a large language model framework to power diverse applications across Avalara. Your responsibilities will span the entire development lifecycle, including conceptualization, prototyping, and delivery of the LLM platform features. You will have a blend of technical skills in the fields of AI & Machine Learning, especially with LLMs, and a deep-seated understanding of software development practices, where you'll work with a team to ensure our systems are scalable, performant, and accurate. You will report to the Senior Manager, AI/ML. What Your Responsibilities Will Be We are looking for engineers who can think quickly and have a background in implementation. Your responsibilities will include: Build on top of the foundational framework for supporting Large Language Model applications at Avalara. Experience with LLMs such as GPT, Claude, Llama, and other Bedrock models. Leverage best practices in software development, including Continuous Integration/Continuous Deployment (CI/CD), along with appropriate functional and unit testing in place. Inspire creativity by researching and applying the latest technologies and methodologies in machine learning and software development. Write, review, and maintain high-quality code that meets industry standards. Lead code review sessions, ensuring good code quality and documentation. Mentor junior engineers, encouraging a culture of collaboration. Proficiency in developing and debugging software, with a preference for Python, though familiarity with additional programming languages is valued and encouraged.
What You'll Need to be Successful Bachelor's/Master's degree in computer science with 5+ years of industry experience in software development, along with experience building Machine Learning models and deploying them in production environments. Proficiency working in cloud computing environments (AWS, Azure, GCP), Machine Learning frameworks, and software development best practices. Work with technological innovations in AI & ML (esp. GenAI). Experience with design patterns and data structures. Good analytical, design, and debugging skills. Technologies you will work with: Python, LLMs, MLflow, Docker, Kubernetes, Terraform, AWS, GitLab, Postgres, Prometheus, Grafana.

Posted 3 weeks ago

Apply

3.0 - 6.0 years

15 - 25 Lacs

Bengaluru

Work from Office

The Opportunity Are you a self-starter with a strong background in UI development, automation, and cloud technologies, who thrives in a collaborative environment? If so, you'll find an exciting opportunity on our team, where you'll engage in innovative projects, deliver impactful demos, and work closely with diverse experts to drive real-world customer outcomes. This team strives to promote continuous learning and growth in a flexible and supportive culture. About the Team The team for this role is part of the Solutions & Performance Engineering organization within R&D at Nutanix, a global organization which operates out of various geographic locations. The team is known for its collaborative culture, where innovation and continuous learning are highly valued. The mission of the Solutions & Performance Engineering team is to engage customers on their technological and business challenges, leverage advanced technologies to develop impactful solutions, and provide efficient, seamless automation processes for clients worldwide. Your Role We are seeking a highly skilled Front-End Engineer to design, build, and optimize user interfaces with a focus on scalability and efficiency that empower our engineering teams with deep insights into system performance. This role is ideal for someone with strong React.js expertise, a passion for building high-performing UIs, and a problem-solving mindset. You'll work closely with backend engineers and infrastructure teams to develop dashboards, integrate with APIs, and automate the visualization of complex data. Your work will help drive decisions, detect performance regressions, and streamline infrastructure automation workflows. 1. UI/UX Design & Front-End Development Build scalable and responsive front-end applications using React.js. Optimize UI/UX by managing cookies, caching, and performance tuning for large-scale apps (1,000+ pages).
Revamp and modernize legacy front-end codebases for better maintainability and performance. Integrate with microservices-based backend architectures to ensure seamless data flow. Collaborate with design teams to create intuitive and visually appealing user interfaces. 2. Data Visualization & Insights Generation Develop interactive dashboards to visualize system performance trends and analytics. Work with APIs and performance benchmarks to translate backend data into actionable visual insights. Collaborate with backend engineers to define and optimize API contracts for UI needs. Utilize tools like Figma for UI design and translate wireframes into high-quality front-end components. What You Will Bring Required Skills & Experience: Proficiency in React.js, JavaScript, and front-end architecture. Strong experience with UI/UX design principles and tools such as Figma. Familiarity with REST APIs and microservices integration. Version control with Git; experience in CI/CD pipelines, Docker, and Kubernetes. Experience building UIs that scale and perform efficiently under large data loads. Soft Skills & Qualities: Problem Solver: Can troubleshoot complex issues and design innovative, scalable solutions. Effective Communicator: Comfortable explaining technical concepts to both engineers and non-technical stakeholders. Team Player: Works well across teams and contributes to a collaborative, solution-oriented environment. Self-Starter: Independent learner who adapts quickly to new technologies and challenges. Detail-Oriented: Produces high-quality, efficient, and reliable code. Accountable: Takes ownership of tasks and delivers end-to-end solutions. Organized: Strong time management and prioritization skills in fast-paced environments. Preferred / Bonus Skills: Experience with distributed systems and cloud-native architectures. Familiarity with observability tools (e.g., Prometheus, Grafana, Loki, Jaeger, ELK stack). Background in cloud infrastructure automation using AWS, Azure, GCP, or OpenStack. Hands-on experience with infrastructure as code and workload orchestration tools like Terraform, Ansible, or Kubernetes.

Posted 4 weeks ago

Apply

5.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Remote

Role: Site Reliability Engineer (SRE) Location: Remote Work Hours: US Working Hours (Weekends on Rotation Basis) Upsmart Solutions At Upsmart Solutions, we're focused on delivering high-performing digital solutions backed by strong engineering teams. We're looking for a skilled and proactive Site Reliability Engineer (SRE) to support and enhance the performance of systems that impact thousands of users on both the buyer and seller sides. This role is ideal for someone with prior experience in high-traffic e-commerce and/or video platforms like Twitch, Whatnot, etc. You will collaborate with cross-functional teams to troubleshoot issues, build reliable systems, and maintain high availability. A strong background in Java and NodeJS is essential, along with excellent communication skills and a customer-first mindset. Objectives of this role: Ensure high availability, reliability, and performance of production systems. Handle escalated technical issues impacting users and vendors, driving quick and lasting resolutions. Collaborate with Engineering teams to improve observability, alerting, and system robustness. Own incident management, postmortems, and RCA documentation. Continuously improve automation for monitoring, deployment, and infrastructure. Key Responsibilities: Monitor system performance and troubleshoot production issues. Manage infrastructure reliability for platforms built on Java and NodeJS. Collaborate with development teams to optimize applications for scale and performance. Build internal tools for improved operational efficiency. Provide on-call support during US hours and on a rotational weekend basis. Maintain detailed records of incidents, fixes, and preventive measures. Required Skills and Qualifications: Minimum 5 years of experience in SRE or DevOps roles. Hands-on expertise in Java and NodeJS (mandatory). Prior experience supporting e-commerce or video streaming platforms.
Proven troubleshooting experience across frontend, backend, and infrastructure layers. Strong grasp of system design, scalability, and observability. Excellent verbal and written communication skills. Preferred Skills and Qualifications: Experience with cloud platforms (AWS, GCP, or Azure). Familiarity with CI/CD pipelines, Docker, Kubernetes, and monitoring tools (Grafana, Prometheus, etc.). Incident response and RCA reporting experience.

Posted 4 weeks ago

Apply

5.0 - 8.0 years

15 - 30 Lacs

Gurugram

Work from Office

We are looking for a talented Software Engineer with hands-on experience in Quarkus and Red Hat Fuse to design, develop, and maintain integration solutions. The ideal candidate will have strong proficiency in Java, experience with Kafka-based event streaming, RESTful APIs, relational databases, and CI/CD pipelines deployed on OpenShift Container Platform (OCP). This role requires a developer who is passionate about building robust microservices and integration systems in a cloud-native environment. Key Responsibilities: Design and develop scalable microservices using the Quarkus framework. Build and maintain integration flows and APIs leveraging Red Hat Fuse (Apache Camel) for enterprise integration patterns. Develop and consume RESTful web services and APIs. Design, implement, and optimize Kafka producers and consumers for real-time data streaming and event-driven architecture. Write efficient, well-documented, and testable Java code adhering to best practices. Work with relational databases (e.g., PostgreSQL, MySQL, Oracle) including schema design, queries, and performance tuning. Collaborate with DevOps teams to build and maintain CI/CD pipelines for automated build, test, and deployment workflows. Deploy and manage applications on OpenShift Container Platform (OCP), including containerization best practices (Docker). Participate in code reviews, design discussions, and agile ceremonies. Troubleshoot and resolve production issues with a focus on stability and performance. Keep up to date with emerging technologies and recommend improvements. Required Skills & Experience: Strong experience with Java (Java 8 or above) and the Quarkus framework. Expertise in Red Hat Fuse (or Apache Camel) for integration development. Proficient in designing and consuming REST APIs. Experience with Kafka for event-driven and streaming solutions. Solid understanding of relational databases and SQL.
Experience in building and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI) and automated deployment. Hands-on experience deploying applications to OpenShift Container Platform (OCP). Working knowledge of containerization tools like Docker. Familiarity with microservices architecture, cloud-native development, and agile methodologies. Strong problem-solving skills and ability to work independently as well as in a team environment. Good communication and documentation skills.

Posted 4 weeks ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies