628 Prometheus Jobs - Page 9

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

8.0 - 11.0 years

35 - 37 Lacs

Kolkata, Ahmedabad, Bengaluru

Work from Office

Dear Candidate,

We are hiring an SRE to improve the reliability and scalability of production systems. The role is ideal for engineers passionate about automation, monitoring, and performance optimization.

Key Responsibilities:
- Design and implement SLOs, SLAs, and alerting systems
- Automate operational tasks and incident responses
- Build robust observability into all services
- Conduct post-incident reviews and root cause analysis

Required Skills & Qualifications:
- Strong coding/scripting skills (Python, Go, Bash)
- Experience with cloud services and Kubernetes
- Knowledge of monitoring/logging tools (Datadog, Prometheus, ELK)
- Bonus: background in performance engineering or chaos testing

Soft Skills:
- Strong troubleshooting and problem-solving skills
- Ability to work independently and in a team
- Excellent communication and documentation skills

Note: If interested, please share your updated resume and your preferred time for a discussion. If you are shortlisted, our HR team will contact you.

Kandi Srinivasa
Delivery Manager
Integra Technologies

Posted 3 weeks ago

1.0 - 6.0 years

2 - 6 Lacs

Bengaluru

Work from Office

We are seeking an experienced OpenShift Engineer to design, deploy, and manage containerized applications on Red Hat OpenShift.

Key Responsibilities:
- Design, deploy, and manage OpenShift container platforms in on-premises and cloud environments.
- Configure and optimize OpenShift clusters to ensure high availability and scalability.
- Implement CI/CD pipelines and automation for containerized applications.
- Monitor and troubleshoot OpenShift environments, identifying and resolving issues proactively.
- Work closely with development teams to support containerized application deployment and orchestration.
- Manage security policies, access controls, and compliance for OpenShift environments.
- Perform upgrades, patches, and maintenance of OpenShift infrastructure.
- Develop and maintain documentation for OpenShift architecture, configurations, and best practices.
- Stay updated with industry trends and emerging technologies in containerization and Kubernetes.
- Deploy, configure, and manage OpenShift clusters in hybrid/multi-cloud environments.
- Automate deployments using CI/CD pipelines (Jenkins, GitLab CI/CD, ArgoCD).
- Troubleshoot Kubernetes/OpenShift-related issues and optimize performance.
- Implement security policies and best practices for containerized workloads.
- Work with developers to containerize applications and manage microservices.
- Monitor and manage OpenShift clusters using Prometheus, Grafana, and logging tools.

Posted 3 weeks ago

3.0 - 8.0 years

15 - 20 Lacs

Pune

Work from Office

About the job

Sarvaha would like to welcome a Kafka Platform Engineer (or a seasoned backend engineer aspiring to move into platform architecture) with a minimum of 4 years of solid experience in building, deploying, and managing Kafka infrastructure on Kubernetes platforms. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. Please visit our website at

What You'll Do
- Deploy and manage scalable Kafka clusters on Kubernetes using Strimzi, Helm, Terraform, and StatefulSets
- Tune Kafka for performance, reliability, and cost-efficiency
- Implement Kafka security: TLS, SASL, ACLs, Kubernetes Secrets, and RBAC
- Automate deployments across AWS, GCP, or Azure
- Set up monitoring and alerting with Prometheus, Grafana, and JMX Exporter
- Integrate Kafka ecosystem components: Connect, Streams, Schema Registry
- Define autoscaling, resource limits, and network policies for Kubernetes workloads
- Maintain CI/CD pipelines (ArgoCD, Jenkins) and container workflows

You Bring
- BE/BTech/MTech (CS/IT) or MCA, with an emphasis in Software Engineering
- Strong foundation in the Apache Kafka ecosystem and internals (brokers, ZooKeeper/KRaft, partitions, storage)
- Proficiency in Kafka setup, tuning, scaling, and topic/partition management
- Skill in managing Kafka on Kubernetes using Strimzi, Helm, and Terraform
- Experience with CI/CD, containerization, and GitOps workflows
- Monitoring expertise using Prometheus, Grafana, and JMX
- Experience on EKS, GKE, or AKS (preferred)
- Strong troubleshooting and incident response mindset
- High sense of ownership and automation-first thinking
- Excellent collaboration with SREs, developers, and platform teams
- Clear communicator, documentation-driven, and eager to mentor/share knowledge

Why Join Sarvaha?
- Top-notch remuneration and excellent growth opportunities
- An excellent, no-nonsense work environment with the very best people to work with
- Highly challenging software implementation problems
- Hybrid mode. We offered complete work from home even before the pandemic.

Posted 3 weeks ago

1.0 - 2.0 years

6 - 8 Lacs

Bengaluru

Work from Office

CI/CD Developer || 1-2 years exp || Bangalore || Work from office

Roles & Responsibilities
- Automate and optimize CI/CD workflows to enhance efficiency and developer productivity.
- Design, implement, and maintain automated CI/CD pipelines for seamless code testing, building, and deployment.
- Integrate automated testing (unit, integration, performance) to ensure code quality before deployment.
- Manage and monitor CI/CD/DevOps infrastructure to ensure high availability.
- Embed security best practices in the DevOps pipeline, addressing vulnerabilities early and ensuring compliance.
- Oversee monitoring, logging, root cause analysis, and preventive measures for system failures.
- Manage user roles and permissions, and enforce security policies across environments.
- Generate actionable insights through interactive reports and visualizations using Power BI.
- Collaborate with development teams to understand CI/CD needs and deliver effective solutions.
- Possess strong analytical, technical, and problem-solving skills with a research-driven approach.
- Be a self-starter, contributing to the adoption of DevOps/CI/CD practices.
- Research and evaluate new DevOps tools for continuous improvement.
- Document CI/CD/DevOps infrastructure, workflows, and automation processes.

Technical Skills
- Programming and automation: Python, Windows batch scripts/PowerShell
- Good knowledge of the Windows platform
- Build tool: Jenkins
- Version control: Subversion
- Visualization and reporting: Power BI
- Cloud computing, containerization and orchestration

You are best equipped for this role if you have
- Expertise and working knowledge of Agile software development methodology
- Expert knowledge and hands-on experience in scripting (PowerShell/batch/Python), automation, and DevOps tools and methodologies
- Expert knowledge and working experience in build automation using Jenkins
- Hands-on experience in creating and managing Jenkins pipelines
- Skill in Jenkins server administration
- Hands-on experience with version control tools: Subversion (SVN), Git
- Skill in administering version control tools on a server: Subversion (SVN), Git
- The ability to use and integrate industry-standard tools that fit the different parts of the SDLC
- Knowledge of Power BI for visualization and reporting
- Knowledge of cloud computing, containerization and orchestration
- Team player with good communication skills

Skills - Nice to Have
- Knowledge of and exposure to containerization using Docker, Kubernetes, OpenShift
- Knowledge of and exposure to monitoring and logging using Prometheus, Grafana
- Understanding of the complete software development life cycle (SDLC)

Posted 3 weeks ago

5.0 - 7.0 years

35 - 40 Lacs

Mumbai, Pune, Gurugram

Work from Office

Must have 5+ years of experience. Implement and maintain Kubernetes clusters, ensuring high availability and scalability. Establish real-time monitoring with Grafana, Prometheus, and CloudWatch.

Night shift. Locations: Mumbai, Gurugram, Chennai, Indore, Remote, Bangalore, Delhi, Kolkata

Posted 3 weeks ago

4.0 - 8.0 years

13 - 17 Lacs

Bengaluru

Work from Office

Roles & Responsibilities:
- Work closely with the CTO and members of technical staff to meet deadlines.
- Work with an agile team to set up and configure GitOps (CI/CD) based pipelines on GitLab.
- Create and deploy Edge AIoT pipelines using AWS Greengrass or Azure IoT.
- Design and develop secure cloud system architectures in accordance with enterprise standards.
- Package and automate deployment of releases using Helm charts.
- Analyze and optimize resource consumption of deployments.
- Integrate with Prometheus, Grafana, Kibana, etc. for application monitoring.
- Adhere to best practices to deliver secure and robust solutions.

Requirements:
- Experience with Kubernetes and AWS
- Knowledge of cloud architecture concepts (IaaS, PaaS, SaaS)
- Knowledge of Docker and Linux Bash scripting
- Strong desire to expand knowledge in modern cloud architectures
- Knowledge of system security concepts (SAST, DAST, penetration testing, vulnerability analysis)
- Familiarity with version control concepts (Git)

Posted 3 weeks ago

3.0 - 5.0 years

9 - 13 Lacs

Bengaluru

Work from Office

About the Role
We are seeking a skilled and motivated Cloud Engineer to join our team. In this role, you will be responsible for designing, implementing, and maintaining our cloud infrastructure. You will work closely with development and operations teams to ensure the reliability, scalability, and security of our cloud-based applications and services.

Key Responsibilities

Cloud Infrastructure Design & Implementation
- Design and implement cloud infrastructure solutions using AWS, Azure, or GCP.
- Configure and manage virtual machines, storage, networking, and other cloud resources.
- Implement infrastructure as code (IaC) using tools like Terraform or CloudFormation.
- Design and deploy scalable and highly available cloud architectures.

Cloud Operations & Maintenance
- Monitor cloud infrastructure performance and identify potential issues.
- Troubleshoot and resolve cloud-related incidents.
- Perform routine maintenance tasks, such as patching and upgrades.
- Implement and maintain backup and disaster recovery solutions.

Automation & Scripting
- Automate cloud infrastructure provisioning and management tasks using scripting languages (e.g., Python, Bash).
- Develop and maintain automation scripts for CI/CD pipelines.
- Implement configuration management using tools like Ansible or Chef.

Security & Compliance
- Implement and maintain cloud security best practices.
- Ensure compliance with industry standards and regulations (e.g., SOC 2, GDPR).
- Implement security monitoring and alerting.
- Implement IAM best practices.

Containerization & Orchestration
- Deploy and manage containerized applications using Docker and Kubernetes.
- Implement and maintain container orchestration solutions.
- Manage and implement Helm charts.

Monitoring & Logging
- Implement and maintain monitoring and logging solutions using tools like Prometheus, Grafana, and the ELK stack.
- Configure alerts and notifications for critical events.
- Utilize cloud-native monitoring tools.

Collaboration & Communication
- Collaborate with development and operations teams to ensure smooth application deployments.
- Communicate effectively with stakeholders regarding cloud infrastructure status and issues.
- Document cloud infrastructure designs and procedures.

Required Technical Skills

Cloud Platforms
- Proficiency in AWS, Azure, or GCP.
- Knowledge of core cloud services (EC2, S3, VPC, Azure VMs, Azure Storage, GCP Compute Engine, GCP Storage).

Infrastructure as Code (IaC)
- Experience with Terraform or CloudFormation.

Containerization & Orchestration
- Proficiency in Docker and Kubernetes.
- Experience with Helm.

Scripting & Automation
- Proficiency in Python or Bash scripting.
- Experience with Ansible or Chef.

Monitoring & Logging
- Experience with Prometheus, Grafana, and the ELK stack.
- Experience with cloud-native monitoring tools.

Networking
- Understanding of networking concepts and protocols (TCP/IP, DNS, VPN).

Security
- Knowledge of cloud security best practices and IAM.

Operating Systems
- Proficiency in Linux or Windows Server administration.

Version Control
- Experience with Git.

Required Experience
- 3-5 years of experience in cloud engineering or related roles.
- Proven experience in designing and implementing cloud infrastructure.
- Experience with automating cloud operations.

Soft Skills
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
- Strong attention to detail.
- Strong desire to learn new technologies.

Certifications (Preferred)
- AWS Certified Solutions Architect - Associate
- Microsoft Certified: Azure Administrator Associate
- Google Cloud Certified Professional Cloud Architect
- Certified Kubernetes Administrator (CKA)

Education
Bachelor's degree in Computer Science, Information Technology, or a related field.

Posted 3 weeks ago

4.0 - 5.0 years

6 - 10 Lacs

Bengaluru

Work from Office

Job Title: DevOps Engineer (Python)
Experience: 4-5 Years
Location: Bangalore

About the Role
We are seeking a highly motivated and skilled DevOps Engineer with strong Python programming skills to join our team. In this role, you will be responsible for automating and streamlining our software development and deployment processes, ensuring efficient and reliable software delivery.

Key Responsibilities
- Develop and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or Azure DevOps.
- Automate infrastructure provisioning and management using tools like Terraform, Ansible, or Puppet.
- Develop and maintain Python scripts for various DevOps tasks, such as:
  1. Automating deployments
  2. Monitoring and alerting
  3. Data analysis and reporting
  4. System administration tasks
- Troubleshoot and resolve infrastructure and deployment issues.
- Collaborate with development teams to improve software delivery processes.
- Stay abreast of the latest DevOps tools, technologies, and best practices.

Required Skills (Mandatory)
- Strong Python programming skills
- Experience with CI/CD pipelines and tools (Jenkins, GitLab CI/CD, Azure DevOps)
- Experience with infrastructure automation tools (Terraform, Ansible, Puppet)
- Experience with cloud platforms (AWS, Azure, GCP)
- Experience with containerization technologies (Docker, Kubernetes)
- Experience with scripting languages (Bash, Shell)
- Strong understanding of Linux/Unix systems
- Excellent problem-solving and analytical skills
- Strong communication and collaboration skills

Desired Skills (Optional)
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack)
- Experience with configuration management tools (Chef, SaltStack)
- Experience with security best practices and tool

Posted 3 weeks ago

2.0 - 5.0 years

11 - 15 Lacs

Ahmedabad

Work from Office

Designation: Senior DevOps Engineer / DevOps Engineer

JD

Key Responsibilities:

CI/CD Pipeline Development and Management:
- Design, build, and maintain CI/CD pipelines using Jenkins, GitLab CI, or similar tools.
- Automate deployment processes for microservices and containerized applications across multiple environments.
- Ensure high availability and rollback capabilities for production deployments.

Infrastructure as Code (IaC):
- Develop and maintain infrastructure provisioning scripts using tools like Terraform or CloudFormation.
- Implement configuration management solutions with Ansible, Puppet, or Chef.
- Ensure infrastructure scalability, reliability, and security for on-prem and cloud environments.

Scripting and Automation:
- Write and optimize scripts using Python, Bash, or PowerShell for automating operational tasks.
- Build custom tools to streamline repetitive DevOps workflows.
- Implement monitoring and alerting automation to proactively address system issues.

Database Management:
- Collaborate with database administrators to manage and optimize SQL and NoSQL databases (e.g., PostgreSQL, MongoDB).
- Implement automated database backup, restoration, and performance monitoring solutions.
- Ensure secure handling of database credentials and access through tools like HashiCorp Vault.

Performance Monitoring and Optimization:
- Integrate monitoring tools like Prometheus, Grafana, or the ELK stack for observability.
- Conduct root-cause analysis for incidents and implement fixes to avoid recurrence.
- Optimize application performance by fine-tuning DevOps processes and infrastructure.

Collaboration and Team Support:
- Partner with development, QA, and operations teams to align DevOps practices with business goals.
- Support developers by troubleshooting build and deployment issues.
- Share best practices and mentor junior team members in DevOps methodologies.

Technical Skills and Qualifications:

Education:
- Bachelor's degree in Computer Science, IT, or a related field.

Core Skills:
- Messaging Queues: Proficiency with Kafka and other messaging queue systems for real-time data streaming.
- CI/CD Tools: Expertise in Jenkins, GitLab CI/CD, or similar tools for automation pipelines.
- Scripting: Strong proficiency in Python, Bash, or PowerShell scripting for automation.
- Cloud Platforms: Hands-on experience with AWS, Azure, or Google Cloud.
- Containerization: Proficiency with Docker and Kubernetes for managing containerized applications.
- IaC Tools: Expertise in Terraform, CloudFormation, or similar tools for infrastructure provisioning.
- Monitoring: Experience with Prometheus, Grafana, ELK stack, or equivalent monitoring solutions.
- Database Management: Familiarity with both SQL and NoSQL databases (e.g., PostgreSQL, MongoDB).
- Messaging Queues: Expertise in Kafka for high-throughput data pipelines.
- Knowledge of Helm charts for Kubernetes application deployments.
- Experience with MLOps pipelines for AI/ML workload integration.
- Familiarity with GitOps tools like ArgoCD or FluxCD for declarative infrastructure management.
- Proficiency in implementing service meshes like Istio for microservices.

Soft Skills:
- Strong analytical and troubleshooting skills.
- Excellent communication abilities to collaborate with cross-functional teams.
- Commitment to continuous learning and knowledge sharing.

Posted 3 weeks ago

4.0 - 8.0 years

3 - 7 Lacs

Bengaluru

Work from Office

Position: Java Backend Developer
Location: Chennai
Job Type: Full-time

Job Summary:
We are looking for a proficient Java Backend Developer with 4 to 8 years of experience to join our team. The ideal candidate will have hands-on experience in building high-performance, scalable, enterprise-grade applications. The role involves working with Java, Spring Boot, Kafka, and other modern technologies to deliver robust backend services. You will work closely with cross-functional teams to design and implement backend systems and integrate them with front-end components.

Key Responsibilities:
- Design, develop, and maintain backend services using Java and Spring Boot.
- Implement and manage distributed systems with Kafka for real-time data processing.
- Develop RESTful APIs and microservices to support front-end functionality.
- Ensure high performance and responsiveness of applications.
- Troubleshoot and optimize backend systems to ensure reliability and scalability.
- Collaborate with front-end developers, DevOps, and QA teams to ensure seamless integration.
- Write clean, scalable, and maintainable code following best practices.
- Implement monitoring solutions using tools like Kibana, Prometheus, and Grafana.

Primary Skills:
- Strong proficiency in Java and the Spring Boot framework.
- Experience with Kafka for messaging and stream processing.
- Familiarity with RESTful API design and microservices architecture.
- Understanding of the software development lifecycle (SDLC), design patterns, and coding best practices.

Secondary Skills:
- Experience with monitoring and visualization tools like Kibana, Prometheus, and Grafana.
- Knowledge of databases, including MySQL and NoSQL databases.
- Hands-on experience with cloud technologies (preferably AWS).
- Exposure to containerization and orchestration tools like Kubernetes.
- Familiarity with CI/CD pipelines and DevOps practices.

Posted 3 weeks ago

8.0 - 12.0 years

7 - 11 Lacs

Hyderabad

Work from Office

Position: Sr DevOps Engineer
Location: Hyderabad
Immediate Joiner

Position Overview
As a DevOps Engineer, you will play a critical role in ensuring the smooth operation and maintenance of our vendor applications. You will be responsible for scripting, supporting Java applications, and working with SQL databases. Your expertise will help us maintain high availability, performance, and security of our applications.

Key Responsibilities:
- Collaborate with development and operations teams to support and maintain vendor applications.
- Develop and maintain scripts for automation, deployment, and monitoring.
- Provide support for Java-based applications, including troubleshooting and performance tuning.
- Manage and optimize SQL databases, ensuring data integrity and availability.
- Implement and maintain CI/CD pipelines to streamline the software development lifecycle.
- Monitor application performance and system health, proactively identifying and resolving issues.
- Participate in on-call rotations to provide 24/7 support for critical systems.
- Document processes, procedures, and best practices to ensure knowledge sharing and consistency.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a DevOps Engineer or in a similar role.
- Strong scripting skills (e.g., Python, Bash, PowerShell).
- Experience supporting Java applications, including troubleshooting and performance tuning.
- Proficiency in SQL and experience managing SQL databases.
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI).
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Understanding of cloud platforms (e.g., AWS, Azure, Google Cloud).
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills.

Preferred Qualifications:
- Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
- Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with Agile and DevOps methodologies.

Posted 3 weeks ago

6.0 - 11.0 years

18 - 22 Lacs

Chennai, Bengaluru

Work from Office

Who We Are
Applied Materials is the global leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. We design, build and service cutting-edge equipment that helps our customers manufacture display and semiconductor chips, the brains of devices we use every day. As the foundation of the global electronics industry, Applied enables the exciting technologies that literally connect our world, like AI and IoT. If you want to work beyond the cutting edge, continuously pushing the boundaries of science and engineering to make possible the next generations of technology, join us to Make Possible® a Better Future.

What We Offer
Location: Bangalore, IND; Chennai, IND

At Applied, we prioritize the well-being of you and your family and encourage you to bring your best self to work. Your happiness, health, and resiliency are at the core of our benefits and wellness programs. Our robust total rewards package makes it easier to take care of your whole self and your whole family. We're committed to providing programs and support that encourage personal and professional growth and care for you at work, at home, or wherever you may go. Learn more about our benefits.

You'll also benefit from a supportive work culture that encourages you to learn, develop and grow your career as you take on challenges and drive innovative solutions for our customers. We empower our team to push the boundaries of what is possible while learning every day in a supportive, leading global company. Visit our Careers website to learn more about careers at Applied.

About Applied
Applied Materials is the leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. Our expertise in modifying materials at atomic levels and on an industrial scale enables customers to transform possibilities into reality. At Applied Materials, our innovations make possible the technology shaping the future.
Our Team
Our team is developing a high-performance computing solution for low-latency and high-throughput image processing and deep-learning workloads that will enable our chip manufacturing process control equipment to offer differentiated value to our customers.

Your Opportunity
As an HPC Architect, you will get the opportunity to architect high-performance computing solutions from scratch and to design and optimize all aspects (compute, memory, networking, storage) for a better cost of ownership.

Roles and Responsibilities
As an architect, you will be responsible for designing HPC infrastructure solutions, including compute, networking, storage, and workload management components. You will work closely with cross-functional teams, including hardware, software, product management, and business stakeholders, to understand compute workloads and translate them into platform architecture and designs that meet business needs. You will:
- Create and maintain detailed system architecture diagrams and specifications.
- Evaluate and select appropriate hardware and software components for HPC environments.
- Install, configure, and maintain HPC systems, including hardware, software, and networking components.
- Develop and implement automation scripts for system management and deployment.
- Be a subject matter expert who unblocks dependent teams in the HPC domain.
- Develop system benchmarks, profile systems to understand bottlenecks, and optimize workflows and processes to improve cost of ownership.
- Identify and mitigate technical risks and issues throughout the HPC development life cycle.
- Ensure that the compute cluster is resilient, reliable, and maintainable.
- Stay abreast of the latest HPC technologies, including hardware, software, and networking solutions.

Your primary focus will be to understand the compute workload and design an HPC cluster with the right combination of nodes, CPU/GPU, memory, interconnects, and storage to achieve optimum performance at minimum cost of ownership.

Our Ideal Candidate
Someone who has the drive and passion to learn quickly and has the ability to multitask and switch contexts based on business needs.

Qualifications
- In-depth experience with Linux system administration and hardware/software configuration.
- Strong knowledge of HPC technologies, including cluster computing, high-speed interconnects (InfiniBand, RoCE), and parallel filesystems (Lustre, GPFS, BeeGFS, etc.).
- Experience in creating and maintaining operating system images with different installation and boot schemes.
- Extremely good with automation tools like Ansible, Chef, and SaltStack, and with scripting languages (Python and Bash).
- Experience in creating and maintaining storage solutions with different RAID configurations.
- Ability to design storage solutions for different IOPS and access patterns (random vs. sequential read/write) and to tune storage and filesystems for better performance.
- Good knowledge of networking concepts, including IP addressing, routing, protocols, switch configuration for RDMA, VLAN configuration, network bonding, etc.
- Good knowledge of virtualization and of hardware and software hypervisors.
- Good knowledge of containerization technologies like Docker and Singularity.
- Experience in software-defined networking and storage.
- Experience in setting up remote management protocols like IPMI, Redfish, etc.
- Experience in setting up and using monitoring systems like Prometheus and Grafana.
- Experience in system profiling and custom tuning for target workloads for higher performance and low cost of ownership.
- Very good written and verbal communication skills.
- Very good at technical documentation meant to serve as manuals for non-experts in the field.

Additional Qualifications
- Experience in HPC cluster management and workload orchestration software (e.g., SLURM, Torque, LSF).
- Experience in setting up deep-learning training/inference solutions.
- Experience in private cloud infrastructure like Kubernetes, OpenStack, CloudStack, etc.
- Experience in distributed high-performance computing and parallel programming frameworks.
- Good knowledge of low-latency and high-throughput data transfer technologies (RDMA on RoCE, InfiniBand).

Education
Bachelor's degree or higher in Computer Science or related disciplines.

Applied Materials is committed to diversity in its workforce, including Equal Employment Opportunity for Minorities, Females, Protected Veterans and Individuals with Disabilities.

Additional Information
Time Type: Full time
Employee Type: Assignee / Regular
Travel:
Relocation Eligible: No

Applied Materials is an Equal Opportunity Employer. Qualified applicants will receive consideration for employment without regard to race, color, national origin, citizenship, ancestry, religion, creed, sex, sexual orientation, gender identity, age, disability, veteran or military status, or any other basis prohibited by law.

Posted 3 weeks ago

8.0 - 13.0 years

25 - 30 Lacs

Bengaluru

Work from Office

About NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload, or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the place for you. You can help bring new ideas to life, approaching each challenge with fresh eyes. Of course, you won't be doing it alone. At NetApp, we're all about asking for help when we need it, collaborating with others, and partnering across the organization and beyond.
Job Summary
The NetApp Keystone team is responsible for cutting-edge technologies that enable NetApp's pay-as-you-go offering. Keystone helps customers manage data on-premises or in the cloud, with invoices charged on a subscription basis.
Job Requirements
Roles & Responsibilities
As a Golang Engineer for Keystone, you'll have the opportunity to:
• Enjoy working on customer issues that no one has solved yet
• Influence engineering teams by suggesting feature improvement ideas
• Learn storage as a subscription service
• Work with other engineers to deliver the best customer experience for Keystone
Key Skills
• Strong knowledge of the Go programming language, its paradigms, constructs, and idioms
• Bachelor's/Master's degree in computer science, information technology, or engineering
• Knowledge of various Go frameworks and tools
• … year(s) of experience working with the Go programming language
• Strong written and verbal communication skills with proven fluency in English
• Familiarity with database technologies such as NoSQL, Prometheus, and MongoDB
• Hands-on experience with code versioning tools such as Git
• Passion for learning new tools, languages, philosophies, and workflows
• Experience working with generated code and code generation techniques
• Experience working with document databases and Golang ORM libraries
• Knowledge of programming methodologies: object-oriented, functional, design patterns
• Knowledge of software development methodologies: Scrum, Agile, Lean
• Knowledge of software deployment: Docker, Kubernetes
• Knowledge of software team tools: Git, Jira, CI/CD
Education
IC: Typically requires a minimum of 5-8 years of related experience with a bachelor's or master's degree.
At NetApp, we embrace a hybrid working environment designed to strengthen connection, collaboration, and culture for all employees. This means that most roles will have some level of in-office and/or in-person expectations, which will be shared during the recruitment process.
Equal Opportunity Employer
NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, and any protected classification.
Why NetApp?
We are all about helping customers turn challenges into business opportunities. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better but also to innovate. We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.
We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time off each year to volunteer with their favourite organizations. We provide comprehensive benefits, including health care, life and accident plans, emotional support resources for you and your family, legal services, and financial savings programs to help you plan for your future. We support professional and personal growth through educational assistance and provide access to various discounts and perks to enhance your overall quality of life.
If you want to help us build knowledge and solve big problems, let's talk.
Submitting an application
To ensure a streamlined and fair hiring process for all candidates, our team only reviews applications submitted through our company website. This practice allows us to track, assess, and respond to applicants efficiently. Emailing our employees, recruiters, or Human Resources personnel directly will not influence your application.

Posted 3 weeks ago

7.0 - 12.0 years

30 - 35 Lacs

Pune

Work from Office

About the Role
Job Title: Production Specialist, AVP
Location: Pune, India
Role Description
Our organization within Deutsche Bank is AFC Production Services. We are responsible for providing technical L2 application support for business applications. The AFC (Anti-Financial Crime) line of business has a current portfolio of 25+ applications. The organization is in the process of transforming itself using Google Cloud and many new technology offerings. As an Assistant Vice President, your role will include hands-on production support and active involvement in resolving technical issues across multiple applications. You will also work as an application lead and will be responsible for technical and operational processes for all applications you support.
Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance, and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel.
You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion, and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.
What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
• Best-in-class leave policy
• Gender-neutral parental leave
• 100% reimbursement under the childcare assistance benefit (gender neutral)
• Sponsorship for industry-relevant certifications and education
• Employee Assistance Program for you and your family members
• Comprehensive hospitalization insurance for you and your dependents
• Accident and term life insurance
• Complimentary health screening for those aged 35 and above
Your key responsibilities
• Provide technical support by handling and consulting on BAU work, incidents, emails, and alerts for the respective applications.
• Perform post-mortems and root cause analysis using the ITIL standards of Incident Management, Service Request Fulfillment, Change Management, Knowledge Management, and Problem Management.
• Manage the regional L2 team and vendor teams supporting the application. Ensure the team is up to speed and picks up the support duties.
• Build up technical subject matter expertise on the applications being supported, including business flows, application architecture, and hardware configuration.
• Define and track KPIs, SLAs, and operational metrics to measure and improve application stability and performance.
• Conduct real-time monitoring with an array of monitoring tools to ensure application SLAs are achieved and application availability (uptime) is maximized.
• Build and maintain effective and productive relationships with stakeholders in business, development, infrastructure, and third-party systems/data providers and vendors.
• Assist in the process to approve application code releases, as well as tasks assigned to support to perform.
• Keep key stakeholders informed using communication templates.
• Approach support with a proactive attitude and a desire to find root causes through in-depth analysis, and strive to reduce inefficiencies and manual effort.
• Mentor and guide junior team members, fostering technical upskilling and knowledge sharing.
• Provide strategic input into disaster recovery planning, failover strategies, and business continuity procedures.
• Collaborate on and deliver initiatives that drive stability in the environment.
• Review all open production items with the development team and push for updates and resolutions to outstanding tasks and recurring issues.
• Drive service resilience by implementing SRE (site reliability engineering) principles, ensuring proactive monitoring, automation, and operational efficiency.
• Ensure regulatory and compliance adherence, managing audits, access reviews, and security controls in line with organizational policies.
• Work in shifts as part of a rota covering APAC and EMEA hours between 07:00 AM IST and 09:00 PM IST (2 shifts). In the event of major outages or issues, we may ask for flexibility to help provide appropriate cover. Weekend on-call coverage needs to be provided on a rotational/as-needed basis.
Your skills and experience
• 9-15 years of experience providing hands-on IT application support
• Experience managing vendor teams providing 24x7 support
• Preferred: team lead experience; experience in an investment bank or financial institution
• Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience/diploma/certification)
• Preferred: ITIL v3 Foundation certification or higher
• Knowledge of cloud products like Google Cloud Platform (GCP) and hybrid applications
• Strong understanding of ITIL/SRE/DevOps best practices for supporting a production environment
• Understanding of KPIs, SLOs, SLAs, and SLIs
• Monitoring tools: knowledge of Elasticsearch, Control-M, Grafana, Geneos, OpenShift, Prometheus, Google Cloud Monitoring, Airflow, and Splunk
• Working knowledge of creating dashboards and reports for senior management
• Red Hat Enterprise Linux (RHEL): professional skill in searching logs, process commands, starting/stopping processes, and using OS commands to aid in tasks needed to resolve or investigate issues. Shell scripting knowledge is a plus.
• Understanding of database concepts and exposure to working with databases such as Oracle, MS SQL, and BigQuery
• Ability to work across countries, regions, and time zones with a broad range of cultures and technical capability
Skills That Will Help You Excel
• Strong written and oral communication skills, including the ability to communicate technical information to a non-technical audience, and good analytical and problem-solving skills
• Proven experience leading L2 support teams, including managing vendor teams and offshore resources
• Able to train, coach, and mentor, and know where each technique is best applied
• Experience with GCP or another public cloud provider to build applications
• Experience in an investment bank, financial institution, or large corporation using enterprise hardware and software
• Knowledge of Actimize, Mantas, and case management software is good to have
• Working knowledge of Big Data Hadoop/Secure Data Lake is a plus
• Prior experience in automation projects is great to have
• Exposure to Python, shell, Ansible, or other scripting languages for automation and process improvement
• Strong stakeholder management skills, ensuring seamless coordination between business, development, and infrastructure teams
• Ability to manage high-pressure issues, coordinating across teams to drive swift resolution
• Strong negotiation skills with interface teams to drive process improvements and efficiency gains
How we'll support you
• Training and development to help you excel in your career
• Coaching and support from experts in your team
• A culture of continuous learning to aid progression
• A range of flexible benefits that you can tailor to suit your needs

Posted 3 weeks ago

7.0 - 12.0 years

32 - 37 Lacs

Bengaluru

Work from Office

About the Role
Job Title: Site Reliability Engineer
Location: Bangalore, India
Corporate Title: AVP
Role Description
You will work closely with application teams to ensure stable, well-monitored applications that are resilient to faults. You will agree on and review Service Level Objectives (SLOs) to achieve high availability for applications based on their criticality. You will maintain error budgets for the application teams and prevent releases in the event of production instability and reduced availability. You will focus on reducing manual toil, improving operational reliability, and driving automation-first practices. This is a hands-on role with a strong focus on implementing SRE practices and reducing toil for developer tools.
What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
• Best-in-class leave policy
• Gender-neutral parental leave
• 100% reimbursement under the childcare assistance benefit (gender neutral)
• Sponsorship for industry-relevant certifications and education
• Employee Assistance Program for you and your family members
• Comprehensive hospitalization insurance for you and your dependents
• Accident and term life insurance
• Complimentary health screening for those aged 35 and above
Your key responsibilities
• Drive stability, performance, and reliability improvements for TDI Engineering applications.
• Build monitoring and alerting solutions that raise alerts on failures and performance issues across TDI Engineering applications, helping us provide the optimum service level to users.
• Provide feedback loops to continually improve application resilience across multiple application teams.
• Collaborate with product owners and engineering teams to prioritize the reliability and stability of these applications.
• Define, measure, and maintain SLOs and error budgets to ensure availability for end users and achieve appropriate levels of application stability.
• Identify opportunities for automation and self-service capabilities and implement them to eliminate toil for both the application teams and the SRE team, optimising effectiveness.
• Manage outage resolution and agree on actions to reduce the likelihood of future failures by owning RCA and conducting blameless postmortems.
Your skills and experience
• Bachelor's degree from an accredited college or university with a concentration in Computer Science or an IT-related discipline (or equivalent work experience or diploma)
• 8+ years of IT experience in large corporate environments, specifically in controlled production environments
• Demonstrable Site Reliability Engineering experience of at least 3 years
• Excellent analytical and problem-solving skills
• Experience implementing observability solutions using any industry-standard tools
• Scripting skills (Groovy, shell, Bash, Cron, or any equivalent)
• Experience with mid-range technologies and platforms, such as UNIX/Linux, Oracle Database, and Nginx
Good to have
• Understanding of and experience with developer tools (Jira, Confluence, Bitbucket, TeamCity, Artifactory, uDeploy) as an enterprise-level administrator managing applications with a large user base
• Knowledge of and experience with observability tools like Grafana and Prometheus
How we'll support you
• Training and development to help you excel in your career
• Coaching and support from experts in your team
• A culture of continuous learning to aid progression
• A range of flexible benefits that you can tailor to suit your needs

Posted 3 weeks ago

7.0 - 11.0 years

0 - 1 Lacs

Hyderabad

Work from Office

We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.
Requirements:
• Bachelor's degree in Computer Science, Engineering, or a related field
• 7 to 10+ years of experience in full-stack development, with a strong focus on DevOps
DevOps with AWS Data Engineer - Roles & Responsibilities:
• Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
• Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
• Build and maintain CI/CD pipelines using tools such as AWS CodePipeline, Jenkins, and GitLab CI/CD.
• Cross-functional collaboration.
• Automate build, test, and deployment processes for Java applications.
• Use Ansible, Chef, or AWS Systems Manager to manage configurations across environments.
• Containerize Java apps using Docker.
• Deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
• Handle monitoring and logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
• Manage access with IAM roles/policies.
• Use AWS Secrets Manager/Parameter Store for managing credentials.
• Enforce security best practices, encryption, and audits.
• Automate backups for databases and services using AWS Backup, RDS snapshots, and S3 lifecycle rules.
• Implement disaster recovery (DR) strategies.
• Work closely with development teams to integrate DevOps practices.
• Document pipelines, architecture, and troubleshooting runbooks.
• Monitor and optimize AWS resource usage.
• Use AWS Cost Explorer, Budgets, and Savings Plans.
Must-Have Skills:
• Experience working on Linux-based infrastructure
• Excellent understanding of Ruby, Python, Perl, and Java
• Configuring and managing databases such as MySQL and MongoDB
• Excellent troubleshooting skills
• Selecting and deploying appropriate CI/CD tools
• Working knowledge of various tools, open-source technologies, and cloud services
• Awareness of critical concepts in DevOps and Agile principles
• Managing stakeholders and external interfaces
• Setting up tools and required infrastructure
• Defining and setting development, testing, release, update, and support processes for DevOps operation
• The technical skills to review, verify, and validate the software code developed in the project
Interview mode: face-to-face (F2F) for candidates residing in Hyderabad; Zoom for candidates in other states
Location: 43/A, MLA Colony, Road No. 12, Banjara Hills, 500034
Time: 2-4 PM (Monday, 26 May to Friday, 30 May)

Posted 3 weeks ago

4.0 - 9.0 years

3 - 8 Lacs

Noida, Gurugram, Delhi / NCR

Work from Office

Role & responsibilities
Site Reliability Engineer Requirements:
We are seeking a proactive and technically strong Site Reliability Engineer (SRE) to ensure the stability, performance, and scalability of our Data Engineering Platform. You will work on cutting-edge technologies including Cloudera Hadoop, Spark, Airflow, NiFi, and Kubernetes, ensuring high availability and driving automation to support massive-scale data workloads, especially in the telecom domain.
Key Responsibilities
• Ensure platform uptime and application health as per SLOs/KPIs
• Monitor infrastructure and applications using ELK, Prometheus, Zabbix, etc.
• Debug and resolve complex production issues, performing root cause analysis
• Automate routine tasks and implement self-healing systems
• Design and maintain dashboards, alerts, and operational playbooks
• Participate in incident management, problem resolution, and RCA documentation
• Own and update SOPs for repeatable processes
• Collaborate with L3 and Product teams for deeper issue resolution
• Support and guide the L1 operations team
• Conduct periodic system maintenance and performance tuning
• Respond to user data requests and ensure timely resolution
• Address and mitigate security vulnerabilities and compliance issues
Technical Skillset
• Hands-on with Spark, Hive, Cloudera Hadoop, Kafka, Ranger
• Strong Linux fundamentals and scripting (Python, Shell)
• Experience with Apache NiFi, Airflow, YARN, and ZooKeeper
• Proficient in monitoring and observability tools: ELK Stack, Prometheus, Loki
• Working knowledge of Kubernetes, Docker, Jenkins CI/CD pipelines
• Strong SQL skills (Oracle/Exadata preferred)
• Familiarity with DataHub, DataMesh, and security best practices is a plus
• Strong problem-solving and debugging mindset
• Ability to work under pressure in a fast-paced environment
• Excellent communication and collaboration skills
• Ownership, customer orientation, and a bias for action
Preferred candidate profile
Immediate joiner

Posted 3 weeks ago

3.0 - 5.0 years

15 - 17 Lacs

Bengaluru

Work from Office

About the Role
Own the deployment, scaling, and hardening of our Kubernetes-based infrastructure. Automate end-to-end provisioning, ensure security and high availability, and troubleshoot production incidents.
Key Responsibilities
• Kubernetes: Deploy, manage & optimize clusters (on-prem, EKS/GKE/AKS)
• IaC & GitOps: Automate with Terraform, Helm charts & Argo CD (or similar)
• CI/CD: Build/maintain pipelines (Jenkins, GitHub Actions, etc.)
• Monitoring: Implement Prometheus, Grafana & ELK for metrics, logs & alerts
• Troubleshooting: Diagnose container networking, storage & performance issues
• Security: Enforce RBAC, network policies & image-scanning best practices
• DR & Optimization: Define backup/restore strategies and cost-control measures
• Collaboration: Partner with dev teams on containerization and CI/CD workflows
Required Qualifications
• 3-5 years in infrastructure, SRE, or DevOps roles
• Hands-on Kubernetes (cluster lifecycle, Helm, CRDs)
• Linux administration & Bash scripting; networking tools (ip, netstat, tcpdump)
• IaC with Terraform/Ansible; deep Docker knowledge
• Monitoring with Prometheus/Grafana & ELK
• Automation scripting in Bash, Python, or Go; Git proficiency; production debugging
Preferred Skills
• Managed K8s services (EKS/GKE/AKS)
• Advanced IaC/GitOps (Argo CD, Terraform, Helm)
• Service mesh (Istio, Linkerd)
• Container security (Trivy, Clair)
• Custom tooling via Bash/Python automation

Posted 3 weeks ago

8.0 - 13.0 years

50 - 85 Lacs

Noida

Work from Office

About the Role
We are seeking a highly skilled Staff Engineer to lead the architecture, development, and scaling of our Marketplace platform, including portals and core services such as Identity & Access Management (IAM), Audit, and Tenant Management. This is a hands-on technical leadership role where you will drive engineering excellence, mentor teams, and ensure our platforms are secure, compliant, and built for scale.
A Day in the Life
• Design and implement scalable, high-performance backend systems for all platform capabilities.
• Lead the development and integration of IAM, audit logging, and compliance frameworks, ensuring secure access, traceability, and regulatory adherence.
• Champion best practices for reliability, availability, and performance across all marketplace and core service components.
• Mentor engineers, conduct code/design reviews, and establish engineering standards and best practices.
• Work closely with product, security, compliance, and platform teams to translate business and regulatory requirements into technical solutions.
• Evaluate and integrate new technologies, tools, and processes to enhance platform efficiency, developer experience, and compliance posture.
• Take end-to-end responsibility for the full software development lifecycle, from requirements and design through deployment, monitoring, and operational health.
What You Need
• 8+ years of experience in backend or infrastructure engineering, with a focus on distributed systems, cloud platforms, and security
• Proven expertise in building and scaling marketplace platforms and developer/admin/API portals
• Deep hands-on experience with IAM, audit logging, and compliance tooling
• Strong programming skills in languages such as Python or Go
• Experience with cloud infrastructure (AWS, Azure), containerization (Docker, Kubernetes), and service mesh architectures
• Understanding of security protocols (OAuth, SAML, TLS), authentication/authorization, and regulatory compliance
• Demonstrated ability to lead technical projects and mentor engineering teams
• Excellent problem-solving, communication, and collaboration skills
• Proficiency in observability tools such as Prometheus, Grafana, and OpenTelemetry
• Prior experience with marketplaces and portals
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field

Posted 3 weeks ago

8.0 - 13.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Work from Office

Role: AWS DevOps Engineer
Platforms: AWS PaaS
Programming: Java
Monitoring tools: ThousandEyes, AppDynamics, CloudWatch, Grafana, Prometheus
Experience: Java development (coding/scripting, 5-10 years) plus AWS PaaS (minimum 3 years); SRE experience is an advantage.

Posted 3 weeks ago

5.0 - 8.0 years

12 - 18 Lacs

Bengaluru

Work from Office

Are you an experienced Platform Engineer looking for a new opportunity to showcase your skills and expertise? If so, Torry Harris is looking for you! We are currently seeking a skilled and motivated individual to join our team and play a critical role in streamlining and automating our cloud infrastructure. As a Senior Platform Engineer at Torry Harris, you will be responsible for designing, building, and maintaining scalable infrastructure that supports software development and deployment. The ideal candidate will have expertise in cloud technologies, automation, and DevOps practices.
Roles and Responsibilities
• Design and maintain scalable, resilient cloud infrastructure on any provider (AWS is recommended).
• Implement Infrastructure as Code (IaC) using Terraform, Ansible, or CloudFormation.
• Automate provisioning, monitoring, and self-healing mechanisms.
• Develop and enhance continuous integration and deployment pipelines.
• Develop and maintain Helm charts, Kubernetes manifests, and custom operators.
• Implement blue-green deployments, canary releases, and rollback mechanisms.
• Ensure fast, reliable software delivery while minimizing downtime.
• Integrate security scanning tools (SonarQube, Snyk) into CI/CD workflows.
• Ensure secure configurations, RBAC policies, and compliance with industry standards.
• Implement secrets management and identity access control in cloud environments.
• Deploy monitoring tools (Prometheus, Grafana, Datadog) for real-time observability.
• Lead root cause analysis and performance optimization for any platform-related issues.
• Ensure system reliability using automated alerting and logging mechanisms.
• Implement monitoring, logging, and alerting solutions for Kubernetes workloads.
• Troubleshoot and resolve issues related to container orchestration and networking.
• Stay up to date with Kubernetes ecosystem developments and recommend improvements.
• Mentor junior engineers and contribute to technical leadership within the DevOps team.
• Work closely with developers, platform engineers, and SREs to optimize workflows.
• Drive cross-functional collaboration to align DevOps strategies with business objectives.

Posted 3 weeks ago

3.0 - 6.0 years

15 - 20 Lacs

Pune, Gurugram, Bengaluru

Work from Office

Roles and Responsibilities
• Design and develop application health dashboards and alerting and notification delivery systems to support observability of the application stack in Azure cloud.
• Respond to incidents, perform root cause analysis, troubleshoot issues, and implement solutions to prevent recurrence.
• Act as gatekeeper for production deployments, participate in application release cycles, and perform production releases.
• Manage and maintain environments hosting Credit, Swaps & FX FO IT microservices and the data lake platform.
• Manage and maintain the lifecycle of the core application suite that provides common capabilities such as continuous deployment, observability, and Kafka streaming.
• Establish, deploy, and maintain CI/CD pipelines to automate build, test, and deployment processes, adhering to the firm's audit and compliance policies.
• Migrate on-prem build and deployment projects to the existing GitOps and cloud deployment pipeline patterns and branching policies.
• Assist development teams in containerising, building, and migrating on-prem applications to Azure cloud.
• Set up, manage, and maintain a central observability solution for on-prem and cloud.
• Identify areas that benefit from automation and build automated processes wherever possible.
• Collaborate with infra teams to provision and manage infra resources required by FO IT development teams in Azure cloud.
• Implement backup and disaster recovery strategies, participate in annual DR tests, and assist with executing the DR test plan.
• Create and maintain documentation on common issues, fixes, and deployment/release processes, and transfer knowledge among DevOps and support team members to remove any key-man dependencies.
Essential Criteria:
• 2 to 5 years of experience in an SRE/DevOps role, preferably in investment banking, with a solid understanding of both
• Strong knowledge of DevOps practices, tools, and technologies
• Experience working with, managing, and maintaining enterprise-scale production microservice environments and observability tools
• Strong knowledge of containerization and orchestration of microservices
• Experience with Docker/Podman, Helm, the Argo CD GitOps tool, and Terraform
• Experience with Azure Kubernetes Service, Azure Storage, and other Azure cloud technologies
• Experience with Prometheus, Grafana, Loki, Tempo, Grafana Agent, Azure Monitor, and other logging and observability tools
• Bamboo CI/CD tools, Bitbucket, Git
• Automation scripting (Bash, PowerShell, Python)
• Able to demonstrate a high level of professionalism, organisation, self-motivation, and a desire for self-improvement
• Ability to plan, schedule, and manage a demanding workload

Posted 3 weeks ago

8 - 12 years

18 - 20 Lacs

Noida

Hybrid

Role Overview
We are looking for a Lead Cloud Operations Engineer to join our growing team supporting key supply-side technology platforms, including Atlas Integration, GMX, Hotel APIs, and related microservices in Azure. This is a high-impact technical leadership role focused on Azure cloud operations, monitoring, performance, security, and incident resolution. You will be responsible for ensuring the availability, scalability, and reliability of cloud-hosted systems, mentoring a small operations team, and collaborating with developers, architects, and business stakeholders to drive continuous improvement.
Key Responsibilities
• Own day-to-day operations and health of production and pre-prod environments hosted in Azure.
• Monitor infrastructure and applications using Azure Monitor, Application Insights, and Grafana.
• Lead the team in proactive incident detection, triage, resolution, and post-incident reviews (RCA, documentation).
• Implement and enhance automation for common operational tasks using PowerShell, Python, Azure CLI, and Terraform/Ansible.
• Act as the escalation point for complex issues and high-severity incidents.
• Create, improve, and maintain runbooks, dashboards, alerts, and performance tuning metrics.
• Collaborate with development and DevOps teams to ensure operational readiness, deployment hygiene, and system resilience.
• Maintain strong governance around Azure resources, RBAC, policy enforcement, and tagging strategy.
• Lead disaster recovery planning, testing, and execution across critical systems.
• Drive cost optimization initiatives using Azure Cost Management and FinOps principles.
• Ensure compliance with security policies (ISO 27001, GDPR, SOC 2) and assist in audits or security reviews.
• Support team mentoring and training, and promote a strong culture of ownership and accountability.
Required Skills & Experience
• Azure IaaS: Virtual Machines, Scale Sets, Load Balancer, Disks, Networking (VNets, NSGs, UDRs, Private Links, Service Endpoints)
• Azure PaaS: App Services, Azure Functions, Logic Apps, Key Vault, Event Grid, Azure SQL, Application Gateway, Azure Front Door, Traffic Manager
• Azure Kubernetes Service (AKS) deployment, scaling, security, and troubleshooting
• Azure Site Recovery (ASR), Azure Backup, and disaster recovery architecture
• Deep understanding of Azure Monitor, Application Insights, and Log Analytics
• Ability to write and optimize KQL queries for diagnostics and dashboards
• Experience with Grafana, Prometheus, and alerting pipelines
• Hands-on experience with Terraform, Ansible, and ARM templates
• Proficiency in scripting with PowerShell, Bash, and/or Python
• Experience with Azure DevOps Pipelines or similar CI/CD tooling is a plus
• RBAC, Managed Identities, Conditional Access, Key Vault integration
• Awareness of ISO 27001, SOC 2, and GDPR requirements in cloud environments
• Proven experience leading 2-4 engineers (including juniors/mid-levels)
• Strong verbal and written communication skills; able to interact with technical and non-technical stakeholders
• Experience participating in on-call rotations, owning major incidents, and delivering RCA reports
• Ability to train, mentor, and guide junior engineers
• Collaborative mindset with a strong sense of accountability and urgency
Nice to Have
• Experience with multi-cloud (AWS or GCP) environments and hybrid cloud networking
• Experience working with microservices-based systems and APIs
• Exposure to FinOps practices and cloud cost management tools
• Certifications: AZ-305, AZ-104, AZ-500, AZ-700, AZ-400 preferred

Posted 4 weeks ago

10 - 13 years

18 - 25 Lacs

Bengaluru

Hybrid

Hiring, Lead Site Reliability Engineer with following skills and expertise. What will this person do? Provide leadership in designing and implementing reliable, scalable, and secure infrastructure solutions. Develop and maintain observability solutions, ensuring visibility into system performance using native Azure Cloud solutions. Define and track SLIs, ensuring compliance with SLOs and SLAs. Lead incident response efforts, conduct root cause analysis, and implement preventive measures to minimize downtime. Automate infrastructure provisioning, configuration and management using Terraform & Ansible. Build and maintain robust Observability pipelines to support automated deployments and continuous monitoring practices. Continuously analyze system health and optimize performance by identifying and resolving bottlenecks. Work with our BCDR team to minimize business impact during failures and measure the quality of services. Work with Cloud Governance team to monitor cloud infrastructure spending and implement cost-saving strategies. Implement centralized logging, metric collection, and distributed tracing for troubleshooting and debugging. Deploy, Manage and Monitor containerized workloads. Maintain configuration consistency and compliance across cloud environments using tools like Ansible. Partner with software development teams to integrate reliability best practices into the application development lifecycle. Conduct detailed post-mortems, document learnings, and drive improvements to reduce future incidents. Develop automation scripts in Python, Bash, or other languages to reduce manual efforts and improve efficiency. Provide mentorship to junior engineers, fostering a culture of learning and continuous technical growth. Research and evaluate new technologies, tools, and methodologies to improve system reliability and efficiency. Maintain detailed documentation on infrastructure, monitoring setups, incident responses, and best practices. 
Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field. 10+ years in observability, DevOps, and Site Reliability Engineering (SRE). At least 2 years of experience defining observability KPIs for both on-premises and cloud environments. Strong experience with cloud platforms (AWS, Azure, GCP) and cloud-native technologies. Passion for automation, reducing toil, and implementing reliability-focused best practices. Deep knowledge of services and tools such as Grafana, Power BI, Prometheus, Azure Monitor, Application Insights, and Azure Metrics. Expertise in Terraform, Ansible, Chef, CI/CD pipeline tools such as GitHub Actions and Jenkins, and GitOps methodologies. Working understanding of load balancing, authentication (AAA), encryption, and network parameter monitoring. Strong troubleshooting skills and experience handling on-call incidents and post-mortem analysis. Ability to work cross-functionally, drive technical discussions, and mentor junior engineers. Ability to work in a dynamic team environment, with the time management skills to meet deadlines. Sense of ownership and pride in your performance and its impact on the company's success. Critical thinker with strong problem-solving skills. Good interpersonal and communication skills.

Posted 1 month ago

Apply

21 - 31 years

50 - 70 Lacs

Bengaluru

Work from Office

What we're looking for: As a member of the infrastructure team at SurveyMonkey, you will have a direct impact on designing, engineering, and maintaining our Cloud, Messaging, and Observability Platform. You will define solutions using best practices, deployment processes, and architecture, and support the ongoing operation of our multi-tenant AWS environments. This role presents a prime opportunity to build world-class infrastructure, solve complex problems at scale, learn new technologies, and mentor other engineers. What you'll be working on: Architect, build, and operate AWS environments at scale using well-established industry best practices. Automate infrastructure provisioning, DevOps, and/or continuous integration/delivery. Provide Technical Leadership & Mentorship: Mentor and guide senior engineers to build technical expertise and drive a culture of excellence in software development. Foster collaboration within the engineering team, ensuring the adoption of best practices in coding, testing, and deployment. Review code and provide constructive feedback to ensure code quality and adherence to architectural principles. Collaboration & Cross-Functional Leadership: Collaborate with cross-functional teams (Product, Security, and other Engineering teams) to drive the roadmap and ensure alignment with business objectives. Provide technical leadership in meetings and discussions, influencing key decisions on architecture, design, and implementation. Innovation & Continuous Improvement: Propose, evaluate, and integrate new tools and technologies to improve the performance, security, and scalability of the cloud platform. Drive initiatives to optimize cloud resource usage and reduce operational costs without compromising performance. Write libraries and APIs that give other developers a simple, unified interface to our monitoring, logging, and event-processing systems. Participate in the on-call rotation.
Support and partner with other teams to improve our observability systems for monitoring site stability and performance. We'd love to hear from people with: 12+ years of relevant professional experience with cloud platforms such as AWS and Heroku. Extensive experience leading design sessions and evolving well-architected environments in AWS at scale. Extensive experience with Terraform, Docker, Kubernetes, Helm, and scripting (Bash/Python/YAML). Experience with Splunk, OpenTelemetry, CloudWatch, or tools like New Relic, Datadog, Grafana/Prometheus, or ELK (Elasticsearch/Logstash/Kibana). Experience with metrics and logging libraries and aggregators, and with data analysis and visualization tools, specifically Splunk and OTel. Experience instrumenting PHP, Python, Java, and Node.js applications to send metrics, traces, and logs to third-party observability tooling. Experience with GitOps and tools like ArgoCD/FluxCD. Interest in instrumentation and optimization of Kubernetes clusters. Ability to listen and partner to understand requirements, troubleshoot problems, or promote the adoption of platforms. Experience with GitHub/GitHub Actions/Jenkins/GitLab in either a software engineering or DevOps environment. Familiarity with database, caching, and data-streaming technologies, including PostgreSQL, MongoDB, Elasticsearch, Memcached, Redis, Kafka, and Debezium. Experience with secrets management (for example, HashiCorp Vault) is preferred. Experience in an agile environment and with JIRA is preferred. SurveyMonkey believes in-person collaboration is valuable for building relationships, fostering community, and enhancing our speed and execution in problem-solving and decision-making. As such, this opportunity is hybrid and requires you to work from the SurveyMonkey office in Bengaluru 3 days per week.

Posted 1 month ago

Apply

Exploring Prometheus Jobs in India

Prometheus is an open-source monitoring and alerting toolkit widely used in DevOps and software development. Demand for professionals with Prometheus expertise is rising in India, so job seekers looking to build a career in this field have a promising outlook.

Top Hiring Locations in India

  1. Bangalore
  2. Pune
  3. Hyderabad
  4. Mumbai
  5. Chennai

These cities have vibrant tech industries and high demand for professionals skilled in Prometheus.

Average Salary Range

Salaries for Prometheus professionals in India vary with experience. Entry-level positions typically pay around ₹5-8 lakhs per annum, while experienced professionals can earn ₹15-20 lakhs per annum or more; senior and lead roles, such as those listed above, can pay considerably higher.

Career Path

A typical career path in Prometheus may include roles such as:

  • Junior Prometheus Engineer
  • Prometheus Developer
  • Senior Prometheus Engineer
  • Prometheus Architect
  • Prometheus Consultant

As professionals gain experience and expertise, they can progress to higher roles with increased responsibilities.

Related Skills

In addition to Prometheus, professionals in this field are often expected to have knowledge of and experience with:

  • Kubernetes
  • Docker
  • Grafana
  • Time series databases
  • Linux system administration

Having a strong foundation in these related skills can enhance job prospects in the Prometheus domain.

Interview Questions

  • What is Prometheus and how does it differ from traditional monitoring systems? (basic)
  • Explain the architecture of Prometheus. (medium)
  • How do you set up alerting in Prometheus? (medium)
  • What are exporters in Prometheus and why are they important? (basic)
  • How can you visualize data collected by Prometheus? (medium)
  • Explain the concept of time series data and how it is used in Prometheus. (medium)
  • How does Prometheus store its data? (medium)
  • What is the role of PromQL in Prometheus? (medium)
  • How can you troubleshoot performance issues using Prometheus? (medium)
  • Describe the process of setting up Prometheus alerts. (medium)
  • What are the best practices for monitoring with Prometheus? (advanced)
  • How does federation work in Prometheus? (advanced)
  • Explain the role of relabeling in Prometheus configuration. (medium)
  • How can you secure Prometheus endpoints? (medium)
  • What is the role of service discovery in Prometheus? (basic)
  • Describe the benefits of using Prometheus for monitoring microservices. (medium)
  • How can you scale Prometheus for large deployments? (medium)
  • What are the limitations of Prometheus and how can they be mitigated? (medium)
  • Explain the concept of recording rules in Prometheus. (medium)
  • How can you monitor non-containerized applications with Prometheus? (medium)
  • Describe the process of backing up and restoring Prometheus data. (medium)
  • How do you handle high availability in Prometheus? (medium)
  • What are the common pitfalls to avoid when using Prometheus? (medium)
  • How can you integrate Prometheus with other monitoring tools or systems? (medium)
  • What trends do you see in the future of Prometheus and monitoring tools in general? (advanced)
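Several of the questions above (exporters, monitoring non-containerized applications, the time series data model) come down to what an exporter actually serves to Prometheus. As a study aid, here is a minimal sketch using only the Python standard library of a hypothetical exporter that renders one counter in the Prometheus text exposition format and can serve it on `/metrics`. The metric name `app_requests_total`, its labels, the port 8000, and the hard-coded values are illustrative assumptions; production exporters normally use an official client library such as `prometheus_client` rather than formatting text by hand.

```python
"""Hypothetical minimal exporter: renders one counter in the Prometheus
text exposition format (version 0.0.4). Illustrative sketch only."""

from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data: label set (tuple of (name, value) pairs) -> counter value.
# A real exporter would read these values from the system it monitors.
REQUEST_COUNTS = {
    (("method", "GET"), ("path", "/")): 42.0,
    (("method", "POST"), ("path", "/login")): 7.0,
}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def escape_help(text: str) -> str:
    """HELP text escapes only backslash and newline."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family: HELP and TYPE lines, then one line per series."""
    lines = [f"# HELP {name} {escape_help(help_text)}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # the last line must end with a line feed


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered counter on /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter(
            "app_requests_total", "Total HTTP requests handled.", REQUEST_COUNTS
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes at http://127.0.0.1:<port>/metrics."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    # Print what a scrape would return; call serve() to expose it over HTTP.
    print(render_counter("app_requests_total", "Total HTTP requests handled.", REQUEST_COUNTS), end="")
```

If `serve()` were running and a Prometheus scrape job pointed at `127.0.0.1:8000`, it would collect `app_requests_total` as two separate time series, one per label set. This also gives a concrete answer to the time series question: each unique combination of metric name and label values is its own series.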

Closing Remark

As you explore opportunities in the Prometheus job market in India, remember to continuously upgrade your skills and stay updated with the latest trends in monitoring and alerting technologies. With dedication and preparation, you can confidently apply for roles in this dynamic field. Good luck!

Featured Companies