
95 Site Reliability Jobs - Page 4

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

3.0 - 8.0 years

16 - 20 Lacs

Mumbai

Work from Office

What will you do at Fynd?
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Be the first person to report an incident.
- Debug production issues across services and levels of the stack.
- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.
- Build automated tools in Python / Java / GoLang / Ruby etc.
- Help Platform and Engineering teams gain visibility into our infrastructure.
- Lead design of software components and systems to ensure availability, scalability, latency, and efficiency of our services.
- Participate actively in detecting, remediating, and reporting on production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.
- Participate in on-call rotation to ensure coverage for planned/unplanned events.
- Perform other tasks such as load testing and generating system health reports.
- Periodically check all dashboards for readiness.
- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and Engineering counterparts to drive Game days, training, and other response readiness efforts.
- Participate in the 24x7 support coverage as needed.
- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.
- Improve the scalability and reliability of our systems in production.
- Evaluate, design, and implement new system architectures.

Some specific requirements:
- B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience.
- At least 3 years of managing production infrastructure.
- Leading / managing a team is a huge plus.
- Experience with cloud platforms like AWS, GCP.
- Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and Serverless (Lambdas).
- Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).
- Comfortable with Python, Go, or any relevant programming language.
- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty etc.
- Experience with one or more orchestration and deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.
- Experience with configuration management systems such as Ansible / Chef / Puppet.
- Knowledge of load testing methodologies and tools like Gatling, Apache JMeter.
- Know your way around the Unix shell.
- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS.
- A focus on delivering high-quality code through strong testing practices.

Posted 3 months ago


6.0 - 10.0 years

13 - 17 Lacs

Hyderabad

Remote

Mode of Interview: 2-3 rounds (virtual/in person)
Notice: Immediate to 15 days max
Technical Skill Requirements: ServiceNow Business Analyst, ITIL, ITSM, Dashboard Creation, APM, Scripting, Datadog

Role and Responsibilities:
- 6+ years of experience as an SRE engineer, with thorough knowledge of ITIL/ITSM processes
- Certification in the ITIL v4 framework and deep knowledge of ITSM platforms preferable
- Hands-on experience with the APM tool Datadog
- Demonstrable ability to implement complex process workflows and to evidence performance through metrics-driven reporting
- Strong understanding of IT Operations
- Strong written and verbal communication skills, with the ability to understand and present complex technical information clearly and concisely to a variety of audiences, including executive leadership
- Ability to develop strategic relationships with other teams, departments, business stakeholders, and third parties
- Ability to understand business requirements and define KPIs that showcase the stability of the application in production and give meaningful insights to the business
- Proven troubleshooting experience and a strong focus on incident reduction
- Ability to surface recurring issues and toil and to suggest automations
- Strong problem-solving skills and the ability to think quickly and execute on short time frames

Posted 3 months ago


8.0 - 13.0 years

15 - 25 Lacs

Hyderabad

Work from Office

Greetings from AIS!! AIS (Applied Information Sciences) is a highly regarded software and systems engineering firm providing professional application development services to commercial and government clients since 1982. One of Microsoft's oldest and largest Managed Gold partners in the U.S., AIS is exclusively focused on building enterprise-class custom applications using Microsoft technologies. As we continue to experience extraordinary growth, we are seeking professionals to join our AIS Team in India. For more information, please visit: http://www.ais.com https://www.ais.com/blog/

Job Summary:
Role: Site Reliability Engineer
Mode of Hire: Full-time / Contract opportunity

Responsibilities
The Site Reliability Engineer will bring enhanced reliability, performance, and security to the project.
- Implementing comprehensive monitoring solutions to track system performance, detect anomalies, and prevent outages
- Setting up real-time alerts to quickly respond to issues, minimizing downtime and ensuring continuous service availability
- Automating routine tasks such as deployments, backups, and scaling, which reduces manual intervention and increases efficiency
- Integrating Continuous Integration/Continuous Deployment (CI/CD) pipelines to streamline the development and deployment process
- Optimizing the use of cloud resources to ensure cost-effectiveness and high performance
- Implementing load balancing strategies to distribute traffic evenly and prevent bottlenecks
- Applying security best practices to protect sensitive data and ensure compliance with regulatory requirements
- Regularly scanning for and addressing vulnerabilities to maintain a secure environment
- Developing and implementing incident response plans to quickly address and resolve issues
- Establishing disaster recovery protocols to ensure data integrity and service continuity in case of failures
- Working closely with development, operations, and business teams to align technical solutions with business goals
- Creating detailed documentation and providing training to ensure all team members are equipped to handle the system

Requirements
- Oracle Cloud Infrastructure experience
- Proficiency in Oracle databases, including performance tuning and optimization
- Scripting skills in JSON and Python
- Familiarity with CI/CD pipelines to ensure smooth deployments
- Understanding of security principles and practices to protect data and systems
- Knowledge of regulatory requirements and how to implement them within Oracle Cloud
- Ability to work effectively with cross-functional teams, including developers and operations
- Communication skills: strong verbal and written communication skills to articulate technical issues and solutions

If you are interested, please reply to meghana.mandhala@ais.com

Thanks & Regards,
Meghana Reddy M
Sr. Talent Acquisition Business Partner

Posted 3 months ago


9.0 - 14.0 years

20 - 35 Lacs

Bengaluru

Work from Office

Lead automation and expense management initiatives across global network platforms. Ensure secure, cost-effective operations, enhance reliability via SRE practices, and oversee vendor TEM performance, reporting, and billing accuracy.

Required candidate profile: Experience in network automation, CI/CD, and cost governance. Skilled in SRE, telecom expense management, circuit cleanup, vendor coordination, and performance reporting using Power BI and Microsoft 365.

Posted 3 months ago


10.0 - 18.0 years

30 - 45 Lacs

Bengaluru

Work from Office

Lead and support RF, Voice/IPT, telephony, and mobile infrastructure globally. Drive innovation, reliability, and automation across network platforms, ensuring secure, scalable, and high-performance communication systems.

Required candidate profile: Experienced in RF design, VoIP/IPT systems, UC tools, wireless/mobility, and SRE practices. Skilled in Tier-3 support, automation, and vendor management.

Posted 3 months ago


5 - 10 years

7 - 12 Lacs

Bengaluru

Work from Office

Engineering Manager - Site Reliability

The role of Engineering Manager - Site Reliability is primarily to manage, mentor, and develop a team of Site Reliability Engineers, ensuring that the development of both the individuals and the team as a whole is in line with organizational objectives and direction. Manages all activities in scope through the direction of activities, to design new products and modify existing designs, ensuring that deliverables are on time and of acceptable quality. The role holder is required to analyze technology trends, human resource needs, and market demand to plan projects to ensure resilience in line with current demand and future ambition. In addition to this, the role will confer with leaders, production, key stakeholders, and marketing teams to determine engineering feasibility, cost effectiveness, scalability, and time-to-market for new and existing products.

FinTech is a complex, competitive, and exciting industry. To accomplish Booking.com's mission (making it easier for everyone to experience the world), we aim to offer frictionless payment experiences to our guests and partners. The FinTech business unit creates best-in-class payment products that offer choice to guests and help Booking's business partners grow their business.

What you'll be doing:

Managing People
- Inspire, grow, and develop individuals by helping the creation of their personal development plan, leveraging available learning resources, and offering stretch opportunities.
- Get things done in the right way by taking ownership, being proactive, and collaborating with business counterparts, peers, other craft managers, and stakeholders.
- Ensure delivery by tracking team health metrics and KPIs, monitoring roadmap progress, identifying blockers, and resolving or escalating them.

End to End System Ownership
- Own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics, and acting accordingly when they are violated.
- Reduce business continuity risks and bus factor by applying state-of-the-art practices and tools, and writing the appropriate documentation such as runbooks and OpDocs.
- Independently manage an application or service by working through deployment and operations in production, and guide more junior members of the team in this topic.

Technical Incident Management
- Address and resolve live production issues by mitigating the customer impact within SLA.
- Improve the overall reliability of systems by producing long-term solutions through root cause analysis.
- Keep track of incidents by contributing to postmortem processes and logging live issues.

Building software applications
- Build software applications by using relevant development languages and applying knowledge of systems, services, and tools appropriate for the business area.
- Write readable and reusable code by applying standard patterns and using standard libraries.
- Refactor and simplify code by introducing design patterns when necessary.
- Ensure the quality of the application by following standard testing techniques and methods that adhere to the test strategy.
- Maintain data security, integrity, and quality by effectively following company standards and best practices.

Architectural Guidance
- Have sufficient knowledge to advise product teams towards a technical solution that meets the functional, non-functional, and architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape.
- Set a clear direction for a technical capability by evaluating and aligning the target architecture improvements, and reframing architectural designs and decisions for varied stakeholders.

What you'll bring:
- Strong people management skills and experience
- Excellent communicator with strong stakeholder management experience, good commercial awareness, and technical vision
- You are a humble and thoughtful technology leader; you lead by example and gain your teammates' respect through actions, not the title
- Experience in software development, building complex and scalable solutions
- Proven experience leading and managing a team of engineers in a fast-paced and complex environment
- Solid experience in at least one programming language (Java, C/C++, Python, Go)
- Ability to formulate software solutions from scratch
- Solid understanding of Service Oriented Architecture, Microservices, and OOP patterns
- Hands-on experience in Linux administration and troubleshooting
- Creative approach to problem-solving
- Practical experience in understanding and defining SLIs and SLOs
- Past experience with Payments or FinTech and working in a regulated environment is a plus
- Strong analytical skills and data-driven mindset

Posted 4 months ago


3 - 8 years

19 - 22 Lacs

Kolkata, Hyderabad, Pune

Work from Office

Experienced in .NET (3–5 yrs), DevOps/SRE (3+ yrs), CI/CD, Git, IaC, Agile, cloud-native apps, observability, KQL/SQL, and cross-functional DevOps solutions in production environments. Mail: kowsalya.k@srsinfoway.com

Posted 4 months ago


5 - 9 years

22 - 27 Lacs

Pune, Chennai, Bengaluru

Hybrid

Hiring for the position below. Immediate joiners or 15 days' notice.

Job Title: Senior .NET Developer
Experience: 5 - 9 years
Job Location: Pan India (Hybrid)

Key Requirements:
- Proficiency in writing production code with an industry-standard programming language using Agile methodologies
- Proficiency practicing Infrastructure as Code and Configuration as Code techniques
- Proficiency managing multiple code bases in Git
- Proficiency creating Continuous Integration builds and deployment automation, for example CI/CD pipelines
- Proficiency building cloud-native applications in a major public cloud
- Proficiency implementing observability, application monitoring, and log aggregation solutions
- Proficiency working with cross-functional teams to provide DevOps-inspired solutions

Delivery Insights Team Specific Skills:
- Experience in building customer-facing data insights and reporting that span the enterprise
- Proficiency with the Grafana Cloud stack. Comfortable configuring various Grafana Cloud components, including data sources, permissions, and the expanded feature set
- Proficiency with Kusto Query Language (KQL). Building and using complex queries that include various merge, join, and sort operations. Equivalent SQL syntax knowledge will be accepted for certain applicants
- Experience in Azure Function Apps. Building, supporting, and operating a modern .NET code base across the entire development life cycle
- Experience in Azure SQL or Postgres database systems
- Experience in various components of Azure DevOps
- Webhook configuration and creation
- REST API knowledge and ability to interpret reporting needs directly to data availability
- Comfort with how teams use Azure DevOps to complete the SDLC process, including work item management, repositories, pipelines, and access control

If you are interested, please share your updated CV at aashifjabarulla@tsit.co.in or kousalya.v@tsit.co.in, +91 9047052352

Posted 4 months ago


5.0 - 10.0 years

8 - 12 Lacs

Surat

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available


5.0 - 10.0 years

8 - 12 Lacs

Chennai

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available


5.0 - 10.0 years

8 - 12 Lacs

Kolkata

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges.

Posted Date not available


5.0 - 10.0 years

7 - 11 Lacs

Jaipur

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available


5.0 - 10.0 years

7 - 11 Lacs

Bengaluru

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available


4.0 - 5.0 years

8 - 11 Lacs

Gurugram

Work from Office

Position Overview : We are seeking an SRE to join our high-impact platform engineering team. You will maintain SLAs for real-time services deployed across hybrid clouds and Kubernetes clusters, contributing to automation, observability, and availability goals. Roles and Responsibilities : - Monitor application and infrastructure metrics; build dashboards and alerts (Prometheus, Grafana, ELK). - Automate health checks, incident remediation, and reliability guardrails. - Manage on-call rotations, conduct root cause analysis, and implement postmortem action plans. - Define and track SLOs, SLIs, and error budgets. - Use chaos engineering and resilience testing to ensure fault tolerance. Must Have Skills : - 4 - 5 years of experience in managing production-grade Kubernetes clusters and cloud-native platforms. - Proficiency in Linux system internals, containers, and networking. - Scripting/automation expertise in Python/Go/Shell. - Familiarity with incident management, runbooks, and observability standards. - Exposure to service discovery, DNS routing, and load balancing is a bonus. Qualification : BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field or equivalent practical experience.

Posted Date not available


5.0 - 10.0 years

8 - 12 Lacs

Pune

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available


5.0 - 10.0 years

8 - 12 Lacs

Gurugram

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine. Key Responsibilities : - Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps) - Manage observability tools : logs, metrics, traces, and alerts - Tune backend services & GKE workloads (Node.js, Django, Go, Java) - Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets) - Lead incident responses & perform root cause analyses - Standardize secrets, tagging & infra consistency across environments - Enhance CI/CD pipelines & collaborate on better rollout strategies Must-Have Skills : - 5-10 years in DevOps / SRE / Infra roles - Kubernetes (GKE preferred) - IaC with Terraform & Helm - CI/CD : GitHub Actions + GitOps (ArgoCD / Flux) - Cloud architecture expertise (IAM, VPC, Secrets) - Strong scripting/coding & backend debugging skills (Node.js, Django, etc.) - Incident management with tools like Datadog & PagerDuty - Excellent communicator & documenter Tech Stack : - GKE, Kubernetes, Terraform, Helm - GitHub Actions, ArgoCD / Flux - Datadog, PagerDuty - CloudSQL, Cloudflare, IAM, Secrets You're : - A proactive team player & strong individual contributor - Confident yet humble - Curious, driven & always learning - Not afraid to solve deep infrastructure challenges

Posted Date not available

Apply

5.0 - 10.0 years

8 - 12 Lacs

Mumbai

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine.

Key Responsibilities:
- Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps)
- Manage observability tools: logs, metrics, traces, and alerts
- Tune backend services & GKE workloads (Node.js, Django, Go, Java)
- Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets)
- Lead incident responses & perform root cause analyses
- Standardize secrets, tagging & infra consistency across environments
- Enhance CI/CD pipelines & collaborate on better rollout strategies

Must-Have Skills:
- 5-10 years in DevOps / SRE / Infra roles
- Kubernetes (GKE preferred)
- IaC with Terraform & Helm
- CI/CD: GitHub Actions + GitOps (ArgoCD / Flux)
- Cloud architecture expertise (IAM, VPC, Secrets)
- Strong scripting/coding & backend debugging skills (Node.js, Django, etc.)
- Incident management with tools like Datadog & PagerDuty
- Excellent communicator & documenter

Tech Stack:
- GKE, Kubernetes, Terraform, Helm
- GitHub Actions, ArgoCD / Flux
- Datadog, PagerDuty
- CloudSQL, Cloudflare, IAM, Secrets

Posted Date not available

Apply

5.0 - 10.0 years

8 - 12 Lacs

Ahmedabad

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine.

Key Responsibilities:
- Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps)
- Manage observability tools: logs, metrics, traces, and alerts
- Tune backend services & GKE workloads (Node.js, Django, Go, Java)
- Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets)
- Lead incident responses & perform root cause analyses
- Standardize secrets, tagging & infra consistency across environments
- Enhance CI/CD pipelines & collaborate on better rollout strategies

Must-Have Skills:
- 5-10 years in DevOps / SRE / Infra roles
- Kubernetes (GKE preferred)
- IaC with Terraform & Helm
- CI/CD: GitHub Actions + GitOps (ArgoCD / Flux)
- Cloud architecture expertise (IAM, VPC, Secrets)
- Strong scripting/coding & backend debugging skills (Node.js, Django, etc.)
- Incident management with tools like Datadog & PagerDuty
- Excellent communicator & documenter

Tech Stack:
- GKE, Kubernetes, Terraform, Helm
- GitHub Actions, ArgoCD / Flux
- Datadog, PagerDuty
- CloudSQL, Cloudflare, IAM, Secrets

You're:
- A proactive team player & strong individual contributor
- Confident yet humble
- Curious, driven & always learning
- Not afraid to solve deep infrastructure challenges

Posted Date not available

Apply

5.0 - 10.0 years

8 - 12 Lacs

Hyderabad

Work from Office

We are on the lookout for a hands-on DevOps / SRE expert who thrives in a dynamic, cloud-native environment! Join a high-impact project where your infrastructure and reliability skills will shine.

Key Responsibilities:
- Design & implement resilient deployment strategies (Blue-Green, Canary, GitOps)
- Manage observability tools: logs, metrics, traces, and alerts
- Tune backend services & GKE workloads (Node.js, Django, Go, Java)
- Build & manage Terraform infra (VPC, CloudSQL, Pub/Sub, Secrets)
- Lead incident responses & perform root cause analyses
- Standardize secrets, tagging & infra consistency across environments
- Enhance CI/CD pipelines & collaborate on better rollout strategies

Must-Have Skills:
- 5-10 years in DevOps / SRE / Infra roles
- Kubernetes (GKE preferred)
- IaC with Terraform & Helm
- CI/CD: GitHub Actions + GitOps (ArgoCD / Flux)
- Cloud architecture expertise (IAM, VPC, Secrets)
- Strong scripting/coding & backend debugging skills (Node.js, Django, etc.)
- Incident management with tools like Datadog & PagerDuty
- Excellent communicator & documenter

Tech Stack:
- GKE, Kubernetes, Terraform, Helm
- GitHub Actions, ArgoCD / Flux
- Datadog, PagerDuty
- CloudSQL, Cloudflare, IAM, Secrets

You're:
- A proactive team player & strong individual contributor
- Confident yet humble
- Curious, driven & always learning
- Not afraid to solve deep infrastructure challenges

Posted Date not available

Apply

6.0 - 11.0 years

6 - 16 Lacs

Pune, Thiruvananthapuram

Hybrid

Automation and Optimization: Develop automation scripts and tools to streamline IAM operations, including provisioning, de-provisioning, and access management. Optimize system configurations and processes to improve efficiency and reduce manual intervention.

Incident Management and Response: Lead incident response efforts for IAM-related issues, including root cause analysis and resolution. Implement strategies to minimize downtime and ensure rapid recovery in the event of system failures.

Collaboration and Communication: Work closely with development, operations, and security teams to ensure seamless integration and operation of IAM solutions. Communicate effectively with stakeholders regarding system status, incidents, and improvements.

Continuous Improvement: Identify opportunities for improvement in system reliability and performance, and implement solutions to address them. Stay current with industry trends and emerging technologies in IAM and site reliability engineering.

Posted Date not available

Apply

Featured Companies