Site Reliability Engineering Manager

2 - 31 years

17 Lacs

Bengaluru/Bangalore

Posted: 2 days ago | Platform: Apna

Work Mode

On-site

Job Type

Full Time

Job Description

Job Title: Site Reliability Engineering Manager
Location: [Add Location or Remote Option]

Job Overview

We are seeking an experienced Site Reliability Engineering (SRE) Manager to lead a high-performing team responsible for ensuring the reliability, scalability, and performance of our SASE cloud infrastructure. In this role, you will define the reliability strategy, mentor engineers, and collaborate across teams to deliver world-class cloud-native services with a focus on automation, observability, and operational excellence.

Key Responsibilities

Lead, mentor, and support a team of Site Reliability Engineers, driving their growth, performance, and well-being.
Own and execute the reliability strategy for SASE cloud infrastructure, including incident management, SLIs/SLOs, and capacity planning.
Collaborate with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services.
Guide the team in building automation, improving observability, and enhancing operational efficiency.
Establish and drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development.
Define and track key reliability metrics and report on system health and team performance to leadership.
Contribute to hiring, onboarding, and career development of SRE team members.
Foster a strong engineering culture based on ownership, collaboration, and continuous learning.

Required Qualifications & Skills

Experience: 9+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles, with at least 2 years in a leadership or managerial position.
Deep expertise in cloud platforms (AWS, GCP, or Azure) and cloud-native architectures.
Hands-on experience with Kubernetes, containers, Infrastructure as Code (e.g., Terraform), and configuration management tools.
Strong background in observability (monitoring, logging, tracing), automation using Python, and incident response.
Familiarity with CI/CD pipelines and automation tools.
Proven ability to scale SRE practices in high-growth or large-scale environments.
Excellent communication, stakeholder management, and team-building skills.
Ability to balance long-term reliability goals with short-term delivery needs.
