Director of Site Reliability Engineering

Five9

8 - 13 years

20 - 25 Lacs

Bengaluru

Posted: 2 months ago

Skills Required

automation, performance management, configuration management, disaster recovery, director, database administration, reliability engineering, incident management, monitoring, capacity planning

Work Mode

Work from Office

Job Type

Full Time

Job Description

This is a hands-on leadership role requiring deep technical expertise, proven ability to scale engineering organizations, and a track record of building reliable systems at scale. The ideal candidate will balance reliability with tactical execution, driving both immediate operational excellence and long-term architectural improvements where necessary.

Key Responsibilities

Strategic Leadership & Vision

Define and execute the long-term SRE strategy aligned with business objectives and technical roadmap
Establish reliability standards, SLI/SLO frameworks, and error budget policies across services
Drive architectural decisions that improve system reliability, scalability, and operational efficiency
Partner with engineering leadership to influence platform and application design for reliability
Represent SRE perspective in executive technical discussions and strategic planning

Team Leadership & Development

Build, lead, and scale a high-performing SRE organization
Recruit, hire, and onboard top-tier SRE talent across multiple experience levels
Develop career progression frameworks and growth paths for SRE professionals
Foster a culture of continuous learning, blameless post-mortems, and operational excellence
Provide technical mentorship and leadership development for senior SRE staff

Operational Excellence & Incident Management

Manage and oversee enterprise-wide incident response processes and on-call practices
Drive root cause analysis programs and ensure systematic elimination of failure modes
Implement sustainable on-call practices that maintain work-life balance while ensuring coverage
Oversee capacity planning and resource optimization strategies across all services
Establish metrics and reporting frameworks for reliability, performance, and operational health

Cross-Functional Partnership

Collaborate with VP/Director level peers in Engineering, Product, and Infrastructure
Work with Security leadership to integrate reliability and security practices
Partner with Finance on cost optimization initiatives and capacity planning budgets
Engage with Customer Success and Support teams on reliability-impacting issues

Platform & Tooling Strategy

Drive the simplification and reduction of observability, monitoring, and alerting platforms
Establish automation standards and drive toil reduction initiatives
Help improve CI/CD pipeline architecture and deployment practices
Influence infrastructure-as-code and configuration management strategies

Organizational & Process Innovation

Implement SRE best practices including error budgets, toil tracking, and reliability reviews
Establish metrics-driven decision making and continuous improvement processes
Drive adoption of chaos engineering and proactive reliability testing
Create and maintain SRE documentation, runbooks, and knowledge sharing systems
Develop and execute disaster recovery and business continuity plans

Required Skills

Leadership & Management Experience

Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience
8+ years in engineering leadership roles, with 4+ years managing managers
Proven track record of building and scaling engineering teams
Experience with performance management, career development, and succession planning
Strong executive presence and ability to influence without authority
Experience driving organizational change and cultural transformation

Technical Expertise

Experience with multiple cloud platforms (AWS, GCP, Azure) and hybrid environments
Deep understanding of distributed systems, microservices architecture, and cloud platforms
Hands-on experience with modern observability tools (Prometheus, Grafana, Datadog, etc.)
Strong background in infrastructure automation, CI/CD, and infrastructure-as-code
Expertise in capacity planning, performance optimization, and cost management

SRE & Operations Mastery

Deep understanding of SRE principles, practices, and implementation at scale
Experience establishing SLI/SLO frameworks and error budget management
Proven track record of improving system reliability and reducing operational toil
Experience with incident management, post-mortem processes, and reliability engineering
Background in 24/7 operations and on-call management best practices

Business & Strategic Acumen

Understanding of budget management, resource allocation, and ROI analysis
Ability to communicate technical concepts to non-technical stakeholders and executives
Experience with vendor management and technology partnership decisions
Knowledge of compliance frameworks and regulatory requirements

Desired Skills

Advanced Technical Background

Background in container orchestration (Kubernetes) and service mesh technologies
Knowledge of database administration and data platform reliability
Experience with security engineering and DevSecOps practices

Success Metrics

Reliability & Performance

Achieve and maintain service availability targets (typically 99.9%+ uptime)
Reduce mean time to detection (MTTD) and mean time to recovery (MTTR)
Improve capacity planning accuracy and reduce over-provisioning costs
Increase deployment frequency while maintaining reliability standards
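
As a rough illustration of what an availability target such as 99.9% implies in practice, the sketch below converts a target into an allowed-downtime budget and reports how much of that budget is left. The 30-day window and the function names are assumptions made for this example; they are not taken from the role description.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted in the window at the given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = window_days * 24 * 60  # assumed rolling window, in minutes
    return total_minutes * (1 - availability_pct / 100)


def budget_remaining(availability_pct: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = downtime_budget_minutes(availability_pct, window_days)
    if budget == 0:
        raise ValueError("a 100% target leaves no error budget to track")
    return 1 - downtime_minutes / budget


if __name__ == "__main__":
    # A 99.9% target over an assumed 30-day window
    print(round(downtime_budget_minutes(99.9), 1))  # 43.2
```

Under these assumptions, a 99.9% target allows roughly 43 minutes of downtime per 30 days, which is why shortening detection and recovery times bears directly on whether the target is met.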

Team & Organizational Development

Build and retain a high-performing SRE organization with low attrition
Establish clear career progression and achieve high employee satisfaction scores
Develop internal talent and promote from within the SRE organization
Create sustainable on-call practices with reasonable operational load

Operational Excellence

Drive measurable reduction in operational toil and manual interventions
Establish comprehensive observability and proactive alerting across all services
Implement effective incident response with blameless post-mortem culture
Achieve cost optimization targets while maintaining reliability standards


Five9

Cloud Computing / Software as a Service (SaaS)

San Ramon
