VP – Site Reliability Engineering (SRE)

5 years

0 Lacs

Posted: 1 week ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

We’re on an exciting journey with our client and we want you to join us. With our client, you will be exposed to the latest technologies and work with some of the brightest minds in the industry.


Roles & Responsibilities

This role offers the opportunity to shape the SRE function within CTO and to be a founding member of the Group SRE team. The successful candidate will work with a small number of SREs in the platform engineering and operations teams, as well as with the wider infrastructure teams for both public cloud and on-premises platforms. They will also play a significant role in setting the direction for automation and operations engineering across the organisation.


  • Help define, drive and implement the SRE strategy
  • Promote an “Automate-first” culture in operating services, through the reduction of toil
  • Develop methodologies and strategies for identifying toil-heavy and inefficient processes, and for automating away or eliminating the toil, delay and redundancy in them
  • Assist in developing engineering and operational service metrics with actionable plans to improve operational efficiency, enhance service quality/SLA, and optimize delivery
  • Working with all parties, develop and implement SLOs for critical services
  • Define monitoring strategy with Engineering and implement appropriate capabilities
  • Design and implement reliability improvements
  • Conduct capacity planning
  • Perform chaos engineering exercises
  • Lead architectural reviews for reliability
  • Drive continuous improvement from incidents
  • Contribute to the Test and Deployment processes, ensuring that they are as reliable and automated as possible


Skills and qualifications


  • A bachelor’s degree or higher in computer science, information systems, or a related field, or equivalent work experience
  • Hands-on SRE practitioner with 5+ years of working experience in an SRE role
  • Practical experience defining and implementing Service Level Objectives, and operating to Error Budgets
  • Experience implementing and operating monitoring and observability technologies for a wide range of enterprise-grade production systems
  • Experience with a corporate software development lifecycle methodology; some experience implementing GitOps is a plus
  • Demonstrates a strong understanding of how technical systems work and interact
  • Strong analytical skills and a solid understanding of all critical Production Support processes
  • 2+ years of experience with one or more public or private cloud platforms (e.g. AWS, Azure)


Knowledge Required

  • Comprehensive understanding of SRE principles and the ability to evangelise them
  • Working knowledge of modern observability tooling, including OpenTelemetry, Prometheus, Grafana, and associated projects
  • Experience of Infrastructure as Code (IaC) principles and design
  • Extensive knowledge of Configuration Management Solutions such as Ansible, Chef or Puppet
