Lead Site Reliability Engineer

8 - 13 years

25 - 30 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode: Work from Office

Job Type: Full Time

Job Description

Multiplier is seeking a highly skilled Lead Site Reliability Engineer (SRE) to join our engineering organization. This role is critical to scaling and hardening our infrastructure and ensuring the availability, reliability, and performance of our systems. The ideal candidate is an experienced engineer with a strong programming background, hands-on experience with modern observability stacks, and a deep understanding of incident management and system reliability practices.

What You Will Do / Key Responsibilities

  • Design, build, and evolve our observability and telemetry stack using tools such as Sentry, ELK, Coralogix, New Relic, Squadcast, and APM platforms
  • Implement and maintain logging, monitoring, alerting, and tracing infrastructure across services
  • Lead efforts in incident response, including coordination, resolution, and root cause analysis (RCA)
  • Define, monitor, and maintain SLIs, SLOs, and SLAs to ensure service reliability and performance
  • Drive chaos engineering practices to proactively uncover system weaknesses and improve resilience
  • Conduct and lead postmortems and reliability reviews, focusing on continuous learning and improvement
  • Build proactive monitoring solutions to detect and remediate potential issues before they impact customers
  • Collaborate with engineering, security, and IT teams to ensure end-to-end system reliability
  • Mentor junior SREs / CREs and contribute to defining best practices across the organization

Required Qualifications

  • 8+ years of experience in SRE, Production Engineering, or backend engineering roles
  • Must have: proficiency in at least one modern programming language (e.g., Go, Python, Java)
  • Deep understanding of observability principles (logging, metrics, traces) and hands-on experience with Sentry, ELK, Coralogix, New Relic, or equivalent tools
  • Experience designing and operationalizing SLIs/SLOs/SLAs
  • Strong knowledge of incident management frameworks and experience leading high-severity incident response
  • Experience conducting post-incident reviews and driving reliability improvements
  • Familiarity with chaos engineering tools and practices (e.g., Gremlin, Chaos Mesh, Chaos Monkey)
  • Proven track record of improving system uptime, reliability, and performance

Preferred Qualifications

  • Experience with containerization/ECS, cloud-native infrastructure (AWS), and service mesh technologies. Kubernetes experience is good to have.
  • Prior experience in a high-scale production environment
  • Certifications in SRE, DevOps, or cloud platforms (e.g., AWS Certified DevOps Engineer)
