Site Reliability Engineer (SRE)

4 - 9 years

20 - 27 Lacs

Posted: 4 hours ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

We are seeking a passionate and skilled Site Reliability Engineer (SRE) to join our team. In this role, you will ensure high availability, performance, and security of our systems while proactively identifying and resolving reliability issues. You will be responsible for monitoring, troubleshooting, automation, and building resilient infrastructure that supports millions of users globally.

Key Responsibilities

  • Monitor, troubleshoot, and resolve live-site issues to maintain uptime, performance, and security.
  • Define and manage SLIs, SLOs, and error budgets to ensure reliable user experiences.
  • Consolidate infrastructure monitoring and alerting into unified systems (e.g., Prometheus + Alertmanager) while enhancing alerts with contextual information (dashboards, runbooks, severity levels).
  • Continuously improve infrastructure by upgrading and patching OS, databases, networking, and related components.
  • Optimize on-call processes and lead incident response, root-cause analysis, and post-mortems.
  • Build self-healing systems, automate repetitive/manual tasks, and proactively identify opportunities to improve uptime.

What You Will Bring

  • Strong SRE mindset: proactive in spotting problems, performance bottlenecks, and areas for improvement.
  • Hands-on expertise with observability tools and strong troubleshooting skills in distributed systems.
  • Ability to work in a fast-paced, results-driven environment that demands operational excellence.
  • Strong problem-solving skills with a track record of developing and implementing solutions.
  • Excellent organizational and multitasking skills to handle multiple complex priorities under tight deadlines.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.
  • 2+ years of experience managing distributed systems and web applications with high uptime requirements (10M+ users preferred).
  • Proficiency in Linux and LAMP stack environments.
  • Experience with observability tools (e.g., Prometheus, Grafana, New Relic, CloudWatch, ELK, Zabbix).
  • Experience with Infrastructure as Code (IaC) tools (e.g., Ansible, Terraform, Terragrunt).
  • Strong ownership mindset, bias for action, and ability to deliver results end-to-end.
  • Excellent written and verbal communication skills.

Preferred Qualifications

  • Familiarity with cloud computing and the AWS ecosystem.
  • Programming experience to automate infrastructure tasks.
  • Flexibility to work during off-schedule hours (evenings/weekends) if required.
