1 - 3 years

6 - 12 Lacs

Posted: 21 hours ago | Platform: Naukri

Work Mode: Work from Office

Job Type: Full Time

Job Description

Job Summary

Site Reliability Engineers (SREs) work at the intersection of software engineering and systems administration. In other words, they can both write code and manage the infrastructure on which that code runs. This is a very wide skill set, but the end goal of an SRE is always the same: to ensure that all SLAs are met, but not exceeded, so as to balance performance and reliability with operational costs.

As a Site Reliability Engineer I, you will learn our systems, improve your craft as an engineer, and take on tasks that improve the overall reliability of the VP platform.

Key Responsibilities:

  • Design, implement, and maintain robust monitoring and alerting systems.
  • Lead observability initiatives by improving metrics, logging, and tracing across services and infrastructure.
  • Collaborate with development and infrastructure teams to instrument applications and ensure visibility into system health and performance.
  • Write Python scripts and tools for automation, infrastructure management, and incident response.
  • Participate in and improve the incident management and on-call process, driving down Mean Time to Resolution (MTTR).
  • Conduct root cause analysis and postmortems following incidents and champion efforts to prevent recurrence.
  • Optimize systems for scalability, performance, and cost-efficiency in cloud and containerized environments.
  • Advocate and implement SRE best practices, including SLOs/SLIs, capacity planning, and reliability reviews.

Required Skills & Qualifications

  • 1+ years of experience in a Site Reliability Engineer or similar role.
  • Proficiency in Python for automation and tooling.
  • Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, New Relic, OpenTelemetry, etc.
  • Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd.
  • Good understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
  • Familiarity with infrastructure-as-code (Terraform, Ansible, or similar).
  • Strong debugging and incident response skills.
  • Knowledge of CI/CD pipelines and release engineering practices.

Cloud Angles Digital Transformation

Information Technology

Tech City

ahmedabad, gujarat, india