Posted: 1 day ago | Platform: Naukri


Work Mode: Work from Office

Job Type: Full Time

Job Description

About the Role

We are looking for a highly motivated and experienced Site Reliability Engineer (SRE) to join our team on a permanent basis. As an SRE, you will be responsible for designing, building, and maintaining scalable, reliable, and secure systems. You will play a critical role in infrastructure automation, observability, and DevOps best practices, while collaborating with developers and operations teams to enhance overall system performance and resilience.

This is a hands-on role for a Senior SRE/Lead-level professional with the ability to work independently and take ownership of initiatives across the entire system lifecycle, from planning to implementation and ongoing support.

Key Responsibilities

  • Architect, build, and manage highly available and scalable systems across cloud infrastructure (AWS).
  • Develop and maintain Infrastructure as Code (IaC) using Terraform with advanced automation practices.
  • Design, configure, and manage Kubernetes (EKS), Helm charts, and containerized applications.
  • Implement, monitor, and optimize CI/CD pipelines (GitHub Actions, Octopus Deploy).
  • Establish robust observability solutions (Prometheus, Grafana, OpenTelemetry, Loki, Tempo, Mimir).
  • Automate manual processes end-to-end to ensure minimal human intervention.
  • Support and optimize distributed systems, including MongoDB, Kafka, and PostgreSQL.
  • Collaborate closely with developers on the Developer Experience project, enabling self-service infrastructure provisioning.
  • Write and maintain automation scripts and internal tools using Python, Bash, PowerShell, C#, and Golang (bonus).
  • Ensure security, reliability, and performance throughout the full system lifecycle: planning, deployment, monitoring, and ongoing support.

Required Skills & Experience

  • 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
  • Strong expertise in Terraform and Infrastructure as Code practices.
  • Hands-on experience with Kubernetes (EKS), Helm, and container orchestration.
  • Strong knowledge of databases and messaging systems: MongoDB, Kafka, PostgreSQL.
  • Proficiency with CI/CD pipelines (GitHub Actions, Octopus Deploy).
  • Experience implementing monitoring and observability solutions (Prometheus, Grafana, OpenTelemetry, Loki, Tempo, Mimir).
  • Solid scripting/programming skills in Python, Bash, and PowerShell (C# and Golang skills are a bonus).
  • Ability to work independently, manage priorities, and deliver solutions with minimal supervision.
  • Strong problem-solving, troubleshooting, and analytical skills.

Preferred Qualifications

  • Experience working in a global IT services environment.
  • Strong knowledge of cloud-native architectures and DevOps best practices.
  • Prior experience in Developer Experience initiatives.
  • A mindset of continuous improvement, an automation-first approach, and a focus on operational excellence.