Senior Site Reliability Engineer

6 - 8 years

0 Lacs

Posted: 3 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Position Summary:

We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems. This is a hands-on role focused on designing, building, and operating reliable, observable, and scalable platforms on Kubernetes, preferably on Google Cloud Platform (GCP), with AWS as an alternative.

Job Responsibilities:

  • Design, implement, and operate highly available and resilient Kubernetes-based systems.
  • Define, monitor, and enforce SLIs, SLOs, and error budgets to ensure service reliability.
  • Lead incident response, root cause analysis (RCA), and postmortems, driving continuous improvement.
  • Architect and manage observability platforms for metrics, logging, tracing, and alerting.
  • Work hands-on with Prometheus, Alertmanager, OpenTelemetry, Grafana, and Loki / ELK / OpenSearch.
  • Implement cloud-native monitoring and logging, with preference for GCP Cloud Monitoring & Logging.
  • Establish actionable alerting standards to reduce noise and improve response effectiveness.
  • Build and manage cloud infrastructure on GCP (preferred) or AWS.
  • Operate and scale Kubernetes clusters (GKE preferred) and deploy services using Helm.
  • Manage containerized workloads using Docker.
  • Develop automation and internal tooling using Python to improve reliability and observability.
  • Integrate CI/CD pipelines with reliability and monitoring checks.
  • Mentor junior engineers, influence architectural decisions, and collaborate across engineering teams.

Required Skills and Qualifications:

  • 6+ years of experience as a DevOps Engineer, SRE, or related software engineering role, supporting production-grade systems.
  • Strong hands-on experience with cloud infrastructure on GCP (preferred) or AWS.
  • Proven expertise in operating Kubernetes-based platforms in production environments (GKE preferred).
  • Solid experience designing and maintaining highly available and resilient systems using SRE best practices.
  • Hands-on knowledge of SLIs, SLOs, error budgets, and reliability engineering principles.
  • Strong experience with observability and monitoring tools, including Prometheus, Grafana, Alertmanager, OpenTelemetry, and log platforms such as Loki / ELK / OpenSearch.
  • Demonstrated experience in incident management, on-call support, root cause analysis, and postmortems.
  • Proficiency in automation and tooling using Python, with additional scripting experience in Shell or Groovy.
  • Experience integrating CI/CD pipelines (Jenkins, GitHub) with deployment, monitoring, and reliability checks.
  • Strong understanding of microservices architectures, distributed systems, and containerized workloads.
  • Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Good knowledge of cloud networking, security fundamentals, and access controls.
  • Strong analytical and problem-solving skills with a proactive operational mindset.
  • Excellent communication skills and the ability to collaborate effectively with cross-functional engineering teams.
