Lead Site Reliability Engineer (SRE) - Coimbatore

5 years

84 - 180 Lacs

Posted: 2 months ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Location: Coimbatore

Department: Cloud Managed Services

Experience: 5+ years

About The Role

We are seeking a highly skilled Lead Site Reliability Engineer (SRE) with expertise in AWS Cloud, Kubernetes, and Infrastructure Automation to join our team. The ideal candidate will have extensive experience in designing and operating highly available, resilient, and scalable systems. As a Lead SRE, you will drive reliability initiatives, mentor engineers, and collaborate with cross-functional teams to ensure operational excellence.

Key Responsibilities

Reliability & Operations

  • Own system reliability, availability, and performance across critical services.
  • Define and manage KPIs to improve service health.
  • Lead incident management, root cause analysis, and postmortems.

Automation & Infrastructure

  • Design and implement Infrastructure as Code using Terraform, Ansible, or similar tools.
  • Automate deployments, monitoring, and recovery processes.
  • Optimize cloud infrastructure for scalability, performance, and cost efficiency.

Observability & Monitoring

  • Establish monitoring, logging, and alerting using Prometheus, Grafana, ELK, Datadog, or equivalent.
  • Drive adoption of tracing and observability practices to enhance system insights.
  • Build and maintain on-call dashboards and incident response processes.

Collaboration & Leadership

  • Mentor and guide a team of SREs and Cloud engineers.
  • Promote SRE best practices and drive cultural change across teams.

Required Skills & Experience

Core Technical Skills

  • Strong experience with AWS.
  • Proficiency with Kubernetes, containers, and microservices orchestration.
  • Hands-on expertise in Infrastructure as Code (Terraform, Ansible, etc.).
  • Solid scripting/programming skills in Python, Go, or Shell.

Platform & Tooling Expertise

  • Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, ArgoCD, etc.).
  • Proficiency with monitoring & observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
  • Strong knowledge of networking, security, and distributed system fundamentals.

Leadership Skills

  • Prior experience leading SRE, DevOps, or Cloud teams.
  • Strong ability to mentor junior engineers and drive adoption of best practices.

Preferred Qualifications

  • 5+ years in SRE/DevOps/Cloud roles.
  • Certification in AWS, GCP, Azure, or Kubernetes (CKA/CKS).
  • Experience with service mesh (Istio, Linkerd).
  • Knowledge of FinOps or cloud cost optimization practices.

Skills: cloud, reliability, infrastructure, AWS, Kubernetes
