Site Reliability Engineer

Experience: 8 - 13 years

Salary: 15 - 25 Lacs

Posted: 16 hours ago | Platform: Naukri


Work Mode: Work from Office

Job Type: Full Time

Job Description

Role Summary


Akrivia HCM is seeking an experienced Site Reliability Engineer to safeguard the performance, scalability, and availability of our global HR tech platform. You will define service-level objectives, automate infrastructure, lead incident response, and partner with engineering squads to deliver reliable releases at high velocity.


Key Responsibilities


  • Define and track SLIs/SLOs for latency and availability, and manage the associated error budgets.
  • Build and maintain Terraform/Helm/ArgoCD stacks; convert manual toil into code.
  • Instrument services with Prometheus, Grafana, Datadog, and OpenTelemetry; create actionable alerts & dashboards.
  • Serve in the on-call rotation, lead rapid mitigation, run blameless post-mortems, and close action items.
  • Model load growth, tune autoscaling policies, run load tests, and drive cost-optimisation reviews.
  • Design chaos game-days and fault-injection experiments to validate fail-over and recovery paths.
  • Review designs/PRs for reliability anti-patterns and coach development teams on SRE best practices.
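
To illustrate the error-budget arithmetic behind the first responsibility: an availability SLO implies a fixed allowance of downtime per measurement window. For example, a 99.9% SLO over 30 days leaves 0.1% of 30 days, or 43.2 minutes. The Python sketch below is not Akrivia's tooling. The function names error_budget and budget_remaining and the 30-day window are illustrative assumptions.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed downtime for an availability SLO over a window.

    slo is a fraction, e.g. 0.999 for "three nines" (hypothetical helper).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    # The budget is simply the unavailable share of the window.
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo, window)
    # timedelta / timedelta yields a plain float ratio.
    return 1.0 - downtime / budget


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed rolling 30-day window
    print(error_budget(0.999, window))  # about 43.2 minutes
    print(budget_remaining(0.999, window, timedelta(minutes=10.8)))
```

Tracking the remaining fraction, rather than raw downtime, makes it easy to gate risky releases once most of the budget has been spent.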

Must-Have Qualifications


  • 5+ years operating large-scale, user-facing SaaS systems on AWS, GCP, or Azure (Kubernetes/EKS preferred).
  • Proficiency with Infrastructure-as-Code (Terraform, Helm, Pulumi, or CloudFormation) and GitOps (ArgoCD/Flux).
  • Hands-on experience building observability stacks (Prometheus, Grafana, Datadog, New Relic, etc.).
  • Proven track record reducing MTTR and change-failure rate through automation and robust incident processes.
  • Strong scripting or programming skills in Go, Python, or TypeScript.
  • Deep debugging skills across Linux, networking, containers, databases, and web/API layers.
  • Excellent written and verbal communication skills.

Good-to-Have Skills


  • Exposure to AWS Well-Architected reviews, FinOps, or cost-optimisation initiatives.
  • Experience with service mesh (Istio/Linkerd), event-driven systems (Kafka/NATS), or serverless (Lambda).
  • Familiarity with SOC 2 / ISO 27001 controls and secrets management (AWS KMS, Vault).
  • Chaos engineering tools (Chaos Mesh, Gremlin) and performance testing (k6, Gatling).
  • Certifications such as AWS DevOps Pro, CKA/CKAD, or Google Cloud SRE.

Akrivia HCM

Software Development

San Francisco, California

201-500 Employees


Key People

  • Bharat Sharma, Co-Founder & CEO
  • Amit Deshpande, Co-Founder & COO
