Site Reliability Engineer

Akrivia HCM

8 - 13 years

INR 15 - 25 Lacs

Hyderabad

Skills Required

New Relic, Site Reliability Engineering, AWS, Log Management, Logging, Root Cause Analysis, MTTR, ELK, SLA Compliance, Application Monitoring, Prometheus, CI/CD, Logstash, Datadog, Grafana, Monitoring Tools, Performance Monitoring, Jenkins, RCA, Log Analysis, Terraform, Sentry, Alerts, PagerDuty

Work Mode

Work from Office

Job Type

Full Time

Job Description

Role Summary

Akrivia HCM is seeking an experienced Site Reliability Engineer to safeguard the performance, scalability, and availability of our global HR tech platform. You will define service-level objectives, automate infrastructure, lead incident response, and partner with engineering squads to deliver reliable releases at high velocity.

Key Responsibilities

Define and track SLIs/SLOs for latency and availability, and manage the resulting error budgets.
Build and maintain Terraform/Helm/ArgoCD stacks; convert manual toil into code.
Instrument services with Prometheus, Grafana, Datadog, and OpenTelemetry; create actionable alerts and dashboards.
Serve in the on-call rotation, lead rapid mitigation, run blameless post-mortems, and close action items.
Model load growth, tune autoscaling policies, run load tests, and drive cost-optimisation reviews.
Design chaos-engineering game days and fault-injection experiments to validate failover and recovery paths.
Review designs/PRs for reliability anti-patterns and coach development teams on SRE best practices.

Must-Have Qualifications

5+ years operating large-scale, user-facing SaaS systems on AWS, GCP, or Azure (Kubernetes/EKS preferred).
Proficiency with Infrastructure-as-Code (Terraform, Helm, Pulumi, or CloudFormation) and GitOps (ArgoCD/Flux).
Hands-on experience building observability stacks (Prometheus, Grafana, Datadog, New Relic, etc.).
Proven track record reducing MTTR and change-failure rate through automation and robust incident processes.
Strong scripting or programming skills in Go, Python, or TypeScript.
Deep debugging skills across Linux, networking, containers, databases, and web/API layers.
Excellent written and verbal communication skills.

Good-to-Have Skills

Exposure to AWS Well-Architected reviews, FinOps, or cost-optimisation initiatives.
Experience with service mesh (Istio/Linkerd), event-driven systems (Kafka/NATS), or serverless (Lambda).
Familiarity with SOC 2 / ISO 27001 controls and secrets management (AWS KMS, Vault).
Experience with chaos engineering tools (Chaos Mesh, Gremlin) and performance testing tools (k6, Gatling).
Certifications such as AWS Certified DevOps Engineer - Professional, CKA/CKAD, or Google Cloud SRE.

More Jobs at Akrivia HCM

Senior Content Writer

Hyderabad

6.0 - 11.0 yrs

INR 6 - 12 Lacs

Release Manager

Hyderabad

8.0 - 13.0 yrs

INR 15 - 25 Lacs
