Senior Site Reliability Engineer

Posted: 16 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

We're looking for a hands-on Site Reliability / DevOps Engineer to be our first hire in this function. You'll be responsible for owning and scaling the reliability, observability, and infrastructure of our platform, which runs entirely on Microsoft Azure. You'll be critical in shaping DevOps culture, architecting fault-tolerant systems, and deploying automation to improve uptime, performance, and cost efficiency. This is a hybrid role combining SRE and DevOps principles, ideal for builders who are comfortable working in fast-paced, product-driven environments.

What You'll Own

Cloud Infrastructure (Microsoft Azure, Must Have)

  • Architect, deploy, and maintain services across Azure App Services, Azure Container Apps, Cosmos DB, Event Hubs, Azure Monitor, Azure VMs, and Azure Kubernetes Service (AKS).
  • Design and manage networking (VNets, subnets, NSGs) and identity/access controls (PIM, Managed Identities, Enterprise Applications, role-based access control).
  • Own infrastructure provisioning using Terraform / Bicep.
  • Implement cost-effective, scalable, and secure cloud environments across development, staging, and production.

Monitoring, Observability & Incident Response

  • Set up end-to-end observability using Prometheus, Grafana, Azure Monitor, ELK Stack, and Sentry.
  • Define and enforce standards for logging, metrics, traces, SLIs/SLOs, and error budgets.
  • Build proactive alerting systems for APIs, RabbitMQ, Databricks pipelines, and external integrations.
  • Establish on-call rotations and incident response runbooks, and lead RCAs to minimize MTTR.

CI/CD, Automation & Tooling

  • Automate deployments and infrastructure lifecycle using GitHub Actions, Terraform modules, and CLI tools.
  • Improve CI/CD pipelines for faster, safer releases across containerized and VM-based workloads.
  • Build internal tools for diagnostics, rollback safety, and release automation.
  • Integrate resilience patterns such as retries, circuit breakers, backoff strategies, and failover.

DevOps & System Reliability

  • Optimize system performance, memory usage, and availability for core services such as RabbitMQ, APIs, and analytics pipelines on Databricks.
  • Implement zero-downtime deployments, self-healing systems, and infrastructure audits.
  • Perform regular cost analysis, right-sizing, and tag-based budget enforcement.

Security & Compliance Collaboration

  • Work with security teams to maintain infrastructure and data flow diagrams and to support ISO 27001, GDPR, and PDPA readiness.
  • Participate in threat modeling, define trust boundaries, and implement audit-ready infrastructure practices.

Tech Stack You'll Work With

  • Cloud: Microsoft Azure (App Services, Container Apps, AKS, Cosmos DB, Event Hubs, Monitor, VMs).
  • IaC: Terraform, Bicep.
  • CI/CD: Azure DevOps, GitHub Actions.
  • Monitoring & Logs: Prometheus, Grafana, Azure Monitor, ELK, Sentry.
  • Queueing: RabbitMQ, Kafka.
  • Languages: Node.js, Python (mostly for debugging).
(ref:hirist.tech)