Site Reliability Engineer (SRE)

Keka HR

5 years

0 Lacs

Hyderabad Telangana India

Posted: 5 months ago

Skills Required

reliability, azure, engineering, devops, management, stack, .net, design, metrics, logging, opentelemetry, instrumentation, monitoring, integration, storage, tuning, analytics, automation, automate, terraform, powershell, checks, code, scaling, planning, strategies, testing, support, rollback, collaboration, documentation, development, troubleshooting, onboarding, query, scripting, certifications

Work Mode

On-site

Job Type

Full Time

Job Description

Site Reliability Engineer (SRE) – Observability & Azure Infrastructure

Location: Hyderabad
Type: Full-Time
Experience Level: 5 to 8 Years
Department: Engineering / DevOps

About The Role

We are looking for a highly skilled Site Reliability Engineer (SRE) to lead the implementation and management of our observability stack across Azure-hosted infrastructure and .NET Core applications. This role will focus on configuring and managing OpenTelemetry, Prometheus, Loki, and Tempo, along with setting up robust alerting systems across all services, including Azure infrastructure and MSSQL databases. You will work closely with developers, DevOps, and infrastructure teams to ensure the performance, reliability, and visibility of our .NET Core applications and cloud services.

Key Responsibilities

Observability Platform Implementation:
- Design and maintain distributed tracing, metrics, and logging using OpenTelemetry, Prometheus, Loki, and Tempo.
- Ensure complete instrumentation of .NET Core applications for end-to-end visibility.
- Implement telemetry pipelines for application logs, performance metrics, and traces.

Monitoring & Alerting:
- Develop and manage SLIs, SLOs, and error budgets.
- Create actionable, noise-free alerts using Prometheus Alertmanager and Azure Monitor.
- Monitor key infrastructure components, applications, and databases with a focus on reliability and performance.

Azure & Infrastructure Integration:
- Integrate Azure services (App Services, VMs, Storage, etc.) with the observability stack.
- Configure monitoring for MSSQL databases, including performance tuning metrics and health indicators.
- Use Azure Monitor, Log Analytics, and custom exporters where necessary.

Automation & DevOps:
- Automate observability configurations using Terraform, PowerShell, or other IaC tools.
- Integrate telemetry validation and health checks into CI/CD pipelines.
- Maintain observability as code for repeatable deployments and easy scaling.

Resilience & Reliability Engineering:
- Conduct capacity planning to anticipate scaling needs based on usage patterns and growth.
- Define and implement disaster recovery strategies for critical Azure-hosted services and databases.
- Perform load and stress testing to identify performance bottlenecks and validate infrastructure limits.
- Support release engineering by integrating observability checks and rollback strategies in CI/CD pipelines.
- Apply chaos engineering practices in lower environments to uncover potential reliability risks proactively.

Collaboration & Documentation:
- Partner with engineering teams to promote observability best practices in .NET Core development.
- Create dashboards (Grafana preferred) and runbooks for system insights and incident response.
- Document monitoring standards, troubleshooting guides, and onboarding materials.

Required Skills And Experience

- 4+ years of experience in SRE, DevOps, or infrastructure-focused roles.
- Deep experience with .NET Core application observability using OpenTelemetry.
- Proficiency with Prometheus, Loki, Tempo, and related observability tools.
- Strong background in Azure infrastructure monitoring, including App Services and VMs.
- Hands-on experience monitoring MSSQL databases (deadlocks, query performance, etc.).
- Familiarity with Infrastructure as Code (Terraform, Bicep) and scripting (PowerShell, Bash).
- Experience building and tuning alerts, dashboards, and metrics for production systems.

Preferred Qualifications

- Azure certifications (e.g., AZ-104, AZ-400).
- Experience with Grafana, Azure Monitor, and Log Analytics integration.
- Familiarity with distributed systems and microservice architectures.
- Prior experience in high-availability, regulated, or customer-facing environments.
