Senior Site Reliability Engineer – Grafana & Observability

Aptimized

20 years

0 Lacs

Hyderabad, Telangana, India

Posted: 2 days ago

Skills Required

reliability, erp, technology, optimization, consulting, engineering, monitoring, opentelemetry, data, dashboard, json, azure, aws, servicenow, metrics, recording, tuning, kubernetes, code, stack, automation, automate, devops, github, kustomize, troubleshooting, network, security, compliance, iam, audit, retention, storage, nist, elasticsearch, logstash, docker, linux, terraform, helm, git, scripting, python, ai, microservices, itil, certifications

Work Mode

On-site

Job Type

Full Time

Job Description

Job Description – Senior Site Reliability Engineer (SRE) – Grafana & Observability

Position: Senior Site Reliability Engineer – Grafana & Observability

Location: Hyderabad / Hybrid

Experience: 10–20+ years

Operating globally, Aptimized is a premium ERP, HCM, and Technology Optimization Consulting agency. Our team focuses on helping customers become intelligent enterprises by leveraging creative technology solutions. At Aptimized, we prioritize our clients’ needs and create tailor-made solutions to deliver success. We understand that success is not achieved by chance. We listen to your concerns. We consult with your organization. We accelerate your business. Visit us at our website to learn more about what we can do for you!

We are looking for a highly skilled Senior Site Reliability Engineer (SRE) with deep hands-on experience in the Grafana ecosystem, observability engineering, and large-scale monitoring platforms.

The ideal candidate will be an expert in building and managing Grafana dashboards, Managed Grafana, Prometheus monitoring, OpenTelemetry pipelines, and integrating multiple data sources across cloud and on-prem infrastructures.

This role focuses heavily on building real-time observability, improving system reliability, and enabling data-driven operational insights.

Key Responsibilities

Grafana Engineering & Dashboard Development

Build advanced Grafana dashboards with alerts, custom panels, JSON models, and data visualizations.

Work with Managed Grafana (Azure Managed Grafana / Amazon Managed Grafana) for enterprise-grade observability.

Integrate Grafana with multiple data sources such as:

Prometheus

ELK / Elasticsearch

Dynatrace

CloudWatch

Azure Monitor

InfluxDB / Telegraf

ServiceNow (incident integrations)

Develop role-based access control (RBAC) and multi-tenant dashboard architectures.

Prometheus, Metrics & Alerting

Architect and manage Prometheus metrics pipelines, exporters, and recording/alerting rules.

Optimize PromQL queries for high-performance dashboards.

Reduce alert noise through intelligent rule tuning and SLO-driven alerts.

Observability Platform Ownership

Build and maintain an end-to-end observability stack:

Grafana + Prometheus + ELK + OpenTelemetry + Cloud-native monitoring tools.

Integrate logs, metrics, and traces into unified dashboards.

Establish SLIs, SLOs, error budgets, and real-time reliability insights.

Kubernetes & Cloud Monitoring

Deploy and monitor Kubernetes clusters (AKS, EKS, Rancher).

Configure Grafana Alloy / Prometheus Operator / kube-state-metrics for cluster-level insights.

Implement Infrastructure-as-Code for observability stack deployments.

Automation & Infrastructure as Code

Automate monitoring agent deployments using:

Terraform

Azure DevOps / GitHub / GitLab

FluxCD, Kustomize, Helm

Develop monitoring-as-code for repeatable environment provisioning.

Incident Response & Performance Troubleshooting

Provide deep troubleshooting across infrastructure, network, applications, and microservices.

Build automated dashboards for war rooms and cross-team collaboration.

Leverage Grafana annotations, synthetic monitoring, and event correlation.

Security, Compliance & Governance

Implement secure access to metric/log dashboards using IAM, RBAC, and ABAC.

Configure audit logs, long-term retention, and secure storage pipelines.

(Optional: FedRAMP/NIST experience beneficial for regulated workloads.)

Required Skills & Expertise

Grafana & Observability (Primary)

Expert in Grafana dashboard engineering

Prometheus + Alertmanager

Managed Grafana (Azure/AWS)

ELK Stack (Elasticsearch, Logstash, Kibana)

OpenTelemetry (OTEL) metrics & traces

Grafana Alloy, Loki (Bonus)

Cloud Platforms

Azure, AWS, IBM Cloud (Nice-to-have)

CloudWatch, Azure Monitor, App Insights

Containers & Infrastructure

Kubernetes (AKS, EKS)

Docker, Rancher, OpenShift

Linux (RHEL/CentOS)

DevOps & Automation

Terraform, Helm, Kustomize

Git, CI/CD pipelines

Scripting (Python, Bash, PowerShell)

Monitoring Ecosystem

Experience with additional tools is a plus:

Dynatrace

Splunk

Sysdig

AppDynamics

SolarWinds

Moogsoft AI-Ops

Preferred Qualifications

Strong background in SRE, Observability Engineering, DevOps, or Platform Engineering.

Experience with microservices, distributed systems, and cloud-native architectures.

ITIL v3 or industry certifications in AWS/Azure/Kubernetes are a plus.

Education

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

Certifications in cloud, observability, Grafana, or Kubernetes are an advantage.
