Site Reliability Engineer - Grafana

12 years


Posted: 1 week ago | Platform: LinkedIn


Work Mode

On-site

Job Type

Full Time

Job Description

12+ years of experience, including at least 3-4 years of strong, hands-on expertise in Grafana and Prometheus.

Role Overview:
We are looking for a Site Reliability Engineer (SRE) who is a Subject Matter Expert (SME) in Grafana and Prometheus and can lead and guide the internal team in implementing industry-standard monitoring and observability practices. The ideal candidate is not just a user of Grafana. They should have in-depth experience with its configuration, deployment, scaling, and exporter integration, and be able to advise on monitoring strategies and best practices.

Key Responsibilities:
  • Act as the primary expert and SME for the Grafana and Prometheus monitoring tools.
  • Design, implement, and manage custom dashboards, alerts, and metrics using Prometheus exporters and Grafana visualizations.
  • Understand and configure Prometheus exporters (Node Exporter, Blackbox Exporter, etc.) and guide team members on their integration.
  • Conduct architecture reviews, suggest improvements, and ensure adherence to observability best practices.
  • Provide hands-on guidance to teams on monitoring tech stack setup, performance tuning, and usage patterns.
  • Explain the tech stack end to end: how metrics are collected, processed, stored, and visualized.
  • Collaborate with DevOps teams to ensure that CI/CD pipeline monitoring and observability are well integrated and scalable.

Required Skills:
  • Expertise in Grafana: configuration, templating, alerting, and best practices.
  • Strong hands-on experience with Prometheus: installation, configuration, scaling, and exporter integration.
  • Deep understanding of exporters beyond basic usage (not just as a metrics consumer).
  • Ability to guide and mentor existing team members who already have a working knowledge of Grafana.
  • Experience with infrastructure monitoring, logging stacks (ELK/EFK), and incident response tools.
  • Exposure to CI/CD tools (Jenkins, GitLab CI, etc.) and integration of monitoring into the pipelines.
  • Strong scripting (Shell, Python, etc.) and automation skills.

Nice to Have:
  • Familiarity with Kubernetes monitoring using kube-prometheus-stack or similar.
  • Experience with other observability tools, e.g., Loki, Tempo, Jaeger.
  • Certification in SRE monitoring tools.

Virtusa

Information Technology and Services

Southborough

20,000+ Employees


    Key People

  • Kris Canekeratne

    Chairman and CEO
  • Sanjay Singh

    President and COO
