Posted: 1 week ago
On-site
Full Time
12+ years of experience, including at least 3-4 years of strong, hands-on expertise in Grafana and Prometheus.

Role Overview:
We are looking for a Site Reliability Engineer (SRE) who is a Subject Matter Expert (SME) in Grafana and Prometheus and can lead and guide the internal team in implementing industry-standard monitoring and observability practices. The ideal candidate is not just a Grafana user. They should have in-depth experience with its configuration, deployment, scaling, and exporter integration, and be able to advise on monitoring strategies and best practices.

Key Responsibilities:
- Act as the primary expert and SME for the Grafana and Prometheus monitoring tools.
- Design, implement, and manage custom dashboards, alerts, and metrics using Prometheus exporters and Grafana visualizations.
- Understand and configure Prometheus exporters (Node Exporter, Blackbox Exporter, etc.) and guide team members on integrating them.
- Conduct architecture reviews, suggest improvements, and ensure adherence to observability best practices.
- Provide hands-on guidance to teams on monitoring tech stack setup, performance tuning, and usage patterns.
- Explain the tech stack end-to-end: how metrics are collected, processed, stored, and visualized.
- Collaborate with DevOps teams to ensure that monitoring and observability of CI/CD pipelines are well integrated and scalable.

Required Skills:
- Expertise in Grafana: configuration, templating, alerting, and best practices.
- Strong hands-on experience with Prometheus: installation, configuration, scaling, and exporter integration.
- Deep understanding of exporters beyond basic usage (not just as a metrics consumer).
- Ability to guide and mentor existing team members who already have decent Grafana knowledge.
- Experience with infrastructure monitoring, logging stacks (ELK/EFK), and incident response tools.
- Exposure to CI/CD tools (Jenkins, GitLab CI, etc.) and to integrating monitoring into pipelines.
- Strong scripting (Shell, Python, etc.) and automation skills.

Nice to Have:
- Familiarity with Kubernetes monitoring using kube-prometheus-stack or similar.
- Experience with other observability tools, e.g., Loki, Tempo, Jaeger.
- Certification in SRE or monitoring tools.
Virtusa
Chennai, Tamil Nadu, India
Salary: Not disclosed