DevOps with Grafana Implementation & Infrastructure Monitoring Engineer

0 years

0 Lacs

Posted: 22 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Grafana installation, configuration, and dashboard development

 

Key Responsibilities

1. Grafana Implementation & Administration


  • Install, configure, and manage Grafana Enterprise/OSS in Client environments. 
  • Set up access controls, user roles, folder structure, and data source governance. 
  • Configure dashboards, alerting rules, plugins, variables, and templates. 

2. Data Source Integration


  • Connect Grafana to all required on-prem and cloud monitoring data sources, including: 
    • Prometheus / Loki / InfluxDB / Elasticsearch 
    • CloudWatch, Azure Monitor, GCP Metrics 
    • SQL-based sources 
    • Windows/Linux server metrics 
    • Network devices & firewalls 
  • Work with infra SMEs to understand each system’s metrics and integrate them correctly. 

3. Dashboard & Visualization Development

  • Build high-visibility dashboards for: 
    • Server performance & capacity (Windows/Linux) 
    • Network health (routers, switches, firewalls) 
    • Database monitoring (SQL Server, Oracle, PostgreSQL, MySQL, etc.) 
    • Application performance & service health 
  • Develop standardized templates and modules for repeatable dashboard creation across teams. 

4. Alerting, SLOs/SLAs & Incident Response


  • Configure alert rules aligned with Client’s priorities and “North Star” monitoring standards. 
  • Define and implement alert thresholds, severity levels, escalations, and routing integrations. 
  • Integrate Grafana alerts with ticketing systems (ServiceNow/Jira) and on-call tools (PagerDuty, Opsgenie, Teams). 
  • Support the creation of response playbooks and runbooks. 

5. Monitoring Architecture & Consolidation


  • Assess existing monitoring tools across acquired entities. 
  • Inventory all dashboards, alerts, and metrics and plan their migration into Grafana. 
  • Help design the unified Monitoring Command Center for Client. 
  • Identify noise, redundant alerts, and opportunities for automation or tuning. 

6. 24×7 Monitoring Enablement

  • Prepare dashboards, alert policies, and runbooks to support a follow-the-sun offshore model. 
  • Support the Client NOC team in stabilizing monitoring and reducing alert fatigue. 

7. Documentation & Knowledge Transfer


  • Create documentation for installation, configuration, dashboards, and alerting rules. 
  • Build SOPs, runbooks, and onboarding guides for Client infra teams. 
  • Provide KT to onshore and offshore teams. 

 


Required Skills & Experience

Technical Skills


  • Strong experience with Grafana (administration + dashboards + plugins). 
  • Hands-on experience integrating: 
    • Prometheus, Loki, InfluxDB, Elastic, or similar metric/log systems 
    • Cloud-native monitoring tools (AWS CloudWatch, Azure Monitor, GCP Metrics) 
    • On-prem server/network monitoring stacks 
  • Good knowledge of: 
    • Linux and Windows server administration 
    • Networking basics (TCP/IP, SNMP, routing, firewalls) 
    • Databases (SQL Server, Oracle, PostgreSQL) 
    • Scripting (Python, Bash, PowerShell) 

Monitoring & Observability Skills

  • Experience designing dashboards for infra health, availability, capacity, and utilization. 
  • Strong understanding of alerting concepts: thresholds, SLOs/SLAs, and severity levels. 
  • Ability to analyze noisy alerts and implement tuning. 

Cloud & On-Prem Exposure


  • AWS / Azure / GCP monitoring tools experience. 
  • Familiarity with hybrid environments and integrating metrics across multiple data centers. 

Soft Skills


  • Strong communication skills for working directly with Client onshore teams. 
  • Ability to gather requirements from infra, DB, app, and network teams. 
  • Problem-solving mindset and ownership of monitoring reliability. 

 

Preferred Qualifications


  • Experience in a NOC / Operations Support / SRE / Monitoring Engineer role. 
  • Exposure to large enterprise monitoring consolidation projects. 
  • Experience integrating Grafana with ServiceNow/Jira/PagerDuty. 
  • Experience with infrastructure-as-code (Terraform/Ansible) is an advantage. 
