DHI Solutions - Team Lead - Observability Services

6 - 10 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary

We are seeking a highly skilled and proactive Team Lead, Observability & RCA, to build and lead a team focused on monitoring, alerting, logging, and tracing (MELT), and on deep Root Cause Analysis (RCA). The ideal consultant will have hands-on expertise in tools like Datadog, a strong understanding of distributed systems, and the ability to collaborate with development, support, and DevOps teams to turn observed issues into actionable resolutions.

Key Responsibilities

  • Build and manage observability pipelines to ingest and correlate Metrics, Events, Logs, and Traces (MELT) from multiple systems across web, mobile, backend, and infrastructure.
  • Lead Datadog implementation and optimization, including APM, RUM, dashboards, synthetics, alerting, and anomaly detection features.
  • Act as the primary point of contact for triaging and analyzing issues reported via observability tools.
  • Perform end-to-end Root Cause Analysis (RCA) on recurring incidents and anomalies, and drive closure by working closely with development, QA, infra, and support teams.
  • Define and enforce observability best practices, including tagging strategy, SLO/SLA setup, error budget tracking, and log hygiene.
  • Build and maintain dashboards, monitors, and custom views for different stakeholders including Engineering, Support, and Leadership.
  • Drive incident review meetings, document learnings, and contribute to postmortems and corrective action plans.
  • Continuously evaluate the health of applications and services, and suggest architectural or code-level improvements.
  • Collaborate with InfoSec and Compliance teams to ensure telemetry data is protected, compliant, and governed.
  • Mentor junior engineers on observability frameworks, diagnostic techniques, and tooling.

Required Skills & Experience

  • 6 to 10 years of total experience, with 2 to 4 years in observability, SRE, or monitoring-focused roles.
  • Strong hands-on experience with Datadog (APM, RUM, Infrastructure Monitoring, Synthetics, Dashboards, Alerts, etc.).
  • Solid understanding of MELT concepts and how to structure telemetry for modern applications.
  • Proficient in Root Cause Analysis (RCA), incident lifecycle management, and blameless postmortems.
  • Experience working with microservices, cloud platforms (AWS/GCP/Azure), and containerized environments (Docker/Kubernetes).
  • Strong skills in scripting (Python/Bash) and tools like Fluentd, Logstash, or OpenTelemetry.
  • Familiarity with CI/CD pipelines, version control (Git), and infrastructure-as-code is a plus.
  • Ability to communicate clearly with cross-functional stakeholders, translate technical findings, and influence resolution paths.
  • Strong sense of ownership, analytical thinking, and process improvement mindset.

Good To Have

  • Experience integrating observability tools with ServiceNow, PagerDuty, or Slack for automated alerting and incident response.
  • Knowledge of ITIL practices, service health modeling, and mapping of business KPIs.
  • Certification in Datadog, AWS Cloud Practitioner, or SRE Foundations.

Soft Skills

  • Strong leadership and team mentoring abilities.
  • Excellent analytical and problem-solving skills.
  • Effective verbal and written communication.
  • Ability to work independently and lead in a fast-paced, production-critical environment.
(ref:hirist.tech)
