Information Technology Lead

4 - 8 years

6 - 10 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Observability & Monitoring Engineer
Location: India

The Role

You will design and operate the telemetry backbone for our internal platforms and business-critical applications. This role spans metrics, logs, traces, synthetics, RUM, and event correlation: instrumenting services, building dashboards, tuning alerts, and partnering with Incident, Problem, and Change Management to drive measurable reliability outcomes.

What You'll Do
  • Design the observability stack: Define and implement standards for metrics, logs, traces, and profiling (e.g., OpenTelemetry collectors, exporters, and context propagation).

  • Instrument what matters: Establish golden signals, SLIs/SLOs, and health checks for priority services; automate baselining and anomaly detection.

  • Build actionable visibility: Create executive and on-call views (dashboards, service health, dependency maps) for Apps, Network, Collaboration tools, HRIS, and integrations.

  • Engineer signal > noise: Develop alerting policy as code; reduce false positives; implement suppression, deduplication, and auto-remediation runbooks.

  • Partner in operations: Work hand-in-hand with Incident & Problem Management to accelerate triage, cut MTTR, and drive durable RCAs and prevention actions.

  • Integrate the ecosystem: Connect observability to CI/CD, feature flags, incident tooling, CMDB/service catalog, and collaboration channels (Slack/Zoom).

  • Champion reliability culture: Coach product and platform teams on instrumentation patterns, trace context, and SLO thinking; contribute reusable modules/templates.

  • Continuously improve: Lead telemetry hygiene initiatives, cost/usage optimization of monitoring platforms, and performance tuning across tiers.

  • Security & compliance: Ensure monitoring data is handled per policy; implement role-based access and guardrails for sensitive logs/metrics.

What You Bring
  • Experience: 4-8 years in Observability / SRE / Platform / Monitoring roles supporting SaaS or enterprise applications.

  • Telemetry tools: Hands-on experience with monitoring and logging tools.

  • Tracing & metrics: Strong grasp of distributed tracing, RED/USE/golden signals, SLI/SLO/SLA, and error budgets.

  • Automation & code: Proficient in common languages such as Python.

  • Cloud & platforms: Experience with AWS.

  • ITSM fluency: Comfortable operating within Incident/Problem/Change frameworks; adept at runbooks, RCAs, and post-incident reviews.

  • Data mindset: SQL or log query languages; can translate telemetry into insights and narratives.

  • Soft skills: Clear communicator, collaborative partner, bias to action, and calm during outages.

Nice to Have
  • Service maps/dependency modeling, synthetic/RUM design, APM transaction tuning, log schema governance.

  • Experience integrating observability with CMDB/service catalog and feature flag systems.

  • Certifications (e.g., AWS, Datadog).

Zendesk

Software Development

San Francisco, California
