Posted: 1 day ago
Work from Office
Full Time
Job Title: Observability & Monitoring Engineer
Location: India
The Role
You will design and operate the telemetry backbone for our internal platforms and business-critical applications. This role spans metrics, logs, traces, synthetics, RUM, and event correlation: instrumenting services, building dashboards, tuning alerts, and partnering with Incident/Problem/Change Management to drive measurable reliability outcomes.
What You'll Do
Design the observability stack: Define and implement standards for metrics, logs, traces, and profiling (e.g., OpenTelemetry collectors, exporters, and context propagation).
Instrument what matters: Establish golden signals, SLIs/SLOs, and health checks for priority services; automate baselining and anomaly detection.
Build actionable visibility: Create executive and on-call views (dashboards, service health, dependency maps) for Apps, Network, Collaboration tools, HRIS, and integrations.
Engineer signal > noise: Develop alerting policy as code; reduce false positives; implement suppression, deduplication, and auto-remediation runbooks.
Partner in operations: Work hand-in-hand with Incident & Problem Management to accelerate triage, cut MTTR, and drive durable RCAs and prevention actions.
Integrate the ecosystem: Connect observability to CI/CD, feature flags, incident tooling, CMDB/service catalog, and collaboration channels (Slack/Zoom).
Champion reliability culture: Coach product and platform teams on instrumentation patterns, trace context, and SLO thinking; contribute reusable modules/templates.
Continuously improve: Lead telemetry hygiene initiatives, cost/usage optimization of monitoring platforms, and performance tuning across tiers.
Security & compliance: Ensure monitoring data is handled per policy; implement role-based access and guardrails for sensitive logs/metrics.
Experience: 4-8 years in Observability / SRE / Platform / Monitoring roles supporting SaaS or enterprise applications.
Telemetry tools: Hands-on with monitoring and logging tools.
Tracing & metrics: Strong grasp of distributed tracing, RED/USE/golden signals, SLI/SLO/SLA, and error budgets.
Automation & code: Proficient in common languages such as Python.
Cloud & platforms: Experience with AWS.
ITSM fluency: Comfortable operating within Incident/Problem/Change frameworks; adept at runbooks, RCAs, and post-incident reviews.
Data mindset: SQL or log query languages; can translate telemetry into insights and narratives.
Soft skills: Clear communicator, collaborative partner, bias to action, and calm during outages.
Service maps/dependency modeling, synthetic/RUM design, APM transaction tuning, log schema governance.
Experience integrating observability with CMDB/service catalog and feature flag systems.
Certifications (e.g., AWS, Datadog).
Zendesk