Posted: 4 days ago
Job Title: Observability & Monitoring Engineer
Location: India (Remote/Hybrid as per company policy)
Experience: 4–6 years
Employment Type: Full-time
Role Summary
While many vendors treat monitoring as a reactive afterthought, we embed Datadog-trained Observability Engineers directly into our engineering and operations teams to deliver real-time visibility, proactive tuning, and smarter incident management.
We are looking for a highly capable Observability & Monitoring Engineer with 4–6 years of experience in Datadog and related observability practices. The engineer will be at the forefront of transforming how systems are monitored—reducing noise, accelerating root-cause discovery, and enabling smarter, correlated event flows across cloud-native environments.
Core Responsibilities:
Datadog Ownership:
Build and maintain Datadog dashboards, monitors, and SLOs with a focus on business and operational relevance.
Configure and tune alerts to eliminate noise and reduce false positives, enabling focused responses and intelligent routing.
Proactive Monitoring & Alert Tuning:
Implement proactive alert strategies based on usage patterns and event behavior.
Continuously optimize thresholds, baselines, and anomaly detection logic to ensure actionable monitoring signals.
Observability & Root-Cause Analysis (RCA):
Correlate metrics, logs, and traces across distributed systems to facilitate rapid root-cause triangulation.
Drive investigations from high CPU alerts to middleware issues such as queue overloads, using Datadog APM and tracing.
Integrated Support & Event Correlation:
Work closely with L2/Smart L3 and platform teams to support event correlation, AWS incident flows, and CI/CD telemetry.
Participate in day-to-day IT operations, functional system support, and incident escalation workflows.
SAP CPI API Monitoring:
Build and maintain targeted dashboards for SAP CPI APIs to ensure availability, throughput, and performance visibility.
What Makes This Role Unique:
You are embedded in the core delivery team, not isolated in a separate monitoring silo.
You work on proactive monitoring, not just reacting to alerts.
You support a platform aligned with Smart’s tooling and architecture, including high-frequency CI tracing and real-time AWS integration.
You help evolve how we define “observability maturity” by integrating it deeply into development and ops workflows.
Required Skills & Experience:
4–6 years of experience in observability, SRE, or DevOps roles with strong exposure to Datadog.
Experience configuring and managing Datadog’s dashboards, monitors, APM, and logs.
Deep understanding of observability principles: metrics, logs, distributed traces, RUM, and synthetic monitoring.
Experience tracing infrastructure or application alerts (e.g., CPU, latency) to actual service or middleware-level bottlenecks.
Familiarity with cloud platforms like AWS (preferred), Azure, or GCP.
Hands-on experience in event management, incident support, and RCA documentation.
Exposure to SAP CPI monitoring or other enterprise integration middleware is a plus.
What You’ll Get:
The opportunity to redefine observability in a modern, fast-paced environment.
Ownership of critical monitoring pipelines and real-time troubleshooting tools.
Collaboration with global engineering and platform teams to drive performance and reliability.
Flexible work environment and access to upskilling resources.
Send us your resume at:
careers@algoworks.com
Algoworks Solutions
Salary: Not disclosed