8 - 12 years
11.0 - 15.0 Lacs P.A.
Chennai, Pune, Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata
Posted: 4 weeks ago
Work from Office
Full Time
As a member of our team, you will manage the design, architecture, operation, and advocacy of our observability platform tools. You will use your software engineering expertise to address technical challenges related to application reliability. Our mission is to enhance system reliability and performance through observability solutions. The candidate will report to the Manager, Site Reliability Engineering.

What Your Responsibilities Will Be

Observability Tools: Experience with open-source observability tools such as Grafana, Prometheus, Mimir, Loki, FluentD, OpenTelemetry, and Tempo. Experience designing, implementing, and managing observability platforms that monitor the performance and reliability of distributed systems.

AI-Enhanced Observability: Exposure to AI/ML-based observability tools and techniques, including anomaly detection, predictive analytics, automated alert tuning, and root cause analysis using machine learning.

Service Level Objectives/Indicators (SLOs/SLIs): Experience building SLOs/SLIs, instrumenting applications for monitoring, and creating meaningful alerts to ensure system reliability and performance.

Linux Fundamentals: Solid experience administering, securing, and performance-tuning Linux distributions. Proficiency in managing Linux environments that support observability tools.

Troubleshooting: Experience diagnosing and resolving complex technical issues in distributed systems using observability data. Experience with root cause analysis and incident management.

Software Engineering: Understanding of software engineering principles, with a focus on integrating observability into the development process. Experience working in collaborative engineering teams, with an emphasis on testing and code quality.

Automation: A strong desire to automate monitoring processes, reduce manual toil, and improve system reliability.

Containers/Kubernetes: Understanding of how to manage and maintain container-based systems within Kubernetes environments. Experience deploying observability solutions in containerized architectures.

Infrastructure-as-Code: Experience deploying and maintaining infrastructure for observability systems using Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.

Technical Writing: Ability to create clear, comprehensive documentation and diagrams for observability systems to support other engineering teams.

Customer Satisfaction: A focus on ensuring the satisfaction of internal customers (engineering teams) by providing reliable observability solutions that meet their needs.

Learning: Interest in expanding knowledge of the broader technology landscape, particularly monitoring technologies and emerging AI/ML advancements for site reliability and system monitoring.

What You'll Need to be Successful

Experience: Minimum 8 years of experience in a SaaS environment.
Bachelor's degree in computer science or equivalent.
Participate in an on-call rotation.