Posted: 18 hours ago
Work from Office
Full Time
What You'll Do

As a member of our team, you will manage the design, architecture, operation, and advocacy of our observability platform tools. You will use your software engineering expertise to address technical challenges related to application reliability. Our mission is to enhance system reliability and performance through observability solutions. You will report to the Manager, Site Reliability Engineering.

What Your Responsibilities Will Be

Observability Tools: Experience with open-source observability tools such as Grafana, Prometheus, Mimir, Loki, FluentD, OpenTelemetry, and Tempo. Experience designing, implementing, and managing observability platforms to monitor the performance and reliability of distributed systems.

AI-Enhanced Observability: Exposure to AI/ML-based observability tools and techniques, including anomaly detection, predictive analytics, automated alert tuning, and root cause analysis using machine learning.

Service Level Objectives/Indicators (SLOs/SLIs): Experience building SLOs/SLIs, instrumenting applications for monitoring, and creating meaningful alerts to ensure system reliability and performance.

Linux Fundamentals: Solid experience in administering, securing, and performance-tuning Linux distributions. Proficiency in managing Linux environments to support observability tools.

Troubleshooting: Experience diagnosing and resolving complex technical issues in distributed systems using observability data. Experience with root cause analysis and incident management.

Software Engineering: Understanding of software engineering principles, with a focus on integrating observability into the development process. Experience working in collaborative engineering teams, with an emphasis on testing and code quality.

Automation: A strong desire to automate monitoring processes, reduce manual toil, and improve system reliability.

Containers/Kubernetes: Understanding of managing and maintaining container-based systems within Kubernetes environments. Experience deploying observability solutions in containerized architectures.

Infrastructure-as-Code: Experience deploying and maintaining infrastructure for observability systems using Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.

Technical Writing: Ability to create clear, comprehensive documentation and diagrams for observability systems to support other engineering teams.

Customer Satisfaction: Ensure the satisfaction of internal customers (engineering teams) by providing reliable observability solutions that meet their needs.

Learning: Interest in expanding knowledge of the broader technology landscape, including monitoring technologies and emerging AI/ML advancements for site reliability and system monitoring.

What You'll Need to be Successful

Minimum 8 years of experience in a SaaS environment.

Bachelor's degree in computer science or equivalent.

Willingness to participate in an on-call rotation.
Avalara Technologies