Site Reliability Engineer IV

8 - 13 years

10 - 15 Lacs

Pune

Posted: 18 hours ago | Platform: Naukri


Skills Required

Site Reliability Engineering, Software Engineering, SaaS, IaC, Troubleshooting, Technical Writing, Kubernetes

Work Mode

Work from Office

Job Type

Full Time

Job Description

What You'll Do

As a member of our team, you will manage the design, architecture, operation, and advocacy of our observability platform tools. You will use your software engineering expertise to address technical challenges related to application reliability. Our mission is to enhance system reliability and performance through observability solutions. The candidate will report to the Manager, Site Reliability Engineering.

What Your Responsibilities Will Be

Observability Tools: Experience with open-source observability tools such as Grafana, Prometheus, Mimir, Loki, FluentD, OpenTelemetry, and Tempo. Experience designing, implementing, and managing observability platforms that monitor the performance and reliability of distributed systems.

AI-Enhanced Observability: Exposure to AI/ML-based observability tools and techniques, including anomaly detection, predictive analytics, automated alert tuning, and root cause analysis using machine learning.

Service Level Objectives/Indicators (SLOs/SLIs): Experience building SLOs/SLIs, instrumenting applications for monitoring, and creating meaningful alerts to ensure system reliability and performance.

Linux Fundamentals: Solid experience administering, securing, and performance-tuning Linux distributions, and proficiency in managing Linux environments that support observability tools.

Troubleshooting: Experience diagnosing and resolving complex technical issues in distributed systems using observability data, including root cause analysis and incident management.

Software Engineering: Understanding of software engineering principles, with a focus on integrating observability into the development process. Experience working in collaborative engineering teams with an emphasis on testing and code quality.

Automation: A strong desire to automate monitoring processes to reduce manual toil and improve system reliability.

Containers/Kubernetes: Understanding of managing and maintaining container-based systems within Kubernetes environments, and experience deploying observability solutions in containerized architectures.

Infrastructure-as-Code: Experience deploying and maintaining infrastructure for observability systems using Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.

Technical Writing: Ability to create clear, comprehensive documentation and diagrams for observability systems to support other engineering teams.

Customer Satisfaction: A commitment to ensuring the satisfaction of internal customers (engineering teams) by providing reliable observability solutions that meet their needs.

Learning: Interest in expanding knowledge of the broader technology landscape, particularly monitoring technologies and emerging AI/ML advancements for site reliability and system monitoring.

What You'll Need to Be Successful

Experience

Minimum of 8 years of experience in a SaaS environment.
Bachelor's degree in computer science or equivalent.
Participation in an on-call rotation.

Avalara Technologies

Software Development

Durham, NC

1001-5000 Employees


    Key People

  • Scott McFarlane

    Co-founder & CEO
  • Bill Decker

    CFO
