Sr. Site Reliability Engineer - Observability, Golang

8 - 12 years

11 - 15 Lacs

Posted: 2 months ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

As a member of our team, you will manage the design, architecture, operation, and advocacy of our observability platform tools. You will use your software engineering expertise to address technical challenges related to application reliability. Our mission is to enhance system reliability and performance through observability solutions. The candidate will report to the Manager, Site Reliability Engineering.

What Your Responsibilities Will Be

Observability Tools: Experience with open-source observability tools such as Grafana, Prometheus, Mimir, Loki, Fluentd, OpenTelemetry, and Tempo. Experience designing, implementing, and managing observability platforms to monitor the performance and reliability of distributed systems.

AI-Enhanced Observability: Exposure to AI/ML-based observability tools and techniques, including anomaly detection, predictive analytics, automated alert tuning, and root cause analysis using machine learning.

Service Level Objectives/Indicators (SLOs/SLIs): Experience in building SLOs/SLIs, instrumenting applications for monitoring, and creating meaningful alerts to ensure system reliability and performance.

Linux Fundamentals: Solid experience in administering, securing, and performance tuning Linux distributions. Proficiency in managing Linux environments to support observability tools.

Troubleshooting: Experience diagnosing and resolving complex technical issues in distributed systems using observability data. Experience in root cause analysis and incident management.

Software Engineering: Understanding of software engineering principles, with a focus on integrating observability into the development process. Experience working in collaborative engineering teams, with an emphasis on testing and code quality.

Automation: A strong desire to automate monitoring processes, reduce manual toil, and improve system reliability.

Containers/Kubernetes: Understanding of managing and maintaining container-based systems within Kubernetes environments. Experience deploying observability solutions in containerized architectures.

Infrastructure-as-Code: Experience deploying and maintaining infrastructure for observability systems using Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.

Technical Writing: Ability to create clear, comprehensive documentation and diagrams for observability systems to support other engineering teams.

Customer Satisfaction: A focus on ensuring the satisfaction of internal customers (engineering teams) by providing reliable observability solutions that meet their needs.

Learning: Interest in expanding knowledge of the broader technology landscape, particularly in monitoring technologies and emerging AI/ML advancements for site reliability and system monitoring.

What You'll Need to be Successful

Experience: Minimum of 8 years in a SaaS environment.

Bachelor's degree in computer science or equivalent.

Participate in an on-call rotation.

Avalara

Tax Compliance Software

Pasadena
