Site Reliability Engineer - DevOps

5 - 9 years

0 Lacs

Posted: 2 weeks ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Site Reliability Engineer (SRE) with 5-7 years of experience, you will keep distributed systems reliable, automate infrastructure and workflows, and take part in incident response.

Key Responsibilities:
- Maintain strict SLOs (99.99% uptime) across distributed systems, including Redis, Golang services, and DocDB.
- Diagnose and resolve complex application and network issues, including DNS troubleshooting and network latency.
- Use monitoring and observability tools such as Kibana, Grafana, Instana, and Dynatrace for proactive incident detection.
- Automate infrastructure and workflows with Python, Bash, Terraform, and Ansible.
- Manage container orchestration on AWS Elastic Kubernetes Service (EKS) and Red Hat OpenShift, ensuring high availability and scalability.
- Collaborate with development and QA teams to embed reliability best practices and improve system observability.
- Participate in on-call rotations, incident response, and blameless postmortems.
- Document runbooks and mentor junior engineers on SRE and networking fundamentals.

Qualifications Required:
- 5+ years in SRE or DevOps roles supporting high-scale platforms (fintech, OTT, ecommerce, net banking).
- Expertise in maintaining uptime and troubleshooting distributed systems (Redis, Golang, DocDB).
- Strong networking skills, including network and DNS troubleshooting.
- Experience with monitoring/APM tools (Kibana, Grafana, Instana, Dynatrace).
- Hands-on experience with container orchestration on AWS EKS and Red Hat OpenShift.
- Proficiency in CI/CD, cloud infrastructure (AWS/Azure), and infrastructure automation.

Company details were not provided in the job description.

