Staff Site Reliability Engineer (SRE)

5 - 9 years

0 Lacs

Posted: 3 weeks ago | Platform: Shine


Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview: As a Staff Site Reliability Engineer (SRE) at The Modern Data Company, you will be a key member of the CTO Office, focusing on enhancing the reliability, scalability, and efficiency of platforms across multi-cloud environments. Your role will involve shaping observability, automation, and operational excellence, and collaborating closely with platform engineering and product teams to establish robust monitoring systems.

Key Responsibilities:
- Design and maintain observability stacks (Prometheus, Thanos, Grafana) for real-time system health monitoring and alerting.
- Lead the implementation of Application Performance Monitoring (APM) and log analytics to reduce Mean Time To Repair (MTTR) and proactively manage system performance.
- Collaborate with product and platform teams to ensure high availability and scalability across cloud regions.
- Automate operational processes such as incident response, infrastructure provisioning, and deployments using scripting and Infrastructure-as-Code.
- Develop resilient Continuous Integration/Continuous Deployment (CI/CD) pipelines to support rapid and secure deployments.
- Optimize infrastructure utilization and cost management across AWS, Azure, and GCP.
- Lead incident response, conduct Root Cause Analysis (RCA), and implement systemic fixes across teams.
- Define and document playbooks, reliability standards, and best practices for operational excellence.

Qualifications Required:
- 5+ years of experience in Site Reliability Engineering (SRE) or DevOps roles with a focus on production ownership.
- Proficiency in observability tools such as Prometheus, Thanos, and Grafana, and in logging frameworks.
- Strong expertise in Kubernetes, Docker, Terraform, and cloud-native architectures.
- Demonstrated experience automating workflows using Python, Bash, or similar scripting languages.
- Hands-on experience with AWS, Azure, and/or GCP cloud platforms.
- Strong problem-solving skills with a preference for automation and scalability.
- Effective communication skills, a commitment to continuous learning, and a collaborative mindset.
