Senior Site Reliability Lead - SaaS & Cloud Operations

8 - 12 years

0 Lacs

Posted: 23 hours ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Senior Site Reliability Engineer (SRE) Manager at our company, you will play a crucial role in leading and managing a team of SRE/Operations engineers to foster a collaborative and high-performing work environment.

Your responsibilities will include:

- Overseeing the support of our SaaS products and services on AWS to ensure optimal uptime and performance.
- Implementing and managing Infrastructure as Code (IaC) tools like Terraform and Helm for automating infrastructure provisioning and configuration. Experience with Ansible would be a plus.
- Managing and optimizing our Kubernetes clusters (EKS) and CI/CD pipelines using tools like ArgoCD or similar solutions.
- Proactively monitoring system health, identifying and troubleshooting issues, and implementing solutions to minimize downtime.
- Responding effectively to incidents, diagnosing root causes, and leading the team in implementing swift resolutions.
- Developing and implementing processes and procedures for efficient SRE operations.
- Staying up-to-date on the latest trends and advancements in cloud technologies, SRE best practices, and automation tools.
- Creating a culture of knowledge sharing and providing mentorship to junior engineers to help them grow their skills.

Qualifications required for this role include:

- 8+ years of experience in system administration, cloud operations, or a related field.
- Proven experience in leading and managing a team of SRE/Operations engineers.
- Solid understanding of SaaS delivery models and support methodologies.
- In-depth knowledge of AWS cloud services and best practices.
- Expertise in Infrastructure as Code (IaC) tools like Terraform and Helm, with experience in Ansible considered a plus.
- Experience with Kubernetes (EKS) and CI/CD pipelines, preferably using ArgoCD or similar tools.
- Excellent problem-solving, analytical, and troubleshooting skills.
- Strong leadership, communication, collaboration, and interpersonal skills.
- Passion for building and mentoring high-performing teams.

Join us at our company as we strive to maintain a cutting-edge SRE environment and deliver top-notch services to our clients.
