SRE AWS Operations Lead

10 - 14 years

0 Lacs

Posted: 3 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

As a Site Reliability Engineering (SRE) Lead at our company, you will play a crucial role in ensuring the availability, scalability, and operational excellence of our cloud-native product environments. Your leadership and deep technical expertise in AWS, DevOps, and infrastructure reliability will be instrumental in driving the success of our team. This is a senior-level, hands-on position based in Hyderabad or Bengaluru.

Key Responsibilities:

- Lead and mentor a team of SREs and Cloud Operations Engineers.
- Define and enforce reliability standards, SLOs/SLIs, and incident response practices.
- Drive reliability, observability, and automation improvements across cloud-based platforms.
- Act as the bridge between product engineering, DevOps, and support teams for operational readiness.
- Manage production-grade environments hosted on AWS with a focus on high availability and performance.
- Lead incident management processes, perform root cause analysis, and implement corrective actions.
- Own and evolve monitoring, alerting, and observability using tools like CloudWatch, Prometheus, Grafana, and ELK.
- Ensure compliance with security and regulatory standards (e.g., HIPAA, SOC2, GDPR).
- Design and improve CI/CD pipelines using tools like Jenkins, GitHub Actions, or Azure DevOps.
- Implement Infrastructure as Code (IaC) using CloudFormation.
- Automate manual operational tasks and production workflows.
- Support containerized workloads using Docker, ECS, or Kubernetes (EKS).
- Present technical issues, incident reports, and performance metrics to business and technical stakeholders.
- Collaborate with Engineering, Product, and Security teams to embed reliability across the software lifecycle.
- Provide guidance on cloud cost optimization, performance tuning, and capacity planning.

Qualifications Required:

- 10+ years of overall IT experience, including:
  - At least 5 years in AWS cloud operations or SRE.
  - Minimum 3 years in production-grade environments and incident response.
- Strong leadership experience managing high-performing technical teams.
- Deep understanding of SRE principles, DevOps practices, and cloud-native architecture.
- Proven experience in:
  - AWS core services (VPC, EC2, RDS, ECS, EKS, IAM, S3)
  - Container orchestration and microservices
  - Infrastructure as Code (Terraform / CloudFormation)
  - Monitoring & observability tools (ELK, Prometheus, CloudWatch)

(Note: PF/Live Bank statement is mandatory for all the companies.)
