AWS and GCP Team Lead

Consultiquo

5 - 9 years

Salary: Not disclosed

Hyderabad, Telangana

Posted: 1 month ago | Platform: Shine

Skills Required

AWS, GCP, leadership, incident management, monitoring, site reliability engineering, infrastructure as code, Terraform, AWS CloudFormation, GCP Deployment Manager, CI/CD, observability

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview: As the AWS & GCP SRE Team Lead, your primary responsibility is to lead and mentor a team of Site Reliability Engineers in ensuring the reliability and performance of the cloud infrastructure across AWS and GCP. You will collaborate with cross-functional teams to establish resilient and automated systems while fostering a culture of continuous improvement. Your expertise in both AWS and GCP, combined with leadership experience, will be crucial in building highly available systems.

Key Responsibilities:

- Leadership & Team Development:
  - Lead, mentor, and manage a team of SREs specializing in AWS and GCP infrastructure, emphasizing collaboration, innovation, and operational excellence.
  - Establish clear goals and priorities for the team to align with business objectives.
  - Provide guidance in resolving complex technical issues, promoting best practices, and facilitating knowledge sharing within the team.
- Reliability & Scalability:
  - Take ownership of the availability, reliability, and scalability of core services on AWS and GCP.
  - Implement and monitor service-level objectives (SLOs), service-level indicators (SLIs), and error budgets to enhance system reliability.
  - Collaborate with engineering teams to develop and deploy scalable, high-performance cloud solutions meeting operational and business requirements.
- Automation & Infrastructure Management:
  - Drive automation of operational workflows, encompassing provisioning, deployment, monitoring, and incident management for AWS and GCP.
  - Promote the adoption of Infrastructure as Code (IaC) tools and practices like Terraform, AWS CloudFormation, and GCP Deployment Manager.
  - Supervise the implementation of CI/CD pipelines and deployment strategies for faster and more reliable releases.
- Incident Management & Resolution:
  - Lead incident management initiatives to ensure swift responses to critical incidents and minimize downtime.
  - Conduct post-incident reviews (PIRs) to identify root causes, enhance processes, and share insights across teams.
  - Continuously refine the incident response process to reduce MTTR (Mean Time to Recovery) and enhance system stability.
- Collaboration & Communication:
  - Collaborate with engineering, product, and DevOps teams to incorporate SRE best practices into the software development lifecycle.
  - Communicate effectively with both technical and non-technical stakeholders, providing regular updates on service reliability, incident status, and team accomplishments.
- Monitoring & Observability:
  - Ensure comprehensive monitoring and observability practices across AWS and GCP environments utilizing tools such as AWS CloudWatch, GCP Stackdriver, Prometheus, Grafana, or ELK stack.
  - Proactively identify performance bottlenecks and system failures, leveraging metrics and logs to drive enhancements in system reliability.
- Continuous Improvement:
  - Advocate for ongoing enhancement in operational practices, tooling, and processes to drive efficiency and reliability.

