Cloud Operations & Reliability Lead

5 - 8 years

Posted: 1 week ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Cloud Operations Lead

Key Responsibilities

Cloud Operations & Reliability

  • Manage day-to-day operations across production, staging, and development cloud environments within an R&D context.
  • Ensure high availability of services through robust monitoring, alerting, and incident response processes.
  • Lead root cause analyses (RCA) and post-mortem reviews to drive continuous improvement.
  • Implement observability practices including logging, tracing, and metrics for proactive issue detection.
  • Oversee patch management and maintenance to ensure systems remain secure and up-to-date.

Automation & Optimization

  • Develop and maintain automation scripts for provisioning, scaling, and monitoring cloud resources.
  • Optimize cloud usage through rightsizing, reserved instances, and cost governance (FinOps).
  • Standardize operational runbooks and playbooks to streamline processes and reduce manual effort.

Security & Compliance

  • Enforce security baselines, including IAM, encryption, and network segmentation across cloud services.
  • Collaborate with security teams to implement cloud-native security tools and respond to threats.
  • Ensure compliance with regulatory standards and audits (SOC 2, ISO 27001, GDPR, HIPAA where applicable).

Team Leadership & Collaboration

  • Lead, mentor, and develop a team of cloud operations engineers.
  • Promote a culture of SRE/DevOps best practices, automation, and operational reliability.
  • Partner with application, DevOps, and networking teams to support business-critical R&D initiatives.
  • Act as the escalation point for critical incidents and operational challenges.

Vendor & Stakeholder Management

  • Manage relationships with cloud providers (AWS, Azure, GCP) and monitoring tool vendors.
  • Provide operational metrics and status updates to senior leadership.
  • Collaborate with finance to align cloud cost forecasts and budget planning.

Required Qualifications

Education & Experience

  • Bachelor's degree in Computer Science, IT, or a related field.
  • 5-8 years of experience in cloud operations, SRE, or IT infrastructure.
  • 2+ years in a leadership role managing operational teams, preferably in an R&D environment.

Technical Skills

  • Expertise in at least one major cloud platform (AWS, Azure, GCP).
  • Hands-on experience with monitoring and observability tools (CloudWatch, Datadog, New Relic, Prometheus).
  • Strong knowledge of Infrastructure as Code (Terraform, CloudFormation, ARM templates).
  • Experience with incident management frameworks and practices (ITIL, SRE principles, PagerDuty/on-call rotations).
  • Familiarity with container orchestration (Kubernetes, ECS, AKS, GKE) and CI/CD pipelines.
  • Understanding of cloud security best practices and compliance frameworks.

Soft Skills

  • Proven ability to lead and inspire teams in a fast-paced R&D environment.
  • Strong problem-solving, decision-making, and communication skills.
  • Collaborative mindset to work effectively with technical and business stakeholders.

Preferred Qualifications

  • Cloud certifications (AWS SysOps, Azure Administrator, Google Cloud DevOps Engineer, or equivalent).
  • Experience managing multi-cloud environments.
  • Knowledge of FinOps and cost governance frameworks.
  • Familiarity with ITIL processes or formal service management frameworks.

Key Success Metrics

  • System Uptime: Meet or exceed availability SLAs (>99.9%).
  • Incident Response: Reduce MTTR (Mean Time to Resolution) for critical incidents.
  • Cost Efficiency: Optimize resource utilization and achieve measurable cloud cost savings.
  • Automation: Increase automation coverage for operational tasks year over year.
  • Team Performance: Maintain high team engagement and development.
