Cloud Engineering Ops Lead (AWS + Application Support)

6 - 10 years

20 - 25 Lacs

Posted: 6 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Cloud Engineering Ops Lead (AWS + Application Support)

Location: Hyderabad (Onsite)

Experience Level: 10+ years

Notice period: Immediate joiner (able to join within a week)

We are seeking a Cloud Engineering Ops Lead responsible for ensuring the stability, observability, security, and cost-efficiency of our AWS environments and customer-facing applications. This role is critical in maintaining production operations that are reliable, predictable, and optimized for performance and resilience.

Key Responsibilities:

1. AWS Platform Operations

  • Manage and maintain AWS core services including EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC, Transit Gateways, and Security Groups.
  • Ensure system hygiene, patching, and infrastructure health.
  • Automate operational workflows using Terraform, Ansible, or Python.

2. Application Support

  • Ensure production readiness through runbooks, pre-deployment validations, performance baselines, and rollback mechanisms.
  • Support releases with deployment assistance, smoke testing, and incident troubleshooting.
  • Drive continuous improvement in application stability and availability.

3. Observability & Monitoring

  • Build and maintain dashboards, logs, metrics, traces, and synthetic monitoring.
  • Ensure alert accuracy: eliminate noise and send targeted notifications.
  • Track SLOs, error budgets, and system performance.
  • Lead incident response and RCA, and implement corrective actions.

4. Backup & Disaster Recovery

  • Define and manage backup and restore operations with schedules, retention rules, replication, and validation.
  • Conduct regular DR drills to ensure RPO/RTO targets are consistently met.
  • Maintain up-to-date documentation on disaster recovery processes.

5. Cost Optimization

  • Enforce cost governance through tagging, right-sizing, reservation planning, and lifecycle management (EBS, EIP, AMIs).
  • Generate cost analysis reports with actionable recommendations to improve efficiency.

6. Team Leadership & Enablement

  • Lead high-severity incident bridges (Sev-1/Sev-2) with clear communication.
  • Mentor team members in operational excellence and preventive practices.
  • Develop reusable runbooks and automation to eliminate repetitive tasks.
  • Promote a culture of reliability, transparency, and proactive improvement.

Success Metrics:

  • Visibility: Dashboards and alerts are reliable, actionable, and service-specific.
  • Backup Health: 100% backup success rate with monthly restore testing.
  • Reliability: Reduced MTTR, increased deployment success rate, and runbook-driven resolutions.
  • Change Management: Stable release cycles with tested rollback strategies.
  • Cost Control: Optimized AWS expenditure with over 95% tagging compliance.

Required Skills & Experience:

  • 10+ years in cloud and application operations with deep expertise in AWS.
  • Proven leadership in managing production incidents and driving operational excellence.
  • Strong knowledge of observability tools: CloudWatch, Prometheus, Grafana, Datadog, etc.
  • Hands-on experience with Terraform, Ansible, and/or Python for automation (IaC).
  • Expertise in backup strategies and disaster recovery practices with real-world restore testing.
  • Solid understanding of AWS cloud networking including VPCs, routing, security groups, and transit gateways.
  • Excellent communication and mentoring ability, with a problem-solving mindset.
