Cloud Engineering Ops Lead (AWS + Application Support)

10 - 12 years

Posted: 1 day ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Job Title: Cloud Engineering Ops Lead (AWS + Application Support)

Location: Hyderabad (Onsite)

Experience Level: 10+ years

Notice period:

About us:

Conglomerate IT is a certified company and a pioneer in providing premium end-to-end Global Workforce Solutions and IT Services to diverse clients across various domains. Visit us at https://www.conglomerateit.com/

Conglomerate IT's mission is to establish global cross-cultural human connections that further the careers of our employees and strengthen the businesses of our clients. We are driven to use the power of our global network to connect businesses with the right people, without bias. We provide Global Workforce Solutions with affability.

We are seeking a Cloud Engineering Ops Lead responsible for ensuring the stability, observability, security, and cost-efficiency of our AWS environments and customer-facing applications. This role is critical in maintaining production operations that are reliable, predictable, and optimized for performance and resilience.

Key Responsibilities:

1. AWS Platform Operations

  • Manage and maintain AWS core services including EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC, Transit Gateways, and Security Groups.
  • Ensure system hygiene, patching, and infrastructure health.
  • Automate operational workflows using Terraform, Ansible, or Python.

2. Application Support

  • Ensure production readiness through runbooks, pre-deployment validations, performance baselines, and rollback mechanisms.
  • Support releases with deployment assistance, smoke testing, and incident troubleshooting.
  • Drive continuous improvement in application stability and availability.

3. Observability & Monitoring

  • Build and maintain dashboards, logs, metrics, traces, and synthetic monitoring.
  • Ensure alert accuracy: eliminate noise and ensure targeted notifications.
  • Track SLOs, error budgets, and system performance.
  • Lead incident response, RCA, and implement corrective actions.
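For candidates less familiar with the term, the error-budget tracking mentioned above can be illustrated with a minimal Python sketch. It assumes a request-based availability SLO; the function name, the 99.9% target, and the request counts are hypothetical and not taken from our environment.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    A result of 1.0 means no budget has been used; a negative result
    means the SLO has been breached for the measured window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent

    # Number of failed requests the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.1%}")
```

In practice the counts would come from whichever observability tool is in use; the arithmetic stays the same.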

4. Backup & Disaster Recovery

  • Define and manage backup and restore operations with schedules, retention rules, replication, and validation.
  • Conduct regular DR drills to ensure RPO/RTO targets are consistently met.
  • Maintain up-to-date documentation on disaster recovery processes.

5. Cost Optimization

  • Enforce cost governance through tagging, right-sizing, reservation planning, and lifecycle management (EBS, EIP, AMIs).
  • Generate cost analysis reports with actionable recommendations to improve efficiency.

6. Team Leadership & Enablement

  • Lead high-severity incident bridges (Sev-1/Sev-2) with clear communication.
  • Mentor team members in operational excellence and preventive practices.
  • Develop reusable runbooks and automation to eliminate repetitive tasks.
  • Promote a culture of reliability, transparency, and proactive improvement.

Success Metrics:

  • Visibility: Dashboards and alerts are reliable, actionable, and service-specific.
  • Backup Health: 100% backup success rate with monthly restore testing.
  • Reliability: Reduced MTTR, increased deployment success rate, and runbook-driven resolutions.
  • Change Management: Stable release cycles with tested rollback strategies.
  • Cost Control: Optimized AWS expenditure with over 95% tagging compliance.
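As an illustration only, the tagging compliance figure above could be computed along the following lines. This is a rough Python sketch: the required tag keys and the shape of the resource records are assumptions made for the example, not our actual tagging policy or an AWS API response format.

```python
# Hypothetical tagging policy; the real set of required keys may differ.
REQUIRED_TAGS = {"Owner", "Environment", "CostCenter"}


def tagging_compliance(resources: list) -> float:
    """Return the percentage of resources that carry every required tag key."""
    if not resources:
        return 100.0  # nothing to check, so nothing is out of compliance

    compliant = sum(
        1 for resource in resources
        if REQUIRED_TAGS <= set(resource.get("tags", {}))
    )
    return 100.0 * compliant / len(resources)


if __name__ == "__main__":
    # Hypothetical inventory records, e.g. exported from a tagging report.
    inventory = [
        {"id": "i-001", "tags": {"Owner": "ops", "Environment": "prod", "CostCenter": "42"}},
        {"id": "vol-002", "tags": {"Owner": "ops"}},
    ]
    print(f"Tagging compliance: {tagging_compliance(inventory):.1f}%")
```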

Required Skills & Experience:

  • 10+ years in cloud and application operations with deep expertise in AWS.
  • Proven leadership in managing production incidents and driving operational excellence.
  • Strong knowledge of observability tools: CloudWatch, Prometheus, Grafana, Datadog, etc.
  • Hands-on experience with Terraform, Ansible, and/or Python for automation (IaC).
  • Expertise in backup strategies and disaster recovery practices with real-world restore testing.
  • Solid understanding of AWS cloud networking including VPCs, routing, security groups, and transit gateways.
  • Excellent communication, mentoring ability, and problem-solving mindset.
