Level 3 AWS Infrastructure Support Engineer

5 years

0 Lacs

Posted: 1 week ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview

As a Level 3 AWS Infrastructure Support Engineer, you will:

  • Monitor system health using Datadog and AWS-native tools
  • Investigate alerts and anomalies using established runbooks
  • Resolve production incidents when possible
  • Escalate complex issues quickly and accurately
  • Maintain clean, auditable incident documentation

This role is ideal for someone who thrives in high-trust, high-impact operational environments.

Key Responsibilities

On-Call & Incident Response
  • Provide initial response within 15 minutes for all high-priority production alerts
  • Investigate, mitigate, and resolve production outages when feasible
  • Escalate unresolved or complex issues using the defined escalation matrix
  • Act as the owner of production system stability
Monitoring, Alerting & Observability
  • Analyze and respond to Datadog monitor alerts across infrastructure and application layers
  • Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
  • Proactively notify stakeholders of significant performance or stability concerns
  • Contribute insights for preventive and corrective actions
Root Cause & Trend Analysis
  • Track recurring alerts and incidents
  • Provide analysis and recommendations to reduce alert noise and improve system resilience
  • Participate in weekly validation of Datadog alert configurations and thresholds
Communication & Documentation
  • Maintain clear, concise, and timely communication during incidents
  • Document all incidents, alarms, and observations in Jira during each shift
  • Ensure handoff notes are complete and actionable for daytime engineering teams
Technical Environment

Core AWS Services
  • ECS (Fargate)
  • RDS
  • ElastiCache
  • EC2
  • Lambda
  • API Gateway
  • S3
Tooling
  • Datadog (monitoring, alerts, dashboards)
  • Jira (incident tracking and documentation)
Qualifications

Experience
  • 5+ years of hands-on AWS infrastructure administration and support
  • Proven experience supporting production-grade, high-availability systems
  • Strong background in incident response within enterprise or scale-up environments
Skills
  • Deep operational knowledge of AWS services and distributed systems
  • Strong troubleshooting and root-cause analysis skills under tight SLAs
  • Ability to follow runbooks while also knowing when to think beyond them
  • Calm, structured decision-making during production incidents
Certifications (Preferred)
  • AWS Certified Solutions Architect – Associate or Professional
  • AWS Certified DevOps Engineer – Professional (Nice to Have)
Service Level Expectations
  • Alert Escalation SLA: ≤ 15 minutes for high-priority alarms
  • Availability: Consistent overnight coverage (IST Day Shift)
  • Reliability: Zero missed critical alerts during assigned coverage windows
Deliverables
  • Monthly Service Performance Report, including alerts monitored, incidents resolved, escalations, and SLA adherence metrics
  • Weekly Datadog Validation, ensuring alert accuracy and functionality

