Site Reliability Engineer (SRE) – L2 Support

6 - 9 years

6 - 9 Lacs

Posted: 6 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

You'll Make a Difference By:

  • SRE L2 Support Role: Focus on maintaining and improving the reliability, availability, and performance of AWS-based infrastructure and applications.
  • Incident Management: Handle and resolve L2 incidents related to AWS services (EC2, RDS, S3, Lambda, EKS, etc.), perform root cause analysis, and communicate with customers during outages or SLA breaches.
  • Monitoring & Optimization: Proactively monitor infrastructure and application health in AWS; set up and fine-tune AWS monitoring and observability tools (e.g., CloudWatch, CloudTrail); and create alarms, dashboards, and reports.
  • Troubleshooting AWS Services: Resolve issues related to EC2 instances, Auto Scaling groups, load balancers (ELB/ALB/NLB), Amazon ECS, EKS, and container workloads.
  • Log Management: Manage and analyze logs using Amazon CloudWatch Logs, CloudTrail, and third-party solutions such as the ELK Stack, Datadog, and Splunk.
  • Disaster Recovery & Backups: Monitor AWS Backup jobs, ensure regular backups for critical infrastructure, validate DR plans, and participate in recovery testing exercises.
  • Automation & Scripting: Contribute to the automation of repetitive tasks using scripts and support incident recovery processes.
  • Documentation & Knowledge Sharing: Create and maintain operational runbooks, SOPs, and knowledge base articles for common AWS issues.
  • Collaboration: Work effectively across teams, shift ownership as required, and communicate with stakeholders during incidents.


You'd Describe Yourself As:

  • An experienced professional with 6 to 9 years of relevant experience in SRE, DevOps, or Cloud Infrastructure Support, with strong hands-on expertise in AWS services.
  • Proficient in monitoring tools such as Prometheus and Datadog, and familiar with cloud platforms (AWS, Azure, GCP).
  • Knowledgeable in Linux/Unix operating systems, with basic scripting skills (e.g., Python, GitLab actions).
  • Familiar with container orchestration (Kubernetes, Docker, Helm charts), CI/CD pipelines, and GitOps workflows (e.g., ArgoCD for automated deployments).
  • Strong analytical skills to resolve production incidents and a basic understanding of networking concepts (DNS, load balancers, firewalls).
  • Experienced with alerting systems (e.g., PagerDuty) and incident tracking tools (e.g., JIRA, ServiceNow), and able to handle high-pressure environments.
  • A proactive problem-solver with a strong sense of urgency and excellent organizational skills to prioritize tasks effectively.
  • Able to work as a teammate, collaborating across teams and owning tasks as needed.


Preferred Certifications:

  • AWS Certified SysOps Administrator Associate
  • AWS Certified Solutions Architect Associate
  • AWS Certified DevOps Engineer Professional
