Senior Site Reliability Engineer

8 - 9 years

8 - 9 Lacs

Posted: 1 day ago | Platform: Foundit

Skills Required

ci/cd

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities:

  • Infrastructure Automation: Develop, deploy, and manage Infrastructure as Code (IaC) solutions using tools like Terraform and Ansible to automate provisioning, configuration, and deployment processes.
  • Cloud Platform Expertise: Leverage deep expertise in AWS services such as EC2, S3, VPC, RDS, EKS, ECS, CloudFormation (CF), and more. Familiarity with AWS Lambda and serverless architectures is a plus.
  • Containerization and Orchestration: Manage containerized applications using Docker and Kubernetes (K8s). Package and deploy applications to Kubernetes using Helm.
  • CI/CD Pipelines: Build, manage, and optimize CI/CD pipelines using tools like Jenkins to ensure efficient and reliable software delivery.
  • Monitoring and Alerting: Implement comprehensive monitoring and alerting solutions using tools like the ELK stack, Datadog, CloudWatch, and Grafana to proactively detect and resolve system issues.
  • Incident Management: Lead incident response processes, conduct root cause analysis (RCA) to troubleshoot complex issues, and implement Corrective and Preventive Actions (CAPA) to avoid future incidents.
  • Performance Tuning: Continuously optimize system performance, identify bottlenecks, and implement strategies to improve the scalability and efficiency of the infrastructure.
  • Cost Optimization: Identify and implement strategies to reduce cloud costs while ensuring systems maintain performance, availability, and reliability.
  • Security Best Practices: Ensure infrastructure follows security best practices and implement measures to protect against vulnerabilities and threats.
  • Collaboration and Communication: Work cross-functionally with teams to understand business requirements, provide technical guidance, and ensure alignment with operational goals.
  • SOP Documentation: Create and maintain documentation for infrastructure, processes, incident management, and operational protocols.

Required Qualifications:

  • 7+ years of experience as a DevOps Engineer or Site Reliability Engineer.
  • Strong proficiency in AWS services, including EC2, S3, VPC, RDS, EKS, ECS, and CloudFormation.
  • Experience with Infrastructure as Code (IaC) using tools like Terraform and Ansible.
  • Expertise in containerization technologies (Docker), orchestration platforms (Kubernetes), and Kubernetes package management (Helm).
  • Proficiency in building and managing CI/CD pipelines using tools like Jenkins.
  • Strong understanding of monitoring and alerting solutions such as ELK, Datadog, CloudWatch, and Grafana.
  • Experience in incident management, including Root Cause Analysis (RCA) and Corrective and Preventive Actions (CAPA).
  • Strong knowledge of performance tuning, cost optimization, and security best practices in cloud environments.
  • Excellent communication skills with the ability to collaborate with cross-functional teams.
