Site Reliability Engineer - Elastic Kubernetes Service

5 - 10 years

13.0 - 17.0 Lacs P.A.

Chennai

Posted: 2 months ago | Platform: Naukri

Skills Required

Elastic Kubernetes, Continuous Integration, Kubernetes, Python, AWS IAM, SRE, Reliability, CI/CD, EKS, Site Reliability Engineering, Cloud Technologies, Root Cause Analysis, Azure DevOps, Docker, Containerization, DevOps, Datadog, Troubleshooting, Bash, AWS, CI/CD Pipeline

Work Mode

Work from Office

Job Type

Full Time

Job Description

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a pivotal role in ensuring the reliability, availability, and performance of our cloud-based infrastructure hosted on AWS with EKS. You will work closely with cross-functional teams to implement best practices for monitoring, automation, and continuous integration and deployment using tools such as Datadog and Azure DevOps. The ideal candidate has a solid background in cloud technologies, troubleshooting, and production release support.

Responsibilities

Collaborate with development and operations teams to design, implement, and manage scalable and reliable infrastructure solutions on AWS using EKS (Elastic Kubernetes Service).
Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance.
Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery.
Lead incident response and troubleshooting to quickly diagnose and resolve production incidents, minimizing downtime and impact on users.
Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents.
Work with development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain the system's reliability.
Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team.

Qualifications

Minimum of 4 years of experience in a Site Reliability Engineer or similar role, managing cloud-based infrastructure on AWS with EKS.
Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management.
Proficiency in monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
Experience implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing.
Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures.
Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions.
Strong scripting and automation skills (Python, Bash, etc.).

Information Technology & Services
Bromley
