Posted: 2 months ago | Platform:
Work from Office
Full Time
on AWS using EKS (Elastic Kubernetes Service).
- Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance.
- Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery.
- Lead efforts in incident response and troubleshooting to quickly diagnose and resolve production incidents, minimizing downtime and impact on users.
- Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents.
- Collaborate with cross-functional teams to ensure security, compliance, and performance standards are met throughout the development lifecycle.
- Participate in on-call rotations and provide 24/7 support for critical incidents, ensuring rapid response and resolution.
- Work with the development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain the system's reliability.
- Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
- Minimum of 4 years of experience in a Site Reliability Engineer or similar role, managing cloud-based infrastructure on AWS with EKS.
- Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management.
- Proficiency in using monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
- Experience in implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing.
- Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures.
- Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions.
- Strong scripting and automation skills (Python, Bash, etc.).
- Familiarity with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
- Experience with incident management, post-incident analysis, and implementing improvements based on lessons learned.
- Good understanding of security best practices and compliance standards in cloud environments.
- Exceptional communication skills and the ability to collaborate effectively with cross-functional teams.
- Willingness to participate in on-call rotations and provide off-hours support when necessary.

Preferred:
- Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified SRE, or Kubernetes certifications.
- Experience with other cloud platforms (e.g., Azure, Google Cloud Platform).
- Familiarity with microservices architecture and service mesh technologies.
- Prior experience with application performance tuning and optimization.