Posted: 1 day ago
Work from Office
Full Time
About the Role
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to lead the implementation, optimization, and management of our observability stack across our cloud infrastructure. You will play a key role in ensuring the reliability, scalability, and performance of our platform, which spans microservices on Kubernetes and EC2 as well as mission-critical systems. This role requires strong problem-solving skills, an automation mindset, and a proactive approach to incident management.
Key Responsibilities
Design, implement, and manage monitoring, logging, and alerting systems across production and non-production environments.
Lead incident response, root cause analysis, and post-mortem practices for continuous improvement.
Define and implement disaster recovery strategies with regular testing.
Collaborate with development teams to define and track SLAs/SLOs for critical services.
Optimize AWS cloud infrastructure for cost efficiency, reliability, and scalability.
Build and maintain automation frameworks for deployment, scaling, and recovery using Terraform, GitLab CI/CD, and Kubernetes.
Administer Kubernetes clusters, troubleshoot performance bottlenecks, and ensure high availability.
Manage databases (PostgreSQL or similar), including replication and disaster recovery strategies.
Contribute to infrastructure security, compliance, and best practices.
Participate in the on-call rotation and handle high-priority incidents under pressure.
Required Skills & Experience
4+ years of experience as an SRE, a DevOps engineer, or in a similar role.
Strong hands-on experience with AWS services such as EC2, EKS, RDS, Cognito, and CloudWatch.
Proven expertise in Kubernetes administration in production environments.
Proficiency in scripting and programming (Python, Bash) and in configuration management tools (Chef recipes and cookbooks, Ansible).
Strong knowledge of Infrastructure as Code (Terraform/CloudFormation).
Deep experience with observability tools: Prometheus, Grafana, the ELK stack, and distributed tracing.
Database administration experience with PostgreSQL or similar systems.
Understanding of network protocols, load balancing, and security best practices.
Experience with CI/CD pipelines and GitOps workflows.
Ability to handle multiple incidents and prioritize effectively under pressure.
Exposure to monitoring solutions such as Splunk, Datadog, or Dynatrace.
Preferred Qualifications
AWS Certified Solutions Architect or AWS Certified DevOps Engineer certification.
Certified Kubernetes Administrator (CKA).
Wits Innovation Lab