Posted: 2 months ago
Work from Office
Full Time
Design and implement high-availability systems, ensuring they are reliable, performant, and scalable. Establish and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs). Perform root cause analysis of system failures, providing insights and ensuring preventive measures are in place. Proactively manage incidents, ensuring timely resolution and effective post-mortem analysis to prevent recurrence.

Automate infrastructure provisioning, deployment pipelines, and operational processes. Build, maintain, and optimize CI/CD pipelines on platforms such as GitLab (preferred), GitHub Actions, or Jenkins. Develop and manage Infrastructure as Code (IaC) using tools such as Terraform, AWS CDK, and CloudFormation. Champion the adoption of automation and DevOps best practices across teams.

Implement and manage enterprise observability tools such as Datadog, Dynatrace (preferred), or Grafana for monitoring, alerting, and performance tracking. Establish proactive monitoring and alerting systems to ensure the health of applications and infrastructure. Create and maintain robust incident response processes and manage on-call rotations so that incidents are handled efficiently.

Optimize system performance and plan capacity to ensure efficient resource utilization. Implement horizontal scaling strategies so that systems can handle increasing load. Collaborate with development teams to improve application resilience, optimize performance, and manage system health.

Manage and optimize infrastructure on a major cloud platform (AWS, GCP, or Azure). Work with cloud infrastructure tools such as Terraform and AWS CDK to provision and manage cloud resources. Implement infrastructure automation and ensure the infrastructure is scalable, reliable, and secure.

Ensure security best practices are followed in infrastructure, code, and deployment processes. Conduct regular vulnerability assessments and work with teams to remediate identified risks. Ensure compliance with industry standards and organizational security requirements.

Act as a technical leader, providing guidance and mentorship to junior SREs and other team members. Collaborate across development, operations, and product teams to drive a DevOps culture focused on automation, reliability, and efficiency. Advocate for a culture of ownership, continuous improvement, and shared responsibility across teams.

Strong experience in a previous SRE role, with a proven track record of maintaining highly available and scalable systems. Expertise in one or more programming languages such as Python, Go, or Java. Deep understanding of distributed systems, networking, and operating systems. Hands-on experience with cloud platforms (AWS, GCP, or Azure). Proficiency with enterprise observability tools such as Datadog, Dynatrace, or Grafana. Extensive experience with CI/CD platforms such as GitLab (preferred), GitHub Actions, or Jenkins. Experience with cloud infrastructure and automation tools such as Terraform, AWS CDK, or similar IaC frameworks. Solid understanding of containerization and orchestration tools (e.g., Docker, Kubernetes). Strong knowledge of database management (SQL and NoSQL).
UST
Trivandrum
25.0 - 27.5 Lacs P.A.