Posted: 2 days ago
Platform: On-site
We are seeking an accomplished Senior Site Reliability Engineer (SRE) with 12–15 years of experience to lead the reliability, scalability, and performance engineering of our critical infrastructure and production systems. As a Senior SRE, you will play a strategic and technical leadership role — driving reliability practices, mentoring SRE teams, and influencing the adoption of automation, observability, and resilience engineering across the organization.
You will act as a technical thought leader and hands-on engineer, collaborating with infrastructure, application, and operations teams to build, automate, and scale reliable systems that support global business operations. This role requires deep expertise in cloud platforms, automation, monitoring, incident management, and system design for large-scale distributed environments.
Architect, implement, and manage resilient, scalable, and highly available infrastructure systems.
Drive the creation of observability solutions and dashboards to proactively detect and remediate potential issues.
Lead critical incident response, ensuring swift mitigation and clear communication to stakeholders.
Implement and mature incident management frameworks, including runbooks, playbooks, and post-incident reviews.
Oversee system performance, capacity planning, and scalability of infrastructure across hybrid and cloud environments (AWS, Azure, GCP).
Work closely with architecture and platform teams to accommodate growth, change, and modernization initiatives.
Provide technical leadership and mentorship to SRE teams and cross-functional engineering groups.
Drive collaboration between development, QA, DevOps, and release teams to embed reliability into the software development lifecycle (SDLC).
Define, track, and continuously improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Apply the Four Golden Signals of SRE monitoring — Latency, Traffic, Errors, and Saturation — to guide system health and performance strategies.
Establish and maintain comprehensive documentation of systems, operational procedures, and best practices.
Facilitate learning through technical sessions, blameless postmortems, and cross-team knowledge sharing.
Contribute to defining the long-term SRE strategy, tooling roadmap, and automation frameworks.
Partner with business and technical leaders to ensure alignment of SRE objectives with organizational goals.
Collaborate with security and compliance teams to ensure infrastructure, systems, and operations meet organizational and regulatory standards.
Integrate security practices into CI/CD pipelines to ensure DevSecOps alignment.
Partner with executive and business stakeholders to align SRE initiatives with enterprise objectives and risk frameworks.
Represent SRE functions in technical governance forums, audits, and architecture reviews to drive reliability-focused outcomes.
Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
Strong stakeholder management and cross-functional collaboration skills.
Experience defining and implementing SRE frameworks or centers of excellence in global organizations.
Active participation in industry forums or open-source contributions related to DevOps or SRE practices.
Electronic Arts
Hyderabad, Telangana
Experience: Not specified
Salary: Not disclosed