Experience: 7 - 12 years
Salary: 30 - 40 Lacs P.A.
Work from Office
Full Time
We are seeking an experienced and proactive Site Reliability Engineering (SRE) Lead with a
strong background in support operations, service management, and debugging complex
systems built on Java and microservices architecture. This role is crucial in ensuring the
reliability, stability, and efficiency of our critical systems while driving process
improvements, incident management, and cross-functional collaboration. As an SRE Lead,
you will oversee system health, manage escalations, track and ensure ticket closures, follow
up on issues, and enhance support processes to deliver a seamless operational experience.
Experience: 7+ years
Service Reliability & Operational Excellence:
• Ensure high availability and performance of critical services through proactive
monitoring and issue resolution.
• Define and uphold Service Level Indicators (SLIs) and Service Level Objectives
(SLOs) aligned with business needs.
• Identify recurring operational challenges and implement process improvements to
enhance service reliability.
Incident & Problem Management:
• Lead incident response efforts, ensuring quick resolution and minimal business
impact.
• Establish robust on-call processes and ensure smooth incident handling across teams.
• Conduct post-incident reviews, documenting learnings and driving continuous
improvement initiatives.
• Collaborate with engineering teams to ensure long-term fixes for recurring incidents.
• Possess strong debugging skills and the ability to analyze and resolve complex issues.
Support & Escalation Management:
• Act as the primary point of contact for major incidents, working with cross-functional
teams to resolve issues.
• Manage support escalations efficiently, ensuring timely communication and
resolution.
• Track and ensure timely closure of support tickets and incidents.
• Follow up on pending issues to drive resolution and prevent recurring problems.
• Develop and enhance support playbooks and standard operating procedures (SOPs).
• Foster a culture of accountability and knowledge sharing within the team.
Collaboration & Stakeholder Management:
• Work closely with development, infrastructure, and business teams to align
operational goals.
• Ensure seamless communication between engineering teams, customer support, and
leadership.
• Provide regular updates on system health, incidents, and improvements to
stakeholders.
• Advocate for operational needs in engineering and product discussions.
Process Improvement & Automation:
• Streamline support workflows and implement best practices for efficient issue
resolution.
• Drive automation initiatives to reduce manual operational tasks and improve response
times.
• Ensure documentation and knowledge management practices are maintained
effectively.
Leadership & Team Development:
• Mentor and support a team of SREs, fostering a culture of reliability and operational
excellence.
• Promote a customer-first mindset within the team.
• Encourage collaboration, learning, and professional growth among team members.
Required Skills:
• Strong experience in IT operations, support, or service reliability roles.
• Proven track record in incident management, troubleshooting, and root cause analysis.
• Strong Java knowledge with an understanding of microservices architecture.
• Experience with monitoring and alerting tools (e.g., Grafana, Prometheus, New Relic,
or similar).
• Familiarity with Kubernetes and cloud-based environments (AWS, Azure, GCP).
• Familiarity with ITIL practices and service management methodologies.
• Strong communication and stakeholder management skills.
• Ability to manage escalations effectively and ensure timely issue resolution.
• Strong skills in tracking support issues, ensuring ticket closures, and following up on
action items.
• Prior experience in an SRE, IT operations, or support leadership role.
• Knowledge of ticketing and ITSM tools (e.g., ServiceNow, Jira Service Management,
or similar).
• Understanding of compliance, security, and best practices in support operations.
• Exposure to automation and process improvement initiatives.
WOW Softech