Posted: 4 weeks ago
Platform:
On-site
Full Time
Location: Mumbai & Hyderabad

Job Summary: The IT Director of Site Reliability Engineering (SRE) and Problem Management is a visionary engineering leader responsible for ensuring the reliability, availability, and performance of the organization's IT systems and services. This role involves leading the SRE function through matrixed cross-functional teams, managing problem resolution processes, and implementing strategies to prevent and mitigate incidents, ensuring seamless operations and continuous improvement.

Key Responsibilities:

Leadership and Team Management: In partnership with the technical Directors, lead and mentor a matrixed team of Engineers, fostering a culture of excellence and continuous improvement. Lead the SRE and Problem Management capabilities, driving cross-functional initiatives to achieve operational excellence and optimizing processes and systems to enhance efficiency and effectiveness.

Site Reliability Engineering: Develop and implement SRE practices to ensure high availability, scalability, and performance of IT systems and services. Oversee the monitoring and observability of IT infrastructure, ensuring comprehensive visibility into system health and performance. Implement automation and tooling to improve reliability and reduce manual intervention. Ensure redundancy and failover mechanisms are in place to maintain service continuity during disruptions.

Problem Management: Lead efforts in identifying, analyzing, and resolving IT problems, ensuring root causes are addressed to prevent recurrence. Partner with the IT Operations leader to contribute to and enhance the incident response protocols, ensuring swift and effective resolution of IT incidents to minimize impact on operations. Implement and manage problem management processes, including post-incident reviews and continuous improvement initiatives.

Strategic and Operational Planning: Develop and implement strategies aligned with the organization's objectives, ensuring resilience and performance. Lead and manage change initiatives, ensuring successful execution and adoption of new processes, technologies, and strategies.

Business Continuity Planning (BCP): Develop and maintain comprehensive BCP and recovery procedures to ensure organizational resilience during emergencies and that structured recovery plans are followed. Conduct regular testing and drills to validate the effectiveness of technology redundancy, failover mechanisms, and BCP plans to ensure readiness.

Vendor and IT Compliance Management: Partner with the IT Operations leader to maintain effective relationships with external vendors and service providers, ensuring service performance, redundancy, and compliance are in line with expectations. Drive problem management initiatives with vendors to ensure root causes are properly addressed and testing is conducted to prevent recurrence.

Performance and Innovation: Implement and monitor key performance indicators (KPIs) to assess the effectiveness of SRE and problem management operations and drive continuous improvement. Stay abreast of emerging technologies and industry trends, recommending and implementing innovations to enhance IT resilience and operations.

Qualifications:

Education: Bachelor's degree in Information Technology, Computer Science, or a related field; Master's degree preferred.

Experience: Minimum of 10 years of experience in IT operations, with at least 5 years in a leadership role focused on SRE and problem management.

Skills: Strong leadership and management skills, excellent communication and interpersonal abilities, strategic thinking, and problem-solving skills.

Technical Expertise: In-depth knowledge of SRE practices, IT infrastructure, network management, telecom, cybersecurity, and cloud computing.

Certifications: Relevant certifications such as ITIL, PMP, CISSP, or equivalent are highly desirable.

Personal Attributes:

Leadership: Demonstrated ability to lead and inspire a global engineering team.

Adaptability: Ability to thrive in a fast-paced, dynamic environment.

Integrity: High ethical standards and commitment to confidentiality.

Innovation: Creative problem-solver, driven by a passion for technology and dedication to outstanding service.
Foundever
Hyderabad, Telangana, India
Salary: Not disclosed
30.0 - 40.0 Lacs P.A.