Role Description
Role:
Incident & Request Manager – Non-Production Environments
Role Overview
The Incident & Request Manager will lead the incident response and request management functions across all non-production environments (Dev, QA, UAT, Performance). This role acts as the escalation point for project/product delivery teams, ensuring incidents are resolved swiftly, requests are fulfilled efficiently, and lessons learned are embedded into continuous improvement. The manager will directly oversee a team of Incident Analysts and SREs, collaborate with DevOps teams to drive automation, and work closely with Environment and Change Managers to reduce recurrence of issues.
Key Responsibilities
Incident Management
- Own the end-to-end incident lifecycle: detection, triage, response, resolution, and closure.
- Act as the primary escalation point for project/product delivery teams during non-production incidents.
- Lead war rooms for critical incidents, coordinating across technical and delivery stakeholders.
- Escalate to Environment, Change, DevOps, Infrastructure, and Security teams as needed.
- Track and improve incident metrics (MTTR, MTTD) and adherence to SLAs and availability SLOs.
Request Management
- Manage request fulfilment for project/product delivery teams (e.g., access, entitlements, environment services).
- Standardize and automate common request types in collaboration with Intake and DevOps teams.
- Ensure requests are logged, prioritized, and fulfilled within SLA.
- Provide transparent reporting on request status to stakeholders.
Team Leadership
- Manage and mentor Incident Analysts and SREs.
- Ensure follow-the-sun support coverage through onshore/offshore teams.
- Promote a culture of blameless incident management, automation-first practices, and continuous learning.
Governance & RCA
- Ensure all incidents have a documented Root Cause Analysis (RCA).
- Track corrective and preventive actions, feeding outcomes into Change and Environment management processes.
- Deliver trend reporting and actionable insights to leadership.
SRE & DevOps Alignment
- Partner with SREs and DevOps teams to automate detection, rollback, and recovery processes.
- Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring and alerting.
Stakeholder Communication
- Provide timely updates during incidents and whenever request fulfilment is delayed.
- Publish reports on incident trends, RCA outcomes, and SLA adherence.
- Maintain trust and transparency with project/product delivery teams.
Required Skills & Experience
- 8–10 years of experience in Incident Management, Service Operations, or SRE leadership.
- Proven experience managing Incident Analysts and SRE teams.
- Strong technical knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
- Deep understanding of ITIL Incident, Problem, and Request Management processes.
- Excellent crisis management, communication, and stakeholder engagement skills.
Core Skills
- Incident Management
- Request Management
- Infrastructure Management
Additional Comments
Incident & Request Manager – Non-Production Environments
Role Overview
The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance). Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement. The Incident & Request Manager directly manages a team of Incident Analysts and SREs, partners with DevOps teams to automate detection and response, and works closely with Environment and Change Managers to reduce recurrence of issues.
Key Responsibilities
Incident Management
- Own the incident lifecycle: detection, triage, response, resolution, and closure.
- Act as the primary escalation point for project/product delivery teams during non-production environment (NPE) incidents.
- Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
- Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
- Track and improve incident metrics (MTTR, MTTD) and adherence to SLAs and availability SLOs.
Request Management
- Own request fulfilment for project/product delivery teams (e.g., access, entitlements, environment service requests).
- Standardize and automate common request types in collaboration with Intake and DevOps teams.
- Ensure requests are logged, prioritized, and fulfilled within SLA.
- Provide transparency to stakeholders on request status.
Team Leadership
- Manage and mentor Incident Analysts and SREs.
- Ensure follow-the-sun coverage via offshore/onshore teams.
- Build a culture of blameless incident management, automation-first practices, and continuous learning.
Governance & RCA
- Ensure all incidents have a documented Root Cause Analysis (RCA).
- Track corrective and preventive actions, and feed them into Change and Environment management processes.
- Provide trend reporting and insights to leadership.
SRE & DevOps Alignment
- Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
- Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.
Stakeholder Communication
- Provide timely updates during incidents and whenever request fulfilment is delayed.
- Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
- Maintain trust with project/product delivery teams by ensuring transparent communication.
Required Skills & Experience
- 8–10 years in Incident Management, Service Operations, or SRE leadership.
- Experience managing Incident Analysts and SRE teams.
- Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
- Deep understanding of ITIL Incident, Problem, and Request Management processes.
- Excellent crisis management, communication, and stakeholder engagement skills.
Skills
Incident Management, Infrastructure Management, Request Management