Project Manager I - Incident Manager

10 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Role Description

Role: Incident & Request Manager – Non-Production Environments

Role Overview

The Incident & Request Manager will lead the incident response and request management functions across all non-production environments (Dev, QA, UAT, Performance). This role acts as the escalation point for project/product delivery teams, ensuring incidents are resolved swiftly, requests are fulfilled efficiently, and lessons learned are embedded into continuous improvement. The manager will directly oversee a team of Incident Analysts and SREs, collaborate with DevOps teams to drive automation, and work closely with Environment and Change Managers to reduce recurrence of issues.

Key Responsibilities

Incident Management

  • Own the end-to-end incident lifecycle: detection, triage, response, resolution, and closure.
  • Act as the primary escalation point for project/product delivery teams during non-production incidents.
  • Lead war rooms for critical incidents, coordinating across technical and delivery stakeholders.
  • Escalate to Environment, Change, DevOps, Infrastructure, and Security teams as needed.
  • Track and improve incident SLAs (MTTR, MTTD, and availability SLOs).

Request Management

  • Manage request fulfilment for project/product delivery teams (e.g., access, entitlements, environment services).
  • Standardize and automate common request types in collaboration with Intake and DevOps teams.
  • Ensure requests are logged, prioritized, and fulfilled within SLA.
  • Provide transparent reporting on request status to stakeholders.

Team Leadership

  • Manage and mentor Incident Analysts and SREs.
  • Ensure follow-the-sun support coverage through onshore/offshore teams.
  • Promote a culture of blameless incident management, automation-first practices, and continuous learning.

Governance & RCA

  • Ensure all incidents have a documented Root Cause Analysis (RCA).
  • Track corrective and preventive actions, feeding outcomes into Change and Environment management processes.
  • Deliver trend reporting and actionable insights to leadership.

SRE & DevOps Alignment

  • Partner with SREs and DevOps teams to automate detection, rollback, and recovery processes.
  • Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring and alerting.

Stakeholder Communication

  • Provide timely updates during incidents and delays in request fulfilment.
  • Publish reports on incident trends, RCA outcomes, and SLA adherence.
  • Maintain trust and transparency with project/product delivery teams.

Required Skills & Experience

  • 8–10 years of experience in Incident Management, Service Operations, or SRE leadership.
  • Proven experience managing Incident Analysts and SRE teams.
  • Strong technical knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
  • Deep understanding of ITIL Incident, Problem, and Request Management processes.
  • Excellent crisis management, communication, and stakeholder engagement skills.

Core Skills

  • Incident Management
  • Request Management
  • Infrastructure Management

UST

IT Services and IT Consulting

Aliso Viejo CA
