Scope
We are seeking a highly experienced Senior Major Incident Manager (8 to 12 years) to lead and oversee the management of high-severity incidents across complex IT environments. The ideal candidate will have extensive experience in IT operations, ITIL-aligned service management, and cross-functional coordination, ensuring minimal business impact and rapid restoration of services.
Our Current Technical Environment
- Incident management in multi-cloud environments (Azure, AWS, GCP).
- Experience with hybrid infrastructure (on-prem + cloud).
- Understanding of cloud-native monitoring tools (CloudWatch, Azure Monitor, GCP Operations Suite).
- Exposure to containerized platforms (Kubernetes, OpenShift, Docker).
- Familiarity with DevOps/SRE practices (CI/CD, automation, observability as code).
- Managing incidents in SaaS, PaaS, and IaaS enterprise environments.
- Knowledge of cloud security, compliance, and governance impacts during incidents.
What You'll Do
- Major Incident Leadership: Serve as the primary owner and decision-maker for all high-severity incidents, coordinating across multiple technical teams, vendors, and business stakeholders.
- End-to-End Incident Management: Manage the complete lifecycle of major incidents, from identification and logging to resolution and closure, ensuring adherence to SLAs and operational best practices.
- Stakeholder Communication: Provide timely, clear, and accurate updates to senior management, business leaders, and external stakeholders during critical incidents.
- Root Cause Analysis & Continuous Improvement: Lead post-incident reviews, identify systemic issues, recommend preventive measures, and implement process improvements to reduce recurrence.
- Process Governance: Define and enforce incident management processes, policies, and best practices across the organization, ensuring compliance with ITIL frameworks.
- Team Leadership & Mentorship: Guide and mentor Major Incident Managers, L2/L3 support teams, and other operational staff on effective incident handling and crisis management.
- Proactive Risk Management: Collaborate with monitoring, infrastructure, and cloud teams to identify potential service-impacting risks and implement proactive mitigation strategies.
- Reporting & Metrics: Maintain dashboards, KPIs, and metrics to track incident trends, team performance, and continuous service improvement.
What We Are Looking For
- 5 to 8 years of experience in IT operations, service management, or incident management roles, with significant exposure to major incident handling.
- Deep understanding of ITIL processes, especially incident, problem, change, and service continuity management.
- Proven experience managing high-impact, enterprise-wide incidents in complex, multi-cloud or hybrid IT environments.
- Exceptional communication, stakeholder management, and crisis coordination skills.
- Strong analytical, problem-solving, and decision-making abilities under high-pressure situations.
- Demonstrated leadership experience, with the ability to drive cross-functional teams during critical incidents.
- Experience with enterprise monitoring, ticketing, and alerting tools (e.g., ServiceNow, Jira, Splunk, OpsGenie, SolarWinds).
Preferred Skills
- Experience with cloud platforms (Azure, AWS, GCP) and hybrid infrastructure.
- ITIL Intermediate or Expert certification.
- Knowledge of automation, orchestration, and incident management dashboards.
- Experience in stakeholder reporting to C-level executives.
- Exposure to compliance, audit, and regulatory requirements for IT operations.
Our Values
If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success and the success of our customers. Does your heart beat like ours? Find out here: Core Values
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.