Scope:
We are seeking a highly experienced Senior Major Incident Manager (8–12 years) to lead and oversee the management of high-severity incidents across complex IT environments. The ideal candidate will have extensive experience in IT operations, ITIL-aligned service management, and cross-functional coordination, ensuring minimal business impact and rapid restoration of services.
Our current technical environment:
- Incident management in multi-cloud environments (Azure, AWS, GCP).
- Experience with hybrid infrastructure (on-prem + cloud).
- Understanding of cloud-native monitoring tools (CloudWatch, Azure Monitor, GCP Operations Suite).
- Exposure to containerized platforms (Kubernetes, OpenShift, Docker).
- Familiarity with DevOps/SRE practices (CI/CD, automation, “observability as code”).
- Managing incidents in SaaS, PaaS, and IaaS enterprise environments.
- Knowledge of cloud security, compliance, and governance impacts during incidents.
What you’ll do:
- Major Incident Leadership: Serve as the primary owner and decision-maker for all high-severity incidents, coordinating across multiple technical teams, vendors, and business stakeholders.
- End-to-End Incident Management: Manage the complete lifecycle of major incidents, from identification and logging to resolution and closure, ensuring adherence to SLAs and operational best practices.
- Stakeholder Communication: Provide timely, clear, and accurate updates to senior management, business leaders, and external stakeholders during critical incidents.
- Root Cause Analysis & Continuous Improvement: Lead post-incident reviews, identify systemic issues, recommend preventive measures, and implement process improvements to reduce recurrence.
- Process Governance: Define and enforce incident management processes, policies, and best practices across the organization, ensuring compliance with ITIL frameworks.
- Team Leadership & Mentorship: Guide and mentor Major Incident Managers, L2/L3 support teams, and other operational staff on effective incident handling and crisis management.
- Proactive Risk Management: Collaborate with monitoring, infrastructure, and cloud teams to identify potential service-impacting risks and implement proactive mitigation strategies.
- Reporting & Metrics: Maintain dashboards, KPIs, and metrics to track incident trends, team performance, and continuous service improvement.
What we are looking for:
- 5–8 years of experience in IT operations, service management, or incident management roles, with significant exposure to major incident handling.
- Deep understanding of ITIL processes, especially incident, problem, change, and service continuity management.
- Proven experience managing high-impact, enterprise-wide incidents in complex, multi-cloud or hybrid IT environments.
- Exceptional communication, stakeholder management, and crisis coordination skills.
- Strong analytical, problem-solving, and decision-making abilities under high-pressure situations.
- Demonstrated leadership experience, with the ability to drive cross-functional teams during critical incidents.
- Experience with enterprise monitoring, ticketing, and alerting tools (e.g., ServiceNow, Jira, Splunk, OpsGenie, SolarWinds).
Preferred Skills:
- Experience with cloud platforms (Azure, AWS, GCP) and hybrid infrastructure.
- ITIL Intermediate or Expert certification.
- Knowledge of automation, orchestration, and incident management dashboards.
- Experience in stakeholder reporting to C-level executives.
- Exposure to compliance, audit, and regulatory requirements for IT operations.
Our Values
If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success and the success of our customers. Does your heart beat like ours? Find out here:
Core Values
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.