Responsibilities
A Senior Manager leading a 24x7 Application Support Desk with a focus on maximum system uptime holds a critical position. The primary priorities are operational efficiency, team management, incident response, and stakeholder communication. Below are the typical roles and key responsibilities:
- Team Leadership and People Management
  - Lead, mentor, and coach a team of support analysts, team leads, and shift managers across all shifts.
  - Define staffing models and ensure adequate coverage for all shifts, including holidays and weekends.
  - Oversee recruitment, training, skill development, and performance management of support staff.
- Operational Oversight
  - Ensure a seamless 24x7 support operation with strict adherence to SLAs and quick turnaround times.
  - Monitor the health of critical business applications and infrastructure using dashboards and automated alerting.
  - Optimize scheduling, handover, and escalation procedures for all shifts.
- Incident and Problem Management
  - Oversee the incident response process, ensuring rapid triage, escalation, and resolution of production incidents to minimize downtime.
  - Lead root cause analysis (RCA) for major incidents; champion corrective and preventive actions.
  - Maintain and regularly update incident playbooks and runbooks.
- Continuous Service Improvement
  - Analyze recurring incidents and work proactively with application development and infrastructure teams on permanent fixes.
  - Implement process improvements and automation to reduce manual interventions and errors.
  - Regularly review and update SOPs for efficiency and compliance.
- Stakeholder and Vendor Management
  - Serve as the main point of contact for business leaders, product owners, and external vendors regarding application support and uptime.
  - Provide regular updates on system health, incidents, and service performance to senior management.
  - Manage vendor SLAs and escalate issues as necessary.
- Compliance, Security, and Audit
  - Ensure all processes comply with internal security, audit, and regulatory requirements.
  - Oversee logging, monitoring, and evidence collection for internal and external audits.
  - Enforce data security, user access policies, and incident response protocols.
- Reporting and Analytics
  - Publish daily, weekly, and monthly reports on system uptime, incident volumes, resolution times, trends, and SLA adherence.
  - Present insights and recommendations to leadership, using data to justify headcount, budget, or tooling changes.
- Disaster Recovery and Business Continuity
  - Own and regularly test the disaster recovery (DR) plans and business continuity plans (BCP) for all critical applications.
  - Conduct DR drills; lead the response in actual disaster or high-impact incident scenarios.
- Customer Experience and Communication
  - Ensure timely and transparent communication with users and stakeholders during outages and major incidents.
  - Gather user feedback post-incident to drive improvements in service delivery and support experience.
- Technology and Tooling
  - Evaluate, implement, and optimize monitoring, ticketing, alert management, and automation tools to support 24x7 operations.
  - Stay abreast of technology trends in AIOps, automation, and application monitoring.
Note: The above responsibilities may be tailored to the organization’s technology stack, business priorities, and regulatory landscape. The unifying theme is a relentless focus on system uptime, proactive problem management, and a culture of continuous service improvement.
Skills: reporting & analytics (KPI, SLA, MIS), application support (24x7 operations), disaster recovery (DR) & business continuity (BCP), incident & problem management, system uptime monitoring, vendor management, compliance & audit readiness, customer experience management, stakeholder communication, team leadership & people management, change management, root cause analysis (RCA)