Job Summary
The Senior Director of Cloud Operations is responsible for the operational integrity, performance, and reliability of enterprise cloud environments. Reporting directly to the Vice President of Cloud, this role leads a global, data-driven team of cloud engineers, along with the SRE practice, service management tools, and operations, using a metrics-first approach with a strong emphasis on incident management, service continuity, and continuous improvement.
What Your Impact Will Look Like
- Cloud Infrastructure Operations
- Oversee the daily operations of cloud platforms (AWS, Azure, GCP), ensuring high availability and performance across global regions.
- Lead the development and execution of operational runbooks, SOPs, and escalation paths.
- Incident Management & Response
- Own the end-to-end incident management lifecycle: detection, triage, escalation, resolution, and post-incident review.
- Lead a global incident response team with 24/7 coverage, ensuring seamless handoffs across time zones.
- Implement real-time monitoring, alerting, and automated remediation to reduce MTTD and MTTR.
- Use data analytics to identify incident trends, recurring issues, and systemic risks.
- Conduct blameless postmortems and ensure corrective actions are prioritized and tracked to closure.
- Data-Driven Operational Leadership
- Build and lead a global team of cloud engineers, SREs, and operations analysts using a metrics-first approach.
- Define and track operational KPIs (e.g., uptime, incident frequency, resolution time, change success rate) to drive accountability and performance.
- Leverage dashboards and analytics platforms (e.g., Datadog, Grafana, Splunk, ServiceNow) to provide real-time visibility into system health and team performance.
- Use data to inform staffing models, on-call rotations, and workload balancing across regions.
- Foster a culture of continuous improvement through data-backed retrospectives and operational reviews.
- AI-Enabled Focus
- Drive AI and ML adoption in operational workflows (e.g., predictive monitoring, incident pattern analysis) to improve uptime and automate repetitive tasks.
- Define and execute an AI-driven observability strategy using AIOps platforms for intelligent alerting and root cause analysis.
- Collaborate with Engineering, Security, and Product teams to embed AI-enabled automation in deployment pipelines, change management, and related processes.
- Establish and maintain SLOs/SLAs leveraging AI-generated insights to prioritize engineering work that improves reliability and customer experience.
- Oversee incident management, postmortems, and continuous improvement, incorporating AI tools for impact analysis and knowledge retention.
- Operational Governance
- Define and enforce SLAs, SLOs, and operational KPIs.
- Ensure compliance with security, regulatory, and audit requirements.
- Manage change control, configuration management, and release processes to minimize operational risk.
- Cost & Vendor Management
- Monitor and optimize cloud spend through cost governance and usage analysis.
- Manage vendor relationships, contracts, and service-level agreements.
- Collaboration & Communication
- Partner with engineering, security, and business teams to align operations with product and service goals.
- Provide regular reporting and updates to executive leadership on operational health, risks, and incident trends.
You Will Love This Job If You Have
- Education
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
- Experience
- 14+ years in IT operations, with 7+ years in cloud infrastructure and operations leadership.
- Proven experience leading global teams and managing high-severity incidents in large-scale environments.
- Skills
- Deep expertise in cloud operations, incident response, and service reliability.
- Strong knowledge of ITIL, SRE, and DevOps practices.
- Proficiency in operational analytics and observability tools.
- Excellent leadership, communication, and cross-functional collaboration skills.
- Strong presentation skills, including experience presenting to large global audiences.
- Certifications (Preferred)
- AWS Certified DevOps Engineer – Professional
- Azure Administrator Associate
- ITIL Foundation or Practitioner
The Benefits
At Granicus, we offer a comprehensive and flexible benefits package designed to support your well-being, growth, and work-life balance. Here’s what you can expect as an India-based team member:
Flexibility & Balance
- Paid Time Off – Take the time you need to rest, recharge, and live your life.
- Company-Wide Wellbeing Days – Paid days off to unplug and focus on your mental health.
- Work From Home Reimbursement – Support a productive home office environment.
Health & Wellness
- Private Healthcare Benefits – Comprehensive coverage for you and your family.
- On-Demand Mental Health Support – Access to Headspace and other wellness tools.
- Fitness Reimbursement & Cycle Program – Stay active, your way.
- Critical Illness and Life Insurance Benefits
Family & Future
- Paid Parental Leave – For both birthing and non-birthing parents.
- Pension plan with employer contributions
Growth & Recognition
- Online Learning Platforms – Fuel your professional development.
- Competitive Salary & Bonuses – Your contributions are valued and rewarded.