Director - DevOps & SRE

12 - 18 years

0 Lacs

Posted: 1 week ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

As the Director of DevOps & SRE at Zycus, you will lead the global DevOps and reliability engineering organization for our fast-scaling SaaS platform. You will define and execute DevOps & SRE strategies, manage large-scale cloud and hybrid infrastructure, and drive AI-powered automation to ensure operational excellence. You will also lead and mentor a team of architects, engineers, and managers, define preventive processes, and foster a results-driven culture across all DevOps and SRE teams.

Key Responsibilities:

- Lead and mentor a global team of DevOps and SRE architects, engineers, and managers.
- Define and execute the DevOps & SRE strategy, aligning infrastructure, reliability, and automation with business objectives.
- Serve as the primary escalation point for critical product and operational issues.
- Define processes, playbooks, and preventive measures to avoid recurring incidents.
- Foster a results-driven, hands-on culture across all DevOps and SRE teams.
- Manage large-scale SaaS environments spanning thousands of servers, microservices, and hybrid cloud/on-prem infrastructure.
- Architect and implement cloud-native and hybrid infrastructure leveraging AWS, Azure, GCP, and on-prem resources.
- Build and optimize CI/CD pipelines and infrastructure automation using Python, Ansible, Terraform, and related tooling.
- Lead system architecture and performance optimization for load balancers, proxies, caches, messaging queues, and secure APIs.
- Integrate Generative AI / AIOps for anomaly detection, predictive scaling, automated remediation, and proactive monitoring.
- Evaluate and implement AI-based tools or develop custom scripts/models to reduce manual toil and increase operational efficiency.
- Drive continuous innovation in DevOps tooling, automation frameworks, and best practices.
- Partner with Product, Engineering, QA, and Security teams to embed observability, automation, and reliability early in the SDLC.
- Define KPIs, SLAs, and SLOs to measure system performance and operational efficiency.
- Lead post-incident reviews and root cause analysis, and implement preventive processes.

Qualifications Required:

- 12-18 years in DevOps, SRE, or cloud/hybrid infrastructure roles, with 5+ years in senior leadership.
- Proven experience managing large-scale SaaS environments.
- Deep expertise in AWS, Azure, GCP, and on-prem integration.
- Hands-on expertise in Kubernetes, Docker, Terraform, Ansible, Packer, Nomad, and automation frameworks.
- Strong scripting and automation skills in Python and related tooling.
- Experience in handling critical escalations, root cause analysis, and preventive process design.
- Strong architectural design, problem-solving, and stakeholder management skills.
- Bachelor's or Master's in Computer Science, IT, or a related field preferred.
- Experience with observability tools like Datadog, Prometheus, Grafana, New Relic, or AI-native platforms preferred.
- AWS, Azure, or hybrid cloud certifications are a plus.
