Job Description
As a Director of Cloud, DevOps, and Site Reliability Engineering (SRE) at our company, your main focus will be on executing the technical strategy, implementation, and continuous operation of our cloud infrastructure and services. You will play a crucial role in translating strategic vision into tangible, high-quality, and scalable results.

Key Responsibilities and Execution Focus:
- Lead the migration and deployment of core business applications and services to cloud platforms (e.g., AWS, Azure, GCP), ensuring projects are delivered on time, within budget, and meet defined non-functional requirements (security, scalability, performance).
- Direct the implementation of Continuous Integration/Continuous Delivery (CI/CD) pipelines across all engineering teams, focusing on fully automated, reliable, and repeatable deployments.
- Drive Infrastructure as Code (IaC) adoption (e.g., Terraform, Ansible), establishing a 100% code-driven infrastructure environment with clear governance and review processes.
- Establish and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services, immediately implementing monitoring and alerting to measure against these targets.
- Direct the SRE function to minimize operational toil by developing and deploying automation tools and services for routine tasks, incident response, and capacity management.
- Lead major incident response and post-mortem processes, ensuring effective root cause analysis and implementing immediate, execution-driven solutions to prevent recurrence.
- Execute a robust cost management strategy for cloud resources, implementing FinOps practices to optimize spending without compromising reliability or performance.
- Own the security posture of the cloud environment, working hands-on with security teams to implement and automate compliance and security controls (DevSecOps).

Team Leadership and Mentorship:
- Recruit, develop, and mentor a high-performing team of Cloud Engineers, DevOps Engineers, and SREs, setting clear, execution-focused goals and metrics.
- Foster a culture of ownership, accountability, and execution within the team, emphasizing rapid iteration, collaboration, and a bias for action.
- Act as a hands-on leader by actively participating in design reviews, critical deployments, and troubleshooting efforts.

Qualifications and Requirements:
- Minimum of 10 years of progressive experience in infrastructure, operations, or software engineering, with at least 3 years in a Director or Senior Management role overseeing Cloud, DevOps, or SRE teams.
- Deep expertise in a major cloud provider (AWS, Azure, or GCP), including advanced networking, security services, and serverless architectures.
- Extensive experience implementing and scaling IaC and configuration management tools (e.g., Terraform, Ansible, SaltStack) in a production environment.
- Proven track record of establishing and running SRE practices (SLOs, error budgets, toil reduction) with tangible results in improving service reliability and availability.
- Proficiency in modern scripting/programming languages (e.g., Python, Go, Bash) for automation and tool development.

Education:
- Bachelor's degree in Computer Science, Engineering, or a related field; equivalent practical experience is accepted.