About Crunchyroll
Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it's powered by the anime content we all love.
Join our team, and help us shape the future of anime!
About The Team
The Crunchyroll Enterprise Technology DevOps Team is a group of hardworking and forward-thinking cloud engineers passionate about automating the platform to enable our software developers to deploy their applications efficiently. Our customers are the software engineers and other internal business units who work around us every day and need our support. We live to make things easier for them to rapidly develop, test, and deploy new features and applications. Together, we strive to bring an extraordinary experience to our community of fans.
About The Role
We are seeking a seasoned Principal Cloud Platform Engineer to lead our cloud-native approach to building services optimized for their target platforms, and to follow a poly-cloud strategy by selecting different cloud providers based on workload needs. This role will be pivotal in designing, automating, and scaling cloud operations across GCP and AWS, with a strong emphasis on reliability, security, and performance. You'll partner closely with engineering, architecture, and security teams to evolve our cloud environment into a robust, scalable, and cost-optimized platform.
Key Responsibilities
- Lead the design and automation of cloud infrastructure across AWS and GCP, ensuring scalability, security, and high availability.
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, and Pulumi.
- Design, build, and optimize CI/CD pipelines to streamline application and infrastructure deployments.
- Define and implement cloud governance standards across IAM, security, compliance, and cost management.
- Act as a technical lead for DevOps, SRE, and platform teams; mentor engineers and promote engineering best practices.
- Drive observability and diagnostics by implementing tools such as Datadog, Prometheus, Grafana, and the ELK Stack.
- Drive adoption of DevOps and SRE principles, including incident management, post-incident reviews, SLAs/SLOs, and service reliability engineering.
- Lead modernization efforts for services running in AWS and GCP, including workload migrations and performance tuning.
- Collaborate with engineering teams to design cloud-native, resilient platforms aligned with business goals.
Required Qualifications
- 12+ years of experience in Information Technology or Engineering, including a minimum of 9 years in CloudOps, DevOps, or related roles.
- Proven, hands-on expertise with both Google Cloud Platform (GCP) and Amazon Web Services (AWS), including compute, networking, storage, and serverless services.
- Extensive experience with Terraform, CloudFormation, and other infrastructure automation tools.
- Strong Linux systems engineering background and scripting skills in Python, Bash, or Go.
- Proven expertise managing Kubernetes workloads in production (e.g., GKE, EKS), including deployment automation and resource tuning.
- Deep understanding of CI/CD pipeline design and best practices for secure, reliable, and repeatable deployments.
- Experience with GitHub and GitHub Actions workflows.
- Solid knowledge of cloud security, IAM policies, VPC design, and cost optimization strategies.
- Strong problem-solving skills and a track record of designing durable, scalable infrastructure solutions.
- Excellent collaboration and communication skills, with the ability to influence architecture and mentor engineers.
Preferred Qualifications (Pluses)
- Professional certifications such as:
  - AWS Certified DevOps Engineer - Professional
  - Google Professional Cloud DevOps Engineer
  - Google Professional Cloud Architect
- Experience with multi-cloud or poly-cloud architecture strategies (GCP, AWS).
- Familiarity with service mesh technologies (e.g., Istio, Linkerd).
- Data warehouse management using tools such as Databricks.
A day in the life of our Principal Cloud Platform Engineer:
We lead a global operation and regularly split projects and on-call duties across time zones. We believe in working hard while maintaining a healthy work-life balance. You may find yourself doing any of the following on any given day:
- Cloud Operations: Monitor AWS/GCP health, ensure SLO compliance, and troubleshoot performance across infrastructure and applications.
- Infrastructure Engineering: Build and manage multi-cloud infrastructure using Terraform and Pulumi, with a focus on scalability, automation, and resiliency.
- Architecture & Strategy: Participate in architectural decisions to uphold best practices in scalability, security, container orchestration, and network design.
- Team Leadership: Lead daily engineering syncs, align cross-functional teams, and drive improvements in CI/CD, automation, and platform reliability.
- Security & Compliance: Enforce IAM and cloud security standards, ensure regulatory compliance, and automate policy governance.
- Incident & Reliability Management: Lead incident response, perform root cause analysis, and implement long-term reliability enhancements.
- Cost Optimization: Analyze cloud spend, optimize resource usage, and collaborate with FinOps to align performance with budget goals.
- Enablement & Documentation: Mentor engineers and maintain internal knowledge through clear documentation of processes and best practices.