Platform Resilience Architect

8 - 10 years


Posted: 5 days ago | Platform: Foundit


Work Mode

On-site

Job Type

Full Time

Job Description

Job Title: Platform Resilience Architect

Location: Bangalore, Hybrid

Why should you choose us

Rakuten Symphony is a Rakuten Group company that provides global B2B services for the mobile telco industry and enables next-generation, cloud-based, international mobile services. Building on the technology Rakuten used to launch Japan's newest mobile network, we are taking our mobile offering global. To support our ambition to provide an innovative cloud-native telco platform for our customers, Rakuten Symphony is looking to recruit and develop top talent from around the globe. We are looking for individuals to join our team across all functional areas of our business, from sales to engineering and from support functions to product development. Let's build the future of mobile telecommunications together!

What Do We Expect From You

This role architects and owns the master test strategy for the entire OSS platform, ensuring its resilience, scalability, and operational readiness in a cloud-native environment. The architect will define the how and what of platform quality by designing comprehensive test scenarios, chaos engineering experiments, and security test plans. The strategic purpose is to guarantee that the underlying platform for the entire OSS suite is robust, secure, and production-grade by engineering the comprehensive validation strategy required to certify high-value capabilities such as automated deployment, High Availability (HA), and Disaster Recovery (DR).

The company may expect you to undertake other tasks outside of this job description. This job description is not exhaustive and may be updated from time to time.

Responsibilities

Resilience & Chaos Engineering Architecture

  • Architect the comprehensive chaos engineering strategy to proactively identify systemic weaknesses and validate the resilience of the OSS platform against infrastructure, network, and application failures.
  • Design and govern the master test strategy for validating the platform's multi-DC High Availability and Disaster Recovery procedures, defining the methodologies to measure and certify RTO and RPO against business requirements.

Performance & Scalability Validation Strategy

  • Design the end-to-end performance testing strategy to certify the platform's scalability, latency, and throughput against defined Service Level Objectives (SLOs).
  • Architect the validation approach for resource management and capacity planning, ensuring the platform's efficiency and ability to scale cost-effectively under various load conditions.

Security & Compliance Validation Architecture

  • Architect the platform's security validation strategy, defining the continuous test plans for penetration testing, vulnerability scanning, and software supply chain security (e.g., container image scanning).
  • Design the validation framework to ensure the platform's configuration and deployment pipelines adhere to industry compliance standards and internal security best practices.

Platform Lifecycle & Operability Validation

  • Design the validation architecture for the OSS platform's complete lifecycle, including automated zero-touch installation, seamless in-service upgrades, and dynamic scaling.
  • Architect the test strategy for platform observability, ensuring that logging, monitoring, and alerting mechanisms are sufficient to guarantee operational readiness and rapid fault isolation.

Technical Governance & Enablement

  • Define and evangelize the technical requirements for the next generation of infrastructure simulators and chaos engineering tools, providing clear specifications to the Tools Engineering team.
  • Act as the primary technical authority on platform quality, providing strategic guidance and quality benchmarks to the Platform Development, SRE, and Product QE teams to influence design for testability.

Qualifications

Experience and Expertise

  • 8+ years in platform-focused engineering roles such as SRE, DevOps, or Cloud Engineering, with at least 3 years in an architect role focused on cloud-native infrastructure.

Analytical and Problem-Solving Skills

  • Ability to think like an adversary (for both security and failure testing) to uncover systemic weaknesses in a distributed platform before they manifest in production.
  • Passion for solving problems and delivering optimal solutions.

Technical Skills

  • Expert-level knowledge of Kubernetes, container runtimes, and service mesh technologies (e.g., Istio).
  • Deep experience with Infrastructure as Code (Terraform, Ansible) and CI/CD practices.
  • Hands-on experience with chaos engineering principles and tools (e.g., LitmusChaos, Chaos Mesh).

Additional Skills

  • Certified Kubernetes Administrator (CKA) or similar certification.
  • Experience with large-scale public cloud deployments (AWS, Azure, GCP).
  • Experience with observability stacks (Prometheus, Grafana, Loki).
  • Understanding of DevSecOps principles and their enforcement across the lifecycle.
  • Prior exposure to working in cross-continental distributed organizations.
  • Working knowledge of the Atlassian suite, including Confluence, and modern OKR tracking tools.
  • Good understanding of Agile principles and processes.
  • Experience working in a fluid, start-up-like environment.

Rakuten Shugi Principles:

Our worldwide practices describe specific behaviours that make Rakuten unique and united across the world. We expect Rakuten employees to model these 5 Shugi Principles of Success.

  • Always improve, always advance.
    Only be satisfied with complete success - Kaizen.
  • Be passionately professional.
    Take an uncompromising approach to your work and be determined to be the best.
  • Hypothesize - Practice - Validate - Shikumika.
    Use the Rakuten Cycle to succeed in unknown territory.
  • Maximize Customer Satisfaction.
    The greatest satisfaction for workers in a service industry is to see their customers smile.
  • Speed!! Speed!! Speed!!
    Always be conscious of time. Take charge, set clear goals, and engage your team.
