Lead Reliability Architect

16 - 18 years

Work Mode

On-site

Job Type

Full Time

Job Description

Making the World More Resilient - One Application at a Time!

At Swiss Re, our mission is to make the world more resilient. As a leading global reinsurance company, we help individuals, businesses, and societies recover from disaster and build confidence for the future.

Key Responsibilities

As our Lead Reliability Architect, you will:

  • Own and shape the reliability strategy for our Property & Casualty IT landscape, ensuring alignment with Swiss Re's broader technology and business objectives.
  • Oversee the reliability and resilience characteristics of our business-critical application portfolio and drive their continuous improvement.
  • Define and maintain blueprints, guidelines, and best practices for resilience, high availability, disaster recovery, and fault tolerance, ensuring they are practical, actionable, and consistently applied across all development teams.
  • Work directly with application development teams to support the implementation of these blueprints and architectural principles across the entire software development lifecycle.
  • Define and govern the monitoring and alerting baseline for our applications, including golden signals, SLIs, and SLOs across the entire system landscape.
  • Drive the adoption of the OpenTelemetry framework in our observability stack across applications, platforms, and shared infrastructure.
  • Partner closely with Operations (Run) teams to analyze operational incidents and derive actionable insights for improving system reliability and fault response capabilities.
  • Act as a bridge between engineering and operations, fostering a culture of reliability, accountability, and continuous improvement.
  • Mentor teams and advocate for SRE practices, ensuring a consistent understanding and application of resilience and observability standards across our engineering workforce.

About You

We are looking for a candidate with a balanced profile of deep technical expertise and strong leadership capabilities.

Professional & Technical Skills

  • 16+ years of overall experience in the technology domain.
  • Well-established track record and senior-level hands-on background in software and reliability engineering with a focus on distributed systems and high-availability architectures in public cloud environments (ideally Azure).
  • Deep expertise in reliability and resilience engineering, including concepts like redundancy and failover, fault tolerance and graceful degradation, circuit breakers, retry patterns, chaos engineering, and auto-healing.
  • Solid experience in operating applications at scale, ideally within regulated or mission-critical environments.
  • Familiarity with Google's Site Reliability Engineering (SRE) practices, especially around SLIs and SLOs, error budgets, and operational readiness.
  • Strong background in monitoring, telemetry, and observability, with a focus on defining effective metrics and alerts that reduce noise and improve incident detection.
  • Hands-on experience with OpenTelemetry and related observability tools (e.g., Prometheus, Grafana, Jaeger, Elastic) is a plus.
  • Experience collaborating in DevOps and hybrid cloud environments, ideally with exposure to containerized and microservices architectures.

Personal & Leadership Skills

  • Strong thought leadership and influencing skills, with the ability to challenge the status quo and advocate for meaningful change.
  • Architectural mindset, with a structured approach to problem-solving and strong planning and design capabilities.
  • High personal integrity, accountability, and a proactive approach to ownership and decision-making.
  • Excellent collaboration and communication skills, able to build trusted relationships across teams, functions, and geographies.
  • Team player with the ability to work across disciplines and bring people together around shared goals.
  • Demonstrated ability to foster understanding between application development and operations teams, serving as a translator and facilitator between the two worlds.
  • Fluent in English, both written and spoken.

Reference Code: 134808

Swiss Re

Insurance and Reinsurance

Zürich

Hyderabad, Telangana, India