Making the World More Resilient, One Application at a Time!
At Swiss Re, our mission is to make the world more resilient. As a leading global reinsurance company, we help individuals, businesses, and societies recover from disaster and build confidence for the future.
To fulfil this mission, we must ensure our own systems and operations are equally resilient. In the Property & Casualty Reinsurance division, the stability and reliability of our IT systems directly impact our ability to deliver on this promise. That's why we're looking for a Lead Reliability Architect who will champion the resilience of our application landscape, ensuring our systems are built to withstand disruption, adapt quickly, and perform reliably even in the face of the unexpected.
Key Responsibilities
As our Lead Reliability Architect, you will:
- Own and shape the reliability strategy for our Property & Casualty IT landscape, ensuring alignment with Swiss Re's broader technology and business objectives.
- Oversee the reliability and resilience characteristics of our business-critical application portfolio and drive their continuous improvement.
- Define and maintain blueprints, guidelines, and best practices for resilience, high availability, disaster recovery, and fault tolerance, ensuring they are practical, actionable, and consistently applied across all development teams.
- Work directly with application development teams to support the implementation of these blueprints and architectural principles across the whole Software Development Lifecycle.
- Define and govern the monitoring & alerting baseline for our applications, which includes defining golden signals, SLIs, and SLOs across the whole system landscape.
- Drive the adoption of the OpenTelemetry framework in our observability stack across applications, platforms, and shared infrastructure.
- Partner closely with Operations (Run) teams to analyze operational incidents and derive actionable insights for improving system reliability and fault response capabilities.
- Act as a bridge between engineering and operations, fostering a culture of reliability, accountability, and continuous improvement.
- Mentor teams and advocate for SRE practices, ensuring a consistent understanding and application of resilience and observability standards across our engineering workforce.
About You
We are looking for a candidate with a balanced profile of deep technical expertise and strong leadership capabilities.
Professional & Technical Skills
- 16+ years of overall experience in the technology domain.
- Well-established track record and senior-level hands-on background in software and reliability engineering with a focus on distributed systems and high-availability architectures in public cloud environments (ideally Azure).
- Deep expertise in reliability and resilience engineering, including concepts like redundancy and failover, fault tolerance and graceful degradation, circuit breakers, retry patterns, chaos engineering, and auto-healing.
- Solid experience in operating applications at scale, ideally within regulated or mission-critical environments.
- Familiarity with Google's Site Reliability Engineering (SRE) practices, especially around SLIs and SLOs, error budgets, and operational readiness.
- Strong background in monitoring, telemetry, and observability, with a focus on defining effective metrics and alerts that reduce noise and improve incident detection.
- Hands-on experience with OpenTelemetry and related observability tools (e.g., Prometheus, Grafana, Jaeger, Elastic) would be a plus.
- Experience collaborating in DevOps and hybrid cloud environments, ideally with exposure to containerized and microservices architectures.
Personal & Leadership Skills
- Strong thought leadership and influencing skills; ability to challenge the status quo and advocate for meaningful change.
- Architectural mindset, with a structured approach to problem-solving and strong planning and design capabilities.
- High personal integrity, accountability, and a proactive approach to ownership and decision-making.
- Excellent collaboration and communication skills, able to build trusted relationships across teams, functions, and geographies.
- Team player with the ability to work across disciplines and bring people together around shared goals.
- Demonstrated ability to foster understanding between application development and operations teams, serving as a translator and facilitator between the two worlds.
- Fluent in English, both written and spoken.
About Swiss Re
Swiss Re is one of the world's leading providers of reinsurance, insurance and other forms of insurance-based risk transfer, working to make the world more resilient. We anticipate and manage a wide variety of risks, from natural catastrophes and climate change to cybercrime. We cover both Property & Casualty and Life & Health. Combining experience with creative thinking and cutting-edge expertise, we create new opportunities and solutions for our clients. This is possible thanks to the collaboration of more than 14,000 employees across the world.
If you are an experienced professional returning to the workforce after a career break, we encourage you to apply for open positions that match your skills and experience.
Reference Code: 134808