16.0 - 18.0 years
0 Lacs
Hyderabad, Telangana, India
On-site
Making the World More Resilient - One Application at a Time!

At Swiss Re, our mission is to make the world more resilient. As a leading global reinsurance company, we help individuals, businesses, and societies recover from disaster and build confidence for the future. To fulfil this mission, we must ensure our own systems and operations are equally resilient. In the Property & Casualty Reinsurance division, the stability and reliability of our IT systems directly impact our ability to deliver on this promise.

That's why we're looking for a Lead Reliability Architect who will champion the resilience of our application landscape - ensuring our systems are built to withstand disruption, adapt quickly, and perform reliably even in the face of the unexpected.

Key Responsibilities

As our Lead Reliability Architect, you will:

Own and shape the reliability strategy for our Property & Casualty IT landscape, ensuring alignment with Swiss Re's broader technology and business objectives.
Oversee the reliability and resilience characteristics of our business-critical application portfolio and drive their continuous improvement.
Define and maintain blueprints, guidelines, and best practices for resilience, high availability, disaster recovery, and fault tolerance - ensuring they are practical, actionable, and consistently applied across all development teams.
Work directly with application development teams to support the implementation of these blueprints and architectural principles across the whole Software Development Lifecycle.
Define and govern the monitoring and alerting baseline for our applications, including golden signals, SLIs, and SLOs across the whole system landscape.
Drive the adoption of the OpenTelemetry framework in our observability stack - across applications, platforms, and shared infrastructure.
Partner closely with Operations (Run) teams to analyze operational incidents and derive actionable insights for improving system reliability and fault response capabilities.
Act as a bridge between engineering and operations, fostering a culture of reliability, accountability, and continuous improvement.
Mentor teams and advocate for SRE practices, ensuring a consistent understanding and application of resilience and observability standards across our engineering workforce.

About You

We are looking for a candidate with a balanced profile of deep technical expertise and strong leadership capabilities.

Professional & Technical Skills

16+ years of overall experience in the technology domain.
Well-established track record and senior-level hands-on background in software and reliability engineering, with a focus on distributed systems and high-availability architectures in public cloud environments (ideally Azure).
Deep expertise in reliability and resilience engineering, including concepts like redundancy and failover, fault tolerance and graceful degradation, circuit breakers, retry patterns, chaos engineering, and auto-healing.
Solid experience in operating applications at scale, ideally within regulated or mission-critical environments.
Familiarity with Google's Site Reliability Engineering (SRE) practices, especially around SLIs and SLOs, error budgets, and operational readiness.
Strong background in monitoring, telemetry, and observability, with a focus on defining effective metrics and alerts that reduce noise and improve incident detection.
Hands-on experience with OpenTelemetry and related observability tools (e.g., Prometheus, Grafana, Jaeger, Elastic) would be a plus.
Experience collaborating in DevOps and hybrid cloud environments, ideally with exposure to containerized and microservices architectures.

Personal & Leadership Skills

Strong thought leadership and influencing skills, with the ability to challenge the status quo and advocate for meaningful change.
Architectural mindset, with a structured approach to problem-solving and strong planning and design capabilities.
High personal integrity, accountability, and a proactive approach to ownership and decision-making.
Excellent collaboration and communication skills, able to build trusted relationships across teams, functions, and geographies.
Team player with the ability to work across disciplines and bring people together around shared goals.
Demonstrated ability to foster understanding between application development and operations teams - serving as a translator and facilitator between the two worlds.
Fluent in English, both written and spoken.

#LI-Hybrid

Reference Code: 134808
Posted 1 week ago