Job Description Summary
As a Principal Engineer in Site Reliability Engineering (SRE), you'll be a technical leader shaping the reliability, scalability, and efficiency of our cloud-based platforms.
You'll architect fault-tolerant systems, champion automation to reduce toil, and mentor teams on SRE principles. This individual contributor role is perfect for a seasoned engineer passionate about cloud ecosystems, distributed systems, and turning complex challenges into streamlined, high-impact solutions. You will define SRE best practices; drive improvements in automation, observability, incident response, and performance; and collaborate with cross-functional teams (e.g., Dev, Security, Product) to ensure that our systems meet the highest standards of reliability. As a senior technical leader, you will influence architecture, lead complex projects, mentor others, and act as a stabilizing presence during incidents.

GE Healthcare is a leading global medical technology and digital solutions innovator. Our mission is to create a world where healthcare has no limits. Unlock your ambition, turn ideas into world-changing realities, and join an organization where every voice makes a difference, and every difference builds a healthier world.
Job Description
- Lead Platform Reliability Initiatives: Design and optimize multi-region, highly available cloud architectures using services such as container orchestration, compute instances, managed databases, and object storage, meeting SLIs/SLOs and error budgets that target better than 99.99% availability.
- Drive Automation and IaC: Build and maintain Infrastructure as Code (IaC) pipelines with tools such as CDK, Terraform, or CloudFormation; automate deployments via CI/CD tools and serverless functions to accelerate delivery while minimizing operational overhead.
- Reliability, Availability & Resilience: Establish, track, and enforce SLIs, SLOs, and error budgets. Ensure that system availability, latency, and throughput meet targets. Build strategies for redundancy, high availability, multi-AZ/multi-region failover, backups, and disaster recovery.
- Enhance Observability and Monitoring: Implement comprehensive monitoring stacks with cloud-native metrics, open-source monitoring, and visualization tools; define alerting thresholds, conduct root cause analyses (RCAs), and optimize performance for distributed systems, including message brokers, caching layers, and relational databases.
- Champion Security and Compliance: Enforce cloud best practices for identity and access management, encryption, networking, and policy-as-code with tools such as OPA; integrate security into CI/CD pipelines to protect sensitive data in regulated environments.
- Mentor and Influence: Guide junior engineers through design reviews, incident post-mortems, and adoption of SRE practices; collaborate with stakeholders to shape cloud strategy, cost optimization, and capacity planning for enterprise-scale workloads.
Qualifications:
- 15+ years in software engineering, site reliability engineering, or cloud platform roles, with significant exposure to AWS production systems.
- Deep hands-on expertise with core cloud services, including container orchestration, compute, databases, storage, monitoring, identity management, serverless, and networking.
- Proficiency in programming languages such as Python, Go, or Java for automation, scripting, and building tools.
- Deep understanding of observability tooling for metrics, logging, distributed tracing, and alerting (e.g., CloudWatch, Prometheus, Grafana, ELK).
- Proven track record of implementing SRE practices: SLIs/SLOs, error budgets, monitoring/alerting, and incident management.
Inclusion and Diversity:
GE Healthcare is an Equal Opportunity Employer where inclusion matters.
Employment decisions are made without regard to race, colour, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status, or other characteristics protected by law.
We expect all employees to live and breathe our behaviours: to act with humility and build trust; lead with transparency; deliver with focus; and drive ownership, always with unyielding integrity.
Our total rewards are designed to unlock your ambition by giving you the boost and flexibility you need to turn your ideas into world-changing realities.
Our salary and benefits are everything you'd expect from an organization with global strength and scale, and you'll be surrounded by career opportunities in a culture that fosters care, collaboration, and support.
#Everyroleisvital
#Hybrid
Relocation Assistance Provided: No