As a Senior SRE Operations Manager, you will own the vision and execution for the reliability, scaling, and automated operations of critical components of Oracle Health’s cloud platform. You will apply your technical and organizational skills to reduce operational toil, solve complex production incidents, and establish a world-class standard for operational excellence. This role is responsible for driving the continued automation and upkeep of our existing service distribution platform while building out the foundational Observability and Management solutions for Oracle’s new AI-driven Electronic Health Record.
The ideal candidate should have knowledge and experience working with highly distributed cloud platforms, architectural know-how, a passion for building world-class SRE teams, and a demonstrated ability to balance work across multiple initiatives and projects. You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn new technologies and tackle new challenges.
About the Role:
- The role will be responsible for managing a team of 8+ Site Reliability Engineers with a variety of skillsets and experience levels.
- The role will be responsible for coordinating with multiple development and operations stakeholders to align on work and priorities, and for ensuring the team has a concise operational vision for all projects and initiatives.
- The role will be responsible for ensuring the team meets key Service Level Objectives (SLOs), establishes operational milestones (e.g., toil reduction targets), and manages and achieves production reliability commitments.
- The role will be responsible for helping coordinate and drive incident resolution as part of a worldwide, distributed support model.
About You
- You have solid communication skills. You can clearly explain complex operational post-mortems and production risk profiles to audiences with varying levels of technical knowledge, including leadership.
- You set clear expectations for the team and can drive a culture focused on toil reduction, observability, incident prevention, and incident remediation.
- You have exposure to Cloud, SaaS, and virtualization concepts, specifically regarding performance tuning, stability, and resiliency.
- You have expert knowledge and experience defining, calculating, and monitoring system SLIs, SLOs, and SLAs, and ensuring accountability to defined error budgets where applicable.
- You have managed teams that successfully built and deployed tooling to improve the reliability and automation of systems, reducing manual toil and improving the scalability of platforms.
- You are comfortable with ambiguity, especially in defining Observability standards and POCs for new products that are being actively developed.
- You have a strong sense of ownership and can drive operational projects and initiatives to completion.
- You are comfortable working across teams and organizations to help bridge the gap between infrastructure, networking, security, and application teams to diagnose critical issues and remove blockers when necessary.
- You have experience managing customer incidents and coordinating support to reach a rapid and positive outcome for customers who experience disruptions to their services.
Minimum Qualifications:
- 5+ years leading high-performing SRE or Cloud Operations teams.
- 10+ years in the industry as a people manager or Software/Reliability Engineer.
- Strong understanding of distributed systems, microservices architectures, and modern cloud deployment patterns (Kubernetes/Containers/AWS/GCP/Azure).
- Experience with key observability tooling such as New Relic, APM, Splunk, Grafana, and Prometheus.
Preferred Qualifications:
- 7+ years leading high-performing SRE teams.
- BS in Computer Science, or equivalent operational experience.
- Experience in Incident Management, Blameless Post-Mortem culture, and driving subsequent remediation efforts.
Responsibilities
Our team bridges the gap between the existing EHR and the new AI-driven EHR. We maintain a service distribution and management platform for 250+ containerized services, driving operational stability and automation for existing cloud workflows. We are building a new team to develop, maintain, and guide teams on centralized observability and operations for the new Gen2 EHR with an emphasis on SLO attainment, proactive automation, and rapid incident response. We pride ourselves on a customer-first mentality, acting as a core SRE team dedicated to orchestrating tooling and centralized solutions that enable consuming developers to focus on features instead of the nuances of platform stability and repetitive operational tasks.
As a leader on this team, you will be responsible for driving the reliability roadmap, operational excellence initiatives, and observability implementations across both of our primary areas of responsibility to ensure the seamless evolution of our existing platform to the new AI-driven EHR.
Qualifications
Career Level - M3
About Us
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.