Do you want to be a part of changing healthcare?
Oracle is excited to be using our resources, knowledge, and expertise, as well as our successes in other industries, and applying them to healthcare to make a meaningful impact. As people, we all participate in healthcare; it's deeply personal, and we put the human at the center of each of our decisions. Improving healthcare for all requires bringing unique perspectives and expertise together to holistically tackle the biggest problems in global health, including physician burnout, patient access to data, and barriers to quality care.
Oracle Health Applications & Infrastructure (OHAI) is developing patient- and provider-centric solutions rapidly and securely. We leverage the power of Oracle Cloud Infrastructure (OCI) to deliver robust, scalable solutions across patient, provider, payer, public health, and life sciences sectors. At OHAI, you'll work with experts across industries and have access to cutting-edge technologies. We apply artificial intelligence, machine learning, large language models, learning networks, and data intelligence in an applied, scalable, and embedded way. Join us in creating people-centric healthcare experiences.
About the Team:
As part of the Oracle Health Foundations Organization, you'll join a high-impact team focused on using machine learning and intelligent automation to improve the performance and reliability of Oracle Health's cloud platforms. We're building systems that detect anomalies, predict incidents, and enable proactive intervention at scale.
As a Machine Learning Engineer, you'll contribute to the design and delivery of production-grade ML models and services that enhance system observability, incident prediction, and product resilience. You'll work closely with software engineers, site reliability engineers, and product teams to develop, deploy, and improve ML-based solutions in a cloud-native environment. This role offers a strong growth path toward deeper technical ownership and leadership.
Responsibilities:
- Build and deploy machine learning models that detect anomalies, predict incidents, and support automated reliability features.
- Develop production-ready software and services to integrate ML models with observability and operational pipelines.
- Contribute to the full ML lifecycle, from data ingestion and model training to validation, deployment, and monitoring, using modern MLOps tools and practices.
- Collaborate with other engineering and product teams to align ML solutions with business and system requirements.
- Analyze large-scale telemetry data (logs, metrics, traces) to identify patterns, root causes, and opportunities for improvement.
- Help maintain and evolve our ML infrastructure, data pipelines, and observability integrations.
- Stay current with advancements in applied machine learning, particularly time series modeling, anomaly detection, and reliability-focused ML.
Requirements:
- 5+ years of industry experience in software engineering or applied machine learning, with experience deploying ML models in production.
- Proficiency in Python and experience with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
- Solid understanding of the end-to-end ML lifecycle and experience with MLOps practices (model packaging, deployment, monitoring, etc.).
- Experience working with observability data (e.g., logs, metrics, traces) and time series data.
- Experience developing APIs and backend services, and working in cloud-native environments (OCI, AWS, GCP, or Azure).
- Strong knowledge of SQL and exposure to distributed data processing tools (e.g., Spark, BigQuery, Kafka, Flink).
- Comfortable working on cross-functional teams and contributing to technical discussions and code reviews.
- Bachelor's, Master's, or PhD in Computer Science, Data Science, or a related technical field is preferred.