Optum's Applied AI team is seeking an experienced and pragmatic Lead Data Scientist to drive the end-to-end ML lifecycle, from problem definition to robust, scalable model deployment and continuous improvement. This role demands deep expertise in machine learning, particularly with advanced transformer models and Large Language Models (LLMs), applied to complex domains such as clinical document understanding and semantic search.
Primary Responsibilities:
- Own the end-to-end data science lifecycle - from problem definition and experimentation to deployment, monitoring, and continuous improvement
- Design and deploy robust, explainable, and scalable ML models for clinical document understanding, named entity recognition, context disambiguation, and semantic search across prospective and retrospective use cases
- Lead model development with a focus on production-readiness, incorporating solid MLOps, reproducibility, and experimentation practices
- Diagnose and optimize model performance, mitigate bias, and ensure analytical integrity, accuracy, and operational efficiency
- Work hands-on with multi-modal transformer models for tasks like NER, handwriting and form understanding, and document classification
- Leverage LLMs and small language models (SLMs) for clinical reasoning, automated annotation, data generation, and downstream distillation
- Collaborate with cross-functional teams - including ML engineers, annotators, and clinical domain experts - to translate business challenges into deployable AI solutions
- Implement automated data labeling pipelines using techniques like active learning, weak supervision, and human-in-the-loop systems
- Ensure reproducibility and operational excellence through Git, DVC, CI/CD pipelines, and orchestration and streaming tools (e.g., Airflow, Kafka)
- Mentor and guide junior scientists and engineers, lead technical design reviews, and set best practices for model architecture and evaluation
- Continuously identify and close gaps in the ML platform, proposing and implementing innovative solutions to improve performance, scalability, and reliability
Required Qualifications:
- Bachelor's degree in Computer Science or an adjacent field
- Advanced degree in a field that emphasizes the use of data science/statistics techniques (e.g., Computer Science, Applied Mathematics, or a field with direct NLP application)
- 5+ years of experience in Data Science with a focus on Machine Learning and Natural Language Processing
- Solid understanding of machine learning algorithms, NLP principles, and data modeling principles
- Proficiency in Python, R, and SQL. Experience with NLP libraries and models such as NLTK, spaCy, and BERT
- Proven excellent communication skills
- Proven flexibility to provide support during critical business periods
- Proven ability to interpret and present complex data in various formats
- Proven leadership skills and ability to meet deadlines and work independently, with an analytical mindset for addressing complex business needs
- Positive team player with a proven drive to learn and contribute to achieving results
- Willingness to work in varying shifts