The Principal Data Engineer will be a senior individual contributor responsible for designing, building, and optimizing advanced data solutions that power enterprise-wide analytics, AI/ML, and data-driven decision-making. Reporting directly to the Senior Director of Data Platforms, this role will serve as a technical expert and thought partner in scaling modern data architectures, enabling AI orchestration, and delivering robust, secure, and compliant data products. 
This position is highly hands-on, requiring expertise in data engineering, Databricks, Python, AI/ML platforms, and orchestration frameworks. The Principal Engineer will work across functional teams to design and implement high-performance pipelines, ensure platform reliability, and set technical standards for the organization's data engineering practices.
  
   
Job Duties & Responsibilities
   
   -  Design & Build Data Systems: Architect and implement scalable data pipelines, lakehouse/lake/warehouse environments, APIs, and orchestration workflows to support analytics, AI/ML, and business intelligence. 
-  Enable AI & ML at Scale: Partner with Data Science and AI teams to productionize ML models, automate workflows, and enable AI orchestration frameworks (e.g., MLflow, Airflow, Databricks Workflows).
-  Technical Leadership: Act as a hands-on subject matter expert in Databricks, Python, Spark, and related technologies, driving adoption of best practices and mentoring other engineers.
-  Optimize Performance: Ensure data pipelines and platforms are highly available, observable, and performant at scale through monitoring, automation, and continuous improvement. 
-  Ensure Compliance & Security: Build solutions that adhere to data governance, privacy, and regulatory frameworks such as HIPAA, SOC 2, Good Clinical Practice (GCP), and GDPR within clinical research, life sciences, and healthcare contexts.
-  Collaborate Across Functions: Work closely with platform engineering, analytics, product management, and compliance teams to deliver aligned solutions that meet enterprise needs. 
-  Advance Modern Architectures: Contribute to evolving data platform strategies, including event-driven architectures, data mesh concepts, and lakehouse adoption. 
 
   
Location
    This role is open to candidates working in the United States (remote or hybrid). 
    
  
Basic Qualifications
-  Bachelor's degree in Computer Science, Engineering, Data Science, or equivalent practical experience.
-  8+ years of data engineering experience in designing, implementing, and optimizing large-scale data systems. 
-  Strong proficiency in Python, with production-level experience in building reusable, scalable data pipelines. 
-  Hands-on expertise with Databricks (Delta Lake, Spark, MLflow), and modern orchestration frameworks (Airflow, Prefect, Dagster, etc.). 
-  Proven track record of deploying and supporting AI/ML pipelines in production environments. 
-  Experience with cloud platforms (AWS, Azure, or GCP) for building secure and scalable data solutions. 
-  Familiarity with regulatory compliance and data governance standards in healthcare or life sciences. 
 
  
Preferred Qualifications
   
  -  Experience with event-driven systems (Kafka, Kinesis) and real-time data architectures. 
-  Strong background in data modeling, lakehouse/lake/warehouse design, and query optimization. 
-  Exposure to AI orchestration platforms and generative AI use cases. 
-  Contributions to open-source projects or published work in data engineering/ML. 
-  Agile development experience, including CI/CD, automated testing, and DevOps practices. 
 
     
   Mandatory Competencies  
-  Development Tools and Management - CI/CD
-  Tech - Agile Methodology
-  DevOps/Configuration Mgmt - Cloud Platforms - AWS
-  DevOps/Configuration Mgmt - Cloud Platforms - GCP
-  Database - Database Programming - SQL
-  Data Science and Machine Learning - Databricks
-  Big Data - Spark
-  Data Science and Machine Learning - AI/ML
-  Data Science and Machine Learning - Gen AI
-  Behavioral - Communication and collaboration