The Principal Data Engineer will be a senior individual contributor responsible for designing, building, and optimizing advanced data solutions that power enterprise-wide analytics, AI/ML, and data-driven decision-making. Reporting directly to the Senior Director of Data Platforms, this role will serve as a technical expert and thought partner in scaling modern data architectures, enabling AI orchestration, and delivering robust, secure, and compliant data products.
This position is highly hands-on, requiring expertise in data engineering, Databricks, Python, AI/ML platforms, and orchestration frameworks. The Principal Engineer will work with cross-functional teams to design and implement high-performance pipelines, ensure platform reliability, and set technical standards for the organization's data engineering practices.
Job Duties & Responsibilities
- Design & Build Data Systems: Architect and implement scalable data pipelines, lakehouse/lake/warehouse environments, APIs, and orchestration workflows to support analytics, AI/ML, and business intelligence.
- Enable AI & ML at Scale: Partner with Data Science and AI teams to productionize ML models, automate workflows, and enable AI orchestration frameworks (e.g., MLflow, Airflow, Databricks workflows).
- Technical Leadership: Act as a hands-on subject matter expert in Databricks, Python, Spark, and related technologies, driving adoption of best practices and mentoring other engineers.
- Optimize Performance: Ensure data pipelines and platforms are highly available, observable, and performant at scale through monitoring, automation, and continuous improvement.
- Ensure Compliance & Security: Build solutions that adhere to data governance, privacy, and regulatory frameworks (HIPAA, SOC 2, GCP, GDPR) within clinical research, life sciences, and healthcare contexts.
- Collaborate Across Functions: Work closely with platform engineering, analytics, product management, and compliance teams to deliver aligned solutions that meet enterprise needs.
- Advance Modern Architectures: Contribute to evolving data platform strategies, including event-driven architectures, data mesh concepts, and lakehouse adoption.
Location
This role is open to candidates working in the United States (remote or hybrid).
Basic Qualifications
- Bachelor's degree in Computer Science, Engineering, Data Science, or equivalent practical experience.
- 8+ years of data engineering experience in designing, implementing, and optimizing large-scale data systems.
- Strong proficiency in Python, with production-level experience in building reusable, scalable data pipelines.
- Hands-on expertise with Databricks (Delta Lake, Spark, MLflow) and modern orchestration frameworks (Airflow, Prefect, Dagster, etc.).
- Proven track record of deploying and supporting AI/ML pipelines in production environments.
- Experience with cloud platforms (AWS, Azure, or GCP) for building secure and scalable data solutions.
- Familiarity with regulatory compliance and data governance standards in healthcare or life sciences.
Preferred Qualifications
- Experience with event-driven systems (Kafka, Kinesis) and real-time data architectures.
- Strong background in data modeling, lakehouse/lake/warehouse design, and query optimization.
- Exposure to AI orchestration platforms and generative AI use cases.
- Contributions to open-source projects or published work in data engineering/ML.
- Agile development experience, including CI/CD, automated testing, and DevOps practices.
Mandatory Competencies
- Development Tools and Management - CI/CD
- Tech - Agile Methodology
- Data Science and Machine Learning - Gen AI
- Data Science and Machine Learning - AI/ML
- Big Data - Spark
- DevOps/Configuration Mgmt - Cloud Platforms - AWS
- DevOps/Configuration Mgmt - Cloud Platforms - GCP
- Cloud - Azure - Azure Data Factory (ADF), Azure Databricks, Azure Data Lake Storage, Event Hubs, HDInsight
- Behavioral - Communication and collaboration