Role & responsibilities

Lead the design, development, and optimization of ETL/ELT pipelines using Databricks, Python, Spark, and Delta Lake.
Architect scalable data solutions using the Medallion architecture (Bronze, Silver, and Gold layers).
Design and implement data models and transformations using SQL and Python.
Build and maintain audit frameworks to ensure traceability, compliance, and data lineage.
Develop data quality monitoring and automated testing frameworks for pipeline reliability.
Perform data analysis to support operational data requests and user queries.
Collaborate with clinical data teams to analyse IRT/RTSM datasets.
Create and maintain dashboards and reports using BI tools (e.g., Superset, Power BI, Tableau, Qlik, or similar).
Help manage CI/CD and automated code branching and deployment.
Ensure compliance with GxP, CDISC, and other regulatory standards.
Mentor junior engineers and promote engineering best practices.

Preferred candidate profile

7-10 years of experience in data engineering, with leadership or team lead responsibilities.
Strong hands-on experience with Databricks, Apache Spark, and Delta Lake.
Advanced proficiency in SQL and Python for data transformation and automation.
Experience with ETL/ELT orchestration, job optimization, and performance tuning.
Proven experience designing and implementing audit, data quality, and testing frameworks.
Hands-on experience with IRT/RTSM clinical trial data systems.
Strong data analysis skills and the ability to interpret complex datasets.
Experience with BI/reporting tools such as Power BI, Tableau, or Qlik.
Knowledge of clinical data standards (e.g., CDISC, SDTM, ADaM).
Experience with cloud platforms (Azure, AWS, or GCP) and CI/CD pipelines.
Experience in Pharma or Life Sciences, especially in clinical trial operations.
Familiarity with data governance and metadata management tools.
Certifications in Databricks, cloud platforms, or clinical data standards.
Exposure to machine learning workflows or data science collaboration.