Posted: 4 days ago | On-site | Part Time
Job Title: Senior Data Engineer - Data Pipelines
Introduction to role:
Are you ready to architect FAIR data platforms that accelerate discovery and turn complex science into deployable insights? Do you want your engineering decisions to remove data friction, power analytics, and help deliver life-changing medicines faster?
In this role, you will design and operate the data foundations our scientists and analysts rely on to explore disease biology, generate evidence, and make bold decisions. You will work across high-performance computing and cloud environments to create secure, scalable pathways for data to move from experiments to models to actionable results.
You will join a collaborative, curious team that fuses data and technology with cutting-edge science. By building canonical models, trusted pipelines, and resilient infrastructure, you will help reduce time-to-insight, improve reproducibility, and enable the next wave of breakthroughs.
Accountabilities:
Data Platform Architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR) and remove barriers to scientific analysis.
Modeling and Warehousing: Define canonical data models and dimensional schemas; build lakehouse/warehouse layers that optimize storage and query performance to speed up evidence generation.
Data Integration: Create reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging to make data findable and trustworthy.
Governance and Quality: Establish and enforce standards for data quality, access control, retention, and compliance; implement monitoring and observability for proactive issue detection and continuous improvement.
Infrastructure Engineering: Operate solutions across Unix/Linux HPC and AWS cloud environments; engineer for reliability, cost efficiency, scalability, and sustainable performance.
Collaboration and Stakeholder Engagement: Translate scientific and business requirements into clear architectural designs; partner with CPSS stakeholders, R&D IT, and DS&AI to co-create solutions that deliver measurable value.
Engineering Excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability, resilience, and a high bar for software craftsmanship.
Enablement and Information Exchange: Produce documentation and reusable components that uplift data engineering practices across teams; mentor peers and champion platform adoption.
Essential Skills/Experience:
Data platform architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR).
Modeling and warehousing: Develop canonical data models, dimensional schemas, and lakehouse/warehouse layers; optimize storage and query performance.
Data integration: Build reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging.
Governance and quality: Establish standards for data quality, access control, retention, and compliance; implement monitoring and observability.
Infrastructure engineering: Operate solutions across Unix/Linux HPC and cloud environments (AWS preferred); ensure reliability, cost efficiency, and scalability.
Collaboration: Translate scientific and business requirements into clear architectural designs; partner with CPSS collaborators, R&D IT, and DS&AI to co-create solutions.
Engineering excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability and resilience.
Enablement: Produce documentation, reusable components, and guidance to uplift data engineering practices across teams.
Desirable Skills/Experience:
Hands-on expertise with Python or Scala and distributed data processing frameworks (Spark, PySpark); experience with SQL at scale.
Experience with modern lakehouse and warehouse technologies (Delta Lake, Apache Iceberg or Hudi, Redshift, Snowflake, Athena, BigQuery) and data modeling tools and practices (Dimensional, Data Vault).
Familiarity with orchestration and data workflow tools (Airflow, Argo, Dagster), event streaming (Kafka, Kinesis), and metadata/governance platforms (Collibra, Alation, AWS Glue).
Cloud engineering skills in AWS services relevant to data (S3, EMR, Glue, Lambda, Step Functions, ECS/EKS) and infrastructure-as-code (Terraform, CloudFormation).
Operating experience in Unix/Linux HPC environments, job schedulers (SLURM), containerization, and secure data access patterns for scientific workloads.
Observability and reliability practices (Prometheus, Grafana, CloudWatch), cost optimization, and performance tuning for large-scale analytics.
Strong communication skills to align diverse collaborators, translate domain concepts into technical designs, and drive adoption through documentation and enablement.
Relevant certifications or demonstrated leadership in data platform architecture, governance, or cloud engineering.
When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.
Why AstraZeneca:
At AstraZeneca you will engineer where impact is immediate and visible: your pipelines will shape evidence, accelerate decisions, and help bring new treatments to people sooner. We bring experts from different fields together to solve hard problems quickly, backed by modern platforms across HPC and public cloud so your work runs at scale. Leaders remove barriers, teams share knowledge openly, and we value kindness alongside ambition, giving you room to innovate while staying grounded in real patient outcomes.
Call to Action:
If you are ready to architect the data flows that move science into the clinic, send us your CV and tell us about the toughest pipeline you have built and scaled.
AstraZeneca
Experience: Not specified
5.05 - 7.25 Lacs P.A.