ROLE SUMMARY
The AI Acceleration (AIA) function within the Chief Marketing Office (CMO) is the single, business-led engine that owns the design, delivery, and scale-up of priority AI capabilities across Commercial. AIA works in close collaboration with Pfizer functions to deploy and maintain production-grade AI solutions that simplify how we work and drive measurable value across all processes. As a Data Engineer on the newly formed AIA team, you will design, build, integrate, curate, and operationalize data and models into a semantic layer that powers AI-enabled products. You will also ensure the interpretability and lineage of reusable data assets, and uphold the bar on governance, performance measurement, and responsible AI.
Data Pipeline Development
- Design and build the semantic layer that enables contextualized and explainable AI-driven workflows, including ontology development, entity models, and knowledge graphs.
- Build robust data pipelines to ingest, transform, and prepare structured and unstructured datasets from diverse internal and external sources, e.g., CRM platforms (such as Veeva), field force deployment and alignment tools, HCP engagement data, digital metrics, and campaign data.
- Ensure data is clean, normalized, and optimized for downstream AI/ML and analytics use.
Infrastructure & Architecture
- Develop and manage data infrastructure using cloud platforms (e.g., AWS, Azure, GCP).
- Implement data lake, data warehouse, and real-time streaming architectures.
- Support containerization and orchestration using data management tools.
- Enable real-time and batch data access for AI agents, LLM-based applications, and analytical products.
Data Quality & Governance
- Implement data validation, profiling, and monitoring processes to ensure accuracy and reliability.
- Collaborate with compliance teams to ensure data handling aligns with HIPAA, FDA, and other U.S. healthcare regulations.
- Maintain metadata, lineage, and audit trails for all data assets.
Collaboration & Cross-Functional Support
- Collaborate with data scientists, ML engineers, and product managers to optimize data for use in retrieval-augmented generation (RAG), autonomous AI agents, and retrieval pipelines.
- Translate business needs into technical specifications for data ingestion and transformation.
- Support rapid prototyping and iterative development of AI solutions.
Performance Optimization
- Tune data workflows for performance, scalability, and cost-efficiency.
- Implement caching, indexing, and partitioning strategies to support high-volume data processing.
- Monitor system health and troubleshoot bottlenecks in data pipelines.
Documentation & Knowledge Sharing
- Maintain clear documentation for data architecture, pipelines, and operational procedures.
- Share best practices and mentor junior data engineers.
- Contribute to internal data engineering communities and innovation forums.
BASIC QUALIFICATIONS
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in data engineering, preferably in healthcare or life sciences.
- Proficiency in SQL, Python, and data pipeline frameworks (e.g., Apache Airflow, Spark, Kafka).
- Experience with cloud data platforms (e.g., AWS Redshift, Azure Synapse, Google BigQuery).
- Hands-on experience with pipeline orchestration (e.g., Airflow, dbt, Prefect, Dagster).
- Familiarity with MLOps and LLMOps tooling (e.g., MLflow, Vertex AI, LangChain, LlamaIndex).
- Excellent problem-solving, communication, and collaboration skills.
- Extensive experience working in an agile setting, or the ability to bring agile best-practice mentorship to the team.
- Familiarity with data privacy standards, pharmaceutical industry practices, and GDPR compliance is preferred.
- Prioritizes excellence in data engineering by following FAIR (Findable, Accessible, Interoperable, Reusable) principles and adhering to the engineering and documentation standards set forth by the organization.
Information & Business Tech