Role & responsibilities
Key Accountabilities:
- Good experience as a Data Engineer with Veeva and Clinical Data Management.
- Experience building data pipelines for various heterogeneous data sources.
- Must have hands-on experience with Veeva and Clinical Data Management Systems (CDMS).
- Identify, design, and implement scalable data delivery pipelines and automate manual processes.
- Build the infrastructure required for optimal extraction, transformation, and loading of data using cloud technologies such as AWS and Azure.
- Develop end-to-end, enterprise-level processes that enable clinical data configuration specialists to prepare extractions and transformations of raw data quickly and efficiently from various sources at the study level.
- Coordinate with downstream users such as statistical programmers, SDTM programming, analytics, and clinical data programmers to ensure that outputs meet end-user requirements.
- Experience creating ELT and ETL processes to ingest data into data warehouses and data lakes.
- Experience creating reusable data pipelines for heterogeneous data ingestion.
- Manage and maintain pipelines and troubleshoot data in the data lake or data warehouse.
- Provide visualization and analysis of data stored in the data lake.
- Define and track KPIs and drive continuous improvement.
- Develop and maintain tools, libraries, reusable data pipeline templates, and standards for study-level consumption by data configuration specialists.
- Collaborate with vendors and cross-functional teams to build and align on data transfer specifications and ensure a streamlined data integration process.
- Provide ad-hoc analysis and visualization as needed.
- Ensure accurate delivery of data format and data frequency with quality deliverables per specification.
- Participate in the development, maintenance, and training provided by standards and other functions on transfer specifications and best practices used by the business.
- Collaborate with the system architecture team to design and develop data pipelines per business needs.
- Network with key business stakeholders on refining and enhancing the integration of structured and unstructured data.
- Provide expertise for structured and unstructured data ingestion.
- Develop organizational knowledge of key data sources and systems, and serve as a resource within the company on how best to integrate data in pursuit of company objectives.
- Provide technical leadership on various aspects of clinical data flow, including assisting with the definition, build, and validation of application programming interfaces (APIs), data streams, and data staging to various systems for data extraction and integration.
- Experience in creating data integrity and data quality checks for data ingestion.
- Coordinate with database builders, clinical data configuration specialists, and data management (DM) programmers to ensure accuracy of data integration per SOPs.
- Provide technical support/consultancy and end-user support; work with Information Technology (IT) to troubleshoot, report, and resolve system issues.
- Develop and deliver training programs to internal and external teams, and ensure timely communication of new and/or revised data transfer specifications.
- Support continuous improvement and continuous development.
- Efficiently prepare and process large datasets for downstream consumption by various end users.
- Understand end-to-end stakeholder requirements and contribute to processes and conventions for clinical data ingestion and data transfer agreements.
- Adhere to SOPs for computer system validation and all GCP (Good Clinical Practice) regulations.
- Ensure compliance with own Learning Curricula and corporate and/or GxP requirements.
- Assist with quality review of the above activities when performed by a vendor, as needed.
- Assess and enable clinical data visualization software in the data flows.
- Perform other duties as assigned within timelines.
- Perform clinical data engineering tasks according to applicable SOPs (standard operating procedures) and processes.
 
Experience in data engineering, building data pipelines to manage heterogeneous data ingestion, or similar experience in data integration across multiple sources, including collected data.
- Experience with Python, R/RShiny, SQL, and NoSQL
- Cloud experience (e.g., AWS, Azure, or GCP)
- Experience with GitLab, GitHub, and Jenkins
- Experience deploying data pipelines in the cloud
- Experience with Apache Spark (Databricks)
- Experience setting up and working with data warehouses and data lakes (e.g., Snowflake, Amazon Redshift)
- Experience setting up ELT and ETL processes
- Experience developing and maintaining data pipelines that handle large volumes of data efficiently
- Must understand database concepts. Knowledge of XML, JSON, and APIs.
- Demonstrated ability to lead projects and work groups. Strong project management skills. Proven ability to resolve problems independently and collaboratively.
- Must be able to work in a fast-paced environment with a demonstrated ability to juggle and prioritize multiple competing tasks and demands.
- Ability to work independently, take initiative, and complete tasks to deadlines.
 
Preferred candidate profile