Data Engineering Lead

6 - 10 years

35 - 45 Lacs

Posted: 23 hours ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Job Overview

We are seeking an experienced Data Engineering Lead with a solid foundation in Python, MongoDB, PostgreSQL, Apache Airflow and other data management tools to join our innovative team. This role is critical for building and managing robust data pipelines that extract and process data from health systems' EHR platforms, supporting our AI-driven healthcare applications.

Key Responsibilities:

  • AI-Infused Data Pipeline Construction:

    Develop and optimize data pipelines for the efficient extraction, transformation, loading and management of large-scale healthcare and clinical guidelines data.
  • Healthcare Data Handling:

    Manage sensitive healthcare data in line with FHIR and HL7 standards, and handle the versioning and variability of Epic Clarity extracts, Quality Measures, Drug Formularies and Clinical Guidelines, ensuring compliance with data regulations and client needs.
  • Data Orchestration:

    Use Airflow, Temporal or similar AI-native, next-generation tools to manage the complex data and knowledge workflows essential for processing healthcare data at scale. This includes RAG, GraphRAG, tool calling, MCP and other AI-enabled data orchestration, search, retrieval and aggregation.
  • Assessments and Collaboration:

    Support data scientists and backend developers by integrating and maintaining data systems across the organization, and assist in the assessment and validation of large-scale Gen-AI based systems using tools such as RAGAs and Opik.
  • Innovative Problem-Solving:

    Apply technical skills to address unique challenges in the healthcare sector, contributing to solutions that enhance patient care.

Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, Data Science, or a related field.
  • Minimum of 12 years of experience in data engineering, with proficiency in Python, PostgreSQL (or a variant), MongoDB and Milvus Vector DB.
  • Experience building and managing data pipelines using healthcare data from hospital EHR systems and other clinical data sources such as HIEs and clinical standards bodies (CMS, NCQA, etc.).
  • Demonstrated experience with data lakes, ensuring robust and scalable data storage solutions.

Highly Preferred Qualifications

  • Azure technology stack expertise to improve data processing and storage capabilities.
  • Experience with clinical data quality management and validation to ensure the accuracy and reliability of data solutions.
  • Knowledge of anomaly and outlier detection techniques in large datasets.
  • Proficiency in querying massive datasets to drive insights and decisions. Experience with databases/OLAP systems such as ClickHouse and Snowflake, as well as data platforms such as Starburst.
  • Experience in building and maintaining RESTful Web Services to support data integration and accessibility.
  • Experience with Large Scale Recommendation Systems and managing the lifecycle of recommended items that receive end-user clinical feedback. Experience with learning systems (Reinforcement Learning) that use feedback as an input to Machine Learning.
  • Familiarity with LLMs, RAG and fine-tuning architectures.

Detailed Skills:

Must-have skills:

  • In-depth knowledge of LLM, Prompt Engineering, Embedding Techniques
  • In-depth knowledge of Prompt Optimization
  • Knowledge of Various Chunking strategies
  • Performance and Scalability of GenAI Solutions
  • Hands-on experience in Python and application to Data Engineering via Airflow
  • Hands-on experience with the Azure OpenAI stack for GPT-4, GPT-4o and o3/o4 models
  • Hands-on experience with Weaviate/Milvus Vector DB
  • Should be able to work with Onshore/Offshore teams
  • In-depth knowledge of NLP
  • Ability to guide the team in problem solving
  • In-depth knowledge of Prompt Tuning and Context Management
  • Experience in maximizing accuracy, minimizing latency, and enhancing performance of GenAI solutions
  • Should have taken at least one GenAI implementation into Production
  • Hands-on experience with tools and frameworks like LangChain, LlamaIndex, or similar
  • Should have implemented at least one RAG-based solution
  • Strong exposure to Healthcare domain data workflows
  • Hands-on experience with FHIR (Fast Healthcare Interoperability Resources) and HL7 standards
  • Experience in Healthcare data engineering including ETL, normalization, validation, and secure data exchange
  • Ability to integrate clinical data sources (EHR/EMR systems) into AI and Analytics pipelines
  • Knowledge of tool calling, MCP, RAG/GraphRAG and other LLM-based retrieval and search

Nice to have skills:

  • Expertise in Model Monitoring and Debugging
  • Expertise in CI/CD for GenAI Solutions
  • Familiarity with HIPAA-compliant data handling and PHI/PII data governance in Healthcare AI/ML systems
  • Experience in fine-tuning for data applications, as well as for healthcare scenarios (such as fine-tuned language models for medication terminologies)

Benefits

  • Competitive salary and performance-based bonuses.
  • Opportunities for professional development and advancement within a rapidly growing company.
  • Collaborative and inclusive work environment.
  • Cutting-edge technology projects with real-world impact.
