Generative AI Data Engineer

0 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

About The Role

We are seeking a GenAI Data Engineer to design, build, and optimize data pipelines for unstructured and semi-structured content, integrating advanced AI/ML capabilities. This role combines modern ETL expertise with vector database and GenAI integration to support intelligent document processing and semantic search applications.

Key Responsibilities

  • Develop and maintain data ingestion pipelines using Azure Data Factory (ADF) and Databricks for structured and unstructured data.
  • Create notebooks to process PDF and Word documents, including extracting text, tables, charts, graphs, and images.
  • Apply NLP and embedding models (e.g., OpenAI, Hugging Face, sentence-transformers) to convert extracted content into embeddings.
  • Store embeddings and metadata in vector databases (e.g., FAISS, Pinecone, Milvus, Weaviate, ChromaDB).
  • Design and implement semantic search and retrieval workflows to enable prompt-based query capabilities.
  • Optimize ETL pipelines for scalability, reliability, and performance.
  • Collaborate with data scientists and solution architects to integrate GenAI capabilities into enterprise applications.
  • Follow best practices for code quality, modularity, and documentation.

Required Skills & Experience

  • Proven experience with Azure Data Factory (ADF) and Databricks for building ETL/ELT workflows.
  • Strong programming experience in Python (pandas, PySpark, PyPDF, python-docx, OCR libraries, etc.).
  • Hands-on experience with vector databases and semantic search implementation.
  • Understanding of embedding models, LLM-based retrieval, and prompt engineering.
  • Familiarity with handling multi-modal data (text, tables, images, charts).
  • Strong knowledge of data modeling, indexing, and query optimization.
  • Experience with cloud platforms (Azure preferred).
  • Strong problem-solving, debugging, and communication skills.

Nice To Have

  • Experience with knowledge graphs or RAG (Retrieval-Augmented Generation) pipelines.
  • Exposure to MLOps practices and LLM fine-tuning.
  • Familiarity with enterprise-scale document management systems.
(ref:hirist.tech)
