8 - 13 years

40.0 - 80.0 Lacs P.A.

Indore, Gurgaon, Jaipur

Posted: 2 months ago | Platform: Naukri


Skills Required

Data modeling, GCP, Debugging, Infrastructure, Data processing, Data quality, Apache, Monitoring, SQL, Python

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Title: Mid-Senior Data Engineer (LLM / AI Infrastructure)

About the Role

We are looking for a Data Engineer with a minimum of 1 year of experience working with LLMs and a strong background in data pipelines, storage optimization, and AI-driven data processing. The ideal candidate will play a key role in managing, optimizing, and scaling data architectures to support LLM applications and AI workflows.

Key Responsibilities

- Design and develop scalable, high-performance data pipelines for AI and LLM-powered applications.
- Work with structured and unstructured data, ensuring efficient preprocessing, transformation, and storage.
- Implement and optimize data retrieval and indexing strategies for LLM fine-tuning and inference.
- Manage vector databases (FAISS, Chroma, Pinecone, Weaviate, Astra DB, etc.) for retrieval-augmented generation (RAG) workflows (see the retrieval sketch after this description).
- Build and maintain ETL/ELT workflows using tools like Airflow, Prefect, or Dagster (a minimal DAG sketch also follows below).
- Ensure data quality, governance, and lineage to support AI-driven insights.
- Collaborate with ML engineers and researchers to improve LLM data pipelines and infrastructure.
- Work with cloud-based data storage solutions (AWS S3, GCP BigQuery, Azure Data Lake, etc.).
- Automate data monitoring, validation, and debugging to ensure seamless pipeline execution.

Required Skills & Experience

- 4+ years of overall experience in data engineering, AI infrastructure, or related fields.
- At least 1 year of hands-on experience working with LLMs and AI data workflows.
- Strong expertise in Python, SQL, and distributed data processing frameworks (Spark, Dask, Ray, or similar).
- Experience with vector databases and retrieval systems for AI-driven applications.
- Knowledge of data modeling, indexing, and storage optimization.
- Familiarity with ETL/ELT frameworks like Apache Airflow, Prefect, or Dagster.
- Experience handling large-scale datasets and optimizing data ingestion pipelines.
- Understanding of cloud-based data architectures (AWS, GCP, Azure).
- Basic knowledge of MLOps principles and integrating data workflows with AI models.

Nice to Have

- Exposure to LLM fine-tuning, embeddings, and retrieval-augmented generation (RAG).
- Experience with LangChain, Hugging Face Transformers, or OpenAI APIs.
- Familiarity with feature stores (Feast, Tecton) and streaming platforms (Kafka).
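The retrieval side of the RAG workflows mentioned above typically amounts to embedding documents, indexing the vectors, and searching that index at query time. The sketch below illustrates this with an in-memory FAISS index; the embedding function, dimensionality, and example documents are placeholder assumptions for illustration, not details from the posting.

# Minimal, illustrative RAG-style retrieval against an in-memory FAISS index.
# The embed() function and documents are placeholders (assumptions), not
# anything specified by the role.
import faiss
import numpy as np

DIM = 384  # assumed embedding dimensionality

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder: in practice this would call an embedding model
    # (e.g. a sentence-transformers model or a hosted embedding API).
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.random((len(texts), DIM), dtype=np.float32)

documents = ["refund policy", "shipping times", "warranty terms"]

index = faiss.IndexFlatL2(DIM)     # exact L2 search; fine for small corpora
index.add(embed(documents))        # index the document embeddings

query_vec = embed(["how long does delivery take?"])
distances, ids = index.search(query_vec, 2)     # top-2 nearest documents
print([documents[i] for i in ids[0]])           # candidate context for the LLM

Managed vector stores such as Pinecone, Weaviate, or Astra DB expose a similar add/query pattern behind a service API rather than an in-process index.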
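For the ETL/ELT responsibilities, a pipeline in Airflow (one of the orchestrators named in the posting) is usually expressed as a DAG of dependent tasks. Below is a hypothetical, minimal daily DAG; the DAG name, task bodies, and schedule are illustrative assumptions, not part of the role description.

# Hypothetical daily ETL DAG in the Airflow 2.x style; names and schedule
# are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # e.g. pull raw records from an object store such as S3 / GCS
    ...

def transform():
    # e.g. clean, deduplicate, and validate the extracted records
    ...

def load():
    # e.g. write the curated table to a warehouse such as BigQuery
    ...

with DAG(
    dag_id="example_llm_corpus_etl",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load

Prefect and Dagster model the same extract-transform-load dependency chain with flows/tasks and ops/assets respectively.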

IT Services and IT Consulting
Gurgaon, Haryana
