Posted: 2 months ago
Work from Office
Full Time
Job Title: Mid-Senior Data Engineer (LLM AI Infrastructure)

About the Role
We are looking for a Data Engineer with a minimum of 1 year of experience working with LLMs and a strong background in data pipelines, storage optimization, and AI-driven data processing. The ideal candidate will play a key role in managing, optimizing, and scaling data architectures to support LLM applications and AI workflows.

Key Responsibilities
- Design and develop scalable, high-performance data pipelines for AI and LLM-powered applications.
- Work with structured and unstructured data, ensuring efficient preprocessing, transformation, and storage.
- Implement and optimize data retrieval and indexing strategies for LLM fine-tuning and inference.
- Manage vector databases (FAISS, Chroma, Pinecone, Weaviate, Astra DB) for retrieval-augmented generation (RAG) workflows (see the sketch after this posting).
- Build and maintain ETL/ELT workflows using tools like Airflow, Prefect, or Dagster.
- Ensure data quality, governance, and lineage to support AI-driven insights.
- Collaborate with ML engineers and researchers to improve LLM data pipelines and infrastructure.
- Work with cloud-based data storage solutions (AWS S3, GCP BigQuery, Azure Data Lake, etc.).
- Automate data monitoring, validation, and debugging to ensure seamless pipeline execution.

Required Skills & Experience
- 4+ years of overall experience in data engineering, AI infrastructure, or related fields.
- At least 1 year of hands-on experience working with LLMs and AI data workflows.
- Strong expertise in Python, SQL, and distributed data processing frameworks (Spark, Dask, Ray, or similar).
- Experience with vector databases and retrieval systems for AI-driven applications.
- Knowledge of data modeling, indexing, and storage optimization.
- Familiarity with ETL/ELT frameworks like Apache Airflow, Prefect, or Dagster.
- Experience handling large-scale datasets and optimizing data ingestion pipelines.
- Understanding of cloud-based data architectures (AWS, GCP, Azure).
- Basic knowledge of MLOps principles and integrating data workflows with AI models.

Nice to Have
- Exposure to LLM fine-tuning, embeddings, and retrieval-augmented generation (RAG).
- Experience with LangChain, Hugging Face Transformers, or OpenAI APIs.
- Familiarity with feature stores (Feast, Tecton) and streaming platforms (Kafka).
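For illustration only, here is a minimal sketch of the retrieval step in a RAG workflow of the kind referenced under Key Responsibilities. It builds a FAISS index over sentence-transformer embeddings; the model name, document snippets, and query are assumptions chosen for the example, not part of the role.

```python
# Minimal illustrative sketch (not part of the posting): the retrieval step of a
# RAG workflow, using FAISS and sentence-transformers. Model name and documents
# are example assumptions.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedding model

documents = [
    "Airflow schedules and monitors batch ETL pipelines.",
    "FAISS provides fast nearest-neighbour search over embedding vectors.",
    "Spark handles distributed processing of large structured datasets.",
]

# Embed the documents and build a flat L2 index over the vectors.
embeddings = np.asarray(model.encode(documents), dtype="float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Embed the query and fetch the closest document to ground an LLM prompt.
query = np.asarray(model.encode(["Which tool indexes embeddings for retrieval?"]), dtype="float32")
distances, ids = index.search(query, 1)
print(documents[ids[0][0]])  # -> the FAISS snippet
```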