Lead Data Engineer - Generative AI & Agentic Systems (AWS)
About the Role
We are seeking a Lead Data Engineer with strong expertise in AWS-based data engineering and hands-on production experience with Generative AI and Agentic AI systems. This role blends deep data infrastructure engineering with modern GenAI application development, focusing on scalable, secure, and production-grade AI systems.
As a Lead, you will own solution architecture, guide engineering best practices, and mentor team members while remaining deeply hands-on. You will play a critical role in taking GenAI initiatives from experimentation to enterprise-scale deployment.
Key Responsibilities
Technical Leadership & Architecture
- Lead the design and implementation of scalable data and GenAI architectures on AWS.
- Define best practices, coding standards, and architectural patterns for GenAI and data engineering teams.
- Review designs and code to ensure performance, security, reliability, and cost optimization.
- Act as a technical mentor and escalation point for complex engineering challenges.
Data Engineering & Cloud Infrastructure
- Design, build, and maintain robust data pipelines and AI data infrastructure using AWS services such as Glue, Lambda, S3, Redshift, Athena, and Step Functions.
- Develop high-performance PySpark pipelines for large-scale data processing in production environments.
- Ensure data quality, lineage, governance, and reliability across platforms.
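To illustrate the kind of data-quality gate the bullets above refer to, here is a minimal sketch in plain Python. The column names, rules, and thresholds are hypothetical; in this role a production version would typically run as a PySpark job inside a Glue or Step Functions pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    total: int = 0
    failures: dict = field(default_factory=dict)  # rule name -> failing row count

def validate_rows(rows, required=("id", "event_ts")):
    """Apply simple quality rules to an iterable of record dicts.

    Rules (illustrative only): required columns must be non-null, and an
    optional 'amount' column, when present, must be non-negative.
    """
    report = QualityReport()
    for row in rows:
        report.total += 1
        for col in required:
            if row.get(col) is None:
                key = f"null_{col}"
                report.failures[key] = report.failures.get(key, 0) + 1
        amount = row.get("amount")
        if amount is not None and amount < 0:
            report.failures["negative_amount"] = report.failures.get("negative_amount", 0) + 1
    return report
```

A pipeline stage would run this report and fail (or quarantine the batch) when any rule's failure count exceeds an agreed threshold, feeding the counts into lineage and monitoring systems.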
Generative AI & Agentic AI Systems
- Design and deploy LLM-powered applications using frameworks such as LangChain, LlamaIndex, AutoGen, or equivalent.
- Build and maintain Retrieval-Augmented Generation (RAG) pipelines, integrating S3, Bedrock, SageMaker, and vector databases (OpenSearch, Pinecone, FAISS, Chroma, Milvus, etc.).
- Implement agentic reasoning, tool invocation, memory, and orchestration for single-agent and multi-agent workflows.
- Integrate LLMs with internal data systems, APIs, and enterprise applications.
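The retrieval half of a RAG pipeline reduces to ranking stored document embeddings by similarity to a query embedding. The sketch below shows that core step with plain cosine similarity over an in-memory index; in practice the embeddings would come from a model (e.g. via Bedrock) and the index would live in one of the vector stores listed above, such as OpenSearch or FAISS.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    """Return the ids of the k documents whose embeddings best match the query.

    `index` is a list of (doc_id, embedding) pairs standing in for a vector
    database; a real system would delegate this ranking to the store's ANN search.
    """
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

The retrieved documents would then be injected into the LLM prompt as grounding context, which is the "augmented generation" half of the pipeline.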
Deployment, Observability & Operations
- Containerize and deploy AI services using Docker, ECS, or EKS, ensuring scalability and reliability.
- Design and maintain CI/CD pipelines for data and AI workloads.
- Implement monitoring, logging, tracing, and model observability using CloudWatch, X-Ray, and LLMOps tools.
- Optimize runtime performance, latency, and infrastructure cost for AI workloads.
Collaboration & Delivery
- Collaborate closely with ML engineers, data scientists, product teams, and cloud architects to deliver production-ready GenAI solutions.
- Drive the transition from PoCs to enterprise-grade systems, ensuring compliance, security, and operational readiness.
Required Skills & Experience
- 7-11 years of experience in Data Engineering, with strong hands-on AWS expertise.
- Advanced proficiency in Python and PySpark, with real-world production experience.
- Proven experience delivering GenAI solutions in production environments (beyond demos or PoCs).
- Hands-on experience with Agentic AI frameworks (LangChain, LlamaIndex, AutoGen, or similar).
- Strong understanding of RAG architectures, vector databases, and embedding strategies.
- Experience with LLMOps, prompt lifecycle management, evaluation, and performance monitoring.
- Practical experience deploying workloads on AWS ECS/EKS, building CI/CD pipelines, and managing production systems.
- Solid knowledge of AWS security best practices, including IAM, VPC, Secrets Manager, and data protection.
- Strong problem-solving, communication, and technical leadership skills.