Lead Data Engineer - Generative AI & Agentic Systems (AWS)
About the Role
We are seeking a Lead Data Engineer with strong expertise in AWS-based data engineering and hands-on production experience with Generative AI and Agentic AI systems. This role blends deep data infrastructure engineering with modern GenAI application development, focusing on scalable, secure, and production-grade AI systems.
As a Lead, you will own solution architecture, guide engineering best practices, and mentor team members while remaining deeply hands-on. You will play a critical role in taking GenAI initiatives from experimentation to enterprise-scale deployment.
Key Responsibilities
Technical Leadership & Architecture
- Lead the design and implementation of scalable data and GenAI architectures on AWS.
- Define best practices, coding standards, and architectural patterns for GenAI and data engineering teams.
- Review designs and code to ensure performance, security, reliability, and cost optimization.
- Act as a technical mentor and escalation point for complex engineering challenges.
Data Engineering & Cloud Infrastructure
- Design, build, and maintain robust data pipelines and AI data infrastructure using AWS services such as Glue, Lambda, S3, Redshift, Athena, and Step Functions.
- Develop high-performance PySpark pipelines for large-scale data processing in production environments.
- Ensure data quality, lineage, governance, and reliability across platforms.
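To illustrate the kind of data-quality gate the bullets above refer to, here is a minimal sketch in plain Python. The column names, rules, and thresholds are hypothetical; in this role a production version would typically run as a PySpark job inside a Glue or Step Functions pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    total: int = 0
    failures: dict = field(default_factory=dict)  # rule name -> failing row count

def validate_rows(rows, required=("id", "event_ts")):
    """Apply simple quality rules to an iterable of record dicts.

    Rules (illustrative only): required columns must be non-null, and an
    optional 'amount' column, when present, must be non-negative.
    """
    report = QualityReport()
    for row in rows:
        report.total += 1
        for col in required:
            if row.get(col) is None:
                key = f"null_{col}"
                report.failures[key] = report.failures.get(key, 0) + 1
        amount = row.get("amount")
        if amount is not None and amount < 0:
            report.failures["negative_amount"] = report.failures.get("negative_amount", 0) + 1
    return report
```

A pipeline stage would run this report and fail (or quarantine the batch) when any rule's failure count exceeds an agreed threshold, feeding the counts into lineage and monitoring systems.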
Generative AI & Agentic AI Systems
- Design and deploy LLM-powered applications using frameworks such as LangChain, LlamaIndex, AutoGen, or equivalent.
- Build and maintain Retrieval-Augmented Generation (RAG) pipelines, integrating S3, Bedrock, SageMaker, and vector databases (OpenSearch, Pinecone, FAISS, Chroma, Milvus, etc.).
- Implement agentic reasoning, tool invocation, memory, and orchestration for single-agent and multi-agent workflows.
- Integrate LLMs with internal data systems, APIs, and enterprise applications.
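The retrieval half of a RAG pipeline reduces to ranking stored document embeddings by similarity to a query embedding. The sketch below shows that core step with plain cosine similarity over an in-memory index; in practice the embeddings would come from a model (e.g. via Bedrock) and the index would live in one of the vector stores listed above, such as OpenSearch or FAISS.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    """Return the ids of the k documents whose embeddings best match the query.

    `index` is a list of (doc_id, embedding) pairs standing in for a vector
    database; a real system would delegate this ranking to the store's ANN search.
    """
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

The retrieved documents would then be injected into the LLM prompt as grounding context, which is the "augmented generation" half of the pipeline.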
Deployment, Observability & Operations
- Containerize and deploy AI services using Docker, ECS, or EKS, ensuring scalability and reliability.
- Design and maintain CI/CD pipelines for data and AI workloads.
- Implement monitoring, logging, tracing, and model observability using CloudWatch, X-Ray, and LLMOps tools.
- Optimize runtime performance, latency, and infrastructure cost for AI workloads.
Collaboration & Delivery
- Collaborate closely with ML engineers, data scientists, product teams, and cloud architects to deliver production-ready GenAI solutions.
- Drive the transition from PoCs to enterprise-grade systems, ensuring compliance, security, and operational readiness.
Required Skills & Experience
- 7-11 years of experience in Data Engineering, with strong hands-on AWS expertise.
- Advanced proficiency in Python and PySpark, with real-world production experience.
- Proven experience delivering GenAI solutions in production environments (beyond demos or PoCs).
- Hands-on experience with Agentic AI frameworks (LangChain, LlamaIndex, AutoGen, or similar).
- Strong understanding of RAG architectures, vector databases, and embedding strategies.
- Experience with LLMOps, prompt lifecycle management, evaluation, and performance monitoring.
- Practical experience deploying workloads on AWS ECS/EKS, building CI/CD pipelines, and managing production systems.
- Solid knowledge of AWS security best practices, including IAM, VPC, Secrets Manager, and data protection.
- Strong problem-solving, communication, and technical leadership skills.