Posted: 1 week ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description


Job Title: Data Engineer

Domain: Agentic AI – AWS

Annual CTC: ₹ 7 LPA


Position Overview:

We are seeking skilled Data Engineers to design, build, and manage robust data pipelines that power Agentic AI solutions on AWS. The role focuses on developing efficient ETL/ELT workflows and ensuring data quality, security, and scalability to support AI/ML model training, inference, and intelligent decision-making in a cloud-native environment.


Key Responsibilities:

• Design, develop, and maintain ETL/ELT pipelines to process structured, unstructured, and streaming data in AWS environments (a minimal sketch follows this list).

• Leverage AWS services such as S3, Glue, EMR, Lambda, Kinesis, Step Functions, Redshift, Athena, DynamoDB, RDS, and Lake Formation for scalable data ingestion, transformation, storage, and querying.

• Ensure robust data governance, security, and quality controls throughout the data lifecycle, supporting regulatory compliance.

• Enable both real-time and batch data processing pipelines to power AI-driven workflows and applications.

• Collaborate closely with AI/ML teams to prepare clean, high-quality datasets optimized for model training, inference, and fine-tuning.

• Optimize data infrastructure to achieve high performance, scalability, reliability, and cost efficiency.

• Implement CI/CD pipelines and data engineering best practices in a cloud-native architecture for automated deployments.

• Monitor, debug, and resolve issues in data pipelines to ensure high availability, reliability, and minimal latency.

• Maintain thorough documentation of data pipelines, architecture, and processes to facilitate reproducibility and knowledge transfer.
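
For illustration only: a minimal PySpark sketch of the raw-to-curated ETL pattern described above, as it might run on Glue or EMR. The bucket paths, column names, and transforms are hypothetical placeholders, not details from this posting.

    # Minimal ETL sketch: read raw JSON from an S3 landing zone, clean it,
    # and write partitioned Parquet to a curated zone for Athena/Redshift.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("raw-to-curated-etl").getOrCreate()

    # Extract: hypothetical raw-events bucket.
    raw = spark.read.json("s3://example-raw-zone/events/")

    # Transform: drop malformed rows, normalize the timestamp, derive a
    # date column for partitioning, and de-duplicate on the event key.
    curated = (
        raw.dropna(subset=["event_id", "event_ts"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Load: columnar, date-partitioned output in the curated zone.
    (curated.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://example-curated-zone/events/"))

    spark.stop()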



Required Skills & Qualifications:

• 3 years of hands-on experience in Data Engineering, building data pipelines and cloud-based data solutions.

• Strong programming skills in Python, SQL, and/or Scala/Java.

• Expertise in AWS cloud services, including S3, Glue, EMR, Redshift, Athena, Kinesis, Lambda, Step Functions, DynamoDB, and RDS.

• Experience with data pipeline orchestration tools such as Apache Airflow, AWS Step Functions, Dagster, or similar (a sketch follows this list).

• Proficiency in big data frameworks like Apache Spark, Hadoop, or Flink.

• Solid understanding of data modeling, data warehousing concepts, and schema design principles.

• In-depth knowledge of data governance, data lineage, and security controls (IAM roles, encryption, Lake Formation).

• Hands-on experience in managing real-time streaming data using Kafka, Kinesis, or equivalent tools.

• Familiarity with DevOps practices including Infrastructure as Code (Terraform, CloudFormation), CI/CD automation, Git, and containerization (Docker).
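
For illustration only: a minimal Apache Airflow sketch of the orchestration pattern referenced above, assuming Airflow 2.4+ with the Amazon provider package installed. The Glue job name, region, and quality check are hypothetical placeholders.

    # Daily DAG: trigger a (hypothetical) Glue ETL job, then run a
    # placeholder data-quality gate once it succeeds.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    def check_row_count(**context):
        # Placeholder; a real check would query Athena or Redshift.
        print("validating curated row counts...")

    with DAG(
        dag_id="daily_events_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        run_glue_etl = GlueJobOperator(
            task_id="run_glue_etl",
            job_name="raw-to-curated-etl",  # hypothetical Glue job name
            region_name="ap-south-1",
        )
        validate = PythonOperator(
            task_id="validate_output",
            python_callable=check_row_count,
        )
        run_glue_etl >> validate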



Preferred Attributes:

• Experience working in regulated industries (healthcare, life sciences) with a focus on compliance and data privacy.

• Strong problem-solving skills and ability to work in a collaborative, cross-functional team environment.

• Passion for designing scalable and efficient cloud-native data solutions.

• Excellent documentation skills, ensuring clear communication of technical solutions and architecture.
