AWS Data Engineering Lead - ETL/PySpark

Experience: 8 - 10 years

Salary: 0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

Job Summary:

We are looking for an experienced AWS Data Engineering Lead (PySpark developer) with strong knowledge of AWS services, particularly AWS Lambda, to design and implement scalable, high-performance data processing workflows for large and complex datasets. The ideal candidate will have hands-on experience leading the design and orchestration of Spark-based ETL jobs using core AWS services such as AWS Glue, Amazon EMR, and Lambda. The role also requires expertise in querying and integrating data with Amazon Athena and DynamoDB for both analytical and operational use cases.

Key Responsibilities

  • Design, develop, and maintain robust data transformation pipelines utilizing PySpark on AWS Glue or Amazon EMR (a minimal PySpark sketch follows this list).
  • Implement event-driven data architectures using services like AWS Lambda, S3 events, Amazon EventBridge, and AWS Step Functions to orchestrate scalable data workflows.
  • Integrate and optimize data ingestion and reporting pipelines using Amazon Athena for query-based transformations and analysis.
  • Develop logic to efficiently read, write, and process structured and unstructured data stored in DynamoDB, Amazon S3, and Athena.
  • Proactively monitor, debug, and fine-tune data pipelines for optimal performance, scalability, and cost-efficiency using CloudWatch and platform-specific logging/metrics (Glue/EMR logs, Athena query metrics).
  • Optimize PySpark code and configuration for execution on cloud-based data platforms.
  • Collaborate effectively with cross-functional product and engineering teams to translate business and data requirements into robust technical implementations.
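
To give a flavour of the first responsibility above, here is a minimal PySpark sketch of an S3-to-S3 transformation of the kind this role covers. The bucket paths, column names, and filter logic are hypothetical illustrations, not details from the posting; on AWS Glue the same logic would typically sit inside a Glue job script.

```python
# Minimal PySpark ETL sketch (hypothetical buckets and columns):
# read raw JSON events from S3, clean them, and write date-partitioned
# Parquet that Athena can query efficiently.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON events from S3 (path is illustrative).
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Typical cleanup: de-duplicate, normalise types, derive a partition key.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Write Parquet partitioned by date; partition pruning keeps Athena
# scans (and cost) small for date-bounded queries.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated-bucket/orders/"))
```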

Required Skills & Experience

  • 8-10 years of total experience, with a focus on Python, PySpark, and distributed data processing technologies.
  • Strong, hands-on expertise with core AWS data services, including AWS Glue, AWS Lambda, Amazon EMR, and S3.
  • Proven experience integrating and working with Amazon Athena and DynamoDB for various data use cases.
  • Proficiency in building and deploying serverless solutions and workflow orchestration using AWS Step Functions, EventBridge, and Lambda (see the Lambda sketch after this list).
  • Solid understanding of common data formats (e.g., Parquet, Avro, JSON) and advanced data transformation logic.
  • Familiarity with data lake architecture principles.
  • Availability to work hours aligned with the US Mountain Time Zone (MT).
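
As an illustration of the serverless orchestration skills listed above, the sketch below shows an S3-triggered Lambda that starts a Step Functions execution for each newly created object. The environment variable name and the state-machine wiring are assumptions for illustration only.

```python
# Hypothetical Lambda handler: on an S3 ObjectCreated event, start a
# Step Functions state machine that orchestrates the downstream ETL.
import json
import os
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # STATE_MACHINE_ARN is an illustrative environment variable.
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            input=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"status": "started", "records": len(records)}
```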

Good To Have

  • Exposure to CI/CD pipelines and Infrastructure as Code (e.g., Terraform, AWS CDK).
  • Knowledge of data governance and security best practices within an AWS environment.
  • Prior experience with on-premises to cloud data migration projects.
  • Familiarity with advanced performance tuning techniques for Athena and leveraging DynamoDB Streams for event-driven ingestion (a stream-consumer sketch follows below).
(ref:hirist.tech)
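
For the DynamoDB Streams pattern mentioned in the last item, here is a hypothetical Lambda stream consumer that lands new or modified items in S3 as JSON lines for later Athena analysis. The bucket name and key layout are illustrative, and the sketch assumes the stream view type includes NewImage.

```python
# Illustrative DynamoDB Streams consumer (hypothetical table/bucket):
# append INSERT/MODIFY images to S3 for downstream Athena queries.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    lines = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is DynamoDB-typed JSON, e.g. {"order_id": {"S": "123"}};
            # it is only present when the stream view includes new images.
            lines.append(json.dumps(record["dynamodb"]["NewImage"]))
    if lines:
        # One object per invocation, keyed by request id (illustrative layout).
        s3.put_object(
            Bucket="example-ingest-bucket",
            Key=f"dynamodb-stream/{context.aws_request_id}.json",
            Body="\n".join(lines).encode("utf-8"),
        )
```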
