Job Overview
We are seeking a highly skilled and experienced Lead Data Engineer (AWS) to spearhead the design, development, and optimization of our cloud-based data infrastructure. As a technical leader, you will drive scalable data solutions using AWS services and modern data engineering tools, ensuring robust data pipelines and architectures for real-time and batch data processing.
The ideal candidate is a hands-on technologist with a deep understanding of distributed data systems, cloud-native data services, and team leadership in Agile environments.
Responsibilities
- Design, build, and maintain scalable, fault-tolerant, and secure data pipelines using AWS-native services (e.g., Glue, EMR, Lambda, S3, Redshift, Athena, Kinesis).
- Lead end-to-end implementation of data architecture strategies including ingestion, storage, transformation, and data governance.
- Collaborate with data scientists, analysts, and application developers to understand data requirements and deliver optimal solutions.
- Ensure best practices for data quality, data cataloging, lineage tracking, and metadata management using tools like AWS Glue Data Catalog or Apache Atlas.
- Optimize data pipelines for performance, scalability, and cost-efficiency across structured and unstructured data sources.
- Mentor and lead a team of data engineers, providing technical guidance, code reviews, and architecture recommendations.
- Implement data modeling techniques (OLTP/OLAP), partitioning strategies, and data warehousing best practices.
- Maintain CI/CD pipelines for data infrastructure using tools such as AWS CodePipeline and Git.
- Monitor production systems and lead incident response and root cause analysis for data infrastructure issues.
- Drive innovation by evaluating emerging technologies and proposing improvements to the existing data platform.
Skills & Qualifications
- Minimum of 7 years of experience in data engineering, with at least 3 years in a lead or senior engineering role.
- Strong hands-on experience with AWS data services: S3, Redshift, Glue, Lambda, EMR, Athena, Kinesis, RDS, DynamoDB.
- Advanced proficiency in Python/Scala/Java for ETL development and data transformation logic.
- Deep understanding of distributed data processing frameworks (e.g., Apache Spark, Hadoop).
- Solid grasp of SQL and experience with performance tuning in large-scale environments.
- Experience implementing data lakes, lakehouse architectures, and data warehousing solutions in the cloud.
- Knowledge of streaming data pipelines using Kafka, Kinesis, or Amazon MSK.
- Proficiency with infrastructure-as-code (IaC) using Terraform or AWS CloudFormation.
- Experience with DevOps practices and tools such as Docker, Git, Jenkins, and monitoring tools (CloudWatch, Prometheus, Grafana).
- Expertise in data governance, security, and compliance in cloud environments.