Posted: 2 weeks ago
On-site | Full Time
Role Summary

We are looking for a Data Engineer to design, build, and manage a scalable Data Lake that integrates data from MySQL and multiple other structured and unstructured data sources. The ideal candidate will work on ETL pipelines, data modeling, and real-time streaming solutions to create a centralized, high-performance data infrastructure for analytics, reporting, and AI-driven insights.

Job Responsibilities

Data Lake Architecture & Implementation
- Design and develop a scalable Data Lake solution integrating MySQL, APIs, flat files, and NoSQL databases.
- Implement data ingestion pipelines using ETL/ELT processes for batch and real-time streaming data.
- Optimize data storage, partitioning, and indexing for high-performance query execution.

ETL & Data Pipeline Development
- Develop and manage ETL workflows to extract, transform, and load data into the Data Lake.
- Automate data cleansing, normalization, deduplication, and transformation processes.
- Ensure efficient data orchestration using Apache Airflow, Prefect, or similar tools (an Airflow sketch follows this posting).

Data Integration & Sources
- Connect and integrate MySQL, PostgreSQL, MongoDB, APIs, third-party platforms, and cloud storage (S3, GCS, Azure Blob) into the Data Lake.
- Implement real-time data streaming using Kafka, Kinesis, or Pub/Sub for event-driven architectures (a Kafka consumer sketch follows this posting).

Big Data & Cloud Technologies
- Utilize Hadoop, Spark, or Snowflake for distributed data processing (a PySpark/Delta sketch follows this posting).
- Deploy and manage the Data Lake on AWS (S3, Glue, Redshift), Azure (ADLS, Synapse), or GCP (BigQuery, Dataproc).
- Optimize cost and performance of cloud-based data storage and processing.

Data Governance, Security & Compliance
- Implement data governance, lineage, access control, and encryption policies.
- Ensure compliance with GDPR, CCPA, and other data privacy regulations.
- Develop monitoring and alerting mechanisms for data quality, integrity, and security (a data-quality check sketch follows this posting).

Collaboration & Business Insights
- Work closely with Data Scientists, Analysts, and BI teams to provide clean, enriched, and structured data.
- Support machine learning and AI-driven insights by designing optimized data pipelines.
- Define data cataloging, metadata management, and documentation for self-service analytics.

Job Requirements

Educational Qualification and Experience
- 5-8 years of experience in Data Engineering, ETL, and Big Data technologies.
- Strong expertise in SQL, MySQL, PostgreSQL, and NoSQL databases (MongoDB, Cassandra, DynamoDB, etc.).

Technical Skills
- Hands-on experience with ETL tools (Apache Airflow, Talend, dbt, Glue, etc.).
- Experience with Big Data frameworks (Spark, Hadoop, Snowflake, Redshift, BigQuery).
- Proficiency in Python, Scala, or Java for data engineering workflows.
- Knowledge of cloud Data Lakes (AWS S3/Glue, GCP BigQuery, Azure ADLS/Synapse).
- Strong experience with data modeling, schema design, and query optimization.
- Experience with Kafka, Kinesis, or Pub/Sub for real-time data processing.
- Exposure to Docker, Kubernetes, and CI/CD for data pipeline automation.
- Knowledge of Delta Lake, Iceberg, or Hudi for transactional data lakes.
- Understanding of ML feature stores and AI-driven data pipelines.

Behavioural Skills
- Strategic thinking
- Planning and organizing
- Interpersonal skills
- Stakeholder management
- People leadership
- Innovation and creativity
- Attention to detail
Company: Group Bayport
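To illustrate the orchestration responsibility above, here is a minimal Apache Airflow sketch of a daily extract-transform-load chain. The DAG id, schedule, retry settings, and task bodies are assumptions for the example (Airflow 2.4+ style), not specifics of this role.

```python
# Minimal Airflow DAG sketch: daily MySQL -> Data Lake batch load.
# dag_id, schedule, and task logic are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    """Pull the previous day's rows from MySQL (hook/SQLAlchemy omitted)."""
    ...


def transform_orders(**context):
    """Cleanse, normalize, and deduplicate the extracted batch."""
    ...


def load_to_lake(**context):
    """Write the transformed batch as partitioned Parquet to the lake."""
    ...


with DAG(
    dag_id="mysql_to_lake_daily",            # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                       # Airflow 2.4+ argument
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_to_lake)

    extract >> transform >> load             # explicit task ordering
```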
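For the event-driven ingestion duties (Kafka, Kinesis, or Pub/Sub), a consumer along these lines could land raw events for later merging into the lake. This sketch uses the kafka-python client as one possible choice; the topic, brokers, consumer group, and event fields are hypothetical.

```python
# Kafka consumer sketch: land raw JSON events for the lake's ingestion zone.
# Topic, brokers, group id, and event fields are assumptions (kafka-python client).
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "order-events",                          # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="lake-ingest",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # A production pipeline would buffer events into micro-batches and
    # write them as Parquet/Avro files to S3/GCS/ADLS instead of printing.
    print(event.get("event_id"), event.get("event_type"))
```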
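Combining the Spark and Delta Lake requirements, one plausible batch job reads a MySQL table over JDBC, deduplicates and normalizes it, and writes a date-partitioned Delta table. The JDBC URL, credentials, column names, and lake path are placeholders, and writing Delta assumes the delta-spark package is available on the cluster.

```python
# PySpark sketch: MySQL extract -> cleansed, date-partitioned Delta table.
# JDBC URL, credentials, columns, and lake path are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_to_lake").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/shop")  # hypothetical source
    .option("dbtable", "orders")
    .option("user", "etl")
    .option("password", "***")
    .load()
)

cleaned = (
    orders.dropDuplicates(["order_id"])                  # deduplication
    .withColumn("order_date", F.to_date("created_at"))   # normalize to a date key
)

(
    cleaned.write.format("delta")                        # requires delta-spark
    .mode("overwrite")
    .partitionBy("order_date")                           # partition pruning for queries
    .save("s3a://lake/bronze/orders")                    # hypothetical lake path
)
```

Partitioning by a date key here is what makes the "optimize storage, partitioning, and indexing" responsibility concrete: downstream queries that filter on order_date only scan the matching partitions.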
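Monitoring and alerting on data quality can start with simple rule-based checks run after each load. The sketch below is hand-rolled rather than a specific framework, and the 1% null threshold is an arbitrary example.

```python
# Hand-rolled data-quality gate sketch; thresholds are arbitrary examples.
def check_batch(rows: list[dict]) -> list[str]:
    """Return human-readable violations for a freshly loaded batch."""
    violations = []
    if not rows:
        return ["batch is empty"]
    ids = [r.get("order_id") for r in rows]
    if len(ids) != len(set(ids)):
        violations.append("duplicate order_id values found")
    null_amounts = sum(1 for r in rows if r.get("amount") is None)
    if null_amounts / len(rows) > 0.01:                  # >1% nulls triggers an alert
        violations.append(f"{null_amounts} row(s) missing 'amount'")
    return violations


# Usage: fail the pipeline run (or page on-call) when violations are non-empty.
problems = check_batch(
    [{"order_id": 1, "amount": 9.5}, {"order_id": 1, "amount": None}]
)
print(problems)  # ['duplicate order_id values found', "1 row(s) missing 'amount'"]
```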