We are looking for an experienced Data Engineer to join our data engineering team and help design, build, and optimize robust data pipelines and platforms that power our analytics, products, and decision-making. This role demands a strong foundation in big data technologies, excellent programming and SQL skills, and hands-on experience in tuning and optimization for large-scale data processing. You will work closely with data scientists, analysts, product managers, and other engineers to build scalable, efficient, and reliable data solutions.
WHAT YOU'LL DO:
- Design, develop, and maintain scalable big data pipelines using Spark (Scala or PySpark), Hive, and HDFS.
- Build and manage data workflows and orchestration using Airflow.
- Write efficient, production-grade code in a major programming language (Python, Java, or Scala) to transform and process data.
- Develop complex SQL queries for data transformation, validation, and reporting, ensuring high performance.
- Tune and optimize Spark jobs and SQL queries to improve performance and resource utilization.
- Work on cloud platforms (AWS, Azure, or GCP) to deploy and manage data infrastructure, preferably AWS with EMR.
- Collaborate with data stakeholders to understand requirements and deliver reliable, high-quality data solutions.
- Maintain data quality, governance, and monitoring, ensuring pipelines are robust, observable, and recoverable.
WHAT YOU'LL NEED:
- Strong experience with the Big Data stack: HDFS, Hive, and Spark (Scala or PySpark).
- Excellent programming skills in a major language (Python, Java, or Scala).
- Expert-level SQL, with the ability to write and optimize complex queries.
- Hands-on experience with Spark tuning and optimization (both the compute and SQL layers).
- Experience with Airflow for data workflow orchestration.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience as a Data Engineer working on large-scale data systems.
- Proven track record of delivering production-ready data pipelines in big data environments.
- Strong analytical thinking, problem-solving, and communication skills.
- Exposure to cloud technologies (AWS, Azure, or GCP).
PREFERRED / NICE-TO-HAVE SKILLS:
- AWS ecosystem, especially EMR (Elastic MapReduce).
- Trino or Presto for interactive querying.
- Familiarity with Lakehouse table formats (Hudi, Delta Lake, Iceberg).
- Experience with dbt (data build tool) for analytics engineering.
- Experience with Kafka (streaming ingestion).
- Familiarity with monitoring tools such as Prometheus and Grafana.