Senior Data Engineer

Experience: 9 - 12 years

Salary: 14 - 24 Lacs

Posted: 19 hours ago | Platform: Naukri

Work Mode: Remote

Job Type: Full Time

Job Description

We are looking for an experienced Senior Data Engineer to lead the development of scalable, AWS-native data lake pipelines with a strong focus on time series forecasting and upsert-ready architectures. The role carries end-to-end ownership of the data lifecycle, from ingestion through partitioning and versioning to BI delivery. The ideal candidate is highly proficient in AWS data services, PySpark, and versioned storage formats such as Apache Hudi/Iceberg, and understands the nuances of data quality and observability in large-scale analytics systems.

Role & responsibilities

  • Design and implement data lake zoning (Raw → Clean → Modeled) using Amazon S3, AWS Glue, and Athena.
  • Ingest structured and unstructured datasets including POS, USDA, Circana, and internal sales data.
  • Build versioned and upsert-friendly ETL pipelines using Apache Hudi or Iceberg (a Hudi upsert sketch follows this list).
  • Create forecast-ready datasets with lagged, rolling, and trend features for revenue and occupancy modelling (see the feature-engineering sketch below).
  • Optimize Athena datasets with partitioning, CTAS queries, and metadata tagging (see the CTAS sketch below).
  • Implement S3 lifecycle policies, intelligent file partitioning, and audit logging.
  • Build reusable transformation logic using dbt-core or PySpark to support KPIs and time series outputs.
  • Integrate robust data quality checks using custom logs, AWS CloudWatch, or other DQ tooling (see the CloudWatch sketch below).
  • Design and manage a forecast feature registry with metrics versioning and traceability.
  • Collaborate with BI and business teams to finalize schema design and deliverables for dashboard consumption.
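
To make the expectations concrete, here is a minimal sketch of an upsert-ready Hudi write in PySpark. The S3 paths, table name, and key columns (txn_id, dt, updated_at) are hypothetical placeholders, not part of the posting; the Hudi options shown are the standard ones for an upsert operation.

```python
from pyspark.sql import SparkSession

# Assumes a Glue/EMR runtime with the Hudi Spark bundle on the classpath.
spark = (
    SparkSession.builder
    .appName("pos-upsert")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

incremental = spark.read.parquet("s3://example-lake/raw/pos/")  # hypothetical input

hudi_options = {
    "hoodie.table.name": "pos_sales",                          # hypothetical table
    "hoodie.datasource.write.recordkey.field": "txn_id",       # unique record key
    "hoodie.datasource.write.partitionpath.field": "dt",       # partition column
    "hoodie.datasource.write.precombine.field": "updated_at",  # newest record wins on key collision
    "hoodie.datasource.write.operation": "upsert",             # insert new keys, update existing ones
}

(
    incremental.write.format("hudi")
    .options(**hudi_options)
    .mode("append")  # append mode performs the upsert into an existing table
    .save("s3://example-lake/clean/pos_sales/")
)
```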
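
Next, a sketch of the lagged, rolling, and trend features mentioned above, built with PySpark window functions. Column names (store_id, dt, revenue) and the input path are assumptions for illustration.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("forecast-features").getOrCreate()
daily = spark.read.parquet("s3://example-lake/clean/pos_sales/")  # hypothetical input

# One window per store, ordered by date, shared by all feature columns.
w = Window.partitionBy("store_id").orderBy("dt")

features = (
    daily
    # Lag feature: revenue on the same weekday one week earlier.
    .withColumn("revenue_lag_7", F.lag("revenue", 7).over(w))
    # Rolling feature: trailing 28-day average revenue.
    .withColumn("revenue_roll_28", F.avg("revenue").over(w.rowsBetween(-27, 0)))
    # Trend feature: week-over-week revenue delta.
    .withColumn("revenue_wow", F.col("revenue") - F.lag("revenue", 7).over(w))
)
```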
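
Athena CTAS partitioning can be driven from Python via boto3, as in this sketch. The database, bucket names, and columns are hypothetical; note that Athena requires partition columns (dt here) to come last in the SELECT list.

```python
import boto3

athena = boto3.client("athena")

# CTAS rewrites query results as partitioned Parquet, cutting Athena scan costs.
ctas = """
CREATE TABLE analytics.pos_sales_partitioned
WITH (
    format = 'PARQUET',
    external_location = 's3://example-lake/modeled/pos_sales/',
    partitioned_by = ARRAY['dt']
) AS
SELECT store_id, revenue, dt   -- partition column must be last
FROM analytics.pos_sales_clean
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```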
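
And one way a custom data quality check might publish to CloudWatch for alerting; the namespace, metric name, and dimensions are assumptions.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_null_rate(df, column: str, table: str) -> None:
    """Publish a column's null rate as a CloudWatch metric so alarms can fire on it."""
    total = df.count()
    nulls = df.filter(df[column].isNull()).count()
    cloudwatch.put_metric_data(
        Namespace="DataLake/Quality",  # hypothetical namespace
        MetricData=[{
            "MetricName": "NullRate",
            "Dimensions": [
                {"Name": "Table", "Value": table},
                {"Name": "Column", "Value": column},
            ],
            "Value": nulls / total if total else 0.0,
        }],
    )
```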

Preferred candidate profile

  • 9-12 years of experience in data engineering.
  • Deep hands-on experience with AWS Glue, Athena, S3, Step Functions, and Glue Data Catalog.
  • Strong command over PySpark, dbt-core, CTAS query optimization, and partition strategies.
  • Working knowledge of Apache Hudi, Iceberg, or Delta Lake for versioned ingestion.
  • Experience in S3 metadata tagging and scalable data lake design patterns.
  • Expertise in feature engineering and forecasting dataset preparation (lags, trends, windows).
  • Proficiency in Git-based workflows (Bitbucket), CI/CD, and deployment automation.
  • Strong understanding of time series KPIs, such as revenue forecasts, occupancy trends, or demand volatility.
  • Familiarity with data observability best practices, including field-level logging, anomaly alerts, and classification tagging.
  • Experience with statistical forecasting frameworks such as Prophet, GluonTS, or related libraries (a minimal Prophet sketch follows this list).
  • Familiarity with Superset or Streamlit for QA visualization and UAT reporting.
  • Understanding of macroeconomic datasets (USDA, Circana) and third-party data ingestion.
  • Independent, critical thinker with the ability to design for scale and evolving business logic.
  • Strong communication and collaboration with BI, QA, and business stakeholders.
  • High attention to detail in ensuring data accuracy, quality, and documentation.
  • Comfortable interpreting business-level KPIs and transforming them into technical pipelines.
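
As a pointer to the forecasting side of the role, a minimal Prophet fit-and-predict sketch; the synthetic series and 30-day horizon are illustrative only.

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a two-column frame: ds (timestamp) and y (target value).
history = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),
    "y": range(90),  # placeholder revenue series
})

model = Prophet(weekly_seasonality=True)
model.fit(history)

future = model.make_future_dataframe(periods=30)  # extend 30 days past history
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```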
