Senior Data Engineer (Data Lake, Forecasting & Governance) - 9+ yrs - Immediate

9 - 12 years


Posted: 1 day ago | Platform: LinkedIn


Work Mode

On-site

Job Type

Full Time

Job Description


Senior Data Engineer

The ideal candidate will be highly proficient in AWS data services, PySpark, and versioned storage formats such as Apache Hudi or Iceberg. A strong understanding of data quality, observability, governance, and metadata management in large-scale analytical systems is critical.


Roles & Responsibilities


  • Design and implement data lake zoning (Raw → Clean → Modeled) using Amazon S3, AWS Glue, and Athena.
  • Ingest structured and unstructured datasets including POS, USDA, Circana, and internal sales data.
  • Build versioned and upsert-ready ETL pipelines using Apache Hudi or Iceberg.
  • Create forecast-ready datasets with lagged, rolling, and trend features for revenue and occupancy modeling.
  • Optimize Athena datasets with partitioning, CTAS queries, and S3 metadata tagging.
  • Implement S3 lifecycle policies, intelligent file partitioning, and audit logging for performance and compliance.
  • Build reusable transformation logic using dbt-core or PySpark to support KPIs and time series outputs.
  • Integrate data quality frameworks such as Great Expectations, custom logs, and AWS CloudWatch for field-level validation and anomaly detection.
  • Apply data governance practices using tools like OpenMetadata or Atlan, enabling lineage tracking, data cataloging, and impact analysis.
  • Establish QA automation frameworks for pipeline validation, data regression testing, and UAT handoff.
  • Collaborate with BI, QA, and business teams to finalize schema design and deliverables for dashboard consumption.
  • Ensure compliance with enterprise data governance policies and enable discovery and collaboration through metadata platforms.
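
The forecast-ready dataset responsibility above (lagged, rolling, and trend features) can be sketched as follows. In production this would typically use PySpark window functions (`lag()`, `avg().over(Window.rowsBetween(...))`); the plain-Python version below shows the same logic, and all column names are illustrative, not from the posting.

```python
# Sketch of lag / rolling-window / trend feature construction for a
# revenue series, as used to prepare forecast-ready datasets. In PySpark
# this maps onto window functions; shown in plain Python for clarity.

def build_features(revenue, lag=1, window=3):
    """Return per-row dicts with lagged, rolling-mean, and trend features."""
    rows = []
    for i, value in enumerate(revenue):
        # Lag feature: value `lag` rows back (None until enough history).
        lagged = revenue[i - lag] if i >= lag else None
        # Rolling mean over the trailing `window` rows (shorter at the start).
        window_vals = revenue[max(0, i - window + 1): i + 1]
        rolling_mean = round(sum(window_vals) / len(window_vals), 2)
        # Trend feature: first difference against the previous row.
        trend = value - revenue[i - 1] if i >= 1 else None
        rows.append({"revenue": value, "lag_1": lagged,
                     "rolling_mean_3": rolling_mean, "trend": trend})
    return rows

features = build_features([100.0, 110.0, 105.0, 120.0])
```

The same shape of output (one lag, one rolling aggregate, one difference per row) is what a downstream revenue or occupancy model would consume.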


Preferred Candidate Profile

  • 9-12 years of experience in data engineering.

  • Deep hands-on experience with AWS Glue, Athena, S3, Step Functions, and Glue Data Catalog.
  • Strong command over PySpark, dbt-core, CTAS query optimization, and advanced partition strategies.
  • Proven experience with versioned ingestion using Apache Hudi, Iceberg, or Delta Lake.

  • Experience in data lineage, metadata tagging, and governance tooling using OpenMetadata, Atlan, or similar platforms.

  • Proficiency in feature engineering for time series forecasting (lags, rolling windows, trends).
  • Expertise in Git-based workflows, CI/CD, and deployment automation (Bitbucket or similar).
  • Strong understanding of time series KPIs: revenue forecasts, occupancy trends, demand volatility, etc.
  • Knowledge of statistical forecasting frameworks (e.g., Prophet, GluonTS, Scikit-learn).
  • Experience with Superset or Streamlit for QA visualization and UAT testing.
  • Experience building data QA frameworks and embedding data validation checks at each stage of the ETL lifecycle.
  • Independent thinker capable of designing systems that scale with evolving business logic and compliance requirements.
  • Excellent communication skills for collaboration with BI, QA, data governance, and business stakeholders.

  • High attention to detail, especially around data accuracy, documentation, traceability, and auditability.
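
The QA-framework expectation above (embedding validation checks at each ETL stage) can be sketched minimally in plain Python, in the spirit of frameworks like Great Expectations. The field names and rules below are hypothetical examples, not requirements from the posting.

```python
# Minimal sketch of field-level validation embedded in an ETL stage:
# each rule is a predicate per field; records failing any rule are
# quarantined with their errors instead of flowing downstream.
# Field names and rules are hypothetical.

def validate_batch(records, rules):
    """Apply per-field rules; return (passed_records, failures)."""
    passed, failures = [], []
    for idx, rec in enumerate(records):
        errors = [f"{field}: {rec.get(field)!r}"
                  for field, check in rules.items()
                  if not check(rec.get(field))]
        if errors:
            failures.append((idx, errors))  # quarantine with reasons
        else:
            passed.append(rec)
    return passed, failures

rules = {
    "store_id": lambda v: isinstance(v, str) and v != "",
    "revenue": lambda v: isinstance(v, (int, float)) and v >= 0,
}
good, bad = validate_batch(
    [{"store_id": "S1", "revenue": 250.0},
     {"store_id": "", "revenue": -5}],
    rules,
)
```

A real pipeline would emit the failure list to audit logs or CloudWatch metrics rather than returning it, so anomalies surface before UAT handoff.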
