Senior Data Engineer

Experience

9 - 14 years

Salary

17 - 32 Lacs

Posted: 5 days ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

Key Responsibilities:

Spark Engineering & Performance Optimization (Core)

  • Architect, build, and optimize large-scale Spark pipelines using PySpark or Scala
  • Lead Spark performance tuning initiatives, including (see the sketch after this list):
    • Executor and memory tuning
    • Partitioning and shuffle optimization
    • Broadcast joins, caching, and skew mitigation
  • Analyze and resolve Spark job failures, bottlenecks, and cost inefficiencies
  • Establish Spark coding standards and optimization best practices across teams
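
The tuning patterns listed above can be illustrated with a minimal PySpark sketch; paths, keys, configuration values, and partition counts below are assumptions for illustration, not details of this role's actual pipelines.

    # Minimal PySpark sketch of common tuning patterns; all paths, keys, and
    # partition counts are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        # Executor and memory settings are normally set per cluster; these are examples.
        .config("spark.sql.shuffle.partitions", "400")
        .config("spark.sql.adaptive.enabled", "true")              # AQE: runtime partition coalescing
        .config("spark.sql.adaptive.skewJoin.enabled", "true")     # split skewed join partitions
        .getOrCreate()
    )

    orders = spark.read.parquet("s3://example-lake/orders/")        # large fact table (hypothetical)
    customers = spark.read.parquet("s3://example-lake/customers/")  # small dimension (hypothetical)

    # Broadcast the small dimension to avoid a shuffle-heavy sort-merge join.
    enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

    # Cache the reused intermediate result and control shuffle parallelism
    # ahead of a wide aggregation.
    enriched = enriched.repartition(400, "order_date").cache()

    daily_revenue = (
        enriched.groupBy("order_date")
                .agg(F.sum("amount").alias("revenue"))
    )
    daily_revenue.write.mode("overwrite").parquet("s3://example-lake/marts/daily_revenue/")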

Data Architecture & Engineering

  • Design and build end-to-end data pipelines (batch and streaming) ingesting data into Snowflake and AWS-based Data Lakes (see the ingestion sketch below)
  • Architect and optimize cloud-native data lake and data warehouse solutions using:
    • AWS S3, EMR, Glue, Lambda, Step Functions, Kinesis, Redshift
  • Define modular, reusable frameworks for ingestion, transformation, and consumption layers
  • Design and implement CDC, API-based, and event-driven integration patterns
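
As a rough illustration of the ingestion layer described above, the following PySpark sketch lands raw events into a partitioned S3 lake zone; bucket names, columns, and the run date are hypothetical.

    # Minimal batch ingestion sketch: land raw JSON events into a partitioned
    # Parquet lake zone on S3. Bucket names, columns, and run date are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

    run_date = "2024-01-01"  # in practice injected by the orchestrator (e.g. Airflow)

    raw = spark.read.json(f"s3://example-raw/events/dt={run_date}/")

    curated = (
        raw.withColumn("ingested_at", F.current_timestamp())
           .withColumn("dt", F.lit(run_date))
           .dropDuplicates(["event_id"])   # keep re-runs idempotent for CDC-style feeds
    )

    (
        curated.write
        .mode("overwrite")
        .partitionBy("dt")
        .parquet("s3://example-lake/curated/events/")
    )
    # Downstream, the curated partition can be loaded into Snowflake, for example
    # via COPY INTO from an external stage or the Spark-Snowflake connector.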

Data Modeling & Warehouse Optimization

  • Own conceptual, logical, and physical data models for analytics and reporting
  • Design and optimize Star, Snowflake, and Data Vault schemas
  • Drive Snowflake performance and cost optimization (sketched below), including:
    • Query tuning and caching strategies
    • Partitioning, clustering, and warehouse sizing
  • Establish metadata management, schema versioning, and lineage standards
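
The Snowflake tuning work above typically involves statements like the following, shown here as a hedged sketch using the snowflake-connector-python client; account, warehouse, and table names are placeholders.

    # Sketch of Snowflake clustering and warehouse-sizing statements issued from Python.
    # Connection parameters, table, and warehouse names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="EXAMPLE-ACCOUNT",
        user="ETL_USER",
        password="***",            # in practice sourced from a secrets manager
        warehouse="TRANSFORM_WH",
        database="ANALYTICS",
        schema="MARTS",
    )

    try:
        cur = conn.cursor()
        # Cluster a large fact table on its most common filter/join keys
        # so partition pruning reduces the micro-partitions scanned.
        cur.execute("ALTER TABLE FACT_ORDERS CLUSTER BY (ORDER_DATE, CUSTOMER_ID)")
        # Right-size the warehouse and auto-suspend aggressively to control cost.
        cur.execute("ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60")
        # Inspect the optimizer plan for a hot query while tuning.
        cur.execute("EXPLAIN SELECT ORDER_DATE, SUM(AMOUNT) FROM FACT_ORDERS GROUP BY ORDER_DATE")
        print(cur.fetchall())
    finally:
        conn.close()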

Data Quality, Governance & Security

  • Define and enforce enterprise-grade data quality frameworks (see the sketch after this list)
  • Implement governance practices including cataloging, lineage, access control, and compliance (GDPR, SOC 2)
  • Partner with InfoSec teams to manage:
    • IAM roles and policies
    • Encryption and data masking in AWS and Snowflake
  • Ensure data accuracy, completeness, and reliability across platforms
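
A data quality framework of the kind described above usually starts from simple programmatic checks; the PySpark sketch below, with assumed column names and thresholds, fails a run when volume, null-rate, or duplicate-key checks break.

    # Minimal data-quality check sketch: row volume, null rate on the key column,
    # and duplicate keys. Paths, column names, and thresholds are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
    df = spark.read.parquet("s3://example-lake/curated/events/dt=2024-01-01/")

    total = df.count()
    null_keys = df.filter(F.col("event_id").isNull()).count()
    dupes = total - df.dropDuplicates(["event_id"]).count()

    checks = {
        "non_empty": total > 0,
        "null_key_rate_ok": total == 0 or null_keys / total < 0.01,
        "no_duplicate_keys": dupes == 0,
    }

    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        # Fail the run so the orchestrator can alert and block downstream loads.
        raise ValueError(f"Data quality checks failed: {failed}")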

Automation, Monitoring & CI/CD

  • Architect and manage pipeline orchestration using Apache Airflow, AWS Step Functions, or Glue Workflows (see the Airflow sketch below)
  • Implement CI/CD pipelines for Spark and data workflows using Git and DevOps tools
  • Establish monitoring and alerting for:
    • Spark job performance
    • Data freshness, volume, and anomalies
  • Evaluate and adopt emerging technologies such as Iceberg, Delta Lake, dbt, and Data Mesh
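
Orchestration of the Spark and quality-check jobs above might look like the following minimal Airflow DAG (Airflow 2.4+ assumed); the DAG id, schedule, and script locations are illustrative.

    # Minimal Airflow DAG sketch chaining a Spark batch job and a data-quality check.
    # DAG id, schedule, and script paths are illustrative assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="events_daily_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="spark_ingest",
            bash_command=(
                "spark-submit --deploy-mode cluster "
                "s3://example-artifacts/jobs/ingest_events.py --dt {{ ds }}"
            ),
        )
        quality_check = BashOperator(
            task_id="dq_check",
            bash_command=(
                "spark-submit --deploy-mode cluster "
                "s3://example-artifacts/jobs/dq_check.py --dt {{ ds }}"
            ),
        )
        # Block the quality check until ingestion succeeds.
        ingest >> quality_check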

Collaboration & Technical Leadership

  • Translate business and product requirements into scalable data platform designs
  • Partner with data scientists and analysts to enable self-service analytics

  • Lead design and code reviews; mentor junior data engineers
  • Maintain high-quality documentation for architecture, pipelines, and lineage

Required Skills & Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
  • 8–10+ years of hands-on experience in data engineering
  • Expert-level experience with Apache Spark (PySpark or Scala) is mandatory
  • Proven hands-on expertise in Spark performance tuning and optimization
  • Deep knowledge of SQL and Spark SQL, including query optimization and execution plans
  • Strong experience with Snowflake, including schema design, performance tuning, and cost optimization
  • Extensive experience building data platforms on AWS (S3, EMR, Glue, Kinesis, Redshift, Step Functions, IAM)
  • Strong understanding of ETL/ELT patterns and tools such as dbt, Glue, Informatica, or Talend
  • Experience with streaming frameworks (Kafka, Kinesis, Flink)
  • Strong programming skills in Python or Scala
  • Experience with CI/CD, Git, and automation for data pipelines
  • Solid understanding of data governance, quality frameworks, and security best practices

Nice-to-Have / Bonus Skills:

  • Experience with Iceberg, Delta Lake, or Hudi

  • Spark Structured Streaming and real-time analytics
  • Data observability tools (Monte Carlo, Bigeye)
  • Metadata management platforms (Alation, Collibra)
  • Exposure to machine learning data pipelines or feature stores
