Senior Data Engineer

4 - 6 years

7 - 11 Lacs

Posted: 1 day ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

We are seeking a skilled and motivated Data Engineer to join our data team, with a strong focus on building and managing the data ingestion layer of our Databricks Lakehouse Platform. You will be responsible for creating reliable, scalable, and automated pipelines that pull data from a wide variety of sources, including third-party APIs, streaming platforms, relational databases, and file-based systems such as Google Analytics 4 (GA4), ensuring it lands accurately and efficiently in our Bronze layer.
This role requires hands-on expertise in Python (PySpark), SQL, and modern ingestion tools like Databricks Auto Loader and Structured Streaming; a representative API-to-Bronze pattern is sketched below. You will be the expert on connecting to new data sources, ensuring our data lakehouse has the raw data it needs to power analytics and business insights across the organization.
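
By way of illustration, a minimal sketch of that API-to-Bronze pattern, assuming a hypothetical page-paginated JSON endpoint and a Databricks notebook context where the spark and dbutils globals are available; the endpoint, secret scope, and table name are placeholders:

    # Minimal API-to-Bronze sketch. Endpoint, secret scope, and table name
    # are illustrative placeholders, not a real integration.
    import requests

    def fetch_all_pages(base_url, token):
        # Follow a simple page-number pagination scheme until the API
        # returns an empty batch (assumed behaviour of this mock endpoint).
        records, page = [], 1
        while True:
            resp = requests.get(
                base_url,
                params={"page": page},
                headers={"Authorization": f"Bearer {token}"},
                timeout=30,
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            records.extend(batch)
            page += 1
        return records

    # In a Databricks notebook, spark and dbutils are provided globals.
    # Land the payload as-is in Bronze; reshaping belongs in the Silver layer.
    rows = fetch_all_pages("https://api.example.com/v1/orders",
                           dbutils.secrets.get("ingestion", "api-token"))
    spark.createDataFrame(rows).write.format("delta") \
        .mode("append").saveAsTable("bronze.raw_orders")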

What you'll be doing:

  • Design, build, and maintain robust data ingestion pipelines to collect data from diverse sources such as APIs, streaming sources (e.g., Kafka, Event Hubs), relational databases (via JDBC), and cloud storage.
  • Heavily utilize Databricks Auto Loader and COPY INTO for the efficient, incremental, and scalable ingestion of files into Delta Lake (see the Auto Loader sketch after this list).
  • Develop and manage Databricks Structured Streaming jobs to process near-real-time data feeds.
  • Ensure the reliability, integrity, and freshness of the Bronze layer in our Medallion Architecture, which serves as the single source of truth for all raw data.
  • Perform initial data cleansing, validation, and structuring to prepare data for further transformation in the Silver layer.
  • Monitor, troubleshoot, and optimize ingestion pipelines for performance, cost, and stability.
  • Develop Python scripts and applications to automate data extraction and integration processes.
  • Work closely with platform architects and other data engineers to implement best practices for data ingestion and management.
  • Document data sources, ingestion patterns, and pipeline configurations.
  • Follow agile development practices, including version control (Git), CI/CD, and automated testing.
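
The Auto Loader sketch referenced above: a minimal example, assuming a Databricks notebook and illustrative ADLS Gen2 paths and table names. Auto Loader is the cloudFiles source for Structured Streaming; it discovers newly arrived files incrementally and records progress in the checkpoint, so each run picks up only new data.

    # Incremental file ingestion with Auto Loader (illustrative paths/names).
    bronze = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation",
                "abfss://lake@account.dfs.core.windows.net/_schemas/ga4")
        .load("abfss://lake@account.dfs.core.windows.net/landing/ga4/")
    )

    (bronze.writeStream
        .option("checkpointLocation",
                "abfss://lake@account.dfs.core.windows.net/_checkpoints/ga4_bronze")
        .trigger(availableNow=True)  # drain all new files, then stop
        .toTable("bronze.ga4_events"))

    # COPY INTO is the batch-SQL alternative for the same landing pattern:
    spark.sql("""
        COPY INTO bronze.ga4_events
        FROM 'abfss://lake@account.dfs.core.windows.net/landing/ga4/'
        FILEFORMAT = JSON
    """)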

What you'll need:

  • Education: Minimum of a Bachelor's degree in Computer Science, Engineering, Mathematics, or a related technical field is preferred.
  • Experience: 4-6+ years of relevant experience in data engineering, with a strong focus on data ingestion and integration.
  • Engineer's Core Skills:

    • Databricks Platform Expertise:
      • Data Ingestion Mastery: Deep, practical experience with Databricks Auto Loader, COPY INTO, and Structured Streaming (a Kafka streaming sketch follows the Tools & Technologies list below).
      • Apache Spark: Strong hands-on experience with Spark architecture, writing and optimizing PySpark and Spark SQL jobs for ingestion and basic transformation.
      • Delta Lake: Solid understanding of Delta Lake for creating reliable landing zones for raw data, proficiency in writing data to Delta tables, and familiarity with core concepts such as ACID transactions and schema enforcement.
    • Core Engineering & Cloud Skills:
      • Programming: 4+ years of strong, hands-on experience in Python, with an emphasis on PySpark and libraries for API interaction (e.g., requests).
      • SQL: 4+ years of strong SQL experience for data validation and querying.
      • Cloud Platforms: 3+ years working with a major cloud provider (Azure, AWS, or GCP), with specific knowledge of cloud storage (ADLS Gen2, S3), security, and messaging/streaming services.
      • Diverse Data Sources: Proven experience ingesting data from a variety of sources (e.g., REST APIs, SFTP, relational databases, message queues).
      • CI/CD & DevOps: Experience with version control (Git) and CI/CD pipelines (e.g., GitHub Actions, Azure DevOps) for automating deployments.
      • Data Modeling: Familiarity with data modeling concepts (e.g., star schema) to understand the downstream use of the data you are ingesting.
  • Tools & Technologies:

    • Primary Data Platform: Databricks
    • Cloud Platforms: Azure (Preferred), GCP, AWS
    • Data Warehouses (Integration): Snowflake, Google BigQuery
    • Orchestration: Databricks Workflows
    • Version Control: Git/GitHub or similar repositories
    • Infrastructure as Code (Bonus): Terraform
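
The streaming sketch referenced under Data Ingestion Mastery: a minimal Structured Streaming job reading a Kafka topic into Bronze, assuming placeholder broker, topic, checkpoint, and table names. The payload is kept as a raw string; parsing and validation belong in the Silver layer.

    # Near-real-time Kafka-to-Bronze sketch (illustrative broker/topic/table).
    from pyspark.sql.functions import col

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "orders")
        .option("startingOffsets", "earliest")
        .load()
        .select(
            col("key").cast("string"),
            col("value").cast("string").alias("payload"),  # raw message text
            "topic", "partition", "offset", "timestamp",
        )
    )

    (events.writeStream
        .option("checkpointLocation", "/checkpoints/orders_bronze")
        .toTable("bronze.kafka_orders"))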

WPP

Marketing and Advertising

London
