Databricks Engineer

15 - 20 years

15 - 20 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Hybrid

Job Type

Full Time

Job Description

Urgent need for a Databricks Engineer with 5-10 years of experience for the Chennai and Pune locations. We need candidates who can join at short notice.

1. Overview

We are building a scalable data ingestion and streaming platform that ingests change data capture (CDC) events from diverse source systems (databases and applications), processes them in real time and lands curated data in our analytics lake. The platform uses Confluent connectors (Debezium/Oracle CDC) to emit Parquet files into cloud storage and uses Databricks Auto Loader to incrementally ingest, deduplicate and write this data into Delta Lake Bronze tables. To keep the role broadly applicable, this job description emphasizes general streaming and data engineering skills while highlighting the core technologies used in our solution.

2. Key Responsibilities

  • Design and develop streaming ingestion pipelines. Use Apache Spark (Structured Streaming) and Databricks Auto Loader to consume files from cloud storage or messages from Kafka/RabbitMQ/Confluent Cloud and ingest them into Delta Lake, supporting schema evolution and exactly-once semantics.
  • Implement CDC and deduplication logic. Capture change events from source databases using Debezium, the built-in CDC features of SQL Server/Oracle or other connectors. Apply watermarking and drop-duplicates strategies based on primary keys and event timestamps.
  • Ensure data quality and fault tolerance. Configure checkpointing, error handling and dead-letter queues (DLQ) so that malformed or late data can be quarantined and replayed. Optimise file sizes, partitioning and clustering to maintain performance.
  • Scale ingestion through configuration. Build a config-driven framework (e.g., using Airflow, DBX Jobs or Delta Live Tables) that iterates over metadata tables to deploy/update ingestion pipelines for hundreds of tables/sources without code duplication.
  • Collaborate on architecture and orchestration. Contribute to the overall data platform architecture (integrating data sources, message queues, processing engines and storage) and define orchestration patterns for backfill, replay and streaming jobs.
  • Implement monitoring, observability and security. Capture streaming query metrics and publish them to monitoring platforms (Prometheus, Grafana). Set up dashboards for lag, files processed and processing duration. Enforce role-based access control, encryption and data masking.
  • Work with data consumers. Partner with analytics teams, data scientists and downstream application developers to ensure that ingested data meets their requirements. Provide documentation, metadata and lineage for all tables.
  • Participate in DevOps processes. Use CI/CD pipelines (e.g., Jenkins, GitHub Actions) to automate deployment of jobs; manage infrastructure with Terraform or similar tools; follow best practices for version control and code reviews.
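
For candidates who want a concrete picture of the deduplication responsibility above, here is a minimal plain-Python sketch of "watermark plus latest event per primary key". It is an illustration only, not the platform's actual code: the `ChangeEvent` type, the `deduplicate` function and the `allowed_lateness` parameter are hypothetical names. In Spark Structured Streaming the comparable building blocks are `withWatermark` and `dropDuplicates`, whose exact semantics differ from this simplified version.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC record: one change to one source row."""
    primary_key: str
    event_time: datetime
    payload: dict


def deduplicate(
    events: Iterable[ChangeEvent], allowed_lateness: timedelta
) -> Tuple[List[ChangeEvent], List[ChangeEvent]]:
    """Return (latest event per key, events quarantined as too late).

    The watermark is the largest event time seen so far minus
    allowed_lateness; anything older than the watermark is set aside
    (for example, for a dead-letter queue) instead of being merged.
    """
    latest: Dict[str, ChangeEvent] = {}
    late: List[ChangeEvent] = []
    max_seen: Optional[datetime] = None

    for event in events:
        if max_seen is None or event.event_time > max_seen:
            max_seen = event.event_time
        watermark = max_seen - allowed_lateness

        if event.event_time < watermark:
            late.append(event)  # too old: quarantine for later replay
            continue

        current = latest.get(event.primary_key)
        if current is None or event.event_time > current.event_time:
            latest[event.primary_key] = event  # keep the newest change

    return list(latest.values()), late
```

Keeping the late events in a separate list, rather than silently dropping them, mirrors the quarantine-and-replay requirement in the data quality bullet.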

3. Required Skills and Experience

  • 5–8 years of experience designing and building data pipelines using Apache Spark, Databricks or equivalent big data frameworks.
  • Hands-on expertise with streaming and messaging systems such as Apache Kafka (publish-subscribe architecture), Confluent Cloud, RabbitMQ or Azure Event Hub. Experience creating producers, consumers and topics and integrating them into downstream processing.
  • Deep understanding of relational databases and CDC. Proficiency in SQL Server, Oracle or other RDBMSs; experience capturing change events using Debezium or native CDC tools and transforming them for downstream consumption.
  • Proficiency in programming languages such as Python, Scala or Java and solid knowledge of SQL for data manipulation and transformation.
  • Cloud platform expertise. Experience with Azure or AWS services for data storage, compute and orchestration (e.g., ADLS, S3, Azure Data Factory, AWS Glue, Airflow, DBX, DLT).
  • Data modelling and warehousing. Knowledge of data lakehouse architectures, Delta Lake, partitioning strategies and performance optimisation.
  • Version control and DevOps. Familiarity with Git and CI/CD pipelines; ability to automate deployment and manage infrastructure as code.
  • Strong problem-solving and communication skills. Ability to work with cross-functional teams and articulate complex technical concepts to non-technical stakeholders.

4. Preferred Skills

  • Experience with event-driven architectures and microservices integration.
  • Exposure to NiFi, Flume or other ingestion frameworks for connecting heterogeneous sources.
  • Knowledge of graph processing or machine learning pipelines on Spark.

Optimum Solutions

Information Technology

Tech City
