Data Engineer
Designation: Data Engineer
Experience: 6-8 Years
Location: Mumbai (onsite)
Job Summary
We are seeking a highly skilled Data Engineer with deep expertise in Apache Kafka integration with Databricks, Structured Streaming, and large-scale data pipeline design using the Medallion Architecture. The ideal candidate will have strong hands-on experience building and optimizing real-time and batch pipelines, and will be expected to solve real coding problems during the interview.
Job Description
Key Responsibilities
Design, develop, and maintain real-time and batch data pipelines in Databricks
Integrate Apache Kafka with Databricks using Structured Streaming
Implement robust data ingestion frameworks using Databricks Autoloader
Build and maintain Medallion Architecture pipelines across Bronze, Silver, and Gold layers
Implement checkpointing, output modes, and appropriate processing modes in structured streaming jobs
Design and implement Change Data Capture (CDC) workflows and Slowly Changing Dimensions (SCD) Type 1 and Type 2 logic
Develop reusable components for merge/upsert operations and window function-based transformations (a minimal sketch follows this list)
Handle large volumes of data efficiently through proper partitioning, caching, and cluster tuning techniques
Collaborate with cross-functional teams to ensure data availability, reliability, and consistency
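For illustration only, here is a minimal sketch of the merge/upsert pattern referenced above, assuming a hypothetical Delta target table silver.customers and a CDC feed DataFrame named updates (all names are placeholders, not part of this posting):

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# `spark` is the active SparkSession; `updates` is a hypothetical CDC feed.
# Window-function dedup: keep only the latest change per business key.
w = Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())
latest = (updates
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))

# SCD Type 1 upsert: overwrite matched rows, insert new ones.
target = DeltaTable.forName(spark, "silver.customers")
(target.alias("t")
 .merge(latest.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```

SCD Type 2 would instead close out the current row (end date / current flag) and insert a new version rather than updating in place.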
Must Have Skills
Apache Kafka
Integration, topic management, schema registry (Avro/JSON)
Databricks & Spark Structured Streaming
Output modes: Append, Update, Complete
Sinks: Memory, Console, File, Kafka, Delta
Checkpointing and fault tolerance
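A minimal sketch of the Kafka-to-Databricks integration these skills describe, with hypothetical broker, topic, and path names:

```python
from pyspark.sql import functions as F

# Hypothetical broker, topic, and storage paths.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .option("startingOffsets", "earliest")
       .load())

# Kafka delivers key/value as bytes; cast them before landing in Bronze.
bronze = raw.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("value"),
    "topic", "partition", "offset", "timestamp")

query = (bronze.writeStream
         .format("delta")
         .outputMode("append")                        # emit only newly arrived rows
         .option("checkpointLocation", "/chk/orders") # offsets + state for fault tolerance
         .start("/bronze/orders"))
```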
Databricks Autoloader
Schema inference, schema evolution, incremental loads
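A sketch of an Auto Loader ingest covering those requirements (paths are hypothetical):

```python
# Auto Loader infers the schema on first run and tracks it at schemaLocation;
# addNewColumns lets the stream evolve when new fields appear in the source.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/schemas/orders")
          .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
          .load("/landing/orders"))

(stream.writeStream
 .format("delta")
 .option("checkpointLocation", "/chk/autoloader_orders")
 .option("mergeSchema", "true")   # let the Delta sink accept evolved columns
 .trigger(availableNow=True)      # process available files incrementally, then stop
 .start("/bronze/orders_autoload"))
```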
Medallion Architecture
Full implementation expertise across Bronze, Silver, and Gold layers
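In rough outline, the layer-to-layer flow looks like this (table names hypothetical):

```python
from pyspark.sql import functions as F

# Bronze holds raw ingested events; Silver cleans and conforms; Gold aggregates.
bronze = spark.read.table("bronze.orders")

silver = (bronze
          .filter(F.col("order_id").isNotNull())                  # basic quality gate
          .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
          .dropDuplicates(["order_id"]))
silver.write.mode("overwrite").saveAsTable("silver.orders")

gold = (silver
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("lifetime_value"),
             F.count("order_id").alias("order_count")))
gold.write.mode("overwrite").saveAsTable("gold.customer_ltv")
```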
Performance Optimization
Data partitioning strategies
Caching and persistence
Adaptive query execution and cluster configuration tuning
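Illustrative (not prescriptive) examples of those levers in PySpark:

```python
# Adaptive query execution: lets Spark re-plan joins and coalesce shuffle
# partitions at runtime (enabled by default on recent Databricks runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")

df = spark.read.table("silver.orders")   # hypothetical table

# Partition large writes by a low-cardinality column that queries filter on.
(df.write
 .partitionBy("order_date")
 .mode("overwrite")
 .saveAsTable("gold.orders_by_date"))

# Cache a DataFrame reused by several downstream steps, then release it.
hot = df.filter("order_date >= '2024-01-01'").cache()
hot.count()      # action that materializes the cache
hot.unpersist()  # free executor memory when done
```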
SQL & Spark SQL
Proficiency in writing efficient queries and transformations
Data Governance
Schema enforcement, data quality checks, and monitoring
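One way schema enforcement and quality checks show up in Delta, sketched against a hypothetical table:

```python
# Delta enforces the declared table schema on write; CHECK constraints
# additionally reject rows that violate a predicate at ingestion time.
spark.sql("""
    ALTER TABLE silver.orders
    ADD CONSTRAINT non_negative_amount CHECK (amount >= 0)
""")

# A lightweight quality probe that could feed monitoring or alerting.
null_keys = spark.table("silver.orders").filter("order_id IS NULL").count()
if null_keys > 0:
    raise ValueError(f"quality check failed: {null_keys} rows with null order_id")
```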
Good to Have
Strong coding skills in Python and PySpark
Experience working in CI/CD environments for data pipelines
Exposure to cloud platforms (AWS/Azure/GCP)
Understanding of Delta Lake, time travel, and data versioning (illustrated after this list)
Familiarity with orchestration tools like Airflow or Azure Data Factory
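A quick illustration of Delta time travel, using a hypothetical table:

```python
# Query earlier snapshots of a Delta table by version number or timestamp.
v0  = spark.sql("SELECT * FROM silver.orders VERSION AS OF 0")
old = spark.sql("SELECT * FROM silver.orders TIMESTAMP AS OF '2024-01-01'")
```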
Job Types: Full-time, Permanent
Pay: ₹1,500,000.00 - ₹3,000,000.00 per year
Work Location: In person