Posted: 1 day ago | On-site | Full Time
Dear Applicants,
Please find below the job description for the Senior Data Engineer role at Celebal Technologies. Let us know your level of comfort with the role, along with the most recent copy of your resume. You can apply directly or share your details at shailendra.sharma@celebaltech.com.
We are seeking a highly skilled Data Engineer with deep expertise in integrating Apache Kafka with Databricks, Structured Streaming, and large-scale data pipeline design using the Medallion Architecture. The ideal candidate will have strong hands-on experience building and optimizing real-time and batch pipelines, and will be expected to solve real coding problems during the interview.
Responsibilities:
• Design, develop, and maintain real-time and batch data pipelines in Databricks.
• Integrate Apache Kafka with Databricks using Structured Streaming.
• Implement robust data ingestion frameworks using Databricks Autoloader.
• Build and maintain Medallion Architecture pipelines across the Bronze, Silver, and Gold layers.
• Implement checkpointing, output modes, and appropriate processing modes in Structured Streaming jobs.
• Design and implement Change Data Capture (CDC) workflows and Slowly Changing Dimension (SCD) Type 1 and Type 2 logic.
• Develop reusable components for merge/upsert operations and window-function-based transformations.
• Handle large volumes of data efficiently through proper partitioning, caching, and cluster tuning techniques.
• Collaborate with cross-functional teams to ensure data availability, reliability, and consistency.
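Several of the responsibilities above (Kafka integration, Structured Streaming, checkpointing, the Bronze layer) come together in a single streaming ingest job. The sketch below is a minimal, hedged illustration rather than production code: the function and parameter names are hypothetical, and it assumes an existing Spark session on a cluster with Kafka connectivity and Delta Lake support.

```python
def start_bronze_ingest(spark, bootstrap_servers, topic, bronze_path, checkpoint_path):
    """Read a Kafka topic as a stream and append raw records to a Bronze Delta table.

    All names here are illustrative; pass in a live SparkSession from the cluster.
    """
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")  # replay the topic on first run
        .load()
    )
    # Kafka delivers key/value as binary; cast to string and keep source
    # metadata (topic/partition/offset) so the Bronze layer stays replayable.
    bronze = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic", "partition", "offset", "timestamp",
    )
    return (
        bronze.writeStream.format("delta")
        .outputMode("append")                           # append-only raw layer
        .option("checkpointLocation", checkpoint_path)  # fault tolerance
        .start(bronze_path)
    )
```

The append output mode plus the checkpoint location are what make the job restartable: on failure, Spark resumes from the last committed Kafka offsets recorded in the checkpoint rather than re-ingesting from scratch.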
Must Have:
• Apache Kafka: integration, topic management, schema registry (Avro/JSON).
• Databricks & Spark Structured Streaming:
  Output modes: Append, Update, Complete
  Output sinks: Memory, Console, File, Kafka, Delta
  Checkpointing and fault tolerance
• Databricks Autoloader: schema inference, schema evolution, incremental loads.
• Medallion Architecture implementation expertise.
• Performance optimization:
  Data partitioning strategies
  Caching and persistence
  Adaptive query execution and cluster configuration tuning
• SQL & Spark SQL: proficiency in writing efficient queries and transformations.
• Data governance: schema enforcement, data quality checks, and monitoring.
• Strong coding skills in Python and PySpark.
• Experience working in CI/CD environments for data pipelines.
• Exposure to cloud platforms (AWS/Azure/GCP).
• Understanding of Delta Lake, time travel, and data versioning.
• Familiarity with orchestration tools such as Airflow or Azure Data Factory.
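As a concrete example of the CDC, merge/upsert, and SCD Type 2 skills listed above, the helper below builds one pass of a Delta Lake MERGE: it closes out the current version of any changed row and inserts rows for brand-new keys. This is a sketch under stated assumptions, not a complete implementation: the table, column, and function names are illustrative, and a full SCD Type 2 flow would insert the new version of each changed key in a second step (commonly via the staged-union pattern).

```python
def scd2_merge_sql(target, source, key_col, tracked_cols):
    """Build a Delta MERGE for one pass of an SCD Type 2 load.

    Closes the current version of changed rows and inserts brand-new keys.
    Assumes the target carries is_current/start_date/end_date housekeeping
    columns; all names here are illustrative.
    """
    change_cond = " OR ".join(f"t.{c} <> s.{c}" for c in tracked_cols)
    insert_cols = ", ".join(tracked_cols)
    insert_vals = ", ".join(f"s.{c}" for c in tracked_cols)
    return f"""
MERGE INTO {target} t
USING {source} s
ON t.{key_col} = s.{key_col} AND t.is_current = true
WHEN MATCHED AND ({change_cond}) THEN
  UPDATE SET is_current = false, end_date = current_date()
WHEN NOT MATCHED THEN
  INSERT ({key_col}, {insert_cols}, is_current, start_date, end_date)
  VALUES (s.{key_col}, {insert_vals}, true, current_date(), null)
"""

# Hypothetical usage on a cluster:
# spark.sql(scd2_merge_sql("gold.dim_customer", "silver_updates",
#                          "customer_id", ["email", "address"]))
```

Generating the statement from the key and tracked columns keeps the merge logic reusable across dimensions, which is the kind of reusable component the responsibilities above call for.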