Key Responsibilities

• Build and optimize ETL/ELT pipelines using Databricks and ADF, ingesting data from diverse sources including APIs, flat files, and operational databases.
• Develop and maintain scalable PySpark jobs for batch and incremental data processing across Bronze, Silver, and Gold layers.
• Write clean, production-ready Python code for data processing, orchestration, and integration tasks.
• Contribute to the medallion architecture design and help implement data governance patterns across data layers.
• Collaborate with analytics, data science, and business teams to design pipelines that meet performance and data quality expectations.
• Monitor, troubleshoot, and continuously improve pipeline performance and reliability.
• Support CI/CD for data workflows using Git, Databricks Repos, and optionally Terraform for infrastructure-as-code.
• Document pipeline logic, data sources, schema transformations, and operational playbooks.

⸻

Required Qualifications

• 3–5 years of experience in data engineering roles with increasing scope and complexity.
• Strong hands-on experience with Databricks, including Spark, Delta Lake, and SQL-based transformations.
• Proficiency in PySpark and Python for large-scale data manipulation and pipeline development.
• Hands-on experience with Azure Data Factory for orchestrating data workflows and integrating with Azure services.
• Solid understanding of data modeling concepts and modern warehousing principles (e.g., star schema, slowly changing dimensions).
• Comfortable with Git-based development workflows and collaborative coding practices.

⸻

Preferred / Bonus Qualifications

• Experience with Terraform to manage infrastructure such as Databricks workspaces, ADF pipelines, or storage resources.
• Familiarity with Unity Catalog, Databricks Asset Bundles (DAB), or Delta Live Tables (DLT).
• Experience with Azure DevOps or GitHub Actions for CI/CD in a data environment.
• Knowledge of data governance, role-based access control, or data quality frameworks.
• Exposure to real-time ingestion using tools like Event Hubs, Azure Functions, or Auto Loader.