About the Role
We are seeking a Senior Databricks Architect to lead the design and implementation of scalable, secure, and cost-optimized data platforms on Azure using Databricks. You will own end-to-end architecture for data ingestion, transformation, governance, and analytics, and mentor engineering teams to deliver high-quality data solutions.
Key Responsibilities
- Architect and implement enterprise-grade data platforms on Azure Databricks (DLT, Workflows, Unity Catalog) with strong security, governance, and reliability.
- Design end-to-end data pipelines: batch and streaming (Structured Streaming), CDC, and near-real-time analytics (a minimal ingestion sketch follows this list).
- Define and enforce data modeling standards (3NF, dimensional modeling with star/snowflake schemas, and data vault where applicable).
- Optimize Databricks workloads (cluster sizing, autoscaling, Delta Lake file management, Z-Ordering, OPTIMIZE/VACUUM; see the maintenance sketch after this list) for performance and cost.
- Establish best practices for code (PySpark/Scala/SQL), notebooks, Repos, CI/CD, testing, observability, and documentation.
- Integrate with the broader Azure ecosystem: ADLS Gen2, ADF/Synapse pipelines, Azure SQL/MI, Azure Key Vault, Event Hubs/Kafka, and Azure Functions/Logic Apps.
- Drive data governance, lineage, and access control with Unity Catalog, data masking, and RBAC/ABAC (a grants sketch follows this list); align with organizational compliance requirements (GDPR/DPDP, SOC 2).
- Collaborate with security/network teams on private networking (VNet injection), peering, firewall rules, Private Link, NSGs, VPN/ExpressRoute.
- Lead capacity planning, SLAs/SLOs, incident response, and cost management (Photon, serverless, job vs. all-purpose clusters).
- Partner with stakeholders (data science, BI, product) to translate business requirements into scalable data solutions.
- Mentor and upskill data engineers; perform code/design reviews; establish architectural guardrails and reference architectures.
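To calibrate the hands-on depth expected, here is a minimal sketch of the kind of streaming ingestion this role designs: Auto Loader feeding a bronze Delta table. All paths, storage accounts, and table names are hypothetical placeholders, and the snippet assumes a Databricks notebook where `spark` is predefined.

```python
# Illustrative Auto Loader ingest into a bronze Delta table.
# Paths, storage account, and table names are hypothetical placeholders.
from pyspark.sql import functions as F

raw = (
    spark.readStream.format("cloudFiles")                  # Auto Loader source
    .option("cloudFiles.format", "json")                   # incoming file format
    .option("cloudFiles.schemaLocation",
            "abfss://lake@myacct.dfs.core.windows.net/_schemas/orders")
    .load("abfss://lake@myacct.dfs.core.windows.net/landing/orders")
)

(
    raw.withColumn("_ingested_at", F.current_timestamp())  # audit column
    .writeStream
    .option("checkpointLocation",
            "abfss://lake@myacct.dfs.core.windows.net/_checkpoints/orders_bronze")
    .trigger(availableNow=True)                            # drain available files, then stop
    .toTable("main.bronze.orders")                         # Unity Catalog three-level name
)
```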
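On the optimization point above, routine Delta maintenance is largely SQL; a representative pair of commands, with placeholder table and column names:

```python
# Illustrative Delta maintenance (table and column names are placeholders).
# OPTIMIZE compacts small files; ZORDER BY co-locates rows on a common filter column;
# VACUUM deletes unreferenced files, here keeping the default 7-day (168-hour) retention.
spark.sql("OPTIMIZE main.silver.orders ZORDER BY (customer_id)")
spark.sql("VACUUM main.silver.orders RETAIN 168 HOURS")
```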
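Likewise for governance: Unity Catalog access control is expressed as SQL grants on securable objects. A placeholder example granting a group read access to one table:

```python
# Illustrative Unity Catalog grants (catalog, schema, table, and group are placeholders).
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.silver TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.silver.orders TO `data_analysts`")
```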
Required Qualifications
- 8+ years of professional experience in data engineering/architecture, including 3+ years of hands-on work with Databricks on Azure.
- Strong data modeling expertise: relational 3NF, dimensional (Kimball), and practical experience with Delta Lake/medallion architecture.
- Advanced programming skills: PySpark and/or Scala; solid grasp of OOP principles and design patterns in data engineering contexts.
- Expert SQL skills (analytic/window functions, performance tuning, query optimization).
- Deep experience with Azure services: ADLS Gen2, ADF/Synapse, Azure SQL/MI, Event Hubs; identity and security (AAD, service principals, managed identities, Key Vault).
- Hands-on with Databricks features: Delta Live Tables (a minimal sketch follows this list), Unity Catalog, Workflows, Repos, CI/CD, cluster policies, and the Jobs API.
- Networking fundamentals for cloud data platforms: VNet design, subnets, routing, Private Link/Endpoints, peering, DNS, firewalling.
- Database fundamentals: relational and NoSQL (e.g., PostgreSQL, SQL Server, Cosmos DB), indexing, partitioning, transactions.
- Proven track record of building robust ELT/ETL pipelines, CDC (e.g., with Auto Loader or Kafka), and streaming architectures.
- Experience with DevOps practices: Git, CI/CD (Azure DevOps/GitHub Actions), and Infrastructure-as-Code (Terraform/ARM/Bicep preferred).
- Strong communication, stakeholder management, and leadership/mentoring experience.
- Bachelor's degree in Computer Science/Engineering or a related field from a reputed Indian college/institute.
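For reference on the Delta Live Tables item above, a minimal sketch of a DLT dataset with a quality expectation; dataset names are placeholders, and the code runs inside a DLT pipeline rather than an interactive notebook:

```python
# Illustrative Delta Live Tables dataset (names are placeholders; runs in a DLT pipeline).
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleansed orders with a basic quality gate.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows failing the check
def orders_clean():
    return (
        dlt.read_stream("orders_bronze")                       # upstream DLT dataset
        .withColumn("order_ts", F.to_timestamp("order_ts"))    # normalize timestamp type
    )
```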
Preferred/Good-to-Have
- Databricks certifications (Databricks Certified Data Engineer Professional, Machine Learning Professional, or Architect Associate).
- Experience with BI tools and semantic layers (Power BI, dbt, Tabular models).
- Exposure to MLOps, feature stores (e.g., Databricks Feature Store), and model serving.
- Cost governance experience: chargeback/showback, cluster policy enforcement, utilization dashboards.
- Knowledge of data privacy and compliance frameworks (DPDP Act, GDPR, HIPAA as applicable).