Job Description
As a Solution Architect with deep expertise in the Databricks Lakehouse and proven experience operationalizing unstructured document AI pipelines in regulated industries, you will design and lead end-to-end architectures that transform complex, high-volume documents into governed, queryable knowledge graphs inside Databricks. This work enables automation, compliance, and downstream AI applications, giving Databricks customers reliable and trustworthy solutions.

Responsibilities:
- Lead the design and implementation of in-Lakehouse pipelines for structured and unstructured data using Delta Live Tables, Unity Catalog, and MLflow.
- Architect solutions for ingestion, OCR, and LLM-based parsing of scanned PDFs, legal/medical records, and complex forms.
- Design confidence-tiered pipelines and translate extracted entities and relationships into graph-friendly Delta Gold tables.
- Define security, lineage, and classification rules, and integrate Databricks outputs with downstream applications.
- Collaborate closely with data engineers, ML engineers, and domain SMEs to translate business requirements into scalable architectures.
- Mentor teams on Databricks and document AI best practices to ensure successful project execution.

Required qualifications:
- 5+ years of experience in solution or data architecture, including 2+ years delivering Databricks-based solutions.
- Hands-on experience with Unity Catalog, Delta Live Tables, Databricks SQL, and Partner Connect integrations.
- Expertise in designing Lakehouse architectures for structured and unstructured data.
- Understanding of OCR integration patterns, plus experience with confidence scoring and human-in-the-loop patterns.
- Familiarity with knowledge graph concepts, strong SQL skills, and knowledge of cloud data ecosystems.

Preferred qualifications:
- Databricks certifications.
- Experience delivering document AI pipelines in regulated verticals.
- Familiarity with data mesh or federated governance models.
- Background in MLOps and continuous improvement for extraction models.

If you are passionate about leveraging technology to drive digital transformation and eager to contribute to a diverse and empathetic work environment, we welcome your application for this exciting opportunity in Pune, India.