4.0 - 7.0 years
4.0 - 7.0 Lacs P.A.
Navi Mumbai, Maharashtra, India
Posted: 1 week ago
On-site
Full Time
We are seeking a skilled Databricks Architect to design, implement, and optimize scalable data solutions within our cloud-based data platform. This role requires extensive knowledge of Databricks (Azure/AWS), data engineering, and a deep understanding of data architecture principles, with the ability to drive strategy, best practices, and hands-on implementation for high-performance data processing and analytics solutions.

Responsibilities:
- Solution Architecture: Design and architect end-to-end data solutions using Databricks and Azure/AWS, including data ingestion, processing, and storage.
- Delta Lake Implementation: Leverage Delta Lake and Lakehouse architecture to create robust, unified data structures that support advanced analytics and machine learning.
- Data Processing Development: Design, develop, and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive business growth and enhance the product experience.
- Performance Tuning: Ensure optimal performance of data pipelines and workloads by implementing best practices for resource management, auto-scaling, and query optimization in Databricks.
- Engineering Best Practices: Advocate for high-quality software engineering practices in building scalable data infrastructure and pipelines.
- Architecture/Solution Development: Develop architectures and solutions for large data projects using Databricks.
- Project Leadership: Lead data engineering projects to ensure pipelines are reliable, efficient, testable, and maintainable.
- Data Modeling: Design data models optimized for storage, retrieval, and critical product and business requirements.
- Logging Architecture: Understand and influence logging to support data flow, implementing logging best practices as needed.
- Standardization and Tooling: Contribute to shared data engineering tools and standards to boost productivity and quality for data engineers across the company.
- Collaboration: Work closely with leadership, engineers, program managers, and data scientists to understand and meet data needs.
- Partner Education: Use data engineering expertise to identify gaps and improve existing logging and processes for partners.
- Data Governance: Collaborate with stakeholders to build data lineage, data governance, and data cataloging using Unity Catalog.
- Agile Project Management: Lead projects using agile methodologies.
- Communication: Communicate effectively with stakeholders at all organizational levels.
- Team Development: Recruit, retain, and develop team members, preparing them for increased responsibilities and challenges.

Requirements:
- Experience: 10+ years of relevant industry experience.
- ETL Expertise: Skilled in custom ETL design, implementation, and maintenance.
- Data Modeling: Experience in developing and designing data models for reporting systems.
- Databricks Proficiency: Hands-on experience with Databricks SQL workloads.
- Data Ingestion: Expertise in ingesting data from offline files (e.g., CSV, TXT, JSON) as well as APIs, databases, and CDC sources; should have handled such projects in the past.
- Pipeline Observability: Skilled in setting up robust observability for complete pipelines and Databricks on Azure/AWS.
- Database Knowledge: Proficient in relational databases and SQL query authoring.
- Programming and Frameworks: Experience with Java, Scala, Spark, PySpark, Python, and Databricks.
- Cloud Platforms: Cloud experience required (Azure/AWS preferred).
- Data Scale Handling: Experience working with large-scale data.
- Pipeline Design and Operations: Proven experience in designing, building, and operating robust data pipelines.
- Performance Monitoring: Skilled in deploying high-performance pipelines with reliable monitoring and logging.
- Cross-Team Collaboration: Able to work effectively across teams to establish overarching data architecture and provide team guidance.
- ETL Optimization: Ability to optimize ETL pipelines to reduce data transfer and storage costs.
- Auto Scaling: Skilled in using Databricks SQL's auto-scaling feature to adjust worker numbers based on workload.

Tech Stack:
- Cloud Platform: Azure/AWS.
- Databricks on Azure/AWS: Databricks SQL Serverless, Databricks SQL, Databricks workspaces, Databricks notebooks, Databricks job scheduling, Data Catalog.
- Data Architecture: Delta Lake, Lakehouse concepts.
- Data Processing: Spark Structured Streaming (see the illustrative sketch after this listing).
- File Formats: CSV, Avro, Parquet.
- CI/CD: CI/CD for ETL pipelines.
- Governance Model: Databricks' unified governance model (Unity Catalog) across clouds, supporting open formats and APIs.
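For illustration only, below is a minimal PySpark sketch of the kind of pipeline this role covers: incremental JSON ingestion with Databricks Auto Loader into a Unity Catalog Delta table. It assumes execution in a Databricks notebook or job where `spark` is already defined; the catalog, schema, table, and storage paths are hypothetical placeholders, not part of the actual project.

# Incremental ingestion of raw JSON files into a Delta table via Auto Loader.
# Assumes a Databricks notebook/job context where `spark` is predefined.
# Storage paths and the target table name below are hypothetical examples.

raw_path = "abfss://landing@storageacct.dfs.core.windows.net/events/"            # hypothetical ADLS path
checkpoint_path = "abfss://landing@storageacct.dfs.core.windows.net/_chk/events/"

events_stream = (
    spark.readStream
        .format("cloudFiles")                        # Databricks Auto Loader source
        .option("cloudFiles.format", "json")         # CSV, Avro, Parquet, etc. are also supported
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(raw_path)
)

(
    events_stream.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)                  # incremental batch-style run; remove for continuous streaming
        .toTable("main.bronze.events")               # three-level Unity Catalog name (hypothetical)
)

Running the job with trigger(availableNow=True) processes only files that arrived since the last checkpoint, which is one common way to keep batch-style ingestion incremental and cost-efficient.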