- Azure Data Architecture: Define, architect, and implement scalable, secure, and cost-effective data solutions on Azure, utilizing Azure Data Lake Storage (ADLS) Gen2, Azure Data Factory (ADF), and Azure Synapse.
- Databricks Lakehouse Implementation: Architect and optimize the Databricks Lakehouse platform, leveraging Delta Lake for transactional support and implementing robust data ingestion and transformation architectures.
- GenAI Data Strategy: Lead data engineering initiatives for Generative AI projects, including the design and construction of data pipelines for Retrieval-Augmented Generation (RAG), feature engineering for large language model (LLM) fine-tuning, and managing vector databases and embedding workflows in both Databricks and Azure.
- Advanced Data Processing: Develop, manage, and optimize large-scale batch and streaming data pipelines using Databricks notebooks with PySpark and SQL. Implement Databricks Workflows for job orchestration, ensuring robust monitoring, error handling, and alerting.
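A Databricks Workflows job of the kind described above is defined as a DAG of tasks with retries and failure alerting. An illustrative Jobs-API-style JSON spec (job name, notebook paths, and email address are placeholders) might look like:

```json
{
  "name": "nightly_ingest",
  "tasks": [
    {
      "task_key": "bronze_ingest",
      "notebook_task": { "notebook_path": "/Pipelines/bronze_ingest" },
      "max_retries": 2
    },
    {
      "task_key": "silver_transform",
      "depends_on": [ { "task_key": "bronze_ingest" } ],
      "notebook_task": { "notebook_path": "/Pipelines/silver_transform" }
    }
  ],
  "email_notifications": { "on_failure": ["data-eng-oncall@example.com"] }
}
```

The `depends_on` edges give ordering, per-task `max_retries` gives error handling, and the notification block wires alerting into the orchestration itself rather than an external monitor.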
- Data Governance and Security: Champion data governance best practices using Databricks Unity Catalog to manage permissions, enforce data quality, track lineage, and ensure compliance with security and privacy standards for all data assets.
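Unity Catalog permissions are managed with standard SQL `GRANT` statements scoped to the three-level catalog.schema.table namespace. A small sketch (catalog, schema, and group names are illustrative):

```sql
-- Grant a group read access down the object hierarchy (names illustrative)
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`;
GRANT SELECT ON TABLE main.sales.orders TO `analysts`;
```

Lineage and audit records for governed tables are captured by Unity Catalog automatically, so enforcement and traceability come from the same access path.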
- Collaboration and Mentorship: Work closely with AI/ML engineers, data scientists, and business teams to understand data requirements for models and translate these into technical solutions. Provide technical leadership, mentorship, and guidance to the data engineering team.
- Azure Cloud Architecture: Oversee the design, provisioning, and management of Azure cloud resources, including Azure Active Directory (AAD), networking, and security protocols. Manage Azure Databricks workspaces and clusters, monitor performance, troubleshoot issues, and optimize resource utilization. Utilize advanced Azure services such as Azure Functions, Logic Apps, and Synapse Analytics to construct robust, serverless solutions.
- Databricks Pipeline Automation: Implement and manage end-to-end CI/CD pipelines for data and analytics projects on Azure Databricks using Azure DevOps and Databricks Asset Bundles (DABs) with Git integration. Automate the deployment of Databricks notebooks, libraries, and jobs across multiple environments (development, staging, production), and define and manage Databricks jobs using CI/CD practices to ensure version control and reliable, repeatable executions.
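The multi-environment deployment described above is driven by a `databricks.yml` bundle configuration, where each target maps to a workspace. A minimal illustrative fragment (bundle name, hosts, and paths are placeholders):

```yaml
bundle:
  name: sales_pipelines

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net

resources:
  jobs:
    nightly_ingest:
      name: nightly_ingest
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/ingest_notebook
```

A CI/CD pipeline then promotes the same versioned definition with `databricks bundle deploy -t dev` and `-t prod`, which is what makes deployments repeatable across environments.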
- Infrastructure as Code (IaC) and Automation: Develop, implement, and maintain Infrastructure as Code for the entire cloud stack using advanced Azure Resource Manager (ARM) templates. Create complex automation scripts and playbooks with Python to automate infrastructure tasks and streamline workflows.
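One common shape for such Python automation is generating per-environment ARM parameter files from a single source of truth, so deployments stay declarative. A stdlib-only sketch (the `$schema` URL is the standard ARM deployment-parameters schema; parameter names are illustrative):

```python
import json
from pathlib import Path

def write_arm_parameters(env: str, overrides: dict, out_dir: str = ".") -> Path:
    # Render an ARM deployment parameters file for one environment.
    # Parameter names here are illustrative; real templates define their own.
    doc = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in overrides.items()},
    }
    path = Path(out_dir) / f"parameters.{env}.json"
    path.write_text(json.dumps(doc, indent=2))
    return path

path = write_arm_parameters(
    "dev", {"storageAccountName": "devdatalake01", "skuName": "Standard_LRS"}
)
```

The generated file is then passed to `az deployment group create --parameters @parameters.dev.json`, keeping environment differences out of the template itself.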
- DevSecOps and Governance: Lead the integration of security best practices throughout the CI/CD pipeline and Azure environment. Establish and enforce governance policies for Databricks and Azure, manage access controls, compliance, and data privacy, and implement observability solutions for monitoring, logging, and alerting on Azure and Databricks using tools such as Azure Monitor, Log Analytics, and Grafana.
- Collaboration and Problem-Solving: Serve as a technical liaison between data engineering, data science, and security teams to align best practices for data processing and MLOps. Provide expert-level troubleshooting and root cause analysis for performance and availability issues.
- Cloud Infrastructure Management: Manage, optimize, and secure cloud environments on major platforms like Azure, with a focus on scalability and cost efficiency.
- Process Improvement: Continuously evaluate and optimize existing processes to enhance the speed, quality, and reliability of software delivery.
- BE/B.Tech graduate with 6 to 8 years of progressive experience in data engineering and significant expertise in building solutions on Azure using Databricks.
- Azure Ecosystem: Expert-level knowledge of Azure Data Platform components, including ADLS Gen2, Azure Data Factory, Azure Synapse Analytics, and Azure Key Vault.
- Databricks Mastery: Demonstrated expertise with Databricks, including Delta Lake, Unity Catalog, Databricks SQL, MLflow, and advanced Spark optimization techniques such as Photon Engine and Adaptive Query Execution (AQE).
- GenAI Integration: Hands-on experience creating Generative AI-driven data solutions, such as Retrieval-Augmented Generation (RAG) pipelines, fine-tuning LLMs, and implementing vector search in production environments.
- Programming Expertise: Mastery of Python (including PySpark and Pandas) and SQL.
- Data Warehousing and Modeling: Strong understanding of dimensional modeling, data warehousing concepts, and implementing the Medallion architecture within a Lakehouse framework.
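The Medallion architecture stages data through bronze (raw), silver (cleaned, deduplicated), and gold (business-ready aggregates) layers. An in-memory sketch of that flow, with plain Python dicts standing in for the Delta tables a real Lakehouse would use (all field names are illustrative):

```python
# Bronze: raw records as ingested, duplicates and bad rows included.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "100.0"},
    {"order_id": "1", "region": "EU", "amount": "100.0"},   # duplicate
    {"order_id": "2", "region": "US", "amount": "oops"},    # bad value
    {"order_id": "3", "region": "EU", "amount": "50.5"},
]

def to_silver(rows):
    # Silver: deduplicate on the business key and enforce types.
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine bad rows
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            out.append({**r, "amount": amount})
    return out

def to_gold(rows):
    # Gold: aggregate to a business-ready mart (revenue per region).
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the layering is that each stage is reproducible from the one below it, so quality rules and aggregations can be revised and replayed without re-ingesting the raw data.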
- CI/CD Tools: In-depth, hands-on experience with CI/CD platforms such as GitLab CI and GitHub Actions, Infrastructure-as-Code (Terraform), and containerization (Docker, Kubernetes) for data and ML workloads.
- Containerization: Mastery of container technologies like Docker and orchestration platforms like Kubernetes.
- Monitoring and Observability: Expertise with observability tools such as Grafana.
- Version Control: Strong proficiency with Git, including advanced workflow management.
- Operating Systems: Deep knowledge of Linux/Unix administration.
- GenAI Model Deployment: Lead the deployment of large language models (LLMs) and Generative AI applications on Azure, addressing challenges related to latency, cost, and security.
- RAG System Implementation: Architect and implement Retrieval-Augmented Generation (RAG) systems on Azure, integrating vector databases (like Azure AI Search) and managing the associated data and infrastructure.
- AI-Powered Automation: Utilize Generative AI tools to automate code generation, improve testing procedures, and develop intelligent automation for operational tasks.