- Azure Data Architecture: Define, architect, and implement scalable, secure, and cost-effective data solutions on Azure, utilizing Azure Data Lake Storage (ADLS) Gen2, Azure Data Factory (ADF), and Azure Synapse.
- Databricks Lakehouse Implementation: Architect and optimize the Databricks Lakehouse platform, leveraging Delta Lake for transactional support and implementing robust data ingestion and transformation architectures.
- GenAI Data Strategy: Lead data engineering initiatives for Generative AI projects, including the design and construction of data pipelines for Retrieval-Augmented Generation (RAG), feature engineering for large language model (LLM) fine-tuning, and managing vector databases and embedding workflows in both Databricks and Azure.
- Advanced Data Processing: Develop, manage, and optimize large-scale batch and streaming data pipelines using Databricks notebooks with PySpark and SQL. Implement Databricks Workflows for job orchestration, ensuring robust monitoring, error handling, and alerting.
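A Databricks Workflows job of the kind described above is defined as a DAG of tasks with retries and failure alerting. An illustrative Jobs-API-style JSON spec (job name, notebook paths, and email address are placeholders) might look like:

```json
{
  "name": "nightly_ingest",
  "tasks": [
    {
      "task_key": "bronze_ingest",
      "notebook_task": { "notebook_path": "/Pipelines/bronze_ingest" },
      "max_retries": 2
    },
    {
      "task_key": "silver_transform",
      "depends_on": [ { "task_key": "bronze_ingest" } ],
      "notebook_task": { "notebook_path": "/Pipelines/silver_transform" }
    }
  ],
  "email_notifications": { "on_failure": ["data-eng-oncall@example.com"] }
}
```

The `depends_on` edges give ordering, per-task `max_retries` gives error handling, and the notification block wires alerting into the orchestration itself rather than an external monitor.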
- Data Governance and Security: Champion data governance best practices using Databricks Unity Catalog to manage permissions, enforce data quality, track lineage, and ensure compliance with security and privacy standards for all data assets.
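Unity Catalog permissions are managed with standard SQL `GRANT` statements scoped to the three-level catalog.schema.table namespace. A small sketch (catalog, schema, and group names are illustrative):

```sql
-- Grant a group read access down the object hierarchy (names illustrative)
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`;
GRANT SELECT ON TABLE main.sales.orders TO `analysts`;
```

Lineage and audit records for governed tables are captured by Unity Catalog automatically, so enforcement and traceability come from the same access path.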
- Collaboration and Mentorship: Work closely with AI/ML engineers, data scientists, and business teams to understand data requirements for models and translate these into technical solutions. Provide technical leadership, mentorship, and guidance to the data engineering team.
- Azure Cloud Architecture: Oversee the design, provisioning, and management of Azure cloud resources, including Azure Active Directory (AAD), networking, and security protocols. Manage Azure Databricks workspaces and clusters, monitor performance, troubleshoot issues, and optimize resource utilization. Utilize advanced Azure services such as Azure Functions, Logic Apps, and Synapse Analytics to construct robust, serverless solutions.
- Databricks Pipeline Automation: Implement and manage end-to-end CI/CD pipelines for data and analytics projects on Azure Databricks using Azure DevOps and Databricks Asset Bundles (DABs) with Git integration. Automate the deployment of Databricks notebooks, libraries, and jobs across multiple environments (development, staging, production), and define and manage Databricks jobs using CI/CD practices to ensure version control and reliable, repeatable executions.
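The multi-environment deployment described above is driven by a `databricks.yml` bundle configuration, where each target maps to a workspace. A minimal illustrative fragment (bundle name, hosts, and paths are placeholders):

```yaml
bundle:
  name: sales_pipelines

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net

resources:
  jobs:
    nightly_ingest:
      name: nightly_ingest
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/ingest_notebook
```

A CI/CD pipeline then promotes the same versioned definition with `databricks bundle deploy -t dev` and `-t prod`, which is what makes deployments repeatable across environments.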
- Infrastructure as Code (IaC) and Automation: Develop, implement, and maintain Infrastructure as Code for the entire cloud stack using advanced Azure Resource Manager (ARM) templates. Create complex automation scripts and playbooks with Python to automate infrastructure tasks and streamline workflows.
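One common shape for such Python automation is generating per-environment ARM parameter files from a single source of truth, so deployments stay declarative. A stdlib-only sketch (the `$schema` URL is the standard ARM deployment-parameters schema; parameter names are illustrative):

```python
import json
from pathlib import Path

def write_arm_parameters(env: str, overrides: dict, out_dir: str = ".") -> Path:
    # Render an ARM deployment parameters file for one environment.
    # Parameter names here are illustrative; real templates define their own.
    doc = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in overrides.items()},
    }
    path = Path(out_dir) / f"parameters.{env}.json"
    path.write_text(json.dumps(doc, indent=2))
    return path

path = write_arm_parameters(
    "dev", {"storageAccountName": "devdatalake01", "skuName": "Standard_LRS"}
)
```

The generated file is then passed to `az deployment group create --parameters @parameters.dev.json`, keeping environment differences out of the template itself.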
- DevSecOps and Governance: Lead the integration of security best practices throughout the CI/CD pipeline and Azure environment. Establish and enforce governance policies for Databricks and Azure, manage access controls, compliance, and data privacy, and implement observability solutions for monitoring, logging, and alerting on Azure and Databricks using tools such as Azure Monitor, Log Analytics, and Grafana.
- Collaboration and Problem-Solving: Serve as a technical liaison between data engineering, data science, and security teams to align best practices for data processing and MLOps. Provide expert-level troubleshooting and root cause analysis for performance and availability issues.
- Cloud Infrastructure Management: Manage, optimize, and secure cloud environments on major platforms like Azure, with a focus on scalability and cost efficiency.
- Process Improvement: Continuously evaluate and optimize existing processes to enhance the speed, quality, and reliability of software delivery.
- BE/B.Tech graduate with 6 to 8 years of progressive experience in data engineering and significant expertise in building solutions on Azure using Databricks.
- Azure Ecosystem: Expert-level knowledge of Azure Data Platform components, including ADLS Gen2, Azure Data Factory, Azure Synapse Analytics, and Azure Key Vault.
- Databricks Mastery: Demonstrated expertise with Databricks, including Delta Lake, Unity Catalog, Databricks SQL, MLflow, and advanced Spark optimization techniques such as Photon Engine and Adaptive Query Execution (AQE).
- GenAI Integration: Hands-on experience creating Generative AI-driven data solutions, such as Retrieval-Augmented Generation (RAG) pipelines, fine-tuning LLMs, and implementing vector search in production environments.
- Programming Expertise: Mastery of Python (including PySpark and Pandas) and SQL.
- Data Warehousing and Modeling: Strong understanding of dimensional modeling, data warehousing concepts, and implementing the Medallion architecture within a Lakehouse framework.
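The Medallion architecture stages data through bronze (raw), silver (cleaned, deduplicated), and gold (business-ready aggregates) layers. An in-memory sketch of that flow, with plain Python dicts standing in for the Delta tables a real Lakehouse would use (all field names are illustrative):

```python
# Bronze: raw records as ingested, duplicates and bad rows included.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "100.0"},
    {"order_id": "1", "region": "EU", "amount": "100.0"},   # duplicate
    {"order_id": "2", "region": "US", "amount": "oops"},    # bad value
    {"order_id": "3", "region": "EU", "amount": "50.5"},
]

def to_silver(rows):
    # Silver: deduplicate on the business key and enforce types.
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine bad rows
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            out.append({**r, "amount": amount})
    return out

def to_gold(rows):
    # Gold: aggregate to a business-ready mart (revenue per region).
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the layering is that each stage is reproducible from the one below it, so quality rules and aggregations can be revised and replayed without re-ingesting the raw data.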
- CI/CD Tools: In-depth, hands-on experience with CI/CD platforms such as GitLab CI and GitHub Actions, Infrastructure-as-Code (Terraform), and containerization (Docker, Kubernetes) for data and ML workloads.
- Containerization: Mastery of container technologies like Docker and orchestration platforms like Kubernetes.
- Monitoring and Observability: Expertise with observability tools such as Grafana.
- Version Control: Strong proficiency with Git, including advanced workflow management.
- Operating Systems: Deep knowledge of Linux/Unix administration.
- GenAI Model Deployment: Lead the deployment of large language models (LLMs) and Generative AI applications on Azure, addressing challenges related to latency, cost, and security.
- RAG System Implementation: Architect and implement Retrieval-Augmented Generation (RAG) systems on Azure, integrating vector databases (like Azure AI Search) and managing the associated data and infrastructure.
- AI-Powered Automation: Utilize Generative AI tools to automate code generation, improve testing procedures, and develop intelligent automation for operational tasks.