Job Description
You should have 6 to 9 years of experience in Data Engineering with a focus on Azure Databricks, Azure Data Factory, PySpark, and SQL. The work location can be Bengaluru, Pune, Mumbai, Noida, or Gurugram, and the notice period is 0 to 15 days. The role is hybrid.

As a Data Engineer in this role, your key responsibilities will include developing scalable data pipelines using Azure Data Factory (ADF), Databricks, PySpark, and Delta Lake to support ML and AI workloads. You will optimize and transform large datasets for feature engineering, model training, and real-time AI inference, and build and maintain a lakehouse architecture using Azure Data Lake Storage (ADLS) and Delta Lake. Collaborating closely with ML engineers and Data Scientists, you will deliver high-quality, structured data for training Generative AI models. You will also implement MLOps best practices for continuous data processing, versioning, and model retraining workflows; monitor and enhance data quality using Azure Data Quality Services; keep data processing in Databricks cost-efficient by using Photon, Delta Caching, and Auto-Scaling Clusters; and secure data pipelines by implementing RBAC, encryption, and governance.

The required skills and experience for this position include 5+ years of experience in Data Engineering with Azure and Databricks. Proficiency in PySpark, SQL, and Delta Lake for large-scale data transformations is crucial. Strong experience with Azure Data Factory (ADF), Azure Synapse, and Event Hubs is necessary. Hands-on experience in building feature stores for ML models will be beneficial. Experience with ML model deployment and MLOps pipelines (MLflow, Kubernetes, or Azure ML) is considered a plus.
A good understanding of Generative AI concepts and handling unstructured data is desirable. Strong problem-solving, debugging, and performance optimization skills are also required for this role.