Gurugram
INR 11.0 - 16.0 Lacs P.A.
Work from Office
Full Time
We are seeking a highly experienced and innovative Senior Data Engineer with a strong background in hybrid cloud data integration, pipeline orchestration, and AI-driven data modeling. This role is responsible for designing, building, and optimizing robust, scalable, production-ready data pipelines across both AWS and Azure, supporting modern data architectures such as CEDM and Data Vault 2.0.

Responsibilities:
- Design and develop hybrid ETL/ELT pipelines using AWS Glue and Azure Data Factory (ADF).
- Process files from AWS S3 and Azure Data Lake Storage Gen2, including schema validation and data profiling.
- Implement event-based orchestration using AWS Step Functions and Apache Airflow (Astronomer); a minimal orchestration sketch appears at the end of this listing.
- Develop and maintain bronze → silver → gold data layers using DBT or Coalesce.
- Create scalable ingestion workflows using Airbyte, AWS Transfer Family, and Rivery.
- Integrate with metadata and lineage tools such as Unity Catalog and OpenMetadata.
- Build reusable components for schema enforcement, EDA, and alerting (e.g., MS Teams).
- Work closely with QA teams to integrate test automation and ensure data quality.
- Collaborate with cross-functional teams, including data scientists and business stakeholders, to align solutions with AI/ML use cases.
- Document architectures, pipelines, and workflows for internal stakeholders.

Requirements

Essential Skills:

Job
- Experience with cloud platforms: AWS (Glue, Step Functions, Lambda, S3, CloudWatch, SNS, Transfer Family) and Azure (ADF, ADLS Gen2, Azure Functions, Event Grid).
- Skilled in transformation and ELT tools: Databricks (PySpark), DBT, Coalesce, and Python.
- Proficient in data ingestion using Airbyte, Rivery, SFTP/Excel files, and SQL Server extracts.
- Strong understanding of data modeling techniques, including CEDM, Data Vault 2.0, and dimensional modeling.
- Hands-on experience with orchestration tools such as AWS Step Functions, Airflow (Astronomer), and ADF triggers.
- Expertise in monitoring and logging with CloudWatch, AWS Glue metrics, MS Teams alerts, and Azure Data Explorer (ADX).
- Familiar with data governance and lineage tools: Unity Catalog, OpenMetadata, and schema drift detection.
- Proficient in version control and CI/CD using GitHub, Azure DevOps, CloudFormation, Terraform, and ARM templates.
- Experienced in data validation and exploratory data analysis with pandas profiling, AWS Glue Data Quality, and Great Expectations (see the sketch after this section).

Personal
- Excellent communication and interpersonal skills, with the ability to engage with teams.
- Strong problem-solving, decision-making, and conflict-resolution abilities.
- Proven ability to work independently and lead cross-functional teams.
- Ability to work in a fast-paced, dynamic environment and handle sensitive issues with discretion and professionalism.
- Ability to maintain confidentiality and handle sensitive information with attention to detail and discretion.
- Strong work ethic and trustworthiness.
- Highly collaborative and team-oriented, with a commitment to excellence.
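A minimal sketch of the kind of data-quality gate described above, assuming the classic pandas-backed Great Expectations API; the file path, column names, and thresholds are hypothetical, not part of this role's actual pipeline:

import pandas as pd
import great_expectations as ge

# Hypothetical extract pulled from a landing zone; path and columns are illustrative only.
df = ge.from_pandas(pd.read_parquet("landing/orders.parquet"))

# Null-percentage check: at least 95% of order_id values must be populated.
df.expect_column_values_to_not_be_null("order_id", mostly=0.95)

# Range check on a numeric measure.
df.expect_column_values_to_be_between("order_amount", min_value=0, max_value=1_000_000)

# Simple domain rule on a status column.
df.expect_column_values_to_be_in_set("order_status", ["OPEN", "SHIPPED", "CLOSED"])

results = df.validate()
if not results["success"]:
    # In a real pipeline this is where an MS Teams alert or a rejected-records log would be raised.
    raise ValueError("Data quality checks failed; see validation results for details.")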
Preferred Skills:

Job
- Proficiency in SQL and at least one programming language (e.g., Python, Scala).
- Experience with cloud data platforms (e.g., AWS, Azure, GCP) and their data and AI services.
- Knowledge of ETL tools and frameworks (e.g., Apache NiFi, Talend, Informatica).
- Deep understanding of AI/generative AI concepts and frameworks (e.g., TensorFlow, PyTorch, Hugging Face, OpenAI APIs).
- Experience with data modeling, data structures, and database design.
- Proficiency with data warehousing solutions (e.g., Redshift, BigQuery, Snowflake).
- Hands-on experience with big data technologies (e.g., Hadoop, Spark, Kafka).

Personal
- Demonstrates proactive thinking.
- Strong interpersonal skills, expert business acumen, and mentoring ability.
- Able to work under stringent deadlines and demanding client conditions.
- Able to work under pressure and meet multiple daily deadlines for client deliverables with a mature approach.

Other Relevant Information:
- Bachelor's degree in Engineering with a specialization in Computer Science, Artificial Intelligence, Information Technology, or a related field.
- 9+ years of experience in data engineering and data architecture.

LeewayHertz is an equal opportunity employer and does not discriminate based on race, color, religion, sex, age, disability, national origin, sexual orientation, gender identity, or any other protected status. We encourage a diverse range of applicants.
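As referenced in the orchestration responsibility above, pipelines like these are typically expressed as a DAG of dependent tasks. A minimal sketch, assuming Apache Airflow 2.4+ as the scheduler; the DAG name and task callables are hypothetical, and a production version would trigger Glue or ADF jobs rather than print:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; in a real pipeline these would invoke AWS Glue / ADF jobs.
def ingest_files(**context):
    print("pull files from the S3 / ADLS Gen2 landing zones")

def run_transformations(**context):
    print("promote data through bronze -> silver -> gold layers")

def publish_alerts(**context):
    print("post a status summary to the MS Teams channel")

with DAG(
    dag_id="hybrid_elt_daily",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_files)
    transform = PythonOperator(task_id="transform", python_callable=run_transformations)
    notify = PythonOperator(task_id="notify", python_callable=publish_alerts)

    ingest >> transform >> notify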
Gurugram, Delhi / NCR
INR 15.0 - 30.0 Lacs P.A.
Work from Office
Full Time
Job Description

We are seeking a highly skilled Senior Data Engineer with deep expertise in AWS data services, data wrangling using Python and PySpark, and a solid understanding of data governance, lineage, and quality frameworks. The ideal candidate will have a proven track record of delivering end-to-end data pipelines for logistics, supply chain, enterprise finance, or B2B analytics use cases.

Role & responsibilities
- Design, build, and optimize ETL pipelines using AWS Glue 3.0+ and PySpark.
- Implement scalable and secure data lakes on Amazon S3, following bronze/silver/gold zoning.
- Write performant SQL using AWS Athena (Presto) with CTEs, window functions, and aggregations.
- Take full ownership from ingestion through transformation, validation, metadata, and documentation to dashboard-ready output.
- Build pipelines that are not just performant, but audit-ready and metadata-rich from the first version.
- Integrate classification tags and ownership metadata into all columns using AWS Glue Catalog tagging conventions.
- Ensure no pipeline moves to the QA or BI team without validation logs and field-level metadata completed.
- Develop job orchestration workflows using AWS Step Functions integrated with EventBridge or CloudWatch.
- Manage schemas and metadata using the AWS Glue Data Catalog.
- Enforce data quality using Great Expectations, with checks for null %, ranges, and referential rules.
- Ensure data lineage with OpenMetadata or Amundsen and add metadata classifications (e.g., PII, KPIs).
- Collaborate with data scientists on ML pipelines, handling JSON/Parquet I/O and feature engineering.
- Prepare flattened, filterable datasets for BI tools such as Sigma, Power BI, or Tableau.
- Interpret business metrics such as forecasted revenue, margin trends, occupancy/utilization, and volatility.
- Work with consultants, QA, and business teams to finalize KPIs and logic.

Preferred candidate profile
- Strong hands-on experience with AWS: Glue, S3, Athena, Step Functions, EventBridge, CloudWatch, and the Glue Data Catalog.
- Programming skills in Python 3.x, PySpark, and SQL (Athena/Presto).
- Proficient with Pandas and NumPy for data wrangling, feature extraction, and time-series slicing.
- Strong command of data governance tools such as Great Expectations and OpenMetadata/Amundsen.
- Familiarity with tagging sensitive metadata (PII, KPIs, model inputs).
- Capable of creating audit logs for QA and rejected data.
- Experience in feature engineering: rolling averages, deltas, and time-window tagging (a minimal sketch follows this listing).
- BI readiness with Sigma, with exposure to Power BI/Tableau (nice to have).
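As a hedged illustration of the rolling-average, delta, and time-window tagging work mentioned in the profile above, a minimal pandas sketch; the lane and volume columns are invented for the example and are not taken from this role's actual data:

import pandas as pd

# Hypothetical daily shipment volumes per lane.
df = pd.DataFrame({
    "lane_id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "ship_date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"] * 2),
    "volume": [120, 135, 128, 150, 80, 95, 90, 110],
}).sort_values(["lane_id", "ship_date"])

grouped = df.groupby("lane_id")["volume"]

# Rolling average over a 3-day window, computed per lane.
df["volume_rolling_3d"] = grouped.transform(lambda s: s.rolling(3, min_periods=1).mean())

# Day-over-day delta, computed per lane.
df["volume_delta"] = grouped.diff()

# Time-window tagging: label each row with a coarse weekly reporting window.
df["report_window"] = df["ship_date"].dt.to_period("W").astype(str)

print(df)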
Gurugram
INR 15.0 - 30.0 Lacs P.A.
Hybrid
Full Time
Job Description

We are seeking a highly skilled Senior Data Engineer with deep expertise in AWS data services, data wrangling using Python and PySpark, and a solid understanding of data governance, lineage, and quality frameworks. The ideal candidate will have a proven track record of delivering end-to-end data pipelines for logistics, supply chain, enterprise finance, or B2B analytics use cases.

Role & responsibilities
- Design, build, and optimize ETL pipelines using AWS Glue 3.0+ and PySpark.
- Implement scalable and secure data lakes on Amazon S3, following bronze/silver/gold zoning.
- Write performant SQL using AWS Athena (Presto) with CTEs, window functions, and aggregations (a minimal query sketch follows this listing).
- Take full ownership from ingestion through transformation, validation, metadata, and documentation to dashboard-ready output.
- Build pipelines that are not just performant, but audit-ready and metadata-rich from the first version.
- Integrate classification tags and ownership metadata into all columns using AWS Glue Catalog tagging conventions.
- Ensure no pipeline moves to the QA or BI team without validation logs and field-level metadata completed.
- Develop job orchestration workflows using AWS Step Functions integrated with EventBridge or CloudWatch.
- Manage schemas and metadata using the AWS Glue Data Catalog.
- Enforce data quality using Great Expectations, with checks for null %, ranges, and referential rules.
- Ensure data lineage with OpenMetadata or Amundsen and add metadata classifications (e.g., PII, KPIs).
- Collaborate with data scientists on ML pipelines, handling JSON/Parquet I/O and feature engineering.
- Prepare flattened, filterable datasets for BI tools such as Sigma, Power BI, or Tableau.
- Interpret business metrics such as forecasted revenue, margin trends, occupancy/utilization, and volatility.
- Work with consultants, QA, and business teams to finalize KPIs and logic.

Preferred candidate profile
- Strong hands-on experience with AWS: Glue, S3, Athena, Step Functions, EventBridge, CloudWatch, and the Glue Data Catalog.
- Programming skills in Python 3.x, PySpark, and SQL (Athena/Presto).
- Proficient with Pandas and NumPy for data wrangling, feature extraction, and time-series slicing.
- Strong command of data governance tools such as Great Expectations and OpenMetadata/Amundsen.
- Familiarity with tagging sensitive metadata (PII, KPIs, model inputs).
- Capable of creating audit logs for QA and rejected data.
- Experience in feature engineering: rolling averages, deltas, and time-window tagging.
- BI readiness with Sigma, with exposure to Power BI/Tableau (nice to have).
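As a hedged illustration of the Athena responsibility flagged above, a minimal sketch assuming boto3 and Presto SQL; the database name, table, region, and S3 result location are hypothetical:

import boto3

ATHENA_DATABASE = "analytics_gold"                      # hypothetical Glue/Athena database
ATHENA_OUTPUT = "s3://example-athena-results/reports/"  # hypothetical query result location

# CTE plus a window function: latest margin snapshot per customer.
QUERY = """
WITH ranked AS (
    SELECT
        customer_id,
        snapshot_date,
        margin_pct,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY snapshot_date DESC
        ) AS rn
    FROM finance_margin_daily
)
SELECT customer_id, snapshot_date, margin_pct
FROM ranked
WHERE rn = 1
"""

athena = boto3.client("athena", region_name="ap-south-1")

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": ATHENA_DATABASE},
    ResultConfiguration={"OutputLocation": ATHENA_OUTPUT},
)

# Athena runs asynchronously; poll get_query_execution with this ID before reading results.
print("Started Athena query:", response["QueryExecutionId"])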