Azure Data Engineer (Senior)

6 - 10 years

0 Lacs

Posted: 11 hours ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

As a Data Engineer at Infogain, your role involves analyzing existing Hadoop, Pig, and Spark scripts from Dataproc and refactoring them into Databricks-native PySpark. You will be responsible for implementing data ingestion and transformation pipelines using Delta Lake best practices, applying conversion rules and templates for automated code migration and testing, and conducting data validation between legacy and migrated environments. Additionally, you will collaborate on developing AI-driven tools for code conversion, dependency extraction, and error remediation. You are also expected to follow best practices for code versioning, error handling, and performance optimization, and to participate actively in UAT, troubleshooting, and post-migration validation activities.

**Key Responsibilities:**

- Analyze existing Hadoop, Pig, and Spark scripts and refactor them into Databricks-native PySpark
- Implement data ingestion and transformation pipelines using Delta Lake best practices
- Apply conversion rules and templates for automated code migration and testing
- Conduct data validation between legacy and migrated environments
- Collaborate on developing AI-driven tools for code conversion, dependency extraction, and error remediation
- Ensure best practices for code versioning, error handling, and performance optimization
- Participate in UAT, troubleshooting, and post-migration validation activities

**Qualifications Required:**

- Core technical skills in Python, PySpark, and SQL
- Experience with Databricks, Delta Lake, Unity Catalog, and Databricks Workflows
- Knowledge of GCP services such as Dataproc, BigQuery, GCS, Composer/Airflow, and Cloud Functions
- Familiarity with Hadoop, Hive, Pig, and Spark SQL for data engineering
- Automation experience with migration utilities or AI-assisted code transformation tools
- Understanding of CI/CD tools like Git, Jenkins, and Terraform
- Proficiency in data comparison utilities and schema validation

In addition to the technical skills and qualifications above, the ideal candidate for this role should have 6-8 years of experience in data engineering or big data application development. Hands-on experience migrating Spark or Hadoop workloads to Databricks, familiarity with Delta architecture, data quality frameworks, and GCP cloud integration, and exposure to GenAI-based tools for automation or code refactoring are considered advantageous.

Infogain is a human-centered digital platform and software engineering company based in Silicon Valley. It specializes in engineering business outcomes for Fortune 500 companies and digital natives across various industries using technologies like cloud, microservices, automation, IoT, and artificial intelligence. Infogain is committed to accelerating experience-led transformation in the delivery of digital platforms and is recognized as a Microsoft Gold Partner and Azure Expert Managed Services Provider. With offices in multiple locations worldwide, Infogain offers a diverse and innovative work environment for professionals in the technology sector.

Infogain

IT Services and IT Consulting

Los Gatos CA
