Associate III - Data Engineering

5 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Role Description

Role Proficiency: This role requires proficiency in data pipeline development, including coding and testing data pipelines for ingesting, wrangling, transforming, and joining data from various sources. Must be adept at using ETL tools such as Informatica, Glue, Databricks, and DataProc, with coding skills in Python, PySpark, and SQL. Works independently and demonstrates proficiency in at least one data-related domain, with a solid understanding of SCD (slowly changing dimension) concepts and data warehousing principles.
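
As a concrete illustration of the SCD concepts referenced above, the following is a minimal sketch of a Type 2 (history-preserving) dimension load using Delta Lake's Python API. The table, column, and path names (dim_customer, customer_id, address, /mnt/raw/customers/) are hypothetical and not taken from this posting.

```python
# Hypothetical SCD Type 2 load: expire changed rows, then append new versions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet("/mnt/raw/customers/")          # assumed incoming batch
dim = DeltaTable.forName(spark, "dim_customer")              # assumed dimension table

# Step 1: close out the current version of any row whose tracked attribute changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": F.lit(False), "effective_to": F.current_date()})
    .execute())

# Step 2: append new versions for changed keys and brand-new keys as current rows.
still_current = dim.toDF().filter("is_current = true").select("customer_id")
new_versions = (updates.join(still_current, "customer_id", "left_anti")
                .withColumn("effective_from", F.current_date())
                .withColumn("effective_to", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(True)))
new_versions.write.format("delta").mode("append").saveAsTable("dim_customer")
```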

Outcomes

  • Collaborate closely with data analysts, data scientists, and other stakeholders to ensure data accessibility, quality, and security across various data sources.
  • Design, develop, and maintain data pipelines that collect, process, and transform large volumes of data from various sources.
  • Implement ETL (Extract, Transform, Load) processes to facilitate efficient data movement and transformation.
  • Integrate data from multiple sources, including databases, APIs, cloud services, and third-party data providers.
  • Establish data quality checks and validation procedures to ensure data accuracy, completeness, and consistency (see the sketch after this list).
  • Develop and manage data storage solutions, including relational databases, NoSQL databases, and data lakes.
  • Stay updated on the latest trends and best practices in data engineering, cloud technologies, and big data tools.
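
As a hedged example of the data quality checks mentioned above, a simple PySpark validation step might look like the sketch below. The table, key column, and failure behaviour are assumptions for illustration, not requirements from this posting.

```python
# Hypothetical data quality checks: null keys, duplicate keys, empty loads.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.format("delta").load("/mnt/silver/orders")   # assumed curated table

total_rows = orders.count()
null_keys = orders.filter(F.col("order_id").isNull()).count()
duplicate_keys = orders.groupBy("order_id").count().filter("count > 1").count()

checks = {
    "no_null_keys": null_keys == 0,
    "no_duplicate_keys": duplicate_keys == 0,
    "row_count_nonzero": total_rows > 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # Failing fast keeps bad data out of downstream zones; a real pipeline might
    # instead write results to a monitoring table or fail the ADF/Databricks job.
    raise ValueError(f"Data quality checks failed: {failed}")
```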

Measures Of Outcomes

  • Adherence to engineering processes and standards
  • Adherence to schedule / timelines
  • Adherence to SLAs where applicable
  • # of defects post delivery
  • # of non-compliance issues
  • Reduction in recurrence of known defects
  • Quick turnaround on production bugs
  • Completion of applicable technical/domain certifications
  • Completion of all mandatory training requirements
  • Efficiency improvements in data pipelines (e.g. reduced resource consumption, faster run times)
  • Average time to detect, respond to, and resolve pipeline failures or data issues

Outputs Expected

Code Development:
  • Develop data processing code independently, ensuring it meets performance and scalability requirements.

Documentation

  • Create documentation for personal work and review deliverable documents, including source-target mappings, test cases, and results.

Configuration

  • Follow configuration processes diligently.

Testing

  • Create and conduct unit tests for data pipelines and transformations to ensure data quality and correctness (see the sketch after this list).
  • Validate the accuracy and performance of data processes.
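
As a hedged illustration of unit testing a transformation, the sketch below uses pytest with a local SparkSession. The function under test (clean_orders) and its columns are hypothetical examples, not part of this job description.

```python
# Hypothetical unit test for a small PySpark transformation, run with pytest.
import pytest
from pyspark.sql import SparkSession, functions as F


def clean_orders(df):
    """Example transformation: drop rows without an order_id, normalise status."""
    return (df.filter(F.col("order_id").isNotNull())
              .withColumn("status", F.lower(F.col("status"))))


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_clean_orders_drops_null_keys_and_lowercases_status(spark):
    df = spark.createDataFrame(
        [(1, "SHIPPED"), (None, "OPEN"), (2, "Open")],
        "order_id INT, status STRING")

    result = clean_orders(df).collect()

    assert len(result) == 2                                   # null-key row removed
    assert {r["status"] for r in result} == {"shipped", "open"}
```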

Domain Relevance

  • Develop features and components with a solid understanding of the business problems being addressed for the client.
  • Understand data schemas in relation to domain-specific contexts such as EDI formats.

Defect Management

  • Raise, fix, and retest defects in accordance with project standards.

Estimation

  • Estimate time, effort, and resource dependencies for personal work.

Knowledge Management

  • Consume and contribute to project-related documents, SharePoint libraries, and client universities.

Design Understanding

  • Understand design and low-level design (LLD) and link it to requirements and user stories.

Certifications

  • Obtain relevant technology certifications to enhance skills and knowledge.

Skill Examples

  • Proficiency in SQL, Python, or other programming languages used for data manipulation.
  • Experience with ETL tools such as Apache Airflow, Talend, Informatica, AWS Glue, Dataproc, and Azure ADF.
  • Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud, particularly their data-related services (e.g. AWS Glue, BigQuery).
  • Ability to conduct tests on data pipelines and evaluate results against data quality and performance specifications.
  • Experience in performance tuning data processes.
  • Proficiency in querying data warehouses.

Knowledge Examples

  • Knowledge of ETL services provided by cloud providers, including AWS Glue, GCP DataProc/DataFlow, and Azure ADF/ADLS, along with Apache Spark (PySpark).
  • Understanding of data warehousing principles and practices.
  • Proficiency in SQL for analytics, including windowing functions (see the sketch after this list).
  • Familiarity with data schemas and models.
  • Understanding of domain-related data and its implications.
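
As a hedged illustration of the windowing functions mentioned above, the sketch below uses the PySpark Window API (the same logic can be written as SQL window functions). The table and column names are assumptions for illustration.

```python
# Hypothetical window-function examples: latest row per key and a running total.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
sales = spark.table("fact_sales")                             # assumed warehouse table

# Latest order per customer via row_number() over a descending date window.
latest = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
latest_order_per_customer = (sales.withColumn("rn", F.row_number().over(latest))
                                  .filter("rn = 1")
                                  .drop("rn"))

# Running total per customer, ordered by date (unbounded-preceding frame).
running = (Window.partitionBy("customer_id")
                 .orderBy("order_date")
                 .rowsBetween(Window.unboundedPreceding, Window.currentRow))
with_running_total = sales.withColumn("running_amount", F.sum("amount").over(running))
```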

Additional Comments

Role Overview: We are seeking a skilled Azure Data Engineer with 3–5 years of experience in designing, developing, and maintaining modern data pipelines and data integration solutions using Azure services. The ideal candidate should have strong expertise in Azure Data Factory (ADF), Azure Databricks, Azure Synapse, and Azure Data Lake Storage (ADLS). You will work closely with business analysts, architects, and data scientists to deliver reliable and scalable data solutions that power analytics and business intelligence platforms.

Key Responsibilities:

Data Ingestion & Integration
  • Design, build, and maintain data pipelines using Azure Data Factory for batch and incremental data ingestion.
  • Connect to various data sources (SQL Server, REST APIs, CSV, JSON, SAP, etc.) and integrate them into the Azure ecosystem.
  • Develop metadata-driven and parameterized pipelines to improve reusability (see the sketch after this list).
  • Implement data validation, error handling, and logging frameworks in ADF.
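
The metadata-driven pattern mentioned above would normally be expressed in ADF as a parameterized pipeline (Lookup plus ForEach); as a rough illustration in Python, the equivalent idea inside a Databricks notebook might look like the sketch below. All source names, paths, connection strings, and the metadata structure are assumptions made for illustration only.

```python
# Hypothetical metadata-driven ingestion loop in PySpark. The metadata would
# normally come from a control table or config file rather than being hard-coded.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sources = [  # assumed metadata entries
    {"name": "customers", "format": "jdbc",
     "query": "SELECT * FROM dbo.customers",
     "target": "abfss://raw@datalake.dfs.core.windows.net/customers/"},
    {"name": "orders", "format": "csv", "path": "/mnt/landing/orders/",
     "target": "abfss://raw@datalake.dfs.core.windows.net/orders/"},
]

for src in sources:
    if src["format"] == "jdbc":
        df = (spark.read.format("jdbc")
              .option("url", "jdbc:sqlserver://onprem-sql:1433;database=sales")  # assumed
              .option("query", src["query"])
              .load())
    else:
        df = (spark.read.format(src["format"])
              .option("header", "true")
              .load(src["path"]))

    # Land every source as Delta in the raw zone; one generic loop covers all entries.
    df.write.format("delta").mode("overwrite").save(src["target"])
```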

Data Transformation & Processing

  • Use Azure Databricks (PySpark) for data cleansing, transformation, and enrichment.
  • Optimize Spark jobs for performance and cost efficiency.
  • Implement ETL/ELT workflows with Delta Lake and the Medallion (Bronze, Silver, Gold) architecture, as sketched below.
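
The following is a compact, hedged sketch of the Bronze/Silver/Gold flow with Delta Lake on Databricks. The paths, columns, and aggregation are assumptions for illustration; a real implementation would typically split the layers into separate jobs or Delta Live Tables.

```python
# Hypothetical Medallion (Bronze/Silver/Gold) flow with Delta Lake.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw ingest, stored as-is with load metadata.
bronze = (spark.read.json("/mnt/landing/orders/")             # assumed landing path
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").save("/mnt/bronze/orders")

# Silver: cleansed and conformed (duplicates removed, types fixed, bad keys dropped).
silver = (spark.read.format("delta").load("/mnt/bronze/orders")
          .dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_date"))
          .filter(F.col("order_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/silver/orders")

# Gold: business-level aggregate ready for Synapse / Power BI reporting.
gold = silver.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/daily_revenue")
```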

Data Storage & Modeling

  • Work with Azure Data Lake Storage Gen2 for raw and curated data zones.
  • Develop data models in Azure Synapse Analytics / SQL Server for reporting and analytics.
  • Implement partitioning, indexing, and performance tuning strategies.

Deployment & DevOps

  • Implement CI/CD pipelines using Azure DevOps or GitHub Actions for data workflows.
  • Collaborate with architects to automate deployments and version control using Git.

Security & Governance

  • Manage data access using Azure RBAC, Managed Identities, and Key Vault.
  • Ensure data security, compliance, and privacy as per organizational standards.

Collaboration

  • Work with data analysts, BI developers, and business users to define data requirements.
  • Participate in code reviews and adhere to best practices in data engineering.

Technical Skills Required:

  • Azure Services: Azure Data Factory, Azure Databricks, Azure Synapse, ADLS Gen2, Azure SQL Database
  • Programming: Python (PySpark), SQL, Spark SQL
  • Data Modeling: Star/Snowflake schema, dimensional modeling
  • Source Systems: SQL Server, Oracle, SAP, flat files (CSV, JSON, XML), REST APIs
  • Version Control & CI/CD: Git, Azure DevOps
  • Scheduling & Monitoring: ADF triggers, Databricks jobs, Log Analytics
  • Security: Managed Identity, Key Vault, Access Control
  • Preferred: Power BI basics, exposure to Databricks Delta Live Tables or Synapse Pipelines

Soft Skills:

  • Strong analytical and problem-solving skills.
  • Good communication and collaboration abilities.
  • Ability to work in agile/scrum environments.
  • Self-driven and proactive in identifying process improvements.

Educational Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Azure Data Engineer Associate (DP-203) certification preferred.

Example Project Responsibilities:

  • Design and implement end-to-end data ingestion from on-prem SQL Server to Azure Data Lake using ADF.
  • Build Databricks notebooks for data cleansing and transformations using PySpark.
  • Implement Delta Lake tables and load curated data into Synapse for reporting.
  • Collaborate with BI teams to publish Power BI dashboards on top of Synapse datasets.

Optional (Good to Have):

  • Experience with real-time data processing (Event Hub / Stream Analytics).
  • Knowledge of Infrastructure as Code (IaC) using Terraform or ARM templates.
  • Familiarity with data quality and data catalog tools (Purview).

Skills

Azure Data Factory, Azure Databricks, Azure Synapse, Azure Data Lake Storage

UST

IT Services and IT Consulting

Aliso Viejo, CA
