Job Summary
Experienced ETL Test Engineer with a strong background in writing test cases for database-centric data management projects built on the Databricks platform, and hands-on expertise in Python, PySpark, and SQL. The role collaborates closely with Product Owners and Developers to thoroughly understand technical and functional requirements and translate them into robust, automated test coverage for real-estate data pipelines (e.g., property-management data, analytics marts).
Role & responsibilities:
Data Validation & Reconciliation:
- Validate source-to-target mappings, transformation logic, and aggregations using SQL and PySpark (an illustrative sketch follows this list).
- SQL data testing: experienced with SQL stored procedures, views, and functions.
- Perform data profiling (nulls, duplicates, outliers), referential-integrity checks, SCD (Type 1/2), late/early-arriving data, time-zone and partitioning rules.
- Execute full and incremental reconciliations with row- and aggregate-level comparisons.
- Experienced in working with Azure components such as Azure Data Factory, Azure Storage, Azure Synapse, Azure Functions, and Web client.
- Experienced in writing Python/PySpark scripts.
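The following is a purely illustrative sketch of the kind of source-to-target reconciliation and profiling checks described above; the table names, columns, and business key are hypothetical placeholders, not part of the actual project.

```python
# Illustrative only: table names, columns, and business key are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source = spark.table("bronze.property_transactions")   # hypothetical source table
target = spark.table("silver.property_transactions")   # hypothetical target table

# Row-count reconciliation
assert source.count() == target.count(), "Row counts differ between source and target"

# Aggregate-level reconciliation on a key measure
src_agg = source.groupBy("property_id").agg(F.sum("amount").alias("src_amount"))
tgt_agg = target.groupBy("property_id").agg(F.sum("amount").alias("tgt_amount"))

mismatches = (
    src_agg.join(tgt_agg, "property_id", "full_outer")
           .filter(
               F.coalesce(F.col("src_amount"), F.lit(0))
               != F.coalesce(F.col("tgt_amount"), F.lit(0))
           )
)
assert mismatches.count() == 0, "Aggregate mismatch found between source and target"

# Basic data profiling: null and duplicate checks on the business key
assert target.filter(F.col("property_id").isNull()).count() == 0
assert target.count() == target.dropDuplicates(["property_id"]).count()
```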
Databricks, Delta Lake & DLT:
- Test DLT pipelines (bronze/silver/gold), including Expectations (data-quality constraints), error handling, and lineage (a minimal Expectations sketch follows this list).
- Verify Delta Lake behaviors: ACID transactions, schema evolution, time travel, OPTIMIZE/VACUUM effects, ZORDER, and Unity Catalog permissions.
- Validate Workflows/Jobs orchestration, task dependencies, parameters, and secrets.
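As a minimal sketch of DLT Expectations of the sort referenced above: it only runs inside a Databricks Delta Live Tables pipeline, and the dataset names, rule names, and columns are hypothetical.

```python
# Illustrative only: runs inside a Databricks DLT pipeline; names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver property data with data-quality expectations")
@dlt.expect("valid_property_id", "property_id IS NOT NULL")                 # warn and record
@dlt.expect_or_drop("positive_amount", "amount > 0")                        # drop violating rows
@dlt.expect_or_fail("valid_ingest_date", "ingest_date <= current_date()")   # fail the update
def silver_property_transactions():
    return (
        dlt.read("bronze_property_transactions")
           .withColumn("ingest_date", F.to_date("ingest_ts"))
    )
```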
Automation & Frameworks (Pytest):
- Build and maintain a Python/PySpark + Pytest automation framework (fixtures, parametrization, markers); see the illustrative sketch after this list.
- Implement reusable libraries for row-count/aggregate reconciliation, referential integrity, uniqueness/nullable rules, CDC validation, and late-arriving data handling.
- Create a mechanism to automate database testing, including dynamic SQL-driven test scripts for stored procedures, views, and functions.
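A minimal Pytest + PySpark sketch of the fixture/parametrization pattern mentioned above, assuming local-mode Spark for illustration; the source/target table pairs and key columns are hypothetical.

```python
# Illustrative only: a minimal Pytest + PySpark sketch; table names are hypothetical.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[*]").appName("etl-tests").getOrCreate()

# Reusable row-count reconciliation, parametrized over source/target pairs
@pytest.mark.parametrize(
    "source_table, target_table",
    [
        ("bronze.leases", "silver.leases"),
        ("bronze.tenants", "silver.tenants"),
    ],
)
def test_row_counts_match(spark, source_table, target_table):
    src_count = spark.table(source_table).count()
    tgt_count = spark.table(target_table).count()
    assert src_count == tgt_count, f"{source_table} vs {target_table}: {src_count} != {tgt_count}"

# Uniqueness rule as a reusable, parametrized check
@pytest.mark.parametrize("table, key", [("silver.leases", "lease_id")])
def test_key_is_unique(spark, table, key):
    df = spark.table(table)
    assert df.count() == df.dropDuplicates([key]).count()
```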
API & Integration Testing:
- Perform API integration testing for the REST/HTTP endpoints used by pipelines (e.g., via Postman), including Boomi connectors, ADF Web activities, and Azure Functions triggers; an illustrative sketch follows this list.
- Validate payload contracts (JSON/XML/CSV/Parquet), error handling, throttling/retries, and idempotency; exercise HTTP client libraries (e.g., requests, .NET Web Client) where relevant.
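An illustrative sketch of payload-contract and retry validation using the requests library; the base URL, resource path, payload fields, and retry policy here are assumptions for demonstration only.

```python
# Illustrative only: endpoint URL, payload fields, and retry policy are hypothetical.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

BASE_URL = "https://example.invalid/api/v1"   # hypothetical endpoint

session = requests.Session()
# Retry on throttling (429) and transient server errors
session.mount(
    BASE_URL,
    HTTPAdapter(max_retries=Retry(total=3, backoff_factor=1,
                                  status_forcelist=[429, 500, 502, 503])),
)

def test_property_payload_contract():
    resp = session.get(f"{BASE_URL}/properties/123", timeout=30)
    assert resp.status_code == 200
    body = resp.json()
    # Payload-contract checks: required fields and basic types
    assert {"property_id", "address", "status"} <= body.keys()
    assert isinstance(body["property_id"], int)
```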
Required Qualifications:
- 5+ years in Data/ETL Testing or Data Quality Engineering; 3+ years of recent experience on Databricks with PySpark/Spark SQL and Delta Lake.
- Bachelor's degree in Computer Science, Information Systems, Engineering, or equivalent experience.
- Strong Python skills with Pytest (fixtures, parametrization, mocking, assertions) for testing PySpark transforms and SQL logic.
- Solid SQL (window functions, complex joins, CTEs) for deep reconciliation and validation.
- Hands-on testing of DLT (Delta Live Tables) pipelines: expectations/constraints, bronze-silver-gold patterns, schema evolution, and failure/retry logic.
- Practical BI testing experience with Power BI, including security (RLS/OLS) and data-refresh validation.
- Strong understanding of data modeling, data quality, and data governance practices.
- Experience with defect tracking and with documenting test plans and execution reports.
- Experience with a cloud platform such as Azure, AWS, or GCP.
- Experience working in Agile teams and their ceremonies.
- Clear, concise communication and strong analytical/problem-solving skills.
Good to have:
- Understanding of different source formats such as Parquet, CSV, JSON, and open data formats.
- Understanding of CDC (Change Data Capture) processes.
- Databricks experience: Databricks QA monitoring and understanding of the Expectations engine.
- Databricks Analyst Certification