What you will do
We are seeking a highly skilled and detail-oriented Data Engineering Test Automation Engineer to ensure the quality, reliability, and performance of our data pipelines and platforms. The ideal candidate has strong experience in data testing, ETL validation, and automation frameworks, and will work closely with data engineers, analysts, and DevOps to build robust test suites. This role involves designing and executing both manual and automated tests for real-time and batch data pipelines across AWS and Databricks platforms, while applying QA best practices in test planning, defect tracking, and lifecycle management to ensure high-quality data delivery.
Roles & Responsibilities:
- Design, develop, and maintain automated test scripts for data pipelines, ETL jobs, and data integrations.
- Validate data accuracy, completeness, transformations, and integrity across multiple systems.
- Collaborate with data engineers to define test cases and establish data quality metrics.
- Develop reusable test automation frameworks and CI/CD integrations (e.g., Jenkins, GitHub Actions).
- Perform performance and load testing for data systems.
- Maintain test data management and data mocking strategies.
- Identify and track data quality issues, ensuring timely resolution.
- Perform root cause analysis and drive corrective actions.
- Contribute to QA ceremonies (standups, planning, retrospectives) and drive continuous improvement in QA processes and culture.
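To illustrate the kind of automated validation work described above, here is a minimal sketch of a source-to-target comparison in plain Python. The function names and the fingerprint approach are illustrative assumptions, not a prescribed framework; in practice the same idea would typically run over PySpark or SQL result sets.

```python
import hashlib


def row_fingerprint(row):
    """Stable hash of a row's values, so rows can be compared order-independently."""
    return hashlib.sha256("|".join(str(v) for v in row).encode()).hexdigest()


def validate_source_to_target(source_rows, target_rows):
    """Compare row counts and row content between source and target datasets.

    Rows are sequences of values (e.g. tuples from a database cursor).
    Returns a findings dict: whether counts match, plus fingerprints of
    rows missing from the target or unexpectedly present in it.
    """
    src = {row_fingerprint(r) for r in source_rows}
    tgt = {row_fingerprint(r) for r in target_rows}
    return {
        "count_match": len(source_rows) == len(target_rows),
        "missing_in_target": src - tgt,
        "unexpected_in_target": tgt - src,
    }
```

A check like this would normally be wrapped in a test framework and wired into CI so that a count or content mismatch fails the pipeline run.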
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
- Doctorate, Master's, or Bachelor's degree and 8 to 13 years of experience in Computer Science, IT, or a related field.
- Experience in QA roles, with strong exposure to data pipeline validation and ETL Testing.
- Ability to validate data accuracy, transformations, schema compliance, and completeness across systems using PySpark and SQL.
- Strong hands-on experience with Python, and optionally PySpark, for developing automated data validation scripts.
- Proven experience in validating ETL workflows, with a solid understanding of data transformation logic, schema comparison, and source-to-target mapping.
- Experience working with data integration and processing platforms such as Databricks, Snowflake, AWS EMR, and Redshift.
- Experience in manual and automated testing of both batch and real-time data pipelines.
- Experience performing performance testing of large-scale, complex data engineering pipelines.
- Ability to troubleshoot data issues independently and collaborate with engineering teams on root cause analysis.
- Strong understanding of QA methodologies, test planning, test case design, and defect lifecycle management.
- Hands-on experience with API testing using Postman, pytest, or custom automation scripts.
- Experience integrating automated tests into CI/CD pipelines using tools like Jenkins, GitHub Actions, or similar.
- Knowledge of cloud platforms such as AWS, Azure, GCP.
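As a concrete illustration of the pytest-based data validation the qualifications above call for, here is a minimal, self-contained sketch. The transformation, column names, and expected schema are hypothetical examples, not part of this posting.

```python
# Hypothetical expected target schema: column name -> Python type.
EXPECTED_SCHEMA = {"customer_id": int, "email": str, "signup_date": str}


def transform(record):
    """Toy transformation under test: casts the id to int and normalises the email."""
    return {
        "customer_id": int(record["customer_id"]),
        "email": record["email"].strip().lower(),
        "signup_date": record["signup_date"],
    }


def test_output_matches_schema():
    # Schema compliance: exactly the expected columns, each with the expected type.
    out = transform({"customer_id": "42", "email": " Alice@Example.COM ",
                     "signup_date": "2024-01-01"})
    assert set(out) == set(EXPECTED_SCHEMA)
    for column, expected_type in EXPECTED_SCHEMA.items():
        assert isinstance(out[column], expected_type)


def test_email_is_normalised():
    # Transformation logic: whitespace stripped, case folded.
    out = transform({"customer_id": "1", "email": " X@Y.Z ",
                     "signup_date": "2024-02-02"})
    assert out["email"] == "x@y.z"
```

Running `pytest` against this file executes both checks; the same pattern scales to source-to-target mapping tests against warehouse tables.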
Preferred Qualifications:
- Understanding of data privacy, compliance, and governance frameworks.
- Knowledge of UI automation frameworks such as Selenium, and test frameworks such as JUnit or TestNG.
- Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch.
Professional Certifications (Preferred):
- AWS Certified Data Engineer / Data Analyst (Databricks or cloud environments preferred).
- Certifications in Databricks, AWS, Azure, or data QA (e.g., ISTQB).
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Strong communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills