Data Analyst

2 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Job Summary

We are looking for a highly skilled Big Data & ETL Tester to join our data engineering and analytics team. The ideal candidate will have strong experience in PySpark, SQL, and Python, with a deep understanding of ETL pipelines, data validation, and cloud-based testing on AWS. Familiarity with data visualization tools like Apache Superset or Power BI is a strong plus.


You will work closely with our data engineering team to ensure data availability, consistency, and quality across complex data pipelines, and help transform business requirements into robust data testing frameworks.


Key Responsibilities

• Collaborate with big data engineers to validate data pipelines and ensure data integrity across ingestion, processing, and transformation stages.

• Write complex PySpark and SQL queries to test and validate large-scale datasets.

• Perform ETL testing, covering schema validation, data completeness, accuracy, transformation logic, and performance testing.

• Conduct root cause analysis of data issues using structured debugging approaches.

• Build automated test scripts in Python for regression, smoke, and end-to-end data testing.

• Analyze large datasets to track KPIs and performance metrics supporting business operations and strategic decisions.

• Work with data analysts and business teams to translate business needs into testable data validation frameworks.

• Communicate testing results, insights, and data gaps via reports or dashboards (Superset/Power BI preferred).

• Identify and document areas of improvement in data processes and advocate for automation opportunities.

• Maintain detailed documentation of test plans, test cases, results, and associated dashboards.


Required Skills and Qualifications 


• 2+ years of experience in big data testing and ETL testing.

• Strong hands-on skills in PySpark, SQL, and Python.

• Solid experience working with cloud platforms, especially AWS (S3, EMR, Glue, Lambda, Athena, etc.).

• Familiarity with data warehouse and lakehouse architectures.

• Working knowledge of Apache Superset, Power BI, or similar visualization tools.

• Ability to analyze large, complex datasets and provide actionable insights.

• Strong understanding of data modeling concepts, data governance, and quality frameworks.

• Experience with automation frameworks and CI/CD for data validation is a plus.



Preferred Qualifications

• Experience with Airflow, dbt, or other data orchestration tools.

• Familiarity with data cataloging tools (e.g., AWS Glue Data Catalog).

• Prior experience at a product or SaaS-based company with high-data-volume environments.

Why Join Us?

• Opportunity to work with a cutting-edge data stack in a fast-paced environment.

• Collaborate with passionate data professionals driving real business impact.

• Flexible work environment with a focus on learning and innovation.
