Posted: 2 weeks ago | Platform: Foundit

Skills Required: Linux/Unix
Work Mode: On-site
Job Type: Full Time

Job Description

Key Responsibilities:

PySpark Development:

  • Design, implement, and optimize PySpark solutions for large-scale data processing and analysis.
  • Develop data pipelines using Spark to handle data transformations, aggregations, and other complex operations efficiently.
  • Write and optimize Spark SQL queries for big data analytics and reporting.
  • Handle data extraction, transformation, and loading (ETL) processes from various sources into a unified data warehouse or data lake (a sketch of such an ETL job follows this list).
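
A minimal sketch of the kind of ETL job described above, combining the DataFrame API and Spark SQL. All paths, column names, and the target layout are illustrative assumptions, not part of this role's actual stack:

    # Illustrative sketch only; source path, schema, and target location are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Extract: read a raw CSV landing zone (hypothetical path)
    orders = spark.read.option("header", True).csv("s3://raw-bucket/orders/")

    # Transform: cast types, drop bad rows, aggregate per day
    daily = (
        orders
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount").isNotNull())
        .groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("order_count"))
    )

    # The same transformation expressed as a Spark SQL query
    orders.createOrReplaceTempView("orders")
    daily_sql = spark.sql("""
        SELECT order_date,
               SUM(CAST(amount AS DOUBLE)) AS total_amount,
               COUNT(*)                    AS order_count
        FROM orders
        WHERE amount IS NOT NULL
        GROUP BY order_date
    """)

    # Load: write to a partitioned Parquet dataset in the data lake (hypothetical)
    (daily.write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://lake-bucket/analytics/daily_orders/"))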

Data Pipeline Design & Optimization:

  • Build and maintain ETL pipelines using PySpark, ensuring high scalability and performance.
  • Implement batch and streaming processing to handle both real-time and historical data.
  • Optimize the performance of PySpark applications by applying best practices and techniques such as partitioning, caching, and broadcast joins (see the sketch after this list).
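
A brief sketch of the three optimization techniques named above. The table paths, the partition count of 200, and the column names are assumptions for illustration:

    # Illustrative sketch of partitioning, caching, and broadcast joins.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("pipeline-tuning").getOrCreate()

    events = spark.read.parquet("s3://lake-bucket/events/")        # large fact table
    countries = spark.read.parquet("s3://lake-bucket/countries/")  # small dimension

    # Partitioning: repartition by the join/aggregation key so work is spread evenly
    events = events.repartition(200, "country_code")

    # Caching: persist a DataFrame that several downstream actions will reuse
    events.cache()

    # Broadcast join: ship the small dimension table to every executor,
    # avoiding a full shuffle of the large fact table
    enriched = events.join(broadcast(countries), on="country_code", how="left")

    enriched.groupBy("country_name").count().show()
    events.unpersist()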

Data Storage & Management:

  • Work with large datasets and integrate them into storage solutions such as HDFS, S3, Azure Blob Storage, or Google Cloud Storage.
  • Ensure efficient data storage, access, and retrieval through Spark and columnar file formats (e.g., Parquet, ORC); see the sketch after this list.
  • Maintain data quality, consistency, and integrity throughout the pipeline lifecycle.
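
A short sketch of writing and reading columnar formats; bucket names and columns are placeholders. Parquet and ORC store data column-by-column, which compresses well and lets Spark scan only the columns and row groups a query actually needs:

    # Illustrative sketch only; paths and columns are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("storage-formats").getOrCreate()

    df = spark.read.json("s3://raw-bucket/clickstream/")  # row-oriented source

    # Convert to columnar formats for efficient storage and retrieval
    df.write.mode("overwrite").parquet("s3://lake-bucket/clickstream_parquet/")
    df.write.mode("overwrite").orc("s3://lake-bucket/clickstream_orc/")

    # Reading back: only the selected columns are scanned, and the filter
    # can be pushed down to skip row groups entirely
    recent = (spark.read.parquet("s3://lake-bucket/clickstream_parquet/")
                   .select("user_id", "event_time")
                   .where("event_time >= '2024-01-01'"))
    recent.show(5)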

Cloud Platforms & Big Data Frameworks:

  • Deploy Spark-based applications on cloud platforms such as AWS (Amazon EMR), Azure HDInsight, or Google Dataproc (a deployment sketch follows this list).
  • Work with cloud-native services such as AWS Lambda, S3, Google Cloud Storage, and Azure Data Lake to handle and process big data.
  • Leverage cloud data processing tools and frameworks to scale and optimize PySpark jobs.
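
One common deployment pattern on AWS is to submit a PySpark script as a step on a running EMR cluster. A sketch using boto3 follows; the cluster ID, region, and script location are placeholders, and equivalent flows exist on Dataproc and HDInsight:

    # Illustrative sketch: submitting a PySpark script as an EMR step with boto3.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # ID of a running EMR cluster (placeholder)
        Steps=[{
            "Name": "daily-orders-etl",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",  # EMR's built-in command runner
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://lake-bucket/jobs/daily_orders_etl.py",
                ],
            },
        }],
    )
    print("Submitted step:", response["StepIds"][0])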

Collaboration & Integration:

  • Collaborate with cross-functional teams (data scientists, analysts, product managers) to understand business requirements and develop appropriate data solutions.
  • Integrate data from multiple sources and platforms (e.g., databases, external APIs, flat files) into a unified system (see the sketch after this list).
  • Provide support for downstream applications and data consumers by ensuring timely and accurate delivery of data.
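
A sketch of that multi-source integration pattern; the JDBC connection details, file paths, and common schema are all assumptions chosen for illustration:

    # Illustrative sketch of unifying several sources into one dataset.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-source-integration").getOrCreate()

    # Relational database via JDBC (placeholder URL; use a secret store in practice)
    db_customers = (spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/crm")
        .option("dbtable", "customers")
        .option("user", "etl_user")
        .option("password", "***")
        .load())

    # Flat files landed in object storage
    csv_customers = spark.read.option("header", True).csv("s3://raw-bucket/customers/*.csv")

    # External API responses previously dumped as JSON files
    api_customers = spark.read.json("s3://raw-bucket/api_dumps/customers/")

    # Normalize to a common schema, union, and deduplicate on the business key
    common_cols = ["customer_id", "name", "email"]
    unified = (db_customers.select(*common_cols)
               .unionByName(csv_customers.select(*common_cols))
               .unionByName(api_customers.select(*common_cols))
               .dropDuplicates(["customer_id"]))

    unified.write.mode("overwrite").parquet("s3://lake-bucket/customers_unified/")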

Performance Tuning & Troubleshooting:

  • Identify bottlenecks and optimize Spark jobs to improve performance.
  • Conduct performance tuning of both the cluster and individual Spark jobs, leveraging Spark's built-in monitoring tools (see the sketch after this list).
  • Troubleshoot and resolve issues related to data processing, application failures, and cluster resource utilization.
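
A sketch of routine diagnostics for a slow job; the input path is a placeholder. The physical plan, the partition count, and the Spark UI are usually the first things to check:

    # Illustrative sketch of common Spark diagnostics.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("job-diagnostics")
             # Adaptive Query Execution lets Spark coalesce shuffle partitions
             # and mitigate skewed joins at runtime
             .config("spark.sql.adaptive.enabled", "true")
             .getOrCreate())

    df = spark.read.parquet("s3://lake-bucket/events/")

    # Inspect the physical plan for shuffles, scans, and join strategies
    df.groupBy("country_code").count().explain(mode="formatted")

    # Check partitioning: too few partitions underuses the cluster,
    # too many adds scheduling overhead
    print("partitions:", df.rdd.getNumPartitions())

    # Label the job so it is easy to find in the Spark UI (port 4040 by default)
    spark.sparkContext.setJobDescription("daily events rollup")
    df.groupBy("country_code").count().collect()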

Documentation & Reporting:

  • Maintain clear and comprehensive documentation of data pipelines, architectures, and processes.
  • Create technical documentation to guide future enhancements and troubleshooting.
  • Provide regular updates on the status of ongoing projects and data processing tasks.

Continuous Improvement:

  • Stay up to date with the latest trends, technologies, and best practices in big data processing and PySpark.
  • Contribute to improving development processes, testing strategies, and code quality.
  • Share knowledge and provide mentoring to junior team members on PySpark best practices.

Required Qualifications:

  • 2-4 years of professional experience working with PySpark and big data technologies.
  • Strong expertise in Python programming with a focus on data processing and manipulation.
  • Hands-on experience with Apache Spark, particularly with PySpark for distributed computing.
  • Proficiency in Spark SQL for data querying and transformation.
  • Familiarity with cloud platforms like AWS, Azure, or Google Cloud, and experience with cloud-native big data tools.
  • Knowledge of ETL processes and tools.
  • Experience with data storage technologies like HDFS, S3, or Google Cloud Storage.
  • Knowledge of data formats such as Parquet, ORC, Avro, or JSON.
  • Experience with distributed computing and cluster management.
  • Familiarity with Linux/Unix and command-line operations.
  • Strong problem-solving skills and ability to troubleshoot data processing issues.

Teamware Solutions
IT Services and IT Consulting
Chennai, Tamil Nadu
