Posted: 2 hours ago
On-site | Full Time
Job Title: PySpark Developer
Location: Chennai, Hyderabad, Kolkata
Work Schedule: Monday to Friday (5 days work from office)
Experience: 5+ years in backend development
Notice Period: Immediate to 15 days
Must-Have Experience: Python, PySpark, Amazon Redshift, PostgreSQL
About the Role:
We are looking for an experienced PySpark Developer with strong data engineering capabilities to design, develop, and optimize scalable data pipelines for large-scale data processing. The ideal candidate must possess in-depth knowledge of PySpark, SQL, and cloud-based data ecosystems, along with strong problem-solving skills and the ability to work with cross-functional teams.
Roles & Responsibilities:
- Design and develop robust, scalable ETL/ELT pipelines using PySpark to process data from various sources such as databases, APIs, logs, and files.
- Transform raw data into analysis-ready datasets for data hubs and analytical data marts.
- Build reusable, parameterized Spark jobs for batch and micro-batch processing (a minimal sketch follows this list).
- Optimize PySpark job performance to handle large and complex datasets efficiently.
- Ensure data quality, consistency, and lineage, and maintain thorough documentation across all ingestion workflows.
- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement ingestion logic aligned with business requirements.
- Work with AWS-based data platforms (S3, Glue, EMR, Redshift) for data movement and storage.
- Support version control, CI/CD processes, and infrastructure-as-code practices as required.
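To illustrate the kind of reusable, parameterized batch job described above, here is a minimal PySpark sketch. All paths, column names, and arguments are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch of a parameterized PySpark batch job: read raw JSON,
# standardize it, and write partitioned Parquet. Paths, column names,
# and arguments are hypothetical placeholders.
import argparse

from pyspark.sql import SparkSession, functions as F


def run(source_path: str, target_path: str, run_date: str) -> None:
    spark = SparkSession.builder.appName("ingest_events").getOrCreate()

    raw = spark.read.json(source_path)  # semi-structured JSON input

    cleaned = (
        raw.dropDuplicates(["event_id"])                        # basic dedup
           .filter(F.col("event_id").isNotNull())               # drop bad rows
           .withColumn("event_ts", F.to_timestamp("event_ts"))  # typed column
           .withColumn("run_date", F.lit(run_date))             # lineage tag
    )

    # Partitioning by run_date lets downstream marts prune by date.
    cleaned.write.mode("overwrite").partitionBy("run_date").parquet(target_path)
    spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source-path", required=True)
    parser.add_argument("--target-path", required=True)
    parser.add_argument("--run-date", required=True)
    args = parser.parse_args()
    run(args.source_path, args.target_path, args.run_date)
```

A job like this can be submitted with spark-submit and scheduled per run date by an orchestrator such as AWS Step Functions.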
Must-Have Skills:
- 5+ years of data engineering experience, with a strong focus on PySpark/Spark.
- Proven experience building data pipelines and ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
- Strong knowledge of Python and related data processing libraries.
- Advanced SQL proficiency (Amazon Redshift, PostgreSQL, or similar); a short Spark SQL example follows this list.
- Hands-on expertise with distributed computing frameworks such as Spark on EMR or Databricks.
- Familiarity with workflow orchestration tools like AWS Step Functions or similar.
- Good understanding of data lake and data warehouse architectures, including fundamental data modeling concepts.
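As a hedged example of the "advanced SQL" expected here, the sketch below computes each customer's most recent order with a window function run through Spark SQL. The orders dataset and its columns are assumed for illustration; the same ROW_NUMBER pattern runs on Amazon Redshift and PostgreSQL as well.

```python
# Hedged example of window-function SQL run through Spark; the "orders"
# dataset and its columns are assumed for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_demo").getOrCreate()

# Register a (placeholder) Parquet dataset as a temp view for SQL access.
spark.read.parquet("s3://example-bucket/orders/").createOrReplaceTempView("orders")

# Latest order per customer via ROW_NUMBER() over a per-customer window.
latest_orders = spark.sql("""
    SELECT customer_id, order_id, order_ts
    FROM (
        SELECT customer_id, order_id, order_ts,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_ts DESC
               ) AS rn
        FROM orders
    ) ranked
    WHERE rn = 1
""")
latest_orders.show()
```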
Good-to-Have Skills:
- Experience with AWS data services: Glue, S3, Redshift, Lambda, CloudWatch.
- Exposure to Delta Lake or similar large-scale storage technologies.
- Experience with real-time streaming tools such as Spark Structured Streaming or Kafka (see the streaming sketch after this list).
- Understanding of data governance, lineage, and cataloging tools (AWS Glue Catalog, Apache Atlas).
- Knowledge of DevOps/CI-CD pipelines using Git, Jenkins.
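For the streaming exposure mentioned above, a minimal Spark Structured Streaming sketch reading from Kafka might look like the following. The broker address, topic, schema, and output paths are placeholders, and running it requires the spark-sql-kafka connector package on the Spark classpath.

```python
# Hedged sketch of Spark Structured Streaming reading JSON events from
# Kafka. Broker, topic, schema, and paths are placeholders; the
# spark-sql-kafka connector must be on the Spark classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("stream_demo").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
         .option("subscribe", "events")                     # placeholder topic
         .load()
         # Kafka delivers bytes: decode the value and parse the JSON payload.
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

query = (
    events.writeStream.format("parquet")
          .option("path", "s3://example-bucket/stream-out/")          # placeholder
          .option("checkpointLocation", "s3://example-bucket/ckpt/")  # required
          .start()
)
query.awaitTermination()
```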
Job Type: Full-time
Pay: ₹1,500,000.00 - ₹2,000,000.00 per year
Work Location: In person
Recruitment Hub 365