The Senior Data Engineer will be responsible for the architecture, design, development, and maintenance of our data platforms, with a strong focus on leveraging Python and PySpark for data processing and transformation. This role requires a seasoned technical leader who can work both independently and as part of a team, contributing to the overall data strategy and helping to drive data-driven decision-making across the organization.
Key Responsibilities
- Data Architecture & Design: Design, develop, and optimize data architectures, pipelines, and data models to support various business needs, including analytics, reporting, and machine learning.
- ETL/ELT Development (Python/PySpark Focus): Build, test, and deploy highly scalable and efficient ETL/ELT processes in Python and PySpark to ingest, transform, and load data from diverse sources into data warehouses and data lakes, including developing and optimizing complex PySpark transformations.
- Data Quality & Governance: Implement best practices for data quality, data governance, and data security to ensure the integrity, reliability, and privacy of our data assets.
- Performance Optimization: Monitor, troubleshoot, and optimize data pipeline performance, ensuring data availability and timely delivery, particularly for PySpark jobs.
- Infrastructure Management: Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters.
- Mentorship & Leadership: Provide technical guidance, mentorship, and code reviews to junior data engineers, particularly in Python and PySpark best practices, fostering a culture of excellence and continuous improvement.
- Collaboration: Work closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver solutions that meet business objectives.
- Innovation: Research and evaluate new data technologies, tools, and methodologies to enhance our data capabilities and stay ahead of industry trends.
- Documentation: Create and maintain comprehensive documentation for data pipelines, data models, and data infrastructure.
Qualifications
Education
- Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
Experience
- 5+ years of professional experience in data engineering, with a strong emphasis on building and maintaining large-scale data systems.
- Extensive hands-on experience with Python for data engineering tasks.
- Proven experience with PySpark for big data processing and transformation.
- Proven experience with cloud data platforms (e.g., AWS Redshift, S3, EMR, Glue; Azure Data Lake, Databricks, Synapse; Google BigQuery, Dataflow).
- Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
- Extensive experience with distributed data processing frameworks, especially Apache Spark.
Technical Skills
- Programming Languages: Expert proficiency in Python is mandatory. Mastery of SQL is essential. Familiarity with Scala or Java is a plus.
- Big Data Technologies: In-depth knowledge and hands-on experience with Apache Spark (PySpark) for data processing, including Spark SQL, Spark Streaming, and the DataFrame API. Experience with Apache Kafka, Apache Airflow, Delta Lake, or similar technologies.
- Data Warehousing: In-depth knowledge of data warehousing concepts, dimensional modeling, and ETL/ELT processes.
- Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, particularly those supporting Spark/PySpark workloads.
- Containerization: Familiarity with Docker and Kubernetes is a plus.
- Version Control: Proficient with Git and CI/CD pipelines.
Soft Skills
- Excellent problem-solving and analytical abilities.
- Strong communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Ability to work effectively in a fast-paced, agile environment.
- Proactive and self-motivated with a strong sense of ownership.
Preferred Qualifications
- Experience with real-time data streaming and processing using Spark Structured Streaming in PySpark.
- Knowledge of machine learning concepts and MLOps practices, especially integrating ML workflows with PySpark.
- Familiarity with data visualization tools (e.g., Tableau, Power BI).
- Contributions to open-source data projects.
Job Family Group: Technology
Job Family: Data Analytics
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.