Job Description
As a highly skilled PySpark Developer with expertise in distributed data processing, your role will involve optimizing Spark jobs and ensuring efficient data processing on a Big Data platform. This requires a strong understanding of Spark performance tuning, distributed computing, and Big Data architecture.

Key Responsibilities:
- Analyze and comprehend existing data ingestion and reconciliation frameworks
- Develop and implement PySpark programs to process large datasets in Hive tables and Big Data platforms
- Perform complex transformations, including reconciliation and advanced data manipulations
- Fine-tune Spark jobs for performance optimization, ensuring efficient data processing at scale
- Work closely with Data Engineers, Architects, and Analysts to understand data reconciliation requirements
- Collaborate with cross-functional teams to improve data ingestion, transformation, and validation workflows

Required Skills and Qualifications:
- Extensive hands-on experience with Python, PySpark, and PyMongo for efficient data processing across distributed and columnar databases
- Expertise in Spark optimization techniques, and ability to debug Spark performance issues and optimize resource utilization
- Proficiency in Python and the Spark DataFrame API, and strong experience in complex data transformations using PySpark
- Experience working with large-scale distributed data processing, and solid understanding of Big Data architecture and distributed computing frameworks
- Strong problem-solving and analytical skills
- Experience with CI/CD for data pipelines
- Experience with Snowflake for data processing and integration

In addition to the technical skills required for this role, you should have 8+ years of relevant experience in Apps Development or systems analysis and the ability to adjust priorities quickly as circumstances dictate.

Education:
- Bachelor's degree/University degree or equivalent experience in Computer Science
- Master's degree preferred

Please note that Citi is an equal opportunity and affirmative action employer, and they invite all qualified interested applicants to apply for career opportunities.