Job Description
Job Summary
Synechron is seeking a skilled PySpark Data Engineer to design, develop, and optimize data processing solutions leveraging modern big data technologies. In this role, you will lead efforts to build scalable data pipelines, support data integration initiatives, and work closely with cross-functional teams to enable data-driven decision-making. Your expertise will contribute to enhancing business insights and operational efficiency, positioning Synechron as a pioneer in adopting emerging data technologies.

Software Requirements

Required Software Skills:
- PySpark (Apache Spark with Python), with experience developing data pipelines
- Apache Spark ecosystem knowledge
- Python programming (version 3.7 or higher)
- SQL and relational database management systems (e.g., PostgreSQL, MySQL)
- Cloud platforms (preferably AWS or Azure)
- Version control: Git
- Data workflow orchestration tools such as Apache Airflow
- Data management tools: SQL Developer or equivalent

Preferred Software Skills:
- Experience with Hadoop ecosystem components
- Knowledge of containerization (Docker, Kubernetes)
- Familiarity with data lake and data warehouse solutions (e.g., AWS S3, Redshift, Snowflake)
- Monitoring and logging tools (e.g., Prometheus, Grafana)

Overall Responsibilities
- Lead the design and implementation of large-scale data processing solutions using PySpark and related technologies
- Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver scalable pipelines
- Mentor junior team members on best practices in data engineering and emerging technologies
- Evaluate new tools and methodologies to optimize data workflows and improve data quality
- Ensure data solutions are robust, scalable, and aligned with organizational data governance policies
- Stay informed on industry trends and technological advancements in big data and analytics
- Support production environment stability and performance tuning of data pipelines
- Drive innovative approaches to extracting value from large and complex datasets

Technical Skills (By Category)

Programming Languages:
- Required: Python, with a minimum of 2 years of PySpark experience
- Preferred: Scala (for Spark), SQL, Bash scripting

Databases/Data Management:
- Relational databases (PostgreSQL, MySQL)
- Distributed storage solutions (HDFS; cloud object storage such as S3 or Azure Blob Storage)
- Data warehousing platforms (Snowflake, Redshift preferred)

Cloud Technologies:
- Required: Experience deploying and managing data solutions on AWS or Azure
- Preferred: Knowledge of cloud-native services such as Amazon EMR, Azure Data Factory, or Azure Data Lake

Frameworks and Libraries:
- Apache Spark (PySpark)
- Airflow or similar orchestration tools
- Data processing frameworks (Kafka, Spark Streaming preferred)

Development Tools and Methodologies:
- Version control with Git
- Agile management tools: Jira, Confluence
- Continuous integration/deployment pipelines (Jenkins, GitLab CI)

Security Protocols:
- Understanding of data security, access controls, and GDPR compliance in cloud environments

Experience Requirements
- Minimum of 5 years in data engineering, with hands-on PySpark experience
- Proven track record of developing, deploying, and maintaining scalable data pipelines
- Experience working with data lakes, data warehouses, and cloud data services
- Demonstrated leadership in projects involving big data technologies
- Experience mentoring junior team members and collaborating across teams
- Prior experience in the financial, healthcare, or retail sectors is beneficial but not mandatory

Day-to-Day Activities
- Develop, optimize, and deploy big data pipelines using PySpark and related tools (an illustrative sketch follows this list)
- Collaborate with data analysts, data scientists, and business teams to define data requirements
- Conduct code reviews, troubleshoot pipeline issues, and optimize performance
- Mentor junior team members on best practices and emerging technologies
- Design solutions for data ingestion, transformation, and storage
- Evaluate new tools and frameworks for continuous improvement
- Maintain documentation, monitor system health, and ensure security compliance
- Participate in sprint planning, daily stand-ups, and project retrospectives to align priorities
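For illustration only, the sketch below shows the kind of PySpark pipeline work these day-to-day activities describe: reading raw data, applying transformations, and writing a curated result. The bucket paths, column names, and aggregation are hypothetical examples, not an actual Synechron pipeline.

# Illustrative sketch; all paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example_orders_pipeline").getOrCreate()

# Ingest: read raw data from a (hypothetical) data lake location
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: keep completed orders, derive a date column, aggregate daily revenue
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_revenue"))
)

# Load: write the curated output back to the lake, partitioned by date
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()

In practice, a job like this would typically be scheduled and monitored through an orchestration tool such as Apache Airflow, as listed in the software requirements above.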
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related discipline
- Relevant industry certifications (e.g., AWS Data Analytics, GCP Professional Data Engineer) preferred
- Proven experience working with PySpark and big data ecosystems
- Strong understanding of the software development lifecycle and data governance standards
- Commitment to continuous learning and professional development in data engineering technologies

Professional Competencies
- Analytical mindset and problem-solving acumen for complex data challenges
- Effective leadership and team management skills
- Excellent communication skills tailored to technical and non-technical audiences
- Adaptability in fast-evolving technological landscapes
- Strong organizational skills to prioritize tasks and manage multiple projects
- Innovation-driven, with a passion for leveraging emerging data technologies