CGI is looking for a Lead Big Data Developer who will design, build, and optimize large-scale data pipelines and processing systems.
The ideal candidate is a hands-on technologist with deep expertise in Python, PySpark, and SQL, capable of leading data initiatives, mentoring team members, and ensuring delivery of high-performance data solutions across the organization.
- Lead the design and development of scalable, efficient, and reliable data pipelines using PySpark, Python, and SQL.
- Collaborate with data architects, analysts, and business stakeholders to understand data requirements and translate them into technical solutions.
- Optimize data workflows for performance, scalability, and cost efficiency in big data environments (e.g., Databricks, EMR, GCP Dataproc, or similar).
- Implement data ingestion, transformation, and aggregation processes from multiple structured and unstructured sources.
- Ensure data quality, integrity, and consistency through validation, testing, and monitoring frameworks.
- Work with cloud-based data platforms (AWS, Azure, or GCP) and leverage tools like S3, Delta Lake, or Snowflake.
- Design and enforce best practices for coding, version control, and CI/CD within the data engineering team.
- Provide technical leadership and mentorship to junior and mid-level developers.
- Collaborate with DevOps and DataOps teams for deployment and operationalization of data solutions.
- Stay updated with the latest technologies and trends in the big data ecosystem.
Required qualifications to be successful in this role:
Required Skills & Experience:
- 8+ years of experience in data engineering or big data development, including 3+ years in a lead or senior role.
- Strong proficiency in Python for data processing, scripting, and automation.
- Advanced hands-on experience with PySpark (RDD, DataFrame, and Spark SQL APIs).
- Deep expertise in SQL (query optimization, analytical functions, performance tuning).
- Strong understanding of distributed data processing and data lake architectures.
- Experience working with the Hadoop ecosystem (Hive, HDFS, Spark, Kafka, etc.).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and data orchestration tools (Airflow, ADF, etc.).
- Solid understanding of data modeling, ETL design, and performance optimization.
- Experience with version control (Git) and CI/CD pipelines for data projects.
- Excellent communication and leadership skills, with the ability to guide cross-functional teams.