Overview
As a data engineering manager, you will be the key technical expert overseeing PepsiCo's data product build and operations, driving a strong vision for how data engineering can proactively create a positive impact on the business. You'll be empowered to create and lead a strong team of data engineers who build data pipelines into various source systems, rest data on the PepsiCo Data Lake, and enable exploration and access for analytics, visualization, machine learning, and product development efforts across the company. As a member of the data engineering team, you will help lead the development of very large and complex data applications in public cloud environments, directly impacting the design, architecture, and implementation of PepsiCo's flagship data products.
Responsibilities
- Provide leadership and management to a team of data engineers, managing processes and their flow of work, vetting their designs, and mentoring them to realize their full potential.
- Engage with team members, using informal and structured approaches to career development to focus on individual improvement and capabilities, and to provide balanced feedback.
- Oversee work with internal clients and external partners to structure and store data into unified taxonomies and link them together with standard identifiers.
- Manage and scale data pipelines from internal and external data sources to support new product launches and drive data quality across data products.
- Build and own the automation and monitoring frameworks that capture metrics and operational KPIs for data pipeline quality and performance.
- Implement best practices around systems integration, security, performance, and data management.
Qualifications
- 10+ years of overall technology experience that includes at least 8+ years of hands-on software development, data engineering, and systems architecture.
- 8+ years of experience with Data Lake Infrastructure, Data Warehousing, and Data Analytics tools.
- 8+ years of experience in SQL optimization and performance tuning, and development experience in programming languages such as Python, PySpark, and Scala.
- 5+ years of cloud data engineering experience in at least one cloud (Azure, AWS, GCP).
- Fluent with Azure cloud services (Azure Data Factory (ADF), ADLS Gen2, Databricks (Lakehouse, Workflows, SQL, Unity Catalog)). Azure and Databricks certifications are a plus.
- Experience scaling and managing a team of 5+ engineers.
- Experience integrating multi-cloud services with on-premises technologies.
- Experience with data modeling, data warehousing, and building high-volume ETL/ELT pipelines.
- Experience with data quality and data profiling tools like Apache Griffin, Deequ, and Great Expectations.
- Experience building/operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets.
- Experience with at least one MPP database technology such as Redshift, Synapse, or Snowflake.
- Experience running and scaling applications on cloud infrastructure and containerized services like Kubernetes.
- Experience with version control systems like GitHub and with deployment and CI tools.
- Experience with Azure Data Factory, Databricks, and MLflow is a plus.
- Experience with Statistical/ML techniques is a plus.
- Experience building solutions in the supply chain space (Digital Procurement, Manufacturing, Cost, Warehouse, Network Design) is a plus.
- Understanding of metadata management, data lineage, and data glossaries is a plus.
- Working knowledge of agile development, including DevOps and DataOps concepts.
- Familiarity with business intelligence tools (such as Power BI).
- BA/BS in Computer Science, Math, Physics, or other technical fields.