Key Responsibilities :
- Dataiku Leadership : Drive data engineering initiatives with a strong emphasis on leveraging Dataiku capabilities for data preparation, analysis, visualization, and the deployment of data solutions. - Data Pipeline Development : Design, develop, and optimize robust and scalable data pipelines to support various business intelligence and advanced analytics projects. This includes developing and maintaining ETL/ELT processes to automate data extraction, transformation, and loading from diverse sources. - Data Modeling & Architecture : Apply expertise in data modeling techniques to design efficient and scalable database structures, ensuring data integrity and optimal performance. - ETL/ELT Expertise : Implement and manage ETL processes and tools to ensure efficient and reliable data flow, maintaining high data quality and accessibility. - Gen AI Integration : Explore and implement solutions leveraging LLM Mesh for Generative AI applications, contributing to the development of innovative AI-powered features. - Programming & Scripting : Utilize programming languages such as Python and SQL for data manipulation, analysis, automation, and the development of custom data solutions. - Cloud Platform Deployment : Deploy and manage scalable data solutions on cloud platforms such as AWS or Azure, leveraging their respective services for optimal performance and cost-efficiency. - Data Quality & Governance : Ensure seamless integration of data sources, maintaining high data quality, consistency, and accessibility across all data assets. Implement data governance best practices. - Collaboration & Mentorship : Collaborate closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver impactful solutions. Potentially mentor junior team members. - Performance Optimization : Continuously monitor and optimize the performance of data pipelines and data systems. Required Skills & Experience : - Proficiency in Dataiku : Demonstrable expertise in Dataiku for data preparation, analysis, visualization, and building end-to-end data pipelines and applications. - Expertise in Data Modeling : Strong understanding and practical experience in various data modeling techniques (e.g., dimensional modeling, Kimball, Inmon) to design efficient and scalable database structures. - ETL/ELT Processes & Tools : Extensive experience with ETL/ELT processes and a proven track record of using various ETL tools (e.g., Dataiku's built-in capabilities, Apache Airflow, Talend, SSIS, etc.).
- Familiarity with LLM Mesh : Familiarity with LLM Mesh or similar frameworks for Gen AI applications, understanding its concepts and potential for integration.
- Programming Languages : Strong proficiency in Python for data manipulation, scripting, and developing data solutions. Solid command of SQL for complex querying, data analysis, and database interactions. - Cloud Platforms : Knowledge and hands-on experience with at least one major cloud platform (AWS or Azure) for deploying and managing scalable data solutions (e.g., S3, EC2, Azure Data Lake, Azure Synapse, etc.). - Gen AI Concepts : Basic understanding of Generative AI concepts and their potential applications in data engineering. - Problem-Solving : Excellent analytical and problem-solving skills with a keen eye for detail. - Communication : Strong communication and interpersonal skills to collaborate effectively with cross-functional teams. Bonus Points (Nice to Have) : - Experience with other big data technologies (e.g., Spark, Hadoop, Snowflake). - Familiarity with data governance and data security best practices. - Experience with MLOps principles and tools. - Contributions to open-source projects related to data engineering or AI. Education : Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field.