Job Title: Senior Lead Data Engineer
Location: 100% remote
Duration: 12 months with possible extension
Working time zone: full Eastern Time (EST) hours
Working hours: 8 hours per day (40 hours per week)
Introduction:
The Senior Lead Data Engineer plays a crucial role on the portfolio analytics team, leading code development and the migration of analytics workloads from SAS to Databricks. The position is central to enhancing the organization's data processing capabilities and optimizing performance, with a direct impact on the efficiency and accuracy of analytics operations.
Roles and Responsibilities:
- Lead the migration of code and processes from SAS to Databricks, ensuring seamless integration and functionality (a brief illustrative sketch of this kind of translation follows this list).
- Develop, test, and optimize PySpark and Python code within the Databricks environment to improve performance and efficiency.
- Design and implement data pipelines and workflows using Databricks notebooks and job clusters.
- Schedule and manage jobs in Databricks to ensure timely data processing and availability.
- Collaborate with the portfolio analytics team to understand data requirements and deliver solutions that meet business needs.
- Mentor junior analysts, providing guidance and training on best practices in data engineering and the use of Databricks.
- Prepare and present training materials, beginning with a general introduction to the role and its responsibilities.
- Utilize version control systems, such as GitHub, to manage and document code changes effectively.
- Work with Snowflake and Snowpark to write complex, optimized queries across large datasets (see the Snowpark sketch after this list).
- Use the Airflow scheduler to orchestrate and automate complex data workflows (preferred).
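As a hedged illustration of the SAS-to-Databricks translation work described above, the sketch below rewrites a simple SAS PROC MEANS grouped summary as a PySpark aggregation. The table and column names (portfolio_positions, asset_class, market_value) are invented for illustration and are not drawn from an actual codebase.

```python
# Hypothetical sketch: a SAS PROC MEANS grouped summary rewritten in PySpark.
# All table and column names are placeholders, not from a real codebase.
#
# Rough SAS equivalent:
#   proc means data=portfolio_positions sum mean n;
#     class asset_class;
#     var market_value;
#   run;

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

positions = spark.table("portfolio_positions")  # assumed registered table

summary = (
    positions
    .groupBy("asset_class")
    .agg(
        F.sum("market_value").alias("total_market_value"),
        F.avg("market_value").alias("avg_market_value"),
        F.count("*").alias("position_count"),
    )
)

summary.show()
```

In a Databricks notebook the spark session is provided automatically; the explicit builder call here only keeps the sketch self-contained.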
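Similarly, a minimal Snowpark sketch of the kind of query work mentioned above. The connection parameters, the TRADES table, and its columns are placeholders, not an actual schema.

```python
# Hypothetical sketch: a filtered, grouped aggregate over a large Snowflake
# table using Snowpark. Connection details and all names are placeholders.

from snowflake.snowpark import Session
from snowflake.snowpark import functions as F

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

trades = session.table("TRADES")  # assumed table name

summary = (
    trades
    .filter(F.col("TRADE_DATE") >= "2024-01-01")  # pushed down to Snowflake
    .group_by("DESK")
    .agg(F.sum("NOTIONAL").alias("TOTAL_NOTIONAL"))
)

summary.show()
```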
Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 5-8 years of hands-on experience with the Databricks workspace, Databricks notebooks, and job clusters.
- Strong experience in PySpark and Python coding within the Databricks environment.
- Proven experience in performance tuning and optimizing code in Databricks.
- Familiarity with SAS and experience supporting migration to Databricks.
- Strong working knowledge of Snowflake and Snowpark for data querying and manipulation.
- Experience with version control systems, particularly GitHub.
- Experience with the Airflow scheduler is preferred.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills, with the ability to mentor and train junior team members.
Tools and Technologies:
- Databricks
- PySpark
- Python
- Snowflake and Snowpark
- GitHub
- Airflow (preferred)
- SAS (for migration purposes)