Job Description

Role Overview
We are seeking an experienced Data Engineer with strong expertise in SQL, Python, PySpark, Airflow, Trino, and Hive to design, develop, and optimize our data pipelines. The role involves working with large flat-file datasets, orchestrating workflows with Airflow, transforming data with Spark, and loading the final layers into Snowflake for analytics and reporting.

Key Responsibilities
- Data Pipeline Development: Build, maintain, and optimize data pipelines, using Airflow for orchestration and scheduling.
- Data Ingestion & Transformation: Ingest and transform flat files (CSV, JSON, mainframe-based files) accurately.
- Spark-Based Processing: Use PySpark for large-scale data processing, implementing custom user-defined functions (UDFs) where needed.
- SQL Development: Create, optimize, and maintain SQL scripts for data manipulation, reporting, and analytics.
- Snowflake Data Integration: Load and manage the final processed data layers in Snowflake.
- Data Quality & Metrics: Implement checks for file-size limits and data consistency, and track daily metrics.
- Collaboration & Requirements Gathering: Work with business and technical teams to understand requirements and deliver efficient data solutions.

Required Skills & Experience
- Proficiency in SQL (query optimization, joins, indexing).
- Strong Python programming skills, including writing reusable functions.
- Hands-on experience with PySpark (adding columns, transformations, caching, UDFs).
- Proficiency in Airflow for workflow orchestration.
- Familiarity with the Trino and Hive query engines.
- Experience with flat-file formats (CSV, JSON, mainframe-based files) and data-parsing strategies.
- Understanding of data normalization, unique constraints, and caching strategies.
- Experience with Snowflake or other cloud data warehouses.

Preferred Qualifications
- Knowledge of performance tuning in Spark and SQL.
- Understanding of data governance and security best practices.
- Experience with processing large files (including maximum-size handling).
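The custom PySpark UDF work mentioned in this posting usually starts from a plain, reusable Python function. The sketch below is hypothetical (the name `mask_account` and the masking rule are illustrative, not from this role); the commented lines show how such a function is typically registered as a Spark UDF.

```python
# Hypothetical example: masking account IDs during transformation.
def mask_account(acct: str) -> str:
    """Keep the last 4 characters of an account ID, mask the rest."""
    if not acct:
        return acct
    return "*" * max(len(acct) - 4, 0) + acct[-4:]

# In a PySpark job, the plain function above would be registered as a UDF
# and used to add a column, e.g. (assumes a SparkSession and DataFrame df):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   mask_udf = udf(mask_account, StringType())
#   df = df.withColumn("acct_masked", mask_udf(df["acct"]))
```

Keeping the logic in a plain function, as here, makes it unit-testable without a Spark cluster.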
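For the SQL skills listed above (joins, indexing, aggregation), here is a small self-contained illustration using Python's built-in sqlite3 module as a stand-in engine; the table names and data are invented, and production queries would run on Trino, Hive, or Snowflake instead.

```python
import sqlite3

# In-memory SQLite database standing in for warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    -- Index the join key on the larger side so lookups avoid a full scan.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Join on the indexed key and aggregate per region.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
# rows -> [('APAC', 40.0), ('EMEA', 124.0)]
```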
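A file-size and consistency check of the kind described under Data Quality & Metrics might look like the sketch below. The 100 MB cap, the function name `validate_flat_file`, and the returned metric fields are assumptions for illustration, not requirements of this role.

```python
import csv
import os

MAX_FILE_BYTES = 100 * 1024 * 1024  # assumed 100 MB cap for illustration

def validate_flat_file(path: str, expected_columns: list[str],
                       max_bytes: int = MAX_FILE_BYTES) -> dict:
    """Check a CSV file's size and header, and count its data rows.

    Returns a small metrics dict suitable for daily metric tracking.
    """
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes} limit")
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != expected_columns:
            raise ValueError(f"unexpected header in {path}: {header}")
        row_count = sum(1 for _ in reader)
    return {"path": path, "bytes": size, "rows": row_count}
```

In an Airflow pipeline, a check like this would typically run as its own task ahead of the Spark transformation, so bad files fail fast.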