Job Description
Responsibilities
- Design, Develop, and Maintain ETL Pipelines: Create, optimize, and manage Extract, Transform, Load (ETL) processes using Python scripts and Pentaho Data Integration (Kettle) to move and transform data from various sources into target systems (e.g., data warehouses, data lakes).
- Data Quality Assurance: Implement rigorous data validation, cleansing, and reconciliation procedures to ensure the accuracy, completeness, and consistency of data.
- Data Sourcing and Integration: Work with diverse data sources, including relational databases (SQL Server, MySQL, PostgreSQL), flat files (CSV, Excel), APIs, and cloud platforms.
- Performance Optimization: Identify and implement improvements to existing ETL processes to enhance data load times, efficiency, and scalability.
- Troubleshooting and Support: Diagnose and resolve data-related issues, ensuring data integrity and timely availability for reporting and analysis.
- Documentation: Create and maintain comprehensive documentation for all ETL processes, data flows, and data dictionaries.
- Collaboration: Work closely with data engineers, data scientists, business analysts, and other stakeholders to understand data requirements and deliver robust data solutions.
- Ad-hoc Analysis: Perform ad-hoc data analysis and provide insights to support business decisions as needed.

About the Role:
We are looking for a skilled and passionate Data Engineer with 3 to 4 years of experience in building robust ETL pipelines using both visual ETL tools (preferably Kettle/Pentaho) and Python-based frameworks. You will be responsible for designing, developing, and maintaining high-quality data workflows that support our data platforms and reporting environments.

Key Responsibilities:
- Design, develop, and maintain ETL pipelines using Kettle (Pentaho) or similar tools.
- Build data ingestion workflows using Python (Pandas, SQLAlchemy, psycopg2), as illustrated in the Python sketch at the end of this posting.
- Extract data from relational and non-relational sources (APIs, CSV, databases).
- Perform complex transformations and ensure high data quality.
- Load processed data into target systems such as PostgreSQL, Snowflake, or Redshift.
- Implement monitoring, error handling, and logging for all ETL jobs.
- Maintain job orchestration via shell scripts, cron, or workflow tools such as Airflow, as illustrated in the scheduling sketch at the end of this posting.
- Work with stakeholders to understand data needs and deliver accurate, timely data.
- Maintain documentation for pipelines, data dictionaries, and metadata.

Requirements:
- 3 to 4 years of experience in Data Engineering or ETL development.
- Hands-on experience with Kettle (Pentaho Data Integration) or similar ETL tools.
- Strong proficiency in Python (including pandas, requests, datetime, etc.).
- Strong SQL knowledge and experience with relational databases (PostgreSQL, SQL Server, etc.).
- Experience with source control (Git), scripting (Shell/Bash), and config-driven ETL pipelines.
- Good understanding of data warehousing concepts, performance optimization, and incremental loads.
- Familiarity with REST APIs, JSON, XML, and flat file processing.

Good to Have:
- Experience with job scheduling tools (e.g., Airflow, Jenkins).
- Familiarity with cloud platforms (AWS, Azure, or GCP).
- Knowledge of Data Lakes, Big Data, or real-time streaming tools.
- Experience working in Agile/Scrum environments.

Soft Skills:
- Strong analytical and problem-solving skills.
- Self-motivated and able to work both independently and in a team.
- Good communication skills with technical and non-technical stakeholders.

Industry: Software Development
Employment Type: Full-time
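
To give a feel for the Python-based ingestion work listed under Key Responsibilities, here is a minimal sketch of an extract-cleanse-load job built with pandas and SQLAlchemy. The file path, column names, table name, and connection string are hypothetical placeholders, not details from this posting.

```python
# Minimal ETL sketch: extract a flat file, apply light cleansing, load into PostgreSQL.
# All names (orders.csv, order_id, staging_orders, the connection string) are hypothetical.
import logging

import pandas as pd
from sqlalchemy import create_engine

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl.orders")


def run_job(csv_path: str, db_url: str, table: str) -> None:
    """Extract a CSV, validate/cleanse it, and append it to a PostgreSQL table."""
    engine = create_engine(db_url)  # e.g. postgresql+psycopg2://user:pass@host/dwh
    try:
        # Extract
        df = pd.read_csv(csv_path, parse_dates=["order_date"])

        # Transform: basic validation and cleansing
        df = df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce").fillna(0)

        # Load
        df.to_sql(table, engine, if_exists="append", index=False)
        log.info("Loaded %d rows into %s", len(df), table)
    except Exception:
        # Logging and error handling as called for in the job requirements
        log.exception("ETL job failed for %s", csv_path)
        raise


if __name__ == "__main__":
    run_job("orders.csv", "postgresql+psycopg2://etl_user:secret@localhost/dwh", "staging_orders")
```

In practice the path, table, and credentials would come from a config file or environment variables rather than being hard-coded, in line with the config-driven pipelines mentioned in the requirements.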
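Similarly, the job orchestration responsibility could be met with cron or a workflow tool. The following is a minimal Airflow DAG sketch, assuming Airflow 2.x; the DAG id, schedule, and task callables are assumptions for illustration only.

```python
# Minimal Airflow 2.x DAG sketch for scheduling a nightly ETL run.
# The DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # placeholder: pull data from source systems


def transform():
    ...  # placeholder: apply cleansing and business rules


def load():
    ...  # placeholder: write to the target warehouse


with DAG(
    dag_id="nightly_orders_etl",       # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",     # run daily at 02:00
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load  # enforce extract -> transform -> load order
```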