Job Description
As an ETL Data Engineer at our company, your primary role will be to re-engineer existing data pipelines to extract data from a new source (PostgreSQL / CURA system) instead of the current Microsoft SQL Server (CRM persistence store). Your focus will be on ensuring a seamless migration that maintains data quality, system performance, and end-user experience.

Key Responsibilities:
- Analyze existing ETL pipelines and their dependencies on Microsoft SQL Server as the source system.
- Design and implement modifications to redirect ETL extractions to PostgreSQL (CURA) while preserving the current transformations and the load logic into Elasticsearch and MongoDB.
- Ensure end-to-end data integrity, quality, and freshness after the source switch.
- Write efficient SQL queries for data extraction from the new source.
- Conduct performance testing to confirm that pipeline throughput and latency do not degrade in production.
- Collaborate with DevOps and platform teams to containerize, orchestrate, and deploy updated ETLs using Docker and Kubernetes.
- Monitor post-deployment performance and proactively address any production issues.
- Document design, code, data mappings, and operational runbooks.

Qualifications Required:
- Strong experience in building and maintaining large-scale distributed data systems.
- Expert-level proficiency in Python, especially in data analysis/manipulation libraries such as pandas, NumPy, and Polars.
- Advanced SQL development skills with a track record in performance optimization.
- Working knowledge of Docker and Kubernetes.
- Familiarity with Elasticsearch and MongoDB as data stores.
- Experience working in production environments with mission-critical systems.

Should you choose to join us, you will be part of a team that values innovation, collaboration, and excellence in data engineering practices.