We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have a passion for building and optimizing robust data pipelines, managing real-time data flows, and developing solutions that enable data-driven decision-making. If you thrive in a fast-paced environment and are eager to tackle complex data challenges, we want to hear from you!
Role and Responsibilities:
- Design, develop, and maintain scalable, high-performance ETL/ELT pipelines for batch and real-time data processing from diverse sources.
- Develop and implement APIs and webhooks for seamless and secure data ingestion and consumption by various internal and external systems.
- Champion real-time data management strategies, including stream processing, ensuring low latency and high availability of data for critical applications.
- Utilize advanced Python programming skills (including asynchronous programming and custom library development) to build efficient data transformation, validation, and enrichment logic; a sketch of this kind of work follows this list.
- Work extensively with cloud platforms (preferably AWS) to architect and manage data infrastructure, including services like AWS Kinesis/Kafka, Lambda, Glue, S3, Redshift/Snowflake, and API Gateway.
- Implement and manage data warehousing solutions, ensuring optimal performance and accessibility for analytics and reporting.
- Develop and maintain robust data quality frameworks and monitoring systems to ensure data accuracy, completeness, and consistency across all pipelines.
- Optimize existing data workflows and database queries for enhanced performance and efficiency, targeting measurable reductions in data processing times and resource utilization.
- Collaborate with data scientists, analysts, software engineers, and business stakeholders to understand data requirements and deliver effective data solutions.
- Implement data governance and security best practices to ensure data is handled responsibly and in compliance with relevant regulations.
- Contribute to the design and implementation of data models for both transactional (OLTP) and analytical (OLAP) systems.
- Explore and integrate new data technologies and tools to enhance our data capabilities.
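To give candidates a concrete feel for the asynchronous Python and stream-processing work described above, here is a minimal, hypothetical sketch. The event fields, validation rules, and in-memory queue (standing in for a broker such as Kinesis or Kafka) are illustrative assumptions, not a description of our production stack.

```python
# Sketch of an async validate -> enrich -> sink stream-processing loop.
# An asyncio.Queue stands in for a real source such as Kinesis or Kafka;
# the event shape and rules below are illustrative assumptions.
import asyncio
import json
from datetime import datetime, timezone


def validate(event: dict) -> bool:
    """Reject events missing the fields downstream consumers rely on."""
    return "user_id" in event and isinstance(event.get("amount"), (int, float))


def enrich(event: dict) -> dict:
    """Attach a processing timestamp and a derived field."""
    event["processed_at"] = datetime.now(timezone.utc).isoformat()
    event["amount_cents"] = int(round(event["amount"] * 100))
    return event


async def producer(queue: asyncio.Queue) -> None:
    """Stand-in for a streaming source; emits a few sample events."""
    samples = [
        {"user_id": 1, "amount": 9.99},
        {"user_id": 2},  # invalid: missing amount, will be dropped
        {"user_id": 3, "amount": 0.5},
    ]
    for event in samples:
        await queue.put(event)
    await queue.put(None)  # sentinel marking end of stream


async def consumer(queue: asyncio.Queue) -> None:
    """Validate and enrich each event, then write to a sink (stdout here)."""
    while True:
        event = await queue.get()
        if event is None:
            break
        if not validate(event):
            print(f"dropped invalid event: {event!r}")
            continue
        print(json.dumps(enrich(event)))


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    await asyncio.gather(producer(queue), consumer(queue))


if __name__ == "__main__":
    asyncio.run(main())
```

In a production pipeline, the in-memory producer would be replaced by a consumer for the actual broker, and the stdout sink by a writer to a warehouse or downstream stream.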
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- 2-5 years of hands-on experience as a Data Engineer or in a similar role.
- Expert proficiency in Python for data engineering tasks (e.g., Pandas, PySpark, data manipulation libraries) and experience with software development best practices (version control, testing, CI/CD).
- Proven experience in designing, building, and deploying real-time data pipelines using technologies like Kafka, AWS Kinesis, Apache Flink, or similar.
- Strong experience in creating, deploying, and managing RESTful APIs and webhooks for data exchange, with a focus on security and scalability; a sketch of a minimal webhook receiver follows this list.
- In-depth knowledge of SQL databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, DynamoDB).
- Hands-on experience with cloud data services (AWS, Azure, or GCP). Specific AWS experience with Glue, Lambda, S3, EC2, RDS, and API Gateway is highly desirable.
- Solid understanding of data warehousing concepts, ETL/ELT processes, and data modeling techniques.
- Familiarity with containerization technologies (Docker) and orchestration tools (Kubernetes) is a plus.
- Excellent problem-solving skills and the ability to work independently as well as in a collaborative team environment.
- Strong communication skills, with the ability to articulate complex technical concepts to non-technical stakeholders.
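As a purely illustrative example of the secure API and webhook skills listed above, here is a minimal sketch of a signed-webhook receiver. It assumes FastAPI and an HMAC-SHA256 signature sent in an `X-Signature` header; the endpoint path, header name, and secret handling are hypothetical choices for the sketch, not a prescribed design.

```python
# Sketch of a webhook receiver that verifies an HMAC-SHA256 signature
# over the raw request body before accepting the payload.
import hashlib
import hmac
import os

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
# In production the secret would come from a secrets manager, not an env var.
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "dev-only-secret").encode()


@app.post("/webhooks/events")
async def receive_event(request: Request, x_signature: str = Header(default="")):
    body = await request.body()
    # Recompute the signature over the raw body with the shared secret
    # and compare in constant time to resist timing attacks.
    expected = hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_signature):
        raise HTTPException(status_code=401, detail="invalid signature")
    # In a real pipeline the payload would be enqueued for downstream
    # processing; here we simply acknowledge receipt.
    return {"status": "accepted", "bytes": len(body)}
```

Run locally with, e.g., `uvicorn main:app` (assuming the file is saved as main.py). Verifying the signature against the raw body before parsing is what keeps an endpoint like this resistant to forged payloads.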
Preferred Qualifications:
- Experience with data visualization tools (e.g., Tableau, Power BI, Looker).
- Knowledge of machine learning concepts and MLOps.
- Contributions to open-source data engineering projects.
- Relevant certifications (e.g., AWS Certified Data Analytics Specialty, AWS Certified Solutions Architect).