Job Description
As a Mid-Level Data Engineer at Thinkproject, you will be at the forefront of building, optimizing, and maintaining reliable data pipelines and workflows. Here is a glimpse of what your role entails and the qualifications required to excel in this position:

Role Overview:
You will play a crucial role in designing scalable data processing systems, handling both structured and unstructured data sources, and supporting data needs across the organization in a cloud-centric environment.

Key Responsibilities:
- Design, implement, and optimize data pipelines (batch and streaming) for efficient data ingestion, transformation, and storage.
- Ensure high data integrity through monitoring, alerting, testing, and validation mechanisms.
- Develop data solutions using cloud-native services (preferably GCP or AWS) to support analytics and ML workloads.
- Contribute to the design of data warehouses and lakehouse architectures for optimized storage and query performance.
- Automate routine tasks such as data loading, validation, and transformation using scripting languages and workflow orchestration tools.
- Collaborate with data analysts, scientists, and application engineers to facilitate data access and insights across teams.
- Maintain clear documentation and adhere to best practices for reproducibility, scalability, and security.

Qualifications Required:
- Education: Bachelor's degree in Computer Science, Data Engineering, or a related technical field.
- Experience: 2-4 years of hands-on experience in data engineering or backend development with a strong focus on data.
- Programming Languages: Proficiency in Python and SQL; knowledge of Java or Scala is a plus.
- Cloud Platforms: Practical experience with at least one cloud platform (GCP, AWS, Azure) and familiarity with GCP services like BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
- Data Technologies: Working knowledge of Apache Spark, Hadoop, and Airflow or Composer, and an understanding of data warehousing concepts.
- Databases: Experience with relational databases (PostgreSQL, MySQL) and exposure to NoSQL databases (MongoDB, Bigtable, etc.).
- Version Control & CI/CD: Proficiency with Git and familiarity with CI/CD tools and workflows.
- Data Processing: Ability to work with large, complex datasets and apply data cleaning, parsing, and transformation techniques.

In addition to the technical skills, the following qualifications are preferred:
- Experience in building and managing ETL/ELT pipelines in a production environment.
- Exposure to modern data lake/lakehouse architectures (e.g., Delta Lake, Iceberg).
- Understanding of data governance, metadata management, and security best practices.
- Familiarity with orchestration tools (e.g., Apache Airflow, Dagster, Prefect).
- Strong communication and documentation skills.
- A curious, proactive mindset and the ability to quickly learn new tools.

Join Thinkproject and be part of a passionate team that values mutual respect and invests in the growth of its employees. Enjoy various benefits such as Lunch "n" Learn Sessions, Women's Network, LGBTQIA+ Network, Social Events, and more. Your feedback matters, and we strive to create a fantastic culture together.

To apply for this role, please submit your application, including salary expectations and potential date of entry. Shape your career with Thinkproject and think ahead.

Your contact: Gagandeep Virdi
Working at thinkproject.com - think career, think ahead.