Role Description
Role Proficiency: This role requires proficiency in data pipeline development, including coding and testing data pipelines for ingesting, wrangling, transforming, and joining data from various sources. Must be adept at using ETL tools such as Informatica, Glue, Databricks, and DataProc, with coding skills in Python, PySpark, and SQL. Works independently and demonstrates proficiency in at least one data-related domain, with a solid understanding of SCD (slowly changing dimension) concepts and data warehousing principles.
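For illustration only, and not part of the role description itself: a minimal PySpark sketch of an SCD Type 2 update, assuming hypothetical dim_customer and customer_updates tables with customer_id, address, is_current, valid_from, and valid_to columns (surrogate key handling is omitted for brevity).

```python
# Hedged SCD Type 2 sketch: expire changed dimension rows and append new versions.
# All table and column names below are hypothetical, chosen only for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

cur = spark.table("dw.dim_customer").filter("is_current = true").alias("cur")
inc = spark.table("staging.customer_updates").alias("inc")

# Rows whose tracked attribute changed since the last load.
changed = (
    cur.join(inc, F.col("cur.customer_id") == F.col("inc.customer_id"))
       .filter(F.col("cur.address") != F.col("inc.address"))
)

# Close out the old versions...
expired = (
    changed.select("cur.*")
           .withColumn("is_current", F.lit(False))
           .withColumn("valid_to", F.current_date())
)

# ...and open new current versions from the incoming snapshot.
new_versions = (
    changed.select("inc.*")
           .withColumn("is_current", F.lit(True))
           .withColumn("valid_from", F.current_date())
           .withColumn("valid_to", F.lit(None).cast("date"))
)

# Writing back (union with untouched rows and overwrite, or an equivalent MERGE)
# depends on the target warehouse and is left out of this sketch.
```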
JD For Data Engineer
This role will be part of the UST Data Science team, which has achieved great recognition and results in its short life.
- The Data Engineer will engage with external Clients and internal customers, understand their needs, and design, build, and maintain data pipelines and infrastructure using Google Cloud Platform (GCP).
- This will involve the design and implementation of scalable data architectures, ETL processes, and data warehousing solutions on GCP.
- The role requires expertise in big data technologies, cloud computing, and data integration, as well as the ability to optimize data systems for performance and reliability.
- This requires a blend of skills including programming, database management, cloud infrastructure, and data pipeline development.
- Additionally, problem-solving skills, attention to detail, and the ability to work in a fast-paced environment are valuable traits.
- You will frequently work as part of a scrum team, together with data scientists, ML engineers, and analyst developers, to design and implement robust data infrastructure that supports analytics and machine learning initiatives.
- Technical Skills:
- Mandatory Skills: BigQuery, ETL, Data Management, Python, SQL, Kubernetes, Cloud Computing (GCP, Azure)
- Optional Skills: Cloud Storage, Hadoop, Kafka
- Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL processes using GCP services such as Cloud Dataflow, Cloud Dataproc, and BigQuery (see the illustrative sketch after this list).
- Implement and optimize data storage solutions using GCP technologies like Cloud Storage, Cloud SQL, and Cloud Spanner.
- Develop and maintain data warehouses and data lakes on GCP, ensuring data quality, accessibility, and security.
- Collaborate with data scientists and analysts to understand data requirements and provide efficient data access solutions.
- Implement data governance and security measures to ensure compliance with regulations and best practices.
- Automate data workflows and implement monitoring and alerting systems for data pipelines.
- Share data engineering knowledge with the wider functions and develop reusable data integration patterns and best practices.
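As referenced above, a minimal, hedged sketch of one such pipeline step using the google-cloud-bigquery Python client; the bucket, project, dataset, and table names are hypothetical and only illustrate the shape of a Cloud Storage to BigQuery load.

```python
# Illustrative only: load a CSV file from Cloud Storage into a BigQuery table.
# The URIs and table IDs are hypothetical placeholders, not values from the JD.
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials on GCP

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # infer the schema from the file
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/raw/orders.csv",  # hypothetical source file
    "example-project.analytics.orders",    # hypothetical destination table
    job_config=job_config,
)
load_job.result()  # block until the load job finishes (raises on failure)

table = client.get_table("example-project.analytics.orders")
print(f"Loaded table now has {table.num_rows} rows")
```

In a production pipeline, a step like this would typically be scheduled by an orchestrator and wrapped with the monitoring and alerting described above.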
- Skills/Experience:
- BSc/MSc in Computer Science, Information Systems, or related field, or equivalent work experience.
- Proven experience (5+ years) as a Data Engineer or similar role, preferably with GCP expertise.
- Strong proficiency in SQL and experience with NoSQL databases.
- Expertise in data modeling, ETL processes, and data warehousing concepts.
- Significant experience with GCP services such as BigQuery, Dataflow, Dataproc, Cloud Storage, and Pub/Sub.
- Proficiency in at least one programming language (e.g., Python, Java, or Scala) for data pipeline development.
- Experience with big data technologies such as Hadoop, Spark, and Kafka.
- Knowledge of data governance, security, and compliance best practices.
- GCP certifications (e.g., Professional Data Engineer) are highly advantageous.
- Effective communication skills to collaborate with cross-functional teams and explain technical concepts to non-technical stakeholders.
Skills
BigQuery, GCP, Python, Data Management