Key Responsibilities
Design, develop, and maintain scalable and resilient data pipelines (batch, streaming) using modern orchestration and ETL/ELT frameworks.
Build and optimize data lakes, data warehouses, and real-time data systems to support analytics, reporting, and AI workflows.
Define and implement data modeling, schema evolution, governance, and quality standards across structured and unstructured sources.
Partner with platform, product, and AI/ML teams to ensure availability, traceability, and performance of data systems in production.
Implement observability, logging, alerting, and lineage tools for data flows and data infrastructure health.
Integrate with cloud-native and third-party APIs, databases, and message buses to ingest and transform diverse data sources.
Drive adoption of best practices in security, cost-efficiency, testing, and CI/CD for data infrastructure.
Required Skills & Qualifications
Bachelor's degree with 1-2 years of experience, or a master's degree, in computer science, data engineering, or a related field.
Previous experience in data engineering or backend systems development.
Experience designing and operating data pipelines using tools such as Apache Airflow, dbt, Dagster, or Prefect.
Proficiency in distributed data processing (e.g., Spark, Flink, Kafka Streams) and SQL engines (Presto, Trino, BigQuery, Snowflake).
Deep understanding of data modeling, partitioning, columnar storage formats (Parquet, ORC), and schema design.
Strong programming skills in Python, Java, or Scala and familiarity with containerization and cloud infrastructure (AWS/GCP/Azure).
Hands-on experience with data governance, access control, and sensitive data handling is preferred.
Comfortable working in agile, MLOps-driven, cross-functional engineering teams.
Resilient under pressure.
Team player.
Proficient in spoken English.