Role & responsibilities
- Design, build, and maintain robust data ingestion and transformation pipelines using Python, SQL, and dbt.
- Orchestrate workflows using Airflow / Prefect / Dagster for automation, scheduling, and dependency management.
- Manage data pipelines across environments, ensuring reliability, scalability, and version control (GitOps best practices).
- Collaborate with analysts, ML engineers, and backend teams to design high-quality data models supporting analytics and AI use cases.
- Implement data validation, lineage, and observability frameworks (Great Expectations, OpenMetadata, DataHub).
- Champion data quality, lineage, and observability as first-class citizens in every pipeline.
- Optimize query performance and resource utilization across warehouses such as Snowflake, Redshift, or BigQuery.
- Build and maintain real-time and near-real-time data streams (Kafka / Kinesis / Spark Streaming).
- Integrate data from third-party APIs, SaaS platforms, and on-premise systems into unified data lakes.
- Deploy and monitor data pipelines using Docker, Kubernetes, and CI/CD pipelines.
- Implement monitoring and alerting for pipeline performance using Grafana, CloudWatch, or Prometheus.
- Participate in architecture reviews, ensuring adherence to data engineering best practices and security standards.
- Stay current with new technologies and continuously enhance data reliability and efficiency.

Preferred candidate profile
- Strong programming skills in Python (Pandas, SQLAlchemy, PySpark) and advanced SQL (CTEs, window functions, performance tuning).
- Hands-on experience with dbt for data transformation and Airflow / Prefect / Dagster for orchestration.
- Deep understanding of cloud data warehouses (Snowflake, Redshift, BigQuery, Databricks).
- Familiarity with AWS / GCP data services (Glue, Lambda, S3, Dataflow, Pub/Sub).
- Knowledge of data modeling (Star / Snowflake schema) and ETL optimization principles.
- Working knowledge of Docker, Git, and CI/CD tools (GitHub Actions, GitLab CI, Jenkins).
- Good problem-solving, debugging, and collaboration skills.

Good to Have
- Experience with real-time streaming systems (Kafka, Kinesis, Flink).
- Familiarity with lakehouse architectures (Delta Lake, Iceberg, Hudi).
- Exposure to Terraform / Kubernetes for infrastructure automation.
- Awareness of feature stores and vector databases (pgvector, Pinecone) used in AI pipelines.
- Understanding of data governance and observability tools (Great Expectations, OpenMetadata).
- Basic understanding of Power BI, Looker, or Tableau for downstream analytics.