Job Location: Vapi, Gujarat
On-site Role
About the Role
We are seeking a skilled and motivated Data Engineer to join our growing technology team. The role involves building and maintaining scalable, reliable, and secure data infrastructure to support analytics, data-driven decision-making, and AI/ML pipelines. You'll work with diverse data types and modern data platforms to design efficient data pipelines and ensure smooth data flow across systems.
Key Responsibilities
- Design, develop, and maintain robust ETL/ELT pipelines for structured and unstructured data using tools like Apache NiFi, Airflow, or Dagster.
- Build streaming and event-driven data pipelines using Kafka, RabbitMQ, or similar systems.
- Design and manage scalable data lakes (e.g., Apache Hudi, Iceberg, Delta Lake) over Amazon S3 or MinIO.
- Implement and optimize distributed databases such as Cassandra, MongoDB, ClickHouse, and Elasticsearch.
- Ensure data quality, monitoring, and observability across all data pipeline components.
- Work with query engines like Trino for federated data access.
- Manage data versioning and reproducibility using tools like DVC.
- Perform data migrations, query optimization, and system performance tuning.
- Collaborate with analytics, product, and AI teams to provide clean and well-structured datasets.
Must-Have Skills & Experience
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 4-6 years of experience as a Data Engineer or in a similar role.
- Strong proficiency in Python and SQL.
- Hands-on experience with ETL orchestration tools (Airflow, NiFi, Dagster).
- Familiarity with data lakes, streaming platforms, and distributed databases.
- Experience working with cloud/object storage (Amazon S3, MinIO).
- Knowledge of data governance, security, and pipeline observability.
Good-to-Have Skills
- Experience with time-series databases (InfluxDB, TimescaleDB, QuestDB).
- Familiarity with graph databases (Neo4j, OrientDB, or RavenDB).
- Understanding of MLOps, feature stores, or data lifecycle automation.
- Exposure to Elasticsearch for indexing and search use cases.
- Experience in query performance tuning and data migration strategies.