Job Description
Role Overview:
You will be joining Fluent Health, a dynamic healthcare startup that is revolutionizing healthcare management for individuals and families. As a Senior Data Engineer, you will be a key player in designing, implementing, and optimizing the analytical and real-time data platform. Your role will involve a combination of hands-on data engineering and high-level architectural thinking to build scalable data infrastructure, with ClickHouse as a central component of the analytics and data warehousing strategy.

Key Responsibilities:
- Architecture & Strategy:
  - Take ownership of the target data architecture, focusing on ClickHouse for large-scale analytical and real-time querying workloads.
  - Define and maintain a scalable and secure data platform architecture to support real-time analytics, reporting, and ML applications.
  - Establish data governance and modeling standards, ensuring data lineage, integrity, and security practices are adhered to.
  - Evaluate and integrate complementary technologies into the data stack, such as message queues, data lakes, and orchestration frameworks.
- Data Engineering:
  - Design, develop, and maintain robust ETL/ELT pipelines to ingest and transform data from various sources into the data warehouse.
  - Optimize ClickHouse schema and query performance for real-time and historical analytics workloads.
  - Develop data APIs and interfaces for product and analytics teams to interact with the data platform.
  - Implement monitoring and observability tools to ensure pipeline reliability and data quality.
- Collaboration & Leadership:
  - Collaborate with data consumers to understand data needs and translate them into scalable solutions.
  - Work with security and compliance teams to implement data privacy, classification, retention, and access control policies.
  - Mentor junior data engineers and contribute to hiring efforts as the team scales.

Qualifications Required:
- Strong proficiency in SQL and Python.
- Hands-on experience with modern data technologies like PostgreSQL, MongoDB, BigQuery, and ideally ClickHouse.
- Expert-level proficiency in ClickHouse or similar columnar databases (e.g., BigQuery, Druid, Redshift).
- Proven experience in designing and operating scalable data warehouse and data lake architectures.
- Deep understanding of data modeling, partitioning, indexing, and query optimization techniques.
- Strong experience in building ETL/ELT pipelines using tools like Airflow, dbt, or custom frameworks.
- Familiarity with stream processing and event-driven architectures (e.g., Kafka, Pub/Sub).
- Proficiency in SQL and at least one programming language like Python, Scala, or Java.
- Experience with data governance, compliance frameworks (e.g., HIPAA, GDPR), and data cataloging tools.
- Knowledge of real-time analytics use cases and streaming architectures.
- Familiarity with machine learning pipelines and integrating data platforms with ML workflows.
- Experience working in regulated or high-security domains like health tech, fintech, or product-based companies.
- Ability to translate business needs into technical architecture and strong communication skills to align cross-functional stakeholders clearly.
- Self-motivated and comfortable in a fast-paced, startup-like environment.