Remote
Part Time
We are looking for a Data Engineer to design, build, and operate the company’s data platform on Google Cloud Platform (GCP). You will own event ingestion, streaming pipelines, lakehouse design, operational data stores, identity resolution plumbing, and analytics serving, working closely with platform engineers, AI engineers, and product leaders.
This is not a dashboard-only analytics role. It is a systems-level data engineering role focused on real-time decisioning, correctness, and scale.
You will be responsible for implementing and operating:
Event Ingestion & Streaming
Pub/Sub-based event transport with schema-enforced Protobuf envelopes
Streaming pipelines using Dataflow (Apache Beam)
Targeted Apache Flink pipelines for stateful use cases (identity stitching, rolling counters)
DLQ and quarantine workflows with replay and backfill support
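For illustration, a minimal Apache Beam (Python) sketch of this ingestion shape: read raw bytes from Pub/Sub, parse a Protobuf envelope, and route unparseable payloads to a dead-letter topic. The `event_envelope_pb2` module and the resource paths are assumptions, not the platform's actual code.

```python
# Minimal sketch, assuming a generated Protobuf module `event_envelope_pb2`
# (hypothetical) and illustrative Pub/Sub resource names.
import apache_beam as beam
from apache_beam import pvalue
from apache_beam.options.pipeline_options import PipelineOptions

from schemas import event_envelope_pb2  # hypothetical generated Protobuf module


class ParseEnvelope(beam.DoFn):
    """Parse raw Pub/Sub bytes into an envelope; tag failures for the DLQ."""

    def process(self, raw_bytes):
        try:
            envelope = event_envelope_pb2.EventEnvelope()
            envelope.ParseFromString(raw_bytes)
            yield envelope
        except Exception:
            # Quarantine unparseable payloads so they can be replayed later.
            yield pvalue.TaggedOutput("dlq", raw_bytes)


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        results = (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/PROJECT/subscriptions/events-sub")
            | "ParseEnvelope" >> beam.ParDo(ParseEnvelope()).with_outputs(
                "dlq", main="parsed")
        )
        # Valid envelopes continue into downstream transforms (not shown).
        _ = results.parsed
        # Bad messages land on a dead-letter topic for quarantine and replay.
        _ = results.dlq | "WriteDLQ" >> beam.io.WriteToPubSub(
            topic="projects/PROJECT/topics/events-dlq")
```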
Lakehouse & Warehousing
Apache Iceberg on GCS as the canonical event log
BigQuery for curated analytics, attribution, matchback, and model serving
Partitioning, compaction, and retention strategies for large-scale datasets
dbt-based transformations and data contracts
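On the BigQuery side, partitioning and clustering are the primary levers for cost control and retention. A sketch with the google-cloud-bigquery client, using an assumed table name, schema, and retention window:

```python
# Minimal sketch; project, dataset, schema, and retention values are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "PROJECT.analytics.events_curated",
    schema=[
        bigquery.SchemaField("event_id", "STRING"),
        bigquery.SchemaField("tenant_id", "STRING"),
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("payload", "JSON"),
    ],
)
# Partition by event date and cluster by tenant to bound scan cost per query.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
    expiration_ms=400 * 24 * 60 * 60 * 1000,  # assumed ~400-day retention
)
table.clustering_fields = ["tenant_id"]

client.create_table(table, exists_ok=True)
```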
Operational Data Stores
Bigtable for low-latency operational KV (identity mappings, counters, context snapshots)
Redis for hot caches and TTL-based session state
AlloyDB as the system of record for tenants, campaigns, budgets, and consent state
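In practice these stores often sit in a cache-aside arrangement: Bigtable holds the durable low-latency mapping, and Redis fronts it with a TTL. A rough sketch, with instance, table, and key names assumed for illustration:

```python
# Sketch of a cache-aside identity lookup; all names and the TTL are illustrative.
import redis
from google.cloud import bigtable

SESSION_TTL_SECONDS = 1800  # assumed 30-minute hot-cache window

bt_client = bigtable.Client(project="PROJECT")
identity_table = bt_client.instance("ops-instance").table("identity_mappings")
cache = redis.Redis(host="localhost", port=6379)  # Memorystore in practice


def resolve_identity(anonymous_id: str) -> bytes | None:
    """Return the canonical id for an anonymous id, checking the hot cache first."""
    cache_key = f"id:{anonymous_id}"
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    # Fall back to the durable Bigtable mapping.
    row = identity_table.read_row(anonymous_id.encode("utf-8"))
    if row is None:
        return None
    canonical_id = row.cells["ids"][b"canonical"][0].value

    # Populate the cache with a TTL so stale mappings age out on their own.
    cache.setex(cache_key, SESSION_TTL_SECONDS, canonical_id)
    return canonical_id
```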
Knowledge Graph & Context
Neo4j AuraDB for campaign context, constraints, offers, and explainability relationships
Graph ingestion and update pipelines driven by campaign configuration changes
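A configuration-driven graph update might look roughly like the following with the official neo4j Python driver; the connection details and the node/relationship model are assumptions for illustration.

```python
# Sketch of an idempotent graph upsert triggered by a campaign config change.
from neo4j import GraphDatabase

# Illustrative AuraDB connection details.
driver = GraphDatabase.driver(
    "neo4j+s://example.databases.neo4j.io", auth=("neo4j", "password"))

UPSERT_OFFER = """
MERGE (c:Campaign {id: $campaign_id})
MERGE (o:Offer {id: $offer_id})
MERGE (c)-[r:HAS_OFFER]->(o)
SET o.constraints = $constraints, r.updated_at = datetime()
"""


def apply_campaign_change(campaign_id: str, offer_id: str, constraints: str) -> None:
    """MERGE keeps the update idempotent, so replayed config events are safe."""
    with driver.session() as session:
        session.run(UPSERT_OFFER, campaign_id=campaign_id,
                    offer_id=offer_id, constraints=constraints)
```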
Real-Time Decision Support
Support the Context Packet Service (Cloud Run) by designing read-optimized schemas, ensuring deterministic, low-latency access patterns, and enabling snapshotting and reproducibility.
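As one way to read "deterministic and reproducible": pin each context snapshot to a version and fetch it with a single composite key. The field names and key layout below are assumptions, not the service's actual schema.

```python
# Illustrative snapshot model for single-row, replayable lookups.
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextSnapshot:
    tenant_id: str
    user_id: str
    snapshot_version: str  # pins the exact state a decision was made against
    rolling_counters: dict
    consent_flags: dict


def snapshot_row_key(tenant_id: str, user_id: str, snapshot_version: str) -> bytes:
    """Composite key: one point lookup per decision request, no scans."""
    return f"{tenant_id}#{user_id}#{snapshot_version}".encode("utf-8")


def serialize(snapshot: ContextSnapshot) -> bytes:
    """Stable, sorted serialization so a past decision can be reproduced exactly."""
    return json.dumps(asdict(snapshot), sort_keys=True).encode("utf-8")
```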
Orchestration & Governance
Cloud Composer (Airflow) for orchestration, backfills, and compliance workflows
Dataplex + DataHub for metadata, lineage, ownership, and contracts
Retention, deletion (DSAR), and audit pipelines
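A DSAR deletion workflow in Composer could be wired as a small Airflow DAG along these lines; the task bodies, IDs, and schedule are placeholders.

```python
# Sketch of a daily DSAR deletion DAG; the callables are stubs for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def find_deletion_requests():
    """Stub: read pending DSAR requests from the system of record."""


def delete_subject_data():
    """Stub: issue deletes against the lakehouse and operational stores."""


def write_audit_record():
    """Stub: record what was deleted, when, and on whose request."""


with DAG(
    dag_id="dsar_deletion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    find = PythonOperator(task_id="find_requests",
                          python_callable=find_deletion_requests)
    delete = PythonOperator(task_id="delete_data",
                            python_callable=delete_subject_data)
    audit = PythonOperator(task_id="write_audit",
                           python_callable=write_audit_record)

    find >> delete >> audit
```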
Key Responsibilities
Design and implement streaming and batch data pipelines on GCP
Own data correctness, reproducibility, and latency guarantees
Build schema contracts and enforce them across producers and consumers
Implement identity resolution and stitching plumbing (not ML modeling)
Design operational data models optimized for real-time AI decisioning
Partner with AI engineers to ensure data is agent-ready and explainable
Implement backfills, replays, and disaster recovery procedures
Instrument pipelines with metrics, alerts, and SLOs
Collaborate on security, compliance, and data governance
Participate in architectural decisions and long-term platform evolution
Requirements
5+ years of hands-on data engineering experience
Production experience with streaming data systems
Strong experience with Google Cloud Platform (or willingness to ramp fast)
Deep SQL skills and strong understanding of distributed data systems
Pub/Sub or equivalent event bus
Apache Beam / Dataflow (or Spark Streaming with clear Beam concepts)
BigQuery (partitioning, cost control, performance tuning)
Cloud Storage–based lakehouse patterns
Airflow (Cloud Composer preferred)
dbt or equivalent transformation frameworks
Operational understanding of NoSQL stores (Bigtable, DynamoDB, Cassandra, etc.)
Experience designing idempotent, replayable pipelines
Data Modeling & Systems Thinking
Strong grasp of event-driven architectures, canonical logs vs. systems of record, and operational vs. analytical data stores
Ability to reason about event time vs. processing time (see the sketch below)
Experience designing multi-tenant data models
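To make the event-time distinction concrete, here is a small Beam snippet in which each element is windowed by its own timestamp rather than by arrival time; the data and window size are illustrative.

```python
# Elements land in 60-second windows based on their event timestamps.
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | beam.Create([("click", 10.0), ("view", 12.0), ("click", 70.0)])
        # Attach each element's own event time (seconds since epoch).
        | beam.Map(lambda kv: TimestampedValue(kv[0], kv[1]))
        # The two early events share a window; the 70s click starts a new one.
        | beam.WindowInto(FixedWindows(60))
        | beam.combiners.Count.PerElement()
        | beam.Map(print)
    )
```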
Nice to Have
Apache Iceberg (or Delta Lake / Hudi)
Apache Flink (stateful streaming, windows, exactly-once semantics)
Neo4j or graph databases
Identity resolution, entity resolution, or MDM-like systems
Marketing, adtech, martech, or customer data platforms (CDPs)
Experience with compliance-driven systems (CCPA, CPRA, GDPR, HIPAA)
Experience supporting real-time AI/ML systems
Python-heavy backend data engineering
What Success Looks Like
Event ingestion is schema-enforced, observable, and replayable
Iceberg + BigQuery pipelines support accurate attribution and analytics
Identity resolution data flows are low-latency and deterministic
Context Packet Service is supported by reliable, well-modeled data stores
Backfills and deletions are safe, auditable, and routine
Engineers trust the data platform as correct and reproducible
Why This Role
You are not “just” building dashboards — you are building the nervous system
You will work on seconds-level latency paths, not just batch jobs
Your work directly powers autonomous AI agents
You will influence architecture, not just implement tickets