GCP Data Engineer

Experience: 5 years
Salary: 0 Lacs
Posted: 1 week ago | Platform: LinkedIn


Work Mode: Remote
Job Type: Part Time

Job Description

Data Engineer — AI & Real-Time Marketing Platform

Location: Remote (US-friendly time zones preferred)


The Role

We are looking for a Data Engineer to design, build, and operate the company’s data platform on Google Cloud Platform (GCP). You will own event ingestion, streaming pipelines, lakehouse design, operational data stores, identity resolution plumbing, and analytics serving, working closely with platform engineers, AI engineers, and product leaders.


This is not a dashboard-only analytics role. This is a systems-level data engineering role focused on real-time decisioning, correctness, and scale.


What You Will Build

You will be responsible for implementing and operating:


Event Ingestion & Streaming

- Pub/Sub-based event transport with schema-enforced Protobuf envelopes
- Streaming pipelines using Dataflow (Apache Beam)
- Targeted Apache Flink pipelines for stateful use cases (identity stitching, rolling counters)
- DLQ and quarantine workflows with replay and backfill support
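
To make the ingestion expectations concrete, here is a minimal sketch of the kind of streaming Dataflow (Apache Beam, Python SDK) job this role would own: read Pub/Sub messages, enforce the Protobuf envelope contract, and route anything that fails to parse to a dead-letter topic. The topic and subscription names and the generated envelope_pb2 module are placeholders, not the actual platform schema.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder: envelope_pb2 would be generated from the platform's Protobuf contract.
from envelope_pb2 import EventEnvelope


class DecodeEnvelope(beam.DoFn):
    """Parse the Protobuf envelope; emit raw bytes to 'dlq' when parsing fails."""
    DLQ = "dlq"

    def process(self, raw: bytes):
        try:
            EventEnvelope().ParseFromString(raw)   # schema enforcement at the edge
            yield raw
        except Exception:
            yield beam.pvalue.TaggedOutput(self.DLQ, raw)  # quarantine for replay


def run() -> None:
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        results = (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/PROJECT/subscriptions/events-sub")
            | "Decode" >> beam.ParDo(DecodeEnvelope()).with_outputs(
                DecodeEnvelope.DLQ, main="ok")
        )
        results.ok | "PublishValid" >> beam.io.WriteToPubSub(
            topic="projects/PROJECT/topics/events-valid")
        results.dlq | "PublishDLQ" >> beam.io.WriteToPubSub(
            topic="projects/PROJECT/topics/events-dlq")


if __name__ == "__main__":
    run()
```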


Lakehouse & Warehousing

- Apache Iceberg on GCS as the canonical event log
- BigQuery for curated analytics, attribution, matchback, and model serving
- Partitioning, compaction, and retention strategies for large-scale datasets
- dbt-based transformations and data contracts
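
As a flavor of the warehousing work, a hedged sketch of how a curated BigQuery table might be provisioned with event-time partitioning, clustering, and partition-level retention; the project, dataset, and column names are illustrative only.

```python
# Create a curated BigQuery events table with daily partitioning on event time,
# clustering for common access paths, and a partition expiration window.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

schema = [
    bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("tenant_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("event_type", "STRING"),
    bigquery.SchemaField("payload", "JSON"),
]

table = bigquery.Table("my-project.curated.events", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",                          # partition on event time, not load time
    expiration_ms=400 * 24 * 60 * 60 * 1000,   # ~400-day retention, per policy
)
table.clustering_fields = ["tenant_id", "event_type"]  # prune scans, control cost
table.require_partition_filter = True                  # force partition filters in queries

client.create_table(table, exists_ok=True)
```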


Operational Data Stores

- Bigtable for low-latency operational KV (identity mappings, counters, context snapshots)
- Redis for hot caches and TTL-based session state
- AlloyDB as the system of record for tenants, campaigns, budgets, and consent state
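
For the operational path, a rough sketch (placeholder instance, table, and key layout) of the kind of low-latency lookup this layer serves: Redis as a TTL hot cache in front of a Bigtable identity mapping.

```python
import redis
from google.cloud import bigtable

cache = redis.Redis(host="localhost", port=6379)          # placeholder endpoint
bt_client = bigtable.Client(project="my-project", admin=False)
identity_table = bt_client.instance("ops-instance").table("identity_map")

CACHE_TTL_SECONDS = 300  # short TTL: Bigtable remains the source of truth

def resolve_canonical_id(device_id: str) -> bytes | None:
    cache_key = f"id:{device_id}"
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    # Row key layout (placeholder): one row per device id, canonical id in family "m".
    row = identity_table.read_row(device_id.encode("utf-8"))
    if row is None:
        return None
    canonical = row.cells["m"][b"canonical_id"][0].value

    cache.setex(cache_key, CACHE_TTL_SECONDS, canonical)
    return canonical
```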


Knowledge Graph & Context

- Neo4j AuraDB for campaign context, constraints, offers, and explainability relationships
- Graph ingestion and update pipelines driven by campaign configuration changes
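
A small illustrative sketch of a configuration-driven graph update, written as an idempotent Cypher MERGE via the Neo4j Python driver; the URI, credentials, labels, and properties are placeholders rather than the real graph model.

```python
from neo4j import GraphDatabase

URI = "neo4j+s://<aura-instance>.databases.neo4j.io"   # placeholder
AUTH = ("neo4j", "password")                           # placeholder

UPSERT_CAMPAIGN = """
MERGE (c:Campaign {campaign_id: $campaign_id})
SET c.name = $name, c.daily_budget = $daily_budget
MERGE (o:Offer {offer_id: $offer_id})
MERGE (c)-[:PROMOTES]->(o)
"""

def upsert_campaign(change: dict) -> None:
    # MERGE keeps replays of the same configuration change safe (no duplicate nodes).
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        driver.execute_query(UPSERT_CAMPAIGN, **change, database_="neo4j")

upsert_campaign({
    "campaign_id": "cmp-123",
    "name": "Spring Launch",
    "daily_budget": 500,
    "offer_id": "off-9",
})
```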

Real-Time Decision Support

- Support the Context Packet Service (Cloud Run) by:
  - designing read-optimized schemas,
  - ensuring deterministic, low-latency access patterns,
  - enabling snapshotting and reproducibility.
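
The Context Packet Service itself is internal, so purely as a hypothetical illustration of "deterministic and reproducible": a snapshot-tagged packet whose identifier is a content hash, so a decision can later be replayed against the exact state it saw. The field names and hashing scheme are invented for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ContextPacket:
    tenant_id: str
    canonical_user_id: str
    campaign_ids: list[str]
    counters: dict[str, int]   # e.g. rolling sends/opens read from the operational KV
    as_of: str                 # snapshot timestamp pinned at read time
    snapshot_id: str           # content hash for reproducibility and audit

def build_packet(tenant_id: str, user_id: str, campaign_ids: list[str],
                 counters: dict[str, int]) -> ContextPacket:
    as_of = datetime.now(timezone.utc).isoformat()
    body = json.dumps(
        {"tenant": tenant_id, "user": user_id, "campaigns": sorted(campaign_ids),
         "counters": counters, "as_of": as_of},
        sort_keys=True,
    )
    snapshot_id = hashlib.sha256(body.encode()).hexdigest()[:16]
    return ContextPacket(tenant_id, user_id, campaign_ids, counters, as_of, snapshot_id)
```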


Orchestration & Governance

- Cloud Composer (Airflow) for orchestration, backfills, and compliance workflows
- Dataplex + DataHub for metadata, lineage, ownership, and contracts
- Retention, deletion (DSAR), and audit pipelines
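
To ground the compliance piece, a hedged sketch of a Cloud Composer (Airflow) DAG that handles a DSAR deletion against the curated BigQuery layer and records an audit step; the DAG id, table names, and audit logic are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

def record_audit(**context):
    # Placeholder: append who/what/when to an audit table or bucket.
    print("DSAR deletion audited for run", context["run_id"])

with DAG(
    dag_id="dsar_deletion",
    start_date=datetime(2024, 1, 1),
    schedule=None,            # triggered per DSAR request, not on a timetable
    catchup=False,
    params={"canonical_user_id": ""},
) as dag:
    delete_curated = BigQueryInsertJobOperator(
        task_id="delete_from_curated_events",
        configuration={
            "query": {
                "query": """
                    DELETE FROM `my-project.curated.events`
                    WHERE canonical_user_id = '{{ params.canonical_user_id }}'
                """,
                "useLegacySql": False,
            }
        },
    )
    audit = PythonOperator(task_id="record_audit", python_callable=record_audit)

    delete_curated >> audit
```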

Key Responsibilities

- Design and implement streaming and batch data pipelines on GCP
- Own data correctness, reproducibility, and latency guarantees
- Build schema contracts and enforce them across producers and consumers
- Implement identity resolution and stitching plumbing (not ML modeling)
- Design operational data models optimized for real-time AI decisioning
- Partner with AI engineers to ensure data is agent-ready and explainable
- Implement backfills, replays, and disaster recovery procedures
- Instrument pipelines with metrics, alerts, and SLOs
- Collaborate on security, compliance, and data governance
- Participate in architectural decisions and long-term platform evolution


Required Qualifications


Core Experience

- 5+ years of hands-on data engineering experience
- Production experience with streaming data systems
- Strong experience with Google Cloud Platform (or willingness to ramp up fast)
- Deep SQL skills and a strong understanding of distributed data systems


Required Technical Skills

- Pub/Sub or an equivalent event bus
- Apache Beam / Dataflow (or Spark Streaming with clear Beam concepts)
- BigQuery (partitioning, cost control, performance tuning)
- Cloud Storage–based lakehouse patterns
- Airflow (Cloud Composer preferred)
- dbt or equivalent transformation frameworks
- Operational understanding of NoSQL stores (Bigtable, DynamoDB, Cassandra, etc.)
- Experience designing idempotent, replayable pipelines
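
One common way to meet the "idempotent, replayable" bar (shown here only as an assumed pattern, not the team's prescribed one) is to key loads on a stable event id and MERGE staged batches into the curated table, so replays and backfills cannot double-count.

```python
# Make a replay/backfill idempotent by MERGE-ing a staging batch into the curated
# table keyed on a stable event_id; reprocessing the same events becomes a no-op.
# Table and project names are placeholders.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `my-project.curated.events` AS target
USING `my-project.staging.events_batch` AS source
ON target.event_id = source.event_id
WHEN NOT MATCHED THEN
  INSERT (event_id, tenant_id, event_ts, event_type, payload)
  VALUES (source.event_id, source.tenant_id, source.event_ts,
          source.event_type, source.payload)
"""

def load_batch_idempotently() -> None:
    client = bigquery.Client(project="my-project")  # placeholder project
    client.query(MERGE_SQL).result()  # .result() waits for the job and surfaces errors

if __name__ == "__main__":
    load_batch_idempotently()
```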


Data Modeling & Systems Thinking

- Strong grasp of:
  - event-driven architectures
  - canonical logs vs systems of record
  - operational vs analytical data stores
- Ability to reason about event time vs processing time
- Experience designing multi-tenant data models
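
To illustrate the event-time vs processing-time distinction in code, a small Beam (Python) windowing sketch that stamps elements with their event timestamps and tolerates late arrivals; the input collection and field names are assumptions.

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import AfterWatermark, AfterCount, AccumulationMode

def windowed_counts(events):
    """events: PCollection of dicts with 'tenant_id' and an epoch-seconds 'event_ts'."""
    return (
        events
        # Re-timestamp elements with their *event* time, not arrival (processing) time.
        | "StampEventTime" >> beam.Map(
            lambda e: window.TimestampedValue(e, e["event_ts"]))
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),                       # 1-minute event-time windows
            trigger=AfterWatermark(late=AfterCount(1)),    # re-fire when late data arrives
            accumulation_mode=AccumulationMode.ACCUMULATING,
            allowed_lateness=600,                          # accept data up to 10 min late
        )
        | "KeyByTenant" >> beam.Map(lambda e: (e["tenant_id"], 1))
        | "Count" >> beam.CombinePerKey(sum)
    )
```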


Preferred / Nice-to-Have Experience

- Apache Iceberg (or Delta Lake / Hudi)
- Apache Flink (stateful streaming, windows, exactly-once semantics)
- Neo4j or graph databases
- Identity resolution, entity resolution, or MDM-like systems
- Marketing, adtech, martech, or customer data platforms (CDPs)
- Experience with compliance-driven systems (CCPA, CPRA, GDPR, HIPAA)
- Experience supporting real-time AI/ML systems
- Python-heavy backend data engineering


What Success Looks Like (First 6 Months)

- Event ingestion is schema-enforced, observable, and replayable
- Iceberg + BigQuery pipelines support accurate attribution and analytics
- Identity resolution data flows are low-latency and deterministic
- The Context Packet Service is supported by reliable, well-modeled data stores
- Backfills and deletions are safe, auditable, and routine
- Engineers trust the data platform as correct and reproducible


How This Role Is Different

- You are not “just” building dashboards; you are building the nervous system
- You will work on seconds-level latency paths, not just batch jobs
- Your work directly powers autonomous AI agents
- You will influence architecture, not just implement tickets
