Data Engineer

Experience

5 years

Posted: 2 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

We're seeking an experienced Data Engineer to join our team and play a critical role in building and scaling our next-generation AI-powered marketing personalization platform (V2.0). You'll architect and implement a sophisticated multi-database infrastructure supporting real-time personalization, vector search, graph analytics, and large-scale data processing.


This is a greenfield opportunity to design data pipelines from the ground up, working with cutting-edge technologies including vector databases, graph databases, and large language models (LLMs). You'll be instrumental in migrating our existing platform while building robust, scalable data infrastructure that powers AI agents serving thousands of marketing campaigns.


Must-Have Skills

● 5+ years of data engineering experience with production systems

● Expert-level SQL and database design skills

● Strong Python programming skills (async/await, type hints, testing)

● Experience with at least three different database technologies (SQL, NoSQL, vector, graph)

● Proven track record building high-scale data pipelines (>1M records/day)

● Deep understanding of data modeling (dimensional, normalized, denormalized)

● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)

● Strong knowledge of data quality, validation, and governance

● Excellent debugging and optimization skills


Highly Desirable

● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)

● Experience with graph databases (Neo4j, ArangoDB, Neptune)

● Knowledge of embedding models and semantic search

● Experience with ML data pipelines (feature stores, model training data)

● Understanding of A/B testing and experimental design

● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)

● Knowledge of LLMs and conversational AI systems

● Experience with data migration projects (especially large-scale)

● Background in marketing technology or customer data platforms


Nice-to-Have

● Experience with PyTorch Geometric or graph neural networks

● Knowledge of marketing analytics (attribution, segmentation, personalization)

● Familiarity with LangChain, LangGraph, or agent frameworks

● Experience with cost optimization in cloud environments

● Contributions to open-source data engineering projects

● Experience with data compliance (GDPR, CCPA)


Core Responsibilities

Data Architecture & Infrastructure (40%)

● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)

● Build scalable data pipelines for real-time conversation processing and personalization

● Architect ETL/ELT workflows for data migration from legacy systems

● Implement data partitioning, sharding, and optimization strategies for high-throughput systems

Vector & Graph Database Systems (25%)

● Design and optimize Milvus vector collections for semantic search (1024-dimensional embeddings)

● Build graph schemas in Neo4j for customer journey mapping and persona relationships

● Implement HNSW indexing strategies and optimize similarity search

● Create hybrid search systems combining vector, full-text, and graph queries

● Monitor and tune database performance (query latency, throughput, resource utilization)


ML Data Infrastructure (20%)

● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)

● Create feature stores for GNN training (customer interactions, engagement signals)

● Implement data versioning and lineage tracking for ML experiments

● Design A/B testing data infrastructure with CUPED variance reduction

● Build real-time feature computation pipelines for contextual bandits


Analytics & Monitoring (15%)

● Design BigQuery schemas for marketing analytics and performance tracking

● Create materialized views and aggregation pipelines for real-time dashboards

● Implement data quality monitoring and anomaly detection

● Build observability infrastructure (Prometheus metrics, Grafana dashboards)

● Develop cost optimization strategies for cloud data warehousing


Technical Stack You'll Work With

Databases & Storage

● MongoDB

● Redis

● Milvus

● Neo4j

● BigQuery


Data Processing & Orchestration

● Apache Airflow

● Pandas

● Apache Spark

● dbt


ML/AI Data Pipeline

● vLLM

● MLflow

● Sentence Transformers

● PyTorch


Cloud & Infrastructure

● Google Cloud Platform

● Docker

● Terraform

● GitHub Actions


Programming & Tools

● Python 3.10+

● SQL

● Shell scripting

● Git


