Data Engineer

5 years

0 Lacs

Posted: 10 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Part Time

Job Description

What You'll Build

Core Responsibilities

Data Architecture & Infrastructure (40%)

● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery) 

● Build scalable data pipelines for real-time conversation processing and personalization

● Architect ETL/ELT workflows for data migration from legacy systems

● Implement data partitioning, sharding, and optimization strategies for high-throughput systems 

Vector & Graph Database Systems (25%)

● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings) 

● Build graph schemas in Neo4j for customer journey mapping and persona relationships

● Implement HNSW indexing strategies and similarity search optimization

● Create hybrid search systems combining vector, full-text, and graph queries

● Monitor and tune database performance (query latency, throughput, resource utilization) 
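The posting does not say how results from vector, full-text, and graph queries would be merged. One common option is reciprocal rank fusion (RRF), which scores each document as the sum of 1 / (k + rank) over every result list it appears in. The sketch below assumes each backend returns an ordered list of document IDs; the function name and the constant k = 60 are illustrative choices, not part of the original stack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one list sorted by RRF score.

    rankings: one ordered list of document IDs per search backend
              (e.g. vector, full-text, graph), best match first.
    k: damping constant; larger values flatten the advantage of top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with hypothetical hits `["c3", "c1", "c7"]` (vector), `["c1", "c9", "c3"]` (full-text), and `["c1", "c7"]` (graph), `c1` ranks first because it appears near the top of all three lists. RRF needs only ranks, so it avoids normalizing incompatible score scales such as cosine similarity versus BM25.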


ML Data Infrastructure (20%)

● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)

● Create feature stores for GNN training (customer interactions, engagement signals)

● Implement data versioning and lineage tracking for ML experiments 

● Design A/B testing data infrastructure with CUPED variance reduction

● Build real-time feature computation pipelines for contextual bandits 


Analytics & Monitoring (15%)

● Design BigQuery schemas for marketing analytics and performance tracking

● Create materialized views and aggregation pipelines for real-time dashboards

● Implement data quality monitoring and anomaly detection 

● Build observability infrastructure (Prometheus metrics, Grafana dashboards)

● Develop cost optimization strategies for cloud data warehousing 


Technical Stack You'll Work With

Databases & Storage

MongoDB

Redis

Milvus

Neo4j

BigQuery


Data Processing & Orchestration

Apache Airflow

Pandas

Apache Spark

dbt


ML/AI Data Pipeline

vLLM

MLflow

Sentence Transformers

PyTorch


Cloud & Infrastructure

Google Cloud Platform

Docker

Terraform

GitHub Actions


Programming & Tools

Python 3.10+

SQL

Shell scripting

Git


Requirements

Must-Have Skills

5+ years of experience

Expert-level SQL

Strong Python

At least 3 different database technologies

High-scale data pipelines

Data modeling

Cloud data warehouses

Data quality, validation, and governance

Debugging and optimization


Highly Desirable

Vector databases

Graph databases

Embedding models

ML data pipelines

A/B testing

Real-time streaming

LLMs

Data migration

Marketing technology

Nice-to-Have


PyTorch Geometric

Marketing analytics

LangChain

Cost optimization

Open-source

Data compliance


Key Projects You'll Own

Phase 1: Foundation

● Migrate 10M+ conversation vectors from Pinecone to Milvus 

● Design and implement MongoDB schemas for real-time agent state

● Set up Neo4j graph database with customer journey models 

● Create BigQuery data warehouse with partitioned tables 


Phase 2: Optimization

● Build automated data quality monitoring system 

● Implement caching strategies (Redis) for 10x latency reduction 

● Optimize vector search queries (target: <50ms p95 latency) 

● Create real-time analytics dashboards (Grafana) 


Phase 3: ML Infrastructure

● Build LLM fine-tuning data pipeline 

● Implement feature store for GNN training 

● Create A/B testing data infrastructure 

● Design multi-armed bandit state management 
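The posting does not specify a bandit algorithm. As one illustration, Thompson sampling with a Beta-Bernoulli model keeps only two counters per arm (successes and failures), which makes its state small and easy to persist between requests. The sketch below assumes binary rewards (e.g. click / no click); the class names, the uniform Beta(1, 1) prior, and the JSON layout are illustrative assumptions, not the team's design.

```python
import json
import random
from dataclasses import dataclass, field


@dataclass
class ArmState:
    # Posterior is Beta(1 + successes, 1 + failures), i.e. a uniform prior
    successes: int = 0
    failures: int = 0


@dataclass
class BanditState:
    arms: dict[str, ArmState] = field(default_factory=dict)

    def choose(self, rng: random.Random) -> str:
        """Thompson sampling: draw one posterior sample per arm, pick the max."""
        if not self.arms:
            raise ValueError("no arms registered")
        draws = {
            name: rng.betavariate(1 + s.successes, 1 + s.failures)
            for name, s in self.arms.items()
        }
        return max(draws, key=draws.get)

    def update(self, arm: str, reward: bool) -> None:
        """Record one binary reward, creating the arm if it is new."""
        state = self.arms.setdefault(arm, ArmState())
        if reward:
            state.successes += 1
        else:
            state.failures += 1

    def to_json(self) -> str:
        # Plain JSON so the state can be stored in any key-value store
        return json.dumps({n: [s.successes, s.failures] for n, s in self.arms.items()})

    @classmethod
    def from_json(cls, payload: str) -> "BanditState":
        raw = json.loads(payload)
        return cls({n: ArmState(s, f) for n, (s, f) in raw.items()})
```

Because the serialized state is a single JSON string, it could be kept under one key in a store such as Redis. Concurrent updates would then need atomic counters or locking, which this sketch does not handle.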


Work Environment

Collaborative team

Modern stack

Impact

Autonomy

Growth
