Computer Vision & Backend Engineer (60-Day Build)
Company:
Type:
Location:
Mission (60 days)
Deliver a production-ready photo recognition system that powers a calorie-counting app end-to-end:
Upload → Analyze → Nutrition: From a food photo, return { name, grams, confidence, tags, ingredients, macros } per item, with meal totals and remaining daily targets.
Retraining option: Design and ship the infrastructure that learns from user corrections (renames, grams/macros edits) and can retrain/evaluate safely.
What you will build (end-to-end scope)
- POST /api/vision/upload (multipart JPEG/PNG/WebP) → { name, grams, confidence, tags }[]
- POST /api/coach/photo → persist image, call vision, run lookupFood, return items, meal totals, remainingDaily, and coachReply
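To make the contract concrete, here is a minimal sketch of the per-item shape returned by /api/vision/upload. The class and function names (VisionItem, validate_item, to_payload) and the validation rules are illustrative assumptions, not the actual app contract.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class VisionItem:
    """One detected food item: { name, grams, confidence, tags }."""
    name: str
    grams: float
    confidence: float
    tags: List[str] = field(default_factory=list)


def validate_item(item: VisionItem) -> None:
    # Assumed sanity rules; the real contract may be stricter or looser.
    if not item.name:
        raise ValueError("name must be non-empty")
    if item.grams < 0:
        raise ValueError("grams must be non-negative")
    if not 0.0 <= item.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")


def to_payload(items: List[VisionItem]) -> List[Dict]:
    """Validate every item and serialize to the JSON-ready list."""
    for item in items:
        validate_item(item)
    return [asdict(item) for item in items]
```

A stubbed Week 1–2 endpoint could return to_payload(...) on fixed items so the app can integrate before any model exists.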
Food analysis (multi-cuisine)
- Gate + Instances: YOLOv8/11 detect (food vs distractors) → YOLO-seg (retina masks)
- Naming: SigLIP/CLIP (or compact ViT) on mask crops, synonyms/taxonomy aware
- Safety: OOD detector + low-confidence suggestions; safe abstain (no hallucinations)
- Device-depth first (if present), monocular fallback (MiDaS/ZoeDepth), tabletop plane-fit, coverage %, density lookup (Redis), portion_source=device|mono|heuristic
- Map labels → canonical taxonomy (≤400 dishes)
- Query our nutrition DB or external sources (e.g., FDC) to assemble ingredients + per-ingredient macros, scale by grams, compute meal totals
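The portion step above can be sketched as "volume above the table plane times density". This is a simplified illustration, not the intended implementation: it assumes a pinhole camera looking straight down at a table at constant depth, metric depth per pixel, and a density already fetched from the lookup. The names estimate_grams and pick_portion_source are hypothetical.

```python
from typing import List


def pick_portion_source(has_device_depth: bool, has_mono_depth: bool) -> str:
    """Device depth first, monocular fallback, heuristic last."""
    if has_device_depth:
        return "device"
    if has_mono_depth:
        return "mono"
    return "heuristic"


def estimate_grams(
    depth_m: List[List[float]],   # per-pixel depth in metres
    mask: List[List[bool]],       # True where the pixel belongs to the food item
    plane_depth_m: float,         # depth of the fitted tabletop plane (assumed constant)
    fx: float,                    # focal lengths in pixels
    fy: float,
    density_g_per_cm3: float,     # assumed to come from the density lookup
) -> float:
    volume_m3 = 0.0
    for depth_row, mask_row in zip(depth_m, mask):
        for z, inside in zip(depth_row, mask_row):
            if not inside:
                continue
            height = plane_depth_m - z      # food height above the table
            if height <= 0:
                continue                    # at or below the plane: ignore
            pixel_area = (z / fx) * (z / fy)  # metric footprint of one pixel at depth z
            volume_m3 += height * pixel_area
    return volume_m3 * 1e6 * density_g_per_cm3  # m^3 -> cm^3, then grams
```

A tilted camera would need a real plane fit and heights measured along the plane normal; the top-down assumption only keeps the sketch short.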
Retraining loop (feedback → model)
- Capture user edits & low-margin/OOD crops → store to ClickHouse/S3
- Scripts & jobs to rebuild datasets, fine-tune, evaluate with metric gates, and publish new artifacts safely
- CI evaluator (Top-1/Top-5, OOD FP rate, Portion MAPE, latency SLOs) that blocks regressions
- Observability: structured logs, per-stage ms, model/taxonomy versions
- Privacy: consent gate, retention/“delete my images” flow
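As an illustration of the CI evaluator idea, the sketch below computes Top-k accuracy and Portion MAPE, then compares a candidate model's metrics against a baseline. The metric keys (top1, top5, ood_fp_rate, portion_mape) and the function names are assumptions for this example; real thresholds are set at kickoff.

```python
from typing import Dict, List, Sequence

# Metrics where a larger value is better; everything else is "lower is better".
HIGHER_IS_BETTER = {"top1", "top5"}


def top_k_accuracy(ranked_preds: Sequence[Sequence[str]], labels: Sequence[str], k: int) -> float:
    """Fraction of samples whose true label is among the first k predictions."""
    hits = sum(1 for preds, label in zip(ranked_preds, labels) if label in preds[:k])
    return hits / len(labels)


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


def find_regressions(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the names of metrics where the candidate is worse than the baseline."""
    failures = []
    for name, base in baseline.items():
        value = candidate[name]
        if name in HIGHER_IS_BETTER:
            worse = value < base - tolerance
        else:
            worse = value > base + tolerance
        if worse:
            failures.append(name)
    return failures
```

A CI job could exit non-zero whenever find_regressions returns a non-empty list, which is what blocks the merge or the artifact publish.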
60-Day milestone plan (acceptance-driven)
Week 1–2 (Foundation & API)
- Stand up GPU FastAPI /infer-v2 + Node /api/coach/photo
- Return stubbed payload matching contract; basic telemetry; dockerized
Demo: curl upload → JSON schema exactly matches app contract
Week 3–4 (Models & Portions)
- YOLO gate+seg (export ONNX); CLIP/SigLIP naming with temperature scaling
- Depth-aware grams (device depth) + mono fallback; density via Redis
Demo: multi-cuisine sample set returns names + grams within sanity bounds
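For readers unfamiliar with temperature scaling, here is a minimal sketch of how calibrated confidences can feed a low-confidence abstain with suggestions. The temperature is assumed to have been fit on a held-out set beforehand; the threshold, the response shape, and the name name_or_abstain are illustrative assumptions, and a real OOD detector would likely use more than the top probability.

```python
import math
from typing import Dict, List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over logits / T; T > 1 softens the distribution, T < 1 sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def name_or_abstain(
    labels: Sequence[str],
    logits: Sequence[float],
    temperature: float,
    min_confidence: float,
    top_k: int = 3,
) -> Dict:
    probs = softmax_with_temperature(logits, temperature)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    best_label, best_prob = ranked[0]
    if best_prob < min_confidence:
        # Abstain: no name, but offer the top candidates as suggestions.
        suggestions = [label for label, _ in ranked[:top_k]]
        return {"name": None, "confidence": best_prob, "suggestions": suggestions}
    return {"name": best_label, "confidence": best_prob, "suggestions": []}
```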
Week 5 (Nutrition & Safety)
- Taxonomy (≤400) + nutrition mapping (our DB / FDC)
- OOD abstain with suggestions; ingredients + per-ingredient macros scaled by grams
Demo: App-ready payload { name, grams, confidence, tags, ingredients, macros } per item; meal totals & remainingDaily
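The arithmetic behind "macros scaled by grams", meal totals, and remainingDaily can be sketched as follows. It assumes nutrition values are stored per 100 g and uses made-up macro keys (kcal, protein_g); the function names are hypothetical.

```python
from typing import Dict, List


def scale_macros(per_100g: Dict[str, float], grams: float) -> Dict[str, float]:
    """Scale per-100 g nutrition values to the estimated portion size."""
    return {key: value * grams / 100.0 for key, value in per_100g.items()}


def meal_totals(item_macros: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum macros across all items in the meal."""
    totals: Dict[str, float] = {}
    for macros in item_macros:
        for key, value in macros.items():
            totals[key] = totals.get(key, 0.0) + value
    return totals


def remaining_daily(targets: Dict[str, float], totals: Dict[str, float]) -> Dict[str, float]:
    """Daily target minus what this meal consumed (can go negative if exceeded)."""
    return {key: target - totals.get(key, 0.0) for key, target in targets.items()}
```

Whether remainingDaily should also subtract earlier meals of the day is a product decision this sketch leaves open.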
Week 6–8 (Retraining + CI gates + Canary)
- Feedback capture from user edits; dataset rebuild scripts; fine-tune path
- Evaluator + CI gates (json report) and shadow/canary rollout toggles
- Privacy & retention wired; runbook + handover docs
Final Demo (Day 60): end-to-end flow on staging GPU; retrain on a small corrected set; CI passes; canary toggle ready
Success metrics (set at kickoff; used by CI gate)
Quality: Top-1 on core ≥ target; OOD FP ≤ target; Portion MAPE ≤ target on depth images
Latency: p50 ≤ 350 ms, p95 ≤ 800 ms on our staging GPU
Reliability: CI gate prevents regressions; logs/metrics complete; consent & retention enforced
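One way the latency targets could be checked from recorded request timings is a nearest-rank percentile, sketched below. The percentile method and the function names are assumptions; the defaults mirror the p50 ≤ 350 ms and p95 ≤ 800 ms targets above.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def meets_latency_slo(
    samples_ms: Sequence[float],
    p50_max_ms: float = 350.0,
    p95_max_ms: float = 800.0,
) -> bool:
    """True when both the median and the tail are within budget."""
    return (
        percentile(samples_ms, 50) <= p50_max_ms
        and percentile(samples_ms, 95) <= p95_max_ms
    )
```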
Minimum qualifications
- Shipped computer-vision systems to production (beyond notebooks)
- YOLO detect/seg training or fine-tuning; export to ONNX/TensorRT and debug opsets/dynamic shapes
- CLIP/SigLIP or ViT classifier work (fine-tune + temperature scaling); OOD thresholding
- Depth pipelines (device + monocular), geometric reasoning (plane fitting, coverage)
- Production APIs (FastAPI/Node), Redis/ClickHouse (or similar), Docker, GitHub Actions
- Obs/ops: structured logging, latency profiling, privacy/retention patterns
Nice-to-haves
- Triton Inference Server, FAISS/ANN, K8s/Helm, W&B/MLflow
- Nutrition data integration (FDC or equivalent), taxonomy design
Tech you’ll touch
PyTorch, Ultralytics YOLOv8/11, SAM/SAM2, SigLIP/CLIP, MiDaS/ZoeDepth, ONNX Runtime (CUDA EP), TensorRT (nice-to-have), FastAPI, Node/Express, Redis, ClickHouse, Docker, GitHub Actions.
What we provide
- GPU access (cloud, H100/A10/T4), seed datasets & taxonomy draft, staging infra, and rapid product feedback
- Clear API contract and benchmark packs for CI gating
How to apply
Email hello@wownom.com with:
- A shipped CV project (repo/demo) + one latency and one accuracy number you achieved and how
- Availability to start within 1–2 weeks and timezone
- (Optional) A brief note on grams estimation from depth vs. monocular on plated dishes