Job Description
About the Role

You’ll join a small, fast-moving team turning cutting-edge AI research into shippable products across text, vision, and multimodal domains. One sprint you’ll be distilling an LLM for WhatsApp chat-ops; the next you’ll be converting CAD drawings into BOM stories or training a computer-vision model that flags on-site safety risks. You own the model life-cycle end to end: data prep → fine-tune/distil → evaluate → deploy → monitor.

Key Responsibilities

Model Engineering
• Fine-tune and quantise open-weight LLMs (Llama 3, Mistral, Gemma) and SLMs for low-latency edge inference (a minimal illustrative sketch appears at the end of this posting).
• Train or adapt computer-vision models (YOLO, Segment Anything, SAM-DINO) to detect site hazards, drawing anomalies, or asset states.

Multimodal Pipelines
• Build retrieval-augmented-generation (RAG) stacks: loaders → vector DB (FAISS / OpenSearch) → ranking prompts (also sketched at the end of this posting).
• Combine vision and language outputs into single “scene → story” responses for dashboards and WhatsApp bots.

Serving & MLOps
• Package models as Docker images, SageMaker endpoints, or ONNX edge bundles; expose FastAPI/gRPC handlers with auth, rate limiting, and telemetry.
• Automate CI/CD: GitHub Actions → Terraform → blue-green deploys.

Evaluation & Guardrails
• Design automatic eval harnesses (BLEU, BERTScore, CLIP similarity, toxicity and bias checks).
• Monitor drift, hallucination, and latency; implement rollback triggers.

Enablement & Storytelling
• Write prompt playbooks and model cards so other teams can reuse your work.
• Run internal workshops: “From design drawing to narrative” / “LLM safety by example”.

Required Skills & Experience
• 3+ years of ML/NLP/CV in production, with at least 1 year hands-on with generative AI.
• Strong Python (FastAPI, Pydantic, asyncio) and Hugging Face Transformers or Diffusers.
• Experience with minimal-footprint models (LoRA, QLoRA, GGUF, INT4) and vector search.
• Comfortable on AWS/GCP/Azure for GPU instances, serverless endpoints, and IaC.
• Solid grasp of evaluation/guardrail frameworks (HELM, PromptLayer, Guardrails AI, Triton metrics).

Bonus Points
• Built a RAG or function-calling agent used by 500+ users.
• Prior CV pipeline (object detection, segmentation) or real-time speech-to-text project.
• Live examples of creative prompt engineering or story generation.
• Familiarity with LangChain, LlamaIndex, or BentoML.

Why You’ll Love It
• Multidomain playground: text, vision, storytelling, decision support.
• Tech freedom: pick the right model and stack, justify it, ship it.
• Remote-first: work anywhere within ±4 hours of IST, with quarterly hack-weeks in Hyderabad.
• Top-quartile pay: base + milestone bonus + conference stipend.

How to Apply

Send a resume and a link to GitHub / HF / Kaggle work showcasing LLM or CV projects. Include a 200-word note describing your favourite prompt or model tweak and the impact it had. Short-listed candidates complete a practical take-home (fine-tune a tiny model, build a RAG or vision demo, and submit a brief write-up) followed by a 45-minute technical chat.

We hire builders, not resume keywords. Show us you can ship AI that works in the real world, explain it clearly, and you’re in.
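For candidates curious what the fine-tuning work above looks like in practice, here is a minimal sketch of attaching a LoRA adapter to a small open-weight model. It is illustrative only, not our production recipe: the base checkpoint (TinyLlama/TinyLlama-1.1B-Chat-v1.0), the target modules, and every hyperparameter are placeholder assumptions.

```python
# Illustrative sketch only: a LoRA adapter for a lightweight fine-tune.
# Requires `transformers` and `peft`; the model name and hyperparameters
# are placeholder assumptions, not a prescribed recipe.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora = LoraConfig(
    r=8,                                  # adapter rank: small and cheap to train
    lora_alpha=16,                        # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach only to attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```

From here, a standard Trainer loop fine-tunes only the adapter weights, which is what keeps the footprint small enough for the edge-inference targets mentioned above.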
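Likewise, here is a minimal sketch of the “loaders → vector DB → ranking prompts” retrieval step, assuming `sentence-transformers` and `faiss-cpu`. The snippets and query are invented for illustration; a production stack would add document loading, chunking, reranking, and the actual LLM call.

```python
# Illustrative sketch only: embed a few snippets, index them in FAISS,
# retrieve the best matches for a query, and assemble a grounded prompt.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [  # invented placeholder snippets, not real project data
    "Scaffolding on level 3 is missing a guard rail near bay 12.",
    "The BOM for drawing A-101 lists 240 anchor bolts, grade 8.8.",
    "Crane unit CR-7 passed its monthly inspection on schedule.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly encoder
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product = cosine on unit vectors
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "Which safety hazard was reported on level 3?"
q_vec = np.asarray(encoder.encode([query], normalize_embeddings=True), dtype="float32")
scores, ids = index.search(q_vec, 2)  # top-2 nearest snippets

context = "\n".join(docs[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this grounded prompt would then go to the LLM of choice
```

Swapping `IndexFlatIP` for an OpenSearch k-NN index, and adding a reranking pass before prompt assembly, turns this into the shape of stack described under Multimodal Pipelines.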