Job Description:
We’re seeking a hands-on ML Engineer to transform cutting-edge research into production features. You will own Tiny LLM on-device inference, predictive analytics, dynamic escalation workflows, gamified modules, and localisation pipelines, working end-to-end from model training to mobile deployment.

Required Qualifications:
• Bachelor’s or Master’s in Computer Science, Engineering, or a related field
• 2+ years’ hands-on experience with transformer architectures, fine-tuning, and model inference
• Strong proficiency in Python, including libraries/frameworks such as PyTorch, TensorFlow, scikit-learn, and FastAPI
• Proven track record deploying ML models to production (TorchServe, ONNX, Hugging Face Inference API)
• Solid data engineering skills: data lakes, batch pipelines (e.g., Airflow, Spark), and structured logging
• Familiarity with edge/embedded ML: quantisation (4–8 bits), memory footprints (20–50 MB), RAM budgets (100–300 MB)
• Experience configuring API gateways, serverless functions, and message queues (Kafka, Celery)
• Deep understanding of data security, privacy regulations (GDPR/HIPAA-inspired), consent flows, and audit logging
• Expertise in localisation and NLP: neural translation, dialect adaptation, multi-modal (text & voice) processing
• Comfortable in Agile/CI-CD environments with containerised microservices (Docker, Kubernetes)

Key Responsibilities:
• Develop and optimise Tiny LLM inference pipelines
• Implement dynamic risk-based escalation workflows (sentiment 0.0–1.0; thresholds 0.3–0.7; horizon 3–14 days)
• Build gamification engines (points: 10–100; streaks: 3–30 days; quest windows: 1–7 days) to boost retention
• Integrate neural machine translation with regional dialect support (latency: 100–300 ms; BLEU: 30–50) for text and voice interfaces
• Architect offline data synchronisation (intervals: 1–24 hrs; payload: 5–50 kB) and ensure seamless async sync at bandwidths below 50 kB/s
• Deploy models and services using TorchServe, ONNX, or Hugging Face Inference API, and manage serverless scaling, API gateways, and Kafka/Celery queues
• Collaborate with backend and mobile teams to meet performance targets (UI load: 100–300 ms; battery drain: 1–3%/hr)
• Embed security and compliance: AES-256 & TLS 1.3 encryption, consent management, legal disclaimers, audit-grade logging
• Maintain high availability (99.9% SLA), automated retraining cycles (1–4 weeks), and structured logging for analytics
• Write clean, production-grade Python code to support data pipelines, model training, inference, and integration

Preferred Skills:
• Prior work in digital health or mental-wellness applications
• Familiarity with mobile frameworks (React Native, Flutter)
• Experience designing or measuring gamification metrics
• Knowledge of federated learning or privacy-preserving ML

What We Offer:
- Competitive contract rate
- Remote work arrangement
- Opportunity to work on exciting projects with a talented team

If you’re passionate about building humane AI that transcends infrastructure barriers and delivers personalised, proactive mental care, we’d love to hear from you. Please share your resume and a brief note on a production ML system you’ve delivered end-to-end.