About MAKO
Founded in 2013, Mako IT Lab is a global software development company with a strong presence across the USA, UK, India, and Nepal. Over the years, we’ve partnered with companies around the world, helping them solve complex challenges and build meaningful digital experiences.

What truly defines Mako is our culture. We believe in creating an environment where people feel empowered to take ownership, exercise freedom in their ideas, and contribute to solutions that genuinely make an impact. Learning is at the heart of who we are: our teams constantly grow through hands-on exposure, real-world problem solving, and continuous knowledge sharing across functions and geographies.

We don’t just build long-term partnerships with clients; we build long-term careers for our people. At Mako, you’ll be part of a collaborative, supportive, and fast-growing global team where curiosity is encouraged, initiative is celebrated, and every individual plays a meaningful role in shaping the company’s journey.
Role Overview
We are seeking an experienced AI Engineer with deep expertise in LLM-driven architectures, RAG systems, agentic workflows, and multimodal AI development. The ideal candidate will be skilled in building scalable AI pipelines using FastAPI, Kafka, FastMCP, and Tavily Web Search, while also having hands-on experience with vLLM-based inference and Stable Diffusion pipelines.

You will architect and implement intelligent systems leveraging Large Language Models, vision models, and autonomous agents, with a strong focus on observability, performance, and production reliability.
Key Responsibilities
LLM, vLLM & Agentic System Development
- Build autonomous LLM agents using LangChain, LangGraph, and FastMCP.
- Develop RAG workflows using embeddings, vector stores, and knowledge-grounded reasoning.
- Integrate vLLM, SGLang, or other high-throughput inference backends for low-latency model serving.
- Implement Tavily web-search integrations for real-time knowledge augmentation.
- Optimize inference using GGUF quantization, tensorized formats, and GPU-accelerated pipelines.
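
A minimal sketch of the web-augmented inference described above, assuming a vLLM OpenAI-compatible server running locally and the tavily-python client; the endpoint URL, model name, and API keys are placeholders, not project specifics:

```python
# Sketch: real-time knowledge augmentation with Tavily + a vLLM
# OpenAI-compatible endpoint. URL, model name, and keys are placeholders.
from openai import OpenAI
from tavily import TavilyClient

tavily = TavilyClient(api_key="TAVILY_API_KEY")        # placeholder key
llm = OpenAI(base_url="http://localhost:8000/v1",      # vLLM's OpenAI-compatible server
             api_key="EMPTY")                          # vLLM ignores the key by default

def answer_with_web_context(question: str) -> str:
    # Fetch fresh web snippets to ground the model's answer.
    hits = tavily.search(question, max_results=3)
    context = "\n".join(r["content"] for r in hits["results"])

    response = llm.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",      # example model served by vLLM
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
```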
Multimodal & Image Generation Systems
- Build and deploy Stable Diffusion (SDXL/SD 1.5/ControlNet/T2I) pipelines for image generation tasks.
- Integrate LoRAs, control modules, and diffusion-based fine-tuning for custom domains.
- Develop multimodal agents that combine LLM reasoning with vision tasks such as classification, captioning, or image prompts.
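
For illustration, a minimal SDXL-plus-LoRA pipeline using the diffusers library; the base checkpoint and LoRA identifier are example values rather than project-specific assets:

```python
# Sketch: SDXL image generation with an optional LoRA adapter via diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",   # public SDXL base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Optional: load a domain-specific LoRA (repo/path is a placeholder).
pipe.load_lora_weights("your-org/your-style-lora")

image = pipe(
    prompt="isometric illustration of a data center, clean vector style",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("sample.png")
```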
Backend & Infrastructure Engineering
- Build robust FastAPI services for orchestrating LLMs, Stable Diffusion, retrieval, and agentic tasks.
- Develop event-driven workflows using Kafka for distributed AI systems.
- Implement auditing, agent-output monitoring, and API-layer logging for end-to-end traceability.
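
A minimal sketch of the event-driven pattern described above, assuming aiokafka and a locally reachable broker; the topic name and broker address are placeholders:

```python
# Sketch: a FastAPI service that publishes inference requests to Kafka
# for downstream workers. Broker address and topic name are placeholders.
import json
from contextlib import asynccontextmanager

from aiokafka import AIOKafkaProducer
from fastapi import FastAPI
from pydantic import BaseModel

KAFKA_BOOTSTRAP = "localhost:9092"   # assumed broker address
TOPIC = "inference-requests"         # assumed topic name

producer: AIOKafkaProducer | None = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global producer
    producer = AIOKafkaProducer(bootstrap_servers=KAFKA_BOOTSTRAP)
    await producer.start()
    yield
    await producer.stop()

app = FastAPI(lifespan=lifespan)

class GenerateRequest(BaseModel):
    prompt: str
    request_id: str

@app.post("/generate")
async def enqueue_generation(req: GenerateRequest):
    # Publish the request as an event; a separate consumer runs the model.
    await producer.send_and_wait(TOPIC, json.dumps(req.model_dump()).encode("utf-8"))
    return {"status": "queued", "request_id": req.request_id}
```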
High-Level API & Third-Party Integrations
- Integrate third-party services: authentication, analytics, search APIs, cloud inference APIs, and enterprise data sources.
- Build secure and scalable API layers for production deployments.
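
One possible shape for a secured API layer, shown here as a FastAPI API-key dependency; the header name and in-memory key set are illustrative stand-ins for a real auth provider or secrets backend:

```python
# Sketch: API-key authentication applied as a reusable FastAPI dependency.
from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
VALID_KEYS = {"demo-key"}  # placeholder; load from a vault or IAM in production

async def require_api_key(api_key: str = Security(api_key_header)) -> str:
    if api_key not in VALID_KEYS:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

app = FastAPI()

@app.get("/health", dependencies=[Depends(require_api_key)])
async def health():
    return {"status": "ok"}
```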
Fine-tuning & Model Lifecycle Management
- Fine-tune LLaMA, Mistral, Phi-3, and diffusion models for domain-specific tasks.
- Use MLflow for tracking experiments, hyperparameters, metrics, and versioning.
- Conduct evaluation on hallucinations, retrieval consistency, reasoning depth, and multimodal accuracy.
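
A minimal MLflow tracking sketch for a fine-tuning run; the experiment name, hyperparameters, and metric values are illustrative placeholders:

```python
# Sketch: logging a fine-tuning run's params, metrics, and artifacts to MLflow.
import mlflow

mlflow.set_experiment("llama-domain-finetune")   # placeholder experiment name

with mlflow.start_run(run_name="lora-r16-lr2e-4"):
    # Hyperparameters for the run
    mlflow.log_params({"base_model": "meta-llama/Llama-3.1-8B",
                       "lora_rank": 16,
                       "learning_rate": 2e-4})

    # ... training loop would go here ...

    # Example evaluation metrics (placeholder values)
    mlflow.log_metrics({"eval_loss": 1.23,
                        "hallucination_rate": 0.07,
                        "retrieval_consistency": 0.91})

    # Attach evaluation artifacts (assumes the eval step wrote this file)
    mlflow.log_artifact("eval_report.json")
```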
Required Skills & Qualifications
Core AI/LLM Skills
- Experience with LLMs, RAG systems, LangChain, LangGraph, LlamaIndex
- Hands-on with vLLM, SGLang, or similar inference engines
- Model quantization (GGUF), optimization, and GPU memory tuning
- Agent frameworks & tool calling (FastMCP, Groq, Hugging Face)
Multimodal & Image Generation
- Stable Diffusion, ControlNet, LoRA fine-tuning, custom pipelines
- Diffusers, ComfyUI, or InvokeAI experience (bonus)
Engineering & Systems
- Kafka-based event-driven systems
- FastAPI/Flask/Node.js backend development
- Third-party API integrations
- Docker, CI/CD, and cloud platforms (GCP/Azure)
Databases & Retrieval
- MongoDB, DuckDB
- Embedding stores, vector databases (Pinecone / Qdrant), retrieval optimization
Observability & MLOps
- MLflow for experiment tracking and model lifecycle
- Performance monitoring, logging, auditing, API observability
Frontend (Good to have)
- React, Redux, Next.js, Electron.js for dashboards and AI interfaces