Machine Learning Engineer – LLM & RAG (Remote, India)
About The Opportunity
We operate in the AI/ML and Enterprise Software sector, building production-ready large language model (LLM) applications and retrieval-augmented generation (RAG) systems that solve real-world enterprise problems. The team focuses on scalable, low-latency LLM inference, vector search, and data pipelines to deliver intelligent search, summarization, and automated knowledge workflows for customers across industries.
Role & Responsibilities
- Design and implement end-to-end RAG solutions: document ingestion, embedding generation, vector indexing, retriever design, and LLM-based response generation.
- Develop and maintain Python back-end services and APIs that integrate LLMs, LangChain/LlamaIndex workflows, and vector search for production use.
- Optimize LLM inference performance: model selection, batching, quantization, ONNX/Triton integration, and memory/GPU optimization to meet latency and cost SLAs.
- Integrate and tune vector search stacks (FAISS, Milvus, Weaviate, or hosted vector DBs) and design embedding strategies for robust retrieval.
- Deploy and operate scalable infrastructure using Docker and orchestration platforms; automate CI/CD, monitoring, and alerting for ML services.
- Collaborate with Data Scientists and product teams to productionize models, implement A/B experiments, monitor drift, and iterate on model quality and UX.
Skills & Qualifications
Must-Have
- 4+ years of experience in machine learning or ML engineering with hands-on LLM projects.
- Strong software engineering skills in Python and experience building production back-end services.
- Experience with transformer frameworks and LLM tooling (Hugging Face Transformers, PyTorch).
- Practical experience building RAG pipelines and working with vector search (FAISS or similar).
- Proven experience deploying ML services with Docker and cloud environments (AWS/GCP/Azure).
- Knowledge of model optimization and serving techniques (quantization, ONNX, Triton, batching).
Preferred
- Hands-on experience with LangChain, LlamaIndex, or similar orchestration frameworks.
- Familiarity with vector databases (Milvus, Weaviate) and managed vector DB services.
- Experience with MLOps and monitoring tools (MLflow, Prometheus, Grafana, model-drift tooling).
Benefits & Culture Highlights
- Fully remote role open to candidates across India, with flexible hours that support work-life balance.
- Opportunity to work on cutting-edge LLM/RAG products and influence architecture and tooling choices.
- Collaborative, fast-paced engineering culture that values ownership, experimentation, and scalable design.
To apply, you should bring strong Python engineering skills, hands-on LLM/RAG experience, and a passion for shipping scalable AI systems. This role is ideal for engineers who enjoy end-to-end ownership of production ML services and optimizing LLMs for real user impact.
Skills: LLM, RAG, Python