We are looking for a skilled AI Engineer / Data Scientist with hands-on experience in OCR pipelines and Large Language Model (LLM) fine-tuning. The ideal candidate will develop, fine-tune, and optimize Vision-Language Models (VLMs) to extract structured information from scanned or image-based documents, and will collaborate closely with data scientists and backend engineers to build scalable, accurate, production-ready AI pipelines.

Key Responsibilities
- Fine-tune and adapt multimodal LLMs (e.g., Qwen-VL, LLaVA, or similar) for domain-specific document understanding.
- Design prompt templates and instruction sets to improve structured JSON output.
- Perform incremental and cross-dataset fine-tuning for robust generalization.
- Implement evaluation metrics and create validation datasets to track performance.
- Optimize inference using quantization, LoRA adapters, and frameworks such as Ray Serve, vLLM, or Unsloth.
- Collaborate with backend teams to integrate models into production systems.
- Develop monitoring tools to log model confidence, token usage, and latency metrics.

Required Skills & Experience
- Strong programming skills in Python, with experience in PyTorch and Hugging Face Transformers.
- Expertise in OCR tools such as PaddleOCR, Tesseract, or EasyOCR.
- Hands-on experience fine-tuning and serving LLMs / VLMs (e.g., Qwen, LLaVA, Mistral, Vicuna).
- Solid understanding of LoRA / QLoRA / PEFT for efficient model training.
- Experience with structured data extraction, prompt engineering, and JSON schema generation.
- Familiarity with Unsloth, Ray Serve, or vLLM for scalable inference.
- Proficiency with Docker, CUDA, and NVIDIA GPU environments.
- Strong grasp of tokenization, attention mechanisms, and 4-bit / 8-bit quantization.

Nice to Have
- Experience with LLM model serving and orchestration.
- Exposure to CI/CD pipelines and deployment on AWS, Azure, or GCP.
- Knowledge of PDF parsing tools (Camelot, PyMuPDF, etc.).
- Prior work in document intelligence or AI-based invoice/data extraction systems.

What We Offer
- Opportunity to work on cutting-edge Vision-Language AI models.
- Hands-on involvement in production-grade OCR + LLM pipelines.
- Competitive compensation and a flexible work culture.
- Collaborative environment that fosters AI innovation in enterprise automation.
Notice: One month. Candidates from Kolkata are preferred.
Desired Candidate Profile
- 6+ years of experience as an AI/ML Engineer or Deep Learning Specialist.
- Ability to work independently and deliver production-quality ML pipelines.
- Strong problem-solving, debugging, and research-oriented mindset.
- Capability to work in fast-paced, distributed/remote environments with tight timelines.