LLM Engineers for On-Premise LLM/SLM Setup

3 - 8 years

15 - 30 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Summary

LLM Engineer focused on LLMOps, RAG pipelines, containerized model serving, security-first infrastructure setup, and scalable inference architectures.

Key Responsibilities

1. On-Prem LLM / SLM Deployment

  • Install, configure, and deploy LLM/SLM models (Llama, Mistral, Qwen, Gemma, Phi, etc.) in on-premise Linux environments.
  • Implement GPU-based and CPU-optimized inference using vLLM, TGI, Ollama, LM Studio, FastChat, or custom model servers.
  • Set up model quantization (GGUF, GPTQ, AWQ) for performance optimization.

2. Architecture & Infrastructure

  • Build scalable, containerized LLM infrastructure using Docker, Kubernetes, Helm, Terraform, and Ansible.
  • Set up secure internal API gateways for model inference & embeddings.
  • Optimize memory, batching, caching, and multi-model hosting.

3. RAG / Hybrid RAG Pipelines

  • Implement Retrieval-Augmented Generation with vector DBs (pgvector, Chroma, Milvus, Weaviate).
  • Build custom pipelines: chunking, embedding generation, source indexing, and multi-vector search.
  • Develop hybrid pipelines using graph + vector approaches (GraphRAG).
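
For candidates who want a feel for the pipeline stages named above (chunking, embedding generation, source indexing, search), here is a minimal sketch in plain Python. It makes several assumptions that are not part of this role description: the `embed` function is a toy term-count stand-in for a real embedding model, the in-memory list stands in for a vector DB such as pgvector or Milvus, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (one simple chunking strategy)."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase term counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Store every chunk with its source id so results can be traced back."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text):
            index.append((source, chunk, embed(chunk)))
    return index


def search(index, query, top_k=2):
    """Rank indexed chunks by similarity to the query and return the best ones."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:top_k]]


# Hypothetical sample documents, for illustration only.
docs = {
    "gpu_guide": "Install the GPU driver and CUDA toolkit before starting the inference server.",
    "rag_notes": "Chunk documents, embed each chunk, and store vectors in a vector database.",
}
index = build_index(docs)
results = search(index, "how do I embed a chunk into a vector database", top_k=1)
print(results[0][0])
```

In a real deployment the retrieved chunks would be passed to the model as context, and the index would live in one of the vector DBs listed above rather than in memory.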

4. Security & Compliance

  • Ensure secure model deployment behind firewalls and restricted networks.
  • Implement RBAC, access control, API throttling, logging, and data-sanitization layers.
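
As an illustration of what RBAC and API throttling can look like in front of a model endpoint, the sketch below pairs a role-to-permission lookup with a token-bucket rate limiter. Everything in it is an assumption made for the example: the role names, the permission names, the `TokenBucket` class, and the injected clock are hypothetical and do not describe any system used by the employer.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True and spend one token if the request fits the budget."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical role-to-permission map (assumption, for illustration only).
ROLE_PERMISSIONS = {
    "admin": {"generate", "embed", "manage_models"},
    "analyst": {"generate", "embed"},
    "viewer": set(),
}


def authorize(role, action):
    """RBAC check: unknown roles get no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())


# Deterministic demo using a fake clock held in a one-element list.
now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]  # third call exceeds the burst size
now[0] += 1.0                               # one second passes, one token refills
after_refill = bucket.allow()
print(burst, after_refill, authorize("analyst", "manage_models"))
```

A gateway would typically keep one bucket per API key and run the authorization check before the throttle, so that rejected callers do not consume another client's budget.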

5. Model Evaluation & Monitoring

  • Track latency, throughput, memory usage, and cost.
  • Run evaluations using Ragas, DeepEval, Eval harness, etc.
  • Build dashboards and alerts for uptime, failures, or accuracy drops.
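
To make the latency and throughput metrics above concrete, here is a small sketch that summarizes a batch of request records. It assumes each record is a hypothetical `(latency_seconds, tokens_generated)` pair, uses a nearest-rank percentile, and computes throughput as if requests ran one after another; a production setup would more likely export such figures to Prometheus and chart them in Grafana.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests):
    """Summarize (latency_seconds, tokens_generated) records.

    Throughput here assumes sequential requests (an assumption of this sketch);
    with concurrent serving, divide by wall-clock time instead.
    """
    latencies = [latency for latency, _ in requests]
    total_tokens = sum(tokens for _, tokens in requests)
    total_time = sum(latencies)
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "tokens_per_s": total_tokens / total_time if total_time else 0.0,
    }


# Hypothetical sample measurements, for illustration only.
samples = [(0.5, 100), (1.0, 200), (1.5, 300), (2.0, 400)]
summary = summarize(samples)
print(summary)
```

Tail percentiles such as p95 are usually a better alerting signal than the mean, because a few slow generations can hide behind a healthy average.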

6. API Integrations & Applications

  • Expose on-prem LLMs via REST / WebSockets / gRPC APIs.
  • Integrate with enterprise applications, workflow tools, or automation systems.
  • Develop Python/Node scripts for orchestration and prompt workflows.

Required Technical Skills

Core LLM / SLM Skills

  • Deployment of LLMs on-prem (Llama, Qwen, Mistral, Gemma, Mixtral, etc.)
  • Quantization formats: GGUF, GPTQ, AWQ, FP16/INT8 optimizations
  • Fast inference frameworks:
    • vLLM, Hugging Face Text Generation Inference (TGI), Ollama, LM Studio
    • Triton Inference Server (optional)

RAG & Vector Database Skills

  • Vector DBs: pgvector, Chroma, Milvus, Weaviate, FAISS
  • Embedding pipelines (HF Transformers, alternatives to OpenAI embeddings)
  • Chunking, indexing, retrieval optimization
  • Knowledge of GraphRAG or hybrid indexing (optional but preferred)

Backend & DevOps Skills

  • Python (FastAPI), Node.js, shell scripting
  • Docker, Kubernetes, Helm
  • GPU drivers, CUDA/cuDNN, inference optimization
  • Linux (Ubuntu/CentOS) for model deployment
  • CI/CD for model updates

MLOps / LLMOps Skills

  • Monitoring & logs: Prometheus, Grafana
  • Model evaluation suites: Ragas, DeepEval
  • Scalable inference tuning (batching, caching, token streaming)

Security Skills

  • On-prem networking & firewall rules
  • RBAC, API keys, authentication
  • Handling PII-sensitive offline deployments

Experience Required

  • 3–8 years of total experience in AI/ML/software engineering.
  • 1–3 years of hands-on experience in LLM/SLM deployment (on-prem or private cloud).
  • Strong experience in Python or Node.js backend development.
  • Practical experience with GPU servers, distributed systems, or inference optimization.

Preferred Skills

  • Knowledge of LangChain, LlamaIndex, Haystack
  • Experience in building agentic workflows
  • Custom model fine-tuning (LoRA, QLoRA, Adapters)
  • Experience with enterprise RAG, document processing, or ETL
  • Understanding of tokenization, embeddings, and transformer internals
  • Exposure to Dify, Flowise, OpenWebUI, or similar platforms

Tenth Planet Technologies

Software Development

Innovation City