LLM Engineers for On-Premise LLM/SLM Setup

Tenth Planet Technologies

3 - 8 years

15 - 30 Lacs

Chennai

Posted: 1 week ago

Skills Required

slm model llm

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Summary

LLM Engineer

LLM Ops, RAG pipelines, containerized model serving, security-first infra setup, and scalable inference architectures

Key Responsibilities

1. On-Prem LLM / SLM Deployment

Install, configure, and deploy LLM/SLM models (Llama, Mistral, Qwen, Gemma, Phi, etc.) in on-premise Linux environments.
Implement GPU-based and CPU-optimized inference using vLLM, TGI, Ollama, LM Studio, FastChat, or custom model servers.
Set up model quantization (GGUF, GPTQ, AWQ) for performance optimization.

2. Architecture & Infrastructure

Build scalable, containerized LLM infrastructure using Docker, Kubernetes, Helm, Terraform, and Ansible.
Set up secure internal API gateways for model inference and embeddings.
Optimize memory, batching, caching, and multi-model hosting.

3. RAG / Hybrid RAG Pipelines

Implement Retrieval-Augmented Generation with vector DBs (pgvector, Chroma, Milvus, Weaviate).
Build custom pipelines: chunking, embedding generation, source indexing, and multi-vector search.
Develop hybrid pipelines using graph + vector approaches (GraphRAG).
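
For candidates gauging what the chunking step of such a pipeline involves, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: the function name `chunk_text` and the character-based window are assumptions, not something specified in this posting, and production pipelines often split on tokens or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration; not taken from the job posting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of storing some text twice.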

4. Security & Compliance

Ensure secure model deployment behind firewalls and restricted networks.
Implement RBAC, access control, API throttling, logging, and data-sanitization layers.
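
As one possible illustration of the API throttling item above, the sketch below implements a token-bucket rate limiter. The class name `TokenBucket` and its parameters are hypothetical, and the posting does not prescribe this algorithm; a real deployment would more likely enforce limits at the API gateway.

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `capacity`.

    Hypothetical class for illustration; the injectable clock makes it testable.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

A per-API-key dictionary of such buckets is one simple way to apply different limits to different callers.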

5. Model Evaluation & Monitoring

Track latency, throughput, memory usage, and cost.
Run evaluations using Ragas, DeepEval, Eval harness, etc.
Build dashboards and alerts for uptime, failures, or accuracy drops.
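
To make the latency and throughput tracking above concrete, here is a small sketch that summarizes per-request latencies into p50/p95 values and a tokens-per-second figure. The function names and the nearest-rank percentile method are assumptions chosen for illustration; in practice these numbers would usually come from a metrics stack such as Prometheus and Grafana.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_s: list[float], tokens_generated: int, wall_time_s: float) -> dict:
    """Summarize request latencies and overall generation throughput."""
    return {
        "p50_s": percentile(latencies_s, 50),
        "p95_s": percentile(latencies_s, 95),
        "tokens_per_s": tokens_generated / wall_time_s,
    }


if __name__ == "__main__":
    latencies = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    print(summarize(latencies, tokens_generated=500, wall_time_s=2.0))
```

Tail percentiles such as p95 tend to be more useful for alerting than the mean, because batching and queueing delays show up there first.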

6. API Integrations & Applications

Expose on-prem LLMs via REST / WebSockets / gRPC APIs.
Integrate with enterprise applications, workflow tools, or automation systems.
Develop Python/Node scripts for orchestration and prompt workflows.

Required Technical Skills

Core LLM / SLM Skills

Deployment of LLMs on-prem (Llama, Qwen, Mistral, Gemma, Mixtral, etc.)
Quantization formats: GGUF, GPTQ, AWQ, FP16/INT8 optimizations
Fast inference frameworks: vLLM, Hugging Face Text Generation Inference (TGI), Ollama, LM Studio
Triton Inference Server (optional)

RAG & Vector Database Skills

Vector DBs: pgvector, Chroma, Milvus, Weaviate, FAISS
Embedding pipelines (HF Transformers, alternatives to OpenAI embeddings)
Chunking, indexing, retrieval optimization
Knowledge of GraphRAG or hybrid indexing (optional but preferred)

Backend & DevOps Skills

Python (FastAPI), Node.js, shell scripting
Docker, Kubernetes, Helm
GPU drivers, CUDA/cuDNN, inference optimization
Linux (Ubuntu/CentOS) for model deployment
CI/CD for model updates

MLOps / LLMOps Skills

Monitoring & logs: Prometheus, Grafana
Model evaluation suites: Ragas, DeepEval
Scalable inference tuning (batching, caching, token streaming)

Security Skills

On-prem networking & firewall rules
RBAC, API keys, authentication
Handling PII-sensitive offline deployments

Experience Required

3–8 years total experience in AI/ML/Software engineering.
1–3 years of hands-on experience in LLM/SLM deployment (on-prem or private cloud).
Strong experience in Python or Node.js backend development.
Practical experience with GPU servers, distributed systems, or inference optimization.

Preferred Skills

Knowledge of LangChain, LlamaIndex, Haystack
Experience in building agentic workflows
Custom model fine-tuning (LoRA, QLoRA, Adapters)
Experience with enterprise RAG, document processing, or ETL
Understanding of tokenization, embeddings, and transformer internals
Exposure to Dify, Flowise, OpenWebUI, or similar platforms
