As a Systems Engineer, you will deploy, optimize, and maintain local AI systems, including large language models (LLMs), embedding models, rerankers, and retrieval pipelines. The role focuses on reliable local inference, policy‑safe model routing, and end‑to‑end RAG performance in a fully private environment.

Responsibilities:

* Deploy and configure local LLMs (Ollama/vLLM) for low‑latency chat and retrieval tasks.
* Integrate embedding models and rerankers (e.g., bge, jina, gte, or Hugging Face alternatives).
* Implement hybrid retrieval (BM25 + vector) pipelines with pgvector.
* Own and maintain the policy engine controlling model routing and classification (local vs external).
* Conduct performance benchmarking and quantization tests for different model sizes.
* Tune model parameters for optimal inference on available GPUs.
* Collaborate with Backend engineers to wire AI inference APIs into FastAPI services.
* Develop scripts to monitor model uptime, latency, and retrieval quality.
* Maintain reproducibility: model versions, config hashes, and deterministic inference logs.
* Contribute to the Q‑CERT pipeline with model metadata and audit hashes.
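For the hybrid retrieval (BM25 + vector) responsibility above, one widely used way to merge the two result lists is Reciprocal Rank Fusion (RRF). The sketch below is illustrative only and does not describe this team's actual pipeline: the function name `reciprocal_rank_fusion` is hypothetical, `k=60` is the commonly cited RRF constant, and the two input lists stand in for ranked document IDs returned by a BM25 query and a pgvector similarity query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs with RRF.

    Hypothetical helper: each list could come from a different retriever,
    e.g. one from BM25 and one from a pgvector similarity search.
    k is a smoothing constant that damps the influence of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes 1 / (k + rank) for every doc it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only uses ranks, so it avoids normalizing BM25 scores against cosine distances, which live on different scales. The deterministic tie-break also fits the reproducibility requirement listed above.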

Required Skills:

* Python (LangChain or LlamaIndex).
* Hugging Face Transformers and embeddings.
* Familiarity with Ollama, vLLM, or text‑generation‑inference.
* Basic GPU management, CUDA, and quantization (GGUF, GPTQ, AWQ).
* Understanding of RAG systems and evaluation metrics.
* Linux environment management and containerized inference (Docker).
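As an example of the retrieval evaluation metrics mentioned above, recall@k and mean reciprocal rank (MRR) are two standard measures of retrieval quality in a RAG system. The sketch below is a minimal, assumption-based illustration. The helper names `recall_at_k` and `mean_reciprocal_rank` are hypothetical, and tools such as RAGAS or DeepEval provide their own, broader metric sets.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant hit per query (0 if none is found).

    runs: list of (retrieved_list, relevant_set) pairs, one pair per query.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)


runs = [(["a", "b", "c"], {"b"}), (["x", "y", "z"], {"q"})]
print(mean_reciprocal_rank(runs))                      # first query hits at rank 2, second misses
print(recall_at_k(["a", "b", "c"], {"b", "c", "d"}, 2))  # only "b" of 3 relevant docs is in the top 2
```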

Preferred (Bonus):

* Experience with fine‑tuning or LoRA adapters.
* Familiarity with vector DBs (pgvector, FAISS).
* Exposure to model evaluation tools (RAGAS, DeepEval).
* Knowledge of policy enforcement or prompt‑guard frameworks.

Work Style:

* Works closely with the Backend/Infra Engineer on deployment and data pipelines.
* Weekly sync with the Frontend team to validate outputs and UI integration.
* Expected to test and log all model benchmarks before production use.
* Operates in a secure internal environment; zero cloud data leakage is allowed.

Notes:

Initial 3‑month engagement, with an option to extend based on model stability, performance gains, and adherence to privacy protocols.

Job Type: Full-time

Pay: ₹25,000.00 - ₹45,000.00 per month

Benefits:

Health insurance
Paid time off
Provident Fund

Work Location: In person

Company: Atman Artwork

Role: Systems Engineer