Systems Engineer

0 years

3 - 5 Lacs

Posted: 3 hours ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Full Time

Job Description

As a Systems Engineer, you will deploy, optimize, and maintain local AI systems, including large language models (LLMs), embedding generators, rerankers, and retrieval pipelines. The role focuses on ensuring reliable local inference, policy‑safe routing, and end‑to‑end RAG performance within a fully private environment.

Responsibilities:

* Deploy and configure local LLMs (Ollama/vLLM) for low‑latency chat and retrieval tasks.
* Integrate embedding models and rerankers (e.g., bge, jina, gte, or Hugging Face alternatives).
* Implement hybrid retrieval (BM25 + vector) pipelines with pgvector.
* Own and maintain the policy engine controlling model routing and classification (local vs external).
* Conduct performance benchmarking and quantization tests for different model sizes.
* Tune model parameters for optimal inference on available GPUs.
* Collaborate with Backend engineers to wire AI inference APIs into FastAPI services.
* Develop scripts to monitor model uptime, latency, and retrieval quality.
* Maintain reproducibility: model versions, config hashes, and deterministic inference logs.
* Contribute to the Q‑CERT pipeline with model metadata and audit hashes.

Required Skills:

* Python (LangChain or LlamaIndex).
* Hugging Face Transformers and embeddings.
* Familiarity with Ollama, vLLM, or text‑generation‑inference.
* Basic GPU management, CUDA, and quantization (GGUF, GPTQ, AWQ).
* Understanding of RAG systems and evaluation metrics.
* Linux environment management and containerized inference (Docker).

Preferred (Bonus):

* Experience with fine‑tuning or LoRA adapters.
* Familiarity with vector DBs (pgvector, FAISS).
* Exposure to model evaluation tools (RAGAS, DeepEval).
* Knowledge of policy enforcement or prompt‑guard frameworks.

Work Style:

* Works closely with the Backend / Infra Engineer on deployment and data pipelines.
* Weekly sync with the Frontend team to validate outputs and UI integration.
* Expected to test and log all model benchmarks before production use.
* Operates in a secure internal environment: zero cloud data leakage allowed.

Notes:

Initial 3‑month engagement, with an option to extend based on model stability, performance gains, and adherence to privacy protocols.

Job Type: Full-time

Pay: ₹25,000.00 - ₹45,000.00 per month

Benefits:

  • Health insurance
  • Paid time off
  • Provident Fund

Work Location: In person
