On-site
Full Time
As a System Engineer, you will deploy, optimize, and maintain local AI systems, including large language models (LLMs), embedding generators, rerankers, and retrieval pipelines. The role focuses on ensuring reliable local inference, policy‑safe routing, and end‑to‑end RAG performance within a fully private environment.
Responsibilities:
* Deploy and configure local LLMs (Ollama/vLLM) for low‑latency chat and retrieval tasks.
* Integrate embedding models and rerankers (e.g., bge, jina, gte, or Hugging Face alternatives).
* Implement hybrid retrieval (BM25 + vector) pipelines with pgvector.
* Own and maintain the policy engine controlling model routing and classification (local vs external).
* Conduct performance benchmarking and quantization tests for different model sizes.
* Tune model parameters for optimal inference on available GPUs.
* Collaborate with backend engineers to wire AI inference APIs into FastAPI services.
* Develop scripts to monitor model uptime, latency, and retrieval quality.
* Maintain reproducibility: model versions, config hashes, and deterministic inference logs.
* Contribute to the Q‑CERT pipeline with model metadata and audit hashes.
Required Skills:
* Python (LangChain or LlamaIndex).
* Hugging Face Transformers and embeddings.
* Familiarity with Ollama, vLLM, or text‑generation‑inference.
* Basic GPU management, CUDA, and quantization (GGUF, GPTQ, AWQ).
* Understanding of RAG systems and evaluation metrics.
* Linux environment management and containerized inference (Docker).
Preferred (Bonus):
* Experience with fine‑tuning or LoRA adapters.
* Familiarity with vector DBs (pgvector, FAISS).
* Exposure to model evaluation tools (RAGAS, DeepEval).
* Knowledge of policy enforcement or prompt‑guard frameworks.
Work Style:
* Works closely with Backend / Infra Engineer for deployment and data pipelines.
* Weekly sync with Frontend team to validate outputs and UI integration.
* Expected to test and log all model benchmarks before production use.
* Operates in a secure internal environment: zero cloud data leakage allowed.
Notes:
Initial 3‑month engagement with option to extend based on model stability, performance gains, and adherence to privacy protocols.
Job Type: Full-time
Pay: ₹25,000.00 - ₹45,000.00 per month
Work Location: In person
Atman Artwork