AI / ML Developer (LLMs & Self-Hosting)

2 - 7 years

9 - 13 Lacs

Posted: 4 days ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Contezy builds scalable web experiences and AI-driven systems empowering automation and decision-making at scale. We emphasize reproducibility, cost-effective performance, and secure deployment of machine learning infrastructure.
Position Summary
The AI / ML Developer will architect and maintain self-hosted LLM systems for retrieval-augmented generation, task-specific assistants, knowledge indexing, and real-time inference. You'll work across model selection, fine-tuning, dataset engineering, deployment, and monitoring pipelines.
Key Responsibilities
  • Select and benchmark LLMs based on performance, latency, and cost trade-offs.
  • Fine-tune and adapt models via supervised fine-tuning, LoRA, or PEFT using curated datasets.
  • Implement retrieval-augmented generation (RAG) with vector stores and embedding workflows.
  • Develop scalable model-serving APIs and inference systems (multi-GPU, quantized models, batching).
  • Containerize and deploy models using Docker and Kubernetes with CI/CD workflows.
  • Optimize inference performance with quantization, ONNX, and accelerated runtimes.
  • Instrument observability and performance metrics: latency, throughput, and cost monitoring.
  • Collaborate with cross-functional teams to integrate models into production systems.
Required Skills & Experience
  • 2+ years of professional experience in AI/ML engineering, with hands-on LLM deployment.
  • Expertise in Python, PyTorch, and ML pipeline development.
  • Experience with self-hosting LLMs, model serving (vLLM, Text-Generation-Inference, etc.), and GPU optimization.
  • Strong understanding of containerization (Docker/Kubernetes) and backend APIs (FastAPI/Flask).
  • Knowledge of vector databases (FAISS, Milvus, Pinecone) and retrieval strategies.
  • Familiarity with quantization, LoRA fine-tuning, and deployment optimization for cost efficiency.
Preferred Skills
  • Experience with Hugging Face Transformers, Accelerate, and LangChain.
  • Knowledge of open models (Llama, Mistral, Falcon, Mixtral, etc.) and quantized inference frameworks (GGUF, bitsandbytes, llama.cpp).
  • Exposure to MLOps, model observability, and reproducible training workflows.
  • Experience fine-tuning or benchmarking foundation models on custom datasets.

Disclaimer: The job location mentioned in this description is based on publicly available information or company headquarters. Candidates are advised to verify the exact job location directly with the employer before applying.
