Posted: 1 day ago
Hybrid | Full Time
About the Opportunity:
We are a fast-growing enterprise AI & data science consultancy serving global clients across finance, healthcare, and enterprise software. The team builds production-grade LLM-driven products (RAG systems, intelligent assistants, and custom inference pipelines) that deliver measurable business outcomes.

Role & Responsibilities:
- Design, fine-tune, and productionize large language models (instruction tuning, LoRA/PEFT) using PyTorch and Hugging Face tooling for real-world applications (a minimal fine-tuning sketch follows this list).
- Architect and implement RAG pipelines: embeddings generation, chunking strategies, vector search integration (FAISS/Pinecone/Milvus), and relevance tuning for high-quality retrieval.
- Build scalable inference services and APIs (FastAPI/Falcon), containerize (Docker), and deploy to cloud/Kubernetes with low-latency, cost-optimized inference (quantization, ONNX/Triton).
- Collaborate with data engineers and ML scientists to productionize data pipelines and automate retraining, monitoring, evaluation, and drift detection.
- Drive prompt engineering, evaluation frameworks, and safety/guardrail implementation to ensure reliable, explainable LLM behavior in production.
- Establish engineering best practices (Git workflows, CI/CD, unit tests, observability) and mentor junior engineers to raise team delivery standards.
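For context on the first responsibility, here is a minimal LoRA fine-tuning sketch using PyTorch, Hugging Face Transformers, and the peft library. The base checkpoint, dataset file, and hyperparameters are placeholders rather than a prescribed setup.

```python
# Minimal LoRA fine-tuning sketch (base model, dataset, and hyperparameters are placeholders).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Llama-3.2-1B"              # assumption: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
model = get_peft_model(model, LoraConfig(            # wrap the base model with low-rank adapters
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],             # typical attention projections for LLaMA-style models
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()                   # only the adapter weights are trainable

# Hypothetical instruction-tuning data: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                      remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4, bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")                    # saves only the LoRA adapter weights
```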
Skills & Qualifications:

Must-Have:
- 4+ years in data science / ML engineering with demonstrable experience building and shipping LLM-based solutions to production.
- Strong Python engineering background and hands-on experience with PyTorch and Hugging Face Transformers (fine-tuning, tokenizers, model optimization).
- Practical experience implementing RAG: embeddings, vector DBs (FAISS/Pinecone/Weaviate/Milvus), chunking, and retrieval tuning (see the retrieval sketch after this list).
- Production deployment experience: Docker, Kubernetes, cloud infrastructure (AWS/GCP/Azure), and inference optimization (quantization, batching, ONNX/Triton).
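To illustrate the RAG requirement, a minimal retrieval sketch follows, assuming sentence-transformers for embeddings and a flat FAISS index. The embedding model, chunk sizes, and corpus are placeholders; a production pipeline would typically use a managed vector DB (Pinecone/Weaviate/Milvus) and corpus-specific chunking.

```python
# Minimal RAG retrieval sketch: chunk documents, embed them, index with FAISS,
# and fetch the top-k chunks for a query. Model name and chunking are placeholders.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumption: any embedding model

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap; real pipelines tune this per corpus."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

documents = ["...long source document text...", "...another document..."]  # hypothetical corpus
chunks = [c for doc in documents for c in chunk(doc)]

# Embed chunks and build an inner-product index over L2-normalized vectors (cosine similarity).
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

def retrieve(query: str, k: int = 4) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [chunks[i] for i in ids[0]]

# The retrieved chunks would then be injected into the LLM prompt alongside the user question.
context = "\n\n".join(retrieve("What does the contract say about renewal terms?"))
```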
Preferred:
- Experience with LangChain/LangGraph or similar orchestration frameworks, and building agentic workflows (a minimal orchestration sketch follows this list).
- Familiarity with ML observability, model governance, safety/bias mitigation techniques, and cost/performance trade-offs for production LLMs.
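For the orchestration item above, here is a minimal LangChain (LCEL) sketch. It assumes the langchain-core and langchain-openai packages, an OPENAI_API_KEY in the environment, and a placeholder model name and context string; the same flow could equally be expressed in LangGraph or another framework.

```python
# Minimal LCEL orchestration sketch: retrieved context + user question -> chat model -> string.
# Model name and context string are placeholders; in practice the context comes from the RAG pipeline.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context.\n\nContext:\n{context}"),
    ("human", "{question}"),
])

# Compose prompt -> model -> output parser into a single runnable chain.
chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()

answer = chain.invoke({
    "context": "...chunks returned by the retrieval pipeline...",
    "question": "What does the contract say about renewal terms?",
})
print(answer)
```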
Zorba Consulting
Location: Pune, Bengaluru
Salary: 10.0 - 14.0 Lacs P.A.