Machine Learning Engineer - LLM/Python

Experience: 0 years
Salary: 0 Lacs
Posted: 1 week ago | Platform: LinkedIn
Work Mode: On-site
Job Type: Full Time

Job Description

Job Overview

We are seeking a highly skilled and motivated LLM Evaluation Framework Developer to design, build, and maintain robust frameworks for evaluating large language models (LLMs). You will work closely with ML researchers, engineers, and product teams to define metrics, automate evaluations, integrate datasets, and ensure model behaviour aligns with safety, quality, and performance expectations.

Key Responsibilities

  • Design and implement evaluation frameworks for benchmarking LLMs across dimensions such as accuracy, robustness, reasoning, safety, and hallucination.
  • Develop modular pipelines to support automatic, semi-automatic, and human-in-the-loop evaluations.
  • Integrate and customize tools like Giskard, RAGAS, DeepEval, Opik/Comet, TruLens, or similar.
  • Define and implement custom metrics for specific use cases such as RAG, agent performance, and guardrails compliance.
  • Curate or generate high-quality evaluation datasets for various domains (e.g., medical, finance, legal, general QA, code generation).
  • Collaborate with LLM application developers to instrument tracing and logging that capture model behaviour in real-world flows.
  • Implement dashboarding and reporting to visualize performance trends, regressions, and comparisons across model versions.
  • Evaluate model responses using structured prompts, chain-of-thought techniques, adversarial tests, and A/B comparisons.
  • Support red-teaming and stress-testing efforts to identify vulnerabilities or ethical risks in model outputs.
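As a toy illustration of the kind of custom metric such a framework might implement (not tied to any of the tools named above; the function names and scoring rule here are invented for this sketch), a simple term-overlap "support" score for RAG answers:

```python
# Toy RAG "support" metric: fraction of answer content words that also
# appear in the retrieved context. Production frameworks typically use
# embedding- or LLM-based judges; this only shows the metric-function shape.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and"}

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, minus a tiny stopword list."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def support_score(answer: str, context: str) -> float:
    """Share of answer terms grounded in the context (0.0 to 1.0)."""
    answer_terms = tokenize(answer)
    if not answer_terms:
        return 0.0
    return len(answer_terms & tokenize(context)) / len(answer_terms)
```

An evaluation pipeline would compute such scores over a whole dataset and flag low-support answers as candidate hallucinations for human review.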

Required Skills & Qualifications

Core Technical Skills :
  • Proficiency in Python with experience in NLP and ML/LLM libraries (e.g., Hugging Face, LangChain, OpenAI SDK, Cohere).
  • Experience building evaluation pipelines or benchmarks for ML/LLM systems.
  • Familiarity with RAG evaluation, agentic evaluation, safety/guardrail testing, and LLM performance metrics.
  • Strong grasp of prompt engineering, retrieval techniques, and generative model behaviour.

Experience With Tools Such As

  • Giskard, RAGAS, DeepEval, TruLens, LangSmith, Opik/Comet, Weights & Biases, or similar.
  • Working knowledge of vector stores (e.g., FAISS, Weaviate, Pinecone) and embedding-based evaluation.
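Embedding-based evaluation ultimately reduces to comparing vectors. A minimal sketch in pure Python (real pipelines would obtain the vectors from an embedding model and typically use NumPy or a vector store for the math):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors.

    1.0 means identical direction; 0.0 means orthogonal (or a zero vector).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A semantic-similarity metric would embed the model's answer and a gold reference, then threshold this score to decide pass/fail.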

Testing & DevOps

  • Familiarity with CI/CD pipelines, unit and integration testing for LLM apps.
  • Understanding of data versioning, model versioning, and test reproducibility.
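One common ingredient of test reproducibility is fingerprinting the evaluation dataset so results can be tied to an exact data version. A minimal sketch (the function name is invented for this example):

```python
import hashlib
import json

def dataset_fingerprint(examples: list[dict]) -> str:
    """Stable SHA-256 over a list of eval examples, independent of dict key order."""
    canonical = json.dumps(examples, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing this hash alongside metric results lets a CI job tell whether a score change came from new data rather than a new model version.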

Preferred Qualifications

  • Prior experience developing or maintaining LLM-based applications (chatbots, copilots, RAG systems).
  • Background in ML research, applied NLP, or machine learning infrastructure.
  • Exposure to LLM guardrails design (e.g., jailbreaking prevention, content filtering).
  • Experience with open-source contribution in the LLM evaluation or tooling space.

Soft Skills

  • Strong communication and documentation abilities.
  • Comfort working in ambiguous, fast-paced, and research-heavy environments.
  • Passion for ensuring LLM reliability, safety, and responsible deployment.
(ref:hirist.tech)
