Posted: 3 days ago
On-site | Part Time
XenonStack is a fast-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time, intelligent business insights.
Agentic Systems for AI Agents: akira.ai
Vision AI Platform: xenonstack.ai
Inference AI Infrastructure for Agentic Systems: nexastack.ai
We are looking for an LLM Reliability & Evaluation Engineer to design, implement, and maintain rigorous evaluation frameworks for Large Language Models (LLMs) powering our Agentic AI systems. This role will ensure models meet high standards of accuracy, safety, and performance across enterprise and regulated industry use cases.
If you’re passionate about model trustworthiness, benchmarking, and Responsible AI practices, and want to shape how AI agents behave in mission-critical workflows, this is the role for you.
Responsibilities:
Design, implement, and maintain evaluation pipelines for LLM-based applications and agentic workflows.
Define and track key performance indicators (accuracy, latency, cost, reliability) for deployed models.
Develop automated test suites, benchmark datasets, and stress-testing scenarios for LLMs.
Collaborate with data scientists, ML engineers, and product teams to integrate evaluation into the model lifecycle.
Assess bias, fairness, and safety risks in LLM outputs and recommend mitigations.
Validate model alignment to enterprise use case requirements and regulatory standards.
Conduct A/B tests, prompt performance analysis, and long-context reliability checks.
Document evaluation methodologies and maintain transparent reporting for internal and client use.
Stay current with state-of-the-art LLM evaluation techniques, frameworks, and metrics.
Must-Have:
3–5 years of experience in AI/ML engineering, applied research, or QA for ML systems.
Strong understanding of LLM architectures, prompt engineering, and agentic workflows.
Proficiency in Python and ML frameworks (PyTorch, TensorFlow, Hugging Face).
Experience with dataset curation, evaluation metrics (BLEU, ROUGE, BERTScore, factuality scores), and performance profiling.
Familiarity with Responsible AI principles, fairness auditing, and bias detection methods.
Good-to-Have:
Experience with LangChain, LangGraph, or similar agent frameworks.
Knowledge of model monitoring tools (Weights & Biases, MLflow, Arize AI, TruLens).
Familiarity with multi-turn conversation evaluation and human-in-the-loop testing.
Exposure to regulated industry AI governance (finance, healthcare, etc.).
Continuous Learning & Growth
Certification sponsorships and advanced training in AI evaluation, safety, and optimization.
Access to cutting-edge AI systems and enterprise-scale evaluation environments.
Recognition & Rewards
Performance incentives and awards for innovation in model reliability.
Fast-track opportunities to AI Governance or Model Ops leadership roles.
Work Benefits & Well-Being
Comprehensive medical insurance and project-based allowances.
Cab facilities for women employees and additional perks for special projects.
We foster a culture of cultivation with bold, human-centric leadership principles. We value obsession and deep work in every initiative, and we are on a mission to reshape how enterprises adopt AI + Human Intelligence systems.
Product Values:
Obsessed with Adoption – Making AI accessible and enterprise-ready.
Obsessed with Simplicity – Turning complexity into seamless, intuitive AI experiences.
Be a part of our vision to accelerate the world’s transition to AI + Human Intelligence.
Salary: 4.0 - 8.0 Lacs P.A.