Posted: 1 week ago
On-site
Part Time
XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.
We deliver innovation through:
Agentic Systems for AI Agents (akira.ai)
Vision AI Platform (xenonstack.ai)
Inference AI Infrastructure for Agentic Systems (nexastack.ai)
Our mission is to accelerate the world’s transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.
We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness.
This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.
Evaluation Frameworks
Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.
Develop automated systems for benchmarking models on enterprise-relevant tasks.
Reliability Engineering
Conduct stress tests, adversarial testing, and edge-case evaluations.
Build tools to measure latency, consistency, and error recovery in multi-turn interactions.
Metrics & Monitoring
Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.
Establish real-time monitoring for drift, anomalies, and performance regressions.
Collaboration & Alignment
Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.
Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.
Continuous Improvement
Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.
Maintain a central repository of test cases, benchmarks, and evaluation results.
Research & Innovation
Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.
Explore automated evaluation using agentic test harnesses and synthetic data generation.
Must-Have
3–6 years of experience in AI/ML, NLP, or applied model evaluation.
Strong understanding of LLM architectures, prompt engineering, and failure modes.
Hands-on experience with evaluation frameworks (eval harnesses, Ragas, OpenAI Evals, DeepEval).
Proficiency in Python and libraries such as LangChain, LangGraph, LlamaIndex, and Hugging Face.
Experience with vector databases, RAG pipelines, and knowledge graph integration.
Familiarity with bias/fairness testing and Responsible AI frameworks.
Good-to-Have
Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.
Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).
Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.
Contributions to open-source evaluation libraries or research papers.
At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.
Our Cultural Values
Agency – Be self-directed and proactive.
Taste – Sweat the details and build with precision.
Ownership – Take responsibility for outcomes.
Mastery – Commit to continuous learning and growth.
Impatience – Move fast and embrace progress.
Customer Obsession – Always put the customer first.
Our Product Philosophy
Obsessed with Adoption – Making AI accessible, reliable, and enterprise-ready.
Obsessed with Simplicity – Turning complex evaluation challenges into seamless, automated frameworks.
Be part of our mission to accelerate the world’s transition to AI + Human Intelligence — by making AI agents not just powerful, but trustworthy and reliable.
4.05 - 8.0 Lacs P.A.
Mohali, Punjab