AI Engineering Intern - LLM Evaluations

Bynd

0 years

0 Lacs

Gurugram, Haryana, India

Posted: 4 days ago

Skills Required

ai, engineering, analyze, research, cutting, retrieval, workflow, automation, ml, evaluation, design, test, metrics, model, stack, data, testing, openai, mistral, training, logging, benchmarking, python, score, reports, reasoning

Work Mode

On-site

Job Type

Full Time

Job Description

Who We Are

Bynd is redefining financial intelligence through advanced AI, transforming how leading investment banks, private equity firms, and equity researchers globally analyze and act upon critical information. Our founding team includes a Partner from Apollo ($750B AUM) and AI engineers from UIUC, IIT, and other top-tier institutions. Operating as both a research lab and a product company, we build cutting-edge retrieval systems and AI-driven workflow automation for knowledge-intensive financial tasks.

Role Overview

As an AI Engineering Intern at Bynd, you’ll work at the intersection of cutting-edge GenAI systems and rigorous classical ML evaluation methodologies. Your primary responsibility will be to build and refine evaluation pipelines for our existing AI-driven financial intelligence systems. You’ll collaborate closely with the founding team and top financial domain experts to ensure our models are not only powerful but also measurable, explainable, and reliable.

If you’re excited by the idea of working hands-on with state-of-the-art LLMs, experimenting with RAG systems, and building frameworks that make AI outputs trustworthy and actionable, this role is made for you.

Responsibilities

• Design, implement, and iterate on evaluation pipelines for existing AI/ML systems, particularly GenAI-based and RAG-based architectures.

• Develop test sets, metrics, and validation frameworks aligned with financial use cases.

• Analyze model performance (both quantitative and qualitative) to uncover insights, gaps, and opportunities for improvement.

• Work alongside full-stack and ML engineers to integrate evaluation systems into CI/CD workflows.

• Assist in data collection, benchmark tasks, and A/B testing setups for LLM responses.

• Stay up to date with academic and industry advancements in evaluation frameworks, prompt testing, and trustworthy AI.
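To give a feel for the evaluation work described above, here is a minimal sketch of a harness for a RAG-style system. It is an illustration only and not Bynd's actual pipeline: `EvalCase`, `run_eval`, and the `system` callable (assumed to return retrieved document IDs plus an answer string) are hypothetical names, and recall@k with exact match are just two of many possible metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalCase:
    """One test-set entry (hypothetical schema for illustration)."""
    question: str
    relevant_doc_ids: List[str]  # documents a correct retrieval should surface
    reference_answer: str


def recall_at_k(retrieved: List[str], relevant: List[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return sum(1 for doc_id in relevant if doc_id in top_k) / len(relevant)


def exact_match(prediction: str, reference: str) -> bool:
    """String equality after lowercasing and collapsing whitespace."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())
    return normalize(prediction) == normalize(reference)


def run_eval(
    system: Callable[[str], Tuple[List[str], str]],
    cases: List[EvalCase],
    k: int = 3,
) -> Dict[str, float]:
    """Run every case through the system and average the two metrics."""
    if not cases:
        raise ValueError("cases must not be empty")
    recalls, matches = [], []
    for case in cases:
        retrieved_ids, answer = system(case.question)
        recalls.append(recall_at_k(retrieved_ids, case.relevant_doc_ids, k))
        matches.append(exact_match(answer, case.reference_answer))
    n = len(cases)
    return {"recall_at_k": sum(recalls) / n, "exact_match": sum(matches) / n}
```

Because `run_eval` returns a plain dictionary of scores, a CI job could compare the numbers against a stored baseline and fail the build on a regression.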

Preferred Qualifications

• Prior hands-on experience with GenAI systems (e.g., OpenAI, Claude, Mistral), including prompt design and retrieval-augmented generation (RAG).

• Solid understanding of classical ML concepts like training-validation splits, overfitting, data leakage, and cross-validation.

• Familiarity with tools such as Weights & Biases, LangSmith, or custom logging/benchmarking suites.

• Comfort with Python, evaluation libraries and metrics (e.g., sklearn, evaluate, bert-score, BLEU/ROUGE), and backend integration.

• Experience working with unstructured financial data (PDFs, tables, earnings reports, etc.) is a massive plus.
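As an illustration of the data-leakage and cross-validation concepts listed above, the sketch below splits examples into folds by group, so that passages drawn from the same source document (for instance, one earnings report) never land in both the training and validation sides. The function name `group_k_fold` and the grouping key are assumptions made for this example. scikit-learn's `GroupKFold` offers similar behavior; this is a dependency-free approximation, not a replacement for it.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def group_k_fold(
    groups: Sequence[Hashable], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) so that no group spans both sides."""
    # Collect the example indices that belong to each group.
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    if n_splits < 2 or n_splits > len(by_group):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest examples.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(by_group, key=lambda g: -len(by_group[g])):
        smallest = min(folds, key=len)
        smallest.extend(by_group[group])

    # Each fold serves once as the validation set.
    for i in range(n_splits):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val
```

A naive random split over individual passages would let near-duplicate text from the same report appear on both sides, which tends to inflate validation scores.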

What We’re Looking For

We’re looking for a fast learner with deep intellectual curiosity and strong fundamentals. You should be comfortable reasoning through ambiguity, rapidly testing hypotheses, and communicating technical decisions with clarity. You’re someone who thinks not just about building intelligent systems, but also about how to measure intelligence meaningfully.

This is an opportunity to work closely with a high-caliber founding team and ship impactful systems used by decision-makers at global financial institutions. If you’re passionate about building AI that works, and works reliably, come build with us.

