AI/ML Evaluation Researcher

15 - 24 years

0 Lacs

Posted: 11 hours ago | Platform: Naukri


Work Mode

Remote

Job Type

Full Time

Job Description

Crossing Hurdles

Position: AI/ML Evaluation Researcher

Type: Full Time

Compensation:

Location: Remote

Commitment:

Role Responsibilities:

  • Review reinforcement learning (RL) environments and verify terminal conditions for correctness, consistency, and alignment with task objectives.
  • Evaluate benchmarking pipelines to ensure accuracy, fairness, reproducibility, and rigorous experimental standards.
  • Provide structured technical feedback on environment design, evaluation protocols, and implementation details.
  • Review and analyze Python-based codebases to assess environment behavior, termination logic, and metric calculations.
  • Collaborate with research and engineering teams to refine evaluation methodologies and benchmarking criteria.
  • Validate reproducibility by checking performance across different runs, seeds, and hardware configurations.
  • Document findings clearly and propose improvements to strengthen benchmarking reliability and system robustness.
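The first responsibility above — verifying that terminal conditions fire correctly — can be sketched as a small automated check. The `GridWalk` environment and `check_termination` helper below are hypothetical illustrations (not part of this posting); they assume a Gymnasium-style `step` API that distinguishes task termination from time-limit truncation.

```python
# Hypothetical toy environment illustrating a terminal-condition review.
class GridWalk:
    """1-D walk: the episode terminates when the agent reaches `goal`."""

    def __init__(self, goal=5, max_steps=20):
        self.goal, self.max_steps = goal, max_steps
        self.reset()

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):  # action: +1 or -1
        self.pos += action
        self.steps += 1
        terminated = self.pos >= self.goal        # task success
        truncated = self.steps >= self.max_steps  # time limit, not success
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated


def check_termination(env):
    """Verify the terminal condition fires exactly at the goal, not before."""
    env.reset()
    for _ in range(env.goal - 1):
        _, _, terminated, _ = env.step(+1)
        assert not terminated, "terminated before reaching the goal"
    _, reward, terminated, truncated = env.step(+1)
    assert terminated and reward == 1.0, "goal state must terminate with reward"
    assert not truncated, "success must not be reported as truncation"
    return True
```

A review of this kind catches common bugs such as off-by-one termination (ending one step early) or conflating truncation with success, both of which silently skew benchmark metrics.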

Requirements:

  • Background in reinforcement learning, computer science, or applied AI research.
  • Experience working with RL environments and understanding how terminal conditions and environment dynamics are implemented.
  • Strong understanding of benchmarking methodologies, evaluation metrics, and experimental protocols in RL.
  • Proficient in Python; ability to review and reason about code (PyTorch/TensorFlow is a plus).
  • Strong critical thinking and analytical skills for identifying inconsistencies and implementation issues.
  • Detail-oriented, with a commitment to fairness, accuracy, and reproducibility in agentic AI research.

Key Domains:

  • Reinforcement Learning: environment design, termination logic, reward structures.
  • Benchmarking & Evaluation: reproducibility testing, fairness assessments, metric development.
  • Agentic AI Systems: evaluation protocols for agentic reasoning and behavior.
  • Python Code Review: environment scripts, evaluation pipelines, simulation frameworks.
  • Experimental Rigor: seed management, cross-hardware validation, experimental stability.
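The seed-management and reproducibility-testing domains above can be illustrated with a minimal sketch. The `evaluate` function here is a hypothetical stand-in for a real evaluation pipeline; the point is the checking pattern: identical seeds must yield identical metrics, and different seeds should not.

```python
import random


def evaluate(seed, episodes=10):
    """Hypothetical evaluation: mean return over stochastic rollouts,
    with all randomness drawn from an explicitly seeded generator."""
    rng = random.Random(seed)
    returns = [sum(rng.random() for _ in range(100)) for _ in range(episodes)]
    return sum(returns) / episodes


def reproducibility_check(seed=0):
    """Two runs with the same seed must produce identical metrics."""
    a, b = evaluate(seed), evaluate(seed)
    assert a == b, f"non-reproducible: {a} != {b}"
    # Different seeds should differ; this guards against a pipeline
    # that silently ignores its seed argument.
    assert evaluate(seed) != evaluate(seed + 1), "seed has no effect"
    return True
```

In practice the same pattern extends to cross-hardware validation: rerun the pinned-seed evaluation on each target configuration and compare metrics within a stated tolerance, since floating-point results may legitimately differ across devices.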

Application Process:

  • Apply for the job role.
  • Await the official message/email from our recruitment team (typically within 1-2 days).

Crossing Hurdles

Consulting

Atlanta
