AI/ML Evaluation Researcher

15 - 24 years

0 Lacs

Posted: 11 hours ago | Platform: Naukri


Work Mode

Remote

Job Type

Full Time

Job Description

Crossing Hurdles

Position: AI/ML Evaluation Researcher

Type: Full Time

Compensation:

Location: Remote

Commitment:

Role Responsibilities:

  • Review reinforcement learning (RL) environments and verify terminal conditions for correctness, consistency, and alignment with task objectives.
  • Evaluate benchmarking pipelines to ensure accuracy, fairness, reproducibility, and rigorous experimental standards.
  • Provide structured technical feedback on environment design, evaluation protocols, and implementation details.
  • Review and analyze Python-based codebases to assess environment behavior, termination logic, and metric calculations.
  • Collaborate with research and engineering teams to refine evaluation methodologies and benchmarking criteria.
  • Validate reproducibility by checking performance across different runs, seeds, and hardware configurations.
  • Document findings clearly and propose improvements to strengthen benchmarking reliability and system robustness.
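The first responsibility above — verifying that terminal conditions fire correctly — can be sketched as a small automated check. The `GridWalk` environment and `check_termination` helper below are hypothetical illustrations (not part of this posting); they assume a Gymnasium-style `step` API that distinguishes task termination from time-limit truncation.

```python
# Hypothetical toy environment illustrating a terminal-condition review.
class GridWalk:
    """1-D walk: the episode terminates when the agent reaches `goal`."""

    def __init__(self, goal=5, max_steps=20):
        self.goal, self.max_steps = goal, max_steps
        self.reset()

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):  # action: +1 or -1
        self.pos += action
        self.steps += 1
        terminated = self.pos >= self.goal        # task success
        truncated = self.steps >= self.max_steps  # time limit, not success
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated


def check_termination(env):
    """Verify the terminal condition fires exactly at the goal, not before."""
    env.reset()
    for _ in range(env.goal - 1):
        _, _, terminated, _ = env.step(+1)
        assert not terminated, "terminated before reaching the goal"
    _, reward, terminated, truncated = env.step(+1)
    assert terminated and reward == 1.0, "goal state must terminate with reward"
    assert not truncated, "success must not be reported as truncation"
    return True
```

A review of this kind catches common bugs such as off-by-one termination (ending one step early) or conflating truncation with success, both of which silently skew benchmark metrics.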

Requirements:

  • Background in reinforcement learning, computer science, or applied AI research.
  • Experience working with RL environments and understanding how terminal conditions and environment dynamics are implemented.
  • Strong understanding of benchmarking methodologies, evaluation metrics, and experimental protocols in RL.
  • Proficient in Python; ability to review and reason about code (PyTorch/TensorFlow is a plus).
  • Strong critical thinking and analytical skills for identifying inconsistencies and implementation issues.
  • Detail-oriented, with a commitment to fairness, accuracy, and reproducibility in agentic AI research.

Key Domains:

  • Reinforcement Learning: environment design, termination logic, reward structures.
  • Benchmarking & Evaluation: reproducibility testing, fairness assessments, metric development.
  • Agentic AI Systems: evaluation protocols for agentic reasoning and behavior.
  • Python Code Review: environment scripts, evaluation pipelines, simulation frameworks.
  • Experimental Rigor: seed management, cross-hardware validation, experimental stability.
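The seed-management and reproducibility-testing domains above can be illustrated with a minimal sketch. The `evaluate` function here is a hypothetical stand-in for a real evaluation pipeline; the point is the checking pattern: identical seeds must yield identical metrics, and different seeds should not.

```python
import random


def evaluate(seed, episodes=10):
    """Hypothetical evaluation: mean return over stochastic rollouts,
    with all randomness drawn from an explicitly seeded generator."""
    rng = random.Random(seed)
    returns = [sum(rng.random() for _ in range(100)) for _ in range(episodes)]
    return sum(returns) / episodes


def reproducibility_check(seed=0):
    """Two runs with the same seed must produce identical metrics."""
    a, b = evaluate(seed), evaluate(seed)
    assert a == b, f"non-reproducible: {a} != {b}"
    # Different seeds should differ; this guards against a pipeline
    # that silently ignores its seed argument.
    assert evaluate(seed) != evaluate(seed + 1), "seed has no effect"
    return True
```

In practice the same pattern extends to cross-hardware validation: rerun the pinned-seed evaluation on each target configuration and compare metrics within a stated tolerance, since floating-point results may legitimately differ across devices.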

Application Process:

  • Apply for the job role.
  • Await the official message/email from our recruitment team (typically within 1-2 days).

Crossing Hurdles

Consulting

Atlanta
