Senior Software Engineer – LLM Benchmark & Evaluation - 100% Remote

5 years

Posted: 5 days ago | Platform: LinkedIn

Work Mode

Remote

Job Type

Full Time

Job Description

Greetings and thank you for visiting our job post.


Supercoder

  • Type of work: 100% Remote


Overview

Our client is looking for experienced Senior Software Engineers with strong Python and Linux skills to design advanced benchmark tasks that evaluate the capabilities of modern Large Language Models (LLMs) such as ChatGPT, Claude, and other AI systems.

This role focuses on building realistic, technically challenging engineering scenarios that test model reasoning, debugging, and problem-solving abilities.

What You Will Do

  • Design complex, realistic engineering tasks to evaluate LLM reasoning, coding, debugging, and system understanding.
  • Build Python- and Linux-based workflows, pipelines, and multi-step scenarios.
  • Create reproducible environments using Python, Shell, and CLI tools.
  • Develop tasks that measure code comprehension, debugging, refactoring, and optimization.
  • Write clear technical documentation: problem statements, constraints, expected outputs, and detailed edge cases.
  • Use LLM tools (ChatGPT, Claude, etc.) to validate tasks and analyze model performance.

Must-Have Qualifications

  • 5+ years of professional software development experience.
  • Strong Python skills, including modular code design, debugging complex programs, and working in structured codebases.
  • Proficiency with Linux, shell scripting (Bash), and command-line tools.
  • Solid technical English writing ability.
  • Strong reasoning, analytical thinking, and problem-solving skills.
  • Ability to design logical multi-step engineering scenarios.

Nice-to-Have Skills

  • Experience creating benchmark datasets, online judge problems, coding tests, or technical challenges.
  • Background with ICPC, Codeforces, Kaggle, or competitive programming.
  • Familiarity with Docker, Git, and CI/CD pipelines.
  • Experience with ML/AI or data-intensive engineering environments.

Who Will Excel in This Role

  • Engineers who prefer designing difficult problems to building simple features.
  • Developers who are strong at debugging, identifying subtle issues, and understanding complex system interactions.
  • Engineers who work well independently and can define their own approach.
  • Individuals interested in LLM evaluation, AI reliability, and technical task design.
