About the Role

Join a high-impact project with one of the foundational Large Language Model (LLM) companies. Your work will directly help enhance next-generation AI models by creating high-quality proprietary datasets used for fine-tuning and evaluating LLMs. You'll design structured prompts, write Python code for supervised fine-tuning (SFT) datasets, and evaluate model outputs for Reinforcement Learning from Human Feedback (RLHF), improving how AI understands and responds.

Role Responsibilities

- Design, develop, and maintain efficient Python code to train and optimize AI models.
- Conduct evaluations (evals) to benchmark LLM performance and analyze model responses.
- Generate and review datasets for supervised fine-tuning (SFT).
- Collaborate with researchers and annotators on RLHF-based model improvement.
- Evaluate and rank AI model outputs across multiple domains, ensuring quality and alignment.

Required Qualifications

- 3+ years of professional experience in software development, with strong Python expertise.
- Proven analytical and problem-solving abilities.
- Strong reasoning and writing skills in English.

Preferred Qualifications

- Exposure to LLMs, AI evaluation, or data annotation pipelines.
- Experience designing structured data or evaluation processes for ML/AI models.

Good to Have

- Hands-on experience demonstrated through LeetCode, HackerRank, or GitHub profiles.
- Attention to detail and creativity in prompt or task design.