Job Summary

We are partnering with one of the foundational Large Language Model (LLM) companies to help enhance next-generation AI systems. As a Python Developer, you will play a critical role in generating high-quality proprietary datasets, designing evaluation frameworks, and refining AI outputs. This role focuses on data-driven contributions to model fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), enabling measurable improvements in LLM performance and reliability.

Please note this is a contract position requiring an immediate start.

Key Responsibilities

- Develop and maintain high-quality Python code for dataset creation, evaluation, and automation.
- Design and execute evaluation strategies (Evals) to benchmark AI model performance.
- Generate, rank, and critique AI responses across technical and general domains.
- Build task-specific datasets for Supervised Fine-Tuning (SFT) and support RLHF pipelines.
- Collaborate with annotators, researchers, and product teams to refine reward models.
- Provide clear, well-documented rationales for model evaluations and feedback.
- Conduct peer reviews of code and documentation, driving adherence to best practices.
- Continuously explore new tools and methods to enhance AI training workflows.

Required Skills and Experience

- 3+ years of strong hands-on experience with Python.
- Proficiency in multi-threading, async programming, and debugging concurrency/memory issues.
- Strong knowledge of Python testing frameworks (unit, integration, and property-based testing).
- Ability to refactor code and work with architectural patterns.
- Industry experience maintaining code quality, formatting, and clean design.
- Excellent analytical and reasoning skills for evaluating LLM outputs.
- Fluency in written and spoken English.

Type of Projects & Hands-On Experience

- AI Training Data Generation: writing code, prompts, and responses for SFT.
- Evaluation Frameworks: designing processes to measure and benchmark model accuracy, safety, and alignment.
- RLHF Projects: comparing outputs of different LLM versions, ranking quality, and providing human feedback.
- Production-Quality Coding: writing maintainable, tested, and scalable Python solutions.

Expected depth: hands-on coding, dataset design, evaluation strategy creation, and active contribution to LLM training loops.

Preferred Qualifications

- Experience with AI/ML workflows (fine-tuning, eval pipelines, reward models).
- Familiarity with PyTorch, Hugging Face, or similar ML frameworks.
- Exposure to AI ethics, alignment research, or model safety practices.
- Advanced degree in Computer Science, Data Science, or a related field (optional).

Location & Shift Details

- Remote (Global) – fully distributed team.
- Flexible engagement with a required 4-hour overlap with PST.
- Options available: 20, 30, or 40 hours/week.