Note:
Please apply only if you:
- Have 6 years or more of relevant experience (excluding internships)
- Are comfortable working 5 days a week from Gurugram, Haryana
- Are an immediate joiner or currently serving your notice period
About Eucloid
At Eucloid, innovation meets impact. As a leader in AI and Data Science, we create solutions that redefine industries—from Hi-tech and D2C to Healthcare and SaaS. With partnerships with giants like Databricks, Google Cloud, and Adobe, we’re pushing boundaries and building next-gen technology.
Join our talented team of engineers, scientists, and visionaries from top institutes like IITs, IIMs, and NITs. At Eucloid, growth is a promise, and your work will drive transformative results for Fortune 100 clients.
What You’ll Do
- Design and implement robust frameworks for evaluating large language models (LLMs) across dimensions like accuracy, safety, hallucination, and reasoning.
- Build modular pipelines for automated, semi-automated, and human-in-the-loop evaluations.
- Integrate GenAI testing tools such as Giskard, RAGAS, DeepEval, TruLens, Opik/Comet, and LangSmith.
- Define and implement custom evaluation metrics tailored to use cases like RAG, agents, and safety guardrails.
- Curate or generate high-quality evaluation datasets across domains (e.g., legal, medical, QA, coding).
- Collaborate with developers to instrument tracing and logging for real-world model behavior capture.
- Build dashboards and reporting mechanisms to visualize performance, regressions, and model comparisons.
- Conduct prompt-based testing, chain-of-thought evaluations, adversarial testing, and A/B comparisons.
- Contribute to red-teaming and stress-testing efforts to uncover vulnerabilities and ethical risks.
What Makes You a Fit
Academic Background:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, or a related field.
Technical Expertise:
- Minimum 6 years of hands-on experience in building, testing, or evaluating AI/ML systems, with a strong focus on LLMs or Generative AI applications.
- Proficiency in Python, along with experience using ML/NLP libraries such as Hugging Face, LangChain, OpenAI SDK, or Cohere.
- Experience in building evaluation pipelines or benchmarks for LLM performance across metrics like accuracy, robustness, safety, and hallucination.
- Deep understanding of prompt engineering, retrieval-augmented generation (RAG), and agentic evaluation techniques.
- Hands-on familiarity with evaluation tools such as Giskard, RAGAS, DeepEval, TruLens, LangSmith, Opik/Comet, or similar.
- Working knowledge of vector databases like FAISS, Pinecone, or Weaviate, and embedding-based evaluation methods.
- Experience with CI/CD pipelines, unit/integration testing for LLM apps, and model versioning for reproducibility.
- Ability to define custom evaluation metrics tailored to specific use cases (e.g., RAG performance, guardrail compliance, hallucination detection).
- Strong grasp of model instrumentation techniques for tracing/logging model behavior in real-world flows.
Extra Skills:
- Experience in developing LLM-based applications such as chatbots, copilots, or RAG systems.
- Exposure to designing or evaluating AI safety systems (e.g., jailbreaking prevention, content filters).
- Open-source contributions to GenAI tooling or evaluation libraries.
- Strong communication and documentation skills.
- Comfort working in fast-paced, research-heavy environments.
Why You’ll Love It Here
Innovate with the Best Tech:
Work on groundbreaking projects using AI, GenAI, LLMs, and massive-scale data platforms. Tackle challenges that push the boundaries of innovation.
Impact Industry Giants:
Deliver business-critical solutions for Fortune 100 clients across Hi-tech, D2C, Healthcare, SaaS, and Retail. Partner with platforms like Databricks, Google Cloud, and Adobe to create high-impact products.
Collaborate with a World-Class Team:
Join exceptional professionals from IITs, IIMs, NITs, and global leaders like Walmart, Amazon, Accenture, and ZS. Learn, grow, and lead in a team that values expertise and collaboration.
Accelerate Your Growth:
Access our Centres of Excellence to upskill and work on industry-leading innovations. Your professional development is a top priority.
Work in a Culture of Excellence:
Be part of a dynamic workplace that fosters creativity, teamwork, and a passion for building transformative solutions. Your contributions will be recognized and celebrated.
About Our Leadership
Anuj Gupta
Raghvendra Kushwah
Key Benefits
- Competitive salary and performance-based bonus.
- Comprehensive benefits package, including health insurance and flexible work hours.
- Opportunities for professional development and career growth.
Location:
Application: Role Name.
Eucloid is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment.