Pre-Built Evals
The following are simple functions built on top of the LLM Evals building blocks and pre-tested against benchmark data. The models are instantiated for you and usable in the LLM Eval functions; they are also directly callable with strings. We currently support a growing set of models for LLM Evals; see the list of supported models.
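For example, here is a minimal sketch of instantiating a model and calling it directly with a string. It assumes the `phoenix.evals` package and its `OpenAIModel` class; the exact class and parameter names may differ in your version or library.

```python
# Minimal sketch: instantiate an eval model and call it directly with a
# string. Assumes the phoenix.evals package and an OPENAI_API_KEY set in
# the environment; names and parameters may differ in other setups.
from phoenix.evals import OpenAIModel

model = OpenAIModel(model="gpt-4o", temperature=0.0)

# Models are directly callable with strings ...
response = model("Is Paris the capital of France? Answer yes or no.")
print(response)

# ... and the same instance can be passed to the LLM Eval functions
# (see the hallucination eval sketch at the end of this page).
```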
Hallucination Eval
Tested on: Hallucination QA Dataset, Hallucination RAG Dataset
(A usage sketch for this eval appears after the list below.)

Q&A Eval
Tested on: WikiQA

Retrieval Eval
Tested on: MS MARCO, WikiQA

Summarization Eval
Tested on: Gigaword, CNNDM, XSum

Code Generation Eval
Tested on: WikiSQL, HumanEval, CodeXGLUE

Toxicity Eval
Tested on: WikiToxic

Additional pre-built evals: AI vs. Human, Reference Link, User Frustration, SQL Generation, Agent Function Calling, Audio Emotion.
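As a concrete illustration, here is a minimal sketch of running the pre-built hallucination eval over a small DataFrame. It assumes the `phoenix.evals` package and its exported names (`OpenAIModel`, `llm_classify`, `HALLUCINATION_PROMPT_TEMPLATE`, `HALLUCINATION_PROMPT_RAILS_MAP`); adjust the imports and column names to the evals library you are actually using.

```python
# Minimal sketch: run the pre-built hallucination eval over a DataFrame.
# Assumes the phoenix.evals package; swap in your evals library as needed.
import pandas as pd

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# The hallucination template expects "input", "reference", and "output"
# columns: the question, the retrieved context, and the model's answer.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["Paris is the capital and largest city of France."],
        "output": ["The capital of France is Paris."],
    }
)

model = OpenAIModel(model="gpt-4o", temperature=0.0)

# Rails constrain the eval output to the expected labels
# (e.g. "factual" / "hallucinated").
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())

results = llm_classify(
    dataframe=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,  # adds a free-text explanation column
)
print(results[["label", "explanation"]])
```

The other pre-built evals follow the same pattern: import the matching prompt template and rails map, and pass them to the same eval function with a DataFrame containing the columns that template expects.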