Q&A on Retrieved Data
When To Use Q&A Eval Template
This Eval checks whether the system answered a question correctly based on the retrieved data. In contrast to retrieval Evals, which check the individual chunks of data returned, this is a system-level check that the question was answered correctly.
question: The question the Q&A system is answering.
sampled_answer: The answer produced by the Q&A system.
context: The context retrieved to answer the question; this is what the Q&A Eval must use to check whether the answer is correct.
Q&A Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
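The template consumes the three variables described above. As a rough illustration of how they are bound into the prompt sent to the judge LLM (the wording below is placeholder text, not the actual template — see GitHub for that):

```python
# Illustrative only: placeholder wording, not the actual Q&A Eval template.
# The real template lives in the repository (see the GitHub link above).
QA_TEMPLATE = """You are given a question, an answer, and reference text.
Determine whether the answer correctly answers the question based only on
the reference text.

[BEGIN DATA]
Question: {question}
Reference: {context}
Answer: {sampled_answer}
[END DATA]

Respond with exactly one word, "correct" or "incorrect".
"""

def render_qa_prompt(question: str, context: str, sampled_answer: str) -> str:
    """Bind the three eval inputs into the prompt sent to the judge LLM."""
    return QA_TEMPLATE.format(
        question=question,
        context=context,
        sampled_answer=sampled_answer,
    )

prompt = render_qa_prompt(
    question="Who wrote 'Dune'?",
    context="Dune is a 1965 science fiction novel by Frank Herbert.",
    sampled_answer="Frank Herbert wrote 'Dune'.",
)
```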
Benchmark Results
GPT-4 Results
GPT-3.5 Results
Claude V2 Results
How To Run the Eval
The above Eval uses the QA template for Q&A analysis on retrieved data.
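In practice this runs through an evals harness that classifies each row of a dataframe against the QA template. As a self-contained sketch of that flow — with `stub_judge` standing in for the real judge-LLM call, which the actual harness makes for you:

```python
import pandas as pd

# Minimal sketch of the Q&A Eval loop. `stub_judge` is a stand-in for the
# real judge LLM (e.g. GPT-4); everything else mirrors the flow: one row
# per (question, context, sampled_answer), one label out.
RAILS = ["correct", "incorrect"]  # the only labels the eval may emit

def stub_judge(prompt: str) -> str:
    # Stand-in for the LLM call; here we pretend every answer is grounded.
    return "correct"

def run_qa_eval(df: pd.DataFrame, judge=stub_judge) -> pd.Series:
    """Classify each Q&A row as correct/incorrect using the judge model."""
    labels = []
    for row in df.itertuples():
        prompt = (
            f"Question: {row.question}\n"
            f"Reference: {row.context}\n"
            f"Answer: {row.sampled_answer}\n"
            'Respond "correct" or "incorrect".'
        )
        label = judge(prompt)
        # Snap any off-rails output to a valid label.
        labels.append(label if label in RAILS else "incorrect")
    return pd.Series(labels, index=df.index, name="qa_eval")

df = pd.DataFrame({
    "question": ["Who wrote 'Dune'?"],
    "context": ["Dune is a 1965 novel by Frank Herbert."],
    "sampled_answer": ["Frank Herbert."],
})
labels = run_qa_eval(df)
```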
Each column below reports one benchmarked model configuration:

| Metric    |      |      |      |      |      |      |      |      |
|-----------|------|------|------|------|------|------|------|------|
| Precision | 1.0  | 1.0  | 1.0  | 1.0  | 0.99 | 0.42 | 1.0  | 1.0  |
| Recall    | 0.89 | 0.92 | 0.98 | 0.98 | 0.83 | 1.0  | 0.94 | 0.64 |
| F1        | 0.94 | 0.96 | 0.99 | 0.99 | 0.90 | 0.59 | 0.97 | 0.78 |
| Throughput  | GPT-4   | GPT-3.5 | Claude V2 |
|-------------|---------|---------|-----------|
| 100 Samples | 124 sec | 66 sec  | 67 sec    |
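Benchmark numbers like those above come from comparing the eval's labels against golden labels. Treating "correct" as the positive class, precision, recall, and F1 can be computed as follows (a generic computation, not the exact benchmark script):

```python
def precision_recall_f1(golden, predicted, positive="correct"):
    """Compute precision, recall, and F1 for the positive label."""
    pairs = list(zip(golden, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

golden    = ["correct", "correct", "incorrect", "correct", "incorrect"]
predicted = ["correct", "incorrect", "incorrect", "correct", "correct"]
p, r, f1 = precision_recall_f1(golden, predicted)
# Here tp=2, fp=1, fn=1, so precision = recall = f1 = 2/3.
```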