Q&A on Retrieved Data
When To Use Q&A Eval Template
This Eval checks whether a question was correctly answered by the system based on the retrieved data. In contrast to retrieval Evals, which check the individual chunks of data returned, this is a system-level check that the Q&A as a whole is correct. The template takes three inputs:
question: The question the Q&A system is asked to answer.
sampled_answer: The answer generated by the Q&A system.
context: The retrieved context used to answer the question; the Q&A Eval must judge the correctness of the answer against this context (see the example below).
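For illustration, a minimal sketch of a pandas DataFrame carrying these three fields. The column names here mirror the template variables above; rows from your own pipeline may need renaming to match.

```python
import pandas as pd

# One row per Q&A interaction: the question asked, the system's answer,
# and the context retrieved to produce that answer. Column names mirror
# the template variables: question, sampled_answer, context.
df = pd.DataFrame(
    {
        "question": ["Who wrote 'Pride and Prejudice'?"],
        "sampled_answer": ["'Pride and Prejudice' was written by Jane Austen."],
        "context": ["Pride and Prejudice is an 1813 novel by Jane Austen."],
    }
)
```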
Q&A Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub. Last updated on 10/12/2023.
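As a quick way to see the current template, here is a minimal sketch that prints the shipped prompt and its output rails. It assumes the phoenix.evals package; older releases exposed the same names under phoenix.experimental.evals.

```python
# Inspect the Q&A prompt template and its rails (the allowed output labels).
# Assumes the phoenix.evals package; in older releases, import from
# phoenix.experimental.evals instead.
from phoenix.evals import QA_PROMPT_RAILS_MAP, QA_PROMPT_TEMPLATE

print(QA_PROMPT_TEMPLATE)                  # the prompt sent to the judge model
print(list(QA_PROMPT_RAILS_MAP.values()))  # the allowed labels, e.g. ["correct", "incorrect"]
```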
Benchmark Results
Per-model result charts cover GPT-4, GPT-3.5, and Claude V2; the headline numbers are summarized in the table at the end of this section.
How To Run the Eval
The Eval uses the QA template for Q&A analysis on retrieved data; a minimal sketch follows.
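The sketch below runs the Eval with llm_classify over a DataFrame shaped like the earlier example. Assumptions: the phoenix.evals package is installed, OPENAI_API_KEY is set in the environment, and the DataFrame has question, sampled_answer, and context columns; in older releases the model constructor takes model_name rather than model.

```python
# Classify each row as "correct" or "incorrect" using GPT-4 as the judge.
# Assumes phoenix.evals and OPENAI_API_KEY in the environment; older releases
# import from phoenix.experimental.evals and use model_name= in OpenAIModel.
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    llm_classify,
)

# Columns must match the template variables.
df = pd.DataFrame(
    {
        "question": ["Who wrote 'Pride and Prejudice'?"],
        "sampled_answer": ["'Pride and Prejudice' was written by Jane Austen."],
        "context": ["Pride and Prejudice is an 1813 novel by Jane Austen."],
    }
)

model = OpenAIModel(model="gpt-4", temperature=0.0)

# Rails constrain the eval output to the template's allowed labels.
rails = list(QA_PROMPT_RAILS_MAP.values())

qa_classifications = llm_classify(
    dataframe=df,
    template=QA_PROMPT_TEMPLATE,
    model=model,
    rails=rails,
)
print(qa_classifications["label"].value_counts())
```

The rails are what make the output usable downstream: they force the judge model's response into one of the template's labels rather than free-form text.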
| Q&A Eval | GPT-4o | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | GPT-3.5-turbo-instruct | Palm (Text Bison) | Claude V2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Precision | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.42 | 1.00 | 1.00 |
| Recall | 0.89 | 0.92 | 0.98 | 0.98 | 0.83 | 1.00 | 0.94 | 0.64 |
| F1 | 0.94 | 0.96 | 0.99 | 0.99 | 0.90 | 0.59 | 0.97 | 0.78 |
| Throughput | GPT-4 | GPT-4 Turbo | GPT-3.5 |
| --- | --- | --- | --- |
| 100 Samples | 124 sec | 66 sec | 67 sec |
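For reference, here is a hedged sketch of how precision, recall, and F1 figures like those above can be computed once the Eval's labels are compared against ground-truth (golden) labels. The merged_df columns are illustrative assumptions, and scikit-learn does the scoring.

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# Illustrative data: golden (ground-truth) labels alongside the eval's labels.
# Column names are assumptions for this sketch.
merged_df = pd.DataFrame(
    {
        "golden_label": ["correct", "correct", "incorrect", "correct"],
        "eval_label": ["correct", "incorrect", "incorrect", "correct"],
    }
)

# Treat "correct" as the positive class when scoring the eval against golden labels.
precision, recall, f1, _ = precision_recall_fscore_support(
    merged_df["golden_label"],
    merged_df["eval_label"],
    pos_label="correct",
    average="binary",
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```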