Retrieval (RAG) Relevance
When To Use RAG Eval Template
This Eval checks whether a retrieved chunk contains an answer to the query. It is most useful for measuring how well a retrieval system surfaces relevant chunks.
RAG Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub. Last updated on 10/12/2023.
Benchmark Results
GPT-4 Results
GPT-3.5 Results
Claude V2 Results
GPT-4 Turbo Results
How To Run the Eval
The above runs the RAG relevancy LLM template against the dataframe df.
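To make the mechanics concrete, here is a minimal sketch of what running a relevance template over rows of query/reference pairs involves. It is not the library's implementation: the template text, the `relevant`/`unrelated` labels, and the names `snap_to_rails`, `classify_rows`, and `fake_llm` are all assumptions made for illustration, and a plain list of dicts stands in for the dataframe.

```python
import re
from typing import Callable, Dict, List

# Hypothetical template for illustration; not the library's actual prompt text.
TEMPLATE = (
    "You are comparing a reference text to a question.\n"
    "Question: {query}\n"
    "Reference: {reference}\n"
    "Answer with a single word, 'relevant' or 'unrelated'."
)
RAILS = ["relevant", "unrelated"]  # assumed set of allowed output labels


def snap_to_rails(raw: str, rails: List[str]) -> str:
    """Map a raw LLM response onto exactly one allowed label, else NOT_PARSABLE."""
    tokens = set(re.findall(r"[a-z]+", raw.lower()))
    matches = [rail for rail in rails if rail in tokens]
    return matches[0] if len(matches) == 1 else "NOT_PARSABLE"


def classify_rows(
    rows: List[Dict[str, str]], ask_llm: Callable[[str], str]
) -> List[str]:
    """Fill the template for each row, call the model, and constrain the output."""
    labels = []
    for row in rows:
        prompt = TEMPLATE.format(query=row["query"], reference=row["reference"])
        labels.append(snap_to_rails(ask_llm(prompt), RAILS))
    return labels


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return "Relevant." if "Paris" in prompt else "unrelated"


rows = [
    {"query": "What is the capital of France?",
     "reference": "Paris is the capital of France."},
    {"query": "What is the capital of France?",
     "reference": "The Nile is a river in Africa."},
]
print(classify_rows(rows, fake_llm))  # ['relevant', 'unrelated']
```

Constraining the response to a fixed label set keeps the output directly comparable with ground-truth labels, which is what the precision and recall figures in the benchmark depend on.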
RAG Eval | GPT-4o | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | PaLM (Text Bison) | Claude V2 |
---|---|---|---|---|---|---|---|
Precision | 0.60 | 0.70 | 0.68 | 0.61 | 0.42 | 0.53 | 0.79 |
Recall | 0.77 | 0.88 | 0.91 | 1.00 | 1.00 | 1.00 | 0.22 |
F1 | 0.67 | 0.78 | 0.78 | 0.76 | 0.59 | 0.69 | 0.34 |
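For reference, the F1 row is the harmonic mean of the precision and recall rows. The short sketch below recomputes it from the rounded values in the table for three of the models; the function name `f1_score` is ours, chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above
reported = {
    "GPT-4": (0.70, 0.88),
    "GPT-3.5": (0.42, 1.0),
    "Claude V2": (0.79, 0.22),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
# GPT-4: F1 = 0.78
# GPT-3.5: F1 = 0.59
# Claude V2: F1 = 0.34
```

The Claude V2 column shows why F1 is reported alongside the other two metrics: its precision is the highest in the table, but the low recall pulls the harmonic mean down to 0.34.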
Throughput | GPT-4 | GPT-4 Turbo | GPT-3.5 |
---|---|---|---|
100 Samples | 113 sec | 61 sec | 73 sec |