Retrieval (RAG) Relevance
When To Use RAG Eval Template
This Eval checks whether a retrieved chunk contains an answer to the query, which makes it useful for evaluating retrieval systems.
RAG Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
Benchmark Results
[Figures: benchmark results for GPT-4, GPT-3.5, Claude V2, and GPT-4 Turbo]
How To Run the Eval
Running the Eval applies the RAG relevancy LLM template to the dataframe df.
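As a rough illustration of what applying a relevance template to a set of rows involves, the sketch below fills a prompt for each query/chunk pair, calls a model, and maps the reply onto a fixed set of labels. It is not the library's actual API: the template text, the `relevant`/`unrelated` labels, the `query` and `reference` field names, and the helpers `snap_to_rails`, `run_relevance_eval`, and `fake_llm` are all hypothetical, and a list of dicts stands in for the dataframe.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template (an assumption for illustration only).
TEMPLATE = (
    "Question: {query}\n"
    "Reference text: {reference}\n"
    "Does the reference text contain an answer to the question? "
    "Reply with one word: relevant or unrelated."
)

# Assumed output labels ("rails") that every response is mapped onto.
RAILS = ["relevant", "unrelated"]


def snap_to_rails(raw: str, rails: List[str]) -> str:
    """Map free-form model output to exactly one rail, else NOT_PARSABLE."""
    text = raw.strip().lower()
    matches = [rail for rail in rails if rail in text]
    return matches[0] if len(matches) == 1 else "NOT_PARSABLE"


def run_relevance_eval(
    rows: List[Dict[str, str]],
    llm: Callable[[str], str],
) -> List[str]:
    """Fill the template for each row, call the model, return one label per row."""
    labels = []
    for row in rows:
        prompt = TEMPLATE.format(query=row["query"], reference=row["reference"])
        labels.append(snap_to_rails(llm(prompt), RAILS))
    return labels


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return " Relevant.\n" if "Paris" in prompt else "Unrelated"


rows = [
    {"query": "What is the capital of France?",
     "reference": "Paris is the capital of France."},
    {"query": "What is the capital of France?",
     "reference": "Bananas are rich in potassium."},
]
labels = run_relevance_eval(rows, fake_llm)
print(labels)
```

Constraining the output to a small label set keeps the result binary even when the model adds punctuation or extra words, and anything ambiguous is flagged rather than guessed.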
RAG Eval    GPT-4o  GPT-4  GPT-4 Turbo  Gemini Pro  GPT-3.5  Palm (Text Bison)  Claude V2
Precision   0.60    0.70   0.68         0.61        0.42     0.53               0.79
Recall      0.77    0.88   0.91         1.00        1.00     1.00               0.22
F1          0.67    0.78   0.78         0.76        0.59     0.69               0.34
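The F1 values reported above are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes F1 from the reported precision and recall for each model. The helper name `f1_score` and the `reported` dictionary are introduced here for illustration, and the sketch assumes the standard formula rather than anything stated in the benchmark itself.

```python
# (precision, recall, reported F1) copied from the benchmark results above.
reported = {
    "GPT-4o": (0.60, 0.77, 0.67),
    "GPT-4": (0.70, 0.88, 0.78),
    "GPT-4 Turbo": (0.68, 0.91, 0.78),
    "Gemini Pro": (0.61, 1.00, 0.76),
    "GPT-3.5": (0.42, 1.00, 0.59),
    "Palm (Text Bison)": (0.53, 1.00, 0.69),
    "Claude V2": (0.79, 0.22, 0.34),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for model, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{model:<18} reported F1={f1:.2f}  computed F1={computed:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, Claude V2 ends up with the lowest F1 despite the highest precision: its recall of 0.22 dominates the score.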
Throughput    GPT-4    GPT-4 Turbo  GPT-3.5
100 Samples   113 sec  61 sec       73 sec