Citation Evals
In chatbots and Q&A systems, responses often include reference links alongside the answer, pointing users to documentation or pages that contain more information or the source of the answer.
EXAMPLE: Q&A from Arize Phoenix Documentation
QUESTION: What other models does Arize Phoenix support beyond OpenAI for running Evals?
ANSWER: Phoenix supports a large set of LLM models through the model object. Phoenix supports OpenAI (GPT-4, GPT-4-32k, GPT-3.5 Turbo, GPT-3.5 Instruct, etc.), Azure OpenAI, Google PaLM 2 Text Bison, and all AWS Bedrock models (Claude, Mistral, etc.).
REFERENCE LINK: https://docs.arize.com/phoenix/api/evaluation-models
This Eval checks whether the reference link returned in the response actually answers the question asked in the conversation.
Citation Eval Template
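The sketch below illustrates the shape of a citation-eval prompt; it is not the verbatim template that ships with Phoenix (check the phoenix.evals exports for the built-in reference-link template in your version), and the {conversation} and {reference_text} variable names are assumptions. The task is framed as binary classification: the LLM judge reads the conversation and the content found at the cited link, then outputs one of two labels.

```
In this task, you will be presented with a conversation that contains a question
and an answer with a citation (a reference link). You will also be given the
reference text found at that link. You must determine whether the reference
text answers the question asked in the conversation.

[BEGIN DATA]
************
[Conversation]: {conversation}
************
[Reference text]: {reference_text}
[END DATA]

Your response must be a single word, either "correct" or "incorrect".
"correct" means the reference text contains the answer to the question.
"incorrect" means the reference text does not contain the answer to the question.
```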
How to Run
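A minimal sketch of running this Eval with llm_classify from phoenix.evals. Import paths and the OpenAIModel argument name vary across Phoenix versions, the template here is the illustrative one from above rather than the library's built-in constant, and the dataframe contents are hypothetical:

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative template; swap in Phoenix's built-in citation/reference-link
# template if your version exports one.
CITATION_EVAL_TEMPLATE = """\
In this task, you will be presented with a conversation and the reference text
found at a cited link. Determine whether the reference text answers the
question asked in the conversation. Your response must be a single word,
either "correct" or "incorrect".

[Conversation]: {conversation}
[Reference text]: {reference_text}
"""

# One row per response to evaluate; column names must match the
# template's {variables}. Contents here are hypothetical.
df = pd.DataFrame(
    {
        "conversation": [
            "Q: What other models does Arize Phoenix support beyond OpenAI "
            "for running Evals? A: ... (answer with citation)"
        ],
        "reference_text": [
            "Text fetched from https://docs.arize.com/phoenix/api/evaluation-models"
        ],
    }
)

eval_results = llm_classify(
    dataframe=df,
    template=CITATION_EVAL_TEMPLATE,
    model=OpenAIModel(model="gpt-4", temperature=0.0),
    rails=["correct", "incorrect"],  # constrain output to the two labels
    provide_explanation=True,        # also return the judge's reasoning
)
print(eval_results[["label", "explanation"]])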
Benchmark Results
| Citation Eval | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | GPT-3.5 Instruct | Palm 2 (Text Bison) | Claude V2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Precision | 0.96 | 0.97 | 0.94 | 0.77 | 0.89 | 0.74 | 0.68 |
| Recall | 0.79 | 0.83 | 0.69 | 0.97 | 0.43 | 0.48 | 0.98 |
| F1 | 0.87 | 0.89 | 0.79 | 0.86 | 0.58 | 0.58 | 0.80 |
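Precision, recall, and F1 above are standard binary-classification metrics, computed by comparing the Eval's labels against human annotations on a golden dataset. A minimal sketch of that computation, with hypothetical labels:

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical data: human "golden" annotations vs. the LLM judge's labels.
golden = pd.Series(["correct", "incorrect", "correct", "correct"])
judge = pd.Series(["correct", "incorrect", "incorrect", "correct"])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true=golden, y_pred=judge, pos_label="correct", average="binary"
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```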