Reference (citation) Link
Reference Links in Retrieval Q&A
In chatbots and Q&A systems, reference links are often returned in the response alongside the answer, pointing users to documentation or pages that contain more information or that served as the source for the answer.
EXAMPLE: Q&A from Arize-Phoenix Documentation
QUESTION: What other models does Arize Phoenix support beyond OpenAI for running Evals?
ANSWER: Phoenix supports a large set of LLM models through the model object: OpenAI (GPT-4, GPT-4-32k, GPT-3.5 Turbo, GPT-3.5 Instruct, etc.), Azure OpenAI, Google PaLM 2 Text Bison, and all AWS Bedrock models (Claude, Mistral, etc.).
REFERENCE LINK: https://docs.arize.com/phoenix/api/evaluation-models
This Eval checks whether the reference link returned in a conversation answers the question that was asked.
We are continually iterating on our templates; view the most up-to-date template on GitHub.
How to run the Citation Eval
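Below is a minimal sketch of running this Eval with Phoenix's `llm_classify`, using the REF_LINK_EVAL_PROMPT_TEMPLATE_STR referenced in the benchmark section. The rails constant, the OpenAIModel keyword arguments, and the dataframe column names are assumptions: the column names must match the variables in the template version you use, and older Phoenix releases import these names from phoenix.experimental.evals.

```python
import pandas as pd

# Older Phoenix releases expose these under phoenix.experimental.evals;
# REF_LINK_EVAL_PROMPT_RAILS_MAP is assumed to accompany the template.
from phoenix.evals import (
    REF_LINK_EVAL_PROMPT_RAILS_MAP,
    REF_LINK_EVAL_PROMPT_TEMPLATE_STR,
    OpenAIModel,
    llm_classify,
)

# One row per Q&A turn: the user's question and the reference link the
# application returned. Column names must match the template's variables.
df = pd.DataFrame(
    {
        "input": [
            "What other models does Arize Phoenix support beyond OpenAI for running Evals?"
        ],
        "reference": ["https://docs.arize.com/phoenix/api/evaluation-models"],
    }
)

# Older versions take model_name= instead of model=.
model = OpenAIModel(model="gpt-4", temperature=0.0)

# The rails pin the LLM's output to the template's allowed labels.
rails = list(REF_LINK_EVAL_PROMPT_RAILS_MAP.values())

ref_link_classifications = llm_classify(
    dataframe=df,
    template=REF_LINK_EVAL_PROMPT_TEMPLATE_STR,
    model=model,
    rails=rails,
    provide_explanation=True,  # also return the judge's reasoning
)
print(ref_link_classifications[["label", "explanation"]])
```

llm_classify returns a dataframe aligned row-for-row with the input, so the resulting labels can be joined back onto the original Q&A records.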
Benchmark Results
This benchmark was obtained using the notebook below. It was run against a handcrafted ground-truth dataset consisting of questions about the Arize platform; that dataset is available here. Each example in the dataset was evaluated using the REF_LINK_EVAL_PROMPT_TEMPLATE_STR above, and the resulting labels were compared against the ground-truth labels in the benchmark dataset to generate the confusion matrices below.
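The comparison step is plain label matching; this sketch uses scikit-learn (an assumption; the notebook's exact code may differ), with "correct"/"incorrect" as illustrative label names.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Ground-truth labels from the benchmark dataset and the labels the
# Eval produced; "correct"/"incorrect" are illustrative label names.
ground_truth = ["correct", "incorrect", "correct", "correct", "incorrect"]
predicted = ["correct", "incorrect", "incorrect", "correct", "incorrect"]

labels = ["correct", "incorrect"]
print(confusion_matrix(ground_truth, predicted, labels=labels))
print(classification_report(ground_truth, predicted, labels=labels))
```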
Confusion matrix results: GPT-4, GPT-3.5, GPT-4 Turbo.
| Reference Link Evals | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | GPT-3.5 Instruct | Palm 2 (Text Bison) | Claude V2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Precision | 0.96 | 0.97 | 0.94 | 0.77 | 0.89 | 0.74 | 0.68 |
| Recall | 0.79 | 0.83 | 0.69 | 0.97 | 0.43 | 0.48 | 0.98 |
| F1 | 0.87 | 0.89 | 0.79 | 0.86 | 0.58 | 0.58 | 0.80 |
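In the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R); for example, a precision of 0.96 with a recall of 0.79 gives 2(0.96)(0.79) / (0.96 + 0.79) ≈ 0.87, the first column's F1.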