Chatbots and Q&A systems often include reference links alongside an answer, pointing users to documentation or pages that contain more information or the source of the answer.
EXAMPLE: Q&A from Arize-Phoenix Documentation
QUESTION: What other models does Arize Phoenix support beyond OpenAI for running Evals?
ANSWER: Phoenix does support a large set of LLM models through the model object. Phoenix supports OpenAI (GPT-4, GPT-4-32k, GPT-3.5 Turbo, GPT-3.5 Instruct, etc...), Azure OpenAI, Google Palm2 Text Bison, and All AWS Bedrock models (Claude, Mistral, etc...).
This Eval checks whether the reference link returned answers the question asked in a conversation.
Template
You are given a conversation that contains questions by a CUSTOMER and you are trying to
determine if the documentation page shared by the ASSISTANT correctly answers
the CUSTOMERS questions. We will give you the conversation between the customer
and the ASSISTANT and the text of the documentation returned:
    [CONVERSATION AND QUESTION]:
    {conversation}
    ************
    [DOCUMENTATION URL TEXT]:
    {document_text}
    [DOCUMENTATION URL TEXT]:
You should respond "correct" if the documentation text answers the question the
CUSTOMER had in the conversation. If the documentation roughly answers the question
even in a general way the please answer "correct". If there are multiple questions and a single
question is answered, please still answer "correct". If the text does not answer the
question in the conversation, or doesn't contain information that would allow you
to answer the specific question please answer "incorrect".
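To make the two placeholders concrete, here is a minimal sketch of how one row of data could fill the template. It assumes the variables are substituted by name. The shortened `TEMPLATE` string and the `row` dictionary are illustrative stand-ins, not Phoenix objects, and the document text is abbreviated from the example answer above.

```python
# Shortened stand-in for the template; only the two placeholders matter here.
TEMPLATE = (
    "[CONVERSATION AND QUESTION]:\n{conversation}\n"
    "************\n"
    "[DOCUMENTATION URL TEXT]:\n{document_text}\n"
)

# Hypothetical row: one conversation and the text of the page the assistant linked.
row = {
    "conversation": (
        "CUSTOMER: What other models does Arize Phoenix support "
        "beyond OpenAI for running Evals?"
    ),
    "document_text": (
        "Phoenix supports OpenAI, Azure OpenAI, Google Palm2 Text Bison, "
        "and all AWS Bedrock models."
    ),
}

# Fill each placeholder with the value stored under the same key.
prompt = TEMPLATE.format(**row)
print(prompt)
```

Each row of the evaluated data would need to supply both values, since the eval LLM judges the document text only against the conversation it is paired with.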
How to Run
from phoenix.evals import (
    REF_LINK_EVAL_PROMPT_RAILS_MAP,
    REF_LINK_EVAL_PROMPT_TEMPLATE_STR,
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)

model = OpenAIModel(
    model_name="gpt-4",
    temperature=0.0,
)

# The rails hold the output to the specific values defined by the template.
# They strip stray text such as ",,," or "..." and ensure that the binary
# value expected from the template is returned.
rails = list(REF_LINK_EVAL_PROMPT_RAILS_MAP.values())

# df is not defined in this snippet. It is assumed to be a pandas DataFrame
# with one column per template variable (conversation and document_text above).
relevance_classifications = llm_classify(
    dataframe=df,
    template=REF_LINK_EVAL_PROMPT_TEMPLATE_STR,
    model=model,
    rails=rails,
    provide_explanation=True,  # optional: generate an explanation for each value produced by the eval LLM
)
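The idea behind rails can be sketched in a few lines of plain Python. This is not Phoenix's implementation: `snap_to_rails`, the `RAILS` list, and the `NOT_PARSABLE` fallback label are hypothetical names chosen for illustration, and the two rail values are assumed to mirror the "correct" and "incorrect" labels in the template.

```python
import re

RAILS = ["correct", "incorrect"]  # assumed to mirror the two template labels
FALLBACK = "NOT_PARSABLE"         # hypothetical label for output that fits no rail


def snap_to_rails(raw_output: str, rails: list[str]) -> str:
    """Return the single rail that appears as a whole word, else the fallback."""
    text = raw_output.lower()
    # Word boundaries keep "correct" from matching inside "incorrect".
    matches = [r for r in rails if re.search(rf"\b{re.escape(r)}\b", text)]
    return matches[0] if len(matches) == 1 else FALLBACK


print(snap_to_rails('"correct"...', RAILS))
print(snap_to_rails("Incorrect,,,", RAILS))
print(snap_to_rails("maybe", RAILS))
```

Matching on whole words matters here because one label is a substring of the other. Output that names neither label, or both, falls through to the fallback rather than being guessed.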