This Eval evaluates the quality of the summaries produced by a summarization task. The template variables are:
document: The document text to summarize
summary: The summary of the document
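These values are read from a pandas DataFrame, one row per document/summary pair, when the template is run with llm_classify (see below). The sketch that follows is purely illustrative; the rows are invented, and the column names must match the variables the template you pass actually references (the default Phoenix template shown below refers to the document as {input} and the summary as {output}):

import pandas as pd

# Illustrative rows only; in practice the data comes from your application traces
# or a benchmark dataset. Column names must line up with the template variables.
df_sample = pd.DataFrame(
    {
        "input": [
            "Phoenix is an open-source observability library for LLM applications. "
            "It provides tracing, evaluation, and dataset tooling."
        ],
        "output": [
            "Phoenix is an open-source LLM observability library with tracing and evals."
        ],
    }
)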
Summarization Eval Template
You are comparing the summary text and it's original document and trying to determine if the summary is good. Here is the data:
[BEGIN DATA]
************
[Summary]: {output}
************
[Original Document]: {input}
[END DATA]
Compare the Summary above to the Original Document and determine if the Summary is comprehensive, concise, coherent, and independent relative to the Original Document. Your response must be a single word, either "good" or "bad", and should not contain any text or characters aside from that. "bad" means that the Summary is not comprehensive, concise, coherent, and independent relative to the Original Document. "good" means the Summary is comprehensive, concise, coherent, and independent relative to the Original Document.
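Both the template and its rail labels ship with Phoenix, so they can be inspected before running the eval. A small sketch, assuming the arize-phoenix package is installed:

import phoenix.evals.default_templates as templates

# The default summarization template; the prompt text it carries is the one shown above.
print(templates.SUMMARIZATION_PROMPT_TEMPLATE)

# The values of the rails map are the allowed output labels ("good" / "bad")
# that get passed to llm_classify as rails in the snippet below.
print(list(templates.SUMMARIZATION_PROMPT_RAILS_MAP.values()))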
How To Run the Eval
import phoenix.evals.default_templates as templates
from phoenix.evals import (
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)

model = OpenAIModel(
    model_name="gpt-4",
    temperature=0.0,
)

# The rails hold the output to the specific values defined by the template.
# They strip stray text such as ",,," or "..." and ensure the binary label
# expected by the template is returned.
rails = list(templates.SUMMARIZATION_PROMPT_RAILS_MAP.values())

summarization_classifications = llm_classify(
    dataframe=df_sample,
    template=templates.SUMMARIZATION_PROMPT_TEMPLATE,
    model=model,
    rails=rails,
    provide_explanation=True,  # optional: generate an explanation for each label produced by the eval LLM
)
The above shows how to use the summarization Eval template.
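llm_classify returns a DataFrame aligned row-for-row with df_sample. A quick way to inspect the results; this sketch assumes the output columns are named label and explanation (explanation is only populated because provide_explanation=True):

# Count how many summaries were judged "good" vs. "bad".
print(summarization_classifications["label"].value_counts())

# Read the eval LLM's reasoning for each judgment.
print(summarization_classifications[["label", "explanation"]].head())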