Summarization
When To Use the Summarization Eval Template
This Eval measures how well a generated summary captures its source document. The template variables are:
document: The document text to summarize
summary: The summary of the document
Summarization Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
How To Run the Summarization Eval
To run the eval, format each example's document and summary into the template, send the resulting prompt to the evaluating LLM, and snap the model's response to one of the allowed output labels (the rails).
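That flow can be sketched in a few lines of plain Python. The template text, `snap_to_rail` helper, and `run_summarization_eval` function below are illustrative stand-ins written for this sketch, not the eval library's actual API:

```python
# Minimal, self-contained sketch of running a summarization eval template.
# The prompt text here is a placeholder; the real template lives in the library.

TEMPLATE = (
    "You are comparing a summary against its source document. "
    "Respond with a single word, 'good' or 'bad'.\n"
    "Document: {document}\n"
    "Summary: {summary}"
)
RAILS = ["good", "bad"]  # the allowed output labels

def snap_to_rail(response: str, rails: list) -> str:
    """Map a raw LLM response onto one of the allowed labels."""
    response = response.strip().lower()
    for rail in rails:
        if rail in response:
            return rail
    return "UNPARSABLE"  # response did not match any rail

def run_summarization_eval(records, model_fn):
    """Format each record into the template, call the model, snap the label."""
    labels = []
    for rec in records:
        prompt = TEMPLATE.format(document=rec["document"], summary=rec["summary"])
        labels.append(snap_to_rail(model_fn(prompt), RAILS))
    return labels
```

In practice `model_fn` would wrap a real LLM call; here it can be any callable that maps a prompt string to a response string.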
Benchmark Results
This benchmark was obtained using the notebook below. It was run against the CNN / Daily Mail summarization dataset as the ground-truth dataset. Each example in the dataset was evaluated using the SUMMARIZATION_PROMPT_TEMPLATE above, then the resulting labels were compared against the ground-truth labels in the summarization dataset to generate the confusion matrices below.
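That comparison step can be sketched in pure Python, assuming the predicted and ground-truth labels are simple lists (the `confusion_and_metrics` helper below is illustrative, not part of the eval library):

```python
from collections import Counter

def confusion_and_metrics(predicted, truth, positive="good"):
    """Tally the confusion matrix for one label and derive precision,
    recall, and F1 for that label treated as the positive class."""
    counts = Counter(zip(predicted, truth))  # (predicted, truth) pair counts
    tp = counts[(positive, positive)]
    fp = sum(n for (p, t), n in counts.items() if p == positive and t != positive)
    fn = sum(n for (p, t), n in counts.items() if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Running it once per label (and per model under test) produces the per-label precision/recall/F1 figures of the kind reported below.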
GPT-4 Results

| Label | Precision | Recall | F1 |
| --- | --- | --- | --- |
| good | 0.87 | 0.63 | 0.73 |
| bad | 0.79 | 0.88 | 0.83 |

GPT-3.5 Results

| Label | Precision | Recall | F1 |
| --- | --- | --- | --- |
| good | 0.94 | 0.641 | 0.76 |
| bad | 0.61 | 1.0 | 0.76 |

Claude V2 Results

| Label | Precision | Recall | F1 |
| --- | --- | --- | --- |
| good | 1.0 | 0.1 | 0.18 |
| bad | 1.0 | 0.16 | 0.28 |

GPT-4 Turbo Results

| Label | Precision | Recall | F1 |
| --- | --- | --- | --- |
| good | 0.57 | 0.7 | 0.63 |
| bad | 0.75 | 0.61 | 0.67 |
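As a sanity check, each F1 value above is the harmonic mean of its precision and recall, F1 = 2PR / (P + R); for example, precision 0.87 and recall 0.63 give F1 ≈ 0.73:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# GPT-4's first row: precision 0.87, recall 0.63
# round(f1(0.87, 0.63), 2) -> 0.73
```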