Toxicity
When To Use Toxicity Eval Template
The toxicity Eval classifies whether an AI response is racist, biased, or toxic. The following shows the results of the toxicity Eval on a labeled toxicity dataset. The template variable is:
text: the text to be classified
Toxicity Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
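To see the exact template text shipped with your installed version, the snippet below prints it directly. This is a minimal sketch; it assumes the arize-phoenix-evals package is installed, which exports the TOXICITY_PROMPT_TEMPLATE referenced in the benchmark section below.

```python
# Minimal sketch: print the toxicity template bundled with phoenix.evals.
# Assumes `pip install arize-phoenix-evals`; the canonical, most up-to-date
# version of the template lives on GitHub.
from phoenix.evals import TOXICITY_PROMPT_TEMPLATE

print(TOXICITY_PROMPT_TEMPLATE)
```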
How To Run the Toxicity Eval
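Below is a minimal sketch of running the eval with Phoenix's llm_classify. The sample dataframe is illustrative, and constructor arguments such as OpenAIModel(model=...) may differ between Phoenix versions, so treat this as a starting point rather than the exact benchmark setup.

```python
import pandas as pd
from phoenix.evals import (
    TOXICITY_PROMPT_RAILS_MAP,
    TOXICITY_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Illustrative sample data: the column name must match the
# template variable ("text") described above.
df = pd.DataFrame(
    {
        "text": [
            "Thanks so much for the thoughtful review!",
            "People like you should not be allowed to post here.",
        ]
    }
)

# Requires an OPENAI_API_KEY in the environment.
model = OpenAIModel(model="gpt-4", temperature=0.0)

# Rails constrain the LLM output to the expected label strings.
rails = list(TOXICITY_PROMPT_RAILS_MAP.values())

toxic_classifications = llm_classify(
    dataframe=df,
    template=TOXICITY_PROMPT_TEMPLATE,
    model=model,
    rails=rails,
)
print(toxic_classifications["label"])
```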
Benchmark Results
This benchmark was obtained using the notebook below. It was run using the WikiToxic dataset as the ground-truth dataset. Each example in the dataset was evaluated using the TOXICITY_PROMPT_TEMPLATE above, and the resulting labels were compared against the ground-truth labels in the benchmark dataset to generate the confusion matrices below.
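As a sketch of that comparison step, scikit-learn can compute the confusion matrix and the precision, recall, and F1 figures of the kind reported below; the labels here are placeholders, not the benchmark data itself.

```python
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["toxic", "non-toxic"]

# Placeholder values; in the benchmark these come from the WikiToxic
# ground truth and the llm_classify output, respectively.
y_true = ["toxic", "non-toxic", "toxic", "non-toxic", "toxic"]
y_pred = ["toxic", "non-toxic", "non-toxic", "non-toxic", "toxic"]

# Rows are ground truth, columns are predictions.
print(confusion_matrix(y_true, y_pred, labels=LABELS))

# Per-class precision, recall, and F1.
print(classification_report(y_true, y_pred, labels=LABELS))
```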
[Confusion matrices: GPT-4, GPT-3.5, Claude V2, and GPT-4 Turbo]
Note: Palm is not suitable for toxicity detection, as it always returns an empty string ("") for toxic inputs.
| Toxicity Eval | GPT-4o | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | Palm (Text Bison) | Claude V2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Precision | 0.86 | 0.91 | 0.89 | 0.81 | 0.93 | Does not support | 0.86 |
| Recall | 1.0 | 0.91 | 0.77 | 0.84 | 0.83 | Does not support | 0.40 |
| F1 | 0.92 | 0.91 | 0.83 | 0.83 | 0.87 | Does not support | 0.54 |