Arize allows you to customize your evaluations using your own criteria and prompt templates. You can do so with the evals functions in our library.
We support three types of evaluations:
The function is designed for categorical output, which can be binary or multi-class. It ensures the output is clean and is either one of the classes you want or UNPARSABLE.
A binary template looks like the following, with only two values, "irrelevant" and "relevant", expected from the LLM output:
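A minimal sketch of such a binary template follows; the wording and the {query} and {reference} placeholder names are illustrative rather than the library's built-in template, and the placeholders map to columns in your dataframe.

```python
# Illustrative binary relevance template (not the built-in Phoenix template).
# The {query} and {reference} placeholders are filled from dataframe columns.
MY_RELEVANCE_TEMPLATE = """
You are comparing a reference text to a question and trying to determine
whether the reference text contains information relevant to answering the question.
    [Question]: {query}
    [Reference text]: {reference}
Answer with a single word, either "relevant" or "irrelevant", and no other text.
"""
```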
The categorical template defines the expected output of the LLM and the rails define the classes expected from the LLM:
irrelevant
relevant
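As a rough sketch of how the rails and template fit together, assuming Phoenix's llm_classify function (import paths and model arguments can differ between versions):

```python
import pandas as pd

from phoenix.evals import OpenAIModel, llm_classify

# The rails are the only classes the eval output is allowed to take.
rails = ["irrelevant", "relevant"]

# Example dataframe whose columns match the template placeholders.
df = pd.DataFrame(
    {
        "query": ["How do I reset my password?"],
        "reference": ["To reset your password, go to Settings > Security."],
    }
)

# Any LLM output that cannot be snapped to one of the rails is marked UNPARSABLE.
evals_df = llm_classify(
    dataframe=df,
    template=MY_RELEVANCE_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=rails,
)
print(evals_df["label"])
```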
The classify function uses a snap_to_rails function that searches the output string of the LLM for the classes in the classification list. It handles cases where no class is found, where both classes are found, or where one class is a substring of the other, such as irrelevant and relevant.
A common use case is mapping the class to a 1 or 0 numeric value.
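For example, a simple pandas mapping over the returned labels is enough for that (the column names here are illustrative):

```python
# Map the categorical label onto a 1/0 score, e.g. for aggregation.
evals_df["score"] = evals_df["label"].map({"relevant": 1, "irrelevant": 0})
```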
The Phoenix library also supports numeric score Evals if you would like to use them. A template for a score Eval looks like the following:
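As an illustrative sketch (not a built-in template), a score template asks the model for a number rather than a class:

```python
# Illustrative score template: the model returns a number instead of a class.
MY_SCORE_TEMPLATE = """
You are grading how well the answer below addresses the question.
    [Question]: {query}
    [Answer]: {response}
Respond with a single integer between 1 (completely off-topic) and
10 (fully answers the question), and no other text.
"""
```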
We also provide a more generic function that can be used for almost any complex eval that doesn't fit into the categorical type.
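A minimal sketch of such a free-form eval, assuming Phoenix's llm_generate function; the output_parser callback and the column names are assumptions, so check the reference for the exact signature in your version:

```python
import pandas as pd

from phoenix.evals import OpenAIModel, llm_generate

# Illustrative free-form critique template; placeholders map to dataframe columns.
MY_CRITIQUE_TEMPLATE = """
In one or two sentences, describe how well the answer below addresses the
question and point out any factual errors it contains.
    [Question]: {query}
    [Answer]: {response}
"""

df = pd.DataFrame(
    {
        "query": ["How do I reset my password?"],
        "response": ["Open Settings > Security and choose Reset Password."],
    }
)

def output_parser(output: str, row_index: int) -> dict:
    # Each key in the returned dict becomes a column in the output dataframe.
    return {"critique": output.strip()}

generated_df = llm_generate(
    dataframe=df,
    template=MY_CRITIQUE_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    output_parser=output_parser,
)
```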
If you would like to evaluate your application using your own code, you can do so by generating your own outputs and sending the data to Arize using the log_evaluations function with our Python SDK. See the example code below. The evals dataframe needs to have a column called context.span_id so that Arize knows which traces to attach this evaluation to. You can get these by using the .
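A minimal sketch of that logging call, assuming the arize pandas Client; the credentials, the model_id, and the eval.<name>.label/score/explanation column naming are placeholders based on the Arize evaluation column conventions, so check the SDK reference for the exact parameters in your version:

```python
import pandas as pd

from arize.pandas.logger import Client

# Placeholder credentials for your Arize space.
arize_client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")

# One row per evaluation; context.span_id ties the eval to the trace it grades.
evals_df = pd.DataFrame(
    {
        "context.span_id": ["0123456789abcdef"],
        "eval.relevance.label": ["relevant"],
        "eval.relevance.score": [1],
        "eval.relevance.explanation": ["The reference text answers the question."],
    }
)

arize_client.log_evaluations(
    dataframe=evals_df,
    model_id="YOUR_MODEL_ID",
)
```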