LLM Eval Experiment
LLM Evaluators
LLM evaluators utilize LLMs as judges to assess the success of your experiment. These evaluators can either use a prebuilt LLM evaluation template or be customized to suit your specific needs.
Arize supports many LLM evaluators out of the box with LLM Classify.
Here's an example of an LLM evaluator that checks for hallucinations in the model output:
In this example, the HallucinationEvaluator class evaluates whether the output of an experiment contains hallucinations by comparing it to the expected output using an LLM. The llm_classify function runs the eval, and the evaluator returns an EvaluationResult that includes a score, label, and explanation.
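The shape described above can be sketched without calling a real model. This is a minimal illustration, not Arize's implementation: the EvaluationResult dataclass and the fake_llm_classify function below are local stand-ins for the real EvaluationResult type and the llm_classify function, the evaluate signature is an assumption, and the labels "factual" and "hallucinated" are illustrative rails.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class EvaluationResult:
    """Local stand-in for the result type: score, label, explanation."""
    score: float
    label: str
    explanation: str


# Hypothetical rails: the only labels the judge is allowed to return.
RAILS = ("factual", "hallucinated")


def fake_llm_classify(output: str, expected: str) -> Tuple[str, str]:
    """Stand-in for an LLM judge call; returns (label, explanation).

    A real evaluator would send a prompt template to a model here.
    This stub only compares normalized strings so the sketch runs offline.
    """
    if output.strip().lower() == expected.strip().lower():
        return "factual", "Output matches the expected answer."
    return "hallucinated", "Output differs from the expected answer."


class HallucinationEvaluator:
    """Compares an experiment output to the expected output via a judge."""

    def __init__(self, classify: Callable[[str, str], Tuple[str, str]] = fake_llm_classify):
        self.classify = classify

    def evaluate(self, output: str, expected: str) -> EvaluationResult:
        label, explanation = self.classify(output, expected)
        if label not in RAILS:
            # Judges sometimes answer off-rail; surface that instead of guessing.
            return EvaluationResult(0.0, "unparseable", f"Unexpected label: {label!r}")
        score = 1.0 if label == "factual" else 0.0
        return EvaluationResult(score, label, explanation)


result = HallucinationEvaluator().evaluate("Paris", "paris")
print(result.label, result.score)
```

Passing the judge in as a constructor argument is a design choice of this sketch: it lets the evaluator be exercised offline with a stub and swapped for a real model call later.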
Once you define your evaluator class, you can use it in your experiment run like this:
You can customize LLM evaluators to suit your experiment's needs, whether you're checking for hallucinations, function choice, or other criteria where an LLM's judgment is valuable. Simply update the template with your instructions and the rails with the desired output. You can also have multiple LLM evaluators in a single experiment to assess different aspects of the output simultaneously.
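To make the template-and-rails idea concrete, here is a small offline sketch. Everything in it is hypothetical: make_evaluator, run_evaluators, the template wording, and the rail names are illustrative, and the judge is a plain Python function standing in for a model call. It is meant to show that a custom evaluator reduces to a template, a set of rails, and a mapping from label to score, and that several evaluators can score the same output.

```python
from typing import Callable, Dict, List

# A judge takes a filled-in prompt and returns a raw label string.
Judge = Callable[[str], str]


def make_evaluator(name: str, template: str, rails: Dict[str, float], judge: Judge):
    """Build an evaluator from a prompt template and rails (label -> score)."""

    def evaluate(output: str, expected: str) -> dict:
        prompt = template.format(output=output, expected=expected)
        label = judge(prompt).strip().lower()
        if label not in rails:
            # Off-rail answers get no score rather than a guessed one.
            return {"name": name, "label": "unparseable", "score": None}
        return {"name": name, "label": label, "score": rails[label]}

    return evaluate


def run_evaluators(evaluators: List[Callable], output: str, expected: str) -> List[dict]:
    """Apply every evaluator to the same output."""
    return [ev(output, expected) for ev in evaluators]


def always(label: str) -> Judge:
    """Stub judge that ignores the prompt; a real judge would call an LLM."""
    return lambda prompt: label


hallucination = make_evaluator(
    "hallucination",
    "Is the answer '{output}' supported by the reference '{expected}'? "
    "Answer factual or hallucinated.",
    {"factual": 1.0, "hallucinated": 0.0},
    always("factual"),
)
function_choice = make_evaluator(
    "function_choice",
    "Did the call '{output}' use the same function as '{expected}'? "
    "Answer correct or incorrect.",
    {"correct": 1.0, "incorrect": 0.0},
    always("Incorrect"),
)

rows = run_evaluators(
    [hallucination, function_choice], "get_weather('NYC')", "get_weather('NYC')"
)
for row in rows:
    print(row)
```

Normalizing the judge's answer before checking it against the rails keeps harmless variations such as capitalization from being treated as invalid labels, while anything genuinely off-rail is reported instead of scored.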
Need help writing a custom evaluator template? Use ✨Copilot to write one for you ✨