Code evaluators are functions designed to assess the outputs of your experiments. They allow you to define specific criteria for success, which can be as simple or complex as your application requires. Code evaluators are especially useful when you need to apply tailored logic or rules to validate the output of your model.
Creating a custom code evaluator is as simple as writing a Python function. By default, this function will take the output of an experiment run as its single argument. Your custom evaluator can return either a boolean or a numeric value, which will then be recorded as the evaluation score.
For example, let's say our experiment is testing a task that should output a numeric value between 1 and 100. We can create a simple evaluator function to check if the output falls within this range:
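```python
# A minimal sketch: by default the evaluator receives the run's output as
# its single argument and returns a boolean, which is recorded as the
# evaluation score.
def in_bounds(output):
    return 1 <= output <= 100
```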
By passing the in_bounds function to run_experiment, evaluations will automatically be generated for each experiment run, indicating whether the output is within the allowed range. This allows you to quickly assess the validity of your experiment's outputs based on custom criteria.
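A rough sketch of what that call could look like, assuming the run_experiment entry point in phoenix.experiments; dataset and task stand in for whatever dataset and task function your experiment already uses:

```python
from phoenix.experiments import run_experiment

# dataset and task are placeholders for your own experiment's dataset
# and task function; in_bounds is the evaluator defined above.
experiment = run_experiment(dataset, task, evaluators=[in_bounds])
```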
Alternatively, you can leverage one of our prebuilt evaluators from phoenix.experiments.evaluators. For example, the JSONParsable evaluator checks whether the output of an experiment run is a JSON-parsable string. It's useful when you want to ensure that the generated output can be correctly formatted and parsed as JSON, which is critical for applications that rely on structured data formats.
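For illustration, usage could look like the sketch below; it assumes the JSONParsable evaluator exported by phoenix.experiments.evaluators (check the evaluators module in your Phoenix version for the exact names available):

```python
from phoenix.experiments import run_experiment
from phoenix.experiments.evaluators import JSONParsable

# Each run is scored on whether its output parses as JSON; dataset and
# task are the same placeholders as in the previous example.
experiment = run_experiment(dataset, task, evaluators=[JSONParsable()])
```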