Code-Based Eval Experiment

Code Evaluators

Code evaluators are functions designed to assess the outputs of your experiments. They allow you to define specific criteria for success, which can be as simple or complex as your application requires. Code evaluators are especially useful when you need to apply tailored logic or rules to validate the output of your model.

Custom Code Evaluators

Creating a custom code evaluator is as simple as writing a Python function. By default, this function will take the output of an experiment run as its single argument. Your custom evaluator can return either a boolean or a numeric value, which will then be recorded as the evaluation score.

For example, let’s say our experiment is testing a task that should output a numeric value between 1 and 100. We can create a simple evaluator function to check if the output falls within this range:

def in_bounds(output):
    # Passes (True) only when the output falls within the expected 1-100 range
    return 1 <= output <= 100
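
Evaluators are not limited to pass/fail results: because a numeric return value is recorded as the score, you can also grade outputs on a continuous scale. The sketch below is illustrative only (the closeness_to_target function and its target value are assumptions, not part of the SDK); it scores a numeric output between 0.0 and 1.0 based on how close it is to a chosen target:

def closeness_to_target(output):
    # Hypothetical numeric evaluator: returns 1.0 when the output equals the
    # target and decreases linearly to 0.0 at the edges of the 1-100 range
    target = 50
    return max(0.0, 1.0 - abs(output - target) / 50)

Like in_bounds, such a function can be added to the evaluators list passed to run_experiment.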

When you pass the in_bounds function to run_experiment, an evaluation is automatically generated for each experiment run, indicating whether the output is within the allowed range. This lets you quickly assess the validity of your experiment's outputs against custom criteria.

experiment = arize_client.run_experiment(
    space_id=SPACE_ID,
    dataset_id=DATASET_ID,
    task=run_task,
    evaluators=[in_bounds],  # custom code evaluator defined above
    experiment_name=experiment_name,
)

Prebuilt Phoenix Code Evaluators

Alternatively, you can leverage one of our prebuilt evaluators from the phoenix.experiments.evaluators module.

The JSONParsable evaluator checks whether the output of an experiment run is a JSON-parsable string. It is useful when you need the generated output to parse correctly as JSON, which is critical for applications that rely on structured data formats.

from phoenix.experiments.evaluators import JSONParsable

# Instantiate a prebuilt code evaluator that checks if the output is JSON-parsable
json_parsable_evaluator = JSONParsable()
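
Prebuilt evaluators are passed to run_experiment in the same evaluators list as custom functions. The snippet below is a sketch that reuses the space, dataset, task, and experiment name variables from the earlier example, swapping in the prebuilt evaluator:

experiment = arize_client.run_experiment(
    space_id=SPACE_ID,
    dataset_id=DATASET_ID,
    task=run_task,
    evaluators=[json_parsable_evaluator],  # prebuilt evaluator instance
    experiment_name=experiment_name,
)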
