Code evaluators are functions designed to assess the outputs of your experiments. They allow you to define specific criteria for success, which can be as simple or complex as your application requires. Code evaluators are especially useful when you need to apply tailored logic or rules to validate the output of your model.
Custom Code Evaluators
Creating a custom code evaluator is as simple as writing a Python function. By default, this function will take the output of an experiment run as its single argument. Your custom evaluator can return either a boolean or a numeric value, which will then be recorded as the evaluation score.
For example, let's say our experiment is testing a task that should output a numeric value between 1 and 100. We can create a simple evaluator function to check if the output falls within this range:
def in_bounds(output):
    return 1 <= output <= 100
By passing the in_bounds function to run_experiment, evaluations will automatically be generated for each experiment run, indicating whether the output is within the allowed range. This allows you to quickly assess the validity of your experiment's outputs based on custom criteria.
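To make the boolean-or-numeric return value concrete, here is a minimal sketch in plain Python that does not call Phoenix at all. The `closeness_to_midpoint` function and the `sample_outputs` list are hypothetical names introduced purely for illustration; only `in_bounds` comes from the text above.

```python
def in_bounds(output):
    """Boolean evaluator: True when the output lies in the range 1..100."""
    return 1 <= output <= 100


def closeness_to_midpoint(output):
    """Hypothetical numeric evaluator.

    Scores 1.0 at the midpoint of the range (50.5) and falls linearly
    to 0.0 at the range edges and anywhere outside the range.
    """
    midpoint = 50.5
    half_width = 49.5
    return max(0.0, 1.0 - abs(output - midpoint) / half_width)


# Hypothetical task outputs, standing in for real experiment runs.
sample_outputs = [1, 42, 100, 250]

for value in sample_outputs:
    print(value, in_bounds(value), round(closeness_to_midpoint(value), 3))
```

A boolean evaluator suits a pass/fail criterion, while a numeric one is a reasonable choice when you want to rank runs by how close they come to a target.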
Alternatively, you can use one of our prebuilt evaluators from the phoenix.experiments.evaluators module. Each one is described below.
The JSONParsable evaluator checks whether the output of an experiment run is a JSON-parsable string. It's useful when you want to ensure that the generated output can be parsed as JSON, which is critical for applications that rely on structured data formats.
from phoenix.experiments.evaluators import JSONParsable

# This defines a code evaluator that checks if the output is JSON-parsable
json_parsable_evaluator = JSONParsable()
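For intuition about what "JSON-parsable" means in practice, the following sketch performs a comparable check with Python's standard `json` module. It is not Phoenix's implementation; `is_json_parsable` is a hypothetical helper, and treating non-string inputs as failures is an assumption made here.

```python
import json


def is_json_parsable(output):
    """Return True if `output` is a string that json.loads accepts."""
    # Assumption: only strings are considered; other types fail the check.
    if not isinstance(output, str):
        return False
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


print(is_json_parsable('{"status": "ok", "items": [1, 2, 3]}'))
print(is_json_parsable("{'status': 'ok'}"))  # single quotes are not valid JSON
```

Note that bare scalars such as `"42"` or `"true"` are also valid JSON documents, so a check like this does not by itself guarantee that the output is an object or an array.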
The MatchesRegex evaluator checks whether the output matches a specified regex pattern. It's ideal for validating outputs that need to conform to a specific format, such as phone numbers, email addresses, or other structured data that can be described with a regular expression.
from phoenix.experiments.evaluators import MatchesRegex

# This defines a code evaluator that checks if the output contains
# a valid phone number format
phone_number_evaluator = MatchesRegex(
    pattern=r"\d{3}-\d{3}-\d{4}",
    name="valid-phone-number",
)
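The phone-number pattern itself can be tried out with Python's standard `re` module before wiring it into an evaluator. The sketch below assumes "search" semantics, meaning the pattern may appear anywhere in the output; whether Phoenix applies the pattern in exactly this way is an assumption, and `contains_phone_number` is a hypothetical helper.

```python
import re

# Same pattern as in the evaluator definition above.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def contains_phone_number(output):
    """Return True if the pattern occurs anywhere in the output text."""
    return PHONE_PATTERN.search(str(output)) is not None


print(contains_phone_number("Call us at 555-123-4567 today."))
print(contains_phone_number("Call us at 555-1234."))
```

Because the pattern is unanchored, it also matches inside longer digit runs such as `1555-123-45678`. If that matters for your task, consider adding word boundaries (`\b`) or anchors (`^` and `$`) to the pattern.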
The ContainsKeyword evaluator checks if a specific keyword is present in the output of an experiment run. It's helpful for validating that certain key phrases or terms appear in the output, which might be essential for tasks like content generation or response validation.
from phoenix.experiments.evaluators import ContainsKeyword

# This defines a code evaluator that checks for the presence of
# the keyword "success"
contains_keyword = ContainsKeyword(keyword="success")
The ContainsAnyKeyword evaluator checks if any keyword from a list is present in the output. It's useful when you want to validate that at least one of several important terms or phrases appears in the generated output, providing flexibility in how success is defined.
from phoenix.experiments.evaluators import ContainsAnyKeyword

# This defines a code evaluator that checks if any of the keywords
# "error", "failed", or "warning" are present
contains_any_keyword = ContainsAnyKeyword(keywords=["error", "failed", "warning"])
The ContainsAllKeywords evaluator checks that all specified keywords are present in the output of an experiment run. It's useful for cases where the presence of multiple key phrases is essential, such as ensuring all required elements of a response or content piece are included.
from phoenix.experiments.evaluators import ContainsAllKeywords

# This defines a code evaluator that checks if all of the keywords
# "foo" and "bar" are present
contains_all_keywords = ContainsAllKeywords(keywords=["foo", "bar"])
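The difference between the "any" and "all" keyword checks can be sketched in plain Python with the built-in `any` and `all` functions. This is an illustration only, not Phoenix's implementation: `contains_any` and `contains_all` are hypothetical helpers, and case-sensitive substring matching is an assumption.

```python
def contains_any(output, keywords):
    """True if at least one keyword occurs as a substring of the output."""
    return any(keyword in output for keyword in keywords)


def contains_all(output, keywords):
    """True only if every keyword occurs as a substring of the output."""
    return all(keyword in output for keyword in keywords)


log_line = "Job finished with 1 warning"
print(contains_any(log_line, ["error", "failed", "warning"]))
print(contains_all("foo only", ["foo", "bar"]))
print(contains_all("foo and bar", ["foo", "bar"]))
```

Substring matching means a keyword like `"bar"` would also be found inside `"barrier"`. If whole-word matching is needed, a regex-based check with word boundaries is likely a better fit.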