An evaluator passed to run_experiment can be a plain callable function. The function may declare any of the following optional parameters:
| Parameter name | Description | Example |
| --- | --- | --- |
| input | experiment run input | def eval(input): ... |
| output | experiment run output | def eval(output): ... |
| dataset_row | the entire row of the data, with every column as a dictionary key | def eval(dataset_row): ... |
| metadata | experiment metadata | def eval(metadata): ... |
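Because every parameter is optional, an evaluator only needs to declare the ones it uses. The sketch below shows two such evaluators together with a small hypothetical helper, call_evaluator, that passes arguments by parameter name using inspect.signature. The helper illustrates the idea only and is not the SDK's own dispatch code; the sample values and the 'expected' column are made up for the example.

```python
import inspect


def output_length(output):
    # Declares only `output`, so it receives only the task output.
    return len(output)


def matches_expected(output, dataset_row):
    # Declares `output` and `dataset_row`; 'expected' is a hypothetical column.
    return output == dataset_row["expected"]


def call_evaluator(evaluator, **available):
    # Hypothetical helper (not part of the SDK): pass only the arguments
    # whose names appear in the evaluator's signature.
    wanted = inspect.signature(evaluator).parameters
    selected = {name: value for name, value in available.items() if name in wanted}
    return evaluator(**selected)


# Made-up values standing in for a single experiment run.
run = {
    "input": "2 + 2",
    "output": "4",
    "dataset_row": {"question": "2 + 2", "expected": "4"},
    "metadata": {"model": "example"},
}

length_score = call_evaluator(output_length, **run)
match_score = call_evaluator(matches_expected, **run)
print(length_score, match_score)
```

Declaring only what the evaluator reads keeps each function short and makes it easy to call directly in a unit test.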
Define Function Evaluator and Run Experiment
    def edit_distance(dataset_row, output):
        str1 = dataset_row['attributes.str1']  # Input used in task
        str2 = output  # Output from task
        # dp[i][j] is the edit distance between str1[:i] and str2[:j];
        # the first row and first column are initialized to i + j.
        dp = [[i + j if i * j == 0 else 0 for j in range(len(str2) + 1)] for i in range(len(str1) + 1)]
        for i in range(1, len(str1) + 1):
            for j in range(1, len(str2) + 1):
                # Matching characters cost nothing; otherwise take the cheapest
                # of deletion, insertion, or substitution and add 1.
                dp[i][j] = dp[i - 1][j - 1] if str1[i - 1] == str2[j - 1] else 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
        return dp[-1][-1]

    experiment1 = arize_client.run_experiment(
        space_id=space_id,
        dataset_name=dataset_name,
        task=prompt_gen_task,
        evaluators=[edit_distance],
        experiment_name="test",
    )
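Before launching an experiment, it can help to call the evaluator directly on a hand-built row. The sketch below repeats edit_distance so that it runs on its own, with no Arize client involved; the sample row is made up and only mimics the 'attributes.str1' column used above.

```python
def edit_distance(dataset_row, output):
    str1 = dataset_row["attributes.str1"]  # Input used in task
    str2 = output  # Output from task
    # dp[i][j] is the edit distance between str1[:i] and str2[:j].
    dp = [[i + j if i * j == 0 else 0 for j in range(len(str2) + 1)]
          for i in range(len(str1) + 1)]
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[-1][-1]


# Made-up row: only the column the evaluator reads is needed.
sample_row = {"attributes.str1": "kitten"}
distance = edit_distance(dataset_row=sample_row, output="sitting")
print(distance)
```

Turning "kitten" into "sitting" takes two substitutions and one insertion, so the call prints 3.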
Evaluator as a Class
You can also run an experiment with a class-based evaluator that inherits from the Evaluator(ABC) base class in the Arize Python SDK. The evaluator takes a single dataset row as input and returns an EvaluationResult dataclass.
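The shape of a class-based evaluator can be sketched as follows. To keep the example runnable without the SDK, Evaluator and EvaluationResult are defined here as local stand-ins. The method name evaluate, its keyword parameters, and the score, label, and explanation fields are all assumptions for illustration; check the real base class and import path in the Arize Python SDK before use.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationResult:
    # Local stand-in; field names are assumptions, not the SDK definition.
    score: Optional[float] = None
    label: Optional[str] = None
    explanation: Optional[str] = None


class Evaluator(ABC):
    # Local stand-in for the SDK base class; the signature is an assumption.
    @abstractmethod
    def evaluate(self, *, output, dataset_row, **kwargs) -> EvaluationResult:
        ...


class ExactMatch(Evaluator):
    """Scores 1.0 when the task output equals the 'attributes.str1' column."""

    def evaluate(self, *, output, dataset_row, **kwargs) -> EvaluationResult:
        matched = output == dataset_row["attributes.str1"]
        return EvaluationResult(
            score=1.0 if matched else 0.0,
            label="match" if matched else "mismatch",
            explanation="Compared output with attributes.str1",
        )


# Made-up row and output for a local check.
result = ExactMatch().evaluate(
    output="hello", dataset_row={"attributes.str1": "hello"}
)
print(result)
```

A class is convenient when the evaluator needs configuration or state, such as a threshold or a client object, that a bare function would have to take from globals.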