All Functions

Aggregate and metric function syntax

Functions Overview

This page is a reference for all available functions, grouped into aggregation functions and metric functions. Click the linked model type for documentation on a particular model.

Aggregation functions

Every User Defined Metric must have one or more aggregation functions or metrics.



Counts the number of rows



Counts the unique values of exprs



Sums the value of the expression across rows



Averages the value of the expression across rows



Approximate quantile of the expression. The second argument must be a numeric literal between 0 and 1 inclusive.



Minimum of the value of the expression across rows



Maximum of the value of the expression across rows


Metric functions

Metric functions leverage existing metrics in Arize for use in your custom metric. They also allow you to customize the way existing metrics are calculated. Metric functions can take both positional arguments and keyword arguments. When using both, positional arguments must come before keyword arguments. Keyword arguments can be specified in any order.

For classification metrics, the positive class defaults to the one configured for the model. This doc refers to that default as defaultPositiveClass.

Note that these functions need actual (a.k.a. ground truth) data to produce results.

True Positive

TP(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the true positive rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

False Positive

FP(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the false positive rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

True Negative

TN(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the true negative rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

False Negative

FN(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the false negative rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.


PRECISION(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes precision, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.


RECALL(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes recall, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.
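As an illustration of how pos_class shapes these metrics, here is a minimal Python sketch that reduces categorical labels to the four confusion-matrix counts and derives precision and recall from them. The helper names (confusion_counts, precision, recall) and the sample labels are hypothetical, and returning 0.0 on an empty denominator is an assumption of this sketch, not documented platform behavior.

```python
from typing import Sequence, Tuple


def confusion_counts(
    actual: Sequence[str], predicted: Sequence[str], pos_class: str
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating pos_class as positive and all other labels as negative."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == pos_class:
            if a == pos_class:
                tp += 1
            else:
                fp += 1
        else:
            if a == pos_class:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def precision(tp: int, fp: int) -> float:
    # Assumption: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    # Assumption: 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical sample data.
actual = ["fraud", "fraud", "ok", "ok", "fraud"]
predicted = ["fraud", "ok", "fraud", "ok", "fraud"]
tp, fp, tn, fn = confusion_counts(actual, predicted, pos_class="fraud")
print(tp, fp, tn, fn)                    # 2 1 1 1
print(precision(tp, fp), recall(tp, fn))
```

Choosing a different pos_class changes which rows count as positives, so the same data can yield different precision and recall values.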


F1(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the F1 score, also known as balanced F-score or F-measure. If pos_class= is omitted, then the positive class configured for the model is used.


F_BETA(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass, beta=1)

Model Type: Score Categorical

Computes the F-score, with an optional beta= parameter for re-weighting precision and recall. beta= defaults to 1, which produces the same result as the F1 score. When beta=0, the F-score equals precision; as beta goes to infinity, the F-score approaches recall. Commonly used values for beta= are 2, which weighs recall higher than precision, and 0.5, which weighs recall lower than precision. If pos_class= is omitted, then the positive class configured for the model is used.
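The effect of beta can be checked numerically. The sketch below uses the standard F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R). The function name f_beta and the sample precision and recall values are hypothetical, and returning 0.0 when the denominator is zero is an assumption of this sketch.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score computed from precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    # Assumption: define the score as 0.0 when both inputs are zero.
    return (1 + b2) * precision * recall / denom if denom else 0.0


p, r = 0.5, 0.8  # hypothetical precision and recall

print(f_beta(p, r))            # beta=1: the F1 score
print(f_beta(p, r, beta=0))    # equals precision (0.5)
print(f_beta(p, r, beta=2))    # pulled toward recall
print(f_beta(p, r, beta=0.5))  # pulled toward precision
print(f_beta(p, r, beta=1e6))  # approaches recall (0.8)
```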


LOG_LOSS(actual=categoricalActualLabel, predicted=scorePredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes log loss of the model. Note that actual= is a string column while predicted= is a numeric column.
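To show why actual= is a string column and predicted= is numeric, here is a minimal sketch of binary log loss in Python. It assumes the predicted score is the probability of pos_class, uses the natural logarithm, and clips probabilities with a small eps to avoid log(0). These are common conventions, not confirmed details of this platform, and the function name log_loss is hypothetical.

```python
import math
from typing import Sequence


def log_loss(
    actual: Sequence[str],
    predicted: Sequence[float],
    pos_class: str,
    eps: float = 1e-15,
) -> float:
    """Mean binary cross-entropy; predicted is assumed to be P(label == pos_class)."""
    if len(actual) == 0:
        raise ValueError("log loss is undefined for empty input")
    total = 0.0
    for label, prob in zip(actual, predicted):
        prob = min(max(prob, eps), 1 - eps)      # clip to avoid log(0)
        y = 1.0 if label == pos_class else 0.0   # string label -> 0/1 target
        total -= y * math.log(prob) + (1 - y) * math.log(1 - prob)
    return total / len(actual)


# Hypothetical sample data: string actuals, numeric scores.
print(log_loss(["fraud", "ok"], [0.9, 0.1], pos_class="fraud"))
```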


ACCURACY(actual=categoricalActualLabel, predicted=categoricalPredictionLabel)

Model Type: Score Categorical

Computes accuracy of the model.


MAE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean absolute error.


MAPE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean absolute percentage error.


MSE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean squared error.


RMSE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes root mean square error.
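The four regression error metrics above differ only in how they aggregate the per-row error. The following Python sketch uses the textbook definitions. The function names and sample values are hypothetical. Expressing MAPE as a percentage and leaving rows with an actual value of 0 unhandled are assumptions of this sketch, not documented platform behavior.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, as a percentage.

    Assumption: no actual value is 0 (a zero would raise ZeroDivisionError).
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical sample data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(actual, predicted))   # 10.0
print(mape(actual, predicted))
print(mse(actual, predicted))
print(rmse(actual, predicted))
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE scales each error by the size of the actual value.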


AUC(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Score Categorical

Computes the (ROC) AUC.
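For intuition, ROC AUC equals the probability that a randomly chosen positive row is scored higher than a randomly chosen negative row, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It assumes the actuals have already been reduced to positive/negative flags, and the function name roc_auc is hypothetical. The quadratic loop is written for readability, not for large datasets.

```python
from typing import Sequence


def roc_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical sample data.
print(roc_auc([True, False, True, False], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```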


NDCG(ranking_relevance=relevance, prediction_group_id=predictionGroupId, rank=rank, omit_zero_relevance=True, k=10)

Model Type: Ranking

Computes the Normalized Discounted Cumulative Gain (NDCG) of a ranking model. Because this metric averages implicitly, whether rows with 0 relevance are included affects the result. Use omit_zero_relevance to control whether they are included.
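The Python sketch below illustrates why the treatment of zero relevance changes an averaged NDCG. It rests on several assumptions that this page does not confirm: gain is linear (relevance / log2(position + 1)), NDCG is computed per prediction group and then averaged, and a group whose relevances are all 0 (ideal DCG of 0) is either skipped or scored as 0. The names dcg, mean_ndcg and skip_zero_groups are hypothetical, and skip_zero_groups only approximates one possible reading of omit_zero_relevance.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Tuple


def dcg(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the top k positions (linear gain)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances[:k], start=1))


def mean_ndcg(
    rows: Iterable[Tuple[str, int, float]],  # (prediction_group_id, rank, relevance)
    k: int = 10,
    skip_zero_groups: bool = True,           # hypothetical flag, see lead-in
) -> float:
    groups = defaultdict(list)
    for group_id, rank, relevance in rows:
        groups[group_id].append((rank, relevance))

    scores = []
    for items in groups.values():
        by_rank = [rel for _, rel in sorted(items)]   # order by predicted rank
        ideal = sorted(by_rank, reverse=True)         # best possible ordering
        idcg = dcg(ideal, k)
        if idcg == 0:
            if skip_zero_groups:
                continue           # leave the group out of the average
            scores.append(0.0)     # or count it as a zero score
        else:
            scores.append(dcg(by_rank, k) / idcg)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample data: group q2 has no relevant rows.
rows = [("q1", 1, 0.0), ("q1", 2, 1.0), ("q2", 1, 0.0), ("q2", 2, 0.0)]
print(mean_ndcg(rows, skip_zero_groups=True))
print(mean_ndcg(rows, skip_zero_groups=False))
```

With the all-zero group skipped, the average reflects only q1. With it included as 0, the average is halved.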
