
Choosing Your Metrics

Monitor Performance, Drift, Data Quality, and Custom Metrics

Overview

Monitors automatically detect drift, data quality issues, or anomalous performance degradations with highly configurable dimensions based on both common KPIs and custom metrics.
Learn how to set up your monitors here!
Model performance metrics measure how well your model performs in production. Monitor model performance with daily or hourly checks using an evaluation metric. Your model type determines your performance metric.

Performance Metrics

Metrics are batched into Metric Groups that align with model types and their variants.
| Metric Group | Metrics |
| --- | --- |
| Classification | Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity |
| Regression | MAPE, MAE, RMSE, MSE, R-Squared, Mean Error |
| Ranking | NDCG@k, AUC@k |
| Ranking Labels | MAP@k, MRR |
| AUC / LogLoss | AUC, PR-AUC, Log Loss |
| Computer Vision / Object Detection | Accuracy (MAP & IoU coming soon) |
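As a quick reference, the classification-group metrics above can all be derived from a confusion matrix. A minimal sketch in plain Python (the function and variable names are illustrative, not part of the Arize API):

```python
# Illustrative sketch: classification-group metrics from binary labels
# and predictions, using only the standard library.

def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0      # a.k.a. sensitivity / TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0  # TNR
    fpr = fp / (fp + tn) if fp + tn else 0.0          # false positive rate
    fnr = fn / (fn + tp) if fn + tp else 0.0          # false negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "fpr": fpr, "fnr": fnr, "f1": f1}

m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```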

Valid Model Type & Metric Group Combinations

| Model Type | Metric Group Combination |
| --- | --- |
| Regression | Regression |
| Binary Classification | Classification and/or Regression and/or AUC/LogLoss |
| Ranking w/ label | Ranking and/or Ranking Labels |
| Ranking w/ score | Ranking and/or AUC/LogLoss |
Each performance metric maps to a metric family; the metrics relevant to your model type are listed on each model type page.
| Metric | Metric Family |
| --- | --- |
| AUC | auc/logloss |
| LogLoss | auc/logloss |
| Mean Error | classification, regression |
| MAE | classification, regression |
| MAPE | regression |
| SMAPE | regression |
| WAPE | regression |
| RMSE | regression |
| MSE | regression |
| rSquared | regression |
| Accuracy | classification |
| Precision | classification |
| Recall | classification |
| F_1 | classification |
| FPR | classification |
| FNR | classification |
| Sensitivity | classification |
| Specificity | classification |
| NDCG | classification, ranking |
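The regression-family metrics above are likewise straightforward to compute directly. A minimal sketch in plain Python (names illustrative only; MAPE here assumes no zero-valued labels):

```python
import math

# Illustrative sketch: regression-group metrics (Mean Error, MAE, MSE,
# RMSE, MAPE, R-Squared) computed from scratch.

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_error = sum(errors) / n                 # signed bias
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE: assumes no label is exactly zero
    mape = sum(abs(e / t) for t, e in zip(y_true, errors)) / n
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r_squared = 1 - sum(e * e for e in errors) / ss_tot
    return {"mean_error": mean_error, "mae": mae, "mse": mse,
            "rmse": rmse, "mape": mape, "r_squared": r_squared}

m = regression_metrics([100, 200, 300], [110, 190, 330])
```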
Drift monitors measure distribution drift, which is the difference between two statistical distributions.
Arize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.

Drift Metrics

PSI (integer, floats, string)
  • Sample size has less of an effect on PSI
  • Less sensitive, but will produce fewer false positives than KS or EMD (use PSI if you expect fluctuations in your data and don’t want too many false alarms)
  • Binning strategy can affect the calculation of PSI
  • A true statistical ‘distance’, having the property of symmetry
    • PSI(A -> B) == PSI(B -> A)
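As a sketch of how PSI behaves, here is a minimal plain-Python version over pre-binned counts (the binning itself, which as noted affects the result, is assumed done; the epsilon guard for empty bins is a common convention, not necessarily Arize's):

```python
import math

# Illustrative sketch: Population Stability Index over binned counts.
# eps guards empty bins so the log stays finite.

def psi(expected, actual, eps=1e-6):
    total_e, total_a = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        p = max(e / total_e, eps)
        q = max(a / total_a, eps)
        score += (q - p) * math.log(q / p)
    return score

baseline = [20, 30, 50]     # baseline bin counts
production = [25, 25, 50]   # production bin counts
drift = psi(baseline, production)
```

Note the symmetry property from the bullets above: each term `(q - p) * log(q / p)` is unchanged when the two distributions swap roles.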
Euclidean Distance (embedding vectors)
  • A Euclidean distance check determines whether the average centroid of the production data has moved away from the baseline group
  • For unstructured data types, learn more here
KL Divergence (integer, floats, string)
  • Less sensitive than other metrics (such as the KS statistic) and will produce fewer false positives than KS
  • Use KL if you expect fluctuations in your data
  • Sample size has less of an effect on KL
  • Binning strategy can affect results
  • The non-symmetric version of PSI
    • KL(A -> B) != KL(B -> A)
JS Distance (integer, floats, string)
  • Similar to KL except in two areas: JS is always finite and symmetric
  • Interpretable from 0 --> 1 (PSI doesn't have this property, as it's evaluated from 0 --> infinity)
    • 0 = identical distributions
    • 1 = completely different with no overlap
  • Mildly sensitive compared to PSI and KL, but not as sensitive as KS
  • Binning strategy can affect results
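One common way to get the bounded 0-to-1 behavior described above is the square root of the JS divergence with base-2 logs. A minimal sketch over binned counts (an assumption about the exact formulation, not a statement of Arize's internals):

```python
import math

# Illustrative sketch: Jensen-Shannon distance over binned counts.
# Base-2 logs keep the underlying divergence in [0, 1].

def js_distance(p, q):
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # KL divergence in bits; 0 * log(0) terms are treated as 0
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

Two fully disjoint distributions score 1.0; identical distributions score 0.0, matching the interpretation in the bullets above.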
KS Statistic (integer, floats)
  • Non-parametric, so it doesn't make assumptions about the underlying data
  • Doesn't require binning to calculate, so binning strategy doesn't affect this metric
  • Returns a P-value; a smaller P-value means more confident drift detection
  • The most sensitive metric among all the drift metrics
    • Larger datasets make KS increasingly more sensitive
    • Will produce more false positives
    • Detects very slight differences
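The two-sample KS statistic itself is just the largest gap between the two empirical CDFs; a minimal sketch (in practice a library routine such as SciPy's `ks_2samp` also supplies the P-value mentioned above):

```python
import bisect

# Illustrative sketch: two-sample Kolmogorov-Smirnov statistic,
# i.e. the maximum vertical gap between the two empirical CDFs.

def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```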
Model health depends on high-quality data that powers model features. Data quality monitors help identify key data quality issues such as cardinality shifts, data type mismatch, missing data, and more.

Data Quality Metrics

| Metric | Data Type | Description |
| --- | --- | --- |
| Percent Empty | integer, floats, string (embedding vectors coming soon) | The percent of nulls in your model features |
| Cardinality (Count Distinct) | string | The cardinality of your categorical features |
| New Values | string | Count of new unique values that appear in production but not in baseline. Note: this monitor requires a baseline to compare against |
| Missing Values | string | Count of unique values that appear in baseline but not in production. Note: this monitor requires a baseline to compare against |
| Quantiles | integer, floats | p99.9, p99, p95, p50 |
| Sum | integer, floats | Sum of your numeric data over the evaluation window |
| Count | integer, floats, string | Traffic count of predictions, features, etc. Can be used with filters |
| Average | integer, floats | Average of your numeric data over the evaluation window |
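Most of these checks are simple aggregations over a feature column. A minimal sketch in plain Python (empties represented as `None`; function names mirror the table but are illustrative, not Arize API):

```python
# Illustrative sketches of the data-quality metrics above.

def percent_empty(col):
    return 100 * sum(v is None for v in col) / len(col)

def cardinality(col):
    # count of distinct non-null values
    return len({v for v in col if v is not None})

def new_values(production, baseline):
    # values seen in production but absent from the baseline
    return set(production) - set(baseline)

def missing_values(production, baseline):
    # values in the baseline that no longer appear in production
    return set(baseline) - set(production)

def quantile(col, p):
    # simple index-based (non-interpolated) quantile, e.g. p=0.95 for p95
    s = sorted(v for v in col if v is not None)
    return s[min(len(s) - 1, int(p * len(s)))]
```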
Couldn't find your metric above? Arize supports the ability to monitor custom metrics using SQL. Here is an example of a custom metric for the percent of a loan that is outstanding:
```sql
SELECT
  SUM(loan_amount - repayment_amount) / SUM(loan_amount)
FROM model
WHERE state = 'CA'
  AND loan_amount > 1000
```
Learn how to create custom metrics here.