Choosing Your Metrics

Monitor Performance, Drift, Data Quality, and Custom Metrics

Overview

Monitors automatically detect drift, data quality issues, and anomalous performance degradations, with highly configurable dimensions based on both common KPIs and custom metrics.
Learn how to set up your monitors here!
Model performance metrics measure how well your model performs in production. Monitor model performance with daily or hourly checks using an evaluation metric; your model type determines which performance metrics apply.

Performance Metrics

Metrics are batched into Metric Groups that align with model types and their variants.

| Metric Group | Metrics |
| --- | --- |
| Classification | Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity |
| Regression | MAPE, MAE, RMSE, MSE, R-Squared, Mean Error |
| Ranking | NDCG |
| Ranking Labels | |
| AUC/LogLoss | AUC, PR-AUC, Log Loss |
| Computer Vision | MAP, IoU |

Valid Model Type & Metric Group Combinations

| Model Type | Metric Group Combination |
| --- | --- |
| Regression | Regression |
| Binary Classification | Classification and/or Regression and/or AUC/LogLoss |
| Ranking w/ label | Ranking and/or Ranking Labels |
| Ranking w/ score | Ranking and/or AUC/LogLoss |
Each model type page maps the performance metrics relevant to that model type.
| Metric | Metric Family |
| --- | --- |
| AUC | auc/logloss |
| LogLoss | auc/logloss |
| Mean Error | classification, regression |
| MAE | classification, regression |
| MAPE | regression |
| SMAPE | regression |
| WAPE | regression |
| RMSE | regression |
| MSE | regression |
| R-Squared | regression |
| Accuracy | classification |
| Precision | classification |
| Recall | classification |
| F1 | classification |
| Sensitivity | classification |
| Specificity | classification |
| FPR | classification |
| FNR | classification |
| NDCG | classification, ranking |
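All of the classification metrics in the table above derive from the four confusion-matrix counts. The following sketch is illustrative only (it is not Arize's implementation) and shows how each metric relates to true/false positives and negatives:

```python
# Illustrative sketch: classification metrics derived from confusion-matrix counts.
# Not Arize's implementation; assumes non-degenerate counts (no zero denominators).

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # a.k.a. sensitivity / true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    fpr = fp / (fp + tn)             # false positive rate
    fnr = fn / (fn + tp)             # false negative rate
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "Accuracy": accuracy, "Precision": precision, "Recall": recall,
        "Sensitivity": recall, "Specificity": specificity,
        "FPR": fpr, "FNR": fnr, "F1": f1,
    }

print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```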

Drift Monitors

Drift monitors measure distribution drift, which is the difference between two statistical distributions.
Arize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.

Drift Metrics

PSI (integer, floats, string)
  • Sample size has less of an effect on PSI
  • Less sensitive, but will have fewer false positives when compared to KS or EMD (use PSI if you expect fluctuations in your data and don't want too many false alarms)
  • Binning strategy can affect the calculation of PSI
  • A true statistical 'distance', having the property of symmetry
    • PSI(A -> B) == PSI(B -> A)

Euclidean Distance (embedding vectors)
  • A Euclidean distance check determines whether the average centroid of the production data has moved away from that of the baseline group. For unstructured data types, learn more here.

KL Divergence (integer, floats, string)
  • Less sensitive than other metrics (such as the KS statistic) and will have fewer false positives when compared to KS
  • Use KL if you expect fluctuations in your data
  • Sample size has less of an effect on KL
  • Binning strategy can affect results
  • The non-symmetric version of PSI
    • KL(A -> B) != KL(B -> A)

JS Distance (integer, floats, string)
  • Similar to KL except in two areas: JS is always finite and symmetric
  • Interpretable from 0 to 1 (PSI doesn't have this property, as it ranges from 0 to infinity)
    • 0 = identical distributions
    • 1 = completely different distributions with no overlap
  • Mildly sensitive compared to PSI and KL, but not as sensitive as KS
  • Binning strategy can affect results

KS Statistic (integer, floats)
  • Non-parametric, so it doesn't make assumptions about the underlying data
  • Doesn't require binning to calculate, so binning strategy doesn't affect this metric
  • Returns a P-value; a smaller P-value means more confident drift detection
  • The most sensitive metric among all the drift metrics
    • Larger datasets make KS increasingly more sensitive
    • Will produce more false positives
    • Detects very slight differences
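The binned drift metrics above share a common shape: compare per-bin proportions between a baseline and a production distribution. A minimal sketch follows, illustrating the symmetry of PSI, the asymmetry of KL, and the 0-to-1 range of JS distance; the epsilon smoothing is an assumption for this sketch, not Arize's exact binning or smoothing strategy:

```python
# Illustrative sketch of binned drift metrics (PSI, KL divergence, JS distance).
# Epsilon smoothing avoids log(0); it is an assumption of this sketch.
import math

def _safe(p: float, eps: float = 1e-6) -> float:
    return max(p, eps)

def psi(p: list, q: list) -> float:
    # Symmetric: psi(p, q) == psi(q, p)
    return sum((_safe(a) - _safe(b)) * math.log(_safe(a) / _safe(b))
               for a, b in zip(p, q))

def kl(p: list, q: list) -> float:
    # Non-symmetric: kl(p, q) != kl(q, p) in general
    return sum(_safe(a) * math.log(_safe(a) / _safe(b)) for a, b in zip(p, q))

def js_distance(p: list, q: list) -> float:
    # Square root of JS divergence with base-2 logs; bounded in [0, 1]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl2(x, y):
        return sum(_safe(a) * math.log2(_safe(a) / _safe(b)) for a, b in zip(x, y))
    return math.sqrt(0.5 * kl2(p, m) + 0.5 * kl2(q, m))

baseline = [0.25, 0.25, 0.25, 0.25]    # binned baseline distribution
production = [0.40, 0.30, 0.20, 0.10]  # binned production distribution

print(psi(baseline, production), kl(baseline, production),
      js_distance(baseline, production))
```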
Model health depends on high-quality data that powers model features. Data quality monitors help identify key data quality issues such as cardinality shifts, data type mismatch, missing data, and more.

Data Quality Metrics

| Metric | Data Type | Description |
| --- | --- | --- |
| Percent Empty | integer, floats, string (embedding vectors coming soon) | The percent of nulls in your model features |
| Cardinality (Count Distinct) | string | The cardinality of your categorical features |
| New Values | string | Count of new unique values that appear in production but not in the baseline. Note: this monitor requires a baseline to compare against |
| Missing Values | string | Count of unique values that appear in the baseline but not in production. Note: this monitor requires a baseline to compare against |
| Quantiles | integer, floats | p99.9, p99, p95, p50 |
| Sum | integer, floats | Sum of your numeric data over the evaluation window |
| Count | integer, floats, string | Traffic count of predictions, features, etc. Can be used with filters |
| Average | integer, floats | Average of your numeric data over the evaluation window |
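To make the table concrete, here is a minimal sketch of three of the checks above (percent empty, cardinality, and new values) over a batch of feature values; the function names and tiny dataset are illustrative assumptions, not Arize's API:

```python
# Illustrative sketch of data quality checks over a batch of feature values.
# Function names and data are assumptions for this example, not Arize's API.

def percent_empty(values: list) -> float:
    # Fraction of null (None) entries in the batch
    return sum(v is None for v in values) / len(values)

def cardinality(values: list) -> int:
    # Count of distinct non-null values
    return len({v for v in values if v is not None})

def new_values(production: list, baseline: list) -> set:
    # Values seen in production but not in the baseline (requires a baseline)
    return set(production) - set(baseline)

baseline = ["CA", "NY", "TX"]
production = ["CA", "CA", None, "WA"]

print(percent_empty(production))   # 0.25
print(cardinality(production))     # 2
print(new_values([v for v in production if v is not None], baseline))  # {'WA'}
```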
Couldn't find your metric above? Arize supports custom metrics written in SQL. Here is an example of a custom metric for the percent of a loan that is outstanding:
SELECT
SUM(loan_amount - repayment_amount) / SUM(loan_amount)
FROM model
WHERE state = 'CA'
AND loan_amount > 1000
Learn how to create custom metrics here.
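The SQL above filters rows, then takes a ratio of two sums. An equivalent row-by-row computation in Python may clarify what the query does; the column names come from the query, while the sample rows are invented for illustration:

```python
# Equivalent of the custom-metric SQL above, computed over rows in Python.
# Column names (state, loan_amount, repayment_amount) come from the example
# query; the sample rows are invented for illustration.

rows = [
    {"state": "CA", "loan_amount": 5000.0, "repayment_amount": 2000.0},
    {"state": "CA", "loan_amount": 3000.0, "repayment_amount": 3000.0},
    {"state": "NY", "loan_amount": 8000.0, "repayment_amount": 1000.0},  # excluded: state != 'CA'
    {"state": "CA", "loan_amount": 500.0,  "repayment_amount": 100.0},   # excluded: loan_amount <= 1000
]

# WHERE state = 'CA' AND loan_amount > 1000
filtered = [r for r in rows if r["state"] == "CA" and r["loan_amount"] > 1000]

# SUM(loan_amount - repayment_amount) / SUM(loan_amount)
pct_outstanding = (
    sum(r["loan_amount"] - r["repayment_amount"] for r in filtered)
    / sum(r["loan_amount"] for r in filtered)
)
print(pct_outstanding)  # 3000.0 / 8000.0 = 0.375
```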