Monitor Metrics

Monitor Performance, Drift, Data Quality, and Custom Metrics

Overview

Monitors automatically detect drift, data quality issues, and anomalous performance degradation, with highly configurable dimensions based on both common KPIs and custom metrics.

Learn how to set up your monitors here!

Model performance metrics measure how well your model performs in production. Monitor model performance with daily or hourly checks against an evaluation metric; your model type determines which performance metrics are available.

Performance Metrics

Metrics are batched into Metric Groups that align with model types and their variants.

| Metric Group | Metrics |
| --- | --- |
| Classification | Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity |
| Regression | MAPE, MAE, RMSE, MSE, R-Squared, Mean Error |
| Ranking | NDCG@k, AUC@k |
| Ranking Labels | MAP@k, MRR |
| AUC / LogLoss | AUC, PR-AUC, Log Loss |
| Computer Vision / Object Detection | Accuracy (MAP & IoU coming soon) |
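To make the metric groups concrete, here is a minimal sketch of a few of these metrics hand-rolled in plain Python (illustration only; this is not the monitoring API, and function names are our own):

```python
# Hand-rolled versions of a few metric-group metrics, for illustration.

def accuracy(y_true, y_pred):
    # Classification: fraction of predictions that match the actuals
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Regression: Mean Absolute Error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Regression: Root Mean Squared Error
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # -> 0.75
mae([10.0, 12.0], [11.0, 12.5])       # -> 0.75
```

In production monitoring these are computed over an evaluation window (daily or hourly) rather than a fixed list, but the formulas are the same.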

Valid Model Type & Metric Group Combinations

| Model Type | Metric Group Combination |
| --- | --- |
| Regression | Regression |
| Binary Classification | Classification and/or Regression and/or AUC/LogLoss |
| Ranking w/ label | Ranking and/or Ranking Labels |
| Ranking w/ score | Ranking and/or AUC/LogLoss |

Each model type page maps the performance metrics relevant to that model type. Each metric belongs to a metric family: AUC and Log Loss belong to the auc/logloss family; Mean Error, rSquared, and the other regression metrics belong to the regression family; the classification metrics belong to the classification family; and some metrics span more than one family (for example, both classification and regression, or both classification and ranking).

Drift Monitors

Drift monitors measure distribution drift, which is the difference between two statistical distributions.

Arize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.

Drift Metrics

PSI (Population Stability Index)

Data types: integer, floats, string

  • Sample size has less of an effect on PSI
  • Less sensitive than KS or EMD, so it produces fewer false positives (use PSI if you expect fluctuations in your data and don’t want too many false alarms)
  • Binning strategy can affect the calculation of PSI
  • A true statistical ‘distance’, having the property of symmetry
    • PSI(A -> B) == PSI(B -> A)

Euclidean Distance

Data types: embedding vectors

  • Checks whether the average centroid of the production data has moved away from the baseline group
  • For unstructured data types, learn more here

KL Divergence

Data types: integer, floats, string

  • Less sensitive than other metrics (such as the KS statistic), so it produces fewer false positives than KS
  • Use KL if you expect fluctuations in your data
  • Sample size has less of an effect on KL
  • Binning strategy can affect results
  • The non-symmetric counterpart of PSI
    • KL(A -> B) != KL(B -> A)

JS Distance

Data types: integer, floats, string

  • Similar to KL except in two areas: JS is always finite and symmetric
  • Interpretable from 0 to 1 (PSI doesn't have this property, as it ranges from 0 to infinity)
    • 0 = identical distributions
    • 1 = completely different, with no overlap
  • Mildly sensitive compared to PSI and KL, but not as sensitive as KS
  • Binning strategy can affect results

KS Statistic

Data types: integer, floats

  • Non-parametric, so it makes no assumptions about the underlying data
  • Doesn't require binning, so binning strategy doesn't affect this metric
  • Returns a P-value; a smaller P-value means more confident drift detection
  • The most sensitive of the drift metrics
    • Larger datasets make KS increasingly sensitive
    • Will produce more false positives
    • Detects very slight differences
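The symmetry and boundedness properties described above are easy to see in code. Below is a rough sketch of PSI, KL divergence, and JS distance over pre-binned distributions (an assumption on our part: inputs are probability distributions over the same bins with all entries greater than zero; this is not Arize's implementation):

```python
import math

def psi(p, q):
    # Population Stability Index; symmetric: psi(p, q) == psi(q, p)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def kl(p, q):
    # KL divergence (in nats); non-symmetric: kl(p, q) != kl(q, p) in general
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def js_distance(p, q):
    # Jensen-Shannon distance with base-2 logs, bounded in [0, 1];
    # 0 means identical distributions, 1 means no overlap
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    half = lambda a: sum(ai * math.log2(ai / mi) for ai, mi in zip(a, m))
    return math.sqrt(0.5 * half(p) + 0.5 * half(q))

baseline = [0.5, 0.3, 0.2]
production = [0.4, 0.4, 0.2]
psi(baseline, production)  # small positive value; 0 means identical bins
```

Note that all three depend on how the data was binned, which is why binning strategy affects these metrics but not the KS statistic.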

Data Quality Monitors

Model health depends on the high-quality data that powers your model features. Data quality monitors help identify key data quality issues such as cardinality shifts, data type mismatches, missing data, and more.

Data Quality Metrics

| Metric | Data Type | Description |
| --- | --- | --- |
| Percent Empty | integer, floats, string (embedding vectors coming soon) | The percent of nulls in your model features |
| Cardinality (Count Distinct) | string | The cardinality of your categorical features |
| New Values | string | Count of new unique values that appear in production but not in the baseline. Note: this monitor requires a baseline to compare against |
| Missing Values | string | Count of unique values that appear in the baseline but not in production. Note: this monitor requires a baseline to compare against |
| Quantiles | integer, floats | p99.9, p99, p95, p50 |
| Sum | integer, floats | Sum of your numeric data over the evaluation window |
| Count | integer, floats, string | Traffic count of predictions, features, etc.; can be used with filters |
| Average | integer, floats | Average of your numeric data over the evaluation window |
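As a minimal sketch of what a few of these checks compute (plain Python over hypothetical feature values, not the Arize API; names are our own):

```python
def percent_empty(values):
    # Percent of nulls (None) in a feature column
    return 100.0 * sum(v is None for v in values) / len(values)

def cardinality(values):
    # Count of distinct non-null values in a categorical feature
    return len({v for v in values if v is not None})

def new_values(production, baseline):
    # Unique values seen in production but absent from the baseline
    return set(production) - set(baseline)

states = ["CA", "NY", None, "CA"]
percent_empty(states)                        # -> 25.0
cardinality(states)                          # -> 2
new_values(["CA", "NY", "TX"], ["CA", "NY"]) # -> {"TX"}
```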

Couldn't find your metric above? Arize supports monitoring custom metrics defined with SQL. Here is an example of a custom metric for the percent of a loan that is outstanding:

```sql
SELECT
  SUM(loan_amount - repayment_amount) / SUM(loan_amount)
FROM model
WHERE state = 'CA'
  AND loan_amount > 1000
```
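The same calculation can be sketched in plain Python over hypothetical loan records, to make the semantics of the query explicit (field names mirror the SQL; the records below are made up):

```python
def pct_outstanding(rows):
    # Mirror the WHERE clause: keep only CA loans over 1000
    eligible = [r for r in rows if r["state"] == "CA" and r["loan_amount"] > 1000]
    loaned = sum(r["loan_amount"] for r in eligible)
    outstanding = sum(r["loan_amount"] - r["repayment_amount"] for r in eligible)
    # SUM(loan_amount - repayment_amount) / SUM(loan_amount)
    return outstanding / loaned

loans = [
    {"state": "CA", "loan_amount": 2000, "repayment_amount": 500},
    {"state": "CA", "loan_amount": 4000, "repayment_amount": 1000},
    {"state": "NY", "loan_amount": 3000, "repayment_amount": 0},  # filtered out
    {"state": "CA", "loan_amount": 500,  "repayment_amount": 0},  # filtered out
]
pct_outstanding(loans)  # -> 0.75
```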

Learn how to create custom metrics here.

See also: Custom Metrics Query Language.


Copyright © 2023 Arize AI, Inc