ML Monitor Types
Learn about the different monitor options in Arize
Performance Monitor Metrics by Model Type
Model performance metrics measure how well your model performs in production. Monitor model performance with daily or hourly checks using an evaluation metric. Your model type determines your performance metric.
Model Type | Metrics |
---|---|
Classification | Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity |
Regression | MAPE, MAE, RMSE, MSE, R-Squared, Mean Error |
Ranking | NDCG@k, AUC@k |
Ranking Labels | MAP@k, MRR |
AUC/LogLoss | AUC, PR-AUC, Log Loss |
Computer Vision / Object Detection | Accuracy, MAP, IoU |
Custom Metrics | Not seeing what you're looking for? Create a metric yourself! |
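To make the classification metrics above concrete, here is a minimal sketch of how they fall out of a confusion matrix. The data and function name are illustrative, not part of Arize's API:

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix based metrics for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # false positive rate
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # false negative rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Toy labels and predictions (hypothetical, for illustration only)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

A performance monitor evaluates one such metric over each daily or hourly window and alerts when it crosses a threshold.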
Drift Monitor Metrics
Arize offers various distributional drift metrics to choose from when setting up a monitor. Each metric is tailored to a specific use case; refer to this guide to help choose the appropriate metric for various ML use cases.
Metric | Description |
---|---|
PSI (Population Stability Index) | Less influenced by sample size and produces fewer false positives than the Kolmogorov-Smirnov test or Earth Mover's Distance, making it suitable for datasets with expected fluctuations. However, PSI can be affected by the chosen binning strategy. A notable attribute of PSI is its symmetry, confirming its status as a true statistical 'distance'. |
Euclidean Distance | Checks whether the average centroid of the production data has moved away from that of the baseline group. For unstructured data types, learn more here. |
KL Divergence (Kullback-Leibler) | Less sensitive than metrics like the Kolmogorov-Smirnov statistic, producing fewer false positives and making it appropriate for datasets with expected fluctuations. While its calculation can be influenced by the chosen binning strategy, it is less affected by sample size. Unlike PSI, KL divergence is non-symmetric: the divergence from dataset A to B is not the same as from B to A. |
JS Distance (Jensen-Shannon) | Similar to Kullback-Leibler divergence but with two distinct advantages: it is always finite and symmetric. It offers an interpretable score ranging from 0 (identical distributions) to 1 (completely different distributions with no overlap). Its sensitivity is moderate compared to PSI and KL and lower than KS, and its results can still be influenced by the chosen binning strategy. |
KS Statistic (Kolmogorov-Smirnov) | A non-parametric metric that requires no assumptions about the underlying data and no binning, making it a sensitive tool for detecting even slight differences in data distribution, including in large datasets. A smaller p-value signifies a more confident drift detection, though this sensitivity may also result in more false positives. |
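As a concrete illustration of the binning-based metrics above, here is a minimal PSI sketch. The bin count, epsilon handling, and use of baseline-derived bin edges are illustrative choices, not Arize's implementation:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a production sample."""
    # Derive bin edges from the baseline (expected) distribution
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions; epsilon avoids division by zero and log(0)
    eps = 1e-6
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    # Symmetric by construction: (a - e) * log(a / e) == (e - a) * log(e / a)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Note how the result depends on the `bins` argument, which is exactly the binning-strategy sensitivity the table describes.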
Data Quality Monitor Metrics
Metric | Description |
---|---|
Percent Empty | The percent of nulls in your model features. |
Cardinality (Count Distinct) | The cardinality of your categorical features. Changes in your feature cardinality could indicate a change in the feature pipeline, or a new or deprecated product feature that your model has not adapted to yet. |
New Values | Count of new unique values that appear in production but not in the baseline. Helps identify concept drift or changes in the data distribution over time. These new unique values may not have been accounted for during model training and could therefore lead to unreliable predictions. |
Missing Values | Count of unique values that appear in the baseline but not in production. Can indicate changes in data generation processes or an issue with data collection in the production environment. |
Quantiles (p99.9, p99, p95, p50) | Quantile values give a detailed understanding of the underlying statistical properties of the data and its spread. A significant shift in these quantiles could indicate a change in the data distribution and a need to retrain. |
Sum | The sum of your numeric data over the evaluation window. Helps detect anomalies or shifts in the data distribution. Significant changes in the sum might indicate data errors, outliers, or systemic changes in the process of generating the data. |
Count | Traffic count of predictions, features, etc. Can be used with filters. Ensures there aren't any unexpected surges or drops in traffic that could affect performance, and provides valuable insights about usage patterns for better resource management and planning. |
Average | Average of your numeric data over the evaluation window. Shifts may indicate a systematic bias, a change in the data collection process, or an introduction of anomalies, which can adversely impact performance and signal when your model may need retraining. |
Average List Length / Average Vector Length | The average of the list lengths across all rows; available only for list and vector data types. Note: this metric omits empty lists and NULL values, as missing values are captured by the percent empty metric. |
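To ground the first two rows of the table, here is a minimal sketch of computing percent empty and cardinality with pandas. The DataFrame and column names are hypothetical example data, not an Arize API:

```python
import pandas as pd

# Hypothetical production feature data
df = pd.DataFrame({
    "state": ["CA", "NY", None, "CA", "TX", None],
    "loan_amount": [1200.0, None, 800.0, 1500.0, 950.0, 700.0],
})

# Percent Empty: share of nulls per feature, as a percentage
percent_empty = df.isna().mean() * 100

# Cardinality (Count Distinct): distinct non-null categories in a feature
cardinality = df["state"].nunique(dropna=True)
```

A data quality monitor would compute these per evaluation window and alert when, for example, `percent_empty` jumps for a feature that is normally fully populated.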
Custom Metrics
Couldn't find your metric above? Arize supports the ability to monitor custom metrics using SQL. Here is an example of a custom metric for the percent of a loan that is outstanding:
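A sketch of what such a metric could look like; the column names (`loan_outstanding`, `loan_amount`) are hypothetical and the exact query syntax depends on your model's schema:

```sql
SELECT SUM(loan_outstanding) / SUM(loan_amount)
FROM model
```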
Learn how to create custom metrics here.