Best Practices for Monitors
Monitor Performance, Drift, Data Quality, and Custom Metrics
Continuous monitoring helps ensure the accuracy and reliability of ML predictions over time. This is critical because models can drift or degrade in performance due to changes in the underlying data, shifting environments, or evolving target variables.
Monitoring isn't one-size-fits-all: the right setup depends on your ML use case, business needs, and areas of concern.
Monitors automatically detect drift, data quality issues, or anomalous performance degradations with highly configurable dimensions based on both common KPIs and custom metrics.
Learn how to set up your monitors here!
Performance metrics quantify a model's effectiveness in its predictions. Monitor performance metrics when deploying a model in production to flag unexpected changes or drops in performance.
Additionally, these metrics are used during the model validation phase, offering insights that guide the improvement and fine-tuning of models to achieve optimal predictive performance.
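As a quick illustration (independent of Arize, with made-up data), two of the metrics referenced on this page, AUC and MAPE, can be computed from predictions and actuals with scikit-learn:

```python
# Illustrative only: computing two common performance metrics offline.
from sklearn.metrics import roc_auc_score, mean_absolute_percentage_error

# Binary classification (e.g. propensity to click): AUC from scores vs. labels.
y_true = [0, 1, 1, 0, 1]
y_score = [0.2, 0.8, 0.65, 0.3, 0.9]
print("AUC:", roc_auc_score(y_true, y_score))

# Regression (e.g. predicting ETAs): MAPE from actual vs. predicted values.
actual_eta = [10.0, 12.5, 8.0]
predicted_eta = [11.0, 12.0, 9.5]
print("MAPE:", mean_absolute_percentage_error(actual_eta, predicted_eta))
```

Arize computes these metrics for you once predictions and actuals are logged; the snippet only shows what the metrics measure.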
Models and their data change over time; this change is known as drift. Monitor model drift in production to catch underlying data distribution changes and to identify and root cause model issues before they impact your model's performance.
Monitoring feature and prediction drift is particularly useful as a proxy for performance when you receive delayed actuals (ground truth data).
High-quality data is fundamental to building reliable, accurate machine learning models; poor data quality can significantly compromise the value of your predictions.
Easily root cause model issues by monitoring key data quality metrics to identify cardinality shifts, data type mismatches, missing data, and more.
🏃 Common Questions:
🌊 How do I monitor performance without ground truth data?
Get a sense of model performance without ground truth data by monitoring feature drift and prediction drift.
🔔 My performance alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
🪟 How do I monitor with delayed ground truth data?
Delay a performance evaluation via a delay window. Change this if you have delayed actuals, so you evaluate your model on the most up-to-date data.
🏗️ What if my performance metric is specific to my team?
Create any performance metric to suit your monitoring needs via Custom Metrics. Monitor, troubleshoot, and use custom metrics in dashboards.
📈 My monitors are overly sensitive or not sensitive enough
Increase your evaluation window to smooth out spikes or seasonality. Decrease your evaluation window to react faster to potential incidents.
🤖 Can I create performance monitors programmatically?
Use the GraphQL API to programmatically create performance monitors.
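As a rough sketch of what a programmatic call might look like, the snippet below posts a GraphQL mutation from Python using `requests`. The endpoint, auth header, mutation name, and input fields are assumptions for illustration; confirm them against the schema in Arize's GraphQL API explorer before use.

```python
# Hedged sketch: creating a performance monitor over GraphQL from Python.
# Endpoint, header, mutation, and field names are illustrative assumptions;
# verify them against the actual GraphQL schema.
import requests

API_URL = "https://app.arize.com/graphql"      # assumed endpoint
HEADERS = {"x-api-key": "YOUR_DEVELOPER_KEY"}  # assumed auth header

mutation = """
mutation {
  createPerformanceMonitor(input: {
    modelId: "YOUR_MODEL_ID"
    name: "AUC below threshold"
    performanceMetric: auc
    operator: lessThan
    threshold: 0.8
  }) {
    monitor { id }
  }
}
"""

response = requests.post(API_URL, json={"query": mutation}, headers=HEADERS)
response.raise_for_status()
print(response.json())
```

Drift and data quality monitors follow the same pattern with their corresponding mutations.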
🏎️ How do I track sudden drift over time?
Use a moving window of production data as your model baseline to catch sudden drift.
🐌 How do I track gradual drift over time?
Use training data as your model baseline to catch gradual drift.
🔔 My drift alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
🔑 Can I monitor a few key features instead of all of them?
Create custom drift monitors based on individual features by following the 'Custom Monitors' tab in the guide below.
🔍 What are the leading indicators of performance degradation?
Measure feature and prediction drift to indicate performance degradation. Arize supports various drift metrics based on your use case.
🤖 Can I create drift monitors programmatically?
Use the GraphQL API to programmatically create drift monitors.
📜 How should I monitor if I'm concerned about data pipeline issues?
Your data pipeline may occasionally fail or inadvertently drop features. Use count and percent empty monitors to catch these issues.
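As a rough sketch of the check a percent-empty monitor automates (the feature names and values here are hypothetical), the share of missing values per feature in a production batch can be computed with pandas:

```python
# Hypothetical production batch; a silently dropped feature arrives as all nulls.
import pandas as pd

batch = pd.DataFrame({
    "age": [34, None, 41, 29],
    "zip_code": [None, None, None, None],
    "purchase_amount": [20.0, 35.5, None, 12.0],
})

# Percent of missing values per feature.
percent_empty = batch.isna().mean() * 100
print(percent_empty)

# Flag features whose percent empty crosses an illustrative threshold.
print("Suspicious features:", list(percent_empty[percent_empty > 50].index))
```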
🛍️ How should I monitor 3rd party/purchased data?
3rd party data is a common culprit behind model performance problems. Use data quality monitors to track quantiles and the sum or average values of your 3rd party data.
🚅 How should I monitor my features if I frequently retrain my model?
Every model retrain has the possibility of introducing inadvertent changes to features. Use data quality monitors to compare new values and missing values between your production and your training or validation datasets.
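As a minimal sketch of the comparison behind new-values and missing-values checks (the category values are hypothetical), a set difference between training and production categories surfaces both:

```python
# Hypothetical categories of a feature observed in each dataset.
training_values = {"credit_card", "debit_card", "paypal"}
production_values = {"credit_card", "debit_card", "apple_pay"}

print("New values in production:", production_values - training_values)
print("Training values missing from production:", training_values - production_values)
```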
🚂 How should I monitor my pipeline of ground truth data?
Monitor your actuals with percent empty and count to capture any failures or errors in your ground truth pipeline.
🔔 My data quality alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
✌️ Two Types of Drift
Use drift monitors to compare production against different baseline datasets.
Feature drift captures changes to your data pipeline that can lead to anomalous model behavior.
Prediction drift captures changes in the outputs of your model that may require stakeholders to be notified. This is also an excellent way to monitor performance without ground truth values.
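For intuition on how drift is quantified, below is a minimal sketch of Population Stability Index (PSI), one of the drift metrics listed in the reference at the end of this page. It bins the baseline, bins the production window with the same edges, and sums the weighted log-ratio of the two distributions; the bin count and epsilon are illustrative choices.

```python
# Minimal PSI sketch: compare a production window of a numeric feature
# against a baseline (e.g. training data or an earlier production window).
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    eps = 1e-6  # guard against empty bins
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline) + eps
    prod_pct = np.histogram(production, bins=edges)[0] / len(production) + eps
    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)    # e.g. training distribution
production = rng.normal(0.3, 1.2, 10_000)  # shifted production window
print("PSI:", psi(baseline, production))   # larger values indicate more drift
```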
🚀 Performance
Monitor performance metrics based on ground truth data (actuals) for your model type, such as NDCG (ranking), AUC (propensity to click), MAPE (predicting ETAs), and more!
📌 Important Features
Monitor key features important to your model with data quality monitors. This can be a powerful tool for root cause analysis workflows.
🔍 Leading Indicators
If your model receives delayed ground truth, monitor your prediction drift score and feature drift as a proxy for model performance.
Supported metrics by monitor category:
Performance: AUC, Log Loss, Mean Error, MAE, MAPE, SMAPE, WAPE, RMSE, MSE, R Squared, Accuracy, Precision, Recall, F1, Sensitivity, Specificity, False Negative Rate, False Positive Rate
Drift: PSI, KL Divergence, JS Distance, KS Statistic
Data Quality: Percent Empty, Cardinality, New Values, Missing Values, Quantiles (P99.9, P99, P95, P50)