Best Practices for Monitors
Monitor Performance, Drift, Data Quality, and Custom Metrics
Continuous monitoring ensures the accuracy and reliability of ML predictions over time. This is critical because models can drift or degrade in performance due to changes in the underlying data, shifting environments, or evolving target variables.
Monitoring isn't a one-size-fits-all solution for the variety of ML use cases, business needs, and areas of concern.
Monitors automatically detect drift, data quality issues, or anomalous performance degradations with highly configurable dimensions based on both common KPIs and custom metrics.
Additionally, these metrics are used during the model validation phase, offering insights that guide the improvement and fine-tuning of models to achieve optimal predictive performance.
Models and their data change over time; this change is known as drift. Monitor model drift in production to catch underlying data distribution changes, helping you identify and root cause model issues before they impact your model's performance.
Monitoring feature and prediction drift as a proxy for performance is particularly useful if you receive delayed actuals (ground truth data).
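Arize computes drift metrics automatically, but as a rough illustration of what a feature drift check measures, here is a minimal PSI sketch in Python. The equal-width binning, sample data, and the ~0.2 rule of thumb are illustrative assumptions, not the platform's implementation.

```python
import numpy as np

def psi(baseline, production, n_bins=10):
    """Population Stability Index between a baseline and a production sample."""
    # Equal-width bins spanning both samples (a simple choice for illustration)
    lo = min(baseline.min(), production.min())
    hi = max(baseline.max(), production.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Clip to avoid log(0) when a bin is empty
    base_pct = np.clip(base_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))

# Hypothetical values of one feature in training (baseline) vs. production
rng = np.random.default_rng(42)
train_feature = rng.normal(0.0, 1.0, 10_000)
prod_feature = rng.normal(0.4, 1.2, 10_000)   # the distribution has shifted
print(f"PSI = {psi(train_feature, prod_feature):.3f}")  # values above ~0.2 are often treated as notable drift
```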
High-quality data is fundamental to building reliable, accurate machine learning models; the value of predictions can be significantly compromised by poor data quality.
Easily root cause model issues by monitoring key data quality metrics to identify cardinality shifts, data type mismatches, missing data, and more.
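As a rough sketch of the kinds of checks these data quality monitors automate, the following pandas snippet computes percent empty, cardinality, new values, and a data type mismatch against a training baseline. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical training (baseline) and production slices of two features
train = pd.DataFrame({"state": ["CA", "NY", "TX", "NY"], "age": [34, 51, 29, 45]})
prod = pd.DataFrame({"state": ["CA", "WA", None, "TX"], "age": [41, None, 38, "n/a"]})

for col in train.columns:
    pct_empty = prod[col].isna().mean() * 100                        # missing data
    cardinality = prod[col].nunique(dropna=True)                     # cardinality shift
    new_values = set(prod[col].dropna()) - set(train[col].dropna())  # values never seen in training
    type_mismatch = prod[col].dtype != train[col].dtype              # data type mismatch
    # New-value checks are most meaningful for categorical features like `state`
    print(f"{col}: {pct_empty:.0f}% empty, {cardinality} distinct, "
          f"new values: {new_values or 'none'}, dtype mismatch: {type_mismatch}")
```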
Performance metrics quantify a model's effectiveness in its predictions. Monitor performance metrics when deploying a model in production to flag unexpected changes or drops in performance.
🏃 Common Questions:
📜 How should I monitor if I'm concerned about data pipeline issues?
Your data pipeline may occasionally fail or inadvertently drop features. Use count and percent empty monitors to catch these issues.
🛍️ How should I monitor 3rd party/purchased data?
3rd party data is a common culprit of many model performance problems. Use data quality monitors to keep track of quantiles and sum or average values of your 3rd party data.
🚅 How should I monitor my features if I frequently retrain my model?
Every model retrain has the possibility of introducing inadvertent changes to features. Use data quality monitors to compare new values and missing values between your production and your training or validation datasets.
🚂 How should I monitor my pipeline of ground truth data?
Monitor your actuals with percent empty and count to capture any failures or errors in your ground truth pipeline.
🔔 My data quality alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
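To illustrate the threshold advice above, here is a minimal sketch of the standard-deviation idea behind it: a threshold set k standard deviations from a metric's recent history, where raising k quiets alerts and lowering it makes them more sensitive. The metric history and the value of k are made up, and Arize's managed thresholds may be computed differently.

```python
import numpy as np

# Hypothetical daily history of a data quality metric (percent empty for one feature)
history = np.array([1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.1, 1.0])
mean, std = history.mean(), history.std()

k = 2.0   # widen (e.g. 3.0) to quiet a noisy alert, narrow (e.g. 1.0) to make it more sensitive
upper_threshold = mean + k * std

todays_value = 2.4
if todays_value > upper_threshold:
    print(f"ALERT: {todays_value:.1f}% empty exceeds threshold {upper_threshold:.1f}%")
```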
✌️ Two Types of Drift
Use drift monitors to compare production against different baseline datasets.
Feature drift captures changes to your data pipeline that can lead to anomalous model behavior.
Prediction drift captures changes in the outputs of your model that may require stakeholders to be notified. This is also an excellent way to monitor performance without ground truth values.
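As a small illustration of prediction drift on a classification model, the sketch below compares the mix of predicted labels in production against a baseline using JS Distance (one of the drift metrics listed later). The class names and counts are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical predicted-label counts from the baseline period and from production
baseline_counts = np.array([800, 150, 50])   # e.g. classes: approve / review / deny
prod_counts = np.array([620, 260, 120])

# Normalize counts to probability distributions before comparing
p = baseline_counts / baseline_counts.sum()
q = prod_counts / prod_counts.sum()

# Jensen-Shannon distance in [0, 1]; larger values mean the prediction mix has shifted
print(f"JS distance = {jensenshannon(p, q, base=2):.3f}")
```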
🚀 Performance
Monitor performance metrics based on ground truth data (actuals) for your model type, such as NDCG (ranking), AUC (propensity to click), MAPE (predicting ETAs), and more!
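For reference, here is a short sketch computing a few of these metrics with scikit-learn, assuming you have actuals on hand; the labels, scores, and ETAs below are made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, ndcg_score, mean_absolute_percentage_error

# Propensity-to-click: AUC on binary actuals vs. model scores
clicked = np.array([0, 1, 1, 0, 1])
click_scores = np.array([0.2, 0.7, 0.9, 0.4, 0.6])
print("AUC:", roc_auc_score(clicked, click_scores))

# Ranking: NDCG on graded relevance for one query
true_relevance = np.asarray([[3, 2, 3, 0, 1]])
ranking_scores = np.asarray([[2.1, 0.3, 1.7, 0.2, 0.8]])
print("NDCG:", ndcg_score(true_relevance, ranking_scores))

# Predicting ETAs: MAPE on actual vs. predicted minutes
actual_eta = np.array([12.0, 25.0, 8.0])
predicted_eta = np.array([10.0, 30.0, 9.0])
print("MAPE:", mean_absolute_percentage_error(actual_eta, predicted_eta))
```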
📌 Important Features
Monitor key features important to your model with data quality monitors. This can be a powerful tool for root cause analysis workflows.
🔍 Leading Indicators
If your model receives delayed ground truth, monitor your prediction drift score and feature drift as a proxy for model performance.
Performance: AUC, LogLoss, Mean Error, MAE, MAPE, SMAPE, WAPE, RMSE, MSE, RSquared, Accuracy, Precision, Recall, F1, Sensitivity, Specificity, False Negative Rate, False Positive Rate
Drift: PSI, KL Divergence, JS Distance, KS Statistic
Data Quality: Percent Empty, Cardinality, New Values, Missing Values, Quantiles (P99.9, P99, P95, P50)
🌊 How do I monitor performance without ground truth data?
Get a sense of model performance without ground truth data by monitoring feature drift and prediction drift.
🔔 My performance alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
🪟 How do I monitor with delayed ground truth data?
Delay a performance evaluation via an evaluation delay. Change this if you have delayed actuals, so you evaluate your model on the most up-to-date data.
🏗️ What if my performance metric is specific to my team?
Create any performance metric to suit your monitoring needs via custom metrics. Monitor, troubleshoot, and use custom metrics in dashboards.
📈 My monitors are overly sensitive or not sensitive enough
Increase your evaluation window to smooth out spikes or seasonality. Decrease your evaluation window to react faster to potential incidents.
🤖 Can I create performance monitors programmatically?
Use the API to programmatically create performance monitors.
🏎️ How do I track sudden drift over time?
Use a moving window of recent production data as your baseline to catch sudden drift (see the sketch at the end of this section).
🐌 How do I track gradual drift over time?
Use a fixed baseline, such as your training data, to catch gradual drift.
🔔 My drift alerts are too noisy/not noisy enough
Edit your threshold value above or below the default standard deviation value to temper your alerts.
🔑 Can I monitor a few key features instead of all of them?
Create monitors based on individual features by following the 'Custom Monitors' tab in the guide below.
🔍 What are the leading indicators of performance degradation?
Measure feature and prediction drift to indicate performance degradation. Arize supports multiple drift metrics (PSI, KL Divergence, JS Distance, KS Statistic) based on your use case.
🤖 Can I create drift monitors programmatically?
Use the API to programmatically create drift monitors.
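To make the baseline guidance above concrete (see the sudden vs. gradual drift questions), this sketch uses the KS Statistic on synthetic data to show why a moving window of recent production data surfaces sudden jumps while a fixed training baseline surfaces gradual, accumulated drift. All names and numbers are made up.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values
training_baseline = rng.normal(0.0, 1.0, 5_000)   # fixed baseline from training data
recent_production = rng.normal(0.5, 1.0, 5_000)   # moving window: recent production, slow creep so far

sudden_shift = rng.normal(2.0, 1.0, 1_000)        # today's data after a sudden pipeline break
gradual_shift = rng.normal(0.55, 1.0, 1_000)      # today's data as the slow creep continues

# A moving-window baseline reacts strongly to sudden jumps...
print("sudden vs moving window:  ", round(ks_2samp(recent_production, sudden_shift).statistic, 3))
# ...but barely notices slow creep, because the window drifts along with the data
print("gradual vs moving window: ", round(ks_2samp(recent_production, gradual_shift).statistic, 3))
# A fixed training baseline is what exposes the accumulated, gradual drift
print("gradual vs training:      ", round(ks_2samp(training_baseline, gradual_shift).statistic, 3))
```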