Arize AI

5. Performance Tracing

Troubleshoot model issues and understand model degradation
Your monitors are set up for real-time model performance monitoring, and one of them has triggered, alerting you that performance has dropped. Troubleshoot the degradation using Performance Tracing. Use this interactive tool to understand which features and slices impact your model's performance the most and begin resolution.
Once a performance monitor is triggered, navigate to the Performance Tracing tab to troubleshoot your model issues and gain an understanding of what caused your model's degradation. You can navigate to performance tracing in two ways: from any page within a model or directly in a monitor.
From any page within a model, click the 'Performance Tracing' tab at the top of the navigation bar.
If you're in a specific monitor, click on the 'Troubleshoot Model Performance' button on the top right of the 'Metric History' visualization.

Performance Over Time

The Performance Tracing tab immediately visualizes your performance metric over time, layered on top of your model's prediction volume. This gives you a broad understanding of your model's overall performance so you can identify areas of improvement, compare different datasets, and examine problematic slices.
Pro Tip: Click and drag on the graph itself to zoom into a specific time period.
The Performance Over Time graph is highly configurable. Use this graph to visualize different:
  • Environments: pick from production, validation, or training environments
  • Versions: pick from any model version
  • Time periods: zoom in or out on any time period for your dataset
  • Performance metrics: choose from an array of performance metrics such as accuracy, AUC, MAE, MAPE, RMSE, sMAPE, WAPE and more.
  • Filters: layer additional filters across features, prediction values, actuals, and tags to see your model's performance on a more granular level.
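To make the idea of a performance metric layered on prediction volume concrete, here is a minimal sketch in plain Python. It is not the Arize implementation or API; the `records` data and the `accuracy_and_volume_by_day` helper are hypothetical, and accuracy is just one of the metrics listed above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical prediction log: (day, predicted label, actual label).
records = [
    (date(2023, 1, 1), "fraud", "fraud"),
    (date(2023, 1, 1), "not_fraud", "not_fraud"),
    (date(2023, 1, 2), "fraud", "not_fraud"),
    (date(2023, 1, 2), "not_fraud", "not_fraud"),
    (date(2023, 1, 2), "not_fraud", "fraud"),
    (date(2023, 1, 2), "fraud", "fraud"),
]


def accuracy_and_volume_by_day(rows):
    """Return {day: (accuracy, volume)} for rows of (day, predicted, actual)."""
    correct = defaultdict(int)
    volume = defaultdict(int)
    for day, predicted, actual in rows:
        volume[day] += 1
        correct[day] += int(predicted == actual)
    return {day: (correct[day] / volume[day], volume[day]) for day in sorted(volume)}


daily = accuracy_and_volume_by_day(records)
for day, (accuracy, volume) in daily.items():
    print(day.isoformat(), f"accuracy={accuracy:.2f}", f"volume={volume}")
```

Reading the metric next to the volume matters: a drop in accuracy on a day with very few predictions is weaker evidence of degradation than the same drop on a high-volume day.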

Comparative Analysis

Add a model comparison to easily identify performance drops and areas to improve amongst different datasets. All visualizations on the Performance Tracing page are automatically updated to reflect your comparison.
Add another dataset to see which model performs better if:
  • You have a sneaking suspicion that your model performs better in a different environment
  • You think certain filters change your model's performance (e.g. omitting a feature)
  • You're just curious how your production model stacks up against different environments and versions

Performance Breakdown

After you've identified key areas to improve, break performance issues down using Performance Insights and our Performance Heat Map.

Performance Breakdown: Performance Insights

The Performance Insights panel surfaces the worst-performing slices impacting your model so you can easily perform a counterfactual analysis.
Use Performance Insights to add or exclude features or slices as a filter and see how your model's performance changes. To do this, scroll down to the 'Performance Insights' card and click on a feature. Once you click into a feature, a histogram of your feature slices will populate on the left side with options to 'Add cohort as a filter', 'Exclude cohort as a filter', and 'View explainability'.
Pro Tip: We use 'Cohort' and 'Slice' interchangeably.
A performance slice is a subset of model values formed from any model dimension such as specific periods of time, set of features, etc. Learn more about slices here.
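The counterfactual analysis described in this section can be sketched outside the product as follows. This is an illustration under assumptions, not how Arize computes its insights: the `rows` data, the single categorical feature, and the helper names are hypothetical, and accuracy stands in for whichever metric you have selected.

```python
from collections import defaultdict

# Hypothetical rows: (feature value, predicted label, actual label).
rows = [
    ("CA", 1, 1), ("CA", 0, 0), ("CA", 1, 1),
    ("NY", 1, 0), ("NY", 0, 1), ("NY", 1, 1),
    ("TX", 0, 0), ("TX", 1, 1),
]


def accuracy(data):
    """Fraction of rows whose prediction matches the actual."""
    return sum(predicted == actual for _, predicted, actual in data) / len(data)


def accuracy_by_slice(data):
    """Group rows by feature value (one slice per value) and score each group."""
    groups = defaultdict(list)
    for row in data:
        groups[row[0]].append(row)
    return {value: accuracy(group) for value, group in groups.items()}


by_slice = accuracy_by_slice(rows)
worst_slice = min(by_slice, key=by_slice.get)

# Counterfactual: recompute the metric with the worst slice excluded,
# similar in spirit to excluding a cohort as a filter.
overall = accuracy(rows)
overall_without_worst = accuracy([row for row in rows if row[0] != worst_slice])

print(f"worst slice: {worst_slice}")
print(f"overall accuracy: {overall:.2f}")
print(f"accuracy without {worst_slice}: {overall_without_worst:.2f}")
```

The gap between the two overall numbers indicates how much of the degradation is attributable to that one slice.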

Performance Breakdown: Performance Heat Map

The Performance Heat Map shows each feature's performance by slice, visually highlighting the worst-performing slices within each feature. Click the caret to the left of a feature's name to reveal its histogram.
Compare feature performance amongst different environments, versions, and filters to uncover areas of improvement.
Pro Tip: Compare the volume of specific segments to indicate areas where you may need to train or retrain your model based on missing training data.
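As a rough illustration of the volume comparison in the tip above, the sketch below compares each slice's share of training data with its share of production data and flags slices that look underrepresented in training. The data, the `underrepresented_slices` helper, and the `min_ratio` threshold of 0.5 are all assumptions made for this example, not product behavior.

```python
from collections import Counter

# Hypothetical values of one feature in each environment.
training_values = ["CA"] * 6 + ["NY"] * 3 + ["TX"] * 1
production_values = ["CA"] * 4 + ["NY"] * 2 + ["TX"] * 4


def slice_shares(values):
    """Return each slice's fraction of the total volume."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def underrepresented_slices(training, production, min_ratio=0.5):
    """Slices whose training share is below min_ratio times their production share."""
    train_share = slice_shares(training)
    prod_share = slice_shares(production)
    return sorted(
        value
        for value, share in prod_share.items()
        if train_share.get(value, 0.0) < min_ratio * share
    )


flagged = underrepresented_slices(training_values, production_values)
print("slices that may need more training data:", flagged)
```

A slice that is both flagged here and among the worst performers in production is a natural candidate for collecting more training data before retraining.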

Output Segmentation: Calibration Chart

Scrolling below the 'Performance Breakdown' card, you'll see a Calibration Chart. This chart plots Average Actuals against Estimated Probability. The better calibrated the model, the closer the plotted points will be to the diagonal line.
  • If the model points are below the line: your model has over-forecast. For example, predicting that a credit card charge has a high probability of fraud when it is not fraudulent.
  • If the model points are above the line: your model has under-forecast. For example, predicting that a credit card charge has a low probability of fraud when it is actually fraudulent.
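The points on a calibration chart can be approximated with a short sketch: group predictions into equal-width probability bins, then compare the mean estimated probability in each bin with the mean actual. The binning scheme, the sample data, and the helper names below are assumptions for illustration; the product may bin differently.

```python
def calibration_points(probabilities, actuals, n_bins=5):
    """Return (mean estimated probability, mean actual, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, actuals):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            points.append((mean_p, mean_y, len(members)))
    return points


def describe(mean_p, mean_y):
    """Classify a point relative to the diagonal (x = estimate, y = average actual)."""
    if mean_y < mean_p:
        return "over-forecast"   # point lies below the diagonal
    if mean_y > mean_p:
        return "under-forecast"  # point lies above the diagonal
    return "calibrated"


# Hypothetical fraud scores and outcomes (1 = fraudulent, 0 = not fraudulent).
probabilities = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
actuals = [0, 1, 1, 0, 0, 1]

points = calibration_points(probabilities, actuals)
labels = [describe(mean_p, mean_y) for mean_p, mean_y, _ in points]
for (mean_p, mean_y, count), label in zip(points, labels):
    print(f"estimated={mean_p:.2f} actual={mean_y:.2f} n={count} -> {label}")
```

In this sample, the low-probability bin sits above the diagonal (half of those charges were actually fraudulent) and the high-probability bin sits below it (only half were), matching the two cases described above.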

Additional Resources

Questions? Email us at [email protected] or Slack us in the #arize-support channel