Arize AI
5. Performance Tracing
Previously, we walked you through setting up real-time model performance monitoring.
A monitor is triggered and you are alerted that your model's performance has dropped! How do you troubleshoot the cause of the degradation? Enter Performance Tracing.

Performance Tracing

Once a performance monitor is triggered, navigate to the Performance Tracing tab by clicking the Troubleshoot Model Performance button directly from the performance monitor, or by clicking the Performance Tracing tab in the top navigation bar. From there, start troubleshooting your model issues and gain an understanding of what caused the degradation!
The Performance Tracing tab immediately visualizes your selected performance metric over time, layered on top of a histogram of your model's prediction volume. You can zoom in to a particular time range by highlighting it.
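Under the hood, this chart amounts to bucketing predictions by time and computing the metric and volume per bucket. A minimal sketch with pandas, using a hypothetical prediction log and accuracy as the selected metric (the column names here are illustrative, not Arize's schema):

```python
import pandas as pd

# Hypothetical prediction log: timestamp, predicted label, actual label.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02", "2023-01-03",
    ]),
    "prediction": [1, 0, 1, 1, 0],
    "actual":     [1, 0, 0, 1, 1],
})

# Accuracy and prediction volume per day -- the two series the
# Performance over Time chart overlays.
daily = (
    df.assign(correct=df["prediction"] == df["actual"])
      .groupby(df["timestamp"].dt.date)
      .agg(accuracy=("correct", "mean"), volume=("correct", "size"))
)
print(daily)
```

Zooming in to a time range is then just filtering `df` on `timestamp` before the groupby.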
You can also bring in another dataset to compare performance and see which model performs better. This can help answer questions such as "Were we seeing this problem in training?" or "Does my new / previous model version perform better?".
On top of that, you can layer additional filters across features, prediction values, and actuals to see your model's performance at a more granular level.
Scrolling below the Performance over Time chart, you will see a Calibration Chart. This chart plots Average Actuals against Estimated Probability; the better calibrated the model, the closer the plotted points fall to the diagonal line. Points below the line mean the model has over-forecast, for example, predicting a credit card charge was high-probability fraud when it wasn't. Points above the line mean the model has under-forecast, for example, predicting a credit card charge was unlikely to be fraudulent when it was in fact fraud.
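The same curve can be reproduced offline with scikit-learn's `calibration_curve`, which bins the estimated probabilities and computes the average actual per bin. A sketch on synthetic, deliberately well-calibrated data (the variable names are illustrative):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)            # model's estimated probability of fraud
y_true = rng.uniform(size=1000) < y_prob   # synthetic actuals, calibrated by design

# Average actual (fraction of positives) per bin vs. average predicted probability.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, a in zip(mean_pred, frac_pos):
    status = "over-forecast" if a < p else "under-forecast"
    print(f"predicted~{p:.2f}  actual~{a:.2f}  ({status})")
```

Plotting `frac_pos` against `mean_pred` gives the calibration chart; for this synthetic data the points hug the diagonal.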
Scrolling further down, you will see a Performance Breakdown by Feature.
By clicking the caret, you can dig even deeper to see which segment within each feature is underperforming. In the example below, "very poor" within the "fico range" feature performs the worst and accounts for a relatively large share of volume, indicating it is a segment worth digging into.
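Conceptually, expanding a feature row is a group-by on that feature's values, computing the metric and volume per segment. A pandas sketch, echoing the "fico range" example with made-up data:

```python
import pandas as pd

# Hypothetical slice of prediction data with one feature column.
df = pd.DataFrame({
    "fico_range": ["very poor", "very poor", "fair", "good", "good", "good"],
    "prediction": [1, 1, 0, 1, 0, 1],
    "actual":     [0, 1, 0, 1, 0, 1],
})

# Accuracy and volume per feature segment, worst-performing first --
# mirrors expanding a feature in the Performance Breakdown table.
by_segment = (
    df.assign(correct=df["prediction"] == df["actual"])
      .groupby("fico_range")
      .agg(accuracy=("correct", "mean"), volume=("correct", "size"))
      .sort_values("accuracy")
)
print(by_segment)
```

Sorting by the metric while keeping volume visible is what makes a low-performing, high-volume segment like "very poor" stand out.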
In addition to comparing the performance of features and time periods across two datasets, you can compare the volume of specific segments. In this example, Online Food Delivery has much higher volume in Dataset A (production) than in Dataset B (training), indicating the model may need to be retrained on what is seen in production.
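That comparison is essentially a normalized value count per segment in each dataset, placed side by side. A sketch assuming two hypothetical DataFrames for production and training (the feature and category names are illustrative):

```python
import pandas as pd

# Hypothetical production (Dataset A) and training (Dataset B) data.
prod = pd.DataFrame({"merchant_type": ["online_food_delivery"] * 60 + ["retail"] * 40})
train = pd.DataFrame({"merchant_type": ["online_food_delivery"] * 20 + ["retail"] * 80})

# Share of volume per segment in each dataset, side by side.
volume = pd.concat(
    {
        "production": prod["merchant_type"].value_counts(normalize=True),
        "training": train["merchant_type"].value_counts(normalize=True),
    },
    axis=1,
)
volume["gap"] = volume["production"] - volume["training"]
print(volume.sort_values("gap", ascending=False))
```

A large positive `gap` flags a segment that is over-represented in production relative to training, a natural candidate for retraining data.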
Overall, the Performance Tracing tab is an interactive tool that surfaces underperforming areas, with the ability to easily add filters, compare datasets, and visualize slices to uncover and troubleshoot performance issues.

Additional Resources

Questions? Email us at [email protected] or Slack us in the #arize-support channel