
Performance Tracing

How to Troubleshoot Performance Monitors

Once a monitor triggers to alert you that performance has dropped, troubleshoot the issue in the Performance Tracing tab within your model. Performance tracing helps you identify the features and slices that affect your model's performance the most so you can begin resolution.

Performance Over Time

The Performance Tracing tab immediately visualizes your performance metric over time layered on top of your model's prediction volume. This gives you a broad understanding of your model's overall performance to identify areas of improvement, compare different datasets, and examine problematic slices.
The 'Performance Over Time' graph is highly configurable. Use this graph to visualize different dimensions such as:
  • Environments: pick from production, validation, or training environments
  • Versions: pick from any model version
  • Time periods: zoom in or out on any time period for your dataset
  • Performance metrics: choose from an array of performance metrics such as accuracy, AUC, MAE, MAPE, RMSE, sMAPE, WAPE, and more.
  • Filters: layer additional filters across features, prediction values, actuals, and tags to see your model's performance on a more granular level.
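
To make the regression metrics named above concrete, here is a minimal sketch that computes MAE, RMSE, MAPE, sMAPE, and WAPE with common textbook definitions. The function names and sample values are illustrative assumptions, and the platform's exact formulas (sMAPE in particular has several variants) may differ.

```python
import math

def mae(actuals, preds):
    # Mean absolute error.
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)

def rmse(actuals, preds):
    # Root mean squared error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))

def mape(actuals, preds):
    # Mean absolute percentage error (as a fraction); undefined if any actual is 0.
    return sum(abs(a - p) / abs(a) for a, p in zip(actuals, preds)) / len(actuals)

def smape(actuals, preds):
    # Symmetric MAPE, one common variant: 2|a - p| / (|a| + |p|).
    # Pairs where both values are 0 contribute 0 (an assumption of this sketch).
    total = 0.0
    for a, p in zip(actuals, preds):
        denom = abs(a) + abs(p)
        if denom:
            total += 2 * abs(a - p) / denom
    return total / len(actuals)

def wape(actuals, preds):
    # Weighted absolute percentage error: total absolute error over total actuals.
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / sum(abs(a) for a in actuals)

# Hypothetical sample data.
actuals = [100.0, 200.0, 400.0]
preds = [110.0, 180.0, 400.0]

metrics = {
    "MAE": mae(actuals, preds),
    "RMSE": rmse(actuals, preds),
    "MAPE": mape(actuals, preds),
    "sMAPE": smape(actuals, preds),
    "WAPE": wape(actuals, preds),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note how WAPE is lower than MAPE here: the largest actual (400) has zero error, and WAPE weights errors by the size of the actuals.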

Add A Comparison Dataset

Send data from different environments to compare model performance between training, validation, or a different time period within your production data. Comparing your production data against another dataset helps you identify gaps in data quality or places where drift occurs.
Navigate to the toolbar, click 'Add a Comparison' and pick from a different environment, version, or time period.

Selecting Your View

Choose among the troubleshooting views of your data: Slice, Table, Embeddings Projector, and Confusion Matrix / Calibration Chart.

Slice View

Performance Breakdown

To identify key areas to improve with your comparison dataset, break performance issues down using Performance Insights and our Performance Heat Map.

Performance Insights

The Performance Insights panel surfaces the worst-performing slices impacting your model so you can perform a counterfactual analysis. Use Performance Insights to exclude features or slices as a filter and see how your model's performance changes.
To do this, scroll down to the 'Performance Insights' card and click on a feature. Once you click into a feature, a histogram of your feature slices will populate on the left side with options to 'Add cohort as a filter', 'Exclude cohort as a filter', and 'View explainability'.
A performance slice is a subset of model values formed from any model dimension, such as specific periods of time, set of features, etc. Learn more about slices here.
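
The idea of slices and counterfactual exclusion can be sketched offline in a few lines. The example below is an illustration only, not the platform's implementation: the records, the "state" feature, and the helper names are hypothetical. It computes accuracy per slice, picks the worst slice, and recomputes overall accuracy with that cohort excluded.

```python
from collections import defaultdict

# Hypothetical records: one feature ("state"), a prediction, and an actual.
records = [
    {"state": "CA", "prediction": 1, "actual": 1},
    {"state": "CA", "prediction": 0, "actual": 0},
    {"state": "CA", "prediction": 1, "actual": 1},
    {"state": "NY", "prediction": 1, "actual": 0},
    {"state": "NY", "prediction": 0, "actual": 1},
    {"state": "NY", "prediction": 1, "actual": 1},
    {"state": "TX", "prediction": 0, "actual": 0},
    {"state": "TX", "prediction": 1, "actual": 1},
]

def accuracy(rows):
    # Fraction of rows where the prediction matches the actual.
    return sum(r["prediction"] == r["actual"] for r in rows) / len(rows)

def accuracy_by_slice(rows, feature):
    # Group rows by the value of `feature` and score each group separately.
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r)
    return {value: accuracy(group) for value, group in groups.items()}

by_slice = accuracy_by_slice(records, "state")
worst_slice = min(by_slice, key=by_slice.get)

# Counterfactual: how would the model score if the worst cohort were excluded?
overall = accuracy(records)
without_worst = accuracy([r for r in records if r["state"] != worst_slice])

print(f"worst slice: {worst_slice} ({by_slice[worst_slice]:.2f})")
print(f"overall accuracy: {overall:.2f}, excluding {worst_slice}: {without_worst:.2f}")
```

A large gap between the two accuracy values suggests that the excluded slice accounts for much of the performance drop.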

Performance Heat Map

The performance heat map visualizes each feature's performance by slice, highlighting the worst-performing slices within each feature. Click the caret to the left of a feature's name to reveal its histogram.
Compare feature performance amongst different environments, versions, and filters to uncover areas of improvement. Look out for different colors and distributions between the two histograms to identify areas of missing or poor-performing data.
Once you've identified an area of interest, click on the 'View Feature Details' link to uncover a detailed view of your feature distribution over time.

Table View

The Table View enables users to see and interact with individual records in a simple table.
Data Exploration and Validation:
Get a better understanding of what your data looks like by exploring a record-level view. This is similar to calling df.head() within a notebook environment. Validate the data that was sent into the platform to make sure it arrived in the correct format.
Record-level view of the predictions Arize has ingested for a model

Column Selector:

Explore any column in your data, including features and tags, using the column selector. Customize your table view by adding/removing columns, and re-ordering columns.
Select which columns from the data to display in the table

Slide Over:

Click on a table row to get a comprehensive view at the prediction level of all of the columns of your data for deeper analysis.
View all of the data for each prediction

Embeddings Projector View

The Embeddings Projector view automatically surfaces the worst-performing embedding clusters for quick troubleshooting. This additional view is especially helpful when troubleshooting LLMs with prompts and responses, where switching between the Table, Embeddings Projector, and Slice views can help teams get a full picture of how their LLM is performing.

Confusion Matrix and Calibration Chart View

Confusion Matrix

A confusion matrix summarizes all prediction results of a classification problem. It shows the count of correct and incorrect predictions, broken down by outcome type (True Positive, True Negative, False Positive, False Negative). Because it covers every possible outcome, the confusion matrix shows the ways your classification model gets confused when making predictions. It identifies both the errors and the types of errors the model makes, which helps you improve the model's accuracy.
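
As a rough illustration of how the four cells are counted for a binary classifier, the sketch below tallies them from hypothetical labels and derives accuracy, precision, and recall. The function name and sample data are assumptions made for this example.

```python
def confusion_counts(actuals, predictions, positive=1):
    # Tally each (actual, prediction) pair into one of the four cells.
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actuals, predictions):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical labels: 1 = positive class, 0 = negative class.
actuals     = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actuals, predictions)
total = sum(counts.values())

accuracy = (counts["TP"] + counts["TN"]) / total
precision = counts["TP"] / (counts["TP"] + counts["FP"])  # of predicted positives, how many were right
recall = counts["TP"] / (counts["TP"] + counts["FN"])     # of actual positives, how many were found

print(counts)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```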

Calibration Chart

This chart plots Average Actuals against Estimated Probability. The better calibrated the model, the closer the plotted points will be to the diagonal line.
  • If the model points are below the line: your model has over-forecast in its prediction. For example, predicting that a credit card charge has a high probability of fraud when it is not fraudulent.
  • If the model points are above the line: your model has under-forecast in its prediction. For example, predicting that a credit card charge has a low probability of fraud when it is actually fraudulent.
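
The reading of the chart described above can be sketched as follows. This is a minimal illustration under assumptions: it uses equal-width probability bins, and the function names and sample data are hypothetical, so the platform's binning may differ. Each point pairs the mean estimated probability in a bin with the average actual in that bin.

```python
def calibration_points(probs, actuals, n_bins=2):
    # Assign each prediction to an equal-width probability bin.
    bins = [[] for _ in range(n_bins)]
    for p, a in zip(probs, actuals):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, a))
    # One point per non-empty bin: (mean estimated probability, average actual).
    points = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            mean_a = sum(a for _, a in b) / len(b)
            points.append((mean_p, mean_a))
    return points

def describe(point):
    # Below the diagonal: actuals are lower than predicted (over-forecast).
    # Above the diagonal: actuals are higher than predicted (under-forecast).
    mean_p, mean_a = point
    if mean_a < mean_p:
        return "over-forecast"
    if mean_a > mean_p:
        return "under-forecast"
    return "calibrated"

# Hypothetical fraud scores and outcomes (1 = fraudulent).
probs   = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
actuals = [0,   1,   1,   1,   0,   0]

points = calibration_points(probs, actuals)
labels = [describe(pt) for pt in points]
for (mean_p, mean_a), label in zip(points, labels):
    print(f"estimated={mean_p:.2f} actual={mean_a:.2f} -> {label}")
```

In this sample, the low-probability bin sits above the diagonal (fraud was more common than predicted) and the high-probability bin sits below it (fraud was less common than predicted).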