Model Performance


Model performance templates help you track the health of your models. From the 'Model Performance' tab, you can create performance dashboards for regression, scored, and ranking models, or for side-by-side model comparisons.

Performance Dashboard

The Performance Dashboard provides aggregate statistics for accuracy, recall, and other customizable evaluation and data metrics. Dashboards support performance troubleshooting with customizable widgets and chainable filters, which let you drill down to specific cohorts and see their model statistics. You can slice and filter Dashboards by any model, model version, feature, or actual value.
Once you have a view of overall model performance across various slices, the next step is to identify which features could be causing any performance degradation.
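To make the idea of chainable filters and cohort statistics concrete, here is a minimal sketch in plain Python. It does not use the Arize SDK or reflect how Arize computes its metrics; the record fields (`model_version`, `state`, `prediction`, `actual`) and the helper names `apply_filters` and `binary_metrics` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def apply_filters(records: List[Record],
                  filters: List[Callable[[Record], bool]]) -> List[Record]:
    """Apply each filter in turn, narrowing the cohort step by step."""
    cohort = list(records)
    for keep in filters:
        cohort = [r for r in cohort if keep(r)]
    return cohort


def binary_metrics(records: List[Record], positive: object = 1) -> Dict[str, Optional[float]]:
    """Compute confusion-matrix metrics for one positive class."""
    tp = fp = tn = fn = 0
    for r in records:
        predicted_pos = r["prediction"] == positive
        actual_pos = r["actual"] == positive
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos and not actual_pos:
            fp += 1
        elif not predicted_pos and actual_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> Optional[float]:
        # Return None when the metric is undefined for this cohort.
        return num / den if den else None

    total = tp + fp + tn + fn
    return {
        "count": total,
        "accuracy": ratio(tp + tn, total),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "type_i_error_rate": ratio(fp, fp + tn),   # false positive rate
        "type_ii_error_rate": ratio(fn, fn + tp),  # false negative rate
    }


# Hypothetical sample data: one row per prediction with its actual value.
records = [
    {"model_version": "v1", "state": "CA", "prediction": 1, "actual": 1},
    {"model_version": "v1", "state": "CA", "prediction": 1, "actual": 0},
    {"model_version": "v1", "state": "NY", "prediction": 0, "actual": 0},
    {"model_version": "v2", "state": "CA", "prediction": 0, "actual": 1},
    {"model_version": "v2", "state": "CA", "prediction": 1, "actual": 1},
    {"model_version": "v2", "state": "NY", "prediction": 0, "actual": 0},
]

overall_metrics = binary_metrics(records)
cohort = apply_filters(records, [
    lambda r: r["model_version"] == "v2",  # first filter: model version
    lambda r: r["state"] == "CA",          # second filter: feature value
])
cohort_metrics = binary_metrics(cohort)
```

Comparing `overall_metrics` with `cohort_metrics` mirrors the dashboard workflow: start from the aggregate view, then chain filters until a cohort with noticeably different statistics stands out. Metrics that are undefined for a cohort (for example, specificity when the cohort has no negative actuals) are returned as `None` rather than zero.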

Feature Performance Heatmap

Arize provides Feature Analysis Templates that automatically surface model performance issues across all features and feature/value combinations. Visual indicators facilitate drill-down analysis of the most problematic slices affecting your overall model performance.
Create a Feature Performance Heatmap in just a few clicks using the Arize template library.
The Feature Performance Heatmap shows model performance across all features at various feature/value combinations, also known as slices. Like Dashboards, Feature Performance Heatmaps support conditional filters. They also rank the worst-performing slices to automatically surface potential root causes of performance degradation.
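The slice ranking described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Arize's implementation: it uses accuracy as the performance metric, breaks ties by slice size, and relies on hypothetical field names (`state`, `device`, `prediction`, `actual`) and a hypothetical helper `rank_slices`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_slices(rows: List[Dict[str, object]],
                features: List[str]) -> List[Tuple[str, object, float, int]]:
    """Return (feature, value, accuracy, count) tuples, worst accuracy first."""
    # (feature, value) -> [number correct, number of rows]
    stats = defaultdict(lambda: [0, 0])
    for row in rows:
        correct = row["prediction"] == row["actual"]
        for feature in features:
            key = (feature, row[feature])
            stats[key][0] += int(correct)
            stats[key][1] += 1

    slices = [
        (feature, value, n_correct / n_rows, n_rows)
        for (feature, value), (n_correct, n_rows) in stats.items()
    ]
    # Lowest accuracy first; among ties, larger slices come first.
    slices.sort(key=lambda s: (s[2], -s[3]))
    return slices


# Hypothetical sample data.
rows = [
    {"state": "CA", "device": "mobile", "prediction": 1, "actual": 0},
    {"state": "CA", "device": "desktop", "prediction": 1, "actual": 1},
    {"state": "NY", "device": "mobile", "prediction": 0, "actual": 1},
    {"state": "NY", "device": "desktop", "prediction": 0, "actual": 0},
    {"state": "CA", "device": "mobile", "prediction": 0, "actual": 1},
    {"state": "NY", "device": "desktop", "prediction": 1, "actual": 1},
]

ranked = rank_slices(rows, ["state", "device"])
for feature, value, accuracy, count in ranked:
    print(f"{feature}={value}: accuracy={accuracy:.2f} over {count} rows")
```

In this toy data the `device=mobile` slice ranks first because every prediction in it is wrong, which is the kind of signal a heatmap is meant to surface. In practice you would likely also require a minimum slice size so that tiny slices do not dominate the ranking.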

Model Performance Templates

The following templates are available for performance analysis.

Regression Model
This template allows you to track performance of predictions against actuals for regression models.
  • View aggregate accuracy metrics
  • Analyze accuracy of slices of predictions
  • Track fluctuations throughout a time period

Scored Model
This template allows you to track performance of scored models.
  • View aggregate statistics for accuracy, recall, specificity, Type I and II error rates, etc. for a single class
  • Analyze slices of performance
  • Performance feature analysis

Ranking Model
This template allows you to track rank-aware performance metrics for ranking models.
  • View model performance based on individual features
  • Analyze model performance based on rank groups
  • Track fluctuations throughout a time period

Model A vs B
This template is often used to compare two live models in production. It shows one model's performance against another specified model.
  • Metrics to compare two models
  • Canary vs Live model comparison
  • Model performance metrics over time

Production vs Training
This template is designed to compare production performance to training performance. It supports commonly used approaches to analysis and generally shows the distribution difference between production and training.
  • Performance comparison between production and training
  • Ability to slice on facets and compare slices/facets between production and training
  • Ability to select features to compare between production and training datasets

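
As a rough illustration of what a "distribution difference between production and training" can mean for a single feature, the sketch below computes the Population Stability Index (PSI) over categorical values. PSI is one commonly used drift measure and is chosen here as an assumption; this page does not state which measure the template uses. The function names, the smoothing constant `eps`, and the sample values are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def proportions(values: Sequence[str], categories: List[str],
                eps: float = 1e-6) -> Dict[str, float]:
    """Share of each category, floored at eps so the log below stays defined."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, eps) for c in categories}


def population_stability_index(training: Sequence[str],
                               production: Sequence[str]) -> float:
    """PSI = sum over categories of (q - p) * ln(q / p).

    p is the training share and q is the production share of a category.
    """
    categories = sorted(set(training) | set(production))
    p = proportions(training, categories)
    q = proportions(production, categories)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in categories)


# Hypothetical feature values: the mix shifts from 50/50 to 80/20.
training_values = ["a"] * 50 + ["b"] * 50
production_values = ["a"] * 80 + ["b"] * 20

drift = population_stability_index(training_values, production_values)
no_drift = population_stability_index(training_values, training_values)
print(f"PSI with shift: {drift:.4f}, PSI without shift: {no_drift:.4f}")
```

A PSI of zero means the two distributions match, and larger values indicate a larger shift. Running the same calculation per feature gives a simple way to decide which features to compare first between the production and training datasets.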
Questions? Email us at [email protected] or Slack us in the #arize-support channel