Churn Forecasting

Overview of how to use Arize for churn models

Check out our Churn Forecasting Colab for an interactive demo.


In this walkthrough, we will use the Arize platform to monitor a Churn Forecasting model's performance.

This use case shows how to manage the performance of churn forecasting models and how model monitoring and observability help with problems these models commonly face in production. You will learn how to actively monitor model performance, identify the root cause of issues, and gain insights for improving the model.

We will look at a few scenarios common to churn models when monitoring performance.

You will learn to:

  1. Get training, validation, and production data into the Arize platform

  2. Set up a baseline

  3. Set up performance monitors for Accuracy, False Negative Rate, and False Positive Rate

  4. Create customized dashboards

  5. Discover the root cause of issues

  6. Identify business impact

Set up a Baseline

Within Arize, you can set a baseline for the model: a reference dataset, such as a training set, a validation set, or an initial model launch period, against which your model's behavior in production is compared. This allows us to determine when and where our model's behavior has changed drastically.

A baseline helps identify potential model issues, changing user behavior, or even changes in the concepts the model has learned (concept drift).

Setting up a baseline will surface changes in the distributions of:

  1. Model predictions: churn / not_churn

  2. Model input data: feature values

  3. Ground truth/actual values: did this customer actually churn?

Choosing a baseline

This churn model has a training environment that we will set as the baseline.

If you want to compare against a previous period in production, such as the initial rollout of the model or a trailing window of production data, select production as the baseline in Arize.
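Choosing a trailing window of production as a baseline comes down to selecting the records that fall in a fixed period before the window being monitored. The sketch below illustrates that selection in plain Python; it is not the Arize API, and the record layout (a dict with a hypothetical `timestamp` field) and the 14-day window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def trailing_window(records, end, days):
    """Return the records whose timestamp falls in [end - days, end)."""
    start = end - timedelta(days=days)
    return [r for r in records if start <= r["timestamp"] < end]


# Hypothetical production log: one prediction per day in January 2023.
records = [
    {"timestamp": datetime(2023, 1, day), "prediction": "churn" if day % 3 == 0 else "not_churn"}
    for day in range(1, 31)
]

# Use the 14 days before the current monitoring window as the baseline.
production_start = datetime(2023, 1, 22)
baseline = trailing_window(records, end=production_start, days=14)
production = [r for r in records if r["timestamp"] >= production_start]
print(len(baseline), len(production))  # 14 9
```

A trailing window moves forward with the monitored period, so the baseline reflects recent behavior; a fixed launch-period baseline instead keeps comparing against the model's original behavior.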


Arize sets up monitors across all features, predictions, and actual values. Below are critical metrics to monitor for a churn model:

  • Accuracy: of all my predictions, what percent did the model predict correctly?

  • False Negative Rate: Missed Chance to Retain % (customers who eventually churned but were identified by the model as not_churn, which means we didn't even try to save them).

  • False Positive Rate: Wasted Retention Effort % (customers who did not churn but were classified as churn, which means the account team spent money on discounts or other campaigns to save a customer who wasn't at risk of churning).
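All three metrics come from the confusion matrix, treating churn as the positive class: FNR = FN / (FN + TP) and FPR = FP / (FP + TN). A minimal sketch, assuming actuals and predictions are equal-length lists of the strings `churn` and `not_churn`; the function name `churn_metrics` is hypothetical and not part of any Arize SDK.

```python
def churn_metrics(actuals, predictions, positive="churn"):
    """Accuracy, false negative rate and false positive rate for churn labels."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actuals, predictions):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1  # churned, but predicted not_churn: missed chance to retain
        elif predicted == positive:
            fp += 1  # stayed, but predicted churn: wasted retention effort
        else:
            tn += 1
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Share of actual churners the model missed.
        "false_negative_rate": fn / (fn + tp) if fn + tp else nan,
        # Share of actual non-churners the model flagged as churn.
        "false_positive_rate": fp / (fp + tn) if fp + tn else nan,
    }


actuals = ["churn", "churn", "not_churn", "not_churn", "churn", "not_churn"]
predictions = ["churn", "not_churn", "churn", "not_churn", "churn", "not_churn"]
print(churn_metrics(actuals, predictions))
```

Here TP = 2, FN = 1, FP = 1 and TN = 2, so accuracy is 4/6 while both error rates are 1/3.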

Drift Detection

Drift is a change in distribution over time. This is measured for model inputs, outputs, and actuals of a model. Drift can be used to indicate if a model has grown stale, if there are data quality issues, or if there are adversarial inputs in your model.

Detecting drift helps protect your models from performance degradation and helps you understand where to begin resolving an issue.

Prediction Drift Impact can surface when drift has impacted your model.

Drift (PSI) measures how much your distribution has drifted.

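Feature Importance helps explain why even a small Drift (PSI) can have a significant Drift Impact: drift in a feature the model relies on heavily moves predictions more than the same drift in an unimportant feature.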
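A common definition of PSI between a baseline distribution q and a production distribution p over bins i is PSI = sum_i (p_i - q_i) * ln(p_i / q_i); it is 0 for identical distributions and grows as they diverge. The sketch below computes it for categorical values such as churn / not_churn predictions. Numeric features would first need to be bucketed into bins, which is not shown, and the `eps` floor for empty bins is an assumption; Arize's own binning and smoothing may differ.

```python
import math
from collections import Counter


def psi(baseline, production, eps=1e-6):
    """Population Stability Index between two samples of categorical values."""
    base_counts, prod_counts = Counter(baseline), Counter(production)
    total = 0.0
    for category in set(base_counts) | set(prod_counts):
        # Proportions per bin; empty bins are floored at eps so ln() stays finite.
        q = max(base_counts[category] / len(baseline), eps)
        p = max(prod_counts[category] / len(production), eps)
        total += (p - q) * math.log(p / q)
    return total


# Prediction drift: the share of churn predictions doubles from 20% to 40%.
baseline = ["not_churn"] * 80 + ["churn"] * 20
production = ["not_churn"] * 60 + ["churn"] * 40
print(round(psi(baseline, production), 4))  # 0.1962
```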

Data Quality Checks

It’s important to immediately surface data quality issues to identify how your data quality maps to your model’s performance. Utilize data quality monitoring to analyze hard failures in your data quality pipeline, such as missing data or cardinality shifts.

  • Missing / Null values could be an indicator of issues from an upstream data source.

  • Cardinality (the number of distinct values a feature takes) is checked to ensure there are no sudden spikes or drops.
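Both checks can be sketched for a single feature, assuming its values arrive as a Python list in which `None` or NaN marks a missing value. The thresholds (`max_missing`, `max_card_change`) and the `plan` feature are hypothetical, chosen only to illustrate how such a check might be configured.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_rate(values):
    return sum(is_missing(v) for v in values) / len(values) if values else 0.0


def cardinality(values):
    """Number of distinct non-missing values."""
    return len({v for v in values if not is_missing(v)})


def data_quality_alerts(baseline, production, max_missing=0.05, max_card_change=0.5):
    """Flag a high missing rate or a large relative cardinality shift vs. the baseline."""
    alerts = []
    rate = missing_rate(production)
    if rate > max_missing:
        alerts.append(f"missing rate {rate:.0%} exceeds {max_missing:.0%}")
    base_card, prod_card = cardinality(baseline), cardinality(production)
    if base_card and abs(prod_card - base_card) / base_card > max_card_change:
        alerts.append(f"cardinality changed from {base_card} to {prod_card}")
    return alerts


baseline_plan = ["basic", "standard", "premium", "basic", "standard"]
production_plan = ["basic", None, None, "basic", float("nan")]
print(data_quality_alerts(baseline_plan, production_plan))
# ['missing rate 60% exceeds 5%', 'cardinality changed from 3 to 1']
```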

Performance Analysis

Model performance metrics measure how well your model performs in production by comparing the prediction to the ground truth or actual. Once a performance monitor is triggered, Arize allows you to quickly navigate to the Performance tab to start troubleshooting your model issues and gain an understanding of what caused the degradation.
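Conceptually, a performance monitor compares a metric in production with the same metric on the baseline and fires when the gap exceeds a threshold. The sketch below illustrates that idea for accuracy only; the function names and the 5-point `tolerance` are hypothetical, and this is not how Arize implements or configures its monitors.

```python
def accuracy(actuals, predictions):
    """Share of predictions that match the actual label."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("need equal-length, non-empty actuals and predictions")
    return sum(a == p for a, p in zip(actuals, predictions)) / len(actuals)


def accuracy_monitor_triggered(baseline_acc, production_acc, tolerance=0.05):
    """Fire when production accuracy drops more than `tolerance` below the baseline."""
    return production_acc < baseline_acc - tolerance


training_acc = accuracy(["churn", "not_churn", "not_churn", "churn"],
                        ["churn", "not_churn", "not_churn", "churn"])
production_acc = accuracy(["churn", "not_churn", "churn", "churn"],
                          ["not_churn", "not_churn", "churn", "not_churn"])
print(training_acc, production_acc, accuracy_monitor_triggered(training_acc, production_acc))
# 1.0 0.5 True
```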

Compare production to training or to other windows of production. Bring in another dataset to compare performance and see which model performs better. This can help answer questions such as "Were we seeing this problem in training?" or "Does my new / previous model version perform better?"

Identify low-performing segments. By looking at the performance breakdown by feature, you can dig deeper to see which segment within each feature is underperforming.
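A performance breakdown by feature amounts to grouping predictions by a feature's value and computing the metric within each group. A sketch for accuracy, assuming each record is a dict with hypothetical `actual` and `prediction` keys plus the feature being sliced (here an illustrative `plan` feature):

```python
from collections import defaultdict


def accuracy_by_segment(records, feature):
    """Accuracy for each distinct value (segment) of `feature`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        segment = r[feature]
        total[segment] += 1
        correct[segment] += int(r["actual"] == r["prediction"])
    return {segment: correct[segment] / total[segment] for segment in total}


records = [
    {"plan": "monthly", "actual": "churn", "prediction": "not_churn"},
    {"plan": "monthly", "actual": "churn", "prediction": "churn"},
    {"plan": "monthly", "actual": "not_churn", "prediction": "churn"},
    {"plan": "annual", "actual": "not_churn", "prediction": "not_churn"},
    {"plan": "annual", "actual": "churn", "prediction": "churn"},
]
by_plan = accuracy_by_segment(records, "plan")
worst = min(by_plan, key=by_plan.get)  # lowest-accuracy segment
print(by_plan, worst)
```

Running the same breakdown for every feature and sorting segments by accuracy points to where the model underperforms.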

Custom Dashboard

For a churn model, the key metrics to review are the False Positive Rate, the False Negative Rate, and overall Accuracy.

In only a few clicks, you can add widgets that give a single-glance view of your model's Accuracy, False Positive Rate, and False Negative Rate. To visualize how these metrics fluctuate over time, you can also create a custom time series widget that overlays all three plots.
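The data behind such a time series widget can be approximated by bucketing predictions by day and computing the three rates per bucket. This sketch assumes each record carries a hypothetical `day` field (a `datetime.date`) alongside `actual` and `prediction`; a rate whose denominator is zero on a given day is reported as `None`.

```python
from collections import defaultdict
from datetime import date


def daily_rates(records):
    """Per-day accuracy, FPR and FNR, ordered by day, for plotting as overlaid series."""
    days = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        churned = r["actual"] == "churn"
        flagged = r["prediction"] == "churn"
        key = ("tp" if flagged else "fn") if churned else ("fp" if flagged else "tn")
        days[r["day"]][key] += 1
    series = {}
    for day in sorted(days):
        c = days[day]
        series[day] = {
            "accuracy": (c["tp"] + c["tn"]) / sum(c.values()),
            "false_positive_rate": c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else None,
            "false_negative_rate": c["fn"] / (c["fn"] + c["tp"]) if c["fn"] + c["tp"] else None,
        }
    return series


records = [
    {"day": date(2023, 3, 1), "actual": "churn", "prediction": "churn"},
    {"day": date(2023, 3, 1), "actual": "not_churn", "prediction": "churn"},
    {"day": date(2023, 3, 2), "actual": "churn", "prediction": "not_churn"},
    {"day": date(2023, 3, 2), "actual": "not_churn", "prediction": "not_churn"},
]
for day, rates in daily_rates(records).items():
    print(day, rates)
```

Both days have the same accuracy (0.5), yet the errors flip from false positives to false negatives, which is why overlaying all three series is more informative than accuracy alone.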

Arize also lets you act on these insights: when you find something you want to investigate further, you can use the export to notebook option.


Log feature importances to the Arize platform to explain your model's predictions. Logging these values lets you view the global feature importances of your predictions and perform global and cohort prediction-based analysis comparing the importances of your model's features.
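One common way to turn per-prediction importance values (for example, SHAP values) into global importances is to average their absolute values per feature; restricting the average to a subset of predictions gives a cohort view. The sketch below assumes that approach and uses hypothetical feature names and values; it does not show how Arize itself aggregates importances or the SDK call used to log them.

```python
def mean_abs_importance(rows):
    """Average absolute importance per feature; a feature absent from a row counts as 0."""
    if not rows:
        return {}
    totals = {}
    for row in rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    return {feature: total / len(rows) for feature, total in totals.items()}


# Hypothetical per-prediction importance values, one dict per prediction.
shap_rows = [
    {"tenure_months": -0.60, "support_tickets": 0.05, "monthly_charge": 0.10},
    {"tenure_months": 0.10, "support_tickets": 0.40, "monthly_charge": -0.05},
    {"tenure_months": -0.10, "support_tickets": 0.30, "monthly_charge": 0.02},
]
predictions = ["not_churn", "churn", "churn"]

global_importance = mean_abs_importance(shap_rows)
churn_cohort = [row for row, pred in zip(shap_rows, predictions) if pred == "churn"]
cohort_importance = mean_abs_importance(churn_cohort)

for name, importances in [("global", global_importance), ("churn cohort", cohort_importance)]:
    ranked = sorted(importances, key=importances.get, reverse=True)
    print(name, ranked)
```

In this toy data `tenure_months` ranks first globally, but within the predicted-churn cohort `support_tickets` dominates, the kind of difference a cohort comparison is meant to surface.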


Copyright © 2023 Arize AI, Inc