Arize AI
Churn Forecasting
Overview of how to use Arize for churn models
Check out our Churn Forecasting Colab for an interactive demo.


In this walkthrough, we will use the Arize platform to monitor a Churn Forecasting model's performance.
This use case walks you through how to manage the performance of your churn forecasting models and outlines how model monitoring and observability can help address common problems these models face in production. Learn how to actively monitor your model's performance, identify the root causes of issues, and gain insights into how to improve your model.
We will look at a few scenarios common to churn models when monitoring for performance.
You will learn to:
  1. Get training, validation, and production data into the Arize platform
  2. Set up a baseline
  3. Set up performance monitors for Accuracy, False Negative Rate, and False Positive Rate
  4. Create customized dashboards
  5. Discover the root cause of issues
  6. Identify business impact

Set up a Baseline

In Arize, you can set a baseline for the model, which compares your model's behavior in production to a training set, a validation set, or an initial model launch period. This allows you to determine when and where your model's behavior has changed significantly.
A baseline helps identify potential model issues, changing user behavior, or even concept drift (a change in the relationship between the model's inputs and what it is trying to predict).
Setting up a baseline will surface changes in the distributions of:
  1. Model predictions: churn / not_churn
  2. Model input data: feature values
  3. Ground truth / actual values: did this customer actually churn?

Choosing a baseline

This churn model has a training environment, which we will set as the baseline.
If you want to compare against a previous period in production, such as the model's initial rollout or a trailing window of production data, you can select production as the baseline in Arize.


Arize sets up monitors across all features, predictions, and actual values. Below are critical metrics to monitor for a churn model:
  • Accuracy: of all my predictions, what percentage did the model predict correctly?
  • False Negative Rate: Missed Chance to Retain % (customers who eventually churn but whom the model classified as not_churn, which means we never even tried to retain them).
  • False Positive Rate: Wasted Retention Effort % (customers who did not churn but whom the model classified as churn, which means the account team spent money on discounts or other campaigns to retain a customer who was never at risk of churning).
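All three metrics come from the same confusion matrix of actual versus predicted labels. The following is a minimal Python sketch, not Arize's implementation, assuming string labels `churn` / `not_churn` as used in this walkthrough; the function name `churn_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def churn_metrics(
    actuals: Sequence[str], predictions: Sequence[str], positive: str = "churn"
) -> Dict[str, float]:
    """Compute accuracy, false negative rate, and false positive rate."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actuals, predictions):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1  # churned, but predicted not_churn: missed chance to retain
        elif predicted == positive:
            fp += 1  # stayed, but predicted churn: wasted retention effort
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of actual churners the model missed
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        # Share of non-churners the model flagged as churn
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


metrics = churn_metrics(
    actuals=["churn", "not_churn", "churn", "not_churn", "not_churn"],
    predictions=["churn", "churn", "not_churn", "not_churn", "not_churn"],
)
print(metrics)
```

Note that the two error rates use different denominators: the false negative rate is relative to customers who actually churned, and the false positive rate is relative to customers who did not.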

Drift Detection

Drift is a change in distribution over time. It is measured for a model's inputs, outputs, and actuals. Drift can indicate that a model has grown stale, that there are data quality issues, or that there are adversarial inputs to your model.
Detecting drift helps protect your models from performance degradation and helps you understand where to begin troubleshooting.
| Type of Drift | Possible Cause |
| --- | --- |
| Prediction Drift | Your customers are extremely happy, so there are fewer signals of potentially churning customers. |
| Actual Drift (No Prediction Drift) | An unexpected market shift occurred, such as a pandemic causing customers to cancel their gym memberships or travel credit cards. |
| Feature Drift | A new support model is deployed, introducing one or more new inputs. |
Prediction Drift Impact surfaces when drift has impacted your model.
Drift (PSI) measures how much your distribution has drifted.
Feature Importance helps you explain why even a small Drift (PSI) can have a significant Drift Impact.
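PSI (Population Stability Index) compares the share of data in each bin or category between the baseline and production: PSI = Σ (p_i − q_i) · ln(p_i / q_i), where q_i is the baseline share and p_i the production share. The following is a minimal Python sketch of that standard formula, assuming categorical or pre-binned values; the function names `bin_values` and `psi`, and the `eps` floor used to avoid log(0), are illustrative assumptions, and Arize's own binning and smoothing may differ.

```python
import bisect
import math
from collections import Counter
from typing import Hashable, List, Sequence


def bin_values(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map numeric values to bin indices using sorted interior bin edges."""
    return [bisect.bisect_right(edges, v) for v in values]


def psi(
    baseline: Sequence[Hashable], production: Sequence[Hashable], eps: float = 1e-6
) -> float:
    """Population Stability Index between two categorical (or binned) samples."""
    if not baseline or not production:
        raise ValueError("both samples must be non-empty")
    base_counts = Counter(baseline)
    prod_counts = Counter(production)
    total = 0.0
    for category in set(base_counts) | set(prod_counts):
        # Floor each share at eps so an empty bin does not cause log(0)
        q = max(base_counts[category] / len(baseline), eps)
        p = max(prod_counts[category] / len(production), eps)
        total += (p - q) * math.log(p / q)
    return total


# Prediction drift: churn predictions rise from 20% (baseline) to 40% (production)
baseline_preds = ["not_churn"] * 80 + ["churn"] * 20
production_preds = ["not_churn"] * 60 + ["churn"] * 40
print(round(psi(baseline_preds, production_preds), 4))
```

For a numeric feature, bin both samples with the same edges first, for example `psi(bin_values(train_tenure, edges), bin_values(prod_tenure, edges))`, where the variable names are placeholders.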

Data Quality Checks

It’s important to surface data quality issues immediately so you can see how your data quality maps to your model’s performance. Use data quality monitoring to analyze hard failures in your data pipeline, such as missing data or cardinality shifts.
  • Missing / Null values could be an indicator of issues from an upstream data source.
  • Cardinality (the number of distinct values of a feature) is checked to ensure there are no sudden spikes or drops in feature values.
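The two checks above can be approximated in a few lines. This is a minimal Python sketch, not Arize's monitor logic; the function names and the thresholds `max_missing_rate` and `max_cardinality_change` are hypothetical values chosen for illustration.

```python
from typing import Any, List, Optional, Sequence


def missing_rate(values: Sequence[Optional[Any]]) -> float:
    """Fraction of values that are None (missing)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def cardinality(values: Sequence[Optional[Any]]) -> int:
    """Number of distinct non-missing values."""
    return len({v for v in values if v is not None})


def data_quality_alerts(
    feature: str,
    baseline: Sequence[Optional[Any]],
    production: Sequence[Optional[Any]],
    max_missing_rate: float = 0.05,  # hypothetical threshold
    max_cardinality_change: float = 0.5,  # hypothetical threshold (50% relative change)
) -> List[str]:
    alerts = []
    rate = missing_rate(production)
    if rate > max_missing_rate:
        alerts.append(f"{feature}: missing rate {rate:.1%} exceeds {max_missing_rate:.1%}")
    base_card, prod_card = cardinality(baseline), cardinality(production)
    # Relative change in distinct values, compared with the baseline
    if base_card and abs(prod_card - base_card) / base_card > max_cardinality_change:
        alerts.append(f"{feature}: cardinality changed from {base_card} to {prod_card}")
    return alerts


print(data_quality_alerts("plan", ["basic", "pro", "enterprise", "basic"], ["basic", None, None, "basic"]))
```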

Performance Analysis

Model performance metrics measure how well your model performs in production by comparing the prediction to the ground truth or actual. Once a performance monitor is triggered, Arize allows you to quickly navigate to the Performance tab to start troubleshooting your model issues and gain an understanding of what caused the degradation.
Compare production to training or to other windows of production. Bring in another dataset to compare performance and see which model performs better. This can help answer questions such as "Were we seeing this problem in training?" or "Does my new / previous model version perform better?"
Identify low-performing segments. By looking at the performance breakdown by feature, you can dig even deeper to see which segment within each feature of the model is underperforming.
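A performance breakdown by feature amounts to grouping predictions by a feature's value and computing a metric per group. This minimal Python sketch, which is not how Arize computes it internally, uses accuracy and assumes each record is a dict with `actual`, `prediction`, and feature keys; the `plan` feature in the example is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def accuracy_by_segment(records: List[Dict[str, Any]], feature: str) -> List[Tuple[Any, float, int]]:
    """Return (segment, accuracy, count) per value of `feature`, worst segment first."""
    correct: Dict[Any, int] = defaultdict(int)
    count: Dict[Any, int] = defaultdict(int)
    for record in records:
        segment = record.get(feature)  # None groups records missing this feature
        count[segment] += 1
        correct[segment] += int(record["actual"] == record["prediction"])
    rows = [(segment, correct[segment] / count[segment], count[segment]) for segment in count]
    # Sort ascending by accuracy so underperforming segments come first
    return sorted(rows, key=lambda row: row[1])


records = [
    {"plan": "monthly", "actual": "churn", "prediction": "not_churn"},
    {"plan": "monthly", "actual": "churn", "prediction": "churn"},
    {"plan": "annual", "actual": "not_churn", "prediction": "not_churn"},
    {"plan": "annual", "actual": "churn", "prediction": "churn"},
]
print(accuracy_by_segment(records, "plan"))
```

Keeping the segment count alongside the metric helps avoid over-reading a low score from a very small segment.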

Custom Dashboard

For a churn model, a few key metrics to review are the False Positive Rate, the False Negative Rate, and overall Accuracy.
In only a few clicks, you can add widgets that give an at-a-glance view of your model's Accuracy, False Positive Rate, and False Negative Rate. To visualize these metrics over time, you can also create a custom time series widget that overlays all three plots to show how these metrics fluctuate.
Arize also lets you act on these insights: when you find an insight you want to investigate further, use the export to notebook option to continue the analysis in a notebook.

Business Impact

Sometimes we need metrics other than traditional statistical measures to define model performance. Business Impact measures your scored model's payoff at different thresholds (i.e., decision boundaries for a scored model).
When predicting churn, the profit or loss associated with each kind of model prediction is often not weighted equally. For example, the diagram below might estimate the profit/loss of a decision made by your model.
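To make the threshold idea concrete, this minimal Python sketch scores each prediction with a payoff per outcome (true positive, false positive, false negative, true negative) and sums them at several candidate thresholds. It is not Arize's Business Impact calculation, and the dollar values in `PAYOFFS` are hypothetical placeholders, not figures from this page; substitute your own estimates.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical per-outcome payoffs in dollars (illustrative only)
PAYOFFS: Dict[str, float] = {
    "tp": 50.0,    # retained a customer who would have churned, net of discount
    "fp": -20.0,   # discount spent on a customer who was not going to churn
    "fn": -100.0,  # lost revenue from a churner we never tried to retain
    "tn": 0.0,     # correctly left alone
}


def business_impact(
    scores: Sequence[float], actuals: Sequence[str], threshold: float, payoffs: Dict[str, float] = PAYOFFS
) -> float:
    """Total payoff when predicting churn for every score >= threshold."""
    total = 0.0
    for score, actual in zip(scores, actuals):
        predicted_churn = score >= threshold
        actual_churn = actual == "churn"
        if predicted_churn:
            outcome = "tp" if actual_churn else "fp"
        else:
            outcome = "fn" if actual_churn else "tn"
        total += payoffs[outcome]
    return total


def best_threshold(
    scores: Sequence[float], actuals: Sequence[str], thresholds: Iterable[float], payoffs: Dict[str, float] = PAYOFFS
) -> Tuple[float, float]:
    """Return (threshold, payoff) with the highest total payoff."""
    return max(((t, business_impact(scores, actuals, t, payoffs)) for t in thresholds), key=lambda pair: pair[1])


scores = [0.9, 0.7, 0.4, 0.2]
actuals = ["churn", "not_churn", "churn", "not_churn"]
print(best_threshold(scores, actuals, [0.1, 0.3, 0.5, 0.8]))
```

Because a missed churner costs more than a wasted discount in these assumed payoffs, the best threshold ends up lower than the conventional 0.5.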


Log feature importances to the Arize platform to explain your model's predictions. By logging these values, you can view the global feature importances of your predictions and perform global and cohort prediction-based analysis to compare feature importances across your model's features.
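One common way to turn per-prediction importances (for example, SHAP values) into a global ranking is to average their absolute values per feature. The following Python sketch assumes that convention and assumes each prediction's importances are a dict of feature name to value; it is not necessarily how Arize aggregates them, and the feature names are hypothetical.

```python
from typing import Dict, List


def global_importance(per_prediction: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean absolute importance per feature, ordered from most to least important."""
    if not per_prediction:
        return {}
    totals: Dict[str, float] = {}
    for row in per_prediction:
        for feature, value in row.items():
            # Use the magnitude: a feature pushing toward or away from churn both matter
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(per_prediction)
    ranked = sorted(((f, t / n) for f, t in totals.items()), key=lambda kv: kv[1], reverse=True)
    return dict(ranked)


importances = [
    {"tenure_months": -0.4, "support_tickets": 0.1},
    {"tenure_months": 0.2, "support_tickets": 0.3},
]
print(global_importance(importances))
```

For a cohort analysis, call the same function on only the rows belonging to that cohort and compare the result with the global ranking.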

