Arize AI
Click-Through Rate
Overview of how to use Arize for click-through rate models
Check out our Click-Through Rate colab to send data in and follow along our Click-Through Rate workflow.


In this walkthrough, we will use the Arize platform to monitor click-through rate (CTR) model performance. This use case walks you through managing your model's performance for an online advertising platform. While you may have spent a great deal of time collecting online data and training models for best performance, it's common for models in production to have no tooling available to monitor performance, identify issues, or surface insights into how to improve. In this walkthrough, we will look at a few scenarios common to an advertising use case, focusing on CTR predictions versus actuals for a given ad or ad group. You will learn to:
  1. Get training, validation, and production data into the Arize platform
  2. Set up a baseline and performance dashboards
  3. Create threshold alerts
  4. Monitor for log loss
  5. Understand where the model is underperforming
  6. Discover the root cause of issues
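As a sketch of the first step, the data sent to Arize is typically a pandas DataFrame of prediction IDs, predictions, actuals, and features. Below is a minimal sketch with hypothetical column names; the Arize SDK call is shown commented out since the exact `Client`/`Schema` fields depend on your SDK version and credentials:

```python
import pandas as pd

# Hypothetical CTR prediction log: one row per ad impression.
df = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "prediction_label": ["click", "no_click", "click"],   # model output
    "prediction_score": [0.81, 0.12, 0.64],               # predicted P(click)
    "actual_label": ["click", "no_click", "no_click"],    # ground truth
    "device": ["mobile", "desktop", "mobile"],            # features
    "domain": ["news.example", "shop.example", "news.example"],
})

# With the Arize pandas SDK installed, logging looks roughly like this
# (keys and exact Schema fields depend on your SDK version):
# from arize.pandas.logger import Client
# from arize.utils.types import ModelTypes, Environments, Schema
# client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")
# schema = Schema(
#     prediction_id_column_name="prediction_id",
#     prediction_label_column_name="prediction_label",
#     prediction_score_column_name="prediction_score",
#     actual_label_column_name="actual_label",
#     feature_column_names=["device", "domain"],
# )
# client.log(dataframe=df, model_id="ctr-model", model_version="1.0",
#            model_type=ModelTypes.SCORE_CATEGORICAL,
#            environment=Environments.PRODUCTION, schema=schema)

print(df.shape)
```

The same DataFrame shape works for training, validation, and production data; only the environment you log to changes.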

Set up a Baseline

Within Arize, you can set a baseline for the model to compare your model's production behavior against a training set, validation set, or an initial model launch period. This allows us to determine when and where our model's behavior has changed drastically.
  • Datasets: Training Version 1.0
  • Default Metric: Accuracy, Trigger Alert When: Accuracy is below .7, Positive Class: click
  • Turn On Monitoring: Drift ✅, Data Quality ✅, Performance ✅


Arize sets up monitors across all features, predictions, and actual values. For click-through rate, it's important to monitor the model's Accuracy.
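The accuracy and log-loss checks the monitors run can be reproduced locally. A minimal sketch with hypothetical labels and scores, using the same 0.7 alert threshold configured above:

```python
import math

# Hypothetical production window: hard labels and predicted P(click).
predictions = ["click", "click", "no_click", "click", "no_click"]
actuals     = ["click", "no_click", "no_click", "click", "no_click"]
scores = [0.9, 0.2, 0.1, 0.8, 0.3]   # predicted P(click)
labels = [1, 0, 0, 1, 0]             # 1 = actually clicked

# Accuracy, mirroring the monitor's default metric.
accuracy = sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)

# Log loss over the same window.
log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, scores)) / len(labels)

print(f"accuracy = {accuracy:.2f}, log loss = {log_loss:.3f}")

# Mirror the alert: trigger when accuracy falls below 0.7.
if accuracy < 0.7:
    print("ALERT: accuracy below threshold")
```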

Drift Detection

Drift is a change in distribution over time, measured for a model's inputs, outputs, and actuals. Measure drift to identify whether your models have grown stale, whether you have data quality issues, or whether there are adversarial inputs to your model. Detecting drift will help protect your models from performance degradation and help you understand how to begin resolution.
Prediction Drift Impact can surface when drift has impacted your model. Drift (PSI) measures how much your distribution has drifted. Lastly, Feature Importance helps you explain why even a small Drift (PSI) can have a significant Drift Impact.
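PSI itself can be sketched in a few lines. The example below assumes two hypothetical binned distributions for a feature (training vs. production bin counts) and uses a small epsilon to avoid dividing by empty bins:

```python
import math

def psi(expected, observed, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    e_total, o_total = sum(expected), sum(observed)
    score = 0.0
    for e, o in zip(expected, observed):
        e_pct = max(e / e_total, eps)
        o_pct = max(o / o_total, eps)
        score += (o_pct - e_pct) * math.log(o_pct / e_pct)
    return score

# Hypothetical bin counts for the "device" feature.
training   = [500, 300, 200]   # mobile, desktop, tablet
production = [350, 250, 400]   # tablet traffic has surged

drift = psi(training, production)
print(f"PSI = {drift:.3f}")  # > 0.2 is a common rule of thumb for significant drift
```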

Data Quality Checks

It’s important to immediately surface data quality issues to identify how your data quality maps to your model’s performance. Utilize data quality monitoring to analyze hard failures in your data quality pipeline, such as missing data or cardinality shifts.
  • Missing / Null values could be an indicator of issues from an upstream data source.
  • Cardinality is checked to ensure there are no spikes / drops in feature values.
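Both checks are straightforward to prototype locally with pandas. A sketch over a hypothetical slice of production feature data:

```python
import pandas as pd

# Hypothetical slice of production feature data.
df = pd.DataFrame({
    "device": ["mobile", "desktop", None, "mobile", "tablet"],
    "domain": ["news.example", None, None, "shop.example", "news.example"],
})

# Missing / null rate per feature.
null_rate = df.isna().mean()
print(null_rate)

# Cardinality (count of distinct non-null values) per feature.
cardinality = df.nunique()
print(cardinality)
```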

Performance Analysis

Model performance metrics measure how well your model performs in production. Once a performance monitor is triggered, navigate to the Performance tab to start troubleshooting your model issues and gain an understanding of what caused the degradation.
Compare production to training or to other windows of production. Bring in another dataset to compare performance and see which model performs better. This can help answer questions such as "Were we seeing this problem in training?" or "Does my new or previous model version perform better?"
Identify low-performing segments. By breaking performance down by feature, you can dig deeper to see which segment within each feature of the model is underperforming.
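The segment breakdown amounts to a groupby over a feature. A sketch assuming a hypothetical frame of prediction logs:

```python
import pandas as pd

# Hypothetical prediction logs with one feature of interest.
df = pd.DataFrame({
    "device":     ["mobile", "mobile", "desktop", "desktop", "tablet", "tablet"],
    "prediction": ["click", "no_click", "click", "click", "click", "click"],
    "actual":     ["click", "no_click", "click", "no_click", "no_click", "no_click"],
})

# Accuracy per segment of the "device" feature, worst first.
df["correct"] = df["prediction"] == df["actual"]
accuracy_by_segment = df.groupby("device")["correct"].mean()
print(accuracy_by_segment.sort_values())
```

Here the tablet segment would surface as the underperformer worth investigating.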
Root Cause Analysis Walkthrough
We can see that our type I error (false positives) has significantly increased: our model is predicting many more clicks than occur in actuality. The model expects a large number of users to click on a given ad when in fact they do not. To find which cohorts are performing worst, we can peek at the dual histograms. From there we can see that there are large deviations in the device and domain features.
It seems that our performance degradation is due to unseen populations in the device and domain features. This is an indication that we should dig deeper into these cohorts and decide how we want to handle these never-before-seen populations in the model.
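Unseen populations can be confirmed with a simple set difference between the feature values seen in training and those arriving in production. A sketch with hypothetical domain values:

```python
# Hypothetical distinct values of the "domain" feature in each dataset.
training_domains   = {"news.example", "shop.example", "mail.example"}
production_domains = {"news.example", "shop.example", "games.example", "video.example"}

# Values the model is scoring in production but never saw in training.
unseen = production_domains - training_domains
print(sorted(unseen))
```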

Actionable Insight

Now that we understand what is affecting our model, we can:
  • Retrain the model on these new cohorts within the device and domain features
  • Handle the empty values in our data pipelines that are affecting our data quality

Custom Dashboard

In only a few clicks, you can add widgets that provide an at-a-glance view of your model's important metrics and KPIs. To visualize these metrics over time, you can also create a custom time-series widget that overlays three plots to showcase how these metrics fluctuate over time.
Below we'll set up a templatized dashboard and then adjust the template to match our use case.
Now let's make a simple customization to our template. You can refine the Prediction Score vs Actual Score by Day graph by adding a similar plot with these filters:
  • Pred Shopping: use Aggregation Function: Average, with Average of set to Prediction Score, and the filter (feature category = [shopping]). Also add a filter (feature domain != [])
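That widget is equivalent to a filtered average. A sketch of the same aggregation in pandas, with hypothetical data (an empty string stands in for a missing domain):

```python
import pandas as pd

# Hypothetical prediction logs with the two features the widget filters on.
df = pd.DataFrame({
    "prediction_score": [0.8, 0.3, 0.6, 0.9],
    "category": ["shopping", "news", "shopping", "shopping"],
    "domain": ["shop.example", "news.example", "", "shop.example"],
})

# Average of Prediction Score where category == "shopping" and domain is non-empty.
mask = (df["category"] == "shopping") & (df["domain"] != "")
pred_shopping = df.loc[mask, "prediction_score"].mean()
print(round(pred_shopping, 3))
```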

Business Impact

Sometimes, we need metrics other than traditional statistical measures to define model performance. Business Impact is a way to measure your scored model's payoff at different thresholds (i.e., the decision boundary for a scored model).
Getting your marketing and ad campaigns to target the right populations is key to growing and scaling your business. A difference of a few percentage points in your click-through rate can have a huge impact on your marketing and sales funnels.
Setting up a sample Payout Curve
When using machine learning to quantify how well your advertising is performing, understanding why your model behaves the way it does is more important than ever when it comes to beating the competition. Since models are constantly exposed to new variables in an ever-changing world, ML and data science professionals commonly need to stay in tune with their models.
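A payoff curve can be sketched by sweeping the decision threshold and assigning a hypothetical dollar value to each outcome. The economics below (revenue per correctly targeted click, wasted spend per non-click impression) are illustrative assumptions, not Arize defaults:

```python
# Hypothetical scored predictions and outcomes (1 = the ad was clicked).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
clicks = [1,   1,   0,   1,   0,   0,   1,   0]

VALUE_TP = 1.00   # assumed revenue per correctly targeted click
COST_FP  = 0.30   # assumed wasted spend per impression with no click

def payoff(threshold):
    """Total payoff if we serve every impression scored at or above the threshold."""
    total = 0.0
    for s, y in zip(scores, clicks):
        if s >= threshold:          # model decides to serve the ad
            total += VALUE_TP if y == 1 else -COST_FP
    return total

# Sweep thresholds 0.1 .. 0.9 to trace out the payoff curve.
curve = {t / 10: payoff(t / 10) for t in range(1, 10)}
best = max(curve, key=curve.get)
print(curve)
print(f"best threshold = {best}")
```

The threshold that maximizes this curve is the decision boundary the Business Impact view helps you find.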


Log feature importances to the Arize platform to explain your model's predictions. Logging these values lets you view the global feature importances of your predictions and perform global and cohort prediction-based analysis to compare feature importances across your model's features.
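The global view is just an aggregation over the per-prediction values you log. A sketch with hypothetical per-row importances (e.g., SHAP values) for three impressions:

```python
# Hypothetical per-prediction feature importances (e.g., SHAP values)
# for three impressions of a CTR model.
shap_rows = [
    {"device": 0.30, "domain": -0.10, "hour_of_day": 0.05},
    {"device": 0.25, "domain": -0.40, "hour_of_day": 0.02},
    {"device": 0.10, "domain": -0.20, "hour_of_day": 0.08},
]

# Global importance: mean absolute contribution per feature.
features = shap_rows[0].keys()
global_importance = {
    f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows)
    for f in features
}
print(global_importance)

# With the Arize SDK, the per-row values would be logged alongside the
# predictions (mapped to columns in the Schema); exact field names
# depend on your SDK version.
```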


Check out our Click-Through Rate colab for an interactive demo and our best practices blog for industry context!
Visit the Arize Blog and Resource Center for more resources on ML observability and model monitoring.