Arize AI

10. Troubleshoot Data Consistency

Monitoring offline and online feature differences with Arize
Data Consistency is an Arize feature, available under the Projects tab, that lets you monitor differences between your offline features and your online features. It is an important metric for any modern ML system because it surfaces potential problems in feature generation (i.e., materializing features), in data pipelines, and even in latency when fetching from online stores.
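Conceptually, a data consistency check joins offline and online records on their prediction_id and compares each feature value. The sketch below is not Arize's implementation. It is a minimal plain-Python illustration that assumes a hypothetical layout in which each environment is a dict mapping prediction_id to a dict of feature values; `feature_match_rates` is an illustrative name.

```python
from collections import defaultdict
from typing import Any, Dict

# Hypothetical layout: prediction_id -> {feature_name: value}
Rows = Dict[str, Dict[str, Any]]


def feature_match_rates(offline: Rows, online: Rows) -> Dict[str, float]:
    """Per feature, the fraction of joined records whose offline and online values agree."""
    matches: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    # Join on prediction_id: only ids present in both environments are compared.
    for pid in offline.keys() & online.keys():
        off_row, on_row = offline[pid], online[pid]
        # Compare only features logged in both environments for this record.
        for feature in off_row.keys() & on_row.keys():
            totals[feature] += 1
            if off_row[feature] == on_row[feature]:
                matches[feature] += 1
    # Sorted so the result has a stable order.
    return {f: matches[f] / totals[f] for f in sorted(totals)}


offline = {
    "p1": {"city": "SF", "orders_7d": 12},
    "p2": {"city": "NY", "orders_7d": 5},
}
online = {
    "p1": {"city": "SF", "orders_7d": 11},
    "p2": {"city": "NY", "orders_7d": 5},
}
print(feature_match_rates(offline, online))  # {'city': 1.0, 'orders_7d': 0.5}
```

Exact equality is used here for simplicity; for floating-point features, a tolerance such as `math.isclose` is usually a better match criterion.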
Visit this Example Tutorial to learn how to use data consistency!
Data Consistency Metrics across features

Logging Offline Features

To monitor data consistency, you don't have to change anything in your production (i.e., online features) workflow. You only need to log your offline features using arize.log_validation_records and then set up a match environment, as described in the next section.
To set up a data consistency environment correctly on Arize, make sure that all of the following conditions are met:
  1. You sent the same prediction_ids when logging to production and validation.
  2. You sent the same prediction_timestamps when logging to production and validation.
  3. You sent all of your offline data to the same batch_id, such as batch_id=`offline`.
You can always add to the same offline environment by calling arize.log_validation_records with the same batch_id!
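Before logging, it can help to check the three conditions above locally. The following is a hypothetical helper, not part of the Arize SDK: the `Record` class and `consistency_setup_problems` function are illustrative names, and the sketch assumes you already hold your production and validation records in memory.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Record:
    prediction_id: str
    prediction_timestamp: datetime
    batch_id: str = ""  # only meaningful for validation (offline) records


def consistency_setup_problems(production: List[Record], validation: List[Record]) -> List[str]:
    """Return human-readable problems; an empty list means all three checks pass."""
    problems: List[str] = []
    prod_ts: Dict[str, datetime] = {r.prediction_id: r.prediction_timestamp for r in production}
    val_ts: Dict[str, datetime] = {r.prediction_id: r.prediction_timestamp for r in validation}

    # 1. Same prediction_ids: flag ids that appear in only one environment.
    unmatched = sorted(prod_ts.keys() ^ val_ts.keys())
    if unmatched:
        problems.append(f"prediction_ids not in both environments: {unmatched}")

    # 2. Same prediction_timestamps for every shared prediction_id.
    for pid in sorted(prod_ts.keys() & val_ts.keys()):
        if prod_ts[pid] != val_ts[pid]:
            problems.append(f"timestamp mismatch for {pid}")

    # 3. All offline data logged under a single batch_id.
    batch_ids = sorted({r.batch_id for r in validation})
    if len(batch_ids) > 1:
        problems.append(f"validation records span several batch_ids: {batch_ids}")
    return problems


t = datetime(2021, 6, 1, 12, 0)
prod = [Record("p1", t), Record("p2", t)]
val = [Record("p1", t, "offline"), Record("p2", t, "offline")]
print(consistency_setup_problems(prod, val))  # []
```

The check on prediction_ids is deliberately strict, mirroring the requirement that both environments receive the same ids.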

Setting up Data Consistency on Arize

First, create a new project containing the models that share the same match environment. For example, all models with the same feature schema (such as separate instances of one model deployed per store, city, or state) can use the same Projects page.
Then, under Config, set the match environment to the batch_id you chose for measuring offline data consistency. In this example, we logged to `offline`.
Data Consistency details may not show up immediately after the initial setup. If you have logged to Arize correctly, you should see visualizations for your match environment by the next day.

Troubleshooting Data Consistency with Arize

Click into a mismatched feature to see how its match distribution differs between the offline and online environments, using the heat map and distribution widgets.
In this example, the offline features appear to experience a one-sided delay, which suggests potential latency problems.
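One simple way to reason about a one-sided delay is to look at the direction of the mismatches. The sketch below is a hypothetical heuristic, not how Arize computes its heat map: given (offline, online) value pairs for a feature that only grows over time, such as a running count, `one_sided_share` (an illustrative name) returns the share of mismatched pairs where the offline value is lower than the online one.

```python
from typing import List, Optional, Tuple


def one_sided_share(pairs: List[Tuple[float, float]]) -> Optional[float]:
    """Share of mismatched (offline, online) pairs where offline < online.

    Returns None when every pair matches.
    """
    # Keep only mismatches; matching pairs say nothing about direction.
    diffs = [off - on for off, on in pairs if off != on]
    if not diffs:
        return None
    return sum(1 for d in diffs if d < 0) / len(diffs)


# Hypothetical running counter where offline values trail online values.
pairs = [(10, 11), (11, 12), (12, 12), (13, 14)]
print(one_sided_share(pairs))  # 1.0
```

A share near 1.0 (or 0.0) means nearly all mismatches lean the same way, which is consistent with one source lagging the other; a share near 0.5 points more toward noise than toward a systematic delay.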

Additional Resources

Questions? Email us at [email protected] or Slack us in the #arize-support channel