10. Troubleshoot Data Consistency
Monitoring offline and online feature differences with Arize
Data Consistency is an Arize feature, available under the Projects tab, that lets you monitor differences between your offline features and online features. Data consistency is an important metric for any modern ML system because it surfaces potential problems in feature generation (i.e., materializing features), in data pipelines, and even in latency when fetching from online stores.
Data Consistency Metrics across features
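To make the idea of a per-feature consistency metric concrete, here is a minimal sketch that compares offline and online feature values for the same predictions and reports the fraction that match. It is an illustration only, not Arize's implementation: the function name feature_match_rates, the prediction IDs, and the feature names and values are all hypothetical.

```python
from typing import Any, Dict

Records = Dict[str, Dict[str, Any]]

# Hypothetical feature records keyed by prediction ID (all values made up).
offline: Records = {
    "pred-1": {"avg_basket": 12.5, "city": "SF"},
    "pred-2": {"avg_basket": 8.0, "city": "NYC"},
    "pred-3": {"avg_basket": 3.2, "city": "LA"},
}
online: Records = {
    "pred-1": {"avg_basket": 12.5, "city": "SF"},
    "pred-2": {"avg_basket": 7.5, "city": "NYC"},  # differs from offline
    "pred-3": {"avg_basket": 3.2, "city": "LA"},
}


def feature_match_rates(offline: Records, online: Records) -> Dict[str, float]:
    """Per feature, the fraction of shared predictions whose values are equal."""
    matches: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    # Only predictions present in both environments can be compared.
    for pred_id in offline.keys() & online.keys():
        for name, offline_value in offline[pred_id].items():
            if name not in online[pred_id]:
                continue  # feature missing online: skipped in this sketch
            totals[name] = totals.get(name, 0) + 1
            if offline_value == online[pred_id][name]:
                matches[name] = matches.get(name, 0) + 1
    return {name: matches.get(name, 0) / totals[name] for name in totals}


rates = feature_match_rates(offline, online)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2%} match")
```

Exact equality is the simplest possible comparison; for floating-point features you would likely want a tolerance instead.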
To monitor data consistency, you don't have to change anything in your production (i.e., online features) workflow. You only need to log your offline features using arize.log_validation_records, then set up a match environment in the next step.
You can always add to the same offline environment by calling arize.log_validation_records with the same
First, create a new project containing the models that share the same match environments. For example, all models with the same feature schema (such as one instance of a model deployed per store, city, or state) can use the same Projects page.
Then, under Config, set the match environment to the batch_id you chose for measuring offline data consistency. In this example, we logged to
Data Consistency details may not show up immediately after the initial setup. If you have logged to Arize correctly, you should see visualizations for your match environment the next day.
By clicking into the mismatched features, you can see how the feature distributions differ between the offline and online environments using our heat map and distribution widgets.
In this particular example, you can see that the offline features seem to experience a one-sided delay, signifying potential latency problems.
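One way to reason about a one-sided pattern like this outside the UI is to count the direction of each mismatch for a numeric feature. The sketch below is a hedged illustration, not how Arize computes its widgets: the helper names mismatch_direction and is_one_sided, the 0.9 threshold, and the sample values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def mismatch_direction(pairs: List[Tuple[float, float]]) -> Dict[str, int]:
    """Count mismatches by direction for (offline_value, online_value) pairs."""
    lower = sum(1 for offline_v, online_v in pairs if offline_v < online_v)
    higher = sum(1 for offline_v, online_v in pairs if offline_v > online_v)
    return {"offline_lower": lower, "offline_higher": higher}


def is_one_sided(counts: Dict[str, int], threshold: float = 0.9) -> bool:
    """True when at least `threshold` of the mismatches point the same way."""
    total = counts["offline_lower"] + counts["offline_higher"]
    if total == 0:
        return False  # no mismatches, so nothing to call one-sided
    return max(counts.values()) / total >= threshold


# Hypothetical (offline, online) values for one numeric feature.
pairs = [(3, 4), (5, 5), (7, 9), (2, 3), (6, 6)]
counts = mismatch_direction(pairs)
print(counts, "one-sided:", is_one_sided(counts))
```

If nearly every mismatch points in the same direction, one environment is systematically behind the other, which is consistent with a delay rather than random noise.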