Troubleshoot Data Upload
How to troubleshoot common data ingestion issues
Let's check whether Arize has received your model data correctly. Navigate to the 'Data Ingestion' tab within your model. You should see bar charts showing the volume of data received for predictions, actuals, and feature importance values. Hover over the bars to confirm that the volumes match what you expect.
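To know what volumes to expect, you can count your records per day before uploading and compare the counts with the bars. The sketch below assumes your timestamps are Unix epoch seconds and buckets them by UTC calendar day. The function name daily_volume is hypothetical, and the Arize chart may bucket time differently, so treat small boundary differences with caution.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_volume(timestamps):
    """Count records per UTC calendar day (hypothetical helper).

    timestamps: iterable of Unix epoch seconds, one per record.
    Returns a dict mapping 'YYYY-MM-DD' (UTC) to a record count, sorted by day.
    """
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        for ts in timestamps
    )
    return dict(sorted(counts.items()))


# Hypothetical prediction timestamps for three records.
print(daily_volume([1700000000, 1700003600, 1700100000]))
# {'2023-11-14': 2, '2023-11-16': 1}
```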
If you've waited and your data ingestion volume is as expected, perform a quick data ingestion check. Use the checklist to identify any data ingestion errors that must be corrected to use Arize successfully.
Verify predictions and actuals on the 'Data Ingestion' or 'Datasets' tab:
Cardinality of prediction/actual classes
Distribution of prediction/actual scores
The number of records Arize received matches the number of predictions you sent
Verify performance metrics on the 'Performance Tracing' tab
If actuals are sent separately, verify that the prediction ID used for the prediction matches the prediction ID used for the actual
Check the prediction time range
Verify the model type (e.g., ranking, regression) shown next to the model's name
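Several checklist items compare the Arize UI against what you sent. Summarizing your own data locally gives you the expected values: the record count, the class cardinality, the score distribution, and the prediction time range. The following is a minimal sketch that assumes your records are a list of dicts. The column names and the summarize_predictions helper are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def summarize_predictions(rows, label_col, score_col, ts_col):
    """Summarize sent predictions for comparison with the Arize UI (hypothetical).

    rows: list of dicts, one per prediction record.
    Missing (None or absent) values are skipped in the per-column statistics.
    """
    labels = Counter(r[label_col] for r in rows if r.get(label_col) is not None)
    scores = [r[score_col] for r in rows if r.get(score_col) is not None]
    times = [r[ts_col] for r in rows if r.get(ts_col) is not None]
    return {
        "record_count": len(rows),
        "label_cardinality": len(labels),
        "label_counts": dict(labels),
        "score_min": min(scores) if scores else None,
        "score_max": max(scores) if scores else None,
        "score_mean": mean(scores) if scores else None,
        "score_median": median(scores) if scores else None,
        "time_range": (min(times), max(times)) if times else None,
    }


# Hypothetical records; column names are illustrative only.
rows = [
    {"pred_label": "fraud", "pred_score": 0.9, "ts": 1700000000},
    {"pred_label": "not_fraud", "pred_score": 0.2, "ts": 1700003600},
    {"pred_label": "not_fraud", "pred_score": 0.1, "ts": 1700007200},
]
print(summarize_predictions(rows, "pred_label", "pred_score", "ts"))
```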
If the 'Data Ingestion' tab shows values that deviate from what's expected, dig into potential ingestion issues based on what you sent.
Most data ingestion errors come from misnamed columns, missing values, or missing fields for your model type:
Prediction labels and scores - Check that you mapped the prediction label and/or prediction score column correctly in the model schema, and ensure the contents of the columns represent expected values
Actual labels and scores - Check that you mapped the actual label and/or actual score column correctly in the model schema, and ensure the contents of the columns represent expected values
Tags and features - Check that you listed your tag and feature columns in the correct schema lists, and that the values within each feature/tag column are representative of what you intend to ingest
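Misnamed columns and missing values can be caught before upload by checking every column your schema maps against the data itself. This sketch uses a plain dict as the schema. Its keys are loosely modeled on Arize schema field names but are assumptions here, not the SDK's Schema object. The check_schema_columns helper is hypothetical.

```python
def check_schema_columns(rows, schema):
    """List schema problems (hypothetical helper).

    rows: list of dicts, one per record.
    schema: plain dict mapping a field name to a column name or a list of names.
    Reports mapped columns that never appear in the data, and mapped columns
    with values that are None or absent in some records.
    """
    columns = set()
    for r in rows:
        columns.update(r.keys())
    problems = []
    for field, value in schema.items():
        names = value if isinstance(value, list) else [value]
        for name in names:
            if name not in columns:
                problems.append(f"{field}: column '{name}' not found in data")
                continue
            n_missing = sum(1 for r in rows if r.get(name) is None)
            if n_missing:
                problems.append(f"{field}: column '{name}' has {n_missing} missing value(s)")
    return problems


# Hypothetical data and schema: one missing label, one misnamed feature.
rows = [{"id": "a", "pred": "fraud", "amount": 10.0},
        {"id": "b", "pred": None, "amount": 5.0}]
schema = {"prediction_id_column_name": "id",
          "prediction_label_column_name": "pred",
          "feature_column_names": ["amount", "merchant"]}
for problem in check_schema_columns(rows, schema):
    print(problem)
```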
If your schema looks right, there could be other problems with the data received:
Training and validation records must include both prediction and actual columns
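For training and validation data, you can find records that lack a prediction or an actual before uploading. The sketch below returns the indices of those records. It treats None and absent keys as missing. The helper name and column names are hypothetical.

```python
def rows_missing_pred_or_actual(rows, prediction_col, actual_col):
    """Return indices of records lacking a prediction or an actual (hypothetical).

    Intended for training/validation records, which should carry both values.
    A value counts as missing if it is None or its key is absent.
    """
    return [
        i for i, r in enumerate(rows)
        if r.get(prediction_col) is None or r.get(actual_col) is None
    ]


# Hypothetical training records: record 1 has no actual, record 2 has no prediction.
training_rows = [{"pred": 1, "actual": 1}, {"pred": 0}, {"pred": None, "actual": 0}]
print(rows_missing_pred_or_actual(training_rows, "pred", "actual"))  # [1, 2]
```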
A few things can go wrong if you send only actuals:
If you've never logged predictions for this model, upload prediction values with prediction IDs that match your actuals in order to view your model in Arize.
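Actuals only line up with predictions when the prediction IDs match exactly. This sketch lists the actual IDs that have no matching prediction ID. It deduplicates them and keeps their first-seen order. Comparison is exact, so it is type- and case-sensitive. For example, 42 and '42' do not match, and neither do 'ID-1' and 'id-1'. The helper name is hypothetical.

```python
def unmatched_actual_ids(prediction_ids, actual_ids):
    """Return actual IDs with no matching prediction ID (hypothetical helper).

    Exact comparison: types and letter case must match.
    Duplicates are reported once, in first-seen order.
    """
    known = set(prediction_ids)
    seen = set()
    unmatched = []
    for aid in actual_ids:
        if aid not in known and aid not in seen:
            unmatched.append(aid)
            seen.add(aid)
    return unmatched


print(unmatched_actual_ids(["a", "b", "c"], ["a", "B", "d", "d"]))  # ['B', 'd']
print(unmatched_actual_ids([42], ["42"]))  # ['42'] (int vs. str)
```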
Embedding Feature Errors
If you upload embedding vectors with more than 1500 dimensions, you may run into problems visualizing data within the platform. Reduce the dimensionality and re-upload your data.
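Before uploading embeddings, you can flag vectors that exceed the 1500-dimension limit stated above. This sketch reports which records are over the limit. It also lists the distinct vector lengths it saw, which makes inconsistent vectors easy to spot. The helper name is hypothetical. The check does not reduce dimensionality, so use a technique of your choice for that step.

```python
MAX_EMBEDDING_DIM = 1500  # limit stated in this guide


def embedding_dimension_report(vectors, max_dim=MAX_EMBEDDING_DIM):
    """Report embedding lengths (hypothetical helper).

    vectors: list of sequences of floats, one embedding per record.
    Returns the distinct lengths seen and the indices of over-limit vectors.
    """
    return {
        "dimensions_seen": sorted({len(v) for v in vectors}),
        "rows_over_limit": [i for i, v in enumerate(vectors) if len(v) > max_dim],
    }


vectors = [[0.1] * 1500, [0.0] * 2048, [1.0] * 1500]
print(embedding_dimension_report(vectors))
# {'dimensions_seen': [1500, 2048], 'rows_over_limit': [1]}
```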
For ranking models, if you upload duplicate ranks within the same prediction group ID, you may run into problems visualizing your data within the platform. Revise your data so that each rank is unique within a prediction group, and re-upload your data.
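To find the prediction groups that reuse a rank, group the records by prediction group ID and count the ranks in each group. This is a minimal sketch that assumes a list of dicts. The column names and the duplicate_ranks helper are hypothetical.

```python
from collections import Counter, defaultdict


def duplicate_ranks(rows, group_col, rank_col):
    """Return {group_id: sorted duplicated ranks} for groups reusing a rank.

    Hypothetical helper; groups with all-unique ranks are omitted.
    """
    ranks_by_group = defaultdict(list)
    for r in rows:
        ranks_by_group[r[group_col]].append(r[rank_col])
    result = {}
    for group, ranks in ranks_by_group.items():
        dupes = sorted(k for k, c in Counter(ranks).items() if c > 1)
        if dupes:
            result[group] = dupes
    return result


# Hypothetical ranking records: group "q1" uses rank 2 twice.
ranking_rows = [
    {"group_id": "q1", "rank": 1}, {"group_id": "q1", "rank": 2},
    {"group_id": "q1", "rank": 2}, {"group_id": "q2", "rank": 1},
    {"group_id": "q2", "rank": 2},
]
print(duplicate_ranks(ranking_rows, "group_id", "rank"))  # {'q1': [2]}
```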
Export your data from Arize to compare recalculated performance metrics against those shown on the 'Performance Tracing' tab.
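For a binary classification model, recalculating a few metrics from the exported labels gives you numbers to compare with the UI. This sketch computes accuracy, precision, and recall from a confusion-matrix count. It assumes the predicted and actual labels are aligned lists. The function name is hypothetical. Arize may filter rows or time windows differently, so small differences can reflect scope rather than an ingestion error.

```python
def binary_metrics(predicted, actual, positive_label):
    """Recalculate accuracy, precision, and recall (hypothetical helper).

    predicted, actual: equal-length label sequences, aligned by record.
    A metric is None when its denominator is zero.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p == positive_label and a == positive_label:
            tp += 1
        elif p == positive_label:
            fp += 1
        elif a == positive_label:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


predicted = ["fraud", "fraud", "ok", "ok", "ok"]
actual = ["fraud", "ok", "fraud", "ok", "ok"]
print(binary_metrics(predicted, actual, "fraud"))
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
```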
If the model type is incorrect, check that the correct model type was specified during data ingestion.
If you're using the Python Pandas SDK, set sync=True in your log call. When sync is set to True, the log call blocks (waits) until the platform has successfully ingested the data, then immediately returns the status of the log.
If the 'Data Ingestion' tab shows no data ingested for the expected period, but you received a 200 success response from the Python SDK or your import job passed, reach out to Arize support for help troubleshooting.
Model type and ingestion fields match - Ensure that you upload the expected prediction and actual values for your model type. Some model types require additional fields.
If you log delayed actuals, Arize joins them to their predictions by prediction ID once a day, at 5 AM UTC. Ensure you have mapped the correct prediction ID to your actuals. Refer to Arize's documentation on joining predictions and actuals for more information.
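Because the join runs once a day, delayed actuals may not appear right after upload. This sketch estimates the next scheduled 5 AM UTC run for a given upload time. It assumes a timezone-aware datetime, and it assumes an upload at exactly 05:00 UTC makes that day's run. Ingestion latency could push an upload to the following run, so treat the result as an estimate. The helper name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

JOIN_HOUR_UTC = 5  # daily join time stated in this guide


def next_join_time(sent_at):
    """Return the first 05:00 UTC run not earlier than sent_at (hypothetical).

    sent_at: timezone-aware datetime of the actuals upload.
    """
    sent_utc = sent_at.astimezone(timezone.utc)
    candidate = datetime.combine(sent_utc.date(), time(JOIN_HOUR_UTC), tzinfo=timezone.utc)
    if candidate < sent_utc:
        candidate += timedelta(days=1)  # today's run already passed
    return candidate


# Uploaded 22:00 at UTC-8, which is 06:00 UTC the next day, after that day's run.
pst = timezone(timedelta(hours=-8))
print(next_join_time(datetime(2024, 3, 1, 22, 0, tzinfo=pst)))
# 2024-03-03 05:00:00+00:00
```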