Troubleshoot Data Upload

How to troubleshoot common data ingestion issues

Check The Data Ingestion Tab

Let's check if Arize has received your data correctly! Navigate to the 'Data Ingestion' tab within your model. You should see bar charts representing the volume of data received for predictions, actuals, and feature importance values. Hover over the bars to confirm that the volume matches what you expect.

It can take ~1 minute for bar charts to appear and ~10 minutes for data to fully load in the 'Data Ingestion' tab and appear across the platform. We suggest grabbing a cup of coffee ☕ while you wait!

👍 Looks great! Verify Your Data

If you've waited and your data ingestion volume is as expected, perform a quick data ingestion check. Use the checklist to identify any data ingestion errors that must be corrected to use Arize successfully.

Arize takes a few minutes to ingest and index all of your data. If the number of predictions differs from what you're expecting to see after waiting a few minutes, check the number of records in your DataFrame or file/table.

It's common to accidentally send duplicate prediction IDs. If a prediction is sent with the same prediction ID as another prediction, Arize counts it as 2 predictions.
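Before comparing against the 'Data Ingestion' tab, it can help to count records and duplicate prediction IDs locally. The following is a minimal sketch, assuming your data is in a pandas DataFrame; the column name `prediction_id` and the helper `summarize_prediction_ids` are hypothetical and should be adapted to your own data.

```python
import pandas as pd


def summarize_prediction_ids(df: pd.DataFrame, id_col: str = "prediction_id") -> dict:
    """Count rows, unique IDs, and rows that share an ID with another row."""
    total_rows = len(df)
    unique_ids = int(df[id_col].nunique(dropna=True))
    # keep=False marks every member of a duplicate group, not only the repeats
    duplicated_rows = int(df[id_col].duplicated(keep=False).sum())
    return {
        "total_rows": total_rows,
        "unique_ids": unique_ids,
        "duplicated_rows": duplicated_rows,
    }


# Illustrative data only: the ID "b" appears twice
example = pd.DataFrame(
    {"prediction_id": ["a", "b", "b", "c"], "score": [0.1, 0.7, 0.7, 0.4]}
)
summary = summarize_prediction_ids(example)
print(summary)
```

If `total_rows` is larger than `unique_ids`, the duplicated rows are a likely reason for a higher-than-expected prediction count.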

👎 There's An Error In Data Received

If you're using the Python Pandas SDK, set sync = True in the log call. When sync is set to True, the log call will block, or wait, until the data has been successfully ingested by the platform and immediately return the status of the log.
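As a rough sketch of how the returned status might be inspected, the helper below assumes the log call returns a requests-style object with `status_code` and `text` attributes. The helper name `describe_log_response`, the stand-in objects, and the sample error text are hypothetical and are used here only so the example runs without credentials.

```python
from types import SimpleNamespace


def describe_log_response(response) -> str:
    """Turn a requests-style response into a one-line status message."""
    if response.status_code == 200:
        return "ingestion succeeded"
    return f"ingestion failed with status {response.status_code}: {response.text}"


# With the SDK, the response would come from your own log call made with
# sync=True. The objects below are stand-ins, not real SDK responses.
ok = SimpleNamespace(status_code=200, text="")
bad = SimpleNamespace(status_code=400, text="example error text")

print(describe_log_response(ok))
print(describe_log_response(bad))
```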

No Data Received

If the 'Data Ingestion' tab indicates no data ingested for the expected ingestion period, but you received a 200 success response via the Python SDK or your import job passes, reach out to Arize support via support@arize.com or Slack to help troubleshoot.

arize.utils.logging | INFO | Success! Check out your data at <link to model>

Some Data Received

If the 'Data Ingestion' tab shows values that deviate from what's expected, dig into potential ingestion issues based on what you sent.

Check Input Values

Most data ingestion errors come from misnamed columns, missing values, or missing fields for your model type:

  • Prediction labels and scores - Check that you mapped the prediction label and/or prediction score column correctly in the model schema, and ensure the contents of the columns represent expected values

  • Actual labels and scores - Check that you mapped the actual label and/or actual score column correctly in the model schema, and ensure the contents of the columns represent expected values

  • Tags and features - Check that you correctly batched tags and features together in your schema list and that the values within each feature/tag column are representative of what you intend to ingest

  • Model type and ingestion fields match - Ensure that you upload the expected prediction and actual values based on your model type. Some model types require additional fields (e.g. ranking, NLP, and CV model types)
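The checks in the list above can be partly automated before you upload. This is a minimal sketch, assuming your data is in a pandas DataFrame. The `column_mapping` dictionary is a hypothetical stand-in for the column names you pass to your schema, and `check_mapped_columns` is an illustrative helper, not part of the Arize SDK.

```python
import pandas as pd


def check_mapped_columns(df: pd.DataFrame, column_mapping: dict) -> list:
    """Report mapped columns that are absent or contain missing values."""
    problems = []
    for field, column in column_mapping.items():
        if column not in df.columns:
            problems.append(f"{field}: column '{column}' not found in DataFrame")
            continue
        missing = int(df[column].isna().sum())
        if missing:
            problems.append(f"{field}: column '{column}' has {missing} missing value(s)")
    return problems


# Illustrative data: one missing prediction label, and no score column at all
example = pd.DataFrame(
    {
        "prediction_id": ["a", "b", "c"],
        "pred_label": ["fraud", None, "ok"],
        "actual_label": ["fraud", "ok", "ok"],
    }
)
mapping = {
    "prediction_label": "pred_label",
    "actual_label": "actual_label",
    "prediction_score": "pred_score",  # misnamed on purpose
}
problems = check_mapped_columns(example, mapping)
for line in problems:
    print(line)
```

This only catches misnamed columns and missing values; whether the fields are sufficient for your model type still needs to be checked against the requirements for that type.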

Other Potential Problems

If your schema looks right, there could be other problems with the data received.

Training and Validation Record Errors

Training and validation records must include both prediction and actual columns.

Actuals Errors

There are a few things that can go wrong if you send only actuals:

  • If you log delayed actuals, Arize joins delayed actuals with prediction IDs in the platform at 5 AM UTC daily. Ensure you have mapped the correct prediction ID to your actuals. See here for more information about joining predictions and actuals.

  • If you’ve never logged predictions for this model, upload prediction values with corresponding prediction IDs to your actuals to view your model in Arize.
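To confirm locally that your actuals point at prediction IDs you have sent, you could compare the two sets of IDs before uploading. The sketch below assumes both datasets are available as pandas DataFrames sharing a `prediction_id` column; the column name and the helper `unmatched_actuals` are hypothetical. It is a local sanity check and does not reproduce the join performed in the platform.

```python
import pandas as pd


def unmatched_actuals(
    predictions: pd.DataFrame, actuals: pd.DataFrame, id_col: str = "prediction_id"
) -> list:
    """Return the IDs in `actuals` that have no matching prediction ID."""
    known_ids = predictions[[id_col]].drop_duplicates()
    # indicator=True adds a "_merge" column showing where each row was found
    merged = actuals.merge(known_ids, on=id_col, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", id_col].tolist()


# Illustrative data: the actual for "z" has no corresponding prediction
predictions = pd.DataFrame(
    {"prediction_id": ["a", "b", "c"], "pred_label": ["ok", "fraud", "ok"]}
)
actuals = pd.DataFrame(
    {"prediction_id": ["a", "c", "z"], "actual_label": ["ok", "ok", "fraud"]}
)
missing = unmatched_actuals(predictions, actuals)
print(missing)
```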

Embeddings Features Errors

If you upload embedding vectors with more than 1500 dimensions, you may run into problems visualizing data within the platform. Reduce the number of dimensions and re-upload your data.
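A quick way to find oversized vectors before uploading is to measure each one. This is a minimal sketch, assuming each embedding is stored as a list (or other sized sequence) in a pandas DataFrame column; the column name `text_vector` and the helper `oversized_embeddings` are hypothetical, and the 1500 limit is taken from the text above.

```python
import pandas as pd

MAX_DIMENSIONS = 1500  # limit quoted in the text above


def oversized_embeddings(
    df: pd.DataFrame, embedding_col: str, max_dims: int = MAX_DIMENSIONS
) -> list:
    """Return index labels of rows whose embedding exceeds `max_dims` entries."""
    lengths = df[embedding_col].map(len)
    return df.index[lengths > max_dims].tolist()


# Illustrative data: the second vector has 2048 entries
example = pd.DataFrame(
    {
        "prediction_id": ["a", "b"],
        "text_vector": [[0.0] * 768, [0.0] * 2048],
    }
)
too_long = oversized_embeddings(example, "text_vector")
print(too_long)
```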

Rank Errors

If you upload the same rank more than once for the same prediction group ID, you may run into problems visualizing your data within the platform. Revise your data so that each rank is unique within a given prediction group and re-upload your data.
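Repeated ranks within a group can be located locally before re-uploading. The sketch below assumes a pandas DataFrame with one row per ranked item; the column names `prediction_group_id` and `rank` and the helper `repeated_ranks` are hypothetical and should match your own schema.

```python
import pandas as pd


def repeated_ranks(
    df: pd.DataFrame,
    group_col: str = "prediction_group_id",
    rank_col: str = "rank",
) -> pd.DataFrame:
    """Return every row whose (group, rank) pair occurs more than once."""
    clashes = df[df.duplicated(subset=[group_col, rank_col], keep=False)]
    return clashes.sort_values([group_col, rank_col])


# Illustrative data: group "g1" uses rank 2 twice
example = pd.DataFrame(
    {
        "prediction_group_id": ["g1", "g1", "g1", "g2", "g2"],
        "rank": [1, 2, 2, 1, 2],
    }
)
clashes = repeated_ranks(example)
print(clashes)
```

An empty result means every rank is unique within its prediction group.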


Copyright © 2023 Arize AI, Inc