Troubleshoot Data Upload
How to troubleshoot common data ingestion issues
Let's check if Arize has received your model's data correctly! Navigate to the 'Data Ingestion' tab within your model. You should see bar charts representing the volume of data received for predictions, actuals, and feature importance values. Hover over the bars to confirm the volume matches what you expect.
It can take ~1 minute for bar charts to appear and ~10 minutes for data to fully load in the 'Data Ingestion' tab and appear across the platform. We suggest grabbing a cup of coffee ☕ while you wait!
If you've waited and your data ingestion volume is as expected, perform a quick data ingestion check. Use the checklist to identify any data ingestion errors that must be corrected to use Arize successfully.
Verify predictions and actuals on the 'Data Ingestion' or 'Datasets' tab
Cardinality of prediction/actual class
Distribution of prediction/actual scores
The amount of data Arize received matches the number of predictions you sent
Arize takes a few minutes to ingest and index all of your data. If the number of predictions differs from what you're expecting to see after waiting a few minutes, check the number of records in your DataFrame or file/table.
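As a quick pre-upload sanity check, you can run these verifications against the DataFrame you plan to send. This is an illustrative sketch; the file name and column names (prediction_label, actual_label, prediction_score, actual_score) are placeholders for your own.

```python
import pandas as pd

# Hypothetical file and column names; substitute your own data.
df = pd.read_parquet("predictions.parquet")

# Record count: this should match the prediction volume shown on the 'Data Ingestion' tab.
print(f"Records to send: {len(df)}")

# Cardinality of prediction/actual classes.
print(f"Prediction classes: {df['prediction_label'].nunique()}")
print(f"Actual classes: {df['actual_label'].nunique()}")

# Distribution of prediction/actual scores.
print(df[["prediction_score", "actual_score"]].describe())
```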
Verify performance metrics on the 'Performance Tracing' tab
If actuals are sent separately, verify that the prediction ID used for the prediction matches the prediction ID used for the actual
Check the prediction time range
Export the dataset from Arize to compare recalculated performance metrics
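A recalculation against an exported dataset might look like the sketch below, assuming the export contains prediction_label and actual_label columns (the file name and column names are placeholders, and accuracy stands in for whatever metric your model uses).

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical export file; replace with the dataset you exported from Arize.
exported = pd.read_csv("arize_export.csv")

# Recompute a simple metric offline and compare it with the value shown
# on the 'Performance Tracing' tab for the same prediction time range.
acc = accuracy_score(exported["actual_label"], exported["prediction_label"])
print(f"Recalculated accuracy: {acc:.3f}")
```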
It's common to accidentally send duplicate prediction IDs. If a prediction is sent with the same prediction ID as another prediction, it will be counted as two predictions in Arize.
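A quick way to catch this before logging is to look for duplicated prediction IDs in your DataFrame, as in this illustrative sketch (the data and column name are placeholders):

```python
import pandas as pd

# Illustrative DataFrame; use your own prediction data here.
df = pd.DataFrame({"prediction_id": ["a1", "a2", "a2", "a3"]})

# Flag every row whose prediction ID appears more than once.
dupes = df["prediction_id"].duplicated(keep=False)
print(f"Rows with duplicated prediction IDs: {dupes.sum()}")
print(df.loc[dupes, "prediction_id"].value_counts())
```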
Verify the model type (i.e., ranking, regression, etc.) next to the model’s name
If the model type is incorrect, check that the correct model type was specified during data ingestion
If you're using the Python Pandas SDK, set sync = True in the log call. When sync is set to True, the log call will block, or wait, until the data has been successfully ingested by the platform and then return the status of the log.
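A minimal sketch of a blocking log call is shown below. The credentials, model details, and column names are placeholders, and exact parameter names may vary slightly between SDK versions.

```python
import pandas as pd
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes

# Placeholder credentials and data; substitute your own.
arize_client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")
df = pd.read_parquet("predictions.parquet")

schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
)

# sync=True blocks until ingestion finishes, then returns the status of the log.
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="my-model",
    model_version="v1",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    sync=True,
)
print(response.status_code)  # 200 indicates success
```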
If the 'Data Ingestion' tab indicates no data ingested for the expected ingestion period, but you received a 200 success response via the Python SDK or your import job passes, reach out to Arize support via support@arize.com or Slack to help troubleshoot.
If the 'Data Ingestion' tab shows values that deviate from what's expected, dig into potential ingestion issues based on what you sent.
Most data ingestion errors come from misnamed columns, missing values, or missing fields for your model type:
Prediction labels and scores - Check that you mapped the prediction label and/or prediction score column correctly in the model schema, and ensure the contents of the columns represent expected values
Actual labels and scores - Check that you mapped the actual label and/or actual score column correctly in the model schema, and ensure the contents of the columns represent expected values
Tags and features - Check that you correctly grouped tags and features in your schema and that the values within each feature/tag column are representative of what you intend to ingest (see the schema sketch below)
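For reference, a schema that maps these fields with the Python Pandas SDK might look like the following sketch; all column names are illustrative placeholders for your own.

```python
from arize.pandas.logger import Schema

# Map each schema field to the matching column in your DataFrame.
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    feature_column_names=["age", "state", "merchant_type"],
    tag_column_names=["business_unit"],
)
```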
If your schema looks right, there may be other problems with the data received:
Ensure Training and Validation records include both prediction and actual columns
There are a few things that can go wrong if you just send in actuals:
If you log delayed actuals, Arize joins delayed actuals with prediction IDs in the platform at 5 AM UTC daily. Ensure you have mapped the correct prediction ID to your actuals. See here for more information about joining predictions and actuals.
If you’ve never logged predictions for this model, upload prediction values with prediction IDs that correspond to your actuals in order to view your model in Arize.
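A sketch of logging delayed actuals on their own is shown below; the credentials, model details, and column names are placeholders. The important part is that prediction_id matches the IDs you originally logged with your predictions so the daily join can connect them.

```python
import pandas as pd
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes

# Placeholder credentials; substitute your own.
arize_client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")

# Hypothetical delayed actuals; each prediction_id must match an ID that was
# logged with the original prediction so Arize can join them.
actuals_df = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "actual_label": ["fraud", "not_fraud", "fraud"],
})

actuals_schema = Schema(
    prediction_id_column_name="prediction_id",
    actual_label_column_name="actual_label",
)

response = arize_client.log(
    dataframe=actuals_df,
    schema=actuals_schema,
    model_id="my-model",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
)
print(response.status_code)
```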
Embedding Features Errors
If you upload embeddings with vectors longer than 1,500 dimensions, you may run into problems visualizing data within the platform. Reduce the vector length and re-upload your data.
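One way to catch this before uploading is to check the embedding vector length in your DataFrame, as in the illustrative sketch below; the column name is a placeholder, and truncation is just one option for reducing length (PCA or a smaller embedding model also work).

```python
import pandas as pd

# Illustrative embedding column; each cell holds one embedding vector.
df = pd.DataFrame({"embedding_vector": [[0.1] * 2048, [0.2] * 2048]})

# Vectors longer than 1,500 dimensions can cause visualization issues,
# so check the maximum length before uploading.
max_dim = df["embedding_vector"].apply(len).max()
print(f"Max embedding length: {max_dim}")

# Crude reduction by truncation; consider PCA or a smaller model instead.
df["embedding_vector"] = df["embedding_vector"].apply(lambda v: v[:1500])
```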
If you upload multiple rows with the same rank for the same prediction group ID (for ranking models), you may run into problems visualizing your data within the platform. Revise your data so each rank is unique within a given prediction group and re-upload your data.
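You can check for duplicated ranks per prediction group before uploading, as in this illustrative sketch (the data and column names are placeholders):

```python
import pandas as pd

# Illustrative ranking data; each (prediction_group_id, rank) pair should be unique.
df = pd.DataFrame({
    "prediction_group_id": ["g1", "g1", "g1", "g2"],
    "rank": [1, 2, 2, 1],  # group g1 has a duplicated rank of 2
})

# Find rows that share both a prediction group ID and a rank.
dupes = df[df.duplicated(subset=["prediction_group_id", "rank"], keep=False)]
print(dupes)
```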