FAQ & Troubleshoot Data Upload
How to troubleshoot common data ingestion issues
Let's check that Arize has received your model data correctly! Navigate to the 'Data Ingestion' tab within your model. You should see bar charts representing the volume of data received for predictions, actuals, and feature importance values. Hover over the bars to confirm the volume matches what you expect.
Data Ingestion Visualization
It can take ~1 minute for bar charts to appear and ~10 minutes for data to fully load in the 'Data Ingestion' tab and appear across the platform. We suggest grabbing a cup of coffee ☕ while you wait!
If you've waited and your data ingestion volume is as expected, perform a quick data ingestion check. Use the checklist to identify any data ingestion errors that must be corrected to use Arize successfully.
- Verify features and tags on the 'Overview' or 'Datasets' tab
- Data types (numeric & categorical) - If a feature has the wrong data type, verify that the feature is represented correctly in the DataFrame/file/table that was ingested.
- Check missing values - If a feature or tag has missing values, verify that this is expected. If missing values aren't expected, check whether they are also present in the input DataFrame/file/table. If they are, check your upstream data sources.
- Verify feature cardinality - Are there features with a cardinality of 1, or an unusually high cardinality? If a feature cardinality appears incorrect, verify the number of unique values for that feature in the input DataFrame/file/table.
- Verify predictions and actuals on the 'Data Ingestion' or 'Datasets' tab
- Cardinality of prediction/actual class
- Distribution of prediction/actual scores
- The number of predictions Arize received matches the number of predictions sent
Arize takes a few minutes to ingest and index all of your data. If the number of predictions differs from what you're expecting to see after waiting a few minutes, check the number of records in your DataFrame or file/table.
- Verify performance metrics on the 'Performance Tracing' tab
- If actuals are sent separately, verify that the prediction ID used for the prediction matches the prediction ID used for the actual
- Check the prediction time range
It's common to accidentally send duplicate prediction IDs. If a prediction is sent with the same prediction ID as another prediction, it is counted as 2 predictions in Arize.
- Verify the model type (e.g., ranking, regression) next to the model's name
- If the model type is incorrect, check that the correct model type was specified during data ingestion
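Several items in the checklist above can be checked against the input DataFrame before or after upload. The following is a minimal pandas sketch under stated assumptions: the helper names and the column names (prediction_id, age, state) are hypothetical, and the data is made up for illustration.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and cardinality for each column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "cardinality": df.nunique(dropna=True),
        }
    )


def duplicated_prediction_ids(df: pd.DataFrame, id_col: str = "prediction_id") -> pd.Series:
    """Return the prediction IDs that occur more than once, with their counts."""
    counts = df[id_col].value_counts()
    return counts[counts > 1]


# Hypothetical data; the column names are placeholders.
df = pd.DataFrame(
    {
        "prediction_id": ["a", "b", "b", "c"],
        "age": [31, 45, 45, None],
        "state": ["CA", "CA", "CA", "CA"],
    }
)

summary = summarize_columns(df)
print(f"rows: {len(df)}")              # compare with the volume shown in Arize
print(summary)                         # data types, missing values, cardinality
print(duplicated_prediction_ids(df))   # IDs that would be counted more than once
```

In this made-up data, the state column has a cardinality of 1 and the prediction ID "b" appears twice, both of which the checklist flags as worth investigating.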
If you're using the Python Pandas SDK, set sync=True in the log call. When sync is set to True, the log call blocks (waits) until the platform has successfully ingested the data, then immediately returns the status of the log.
If the 'Data Ingestion' tab shows no data ingested for the expected ingestion period, but you received a 200 success response via the Python SDK or your import job passed, reach out to Arize support by email or Slack for help troubleshooting.
arize.utils.logging | INFO | Success! Check out your data at <link to model>
Successful File Import Example
If the 'Data Ingestion' tab shows values that deviate from what's expected, dig into potential ingestion issues based on what you sent.
Most data ingestion errors come from misnamed columns, missing values, or missing fields for your model type:
- Prediction labels and scores - Check that you mapped the prediction label and/or prediction score column correctly in the model schema, and ensure the contents of the columns represent expected values
- Actual labels and scores - Check that you mapped the actual label and/or actual score column correctly in the model schema, and ensure the contents of the columns represent expected values
- Tags and features - Check that you correctly batched tags and features together in your schema list and that the values within each feature/tag column are representative of what you intend to ingest
- Model type and ingestion fields match - Ensure that you upload the expected prediction and actual values based on your model type. Some model types require additional fields (e.g., ranking, NLP, and CV model types)
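The column-mapping checks above can be sketched as a quick pre-upload test. This is a minimal illustration, not the Arize SDK: the schema here is a plain dictionary, and the helper name, field names, and column names are all hypothetical placeholders for whatever you pass to your own schema.

```python
import pandas as pd


def missing_schema_columns(df: pd.DataFrame, schema: dict) -> list:
    """Return schema entries that name a column absent from the DataFrame."""
    missing = []
    for field, columns in schema.items():
        # A field maps to either a single column name or a list of names.
        names = [columns] if isinstance(columns, str) else list(columns)
        missing.extend(f"{field}: {name}" for name in names if name not in df.columns)
    return missing


# Hypothetical DataFrame and mapping (assumed names, not an Arize Schema object).
df = pd.DataFrame({"pred_label": ["cat"], "actual_label": ["dog"], "age": [3]})
schema = {
    "prediction_label": "pred_label",
    "actual_label": "actual",       # typo: the DataFrame column is "actual_label"
    "features": ["age", "state"],   # "state" was never added to the DataFrame
}

problems = missing_schema_columns(df, schema)
print(problems)
```

Any entry in the returned list points to a mapped name that does not exist in the data, which is the kind of misnamed-column error described above.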
If your schema looks right, there could be other problems with the data received.
Ensure training and validation records include both prediction and actual columns
There are a few things that can go wrong if you just send in actuals:
- If you log delayed actuals, Arize joins delayed actuals with prediction IDs in the platform at 5 AM UTC daily. Ensure you have mapped the correct prediction ID to your actuals. See here for more information about joining predictions and actuals.
- If you've never logged predictions for this model, upload predictions whose prediction IDs correspond to your actuals so you can view your model in Arize.
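To confirm that every actual has a prediction to join to, you can compare the two ID sets locally. This is a minimal pandas sketch: the function name and the prediction_id column name are assumptions, and it only mirrors the ID matching described above, not how the platform performs the join.

```python
import pandas as pd


def unmatched_actual_ids(predictions: pd.DataFrame, actuals: pd.DataFrame,
                         id_col: str = "prediction_id") -> list:
    """Return actual IDs that have no matching prediction ID."""
    merged = actuals[[id_col]].drop_duplicates().merge(
        predictions[[id_col]].drop_duplicates(),
        on=id_col,
        how="left",
        indicator=True,  # adds a "_merge" column describing where each row matched
    )
    return merged.loc[merged["_merge"] == "left_only", id_col].tolist()


# Hypothetical data for illustration.
predictions = pd.DataFrame({"prediction_id": ["p1", "p2", "p3"], "score": [0.1, 0.7, 0.4]})
actuals = pd.DataFrame({"prediction_id": ["p2", "p4"], "actual": [1, 0]})

orphans = unmatched_actual_ids(predictions, actuals)
print(orphans)
```

An empty list means every actual has a corresponding prediction ID; any ID returned is an actual that has nothing to join to.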
Embeddings Features Errors
If you upload embedding vectors with more than 1,500 dimensions, you may run into problems visualizing data within the platform. Reduce the number of dimensions and re-upload your data.
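A quick way to find oversized vectors before uploading is to measure each one. This is a minimal sketch, assuming each embedding is stored as a list (or array) in a single DataFrame column; the function name and the column name embedding are hypothetical.

```python
import pandas as pd

MAX_DIMENSIONS = 1500  # the limit quoted in the text above


def oversized_embeddings(df: pd.DataFrame, vector_col: str,
                         max_dims: int = MAX_DIMENSIONS) -> list:
    """Return the index labels of rows whose embedding vector is too long."""
    lengths = df[vector_col].map(len)
    return df.index[lengths > max_dims].tolist()


# Hypothetical data: one 768-dimension vector and one 2048-dimension vector.
df = pd.DataFrame({"embedding": [[0.0] * 768, [0.0] * 2048]})

too_long = oversized_embeddings(df, "embedding")
print(too_long)
```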
If you upload duplicate ranks for the same prediction group ID, you may run into problems visualizing your data within the platform. Revise your data so that each rank is unique within a given prediction group, then re-upload your data.
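Repeated ranks within a group can be located with a duplicate check on the group and rank columns together. This is a minimal pandas sketch; the function name and the column names prediction_group_id and rank are assumptions that should be replaced with the names in your own data.

```python
import pandas as pd


def duplicate_ranks(df: pd.DataFrame, group_col: str = "prediction_group_id",
                    rank_col: str = "rank") -> pd.DataFrame:
    """Return every row whose rank is repeated within its prediction group."""
    mask = df.duplicated(subset=[group_col, rank_col], keep=False)
    return df[mask]


# Hypothetical data: group g1 uses rank 2 twice, group g2 is clean.
df = pd.DataFrame(
    {
        "prediction_group_id": ["g1", "g1", "g1", "g2", "g2"],
        "rank": [1, 2, 2, 1, 2],
    }
)

conflicts = duplicate_ranks(df)
print(conflicts)
```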