SDK Data Ingestion FAQ
Frequently asked questions about data ingestion. For file importer related questions, please view the file importer troubleshooting portion of the documentation.
Arize uses the `prediction_id` field to join an actual back to its corresponding prediction, either at a later time or right away if you already know the ground truth for a prediction. If an actual's `prediction_id` does not match the `prediction_id` of a previously sent prediction, the actual will not be displayed even though Arize received it.

Your model and predictions will usually show up immediately when you log them to the Arize platform. How long actuals take to show up depends on how they were sent:
- Together with predictions - in this case you can expect to see actuals, as well as performance metrics, usually within 10 minutes of being received by Arize.
- Delayed - if you send actuals at a later time (since they might be unknown at inference), we match them to their corresponding prediction once per day.
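The matching of delayed actuals can be pictured as a left join on `prediction_id`. The snippet below is a conceptual pandas sketch of that behavior, not the SDK's actual implementation; the IDs and labels are made up.

```python
import pandas as pd

# Predictions already logged to Arize (illustrative data)
predictions = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "prediction_label": ["fraud", "not_fraud", "fraud"],
})

# Actuals arriving later; "zz9" matches no prediction_id,
# so it would never be displayed by Arize
actuals = pd.DataFrame({
    "prediction_id": ["a1", "a3", "zz9"],
    "actual_label": ["fraud", "not_fraud", "fraud"],
})

# A left join on prediction_id mirrors the matching behavior:
# every prediction is kept; unmatched actuals are dropped
joined = predictions.merge(actuals, on="prediction_id", how="left")
```

Note that `a2` ends up with an empty actual (no match yet), while the orphan actual `zz9` disappears entirely, which is exactly the "actual will not be displayed" case described above.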
Arize looks back 14 days to match an actual to its corresponding prediction.

When you log prediction data into Arize using the SDK, there are two ways to check status, depending on which mode is chosen:
- Batch: the pandas logger, `arize.pandas`, returns a response, so you can check status with `response.status_code`.
- Real-time: the logger `arize.log` returns a future. See example:

```python
import concurrent.futures as cf

def arize_responses_helper(responses):
    """
    responses: a list of responses from Arize
    returns: None
    """
    for response in cf.as_completed(responses):
        res = response.result()
        if res.status_code != 200:
            raise ValueError(f'failed with code {res.status_code}, {res.text}')

# Logging to Arize returns a list of responses
responses = arize.log(...)  # your log call

# Check responses!
arize_responses_helper(responses)
```
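To see how the helper above behaves without hitting Arize, you can exercise it with stand-in futures. This is a hypothetical sketch: `FakeResponse` is a local stand-in for an SDK response object, not part of the Arize SDK.

```python
import concurrent.futures as cf
from dataclasses import dataclass

@dataclass
class FakeResponse:
    """Stand-in for an SDK response object; NOT part of the Arize SDK."""
    status_code: int
    text: str = ""

def arize_responses_helper(responses):
    for response in cf.as_completed(responses):
        res = response.result()
        if res.status_code != 200:
            raise ValueError(f'failed with code {res.status_code}, {res.text}')

with cf.ThreadPoolExecutor() as pool:
    # All 200s: the helper returns quietly
    ok = [pool.submit(FakeResponse, 200) for _ in range(3)]
    arize_responses_helper(ok)

    # A non-200 response raises, surfacing the failed request
    bad = [pool.submit(FakeResponse, 500, "server error")]
    try:
        arize_responses_helper(bad)
    except ValueError as err:
        print(err)  # failed with code 500, server error
```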
After receiving a `200` response code, head over to your model's Data Ingestion Tab to confirm that Arize has received your data. Note that the model's inferences are indexed by the received timestamp, NOT the timestamp of the inferences.

Please reach out to our team if you encounter data issues.
A common issue is mismatched IDs between predictions and actuals. One way to troubleshoot this yourself is with our data export feature.
At the top-right corner of every widget, there is an option to export data from the widget. You will receive a link to the data export in your email. The exported data arrives with a Google Colab notebook where you can use pandas DataFrames to view the raw data sent into the platform. This is helpful for troubleshooting the count of matched actuals.
In some cases, you may want to export the data to ensure the correct data is inside of the Arize platform or to troubleshoot your data.
First set the date range for the data you wish to export.

Next, you can navigate to your dashboard and export predictions and actuals from your model.

An email containing the data export will be sent to the user's email address. You can copy the link to open the associated Colab.

Next, you can paste your data URL into the Colab.

Following the rest of the Colab, the data will be transformed into a pandas DataFrame.

Predictions sent with the same prediction ID are treated as separate observations: 2 predictions sent with the same prediction ID count as 2 predictions. If an actual is sent for each of those 2 predictions, they show up as 2 separate predictions, each with a corresponding matching actual.
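If you want to spot repeated prediction IDs in a DataFrame before logging, a quick pandas check works (illustrative data and column names):

```python
import pandas as pd

df = pd.DataFrame({
    "prediction_id": ["a1", "a1", "b2"],
    "prediction_label": ["fraud", "fraud", "not_fraud"],
})

# Each row counts as its own prediction, even with a repeated ID;
# surface any IDs that appear more than once
counts = df["prediction_id"].value_counts()
duplicated = counts[counts > 1]
```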
We currently support the following data types for the corresponding columns.

Column Type | Supported Data Types
---|---
Features | int, float, str, bool
Prediction ID | int, str
Prediction Timestamps | int, float, date, datetime
Prediction Score | int, float
Actual Score | int, float
SHAP values | int, float
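Before logging a DataFrame, you can sanity-check its dtypes against the table above. The mapping below is an illustrative sketch that mirrors the table for a few assumed column names; it is not an SDK API.

```python
import pandas as pd

# Allowed pandas dtypes per column, mirroring the table above
# (illustrative mapping and column names, not an SDK API)
ALLOWED_DTYPES = {
    "prediction_id": {"int64", "object"},                     # int, str
    "prediction_ts": {"int64", "float64", "datetime64[ns]"},  # int, float, datetime
    "prediction_score": {"int64", "float64"},                 # int, float
}

df = pd.DataFrame({
    "prediction_id": ["a1", "a2"],
    "prediction_ts": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "prediction_score": [0.9, 0.1],
})

# Collect any columns whose dtype falls outside the supported set
problems = [
    f"{col}: {df[col].dtype}"
    for col, allowed in ALLOWED_DTYPES.items()
    if str(df[col].dtype) not in allowed
]
```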
Supported data types for Prediction and Actual labels and scores depend on the model type.

Column Type | Score Categorical | Numeric
---|---|---
Prediction Label | str | int, float
Actual Label | str | int, float
Prediction Score | int, float | NA
Actual Score | int, float | NA
Actual Numeric Sequence | List[int, float] | NA
Missing values are generally not allowed, i.e. they would cause the dataset to be rejected, with the exceptions being Prediction Score and Actual Score, which are treated as empty if missing/omitted.
Field | arize.pandas | arize.log()
---|---|---
Prediction Label | Not allowed | Not allowed
Prediction Score | Treated as empty | Treated as empty
Actual Label | Not allowed | Not allowed
Actual Score | Treated as empty | Treated as empty
Field | arize.pandas | arize.log()
---|---|---
Prediction Label | Not allowed | Not allowed
Actual Label | Not allowed | Not allowed
They are accepted and treated as empty.
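Because a missing value in a label column rejects the dataset while a missing score is merely treated as empty, a pre-flight null check on the required columns can save a failed upload. This is an illustrative sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "prediction_label": ["fraud", None, "not_fraud"],
    "prediction_score": [0.9, None, 0.2],  # nulls here are fine: treated as empty
})

# Labels may not be missing, so flag those rows before logging
required = ["prediction_id", "prediction_label"]
bad_rows = df[df[required].isna().any(axis=1)]
```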
When predictions and actuals are logged separately, Arize runs a daily joiner job at 5 am UTC to join them based on the `prediction_id`.
Currently the prediction label column, i.e. `prediction_label_column_name`, is a mandatory column for data ingestion for classification models, i.e. `ModelTypes.SCORE_CATEGORICAL`. Due to this requirement, the prediction label column can be populated with a constant or randomized label value. Note that model performance metrics that do not depend on the prediction label (e.g. log loss) will still be calculated for analytics in Arize. On the other hand, metrics that do depend on the prediction label (e.g. accuracy) will be based on what you send in that column. This applies when ingesting data through the SDK and the File Importer.
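If you have no meaningful prediction labels (for example, when only scores are available), the mandatory column can be filled with a constant placeholder before logging. An illustrative sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "prediction_id": ["a1", "a2"],
    "prediction_score": [0.73, 0.12],
})

# A constant placeholder satisfies the mandatory prediction label column.
# Label-dependent metrics (e.g. accuracy) will reflect the placeholder,
# while score-based metrics (e.g. log loss) are unaffected.
df["prediction_label"] = "placeholder"
```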