SDK Data Ingestion FAQ

Frequently asked questions about data ingestion. For file importer related questions, please view the file importer troubleshooting portion of the documentation.

1. I've sent actuals, but I don't see them show up on Arize.

Arize uses the prediction_id field to join each actual to its corresponding prediction, either at a later time or right away if you already know the ground truth for a prediction.
If an actual's prediction_id does not match the prediction_id of a previously sent prediction, the actual is not displayed, even though Arize received it.
Your model and predictions will usually show up immediately when you log them to the Arize platform. The time that it takes actuals to show up depends on the way they were sent:
  • Together with predictions - in this case you can expect to see actuals, as well as performance metrics, usually 10 minutes after being received by Arize.
  • Delayed - if you send actuals at a later time, since they might be unknown at inference, we match them to their corresponding prediction once per day.
Arize looks back 14 days to match an actual to its corresponding prediction.
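The 14-day lookback can be pictured with a small local check. This is only an illustrative sketch: the function name within_lookback is hypothetical, and it assumes the window is measured from the actual's timestamp back to the prediction's timestamp, which this FAQ does not spell out.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=14)  # lookback window stated in this FAQ


def within_lookback(prediction_time: datetime, actual_time: datetime) -> bool:
    """Return True if the prediction is at most 14 days older than the actual.

    Assumption: the window is measured between these two timestamps.
    """
    age = actual_time - prediction_time
    return timedelta(0) <= age <= LOOKBACK


pred = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(within_lookback(pred, pred + timedelta(days=3)))   # True
print(within_lookback(pred, pred + timedelta(days=20)))  # False
```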

2. Did Arize receive my prediction/actual record?

When you log prediction data to Arize using the SDK, there are two ways to check the status, depending on which mode you use.
Batch: The pandas logger (arize.pandas) returns a response, so you can check the status with response.status_code.
Real-time: The logger (arize.log) returns a future. See the example:
import concurrent.futures as cf


def arize_responses_helper(responses):
    """
    responses: a list of responses from Arize
    returns: None
    """
    for response in cf.as_completed(responses):
        res = response.result()
        if res.status_code != 200:
            raise ValueError(f'failed with code {res.status_code}, {res.text}')


# Logging to Arize, returns a list of responses
responses = arize.log(...)  # your log call
# Check responses!
arize_responses_helper(responses)
After receiving a 200 response code, head over to your model's Data Ingestion Tab to confirm that Arize has received your data.
The model's inferences are indexed by the received timestamp, NOT the timestamp of the inferences.
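For the batch case, the same status check can be wrapped in a small helper. This is a sketch, not part of the SDK: check_batch_response is a hypothetical name, and it assumes only that the returned response exposes status_code and text, as in the real-time example. A stand-in object is used here so the snippet runs without the SDK.

```python
from types import SimpleNamespace


def check_batch_response(response) -> None:
    """Raise ValueError if a batch logging call did not return HTTP 200."""
    if response.status_code != 200:
        # `text` is assumed to carry the error message, if any
        detail = getattr(response, "text", "")
        raise ValueError(f"logging failed with code {response.status_code}: {detail}")


# Stand-in for illustration only; in practice, pass the pandas logger's return value.
fake_ok = SimpleNamespace(status_code=200, text="")
check_batch_response(fake_ok)  # passes silently
```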

3. How do I troubleshoot if I don't see my predictions/actual record in the Arize Platform?

Please reach out to our team if your team encounters data issues.
A common issue is mismatched IDs between predictions and actuals. One way to troubleshoot this yourself is using our data export feature.
At the top-right corner of every widget, there is an option to export the widget's data. You will receive an email with a link to the data export. The exported data comes with a Google Colab notebook in which you can use pandas DataFrames to view the raw data sent to the platform. This is helpful for troubleshooting the count of matched actuals.
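Once the export is loaded, one quick way to spot mismatched IDs is to compare the two sets of prediction IDs. A minimal sketch, assuming you have pulled the IDs into plain Python lists (for example from DataFrame columns) and that IDs are compared as exact strings; the function name and sample values are made up for illustration.

```python
def unmatched_actuals(prediction_ids, actual_ids):
    """Return the actual IDs (sorted) that have no prediction with the same ID."""
    known = set(prediction_ids)
    return sorted(set(actual_ids) - known)


prediction_ids = ["a1", "a2", "a3"]
actual_ids = ["a2", "a3", "A1"]  # "A1" differs in case, so an exact comparison misses it

print(unmatched_actuals(prediction_ids, actual_ids))  # ['A1']
```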

Data Export

In some cases, you may want to export the data to ensure the correct data is inside of the Arize platform or to troubleshoot your data.
First, set the date range for the data you wish to export.
Next, navigate to your dashboard and export predictions and actuals from your model.
An email with the data export is sent to the address of the user who exported the data. You can copy the link to open the associated Colab.
Then paste the data URL into the Colab.
Following the rest of the Colab transforms the data into a pandas DataFrame.

4. What happens if we upload the same data with the same prediction ID twice? Does Arize treat that as one prediction/observation or as two?

They are treated as separate observations: two predictions sent with the same prediction ID count as two predictions. If an actual was sent for both predictions, they show up as two separate predictions, each with a corresponding matching actual.
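Because duplicates are counted separately, it can help to check for repeated prediction IDs before sending. A small sketch using only the standard library; the function name and sample IDs are hypothetical.

```python
from collections import Counter


def duplicate_prediction_ids(prediction_ids):
    """Return {prediction_id: count} for IDs that appear more than once."""
    counts = Counter(prediction_ids)
    return {pid: n for pid, n in counts.items() if n > 1}


ids = ["p1", "p2", "p1", "p3", "p2", "p1"]
print(duplicate_prediction_ids(ids))  # {'p1': 3, 'p2': 2}
```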

5. What are the Supported Data Types for the Python SDK?

We currently support the following data types for the corresponding columns.
| Column Type | Supported Data Types |
| --- | --- |
| Features | int, float, str, bool |
| Prediction ID | int, str |
| Prediction Timestamps | int, float, date, datetime |
| Prediction Score | int, float |
| Actual Score | int, float |
| SHAP values | int, float |
Supported data types for prediction and actual labels and scores depend on the model type.
| Column Type | Score Categorical | Numeric |
| --- | --- | --- |
| Prediction Label | str | int, float |
| Actual Label | str | int, float |
| Prediction Score | int, float | NA |
| Actual Score | int, float | NA |
| Actual Numeric Sequence | List[int, float] | NA |
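The general column types listed for this question can be turned into a quick pre-flight check. This is a sketch under assumptions, not an SDK feature: the dictionary keys and the is_supported helper are hypothetical names, and rejecting bool where only int is listed is a choice made here because bool is a subclass of int in Python.

```python
from datetime import date, datetime

# Allowed Python types per column type, as listed in this FAQ.
SUPPORTED_TYPES = {
    "feature": (int, float, str, bool),
    "prediction_id": (int, str),
    "prediction_timestamp": (int, float, date, datetime),
    "prediction_score": (int, float),
    "actual_score": (int, float),
    "shap_value": (int, float),
}


def is_supported(column_type: str, value) -> bool:
    """Return True if `value` has a type listed for `column_type`."""
    allowed = SUPPORTED_TYPES[column_type]
    # bool is a subclass of int, so reject it unless bool is listed explicitly
    if isinstance(value, bool) and bool not in allowed:
        return False
    return isinstance(value, allowed)


print(is_supported("prediction_id", "abc-123"))  # True
print(is_supported("prediction_id", 1.5))        # False
print(is_supported("feature", True))             # True
```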

6. What if my predictions/actuals have None/NaN/Inf values?

They are generally not allowed and cause the dataset to be rejected. The exceptions are Prediction Score and Actual Score, which are treated as empty if missing or omitted.
For more information about these fields, see Model Types.
In a pandas DataFrame, no column should contain mixed types.

Classification

| Field | arize.pandas | arize.log() |
| --- | --- | --- |
| Prediction Label | Not allowed | Not allowed |
| Prediction Score | Treated as empty | Treated as empty |
| Actual Label | Not allowed | Not allowed |
| Actual Score | Treated as empty | Treated as empty |

Regression

| Field | arize.pandas | arize.log() |
| --- | --- | --- |
| Prediction Label | Not allowed | Not allowed |
| Actual Label | Not allowed | Not allowed |
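Since None/NaN/Inf values in label fields are not allowed, a local scan before logging can catch them early. A minimal sketch using the standard library; invalid_positions is a hypothetical helper name and the sample values are made up.

```python
import math


def invalid_positions(values):
    """Return the indexes whose value is None, NaN, or +/-Inf."""
    bad = []
    for i, v in enumerate(values):
        if v is None:
            bad.append(i)
        elif isinstance(v, float) and not math.isfinite(v):
            bad.append(i)
    return bad


labels = [1.0, None, float("nan"), 3.5, float("inf")]
print(invalid_positions(labels))  # [1, 2, 4]
```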

7. What if my features have None/NaN/Inf values?

They are accepted and treated as empty.

8. I sent latent actuals. When will I see them in the platform?

When predictions and actuals are logged separately, Arize runs a daily joiner job at 5 am UTC to join them up based on the prediction_id.
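To estimate when separately logged actuals will next be picked up, you can compute the next 5 am UTC after the send time. This sketch only computes the scheduled start of the job described above; how long the join takes before actuals become visible is not stated here. The function name next_join_start is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

JOIN_TIME_UTC = time(5, 0)  # daily join at 5 am UTC, per this FAQ


def next_join_start(sent_at: datetime) -> datetime:
    """Return the next 05:00 UTC at or after `sent_at` (must be timezone-aware)."""
    sent_utc = sent_at.astimezone(timezone.utc)
    candidate = datetime.combine(sent_utc.date(), JOIN_TIME_UTC, tzinfo=timezone.utc)
    if candidate < sent_utc:
        candidate += timedelta(days=1)
    return candidate


sent = datetime(2023, 6, 1, 14, 30, tzinfo=timezone.utc)
print(next_join_start(sent))  # 2023-06-02 05:00:00+00:00
```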

9. What if I don't have prediction labels for my classification model?

Currently, the prediction label column (prediction_label_column_name) is mandatory for data ingestion for classification models, i.e. ModelTypes.SCORE_CATEGORICAL. To satisfy this requirement, you can populate the prediction label column with constant or randomized label values. Note that model performance metrics that do not depend on the prediction label (e.g. log loss) are still calculated for analytics in Arize. Metrics that do depend on the prediction label (e.g. accuracy), on the other hand, are based on whatever you send in that column. This applies when ingesting data through either the SDK or the File Importer.
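Filling the label column with a constant can be done before logging. A minimal sketch on plain dictionaries: the key names, the helper add_placeholder_labels, and the placeholder value are all hypothetical, so adapt them to however your records are stored.

```python
def add_placeholder_labels(rows, label="placeholder"):
    """Return copies of `rows` with a constant prediction_label added where missing."""
    return [
        {**row, "prediction_label": row.get("prediction_label", label)}
        for row in rows
    ]


rows = [
    {"prediction_id": "p1", "prediction_score": 0.82},
    {"prediction_id": "p2", "prediction_score": 0.17},
]
for row in add_placeholder_labels(rows):
    print(row)
```

Keep in mind that label-dependent metrics such as accuracy will then reflect the placeholder, not a real model decision.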
Questions? Email us at [email protected] or Slack us in the #arize-support channel