Arize AI
File Format
Supported file formats and the required fields
Arize supports various file formats for ingesting model inferences. Each record in your files must capture fields that represent a model inference in the Arize platform.
CSV
Parquet
Apache Arrow

Logging Predictions, Actuals, Tags, and Shap Values together

Example File

Below is an example CSV file that contains all the necessary data to log inferences to Arize. Notice that the column headers for shap and tags are strategically named so that it's easy to identify those parts of the model record. Features will be automatically discovered as any column not already reserved in the schema declaration. Columns may also be excluded from automatic feature discovery by specifying their name in the schema's exclusion list.
In this particular example each record contains both the predictions and actuals.
Example CSV Prediction + Actual

Example Schema

The above CSV has both predictions and actuals in the same file.
The following columns will be automatically discovered as features: metropolitan_area, industry, state, established_business, experienced_business, safe_business, owner_gender, payroll_bracket, avg_payroll_bracket, risk_b_2, risk_bracket
The column user_id will not be ingested into the Arize system because it is explicitly excluded.
The following schema supports the above example:
{
  "prediction_id": "prediction_id",
  "timestamp": "prediction_ts",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_score": "actual_score",
  "actual_label": "actual_label",
  "tags": "tag/.*",
  "shap_values": "shap/.*",
  "exclude": ["user_id"]
}
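The discovery rule described above can be sketched in a few lines of Python. This is an illustration only, not Arize's implementation: the function name `classify_columns`, the choice of full-string regex matching, and the sample column names `tag/zip_code` and `shap/industry` are assumptions made for the example.

```python
import re

# Schema from the example above.
SCHEMA = {
    "prediction_id": "prediction_id",
    "timestamp": "prediction_ts",
    "prediction_score": "prediction_score",
    "prediction_label": "prediction_label",
    "actual_score": "actual_score",
    "actual_label": "actual_label",
    "tags": "tag/.*",
    "shap_values": "shap/.*",
    "exclude": ["user_id"],
}

# Schema fields whose value is a pattern rather than a single column name.
PATTERN_FIELDS = ("tags", "shap_values")


def classify_columns(columns, schema):
    """Assign each column a role: a schema field, 'feature', or 'excluded'."""
    excluded = set(schema.get("exclude", []))
    # Fields that are mapped to exactly one column name.
    reserved = {
        column: field
        for field, column in schema.items()
        if field != "exclude" and field not in PATTERN_FIELDS
    }
    roles = {}
    for column in columns:
        if column in excluded:
            roles[column] = "excluded"
        elif column in reserved:
            roles[column] = reserved[column]
        else:
            # Pattern fields must match the whole column name (an assumption);
            # any column left over is treated as a feature.
            roles[column] = next(
                (field for field in PATTERN_FIELDS
                 if field in schema and re.fullmatch(schema[field], column)),
                "feature",
            )
    return roles


# Hypothetical header row; tag/ and shap/ column names are illustrative.
columns = ["prediction_id", "prediction_ts", "user_id", "industry", "state",
           "tag/zip_code", "shap/industry", "prediction_score", "actual_label"]
roles = classify_columns(columns, SCHEMA)
for column in columns:
    print(f"{column}: {roles[column]}")
```

In this sketch `industry` and `state` fall through to the feature role because nothing in the schema reserves them, while `user_id` is dropped by the exclusion list.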

Importing predictions and actuals separately

Example Files

The Arize platform allows for uploading different files for predictions versus actuals. The prediction ID links the files together.
Predictions Only
The above table contains the features and predictions data. The actuals arrive in a separate file, which can be processed later.
Actuals Only
The above file can be delivered days or weeks after the prediction file is processed. The prediction ID will join the data together.
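As a rough picture of how the prediction ID ties the two files together, the sketch below joins a predictions file and an actuals file in memory using only the Python standard library. The file contents, the helper name `join_on_prediction_id`, and the left-join behavior (predictions without actuals are kept) are assumptions for illustration; the platform's actual join logic is not described here.

```python
import csv
import io

# Hypothetical file contents; column names mirror the schema fields on this page.
PREDICTIONS_CSV = """prediction_id,prediction_ts,industry,prediction_label
a1,1650000000,retail,approved
a2,1650000060,mining,denied
"""

ACTUALS_CSV = """prediction_id,actual_label
a1,approved
"""


def join_on_prediction_id(predictions_file, actuals_file):
    """Attach actuals to the predictions that share a prediction_id."""
    actuals = {row["prediction_id"]: row for row in csv.DictReader(actuals_file)}
    joined = []
    for prediction in csv.DictReader(predictions_file):
        # Actuals may not have arrived yet, so a missing match is not an error.
        actual = actuals.get(prediction["prediction_id"], {})
        joined.append({**prediction, "actual_label": actual.get("actual_label")})
    return joined


records = join_on_prediction_id(io.StringIO(PREDICTIONS_CSV), io.StringIO(ACTUALS_CSV))
for record in records:
    print(record["prediction_id"], record["prediction_label"], record["actual_label"])
```

Prediction `a2` has no actual yet, so its `actual_label` stays empty until a later actuals file supplies one.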

Explicit Declaration of Features

Example File

Below is an example CSV file that is similar to the prior two examples, but uses special notation to explicitly call out columns as features rather than automatically inferring columns as features. This follows a similar convention to shap and tags.
In this particular example each record contains both the predictions and actuals.
Example Predictions + Actuals with Labeled Features

Example Schema

The above CSV has both predictions and actuals in the same file.
The column user_id will not be ingested into the Arize system because its name does not match the "feature/.*" pattern.
The following schema supports the above example:
{
  "prediction_id": "prediction_id",
  "timestamp": "prediction_ts",
  "features": "feature/.*",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_score": "actual_score",
  "actual_label": "actual_label",
  "tags": "tag/.*",
  "shap_values": "shap/.*"
}
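With an explicit "features" pattern, only columns whose names match it are treated as features. The sketch below illustrates that selection for a single record; the helper name `select_features`, the full-string matching, and the sample column names and values are assumptions, not Arize's implementation.

```python
import re

FEATURE_PATTERN = "feature/.*"  # from the "features" entry in the schema above


def select_features(row, pattern=FEATURE_PATTERN):
    """Return only the columns whose full name matches the feature pattern."""
    return {name: value for name, value in row.items() if re.fullmatch(pattern, name)}


# Hypothetical record; the feature/ and tag/ column names are illustrative.
row = {
    "prediction_id": "a1",
    "user_id": "u-123",
    "feature/industry": "retail",
    "feature/state": "CA",
    "tag/zip_code": "94105",
}
features = select_features(row)
print(sorted(features))
```

Here `user_id` is left out without any exclusion list, because it simply does not carry the feature prefix.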

Logging Predictions, Shaps, and Actuals together

Example File

Below is an example Parquet file (transformed to csv for readability) that contains all the necessary data to log inferences to Arize. Notice that the column headers for shap and tags are strategically named so that it's easy to identify those parts of the model record. Features will be automatically discovered as any column not already reserved in the schema declaration. Columns may also be excluded from automatic feature discovery by specifying their name in the schema's exclusion list.
In this particular example each record contains both the predictions and actuals.
Example Parquet Prediction + Actual

Example Schema

The above Parquet file has both predictions and actuals in the same file. The following schema supports the above example:
{
  "prediction_id": "prediction_id",
  "timestamp": "prediction_ts",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_score": "actual_score",
  "actual_label": "actual_label",
  "tags": "tag/.*",
  "shap_values": "shap/.*",
  "exclude": ["user_id"]
}

Importing predictions and actuals separately

Example Files

The Arize platform allows for uploading different files for predictions versus actuals. The prediction ID links the files together.
Predictions Only
The above table contains the features and predictions data. The actuals arrive in a separate file, which can be processed later.
Actuals Only
The above file can be delivered days or weeks after the prediction file is processed. The prediction ID will join the data together.

Explicit Declaration of Features

Example File

Below is an example Parquet file that is similar to the prior two examples, but uses special notation to explicitly call out columns as features rather than automatically inferring columns as features. This follows a similar convention to shap and tags.
In this particular example each record contains both the predictions and actuals.
Example Predictions + Actuals with Labeled Features

Example Schema

The above Parquet file has both predictions and actuals in the same file.
The column user_id will not be ingested into the Arize system because its name does not match the "feature/.*" pattern.
The following schema supports the above example:
{
  "prediction_id": "prediction_id",
  "timestamp": "prediction_ts",
  "features": "feature/.*",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_score": "actual_score",
  "actual_label": "actual_label",
  "tags": "tag/.*",
  "shap_values": "shap/.*"
}
This file format is currently only supported for our design partners. For early access, please contact [email protected]
This example shows what the columns of an Arrow file and the accompanying schema file would look like.
The "*" can be used to add features to a file without changing the schema.
| Column Name in File       | Arize Schema         |
|---------------------------|----------------------|
| my-prediction-ts          | prediction_timestamp |
| my-prediction-id-customer | prediction_id        |
| my-prediction-score       | prediction_score     |
| my-prediction-label       | prediction_label     |
| my-feature.addr_state     | features             |
| my-feature.revenue        | features             |
| my-environment            | environment          |
| my-actual-label           | actual_label         |
The table above shows example column names in a file and the Arize internal dimension types they map to.
Note that the "my-feature" prefix covers multiple feature columns.
model.json
ModelSchema:
  prediction_timestamp: "my-prediction-ts"
  prediction_id: "my-prediction-id-customer"
  prediction_score: "my-prediction-score"
  prediction_label: "my-prediction-label"
  features: "feature.*" # describes the path to the "features" object above, containing "addr_state" and "revenue"
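To make the mapping concrete, the sketch below applies the column mappings from the table to one record and gathers every column under the "my-feature." prefix into a single features object. It is a hypothetical illustration: the helper name `to_arize_record`, the prefix-based grouping, the extra column `my-feature.employees`, and the sample values are all assumptions rather than documented Arize behavior.

```python
# Column-to-dimension mappings taken from the table above.
DIRECT_MAPPINGS = {
    "my-prediction-ts": "prediction_timestamp",
    "my-prediction-id-customer": "prediction_id",
    "my-prediction-score": "prediction_score",
    "my-prediction-label": "prediction_label",
    "my-environment": "environment",
    "my-actual-label": "actual_label",
}
FEATURE_PREFIX = "my-feature."  # assumed: every column under this prefix is a feature


def to_arize_record(row):
    """Rename mapped columns and gather prefixed columns under 'features'."""
    record = {"features": {}}
    for column, value in row.items():
        if column in DIRECT_MAPPINGS:
            record[DIRECT_MAPPINGS[column]] = value
        elif column.startswith(FEATURE_PREFIX):
            record["features"][column[len(FEATURE_PREFIX):]] = value
        # Columns that match neither rule are ignored in this sketch.
    return record


# Hypothetical row; "my-feature.employees" is not in the table above and
# stands in for a feature added later without touching the schema.
row = {
    "my-prediction-ts": 1650000000,
    "my-prediction-id-customer": "a1",
    "my-prediction-score": 0.87,
    "my-prediction-label": "approved",
    "my-feature.addr_state": "CA",
    "my-feature.revenue": 1200,
    "my-feature.employees": 8,
    "my-environment": "production",
    "my-actual-label": "approved",
}
record = to_arize_record(row)
print(record["features"])
```

Because features are matched by prefix rather than listed one by one, the new `my-feature.employees` column is picked up with no change to the mapping.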
Questions? Email us at [email protected] or Slack us in the #arize-support channel