Arize AI
File Schema
Map the file contents to model inferences
To ingest model inferences, Arize must be able to map the columns or fields in your files to the fields of a model. This mapping from the fields in your data to model fields in the Arize platform is captured in a schema.

Schema

When configuring the file schema via the UI, you can specify the column mapping with a simple JSON-based schema. For the full list of parameters, consult the SDK.
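
As a rough illustration of what such a schema looks like when assembled programmatically, the sketch below builds the mapping as a Python dictionary, checks that the three properties marked Required in the Properties table are present, and serializes it to JSON. The helper name `build_schema` and the column names are hypothetical and are not part of the Arize SDK.

```python
import json

# Properties marked "Required" in the Properties table.
REQUIRED_PROPERTIES = ("prediction_id", "timestamp", "prediction_label")


def build_schema(**mapping):
    """Return a JSON string mapping schema properties to column names.

    Hypothetical helper for illustration only; not an Arize API.
    """
    missing = [name for name in REQUIRED_PROPERTIES if not mapping.get(name)]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    return json.dumps(mapping, indent=2)


# Example usage with made-up column names.
schema_json = build_schema(
    prediction_id="pred_id",
    timestamp="ts",
    prediction_label="label",
    actual_label="ground_truth",
)
print(schema_json)
```

The resulting string could then be pasted into the schema field in the UI; any property left out of the call is simply absent from the JSON.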

Properties

| Property | Type | Description | Required |
| --- | --- | --- | --- |
| prediction_id | string | Column name that contains the unique identifier for a specific prediction. | Required |
| timestamp | string | Column name that contains the timestamp of the prediction. In the data itself, the timestamp can be either the Unix time in seconds or an RFC3339 timestamp (e.g. 1650125707 and 2022-04-16T16:15:00Z are both valid). | Required |
| prediction_label | string | Column name that contains the prediction value. | Required |
| prediction_score | string | Column name for the predicted score. Required if the model type is score_categorical. | Optional |
| actual_label | string | Column name for the actual or ground truth value. | Optional (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| actual_score | string | Column name for the ground truth score. Required for score_categorical models. | Optional (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| features | string | A prefix and capture_group regex combination (delineated by a /) that describes a set of columns. For example, to describe a set of feature columns, you might use feature/.* . This means that all columns that start with feature will be ingested as features, and the capture group will be used as the feature name (e.g. feature/bank would be captured as a feature named bank). Features must be sent in the same file as predictions. | Optional. If included, columns matching the column group are ingested as features. If not included, all columns that are not reserved (reserved columns include timestamp, prediction_label, etc.) are inferred to be features. |
| tags | string | A prefix and capture_group regex combination (delineated by a /) that describes a set of columns. For example, to describe a set of tag columns, you might use tag/.* . This means that all columns that start with tag will be ingested as tags, and the capture group will be used as the tag name (e.g. tag/season would be captured as a tag named season). | Optional (tags, if sent, must be sent in the same file as predictions and features) |
| shap_values | string | A prefix and capture_group regex combination (delineated by a /) that describes a set of columns. For example, to describe a set of columns that hold feature importances for your features, you might use shap/.* . This means that all columns that start with shap will be ingested as feature importances, and the capture group will be used as the feature name (e.g. shap/season would be captured as the importance of the feature named season). | Optional (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| version | string | Column name used to specify a model version. There are 3 ways to declare the version. (1) Omit the version property or set it to an empty string: the version defaults to "no_version" for all records. (2) Set the version property to "version/{value}": {value} is statically assigned as the version of all rows (e.g. version/1.0). (3) Set the version property to a column name: each record is assigned the version found in the matching column of the provided file. | Optional |
| batch_id | string | For validation records only. Used to distinguish different batches of data under the same model_id and model_version. Must be specified as a constant during job setup or in the schema for validation records. | Optional |
| exclude | list<string> | A list of columns in the input file that should be ignored and excluded from ingestion into the Arize system. This field is only honored if the features property is not included in the ingestion schema. | Optional |
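
The three ways of declaring the version property can be read as a small decision procedure. The following is a minimal sketch of that logic under stated assumptions: the function name `resolve_version` is hypothetical, rows are modeled as plain dictionaries, and a missing column simply raises `KeyError`. It is not Arize's actual implementation.

```python
from typing import Mapping, Optional

DEFAULT_VERSION = "no_version"
STATIC_PREFIX = "version/"


def resolve_version(version_property: Optional[str], row: Mapping[str, str]) -> str:
    """Resolve the model version for one row (illustrative only)."""
    # Case 1: property omitted or empty -> default version for all records.
    if not version_property:
        return DEFAULT_VERSION
    # Case 2: "version/{value}" -> the same static value for every row.
    if version_property.startswith(STATIC_PREFIX):
        return version_property[len(STATIC_PREFIX):]
    # Case 3: otherwise treat the property as a column name and read the row.
    # Assumption: a missing column is an error (KeyError) in this sketch.
    return row[version_property]
```

For a row such as `{"version": "2.3"}`, the property value "version" yields the row's own value, while "version/1.0" yields "1.0" regardless of what the row contains.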

Example Schema

// Excludes the "features" property.
// All columns will be ingested as features except those that
// are reserved as properties or excluded.
{
  "prediction_id": "prediction_id",
  "timestamp": "timestamp",
  "tags": "tag/.*",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_label": "actual_label",
  "actual_score": "actual_score",
  "shap_values": "shap/.*",
  "version": "version", // look up the column "version" in the file
  "batch_id": "batch_id",
  "exclude": [
    "<column1 name>",
    "<column2 name>"
  ]
}
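
When the features property is omitted, the comments in the example above say that every column is treated as a feature unless it is reserved by another property or listed in exclude. A minimal sketch of that selection is shown below. The function name `infer_feature_columns` is hypothetical, and treating columns that match the tags and shap_values prefixes as reserved is an assumption of this sketch rather than documented behavior.

```python
from typing import Any, Iterable, List, Mapping

# Schema properties whose values name a single column in the file.
SINGLE_COLUMN_PROPERTIES = (
    "prediction_id", "timestamp", "prediction_label", "prediction_score",
    "actual_label", "actual_score", "version", "batch_id",
)
# Properties whose values describe a group of columns by prefix.
# Assumption: columns in these groups are not inferred to be features.
COLUMN_GROUP_PROPERTIES = ("tags", "shap_values")


def infer_feature_columns(schema: Mapping[str, Any], columns: Iterable[str]) -> List[str]:
    """Return the columns that would be inferred as features (illustrative only)."""
    reserved = {schema[p] for p in SINGLE_COLUMN_PROPERTIES if schema.get(p)}
    excluded = set(schema.get("exclude", []))
    # Turn a pattern such as "tag/.*" into the prefix "tag/".
    group_prefixes = tuple(
        str(schema[p]).split("/", 1)[0] + "/"
        for p in COLUMN_GROUP_PROPERTIES
        if schema.get(p)
    )
    return [
        c for c in columns
        if c not in reserved
        and c not in excluded
        and not c.startswith(group_prefixes)
    ]
```

The order of the input columns is preserved, so the result lines up with the file's header order.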

// Includes the "features" property.
// Only columns starting with "feature/" will be ingested as a feature
{
  "prediction_id": "prediction_id",
  "timestamp": "timestamp",
  "features": "feature/.*",
  "tags": "tag/.*",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_label": "actual_label",
  "actual_score": "actual_score",
  "shap_values": "shap/.*",
  "version": "version/1.0", // uses version "1.0" for all rows
  "batch_id": "batch_id",
  "exclude": []
}
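
To make the prefix and capture_group convention used by features, tags, and shap_values more concrete, here is a minimal sketch of how a pattern such as feature/.* could be applied to a list of column names. The function name `match_column_group` is hypothetical, and splitting the pattern on its first / is an assumption; Arize's actual matching rules may differ.

```python
import re
from typing import Dict, Iterable


def match_column_group(pattern: str, columns: Iterable[str]) -> Dict[str, str]:
    """Map each matching column to the name captured after the prefix.

    Illustrative only: assumes the pattern has the form "prefix/regex".
    """
    prefix, capture = pattern.split("/", 1)
    # Build prefix/(capture) so the part after the slash becomes group 1.
    regex = re.compile(rf"{re.escape(prefix)}/({capture})")
    matched = {}
    for column in columns:
        m = regex.fullmatch(column)
        if m:
            matched[column] = m.group(1)
    return matched
```

With the columns feature/bank, feature/age, and tag/season, the pattern feature/.* would select the first two and name them bank and age, while tag/.* would select only tag/season and name it season.
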
Questions? Email us at [email protected] or Slack us in the #arize-support channel