Arize AI

File Schema Reference

Map the file contents to model inferences
To ingest model inferences, Arize must be able to map the columns or fields in your files to the fields of a model. This mapping between your data and the model fields in the Arize platform is captured in a schema.

Schema

When configuring the file schema via the UI, you can either use a form or a simple JSON-based schema to specify the column mapping. For the listing of parameters, please consult the SDK.

Properties

| Property | Description | Required |
| --- | --- | --- |
| prediction_id | Column name that contains the unique identifier for a specific prediction. | Required |
| timestamp | Column name that contains the timestamp of the prediction. The values themselves can be either the Unix time in seconds or an RFC 3339 timestamp (e.g. 1650125707 and 2022-04-16T16:15:00Z are both valid). | Optional (if omitted, defaults to the current timestamp at file ingestion time) |
| prediction_label | Column name that contains the prediction value. | Required |
| prediction_score | Column name for the predicted score. Required if the model type is score_categorical. | Optional |
| actual_label | Column name for the actual or ground truth value. | Optional for Production records (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| actual_score | Column name for the ground truth score. Required for score_categorical models. | Optional for Production records (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| features | A string prefix that describes a set of columns. For example, to describe a set of columns that hold features, you might use feature/. All columns that start with feature/ will then be ingested as features, and the suffix will be used as the feature name (e.g. feature/bank would be captured as a feature named bank). Features must be sent in the same file as predictions. | Optional. If included, columns matching the prefix are ingested as features. You can choose between feature-prefixing OR inferred-features, but not both. Note: If this property is not included in the schema, columns that are NOT already reserved in the schema declaration (e.g. timestamp, prediction_label, etc.) will be inferred to be features. |
| tags | A string prefix that describes a set of columns. For example, to describe a set of columns that hold tags, you might use tag/. All columns that start with tag/ will then be ingested as tags, and the suffix will be used as the tag name (e.g. tag/season would be captured as a tag named season). | Optional (tags, if sent, must be sent in the same file as predictions and features) |
| shap_values | A string prefix that describes a set of columns. For example, to describe a set of columns that hold feature importances for your features, you might use shap/. All columns that start with shap/ will then be ingested as feature importances, and the suffix will be used as the feature name (e.g. shap/season would be captured as the feature importance of a feature named season). | Optional (can be sent in the same file as the predictions or at a later time, as long as the prediction_ids match the ones in the predictions file) |
| version | Column name used to specify a model version. There are 3 ways to declare the version: (1) Omit the version property or set it to an empty string: defaults to "no_version" for all records. (2) Set the version property to "version/{value}": {value} will be the version statically assigned to all rows (e.g. version/1.0). (3) Set the version property to a column name: each record will be assigned the version found in the matching column of the provided file. If you are configuring a job using the UI, version can be explicitly set as a constant in the model configuration step instead. | Optional |
| batch_id | For Validation records only. Used to distinguish different batches of data under the same model_id and model_version. Must be specified as a constant during job setup or in the schema for validation records. | Optional |
| exclude | A list of columns in the input file that should be ignored and excluded from ingestion into the Arize system. This property is only honored if the features property is not included in the ingestion schema. | Optional |
| embedding_features | A list of embedding columns and, for each, the associated vector column (required), raw data column (optional), and link to data column (optional) in the input file that should be ingested as embeddings. See below for an example. | Optional |
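The timestamp and version rules above can be sketched in a few lines of Python. This is an illustrative reading of the table, not Arize's ingestion code: the function names `parse_timestamp` and `resolve_version` are hypothetical, and interpreting numeric timestamps as UTC Unix seconds is an assumption.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp given as Unix seconds or as an RFC 3339 string.

    Hypothetical helper: numeric values are assumed to be UTC Unix seconds.
    """
    text = str(value).strip()
    try:
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass  # not numeric, fall through to RFC 3339 parsing
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def resolve_version(version_property, row):
    """Apply the three ways of declaring a version to a single record.

    `row` is a dict mapping column names to values for one record.
    """
    if not version_property:
        # 1. Property omitted or empty: every record gets "no_version".
        return "no_version"
    static_prefix = "version/"
    if version_property.startswith(static_prefix):
        # 2. "version/{value}": the same static value for every record.
        return version_property[len(static_prefix):]
    # 3. Otherwise the property names a column; read the version per record.
    return str(row[version_property])
```

For example, `resolve_version("version/1.0", {})` yields the static version `1.0`, while `resolve_version("model_version", row)` reads the value from a (hypothetical) `model_version` column of each record.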

Example Embedding

When configuring an embedding in the UI:
"embedding_features": [{
  "my_feature": {  // required, my_feature is the name of the feature
    "vector": "vector_col",  // required, vector_col is the column name of the vector
    "raw_data": "raw_data_col",  // optional
    "link_to_data": "link_to_data_col"  // optional
  }
}]
When configuring an embedding in the API:
"embeddingFeatures": [{
  "featureName": "my_feature",
  "vectorCol": "vector_col",
  "rawDataCol": "raw_data_col",
  "linkToDataCol": "link_to_data_col"
}]
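The two shapes above carry the same information, so one can be derived from the other. The following Python sketch converts the UI-style list into the API-style list. The function name `to_api_embedding_features` is hypothetical, and the key names are taken only from the two examples above, so treat this as an illustration rather than a supported conversion.

```python
def to_api_embedding_features(ui_embedding_features):
    """Convert UI-style embedding entries into API-style entries.

    UI style:  [{"my_feature": {"vector": ..., "raw_data": ..., "link_to_data": ...}}]
    API style: [{"featureName": ..., "vectorCol": ..., "rawDataCol": ..., "linkToDataCol": ...}]
    """
    api_entries = []
    for entry in ui_embedding_features:
        for feature_name, columns in entry.items():
            if "vector" not in columns:
                # The vector column is the only required one.
                raise ValueError(
                    f"embedding {feature_name!r} is missing the required 'vector' column"
                )
            api_entry = {"featureName": feature_name, "vectorCol": columns["vector"]}
            # Optional columns are copied only when present.
            if "raw_data" in columns:
                api_entry["rawDataCol"] = columns["raw_data"]
            if "link_to_data" in columns:
                api_entry["linkToDataCol"] = columns["link_to_data"]
            api_entries.append(api_entry)
    return api_entries
```

Omitting the optional keys when they are absent, rather than emitting empty strings, keeps the output aligned with the "optional" markers in the UI example.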

Example Form and Schema Inputs

Form Input
// Excludes the "features" property.
// All columns will be ingested as features except those that
// are reserved as properties or excluded.
prediction_id: prediction_id
timestamp: timestamp
tags: tag/
prediction_score: prediction_score
prediction_label: prediction_label
actual_label: actual_label
actual_score: actual_score
shap_values: shap/
version: version // look up the column "version" in the file
batch_id: batch_id
exclude: <column1 name>,<column2 name>
embedding_features: // fill out the embedding features section

JSON Input
// Excludes the "features" property.
// All columns will be ingested as features except those that
// are reserved as properties or excluded.
{
  "prediction_id": "prediction_id",
  "timestamp": "timestamp",
  "tags": "tag/",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_label": "actual_label",
  "actual_score": "actual_score",
  "shap_values": "shap/",
  "version": "version", // look up the column "version" in the file
  "batch_id": "batch_id",
  "exclude": [
    "<column1 name>",
    "<column2 name>"
  ],
  "embedding_features": [
    {
      "embedding_1": {
        "vector": "vector_column_1",
        "raw_data": "raw_data_column_1",
        "link_to_data": "link_to_data_column"
      }
    }
  ]
}
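To make the inferred-features behavior of the schema above concrete, here is a minimal Python sketch of how columns could be sorted when the "features" property is omitted. It is an interpretation of the rules on this page, not Arize's implementation: `classify_columns_inferred` and `RESERVED_PROPERTIES` are hypothetical names, and treating embedding columns as reserved is an assumption.

```python
# Schema properties whose value names a single column (hypothetical grouping).
RESERVED_PROPERTIES = (
    "prediction_id", "timestamp", "prediction_label", "prediction_score",
    "actual_label", "actual_score", "version", "batch_id",
)


def classify_columns_inferred(columns, schema):
    """Sort file columns into features, tags, shap_values, and ignored.

    Applies only when the schema has no "features" property, so every column
    that is not reserved, excluded, or prefixed is inferred to be a feature.
    """
    reserved = {schema[prop] for prop in RESERVED_PROPERTIES if schema.get(prop)}
    # Assumption: columns referenced by embeddings are not ordinary features.
    for entry in schema.get("embedding_features", []):
        for embedding_columns in entry.values():
            reserved.update(embedding_columns.values())

    excluded = set(schema.get("exclude", []))
    tag_prefix = schema.get("tags")
    shap_prefix = schema.get("shap_values")

    result = {"features": [], "tags": [], "shap_values": [], "ignored": []}
    for column in columns:
        if column in reserved:
            continue  # already mapped to a model field
        if column in excluded:
            result["ignored"].append(column)
        elif tag_prefix and column.startswith(tag_prefix):
            result["tags"].append(column)
        elif shap_prefix and column.startswith(shap_prefix):
            result["shap_values"].append(column)
        else:
            result["features"].append(column)
    return result
```

With this reading, a column listed under "exclude" never becomes a feature, which matches the note that "exclude" is only honored when "features" is not set.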
Form Input
// Includes the "features" property.
// Only columns starting with "feature/" will be ingested as a feature
prediction_id: prediction_id
timestamp: timestamp
features: feature/
tags: tag/
prediction_score: prediction_score
prediction_label: prediction_label
actual_label: actual_label
actual_score: actual_score
shap_values: shap/
version: version // look up the column "version" in the file
batch_id: batch_id
exclude: // leave empty to omit column exclusions
embedding_features: // leave empty to omit embeddings

JSON Input
// Includes the "features" property.
// Only columns starting with "feature/" will be ingested as a feature
{
  "prediction_id": "prediction_id",
  "timestamp": "timestamp",
  "features": "feature/",
  "tags": "tag/",
  "prediction_score": "prediction_score",
  "prediction_label": "prediction_label",
  "actual_label": "actual_label",
  "actual_score": "actual_score",
  "shap_values": "shap/",
  "version": "version",
  "batch_id": "batch_id",
  "exclude": [],
  "embedding_features": []
}
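When prefixes are used, as in the schema above, the suffix after the prefix becomes the feature or tag name. A small Python sketch of that grouping follows. The helper name `group_by_prefix` is hypothetical, and the code only illustrates the naming rule described on this page.

```python
def group_by_prefix(columns, schema):
    """Map each prefixed column to the name it would be ingested under.

    Returns a dict such as {"features": {"bank": "feature/bank"}, ...},
    keyed by schema property, then by the suffix after the prefix.
    """
    groups = {}
    for prop in ("features", "tags", "shap_values"):
        prefix = schema.get(prop)
        if not prefix:
            continue  # property not set in the schema, nothing to group
        groups[prop] = {
            column[len(prefix):]: column
            for column in columns
            if column.startswith(prefix)
        }
    return groups
```

Columns that match no prefix and are not mapped to another property, such as a stray `notes` column, are simply left out of every group, consistent with the comment that only columns starting with "feature/" are ingested as features.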
Questions? Email us at [email protected] or Slack us in the #arize-support channel