What Is A Model Schema
Overview of Arize Model Inference Schema
Arize stores model data, and this data is organized via the model schema.
The Arize model schema consists of model records. Each record can contain the inputs to the model (features), model outputs (predictions), timestamps, latently linked ground truth (actuals), metadata (tags), and model internals (embeddings and/or SHAP).
Prediction ID | Timestamp | Prediction | Actual | Feature | Tag | Embedding | URL |
---|---|---|---|---|---|---|---|
1fcd50f4689 | 1637538845 | No Claims | No Claims | ca | female | [1.27346, -0.2138, ...] | "https://example_ur.jpg" |
Your model schema differs based on the data ingestion method and model type. Navigate to model types here.
See below for more details on each part of the schema.
Note: This schema example includes possible inputs using the Python Pandas SDK. Please consult model types for applicable schema parameters relevant to your model.
prediction_id | prediction_ts | prediction_label | prediction_score | actual_label | actual_score | feature_1 | tag_1 | vector | text | image_link | group_id_name | rank | relevance_score | actual_relevancy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1fcd50f4689 | 1637538845 | No Claims | 0.4 | No Claims | 0.4 | ca | female | [1.27346, -0.2138, ...] | "This is an example text" | "https://example_ur.jpg" | 148 | 4 | 0.155441 | not relevant |
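# Import path used by the Arize pandas SDK; adjust if your SDK version differs.
from arize.utils.types import EmbeddingColumnNames, Environments, Metrics, ModelTypes, Schema

embedding_feature_column_names = {
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector",  # column containing embedding vector (required)
        data_column_name="text",  # column containing raw text (optional NLP)
        link_to_data_column_name="image_link",  # column containing image URL links (optional CV)
    )
}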
schema = Schema(
    prediction_id_column_name="prediction_id",
    feature_column_names=["feature_1", "feature_2", "feature_3"],
    tag_column_names=["tag_1", "tag_2", "tag_3"],
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    # shap_cols: list of SHAP value column names, in the same order as the features
    shap_values_column_names=dict(zip(["feature_1", "feature_2", "feature_3"], shap_cols)),
    embedding_feature_column_names=embedding_feature_column_names,
    prediction_group_id_column_name="group_id_name",
    rank_column_name="rank",
    relevance_score_column_name="relevance_score",
    relevance_labels_column_name="actual_relevancy",
)
# `arize` is assumed to be an already initialized Arize client
response = arize.log(
    dataframe=df,
    schema=schema,
    environment=Environments.Production,
    model_id="example_model",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.REGRESSION, Metrics.AUC_LOG_LOSS],
    model_version="1.0",
    validate=True,
)
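Before logging, it can help to confirm that every column the schema names actually exists in the data. The sketch below is a stdlib-only illustration and not part of the Arize SDK: `missing_columns` is a hypothetical helper, and the record reuses values from the example row above.

```python
from typing import Dict, Iterable, List


def missing_columns(schema_columns: Iterable[str], data_columns: Iterable[str]) -> List[str]:
    """Return the schema column names that are absent from the data, in schema order."""
    available = set(data_columns)
    return [name for name in schema_columns if name not in available]


# One record shaped like the example row above (values copied from the table).
record: Dict[str, object] = {
    "prediction_id": "1fcd50f4689",
    "prediction_ts": 1637538845,
    "prediction_label": "No Claims",
    "prediction_score": 0.4,
    "actual_label": "No Claims",
    "actual_score": 0.4,
    "feature_1": "ca",
    "tag_1": "female",
}

# Column names a schema might reference; "feature_2" is deliberately absent.
referenced = [
    "prediction_id",
    "prediction_ts",
    "prediction_label",
    "feature_1",
    "feature_2",
    "tag_1",
]

print(missing_columns(referenced, record.keys()))  # ['feature_2']
```

With a pandas DataFrame, `df.columns` could be passed as `data_columns` in the same way.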
The model ID is a unique identifier for your model. Give the model a clear name that reflects its business use case (e.g., fraud-prevention-model).
Model versions capture snapshots of a model at different times. New model versions are created after retraining, new weights, or new features. Each version can contain its own training, validation, and production environment.
In Arize, you can have as many model versions as you want for a model, just as long as you upload them with the same Model ID. Use multiple model versions for a given model to filter and compare in Arize.

A model environment refers to the setup or conditions in which a model is developed. Arize supports uploading training, validation, and production environments. In Arize, a model can have multiple sets of environments depending on how many versions you have.
Training Environment: Where the model learns from the training data, adjusting its parameters to minimize the error in its predictions.
- Arize supports multiple training versions for any given model version
Validation Environment: Used to test a model on a separate dataset (validation data) not used in training. This environment helps to fine-tune the model's hyperparameters and prevents overfitting.
- Arize supports multiple batches of validation data (e.g., batch1, batch2, etc.)
Production Environment: Where the model is deployed to the real world and provides predictions or classifications for actual use cases.
- Production data can help inform retraining efforts, thus creating a new model version.
A prediction ID is an ID that indicates a unique prediction event. A prediction ID is required to connect predictions with delayed actuals (ground truth). Learn how to send delayed (latent) actuals here.
The timestamp indicates when the data will show up in the UI. It is sent as an integer representing the UNIX timestamp in seconds. Typically, this is the time the prediction was made. However, in some cases, such as time series models, you may want the timestamp to be the date the prediction was made for.
The timestamp field defaults to the time you sent the prediction to Arize. Arize supports timestamps up to 2 years in the past and 1 year in the future, relative to the current timestamp.
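The timestamp rules above can be sketched with the Python standard library. This is a minimal illustration, not Arize's validation logic: `to_unix_seconds` and `in_accepted_window` are hypothetical helpers, and the window is approximated with 365-day years.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window described above: up to 2 years back and 1 year ahead.
# Approximated with 365-day years (an assumption of this sketch).
MAX_PAST = timedelta(days=2 * 365)
MAX_FUTURE = timedelta(days=365)


def to_unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to an integer UNIX timestamp in seconds."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def in_accepted_window(ts: int, now: Optional[datetime] = None) -> bool:
    """Check whether a UNIX timestamp falls inside the accepted window around `now`."""
    now = now or datetime.now(timezone.utc)
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return now - MAX_PAST <= moment <= now + MAX_FUTURE


# The timestamp from the example row corresponds to 2021-11-21 23:54:05 UTC.
prediction_ts = to_unix_seconds(datetime(2021, 11, 21, 23, 54, 5, tzinfo=timezone.utc))
print(prediction_ts)  # 1637538845

reference_now = datetime(2021, 11, 22, tzinfo=timezone.utc)
print(in_accepted_window(prediction_ts, now=reference_now))  # True
```

Passing `now` explicitly keeps the check reproducible. In real use the default (the current UTC time) would apply.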
Arize captures the feature schema as the first prediction is logged. If the features change over time, the feature schema will adjust to show the new schema.

Features are inputs to the model.
Arize's embedding objects are composed of 3 different pieces of information:
- vector (required): the embedding vector itself, representing the unstructured input data. Accepted data types are List[float] and nd.array[float].
- data (optional): Typically the raw text represented by the embedding vector. Accepted data types are str (for words or sentences) and List[str] (for token arrays).
- link to data (optional): Typically a URL linking to the data file (image, audio, video...) represented by the embedding vector. The accepted data type is str.
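The three pieces above can be pictured as a small typed record. The sketch below is purely illustrative: `EmbeddingRecord` is a hypothetical class, not part of the Arize SDK, and it only covers the list-based types (arrays are left out to keep it dependency-free).

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EmbeddingRecord:
    """Hypothetical container for one embedding; not an Arize SDK class."""

    vector: List[float]  # required: the embedding vector
    data: Optional[Union[str, List[str]]] = None  # optional: raw text or token list
    link_to_data: Optional[str] = None  # optional: URL of the source file

    def __post_init__(self) -> None:
        # Assumption of this sketch: an empty vector counts as missing.
        if len(self.vector) == 0:
            raise ValueError("vector is required")
        if not all(isinstance(x, float) for x in self.vector):
            raise TypeError("vector must contain floats")
        is_token_list = isinstance(self.data, list) and all(
            isinstance(token, str) for token in self.data
        )
        if self.data is not None and not (isinstance(self.data, str) or is_token_list):
            raise TypeError("data must be a str or a list of str")
        if self.link_to_data is not None and not isinstance(self.link_to_data, str):
            raise TypeError("link_to_data must be a str")


# NLP-style embedding: vector plus the raw text it represents.
text_embedding = EmbeddingRecord(vector=[1.27346, -0.2138], data="This is an example text")

# CV-style embedding: vector plus a link to the image file.
image_embedding = EmbeddingRecord(vector=[1.27346, -0.2138], link_to_data="https://example_ur.jpg")
```

In the pandas SDK these pieces live in separate DataFrame columns, which `EmbeddingColumnNames` ties together by column name.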
Tags are a convenient way to group predictions by metadata you find important but don't want to send as an input to the model (e.g., which server or node a prediction or actual was served on, sensitive categories, or model and feature operational metrics). Use tags to group, monitor, slice, and investigate the performance of “cohorts” based on user-defined metadata for the model.

Tags can be sent in with predictions or actuals.
If tags are sent in with a prediction and its corresponding actual, Arize merges the tag maps, keeping the prediction tag’s value if the tag keys are identical.
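The merge rule can be illustrated with plain Python dictionaries. This is a sketch of the stated behavior, not Arize's implementation: `merge_tags` is a hypothetical helper, and the tag values are made up for the example.

```python
from typing import Dict


def merge_tags(prediction_tags: Dict[str, str], actual_tags: Dict[str, str]) -> Dict[str, str]:
    """Merge two tag maps; on identical keys the prediction tag's value is kept."""
    # In a dict merge, later entries win, so the prediction tags go last.
    return {**actual_tags, **prediction_tags}


prediction_tags = {"location": "New York", "month": "January"}
actual_tags = {"month": "February", "fruit": "apple"}

merged = merge_tags(prediction_tags, actual_tags)
print(merged)  # {'month': 'January', 'fruit': 'apple', 'location': 'New York'}
```

Both maps define `month`, and the merged result keeps `January`, the value sent with the prediction.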
Example row of tags
location | month | fruit |
---|---|---|
New York | January | apple |
# Python single record
tags = {
    'location': 'New York',
    'month': 'January',
    'fruit': 'apple',
}
response = arize.log(
    model_id='sample-model-1',
    model_version='v1',
    ...
    tags=tags
)
# Python batch (pandas)
schema = Schema(
    prediction_id_column_name='prediction_id',
    ...
    tag_column_names=['location', 'month', 'fruit']
)
Feature importance refers to a class of techniques that take in all the features used to make a model prediction and assign each feature a score reflecting how much or how little it impacted the outcome.
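The schema example earlier maps each feature name to the column holding its SHAP values. A minimal sketch of building that mapping and ranking features for one row follows. The `_shap` column suffix is a hypothetical naming convention, not an Arize requirement, and the SHAP values are made up for illustration.

```python
from typing import Dict, List

feature_cols: List[str] = ["feature_1", "feature_2", "feature_3"]

# Hypothetical convention: each SHAP column is the feature name plus "_shap".
shap_cols: List[str] = [f"{name}_shap" for name in feature_cols]

# Mapping of feature name -> SHAP column name, as passed to the schema.
shap_values_column_names: Dict[str, str] = dict(zip(feature_cols, shap_cols))

# One row of illustrative SHAP values (not real model output).
row = {"feature_1_shap": 0.12, "feature_2_shap": -0.40, "feature_3_shap": 0.05}

# Rank features by the absolute size of their SHAP value for this prediction.
ranked = sorted(
    feature_cols,
    key=lambda name: abs(row[shap_values_column_names[name]]),
    reverse=True,
)
print(ranked)  # ['feature_2', 'feature_1', 'feature_3']
```

Ranking by absolute value treats a strong negative contribution (here `feature_2`) as just as important as a strong positive one.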