
Model Schema

Overview of Arize Model Inference Schema
Arize stores model data and organizes it according to a model schema. Your model schema differs based on the ingestion method and model type. Navigate to model types here.

Data Ingestion Methods

Optional and required schema parameters vary based on your model type and ingestion method. The schema parameter page for each ingestion method, linked below, lists its required parameters.
Find the Python SDK schema parameters here, Java SDK here, R SDK here, and REST API here.

Model Schema Definitions

See below for more details, or click to navigate directly to a definition.
  1. Model Name
  2. Model Version
  3. Model Environments
  4. Model Type
  5. Prediction ID
  6. Timestamp
  7. Features (Tabular - Structured Data)
  8. Embedding Features (Unstructured Data)
  9. Tags
  10. Feature Importance

1. Model Name

A unique identifier for your model. Choose a name that clearly reflects the business use case (e.g., fraud-prevention-model).
Ingestion Method Requirements: Required for all ingestion methods.

2. Model Version

A model version captures a minor or major change to the model that alters how the model evaluates input data. Model versions can be used to filter and compare performance in Arize.
A new model version should be created when:
  • A new model is trained, which produces a new set of weights
  • A new feature is added
Ingestion Method Requirements: Optional for all ingestion methods.

3. Model Environments

Environments are the different prediction streams for a model.
  • Training Environment: Data used to build the model. The response of the model to the training data.
  • Validation Environment: Data used to test the model. The response of the model to various validation datasets.
    • We support multiple batches of validation data. This means there can be batch1, batch2, etc.
  • Production Environment: Deployed model. The stream of production inferences.
Ingestion Method Requirements: Required for all ingestion methods.

4. Model Type

Arize supports many model types. See Model Types to learn more.
Ingestion Method Requirements: Required for all ingestion methods.

5. Prediction ID

A prediction ID uniquely identifies a prediction event. It is the join key between a prediction and its actual. If an actual's prediction_id does not match the prediction_id of a previously sent prediction, the actual will not be displayed, even though Arize receives it.
Example: a transaction ID acts as the prediction ID, joining the prediction "not fraud" with the actual "fraud".
Ingestion Method Requirements: Required for all ingestion methods.
What happens if we upload the same data with the same prediction ID twice? Does Arize treat that as one prediction/observation or as two?
They are treated as separate observations: two predictions sent with the same prediction ID count as two predictions. If an actual is sent for that prediction ID, the two predictions still appear separately, each with the matching actual.
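The join behavior described in this section can be illustrated locally with a small sketch. This is not Arize's implementation; the function name `join_on_prediction_id`, the field names, and the sample IDs are assumptions made for the illustration. It shows an actual matched to a prediction, a duplicated prediction ID counted twice, and an actual with no matching prediction being left out.

```python
# Illustrative only: mimics the documented join semantics with plain Python.
# All names and sample values here are hypothetical.

predictions = [
    {"prediction_id": "txn-1", "prediction_label": "not fraud"},
    {"prediction_id": "txn-2", "prediction_label": "fraud"},
    {"prediction_id": "txn-2", "prediction_label": "fraud"},  # same ID sent twice
]
actuals = [
    {"prediction_id": "txn-1", "actual_label": "fraud"},
    {"prediction_id": "txn-2", "actual_label": "fraud"},
    {"prediction_id": "txn-9", "actual_label": "not fraud"},  # no such prediction
]


def join_on_prediction_id(predictions, actuals):
    """Attach each actual to every prediction that shares its prediction_id."""
    actual_by_id = {a["prediction_id"]: a["actual_label"] for a in actuals}
    known_ids = {p["prediction_id"] for p in predictions}
    # Every prediction row is kept, so duplicate IDs stay as separate rows.
    joined = [
        {**p, "actual_label": actual_by_id.get(p["prediction_id"])}
        for p in predictions
    ]
    # Actuals whose ID matches no prediction would not be displayed.
    unmatched = [a for a in actuals if a["prediction_id"] not in known_ids]
    return joined, unmatched


joined, unmatched = join_on_prediction_id(predictions, actuals)
```

Here `joined` keeps three rows (the duplicated `txn-2` appears twice, each with its actual), and `unmatched` holds only the `txn-9` actual.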

6. Timestamp

The timestamp determines the time at which the data appears in the UI. Send it as an integer representing the UNIX timestamp in seconds. Typically, this is the time the prediction was made. However, in some cases, such as time series models, you may want the timestamp to be the date the prediction was made for.
The timestamp field defaults to the time you sent the prediction to Arize. Arize supports timestamps up to 1 year in the past or in the future relative to the current time.
Ingestion Method Requirements: Optional for all ingestion methods.
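As a minimal sketch of the timestamp rules above, the helpers below convert a datetime to an integer UNIX timestamp in seconds and check it against the one-year window. The helper names are hypothetical, "1 year" is approximated as 365 days, and naive datetimes are assumed to be UTC; none of these details come from Arize.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 year" is approximated as 365 days for this sketch.
ONE_YEAR_SECONDS = int(timedelta(days=365).total_seconds())


def to_unix_seconds(dt):
    """Convert a datetime to an integer UNIX timestamp in seconds."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def within_supported_window(ts, now=None):
    """Return True if ts is at most one year before or after `now`."""
    now_ts = to_unix_seconds(now or datetime.now(timezone.utc))
    return abs(ts - now_ts) <= ONE_YEAR_SECONDS


reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = to_unix_seconds(reference - timedelta(days=30))
too_old = to_unix_seconds(reference - timedelta(days=400))
```

With `reference` as the current time, `recent` passes the check and `too_old` does not.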

7. Features (Tabular - Structured)

Features are the inputs to the model.
Arize captures the feature schema when the first prediction is logged. If the features change over time, the feature schema adjusts to show the new schema.
Ingestion Method Requirements: Optional for all ingestion methods.

8. Embedding Features (Unstructured)

Arize's embedding objects are composed of 3 different pieces of information:
  • vector (required): The embedding vector itself, representing the unstructured input data. Accepted data types are List[float] and np.ndarray[float].
  • data (optional): Typically the raw text represented by the embedding vector. Accepted data types are str (for words or sentences) and List[str] (for token arrays).
  • link to data (optional): Typically a URL linking to the data file (image, audio, video...) represented by the embedding vector. The accepted data type is str.
Learn more about our embedding features here.
Ingestion Method Requirements: Optional for all ingestion methods.
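To make the three pieces of an embedding object concrete, here is a small stand-in written with a plain dataclass. It is not the Arize SDK class: the name `EmbeddingSketch`, its validation rules, and the example URL are assumptions for illustration, and it only accepts Python lists to stay dependency-free.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EmbeddingSketch:
    """Hypothetical stand-in for an embedding object; not the SDK class."""

    vector: List[float]                            # required
    data: Optional[Union[str, List[str]]] = None   # optional raw text or tokens
    link_to_data: Optional[str] = None             # optional URL to the data file

    def __post_init__(self):
        if not self.vector or not all(isinstance(x, float) for x in self.vector):
            raise ValueError("vector must be a non-empty list of floats")
        if self.data is not None:
            is_text = isinstance(self.data, str)
            is_tokens = isinstance(self.data, list) and all(
                isinstance(token, str) for token in self.data
            )
            if not (is_text or is_tokens):
                raise ValueError("data must be a str or a list of str")
        if self.link_to_data is not None and not isinstance(self.link_to_data, str):
            raise ValueError("link_to_data must be a str")


# Text embedding: the vector plus the token array it represents.
text_embedding = EmbeddingSketch(
    vector=[0.1, 0.2, 0.3],
    data=["the", "quick", "fox"],
)

# Image embedding: the vector plus a link to the file (placeholder URL).
image_embedding = EmbeddingSketch(
    vector=[0.4, 0.5, 0.6],
    link_to_data="https://example.com/images/sample.jpg",
)
```

Only `vector` is mandatory in this sketch, matching the required/optional split listed above.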

9. Tags

Tags are a convenient way to group predictions by metadata you find important but don't want to send as an input to the model (e.g., which server or node served the prediction or actual, sensitive categories, or model and feature operational metrics). Use tags to group, monitor, slice, and investigate the performance of “cohorts” based on user-defined metadata for the model.
Tags can be sent in with predictions or actuals.
If tags are sent in with a prediction and its corresponding actual, Arize merges the tag maps, keeping the prediction tag’s value if the tag keys are identical.
Example row of tags:
location    month      fruit
New York    January    apple
# Python single record: pass tags as a dict of tag name -> value
tags = {
    'location': 'New York',
    'month': 'January',
    'fruit': 'apple',
}
response = arize.log(
    model_id='sample-model-1',
    model_version='v1',
    ...
    tags=tags
)

# Python batch (pandas): name the DataFrame columns that hold tags
schema = Schema(
    prediction_id_column_name='prediction_id',
    ...
    tag_column_names=['location', 'month', 'fruit']
)
Ingestion Method Requirements: Optional for all ingestion methods.
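The tag merge rule described in this section (the prediction's value is kept when a prediction and its actual share a tag key) can be sketched with an ordinary dict merge. The function name `merge_tags` and the sample values are hypothetical; this mirrors the documented outcome and is not Arize's code.

```python
def merge_tags(prediction_tags, actual_tags):
    """Merge two tag maps, keeping the prediction's value on key conflicts."""
    # In a dict merge the later mapping wins, so prediction tags go last.
    return {**actual_tags, **prediction_tags}


prediction_tags = {"location": "New York", "month": "January"}
actual_tags = {"month": "February", "fruit": "apple"}

merged = merge_tags(prediction_tags, actual_tags)
```

In this example `merged["month"]` is `"January"`, the prediction's value, while `location` and `fruit` are both carried over.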

10. Feature Importance

Feature importance refers to a class of techniques that take all the features used to make a model prediction and assign each one a score reflecting how much or how little it impacted the outcome.
Check out the explainability section to learn more.
Ingestion Method Requirements: Optional for all ingestion methods.
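Taken together, the requirement levels stated for the ten definitions on this page can be summarized in a small pre-flight check. This is a rough sketch: the field names and the `check_record` helper are hypothetical, and actual requirements vary by model type and ingestion method, so the parameter pages for your ingestion method remain the authority.

```python
# Hypothetical field names that mirror the ten definitions on this page.
REQUIRED_FIELDS = (
    "model_name",
    "model_environment",
    "model_type",
    "prediction_id",
)
OPTIONAL_FIELDS = (
    "model_version",
    "timestamp",
    "features",
    "embedding_features",
    "tags",
    "feature_importance",
)


def check_record(record):
    """Return (missing required fields, unrecognized fields) for a record dict."""
    missing = [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]
    unknown = sorted(set(record) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    return missing, unknown


record = {
    "model_name": "fraud-prevention-model",
    "model_environment": "production",
    "prediction_id": "txn-1",
    "tags": {"location": "New York"},
    "notes": "not part of the schema",
}
missing, unknown = check_record(record)
```

For this record, `missing` flags `model_type` and `unknown` flags `notes`.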
Questions? Email us at [email protected] or Slack us in the #arize-support channel