Schema

An Arize class that maps the columns in your pandas DataFrame that contain model data to the fields Arize expects.
Import and initialize Arize Schema from arize.utils.types
from arize.utils.types import Schema
class Schema(
    prediction_id_column_name: Optional[str] = None,
    feature_column_names: Optional[List[str]] = None,
    tag_column_names: Optional[List[str]] = None,
    timestamp_column_name: Optional[str] = None,
    prediction_label_column_name: Optional[str] = None,
    prediction_score_column_name: Optional[str] = None,
    actual_label_column_name: Optional[str] = None,
    actual_score_column_name: Optional[str] = None,
    shap_values_column_names: Optional[Dict[str, str]] = None,
    actual_numeric_sequence_column_name: Optional[str] = None,
    embedding_feature_column_names: Optional[Dict[str, EmbeddingColumnNames]] = None,
    prediction_group_id_column_name: Optional[str] = None,
    rank_column_name: Optional[str] = None,
    attributions_column_name: Optional[str] = None,
    relevance_score_column_name: Optional[str] = None,
    relevance_labels_column_name: Optional[str] = None,
    prompt_column_names: Optional[EmbeddingColumnNames] = None,
    response_column_names: Optional[EmbeddingColumnNames] = None,
)
| Parameter | Data Type | Expected Type In Column | Description |
| --- | --- | --- | --- |
| prediction_id_column_name | str | String, limited to 128 characters | (Optional) A unique string that identifies a prediction event. Required to match a prediction to delayed actuals or feature importances in Arize. If the column is not provided, Arize generates a random prediction ID. |
| feature_column_names | List[str] | int, float, or string | (Optional) List of column names for features |
| embedding_feature_column_names | Dict[str, EmbeddingColumnNames] | See the EmbeddingColumnNames documentation | (Optional) Dictionary mapping embedding display names to EmbeddingColumnNames objects |
| timestamp_column_name | str | int Unix timestamps in seconds | (Optional) Column name for timestamps |
| prediction_label_column_name | str | Must be convertible to string | (Optional) Column name for categorical prediction values |
| prediction_score_column_name | str | int or float | (Optional) Column name for numeric prediction values |
| actual_label_column_name | str | Must be convertible to string | (Optional) Column name for categorical ground truth values |
| actual_score_column_name | str | int or float | (Optional) Column name for numeric ground truth values |
| tag_column_names | List[str] | int, float, or string. Limited to 1k values | (Optional) List of column names for tags |
| shap_values_column_names | Dict[str, str] | int or float | (Optional) Dictionary of key-value pairs where each key is a feature column name and each value is the name of the corresponding SHAP value column. For example, if your dataframe contains the feature columns feat1, feat2, feat3, ... and the corresponding SHAP value columns feat1_shap, feat2_shap, feat3_shap, ..., set shap_values_column_names = {"feat1": "feat1_shap", "feat2": "feat2_shap", "feat3": "feat3_shap"} |
| prediction_group_id_column_name | str | String, limited to 128 characters | (Required for ranking models only) Column name for ranking groups or lists |
| rank_column_name | str | Integer between 1 and 100 | (Required for ranking models only) Column name for the rank of each element within its group or list |
| relevance_score_column_name | str | int or float | (Required for ranking models only) Column name for numeric ground truth values |
| relevance_labels_column_name | str | String | (Required for ranking models only) Column name for categorical ground truth values |
| prompt_column_names | EmbeddingColumnNames | See the EmbeddingColumnNames documentation | EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the input text your model acts on |
| response_column_names | EmbeddingColumnNames | See the EmbeddingColumnNames documentation | EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the text your model generates |
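
The "Expected Type In Column" requirements can be enforced in pandas before a Schema is built. The following is a minimal sketch, not part of the Arize API: the dataframe and its values are hypothetical, and the column names simply mirror the ones used in the example on this page.

```python
import pandas as pd

# Hypothetical raw data; column names mirror the example on this page.
df = pd.DataFrame(
    {
        "prediction id": [101, 102],
        "prediction_ts": pd.to_datetime(
            ["2021-11-21 23:54:05", "2021-11-22 00:10:00"], utc=True
        ),
        "prediction_label": [True, False],
        "prediction_score": ["0.91", "0.12"],
    }
)

# Prediction IDs must be strings of at most 128 characters.
df["prediction id"] = df["prediction id"].astype(str)
assert df["prediction id"].str.len().le(128).all()

# Timestamps must be integer Unix timestamps in seconds.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
df["prediction_ts"] = (df["prediction_ts"] - epoch) // pd.Timedelta(seconds=1)

# Labels must be convertible to string; scores must be numeric.
df["prediction_label"] = df["prediction_label"].astype(str)
df["prediction_score"] = pd.to_numeric(df["prediction_score"])
```

Converting up front keeps type problems visible in your own code rather than surfacing later when the data is sent.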

Code Example

| prediction id | feature_1 | tag_1 | prediction_ts | prediction_label | actual_label | embedding |
| --- | --- | --- | --- | --- | --- | --- |
| 1fcd50f4689 | ca | female | 1637538845 | No Claims | No Claims | [1.27346, -0.2138, ...] |
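from arize.utils.types import EmbeddingColumnNames, Schema

# Assumption: each feature has a SHAP value column named "<feature>_shap"
feature_cols = ["feature_1", "feature_2", "feature_3"]
shap_cols = [f"{col}_shap" for col in feature_cols]

# Assumption: the "embedding" column holds the embedding vectors
embedding_feature_column_names = {
    "embedding": EmbeddingColumnNames(vector_column_name="embedding"),
}

schema = Schema(
    prediction_id_column_name="prediction id",
    feature_column_names=feature_cols,
    tag_column_names=["tag_1", "tag_2", "tag_3"],
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    # Map each feature column to its SHAP value column
    shap_values_column_names=dict(zip(feature_cols, shap_cols)),
    embedding_feature_column_names=embedding_feature_column_names,
    prediction_group_id_column_name="group_example_name",
    rank_column_name="example_rank",
    relevance_score_column_name="relevance_score",
    relevance_labels_column_name="actual_relevancy",
)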
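
When SHAP value columns follow a naming pattern such as the feat1_shap convention described for shap_values_column_names, the mapping can be derived from the dataframe itself instead of being typed out by hand. This is a sketch under that assumption only: the "_shap" suffix, the dataframe, and its values are hypothetical, and nothing here is part of the Arize API.

```python
import pandas as pd

# Hypothetical dataframe with two features, their SHAP columns, and one tag.
df = pd.DataFrame(
    {
        "feature_1": ["ca"],
        "feature_2": [3],
        "feature_1_shap": [0.12],
        "feature_2_shap": [-0.40],
        "tag_1": ["female"],
    }
)

suffix = "_shap"  # assumed naming convention for SHAP value columns

# Keep only SHAP columns whose base name is also a column in the dataframe,
# and map feature column name -> SHAP value column name.
shap_values_column_names = {
    col[: -len(suffix)]: col
    for col in df.columns
    if col.endswith(suffix) and col[: -len(suffix)] in df.columns
}

# The keys double as the list of feature columns that have SHAP values.
feature_column_names = list(shap_values_column_names)
```

Both results can then be passed to Schema as shap_values_column_names and feature_column_names. Note that zipping a bare string such as "feature_1" with a list of SHAP columns pairs individual characters with column names, so the feature names should always be supplied as a list.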