Schema
Arize class that maps the column names in your Pandas DataFrame that contain model data to the corresponding Arize fields.
Import and initialize the Arize Schema from `arize.utils.types`:

```python
from arize.utils.types import Schema
```
```python
class Schema(
    prediction_id_column_name: Optional[str] = None,
    feature_column_names: Optional[List[str]] = None,
    tag_column_names: Optional[List[str]] = None,
    timestamp_column_name: Optional[str] = None,
    prediction_label_column_name: Optional[str] = None,
    prediction_score_column_name: Optional[str] = None,
    actual_label_column_name: Optional[str] = None,
    actual_score_column_name: Optional[str] = None,
    shap_values_column_names: Optional[Dict[str, str]] = None,
    actual_numeric_sequence_column_name: Optional[str] = None,
    embedding_feature_column_names: Optional[Dict[str, EmbeddingColumnNames]] = None,
    prediction_group_id_column_name: Optional[str] = None,
    rank_column_name: Optional[str] = None,
    attributions_column_name: Optional[str] = None,
    relevance_score_column_name: Optional[str] = None,
    relevance_labels_column_name: Optional[str] = None,
    object_detection_prediction_column_names: Optional[ObjectDetectionColumnNames] = None,
    object_detection_actual_column_names: Optional[ObjectDetectionColumnNames] = None,
    prompt_column_names: Optional[EmbeddingColumnNames] = None,
    response_column_names: Optional[EmbeddingColumnNames] = None,
)
```
Parameter | Data Type | Expected Type In Column | Description |
---|---|---|---|
prediction_id_column_name | str | Contents must be a string limited to 128 characters | (Optional) A unique string to identify a prediction event. Required to match a prediction to delayed actuals or feature importances in Arize. If the column is not provided, Arize will generate a random prediction id. |
feature_column_names | List[str] | The content of this column can be int, float, string | (Optional) List of column names for features |
embedding_feature_column_names | Dict[str, EmbeddingColumnNames] | | (Optional) Dictionary mapping embedding display names to EmbeddingColumnNames objects |
timestamp_column_name | str | The content of this column must be int Unix timestamps in seconds | (Optional) Column name for timestamps |
prediction_label_column_name | str | The content of this column must be convertible to string | (Optional) Column name for categorical prediction values |
prediction_score_column_name | str | The content of this column must be int/float | (Optional) Column name for numeric prediction values |
actual_label_column_name | str | The content of this column must be convertible to string | (Optional) Column name for categorical ground truth values |
actual_score_column_name | str | The content of this column must be int/float | (Optional) Column name for numeric ground truth values |
tag_column_names | List[str] | The content of this column can be int, float, string. Limited to 1k values | (Optional) List of column names for tags |
shap_values_column_names | Dict[str, str] | The content of this column must be int/float | (Optional) Dictionary of key-value pairs where each key is a feature column name and each value is the name of the corresponding SHAP value column. For example, if your dataframe contains feature columns feat1, feat2, feat3, ... and corresponding SHAP value columns feat1_shap, feat2_shap, feat3_shap, ..., set shap_values_column_names = {"feat1": "feat1_shap", "feat2": "feat2_shap", "feat3": "feat3_shap"} |
prediction_group_id_column_name | str | The content of this column must be a string limited to 128 characters | (Required for ranking models only) Column name for ranking groups or lists |
rank_column_name | str | The content of this column must be an integer between 1 and 100 | (Required for ranking models only) Column name for the rank of each element within its group or list |
relevance_score_column_name | str | The content of this column must be int/float | (Required for ranking models only) Column name for numeric ground truth values of ranking models |
relevance_labels_column_name | str | The content of this column must be a string | (Required for ranking models only) Column name for categorical ground truth values of ranking models |
object_detection_prediction_column_names | ObjectDetectionColumnNames | | (Optional) ObjectDetectionColumnNames object defining the predicted bounding boxes' coordinates, categories, and scores |
object_detection_actual_column_names | ObjectDetectionColumnNames | | (Optional) ObjectDetectionColumnNames object defining the actual bounding boxes' coordinates, categories, and scores |
prompt_column_names | EmbeddingColumnNames | | (Optional) EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the input text your model acts on |
response_column_names | EmbeddingColumnNames | | (Optional) EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the text your model generates |
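The table requires prediction IDs to be strings of at most 128 characters and timestamps to be integer Unix timestamps in seconds. The sketch below shows one way to coerce both columns with pandas before building a Schema. It assumes the raw timestamps are datetime strings or datetime values (not numbers already in epoch form), and the helper name prepare_id_and_timestamp is hypothetical, not part of the Arize SDK.

```python
import pandas as pd

MAX_ID_LENGTH = 128  # prediction ID length limit stated in the parameter table


def prepare_id_and_timestamp(df: pd.DataFrame, id_col: str, ts_col: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy with string IDs and int Unix timestamps in seconds."""
    out = df.copy()
    # Prediction IDs must be strings of at most 128 characters
    out[id_col] = out[id_col].astype(str)
    too_long = out[id_col].str.len() > MAX_ID_LENGTH
    if too_long.any():
        raise ValueError(f"{int(too_long.sum())} prediction IDs exceed {MAX_ID_LENGTH} characters")
    # Assumes datetime strings/values; naive values are interpreted as UTC.
    # Numeric epoch values would be misread as nanoseconds, so do not pass them here.
    ts = pd.to_datetime(out[ts_col], utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    # Whole seconds since the Unix epoch, independent of the datetime resolution
    out[ts_col] = ((ts - epoch) // pd.Timedelta(seconds=1)).astype("int64")
    return out


df = pd.DataFrame({"prediction id": ["1fcd50f4689"], "prediction_ts": ["2021-11-21 23:54:05"]})
print(prepare_id_and_timestamp(df, "prediction id", "prediction_ts")["prediction_ts"].tolist())  # [1637538845]
```

Subtracting the epoch and floor-dividing by one second avoids depending on whether pandas stores the parsed datetimes at nanosecond or another resolution.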
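shap_values_column_names only maps names; it does not compute SHAP values. If your SHAP columns follow a consistent suffix, as in the feat1/feat1_shap example in the table, a small helper like the one below can build the dictionary and fail early when a SHAP column is missing. The helper name build_shap_mapping and the "_shap" suffix default are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def build_shap_mapping(
    feature_cols: List[str], available_cols: Iterable[str], suffix: str = "_shap"
) -> Dict[str, str]:
    """Hypothetical helper: map each feature to its "<feature><suffix>" SHAP column."""
    available = set(available_cols)
    mapping = {feature: f"{feature}{suffix}" for feature in feature_cols}
    # Report every expected SHAP column that the dataframe does not contain
    missing = sorted(col for col in mapping.values() if col not in available)
    if missing:
        raise KeyError(f"SHAP columns not found in dataframe: {missing}")
    return mapping


cols = ["feat1", "feat2", "feat3", "feat1_shap", "feat2_shap", "feat3_shap"]
print(build_shap_mapping(["feat1", "feat2", "feat3"], cols))
# {'feat1': 'feat1_shap', 'feat2': 'feat2_shap', 'feat3': 'feat3_shap'}
```

In practice available_cols would be df.columns of the dataframe you log.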
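For ranking models, rank_column_name must hold an integer between 1 and 100 within each group named by prediction_group_id_column_name. If your data has a model score per item rather than a rank, a sketch like the following derives the rank column with pandas. The score column, the add_rank_column helper, and its defaults are hypothetical; ties are broken by row order.

```python
import pandas as pd

MAX_RANK = 100  # rank values must be integers between 1 and 100


def add_rank_column(df: pd.DataFrame, group_col: str, score_col: str,
                    rank_col: str = "example_rank") -> pd.DataFrame:
    """Hypothetical helper: rank items within each group by descending score (1 = best)."""
    out = df.copy()
    # method="first" breaks ties by row order, so every rank in a group is a distinct integer
    ranks = out.groupby(group_col)[score_col].rank(ascending=False, method="first")
    out[rank_col] = ranks.astype("int64")
    if (out[rank_col] > MAX_RANK).any():
        raise ValueError(f"a group has more than {MAX_RANK} items; ranks must be between 1 and {MAX_RANK}")
    return out


df = pd.DataFrame({
    "group_example_name": ["q1", "q1", "q1", "q2", "q2"],
    "score": [0.2, 0.9, 0.5, 0.1, 0.7],
})
print(add_rank_column(df, "group_example_name", "score")["example_rank"].tolist())  # [3, 1, 2, 2, 1]
```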

Example DataFrame row:

prediction id | feature_1 | tag_1 | prediction_ts | prediction_label | actual_label | embedding |
---|---|---|---|---|---|---|
1fcd50f4689 | ca | female | 1637538845 | No Claims | No Claims | [1.27346, -0.2138, ...] |
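```python
from arize.utils.types import EmbeddingColumnNames, Schema

# Feature columns; the SHAP columns are assumed to be named "<feature>_shap"
feature_cols = ["feature_1", "feature_2", "feature_3"]
shap_cols = [f"{col}_shap" for col in feature_cols]

# Illustrative display name "embedding" mapped to the column holding the embedding vectors
embedding_feature_column_names = {
    "embedding": EmbeddingColumnNames(vector_column_name="embedding"),
}

schema = Schema(
    prediction_id_column_name="prediction id",
    feature_column_names=feature_cols,
    tag_column_names=["tag_1", "tag_2", "tag_3"],
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    # Pairs each feature with its SHAP column, e.g. {"feature_1": "feature_1_shap", ...}
    shap_values_column_names=dict(zip(feature_cols, shap_cols)),
    embedding_feature_column_names=embedding_feature_column_names,
    # Ranking models only
    prediction_group_id_column_name="group_example_name",
    rank_column_name="example_rank",
    relevance_score_column_name="relevance_score",
    relevance_labels_column_name="actual_relevancy",
)
```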
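A Schema only records column names, so a typo or a column that is absent from the dataframe is easy to overlook. The hypothetical helper below (not part of the Arize SDK) lists referenced names that a DataFrame does not contain. It is shown on the example row, with the embedding truncated to two values for illustration; that row holds only a subset of the columns the example Schema references.

```python
from typing import Iterable, List

import pandas as pd


def missing_columns(df: pd.DataFrame, referenced: Iterable[str]) -> List[str]:
    """Hypothetical helper: return referenced column names absent from df, in the given order."""
    present = set(df.columns)
    return [col for col in referenced if col not in present]


# The example row; the embedding vector is truncated to two values for illustration
df = pd.DataFrame([{
    "prediction id": "1fcd50f4689",
    "feature_1": "ca",
    "tag_1": "female",
    "prediction_ts": 1637538845,
    "prediction_label": "No Claims",
    "actual_label": "No Claims",
    "embedding": [1.27346, -0.2138],
}])

referenced = ["prediction id", "feature_1", "feature_2", "tag_1", "prediction_ts",
              "prediction_label", "prediction_score", "actual_label", "embedding"]
print(missing_columns(df, referenced))  # ['feature_2', 'prediction_score']
```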