Arize AI

arize.pandas (Batch)

Batch Logging - Designed for sending batches of data to Arize

Overview

The Pandas API is designed for proof-of-concept (POC) or production environments where data is processed in batches, such as a Jupyter Notebook or a Python server that batch processes large volumes of backend data.
Import and initialize the Arize client from arize.pandas.logger, then call Client.log() with a pandas.DataFrame containing inference data.
Client.log() returns a requests.models.Response object. You can check its HTTP status code to confirm that the records were delivered.
This API uses fast serialization to the file system from Python, followed by a fast client-to-server upload. It therefore requires file system storage for the file being uploaded.
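As a minimal sketch of the status check described above, the helper below inspects the status_code and text attributes that requests.models.Response provides. The helper name check_log_response is hypothetical, and treating 200 as the only success code is an assumption. A stand-in object replaces a real response so the snippet runs without network access.

```python
from types import SimpleNamespace


def check_log_response(response):
    """Return True if the upload succeeded, otherwise raise with details."""
    # requests.models.Response exposes .status_code and .text.
    # Assumption: HTTP 200 is the success code for a log call.
    if response.status_code == 200:
        return True
    raise RuntimeError(
        f"Arize logging failed with HTTP {response.status_code}: {response.text}"
    )


# Stand-in for demonstration only; in practice, pass the value
# returned by arize_client.log(...).
fake_ok = SimpleNamespace(status_code=200, text="")
print(check_log_response(fake_ok))
```

Raising on failure, rather than only printing, keeps a batch job from silently continuing after a failed upload.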

Initialize Arize Client

from arize.pandas.logger import Client, Schema
API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
If using version < 4.0.0, replace space_key=SPACE_KEY with organization_key=SPACE_KEY.
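If the same script has to run against both older and newer SDK versions, the keyword rename noted above can be handled in one place. This is a rough sketch: client_kwargs is a hypothetical helper, and it assumes the version string starts with an integer major version.

```python
def client_kwargs(space_key, api_key, sdk_version):
    """Build keyword arguments for Client() based on the SDK version string."""
    major = int(sdk_version.split(".")[0])
    # Versions before 4.0.0 use organization_key instead of space_key.
    key_name = "space_key" if major >= 4 else "organization_key"
    return {key_name: space_key, "api_key": api_key}


kwargs = client_kwargs("YOUR SPACE KEY", "ARIZE_API_KEY", "3.4.0")
print(kwargs)

# Usage (requires the arize package to be installed):
#   from importlib.metadata import version
#   from arize.pandas.logger import Client
#   arize_client = Client(**client_kwargs(SPACE_KEY, API_KEY, version("arize")))
```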

Parameters & Returns

response = arize_client.log(
    dataframe,
    path,
    model_id,
    model_version,
    batch_id,
    model_type,
    environment,
    sync,
    surrogate_explainability,
    schema=Schema(
        prediction_id_column_name,
        feature_column_names,
        embedding_feature_column_names,
        tag_column_names,
        timestamp_column_name,
        prediction_label_column_name,
        prediction_score_column_name,
        actual_label_column_name,
        shap_values_column_names,
    ),
)
type(response) == requests.models.Response
| Parameter | Data Type | Description | Required |
| --- | --- | --- | --- |
| dataframe | pandas.DataFrame | The dataframe with your predictions | Required |
| model_id | string | The unique identifier for your model | Required |
| model_version | string | Used to group together a subset of predictions and actuals for a given model_id | Required for logging predictions. Optional for logging actuals or SHAP values. |
| batch_id | string | Used to distinguish different batches of data under the same model_id and model_version | Required for the validation environment; not applicable to other environments. |
| model_type | arize.utils.types.ModelTypes | Declares the model type these predictions are for | Required |
| environment | arize.utils.types.Environments | The environment that this dataframe is for (Production, Training, Validation) | Required |
| schema | arize.pandas.logger.Schema | A Schema instance that specifies the column names for the corresponding data in the dataframe. More details below. | Required |
| path | string | Temporary directory or file in which to store the serialized binary data before sending it to Arize | Optional |
| sync | boolean | When sync is set to True, the log call blocks until the data has been successfully ingested by the platform, then immediately returns the status of the log. | Optional |
| surrogate_explainability | boolean | Computes feature importance values using the surrogate explainability method. This requires that the arize module is installed with the [MimicExplainer] option. If feature importance values are already specified by the shap_values_column_names attribute in the Schema, this module will not run. | Optional |
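The conditional requirements in the parameter table (model_version for predictions, batch_id for validation) are easy to miss, so a small pre-flight check can catch them before any data is serialized. The sketch below is not part of the Arize SDK: preflight_errors is a hypothetical helper, and it takes the environment as a plain string such as "PRODUCTION" or "VALIDATION" so that it runs without the arize package.

```python
def preflight_errors(environment, model_version=None, batch_id=None,
                     logging_predictions=True):
    """Return a list of argument problems, following the parameter table."""
    errors = []
    # model_version is required only when predictions are being logged.
    if logging_predictions and not model_version:
        errors.append("model_version is required when logging predictions")
    # batch_id is required for validation and not applicable elsewhere.
    if environment == "VALIDATION" and not batch_id:
        errors.append("batch_id is required for the validation environment")
    if environment != "VALIDATION" and batch_id is not None:
        errors.append("batch_id only applies to the validation environment")
    return errors


print(preflight_errors("PRODUCTION", model_version="1.0"))
print(preflight_errors("VALIDATION", model_version="1.0"))
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass.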

Schema Attributes

| Attribute | Data Type | Description | Required |
| --- | --- | --- | --- |
| prediction_id_column_name | str | Column name for prediction_id | Required |
| feature_column_names | List[str] | List of column names for features | Optional |
| embedding_feature_column_names | List[EmbeddingColumnNames] | List of EmbeddingColumnNames objects | Optional |
| timestamp_column_name | str | Column name for timestamps | Optional |
| prediction_label_column_name | str | Column name for prediction label | Optional |
| prediction_score_column_name | str | Column name for prediction scores | Optional |
| actual_label_column_name | str | Column name for actual label | Optional |
| actual_score_column_name | str | Column name for actual scores, or relevance scores in ranking models | Optional |
| tag_column_names | List[str] | List of column names for tags | Optional |
| shap_values_column_names | Dict[str, str] | Dict of key-value pairs where each key is a feature column name and each value is the corresponding SHAP value column name. For example, if your dataframe contains feature columns feat1, feat2, feat3, ... and corresponding SHAP value columns feat1_shap, feat2_shap, feat3_shap, ..., set shap_values_column_names = {"feat1": "feat1_shap", "feat2": "feat2_shap", "feat3": "feat3_shap"} | Optional |
| prediction_group_id_column_name | str | Column name for ranking groups or lists in ranking models | Required for ranking models |
| rank_column_name | str | Column name for the rank of each element within its group or list | Required for ranking models |
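The shap_values_column_names mapping described above can be built programmatically when SHAP columns follow a naming pattern. The following sketch assumes the "_shap" suffix convention from the example in the table. build_shap_mapping is a hypothetical helper, and the column list stands in for list(df.columns).

```python
def build_shap_mapping(columns, feature_cols, suffix="_shap"):
    """Map each feature column to its SHAP column, checking both exist."""
    present = set(columns)
    mapping = {}
    for feature in feature_cols:
        shap_col = feature + suffix
        if feature not in present or shap_col not in present:
            raise KeyError(
                f"expected columns {feature!r} and {shap_col!r} in the dataframe"
            )
        mapping[feature] = shap_col
    return mapping


# Stand-in for list(your_sample_df.columns)
columns = ["prediction_id", "feat1", "feat2", "feat1_shap", "feat2_shap"]
shap_values_column_names = build_shap_mapping(columns, ["feat1", "feat2"])
print(shap_values_column_names)
# {'feat1': 'feat1_shap', 'feat2': 'feat2_shap'}
```

Checking that both columns exist surfaces a misspelled column name locally instead of as a failed upload.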

Embedding Column Names

Arize's Embedding object is formed by 3 pieces of information: the vector (required), the data (optional) and the link to data (optional). When creating a batched job, we need to map up to 3 columns in a table to a single embedding feature, as opposed to the 1:1 relationship that exists with regular features. For this purpose, Arize provides the EmbeddingColumnNames object.
| Attribute | Data Type | Description | Required |
| --- | --- | --- | --- |
| vector_column_name | str | Column name for the vector of a given embedding feature. The contents of this column must be List[float] or a numpy.ndarray of floats. | Required |
| data_column_name | str | Column name for the data of a given embedding feature, typically the raw text associated with the embedding vector. The contents of this column must be str or List[str]. | Optional |
| link_to_data_column_name | str | Column name for the link to data of a given embedding feature, typically a link to the data file (image, audio, ...) associated with the embedding vector. The contents of this column must be str. | Optional |
NOTE: Currently Arize only supports link to image files.
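The content-type requirements for the three embedding columns can be checked row by row before logging. The sketch below is illustrative only: embedding_value_errors is a hypothetical helper, not an Arize function, and it handles plain Python lists only (a NumPy array would first need converting with its tolist() method).

```python
def embedding_value_errors(vector, data=None, link_to_data=None):
    """Return a list of type problems for one row's embedding values."""
    errors = []
    # The vector is required and must be a list of floats.
    if not (isinstance(vector, list) and all(isinstance(x, float) for x in vector)):
        errors.append("vector must be a list of floats")
    # The data is optional: a str or a list of str.
    if data is not None:
        is_str_list = isinstance(data, list) and all(isinstance(s, str) for s in data)
        if not (isinstance(data, str) or is_str_list):
            errors.append("data must be a str or a list of str")
    # The link to data is optional: a str.
    if link_to_data is not None and not isinstance(link_to_data, str):
        errors.append("link_to_data must be a str")
    return errors


print(embedding_value_errors([0.1, 0.2], data="hello world"))
print(embedding_value_errors([1, 2], link_to_data=42))
```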

Examples

Check out the Example Tutorial.

Example 1: Logging Features & Predictions Only, Then Actuals

from arize.utils.types import ModelTypes, Environments

# Step 1: log features and predictions
response = arize_client.log(
    dataframe=your_sample_df,
    path="inferences.bin",
    model_id="fraud-model",
    model_version="1.0",
    batch_id=None,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=Schema(
        prediction_id_column_name="prediction_id",
        timestamp_column_name="prediction_ts",
        prediction_label_column_name="prediction_label",
        prediction_score_column_name="prediction_score",
        feature_column_names=feature_cols,
        tag_column_names=tag_cols,
        shap_values_column_names=dict(zip(feature_cols, shap_cols)),
    ),
)

# Step 2: log actuals later, matched to predictions by prediction_id
response = arize_client.log(
    dataframe=test_df,
    path="inferences.bin",
    model_id="fraud-model",  # same model_id as the predictions above
    batch_id=None,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=Schema(
        prediction_id_column_name="prediction_id",
        actual_label_column_name="actual_label",
        actual_score_column_name="actual_score",
        tag_column_names=tag_cols,
    ),
)

Example 2: Logging Features, Predictions, Actuals, SHAP values Together

response = arize_client.log(
    dataframe=your_sample_df,
    path="inferences.bin",
    model_id="fraud-model",
    model_version="1.0",
    batch_id=None,
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    schema=Schema(
        prediction_id_column_name="prediction_id",
        timestamp_column_name="prediction_ts",
        prediction_label_column_name="prediction_label",
        actual_label_column_name="actual_label",
        feature_column_names=feature_cols,
        tag_column_names=tag_cols,
        shap_values_column_names=dict(zip(feature_cols, shap_cols)),
    ),
)
Questions? Email us at [email protected] or Slack us in the #arize-support channel