Log model inference data directly to Arize using SDK/API methods
This page shows how to send data to Arize using the Python Pandas SDK.
Step 1: Set Up Python SDK
Install Arize SDK
```shell
pip install arize
```
Initialize the Arize client from `arize.pandas.logger` to call `Client.log()`:

```python
from arize.pandas.logger import Client, Schema

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
```
Step 2: Set Model Schema Attributes
A model schema is broken into required and optional parameters. The optional parameters vary based on model type. Learn more about model types here, and see a comprehensive list of schema attributes and their definitions here.
```python
from arize.utils.types import TypedColumns

# Feature & tag columns can optionally be defined with typing:
feature_columns = TypedColumns(
    inferred=["state"],
    to_str=["states"],
)
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=feature_columns,
    tag_column_names=["gender"],
)
```
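For reference, a production DataFrame matching this schema would contain one row per inference, with columns named exactly as declared above (the values below are hypothetical):

```python
import pandas as pd

# Hypothetical inference records with the columns named in the schema above
example_dataframe = pd.DataFrame({
    "prediction_id": ["a1", "a2"],
    "prediction_ts": [1707000000, 1707000060],  # Unix timestamps
    "prediction_label": ["fraud", "not_fraud"],
    "actual_label": ["fraud", "not_fraud"],
    "state": ["CA", "NY"],
    "states": ["CA", "NY"],
    "gender": ["F", "M"],
})
```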
Optional: Embeddings
```python
from arize.utils.types import EmbeddingColumnNames

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector",            # column containing embedding vector (required)
        data_column_name="text",                # column containing raw text (optional, NLP)
        link_to_data_column_name="image_link",  # column containing image URL links (optional, CV)
    )
}
schema = Schema(
    embedding_feature_column_names=embedding_feature_column_names,
    ...
)
```
Optional: SHAP Values
```python
import shap
import pandas as pd

# Generate the SHAP values and save them as a DataFrame
explainer = shap.TreeExplainer(tree_model)
shap_values = explainer.shap_values(X_data)
shap_dataframe = pd.DataFrame(
    shap_values,
    columns=[f"{fn}_shap" for fn in X_data.columns],
)
shap_cols = shap_dataframe.columns
schema = Schema(
    shap_values_column_names=dict(zip(X_data.columns, shap_cols)),
    ...
)
```
Optional: Delayed Actuals
If your model receives delayed actuals, log the delayed production data using the same prediction ID you used when logging the prediction; Arize uses that ID to link the records together in the platform. Actuals can be delivered days or weeks after the prediction is received.
```python
# Log predictions
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    ...
)

# Then log actuals
schema = Schema(
    prediction_id_column_name="prediction_id",  # needs to be the same as above
    actual_label_column_name="actual_label",
    ...
)
```
Step 3: Log Inferences
Arize expects the DataFrame's index to be sorted and to begin at 0. If you perform operations that might affect the index before logging data, reset the index with `reset_index(drop=True)`.
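A minimal sketch of the reset (the DataFrame below is a stand-in for your own; only the `reset_index` call is the relevant step):

```python
import pandas as pd

# A DataFrame whose index is out of order after filtering/sorting operations
example_dataframe = pd.DataFrame({"prediction_id": ["a1", "a2"]}, index=[5, 3])

# Drop the existing index and replace it with a sorted, 0-based RangeIndex
example_dataframe = example_dataframe.reset_index(drop=True)
print(example_dataframe.index.tolist())  # → [0, 1]
```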
```python
from arize.utils.types import Environments, Metrics, ModelTypes

response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',  # specify your model version to easily track changes across the Arize platform (i.e. a retrained model)
    batch_id=None,
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    environment=Environments.PRODUCTION,  # pick from training, production, or validation data
    dataframe=example_dataframe,
    schema=schema,
)
```
The optional `metrics_validation` argument specifies the groups of metrics you expect to see in the platform. Combined with the `model_type` and based on the schema, Arize validates that these metrics will be available and that the required schema columns are present.
Call `__repr__()` on a `Metrics` enum member to see its description.
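The pattern looks like the following. Note this uses a hypothetical stand-in enum (with made-up member descriptions) rather than the real `Metrics` enum that ships in `arize.utils.types`, so only the `repr(...)` call is illustrative:

```python
from enum import Enum

class Metrics(Enum):  # stand-in for arize.utils.types.Metrics; descriptions are hypothetical
    CLASSIFICATION = 1
    AUC_LOG_LOSS = 2

    def __repr__(self):
        descriptions = {
            "CLASSIFICATION": "classification metrics such as accuracy and recall",
            "AUC_LOG_LOSS": "AUC and log loss metrics",
        }
        return f"{self.name}: {descriptions[self.name]}"

# Calling repr() invokes __repr__() and returns the member's description
print(repr(Metrics.CLASSIFICATION))
# → CLASSIFICATION: classification metrics such as accuracy and recall
```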