Pandas SDK Example

Log model inference data directly to Arize using SDK/API methods

This page shows how to send data to Arize using the pandas logger in the Arize Python SDK.

Step 1: Set Up Python SDK

Install Arize SDK

pip install arize

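Initialize the Arize client from arize.pandas.logger. You will call its log() method to send data. The other imports are the types used in the examples on this page.

from arize.pandas.logger import Client
from arize.utils.types import (
    EmbeddingColumnNames,
    Environments,
    Metrics,
    ModelTypes,
    Schema,
    TypedColumns,
)

# Replace with your own Arize keys
API_KEY = 'YOUR_API_KEY'
SPACE_KEY = 'YOUR_SPACE_KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)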
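Hardcoding keys in source files makes them easy to leak. A minimal sketch of reading them from environment variables instead follows. The helper `load_arize_credentials` and the variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` are hypothetical conventions for this example, not something the SDK requires.

```python
import os


def load_arize_credentials() -> tuple[str, str]:
    """Return (space_key, api_key) read from environment variables.

    The names ARIZE_SPACE_KEY and ARIZE_API_KEY are a convention assumed
    for this sketch; the Arize SDK does not read them itself.
    """
    values = {name: os.environ.get(name, "") for name in ("ARIZE_SPACE_KEY", "ARIZE_API_KEY")}
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Fail early with a clear message instead of sending empty keys
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values["ARIZE_SPACE_KEY"], values["ARIZE_API_KEY"]


# Usage with the Client imported above:
# space_key, api_key = load_arize_credentials()
# arize_client = Client(space_key=space_key, api_key=api_key)
```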

Step 2: Set Model Schema Attributes

A model schema is made up of required and optional parameters. Which optional parameters apply depends on the model type; see the Arize documentation on model types for details, and the schema reference for a complete list of schema attributes and their definitions.

Example Row

| Column | Example value |
|---|---|
| prediction_id | 1fcd50f4689 |
| prediction_ts | 1637538845 |
| prediction_label | No Claims |
| actual_label | |
| state | |
| states | [ca, ak] |
| gender | female |
| vector | [1.27346, -0.2138, ...] |
| text | "This is an example text" |
| image_link | "https://example_ur.jpg" |
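A sketch of the example row as a one-row pandas DataFrame follows. The prediction_ts value 1637538845 is a Unix timestamp in seconds (2021-11-21 23:54:05 UTC), so the sketch builds it from a timezone-aware datetime. The actual_label and state values are not given in the table and are hypothetical placeholders, and the vector is shortened to the two values shown.

```python
from datetime import datetime, timezone

import pandas as pd

# 2021-11-21 23:54:05 UTC, converted to Unix seconds below
predicted_at = datetime(2021, 11, 21, 23, 54, 5, tzinfo=timezone.utc)

example_dataframe = pd.DataFrame(
    {
        "prediction_id": ["1fcd50f4689"],
        "prediction_ts": [int(predicted_at.timestamp())],
        "prediction_label": ["No Claims"],
        "actual_label": ["No Claims"],  # hypothetical value
        "state": ["ca"],  # hypothetical value
        "states": [["ca", "ak"]],  # list-valued feature
        "gender": ["female"],
        "vector": [[1.27346, -0.2138]],  # real embeddings are much longer
        "text": ["This is an example text"],
        "image_link": ["https://example_ur.jpg"],
    }
)
print(example_dataframe.loc[0, "prediction_ts"])  # 1637538845
```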

schema = Schema(
    prediction_id_column_name="prediction_id", 
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=["state", "states"],
    tag_column_names=["gender"]
)
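Every column named in the schema has to exist in the DataFrame you log. Arize validates this itself, but a local pre-flight check can surface typos before an upload. The helper `missing_schema_columns` below is hypothetical and works on a plain dict that mirrors the `Schema` keyword arguments above, not on the SDK's `Schema` object.

```python
from typing import Iterable, Mapping, Sequence, Union

ColumnSpec = Union[str, Sequence[str]]


def missing_schema_columns(
    schema_columns: Mapping[str, ColumnSpec], dataframe_columns: Iterable[str]
) -> dict[str, list[str]]:
    """Return, per schema field, the referenced columns absent from the DataFrame."""
    available = set(dataframe_columns)
    missing: dict[str, list[str]] = {}
    for field, spec in schema_columns.items():
        # A field names either one column or a list of columns
        names = [spec] if isinstance(spec, str) else list(spec)
        absent = [name for name in names if name not in available]
        if absent:
            missing[field] = absent
    return missing


# Mirrors the keyword arguments passed to Schema above
schema_columns = {
    "prediction_id_column_name": "prediction_id",
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "prediction_label",
    "actual_label_column_name": "actual_label",
    "feature_column_names": ["state", "states"],
    "tag_column_names": ["gender"],
}

# A DataFrame that forgot the actual_label column
print(missing_schema_columns(
    schema_columns,
    ["prediction_id", "prediction_ts", "prediction_label", "state", "states", "gender"],
))  # {'actual_label_column_name': ['actual_label']}
```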

Optional: Typed Columns

# feature & tag columns can be optionally defined with typing:
feature_columns = TypedColumns(
    inferred=["state"],
    to_str=["states"],
)

schema = Schema(
    prediction_id_column_name="prediction_id", 
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=feature_columns,
    tag_column_names=["gender"]
)
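Here `to_str` asks for the list-valued `states` column to be treated as a string. To see what a string version of a list column looks like, you can preview the cast with pandas `astype(str)`. This is only an approximation; whether the SDK casts exactly this way is an assumption.

```python
import pandas as pd

# The "states" feature holds one Python list per row
df = pd.DataFrame({"states": [["ca", "ak"], ["ny"]]})

# Preview a plain string cast (an approximation of to_str, not the SDK's own code)
states_as_str = df["states"].astype(str)
print(states_as_str.tolist())  # ["['ca', 'ak']", "['ny']"]
```

Note that each list becomes a single string value such as "['ca', 'ak']", not two separate states, so the feature behaves like one categorical value per row.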

Optional: Embeddings

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector", # column containing embedding vector (required)
        data_column_name="text", # column containing raw text (optional NLP)
        link_to_data_column_name="image_link" # column containing image URL links (optional CV)
    )
}

schema = Schema(
    embedding_feature_column_names=embedding_feature_column_names,
    ...
)
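The vector column should hold numeric vectors. A quick local check is sketched below, assuming all vectors in one embedding feature should share the same length. The helper `embedding_dimensions` is hypothetical. It accepts any iterable of vectors, including a pandas Series of lists or NumPy arrays.

```python
from numbers import Real
from typing import Iterable, Sequence


def embedding_dimensions(vectors: Iterable[Sequence[float]]) -> set[int]:
    """Return the set of vector lengths in an embedding column.

    Raises TypeError if any vector contains a non-numeric element.
    """
    lengths = set()
    for row, vector in enumerate(vectors):
        if not all(isinstance(x, Real) for x in vector):
            raise TypeError(f"row {row}: embedding vector contains non-numeric values")
        lengths.add(len(vector))
    return lengths


# Illustrative vectors; the third one is deliberately a different length
vectors = [[1.27346, -0.2138], [0.5, 0.25], [0.1, 0.2, 0.3]]
dims = embedding_dimensions(vectors)
if len(dims) > 1:
    print(f"Inconsistent embedding sizes: {sorted(dims)}")  # [2, 3]
```

In practice you would pass `example_dataframe["vector"]`.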

Optional: SHAP Values

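import pandas as pd
import shap

# Generate the SHAP values and save them as a DataFrame
# (tree_model is your trained tree model; X_data is its feature DataFrame)
explainer = shap.TreeExplainer(tree_model)
shap_values = explainer.shap_values(X_data)
shap_dataframe = pd.DataFrame(
    shap_values, columns=[f"{fn}_shap" for fn in X_data.columns]
)

# The SHAP columns must be part of the DataFrame you log (assumes matching row order and index)
example_dataframe = pd.concat([example_dataframe, shap_dataframe], axis=1)

schema = Schema(
    # Map each feature column to its SHAP column, e.g. "state" -> "state_shap"
    shap_values_column_names=dict(zip(X_data.columns, shap_dataframe.columns)),
    ...
)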
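To see the naming convention without training a model, the sketch below applies it to made-up SHAP values. The helper `shap_frame_and_mapping` and the second feature `age` are hypothetical. `pd.DataFrame` raises a `ValueError` if the number of feature names does not match the number of SHAP columns, which catches a common mismatch.

```python
import pandas as pd


def shap_frame_and_mapping(feature_names, shap_values):
    """Name each SHAP column '<feature>_shap' and map feature -> SHAP column."""
    shap_df = pd.DataFrame(
        shap_values, columns=[f"{name}_shap" for name in feature_names]
    )
    mapping = dict(zip(feature_names, shap_df.columns))
    return shap_df, mapping


# Made-up SHAP values: two rows, two features (illustration only)
shap_df, mapping = shap_frame_and_mapping(["state", "age"], [[0.12, -0.30], [0.05, 0.41]])
print(mapping)  # {'state': 'state_shap', 'age': 'age_shap'}
```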

Optional: Delayed Actuals

If your model's actuals arrive later than its predictions, log them separately using the same prediction ID. Arize uses the prediction ID to link the actuals to the predictions you logged earlier, so the actuals can be sent days or weeks after the prediction was made.

# Log predictions
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    ...
)
# Then log actuals
schema = Schema(
    prediction_id_column_name="prediction_id",  # IDs must match those logged with the predictions
    actual_label_column_name="actual_label",
    ...
)
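The matching happens in Arize, but the logic can be illustrated locally with a pandas left join on `prediction_id`. In the sketch below, the IDs and labels are illustrative. The check for "orphan" actuals (IDs never logged as predictions) is an extra safeguard suggested here, not an Arize feature.

```python
import pandas as pd

# Logged first: predictions without actuals (IDs and labels are illustrative)
predictions = pd.DataFrame(
    {
        "prediction_id": ["a1", "a2", "a3"],
        "prediction_label": ["No Claims", "Claim", "No Claims"],
    }
)

# Logged later: actuals for only some of those IDs
actuals = pd.DataFrame(
    {"prediction_id": ["a1", "a3"], "actual_label": ["No Claims", "Claim"]}
)

# A local left join on prediction_id, mimicking how the two uploads are matched
joined = predictions.merge(actuals, on="prediction_id", how="left")

# Predictions still waiting for an actual
unmatched = joined.loc[joined["actual_label"].isna(), "prediction_id"].tolist()
# Actuals whose ID was never logged as a prediction (often a sign of an ID mismatch)
orphans = set(actuals["prediction_id"]) - set(predictions["prediction_id"])
print(unmatched, orphans)  # ['a2'] set()
```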

Step 3: Log Inferences

Arize expects the DataFrame's index to be sorted and begin at 0. If you perform operations that might affect the index prior to logging data, reset the index as follows:

example_dataframe = example_dataframe.reset_index(drop=True)
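Filtering is a typical operation that leaves gaps in the index. The sketch below (with illustrative data) shows the gap and how `reset_index(drop=True)` renumbers rows from 0 without adding the old index as a column.

```python
import pandas as pd

example_dataframe = pd.DataFrame(
    {
        "prediction_id": ["a1", "a2", "a3", "a4"],
        "prediction_label": ["No Claims", "Claim", "No Claims", "Claim"],
    }
)

# Filtering keeps the original index labels, leaving gaps
claims_only = example_dataframe[example_dataframe["prediction_label"] == "Claim"]
print(claims_only.index.tolist())  # [1, 3]

# drop=True discards the old labels instead of inserting them as an "index" column
claims_only = claims_only.reset_index(drop=True)
print(claims_only.index.tolist())  # [0, 1]
```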

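response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',  # Specify your model version to track changes across the Arize platform (e.g. a retrained model)
    batch_id=None,  # identifies a validation batch; None for production data
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    environment=Environments.PRODUCTION,  # one of Environments.TRAINING, Environments.VALIDATION, Environments.PRODUCTION
    dataframe=example_dataframe,
    schema=schema,
)

# log() returns an HTTP response; any status other than 200 means the upload failed
if response.status_code != 200:
    print(f"Logging failed with response code {response.status_code}: {response.text}")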

Optional: Metrics Validation

metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS]

metrics_validation is an optional argument that lists the groups of metrics you intend to use. Given the model_type and the schema, Arize validates that these metrics will be available in the platform and that the schema includes the columns they require.

Call repr() on a Metrics member to see which metrics its group includes:

repr(Metrics.CLASSIFICATION)
> CLASSIFICATION metrics include: Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity
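AUC and log loss are computed from predicted scores rather than from labels, so Metrics.AUC_LOG_LOSS most likely also needs a prediction score column (the Schema field prediction_score_column_name). The sketch below illustrates the kind of check metrics validation performs. The requirements table and the helper `unmet_metric_groups` are hypothetical and written for this example; Arize's actual rules may differ.

```python
# Hypothetical requirements, written for this sketch; Arize's actual rules may differ
REQUIRED_SCHEMA_FIELDS = {
    "CLASSIFICATION": {"prediction_label_column_name", "actual_label_column_name"},
    "AUC_LOG_LOSS": {"prediction_score_column_name", "actual_label_column_name"},
}


def unmet_metric_groups(metric_groups, schema_fields):
    """Map each requested metric group to the schema fields it still lacks."""
    provided = set(schema_fields)
    unmet = {}
    for group in metric_groups:
        lacking = REQUIRED_SCHEMA_FIELDS[group] - provided
        if lacking:
            unmet[group] = sorted(lacking)
    return unmet


# Schema fields set in the Step 2 example: labels, but no prediction score
step2_fields = [
    "prediction_id_column_name",
    "timestamp_column_name",
    "prediction_label_column_name",
    "actual_label_column_name",
    "feature_column_names",
    "tag_column_names",
]
print(unmet_metric_groups(["CLASSIFICATION", "AUC_LOG_LOSS"], step2_fields))
# {'AUC_LOG_LOSS': ['prediction_score_column_name']}
```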


Last updated 1 year ago
