Pandas SDK Example

Log model inference data directly to Arize using SDK/API methods

This page shows how to send data to Arize using the Python Pandas SDK.

Step 1: Set Up Python SDK

Install the Arize SDK

pip install arize

Initialize the Arize client from arize.pandas.logger, which you will use to call Client.log().

from arize.pandas.logger import Client, Schema

API_KEY = 'ARIZE_API_KEY'   # your Arize API key
SPACE_ID = 'YOUR SPACE ID'  # your Arize space ID
arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)

Step 2: Set Model Schema Attributes

A model schema is broken into required and optional parameters. The optional schema parameters vary based on model type. Learn more about model types here, and see a comprehensive list of schema attributes and their definitions here.

Example Row

prediction_id: 1fcd50f4689
prediction_ts: 1637538845
prediction_label: No Claims
actual_label: No Claims
state: ca
states: [ca, ak]
gender: female
vector: [1.27346, -0.2138, ...]
text: "This is an example text"
image_link: "https://example_ur.jpg"

schema = Schema(
    prediction_id_column_name="prediction_id", 
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=["state", "states"],
    tag_column_names=["gender"]
)

Optional: Typed Columns

See Sending Data FAQ for more info on SDK typing features.

# feature & tag columns can be optionally defined with typing:
from arize.utils.types import TypedColumns

feature_columns = TypedColumns(
    inferred=["state"],  # let Arize infer this column's type
    to_str=["states"],   # cast this column to string
)

schema = Schema(
    prediction_id_column_name="prediction_id", 
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=feature_columns,
    tag_column_names=["gender"]
)

Optional: Embeddings

# Declare embedding feature columns
from arize.utils.types import EmbeddingColumnNames

embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector",  # column containing the embedding vector (required)
        data_column_name="text",  # column containing raw text (optional, for NLP)
        link_to_data_column_name="image_link"  # column containing image URL links (optional, for CV)
    )
}

schema = Schema(
    embedding_feature_column_names=embedding_feature_column_names,
    ...
)

Optional: SHAP Values

# Generate the SHAP values and save them as a DataFrame
import shap
import pandas as pd

explainer = shap.TreeExplainer(tree_model)
shap_values = explainer.shap_values(X_data)
shap_dataframe = pd.DataFrame(
    shap_values, columns=[f"{fn}_shap" for fn in X_data.columns]
)
shap_cols = shap_dataframe.columns

schema = Schema(
    # map each feature column name to its SHAP value column name
    shap_values_column_names=dict(zip(X_data.columns, shap_cols)),
    ...
)

Optional: Delayed Actuals

If your model receives delayed actuals, log your delayed production data using the same prediction ID you used for the original prediction; Arize uses this ID to link the records together in the platform. Actuals can be delivered days or weeks after the prediction is received.

# log predictions
schema = Schema(
    prediction_id_column_name="prediction_id", 
    prediction_label_column_name="prediction_label",
    ...
)
# then log actuals
schema = Schema(
    prediction_id_column_name="prediction_id",  # must match the prediction ID logged above
    actual_label_column_name="actual_label",
    ...
)
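For illustration, a minimal sketch of the two corresponding log() calls (preds_df, actuals_df, predictions_schema, and actuals_schema are hypothetical names for your two DataFrames and the two schemas above):

# log the predictions as soon as they are made
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.PRODUCTION,
    dataframe=preds_df,  # hypothetical DataFrame containing the predictions
    schema=predictions_schema  # the prediction-only schema above
)

# days or weeks later, log the actuals keyed on the same prediction IDs
response = arize_client.log(
    model_id='sample-model-1',  # must match the model_id used for the predictions
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.PRODUCTION,
    dataframe=actuals_df,  # hypothetical DataFrame containing the delayed actuals
    schema=actuals_schema  # the actuals-only schema above
)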

Step 3: Log Inferences

Arize expects the DataFrame's index to be sorted and begin at 0. If you perform operations that might affect the index prior to logging data, reset the index as follows:

from arize.utils.types import ModelTypes, Environments, Metrics

example_dataframe = example_dataframe.reset_index(drop=True)

response = arize_client.log(
    model_id='sample-model-1', 
    model_version='v1',  # specify your model version to easily track changes across the Arize platform (i.e. a retrained model)
    batch_id=None,
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    environment=Environments.PRODUCTION,  # pick from training, production, or validation data
    dataframe=example_dataframe,
    schema=schema
)
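
log() returns a response you can check for success. A minimal sketch, assuming the synchronous pandas client, whose log() returns an HTTP-style response object:

# check that the data was accepted by Arize
if response.status_code == 200:
    print("Successfully logged data to Arize!")
else:
    print(f"Logging failed with status {response.status_code}: {response.text}")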

Optional: Metrics Validation

metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS]

metrics_validation is an optional argument that specifies the groups of metrics you expect to use. Combined with the model_type and based on the schema, Arize validates that these expected metrics will be available in the platform and that the required schema columns are present.

Call __repr__() on a Metrics enum to see its description:

repr(Metrics.CLASSIFICATION)
> CLASSIFICATION metrics include: Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity

Learn more about metrics families here.

Other Supported SDKs

Tutorials on how to log predictions, actuals, and feature importance.

Logging Predictions Only

Logging Predictions First, Then Logging Delayed Actuals

Logging Predictions First, Then Logging SHAPs After

Logging Predictions and Actuals Together

Logging Predictions and SHAP Together

Logging Predictions, Actuals, and SHAP Together

Logging PySpark DataFrames
