Python Pandas SDK

Log model inference data directly to Arize using SDK/API methods
This page shows how to send data to Arize using the Python Pandas SDK.

Step 1: Set Up Python SDK

Install Arize SDK
pip install arize
Initialize the Arize client from arize.pandas.logger to call Client.log()
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, EmbeddingColumnNames, Metrics, ModelTypes

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

Step 2: Set Model Schema Attributes

A model schema is broken into required and optional parameters. The optional parameters vary by model type. Learn more about model types here, and find a comprehensive list of schema attributes and their definitions here.

Example Row

| prediction_id | prediction_ts | prediction_label | actual_label | state | gender | vector | text | image_link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1fcd50f4689 | 1637538845 | No Claims | No Claims | ca | female | [1.27346, -0.2138, ...] | "This is an example text" | "https://example_ur.jpg" |
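For reference, a DataFrame matching the example row above could be assembled like this (a minimal sketch; values are taken from the table, with the embedding vector truncated as shown there):

```python
import pandas as pd

# Single-row DataFrame mirroring the example row in the table above
example_dataframe = pd.DataFrame(
    {
        "prediction_id": ["1fcd50f4689"],
        "prediction_ts": [1637538845],            # Unix timestamp in seconds
        "prediction_label": ["No Claims"],
        "actual_label": ["No Claims"],
        "state": ["ca"],                          # feature column
        "gender": ["female"],                     # tag column
        "vector": [[1.27346, -0.2138]],           # embedding vector (truncated)
        "text": ["This is an example text"],
        "image_link": ["https://example_ur.jpg"],
    }
)
```

The column names here are the ones the Schema below refers to.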
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=["state"],
    tag_column_names=["gender"],
)

Optional: Embeddings

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector",           # column containing the embedding vector (required)
        data_column_name="text",               # column containing raw text (optional, NLP)
        link_to_data_column_name="image_link"  # column containing image URL links (optional, CV)
    )
}

schema = Schema(
    embedding_feature_column_names=embedding_feature_column_names,
    ...
)

Optional: SHAP Values

# Generate the SHAP values and save them as a DataFrame
explainer = shap.TreeExplainer(tree_model)
shap_values = explainer.shap_values(X_data)
shap_dataframe = pd.DataFrame(
    shap_values, columns=[f"{fn}_shap" for fn in X_data.columns]
)
shap_cols = shap_dataframe.columns

schema = Schema(
    shap_values_column_names=dict(zip(X_data.columns, shap_cols)),
    ...
)

Optional: Delayed Actuals

If your model receives delayed actuals, log the delayed production data using the same prediction IDs as the original predictions; Arize uses the IDs to link the records together in the platform. Actuals can be delivered days or weeks after the predictions are received.
# log predictions
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    ...
)
# then log actuals
schema = Schema(
    prediction_id_column_name="prediction_id",  # needs to be the same as above
    actual_label_column_name="actual_label",
    ...
)
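Conceptually, the two uploads are joined on prediction_id. The sketch below illustrates that linkage with a plain pandas merge (hypothetical IDs and labels; this is not the SDK's actual ingestion logic, just the joining idea):

```python
import pandas as pd

# Predictions logged first
predictions = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "prediction_label": ["No Claims", "Claims", "No Claims"],
})

# Actuals arriving later, keyed by the same prediction IDs
delayed_actuals = pd.DataFrame({
    "prediction_id": ["a1", "a3"],
    "actual_label": ["No Claims", "Claims"],
})

# Matching IDs link each actual back to its prediction;
# predictions without an actual yet simply remain unmatched
linked = predictions.merge(delayed_actuals, on="prediction_id", how="left")
```

Predictions whose actuals have not arrived yet stay unmatched until the corresponding actual is logged.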

Step 3: Log Inferences

Arize expects the DataFrame's index to be sorted and begin at 0. If you perform operations that might affect the index prior to logging data, reset the index as follows:
example_dataframe = example_dataframe.reset_index(drop=True)
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',  # specify your model version to track changes across the Arize platform (e.g. a retrained model)
    path='inferences.bin',
    batch_id=None,
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    environment=Environments.PRODUCTION,  # one of training, production, or validation
    dataframe=example_dataframe,
    schema=schema
)
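It is worth checking the returned response for success before moving on. Assuming the response object exposes requests-style status_code and text attributes, a common pattern looks like this (the FakeResponse class below is a stand-in so the sketch is self-contained; in practice you would use the object returned by arize_client.log()):

```python
# Stand-in for the object returned by arize_client.log()
class FakeResponse:
    status_code = 200
    text = "OK"

response = FakeResponse()  # in practice: response = arize_client.log(...)

# Non-200 status codes indicate the upload was rejected
if response.status_code != 200:
    message = f"logging failed with status {response.status_code}: {response.text}"
else:
    message = "logging succeeded"
print(message)
```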

Optional: Metrics Validation

metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS]
metrics_validation is an optional argument that specifies the groups of metrics you expect to use. Combined with the model_type and based on the schema, Arize validates that these metrics will be available in the platform and that the required schema columns are present.
Call __repr__() on a Metrics enum to see its description:
repr(Metrics.CLASSIFICATION)
> CLASSIFICATION metrics include: Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity
Learn more about metrics families here.
