
Log Directly Via SDK/API

Log model inference data directly to Arize using SDK/API methods
This page covers how to ingest model inference data directly into Arize using the Python SDK. Arize supports logging model inference data through a few different methods; to ingest data from your cloud storage provider instead, see here.

Step 1: Set Up Python SDK

Install Arize SDK
pip install arize
Initialize the Arize client from arize.pandas.logger to call Client.log()
from arize.pandas.logger import Client, Schema

API_KEY = 'YOUR_API_KEY'
SPACE_KEY = 'YOUR_SPACE_KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

Example Row

| prediction_id | prediction_ts | prediction_label | actual_label | state | gender | vector | text | image_link |
|---|---|---|---|---|---|---|---|---|
| 1fcd50f4689 | 1637538845 | No Claims | No Claims | ca | female | [1.27346, -0.2138, ...] | "This is an example text" | "https://example_ur.jpg" |
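The example row above can be assembled locally as a pandas DataFrame whose column names match the Schema fields used later on this page (a minimal, illustrative construction; the embedding vector is truncated here):

```python
import pandas as pd

# Illustrative DataFrame mirroring the example row above; column names
# match the Schema declared in Step 2.
example_dataframe = pd.DataFrame(
    {
        "prediction_id": ["1fcd50f4689"],
        "prediction_ts": [1637538845],  # Unix timestamp in seconds
        "prediction_label": ["No Claims"],
        "actual_label": ["No Claims"],
        "state": ["ca"],
        "gender": ["female"],
        "vector": [[1.27346, -0.2138]],  # embedding vector (shortened here)
        "text": ["This is an example text"],
        "image_link": ["https://example_ur.jpg"],
    }
)
print(example_dataframe.shape)
```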

Step 2: Log Model Schema Attributes

Optional: Embeddings & SHAP

Learn more about embeddings here and SHAP here.
# Declare embedding feature columns
from arize.utils.types import EmbeddingColumnNames

embedding_feature_column_names = {
    # Dictionary keys become the embedding feature's name in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="vector",           # column containing the embedding vector (required)
        data_column_name="text",               # column containing raw text (optional, NLP)
        link_to_data_column_name="image_link"  # column containing image URL links (optional, CV)
    )
}

# Generate the SHAP values and save them as a DataFrame
import shap
import pandas as pd

explainer = shap.TreeExplainer(tree_model)
shap_values = explainer.shap_values(X_data)
shap_dataframe = pd.DataFrame(
    shap_values, columns=[f"{fn}_shap" for fn in X_data.columns]
)
shap_cols = shap_dataframe.columns
Find a comprehensive list of schema attributes and their definitions here.
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=["state"],
    tag_column_names=["gender"],
    embedding_feature_column_names=embedding_feature_column_names,
    shap_values_column_names=dict(zip(["state"], shap_cols)),
)

Step 3: Log Model Schema Parameters

A model schema is broken into required and optional parameters. The optional parameters vary based on model type. Learn more about model types here.
from arize.utils.types import ModelTypes, Environments, Metrics

response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',  # specify your model version to track changes across the Arize platform (e.g., a retrained model)
    path='inferences.bin',
    batch_id=None,
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS],
    environment=Environments.PRODUCTION,  # pick from training, production, or validation
    dataframe=example_dataframe,
    schema=schema
)

Optional: Metrics Validation

metrics_validation=[Metrics.CLASSIFICATION, Metrics.AUC_LOG_LOSS]
The optional metrics_validation argument specifies the groups of metrics you expect to use. Combined with the model_type and based on the schema, Arize validates that these metrics will be available in the platform and that the required schema columns are present.
Call __repr__() on a Metrics enum to see its description:
repr(Metrics.CLASSIFICATION)
> CLASSIFICATION metrics include: Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity
Learn more about metrics families here.

Optional: Delayed Actuals

If your model receives delayed actuals, log the delayed production data using the same prediction ID as the original prediction; Arize links the records together in the platform. Actuals can arrive days or weeks after the prediction is received.
# Log predictions
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    ...
)

# Then log actuals
schema = Schema(
    prediction_id_column_name="prediction_id",  # must match the prediction ID logged above
    actual_label_column_name="actual_label",
    ...
)