Pandas Batch Logging
Batch Logging - Designed for sending batches of data to Arize
Use the arize Python library to monitor machine learning predictions with a few lines of code in a Jupyter Notebook or a Python server that batch processes backend data.
The most commonly used functions/objects are:
# install and import dependencies
!pip install -q arize
import datetime
from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, Schema, Metrics
import numpy as np
import pandas as pd
# create Arize client
SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
# define schema
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_label",
    actual_label_column_name="actual_label",
    feature_column_names=feature_column_names,
)
# log data
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="binary-classification-metrics-only-batch-ingestion-tutorial",
    model_version="1.0.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION],
    validate=True,
    environment=Environments.PRODUCTION,
)
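The snippet above refers to `df` and `feature_column_names` without defining them. As a minimal sketch, a dataframe matching that schema could be built as follows. The feature names (`age`, `account_balance`, `state`), the label values, and the use of Unix epoch seconds for `prediction_ts` are illustrative assumptions, not values taken from the Arize tutorial.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical feature names; replace with the columns of your own model.
feature_column_names = ["age", "account_balance", "state"]

rng = np.random.default_rng(seed=0)
n_rows = 5
now = datetime.datetime.now(datetime.timezone.utc)

df = pd.DataFrame(
    {
        # One unique ID per prediction
        "prediction_id": [f"pred-{i}" for i in range(n_rows)],
        # Assumption: timestamps expressed as Unix epoch seconds, one per hour
        "prediction_ts": [
            int((now - datetime.timedelta(hours=i)).timestamp())
            for i in range(n_rows)
        ],
        # Illustrative binary labels
        "predicted_label": rng.choice(["fraud", "not_fraud"], size=n_rows),
        "actual_label": rng.choice(["fraud", "not_fraud"], size=n_rows),
        # Illustrative features, each with a single consistent type
        "age": rng.integers(18, 90, size=n_rows),
        "account_balance": rng.uniform(0, 10_000, size=n_rows),
        "state": rng.choice(["CA", "NY", "TX"], size=n_rows),
    }
)

print(df.dtypes)
```

Every column named in the schema, including each entry of `feature_column_names`, has to exist in the dataframe under exactly that name.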
Follow this example in Google Colab:
pip install arize # Install the Arize SDK
pip install arize[AutoEmbeddings] # Install extra dependencies to autogenerate embeddings
pip install arize[LLM_Evaluation] # Install extra dependencies to compute LLM evaluation metrics
Initialize Arize Client, Schema, ModelTypes, Environments, and Metrics to begin logging a Pandas dataframe:
from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, Schema, Metrics
Data ingestion rejects datasets with mixed type columns. These columns should be converted to Float before sending. Below is an example of a mixed type column in Pandas and how to convert it.
import pandas as pd
# Example Series with mixed types
mixed = pd.Series([1, "", 2]) # it has numbers and strings
mixed.dtype # dtype('O')
# It should be converted to float
# Replace "" with NaN, then cast explicitly so the result is float64 on any pandas version
mixed = mixed.replace("", float("NaN")).astype(float)
mixed.dtype # dtype('float64')
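To find such columns before sending a dataframe, one option is to scan the object-dtype columns and check whether their non-null values share a single Python type. The helper below is a sketch of that idea; `find_mixed_type_columns` is a hypothetical name and is not part of the Arize SDK.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def find_mixed_type_columns(df: pd.DataFrame) -> list[str]:
    """Return object-dtype columns whose non-null values have more than one Python type."""
    mixed_columns = []
    for name in df.columns:
        column = df[name]
        # Only object columns can hold values of different Python types
        if not is_object_dtype(column):
            continue
        value_types = {type(value) for value in column.dropna()}
        if len(value_types) > 1:
            mixed_columns.append(name)
    return mixed_columns


example = pd.DataFrame(
    {
        "clean_numeric": [1.0, 2.0, 3.0],
        "clean_text": ["a", "b", "c"],
        "mixed": [1, "", 2],  # numbers and strings
    }
)

print(find_mixed_type_columns(example))  # ['mixed']
```

Columns reported by the helper can then be converted as shown above before the dataframe is logged.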
The ability to ingest data with low latency is important to many customers. Below is a benchmarking Colab that demonstrates the efficiency with which Arize uploads data from a Python environment.
Sending 10 Million Inferences to Arize in 90 Seconds