Pandas Batch Logging

Batch Logging - Designed for sending batches of data to Arize

Use the arize Python library to monitor machine learning predictions with a few lines of code, whether in a Jupyter notebook or in a Python server that batch-processes backend data.

The most commonly used functions/objects are:

Client — Initialize to begin logging model data to Arize.

Schema — Organize and map the column names containing model data within your Pandas dataframe.

log — Log inferences within a dataframe to Arize via a POST request.

Python Pandas Example

For examples and interactive notebooks, see https://docs.arize.com/arize/examples

# install and import dependencies 
!pip install -q arize

import datetime

from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments, Schema, Metrics
import numpy as np
import pandas as pd

# create Arize client
SPACE_KEY = "SPACE_KEY"  
API_KEY = "API_KEY"  
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
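
The schema below references dataframe columns by name. For illustration, here is a hypothetical dataframe with matching columns (all values and feature/tag names are made up):

```python
import datetime

import pandas as pd

# hypothetical example data; column names match the Schema defined below
df = pd.DataFrame({
    "prediction_id": ["id_1", "id_2", "id_3"],
    "prediction_ts": [int(datetime.datetime.now().timestamp())] * 3,
    "predicted_label": ["fraud", "not_fraud", "fraud"],
    "actual_label": ["fraud", "fraud", "not_fraud"],
    "feature_1": [0.4, 0.9, 0.1],  # example feature columns
    "feature_2": [12, 7, 3],
    "tag1": ["us", "eu", "us"],    # example tag columns
    "tag2": [1, 2, 3],
    "tag3": [0.1, 0.2, 0.3],
})
```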

# define the schema mapping dataframe column names to model data
feature_column_names = ["feature_1", "feature_2"]  # replace with your feature columns

schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_label",
    actual_label_column_name="actual_label",
    feature_column_names=feature_column_names,
    tag_column_names=["tag1", "tag2", "tag3"],
)

# log data to Arize
response = arize_client.log(
    dataframe=df,
    schema=schema,
    model_id="binary-classification-metrics-only-batch-ingestion-tutorial",
    model_version="1.0.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION],
    validate=True,
    environment=Environments.PRODUCTION
)
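
The log call returns an HTTP-style response object; checking its status code confirms whether the upload succeeded. A minimal sketch of that check (shown with a stand-in response object so the snippet runs on its own):

```python
from types import SimpleNamespace

# stand-in for the object returned by arize_client.log (illustration only;
# the real call returns a response with status_code and text attributes)
response = SimpleNamespace(status_code=200, text="")

if response.status_code == 200:
    print("successfully logged data to Arize")
else:
    print(f"logging failed with status {response.status_code}: {response.text}")
```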


Benchmark Tests

Ingesting data with low latency is important to many customers. The benchmark colab below demonstrates how quickly Arize uploads data from a Python environment.

Sending 10 Million Inferences to Arize in 90 Seconds
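
To reproduce a rough benchmark in your own environment, you can time the upload call directly. A minimal timing helper (the `timed` wrapper and the stand-in workload are illustrative, not part of the arize library):

```python
import time

def timed(fn, *args, **kwargs):
    # measure wall-clock time of a single call, e.g. timed(arize_client.log, dataframe=df, ...)
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"call took {elapsed:.2f}s")
    return result, elapsed

# stand-in workload for illustration
total, seconds = timed(sum, range(1_000_000))
```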


Copyright © 2023 Arize AI, Inc