Datasets & Experiments

AI application development is often bottlenecked by a lack of high-quality evaluations. AI engineers face hard tradeoffs, such as which prompt or LLM best balances performance, latency, and cost. High-quality evaluations help developers answer these questions with greater confidence as they make changes.

Datasets

Datasets are collections of examples that provide the inputs and, optionally, the expected reference outputs for assessing your application. You can build datasets from production or staging data, from evaluation results, or by adding examples manually. The collected examples are then used to run experiments and evaluations that track improvements to your prompt, LLM, or other parts of your LLM application.
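To make the shape of a dataset concrete, here is a minimal sketch using only the Python standard library. The `Example` class, its field names, and the sample rows are hypothetical and not part of the Arize client; they only illustrate that every example has an input while the expected output is optional.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Example:
    """One dataset row (hypothetical schema, for illustration only)."""
    input: str
    expected_output: Optional[str] = None  # reference output is optional
    source: str = "manual"                 # where the example was collected


examples = [
    Example("What is the capital of France?", "Paris", source="production"),
    Example("Summarize this ticket in one sentence.", source="staging"),
    Example("Translate 'hello' to Spanish.", "hola"),
]

# Plain dict rows; a list like this could be passed to pd.DataFrame(rows).
rows = [asdict(e) for e in examples]

# Examples that carry a reference output can be scored against it;
# the others need a reference-free evaluation.
labeled = [r for r in rows if r["expected_output"] is not None]
```

Keeping the source alongside each example is a design choice of this sketch; it makes it easier to see later whether a regression is concentrated in, say, production traffic.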

Experiments

In AI development, it's hard to predict how a change will affect performance. This breaks the development flow and makes iteration more guesswork than engineering.

Experiments and evaluations solve this by distilling the nondeterminism of LLMs into tangible feedback that helps you ship a more reliable product.

Specifically, good evals help you:

  • Understand whether an update is an improvement or a regression

  • Drill down into good / bad examples

  • Compare specific examples vs. prior runs

  • Avoid guesswork
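The points above can be sketched as a per-example comparison between two experiment runs. This is an illustration under stated assumptions, not the Arize experiments API: the scores, the example IDs, the `compare_runs` helper, and the `tolerance` threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical per-example evaluation scores (higher is better) for two runs.
baseline = {"ex-1": 0.80, "ex-2": 0.40, "ex-3": 0.90}
candidate = {"ex-1": 0.82, "ex-2": 0.70, "ex-3": 0.60}


def compare_runs(baseline, candidate, tolerance=0.05):
    """Bucket the examples present in both runs by how their score changed."""
    report = {"improved": [], "regressed": [], "unchanged": []}
    for example_id in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[example_id] - baseline[example_id]
        if delta > tolerance:
            report["improved"].append(example_id)
        elif delta < -tolerance:
            report["regressed"].append(example_id)
        else:
            report["unchanged"].append(example_id)
    return report


report = compare_runs(baseline, candidate)

# Average change across the examples present in both runs.
shared_ids = baseline.keys() & candidate.keys()
mean_delta = mean(candidate[k] - baseline[k] for k in shared_ids)
```

In this made-up data the average score barely moves, yet one example improves sharply and another regresses sharply. That is why drilling into individual examples matters more than watching a single aggregate number.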

Creating Datasets

UI:

Programmatically:

!pip install -q arize==7.19.0rc1

from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import INFERENCES, GENERATIVE
import pandas as pd

# Each row of the DataFrame becomes one dataset example
data = {
    "input": ["Alice", "Bob", "Charlie"],
    "output": [1, 2, 3],
}
test_df = pd.DataFrame(data)

# Create the client, then upload the DataFrame as a new dataset
client = ArizeDatasetsClient(api_key="YOUR-API-KEY")
dataset_id = client.create_dataset(
    space_id="YOUR-SPACE-ID",
    dataset_name="my_dataset",
    dataset_type=GENERATIVE,
    data=test_df,
)

Get Datasets by ID

out_df = client.get_dataset(space_id="YOUR-SPACE-ID", 
                            dataset_id="YOUR-DATASET-ID")

Update Datasets

# new_dataframe is a pandas DataFrame holding the examples for the update
updated_df = client.update_dataset(
    space_id="YOUR-SPACE-ID",
    dataset_id="YOUR-DATASET-ID",
    data=new_dataframe,
)
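The call above takes a DataFrame named `new_dataframe`, which this page does not define. One way to prepare it is sketched below using only pandas. The extra rows are hypothetical, and whether `update_dataset` expects the full table or only the new rows is an assumption to confirm against the client documentation; this sketch builds the full, de-duplicated table.

```python
import pandas as pd

# Existing examples, mirroring the columns used when the dataset was created.
existing_df = pd.DataFrame(
    {"input": ["Alice", "Bob", "Charlie"], "output": [1, 2, 3]}
)

# Hypothetical new examples, one of which repeats an existing row.
extra_df = pd.DataFrame({"input": ["Charlie", "Dana"], "output": [3, 4]})

# Append the new rows, then drop exact duplicates so each example appears once.
new_dataframe = (
    pd.concat([existing_df, extra_df], ignore_index=True)
    .drop_duplicates(subset=["input", "output"])
    .reset_index(drop=True)
)
```

Dropping duplicates before uploading keeps repeated examples from being counted twice in later experiment results.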

List Datasets

ds_list = client.list_datasets(space_id="YOUR-SPACE-ID")

Get Dataset Versions

versions = client.get_dataset_versions(space_id="YOUR-SPACE-ID", 
                                       dataset_id="YOUR-DATASET-ID")

Copyright © 2023 Arize AI, Inc