Quickstart: Datasets
This guide helps you create golden datasets to benchmark your LLM application performance. Learn more about the concepts of datasets here.
You can create your dataset in four ways:
Import spans from your application
If you have added tracing to your application, you can create datasets by adding spans from your application with Arize. Go to the traces page and filter for the examples you care about.
In the example below, we are filtering for spans with a hallucination label, and adding them to a dataset.
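The filtering step can also be pictured in code. The sketch below is only an illustration on plain Python dictionaries: the span fields (`span_id`, `input`, `output`, `eval_label`) and the label values are hypothetical stand-ins, not the schema Arize uses for exported spans.

```python
from typing import Any

# Hypothetical span records. Real spans exported from tracing will carry
# different field names and many more attributes.
spans: list[dict[str, Any]] = [
    {"span_id": "a1", "input": "Who wrote Dune?",
     "output": "Frank Herbert", "eval_label": "factual"},
    {"span_id": "b2", "input": "When was Dune published?",
     "output": "1865", "eval_label": "hallucinated"},
    {"span_id": "c3", "input": "Who directed the 2021 film?",
     "output": "Denis Villeneuve", "eval_label": "factual"},
]


def select_spans(records: list[dict[str, Any]], label: str) -> list[dict[str, Any]]:
    """Keep only the spans whose evaluation label matches `label`."""
    return [record for record in records if record.get("eval_label") == label]


# These rows are the candidates you would add to a dataset.
dataset_rows = select_spans(spans, "hallucinated")
```

Keeping the filter as a small function makes it easy to reuse the same selection for other labels.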
Use AI search to curate your dataset with natural language.✨
Create your dataset in code
To start, let's install the packages we need:
Let's get your developer key by clicking "code" on the datasets page.
Let's set up the Arize Dataset Client to create or update a dataset.
You can create many different kinds of datasets. We'll start with the simplest example below. To upload datasets with prompt variables, or to edit or delete your datasets, follow this guide.
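Before uploading, it can help to assemble and sanity-check the rows locally. The sketch below is a minimal illustration under stated assumptions: the `Example` class, its `input` and `expected_output` fields, and the `build_dataset` helper are all hypothetical. It deliberately stops short of the upload call, because the client's exact signature is not shown in this guide.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Example:
    """One golden-dataset row (hypothetical schema)."""
    input: str
    expected_output: str


def build_dataset(examples: list[Example]) -> list[dict[str, str]]:
    """Return rows as plain dicts, rejecting empty fields and duplicate inputs."""
    seen: set[str] = set()
    rows: list[dict[str, str]] = []
    for example in examples:
        if not example.input.strip() or not example.expected_output.strip():
            raise ValueError(f"empty field in example: {example!r}")
        if example.input in seen:
            raise ValueError(f"duplicate input: {example.input!r}")
        seen.add(example.input)
        rows.append(asdict(example))
    return rows


rows = build_dataset([
    Example("What is the capital of France?", "Paris"),
    Example("What is 2 + 2?", "4"),
])
```

Catching empty or duplicated rows at this point keeps the golden dataset trustworthy as a benchmark.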
Auto-generate using LLMs
When you are first developing with LLMs, you typically start with a prompt and little else. Early iteration gets you to a point where the video demo looks amazing, but you still lack confidence in its reliability and robustness.
This is where you can use LLMs to generate examples for you based on your prompt. Here's an example that uses ChatGPT, or your LLM of choice, to create a set of examples you can upload to Arize.
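One way to sketch this flow, without tying it to a specific provider, is shown below. Everything here is an assumption made for illustration: `fake_llm` is a stand-in that returns canned JSON where a real LLM call would go, and the `input` and `expected_output` column names are hypothetical.

```python
import csv
import io
import json
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned JSON array."""
    return json.dumps([
        {"input": "How do I reset my password?",
         "expected_output": "Point the user to the account settings page."},
        {"input": "Can I get a refund?",
         "expected_output": "Explain the refund policy."},
    ])


def generate_examples_csv(app_prompt: str, n: int, llm: Callable[[str], str]) -> str:
    """Ask the LLM for n test cases and return them as CSV text."""
    request = (
        f"Generate {n} test cases for the prompt below as a JSON array of "
        f'objects with "input" and "expected_output" keys.\n\n{app_prompt}'
    )
    examples = json.loads(llm(request))

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "expected_output"])
    writer.writeheader()
    for example in examples:
        writer.writerow({
            "input": example["input"],
            "expected_output": example["expected_output"],
        })
    return buffer.getvalue()


csv_text = generate_examples_csv("You are a helpful support assistant.", 2, fake_llm)
```

Passing the model in as a plain callable means you can swap `fake_llm` for a real client without changing the CSV logic. Review generated examples by hand before treating them as golden data.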
This will generate a CSV file for you that looks like:
Coming soon, you'll be able to do this directly in the Arize platform. You can use the auto-generated examples and upload them using code or CSV below.
Upload a CSV (coming soon)
If you currently manage your data points in Google Sheets or Excel, you can import them directly into Arize. You can set your input variables and expected output for your experiment runs.