If you'd like to create your datasets programmatically, you can use our Python SDK to create, update, and delete datasets.
To start, let's install the packages we need:
!pip install "arize[Datasets]" pandas
Let's get your developer key by clicking "code" on the datasets page.
Let's set up the Arize Datasets Client to create or update a dataset. See here for API reference.
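from arize.experimental.datasets import ArizeDatasetsClient
import pandas as pd
# Placeholder: paste the developer key copied from the datasets page.
developer_key = "YOUR_DEVELOPER_KEY"
client = ArizeDatasetsClient(developer_key=developer_key)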
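Rather than pasting the key into a notebook, you may prefer to read it from the environment. The following is a minimal sketch and not part of the Arize SDK: the helper name `load_developer_key` and the environment variable `ARIZE_DEVELOPER_KEY` are assumptions chosen for illustration.

```python
import os


def load_developer_key(env_var: str = "ARIZE_DEVELOPER_KEY") -> str:
    """Return the developer key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    use whatever name your own environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Arize developer key first.")
    return key
```

The returned value can then be passed as `developer_key` when constructing `ArizeDatasetsClient`.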
You can create many different kinds of datasets. The examples below are ordered from simplest to most complex.
If you want to upload a standard set of examples with string inputs, you can build the DataFrame as follows.
import pandas as pd
from arize.experimental.datasets.utils.constants import GENERATIVE

# Each dictionary is one example; its keys become the DataFrame columns.
data = [{
    "persona": "An aspiring musician who is writing their own songs",
    "problem": "I often get stuck overthinking my lyrics and melodies.",
}]
df = pd.DataFrame(data)

# Upload the DataFrame as a new dataset in your space.
dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
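Before uploading, it can help to check that every example has the same keys and only string values, since this example assumes plain string inputs. The sketch below is a local sanity check in plain Python; `check_examples` is a hypothetical helper, not an SDK function, and the SDK may well accept other value types.

```python
from typing import Dict, List


def check_examples(examples: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems found in the examples (empty if none)."""
    if not examples:
        return ["no examples provided"]
    problems = []
    expected_keys = set(examples[0])
    for i, row in enumerate(examples):
        # Every row should expose the same columns as the first one.
        if set(row) != expected_keys:
            problems.append(f"row {i}: keys differ from row 0")
        # This example assumes string-only inputs.
        for key, value in row.items():
            if not isinstance(value, str):
                problems.append(f"row {i}: {key!r} is not a string")
    return problems


data = [{
    "persona": "An aspiring musician who is writing their own songs",
    "problem": "I often get stuck overthinking my lyrics and melodies.",
}]
print(check_examples(data))  # prints []
```

An empty list means the examples are consistent enough to pass to `pd.DataFrame`.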
In this dataset, we attach the prompt template to each data point, so that when you import the dataset into prompt playground, the prompt appears automatically.
import pandas as pd
from arize.experimental.datasets.utils.constants import GENERATIVE

# The {persona} and {problem} placeholders match the column names below.
PROMPT_TEMPLATE = """
You are an expert product manager recommending features for a target user.
Persona: {persona}
Problem: {problem}
"""

# Store the template alongside the input columns in every row.
data = [{
    "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
    "persona": "An aspiring musician who is writing their own songs",
    "problem": "I often get stuck overthinking my lyrics and melodies.",
}]
df = pd.DataFrame(data)

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
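To see what the template looks like once its placeholders are filled in, you can render it locally with Python's `str.format`. This is only a preview sketch: it assumes the curly-brace placeholders behave like Python format fields, and prompt playground's own substitution may differ in its details.

```python
PROMPT_TEMPLATE = """
You are an expert product manager recommending features for a target user.
Persona: {persona}
Problem: {problem}
"""

row = {
    "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
    "persona": "An aspiring musician who is writing their own songs",
    "problem": "I often get stuck overthinking my lyrics and melodies.",
}

# Treat every column that is not an "attributes." column as a template
# variable (an assumption made for this local preview only).
variables = {k: v for k, v in row.items() if not k.startswith("attributes.")}

# Fill the placeholders and print the resulting prompt text.
rendered = row["attributes.llm.prompt_template.template"].format(**variables)
print(rendered)
```

A `KeyError` from `format` here means the template names a placeholder that has no matching column.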
In this dataset, we set attributes.llm.prompt_template.variables to a dictionary serialized as a JSON string. Conforming to the OpenInference semantic conventions here lets you use these attributes in prompt playground, where they are imported correctly as input variables.
import pandas as pd
import json
from arize.experimental.datasets.utils.constants import GENERATIVE

PROMPT_TEMPLATE = """
You are an expert product manager recommending features for a target user.
Persona: {persona}
Problem: {problem}
"""

# Each row carries the template plus its variables as a JSON string.
data = [
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "attributes.llm.prompt_template.variables": json.dumps({
            "persona": "An aspiring musician who is writing their own songs",
            "problem": "I often get stuck overthinking my lyrics and melodies.",
        }),
    },
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "attributes.llm.prompt_template.variables": json.dumps({
            "persona": "A Christian who goes to church every week",
            "problem": "I'm often too tired for deep Bible study at the end of the day.",
        }),
    },
]
df = pd.DataFrame(data)

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
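When you have many variable sets for one template, writing each row by hand becomes repetitive. The sketch below builds the rows in a loop and checks that every variable set covers the template's placeholders. `build_rows` is a hypothetical helper, not an SDK function, and the placeholder check assumes Python-style `{name}` fields.

```python
import json
from string import Formatter
from typing import Dict, List

TEMPLATE_KEY = "attributes.llm.prompt_template.template"
VARIABLES_KEY = "attributes.llm.prompt_template.variables"


def build_rows(template: str, variable_sets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pair one template with many variable sets, JSON-encoding each set."""
    # Placeholder names found in the template, e.g. {"persona", "problem"}.
    placeholders = {name for _, name, _, _ in Formatter().parse(template) if name}
    rows = []
    for i, variables in enumerate(variable_sets):
        missing = placeholders - set(variables)
        if missing:
            raise ValueError(f"variable set {i} is missing {sorted(missing)}")
        rows.append({TEMPLATE_KEY: template, VARIABLES_KEY: json.dumps(variables)})
    return rows


template = "Persona: {persona}\nProblem: {problem}\n"
rows = build_rows(template, [
    {"persona": "An aspiring musician", "problem": "I overthink my lyrics."},
])
# The variables column is a JSON string that decodes back to the original dict.
print(json.loads(rows[0][VARIABLES_KEY])["persona"])  # prints An aspiring musician
```

The resulting list can be passed to `pd.DataFrame` and uploaded with `client.create_dataset` in the same way as the hand-written rows.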
Here's how it looks when you import the dataset into prompt playground, which makes it easy to iterate on your prompt and test new outputs across many data points.