If you'd like to create your datasets programmatically, you can use our Python SDK to create, update, and delete datasets.
To start, let's install the packages we need:
!pip install "arize[Datasets]" pandas
Let's get your developer key by clicking "code" on the datasets page.
Let's set up the Arize Datasets client to create or update a dataset.
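from arize.experimental.datasets import ArizeDatasetsClient
import pandas as pd

# Placeholder: paste the developer key copied from the datasets page
developer_key = "YOUR_DEVELOPER_KEY"

client = ArizeDatasetsClient(developer_key=developer_key)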
You can create many different kinds of datasets. The examples below are sorted by complexity.
To upload a standard set of examples with string inputs, create the dataframe as follows.
import pandas as pd
import json
from arize.experimental.datasets.utils.constants import GENERATIVE

# Each dict is one example; each key becomes a column in the dataset
data = [
    {
        "persona": "An aspiring musician who is writing their own songs",
        "problem": "I often get stuck overthinking my lyrics and melodies.",
    }
]
df = pd.DataFrame(data)

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
In this dataset, we'll attach the prompt template to each data point, so that when you import the dataset into prompt playground, the prompt appears automatically.
import pandas as pd
import json
from arize.experimental.datasets.utils.constants import GENERATIVE

PROMPT_TEMPLATE = """
You are an expert product manager recommending features for a target user.

Persona: {persona}
Problem: {problem}"""

# The template is stored alongside the columns that fill its placeholders
data = [
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "persona": "An aspiring musician who is writing their own songs",
        "problem": "I often get stuck overthinking my lyrics and melodies.",
    }
]
df = pd.DataFrame(data)

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
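Because the template refers to columns by name, a quick local check before uploading can catch a placeholder that has no matching column. The sketch below is not part of the Arize SDK: template_fields and missing_columns are hypothetical helpers, and it assumes the template uses single-brace placeholders that Python's string.Formatter can parse, as in the example above.

```python
from string import Formatter

PROMPT_TEMPLATE = """You are an expert product manager recommending features for a target user.

Persona: {persona}
Problem: {problem}"""


def template_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so filter those out.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_columns(template: str, row: dict) -> set:
    """Return placeholders in the template that the row does not provide."""
    return template_fields(template) - set(row)


row = {
    "persona": "An aspiring musician who is writing their own songs",
    "problem": "I often get stuck overthinking my lyrics and melodies.",
}

# An empty set means every placeholder has a matching column
print(missing_columns(PROMPT_TEMPLATE, row))
```

Running a check like this over every row before calling create_dataset keeps typos in column names from surfacing only once the dataset is opened in prompt playground.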
In this dataset, we set attributes.llm.prompt_template.variables to a dictionary serialized as a JSON string. Conforming to the OpenInference semantic conventions lets prompt playground read these attributes and import them correctly as input variables.
import pandas as pd
import json
from arize.experimental.datasets.utils.constants import GENERATIVE

PROMPT_TEMPLATE = """
You are an expert product manager recommending features for a target user.

Persona: {persona}
Problem: {problem}"""

# The variables are stored as a JSON string, not as a nested dict
data = [
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "attributes.llm.prompt_template.variables": json.dumps({
            "persona": "An aspiring musician who is writing their own songs",
            "problem": "I often get stuck overthinking my lyrics and melodies.",
        }),
    },
    {
        "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
        "attributes.llm.prompt_template.variables": json.dumps({
            "persona": "A Christian who goes to church every week",
            "problem": "I'm often too tired for deep Bible study at the end of the day.",
        }),
    },
]
df = pd.DataFrame(data)

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="Your Dataset",
    dataset_type=GENERATIVE,
    data=df,
)
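When there are many examples, writing each row by hand gets repetitive. The following sketch builds the rows from plain dictionaries and then decodes one back to confirm that the variables survive the JSON round trip. make_row is a hypothetical helper rather than an SDK function, and rendering with str.format is only a local preview, not a claim about how prompt playground fills the template.

```python
import json

TEMPLATE_KEY = "attributes.llm.prompt_template.template"
VARIABLES_KEY = "attributes.llm.prompt_template.variables"

PROMPT_TEMPLATE = "Persona: {persona}\nProblem: {problem}"


def make_row(template: str, variables: dict) -> dict:
    """Build one dataset row with the variables serialized as a JSON string."""
    return {TEMPLATE_KEY: template, VARIABLES_KEY: json.dumps(variables)}


examples = [
    {
        "persona": "An aspiring musician who is writing their own songs",
        "problem": "I often get stuck overthinking my lyrics and melodies.",
    },
    {
        "persona": "A Christian who goes to church every week",
        "problem": "I'm often too tired for deep Bible study at the end of the day.",
    },
]

rows = [make_row(PROMPT_TEMPLATE, variables) for variables in examples]

# Round trip: decode the JSON string and preview the filled-in prompt locally
decoded = json.loads(rows[0][VARIABLES_KEY])
print(rows[0][TEMPLATE_KEY].format(**decoded))
```

The resulting rows list can be passed to pd.DataFrame in the same way as the data list in the example above.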
Here's how it looks when you import the dataset into prompt playground, which makes it easy to iterate on your prompt and test new outputs across many data points.