The experiments SDK differs in a few subtle ways between Arize and Phoenix, but the base concepts are the same. The example below runs an experiment that writes a haiku and evaluates its tone using an LLM Eval.
You can check out a full notebook example of each.
Arize uses the ArizeDatasetsClient, which requires a developer key you can find in the Arize product. The arize_client.create_dataset function also returns a dataset_id, instead of a dataset object. So if you want to print or manipulate the dataset, you will need to get the dataset using arize_client.get_dataset.
Phoenix uses px.Client().upload_dataset.
############## FOR ARIZE #############
# Setup imports
from uuid import uuid1

import pandas as pd
from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import GENERATIVE

# Set up the Arize datasets connection
developer_key = ""
space_id = ""
api_key = ""
arize_client = ArizeDatasetsClient(developer_key=developer_key, api_key=api_key)

# Create dataframe to upload
data = [{"topic": "Zebras"}]
df = pd.DataFrame(data)

# Create dataset in Arize (returns a dataset_id, not a dataset object)
dataset_id = arize_client.create_dataset(
    dataset_name="haiku-topics" + str(uuid1())[:5],
    data=df,
    space_id=space_id,
    dataset_type=GENERATIVE,
)

# Get dataset from Arize
dataset = arize_client.get_dataset(space_id=space_id, dataset_id=dataset_id)

############## FOR PHOENIX #############
import pandas as pd
import phoenix as px

# Create dataframe to upload
data = [{"topic": "Zebras"}]
df = pd.DataFrame(data)

# Upload dataset to Phoenix (returns the dataset object)
dataset = px.Client().upload_dataset(dataset_name="haiku-topics", dataframe=df)
Task definition
We define the LLM call here as the experiment task. In Arize, the task receives a dataset_row and uses its fields as prompt template variables. In Phoenix, the task receives the input variable, which captures the items in the dataset.
############## FOR ARIZE #############
import openai

# Define task to create a haiku using OpenAI
def create_haiku(dataset_row) -> str:
    # dataset_row is one row of the dataframe uploaded above
    topic = dataset_row.get("topic")
    openai_client = openai.OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Write a haiku about {topic}"}],
        max_tokens=20,
    )
    assert response.choices
    return response.choices[0].message.content

############## FOR PHOENIX #############
import openai

def create_haiku(input) -> str:
    topic = input.get("topic")
    # continue rest of function
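To try the task logic without calling OpenAI, one option is to separate prompt construction from the completion call. The sketch below is only an illustration: build_prompt, make_task, and fake_llm are hypothetical names that do not come from either SDK, and fake_llm stands in for the real model call.

```python
from typing import Callable, Mapping


def build_prompt(row: Mapping[str, str]) -> str:
    """Fill the prompt template from one dataset row (hypothetical helper)."""
    topic = row.get("topic", "")
    return f"Write a haiku about {topic}"


def make_task(complete: Callable[[str], str]) -> Callable[[Mapping[str, str]], str]:
    """Wrap any prompt -> text callable into a task that accepts a dataset row."""

    def task(row: Mapping[str, str]) -> str:
        return complete(build_prompt(row))

    return task


def fake_llm(prompt: str) -> str:
    """Stand-in for the OpenAI call so the sketch runs offline."""
    return f"[haiku for prompt: {prompt}]"


create_haiku_offline = make_task(fake_llm)
print(create_haiku_offline({"topic": "Zebras"}))
```

When plugging a task like this into either SDK, the parameter name would still need to match what that SDK passes in (dataset_row in Arize, input in Phoenix, as described above).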
Evaluator definition
For both Arize and Phoenix, you can often use the exact same function as your evaluator. Phoenix does have a slightly different way of accessing metadata from your dataset.
Arize uses input, output, dataset_row, metadata as the optional input variables to pass into the function.
Phoenix uses input, expected, reference, example, metadata as the input variables to pass into the function.
# FOR ARIZE IMPORT (use only one of these two imports)
from arize.experimental.datasets.experiments.evaluators.base import EvaluationResult

# FOR PHOENIX IMPORT
from phoenix.experiments.types import EvaluationResult

############## FOR ARIZE AND PHOENIX #############
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    llm_classify,
)

CUSTOM_TEMPLATE = """
You are evaluating whether tone is positive, neutral, or negative

[Message]: {output}

Respond with either "positive", "neutral", or "negative"
"""

def is_positive(output):
    # Single-row dataframe whose "output" column fills {output} in the template
    df_in = pd.DataFrame({"output": output}, index=[0])
    eval_df = llm_classify(
        dataframe=df_in,
        template=CUSTOM_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
        rails=["positive", "neutral", "negative"],
        provide_explanation=True,
    )
    # return score, label, explanation (the score is fixed at 1 in this example)
    return EvaluationResult(
        score=1,
        label=eval_df["label"][0],
        explanation=eval_df["explanation"][0],
    )
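Because the evaluator variables are optional, an evaluator only declares the ones it needs. The sketch below illustrates one way such name-based argument passing could work, using the Arize variable names listed above. It is an assumption for illustration, not the implementation of either SDK, and call_with_named_args and the two toy evaluators are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def call_with_named_args(fn: Callable[..., Any], available: Dict[str, Any]) -> Any:
    """Call fn with only those entries of `available` that fn declares as parameters."""
    params = inspect.signature(fn).parameters
    unknown = [name for name in params if name not in available]
    if unknown:
        raise TypeError(f"unsupported evaluator parameter(s): {unknown}")
    return fn(**{name: available[name] for name in params})


def output_is_nonempty(output):
    """Toy evaluator that needs only the task output."""
    return bool(output.strip())


def mentions_topic(output, dataset_row):
    """Toy evaluator that also inspects the dataset row."""
    return dataset_row["topic"].lower() in output.lower()


# Values a runner might have on hand for one example (Arize-style names).
available = {
    "input": {"topic": "Zebras"},
    "output": "Striped zebras graze",
    "dataset_row": {"topic": "Zebras"},
    "metadata": {},
}

print(call_with_named_args(output_is_nonempty, available))
print(call_with_named_args(mentions_topic, available))
```

Under this assumption, a misspelled parameter name fails loudly instead of silently receiving nothing, which is why the sketch raises TypeError for names it does not recognize.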
Run the experiment
Arize and Phoenix use slightly different functions to run an experiment. Arize requires the space_id to be passed in, whereas Phoenix does not have spaces.
In Arize, you also pass in the dataset_id, whereas in Phoenix you pass in the dataset object itself.
############## FOR ARIZE #############
from uuid import uuid1

experiment_id = arize_client.run_experiment(
    space_id=space_id,
    dataset_id=dataset_id,
    task=create_haiku,
    evaluators=[is_positive],
    experiment_name=f"haiku-example-{str(uuid1())[:5]}",
)

############## FOR PHOENIX #############
from uuid import uuid1

from phoenix.experiments import run_experiment

experiment_results = run_experiment(
    dataset=dataset,
    task=create_haiku,
    evaluators=[is_positive],
    experiment_name=f"haiku-example-{str(uuid1())[:5]}",
)
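Conceptually, both calls apply the task to every dataset row and then apply each evaluator to the resulting output. The toy loop below illustrates that flow only. It is not how either SDK is implemented, and run_toy_experiment, stub_haiku, and has_topic_word are hypothetical stand-ins that run offline.

```python
from typing import Any, Callable, Dict, List


def run_toy_experiment(
    rows: List[Dict[str, Any]],
    task: Callable[[Dict[str, Any]], str],
    evaluators: List[Callable[[str], Any]],
) -> List[Dict[str, Any]]:
    """Run the task on each row, then score each output with every evaluator."""
    results = []
    for row in rows:
        output = task(row)
        # Key each evaluation by the evaluator's function name.
        evaluations = {ev.__name__: ev(output) for ev in evaluators}
        results.append({"input": row, "output": output, "evaluations": evaluations})
    return results


def stub_haiku(row: Dict[str, Any]) -> str:
    """Stand-in task; a real task would call an LLM."""
    return f"A haiku about {row['topic']}"


def has_topic_word(output: str) -> bool:
    """Stand-in evaluator; a real one might use an LLM classifier."""
    return "Zebras" in output


results = run_toy_experiment([{"topic": "Zebras"}], stub_haiku, [has_topic_word])
for result in results:
    print(result["output"], result["evaluations"])
```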