This guide helps you run experiments to test and validate changes in your LLM applications against a curated dataset. Learn more about datasets and experiments.
Download this sample CSV and upload it into the UI. The CSV must have an `id` column. See the example CSV below:
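Only the `id` column is required; the other columns here (`input`, `expected_output`) are illustrative placeholders for whatever data your application needs:

```csv
id,input,expected_output
1,"What is the capital of France?","Paris"
2,"Summarize our refund policy in one sentence.","Refunds are issued within 30 days of purchase."
```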
Load the dataset you created into the Prompt Playground and run it to see your results. Once you've finished the run, you can save it as an experiment to track your changes.
*Coming soon*
If you want to use your application data, you can also create a dataset from it.
We will soon add the capability to run evaluation templates against your playground tests in the UI in just a few clicks. For now, you can follow the code flow to set up an evaluator or create an evaluation template.
Learn more about creating evaluators for an experiment.
An evaluator is any function that (1) takes the task output and (2) returns an assessment. This gives you tremendous flexibility: you can write your own evaluator using a custom template, or use one of our pre-built evaluators.
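For example, a minimal custom evaluator is just a function of the output. The name, signature, and return shape below are illustrative, not a required interface:

```python
# A minimal custom evaluator: takes the task output, returns an assessment.
# The signature and return shape are illustrative -- any function that maps
# output -> assessment works.
def contains_citation(output: str) -> dict:
    """Pass if the model's answer cites at least one source."""
    cited = "[source:" in output.lower()
    return {"label": "cited" if cited else "uncited", "score": float(cited)}


print(contains_citation("Paris is the capital of France [source: Wikipedia]."))
# {'label': 'cited', 'score': 1.0}
```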
We use our OSS package Arize Phoenix to run LLM-based evaluations with `llm_classify`. See the Phoenix documentation for details.
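As a sketch of an LLM-based evaluator, the snippet below classifies each row of a dataframe as correct or incorrect. The dataframe columns, the prompt template, and the `gpt-4o-mini` judge model are assumptions to adapt to your own data (running it requires an `OPENAI_API_KEY`):

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Hypothetical task outputs to evaluate; column names are illustrative and
# must match the placeholders in the template below.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "output": ["The capital of France is Paris."],
    }
)

TEMPLATE = """You are evaluating whether an answer is correct.
Question: {input}
Answer: {output}
Respond with a single word: "correct" or "incorrect"."""

evals_df = llm_classify(
    df,
    model=OpenAIModel(model="gpt-4o-mini"),  # the judge model
    template=TEMPLATE,
    rails=["correct", "incorrect"],          # allowed labels
    provide_explanation=True,                # ask the judge to explain its label
)
print(evals_df[["label", "explanation"]])
```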
Read more about running experiments.
See the SDK reference for more info. You can specify `dry_run=True`, which does not log the result to Arize. You can also specify `exit_on_error=True`, which makes it easier to debug when an experiment doesn't run correctly.
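Putting it together, a minimal sketch of kicking off an experiment with these flags is shown below. The client constructor and parameter names follow the Arize datasets SDK and may differ slightly in your installed version; the task and evaluator functions are hypothetical stand-ins for your own:

```python
from arize.experimental.datasets import ArizeDatasetsClient

# Placeholder credentials and IDs -- replace with your own values.
client = ArizeDatasetsClient(developer_key="YOUR_DEVELOPER_KEY")


def task(dataset_row) -> str:
    """The task under test: replace this stub with a call to your LLM app."""
    return f"echo: {dataset_row['input']}"


def exact_match(output, dataset_row) -> float:
    """A simple evaluator: 1.0 if the output equals the expected answer."""
    return float(output == dataset_row["expected_output"])


client.run_experiment(
    space_id="YOUR_SPACE_ID",
    dataset_id="YOUR_DATASET_ID",
    task=task,
    evaluators=[exact_match],
    experiment_name="prompt-v2",
    dry_run=True,        # run locally without logging results to Arize
    exit_on_error=True,  # stop on the first failure to simplify debugging
)
```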