Load a Dataset into Playground
Last updated
Last updated
Copyright © 2023 Arize AI, Inc
Many users curate datasets for evaluating their prompts in their playground, which often cover the following use cases:
'Golden datasets' of core examples where it is important to avoid a regression — for example, critical user queries or high-impact business scenarios.
Challenge datasets containing hard examples where they would like to hill climb on performance — for example, a dataset of jailbreak prompts or examples of past hallucinations.
When modifying a prompt in the playground, you can test your new prompt across a dataset of examples to validate that the model is hill climbing in terms of performance across challenging examples, without regressing on core business use cases.