Load a Dataset into Playground
Many users curate datasets to evaluate their prompts in the playground. These datasets typically cover the following use cases:
Golden datasets: core examples where it is important to avoid a regression, for example critical user queries or high-impact business scenarios.
Challenge datasets: hard examples on which you want to hill climb performance, for example a dataset of jailbreak prompts or examples of past hallucinations.
When you modify a prompt in the playground, you can run the new version across a dataset of examples to confirm that it improves performance on challenging examples without regressing on core business use cases.
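The check described above can be expressed as a simple acceptance rule: the new prompt should score higher on the challenge dataset while not scoring lower on the golden dataset. The following is a minimal, hypothetical Python sketch, not part of the playground's API. It assumes you already have per-example scores (for instance 1.0 for a pass and 0.0 for a fail) for the old and new prompt on each dataset; the function name `accept_prompt_change` and the `tolerance` parameter are illustrative only.

```python
from statistics import mean


def accept_prompt_change(
    golden_old: list[float],
    golden_new: list[float],
    challenge_old: list[float],
    challenge_new: list[float],
    tolerance: float = 0.0,
) -> bool:
    """Hypothetical rule: accept the new prompt only if it improves the
    challenge-dataset score without lowering the golden-dataset score
    by more than `tolerance`."""
    # Change in mean score on each dataset (new prompt minus old prompt).
    golden_delta = mean(golden_new) - mean(golden_old)
    challenge_delta = mean(challenge_new) - mean(challenge_old)

    # Golden dataset: no regression beyond the allowed tolerance.
    no_regression = golden_delta >= -tolerance
    # Challenge dataset: performance must strictly improve (hill climbing).
    hill_climbed = challenge_delta > 0
    return no_regression and hill_climbed


if __name__ == "__main__":
    # Hypothetical pass/fail scores per example, not real results.
    golden_old = [1.0, 1.0, 1.0, 1.0]
    golden_new = [1.0, 1.0, 1.0, 1.0]
    challenge_old = [0.0, 1.0, 0.0, 0.0]
    challenge_new = [1.0, 1.0, 0.0, 1.0]
    print(accept_prompt_change(golden_old, golden_new, challenge_old, challenge_new))
```

A strict `tolerance` of 0.0 treats any drop on the golden dataset as a regression; raising it allows for small fluctuations when scores come from a noisy evaluator.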