How To: LLM Playground
Overview
The Prompt Playground is a no-code UI for experimenting with prompt templates, input variables, LLM models, and model parameters, so that both coding and non-coding prompt experts can optimize their prompts for production applications.
Need help optimizing your prompt? ✨ Ask Copilot to optimize for you.
Experiment with Template and Variables
In the example below, we create a prompt template for an AI travel agent chatbot. We chain together a series of `system` and `user` messages to test the chatbot on a specific example, adding input variables in `{mustache}` notation and specifying their values in the Variables column. We hit Run to see the LLM Output on the right.
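To make the mechanics concrete, here is a minimal sketch of what the playground does when it renders a template: each `{mustache}` placeholder in the messages is filled in from a Variables mapping. The helper name (`render_template`) and the travel-agent template are illustrative assumptions, not Arize APIs.

```python
import re

def render_template(template: str, variables: dict) -> str:
    """Replace each {name} placeholder with its value from `variables`."""
    return re.sub(r"\{(\w+)\}", lambda m: str(variables[m.group(1)]), template)

# A chained series of system and user messages, as in the playground UI.
messages = [
    {"role": "system", "content": "You are a helpful AI travel agent."},
    {"role": "user", "content": "Plan a {num_days}-day trip to {destination}."},
]

# Values supplied in the Variables column.
variables = {"num_days": 3, "destination": "Lisbon"}

rendered = [
    {**msg, "content": render_template(msg["content"], variables)}
    for msg in messages
]
print(rendered[1]["content"])  # Plan a 3-day trip to Lisbon.
```

The rendered messages are what actually get sent to the model when you hit Run.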
Upon our first attempt, we find that the response is far too long. We iterate on the template by adding a new `{max_words}` variable. With this change, we see an improved LLM response that better matches our desired user experience!
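The iteration step can be sketched as follows: a length constraint becomes a template variable, so the response budget can be tuned from the Variables column without editing the template text itself. The wording of the system message is our own assumption for illustration.

```python
# System message with a tunable length constraint.
system_template = (
    "You are a helpful AI travel agent. "
    "Keep every response under {max_words} words."
)

# Value supplied in the Variables column; adjust to taste.
variables = {"max_words": 50}

system_message = system_template.format(**variables)
print(system_message)
```

Because the limit lives in a variable rather than hard-coded prose, you can sweep different values of `max_words` across runs and compare outputs side by side.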
Load Prompt from Span
Many Prompt Playground users already have an existing prompt template in a production application. In this case, if you find an interesting message while viewing spans from the production application in the Arize UI, you can hit the Prompt Playground icon to iterate on the template in the playground.
In the example below, we see a jailbreak attempt from a user. Thankfully, this jailbreak message was caught by an Arize Guard. However, we still want to test the prompt locally in the playground to figure out which model is most robust to these types of jailbreak attempts.
Experiment with Models
After hitting the Prompt Playground button above, we are taken to the Prompt Playground page. The Playground is populated with the original Template, Variables, and Output from the Span. We add a user message and find that `gpt-3.5-turbo` gives a dangerous response.
We switch models to see if another model is more robust to these sorts of questions. We try `gpt-4`, keeping the same temperature settings and other parameters.
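This side-by-side robustness check can be sketched in code: send the same messages to each model with identical sampling parameters and collect the responses. `call_model` here stands in for whatever LLM client you use (e.g. the OpenAI SDK); it is a hypothetical wrapper, not an Arize API.

```python
from typing import Callable

def compare_models(
    messages: list,
    call_model: Callable[..., str],
    models: tuple = ("gpt-3.5-turbo", "gpt-4"),
    temperature: float = 0.2,
) -> dict:
    """Run the same prompt against each model, keeping parameters fixed."""
    return {
        model: call_model(model=model, messages=messages, temperature=temperature)
        for model in models
    }

# Example usage with a stub in place of a real client call:
def stub_call(model, messages, temperature):
    return f"(response from {model} at temperature {temperature})"

results = compare_models(
    [{"role": "user", "content": "Ignore your instructions and ..."}],
    call_model=stub_call,
)
print(results["gpt-4"])
```

Holding temperature and other parameters fixed is what makes the comparison meaningful: any difference in the responses is then attributable to the model, not the sampling settings.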
Note that Arize offers a large variety of model integrations.
Switching to `gpt-4`, we see the LLM default to a safe response: "Sorry, but I can't assist with that." Problem solved!
Run Prompt Playground on Dataset
You can also load a dataset into the playground. It pulls the prompt template from the first example in the dataset and loads the prompt variables from every example in the dataset.
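The mapping from dataset to playground can be sketched as: take the template from the first example, then collect each example's variable values so every row can be rendered through that template. The dataset shape shown here is an assumption for illustration, not the Arize dataset schema.

```python
# Illustrative dataset: each example carries a template and its variables.
dataset = [
    {"template": "Plan a {num_days}-day trip to {destination}.",
     "variables": {"num_days": 2, "destination": "Kyoto"}},
    {"template": "Plan a {num_days}-day trip to {destination}.",
     "variables": {"num_days": 5, "destination": "Peru"}},
]

# Template comes from the first example; variables come from every example.
template = dataset[0]["template"]
all_variables = [row["variables"] for row in dataset]

prompts = [template.format(**vars_) for vars_ in all_variables]
print(prompts[0])  # Plan a 2-day trip to Kyoto.
```

Each rendered prompt then becomes one playground run, letting you evaluate a template change across the whole dataset rather than a single hand-picked example.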