How To: LLM Playground

Overview

The Prompt Playground offers a UI to experiment with prompt templates, input variables, LLM models and LLM parameters, leveraging a no-code platform where both coding and non-coding prompt experts can optimize their prompts for production applications.

Need help optimizing your prompt? Ask Copilot to optimize for you.

Experiment with Template and Variables

In the example below, we create a prompt template for a AI travel agent chatbot. We can chain together a series of system and user messages to test the chatbot on a specific example, adding a list of input Variables in {mustache} notation and specifying their values in the Variables column. We hit Run to see the LLM Output on the right.

Upon our first attempt, we find that the response is far too long. We iterate on the template by adding a new {max_words} variable to the template. With this change, we see an improved LLM response that optimizes for our desired user experience!

Load Prompt from Span

Many Prompt Playground users already have an existing prompt template in a production application. In this case, when you are viewing the spans from the production application in the Arize UI, you can hit the Prompt Playground icon to iterate on the template in the playground if you find an interesting message.

In the example below, we see a jailbreak attempt from a user. Thankfully, this jailbreak message was caught by an Arize Guard. However, we still want to test the prompt locally in the playground to figure out which model is most robust to these types of jailbreak attempts.

Experiment with Models

After hitting the Prompt Playground button above, we are taken to the Prompt Playground page. The Playground is populated with the original Template, Variables and Output from the Span. We add a user message and find that gpt-3.5-turbo gives a dangerous response to the user message.

We switch models to see if another model is more robust to these sorts of questions. We try gpt-4, keeping the same temperature settings and other parameters.

Note that Arize offers a large variety of model integrations, including:

Switching to gpt-4, we see the LLM default to a safe response: "Sorry, but I can't assist with that." Problem solved!

Run Prompt Playground on Dataset

You can also load a dataset into the playground. It will pull the prompt template from the first example in the dataset and load up all the prompt variables across the dataset examples.

Last updated

Copyright © 2023 Arize AI, Inc