Concepts: Prompts

Prompt

The term prompt often refers to the text you use to "prompt" an LLM, e.g. the message content that you send to a model like OpenAI's gpt-4. Within Phoenix we expand this definition to everything that's needed to prompt:

  • The prompt template for the messages to send to a completion endpoint

  • The invocation parameters (temperature, frequency penalty, etc.)

  • The tools made accessible to the LLM (e.g. a weather API)

  • The response format (sometimes called the output schema), used when you have JSON mode enabled.

This expanded definition of a prompt lets you invoke LLMs more deterministically and with confidence, as everything is snapshotted for you to use within your application.

A Phoenix prompt captures everything needed to invoke an LLM
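
As a rough, hedged sketch (not Phoenix's own API), here are those four pieces spelled out as a single OpenAI chat completion call. The model name, messages, invocation parameters, and tool definition below are placeholder assumptions, not values Phoenix prescribes.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    # 1. The prompt template's messages
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Respond in JSON."},
        {"role": "user", "content": "What's the weather in San Francisco, CA?"},
    ],
    # 2. The invocation parameters
    temperature=0.2,
    frequency_penalty=0.0,
    # 3. The tools made accessible to the LLM (a hypothetical weather lookup)
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
    # 4. The response format (here, plain JSON mode)
    response_format={"type": "json_object"},
)
print(response.choices[0].message)

A Phoenix prompt version snapshots all four of these pieces together so that the same call can be reproduced later.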

Prompt Templates

Although the terms prompt and prompt template get used interchangeably, it's important to know the difference.

Prompts refer to the message(s) that are passed into the language model.

Prompt Templates refer to a way of formatting information so that the prompt holds the information you want (such as context and examples). Prompt templates can include placeholders (variables) for things such as examples (e.g. few-shot), outside context (RAG), or any other external data that is needed.

Prompt templates have placeholders for variables that are dynamically filled at runtime

Prompt Version

Every time you save a prompt within Phoenix, a snapshot of the prompt is saved as a prompt version. Phoenix does this so that you can not only view the changes to a prompt over time but also build confidence in a specific prompt version before using it within your application. With every prompt version, Phoenix tracks the author of the prompt and the date at which the version was saved.

Just as you can track changes to your code via Git SHAs, Phoenix tracks each change to your prompt with a prompt_id.

Prompt Version Tag

Imagine you’re working on an AI project and you want to label specific versions of your prompts so you can control when and where they get deployed. This is where prompt version tags come in.

A prompt version tag is like a sticky note you put on a specific version of your prompt to mark it as important. Once tagged, that version won’t change, making it easy to reference later.

When building applications, different environments are often used for different stages of readiness before going live, for example:

  1. Development – Where new features are built.

  2. Staging – Where testing happens.

  3. Production – The live system that users interact with.

Tagging prompt versions with environment tags lets you build, test, and deploy prompts in the same way as an application, ensuring that prompt changes can be systematically tested and deployed.

In addition to environment tags, custom Git tags allow teams to label code versions in a way that fits their specific workflow (`v0.0.1`). These tags can be used to signal different stages of deployment, feature readiness, or any other meaningful status. Prompt version tags work exactly the same way as Git tags.
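
A minimal sketch of pulling a prompt by its environment tag, assuming the Phoenix Python client (arize-phoenix-client) exposes prompts.get with prompt_identifier and tag arguments as shown in the Quickstart; the prompt name and tag below are hypothetical, and the exact signature is documented in the SDK and API Reference.

from phoenix.client import Client

client = Client()  # assumes the Phoenix endpoint (and API key, if required) are configured

# Pull whichever prompt version is currently tagged "production".
prompt = client.prompts.get(prompt_identifier="weather-assistant", tag="production")

# The returned version is an immutable snapshot: re-tagging a newer version later
# changes what "production" resolves to, but never mutates this one.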

Prompt Format

Prompts can be formatted to include any attributes from spans or datasets. These attributes can be added using F-string or Mustache formatting.

F-strings should be formatted with single {s:

{question}

To escape a { when using F-string formatting, add a second { in front of it, e.g., {{escaped}} {not-escaped}. Escaping a variable removes it from the inputs in the Playground.

Mustache should be formatted with double {{s:

{{question}}

We recommend using Mustache where possible, since it supports nested attributes (e.g. attributes.input.value) more seamlessly.
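
To make the two formats concrete, here is a small self-contained sketch; the template strings and variable names are made up, and the Mustache rendering uses a naive substitution rather than a full Mustache implementation.

import re

variables = {"question": "What is Phoenix?"}

# F-string style: single braces mark variables; doubled braces escape a literal brace.
f_string_template = "Answer the question: {question} (literal braces stay as {{escaped}})"
print(f_string_template.format(**variables))
# -> Answer the question: What is Phoenix? (literal braces stay as {escaped})

# Mustache style: double braces around the variable name.
mustache_template = "Answer the question: {{question}}"
rendered = re.sub(
    r"\{\{\s*([\w.]+)\s*\}\}",
    lambda match: str(variables[match.group(1)]),
    mustache_template,
)
print(rendered)
# -> Answer the question: What is Phoenix?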

Tools

Tools allow LLMs to interact with the external environment and to interface with your application in more controlled ways. Given a prompt and some tools to choose from, an LLM may choose to use some, one, or none of them. Many LLM APIs also expose a tool choice parameter, which allows you to constrain how and which tools are selected.

Here is an example of what a tool would look like for a weather API using OpenAI:

{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
}
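
To show how the tool choice parameter fits in, here is a hedged sketch of passing the tool above to OpenAI's chat completions API; the model name is an assumption.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The tool definition shown above, as a Python dict.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                }
            },
            "required": ["location"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": "What's the weather in San Francisco, CA?"}],
    tools=[weather_tool],
    # tool_choice constrains selection: "auto" lets the model decide, "none" disables
    # tools, "required" forces some tool call, or a specific function can be named:
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)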

Response Format

Some LLMs support structured responses, known as response format or output schema, allowing you to specify an exact schema for the model’s output.

Structured Outputs ensure the model consistently generates responses that adhere to a defined JSON Schema, preventing issues like missing keys or invalid values.

Benefits of Structured Outputs:

  • Reliable type-safety: Eliminates the need to validate or retry incorrectly formatted responses.

  • Explicit refusals: Enables programmatic detection of safety-based refusals.

  • Simpler prompting: Reduces reliance on strongly worded prompts for consistent formatting.

For more details, check out this OpenAI guide.
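
As a hedged sketch of what a structured response format looks like with OpenAI's chat completions API (the model name and schema below are illustrative assumptions):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hypothetical output schema; strict mode requires additionalProperties: false
# and every property listed in "required".
weather_report_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "temperature_c": {"type": "number"},
        "conditions": {"type": "string"},
    },
    "required": ["location", "temperature_c", "conditions"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",  # assumed; requires a model that supports structured outputs
    messages=[{"role": "user", "content": "Report the current weather in San Francisco, CA."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",
            "schema": weather_report_schema,
            "strict": True,
        },
    },
)
print(response.choices[0].message.content)  # JSON that conforms to the schema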
