Phoenix
TypeScript APIPython APICommunityGitHubPhoenix Cloud
English
  • Documentation
  • Self-Hosting
  • Cookbooks
  • SDK and API Reference
  • Release Notes
  • Resources
English
  • Arize Phoenix
  • Quickstarts
  • User Guide
  • Environments
  • Phoenix Demo
  • 🔭Tracing
    • Overview: Tracing
    • Quickstart: Tracing
      • Quickstart: Tracing (Python)
      • Quickstart: Tracing (TS)
    • Features: Tracing
      • Projects
      • Annotations
      • Sessions
    • Integrations: Tracing
      • OpenAI
      • OpenAI Agents SDK
      • LlamaIndex
      • LlamaIndex Workflows
      • LangChain
      • LangGraph
      • LiteLLM
      • Anthropic
      • Amazon Bedrock
      • Amazon Bedrock Agents
      • VertexAI
      • Model Context Protocol (MCP)
      • MistralAI
      • Google GenAI
      • Groq
      • Hugging Face smolagents
      • CrewAI
      • Haystack
      • DSPy
      • Instructor
      • OpenAI Node SDK
      • LangChain.js
      • Vercel AI SDK
      • LangFlow
      • BeeAI
      • Flowise
    • How-to: Tracing
      • Setup Tracing
        • Setup using Phoenix OTEL
        • Setup using base OTEL
        • Using Phoenix Decorators
        • Setup Tracing (TS)
        • Setup Projects
        • Setup Sessions
      • Add Metadata
        • Add Attributes, Metadata, Users
        • Instrument Prompt Templates and Prompt Variables
      • Annotate Traces
        • Annotating in the UI
        • Annotating via the Client
        • Running Evals on Traces
        • Log Evaluation Results
      • Importing & Exporting Traces
        • Import Existing Traces
        • Export Data & Query Spans
        • Exporting Annotated Spans
      • Advanced
        • Mask Span Attributes
        • Suppress Tracing
        • Filter Spans to Export
        • Capture Multimodal Traces
    • Concepts: Tracing
      • How Tracing Works
      • What are Traces
      • Concepts: Annotations
      • FAQs: Tracing
  • 📃Prompt Engineering
    • Overview: Prompts
      • Prompt Management
      • Prompt Playground
      • Span Replay
      • Prompts in Code
    • Quickstart: Prompts
      • Quickstart: Prompts (UI)
      • Quickstart: Prompts (Python)
      • Quickstart: Prompts (TS)
    • How to: Prompts
      • Configure AI Providers
      • Using the Playground
      • Create a prompt
      • Test a prompt
      • Tag a prompt
      • Using a prompt
    • Concepts: Prompts
  • 🗄️Datasets & Experiments
    • Overview: Datasets & Experiments
    • Quickstart: Datasets & Experiments
    • How-to: Datasets
      • Creating Datasets
      • Exporting Datasets
    • Concepts: Datasets
    • How-to: Experiments
      • Run Experiments
      • Using Evaluators
  • 🧠Evaluation
    • Overview: Evals
      • Agent Evaluation
    • Quickstart: Evals
    • How to: Evals
      • Pre-Built Evals
        • Hallucinations
        • Q&A on Retrieved Data
        • Retrieval (RAG) Relevance
        • Summarization
        • Code Generation
        • Toxicity
        • AI vs Human (Groundtruth)
        • Reference (citation) Link
        • User Frustration
        • SQL Generation Eval
        • Agent Function Calling Eval
        • Agent Path Convergence
        • Agent Planning
        • Agent Reflection
        • Audio Emotion Detection
      • Eval Models
      • Build an Eval
      • Build a Multimodal Eval
      • Online Evals
      • Evals API Reference
    • Concepts: Evals
      • LLM as a Judge
      • Eval Data Types
      • Evals With Explanations
      • Evaluators
      • Custom Task Evaluation
  • 🔍Retrieval
    • Overview: Retrieval
    • Quickstart: Retrieval
    • Concepts: Retrieval
      • Retrieval with Embeddings
      • Benchmarking Retrieval
      • Retrieval Evals on Document Chunks
  • 🌌inferences
    • Quickstart: Inferences
    • How-to: Inferences
      • Import Your Data
        • Prompt and Response (LLM)
        • Retrieval (RAG)
        • Corpus Data
      • Export Data
      • Generate Embeddings
      • Manage the App
      • Use Example Inferences
    • Concepts: Inferences
    • API: Inferences
    • Use-Cases: Inferences
      • Embeddings Analysis
  • 🔌INTEGRATIONS
    • Phoenix MCP Server
    • Cleanlab
    • Ragas
  • ⚙️Settings
    • Access Control (RBAC)
    • API Keys
    • Data Retention
Powered by GitBook

Platform

  • Tracing
  • Prompts
  • Datasets and Experiments
  • Evals

Software

  • Python Client
  • TypeScript Client
  • Phoenix Evals
  • Phoenix Otel

Resources

  • Container Images
  • X
  • Blue Sky
  • Blog

Integrations

  • OpenTelemetry
  • AI Providers

© 2025 Arize AI

On this page

Was this helpful?

Edit on GitHub
  1. Evaluation
  2. How to: Evals
  3. Pre-Built Evals

Agent Reflection

PreviousAgent PlanningNextAudio Emotion Detection

Last updated 21 days ago

Was this helpful?

Use this prompt template to evaluate an agent's final response. This is an optional step, which you can use as a gate to retry a set of actions if the response or state of the world is insufficient for the given task.

Read more:

Prompt Template

This prompt template is heavily inspired by the paper: .

You are an expert in {topic}. I will give you a user query. Your task is to reflect on your provided solution and whether it has solved the problem.
First, explain whether you believe the solution is correct or incorrect.
Second, list the keywords that describe the type of your errors from most general to most specific.
Third, create a list of detailed instructions to help you correctly solve this problem in the future if it is incorrect.

Be concise in your response; however, capture all of the essential information.

Here is the data:
    [BEGIN DATA]
    ************
    [User Query]: {user_query}
    ************
    [Tools]: {tool_definitions}
    ************
    [State]: {current_state}
    ************
    [Provided Solution]: {solution}
    [END DATA]
🧠
How to evaluate the evaluators and build self-improving evals
Prompt optimization
"Self Reflection in LLM Agents"