Retrieval Evals on Document Chunks

Retrieval Evals are designed to evaluate the effectiveness of retrieval systems. A retrieval system typically returns a list of k chunks ordered by relevance. The most common retrieval systems in the LLM ecosystem are vector databases.
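To make that shape concrete, the sketch below represents a retrieval result as a dataframe with one row per query and a column holding the ranked list of k chunks. The column names and example text are illustrative, not something Phoenix prescribes:

```python
import pandas as pd

# Hypothetical retrieval output: one query, k=4 chunks, ordered by the
# relevance score the vector DB assigned at query time.
retrievals_df = pd.DataFrame(
    {
        "query": ["How do I reset my password?"],
        "retrieved_chunks": [[
            "Click 'Forgot password' on the login page.",         # rank 1
            "Passwords must be at least 12 characters.",          # rank 2
            "Contact support if the reset email never arrives.",  # rank 3
            "Our offices are closed on public holidays.",         # rank 4
        ]],
    }
)
```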

The retrieval Eval is designed to assess the relevance of each chunk and its ability to answer the question. More information on the retrieval Eval can be found here.
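In practice, this relevance check can be run with `llm_classify` and the built-in `RAG_RELEVANCY_PROMPT_TEMPLATE` from `phoenix.evals`, which labels each (query, chunk) pair. A minimal sketch, assuming the template reads the query from an `input` column and the chunk text from a `reference` column; the model choice and example rows are illustrative:

```python
import pandas as pd
from phoenix.evals import (
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# One row per (query, chunk) pair.
pairs_df = pd.DataFrame(
    {
        "input": ["How do I reset my password?"] * 2,
        "reference": [
            "Click 'Forgot password' on the login page.",
            "Our offices are closed on public holidays.",
        ],
    }
)

relevance = llm_classify(
    dataframe=pairs_df,
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),  # illustrative model choice
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
)
print(relevance["label"])  # one label per chunk, e.g. "relevant" / "unrelated"
```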

For example, a single query might return k=4 chunks as a list. The retrieval Eval runs on each chunk, producing a list of the same length with a relevance value for each chunk. Phoenix provides helper functions that take in a dataframe containing a query column and a column of chunk lists, and produce a new column, a list of equal length, with an Eval for each chunk.
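The regrouping those helpers perform is easy to see with plain pandas plus `llm_classify`. The sketch below (column names and model choice are assumptions, not a Phoenix requirement) explodes the list column into one row per (query, chunk) pair, classifies each pair, then folds the labels back into a per-query list of the same length as the chunk list:

```python
import pandas as pd
from phoenix.evals import (
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# One row per query; "reference" holds the ranked list of retrieved chunks.
retrievals_df = pd.DataFrame(
    {
        "input": ["How do I reset my password?"],
        "reference": [[
            "Click 'Forgot password' on the login page.",
            "Passwords must be at least 12 characters.",
            "Our offices are closed on public holidays.",
        ]],
    }
)

# Explode to one row per (query, chunk) pair so each chunk is scored
# independently; keep the originating query's index as "query_id".
exploded = retrievals_df.explode("reference").reset_index(names="query_id")

labels = llm_classify(
    dataframe=exploded,
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),  # illustrative model choice
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
)["label"]

# Fold the per-chunk labels back into a list of the same length as each
# query's chunk list, e.g. ["relevant", "unrelated", "unrelated"].
exploded["relevance"] = labels.to_list()
retrievals_df["relevance_evals"] = (
    exploded.groupby("query_id")["relevance"].agg(list).to_list()
)
```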
