Examples
Explore the capabilities of Phoenix with notebooks
Example full-stack applications instrumented with OpenInference and observed via Phoenix server instances.
Trace through the execution of your LLM application to understand its internal structure and to troubleshoot issues with retrieval, tool execution, LLM calls, and more.
Tracing an OpenAI App | Tracing, Basics
Tracing a LlamaIndex App | Tracing, Basics
Retrieval Example with Evaluations: Fast UI Viz | Evaluations, Retrieval
Tracing and Evaluating a LlamaIndex + OpenAI RAG Application | LlamaIndex, OpenAI, retrieval-augmented generation
Tracing and Evaluating a LlamaIndex OpenAI Agent | LlamaIndex, OpenAI, agents, function calling
Tracing and Evaluating a Structured Data Extraction Application with OpenAI Function Calling | OpenAI, structured data extraction, function calling
Tracing and Evaluating a LangChain + OpenAI RAG Application | LangChain, OpenAI, retrieval-augmented generation
Tracing and Evaluating a LangChain Agent | LangChain, OpenAI, agents, function calling
Tracing and Evaluating a DSPy Application | DSPy, Google PaLM, retrieval-augmented generation
Tracing a LlamaIndex app with Sessions | LlamaIndex, Tracing, Sessions
Tracing an OpenAI app with Sessions | OpenAI, Tracing, Sessions
Tracing an OpenAI app with Sessions (JS/TS) | OpenAI, Tracing, Sessions
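All of the tracing notebooks above follow the same setup pattern: launch (or connect to) a Phoenix server, register an OTLP tracer provider, and apply the relevant OpenInference instrumentor. A minimal sketch for the OpenAI case, assuming recent versions of arize-phoenix and openinference-instrumentation-openai and an OPENAI_API_KEY in the environment:

```python
import phoenix as px
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

px.launch_app()  # start a local Phoenix server and UI

# Export spans to Phoenix and instrument every OpenAI client call
tracer_provider = register(project_name="openai-quickstart")
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about tracing."}],
)
# The resulting LLM span appears in the Phoenix UI under the chosen project.
```

The LlamaIndex and LangChain notebooks swap in the corresponding OpenInference instrumentors (LlamaIndexInstrumentor, LangChainInstrumentor); the Sessions notebooks additionally attach a session ID so related traces are grouped into a single conversation.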
Iteratively improve your LLM task by building datasets, running experiments, and evaluating performance using code and LLM-as-a-judge.
Quickstart: Datasets and Experiments | datasets, experiments
Text2SQL | SQL generation
Prompt Template Iteration for a Summarization Service | summarization
Answer Relevancy and Context Relevancy Evaluation | RAG
Guideline Eval | RAG
Pairwise Eval | pairwise eval
LlamaIndex RAG with Reranker | LlamaIndex, RAG, re-rankers
LangChain Email Extraction | LangChain
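The experiment notebooks above all follow the same loop: upload a set of examples as a dataset, define a task that runs your application over each example, and attach evaluators that score the outputs. A rough sketch using Phoenix's experiments API; the dataframe contents and the task body are illustrative placeholders:

```python
import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

# A toy dataset of inputs and expected outputs (placeholder content)
df = pd.DataFrame(
    {
        "question": ["What is Phoenix?", "What is OpenInference?"],
        "expected": ["An LLM observability tool", "A tracing specification"],
    }
)
dataset = px.Client().upload_dataset(
    dataset_name="qa-examples",
    dataframe=df,
    input_keys=["question"],
    output_keys=["expected"],
)

def task(input):
    # Replace with a call to the application under test.
    return "An LLM observability tool"

def exact_match(output, expected):
    # Simple code-based evaluator; LLM-as-a-judge evaluators plug in the same way.
    return float(output == expected["expected"])

run_experiment(dataset, task, evaluators=[exact_match])
```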
Leverage the power of large language models to evaluate your generative model or application for hallucinations, toxicity, relevance of retrieved documents, and more.
Evaluating Hallucinations | hallucinations
Evaluating Toxicity | toxicity
Evaluating Relevance of Retrieved Documents | document relevance
Evaluating Question-Answering | question-answering
Evaluating Summarization | summarization
Evaluating Code Readability | code readability
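Each of these evals pairs a prompt template with an LLM judge. A minimal hallucination-eval sketch with phoenix.evals, assuming a dataframe whose columns match the template variables (input, reference, output) and an OpenAI key in the environment:

```python
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# One row per judged response: the user query, the retrieved reference text,
# and the model's answer (placeholder content).
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Christopher Marlowe."],
    }
)

rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())  # the allowed judge labels
results = llm_classify(
    dataframe=df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=rails,
    provide_explanation=True,  # ask the judge to justify each label
)
print(results[["label", "explanation"]])
```

The toxicity, document-relevance, question-answering, summarization, and code-readability notebooks swap in the corresponding templates and rails maps from the same module.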
Visualize your generative application's retrieval process to surface failed retrievals and to find topics not addressed by your knowledge base.
Evaluating and Improving Search and Retrieval Applications | LlamaIndex, retrieval-augmented generation
Evaluating and Improving Search and Retrieval Applications | LlamaIndex, Milvus, retrieval-augmented generation
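Both notebooks visualize retrieval by loading query embeddings, along with the IDs of the documents each query retrieved, next to a corpus of document embeddings. A rough sketch of that setup using Phoenix's inferences API; the dataframes are toy placeholders and the exact schema field names may differ slightly across Phoenix versions:

```python
import numpy as np
import pandas as pd
import phoenix as px

# Toy corpus: two documents with small placeholder embedding vectors
corpus_df = pd.DataFrame(
    {
        "id": [0, 1],
        "text": ["Phoenix is an observability tool.", "Bananas are yellow."],
        "embedding": [np.array([0.1, 0.2, 0.3]), np.array([0.9, 0.1, 0.0])],
    }
)
# Toy queries: each row records the question asked and which document IDs were retrieved
query_df = pd.DataFrame(
    {
        "query": ["What is Phoenix?"],
        "query_embedding": [np.array([0.1, 0.2, 0.35])],
        "retrieved_doc_ids": [[0, 1]],
    }
)

corpus_schema = px.Schema(
    id_column_name="id",
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="text",
    ),
)
query_schema = px.Schema(
    prompt_column_names=px.RetrievalEmbeddingColumnNames(
        vector_column_name="query_embedding",
        raw_data_column_name="query",
        context_retrieval_ids_column_name="retrieved_doc_ids",
    ),
)

px.launch_app(
    primary=px.Inferences(query_df, query_schema, "queries"),
    corpus=px.Inferences(corpus_df, corpus_schema, "corpus"),
)
```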
Explore lower-dimensional representations of your embedding data to identify clusters exhibiting high drift and degraded performance.
Active Learning for a Drifting Image Classification Model | image classification, fine-tuning
Root-Cause Analysis for a Drifting Sentiment Classification Model | NLP, sentiment classification
Troubleshooting an LLM Summarization Task | summarization
Collect Chats with GPT | LLMs
Find Clusters, Export, and Explore with GPT | LLMs, exploratory data analysis
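These notebooks all start the same way: describe your dataframe with a schema that points at the embedding and label columns, wrap production and baseline data as inferences, and launch the app to explore the projected clusters. A minimal sketch, assuming recent versions of arize-phoenix; the generated dataframe and column names are placeholders:

```python
import numpy as np
import pandas as pd
import phoenix as px

def make_df(n):
    # Placeholder frame standing in for real model inferences
    return pd.DataFrame(
        {
            "text": [f"example {i}" for i in range(n)],
            "text_vector": [np.random.rand(4) for _ in range(n)],
            "predicted_label": np.random.choice(["positive", "negative"], n),
            "actual_label": np.random.choice(["positive", "negative"], n),
        }
    )

schema = px.Schema(
    prediction_label_column_name="predicted_label",
    actual_label_column_name="actual_label",
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="text_vector",
            raw_data_column_name="text",
        ),
    },
)

# Compare drifting production data (primary) against a training baseline (reference)
px.launch_app(
    primary=px.Inferences(make_df(100), schema, "production"),
    reference=px.Inferences(make_df(100), schema, "training"),
)
```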
Statistically analyze your structured data to perform A/B analysis, temporal drift analysis, and more.
Detecting Fraud with Tabular Embeddings | tabular data, anomaly detection
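The same primary-versus-reference workflow applies to plain tabular data: list the feature columns in the schema, plus a timestamp column for temporal drift, and Phoenix computes drift and performance statistics between the two inference sets. A brief sketch with placeholder columns loosely modeled on the fraud example:

```python
import numpy as np
import pandas as pd
import phoenix as px

def toy_transactions(n):
    # Placeholder tabular data standing in for real transactions
    return pd.DataFrame(
        {
            "prediction_timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
            "amount": np.random.exponential(50, n),
            "merchant_type": np.random.choice(["grocery", "travel", "online"], n),
            "predicted_fraud": np.random.choice(["fraud", "not_fraud"], n),
            "actual_fraud": np.random.choice(["fraud", "not_fraud"], n),
        }
    )

schema = px.Schema(
    timestamp_column_name="prediction_timestamp",
    prediction_label_column_name="predicted_fraud",
    actual_label_column_name="actual_fraud",
    feature_column_names=["amount", "merchant_type"],
)

# A/B-style comparison: production traffic (primary) vs. training baseline (reference)
px.launch_app(
    primary=px.Inferences(toy_transactions(200), schema, "production"),
    reference=px.Inferences(toy_transactions(200), schema, "training"),
)
```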