Examples
Explore the capabilities of Phoenix with notebooks
Example full-stack applications instrumented with OpenInference and observed with Phoenix server instances.
Trace through the execution of your LLM application to understand its internal structure and to troubleshoot issues with retrieval, tool execution, LLM calls, and more.
Title | Topics | Links |
---|---|---|
Tracing an OpenAI App | | |
Tracing a LlamaIndex App | | |
Retrieval Example with Evaluations: Fast UI Viz | | |
Tracing and Evaluating a LlamaIndex + OpenAI RAG Application | | |
Tracing and Evaluating a LlamaIndex OpenAI Agent | | |
Tracing and Evaluating a Structured Data Extraction Application with OpenAI Function Calling | | |
Tracing and Evaluating a LangChain + OpenAI RAG Application | | |
Tracing and Evaluating a LangChain Agent | | |
Tracing and Evaluating a DSPy Application | | |
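A trace is a tree of spans: each span records one step of a request (a retrieval, a tool call, an LLM call) together with its timing, attributes, and parent. The sketch below is a toy, library-free tracer written only to illustrate that structure. It is not the Phoenix or OpenInference API; the `ToyTracer` class, the span names, and the kind labels are hypothetical (the labels are loosely modeled on OpenInference's span kinds).

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    """One step of a request, e.g. a retrieval or an LLM call."""
    name: str
    kind: str  # illustrative labels such as "CHAIN", "RETRIEVER", "LLM"
    # Excluded from repr/eq so parent <-> children links never recurse.
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: List["Span"] = field(default_factory=list)
    attributes: Dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class ToyTracer:
    """Hypothetical tracer: nests spans by keeping a stack of open spans."""

    def __init__(self) -> None:
        self.roots: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, kind: str, **attributes: Any):
        parent = self._stack[-1] if self._stack else None
        s = Span(name, kind, parent, attributes=dict(attributes))
        # Attach to the enclosing span, or start a new trace if there is none.
        (parent.children if parent is not None else self.roots).append(s)
        self._stack.append(s)
        s.start = time.perf_counter()
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()


def print_tree(span: Span, depth: int = 0) -> None:
    elapsed_ms = (span.end - span.start) * 1000
    print("  " * depth + f"{span.kind}: {span.name} ({elapsed_ms:.2f} ms)")
    for child in span.children:
        print_tree(child, depth + 1)


# Usage: a RAG-style request with a retrieval step and a generation step.
tracer = ToyTracer()
with tracer.span("answer_question", "CHAIN", question="What is Phoenix?"):
    with tracer.span("retrieve", "RETRIEVER") as r:
        r.attributes["document_ids"] = ["doc-1", "doc-2"]
    with tracer.span("generate", "LLM") as g:
        g.attributes["output"] = "stub answer"  # a real app would call a model here

for root in tracer.roots:
    print_tree(root)
```

Keeping a stack of open spans is what lets nested `with` blocks become parent/child links without passing the parent around explicitly; real tracing libraries achieve the same effect with context propagation.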
Iteratively improve your LLM task by building datasets, running experiments, and evaluating performance using code and LLM-as-a-judge.
Title | Topics | Links |
---|---|---|
Quickstart: Datasets and Experiments | | |
Text2SQL | | |
Prompt Template Iteration for a Summarization Service | | |
Answer Relevancy and Context Relevancy Evaluation | | |
Guideline Eval | | |
Pairwise Eval | | |
LlamaIndex RAG with Reranker | | |
LangChain Email Extraction | | |
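The experiment loop (run a task over every example in a dataset, score each output with an evaluator, then compare the aggregate score across task variants) can be sketched in plain Python. Everything here is hypothetical: the dataset, the keyword-based `task_v1` and `task_v2` functions standing in for LLM calls with two different prompts, and the `exact_match` code evaluator. Phoenix's own datasets-and-experiments API is not shown.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical dataset: each example pairs an input with an expected output.
DATASET: List[Dict[str, str]] = [
    {"input": "I love this product", "expected": "positive"},
    {"input": "Terrible support, never again", "expected": "negative"},
    {"input": "Works great and ships fast", "expected": "positive"},
    {"input": "It broke after a day", "expected": "negative"},
]


def task_v1(text: str) -> str:
    # Stand-in for an LLM call with the original prompt: a naive keyword rule.
    return "positive" if "love" in text.lower() else "negative"


def task_v2(text: str) -> str:
    # Stand-in for a revised prompt: recognizes more positive cues.
    cues = ("love", "great", "fast")
    return "positive" if any(c in text.lower() for c in cues) else "negative"


def exact_match(output: str, expected: str) -> float:
    # Code-based evaluator: 1.0 if the output matches the reference label.
    return float(output.strip().lower() == expected)


def run_experiment(task: Callable[[str], str], dataset, evaluator) -> float:
    scores = [evaluator(task(ex["input"]), ex["expected"]) for ex in dataset]
    return mean(scores)


for name, task in [("v1", task_v1), ("v2", task_v2)]:
    print(name, run_experiment(task, DATASET, exact_match))
```

Holding the dataset and evaluator fixed while swapping only the task is what makes the two scores comparable; an LLM-as-a-judge evaluator would slot in wherever `exact_match` is passed.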
Leverage the power of large language models to evaluate your generative model or application for hallucinations, toxicity, relevance of retrieved documents, and more.
Title | Topics | Links |
---|---|---|
Evaluating Hallucinations | | |
Evaluating Toxicity | | |
Evaluating Relevance of Retrieved Documents | | |
Evaluating Question-Answering | | |
Evaluating Summarization | | |
Evaluating Code Readability | | |
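LLM-as-a-judge evaluation generally follows one pattern: fill a prompt template with the data to be judged, ask a model for a label, and keep the reply only if it is one of a fixed set of allowed labels (often called rails). The sketch below shows that pattern for a hallucination check. The template wording, the `RAILS` tuple, and `fake_judge` (a stub standing in for a real LLM call) are hypothetical and are not Phoenix's built-in templates.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical judge prompt; not one of Phoenix's built-in templates.
TEMPLATE = (
    "Decide whether the answer is supported by the reference text.\n"
    "Reference: {reference}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply with exactly one word: "factual" or "hallucinated".'
)
RAILS = ("factual", "hallucinated")  # the only labels we accept


def snap_to_rails(raw: str) -> Optional[str]:
    # Normalize the reply and accept it only if it is exactly one allowed label.
    label = raw.strip().strip('."\'').lower()
    return label if label in RAILS else None


def run_eval(rows: List[Dict[str, str]], judge: Callable[[str], str]) -> List[Optional[str]]:
    return [snap_to_rails(judge(TEMPLATE.format(**row))) for row in rows]


def fake_judge(prompt: str) -> str:
    # Stub standing in for an LLM call: flags any answer that mentions "Lyon".
    answer = prompt.split("Answer: ")[1].split("\n")[0]
    return "Hallucinated." if "Lyon" in answer else "factual"


rows = [
    {"reference": "Paris is the capital of France.",
     "question": "What is the capital of France?", "answer": "Paris"},
    {"reference": "Paris is the capital of France.",
     "question": "What is the capital of France?", "answer": "Lyon"},
]
print(run_eval(rows, fake_judge))
```

Replies that do not reduce to an allowed label come back as `None`, so unparseable judgments are counted separately instead of being silently coerced into one class.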
Visualize your generative application's retrieval process to surface failed retrievals and to find topics not addressed by your knowledge base.
Title | Topics | Links |
---|---|---|
Evaluating and Improving Search and Retrieval Applications | | |
Evaluating and Improving Search and Retrieval Applications | | |
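Failed retrievals become measurable once each retrieved document carries a relevance label (from a human or an LLM judge). A minimal sketch, assuming hypothetical per-query relevance judgments listed in rank order: precision@k is the fraction of the top k documents that are relevant, and a query whose top k contains no relevant document at all is flagged as a failed retrieval.

```python
from typing import Dict, List

# Hypothetical relevance judgments for each query's retrieved documents,
# in rank order (True = the document is relevant to the query).
RESULTS: Dict[str, List[bool]] = {
    "How do I reset my password?": [True, False, True],
    "What is the refund window?": [False, False, False],
    "Which regions are supported?": [False, True, False],
}


def precision_at_k(relevance: List[bool], k: int) -> float:
    # Fraction of the top-k retrieved documents that are relevant.
    return sum(relevance[:k]) / k


def hit_at_k(relevance: List[bool], k: int) -> bool:
    # True if at least one relevant document appears in the top k.
    return any(relevance[:k])


def failed_retrievals(results: Dict[str, List[bool]], k: int) -> List[str]:
    return [query for query, rel in results.items() if not hit_at_k(rel, k)]


k = 2
for query, rel in RESULTS.items():
    print(f"{query!r}: precision@{k}={precision_at_k(rel, k):.2f}")
print(f"no relevant document in top {k}:", failed_retrievals(RESULTS, k))
```

Grouping the flagged queries by topic is one way to spot subjects the knowledge base does not cover.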
Explore lower-dimensional representations of your embedding data to identify clusters with high drift or degraded performance.
Title | Topics | Links |
---|---|---|
Active Learning for a Drifting Image Classification Model | | |
Root-Cause Analysis for a Drifting Sentiment Classification Model | | |
Troubleshooting an LLM Summarization Task | | |
Collect Chats with GPT | | |
Find Clusters, Export, and Explore with GPT | | |
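One common way to quantify embedding drift is the Euclidean distance between the centroid (mean vector) of a reference set of embeddings and the centroid of a current set; a growing distance suggests the current data has moved away from the reference. The sketch below computes that distance with the standard library on made-up two-dimensional vectors. Real embeddings have hundreds of dimensions, and this is an illustration, not Phoenix's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    # Mean of the vectors, computed coordinate by coordinate.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def centroid_drift(reference: List[Vector], current: List[Vector]) -> float:
    # Euclidean distance between the two centroids.
    return math.dist(centroid(reference), centroid(current))


# Made-up embeddings: the reference centroid is (1, 1), the current one is (4, 5).
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
current = [[3.0, 4.0], [5.0, 4.0], [3.0, 6.0], [5.0, 6.0]]
print(centroid_drift(reference, current))  # 5.0
```

Computing this distance per time window against a fixed reference gives a drift-over-time curve; a single centroid distance can miss shifts that change a distribution's shape but not its mean.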
Statistically analyze your structured data to perform A/B analysis, temporal drift analysis, and more.
Title | Topics | Links |
---|---|---|
Detecting Fraud with Tabular Embeddings | | |
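For temporal drift on a categorical feature, one standard metric is the population stability index (PSI): with p the share of a category in the reference window and q its share in the current window, PSI is the sum over categories of (q − p) · ln(q / p). PSI is used here only as an example technique, since this page does not say which statistics Phoenix computes, and the payment-method values below are invented.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def psi(reference: Iterable[Hashable], current: Iterable[Hashable], eps: float = 1e-6) -> float:
    ref_counts, cur_counts = Counter(reference), Counter(current)
    ref_total, cur_total = sum(ref_counts.values()), sum(cur_counts.values())
    score = 0.0
    for category in set(ref_counts) | set(cur_counts):
        # Clamp shares to eps so a category missing from one window
        # does not cause a division by zero or log(0).
        p = max(ref_counts[category] / ref_total, eps)
        q = max(cur_counts[category] / cur_total, eps)
        score += (q - p) * math.log(q / p)
    return score


# Invented data: the share of "crypto" payments grows between two months.
last_month = ["card"] * 70 + ["wire"] * 20 + ["crypto"] * 10
this_month = ["card"] * 50 + ["wire"] * 20 + ["crypto"] * 30
print(round(psi(last_month, this_month), 3))
```

Each term is non-negative because (q − p) and ln(q / p) always share a sign, so PSI is zero only when the two windows have identical category shares. The same function can compare an A group against a B group instead of two time windows.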