Phoenix: AI Observability & Evaluation
Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models in a notebook.
Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting.
The toolset is designed to ingest inference data for LLM, CV, NLP, and tabular datasets, as well as LLM traces. It allows AI engineers and data scientists to quickly visualize their data, evaluate performance, track down issues, surface insights, and easily export data for improvement.
Quickstarts
Running Phoenix for the first time? Select a quickstart below.
Don't know which one to choose? Phoenix has two main data ingestion methods:
LLM Traces: Phoenix runs on top of trace data generated by frameworks such as LlamaIndex and LangChain. The general use case is troubleshooting LLM applications, including agentic workflows.
Inferences: Phoenix troubleshoots models whose datasets can be expressed as pandas DataFrames: LLM applications built in Python, CV, NLP, and tabular models. A sketch of this path follows below; the traces path is sketched under Phoenix Functionality.
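For the Inferences path, the sketch below launches Phoenix on a small pandas DataFrame. This is a minimal sketch based on the inferences quickstart: the DataFrame, its column names, and the schema fields are illustrative, and exact APIs may vary by Phoenix version (for example, `px.Dataset` is renamed `px.Inferences` in newer releases).

```python
import pandas as pd
import phoenix as px

# A toy inference dataset: one row per model prediction.
df = pd.DataFrame(
    {
        "text": ["the food was great", "terrible service"],
        "prediction": ["positive", "negative"],
        "actual": ["positive", "negative"],
    }
)

# The schema tells Phoenix how the DataFrame's columns map onto
# predictions and ground truth; unmapped columns are treated as features.
schema = px.Schema(
    prediction_label_column_name="prediction",
    actual_label_column_name="actual",
)

session = px.launch_app(primary=px.Dataset(dataframe=df, schema=schema, name="demo"))
print(session.url)  # open the Phoenix UI in your browser
```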
Phoenix Functionality
Evaluate Performance of LLM Tasks with Evals Library: Use the Phoenix Evals library to evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template (see the evals sketch after this list).
Troubleshoot Agentic Workflows: Get visibility into where a complex or agentic workflow broke, and find performance bottlenecks across different span types, with LLM Tracing (see the tracing sketch after this list).
Optimize Retrieval Systems: Identify when context is missing from your knowledge base and when irrelevant context is retrieved by visualizing query embeddings alongside knowledge-base embeddings with RAG Analysis (see the retrieval sketch after this list).
Compare Model Versions: Compare and evaluate performance across model versions before deploying to production (see the dataset-comparison sketch after this list).
Exploratory Data Analysis: Connect teams and workflows by continuing the analysis of production data from Arize in a notebook environment, supporting fine-tuning workflows.
Find Clusters of Issues to Export for Model Improvement: Surface clusters of problems using performance metrics or drift, and export those clusters for retraining workflows.
Surface Model Drift and Multivariate Drift: Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models (the dataset-comparison sketch below covers this setup as well).
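For the evals item above, here is a minimal sketch of a hallucination evaluation. It assumes the `phoenix.evals` module (`phoenix.experimental.evals` in older releases) with `llm_classify`, `OpenAIModel`, and the built-in hallucination template; the example DataFrame is illustrative, and the template's expected column names and parameter names may differ by version.

```python
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# One row per LLM call: the user input, the retrieved reference text,
# and the model's output to be judged.
df = pd.DataFrame(
    {
        "input": ["Where is the Eiffel Tower?"],
        "reference": ["The Eiffel Tower is in Paris, France."],
        "output": ["The Eiffel Tower is in Berlin."],
    }
)

# Rails constrain the judge model to the template's allowed labels
# (e.g. "hallucinated" / "factual").
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())
evals_df = llm_classify(
    dataframe=df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4"),  # requires OPENAI_API_KEY
    rails=rails,
)
print(evals_df["label"])  # expected: "hallucinated" for this row
```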
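For the tracing item, a minimal sketch assuming the LangChain instrumentor shipped with Phoenix (a matching instrumentor exists for LlamaIndex); the application code itself is illustrative.

```python
import phoenix as px
from phoenix.trace.langchain import LangChainInstrumentor

# Start the Phoenix collector and UI, then instrument LangChain so every
# chain, retriever, and LLM call is recorded as a span.
session = px.launch_app()
LangChainInstrumentor().instrument()

# Run your application as usual; spans stream into Phoenix as it executes.
# chain = ...  # your LangChain chain
# chain.invoke("What is Phoenix?")
print(session.url)  # inspect traces, span types, and latencies in the UI
```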
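For the retrieval item, the sketch below places query embeddings and knowledge-base (corpus) embeddings in the same view. It assumes the `prompt_column_names` and `document_column_names` schema fields from Phoenix's retrieval quickstart; the toy DataFrames, column names, and 2-dimensional vectors are illustrative.

```python
import pandas as pd
import phoenix as px

# User queries with precomputed embeddings.
query_df = pd.DataFrame(
    {"query_text": ["How do I reset my password?"], "query_vector": [[0.1, 0.9]]}
)
# Knowledge-base documents embedded with the same model.
corpus_df = pd.DataFrame(
    {
        "document_text": ["To reset your password, visit the settings page."],
        "document_vector": [[0.12, 0.88]],
    }
)

query_schema = px.Schema(
    prompt_column_names=px.EmbeddingColumnNames(
        vector_column_name="query_vector",
        raw_data_column_name="query_text",
    )
)
corpus_schema = px.Schema(
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="document_vector",
        raw_data_column_name="document_text",
    )
)

px.launch_app(
    primary=px.Dataset(query_df, query_schema, "queries"),
    corpus=px.Dataset(corpus_df, corpus_schema, "knowledge-base"),
)
# Queries that land far from every document cluster point to missing context;
# retrievals drawn from unrelated clusters point to irrelevant context.
```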
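For the model-comparison and drift items, one setup covers both: load the candidate or production dataset as `primary` and a baseline as `reference`, and Phoenix measures the primary dataset against the reference. A minimal sketch; the DataFrames, column names, and 2-dimensional embeddings are illustrative.

```python
import pandas as pd
import phoenix as px

# Both DataFrames share the same columns, so one schema serves both.
baseline_df = pd.DataFrame(
    {"text": ["good product"], "embedding": [[0.2, 0.8]], "prediction": ["positive"]}
)
candidate_df = pd.DataFrame(
    {"text": ["awful product"], "embedding": [[0.9, 0.1]], "prediction": ["negative"]}
)

schema = px.Schema(
    prediction_label_column_name="prediction",
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding",
            raw_data_column_name="text",
        )
    },
)

px.launch_app(
    primary=px.Dataset(candidate_df, schema, "candidate"),
    reference=px.Dataset(baseline_df, schema, "baseline"),
)
# The UI compares performance between the two datasets and surfaces drift
# (including embedding drift) of primary relative to reference.
```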
Resources
Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.
Learn about best practices and how to get started with use-case examples such as Q&A with Retrieval, Summarization, and Chatbots.
Join the Phoenix Slack community to ask questions, share findings, provide feedback, and connect with other developers.