Phoenix: AI Observability & Evaluation

Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models in a notebook.
Phoenix is an open-source observability library designed for pre-production experimentation, evaluation, and troubleshooting.
The toolset is designed to ingest inference data for LLM, CV, NLP, and tabular datasets, as well as LLM traces. It allows AI engineers and data scientists to quickly visualize their data, evaluate performance, surface issues and insights, and easily export data to drive improvements.


Running Phoenix for the first time? Select a quickstart below.
Don't know which one to choose? Phoenix has two main data ingestion methods:
  1. LLM Traces: Phoenix sits on top of trace data generated by LlamaIndex and LangChain. The typical use case is troubleshooting LLM applications with agentic workflows.
  2. Inferences: Phoenix troubleshoots models whose datasets can be expressed as DataFrames in Python, such as LLM applications built in Python workflows, CV, NLP, and tabular models.

Phoenix Functionality

Check out a comprehensive list of example notebooks for LLM Traces, Evals, RAG Analysis, and more.

Use Cases

Learn about best practices and how to get started with use-case examples such as Q&A with Retrieval, Summarization, and Chatbots.


Join the Phoenix Slack community to ask questions, share findings, provide feedback, and connect with other developers.