Phoenix Traces for LLM applications - OpenAI, LangChain & LlamaIndex

Inspect the inner workings of your LLM Application using OpenInference Traces

Streaming Traces to Phoenix

The easiest way to use Phoenix traces with LLM frameworks (or the OpenAI API directly) is to stream the execution of your application to a locally running Phoenix server. The traces collected during execution can then be stored for later use, such as validation, evaluation, and fine-tuning.

The traces can be collected and stored in the following ways:

  • In Memory: useful for debugging.

  • Local File: Persistent and good for offline local development. See exports and the persistence sketch after this list.

  • Cloud (coming soon): Store traces in cloud buckets as assets for later use
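
For the Local File option, one simple approach is to export the spans collected by the running session and persist the dataframe yourself. A minimal sketch (the file name and format below are just examples; see the export guide for the supported utilities):

import phoenix as px

# Export everything collected so far from the running session...
spans_df = px.active_session().get_spans_dataframe()

# ...and persist it locally for later offline analysis.
spans_df.to_parquet("traces.parquet")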

To get started with traces, you will first want to start a local Phoenix app.

import phoenix as px
session = px.launch_app()

The above launches a Phoenix server that acts as a trace collector for any LLM application running locally.

🌍 To view the Phoenix app in your browser, visit https://z8rwookkcle1-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

The launch_app command will print a URL where you can view the Phoenix UI. You can access this URL again at any time via the session. Now that Phoenix is up and running, you can run a LlamaIndex or LangChain application, or call the OpenAI API directly, and debug your application as the traces stream in.
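
For example, if you are calling the OpenAI API directly, Phoenix's OpenAI instrumentation can capture those calls. A minimal sketch, assuming a Phoenix version that provides phoenix.trace.openai and the pre-1.0 openai SDK (adjust the imports and call style to your versions):

import openai
from phoenix.trace.openai import OpenAIInstrumentor

# Instrument the OpenAI client so each completion call is captured as a span.
OpenAIInstrumentor().instrument()

# This call will now stream a trace to the locally running Phoenix app.
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is OpenInference tracing?"}],
)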

If you are using llama-index>0.8.36, you can instrument your application with LlamaIndex's one-click observability.

# Phoenix can display in real time the traces automatically
# collected from your LlamaIndex application.
import phoenix as px
# Look for a URL in the output to open the App in a browser.
px.launch_app()

# The App is initially empty, but as you proceed with the steps below,
# traces will appear automatically as your LlamaIndex application runs.

import llama_index
llama_index.set_global_handler("arize_phoenix")

# Run your LlamaIndex application and traces
# will be collected and displayed in Phoenix.

# LlamaIndex application initialization may vary
# depending on your application. Below is a simple example:
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
)
# storage_context is assumed to point at a previously persisted index, e.g.
# storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(
    storage_context,
    service_context=service_context,
)
query_engine = index.as_query_engine()

# Execute queries
query_engine.query("What is OpenInference tracing?")

See the integrations guide for full details, as well as support for older versions of LlamaIndex.
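
For older LlamaIndex versions that predate one-click observability, the integration is typically wired up through a callback handler instead. A rough sketch, assuming your Phoenix version provides phoenix.trace.llama_index.OpenInferenceTraceCallbackHandler (check the integrations guide for the exact import paths for your versions):

from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager
from phoenix.trace.llama_index import OpenInferenceTraceCallbackHandler

# Attach the OpenInference callback handler so spans stream to Phoenix.
callback_handler = OpenInferenceTraceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler]),
)
# Build your index and query engine with this service_context as usual.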

Once you've executed a sufficient number of queries (or chats) against your application, you can view the details in the UI by refreshing the browser page.

Trace Datasets

Phoenix also supports datasets that contain OpenInference trace data. This allows trace data from a running LangChain or LlamaIndex instance to be exported and explored offline for analysis.

Trace dataframes can be extracted directly from the active session, as shown below; the exported data can then be used to re-launch the app for offline exploration.

session = px.active_session()

# You can export a dataframe from the session
# Note that you can apply a filter if you would like to export only a sub-set of spans
df = session.get_spans_dataframe('span_kind == "RETRIEVER"')

# Re-launch the app using the data
px.launch_app(trace=px.TraceDataset(df))

For full details on how to export trace data, see the detailed guide

Evaluating Traces

In addition to launching Phoenix on LlamaIndex and LangChain, teams can export trace data to a dataframe in order to run LLM Evals on the data.

from phoenix.experimental.evals import run_relevance_eval

# Export all of the retriever spans that have been run
trace_df = px.active_session().get_spans_dataframe('span_kind == "RETRIEVER"')

# Run relevance evaluations
relevances = run_relevance_eval(trace_df)

For full details, check out the relevance LLM Eval example.
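
To review the results alongside the spans they came from, you can attach them back onto the exported dataframe. A small sketch that follows on from the block above and assumes run_relevance_eval returns one entry per input row (the exact return shape may vary with your Phoenix version):

# Attach the evaluation output to the exported retriever spans so that
# labels can be reviewed next to the span context (query, documents, latency).
eval_df = trace_df.copy()
eval_df["relevance"] = relevances  # assumption: one entry per input row
print(eval_df["relevance"].head())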

Phoenix Tracing App

Phoenix can be used to understand and troubleshoot your LLM application by surfacing:

  • Application latency - highlighting slow invocations of LLMs, Retrievers, etc.

  • Token Usage - Displays the breakdown of token usage with LLMs to surface your most expensive LLM calls (see the analysis sketch after this list)

  • Runtime Exceptions - Critical runtime exceptions such as rate-limiting are captured as exception events.

  • Retrieved Documents - view all the documents retrieved during a retriever call and the score and order in which they were returned

  • Embeddings - view the embedding text used for retrieval and the underlying embedding model

  • LLM Parameters - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts

  • Prompt Templates - Figure out what prompt template is used during the prompting step and what variables were used.

  • Tool Descriptions - view the description and function signature of the tools your LLM has been given access to

  • LLM Function Calls - if using OpenAI or another model with function calling, you can view the function selection and function messages in the input messages to the LLM.
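
Several of these signals can also be analyzed offline from an exported spans dataframe. Below is a rough sketch of a latency and token-usage breakdown; the span filter mirrors the retriever example above, and the column names are assumptions based on OpenInference conventions, so inspect llm_df.columns for your version:

import phoenix as px

# Export only the LLM spans for offline latency / token analysis.
llm_df = px.active_session().get_spans_dataframe('span_kind == "LLM"')

# Latency per span (assumes start_time/end_time are datetime columns).
llm_df["latency_s"] = (llm_df["end_time"] - llm_df["start_time"]).dt.total_seconds()
print(llm_df["latency_s"].describe())

# Most expensive calls by total token count, if the column is present.
token_col = "attributes.llm.token_count.total"
if token_col in llm_df.columns:
    print(llm_df.nlargest(5, token_col)[[token_col, "latency_s"]])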

LLM Traces are a powerful way to troubleshoot and understand your application and can be leveraged to evaluate the quality of your application. For a full list of notebooks that illustrate this in full-color, please check out the notebooks section.
