Phoenix Traces for LLM applications - OpenAI, LangChain & LlamaIndex

Inspect the inner workings of your LLM Application using OpenInference Traces

Streaming Traces to Phoenix

The easiest way to use Phoenix traces with LLM frameworks (or the OpenAI API directly) is to stream the execution of your application to a locally running Phoenix server. The traces collected during execution can then be stored for later use, such as validation, evaluation, and fine-tuning.

The traces can be collected and stored in the following ways:

  • In Memory: useful for debugging.

  • Local File: Persistent and good for offline local development. See exports and the persistence sketch after this list.

  • Cloud (coming soon): Store traces in cloud buckets as assets for later use
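
For the Local File option, one simple approach is to export the spans collected by the running session and persist the dataframe yourself. A minimal sketch (the file name and format below are just examples; see the export guide for the supported utilities):

import phoenix as px

# Export everything collected so far from the running session...
spans_df = px.active_session().get_spans_dataframe()

# ...and persist it locally for later offline analysis.
spans_df.to_parquet("traces.parquet")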

To get started with traces, you will first want to start a local Phoenix app.

import phoenix as px
session = px.launch_app()

The above launches a Phoenix server that acts as a trace collector for any LLM application running locally.

🌍 To view the Phoenix app in your browser, visit https://z8rwookkcle1-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

The launch_app command will print a URL where you can view the Phoenix UI. You can access this URL again at any time via the session. Now that Phoenix is up and running, you can run a LlamaIndex or LangChain application, or call the OpenAI API directly, and debug your application as the traces stream in.
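
For example, if you are calling the OpenAI API directly, Phoenix's OpenAI instrumentation can capture those calls. A minimal sketch, assuming a Phoenix version that provides phoenix.trace.openai and the pre-1.0 openai SDK (adjust the imports and call style to your versions):

import openai
from phoenix.trace.openai import OpenAIInstrumentor

# Instrument the OpenAI client so each completion call is captured as a span.
OpenAIInstrumentor().instrument()

# This call will now stream a trace to the locally running Phoenix app.
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is OpenInference tracing?"}],
)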

If you are using llama-index>0.8.36, you can instrument your application with LlamaIndex's one-click observability.

# Phoenix can display in real time the traces automatically
# collected from your LlamaIndex application.
import phoenix as px
# Look for a URL in the output to open the App in a browser.
px.launch_app()

# The App is initially empty, but as you proceed with the steps below,
# traces will appear automatically as your LlamaIndex application runs.

import llama_index
llama_index.set_global_handler("arize_phoenix")

# Run your LlamaIndex application and traces
# will be collected and displayed in Phoenix.

# LlamaIndex application initialization may vary
# depending on your application. Below is a simple example:
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
)
# storage_context is assumed to point at a previously persisted index, e.g.
# storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(
    storage_context,
    service_context=service_context,
)
query_engine = index.as_query_engine()

# Execute queries
query_engine.query("What is OpenInference tracing?")

See the integrations guide for full details, as well as support for older versions of LlamaIndex.
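
For older LlamaIndex versions that predate one-click observability, the integration is typically wired up through a callback handler instead. A rough sketch, assuming your Phoenix version provides phoenix.trace.llama_index.OpenInferenceTraceCallbackHandler (check the integrations guide for the exact import paths for your versions):

from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager
from phoenix.trace.llama_index import OpenInferenceTraceCallbackHandler

# Attach the OpenInference callback handler so spans stream to Phoenix.
callback_handler = OpenInferenceTraceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler]),
)
# Build your index and query engine with this service_context as usual.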

Once you've executed a sufficient number of queries (or chats) against your application, you can view the details in the UI by refreshing the browser page.

Trace Datasets

Phoenix also supports datasets that contain OpenInference trace data. This allows trace data from a running LangChain or LlamaIndex instance to be exported and explored offline for analysis.

Trace dataframes can be extracted directly from the active session, as shown below; the exported data can then be used to re-launch the app for offline exploration.

session = px.active_session()

# You can export a dataframe from the session
# Note that you can apply a filter if you would like to export only a sub-set of spans
df = session.get_spans_dataframe('span_kind == "RETRIEVER"')

# Re-launch the app using the data
px.launch_app(trace=px.TraceDataset(df))

For full details on how to export trace data, see the detailed guide

Evaluating Traces

In addition to launching Phoenix on LlamaIndex and LangChain, teams can export trace data to a dataframe in order to run LLM Evals on the data.

from phoenix.experimental.evals import run_relevance_eval

# Export all of the retriever spans that have been run
trace_df = px.active_session().get_spans_dataframe('span_kind == "RETRIEVER"')

# Run relevance evaluations
relevances = run_relevance_eval(trace_df)

For full details, check out the relevance LLM Eval example.
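
To review the results alongside the spans they came from, you can attach them back onto the exported dataframe. A small sketch that follows on from the block above and assumes run_relevance_eval returns one entry per input row (the exact return shape may vary with your Phoenix version):

# Attach the evaluation output to the exported retriever spans so that
# labels can be reviewed next to the span context (query, documents, latency).
eval_df = trace_df.copy()
eval_df["relevance"] = relevances  # assumption: one entry per input row
print(eval_df["relevance"].head())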

Phoenix Tracing App

Phoenix can be used to understand and troubleshoot your LLM application by surfacing:

  • Application latency - highlighting slow invocations of LLMs, Retrievers, etc.

  • Token Usage - Displays the breakdown of token usage with LLMs to surface your most expensive LLM calls (see the analysis sketch after this list)

  • Runtime Exceptions - Critical runtime exceptions such as rate-limiting are captured as exception events.

  • Retrieved Documents - view all the documents retrieved during a retriever call and the score and order in which they were returned

  • Embeddings - view the embedding text used for retrieval and the underlying embedding model

  • LLM Parameters - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts

  • Prompt Templates - Figure out what prompt template is used during the prompting step and what variables were used.

  • Tool Descriptions - view the description and function signature of the tools your LLM has been given access to

  • LLM Function Calls - if using OpenAI or another model with function calling, you can view the function selection and function messages in the input messages to the LLM.
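
Several of these signals can also be analyzed offline from an exported spans dataframe. Below is a rough sketch of a latency and token-usage breakdown; the span filter mirrors the retriever example above, and the column names are assumptions based on OpenInference conventions, so inspect llm_df.columns for your version:

import phoenix as px

# Export only the LLM spans for offline latency / token analysis.
llm_df = px.active_session().get_spans_dataframe('span_kind == "LLM"')

# Latency per span (assumes start_time/end_time are datetime columns).
llm_df["latency_s"] = (llm_df["end_time"] - llm_df["start_time"]).dt.total_seconds()
print(llm_df["latency_s"].describe())

# Most expensive calls by total token count, if the column is present.
token_col = "attributes.llm.token_count.total"
if token_col in llm_df.columns:
    print(llm_df.nlargest(5, token_col)[[token_col, "latency_s"]])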

LLM Traces are a powerful way to troubleshoot and understand your application and can be leveraged to evaluate the quality of your application. For a full list of notebooks that illustrate this in full-color, please check out the notebooks section.
