Phoenix Traces for LLM applications - OpenAI, LangChain & LlamaIndex
Inspect the inner-workings of your LLM Application using OpenInference Traces
Streaming Traces to Phoenix
The easiest way to use Phoenix traces with LLM frameworks (or with the OpenAI API directly) is to stream the execution of your application to a locally running Phoenix server. The traces collected during execution can then be stored for later use, such as validation, evaluation, and fine-tuning.
The traces can be collected and stored in the following ways:
In Memory: useful for debugging.
Local File: Persistent and good for offline local development. See exports (a sketch of one way to persist spans yourself follows this list).
Cloud (coming soon): Store traces in your cloud buckets as assets for later use.
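Until cloud storage lands, one lightweight way to persist traces locally is to export the spans dataframe yourself and write it to disk. The snippet below is a minimal sketch (the file name is just an example; it relies only on the get_spans_dataframe and TraceDataset APIs shown later in this guide):

import pandas as pd
import phoenix as px

# Export every span collected so far from the running Phoenix session
spans_df = px.active_session().get_spans_dataframe()

# Persist the spans to a local file (parquet preserves dtypes)
spans_df.to_parquet("my_traces.parquet")

# Later, reload the file and relaunch Phoenix with the saved traces
reloaded_df = pd.read_parquet("my_traces.parquet")
px.launch_app(trace=px.TraceDataset(reloaded_df))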
To get started with traces, you will first want to start a local Phoenix app.
import phoenix as px

session = px.launch_app()
The above launches a Phoenix server that acts as a trace collector for any LLM application running locally.
🌍 To view the Phoenix app in your browser, visit https://z8rwookkcle1-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix
The launch_app command will print out a URL where you can view the Phoenix UI. You can access this URL again at any time via the session object.
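As a minimal sketch, assuming the app is still running, the URL and notebook view can be retrieved from the active session at any time:

import phoenix as px

# Grab the currently running session and print its UI URL
session = px.active_session()
print(session.url)

# Or render the UI inline in a notebook
session.view()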
Now that Phoenix is up and running, you can run a LlamaIndex or LangChain application, or call the OpenAI API directly, and debug your application as the traces stream in.
If you are using llama-index>0.8.36, you can instrument your application with LlamaIndex's one-click observability.
# Phoenix can display in real time the traces automatically
# collected from your LlamaIndex application.
import phoenix as px

# Look for a URL in the output to open the App in a browser.
px.launch_app()

# The App is initially empty, but as you proceed with the steps below,
# traces will appear automatically as your LlamaIndex application runs.
import llama_index

llama_index.set_global_handler("arize_phoenix")

# Run your LlamaIndex application and traces
# will be collected and displayed in Phoenix.

# LlamaIndex application initialization may vary
# depending on your application. Below is a simple example
# (imports assume llama-index 0.8.x; adjust for your version):
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
)
# `storage_context` comes from your own index persistence setup
index = load_index_from_storage(
    storage_context,
    service_context=service_context,
)
query_engine = index.as_query_engine()

# Execute queries
query_engine.query("What is OpenInference tracing?")
See the integrations guide for full details, as well as support for older versions of LlamaIndex.
from phoenix.trace.langchain import OpenInferenceTracer, LangChainInstrumentor

# If no exporter is specified, the tracer will export to the locally running Phoenix server
tracer = OpenInferenceTracer()
# If no tracer is specified, a tracer is constructed for you
LangChainInstrumentor(tracer).instrument()

# Initialize your LangChain application
# This might vary on your use-case. An example Chain is shown below
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import KNNRetriever

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# `vectors` and `texts` come from your own document ingestion step
knn_retriever = KNNRetriever(
    index=vectors,
    texts=texts,
    embeddings=embeddings,
)
llm = ChatOpenAI(model_name="gpt-3.5-turbo")
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=knn_retriever,
)

# Execute the chain
response = chain.run("What is OpenInference tracing?")
import openai

from phoenix.trace.tracer import Tracer
from phoenix.trace.exporter import HttpExporter
from phoenix.trace.openai import OpenAIInstrumentor

tracer = Tracer(exporter=HttpExporter())
OpenAIInstrumentor(tracer).instrument()

# Define a conversation with a user message
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, can you help me with something?"},
]

# Generate a response from the assistant
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=conversation,
)

# Extract and print the assistant's reply
assistant_reply = response["choices"][0]["message"]["content"]

# The traces will be available in the Phoenix App for the above messages
Once you've executed a sufficient number of queries (or chats) against your application, you can view the details in the Phoenix UI by refreshing the browser page.
Trace Datasets
Phoenix also supports datasets that contain OpenInference trace data. This allows trace data from a running LangChain or LlamaIndex instance to be explored and analyzed offline.
There are two ways to extract trace dataframes; both are shown below for LangChain.
session = px.active_session()

# You can export a dataframe from the session
# Note that you can apply a filter if you would like to export only a sub-set of spans
df = session.get_spans_dataframe('span_kind == "RETRIEVER"')

# Re-launch the app using the data
px.launch_app(trace=px.TraceDataset(df))
import phoenix as px
from phoenix import TraceDataset
from phoenix.trace.langchain import OpenInferenceTracer

tracer = OpenInferenceTracer()

# Run the application with the tracer
chain.run(query, callbacks=[tracer])

# When you are ready to analyze the data, you can convert the traces
ds = TraceDataset.from_spans(tracer.get_spans())

# Print the dataframe
ds.dataframe.head()

# Re-initialize the app with the trace dataset
px.launch_app(trace=ds)
In addition to using Phoenix with LlamaIndex and LangChain, teams can export trace data to a dataframe in order to run LLM Evals on the data.
from phoenix.experimental.evals import run_relevance_eval

# Export all of the traces from all the retriever spans that have been run
trace_df = px.active_session().get_spans_dataframe('span_kind == "RETRIEVER"')

# Run relevance evaluations
relevances = run_relevance_eval(trace_df)
For full details, check out the relevance LLM Eval example.
Phoenix Tracing App
Phoenix can be used to understand and troubleshoot your LLM application by surfacing:
Application latency - highlighting slow invocations of LLMs, Retrievers, etc.
Token Usage - Displays the breakdown of token usage with LLMs to surface your most expensive LLM calls (see the dataframe sketch after this list)
Runtime Exceptions - Critical runtime exceptions such as rate-limiting are captured as exception events.
Retrieved Documents - view all the documents retrieved during a retriever call and the score and order in which they were returned
Embeddings - view the embedding text used for retrieval and the underlying embedding model
LLM Parameters - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
Prompt Templates - Figure out what prompt template is used during the prompting step and what variables were used.
Tool Descriptions - view the description and function signature of the tools your LLM has been given access to
LLM Function Calls - if using OpenAI or another model with function calling, you can view the function selection and function messages in the input messages to the LLM.
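Because this information is recorded on the spans themselves, much of it can also be inspected programmatically from the exported spans dataframe. The sketch below is illustrative only: the latency computation from start_time/end_time and the token count column name (attributes.llm.token_count.total, following OpenInference semantic conventions) are assumptions that may vary with your Phoenix version.

import phoenix as px

# Pull all LLM spans collected so far
llm_df = px.active_session().get_spans_dataframe('span_kind == "LLM"')

# Latency per span, assuming start_time/end_time columns are present
llm_df["latency"] = llm_df["end_time"] - llm_df["start_time"]
print(llm_df.sort_values("latency", ascending=False).head())

# Token usage per call, if the conventional column is present
token_col = "attributes.llm.token_count.total"
if token_col in llm_df.columns:
    print(llm_df[token_col].describe())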
LLM Traces are a powerful way to troubleshoot and understand your application, and they can be leveraged to evaluate its quality. For a full list of notebooks that illustrate this in full color, please check out the notebooks section.