Inspect the inner-workings of your LLM Application using OpenInference Traces
Overview
Tracing is a powerful tool for understanding the behavior of your LLM application. Phoenix has best-in-class tracing, regardless of what framework you use, and has first-class instrumentation for a variety of frameworks (LlamaIndex, LangChain, DSPy), SDKs (OpenAI, Bedrock, Mistral, Vertex), and languages (Python, JavaScript). You can also manually instrument your application using the OpenTelemetry SDK.
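For instance, a minimal sketch of manual instrumentation with the OpenTelemetry SDK might look like the following. This assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and that a Phoenix server is collecting traces at its default local OTLP endpoint; the span name and attribute are purely illustrative:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Route spans to the local Phoenix collector
# (Phoenix's default OTLP-over-HTTP endpoint is assumed here)
provider = TracerProvider()
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Any spans you create manually will now stream into Phoenix
with tracer.start_as_current_span("llm-call") as span:
    span.set_attribute("llm.prompt", "What is the meaning of life?")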
To get started with traces, you will first want to start a local Phoenix app. Below we explore how to use Phoenix in a notebook, but you can deploy Phoenix once you are ready for a persistent observability platform.
In your Jupyter or Colab environment, run one of the following commands to install.
pip install arize-phoenix
conda install -c conda-forge arize-phoenix
To get started, launch the Phoenix app.
import phoenix as px

session = px.launch_app()
The above launches a Phoenix server that acts as a trace collector for any LLM application running locally in your Jupyter notebook!
🌍 To view the Phoenix app in your browser, visit https://z8rwookkcle1-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix
The launch_app command prints out a URL at which you can view the Phoenix UI. You can access this URL again at any time via the session object.
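Both handles shown in the startup message above are available programmatically. For example:

# Re-open the Phoenix UI at any time via the session object
print(session.url)

# Or embed the UI directly in the notebook
px.active_session().view()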
Now that Phoenix is up and running, you can set up tracing for your AI application so that you can debug it as the traces stream in.
To use LlamaIndex's one-click instrumentation, you must first install the integration:
pip install 'llama-index>=0.10.44' openinference-instrumentation-llama-index
import os

import phoenix as px
from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# To view traces in Phoenix, you will first have to start a Phoenix server.
# You can do this by running the following:
session = px.launch_app()

# Initialize LlamaIndex auto-instrumentation
LlamaIndexInstrumentor().instrument()

os.environ["OPENAI_API_KEY"] = "<ENTER_YOUR_OPENAI_API_KEY_HERE>"

# LlamaIndex application initialization may vary
# depending on your application
Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Load your data and create an index. Here we've provided an example of our documentation
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

# Query your LlamaIndex application
query_engine.query("What is the meaning of life?")
query_engine.query("How can I deploy Arize?")

# View the traces in the Phoenix UI
px.active_session().url
See the LlamaIndex integration documentation for full details, as well as support for older versions of LlamaIndex.
import phoenix as px
from phoenix.trace.langchain import LangChainInstrumentor

# To view traces in Phoenix, you will first have to start a Phoenix server.
# You can do this by running the following:
session = px.launch_app()

# Initialize LangChain auto-instrumentation
LangChainInstrumentor().instrument()

# Initialize your LangChain application
# This might vary on your use-case. An example chain is shown below
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

# Load, chunk, and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Execute the chain
response = rag_chain.invoke("What is Task Decomposition?")
import phoenix as px
from phoenix.trace.openai import OpenAIInstrumentor

# To view traces in Phoenix, you will first have to start a Phoenix server.
# You can do this by running the following:
session = px.launch_app()

# Initialize OpenAI auto-instrumentation
OpenAIInstrumentor().instrument()

from openai import OpenAI

# Initialize an OpenAI client
client = OpenAI(api_key="")

# Define a conversation with a user message
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, can you help me with something?"},
]

# Generate a response from the assistant
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=conversation,
)

# Extract and print the assistant's reply
# The traces will be available in the Phoenix app for the above messages
assistant_reply = response.choices[0].message.content
print(assistant_reply)
Once you've executed a sufficient number of queries (or chats) in your application, you can view the details in the Phoenix UI by refreshing the browser page.
Exporting Traces from Phoenix
# You can export a dataframe from the session
df = px.Client().get_spans_dataframe()

# Note that you can apply a filter if you would like to export only a subset of spans
df = px.Client().get_spans_dataframe('span_kind == "RETRIEVER"')
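The export is an ordinary pandas DataFrame, so you can slice it with standard pandas operations. A minimal sketch follows; the column names below follow the OpenInference span schema, so adjust them to the attributes actually present in your traces:

# Inspect the LLM spans in the export
llm_spans = df[df["span_kind"] == "LLM"]
print(llm_spans[["name", "start_time", "end_time"]].head())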
LLM traces are a powerful way to troubleshoot and understand your application, and they can be leveraged to evaluate its quality. For a full list of notebooks that illustrate this in full color, please check out the notebooks section.
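As one illustration of trace-based evaluation, the retriever spans exported above can be scored for relevance by an LLM judge. The following is a hedged sketch using the phoenix.evals helpers; it assumes a dataframe eval_df with "input" (query) and "reference" (document) columns, which you may need to derive from your span export:

from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

# Classify each (query, document) pair as relevant or irrelevant.
# Assumes `eval_df` has "input" and "reference" columns derived from
# the retriever spans exported above.
eval_results = llm_classify(
    dataframe=eval_df,
    model=OpenAIModel(model="gpt-4"),
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
)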