LlamaIndex
How to connect to OpenInference-compliant data via LlamaIndex callbacks
LlamaIndex (GPT Index) is a data framework for your LLM application. It's a powerful framework with which you can build an application that leverages RAG (retrieval-augmented generation) to super-charge an LLM with your own data. RAG is an extremely powerful LLM application model because it lets you harness the power of LLMs, such as OpenAI's GPT, while tuning them to your data and use case.
However, when building out a retrieval system, a lot can go wrong in ways that are detrimental to the user experience of your question-and-answer system. Phoenix provides two different ways to gain insights into your LLM application: OpenInference inference records and OpenInference tracing.
Traces provide telemetry data about the execution of your LLM application. They are a great way to understand the internals of your LlamaIndex application and to troubleshoot problems related to things like retrieval and tool execution.
To extract traces from your LlamaIndex application, you will have to add Phoenix's OpenInferenceTraceCallback to your LlamaIndex application. A callback (in this case an OpenInference Tracer) is a class that automatically accumulates traces (sometimes referred to as spans) as your application executes. The OpenInference Tracer is specifically designed to work with Phoenix and by default exports the traces to a locally running Phoenix server.
To view traces in Phoenix, you will first have to start a Phoenix server. You can do this by running the following:
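A minimal sketch using the phoenix Python package (run this in your notebook or script before starting your application):

```python
import phoenix as px

# Launch a Phoenix server in the background; the UI URL is printed on startup.
session = px.launch_app()
```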
Once you have started a Phoenix server, you can start your LlamaIndex application with the OpenInferenceTraceCallback as a callback. To do this, add the callback to the initialization of your LlamaIndex application:
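Here is a sketch of wiring the handler into LlamaIndex's callback manager. The import path (assumed here to be phoenix.trace.llama_index with the class name OpenInferenceTraceCallbackHandler) varies across versions, and `documents` is assumed to be a list you have already loaded:

```python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager
from phoenix.trace.llama_index import OpenInferenceTraceCallbackHandler  # path may vary by version

# Register the trace callback so spans are exported as the app executes.
callback_handler = OpenInferenceTraceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)

# Build the index and query engine as usual; `documents` is assumed loaded already.
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
```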
LlamaIndex 0.8.36 and above supports one-click observability!
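With one-click observability, a single call replaces the manual callback wiring above (a sketch, assuming LlamaIndex 0.8.36+ and a recent Phoenix release):

```python
import llama_index
import phoenix as px

# Launch Phoenix, then route all LlamaIndex telemetry to it with one call.
px.launch_app()
llama_index.set_global_handler("arize_phoenix")
```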
By adding the callback to the callback manager of LlamaIndex, we've created a one-way data connection between your LLM application and Phoenix. This is because, by default, the OpenInferenceTraceCallback uses an HTTPExporter to send traces to your locally running Phoenix server! In this scenario, the Phoenix server is serving as a Collector of the spans that are exported from your LlamaIndex application.
To view the traces in Phoenix, simply open the UI in your browser.
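For example, the session object returned by px.launch_app() exposes the UI URL:

```python
# Print the URL of the running Phoenix UI, then open it in your browser.
print(f"Phoenix UI available at: {session.url}")
```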
If you would like to save your traces to a file for later use, you can extract them directly from the callback: dump the spans from the tracer into a file (we recommend jsonl for readability).
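A sketch of one way to do this. The get_spans() accessor below is hypothetical; check your Phoenix version for the exact method that exposes the accumulated spans:

```python
import json

# Hypothetical accessor: consult your Phoenix version for how to read
# the spans accumulated by the callback handler.
spans = callback_handler.get_spans()

with open("trace.jsonl", "w") as f:
    for span in spans:
        f.write(json.dumps(span) + "\n")  # one span per line (jsonl)
```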
Now you can save this file for later inspection. To launch the app with the file generated above, simply pass the contents of the file via a TraceDataset:
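A sketch, assuming the saved spans round-trip cleanly through a pandas dataframe matching the span schema that TraceDataset expects:

```python
import pandas as pd
import phoenix as px

# Read the saved spans back into a dataframe and relaunch Phoenix with them.
trace_df = pd.read_json("trace.jsonl", lines=True)
px.launch_app(trace=px.TraceDataset(trace_df))
```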
In this way, you can use files as a means to store and communicate interesting traces that you may want to share with a team, or to use later down the line to fine-tune an LLM or model.
For a fully working example of tracing with LlamaIndex, check out our colab notebook.
Inferences capture each invocation of the LLM application as a single record and are useful for troubleshooting the app's RAG performance using Phoenix's embedding visualization. To view the traces or telemetry information of your application, skip forward to traces.
To provide visibility into how your LLM app is performing, we built the OpenInferenceCallback. The OpenInferenceCallback captures the internals of the LLM app in buffers that conform to the OpenInference format. As your LlamaIndex application runs, the callback captures the timing, embeddings, documents, and other critical internals, and serializes the data to buffers that can easily be materialized as dataframes or as files such as Parquet. Since Phoenix ingests OpenInference data natively, the integration is a seamless way to analyze your LLM-powered chatbot. To understand callbacks in detail, consult the LlamaIndex docs.
With a few lines of code, you can mount the OpenInferenceCallback to your application:
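A sketch, assuming OpenInferenceCallbackHandler is exported from llama_index.callbacks (the exact module varies by LlamaIndex version) and that `documents` has already been loaded:

```python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler

# Mount the OpenInference callback so every query is captured in buffers.
callback_handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
```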
If you are running the chatbot in a notebook, you can simply flush the callback buffers to dataframes. Phoenix natively supports parsing OpenInference so there is no need to define a schema for your dataset.
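For example, assuming the as_dataframe helper ships alongside the callback (check your LlamaIndex version for the exact module path):

```python
from llama_index.callbacks.open_inference_callback import as_dataframe

# Flush the accumulated query data and materialize it as a pandas dataframe.
query_data_buffer = callback_handler.flush_query_data_buffer()
query_dataframe = as_dataframe(query_data_buffer)
```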
In a production setting, LlamaIndex application maintainers can log the data generated by their system by implementing and passing a custom callback to the OpenInferenceCallbackHandler. The callback is of type Callable[[List[QueryData]], None]: it accepts a buffer of query data from the OpenInferenceCallbackHandler, persists the data (e.g., by uploading it to cloud storage or sending it to a data ingestion service), and flushes the buffer after the data is persisted.
A reference implementation is included below that periodically writes data in OpenInference format to local Parquet files when the buffer exceeds a certain size.
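The sketch below shows one way such a callback could look. It assumes the QueryData type and as_dataframe helper are importable from llama_index.callbacks.open_inference_callback, and that the handler accepts the callback via a callback keyword argument; verify both against your LlamaIndex version:

```python
from pathlib import Path
from typing import List

from llama_index.callbacks.open_inference_callback import (  # module path may vary by version
    OpenInferenceCallbackHandler,
    QueryData,
    as_dataframe,
)


class ParquetCallback:
    """Writes buffered query data to local Parquet files once the buffer is large enough."""

    def __init__(self, data_path: Path, max_buffer_length: int = 1000) -> None:
        self._data_path = data_path
        self._data_path.mkdir(parents=True, exist_ok=True)
        self._max_buffer_length = max_buffer_length
        self._batch_index = 0

    def __call__(self, query_data_buffer: List[QueryData]) -> None:
        if len(query_data_buffer) >= self._max_buffer_length:
            # Materialize the buffer as a dataframe and persist it to Parquet.
            query_dataframe = as_dataframe(query_data_buffer)
            query_dataframe.to_parquet(self._data_path / f"log-{self._batch_index}.parquet")
            self._batch_index += 1
            query_data_buffer.clear()  # ⚠️ clear the buffer so it doesn't grow without bound


# Hypothetical wiring: pass the persistence callback to the handler.
callback_handler = OpenInferenceCallbackHandler(callback=ParquetCallback(Path("data")))
```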
⚠️ In a production setting, it's important to clear the buffer; otherwise, the callback handler will indefinitely accumulate data in memory and eventually cause your system to crash.
Note that Parquet is just an example file format; you can use any file format of your choosing, such as Avro or NDJSON.
For full guidance on how to materialize your data in files, consult the LlamaIndex notebook.
For a fully working example, check out our colab notebook.