Export Data & Query Spans
Various options to help you get data out of Phoenix
Options for Exporting Data from Phoenix
| Option | What it does | Use case |
| --- | --- | --- |
| Downloading all Spans as a Dataframe | Exports all spans in a project as a dataframe | Evaluation: filtering your spans locally using pandas instead of the Phoenix DSL. |
| Running Span Queries | Exports specific spans or traces based on filters | Evaluation: querying spans from Phoenix. |
| Pre-defined Queries | Exports specific groups of spans from a RAG system | RAG Evaluation: easily exporting retrieved documents or Q&A data from a RAG system. |
| Save All Traces | Saves all traces as a local file | Storing Data: backing up an entire Phoenix instance. |
Connect to Phoenix
Before using any of the methods above, make sure you can connect to Phoenix with px.Client(). You'll need to set environment variables for the client headers and the collector endpoint.
If you're self-hosting Phoenix, ignore the client headers and change the collector endpoint to your endpoint.
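A minimal sketch of this setup is shown below. The variable names PHOENIX_CLIENT_HEADERS and PHOENIX_COLLECTOR_ENDPOINT and the hosted endpoint URL are assumptions that do not appear in the text above, and the API key is a placeholder, so check all of them against your own deployment.

```python
import os

# Placeholder value: substitute the API key for your own Phoenix account.
PHOENIX_API_KEY = "your-api-key"

# Assumed variable names. A self-hosted instance would skip the headers
# and point the endpoint at its own URL instead.
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
```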
Downloading all Spans as a Dataframe
If you prefer to handle your filtering locally, you can also download all spans as a dataframe using the get_spans_dataframe() function.
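A short sketch of that workflow, assuming get_spans_dataframe() is called on a px.Client() instance with no arguments. The span_kind column used for the local filter is also an assumption about the layout of the returned dataframe.

```python
def download_all_spans():
    """Return every span as a pandas DataFrame for local filtering."""
    # Imported lazily so the sketch loads even without Phoenix installed.
    import phoenix as px

    return px.Client().get_spans_dataframe()


def keep_span_kind(spans_df, kind="RETRIEVER"):
    """Filter with plain pandas; `span_kind` is an assumed column name."""
    return spans_df[spans_df["span_kind"] == kind]
```

For example, keep_span_kind(download_all_spans()) would keep only the retriever rows, with no use of the query DSL.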
Running Span Queries
You can query for data using our query DSL (domain-specific language).
This Query DSL is the same as what is used by the filter bar in the dashboard. It can be helpful to form your query string in the Phoenix dashboard for more immediate feedback, before moving it to code.
For example, a query can pull all retriever spans and select the input value. The output of such a query is a DataFrame that contains the input values for all retriever spans.
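A minimal sketch of such a query follows. It assumes SpanQuery is importable from phoenix.trace.dsl and that the client exposes a query_spans method; neither name appears in the text above, so verify both against your installed Phoenix version.

```python
# Filter expression in the query DSL: keep only retriever spans.
RETRIEVER_FILTER = "span_kind == 'RETRIEVER'"


def query_retriever_inputs():
    """Return a DataFrame holding the input value of each retriever span."""
    # Lazy imports keep this sketch loadable without Phoenix installed.
    import phoenix as px
    from phoenix.trace.dsl import SpanQuery  # assumed import path

    query = SpanQuery().where(RETRIEVER_FILTER).select("input.value")
    return px.Client().query_spans(query)  # assumed client method
```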
How to Specify a Time Range
By default, all queries will collect all spans that are in your Phoenix instance. If you'd like to focus on the most recent spans, you can pull spans based on time frames using start_time and end_time.
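The sketch below shows one way to build such a window. The start_time and end_time parameter names come from the text above; passing them to a query_spans client method is an assumption, and last_n_days is a hypothetical helper introduced only for this example.

```python
from datetime import datetime, timedelta


def last_n_days(days, now=None):
    """Return (start, end) covering the most recent `days` days."""
    end = now or datetime.now()
    return end - timedelta(days=days), end


def query_recent_spans(query, days=7):
    """Run `query` against spans from the last `days` days only."""
    import phoenix as px  # lazy import; requires Phoenix to be installed

    start, end = last_n_days(days)
    # query_spans is an assumed client method name.
    return px.Client().query_spans(query, start_time=start, end_time=end)
```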
How to Specify a Project
Querying for Retrieved Documents
Let's say we want to extract the retrieved documents into a DataFrame that looks something like the table below, where input denotes the query for the retriever, reference denotes the content of each document, and document_position denotes the (zero-based) index in each span's list of retrieved documents.
| span ID | document_position | input | reference |
| --- | --- | --- | --- |
| 5B8EF798A381 | 0 | What was the author's motivation for writing ... | In fact, I decided to write a book about ... |
| 5B8EF798A381 | 1 | What was the author's motivation for writing ... | I started writing essays again, and wrote a bunch of ... |
| ... | ... | ... | ... |
| E19B7EC3GG02 | 0 | What did the author learn about ... | The good part was that I got paid huge amounts of ... |
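A query along the following lines could produce a table of that shape. This is a sketch under assumptions: the SpanQuery import path, the explode method, the query_spans client method, and the attribute names retrieval.documents and document.content are not given in the text above.

```python
def query_retrieved_documents():
    """One output row per retrieved document of each retriever span."""
    # Lazy imports keep this sketch loadable without Phoenix installed.
    import phoenix as px
    from phoenix.trace.dsl import SpanQuery  # assumed import path

    query = (
        SpanQuery()
        .where("span_kind == 'RETRIEVER'")
        .select(input="input.value")
        # Each entry of retrieval.documents becomes its own row; the
        # document content is exposed under the column name `reference`.
        .explode("retrieval.documents", reference="document.content")
    )
    return px.Client().query_spans(query)
```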
How to Explode Attributes
How to Apply Filters
The .where() method accepts a string containing a valid Python boolean expression. The expression can be arbitrarily complex, but restrictions apply; for example, function calls are generally disallowed. A conjunction can, for instance, additionally filter on whether the input value contains the string 'programming'.
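The sketch below writes that conjunction as a filter string and adds a rough local pre-check for function calls. The pre-check is a hypothetical helper built on Python's ast module, not Phoenix's own validation, and the SpanQuery import path and query_spans method are assumptions.

```python
import ast

# Retriever spans whose input value contains the string 'programming'.
FILTER = "span_kind == 'RETRIEVER' and 'programming' in input.value"


def has_function_call(expression):
    """Local pre-check only: True if the expression contains a call."""
    tree = ast.parse(expression, mode="eval")  # SyntaxError if not valid Python
    return any(isinstance(node, ast.Call) for node in ast.walk(tree))


def query_programming_retrievals():
    import phoenix as px
    from phoenix.trace.dsl import SpanQuery  # assumed import path

    return px.Client().query_spans(SpanQuery().where(FILTER))
```

Here has_function_call(FILTER) is False, while an expression such as input.value.contains('programming') would be flagged before it is ever sent to Phoenix.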
Filtering Spans by Evaluation Results
Filtering on Metadata
metadata is a dictionary-valued attribute, and it can be filtered like a dictionary.
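As an illustration, the filter string below uses ordinary subscript syntax. The key topic is hypothetical, and the eval call only demonstrates that the expression is plain Python dictionary access; it is not how Phoenix executes the filter.

```python
# 'topic' is a hypothetical metadata key used only for illustration.
METADATA_FILTER = "metadata['topic'] == 'programming'"

# The same expression evaluated against an ordinary dictionary.
sample = {"metadata": {"topic": "programming"}}
matches = eval(METADATA_FILTER, {"__builtins__": {}}, sample)
```

In Phoenix, the same string would be passed to .where().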
Filtering for Substring
Note that Python strings do not have a contain method, and substring search is done with the in operator.
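The plain-Python snippet below makes the point concrete, then shows the same operator inside a filter string. The metadata key topic is hypothetical.

```python
text = "essays about programming languages"

# Plain Python strings have no `contain` method ...
assert not hasattr(str, "contain")
# ... so membership is tested with the `in` operator instead.
found = "programming" in text

# The same operator inside a filter string (hypothetical metadata key).
SUBSTRING_FILTER = "'programming' in metadata['topic']"
```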
Filtering for No Evaluations
Get spans that do not have an evaluation attached yet.
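One way this might be written is sketched below. The evals[...] syntax, the evaluation name correctness, and the SpanQuery import path are all assumptions that do not appear in the text above, so treat the filter strings as illustrative.

```python
# 'correctness' is a hypothetical evaluation name.
NOT_YET_EVALUATED = "evals['correctness'].label is None"
# For comparison: spans whose evaluation label is 'incorrect'.
LABELLED_INCORRECT = "evals['correctness'].label == 'incorrect'"


def unevaluated_spans_query():
    """Build a query for spans with no 'correctness' evaluation yet."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    return SpanQuery().where(NOT_YET_EVALUATED)
```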
How to Apply Filters (UI)
How to Extract Attributes
Span attributes can be selected by simply listing them inside the .select() method.
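A brief sketch, assuming SpanQuery is importable from phoenix.trace.dsl and that input.value and output.value are the attributes of interest:

```python
def select_input_and_output():
    """Build a query returning the input.value and output.value columns."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    # Attributes are listed positionally and keep their original names.
    return SpanQuery().select("input.value", "output.value")
```

The returned query would then be passed to the client to obtain a DataFrame.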
Renaming Output Columns
Keyword-argument style can be used to rename the columns in the dataframe. For example, a query can return two columns named input and output instead of the original names of the attributes.
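A sketch of the keyword-argument form, under the same assumption about the SpanQuery import path:

```python
def select_renamed_columns():
    """Build a query whose columns are named `input` and `output`."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    # Keyword name = output column name, value = span attribute to read.
    return SpanQuery().select(input="input.value", output="output.value")
```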
Arbitrary Output Column Names
If arbitrary output names are desired, e.g. names with spaces and symbols, we can leverage Python's double-asterisk idiom for unpacking a dictionary.
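The column names in the sketch below are made up for illustration, and the SpanQuery import path is an assumption.

```python
# Names with spaces and symbols cannot be written as keyword arguments
# directly, so they go in a dictionary that is unpacked with **.
COLUMNS = {
    "Value (Input)": "input.value",
    "Value (Output)": "output.value",
}


def select_custom_names():
    """Build a query whose column names come from the COLUMNS keys."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    return SpanQuery().select(**COLUMNS)
```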
Advanced Usage
Concatenating
Special Separators
If a different separator is desired, say \n************, it can be specified explicitly.
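The sketch below assumes that concatenation is done with a concat method on SpanQuery and that the separator is set with a with_concat_separator method; neither name appears in the text above. The join on the last line only previews how the separator sits between two made-up documents.

```python
SEPARATOR = "\n************"


def concat_documents_query():
    """Build a query joining each span's documents into one string."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    return (
        SpanQuery()
        .concat("retrieval.documents", reference="document.content")
        .with_concat_separator(separator=SEPARATOR)  # assumed method name
    )


# Plain-Python preview of the separator between two sample documents.
preview = SEPARATOR.join(["first document", "second document"])
```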
Using Parent ID as Index
This is useful for joining a span to its parent span. To do that, we first index the child span by selecting its parent ID and renaming it to span_id. This works because span_id is a special column name: whichever column has that name becomes the index of the output DataFrame.
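A sketch of the two queries and the join is given below. It assumes the SpanQuery import path, that root spans can be found with the filter parent_id is None, and that both result DataFrames end up indexed by span ID; the column name retriever_input is made up for the example.

```python
def parent_and_child_queries():
    """Return (parents, children) queries that share the same index."""
    from phoenix.trace.dsl import SpanQuery  # assumed import path; lazy import

    # Root spans, indexed by their own span ID.
    parents = SpanQuery().where("parent_id is None").select(
        input="input.value", output="output.value"
    )
    # Retriever spans, indexed by their parent's ID via the span_id rename.
    children = SpanQuery().where("span_kind == 'RETRIEVER'").select(
        span_id="parent_id", retriever_input="input.value"
    )
    return parents, children


def join_to_parent(parent_df, child_df):
    """Align the two result DataFrames on their shared index."""
    import pandas as pd

    return pd.concat([parent_df, child_df], axis=1, join="inner")
```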
Joining a Span to Its Parent
How to use Data for Evaluation
Extract the Input and Output from LLM Spans
Retrieval (RAG) Relevance Evaluations
Q&A on Retrieved Data Evaluations
Pre-defined Queries
Phoenix also provides helper functions that execute predefined queries for the following use cases.
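The helper names and import path in the sketch below (get_retrieved_documents and get_qa_with_reference from phoenix.session.evaluation) are assumptions that do not appear in the text above; confirm them against your installed Phoenix version.

```python
def fetch_rag_dataframes():
    """Return (retrieved_documents_df, qa_with_reference_df)."""
    # Lazy imports keep this sketch loadable without Phoenix installed.
    import phoenix as px
    from phoenix.session.evaluation import (  # assumed import path
        get_qa_with_reference,
        get_retrieved_documents,
    )

    client = px.Client()
    return get_retrieved_documents(client), get_qa_with_reference(client)
```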
Retrieved Documents
Q&A on Retrieved Data
| span ID | input | output | reference |
| --- | --- | --- | --- |
| CDBC4CE34 | What was the author's trick for ... | The author's trick for ... | Even then it took me several years to understand ... |
| ... | ... | ... | ... |
Save All Traces
Sometimes you may want to back up your Phoenix traces to a single file, rather than exporting specific spans to run evaluation.
You can save all traces from a Phoenix instance to a designated location.
You can specify the directory to save your traces by passing a directory argument to the save method.
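A minimal sketch of both variants follows. The save method and its directory argument come from the text above; obtaining the dataset through a get_trace_dataset call on the client is an assumption, and backup_traces is a hypothetical wrapper.

```python
def backup_traces(directory=None):
    """Save all traces to a file, optionally in a chosen directory."""
    import phoenix as px  # lazy import; requires Phoenix to be installed

    dataset = px.Client().get_trace_dataset()  # assumed client method
    if directory is None:
        return dataset.save()
    return dataset.save(directory=directory)
```

Calling backup_traces("/my_saved_traces") would write the file under that directory.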
This outputs the trace ID and prints the path of the saved file:
💾 Trace dataset saved to under ID: f7733fda-6ad6-4427-a803-55ad2182b662
📂 Trace dataset path: /my_saved_traces/trace_dataset-f7733fda-6ad6-4427-a803-55ad2182b662.parquet