Ask or search…

Export Your Data

How to export your data for labeling, evaluation, or fine-tuning
Phoenix is designed to be a pre-production tool that can be used to find interesting or problematic data that can be used for various use-cases:
  • A subset of production data for re-labeling and training
  • A subset of data for fine-tuning an LLM
  • A set of traces to run LLM Evals with or to share with a teammate

Exporting Traces

The easiest way to gather traces that have been collected by Phoenix is to directly pull a dataframe of the traces from your Phoenix session object.
px.active_session().get_spans_dataframe('span_kind == "RETRIEVER"')
Notice that the get_spans_dataframe method supports a Python expression as an optional str parameter so you can filter down your data to specific traces you care about. For full details, consult the Session API docs.
You can also directly get the spans from the tracer or callback:
from phoenix.trace.langchain import OpenInferenceTracer
tracer = OpenInferenceTracer()
# Run the application with the tracer, callbacks=[tracer])
# When you are ready to analyze the data, you can convert the traces
ds = TraceDataset.from_spans(tracer.get_spans())
# Print the dataframe
# Re-initialize the app with the trace dataset
Note that the above calls get_spans on a LangChain tracer but the same exact method exists on the OpenInferenceCallback for LlamaIndex as well.

Exporting Embeddings

Embeddings can be extremely useful for fine-tuning. There are two ways to export your embeddings from the Phoenix UI.

Export Selected Clusters

To export a cluster (either selected via the lasso tool or via a the cluster list on the right hand panel), click on the export button on the top left of the bottom slide-out.

Export All Clusters

To export all clusters of embeddings as a single dataframe (labeled by cluster), click the ... icon on the top right of the screen and click export. Your data will be available either as a Parquet file or is available back in your notebook via your session as a dataframe.
session = px.active_session()