Quickstart: Retrieval
Debug your Search and Retrieval LLM workflows
This quickstart shows how to start logging your retrievals from your vector datastore to Phoenix and run evaluations.
Notebooks
Follow our tutorial in a notebook with our LangChain and LlamaIndex integrations.
Framework | Phoenix Inferences | Phoenix Traces & Spans |
---|---|---|
LangChain | | |
LlamaIndex | | |
Logging Retrievals to Phoenix (as Inferences)
Step 1: Logging Knowledge Base
The first thing we need is to collect a sample of the documents from your vector store so that we have something to compare against later. This lets you see whether some sections are never being retrieved, or whether some sections are getting a lot of traffic, in which case you might want to beef up your context or documents in that area.
For more details, visit this page.
id | text | embedding |
---|---|---|
1 | Voyager 2 is a spacecraft used by NASA to expl... | [-0.02785328, -0.04709944, 0.042922903, 0.0559... |
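Below is a minimal sketch of what this can look like, assuming your knowledge base is already in a pandas dataframe with the columns shown above and a recent arize-phoenix version (where datasets are called Inferences). The file name is hypothetical.

```python
import pandas as pd
import phoenix as px

# Knowledge base (corpus) dataframe with the columns shown above:
# a unique document id, the raw text, and its embedding vector.
corpus_df = pd.read_parquet("knowledge_base.parquet")  # hypothetical file

# Map the dataframe columns onto a Phoenix corpus schema.
corpus_schema = px.Schema(
    id_column_name="id",
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="text",
    ),
)

# Wrap the dataframe and schema as a Phoenix inferences object.
corpus_inferences = px.Inferences(corpus_df, corpus_schema, name="corpus")
```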
Step 2: Logging Retrieval and Response
We will also log the prompt/response pairs from the deployed application.
For more details, visit this page.
query | embedding | retrieved_document_ids | relevance_scores | response |
---|---|---|---|---|
who was the first person that walked on the moon | [-0.0126, 0.0039, 0.0217, ... | [7395, 567965, 323794, ... | [11.30, 7.67, 5.85, ... | Neil Armstrong |
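A sketch of logging those query/response records, assuming a dataframe with the columns shown above and continuing from the snippet in Step 1 (RetrievalEmbeddingColumnNames captures the retrieved document ids and relevance scores for each query); the file name is again hypothetical:

```python
import pandas as pd
import phoenix as px

query_df = pd.read_parquet("queries.parquet")  # hypothetical file

# Map the query dataframe onto a Phoenix schema, including the
# retrieved document ids and relevance scores for each query.
query_schema = px.Schema(
    prompt_column_names=px.RetrievalEmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="query",
        context_retrieval_ids_column_name="retrieved_document_ids",
        context_retrieval_scores_column_name="relevance_scores",
    ),
    response_column_names="response",
)

query_inferences = px.Inferences(query_df, query_schema, name="queries")

# Launch Phoenix with the queries as the primary inferences and the
# knowledge base from Step 1 as the corpus.
px.launch_app(primary=query_inferences, corpus=corpus_inferences)
```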
Running Evaluations on your Retrievals
To run retrieval Evals on common frameworks such as LangChain and LlamaIndex, the following code can be used for quick analysis.
Independent of the framework you are instrumenting, Phoenix traces allow you to get retrieval data in a common dataframe format that follows the OpenInference specification.
Once the data is in a dataframe, evaluations can be run on it. Evaluations can be run on different spans of data; in the example below we run them on the top-level spans that represent a single trace.
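For example, here is a sketch of pulling the trace data into dataframes, assuming a running Phoenix instance and the phoenix.session.evaluation helpers:

```python
import phoenix as px
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents

client = px.Client()

# One row per trace: the input query, the output response, and the
# retrieved documents concatenated as reference text.
queries_df = get_qa_with_reference(client)

# One row per retrieved document chunk, keyed by span id and position.
retrieved_documents_df = get_retrieved_documents(client)
```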
Q&A and Hallucination Evals
This example shows how to run Q&A and Hallucination Evals with OpenAI (many other models are available, including Anthropic, Mixtral/Mistral, Gemini, OpenAI Azure, Bedrock, etc.).
The Evals are available locally as dataframes and can be materialized back to the Phoenix UI, where they are attached to the referenced SpanIDs.
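A rough sketch of those two Evals is shown below, using the built-in prompt templates from phoenix.evals against the queries_df pulled above. The model name is an assumption, and exact parameter names can vary across phoenix versions.

```python
import phoenix as px
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)
from phoenix.trace import SpanEvaluations

model = OpenAIModel(model="gpt-4o")  # assumed model name; swap in any supported model

# Hallucination Eval: checks whether the response is grounded in the retrieved reference.
hallucination_eval = llm_classify(
    dataframe=queries_df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
hallucination_eval["score"] = (hallucination_eval["label"] == "factual").astype(int)

# Q&A Eval: checks whether the response correctly answers the question.
qa_correctness_eval = llm_classify(
    dataframe=queries_df,
    model=model,
    template=QA_PROMPT_TEMPLATE,
    rails=list(QA_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
qa_correctness_eval["score"] = (qa_correctness_eval["label"] == "correct").astype(int)

# Send the Evals back to Phoenix, attached to the spans they were run against.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval),
)
```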
The snippet of code above links the Evals back to the spans they were generated against.
Retrieval Chunk Evals
Retrieval Evals are run on the individual chunks returned on retrieval. In addition to calculating chunk-level metrics, Phoenix also calculates MRR and NDCG for the retrieved span.
The calculation is done by running the LLM Eval on all chunks returned for the span, and log_evaluations connects the Evals back to the original spans.
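A sketch of that chunk-level Eval, under the same assumptions as above, runs a relevance classification over the retrieved_documents_df pulled from the traces and logs the results as document evaluations:

```python
import phoenix as px
from phoenix.evals import (
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)
from phoenix.trace import DocumentEvaluations

# Relevance Eval: classifies each retrieved chunk as relevant or not
# to the query that retrieved it.
retrieved_documents_eval = llm_classify(
    dataframe=retrieved_documents_df,
    model=OpenAIModel(model="gpt-4o"),  # assumed model name
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
retrieved_documents_eval["score"] = (
    retrieved_documents_eval["label"] == "relevant"
).astype(int)

# Attach the document-level Evals back to the original spans; Phoenix
# computes ranking metrics such as NDCG and MRR from these scores.
px.Client().log_evaluations(
    DocumentEvaluations(eval_name="Relevance", dataframe=retrieved_documents_eval)
)
```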