Quickstart: Evals
Evaluate your LLM application with Phoenix
This quickstart guide will walk you through the basics of evaluating data from your LLM application.
1. Install Phoenix Evals
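A minimal install for this guide, assuming you are adding Phoenix with its evals extra (plus the OpenAI client used by the judge model below) to an existing Python environment; exact extras and pins may vary by Phoenix version:

```bash
pip install -q "arize-phoenix[evals]" openai
```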
2. Export Data and Launch Phoenix
Export a dataframe from your Phoenix session that contains traces from your LLM application.
If you are interested in a subset of your data, you can export with a custom query. Learn more here.
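As a sketch of both options, the snippet below exports all spans from a running Phoenix session as a pandas dataframe, then narrows to retriever spans with a custom query; the `span_kind == 'RETRIEVER'` filter is just one illustrative condition:

```python
import phoenix as px
from phoenix.trace.dsl import SpanQuery

client = px.Client()

# Export every span collected so far into a pandas dataframe
spans_df = client.get_spans_dataframe()

# Or export only a subset of spans with a custom query,
# e.g. retriever spans from a RAG application
query = SpanQuery().where("span_kind == 'RETRIEVER'")
retriever_spans_df = client.query_spans(query)
```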
For the sake of this guide, we'll download some pre-existing trace data collected from a LlamaIndex application (in practice, this data would be collected by instrumenting your LLM application with an OpenInference-compatible tracer).
Then, start Phoenix to view and manage your evaluations.
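A sketch of that flow, assuming the `px.load_example_traces` helper and the `"llama_index_rag"` fixture name available in recent Phoenix releases:

```python
import phoenix as px

# Download a small, pre-recorded LlamaIndex trace dataset that ships with Phoenix
trace_ds = px.load_example_traces("llama_index_rag")

# Start the Phoenix app with those traces loaded
session = px.launch_app(trace=trace_ds)
print(session.url)  # open this URL in your browser to view the traces
```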
You should now see a view like this.
3. Evaluate and Log Results
Set up evaluators (in this case, for hallucinations and Q&A correctness), run the evaluations, and log the results to visualize them in Phoenix.
This quickstart uses OpenAI and requires an OpenAI API key, but we support a wide variety of APIs and models.
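A sketch of this step using Phoenix's built-in hallucination and Q&A correctness evaluators with an OpenAI judge model. The judge model name and the `OPENAI_API_KEY` environment variable are assumptions, and keyword names (e.g. `model` vs. `model_name`) can vary slightly across Phoenix versions:

```python
import os

import phoenix as px
from phoenix.evals import HallucinationEvaluator, OpenAIModel, QAEvaluator, run_evals
from phoenix.session.evaluation import get_qa_with_reference
from phoenix.trace import SpanEvaluations

assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running evals"
# In a notebook you may also need: import nest_asyncio; nest_asyncio.apply()

# Pull each query, response, and retrieved reference context into one dataframe
queries_df = get_qa_with_reference(px.Client())

# The judge model; any supported provider/model can be swapped in here
eval_model = OpenAIModel(model="gpt-4o")

hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)

# Run both evaluators over the exported spans, requesting explanations
hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)

# Log the results back to Phoenix so they appear alongside the traces
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_correctness_eval_df),
)
```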
4. Analyze Your Evaluations
After logging your evaluations, open Phoenix to review your results. Inspect evaluation statistics, identify problematic spans, and explore the reasoning behind each evaluation.
Each evaluation includes the LLM judge's explanation, which you can read to understand why a span was flagged. This makes it straightforward to pinpoint the cause of your LLM application's poor responses, such as irrelevant retrievals or an incorrectly parameterized LLM.
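If you launched Phoenix from a notebook or script and have since closed the tab, one way to get back to the UI (assuming the session is still active) is:

```python
import phoenix as px

# Print the URL of the running Phoenix app so you can reopen it in a browser
print(px.active_session().url)
```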
If you're interested in extending your evaluations to include relevance, explore our detailed Colab guide.
Now that you're set up, read through the Concepts Section to get an understanding of the different components.
If you want to learn how to accomplish a particular task, check out the How-To Guides.