Quickstart: Evaluation
This guide assumes you have traces in Arize and are looking to run an evaluation to measure your application's performance. If you want to learn more about how evaluation works, read our dedicated guide.
Here's how you add LLM evaluations:
Once you have traces in Arize, visit the LLM Tracing tab to see your traces and export them in code. Clicking the export button gives you boilerplate code that you can copy and paste into your evaluator.
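The exported boilerplate typically looks something like the sketch below; the space ID, model ID, and time range are placeholders that the export button fills in with your actual values:

```python
from datetime import datetime

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

client = ArizeExportClient()

# Pull the traces for your project into a pandas dataframe.
primary_df = client.export_model_to_df(
    space_id="YOUR_SPACE_ID",   # placeholder -- filled in by the export button
    model_id="YOUR_MODEL_ID",   # placeholder -- your tracing project
    environment=Environments.TRACING,
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 1, 7),
)
```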
Import the functions from our Phoenix library to run a custom evaluation using OpenAI.
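Assuming the Phoenix evals package is installed (for example via `pip install arize-phoenix-evals`), the imports are:

```python
from phoenix.evals import OpenAIModel, llm_classify
```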
Ensure you have your OpenAI API key set up correctly for your OpenAI model.
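For example, you can read the key from your environment, prompting for it if it isn't set:

```python
import os
from getpass import getpass

# The OpenAI client used by the evaluator reads OPENAI_API_KEY from the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```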
Create a prompt template for the LLM to judge the quality of your responses. Below is an example which judges the positivity or negativity of the LLM output.
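A template along these lines works; the wording and the "positive"/"negative" labels are just one example and can be adapted to whatever quality you want to judge:

```python
MY_CUSTOM_TEMPLATE = """
You are evaluating the positivity or negativity of the responses to questions.
[BEGIN DATA]
************
[Question]: {input}
************
[Response]: {output}
[END DATA]

Please focus on the tone of the response.
Your answer must be a single word, either "positive" or "negative".
"""
```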
Notice the variables in brackets for {input} and {output} above. You will need to set those variables appropriately for the dataframe so you can run your custom template. We use OpenInference as a set of conventions (complementary to OpenTelemetry) to trace AI applications, which means the attributes on your traces will differ depending on the provider you are using.

You can use the code below to check which attributes are present in the traces in your dataframe.
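Assuming the exported dataframe from the first step is named `primary_df`, you can list its columns:

```python
# Each column corresponds to a span attribute captured at instrumentation time.
print(primary_df.columns.tolist())
```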
Use the code below to set the input and output variables needed for the prompt above.
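A minimal sketch, assuming your traces use the standard OpenInference attribute names (`attributes.input.value` and `attributes.output.value`); substitute whichever columns the previous step shows for your provider:

```python
# Map the trace attributes onto the {input} and {output} template variables.
primary_df["input"] = primary_df["attributes.input.value"]
primary_df["output"] = primary_df["attributes.output.value"]
```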
Use the llm_classify function to run the evaluation using your custom template. You will be using the dataframe from the traces you exported above.
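Putting it together, a sketch of the call might look like the following; the model name, the `rails` values, and the `provide_explanation` flag are choices you can adjust (keep the rails in sync with the labels your template asks for):

```python
# Constrain the judge's answer to the labels used in the template.
rails = ["positive", "negative"]

# Depending on your phoenix version, this parameter may be `model_name` instead of `model`.
eval_model = OpenAIModel(model="gpt-4o-mini", temperature=0.0)

evals_df = llm_classify(
    dataframe=primary_df,
    template=MY_CUSTOM_TEMPLATE,
    model=eval_model,
    rails=rails,
    provide_explanation=True,  # also return the judge's reasoning for each row
)
```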
If you'd like more information, see our detailed guide on custom evaluators. You can also use our pre-built evaluators for hallucination, toxicity, retrieval, and more.
Export the evals you generated above to Arize using the log_evaluations function in our Python SDK. See our article on logging evaluations for more details.

Note: evaluations are currently logged within Arize every 24 hours, and we're working on making them as close to instant as possible. Reach out to support@arize.com if you're having trouble here.
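A minimal sketch of the logging step, assuming a recent version of the Arize Python SDK and an eval named `sentiment` (a hypothetical name); the exact dataframe format expected by `log_evaluations` can vary by SDK version, so confirm it against the article above:

```python
from arize.pandas.logger import Client

# Placeholders -- use your own Arize space ID, API key, and model/project ID.
arize_client = Client(space_id="YOUR_SPACE_ID", api_key="YOUR_ARIZE_API_KEY")

# Arize matches each eval row to a span by its span ID and expects eval columns
# named eval.<name>.label / eval.<name>.explanation.
evals_df = evals_df.rename(
    columns={
        "label": "eval.sentiment.label",
        "explanation": "eval.sentiment.explanation",
    }
)
evals_df = evals_df.set_index(primary_df["context.span_id"])

arize_client.log_evaluations(dataframe=evals_df, model_id="YOUR_MODEL_ID")
```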