Run evaluations in the UI
This guide shows you how to set up an online evaluation in the Arize UI, which runs automatically on your data as your LLM application is used. You can set up LLM-as-a-judge evaluators or code evaluators, which run in a Python container against your data.
Follow this guide to set up code evaluations; the setup is very similar to LLM evaluations.
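To illustrate the difference, a code evaluator is plain Python that scores a span deterministically, with no LLM call. The function below is a hypothetical sketch only; the exact interface Arize expects is defined in the code evaluations guide, and the function name, arguments, and return shape here are illustrative assumptions.

```python
# Hypothetical sketch of the kind of logic a code evaluator runs in its Python
# container: a deterministic check on a span's output, no LLM involved.
import json

def evaluate_output_is_valid_json(output_text: str) -> dict:
    """Label a span's output as valid or invalid JSON and return a score."""
    try:
        json.loads(output_text)
        return {"label": "valid_json", "score": 1}
    except (TypeError, json.JSONDecodeError):
        return {"label": "invalid_json", "score": 0}

print(evaluate_output_is_valid_json('{"answer": "Paris"}'))  # {'label': 'valid_json', 'score': 1}
```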
Navigate to the tasks page and click "new task".
Choose which traces you want to evaluate and how often the task should run. You can run it once to backfill historical data, or run it continuously against new incoming data.
Task Name: Choose a unique name for the task, which will be displayed on the Task Listing Page.
Project: Select the project whose traces this task will label with evaluations.
Sampling Rate: Define the sampling rate to determine what percentage of the incoming data will be used for the task. For instance, setting it to 50% means the task will run on half of the incoming traces.
Filters: Apply specific filters to ensure you're tagging the right data. For example, you can filter for LLM spans or use criteria like token count or input value.
Schedule Run: Choose how the task should operate, either running continuously on new data or as a one-time process on historical data. The options include:
Run once on historical data: Use this to backfill evaluation labels on past traces or to test the task. You can set a date range and specify the maximum number of spans.
Run continuously on new incoming data: This option sets the task to automatically label new data within 5 minutes of its arrival in the platform. The task will additionally label the past 24 hours of data.
Choose the LLM provider, model, and other parameters for your LLM evaluation.
Provider: Choose a provider for your LLM model, such as OpenAI or Vertex AI. Only models that have been configured on the integrations page will be available.
Model: From the drop-down list, select the specific model associated with your chosen provider.
Temperature: Set the temperature for the model, which controls the randomness of its outputs. Higher temperatures produce more varied and creative responses, while lower temperatures result in more deterministic outputs. For guidance on setting this value, refer to the model provider's API documentation.
Select one of our pre-built evaluators, or use Copilot to write one on your behalf! You can also write your own evaluation template or write Python code to evaluate your LLM outputs.
Select an Eval: From the drop-down menu, you can choose a pre-built evaluator. When selected, the Eval Column Name, Eval Template, and Output Rails fields will be automatically filled. These off-the-shelf evaluators have been thoroughly tested to ensure high performance at scale. If you need a more customized evaluation, you can modify the pre-populated template or create an entirely new one tailored to your application.
Eval Column Name: Provide a unique column name for the evaluator in plaintext. Ensure that this name is distinct from other evaluators across all tasks.
Eval Template: Write an LLM prompt template that guides the LLM in evaluating the data. Use bracket notation to reference specific dataframe fields, such as {attributes.llm.input_messages.0.message.content}, which follow the OpenInference semantic conventions for organizing LLM calls. The task uses this template to run the llm_classify function from Phoenix under the hood; see the sketch after this list.
Output Rails: Define an array of strings representing the possible output classes the model can predict. If there are two options, the first value will be mapped to 1 and the second to 0, which allows for aggregate scoring. If more than two output rails are provided, aggregate scores will not be available.
Generate Explanations: Set this to True if you want the LLM Judge to provide explanations for the labels it assigns. For example, it can explain why it labeled an LLM output as a "hallucination" or "factual."
Function Calling: Set this to True to enable function calling, if the feature is available for the selected LLM. With function calling, the LLM is instructed to return its output as a structured JSON object, making it easier to parse and handle. This setting corresponds to the use_function_calling_if_available parameter in the llm_classify API.
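For reference, the settings above map roughly onto a call to Phoenix's llm_classify. The sketch below is an approximation, not the task's actual implementation: the dataframe columns, model name, and template text are illustrative, and parameter names may vary slightly across Phoenix versions.

```python
# Approximate equivalent of what the task runs under the hood, using the open
# source Phoenix evals library. Column names follow OpenInference conventions.
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative spans; in the UI, the task pulls these from your project's traces.
df = pd.DataFrame(
    {
        "attributes.llm.input_messages.0.message.content": ["What is the capital of France?"],
        "attributes.llm.output_messages.0.message.content": ["The capital of France is Paris."],
    }
)

# Eval Template: bracket notation references dataframe fields.
TEMPLATE = """You are checking whether the answer is supported by the question context.
Question: {attributes.llm.input_messages.0.message.content}
Answer: {attributes.llm.output_messages.0.message.content}
Respond with a single word: "factual" or "hallucinated"."""

# Output Rails: with two rails, the first maps to 1 and the second to 0.
RAILS = ["factual", "hallucinated"]

results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini", temperature=0.0),  # Provider / Model / Temperature
    template=TEMPLATE,
    rails=RAILS,
    provide_explanation=True,                 # Generate Explanations
    use_function_calling_if_available=True,   # Function Calling
)
print(results[["label", "explanation"]])
```

The returned dataframe contains one label (and, when explanations are enabled, one explanation) per span, which is what the task writes back to your traces as evaluation labels.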
Once your task is successfully created, a green pop-up notification will appear! Navigate to the tracing page to view the evaluation labels generated by your newly created task.
Need help creating an eval? Use ✨Copilot to easily generate your evaluation template. Simply click the Copilot button within the task modal, and describe the evaluation you want to create or its goals. Copilot will handle the rest—naming the evaluation, generating a prompt with the necessary variables, and including the appropriate guardrails.