Online Evals

Run evaluations on your trace and span data

Evaluations help you understand how your LLM application performs. You can measure it across several dimensions, such as correctness, hallucination, relevance, faithfulness, and latency. This helps you ship LLM applications that are reliable, accurate, and fast.

As your application grows and the volume of production logs increases, reviewing and labeling the data by hand becomes impractical. Online evaluation tasks automatically tag new spans with evaluation labels as soon as the data arrives in the platform.
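To make the idea concrete, here is a minimal sketch of what "tagging spans on arrival" can look like. It is purely illustrative and does not use the platform's API: `Span`, `OnlineEvalTask`, `on_span_arrival`, and the `non_empty_output` evaluator are hypothetical names invented for this example, and a real evaluator would more likely call an LLM or a scoring model than check for an empty string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Span:
    """Hypothetical, simplified stand-in for a span record."""
    span_id: str
    input: str
    output: str
    labels: Dict[str, str] = field(default_factory=dict)


# An evaluator takes a span and returns a label string.
Evaluator = Callable[[Span], str]


def non_empty_output(span: Span) -> str:
    """Toy evaluator (assumption): passes if the output has visible text."""
    return "pass" if span.output.strip() else "fail"


class OnlineEvalTask:
    """Hypothetical task that labels each span as it arrives."""

    def __init__(self, evaluators: Dict[str, Evaluator]) -> None:
        self.evaluators = evaluators

    def on_span_arrival(self, span: Span) -> Span:
        # Run every configured evaluator and attach its label to the span.
        for name, evaluate in self.evaluators.items():
            span.labels[name] = evaluate(span)
        return span


task = OnlineEvalTask({"non_empty_output": non_empty_output})

incoming = [
    Span("s1", "What is 2 + 2?", "4"),
    Span("s2", "Summarize this document.", "   "),
]

# Each span is tagged as soon as it is received, with no manual review step.
tagged = [task.on_span_arrival(span) for span in incoming]
```

The point of the design is that labeling happens per span at ingestion time, so labels accumulate continuously instead of requiring a batch review of production logs.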

Learn more

Run evaluations in the UI

Run evaluations with code

Read our guide on agent evals

Understand evaluation basics
