Evaluations

Evaluations help you understand your LLM application performance. You can measure your application across several dimensions such as correctness, hallucination, relevance, faithfulness, latency, and more. This helps you ship LLM applications that are reliable, accurate, and fast.

Arize has built an evaluation framework for production applications:

  1. Pre-tested Evaluators Backed by Research - Our evaluators are thoroughly tested against the latest models from LLM providers, using benchmarks such as needle-in-a-haystack tests (see the sketch after this list).

  2. Multi-level Custom Evaluation - We provide several types of evaluation out of the box, complete with explanations. You can also customize your evaluations using your own criteria and prompt templates.

  3. Designed for Speed - Our evals handle large volumes of data using parallel calls, batch processing, and rate limiting.

  4. Ease of Onboarding - Our framework integrates seamlessly with popular LLM frameworks like LangChain and LlamaIndex, providing straightforward setup and execution.

  5. Extensive Compatibility - Our library is compatible with all common LLMs and offers unparalleled RAG debugging and troubleshooting.
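As a concrete illustration of the points above, a pre-built evaluator can be run over a dataframe of records in a few lines. The sketch below is a minimal example assuming the open-source `phoenix.evals` package and an OpenAI judge model; exact parameter names (for example `concurrency`) can vary between versions, and the dataframe columns must match the variables used in the prompt template.

```python
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Example records: the hallucination template expects the user query ("input"),
# the retrieved context ("reference"), and the model's answer ("output").
df = pd.DataFrame(
    {
        "input": ["Where is the Eiffel Tower located?"],
        "reference": ["The Eiffel Tower is a landmark in Paris, France."],
        "output": ["The Eiffel Tower is located in Paris, France."],
    }
)

eval_results = llm_classify(
    dataframe=df,
    template=HALLUCINATION_PROMPT_TEMPLATE,               # pre-tested evaluator prompt
    model=OpenAIModel(model="gpt-4o", temperature=0.0),
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),  # constrain the judge to allowed labels
    provide_explanation=True,                             # include the judge's reasoning
    concurrency=20,                                       # parallel calls for speed
)
print(eval_results[["label", "explanation"]])
```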

Learn more

To get started, check out the Quickstart guide for evaluation. You can use Arize Evaluators or build your own. Read our best practices to learn how to run robust task-based evaluations on your LLM applications.
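To illustrate building your own evaluator, the sketch below reuses the same `llm_classify` helper with a custom prompt template and label set. The criterion, template text, and labels here are hypothetical examples, not part of the Arize library.

```python
from phoenix.evals import OpenAIModel, llm_classify

# Hypothetical custom criterion: did the answer stay on topic for the question?
TOPIC_ADHERENCE_TEMPLATE = """
You are evaluating whether an answer stays on topic for the question.
[Question]: {input}
[Answer]: {output}
Respond with a single word: "on_topic" or "off_topic".
"""

custom_results = llm_classify(
    dataframe=df,                              # columns must match the template variables
    template=TOPIC_ADHERENCE_TEMPLATE,         # your own criteria and prompt template
    model=OpenAIModel(model="gpt-4o", temperature=0.0),
    rails=["on_topic", "off_topic"],           # allowed labels for the custom eval
    provide_explanation=True,
)
```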
