Evaluations

Evaluate LLM outputs with Arize

This article uses our open source Phoenix package. See our guide on getting started with Phoenix.

Because human evaluation of text is impractical at scale, we built a streamlined solution for LLM-based assessment that keeps your evaluations fast, accurate, and scalable. We use Phoenix, our open source package for running evals.

An Evaluation Framework Built for Production Applications

Phoenix was built with robustness, clarity, and ease of use in mind:

  1. Pre-tested Evaluators Backed by Research - Our library includes a range of evaluators, thoroughly tested and continually updated, to provide accurate assessments tailored to your application's needs.

  2. Multi-level Custom Evaluation - Several evaluators ship with explanations out of the box, and the framework also supports fully custom evaluations for any use case (a minimal example follows this list).

  3. Designed for Speed - Phoenix Evals are engineered for quick and efficient processing, enabling you to handle large volumes of data without compromising on performance.

  4. Ease of Onboarding - Get up and running without hassle. Our framework integrates seamlessly with popular LLM frameworks like LangChain and LlamaIndex, providing straightforward setup and execution.

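Here is a minimal sketch of running one of the pre-tested evaluators with explanations enabled. It assumes the `phoenix.evals` module with `llm_classify`, `OpenAIModel`, and the built-in hallucination template; exact import paths, the column names expected by the template, and the judge model name (`gpt-4o-mini` here) may differ across Phoenix versions, and an `OPENAI_API_KEY` environment variable is assumed. Check the Quickstart for the version you installed.

```python
import pandas as pd

# Assumed import paths for recent Phoenix releases; older versions exposed
# these under phoenix.experimental.evals.
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# A small dataframe of LLM outputs to evaluate. The column names
# (input, reference, output) are the variables the hallucination
# template expects to fill in.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["Paris is the capital and largest city of France."],
        "output": ["The capital of France is Paris."],
    }
)

# The rails constrain the judge model's answer to the template's labels.
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())

# Run the pre-tested hallucination evaluator and ask the judge model
# to return an explanation alongside each label.
results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,
)

print(results[["label", "explanation"]])
```

The same `llm_classify` call works unchanged on dataframes with thousands of rows, which is how the evaluators are typically run against production traces.
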
To get started, check out the Quickstart guide.

After that, read through the Concepts section to understand the different components.

If you want to learn how to accomplish a particular task, check out the How-To Guides.
