Evaluations help you understand your LLM application performance. You can measure your application across several dimensions such as correctness, hallucination, relevance, faithfulness, latency, and more. This helps you ship LLM applications that are reliable, accurate, and fast.
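To make one of these dimensions concrete, here is a minimal LLM-as-a-judge sketch in Python that labels an answer as factual or hallucinated against reference context. This is an illustration, not this library's API: the `judge()` helper, the prompt wording, and the `gpt-4o-mini` model choice are all assumptions for the example.

```python
# Minimal LLM-as-a-judge sketch (illustrative, not the library's API):
# label one answer for hallucination against reference context.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are evaluating an answer for hallucination.
Reference context:
{context}

Answer to evaluate:
{answer}

Reply with exactly one word: "factual" or "hallucinated"."""


def judge(context: str, answer: str) -> str:
    """Ask an LLM judge to label one answer; returns 'factual' or 'hallucinated'."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable judge model works
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(context=context, answer=answer)}],
        temperature=0,  # deterministic judging
    )
    return response.choices[0].message.content.strip().lower()


label = judge(
    context="The Eiffel Tower is 330 meters tall.",
    answer="The Eiffel Tower is 500 meters tall.",
)
print(label)  # expected: "hallucinated"
```

The same pattern generalizes to correctness, relevance, faithfulness, and other dimensions by swapping the prompt and label set.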
Multi-level Custom Evaluation - We provide several types of evaluations, complete with explanations, out of the box. Customize your evaluation using your own criteria and prompt templates (see the template sketch after this list).
Designed for Speed - Our evals are built to handle large volumes of data, with parallel calls, batch processing, and rate limiting (see the concurrency sketch after this list).
Ease of Onboarding - Our framework integrates seamlessly with popular LLM frameworks like LangChain and LlamaIndex, providing straightforward setup and execution.
Extensive Compatibility - Our library is compatible with all common LLMs and offers unparalleled RAG debugging and troubleshooting.
Pre-tested Evaluators Backed by Research - Our evaluators are thoroughly tested against the latest capabilities from LLM providers.
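To make the custom-criteria point concrete, here is a hedged sketch of a user-defined evaluation template with a fixed label set (the "rails") plus an explanation, following the same judge pattern as the first example. The conciseness criterion, template wording, and `parse_judgment()` helper are illustrative assumptions, not built-ins of this library.

```python
# Sketch of a custom evaluation criterion: a prompt template that asks the
# judge for a one-sentence explanation followed by one of two allowed labels.
CONCISENESS_TEMPLATE = """You are grading a response for conciseness.

Question: {question}
Response: {response}

First explain your reasoning in one sentence, then on a new line output
exactly one label: "concise" or "verbose"."""

RAILS = ("concise", "verbose")  # the only labels the eval will accept


def parse_judgment(raw: str) -> tuple[str, str]:
    """Split a judge reply into (explanation, label), snapping to the rails."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    label = lines[-1].strip('"').lower() if lines else ""
    explanation = " ".join(lines[:-1])
    return explanation, (label if label in RAILS else "unparseable")


reply = 'The answer repeats itself several times.\n"verbose"'
print(parse_judgment(reply))
# -> ('The answer repeats itself several times.', 'verbose')
```

Snapping free-form judge output onto a fixed label set keeps downstream aggregation (accuracy, label counts) simple even when the judge model is wordy.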
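And to illustrate the speed point, here is a sketch of batch evaluation with parallel calls and a simple rate limit, assuming the `judge()` helper from the first sketch above. The worker count and request spacing are placeholder values; real limits depend on your provider's quotas.

```python
# Sketch of batched, parallel evaluation with a naive rate limiter that
# spaces out request start times. Assumes judge() from the earlier sketch.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8      # judge calls allowed in flight at once (assumed)
MIN_INTERVAL = 0.1   # seconds between request starts, ~10 requests/sec (assumed)

_lock = threading.Lock()
_next_start = 0.0


def rate_limited_judge(example: dict) -> str:
    """Reserve the next request start slot, wait for it, then call the judge."""
    global _next_start
    with _lock:
        now = time.monotonic()
        wait = _next_start - now
        _next_start = max(now, _next_start) + MIN_INTERVAL
    if wait > 0:
        time.sleep(wait)
    return judge(example["context"], example["answer"])


examples = [
    {"context": "The Eiffel Tower is 330 meters tall.",
     "answer": "The Eiffel Tower is 500 meters tall."},
    {"context": "Water boils at 100 degrees Celsius at sea level.",
     "answer": "Water boils at 100 degrees Celsius at sea level."},
]

# Run the whole batch through a thread pool; results keep input order.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    labels = list(pool.map(rate_limited_judge, examples))
print(labels)  # e.g. ['hallucinated', 'factual']
```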
To get started, check out the quickstart for evaluation. You can use our pre-built evaluators or write your own. Read our guide to learn how to run robust task-based evaluations on your LLM applications. Finally, check out our comprehensive evaluation guide.