Evaluations
Run code and LLM evaluations to measure performance
Run online evals in the Arize UI
Run offline evals in code (a minimal sketch follows the guide list below)
Evaluate code functionality
Evaluate hallucination
Evaluate human ground truth vs. AI
Evaluate Q&A correctness
Evaluate RAG
Evaluate reference links
Evaluate relevance
Evaluate SQL correctness
Evaluate tool calling
Evaluate toxicity
Evaluate user frustration
Handle errors with evals
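As a quick illustration of running offline evals in code, here is a minimal sketch using the open-source `arize-phoenix-evals` package (`pip install arize-phoenix-evals openai pandas`). The dataframe columns, the `gpt-4o-mini` model choice, and the sample rows are placeholder assumptions; the built-in hallucination template and rails shown here come from `phoenix.evals`, but check the guides above for the exact setup that matches your version.

```python
# Minimal offline-eval sketch (assumptions: arize-phoenix-evals installed,
# OPENAI_API_KEY set, and a newer package version whose argument names match).
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Example data: each row pairs a question, the retrieved context, and the
# LLM's answer. Column names match the template's {input}/{reference}/{output}.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["France's capital city is Paris."],
        "output": ["The capital of France is Paris."],
    }
)

model = OpenAIModel(model="gpt-4o-mini")           # LLM used as the judge
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())  # allowed output labels

# Run the LLM-as-a-judge classifier over every row of the dataframe.
results = llm_classify(
    dataframe=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,
)
print(results[["label", "explanation"]])
```

The same pattern applies to the other evaluators listed above: swap in the matching prompt template and rails map, and keep your dataframe columns aligned with the template's variables.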