Datasets and Experiments

Iteratively improve your LLM task by building datasets, running experiments, and evaluating performance with code-based and LLM-as-a-judge evaluators.
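The loop described above (dataset, experiment, evaluation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's API: `Example`, `task`, `exact_match`, `llm_judge`, `fake_judge`, and `run_experiment` are all hypothetical names, and the judge is a deterministic stub standing in for a real model call.

```python
from dataclasses import dataclass
from functools import partial
from statistics import mean
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Example:
    """One dataset row: an input and the reference output (hypothetical shape)."""
    input: str
    expected: str


# A tiny hand-built dataset; real datasets usually come from traces or files.
DATASET: List[Example] = [
    Example("2 + 2", "4"),
    Example("3 * 5", "15"),
    Example("10 - 7", "3"),
]


def task(text: str) -> str:
    """Stand-in for the LLM task under test. It only handles + and *."""
    a, op, b = text.split()
    x, y = int(a), int(b)
    if op == "+":
        return str(x + y)
    if op == "*":
        return str(x * y)
    return "unknown"


def exact_match(output: str, expected: str) -> float:
    """Code evaluator: 1.0 when the output equals the reference exactly."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def fake_judge(prompt: str) -> str:
    """Deterministic stub for an LLM call (assumption: swap in a real client)."""
    lines = prompt.splitlines()
    expected = lines[0].split(": ", 1)[1]
    actual = lines[1].split(": ", 1)[1]
    return "correct" if expected == actual else "incorrect"


def llm_judge(output: str, expected: str, judge: Callable[[str], str]) -> float:
    """LLM-as-a-judge evaluator: ask a model for a label, map it to a score."""
    prompt = (
        f"Expected: {expected}\n"
        f"Actual: {output}\n"
        "Answer 'correct' or 'incorrect'."
    )
    return 1.0 if judge(prompt).strip().lower() == "correct" else 0.0


def run_experiment(
    dataset: List[Example],
    task: Callable[[str], str],
    evaluators: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Run the task once per example, then average each evaluator's scores."""
    outputs = [task(ex.input) for ex in dataset]
    return {
        name: mean(evaluate(out, ex.expected) for out, ex in zip(outputs, dataset))
        for name, evaluate in evaluators.items()
    }


scores = run_experiment(
    DATASET,
    task,
    {"exact_match": exact_match, "llm_judge": partial(llm_judge, judge=fake_judge)},
)
print(scores)
```

Because the stub task does not handle subtraction, both evaluators score two of the three examples as passing. Changing `task` and rerunning against the same dataset is the iteration step: scores from different runs stay comparable because the examples and evaluators are held fixed.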

Comprehensive Use Cases

Summarization Service

Email Text Extraction

Pairwise Evaluator

RAG Use Cases

Answer and Context Relevancy Evals

Response Guideline Evals

LlamaIndex RAG with Reranker
