Human Annotations
Annotations are custom labels that can be added to traces in LLM applications. AI engineers can use annotations to:
Manually hand-label data
Categorize spans or traces
Curate a dataset for experimentation
Log human feedback
User feedback is an important part of improving applications. Annotations enable teams to attach that feedback to their trace data, either through the UI or via the API.
When are annotations used?
Find examples where LLM evals and humans agree or disagree, either to improve your evals or to flag cases for further review
Capture judgments from subject matter experts (doctors, legal experts, customer support specialists) on how good the application is as a whole; this is complementary to other evals
(Soon) Log feedback directly from an application (via API or latent label)
Annotate spans
Annotations are labels that can be applied at the span level for LLM use cases. Each annotation is defined by a config (a label or a score); once configured, that annotation is available for any future annotations on the model.
Unstructured text annotations (notes) can also be added at any time.
Users can save and view annotations on a trace and also filter on them.
Annotations can also be logged via our Python SDK. Use the log_annotations_sync function to attach human feedback, corrections, or other annotations to specific spans (traces). The code below assumes that your annotation data is available in an annotations_dataframe object and that you have the relevant context.span_id for each span you want to annotate.
Before logging annotations via the SDK, configure each annotation in the UI. Navigate to a trace within your project in the Arize platform.
Click the "Annotate" button to open the annotation panel.
Click "Add Annotation".
Define the <annotation_name> exactly as you will use it in your SDK code.
Select the appropriate Type (Label for categorical strings, Score for numerical values).
If using Label, define the specific allowed label strings that the SDK can send for this annotation name. If using Score, you can optionally define score ranges.
Note: Only annotations matching a pre-configured (1) name and (2) type/labels in the UI can be successfully logged via the SDK.
Here is how you can log your annotations in real time to the Arize platform with the Python SDK:
Import Packages and Set Up the Arize Client
Create Sample Data (replace with your actual data)
Log Annotation
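Below is a minimal sketch of these three steps. The client import path and the exact keyword arguments of log_annotations_sync (for example space_id, api_key, and project_name) are assumptions and may differ across SDK versions; check the SDK reference for the signature that matches your install.

```python
import pandas as pd
from arize.pandas.logger import Client  # import path assumed; adjust to your SDK version

# 1. Set up the Arize client (credentials shown are placeholders).
client = Client(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
)

# 2. Create sample annotation data (replace with your actual data).
#    Each row targets one span via context.span_id and carries one or more
#    annotation.<annotation_name>.<suffix> columns.
annotations_dataframe = pd.DataFrame(
    {
        "context.span_id": ["span_id_1", "span_id_2"],
        "annotation.quality.label": ["good", "bad"],
        "annotation.quality.updated_by": ["reviewer_1", "reviewer_1"],
        "annotation.notes": ["Looks correct", "Hallucinated citation"],
    }
)

# 3. Log the annotations to the project that contains the spans.
#    The project_name keyword is an assumption; use whatever identifier your
#    SDK version expects for the target project/model.
client.log_annotations_sync(
    dataframe=annotations_dataframe,
    project_name="YOUR_PROJECT_NAME",
)
```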
The annotations_dataframe requires the following columns:
context.span_id: The unique identifier of the span to which the annotations should be attached.
Annotation Columns: Columns following the pattern annotation.<annotation_name>.<suffix> where:
<annotation_name>: A name for your annotation (e.g., quality, correctness, sentiment). Should be alphanumeric characters and underscores.
<suffix>: Defines the type and metadata of the annotation. Valid suffixes are:
label: For categorical annotations (e.g., "good", "bad", "spam"). The value should be a string.
score: For numerical annotations (e.g., a rating from 1-5). The value should be numeric (int or float).
updated_by (Optional): A string indicating who made the annotation (e.g., "user_id_123", "annotator_team_a"). If not provided, the SDK automatically sets this to "SDK Logger".
updated_at (Optional): A timestamp indicating when the annotation was made, represented as milliseconds since the Unix epoch (integer). If not provided, the SDK automatically sets this to the current UTC time.
You must provide at least one annotation.<annotation_name>.label or annotation.<annotation_name>.score column for each annotation you want to log.
annotation.notes (Optional): A column containing free-form text notes that apply to the entire span rather than a specific annotation label/score. The value should be a string. The SDK handles formatting this correctly for storage.
An example annotation data dictionary would look like:
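The span ID, annotation names, and values below are placeholders for illustration; the columns follow the schema described above.

```python
import time

import pandas as pd

# Placeholder span ID and values; timestamps are milliseconds since the Unix epoch.
annotation_data = {
    "context.span_id": ["abc123def456"],
    # Categorical annotation named "quality"
    "annotation.quality.label": ["good"],
    "annotation.quality.updated_by": ["user_id_123"],
    "annotation.quality.updated_at": [int(time.time() * 1000)],
    # Numerical annotation named "correctness"
    "annotation.correctness.score": [0.9],
    # Free-form note that applies to the whole span
    "annotation.notes": ["Response was accurate but overly verbose."],
}

annotations_dataframe = pd.DataFrame(annotation_data)
```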
Setup labeling queues
Labeling queues are sets of data that you would like subject matter experts or third parties to label or score against any criteria you specify. You can use these annotations to create golden datasets from experts for fine-tuning, and to find examples where LLM evals and humans disagree.
To use labeling queues, you need:
A dataset you want to annotate
Annotator users in your space
Note: you can assign either annotators or members in your space to a labeling queue. Annotators see a restricted view of the platform (see below).
Annotation criteria
On the settings page, you can invite annotators by adding them as users with the Annotator account role. They will receive an email to join your space and set their password.
After you have created a dataset of traces you want to evaluate, you can create a labeling queue and distribute it to your annotation team. You can then review the records and the annotations provided.
Annotators see the labeling queues they have been assigned and the data they need to annotate, with the label or score to provide shown in the top right. Datasets can contain text, images, and links. Annotators can leave notes and use keyboard shortcuts to annotate faster.