How To: Annotating Spans
Annotations are custom labels that can be added to traces in LLM applications. AI engineers can use annotations to:
Hand-label / manually label data
Categorize spans or traces
Curate a dataset for experimentation
Log human feedback
User feedback is an important part of improving applications. Annotations enable teams to add their feedback to their trace data, either via the UI or via the API.
When are annotations used?
Find examples where LLM evals and human reviewers agree or disagree, either to improve evals or to flag cases for further review
Subject matter experts (e.g., doctors, legal experts, customer support experts) are often needed to judge how well the application performs as a whole; this is complementary to other evals
(soon) Log feedback from an application directly (via API or latent label)
Annotations are labels that can be applied at the per-span level for LLM use cases. Each annotation is defined by a config (a label or a score), and once configured it is available for any future annotation on the model.
Unstructured text annotations (notes) can also be continuously added.
Users can save and view annotations on a trace and also filter on them.
Annotations can also be performed via our Python SDK. Use the log_annotations_sync function to attach human feedback, corrections, or other annotations to specific spans (traces). The code below assumes that you have annotation data available in an annotations_dataframe object and that you have the relevant context.span_id for each span you want to annotate.
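If your application is instrumented with OpenTelemetry-compatible tracing, one way to capture span IDs for later annotation is sketched below. This is an illustrative pattern rather than an Arize-prescribed approach; it assumes your annotations dataframe uses the 16-character hex form of the OpenTelemetry span ID, and the span name "llm_call" is a placeholder.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

captured_span_ids = []

with tracer.start_as_current_span("llm_call") as span:
    # ... run your LLM call here ...
    # OpenTelemetry span IDs are 64-bit integers; format as a 16-char hex
    # string to match the span IDs shown in trace dataframes and the UI.
    span_id_hex = format(span.get_span_context().span_id, "016x")
    captured_span_ids.append(span_id_hex)
```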
Before logging via the SDK, configure the annotation in the UI. Navigate to a trace within your project in the Arize platform.
Click the "Annotate" button to open the annotation panel.
Click "Add Annotation".
Define the <annotation_name> exactly as you will use it in your SDK code.
Select the appropriate Type (Label for categorical strings, Score for numerical values).
If using Label, define the specific allowed label strings that the SDK can send for this annotation name. If using Score, you can optionally define score ranges.
Note: Only annotations matching a pre-configured (1) name and (2) type/labels in the UI can be successfully logged via the SDK.
Here is how you can log your annotations in real time to the Arize platform with the Python SDK:
Import Packages and Setup Arize Client
Create Sample Data (replace with your actual data)
Log Annotation
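The sketch below illustrates those three steps end to end. It assumes log_annotations_sync is exposed on the arize.pandas.logger Client and accepts a dataframe plus a project_name; the credentials, project name, span IDs, and annotation names (quality, correctness) are placeholders, and the exact constructor and method signatures may vary by SDK version, so confirm them against the Arize Python SDK reference.

```python
# Import Packages and Setup Arize Client
import time

import pandas as pd
from arize.pandas.logger import Client

# Credentials come from the Arize UI (Space Settings).
# Note: older SDK versions may use space_key instead of space_id.
client = Client(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
)

# Create Sample Data (replace with your actual data)
annotations_dataframe = pd.DataFrame(
    {
        # Span IDs of the spans you want to annotate (placeholders).
        "context.span_id": ["6f3b2c1a9d8e7f60", "0a1b2c3d4e5f6789"],
        # Categorical annotation, pre-configured in the UI as a Label type.
        "annotation.quality.label": ["good", "bad"],
        "annotation.quality.updated_by": ["reviewer_1", "reviewer_1"],
        "annotation.quality.updated_at": [int(time.time() * 1000)] * 2,
        # Numeric annotation, pre-configured in the UI as a Score type.
        "annotation.correctness.score": [0.9, 0.2],
        # Free-form notes that apply to the whole span.
        "annotation.notes": ["solid answer", "missed the user's intent"],
    }
)

# Log Annotation
# Assumption: argument names (dataframe, project_name) may differ by version;
# check the Arize Python SDK reference for the exact signature.
response = client.log_annotations_sync(
    dataframe=annotations_dataframe,
    project_name="YOUR_PROJECT_NAME",
)
```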
The annotations_dataframe requires the following columns:
context.span_id: The unique identifier of the span to which the annotations should be attached.
Annotation Columns: Columns following the pattern annotation.<annotation_name>.<suffix> where:
<annotation_name>: A name for your annotation (e.g., quality, correctness, sentiment). It should contain only alphanumeric characters and underscores.
<suffix>: Defines the type and metadata of the annotation. Valid suffixes are:
label: For categorical annotations (e.g., "good", "bad", "spam"). The value should be a string.
score: For numerical annotations (e.g., a rating from 1-5). The value should be numeric (int or float).
updated_by (Optional): A string indicating who made the annotation (e.g., "user_id_123", "annotator_team_a"). If not provided, the SDK automatically sets this to "SDK Logger".
updated_at (Optional): A timestamp indicating when the annotation was made, represented as milliseconds since the Unix epoch (integer). If not provided, the SDK automatically sets this to the current UTC time.
You must provide at least one annotation.<annotation_name>.label or annotation.<annotation_name>.score column for each annotation you want to log.
annotation.notes (Optional): A column containing free-form text notes that apply to the entire span, not a specific annotation label/score. The value should be a string. The SDK will handle formatting this correctly for storage.
An example annotation data dictionary would look like:
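Here is a hedged example, using a hypothetical span ID and an annotation named quality (the annotation name must already be configured in the UI as described above):

```python
import time

# Hypothetical example data; the span ID and annotation name are placeholders.
annotation_data = {
    "context.span_id": ["6f3b2c1a9d8e7f60"],
    "annotation.quality.label": ["good"],
    "annotation.quality.updated_by": ["reviewer_1"],
    "annotation.quality.updated_at": [int(time.time() * 1000)],
    "annotation.notes": ["Response was accurate and well formatted."],
}

# pd.DataFrame(annotation_data) yields the annotations_dataframe used above.
```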