Monitors

Get alerts on your LLM application's health

LLM Monitors allow your team to stay on top of model behavior, application health, and evaluation performance by setting up alerts for key metrics. Whether you’re tracking latency spikes, unexpected outputs, or evaluation quality, monitors keep you informed in real time.

Overview

Monitors can track a variety of metrics related to span properties, evaluations, and performance. You can:

  • Set up custom monitors tailored to your use case

  • Use managed monitors to get started quickly with one-click setup

All monitors support threshold-based alerting, so your team is notified as soon as something goes wrong.
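
Under the hood, threshold-based alerting reduces to a comparison between an observed metric value and a configured bound. Here is a minimal, illustrative sketch in Python; the `Monitor` dataclass and `should_alert` helper are hypothetical, not part of any SDK:

```python
from dataclasses import dataclass

@dataclass
class Monitor:
    metric: str        # name of the metric being watched, e.g. "avg_latency_ms"
    threshold: float   # bound that triggers the alert
    direction: str     # "above" or "below"

def should_alert(monitor: Monitor, observed: float) -> bool:
    """Return True when the observed value crosses the configured threshold."""
    if monitor.direction == "above":
        return observed > monitor.threshold
    return observed < monitor.threshold

# Alert as soon as average latency rises past 500 ms.
latency_monitor = Monitor(metric="avg_latency_ms", threshold=500.0, direction="above")
print(should_alert(latency_monitor, 612.0))  # True -> notify the team
```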

Monitor Types

  1. Custom Monitors

    1. Custom Metric - Use any metric you've defined on your project (e.g., cost or % hallucinated).

      Example:

      • Monitor if % Hallucinated goes above 10% over the past hour.

    2. Span Property - Track properties of spans (requests) such as latency, unique user counts, or missing values (see the sketch after this list).

      Example:

      • Alert if the average latency exceeds 500ms

      • Track if % empty responses on the output attribute exceeds 5%

      • Monitor cardinality of user_ids to detect anomalies in traffic

    3. Eval Monitor - Track the count or rate of failed evaluations based on your defined eval fields (such as correctness or hallucination).

      Example:

      • Monitor if the number of evaluations where label = incorrect exceeds a certain threshold

  2. Managed Monitors

    Managed monitors help you get started fast with out-of-the-box coverage for common metrics.

    With just one click, you can enable monitors for:

    • Latency – Track average latency of your spans

    • Prompt Token Count – Monitor the size of your input prompts

    • Total Token Count – Measure combined prompt + completion token usage

    • Error Count – Alert when the number of failed or errored requests increases

    These are great for baseline observability and catching issues before they escalate.
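
To make the span property examples above concrete, here is a small sketch that computes all three metrics over a window of spans and checks them against thresholds. The span records and field names (latency_ms, output, user_id) are hypothetical; your actual span schema will differ:

```python
from statistics import mean

# Hypothetical window of spans; in practice these come from your tracing pipeline.
spans = [
    {"latency_ms": 320, "output": "The capital is Paris.", "user_id": "u1"},
    {"latency_ms": 910, "output": "", "user_id": "u2"},
    {"latency_ms": 450, "output": "42", "user_id": "u1"},
]

avg_latency_ms = mean(s["latency_ms"] for s in spans)                    # average latency
pct_empty = 100 * sum(1 for s in spans if not s["output"]) / len(spans)  # % empty outputs
unique_users = len({s["user_id"] for s in spans})                        # user_id cardinality

print(f"avg latency: {avg_latency_ms:.0f} ms (alert if > 500)")
print(f"% empty outputs: {pct_empty:.1f}% (alert if > 5%)")
print(f"unique users: {unique_users} (watch for sudden changes)")
```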

Best Practices

  • Use custom span property monitors to track application-specific issues like latency spikes or cardinality anomalies.

  • Leverage eval monitors to track model quality based on your custom evaluations.

  • Set thresholds based on historical patterns or business SLAs (see the sketch below).
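
One common way to derive a threshold from historical patterns is to anchor it at a high percentile of a trailing baseline window. A sketch, with made-up latency values:

```python
from statistics import quantiles

# Hypothetical latencies (ms) observed over a trailing baseline window.
baseline_latencies_ms = [220, 250, 260, 270, 280, 290, 300, 310, 350, 400]

# quantiles(..., n=20) returns 19 cut points; index 18 is the 95th percentile.
p95_ms = quantiles(baseline_latencies_ms, n=20)[18]
print(f"suggested latency alert threshold: ~{p95_ms:.0f} ms")
```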

Example Use Cases

Monitor Type    | Use Case
--------------- | -------------------------------------------------
Custom Metric   | Alert if % Hallucinated exceeds 5%
Span Property   | Detect spikes in latency or missing values
Eval Monitor    | Catch increases in incorrect application outputs
Managed Monitor | Stay informed on token usage and error counts
