Arize AI for Agents
Arize AI is an AI engineering platform for developing, evaluating, and observing AI agents, helping developers build robust, high-performing agents.
It has first-class support for agent frameworks such as AutoGen, OpenAI Agents, LangGraph, and smolagents.
Observability is critical for understanding how agents behave in real-world scenarios. Arize AI provides robust tracing capabilities through our open source library, automatically instrumenting your agent applications to capture traces and spans. This includes LLM calls, tool invocations, and data retrieval steps, giving you a detailed view of your agent's workflow.
With just a few lines of code, you can set up tracing for popular frameworks like OpenAI Agents, LangGraph, and Autogen. Learn more about Tracing.
Code Example: Auto Instrumentation for OpenAI Agents
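The following is a minimal sketch of auto-instrumentation for the OpenAI Agents SDK. It assumes the `arize-otel` and `openinference-instrumentation-openai-agents` packages; the space ID, API key, and project name are placeholders you would replace with your own values.

```python
# pip install arize-otel openinference-instrumentation-openai-agents openai-agents

from arize.otel import register
from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

# Register an OpenTelemetry tracer provider that exports spans to Arize.
# Replace the placeholders with your Arize space ID, API key, and project name.
tracer_provider = register(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
    project_name="my-agent-app",
)

# Instrument the OpenAI Agents SDK so agent runs, LLM calls, and tool
# invocations are captured automatically as traces and spans.
OpenAIAgentsInstrumentor().instrument(tracer_provider=tracer_provider)
```

Once instrumented, running your agents as usual sends traces to the named project in Arize.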
Evaluating agent performance is essential to ensure reliability and accuracy. Arize AI's online evaluations automatically tag spans with performance labels, helping you identify problematic interactions and measure key metrics.
Comprehensive Evaluation Templates: Arize provides templates for evaluating various agent components, such as Tool Calling, Path Convergence, and Planning.
Online Evals: Run continuous evaluations on production data to monitor correctness, hallucination, relevance, and latency, ensuring your agents perform consistently across diverse scenarios.
Custom Metrics and Alerts: Track key metrics on custom dashboards and receive alerts when performance deviates from the norm, allowing proactive optimization of agent behavior.
Code Example: Logging Evaluations to Arize
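Below is a sketch of logging evaluation labels back to Arize with the Python SDK. It assumes the `arize` pandas `Client` and its `log_evaluations_sync` method, along with an evaluations DataFrame keyed by span ID; the column names, span IDs, and project name are illustrative placeholders.

```python
# pip install arize pandas

import pandas as pd
from arize.pandas.logger import Client

# Evaluation results keyed by the span they describe. Columns follow the
# eval.<name>.label / eval.<name>.explanation convention (assumed here).
evals_df = pd.DataFrame(
    {
        "context.span_id": ["SPAN_ID_1", "SPAN_ID_2"],
        "eval.tool_calling.label": ["correct", "incorrect"],
        "eval.tool_calling.explanation": [
            "Selected the right tool with valid arguments.",
            "Called the search tool when a calculator was required.",
        ],
    }
)

# Attach the evaluations to the corresponding spans in your Arize project.
arize_client = Client(space_id="YOUR_SPACE_ID", api_key="YOUR_API_KEY")
arize_client.log_evaluations_sync(evals_df, "my-agent-app")
```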
Arize's Prompt Playground is a no-code environment for iterating on prompts and testing agent behaviors, including support for tool calling—a critical feature for agents that interact with external APIs or functions.
Iterate on Prompts: Test different prompt templates, models, and parameters side by side to refine how your agent responds to user inputs.
Tool Calling Support: Debug tool calling directly in the Playground to ensure your agent selects the right tools and parameters. Learn more about Using Tools in Playground.
Save as Experiment: Run systematic A/B tests on datasets to validate agent performance and share results with your team via experiments.
For chatbot or multi-turn agent applications, tracking sessions is invaluable for debugging and performance analysis. Arize AI supports session tracking to group traces based on interactions.
Session ID and User ID: Add session.id and user.id as attributes to spans to group interactions and analyze conversation flows. This helps identify where conversations break or user frustration increases.
Debugging Sessions: Use the Arize platform to filter sessions and find underperforming groups of traces. Learn more about Sessions and Users.
Code Example: Adding Session ID for Agent Chatbot
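Here is a minimal sketch, assuming the `openinference-instrumentation` package's `using_session` and `using_user` context managers; `agent.run` stands in for whatever invocation call your framework provides.

```python
# pip install openinference-instrumentation

from openinference.instrumentation import using_session, using_user

def handle_turn(agent, user_message: str, session_id: str, user_id: str) -> str:
    # Every span created inside this block carries session.id and user.id,
    # so all turns of a conversation are grouped together in Arize.
    with using_session(session_id=session_id), using_user(user_id=user_id):
        return agent.run(user_message)  # hypothetical agent invocation
```

Calling handle_turn with the same session_id for each message in a conversation groups those traces into a single session for filtering and debugging.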
Agent Replay: Replay agent interactions to debug tool calling in a controlled environment. Replay lets you simulate past sessions to test improvements without impacting live users.
Agent Pathing: Analyze and optimize the pathways your agents take to complete tasks. Understand whether agents are taking efficient routes or getting stuck in loops, with tools to refine planning and convergence strategies.
Agent Evaluation Guide
Learn how to evaluate every component of your agent.
Try our Tutorials
Explore example notebooks for agents, RAG, tracing, and evaluations.
Watch our Paper Readings
Dive into video discussions on the latest AI research, including agent architectures.
Join our Slack Community
Connect with other developers to ask questions, share insights, and provide feedback on agent development with Arize.