Trace LLM, Retriever and Tool Spans

In cases where teams want to manually instrument their own spans rather than rely on auto-instrumentation, this section covers the instrumentation attributes for the main span types.

For reference, the OpenInference semantic conventions are documented at https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md.

Before diving into the code, ensure you have correctly configured your tracer.
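
If you have not configured one yet, a minimal sketch using the arize-otel helper package might look like the following (the space ID, API key, and project name are placeholders; all examples on this page assume a tracer created this way):

from arize.otel import register

# Placeholders below: substitute your own Arize credentials and project name
tracer_provider = register(
    space_id="<YOUR_SPACE_ID>",
    api_key="<YOUR_API_KEY>",
    project_name="<YOUR_PROJECT_NAME>",
)
tracer = tracer_provider.get_tracer(__name__)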

General Attributes

These are attributes that can work on any span.

from openinference.semconv.trace import SpanAttributes

def do_work():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, "CHAIN") # see here for a list of span kinds: https://github.com/Arize-ai/openinference/blob/main/python/openinference-semantic-conventions/src/openinference/semconv/trace/__init__.py#L271
        span.set_attribute(SpanAttributes.TAG_TAGS, "['tag1','tag2']") # List of tags to give the span a category
        span.set_attribute(SpanAttributes.INPUT_VALUE, "<INPUT>") # The input value to an operation
        span.set_attribute(SpanAttributes.INPUT_MIME_TYPE, "text/plain") # either text/plain or application/json
        span.set_attribute(SpanAttributes.OUTPUT_VALUE, "<OUTPUT>") # The output value of an operation
        span.set_attribute(SpanAttributes.OUTPUT_MIME_TYPE, "text/plain") # either text/plain or application/json 
        span.set_attribute(SpanAttributes.METADATA, "<ADDITIONAL_METADATA>") # Additional key-value pairs you want to store, as a JSON string
        span.set_attribute(SpanAttributes.IMAGE_URL, "<IMAGE_URL>") # An http or base64 image url
        span.set_attribute("exception.message", "<EXCEPTION_MESSAGE>")
        span.set_attribute("exception.stacktrace", "<EXCEPTION_STACKTRACE>")
        span.set_attribute("exception.type", "<EXCEPTION_TYPE>") # e.g. NullPointerException
        
        
        # do some work that 'span' will track
        print("doing some work...")
        # When the 'with' block goes out of scope, 'span' is closed for you
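
For attributes that hold structured data, such as metadata, a common pattern (shown here as a sketch, assuming the tracer configured above; the metadata keys are arbitrary examples) is to serialize a Python dict to a JSON string before setting it:

import json

from openinference.semconv.trace import SpanAttributes

def do_work_with_metadata():
    with tracer.start_as_current_span("span-name") as span:
        # Serialize structured values to JSON strings before attaching them
        span.set_attribute(SpanAttributes.METADATA, json.dumps({"customer_tier": "enterprise", "region": "us-east-1"}))
        span.set_attribute(SpanAttributes.INPUT_MIME_TYPE, "application/json")
        span.set_attribute(SpanAttributes.INPUT_VALUE, json.dumps({"query": "hello"}))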

LLM

from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes

def llm_call():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.LLM.value) # Mark this span as an LLM span
        span.set_attribute(SpanAttributes.LLM_PROMPT_TEMPLATE_VARIABLES, "<prompt_template_variables>") # JSON of key-value pairs representing the prompt variables applied to the prompt template
        span.set_attribute(SpanAttributes.LLM_PROMPT_TEMPLATE, "<prompt_template>") # Template used to generate prompts, as a Python f-string
        span.set_attribute(SpanAttributes.LLM_PROMPT_TEMPLATE_VERSION, "<prompt_template_version>") # The version of the prompt template, e.g. "v1.0"
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_PROMPT, "<prompt_tokens>") # The number of tokens in the prompt (an integer)
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_COMPLETION, "<completion_tokens>") # The number of tokens in the completion (an integer)
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_TOTAL, "<tokens_total>") # Total number of tokens, including both prompt and completion (an integer)
        span.set_attribute(SpanAttributes.LLM_FUNCTION_CALL, "<function_call_results>") # For models and APIs that support function calling: the JSON returned by the model describing the function(s) to call, including the function name and arguments
        span.set_attribute(SpanAttributes.LLM_INVOCATION_PARAMETERS, "<invocation_parameters>") # The invocation parameters passed to the model, as a JSON string, e.g. '{"model_name": "gpt-3", "temperature": 0.7}'
        span.set_attribute(SpanAttributes.LLM_INPUT_MESSAGES, "<input_messages>") # List of messages sent to the LLM in a chat API request, e.g. [{"message.role": "user", "message.content": "hello"}]
        span.set_attribute(SpanAttributes.LLM_OUTPUT_MESSAGES, "<output_messages>") # Messages received from a chat API, e.g. [{"message.role": "assistant", "message.content": "hello"}]
        span.set_attribute(SpanAttributes.LLM_MODEL_NAME, "<model_name>") # The name of the language model being used
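
Because OpenTelemetry span attributes must be primitive values (or lists of primitives), chat messages are usually attached as flattened, indexed attributes rather than as lists of dicts. A sketch of that pattern, assuming the tracer configured above (the model name and token counts are illustrative):

import json

from openinference.semconv.trace import (
    MessageAttributes,
    OpenInferenceSpanKindValues,
    SpanAttributes,
)

def traced_llm_call(messages, response_text):
    with tracer.start_as_current_span("llm-call") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.LLM.value)
        span.set_attribute(SpanAttributes.LLM_MODEL_NAME, "gpt-4o") # illustrative model name
        span.set_attribute(SpanAttributes.LLM_INVOCATION_PARAMETERS, json.dumps({"temperature": 0.7}))
        # Flatten each chat message into indexed attributes, e.g. llm.input_messages.0.message.role
        for i, message in enumerate(messages):
            span.set_attribute(f"{SpanAttributes.LLM_INPUT_MESSAGES}.{i}.{MessageAttributes.MESSAGE_ROLE}", message["role"])
            span.set_attribute(f"{SpanAttributes.LLM_INPUT_MESSAGES}.{i}.{MessageAttributes.MESSAGE_CONTENT}", message["content"])
        # Record the model's reply as the first output message
        span.set_attribute(f"{SpanAttributes.LLM_OUTPUT_MESSAGES}.0.{MessageAttributes.MESSAGE_ROLE}", "assistant")
        span.set_attribute(f"{SpanAttributes.LLM_OUTPUT_MESSAGES}.0.{MessageAttributes.MESSAGE_CONTENT}", response_text)
        # Token counts are plain integers
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_PROMPT, 25)
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_COMPLETION, 12)
        span.set_attribute(SpanAttributes.LLM_TOKEN_COUNT_TOTAL, 37)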
        

EMBEDDING

from openinference.semconv.trace import EmbeddingAttributes, OpenInferenceSpanKindValues, SpanAttributes

def get_embeddings():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.EMBEDDING.value)
        span.set_attribute(SpanAttributes.EMBEDDING_MODEL_NAME, "<EMBEDDING_MODEL_NAME>") # The name of the embedding model
        for i in range(number_of_embeddings): # number_of_embeddings: how many text/vector pairs you are recording
            span.set_attribute(f"{SpanAttributes.EMBEDDING_EMBEDDINGS}.{i}.{EmbeddingAttributes.EMBEDDING_TEXT}", "<TEXT>") # The text that was embedded
            span.set_attribute(f"{SpanAttributes.EMBEDDING_EMBEDDINGS}.{i}.{EmbeddingAttributes.EMBEDDING_VECTOR}", "<EMBEDDING_VECTOR>") # The embedding vector, a list of floats
        # do some work that 'span' will track
        print("doing some work...")
        # When the 'with' block goes out of scope, 'span' is closed for you

RETRIEVER

Use this span type to log spans for documents retrieved as part of a RAG pipeline. Here is a simple example:

from openinference.semconv.trace import DocumentAttributes, OpenInferenceSpanKindValues, SpanAttributes

def retrieve_documents():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.RETRIEVER.value)
        # Retrieved documents are indexed under retrieval.documents, e.g. retrieval.documents.0.document.id
        span.set_attribute(f"{SpanAttributes.RETRIEVAL_DOCUMENTS}.0.{DocumentAttributes.DOCUMENT_ID}", "<DOCUMENT_ID>") # Unique identifier for a document
        span.set_attribute(f"{SpanAttributes.RETRIEVAL_DOCUMENTS}.0.{DocumentAttributes.DOCUMENT_SCORE}", "<DOCUMENT_SCORE>") # Score representing the relevance of a document
        span.set_attribute(f"{SpanAttributes.RETRIEVAL_DOCUMENTS}.0.{DocumentAttributes.DOCUMENT_CONTENT}", "<DOCUMENT_CONTENT>") # The content of a retrieved document
        span.set_attribute(f"{SpanAttributes.RETRIEVAL_DOCUMENTS}.0.{DocumentAttributes.DOCUMENT_METADATA}", "<DOCUMENT_METADATA_JSON>") # Metadata associated with a document, as a JSON string
        # do some work that 'span' will track
        print("doing some work...")
        # When the 'with' block goes out of scope, 'span' is closed for you

If you are making calls to external APIs to retrieve documents, you can do something like the below:

with tracer.start_as_current_span("Vector Search") as vector_search_span:
    vector_search_span.set_attribute(SpanAttributes.INPUT_VALUE, query_text)
    vector_search_span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.RETRIEVER.value)
    try:
        documents = search_client.search(
            search_text=query_text,
        )
        # Example documents returned from the vector search
        # documents = [
        #         {'document.id': 0, 'document.score': 0.1, 'document.content': "This is a test"}, 
        #         {'document.id': 1, 'document.score': 0.2, 'document.content': "This is a test2"} 
        #     ]
        for i, document in enumerate(documents):
            vector_search_span.set_attribute(f"retrieval.documents.{i}.document.id", document["document.id"])
            vector_search_span.set_attribute(f"retrieval.documents.{i}.document.score", document["document.score"])
            vector_search_span.set_attribute(f"retrieval.documents.{i}.document.content", document["document.content"])
                      
        vector_search_span.set_status(Status(StatusCode.OK, "Vector search successful"))
    except Exception as e:
        vector_search_span.record_exception(e) # Attach the exception details to the span
        vector_search_span.set_status(Status(StatusCode.ERROR, "Vector search failed"))
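
You may also want to attach the retrieved results to the span's output so they are visible in the trace view. A sketch continuing the example above, assuming the returned documents are JSON-serializable:

import json

# Inside the try block above, after iterating over the documents
vector_search_span.set_attribute(SpanAttributes.OUTPUT_VALUE, json.dumps(list(documents)))
vector_search_span.set_attribute(SpanAttributes.OUTPUT_MIME_TYPE, "application/json")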

TOOL

from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes, ToolCallAttributes

def tool_call():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.TOOL.value)
        span.set_attribute(ToolCallAttributes.TOOL_CALL_FUNCTION_NAME, "<NAME_OF_YOUR_TOOL>") # The name of the tool being utilized
        span.set_attribute(ToolCallAttributes.TOOL_CALL_FUNCTION_ARGUMENTS_JSON, str(<JSON_OBJ_OF_FUNCTION_PARAMS>)) # The arguments for the function being invoked by a tool call
        # do some work that 'span' will track
        print("doing some work...")
        # When the 'with' block goes out of scope, 'span' is closed for you
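
As a more concrete sketch (assuming the tracer configured above; get_weather is a hypothetical tool used only for illustration), the arguments are typically serialized to JSON before being attached:

import json

from openinference.semconv.trace import (
    OpenInferenceSpanKindValues,
    SpanAttributes,
    ToolCallAttributes,
)

def get_weather(city: str) -> str:
    # Hypothetical tool used for illustration
    return f"The weather in {city} is sunny."

def traced_tool_call():
    tool_args = {"city": "San Francisco"}
    with tracer.start_as_current_span("get_weather") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.TOOL.value)
        span.set_attribute(ToolCallAttributes.TOOL_CALL_FUNCTION_NAME, get_weather.__name__)
        span.set_attribute(ToolCallAttributes.TOOL_CALL_FUNCTION_ARGUMENTS_JSON, json.dumps(tool_args))
        result = get_weather(**tool_args)
        span.set_attribute(SpanAttributes.OUTPUT_VALUE, result)
        return result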

RERANKER

from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes

def rerank():
    with tracer.start_as_current_span("span-name") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.RERANKER.value)
        span.set_attribute(SpanAttributes.RERANKER_INPUT_DOCUMENTS, "<LIST_OF_INPUT_DOCUMENTS>") # List of documents given to the reranker as input, serialized (e.g. as a JSON string)
        span.set_attribute(SpanAttributes.RERANKER_OUTPUT_DOCUMENTS, "<LIST_OF_OUTPUT_DOCUMENTS>") # List of documents returned by the reranker, serialized (e.g. as a JSON string)
        span.set_attribute(SpanAttributes.RERANKER_QUERY, "<RERANKER_QUERY>") # Query parameter of the reranker
        span.set_attribute(SpanAttributes.RERANKER_MODEL_NAME, "<MODEL_NAME>") # Name of the reranker model
        span.set_attribute(SpanAttributes.RERANKER_TOP_K, "<RERANKER_TOP_K>") # Top K parameter of the reranker
        # do some work that 'span' will track
        print("doing some work...")
        # When the 'with' block goes out of scope, 'span' is closed for you
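
A more concrete sketch (assuming the tracer configured above; the documents, model name, and top-k value are illustrative, and the document lists are serialized with json.dumps as in the placeholder example):

import json

from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes

def rerank_documents(query, documents):
    # documents: list of dicts, e.g. [{"id": "1", "content": "..."}]
    with tracer.start_as_current_span("rerank") as span:
        span.set_attribute(SpanAttributes.OPENINFERENCE_SPAN_KIND, OpenInferenceSpanKindValues.RERANKER.value)
        span.set_attribute(SpanAttributes.RERANKER_QUERY, query)
        span.set_attribute(SpanAttributes.RERANKER_MODEL_NAME, "my-reranker") # illustrative model name
        span.set_attribute(SpanAttributes.RERANKER_TOP_K, 2)
        span.set_attribute(SpanAttributes.RERANKER_INPUT_DOCUMENTS, json.dumps(documents))
        reranked = documents[:2] # stand-in for a real reranker call
        span.set_attribute(SpanAttributes.RERANKER_OUTPUT_DOCUMENTS, json.dumps(reranked))
        return reranked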

Instrumenting tool spans

The example below demonstrates how to manually trace a tool function along with a chat completion response. You'll see how to create spans for both the tool and LLM to capture their input, output, and key events.

import json

#from your_tracer import tracer

def run_tool(tool_function, tool_args):
    # Assumes question, openai_client, model_version, current_user_message, and TEMPERATURE
    # are defined elsewhere in your application
    # first, set context for the current span
    with tracer.start_as_current_span(
        name="Tool - some tool",
        attributes={
            # Set these attributes before calling the tool, in case an exception is raised by the tool
            **{
                "openinference.span.kind": "TOOL",
                "input.value": question,
                "message.tool_calls.0.tool_call.function.name": tool_function.__name__,
                "message.tool_calls.0.tool_call.function.arguments": json.dumps(
                    tool_args
                ),
            },
        },
    ) as tool_span:
        #run tool; output is formatted prompt for chat completion
        resulting_prompt = tool_function(input=tool_args)
        # optional - set the resulting prompt as the tool span output
        tool_span.set_attribute(
            "message.tool_calls.0.tool_call.function.output", resulting_prompt
        )

        # This LLM span nests under the tool span in the trace
        with tracer.start_as_current_span(
            name="Tool - llm response",
            # Set these attributes before calling the LLM
            attributes={
                "openinference.span.kind": "LLM",
                "input.value": resulting_prompt,
            },
        ) as llm_span:
            llm_response = openai_client.chat.completions.create(
                model=model_version,
                messages=[current_user_message],
                temperature=TEMPERATURE,
            )
            # Span attribute values must be primitives, so record the text of the response
            llm_span.set_attribute("output.value", llm_response.choices[0].message.content)
