All Functions

Aggregate and metric function syntax

Functions Overview

This page is a reference of all available functions, grouped into aggregation functions and metric functions. Click the linked model type for documentation on a particular model type.

Aggregation functions

Every User Defined Metric must include at least one aggregation function or metric function.

| Function | Description | Argument Type |
| --- | --- | --- |
| COUNT(*) | Counts the number of rows | n/a |
| APPROX_COUNT_DISTINCT(exprs) | Counts the unique values of exprs | String |
| SUM(exprs) | Sums the value of the expression across rows | Numeric |
| AVG(exprs) | Averages the value of the expression across rows | Numeric |
| APPROX_QUANTILE(exprs, <decimal>) | Approximate quantile of exprs. The second argument must be a numeric literal between 0 and 1 inclusive | Numeric |
| MIN(exprs) | Minimum of the value of the expression across rows | Numeric |
| MAX(exprs) | Maximum of the value of the expression across rows | Numeric |
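
For example, a simple custom metric built from one aggregation function might look like the sketch below. This assumes the SELECT ... FROM model query form described elsewhere in the Arize Query Language Syntax section; prediction_score and state are hypothetical column names.

```sql
-- Hypothetical sketch: average prediction score for a single segment
SELECT AVG(prediction_score)
FROM model
WHERE state = 'CA'
```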

Metric functions

Metric functions leverage existing metrics in Arize for use in your custom metric. They also allow you to customize the way existing metrics are calculated. Metric functions can take both positional arguments and keyword arguments. When using both, positional arguments must come before keyword arguments. Keyword arguments can be specified in any order.

For classification metrics, the model's configured positive class is the default value for pos_class=; we refer to it as defaultPositiveClass throughout this doc.

Note that these functions need actual (a.k.a. ground truth) data to produce results.
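
For example, the following calls to PRECISION (documented below) are equivalent, assuming the positional order matches the signature shown for each function; pos_class='fraud' is a hypothetical value:

```sql
-- Keyword arguments may appear in any order
PRECISION(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class='fraud')
PRECISION(pos_class='fraud', actual=categoricalActualLabel, predicted=categoricalPredictionLabel)
-- Positional arguments must come before any keyword arguments
PRECISION(categoricalActualLabel, categoricalPredictionLabel, pos_class='fraud')
```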

True Positive

TP(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the true positive rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

False Positive

FP(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the false positive rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

True Negative

TN(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the true negative rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

False Negative

FN(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the false negative rate, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

Precision

PRECISION(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes precision, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.

Recall

RECALL(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes recall, using the positive class. If pos_class= is omitted, then the positive class configured for the model is used.
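
For reference, precision and recall relate to the counting functions above through the standard definitions in terms of true/false positive and negative counts:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$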

F1

F1(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes the F1 score, also known as the balanced F-score or F-measure. If pos_class= is omitted, then the positive class configured for the model is used.

F_BETA

F_BETA(actual=categoricalActualLabel, predicted=categoricalPredictionLabel, pos_class=defaultPositiveClass, beta=1)

Model Type: Score Categorical

Computes the F-score with an optional beta= parameter for re-weighting precision and recall. Beta defaults to 1, which produces the same result as the F1 score. When beta=0, the F-score equals precision; as beta goes to infinity, the F-score approaches recall. Commonly used values for beta= are 2, which weighs recall higher than precision, and 0.5, which weighs recall lower than precision. If pos_class= is omitted, then the positive class configured for the model is used.
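
The standard F-beta formula makes this limiting behavior explicit: setting beta to 0 reduces it to precision, and letting beta grow without bound drives it toward recall.

$$F_\beta = (1 + \beta^2)\,\frac{\text{precision}\cdot\text{recall}}{\beta^2\cdot\text{precision} + \text{recall}}$$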

LOG_LOSS

LOG_LOSS(actual=categoricalActualLabel, predicted=scorePredictionLabel, pos_class=defaultPositiveClass)

Model Type: Score Categorical

Computes log loss of the model. Note that actual= is a string column while predicted= is a numeric column.
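
For reference, the conventional binary log loss over N rows is shown below, where y_i is 1 when the actual label equals the positive class (0 otherwise) and p_i is the predicted score:

$$\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\right]$$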

ACCURACY

ACCURACY(actual=categoricalActualLabel, predicted=categoricalPredictionLabel)

Model Type: Score Categorical

Computes accuracy of the model.

MAE

MAE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean absolute error.

MAPE

MAPE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean absolute percentage error.

MSE

MSE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes mean squared error.

RMSE

RMSE(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Numeric, Score Categorical

Computes root mean square error.
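
For reference, the four regression error metrics above follow the standard definitions, with y_i the actual value and ŷ_i the predicted value over N rows (MAPE is sometimes multiplied by 100 to express a percentage):

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad \text{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \qquad \text{RMSE} = \sqrt{\text{MSE}}$$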

AUC

AUC(actual=scoreActualLabel, predicted=scorePredictionLabel)

Model Type: Score Categorical

Computes the area under the ROC curve (ROC AUC).

NDCG

NDCG(ranking_relevance=relevance, prediction_group_id=predictionGroupId, rank=rank, omit_zero_relevance=True, k=10)

Model Type: Ranking

Computes the Normalized Discounted Cumulative Gain (NDCG) of a ranking model. Use omit_zero_relevance= to control whether rows with zero relevance are included, since their inclusion affects the averaging implicit in this metric.
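
One common formulation is shown below, where rel_i is the relevance of the item at rank i and IDCG@k is the DCG@k of the ideal, relevance-sorted ordering; the k= parameter corresponds to the standard NDCG@k cutoff. Variants exist (e.g., a 2^rel − 1 numerator), so treat this as illustrative rather than as Arize's exact implementation.

$$\text{DCG@}k = \sum_{i=1}^{k}\frac{rel_i}{\log_2(i+1)} \qquad \text{NDCG@}k = \frac{\text{DCG@}k}{\text{IDCG@}k}$$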

MAX_PRECISION

MAX_PRECISION(pos_class=positive_class, actual=scoreActualLabel, predicted=scorePredictionLabel, group_by_column='column_to_group_by')

Model Type: Score Categorical

Groups the data by group_by_column, selects only the rows with the highest prediction score in each group, and calculates precision using only those rows.
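
As a sketch, a full custom metric using MAX_PRECISION might look like the following; 'fraud' and 'predictionGroupId' are hypothetical values, and the SELECT ... FROM model form follows the syntax pages in this section:

```sql
-- Hypothetical sketch: precision computed over only the top-scored row per group
SELECT MAX_PRECISION(
    pos_class = 'fraud',
    actual = scoreActualLabel,
    predicted = scorePredictionLabel,
    group_by_column = 'predictionGroupId'
)
FROM model
```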
