Multi-Class Classification

How to log your model schema for multi-class classification models


Multi-Class Classification Overview

A multi-class classification model is a classification model that predicts among more than two classes.

Supported Metrics

Micro-Averaged Precision, Micro-Averaged Recall, Macro-Averaged Precision, Macro-Averaged Recall, Precision for a Class, Recall for a Class
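To make the difference between micro- and macro-averaging concrete, here is a rough plain-Python sketch (an illustration only, not the Arize implementation): micro-averaging pools true/false positives and negatives across all classes before dividing, while macro-averaging computes each class's precision and recall first and then averages them equally.

```python
from collections import Counter

def micro_macro_precision_recall(y_true, y_pred, classes):
    """Illustrative micro/macro-averaged precision & recall for
    single-label multi-class predictions (not Arize's internal code)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted p, but p was wrong
            fn[t] += 1  # true class t was missed
    # Micro: pool counts across classes before dividing
    tp_sum = sum(tp[c] for c in classes)
    micro_p = tp_sum / (tp_sum + sum(fp[c] for c in classes))
    micro_r = tp_sum / (tp_sum + sum(fn[c] for c in classes))
    # Macro: per-class metrics ("Precision/Recall for a Class"), then equal-weight average
    per_p = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes]
    per_r = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes]
    return micro_p, micro_r, sum(per_p) / len(classes), sum(per_r) / len(classes)
```

Note that for single-label multi-class data, micro-averaged precision and recall both reduce to overall accuracy, whereas macro-averaging weights rare classes equally with common ones.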

How To Log Multi-Class Data

Log multi-class classification models based on your use case

Single-Label

A prediction with exactly one label, e.g. a passenger can be in EITHER economy, business, OR first class. Expected fields:

  • prediction scores (dictionary)

  • actual scores (dictionary, optional)

Multi-Label

A prediction that can have multiple labels, e.g. a song can span multiple genres such as 'pop-rock'. Expected fields:

  • prediction scores (dictionary)

  • threshold scores (dictionary)

  • actual scores (dictionary, optional)

Single-Label Use Case

Example Row

prediction_scores: [{"class_name": "economy_class", "score": 0.81}, {"class_name": "business_class", "score": 0.42}, {"class_name": "first_class", "score": 0.35}]

actual_scores: [{"class_name": "economy_class", "score": 1}]

Note: class economy_class has the highest prediction score and will be the prediction label

Code Example

# Batch (Pandas) logging
from arize.pandas.logger import Client
from arize.utils.types import Environments, ModelTypes, Schema

schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_score_column_name="prediction_scores",
    actual_score_column_name="actual_scores",
)

response = arize_client.log(
    model_id="multiclass-classification-single-label-example",
    model_version="v1",
    model_type=ModelTypes.MULTI_CLASS,
    dataframe=example_dataframe,
    schema=schema,
    environment=Environments.PRODUCTION,
)
# Single-record logging
from arize.api import Client
from arize.utils.types import (
    Environments,
    ModelTypes,
    MultiClassActualLabel,
    MultiClassPredictionLabel,
)

# Predicting likelihood of Economy, Business, or First Class
"""
record = {
    "prediction_scores": {
        "economy_class": 0.81,
        "business_class": 0.42,
        "first_class": 0.35
    },
    "actual_scores": {
        "first_class": 1
    }
}
"""
prediction_label = MultiClassPredictionLabel(
    prediction_scores=record["prediction_scores"],
)
actual_label = MultiClassActualLabel(
    actual_scores=record["actual_scores"],
)
response = arize_client.log(
    model_id="multiclass-classification-single-label-example",
    model_version="v1",
    model_type=ModelTypes.MULTI_CLASS,
    prediction_id=record["prediction_id"],
    prediction_label=prediction_label,
    actual_label=actual_label,
    environment=Environments.PRODUCTION,
)


Multi-Class Prediction

Log multi-class prediction values using the MultiClassPredictionLabel object, which can be assigned to the prediction_label parameter.

Prediction scores are required to use MULTI_CLASS models in Arize.

class MultiClassPredictionLabel(
    prediction_scores: Dict[str, Union[float, int]]
)

Multi-Class Actual Values

Log multi-class actual values by using the MultiClassActualLabel object, which can be assigned to the actual_label parameter.

class MultiClassActualLabel(
    actual_scores: Dict[str, Union[float, int]]
)

For more information, see the Python Single Record Logging API Reference.

Multi-Label Use Case

Example Row

prediction_scores: [{"class_name": "jazz", "score": 0.81}, {"class_name": "rock", "score": 0.42}, {"class_name": "pop", "score": 0.35}]

threshold_scores: [{"class_name": "jazz", "score": 0.5}, {"class_name": "rock", "score": 0.4}, {"class_name": "pop", "score": 0.6}]

actual_scores: [{"class_name": "rock", "score": 1}]

Note: classes jazz and rock have prediction scores > threshold scores and will be part of the prediction label.

Code Example

schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_score_column_name="prediction_scores",
    multi_class_threshold_scores_column_name="threshold_scores",
    actual_score_column_name="actual_scores",
)

response = arize_client.log(
    model_id="multiclass-classification-multi-label-example",
    model_version="v1",
    model_type=ModelTypes.MULTI_CLASS,
    dataframe=example_dataframe,
    schema=schema,
    environment=Environments.PRODUCTION,
)
# Predicting song genre within pop, rock, and jazz categories
"""
record = {
    "prediction_id": "57437c5d-0cd2-48bf-aa91-379302fd0fe4",
    "prediction_scores": {
        "pop": 0.5044088627000001,
        "rock": 0.3688503145,
        "jazz": 0.1267408228
    },
    "threshold_scores": {
        "pop": 0.50,
        "rock": 0.40,
        "jazz": 0.10
    },
    "actual_scores": {
        "pop": 1,
        "jazz": 1
    }
}
"""
prediction_label = MultiClassPredictionLabel(
    prediction_scores=record["prediction_scores"],
    threshold_scores=record["threshold_scores"],  # required for multi-label
)
actual_label = MultiClassActualLabel(
    actual_scores=record["actual_scores"],
)
response = arize_client.log(
    model_id="multiclass-classification-multi-label-example",
    model_version="v1",
    model_type=ModelTypes.MULTI_CLASS,
    prediction_id=record["prediction_id"],
    prediction_label=prediction_label,
    actual_label=actual_label,
    environment=Environments.PRODUCTION,
)

Multi-Class Prediction & Threshold Values

Log multi-class prediction and threshold values using the MultiClassPredictionLabel object, which can be assigned to the prediction_label parameter.

Prediction scores are required to use MULTI_CLASS models in Arize.

class MultiClassPredictionLabel(
    prediction_scores: Dict[str, Union[float, int]],
    threshold_scores: Dict[str, Union[float, int]] = None  # for multi-label use cases
)

Multi-Class Actual Values

Log multi-class actual values by using the MultiClassActualLabel object, which can be assigned to the actual_label parameter.

class MultiClassActualLabel(
    actual_scores: Dict[str, Union[float, int]]
)

For more information, see the Python Single Record Logging API Reference.


Inferring Labels From Uploaded Scores

To calculate metrics and visualize & troubleshoot data for multi-class models, Arize automatically infers prediction & actual labels from the scores that you upload.

Learn how each case is determined below.

Single-Label

  • Prediction label: for each prediction, the class with the highest prediction score is the prediction label.

  • Actual label: the class with an actual score of 1 is the actual label.

Multi-Label

  • Prediction label: each class must have both a prediction score and a threshold score; every class whose prediction score is greater than its threshold score is part of the prediction label.

  • Actual label: each class with an actual score of 1 is part of the actual label.

Actual values are optional, and can be sent later via delayed actuals.
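The inference rules above can be sketched in a few lines of Python (an illustrative sketch of the logic, not Arize's internal code):

```python
def infer_single_label(prediction_scores):
    # Single-label: the class with the highest prediction score wins
    return max(prediction_scores, key=prediction_scores.get)

def infer_multi_label(prediction_scores, threshold_scores):
    # Multi-label: every class whose score exceeds its threshold is included
    return [c for c, s in prediction_scores.items() if s > threshold_scores[c]]

def infer_actual_labels(actual_scores):
    # Actual label(s): every class with an actual score of 1
    return [c for c, s in actual_scores.items() if s == 1]
```

Applied to the example rows above, the single-label flight record resolves to economy_class, and the multi-label song record resolves to jazz and rock (0.81 > 0.5 and 0.42 > 0.4, while pop's 0.35 does not clear its 0.6 threshold).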

๐Ÿ“ˆ
delayed actuals
Single Record Logging
delayed actuals
Single Record Logging
Google Colaboratory
Single-Label Use Case Colab
Logo
Google Colaboratory
Single-Label Use Case Single RecordColab
Logo
Google Colaboratory
Multi-Label Use Case Colab
Logo
Google Colaboratory
Multi-Label Use Case, Single Record Colab
Logo
Micro-Averaged Precision
Micro-Averaged Recall
Macro-Averaged Precision
Macro-Averaged Recall