Quickstart: Inferences

Observability for all model types (LLM, NLP, CV, Tabular)

Overview

Phoenix Inferences lets you observe the performance of your model by visualizing all of the model's inferences in one interactive UMAP view.

This powerful visualization can be leveraged during EDA to understand model drift, find low-performing clusters, uncover retrieval issues, and export data for retraining or fine-tuning.

Quickstart

The following Quickstart can be executed in a Jupyter notebook or Google Colab.

We will begin by logging just a training set, then add a production set for comparison.

Step 1: Install and load dependencies

Use pip or conda to install arize-phoenix. Since we are going to do embedding analysis, we must also install the embeddings extra.

!pip install 'arize-phoenix[embeddings]'

import phoenix as px

Step 2: Prepare model data

Phoenix visualizes data from a pandas DataFrame, where each row encompasses all of the information about a single inference (feature values, prediction, metadata, etc.).

For this Quickstart, we will show an example of visualizing the inferences from a computer vision (CV) model. See the example notebooks for other model types.

Let’s begin by working with the training set for this model.

Download the dataset and load it into a pandas DataFrame:

import pandas as pd

train_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet"
)

Preview the DataFrame with train_df.head() and note that each row contains all of the data for a single inference from this CV model:

train_df.head()
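
Optionally, as a quick sanity check before moving on, you can confirm that the columns the Schema in the next step will reference are present (the column names below come from the schema defined in Step 3):

# Optional sanity check: confirm the columns the Step 3 Schema will reference exist
expected_columns = {
    "prediction_ts",     # timestamp of each inference
    "predicted_action",  # the model's predicted label
    "actual_action",     # the ground-truth label
    "image_vector",      # embedding vector for the image
    "url",               # link to the raw image
}
missing = expected_columns - set(train_df.columns)
print("Missing columns:", missing or "none")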

Step 3: Define a Schema

Before we can log these inferences, we need to define a Schema object to describe them.

The Schema object tells Phoenix which fields the columns of the DataFrame map to.

Here we define a Schema to describe our particular CV training set:

# Define Schema to indicate which columns in train_df should map to each field
train_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    actual_label_column_name="actual_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)

Important: The fields used in a Schema will vary depending on the model type that you are working with.
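
For instance, a tabular model would typically map feature and tag columns rather than an image embedding. The sketch below is illustrative only; the column names are hypothetical, and the exact fields you need depend on your model:

# Hypothetical Schema for a tabular model (column names are illustrative only)
tabular_schema = px.Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_label",
    prediction_score_column_name="predicted_score",
    actual_label_column_name="actual_label",
    feature_column_names=["age", "account_balance", "num_purchases"],
    tag_column_names=["region"],
)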

Step 4: Wrap into Inferences object

Wrap your train_df and train_schema into a Phoenix Inferences object:

train_ds = px.Inferences(dataframe=train_df, schema=train_schema, name="training")

Step 5: Launch Phoenix!

We are now ready to launch Phoenix with our Inferences!

Here, we are passing train_ds as the primary inferences, as we are only visualizing one inference set (see Step 6 for adding additional inference sets).

session = px.launch_app(primary=train_ds)

Running this will fire up a Phoenix visualization. Follow the instructions in the output to view Phoenix in a browser, or inline in your notebook:

🌍 To view the Phoenix app in your browser, visit https://x0u0hsyy843-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

You are now ready to observe the training set of your model!
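
A few session helpers are handy at this point (a minimal sketch; session is the object returned by px.launch_app above, and px.close_app() shuts the app down when you are finished exploring):

# Print the URL of the running Phoenix app
print(session.url)

# Render the app inline in a Jupyter/Colab notebook
px.active_session().view()

# Shut the app down when you are completely done exploring
# px.close_app()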

Optional: try the exercises in Checkpoint A to familiarize yourself further with Phoenix, and discuss your answers in our community. Note that Phoenix automatically generates clusters on your data using a clustering algorithm called HDBSCAN (more information: https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters).

Step 6 (Optional): Add comparison data

We will continue with our CV model example above and add a set of production data from our model to our visualization.

In order to visualize drift, conduct A/B model comparisons, or, in the case of an information retrieval use case, compare inferences against a corpus, you need to add a comparison inference set to your visualization. Adding our production data will allow us to analyze drift and conduct A/B comparisons against our training set.

a) Prepare production inferences

prod_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet"
)

prod_df.head()

b) Define model schema

Note that this schema differs slightly from our train_schema above, as our prod_df does not have a ground truth column!

prod_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)

When do I need a different schema?

In general, if both sets of inferences you are visualizing have identical schemas, you can reuse the Schema object.

However, there are often differences between the schema of a primary and reference dataset. For example:

  • Your production set does not include any ground truth, but your training set does.

  • Your primary dataset is the set of prompt-responses in an LLM application, and your reference is your corpus (see the corpus schema sketch after this list).

  • Your production data has differing timestamps across inferences, but your training set does not have a timestamp column.

Read more about comparison dataset Schemas in How many schemas do I need?
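
For the corpus case mentioned above, the reference documents get their own schema and are passed to launch_app separately. Here is a minimal sketch based on the Corpus Data guide, where corpus_df, query_ds, and the column names are hypothetical:

# Hypothetical corpus schema: each row of corpus_df is a document in your knowledge base
corpus_schema = px.Schema(
    id_column_name="id",
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="text_vector",
        raw_data_column_name="text",
    ),
)
corpus_ds = px.Inferences(dataframe=corpus_df, schema=corpus_schema, name="corpus")

# The corpus is passed alongside your primary inferences when launching the app
session = px.launch_app(primary=query_ds, corpus=corpus_ds)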

c) Wrap into Inferences object

prod_ds = px.Inferences(dataframe=prod_df, schema=prod_schema, name="production")

d) Launch Phoenix with both Inferences!

This time, we will include both train_ds and prod_ds when calling launch_app.

session = px.launch_app(primary=prod_ds, reference=train_ds)

What data should I set as `reference` and as `primary`? Set the inferences you want to use as the referential baseline as your reference, and the inferences you'd like to actively evaluate as your primary.

In this case, training is our referential baseline, against which we want to gauge the behavior of our production data (e.g. to evaluate drift).

Once again, open the Phoenix app using the new link generated by your session, e.g.:

🌍 To view the Phoenix app in your browser, visit https://x0u0hsyy845-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

You are now ready to conduct comparative Root Cause Analysis!

Optional: try the exercises in Checkpoint B to familiarize yourself further with Phoenix, and discuss your answers in our community.

Step 7 (Optional): Export data

Once you have identified data points of interest, you can export them directly from the Phoenix app for further analysis, or incorporate them into downstream model retraining and fine-tuning flows. See more on exporting data in the Export Data guide.
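
For example, after exporting a cluster or selection from the app, you can pull the file back into your notebook for downstream work (a minimal sketch; the file path below is hypothetical and depends on where you saved the export):

import pandas as pd

# Hypothetical path to a parquet file exported from the Phoenix UI
exported_df = pd.read_parquet("exported_cluster.parquet")

# Use the exported rows for further analysis, retraining, or fine-tuning
print(exported_df.shape)
exported_df.head()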

Step 8 (Optional): Enable production observability with Arize

Once your model is ready for production, you can add Arize to enable production-grade observability. Phoenix works in conjunction with Arize to enable end-to-end model development and observability.

With Arize, you will additionally benefit from:

  • Publishing and observing your models in real time as inferences are served, and/or via direct connectors from your table/storage solution

  • Scalable compute to handle billions of predictions

  • The ability to set up monitors and alerts

  • Production-grade observability

  • Integration with Phoenix, from model iteration to observability

  • Enterprise-grade RBAC and SSO

  • The ability to experiment with infinite permutations of model versions and filters

Create your free Arize account and see the full suite of features.

Where to go from here?

  • Read more about Embeddings Analysis.

  • For examples of how a Schema is defined for other model types (NLP, tabular, LLM-based applications), see the example notebooks under Embedding Analysis and Structured Data Analysis.

Questions?

Join the Phoenix Slack community to ask questions, share findings, provide feedback, and connect with other developers.
