Concepts: Inferences


This section introduces inferences and schemas, the starting concepts needed to use Phoenix with inferences.

  • For comprehensive descriptions of phoenix.Inferences and phoenix.Schema, see the API reference.

  • For tips on creating your own Phoenix inferences and schemas, see the how-to guide.

Inferences

A set of Phoenix inferences is an instance of phoenix.Inferences that contains three pieces of information:

  • The data itself (a pandas dataframe)

  • A schema (a phoenix.Schema instance) that describes the columns of your dataframe

  • A name that appears in the UI

For example, if you have a dataframe prod_df that is described by a schema prod_schema, you can define inferences prod_ds with

import phoenix as px

prod_ds = px.Inferences(prod_df, prod_schema, "production")

If you launch Phoenix with these inferences, you will see inferences named "production" in the UI.
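As a minimal sketch, you can then open the Phoenix UI with this inference set:

# Open the Phoenix UI with the "production" inference set defined above.
session = px.launch_app(prod_ds)
print(session.url)  # URL of the running Phoenix app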

How many inferences do I need?

You can launch Phoenix with zero, one, or two sets of inferences.

With no inferences, Phoenix runs in the background and collects trace data emitted by your instrumented LLM application. With a single inference set, Phoenix provides insights into model performance and data quality. With two inference sets, Phoenix compares your inferences and gives insights into drift in addition to model performance and data quality, or helps you debug your retrieval-augmented generation applications.

Use Zero Inference sets When:

  • You want to run Phoenix in the background to collect trace data from your instrumented LLM application.

Use a Single Inference set When:

  • You have only a single cohort of data, e.g., only training data.

  • You care about model performance and data quality, but not drift.

Use Two Inference sets When:

  • You want to compare cohorts of data, e.g., training vs. production.

  • You care about drift in addition to model performance and data quality.

  • You have corpus data for information retrieval. See Corpus Data.

Which inference set is which?

Your reference inferences provide a baseline against which to compare your primary inferences.

To compare two inference sets with Phoenix, you must select one inference set as primary and one to serve as a reference. As the name suggests, your primary inference set contains the data you care about most, perhaps because your model's performance on this data directly affects your customers or users. Your reference inferences, in contrast, are usually of secondary importance and serve as a baseline against which to compare your primary inferences.

Very often, your primary inferences will contain production data and your reference inferences will contain training data. However, that's not always the case; you can imagine a scenario where you want to check your test set for drift relative to your training data, or use your test set as a baseline against which to compare your production data. When choosing primary and reference inference sets, it matters less where your data comes from than how important the data is and what role the data serves relative to your other data.
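As a minimal sketch (using the train_ds and prod_ds inference sets defined later on this page), you pass each set to px.launch_app by role:

# Production is the data we care about most; training serves as the baseline.
session = px.launch_app(primary=prod_ds, reference=train_ds)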

Corpus Inference set (Information Retrieval)

A corpus inference set, containing documents for information retrieval, typically has a different set of columns than those found in the model data from either production or training, and requires a separate schema. See Corpus Data for more details.

Schemas

A Phoenix schema is an instance of phoenix.Schema that maps the columns of your dataframe to fields that Phoenix expects and understands. Use your schema to tell Phoenix what the data in your dataframe means.

For example, if you have a dataframe containing Fisher's Iris data that looks like this:

| sepal_length | sepal_width | petal_length | petal_width | target     | prediction |
|--------------|-------------|--------------|-------------|------------|------------|
| 7.7          | 3.0         | 6.1          | 2.3         | virginica  | versicolor |
| 5.4          | 3.9         | 1.7          | 0.4         | setosa     | setosa     |
| 6.3          | 3.3         | 4.7          | 1.6         | versicolor | versicolor |
| 6.2          | 3.4         | 5.4          | 2.3         | virginica  | setosa     |
| 5.8          | 2.7         | 5.1          | 1.9         | virginica  | virginica  |

your schema might look like this:

schema = px.Schema(
    feature_column_names=[
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
    ],
    actual_label_column_name="target",
    prediction_label_column_name="prediction",
)
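As a runnable sketch, here is the same example end to end; the dataframe below simply mirrors the table above, and the inference set name "iris" is an arbitrary choice for illustration:

import pandas as pd
import phoenix as px

# The Iris rows from the table above.
iris_df = pd.DataFrame(
    {
        "sepal_length": [7.7, 5.4, 6.3, 6.2, 5.8],
        "sepal_width": [3.0, 3.9, 3.3, 3.4, 2.7],
        "petal_length": [6.1, 1.7, 4.7, 5.4, 5.1],
        "petal_width": [2.3, 0.4, 1.6, 2.3, 1.9],
        "target": ["virginica", "setosa", "versicolor", "virginica", "virginica"],
        "prediction": ["versicolor", "setosa", "versicolor", "setosa", "virginica"],
    }
)

# Pair the dataframe with the schema above to create a named inference set.
iris_ds = px.Inferences(iris_df, schema, "iris")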

How many schemas do I need?

Usually one, sometimes two.

Each inference set needs a schema. If your primary and reference inferences have the same format, then you only need one schema. For example, if you have dataframes train_df and prod_df that share an identical format described by a schema named schema, then you can define inference sets train_ds and prod_ds with

train_ds = px.Inferences(train_df, schema, "training")
prod_ds = px.Inferences(prod_df, schema, "production")

Sometimes, you'll encounter scenarios where the formats of your primary and reference inference sets differ. For example, you'll need two schemas if:

  • Your production data has timestamps indicating the time at which an inference was made, but your training data does not (see the sketch below).

  • Your training data has ground truth (what we call actuals in Phoenix nomenclature), but your production data does not.

  • A new version of your model has a different set of features from a previous version.

In cases like these, you'll need to define two schemas, one for each inference set. For example, if you have dataframes train_df and prod_df that are described by schemas train_schema and prod_schema, respectively, then you can define inference sets train_ds and prod_ds with

train_ds = px.Inferences(train_df, train_schema, "training")
prod_ds = px.Inferences(prod_df, prod_schema, "production")
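For instance, here is a sketch of the timestamp case from the list above; the "timestamp" column name is a hypothetical choice for illustration:

# Production data carries a timestamp per inference; training data does not.
prod_schema = px.Schema(
    timestamp_column_name="timestamp",  # hypothetical column name
    feature_column_names=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    actual_label_column_name="target",
    prediction_label_column_name="prediction",
)

# The training schema omits the timestamp mapping but is otherwise identical.
train_schema = px.Schema(
    feature_column_names=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    actual_label_column_name="target",
    prediction_label_column_name="prediction",
)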

Schema for Corpus Inferences (Information Retrieval)

Below is an example schema for a corpus inference set with three columns: the id, text, and embedding for each document in the corpus.

# Map the id, embedding, and text columns of the corpus dataframe.
corpus_schema = px.Schema(
    id_column_name="id",
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="text",
    ),
)
corpus_ds = px.Inferences(corpus_df, corpus_schema)

The only difference for the corpus inference set is that it needs a separate schema, because it has a different set of columns compared to the model data. See the Schemas section for more details.
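As a sketch of how the corpus set fits in at launch time, assuming query_ds is a hypothetical primary inference set containing your retrieval queries:

# The corpus serves as the knowledge base alongside the primary data.
session = px.launch_app(primary=query_ds, corpus=corpus_ds)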





