Similarity Search

The Similarity Search feature allows you to find items that are similar to a set of reference embeddings using cosine similarity. This feature supports both image and text embeddings.

Last updated 9 months ago

Key Concepts

Reference Embedding: The embedding vector used as the baseline for similarity comparisons. Select the column that contains these vectors; they represent the characteristics or features you want to match.
Search Embedding: The column containing embedding vectors of items to be compared against the reference embedding using cosine similarity.
Threshold: A user-defined value that determines the minimum similarity score required for an item to be considered similar to the reference embeddings.
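
To make the threshold concrete, the following minimal sketch computes cosine similarity in plain Python and keeps only the items whose score meets the threshold. The function names (`cosine_similarity`, `filter_similar`) and the sample vectors are illustrative assumptions and are not part of the Arize API. This page does not say whether the threshold comparison is inclusive, so the `>=` below is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def filter_similar(reference, candidates, threshold):
    """Keep (item_id, score) pairs whose similarity meets the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    results = []
    for item_id, vector in candidates.items():
        score = cosine_similarity(reference, vector)
        if score >= threshold:
            results.append((item_id, score))
    return results


# Hypothetical data for illustration only
reference = [1.0, 0.0]
candidates = {
    "a": [2.0, 0.0],  # same direction as the reference
    "b": [0.0, 1.0],  # orthogonal to the reference
    "c": [1.0, 1.0],  # 45 degrees from the reference
}
similar = filter_similar(reference, candidates, threshold=0.8)
```

With a threshold of 0.8, only item `a` is kept: it points in the same direction as the reference and scores 1.0, while `c` scores about 0.71 and `b` scores 0. Note that cosine similarity ignores vector length, which is why `a` matches even though it is twice as long as the reference.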

Performing Similarity Search

Selecting an Embedding Cell Directly

Hover over a cell in an embedding column in the table view and click the “Find Similar” button.

Select points in UMAP and then press the “Find Similar” button.

Press the “Find Similar” button in dimension details after selecting an embedding or row.

Any selection automatically updates the reference object with the prediction ID and the name of the embedding column.

Additional Features

Multiple Embeddings

Add multiple items from any of the entry points.
When multiple embeddings are selected, their vectors will be averaged to form the reference embedding.
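
The averaging step can be sketched as an element-wise mean. This is only an illustration under stated assumptions: `average_embeddings` is a hypothetical helper, the vectors are made up, and this page does not say whether the vectors are normalized before or after averaging, so the sketch uses a plain arithmetic mean.

```python
def average_embeddings(vectors):
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) groups the values found at each position across vectors
    return [sum(values) / len(vectors) for values in zip(*vectors)]


# Hypothetical selected embeddings
selected = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
]
reference_embedding = average_embeddings(selected)
```

Here `reference_embedding` is `[2.0, 2.0, 1.0]`, the midpoint of the two selected vectors. Items are then compared against this single averaged vector rather than against each selection individually.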

Limitations

The search column and the reference column can differ, but all reference points must come from the same column. Adding a reference point from a different column than the existing references triggers an error modal.
Similarity search is only supported in performance tracing and embedding views.

Programmatic Export

How it Works

Define Reference Embeddings: Specify the embeddings you want to use as references. Ensure that all reference embeddings are in the same column.
Set Search Parameters: Define the search embedding column and the similarity threshold.
Execute the Search: Use the provided API to perform the similarity search and retrieve the results.
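
Because all references must share one column, it can help to check that before calling the export API. The sketch below is a hypothetical pre-flight check, not something the Arize SDK provides: `Reference` is a stand-in dataclass with the same two fields used by `SimilarityReference` in the example on this page, and `shared_reference_column` is an assumed helper name.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Stand-in carrying the two fields the SimilarityReference example uses."""

    prediction_id: str
    reference_column_name: str


def shared_reference_column(references):
    """Return the single column all references use, or raise ValueError."""
    columns = {ref.reference_column_name for ref in references}
    if not columns:
        raise ValueError("at least one reference is required")
    if len(columns) > 1:
        raise ValueError(f"references span multiple columns: {sorted(columns)}")
    return columns.pop()


# Hypothetical references that correctly share one column
references = [
    Reference(prediction_id="pred_1", reference_column_name="image_vector"),
    Reference(prediction_id="pred_2", reference_column_name="image_vector"),
]
column = shared_reference_column(references)
```

Running a check like this locally surfaces a mixed-column mistake with a clear message before any request is sent.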

Prerequisites

Make sure you have at least version 7.18.1 of Arize installed:

!pip install -q "arize>=7.18.1"

Code Example

from datetime import datetime

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments, SimilarityReference, SimilaritySearchParams

# Fill in your own credentials and identifiers
ARIZE_API_KEY = ""
space_id = ""
model_id = ""

client = ArizeExportClient(api_key=ARIZE_API_KEY)

# Time window to export (placeholder dates; adjust to your data)
start_time = datetime(2024, 1, 1)
end_time = datetime(2024, 1, 31)

# Establish references (all references must use the same column)
similarity_references = [
    SimilarityReference(
        prediction_id="pred_1",
        reference_column_name="image_vector",
    ),
    SimilarityReference(
        prediction_id="pred_2",
        reference_column_name="image_vector",
    ),
]

# Define search parameters
search_column_name = "image_vector"
threshold = 0.8  # minimum cosine similarity for an item to be returned

# Execute similarity search and load the matching rows into a DataFrame
df = client.export_model_to_df(
    space_id=space_id,
    model_id=model_id,
    environment=Environments.PRODUCTION,
    start_time=start_time,
    end_time=end_time,
    similarity_search_params=SimilaritySearchParams(
        references=similarity_references,
        search_column_name=search_column_name,
        threshold=threshold,
    ),
)