Similarity Search

The Similarity Search feature finds items that are similar to one or more reference embeddings, ranked by cosine similarity. It supports both image and text embeddings.

Currently, similarity search is only available for images, text, and inferences. Support for traces is coming soon.

Key Concepts

  • Reference Embedding: The embedding vector that serves as the baseline for similarity comparisons. Select the column containing these vectors; they represent the characteristics or features you want to match.

  • Search Embedding: The column containing embedding vectors of items to be compared against the reference embedding using cosine similarity.

  • Threshold: A user-defined value that determines the minimum similarity score required for an item to be considered similar to the reference embeddings.
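
To make the three concepts above concrete, here is a minimal pure-Python sketch of a cosine-similarity comparison with a threshold. It is an illustration only, not Arize's implementation: the function names, the toy vectors, and the choice to treat the threshold as inclusive (score >= threshold) are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def find_similar(reference, candidates, threshold):
    """Return (item_id, score) pairs that meet the threshold, best first."""
    scored = [
        (item_id, cosine_similarity(reference, vector))
        for item_id, vector in candidates.items()
    ]
    # Assumption: the threshold is inclusive.
    hits = [(item_id, score) for item_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one reference embedding and three search embeddings.
reference = [1.0, 0.0]
candidates = {"a": [2.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 3.0]}
results = find_similar(reference, candidates, threshold=0.8)
```

With these toy vectors, only item "a" points in the same direction as the reference, so it is the only one that clears a threshold of 0.8. Cosine similarity ignores vector length, which is why "a" matches even though it is twice as long as the reference.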

Selecting an Embedding Cell Directly

  • Hover over an embedding column in the table view and click the “Find Similar” button.

  • Select points in UMAP and then press the “Find Similar” button.

  • Press the “Find Similar” button in dimension details after selecting an embedding or row.

Any selection automatically updates the reference object with the prediction ID and the name of the embedding column.

Additional Features

Multiple Embeddings

  • Add multiple items from any of the entry points.

  • When multiple embeddings are selected, their vectors will be averaged to form the reference embedding.
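
The averaging step can be sketched as an element-wise mean over the selected vectors. This is a sketch under assumptions: the source does not say whether vectors are normalized before averaging, so the example takes a plain mean, and the function name and sample vectors are hypothetical.

```python
def average_embeddings(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one embedding is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical selection of two embeddings from the same column.
selected = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
reference = average_embeddings(selected)
```

Here `reference` is the midpoint of the two selected vectors, and that single vector is then compared against the search column.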

Limitations

  • The search column and the reference column can differ, but all reference points must come from the same column. Adding a reference point from a different column triggers an error modal.

  • Similarity search is only supported in performance tracing and embedding views.

Programmatic Export

How it Works

  1. Define Reference Embeddings: Specify the embeddings you want to use as references. Ensure that all reference embeddings are in the same column.

  2. Set Search Parameters: Define the search embedding column and the similarity threshold.

  3. Execute the Search: Use the provided API to perform the similarity search and retrieve the results.
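
Step 1 requires every reference to come from the same column. A small client-side check before calling the API could look like the sketch below. The `Ref` dataclass is a hypothetical stand-in that only mirrors the two fields used in this page's code example, so the snippet runs without the arize package. The check itself is an assumption and not part of the Arize SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a similarity reference."""
    prediction_id: str
    reference_column_name: str


def check_single_reference_column(refs):
    """Return the shared column name, or raise if references mix columns."""
    if not refs:
        raise ValueError("at least one reference is required")
    columns = {ref.reference_column_name for ref in refs}
    if len(columns) > 1:
        raise ValueError(f"references span multiple columns: {sorted(columns)}")
    return columns.pop()


refs = [Ref("pred_1", "image_vector"), Ref("pred_2", "image_vector")]
column = check_single_reference_column(refs)
```

Failing early on the client gives a clearer message than waiting for the export request to be rejected.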

Prerequisites

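Make sure version 7.18.1 or later of the arize Python package is installed. The leading "!" in the command below is for notebook cells; omit it in a regular shell.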

!pip install -q "arize>=7.18.1"

Code Example

from datetime import datetime

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments, SimilaritySearchParams, SimilarityReference

# Placeholders: fill in your own API key, space ID, and model ID
ARIZE_API_KEY = ""
dev_space_id = ""
dev_model_id = ""

client = ArizeExportClient(api_key=ARIZE_API_KEY)

# Export window (example values only; adjust to your data)
start_time = datetime.fromisoformat("2024-01-01T00:00:00+00:00")
end_time = datetime.fromisoformat("2024-01-31T00:00:00+00:00")

# Establish references (all from the same column)
similarity_references = [
    SimilarityReference(
        prediction_id="pred_1",
        reference_column_name="image_vector",
    ),
    SimilarityReference(
        prediction_id="pred_2",
        reference_column_name="image_vector",
    ),
]

# Define search parameters
search_column_name = "image_vector"
threshold = 0.8

# Execute similarity search using the client created above
df = client.export_model_to_df(
    space_id=dev_space_id,
    model_id=dev_model_id,
    environment=Environments.PRODUCTION,
    start_time=start_time,
    end_time=end_time,
    similarity_search_params=SimilaritySearchParams(
        references=similarity_references,
        search_column_name=search_column_name,
        threshold=threshold,
    ),
)

Copyright © 2023 Arize AI, Inc