Examples can be found here:
The UMAP visualization requires each datapoint's prediction ID to be unique. If multiple predictions are sent with the same prediction ID, the UMAP visualization cannot fetch all of the columns (features, tags, etc.) of that datapoint. Because some fields of the datapoint cannot be fetched, some color-by options will be restricted.
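Before sending data, it can help to check for repeated prediction IDs. The following is a minimal sketch in plain Python, not part of the Arize SDK; the helper name `find_duplicate_prediction_ids` and the sample IDs are hypothetical.

```python
from collections import Counter


def find_duplicate_prediction_ids(prediction_ids):
    """Return the prediction IDs that occur more than once, in first-seen order."""
    counts = Counter(prediction_ids)
    return [pid for pid, n in counts.items() if n > 1]


# Hypothetical IDs for illustration only.
ids = ["a1", "b2", "a1", "c3", "b2", "d4"]
duplicates = find_duplicate_prediction_ids(ids)
print(duplicates)  # ['a1', 'b2']
```

An empty result means every prediction ID in the batch is unique.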
Different embedding features can have vectors with different dimensionality. However, Arize currently only supports one vector dimensionality per embedding feature. We are working to support multiple dimensionalities within the same embedding feature.
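A quick local check can confirm that all vectors of one embedding feature share the same length. This is an illustrative sketch in plain Python; the helper names and sample vectors are assumptions, not Arize functions.

```python
def vector_dimensions(vectors):
    """Return the set of distinct vector lengths found in `vectors`."""
    return {len(v) for v in vectors}


def has_single_dimensionality(vectors):
    """True when every vector has the same length (or there are no vectors)."""
    return len(vector_dimensions(vectors)) <= 1


# Hypothetical vectors for a single embedding feature.
consistent = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mixed = [[0.1, 0.2, 0.3], [0.4, 0.5]]

print(has_single_dimensionality(consistent))  # True
print(has_single_dimensionality(mixed))       # False
print(sorted(vector_dimensions(mixed)))       # [2, 3]
```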
The vector attribute of Arize's embedding object must be an array of floats. Strings are not allowed.
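As a rough illustration, a vector can be normalized to a list of floats before it is attached to an embedding, rejecting strings early. The helper `to_float_vector` below is hypothetical and uses only plain Python; it is a sketch of one possible pre-check, not Arize's own validation.

```python
def to_float_vector(values):
    """Convert a sequence of numbers to a list of floats, rejecting strings."""
    if isinstance(values, (str, bytes)):
        raise TypeError("vector must be a sequence of numbers, not a string")
    vector = []
    for v in values:
        if isinstance(v, (str, bytes)):
            raise TypeError(f"vector element {v!r} is a string, expected a number")
        vector.append(float(v))
    return vector


print(to_float_vector([1, 2.5, 3]))  # [1.0, 2.5, 3.0]

try:
    to_float_vector(["0.1", "0.2"])
except TypeError as err:
    print("rejected:", err)
```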
Regular features and embedding features are ingested into Arize through two different lists of column names. In short, embedding column names should not be included in feature_column_names. Check out our resources to learn more.
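One way to keep the two lists separate is to derive the regular feature list by excluding the embedding columns. The sketch below is plain Python; the helper `split_feature_columns` and all column names are hypothetical, and how the resulting lists are passed to Arize is not shown here.

```python
def split_feature_columns(all_columns, embedding_columns, excluded_columns=()):
    """Return the columns that are neither embedding columns nor explicitly excluded."""
    excluded = set(embedding_columns) | set(excluded_columns)
    return [c for c in all_columns if c not in excluded]


# Hypothetical column names for illustration only.
columns = ["prediction_id", "age", "state", "text_vector", "image_vector"]
embedding_columns = ["text_vector", "image_vector"]

feature_column_names = split_feature_columns(
    columns, embedding_columns, excluded_columns=["prediction_id"]
)
print(feature_column_names)  # ['age', 'state']
```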
In our testing, Euclidean distance has identified movements of embeddings across many use cases. There will be support for more metrics, such as cosine similarity, as the ecosystem develops. Learn more on monitoring embedding drift here.
Inside the Arize platform, Euclidean distance is calculated using the original embeddings, not the UMAP projections. For visualization purposes, we take a sample from those embeddings and use UMAP to project them into a 2D or 3D space.
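To make the idea concrete, here is a small sketch of a Euclidean distance computed on full-dimensional vectors rather than on projections. It assumes the two sets of embeddings are compared through their centroids (mean vectors); that comparison strategy is an assumption made for illustration, and the exact computation Arize performs is not specified here. All names and values are hypothetical.

```python
import math


def centroid(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def euclidean_distance(a, b):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical original (unprojected) embeddings.
baseline = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
production = [[4.0, 4.0, 0.0], [4.0, 4.0, 0.0]]

distance = euclidean_distance(centroid(baseline), centroid(production))
print(distance)  # 5.0
```

Because the distance is taken in the original space, it does not depend on how UMAP happens to lay out the sampled points.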
Any use case where embeddings are available or can be extracted. A few examples are computer vision, natural language processing, deep learning, and hierarchical embedding use cases.
You can set a Euclidean distance monitor using the UI or through our Monitors API. Once you create a monitor and select the embedding feature of interest, Arize can track and monitor your embeddings for drift.
Arize has the fortune to count Dr. Leland McInnes (one of the creators of UMAP) from the Tutte Institute for Mathematics and Computing as an advisor. He continues to help us develop capabilities in the space.