Let Arize Generate Your Embeddings
Only available in arize>=6.0.0
Generating embeddings is one more problem to solve on top of making sure your model performs properly. With our Python SDK, you can offload that task to Arize and we will generate the embeddings for you. We use large, pre-trained models that capture information from your inputs and encode it into embedding vectors.
We extract the embeddings in the way appropriate to your use case and return them to you to include in your pandas DataFrame, which you then send to Arize.
Auto-Embeddings works end-to-end, so you don't have to worry about formatting your inputs for the correct model. Pass in your input and an embedding comes out; we take care of everything in between.
To use this functionality as part of our Python SDK, install it with the extra dependencies:
pip install arize[AutoEmbeddings]
You can get an up-to-date table of supported models by running the lines below.
from arize.pandas.embeddings import EmbeddingGenerator
EmbeddingGenerator.list_pretrained_models()
We are constantly innovating, so if you want other models included, reach out to us at [email protected] or in our community Slack!
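Since Auto-Embeddings is only available in arize>=6.0.0, it can be useful to confirm the installed version before relying on it. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `arize_supports_auto_embeddings` are hypothetical and not part of the Arize SDK, and the check covers only the package version, not whether the AutoEmbeddings extras were installed.

```python
from importlib import metadata

# Minimum version taken from the note "Only available in arize>=6.0.0".
MIN_VERSION = (6, 0, 0)


def parse_version(text: str) -> tuple:
    """Turn '6.1.0' into (6, 1, 0), ignoring suffixes such as 'rc1'.

    Hypothetical helper: a deliberately simple parser, not a full
    implementation of Python's version specification.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def arize_supports_auto_embeddings() -> bool:
    """Return True if an installed arize package meets MIN_VERSION."""
    try:
        installed = metadata.version("arize")
    except metadata.PackageNotFoundError:
        # arize is not installed in this environment at all.
        return False
    return parse_version(installed) >= MIN_VERSION
```

If the function returns False, reinstalling with the pip command shown earlier is the likely fix.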
Auto-Embeddings is designed to require minimal code from the user. We only require two steps:
1. Create the generator: instantiate the generator using EmbeddingGenerator.from_use_case() and pass information about your use case, the model to use, and more options depending on the use case; see examples below.
2. Let Arize generate your embeddings: obtain your embeddings column by calling generator.generate_embeddings() and passing the column containing your inputs; see examples below.
Arize expects the DataFrame's index to be sorted and begin at 0. If you perform operations that might affect the index prior to generating embeddings, reset the index as follows:
df = df.reset_index(drop=True)
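As an illustration of when the reset matters, the sketch below filters a DataFrame, which leaves gaps in the index, and then restores a 0-based index. The helper name `ensure_default_index` is hypothetical and not part of the Arize SDK; it assumes that "sorted and begin at 0" means the index is exactly 0, 1, ..., n-1.

```python
import pandas as pd


def ensure_default_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return df with a 0..n-1 index, resetting only when needed.

    Hypothetical helper for illustration; assumes the expected index
    is exactly the integers 0 through len(df) - 1 in order.
    """
    expected = list(range(len(df)))
    if list(df.index) == expected:
        return df
    return df.reset_index(drop=True)


# Filtering keeps the original row labels, so the index becomes [1, 3].
df = pd.DataFrame({"text": ["a", "b", "c", "d"]})
filtered = df[df["text"].isin(["b", "d"])]

# After the reset the index is [0, 1] and the row order is unchanged.
fixed = ensure_default_index(filtered)
```

Sorting, sampling, and concatenating DataFrames can disturb the index in the same way as filtering.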
CV Image Classification
from arize.pandas.embeddings import EmbeddingGenerator, UseCases

# Step 1: create the generator for the image classification use case
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.CV.IMAGE_CLASSIFICATION,
    model_name="google/vit-base-patch16-224-in21k",
    batch_size=100,
)
# Step 2: generate one embedding per local image path in the column
df["image_vector"] = generator.generate_embeddings(
    local_image_path_col=df["local_path"]
)
NLP Sequence Classification
from arize.pandas.embeddings import EmbeddingGenerator, UseCases

# Step 1: create the generator for the sequence classification use case
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
    model_name="distilbert-base-uncased",
    tokenizer_max_length=512,
    batch_size=100,
)
# Step 2: generate one embedding per text entry in the column
df["text_vector"] = generator.generate_embeddings(text_col=df["text"])
Check out our tutorials on generating embeddings for different use cases using Arize.
Use-Case | Code |
---|---|
NLP Sentiment Classification | |
CV Image Classification | |