Arize AI
7b. Embedding Features
Embedding features are the unstructured data inputs of a model, represented as embedding vectors.
Arize's embedding objects are composed of 3 different pieces of information:
  • vector (required): the embedding vector itself, representing the unstructured input data.
  • data (optional): the raw data represented by the embedding vector. For example, words, sentences, etc.
  • link to data (optional): a URL linking to the raw data represented by the embedding vector. Links can direct to images, videos, audio, etc.
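
To make the three fields concrete, here is a minimal sketch built around a hypothetical stand-in class, `EmbeddingSketch`. It is not Arize's `Embedding` class and makes no claim about how that class is implemented; it only mirrors the three pieces of information listed above, and it assumes (for illustration) that an empty vector counts as missing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EmbeddingSketch:
    """Hypothetical stand-in mirroring the three fields described above."""

    vector: Sequence[float]             # required: the embedding vector
    data: Optional[str] = None          # optional: raw data, e.g. a sentence
    link_to_data: Optional[str] = None  # optional: URL pointing to the raw data

    def __post_init__(self) -> None:
        # Assumption for this sketch: an empty vector is treated as missing.
        if len(self.vector) == 0:
            raise ValueError("vector is required and must not be empty")


# A text embedding carries its raw data; an image embedding carries a link.
text_embedding = EmbeddingSketch(
    vector=[4.0, 5.0, 6.0, 7.0],
    data="This is a test sentence",
)
image_embedding = EmbeddingSketch(
    vector=[1.0, 2.0, 3.0],
    link_to_data="https://link-to-my-image.png",
)
```

Only `vector` has to be supplied; `data` and `link_to_data` default to `None` when the raw input or its URL is not available.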

Send Embedding Data in Real-Time

Arize offers the Embedding class to construct your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names (how they will appear in the UI) to the embedding objects. See our API reference for more details.

Code Example

import numpy as np
import pandas as pd

from arize.api import Client
from arize.utils.types import ModelTypes, Environments, Embedding

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Example features
features = {
    'state': 'ca',
    'city': 'berkeley',
    'merchant_name': 'Peets Coffee',
    'pos_approved': True,
    'item_count': 10,
    'merchant_type': 'coffee shop',
    'charge_amount': 20.11,
}

# Example embedding features: keys are the names shown in the UI
embedding_features = {
    "nlp_embedding": Embedding(
        vector=pd.Series([4.0, 5.0, 6.0, 7.0]),
        data="This is a test sentence",
    ),
    "image_embedding": Embedding(
        vector=np.array([1.0, 2.0, 3.0]),
        link_to_data="https://link-to-my-image.png",
    ),
}

# Log data into the Arize platform
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    # ...
    features=features,
    embedding_features=embedding_features,
)

Send Embedding Data in Batches

Logging embedding features from a pandas dataframe differs from logging structured features. Structured features have a 1-to-1 relationship: one column equals one feature.
For embedding features, up to 3 columns (vector, data, link to data) can contain information about the same embedding object. Arize offers the EmbeddingColumnNames class to group the column names that describe the same embedding object. See our API reference for more details.
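
As an illustration, a dataframe for the batch case might look like the sketch below. The column names (`prediction_id`, `STATE`, `text_vector`, `text`, `image_vector`, `image_url`) and every value are made up for this example, and storing each vector as a NumPy array in an object column is an assumption of the sketch. The point is only that several columns together describe one embedding, while each structured feature occupies a single column.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row dataframe; every value here is illustrative only.
test_dataframe = pd.DataFrame(
    {
        "prediction_id": ["id-1", "id-2"],
        # Structured feature: one column equals one feature.
        "STATE": ["ca", "ny"],
        # Text embedding: a vector column plus a raw-data column.
        "text_vector": [
            np.array([4.0, 5.0, 6.0, 7.0]),
            np.array([1.0, 0.5, 2.0, 3.0]),
        ],
        "text": ["This is a test sentence", "Another test sentence"],
        # Image embedding: a vector column plus a link-to-data column.
        "image_vector": [
            np.array([1.0, 2.0, 3.0]),
            np.array([0.1, 0.2, 0.3]),
        ],
        "image_url": [
            "https://link-to-my-image.png",
            "https://link-to-my-other-image.png",
        ],
    }
)

# Sanity check: collect the distinct vector lengths found in each vector column.
text_dims = {len(v) for v in test_dataframe["text_vector"]}
image_dims = {len(v) for v in test_dataframe["image_vector"]}
```

Checking that each vector column holds vectors of a single length before logging is a cheap way to catch rows where the embedding was truncated or produced by a different model.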

Code Example

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Declare which columns are the feature columns
feature_column_names = [
    "MERCHANT_TYPE",
    "ENTRY_MODE",
    "STATE",
    "MEAN_AMOUNT",
    "STD_AMOUNT",
    "TX_AMOUNT",
]

# Declare which columns are the embedding feature columns
embedding_feature_column_names = [
    EmbeddingColumnNames(
        vector_column_name="text_vector",  # Will be name of embedding feature in the app
        data_column_name="text",
    ),
    EmbeddingColumnNames(
        vector_column_name="image_vector",  # Will be name of embedding feature in the app
        link_to_data_column_name="image_url",
    ),
]

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    # ...
    feature_column_names=feature_column_names,
    embedding_feature_column_names=embedding_feature_column_names,
)

# Log the dataframe with the schema mapping
# (test_dataframe is a pandas dataframe containing the columns named in the schema)
response = arize_client.log(
    model_id="sample-model-1",
    model_version="v1",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)

Resources

Check out our Colab example for a tutorial on how to send embeddings to Arize.
Questions? Email us at [email protected] or Slack us in the #arize-support channel