Arize AI

Features (unstructured)

Inputs into your model for unstructured data types
Arize's embedding objects are composed of 3 pieces of information:
  • vector (required): the embedding vector itself, representing the unstructured input data. Accepted data types are List[float] and NumPy arrays of floats (np.ndarray).
  • data (optional): typically the raw text represented by the embedding vector. Accepted data types are str (for words or sentences) and List[str] (for token arrays).
  • link to data (optional): typically a URL linking to the data file (image, audio, video...) represented by the embedding vector. The accepted data type is str.
Currently, all vectors within the same embedding feature must share the same dimensionality. We are working towards supporting multiple vector sizes within the same embedding feature.
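Because every vector in one embedding feature is expected to have the same length, it can help to check this before logging. The following is a minimal sketch in plain Python, not part of the Arize SDK: check_common_dimension is a hypothetical helper name, and the sample vectors are made up for illustration.

```python
from typing import Sequence


def check_common_dimension(vectors: Sequence[Sequence[float]]) -> int:
    """Return the shared vector length, or raise if the lengths differ.

    Hypothetical helper (not an Arize API). It relies only on len(),
    so it works for lists as well as NumPy arrays.
    """
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"expected one common dimension, found {sorted(dims)}")
    return dims.pop()


# Made-up vectors that would belong to the same embedding feature
text_vectors = [[4.0, 5.0, 6.0, 7.0], [0.1, 0.2, 0.3, 0.4]]
dim = check_common_dimension(text_vectors)
print(dim)
```

A check like this fails early on the client side instead of relying on the platform to reject mixed vector sizes.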

Send Embedding Data in Real-Time

Arize offers the Embedding class to construct your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names (how they will appear in the UI) to the embedding objects. See our API reference for more details.

Code Example

import numpy as np
import pandas as pd
from arize.api import Client
from arize.utils.types import ModelTypes, Environments, Embedding

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Example features
features = {
    'state': 'ca',
    'city': 'berkeley',
    'merchant_name': 'Peets Coffee',
    'pos_approved': True,
    'item_count': 10,
    'merchant_type': 'coffee shop',
    'charge_amount': 20.11,
}

# Example embedding features: feature name (as shown in the UI) -> Embedding object
embedding_features = {
    "nlp_embedding": Embedding(
        vector=pd.Series([4.0, 5.0, 6.0, 7.0]),
        data="This is a test sentence",
    ),
    "image_embedding": Embedding(
        vector=np.array([1.0, 2, 3]),
        link_to_data="https://link-to-my-image.png",
    ),
}

# Log data into the Arize platform
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    # ...
    features=features,
    embedding_features=embedding_features,
)

Send Embedding Data in Batches

Logging embedding features from a pandas dataframe differs from logging structured features. For structured features, the relationship is 1-to-1: one column equals one feature.
For embedding features, up to 3 columns (vector, data, link to data) can contain information about the same embedding object. Arize offers the EmbeddingColumnNames class to group those column names so that they are understood as describing the same embedding object. See our API reference for more details.

Code Example

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Declare which columns are the feature columns
feature_column_names = [
    "MERCHANT_TYPE",
    "ENTRY_MODE",
    "STATE",
    "MEAN_AMOUNT",
    "STD_AMOUNT",
    "TX_AMOUNT",
]

# Declare which columns are the embedding feature columns
embedding_feature_column_names = [
    EmbeddingColumnNames(
        vector_column_name="text_vector",  # column name of the vectors, required
        data_column_name="text",  # column name of the raw data the vectors represent, optional
    ),
    EmbeddingColumnNames(
        vector_column_name="image_vector",  # column name of the vectors, required
        link_to_data_column_name="image_url",  # column name of links to raw data, optional
    ),
]

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    # ...
    feature_column_names=feature_column_names,
    embedding_feature_column_names=embedding_feature_column_names,
)

# Log the dataframe with the schema mapping
# (test_dataframe is a pandas DataFrame containing the columns named above)
response = arize_client.log(
    model_id="sample-model-1",
    model_version="v1",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)
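
The batch example logs a dataframe named test_dataframe without showing how it is built. The following is a rough sketch of a dataframe whose columns match the names used in the schema, assuming vectors are stored as Python lists (one list per row). All values are made up for illustration, and any columns hidden behind the elided schema arguments are not included.

```python
import pandas as pd

# Made-up rows; the column names mirror those referenced in the schema
test_dataframe = pd.DataFrame(
    {
        "prediction_id": ["id-001", "id-002"],
        # Structured features: one column per feature
        "MERCHANT_TYPE": ["coffee shop", "grocery"],
        "ENTRY_MODE": ["chip", "swipe"],
        "STATE": ["ca", "ny"],
        "MEAN_AMOUNT": [18.5, 42.0],
        "STD_AMOUNT": [3.2, 7.9],
        "TX_AMOUNT": [20.11, 39.5],
        # Text embedding: vector column plus raw-data column
        "text_vector": [[4.0, 5.0, 6.0, 7.0], [0.5, 1.5, 2.5, 3.5]],
        "text": ["This is a test sentence", "This is another test sentence"],
        # Image embedding: vector column plus link-to-data column
        "image_vector": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "image_url": [
            "https://link-to-my-image.png",
            "https://link-to-my-other-image.png",
        ],
    }
)

# Each embedding feature should use a single vector length across all rows
vector_dims = {
    col: set(test_dataframe[col].map(len))
    for col in ("text_vector", "image_vector")
}
print(vector_dims)
```

Note that the two embedding features may use different vector lengths from each other; the constraint applies to the rows within a single embedding feature.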

Resources

Getting Started: Quick Guides                               Category
Multi-Class Sentiment Classification                        NLP
Named Entity Recognition                                    NLP
Image Classification                                        CV

Additional Education: Long Examples                         Category
Multi-Class Sentiment Classification using Hugging Face     NLP
Multi-Class Sentiment Classification using OpenAI           NLP
Named Entity Recognition using Hugging Face                 NLP
Image Classification using Hugging Face                     CV