Arize AI

Natural Language Processing (NLP)

How to log your model schema for classification models at the sentence(s) level

NLP Model Overview

Text Classification Models predict the categories a piece of text might belong to. Use this Colab for examples of how to obtain embedding vectors from a BERT-like model.
*All binary classification variant specifications apply to the NLP model type, with the addition of embeddings.

Performance Metric

Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity
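
For reference, the metrics listed above can all be derived from the four confusion-matrix counts of a binary classifier. The sketch below uses the standard textbook definitions. It is not how Arize computes them internally; the function name `classification_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute binary classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Assumption of this sketch: report 0.0 when a metric is undefined.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)        # also called sensitivity
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "sensitivity": recall,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),     # false positive rate = 1 - specificity
        "fnr": ratio(fn, fn + tp),     # false negative rate = 1 - recall
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts, for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that sensitivity is another name for recall, and that FPR and specificity always sum to 1 when both are defined.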

Examples

File Type
NLP Classification
NLP NER
Python Batch

NLP Classification Model Schema Parameters

Arize supports logging the predicted and actual category and score, the embedding features associated with the text the model acts on, and the text itself.
| Arize Field | Data Type | Example |
| --- | --- | --- |
| prediction_id | str | "prediction_id" |
| timestamp | str | "prediction_ts" |
| model_id | str | 'sample-model-1' |
| model_version | str | 'v1' |
| model_type | str | SCORE_CATEGORICAL |
| environment | str | Environments.PRODUCTION |
| prediction_label | str or bool | "positive" |
| prediction_score | float | 0.6 |
| actual_label | str or bool | "negative" |
| actual_score | float | 0 |
| embedding_feature_column_names | List[EmbeddingColumnNames] | [EmbeddingColumnNames(vector_column_name="text_embedding", data_column_name="text")] |
In addition, the EmbeddingColumnNames object that Arize provides has its own fields, described below.

| Field | Data Type | Example |
| --- | --- | --- |
| vector_column_name | str | "text_embedding" |
| data_column_name | str | "text" |
The embedding vector is the dense vector representation of the unstructured input. The embedding data is the raw data associated with the vector. It is the field typically used for NLP use cases, since it accepts either strings (full sentences) or lists of strings (token arrays).
Arize offers the Embedding class to construct your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names (how they will appear in the UI) to the embedding objects. See our API reference for more details.
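
To make the two accepted shapes of embedding data concrete, here is a minimal sketch of a pre-logging sanity check. The helper `check_embedding` is hypothetical and not part of the Arize SDK; it only encodes the description above (a non-empty numeric vector, paired with either a sentence or a list of tokens). The three-element vectors are toy values.

```python
from typing import List, Sequence, Union


def check_embedding(vector: Sequence[float], data: Union[str, List[str]]) -> None:
    """Raise ValueError if the (vector, data) pair does not match the shapes described above.

    Hypothetical helper for illustration; not an Arize API.
    """
    numeric = all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
    )
    if len(vector) == 0 or not numeric:
        raise ValueError("embedding vector must be a non-empty sequence of numbers")

    is_sentence = isinstance(data, str)
    is_token_array = isinstance(data, list) and all(isinstance(t, str) for t in data)
    if not (is_sentence or is_token_array):
        raise ValueError("embedding data must be a string or a list of strings")


# Full sentence as embedding data
check_embedding([0.12, -0.48, 0.33], "The product arrived on time.")

# Token array as embedding data
check_embedding([0.12, -0.48, 0.33], ["The", "product", "arrived", "on", "time", "."])
```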

Code Example

# Imports for the Python batch (pandas) logger; adjust the paths to your SDK version if needed
from arize.pandas.logger import Schema
from arize.utils.types import EmbeddingColumnNames, Environments, ModelTypes

# Example embedding features
embedding_features = [
    EmbeddingColumnNames(
        vector_column_name="text_embedding",  # column holding the embedding vectors
        data_column_name="text",  # column holding the raw text
    ),
]

# Declare the schema of the dataframe you're sending (predictions, timestamp, actuals)
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    embedding_feature_column_names=embedding_features,
)

# Log data into the Arize platform
# (arize_client is an already-initialized Arize client; test_dataframe is your pandas DataFrame)
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)
This example uses the Python batch ingestion method. For other languages, please refer to our examples.
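
The schema in the code example refers to dataframe columns by name, so it may help to see what rows with those columns could look like. The sketch below builds plain Python records using only the standard library. The helper `make_row`, the toy three-element vectors, and the use of a Unix timestamp in seconds are assumptions for illustration; a real BERT-like embedding would be much longer.

```python
import time
import uuid


def make_row(text, vector, predicted, score, actual, actual_score):
    """Build one record whose keys match the column names used in the schema above."""
    return {
        "prediction_id": str(uuid.uuid4()),  # unique ID per prediction
        "prediction_ts": int(time.time()),   # assumption: Unix time in seconds
        "PREDICTION": predicted,
        "PREDICTION_SCORE": score,
        "ACTUAL": actual,
        "ACTUAL_SCORE": actual_score,
        "text_embedding": vector,            # embedding vector column
        "text": text,                        # raw text column
    }


rows = [
    make_row("Great value, works as described.", [0.21, -0.07, 0.54], "positive", 0.91, "positive", 1.0),
    make_row("It was fine until it broke.", [-0.33, 0.18, 0.02], "positive", 0.6, "negative", 0.0),
]

# With pandas installed, pandas.DataFrame(rows) would give a dataframe
# with one column per key, suitable for the dataframe argument shown above.
```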

Additional NLP Schema Parameters

Prediction Label

"Is this a positive product review?"
Label = "positive"

Prediction Score

"How likely is this to be positive?"
Score = 0.6

Actual Label

"Was this actually a positive product review?"
Label = "negative"

Actual Score

"Was this actually positive?"
Score = 0
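
The four fields above can be tied together with a small sketch for the product-review example. It assumes the model outputs the probability of the "positive" class and that a 0.5 decision threshold is used; both function names and the threshold are illustrative choices, not Arize requirements.

```python
from typing import Tuple


def to_prediction(probability_positive: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map a positive-class probability to (prediction_label, prediction_score)."""
    label = "positive" if probability_positive >= threshold else "negative"
    return label, probability_positive


def to_actual(ground_truth_label: str) -> Tuple[str, float]:
    """Map a ground-truth label to (actual_label, actual_score)."""
    score = 1.0 if ground_truth_label == "positive" else 0.0
    return ground_truth_label, score


prediction_label, prediction_score = to_prediction(0.6)
actual_label, actual_score = to_actual("negative")
```

With these inputs the values match the example above: the model predicts "positive" with a score of 0.6, while the review was actually "negative", giving an actual score of 0.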