
Large Language Models (LLM)

How to log your model schema for LLM use cases
This model type is currently in early release. Please contact [email protected] for access

LLM Model Overview

Text Classification Models predict the categories a piece of text might belong to.
| LLM Use Cases | Expected Fields | Evaluation Metrics |
| --- | --- | --- |
| LLM Summarization | prediction label, actual label, prediction score, actual score | bleu_scores, rouge_scores, bert_scores, Arize Evaluation Metric* |
| LLM Question Answering (QA) | prediction label, actual label, prediction score, actual score | bleu_scores, rouge_scores, bert_scores, Arize Evaluation Metric* |

*Arize generates an Evaluation Score based on a query to evaluate your LLM performance.
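
To make the metric columns above more concrete, here is a minimal sketch of a ROUGE-1 style F1 score computed between a reference text and a generated summary. It is a simplification for illustration only: it assumes whitespace tokenization with no stemming, and the function name `rouge1_f1` is hypothetical, not part of the Arize SDK. Dedicated metric libraries will produce somewhat different values.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference text and a generated text."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only; in practice these would be the actual and summary columns.
actual = "the cat sat on the mat"
summary = "the cat lay on the mat"
rouge_score = rouge1_f1(actual, summary)
```

A score computed this way could be stored as a numeric column next to each prediction and logged as a feature alongside the prompt.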

Code Example

The EmbeddingColumnNames class constructs your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names to the embedding objects. See our API reference for more details.
Python Pandas

Example Row

prompt
Text
actual
summary
prediction_vector
prompt_vector
Timestamp
This is the moment when an angry woman forced ...
This is the moment when an angry woman forced ...
This is the moment when an angry woman forced ...
[-0.2077837288, -0.3230468631, -0.08706595, -0...
[-0.1104477122, -0.35748869180000004, -0.02833...
1618590882
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Feature columns: the prompt text plus the precomputed evaluation metrics
feature_column_names = [
    "prompt",
    "bleu_score",
    "rouge_score",
    "evaluation_score",
]

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "prompt_embedding": EmbeddingColumnNames(
        vector_column_name="prompt_vector",
        data_column_name="prompt",
    ),
    "prediction_embedding": EmbeddingColumnNames(
        vector_column_name="prediction_vector",
        data_column_name="summary",
    ),
}

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="summary",  # the generated summary
    actual_label_column_name="actual",  # reference text, the actual summary
    feature_column_names=feature_column_names,
    embedding_feature_column_names=embedding_feature_column_names,
)

# Log the dataframe with the schema mapping.
# test_dataframe is a pandas DataFrame containing the columns referenced above.
response = arize_client.log(
    model_id="generative-ai-text-summarization-tutorial-v2-cnn",
    model_version="v1",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)
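
Because the schema refers to dataframe columns by name, it can help to confirm those columns exist before logging. The following is a minimal sketch of such a pre-flight check. The helper `missing_schema_columns` and the variables `required_columns` and `example_columns` are hypothetical names for illustration, not part of the Arize SDK, and the required list covers only a subset of the columns used in the example.

```python
def missing_schema_columns(dataframe_columns, required_columns):
    """Return the required column names that are absent, in their original order."""
    present = set(dataframe_columns)
    return [name for name in required_columns if name not in present]


# A subset of the column names the example schema refers to.
required_columns = [
    "prediction_id",
    "prediction_ts",
    "summary",
    "actual",
    "prompt",
    "prompt_vector",
    "prediction_vector",
]

# Hypothetical frame whose columns match the example row header.
example_columns = [
    "prompt",
    "actual",
    "summary",
    "prediction_vector",
    "prompt_vector",
    "Timestamp",
]

missing = missing_schema_columns(example_columns, required_columns)
print(missing)
```

With a pandas DataFrame, the first argument would typically be `list(test_dataframe.columns)`. Any names reported as missing would need to be added or renamed before the schema can map onto the frame.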

LLM Embedding Features

Arize supports logging the embedding features associated with the text the model is acting on and the text itself using the EmbeddingColumnNames object.
{
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="text_vector",
        data_column_name="text"
    )
}
The embedding vector is the dense vector representation of the unstructured input. See here for more information on embeddings and options for generating them. The embedding data is the raw data associated with the vector. It is the field typically chosen for LLM use cases because it accepts either strings (full sentences) or lists of strings (token arrays).
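
As a rough illustration of the two data forms described above, the sketch below builds one row that stores the text as a full sentence and one that stores it as a token array. It assumes simple whitespace tokenization, and the vector values are placeholders rather than real model output. The helpers `as_token_array` and `is_valid_embedding_data` are hypothetical names, not part of the Arize SDK.

```python
from typing import List, Union


def as_token_array(text: str) -> List[str]:
    """Split a sentence into tokens on whitespace (a stand-in for a real tokenizer)."""
    return text.split()


def is_valid_embedding_data(value: Union[str, List[str]]) -> bool:
    """Return True for a string or a list made up only of strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(token, str) for token in value)


sentence = "This is the moment"
tokens = as_token_array(sentence)

# Placeholder vector; a real one would come from an embedding model.
placeholder_vector = [0.1, -0.2, 0.3]

row_with_string = {"text": sentence, "text_vector": placeholder_vector}
row_with_tokens = {"text": tokens, "text_vector": placeholder_vector}
```

Either row shape matches the `text` and `text_vector` column names used in the `EmbeddingColumnNames` example.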