Large Language Models (LLM)
How to log your model schema for LLM use cases
This model type is currently in early release. Please contact [email protected] for access
LLM Use Cases | Expected Fields | Evaluation Metrics |
---|---|---|
LLM Summarization | prediction label, actual label, prediction score, actual score | bleu_scores, rouge_scores, bert_scores, Arize Evaluation Metric* |
LLM Question Answering (QA) | prediction label, actual label, prediction score, actual score | bleu_scores, rouge_scores, bert_scores, Arize Evaluation Metric* |
*Arize generates an Evaluation Score based on a query to evaluate your LLM performance.
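The overlap metrics named in the table are usually computed before logging and stored as ordinary columns. The following is a minimal sketch of a simplified ROUGE-1 F1 score on whitespace tokens, included only to show what such a column holds. The function name `rouge1_f1` is hypothetical, and this is not how Arize computes its metrics; an established metrics package would normally be used instead.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs: a generated summary and its reference text.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))
```

A value computed this way could be stored per row in a feature column such as the one listed in `feature_column_names`.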
The `EmbeddingColumnNames` class constructs your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names to the embedding objects. See our API reference for more details.
Python Pandas
prompt | Text | actual | summary | prediction_vector | prompt_vector | Timestamp |
---|---|---|---|---|---|---|
This is the moment when an angry woman forced ... | | This is the moment when an angry woman forced ... | This is the moment when an angry woman forced ... | [-0.2077837288, -0.3230468631, -0.08706595, -0... | [-0.1104477122, -0.35748869180000004, -0.02833... | 1618590882 |
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames

API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Non-embedding feature columns present in the dataframe
feature_column_names = [
    "prompt",
    "bleu_score",
    "rouge_score",
    "evaluation_score",
]

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "prompt_embedding": EmbeddingColumnNames(
        vector_column_name="prompt_vector",
        data_column_name="prompt",
    ),
    "prediction_embedding": EmbeddingColumnNames(
        vector_column_name="prediction_vector",
        data_column_name="summary",
    ),
}

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="summary",  # the generated summary
    actual_label_column_name="actual",  # reference text, the actual summary
    feature_column_names=feature_column_names,
    embedding_feature_column_names=embedding_feature_column_names,
)

# Log the dataframe with the schema mapping.
# test_dataframe is a pandas DataFrame containing the columns referenced by the schema.
response = arize_client.log(
    model_id="generative-ai-text-summarization-tutorial-v2-cnn",
    model_version="v1",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)
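After logging, it is worth checking whether the call succeeded. The sketch below assumes the value returned by `arize_client.log` behaves like an HTTP response with `status_code` and `text` attributes (as a `requests.Response` does); that is an assumption here, not something this page states. The helper `describe_log_response` and the stand-in object are hypothetical.

```python
from types import SimpleNamespace


def describe_log_response(response) -> str:
    """Summarize a logging response; assumes `status_code` and `text` attributes."""
    if response.status_code == 200:
        return "success"
    return f"failed with status {response.status_code}: {response.text}"


# Stand-in object for illustration only; in practice, pass the value
# returned by arize_client.log(...).
fake_response = SimpleNamespace(status_code=400, text="missing column")
print(describe_log_response(fake_response))
```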
Arize supports logging the embedding features associated with the text the model is acting on and the text itself using the `EmbeddingColumnNames` object.
{
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="text_vector",
        data_column_name="text"
    )
}
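Because each entry refers to dataframe columns by name, a mistyped column name is an easy error to make. The sketch below shows one possible pre-flight check, using plain `(vector_column, data_column)` tuples in place of `EmbeddingColumnNames` objects so that it runs without the SDK installed. The helper `missing_embedding_columns` is hypothetical and not part of the Arize API.

```python
def missing_embedding_columns(dataframe_columns, embedding_columns):
    """Return (display_name, column) pairs whose column is absent from the dataframe.

    `embedding_columns` maps a display name to a (vector_column, data_column) pair.
    """
    available = set(dataframe_columns)
    missing = []
    for display_name, (vector_col, data_col) in embedding_columns.items():
        for col in (vector_col, data_col):
            if col not in available:
                missing.append((display_name, col))
    return missing


# Illustrative column names; with pandas these would come from list(df.columns).
columns = ["prompt", "summary", "prompt_vector", "prediction_vector"]
mapping = {
    "prompt_embedding": ("prompt_vector", "prompt"),
    "prediction_embedding": ("prediction_vector", "summary"),
    "embedding_display_name": ("text_vector", "text"),
}
result = missing_embedding_columns(columns, mapping)
print(result)
```

An empty result means every referenced column exists; anything else lists the entries to fix before logging.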
The embedding vector is the dense vector representation of the unstructured input. See here for more information on embeddings and options for generating them.
The embedding data is the raw data associated with the vector. It is the field typically chosen for LLM use cases, since it accepts either strings (full sentences) or lists of strings (token arrays).
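The two shapes described above can be illustrated with a small check on plain Python values. This is a sketch only: the function names are hypothetical, the rules are a reading of the description above and not Arize's own validation, and real vectors are often NumPy arrays, which this example does not handle.

```python
from numbers import Real


def is_valid_embedding_vector(vector) -> bool:
    """A dense vector here is a non-empty list or tuple of real numbers."""
    return (
        isinstance(vector, (list, tuple))
        and len(vector) > 0
        and all(isinstance(x, Real) and not isinstance(x, bool) for x in vector)
    )


def is_valid_embedding_data(data) -> bool:
    """Raw data is either a string or a list of strings (a token array)."""
    if isinstance(data, str):
        return True
    return isinstance(data, list) and all(isinstance(tok, str) for tok in data)


# A full sentence and its token array are both acceptable forms of embedding data.
sentence = "This is the moment"
tokens = sentence.split()
print(is_valid_embedding_data(sentence), is_valid_embedding_data(tokens))
print(is_valid_embedding_vector([-0.2077837288, -0.3230468631, -0.08706595]))
```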