Large Language Models (LLM)

How to log generative LLM models to Arize

LLM Use Cases

Different production LLM use cases affect which data you need in order to monitor and troubleshoot LLM performance effectively.

Upload LLM Code Example

Google Colaboratory
A step-by-step guide to ingesting LLMs
The following code example provides a brief overview of uploading prompts, embeddings, and other model parameters. Run the Colab linked above for a more detailed walkthrough of LLM ingestion.
Two variants follow: a Python Pandas batch example and a Python single-record example.

Example Row

prompt_text: How often does Arize query the table for table import jobs?
prompt_vector: [ 0.00393428 -0.00417591 -0.00854287...
response_text: Arize will regularly sync new data from your data source with Arize...
openai_relevance_0: irrelevant
retrieval_text_0: Arize will attempt a dry run to validate your job for any access...
text_similarity_0: 0.86539755255...
text_similarity_1: 0.8653975525...
user_feedback: NaN
prediction_ts: 2023-04-05 20:33:22.006650000
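The dataframe passed to the batch logger can be assembled to mirror the example row. A minimal one-row sketch (long texts and vectors abbreviated to placeholders):

```python
import pandas as pd

# A minimal one-row dataframe mirroring the example row above
# (vector and long text values are abbreviated placeholders).
test_dataframe = pd.DataFrame(
    {
        "prompt_text": ["How often does Arize query the table for table import jobs?"],
        "prompt_vector": [[0.00393428, -0.00417591, -0.00854287]],
        "response_text": ["Arize will regularly sync new data from your data source..."],
        "openai_relevance_0": ["irrelevant"],
        "retrieval_text_0": ["Arize will attempt a dry run to validate your job..."],
        "text_similarity_0": [0.86539755255],
        "text_similarity_1": [0.8653975525],
        "user_feedback": [None],
        "prediction_ts": [pd.Timestamp("2023-04-05 20:33:22.006650")],
    }
)
```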
# Import the Arize client and schema types
from arize.pandas.logger import Client
from arize.utils.types import Schema, ModelTypes, Environments, EmbeddingColumnNames

arize_client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")

# Declare prompt and response columns
prompt_columns = EmbeddingColumnNames(
    vector_column_name="prompt_vector",
    data_column_name="prompt_text",
)
response_columns = "response_text"
tag_columns = [
    "openai_relevance_0",
    "retrieval_text_0",
    "text_similarity_0",
    "text_similarity_1",
]

# Define the Schema, including embedding information
schema = Schema(
    timestamp_column_name="prediction_ts",
    actual_label_column_name="user_feedback",
    tag_column_names=tag_columns,
    prompt_column_names=prompt_columns,
    response_column_names=response_columns,
)

# Log the dataframe with the schema mapping
response = arize_client.log(
    model_id="search-and-retrieval-with-corpus-dataset",
    model_version="v1",
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)
from arize.api import Client
from arize.utils.types import ModelTypes, Environments, Embedding
API_KEY = 'ARIZE_API_KEY'
SPACE_KEY = 'YOUR SPACE KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
# example features (placeholder names and values)
features = {
    'feature_name': 'feature_value',
}

# example tags
tags = {
    'step': 0,
    'task_type': 'Q&A',
    'conversation_id': '0a9c',
    'api_call_duration': 6.238615,
    'response_len': 73,
    'prompt_len': 88,
    'prompt_template': "You are an agent engineered to provide concise and precise answers to any question."
}

prompt = Embedding(
    vector=[0.10031, -0.06012, -0.00259, -0.08568, ...],
    data="! Why does the UI have to be so clunky and unintuitive?! It's like every time I try to navigate through the options, I end up getting lost in a sea of confusing menus and poorly labeled buttons. And don't even get me started on the color scheme - who thought neon green on black was a good idea? At this point, I'm ready to throw my computer out the window and switch to a different product altogether. Come on, developers - make things simpler for us already!"
)
response = Embedding(
    vector=[-0.11366, -0.20642, -0.03115, -0.18099, ...],
    data="I'm sorry to hear that you're experiencing difficulties with the UI. As an AI language model, I don't have the capability to change the UI, but I suggest looking into user guides, tutorials or support forums to help navigate through the menus more easily. Additionally, you could provide feedback to the developers on the UI, either through in-product feedback tools or their website. They may take into account your suggestions in future updates."
)

# Log data into the Arize platform
response = arize_client.log(
    prediction_id='plED4eERDCasd9797ca34',
    model_id='sample-model-1',
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.PRODUCTION,
    model_version='v1',
    prediction_timestamp=1618590882,
    # prediction_label will default to 1 if not sent
    actual_label=1,  # 1 represents a thumbs up: the user liked the model's response
    features=features,
    tags=tags,
    prompt=prompt,
    response=response,
)

Ingesting Prompt & Response

Arize supports ingesting prompts sent to the LLM and the responses it generates via the prompt and response fields of the log function.
The following examples include both prompt and response information. However, you can send just one of them (or none) if you do not have both.

Sending Prompt & Response with Embeddings

Use the Embedding object to build the prompt and the response, each pairing an embedding vector with its raw text.
  • The embedding vector is the dense vector representation of the unstructured input.
    Note: embedding features are not sparse vectors.
  • The embedding data is the raw text data associated with the vector.
# Build prompt & response embeddings
prompt = Embedding(
    vector=[0.10031, -0.06012, -0.00259, -0.08568, ...],
    data="! Why does the UI have to be so clunky and unintuitive?! It's like every time I try to navigate through the options, I end up getting lost in a sea of confusing menus and poorly labeled buttons. And don't even get me started on the color scheme - who thought neon green on black was a good idea? At this point, I'm ready to throw my computer out the window and switch to a different product altogether. Come on, developers - make things simpler for us already!"
)
response = Embedding(
    vector=[-0.11366, -0.20642, -0.03115, -0.18099, ...],
    data="I'm sorry to hear that you're experiencing difficulties with the UI. As an AI language model, I don't have the capability to change the UI, but I suggest looking into user guides, tutorials or support forums to help navigate through the menus more easily. Additionally, you could provide feedback to the developers on the UI, either through in-product feedback tools or their website. They may take into account your suggestions in future updates."
)

# Log data into the Arize platform
response = arize_client.log(
    ...
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt=prompt,
    response=response,
)
See here for more information on embeddings and options for generating them.

Sending Prompt & Response without Embeddings

If you prefer not to include embedding vectors, simply pass the prompt and/or response text as plain strings.
# Declare prompt & response
prompt = "! Why does the UI have to be so clunky and unintuitive?! It's like every time I try to navigate through the options, I end up getting lost in a sea of confusing menus and poorly labeled buttons. And don't even get me started on the color scheme - who thought neon green on black was a good idea? At this point, I'm ready to throw my computer out the window and switch to a different product altogether. Come on, developers - make things simpler for us already!"
response = "I'm sorry to hear that you're experiencing difficulties with the UI. As an AI language model, I don't have the capability to change the UI, but I suggest looking into user guides, tutorials or support forums to help navigate through the menus more easily. Additionally, you could provide feedback to the developers on the UI, either through in-product feedback tools or their website. They may take into account your suggestions in future updates."

# Log data into the Arize platform
response = arize_client.log(
    ...
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt=prompt,
    response=response,
)

Ingesting Prompt Template (Optional)

Arize supports ingesting prompt versions and the overall prompt to be natively tracked in the platform. The following fields in the log call are available:
  • prompt_template: This field should receive the prompt template in string format. Variables are represented with double curly braces: {{variable_name}}.
    • For example, your prompt template might look something like:
    Given the context of '{{retrieval_text_0}} + {{retrieval_text_1}}', and based on the frequently asked questions from our users, answer the user query as follows: '{{user_query}}'. Follow the instructions here exactly: '{{instruction}}'.
  • prompt_template_version: This field should receive the version of the template used. This will allow you to filter by this field in the Arize platform.
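For illustration only (Arize fills templates on the platform side), a double-curly-brace template of this shape can be rendered locally with a simple regex substitution. The `render` helper below is hypothetical, not part of the Arize SDK:

```python
import re

def render(template: str, variables: dict) -> str:
    # Replace each {{variable_name}} with its value from `variables`;
    # unknown variables are left in place.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

template = "Answer the user query as follows: '{{user_query}}'."
print(render(template, {"user_query": "How do I reset my password?"}))
# → Answer the user query as follows: 'How do I reset my password?'.
```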
# Declare prompt template columns
prompt_template = "Given the context of '{{retrieval_text_0}} + {{retrieval_text_1}}', and based on the frequently asked questions from our users, answer the user query as follows: '{{user_query}}'. Follow the instructions here exactly: '{{instruction}}'."
prompt_template_version = "template_v1"

# Log data into the Arize platform
response = arize_client.log(
    ...
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt=prompt,
    response=response,
    prompt_template=prompt_template,
    prompt_template_version=prompt_template_version,
)

Ingesting LLM Configuration Parameters (Optional)

Arize supports tracking and monitoring the original LLM configuration, which can also be modified as part of the prompt playground feature. Currently, OpenAI models are supported, with more model support coming soon. The following fields in the log call are available:
  • llm_model_name: This field should receive the name of the LLM used to produce a response to the prompt. Common examples are gpt-3.5-turbo or gpt-4.
  • llm_params: This field should receive the hyperparameters used to configure the LLM. The type should be a well-formatted JSON string. For example: {"max_tokens": 500, "presence_penalty": 0.66, "temperature": 0.28}
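One way to guarantee a well-formatted JSON string is to build the parameters as a Python dict and serialize it with `json.dumps` (a general sketch, not Arize-specific):

```python
import json

# Build hyperparameters as a dict, then serialize to a valid JSON string.
params = {"max_tokens": 500, "presence_penalty": 0.66, "temperature": 0.28}
llm_params = json.dumps(params)
print(llm_params)
# → {"max_tokens": 500, "presence_penalty": 0.66, "temperature": 0.28}
```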
# Declare LLM hyperparameters (a valid JSON string)
llm_params = '{"max_tokens": 500, "presence_penalty": 0.66, "temperature": 0.28}'

# Log data into the Arize platform
response = arize_client.log(
    ...
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt=prompt,
    response=response,
    prompt_template=prompt_template,
    prompt_template_version=prompt_template_version,
    llm_model_name="gpt-3.5-turbo",
    llm_params=llm_params,
)

Ingesting LLM Run Metadata

Arize supports tracking token usage and response latency from each LLM inference run. Learn more about metadata tracking here.
LLMRunMetadata: This field groups together the run metadata
llm_run_metadata = LLMRunMetadata(
    total_token_count=4325,
    prompt_token_count=2325,
    response_token_count=2000,
    response_latency_ms=20000,
)

# Log data into the Arize platform
response = arize_client.log(
    ...
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt=prompt,
    response=response,
    prompt_template=prompt_template,
    prompt_template_version=prompt_template_version,
    llm_model_name="gpt-3.5-turbo",
    llm_params=llm_params,
    llm_run_metadata=llm_run_metadata,
)

Evaluation Metrics

Evaluation (eval) metrics are used to quantify LLM model performance. Evals are typically:
  • User Feedback
  • Task-based Metrics
  • LLM Assisted Evals w/ Templates
Arize supports various evaluation approaches depending on your data and use case.
Navigate here to learn which eval metric is right for you.
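As a sketch of the user-feedback approach, thumbs-up/thumbs-down signals are commonly mapped to the 1/0 `actual_label` convention used in the single-record logging example above. The mapping function below is an illustrative assumption, not an Arize API:

```python
def feedback_to_label(feedback: str) -> int:
    # Map a thumbs-up/down user signal to the 1/0 actual_label convention.
    return 1 if feedback == "thumbs_up" else 0

assert feedback_to_label("thumbs_up") == 1    # user liked the response
assert feedback_to_label("thumbs_down") == 0  # user disliked the response
```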

Case-Specific LLM Ingestion

Prompt & Response

Upload LLM prompt and responses via the prompt_column_names and response_column_names fields.
What is an embedding? How do I generate an embedding? Learn more here.

Prompt & Response without Embeddings

Upload prompts and responses without embedding vectors by passing the relevant column name for your prompt and/or response text.
The following examples include both prompt and response information. However, you can send either prompt or response if you do not have both.
# Declare prompt & response text columns
prompt_columns = "document"
response_columns = "summary"

# Define the Schema
schema = Schema(
    ...
    prompt_column_names=prompt_columns,
    response_column_names=response_columns,
)

Prompt & Response with Embeddings

Upload prompt and responses with embedding vectors using the EmbeddingColumnNames object to define the prompt_column_names and response_column_names in your model schema.
  • The vector_column_name should match the column name representing your embedding vectors.
Note: The embedding vector is the dense vector representation of the unstructured input. Embedding features are not sparse vectors.
  • The data_column_name should match the name of the column where the raw text associated with the vector is stored.
The data_column_name is typically used for NLP use cases. The column can contain either strings (full sentences) or lists of strings (token arrays).
# Declare prompt & response embedding columns
prompt_columns = EmbeddingColumnNames(
    vector_column_name="prompt_vector",  # optional
    data_column_name="prompt"
)
response_columns = EmbeddingColumnNames(
    vector_column_name="response_vector",  # optional
    data_column_name="response"
)
# Define the Schema
schema = Schema(
...
prompt_column_names=prompt_columns,
response_column_names=response_columns,
)

Prompt Playground

Upload prompt versions and the prompt templates using the PromptTemplateColumnNames object.
  • PromptTemplateColumnNames: The field that groups prompt templates with their versions
  • template_column_name: The field that contains the prompt template in string format
  • template_version_column_name: The field that defines the template version
Example fields:
The template_column_name variables are represented with double curly braces {{variable_name}}.
Given the context of '{{retrieval_text_0}} + {{retrieval_text_1}}', and based on the frequently asked questions from our users, answer the user query as follows: '{{user_query}}'. Follow the instructions here exactly: '{{instruction}}'.
The template_version_column_name field enables you to filter by version in Arize.
# Declare prompt template columns
prompt_template_columns = PromptTemplateColumnNames(
    template_column_name="prompt_template",
    template_version_column_name="prompt_template_name"
)

# Define the Schema
schema = Schema(
    ...
    prompt_template_column_names=prompt_template_columns,
)
Learn more about prompt engineering here.

LLM Configuration Parameters

Track and monitor original and modified LLMs with the LLMConfigColumnNames object.
  • LLMConfigColumnNames: This field groups the LLM with its hyperparameters
  • model_column_name: This field contains the names of the LLMs used to produce responses (e.g. gpt-3.5-turbo or gpt-4).
  • params_column_name: This field contains the hyperparameters used to configure the LLM. The contents of the column must be a well-formatted JSON string (e.g. {"max_tokens": 500, "presence_penalty": 0.66, "temperature": 0.28}).
# Declare LLM config columns
llm_config_columns = LLMConfigColumnNames(
    model_column_name="llm_config_model_name",
    params_column_name="llm_params",
)

# Define the Schema
schema = Schema(
    ...
    llm_config_column_names=llm_config_columns,
)

Track Token Usage

Track token usage and response latency from each LLM inference run with the LLMRunMetadataColumnNames field.
  • LLMRunMetadataColumnNames: This field groups together the run metadata
llm_run_metadata = LLMRunMetadataColumnNames(
    total_token_count_column_name="total_tokens_used",
    prompt_token_count_column_name="prompt_tokens_used",
    response_token_count_column_name="response_tokens_used",
    response_latency_ms_column_name="response_latency",
)

# Define the Schema
schema = Schema(
    ...
    llm_run_metadata_column_names=llm_run_metadata,
)
Learn more about metadata tracking here.
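The values for these columns have to be collected at inference time. A minimal, library-agnostic sketch: `call_llm` is a hypothetical stand-in for your actual model call, and whitespace splitting is only a rough proxy for your tokenizer's real counts:

```python
import time

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for your actual LLM call.
    return "This is a stubbed response."

prompt = "How often does Arize query the table for table import jobs?"

start = time.perf_counter()
response = call_llm(prompt)
response_latency_ms = (time.perf_counter() - start) * 1000

# Rough token counts; substitute your model's tokenizer for real usage numbers.
prompt_tokens_used = len(prompt.split())
response_tokens_used = len(response.split())
total_tokens_used = prompt_tokens_used + response_tokens_used
```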

Retrieval Debugging with Knowledge Base (Corpus) Data

Upload the deployed application's dataset of documents, i.e. its Knowledge Base (or Corpus), with the CorpusSchema object.
# Logging the Corpus dataset
response = arize_client.log(
    dataframe=corpus_df,  # Refers to the above dataframe with the example row
    model_id="search-and-retrieval-with-corpus-dataset",
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.CORPUS,
    schema=CorpusSchema(
        document_id_column_name='document_id',
        document_text_embedding_column_names=EmbeddingColumnNames(
            vector_column_name='text_vector',
            data_column_name='text'
        ),
        document_version_column_name='document_version'
    ),
)
Learn more about how Corpus datasets are used here.
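A `corpus_df` matching this schema could look like the following sketch (document IDs, texts, and vector values are illustrative placeholders; the column names are the ones `CorpusSchema` maps above):

```python
import pandas as pd

# Minimal corpus dataframe with the columns CorpusSchema expects.
corpus_df = pd.DataFrame(
    {
        "document_id": ["doc_001", "doc_002"],
        "text": [
            "Arize will regularly sync new data from your data source...",
            "Arize will attempt a dry run to validate your job...",
        ],
        "text_vector": [
            [0.0039, -0.0042, -0.0085],
            [0.0117, -0.0031, 0.0064],
        ],
        "document_version": ["v1", "v1"],
    }
)
```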