Create a Task

A Task is a function that runs on a dataset. It represents something you want to test in isolation, offline, before you push your code to production. A task can be an LLM generation or response in an LLM application, a test of a new prompt template, a test of a new model, an LLM generating code, or a general-purpose function.

The input to the Task function is an Example, an object that represents one row of a dataframe.

The dataframe is converted row by row into Examples, and each Example is passed to the Task under test.

An Example has the following parameters:

example.id: The ID of the example

example.input: The input column of the dataframe (attributes.llm.input_messages)

example.output: The output column of the dataframe (attributes.output.value)

example.dataset_row: The full row of the dataframe, containing all of its columns. Arize supports any number of columns.

example.metadata: The metadata column of the dataframe (attributes.metadata)

Each example within a dataset represents a single data point, consisting of a dataset_row (a dictionary), an input (a string or JSON), an optional output string, and an optional metadata dictionary. The optional output often contains the expected LLM application output for the given input.
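The structure described above can be pictured with a small stand-in class. This is only a sketch: `ExampleSketch`, `rows_to_examples`, and `echo_task` are hypothetical names used for illustration, not classes or functions from the Arize SDK. The rows are plain dictionaries rather than a real dataframe, and the `id` key is an assumed column name. The other column names are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExampleSketch:
    """Hypothetical stand-in for the Example object passed to a task."""
    id: str
    dataset_row: Dict[str, Any]          # every column of the row
    input: Any = None                    # a string or JSON
    output: Optional[str] = None         # optional expected output
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_examples(rows: List[Dict[str, Any]]) -> List[ExampleSketch]:
    """Convert rows one at a time, mirroring the row-by-row conversion."""
    examples = []
    for i, row in enumerate(rows):
        examples.append(
            ExampleSketch(
                id=str(row.get("id", i)),  # "id" is an assumed column name
                dataset_row=row,
                input=row.get("attributes.llm.input_messages"),
                output=row.get("attributes.output.value"),
                metadata=row.get("attributes.metadata") or {},
            )
        )
    return examples


def echo_task(example: ExampleSketch) -> str:
    """A trivial task: return the example's input as a string."""
    return str(example.input)


rows = [
    {
        "id": "ex-1",
        "attributes.llm.input_messages": "What is a task?",
        "attributes.output.value": "A function run on a dataset.",
        "topic": "docs",  # extra columns remain available via dataset_row
    },
]
examples = rows_to_examples(rows)
results = [echo_task(ex) for ex in examples]
```

Any column that is not mapped to a named field, such as `topic` here, is still reachable through `dataset_row`, which is why a task can read arbitrary columns from it.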

Testing a New Prompt

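import json

from openai import OpenAI

# Assumption: the task uses the OpenAI Python client, which reads
# OPENAI_API_KEY from the environment.
client = OpenAI()

new_template = """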
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
"""

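TASK_MODEL = 'gpt-4o'

def test_prompt_generation(example) -> str:
    # The prompt variables are stored as a JSON string in the dataset row.
    prompt_vars = json.loads(example.dataset_row['attributes.llm.prompt_template.variables'])
    # Fill the new template with this row's variables.
    full_prompt = new_template.format(**prompt_vars)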
    response = client.chat.completions.create(
        model=TASK_MODEL,
        temperature=0.1,
        messages=[
            {
                "role": "system",
                "content": " ",
            },
            {
                "role": "user",
                "content": full_prompt,
            },
        ],
    )
    return response.choices[0].message.content
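
The template-filling step of the task can be checked on its own before spending any API calls. The sketch below assumes the prompt variables are stored as a JSON string under `attributes.llm.prompt_template.variables`, as in the task above. `build_prompt` and the hand-built `fake_example` are hypothetical helpers for illustration and are not part of the Arize SDK.

```python
import json
from types import SimpleNamespace

new_template = """
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
"""


def build_prompt(example) -> str:
    """Fill the template from the row's JSON-encoded prompt variables."""
    raw = example.dataset_row["attributes.llm.prompt_template.variables"]
    prompt_vars = json.loads(raw)
    return new_template.format(**prompt_vars)


# Hand-built stand-in for an Example; only dataset_row is needed here.
fake_example = SimpleNamespace(
    dataset_row={
        "attributes.llm.prompt_template.variables": json.dumps(
            {
                "context_str": "Tasks run on datasets.",
                "query_str": "What does a task run on?",
            }
        )
    }
)

prompt = build_prompt(fake_example)
print(prompt)
```

If a row is missing one of the variables the template expects, `str.format` raises a `KeyError`, so a check like this surfaces malformed rows before the model is ever called.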

Copyright © 2023 Arize AI, Inc