Online: Code Based Evals
This feature is in closed beta.
Arize supports code-based evaluations for experiments. They are written in Python and run either in your own code (offline) or in the Arize platform (online).
See the code-based evaluators documentation for reference.
The above shows the user interface configuration for an online code-based Eval. Arize supports sampling and filtering per Eval task, configured as part of the task.
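To make per-task sampling and filtering concrete, here is a minimal conceptual sketch of how such a decision could be modeled. It is not Arize's implementation or SDK: `EvalTaskConfig`, `should_evaluate`, the `sampling_rate` field, and the `span_kind` attribute are all hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict

Span = Dict[str, Any]


@dataclass
class EvalTaskConfig:
    """Hypothetical per-task settings; not an Arize SDK class."""

    name: str
    sampling_rate: float  # fraction of matching spans to evaluate, 0.0 to 1.0
    span_filter: Callable[[Span], bool]  # selects the spans the task applies to


def should_evaluate(span: Span, task: EvalTaskConfig, rng: random.Random) -> bool:
    """Return True when the span passes the task's filter and is sampled."""
    if not task.span_filter(span):
        return False  # filtered out: the task does not apply to this span
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 always samples
    # and a rate of 0.0 never does.
    return rng.random() < task.sampling_rate


# Example task: evaluate roughly half of the spans whose (assumed)
# span_kind attribute is "LLM".
task = EvalTaskConfig(
    name="language-check",
    sampling_rate=0.5,
    span_filter=lambda span: span.get("span_kind") == "LLM",
)
```

Because each task carries its own filter and sampling rate, two tasks can look at the same incoming span and reach different decisions.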
Feature | Online Eval | Offline Eval |
---|---|---|
Code Executes | In Platform Server | Python Client Side |
Data Available | Every Span Attribute and Eval | Every Span Attribute and Eval |
Created | UI or Python SDK | Python Code |
Run Statistics | Eval Task Execution Statistics | N/A |
Tracing | N/A | Supported Relative to Experiment |
Python Libraries | Full Support of Publicly Accessible Libraries | Full Support of Pip Accessible Libraries |
Execution Time Libraries | Libraries Pre-Downloaded in Requirements | Any Library |
Library Version | Any Version | Any Version |
Internet Content | No Network Access in Python | Any |
The online code-based Eval runs server-side as data is ingested. It runs in an isolated container that is preloaded with the libraries listed in the Eval's requirements. Because every container is preloaded with exactly those libraries, any version of any library the Eval uses can be specified.
Writing an Online Code Eval
Online code Evals are written the same way as offline code-based Evals. You can copy the code of an existing Eval into the user interface or push it through the API.
The above is a code-based Eval using the BaseArizeEvaluator class. The evaluate method uses a third-party model for language detection.
All Code Evaluator types are supported: an evaluator can return a score, a label, or an EvaluationResult.
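The three return types can be sketched as follows. This is a self-contained illustration using only the standard library, not the Arize SDK: the `EvaluationResult` dataclass is a local stand-in whose field names are assumptions, the `evaluate(self, output, **kwargs)` signature is assumed, and the three evaluator classes are hypothetical. In real use an evaluator would subclass BaseArizeEvaluator and return the SDK's own result type.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationResult:
    """Local stand-in for the SDK result type; field names are assumptions."""

    score: Optional[float] = None
    label: Optional[str] = None
    explanation: Optional[str] = None


def _is_json(text: str) -> bool:
    """Return True if text parses as JSON. Needs no network access."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False


class JsonScoreEvaluator:
    """Hypothetical evaluator that returns a numeric score."""

    def evaluate(self, output: str, **kwargs) -> float:
        return 1.0 if _is_json(output) else 0.0


class JsonLabelEvaluator:
    """Hypothetical evaluator that returns a string label."""

    def evaluate(self, output: str, **kwargs) -> str:
        return "valid" if _is_json(output) else "invalid"


class JsonResultEvaluator:
    """Hypothetical evaluator that returns score, label, and explanation."""

    def evaluate(self, output: str, **kwargs) -> EvaluationResult:
        ok = _is_json(output)
        return EvaluationResult(
            score=1.0 if ok else 0.0,
            label="valid" if ok else "invalid",
            explanation="Output parsed as JSON." if ok else "Output is not valid JSON.",
        )
```

The check relies only on the standard library, so it would also fit the online constraint of running without network access.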
Troubleshooting Online Task Runs
Online tasks run on incoming data, and understanding which task ran on which data can be complicated. Arize provides detailed information on what was applied to each piece of incoming data.
The above shows examples of data that was run, skipped, or processed.