CML (DVC)
Example of using Arize in a CI/CD workflow
Overview
This tutorial walks through using Arize in a continuous integration and continuous deployment (CI/CD) workflow for models.
This tutorial is based on the work of the Continuous Machine Learning (CML) group:
This tutorial will show you how to:
Integrate Arize into the CI/CD workflow
Run Arize on validation data every time a new model version is checked in
Capture data in the platform to compare different validated models
Workflow Overview
The CI/CD workflow for models with CML involves a training script and a GitHub Actions workflow that runs it.
The following describes the steps for training and validation runs:
A model directory is set up on GitHub that contains both the model file and the train scripts for CML
Train scripts are built to run a set of inferences against any newly built model
GitHub Actions are set up to run the train script on any model check-in
On model check-in, the train script is run
The train script logs the validation inferences to Arize
Checks within the Arize platform can be set up to run on every new validation batch of data. These checks can compare against previous model data or against fixed levels
On check failure, dashboards can be created for model analysis
Future: the ability to quickly poll the validation checks through an API as part of GitHub Actions for pass/fail
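The two kinds of checks mentioned in the steps above (a fixed level and a comparison against the previous model) can be illustrated locally. The following is a minimal sketch, not Arize's check mechanism or API: the function name, the `max_drop` tolerance, and the assumption that a higher metric is better are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    passed: bool
    reason: str


def check_validation_metric(
    current: float,
    fixed_floor: float,
    previous: Optional[float] = None,
    max_drop: float = 0.02,  # hypothetical tolerance, not an Arize default
) -> CheckResult:
    """Gate a new model version on a validation metric (higher is better)."""
    # Fixed-level check: the metric must reach an absolute minimum.
    if current < fixed_floor:
        return CheckResult(
            False, f"metric {current:.3f} is below the floor {fixed_floor:.3f}"
        )
    # Relative check: the metric must not regress too far from the previous model.
    if previous is not None and previous - current > max_drop:
        return CheckResult(
            False, f"metric dropped {previous - current:.3f} from the previous model"
        )
    return CheckResult(True, "all checks passed")
```

In a CI step, a script could call `sys.exit(0 if result.passed else 1)` so that a failed check fails the job.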
An example train script for CML with Arize is included here:
The .github/workflows directory defines the GitHub Actions that run on model check-in. It is derived from the CML example.
The train scripts typically have two parts:
1. Train and score the latest model
2. Log artifacts and data to CML
A third section is added to send the feature data, inferences (predictions), and ground truth (actuals) to Arize.
If using version < 4.0.0, replace space_id=SPACE_ID with organization_key=SPACE_ID.
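The three-part structure of a train script can be sketched as follows. This is a simplified illustration under stated assumptions, not the example script referenced in this tutorial: the threshold "model", the `metrics.txt` file name, the record field names, and the `send` callback are all hypothetical. In a real script, `send` would wrap the Arize SDK logging call, which is deliberately not shown here.

```python
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple


def train_and_score(
    train_x: Sequence[float], train_y: Sequence[int],
    val_x: Sequence[float], val_y: Sequence[int],
) -> Tuple[List[int], float]:
    """Part 1: fit a stand-in threshold model and score the validation set."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    # Threshold at the midpoint between the two class means.
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    preds = [1 if x >= threshold else 0 for x in val_x]
    accuracy = sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
    return preds, accuracy


def write_metrics(accuracy: float, path: Path) -> None:
    """Part 2: write metrics to a file that a CML step can publish."""
    path.write_text(f"Validation accuracy: {accuracy:.3f}\n")


def build_records(
    model_version: str, val_x: Sequence[float],
    preds: Sequence[int], val_y: Sequence[int],
) -> List[Dict]:
    """Part 3: pair features, predictions, and actuals for each validation row."""
    return [
        {
            "prediction_id": f"{model_version}-{i}",  # hypothetical ID scheme
            "model_version": model_version,
            "features": {"x": x},
            "prediction": p,
            "actual": y,
        }
        for i, (x, p, y) in enumerate(zip(val_x, preds, val_y))
    ]


def run(
    model_version: str,
    train_x: Sequence[float], train_y: Sequence[int],
    val_x: Sequence[float], val_y: Sequence[int],
    send: Callable[[List[Dict]], None],  # stand-in for the Arize logging call
    metrics_path: Path = Path("metrics.txt"),
) -> float:
    preds, accuracy = train_and_score(train_x, train_y, val_x, val_y)
    write_metrics(accuracy, metrics_path)
    send(build_records(model_version, val_x, preds, val_y))
    return accuracy
```

Passing the logging step in as a callback keeps the training and CML parts runnable without credentials, and tagging every record with the model version is what allows validation batches from different check-ins to be compared later.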
The above workflow can be adapted to any model CI/CD flow in which scoring is done on a set of validation inferences.