Arize syncs the contents of files as model inferences by creating File Import Jobs. A File Import Job is configured with the location of your files, a file schema (how to import the rows), and some basic model configuration. These jobs continuously monitor the files uploaded to or present in your buckets. When a change is detected, the contents of the files are automatically ingested into the Arize platform without the need to manually log the records.
File Importer Service converts file contents to model inferences on a set interval
The above diagram illustrates how Arize continuously pulls new files from your bucket to import as model inferences.
File import jobs map bucket/prefix tuples to model inference datasets:
job 1: s3://training-bucket/cc-fraud/ -> cc_fraud_model training dataset
job 2: s3://production-bucket/cc-fraud/ -> cc_fraud_model production dataset
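The mapping above can be sketched as plain data. This is illustrative only; the field names below are assumptions, not the actual Arize job schema:

```python
# Illustrative sketch of file import jobs as bucket/prefix -> dataset mappings.
# Field names ("bucket", "model_id", "environment") are assumptions for this
# example, not the real Arize configuration keys.
jobs = [
    {
        "bucket": "s3://training-bucket/cc-fraud/",
        "model_id": "cc_fraud_model",
        "environment": "training",
    },
    {
        "bucket": "s3://production-bucket/cc-fraud/",
        "model_id": "cc_fraud_model",
        "environment": "production",
    },
]

def target_dataset(job):
    """Resolve which model dataset a job's files are ingested into."""
    return f"{job['model_id']} {job['environment']} dataset"

for job in jobs:
    print(job["bucket"], "->", target_dataset(job))
```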
It is important to either point each job at a distinct bucket location or use the job configuration parameters (the file schema) so that inferences are imported under the correct model and dataset type.
Capturing properties such as version and batchId inside the files lets jobs automatically adjust to changes in your model history. If you instead use a directory structure to encode properties like version and environment, make use of the job's model configuration: these "constant" values are applied when importing the inferences.
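The distinction between per-row columns and job-level constants can be sketched as follows. The schema and field names here are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical file schema: some fields are read from columns in each file,
# while "constant" values from the job's model configuration are applied to
# every imported inference (e.g. when version/environment live in the path).
schema = {
    "prediction_id": "pred_id",    # column in each file
    "prediction_label": "pred",    # column in each file
    "timestamp": "ts",             # column in each file
}
model_config = {
    "model_version": "v2",         # constant for this job
    "environment": "production",   # constant for this job
}

def to_inference(row, schema, model_config):
    """Combine per-row columns with job-level constant values."""
    inference = {dest: row[src] for dest, src in schema.items()}
    inference.update(model_config)
    return inference

row = {"pred_id": "abc-123", "pred": "fraud", "ts": 1700000000}
print(to_inference(row, schema, model_config))
```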
File Import Jobs
A file import job captures how to transcribe files in an object store location to model inferences. Once a job is created, it coordinates a worker to continuously ingest inferences from the files. Jobs are created and managed on a per-space basis, so jobs located under a workspace create or update models under the workspace in which they are configured. This allows you to tightly control access to your model inferences if needed.
Please reach out to the Arize team to request support for import job-sets, which allow importing sets or groups of models with a single setup. This is useful for cases with hundreds or thousands of models, or for dynamically setting up models and jobs based on dropped data.
Authentication & Authorization
The file importer pulls from file locations in Amazon S3, Google Cloud Storage, or Azure. In order to access the bucket, Arize must be authorized to access the files. Authentication and authorization are enabled by applying a policy to the bucket that grants the Arize file importer service access to your files.
Example AWS S3 Policy that grants Arize access to your files
The bucket policy is cloud-platform specific. The example above shows the kind of policy that is provided to you during the job setup process.
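For reference, an S3 bucket policy granting read access generally takes the shape below. The principal ARN, bucket name, and exact action list are placeholders; use the exact policy Arize provides during job setup:

```python
import json

# Sketch of an S3 bucket policy granting an external service read access.
# The principal ARN, bucket name, and actions are placeholders, NOT the
# real policy -- copy the one shown in the Arize job setup wizard.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<ARIZE-ACCOUNT-ID>:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::your-bucket",
                "arn:aws:s3:::your-bucket/*",
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```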
RBAC (role-based access control) governs write access to models within a workspace in the Arize platform, and file import jobs fall under the same access controls. Users in your account with write access to a particular workspace can therefore also create the jobs that manage the model ingestion pipeline.
Additionally, your cloud storage bucket is secured by applying a bucket tag: a key-value pair with the key arize-ingestion-key and, as the value, the token provided in the import wizard. Ex: AWS Object Tags.
Navigate to your bucket's Properties tab
Add your arize-ingestion-key as a bucket tag
Example bucket tag that secures your storage bucket
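The tagging steps above can also be scripted. The sketch below writes the tag set to a file that could then be applied with the AWS CLI; the token value is a placeholder for the one shown in the import wizard:

```python
import json

# Sketch: the bucket tag set securing ingestion. Replace the Value with the
# arize-ingestion-key token shown in the Arize import wizard.
tag_set = {
    "TagSet": [
        {"Key": "arize-ingestion-key", "Value": "<token-from-import-wizard>"}
    ]
}

# With the AWS CLI, this tag set could then be applied as:
#   aws s3api put-bucket-tagging --bucket your-bucket --tagging file://tags.json
with open("tags.json", "w") as f:
    json.dump(tag_set, f, indent=2)
```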