Arize AI
File Importer - Cloud Storage
Connect a cloud storage bucket with Arize to automatically sync your model inferences
If you are already storing your model inferences in cloud storage, Arize can automatically extract data from your files and transcribe the stored rows as Arize model records.
Cloud Storage options
Supported Object Stores
  • Amazon S3
  • Google GCS
  • Microsoft Azure Cloud Storage (coming soon)
Our GraphQL API provides a direct path between your cloud storage and the Arize platform. Learn more here.
Supported File Formats
  • CSV
  • Parquet
  • JSON (coming soon)
  • Arrow (coming soon)
See the file format section for details on each format:

How it works

Arize syncs the contents of files as model inferences by creating File Import Jobs. A File Import Job is configured with the location of your files, a file schema (how to import the rows), and some basic model configuration. Each job then continuously tracks the files that are uploaded to, or already present in, your buckets. When a change is detected in these buckets, the new files are automatically ingested into the Arize platform.
Arize checks for new files every 10 seconds.
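The change detection described above can be pictured as comparing each bucket listing against the set of files already imported. The following is only an illustrative sketch, not Arize's implementation: the function name `find_new_files` and the in-memory `seen` set are assumptions made for this example.

```python
from typing import Iterable, List, Set


def find_new_files(listing: Iterable[str], seen: Set[str]) -> List[str]:
    """Return keys in `listing` that are not yet in `seen`, then mark them as seen."""
    new_files = [key for key in sorted(set(listing)) if key not in seen]
    seen.update(new_files)
    return new_files


# Hypothetical listings from two consecutive polls of the same prefix.
seen: Set[str] = set()
first_poll = find_new_files(["cc-fraud/day1.parquet"], seen)
second_poll = find_new_files(
    ["cc-fraud/day1.parquet", "cc-fraud/day2.parquet"], seen
)
```

On the second poll only `cc-fraud/day2.parquet` is reported, which mirrors the idea that files already ingested are not imported again.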
File Importer Service converts file contents to model inferences on a set interval
The above diagram illustrates how Arize continuously pulls new files from your bucket to import as model inferences.
File import jobs map bucket/prefix tuples to model inference datasets:
job 1: s3://training-bucket/cc-fraud/ -> cc_fraud_model training dataset
job 2: s3://production-bucket/cc-fraud/ -> cc_fraud_model production dataset
job 3: gs://bucket1/click-thru-rate/production/v1/ -> ctr_model production dataset
job 4: gs://bucket1/click-thru-rate/training/v1/ -> ctr_model training dataset
Point each job at a distinct bucket location so that its configuration parameters (such as the file schema) import the inferences under the correct model and environment.
Capturing things like version and batchId inside of the files makes the jobs automatically capture these subsets in your model history.
If you use directory structure to capture properties like a model's environment or version, you'll need to set them through the job's model configuration and schema definition, respectively. These "constant" values will be used when importing the inferences and reflected on your dataset.
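To make the bucket/prefix mapping concrete, here is a minimal sketch that models the four example jobs as plain records and resolves a file URI to the job whose prefix it falls under. The `ImportJob` class and `job_for_file` helper are hypothetical names for illustration only; they are not part of the Arize API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImportJob:
    prefix: str        # bucket/prefix location the job watches
    model: str         # model the inferences are imported under
    environment: str   # "constant" value taken from the job configuration


JOBS = [
    ImportJob("s3://training-bucket/cc-fraud/", "cc_fraud_model", "training"),
    ImportJob("s3://production-bucket/cc-fraud/", "cc_fraud_model", "production"),
    ImportJob("gs://bucket1/click-thru-rate/production/v1/", "ctr_model", "production"),
    ImportJob("gs://bucket1/click-thru-rate/training/v1/", "ctr_model", "training"),
]


def job_for_file(uri: str, jobs: List[ImportJob]) -> Optional[ImportJob]:
    """Return the job with the longest prefix matching `uri`, or None if no job matches."""
    matches = [job for job in jobs if uri.startswith(job.prefix)]
    return max(matches, key=lambda job: len(job.prefix), default=None)


# Hypothetical file name under the job 4 prefix.
job = job_for_file("gs://bucket1/click-thru-rate/training/v1/part-0.csv", JOBS)
```

Because the prefixes are distinct, each file resolves to exactly one job, and the model and environment come from that job's configuration rather than from the file itself.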

File Import Jobs

A file import job captures how to transcribe files in an object store location to model inferences. Once a job is created, it coordinates a worker to continuously ingest inferences from the files. Jobs are created and managed on a per-space basis, so a job creates and updates models only in the workspace where it is configured. This allows you to tightly control access to your model inferences if needed.
Please reach out to the Arize team to request support for import job-sets, which allow importing sets or groups of models with a single setup. This is useful when you have hundreds or thousands of models, or when models and jobs need to be set up dynamically based on dropped data.

Authentication & Authorization

For the File Importer to access files in the bucket, Arize must be authorized to access these objects. Authentication and authorization are enabled by applying a policy to the bucket (in AWS) or by granting Arize's principal access to a custom role in your GCP project that allows the Arize file importer service to access your files.
RBAC (role-based access control) governs write access to models within a workspace in the Arize platform. File import jobs fall under the same access. Thus, users in your account who have write access to a particular workspace also have New Import Job access and can create new jobs to ingest new models into Arize.
In addition to providing Arize with read permission to your cloud storage bucket, you must prove to Arize that you own the bucket the job is importing data from. Ownership is verified by applying a tag (AWS) or label (GCS) to your storage bucket, consisting of a key-value pair with arize-ingestion-key as the key and the value provided in the import wizard.
Capture your unique arize-ingestion-key
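For S3, one way to apply the tag programmatically is with boto3, sketched below under a few assumptions. The helper names `merge_ingestion_tag` and `tag_s3_bucket` are made up for this example, and the value must be the one shown in the import wizard. Note that S3's `put_bucket_tagging` replaces the bucket's whole tag set, so the sketch merges the new tag into any existing tags first.

```python
from typing import Dict, List

ARIZE_TAG_KEY = "arize-ingestion-key"


def merge_ingestion_tag(
    existing: List[Dict[str, str]], value: str
) -> List[Dict[str, str]]:
    """Return an S3 TagSet with the ingestion key set, keeping all other tags."""
    kept = [tag for tag in existing if tag["Key"] != ARIZE_TAG_KEY]
    return kept + [{"Key": ARIZE_TAG_KEY, "Value": value}]


def tag_s3_bucket(bucket: str, value: str) -> None:
    """Apply the tag with boto3. Requires boto3 and AWS credentials."""
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        existing = s3.get_bucket_tagging(Bucket=bucket)["TagSet"]
    except ClientError as err:
        # S3 returns NoSuchTagSet when the bucket has no tags yet.
        if err.response["Error"]["Code"] != "NoSuchTagSet":
            raise
        existing = []
    s3.put_bucket_tagging(
        Bucket=bucket,
        Tagging={"TagSet": merge_ingestion_tag(existing, value)},
    )


# Pure-Python illustration with a placeholder value (not a real key).
tag_set = merge_ingestion_tag([{"Key": "team", "Value": "ml"}], "value-from-wizard")
```

Calling `tag_s3_bucket("my-bucket", "<value from the import wizard>")` would then apply the tag; the bucket name here is a placeholder. The same key-value pair can also be added through the AWS console.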
Permissions are cloud-platform specific. Please refer to the GCS and S3 examples for details on how to set these up:
Questions? Email us at [email protected] or Slack us in the #arize-support channel