Google Cloud Storage (GCS)
Set up an import job to ingest data into Arize from GCS
Set up an import job to log inference files to Arize. Arize checks for updates to files every 10 seconds. Users generally find a sweet spot of a few hundred thousand to a million rows per file, with a total file size limit of 1GB.
Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'GCS' card to begin a new file import job.

Step 1: Pick GCS
Fill in the bucket and (optional) folder from which you would like Arize to pull your model's inferences.

In this example, you might have a GCS bucket and folder named gcs://bucket1/click-thru-rate/production/v1/ that contains parquet files of your model inferences. Your bucket name is bucket1 and your prefix is click-thru-rate/production/v1/.

The file structure can take into consideration various model environments (training, production, etc.) and locations of ground truth.
Example 1: Predictions & Actuals Stored in Separate Folders (different prefixes)
This example contains model predictions and actuals in separate files. This helps in cases of delayed actuals. Learn more here.
gcs://bucket1/click-thru-rate/production/
├── prediction-folder/
│ ├── 12-1-2022.parquet #this file can contain multiple versions
│ ├── 12-2-2022.parquet
│ ├── 12-3-2022.parquet
├── actuals-folder/
│ ├── 12-1-2022.parquet
│ ├── 12-2-2022.parquet
│ └── 12-3-2022.parquet
Example 2: Production & Training Stored in Separate Folders
This example separates model environments (production and training).
gcs://bucket1/click-thru-rate/v1/
├── production-folder/
│ ├── 12-1-2022.parquet
│ ├── 12-2-2022.parquet
│ ├── 12-3-2022.parquet
├── training-folder/
│ ├── 12-1-2022.parquet
│ ├── 12-2-2022.parquet
│ └── 12-3-2022.parquet
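Before setting up the job, you can confirm your files are laid out as expected by listing the prefix from the Google Cloud Shell. This is a quick sketch using the example bucket and prefix from the first example above; substitute your own bucket and prefix.

# Recursively list every file Arize would discover under the prefix
gsutil ls -r gs://bucket1/click-thru-rate/production/v1/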
The GCS Project ID is a unique identifier for a project. See GCS Docs for steps on how to retrieve this ID.
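If you are not sure of your project ID, you can also look it up from the Google Cloud Shell using standard gcloud commands:

# Print the ID of the currently active project
gcloud config get-value project

# Or list the IDs of all projects you can access
gcloud projects list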

Tag your GCS bucket with the key arize-ingestion-key and the provided tag value. For more details, see the docs on Using Bucket Labels.

1) In the Arize UI: Copy the arize-ingestion-key value.

2) In the Google Cloud console: Navigate to Cloud Storage. Here, you will see a list of your buckets. Find the bucket matching the bucket name set in your job, select the button for more options, and update its labels to include the arize-ingestion-key.
- Key: arize-ingestion-key
- Value: the arize-ingestion-key value from the Arize UI
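If you prefer the command line, the same label can also be applied from the Google Cloud Shell. This is a minimal sketch assuming the example bucket bucket1; substitute your own bucket name and the key value copied from the Arize UI in place of the placeholder.

# Add or update the arize-ingestion-key label on the bucket
gsutil label ch -l "arize-ingestion-key:<ARIZE_INGESTION_KEY_VALUE>" gs://bucket1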



Create the IAM role
1) Copy the command from the Custom IAM Role field in the Arize UI.
2) Start the Google Cloud Shell.
3) Paste and run the command to create the custom IAM role.

Grant Arize access to the custom role
1) Copy the command from the Apply IAM Permission field in the Arize UI.
2) Paste and run the gsutil command in the Google Cloud Shell. Be sure to update your project ID in the service account path.
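The exact commands are generated for you in the Arize UI. As an illustrative sketch of roughly what they look like, assuming the example custom role FileImporterViewer and bucket arize-example-bucket from the Terraform example below, with your own project ID in place of <PROJECT_ID>:

# Create a custom role with read-only access to the bucket and its objects
gcloud iam roles create FileImporterViewer \
  --project=<PROJECT_ID> \
  --title="File Importer Viewer" \
  --description="Permission to view storage bucket, and view and list objects" \
  --permissions=storage.buckets.get,storage.objects.get,storage.objects.list

# Grant the Arize service account the custom role on the bucket
gsutil iam ch \
  "serviceAccount:[email protected]:projects/<PROJECT_ID>/roles/FileImporterViewer" \
  gs://arize-example-bucket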
Model schema parameters are a way of organizing model inference data to ingest into Arize. When configuring your schema, be sure to match your data column headers with the model schema.
Either use a form or a simple JSON-based schema to specify the column mapping.
Arize supports CSV, Parquet, Avro, and Apache Arrow. Refer here for a list of the expected data types by input type.

Set up model configurations

Map your file using form inputs

Map your file using a JSON schema
Property | Description | Required |
---|---|---|
prediction_ID | The unique identifier of a specific prediction | Required |
timestamp | The timestamp of the prediction in seconds or an RFC3339 timestamp | Optional, defaults to current timestamp at file ingestion time |
prediction_label | Column name for the prediction value | Required |
prediction_score | Column name for the predicted score | |
actual_label | Column name for the actual or ground truth value | Optional for production records |
relevance_label | Column name for ranking actual or ground truth value | |
actual_score | Column name for the ground truth score | Optional |
relevance_score | Column name for ranking ground truth score | |
features | A string prefix (feature/) that identifies feature columns. Features must be sent in the same file as predictions | Arize automatically infers columns as features. Choose between feature prefixing OR inferred features. |
tags | A string prefix (tag/) that identifies tag columns. Tags must be sent in the same file as predictions and features | Optional |
shap_values | A string prefix (shap/) that identifies SHAP value columns. SHAP values must be sent in the same file as predictions or with a matching prediction_id | Optional |
version | A column to specify the model version. The version/ prefix assigns a version to the corresponding data within a column, or configure your version within the UI | Optional, defaults to 'no_version' |
batch_id | Distinguish different batches of data under the same model_id and model_version. Must be specified as a constant during job setup or in the schema | Optional for validation records only |
exclude | A list of columns to exclude if the features property is not included in the ingestion schema | Optional |
embedding_features | A list of embedding columns: a required vector column, an optional raw data column, and an optional link-to-data column. Learn more here | Optional |
Once finished, your import job will be created and will start polling your bucket for files.
If your model receives delayed actuals, connect your predictions and actuals using the same prediction ID, which links your data together in the Arize platform. Arize regularly checks your data source for both predictions and actuals, and ingests them separately as they become available. Learn more here.
Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you may then create the import job.

After creating a job following a successful dry run, you will be taken to the 'Job Status' tab, where you can see the status of your import jobs. A created job will regularly sync new data from your data source with Arize. You can view the job details and import progress by clicking on the job ID.
An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.
If there is an error validating a file against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.
If your dry run is successful, but your job fails, click on the job ID to view the job details. This uncovers job details such as information about the file path or query id, the last import job, potential errors, and error locations.
Once you've identified the job failure point, fix the file errors and reupload the file to Arize with a new name.
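For example, after fixing a file locally, you can copy it back to the bucket under a new object name from the Google Cloud Shell. This is a sketch; the file name and path are illustrative, based on the folder layout above.

# Re-upload the corrected file under a new name so it is ingested as a new file
gsutil cp 12-1-2022-fixed.parquet gs://bucket1/click-thru-rate/production/prediction-folder/12-1-2022-fixed.parquet

If you manage your GCS resources with Terraform, the bucket label, custom role, and IAM binding described above can be configured as in the following example: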
resource "google_storage_bucket" "arize-example-bucket" {
// (optional) uniform_bucket_level_access = true
name = "arize-example-bucket"
project = google_project.development.project_id
labels = {
"arize-ingestion-key" = "value_from_arize_ui"
}
}
resource "google_project_iam_custom_role" "arize-example-bucket" {
description = "permission to view storage bucket, and view and list objects"
permissions = [
"storage.buckets.get",
"storage.objects.get",
"storage.objects.list"
]
project = google_project.development.project_id
role_id = "FileImporterViewer"
title = "File Importer Viewer"
stage = "ALPHA"
}
resource "google_storage_bucket_iam_binding" "arize-example-bucket-iam-binding" {
bucket = google_storage_bucket.arize-example-bucket.name
role = "projects/<PROJECT_ID>/roles/FileImporterViewer"
members = [
"serviceAccount:[email protected]",
]
}
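To adapt this example, replace <PROJECT_ID> with your own project ID and value_from_arize_ui with the arize-ingestion-key value copied from the Arize UI.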