Google BigQuery

Learn how to set up an import job using Google BigQuery.

Step 1 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Google BQ' card or navigate to the Data Warehouse tab to begin a new table import job.

Storage Selection: Google BQ

Step 2 - Input the Project ID, Dataset, and Table / View

Locate the Project ID, Dataset, and Table or View name of the table/view you would like to sync from Google BigQuery.

The dataset and table name correspond to the path where your table is located.

Add your Table ID in Arize. Arize will automatically parse your Dataset, Table Name, and GCP Project ID.
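To illustrate how the three parts relate, here is a minimal sketch that splits a Table ID into its project, dataset, and table. It assumes the dotted form project.dataset.table; the helper name parse_table_id and the example ID are hypothetical, and this is not Arize's own parser.

```shell
# Hypothetical helper: split a Table ID of the assumed form
# project.dataset.table into PROJECT_ID, DATASET, and TABLE.
parse_table_id() {
  case $1 in
    # Reject more than two dots, empty parts, and leading/trailing dots.
    *.*.*.*|*..*|.*|*.)
      echo "expected project.dataset.table, got: $1" >&2
      return 1
      ;;
    *.*.*)
      ;;
    *)
      echo "expected project.dataset.table, got: $1" >&2
      return 1
      ;;
  esac
  PROJECT_ID=${1%%.*}   # everything before the first dot
  rest=${1#*.}          # everything after the first dot
  DATASET=${rest%%.*}
  TABLE=${rest#*.}
}

# Example with made-up names.
parse_table_id "my-project.my_dataset.my_table"
echo "project=${PROJECT_ID} dataset=${DATASET} table=${TABLE}"
```

The resulting variables match the ${PROJECT_ID}, ${DATASET}, and ${TABLE} placeholders used in the commands later in this guide.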

Step 3 - Grant Access To Your Dataset, Table, or View

In Arize UI: Copy arize-ingestion-key value

Grant Access To A Table/View

In Google Cloud console: Navigate to the BigQuery SQL Workspace

Select the desired table or view, navigate to the Details tab and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:
- Key as "arize-ingestion-key"
- Value as the arize-ingestion-key value from the Arize UI
Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".

Navigate to your table/view from the BigQuery SQL Explorer page.
Select "Share" and click on "Permissions".
Click "Add Principal".

Add the Arize service account, fileimporter@production-269901.iam.gserviceaccount.com, as a BigQuery Data Viewer, and click "Save".

For a view, you must also grant access to every underlying table, so repeat these steps for each one.

You can create a Cloud Shell instance from the UI to run the following commands.

Add the arize-ingestion-key from the Arize UI as a label on the dataset:

bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}

Grant the roles/bigquery.jobUser role to the Arize service account.

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com --role=roles/bigquery.jobUser

Grant the roles/bigquery.dataViewer role to the Arize service account on your table or view.

Table:

bq add-iam-policy-binding \
  --member='serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com' \
  --role='roles/bigquery.dataViewer' \
  ${PROJECT_ID}:${DATASET}.${TABLE}
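Because a view needs the same binding on every underlying table, the command above may have to be repeated several times. The following is a minimal sketch of that loop. The run and grant_data_viewer helpers, the DRY_RUN variable, and the resource names are all hypothetical; by default the sketch only prints the commands so they can be reviewed first.

```shell
SERVICE_ACCOUNT="fileimporter@production-269901.iam.gserviceaccount.com"

# Print the command when DRY_RUN is 1 (the default here); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Grant roles/bigquery.dataViewer on each table or view passed in,
# written as project:dataset.table.
grant_data_viewer() {
  for resource in "$@"; do
    run bq add-iam-policy-binding \
      --member="serviceAccount:${SERVICE_ACCOUNT}" \
      --role="roles/bigquery.dataViewer" \
      "$resource"
  done
}

# Made-up names: a view plus the two tables it selects from.
grant_data_viewer \
  "my-project:my_dataset.my_view" \
  "my-project:my_dataset.orders" \
  "my-project:my_dataset.customers"
```

Once the printed commands look right, setting DRY_RUN=0 before calling grant_data_viewer executes them instead of printing them.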

Grant Access To An Entire Dataset:

In Google Cloud console: Navigate to the BigQuery SQL Workspace

Select the desired dataset, and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:
- Key as "arize-ingestion-key"
- Value as the arize-ingestion-key value copied from the Arize UI
Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".

Navigate to your dataset from the BigQuery SQL Explorer page.
Select "Sharing" and click on "Permissions".

Click "Add Principal".

Add the Arize service account, fileimporter@production-269901.iam.gserviceaccount.com, as a BigQuery Data Viewer, and click "Save".

You can create a Cloud Shell instance from the UI to run the following commands.

Add the arize-ingestion-key from the Arize UI as a label on the dataset:

bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}

Grant the roles/bigquery.jobUser role to the Arize service account.

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com --role=roles/bigquery.jobUser
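The commands above cover the label and the jobUser role; the BigQuery Data Viewer grant on the dataset is described only through the console. One possible command-line equivalent, offered as an assumption rather than as Arize's documented method, is a GoogleSQL GRANT statement on the dataset (a "schema" in DCL terms). The build_dataset_grant helper and the project and dataset names below are hypothetical, and the sketch only prints the statement.

```shell
SERVICE_ACCOUNT="fileimporter@production-269901.iam.gserviceaccount.com"

# Hypothetical helper: build a GRANT statement that gives the service
# account roles/bigquery.dataViewer on a whole dataset.
build_dataset_grant() {
  project_id=$1
  dataset=$2
  # Single quotes keep the backticks literal instead of running a subshell.
  printf 'GRANT `roles/bigquery.dataViewer` ON SCHEMA `%s`.%s TO "serviceAccount:%s"\n' \
    "$project_id" "$dataset" "$SERVICE_ACCOUNT"
}

# Made-up project and dataset names.
STATEMENT=$(build_dataset_grant "my-project" "my_dataset")
echo "$STATEMENT"

# To apply it, pass the statement to bq (left commented out so the sketch
# has no side effects):
# bq query --use_legacy_sql=false "$STATEMENT"
```

Running the statement requires permission to change the dataset's access policy, so reviewing the printed statement first is a reasonable precaution.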

Step 4 - Configure Your Model And Define Your Table’s Schema

Once finished, Arize will begin querying your table and ingesting your records as model inferences.

Step 4b - Validate Model Schema

Once you fill in your applicable predictions, actuals, and model inputs, click 'Validate Schema' to visualize your model schema in the Arize UI. Check that your column names and corresponding data match for a successful import job.

Step 5 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 6 - Check Your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you can proceed to create the import job.

From there, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

All active jobs will regularly sync new data from your data source with Arize. You can view the job details and import progress by clicking on the job ID, which reveals more information about the job.

Step 6.5 - Pause Or Delete An Import Job

To pause or delete a job, or to edit your table schema, click on 'Job Options'.

Delete a job if it is no longer needed or if you made an error connecting to the wrong bucket. This will set your job status as 'deleted' in Arize.
Pause a job if you have a set cadence to update your table. This way, you can 'start job' when you know there will be new data to reduce query costs. This will set your job status as 'inactive' in Arize.

Step 7 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.
