Google BigQuery

Learn how to set up an import job using Google BigQuery

Step 1 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Google BQ' card or navigate to the Data Warehouse tab to begin a new table import job.
Storage Selection: Google BQ

Step 2 - Input the Project ID, Dataset, and Table / View

Locate the Project ID, Dataset, and Table or View name of the table/view you would like to sync from Google BigQuery.
  • The GBQ Project ID is a unique identifier for a project. See here for steps on how to retrieve this ID.
  • The dataset and table names correspond to the path where your table is located.
Console view to find Project ID, Dataset name and Table/View name
Fill in the table information
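As a rough illustration, the three values from this step combine into the project:dataset.table reference used by the bq commands later in this guide. The sketch below uses placeholder names (my-project, my_dataset, my_table), which are assumptions, and only prints a lookup command instead of running it.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
DATASET="my_dataset"
TABLE="my_table"

# Join the three parts into the project:dataset.table form accepted by the bq CLI.
table_ref() {
  printf '%s:%s.%s\n' "$1" "$2" "$3"
}

TABLE_REF="$(table_ref "$PROJECT_ID" "$DATASET" "$TABLE")"

# Print (do not run) a command that shows the table's metadata,
# which is a quick way to confirm the path is correct.
echo "bq show --format=prettyjson ${TABLE_REF}"
```

Running the printed command in Cloud Shell should fail if any part of the path is wrong, which makes it a cheap check before filling in the form.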

Step 3 - Grant Access To Your Dataset, Table, or View

Tag your dataset/table/view with the arize-ingestion-key and the provided label value using the steps below. For more details, see docs on Adding labels to resources for BigQuery.
In Arize UI: Copy arize-ingestion-key value
Copy your ingestion key and add it as a key-value pair to your GBQ dataset
Consider creating an authorized view if you don't want to grant access to the underlying tables, or if granting access to each underlying table is too cumbersome.
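The labeling part of this step can be sketched from the command line as follows. BigQuery label values are limited to 63 characters, and this sketch applies a deliberately conservative check (lowercase letters, digits, underscores, and dashes only) before printing the bq update commands. All names and the key value are placeholders, and the commands are printed rather than executed.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # keep the a-z range below ASCII-only

# Placeholder values (assumptions); replace them with your own.
KEY_FROM_UI="abc123-example-key"
PROJECT_ID="my-project"
DATASET="my_dataset"
TABLE="my_table"

# Conservative check: 1 to 63 characters drawn from
# lowercase letters, digits, underscores, and dashes.
is_valid_label_value() {
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 63 ] || return 1
  case "$1" in
    *[!a-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

if is_valid_label_value "$KEY_FROM_UI"; then
  # Print (do not run) the labeling command for a dataset and for a single table.
  echo "bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}"
  echo "bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}.${TABLE}"
else
  echo "Value is not usable as a BigQuery label value: ${KEY_FROM_UI}" >&2
fi
```

If the check rejects the value, the most likely cause is a copy error (for example, stray whitespace or quotes), so copy the key from the Arize UI again.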

Grant Access To A Table/View

From UI
  1. In the Google Cloud console, navigate to the BigQuery SQL Workspace.
  2. Select the desired table or view, navigate to the Details tab, and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:
    • Key as "arize-ingestion-key"
    • Value as the arize-ingestion-key value from the Arize UI
  3. Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".
Add Arize service account as "Principal" with "BigQuery Job User" role
  4. Grant the Arize service account access to the table or view itself (the CLI steps use the roles/bigquery.dataViewer role for this):
    • Navigate to your table/view from the BigQuery SQL Explorer page.
    • Select "Share" and click on "Permissions".
    • Click "Add Principal".
    • For a view, you must also grant access to every underlying table, so repeat these steps for each of them.
For more details, see the official documentation for granting access here.

From CLI
You can create a Cloud Shell instance from the UI to run the following commands.
  1. Add the arize-ingestion-key value from the Arize UI as a label on the dataset.
bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}
  2. Grant the roles/bigquery.jobUser role to the Arize service account.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:[email protected] --role=roles/bigquery.jobUser
  3. Grant the roles/bigquery.dataViewer role to the Arize service account on your table or view.
    • Table:
    bq add-iam-policy-binding \
    --member='serviceAccount:[email protected]' \
    --role='roles/bigquery.dataViewer' \
    ${PROJECT_ID}:${DATASET}.${TABLE}
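Because a view requires the same grant on every underlying table, it can help to generate the commands in a loop. The following is a minimal sketch that assumes the view and its underlying tables all live in one dataset. The target names and the service account address are hypothetical placeholders, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
DATASET="my_dataset"
# Hypothetical address; use the Arize service account from the commands above.
ARIZE_SERVICE_ACCOUNT="arize-service-account@example.iam.gserviceaccount.com"
# Hypothetical view plus the tables it reads from.
TARGETS="my_view orders customers"

# Build one bq add-iam-policy-binding command for a table or view reference.
grant_viewer_cmd() {
  printf "bq add-iam-policy-binding --member='serviceAccount:%s' --role='roles/bigquery.dataViewer' %s\n" "$2" "$1"
}

# Print (do not run) one grant per target so the list can be reviewed first.
for target in $TARGETS; do
  grant_viewer_cmd "${PROJECT_ID}:${DATASET}.${target}" "$ARIZE_SERVICE_ACCOUNT"
done
```

If the underlying tables are spread across datasets or projects, pass a full project:dataset.table reference for each one instead of relying on the shared prefix.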

Grant Access To An Entire Dataset:

From UI
  1. In the Google Cloud console, navigate to the BigQuery SQL Workspace.
  2. Select the desired dataset and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:
    • Key as "arize-ingestion-key"
    • Value as the arize-ingestion-key value copied from the Arize UI
  3. Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".
  4. Grant the Arize service account access to the dataset itself (the CLI steps use the roles/bigquery.dataViewer role for this):
    • Navigate to your dataset from the BigQuery SQL Explorer page.
    • Select "Sharing" and click on "Permissions".
    • Click "Add Principal".
For additional details, see the official documentation for granting access here.

From CLI
You can create a Cloud Shell instance from the UI to run the following commands.
  1. Add the arize-ingestion-key value from the Arize UI as a label on the dataset.
bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}
  2. Grant the roles/bigquery.jobUser role to the Arize service account.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:[email protected] --role=roles/bigquery.jobUser
  3. To grant the roles/bigquery.dataViewer role to the Arize service account on your dataset, see the BigQuery guide to granting access to a dataset and navigate to the bq tab.
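As an alternative that this guide does not cover, BigQuery's GoogleSQL also has a GRANT statement that can assign a role on a whole dataset (a "schema" in SQL terms). The sketch below only builds and prints such a statement. The project, dataset, and service account values are hypothetical placeholders, and you should confirm the statement against the BigQuery documentation before running it.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
DATASET="my_dataset"
# Hypothetical address; use the Arize service account from the commands above.
ARIZE_SERVICE_ACCOUNT="arize-service-account@example.iam.gserviceaccount.com"

# Build a GoogleSQL GRANT statement that gives a service account
# the dataViewer role on an entire dataset.
grant_dataset_viewer_sql() {
  printf 'GRANT `roles/bigquery.dataViewer` ON SCHEMA `%s`.%s TO "serviceAccount:%s";\n' "$1" "$2" "$3"
}

SQL="$(grant_dataset_viewer_sql "$PROJECT_ID" "$DATASET" "$ARIZE_SERVICE_ACCOUNT")"

# Print (do not run) the statement. It could be pasted into the BigQuery
# console or passed to: bq query --use_legacy_sql=false
echo "$SQL"
```

A dataset-level grant covers every table and view in the dataset, so prefer the table-level commands when only one table should be readable.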

Step 4 - Configure Your Model And Define Your Table’s Schema

Match your model schema to your model type and define your model schema through the form input or a JSON schema.
Set up model configurations
Map your table using a form
Map your table using a JSON schema
Property | Description | Required
--- | --- | ---
prediction_ID | The unique identifier of a specific prediction | Required
timestamp | The timestamp of the prediction in seconds or an RFC3339 timestamp | Optional, defaults to current timestamp at file ingestion time
change_timestamp* | Timestamp of when a row was added (see example for details) | Required *(only applicable for table upload)
prediction_label | Column name for the prediction value | Required
prediction_score | Column name for the predicted score | Required based on model type
actual_label | Column name for the actual or ground truth value | Optional for production records
relevance_label | Column name for ranking actual or ground truth value | Required for ranking models
actual_score | Column name for the ground truth score | Required based on model type
relevance_score | Column name for ranking ground truth score | Required for ranking models
features | A string prefix that identifies feature columns (feature/). Features must be sent in the same file as predictions | Arize automatically infers columns as features. Choose between feature prefixing OR inferred features.
tags | A string prefix that identifies tag columns (tag/). Tags must be sent in the same file as predictions and features | Optional
shap_values | A string prefix that identifies SHAP columns (shap/). SHAP must be sent in the same file as predictions or with a matching prediction_id | Optional
version | A column to specify model version. version/ assigns a version to the corresponding data within a column, or configure your version within the UI | Optional, defaults to 'no_version'
batch_id | Distinguish different batches of data under the same model_id and model_version. Must be specified as a constant during job setup or in the schema | Optional for validation records only
exclude | A list of columns to exclude if the features property is not included in the ingestion schema | Optional
embedding_features | A list of embedding columns: a required vector column, an optional raw data column, and an optional link-to-data column. Learn more here | Optional
Once finished, Arize will begin querying your table and ingesting your records as model inferences.
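For reference, the two timestamp forms mentioned in the table look like this. This is a small sketch that assumes "seconds" means Unix epoch seconds and relies on the %s format of date, which GNU and BSD date both support.

```shell
#!/bin/sh
# The current time in the two forms the timestamp property is described as accepting.
EPOCH_SECONDS="$(date -u +%s)"                    # Unix epoch seconds (assumed meaning of "seconds")
RFC3339_UTC="$(date -u +%Y-%m-%dT%H:%M:%SZ)"      # RFC3339 timestamp in UTC

echo "seconds: ${EPOCH_SECONDS}"
echo "RFC3339: ${RFC3339_UTC}"
```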

Step 5 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 6 - Check your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.
After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.
Table of your import jobs
You can view the job details and import progress by clicking on the job ID, which shows more information about the job.
Audit trail of queries run on your table

Step 7 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.
Once you've identified the point of failure, correct the affected row and append the corrected row to the end of your table with an updated change_timestamp value.
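Appending a corrected row might look like the sketch below, which builds a BigQuery INSERT statement and stamps the row with the current time. The table path, the columns prediction_id, prediction_ts, and prediction_label, and the sample values are all hypothetical. It also assumes change_timestamp is a TIMESTAMP column, and it does no SQL escaping, so it is only suitable for trusted values.

```shell
#!/bin/sh
# Placeholder names (assumptions); replace them with your own table and columns.
PROJECT_ID="my-project"
DATASET="my_dataset"
TABLE="my_table"

# Build an INSERT that appends one row and sets change_timestamp to the current time.
# Arguments: project, dataset, table, prediction id, prediction timestamp, prediction label.
append_row_sql() {
  printf 'INSERT INTO `%s.%s.%s` (prediction_id, prediction_ts, prediction_label, change_timestamp) ' "$1" "$2" "$3"
  printf "VALUES ('%s', TIMESTAMP '%s', '%s', CURRENT_TIMESTAMP());\n" "$4" "$5" "$6"
}

SQL="$(append_row_sql "$PROJECT_ID" "$DATASET" "$TABLE" "pred-001" "2024-01-01 00:00:00+00" "fraud")"

# Print (do not run) the statement. It could be pasted into the BigQuery
# console or passed to: bq query --use_legacy_sql=false
echo "$SQL"
```

Using CURRENT_TIMESTAMP() for change_timestamp gives the appended row a newer value than the rows already in the table.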