
Copyright © 2025 Arize AI, Inc

Google BigQuery

Learn how to set up an import job using Google BigQuery

Last updated 1 year ago


Step 1 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Google BQ' card or navigate to the Data Warehouse tab to start a new table import job.

Storage Selection: Google BQ

Step 2 - Input the Project ID, Dataset, and Table / View

Locate the Project ID, Dataset, and Table or View name of the table/view you would like to sync from Google BigQuery.

  • The dataset and table name correspond to the path where your table is located

Add your Table ID in Arize. Arize will automatically parse your Dataset, Table Name, and GCP Project ID.
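As an illustration of how a Table ID breaks down into its parts, here is a minimal shell sketch. It assumes the Table ID has the form project.dataset.table and that the project ID itself contains no dots; the ID used is a made-up example, and this is not Arize's actual parsing logic.

```shell
# Hypothetical Table ID in the form project.dataset.table (not a real resource).
TABLE_ID="my-gcp-project.ml_data.predictions"

# Split on the first two dots using POSIX parameter expansion.
PROJECT_ID="${TABLE_ID%%.*}"   # everything before the first dot
REST="${TABLE_ID#*.}"          # everything after the first dot
DATASET="${REST%%.*}"          # everything before the next dot
TABLE="${REST#*.}"             # whatever remains is the table or view name

echo "PROJECT_ID=${PROJECT_ID}"
echo "DATASET=${DATASET}"
echo "TABLE=${TABLE}"
```

The resulting PROJECT_ID, DATASET, and TABLE values are the same variables used by the bq and gcloud commands in Step 3.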

Step 3 - Grant Access To Your Dataset, Table, or View

In the Arize UI: copy the arize-ingestion-key value.

Grant Access To A Table/View

  1. In the Google Cloud console, navigate to the BigQuery SQL Workspace.

  2. Select the desired table or view, navigate to the Details tab, and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:

    • Key: "arize-ingestion-key"

    • Value: the arize-ingestion-key value from the Arize UI

  3. Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".

  4. Grant the BigQuery Data Viewer role on the table or view:

    • Navigate to your table/view from the BigQuery SQL Explorer page.

    • Select "Share" and click "Permissions".

    • Click "Add Principal".

    • Add the Arize service account, fileimporter@production-269901.iam.gserviceaccount.com, as a BigQuery Data Viewer, and click "Save".

    • For a view, you must also grant access to all underlying tables, so repeat these steps for each underlying table.

You can create a Cloud Shell instance from the UI to run the following commands:

  1. Add the arize-ingestion-key value from the Arize UI as a label on the dataset.

bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}

  2. Grant the roles/bigquery.jobUser role to the Arize service account.

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com --role=roles/bigquery.jobUser

  3. Grant the roles/bigquery.dataViewer role to the Arize service account on your table or view.

    • Table:

     bq add-iam-policy-binding \
     --member='serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com' \
     --role='roles/bigquery.dataViewer' \
      ${PROJECT_ID}:${DATASET}.${TABLE}
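
Before running the commands above against a real project, it can help to see them with the variables filled in. The following is a minimal sketch that only prints the three commands rather than executing them. All values are placeholders, and the print_grant_commands helper and ARIZE_SA variable are hypothetical names introduced for illustration.

```shell
# Placeholder values; replace them with your own before use.
KEY_FROM_UI="example-key"
PROJECT_ID="my-gcp-project"
DATASET="ml_data"
TABLE="predictions"
# Arize service account, as given in the steps above.
ARIZE_SA="fileimporter@production-269901.iam.gserviceaccount.com"

# Hypothetical helper: prints (does not run) the label and role-grant commands.
print_grant_commands() {
  echo "bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}"
  echo "gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${ARIZE_SA} --role=roles/bigquery.jobUser"
  echo "bq add-iam-policy-binding --member='serviceAccount:${ARIZE_SA}' --role='roles/bigquery.dataViewer' ${PROJECT_ID}:${DATASET}.${TABLE}"
}

print_grant_commands
```

Review the printed commands, then paste them into Cloud Shell once the project, dataset, and table names look right.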

Grant Access To An Entire Dataset:

  1. In the Google Cloud console, navigate to the BigQuery SQL Workspace.

  2. Select the desired dataset and click "Edit Details". Under the Labels section, click "Add Labels". Add the following label:

    • Key: "arize-ingestion-key"

    • Value: the arize-ingestion-key value copied from the Arize UI

  3. Grant the roles/bigquery.jobUser role to the Arize service account. Go to the IAM page and click "Grant Access".

  4. Grant the BigQuery Data Viewer role on the dataset:

    • Navigate to your dataset from the BigQuery SQL Explorer page.

    • Select "Sharing" and click "Permissions".

    • Click "Add Principal".

    • Add the Arize service account, fileimporter@production-269901.iam.gserviceaccount.com, as a BigQuery Data Viewer, and click "Save".

You can create a Cloud Shell instance from the UI to run the following commands:

  1. Add the arize-ingestion-key value from the Arize UI as a label on the dataset.

bq update --set_label arize-ingestion-key:${KEY_FROM_UI} ${PROJECT_ID}:${DATASET}

  2. Grant the roles/bigquery.jobUser role to the Arize service account.

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com --role=roles/bigquery.jobUser

Step 4 - Configure Your Model And Define Your Table’s Schema

Once the model configuration and table schema are complete, Arize will begin querying your table and ingesting your records as model inferences.

Step 4b. Validate Model Schema

Once you fill in your applicable predictions, actuals, and model inputs, click 'Validate Schema' to visualize your model schema in the Arize UI. Check that your column names and corresponding data match for a successful import job.

Step 5 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 6 - Check your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you can proceed to create the import job.

From there, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

All active jobs will regularly sync new data from your data source with Arize. You can view the job details and import progress by clicking on the job ID, which reveals more information about the job.

Step 6.5 Pause or Delete An Import Job

To pause or delete a job, or to edit your table schema, click on 'Job Options'.

  • Delete a job if it is no longer needed or if you connected it to the wrong table. This will set your job status as 'deleted' in Arize.

  • Pause a job if you have a set cadence to update your table. This way, you can 'start job' when you know there will be new data to reduce query costs. This will set your job status as 'inactive' in Arize.

Step 7 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.

In Step 2, the GBQ Project ID is a unique identifier for a project. It is shown in the Google Cloud console alongside the dataset and table/view names.

In Step 3, tag your dataset, table, or view with the arize-ingestion-key label and the provided label value using the steps listed in that section. For more details, see the BigQuery documentation on adding labels to resources.

You can grant access to a single table or view, or to all the tables/views in a dataset.

Consider creating an authorized view if you don't want to grant access to the underlying tables, or if granting access to each underlying table is too cumbersome.

For more details, see the official Google BigQuery documentation for granting access.

View: to grant the roles/bigquery.dataViewer role to the Arize service account on a view, see the Google BigQuery guide to grant access to a view and navigate to the bq tab.

Dataset: to grant the roles/bigquery.dataViewer role to the Arize service account on your dataset, see the Google BigQuery guide to grant access to a dataset and navigate to the bq tab.

In Step 4, match your model schema to your model type, and define your model schema through the form input or a JSON schema. See the Arize documentation to learn more about schema fields.

In Step 7, within the Job Details section, you can select More Details on a specific query to view the start time and end time that were used in that query. The query start time is the max value of change_timestamp from the previous query, and the query end time is the date and time at which the query was run. The query start time is updated after each query to reflect the current max change_timestamp. This can help debug issues specifically related to the change_timestamp field.

Once you've identified the job failure point, append the edited row to the end of your table with an updated change_timestamp value.
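
To see why a corrected row needs an updated change_timestamp, the query window can be sketched as a simple range check. This is only an illustration under assumed semantics: the exclusive start and inclusive end boundaries, the epoch-second values, and the in_window helper are assumptions for this sketch, not Arize's actual query.

```shell
# Hypothetical epoch-second timestamps.
QUERY_START=1700000000   # max change_timestamp seen by the previous query
QUERY_END=1700003600     # time at which the current query runs

# Succeeds when a row's change_timestamp falls inside (QUERY_START, QUERY_END].
in_window() {
  [ "$1" -gt "$QUERY_START" ] && [ "$1" -le "$QUERY_END" ]
}

# An old row that was edited in place versus a row appended with a fresh timestamp.
for ts in 1699999000 1700001800; do
  if in_window "$ts"; then
    echo "$ts: picked up by this query"
  else
    echo "$ts: skipped"
  fi
done
```

Under these assumptions, a row that keeps its old change_timestamp falls before the window and is never re-read, while a row appended with a newer change_timestamp is ingested on the next sync.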

Screenshots referenced in the steps above:

  • Step 2: Console view to find Project ID, Dataset name and Table/View name; Example TableID
  • Step 3: Copy Arize Ingestion Key; Add Arize service account as "Principal" with "BigQuery Job User" role
  • Step 4: Set up model configurations; Map your table using a form; Map your table using a JSON schema
  • Step 6: Table of your import jobs; Audit trail of queries run on your table; Job Status tab showing job listings