Snowflake
Learn how to set up an import job using Snowflake
Copyright © 2023 Arize AI, Inc
Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Snowflake' card or navigate to the Data Warehouse tab to start a new table import job.
Storage Selection: ❄️ Snowflake
A warehouse is an on-demand, scalable compute cluster used for executing data processing tasks. In this case, you connect a warehouse so Arize can run queries and sync data from the tables relevant to your model.
To gain access to your tables, first complete a one-time setup for any new Snowflake Warehouse. If you've previously connected your warehouse, skip this step and proceed to specify the table configuration.
In Snowflake: Copy your Warehouse name
In Arize: Paste 'Warehouse Name' in the applicable field, and copy the code snippet
In Snowflake: Create a Snowflake 'SQL worksheet'
In Snowflake: Paste code snippet from Arize, select the applicable Warehouse, and click 'Run All'
Arize requires the following field inputs to enable access permissions.
Account ID: The account identifier, in the form <organization_name>-<account_name> (ex. WOOGSCZ-ZV77179)
Database: The high-level container for storing and organizing your schemas (ex. COVID19_EPIDEMIOLOGICAL_DATA)
Schema: The logical container that holds the target table within a database (ex. PUBLIC)
Table Name: The database object that stores structured data in rows and columns that Arize will sync from (ex. DEMOGRAPHICS)
In Snowflake: Create an 'Account ID' by combining your <organization name> with your <account name>, separated by a hyphen. Account information is located at the bottom left of any Snowflake page.
In the example below, the account ID in Arize is WOOGSCZ-ZV77179.
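The Account ID format can be sketched in a few lines of Python. This is only an illustration of the hyphen-joined format described above; the function name build_account_id is hypothetical and not part of any Arize or Snowflake library.

```python
def build_account_id(organization_name: str, account_name: str) -> str:
    """Join the organization name and account name with a hyphen."""
    org = organization_name.strip()
    acct = account_name.strip()
    if not org or not acct:
        raise ValueError("both organization name and account name are required")
    return f"{org}-{acct}"


# Values taken from the example in this guide.
account_id = build_account_id("WOOGSCZ", "ZV77179")
print(account_id)  # WOOGSCZ-ZV77179
```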
In Snowflake: Copy the Database, Schema, and Table Names from the 'Databases' tab.
In Arize: Input fields to Arize in the 'Dataset Configuration' card
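As a quick sanity check before filling in the 'Dataset Configuration' card, it can help to write the four values down together and confirm they resolve to the table you expect. The sketch below assumes nothing about Arize's internals; DatasetConfig is a hypothetical helper, and the only Snowflake convention it relies on is the database.schema.table form of a fully qualified table name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical container for the four fields entered in Arize."""
    account_id: str
    database: str
    schema: str
    table_name: str

    def qualified_table(self) -> str:
        # Snowflake addresses a table as <database>.<schema>.<table>.
        return f"{self.database}.{self.schema}.{self.table_name}"


# Example values from this guide.
config = DatasetConfig(
    account_id="WOOGSCZ-ZV77179",
    database="COVID19_EPIDEMIOLOGICAL_DATA",
    schema="PUBLIC",
    table_name="DEMOGRAPHICS",
)
print(config.qualified_table())
```

Running a SELECT against the printed name in a Snowflake worksheet is one way to confirm the values before pasting them into Arize.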
Table permissions enable Arize to access, read, and sync your data.
In Arize: Copy the code snippet in the “Permissions Configuration” card
In Snowflake: Paste the 'Permissions Configuration' code snippet in a Snowflake SQL Worksheet and click 'Run All'. See docs on granting permissions to Arize's role for Snowflake.
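The snippet that Arize generates is the source of truth and should be run as provided. For orientation only, the sketch below shows the general shape that read-only grants take in Snowflake for a single table. The role name ARIZE_EXAMPLE_ROLE and the helper read_only_grants are hypothetical; the actual role and statements in the Arize snippet may differ.

```python
def read_only_grants(database: str, schema: str, table: str, role: str) -> list[str]:
    """Build illustrative Snowflake GRANT statements for read-only table access."""
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO ROLE {role};",
    ]


# Example values from this guide; the role name is a placeholder.
statements = read_only_grants(
    "COVID19_EPIDEMIOLOGICAL_DATA", "PUBLIC", "DEMOGRAPHICS", "ARIZE_EXAMPLE_ROLE"
)
for statement in statements:
    print(statement)
```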
Match your model schema to your model type, and define the schema through the form input or a JSON schema.
Learn more about Schema fields here.
Once finished, Arize will begin querying your table and ingesting your records as model inferences.
Arize will run queries to ingest records from your table based on your configured refresh interval.
Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you can proceed to create the import job.
From there, you will be taken to the 'Job Status' tab, where you can see the status of your import jobs. All active jobs will regularly sync new data from your data source with Arize. Click on a job ID to view more details about that job.
To pause or edit your table schema, click on 'Job Options'.
Delete a job if it is no longer needed or if you connected to the wrong table by mistake. This will set your job status as 'deleted' in Arize.
Pause a job if you have a set cadence to update your table. This way, you can 'start job' when you know there will be new data to reduce query costs. This will set your job status as 'inactive' in Arize.
An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.
If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.
If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.
Once you've identified the job failure point, append the edited row to the end of your table with an updated change_timestamp value.
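The append-with-new-timestamp pattern above can be sketched as follows. To keep the example runnable anywhere, an in-memory SQLite table stands in for the Snowflake table; the columns prediction_id and age are hypothetical, and only change_timestamp comes from this guide. The point is that the corrected row is appended as a new record with a later change_timestamp rather than edited in place.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite stands in for the Snowflake table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demographics ("
    "prediction_id TEXT, age INTEGER, change_timestamp TEXT)"
)

# The original row that caused the failure (an invalid negative age).
conn.execute(
    "INSERT INTO demographics VALUES (?, ?, ?)",
    ("p-001", -5, "2023-01-01T00:00:00+00:00"),
)


def append_corrected_row(conn: sqlite3.Connection, prediction_id: str, age: int) -> str:
    """Append the fixed row with a fresh change_timestamp and return that timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO demographics VALUES (?, ?, ?)",
        (prediction_id, age, now),
    )
    conn.commit()
    return now


new_ts = append_corrected_row(conn, "p-001", 35)

rows = conn.execute(
    "SELECT prediction_id, age, change_timestamp "
    "FROM demographics ORDER BY change_timestamp"
).fetchall()
for row in rows:
    print(row)
```

In Snowflake itself, the equivalent step would be an INSERT of the corrected row with a current timestamp in the change_timestamp column.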