Snowflake
Learn how to set up an import job using Snowflake
Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Snowflake' card, or navigate to the Data Warehouse tab, to begin a new table import job.
Storage Selection: ❄️ Snowflake

Upload Data Page in Arize
A warehouse is an on-demand, scalable compute cluster used for executing data processing tasks. In this case, connect a warehouse so Arize can run queries and sync data from tables relevant to your model.
To give Arize access to your tables, first complete a one-time setup for any new Snowflake warehouse. If you've previously connected your warehouse, skip this step and proceed to the table configuration.
In Snowflake: Copy your Warehouse name

Warehouses in Snowflake
In Arize: Paste the warehouse name into the 'Warehouse Name' field, and copy the code snippet

Warehouse Field in Arize

Worksheets Tab in Snowflake
In Snowflake: Paste code snippet from Arize, select the applicable Warehouse, and click 'Run All'

Example Worksheet to Run All
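The exact snippet is generated for you in Arize, so copy it from the UI rather than writing your own. As a rough sketch of what that setup does, assuming illustrative role, user, and warehouse names (ARIZE_IMPORT_ROLE, ARIZE_USER, MY_WAREHOUSE):

```sql
-- Illustrative only: the real snippet comes from the Arize UI.
-- Create a dedicated role and allow it to run queries on the chosen warehouse.
CREATE ROLE IF NOT EXISTS ARIZE_IMPORT_ROLE;
GRANT USAGE ON WAREHOUSE MY_WAREHOUSE TO ROLE ARIZE_IMPORT_ROLE;
GRANT ROLE ARIZE_IMPORT_ROLE TO USER ARIZE_USER;
```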
Arize requires the following field inputs to enable access permissions.
Field Name | Description |
---|---|
Account ID | The account identifier <organization_name>-<account_name> (ex. WOOGSCZ-ZV77179) |
Database | The high-level container for storing and organizing your schemas (ex. COVID19_EPIDEMIOLOGICAL_DATA) |
Schema | The logical container that holds the target table within a database (ex. PUBLIC) |
Table Name | The database object that stores structured data in rows and columns that Arize will sync from (ex. DEMOGRAPHICS) |

Field hierarchy in Snowflake
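To confirm that the values you plan to enter resolve to the right table, you can query it by its fully qualified name in a worksheet, using the example identifiers above:

```sql
-- Database.Schema.Table, matching the fields Arize asks for.
SELECT *
FROM COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC.DEMOGRAPHICS
LIMIT 5;
```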
In Snowflake: Create an 'Account ID' by combining your <organization_name> with your <account_name>, separated by a hyphen. Account information is located at the bottom left of any Snowflake page. In the example below, the account ID in Arize is WOOGSCZ-ZV77179.
Account information located on the bottom left of any Snowflake page
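Alternatively, you can assemble the same identifier in a worksheet using Snowflake's built-in context functions:

```sql
-- Returns e.g. WOOGSCZ-ZV77179, the <organization_name>-<account_name> pair.
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS arize_account_id;
```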

Snowflake database information
In Arize: Enter these fields in the 'Dataset Configuration' card

Dataset configuration in Arize
Table permissions enable Arize to access, read, and sync your data.
In Arize: Copy the code snippet in the 'Permissions Configuration' card

Copy Permissions Configuration in Arize
In Snowflake: Paste the 'Permissions Configuration' code snippet in a Snowflake SQL Worksheet and click 'Run All'. See docs on granting permissions to Arize's role for Snowflake.

Snowflake SQL Worksheet
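As with the warehouse setup, copy the generated snippet rather than writing your own. For orientation, the grants it performs look roughly like the following sketch (role and object names are illustrative, reusing the example identifiers above):

```sql
-- Read-only access for the Arize role, scoped down to the target table.
GRANT USAGE ON DATABASE COVID19_EPIDEMIOLOGICAL_DATA TO ROLE ARIZE_IMPORT_ROLE;
GRANT USAGE ON SCHEMA COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC TO ROLE ARIZE_IMPORT_ROLE;
GRANT SELECT ON TABLE COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC.DEMOGRAPHICS TO ROLE ARIZE_IMPORT_ROLE;
```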
Match your model schema to your model type, and define it through the form input or a JSON schema.

Set up model configurations

Map your table using a form

Map your table using a JSON schema
Property | Description | Required |
---|---|---|
prediction_id | The unique identifier of a specific prediction. Limited to 128 characters. | Required |
change_timestamp* | Timestamp indicating when a row was added to the table. Used to automatically sync new rows (see example for details) | Required *(only applicable for table upload) |
prediction_label | Column name for the prediction value | |
prediction_score | Column name for the predicted score | |
actual_label | Column name for the actual or ground truth value | Optional for production records |
actual_score | Column name for the ground truth score | |
timestamp | The timestamp of the prediction, in seconds or as an RFC3339 timestamp | Optional, defaults to the current timestamp at file ingestion time |
prediction_group_id | Column name for ranking groups or lists in ranking models | |
rank | Column name for the rank of each element within its group or list | |
relevance_label | Column name for the ranking actual or ground truth value | |
relevance_score | Column name for the ranking ground truth score | |
features | Column names for features. Features must be sent in the same file as predictions | Optional. Arize automatically infers columns as features if they are not specified |
tags | Column names for tags. Tags must be sent in the same file as predictions and features | Optional |
shap_values | A string prefix (shap/) that marks SHAP value columns. SHAP values must be sent in the same file as predictions or with a matching prediction_id | Optional |
version | A column to specify the model version. version/ assigns a version to the corresponding data within a column, or configure your version within the UI | Optional, defaults to 'no_version' |
batch_id | Distinguishes different batches of data under the same model_id and model_version. Must be specified as a constant during job setup or in the schema | Optional for validation records only |
exclude | A list of columns to exclude if the features property is not included in the ingestion schema | Optional |
embedding_features | A list of embedding columns: a required vector column, an optional raw data column, and an optional link-to-data column. Learn more here | Optional |
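As a concrete illustration, a source table that maps cleanly onto the properties above might look like this (a hypothetical sketch; column names are examples, not requirements):

```sql
-- Hypothetical source table shaped for an Arize table import.
CREATE TABLE IF NOT EXISTS COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC.DEMOGRAPHICS (
    prediction_id    VARCHAR(128),                              -- unique per prediction
    prediction_ts    TIMESTAMP_NTZ,                             -- maps to the timestamp property
    prediction_label VARCHAR,
    actual_label     VARCHAR,
    age              INTEGER,                                   -- inferred as a feature unless excluded
    "shap/age"       FLOAT,                                     -- shap/ prefix marks SHAP value columns
    change_timestamp TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- drives incremental sync
);
```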
Once finished, Arize will begin querying your table and ingesting your records as model inferences. Queries run on your configured refresh interval.
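Conceptually, each sync pulls only the rows whose change_timestamp is newer than the last successful run. A sketch of that pattern (the :last_sync_time bind variable is illustrative, not Arize's actual query):

```sql
-- Incremental pull: only rows added or re-appended since the last sync.
SELECT *
FROM COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC.DEMOGRAPHICS
WHERE change_timestamp > :last_sync_time
ORDER BY change_timestamp;
```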
Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you may then create the import job.

Successful import job summary
After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

Click on a job ID to view its details and import progress.
An import job can run into problems. Use the dry run and the job details UI to troubleshoot and quickly resolve data ingestion issues.
If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.
If your dry run is successful but your job fails, click on the job ID to view details such as the file path or query ID, the last import job, potential errors, and error locations.

Once you've identified the failure point, correct the affected row and append it to the end of your table with an updated change_timestamp value so the next sync ingests the fix.
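For example, with the hypothetical column names from the sketch above, re-appending a corrected row looks like this; the fresh change_timestamp makes it visible to the next sync:

```sql
-- Re-insert the corrected row; the new change_timestamp ensures it is picked up.
INSERT INTO COVID19_EPIDEMIOLOGICAL_DATA.PUBLIC.DEMOGRAPHICS
    (prediction_id, prediction_label, actual_label, change_timestamp)
VALUES
    ('pred_123', 'corrected_prediction', 'ground_truth', CURRENT_TIMESTAMP());
```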