Databricks
Learn how to set up an import job using Databricks
Copyright © 2023 Arize AI, Inc
If you do not already have one, generate a Personal Access Token (PAT). You will use it to authenticate in the following steps when you generate a token for your service principal.
Navigate to your Workspace and click "User Settings"
Click "Generate new token"
Take note of your PAT
Navigate to your Workspace and click "Admin Settings"
In the "Service Principals" tab, click "Add Service Principal"
Click on the service principal, enable “Databricks SQL access” and “Workspace access”, then click “Update”
Navigate to "Admin Settings" > "Workspace Settings". Search for Personal Access Tokens
Click "Permission Settings" and grant "Can Use" to the service principal you just created.
With your Token (PAT) and Application ID, run the following curl command. Don't forget to fill in the environment variables with your specific information (${DATABRICKS_HOST} should be the URL of your workspace).
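The same request can also be sketched in Python. This is a minimal sketch, not the official command: it assumes the Databricks token management endpoint `/api/2.0/token-management/on-behalf-of/tokens` is available in your workspace, and the function names, the default `lifetime_seconds`, and the `comment` text are illustrative choices rather than values required by Arize.

```python
import json
import urllib.request


def build_obo_token_request(host, pat, application_id,
                            lifetime_seconds=3600, comment="Arize import job"):
    """Build (but do not send) the request that creates a token
    on behalf of a service principal."""
    # Assumed endpoint: the Databricks token management "on-behalf-of" API.
    url = host.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens"
    payload = {
        "application_id": application_id,      # Application ID of the service principal
        "lifetime_seconds": lifetime_seconds,  # illustrative default, adjust as needed
        "comment": comment,                    # illustrative free-text label
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + pat,  # your PAT authenticates the call
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_obo_token(host, pat, application_id, **kwargs):
    """Send the request and return token_value from the JSON response."""
    request = build_obo_token_request(host, pat, application_id, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token_value"]
```

Calling `create_obo_token(DATABRICKS_HOST, PAT, APPLICATION_ID)` with your own values would perform the network call; `build_obo_token_request` alone lets you inspect the URL, headers, and body first.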
Save the token_value from the response. This is the Token you will use to complete the remaining setup in Arize later.
Go to the Data Explorer (on the left drawer) and click on the catalog with the table/view you want to grant access.
Click “Permissions”, grant “USE CATALOG” and “USE SCHEMA” to your service principal, and click “Grant”.
Go to the view/table, click “Permissions”, and grant “SELECT” on the view/table to your service principal.
Go to "SQL Warehouses" > [YOUR_WAREHOUSE_NAME] and click on "Permissions". Grant "Can Use" permission to your service principal.
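If you prefer SQL to the UI, the catalog, schema, and table grants above can be expressed as Unity Catalog GRANT statements. The helper below is a sketch that only builds the statement text; the function names are hypothetical, and it assumes the principal is identified by the service principal's Application ID. The SQL Warehouse "Can Use" permission is not covered here.

```python
def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grants(catalog, schema, table, principal):
    """Return the three GRANT statements matching the UI steps:
    USE CATALOG, USE SCHEMA, and SELECT on the table or view."""
    c, s, t, p = (quote_ident(x) for x in (catalog, schema, table, principal))
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {p}",
        f"GRANT SELECT ON TABLE {c}.{s}.{t} TO {p}",
    ]


# Example with placeholder names (not real objects):
for statement in build_grants("main", "ml", "predictions", "app-123"):
    print(statement)
```

Run the resulting statements in a Databricks SQL editor as a user who is allowed to grant on those objects.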
Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Databricks' card or navigate to the Data Warehouse tab to begin a new table import job.
Storage Selection: Databricks
Input Hostname, Endpoint, Port, and Token (from Step 1)
You can find Hostname, Endpoint, and Port in your Workspace
You can find the Table ID in your Workspace as well
If you have issues granting permissions, please reach out to support@arize.com
Tag your Catalog/Schema/Table with the arize_ingestion_key and the provided label value using the steps below. For more details, see docs on Table_tags for Databricks.
In Arize UI: Copy the arize_ingestion_key value
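As an illustration of the tagging step, the sketch below builds an `ALTER ... SET TAGS` statement for a catalog, schema, or table. It assumes Unity Catalog tag support in your workspace; the function name is hypothetical, and the tag value must be the one you copied from the Arize UI.

```python
def build_tag_statement(securable_type, name, tag_value,
                        tag_key="arize_ingestion_key"):
    """Return an ALTER ... SET TAGS statement for a catalog, schema, or table.

    `name` is the already-qualified object name, for example "main.ml.predictions".
    """
    securable_type = securable_type.upper()
    if securable_type not in ("CATALOG", "SCHEMA", "TABLE"):
        raise ValueError("securable_type must be CATALOG, SCHEMA, or TABLE")
    # Keep the sketch simple: refuse values that would need SQL escaping.
    for text in (tag_key, tag_value):
        if "'" in text or "\\" in text:
            raise ValueError("quotes and backslashes are not supported here")
    return f"ALTER {securable_type} {name} SET TAGS ('{tag_key}' = '{tag_value}')"


# Placeholder table name and value, for illustration only:
print(build_tag_statement("table", "main.ml.predictions", "value-from-arize-ui"))
```

Tagging at the table level is the narrowest option; passing "CATALOG" or "SCHEMA" with the matching name produces the broader variants.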
Match your model schema to your model type and define your model schema through the form input or a JSON schema.
Learn more about Schema fields here.
Once finished, Arize will begin querying your table and ingesting your records as model inferences.
Arize will run queries to ingest records from your table based on your configured refresh interval.
Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.
After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.
You can view the job details and import progress by clicking on the job ID, which shows more information about the job.
An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.
If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.
If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.
Once you've identified the job failure point, append the edited row to the end of your table with an updated change_timestamp value.
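The "edit, then append with a new change_timestamp" step can be sketched as follows. The row is represented as a plain dictionary and every column name other than `change_timestamp` is hypothetical; how you write the row back to your table (for example with an INSERT statement) depends on your own pipeline.

```python
from datetime import datetime, timezone


def with_updated_change_timestamp(row, fixes, now=None):
    """Return a corrected copy of `row` ready to be appended to the table.

    `fixes` holds the corrected column values; change_timestamp is set to
    `now` (defaults to the current UTC time) so the row is picked up as new.
    """
    updated = dict(row)    # leave the original row untouched
    updated.update(fixes)  # apply the corrections
    updated["change_timestamp"] = now or datetime.now(timezone.utc)
    return updated


# Hypothetical failing row and fix, for illustration only:
bad_row = {
    "prediction_id": "p-1",
    "score": "n/a",
    "change_timestamp": datetime(2023, 1, 1, tzinfo=timezone.utc),
}
fixed_row = with_updated_change_timestamp(bad_row, {"score": 0.42})
```

Appending `fixed_row` rather than editing the old row in place keeps the original record and gives the corrected one a later `change_timestamp`.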