Databricks

Learn how to set up an import job using Databricks

Step 1 - Generate a Token

If necessary, generate a PAT (Personal Access Token), which will be used to authenticate in the following steps when you generate a token for your service principal.
Navigate to your Workspace and click "User Settings"
Click "Generate new token"
Take note of your PAT
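The curl commands later in this step read the workspace URL and the PAT from environment variables. A minimal sketch for setting them and checking that none is empty before continuing; the values shown are placeholders, and require_vars is a hypothetical helper, not part of Databricks or Arize.

```shell
# Placeholder values (assumptions): replace with your own workspace URL and PAT.
export DATABRICKS_HOST="https://example.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi-example-token"

# require_vars NAME...: return non-zero and report the first variable
# that is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
  return 0
}

require_vars DATABRICKS_HOST DATABRICKS_TOKEN && echo "environment looks complete"
```

Once you have the Application ID of your service principal, it can be exported as APPLICATION_ID and checked the same way.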
With Unity Catalog
  1. Navigate to your Workspace and click "Admin Settings"
  2. In the "Service Principals" tab, click "Add Service Principal"
Without Unity Catalog
  1. Click on "User Management" on accounts.cloud.databricks.com
  2. Create a Service Principal
  3. Take note of the Application ID
  4. Run the following curl command to create a service principal in your workspace, where ${DATABRICKS_HOST} is the workspace URL, ${DATABRICKS_TOKEN} is the PAT you just created, and ${APPLICATION_ID} is the Application ID of the service principal you just created:
curl -X POST \
  "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
  --header "Content-type: application/json" \
  --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --data "{
    \"displayName\": \"displayName\",
    \"externalId\": \"externalId\",
    \"applicationId\": \"${APPLICATION_ID}\",
    \"id\": \"id\",
    \"active\": true
  }"
Click on the service principal, enable “Databricks SQL access” and “Workspace access”, and click “Update”.
Navigate to "Admin Settings" > "Workspace Settings" and search for "Personal Access Tokens".
Click "Permission Settings" and grant "Can Use" to the service principal you just created.
With your token (PAT) and Application ID, run the following curl command. Fill in the environment variables with your specific information (${DATABRICKS_HOST} should be the URL of your workspace):
curl -X POST \
  "${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens" \
  --header "Content-type: application/json" \
  --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --data "{\"application_id\": \"${APPLICATION_ID}\" }"
Save the token_value from the response. This is the Token you will use to complete the remaining setup in Arize later.
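If you would rather capture token_value in a script than copy it by hand, the following is a minimal sketch. It assumes the response is JSON with a string field named token_value; the helper extract_token_value, the variable ARIZE_DATABRICKS_TOKEN, and the sample response are all illustrative. The helper is a simple pattern match, not a full JSON parser.

```shell
# extract_token_value: read a JSON response on stdin and print the value
# of its "token_value" string field (simple pattern match, no JSON parser).
extract_token_value() {
  sed -n 's/.*"token_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up response for illustration; in practice, pipe the output of the
# curl command above into extract_token_value instead.
sample_response='{"token_value": "dapi-example-123", "token_info": {"comment": "example"}}'

ARIZE_DATABRICKS_TOKEN=$(printf '%s\n' "$sample_response" | extract_token_value)
echo "$ARIZE_DATABRICKS_TOKEN"
```

Treat the extracted value like a password: avoid writing it to shell history or shared logs.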

Step 2 - Grant Access To Your Table

Go to the Data Explorer (in the left drawer) and click on the catalog that contains the table or view you want to grant access to.
Click “Permissions” and grant “USE CATALOG” and “USE SCHEMA” to your service principal. Click “Grant”.
Go to the table or view, click “Permissions”, and grant “SELECT” on it to your service principal.
Go to "SQL Warehouses" > [YOUR_WAREHOUSE_NAME] and click on "Permissions". Grant the "Can Use" permission to your service principal.
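With Unity Catalog, the catalog, schema, and table grants above can also be expressed as SQL. The sketch below only prints the statements so you can review them and run them in a SQL editor yourself. It assumes the service principal is referenced by its Application ID in backticks; print_grants and the catalog, schema, and table names are hypothetical. The SQL warehouse "Can Use" permission is not covered here.

```shell
# print_grants CATALOG SCHEMA TABLE APPLICATION_ID
# Prints the three GRANT statements; it does not execute anything.
print_grants() {
  catalog=$1 schema=$2 table=$3 principal=$4
  cat <<EOF
GRANT USE CATALOG ON CATALOG ${catalog} TO \`${principal}\`;
GRANT USE SCHEMA ON SCHEMA ${catalog}.${schema} TO \`${principal}\`;
GRANT SELECT ON TABLE ${catalog}.${schema}.${table} TO \`${principal}\`;
EOF
}

# Example with made-up names and a made-up Application ID.
print_grants main sales orders 00000000-0000-0000-0000-000000000000
```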

Step 3 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page from the left navigation bar in the Arize platform. From there, select the 'Databricks' card or navigate to the Data Warehouse tab to begin a new table import job.
Storage Selection: Databricks
Select Databricks from Table Options
Input Hostname, Endpoint, Port, and Token (from Step 1)
You can find Hostname, Endpoint, and Port in your Workspace
The Table ID can also be found in your Workspace
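Assuming the Hostname field expects the bare server hostname of your workspace (no scheme, no path), it can be derived from the workspace URL used in Step 1. This is a sketch only: hostname_from_url is a hypothetical helper, and the URL is a placeholder. Always confirm the Hostname, Endpoint, and Port against the connection details shown in your Workspace.

```shell
# hostname_from_url URL: strip the scheme and any path, leaving the hostname.
hostname_from_url() {
  printf '%s\n' "$1" | sed -e 's#^[a-zA-Z]*://##' -e 's#/.*$##'
}

# Placeholder workspace URL for illustration.
hostname_from_url "https://example.cloud.databricks.com/"
```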
If you have issues granting permissions please reach out to [email protected]

Step 4 - Configure Your Model And Define Your Table’s Schema

Match your model schema to your model type, and define the schema either through the form input or with a JSON schema.
Set up model configurations
Map your table using a form
Map your table using a JSON schema
Learn more about Schema fields here.
Once finished, Arize will begin querying your table and ingesting your records as model inferences.

Step 5 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 6 - Check Your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.
After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.
You can view the job details and import progress by clicking on the job ID, which uncovers more information about the job.

Step 7 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.
Once you've identified the point of failure, correct the affected row and append it to the end of your table with an updated change_timestamp value.
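Appending a corrected row could look like the sketch below, which prints an INSERT statement for you to review and run in a SQL editor. Only change_timestamp comes from this guide; the helper print_append_sql, the table name, and the prediction_id and prediction_label columns are hypothetical and should be replaced with your own schema. Values are not escaped, so this is for illustration rather than production use.

```shell
# print_append_sql TABLE PREDICTION_ID LABEL
# Prints an INSERT that appends one corrected row with a fresh
# change_timestamp. Column names other than change_timestamp are assumptions.
print_append_sql() {
  table=$1 prediction_id=$2 label=$3
  cat <<EOF
INSERT INTO ${table} (prediction_id, prediction_label, change_timestamp)
VALUES ('${prediction_id}', '${label}', current_timestamp());
EOF
}

# Example with made-up values.
print_append_sql main.sales.predictions id-123 fraud
```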