Databricks

Learn how to set up an import job using Databricks

Last updated 1 year ago

Step 1 - Generate a Token

If necessary, generate a PAT (Personal Access Token), which will be used to authenticate in the following steps when you generate a token for your service principal.

Navigate to your Workspace and click "User Settings"

Click "Generate new token"

Take note of your PAT

Navigate to your Workspace and click "Admin Settings"

In the "Service Principals" tab, click "Add Service Principal"

Click on "User Management" on accounts.cloud.databricks.com
Create a Service Principal
Take note of the Application ID
Run the following curl command to create a service principal in your workspace, where ${DATABRICKS_HOST} is the workspace URL, ${DATABRICKS_TOKEN} is the PAT you just created, and ${APPLICATION_ID} is the Application ID of the service principal you just created:

curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{
  \"displayName\": \"displayName\",
  \"externalId\": \"externalId\",
  \"applicationId\": \"${APPLICATION_ID}\",
  \"id\": \"id\",
  \"active\": true
}"
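The curl commands in this step depend on three environment variables being set. As a minimal sketch, a small pre-flight check can report any that are missing before a request is sent. The helper name missing_env is hypothetical and not part of Databricks or Arize tooling.

```shell
# Hypothetical helper: print the names of any listed variables that are unset or empty.
missing_env() {
  missing=""
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  printf '%s\n' "$missing"
}

# Warn (without exiting) if anything the curl commands need is missing.
not_set="$(missing_env DATABRICKS_HOST DATABRICKS_TOKEN APPLICATION_ID)"
if [ -n "$not_set" ]; then
  echo "Set these variables before running the curl commands: $not_set" >&2
fi
```

If the check prints nothing, all three variables have values and the commands can be run as written.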

Click on the service principal, enable “Databricks SQL access” and “Workspace access”, and click “Update”.

Navigate to "Admin Settings" > "Workspace Settings" and search for "Personal Access Tokens".

Click "Permission Settings" and grant "Can Use" to the service principal you just created.

With your token (PAT) and Application ID, run the following curl command. Fill in the environment variables with your own values (${DATABRICKS_HOST} should be the URL of your workspace):

curl -X POST \
${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{\"application_id\": \"${APPLICATION_ID}\" }"

Save the token_value from the response. This is the Token you will use to complete the remaining setup in Arize later.
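One way to capture token_value is to pipe the JSON response through a small filter. The sketch below assumes the response contains a "token_value" string field, as described above. The helper name extract_token_value, the variable ARIZE_DATABRICKS_TOKEN, and the sample response are illustrative placeholders, not real output.

```shell
# Hypothetical helper: read a JSON response on stdin and print its token_value.
# Uses sed so it runs without extra tools; a JSON-aware tool such as jq is more robust.
extract_token_value() {
  sed -n 's/.*"token_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Placeholder response for illustration only (not a real token).
sample_response='{"token_value": "dapi-example-token"}'

# Store the extracted value in a variable for later use in the Arize setup.
ARIZE_DATABRICKS_TOKEN="$(printf '%s\n' "$sample_response" | extract_token_value)"
echo "$ARIZE_DATABRICKS_TOKEN"
```

In practice, the output of the curl command would be piped into extract_token_value in place of the sample response.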

Step 2 - Grant Access To Your Table

Go to the Data Explorer (in the left drawer) and click on the catalog containing the table/view you want to grant access to.

Click “Permissions” and grant “USE CATALOG” and “USE SCHEMA”. Click Grant.

Go to the view/table, click “Permissions”, and grant “SELECT” on the view/table.

Go to "SQL Warehouses" > [YOUR_WAREHOUSE_NAME] and click on "Permissions". Grant the "Can Use" permission to your service principal.
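If you prefer SQL to the UI for the catalog, schema, and table grants, the sketch below prints equivalent GRANT statements that could be pasted into the SQL editor. It assumes a Unity Catalog workspace, where a service principal is referenced by its Application ID in backticks. It grants USE SCHEMA on a single schema, which is narrower than granting it on the whole catalog. The helper name print_grants and the object names are hypothetical, and the warehouse "Can Use" permission is not covered.

```shell
# Hypothetical helper: print GRANT statements for one table and one principal.
# Arguments: catalog, schema, table, principal (the service principal's Application ID).
print_grants() {
  catalog="$1"; schema="$2"; table="$3"; principal="$4"
  printf 'GRANT USE CATALOG ON CATALOG %s TO `%s`;\n' "$catalog" "$principal"
  printf 'GRANT USE SCHEMA ON SCHEMA %s.%s TO `%s`;\n' "$catalog" "$schema" "$principal"
  printf 'GRANT SELECT ON TABLE %s.%s.%s TO `%s`;\n' "$catalog" "$schema" "$table" "$principal"
}

# Placeholder object names; replace them with your own before running the SQL.
print_grants my_catalog my_schema my_table "${APPLICATION_ID:-your-application-id}"
```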

Step 3 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Databricks' card or navigate to the Data Warehouse tab to begin a new table import job.

Storage Selection: Databricks

Input Hostname, Endpoint, Port, and Token (from Step 1)

You can find the Hostname, Endpoint, and Port in your Workspace.

The Table ID can be found in your Workspace as well.

If you have issues granting permissions, please reach out to support@arize.com.

Step 4 - Grant Access To Your Catalog, Schema, or Table

In the Arize UI, copy the arize_ingestion_key value.

Granting Access to A Table (via apply tags feature)

Navigate to your Workspace > Catalog, click on the Table to grant access to
Click the Add tags button underneath the Table name

In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field.

Granting Access to A Schema (via apply tags feature)

Navigate to your Workspace > Catalog, click on the Schema to grant access to
Click the Add tags button underneath the Schema name

In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field.

Granting Access to A Catalog (via apply tags feature)

Navigate to your Workspace > Catalog, and click on the Catalog to grant access to
Click the Add tags button underneath the Catalog name

In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field.
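The tagging steps above can also be expressed in SQL. As a sketch, assuming a Unity Catalog workspace where ALTER ... SET TAGS is available, the helper below prints the statement for a table, schema, or catalog. The helper name print_tag_sql and the object names are hypothetical, and it assumes the key value contains no single quotes.

```shell
# Hypothetical helper: print an ALTER ... SET TAGS statement for the Arize key.
# Arguments: object kind (TABLE, SCHEMA, or CATALOG), object name, tag value.
print_tag_sql() {
  kind="$1"; name="$2"; value="$3"
  case "$kind" in
    TABLE|SCHEMA|CATALOG)
      printf "ALTER %s %s SET TAGS ('arize_ingestion_key' = '%s');\n" "$kind" "$name" "$value"
      ;;
    *)
      echo "kind must be TABLE, SCHEMA, or CATALOG" >&2
      return 1
      ;;
  esac
}

# Placeholder names and value; paste the key copied from the Arize UI instead.
print_tag_sql TABLE my_catalog.my_schema.my_table "paste-key-here"
```

The printed statement can then be run in the SQL editor in your workspace.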

Granting Access to A Table (via adding key value pairs in table properties)

If you are using built-in catalogs like hive_metastore or an older version of Databricks, you might encounter limitations when applying table_tags, schema_tags, and catalog_tags. However, there's an effective workaround to set up the arize_ingestion_key tag for your table to ensure proper access validation.

Navigate to your SQL editor in your workspace and run the following SQL query:

ALTER TABLE table_name SET TBLPROPERTIES ('arize_ingestion_key' = 'key');

To confirm that the arize_ingestion_key has been successfully applied to your table, run the following SQL command:

SHOW TBLPROPERTIES table_name;

Look for arize_ingestion_key in the results. It should be listed among the key-value pairs returned by the query.

Step 5 - Configure Your Model And Define Your Table’s Schema

Match your model schema to your model type and define your model schema through the form input or a JSON schema.

Once finished, Arize will begin querying your table and ingesting your records as model inferences.

Step 6 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 7 - Check your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.

After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

You can view the job details and import progress by clicking on the job ID, which shows more information about the job.

Step 8 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the job details, such as the file path or query ID, the last import job, potential errors, and error locations.
