Databricks
Learn how to set up an import job using Databricks
If necessary, generate a PAT (Personal Access Token), which will be used to authenticate in the following steps when you generate a token for your service principal.
Navigate to your Workspace and click "User Settings"

Click "Generate new token"

Take note of your PAT
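Before moving on, it can help to confirm that the PAT is accepted by your workspace. The sketch below assumes a POSIX shell with curl installed. It stores the workspace URL and PAT in the same environment variables the later curl commands use, then defines two helpers that call the workspace SCIM "Me" endpoint. The helper names (`pat_check_url`, `check_pat`) and the placeholder values are illustrative assumptions, not part of Arize or Databricks tooling.

```shell
# Placeholder values (assumptions): replace with your workspace URL and PAT.
export DATABRICKS_HOST="https://<your-workspace-url>"
export DATABRICKS_TOKEN="<your-pat>"

# Hypothetical helper: builds the SCIM "Me" URL, dropping one trailing slash
# from the host so the path does not start with a double slash.
pat_check_url() {
  printf '%s/api/2.0/preview/scim/v2/Me\n' "${DATABRICKS_HOST%/}"
}

# Hypothetical helper: prints the HTTP status code returned for the PAT.
check_pat() {
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "$(pat_check_url)"
}

# Usage (makes a network call):
#   check_pat
```

A 200 status suggests the token is being accepted; a 401 or 403 usually points to a mistyped or expired token.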

With Unity Catalog
Without Unity Catalog
- 1. Navigate to your Workspace and click "Admin Settings"

- 2. In the "Service Principals" tab, click "Add Service Principal"

- 1. Click on "User Management" on accounts.cloud.databricks.com
- 2. Create a Service Principal
- 3. Take note of the Application ID
- 4. Run the following curl command to create a service principal in your workspace, where ${DATABRICKS_HOST} is the workspace URL, ${DATABRICKS_TOKEN} is the PAT you just created, and ${APPLICATION_ID} is the Application ID of the service principal you just created
curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{
\"displayName\": \"displayName\",
\"externalId\": \"externalId\",
\"applicationId\": \"${APPLICATION_ID}\",
\"id\": \"id\",
\"active\": true
}"
Click on the service principal and enable “Databricks SQL access” and “Workspace access” and click “Update”

Navigate to "Admin Settings" > "Workspace Settings" and search for "Personal Access Tokens".

Click "Permission Settings" and grant "Can Use" to the service principal you just created.

With your Token (PAT) and Application ID, run the following curl command. Don't forget to fill in the environment variables with your specific information (${DATABRICKS_HOST} should be the URL of your workspace).
curl -X POST \
${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{\"application_id\": \"${APPLICATION_ID}\" }"
Save the token_value from the response. This is the Token you will use to complete the remaining setup in Arize later.
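The response to this request is JSON, so one way to avoid copying the token by hand is to pull the `token_value` field out in the shell. The sketch below uses sed so that it has no extra dependencies. `extract_token_value` is a hypothetical helper name, and the pattern assumes the token contains no escaped quote characters. If jq is installed, `jq -r .token_value` does the same job more robustly.

```shell
# Hypothetical helper: reads a JSON response on stdin and prints the value of
# its "token_value" field. Assumes the value has no escaped double quotes.
extract_token_value() {
  sed -n 's/.*"token_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (makes a network call; the variable name is an assumption):
#   SERVICE_PRINCIPAL_TOKEN=$(curl -s -X POST \
#     "${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens" \
#     --header "Content-type: application/json" \
#     --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#     --data "{\"application_id\": \"${APPLICATION_ID}\" }" | extract_token_value)
```

Treat the extracted value like a password: avoid echoing it to shared terminals or committing it to version control.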
Go to the Data Explorer (on the left drawer) and click on the catalog with the table/view you want to grant access.

Click “Permissions” and grant “USE CATALOG” and “USE SCHEMA” to your service principal. Click “Grant”.

Go to the view/table, click “Permissions”, and grant “SELECT” on it to your service principal.

Go to "SQL Warehouses" > [YOUR_WAREHOUSE_NAME] and click on "Permissions". Grant the "Can Use" permission to your service principal.
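The catalog, schema, and table grants described above can also be issued as SQL from a Databricks SQL editor or notebook. The sketch below is a shell helper that only prints the statements; it does not run them. `print_grants` and its arguments are hypothetical, and the sketch assumes a Unity Catalog workspace where the service principal is referenced by its Application ID in backticks. The SQL warehouse "Can Use" permission is not covered here.

```shell
# Hypothetical helper: prints GRANT statements for a service principal.
# Arguments: <application-id> <catalog> <schema> <table-or-view>
print_grants() {
  printf 'GRANT USE CATALOG ON CATALOG %s TO `%s`;\n' "$2" "$1"
  printf 'GRANT USE SCHEMA ON SCHEMA %s.%s TO `%s`;\n' "$2" "$3" "$1"
  printf 'GRANT SELECT ON TABLE %s.%s.%s TO `%s`;\n' "$2" "$3" "$4" "$1"
}

# Usage (catalog, schema, and table names are placeholders):
#   print_grants "$APPLICATION_ID" my_catalog my_schema my_table
```

Printing the statements first makes it easy to review them before pasting them into a SQL editor.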

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Databricks' card or navigate to the Data Warehouse tab to begin a new table import job.
Storage Selection: Databricks

Select Databricks from Table Options
Input Hostname, Endpoint, Port, and Token (from Step 1)

You can find Hostname, Endpoint, and Port in your Workspace

Similarly for Table ID

If you have issues granting permissions please reach out to [email protected]
Match your model schema to your model type and define your model schema through the form input or a JSON schema.

Set up model configurations

Map your table using a form

Map your table using a JSON schema
Once finished, Arize will begin querying your table and ingesting your records as model inferences.
Arize will run queries to ingest records from your table based on your configured refresh interval.
Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.

After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

You can view the job details and import progress by clicking on the job ID, which uncovers more information about the job.
An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.
If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.
If your dry run is successful but your job fails, click on the job ID to view the job details. These include the file path or query ID, the last import job, potential errors, and error locations.

Once you've identified the job failure point, append the edited row to the end of your table with an updated change_timestamp value.
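As an illustration of appending a corrected row, the sketch below prints an INSERT statement that you could adapt and run in a Databricks SQL editor. Only `change_timestamp` comes from the text above; the helper name `print_corrected_row_insert`, the other column names, and the values are hypothetical placeholders that must be replaced with the columns of your own table.

```shell
# Hypothetical helper: prints an INSERT that appends a corrected row with a
# fresh change_timestamp. Argument: fully qualified table name.
print_corrected_row_insert() {
  cat <<EOF
INSERT INTO $1 (prediction_id, prediction_label, change_timestamp)
VALUES ('example-id', 'corrected-label', current_timestamp());
EOF
}

# Usage (table name is a placeholder):
#   print_corrected_row_insert my_catalog.my_schema.my_table
```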