Copyright © 2025 Arize AI, Inc

Databricks

Learn how to set up an import job using Databricks

Last updated 1 year ago


Step 1 - Generate a Token

If necessary, generate a PAT (Personal Access Token), which will be used to authenticate in the following steps when you generate a token for your service principal.

Navigate to your Workspace and click "User Settings"

Click "Generate new token"

Take note of your PAT

  1. Navigate to your Workspace and click "Admin Settings"

  2. In the "Service Principals" tab, click "Add Service Principal"

  1. Click on "User Management" on accounts.cloud.databricks.com

  2. Create a Service Principal

  3. Take note of the Application ID

  4. Run the following curl command to create a service principal in your workspace, where ${DATABRICKS_HOST} is the workspace URL, ${DATABRICKS_TOKEN} is the PAT you just created, and ${APPLICATION_ID} is the Application ID of the service principal you just created:

curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{
  \"displayName\": \"displayName\",
  \"externalId\": \"externalId\",
  \"applicationId\": \"${APPLICATION_ID}\",
  \"id\": \"id\",
  \"active\": true
}"

Click on the service principal and enable “Databricks SQL access” and “Workspace access” and click “Update”

Navigate to "Admin Settings" > "Workspace Settings" and search for "Personal Access Tokens".

Click "Permission Settings" and grant "Can Use" to the service principal you just created.

With your Token (PAT) and Application ID, run the following curl command. Fill in the environment variables with your own values (${DATABRICKS_HOST} should be the URL of your workspace):

curl -X POST \
${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data "{\"application_id\": \"${APPLICATION_ID}\" }"

Save the token_value from the response. This is the Token you will use to complete the remaining setup in Arize later.
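
For reference, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and it assumes token_value appears at the top level of the JSON response, as the step above implies.

```python
import json
import urllib.request


def build_obo_token_request(host: str, pat: str, application_id: str) -> urllib.request.Request:
    """Build the POST request for the on-behalf-of token endpoint used above."""
    url = host.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens"
    body = json.dumps({"application_id": application_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {pat}",
        },
    )


def extract_token_value(response_body: str) -> str:
    """Return token_value from the JSON response (assumed to be a top-level key)."""
    payload = json.loads(response_body)
    if "token_value" not in payload:
        raise KeyError("response did not contain token_value")
    return payload["token_value"]


def create_service_principal_token(host: str, pat: str, application_id: str) -> str:
    """Send the request and return the token to enter in Arize."""
    request = build_obo_token_request(host, pat, application_id)
    with urllib.request.urlopen(request) as response:
        return extract_token_value(response.read().decode("utf-8"))
```

Calling create_service_principal_token with your workspace URL, PAT, and Application ID performs the network call; the two helper functions can be exercised without network access.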

Step 2 - Grant Access To Your Table

Go to the Data Explorer (in the left drawer) and click on the catalog that contains the table/view you want to grant access to.

Click “Permissions” and grant “USE CATALOG” and “USE SCHEMA” to your service principal. Click Grant.

Go to the view/table, click “Permissions”, and grant “SELECT” on the view/table to your service principal.

Go to "SQL Warehouses" > [YOUR_WAREHOUSE_NAME] and click on "Permissions". Grant Can Use permissions to your service principal.
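
If you prefer SQL to the UI, the catalog, schema, and table grants above can be expressed as GRANT statements. The sketch below only builds the statements as strings. It assumes a Unity Catalog workspace where the service principal is referenced by its Application ID; the function names and example identifiers are hypothetical. The SQL Warehouse "Can Use" permission is not covered here and is still set as described above.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grant_statements(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return the three GRANT statements that mirror the UI steps above."""
    cat = quote_identifier(catalog)
    sch = f"{cat}.{quote_identifier(schema)}"
    tbl = f"{sch}.{quote_identifier(table)}"
    who = quote_identifier(principal)  # assumption: the Application ID of the service principal
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who};",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who};",
        f"GRANT SELECT ON TABLE {tbl} TO {who};",
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own catalog, schema, table, and Application ID.
    for statement in build_grant_statements("main", "ml", "predictions", "my-application-id"):
        print(statement)
```

The printed statements can be run in the SQL editor of your workspace by a user allowed to grant these privileges.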

Step 3 - Start the Data Upload Wizard

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Databricks' card or navigate to the Data Warehouse tab to start a new table import job.

Storage Selection: Databricks

Input Hostname, Endpoint, Port, and Token (from Step 1)

You can find Hostname, Endpoint, and Port in your Workspace

You can find the Table ID in your Workspace as well.

If you have issues granting permissions, please reach out to support@arize.com.

Step 4 - Grant Access To Your Catalog, Schema, or Table

In Arize UI: Copy arize_ingestion_key value

Granting Access to A Table (via apply tags feature)
  1. Navigate to your Workspace > Catalog and click on the Table to grant access to

  2. Click the Add tags button underneath the Table name

  3. In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field

Granting Access to A Schema (via apply tags feature)
  1. Navigate to your Workspace > Catalog and click on the Schema to grant access to

  2. Click the Add tags button underneath the Schema name

  3. In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field

Granting Access to A Catalog (via apply tags feature)
  1. Navigate to your Workspace > Catalog and click on the Catalog to grant access to

  2. Click the Add tags button underneath the Catalog name

  3. In the pop-up that opens, enter arize_ingestion_key in the Key field and paste the copied tag value in the Value field
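
The tagging steps above use the UI. As a hedged alternative, Databricks SQL also offers SET TAGS on tables, schemas, and catalogs, and the sketch below builds the corresponding statement. It assumes a Unity Catalog workspace that supports tags; the function name and the example object name are hypothetical, and the ingestion key is the value copied from the Arize UI.

```python
VALID_LEVELS = ("CATALOG", "SCHEMA", "TABLE")


def build_tag_statement(level: str, object_name: str, ingestion_key: str) -> str:
    """Build an ALTER ... SET TAGS statement that applies arize_ingestion_key.

    object_name is inserted as given, so quote it yourself if it needs quoting.
    """
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    # Reject characters that would need escaping inside a SQL string literal.
    if "'" in ingestion_key or "\\" in ingestion_key:
        raise ValueError("ingestion_key must not contain quotes or backslashes")
    return (
        f"ALTER {level} {object_name} "
        f"SET TAGS ('arize_ingestion_key' = '{ingestion_key}');"
    )


if __name__ == "__main__":
    # Placeholder object name and key; replace with your own values.
    print(build_tag_statement("table", "main.ml.predictions", "paste-key-here"))
```

Run the resulting statement in the SQL editor; only one of the three levels needs the tag for the object you are importing.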

Granting Access to A Table (via adding key value pairs in table properties)

If you are using built-in catalogs like hive_metastore or an older version of Databricks, you might encounter limitations when applying table_tags, schema_tags, and catalog_tags. However, there's an effective workaround to set up the arize_ingestion_key tag for your table to ensure proper access validation.

  1. Navigate to your SQL editor in your workspace and run the following SQL query:

ALTER TABLE table_name SET TBLPROPERTIES ('arize_ingestion_key' = 'key');
  2. To confirm that the arize_ingestion_key has been successfully applied to your table, run the following SQL command:

SHOW TBLPROPERTIES table_name;

Look for the arize_ingestion_key in the results. It should appear among the key-value pairs returned by the query.
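
If you fetch the SHOW TBLPROPERTIES output programmatically, the check can be automated. The sketch below assumes the result is available as (key, value) pairs, which is how that command reports properties; the function name is hypothetical and no particular database driver is assumed.

```python
from typing import Iterable, Tuple


def has_ingestion_key(rows: Iterable[Tuple[str, str]], expected_value: str) -> bool:
    """Return True if arize_ingestion_key is present with the expected value."""
    for key, value in rows:
        if key == "arize_ingestion_key":
            return value == expected_value
    return False


if __name__ == "__main__":
    # Example rows with placeholder values, shaped like SHOW TBLPROPERTIES output.
    sample_rows = [("delta.minReaderVersion", "1"), ("arize_ingestion_key", "key")]
    print(has_ingestion_key(sample_rows, "key"))
```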

Step 5 - Configure Your Model And Define Your Table’s Schema

Match your model schema to your model type and define your model schema through the form input or a JSON schema.

Once finished, Arize will begin querying your table and ingesting your records as model inferences.

Step 6 - Add Model Data To The Table Or View

Arize will run queries to ingest records from your table based on your configured refresh interval.

Step 7 - Check your Table Import Job

Arize will attempt a dry run to validate your job for any access, schema or record-level errors. If the dry run is successful, you may then create the import job.

After creating a job following a successful dry run, you will be taken to the 'Job Status' tab where you can see the status of your import jobs.

You can view the job details and import progress by clicking on the job ID, which uncovers more information about the job.

Step 8 - Troubleshooting An Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

Validation Errors

If there is an error validating a file or table against the model schema, Arize will surface an actionable error message. From there, click on the 'Fix Schema' button to adjust your model schema.

Dry Run File/Table Passes But The Job Fails

If your dry run is successful, but your job fails, click on the job ID to view the job details. This uncovers job details such as information about the file path or query id, the last import job, potential errors, and error locations.

Tag your Catalog/Schema/Table with the arize_ingestion_key and the provided label value using the steps below. For more details, see the docs on Table_tags for Databricks.

You can grant access to a Table, a Schema, or a Catalog in Databricks.

Learn more about Schema fields here.

Once you've identified the job failure point, append the edited row to the end of your table with an updated change_timestamp value.
