Quickstart To Send Data
A guide to sending new model data to Arize
It's easy to get your ML data into Arize - we meet you where your data is. For a quick trial, upload a file in our UI or use our Python Pandas SDK from your notebook. You can also connect your data lake or cloud storage buckets.
Pick an Ingestion Method
Choose Your Data
- Configure the schema via the UI or JSON, or define it in a notebook with the SDK
We recommend first sending a sample of your data to verify its format and identify any changes to make before configuring a full production pipeline. The easiest way to try this is with our UI Drag & Drop or our Python Pandas SDK in a notebook.
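As a minimal sketch of preparing such a sample in a notebook, the snippet below draws a reproducible random subset from a pandas DataFrame. The helper `take_sample`, the DataFrame `inferences`, and its column names are hypothetical and chosen only for illustration; the snippet makes no Arize SDK calls.

```python
import pandas as pd


def take_sample(df: pd.DataFrame, n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Return up to n randomly chosen rows; the fixed seed makes the draw repeatable."""
    return df.sample(n=min(n, len(df)), random_state=seed)


# Hypothetical inference data; the column names are illustrative only.
inferences = pd.DataFrame({
    "prediction_id": [f"id-{i}" for i in range(5000)],
    "feature_a": range(5000),
    "prediction_score": [i / 5000 for i in range(5000)],
})

sample = take_sample(inferences, n=1000)
print(len(sample), "of", len(inferences), "rows selected for a trial upload")
```

Capping `n` at `len(df)` keeps the helper from raising an error when the source data is smaller than the requested sample.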
Once you've verified a sample, connect your data source to Arize to continuously sync and transform its data.
Arize syncs with cloud storage providers, data lakes, and data warehouses to continuously track and transcribe model inference data as Arize model records.
Perform a quick Data Quality Check when new model data is first sent to Arize. This helps identify any data ingestion errors that must be corrected before all the model data is uploaded.
- Verify features and tags on the 'Overview' or 'Datasets' tab
- Data types (numeric & categorical) - If a feature has the wrong data type, verify that the feature is represented correctly in the DataFrame/file/table that was ingested.
- Check missing values - If a feature or tag has missing values, verify that this is expected. If missing values aren't expected, check whether they are also present in the input DataFrame/file/table. If they are, check your upstream data sources.
- Verify feature cardinality - Are there features with a cardinality of 1, or an unusually high cardinality? If a feature cardinality appears incorrect, verify the number of unique values for that feature in the input DataFrame/file/table.
- Verify predictions and actuals on the 'Data Ingestion' or 'Datasets' tab
- Cardinality of prediction/actual class
- Distribution of prediction/actual scores
- The number of records Arize received matches the number of predictions sent
Arize takes a few minutes to ingest and index all of your data. If the number of predictions differs from what you're expecting to see after waiting a few minutes, check the number of records in your DataFrame or file/table.
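To compare what Arize shows against your input, it can help to compute the same figures locally. The following is a minimal sketch using pandas; the helper `summarize_columns` and the example columns are hypothetical, and the snippet only inspects the local DataFrame, not Arize itself.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: data type, missing-value count, and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(dropna=True),
    })


# Hypothetical input data; the column names are illustrative only.
df = pd.DataFrame({
    "prediction_id": ["a", "b", "c", "d"],
    "age": [34, 51, None, 29],            # numeric feature with one missing value
    "state": ["CA", "CA", "CA", "CA"],    # categorical feature with a cardinality of 1
    "prediction_score": [0.1, 0.7, 0.4, 0.9],
})

summary = summarize_columns(df)
print(summary)
print("records:", len(df))  # compare with the number of predictions shown in Arize
```

If a value in this summary disagrees with what the UI reports, the issue is likely in ingestion; if the two agree but the value looks wrong, the issue is likely upstream of the DataFrame.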
- Verify performance metrics on the 'Performance Tracing' tab
- If actuals are sent separately, verify that the prediction ID used for the prediction matches the prediction ID used for the actual
- Check the prediction time range
It's common to accidentally send duplicate prediction IDs. If a prediction is sent with the same prediction ID as another prediction, Arize counts them as 2 predictions.
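Duplicate and mismatched prediction IDs, as well as the prediction time range, can be checked locally before sending. The sketch below assumes predictions and actuals live in two pandas DataFrames; the names `predictions`, `actuals`, `prediction_id`, and `prediction_ts` are hypothetical.

```python
import pandas as pd

# Hypothetical prediction and actual records; names and values are illustrative only.
predictions = pd.DataFrame({
    "prediction_id": ["p1", "p2", "p2", "p3"],
    "prediction_ts": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"]
    ),
    "prediction_score": [0.2, 0.8, 0.8, 0.5],
})
actuals = pd.DataFrame({
    "prediction_id": ["p1", "p3", "p9"],
    "actual_label": [0, 1, 1],
})

# Rows whose prediction ID appears more than once (keep=False flags every copy).
dupes = predictions[predictions["prediction_id"].duplicated(keep=False)]

# Actuals whose prediction ID has no matching prediction.
unmatched = actuals[~actuals["prediction_id"].isin(predictions["prediction_id"])]

# Earliest and latest prediction timestamps, to compare with the range shown in the UI.
ts_min, ts_max = predictions["prediction_ts"].min(), predictions["prediction_ts"].max()

print("duplicate IDs:", sorted(dupes["prediction_id"].unique()))
print("actuals without a prediction:", unmatched["prediction_id"].tolist())
print("prediction time range:", ts_min, "to", ts_max)
```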
- Verify the model type (e.g., ranking, regression) next to the model's name
- If the model type is incorrect, check that the correct model type was specified during data ingestion