Arize AI

Cloud Storage FAQ

This page lists common causes of failed file importer jobs and how to resolve them, as well as preventative measures for import errors.

Supported Data Types by File Type

Parquet
Input Data Field
Parquet Data Type
prediction_id
  • string
  • int8, int16, int32, int64
prediction_label/actual_label
  • string
  • boolean
  • int8, int16, int32, int64
  • float32, float64
prediction_score/actual_score
  • int8, int16, int32, int64
  • float32, float64
timestamp
  • int64, float64 (number of seconds since the Unix epoch)
  • string in RFC3339 format (e.g., 2022-04-16T16:15:00Z)
  • timestamp
  • date32, date64
features/tags
  • string
  • boolean (converted to string)
  • int8, int16, int32, int64
  • float32, float64
  • decimal128 (round to 5 decimal places)
  • date32, date64, timestamp (converted to integer)
shap_values
  • int32, int64
  • float32, float64
embedding_feature:vector
list of {int8|int16|int32|int64|float32|float64}
embedding_feature:raw_data
  • string
  • list of strings
embedding_feature:link_to_data
string
ranking:prediction_group_id
  • string
  • int8, int16, int32, int64
ranking:rank
int8, int16, int32, int64
ranking:category
array of strings
ranking:relevance_score
  • int8, int16, int32, int64
  • float32, float64

Avro

Arize uses the Avro schema embedded in the header of the Avro Object Container File (OCF) to decode records and match fields to those specified in the Arize file importer schema. Field names in the OCF file must:
  • start with [A-Za-z_]
  • subsequently contain only [A-Za-z0-9_]
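
The naming rule above can be checked with a simple regular expression. This validator is a sketch for illustration, not part of any Arize SDK:

```python
import re

# Must start with a letter or underscore, then only letters, digits,
# or underscores -- the OCF field-name rule described above.
OCF_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_ocf_field(name: str) -> bool:
    """Return True if `name` is a legal Avro OCF field name."""
    return bool(OCF_FIELD_NAME.fullmatch(name))
```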
Input Data Field
Avro Data Type
prediction_id
long, int, string
prediction_label/actual_label
  • string
  • boolean, int, long, float, double, enum (will be converted to string)
prediction_score/actual_score
int, long, float, double
timestamp
  • long, double (number of seconds since the Unix epoch)
  • string in RFC3339 format (e.g., 2022-04-16T16:15:00Z)
  • timestamp logical type
  • date logical type
features/tags
  • string
  • enum, boolean (will be converted to string)
  • int, long
  • float, double
shap_values
int, long, float, double
embedding_feature:vector
array of {int|long|float|double}
embedding_feature:raw_data
  • string
  • array of strings
embedding_feature:link_to_data
string
ranking:prediction_group_id
long, int, string
ranking:rank
int, long
ranking:category
array of strings
ranking:relevance_score
int, long, float, double
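
For illustration, an OCF header schema using several of the Avro types above might look like the following. The record name and field names here are hypothetical:

```json
{
  "type": "record",
  "name": "Prediction",
  "fields": [
    {"name": "prediction_id", "type": "string"},
    {"name": "prediction_label", "type": "string"},
    {"name": "prediction_score", "type": "double"},
    {"name": "prediction_ts", "type": "long"},
    {"name": "embedding_vector", "type": {"type": "array", "items": "double"}}
  ]
}
```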

Apache Arrow

Column Name in File → Arize Schema
my-prediction-ts → prediction_timestamp
my-prediction-id-customer → prediction_id
my-prediction-score → prediction_score
my-prediction-label → prediction_label
my-feature.addr_state → features
my-feature.revenue → features
my-environment → environment
my-actual-label → actual_label
This example shows what an Arrow file's columns and the corresponding schema would look like.
The "*" wildcard can be used to add features to a file without changing the schema.
Note that the prefix "my-feature" maps to multiple feature columns.
ModelSchema:
prediction_timestamp: "my-prediction-ts"
prediction_id: "my-prediction-id-customer"
prediction_score: "my-prediction-score"
prediction_label: "my-prediction-label"
features: "my-feature.*" # matches the feature columns above, "addr_state" and "revenue"
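
To illustrate how a "*" selector can pick up the feature columns, here is a sketch using Python's fnmatch; Arize's actual matching logic may differ:

```python
from fnmatch import fnmatch

# Column names from the example Arrow file above.
columns = [
    "my-prediction-ts",
    "my-prediction-id-customer",
    "my-feature.addr_state",
    "my-feature.revenue",
    "my-actual-label",
]

# A wildcard pattern like "my-feature.*" selects every feature column,
# so new features can be added to the file without editing the schema.
feature_columns = [c for c in columns if fnmatch(c, "my-feature.*")]
```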

Dry Running a File Import Job

Before creating a File Import Job, it is recommended to dry run the job first to test for any schema or file errors. The dry run:
  • reads the first file (alphabetically) from the import bucket specified
  • scans a subset of records within the file and determines whether that subset can be successfully imported
    • if the subset of records cannot be imported, the resulting error will be surfaced and the dry run can be repeated
While a successful dry run does not guarantee the success of all files within an import job, it provides a quick way to check for common errors in the schema definition or file data.
Dry run a file import job

Debugging a File Import Job

After an import job has started, the details pane provides a transparent view of which files have been successfully processed, which have failed, and which are pending. It also shows each file's path, when the file was last attempted, and any errors along with their locations.

Resolving Common Issues

Timestamp Issues

If the prediction timestamp column isn't correctly set, import jobs may fail with parsing errors. To avoid this, ensure that:
  • the timestamp format is in seconds (not a more granular unit such as milliseconds) or RFC3339 (see File Schema)
  • the timestamp is within a year of today's date (either past or future)
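
As a sketch, both accepted timestamp shapes can be produced from Python's standard datetime module; the variable names here are illustrative:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
ts = now.replace(microsecond=0)

# Shape 1: whole seconds since the Unix epoch (not milliseconds/nanoseconds)
epoch_seconds = int(ts.timestamp())

# Shape 2: an RFC3339 string, e.g. "2022-04-16T16:15:00Z"
rfc3339 = ts.strftime("%Y-%m-%dT%H:%M:%SZ")

# The timestamp must fall within a year of today (past or future)
within_a_year = abs(now - ts) <= timedelta(days=365)
```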

File Schema Issues

  • Training and Validation records must include both Prediction and Actual columns; otherwise, the import will fail with a data validation error.

Data Type Issues

If the expected data type is numeric but the values arrive as strings:
  • Ensure there are no string values in numeric columns
  • If None or Null values are used to represent empty numeric values, represent them as NaN instead
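
A minimal sketch of the second point: replacing None placeholders with NaN so a numeric column stays numeric. The column data here is made up:

```python
import math

# A numeric column where missing values were encoded as None.
raw_scores = [0.91, None, 0.12]

# Represent the missing entries as NaN instead, keeping the column numeric.
clean_scores = [math.nan if v is None else float(v) for v in raw_scores]
```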