Arize AI

Cloud Storage FAQ

This page lists common causes of failed file import jobs and how to resolve them, along with preventative measures for avoiding import errors.

Dry Running a File Import Job

Before creating a File Import Job, it is recommended to dry run the job to test for schema or file errors. The dry run:
  • reads the first file (alphabetically) from the import bucket specified
  • scans a subset of records within the file and determines whether that subset can be successfully imported
    • if the subset of records cannot be imported, the resulting error will be surfaced and the dry run can be repeated
While a successful dry run does not guarantee the success of all files within an import job, it provides a quick way to check for common errors in the schema definition or file data.

Debugging a File Import Job

After an import job starts, the details pane shows which files have been processed successfully, which have failed, and which are still pending. For each file, it lists the file path, when the file was last attempted, and any errors along with their locations.

Resolving Common Issues

Timestamp Issues

If the prediction timestamp column is not set correctly, import jobs may fail with parsing errors. To prevent this, ensure that:
  • the timestamp is expressed in seconds (not a more granular unit such as milliseconds) or in RFC3339 format; see File Schema
  • the timestamp is within a year of today's date (either past or future)
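
Both timestamp rules can be checked locally before starting an import. The following is a minimal sketch, not part of the Arize product: the function names are hypothetical, "seconds" is assumed to mean Unix epoch seconds, and "within a year" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "within a year" is approximated as 365 days in either direction.
ONE_YEAR = timedelta(days=365)


def parse_prediction_timestamp(value):
    """Parse epoch seconds or an RFC 3339 string into a UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    text = str(value).strip()
    try:
        # Numeric strings are treated as epoch seconds.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError("RFC 3339 timestamps need a UTC offset")
    return parsed.astimezone(timezone.utc)


def timestamp_problem(value, now=None):
    """Return a short description of the problem, or None if the value looks valid."""
    now = now or datetime.now(timezone.utc)
    try:
        ts = parse_prediction_timestamp(value)
    except (ValueError, OverflowError, OSError):
        # Millisecond or finer epoch values typically land here because
        # they fall outside the range of years datetime supports.
        return "not epoch seconds or RFC 3339"
    if abs(ts - now) > ONE_YEAR:
        return "more than a year from today"
    return None
```

Running `timestamp_problem` over the prediction timestamp column and collecting the rows where it returns a description gives a quick list of values to fix before the import.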

File Schema Issues

  • Training and Validation records must include both Prediction and Actual columns; otherwise, the import results in a data validation error.
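
As an illustration, this requirement can be checked against a file's column names before importing. The sketch below is hypothetical: the role names (`prediction`, `actual`), the environment labels, and the shape of the `schema` mapping are assumptions made for the example, not the actual Arize schema format.

```python
# Hypothetical role names and environment labels, chosen for illustration.
REQUIRED_ROLES = {"prediction", "actual"}
ENVIRONMENTS_NEEDING_BOTH = {"training", "validation"}


def missing_columns(environment, schema, file_columns):
    """Return the required roles that are unmapped or absent from the file.

    schema maps a role (for example "prediction") to a column name in the file.
    """
    if environment.lower() not in ENVIRONMENTS_NEEDING_BOTH:
        return []
    present = set(file_columns)
    # A role counts as missing if it has no mapping or its column is not in the file.
    return sorted(role for role in REQUIRED_ROLES if schema.get(role) not in present)
```

For example, `missing_columns("validation", {"prediction": "pred"}, ["pred", "ts"])` reports that the `actual` role has no column.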

Data Type Issues

If a column is expected to be numeric but its values arrive as strings:
  • Ensure there are no string values in numeric columns
  • If None or Null values are used to represent empty numerics, represent them instead as NaN
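
One way to apply both points before writing the file is to coerce each numeric column and record the rows that still contain non-numeric strings. This is a minimal sketch using only the Python standard library; the helper names and the set of "empty" markers are assumptions and should be adjusted to match your data.

```python
import math

# Assumed spellings of "empty" values; adjust to match your data.
EMPTY_MARKERS = {"", "none", "null", "nan"}


def to_numeric(value):
    """Return value as a float, using NaN for empty markers."""
    if value is None:
        return math.nan
    if isinstance(value, str):
        text = value.strip()
        if text.lower() in EMPTY_MARKERS:
            return math.nan
        # Raises ValueError for non-numeric strings such as "abc".
        return float(text)
    return float(value)


def clean_numeric_column(values):
    """Coerce a column to floats and report the row indexes that could not be converted."""
    cleaned, bad_rows = [], []
    for index, value in enumerate(values):
        try:
            cleaned.append(to_numeric(value))
        except (TypeError, ValueError):
            # Keep a NaN placeholder so row positions stay aligned.
            cleaned.append(math.nan)
            bad_rows.append(index)
    return cleaned, bad_rows
```

The returned `bad_rows` list points at values that need to be corrected at the source rather than silently treated as empty.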