Arize AI
Product FAQ
Frequently asked questions about the product

What data types does Arize support?

Arize natively supports tabular/structured data types (strings, floats, booleans, etc.). Arize will soon launch embedding support for NLP, image, and other unstructured data types.

How can Arize surface outliers/anomalies?

Arize can surface outlier/anomalous data through:

Data Quality checks

  • Numeric Features: Arize monitors the value ranges of numeric input features for outliers.
  • Categorical Features: Arize will monitor outlier categories and the overall cardinality of categorical features.

Drift checks

If feature slices vary significantly from the set baseline distribution, Arize alerts you through drift detection monitors.

Feature Performance Heatmap

If there are outlier slices that perform poorly, Arize’s feature performance heatmap automatically surfaces the worst-performing segments. These slices can also be monitored explicitly to detect performance degradation proactively.

What performance metrics does Arize support?

Arize supports a comprehensive list of model performance metrics for both numeric and categorical model types. These metrics are available on dashboards as well as monitors. In addition to the out-of-the-box metrics listed below, Arize also supports model data metrics, custom evaluation metrics, and user-defined business impact metrics. Learn more about statistical widgets here and user-defined business impact formulas here.
  • Log Loss
  • Custom Metrics
In addition to performance metrics, we also support data metrics that allow you to count, average, view percentiles, or calculate percent/count for all features, actuals, and/or predictions. All metrics can be calculated in aggregate, as well as on particular cohorts using applied filters.

How can I monitor the impact of a particular feature?

You can monitor the performance of the model for a particular feature or feature-value combination (also known as a slice). The feature performance heatmap visualizes the performance of each slice and indicates which slices are the most problematic or contribute most to performance degradation.
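To make the idea of slice-level performance concrete, here is a minimal sketch that computes accuracy per feature value and ranks the worst slices. It is an illustration only, not Arize's implementation: the record layout, the `accuracy_by_slice` and `worst_slices` helpers, and the choice of accuracy as the metric are all assumptions made for this example.

```python
from collections import defaultdict

def accuracy_by_slice(records, feature):
    """Group records by the value of `feature` and compute accuracy per slice.

    Each record is a dict holding the feature value plus 'prediction' and
    'actual' keys (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in records:
        key = row[feature]
        totals[key] += 1
        if row["prediction"] == row["actual"]:
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}

def worst_slices(records, feature, n=3):
    """Return the n slices with the lowest accuracy, worst first."""
    scores = accuracy_by_slice(records, feature)
    return sorted(scores.items(), key=lambda item: item[1])[:n]

records = [
    {"state": "CA", "prediction": 1, "actual": 1},
    {"state": "CA", "prediction": 0, "actual": 0},
    {"state": "NY", "prediction": 1, "actual": 0},
    {"state": "NY", "prediction": 1, "actual": 1},
    {"state": "TX", "prediction": 0, "actual": 1},
]
print(worst_slices(records, "state", n=2))
```

In this toy data the "TX" slice has the lowest accuracy, so it would be the first candidate for an explicit monitor.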

What happens if a new categorical feature is seen in production?

Arize drift detection can flag when a categorical feature sees a percentage of unseen categories. For example, if the baseline had 10 categories but the number of categories in the production/serving distribution differs significantly, Arize triggers an alert. Arize also captures the percentage of values that fall into new categories not previously seen in the baseline distribution.
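The percentage of values in unseen categories can be sketched as follows. This is a simplified illustration under the assumption that baseline and production values are available as plain Python lists; the `unseen_category_stats` helper is hypothetical and not part of any Arize API.

```python
def unseen_category_stats(baseline, production):
    """Return (new categories, percent of production values in those categories)."""
    baseline_categories = set(baseline)
    new_categories = set(production) - baseline_categories
    if not production:
        return new_categories, 0.0
    # Count every production value whose category never appeared in the baseline.
    unseen = sum(1 for value in production if value not in baseline_categories)
    return new_categories, 100.0 * unseen / len(production)

baseline = ["a", "b", "c"]
production = ["a", "d", "d", "b"]
new_categories, pct_unseen = unseen_category_stats(baseline, production)
print(new_categories, pct_unseen)
```

Here two of the four production values belong to the category "d", which the baseline never contained.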

What happens if a new numerical feature is seen in production?

Arize drift detection can show the % of values outside of the baseline range. Arize uses the quantiles of the data to calculate the bins of the distribution. If the baseline range is wider than the production/serving range, the user can see the % of volume where the baseline distribution falls outside of the production/serving distribution. Similarly, if the production/serving distribution extends beyond the baseline range, Arize surfaces the % of volume for values outside the baseline range.
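The two out-of-range percentages described above can be sketched like this. It is a minimal illustration that compares raw minimum and maximum values; the function names are hypothetical, and Arize's own calculation (which works on quantile-based bins) may differ.

```python
def percent_outside_range(values, low, high):
    """Percent of values that fall below `low` or above `high`."""
    if not values:
        return 0.0
    outside = sum(1 for v in values if v < low or v > high)
    return 100.0 * outside / len(values)

def range_mismatch(baseline, production):
    """Return (% of production outside the baseline range,
               % of baseline outside the production range).

    Both inputs must be non-empty lists of numbers.
    """
    prod_outside = percent_outside_range(production, min(baseline), max(baseline))
    base_outside = percent_outside_range(baseline, min(production), max(production))
    return prod_outside, base_outside

baseline = [0, 5, 10, 20]
production = [5, 8, 25, 30]
print(range_mismatch(baseline, production))
```

In this example half of the production values exceed the baseline maximum of 20, while one baseline value (0) lies below the production minimum of 5.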

How does Arize calculate drift?

Arize calculates drift metrics such as Population Stability Index, KL Divergence, and Wasserstein Distance. Arize computes drift by measuring distribution changes between the model’s production values and a baseline (reference dataset). Users can configure a baseline to be any time window of a:
  1. Pre-production dataset (training, test, validation), or
  2. Fixed or moving time period from production (e.g. last 30 days, last 60 days).
Baselines are saved in Arize so that users can compare several versions and/or environments against each other across moving or fixed time windows. For more details on baselines, visit here.
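As a rough illustration of two of the drift metrics named above, the sketch below computes KL Divergence and Population Stability Index from per-bin counts. It uses the standard textbook definitions; the `eps` floor for empty bins is an assumption of this example, and nothing here should be read as Arize's exact implementation.

```python
import math

def _fractions(counts, eps):
    """Convert bin counts to proportions, flooring each at eps to avoid log(0)."""
    total = sum(counts)
    return [max(c / total, eps) for c in counts]

def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) = sum_i p_i * ln(p_i / q_i) over shared bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("both distributions must use the same bins")
    p = _fractions(p_counts, eps)
    q = _fractions(q_counts, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def psi(baseline_counts, production_counts, eps=1e-6):
    """PSI = sum_i (p_i - b_i) * ln(p_i / b_i), a symmetric form of KL."""
    if len(baseline_counts) != len(production_counts):
        raise ValueError("both distributions must use the same bins")
    b = _fractions(baseline_counts, eps)
    p = _fractions(production_counts, eps)
    return sum((pi - bi) * math.log(pi / bi) for pi, bi in zip(p, b))

baseline_counts = [100, 300, 400, 200]
production_counts = [150, 250, 300, 300]
print(psi(baseline_counts, production_counts))
```

PSI is zero when the two binned distributions are identical and grows as they diverge. With these definitions it equals KL(P || Q) + KL(Q || P), which is why it treats baseline and production symmetrically.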

What metrics can be applied to individual features?

Arize supports automated schema detection of models and immediately computes statistics for all features of the model, including:
  • Feature Type
  • Minimum Value
  • Standard Deviation
  • Missing Value
  • Maximum Value
  • Custom Metrics

How do you evaluate features?

Arize supports feature quality metrics including feature drift, data quality (e.g. cardinality, percent empty, type mismatch, out of range), and feature importance metrics. Additionally, users can compute performance metrics for their model filtered by feature/value combinations (slices).

How does Arize handle concept drift?

Concept drift is drift in the actuals or ground truth. To measure concept drift, Arize requires historical actuals, which are used to set a baseline.

How does Arize calculate bins for numeric features?

Arize calculates the bins within the drift tab using quantiles and fixed bins from the baseline distribution.
The range between two quantile values in the baseline distribution is used to calculate a fixed bin width. That width is then used to build a finite set of fixed-width bins (currently 8) around the median value, 4 in each direction. Lastly, Arize adds two "bookend" bins: one from the minimum value to the lowest bin's edge, and another from the highest bin's edge to the maximum value across both distributions.
This strategy produces reasonably sized bins by basing the fixed width on quantile values.
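The binning steps above can be sketched as follows. The text does not say which two quantiles are used or how the width is derived from their range, so this example assumes the 10th and 90th percentiles and divides their range evenly across the 8 inner bins. Those choices, and the `quantile` and `bin_edges` helpers, are assumptions for illustration only.

```python
import statistics

def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def bin_edges(baseline, production, q_low=0.1, q_high=0.9, inner_bins=8):
    """Fixed-width bins centered on the baseline median, plus bookend bins."""
    base = sorted(baseline)
    median = statistics.median(base)
    # Assumed rule: spread the quantile range evenly over the inner bins.
    width = (quantile(base, q_high) - quantile(base, q_low)) / inner_bins
    half = inner_bins // 2
    edges = [median + i * width for i in range(-half, half + 1)]
    # Bookends cover the extremes across both distributions.
    overall_min = min(min(baseline), min(production))
    overall_max = max(max(baseline), max(production))
    if overall_min < edges[0]:
        edges.insert(0, overall_min)
    if overall_max > edges[-1]:
        edges.append(overall_max)
    return edges

baseline = list(range(0, 101))   # values 0..100, median 50
production = [-5, 40, 60, 120]
print(bin_edges(baseline, production))
```

With this toy baseline the inner edges run from 10 to 90 in steps of 10, and the bookends extend the first and last bins to -5 and 120 so that every production value lands in some bin.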

Does Arize have any security certifications?

Arize is SOC 2 Type 2 certified under standards set by the American Institute of Certified Public Accountants (AICPA). Arize’s SOC 2 security certification validates that Arize has adequate processes and policies to securely handle both customer and organizational data.
Read more about the certification and what it means here. To request a copy of the report, please contact us here.

How are automatic thresholds set?

Drift: An automatic threshold for drift metrics is defined as 2 standard deviations above the mean of the calculated metric value for the latest 14 days of data (with up to a 3-day delay).
Performance: An automatic threshold for performance metrics is defined as 2 standard deviations above/below (depending on the metric) the mean of the calculated metric value for the latest 14 days of data (with up to a 16-day delay).
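The threshold rule above amounts to a mean plus or minus two standard deviations over a 14-day window. A minimal sketch, assuming one metric value per day and a population standard deviation (the text does not say whether the population or sample form is used, and the `auto_threshold` helper is hypothetical):

```python
import statistics

def auto_threshold(daily_values, window=14, num_std=2.0, direction="above"):
    """Mean +/- num_std standard deviations over the latest `window` values.

    direction="above" suits metrics where higher is worse (e.g. drift);
    direction="below" suits metrics where lower is worse (e.g. accuracy).
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    recent = daily_values[-window:]
    mean = statistics.mean(recent)
    std = statistics.pstdev(recent)  # population form, an assumption
    offset = num_std * std
    return mean + offset if direction == "above" else mean - offset

daily_psi = [0, 2] * 7  # 14 days of toy metric values
print(auto_threshold(daily_psi, direction="above"))
print(auto_threshold(daily_psi, direction="below"))
```

For the toy series the mean is 1 and the standard deviation is 1, so the upper threshold is 3.0 and the lower threshold is -1.0.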