ML FAQ

Frequently asked questions about the ML product


What data types does Arize support?

Arize natively supports tabular/structured data types (strings, floats, booleans, etc.), as well as embeddings for NLP, image, and other unstructured data types.

What model types does Arize support?

Arize natively supports binary classification, multi-class classification, regression, ranking, NLP, and CV model types. Your model type informs the data ingestion format and the performance metrics that can be utilized in the platform.

How can Arize surface outliers/anomalies?

Arize can surface outlier/anomalous data through:

Data Quality checks

  • Numeric Features: Arize will monitor numeric input features for outlier values that fall outside their expected range.

  • Categorical Features: Arize will monitor outlier categories and the overall cardinality of categorical features.

Drift checks

If there are feature slices that vary significantly from the set baseline distribution, Arize will alert you through drift detection monitors.

Feature Performance Heatmap

If there are outlier slices that are performing poorly, Arize’s feature performance heatmap will automatically surface the worst-performing segments. These slices can also be monitored explicitly to detect performance degradation proactively.

What performance metrics does Arize support?

Arize supports a comprehensive list of model performance metrics for both numeric and categorical model types. These metrics are available on dashboards as well as monitors.

In addition to the out-of-the-box metrics, Arize also supports model data metrics, custom evaluation metrics, and user-defined business impact metrics. See the documentation on statistical widgets and user-defined business impact formulas to learn more.

In addition to performance metrics, we also support data metrics that allow you to count, average, view percentiles, or calculate percent/count for all features, actuals, and/or predictions. All metrics can be calculated in aggregate, as well as on particular cohorts using applied filters.

How can I monitor the impact of a particular feature?

You can monitor the model's performance for a particular feature or feature-value combination, also known as a slice. The feature performance heatmap helps visualize the performance of each slice and indicates which slices degrade performance the most.
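To make the idea of a slice concrete, here is a minimal sketch that computes accuracy per feature value and sorts the slices from worst to best. It is not Arize's implementation: the function name `accuracy_by_slice`, the row layout, and the `prediction`/`actual` keys are all assumptions made for illustration.

```python
from collections import defaultdict

def accuracy_by_slice(rows, feature):
    """Return [(feature_value, accuracy)] sorted from worst to best slice."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        value = row[feature]
        totals[value] += 1
        hits[value] += int(row["prediction"] == row["actual"])
    scores = [(value, hits[value] / totals[value]) for value in totals]
    # Lowest accuracy first, so the most problematic slices lead the list.
    return sorted(scores, key=lambda item: item[1])

# Hypothetical records: one feature ("state") plus prediction and actual.
rows = [
    {"state": "CA", "prediction": 1, "actual": 1},
    {"state": "CA", "prediction": 0, "actual": 0},
    {"state": "TX", "prediction": 1, "actual": 0},
    {"state": "TX", "prediction": 1, "actual": 1},
    {"state": "NY", "prediction": 0, "actual": 1},
]
worst_first = accuracy_by_slice(rows, "state")
print(worst_first)
```

Any other performance metric could be substituted for accuracy; the grouping by feature value is the part that defines the slice.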

What happens if a new categorical feature is seen in production?

Arize drift detection can flag when a categorical feature sees a percentage of previously unseen categories. For example, if the baseline had 10 categories but the production/serving distribution has a significantly different number, Arize will trigger an alert. Additionally, Arize captures the percentage of values that fall into new categories not previously seen in the baseline distribution.
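As a rough illustration of the "percentage of values in unseen categories" idea, the sketch below compares production values against the set of categories present in a baseline. The function name `unseen_category_share` and the sample data are hypothetical, and the platform's actual computation may differ.

```python
from collections import Counter

def unseen_category_share(baseline, production):
    """Return (new categories, fraction of production values in those categories)."""
    known = set(baseline)
    counts = Counter(production)
    new_counts = {cat: n for cat, n in counts.items() if cat not in known}
    total = sum(counts.values())
    share = sum(new_counts.values()) / total if total else 0.0
    return sorted(new_counts), share

# Hypothetical payment-method feature.
baseline = ["visa", "mastercard", "amex"]
production = ["visa", "visa", "paypal", "amex", "crypto"]
new_cats, share = unseen_category_share(baseline, production)
print(new_cats, share)
```

Here two of the five production values belong to categories the baseline never saw, so the share is 0.4.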

What happens if a new numerical feature is seen in production?

Arize drift detection can show the % of values outside of the baseline range. Arize uses the quantiles of the data to calculate the bins of the distribution. If the baseline range is wider than the production/serving range, the user can see the % of baseline volume that falls outside the production/serving distribution. Similarly, if the production/serving distribution extends beyond the baseline range, Arize surfaces the % of volume for values outside the baseline range.
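The two out-of-range percentages described above can be sketched with a single helper applied in both directions. This is an illustration only: `share_outside_range` is a hypothetical name, and it uses the plain min/max of the reference data rather than whatever range definition the platform applies.

```python
def share_outside_range(reference, values):
    """Fraction of `values` below min(reference) or above max(reference)."""
    if not values:
        return 0.0
    lo, hi = min(reference), max(reference)
    outside = sum(1 for v in values if v < lo or v > hi)
    return outside / len(values)

# Hypothetical numeric feature values.
baseline = [10, 12, 15, 18, 20]
production = [9, 11, 14, 19, 25]

# Production volume that falls outside the baseline range (9 and 25).
prod_outside_baseline = share_outside_range(baseline, production)
# Baseline volume that falls outside the production range (none here).
baseline_outside_prod = share_outside_range(production, baseline)
print(prod_outside_baseline, baseline_outside_prod)
```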

How does Arize calculate drift?

Arize calculates drift metrics including Population Stability Index, KL Divergence, KS Statistic and JS Distance. Arize computes drift by measuring distribution changes between the model’s production values and a baseline (reference dataset). Users can configure a baseline to be any time window of a:

  1. Pre-production dataset (training, test, validation) or

  2. Fixed or moving time period from production (e.g. last 30 days, last 60 days).

Baselines are saved in Arize so that users can compare several versions and/or environments against each other across moving or fixed time windows. See the baselines documentation for more details.
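For intuition, here is a minimal sketch of one of the listed metrics, Population Stability Index, computed from binned counts of a baseline and a production window. It uses the standard textbook formula; the function name `psi` and the `eps` floor that guards against empty bins are assumptions, not a description of how Arize handles empty bins.

```python
import math

def psi(baseline_counts, production_counts, eps=1e-6):
    """Population Stability Index over two aligned lists of bin counts."""
    b_total = sum(baseline_counts)
    p_total = sum(production_counts)
    total = 0.0
    for b, p in zip(baseline_counts, production_counts):
        # Convert counts to proportions; eps (an assumption) avoids log(0).
        b_pct = max(b / b_total, eps)
        p_pct = max(p / p_total, eps)
        total += (p_pct - b_pct) * math.log(p_pct / b_pct)
    return total

same = psi([10, 20, 30], [10, 20, 30])   # identical distributions
shifted = psi([50, 50], [80, 20])        # mass moved between two bins
print(same, shifted)
```

Identical distributions give a PSI of 0, and the value grows as production moves away from the baseline. Both lists must use the same bins in the same order.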

What metrics can be applied to individual features?

Arize supports automated schema detection of models and immediately computes statistics for all features of the model, including:

How do you evaluate features?

Arize supports feature quality metrics including feature drift, data quality (ex: cardinality, percent empty, type mismatch, out of range, etc.) and feature importance metrics. Additionally, users can compute performance metrics for their model filtered by feature/value combinations (slices).
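Two of the data quality metrics named above, cardinality and percent empty, are simple enough to sketch directly. The function name `feature_quality` and the choice to treat `None` as "empty" are assumptions for illustration.

```python
def feature_quality(values):
    """Return cardinality and percent empty for one feature column."""
    total = len(values)
    non_empty = [v for v in values if v is not None]
    empty = total - len(non_empty)
    return {
        # Number of distinct non-empty values.
        "cardinality": len(set(non_empty)),
        # Share of rows with no value, as a percentage.
        "percent_empty": 100.0 * empty / total if total else 0.0,
    }

stats = feature_quality(["a", "b", None, "a"])
print(stats)
```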

How does Arize handle concept drift?

Concept drift is drift in the actuals or ground truth. To measure concept drift, Arize requires historical actuals which are utilized to set a baseline.

How does Arize calculate bins for numeric features?

Arize calculates the bins within the drift tab using quantiles and fixed bins from the baseline distribution.

The range between two quantile values in the baseline distribution is used to calculate a fixed bin width. That width is then used to build a finite set of bins (currently 8) extending from the median in both directions (4 in each direction). Lastly, Arize adds two "bookend" bins: one from the minimum value to the lowest bin's edge, and another from the highest bin's edge to the maximum value across both distributions.

This strategy optimizes for reasonably sized bins by calculating a fixed width based on quantile values.
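The binning strategy can be sketched as follows. The text does not say which two quantiles are used or how the width is derived from them, so this sketch assumes the 10th and 90th percentiles and divides their range by the number of bins. Both choices, and the function name `drift_bin_edges`, are assumptions.

```python
import statistics

def drift_bin_edges(baseline, production, n_bins=8):
    """Fixed-width bins around the baseline median, plus two bookend bins."""
    deciles = statistics.quantiles(baseline, n=10, method="inclusive")
    q_lo, q_hi = deciles[0], deciles[-1]   # assumed: 10th and 90th percentiles
    width = (q_hi - q_lo) / n_bins         # assumed: quantile range / bin count
    median = statistics.median(baseline)
    half = n_bins // 2
    # n_bins + 1 edges: 4 bins below the median and 4 above it.
    edges = [median + i * width for i in range(-half, half + 1)]
    # Bookends reach the min and max across both distributions.
    overall_min = min(min(baseline), min(production))
    overall_max = max(max(baseline), max(production))
    if overall_min < edges[0]:
        edges.insert(0, overall_min)
    if overall_max > edges[-1]:
        edges.append(overall_max)
    return edges

baseline = list(range(0, 101))   # hypothetical baseline values 0..100
production = [-50, 150]          # hypothetical production extremes
edges = drift_bin_edges(baseline, production)
print(edges)
```

With this data the eight central bins span 10 to 90 in steps of 10, and the bookends extend the edges to -50 and 150.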

Does Arize have any security certifications?

Arize is SOC 2 Type 2 certified under standards set by the American Institute of Certified Public Accountants (AICPA). Arize’s SOC 2 security certification validates that Arize has adequate processes and policies to securely handle both customer and organizational data.

Read more about the certification and what it means here. To request a copy of the report, please contact us here.

Arize AI has also received certifications from an independent auditor validating that the company’s health information security program is fairly represented and includes the essential elements of HIPAA’s Security Rule and the HITECH Act. Read more here.

How are automatic thresholds set?

Autothresholds are calculated based on a statistical analysis of data over 14 days. Each day, a data point is collected, and after 14 days, the average (mean) and standard deviation of these data points are computed. The threshold is then set by adding the standard deviation to, or subtracting it from, the average.
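The calculation described above amounts to a mean plus or minus one standard deviation over 14 daily points. A minimal sketch follows. The function name `auto_thresholds` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the sample or population form is used.

```python
import statistics

def auto_thresholds(daily_values):
    """Return (lower, upper) thresholds from 14 daily metric values."""
    if len(daily_values) != 14:
        raise ValueError("expected one data point per day for 14 days")
    mean = statistics.mean(daily_values)
    # Assumption: population standard deviation.
    std = statistics.pstdev(daily_values)
    return mean - std, mean + std

# Hypothetical metric that alternates between 10 and 14 each day.
daily = [10, 14] * 7
lower, upper = auto_thresholds(daily)
print(lower, upper)
```

Whether the lower or the upper bound applies depends on the metric: a drop matters for a metric like accuracy, while a rise matters for a drift or error metric.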

Can I change the Arize dashboard time scale so that it shows the average per hour?

Yes. Use the Arize date range selector to select a date range of less than 3 days; the platform will then switch to hourly granularity.

If I have an n-dimensional feature, is there a way to see which dimension is drifting or which dimension has a higher impact on prediction drift?

Our current implementation of vector drift looks at the vector as a whole. However, you could log the feature space that generated the n-dimensional feature to determine the prediction drift impact.


Copyright © 2023 Arize AI, Inc