Changelog

See the latest new features released in Arize

Column selection in prompt playground

May 5, 2025

You can now view all of your prompt variables and dataset values directly in playground!

Latency and token counts in prompt playground

May 2, 2025

We've added latency and token counts to prompt playground runs! Currently supported for OpenAI, with more providers to come!

Major design refresh in Arize AX

April 28, 2025

We've refreshed Arize AX with polished fonts, spacing, color, and iconography throughout the whole platform.

Custom code evaluators

April 26, 2025

You can now run your own custom Python code evaluators in Arize against your data in a secure environment. Use background tasks to run any custom code, such as URL validation or keyword matching.
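
As a rough illustration of the kind of evaluator this enables, here is a minimal keyword-and-URL check. The row field names and return shape are assumptions for the sketch, not the exact interface Arize passes to custom evaluators:

import re

URL_PATTERN = re.compile(r"https?://\S+")

def evaluate(row: dict) -> dict:
    """Hypothetical custom code evaluator: flag outputs that miss a
    required keyword or contain non-HTTPS links."""
    output = (row.get("output") or "").lower()  # assumed field name
    has_keyword = "refund" in output            # keyword match check
    urls_ok = all(u.startswith("https://") for u in URL_PATTERN.findall(output))
    passed = has_keyword and urls_ok
    return {"label": "pass" if passed else "fail", "score": float(passed)}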

Security audit logs for enterprise customers

April 25, 2025

Improve your compliance and policy adherence: you can now use audit logs to monitor data access in Arize. Note: this feature is completely opt-in, and tracking is not enabled unless a customer explicitly asks for it.

Larger dataset runs in prompt playground

April 24, 2025

We've increased the row limit for datasets in the playground, so you can run prompts in parallel on up to 100 examples.

Evaluations on experiments

April 24, 2025

You can now create and run evals on your experiments from the UI. Compare performance across different prompt templates, models, or configurations without code.

Cancel running background tasks

April 24, 2025

When running evaluations using background tasks, you can now cancel them mid-flight while observing task logs.

Improved UI for functions in prompt playground

April 21, 2025

We've made it easier to view, test, and validate your tool calls in prompt playground.

Compare prompts side by side

April 15, 2025

Compare the outputs of a new prompt and the original prompt side-by-side. Tweak model parameters and compare results across your datasets.

Image segmentation support for CV models

April 14, 2025

We now support logging image segmentation to Arize. Log your segmentation coordinates and compare your predictions vs. your actuals.

New time selector on your traces

April 11, 2025

We've made it way easier to drill into specific time ranges, with quick presets like "last 15 minutes" and custom shorthand for specific dates and times, such as 10d, 4/1 - 4/6, or 4/1 3:00am.

Prompt hub python SDK

April 7, 2025

Access and manage your prompts in code with support for OpenAI and VertexAI.

pip install "arize[PromptHub]"
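
As a sketch of what this enables in code (the client and method names below are assumptions about the SDK's shape, so consult the SDK reference for the exact interface):

from arize.experimental.prompt_hub import ArizePromptClient  # assumed import path

# Hypothetical flow: connect to the hub, pull a prompt by name, render it.
client = ArizePromptClient(
    space_id="YOUR_SPACE_ID",            # placeholder
    developer_key="YOUR_DEVELOPER_KEY",  # placeholder
)
prompt = client.pull_prompt("support-triage")  # assumed method name
messages = prompt.format(question="Where is my order?")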

View task run history and errors

April 4, 2025

Get full visibility into your evaluation task runs, including when they ran, what triggered them, and whether there were errors.

Run evals and tasks over a date range

April 2, 2025

Easily run your online evaluation tasks over historical data.

Test online evaluation tasks in playground

March 24, 2025

Quickly debug and refine the prompts used by your online evaluators by loading them prefilled into prompt playground.

Select metadata on the sessions page

March 1, 2025

Dynamically select the fields you want to see in your sessions view.

Labeling queues

February 27, 2025

Use Arize to annotate your data with 3rd parties.

Expand and collapse your traces

February 20, 2025

You can now collapse rows to see more data at a glance or expand them to view more text.

Schedule your monitors

February 14, 2025

Schedule monitors to run hourly, daily, weekly, or monthly.

Improved traces export

February 14, 2025

Specify which columns of data you'd like to export when exporting via the ArizeExportClient by passing columns.

from datetime import datetime

from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

client = ArizeExportClient()  # API key can also be passed explicitly

primary_df = client.export_model_to_df(
    columns=['context.span_id', 'attributes.llm.input'],  # <---- HERE
    space_id='',
    model_id='',
    environment=Environments.TRACING,
    start_time=datetime(2025, 3, 25),
    end_time=datetime(2025, 4, 25),
)

Create dataset from CSVs

February 14, 2025

You can now create datasets through many methods: from traces, from code, manually in the UI, or via CSV upload.

OTEL tracing via HTTP

February 14, 2025

Support for HTTP when sending traces to Arize! See GitHub for more info.

from arize.otel import register, Transport

tracer_provider = register(
    endpoint="https://otlp.arize.com/v1/traces",     # NEW
    transport=Transport.HTTP,                        # NEW
    space_id=SPACE_ID,
    api_key=API_KEY,
    project_name="test-project-http",
)

Voice application tracing and evaluation

January 21, 2025

Audio tracing: capture, process, and send audio data to Arize and observe your application behavior.

Evaluation: assess how well your models identify emotional tones like frustration, joy, or neutrality.

Dashboard colors

January 21, 2025

We’ve added new ways to plot your charts, with custom colors and better UX!

Prompt hub

December 19, 2024

Manage, iterate, and deploy your prompts in one place. Version control your templates and use them across playground, tasks, and APIs.

Managed code evaluators

December 19, 2024

Use our pre-built, off-the-shelf evaluators to evaluate spans without requiring requests to an LLM-as-a-Judge. These include Regex matching, JSON validation, Contains keyword, and more!

Create experiments from playground

December 19, 2024

Quickly experiment with your prompts across your datasets. All you have to do is click "Save as experiment".

Monitor alert status

December 19, 2024

See exactly how and when your monitors are triggered.

LangChain Instrumentation

December 19, 2024

Support for sessions via LangChain native thread tracking in TypeScript is now available. Easily track multi-turn conversations / threads using LangChain.js.

Analyze your spans with Copilot

December 05, 2024

Extract key insights quickly from your spans instead of trying to decipher meaning in hundreds of spans. Ask questions and run evals right in the trace view.

Generate dashboards with Copilot

December 05, 2024

Building dashboard plots just got way easier. Create time series plots and even translate code into ready-to-go visualizations.

The Custom Metric skill now supports a conversational flow, making it easier for users to iterate and refine metrics dynamically.

View your experiment traces

December 05, 2024

Experiment traces for a dataset are now consolidated and accessed under "Experiment Projects".

Multi-class calibration chart

December 05, 2024

For your multi-class ML models, you can now see how your model is calibrated in one visualization.

Log experiments in Python SDK

December 05, 2024

You can now log experiment data manually using a dataframe, instead of running an experiment. This is useful if you already have the data you need, and re-running the query would be expensive. See the SDK Reference.

arize_client.log_experiment(
    space_id=SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_columns,
    evaluator_columns={"correctness": evaluator_columns},
    dataset_name=dataset_name,
)

Create custom metrics with Copilot

November 07, 2024

Users can generate their desired metric by having Copilot translate natural language descriptions or existing code (e.g., SQL, Python) into AQL.

Summarize embeddings with Copilot

November 07, 2024

Copilot now works for embeddings! Users can select embedding data points, and Copilot will analyze them for patterns and insights.

Local explainability support for ML models

November 07, 2024

Local explainability is now live, providing both a table view and a waterfall-style plot for detailed, per-feature SHAP values on individual predictions.

See experiment results over time

November 07, 2024

Visualize specific evaluations over time in dashboards.

Function calling replay in prompt playground

November 07, 2024

Now users can follow the full function calling tutorial from OpenAI and iterate on different functions in different messages from within the Prompt Playground.

Vercel AI auto-instrumentation

November 07, 2024

Users can now ingest traces created by the Vercel AI SDK into Arize.

Track sessions and context attributes in instrumentation

November 07, 2024

You can add metadata and context that will be picked up by all of our auto-instrumentations and added to spans.
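
For example, with the OpenInference instrumentation helpers you can wrap instrumented calls in a context manager so every span created inside it carries the same session and metadata. A minimal sketch; the IDs, metadata values, and the run_my_llm_call helper are placeholders:

from openinference.instrumentation import using_attributes

# Spans emitted by auto-instrumented calls inside this block pick up
# the session ID, user ID, and metadata below.
with using_attributes(
    session_id="session-abc-123",                # placeholder
    user_id="user-456",                          # placeholder
    metadata={"experiment": "onboarding-flow"},  # placeholder
):
    run_my_llm_call()  # any instrumented call, e.g. an OpenAI request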

Easily test your online tasks and evals

October 24, 2024

Users now have the option to test a task, such as an online eval, by running it once on existing data or by applying evaluation labels to older traces.

Experiment filters

October 24, 2024

Users can now filter experiments based on dataset attributes or experiment results, making it easy to identify areas for improvement and track their experiment progress with more precision.

Embedding traces

October 03, 2024

With Embeddings Tracing, you can effortlessly select embedding spans and dive straight into the UMAP visualizer, simplifying troubleshooting for your genAI applications.

Experiments Details Visualization

October 03, 2024

Users can now view a detailed breakdown of labels for their experiments on the Experiments Details page.

Support for o1-mini and o1-preview in playground

October 03, 2024

We've added full support for all available OpenAI models in the playground, including o1-mini and o1-preview.

Improved auto-complete in playground

October 03, 2024

We've added better input variable behavior, autocompletion enhancements, support for mustache/f-string input variables, and more.

Filter history

October 03, 2024

We now store the last three filters used by a user! Users can easily access their filter history in the query filters dropdown, making it simpler to reuse filters for future queries.

Tracing quick filters

October 03, 2024

Apply filters directly from the table by hovering over the text to reveal the filter icon.

New arize-otel package

October 03, 2024

We made it way simpler to add automatic tracing to your applications! It's now just a few lines of code to use OpenTelemetry to trace your LLM application. Check out our new quickstart guide, which uses our arize-otel package.
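
A minimal sketch of that setup, mirroring the register call shown in the OTEL tracing entry above. The space ID and API key are placeholders, and the OpenAI instrumentor is one example of an auto-instrumentation you might attach:

from arize.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Create an OpenTelemetry tracer provider that exports spans to Arize.
tracer_provider = register(
    space_id="YOUR_SPACE_ID",     # placeholder
    api_key="YOUR_API_KEY",       # placeholder
    project_name="my-llm-app",
)

# Auto-instrument OpenAI calls so they appear as traces in Arize.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)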

Easily add spans to datasets

October 03, 2024

Easily add spans to a dataset from the Traces page using the "Add to Dataset" button.

See more

2024
2023
2022
2021