04.2025
Our April 2025 releases
We've increased the row limit for datasets in the playground, so you can run prompts in parallel on up to 100 examples.
Compare the outputs of a new prompt and the original prompt side-by-side. Tweak model parameters and compare results across your datasets.
We now support logging image segmentation to Arize. Log your segmentation coordinates and compare your predictions vs. your actuals.
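As a rough illustration of the data shape involved (the column names below are placeholders, not the Arize SDK's required schema), each row pairs a predicted segmentation polygon with its ground-truth counterpart before being logged through the Arize pandas client:

```python
# Illustrative only: column names are placeholders, not the SDK's required
# schema. They show the kind of data you'd assemble -- predicted vs. actual
# segmentation coordinates and labels -- before logging it to Arize.
import pandas as pd

df = pd.DataFrame(
    {
        "prediction_id": ["img_001", "img_002"],
        # Flattened polygon coordinates [x1, y1, x2, y2, ...] for the predicted mask
        "predicted_polygon": [
            [10.0, 10.0, 120.0, 10.0, 120.0, 90.0, 10.0, 90.0],
            [40.0, 30.0, 200.0, 30.0, 200.0, 160.0, 40.0, 160.0],
        ],
        "predicted_label": ["car", "pedestrian"],
        # Ground-truth (actual) mask coordinates and labels for comparison
        "actual_polygon": [
            [12.0, 11.0, 118.0, 11.0, 118.0, 92.0, 12.0, 92.0],
            [38.0, 28.0, 205.0, 28.0, 205.0, 158.0, 38.0, 158.0],
        ],
        "actual_label": ["car", "pedestrian"],
    }
)

print(df.head())
```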
Easily run your online evaluation tasks over historical data.
You can now create and run evals on your experiments from the UI. Compare performance across different prompt templates, models, or configurations without code.
When running evaluations using background tasks, you can now cancel them mid-flight while observing task logs.
We've made it easier to view, test, and validate your tool calls in prompt playground.
We've made it way easier to drill into specific time ranges, with quick presets like "last 15 minutes" and custom shorthand for specific dates and times, such as 10d, 4/1 - 4/6, or 4/1 3:00am.
Access and manage your prompts in code with support for OpenAI and VertexAI.
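As a minimal sketch of the pattern (the pull_prompt helper below is a stand-in for the SDK's prompt-retrieval call; check the docs for the exact client import and method names), a pulled template can be formatted and sent straight to OpenAI:

```python
# Sketch: pull a prompt template, fill in its variables, and run it with OpenAI.
# `pull_prompt` is a placeholder for the Arize SDK's prompt-retrieval call;
# the real import and method names may differ -- see the SDK docs.
from openai import OpenAI


def pull_prompt(name: str) -> str:
    # Placeholder for fetching the latest saved version of a prompt from Arize;
    # returns a template string with {placeholders} to fill in.
    return "Summarize the following support ticket in one sentence:\n\n{ticket}"


openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

template = pull_prompt("ticket-summarizer")
message = template.format(ticket="My export to CSV has been stuck at 99% for an hour.")

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": message}],
)
print(response.choices[0].message.content)
```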
Get full visibility into your evaluation task runs, including when each one ran, what triggered it, and whether it hit any errors.