AI Research

Open Source Research from the Arize AI Research Team

We benchmarked o1-preview on our hardest eval task: time series trend evaluations. This post compares its performance against GPT-4o-mini, Claude 3.5 Sonnet, and GPT-4o.

We compare the performance and cost savings of prompt caching on Anthropic vs OpenAI.

We compare OpenAI's experimental Swarm repo against two other popular multi-agent frameworks: AutoGen and CrewAI.

Testing the generation stage of RAG across GPT-4 and Claude 2.1.

