AI Research
We benchmarked o1-preview on our hardest eval task: time series trend evaluations. This post compares its performance against GPT-4o-mini, Claude 3.5 Sonnet, and GPT-4o.
We compare the performance and cost savings of prompt caching on Anthropic vs OpenAI.
We compare OpenAI's experimental Swarm repo against two other popular multi-agent frameworks: Autogen and CrewAI.
Testing the generation stage of RAG across GPT-4 and Claude 2.1.
Lessons learned from our journey to one million downloads of our OpenTelemetry wrapper.
We built the in LangGraph, LlamaIndex Workflows, CrewAI, Autogen, and pure code. See how each framework compares.