OpenAI releases o3 and o4-mini

OpenAI has released its o3 and o4-mini reasoning models, positioning them as its latest and smartest.

o3 is the new frontier model, while o4-mini is a smaller model optimized for speed and cost-efficiency.

One thing everyone noticed is that OpenAI's charts compare the new models only to its own older ones. That's a pity, because what we'd most like to see is how they compare against Gemini 2.5 Pro and Claude 3.7 Sonnet.

On Aider's leaderboard, o3 takes first place with 79.6% correctness, ahead of Gemini 2.5 Pro in second place at 72.9%. But running the benchmark with o3 also costs a whopping $111 versus $6 for Gemini, making it completely impractical. o4-mini takes third place (72% / $20).
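To put those numbers side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the scores and costs quoted above; the "dollars per percentage point" metric is my own rough yardstick, not something the leaderboard reports.

```python
# Scores (%) and costs (USD) as quoted from the Aider leaderboard above.
results = [
    ("o3", 79.6, 111.0),
    ("Gemini 2.5 Pro", 72.9, 6.0),
    ("o4-mini", 72.0, 20.0),
]


def cost_per_point(score: float, cost: float) -> float:
    """Dollars spent per percentage point of correctness (an ad hoc metric)."""
    return cost / score


for name, score, cost in results:
    print(f"{name:15s} {score:5.1f}%  ${cost:6.2f}  "
          f"${cost_per_point(score, cost):.2f} per point")

# How much more does o3 cost than Gemini 2.5 Pro, and for how much gain?
_, o3_score, o3_cost = results[0]
_, gemini_score, gemini_cost = results[1]
cost_ratio = o3_cost / gemini_cost
score_gap = o3_score - gemini_score
print(f"o3 costs {cost_ratio:.1f}x as much for {score_gap:.1f} extra points")
```

In other words, o3 buys roughly 7 extra percentage points for about 18 times the price.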

And Codex...

OpenAI also released Codex, a tool for AI coding in the terminal.

It's a direct answer to Claude Code and the aforementioned Aider.

After reading through some comments on Hacker News, it seems that early experiences with it are not great. On the other hand, people seem to really like Claude Code, even though it's closed-source and turns out to be quite expensive. Well, I need to try it.

Update 18.04.2025: I just noticed that Aider has updated its leaderboard, placing the o3 + GPT-4.1 combination in first place on performance. It's also a bit cheaper than o3 alone, but still pretty expensive.

The latest top 4 places of the Aider LLM Leaderboards

#Public #Article #OpenAI