OpenAI's o3 Is Here: A Reasoning Model That Thinks Before It Speaks
OpenAI's o3 model introduces variable compute at inference time — letting it 'think longer' on hard problems. Early results suggest a step-change in complex reasoning.

OpenAI has released o3, the latest in its reasoning-first model family, and it represents a fundamentally different approach to AI problem-solving. Unlike traditional LLMs that generate tokens left-to-right in a single pass, o3 can allocate additional compute at inference time to work through problems step-by-step before producing a final answer.
The Core Idea: Compute-Scaled Reasoning
OpenAI describes o3 as a model that can "think at variable depth." For simple questions, it responds instantly. For problems requiring multi-step deduction — a math proof, a debugging session, a legal analysis — it can spend seconds or minutes reasoning internally before responding.
This is achieved through what the company calls Extended Chain-of-Thought training, where the model was trained on verified reasoning traces rather than just final answers.
What Makes o3 Different From o1
- Longer internal reasoning chains — o3 can sustain reasoning across 10,000+ tokens internally
- Self-correction — the model backtracks when it detects contradictions in its own chain
- Tool-augmented reasoning — o3 can call code execution, search, and calculators mid-thought
- Configurable compute budget — the API exposes a
reasoning_effortparameter:low,medium,high
Key Benchmark Scores
| Task | o3 | o1 | GPT-4o | |---|---|---|---| | AIME 2025 | 96.7% | 83.3% | 9.3% | | SWE-bench Verified | 71.7% | 48.9% | 38.8% | | GPQA Diamond | 87.7% | 78.3% | 53.6% | | ARC-AGI | 87.5% | 32.0% | 5.0% |
Pricing and Access
o3 is available via the OpenAI API today. Pricing is $15 per million input tokens and $60 per million output tokens at high reasoning effort — reflecting the additional compute. A lighter o3-mini variant targeting coding and math is expected later this quarter at significantly lower cost.
The ARC-AGI result — a benchmark specifically designed to resist memorization — is the number that has the research community talking most. Whether o3 represents a genuine step toward general reasoning or an impressive but narrow capability remains the central debate.