Gemini Ultra 2 Tops Every Major Benchmark — What It Means
Google's latest flagship model outperforms competitors on MMLU, HumanEval, and a new multimodal reasoning suite. Here's the full breakdown.

Google DeepMind has released Gemini Ultra 2, and the benchmark results are hard to argue with. The model achieves state-of-the-art performance across nearly every major evaluation suite, posting a 92.4% score on MMLU and a remarkable 87.1% on the new MMMU-Pro multimodal reasoning benchmark.
What Changed From Gemini Ultra 1
The most significant architectural change is a redesigned attention mechanism that Google calls Sparse Contextual Routing (SCR). Rather than attending uniformly across a 1M-token context window, SCR dynamically routes attention to the most semantically relevant segments — reducing compute by ~40% while improving coherence on long-document tasks.
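Google has not published SCR's internals, so the details remain unknown. As a purely illustrative sketch of the general idea — scoring coarse segments of the context and attending densely only within the highest-scoring ones — consider this toy single-query version (all names, the segment-mean routing score, and the top-k selection are assumptions, not the actual mechanism):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def routed_attention(query, keys, values, segment_size=4, top_k=2):
    """Toy segment-routed attention: pick the top_k segments whose mean
    key best matches the query, then run dense attention only there."""
    d = keys.shape[-1]
    n_seg = keys.shape[0] // segment_size
    seg_keys = keys[:n_seg * segment_size].reshape(n_seg, segment_size, d)
    seg_vals = values[:n_seg * segment_size].reshape(n_seg, segment_size, d)
    # Routing step: one cheap score per segment instead of per token.
    seg_scores = seg_keys.mean(axis=1) @ query
    chosen = np.argsort(seg_scores)[-top_k:]
    # Dense attention restricted to the chosen segments.
    k = seg_keys[chosen].reshape(-1, d)
    v = seg_vals[chosen].reshape(-1, d)
    weights = softmax(k @ query / np.sqrt(d))
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
out = routed_attention(q, K, V)  # attends to 8 of 16 tokens
```

With 16 tokens, a segment size of 4, and top-2 routing, the attention matmul touches 8 tokens rather than 16 — a 50% compute cut in this toy setting, loosely analogous to the ~40% figure Google cites at 1M-token scale.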
"We're not just scaling parameters anymore. We're scaling understanding." — Demis Hassabis, Google DeepMind CEO
Benchmark Results at a Glance
- MMLU: 92.4% (vs GPT-4o: 88.7%)
- HumanEval: 90.1% (code generation)
- MMMU-Pro: 87.1% (multimodal reasoning)
- MATH: 94.6% (competition math)
- BIG-Bench Hard: 89.3%
Multimodal Capabilities
Gemini Ultra 2 processes images, audio, video frames, and text natively in a single forward pass. In internal testing, the model analyzed a 90-minute lecture video and produced structured, timestamped notes in under 30 seconds.
Availability
Gemini Ultra 2 rolls out to Gemini Advanced subscribers on March 1, 2026, with API access through Google AI Studio and Vertex AI arriving the same day.
The release marks Google's most aggressive push yet to reclaim the frontier model lead it briefly held with the original Gemini Ultra in late 2023. Whether these benchmark gains translate to real-world product advantages will become clear in the weeks ahead.