Gemini Ultra 2 Tops Every Major Benchmark — What It Means

Google DeepMind has released Gemini Ultra 2, and the reported benchmark results are strong. The model posts state-of-the-art scores on nearly every major evaluation suite, including 92.4% on MMLU and 87.1% on the new MMMU-Pro multimodal reasoning benchmark.

What Changed From Gemini Ultra 1

The most significant architectural change is a redesigned attention mechanism that Google calls Sparse Contextual Routing (SCR). Rather than attending uniformly across a 1M-token context window, SCR dynamically routes attention to the most semantically relevant segments, reducing compute by roughly 40% while improving coherence on long-document tasks.
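The article gives no implementation details for SCR, so the following is only an illustrative sketch of the general idea of routing attention to a few relevant segments, not Google's method. It assumes a simple rule (score each fixed-size segment by the query's similarity to the segment's mean key, then keep the top-scoring segments), and every name in it (routed_attention, segment_size, top_k) is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routed_attention(query, keys, values, segment_size, top_k):
    """Attend only to the top_k segments whose mean key best matches the query.

    Hypothetical routing rule for illustration; returns
    (output_vector, selected_segment_indices).
    """
    # Split token positions into contiguous fixed-size segments.
    starts = range(0, len(keys), segment_size)
    segments = [list(range(s, min(s + segment_size, len(keys)))) for s in starts]

    # Router: score each segment by the query's similarity to its mean key.
    seg_scores = [dot(query, mean_vector([keys[i] for i in seg])) for seg in segments]

    # Keep the top_k highest-scoring segments and skip the rest entirely.
    ranked = sorted(range(len(segments)), key=lambda j: seg_scores[j], reverse=True)
    selected = sorted(ranked[:top_k])

    # Scaled dot-product attention restricted to tokens in the selected segments.
    token_ids = [i for j in selected for i in segments[j]]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in token_ids])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, token_ids)) for d in range(dim)]
    return output, selected


# Toy example: 8 tokens in 4 segments of 2; only segment 1 aligns with the query.
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
values = [[float(i)] for i in range(8)]
output, selected = routed_attention([1.0, 0.0], keys, values, segment_size=2, top_k=1)
print(selected, output)
```

In this toy setup the attention step touches 2 of the 8 tokens. Any compute saving in a scheme like this comes from skipping the unselected segments, at the cost of an extra routing pass over per-segment summaries.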

"We're not just scaling parameters anymore. We're scaling understanding." — Demis Hassabis, Google DeepMind CEO

Benchmark Results at a Glance

MMLU: 92.4% (vs GPT-4o: 88.7%)
HumanEval: 90.1% (code generation)
MMMU-Pro: 87.1% (multimodal reasoning)
MATH: 94.6% (competition math)
BIG-Bench Hard: 89.3%

Multimodal Capabilities

Gemini Ultra 2 processes images, audio, video frames, and text natively in a single forward pass. In internal testing, the model analyzed a 90-minute lecture video and produced structured, timestamped notes in under 30 seconds.

Availability

Gemini Ultra 2 rolls out to Gemini Advanced subscribers starting March 1, 2026. API access through Google AI Studio and Vertex AI opens on the same date.

The release marks Google's most aggressive push yet to reclaim the frontier model lead it briefly held with the original Gemini Ultra in late 2023. Whether these benchmark gains translate to real-world product advantages will become clear in the weeks ahead.
