Gemini Ultra 2 Tops Every Major Benchmark — What It Means

Google's latest flagship model outperforms competitors on MMLU, HumanEval, and a new multimodal reasoning suite. Here's the full breakdown.

AI Newspaper Today · 2 min read

Google DeepMind has released Gemini Ultra 2, and the benchmark results are hard to argue with. The model achieves state-of-the-art performance across nearly every major evaluation suite, posting a 92.4% score on MMLU and a remarkable 87.1% on the new MMMU-Pro multimodal reasoning benchmark.

What Changed From Gemini Ultra 1

The most significant architectural change is a redesigned attention mechanism that Google calls Sparse Contextual Routing (SCR). Rather than computing attention densely over every position in the 1M-token context window, SCR dynamically routes attention to the most semantically relevant segments. Google says this cuts compute by roughly 40% while improving coherence on long-document tasks.
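Google has not published how SCR works internally. To illustrate the general idea of segment-level routing, here is a generic sketch: split the context into fixed-size segments, score each segment against the query, keep only the top few, and run ordinary scaled dot-product attention over the positions that survive. The segment size, the mean-key scoring rule, and the names `routed_attention`, `segment_size` and `top_k` are illustrative assumptions, not Google's design.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routed_attention(query, keys, values, segment_size, top_k):
    """Hypothetical segment-routed attention for a single query.

    Returns (output_vector, attended_positions).
    """
    # 1. Split the context into contiguous, fixed-size segments.
    segments = [range(i, min(i + segment_size, len(keys)))
                for i in range(0, len(keys), segment_size)]

    # 2. Cheap routing step: score each segment by query . mean(segment keys).
    seg_scores = [dot(query, mean_vector([keys[j] for j in seg]))
                  for seg in segments]

    # 3. Keep only the top_k highest-scoring segments.
    chosen = sorted(range(len(segments)),
                    key=lambda s: seg_scores[s], reverse=True)[:top_k]
    positions = sorted(j for s in chosen for j in segments[s])

    # 4. Standard scaled dot-product attention, restricted to chosen positions.
    d = len(query)
    weights = softmax([dot(query, keys[j]) / math.sqrt(d) for j in positions])
    output = [sum(w * values[j][c] for w, j in zip(weights, positions))
              for c in range(len(values[0]))]
    return output, positions


if __name__ == "__main__":
    # Eight toy 2-d keys forming four segments of two positions each.
    keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
            [-1.0, 0.0], [-1.0, 0.0], [0.9, 0.1], [0.9, 0.1]]
    values = [[float(i)] for i in range(8)]
    out, pos = routed_attention([1.0, 0.0], keys, values,
                                segment_size=2, top_k=2)
    print(pos)  # → [0, 1, 6, 7]
```

In this toy run, only 4 of the 8 positions receive full attention scores, plus one routing score per segment. Per-query attention cost therefore shrinks roughly in proportion to `top_k * segment_size / n`. Whether a real system lands near the ~40% figure depends on parameters Google has not disclosed.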

"We're not just scaling parameters anymore. We're scaling understanding." — Demis Hassabis, Google DeepMind CEO

Benchmark Results at a Glance

  • MMLU: 92.4% (broad knowledge across 57 subjects; vs GPT-4o: 88.7%)
  • HumanEval: 90.1% (code generation)
  • MMMU-Pro: 87.1% (multimodal reasoning)
  • MATH: 94.6% (competition math)
  • BIG-Bench Hard: 89.3% (challenging multi-step reasoning)

Multimodal Capabilities

Gemini Ultra 2 processes images, audio, video frames, and text natively in a single forward pass. In Google's internal testing, the model analyzed a 90-minute lecture video and produced structured, timestamped notes in under 30 seconds.

Availability

Gemini Ultra 2 rolls out to Gemini Advanced subscribers starting March 1, 2026. API access through Google AI Studio and Vertex AI opens the same day.


The release marks Google's most aggressive push yet to reclaim the frontier model lead it briefly held with the original Gemini Ultra in late 2023. Whether these benchmark gains translate to real-world product advantages will become clear in the weeks ahead.
