Topic: #ai benchmarks

3 articles

Research & Science · Apr 6, 2026

DeepSeek V4 Open Weights Drop as Independent Benchmark Verification Begins

DeepSeek's full V4 open weights are now available under the Apache 2.0 license. Early third-party testing confirms that the trillion-parameter mixture-of-experts (MoE) model runs on dual RTX 4090s in INT8. Independent benchmark results, however, are still trickling in, and not all of DeepSeek's claims are holding up cleanly.

5 min read

OpenAI · Apr 2, 2026

OpenAI's GPT-5.4 Scores 83% on Economic Value Test and Ships Native Computer-Use Capabilities

OpenAI's GPT-5.4 is the first general-purpose model to reach 83% on GDPVal, a benchmark that measures how well AI performs real, economically valuable work. The release also introduces native computer-use capabilities for autonomous agent workflows.

2 min read

Anthropic · Mar 30, 2026

Claude Opus 4.6 Goes Exponential on METR Benchmark, Completing 14-Hour Human Tasks

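Anthropic's latest model reaches a 50% time horizon of 14.5 hours on METR's task-completion benchmark, meaning it succeeds half the time on tasks that take human experts about that long. The result continues an exponential trend in which this time horizon has been doubling roughly every four months.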

4 min read