Topic: #ai benchmarks
3 articles

Research & Science · Apr 6, 2026
DeepSeek V4 Open Weights Drop as Independent Benchmark Verification Begins
DeepSeek's full V4 open weights are now available under Apache 2.0. Early third-party testing confirms the trillion-parameter MoE model runs on dual RTX 4090s in INT8 — but independent benchmark results are still trickling in, and not all of DeepSeek's claims are holding up cleanly.
5 min read

OpenAI · Apr 2, 2026
OpenAI's GPT-5.4 Scores 83% on Economic Value Test and Ships Native Computer-Use Capabilities
OpenAI's GPT-5.4 is the first general-purpose model to score 83% on GDPval, a benchmark measuring AI's ability to perform real economic work. The release also introduces native computer-use capabilities for autonomous agent workflows.
2 min read

Anthropic · Mar 30, 2026
Claude Opus 4.6 Goes Exponential on METR Benchmark, Completing 14-Hour Human Tasks
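Anthropic's latest model achieves a 50% time horizon of 14.5 hours on METR's task-completion benchmark, meaning it succeeds half the time on tasks that take human experts about 14.5 hours. The result continues an exponential trend in which the length of tasks AI models can complete doubles every four months.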
4 min read