OpenAI's GPT-5.4 Scores 83% on Economic Value Test and Ships Native Computer-Use Capabilities
OpenAI's GPT-5.4 becomes the first general-purpose model to reach 83% on GDPVal, a benchmark that measures AI's ability to perform real economic work. It also introduces native computer-use capabilities for autonomous agent workflows.

AI That Can Do Real Jobs
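OpenAI's latest model, GPT-5.4, has achieved an 83% score on GDPVal, a benchmark designed to measure how well AI performs tasks with genuine economic value. GDPVal reports a win-or-tie rate: expert graders compare the model's deliverables against work produced by experienced human professionals. An 83% score therefore means the model's output was judged as good as or better than the expert's in 83% of those comparisons, across a wide range of professional tasks.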
This is not another leaderboard game. GDPVal measures practical capability: whether an AI can actually do work that someone would pay a human to do. An 83% score signals that, in many white-collar domains, the gap between AI assistance and AI replacement is narrowing.
Native Computer Use Changes the Game
What makes GPT-5.4 particularly significant is that it ships with native, state-of-the-art computer-use capabilities. This is the first general-purpose model from OpenAI built from the ground up to operate computers — navigating applications, clicking buttons, filling out forms, and executing multi-step workflows across different software.
Previous computer-use approaches relied on bolted-on tool use or screenshot-based reasoning. GPT-5.4 integrates this capability at the model level, making agent-driven workflows substantially more reliable and faster.
What This Enables
The combination of high economic-value task performance and native computer use opens up agent deployments that were previously impractical:
- Enterprise automation: complex multi-application workflows that required human operators
- Software testing: autonomous QA agents that can navigate real interfaces
- Research workflows: agents that can gather data across multiple tools and synthesize findings
- Customer operations: end-to-end handling of support, billing, and account management tasks
The Competitive Pressure
Anthropic's Claude and Google's Gemini both offer computer-use capabilities, but GPT-5.4's native integration and GDPVal score put competitive pressure on the entire field. The question is no longer whether AI agents can use computers, but how quickly enterprises will trust them to do so autonomously.
The economic implications are significant. If AI output can reliably match or beat expert work in more than four out of five professional tasks, the business case for rapid adoption shifts from "nice to have" to "competitive necessity."


