OpenAI launched GPT-5.4 amid the growing QuitGPT backlash over its Pentagon AI contract.
GPT-5.4 adds a 1-million-token context window, stronger reasoning, and agentic capabilities.
Enterprise users benefit most as GPT-5.4 delivers faster AI agents with fewer tokens.
OpenAI began rolling out GPT-5.4, its most capable model to date, on Thursday as the company scrambles to contain a PR crisis that has seen an estimated 2.5 million users take action against the company, either by canceling their subscriptions or by sharing the boycott on social media.
The so-called QuitGPT movement exploded after OpenAI revealed a deal with the U.S. Department of Defense hours after Anthropic publicly walked away from the same contract—earning the Claude maker the public scorn of President Trump and other government officials.
Anthropic’s sticking point: The DoD refused to include language explicitly prohibiting the deployment of autonomous weapons and mass surveillance of U.S. citizens.
OpenAI took the deal anyway. CEO Sam Altman, who has been fielding questions about the apparent gap between his company’s stated safety red lines and the contract’s actual language, needs those users back.
Enter GPT-5.4… just two days after GPT-5.3 was introduced.
The new model consolidates reasoning, coding, and agentic capabilities into a single release. It also has a 1-million-token context window, which gives users more freedom to handle large amounts of information in a single session.
On paper, the numbers look promising. On GDPval—a benchmark testing knowledge work across 44 occupations—GPT-5.4 matches or beats industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2. Computer use is the biggest leap: On OSWorld-Verified, which measures a model’s ability to operate a desktop through screenshots and keyboard/mouse actions, GPT-5.4 hits a 75.0% success rate versus GPT-5.2’s 47.3%—and clears the human baseline of 72.4%.
On BrowseComp, a test of deep web research, it jumps 17 percentage points over GPT-5.2. The 1 million token context window and a mid-response steering feature—letting users redirect the model while it’s still thinking—round out the headline features.
The steering feature saves time and computation because the model does not have to discard all previously generated tokens when an error is detected.
Who will benefit from GPT-5.4?
It’s important to note that most of the benchmarks compare GPT-5.4 to GPT-5.2, skipping over GPT-5.3 entirely. In most cases, reasoning was also set to extra high effort, a setting that free and Plus users don’t get to enjoy.
For users already on GPT-5.3, several gains may feel more incremental than the charts suggest.
Coders have the most reason to temper expectations: On SWE-Bench Pro, the improvement from GPT-5.3-Codex (56.8%) to GPT-5.4 (57.7%) is barely a rounding error. OpenAI also claims the model needs significantly fewer tokens to complete tasks than GPT-5.2.
“GPT‑5.4 is our most token-efficient reasoning model yet, using significantly fewer tokens to solve problems when compared to GPT‑5.2,” OpenAI said.
That said, any improvement in this field is a positive for developers who use OpenAI models via API and get charged per token used. A model with an efficient chain of thought may provide the same results at a fraction of the cost, versus a model that tends to overthink things to ensure it reaches the proper conclusion.
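To make the per-token billing point concrete, here is a minimal sketch of the arithmetic. The price and both token counts are hypothetical values chosen only for illustration; they are not OpenAI's pricing or measured figures, and the helper `api_cost` is an invented name.

```python
def api_cost(tokens: int, price_per_million: float) -> float:
    """Return the cost in dollars for a given number of billed tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical values for illustration only (not actual OpenAI pricing).
PRICE_PER_MILLION = 10.00   # assumed dollars per 1M billed tokens
verbose_tokens = 40_000     # assumed usage of a model that overthinks
efficient_tokens = 12_000   # assumed usage of a leaner chain of thought

verbose_cost = api_cost(verbose_tokens, PRICE_PER_MILLION)
efficient_cost = api_cost(efficient_tokens, PRICE_PER_MILLION)

# Fraction of the bill saved by the more token-efficient model.
savings = 1 - efficient_cost / verbose_cost

print(f"verbose: ${verbose_cost:.2f}  efficient: ${efficient_cost:.2f}  saved: {savings:.0%}")
```

Under these assumed numbers, the saving tracks the token reduction directly because the price per token is the same in both cases; real bills also depend on how input and output tokens are priced.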
There’s another wrinkle for anyone hoping to use the new model right now: OpenAI says GPT-5.4 will be released today, but it wasn’t yet available as of this writing, so it is likely being rolled out gradually. For most users, the best available model is GPT-5.3, and it can only be used for instant replies, meaning answers that don’t require much reasoning effort.
Users who rely on thinking—OpenAI’s terminology for extended chain-of-thought reasoning on complex tasks—are still on GPT-5.2. In other words, the users most likely to push the model’s limits are the last ones to get it.
The clearest beneficiaries are enterprise users doing document-heavy work. On an internal spreadsheet modeling benchmark, GPT-5.4 scored 87.3% against GPT-5.2’s 68.4%. Legal research firm Harvey said it scored 91% on its BigLaw Bench eval. Mainstay, which runs agents across 30,000 property tax portals, reported a 95% first-attempt success rate and sessions running “~3x faster while using ~70% fewer tokens.”
That’s the kind of efficiency argument that might matter to enterprise procurement teams—but it’s a harder sell to the individual user reconsidering whether to delete their account.
The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.