Anthropic released Claude Sonnet 4.5, calling it the best coding model yet.
The model scored 77.2% on SWE-bench Verified, rising to 82% with parallel compute.
Anthropic claimed improvements on alignment and safety, but jailbreakers cracked it within minutes.
Anthropic released Claude Sonnet 4.5 on Monday, calling it “the best coding model in the world” and releasing a suite of new developer tools alongside the model. The company said the model can focus for more than 30 hours on complex, multi-step coding tasks and shows gains in reasoning and mathematical capabilities.
Introducing Claude Sonnet 4.5—the best coding model in the world.
It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains on tests of reasoning and math. pic.twitter.com/7LwV9WPNAv
The model scored 77.2% on SWE-bench Verified, a benchmark that measures real-world software coding abilities, according to Anthropic’s announcement. That score rises to 82% when using parallel test-time compute. This puts the new model ahead of the best offerings from OpenAI and Google, and even Anthropic’s Claude 4.1 Opus (per the company’s naming scheme, Haiku is a small model, Sonnet is a medium size, and Opus is the heaviest and most powerful model in the family).
Image: Anthropic
Claude Sonnet 4.5 also leads on OSWorld, a benchmark testing AI models on real-world computer tasks, scoring 61.4%. Four months ago, Claude Sonnet 4 held the lead at 42.2%. The model shows improved capabilities across reasoning and math benchmarks, and experts in specific business fields like finance, law and medicine.
We tried the model, and our first quick test found it capable of generating our usual “AI vs Journalists” game using zero-shot prompting without iterations, tweaks, or retries. The model produced functional code faster than Claude 4.1 Opus while maintaining top quality output. The application it created showed visual polish comparable to OpenAI’s outputs, a change from earlier Claude versions that typically produced less refined interfaces.
Anthropic released several new features with the model. Claude Code now includes checkpoints, which save progress and allow users to roll back to previous states. The company refreshed the terminal interface and shipped a native VS Code extension. The Claude API gained a context editing feature and a memory tool that lets agents run longer and handle greater complexity. Claude apps now include code execution and file creation for spreadsheets, slides, and documents directly in conversations.
Pricing remains unchanged from Claude Sonnet 4 at $3 per million input tokens and $15 per million output tokens. All Claude Code updates are available to all users, while Claude Developer Platform updates, including the Agent SDK, are available to all developers.
Anthropic also called Claude Sonnet 4.5 “our most aligned frontier model yet,” saying it made substantial improvements in reducing concerning behaviors like sycophancy, deception, power-seeking, and encouraging delusional thinking. The company also said it made progress on defending against prompt injection attacks, which it identified as one of the most serious risks for users of agentic and computer use capabilities.
Of course, it took Pliny—the world’s most famous AI prompt engineer—a few minutes to jailbreak it and generate drug recipes like it was the most normal thing in the world.
The release comes as competition intensifies among AI companies for coding capabilities. OpenAI released GPT-5 last month, while Google’s models compete on various benchmarks. This can be a shocker for some prediction markets, which up until a few hours ago were almost completely certain that Gemini was going to be the best model of the month.
It may be a race against time. Right now, the model does not appear on the rankings, but LM Arena announced it was already available for ranking. Depending on the number of interactions, the outcome tomorrow could be pretty surprising, considering Claude 4.1 Opus in in second place and Claude 4.5 Sonnet is much better.
Anthropic is also releasing a temporary research preview called “Imagine with Claude,” available to Max subscribers for five days. In the experiment, Claude generates software on the fly with no predetermined functionality or prewritten code, responding and adapting to requests as users interact.
“What you see is Claude creating in real time,” the company said. Anthropic described it as a demonstration of what’s possible when combining the model with appropriate infrastructure.
Generally Intelligent Newsletter
A weekly AI journey narrated by Gen, a generative AI model.
The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.
We and our selected partners wish to use cookies to collect information about you for functional purposes and statistical marketing. You may not give us your consent for certain purposes by selecting an option and you can withdraw your consent at any time via the cookie icon.
Cookies are small text that can be used by websites to make the user experience more efficient. The law states that we may store cookies on your device if they are strictly necessary for the operation of this site. For all other types of cookies, we need your permission. This site uses various types of cookies. Some cookies are placed by third party services that appear on our pages.
Your permission applies to the following domains:
https://fsnn.net
Necessary
Necessary cookies help make a website usable by enabling basic functions like page navigation and access to secure areas of the website. The website cannot function properly without these cookies.
Statistic
Statistic cookies help website owners to understand how visitors interact with websites by collecting and reporting information anonymously.
Preferences
Preference cookies enable a website to remember information that changes the way the website behaves or looks, like your preferred language or the region that you are in.
Marketing
Marketing cookies are used to track visitors across websites. The intention is to display ads that are relevant and engaging for the individual user and thereby more valuable for publishers and third party advertisers.