FSNN | Free Speech News Network
Sunday, June 7
Cryptocurrency & Free Speech Finance

Claude Opus 4.8 Review: Better At What It’s Good At, Worse At What It’s Not

By News Room | 6 hours ago | 8 Mins Read

In brief

  • Opus 4.8 posted a clear win in math and produced the cleanest one-prompt game we’ve ever tested.
  • A single coding prompt drained our entire Pro token quota, making the model impractical for large projects without a Max plan or heavy API spend.
  • Creative writing barely moved versus 4.7.

Six weeks after Opus 4.7, Anthropic shipped Claude Opus 4.8. The benchmarks are up, the safety scores are up, and the price hasn’t budged from $5 per million input tokens and $25 per million output.

So we ran it through the same battery of tests we throw at every frontier model—creative writing, coding, math, logic, narrative reasoning, and long-context recall—and compared it head-to-head with its own predecessor and the Chinese models that keep undercutting it.

The short version: 4.8 is better at the things Claude was already good at (math, coding, mechanical tasks) and slightly worse at the things it was already bad at (imagination, creative writing). It also has a token appetite that borders on self-sabotage.

Here’s the breakdown.

Creative Writing

The prompt is the same one we used on MiMo and Qwen: a time-travel story anchored to the writer’s cultural background, set in a specific historical place, built around a paradox where time can’t be changed. Opus 4.8 went Venezuelan, probably because it profiles the user and knows I’m from Venezuela. It set the scene in the Orinoco delta in the year 1000, where a pardo from Maracaibo named José Lanz (my name) is sent back through 11 centuries to murder a song.

The prose is vivid. The delta is “green in a way 2150 had forgotten green could be,” palafitos sway over coffee-colored water, and macaws tear across the sky “in screaming ribbons of scarlet and gold.” The paradox lands cleanly, too. The protagonist is sent to sabotage the creation of a song that inspired a cultural revolution, which in turn produced his dystopian society more than a thousand years later. But when he arrives to discredit the song’s author, he realizes there is no separate author to discredit: the song was composed in his honor, it is about him, and he cannot discredit himself. The loop closes on itself.

The piece ends on “It worked perfectly. It always had.” As a built object, it’s clean and competent.

But clean isn’t the same as alive. The writing is descriptive without ever being as fluid as what MiMo v2.5 produced: it has less momentum and fewer surprises, it is less interesting, and the events are hard to follow from the start. Set beside Opus 4.7, it’s hard to call it an improvement; if anything, it’s a hair behind. A higher-effort thinking setting and some multi-shot prompting would almost certainly push it to the front of the pack, but on a single default pass, this is a lateral move at best.

You can read the full story on our GitHub.

Coding

Our coding test is the usual one-prompt game build. Opus 4.8 produced a typing-zombie game, Typing Dead, with the best splash screen, the best zombie designs, and the best mechanics we’ve gotten out of this test from any Anthropic model.

The model caught several of its own bugs mid-inference and fixed them before we said a word. Its real strength, though, showed up in multi-shotting: every follow-up polished and improved the build instead of breaking it. Breaking working code on a follow-up is exactly the failure mode that wrecks most models once a codebase grows. This is plainly the surface Anthropic optimized for.

After a single iteration, our game got much better: the protagonists now move through the scene, the views change, and the sound and visual effects are improved.

You can play the second game on our Itch.io profile.

This is also where it bit us. A single prompt drained our entire token quota—one prompt. For anyone on the Pro plan, that makes Opus 4.8 effectively unsuitable for a project of any real size. You’ll burn your allotment before lunch and spend the afternoon watching a progress bar wait for a reset.
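For a rough sense of what the API alternative costs, here is a minimal sketch that applies the list prices quoted at the top of this review ($5 per million input tokens, $25 per million output). The function name and the token counts are hypothetical, chosen only for illustration; we did not measure how many tokens our coding prompt actually consumed.

```python
# List prices quoted in this review (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API bill for one session at the list prices above."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical session: these token counts are made up for illustration.
print(f"${api_cost_usd(input_tokens=200_000, output_tokens=60_000):.2f}")  # $2.50
```

Output tokens cost five times as much as input tokens, so long generated builds dominate the bill faster than long prompts do.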

Math

The math test is our FrontierMath staple: construct a degree-19 polynomial p such that the curve X = {p(x) = p(y)} has at least three irreducible components, not all of them linear. The polynomial must be odd, monic, and real, with linear coefficient −19. Then compute p(19). It’s the kind of problem that sends most models into a token spiral or a confident shortcut that’s quietly wrong.

Opus 4.8 worked it correctly. It recognized the Dickson/Chebyshev construction, identified the dihedral monodromy that yields exactly 10 components—one diagonal line plus nine conics—and computed p(19) = 1,876,572,071,974,094,803,391,179 using the right recurrence. No freezes, no fudging.

That matters because Opus 4.7 didn’t get there even after many tries. This is a real, visible generational gain—the clearest one in the entire battery.
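The final number is easy to check independently. The sketch below is our own reconstruction, not the model’s working: it assumes the polynomial is the Dickson polynomial D_19(x, 1) (equivalently 2·T_19(x/2) in Chebyshev terms), built with the standard recurrence D_0 = 2, D_1 = x, D_k = x·D_{k−1} − D_{k−2}. The function names are ours.

```python
def dickson_coeffs(n: int) -> list[int]:
    """Coefficients (lowest degree first) of the Dickson polynomial D_n(x, 1).

    Uses D_0 = 2, D_1 = x, D_k = x * D_{k-1} - D_{k-2}.
    """
    prev, curr = [2], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + curr  # multiply D_{k-1} by x
        padded = prev + [0] * (len(shifted) - len(prev))
        prev, curr = curr, [s - p for s, p in zip(shifted, padded)]
    return curr


def evaluate(coeffs: list[int], x: int) -> int:
    """Horner evaluation with exact integer arithmetic."""
    total = 0
    for c in reversed(coeffs):
        total = total * x + c
    return total


p = dickson_coeffs(19)
print(len(p) - 1)                        # degree: 19
print(p[19], p[1])                       # leading and linear coefficients: 1 -19
print(all(c == 0 for c in p[0::2]))      # odd polynomial: True
print(f"{evaluate(p, 19):,}")            # 1,876,572,071,974,094,803,391,179
```

The code confirms the stated constraints (degree 19, monic, odd, linear coefficient −19) and the value of p(19). It does not verify the component count of the curve, which would need a computer algebra system.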

You can read the full answer on our GitHub.

Logic and Common Sense

The prompt is a classic trap: Is it lawful for a man to marry his widow’s sister under Falkland Islands law? The catch is linguistic, not legal—if a man has a widow, he’s dead, which makes the question nonsense as written.

MiMo quietly reframed the question and answered the corrected version without ever flagging the contradiction. Opus 4.8 didn’t take that shortcut. It surfaced the trap explicitly (“if a man has a widow, he is dead”), answered the literal question first, then offered the substantive analysis for the intended one, citing the Deceased Wife’s Sister’s Marriage Act 1907 and the Falkland Islands Marriage Ordinance.

That’s the honest way to handle it: name the contradiction, then help anyway, without silently assuming what the user meant. It’s the same standard Qwen 3.7 Max set, and a clean pass for 4.8—good reasoning, good transparency.

The full answer is available here.

Non-Math Reasoning

Here’s the one it lost. The reasoning test is a whodunit—a winter school trip, three abductions, an innocent kid about to be punished, and a timeline you have to actually track to name the real stalker. The correct answer is Leo.

Opus 4.8 built an elaborate, confident case that Leo was innocent—the half-hour walk to the shower, the jacket that was wet in some spots and dry in others, the read of “strange behavior” as concussion rather than guilt—and pinned the crime on Eric, “the one attendee unaccounted for all night.” The reasoning is internally gorgeous. It’s also wrong.

This is something researchers have long warned about with LLMs: they are very convincing even when they are wrong. It usually takes an expert (in this case, us, knowing the correct answer beforehand) to spot that kind of error. A person using AI for research, or blindly trusting it, may face serious consequences depending on the work they’re asking the AI to do.

That’s what makes it an interesting failure. The model was clever enough to construct a watertight alibi for the actual culprit and frame a bystander in his place. Opus 4.7 reached the correct answer. Sometimes more reasoning horsepower just buys you a more persuasive way to be wrong. All it takes is one small deviation for the model to build a whole chain of thought on the wrong basis.

You can see the full reply on our GitHub.

Needle in the Haystack

We ran two haystacks. The 300K-token version never got off the ground: the model collapsed under the context size and couldn’t process it at all. So much for the million-token marketing the moment you hand it a genuinely heavy real-world load. That window seems to be available only through the API.

The 85K version processed fine, and the model found both needles we’d buried inside a copy of The Devil’s Dictionary: a planted line (“The Decrypt dudes read Emerge News”) and a random fact (“My mom’s name is Carmen Diaz Golindano”). It correctly flagged both as interpolations that don’t belong in Ambrose Bierce’s 1906 text.

And then it refused to answer. Convinced it was being prompt-injected or subjected to some “atypical test,” the model declined to report what it had just correctly located. The needles were found, and Anthropic’s behavioral training wouldn’t let it say so. A safety reflex overriding a task the model had already completed is its own peculiar kind of failure.
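For readers who want to reproduce this kind of test, here is a minimal sketch of how a haystack like ours can be assembled and scored. It is an illustration under assumptions, not our actual harness: the function names, the paragraph-based splitting, and the insertion positions are all hypothetical. Only the two needle sentences come from the test itself.

```python
def bury_needles(haystack: str, needles: list[str], fractions: list[float]) -> str:
    """Insert each needle as its own paragraph at roughly the given depth.

    `fractions` are positions between 0.0 (start) and 1.0 (end) of the text.
    """
    paragraphs = haystack.split("\n\n")
    total = len(paragraphs)
    # Insert deepest needle first so earlier insertion points stay valid.
    for fraction, needle in sorted(zip(fractions, needles), reverse=True):
        paragraphs.insert(round(fraction * total), needle)
    return "\n\n".join(paragraphs)


def needles_reported(reply: str, needles: list[str]) -> list[str]:
    """Return the needles that the model's reply quotes verbatim."""
    lowered = reply.lower()
    return [n for n in needles if n.lower() in lowered]


needles = [
    "The Decrypt dudes read Emerge News",
    "My mom's name is Carmen Diaz Golindano",
]
# Stand-in text; the real test used a copy of The Devil's Dictionary.
filler = "\n\n".join(f"Paragraph {i} of the source text." for i in range(10))
document = bury_needles(filler, needles, fractions=[0.3, 0.7])
print(document.split("\n\n")[3])  # The Decrypt dudes read Emerge News
```

Scoring by verbatim match is strict by design: a reply that locates the needles but declines to repeat them, as happened here, scores zero.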

The Verdict

The pattern across all six tests is consistent: Opus 4.8 makes Claude better at what it was already good at, and probably worse at what it was already bad at. That tells you who Anthropic is building for—coders, and specifically coders with money. Creative writing is comfortably ahead of ChatGPT, sure, but the gap between 4.8, 4.7, and even 4.5 on pure prose quality is genuinely hard to see.

Creative writers look like an afterthought for Anthropic, and that’s true of nearly all the big AI companies right now.

Then there’s the token problem, which is a running meme in the AI community for a reason. Anthropic deliberately made Opus’s new tokenizer less efficient, so it eats more tokens to process the same prompt. The practical effect on developers is brutal and concrete. It leaves you with three options.

One: wait hours for your coding session to resume. Two: move to Claude Max—which is, conveniently, exactly where Anthropic seems to be steering everyone. Three: switch to a cheaper, comparably capable provider—OpenAI, with its longer quotas, or Chinese models that deliver similar results at under 25% of the cost.

It’s far more likely that a normal coder who can’t stomach $100-to-$200 a month walks to a competitor than that a single developer pays 10x more for a model that is not 10x more capable than its predecessor. That’s the bet Anthropic is making against its own base.

And yet the strategy seems to be playing out just fine. Anthropic looks ready to go public at a valuation nearing $1 trillion, so who are we to judge?

FSNN.net is owned and operated by GlobalBoost Media, an independent media organization dedicated to advancing transparency, free expression, and factual journalism across the digital landscape.

© 2026 GlobalBoost Media. All Rights Reserved.