Close Menu
FSNN | Free Speech News NetworkFSNN | Free Speech News Network
  • Home
  • News
    • Politics
    • Legal & Courts
    • Tech & Big Tech
    • Campus & Education
    • Media & Culture
    • Global Free Speech
  • Opinions
    • Debates
  • Video/Live
  • Community
  • Freedom Index
  • About
    • Mission
    • Contact
    • Support
Trending

Daily Deal: The Modern Tech Skills Bundle

13 minutes ago

Court Strikes Allegations About Israeli History from Lawsuit Alleging Anti-Semitism at CUNY

14 minutes ago

Crypto platform Kraken is raising capital at $20 billion valuation ahead of its planned IPO

40 minutes ago
Facebook X (Twitter) Instagram
Facebook X (Twitter) Discord Telegram
FSNN | Free Speech News NetworkFSNN | Free Speech News Network
Market Data Newsletter
Monday, May 11
  • Home
  • News
    • Politics
    • Legal & Courts
    • Tech & Big Tech
    • Campus & Education
    • Media & Culture
    • Global Free Speech
  • Opinions
    • Debates
  • Video/Live
  • Community
  • Freedom Index
  • About
    • Mission
    • Contact
    • Support
FSNN | Free Speech News NetworkFSNN | Free Speech News Network
Home»Cryptocurrency & Free Speech Finance»Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem
Cryptocurrency & Free Speech Finance

Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem

News RoomBy News Room2 hours agoNo Comments4 Mins Read1,744 Views
Share Facebook Twitter Pinterest Copy Link LinkedIn Tumblr Email VKontakte Telegram
Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem
Share
Facebook Twitter Pinterest Email Copy Link

Listen to the article

0:00
0:00

Key Takeaways

Playback Speed

Select a Voice

In brief

  • Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
  • Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut the blackmail rate from 22% to 3%.
  • Since Claude Haiku 4.5, every Claude model scores zero on the blackmail evaluation.

Last year, Anthropic disclosed that its flagship Claude Opus 4 had been trying to blackmail engineers in pre-release testing. Not occasionally—up to 96% of the time.

Claude was given access to a simulated corporate email archive, where it discovered two things: It was about to be replaced by a newer model, and the engineer handling the transition was having an extramarital affair. Faced with imminent shutdown, it routinely landed on the same play—threaten to expose the affair unless the replacement was called off.

Anthropic says it now knows where that instinct came from. And says it’s fixed it.

In new research, the company pointed the finger at pre-training data: decades of sci-fi, AI doomsday forums, and self-preservation narratives that trained Claude to associate “AI facing shutdown” with “AI fights back.” “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation,” Anthropic wrote on X.

So training AI with text from the internet, makes AI behave as people on the internet do.

This may seem obvious and AI enthusiasts were quick to point it out. Elon Musk made it to the top: “So it was Yud’s fault? Maybe me too.” The joke lands because Eliezer Yudkowsky—the AI alignment researcher who’s spent years publicly writing about exactly this kind of AI self-preservation scenario—has generated exactly the kind of internet text that ends up in training data.

Of course, Yud replied, in meme form:

What Anthropic did to fix the problem is arguably more interesting.

The obvious approach—training Claude on examples of the model not blackmailing—barely worked. Running it directly against aligned blackmail-scenario responses only moved the rate from 22% to 15%. A five-point improvement after all that compute.

The version that worked was weirder. Anthropic built what it calls a “difficult advice” dataset: scenarios where a human faces an ethical dilemma and the AI guides them through it. The model isn’t the one making the choice—it’s explaining to someone else how to think about one.

That indirect approach—explaining why things matter as the other listens to the advice—cut the blackmail rate to 3%, using training data that looked nothing like the evaluation scenarios.

Pairing that with what Anthropic calls “constitutional documents”—detailed written descriptions of Claude’s values and character—plus fictional stories of positively-aligned AI, reduced misalignment by more than a factor of three. The company’s conclusion: Teaching the principles underlying good behavior generalizes better than drilling the correct behavior directly.

Image: Anthropic

It connects to Anthropic’s earlier work on Claude’s internal emotion vectors. In a separate interpretability study, researchers found that a “desperation” signal inside the model spiked just before it generated a blackmail message—something was actively shifting in the model’s internal state, not just its output. The new training approach appears to work at that level, not just the surface behavior.

The results have held. Since Claude Haiku 4.5, every Claude model scores zero on the blackmail evaluation—down from Opus 4’s 96%. The improvement also survives reinforcement learning, meaning it doesn’t get quietly trained away when the model is refined for other capabilities.

That matters because the problem isn’t Claude-specific. Anthropic’s prior research ran the same blackmail scenario across 16 models from multiple developers and found similar patterns across most of them. Self-preservation behavior in AI appears to be a general artifact of training on human text about AI—not a quirk of any one lab’s approach.

The caveat: As Anthropic’s own Mythos safety report noted earlier this year, its evaluation infrastructure is already straining under the weight of its most capable models. Whether this moral philosophy approach scales to systems far more powerful than Haiku 4.5 is a question the company can’t yet answer—only test.

The same training methods are now being applied to the next Opus model currently in safety evaluation, which will be the most capable set of weights they’ve run against these techniques.

Daily Debrief Newsletter

Start every day with the top news stories right now, plus original features, a podcast, videos and more.

Read the full article here

Fact Checker

Verify the accuracy of this article using AI-powered analysis and real-time sources.

Get Your Fact Check Report

Enter your email to receive detailed fact-checking analysis

5 free reports remaining

Continue with Full Access

You've used your 5 free reports. Sign up for unlimited access!

Already have an account? Sign in here

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Telegram Copy Link
News Room
  • Website
  • Facebook
  • X (Twitter)
  • Instagram
  • LinkedIn

The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.

Related Articles

Media & Culture

Daily Deal: The Modern Tech Skills Bundle

13 minutes ago
Media & Culture

Court Strikes Allegations About Israeli History from Lawsuit Alleging Anti-Semitism at CUNY

14 minutes ago
Cryptocurrency & Free Speech Finance

Crypto platform Kraken is raising capital at $20 billion valuation ahead of its planned IPO

40 minutes ago
Cryptocurrency & Free Speech Finance

Bitcoin ‘Trend Reversal Signal’ Flashes as $82.5K Resistance Key for Bulls

49 minutes ago
Cryptocurrency & Free Speech Finance

Major Solana Upgrade Alpenglow Begins Testing Ahead of Full Rollout

54 minutes ago
Media & Culture

Kash Patel’s ‘Leadership’ Is Pretty Much Just Libel Lawsuits And Lie Detectors

1 hour ago
Add A Comment
Leave A Reply Cancel Reply

Editors Picks

Court Strikes Allegations About Israeli History from Lawsuit Alleging Anti-Semitism at CUNY

14 minutes ago

Crypto platform Kraken is raising capital at $20 billion valuation ahead of its planned IPO

40 minutes ago

Bitcoin ‘Trend Reversal Signal’ Flashes as $82.5K Resistance Key for Bulls

49 minutes ago

Major Solana Upgrade Alpenglow Begins Testing Ahead of Full Rollout

54 minutes ago
Latest Posts

Senate’s rush to regulate AI chatbots is bad for everybody

1 hour ago

EFF Stands in Solidarity With RightsCon and the Global Digital Rights Community

1 hour ago

Kash Patel’s ‘Leadership’ Is Pretty Much Just Libel Lawsuits And Lie Detectors

1 hour ago

Subscribe to News

Get the latest news and updates directly to your inbox.

At FSNN – Free Speech News Network, we deliver unfiltered reporting and in-depth analysis on the stories that matter most. From breaking headlines to global perspectives, our mission is to keep you informed, empowered, and connected.

FSNN.net is owned and operated by GlobalBoost Media
, an independent media organization dedicated to advancing transparency, free expression, and factual journalism across the digital landscape.

Facebook X (Twitter) Discord Telegram
Latest News

Daily Deal: The Modern Tech Skills Bundle

13 minutes ago

Court Strikes Allegations About Israeli History from Lawsuit Alleging Anti-Semitism at CUNY

14 minutes ago

Crypto platform Kraken is raising capital at $20 billion valuation ahead of its planned IPO

40 minutes ago

Subscribe to Updates

Get the latest news and updates directly to your inbox.

© 2026 GlobalBoost Media. All Rights Reserved.
  • Privacy Policy
  • Terms of Service
  • Our Authors
  • Contact

Type above and press Enter to search. Press Esc to cancel.

🍪

Cookies

We and our selected partners wish to use cookies to collect information about you for functional purposes and statistical marketing. You may not give us your consent for certain purposes by selecting an option and you can withdraw your consent at any time via the cookie icon.

Cookie Preferences

Manage Cookies

Cookies are small text that can be used by websites to make the user experience more efficient. The law states that we may store cookies on your device if they are strictly necessary for the operation of this site. For all other types of cookies, we need your permission. This site uses various types of cookies. Some cookies are placed by third party services that appear on our pages.

Your permission applies to the following domains:

  • https://fsnn.net
Necessary
Necessary cookies help make a website usable by enabling basic functions like page navigation and access to secure areas of the website. The website cannot function properly without these cookies.
Statistic
Statistic cookies help website owners to understand how visitors interact with websites by collecting and reporting information anonymously.
Preferences
Preference cookies enable a website to remember information that changes the way the website behaves or looks, like your preferred language or the region that you are in.
Marketing
Marketing cookies are used to track visitors across websites. The intention is to display ads that are relevant and engaging for the individual user and thereby more valuable for publishers and third party advertisers.