Debates

AI and the End of Cyber Security

By News Room · 2 hours ago · 18 min read
In March 2016, the world watched a paradigm shift. Google DeepMind’s AlphaGo team challenged Lee Sedol, the world champion of Go, to a competition. Go is among the oldest and most complex strategy games ever created, played continuously for over 2,500 years. It has more possible board positions than there are atoms in the universe, and it was considered the exclusive province of the human mind. Researchers estimated that AI was at least a decade away from competing at the top level.

Sedol predicted that he would win in a landslide. He lost the first game. In the second game, on move 37, AlphaGo made a move that made no sense to anyone watching. Sedol got up and left the room for fifteen minutes. Commentators assumed it was a mistake, a clear vindication of human superiority. It wasn’t. The AI was playing in a part of the game that the human mind had never explored. This seminal moment showed that, after 2,500 years, humans had still not fully understood the game they invented. We had been playing in a corner of the probability space and mistaking it for the whole map.

Sedol retired from professional Go three years later. In a reflection published this year, he called the match “a guide that presented the future in advance” and said it “sent a clear signal to humanity about how the world would change.” A year later, AlphaGo Zero—which was not trained on human games, just on the rules for three days—defeated the earlier AlphaGo 100:0.

I think about Move 37 a lot. I think about what it means when the map you thought was complete turns out to be almost empty.

Enter Claude Mythos

Before Anthropic announced Claude Mythos, information about the model had already leaked online, so it didn’t come as a complete surprise. The model is so good at finding security exploits that Anthropic won’t be releasing it publicly. The previous top model, Claude Opus, was not particularly good at finding actual security exploits. In internal testing, Mythos turned identified vulnerabilities into working exploits in 72.4 percent of cases within a controlled benchmark environment. In just a few weeks, it found hidden security flaws known as zero-day vulnerabilities in major operating systems and web browsers, many of which had sat in active code for ten to twenty years.

It found a sixteen-year-old bug in FFmpeg that had eluded detection despite being hit five million times by autonomous testing tools. It casually wrote a browser exploit that chained together four vulnerabilities, giving an attacker the ability to run code on a user’s system the moment that user visits a webpage. In just four hours, it wrote a remote root exploit for a server, even though the human directing it had no formal security training. In response, Anthropic announced Project Glasswing, a coalition of the biggest players in software (AWS, Apple, Google, Microsoft, CrowdStrike, NVIDIA, and others) aimed at patching the vulnerabilities.

The Glasswing Butterfly | Shutterstock

The glasswing butterfly’s defence mechanism is transparency. With transparent wings it hides from the world, and despite its delicate appearance it can carry up to forty times its weight. However, the analogy breaks down almost immediately. The glasswing’s transparency makes it harder to find; the transparency of a leaked exploit does the opposite, broadcasting a vulnerability to the entire world. Mythos is Move 37 for computer security. It shows us that we’ve been playing in a tiny corner of the map, apparently besting decades of security research in a matter of days.

Defence Can’t Keep Up

I am glad that Anthropic exists. Its founders concluded that exponential intelligence growth is inevitable and committed to entering the race in the hope of avoiding catastrophe. I know many people at Anthropic, and I have been encouraged by the company’s ability to act responsibly within a capitalist framework. Its ethical commitments derive from the personalities of the people who work there. But good intentions are a gamble, and we should consider the scale of what they’re up against:

  • OSS-Fuzz, an automated technique for bug-finding that began in 2016, has found about 13,000 vulnerabilities in ten years.
  • Human researchers find a few per target per year.
  • Anthropic says Claude Opus found more than 500 in a few weeks.
  • Early experiments with Mythos suggest it can find thousands in a few weeks, at a drastically lower cost per bug than any previous approach.

This is a fundamentally different kind of threat, and it heralds a step-change in computing that might be even more significant than the introduction of ChatGPT or Claude Code. This kind of race is rigged against a defensive player for five reasons.

1. Other Labs Are Not Far Behind

OpenAI, DeepSeek, and other AI companies are just behind Anthropic and will soon develop models with this type of capability. Open-source models won’t be too far behind. In some domains, smaller models are approaching the capabilities of last year’s systems.

2. AI Will Only Get Better

This refrain reminds us that our priors and expectations will be disrupted time and time again by exponential progress. People are very bad at reasoning about exponential progress. We tend to assume that tomorrow will be like today. When it’s not, we experience shock and then quickly reset our baseline. We don’t easily internalise how more step changes could drastically change our world. It’s hard to imagine that AI won’t get better for the foreseeable future, and it doesn’t even require the model itself to get better.

AI models improve drastically when prompted better and connected to the right data. If a model has all your essays, it can think of ideas that you’ll like more and sound more like you. Improving the harness (the way a model is prompted, looped, and shown data) makes a tremendous impact.

The current mind-blowing results came from running Mythos in a very simple loop with a one-paragraph prompt. Connecting Mythos to massive amounts of data, giving it lists of previous exploits, and putting it in collaboration and competition with other versions of itself could produce an order of magnitude more exploits.
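A minimal sketch of what such a harness loop might look like. The function, prompts, and loop structure below are illustrative assumptions, not a description of Anthropic’s actual setup; the model call is injected as a plain callable so the sketch stays API-agnostic:

```python
# Illustrative harness: same model, richer scaffolding. Each round feeds
# prior exploit data and earlier findings back into the prompt, which is
# the "connecting Mythos to massive amounts of data" idea in miniature.

def audit_in_loop(source_code, known_exploits, query_model, rounds=5):
    """Run the model repeatedly, feeding earlier findings back in."""
    findings = []
    for _ in range(rounds):
        prompt = (
            "Audit this code for security vulnerabilities.\n"
            f"Code:\n{source_code}\n"
            "Prior exploits seen in similar code:\n"
            + "\n".join(known_exploits)
            + "\nFindings from earlier rounds (do not repeat these):\n"
            + "\n".join(findings)
            + "\nReport one new candidate vulnerability."
        )
        findings.append(query_model(prompt))
    return findings
```

Running several such loops in competition with each other changes only the scaffolding around the model, not the model itself, which is why harness improvements alone could multiply the exploit count.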

Infinite Doorways | Dall-E, prompted by Claude Opus

3. You Can’t Just Discover All the Bugs

Thousands of people have used LLMs for the same task and produced drastically different results. When I run Claude in a loop it sometimes doesn’t figure out something important until the fifteenth iteration. LLMs are not deterministic. Each time you prompt them, they travel a different path through their big brains. They are dramatically affected by your approach and ideas for how to solve a problem. This is one of the things that makes them so unpredictable, spontaneous, and anthropomorphic.

So the first time it might find ten exploits, the second time another ten. Prompting it in different directions and guiding it slightly differently can unlock new ones. You could run it 1,000 times and find new bugs each time, with no guarantee that it has found them all.

4. There May Not Be a Ceiling to the Number of Exploits

“It’s quite possible,” Facebook’s former chief information security officer said two weeks ago, “that all this development we’ve done in memory-unsafe languages, without formal methods, that none of that is actually secure in the presence of superintelligent bug-finding machines. In which case we need to be massively rebuilding the base infrastructure we all work on. And nobody is doing that.” Security researcher Thomas Ptacek has added: “We’ve been shielded from exploits not by soundly engineered defenses, but by a scarcity of elite attention. Most software has never been seriously examined by anyone with the skill to break it. AI eliminates that scarcity.”

How many exploits are out there in total? The truth is that nobody really knows. Memory bugs in old languages might be finite, but there are many other classes of exploit. There may be a finite number of those too, and it may be possible to write a perfectly secure piece of code. But it may not be. Most likely, given that we’ve just started to explore these classes of advanced exploits, the ceiling is at least so far away as to be presently irrelevant.

It really may be that each new generation of model expands the ceiling, gaining visibility into a new world of vulnerabilities that the previous model could never have even imagined.

Dall-E prompted by Claude Opus

5. Sticky LLMs and Stuck Paradigms

LLMs are sticky. Once they choose a direction and decide that something is true, it’s extremely hard for them to see different directions. If the LLM concludes that a certain set of well-scoped user permissions will make a system secure, it’s not as likely to realise it’s wrong mid-process. Adversarial review can help. When a second model or group of models is asked to poke holes in an approach, it often finds things the original model missed. But these adversarial reviewers are still stuck in the paradigm of the application.

Compare that to an attacker LLM, trained on all the different ways code goes wrong, watching every bug fix on GitHub, finding an exploit in one piece of code and looking for similar ones in every piece of software. It can apply thousands of attack paradigms. And it just needs to poke one hole. An attacker needs only one exploit to work in thousands of attempts; a defender needs every single patch to be correct. The math is asymmetric, and stickiness makes it worse.
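The asymmetry is easy to make concrete with back-of-the-envelope arithmetic; the probabilities below are illustrative assumptions, not measured rates:

```python
# Attacker vs. defender arithmetic. An attacker making n independent
# attempts, each succeeding with probability p, wins at least once with
# probability 1 - (1 - p)**n. A defender shipping m patches, each correct
# with probability q, stays fully secure only with probability q**m.

def attacker_wins_once(p, n):
    return 1 - (1 - p) ** n

def defender_stays_secure(q, m):
    return q ** m

# Even a rare exploit success compounds in the attacker's favour...
print(attacker_wins_once(0.01, 1000))
# ...while even very reliable patching compounds against the defender.
print(defender_stays_secure(0.999, 5000))
```

A 1 percent exploit tried a thousand times is near-certain to land, while 99.9-percent-reliable patches, applied thousands of times, almost guarantee at least one flaw slips through.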

“But We’re Fixing the Bugs!”

Anthropic has committed US$100m in usage credits and US$4m in donations to open-source security groups. Some of the brightest minds on Earth are presently focused on the task. The sixteen-year-old bug in FFmpeg has already been patched! But it’s much easier to poke a hole in a boat than it is to fix it, especially when the hole is in millions of boats that are out sailing somewhere. Security response is intrinsically slower, harder, and more expensive than exploit creation.

Just two or three of the 500 zero-day bugs Anthropic says it identified using Opus 4.6 in February 2026 have been fixed. It’s not the software developer’s fault. It’s structural. Finding and fixing bugs is hard.

Dan Kaminsky and the Thirteen Days

In early 2008, a brilliant security researcher named Dan Kaminsky discovered an exploit in DNS, the protocol that connects the entire internet. He immediately convened a secret meeting of sixteen security researchers at Microsoft headquarters and then coordinated a massive, multi-vendor response involving dozens of major technology companies. A patch was released on 8 July. It was the largest synchronised security update in the history of the internet. Those involved hoped to keep the exploit secret for thirty days after the patch was made available, but news leaked into the wild after thirteen days. That was one vulnerability. Imagine thousands of new ones in similarly critical systems, with more emerging each month.

Dan Kaminsky

The Y2K Bug

Some of you remember the 1990s, those halcyon days when neon was in and dates were stored as two-digit numbers. We had known since the ’60s that this would eventually break and potentially cause tremendous damage, but it wasn’t until the mid-’90s that people started treating the bug as a crisis. In 1993, it was estimated that the bug would cost US$50–75bn to mitigate worldwide. In 1998, response work began in earnest. The actual cost was closer to US$308bn. This was one class of bug, known decades in advance, with a countdown and a forcing function. But it was in every system, buried in spaghetti code, maintained by developers who had left, forgotten, or passed away.

War Room | Dall-E prompted by Claude Opus

The Heartbleed Bug

Heartbleed was a critical vulnerability in OpenSSL that affected a large share of the web’s HTTPS infrastructure, since OpenSSL powered roughly two-thirds of sites at the time. Security researchers tried a new approach, releasing the patch with great fanfare on the same day they announced the exploit. Two months later, 300,000 servers were still vulnerable. Five years later, 91,000 servers were still vulnerable. One bug. A giant marketing campaign. And these were servers, managed by sysadmins, not end users. And still it went unpatched in many places.

COVID-19

When COVID-19 began spreading and sickening people in Wuhan in late 2019, many people had a hard time understanding exponential growth and didn’t respond until the problem was at our doorstep and it was too late. I remember watching people die in Italy and seeing complete nonchalance in California until just a few days before things got really bad. It was also much easier to get infected than it was to prevent infection; much easier to poke one hole than seal all the holes. Right now, 99 percent of the exploits found by Mythos remain unpatched. This is not our fault. It’s certainly not Anthropic’s fault. This stuff is just structurally difficult.

The Patches Don’t Reach

The FFmpeg bug was discovered within weeks, and a patch has already been shipped. FFmpeg is open-source code that powers almost every tool that encodes or decodes video. Nearly every video player, browser, phone, app, and streaming service has a version of it embedded in their software. I had Claude search my computer, and these are the fifteen versions I have:

Fifteen versions of FFmpeg! CapCut, released just a week ago, ships a version that is two years old! To be immune from this single vulnerability, I would need to (a) know that the vulnerability exists, and which software uses the code, and (b) stop using that software until a patch is made, the software integrates it, and I update. What about an old video game or a smartphone that no longer receives updates? When was the last time you updated FFmpeg?
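A rough sketch of the kind of search described above: walk a directory tree and flag filenames that suggest an embedded FFmpeg build. The starting path and name patterns are assumptions, and statically linked or renamed copies will be missed, so this undercounts:

```python
import os

# Filenames that suggest a bundled FFmpeg: the ffmpeg binary itself,
# or its core libraries (libavcodec, libavformat, libavutil).
FFMPEG_PATTERNS = ("ffmpeg", "libavcodec", "libavformat", "libavutil")

def find_ffmpeg_copies(root):
    """Return paths under `root` whose names match an FFmpeg pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(pat in name.lower() for pat in FFMPEG_PATTERNS):
                hits.append(os.path.join(dirpath, name))
    return hits

if __name__ == "__main__":
    # "/Applications" is a macOS assumption; pick the right root elsewhere.
    for path in find_ffmpeg_copies("/Applications"):
        print(path)
```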

Typically, large legacy systems are the ones that take forever to patch. Critical infrastructure that protects money, electricity, defence, etc. can’t be updated five times a day by a vibe coder. It’s too risky. And if an attacker can own a bunch of legacy systems, they can then get into the trusted chain and attack your patched system through there. Even if you update everything, you’re still vulnerable.

The SolarWinds attack offers an instructive example. An IT company distributing software to more than 30,000 organisations had its internal network compromised. The attackers injected code into a software update, which went out through trusted channels and installed backdoors on recipients’ networks. About 18,000 organisations installed the compromised update, and attackers were inside the networks of the Departments of State, Treasury, Energy, and Homeland Security, and the NSA, for months before they were discovered.

It’s a lot harder to, say, get root access over the internet to a remote server with which you have no relationship. So imagine that a chain of four vulnerabilities makes this easy, and that it propagates across the internet. Most servers update, but it’s easy to tell which ones haven’t. And attackers have another three four-exploit chains ready to go for when a server gets patched. Attackers can use this surface, which your computer trusts, to push an attack onto your machine. Because it’s travelling over a trusted network, that code doesn’t need such sophisticated attacks to hack you.

The Attack Sphere of Your Mind | Dall-E prompted by Claude Opus

Move 37 for Your Brain

We think we understand how human persuasion works. Emotional triggers, in-group identities, thought and perception errors. We think we have the map. But consider that some of the now-familiar sentence structures we call “clickbait” were not popularised and systematised until 2013.

In 1994, a behavioural economist at Carnegie Mellon named George Loewenstein theorised that provoking curiosity in a headline might create more impact. But it wasn’t until 2013, when Upworthy built a massive A/B testing infrastructure to discover which kinds of headlines worked best, that formulations like “You Won’t Believe What Happens Next” were essentially discovered. And with that development, the playspace expanded ever so slightly.

AI’s manipulation of our minds is arriving in three phases:

  1. Algorithmic filtering (2009): AI designed to optimise engagement learns to surface content that maximises outrage and fear. This discovery of how to exploit psychological vulnerabilities has driven the destructive impact of Facebook and other platforms in conflict zones and on our self-perception.
  2. Generative content (2020): Fake videos, fake articles, and fake evidence erode our understanding of what’s authentic and true.
  3. Direct manipulation from personal information: Using your data, messages, reactions, and patterns, AI can change your mind in ways you can’t detect.

We’re deep into phase two already, and the early, brute force version of phase three is just beginning:

  • A December 2022 report from the UK’s Centre for Countering Digital Hate found that TikTok’s algorithm can map a user’s psychological vulnerabilities in forty minutes and serve self-harm content to thirteen-year-olds within minutes of account creation. Accounts with usernames suggesting vulnerability receive twelve times more harmful content.
  • Some studies have indicated that biased search-engine rankings can shift voting preferences among undecided voters by twenty percent or more, and 75–100 percent of subjects in studies have no awareness that this is happening.

Consider, then, what a system with Mythos-level reasoning would find. One that doesn’t stumble onto psychological vulnerabilities through engagement optimisation, but actually reasons its way to them the way Mythos reasoned its way through code: reading the structure, following the data flows, hypothesising where the weaknesses are, and testing until it finds them. Marketing and persuasion are ancient techniques. But what if, just like these code exploits, there are techniques we haven’t even thought of yet?

It Comes From the Data

Once an attacker has your text message history—just a .db file on your computer—they know things about you that can’t be patched: your trigger words, the arguments that make you fold, the people whose approval you crave, the fears you’ve told no one. And of course, the patterns that we don’t have names for, some of which are personal and some of which are shared by the entire human species. You can’t apply for a new set of personal vulnerabilities. You can’t patch how you react when someone hits the right nerve.

What will it even look like when an AI chains together four zero-day exploits to sow division or create confusion? Would we even notice? Would some of us be oblivious, passing harmful psychological payloads through trusted channels like an unpatched server? The fugue of the past decade may partly be because so much of this has happened already. Social media has bent consensus reality while screens keep us in a hypnotic dream state.

What happens when a future LLM like Mythos, but designed for human persuasion, can take your message database and use it to extract the keys to your mind, finding hundreds or thousands of Move 37s for human manipulation? What happens when AI is connected to more data about what works and doesn’t: your facial expressions when you watch TV, thousands of hours of Zoom calls? How many exploits are there? How difficult are they to patch? And what do we do about the people who don’t patch, especially the ones we trust? Is it possible that beyond the human manipulation exploits we can imagine, there is another class of exploits? That beyond that, yet another?

We must consider the possibility that we are a lot more hackable than we think. And of course, the personal information that these Mythos-class attacks can steal will set the stage.

Tsunami | Dall-E prompted by Claude Opus

Preparation, Not Panic

I remember posting about the danger posed by the COVID-19 pandemic when it was approaching. A friend who’s very good at using his rational mind (and who currently works for Anthropic) had been posting about it since January 2020. I wrote a long post telling people to get ready. Two weeks later, California shut down.

This moment feels similar. Panic will do nothing, and panicking about an inevitability seems foolhardy. It’s time to prepare. I’m making more frequent backups, with occasional offline backups to cold storage. I’m reducing the attack surface of my personal information. I’m thinking harder about the data I leave lying around, and what it would mean in the hands of a system that could read it the way Mythos reads code. But the real issue is that the security model of the internet was built on the assumption that finding and exploiting bugs is hard and costs time, money, and scarce human expertise. Mythos has now falsified that assumption.

The trust model of human communication is built on similar assumptions: that effective manipulation requires rare skill, deep personal knowledge, and sustained effort, and that the danger rarely comes from people we trust. Those assumptions are next. The world will need to build a new paradigm, and quickly. Not just for computing, but for trust itself. The most important thing any of us can do now is what we are worst at: believing that tomorrow might be radically different from today. It is the surprise that comes from our refusal to expect drastic change that is our greatest vulnerability.


Quillette invites thoughtful responses to its essays.
Selected responses are published once per week as part of a curated Letters to the Editor feature. If selected, letters appear under the contributor’s real name and may be edited for clarity and length.

To submit a letter for consideration, please email [email protected].



