By mid-2025, 35% of newly published websites were AI-generated or AI-assisted, up from essentially zero before ChatGPT’s November 2022 launch.
The confirmed effects are semantic contraction and artificial positivity—not misinformation or stylistic homogeneity, despite what most people believe.
At 35% AI prevalence, model collapse risk shifts from a theoretical concern to an empirical one for the next generation of foundation models.
A new study puts a number on how much of the new web is AI-generated: 35%. That’s the share of newly published websites classified as AI-generated or AI-assisted by mid-2025, according to research from Stanford University, Imperial College London, and the Internet Archive. The figure was essentially zero before ChatGPT launched in November 2022.
“I find the sheer speed of the AI takeover of the web quite staggering,” Jonáš Doležal, researcher at Imperial College London and co-author of the paper, told 404 Media. “After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years.”
The study, titled “The Impact of AI-Generated Text on the Internet,” drew on 33 months of website snapshots from the Internet Archive’s Wayback Machine and used an AI text detector called Pangram v3 to classify each page.
The confirmed harms: vibes, not facts
Researchers tested six hypotheses about what AI content does to the web. Only two held up against the data.
The first: We’re turning into a horde of dumb NPCs all acting the same way. Or, put more scientifically, the web is becoming less semantically diverse.
AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones. The same ideas keep getting expressed in nearly the same ways.
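The article does not say how the study computed its similarity scores. As a rough sketch of the general idea only, one common approach is to represent each page as an embedding vector and average the cosine similarity over every pair of pages in a group. The function names and the toy vectors below are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors to form a pair")
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings (assumption: real ones would come from a
# text-embedding model and have hundreds of dimensions).
varied_pages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
similar_pages = [[1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]

print(mean_pairwise_similarity(varied_pages))
print(mean_pairwise_similarity(similar_pages))
```

A higher average means the pages in a group sit closer together in meaning, which is the sense in which the article describes AI-generated sites as less diverse.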
The paper suggests the online Overton window may be narrowing, not through censorship or coordinated campaigns, but because language models optimize for outputs close to their training distribution.
The second: The web is getting aggressively cheerful.
AI content showed positive sentiment scores more than 107% higher than human content. Researchers tie this to the well-documented sycophantic tendencies of LLMs—trained on human approval signals, they produce text that feels sanitized, friction-free, and relentlessly upbeat.
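To make the "107% higher" figure concrete: a value that is 107% higher than a baseline is a little more than double it. The short sketch below shows the arithmetic. The two scores are made-up placeholders, not numbers from the study, and the calculation assumes the baseline score is positive.

```python
def percent_higher(value, baseline):
    """Return how much higher `value` is than `baseline`, in percent of baseline."""
    if baseline <= 0:
        # A relative difference is not meaningful against a zero or
        # negative baseline, so refuse rather than return a misleading number.
        raise ValueError("baseline must be positive")
    return (value - baseline) / baseline * 100.0


# Hypothetical sentiment scores (assumption: not taken from the paper).
human_score = 0.20
ai_score = 0.414

print(round(percent_higher(ai_score, human_score), 1))  # 107.0
print(round(ai_score / human_score, 2))                 # 2.07
```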
An internet flooded with cheerful, homogenized content may marginalize human dissent at scale without anyone pulling a lever.
Despite widespread public belief, the study found no statistically significant evidence that AI content is making the internet less factually accurate. Researchers found no meaningful correlation between AI prevalence and factual error rate.
The stylistic monoculture hypothesis—AI flattening individual voices into a generic uniform register—was the belief respondents held most strongly (83% agreed). The data didn’t confirm it. Character-level analysis found no statistically significant increase in stylistic homogeneity tied to AI prevalence.
The model collapse problem just got real
The broader stakes go beyond discourse quality. At 35% AI prevalence, the theoretical risk of model collapse—where future models degrade after training on AI-generated data—shifts from academic concern to empirical reality. Future foundation models trained on contemporary web crawls will inevitably ingest data that is substantially AI-generated and measurably less semantically diverse.
The team is now working with the Internet Archive to turn the study into a continuous, live monitoring tool, tracking AI’s share of the web in real time rather than as a one-off snapshot.
A U.S. survey conducted alongside the study found most Americans already believe all six negative hypotheses, including the ones the data doesn’t support. People who use AI infrequently were 12% more likely to believe in the harms than frequent users. Dead Internet Theory believers, meet the data: The internet isn’t dead, but 35% of what’s new is probably zombie content in some way.