from the history-is-now-a-black-hole dept
Earlier this year Nieman Lab broke the story that major news publishers, including The New York Times, The Guardian, and USA Today Co., had started blocking the Internet Archive for fear that AI companies might scrape the nonprofit’s repositories for training data. Given that the Internet Archive is one of the last bastions of archival history, that is, in case you’re not aware, not very good for the public interest.
Four months later, Nieman Lab now notes that the number of news outlets blocking the archive has soared to more than 340:
“Our new analysis shows that more than 340 local news sites across the United States are now limiting the Internet Archive’s ability to access and preserve their stories. Many sites in our sample are owned by five of the seven largest local news publishers in the country: USA Today Co., McClatchy, Advance Local, MediaNews Group, and Tribune Publishing. The latter two are both subsidiaries of the “vulture hedge fund” Alden Global Capital.”
Many of these localities are already effectively news deserts, where most real local journalism was hollowed out and replaced by a smattering of local right-wing broadcasters (like Sinclair Broadcasting) or a hedge fund-run “local newspaper” that doesn’t do much in the way of actual local reporting. That’s generally also been terrible for informed consensus or shedding light on local corruption.
Some of the outlets blocking Internet Archive access have legitimate concerns about protecting their hard work from being repackaged and resold without compensation or citation. But an awful lot of the folks grumbling about the Internet Archive were never in the journalism business to serve the public interest in the first place.
Regardless of motivation, hiding whatever local news remains behind paywalls, then blocking it from the Internet Archive, in turn makes it harder for everyone else to do real journalism that relies on the historical record, local journalists tell Nieman Lab:
“I cover news within a larger news desert in New York’s Rockland, Sullivan, and Rockland counties. This means I need to heavily rely on archival data of old news articles from now deceased, or zombie-fied, media outlets,” wrote B.J. Mendelson, the editor of The Monroe Gazette newsletter, in one recent petition signed by over 200 journalists. “Without the Internet Archive, my [work] would be incredibly difficult to do.”
Trying to address publisher concerns, the folks at the Wayback Machine have highlighted ongoing efforts to minimize abuse of the site, including restrictions on bulk downloading and a collaboration with Cloudflare to monitor bot activity.
But even beyond AI scraping, many corporate media owners simply can’t see beyond the narrow interests of paywalled revenue. And corporate power and authoritarianism, sometimes in collaboration, both tend to benefit from a misinformed electorate that doesn’t have a firm grip on the lessons learned from historical experience, and doesn’t have easy access to the factual record.
I've been a journalist for several decades, and the vast majority of my work has been deleted by website owners and companies that simply couldn’t have cared any less about archival history or any sort of permanent record. My explorations of telecom policy have disappeared, but Verizon, AT&T, and Comcast’s version of the historical record generally remains. You can probably see how that’s of benefit to corporate power.
But again, smaller, independent, local news outlets on fixed budgets have particularly legitimate concerns about the tech giants’ plan to hijack and repackage the entirety of their work using AI without any compensation or attribution whatsoever. The Internet Archive folks say they are listening to those concerns, while also trying to train news orgs on archival preservation:
“In December, the Internet Archive partnered with the Poynter Institute and Investigative Reporters and Editors to train a cohort of 33 local and national news outlets on how to develop and implement an archiving strategy. The initiative, funded through a Press Forward grant, aims to train 300 newsrooms in digital preservation and in using the Internet Archive’s services by the end of 2027.”
Some other archival efforts exist, but they often involve paywalled access. That's again a problem when you’ve got an authoritarian corporate coalition driven heavily by free propaganda, while factual reality and what’s left of intelligent U.S. analysis and journalism sit hidden behind a monthly subscription fee.
Filed Under: ai, archives, bots, historical record, media, paywalls, wayback machine
Companies: advance media, gannett, internet archive, mcclatchy, medianews

