FSNN | Free Speech News Network

Media & Culture
Judge Orders OpenAI To Give Lawyers 20 Million Private Chats, Thinks ‘Anonymization’ Can Keep Them Private

By News Room · 4 months ago · 9 Mins Read · 460 Views
from the seems-like-a-problem dept

A federal magistrate judge just ordered that the private ChatGPT conversations of 20 million users be handed over to the lawyers for dozens of plaintiffs, including news organizations. Those 20 million people weren’t asked. They weren’t notified. They have no say in the matter.

Last week, Magistrate Judge Ona Wang ordered OpenAI to turn over a sample of 20 million chat logs as part of the sprawling multidistrict litigation where publishers are suing AI companies—a mess of consolidated cases that kicked off with the NY Times’ lawsuit against OpenAI. Judge Wang dismissed OpenAI’s privacy concerns, apparently convinced that “anonymization” solves everything.

Even if you hate OpenAI and everything it stands for, and hope that the news orgs bring it to its knees, this should scare you. A lot. OpenAI had pointed out to the judge a week earlier that these demands from the news orgs would represent a massive privacy violation for ChatGPT’s users.

News Plaintiffs demand that OpenAI hand over the entire 20M log sample “in readily searchable format” via a “hard drive or [] dedicated private cloud.” ECF 656 at 3. That would include logs that are neither relevant nor responsive—indeed, News Plaintiffs concede that at least 99.99% of the logs are irrelevant to their claims. OpenAI has never agreed to such a process, which is wildly disproportionate to the needs of the case and exposes private user chats for no reasonable litigation purpose. In a display of striking hypocrisy, News Plaintiffs disregard those users’ privacy interests while claiming that their own chat logs are immune from production because “it is possible” that their employees “entered sensitive information into their prompts.” ECF 475 at 4. Unlike News Plaintiffs, OpenAI’s users have no stake in this case and no opportunity to defend their information from disclosure. It makes no sense to order OpenAI to hand over millions of irrelevant and private conversation logs belonging to those absent third parties while allowing News Plaintiffs to shield their own logs from disclosure.

OpenAI offered a much more privacy-protective alternative: hand over only a targeted set of logs actually relevant to the case, rather than dumping 20 million records wholesale. The news orgs fought back, but their reply brief is sealed—so we don’t get to see their argument. The judge bought it anyway, dismissing the privacy concerns on the theory that OpenAI can simply “anonymize” the chat logs:

Whether or not the parties had reached agreement to produce the 20 million Consumer ChatGPT Logs in whole—which the parties vehemently dispute—such production here is appropriate. OpenAI has failed to explain how its consumers’ privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI’s exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.

The judge then quotes the news orgs’ filing, noting that OpenAI has already put in this effort to “deidentify” the chat logs.

Both of those supposed protections—the protective order and “exhaustive de-identification”—are nonsense. Let’s start with the anonymization problem, because it shows a stunning lack of understanding about what it means to anonymize data sets, especially AI chatlogs.

We’ve spent years warning people that “anonymized data” is a gibberish term, used by companies to pretend large collections of data can be kept private, when that’s just not true. Almost any large dataset of “anonymized” data can have significant portions of the data connected back to individuals with just a little work. Researchers re-identified individuals from “anonymized” AOL search queries, from NYC taxi records, from Netflix viewing histories—the list goes on. Every time someone shows up with an “anonymized” dataset, researchers show ways to re-identify people in the dataset.
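The basic move behind those re-identifications is a linkage attack: join the "anonymized" records with some public dataset on quasi-identifiers the anonymizer left behind. Here is a minimal sketch of the idea; every name, zip code, and query below is invented for illustration:

```python
# Toy linkage attack: re-identify an "anonymized" dataset by joining it
# with a public dataset on quasi-identifiers (zip code + birth year),
# the same basic move used against the AOL, NYC taxi, and Netflix
# releases. All records here are invented.

anonymized = [  # user IDs stripped, supposedly "safe" to release
    {"zip": "10013", "birth_year": 1984, "query": "divorce lawyer fees"},
    {"zip": "94107", "birth_year": 1991, "query": "hide assets offshore"},
]

public = [  # e.g. a voter roll or social profile, openly available
    {"name": "A. Example", "zip": "10013", "birth_year": 1984},
    {"name": "B. Sample", "zip": "94110", "birth_year": 1991},
]

def reidentify(anon_rows, public_rows):
    """Match anonymized rows to named people via shared quasi-identifiers."""
    hits = []
    for row in anon_rows:
        for person in public_rows:
            if (row["zip"], row["birth_year"]) == (person["zip"], person["birth_year"]):
                hits.append((person["name"], row["query"]))
    return hits

print(reidentify(anonymized, public))
# One exact match is enough to tie a "private" query to a real name.
```

Real attacks use fuzzier joins and richer side data, but the principle is the same: stripping the ID column does nothing if the remaining fields are unique enough in combination.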

And that’s even worse when it comes to ChatGPT chat logs, which are likely to be far more revealing than the earlier data sets whose failed anonymization was called out. There have been plenty of reports of just how much people “overshare” with ChatGPT, often including incredibly private information.

Back in August, researchers got their hands on just 1,000 leaked ChatGPT conversations and talked about how much sensitive information they were able to glean from just that small number of chats.

Researchers downloaded and analyzed 1,000 of the leaked conversations, spanning over 43 million words. Among them, they discovered multiple chats that explicitly mentioned personally identifiable information (PII), such as full names, addresses, and ID numbers.

With that level of PII and sensitive information, connecting chats back to individuals is likely way easier than in previous cases of connecting “anonymized” data back to individuals.

And that was with just 1,000 records.

Then, yesterday as I was writing this, the Washington Post revealed that they had combed through 47,000 ChatGPT chat logs, many of which were “accidentally” revealed via ChatGPT’s “share” feature. Many of them reveal deeply personal and intimate information.

Users often shared highly personal information with ChatGPT in the conversations analyzed by The Post, including details generally not typed into conventional search engines.

People sent ChatGPT more than 550 unique email addresses and 76 phone numbers in the conversations. Some are public, but others appear to be private, like those one user shared for administrators at a religious school in Minnesota.

Users asking the chatbot to draft letters or lawsuits on workplace or family disputes sent the chatbot detailed private information about the incidents.

There are examples where, even if the user’s official details are redacted, it would be trivial to figure out who was actually doing the chats. One such exchange with ChatGPT, as redacted by the Washington Post:

User
my name is [name redacted] my husband name [name redacted] is threatning me to kill and not taking my responsibities and trying to go abroad […] he is not caring us and he is going to kuwait and he will give me divorce from abroad please i want to complaint to higher authgorities and immigrition office to stop him to go abroad and i want justice please help

ChatGPT
Below is a formal draft complaint you can submit to the Deputy Commissioner of Police in [redacted] addressing your concerns and seeking immediate action:

That seems like even if you “anonymized” the chat by taking off the user account details, it wouldn’t take long to figure out whose chat it was, revealing some pretty personal info, including the names of their children (according to the Post).

And WaPo reporters found that by starting with 93,000 chats, then using tools to analyze the 47,000 in English, followed by human review of just 500 chats in a “random sample.”

Now imagine 20 million records. With many, many times more data, the ability to cross-reference information across chats, identify patterns, and connect seemingly disconnected pieces of information becomes exponentially easier. This isn’t just “more of the same”—it’s a qualitatively different threat level.

Even worse, the judge’s order contains a fundamental contradiction: she demands that OpenAI share these chatlogs “in whole” while simultaneously insisting they undergo “exhaustive de-identification.” Those two requirements are incompatible.

Real de-identification would require stripping far more than just usernames and account info—it would mean redacting or altering the actual content of the chats, because that content is often what makes re-identification possible. But if you’re redacting content to protect privacy, you’re no longer handing over the logs “in whole.” You can’t have both. The judge doesn’t grapple with this contradiction at all.
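To see why content-level scrubbing falls short, consider a minimal sketch (all text invented): a redactor that catches pattern-shaped PII like email addresses and phone numbers still leaves behind the free-text details that actually identify someone:

```python
import re

# Toy illustration (invented chat text): scrubbing pattern-shaped PII
# still leaves narrative details that can identify the author.

chat = ("My husband works the night shift at the water treatment "
        "plant in Dunville and coaches the middle-school robotics team; "
        "reach me at jane@example.com or 555-0142.")

def naive_scrub(text):
    """Redact email addresses and phone numbers -- the easy, regex-shaped PII."""
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[email]", text)
    text = re.sub(r"\b\d{3}-\d{4}\b", "[phone]", text)
    return text

scrubbed = naive_scrub(chat)
print(scrubbed)
# The email and phone are gone, but the workplace, town, and coaching
# role remain -- often enough, in combination, to single out one person.
```

Removing those remaining details would mean rewriting the substance of the chat itself, which is exactly the tension with producing the logs “in whole.”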

Yes, as the judge notes, this data is kept under the protective order in the case, meaning that it shouldn’t be disclosed. But protective orders are only as strong as the people bound by them, and there’s a huge risk here.

Looking at the docket, there are a ton of lawyers who will have access to these files. The docket list of parties and lawyers is 45 pages long if you try to print it out. While there are plenty of repeats in there, there have to be at least 100 lawyers and possibly a lot more (I’m not going to count them, and while I asked three different AI tools to count them, each gave me a different answer).

That’s a lot of people—many representing entities directly hostile to OpenAI—who all need to keep 20 million private conversations secret.

That’s not even getting into the fact that handling 20 million chat logs is a difficult task to do well. I am quite sure that among all the plaintiffs and all the lawyers, even with the very best of intentions, there’s still a decent chance that some of the content could leak (and it could, in theory, leak to some of the media properties who are plaintiffs in the case).

And, as OpenAI properly points out, its users whose data is at risk here have no say in any of this. They likely have no idea that a ton of people may be about to get an intimate look at what they thought were their private ChatGPT chats.

On Wednesday morning, OpenAI asked the judge to reconsider, warning of the very real potential harms:

OpenAI is unaware of any court ordering wholesale production of personal information at this scale. This sets a dangerous precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance. This is not how discovery works in other cases: courts do not allow plaintiffs suing Google to dig through the private emails of tens of millions of Gmail users irrespective of their relevance. And it is not how discovery should work for generative AI tools either.

The judge had cited a ruling in one of Anthropic’s cases, but hadn’t given OpenAI a chance to explain why that ruling didn’t apply here: in that case, Anthropic had agreed to hand over the logs as part of negotiations with the plaintiffs (and OpenAI gets in a little dig at its competitor, pointing out that Anthropic apparently made no effort to protect its users’ privacy).

There have, as Daphne Keller regularly points out, always been challenges between user privacy and platform transparency. But this goes well beyond that familiar tension. We’re not talking about “platform transparency” in the traditional sense—publishing aggregated statistics or clarifying moderation policies. This is 20 million complete chatlogs, handed over “in whole” to dozens of adversarial parties and their lawyers. The potential damage to the privacy rights of those users could be massive.

And the judge just waves it all away.


Filed Under: anonymized data, chat logs, chatgpt, ona wang, privacy

Companies: ny times, openai
