FSNN | Free Speech News Network
Friday, May 29
Cryptocurrency & Free Speech Finance

AI Models Can’t Agree on Basic Facts Most of the Time, Study Shows

By News Room

In brief

  • Five frontier AI models disagreed on 67% of 1,000 real-world fact-check claims.
  • Unanimous agreement happened on only 328 claims.
  • With a Krippendorff’s alpha of 0.639, the models fall below the commonly used 0.8 reliability threshold.

Ask five of the world’s most advanced AI systems whether a statement is true, and two-thirds of the time, at least one will give you a different answer. That’s the finding of a new study published this month by researcher Kosta Jordanov at Lenz Research.

The study gave GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro with Search, and Sonar Pro the same 1,000 real-world fact-check claims submitted by actual users. The models had to pick one of four labels: true, mostly true, misleading, or false.

On 672 out of 1,000 claims, at least one model broke from the majority. In 34% of cases, the disagreement was severe: one model called a claim true while another called it false.
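The two headline counts follow directly from the per-claim verdicts: 1,000 claims minus 328 unanimous ones leaves 672 with at least one dissenting model. As a rough illustration of how such tallies could be computed, here is a minimal Python sketch. The function name `panel_stats` and the toy panels are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: tally unanimous claims and "severe" true-vs-false splits.
# The panels below are invented toy data, not verdicts from the study.

def panel_stats(panels):
    """Summarize agreement over a list of per-claim verdict lists."""
    unanimous = 0
    severe = 0
    for verdicts in panels:
        # Unanimous: every model picked the same label.
        if len(set(verdicts)) == 1:
            unanimous += 1
        # Severe split: one model said "true" while another said "false".
        if "true" in verdicts and "false" in verdicts:
            severe += 1
    total = len(panels)
    return {
        "claims": total,
        "unanimous": unanimous,
        "non_unanimous": total - unanimous,
        "severe_splits": severe,
    }


toy_panels = [
    ["true", "true", "true", "true", "true"],
    ["false", "false", "false", "false", "false"],
    ["false", "mostly true", "false", "true", "false"],
    ["mostly true", "false", "misleading", "misleading", "false"],
]

stats = panel_stats(toy_panels)
print(stats)
```

On the study's reported figures, the same tally would show 328 unanimous and 672 non-unanimous claims out of 1,000.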

“These aren’t benchmark items with public answer keys—they’re claims real users submitted for verification to a fact-checking platform,” the study reads. “Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model’s verdict is label-inconsistent under this 4-bucket rubric.”

Previous studies on AI hallucination have shown that chatbots invent facts. That’s one problem. This is a different one. The models aren’t necessarily making things up; they just can’t agree on basic factual judgments about the same material.

The research used a setup that makes the results harder for AI companies to explain away. Instead of pulling claims from standard test sets, the kind that often leak into training data, the researchers used claims submitted by real people to Lenz’s fact-checking platform. The paper notes that most of these claims “are unlikely to appear in any training corpus with a gold label attached,” with “no canonical answer key to pattern-match against, no benchmark leaderboard to anchor to.”

The statistical measure of agreement, Krippendorff’s alpha, came in at 0.639 on a scale where 1.0 means perfect agreement and 0 means agreement no better than chance. The study says this indicates “nontrivial but limited agreement.” “The models’ verdicts are structured rather than random, but not consistent enough to treat the panel as a single interchangeable judge,” the researchers note. By common convention, anything below 0.8 is considered weak.
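For readers unfamiliar with the statistic, Krippendorff’s alpha compares the disagreement actually observed among raters with the disagreement expected by chance: alpha = 1 − (observed / expected). The sketch below is a minimal Python version for nominal labels with no missing ratings. The article does not say which distance metric the study used, so the nominal treatment here is an assumption, and the function name and toy inputs are hypothetical.

```python
# Minimal sketch of Krippendorff's alpha for nominal labels.
# Assumption: labels are treated as unordered categories; the study may have
# used a different (e.g. ordinal) distance. Toy inputs are invented.
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """units: list of per-claim label lists, one label per rater."""
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a claim with a single rating carries no agreement info
        # Every ordered pair of ratings from different raters counts 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)

    if expected == 0:
        return float("nan")  # only one label ever used: alpha is undefined
    return 1 - observed / expected


perfect = krippendorff_alpha_nominal([["true"] * 5, ["false"] * 5])
chance_level = krippendorff_alpha_nominal([["a", "a"], ["a", "b"]])
print(perfect, chance_level)
```

In this toy run, full agreement yields 1.0 and the second input yields 0, the chance-level end of the scale described above.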

When all five models did agree—which happened on only 328 out of 1,000 claims—they almost never agreed that something was misleading or mostly true. Just four claims received a unanimous “misleading” verdict. Zero received unanimous “mostly true.”

The researchers provided example claims where the AI models showed the most divergence, including “The World Bank’s active portfolio in Nigeria stands an over $16.4 billion as of 2025.” GPT-5.4 said it was “mostly true” while Gemini 3 Pro called it “false” and its sister model Gemini 3 Pro + Search rated it “misleading.”

In another example, the models were provided with the claim: “Donald Trump said that an attack on Iran was postponed at the request of Gulf Allies.” GPT-5.4 said it was false, Claude Opus 4.7 called it mostly true, Gemini 3 Pro said false, and Gemini 3 Pro + Search rated it true.

“The panel converges on definitive verdicts; the middle of the rubric is where it fractures,” the researchers found. Unanimity happened almost exclusively at the extremes: either the claim was definitely true or definitely false.

This matters because people are increasingly turning to AI systems for fact-checking. If you paste a claim from a news article into ChatGPT, Claude, or Gemini, you might get three different answers. Which one do you trust?

AI companies love to tell you their models are getting more accurate. They publish benchmark scores showing steady improvement. But the Lenz study tested these models on the kind of jagged, ambiguous claims that real humans actually argue about—and found that the models argue too.

The paper is careful to point out that majority agreement is not the same as accuracy. “A majority of frontier models is not ground truth. The majority verdict is sometimes wrong; an individual dissenting model is sometimes right. We use the majority as a structural reference point for measuring disagreement, not as a stand-in for correctness.”

There’s a deeper problem buried in the numbers. When models disagree, at least one of them must be wrong; in the study’s terms, at least one verdict is “label-inconsistent under this 4-bucket rubric.” There’s no tie-breaker mechanism, no appeals court. Recent reporting on AI reliability has raised similar alarms.

On the 328 claims where all five models agreed, zero received a unanimous “mostly true.” The nuance bucket emptied out completely. If AI models can only find consensus at the extremes, can they be trusted as fact checkers at all?

News Room

The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.

At FSNN – Free Speech News Network, we deliver unfiltered reporting and in-depth analysis on the stories that matter most. From breaking headlines to global perspectives, our mission is to keep you informed, empowered, and connected.

FSNN.net is owned and operated by GlobalBoost Media, an independent media organization dedicated to advancing transparency, free expression, and factual journalism across the digital landscape.

© 2026 GlobalBoost Media. All Rights Reserved.