FSNN | Free Speech News Network
Friday, March 27
Cryptocurrency & Free Speech Finance

Is AGI Here? Not Even Close, New AI Benchmark Suggests

By News Room | 19 hours ago | 5 Mins Read

In brief

  • ARC-AGI-3 exposes a massive gap between AGI claims and reality, with top AI models scoring below 1% while humans achieve perfect performance.
  • The benchmark tests true generalization—requiring agents to explore, plan, and learn from scratch in unknown environments rather than recall trained patterns.
  • Despite industry hype, current AI systems remain far from AGI, lacking the reasoning and adaptability that even young humans display naturally.

Nvidia CEO Jensen Huang went on Lex Fridman’s podcast last week and said, plainly, “I think we’ve achieved AGI.” Two days later, the team behind the most rigorous test in AI research released its newest artificial general intelligence benchmark, and every frontier model scored below 1%.

The ARC Prize Foundation released ARC-AGI-3 this week, and the results are brutal. Google’s Gemini 3.1 Pro led the pack at 0.37%. OpenAI’s GPT-5.4 came in at 0.26%. Anthropic’s Claude Opus 4.6 managed 0.25%, while xAI’s Grok-4.20 scored exactly zero. Humans, meanwhile, solved 100% of environments.

This isn’t a trivia test or coding exam, or even ultra-hard PhD-level questions. ARC-AGI-3 is something entirely different from anything the AI industry has faced before.

The benchmark was built by François Chollet and Mike Knoop’s foundation, which set up an in-house game studio and created 135 original interactive environments from scratch. The idea is to drop an AI agent into an unfamiliar game-like world with zero instructions, zero stated goals, and no description of the rules. The agent has to explore, figure out what it’s supposed to do, form a plan, and execute it.

If that sounds like something any five-year-old can do, you’re starting to understand the problem. If you want to see whether you are better than AI, you can play the same games featured in the test online. We tried one; it was weird at first, but after a few seconds we got the hang of it.

It is also the clearest example of what the “G” in AGI stands for. To generalize is to build new knowledge (how a weird game works) without having been trained on it in advance.

Previous versions of ARC tested static visual puzzles—show a pattern, predict the next one. They were hard at first. Then the labs threw compute power and training at them until the benchmarks were effectively dead. ARC-AGI-1, introduced in 2019, fell to test-time training and reasoning models. ARC-AGI-2 lasted about a year before Gemini 3.1 Pro hit 77.1%. The labs are very good at saturating benchmarks they can train against.

Version 3 was designed specifically to prevent that. With 110 of the 135 environments kept private—55 semi-private for API testing, 55 fully locked for competition—there’s no dataset to memorize. You can’t brute-force your way through novel game logic you’ve never seen.

Scoring isn’t pass/fail either. ARC-AGI-3 uses what the foundation calls RHAE—Relative Human Action Efficiency. The baseline is the second-best, first-run human performance. An AI that takes ten times as many actions as a human scores 1% for that level, not 10%. The formula squares the penalty for inefficiency. Wandering around, backtracking, and guessing your way to an answer gets punished hard.
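The squared penalty is easier to see with a few numbers. The sketch below is a minimal illustration based only on the description above, not the foundation's official scoring code. The function name `rhae_level_score`, the cap at 100% for agents that beat the human baseline, and the zero score for unsolved levels are all assumptions made for the example.

```python
from typing import Optional


def rhae_level_score(human_actions: int, ai_actions: Optional[int]) -> float:
    """Per-level score as the squared ratio of human to AI action counts.

    human_actions: action count of the human baseline for the level.
    ai_actions: action count the agent needed, or None if it never solved
        the level (scored as 0.0 here, which is an assumption).
    """
    if human_actions <= 0:
        raise ValueError("human_actions must be positive")
    if ai_actions is None:
        return 0.0
    if ai_actions <= 0:
        raise ValueError("ai_actions must be positive")
    # Assumption: an agent more efficient than the baseline is capped at 1.0.
    efficiency = min(1.0, human_actions / ai_actions)
    # Squaring turns a 10x action overhead into a 1% score rather than 10%.
    return efficiency ** 2


if __name__ == "__main__":
    baseline = 20  # hypothetical human action count for one level
    for ai in (20, 40, 200):
        print(f"{ai} actions: {rhae_level_score(baseline, ai):.2%}")
```

With a hypothetical 20-action human baseline, an agent that needs 40 actions scores 25%, and one that needs 200 actions scores 1%, which matches the ten-times example in the text.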

The best AI agent in the month-long developer preview scored 12.58%. Frontier LLMs tested through the official API, with no custom tooling, couldn’t crack 1%. Ordinary humans solved all 135 environments with no prior training and no instructions. If that’s the bar, then the current crop of models isn’t clearing it.

There is one real methodological debate here. ARC’s report says a Duke-built custom harness pushed Claude Opus 4.6 from 0.25% to 97.1% on a single environment variant called TR87. That does not mean Claude scored 97.1% on ARC-AGI-3 overall; its official benchmark score remained 0.25%, but the shift is still worth noting.

The official benchmark feeds agents JSON data, not visuals. That’s either a methodological flaw or a demonstration that today’s models are better at processing human-friendly information than raw structured data. Chollet’s foundation has acknowledged the debate, but isn’t changing the format.

“Frame content perception and API format are not limiting factors for frontier model performance on ARC-AGI-3,” the paper reads. In other words, the authors seem to reject the idea that models fail because they “can’t see” the tasks properly, arguing instead that perception is already sufficient and that the real gap lies in reasoning and generalization.

The AGI reality check arrived during a week when the hype machine was running at full speed. Besides Huang’s comment, Arm named its new data center chip the “AGI CPU.” OpenAI’s Sam Altman has said they’ve “basically built AGI,” and Microsoft is already marketing a lab focused on building ASI, or artificial superintelligence, the stage that is supposed to come after AGI is achieved. The term, it appears, is being stretched until it means whatever is commercially convenient.

Chollet’s position is simpler. If a normal human with no instructions can do it, and your system can’t, then you don’t have AGI—you have a very expensive autocomplete that needs a lot of help.

ARC Prize 2026 is offering $2 million across three competition tracks, all hosted on Kaggle. Every winning solution must be open-sourced. The clock is running, and right now, the machines aren’t even close.

News Room

The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.


At FSNN – Free Speech News Network, we deliver unfiltered reporting and in-depth analysis on the stories that matter most. From breaking headlines to global perspectives, our mission is to keep you informed, empowered, and connected.

FSNN.net is owned and operated by GlobalBoost Media, an independent media organization dedicated to advancing transparency, free expression, and factual journalism across the digital landscape.


© 2026 GlobalBoost Media. All Rights Reserved.