FSNN | Free Speech News Network
Monday, June 8
China’s Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude

By News Room | 2 hours ago | 4 Mins Read
In brief

  • Xiaomi and inference partner TileRT have broken 1,000 tokens per second on a 1-trillion-parameter model, a first at that scale, using a standard 8-GPU commodity node—not custom chips.
  • The speed comes from FP4 quantization on the model’s expert layers and DFlash speculative decoding, which proposes a full block of tokens in one pass instead of one at a time.
  • A limited API trial runs June 9 through June 23, priced at 3× standard MiMo rates for roughly 10× the generation speed.

Most people know Xiaomi as the Chinese phone brand. The one that makes cheap electric scooters and air purifiers. Not exactly the company you’d expect to break a major AI inference speed record on a Monday morning.

And yet. Xiaomi just released MiMo-V2.5-Pro-UltraSpeed, a serving mode for its trillion-parameter flagship that hits over 1,000 tokens per second—peaking near 1,200 in demos.

Parameters are the internal numerical weights that define how a model thinks—the more you have, the more complex the patterns it can recognize. Tokens are the chunks of text the model reads and writes, roughly three-quarters of a word each on average.

Xiaomi did it on a single 8-GPU commodity node. Standard hardware, no custom chips. That changes the calculus for who can actually deploy this kind of speed in production.

To put that number in context: per Artificial Analysis, GPT-5.5, the model most ChatGPT users are actually talking to, sits at 68 tokens per second. Claude Opus 4.6 lands around 71, while the lower-end Haiku model reaches 98. Gemini Flash hits 192. MiMo-V2.5-Pro-UltraSpeed does 1,000, on a model that matches Opus on coding benchmarks.
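
As a quick check on the headline ratio, the quoted speeds can be divided out directly. This is only arithmetic on the figures cited above; the dictionary and function names are made up for illustration, and "Claude Haiku" is left unversioned because the article does not name a version.

```python
# Decode speeds in tokens per second, as quoted in the article.
SPEEDS_TPS = {
    "GPT-5.5": 68,
    "Claude Opus 4.6": 71,
    "Claude Haiku": 98,
    "Gemini Flash": 192,
    "MiMo-V2.5-Pro-UltraSpeed": 1000,
}

FAST = "MiMo-V2.5-Pro-UltraSpeed"


def speedup_over(baseline: str, fast: str = FAST) -> float:
    """Return how many times faster `fast` is than `baseline`."""
    return SPEEDS_TPS[fast] / SPEEDS_TPS[baseline]


for name in SPEEDS_TPS:
    if name != FAST:
        print(f"{name}: {speedup_over(name):.1f}x")
# GPT-5.5: 14.7x
# Claude Opus 4.6: 14.1x
# Claude Haiku: 10.2x
# Gemini Flash: 5.2x
```

The "15X" in the headline corresponds to the GPT-5.5 comparison, rounded up from about 14.7.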

Cerebras and Groq built entire businesses around this problem. Cerebras designed a wafer-scale chip the size of a dinner plate, packing 44GB of on-chip memory to eliminate the bandwidth bottleneck that slows down GPU inference. It hit 969 tokens per second on Meta’s Llama 3.1 405B. That is impressive, but it is a 405-billion-parameter model, less than half the size of MiMo-V2.5-Pro. Groq’s custom Language Processing Unit architecture tops out around 300–750 tokens per second, depending on the model.

Neither runs on hardware you can rent from AWS tonight.

Xiaomi did it on commodity GPUs through software alone—a combination of model-level tricks and a purpose-built inference engine called TileRT.

What’s actually going on under the hood

Two techniques carry the speed. The first is FP4 quantization: instead of running the model at full 8-bit or 16-bit numerical precision, Xiaomi shrinks the expert layers, which make up most of the 1 trillion parameters, down to 4-bit. Memory footprint drops, bandwidth pressure drops, speed goes up. The catch is usually a small quality degradation. Xiaomi’s fix is surgical: only the expert layers get compressed; everything else stays at full precision. With this approach, quality loss is described as near-zero.
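
To make the idea concrete, here is a minimal sketch of 4-bit float quantization. The article does not say which FP4 variant Xiaomi uses, so this assumes the common E2M1 layout (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one shared scale per block of weights. The function names are hypothetical and this is not Xiaomi's implementation.

```python
# Non-negative magnitudes representable in the E2M1 4-bit float format.
# Assumption: the article does not specify which FP4 variant is used.
FP4_E2M1 = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(weights):
    """Quantize one block of weights to FP4 values with a shared scale.

    The scale maps the largest magnitude in the block onto 6.0, the
    largest FP4 value; every weight is then rounded to the nearest
    representable magnitude, keeping its sign.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / FP4_E2M1[-1] if max_abs > 0 else 1.0
    codes = []
    for w in weights:
        target = abs(w) / scale
        nearest = min(FP4_E2M1, key=lambda v: abs(v - target))
        codes.append(nearest if w >= 0 else -nearest)
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights from FP4 codes and the block scale."""
    return [c * scale for c in codes]


codes, scale = quantize_block([1.2, -0.3, 6.0, -3.0])
print(codes, scale)  # [1.0, -0.5, 6.0, -3.0] 1.0
```

Each weight now needs only 4 bits plus a share of the block scale, which is where the memory and bandwidth savings come from. The rounding error (1.2 became 1.0 above) is the quality cost the article refers to.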

The second is DFlash speculative decoding. Normal speculative decoding has a small draft model guess the next few tokens, then the big model verifies them in parallel. DFlash skips the sequential drafting entirely: it fills a whole block of masked positions in a single forward pass. In coding tasks, the big model accepts an average of 6.3 out of 8 proposed tokens per verification round. That’s roughly six tokens confirmed in one step instead of one.
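
The verification half of that loop can be sketched in a few lines. This is the standard greedy acceptance rule for speculative decoding, not DFlash itself: the article does not describe how DFlash drafts or verifies, so the draft block here is simply a list passed in, and `toy_target_next` is a made-up stand-in for the big model. A real system scores all drafted positions in one parallel forward pass; the loop below only simulates that.

```python
def verify_block(target_next, prefix, draft_block):
    """Return the tokens confirmed in one verification round.

    Accepts the longest run of drafted tokens that match what the
    target model would have produced itself, then appends one token
    from the target: a correction at the first mismatch, or a bonus
    token if the whole block was accepted.
    """
    accepted = []
    context = list(prefix)
    for tok in draft_block:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # target's correction ends the round
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # every draft token passed
    return accepted


def toy_target_next(context):
    """Hypothetical target model: always counts upward modulo 10."""
    return (context[-1] + 1) % 10


print(verify_block(toy_target_next, [0], [1, 2, 3, 9]))  # [1, 2, 3, 4]
print(verify_block(toy_target_next, [0], [1, 2, 3, 4]))  # [1, 2, 3, 4, 5]
```

The output always matches what the target model would have generated on its own, so under this rule a good draft adds speed without changing the result.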

TileRT ties it together. It keeps the entire compute pipeline continuously resident inside the GPU—no per-operator launch overhead, no execution gaps.

Xiaomi calls this approach “extreme model-system codesign,” and the phrase is accurate: no single technique gets to 1,000 tokens per second, but the combination does.

MiMo-V2.5-Pro is a frontier-level model. We covered the V2.5 Pro launch in April—it matches Claude Opus on most coding benchmarks and runs at roughly $0.43 input / $0.87 output per million tokens. Opus costs $5 input / $25 output per million tokens.

UltraSpeed accelerates that exact MiMo V2.5 Pro model, not a stripped-down version.

Fast enough inference changes how you can use a model. You can run dozens of reasoning paths in parallel instead of waiting on one answer. Fraud detection, trading signal generation, and real-time agent loops all have hard latency constraints that 60 tokens per second can’t meet. At 1,000 tokens per second, those workloads become feasible.
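
A back-of-the-envelope latency check shows why the decode rate matters. The 500-token response and 1-second budget below are hypothetical numbers chosen for illustration, and the calculation ignores time to first token and network overhead.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `num_tokens` at a steady decode rate."""
    return num_tokens / tokens_per_second


def fits_budget(num_tokens: int, tokens_per_second: float,
                budget_seconds: float) -> bool:
    """True if the full response arrives within the latency budget."""
    return generation_seconds(num_tokens, tokens_per_second) <= budget_seconds


# Hypothetical workload: a 500-token response with a 1-second budget.
for tps in (60, 1000):
    seconds = generation_seconds(500, tps)
    print(f"{tps} tok/s: {seconds:.2f}s, fits: {fits_budget(500, tps, 1.0)}")
# 60 tok/s: 8.33s, fits: False
# 1000 tok/s: 0.50s, fits: True
```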

Xiaomi is pricing the speed at 3 times the standard MiMo-V2.5-Pro rate for roughly 10 times the output speed. The API trial runs June 9–23, application-based, with priority given to enterprise and professional developers. The FP4-DFlash checkpoint is already open-sourced on Hugging Face for community testing.
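
For a rough sense of what the trial costs, the rates quoted in this article can be multiplied out. This sketch assumes the 3x multiplier applies equally to input and output tokens, which the article does not spell out, and it uses the approximate standard rates cited above. The names are illustrative.

```python
# USD per million tokens, approximate figures as quoted in the article.
MIMO_STANDARD = {"input": 0.43, "output": 0.87}
OPUS = {"input": 5.00, "output": 25.00}

# Assumption: the 3x trial multiplier applies to both input and output.
ULTRASPEED_MULTIPLIER = 3
MIMO_ULTRASPEED = {k: v * ULTRASPEED_MULTIPLIER for k, v in MIMO_STANDARD.items()}


def request_cost(input_tokens: int, output_tokens: int, rates: dict) -> float:
    """Cost in USD for one workload at the given per-million-token rates."""
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000


# One million input tokens plus one million output tokens on each tier.
for label, rates in (("MiMo standard", MIMO_STANDARD),
                     ("MiMo UltraSpeed", MIMO_ULTRASPEED),
                     ("Claude Opus", OPUS)):
    print(f"{label}: ${request_cost(1_000_000, 1_000_000, rates):.2f}")
# MiMo standard: $1.30
# MiMo UltraSpeed: $3.90
# Claude Opus: $30.00
```

Under these assumptions, the UltraSpeed tier would still cost a fraction of the Opus rates quoted above.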

News Room
The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.

FSNN.net is owned and operated by GlobalBoost Media, an independent media organization dedicated to advancing transparency, free expression, and factual journalism across the digital landscape.

© 2026 GlobalBoost Media. All Rights Reserved.