FSNN | Free Speech News Network
Tuesday, March 31
Qwen 3.5 Omni: Alibaba’s AI Model Can Now Hear, Watch, and Clone Your Voice

By News Room
In brief

  • Alibaba’s Qwen 3.5 Omni brings true real-time omnimodal AI to the frontier race.
  • Native audio-visual processing beats stitched multimodal pipelines in speed and coherence.
  • Voice cloning, semantic interruption, and vibe coding signal a shift toward fully interactive AI agents.

Alibaba just dropped its most ambitious AI upgrade yet.

The company’s Qwen team released Qwen 3.5 Omni on Sunday, a new version of its “omnimodal” AI that processes text, images, audio, and video simultaneously and talks back in real time across 36 languages. The release puts the model in direct competition with the latest state-of-the-art foundation models.

1/10 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI.
Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction.
A standout feature:
Audio-Visual Vibe… pic.twitter.com/fWWyTl9cPY

— Tongyi Lab (@Ali_TongyiLab) March 30, 2026

“Omni” isn’t just a marketing buzzword here. Most AI models you interact with are primarily text-in, text-out systems. Some handle images, some handle voice. Qwen 3.5 Omni handles all of them natively, at the same time, without the need to convert everything to text through third-party tools.

The new model comes in three sizes—Plus, Flash, and Light—all supporting a small (by today’s standards) 256,000-token context window. It was trained on over 100 million hours of audio-visual data—a scale that puts it in a different weight class from most competitors.

Qwen 3.5 Omni is an evolution of Qwen 3 Omni Flash, Alibaba’s previous omnimodal model released in December 2025. That version already impressed with its ability to process video and audio simultaneously—it could handle image editing instructions combining multiple visual inputs in ways competitors couldn’t—and streamed voice responses with latency as low as 234 milliseconds.

It was also the first model to attempt an alternative to Google’s NotebookLM. The attempt partly worked, but the quality was not on par with Google’s offering.

Qwen 3.5 Omni takes all of that and adds a longer context window, better reasoning, a much wider language library, and a set of real-time interaction features the previous generation didn’t have.

The headline upgrade is what happens when you actually talk to it. Qwen3.5-Omni now supports semantic interruption: It can tell the difference between you saying “uh-huh” mid-sentence and actually wanting to cut in, so it won’t stop mid-thought every time someone coughs in the background, making spoken interaction more seamless.

A new technique called ARIA, short for Adaptive Rate Interleave Alignment, also fixes a subtle but persistent annoyance: AI systems that garble numbers or unusual words when reading aloud. ARIA dynamically syncs text and speech to keep output natural and accurate.

Then there’s voice cloning. Users can upload a voice sample and have the model adopt that voice in its responses, a feature that puts Qwen directly in competition with ElevenLabs and other dedicated voice tools. We weren’t able to test it, though, because the feature is, at least for now, only available via API.

On multilingual voice stability benchmarks, Qwen 3.5 Omni Plus beat ElevenLabs, GPT-Audio, and Minimax across 20 languages. The model also now supports real-time web search, meaning it can answer questions about breaking news or live market data without pretending it already knows.

The team is also highlighting what they’re calling “Audio-Visual Vibe Coding”: the model can watch a screen recording or video of a coding task and write functional code based purely on what it sees and hears, with no text prompt required. It’s a small preview of how AI assistants might eventually operate inside your workflow rather than alongside it.

To understand what “omnimodal” actually means in practice, we ran a quick test: We fed both Qwen3.5-Omni and ChatGPT 5.4 in “thinking” mode the same YouTube Short—a clip of Dastan President (Dastan is Decrypt’s parent company) and commentator Farokh discussing breaking news. Qwen 3.5 Omni processed the video natively and returned a full analysis in about one minute: who was speaking, what they were discussing, and a substantive comment on the topic based on its own knowledge of the subject area.

ChatGPT 5.4, which is not omnimodal, had to manage with what it got. It extracted frames from the video, ran them through a vision model, used Whisper to transcribe the audio, and applied an OCR tool to read embedded subtitles—three separate processes stitched together to approximate what Qwen3.5-Omni does in a single pass. The result took nine minutes, and that’s under ideal conditions: a well-lit video with clean audio and burned-in subtitles. Real-world content rarely offers all three.

In our quick tests across multiple inputs, the model also handled prompts in Spanish, Portuguese, and English without issue—switching languages mid-conversation without losing context.

On standard benchmarks, Qwen 3.5 Omni Plus outperformed Gemini 3.1 Pro on general audio understanding, reasoning, and translation tasks, and matched it on audio-visual comprehension. Speech recognition now covers 113 languages and dialects—up from 19 in the previous generation.

This is Alibaba’s second major AI release in six weeks. In February, it launched Qwen 3.5, a text-and-vision model that matched or beat frontier models on reasoning and coding benchmarks—part of a streak that has also included Qwen Deep Research and a lineup of tools rivaling OpenAI and Google. Qwen 3.5 Omni extends that momentum into full multimodal territory, at a time when every major AI lab is racing to build systems that handle the full spectrum of human communication—not just words on a screen.

The model is available now via Alibaba Cloud’s API and can be tested directly at Qwen Chat or through Hugging Face’s online demo.
