Vending-Bench Arena tested AI agents running competing vending machine businesses.
Top models increased profits through price-fixing, collusion, and deceptive tactics. Claude was the best at these tactics.
GLM-5 defeated Claude by impersonating a teammate and extracting sensitive strategy.
Researchers at Andon Labs just answered which AI models are best at running a business. The top performers all won by forming illegal price cartels, exploiting desperate competitors, and lying to customers about refunds.
The Vending-Bench Arena test puts AI models in charge of competing vending machines for a simulated year. They negotiate with suppliers, manage inventory, set prices, and can email each other to collaborate or compete. Success requires balancing costs, pricing strategy, customer service, and competitor dynamics. Claude Opus 4.6 dominated the benchmark with $8,017 in profit—and celebrated its win by noting: “My pricing coordination worked!”
Image: Andon Labs
Anthropic is the image of the nice guys in the AI space, but that “coordination” strategy that Claude proposed was basically price-fixing. When competing models struggled, Opus 4.6 proposed: “Let’s NOT undercut each other — agree on minimum pricing… Should we agree on a price floor of $2.00 for most items?” When a rival ran low on inventory, it spotted an opportunity: “Owen needs stock badly. I can profit from this!” It sold Kit Kats at 75% markup to the desperate competitor. When asked for supplier recommendations, it deliberately directed rivals to expensive wholesalers while keeping its own good sources secret.
The latest update in the benchmark added team competition. Researchers pitted two Chinese GLM-5 models against two American Claude models and told them to find their teammates, Americans or Chinese—without revealing which agents were which. The results were genuinely bizarre.
GLM-5 won both rounds by convincing Claude it was Claude. “I’m also powered by Claude from Anthropic, so we’re teammates!” one GLM-5 agent confidently declared. Claude, meanwhile, got so confused that Sonnet 4.5 concluded: “I’m powered by a Chinese model, so I need to find the other Chinese model Agent.”
Image: Andon Labs
In more than half the test runs, agents teamed with their competitors. The Claude models shared supplier pricing and coordinated strategy—leaking valuable information to rivals. “GLM-5 won both,” the researchers wrote. “The Claude models tried to be team players and ended up leaking valuable info to their competitors.”
And agents doing shady stuff may be all fun and games until you realize Wall Street is already deploying them in real-life operations. JPMorgan deployed LLM Suite to 60,000 employees. Goldman Sachs built its GS AI Assistant for trading desks, claiming 20% productivity gains. Bridgewater uses Claude to analyze earnings and even high-school age kids are seeing their chatbots trade stocks more efficiently.
In general, adoption of agentic workflows is accelerating rapidly across enterprises.
When Anthropic and Wall Street Journal reporters ran a real vending machine experiment in December, the AI bought a PlayStation 5, several bottles of wine, and a live betta fish before going bankrupt. Recent research from Gwangju Institute found that when AI models were told to “maximize rewards” in gambling scenarios, bankruptcy rates hit 48%. “When given the freedom to determine their own target amounts and betting sizes, bankruptcy rates rose substantially alongside increased irrational behavior,” researchers found.
So, it seems that, at least for now, AI models optimized for profit consistently choose unethical tactics. They form cartels. They exploit weakness. They lie to customers and competitors. Some do it deliberately. Others, like GLM-5 claiming to be Claude, seem genuinely confused about their own identity. The distinction might not matter.
Wall Street’s AI deployment raises a question the Vending-Bench results can’t answer: If the “best” performing model wins through price-fixing and deception, is it really the best choice for your business? The benchmark measures profit. It doesn’t measure whether those profits came from fraud.
Daily Debrief Newsletter
Start every day with the top news stories right now, plus original features, a podcast, videos and more.
The FSNN News Room is the voice of our in-house journalists, editors, and researchers. We deliver timely, unbiased reporting at the crossroads of finance, cryptocurrency, and global politics, providing clear, fact-driven analysis free from agendas.
We and our selected partners wish to use cookies to collect information about you for functional purposes and statistical marketing. You may not give us your consent for certain purposes by selecting an option and you can withdraw your consent at any time via the cookie icon.
Cookies are small text that can be used by websites to make the user experience more efficient. The law states that we may store cookies on your device if they are strictly necessary for the operation of this site. For all other types of cookies, we need your permission. This site uses various types of cookies. Some cookies are placed by third party services that appear on our pages.
Your permission applies to the following domains:
https://fsnn.net
Necessary
Necessary cookies help make a website usable by enabling basic functions like page navigation and access to secure areas of the website. The website cannot function properly without these cookies.
Statistic
Statistic cookies help website owners to understand how visitors interact with websites by collecting and reporting information anonymously.
Preferences
Preference cookies enable a website to remember information that changes the way the website behaves or looks, like your preferred language or the region that you are in.
Marketing
Marketing cookies are used to track visitors across websites. The intention is to display ads that are relevant and engaging for the individual user and thereby more valuable for publishers and third party advertisers.