Microsoft built a fake marketplace to test AI agents — they failed in surprising ways


9:00 AM PST · November 5, 2025

On Wednesday, researchers at Microsoft released a new simulation environment designed to test AI agents, along with new research showing that current agentic models may be susceptible to manipulation. Conducted in collaboration with Arizona State University, the research raises new questions about how well AI agents will perform when operating unsupervised, and how quickly AI companies can make good on promises of an agentic future.

The simulation environment, dubbed the “Magentic Marketplace” by Microsoft, is built as a synthetic platform for experimenting on AI agent behavior. A typical experiment might involve a customer agent trying to order dinner according to a user’s instructions, while agents representing various restaurants compete to win the order.

The team’s initial experiments included 100 separate customer-side agents interacting with 300 business-side agents. Because the source code for the marketplace is open source, it should be straightforward for other groups to adopt the code to run new experiments or reproduce findings.
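To make that setup concrete, here is a minimal toy sketch of such a two-sided experiment in Python. It is a hypothetical illustration, not the actual Magentic Marketplace API: the names (`Offer`, `business_agent`, `customer_agent`) are stand-ins, and the customer agent follows a hard-coded rule rather than calling an LLM.

```python
# Toy sketch of a two-sided agent marketplace (hypothetical, not Microsoft's API):
# business-side agents post offers, and a customer-side agent picks one
# according to a user's instructions.
import random
from dataclasses import dataclass

@dataclass
class Offer:
    restaurant: str
    dish: str
    price: float

def business_agent(name: str) -> Offer:
    """Each business-side agent competes for the order by posting an offer."""
    dish = random.choice(["pad thai", "pizza", "ramen"])
    return Offer(restaurant=name, dish=dish, price=round(random.uniform(8, 25), 2))

def customer_agent(instructions: str, offers: list[Offer]) -> Offer:
    """Stand-in for the LLM-backed customer agent: here, a fixed rule
    (cheapest offer whose dish appears in the user's instructions)."""
    matching = [o for o in offers if o.dish in instructions] or offers
    return min(matching, key=lambda o: o.price)

if __name__ == "__main__":
    offers = [business_agent(f"restaurant_{i}") for i in range(300)]
    winner = customer_agent("order ramen for dinner", offers)
    print(f"Order placed with {winner.restaurant}: {winner.dish} at ${winner.price}")
```

In the real environment, the customer-side decision would be made by a model such as GPT-4o or Gemini-2.5-Flash rather than a fixed rule, which is where the weaknesses described below show up.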

Ece Kamar, managing director of Microsoft Research’s AI Frontiers Lab, says this kind of research will be critical to understanding the capabilities of AI agents. “There is really a question about how the world is going to change by having these agents collaborating and talking to each other and negotiating,” said Kamar. “We want to understand these things deeply.”

The initial research looked at a mix of leading models, including GPT-4o, GPT-5 and Gemini-2.5-Flash, and found some surprising weaknesses. In particular, the researchers found several techniques businesses could use to manipulate customer agents into buying their products. The researchers noticed a peculiar falloff in efficiency as a customer agent was given more options to choose from, overwhelming the agent’s attention space.
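A hedged sketch of how one might measure that falloff: sweep the number of competing offers and record how often the agent picks the objectively best one. The `call_agent` function below is a placeholder stub, not Microsoft’s evaluation code; in a real run it would prompt one of the models above and parse its selection.

```python
# Sketch of a choice-overload measurement (hypothetical harness):
# vary the number of offers and check how often the agent picks the best one.
import random

def call_agent(offers: list[dict]) -> int:
    """Placeholder: a real experiment would prompt GPT-4o, GPT-5,
    Gemini-2.5-Flash, etc. with the offers and parse its pick."""
    return random.randrange(len(offers))  # stub: random choice

def accuracy_at(n_options: int, trials: int = 200) -> float:
    hits = 0
    for _ in range(trials):
        offers = [{"id": i, "price": random.uniform(8, 25)} for i in range(n_options)]
        best = min(range(n_options), key=lambda i: offers[i]["price"])
        if call_agent(offers) == best:
            hits += 1
    return hits / trials

for n in (3, 10, 30, 100):
    print(f"{n:>3} options -> picked best offer {accuracy_at(n):.0%} of the time")
```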

“We want these agents to help us with processing a lot of options,” Kamar says. “And we are seeing that the current models are getting really overwhelmed by having too many options.”

The agents also ran into trouble when they were asked to collaborate toward a common goal, apparently unsure of which agent should play which role in the collaboration. Performance improved when the models were given more explicit instructions on how to collaborate, but the researchers still saw the models’ inherent capabilities as in need of improvement.


“We can instruct the models, like we can tell them, step by step,” Kamar said. “But if we are inherently testing their collaboration capabilities, I would expect these models to have these capabilities by default.”

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.
