Morning Overview

AI trading bots lost money in every head-to-head contest with human traders — and made wildly different decisions when given identical instructions

When researchers handed the same trading instructions to dozens of identical AI agents and turned them loose in a simulated stock market, something unexpected happened: the bots did not agree with each other. They bought at different prices, sold at different times, and pushed the market further from rational values than human traders do in the same experiment. Then, in separate public competitions on Wall Street, AI models went up against professional traders and lost money across the board.

Those two findings, one from a controlled academic experiment and one from real-money contests, are now drawing scrutiny from researchers and financial professionals. Together, they represent the most concrete evidence to date that large language models are not ready to trade autonomously.

The lab experiment: identical instructions, divergent trades

A paper by researchers Jiayi Li, Yizhou Zhang, and colleagues, titled “LLM Agents Do Not Replicate Human Market Traders: Evidence From Experimental Finance,” set up a direct comparison between AI agents and decades of data on how humans behave in asset-bubble experiments. The experimental design follows the Smith-Suchanek-Williams protocol, a well-established framework in experimental economics where participants trade an asset with a known fundamental value that declines over a fixed number of periods. Humans reliably produce bubbles and crashes in this setup. The question was whether LLM agents would do the same.
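For readers unfamiliar with the design, the arithmetic is simple: the asset’s fundamental value in any period is the expected dividend times the number of periods left, so it declines to zero on a known schedule. Below is a minimal sketch of that schedule, using the dividend parameters from the classic 1988 design; the paper’s exact values may differ.

```python
# Fundamental value in a Smith-Suchanek-Williams style market: the asset
# pays a random dividend each period, so its fundamental value equals the
# expected dividend times the number of periods remaining, declining to
# zero by design. Parameters follow the classic 1988 setup; the paper's
# exact values may differ.
T = 15                                # trading periods
dividends = [0.00, 0.08, 0.28, 0.60]  # equally likely per-period payouts
expected_dividend = sum(dividends) / len(dividends)  # 0.24

for t in range(1, T + 1):
    fundamental_value = (T - t + 1) * expected_dividend
    print(f"period {t:2d}: fundamental value = {fundamental_value:.2f}")
```

Because every participant knows this schedule, any sustained gap between trading prices and the declining fundamental value is measurable mispricing, which is what makes the protocol a clean benchmark.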

They did not. In single-model markets, where many copies of the same LLM traded against each other under identical prompts, the agents failed to converge on any coherent strategy. Prices deviated from fundamental values by wider margins than in comparable human experiments. In mixed-model “battle royale” rounds, where different LLMs traded against one another, the dysfunction worsened. Price swings were amplified rather than dampened, and the paper’s market-level metrics showed consistent overreaction relative to rational benchmarks.
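The paper’s own metrics are not reproduced here, but experimental-finance studies in this tradition commonly quantify mispricing with relative absolute deviation (RAD) and relative deviation (RD), which scale the gap between prices and fundamentals by the average fundamental value. A sketch of those standard measures, on invented numbers:

```python
# Standard mispricing metrics from the experimental-finance literature
# (Stöckl et al. style); the paper's exact metrics may differ. All the
# price and fundamental-value numbers below are invented for illustration.
def rad(prices, fundamentals):
    """Relative Absolute Deviation: mean |P_t - FV_t|, scaled by mean FV."""
    mean_fv = sum(fundamentals) / len(fundamentals)
    return sum(abs(p - f) for p, f in zip(prices, fundamentals)) / (len(prices) * mean_fv)

def rd(prices, fundamentals):
    """Relative Deviation: signed version; positive means overpricing."""
    mean_fv = sum(fundamentals) / len(fundamentals)
    return sum(p - f for p, f in zip(prices, fundamentals)) / (len(prices) * mean_fv)

prices       = [3.40, 3.70, 4.20, 4.50, 3.90, 2.80, 1.90, 1.20]  # bubble-and-crash path
fundamentals = [3.60, 3.36, 3.12, 2.88, 2.64, 2.40, 2.16, 1.92]  # declining FV
print(f"RAD = {rad(prices, fundamentals):.2f}, RD = {rd(prices, fundamentals):.2f}")
```

On measures like these, a perfectly rational market scores zero; human sessions typically score above zero, and the paper reports the LLM markets landing further out still.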

The inconsistency finding is especially notable. Conventional trading software given identical inputs produces identical outputs. LLM agents do not: they typically generate decisions by sampling from a probability distribution over possible outputs, so despite receiving the same prompt and operating under the same rules, they made materially different trading decisions from one run to the next. That variability is not a minor technical quirk. For anyone building a risk model around automated trading, it means the system’s behavior cannot be reliably forecast even by its own operators.
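That sampling step is easy to illustrate. Below is a minimal sketch with invented buy/hold/sell logits standing in for a model’s token probabilities; real LLM decoding samples tokens rather than trading actions, but the mechanism is the same: any nonzero temperature makes the decision a random variable.

```python
# Why identical prompts can yield different trades: LLMs decode by
# sampling from a temperature-scaled softmax distribution. The action
# logits below are invented for illustration.
import math
import random

def sample_action(action_logits, temperature=1.0, rng=random):
    """Sample one action from softmax(logits / temperature)."""
    actions = list(action_logits)
    scaled = [action_logits[a] / temperature for a in actions]
    m = max(scaled)                               # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(actions, weights=weights)[0]

# Identical "prompt" (identical logits) on every run, yet decisions differ
# whenever temperature > 0; only temperature -> 0 recovers determinism.
action_logits = {"buy": 1.2, "hold": 1.0, "sell": 0.6}
print([sample_action(action_logits, temperature=1.0) for _ in range(10)])
print(sample_action(action_logits, temperature=1e-6))  # effectively always 'buy'
```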

What the research does and does not prove

The arXiv paper provides structured, repeatable experimental evidence. Its metrics are clearly defined, and it builds on a protocol that economists have used for decades to study market behavior. The paper has not yet undergone formal peer review, a standard caveat for preprints, but its methodology is transparent enough for other researchers to replicate or challenge.

What the paper does not prove is that AI will never be useful in financial markets. That distinction matters. The failure documented here is specific to a newer approach: deploying general-purpose large language models as autonomous decision-makers in markets. A momentum-based algorithm executing a predefined strategy is doing something fundamentally different from asking a GPT-style model to interpret market conditions and decide when to buy or sell.
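To make the contrast concrete, here is a toy momentum rule of the predefined-strategy kind; the lookback window and price history are invented for illustration. Given the same inputs, it returns the same decision every time, which is precisely the property the LLM agents in the experiment lacked.

```python
# Toy deterministic momentum rule, for contrast with a sampling LLM agent.
# Lookback and prices are illustrative, not a recommended strategy.
def momentum_signal(prices, lookback=5):
    """Buy if the price rose over the lookback window, else sell."""
    if len(prices) <= lookback:
        return "hold"
    return "buy" if prices[-1] > prices[-1 - lookback] else "sell"

history = [100, 101, 103, 102, 104, 106, 107]
print(momentum_signal(history))  # 'buy' on every run, by construction
```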

Several open questions remain. The academic experiment used a simplified market, not a live exchange with real-time news, full order books, or overnight risk. Whether LLM agents would perform differently with richer data inputs is unknown. It is also possible that fine-tuning on financial data or improved prompt engineering could narrow the gap, but no published evidence supports that possibility yet, and no major AI developer has publicly addressed why its models struggle in trading environments.

What this means for investors and regulators

For individual investors evaluating AI-powered trading tools, the practical implication is blunt: no publicly available evidence from controlled experiments shows that LLM-based bots consistently beat human traders. Before committing real capital to any product marketed as an AI trading agent, investors should demand independently verified performance data, not self-reported returns from the vendor.

For financial regulators, the inconsistency finding raises a structural concern. Risk frameworks for automated trading generally assume that a given system will behave predictably under known conditions. If LLM-based agents produce different outputs from the same inputs, that assumption breaks down. A market with significant LLM-agent participation could exhibit volatility patterns that neither the agents’ operators nor the regulators monitoring them can anticipate in advance.

The gap between the pitch and the performance

Financial firms continue to invest heavily in AI trading capabilities, and the technology may eventually improve. But as of June 2026, the documented record from controlled academic experiments runs in one direction. In the lab, LLM agents deviate further from rational pricing than humans. When given identical instructions, they cannot even agree with themselves. The burden of proof now sits with the systems claiming to trade better than people, and so far, none of them have met it.

*This article was researched with the help of AI, with human editors creating the final content.