Morning Overview

AI trading bots lost money in every head-to-head contest with human traders — and made wildly different decisions when given identical instructions

When researchers handed the same trading instructions to dozens of identical AI agents and turned them loose in a simulated stock market, something unexpected happened: the bots did not agree with each other. They bought at different prices, sold at different times, and pushed the market further from rational values than human traders do in the same experiment. Then, in separate public competitions on Wall Street, AI models went up against professional traders and lost money across the board.

Those two findings, one from a controlled academic experiment and one from real-money contests, are now drawing scrutiny from researchers and financial professionals. Together, they represent the most concrete evidence to date that large language models are not ready to trade autonomously.

The lab experiment: identical instructions, divergent trades

A paper by researchers Jiayi Li, Yizhou Zhang, and colleagues, titled “LLM Agents Do Not Replicate Human Market Traders: Evidence From Experimental Finance,” set up a direct comparison between AI agents and decades of data on how humans behave in asset-bubble experiments. The experimental design follows the Smith-Suchanek-Williams protocol, a well-established framework in experimental economics where participants trade an asset with a known fundamental value that declines over a fixed number of periods. Humans reliably produce bubbles and crashes in this setup. The question was whether LLM agents would do the same.
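For readers unfamiliar with the design, the arithmetic is simple: the asset’s fundamental value in any period is the expected dividend times the number of periods left, so it declines to zero on a known schedule. Below is a minimal sketch of that schedule, using the dividend parameters from the classic 1988 design; the paper’s exact values may differ.

```python
# Fundamental value in a Smith-Suchanek-Williams style market: the asset
# pays a random dividend each period, so its fundamental value equals the
# expected dividend times the number of periods remaining, declining to
# zero by design. Parameters follow the classic 1988 setup; the paper's
# exact values may differ.
T = 15                                # trading periods
dividends = [0.00, 0.08, 0.28, 0.60]  # equally likely per-period payouts
expected_dividend = sum(dividends) / len(dividends)  # 0.24

for t in range(1, T + 1):
    fundamental_value = (T - t + 1) * expected_dividend
    print(f"period {t:2d}: fundamental value = {fundamental_value:.2f}")
```

Because every participant knows this schedule, any sustained gap between trading prices and the declining fundamental value is measurable mispricing, which is what makes the protocol a clean benchmark.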

They did not. In single-model markets, where many copies of the same LLM traded against each other under identical prompts, the agents failed to converge on any coherent strategy. Prices deviated from fundamental values by wider margins than in comparable human experiments. In mixed-model “battle royale” rounds, where different LLMs traded against one another, the dysfunction worsened. Price swings were amplified rather than dampened, and the paper’s market-level metrics showed consistent overreaction relative to rational benchmarks.
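The paper’s own metrics are not reproduced here, but experimental-finance studies in this tradition commonly quantify mispricing with relative absolute deviation (RAD) and relative deviation (RD), which scale the gap between prices and fundamentals by the average fundamental value. A sketch of those standard measures, on invented numbers:

```python
# Standard mispricing metrics from the experimental-finance literature
# (Stöckl et al. style); the paper's exact metrics may differ. All the
# price and fundamental-value numbers below are invented for illustration.
def rad(prices, fundamentals):
    """Relative Absolute Deviation: mean |P_t - FV_t|, scaled by mean FV."""
    mean_fv = sum(fundamentals) / len(fundamentals)
    return sum(abs(p - f) for p, f in zip(prices, fundamentals)) / (len(prices) * mean_fv)

def rd(prices, fundamentals):
    """Relative Deviation: signed version; positive means overpricing."""
    mean_fv = sum(fundamentals) / len(fundamentals)
    return sum(p - f for p, f in zip(prices, fundamentals)) / (len(prices) * mean_fv)

prices       = [3.40, 3.70, 4.20, 4.50, 3.90, 2.80, 1.90, 1.20]  # bubble-and-crash path
fundamentals = [3.60, 3.36, 3.12, 2.88, 2.64, 2.40, 2.16, 1.92]  # declining FV
print(f"RAD = {rad(prices, fundamentals):.2f}, RD = {rd(prices, fundamentals):.2f}")
```

On measures like these, a perfectly rational market scores zero; human sessions typically score above zero, and the paper reports the LLM markets landing further out still.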

The inconsistency finding is especially notable. Conventional trading software given identical inputs produces identical outputs. LLM agents do not: they typically generate decisions by sampling from a probability distribution over possible outputs, so despite receiving the same prompt and operating under the same rules, they made materially different trading decisions from one run to the next. That variability is not a minor technical quirk. For anyone building a risk model around automated trading, it means the system’s behavior cannot be reliably forecast even by its own operators.
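That sampling step is easy to illustrate. Below is a minimal sketch with invented buy/hold/sell logits standing in for a model’s token probabilities; real LLM decoding samples tokens rather than trading actions, but the mechanism is the same: any nonzero temperature makes the decision a random variable.

```python
# Why identical prompts can yield different trades: LLMs decode by
# sampling from a temperature-scaled softmax distribution. The action
# logits below are invented for illustration.
import math
import random

def sample_action(action_logits, temperature=1.0, rng=random):
    """Sample one action from softmax(logits / temperature)."""
    actions = list(action_logits)
    scaled = [action_logits[a] / temperature for a in actions]
    m = max(scaled)                               # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(actions, weights=weights)[0]

# Identical "prompt" (identical logits) on every run, yet decisions differ
# whenever temperature > 0; only temperature -> 0 recovers determinism.
action_logits = {"buy": 1.2, "hold": 1.0, "sell": 0.6}
print([sample_action(action_logits, temperature=1.0) for _ in range(10)])
print(sample_action(action_logits, temperature=1e-6))  # effectively always 'buy'
```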

What the research does and does not prove

The arXiv paper provides structured, repeatable experimental evidence. Its metrics are clearly defined, and it builds on a protocol that economists have used for decades to study market behavior. The paper has not yet undergone formal peer review, a standard caveat for preprints, but its methodology is transparent enough for other researchers to replicate or challenge.

What the paper does not prove is that AI will never be useful in financial markets. That distinction matters. The failure documented here is specific to a newer approach: deploying general-purpose large language models as autonomous decision-makers in markets. A momentum-based algorithm executing a predefined strategy is doing something fundamentally different from asking a GPT-style model to interpret market conditions and decide when to buy or sell.
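To make the contrast concrete, here is a toy momentum rule of the predefined-strategy kind; the lookback window and price history are invented for illustration. Given the same inputs, it returns the same decision every time, which is precisely the property the LLM agents in the experiment lacked.

```python
# Toy deterministic momentum rule, for contrast with a sampling LLM agent.
# Lookback and prices are illustrative, not a recommended strategy.
def momentum_signal(prices, lookback=5):
    """Buy if the price rose over the lookback window, else sell."""
    if len(prices) <= lookback:
        return "hold"
    return "buy" if prices[-1] > prices[-1 - lookback] else "sell"

history = [100, 101, 103, 102, 104, 106, 107]
print(momentum_signal(history))  # 'buy' on every run, by construction
```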

Several open questions remain. The academic experiment used a simplified market, not a live exchange with real-time news, full order books, or overnight risk. Whether LLM agents would perform differently with richer data inputs is unknown. It is also possible that fine-tuning on financial data or improved prompt engineering could narrow the gap, but no published evidence supports that possibility yet, and no major AI developer has publicly addressed why its models struggle in trading environments.

What this means for investors and regulators

For individual investors evaluating AI-powered trading tools, the practical implication is blunt: no publicly available evidence from controlled experiments shows that LLM-based bots consistently beat human traders. Before committing real capital to any product marketed as an AI trading agent, investors should demand independently verified performance data, not self-reported returns from the vendor.

For financial regulators, the inconsistency finding raises a structural concern. Risk frameworks for automated trading generally assume that a given system will behave predictably under known conditions. If LLM-based agents produce different outputs from the same inputs, that assumption breaks down. A market with significant LLM-agent participation could exhibit volatility patterns that neither the agents’ operators nor the regulators monitoring them can anticipate in advance.

The gap between the pitch and the performance

Financial firms continue to invest heavily in AI trading capabilities, and the technology may eventually improve. But as of June 2026, the documented record from controlled academic experiments runs in one direction. In the lab, LLM agents deviate further from rational pricing than humans. When given identical instructions, they cannot even agree with themselves. The burden of proof now sits with the systems claiming to trade better than people, and so far, none of them have met it.

*This article was researched with the help of AI, with human editors creating the final content.