Fake disease papers fooled AI tools into citing a condition that isn’t real

A disease called bixonimania does not exist. It has no symptoms, no patients, no clinical history. But for a stretch of time in early 2026, anyone who asked certain AI chatbots about it would have received a confident, detailed answer, because the tools had already absorbed fake scientific papers describing the condition as though it were real.

Researchers at the University of Gothenburg in Sweden created bixonimania as a deliberate test. They wrote two preprints, papers posted online before formal peer review, that described the fabricated disorder in the style of legitimate medical research. Then they uploaded the documents to open-access servers where AI training pipelines routinely harvest data. Within weeks, popular AI chatbots were treating bixonimania as a genuine medical condition, relaying invented symptoms and context to users as though citing established science.

“We wanted to see how far a completely fabricated condition could travel through the AI information ecosystem,” one member of the Gothenburg research team told Nature. The answer turned out to be further than many in the field expected.

The contamination did not stop with chatbots. A paper published in Cureus, an open-access medical journal known for its rapid review process, cited the bogus preprints as genuine scholarship. That paper was later retracted once the fabrication trail surfaced. A Nature daily briefing in April 2026 documented the full chain, linking directly to both fake preprints and the retracted Cureus article.

How two fake papers traveled so far

The Gothenburg team did not need to hack an AI system or bribe a journal editor. They exploited a structural weakness: large language models pull from vast pools of online text, and preprint servers are part of that pool. Preprints exist precisely because they have not yet passed peer review, but AI systems do not reliably distinguish between a vetted study in The Lancet and an unreviewed manuscript posted last Tuesday.

This vulnerability has a documented parallel. A study published in Nature Medicine found that medical large language models are vulnerable to data-poisoning attacks, where even tiny proportions of corrupted training data can shift a model’s outputs toward harmful information. That study tested a different mechanism, adversarial injection of poisoned data directly into a model’s training set; the bixonimania experiment instead relied on passive placement, posting plausible-looking documents on the open web where data pipelines would find them on their own. The underlying lesson is the same: these systems lack robust defenses against contaminated inputs, whether those inputs arrive through direct injection or through the normal ingestion of publicly available text.

A separate evaluation tool called SourceCheckup, developed by researchers affiliated with Cornell, tested whether retrieval-augmented generation, the feature designed to ground chatbot answers in real sources, actually prevents unsupported claims. It found that large language models still produce medical citations that do not hold up when checked against the documents they supposedly reference. That finding helps explain why bixonimania slipped through: the safeguard meant to catch exactly this kind of error is itself unreliable. It is worth noting that the SourceCheckup paper is itself a preprint hosted on arXiv and has not yet undergone formal peer review, a limitation readers should weigh when evaluating its conclusions.

What no one has answered yet

The most uncomfortable question hanging over this experiment is scale. No public data shows how many people encountered bixonimania through a chatbot before the Gothenburg team revealed the test. It could have been dozens. It could have been tens of thousands. Without access logs from AI providers, the actual reach remains unknown, as does any downstream harm to users who may have acted on the fabricated information.

AI developers have been largely silent. As of May 2026, no detailed public statement from OpenAI, Google, or other major chatbot operators has addressed specific safeguards against fabricated medical sources in connection with this experiment. Whether any provider updated its retrieval or filtering systems after the disclosure is not documented in available reporting. Researchers involved in the study have noted that this silence itself is telling, suggesting that companies may not yet have effective countermeasures in place.

The Cureus retraction raises its own set of unresolved questions. How did the fake preprints survive the journal’s review process? Did Cureus conduct an internal investigation, and if so, what did it find? Could similar citation chains, where fabricated sources prop up published papers, already exist in other journals without anyone noticing? Nature’s reporting confirms the retraction but does not include Cureus’s own account of the failure.

Then there is the question the researchers themselves have raised: if a pair of obviously fake papers, designed with intentional markers of fabrication, can travel this far, what happens when someone with real resources and no intention of disclosure does the same thing? The Nature Medicine poisoning study frames that scenario as a known risk. No one has publicly documented it happening in the wild.

What this means for anyone using AI for health questions

The bixonimania case does not prove that every medical answer from a chatbot is wrong. Most queries about well-established conditions will pull from a deep base of legitimate literature, and the answers will often be broadly accurate. What the experiment does prove is that the systems have no reliable way to reject a convincing fake, and that a fabricated condition can survive long enough in the information ecosystem to reach a peer-reviewed journal before anyone catches it.

For users, the practical response is straightforward but worth stating plainly: if a chatbot describes a condition you have never heard of, check it against PubMed, institutional clinical guidelines, or a physician before acting on it. If the chatbot does not link to a recognizable source, treat the answer as unverified. The gap between what these tools sound like (authoritative, precise, confident) and what they actually are (pattern-matching systems with no ability to evaluate truth) is exactly the gap that the Gothenburg researchers walked through.

*This article was researched with the help of AI, with human editors creating the final content.