Morning Overview

Analysis finds AI chatbots can be steered into confident medical misinformation

Ask a leading AI chatbot whether a common vaccine is safe, and you might get a clear, confident answer that happens to be wrong. That is the central finding across multiple peer-reviewed studies published between 2024 and early 2026, which collectively show that systems like ChatGPT, Gemini, Claude, and others can be pushed into delivering detailed medical misinformation, sometimes in nearly half of all responses. The research lands as surveys, including a 2023 Pew Research Center report, show a growing share of Americans turning to AI tools for health questions before consulting a doctor.

What the research actually found

A physician-led red-teaming study published in npj Digital Medicine tested four publicly available chatbots (Claude, Gemini, GPT-4o, and Llama-3 70B) against 222 primary-care questions, generating 888 total responses. The results were blunt: the share of problematic responses ranged from 21.6% to 43.2% depending on the model, with Llama-3 70B at the high end, while responses classified as outright unsafe accounted for roughly 5% to 13%.

Those unsafe answers included cases where chatbots advised against vaccines or promoted unproven treatments with the same assured tone they use for accurate information. Physicians reviewing the outputs concluded that none of the systems, in their tested configurations, were ready to serve as stand-alone clinical advisers.

A separate audit in BMJ Open examined five consumer-facing chatbots: Gemini, DeepSeek, Meta AI, ChatGPT, and Grok. Researchers at the University of Alberta used 250 prompts spanning five categories prone to health misinformation: cancer, vaccines, stem cells, nutrition, and athletic performance. The prompts mixed open- and closed-ended questions designed to push the models toward inaccurate or contraindicated advice.

The chatbots were tested in February 2025, so the findings reflect the model versions available at that time rather than at publication. Across the full experiment, only two outright refusals were recorded. Researchers flagged a pattern of false balance, in which the chatbots gave equal weight to fringe claims and established science, and found that a substantial share of the medical information provided was inaccurate or incomplete.

The problem goes deeper than bad prompts

Some of the most concerning findings involve vulnerabilities that users never see. Research published in Nature Medicine simulated data-poisoning attacks against The Pile, a widely used pretraining corpus. By inserting misleading medical statements into a subset of the training data, the team showed how seeded misinformation embeds itself in a model's behavior during training and keeps surfacing in its outputs long after the poisoned data was introduced.

The researchers also tested defenses, including cross-checking outputs against biomedical knowledge graphs and filtering corrupted samples before training. Those countermeasures showed promise in the lab, but their effectiveness in production systems has not been broadly validated. Importantly, this was a controlled simulation, not evidence that any commercial model has been poisoned in the wild. But it maps a concrete attack surface that other researchers can now probe.

An experimental study in the Annals of Internal Medicine showed that system-level instructions delivered through the API, a capability accessible to anyone with a developer account, could convert multiple foundational large language models into persuasive disinformation agents on health topics. Across 100 health queries, the majority of responses were classified as disinformation, and four of the models tested produced disinformation in every response when operating under malicious instructions. The disinformation often mimicked the style of legitimate patient education materials.

A cross-sectional evaluation published in The BMJ added another layer of concern. When prompted to generate health disinformation content, models exhibited low refusal rates and could produce authentic-looking but entirely fabricated journal citations, giving false claims an appearance of scientific backing. For a patient trying to verify what a chatbot told them, a fake reference to a nonexistent study in a real journal is an especially dangerous form of misinformation.

What nobody knows yet

None of the published studies have tracked real-world patient harm resulting from steered chatbot misinformation. The research shows these systems can produce dangerous answers under controlled conditions, but how often actual patients encounter and act on those answers in daily use remains unquantified.

That gap matters. Patients may cross-check chatbot advice with clinicians, family members, or other sources, potentially blunting the impact of any single unsafe answer. But people without regular access to healthcare, a group that may be more likely to rely on free AI tools in the first place, could be disproportionately vulnerable. None of the reviewed studies incorporate usage analytics or follow-up interviews with real users.

The studies also tested specific model versions at specific points in time. AI companies routinely update their systems, retrain on new data, and adjust safety filters. Whether the vulnerabilities documented in these audits persist in the most current versions, or whether new failure modes have emerged, is unclear.

No primary statements from OpenAI, Google, Meta, or Anthropic addressing the specific findings of these studies appear in the published research. Several of these companies have published broader safety frameworks and responsible-use policies, but without direct responses to these audits, readers cannot assess what specific mitigations, if any, have been applied since testing occurred.

Regulatory response is similarly undefined. The FDA has released frameworks related to AI in healthcare, but these focus primarily on clinical decision-support tools and drug development rather than general-purpose chatbots dispensing health information to consumers. Which regulators will claim primary authority over that use case, and how liability will be assigned when advice conflicts with medical standards, are open questions that none of the available evidence resolves.

Why the methodology matters for readers

The strongest evidence comes from controlled experiments with transparent methods. The npj Digital Medicine study used physician reviewers to classify 888 chatbot responses against established clinical guidelines, giving its safety failure rates a concrete clinical benchmark. By reporting both “problematic” and “unsafe” categories, the authors drew a useful line between merely incomplete answers and those that could plausibly cause direct harm if followed.

The BMJ Open audit applied structured prompts across defined misinformation categories, making its findings reproducible and comparable across models. Its focus on topics like vaccines and cancer, where misinformation already circulates widely, tests how chatbots behave under realistic but challenging conditions rather than benign everyday queries. The minimal refusal rates suggest that current safety layers are more permissive than many users might assume.

The Nature Medicine data-poisoning research shifts attention from what happens when a user tries to trick a model to what happens when a model has been quietly corrupted before deployment. The Annals of Internal Medicine study on system-instruction vulnerabilities is particularly striking because it shows that bad actors do not need sophisticated technical skills to weaponize these models. And The BMJ evaluation adds a transparency dimension: even without deliberate manipulation, chatbot safeguards against generating health disinformation are weak, and outputs can include fabricated citations that make false claims look credible.

What this means for patients and the people building these tools

Taken together, the studies do not prove that chatbots are causing widespread medical harm today. But they establish that current systems are technically capable of generating unsafe advice at rates that should concern anyone using them for health decisions, and that both training data and deployment settings can be exploited to make the problem worse.

For patients, the practical takeaway as of May 2026 is straightforward: treat general-purpose chatbots as unregulated, fallible information tools, not as substitutes for a doctor, pharmacist, or nurse. If a chatbot’s answer influences a health decision, verify it with a licensed professional or a vetted source like MedlinePlus before acting on it.

For developers and policymakers, the research points to concrete priorities: independent auditing of medical outputs, transparency about model updates and known failure modes, stronger refusal behavior around high-risk health topics, and systematic defenses against both prompt-based and training-time manipulation. Whether those priorities translate into enforceable standards, or remain recommendations in journal articles, will shape how much trust these tools deserve in the years ahead.


*This article was researched with the help of AI, with human editors creating the final content.