Millions of people now ask AI chatbots for medical advice, legal guidance, and factual answers to high-stakes questions. The systems they rely on do not retrieve verified information from a database. They predict the next word in a sequence, and peer-reviewed research published in Nature indicates that the way these models are trained and evaluated systematically rewards confident but incorrect answers. The gap between what users expect and what the technology actually does is wide, and it becomes more consequential as these tools spread into high-stakes use.
Next-token prediction and the hallucination problem
Every major chatbot built on a large language model shares the same foundational design. The model reads a string of text and calculates a probability for every candidate next token (a word or word fragment), based on patterns absorbed during training. The architecture behind today’s models, the Transformer, builds those predictions with a mechanism called self-attention, which weighs how strongly each token in a sequence relates to the others; in the text-generating models behind chatbots, each token can attend only to the tokens that come before it. That design, described in the foundational Transformer work, is optimized for producing fluent, contextually coherent text. It is not optimized for producing true statements.
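To make the mechanism concrete, here is a minimal sketch in pure Python of causal (masked) scaled dot-product attention followed by a softmax over a tiny vocabulary. It is a simplification under stated assumptions: real Transformers learn separate query, key, and value projections, stack many attention heads and layers, and use vectors with thousands of dimensions. The token vectors, the two-word vocabulary, and the output embeddings below are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask.

    Position i attends only to positions 0..i, so a prediction never
    sees the tokens it is supposed to predict.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity between token i and each visible (earlier or same) token.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Blend the value vectors of the visible tokens by those weights.
        outputs.append([sum(w * values[j][k] for j, w in enumerate(weights))
                        for k in range(len(values[0]))])
    return outputs


def next_token_distribution(hidden, output_embeddings, vocab):
    """Score every vocabulary item against the last hidden state, then softmax."""
    logits = [sum(h * e for h, e in zip(hidden, emb)) for emb in output_embeddings]
    return dict(zip(vocab, softmax(logits)))


# Toy example: three tokens as 2-d vectors, reused as queries, keys and values
# (real models apply learned projections first).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = causal_self_attention(x, x, x)
print(next_token_distribution(hidden[-1], [[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"]))
```

Nothing in this computation refers to facts: its output is simply a probability distribution over the vocabulary.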
GPT-3, one of the most influential models to scale this approach, was trained as an autoregressive system using next-token prediction across a massive text corpus: it generates text one token at a time, with each prediction conditioned on everything that came before. The GPT-3 paper demonstrated that impressive behaviors, from translation to question-answering, emerge from this single objective applied at sufficient scale. But the system it describes has no internal fact-checking layer. The model generates answers that sound authoritative because it learned from authoritative-sounding text, not because it verified the claims it produces.
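The autoregressive loop itself is simple. The sketch below assumes a hypothetical hand-written table of next-token probabilities in place of a trained network and uses greedy decoding, the simplest strategy (deployed chatbots usually sample instead). A real model conditions on the entire preceding context, not just the last token.

```python
# Hypothetical next-token probabilities. A real model conditions on the
# whole preceding context; this toy table looks only at the last token.
TOY_MODEL = {
    "<s>": {"The": 0.9, "A": 0.1},
    "The": {"treaty": 0.6, "war": 0.4},
    "treaty": {"was": 1.0},
    "was": {"signed": 0.7, "ratified": 0.3},
    "signed": {"in": 1.0},
    "in": {"1887": 0.55, "1878": 0.45},
    "1887": {"</s>": 1.0},
    "1878": {"</s>": 1.0},
}


def generate(model, max_tokens=20):
    """Greedy autoregressive decoding: repeatedly append the likeliest token."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = model.get(tokens[-1])
        if not dist:  # no known continuation
            break
        next_token = max(dist, key=dist.get)
        if next_token == "</s>":  # end-of-sequence marker
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return " ".join(tokens[1:])


print(generate(TOY_MODEL))
```

The loop picks "1887" only because the table gives it 0.55 against 0.45; no step checks which year, if either, is correct.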
This distinction matters because the training objective treats every token the same way: the model is rewarded for assigning high probability to whatever token actually came next in the training text, whether or not that text was accurate. A correct date and an incorrect date are both just prediction targets. When a model assigns high probability to a wrong answer because similar patterns appeared frequently in its training data, it produces what researchers call a hallucination: a fluent, confident, and false statement. The user on the other end has no reliable way to distinguish that output from a genuinely accurate response.
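A short calculation shows why. Next-token cross-entropy loss is the negative log of the probability the model gave to whichever token actually appeared. In the hypothetical corpus below, an incorrect year follows a phrase more often than the correct one (the data, and which year counts as "true", are assumptions for illustration). The distribution that simply copies the training frequencies earns a lower loss than one that favors the true year.

```python
import math
from collections import Counter


def cross_entropy(predicted, target):
    """Next-token loss: negative log of the probability given to the actual token."""
    return -math.log(predicted[target])


def average_loss(predicted, targets):
    return sum(cross_entropy(predicted, t) for t in targets) / len(targets)


# Hypothetical corpus: the token that followed one phrase in 10 training
# documents. Assume "1887" is the true year and "1878" a common error.
targets = ["1878"] * 7 + ["1887"] * 3

# Distribution that simply mirrors the training frequencies.
counts = Counter(targets)
frequency_matching = {tok: n / len(targets) for tok, n in counts.items()}

# Distribution that strongly favors the true year.
truth_favoring = {"1887": 0.99, "1878": 0.01}

print(average_loss(frequency_matching, targets))  # lower loss
print(average_loss(truth_favoring, targets))      # higher loss
```

Because the loss only sees which tokens occurred, minimizing it pushes the model toward the frequent answer, not the accurate one.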
Peer-reviewed evidence that accuracy testing backfires
The problem runs deeper than occasional errors. A peer-reviewed study in Nature examining accuracy evaluation found that the very methods used to test and improve model accuracy can push models toward confident but incorrect answers. The evaluation setups designed to measure factual reliability end up incentivizing the wrong behavior: models learn to sound certain rather than to flag uncertainty. This creates a feedback loop where attempts to fix hallucinations through standard accuracy benchmarks can actually make the problem worse.
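The incentive can be illustrated with simple expected-value arithmetic. The grading rules below are hypothetical and are not taken from the study. Under plain accuracy grading, a wrong answer and "I don't know" both score zero, so answering has the higher expected score whenever the chance of being right is above zero. Deducting k points for a wrong answer makes answering worthwhile only when that chance exceeds k / (1 + k).

```python
def expected_score(p_correct, answers, wrong_penalty=0.0):
    """Expected points on one question under a hypothetical grading rule.

    p_correct: chance the model's answer is right, if it answers.
    answers: True to answer, False to reply "I don't know" (always 0 points).
    wrong_penalty: points deducted for a wrong answer; 0 means plain accuracy.
    """
    if not answers:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct, wrong_penalty=0.0):
    """Answer only when doing so beats abstaining in expectation."""
    return (expected_score(p_correct, True, wrong_penalty)
            > expected_score(p_correct, False, wrong_penalty))


# Plain accuracy: even a 5% guess beats admitting uncertainty.
print(should_answer(0.05))
# With a 1-point penalty, answering pays only above 1 / (1 + 1) = 50%.
print(should_answer(0.40, wrong_penalty=1.0), should_answer(0.60, wrong_penalty=1.0))
```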
Separate research published in the same journal developed a method for detecting hallucinations using semantic entropy, a measure of how much the meaning of a model’s answers varies when it is asked the same question several times. That work underscores that fluent text generation and factual accuracy are distinct capabilities, not two sides of the same coin. A model can produce grammatically perfect, contextually appropriate, and entirely fabricated output, because nothing in its training objective penalizes fabrication as long as each next token is statistically plausible.
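The core idea of semantic entropy can be sketched in a few lines: sample several answers to the same question, group the answers that mean the same thing, and compute the entropy of the group sizes. The published method judges equivalence with a language model that checks whether two answers entail each other; the `same_meaning` function below is a hypothetical stand-in that only compares normalized strings, so it will miss genuine paraphrases.

```python
import math


def normalize(answer):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def same_meaning(a, b):
    # Hypothetical stand-in: the published method asks a language model whether
    # each answer entails the other; string matching misses real paraphrases.
    return normalize(a) == normalize(b)


def semantic_entropy(samples):
    """Entropy over clusters of sampled answers that share a meaning."""
    clusters = []
    for s in samples:
        for cluster in clusters:
            if same_meaning(s, cluster[0]):
                cluster.append(s)
                break
        else:  # no existing cluster matched, so start a new one
            clusters.append([s])
    n = len(samples)
    probs = [len(c) / n for c in clusters]
    return sum(-p * math.log(p) for p in probs)


# Answers sampled for the same question (hypothetical).
consistent = ["Paris", "paris.", "Paris"]
scattered = ["Paris", "Lyon", "Marseille", "Nice"]
print(semantic_entropy(consistent), semantic_entropy(scattered))
```

Low entropy means the sampled answers agree in meaning; high entropy means they scatter, which the semantic entropy approach treats as a signal that the answer may be fabricated.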
Additional research has linked hallucination behavior directly to the dynamics of next-token prediction training. The core argument is that the standard cross-entropy loss function, which measures how well the model predicts the next token, does not encode any concept of factual grounding. Rethinking how models generalize from training data, rather than simply scaling them up, appears necessary to address the root cause.
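For reference, the standard objective for a training sequence of tokens $x_1, \dots, x_T$ and model parameters $\theta$ is the summed negative log-likelihood of each actual next token given the tokens before it:

```latex
\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Every term depends only on which token appeared in the training text; nothing in the formula asks whether the resulting statement is true.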
Can a different training objective fix the problem?
One idea gaining attention among researchers is that replacing or supplementing the standard next-token prediction loss with an auxiliary fact-verification signal during pre-training could meaningfully reduce hallucination rates. The reasoning is straightforward: if the training objective itself is the source of the problem, changing that objective should change the outcome. The semantic entropy framework developed in the Nature hallucination detection research offers one possible way to measure whether such a change works.
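As a purely hypothetical sketch of what "supplementing" the loss might look like (no lab has described such an objective, and the `support` scores and `weight` below are invented for illustration), one option is to add a penalty proportional to the probability mass the model places on continuations an external fact-checker marks as unsupported. Unlike a fixed penalty, this term depends on the model's own predictions, so in real training it would change what gets optimized.

```python
import math


def combined_loss(probs, target, support, weight=0.5):
    """Hypothetical objective: next-token cross-entropy plus a verification term.

    probs:   model's next-token probabilities, e.g. {"1878": 0.7, "1887": 0.3}
    target:  the token that actually appeared in the training text
    support: assumed fact-checker scores in [0, 1] for every candidate token
    weight:  assumed strength of the auxiliary term (0 = standard training)
    """
    cross_entropy = -math.log(probs[target])
    # Probability mass the model puts on continuations judged unsupported.
    # It depends on the model's predictions, so in real training it would
    # contribute to the gradient.
    unsupported_mass = sum(p * (1.0 - support[tok]) for tok, p in probs.items())
    return cross_entropy + weight * unsupported_mass


probs = {"1878": 0.7, "1887": 0.3}
support = {"1878": 0.0, "1887": 1.0}  # assume the checker rejects "1878"
print(combined_loss(probs, "1878", support, weight=0.0))  # plain cross-entropy
print(combined_loss(probs, "1878", support, weight=1.0))  # adds a 0.7 penalty
```

Where reliable support scores would come from at pre-training scale is exactly the open question; this sketch simply assumes they exist.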
No commercial AI lab has published internal training logs or objective-function details that would confirm or deny whether this approach is being tested at scale. OpenAI, Google, and Anthropic have not released raw benchmark data showing per-prompt hallucination counts before and after specific mitigation strategies. The absence of that data makes it difficult to assess whether any current production model has moved beyond the pure next-token paradigm in a way that genuinely reduces false outputs rather than masking them with post-hoc filtering.
The research record does establish several things clearly. Standard evaluation methods can make hallucinations worse, not better. The training objective itself creates the conditions for confident falsehood. And detecting hallucinations after the fact, while useful, does not address the architectural source of the problem. Whether an auxiliary verification signal applied during pre-training would reduce hallucination rates on established evaluation suites by a significant margin is a testable question. Testing it, however, requires access to training infrastructure and datasets that remain proprietary.
What chatbot users should watch for next
The practical consequence for anyone using an AI chatbot is direct: the confident tone of an answer tells you nothing reliable about its accuracy. A chatbot that states a legal statute, a drug interaction, or a historical date with apparent certainty is performing the same statistical operation whether the output is correct or wrong. The peer-reviewed evidence indicates that this is not a bug that incremental tuning will fix; it is a structural feature of how current systems are trained.
For now, the safest stance is to treat chatbot output as a draft, not a verdict. When the stakes are high, users should cross-check key claims against primary sources, official documentation, or qualified human experts. Asking the model to show its reasoning, or to provide citations, can sometimes surface inconsistencies, but those techniques do not eliminate hallucinations. The model can fabricate reasoning chains and references just as easily as it fabricates facts.
Organizations deploying chatbots into sensitive workflows face a parallel responsibility. They can limit models to retrieval-augmented setups that ground answers in verified databases, restrict use to low-risk scenarios such as drafting or summarization, and build user interfaces that emphasize uncertainty rather than confidence. Clear disclaimers that explain what the model does and, just as importantly, what it does not do are essential to narrowing the expectation gap.
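A minimal sketch of the retrieve-then-answer pattern, assuming a small hypothetical store of vetted documents and simple keyword-overlap retrieval in place of the vector search production systems typically use. The language model call is deliberately left out: the function answers only by quoting a retrieved source and declines when nothing relevant is found, which is the behavior such a setup aims to enforce.

```python
import re

# Hypothetical verified knowledge base: document id -> vetted text.
VERIFIED_DOCS = {
    "drug-label-017": "Drug X should not be combined with drug Y.",
    "statute-42": "Section 42 sets a filing deadline of 30 days.",
}


def tokenize(text):
    """Lowercased word set used for crude keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, min_overlap=2):
    """Return (doc_id, text) for the best keyword match, or None if too weak."""
    q = tokenize(question)
    best_id, best_score = None, 0
    for doc_id, text in docs.items():
        score = len(q & tokenize(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_score < min_overlap:
        return None
    return best_id, docs[best_id]


def grounded_answer(question, docs):
    """Answer only from a verified source; otherwise say so instead of guessing."""
    hit = retrieve(question, docs)
    if hit is None:
        return "I could not find this in the verified sources."
    doc_id, text = hit
    return f"{text} [source: {doc_id}]"


print(grounded_answer("Can drug X be combined with drug Y?", VERIFIED_DOCS))
print(grounded_answer("Who won the 1998 World Cup?", VERIFIED_DOCS))
```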
What comes next will depend on whether AI labs are willing to reconsider the foundations of their systems. Incremental improvements in filtering, prompt engineering, and fine-tuning can reduce the most egregious failures, but they cannot fully overcome a training objective that does not distinguish truth from plausible text. Moving beyond that limitation will require experiments that integrate factual verification into the core learning process and, crucially, transparent reporting on whether those experiments actually reduce hallucinations in practice.
Until such evidence is public, users and institutions alike will need to proceed with caution. Large language models have made it dramatically easier to generate fluent, on-demand text. They have not, so far, made it any easier to know when that text is true. Recognizing that gap, and designing policies, products, and personal habits around it, may be the most important safeguard available while the underlying science catches up.
More from Morning Overview
*This article was researched with the help of AI, with human editors creating the final content.