Morning Overview

Study proposes new model for how Pavlovian learning works

A peer-reviewed article in Neurobiology of Learning and Memory is challenging a foundational assumption about how animals and humans form associations between cues and rewards. Rather than relying solely on prediction errors, the kind of signal that fires when reality deviates from expectation, the new model proposes that the brain tracks information and certainty on a trial-by-trial basis using Bayesian updating. The proposal arrives alongside experimental evidence from dopamine studies that calls into question whether the dominant reinforcement-learning framework accurately describes what neurons actually do during simple learning tasks.

Why the Rescorla-Wagner Model Falls Short

For decades, the Rescorla-Wagner model has served as the default explanation for Pavlovian conditioning. It is an error-correction framework that estimates how much a conditioned stimulus predicts an outcome. When the prediction is wrong, the error signal adjusts the association strength. When the prediction is right, learning stops. This elegant simplicity made it the backbone of computational accounts of conditioning and, later, of temporal-difference reinforcement learning in artificial intelligence.
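The error-correction logic described above can be sketched in a few lines. This is a minimal illustration of the standard Rescorla-Wagner update rule, not code from any of the studies; the learning rate and asymptote values are arbitrary assumptions chosen for clarity.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Return associative strength V after each cue-reward pairing.

    alpha is an illustrative learning rate; lam is the maximum
    association the reward can support.
    """
    v = 0.0
    history = []
    for _ in range(trials):
        error = lam - v      # prediction error: actual minus expected
        v += alpha * error   # learning is proportional to the error
        history.append(v)
    return history

strengths = rescorla_wagner(10)
# The error shrinks on every trial, so learning slows as V nears lam
# and stops entirely once predictions are accurate.
```

The key property, visible in the loop, is the one the article flags: once `v` reaches `lam`, the error is zero and nothing further is learned, regardless of timing, rarity, or rate.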

But the model has blind spots. It treats learning as a process driven entirely by the size of a mismatch between expected and actual outcomes. It does not account for the timing between events, the rarity of certain cue-outcome pairings, or the rate at which new information arrives. A recent review in Neurobiology of Learning and Memory positions this gap as a central tension in the field, contrasting connection-strength and prediction-error models against an emerging class of time-based, rate-based, and information-theoretic alternatives. The review argues that the latter class can explain phenomena that error-correction frameworks struggle with, including how organisms respond to changes in the temporal spacing of rewards.

These limitations matter because they show up in classic conditioning puzzles. Animals often learn faster when rewards are spaced out over time than when they are delivered in rapid succession, even if the total number of rewards is the same. They can also become highly sensitive to rare but informative events, such as an occasional shock following a tone, in ways that simple error-driven rules cannot easily capture. As the discrepancies pile up, researchers are increasingly looking for models that treat learning as a richer process than just adjusting the strength of a single association.

Information and Certainty Replace Simple Error Signals

The alternative framework at the center of this debate defines learning not as error correction but as information gain. An article published in eLife and archived in PubMed Central lays out a formal definition of “informativeness” as a rate ratio whose logarithm corresponds to mutual information. In plain terms, the brain asks not just “Was I wrong?” but “How much did that event reduce my uncertainty about the world?”
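The rate-ratio definition can be made concrete with a small sketch. The rates below are hypothetical numbers for illustration, not data from the eLife article; the function names are ours.

```python
import math

def informativeness(reward_rate_given_cue, background_reward_rate):
    """Rate ratio: how much the cue multiplies the reward rate."""
    return reward_rate_given_cue / background_reward_rate

def information_bits(reward_rate_given_cue, background_reward_rate):
    """Log of the rate ratio, in bits, per the mutual-information framing."""
    return math.log2(
        informativeness(reward_rate_given_cue, background_reward_rate)
    )

# A hypothetical cue that raises the reward rate from 1 to 8 per minute
# has an informativeness of 8, or 3 bits of information.
```

On this account, a cue that leaves the reward rate unchanged carries zero bits and teaches nothing, however many times it is paired with reward, which is exactly where the framework departs from pure error correction.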

This distinction changes what counts as a strong learning signal. Under prediction-error models, a fully predicted reward generates no learning because the error is zero. Under the information-theoretic account, even a predicted reward can drive learning if it arrives at an unusual rate or in a context where background uncertainty is high. The model also incorporates an anchor, a baseline expectation that shifts with experience, allowing the system to track continuous changes rather than simply toggling between “learned” and “not learned.”

The Neurobiology of Learning and Memory article describes how this information-theoretic account can be extended with trial-by-trial Bayesian updating, letting the model explain gradual, continuous shifts in conditioned responding rather than the abrupt transitions that simpler models predict. This hybrid approach treats each new experience as evidence that updates a probability distribution over possible cue-outcome relationships. Instead of a single associative strength, the learner maintains a full belief state about how likely different contingencies are, and learning speed naturally depends on both how surprising an event is and how uncertain the system was beforehand.
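One simple way to picture trial-by-trial Bayesian updating is Beta-Bernoulli inference over a binary cue-outcome contingency. This is an assumed textbook illustration of the general approach, not the article's specific model.

```python
class BetaBelief:
    """Belief over P(outcome | cue), maintained as a Beta distribution."""

    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b  # a = b = 1 is a flat (maximally uncertain) prior

    def update(self, outcome_occurred):
        # Each trial is one piece of evidence reshaping the distribution,
        # not a single associative strength being nudged up or down.
        if outcome_occurred:
            self.a += 1
        else:
            self.b += 1

    def mean(self):
        return self.a / (self.a + self.b)

    def variance(self):
        # Uncertainty shrinks smoothly as evidence accumulates.
        n = self.a + self.b
        return (self.a * self.b) / (n * n * (n + 1))
```

Because early trials arrive against high prior uncertainty, they shift the belief far more than later ones, giving the gradual, continuous change in responding that the hybrid account predicts.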

Because the model is grounded in information theory, it also connects naturally to broader neuroscience data. Neural circuits can be interpreted as encoding probability distributions and updating them when new evidence arrives. That framing dovetails with the long-standing use of Bayesian statistics in perceptual and decision-making research, now being extended to the domain of associative learning.

Dopamine Plateaus Challenge Temporal-Difference Predictions

If the information-theoretic model is correct, the neural signals underlying learning should look different from what classical reinforcement learning predicts. A study published in Nature Communications provides exactly that kind of evidence. Researchers recorded dopamine release in the dorsal striatum during simple cue-outcome learning and found that the signals did not follow the expected temporal-difference pattern.

Classical temporal-difference models predict a specific signature: early in learning, dopamine should spike at the moment of reward delivery, and as the animal learns, that spike should shift backward in time to the cue that predicts the reward. This shift is considered a defining feature of prediction-error coding. But the Nature Communications study documented prolonged “plateau” dopamine responses that persisted across learning rather than migrating cleanly from outcome to cue. The dopamine signal appeared to encode sustained outcome value, not a brief, phasic error update.
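The backward-shift signature can be reproduced in a toy temporal-difference simulation, which is what the recorded dopamine plateaus fail to match. The parameters and task structure below are illustrative assumptions, not the experimental design of the study.

```python
def td_simulation(n_episodes, n_steps=5, alpha=0.5):
    """Train a TD(0) value function on a cue-then-reward trial.

    Returns (cue_response, reward_error): the learned value at cue
    onset (a proxy for the dopamine burst when the cue follows a
    zero-value intertrial interval) and the prediction error at
    reward delivery.
    """
    values = [0.0] * (n_steps + 1)  # value per time step; terminal value is 0
    reward_error = 0.0
    for _ in range(n_episodes):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0  # reward at final step
            delta = reward + values[t + 1] - values[t]  # TD error
            values[t] += alpha * delta
            if t == n_steps - 1:
                reward_error = delta
    return values[0], reward_error

early = td_simulation(1)    # error is large at reward, absent at the cue
late = td_simulation(500)   # error migrates: large at cue, near zero at reward
```

This is the clean handoff from outcome to cue that the Nature Communications recordings did not show; the observed signal stayed elevated at outcome time instead of fading toward zero.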

The published data underscore that these plateaus were robust across animals and task conditions. Instead of disappearing once the reward became predictable, the elevated dopamine levels remained, suggesting that the system continues to register ongoing information about reward statistics rather than declaring learning complete.

Reporting from MIT’s McGovern Institute offered an accessible explanation of why these findings conflict with canonical reinforcement-learning expectations. The plateau pattern suggests that dopamine in the dorsal striatum may serve a broader role than simply teaching associations. It may instead reflect an ongoing evaluation of outcomes that persists even after the animal has learned the contingency, a pattern more consistent with information-based encoding than with error-based encoding. In that view, dopamine tracks how informative each outcome remains about the underlying structure of the task, not just how wrong the last prediction was.

Rare Events May Drive Learning More Than Repetition

One of the most striking implications of the new framework is that the brain may learn more from rare, surprising events than from repeated, predictable ones. Research highlighted by UCSF’s Weill Institute directly challenges a 100-year-old assumption about Pavlovian conditioning: that repetition is the primary driver of associative learning. The Weill Institute reporting emphasizes that the time that passes between rewards is an important variable, not just whether the reward occurs.

This finding aligns with the information-theoretic model’s core logic. A rare event carries more information precisely because it is unexpected. If the brain is tracking informativeness rather than simply correcting errors, then a single surprising pairing of a bell and food should produce a stronger learning signal than the tenth identical pairing in a row. The prediction-error model can partially capture this through large initial errors, but it cannot explain why temporal spacing between rewards independently affects learning strength once the basic contingency is known.
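The intuition that rarity confers information has a standard quantitative form, Shannon self-information, which we sketch here with hypothetical probabilities for illustration.

```python
import math

def surprise_bits(probability):
    """Shannon self-information: rarer events carry more bits."""
    return -math.log2(probability)

# A hypothetical pairing that occurs on half of all trials carries
# 1 bit; one that occurs on 1 in 64 trials carries 6 bits, so the
# rare pairing is a far stronger candidate learning signal.
common = surprise_bits(0.5)
rare = surprise_bits(1 / 64)
```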

By treating learning as Bayesian evidence accumulation, the newer framework naturally predicts that widely spaced, low-frequency rewards can be especially influential. Each occurrence dramatically reshapes the learner’s belief distribution, particularly when prior uncertainty is high. In contrast, densely packed, highly predictable rewards add little new information and therefore have diminishing impact on behavior.

Rewriting the Textbook Story of Conditioning

Taken together, the theoretical proposals and neural data point toward a broader rethinking of how associative learning works. Rather than a simple error-correction process that shuts down when predictions become accurate, learning may be a continuous, information-driven negotiation between uncertainty and evidence. Dopamine signals in the striatum, long treated as the textbook embodiment of reward prediction error, now appear to encode richer statistics about reward timing, rate, and context.

This shift has practical implications. In artificial intelligence, algorithms inspired by temporal-difference learning have powered major advances, but they may miss out on the efficiency gains that come from explicitly tracking informativeness and uncertainty. In psychiatry and neurology, conditions that involve disrupted dopamine signaling might be better understood as disorders of information processing, not just miscalibrated reward prediction errors. And in basic neuroscience, the push to integrate Bayesian principles into models of conditioning is likely to accelerate as more labs turn to tools such as large-scale neurodata repositories to test fine-grained predictions about neural dynamics.

The Rescorla-Wagner model is unlikely to disappear from textbooks; its simplicity and historical impact ensure it will remain a useful teaching tool. But as new experiments reveal how brains actually respond to time, rarity, and uncertainty, the field is moving toward a more nuanced account in which information, not just error, drives learning. The next generation of models will have to explain not only how animals come to expect rewards, but also how they decide which experiences are worth learning from in the first place.

*This article was researched with the help of AI, with human editors creating the final content.