The Neurocritic: Dialogues and Dilbert on Prediction Errors

A new article in TICS (Trends in Cognitive Sciences) by Niv and Schoenbaum makes use of Dilbert cartoons and a Q&A format to discuss the neural correlates of prediction error signals:

The goal of this ‘dialogue style’ review (which loosely follows a series of question and answer e-mails between the authors) is to make clear to those not versed in reinforcement learning theory what temporal difference prediction errors are and how this theory interacts with neuroscientific research.

A popular view is that midbrain dopamine neurons respond to discrepancies between a predicted reward and the actual outcome (Schultz, 2006). The TICS article begins by giving a whirlwind tour of the Rescorla-Wagner model of classical conditioning, then differentiates between the Rescorla-Wagner and the temporal difference (TD) models. Next it's on to dopamine...
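For readers who want the formulas behind that distinction, the two learning rules are usually written roughly as follows. These are the standard textbook forms rather than equations copied from the TICS article, so the symbols (learning rate \alpha, maximum associative strength \lambda, discount factor \gamma) are assumptions about notation, and time-indexing conventions for the TD error vary between authors.

```latex
% Rescorla-Wagner: one update per trial, driven by the difference between
% the outcome (\lambda) and the summed prediction of all cues present.
\Delta V_i = \alpha \Bigl( \lambda - \sum_{j \in \text{cues present}} V_j \Bigr)

% Temporal difference: one error per time step within a trial, comparing
% the reward plus the discounted next prediction with the current prediction.
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t
```

The practical difference is that Rescorla-Wagner yields a single error per trial, whereas the TD error is defined at every moment, which is what allows it to be compared with the timing of phasic neural responses.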

Q5: The influential reward prediction error hypothesis of dopamine arose from comparing monkey electrophysiological data to the characteristics of a TD prediction error (Schultz et al., 1997). Today, other forms of neural recordings have targeted this same signal. What are the basic criteria for establishing that a recorded signal is, indeed, a TD prediction error?

Three criteria can be considered the ‘fingerprint’ of a reward prediction error signal: a phasic increase to unexpected rewards (a positive prediction error), no change to predicted rewards and a phasic decrease (a negative prediction error) when an expected reward is omitted (or vice-versa – decreases to positive errors and increases to negative errors).
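The three criteria can be reproduced with a small simulation. What follows is a minimal sketch of TD(0) learning, not the authors' model: the tapped-delay-line state representation (one value weight per time step since cue onset), the trial timings, the learning rate and the choice of no discounting are all assumptions made for illustration.

```python
# Minimal TD(0) sketch of the prediction error "fingerprint".
# Assumed setup (illustrative only): a cue at CUE_T predicts a reward at
# REWARD_T; the value of each moment between cue and reward is stored in
# its own weight. Before the cue and after the reward the value is 0.

CUE_T = 5        # time step at which the cue appears
REWARD_T = 15    # time step at which the reward is (or would be) delivered
N_STEPS = 20     # length of one trial
ALPHA = 0.2      # learning rate
GAMMA = 1.0      # discount factor (no discounting, for simplicity)


def td_trial(w, reward_delivered, learn=True):
    """Run one trial and return the TD error observed at every time step."""

    def value(t):
        # Predicted future reward at time t, indexed by steps since cue onset.
        k = t - CUE_T
        return w[k] if 0 <= k < len(w) else 0.0

    deltas = [0.0] * N_STEPS
    for t in range(1, N_STEPS):
        r = 1.0 if (reward_delivered and t == REWARD_T) else 0.0
        # Error registered at time t: what arrived now, plus what is now
        # predicted, minus what was predicted one step earlier.
        delta = r + GAMMA * value(t) - value(t - 1)
        deltas[t] = delta
        k_prev = (t - 1) - CUE_T
        if learn and 0 <= k_prev < len(w):
            w[k_prev] += ALPHA * delta  # update the previous state's value
    return deltas


def simulate(n_training_trials=500):
    """Return TD errors for a naive, a well-trained and an omission trial."""
    w = [0.0] * (REWARD_T - CUE_T)
    naive = td_trial(w, reward_delivered=True, learn=False)
    for _ in range(n_training_trials):
        td_trial(w, reward_delivered=True)
    predicted = td_trial(w, reward_delivered=True, learn=False)
    omitted = td_trial(w, reward_delivered=False, learn=False)
    return naive, predicted, omitted


if __name__ == "__main__":
    naive, predicted, omitted = simulate()
    print("unexpected reward, error at reward time:", round(naive[REWARD_T], 3))
    print("predicted reward, error at reward time: ", round(predicted[REWARD_T], 3))
    print("predicted reward, error at cue time:    ", round(predicted[CUE_T], 3))
    print("omitted reward, error at reward time:   ", round(omitted[REWARD_T], 3))
```

In this sketch the error is positive when an untrained model receives the reward, close to zero at the reward once it is fully predicted (with the positive error having moved to the cue), and negative at the expected reward time when the reward is withheld.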

Figure 1 (Niv & Schoenbaum, 2008). The time course of the reward, value and prediction error signals in the TD model. The first predictive stimulus is the label on the wine bottle, after which wine is poured into the glass and finally consumed. Cue-related phasic neural signals whose magnitude reflects the future predicted reward can be called prediction error signals, but sustained neural signals corresponding to the value of the predicted reward throughout the trial are designated value signals.

...although the authors acknowledge that such signals can also occur elsewhere in the brain. Yet Niv and Schoenbaum are critical of how the activations observed in neuroimaging studies are interpreted:

Q8: Correlates of prediction errors in functional imaging studies are frequently found not in the midbrain but, rather, in areas such as the striatum, amygdala, and orbitofrontal cortex. Do all these areas signal prediction errors?

This is a tricky issue that has caused much confusion. Imaging studies have indeed found blood-oxygen-level-dependent (BOLD) signals that correlate with a precise, computationally derived TD prediction error in a variety of brain areas. Furthermore, a handful of single-unit recording studies have reported that activity in other brain areas – amygdala, striatum, orbitofrontal cortex and elsewhere – is reliably modulated by whether rewards or punishments are expected...
However, current thought has it that the BOLD signal does not directly reflect firing activity in an area but, rather, correlates with the local field potential and local processing, which are driven by subthreshold activity and synaptic inputs to the area. Thus, perhaps it is appropriate to view the imaging results as reflecting the information that an area is receiving and processing, whereas single-unit activity reflects the information that an area is transmitting to downstream regions.

Speaking of BOLD critics, see Nikos Logothetis on "What we can do and what we cannot do with fMRI" (in Nature) and "Logothetis doesn’t like to be BOLD … alone", which summarizes a lecture he gave in Copenhagen (in the BRAINETHICS blog).

References

Niv Y, Schoenbaum G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences. DOI: 10.1016/j.tics.2008.03.006.

The recognition that computational ideas from reinforcement learning are relevant to the study of neural circuits has taken the cognitive neuroscience community by storm. A central tenet of these models is that discrepancies between actual and expected outcomes can be used for learning. Neural correlates of such prediction-error signals have been observed now in midbrain dopaminergic neurons, striatum, amygdala and even prefrontal cortex, and models incorporating prediction errors have been invoked to explain complex phenomena such as the transition from goal-directed to habitual behavior. Yet, like any revolution, the fast-paced progress has left an uneven understanding in its wake. Here, we provide answers to ten simple questions about prediction errors, with the aim of exposing both the strengths and the limitations of this active area of neuroscience research.

Schultz W. (2006). Behavioral theories and the neurophysiology of reward. Annu Rev Psychol. 57:87–115.

Schultz W, Dayan P, Montague PR. (1997). A neural substrate of prediction and reward. Science. 275:1593–1599.


Tuesday, June 24, 2008
