NeuroImage
journal homepage: www.elsevier.com/locate/ynimg
Institute for Communications Technology, Technische Universität Braunschweig, Schleinitzstr. 22, 38106 Braunschweig, Germany
Department of Neurology, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
Article info
Article history:
Accepted 2 November 2014
Available online 8 November 2014
Keywords:
Event-related potentials
Single-trial EEG
Free-energy principle
Bayesian brain
Surprise
Probability weighting
Abstract
Empirical support for the Bayesian brain hypothesis, although of major theoretical importance for cognitive neuroscience, is surprisingly scarce. This hypothesis posits simply that neural activities code and compute Bayesian probabilities. Here, we introduce an urn-ball paradigm to relate event-related potentials (ERPs) such as the P300 wave to Bayesian inference. Bayesian model comparison is conducted to compare various models in terms of their ability to explain trial-by-trial variation in ERP responses at different points in time and over different regions of the scalp. Specifically, we are interested in dissociating specific ERP responses in terms of Bayesian updating and predictive surprise. Bayesian updating refers to changes in probability distributions given new observations, while predictive surprise equals the surprise about observations under current probability distributions. Components of the late positive complex (P3a, P3b, Slow Wave) provide dissociable measures of Bayesian updating and predictive surprise. Specifically, the updating of beliefs about hidden states yields the best fit for the anteriorly distributed P3a, whereas the updating of predictions of observations accounts best for the posteriorly distributed Slow Wave. In addition, parietally distributed P3b responses are best fit by predictive surprise. These results indicate that the three components of the late positive complex reflect distinct neural computations. As such they are consistent with the Bayesian brain hypothesis, but these neural computations seem to be subject to nonlinear probability weighting. We integrate these findings with the free-energy principle that instantiates the Bayesian brain hypothesis.
© 2014 Elsevier Inc. All rights reserved.
Introduction
How can the brain make reliable and valid inferences about the external world based on variable sensory information? Bayesian decision
theory offers a useful theoretical framework for explaining this inference process (Jaynes, 2003; Robert, 2007). Bayesian inference constantly updates prior beliefs to posterior beliefs in light of observed data
according to probability rules (Bayes' theorem; Baldi and Itti (2010)).
Thus, it can hardly surprise that a hypothesis has been proposed according to which the brain codes and computes Bayesian probabilities (Knill
and Pouget, 2004; Friston, 2005; Doya et al., 2007; Gold and Shadlen,
2007; Kopp, 2008). While earlier research provided results that are consistent with the Bayesian brain hypothesis (Hampton et al., 2006;
Ostwald et al., 2012; Vilares et al., 2012; Lieder et al., 2013), no
agreed-upon conclusion about the utility of the Bayesian brain hypothesis as a theoretical framework for explaining cognitive functions of the
brain has been achieved in the field (Clark, 2013).
In order to test the Bayesian brain hypothesis, we tried to explain
event-related brain potentials (ERPs) during successive trials in an
urn-ball task (Phillips and Edwards, 1966) in terms of underlying
Corresponding author.
E-mail addresses: kolossa@ifn.ing.tu-bs.de (A. Kolossa), kopp.bruno@mh-hannover.de
(B. Kopp), fingscheidt@ifn.ing.tu-bs.de (T. Fingscheidt).
http://dx.doi.org/10.1016/j.neuroimage.2014.11.007
1053-8119/© 2014 Elsevier Inc. All rights reserved.
(1966) Bayesian model of the orienting response. Notice that the P3a
component of the ERP is usually considered as indicating the brain's
orienting response (e.g., Friedman et al., 2001; Barceló et al., 2002;
Nieuwenhuis et al., 2011), and Kopp and Lange's (2013) P3a data
were consistent with Sokolov's (1966) model of the orienting response.
The second hypothesis postulates that ERP responses at centro-parietal
channels (P3b) are best explained by predictive surprise. This suggestion was made by Kolossa et al. (2012) who showed that trial-by-trial
P3b amplitude fluctuations in a simple two-choice response time task
could be best explained by a computational model of predictive surprise. Assuming that these two hypotheses possess some validity, a
novel hypothesis arises according to which ERP responses at occipito-parietal channels (SW, Matsuda and Nittono (2014)) would be best
explained by postdictive surprise.
Materials and methods
Participants, experimental design, and data acquisition & analysis
Participants
Sixteen undergraduate psychology students participated to gain
course credits (15 females, 1 male). Their age ranged from 19 to 50
years (M = 24.7; SD = 9.3 years of age). Handedness was examined
with the Edinburgh Handedness Inventory (Oldfield, 1971), revealing
that one participant was left-handed and two were ambidextrous. All
participants indicated having normal or corrected-to-normal sight.
The procedure was approved by the local Ethics Committee.
Experimental design
Our urn-ball task represents a modification of tasks that were used
by Phillips and Edwards (1966) and by Grether (1980, 1992; see also
Furl and Averbeck (2011) and Achtziger et al. (2014), for a similar paradigm). There were U = 2 types of urns (labeled u = 1 and u = 2)
which could be distinguished by the distribution of the K = 2 types of
colored balls (labeled k = 1 and k = 2 for red and blue, respectively)
of the ten balls contained in one urn. The urn types represented so-called states s = u ∈ U = {1, 2} which were hidden from the participants during the experiment. The balls were so-called events k ∈ K = {1, 2}
which could be observed by the participants during the experiment. The experimental design (Fig. 1) consisted of a factorial combination of two levels of prior probabilities (Pc, Pu) and two levels of
likelihoods (Lc, Lu), yielding four experimental conditions C ∈ {PcLc,
PcLu, PuLc, PuLu}, each of which contained B = 50 episodes of random
sampling (b ∈ {1, …, 50}), each consisting of N = 4 trials (n ∈ {1, …, 4}),
yielding a total of 800 sequentially presented colored ball stimuli. All
conditions were administered to each participant, with short breaks
(approximately 3 min) between the conditions, and their order was
counterbalanced across participants. The ball colors were also
counterbalanced across participants, but we will ignore this in our description to avoid confusion.
At the beginning of each condition, the tableau of ten urns containing a total of one hundred balls was shown to the participants,
representing prior probabilities and likelihoods. The visualization of
prior probabilities and likelihoods in the form of tableaus allowed participants to build an internal representation of these probabilistic parameters. Each episode of random sampling consisted of the following
sequence of events: First, one of the ten urns was selected randomly
to form the state s = u, but the outcomes of these selections remained
hidden to the participants during the experiment. Subsequently, a random sample of four balls was sequentially drawn, with replacement,
from the selected urn, and shown one after the other, taking the form
of observations o(n) = k at each trial n.
Participants were asked to indicate the color of each ball stimulus by
pressing the left or right Ctrl key on a standard computer keyboard
(using the left or right index finger, respectively). Once a sample of
four balls had been completed, participants had to choose which type
Fig. 1. Illustration of the urn-ball paradigm and outline of the experimental setup for the condition uncertain prior probability and uncertain likelihood PuLu, i.e., P(s = 1) = 0.7, P(s = 2) = 0.3,
P(o = 1|s = 1) = P(o = 2|s = 2) = 0.7, and P(o = 2|s = 1) = P(o = 1|s = 2) = 0.3. At the beginning of a condition, the 'tableau' of urns u and balls k is shown to the participant, describing
prior probability and likelihood. Afterwards, an urn is randomly selected to constitute the hidden state s = u and not shown to the participant. From this urn, four balls are drawn consecutively
over trials n ∈ {1, 2, 3, 4} with replacement, and observed by the participants o(n) = k ∈ {1, 2}. Participants were asked to indicate the color of each ball stimulus by pressing the corresponding
key, and they had to choose which type of urn had been selected on the current sampling once a sample of four balls had been completed.
of urn had been selected on the current episode of sampling (i.e., which
urn type u constitutes the state s). They indicated their choice by pressing the left or right Ctrl key for the state being urn type s = u = 1 and
urn type s = u = 2, respectively. Stimulus-response mapping was
counterbalanced across participants (i.e., left or right hand responses indicating s = 1 and s = 2 choices). The duration of an episode of sampling, i.e., the presentation of the sample of ball stimuli, the collection
of the color responses, and the final urn choice, amounted to around
12 s to 15 s. Neither feedback nor reward was provided during the
course of the experiment.
Prior probabilities were manipulated by presenting ten urns, composed of different numbers of type u = 1 and type u = 2 urns. In uncertain prior probability conditions (Pu), seven type u = 1 urns and three
type u = 2 urns (i.e., P(s = 1) = 0.7, P(s = 2) = 0.3) were presented.
In the certain prior probability condition (Pc), nine type u = 1 urns
and one type u = 2 urn (i.e., P(s = 1) = 0.9, P(s = 2) = 0.1) were presented. On uncertain likelihood conditions (Lu), urn type u = 1
contained seven red (k = 1) and three blue (k = 2) balls (i.e., P(o =
1|s = 1) = 0.7, P(o = 2|s = 1) = 0.3), while urn type u = 2 contained
three red and seven blue balls (i.e., P(o = 1|s = 2) = 0.3, P(o = 2|s =
2) = 0.7). On certain likelihood conditions (Lc), urn type u = 1
contained nine red balls and one blue ball (i.e., P(o = 1|s = 1) =
0.9, P(o = 2|s = 1) = 0.1), while urn type u = 2 contained one red
ball and nine blue balls (i.e., P(o = 1|s = 2) = 0.1, P(o = 2|s = 2) = 0.9).
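The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus code (the experiment was run in Presentation): the function names, the seed, and the shortcut of drawing the urn type directly from P(s = 1) instead of picking one of ten physical urns are assumptions made here.

```python
import random

# Condition parameters taken from the text: P(s = 1) and P(o = 1 | s = 1).
PRIORS = {"Pc": 0.9, "Pu": 0.7}
LIKELIHOODS = {"Lc": 0.9, "Lu": 0.7}


def sample_episode(prior_s1, lik, n_trials=4, rng=random):
    """Draw one hidden urn type and n_trials ball colors (with replacement)."""
    # Hidden state: urn type 1 with probability prior_s1, otherwise urn type 2.
    state = 1 if rng.random() < prior_s1 else 2
    # P(o = 1 | s): 'lik' for urn type 1 and 1 - 'lik' for urn type 2.
    p_red = lik if state == 1 else 1.0 - lik
    balls = [1 if rng.random() < p_red else 2 for _ in range(n_trials)]
    return state, balls


def run_condition(prior_key, lik_key, n_episodes=50, seed=0):
    """Simulate all episodes of one condition, e.g. run_condition('Pu', 'Lu')."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    return [
        sample_episode(PRIORS[prior_key], LIKELIHOODS[lik_key], rng=rng)
        for _ in range(n_episodes)
    ]
```

Running all four prior/likelihood combinations with the defaults yields 4 x 50 x 4 = 800 ball stimuli, matching the total stated in the experimental design.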
Before the experiment began, each participant completed four
practice episodes of sampling under the supervision of and in a task-related dialogue with the experimenter to become accustomed to the
task. The tableau of each practice sampling consisted of one single
u = 1 urn and one single u = 2 urn (yielding uniform prior probabilities). Successful completion of these practice episodes of sampling
demonstrated that the participants understood the procedure and
their task. Visual ball stimuli were presented at the center of a computer
screen (Eizo FlexScan T766 19″; Hakusan, Ishikawa, Japan) against a gray
background (stimulus size = 1°, stimulus duration = 100 ms, stimulus
onset asynchrony = 2500 ms). The experiment was run using
Presentation (Neurobehavioral Systems, Albany, CA).
Data acquisition
The electroencephalogram (EEG) was recorded continuously, using
a QuickAmps-72 amplifier (Brain Products, Gilching, Germany) and
the Brain Vision Recorder Version 1.02 software (Brain Products,
Gilching, Germany) from frontal (F7, F3, Fz, F4, F8), fronto-central
(FCz), central (T7, C3, Cz, C4, T8), parietal (P7, P3, Pz, P4, P8), occipital
(O1, O2), and mastoid (M1, M2) sites. Ag/AgCl EEG electrodes were
used that were mounted on an EasyCap (EasyCap, Herrsching-Breitbrunn, Germany). Electrode impedance was kept below 10 kΩ.
All EEG electrodes were referenced to average reference during the
recording. Participants were informed about the problem of non-cerebral artifacts, and they were encouraged to reduce the occurrence
of movement artifacts. Ocular artifacts were monitored by means of bipolar pairs of electrodes positioned at the sub- and supraorbital ridges
\[ P(\chi \mid o) = \frac{P(o \mid \chi)\, P(\chi)}{P(o)}, \quad \chi \in X, \tag{1} \]
with likelihood P(o|χ) of the observation o given χ, and P(o) as the evidence (i.e., the probability of the observation). The belief about a hidden state before an observation represents the prior probability
P(χ) = P(s(n) = u | o_1^{n−1}) = P_u(n − 1) and the posterior probability
P(χ|o) = P(s(n) = u | o_1^n) = P_u(n) following an observation, with
o_1^{n−1} = (o(1), o(2), …, o(n − 1)) as sequence of n − 1 previous observations. The distribution of balls within the urns equals the likelihood
P(o|χ) = P(o(n) = k | s(n) = u) = L_{k|u} and the prediction of the observation constitutes the observation probability P(o) = P(o(n) = k|
Fig. 2. Hierarchical structure of the model space. There are two random variables: hidden
states s (urn type u selected at the start of an episode of sampling) and observable events k
(balls drawn). The ideal Bayesian observer updates the probability distributions over
hidden states (belief updating, BEL) and observations (prediction updating, PRE)
following Bayes' theorem (Eq. (1)). Bayesian updating and predictive surprise are
response functions that link the probability distributions to cortical activity.
Table 1
Overview and short description of the evaluated distributions and ways to implement probability weighting.
Distribution: BEL, BELSI, BELSO, PRE, PRESI, PRESO, OST (a), DIF (a)
(a) Additional models taken from literature are described in Appendices B and C, respectively.
Fig. 3. Illustration of Bayesian inference, i.e., the updating of the belief distribution P_U and of the prediction distribution P_K under uncertain prior probabilities and likelihoods Lk. Left panel: Likelihoods and probability distributions on trial n − 1. Right panel: Likelihoods and probability distributions on trial n. Horizontal lines define probability distributions of the hidden state (beliefs about urns), while vertical lines define the likelihoods (ball distribution within the urns). Colored areas visualize the resulting prediction distribution for red balls vs. blue balls, respectively. The arrow indicates the observation of a blue ball on trial n (o(n) = 2). In this case, predictive surprise about the blue ball exceeds predictive surprise about a (potential) red ball since P_{k=2}(n) < P_{k=1}(n). The observation of the (surprising) blue ball triggers Bayesian inference about the hidden state that is equivalent to shifting the horizontal line. The KL divergence between P_U(n − 1) and P_U(n) yields the scalar values used to predict trial-by-trial EEG variations based on the BEL distribution (Bayesian surprise). The resulting ratio of colored areas on trial n represents the updated prediction distribution. The KL divergence between P_K(n) and P_K(n + 1) yields the scalar values used to predict trial-by-trial EEG variations based on the PRE distribution (postdictive surprise).
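\[ P_u(n) = \frac{L_{k|u}\, P_u(n-1)}{C(n)}, \quad u \in U, \tag{2} \]
\[ C(n) = \sum_{u \in U} L_{k|u}\, P_u(n-1), \tag{3} \]
[…]
for all k ∈ K. As the urn does not change within an episode of sampling,
Eq. (4) can be simplified to
\[ P_k(n+1) = \sum_{u \in U} L_{k|u}\, P_u(n), \quad k \in K. \tag{5} \]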
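The belief and prediction updates of the ideal Bayesian observer can be sketched as follows. This is an illustrative implementation of Bayes' theorem for the two-urn, two-color case, not the authors' code; the dictionary layout and function names are assumptions, and the numbers are those of condition PuLu.

```python
def update_belief(belief, likelihood, k):
    """One step of Bayes' theorem: belief over urn types after observing ball k.

    belief:     dict u -> P_u(n - 1)
    likelihood: dict (k, u) -> L_{k|u}
    """
    unnormalized = {u: likelihood[(k, u)] * p for u, p in belief.items()}
    # The normalization constant equals the prediction P_k(n) for the observed ball.
    evidence = sum(unnormalized.values())
    return {u: v / evidence for u, v in unnormalized.items()}


def predict(belief, likelihood, events=(1, 2)):
    """Prediction distribution over the next ball: sum over u of L_{k|u} * P_u(n)."""
    return {
        k: sum(likelihood[(k, u)] * p for u, p in belief.items())
        for k in events
    }


# Condition PuLu: P(s = 1) = 0.7, P(o = 1 | s = 1) = P(o = 2 | s = 2) = 0.7.
prior = {1: 0.7, 2: 0.3}
lik = {(1, 1): 0.7, (2, 1): 0.3, (1, 2): 0.3, (2, 2): 0.7}

prediction = predict(prior, lik)            # red: 0.58, blue: 0.42
posterior = update_belief(prior, lik, k=2)  # after one blue ball
```

In this condition a single blue ball exactly cancels the prior advantage of urn type 1, so both urn types end up with a belief of 0.5.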
Hyper-parameterization using probability weighting functions
It has been shown that estimates of probabilities P (prior probabilities, i.e., beliefs about the urns before an observation and likelihoods, i.e., ball distribution within the urns, in our paradigm) by human observers vary systematically from the objective probabilities in a way that low probabilities are overestimated and high probabilities are underestimated as shown in Fig. 4. This variation is formalized via a probability weighting function w(P) which is commonly reported to be (inverse) S-shaped (Kahneman and Tversky, 1979; Prelec, 1998; Gonzalez and Wu, 1999; Zhang and Maloney, 2012; Cavagnaro et al., 2013). We use (inverse) S-shaped probability weighting as introduced in prospect theory (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Fox and Poldrack, 2009) with the weighting function family as proposed by Zhang and Maloney (2012)
[…]
due to symmetry and that γ = 1 yields the identity w(P) = P. Probability weighting can be incorporated into the observer model as a hyper-parameterization of all input of the inference (i.e., prior probability and likelihood), denoted as
\[ P^w = w(P). \]
Bayesian inference takes place as before, yielding the BELSI distribution (Eq. (2))
\[ P_u(n) = \frac{L^w_{k|u}\, P^w_u(n-1)}{C(n)}, \quad u \in U, \tag{10} \]
\[ C(n) = \sum_{u \in U} L^w_{k|u}\, P^w_u(n-1), \tag{11} \]
\[ P_k(n+1) = \sum_{u \in U} L^w_{k|u}\, P^w_u(n), \quad k \in K. \tag{12} \]
[…]
\[ P^w_u(n) = w(P_u(n)), \quad u \in U, \tag{13} \]
\[ P^w_k(n+1) = w(P_k(n+1)), \quad k \in K. \tag{14} \]
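To make the two ways of applying probability weighting concrete, the sketch below uses a symmetric linear-in-log-odds function, w(P) = P^γ / (P^γ + (1 − P)^γ). This particular closed form is an assumption made here for illustration; it belongs to the family discussed by Zhang and Maloney (2012), leaves P = 0.5 unchanged, and reduces to the identity for γ = 1, but the exact parameterization used in the study should be taken from the original article. Function names are hypothetical.

```python
def weight(p, gamma):
    """Assumed weighting function: w(P) = P^gamma / (P^gamma + (1 - P)^gamma)."""
    a = p ** gamma
    b = (1.0 - p) ** gamma
    return a / (a + b)


def bayes_update(prior, lik_k):
    """prior: dict u -> P_u(n - 1); lik_k: dict u -> L_{k|u} for the observed ball k."""
    joint = {u: lik_k[u] * prior[u] for u in prior}
    norm = sum(joint.values())
    return {u: v / norm for u, v in joint.items()}


def update_input_weighted(prior, lik_k, gamma):
    """Input weighting: weight prior and likelihood, then apply Bayes' theorem."""
    weighted_prior = {u: weight(p, gamma) for u, p in prior.items()}
    weighted_lik = {u: weight(l, gamma) for u, l in lik_k.items()}
    return bayes_update(weighted_prior, weighted_lik)


def update_output_weighted(prior, lik_k, gamma):
    """Output weighting: apply Bayes' theorem, then weight the resulting posterior."""
    return {u: weight(p, gamma) for u, p in bayes_update(prior, lik_k).items()}
```

With γ < 1 the function overweights small probabilities and underweights large ones, which is the inverse S-shape described in the text; how weighted values are carried across successive trials is not specified by this sketch.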
Baldi, 2009; Baldi and Itti, 2010). For prior and posterior distributions
over X it is
\[ D_{KL}(P_X \,\|\, P_{X|o}) = \sum_{\chi \in X} P(\chi) \ln \frac{P(\chi)}{P(\chi \mid o)}. \tag{15} \]
The Kullback-Leibler divergence between prior and posterior distributions over hidden states is called Bayesian surprise I_B. It is
obtained by setting P_U(n − 1) = P_X as prior distribution and P_U(n) = P_{X|o} as posterior distribution:
\[ I_B(n) = D_{KL}(P_U(n-1) \,\|\, P_U(n)). \tag{16} \]
[…]
\[ D_{KL}(P_K(n) \,\|\, P_K(n+1)) \tag{17} \]
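Bayesian surprise, i.e., the Kullback-Leibler divergence between the belief distributions before and after an observation, can be computed as in the following sketch. The function name and the dictionary representation are assumptions; the example distributions correspond to condition PuLu, where Bayes' theorem turns the prior (0.7, 0.3) into (0.5, 0.5) after a single blue ball.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) = sum over x of p(x) * ln(p(x) / q(x)), in nats.

    Terms with p(x) = 0 contribute nothing; q(x) must be positive wherever p(x) > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0.0)


belief_before = {1: 0.7, 2: 0.3}  # P_U(n - 1)
belief_after = {1: 0.5, 2: 0.5}   # P_U(n) after observing a blue ball
bayesian_surprise = kl_divergence(belief_before, belief_after)  # roughly 0.082 nats
```

Applying the same function to two successive prediction distributions over ball colors gives the corresponding quantity for prediction updating.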
Predictive surprise
In contrast to Bayesian updating, predictive surprise is the surprise
about the current observation o(n) at trial n being k under the prediction P_k(n) ∈ P_K(n) after a sequence o_1^{n−1} = (o(1), o(2), …, o(n − 1))
of n − 1 former observations, calculated according to (Shannon and
Weaver, 1948; Strange et al., 2005)
\[ I_P(n) = -\log_2 P_k(n). \tag{18} \]
Notice that P_k(n) is the denominator of Eq. (2), i.e., the one probability taken from the prediction distribution P_K(n) that corresponds to the
actual observation o(n). As the state is not revealed to the participant at
any time, it is not possible to calculate Shannon surprise in relation to
the state. However, the average Shannon surprise can be calculated as
the entropy of the belief distribution
\[ I_H(n) = -\sum_{u \in U} P_u(n) \log_2 P_u(n). \tag{19} \]
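Predictive surprise and the entropy of the belief distribution are both simple functions of probabilities, as this short sketch shows. Function names are illustrative assumptions; the predictions 0.58 (red) and 0.42 (blue) are the first-trial values of condition PuLu, obtained as 0.7 · 0.7 + 0.3 · 0.3 and 0.7 · 0.3 + 0.3 · 0.7.

```python
import math


def predictive_surprise(p_k):
    """Shannon surprise (in bits) of an observation that was predicted with probability p_k."""
    return -math.log2(p_k)


def belief_entropy(belief):
    """Entropy (in bits) of the belief distribution over urn types, i.e. average surprise."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0.0)


surprise_red = predictive_surprise(0.58)   # frequent event, lower surprise
surprise_blue = predictive_surprise(0.42)  # rare event, higher surprise
uncertainty = belief_entropy({1: 0.5, 2: 0.5})  # maximal for two states: 1 bit
```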
Evaluation methods
Fig. 5. Comparison of posterior probabilities P(s = 1|o = k) for frequent (red ball, k = 1,
upper curves) and rare (blue ball, k = 2, lower curves) events calculated via Bayes'
theorem (Eq. (1)) with unweighted prior probability P(s = 1) and exemplary likelihood
P(o = 1|s = 1) = 0.9 (solid curves), and weighted prior probability and likelihood (inverse S-shaped weighting (Eq. (6)) with γ = 0.65, dashed curves). The black dotted line
represents a posterior probability of 0.5. For both frequent and rare events, weighting
leads to a bias towards higher uncertainties in posterior probabilities. The double-headed
arrows illustrate the quantity of Bayesian surprise (Eq. (16)) for a prior probability of
P(s = 1) = 0.5 for a frequent event (red dashed arrow with weighting, red dashed plus
red solid arrow without weighting) and a rare event (blue dashed arrow with weighting,
blue dashed plus blue solid arrow without weighting).
The combinations of probability distributions and response functions will be referred to as models in this section for the sake of brevity.
To compare different models of the EEG we used a linear hierarchical
model as implemented in the Parametric Empirical Bayesian (PEB)
schemes in the SPM software (spm_PEB.m) (Friston et al., 2002,
2007). These empirical Bayes models simply equip a standard general
linear model with a further hierarchical level that places constraints
on the parameter estimates of the first level. The evidence for each
model is approximated with a variational free energy bound which consists of an accuracy and a complexity term (Penny et al., 2004; Friston
et al., 2007; Penny, 2012). This approximation can then be used to compute Bayes factors and log evidences in the usual way. The exact specification of the design matrices is detailed in Appendix E. The log-evidences of the models F_i = ln(p(Y|M_i)), with p(Y|M_i) being the likelihood of the data Y given the model M_i, and i ∈ {I_B(BEL), I_B(BELSI),
I_B(BELSO), I_H(BEL), I_H(BELSI), I_H(BELSO), I_P(PRE), I_P(PRESI), I_P(PRESO),
I_B(PRE), I_B(PRESI), I_B(PRESO), OST, DIF} were used for model comparison.
The log-evidences were summed across probability conditions for each
participant. We used random-effects Bayesian model selection (BMS)
for group studies and computed exceedance probabilities, each of
which equals the probability that model i is more likely than the remaining models (Stephan et al., 2009).
\[ \ln(\mathrm{BF}_{i,\mathrm{NULL}}) = \ln \frac{p(Y \mid M_i)}{p(Y \mid M_{\mathrm{NULL}})} = F_i - F_{\mathrm{NULL}}, \tag{20} \]
with M_NULL as the common reference null-model. The participant-specific log-Bayes factors were summed up over participants to obtain
the group log-Bayes factor ln(GBF) for one model against the reference
model (Stephan et al., 2007). Due to the use of a common reference
model, all models can be compared with each other following
\[ \ln(\mathrm{GBF}_{i,j}) = \ln(\mathrm{GBF}_{i,\mathrm{NULL}}) - \ln(\mathrm{GBF}_{j,\mathrm{NULL}}). \tag{21} \]
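The arithmetic behind log-Bayes factors and group log-Bayes factors is sketched below. The log-evidence values are made up for illustration only (they are not results from the study), and the function names are assumptions; in the study the log-evidences come from the free energy approximation in SPM.

```python
def log_bayes_factor(f_model, f_null):
    """ln BF of a model against the null model: difference of log-evidences."""
    return f_model - f_null


def group_log_bayes_factor(f_model_per_participant, f_null_per_participant):
    """ln GBF: participant-specific log-Bayes factors summed over participants."""
    return sum(
        log_bayes_factor(fm, fn)
        for fm, fn in zip(f_model_per_participant, f_null_per_participant)
    )


def compare_models(ln_gbf_i_null, ln_gbf_j_null):
    """ln GBF of model i against model j via the common null reference."""
    return ln_gbf_i_null - ln_gbf_j_null


# Hypothetical log-evidences for three participants (illustration only).
f_null = [-120.0, -98.5, -110.0]
f_model_a = [-115.0, -97.0, -104.0]
f_model_b = [-118.0, -98.0, -109.0]

ln_gbf_a = group_log_bayes_factor(f_model_a, f_null)  # 5.0 + 1.5 + 6.0 = 12.5
ln_gbf_b = group_log_bayes_factor(f_model_b, f_null)  # 2.0 + 0.5 + 1.0 = 3.5
ln_gbf_ab = compare_models(ln_gbf_a, ln_gbf_b)        # 12.5 - 3.5 = 9.0
```

Because every model is scored against the same null model, the null term cancels in the difference, which is why any pair of models can be compared directly.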
Results
Behavioral results and hyper-parameter fitting
Fig. 6 shows the likelihoods (ratios of choices) P(c = u = 1 | P_{u=1}(4))
across participants for choice c being urn type u = 1 depending on the
mean posterior probability P_{u=1}(4) after observing an episode of sampling. The mean posterior probabilities were calculated over all sequences
containing identical ratios of types of ball colors. The general form of the
likelihood function in a binary decision task is
\[ P_{\mathrm{opt}}(c = 1 \mid P_{u=1}(4)) = \frac{1}{1 + e^{-a\,(P_{u=1}(4) - b)}} \tag{22} \]
(Towey et al., 1980), a frontally distributed P3a (Kopp and Lange, 2013)
in the latency range [300, 400] ms with t^{σ²}_{P3a,max} = 356 ms, a parietally distributed P3b (Kolossa et al., 2012) in the latency range [300, 400] ms
with t^{σ²}_{P3b,max} = 380 ms, and a posterior-positive Slow Wave (SW) in […]
with t^{σ²}_{SW,max} = 504 ms (García-Larrea
Fig. 7. ERP waves for frequent (red ball, red curves) and rare (blue ball, blue curves) events for certain (Lc, solid curves) and uncertain (Lu, dashed curves) likelihoods at electrodes C3, Cz, and C4 for the N250, at Fz, FCz, and Cz for the P3a, at Pz for the P3b, and at O1 and O2 for the SW. Time intervals for the search for maximum variance for the ERP components are highlighted in gray and the time point of maximum variance is marked by a dashed black line at all respective electrodes. Left-hand panels: certain prior conditions (PcLc and PcLu). Right-hand panels: uncertain prior conditions (PuLc and PuLu). The presence of a centrally distributed N250 wave in the latency range [200, 300] ms with t^{σ²}_{N250,max} = 232 ms and of a prominent late positive complex is revealed. The late positive complex can be decomposed into three separable ERP components: a frontally distributed P3a in the latency range [300, 400] ms with t^{σ²}_{P3a,max} = 356 ms, a parietally distributed P3b in the latency range [300, 400] ms with t^{σ²}_{P3b,max} = 380 ms, and a posterior-positive Slow Wave (SW) in the latency range […] with t^{σ²}_{SW,max} = 504 ms.
combinations for the late positive complex (and the N250) at the ERP-specific virtual electrodes and time windows which were determined
as described in the 'Selection of ERP data for further analyses' section.
Table 3 generalizes the results to the comparison of the three (or four)
model families. Finally, the relation between Bayesian surprise and the
measured data at electrode FCz (which was chosen as it represents
the center of the P3a region-of-interest electrodes) is shown for the
BELSI distribution in Fig. 10.
Fig. 8 displays group log-Bayes factors ln(GBF_{i,NULL}) with
i ∈ {I_B(BELSI), I_P(PRESI), I_B(PRESI)} of the BELSI and PRESI distributions
versus a constant null model over time ([−100, 600] ms around the eliciting
event). The electrodes that we do not display in Fig. 8 are mainly the
marginal electrodes and we did not see anything of importance at
these electrodes. The highest log-Bayes factors (red traces) represent
the best fits between our surprise regressors and the measured trial-by-trial ERP amplitude modulations. Bayesian updating and predictive surprise seem to provide accurate approximations to the actual data, with a
fronto-central focus within the P3a latency range, and a centro-parietal
Fig. 9. Scalp maps of group log-Bayes factors for time intervals t ∈ [336, 376] ms, t ∈ [360, 400] ms, and t ∈ [484, 524] ms for P3a, P3b, and SW latency ranges, respectively. Notice central foci within P3a latency range and occipital-parietal foci in the P3b and SW latency ranges. The P3a maps show a circumscribed fronto-central focus along with a left-occipital spot. For the P3b, a more posteriorly (occipital-parietal) and broadly distributed focus appears. Finally, the fit between surprise and measured data is sharply confined to the left-occipital region with regard to the SW.
Fig. 8. Degree to which Bayesian updating and predictive surprise based on the BELSI and PRESI distributions approximate measured trial-by-trial ERP data in group log-Bayes factors ln(GBF_{i,NULL}) with i ∈ {I_B(BELSI), I_P(PRESI), I_B(PRESI)} versus a constant model NULL over electrodes and time.
Table 2
Exceedance probabilities for all tested distribution-surprise combinations over the interval t ∈ [212, 252] ms for N250, over the interval t ∈ [336, 376] ms for P3a, over the interval t ∈ [360, 400] ms for P3b, and over the interval t ∈ [484, 524] ms for SW. Maximum exceedance probabilities are marked with an asterisk.

Surprise  Distribution  N250 (a)  P3a     P3b     SW
I_B       BEL           <0.01     <0.01   <0.01   <0.01
I_B       BELSI         0.01      0.58*   0.03    0.09
I_B       BELSO         0.03      0.01    <0.01   0.14
I_H       BEL           <0.01     0.02    0.01    0.04
I_H       BELSI         <0.01     0.07    <0.01   0.01
I_H       BELSO         <0.01     <0.01   <0.01   0.05
I_P       PRE           0.06      0.05    0.02    0.04
I_P       PRESI         0.04      0.03    0.67*   0.07
I_P       PRESO         0.02      <0.01   0.01    0.06
I_B       PRE           <0.01     <0.01   <0.01   <0.01
I_B       PRESI         0.02      0.02    0.02    0.19*
I_B       PRESO         <0.01     <0.01   <0.01   <0.01
I_B       OST (a)       0.15      0.02    0.04    0.19*
I_P       DIF (a)       0.68*     0.20    0.19    0.12

(a) Results for the N250 as well as for the OST and DIF models will be discussed in detail in Appendix D.
Table 3

Model family  N250   P3a    P3b    SW
F_LI          0.03   0.07   0.02   0.07
F_SI          0.06   0.81   0.75   0.41
F_SO          0.04   0.01   0.01   0.27
F_EF          0.87   0.11   0.22   0.25
PuLc, PuLu} and across eight potential sequences of three successive ball
stimuli (cf. panels (A)-(H)). Note that while Bayesian surprise is shown
at each stage of the sequence on the left, the ERP waves, shown on the
right, are in response to the third ball stimulus only.
A close correlation between Bayesian surprise and cortical
activations is revealed by a comparison between the various values of
Bayesian surprise for the third trial, I_B(n = 3), and the corresponding
ERP measures. Specifically, gradually increasing ERP wave amplitudes
are associated with successive increases in Bayesian surprise, I_B(n =
3) (compare (A) vs. (C) vs. (E) vs. (G) and (B) vs. (D) vs. (F) vs. (H), respectively). Further, the left panels show that Bayesian surprise is
mainly grouped by likelihood (Lc (solid curves) vs. Lu (dashed
curves)). In order to show this effect in the ERP data, the waves for
certain (Lc, solid curves) and uncertain (Lu, dashed curves) likelihood have been averaged separately, regardless of prior probabilities. The ERP waves also seem to reflect the degree to which
Bayesian surprise I_B(n = 3) under Lc conditions (solid curves in left
panels and single solid curve in right panels) surpasses I_B(n = 3)
under Lu conditions (dashed curves in left panels and single dashed
curve in right panels).
As a measure of absolute fit, we computed the fraction of variance
explained by the winning distribution-surprise combinations for each
component of the late positive complex (P3a, P3b, and SW) and the
N250. We report mean values across participants as well as minimum
and maximum individual values. For the P3a, 1.4% of the variance was
explained with a minimum of 0% and a maximum of 6.2%. For the P3b,
Fig. 10. Relationships between Bayesian surprise (I_B(BELSI) with hyper-parameter γ = 0.65) and the measured data across sequences of observed events. Left panels: Bayesian surprise I_B(n) = D_KL(P_U(n − 1) || P_U(n)) (Eq. (16)) over trials n = 1, 2, 3 for all probability conditions PcLc (diamond-marked solid curve), PcLu (triangle-marked dashed curve), PuLc (inverted triangle-marked solid curve), and PuLu (square-marked dashed curve). The sequence of observed events is shown below each figure with a red ball denoting a frequent and blue ball a rare event. A clear likelihood effect is visible (solid vs. dashed curves). Right panels: In order to show this effect in the ERP data the waves for certain (Lc, solid curves) and uncertain (Lu, dashed curves) likelihood have been averaged separately yet regardless of prior probabilities. The thus created grand-average ERP waves are shown for the third observation of a sequence o(n = 3) at electrode FCz. Gradually increasing ERP wave amplitudes are associated with successive increases in Bayesian surprise I_B(n = 3). Mean reaction times (after averaging across individual median reaction times) are marked by vertical dashed black lines.
2.6% of the variance was explained (minimum 0%, maximum 7.5%). For
the SW 2.4% of the variance was explained (minimum 0%, maximum
8.4%). For the N250, 1.8% of the variance was explained (minimum 0%,
maximum 15.2%).
Discussion
This study explored neural correlates of Bayesian inference by combining an urn-ball paradigm (Fig. 1) with computational modeling of
trial-by-trial electrophysiological signals. Our approach led to the discovery that dissociable cortical signals seem to code and compute distinguishable aspects of Bayes-optimal probabilistic inference. Thus, we
isolated discrete ERP components which could be dissociated with regard to their putative function in accomplishing Bayesian inference
(cf. Figs. 8, 9; Table 2). Specifically, we found the late positive complex
spatially, temporally and functionally decomposable into three separable ERP components (see also Dien et al., 2004): (1) Bayesian surprise
yielded superior approximations of activation changes in anteriorly distributed P3a waves at relatively short latency (Kopp and Lange, 2013).
(2) Postdictive surprise best explains posteriorly distributed SW amplitudes at latest latency. (3) Predictive surprise outperformed Bayesian
updating with regard to activation changes in parietally distributed
P3b waves at intermediate latency (Kolossa et al., 2012). Taken together, these results are consistent with the Bayesian brain hypothesis insofar as dissociable cortical activities seem to code and compute various
aspects of Bayesian inference.
Bayesian updating generally reflects the Kullback-Leibler divergence
between two probability distributions as defined in Eq. (15), but further
differentiation is necessary in order to minimize potential misunderstandings (Fiorillo, 2012): Bayesian surprise represents the change in
beliefs over hidden states given new observations which equals the
Kullback-Leibler divergence between P_U(n − 1) and P_U(n) (see
Eq. (16)). In contrast, postdictive surprise represents the change in predictions over future events given new observations and equals the
Kullback-Leibler divergence between P_K(n) and P_K(n + 1) (see
Eq. (17)). Predictive surprise is simply the surprise over the current observation under its degree of prediction (see Eq. (18)). Our data imply
that Bayesian surprise is related to trial-by-trial P3a amplitude variability, postdictive surprise suitably models trial-by-trial SW amplitude variability, and predictive surprise best predicts trial-by-trial P3b amplitude
variability.
The hyper-parameter γ was fitted by minimizing the mean squared
error to approximate optimal decision behavior. This approach provided
γ = 0.65, with γ < 1 being associated with inverse S-shaped probability
weighting (Fig. 4). A Bayesian observer with γ = 0.65 was compared
with an otherwise equivalent observer without probability weighting
(γ = 1). We found that the observer with input probability weighting
outperformed the unweighted observer when explaining observed
ERPs (Tables 2 and 3). These findings seem to demonstrate a ubiquitous
role of probability weighting in probabilistic inference (Kahneman and
Tversky, 1979; Tversky and Kahneman, 1992; Fox and Poldrack,
2009). With regard to non-linear probability weighting, we have
taken our lead from the (neuro-)economics literature. The alternative
possibility that nonlinearity might lie at the level of mapping from
probability distributions to electrophysiological responses such that
the electrophysiological responses may be a nonlinear function of the
neuronal representation of unweighted probabilities did not receive
support.
The primary effect of inverse S-shaped probability weighting (with
γ < 1) on Bayesian updating is to increase uncertainty. Inspection of
Fig. 5 reveals that all posterior probabilities based on an observer with
probability weighting lie between the corresponding posterior probabilities based on an observer without probability weighting and P =
0.5, which equals the point of maximum uncertainty. Probability
weighting might constitute one of the reasons why empirical support
for the Bayesian brain hypothesis (Knill and Pouget, 2004; Friston,
2005; Doya et al., 2007; Gold and Shadlen, 2007; Kopp, 2008) has apparently been so difficult to obtain in former studies. Notice that earlier attempts to identify brain areas that weight probabilities did not lead to
converging results; yet, common denominators of potential areas
seem to lie within fronto-striatal loops (Trepel et al., 2005; Preuschoff
et al., 2006; Tobler et al., 2008; Hsu et al., 2009; Takahashi et al., 2010;
Wu et al., 2011; Berns and Bell, 2012), within the parietal cortex
(Berns et al., 2008), and/or within the anterior insula (Preuschoff
et al., 2008; Bossaerts, 2010; Mohr et al., 2010). Alternatively, (inverse)
S-shaped probability weighting might constitute an emergent feature of
processing probabilistic information by neurons (Gold and Shadlen,
2007; Yang and Shadlen, 2007; Soltani and Wang, 2010; Pouget et al.,
2013).
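The claim that weighted posteriors lie between the unweighted posteriors and P = 0.5 can be illustrated with a small simulation. The sketch below rests on several assumptions that are not taken from the study: two urns with complementary ball proportions (0.8/0.2), a uniform initial prior, weighting applied to the likelihoods only, and a hypothetical log-odds weighting function with parameter 0.65.

```python
def weight_probability(p, gamma):
    """Hypothetical inverse S-shaped weighting (linear in log-odds, crossover at 0.5)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma)

def update(prior, likelihoods):
    """One application of Bayes' theorem for a discrete hidden state."""
    joint = [pr * lik for pr, lik in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical setup: ball type -> [P(ball | urn 0), P(ball | urn 1)].
LIKELIHOOD = {0: [0.8, 0.2], 1: [0.2, 0.8]}
balls = [0, 0, 1, 0]  # made-up sequence of observed ball types

plain = [0.5, 0.5]      # observer without probability weighting
weighted = [0.5, 0.5]   # observer with weighting of the inference input
trajectory = []
for ball in balls:
    lik = LIKELIHOOD[ball]
    plain = update(plain, lik)
    weighted = update(weighted, [weight_probability(l, 0.65) for l in lik])
    trajectory.append((plain[0], weighted[0]))

for unweighted_p, weighted_p in trajectory:
    print(f"unweighted: {unweighted_p:.3f}   weighted: {weighted_p:.3f}")
```

Under these assumptions the weighting scales the log-odds of the posterior by the weighting parameter, so every weighted posterior is less extreme than its unweighted counterpart.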
Based on our findings, we suggest the probabilistic reasoning (PR)
model of the Bayesian brain that basically reflects the tri-partitioned
late positive complex (Fig. 11). In short, as shown in Fig. 11, the PR
model posits the existence of a Bayesian reasoning unit (BRU) that
interacts, in a reciprocal manner, with cognitive systems that process incoming environmental information (Haykin and Fuster, 2014). Further,
the PR model conjectures that the BRU is capable of Bayes-optimal
updating: Firstly, it computes posterior distributions that take the
prior and observation into account (belief updating, related to trial-by-trial P3a amplitude variations (Eq. (10))). Secondly, prediction
distributions for future observations are computed from posterior distributions (prediction updating, related to trial-by-trial SW amplitude
variations (Eq. (12))).
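The two computations attributed to the BRU can be sketched in a few lines of Python. This is a generic illustration of belief updating followed by prediction updating, not a reproduction of Eqs. (10) and (12); the emission probabilities and the function names `update_beliefs` and `predict` are hypothetical.

```python
def update_beliefs(prior, likelihood_of_observation):
    """Belief updating: posterior over hidden states is proportional to prior times likelihood."""
    joint = [pr * lik for pr, lik in zip(prior, likelihood_of_observation)]
    total = sum(joint)
    return [j / total for j in joint]

def predict(beliefs, emission):
    """Prediction updating: P(event k) = sum over states u of P(k | u) * P(u)."""
    n_events = len(emission[0])
    return [sum(b * emission[u][k] for u, b in enumerate(beliefs))
            for k in range(n_events)]

# Hypothetical emission probabilities: EMISSION[u][k] = P(event k | hidden state u).
EMISSION = [[0.8, 0.2],
            [0.2, 0.8]]

beliefs = [0.5, 0.5]   # uniform prior over two hidden states
observed = 0           # index of the observed event

# Step 1: posterior beliefs given the prior and the observation.
beliefs = update_beliefs(beliefs, [EMISSION[u][observed] for u in range(len(EMISSION))])
# Step 2: predictions for the next observation, computed from the posterior beliefs.
prediction = predict(beliefs, EMISSION)
```

The prediction distribution is always derived from the current beliefs, which is why the two updates are separated by one step in time.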
On its output branch, the emergent BRU predictions exert control over
pre-adaptive biases on cognitive processing (Fuster, 2014), whereas
BRU belief updating is based on the incoming emergent observation. Notice further that predictive surprise (related to trial-by-trial P3b amplitude variations) can be thought of as the magnitude of prediction errors
induced by pre-adaptively biased processing within the cognitive processing stream. Predictive surprise could also be considered as the evolution of a decision variable, i.e., as the accumulation of evidence from bias
levels to a decision threshold (Kopp, 2008; O'Connell et al., 2012; Kelly
and O'Connell, 2013). Further, we leave it open whether the P3a reflects
the proper updating of beliefs, or an obligatory attentional process that
forms part of belief updating (i.e., an orienting response; Friedman
et al., 2001; Barry and Rushby, 2006; Kopp and Lange, 2013).
Against the background that the P3a originates from prefrontal cortical regions while the P3b is generated in temporal/parietal regions
(Polich, 2007), our results suggest how a network of brain areas may
give rise to Bayesian inference. Specifically, while belief updating and
prediction updating seem to be computed in prefrontal cortical regions
(Lee et al., 2007), predictive surprise seems to originate from regions located in posterior association cortices of the visuomotor pathway
(Summerfield and Koechlin, 2008; de Lange et al., 2010; d'Acremont
et al., 2013). The occipital scalp topography of the SW needs a short
comment. One plausible possibility is that the SW reflects the setting
and updating of pre-adaptive biases. Kok and colleagues recently
found that perceptual predictions trigger the formation of specific stimulus templates in primary visual cortex to efficiently process sensory inputs (Kok et al., 2014). Given that we sampled EEG data from merely
twenty channels, we cannot localize the underlying neural architecture
of the Bayesian observer with sufficient precision; thus, further research
is required in order to move forward from a sensor space analysis to a
source space analysis of the Bayesian observer.
Notice that our probabilistic reasoning model of the late positive
complex can be regarded as a computational advancement of the
most widely renowned and respected conceptual theory in the P3
field, i.e., the so-called context updating model (Donchin, 1981;
Donchin and Coles, 1988). In short, this model postulates that the P3
is evoked in the service of meta-cognitive processes that are concerned
with maintaining a proper representation of the environment, such as
the mapping of probabilities on the environment, the deployment of
attention, or the setting of priorities and biases.
Fig. 11. An outline of our probabilistic reasoning (PR) model of the tri-partitioned late positive complex. (A) A conceptual outline of the PR model. The model posits the existence of a
Bayesian reasoning unit (BRU) that interacts with cognitive systems that process incoming environmental information (Fuster, 2014; Haykin and Fuster, 2014). The BRU computes, retains
and updates two distinguishable probability distributions, one over the hidden state (beliefs; lighter gray color) and another one over the observable events (predictions; darker gray
color). The PR model conjectures that belief updating (Bayesian surprise, Eq. (16)) and prediction updating (postdictive surprise, Eq. (17)) are associated with trial-by-trial P3a and
SW amplitude variations, respectively. The emergent BRU predictions set pre-adaptive biases on perceptual decisions, whereas BRU belief updating is based on the observation that
emerges from these decisions. Notice further that predictive surprise (Eq. (18), related to trial-by-trial P3b amplitude variations) can be thought of as the magnitude of prediction errors
induced by unpredicted or surprising observations. (B) A more formal outline of the PR model, in particular of the computational fine structure of the BRU. Units of time (n − 1, n) separate
the dynamic evolution of beliefs over states (BELSI, Eq. (10)) that obeys Bayes' theorem (lighter gray color). Units of time (n, n + 1) also separate the dynamic evolution of predictions over
observations (PRESI, Eq. (12)) as prescribed by Bayes' theorem (darker gray color).
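$$p(\theta \mid c_1(n), c_2(n)) = \frac{\Gamma(c_1(n) + c_2(n))}{\Gamma(c_1(n))\,\Gamma(c_2(n))}\, \theta^{c_1(n)-1} (1-\theta)^{c_2(n)-1}, \qquad \text{(B.1)}$$
with $\Gamma$ being the gamma function. The event counters are updated on a trial-by-trial basis according to
[Eqs. (B.2)–(B.4): counters $c_k(n)$, with $k \in \{1, 2\}$, formed as exponentially weighted sums over $\tilde{g}_k$; equations not recoverable from the source]
counting the number of occurrences of event k until trial n. The exponential forgetting parameter is set to 2.6 as in the BS3 model in Ostwald et al. (2012).
Bayesian surprise $I_B(n)$ is then calculated as the Kullback–Leibler divergence between the prior $p(\theta) = p(\theta \mid c_1(n-1), c_2(n-1))$ and posterior $p(\theta \mid o) = p(\theta \mid c_1(n), c_2(n))$ probability density functions (OST)
$$I_B(n) = \int p(\theta \mid c_1(n-1), c_2(n-1)) \ln \frac{p(\theta \mid c_1(n-1), c_2(n-1))}{p(\theta \mid c_1(n), c_2(n))}\, d\theta. \qquad \text{(B.5)}$$
[Eq. (C.1): $P_k(n)$ as a weighted combination of $c_{\mathrm{long},k}(n)$, $c_{\mathrm{short},k}(n)$, and an alternation term, plus a constant; equation not recoverable from the source]
The terms $c_{\mathrm{long},k}(n)$ and $c_{\mathrm{short},k}(n)$ model long- and short-term memory as count functions with exponential forgetting according to
[Eqs. (C.2)–(C.3): equations not recoverable from the source]
with
$$g_k(n) = \begin{cases} \frac{1}{K}, & \text{if } n \le 0 \text{ (uniform initial prior)} \\ 1, & \text{if } n > 0 \text{ and } o(n) = k \\ 0, & \text{otherwise} \end{cases} \qquad \text{(C.4)}$$
being the digital filter input. The remaining count term captures alternation expectation. The weighting parameters and the constants guarantee normalized probabilities $P_k(n) \in [0, 1]$. The DIF model was
used to calculate predictive surprise $I_P(n)$ and the parameters were set
to the optimized parameters as found in Kolossa et al. (2012).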
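The Bayesian surprise of Eq. (B.5), i.e., the Kullback–Leibler divergence between the Beta densities before and after an observation, can be approximated numerically. The sketch below is only an illustration: it takes the counter values as given inputs (the counter update itself is not modelled), uses a simple midpoint rule, and the function names `beta_log_pdf` and `beta_kl` as well as the example counts are hypothetical.

```python
import math

def beta_log_pdf(theta, c1, c2):
    """Log density of a Beta distribution with parameters c1, c2 > 0 at 0 < theta < 1."""
    log_norm = math.lgamma(c1 + c2) - math.lgamma(c1) - math.lgamma(c2)
    return log_norm + (c1 - 1.0) * math.log(theta) + (c2 - 1.0) * math.log(1.0 - theta)

def beta_kl(prior_counts, posterior_counts, steps=20000):
    """Midpoint-rule approximation of KL(prior || posterior) over the interval (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints avoid the endpoints 0 and 1
        log_p = beta_log_pdf(theta, *prior_counts)
        log_q = beta_log_pdf(theta, *posterior_counts)
        total += math.exp(log_p) * (log_p - log_q) * h
    return total

# Made-up counter values before and after observing event 1 once.
surprise = beta_kl((1.0, 1.0), (2.0, 1.0))
```

For these example counts the integral has the closed form 1 − ln 2, which offers a quick check of the numerical approximation. Counts below 1 make the density unbounded near the interval borders, so a finer grid or a closed-form expression would be preferable in that case.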
tive averaged electrodes were used. To keep the presentation simple, the
electrodes are not explicitly expressed by an additional subscript.
We used a two-level hierarchical model of the form
[Eqs. (E.1)–(E.2): two-level hierarchical model equations; not recoverable from the source]
sured voltages for time instant t across trials n and episodes of sam

Table A.2
Family-level posterior model probabilities for the distributions based on the observer
without weighting (F_LI), with weighting of the inference input (F_SI), weighting of the inference output (F_SO), and models based on exponential forgetting (F_EF). Maximum posterior model probabilities are marked with an asterisk.

Family   N250     P3a      P3b      SW
F_LI     <0.01    <0.01    <0.01    <0.01
F_SI     <0.01    >0.99*   >0.99*   >0.99*
F_SO     <0.01    <0.01    <0.01    <0.01
F_EF     >0.99*   <0.01    <0.01    <0.01
Fig. D.1. Degree to which surprise as calculated by the DIF and OST models approximates
measured trial-by-trial ERP data as group log-Bayes factors of the models versus a constant
null model over electrodes and time.
Table A.1
Posterior model probabilities for all tested distribution–surprise combinations over the
interval t ∈ [212, 252] ms for N250, over the interval t ∈ [336, 376] ms for P3a, over
the interval t ∈ [360, 400] ms for P3b, and over the interval t ∈ [484, 524] ms for SW.
Maximum posterior model probabilities are marked with an asterisk.

Surprise   Distribution   N250     P3a      P3b      SW
I_B        BEL            <0.01    <0.01    <0.01    <0.01
I_B        BELSI          <0.01    >0.99*   <0.01    <0.01
I_B        BELSO          <0.01    <0.01    <0.01    <0.01
I_H        BEL            <0.01    <0.01    <0.01    <0.01
I_H        BELSI          <0.01    <0.01    <0.01    <0.01
I_H        BELSO          <0.01    <0.01    <0.01    <0.01
I_P        PRE            <0.01    <0.01    <0.01    <0.01
I_P        PRESI          <0.01    <0.01    >0.99*   0.05
I_P        PRESO          <0.01    <0.01    <0.01    <0.01
I_B        PRE            <0.01    <0.01    <0.01    <0.01
I_B        PRESI          <0.01    <0.01    <0.01    0.95*
I_B        PRESO          <0.01    <0.01    <0.01    <0.01
I_B        OST            <0.01    <0.01    <0.01    <0.01
I_P        DIF            >0.99*   <0.01    <0.01    <0.01
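How posterior model probabilities and family-level probabilities relate to model evidences can be sketched as follows. This is a simplified fixed-effects illustration under a uniform prior over models, which may differ from the group-level procedure used for the tables; the log evidences are made-up numbers, and the function names and family assignments are hypothetical.

```python
import math

def posterior_model_probabilities(log_evidences):
    """Posterior model probabilities under a uniform model prior (softmax of log evidences)."""
    m = max(log_evidences.values())  # subtract the maximum for numerical stability
    unnorm = {name: math.exp(le - m) for name, le in log_evidences.items()}
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}

def family_probabilities(model_probs, families):
    """Family-level probability: sum of the posterior probabilities of the member models."""
    return {fam: sum(model_probs[name] for name in members)
            for fam, members in families.items()}

# Made-up log evidences, NOT values from the study.
log_evidences = {"BEL": -1012.0, "BELSI": -1000.0, "BELSO": -1009.0,
                 "OST": -1020.0, "DIF": -1015.0}
families = {"F_LI": ["BEL"], "F_SI": ["BELSI"], "F_SO": ["BELSO"],
            "F_EF": ["OST", "DIF"]}

model_probs = posterior_model_probabilities(log_evidences)
family_probs = family_probabilities(model_probs, families)
```

Because probabilities depend exponentially on differences in log evidence, a difference of a few units is already enough to push the winning model close to a probability of 1.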
Fig. D.2. Scalp maps of log-Bayes factors for time interval t ∈ [212, 252] ms for N250. A central focus becomes apparent along with a left-occipital spot.
References
Achtziger, A., Alós-Ferrer, C., Hügelschäfer, S., Steinhauser, M., 2014. The neural basis of
belief updating and rational decision making. Soc. Cogn. Affect. Neurosci. 9, 55–62.
Bach, D.R., Dolan, R.J., 2012. Knowing how much you don't know: a neural organization of
uncertainty estimates. Nat. Rev. Neurosci. 13, 572–586.
Baldi, P., Itti, L., 2010. Of bits and wows: a Bayesian theory of surprise with applications to
attention. Neural Netw. 23, 649–666.
Barceló, F., Periáñez, J.A., Knight, R.T., 2002. Think differently: a brain orienting response
to task novelty. NeuroReport 13, 1887–1892.
Barnard, G.A., 1949. Statistical inference. J. R. Stat. Soc. Ser. B 11, 115–149.
Barry, R.J., Rushby, J.A., 2006. An orienting reflex perspective on anteriorisation of the P3
of the event-related potential. Exp. Brain Res. 173, 539–545.
Berns, G.S., Bell, E., 2012. Striatal topography of probability and magnitude information
for decisions under uncertainty. NeuroImage 59, 3166–3172.
Berns, G.S., Capra, C.M., Chappelow, J., Moore, S., Noussair, C., 2008. Nonlinear neurobiological probability weighting functions for aversive outcomes. NeuroImage 39,
2047–2057.
Bossaerts, P., 2010. Risk and risk prediction error signals in anterior insula. Brain Struct.
Funct. 214, 645–653.
Cavagnaro, D.R., Pitt, M.A., Gonzalez, R., Myung, J.I., 2013. Discriminating among probability weighting functions using adaptive design optimization. J. Risk Uncertain. 47,
255–289.
Clark, A., 2013. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36, 181–253.
d'Acremont, M., Schultz, W., Bossaerts, P., 2013. The human brain encodes event frequencies while forming subjective beliefs. J. Neurosci. 33, 10887–10897.
Daunizeau, J., Den Ouden, H.E.M., Pessiglione, M., Kiebel, S.J., Friston, K.J., Stephan, K.E.,
2010. Observing the observer (II): deciding when to decide. PLoS One 5, e15555.
Daunizeau, J., Adam, V., Rigoux, L., 2014. VBA: a probabilistic treatment of nonlinear
models for neurobiological and behavioural data. PLoS Comput. Biol. 10, e1003441.
Dayan, P., Hinton, G.E., Neal, R.M., Zemel, R.S., 1995. The Helmholtz machine. Neural
Comput. 7, 889–904.
de Lange, F.P., Jensen, O., Dehaene, S., 2010. Accumulation of evidence during sequential
decision making: the importance of top-down factors. J. Neurosci. 30, 731–738.
Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG
dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21.
Dien, J., Spencer, K.M., Donchin, E., 2004. Parsing the late positive complex: mental
chronometry and the ERP components that inhabit the neighborhood of the P300.
Psychophysiology 41, 665–678.
Dolan, R.J., Dayan, P., 2013. Goals and habits in the brain. Neuron 80, 312–325.
Donchin, E., 1981. Surprise! Surprise? Psychophysiology 18, 493–513.
Donchin, E., Coles, M.G., 1988. Is the P300 component a manifestation of context
updating? Behav. Brain Sci. 11, 357–427.
Doya, K., Ishii, S., Pouget, A., Rao, R.P.N., 2007. Bayesian Brain: Probabilistic Approaches to
Neural Coding. MIT Press, Cambridge, MA.
Fiorillo, C.D., 2012. Beyond Bayes: on the need for a unified and Jaynesian definition of
probability and information within neuroscience. Information 3, 175–203.
Folstein, J.R., Van Petten, C., 2008. Influence of cognitive control and mismatch on the N2
component of the ERP: a review. Psychophysiology 45, 152–170.
Fox, C.R., Poldrack, R.A., 2009. Prospect theory and the brain. In: Glimcher, P.W., Camerer,
C.F., Fehr, E., Poldrack, R.A. (Eds.), Neuroeconomics: Decision Making and the Brain.
Elsevier Academic Press, London, UK, pp. 145–173.
Friedman, D., Cycowicz, Y.M., Gaeta, H., 2001. The novelty P3: an event-related brain potential
(ERP) sign of the brain's evaluation of novelty. Neurosci. Biobehav. Rev. 25, 355–373.
Friston, K.J., 2005. A theory of cortical responses. Philos. Trans. R. Soc. B-Biol. Sci. 360,
815–836.
Friston, K.J., 2010. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci.
11, 127–138.
Friston, K.J., Penny, W.D., Phillips, C., Kiebel, S.J., Hinton, G., Ashburner, J., 2002. Classical
and Bayesian inference in neuroimaging: theory. NeuroImage 16, 465–483.
Friston, K.J., Mattout, J., Trujillo-Barreto, N., Ashburner, J., Penny, W.D., 2007. Variational
free energy and the Laplace approximation. NeuroImage 34, 220–234.
Furl, N., Averbeck, B.B., 2011. Parietal cortex and insula relate to evidence seeking relevant
to reward-related decisions. J. Neurosci. 31, 17572–17582.
Fuster, J.M., 2014. The prefrontal cortex makes the brain a preadaptive system. Proc. IEEE
102, 417–426.
García-Larrea, L., Cézanne-Bert, G., 1998. P3, positive slow wave and working memory
load: a study on the functional correlates of slow wave activity. Clin. Neurophysiol.
108, 260–273.
Gold, J.I., Shadlen, M.N., 2007. The neural basis of decision making. Annu. Rev. Neurosci.
30, 535–574.
Gonzalez, R., Wu, G., 1999. On the shape of the probability weighting function. Cogn.
Psychol. 38, 129–166.
Gratton, G., Coles, M.G.H., Donchin, E., 1983. A new method for off-line removal of ocular
artifact. Electroencephalogr. Clin. Neurophysiol. 55, 468–484.
Grether, D.M., 1980. Bayes rule as a descriptive model: the representativeness heuristic.
Q. J. Econ. 95, 537–557.
Grether, D.M., 1992. Testing Bayes rule and the representativeness heuristic: some experimental evidence. J. Econ. Behav. Organ. 17, 31–57.
Hampton, A.N., Bossaerts, P., O'Doherty, J.P., 2006. The role of the ventromedial prefrontal
cortex in abstract state-based inference during decision making in humans. J.
Neurosci. 26, 8360–8367.
Haykin, S., Fuster, J.M., 2014. On cognitive dynamic systems: cognitive neuroscience and
engineering learning from each other. Proc. IEEE 102, 608–628.
Hillyard, S.A., Picton, T.W., 1987. Electrophysiology of cognition. In: Plum, F. (Ed.), Handbook of Physiology: The Nervous System, Section 1, vol. 5. Higher Functions of the
Brain, Part 2. American Physiological Society, Bethesda, MD, pp. 519–584.
Hsu, M., Krajbich, I., Zhao, C., Camerer, C.F., 2009. Neural response to reward anticipation
under risk is nonlinear in probabilities. J. Neurosci. 29, 2231–2237.
Itti, L., Baldi, P., 2009. Bayesian surprise attracts human attention. Vis. Res. 49, 1295–1306.
Jaynes, E.T., 1988. How does the brain do plausible reasoning? In: Erickson, G.J., Smith, C.R.
(Eds.), Maximum-Entropy and Bayesian Methods in Science and Engineering. Kluwer
Academic Publishers, Dordrecht, The Netherlands, pp. 1–24.
Jaynes, E.T., 2003. Probability Theory: The Logic of Science. Cambridge University Press,
Cambridge, UK.
Kahneman, D., Tversky, A., 1979. Prospect theory: an analysis of decision under risk.
Econometrica 47, 263–291.
Kass, R.E., Raftery, A.E., 1995. Bayes factors. J. Am. Stat. Assoc. 90, 773–795.
Kelly, S.P., O'Connell, R.G., 2013. Internal and external influences on the rate of
sensory evidence accumulation in the human brain. J. Neurosci. 33,
19434–19441.
Knill, D.C., Pouget, A., 2004. The Bayesian brain: the role of uncertainty in neural coding
and computation for perception and action. Trends Neurosci. 27, 712–719.
Kok, P., Failing, M.F., de Lange, F.P., 2014. Prior expectations evoke stimulus templates in
the primary visual cortex. J. Cogn. Neurosci. 26, 1546–1554.
Kolossa, A., Fingscheidt, T., Wessel, K., Kopp, B., 2012. A model-based approach to trial-by-trial P300 amplitude fluctuations. Front. Hum. Neurosci. 6, 359.
Kopp, B., 2008. The P300 component of the event-related brain potential and Bayes'
theorem. In: Sun, M.K. (Ed.), Cognitive Sciences at the Leading Edge. Nova Science
Publishers, New York, NY, pp. 87–96.
Kopp, B., Lange, F., 2013. Electrophysiological indicators of surprise and entropy in dynamic task-switching environments. Front. Hum. Neurosci. 7, 300.
Lee, D., Rushworth, M.F.S., Walton, M.E., Watanabe, M., Sakagami, M., 2007. Functional
specialization of the primate frontal cortex during decision making. J. Neurosci. 27,
8170–8173.
Lieder, F., Daunizeau, J., Garrido, M.I., Friston, K.J., Stephan, K.E., 2013. Modelling trial-by-trial changes in the mismatch negativity. PLoS Comput. Biol. 9, e1002911.
Luck, S.J., 2005. An Introduction to the Event-Related Potential Technique. MIT Press,
Cambridge, MA.
Matsuda, I., Nittono, H., 2014. Motivational significance and cognitive effort elicit different
late positive potentials. Clin. Neurophysiol. http://dx.doi.org/10.1016/j.clinph.2014.
05.030.
McGrayne, S.B., 2011. The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant From Two Centuries of Controversy. Yale University Press, New Haven, CT.
Mohr, P.N., Biele, G., Heekeren, H.R., 2010. Neural processing of risk. J. Neurosci. 30,
6613–6619.
Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, A.,
Zhou, J., 2000. Audiovisual speech recognition. Final Workshop 2000 Report, Center
for Language and Speech Processing vol. 764. Johns Hopkins University, Baltimore,
MD.
Nieuwenhuis, S., De Geus, E.J., Aston-Jones, G., 2011. The anatomical and functional relationship between the P3 and autonomic components of the orienting response. Psychophysiology 48, 162–175.
O'Connell, R.G., Dockree, P.M., Kelly, S.P., 2012. A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nat. Neurosci. 15, 1729–1735.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory.
Neuropsychologia 9, 97–113.
Ostwald, D., Spitzer, B., Guggenmos, M., Schmidt, T.T., Kiebel, S.J., Blankenburg, F., 2012.
Evidence for neural encoding of Bayesian surprise in human somatosensation.
NeuroImage 62, 177–188.
Penny, W.D., 2012. Comparing dynamic causal models using AIC, BIC and free energy.
NeuroImage 59, 319–330.
Penny, W.D., Stephan, K.E., Mechelli, A., Friston, K.J., 2004. Comparing dynamic causal
models. NeuroImage 22, 1157–1172.
Penny, W.D., Stephan, K.E., Daunizeau, J., Rosa, M.J., Friston, K.J., Schofield, T.M., Leff,
A.P., 2010. Comparing families of dynamic causal models. PLoS Comput. Biol. 6,
e1000709.
Phillips, L.D., Edwards, W., 1966. Conservatism in a simple probability inference task. J.
Exp. Psychol. 72, 346–354.
Polich, J., 2007. Updating P300: an integrative theory of P3a and P3b. Clin. Neurophysiol.
118, 2128–2148.
Pouget, A., Beck, J.M., Ma, W.J., Latham, P.E., 2013. Probabilistic brains: knowns and
unknowns. Nat. Neurosci. 16, 1170–1178.
Prelec, D., 1998. The probability weighting function. Econometrica 66, 497–527.
Preuschoff, K., Bossaerts, P., Quartz, S.R., 2006. Neural differentiation of expected reward
and risk in human subcortical structures. Neuron 51, 381–390.
Preuschoff, K., Quartz, S.R., Bossaerts, P., 2008. Human insula activation reflects risk prediction errors as well as risk. J. Neurosci. 28, 2745–2752.
Rangel, A., Camerer, C., Montague, P.R., 2008. A framework for studying the neurobiology
of value-based decision making. Nat. Rev. Neurosci. 9, 545–556.
Robert, C., 2007. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer, New York, NY.
Ruchkin, D.S., Johnson, R., Mahaffey, D., Sutton, S., 1988. Toward a functional categorization of slow waves. Psychophysiology 25, 339–353.
Shannon, C.E., Weaver, W., 1948. The mathematical theory of communication. Commun.
Bell Syst. Tech. J. 27, 379–423.
Sokolov, Y.N., 1966. Orienting reflex as information regulator. In: Leontiev, A.N., Luria, A.R.,
Smirnov, A.A. (Eds.), Psychological Research in the U.S.S.R. Progress Publishers,
Moscow, pp. 334–360.
Soltani, A., Wang, X.J., 2010. Synaptic computation underlying probabilistic inference. Nat.
Neurosci. 13, 112–119.
Spencer, K.M., Dien, J., Donchin, E., 2001. Spatiotemporal analysis of the late ERP responses to deviant stimuli. Psychophysiology 38, 343–358.
Stephan, K.E., Weiskopf, N., Drysdale, P.M., Robinson, P.A., Friston, K.J., 2007. Comparing
hemodynamic models with DCM. NeuroImage 38, 387–401.
Stephan, K.E., Penny, W.D., Daunizeau, J., Moran, R.J., Friston, K.J., 2009. Bayesian model
selection for group studies. NeuroImage 46, 1004–1017.
Strange, B.A., Duggins, A., Penny, W.D., Dolan, R.J., Friston, K.J., 2005. Information theory,
novelty and hippocampal responses: unpredicted or unpredictable? Neural Netw.
18, 225–230.
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: cumulative representation
of uncertainty. J. Risk Uncertain. 5, 297–323.
Vilares, I., Körding, K., 2011. Bayesian models: the structure of the world, uncertainty,
behavior, and the brain. Ann. N. Y. Acad. Sci. 1224, 22–39.
Vilares, I., Howard, J.D., Fernandes, H.L., Gottfried, J.A., Körding, K.P., 2012. Differential representations of prior and likelihood uncertainty in the human brain. Curr. Biol. 22,
1641–1648.
Wu, S.W., Delgado, M.R., Maloney, L.T., 2011. The neural correlates of subjective utility of
monetary outcome and probability weight in economic and in motor decision under
risk. J. Neurosci. 31, 8822–8831.
Yang, T., Shadlen, M.N., 2007. Probabilistic reasoning by neurons. Nature 447,
1075–1080.
Zhang, H., Maloney, L.T., 2012. Ubiquitous log odds: a common representation of probability and frequency distortion in perception, action, and cognition. Front. Neurosci.
6, 1.