
Tjerk Dercksen

s2033321
22/8/2016

Essay Clark: Surfing Uncertainty


In 2016 Andy Clark, professor of logic and metaphysics at the University of
Edinburgh, wrote the book Surfing Uncertainty: Prediction, Action, and the
Embodied Mind. It was well received within the scientific community and by
leading scientists in the field of the predictive mind. For instance, Karl Friston,
arguably the most influential scientist in the field, writes: "This is a truly superb
book that will have an enormous influence on the way we understand our
embodied brains." Kenneth Aizawa called it "an essential point of departure . . .
to come to grips with the apparatus of predictive processing." The book claims to
be ground-breaking because it does not see perception as a step in the
information processing sequence (as in traditional approaches), but as a
consequence of the mind that is constantly trying to predict what will happen
next. This same predictive machinery is also used to explain imagination, action,
reasoning, and understanding, which Clark aptly names a "cognitive package
deal". The book does a good job of explaining how the idea of a
predictive mind can explain such a broad spectrum of cognitive functions.
However, because it has a very hands-on approach, little attention is given to the
history and development behind these ideas. Many ideas in the book have a long
history in psychological and philosophical research. For instance, a recurring
theme in the book is that the reality we see is not reality but a construction of the
mind. This idea has deep historical roots in both philosophy and science. In this
essay I will trace the most important ideas of Clark's book back to their origin and
describe their development over time. In this process I will describe the
emergence of the Bayesian view of the brain, how it developed in relation to the
traditional information-processing view, and what advantages the Bayesian
approach offers. This will allow me to assess the value of Clark's book in the
context of previous research.
One may wonder why unravelling the origin of the ideas behind a theory would be
useful. For instance, does a chemist really need to know when and how the
atom was discovered, as long as she knows how to apply this knowledge? The
modern philosopher Daniel Dennett would be very disapproving of this position.
He emphasises the importance of being aware of the history of ideas underlying
scientific discoveries and theories, and argues that knowing which mistakes have
been made in the past helps us ask better questions in the future. In science and
philosophy, Dennett states, a lot of smart people have already made a lot of very
tempting mistakes, and to avoid these mistakes without knowing them one
must be either extremely smart or extremely lucky (Dennett, 2013). The effect of
historical context on the field of cognitive neuroscience is especially important
because of the interrelationship between the image of man at a given time and
how he thinks his brain works. Russell Poldrack (2010) boldly reveals this in
an article by using a reductio ad absurdum:
Imagine that fMRI had been invented in the 1860s rather than the 1990s.
Instead of being based on modern cognitive psychology, neuroimaging would
instead be based on the faculty psychology of Thomas Reid and Dugald Stewart,
which provided the mental faculties that Gall and the phrenologists attempted
to map onto the brain. Researchers would have presumably jumped from
phrenology to fMRI and performed experiments manipulating the engagement of
particular mental faculties, or examining individual differences in the strength of
the faculties. They almost certainly would have found brain regions that were
reliably engaged when a particular faculty was engaged, and potentially would
also have found regions whose activity correlated with the strength of each
faculty across subjects.
Poldrack continues his argument by coupling contemporary fMRI research to
phrenological faculties such as "poetic talent" and "moral sense". His point is
that we may be equally captured, and possibly limited, by a specific
taxonomy of mental function. Tracing back the ideas behind Clark's theory is
therefore a useful exercise, because it will increase our awareness of the
paradigm that we are in, and so enable us to ask better questions and be
appropriately critical towards Clark's book.
This history of ideas starts with one of the most remarkable discoveries I made in
the process of analysing the historical roots of both the predictive processing and
the traditional view of perception. This is that the divide between the two
movements was already present, to a certain degree, between the two most
famous ancient philosophers. Plato and Aristotle had subtly different ideas
about perception that, in retrospect, can be considered the difference between
the traditional view of perception and the view presented in Clark's book.
Although both Plato and Aristotle agreed that the experience of reality is
indirect and has to be constructed by the mind, Plato's view takes a passive
stance while Aristotle argues that perception is active. Plato sees perception as
a clearly structured step-by-step process. This is presented in his allegory of the
cave where first the light casts shadows on the walls, then these patterns enter
the eyes where in the next step reason is used to classify them using a store of
ideal forms. Actions were selected as the last step in the process by ethical
judgement using the moral faculty. The fact that the observer in the cave is a
chained prisoner is a striking metaphor fitting Plato's view. In contrast, Aristotle
argues that perception requires movement into the world. One learns by
manipulating the forms, textures, weights, and appearances of objects. He
states that behaviour is proactive, not reactive as Plato suggests (Freeman,
2003). This bears a strong resemblance to the view of Clark, who unifies action
and perception as being in essence the same thing: a way to sample the world
and test the predictions that are already generated by our internal model.
The next big steps in the development of both traditional cognitive neuroscience
and the predictive mind were made by Hermann von Helmholtz (1821-1894), who
is often claimed by the predictive-mind camp as the founder of predictive
processing. When reading a paper associated with the predictive or Bayesian
brain, von Helmholtz will undoubtedly make an appearance somewhere in the
introduction. This is rightfully so, because he was the first to seize on the idea of
the brain as a hypothesis tester and to see perception as a process of unconscious
inference (Helmholtz, 1855). What is often overlooked, however, is the
philosopher who inspired von Helmholtz: Immanuel Kant. Kant had advanced the first
detailed model of the human mind that did not assume, but rather sought to
explain, basic aspects of consciousness and perceptual experience (Hopkins,
2012). Kant recognized that we must understand experience, and the whole
conscious image of ourselves of which it is part, as somehow created by the
internal activity of our minds. This idea was not completely new: philosophers
like Descartes had already recognized that experience is not simply produced by
effects on the sensory organs, because perceptual experience regularly arises in
dreams. Kant defined two parallel worlds: an objective physical world and a
subjective world of intuition. The process of going from the objective physical
world to the subjective world of intuition was called synthesis. Synthesis was
unconscious, and people made use of concepts they were born with, such as a
sense of time and space, to make sense of the incoming sensory signal.
Helmholtz was critical of this nativism and believed that people are not born
with such concepts but learn them by engaging with the world. According to Helmholtz,
people must have powerful internal models of the world that are constantly
updated through a process of inductive reasoning, to solve the problem of
inferring the structure of the world from sensory input. This idea would later
develop into the notion of the Bayesian brain. Helmholtz is seen as the founder of
empiricism in psychology because he argued that between sensations (when our
senses first register the effects of stimulation) and our conscious perception of
the real world, there must be intermediate processes of a constructive nature
(Gordon, 2004). The growing dominance of behaviourism at the time prevented
this empiricism from flourishing. The cognitive revolution of the 1950s changed
this, and a great deal of research was conducted in a short time. Gregory (1968)
brought together much of this research in his theory of perception as hypotheses,
which was inspired by the work of von Helmholtz. He showed that many visual
illusions were based on hypotheses that worked most of the time, but in
extraordinary cases failed to perform. For instance, in the hollow-mask illusion a
person perceives a hollow mask as a normal, convex face because the hypothesis
of a hollow face is highly unlikely. In addition, he argued that people
behave towards objects in a specific way even if the object is not completely
visible. For instance, people behave towards a table as if it is a table although
they only see three legs and the trapezoidal projection of the top. Gregory
presents many more arguments and makes a convincing case for the theory that
perceptions are unconsciously driven by hypotheses about the world that are true
in most cases.
An important development for both the traditional and the predictive processing
theory of perception was the emergence of the field of artificial intelligence. The
Second World War had led to important technological developments such as the
computer. Together with the (linked) emergence of information theory (Shannon,
1948) and cybernetics (Wiener, 1948), the paradigm for looking at the brain
shifted, as is captured by this quote from Neisser (1967):
. . .the task of a psychologist trying to understand human cognition is
analogous to that of a man trying to discover how a computer has been
programmed.
From this position two different paths were pursued that are in some ways
comparable to the two different positions of Plato and Aristoteles as presented
above. The first has become the dominant view of perception and is analogous to
Plato's view of perception. This began with important research by Hubel and
Wiesel (1962, 1968) that showed that the visual cortex contains cells responding
differentially to lines and edges, according to the orientation of these stimuli.

David Marr took this and other recent work on visual perception and used an
information theoretical approach to develop his theory of visual perception (Marr,
1982). This theory is divided into well-defined stages in which an image is built
from the bottom up. The first stage in the process is the retinal image, which
represents intensity values across the retina and is the starting point of the
process of seeing. Next comes the primal sketch, whose function is to take the
raw intensity values of the visual image and make explicit certain forms of
information contained in it. The third stage is the 2.5D sketch, in which the
orientation and rough depth of visible surfaces are made explicit. The last stage
is the 3D model representation, in which the
perceiver has attained a model of the real external world. After this process
appropriate actions can be selected and executed as a top-down process. The
influence of this work on cognitive neuroscience has been great, as can be seen
in the current textbooks on cognition where bottom-up feature nets are still
presented as the main way our minds create a model of reality (Reisberg, 2013).
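To make the stage-wise character of this view concrete, the following is a schematic sketch in Python of a strictly bottom-up pipeline in the spirit of Marr's stages. It illustrates the architecture only, not Marr's actual algorithms: the synthetic image, the gradient-based edge step, and the toy surface and object descriptions are invented stand-ins.

import numpy as np

# Schematic, bottom-up pipeline in the spirit of Marr's stages (illustration only).
# Each stage is a function of the previous stage's output; there is no top-down feedback.

def retinal_image(size=8):
    # Intensity values across the retina; here a synthetic bright square.
    img = np.zeros((size, size))
    img[2:6, 2:6] = 1.0
    return img

def primal_sketch(img):
    # Make explicit where intensity changes (edges), via gradient magnitudes.
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def sketch_2_5d(edges):
    # Viewer-centred stand-in for surface orientation and rough depth:
    # simply mark the regions bounded by edges.
    return (edges > 0).astype(float)

def model_3d(surfaces):
    # Object-centred stand-in: summarise the surfaces as a crude object description.
    return {"object_present": bool(surfaces.any()), "surface_extent": int(surfaces.sum())}

# The whole process runs strictly bottom-up, from retinal image to 3D model:
print(model_3d(sketch_2_5d(primal_sketch(retinal_image()))))
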
The next step in the field of artificial intelligence was to make computers achieve
humanlike perception. Using the information about neurons obtained from
research by Cajal and Golgi at the start of the 20th century, neural networks were
programmed and algorithms were developed to perform perception. However,
despite major advances of artificial intelligence in other areas, such as chess
playing, perception proved to be a very hard problem to solve. Some scientists in
the field of computer vision came to the conclusion that the problem of vision is
ill-posed (Yuille & Bülthoff, 1996). They state that:
. . . the retinal image is potentially an arbitrarily complicated function of the
visual scene and so there is insufficient information in the image to uniquely
determine the scene.
The problem these scientists faced was that the responses of cells in the visual
cortex (such as feature detectors) do not uniquely determine whether the
corresponding edge in the scene is due to a shadow, a reflection, or a change in
depth, orientation, or material (Kersten, 1999). The bottom-up process from
retinal image to 3D perception proposed by Marr thus seemed an impossible
problem, even though neurophysiological evidence of feature-responsive cells kept
accumulating (Metzner et al., 1998). To understand why the problem was ill-posed,
one must know how Marr approached the understanding of the brain. Marr
and Poggio (1976) thought that information processing systems (such as the
brain) had to be understood at three levels:

- The computational level, which entails what problems the system has to solve and why.
- The algorithmic level, which entails what representations it uses and what processes it employs.
- The implementational level, which entails how the system is physically realised.

The problem Marr stated at the computational level of vision was knowing
"what is where by looking". This was the problem Yuille and Bülthoff (1996) found
ill-posed, and they formulated a better-posed problem using Bayesian probability
theory. In line with Helmholtz and Gregory, they argued that the problem was not
one of constructing, but of inferring, a visual scene from the retinal image. To make this
new problem at the computational level understandable, a brief build-up to
Bayesian inference will be presented, inspired by Hohwy's (2013) book The
Predictive Mind (by "inspired" I mean an abridged and paraphrased version of his
first chapter, supplemented with background information). First of all, the option
should be considered that the process of inferring a visual scene from a retinal
image could just be a matter of mere biases. Hohwy explains this point by giving
the example of perceiving a bicycle. If
the image of a bicycle appears on the retina, the brain could just be biased
towards inferring a bicycle. However, if a swarm of bees in the shape of a bicycle
appears on the retina, then the brain would have no way of knowing the
difference because it is simply biased towards a bicycle. Also, there would be no
difference in quality between the perception of a bicycle and the perception of a
swarm of bees in the shape of a bicycle. What is needed is a constraint that will
assign quality to these different hypotheses. In other words, a constraint with
normative impact. One good constraint to consider is a very intuitive one, as it
seems obvious that quality could be assigned to these hypotheses on the basis of
prior belief: experience tells us that it is extremely unlikely that a swarm of bees
would take the shape of a bicycle, and so one could rank the hypothesis of a
bicycle higher than that of a swarm of bees. This idea of prior belief is crucial in
Bayesian inference. Next, consider being in a house with no windows and no
books or internet. You hear a tapping sound and need to figure out what is
causing it. In principle an infinite list of hypotheses could be made of what is
causing the sound (a woodpecker pecking at the wall, a branch tapping at the
wall in the wind, a burglar tampering with a lock, heavy roadworks further down
the street, etc.). However, not every hypothesis will seem equally relevant. For
example, one would not accept that the tapping noise was caused by yesterday's
weather. This means there is a link between a hypothesis and the effects in
question. It can be said that if it is really a woodpecker, then it would indeed
cause this kind of sound. This is the notion of likelihood: the probability that the
causes described in the hypothesis would cause those effects. Assessing such
likelihoods is based on assumptions of causal regularities in the world (for
example, the typical effects of woodpeckers). Based on knowledge of causal
regularities in the world, hypotheses can be ranked according to their likelihood,
and only the hypotheses with the highest likelihoods need be considered. However,
doing this does not ensure good causal inference. For instance, the tapping sound
could be caused by a tapping machine especially designed to illustrate Bayesian
inference. This hypothesis has a very high likelihood, because it fits the auditory
evidence extremely well, but it does not seem like a good explanation. The
problem is that the tapping machine hypothesis seems very improbable when
considered in its own right and before the tapping sound was perceived.
Therefore, the probability of the hypothesis prior to any consideration of its fit
with the evidence should be considered. This is referred to as the prior
probability. So there are now two tools for figuring out the cause of the tapping
sound: the likelihood, which is the probability of the effect observed in
the house given the particular hypothesis that is considered; and the prior
probability of the hypothesis, which is the estimate of how probable that
hypothesis is independently of the effects currently observed. Likelihood and
prior are the main ingredients in Bayes' rule. Thomas Bayes (1701-1761) was an
amateur mathematician who, inspired by Bernoulli, developed a theorem of
probability theory that was further developed by Laplace and is thought by many
to be a paradigm of rationality (Jaynes, 1986). According to this rule, the
probability of a given hypothesis (such as the woodpecker hypothesis), given some
evidence (such as hearing a tapping sound), is proportional to the product of the
likelihood (the probability of the evidence given the hypothesis) and the prior
probability of the hypothesis (not considering the evidence). The resulting
assignment of probability to the hypothesis is known as the posterior probability,
and the best inference is to the hypothesis with the highest posterior probability.
Presented mathematically, the Bayesian theory of vision of Yuille and Bülthoff
looks as follows:
P(S|I) = P(I|S) P(S) / P(I)
Here S represents the visual scene and I the retinal image. P(I|S) is the likelihood
function for the scene, specifying the probability of obtaining image I from a given
scene S; P(S) is the prior probability of the visual scene; and P(I), the probability of
the image itself, serves as a normalising term.
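To make the arithmetic concrete, the following is a minimal sketch in Python of how posterior probabilities could be computed for the tapping-sound example. The priors and likelihoods are invented numbers chosen purely for illustration; only their relative sizes matter.

# Minimal illustration of Bayes' rule: posterior is proportional to likelihood x prior.
# The numbers below are invented for the sake of the example.
hypotheses = {
    # name: (prior probability, likelihood of the observed tapping sound)
    "woodpecker pecking at the wall": (0.10, 0.70),
    "branch tapping in the wind": (0.30, 0.60),
    "machine built to illustrate Bayesian inference": (0.001, 0.99),
}

unnormalised = {h: prior * lik for h, (prior, lik) in hypotheses.items()}
evidence = sum(unnormalised.values())            # plays the role of P(I)
posterior = {h: p / evidence for h, p in unnormalised.items()}

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {h}")
# The machine hypothesis fits the evidence best (highest likelihood), but its tiny
# prior keeps its posterior low; the branch wins on the product of the two.
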
So, given this new mathematical view of the brain, how does the problem at the
computational level change? It has been transformed into a problem of
probabilistic inference in which a scene must be inferred from the data arriving
through the senses, using likelihoods and prior probabilities. The mathematics
aside, the problem stays the same as the one Helmholtz posed: to infer the
structure of the world from sensory input. With a new problem at the
computational level, the two other levels remain to be treated. Programming the
Bayesian ideas into neural networks led to great advances in the field of artificial
intelligence and machine learning, and inspired cognitive neuroscientists to include
these insights in their theories about the mind. The question that started the
advances at the algorithmic level was: where should the prior beliefs come from?
In order to prioritize between different hypotheses about the causes of sensory
input, the system needs to appeal to prior beliefs, but the account would be
circular if those priors were directly based on the very thing we are trying to
understand, namely perceptual inference. The solution was the application of a
hierarchical generative model, inspired by information theory and so-called
empirical Bayes (Huang & Rao, 2011).
The model as presented in Clark's book is shown in Figure 1. Lower levels of the
hierarchy are concerned with highly spatially and temporally precise information,
and higher levels with increasingly abstract information. As can be seen, every
level in the hierarchy receives expectations from the level above it. These
expectations form the priors against which new evidence is evaluated in a
Bayesian way. Notice that the circularity about where the priors come from has
now disappeared, as priors are drawn from the next level up in the hierarchy. As
Bayesian-optimal expectations about what will happen next are sent down the
hierarchy, they cause a reaction in the levels that receive them. This is best
illustrated at the point of the senses. If a line is expected to appear somewhere in
the visual field at a certain time, and it does not, a so-called prediction error will
move back up the hierarchy. The level receiving this prediction error will treat it
as new evidence and adjust the posterior probability of its hypotheses in a
Bayesian-optimal way (given the priors provided by the level above). If prediction
error remains unexplained by the new hypothesis, it will keep moving upwards
until it is explained at a higher level. This model emerged as a joint effort of many
researchers under the name of predictive coding; names that played big roles in
its development include Karl Friston, David Mumford, and Geoffrey Hinton. To
answer the question posed at the algorithmic level (how does the system do what
it does?): by minimising prediction error, and thus arriving at the best possible
hypotheses about the state of the world.

Figure 1: the hierarchical predictive model (Clark, 2016)
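As a rough illustration of the dynamics just described, the following is a toy sketch in Python of a single sensory channel with one higher level. It is not Clark's, Friston's, or Mumford's actual scheme: the linear generative mapping, the precision weights, and the step size are assumptions made only to obtain a small runnable example.

# Toy predictive-coding update for one sensory channel (illustration only).
# A higher level holds a belief mu about a hidden cause and predicts the input
# through an assumed generative mapping g(mu) = 2 * mu. Prediction errors are
# weighted by precisions and used to revise mu towards a compromise between
# the prior expectation sent down from above and the incoming evidence.

def settle_belief(sensory_input, mu_prior=1.0,
                  pi_sensory=1.0, pi_prior=1.0,
                  step_size=0.05, steps=200):
    g = lambda mu: 2.0 * mu                  # assumed top-down (generative) mapping
    mu = mu_prior                            # start from the prior expectation
    for _ in range(steps):
        err_sensory = sensory_input - g(mu)  # bottom-up prediction error
        err_prior = mu_prior - mu            # error against the higher-level prior
        # Gradient step that reduces the precision-weighted squared errors:
        mu += step_size * (2.0 * pi_sensory * err_sensory + pi_prior * err_prior)
    return mu

print(settle_belief(4.2))  # settles near 1.88, between the prior (1.0) and the data

Raising pi_prior relative to pi_sensory makes the settled belief stick closer to the expectation sent down from above, while raising pi_sensory lets the sensory evidence dominate.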

The last level of analysis, the implementational level, has been extensively
researched over the last ten years or so. The first proofs of concept were the
neural networks in which these models were first implemented. After scientists
programmed and trained these networks, they found that model neurons
developed receptive-field properties similar to those in V1 (Huang & Rao, 2011).
In the salamander retina, ganglion cells were found to perform predictive-coding
computations, changing their expectations in reaction to a new surround
(Hosoya et al., 2005). Also, ERP components that signal surprise, such as the P300,
are considered evidence for the signalling of prediction error. There is far more
physiological evidence than these examples, but at the same time there is
agreement in the field that this evidence is still speculative and needs to be
treated critically. Although there is broad agreement that the evidence
overwhelmingly suggests the brain must be some kind of probabilistic prediction
machine (Helmholtz, 1855), how this is implemented in the brain is still an
emerging field of physiological research. This new paradigm does, however,
provide the scientific
community with a very testable mechanism. Because all areas of cognition
should be explained by the process of prediction error minimization, new
research can focus on falsifying and confirming this underlying mechanism. How
this mechanism works on a neural level will be the main challenge that future
research faces, but at least it is a well-defined challenge.
The discussion up to this point can also be interpreted as a variation on the
bottom-up versus top-down debate in cognitive psychology. This debate started
between Richard Gregory and James Gibson. Gregory, as already discussed,
argued that perception relied on top-down processes where higher cognitive
information is used to make inferences about stimulus information that is often
ambiguous. On the other side of the argument was Gibson, who argued that
perception is not subject to hypothesis testing. According to Gibson, sensory
information is analysed in one direction: from simple analysis of raw sensory data
to ever-increasing complexity of analysis through the visual system (McLeod,
2007). Marr's theory of perception is a prototypical example of such a bottom-up
theory. On the other hand, the Bayesian brain paradigm could be considered an
example of a top-down theory where optimal hypotheses are formed about
incoming sensory stimuli. As the top-down versus bottom-up debate developed
over time, top-down influence became associated with an active, volitional process
based on expectancy and goal set, completely under the control of the intentions
of the observer. Bottom-up, in contrast, was associated with passive and
automatic processing, determined by the feature properties present in the
environment, and driven by emotional content or previous experience (Theeuwes,
2010). Although these descriptions seem clear-cut, there has been and still is a
lot of debate about these definitions, and at this time so many definitions are in
use that some researchers have called it a "fuzzy dichotomy" (Rauss &
Pourtois, 2013), or even a "failed dichotomy" (Awh, Belopolsky & Theeuwes,
2012). The main problem with the bottom-up versus top-down dichotomy is that
when the terms are broadly interpreted they are of little use, but when the terms
are applied in more detailed areas of cognition they do not seem to fit within
current research. As Rauss and Pourtois (2013) state: ". . . the use of a simple
binary classification does not appear to capture the profusion of mental
processes or functions, which include intention, volition, expectation, and
emotion." As mentioned, the two main ways of thinking about perception can be
captured by the top-down versus bottom-up paradigm in one way or another. This
can also give rise to variations between the two, as is often seen in cognitive
psychology textbooks where perception is presented as a traditional bottom-up
process with some form of top-down control. However, because of the fuzziness
surrounding these terms, I have chosen to stick with the traditional and Bayesian
views of the brain in this essay, as I think this gives a clearer picture of the developments
in the field.
To conclude this essay, the contribution of Clark's book will be evaluated. As can
be seen from this coarse review of the developments up to this point, many of the
ideas presented in Clark's book are not completely new, and some are even
classic. The value of the book lies, however, in its state-of-the-art collection of these
ideas, its application to such a broad area of cognition, and the atmosphere of
embodiment throughout the book. The core of Clark's book is the hierarchical
generative model described above. The only new element added here is that he
names the framework predictive processing instead of predictive coding, reasoning
that the latter term would otherwise be confused with predictive coding as a
form of data compression. In his book he explains how the central mechanism of
predictive processing can be applied to imagination, action, dreaming, autism,
schizophrenia, emotions, reasoning, empathy, and more. Although none of
these applications of predictive processing are his own ideas, the act of compiling
them makes the book convincing in its claim to offer a "cognitive package deal".
The most original aspect of the book lies in the view of embodiment that Clark
emphasizes throughout. Clark is an expert on embodied cognition, which is the
view that many features of human cognition are shaped by aspects of the body
beyond the brain. He shows how the internal model of the world is first learned,
and then constantly updated, by the body's active interaction with the
environment. This results in the Aristotelian view of the brain, where one learns
through movement and manipulation of the world, with a brain that is always
active. Sharing this classic idea in its broadest, most evolved version is, in
my opinion of great value for future research(ers). This point holds especially
when considering that textbooks on cognition still teach the traditional
information processing view of perception, and that this is only the second book
written on the predictive mind. Given the importance of the work and its deep
historical roots, I can even imagine that, to promote the spread of the book, one
would call it ground-breaking.

References
Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up
attentional control: A failed theoretical dichotomy. Trends in Cognitive
Sciences, 16(8), 437-443.
Dennett, D. C. (2013). Intuition pumps and other tools for thinking. WW Norton &
Company.
Freeman III, W. J. (2003). Neurodynamic models of brain in
psychiatry. Neuropsychopharmacology, 28.
Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy
principle. Philosophical Transactions of the Royal Society of London B: Biological
Sciences, 364(1521), 1211-1221.
Gordon, I. E. (2004). Theories of visual perception. Psychology Press.
Helmholtz, H. (1855; 1903). Über das Sehen des Menschen (1855). In Vorträge
und Reden von Hermann Helmholtz. 5th ed. Vol. 1.
Hopkins, J. (2012). Kantian Neuroscience and Radical Interpretation.
Hosoya, T., Baccus, S., & Meister, M. (2005). Dynamic predictive coding by the
retina. Nature, 436, 71-77.
Huang, Y., & Rao, R. P. (2011). Predictive coding. Wiley Interdisciplinary Reviews:
Cognitive Science, 2(5), 580-593.
Jaynes, E. T. (1986). Bayesian methods: General background.
Kant, I. (1781). Kritik der reinen Vernunft. Hamburg: Felix Meiner.
Kersten, D. (1999). High-level vision as statistical inference. In M. S. Gazzaniga
(Ed.), The New Cognitive Neurosciences (2nd ed., pp. 353-363). Cambridge, MA: MIT Press.
Lenoir, T. (2006). Operationalizing Kant: Manifolds, models, and mathematics in
Helmholtz's theories of perception. In M. Friedman & A. Nordmann (Eds.), The
Kantian legacy in nineteenth-century science (pp. 141-210). Cambridge, MA: MIT Press.
Marr, D., & Poggio, T. (1976). From understanding computation to understanding
neural circuitry. Artificial Intelligence Laboratory, A.I. Memo. Massachusetts
Institute of Technology.
McLeod, S. A. (2007). Visual Perception Theory. Retrieved from
www.simplypsychology.org/perception-theories.html
Metzner, W., Koch, C., Wessel, R., & Gabbiani, F. (1998). Feature extraction by
burst-like spike patterns in multiple sensory maps. The Journal of
Neuroscience, 18(6), 2283-2300.
Poldrack, R. A. (2010). Mapping mental function to brain structure: How can
cognitive neuroimaging succeed? Perspectives on Psychological Science, 5(6),
753-761.

Rauss, K., & Pourtois, G. (2013). What is Bottom-Up and What is Top-Down in
Predictive Coding? Frontiers in Psychology, 4, 276.
Reisberg, D. (2013). Cognition: Exploring the Science of the Mind: Fifth
International Student Edition. WW Norton & Company.
Theeuwes, J. (2010). Top-down and bottom-up control of visual selection. Acta
Psychologica, 135(2), 77-99.
Wiener, N. (1948). Cybernetics, or Control and Communication in the Animal
and the Machine. Cambridge, MA: MIT Press.
Yuille, A. L., & Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics.
In D. C. Knill & W. Richards (Eds.), Perception as Bayesian Inference. Cambridge,
U.K.: Cambridge University Press.
