

Exploring Robotic Minds

OXFORD SERIES ON
COGNITIVE MODELS AND ARCHITECTURES

Series Editor
Frank E. Ritter

Series Board
Rich Carlson
Gary Cottrell
Robert L. Goldstone
Eva Hudlicka
William G. Kennedy
Pat Langley
Robert St. Amant

Integrated Models of Cognitive Systems
Edited by Wayne D. Gray

In Order to Learn: How the Sequence of Topics Influences Learning
Edited by Frank E. Ritter, Joseph Nerb, Erno Lehtinen, and Timothy O'Shea

How Can the Human Mind Occur in the Physical Universe?
By John R. Anderson

Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition
By Joscha Bach

The Multitasking Mind
By David D. Salvucci and Niels A. Taatgen

How to Build a Brain: A Neural Architecture for Biological Cognition
By Chris Eliasmith

Minding Norms: Mechanisms and Dynamics of Social Order in Agent Societies
Edited by Rosaria Conte, Giulia Andrighetto, and Marco Campennì

Social Emotions in Nature and Artifact
Edited by Jonathan Gratch and Stacy Marsella

Anatomy of the Mind: Exploring Psychological Mechanisms and Processes with the Clarion Cognitive Architecture
By Ron Sun

Exploring Robotic Minds: Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena
By Jun Tani

Exploring Robotic Minds
Actions, Symbols, and Consciousness
as Self-Organizing Dynamic Phenomena

Jun Tani

Oxford University Press is a department of the University of Oxford. It furthers
the University's objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2017

All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by license, or under terms agreed with the appropriate reproduction
rights organization. Inquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.

You must not circulate this work in any other form
and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data


Names: Tani, Jun, 1958– author.
Title: Exploring robotic minds : actions, symbols, and consciousness as self-organizing
dynamic phenomena / Jun Tani.
Description: Oxford; New York: Oxford University Press, [2017] |
Series: Cognitive models and architectures | Includes bibliographical
references and index.
Identifiers: LCCN 2016014889 (print) | LCCN 2016023997 (ebook) |
ISBN 9780190281069 (hardcover : alk. paper) | ISBN 9780190281076 (UPDF)
Subjects: LCSH: Artificial intelligence. | Robotics. | Cognitive neuroscience.
Classification: LCC Q335 .T3645 2017 (print) | LCC Q335 (ebook) |
DDC 629.8/9263 dc23
LC record available at https://lccn.loc.gov/2016014889

9 8 7 6 5 4 3 2 1
Printed by Sheridan Books, Inc., United States of America

Contents

Foreword by Frank E. Ritter

Preface

Part I On the Mind

1. Where Do We Begin with Mind?

2. Cognitivism
2.1 Composition and Recursion in Symbol Systems
2.2 Some Cognitive Models
2.3 The Symbol Grounding Problem
2.4 Context
2.5 Summary

3. Phenomenology
3.1 Direct Experience
3.2 The Subjective Mind and Objective World
3.3 Time Perception: How Can the Flow of Subjective Experiences Be Objectified?
3.4 Being-in-the-World
3.5 Embodiment of Mind
3.6 Stream of Consciousness and Free Will
3.7 Summary

4. Introducing the Brain and Brain Science
4.1 Hierarchical Brain Mechanisms for Visual Recognition and Action Generation
4.2 A New Understanding of Action Generation and Recognition in the Brain
4.3 How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?
4.4 Deciding Among Conflicting Evidence
4.5 Summary

5. Dynamical Systems Approach for Modeling Embodied Cognition
5.1 Dynamical Systems
5.2 Gibsonian and Neo-Gibsonian Approaches
5.3 Behavior-Based Robotics
5.4 Modeling the Brain at Different Levels
5.5 Neural Network Models
5.6 Neurorobotics from the Dynamical Systems Perspective
5.7 Summary

Part II Emergent Minds: Findings from Robotics Experiments

6. New Proposals
6.1 Robots with Subjective Views
6.2 Engineering Subjective Views into Neurodynamic Models
6.3 The Subjective Mind and the Objective World as an Inseparable Entity

7. Predictive Learning About the World from Actional Consequences
7.1 Development of Compositionality: The Symbol Grounding Problem
7.2 Predictive Dynamics and Self-Consciousness
7.3 Summary

8. Mirroring Action Generation and Recognition with Articulating Sensory-Motor Flow
8.1 A Mirror Neuron Model: RNNPB
8.2 Embedding Multiple Behaviors in Distributed Representation
8.3 Imitating Others by Reading Their Mental States
8.4 Binding Language and Action
8.5 Summary

9. Development of Functional Hierarchy for Action
9.1 Self-Organization of Functional Hierarchy in Multiple Timescales
9.2 Robotics Experiments on Developmental Training of Complex Actions
9.3 Summary

10. Free Will for Action and Conscious Awareness
10.1 A Dynamic Account of Spontaneous Behaviors
10.2 Free Will, Consciousness, and Postdiction
10.3 Summary

11. Conclusions
11.1 Compositionality in the Cognitive Mind
11.2 Phenomenology
11.3 Objective Science and Subjective Experience
11.4 Future Directions
11.5 Summary

Glossary for Abbreviations

References

Index

Foreword
Frank E. Ritter

This book describes the background and results from Jun Tani's work of over a
decade of building robots that think and learn through interaction with the world.
It has numerous useful and deep lessons for modelers developing and using
symbolic, subsymbolic, and hybrid architectures, so I am pleased to see it in the
Oxford Series on Cognitive Models and Architectures. It is work in the spirit of
Newell and Simon's (1975) theory of empirical exploration of computer science
topics and their work on generation of behavior, and it also takes Newell and
Simon's and Feynman's motto of understanding through generation of behavior
seriously. At the same time, this work extends the physical symbol system
hypothesis in a very useful way by suggesting by example that the symbols of human
cognition need not be discrete symbols manually fed into computers (which we have
often done in symbolic cognitive architectures), but can instead be composable
neuro-dynamic structures arising through iterative learning of perceptual
experience with the physical world.

Tani's work has explored some of the deep issues in embodied cognition: how
interaction with the environment happens, what this means for representation and
learning, and how more complex behavior can be created, or how it arises through
simpler aspects. These lessons include insights about the role of interaction with
the environment, consciousness and free will, and lessons about how to build
neural net architectures to drive behavior in robots.


The book starts with a review of the foundations of this work, including some of
the philosophical foundations in this area (including the symbol grounding
problem, phenomenology, and the role of time in thinking). It argues for a role of
hierarchy in modeling cognition, and for modeling and understanding interaction
with an external world. The book also notes that state space attractors can be a
useful concept in understanding cognition, and, I would add, this could be a
useful additional way to measure fit of a model to behavior. This review also
reminds us of areas that current symbolic models have been uninformed by; I don't
think that these topics have been so much ignored as put on a list for later work.
These aspects are becoming more timely, as Tani's work shows they can be. The
review chapters make this book particularly useful as an advanced textbook, and
Tani already uses it as one.
Perhaps more importantly, in the second half of the book (Chapters 6 to 11) Tani
describes lessons from his own work. This work argues that behavior is not always
programmed or extant in a system, but that it can or often should arise in systems
attempting to achieve homeostasis; that there are positions of stability in a
mental representation (including modeling others, imitation); and that differences
in knowledge between the levels can give rise to effects that might be seen as a
type of consciousness, a mental trace of what lower levels should do or are doing,
or explanations of what they have done based on predictions of the agent's own
behavior, a type of self-reflexive mental model. These results suggest that more
models should model homeostasis and include more goals and knowledge about how to
achieve it.
His work provides another way of representing and generating behavior. This way
emphasizes the dynamic behavior of systems rather than the data structures used in
more traditional approaches. The simple ideas of evolution of knowledge, feedback,
attractors, and further concepts provide food for thought for all systems that
generate behavior. These components are reviewed in the first part of the book.
The second part of the book also presents several systems used to explore these
ideas.
Lessons from this book could and should change how we see all kinds of cognitive
architectures. Many of these concepts have not yet been noticed in symbolic
architectures, but they probably exist in them. This new way to examine behavior
in architectures has already provided insights about learning and interaction and
consciousness. Using these concepts in existing architectures and models will
provide new insights into how compositional thoughts and actions can be generated
without facing the notorious symbol grounding problem or, ultimately, the
mind-body problem.
In his work on layers of representation, he has seen that higher levels might not
just lead the lower levels, but also follow them, adjusting their own settings
based on the lower levels' behavior. An interpretation of the higher levels trying
to follow or predict the lower levels provides a potential computational
description and explanation of some forms of consciousness and free will. I found
these concepts particularly intriguing: not only that higher levels could follow
and not lead lower levels, but that the mismatch could lead to a kind of
postdiction in which intention enters conscious awareness only after action. We
might see this elsewhere as other architectures, their environments, and their
interaction with the environment become more complex, and indeed we should look
for it.
I hope you find the book as useful and suggestive of new areas of work and new
aspects of behavior to consider for including in architectures as I have.

Preface

The mind is ever elusive, and imagining its underlying mechanisms remains a
constant challenge. This book attempts to show a clear picture of how the mind
might work, based on tangible experimental data I have obtained over the last two
decades during my work to construct the minds of robots. The essential proposal of
the book is that the mind consists of emergent phenomena, which appear via
intricate and often conflictive interactions between the top-down subjective view
for proactively acting on the external world and the bottom-up recognition of the
resultant perceptual reality. This core idea can provide a scaffold to account for
the various fundamental aspects of the mind and cognition. Allowing entangling
interactions between the top-down and bottom-up processes means that the skills we
need to generate complex actions, the knowledge and concepts for representing the
world, and the linguistic competency we need to express our experiences can
naturally develop, and that the cogito¹ that allows this compositional yet fluid
thinking and action appears to be embedded in dynamic neural structures. The
crucial argument here is that this cogito is free from the problems inherent in
Cartesian dualism, such as the problem of interaction: how a nonmaterial mind can
cause anything in a material body and world, and vice versa. We avoid such
problems because the cogito embedded in the continuous state space of dynamic
neural systems is also matter, rather than nonmatter composed of a discrete symbol
system or logic. Therefore, the cogito can interact physically with the external
world: as one side pushes forward a little, the other side pulls back elastically
so that a point of compromise can be found in conflictive situations through
iterative dynamics. It is further proposed that even the nontrivial problem of
consciousness (what David Chalmers has called the "hard problem" of consciousness)
and free will can become accessible by considering that consciousness is also an
emergent phenomenon of matter arising inevitably from such conflictive
interactions. The matter here is alive and vivid in never-ending trials by the
cogito to comprehend an ever-changing reality in an open-ended world. Each of
these statements, my proposals on the workings of the mind, will be examined
systematically by reviewing multidisciplinary discussions, largely from the fields
of neuroscience, phenomenology, nonlinear dynamics, psychology, cognitive science,
and cognitive robotics. The book thus aims at a unique way of understanding the
mind through a rather unusual but inspiring combination of ingredients: humanoid
robots, Heidegger's philosophy, deep learning neural nets, strange attractors from
chaos theory, mirror neurons, Gibsonian psychology, and more.

1. Cogito is from a Latin philosophical proposition by René Descartes, "Cogito
ergo sum," which has been translated as "I think, therefore I am." Here, cogito
denotes a subject of cognizing or thinking.

The book has been written with a multidisciplinary audience in mind. Each chapter
starts by presenting general concepts or tutorials on its discipline (cognitive
science, phenomenology, neuroscience and brain science, nonlinear dynamics, and
neural network modeling) before exploring the subjects specifically in relation to
the emergent phenomena which I believe constitute the mind. By providing a brief
introduction to each topic, I hope that a general audience and undergraduate
students with a specific interest in this subject will enjoy reading on to the
more technical aspects of the book that describe the neurorobotics experiments.
I have debts of gratitude to many people. First of all, I thank Jeffrey White for
plenty of insightful advice on this manuscript in regard to its contents, as well
as for editing the English and examining every page. I would like to commend and
thank all members of my former laboratory at RIKEN as well as of the current one
at the Korea Advanced Institute of Science and Technology (KAIST) who, over the
years, have contributed to the research described in this book. I am lucky to have
many research friends with whom I can have in-depth discussions about shared
interests. Takashi Ikegami has been one of the most inspiring. His stroke of
genius and creative insights on the topics of life and the mind are irreplaceable.
I admit that many of my research projects described in this book have been
inspired by thoughtful discussions with him. Ichiro Tsuda provided me with deep
thoughts about possible roles of chaos in the brain. The late Joseph Goguen and
the late Francisco Varela generously offered me much advice about the links
between neurodynamics and phenomenology. Karl Friston has given me thoughtful
advice on our shared research interests on many occasions. Michael Arbib offered
insight into the concept of action primitives and mirror neuron modeling. He
kindly read my early draft and sent it to Oxford University Press. I have been
inspired by frequent discussions about developmental robotics with Minoru Asada
and Yasuo Kuniyoshi. I would like to express my gratitude and appreciation to
Masahiro Fujita, Toshitada Doi, and Mario Tokoro of Sony Corporation, who kindly
provided me with the chance to start my neurorobotics studies more than two
decades ago in an elevator hall in a Sony building. I must thank Masao Ito and
Shun-ichi Amari at RIKEN Brain Science Institute for their thoughtful advice on my
research in general. And I express my gratitude to Miki Sagara, who prepared many
figures. I am grateful to Frank Ritter, as the Oxford series editor on cognitive
models and architectures, who kindly provided me with advice and suggestions from
micro details to macro levels of this manuscript during its development. The book
could not have been completed in the present form without his input. I wish to
thank my Oxford University Press editor Joan Bossert for her cordial support and
encouragement from the beginning. Finally, my biggest thanks go to my wife,
Tomoko, who professionally photographed the book's cover image; my son, Kentaro;
and my mother, Harumi. I could not have completed this book without their patient
and loving support.

This book is dedicated to the memory of my father, Yougo Tani, who ignited my
interest in science and engineering before he passed away in my childhood. Some
additional resources such as robot videos can be found at
https://sites.google.com/site/tanioupbook/home. Finally, this work was partially
supported by the RIKEN BSI Research Fund (2010-2011) and the 2012 KAIST Settlement
and Research of New Instructors Fund, titled "Neuro-Robotics Experiments with
Large Scale Brain Networks."

Part I
On the Mind

1
Where Do We Begin with Mind?

How do our minds work? Sometimes I notice that I act without much consciousness,
for example, when reaching for my mug of coffee on the table, putting on a jacket,
or walking to the station for my daily commute. However, if something unexpected
happens, like when I fail to grasp the mug properly or the road to the station is
closed due to roadwork, I suddenly become conscious of my actions. How does this
consciousness arise at such moments? In everyday conversation, my utterances are
generated smoothly. I automatically combine words in the correct order and seldom
consciously manipulate grammar when speaking. How is this possible? Although it
seems that many of our thoughts and actions are generated either consciously or
unconsciously by utilizing knowledge or concepts in terms of images, rules, and
symbols, I wonder how they are actually stored in our memories and how they can be
manipulated in our minds. When I'm doing something like making a cup of coffee, my
actions as well as my thoughts tend to shift freely, from getting out the milk to
looking out the window to thinking about whether to stay in for lunch today. Is
this spontaneous switching generated by my will? If so, how is such will initiated
in my mind in the first place? Mostly, my everyday thinking or action follows
routines, habituation, or social conventions. Nevertheless, sometimes novel
images, thoughts, or acts can be created. How are they generated? Finally, a
somewhat philosophical question arises: How can I believe that this world really
exists without my subjectively thinking about it? Does my subjective mind subsume
the reality of the world, or is it the other way around?
The mind is one of the most curious and miraculous things. We know that the
phenomena of the mind, like those just described, originate in the brain: we often
hear scientists saying that our minds are the products of entangled activities of
neurons firing, synapse modulations, neuronal chemical reactions, and more.
Although the scientific literature contains an abundance of detailed information
about such biological phenomena in the brain, it is still difficult to find
satisfactory explanations about how the mind actually works. This is because each
piece of detailed knowledge about the biological brain cannot as yet be connected
together well enough to produce a comprehensive picture of the whole. But
understanding the mind is not only the remit of scientists; it is and has always
been the job of philosophers, too. One of the greatest of philosophers, Aristotle,
asserted that "the mind is the part of the soul by which it knows and understands"
(Aristotle, trans. 1907). It is hard, however, to link such metaphysical arguments
to the actual biological reality of the brain.
Twenty-five years ago, I was a chemical plant engineer with no such thoughts about
the brain, consciousness, and existence, until something wonderful happened by
chance to start me thinking about these things seriously. One day I traveled to a
chemical plant site in an isolated area in northern Japan to examine a hydraulic
system consisting of piping networks. The pipeline I saw there was huge, with a
diameter of more than 1.5 m and a total length of around 20 km. It originated in a
shipyard about 10 km away from the plant, and inside the plant yard it was
connected to a complex of looping networks equipped with various functional
components such as automatic control valves, pumps, surge accumulators, and tanks.
I was conducting an emergency shutdown test of one of the huge main valves
downstream in the pipeline when, immediately after valve shutdown, I was terrified
by the thundering noise of the water hammer phenomenon, the loud knocking heard in
a pipe caused by an abrupt pressure surge upstream of the valve. Several seconds
later I heard the same sound arising from various locations around the plant yard,
presumably because the pressure surge had propagated and was being reflected at
various terminal ends in the piping network. After some minutes, although the
initial thunderous noise had faded, I noticed a strange coherence of sounds
occurring across the yard. I heard a pair of water hammers at different places,
seeming to respond to each other periodically. This coherence appeared and
disappeared almost capriciously, arising again in other locations. I went back to
the plant control room to examine the operation records, plotting the time history
of the internal pressure at various points in the piping network. As I thought,
the plots showed some oscillatory patterns of pressure hikes appearing at certain
points and tending to transform into other oscillatory patterns within several
minutes. Sometimes these patterns seemed to form in a combinatory way, with a set
of patterns appearing in different combinations with other sets. At that point I
jumped on a bicycle to search for more water hammers around the plant yard, even
though it was already dusk. Hearing this mysterious ensemble of roaring pipes in
the darkness, I felt as if I were exploring inside a huge brain where
consciousness arose. In the next moment, however, I stopped and reflected to
myself that this was not actually a mystery at all but a complex transient
phenomenon involving physical systems, and I thought then that this might explain
the spontaneous nature of the mind.
I had another epiphany several months later when, together with my fellow
engineers, I had the chance to visit a robotics research laboratory, one of the
most advanced of its kind in Japan. The researchers there showed us a
sophisticated mobile robot that could navigate around a room guided by a map
preprogrammed into the robot's computer. During the demonstration the robot
maneuvered around the room, stopped in front of some objects, and said in a
synthesized voice, "This is a refrigerator," "This is a blackboard," and "This is
a couch." While we all stood amazed at seeing the robot correctly naming the
objects around us, I asked myself how the robot could know what a refrigerator
meant. To me, a refrigerator means the smell of refreshing cool air when I open
the door to get a beer on a long hot summer day. Surely the robot didn't
understand the meaning of a refrigerator or a chair in such a way, as these items
were nothing more to it than landmarks on a registered computational map. The
meanings of these items to me, however, would materialize as the result of my own
experiences with them, such as the smell of cool air from the refrigerator or the
feeling of my body sinking back into a soft chair as I sit down to drink my beer.
Surely the meanings of various things in the world around us are formed in our
brains through the accumulation of our everyday experiences interacting with them.
In the next moment I started to think about building my own robot, one that could
have a subjective mind, experience feelings, imagine things, and think about the
world by interacting in it. I also had some vague notion that a subjective mind
should involve dynamic phenomena fluttering between the conscious and unconscious,
just as with the water hammers that had captured my imagination a few months
earlier.
Sometime later I went back to school, where I studied many subjects related to the
mind and cognition, including cognitive science, robotics, neuroscience, neural
network modeling, and philosophy. Each discipline seemed to have its own specific
way of understanding the mind, and the way the problems were approached by each
discipline seemed too narrow to allow the exchange of ideas and views with other
disciplines. No single discipline could fully explain what the mind is or how it
works. I simply didn't believe that one day a super genius like Einstein would
come along and show us a complete picture of the mind; rather, I suspected that a
good understanding, if attainable, would come from a mutual, relational
understanding between multiple disciplines, enabling new findings and concepts in
one domain to be explained using different expressions in other disciplines.

It was then that it came to me that building robots while taking a
multidisciplinary approach could well produce a picture of the mind. The current
book presents the outcome of two decades of research under this motivation.

***

This book asks how natural or artificial systems can host cognitive minds that are
characterized by higher-order cognitive capabilities such as compositionality on
the one hand, and by autonomy in generating spontaneous interactions with the
outer world, either consciously or unconsciously, on the other. The book draws
answers from examination of synthetic neurorobotics experiments conducted by the
author. The underlying motivation of this study differs from that of conventional
intelligent robotics studies that aim to design or program functions to generate
intelligent actions. The aim of synthetic neurorobotics studies is to examine
experimentally the emergence of nontrivial mindlike phenomena through dynamic
interactions, under specific conditions and for various cognitive tasks. It is
like examining the emergence of nontrivial patterns of water hammer phenomena
under the specific operational conditions applied in complex pipeline networks.

The synthetic neurorobotics studies described in this book have two foci. One is
to make use of dynamical systems perspectives to understand various intricate
mechanisms characterizing cognitive minds. The dynamical systems approach has been
known to be effective in articulating mechanisms underlying the development of
various functional structures by applying the principles of self-organization from
physics (Nicolis & Prigogine, 1977; Haken, 1983). Structures and functions to
mechanize higher-order cognition, such as for compositional manipulations of
symbols, concepts, or linguistic thoughts, may develop by means of
self-organization in internal neurodynamic systems via the consolidative learning
of experience. The other focus of these neurorobotics studies is on the embodiment
of cognitive processes, crucial to understanding the circular causality arising
between body and environment as aspects of mind extend beyond the brain.
This naturally brings us to the distinction between the subjective mind and the
objective world. Our studies emphasize top-down intentionality on the one hand, by
which our own subjective images, views, and thoughts, consolidated into structures
through past experience, are proactively projected onto the objective world,
guiding and accompanying our actions. Our studies also emphasize bottom-up
recognition of the perceptual reality on the other hand, which results in the
modification of top-down intention in order to minimize gaps or errors between our
prior expectations and actual outcomes. The crucial focus here is on the circular
causality that emerges as the result of iterative interactions between the two
processes: the top-down subjective intention of acting on the objective world and
the bottom-up recognition of the objective world with modification of the
intention. My intuition is that the key to unlocking all of the mysteries of the
mind, including our experiences of consciousness as well as free will, is hidden
in this as yet unexplored phenomenon of circular causality and the structure
within which it occurs. Moreover, close examination of this structure might help
us address the fundamental philosophical problem brought to the fore in mind/body
dualism: how the subjective mind and the objective world are related. The
synthetic robotics approach described in this book seeks to answer this
fundamental question through the examination of actual experimental results from
the viewpoints of various disciplines.
This book is organized into two parts, namely Part I, "On the Mind" (chapters 1 to
5), and Part II, "Emergent Minds: Findings from Robotics Experiments" (chapters 6
to 11). In Part I, the book reviews how problems with cognitive minds have been
explored in different research fields, including cognitive science, phenomenology,
brain science, neural network modeling, psychology, and robotics. These in-depth
reviews will provide general readers with a good introduction to the relevant
disciplines and should help them to appreciate the many conflicting arguments
about the mind and brain active therein. Part II starts with new proposals for
tackling these problems through neurorobotics experiments, and through analysis of
their results arrives at some answers to fundamental questions about the nature of
the mind. In the end, this book traces my own journey in exploration of the
fundamental nature of the mind, and in retracing this journey I hope to deliver an
intuitively accessible account of how the mind works.

2
Cognitivism

One of the main forces having advanced the study of the mind over the last 50
years is cognitivism. Cognitivism regards the mind as an externally observable
object that can best be articulated with symbol systems in computational
metaphors, and this approach has become successful as the speed and memory
capacity of computers have grown exponentially. Let us begin our discussion of
cognitivism by looking at the core ideas of cognitive science.

2.1. Composition and Recursion in Symbol Systems

The essence of cognitivism is represented well by the principle of
compositionality (i.e., the meaning of the whole is a function of the meanings of
the parts), specifically as expounded by Gareth Evans (1982) in regard to
language. According to Evans, the principle asserts that the meaning of a complex
expression is determined by the meanings of its constituent expressions and the
rules used to combine them (sentences are composed from sequences of words).
However, its central notion that the whole can be decomposed into reusable parts
(or primitives) is applicable to other faculties, such as action generation.
Indeed, Michael Arbib (1981) in his motor schemata theory, which was published not
long before Evans's work on language, proposed that complex, goal-directed actions
can be decomposed into sequences of behavior primitives. Here, behavior primitives
are sets of commonly used behavior pattern segments or motor programs that are put
together to form streams of continuous sensory-motor flow. Cognitive scientists
have found a good analogy between the compositionality of mental processes, like
combining the meanings of words into those of sentences or combining the images of
behavior primitives into those of goal-directed actions at the back of our mind,
and the computational mechanics of the combinatorial operations of operands. In
both cases we have concrete objects (symbols) and distinct procedures for
manipulating them in our brains. Because these objects to be manipulated, either
by computers or in mental processes, are symbols without any physical dimensions
such as weight, length, speed, or force, their manipulation processes are
considered to be cost free in terms of time and energy consumption. When such a
symbol system, comprising arbitrary shapes of tokens (Harnad, 1992), is provided
with recursive functionality for the tokens' operations, it achieves
compositionality with an infinite range of expressions.
Noam Chomsky, famous for his revolutionary ideas on generative grammar in
linguistics, has advocated that recursion is a uniquely human cognitive
competency. Chomsky and colleagues (Hauser, Chomsky, & Fitch, 2002) proposed that
the human brain might host two distinct cognitive competencies: the so-called
faculty of language in a narrow sense (FLN) and the faculty of language in a broad
sense (FLB). FLB comprises a sensory-motor system, a conceptual-intentional
system, and the computational mechanisms for recursion that allow for an infinite
range of expressions from a finite set of elements. FLN, on the other hand,
involves only recursion and is regarded as a uniquely human aspect of language.
FLN is thought to generate internal representations by utilizing syntactic rules
and to map them to a sensory-motor interface via the phonological system as well
as to the conceptual-intentional interface via the semantic system.
Chomsky and colleagues admit that some animals other than humans can exhibit
certain recursion-like behaviors with training. Chimps have become able to count
the number of objects on a table by indicating a corresponding panel representing
the correct number of objects on the table by association. The chimps became able
to count up to around five objects correctly, but one or two errors creep in for
more than five objects: the more objects to count, the more inaccurate at counting
the chimps become. Another example of recursion-like behavior in animals is cup
nesting, a task in which each cup varies in size so that the smallest cup fits
into the second smallest, which in turn can be nested or "seriated" into larger
cups. When observing chimps and bonobos cup nesting, Johnson-Pynn and colleagues
(1999) found that performance differed by species as well as among individuals;
some individuals could nest only two different sizes of cups whereas others could
pair three by employing a subassembly strategy, that is, nesting a small cup in a
medium-size cup as a subassembly and then nesting them in a large cup. However,
the number of nestings never reliably went beyond three. Similar limitations in
cup nesting performance have been observed in parrots (Pepperberg & Shive, 2001)
and the degu, a small rat-size rodent (Tokimoto & Okanoya, 2004).
These observations of animals' object counting and cup nesting behaviors suggest
that, although some animals can learn to perform recursion-like behaviors, the
depth of recursion is quite limited, particularly when contrasted with humans, in
whom an almost infinite depth of recursion is possible as long as time and
physical conditions allow. Chomsky and colleagues thus speculated that the human
brain might be uniquely endowed with the FLN component that enables infinite
recursion in the generation of various cognitive behaviors, including language.

What then is the core mechanism of FLN? It seems to be a recursive call of logical
rules. In counting numbers, the logical rule of "add one to the currently
memorized number" is recursively called: starting with the currently memorized
number set to 0, it is increased to 1, 2, 3, ..., to infinity as the "add one"
rule is called at each recursion. Cup nesting can be performed infinitely when the
logical rule of "put the next smallest cup in the current nesting cup" is
recursively called. Similarly, in the recursive structure of sentences, clauses
nest inside of other clauses, and in sentence generation the recursive
substitution of one of the context-free grammar rules for each variable could
generate sentences of infinite length after starting with the symbol S (see Figure
2.1 for an illustrative example).

Chomsky and colleagues' crucial argument is that the core aspect of recursion is
not a matter of what has been learned or developed over a lifetime but what has
been implemented as an innate function in the faculty of language in a narrow
sense (FLN). In their view, what is to be learned or developed are the interfaces
from this core recursion ability to the sensory-motor systems or semantic systems
in the faculty of language in a broad sense (FLB). They assert that the unique
existence of this core recursive aspect of FLN is an innate component that
positions human cognitive capability at the top of the hierarchy of living
systems.

[Figure 2.1 (schematic): on the left, a context-free grammar with rules
R1: S → NP VP; R2: NP → (A NP)/N; R3: VP → V NP; R4: A → Small; R5: N → dogs/cats;
R6: V → like. On the right, a parse tree deriving the sentence "Small dogs like
cats" from the start symbol S by recursive substitution of the rules.]

Figure 2.1. On the left is a context-free grammar (CFG) consisting of a set of
rules, and on the right is an example sentence that can be generated by recursive
substitutions of the rules with the starting symbol S allocated to the top of the
parsing tree. Note that the same CFG can generate different sentences, even those
with infinite length, depending on the nature of the substituting rules (e.g.,
repeated substitutions of R2: NP → A NP).
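To make the recursive substitution mechanism concrete, here is a minimal sketch in
Python (an illustration added here, not from the original text) that generates
sentences from the grammar of Figure 2.1. The rules follow the figure; the depth
cap is an assumed stand-in for the kind of external constraint, such as working
memory, that is said to bound recursion in practice:

```python
import random

# The context-free grammar of Figure 2.1:
# R1: S -> NP VP   R2: NP -> (A NP)/N   R3: VP -> V NP
# R4: A -> small   R5: N -> dogs/cats   R6: V -> like
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["A", "NP"], ["N"]],   # R2 is the recursive rule
    "VP": [["V", "NP"]],
    "A":  [["small"]],
    "N":  [["dogs"], ["cats"]],
    "V":  [["like"]],
}

def generate(symbol="S", depth=0, max_depth=5):
    """Expand `symbol` by recursive rule substitution, as in the parse tree
    of Figure 2.1. The depth cap is an external constraint; the grammar
    itself would permit infinitely deep recursion via R2."""
    if symbol not in GRAMMAR:                 # terminal word: stop recursing
        return [symbol]
    options = GRAMMAR[symbol]
    if symbol == "NP" and depth >= max_depth:
        options = [["N"]]                     # force R2 to terminate
    words = []
    for s in random.choice(options):
        words.extend(generate(s, depth + 1, max_depth))
    return words

print(" ".join(generate()))   # e.g., "small small dogs like cats"
```

With the cap removed, repeated substitution of R2 can nest noun phrases without
bound, which is exactly the sense in which a finite rule set yields an infinite
range of expressions.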
Such a view is contentious, though. First, it is not realistic to assume that we
humans perform infinite recursions in everyday life. We can neither count
infinitely nor generate/recognize infinite-length sentences. Chomsky and
colleagues, however, see this not as a problem of FLN itself but as a problem of
external constraints (e.g., a limitation in working memory size in FLB in
remembering currently generated word sequences) or of physical time constraints
that hamper performing infinite recursions in FLN. Second, are symbols actually
manipulated recursively somewhere in our heads when counting numbers or
generating/recognizing sentences? If there are fewer than six objects on a table,
the number would be grasped analogically from visual patterns; if there are more
than six objects, we may start to count them one by one on our fingers. In our
everyday conversations we generally talk without much concern for spoken grammar:
our colloquialisms seem to be generated not by consciously combining individual
words following grammatical rules, but by automatically and subconsciously
combining phrases. However, when needing to write complex embedded sentences such
as those often seen in formal documents, we sometimes find ourselves consciously
dealing with grammar in our search for appropriate word sequences. Thus, the
notion of there being infinite levels of recursion in FLN might apply only rarely
to human cognition. In everyday life, it seems unlikely that an infinite range of
expressions would be used.
Many cognitive behaviors in everyday life do still, of course, require some level
of manipulation that involves composition or recursion of information. For
example, generating goal-directed action plans by combining behavior primitives
into sequences cannot be accounted for by the simple involuntary action of mapping
sensory inputs to motor outputs. It requires some level of manipulation of
internal knowledge about the world, yet does not involve infinite complexity. How
is such processing done? One possibility might be to use the core recursive
component of calling logical rules in FLN under the limitation of finite levels of
recursion. Another possibility might be to assume subrecursive functions embedded
in analogical processes rather than logical operations in FLB that can mimic
recursive operations for finite levels. Cognitivism embraces the former
possibility, with its strong conviction that the core aspect of cognition should
reside in a symbol representation and manipulation framework. But, if we are to
assume that symbols play a central role in cognition, how would symbols comprising
arbitrary shapes of tokens convey the richness of meaning and context we see in
the real world? For example, a typical artificial intelligence system may
represent an apple with its features "color-is-RED" and "shape-is-SPHERE."
However, this is merely to describe the meaning of a symbol by way of other
symbols, and I'm not sure how my everyday experience with apples could be
represented in this form.

2.2. Some Cognitive Models

This section looks at some cognitive models that have been developed to solve
general cognitive tasks by utilizing the aforementioned symbolist framework. The
General Problem Solver (GPS) (Newell & Simon, 1972; Newell, 1990), developed by
Allen Newell and Herbert A. Simon, is one such typical cognitive model, and it has
made a significant impact on the subsequent direction of artificial intelligence
research. Numerous systems such as ACT-R (Anderson, 1983) and Soar (Laird et al.,
1987) use this rule-based approach, although it has a crucial problem, as is shown
later.
The GPS provides a core set of operations that can be used to solve cognitive
problems in various task domains. In solving a problem, the problem space is
defined in terms of the goal to be achieved, the initial state, and the transition
rules. Following a means-ends analysis approach, the goal to be achieved is
divided into subgoals, and GPS attempts to solve each of those. Each transition
rule is specified by an action operator associated with a list of precondition
states, a list of "add" states, and a list of "delete" states. After an action is
applied, the corresponding "add" states and "delete" states are added to and
deleted from the precondition states. A rule actually specifies a possible state
transition from the precondition state to the consequent state after applying the
action.

Let us consider the so-called monkey-banana problem, in which the goal of the
monkey is to become "not hungry" by eating a banana. The rules defined for GPS can
be as shown in Table 2.1.
Table 2.1. Example Rules in GPS

Rule #  Action                               Precondition                                    Add                                   Delete
Rule 1  climb on chair                       chair at middle room, at middle room, on floor  at bananas, on chair                  at middle room, on floor
Rule 2  push chair from door to middle room  chair at door, at door                          chair at middle room, at middle room  chair at door, at door
Rule 3  walk from door to middle room        at door, on floor                               at middle room                        at door
Rule 4  grasp bananas                        at bananas, empty handed                        has bananas                           empty handed
Rule 5  eat bananas                          has bananas                                     empty handed, not hungry              has bananas, hungry

By considering that the goal is [not hungry] and the start state is [at door, on
floor, has ball, hungry, chair at door], it can be seen that the goal state [not
hungry] can be achieved by applying the action "eat bananas" in Rule 5 if the
precondition state of [has bananas] is satisfied. Therefore, this precondition
state of [has bananas] becomes the subgoal to be achieved in the next step. In the
same manner, the subgoal [has bananas] can be achieved by applying the action
"grasp bananas" with the precondition of [at bananas], which can in turn be
achieved by applying another action, "climb on chair." Repetitions of this
backward transition from a particular subgoal to its sub-subgoal, by searching for
an adequate action enabling the transition, result in the generation of a chain of
actions, and the goal state can be achieved from the start state by applying the
resulting action sequence.
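This backward search can be sketched compactly. The following is a hedged
reconstruction rather than Newell and Simon's actual program: the operators encode
Table 2.1, and achieve() recurses from a goal to the sub-subgoals enabling it. One
liberty is taken: a "drop ball" operator, borrowed from classic formulations of
the monkey-banana problem, is added because the monkey starts out holding a ball
and Table 2.1 itself offers no way to become empty handed:

```python
# Operators from Table 2.1, plus an assumed "drop ball" (see above).
# Each entry: (action, ordered preconditions, add set, delete set).
OPS = [
    ("climb on chair",
     ("chair at middle room", "at middle room", "on floor"),
     {"at bananas", "on chair"}, {"at middle room", "on floor"}),
    ("push chair from door to middle room",
     ("chair at door", "at door"),
     {"chair at middle room", "at middle room"}, {"chair at door", "at door"}),
    ("walk from door to middle room",
     ("at door", "on floor"), {"at middle room"}, {"at door"}),
    ("drop ball",   # hypothetical extra operator, not in Table 2.1
     ("has ball",), {"empty handed"}, {"has ball"}),
    ("grasp bananas",
     ("at bananas", "empty handed"), {"has bananas"}, {"empty handed"}),
    ("eat bananas",
     ("has bananas",), {"empty handed", "not hungry"}, {"has bananas", "hungry"}),
]

def achieve(state, goal, plan, stack=()):
    """Achieve `goal` by backward chaining: pick an operator whose add list
    contains the goal, recursively achieve its preconditions (the
    sub-subgoals), then apply the operator to the state."""
    if goal in state:
        return state
    if goal in stack:                  # guard against circular subgoals
        return None
    for action, pre, add, delete in OPS:
        if goal not in add:
            continue
        new_state = state
        for p in pre:                  # naive GPS is precondition-order sensitive
            new_state = achieve(new_state, p, plan, stack + (goal,))
            if new_state is None:
                break
        if new_state is not None and all(p in new_state for p in pre):
            plan.append(action)
            return (new_state - delete) | add
    return None

start = {"at door", "on floor", "has ball", "hungry", "chair at door"}
plan = []
achieve(set(start), "not hungry", plan)
print(plan)
# ['push chair from door to middle room', 'climb on chair',
#  'drop ball', 'grasp bananas', 'eat bananas']
```

This toy planner inherits the classic weaknesses of simple means-ends analysis: it
is sensitive to the ordering of preconditions and does nothing to protect subgoals
that have already been achieved.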
The architecture of GPS is quite general in the sense that it has been applied to
a variety of different task domains, including proving theorems in logic or
geometry, word puzzles, and chess. Allen Newell and his colleagues (Laird et al.,
1987) developed a new cognitive model, Soar, by further extending GPS. Of
particular interest is its primary learning mechanism, chunking. Chunking is
involved in the conversion of an experience of an action sequence into long-term
memory. When a particular action sequence is found to be effective in achieving a
particular subgoal, this action sequence is memorized as a chunk (a learned rule)
in long-term memory. When the same subgoal appears again, this chunked action
sequence is recalled rather than deliberated over and synthesized again. For
example, in the case of the monkey-banana problem, the monkey may learn the action
sequence of "grasp bananas" and "eat bananas" as an effective chunk for solving a
current "hungry" problem, and may retain this chunk because "hungry" may appear as
a problem again in the future.
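A minimal sketch of this recall-instead-of-deliberate behavior, built on the
achieve() function from the GPS sketch above, might look as follows. It is a loose
simplification of Soar's actual production-compilation mechanism; in particular,
indexing chunks by the subgoal alone, ignoring the state in which it arose, is
part of the simplification:

```python
chunks = {}   # long-term memory: subgoal -> chunked action sequence

def solve(state, goal):
    """Recall a chunk if one exists; otherwise deliberate (search) and
    store the resulting action sequence as a new chunk."""
    if goal in chunks:
        return chunks[goal]            # recalled, not re-synthesized
    plan = []
    achieve(set(state), goal, plan)    # deliberation via the GPS sketch
    chunks[goal] = plan                # memorize the chunk for future reuse
    return plan

start = {"at door", "on floor", "has ball", "hungry", "chair at door"}
solve(start, "not hungry")   # first encounter: full backward search
solve(start, "not hungry")   # second encounter: retrieved from the chunk
```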
The idea of chunking has attracted significant attention in cognitive psychology.
Actually, I myself was largely influenced by this idea after I learned about it in
an artificial intelligence course given by John Laird, who has led the development
of Soar for more than two decades. At the same time, however, I could not arrive
at full agreement with the treatment of chunking in Soar, because the basic
elements to be chunked are symbols rather than continuous patterns, even at the
lowest perceptual level. I speculated that the mechanism of chunking should be
considered at the level of continuous perceptual flow rather than symbol
sequences, in which each symbol already stands as an isolated segment within the
flow. Later sections of this book explore how chunks can be structured out of
continuous sensory-motor flow experiences. First, however, the next section
introduces the so-called symbol grounding problem, which cognitive models built on
symbolist frameworks inevitably encounter.

2.3. The Symbol Grounding Problem

The symbol grounding problem as conceptualized by Stevan Harnad (1990) is based on
his assertion that the meanings of symbols should originate from a nonsymbolic
substrate like sensory-motor patterns, and that, as such, symbols are grounded
bottom up. To give shape to this thought, he proposed, as an abstract model of
cognitive systems, a hybrid system consisting of a symbol system in the upper
level and a nonsymbolic pattern processing system in the lower level. The
nonsymbolic pattern processing system functions as the interface between
sensory-motor reality and abstract symbolic representation by categorizing
continuous sensory-motor patterns into sets of discrete symbols. Harnad argued
that meaning, or semantics, in the hybrid system would no longer be parasitic on
its symbol representation but would become intrinsic to the whole system
operation, as such representation is now grounded in the world. This concept of a
hybrid system has similarities to that of FLN and FLB advocated by Chomsky and
colleagues in the sense that it assumes a core aspect of human cognition in terms
of logical symbol systems, which can support up to an infinite range of
expressions, and peripheries as the interface to a sensory-motor or semantic
system that may not be involved in composition or recursion in depth.
This idea of a hybrid system reminds me also of Cartesian dualism. According to
Descartes, the mind is a thinking thing that is nonmaterial, whereas the body is
nonthinking matter, and the two are distinct. The nonmaterial mind may correspond
to FLN or symbol systems that are defined in a nonphysical discrete space, and the
body to sensory-motor processes that are defined in physical space. The crucial
question here is how these two completely distinct existences that do not share
the same metric space can interact with each other. Obviously, our minds depend on
our physical condition, and the freshness of the mind affects the swiftness of our
every move. Descartes showed some concern about this problem of interactionism,
asking how a nonmaterial mind can cause anything in a material body, and vice
versa. Cognitive scientists in modern times, however, seem to consider, rather
optimistically I think, that some nice interfaces would enable interactions
between the two opposite poles of nonmatter and matter.

Let's consider the problem by examining a problem in robot navigation as an
example, reviewing my own work on the subject (Tani, 1998). A typical mobile
robot, equipped with simple range sensors, may travel around an office environment
while taking range readings that provide an estimate of the geometrical shapes in
the surrounding environment at each time step. The continuous flow of the range
image pattern is categorized into one of several predefined landmark types, such
as a straight corridor, a corner, a T-branch, or a room entrance. The upper level
constructs a chain representation of landmark types by observing sequential
outputs of the categorizer while the robot explores the environment. This internal
map consists of nodes representing position states of the robot associated with
encountered landmark types and of arcs representing transitions between them
associated with actions such as turning right/left and going straight. This
representation takes exactly the same form as a symbolic representation known as a
finite state machine (FSM), which consists of a finite number of discrete states
and their state transition rules. It is noted that the rule representation in GPS
can be converted into this FSM representation by considering that each rule
description in GPS can be expanded into two adjacent nodes connected by an arc in
the FSM. Once the robot acquires the internal map of its environment, it becomes
able to predict the next sensation of landmarks on its travels by looking at the
next state transition in the FSM. When the actual perception of the landmark type
matches the prediction, the robot proceeds to the prediction of the next landmark
to be encountered. An illustrative description is shown in Figure 2.2.

[Figure 2.2 (schematic): at the top, a finite state machine whose nodes are
landmark states (corners "C" and T-branches "T") connected by transitions labeled
with actions and landmark types such as "Straight," "Right," and "T-Branch"; below
it, a categorizer that maps the robot's continuous sensory pattern, unfolding over
time t, into these discrete landmark types; at the bottom, the robot and its
environment.]

Figure 2.2. Landmark-based navigation of a robot using a hybrid-type architecture
consisting of a finite state machine and a categorizer. Redrawn from Tani (1998).
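In code, the hybrid architecture of Figure 2.2 might be caricatured as follows;
the state names, actions, and sensor thresholds are invented for the sketch and
are not the implementation of Tani (1998). The categorizer discretizes a
continuous range profile into a landmark type, and the FSM supplies the top-down
prediction; the abrupt halt on a mismatch is precisely the failure mode discussed
next:

```python
# Internal map acquired from exploration:
# (state, action) -> (next state, predicted landmark type)
FSM = {
    ("corridor_A", "go straight"): ("tbranch_1", "T-branch"),
    ("tbranch_1", "turn right"):   ("corridor_B", "straight corridor"),
}

def categorize(range_profile):
    """Lower level: map a continuous (left, front, right) range reading to
    one of the predefined landmark types. Thresholds are invented for the
    sketch; the real categorizer worked on range image patterns."""
    left, front, right = range_profile
    if front < 1.0 and left < 1.0 and right > 2.0:
        return "corner"
    if front < 1.0 and left > 2.0 and right > 2.0:
        return "T-branch"
    return "straight corridor"

state = "corridor_A"
readings = [("go straight", (2.5, 0.6, 2.8)),   # categorized as a T-branch
            ("turn right",  (0.6, 0.7, 2.8))]   # a corner, not the expected corridor
for action, profile in readings:
    state, predicted = FSM[(state, action)]
    observed = categorize(profile)
    if observed != predicted:
        # An unanticipated landmark is an illegitimate symbol for the FSM:
        # the discrete level simply halts, and the robot is lost.
        raise RuntimeError(f"expected {predicted}, saw {observed}: robot lost")
```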

Problems occur when this matching process fails. The robot becomes lost because
the operation of the FSM halts upon receiving an illegitimate symbol/landmark
type. This is my concern about the symbol grounding problem. When systems involve
bottom-up and top-down pathways, they inevitably encounter inconsistencies between
the two pathways of top-down expectation and bottom-up reality. The problem is how
such inconsistencies can be treated internally without causing a fatal error,
halting the system's operations. It is considered that both levels are dually
responsible for any inconsistency and that they should resolve any conflict
through cooperative processes. This cooperation entails iterative interactions
between the two sides through which optimal matching between them is sought
dynamically. If one side pushes forward a little, the other side should pull back
elastically so that a point of compromise can be found through iterative dynamic
interactions. The problem here is that symbol systems defined in a discrete space
appear to be too solid to afford such dynamic interactions with the sensory-motor
system. This problem cannot be resolved simply by implementing certain interfaces
between the two systems, because the two simply do not share the same metric space
enabling smooth, dense, and direct interactions.

2.4. Context

Another concern is how well symbol systems can represent the reality of the world.
Wittgenstein once said, "Whereof one cannot speak, thereof one must be silent,"
meaning that language as a formal symbol system for fully expressing philosophical
ideas has its limitations. Not only in philosophy, but in everyday life, too,
there is always something that cannot be expressed explicitly. Context, or
background, is an example. Context originally means discourse that surrounds a
language unit and that helps to determine its interpretation. In a larger sense,
it also means the surroundings that specify the meaning or existence of an event.

Spencer-Brown (1969) highlighted a paradox in his attempts to explicitly specify
context in his formulation of the calculus of indications. Although details of his
mathematical formulas are not introduced here, his statement could be interpreted
to mean that indexing the current situation requires the indexing of its
background or context. Because indexing the background requires further indexing
of the background of the background, the operation of indexing situations ends up
as an infinite regression. Spencer-Brown wrote that, in this aspect, every
observation entails a symbol, an "unwritten cross," where the cross operation
denotes indexing of the background. Let's imagine you see a bottle-like shape.
This situation can be disambiguated by specifying its immediate background
(context), namely that the bottle-like shape was seen immediately after you opened
the refrigerator door, which means that the bottle is chilled. Further background
information, that the refrigerator was opened after you returned to your apartment
after a long day at work, would mean that what you see now is a bottle of chilled
beer waiting to be drunk. There is no logical way to terminate this regression,
yet you can still reach for the bottle of beer to drink it! Although FLN may have
the capability for infinite regression, it is hard to believe that our minds
actually engage in such infinite computations. We live and act in the world
surrounded or supported by context, which is always implicit, uncertain, and
incomplete for us at best. How can a formal symbol system represent such a
situation?

2.5. Summary

We humans definitely have internal images of our surrounding world. We can extract
regularities from our experiences and observations both consciously and
unconsciously, as evidenced by the fact that we can acquire language skills
involving grammar. Also, we can combine the acquired rules to create new images,
utterances, and thoughts. Accounting for this aspect, cognitivists tend to assume
that symbols exist to be manipulated in our heads. My question, though, is what is
the reality of those symbols we suppose to be in our heads? Is symbol
representation and manipulation an operational principle in the cognitive mind? If
so, my next questions would be how can symbols comprising arbitrary shapes of
tokens interact with sensory-motor reality, and how can they access matters
involving context, mood, or tacit knowledge that are considered difficult for
formal symbol systems to deal with? It is also difficult to represent the state of
consciousness with them. It is presumably hard to differentiate between doing
something consciously and unconsciously in processes of merely manipulating
symbols by following logic.
If we attempt to model or reconstruct mind, it should be essential
to reconstruct not only rational thinking aspects but also the feelings
that accompany our daily experiences such as consciousness as the
vivid feeling of qualia characterizing various sensations. But if sym-
bol systems cannot deal with such matters, what would be a viable
solution? Indeed, this book proposes an abrupt transition from the
aforementioned conventional symbolist framework. The main pro-
posal is to consider that what we have in our brains as symbol is not
just arbitrary shape of token but dynamic activity of physical matter
embedded in continuous spatio-temporal space. Such dynamic activ-
ity of matter, adequately developed, might enable compositional but
vivid and contextual thinking and imaging in our brains. A crucial
argument would be that such cognitive minds could be naturally situ-
ated to the physical world because these two share the same metric
space for interaction.
The next chapter addresses this very problem from the standpoint
of a different discipline, that of phenomenology. The objective of
phenomenology is not only to investigate the problem of minds but
also to search for how the problems themselves can be constituted
from the introspective view. Readers will find that the discipline of phenomenology is quite sympathetic to the aforementioned dynamic systems view.

3
Phenomenology

Phenomenology originated in Europe at the beginning of the 20th cen-
tury with Edmund Husserl's study of so-called phenomenological reduc-
tion, through which the analysis of the natural world is based purely on
the conscious experiences of individuals. As this chapter shows, Husserl's
study subsequently evolved and was extended by the existentialism
of Martin Heidegger and the embodiment of Maurice Merleau-Ponty
and others. We should also not forget to mention William James, who
was born 17 years earlier than Husserl in the United States. Although
James is best known as the founder of modern psychology, he also pro-
vided numerous essential philosophical ideas about the mind, some of
which are quite analogous to Husserl's phenomenology. In Japan, Kitaro
Nishida (1990) developed his original thinking, influenced by Buddhist
meditation, which turned out to include ideas with some affinity to
those of Husserl and James.
Phenomenology asks us to contemplate how the world can exist for
us and how such a belief can be constituted from our experiences, by
suspending our ordinary assumption that the world exists as a physical
fact from the outset. Here, the question of how the world can be con-
stituted in our subjective reflection might be analogous to the question
of how the knowledge of the world can be represented in cognitive sci-
ence studies. Phenomenology, however, focuses more on phenomena
themselves, through direct perception or pure experience, which has
not yet been articulated either by conception or language. For example,
a rose exists in our subjectivity as a conscious phenomenon of a par-
ticular smell or a particular visual shape, but not by our knowledge of
its objective existence. This discipline then focuses purely on phenom-
ena and questions the existence of the world from such a viewpoint.
However, the discipline also explores the being of cogito (how cognition
arises) in the higher level by examining how it can be developed purely
through the accumulation of perceptual experiences. Thus, phenom-
enology asks how cognition is constituted from direct perception, a line
of questioning deeply related to the later discussions on how robotic
agents can develop views or recognition of the world from their own
sensory-motor experiences.

3.1. Direct Experience

Let us begin by examining what direct experience means in phenomenol-
ogy. It is said that Husserl noticed the importance of direct experience when coming across Mach's perspective (Figure 3.1) (T. Tani, 1998). Mach drew the picture to represent what he saw with his left eye while closing his right one. From this perspective, the tip of his nose
appears to the right of the frame with his eye socket curving upwards.
Although we usually do not notice this sort of perspective, this should
represent the direct experience that we then reconstruct in our minds.
Husserl considered that an examination of such direct experience
could serve as a starting point to explore phenomena. Around the same
time, the notable Japanese philosopher Kitaro Nishida introduced a similar idea in terms of pure experience, writing that:

For example, the moment of seeing a color or hearing a sound is
prior not only to the thought that the color or sound is the activity
of an external object or that one is sensing it, but also to the
judgment of what the color or sound might be. In this regard, pure
experience is identical with direct experience (Nishida, 1990, p. 3).

For Nishida, pure experience is not describable by language but transcends it:

When one directly experiences one's own state of consciousness,
there is not yet a subject or an object. (Nishida, 1990, p. 3)
Figure 3.1. Ernst Mach's drawing. Source: Wikimedia Commons.

Here, what exactly does the phrase "there is not yet a subject or an object" mean? Shizuteru Ueda (1994), who is known for his studies on
Nishida's philosophy, explains this by analyzing the example utterance, "The temple bell is ringing." If it is said instead as "I hear the temple bell ringing," the explication of "I" as the subject conveys a subtle expression
of subjective experience at the moment of hearing. In this interpreta-
tion, the former utterance is considered to express pure experience in
which subject and object are not yet separated by any articulation in
the cogito. This analysis is analogous to what Husserl recognized from
Mach's perspective.

3.2. The Subjective Mind and Objective World

We might ask, however, how much the phenomena of experience depend
on direct perception. Is our experience of perception the same as that of
infants in the sense that any knowledge or conception in the cogito does
not affect them at all? In answer, we have sensationalism on one side,
which emphasizes direct experiences from the objective world, and on
the other we have cognitivism, which emphasizes subjective reflection
and representation of the world. But how did these conflicting poles
of the subjective mind and the objective world appear? Perhaps they
existed as one entity originally and later split off from each other. Let's
look then at how this issue of the subjective and the objective has been
addressed by different phenomenological ideas.
In Husserl's (2002) analysis of the structural relationship between what he calls "appearance" and "that which appears" in perceiving an object,
he uses the example of perceiving a square, as shown in Figure 3.2.

Figure 3.2. Husserl's ideas on the structural relationship between appearance and that which appears in perceiving a square, as an example: a parallelogram, a trapezoid, and a parallelogram are the appearances through which the square, that which appears, is perceived.

In looking at squarelike shapes in everyday life, despite them having
slightly unequal angles, we usually perceive them to be squares with
equal right angles. In other words, a square could appear with unequal
angles in various real situations, when it should have equal right angles
in the ideal:in such a case, a parallelogram or trapezoid is the appear-
ance and the square is that which appears as the result of percep-
tion. At this point, we should forget about the actual existence of this
square in the physical world because this object should, in Husserls
sense, exist only through idealization. Whether things exist or not is
just a subjective matter rather than an objective one. When things are
constituted in our minds, they exist regardless of their actual being.
This approach that puts aside correspondence to actual being is called

epoché, or suspension of belief. Husserl considers that direct experience has intentionality toward representation. This intentional process
of constituting representation from direct experience actually entails
consciousness. Therefore, the phenomena of experience cannot be
accounted for only by direct experience at the level of perception, but
it must also be accounted for by conscious representation at the level
of cogito. Ultimately, it can be said that the phenomena of experiences
stand on the duality of these two levels.
Incidentally, from the preceding text, readers might speculate that
the level of cogito and the level of perception are treated as separate
entities in phenomenology. However, phenomenology does not seek to
take that direction and instead attempts to explore how the apparent
polarity of, for example, the cogito and perception, subjectivity and
objectivity, and mind and material, could have appeared from a single
unified entity in the beginning. Although understanding such constitu-
tional aspects of the polarity (i.e., how the polarity developed) contin-
ues to be a subject of debate in phenomenology, interesting assumptions
have been made about there being some sort of immanence enabling
self-development of such structures. For example, Husserl considers
how the cogito level of dealing with temporal structure submerged in a
stream of experience could emerge from the direct perceptual level, as
explained in detail later.
Nishida (1990) also considers that the subject and object should
be one unified existence rather than taken originally as independent
phenomena. He, however, argues that the unified existence could have
internal contradictions that lead to bifurcation or the division of the
unity into the subject and object that we usually grasp. He suggests
that the phenomenological entity simply continues to develop by repeat-
ing these unification and division processes. Merleau-Ponty (1968) pro-
fesses that this iteration of unification and division would take place
in the medium of our bodies, as he considers that the two poles of the
subjective mind and the objective material actually meet and intermin-
gle with each other there. He regards the body as ambiguous, being
positioned between the subjective mental world and the objective phys-
ical world. Heidegger, on the other hand, devoted himself to exploring a
more fundamental problem of being by working on what it means to be
human rather than splitting the problem into that of subject and object.
And through his approach to the problem of being he turned out to be
successful in showing how subjectivity and objectivity can appear.
What follows examines the philosophical arguments concern-
ing the subjective mind and the objective world in more depth, along
with related discussions that include time perception as propounded
by Husserl, being-in-the-world set forth by Heidegger, embodiment by
Merleau-Ponty, and the stream of consciousness by James. Let's begin
by looking closely at each of these, starting with Husserls conception of
the problem of time perception.

3.3. Time Perception: How Can the Flow of Subjective Experiences Be Objectified?

To Husserl, the world should consist of objects that the subject can con-
sciously meditate on or describe. However, he noticed that our direct
experiences do not originate with forms of such consciously represent-
able objects but arise from a continuity of experience in time that exists
as pure experience. Analyzing how a continuous flow of experience can
be articulated or segmented into describable objects or events brought
him to the problem of time perception. Husserl asks how we perceive
temporal structure in our experiences (Husserl, 1964). It should be
noted that time discussed here is not physical time having dimensions
of seconds, minutes, and hours but rather time perceived subjectively
without objective measures. The problem of time perception is a core
issue in this book because both humans and robots that generate and
recognize actions have to manage continuous flows of perception by
articulating them (via segmentation and chunking), as is detailed later.
In considering the problem, Husserl presumed that time consists of
two levels: so-called preempirical time at a deep level and objective time
at a surface level. According to him, the continuous flow of experience
becomes articulated into consciously accessible events by its development through these phenomenological levels. This idea seems born from
his thinking on the structural relationship between appearance and
that which appears mentioned earlier in this chapter. At the preem-
pirical level, every experience is implicit and yet must be articulated,
but there is some sort of passive intention toward the flow of experience
which he refers to as retention and protention. His famous explanatory
example is about hearing a continuous melody such as "do-re-mi." When we hear the "re" note, we would still perceive a lingering impression of
"do" and at the same time we would anticipate hearing the next note of "mi." The former refers to retention and the latter to protention. The present appearance of "re" is called the primary impression. These three
terms of retention, primary impression, and protention are used to des-
ignate the experienced sense of the immediate past, the present, and the
immediate future, respectively. They are a part of automatic processes
and as such cannot be monitored consciously. The situation is similar
to that of the utterance "The temple bell is ringing" mentioned ear-
lier, in the sense that the subject of this utterance is not yet consciously
reflected. Let's consider the problem of nowness in the "do-re-mi" example. Nowness as experienced in this situation might be taken to correspond with the present point of hearing "re" with no duration and
nothing beyond that. Husserl, however, considered that the subjective
experience of nowness is extended to include the fringes of the experi-
enced sense of both the past and the future, that is, in terms of retention
and protention: Retention of "do" and protention of "mi" are included in the primary impression of hearing "re." This would be true especially when we hear "do-re-mi" as the chunk of a familiar melody rather than
as a sequence consisting of independent notes. Having now understood
Husserl's notion of nowness in terms of retention and protention, the
question arises: Where is nowness bounded? Husserl seems to think that
the immediate past does not belong to a representational conscious mem-
ory but merely to an impression. Yet, how could the immediate past,
experienced just as an impression, slip into the distant past but still be
retrieved through conscious memory, as Francisco Varela (1999) once
asked in the context of neurophenomenology? Conscious memory of the
past actually appears at the level of objective time, as described next.
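
Before moving on, the tripartite structure just described can be caricatured computationally. The following minimal Python sketch is my illustration, not Husserl's analysis or a model from this book: the table of familiar continuations is a deliberately crude, hypothetical stand-in for passively anticipated experience.

    # Illustrative sketch only: retention, primary impression, and protention
    # as a moving window over "do-re-mi". Protention is faked by a lookup
    # table of learned melodic continuations.

    melody = ["do", "re", "mi"]
    familiar = {("do",): "re", ("do", "re"): "mi"}  # learned continuations

    for t, note in enumerate(melody):
        retention = melody[t - 1] if t > 0 else None       # lingering immediate past
        protention = familiar.get(tuple(melody[: t + 1]))  # anticipated next note
        print(f"primary impression: {note}, "
              f"retention: {retention}, protention: {protention}")

At the step of hearing "re," the sketch reports retention of "do" and protention of "mi," mirroring how nowness extends over the fringes of the immediate past and future.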
This time, let's consider remembering hearing the slightly longer sequence of notes in "do-re-mi-fa-so-la." In this situation, we can recall
hearing the final "la" that also retains the appearance of "so" by means of retention, and we can also recall hearing the same "so" that retains the appearance of "fa," and so on in order back to "do." By means of con-
sciously unifying immediate pastness in a recall with presentness in the
next recall in the retention train, a sense of objective time emerges as a
natural consequence of organizing each appearance into one consistent
linear sequence. In other words, objective time is constituted when the
original experience of continuous flow (in this case the melody) is artic-
ulated into a sequence of objectified events (the notes) by means of con-
sciously recalling and unifying each appearance. There is a fundamental
difference between an impression that will sink into the horizon in
preempirical time and the same past which is represented in objective
time. The former is a present, living experience of passing away, whereas
the latter can be constituted as consciously represented or manipulable
objects, but only after the original experience is retained. Therefore, the
latter may lack the pureness or vividness of the original experience, yet
may fit well with Husserl's goal that pure experience can be ultimately
represented as logical forms dealing with discrete objects and events.
Husserl proposed two types of intentionality, each heading in a dif-
ferent direction: transversal intentionality refers to integration of the
living-present experience by means of retention, primary impression,
and protention in preempirical time; longitudinal intentionality affords
an immanence of time structures (from preempirical time to objective
time) by means of conscious recall of retained events in the retention
train. Consequently, this intentionality might be considered to be reten-
tion of retention itself (a reflective awareness of this experience). In the
process of interweaving these two intentionalities (double intentional-
ity) into the unitary flow of consciousness, the original pure experience
is objectified and simultaneously the subjectivity or ego of this objecti-
fying process emerges.
In his later years, Husserl introduced an analysis at an even deeper
level, the absolute flow level. Here, neither retention nor protention has
yet appeared; only flow exists. However, this flow is not homogeneous;
each appearance has its own duration. Tohru Tani (1998) interpreted
this aspect by saying that consciousness "flows as well as stagnates," char-
acterizing the uniqueness of the absolute flow of consciousness and set-
ting it apart from consciousness as developed elsewhere. This alternating
flow and stagnation is primordial, an absolutely given dynamic which is
nonreducible. The passive intentional acts of retention and protention
that dimensionalize experience along the continuum of temporality in
the next level originate from this primordial stage of consciousness, and
objective time arises from there.
In sum, Husserl's persistent drive to reduce the ideas and knowl-
edge of man to direct experiences is admirable. However, his motiva-
tion toward a logically manipulable ideal representation of the world
via reflection seems to me problematic in that it has exactly the same
problem as the symbol grounding problem in cognitive models. Dreyfus
(Dreyfus & Dreyfus, 1988), who is well known for his criticism of arti-
ficial intelligence research, argues that the main computational scheme
based on logical inferences and categorical representation of knowl-
edge in modern cognitive science or artificial intelligence originated
from the ideas of Husserl. Actually Husserl (1970) had already toyed
with an idea similar to the frame system, a notable invention of Marvin
Minsky, which introduced domain specificity into the logical descrip-
tions of objects and the world, but he finally admitted defeat in the face
of infinite possibilities of situations or domains. However, Heidegger as
a disciple of Husserl actually took an alternative route to escape this pre-
dicament, as we discover next.

3.4. Being-in-the-World

Heidegger is considered by many to be one of the greatest philosophers
of modern times, changing the direction of philosophy dramatically by
introducing his thinking of existentialism (Dreyfus, 1991). Although a
disciple of Husserl, once he became inspired by his own thoughts on the
subjective constitution of the world, Heidegger subsequently departed
from Husserl's phenomenology. It is said that Heidegger noticed a phil-
osophical problem concerning the cogito and consciousness, a problem
that was considered by Descartes as well as Husserl and yet fully over-
come by neither.
Descartes considered that the cogito, a unique undoubtable being,
should be taken as the initial point of any philosophical thoughts after
everything in the world is discarded for its doubtfulness of being. He
concluded that if he doubted, then something or someone must be doing
the doubting, therefore the very fact that he doubted proved his exist-
ence (Williams, 2014). Husserl, taking on this thought, presented his
idea that the world and objects should exist in terms of conscious rep-
resentations in the cogito and that such conscious representations ought
to be ideal ones.
Heidegger just could not accept the unconditional prior existence of
the cogito. Nor could he accept an ideal and logical representation of
the world that the cogito supposedly constitutes. Instead, he raised the
more fundamental question of asking what it means to be human, while
avoiding tackling directly the problems of cogito versus perception, sub-
jectivity versus objectivity, and mental versus material. It is important
to note that Heidegger sought not to obtain an objective understanding
of the problem but rather to undertake a hermeneutic analysis of it.
Hermeneutics is an approach that attempts to deepen the under-
standing of targets while having prior estimates, or biases, of them
that are adequately modified during the process of understanding. For
example, when we read a new piece of text, a preunderstanding of
the author's intention would help our understanding of the content as
we go along. However, hermeneutics possesses an inherent difficulty
because preunderstanding (bias) originates from intuitions in a context-
dependent way and there is a potential danger of being caught up in a
loop of interpretation, the so-called hermeneutic circle. Because we can
understand the whole in terms of its parts and the parts only through
their relationship to the whole, we experience an unending interpreta-
tive loop. Despite this difficulty, Heidegger holds that there are some
fundamental problems, like what it means to be human, which can only
be understood in this way. It is said that we take being as granted, but
cannot articulate it precisely when asked to do so. In his classic text,
Being and Time, he attempts to elucidate the meaning of being via her-
meneutic cycling, beginning with this same vague preunderstanding. It
is his thoughts on understanding by hermeneutic cycling that form the
essential philosophical background to the central theme of this book,
namely emergent phenomena, as discussedlater.
For now, let's examine Heidegger's famous notion of being-in-the-world (Heidegger, 1962) by looking at his interpretation of the ways of
being in relation to equipment. Heidegger focuses on the purposeful
exercise of naive capacities as extended by equipment and tools. For
example, he asks what it means that a hammer exists. It is not suffi-
cient to answer that it exists as a thing made from cast iron and wood
because such an answer merely describes its objective features. Rather,
the meaning of being of a hammer must be approached by way of its
employment in daily activities, something like "The carpenter building my house is hitting nails with it." Such an account of nails being hit with
a hammer, the hammer being used by the carpenter, and the carpenter
building my house implies the way of presently being for each of these
entities as situated among others:None exist independently but are uni-
fied in their existence via the preunderstanding of how each interacts
in the constitution of a situation characterized by purposeful activity.
Heidegger asserts that the being of equipment is mostly trans-
parent. Put another way, the existence of pieces of equipment is not
noticed much in our daily usage of them. When a carpenter continues
to hit a nail, the hammer becomes transparent to him: The hammer and
the nail are absorbed in a connected structure, the purposeful activity
that is house building. However, when he fails to hit the nail correctly,
the unified structure breaks down and the independence of each entity
becomes noticeable. Their relative meanings become interpretable only
in such a breakdown. In the breakdown of the once-unified structure,
the separated entities of the subject and the object become apparent
with self-questioning, like "why did I fail?" and "what's wrong with the hammer and the nail?" In this way, it is considered that the herme-
neutic approach can provide an immanent understanding of metaphys-
ical existence, such as consciousness, through cycles of self-corrective
analysis.
Heidegger recognizes that ordinary man has rare opportunities to
reflect on the meaning of his own way of being, occupied as he is with
the daily routines of life, and he regards such a state of being as inau-
thentic. Although man can live in his neighboring community occupied
with idle talk and trivia, he cannot become individuated, ultimately
recognizing and taking responsibility for his or her existence in such a
way. Man in this case lives his daily life only in the immediate present,
vaguely anticipating the future and mostly forgetting the past.
However, Heidegger tells us that this way of being can be changed
to authentic being when man thinks positively about the possibility of
his death, which could occur at any moment and not necessarily so very
far into the future. Death is an absolutely special event because it is the
ultimately individuating condition that cannot be shared with others.
Death is to be regarded as the absolutely certain impossibility of being
further related to any other kind of being, and when confronted in this
way it prompts the authentic being to head toward its own absolute
impossibility.
Here, we must focus on Heideggers brilliant notion that the present
is born via the dynamic interplay between a unique agent's projected future possibilities and its past. In this process, one reclaims one's self
from the undifferentiated flow of idle chatter and everyday routine. This
is authenticity. The authentic agent has the courage to spend the time of
his or her life in becoming an agent of change, to transform the situation
in which one is thrown (the clearing of being as inherited, as one is
born into it) into that ideal self-situation that characterizes one's unique
potential. The inauthentic agent hides from this potential, and rather
invests his or her time in distractions, idle chatter, merely repeating
established routines and defending established conventions regardless
of suboptimal and even grossly immoral results. Thus, Heidegger estab-
lishes the subjective sense of temporality (rather than objective time)
as the ground of authentic being, whereas the inauthentic being tries
to nullify this subjective sense of time by ignoring his or her mortality
and retreating into the blind habit and routine that is characteristic of
fallenness.
Now, we see that his notion of temporality is drastically different
from Husserl's. Husserl considers that temporality appears as
the result of subjective reflection to articulate direct experiences of
sensory streams as consciously manipulable object or event sequences.
Heidegger, on the other hand, shows how the differentiable aspects of
past, present, and future rise from the mortal condition. Temporality is
the dynamic structure of being, in light of which anything at all comes
to matter, and from which any inquiry into the nature of being, includ-
ing any derivative understanding of time as sequence, for example, is
ultimately drawn.
Next, there are other aspects of mind to review, including the
role of the body in mediating interactions between the mind and the
material world.

3.5. Embodiment of Mind

In the philosophy of embodiment developed by Merleau-Ponty, we can
easily find the influence of Heidegger's being-in-the-world. Merleau-
Ponty's notion of embodiment has been recognized as a notion of ambi-
guity, and with this ambiguity he successfully avoided tackling Cartesian
dualism directly. As mentioned before, Descartes thought that the world consists of two extremes, the subjective nonmaterial mind and the objective material things, and this invited a problem of interaction.
The problem is how to account for the causal interaction among nonma-
terial mind, material body, and material world while these effectively
exist in different spaces. From this background, Merleau-Ponty devel-
oped his thoughts on embodiment, asking at which pole the body, as a
part of our being, should be taken to lie. Although the body in terms of
flesh can be regarded as material, we actually often experience the body
as an aspect of mind, which is regarded as nonmaterial by Descartes. For
example, our cheeks turn red when we get angry and tears start to fall
when we feel sad. In response, Merleau-Ponty proposes that we consider
the body to be an ambiguous existence belonging to neither of these two
extremes.
Merleau-Ponty examined various means of interaction between mind
and body. For example, he presented an analysis of a blind man with
a stick (Merleau-Ponty, 1962). The stick becomes an object when the
blind man grasps it in order to guide his movements. At the same time, however, it becomes a part of his body when, while walking, he scans his surroundings by touching its tip to things, like tactile scanning with the finger. Although this is an interesting example showing the possi-
bility of body extension, it also recalls the possibility that the range of
self can be extended or shrunk through the use of tools and artifacts.
In another example, his analysis of the phenomenon of phantom limbs
might indicate the complete opposite of the blind man's case. It is said
that people who have had a limb amputated often still experience pain
in the amputated limb that no longer exists. Merleau-Ponty explained
the phenomena in terms of refusal of deficiency, which is the implicit
negation of what runs counter to the natural momentum that throws
us into our tasks, our cares, our situation, and our familiar horizons
(Merleau-Ponty, 1962). It can be summarized then that the analysis of
the blind man with his stick indicates the possibility of extension of the
familiar horizon associated with daily use of the stick, whereas the anal-
ysis of phantom limbs indicates another possibility, one of refusal of the
sudden shrinking of this once familiar horizon. These examples might
help us to understand how the horizon of subjective possibility is con-
stituted via daily interactions between the body and the world, thereby
enriching our understanding of being in theworld.
Along the same line of thought, Merleau-Ponty addressed the problem of body schema, the integrated image of the body, by con-
ducting an analysis of a patient with neurological blindness. His patient,
Schneider, had lesions in vision-related cortical areas. Although he had
a problem in recognizing objects visually, he could pick up a cup to have a drink or make a fire by striking a match without problems upon seeing the cup or the match. So, he could see the shapes and outlines of
objects but needed a reasoning process to identify them. When he was
asked to point to his nose, he had difficulty doing so, but could blow
his nose with a handkerchief. He had difficulty pointing to or moving
a part of his body when asked to do so unless he deliberated over his
movement from an objective view ahead of time. In short, he could per-
form concrete movements in natural situations in daily life very easily,
but had difficulty performing abstract movements without context and
without an objective view. Merleau-Ponty came to the conclusion that
such concrete movements situated in everyday life are fundamental to
the consideration of body schema. In concrete movements, our body or
body part is not an object that we move in an objective space. Rather, it
is our living body, the body as a subject, that we move in a bodily space.
These movements performed by our living body are organized in famil-
iar situations in the world, wherein the body comprehends its world and
objects without explicitly representing or objectifying them. The body
communicates with them through a skill or tacit knowledge, by mak-
ing a direct reference to the world and its objects. This direct reference
implies the fundamental structure of being-in-the-world, as Heidegger
discussed in terms of Dasein.
Merleau-Ponty's (1962) analysis of synesthesia is also worth introduc-
ing here. Synesthesia, a neurological condition in which sensation in one
modality unconsciously evokes perception in another, has been reported
in a variety of forms. Some synesthetes perceive colors upon seeing certain
shapes or letterforms, feel textures on hearing particular sounds, or expe-
rience strong tastes on hearing certain words. Merleau-Ponty speculates
that these sensory slips should have some clear meaning behind them,
rather than simply being perceptual side effects, which would account for
how we humans engage in the world. Indeed, perception of objects in the
world is achieved in the iterative interactions between multiple modalities
of sensation by reentrant mechanisms established in the coupling of us
and the world. Merleau-Ponty refutes ordinary scientific views of modularity that seek to understand the reality of perception by reducing it to the sum of each modality of sensation. His approach is to see perception as ongo-
ing structuring processes of the whole, or Gestalt, which appears in the
communicative exchanges between the different modalities of sensation.
He also refutes the notion of separating perception from action. He
explains that the hand touching something reverses into an object that is
being touched because the hand itself is tangible flesh. In shaking hands,
we feel that we are touching anothers hand and simultaneously that our
extended hand is being touched. Analogously, Merleau-Ponty says that a
see-er reverses into a visible object because of the thickness of its flesh.
Thus, vision is analogous to exploring objects in the dark by tactile palpa-
tion. Visual palpation by looking inevitably accompanies a sense of being
seen at the same time. He writes that painters often feel as if the objects
in their own paintings gaze back at them. There are silent exchanges
between the see-ers and the objects. Because flesh is tactile as well as
visible, it can touch as well as be touched and can see as well as be seen.
There is flux in the reciprocal network that is body and world, involving
touching, vision, seeing, and things tangible.
Let's take another example. Imagine that your right hand touches your
left hand while it is palpating something. At this moment of touching,
the subjective world of touching transforms into the objective world of
being touched. Merleau-Ponty wrote that, in this sense, the touching
subject passes over to the rank of the touched, descends into the things,
such that the touch is formed in the midst of the world and as it were in
the things (Merleau-Ponty, 1968, pp. 133–134). Although the subject of
touching and the object of being touched are opposite in meaning, they
are rendered identical when Merleau-Ponty's concept of chiasm is applied.
Chiasm, originating from the Greek letter χ (chi), is a rhetorical method to
locate words by crossing over, combining subjective experience and objec-
tive existence. Although the concept might become a little difficult from
here onward, let's imagine a situation in which a person who has language
to describe only two-dimensional objects happens to encounter a novel
object, a column, as a three-dimensional object, as Tohru Tani (1998) sug-
gests. By exploring the object from different viewpoints such as from the
top or side, he would say that this circular column could be a rectangular
one and this rectangular column could be a circular one (Figure 3.3).

Figure 3.3. A person who has language to describe only two-dimensional objects happens to encounter a novel object, a column, as a three-dimensional object.
When this is written in the form of chiasm, it is expressed as:

[This circle is a rectangle.] X [This rectangle is a circle.]

Thus, the two-dimensional world is extended to a three-dimensional one in which a circle and a rectangle turn out to be just different views
of the column. The conflict between the two is resolved by means of
creating an additional dimension that supports their identity in a deeper
level. Let's consider then what could be created, or emerge, in the fol-
lowing cases.

[A subject of touching is an object of being touched.] X [An object
of being touched is a subject of touching.]

[A see-er is a visible object.] X [A visible object is a see-er.]

Merleau-Ponty suggests that embodiment as an additional dimen-
sion emerges, in which flesh of the same tangibility as well as the same
thickness can be given to both the subject of touching or seeing and the
object of being touched or being seen. This dimension of embodiment

can facilitate the space for iterative exchanges between the two poles of
subject and object:

There is a circle of the touched and the touching, the touched
takes hold of the touching; there is a circle of the visible and the
seeing, the seeing is not without visible existence. My body as a
visible thing is contained within the full spectacle. But my seeing
body subtends this visible body, and all the visibles with it. There is
reciprocal insertion and intertwining of one in the other (Merleau-
Ponty, 1968, p. 143).

Merleau-Ponty, in exploring ambiguity between the two poles of
subjectivity and objectivity, did not anchor his thoughts in the midst
of these two extremes, but rather allowed them to move dynamically
between the two. By positioning the poles to face each other, he would
have imagined a flux, a flow, from one pole to the other and an intertwin-
ing of the two in the course of resolving the apparent conflicts between
them in the medium of embodiment. When the flux intertwines the
two, the subject and the object become an inseparable being reciprocally
inserted into each other with the world arising in the gap.
Recently, the thoughts on embodiment have been revived and have
provided significant influences in cognitive science in terms of the rising "embodied minds" paradigm in the philosophy of mind and cognitive sciences (Varela, Thompson & Rosch, 1991; Clark, 1998; Ritter et al.,
2000; O'Regan & Noë, 2001). Actually, a new movement, referred to as
the behavior-based approach (Brooks, 1990) in artificial intelligence
and robotics started under this trend, as is repeatedly encountered in
later chapters.
Let's move on now to an examination of the concept of the stream of
consciousness put forward by the pioneering American psychologist
and philosopher William James (1892) more than a half century before
Merleau-Ponty. As we go, we'll find some connection between James's thinking and Husserl's concept of time perception, especially that of the level of absolute flow. Also, we'll see a certain affinity between this notion and Merleau-Ponty's in his attempt to show the immanent dynamics of our inner phenomena. By examining James's stream of consciousness, we can move closer toward answering how our will might be free.

3.6. Stream of Consciousness and Free Will

We experience our conscious states of mind as thoughts, images, feel-
ings, and desires that flow while they constantly change. James defines
his notion of the stream of consciousness as the inner coherence or unity
of conscious states as they proceed from here to the next. He explains
the four essential characteristics of this stream in his monumental
Principles of Psychology (1918, p. 225) as follows:

1. Every state tends to be part of a personal consciousness.
2. Within each personal consciousness states are always
changing.
3. Each personal consciousness is sensibly continuous.
4. It is interested in some parts of its object to the exclusion of others, and welcomes or rejects – chooses from among them, in a word – all the while.

The first characteristic means that the various states comprising the
stream are ultimately subjective matters that the subjects feel they
experience by themselves. In other words, the subjects can keep them
private in their states of mind. The second characteristic, one of the
most important of James's claims, asserts that although the stream pre-
serves the inner coherence as one stream, its states are constantly chang-
ing autonomously as various thoughts and images are generated. James
writes that:

When we take a general view of the wonderful stream of our consciousness, what strikes us first is the different pace of its parts. Like a bird's life, it seems to be an alternation of flights and perchings (James, 1918, p. 243).

James considers that the stream comprises successions of substantive parts of stable "perchings" and transitive "flights." Conscious states of
thoughts and images appear more stably in the substantive parts. On
the other hand, the transitive parts generate successive transitions from
one substantive part to another in temporal association. This alternation
between the two parts takes place only intermittently, and the dura-
tion of each substantive part can be quite different but only in terms of
subjective feeling of time. Here, we can find a structural similarity to
what Tohru Tani (1998) interpreted as consciousness flowing as well as
stagnating when referring to Husserl's flow of absolute consciousness.
Although it is said that the transitive parts function to connect and
relate various thoughts and images, how are they actually felt phenom-
enally? James describes them as a subtle feeling like when, immediately
after hearing someone say something, a relevant image is about to pop
into the mind but is not yet quite fully formed. Because the thoughts
and images are so faint, they are lost if we attempt to catch them. The
transitive parts are the fringes of stable images, relating them to each other,
where information flows like the free water of consciousness around
these images. James considers these fringes to be more essential than
stable images and that the actual stream of consciousness is generated
by means of tensional dynamics between stable images related to each
other by their fringes.
The third observation suggests that the private states of consciousness
constantly change but only continuously so. James says that conscious-
ness is not like chopped up bits or jointed segments but rather flows like
a river. This statement appears to conflict with the concept of time per-
ception at the objective time level put forward by Husserl, because he
considered that objective time comprises sequences of discrete objects
and events. However, James's idea is analogous to the absolute flow level, as mentioned before. I suspect that James limited his observation of the stream of consciousness to the level of pure experience and did not proceed to observation of the higher level such as Husserl's objective time.
We can consider then that the notion of the stream of consciousness
evolved from James's notion of present existence characterized by continuous flow to Husserl's notion of recall or reconstruction with trains
of segmented objects. Alongside this discussion, from the notion of the
sensible continuity of the stream of consciousness we can see another
essential consequence of James's thought, that the continuous generation
of the next state of mind from the current one endows a feeling that
each state in the stream belongs to a single enduring self. The experience of selfhood, the feeling of myself from the past to the present as belonging to the same self, might arise from the sensible continuity of the conscious state.
Finally, the fourth observation professes that our consciousness
attends to a particular part of experiences in the stream. Or, that con-
sciousness brings forth some part of a whole as its object of attention.
Heidegger (1962) attends to this under the heading of attunement,
and James's observations of this aspect of the stream of consciousness
lead to his conception of free will. Free will is the capability of an agent
to choose freely, by itself, a course of action from among multiple alter-
natives. However, the essential question concerning free will is that if
we suppose that everything proceeds deterministically by following the
laws of physics, what is left that enables our will to be free? According
to Thomas Hobbes, a materialist philosopher, voluntary actions are
compatible with strict logical and physical determinism, wherein the
cause of the will is not the will itself, but something else not in its own disposing (Molesworth, 1841, p. 376). He considers that will is not
in fact free at all because voluntary actions, rather than being random
and uncaused, have necessary causes.
James proposed a possible model for free will that combines random-
ness and deterministic characteristics, in the so-called two-stage model
(James, 1884). In this model, multiple alternative possibilities are imag-
ined with the help of some degree of randomness in the first stage and
then one possibility is chosen to be enacted through deterministic evalu-
ation of the alternatives in the second stage. Then, how can these possible
alternatives, in terms of the course of actions or images, be generated?
James considers that all possibilities are learned by way of experience. He
says, "when a particular movement, having once occurred in a random,
reflex, or involuntary way, has left an image of itself in the memory, then
the movement can be desired again, proposed as an end, and deliberately
willed" (James, 1918, p. 487). He considers further that the iterative expe-
riences of different movements result in connections and relations among
various images of movements in memory. Then, multiple alternatives can
be imagined as accidental generations with spontaneous variations from
the memory that has been consolidated, and finally one of the alternatives
is selected for actual enactment. These accidental generations with spon-
taneous variations might be better understood by recalling how James's
stream of consciousness is constituted. The stream is generated by transi-
tions of thoughts and images embedded in the substantial part. When the
memory holds complex relations or connections between images of past
experiences, images can be regenerated with spontaneous variations into
streams of consciousness (see Figure 3.4 for an illustration of his ideas).

Figure 3.4. An interpretative illustration of James's thought accounting for how possible alternatives can be generated. Learning from various experiences forms a memory that has a relational structure among substantial images associated with actions. Multiple streams of action images can be generated with spontaneous variations of transitions from among the images embedded in the memory. One of those streams of action images is selected for actual generation.
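
As a minimal computational caricature of the two-stage model, consider the Python sketch below. It is my illustration, not James's formulation or a model from this book: the memory contents, the Gaussian variation, and the evaluation criterion are all hypothetical stand-ins.

    import random

    # Illustrative sketch only: stage 1 proposes alternatives by recalling
    # stored action images with random, spontaneous variations; stage 2
    # deterministically evaluates the proposals and selects one to enact.

    memory = ["reach for cup", "strike match", "wave hand"]  # consolidated images

    def propose(memory, n=5):
        """Stage 1: imagine candidate action images with spontaneous variations."""
        candidates = []
        for _ in range(n):
            base = random.choice(memory)        # recall a stored image
            variation = random.gauss(0.0, 1.0)  # spontaneous variation
            candidates.append((base, variation))
        return candidates

    def evaluate(candidate):
        """Stage 2 criterion: a deterministic (here, arbitrary) evaluation."""
        base, variation = candidate
        return -abs(variation)                  # prefer mild deviations

    chosen = max(propose(memory), key=evaluate) # deterministic selection
    print("enacted:", chosen)

Randomness enters only in the first stage; the second stage is fully deterministic, which is how the model tries to reconcile spontaneity with caused, necessary selection.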
James considers that all of these things are mechanized by dynamics
in the brain. He writes:

Consider once again the analogy of the brain. We believe the brain
to be an organ whose internal equilibrium is always in a state of
change, the change affecting every part. The pulses of change are
doubtless more violent in one place than in another, their rhythm


more rapid at this time than at that. As in a kaleidoscope revolving
at a uniform rate, although the figures are always rearranging
themselves, So in the brain the perpetual rearrangement must
result in some forms of tension lingering relatively long, whilst
others simply come and pass (James, 1892).

It is amazing that more than 100 years ago James already had devel-
oped such a dynamic view of brain processes. His thinking is compatible
with today's cutting-edge views outlined in studies on neurodynamic
modeling, as seen in later chapters.

3.7. Summary

Skepticism about the symbolist framework for representing the world as put forward by traditional cognitive science and outlined in chapter 2
has led us in the present chapter to look to phenomenology for alter-
native views. Let's take stock of what we've covered. Phenomenology
begins with an analysis of direct experiences that are not yet articu-
lated by any ideas or thoughts. Husserl considers that objects and the
world can exist because they can be meditated on, regardless of their
corresponding existences in the physical world. Their representations
are constituted by means of the intentionality of direct experiences, a
process that entails consciousness. Although Husserl thinks that such
representations are intended to be idealistic so as to be logically trac-
table, his thinking has been heavily criticized by Dreyfus and other
modern philosophers. They claim that the inclination to ideality with
logical formalism has turned out to provide a foundation for the symbol-
ist framework envisioned by current cognitive science.
It was Heidegger who dramatically redirected phenomenology by
returning to the problem of being. Focusing on the ways of being in
everyday life, Heidegger explains through his notion of being-in-the-
world that things can exist on account of the relational structure between
them, for example, considering our usage of things. His thinking lies
behind my early question, discussed in the introduction to this book, as
to what the object of a refrigerator can actually mean to a robot when
it names it a "refrigerator." The refrigerator should be judged not from
its characteristic physical features but from the ways in which it is used,
such as for taking a chilled beer from it. Heidegger also says that such
being is not noticed particularly in daily life as we are submerged in rela-
tional structures, as usage becomes habit and habit proceeds smoothly.
We become consciously aware of the individual being of the subject and
the object only in the very moment of the breakdown in the purposeful
relations between them; for example, when a carpenter mishits a nail
in hammering, he notices that himself, the hammer, and the nail are
independent beings. In a similar way, when habits and conventions break
down, no longer delivering anticipated success, the authentic individual
engages in serious reflection of these past habits, transforms them, and
thus lives proactively for his or her ownmost future alongside and
with others with whom these habits and conventions are shared.
Merleau-Ponty, who was influenced by Heidegger, examined bodies
as ambiguous beings that are neither subject nor object. On Merleau-
Pontys account, when seeing is regarded as being seen and touching as
being touched, these different modalities of sensation intertwine and
their reentrance through embodiment is iterated. By means of such
iterative processes, the subject and the object constitute an inseparable
being, reciprocally inserted into each other in the course of resolving
the apparent conflicts between them in the medium of embodiment.
Recently, his thoughts on embodiment have been revived and have
provided significant influences in cognitive science in terms of the ris-
ing "embodied minds" paradigm, such as by Varela and his colleagues
(Varela, Thompson & Rosch, 1991).
We finished this chapter by reviewing how William James explained
the inner phenomena of consciousness and free will. His dynamic
stream of consciousness is generated by spontaneous variations of images
from past experiences consolidated in memory. More than a century
later, his ideas are still inspiring work in systems neuroscience. But do the thoughts deliberated by these philosophers suggest anything useful for building minds? Indeed, at the least we should keep in mind that action and perception interact in a complicated manner and that our minds should emerge via such nontrivial dynamic processes. The next chapter examines neuroscience approaches for exploring
the underlying mechanisms of cognitive minds in biological brains.

4
Introducing the Brain and Brain Science

In the previous chapter, we saw that a phenomenological understanding
of the mind has come from introspection and its expression through
language. We understand the words used intuitively or deliberatively by
matching them with our own experiences and images. This approach
to understanding the mind, that of subjective reflection, is clearly an
essential approach, and is especially valuable when coupled with the
vast knowledge that has been accumulated through other scientific
approaches, such as neuroscience, which make use of modern technolo-
gies to help us understand how we think by understanding how the
brain works. The approach that neuroscience, or brain science, takes
is quite different from that of cognitive science and phenomenology
because it rests on objective observation of biological phenomena in the
brain. It attempts to explain biological mechanisms for various cogni-
tive functions such as generating actions, recognizing visual objects, or
recognizing and generating speech.
However, readers should note that brain science is still in a relatively
early stage of development and we have no confirmative accounts even
for basic mechanisms. What we do have is some evidence of what is hap-
pening in the brain, albeit in many cases the evidence is still conflicting.

What we have to do is build up the most likely construct for a theory
of the brain by carefully examining and linking together all the pieces
of evidence we have thus far accumulated, held against guiding insights
into the phenomenology of the human condition, such as those left by
James and Merleau-Ponty, while adding yet more experimental evidence
in the confirmation or disputation of these guiding insights. In the proc-
ess, further guiding insights may be generated, and research into the
nature of the mind relative to the function of the brain will advance.
The next section starts with a review of the current state of the art
in brain science with a focus on the processes of visual recognition and
action generation, essential for creating autonomous robots. First, the
chapter provides a conventional explanation of each independently, and
then covers recent views that argue that these two processes are effec-
tively inseparable. At the end of this chapter, we introduce some ideas
informed by our robotics experiments on how intentions for actions
originate in (human and other animal, organic not artificial) brains.

4.1. Hierarchical Brain Mechanisms for Visual Recognition and Action Generation

This section explores how visual recognition and action generation
can be achieved in brains by reviewing accumulated evidence. A spe-
cial focus will be put on how those processes work with hierarchical
organization in brains, because insights into this structure help to guide
us in approaching outstanding questions in cognitive science, such as how compositional manipulations of sensory-motor patterns can be achieved, as well as how the direct experience of sensory-motor flow can be objectified.

4.1.1 Visual Recognition Through Hierarchy and Modularity

First, let us look at the visual recognition process. Visual recognition is
probably the most examined brain function, because related neuronal
processes can be investigated relatively easily in electrophysiological
experiments with nonmoving, anesthetized animals. The visual stim-
ulus enters the retina first, proceeds to the lateral geniculate nucleus
in the thalamus, and then continues on to the primary visual cortex
(V1). One important characteristic assumed in the visual cortex as
well as in other sensory cortices is its hierarchical and modular pro-


cessing, which uses specific neuronal connectivity between local
regions. Figure 4.1 shows the visual cortex of a macaque monkey in
which the visual stimulus from the retina through the thalamus enters
V1 located in the posterior part of the cortex.

Figure 4.1. Visual cortex of the macaque monkey showing the what and where pathways schematically (V1, V2, V4; MT: middle temporal area; MST: medial superior temporal area; LIP: lateral intraparietal area; VIP: ventral intraparietal area; TEO and TE: inferior temporal areas).
V1 is thought to be responsible for lower end processing such as edge
detection by using so-called columnar organization. The cortical columns
for edge detection in V1 are arrayed for continuously changing orienta-
tion. The orientation of the perceived edge in the local receptive field
is detected in a winner-take-all manner; that is, only the best-matching column for the edge orientation is activated (i.e., neurons in that column fire) and the other columns remain silent.
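
A minimal numerical sketch may help picture this winner-take-all scheme. It is my illustration, not a model of V1 from the literature: the cosine tuning curve and the 15-degree column spacing are hypothetical simplifications.

    import numpy as np

    # Illustrative sketch only: each "column" prefers one edge orientation,
    # and winner-take-all silences every column except the best match.

    preferred = np.deg2rad(np.arange(0, 180, 15))  # 12 columns spanning 0-165 degrees

    def column_responses(edge_deg):
        """Graded tuning: response falls off with angular distance (180-degree periodic)."""
        return np.cos(2.0 * (preferred - np.deg2rad(edge_deg)))

    def winner_take_all(responses):
        """Only the best-matching column stays active; the rest are silenced."""
        out = np.zeros_like(responses)
        out[np.argmax(responses)] = 1.0
        return out

    print(winner_take_all(column_responses(47.0)))  # only the 45-degree column fires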
After V1, the signal propagates to V2 where columns undertake
slightly more complex tasks such as perceiving different orientations of
line segments by detecting the end terminals of the line segments. After
V2, the visual processing pathway branches into two: The ventral path-
way reaches areas TEO and TE in the inferotemporal cortex, passing
through V4, and the dorsal pathway reaches areas LIP and VIP in the
parietal cortex, passing through the middle temporal area (MT) and
medial superior temporal area (MST). The ventral branch is called the
what pathway owing to its main involvement in object identification and
the latter is called the where pathway due to its involvement in informa-
tion processing related to position and movement.
Taking the case of the where pathway first, it is said that the MT
detects direction of object motion with a relatively small receptive
field, whereas the MST detects background scenes with a larger recep-
tive field. Because movements in the background scene are related to
one's own body movements in many cases, the MST consequently detects
self-movements.

Figure 4.2. Cell responses to complex object features in area TE in the
inferotemporal cortex. Columnar modular representation in the TE for
complex visual objects. Redrawn from Tanaka (1993).

This information is then sent to areas such as the VIP
and LIP in the parietal cortex. Cells in the VIP are multisensory neu-
rons that often respond to both a visual stimulus and somatosensory
stimulus. For example, it has been found that some VIP neurons in
macaque monkeys respond when the experimenter strokes the animals
face, and the same neurons fire when the experimenter shakes the mon-
keys hand in front of its face. As discussed later, many neurons in the
parietal cortex integrate visual inputs with another modality of sen-
sation (i.e., somatosensory, proprioceptive, or auditory). LIP neurons
are involved in processing saccadic eye movements, enabling the visual
localization of objects.
In the case of the "what" pathway, cells in V4 respond to specific
contours or simple object features. Cells in the TEO respond to both
simple and complex object features, and cells in the TE respond only
to complex object features. In terms of the visual processing that
occurs in the inferotemporal cortex, inspiring observations were made
by Keiji Tanaka (1993) when conducting single-unit recording1 in par-
tially anesthetized monkeys while showing the animals a set of artifi-
cially created complex object features. Columnar representations were
found in the TE for a set of complex object features, wherein most of
the cells in the same column reacted to similar complex object features
(Figure 4.2).

1. Single-unit recording is a method of measuring the electrophysiological
responses of a single neuron using a microelectrode system.
47

Introducing the Brain and Brain Science 47

Figure 4.3. Schematic illustration of visual perception in the "what" pathway,
ascending the hierarchy from V1 through V2 and V4 to TE.

For example, in a particular column that encodes starlike shapes,
different cells may react to similar starlike shapes that have a differ-
ent number of spines. This observation suggests that TE columns rep-
resent a set of complex object features discretely, like visual alphabets,
but allow a range of modulation of complex object features within the
column. It can be summarized then that visual perception of objects
might be compositional in the what pathway, in the sense that a set of
visual parts registered in a previous level of the hierarchy are spatially
combined at the next level, as illustrated in Figure 4.3.
In the first stage in V1, edges are detected at each narrow local recep-
tive field from the raw retinotopic image, and in V2 the edge segments
are detected. In V4 with its larger receptive field, connected edge seg-
ments for continuously changing orientations are detected as a single
contour curvature. Then in the TEO, geometric combinations of con-
tour curvatures in a still larger receptive field are detected as simple
object features (some could be complex object features). Finally, in the
TE, combinations of these object features are detected as complex object
features. It seems that columns in each visual cortical area represent
primitive features at each stage of visual processing. Furthermore, each
primitive feature represented in a column might be parameterized for
minor modulation by local cell firing patterns.
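
Such stage-wise composition can be pictured with a toy example: each level responds where a spatial conjunction of features from the level below occurs, over a progressively larger receptive field. The binary edge map and the corner-shaped kernel below are invented for illustration.

import numpy as np

def detect_layer(features, kernel, threshold):
    """One hierarchy stage: respond wherever a spatial conjunction of
    lower-level features (the kernel pattern) is present."""
    h, w = kernel.shape
    H, W = features.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = features[i:i + h, j:j + w]
            out[i, j] = float(np.sum(patch * kernel) >= threshold)
    return out

# Illustrative binary "edge map" standing in for V1 output.
edges = np.zeros((8, 8))
edges[2, 2:5] = 1           # a short horizontal segment
edges[2:5, 4] = 1           # a short vertical segment

corner = np.array([[1, 1], [0, 1]])        # V2-like conjunction of edges
corners = detect_layer(edges, corner, 3)   # fires where both segments meet
print("corner detected at:", np.argwhere(corners == 1))

Stacking further such layers, each combining the outputs of the previous one, mirrors the compositional picture of the "what" pathway sketched above.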

4.1.2 Counter Arguments

As mentioned in the beginning of this section, we must exercise some
caution in interpreting actual brain mechanisms from the data available
to us thus far. Although the aforementioned compositional mechanisms
for visual recognition were described as utilizing explicit representations
of the visual parts stored in the local columns and hierarchical manipu-
lation of those parts from the lower level to the higher, the real mechanism
may not be so simply mechanical but also highly contextual.
There is accumulating evidence that neuronal response in the local
receptive field in early vision can be modulated contextually by means
of lateral interactions with areas outside of the receptive field as well as
through top-down feedback from higher levels. Although contours are
thought to be perceivable only after V4 in the classical theory, Li and
colleagues (2006) showed that, in monkeys performing a contour detec-
tion task, there was a close correlation between the responses of V1 neu-
rons and the perceptual saliency of contours. Interestingly, they showed
that the same visual contours elicited significantly weaker neuronal
responses when they were not the objects of attention. They concluded
that contours can be perceived even in V1 by using the contextual infor-
mation available at this same level and the higher level.
Kourtzi and colleagues (2003) provided corroborative evidence that
early visual areas V1 and V2 respond to global rather than simple local
features. It was argued that context modulation in the early visual cor-
tex has a highly sophisticated nature, in effect putting the local features
to which the cells respond into their full perceptual global context.
These experimental results were obtainable because of the use of awake
animals rather than anesthetized ones during the recording. In the elec-
trophysiological experiments of the visual cortex, animals are usually
anesthetized so as to avoid contamination of purely bottom-up percep-
tual signals with unnecessary top-down signals from the higher order
cognitive brain regions such as the prefrontal cortex. Contrary to this
method, however, top-down signals seem to be equally as important as
the bottom-up ones in understanding the hierarchy of vision. Rajesh Rao
and Dana Ballard (1999) proposed so-called predictive coding as a model
for hierarchical visual processing in which the top-down signal conveys
prediction from the higher level activity to the lower one, whereas the
bottom-up signal conveys the prediction error signal from the lower
level, which modulates the higher level activity. They argue that the
visual recognition of complex objects is achieved via such interaction
between these two pathways rather than merely through the bottom-up
one. This insight is deeply important to the neurorobotic experiments
to come.
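
To make the scheme concrete, here is a minimal sketch of a single-level predictive-coding loop. It assumes a known linear generative model; the matrix U, the dimensions, and the update rate are illustrative assumptions, not Rao and Ballard's actual hierarchical, learned model.

import numpy as np

rng = np.random.default_rng(0)

# Assumed linear generative model: higher-level activity r explains the
# lower-level input as x ≈ U r. U is fixed and known here for simplicity.
U = rng.normal(size=(16, 4))
x = U @ np.array([1.0, -0.5, 0.0, 2.0])   # sensory input to be explained

r = np.zeros(4)                            # higher-level activity
for _ in range(500):
    prediction = U @ r                     # top-down signal: predicted input
    error = x - prediction                 # bottom-up signal: prediction error
    r += 0.01 * U.T @ error                # higher level updates to reduce error

print("residual error:", np.linalg.norm(x - U @ r))

The top-down pathway carries the prediction, the bottom-up pathway carries only the prediction error, and recognition is the settling of the higher level onto activity whose prediction cancels that error.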
The modularity of feature representation in the columnar organiza-
tion is also questionable. Yen and colleagues (2007) made simultaneous
recordings of multiple early visual cortex cells in cats while showing the
animals movies containing scenes from daily life. What they found was
that there is substantial heterogeneity in the responses of adja-
cent cells in the same columns. This finding obviously conflicts with the
classical view that cells with similar response properties are clustered
together in columns. They mention that visual cortex cells could have
multiple response dimensions.
To sum up, the presumption of strict hierarchical and modular proc-
essing in visual recognition might have to be reconsidered given accumu-
lated evidence obtained as experimental setups become more realistic.
The next subsection begins this process concerning action generation in
the brain.

4.1.3 Action Generation Through Hierarchy

Understanding the brain mechanisms behind action generation is essen-
tial to our attempts at understanding how the mind works because
actions tie the subjective mind to the objective world. It is generally
thought that complex actions can be generated by moving through mul-
tiple stages of processing in different local areas in the brain in a similar
way to how visual perception is achieved. Figure 4.4 shows the main
brain areas assumed to be involved in action generation in the cortex.
The supplementary motor area (SMA) and the premotor cortex
(PMC) are considered to sit at the top of the action generation hierar-
chy. Some researchers think that the prefrontal cortex may play a fur-
ther higher functional role in action generation, sitting as it does above
the SMA or PMC, and we will return to this view later. It is generally
held that the SMA is involved in organizing action programs for volun-
tary action sequences, whereas the PMC is involved in organizing action
programs for sensory guided action sequences. Because these areas have
dense projections to the primary motor cortex (M1), the idea is that
detailed motor patterns along with the motor program are generated in
M1. Then, M1 sends the motor pattern signals via the pons and cerebel-
lum to the spinal cord, which then sends out detailed motor commands
to the corresponding muscles to finally initiate physical movement.

Figure 4.4. The main cortical areas involved in action generation include the
primary motor cortex (M1), supplementary motor area (SMA), premotor
cortex (PMC), and parietal cortex. The prefrontal cortex and inferior
parietal cortex also play important roles.

As a seminal study for the primary motor cortex, Georgopoulos and col-
leagues (1982) found evidence in electrophysiological experiments in
monkeys that the direction of hand movement or reaching behavior is
encoded by a population of neural activities in M1. In the following, we
review possible relationships between SMA and M1 and between PMC
and M1.
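
The population-coding idea can be illustrated in a few lines. The cosine tuning, cell count, and decoding rule below are textbook simplifications assumed for the example, not Georgopoulos and colleagues' data.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical M1 cells, each with a random preferred movement direction.
n_cells = 100
preferred = rng.uniform(0.0, 2.0 * np.pi, n_cells)

def firing_rates(movement_dir):
    """Cosine tuning: a cell fires most for movements toward its
    preferred direction (baseline and gain are illustrative)."""
    return 10.0 + 8.0 * np.cos(movement_dir - preferred)

def population_vector(rates):
    """Each cell 'votes' for its preferred direction, weighted by how far
    its rate deviates from the population mean."""
    weights = rates - rates.mean()
    x = np.sum(weights * np.cos(preferred))
    y = np.sum(weights * np.sin(preferred))
    return np.arctan2(y, x)

true_dir = np.deg2rad(60.0)
decoded = population_vector(firing_rates(true_dir))
print(f"decoded direction: {np.rad2deg(decoded):.1f} degrees")  # near 60

No single cell specifies the movement; the direction is read out only from the weighted vote of the whole population.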

4.1.4 Voluntary Sequential Movements in the Supplementary Motor Area

Considerable evidence suggests that hierarchical relations exist between
the SMA and M1. One well-known example involves patients with alien
hand syndrome due to lesions in the SMA. These patients tend to gen-
erate actions completely bypassing their consciousness. For example,
when they see a comb, their hand reaches out to it and they comb their
hair compulsively. It is essential to note that skilled behaviors involved
in combing their hair are completely intact. The people act well; it is
just that they seem unable to regulate their actions at will. By way of
explanation, it is thought that the SMA might regulate the generation of
skilled behaviors by placing inhibitory controls over M1, which encodes
a set of basic movement patterns including the one for combing hair. So
if this inhibitory control is attenuated by lesions in the SMA, the mere
perception of a comb could automatically trigger the movement pattern
for the combing of hair stored in M1.
Neurophysiological evidence for the encoding of voluntary sequential
movement in the SMA was obtained in pioneering studies conducted by
Tanji's group (Tanji & Shima, 1994; Shima & Tanji, 1998; Shima & Tanji,
2000). In these studies, monkeys were trained to be able to regenerate
a set of specific sequential movements involving a combination of three
primitive movements: pulling, pushing, and turning a handle. In each
sequence, the three primitive movements were connected in serial order
with a specific time interval at each transition of movement. After the
training, the monkeys were required to regenerate each learned sequen-
tial movement from memory without any sensory cues being given. In
this way, the task can be regarded as memory driven rather than sen-
sory reactive. In the unit recording in the SMA during the regeneration
phase, three types of task-related cells were found.
The first interesting finding was that 54 out of 206 recorded cells
showed sequence-specific activities. Figure 4.5a shows raster plots of
one of these 54 cells, in this case an SMA cell, which was activated only
before the sequence Turn-Pull-Push (lower) was initiated, not before
other sequences such as Turn-Push-Pull (upper) were initiated. It is
interesting to note that it took a few seconds for the SMA cell to be fully
activated before onset of the sequential movements and that the acti-
vation was diminished immediately after onset of the first movement.
It is assumed, therefore, that the cell is responsible for preparing the
action program for the specific sequential movement. This is contrasted
with the situation observed in the M1 cell shown in the raster plot in
Figure 4.5b. The M1 cell started to become active immediately before
the onset of the specific movement and became fully activated during
the actual movement itself. The preparatory period of this M1 cell was
quite short, within a fraction of a second.
Tanji and Shima's results imply that some portion of SMA cells play
an essential role in the generation of compositional actions by sequen-
tially combining primitive movements. These cells might encode whole
sequences as abstract action programs with slowly changing activation
profiles during the preparatory period. This activity might then lead to
the activation of other SMA cells that can induce specific transitions
from one movement to another during run time by activating partic-
ular M1 cells, as well as SMA cells that encode corresponding move-
ments with rapidly changing activation profiles. Here, we can assume
a certain spatiotemporal structure that affords hierarchical organiza-
tion of sequential movements.

Figure 4.5. Raster plots showing cell firing in multiple trials in the upper
part and the mean firing rate across the multiple trials in the supplementary
motor area (SMA) and primary motor cortex (M1) during trained sequential
movements. (a) An SMA cell activated only in the preparatory period for
initiating the Turn-Pull-Push sequence shown in the bottom panel, not
for other sequences such as the Turn-Push-Pull sequence shown in the top
panel. (b) An M1 cell encoding the single Push movement. Adopted from
Tanji and Shima (1994) with permission.

In later work, Shima and Tanji (2000)
reported further important findings from more detailed recording in
a similar task protocol. Some cells were found to play multiple func-
tional roles: Some SMA cells encoded not only a single specific motor
sequence, but two or three different sequences out of four trained
sequences. This suggests an interesting neuroscientific result that a set
of primitive sequences is represented by distributed activation of some
SMA cells rather than each sequence being represented by some specific
cells exclusively and uniquely.
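
One way to picture this division of labor is as a two-level generator in which a slowly changing program selects primitives that a lower level unrolls into detailed patterns. The sketch below is purely schematic; the primitive names and movement patterns are invented.

# Schematic two-level generator: an "SMA-like" level holds an abstract
# program over primitives, while an "M1-like" level unrolls each primitive
# into a detailed movement pattern. All names here are invented.

PRIMITIVES = {                         # lower-level repertoire
    "push": ["extend", "contact", "apply force"],
    "pull": ["grip", "flex", "release"],
    "turn": ["grip", "rotate", "release"],
}

def generate(program):
    """Step through the abstract program (slow dynamics) and expand each
    token into its movement pattern (fast dynamics)."""
    trajectory = []
    for token in program:
        trajectory.extend(PRIMITIVES[token])
    return trajectory

print(generate(["turn", "pull", "push"]))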
Although evidence acquired by various brain measuring techniques
supports the notion that hierarchical organization of voluntary sequen-
tial movements occurs in the SMA for abstract sequence processing and
in M1 for detailed movement patterns, this view is not yet set in stone.
Valuable challenges have arisen against the idea of the SMA encod-
ing abstract sequences. Lu and Ashe (2005) recorded M1 cell activity
during sequential arm movements in monkeys. In the task, each arm
movement was either downward, upward, toward the left, or toward
the right. It was found that the neural activity of some M1 cells imme-
diately before onset of the sequential movements anticipated the com-
ing sequences, and that 40% of the recorded M1 cells could do this.
Surprisingly, this percentage is much higher than that observed in the
SMA by Tanji and Shima. Are the sequence-related activities of M1 cells
merely epiphenomena that reflect the activity of SMA cells upstream
or do they actually function to initiate corresponding motor sequences?
Lu and Ashe dispelled any doubt about the answer by demonstrating
that a lesion among the M1 cells, artificially created by microinjection
of chemicals, degraded only the generation of sequences, not each move-
ment. It seems then that M1 cells primarily encode sequences rather
than each movement, at least in the monkeys and cells involved in Lu
and Ashe's experiment.

4.1.5 Sensory-Guided Actions in the Premotor Cortex

The SMA is considered by most to be responsible for organizing com-
plex actions such as sequential movements based on internal motiva-
tion, whereas the PMC is considered to generate actions in a more
externally driven manner by making use of immediate sensory infor-
mation. Mushiake in Tanji's group showed clear neurophysiological
evidence for this dissociation (Mushiake et al., 1991). They trained
monkeys to generate sequential movements under two different condi-
tions: the internal motivation condition in which the monkeys remem-
bered sequential movements and reproduced them from memory, and
the external sensory driven condition in which the monkeys generated
sequential movements guided by given visual cues. Unit recording in
both the SMA and PMC during these two task conditions revealed a
distinct difference in the functional roles of these two regions. During
both the premovement and movement periods, PMC neurons were
more active when the task was visually guided and SMA neurons were
more active when the sequence was self-determined from memorized
sequential movements. It is known that there are so-called bimodal neu-
rons in the PMC that respond to both specific visual stimuli and to one's
own movement patterns. These bimodal neurons in the PMC associ-
ated with visual movement are said to receive "what" information from
the inferotemporal cortex and "where" information from the parietal
cortex. Thus, these bimodal neurons seem to enable the PMC to orga-
nize sensory-guided complex actions.
Graziano and colleagues (2002) in their local stimulation experi-
ments on the monkey cortex demonstrated related findings. However,
in some aspects, their experimental results conflict with the conven-
tional ideas that M1 encodes simple motor patterns such as directional
movements or reaching actions as shown by Georgopoulos and col-
leagues. They stimulated motor-related cortical regions with an elec-
tric current and recorded the corresponding movement trajectories of
the limbs. Some stimuli generated movements involved in reaching to
specific parts of the monkey's own body including the ipsilateral arm,
mouth, and chest, whereas others generated movements involving
reaching toward external spaces. They found some topologically pre-
served mapping from sites over a large area including M1 and PMC to
the generated reaching postures. The hand reached toward the lower
space when the dorsal sites in the region were stimulated, for example,
but reached toward the upper space when the ventral and anterior sites
were stimulated. It was also found that many of those neurons were
bimodal neurons exhibiting responses also to sensory stimuli. Given
these results, Graziano and colleagues have adopted a different view
from the conventional one in that they believe that functional specifi-
cation is topologically parameterized as a large single map, rather than
there being separate subdivisions such as M1, the PMC, and the SMA
that are responsible for differentiable aspects of motor-related func-
tions in a more piecemeal fashion.
So far, some textbookish evidence has been introduced to account for
the hierarchical organization of motor generation, whereby M1 seems
to encode primitive movements, and the SMA and PMC are together
responsible for the more macroscopic manipulation of these primitives.
At the same time, some counter evidence was introduced that M1 cells
function to sequence primitives as if no explicit differences might exist
between M1 and the SMA. Some evidence was also presented indicating
that many neurons in the motor cortices are actually bimodal neurons
that participate not only in motor action generation but also in sensory
perception. The next section explores an alternative view accounting for
action generation mechanisms, which has recently emerged from obser-
vation of bimodal neurons that seem to integrate these two processes of
action generation and recognition.
4.2. A New Understanding of Action Generation and Recognition in the Brain

This book has alluded a number of times to the fact that perception of
sensory inputs and generation of motor outputs might best be regarded
as two sides of the same coin. In one way, we may think that a motor
behavior is generated in response to a particular sensory input. However,
in the case of voluntary action, intended behaviors performed by bodies
acting on environments necessarily result in changes in proprioception,
tactile, visual, and auditory perceptions. Putting the two together, a subject
should be able to anticipate the perceptual outcomes for his or her own
intended actions if similar actions are repeated under similar conditions.
Indeed, the developmental psychologists Eleanor Gibson and Anne Pick
have emphasized the role of perception in action generation. They once
wrote in their seminal book (2000) that infants are active learners who
perceptually engage their environments and extract information from
them. In their ecological approach, learning an action is not just about
learning a motor command sequence. Rather, it involves learning possible
perceptual structures extracted during intentional interactions with the
environment. Indeed, actions might be represented in terms of an expec-
tation of the resultant perceptual sequences caused by those intended
actions. For example, when I reach for my mug of coffee, it might be
represented by a particular sequence of proprioception for my hand to
make the preshape for grasping, as well as a particular sequence of visual
perception of my hand approaching the mug with a specific expectation
related to the moment of touching it. Eminent neuroscientist Walter
Freeman (2000) argues that action generation can be regarded as a pro-
active process by supposing this sort of action-perception cycle, rather
than as the more passive, conventional perception-action cycle whereby
motor behaviors are generated in response to perception.
Keeping these arguments in mind, this chapter starts by exam-
ining the functional roles of the parietal cortex, as this area appears
to be the exact place where the top-down perceptual image for action
intention originating in the frontal area meets the perceptual reality
originating bottom-up from the various peripheral sensory areas. Thus
located, the parietal cortex may play an essential role in mediating
between the two, top and bottom. It then examines in detail so-called
mirror neurons that are thought to be essential to pair generation and
to perceptual recognition of actions. It is said that the finding of mir-
ror neurons drastically changed our understanding of the brain mecha-
nisms related to action generation and recognition. Finally, the chapter
rounds out by looking at neural correlates for intentions, or will, that
are thought to be initiated farthest upstream in the actional brain net-
works, by examining some evidence from neuroscience that bears on the
nature of free will.

4.2.1 The Parietal Cortex: Where Action Intention and Perceptual Outcome Meet

The previous section (4.1) discussed the "what" and "where" pathways
in visual processes. Today, many researchers refer to the "where" path-
way that stretches from V1 to the parietal cortex as the "how" pathway
because recent evidence suggests that it is related more to behavior gen-
eration that makes use of multimodal sensory information than merely
to spatial visual perception. Mel Goodale, David Milner, and colleagues
(1991) conducted a series of investigations on patient D. F., who had
visual agnosia, a severe disorder of visual recognition. When she was
asked to name some household items, she misnamed them, calling a cup
an ashtray or a fork a knife. However, when she was asked to pick up
a pen from the table, she could do it smoothly. In this sense then, the
case of D. F. is very similar to that of Merleau-Ponty's patient Schneider
(see chapter 3). Goodale and Milner tested D. F.'s ability to perceive the
three-dimensional orientation of objects. Later, D. F. was found to have
bilateral lesions in the ventral "what" pathway, but not in the dorsal "how"
pathway, in the parietal cortex. This implies that D. F. could not recog-
nize three-dimensional objects visually using information about their
category, size, and orientation because her ventral "what" pathway includ-
ing the inferotemporal cortex was damaged. She could, however, gener-
ate visually guided behaviors without conscious perception of objects.
This was possible because her dorsal pathway including the parietal cor-
tex was intact. Thus, the parietal cortex appears to be involved in how
to manipulate visual objects, by allowing a close interaction between
motor components and sensory components.
That the parietal cortex is involved in the generation of skilled behaviors
by integrating vision-related and motor-related processes is a notion
supported by the findings of electrophysiological experiments, espe-
cially those concerning bimodal neurons in the parietal cortex of the
monkey during visually guided object manipulation. Hideo Sakata and
colleagues (1995) identified populations of neurons that fire both when
pushing a switch and when visually fixating on it. Skilled object manip-
ulation behaviors such as pushing a switch should require an association
between the visual information about the object itself and the motor
outputs required for acting on it and so, by extension, some popula-
tions of parietal cortex neurons should participate in this association by
accessing both modalities of information.
Damage to the parietal cortex in humans, such as that caused by cere-
bral hemorrhage due to stroke or trauma, for instance, can result in various
deficits in skilled behavior needed for tool use. In the disorder ideational
apraxia, individuals cannot understand how to use tools: If they are given
a comb, they might try to brush their teeth with it. In ideomotor apraxia,
individuals have difficulty particularly with miming: When asked to
mime using a knife, they might knock on the table with their fist or when
asked to mime picking up tiny grains of rice, they move their hand toward
the imagined grains but with it wide open. These clinical observations
suggest that the parietal cortex might store some forms of knowledge, or
models, about the external world (e.g., objects, tools, and surrounding
workspace), and through these models various mental images about pos-
sible interactions with the external world can be composed.
How can the skills or knowledge for object manipulation, or tool
usage, be mechanized in the parietal cortex? Such skills would seem to
require not only motor pattern generation but also proactive represen-
tation of the perceptual image associated with the motor act. Although
the parietal cortex is conventionally seen as being responsible for inte-
grating input from multiple sensory modalities, an increasing number
of recent studies suggest that the parietal cortex might participate in
predicting perceptual inputs associated with behaviors by acquiring
some type of internal model (Sirigu et al., 1996; Eskandar & Assad,
1999; Desmurget & Grafton, 2000; Ehrsson et al., 2003; Mulliken
et al., 2008; Bor & Seth, 2012). In particular, Mulliken and colleagues
(2008) found direct evidence for the existence of the predictive model
in the parietal cortex in their unit recording experiment involving
monkeys performing a joystick task to control a cursor. They found
that specific cells in the parietal cortex encode temporal estimates of
the direction in which the cursor is moving, estimates that cannot be
obtained directly from either the current sensory inputs or the motor
outputs to the joystick, but can be obtained by forward prediction.
Now, let's consider how predicting perceptual sequences could
facilitate the generation of skilled actions in the parietal cortex.
Some researchers have considered that a predictive model referred to
as the forward model and assumed to operate in the cerebellum might
also help us to understand what is happening in the parietal cortex.
Masao Ito, who is famed for his findings linking long-term depres-
sion to the cerebellum, suggested that the cerebellum might host
internal models for action (Ito, 1970). Following Itos idea, Mitsuo
Kawato and Daniel Wolpert constructed detailed forward models,
computational models that account for optimal control of arm
movements (Kawato, 1990; Wolpert & Kawato, 1998). The forward
model basically predicts how the current sensory inputs change in
the next time step for arbitrary motor commands given in the cur-
rent time step. In the case of arm movement control, the forward
model predicts changes in the angular positions of the arm joints as
output when given joint motor torques as input. Adequate training
of the forward model based on iterative past experience of how joint
angles change due to particular applied motor torques can generate
a good predictive model. More recently, Ito (2005) suggested that
the forward model might be first acquired in the parietal cortex and
the model further consolidated in the cerebellum later. In addition,
Oztop, Kawato, and Arbib (2006) as well as Blakemore and Sirigu
(2003) have suggested that both the parietal cortex and cerebellum
might host the forward model.
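
In its simplest form, a forward model is a mapping from the current state and motor command to the next sensory state, and it can be fit from movement experience. The following single-joint sketch uses an invented toy plant and a linear least-squares fit; everything in it is an illustrative assumption rather than Kawato and Wolpert's actual models.

import numpy as np

rng = np.random.default_rng(2)

def plant(angle, torque):
    """Stand-in for the real arm: how the joint angle actually responds
    to an applied torque (unknown to the learner)."""
    return angle + 0.1 * torque - 0.02 * np.sin(angle)

# Learn a linear forward model next_angle ≈ w0*angle + w1*torque + w2
# from iterative past experience (random motor babbling).
X, y = [], []
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)
    t = rng.uniform(-2.0, 2.0)
    X.append([a, t, 1.0])
    y.append(plant(a, t))
w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# The trained model predicts the sensory consequence of a motor command.
a, t = 0.3, 1.5
print(f"predicted {w @ [a, t, 1.0]:.3f}, actual {plant(a, t):.3f}")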
I, however, speculate that the predictive model in the parietal cortex
may predict the perceptual outcome sequence as corresponding not to
motor commands at each moment but to macroscopic states of inten-
tion for actions that might be sent from the higher-order cognition pro-
cessing area such as the prefrontal cortex (Figure 4.6). For example, for a
given intention of throwing a basketball into a goal net, the correspond-
ing visuo-proprioceptive flow consisting of proprioceptive trajectory of
body posture change and visual trajectory of the ball falling into the net
can be predicted. In a similar manner, such predictive models acquired
by a skilled carpenter can predict the visuo-auditory-proprioceptive flow
associated with an intention of hitting a nail. These illustrations just
follow the aforementioned thought by Gibson and Pick. The point here
is that a predictive model may not need to predict the perceptual out-
comes for all possible combinations of motor commands, including many
unrealistic ones. If the predictive model attempts to learn to predict
all possible motor command combinations, such an attempt will face a
combinatorial explosion, which has been known as the frame problem
(McCarthy, 1963) in AI research. Instead, a predictive model needs to
predict possible perceptual trajectories associated only with a set of well-
practiced, familiar actional intentions.

Figure 4.6. Predictive model in the parietal cortex. By receiving intention
for action from the prefrontal cortex, it predicts perceptual outcomes such
as visuo-proprioceptive trajectories. Prediction of proprioception in terms
of body posture results in the generation of necessary motor command
sequences for achieving it. The intention is modified in the direction of
minimizing the mismatch between the prediction and the perceptual
outcome.
Jeannerod (1994) has conjectured that individuals have so-called
motor imagery for their well-practiced behaviors. Motor imagery is a
mental process by which an individual imagines or simulates a given
action without physically moving any body parts or sensing any signals
from the outside world. The predictive model assumed in the parietal
cortex can generate motor imagery by means of a look-ahead predic-
tion of multimodal perceptual trajectories over a certain period. Indeed,
Sirigu and colleagues (1996) compared healthy individuals, patients
with damage to the primary motor area, and patients with damage to the
parietal cortex, and reported that those with parietal lesions showed selec-
tive impairment in generating motor imagery.
If the predictive model just predicts perceptual sequences for given
intentions for action, how can motor command sequences be obtained? It
can be considered that a predicted body posture state in terms of antici-
pated proprioception might be sent to the premotor cortex or primary
motor cortex (M1) via primary somatosensory cortex (S1) as a target
posture to be achieved in the next time step. This information is further
sent to the cerebellum, where the necessary motor commands or muscle
forces to achieve this target posture might be composed. The target sen-
sory signal could be a reaction force that is anticipated to be perceived,
for example, in the thumb and index finger in the case of precisely grasp-
ing a small object. Again, the cerebellum might compute the necessary
motor torque to be exerted on the thumb and finger joints in order to
achieve the expected reaction force. This constitutes the top-down sub-
jective intentional pathway acting on the objective world as introduced
through the brief review of phenomenology given in chapter 3.
Let's look next at the bottom-up recognition that is thought to be the
counterpart to top-down prediction. The prediction of sensory modali-
ties such as vision and tactile sensation that is projected to each periph-
eral sensory area through the top-down pathway might be compared
with the actual outcome. When the visual or tactile sensation actually
perceived is something different from the predicted sensation, like in
the situation described by Heidegger wherein the hammer misses hit-
ting a nail (see chapter 3), the current intention of continuing to hit the
nail would be shifted consciously to a different intention, such as look-
ing for the mishit nail or searching for an unbroken hammer. If the
mishit does not happen, however, everything will continue on auto-
matically as expected without any shifts occurring in the current inten-
tion. Such shifts in intentional states might be brought about through
the mismatch error between prediction and perceptual reality. When
such a mismatch is generated, the intention state may be updated in
the direction of minimizing the mismatch error. As the consequence of
interaction between these top-down and bottom-up processes, current
intentions can be reformed in light of a changing situation or mistaken
environment.
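
This intention update can be illustrated with the same error-minimization scheme sketched earlier for predictive coding. In the toy example below, a fixed linear mapping stands in for the parietal predictive model, and the intention state is nudged in the direction that reduces the mismatch; the mapping, dimensions, and update rate are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

# A fixed linear mapping stands in for the parietal predictive model:
# intention state (4-dim) -> predicted perceptual trajectory (20 samples).
W = rng.normal(size=(20, 4))

def predict(intention):
    return W @ intention                      # top-down prediction

# Perceptual reality, as if produced by some other intentional state.
reality = predict(np.array([0.5, -1.0, 0.0, 0.8]))

intention = np.zeros(4)                       # current intentional state
for _ in range(300):
    mismatch = reality - predict(intention)   # bottom-up mismatch error
    intention += 0.01 * W.T @ mismatch        # update to reduce the error

print("reformed intention:", np.round(intention, 2))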
When action changes the perceptual reality from the one expected,
the recognized perceptual reality alters the current intention. This
aspect of top-down and bottom-up interaction is analogous to predic-
tive coding suggested for hierarchical visual processing as proposed by
Rao and Ballard (see section 4.1). The obvious question to ask is whether
in fact the brain actually employs such intention adjustment mecha-
nisms by monitoring the outcomes of its own predictions or not. There
is some recent evidence to this effect based on human brain imaging
techniques including functional magnetic resonance imaging (fMRI)
and electroencephalography (EEG). Both techniques are known to be
good at measuring global brain activity and to complement one another,
with relatively good spatial resolution from fMRI and good temporal
resolution from EEG. These imaging studies have suggested that the
temporoparietal junction (TPJ), where the temporal and parietal lobes
meet, the inferior frontal cortex, and the SMA may all be involved in
detecting mismatches between expected and actual perception in mul-
timodal sensations (Downar et al., 2000; Balslev et al., 2005; Frith &
Frith, 2012). It may be the TPJ that triggers adjustments in current
action by detecting such mismatches (Frith & Frith, 2012).
That said, it may be reasonable to consider the alternative, that inter-
actions between top-down prediction with a specific intention and
bottom-up modification of this intention take place in a web of local
networks including the frontal cortex, parietal cortex, and the vari-
ous peripheral sensory areas, rather than in one specific local region.
From this more distributed point of view, whatever regions are actually
involved, it is the interactions between them that are indispensable in
the organization of diverse intentional skilled actions in a changeable
environment.

4.2.2 Returning to Merleau-Ponty

The concept behind the predictive model accords well with some of
Merleau-Ponty's thinking, as described in chapter 3. In his analysis of
a blind man walking with a stick, he writes that the stick can also be a
part of the body when the man scans his surroundings by touching its
tip to things. This phenomenon can be accounted for by the acquisition
of a predictive model for the stick. During a lengthy period in which
the man uses the same stick, he acquires a model through which he
can anticipate how tactile sensation will propagate from the tip of the stick
while touching things in his environment. Because of this unconscious
anticipation, which we can think about in terms of Husserl's notion
of protention (e.g., we would anticipate hearing the next note of "mi"
when hearing "re" in "do-re-mi," as reviewed in chapter 3), and recalling
Heidegger's treatment of equipment as extensions of native capacities
for action, the stick could be felt to be a part of the body, provided that
the anticipation agrees with the outcome.
Related to this, Atsushi Iriki and colleagues (1996) made an impor-
tant finding in their electrophysiological recording of the parietal cortex
in monkeys during a tool manipulation task. Monkeys confined to chairs
were trained to use a rake to draw toward them small food objects
located in front of them. After the training, neurons in the intraparietal
sulcus, a part of the parietal cortex, were recorded for two phases: cap-
turing the food without the rake and capturing the food with it. In the
without-rake phase, they found that some bimodal neurons fired either
when a tactile stimulus was given to the palm of the hand or when
a visual stimulus approached the vicinity of the palm. It was shown
that these particular neurons have a certain receptive field. Thus, each
neuron fires only when the visual or tactile stimulus comes to a specific
position relative to the palm (Figure4.7a).
Surprisingly, in the with-rake phase, the same neurons fired when
the visual stimulus approached the vicinity of the rake, thus demon-
strating an extension of the visual receptive field to include the rake
(Figure 4.7b). This shifting of the receptive field from the vicinity of
the hand to that of the rake implies that the monkey perceives the rake
as a part of the body when extended from the hand and purposefully
employed, in the same way that the stick becomes a part of the body
of a blind man. Monkeys thus seem to embody a predictive model that
includes possible interactions between the rake and the food object.
Figure 4.7. The receptive field of neurons in the intraparietal sulcus (a) in
the vicinity of the hand in the without-rake phase and (b) extended to cover
the vicinity of the rake in the with-rake phase.

The phantom limb phenomenon described in chapter 3 can be under-
stood as an opposite case to that of the blind man's stick. Even though
the limb has been amputated, the predictive model for the limb might
remain as a familiar horizon, as Merleau-Ponty would say, which would
generate the expectation of a sensory image corresponding to the current
action intention, which is then sent to the phantom limb from the motor
cortex. The psychosomatic treatment invented by Ramachandran and
Blakeslee (1998) using the virtual-reality mirror box provided patients
with fake visual feedback that an amputated hand was moving. This
feedback to the predictive model would have evoked the propriocep-
tive image of "move" for the amputated limb by modifying the current
intention from "freeze" to "move," which might result in the feeling of
twitching that patients experience in phantom limbs.
Merleau-Ponty held that synesthesia, wherein sensation in one
modality unconsciously evokes perception in another, might originate
from iterative interactions between multiple modalities of sensation
and motor outputs by means of reentrant mechanisms established in
the coupling between the world and us (see chapter 3). If we consider
that the predictive model deals with the anticipation of multimodality
sensations, it is not feasible to assume that each modality of sensation
anticipates this independently. Instead, a shared structure should exist
or be organized that can anticipate incoming sensory flow from all
of the modalities together. It is speculated that a dynamic structure
such as this is composed of collective neuronal activity, and it makes
sense to consider that the bimodal neurons found in the parietal cor-
tex as well as in the premotor cortex might in part constitute such a
structure.
In sum then, the functional role of the parietal cortex in many ways
reflects what Merleau-Ponty was pointing to in his philosophy of embodi-
ment. Actually, the "how" pathway stretching through the parietal cortex
is reminiscent of ambiguity in Merleau-Ponty's sense, as it is located mid-
way between the visual cortex that receives visual inputs from the objec-
tive world and the prefrontal cortex that provides executive control with
subjective intention over the rest of the brain. Several fMRI studies of
object manipulation and motor imagery for objects have shown signifi-
cant activation in the inferior parietal cortex. Probably the goal of object
manipulation propagates from the prefrontal cortex through the supple-
mentary motor area to the parietal cortex via the top-down pathway,
whereas perceptual reality during manipulation of the object propagates
from the sensory cortices, including the visual cortex and somatosen-
sory cortex for tactile and proprioceptive sensation, via the bottom-up
pathway. Both of these pathways likely intermingle with each other, with
close interaction occurring in the parietal cortex.

4.2.3 Mirror Neurons: Unifying the Generation and Recognition of Actions

Many researchers would agree that the discovery of mirror neurons by
Rizzolatti's group in 1996 is one of the most important findings for sys-
tems neuroscience in recent decades. Personally, I find the idea of mir-
ror neurons very appealing because it promises to explain how the two
essential cognitive processes of generating and recognizing actions can be
unified into a single system.

4.2.4 The Evidence for Mirror Neurons

In the mid-1990s, researchers in the Rizzolatti laboratory in Parma
were investigating the activities of neurons in the ventral premotor
area (PMv) in the control of hand and mouth movements in monkeys.
They had found that these neurons fired when the monkey grasped food
objects, and whenever they fired, electrodes activated electronic cir-
cuitry to give an audible beep. Serendipitously, one day when a graduate
student entered the lab with an ice cream cone in his hand, every time
he brought it to his lips, the system responded with a beep! The same
neurons were firing both when the monkey grasped food objects and
moved them to its mouth and when the monkey observed others doing
a similar action. With a grad student bringing an ice cream cone to his
mouth, mirror neurons were discovered!
Figure 4.8 shows the firing activity of a mirror neuron responding
to a particular self-generated action as well as to the same action per-
formed by an experimenter. Figure4.8a shows a PMv neuron firing as
the monkey observes the experimenter grasping a piece of food. Here,
we see that the firing of the neuron ceases as the experimenter moves
the food toward the monkey. Then, the same neuron fires again when
the monkey grasps the food given by the experimenter. In Figure 4.8b,
it can be seen that the same neuron does not fire when the monkey
observes the experimenter picking up the food with an (unfamiliar!)
tool, but thereafter firing occurs as described for the rest of the sequence
of events in (a).
Figure 4.8. How mirror neurons work. (a) Firing of a mirror neuron shown
in raster plots and histograms in the two situations in which the monkey
observes the experimenter grasp a piece of food (left) and thereafter when
the monkey grasps the same piece of food (right). (b) The same mirror neuron
does not fire when the monkey observes the experimenter pick up the food
with a tool (left), but it fires again when the monkey grasps the same piece of
food (right). Adopted from Rizzolatti et al. (1996) with permission.

Besides these "grasping" neurons, they also found "holding" neurons
and "tearing" neurons that functioned in the same way. There are two
important characteristics about these mirror neurons. The first is that
they encode for entire goal-directed behaviors, not for parts of them,
that is, the grasping neurons do not fire when the monkey is just about
to grasp the object. The second characteristic is that all the mirror neu-
rons found in the monkey experiments are related to transitive actions
toward objects. Mirror neurons in the monkey so far do not respond to
intransitive behaviors such as reaching the hand toward a part of the
body. That said, however, it looks to be a different case for humans,
as recent human fMRI imaging studies found mirror systems also for
intransitive actions (Rizzolatti & Craighero, 2004).
Recent monkey experiments by Rizzolatti's group (Fogassi et al.,
2005) have indicated that mirror neurons can be observed in the inferior
parietal lobe (IPL) and that these function to both generate and recognize
goal-directed actions composed of sequences of elementary movements.
In their experiments, monkeys were trained to perform two different
goal-directed actions:to grasp pieces of food and then move them to their
own mouths to eat, and to grasp solid objects (the same size and shape as
the food objects) and then place them into a cylinder. Interestingly, the
activation patterns of many IPL neurons while grasping the objects differ
depending on the subsequent goal, namely to eat or to place, even though
the kinematics of grasping in both cases are the same. Supplemental
experiments confirmed that the activation preferences during grasping do
not originate from differences in visual stimuli between food and a solid
object, but from the difference between goals. This view is reinforced by
the fact that the same IPL neurons fired when the monkeys observed the
experimenters achieving the same goals. These IPL neurons can therefore
also be regarded as mirror neurons. It is certainly interesting that mirror
neuron involvement is not limited to the generation and recognition of
simple actions, but also occurs with compositional goal-directed actions
consisting of chains of elementary movements.
Recent imaging studies focusing on imitative behaviors have also iden-
tified mirror systems in humans. Imitation is considered to be cognitive
behavior whereby an individual observes and replicates the behaviors
of others. fMRI experimental results have shown that neural activa-
tion in the posterior part of the left inferior frontal gyrus as well as in
the right superior temporal sulcus increases during imitation (Iacoboni
et al., 1999). If we consider that the posterior part of the left inferior
frontal gyrus (also called Broca's area) in humans is homologous to the
PMv or F5 in monkeys, it is indeed feasible that these local sites could
host mirror neurons in humans. Although it is still a matter of debate
as to how much other animals including nonhuman primates, dolphins,
and parrots can perform imitation, it is still widely held that the imita-
tion capability uniquely evolved in humans has enabled them to acquire
wider skills and knowledge about human-specific intellectual behaviors
including tool use and language.
Michael Arbib (2012) has explored possible linkages between mir-
ror neurons and human linguistic competency. Based on accounts of
the evolutionary pathway from nonhuman primates to human, he has
developed the view that the involvement of mirror neurons in embod-
ied experience grounds brain structures that underlie language. He has
hypothesized that what he calls the human language-ready brain rests
on evolutionary developments in primates including mirror system pro-
cessing (for skillful manual manipulations of objects, imitation of the
manipulations performed by others, pantomime, and conventionalized
manual gestures) that initiates the protosign system. He further pro-
posed that the development of protosigns provided the scaffolding essen-
tial for protospeech in the evolution of protolanguage (Arbib,2010).
This hypothesis is interesting in light of the fact that mirror neurons


in human brains might be responsible for recognizing the intentions
of others as expressed in language. Actually, researchers have exam-
ined this idea using various brain imaging techniques such as fMRI,
positron emission tomography, and EEG. Hauk and colleagues (2004)
showed in an fMRI experiment that reading action-related words with
different end effectors, namely "lick," "pick," and "kick," evoked
neural activities in the motor areas that overlap with the local areas
responsible for generating motor movements in the face, arm, and
leg, respectively. More specifically, "lick" activated the Sylvian fissure,
"pick" activated the dorsolateral sites of the motor cortex, and "kick"
activated the vertex and interhemispheric sulcus. Broca's area was acti-
vated for all three words. Tettamanti and colleagues (2005) observed
similar types of activation patterns when their subjects listened to
action-related sentences such as "I bite an apple," "I grasp a knife," and
"I kick a ball." Taken together, these results suggest that understanding
action-related words or sentences generates certain canonical activa-
tion patterns of mirror neurons, possibly in Broca's area, which in turn
initiate corresponding activations in motor-related areas. These results
also suggest that Broca's area might be a site of mirror neuronal activ-
ity in humans.
Vittorio Gallese and Alvin Goldman (1998) suggest that mirror neu-
rons in humans play an essential role in theory of mind in social cogni-
tion. The theory of mind approach postulates that although the mental
states of others are hidden from us, they can be inferred to some extent
by applying naïve theories or causal rules about the mind to the observed
behavior of others. They argue for a simulation theory, whereby the men-
tal states of others are interpretable through mental simulations that
adopt their perspective, by tracking or matching their states with states
of ones own. If these aforementioned human cases are granted, it can be
said that the mirror neuron system has played an indispensable role in
the emergence of uniquely human cognitive competencies from evolu-
tionary pathways, from manual object manipulation, to protolanguage,
and to theory of mind.

4.2.5 How Might Mirror Neurons Work?

The reader may ask how the aforementioned mirror neural functions
might be implemented in the brain. Let's consider the mirror neuron
mechanism in terms of the aforementioned predictive model (see
Figure 4.6), which we assumed may be located in the parietal cortex.
If we assume that mirror neurons encode intention for action, we can
easily explain how a particular activation pattern of the mirror neurons
can lead to the generation of ones own specific action and how recogni-
tion of the same action performed by others can lead to the same action
pattern in the mirror neurons. (Although Figure 4.6 assumed that the
intention might be hosted somewhere in the prefrontal area, it could be
hosted by mirror neurons present in this area, including Broca's area in
humans.)
In generating one's own actions, such as grasping a coffee cup,
expected perceptual sequences in terms of relative position, ori-
entation, and posture of one's own hand in relation to the cup are
predicted by receiving inputs from mirror neuron activation that rep-
resents the intentional state for this action. Different actions can be
generated by receiving inputs of different mirror neuron activation
patterns, whereby the mirror neurons function as a switcher between
a set of intentional actions. Recognition of the same action performed
by others can be achieved by utilizing the mismatch information as
described previously. In the case of observing others grasp the coffee
cup, the corresponding intentional state in terms of the mirror neuron
activity pattern can be searched such that the reconstructed percep-
tual sequence evoked by this intentional state can best fit with the
actually perceived one in the coordinate system relative to the cof-
fee cup, thereby minimizing the mismatch error. On this model, the
recognition of others' actions causes one to feel as if one's own actions
were being generated, due to the generation in the mirror neurons of
motor imagery representing the same intentional state.
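
The switcher picture can be made concrete with a toy model that uses one intention-to-trajectory mapping in both directions: generation runs an intention code forward, and recognition selects the stored intention whose predicted trajectory best fits the observed one. The intention set, the mapping, and the matching rule below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

# Assumed repertoire: each intentional state has a code that the predictive
# model maps to an expected perceptual trajectory (10 samples).
W = rng.normal(size=(10, 3))
INTENTIONS = {
    "grasp cup": np.array([1.0, 0.0, 0.0]),
    "push cup": np.array([0.0, 1.0, 0.0]),
    "wave hand": np.array([0.0, 0.0, 1.0]),
}

def predict(code):
    """Generation mode: an intention code yields its expected trajectory."""
    return W @ code

def recognize(observed):
    """Recognition mode: select the stored intention whose predicted
    trajectory best fits the observed one (minimum mismatch error)."""
    errors = {name: float(np.linalg.norm(observed - predict(code)))
              for name, code in INTENTIONS.items()}
    return min(errors, key=errors.get)

# Watching another agent grasp the cup (with sensory noise) re-evokes the
# observer's own "grasp cup" intentional state.
observed = predict(INTENTIONS["grasp cup"]) + 0.1 * rng.normal(size=10)
print(recognize(observed))

Because recognition terminates in the same intention code that would drive one's own grasping, observing the action and generating it share a single representation, which is the essence of the mirror account.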
This assumption accords exactly with what Gallese and Goldman
(1998) suggested for mirror neurons in terms of simulation theory as
described previously. They suggested that mirror neuron discharge
serves the purpose of retrodicting target mental states, moving back-
ward from the observed action, thus representing a primitive version
of a simulation heuristic that might underlie mind-reading. We will
come back to the idea of the predictive coding model for mirror neu-
rons in greater detail as we turn to related robotics experiments in
later chapters.
4.3. How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?

In this chapter so far, we have seen that voluntary actions might be
generated by means of a top-down drive by an intention. The intention
could be hosted by the mirror neurons or other neurons in the prefron-
tal cortex. Wherever it is represented in the brain, we are left with the
essential question of how an intention itself can be set or generated. This
question is related to the problem of free will that was introduced in the
description of William James's philosophy (see chapter 3). As he says,
free will might be the capability of an agent to choose independently a
course of action freely from among multiple alternatives.
The problem about free will then concerns its origin. If every aspect
of free will can be explained by deterministic physical laws, there
should be no space actually remaining for free will. Can our minds set
intentions for actions absolutely freely without any other causes? Can
intentions shift from one to another spontaneously in a chain for gen-
erating various actions? Another interesting question concerns the issue
of consciousness. If we can freely determine our actions, how can this
determination be accompanied by consciousness? Or more simply, how
can I feel consciously that I have just determined to do one thing and not
another? Although there have been no definitive answers to this philo-
sophical question thus far, there have been some interesting experimen-
tal results showing possible neural correlates of intention and free will.

4.3.1 Searching for the Neural Correlates of Intention

I would like to introduce, first, the seminal study on conscious inten-
tion conducted by Benjamin Libet. In his experiments (Libet, 1985),
subjects were asked to press a button with their right hands at whatever
moment they wished and their EEG activity was recorded from their
scalp. Libet was trying to measure the exact timing when the subjects
became conscious of their decision to initiate the button press action,
which he called "w-judgment time." The subjects were asked to watch
a rotating clock hand and to remember the exact position of the clock
hand when they first felt the urge to move their hand to press the but-
ton. By asking the subjects to report the position after each button press
trial, the exact timing of their conscious intention to act could be mea-
sured for each trial. It was found that the average timing of conscious
intent to act is 206 ms before the onset of muscle activity and that the
readiness potential (RP), a buildup of brain activity (as measured by
EEG), started 1 s before movement onset (Figure 4.9).
This EEG activity was localized in the SMA. This is a somewhat sur-
prising result because it implies that the voluntary action of pressing the
button is not initiated by conscious intention but by unconscious brain
activity, namely the readiness potential evoked in the SMA. At the very
least, it demonstrates that one prepares to act before one decides to act.
It should be noted, however, that Libet's experiment has drawn substantial
criticism along with enthusiastic debates on the results. It has
been argued that subjective estimates of the time at which consciousness
arises are not reliable (Haggard, 2008). Also, Trevena and Miller (2002)
reported that many reported conscious decision times fell before the
onset of the lateralized readiness potential, which represents actual
preparation for movement, as opposed to the RP, which represents
contemplation of movement as a future possibility.
However, it is also true that Libet's study has been replicated by others
and further extended experiments have been conducted (Haggard,
2008). Soon and colleagues (2008) showed that this unconscious brain
activity to initiate voluntary action begins long before the onset

Figure 4.9. The readiness potential, a buildup of brain activity prior to
movement onset, recorded during a free decision task conducted by Libet
(1985). (The plot shows EEG voltage against time in seconds, with the RP
onset about 1 s, and the conscious decision about 206 ms, before movement
onset.)
of physical action. By utilizing fMRI brain imaging, they demonstrated
that brain activity is initiated in the frontopolar part of the prefrontal
cortex and in the precuneus in the medial area of the superior parietal
cortex up to 7 s before a conscious decision is made to select either
pressing the left button with the left index finger or the right button
with the right index finger. Moreover, the outcome of the motor decision
between the two actions (a selection the subjects had not yet consciously
made) could be predicted from this early brain activity, prior to reported
consciousness of the selection.

4.3.2 How to Initiate Intentions and Become Consciously Aware

The experimental evidence provided by Libet and Soon's group can
be integrated to produce the following hypothesis. Brain activity for
selecting a voluntary action is initiated unconsciously in the frontopolar
part of the prefrontal cortex or in the precuneus in the parietal cortex
from several seconds up to 10 seconds before the onset of cor-
responding physical movement, then is transmitted downstream to the
SMA 1 second before the movement, with consciousness of this inten-
tion to act arising only a few hundred milliseconds before movement
onset. Controversially, this implies that there is no room left for free
will, because our conscious intent, which seemingly determines our next
actions freely, appears actually to be caused by preceding unconscious
brain activities arising long beforehand. If this is indeed true, it raises two
fundamental questions. First, can we freely initiate unconscious brain
activity in the frontopolar part of the prefrontal cortex or in the parietal
cortex? And second, why do we feel conscious intention for voluntary
action only at a very late stage of preparing for action, and what is the
role of this conscious intention if it does not determine subsequent
voluntary actions?
To address the first question, let's assume that the unconscious activ-
ity in the beginning might not be caused by anybody or anything, but
may appear automatically, by itself, as an aspect of continuously chang-
ing brain dynamics. This notion relates to the spontaneous generation
of alternative images and thoughts put forward by William James.
As described previously (see Figure 3.4), when memory hosts com-
plex relations or connections between images of past experiences, an
image may be regenerated with spontaneous variations into streams of
consciousness. This idea of James leads to the conjecture that continuous
transitions of images are generated spontaneously along trajectories
of brain activation states, visiting first one image state and then another
iteratively.
Such spontaneous transitions can be accounted for by observations of
the autonomous dynamic shifts of firing patterns in collective neurons
in the absence of external stimulus inputs. Using an advanced optical
imaging technique, Ikegaya and colleagues (2004) observed the activ-
ities of a large number of neurons in the in vitro hippocampus tissue
of rats. Their main finding concerns what the authors metaphorically
call a "cortical song," wherein various spatiotemporally distributed firing
patterns of collective neurons appear as motifs and shift from one to
another spontaneously. Although those motifs seem to appear randomly
in many cases, they often repeat in sequences exhibiting some regular-
ity. Based on other work done by Churchland and colleagues (2010),
we now also know how fluctuations in activities of collective neurons
in the PMC during the preparation of movements can affect the gener-
ation of succeeding actual movements. They recorded the simultaneous
activities of 96 PMC cells of monkeys during the preparatory period of
a go-cue-triggered visual target reaching task2 over many trials. First,
they found that the trajectories of the collective neural activities could
be projected from the 96 original dimensions onto a two-dimensional
plane by a mathematical analysis similar to principal component analysis.
They also found that those trajectories, from the go cue until the onset
of movement, were mostly repeated across different trials in normal
cue-response cases (Figure 4.10).
An exception to the preceding schema was observed during prepar-
atory periods leading to generation of failure behaviors such as abnor-
mally delayed responses. In such cases, it was seen that the neural
activation trajectories fluctuated significantly. Such fluctuating trajecto-
ries appeared even though the setting at each trial was identical. Then,
how can such fluctuating activities of collective neurons occur? Freeman
(2000) and many others have speculated that such spontaneous fluctu-
ation might be generated by means of deterministic chaos developed in
the neural activity either at the local neuronal circuit level or at larger

2. The animals were trained to reach, immediately after a go-cue, to a position
that had been specified visually beforehand.
Figure 4.10. Overlaying of 15 trajectories by means of two-dimensional
projection of the activities of 96 neurons in the dorsal premotor cortex area
of a monkey during repeated trials of reaching for a visual target, (a) on
one occasion and (b) on a different occasion. Trajectories run from the
pre-target period through the go cue to movement onset; in both plots,
trajectories for failure cases are shown with thick lines. Adapted from
Churchland et al. (2010) with permission.

cortical area levels. These possibilities are explored in Chapter 10. To
sum up, then, continuous change in the cortical dynamical state might
account for the spontaneous generation, without any external causes, of
various intentions or images for next actions.
The second question concerning why we become conscious of inten-
tion for voluntary action only at a very late stage of preparation for action
remains difficult to answer at present. However, several reports on cor-
tical electrical stimulation in human subjects might open a way to an
answer. Desmurget and colleagues (2009) offer us two complementary
pieces of evidence obtained in their cortical electrical stimulation study
conducted in patients with brain tumors. The study employed periop-
erative brain stimulations with a bipolar electrode during awake surgery
for tumor removal. Stimulations of the premotor cortex evoked overt
mouth and contralateral limb movements. But what was interesting was
that in the absence of visual feedback, the patients firmly denied mak-
ing the movements they actually made; they were not consciously aware
of the movements generated. Conversely, stimulation of the parietal cor-
tex created an intention or desire in the patients to move. With stronger
stimulation, they reported that they had moved their limbs even though
they had not actually moved them. Given this result, Desmurget and col-
leagues speculated that the parietal cortex might mediate error monitor-
ing between the predicted perceptual outcome for the intended action
and the actual one. (These results also imply that the depotentiation of
the parietal cortex without an error signal signifies successful execution
of the intended action.)
Fried and colleagues (1991) reported results of direct stimulation of the
presupplementary motor area in patients as part of neurosurgical evalua-
tion. Stimulation at a low current elicited the urge to move a specific body
part contralateral to the stimulated hemisphere. This urge to move the
limbs is similar to a compulsive desire and in fact the patients reported
that they felt as if they were not the agent of the generated movements.
In other words, this is a feeling of imminence for movements of specific
body parts in specific ways. Actually, the patients could describe precisely
the urges evoked; for example, the left arm was about to move inward
toward the body. This imminent intention for quite specific movements
with stimulation of the presupplementary motor area contrasts with the
case of parietal stimulation mentioned earlier, in which the patients felt a
relatively weak desire or intention to move. Another difference between
the two studies is that more intense stimulation tended to produce actual
movement of the same body part when the presupplementary motor area,
but not the parietal cortex, was stimulated.
Putting all of this evidence together, we can create a hypothesis for
how conscious intention to initiate actions is organized in the brain as
follows. The intention for action is built up from a vague intention to
a concrete one by moving downward through the cortical hierarchy. In
the first stage (several seconds before movement onset), the very early
form of the intention is initiated by means of spontaneous neuronal
state transitions in the prefrontal cortex, possibly in the frontopolar
part as described by Soon and colleagues. At this stage, the intention
generated might be too vague to access its contents and therefore it
wouldn't be consciously accessible (beyond a general mood of anticipa-
tion, to recall Heidegger once again). Subsequently, the signal carrying
this early form of intention is propagated to the parietal cortex, where
prediction of perceptual sequences based on this intention is generated.
This idea follows the aforementioned assumption about functions of
the parietal cortex shown in Figure 4.6. By generating a prediction of
the overall profile of action in terms of its accompanying perceptual
sequence, the contents of the current intention become consciously
accessible. Then, the next target position for movement predicted by
the parietal cortex in terms of body posture or proprioceptive state
is sent to the presupplementary motor area, where a specific motor
program for the required immediate movement is generated online.
This process generates the feeling of imminence for movements of spe-
cific body parts in specific ways, as described by Fried and colleagues.
The motor program is sent to the premotor cortex and primary motor
cortex to generate corresponding motor commands. This process is
assumed to be essentially unconscious on the basis of the findings of
Desmurget and colleagues (2009) mentioned earlier.
A big "however" needs to follow this hypothesis, because it contains
some unclear parts. First, this hypothesis conflicts on a number
of points with evidence described thus far in this book, and the details
of these conflicts are examined in the next section. Second, it has not been
clarified yet how the contents of the current intention become con-
sciously accessible in the parietal cortex in the process of predicting
the resultant perceptual sequences. As related to this problem, David
Chalmers speculates that it is nontrivial to account for the quality
of human experiences of consciousness in terms of neuroscience
data alone. This is what he calls the "hard problem" of consciousness
(Chalmers, 1995). This hard problem is contrasted with the so-called
"easy problem," in which a target neural function can be understood by
its reduction into processes of physical matter. Suppose that a set of
neurons that fire only at conscious moments are successfully identi-
fied in subjects. Yet, there is no way to explain how the firings of these
neurons result in the conscious experiences of the subjects. This is the
hard problem.
Analogously, how can we account for causal relationships between
consciousness of ones own actions and neural activity in the parietal
cortex? This problem will be revisited repeatedly in later chapters, as it
is central to this book. Next, however, we must look at some of the
remaining open problems.

4.4. Deciding Among Conflicting Evidence

Let's remind ourselves of the functional role of the presupplementary
motor area described by Tanji and Shima (Tanji & Shima, 1994;
Shima & Tanji, 1998, 2000). Their electrophysiological experiments with
monkeys showed that this area includes neurons responsible for organiz-
ing sequences of primitive movements. However, their findings conflict
with those of Fried and colleagues (1991), obtained by human brain
electrical stimulation. These researchers claim that the presupplementary
motor area is responsible for generating merely the urge for immi-
nent movements, not for the expectation or desire for whole actions
consisting of sequences of elemental movements. If Tanji and Shima's
findings for the role of the presupplementary motor area in monkeys
hold true for humans, then electrical stimulation of the presupple-
mentary area in humans should likewise evoke desire or expectation
for sequences of elementary movements. We'll come back to the pos-
sible role of the presupplementary motor area in human cognition in a
moment.
Another conflict concerns the functional role of the premotor cortex.
Although the premotor cortex (F5 in monkeys) should host intentions
or goals for the next actions to be generated according to the mirror
neuron theory put forward by Rizzolatti's group (Rizzolatti et al.,
1996), later experiments by Desmurget and Sirigu (Sirigu et al., 2003;
Desmurget et al., 2009) suggest that it may not be the premotor cor-
tex that is involved in conscious intention for action but the parietal
cortex, as described in the previous section. In fact, Rizzolatti and
colleagues (Fogassi et al., 2005) did later find mirror neurons in the parietal
cortex of monkeys. These mirror neurons in the parietal cortex seem
to encode intention for sequences of actions for both ones own action
sequence generation and while observing similar action generation by
others. We may ask then whether some neurons, not just those in the
premotor cortex but also those in the parietal cortex, fire as mirror neu-
rons in the case of generating as well as recognizing single actions like
grasping food objects, as described in the original mirror neuron paper
(Rizzolatti et al., 1996). The puzzle we have here is the following: What
is the primary area for generating voluntary actions? Is the presupple-
mentary motor area to be considered the locus for generating voluntary
action? Or is it the premotor cortex, the original mirror neuron site? Or,
is it the parietal cortex, responsible for the prediction of action-related
perceptual sequences? Or, ultimately, is it the prefrontal cortex, as the
center for executive control? Although it could be the supplementary
motor cortex, premotor cortex, or parietal cortex, we simply cannot
tell right now, as the evidence currently available to us is apparently
contradictory.
Finally, we might be disappointed that circuit-level mechanisms for
the cognitive functions of interest are still not accounted for exactly
by current brain research. Neuroscientists have taken a reductionist
approach by pursuing possible neural correlates of all manner of things.
They have investigated mappings between neuronal activities in specific
local brain areas and their possible functions, like the firing of presup-
plementary motor area cells in action sequencing or of mirror neurons
in the premotor cortex in action generation and recognition, with the
hope of clarifying some mechanisms at work in the mind and cognition.
Although clearly the accumulation of such evidence serves to inspire
us to imagine how the mind may arise from activity in the brain, such
evidence cannot yet tell us the exact mechanisms underlying different
types of subjective experience, at least not in a fine-grained way adequate
to confirm one-to-one correlative mappings from the "what it
feels like" to specific physical processes. How can the firings of specific
cells in the presupplementary motor area mechanize the generation of
corresponding action sequences? How can the firings of the same pre-
motor cells in terms of mirror neurons mechanize both the generation of
specific actions and the recognition of the same actions by others? What
are the underlying circuitry level mechanisms accounting for both, as
well as the feeling of witnessing either? In order to answer questions like
these, we might need future technical breakthroughs in measurement
methods, such as the simultaneous recording of large numbers of neurons
and their synaptic connectivity in target functional circuits, combined
with modeling schemes of good quality.

4.5. Summary

This chapter explored how cognitive minds can be mechanized in biological
brains by reviewing a set of empirical results. First, we reviewed
general understandings of possible hierarchical architectures in visual
recognition and motor action generation. In the visual pathway, earlier
stages of the visual system (in the primary visual cortex) are thought
to deal with the processing of detailed information in the retinotopic
image and later stages to deal with more abstract information process-
ing (in the inferior temporal cortex). Thus, some have assumed that
complex visual objects can be recognized by decomposition into specific
spatial combinations of visual features represented in the lower level.
The action generation pathway is also presumed to follow hierarchical
processes. It is assumed that the supplementary motor area (SMA) and
the premotor cortex (PMC) perform higher level coordination for gen-
erating voluntary action and sensory-guided action by sending control
signals to the primary motor cortex (M1) in the lower level.
However, there has arisen some conflicting evidence that does not
support the existence of a rigid hierarchy in both visual recognition
and action generation. So, we next examined a new way of conceiving
of the processes at work in which action generation and sensory recogni-
tion are inseparable. We found evidence for this new approach in the
review of recent experimental studies focusing on the functional roles
of the parietal cortex and mirror neurons distributed through differ-
ent regions of the brain. We entertained the hypothesis that the pari-
etal cortex may host a predictive model that can anticipate perceptual
outcomes for actional intention encoded in mirror neurons. It was also
speculated that a particular perceptual sequence can be recognized by
means of inferring the corresponding intention state, and that the pre-
dictive model can regenerate this sequence. A hallmark of this view is
that action might be generated by the dense interaction of the top-down
proactive intention and the bottom-up recognition of perceptual reality.
Furthermore, we showed how this portrait is analogous to Merleau-
Ponty's philosophy of embodiment.
An essential question remained. How is intention itself set or gener-
ated? This question is related to the problem of free will. We reviewed
findings that neural activities correlated with free decisions are initiated
in various regions including the SMA, the prefrontal cortex, and the
parietal cortex significantly before individuals become consciously aware
of the decision. These findings raise two questions. The first question
concerns how unconscious neural activities for decisions are initiated
in those related regions. The second question concerns why conscious
awareness of free decisions is delayed. Although we have provided some
possible accounts to address these questions, they are as yet speculative.
Also in this chapter, we have found that neuroscientists have taken a
reductionist approach by pursuing possible neural correlates of all man-
ner of things. They have investigated mappings between neuronal activ-
ities in specific local brain areas and their possible functions. Although
the accumulation of such evidence can serve to inspire us to hypothesize
how the normal functioning brain results in the feeling of being con-
scious, neurological evidence alone cannot yet specify the mechanisms
at work. And with this, we have seen that not one, but many important
questions about the nature of the mind remain to be answered.
How might we see neural correlates for our conscious experi-
ence? Suppose that we might be able to record all essential neuro-
nal data such as the connectivity, synaptic transmission efficiency,
and neuronal firings of all related local circuits in the future. Will
this enable us to understand the mechanisms behind all of our phe-
nomenological experiences? Probably not. Although we would find
various interesting correlations in such massive datasets, like the cor-
relations between synaptic connectivity and neuronal firing patterns
or those between neuronal firing patterns and behavioral outcomes,
they would still just be correlations, not proof of causal mechanisms.
Can we understand the mechanisms of a computer's operating system
(OS) just by putting electrodes at various locations on the motherboard
circuits? We may obtain a bunch of correlated data in relation
to voltages, but probably not enough to infer the principles behind the
workings of a sophisticated OS.
By taking seriously limitations inherent to the empirical neuroscience
approach, this book now begins to explore an alternative approach, a
synthetic modeling approach that attempts to understand possible neu-
ronal mechanisms underlying our cognitive brains by reconstructing
them as dynamic artifacts. The synthetic modeling approach described
in this book has two complementary focuses. The first is to use dynam-
ical systems perspectives to understand various complicated mecha-
nisms at work in cognition. The dynamical systems approach is effective
in articulating circular causality, for instance. The second focus con-
cerns the embodiment of the cognitive processes, which were briefly
described in the previous chapter. The role of embodiment in shaping
cognition is crucial when causal links go beyond brains and establish cir-
cular causalities between bodies and their environments (e.g., Freeman,
2000). The next chapter provides an introductory account that consid-
ers such problems.
5
Dynamical Systems Approach for Modeling Embodied Cognition

Nobel laureate in physics Richard Feynman once wrote on the chalkboard
during a lecture:

"What I cannot create, I cannot understand." (Richard Feynman1)

Conversely, thus: I can understand what I can create. This seems to make
sense because if we can synthesize something, we should know its organizing
principles. By this line of reasoning, then, we might be able to
understand the cognitive mind by synthesizing it.
But how can we synthesize the mind? Basically, the plan is to put
some computer simulation models of the brain into robot heads and
then examine how the robots behave as well as how the neural activa-
tion state changes dynamically in the artificial brains while the robots
interact with the environment. The clear difficulty involved in doing
this is how to build these brain models. Although we don't yet know
exactly their organizing principles, we should begin by deriving the most
likely ones through a thorough survey of results from neuroscience,

1. This statement was found on his blackboard at the time of his death in
February 1988.

psychology, and cognitive science. In robotics experiments, we can
examine neural activation dynamics (of a brain model) and behaviors
(of such embrained robots) as robots attempt to achieve goals of cogni-
tive tasks designed by experimenters.
It is not trivial to anticipate (dare we say guess) what sorts of phenomena
might be observed in such experiments, even though the principles
used in engineering relevant brain models are well defined. This
comes from the fact that all interactions that occur within the model
brains, as well as between them and the environment by circular causal-
ity, are dominated by nonlinear dynamics for which numerical solutions
cannot be obtained analytically. Rather, we should expect that such
robotics experiments might evidence nontrivial phenomena that are
not to be inferred from formative principles themselves. If such emer-
gent phenomena observed in experiments correspond to various bodies
of work including empirical observations in neuroscience, computa-
tional aspects in cognitive science, and reports from phenomenological
reduction, the presumed principles behind the models would seem to
hold. Moreover, it would be great if just a small set of principles in the
model could account for numerous phenomena of the mind through
their synthesis. This is the goal of the synthetic approach, to articu-
late the processes essential to cognition as we experience it and ideally
nothing more.
Now, let's assume that the mind is a product of emergent processes
appearing in the structural interactions between the brain and the envi-
ronment by means of sensory-motor coupling of a whole, embodied agent
through behavior, wherein the mind is considered a nontrivial phenom-
enon appearing as a result of such interactions. This assumption refers to
the embodied mind, or embodied cognition (Varela etal., 1991). Many phe-
nomena emergent from embodied cognition can be efficiently described
in the language of dynamical systems, as we will see. Subsections of the
current chapter will explore the idea of embodied cognition by visiting
different approaches taken so far. These include psychological studies
focusing on embodiment and new-trend artificial intelligence robot-
ics studies exemplifying behavior-based robotics involving the synthe-
sis of embodied cognition. Readers will see that some psychological
views, especially Gibsonian and Neo-Gibsonian approaches, have been
well incorporated into dynamical systems theories, and have thus provided
useful insights guiding behavior-based robots and neurorobots.
After this review, we will consider particular neural network models as
abstractions of brains, and then consider a set of neurorobotics studies
using those models that demonstrate emergence through synthesis by
capturing some of the essence of embodied cognition. First, however, the
next section presents an introduction to dynamical systems theories that
lay the groundwork for the synthetic modeling studies to follow.
But, readers should note that this is not the end of the story: Chapter 6
discusses some of the crucial ingredients for synthesizing the mind
that have been missed in conventional studies on neural network model-
ing and behavior-based robotics. The first section provides an introduc-
tory tutorial on general ideas of dynamical systems.

5.1. Dynamical Systems

Here, I would like to start with a very intuitive explanation. Let's assume
that there is a dynamical system, and suppose that this system can be
described at any time as exhibiting an N-dimensional system state, where
the ith dimensional value of the current state is given as $x_t^i$. When
$x_{t+1}^i$, the ith dimensional value of the state at the next time step, can be
determined solely from all of the dimensional values at the current time step,
the time development of the dimensions in the system can be described
by the following difference equation (also called a map):

\[
\begin{aligned}
x_{t+1}^1 &= g^1\left(x_t^1, x_t^2, \ldots, x_t^N\right) \\
x_{t+1}^2 &= g^2\left(x_t^1, x_t^2, \ldots, x_t^N\right) \\
&\;\;\vdots \\
x_{t+1}^N &= g^N\left(x_t^1, x_t^2, \ldots, x_t^N\right)
\end{aligned}
\qquad \text{(Eq. 1)}
\]

Here, the time development of the system state is obtained by iterating
the mapping of the current state at t to the next state at t+1, starting
from a given initial state. Eq. 1 can be rewritten with the N-dimensional
state vector $X_t$, and with P as a set of parameters of interest that
characterize the function G():

\[ X_{t+1} = G(X_t, P) \qquad \text{(Eq. 2)} \]

A given dynamical system is often investigated by examining changes
in time-development trajectories versus changes in the representative
parameter set P. If the function G() in Eq. 2 is given as a nonlinear
function, the trajectories of time development can become complex
depending on the nonlinearity. In most cases, the time development
of the state cannot be obtained analytically. It can be obtained only
through numerical computation as integration over time from a given
initial state $X_0$, and this computation can only be executed with the use
of modern digital computers.
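For readers who prefer code to notation, this iteration scheme is simple to express. The following minimal Python sketch iterates Eq. 2 from a given initial state; the particular quadratic form chosen for G() here is only an illustrative assumption, not a model from the text.

    import numpy as np

    def iterate_map(G, x0, P, steps):
        # Iterate X_{t+1} = G(X_t, P) (Eq. 2), collecting the trajectory.
        traj = [np.asarray(x0, dtype=float)]
        for _ in range(steps):
            traj.append(G(traj[-1], P))
        return np.array(traj)

    def G(x, P):
        # Hypothetical elementwise quadratic coupling, chosen for illustration.
        return P["a"] * x * (1.0 - x)

    trajectory = iterate_map(G, x0=[0.1, 0.4], P={"a": 3.2}, steps=50)
    print(trajectory[-3:])  # the tail has settled onto the attractor for this a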
Dynamical systems can also be described with an ordinary differential
equation in continuous time, with X as a vector of the system state, with
$\dot{X}$ as a vector of the time derivative of the state (it can also be written
as $\frac{dX}{dt}$), and with F() as a nonlinear dynamic function parameterized by
P, as shown in Eq. 3.

\[ \dot{X} = F(X, P) \qquad \text{(Eq. 3)} \]

The exact trajectory in continuous time can also be obtained by integrating
the time derivative from a given dynamical state at the initial time.
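As a sketch of how such integration might be carried out numerically, the forward-Euler scheme below approximates Eq. 3 by repeatedly adding the scaled time derivative; the step size, the test function, and all names are illustrative assumptions, and serious work would use a higher-order integrator.

    import numpy as np

    def euler_integrate(F, x0, P, dt=0.01, steps=1000):
        # Approximate the trajectory of dX/dt = F(X, P) (Eq. 3) from x0.
        traj = np.empty((steps + 1, len(x0)))
        traj[0] = x0
        for t in range(steps):
            traj[t + 1] = traj[t] + dt * F(traj[t], P)
        return traj

    # Example: linear decay toward a fixed point attractor at the origin.
    decay = lambda X, P: -P["lam"] * X
    print(euler_integrate(decay, x0=[1.0, -2.0], P={"lam": 0.5})[-1])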
The structure of a particular dynamical system is characterized by
the configuration of attractors in the system, which determines the time
evolution profiles of different states. Attractors are basins toward which
trajectories of dynamical states converge. An attractor is called an invar-
iant set because, after trajectories converge (perhaps after infinite time),
they become invariant trajectories. That is, they are no longer variable
and are instead determined, representing stable state behaviors charac-
terizing the system. On the other hand, outside of attractors or invariant
sets are transient states wherein trajectories are variable. Attractors can
be roughly categorized into four types, as shown in Figure 5.1a-d.
The easiest attractor to envision is a fixed point attractor in which
all dynamic states converge to a point (Figure 5.1a). The second one is
a limit cycle attractor (Figure 5.1b). In this type of attractor, the trajec-
tory converges to a cyclic oscillation pattern with constant periodicity.
The third one is a limit torus that appears when there is more than one
frequency involved in the periodic trajectory of the system and two of
these frequencies form an irrational fraction. In this case, the trajectory
is no longer closed and it exhibits quasi-periodicity (Figure 5.1c). The
fourth one is a chaotic attractor (a strange attractor) in which the tra-
jectory exhibits infinite periodicity and thereby forms fractal structures
(Figure 5.1d). Finally, in some cases multiple local attractors can coexist
in the same state space as illustrated in Figure 5.1e. In such cases, the
Figure 5.1. Different types of attractors: (a) fixed point attractor, (b) limit
cycle attractor, (c) limit torus characterized by two periodicities P1 and
P2 which form an irrational fraction, and (d) chaotic attractor. (e) Shows
multiple attractors consisting of a fixed point attractor and a limit cycle
attractor. Note that all four types of attractors are illustrated in terms of
continuous time dynamical systems.

attractor to which the system converges depends on the initial state. In
Figure 5.1e, a state trajectory starting from the left side or the right side
of the dotted curve will converge to a fixed point or a limit cycle, respectively.
Next, we look at the case of discrete time dynamics in detail.

5.1.1 Discrete Time System

Let us examine the so-called logistic map, which was introduced by
Robert May (1976), as a simple illustrative example of Eq. 1 with a
one-dimensional dynamic state. Even with a one-dimensional dynamic
state, its behavior is nontrivial, as will be seen in the following. The logistic
map is written in discrete-time form as:

\[ x_{t+1} = a\, x_t (1 - x_t) \qquad \text{(Eq. 4)} \]

Here, $x_t$ is a one-dimensional dynamic state and a is a parameter. If a
particular value is taken for the initial state, $x_0$, it will recursively generate
a trajectory $x_1, x_2, \ldots, x_n$ as shown in the diagram at the left of
Figure 5.2a.
Figure 5.2. A logistic map. (a) The dynamic iteration corresponding to a
logistic map is shown on the left and its bifurcation diagram with respect to
the parameter a is shown on the right. (b) Time developments of the state
with different values of a, where a fixed point attractor, a limit cycle attractor,
and a chaotic attractor appear from left to right for a = 2.6, 3.2, and 3.6,
respectively.

Now, let's examine how the dynamical structure of a logistic map
changes when the parameter a is varied continuously. For this purpose,
a bifurcation diagram of the logistic map is shown in Figure 5.2a, right.
This diagram shows an invariant set of attractors for each value of a,
where an invariant set means a set of points within the convergence tra-
jectory as mentioned previously. For example, when a is set to 2.6, the
trajectory of $x_t$ converges toward a point around 0.61 from any initial
state, and therefore this point is a fixed-point attractor (see Figure 5.2b,
left). When a is increased to 3.0, the fixed-point attractor bifurcates into
a limit-cycle attractor with a period of 2. With a set to 3.2, a limit cycle
alternating between 0.52 and 0.80 appears (see Figure 5.2b, middle),
and when a is further increased to 3.43, the limit cycle with a period
of 2 bifurcates into one with a period of 4. A limit cycle alternating
sequentially between 0.38, 0.82, 0.51, and 0.88 appears when a is set to
3.5, whereas when a is increased to 3.60, further bifurcation takes place
from a limit cycle to a chaotic attractor characterized by an invariant
set with an infinite number of points (see Figure 5.2b, right). The time
evolutions of x starting from different initial states are plotted for these
values of a, where it is clear that the transient dynamics of the trajectory
of x converge toward those fixed-point, limit-cycle, and chaotic attractors.
It should be noted that no periodicity is seen in the case of chaos.
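These three regimes can be checked in a few lines of Python. The sketch below iterates Eq. 4, discards the transient, and prints a short sample of the invariant set for each value of a; the initial state and iteration counts are arbitrary choices.

    def logistic_step(x, a):
        return a * x * (1.0 - x)   # Eq. 4

    for a in (2.6, 3.2, 3.6):
        x = 0.3
        for _ in range(500):       # discard the transient
            x = logistic_step(x, a)
        tail = []
        for _ in range(6):         # sample the (near-)invariant set
            x = logistic_step(x, a)
            tail.append(round(x, 3))
        print(f"a = {a}: {tail}")

For a = 2.6 the printed values repeat a single number near 0.61, for a = 3.2 they alternate between roughly 0.52 and 0.80, and for a = 3.6 no repetition appears.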
We'll turn now to look briefly at a number of characteristics of chaos.
One of the essential characteristics of chaos is its sensitivity with respect
to initial conditions. In chaos, when two trajectories are generated from
two initial states separated by a negligibly small distance in phase space,
the distance between these two trajectories increases exponentially as
iterations progress. Figure 5.3a shows an example of such development.
This sensitivity to initial conditions determines the ability of chaos
to generate nonrepeatable behaviors even when a negligibly small per-
turbation is applied to the initial conditions. This peculiarity of chaos
can be explained by the process of stretching and folding in phase space
as illustrated in Figure 5.3b. If a is set to 4.0, the logistic map generates
chaos that covers the range of x from 0.0 to 1.0, as can be seen in
Figure 5.2a. In this case, the range of values for $x_0$ between 0.0 and 0.5
is mapped to $x_1$ values between 0.0 and 1.0 with magnification, whereas
$x_0$ values between 0.5 and 1.0 are mapped to $x_1$ values between 1.0 and
0.0 (again with magnification, but in the opposite direction), as can be
seen in Figure 5.3b. This essentially represents the process of stretching
and folding in a single mapping step of the logistic map. Two adjacent
initial states denoted by a dot and a cross are mapped to two points that
are slightly further apart from each other after the first mapping. When
this mapping is repeated n times, the distance between the two states
increases exponentially, resulting in the complex geometry generated for
$x_n$ by means of iterated stretching and folding. This iterated stretching
and folding is considered to be a general mechanism for generating chaos.
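This exponential divergence is easy to demonstrate numerically. In the sketch below (assuming a = 4.0 and an arbitrary initial separation of 10^-9), the gap between the two trajectories grows by several orders of magnitude every ten steps until it saturates at the size of the attractor.

    def logistic_step(x, a=4.0):
        return a * x * (1.0 - x)

    x, y = 0.3, 0.3 + 1e-9          # two almost identical initial states
    for t in range(1, 51):
        x, y = logistic_step(x), logistic_step(y)
        if t % 10 == 0:
            print(f"t = {t:2d}  |x - y| = {abs(x - y):.3e}")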
Further, look at an interesting relation between chaotic dynamics and
symbolic processes. If we observe the output sequence of the logistic
map and label it with two symbols, H for values greater than 0.5 and
L for those less than or equal to 0.5, we get probabilistic sequences of
alternating H and L. When the parameter a is set at 4.0, it is known
Figure 5.3. Initial sensitivity of chaotic mechanisms. (a) Distance between two
trajectories (represented by solid and dashed lines) starting from initial states
separated by a distance of ε in phase space; the distance between the two
grows exponentially as time goes by in chaos generated by a logistic map with
a set to 3.6. (b) The mechanism of generating chaos by stretching and folding.

that the logistic map generates H or L with equal probability with no


memory, like a coin flip. This can be represented by a one-state proba-
bilistic finite state machine (FSM) with an equal probability output for
H and L from this single state. If the parameter a is changed to a
different value in the chaotic region, a different form of a probabilistic
FSM with a different number of discrete states and different probability
assignments for output labels is reconstructed for each. This is called
symbolic dynamics (Crutchfield & Young, 1989; Devaney, 1989), which
provides a theorem connecting real-number dynamical systems and
discrete symbol systems.
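A minimal sketch of this H/L coding follows; with a set to 4.0 the estimated symbol probabilities should come out near 0.5 each, consistent with the one-state coin-flip FSM described above (the sample size and initial state are arbitrary choices).

    from collections import Counter

    def logistic_step(x, a=4.0):
        return a * x * (1.0 - x)

    x, symbols = 0.3, []
    for _ in range(100000):
        x = logistic_step(x)
        symbols.append('H' if x > 0.5 else 'L')

    counts = Counter(symbols)
    print({s: round(counts[s] / len(symbols), 3) for s in 'HL'})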
Figure 5.4. Tangency in nonlinear mapping (axes: $x_t$ versus $x_{t+1}$). The
passing through of the state x slows down in the vicinity of the tangency point.

One interesting observation of logistic maps in terms of symbolic
dynamics is that the complexity of the symbolic dynamics, in terms of
the number of states in the reconstructed probabilistic FSM, can become
infinite, especially in the parameter region at the onset of chaos, at
the ends of window regions in which the periodicity of the attractor
moves from finite to infinite (Crutchfield & Young, 1989). It is
known that nonlinear dynamic systems in general develop critical
behaviors, exhibiting state trajectories of infinite complexity,
at the "edge of chaos," including at the ends of window parameter
regions, where quite rich dynamic patterns following a power law can
be observed. The edge of chaos can also be observed under another critical
condition, when tangency exists in the mapping function, as shown
in Figure 5.4.
When the curve of the mapping function becomes tangent to the
line of identity mapping, passing through the tangent point can
take an arbitrarily large number of steps, depending on the value of x
entering the passage. This generates the phenomenon known as intermittent
chaos, in which the passing through appears intermittently, sometimes
after several steps and sometimes after indefinitely many. These properties
of the edge of chaos under critical conditions are revisited in later chapters
as we examine the behavioral characteristics of neurorobots observed in our
experiments.
5.1.2 Continuous-Time Systems

Next, let's examine the case of continuous time, represented by Eq. 5.
We'll take the Rössler system (Rössler, 1976) as a simple example that
can be described by the following set of ordinary differential equations:

\[
\begin{aligned}
\dot{x} &= -y - z \\
\dot{y} &= x + a y \\
\dot{z} &= b + z(x - c)
\end{aligned}
\qquad \text{(Eq. 5)}
\]

This continuous-time nonlinear dynamical system is defined by a three-
dimensional state (x, y, and z), three parameters (a, b, and c), and no
inputs. If we conduct a phase space analysis on this system, we can see
different dynamical structures appearing for different parameter set-
tings (of a, b, and c). As shown in Figure 5.5, continuous trajectories
of the dynamical state projected in the two-dimensional space (x, y)
converge toward three different types of attractors (fixed point, limit
cycle, or chaotic) depending on the values of the parameters. It should
be noted that in each case the trajectory converges to the same attractor
regardless of the initial state. Such an attractor is called a global attractor,
and the chaotic attractor shown in (c) is the Rössler attractor. The
phenomena corresponding to these changes in the dynamical structure
caused by parameter bifurcation are quite similar to those observed in
the case of the logistic map. The mechanism of generating chaos with
the Rössler attractor can be explained by the process of stretching and
folding previously mentioned. In the Rössler attractor, a bundle of trajectories
constituting a sheet rotates in a counterclockwise direction,
accompanied by a one-time folding and stretching. If we take a section
of the sheet, which is known as a Poincaré section (Figure 5.5d), we'll
see a line segment consisting of an infinite number of trajectory points.
This line segment is folded and stretched once during a single rotation,
which is mapped again onto the line segment (see Figure 5.5e). If this
process is iterated, the sensitivity of this system to initial conditions
becomes apparent in the same way as with the logistic map.
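For illustration, Eq. 5 can be integrated with the same kind of simple forward-Euler scheme sketched earlier; the parameter values below follow the chaotic case of Figure 5.5c, while the step size and initial state are assumptions, and a production integrator (e.g., Runge-Kutta) would be more accurate.

    import numpy as np

    def rossler_deriv(state, a=0.2, b=0.2, c=5.7):
        x, y, z = state
        return np.array([-y - z, x + a * y, b + z * (x - c)])  # Eq. 5

    dt, steps = 0.01, 100000
    state = np.array([1.0, 1.0, 1.0])
    trajectory = np.empty((steps, 3))
    for t in range(steps):
        state = state + dt * rossler_deriv(state)
        trajectory[t] = state
    # Plotting trajectory[:, 0] against trajectory[:, 1] reproduces the
    # familiar (x, y) projection of the chaotic Rossler attractor.
    print(trajectory[-1])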

5.1.3 Structural Stability

This subsection explains why structural stability is an important characteristic
of nonlinear dynamical systems. Importantly, I will argue that
one emergent property of nonlinear dynamical systems is the appearance
Figure 5.5. Different attractors appearing in the Rössler system: (a) a fixed-
point attractor (a = 0.2, b = 0.2, c = 5.7), (b) a limit-cycle attractor
(a = 0.1, b = 0.1, c = 4.0), and (c) a chaotic attractor (a = 0.2, b = 0.2,
c = 5.7). Illustrations of (d) the Poincaré section and (e) the process of
folding and stretching in the Rössler attractor that accounts for the mechanism
of generating chaos.

of a particular attractor configuration for any given dynamical system.
A particular equation describing a dynamical system can indicate the
direction of change of state at each local point in terms of a vector field.
However, the vector field itself cannot tell us what the attractor looks
like. The attractor emerges only after a certain number of iterations have
been performed, through the transient process of converging toward the
attractor. An important point here is that attractors as trajectories of
steady states cannot exist by themselves in isolation. Rather, they need
to be supported by transient parts of the vector flow that converge toward
these attractors. In other words, transient parts of the vector flow make
attractors stable, as illustrated in Figure 5.6a.
Figure 5.6. Vector flow. (a) Appearance of a limit-cycle attractor in the vector
field of a particular two-dimensional continuous dynamical system with
the system state (x, v), in which the vector flow converges toward a cyclic
trajectory. (b) A vector field for a harmonic oscillator, in which the flow is not
convergent but forms concentric circles.

This is the notion behind the structural stability of attractors. To provide
a more intuitive explanation of this concept, let's take a counterexample
in terms of a system that is not structurally stable. Sometimes I ask
students to give me an example of a system that generates oscillation
patterns, and a common answer is a sinusoidal function or a harmonic
oscillator, such as the frictionless spring-mass system described by Eq. 6.

\[
\begin{aligned}
m \dot{v} &= -k x \\
\dot{x} &= v
\end{aligned}
\qquad \text{(Eq. 6)}
\]

Here, x is the one-dimensional position of a mass m, v is its velocity,
and k is the spring coefficient. The equation represents a second-order
dynamic system without damping terms. A frictionless spring-mass sys-
tem can indeed generate sinusoidal oscillation patterns. However, such
patterns are not structurally stable because if we apply force to the mass
of the oscillator instantaneously, the amplitude of oscillation will change
immediately, and the original oscillation pattern will never be recovered
automatically (again, it is frictionless). If the vector field is plotted in (x, v)
space, we will see that the vector flow describes concentric circles, with
no convergent flow that would constitute a limit-cycle attractor (see
Figure 5.6b). Indeed, a sinusoidal wave function is also simply the trace
of one point on a circle as it rolls along a plane.
Most rhythmic patterns in biological systems are thought to be gener-
ated by limit-cycle attractors because of their potential stability against
perturbations. These include central pattern generators in neural circuits


for the heart beat, locomotion, breathing, swimming, and many oth-
ers, as is described briefly in the next section. Such limit-cycle attractor
dynamics in real physical systems are generated by nonlinear dynamical
systems called dissipative systems. A dissipative system consists of an
energy dissipation part and an energy supply part. If the amounts of
energy dissipation and energy supply during one cycle of oscillation are
balanced, this results in the formation of an attractor of the limit cycle
type (or it could also result in the generation of chaos under certain con-
ditions). Energy can be dissipated by damping caused by friction in
mechanical systems or by electric resistance in electrical circuits. When
a larger or smaller amount of energy is supplied momentarily due to a
perturbation from an external source, the state trajectory deviates and
becomes transient. However, it returns to the original attractor region by
means of automatic compensation by dissipating an appropriate amount
of energy corresponding to the input energy.
On the other hand, a harmonic oscillator without a damping term,
such as that shown in Eq. 6, is not a dissipative system but an energy-
conserving system. There is no damping term to dissipate energy
from the system. Once perturbed, its state trajectory will not return to
the original one. In short, the structural stability of dynamic patterns in
terms of physical movements or neural activity in biological systems can
be achieved through attractor dynamics by means of a dissipative struc-
ture. Further, the particular attractors appearing in different cases are the
products of emergent properties of such nonlinear (dissipative) dynamic
systems. Indeed, Neo-Gibsonian psychologists have taken advantage of
these interesting dynamical properties of dissipative systems to account
for the generation of stable but flexible biological movements. The next
section explores such concepts by introducing the Gibsonian approach
first, followed by Neo-Gibsonian variants and infant developmental psy-
chology using the dynamical systems perspectives.
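The contrast between dissipative and energy-conserving systems can be made concrete with a small numerical experiment. The sketch below kicks both a van der Pol oscillator (a textbook dissipative limit-cycle system, used here purely for illustration; it is not discussed in this book) and the frictionless spring-mass of Eq. 6; only the former recovers its original amplitude. The step size, kick, and parameter values are assumptions.

    import numpy as np

    def van_der_pol(state, mu=1.0):
        # Energy is supplied for small |x| and dissipated for large |x|,
        # yielding a limit cycle of amplitude close to 2.
        x, v = state
        return np.array([v, mu * (1.0 - x ** 2) * v - x])

    def spring_mass(state, k=1.0, m=1.0):
        # Eq. 6: frictionless, hence energy conserving, with no attractor.
        x, v = state
        return np.array([v, -k * x / m])

    def amplitude_after_kick(deriv, dt=0.001, steps=200000):
        state = np.array([0.5, 0.0])
        xs = np.empty(steps)
        for t in range(steps):
            state = state + dt * deriv(state)
            if t == steps // 2:
                state[1] += 1.0        # instantaneous perturbation (a kick)
            xs[t] = state[0]
        return xs[-20000:].max()       # amplitude long after the kick

    print("van der Pol :", round(amplitude_after_kick(van_der_pol), 2))  # back near 2.0
    print("spring-mass :", round(amplitude_after_kick(spring_mass), 2))  # permanently altered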

5.2. Gibsonian and Neo-Gibsonian Approaches

5.2.1 The Gibsonian Approach

A concept central to this approach, known as affordance, has significantly
influenced not only mainstream psychology and philosophy of
the mind, but also synthetic modeling studies including artificial intel-
ligence and robotics. In the original theory of affordance proposed by
J.J. Gibson (1979), affordance was defined as all possibilities for actions
latent in the environment. Put another way, affordance can be under-
stood as behavioral relations that animals are able to acquire in interac-
tion with their environments. Relationships between actors and objects
within these environments afford these agents opportunities to generate
adequate behaviors. For example, a chair affords sitting on it, and a door
knob affords pulling or pushing a door open or closed free from the
resistance afforded by the door's locking mechanism.
Many of Gibson's considerations focused on the fact that essential
information about the environment comes by way of human process-
ing of the optical flow. Optical flow is the pattern of motion sensed by
the eye of an observer. By considering that optical flow information can
be used to perceive one's own motion pattern and to control one's own
behavior, Gibson came up with the notion of affordance constancy. He
illustrated this concept with the example of a pilot flying toward a tar-
get on the ground, adjusting the direction of flight so that the focus of
expansion (FOE) in the visual optical flow becomes superimposed on the
target (see Figure 5.7a). This account was inspired by his own experience
in training pilots to develop better landing skills during World War II.
A similar example, closer to everyday life, is that we walk along a
corridor while registering any deviation from zero in the difference
between the optical flow vectors along the two sides of the corridor,
which allows us to walk down the middle of the corridor without
colliding with the walls (see Figure
5.7b). These examples suggest that for each behavior there is a crucial
perceptual variable (in Gibson's two examples, the distance between
the FOE and the target, and the vector difference between the optical
flows for the two walls) and that body movements are generated to keep
these perceptual variables at constant values. Assuming the existence of
coupled dynamics between the environment and small controllers inside
the brain, the role of the controllers is to preserve perceptual constancy.
A simple dynamical system theory can show how this constancy may be
maintained by assuming the existence of a fixed point attractor, which
ensures that perceptual variables always converge to a constant state.
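As a toy illustration of such a controller, the following sketch drives the lateral position in a corridor toward the fixed point at which the left and right flow magnitudes balance; the simplified flow model, the gain, and all names are hypothetical, invented only to make the dynamics concrete.

    def flow_difference(y, half_width=1.0):
        # Flow magnitude grows as a wall gets closer; the difference vanishes
        # only at the midline (a crude stand-in for real optical flow).
        left = 1.0 / (half_width + y)
        right = 1.0 / (half_width - y)
        return right - left

    def step(y, gain=0.2):
        # Steer against the flow imbalance; y = 0 is the fixed point attractor.
        return y - gain * flow_difference(y)

    y = 0.6   # start off-center
    for _ in range(30):
        y = step(y)
    print(round(y, 4))  # converges toward 0.0, the corridor midline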
Andy Clark, a philosopher in Edinburgh, has been interested in the
role of embodiment in generating situated behaviors from the Gibsonian
perspective. He analyzed how an outfielder positions himself to catch
a fly ball as an example (Clark, 1999). In general, this action is thought
Figure 5.7. Gibson's notion of optical constancy. (a) Flying while
superimposing the focus of expansion on the target heading and (b) walking
along a corridor while balancing optical flow vectors against both side walls.
Redrawn from Gibson (1979).

to require complicated calculations of variables such as the arc, speed,
acceleration, and distance of the ball. However, there is actually a simple
strategy for catching it: if the outfielder continues to adjust his movement
so that the ball appears to approach in a straight line in his visual field,
the ball will eventually fall to him. By maintaining this coordination for
perceptual constancy, he can catch the fly ball easily. Clark explains that
the task is to maintain, by making multiple, ongoing, real-time adjust-
ments to the running motion, a kind of coordination between the inner
and the outer. This means that coordination dynamics like this appear
naturally under relatively simple principles, such as perceptual constancy,
instead of through complicated computation involving representation in
an objective, simulated Cartesian coordinate system.

5.2.2 Neo-Gibsonian Approaches

In the 1980s, so-called Neo-Gibsonian psychologists such as Turvey,
Kugler, and Kelso started investigating how to achieve the coordination
of many degrees of freedom by applying the ideas of dissipative struc-
tures from nonlinear dynamics to psychological observations of human
and animal behavior (see the seminal book by Scott Kelso, 1995). They
considered that the ideas of dissipative structures, especially concerning
limit cycle attractor dynamics, can serve as a basic principle in organiz-
ing coherent rhythmic movement patterns such as walking, swimming,
breathing, and hand waving, as described briefly in the previous section.
The important theoretical ingredients of these ideas are entrainment and
phase transitions. First, coupled oscillators that initially oscillate with
different phases and periodicities can, by mutual entrainment under certain
conditions, converge to a global synchrony with reduced dimen-
sionality. Second, the characteristics of this global synchrony can be
drastically changed by a shift of an order parameter of the dynamic sys-
tem by means of phase transition.
Let's look at this in more detail by reviewing a representative experimental
study conducted by Kelso and colleagues (Schöner & Kelso,
1988). In the experiment, subjects were asked to wiggle the index fingers
of their left and right hands in the same direction (different muscles
activated; antiphase) in synchrony with a metronome. When the
metronome was sped up gradually, what happened was that the finger
movement pattern suddenly switched from the same-direction one to
the opposite-direction one (same muscles activated; in-phase). It was
observed that the relative phase changed suddenly from 180 degrees
to 0 degrees (see the left-hand panel in Figure 5.8).

Figure 5.8. The phase transition model by Kelso (1995) for explaining
the dynamic shifts seen in bimanual finger movements. The panel on the
left-hand side shows how oscillation coordination between the right and left
index fingers changes when the leading frequency is increased. The panel
on the right-hand side shows the corresponding change in the energy
landscape over the phase difference (180 to 0 degrees).
After this experiment, Kelso and colleagues showed by computer simulation
that the observed dynamic shift is due to the phase transition from
a particular dynamic structure self-organizing to another, given changes in
an order parameter of the system (the speed of the metronome in this exam-
ple). When a hypothetical energy landscape is computed for the movement
patterns along with the order parameter of the metronome speed (see the
right-hand side panel in Figure 5.8), the antiphase becomes stable with
its energy minimum state when the metronome speed is low. However,
the antiphase becomes unstable as the metronome speed increases (the
parameter introduces too much energy into the system) and the behavior
is modulated toward the realization of a more stable system and corre-
sponding energetic minimum, switching the system state suddenly from
the antiphase to the in-phase. Such dramatic shifts in dynamic system
state, such as those seen in the bimanual finger movement illustration, can
be explained by means of the phenomenon of phase transition. Indeed, a
diverse range of phenomena characterized by similar shifts in animal and
human movement patterns appear to be effectively explained in terms of
phase transitions. Good examples include the dynamic shift from trot to
gallop in horse locomotion given a change in the system parameter of
running speed, as well as the shift from walk to run in human locomotion.
It is common experience that the middle state, a walk-run, is more dif-
ficult to maintain (at least without lots of practice) than one or the other
behaviors. This result accords with a central notion in Neo-Gibsonian
approaches, that behaviors are organized not top-down by an explicit
central commander, but by implicit synergy among local elements including
neurons, muscles, and skeletal mechanics, and that these behaviors repre-
sent emergent characteristics of dissipative structures.
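This energy-landscape account is commonly formalized with the Haken-Kelso-Bunz (HKB) potential V(φ) = -a cos(φ) - b cos(2φ) over the relative phase φ, in which the ratio b/a falls as movement frequency rises. The short sketch below (parameter values are illustrative assumptions) checks the stability of the two coordination patterns and shows the antiphase minimum disappearing once b/a drops below 0.25.

    import numpy as np

    def hkb_potential(phi, a, b):
        # V(phi) = -a cos(phi) - b cos(2 phi); plot this to see the landscape.
        return -a * np.cos(phi) - b * np.cos(2.0 * phi)

    def is_stable(phi, a, b):
        # A coordination pattern is stable where V''(phi) > 0.
        return a * np.cos(phi) + 4.0 * b * np.cos(2.0 * phi) > 0

    a = 1.0
    for b in (0.5, 0.3, 0.1):   # decreasing b/a mimics a faster metronome
        print(f"b/a = {b:.1f}: in-phase stable = {is_stable(0.0, a, b)}, "
              f"antiphase stable = {is_stable(np.pi, a, b)}")

Only the in-phase pattern remains stable at the smallest ratio, matching the sudden antiphase-to-in-phase switch observed in the experiment.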

5.2.3 Infant Developmental Psychology

Neo-Gibsonian theories helped to give birth to another dynamic systems
theory that accounts for infant development. Esther Thelen and Linda
B. Smith wrote in their seminal textbook, A Dynamic Systems Approach
to the Development of Cognition and Action, that:

We invoke Gibson's beliefs that the world contains information and
that the goal of development is to discover relevant information in
order to make a functional match between what the environment
affords and what the actor can and wants to do. (Thelen & Smith,
1994, p. 9, Introduction)
They suggest that development is better understood as the emergent product of many decentralized and local interactions occurring in real
time between parts of the brain, the body, and the environment, rather
than as sequences of events preprogrammed in our genes. For example,
crawling is a stable behavior for infants for several months. However,
when they newly acquire the movement patterns of walking upright,
the movement patterns of crawling become unstable. Smith and Thelen
hold that this happens not as the result of a genome preprogram but as
the result of an efficient solution generated through self-organization
(Smith & Thelen, 2003).
Following this line of thinking, Gershkoff-Stowe and Thelen (2004) provide a remarkable account of so-called U-shaped development, a phenomenon whereby previously performed behaviors regress or disappear only to recover or reappear with even better performance later on. A typical example can be seen in language development around 2 or 3 years of age, when children, after several months of correct usage, often incorrectly use words like "foots" and "goed," a phenomenon known as overregularization. They eventually resume using these words correctly. Another example is the walking reflex. When a newborn baby is held so that the feet lightly touch a solid surface, she or he shows a walking-like motion with alternate stepping. However, this reflexive behavior is scarcely observed after a few months and does not reappear until just prior to walking.
One more example is perseverative reaching observed in the so-called A-not-B task, originally demonstrated by Jean Piaget, known as the father of developmental psychology, and illustrated in Figure 5.9.
In this task, 8- to 10-month-old infants are cued to recover a hidden object from one of two identical hiding places (see Figure 5.9). Recovery is repeated several times at the first location A before the experimenter switches the hiding place to the second location B. Although the infant watches the toy hidden at the new location B, if there is a delay between hiding and allowing the child to reach, infants robustly return to the original location A. This is known as perseverative reaching. This reaching can even be observed in the not-hidden toy case (i.e., when provided with an explicit cue to indicate the correct location). An interesting observation in the not-hidden condition is that infants around 5 months old are correct (around 70% success rate) at location B and show less perseveration than infants around 8 months old, who are incorrect (around 20% success rate). This perseverative behavior is not observed in infants older than 12 months of age.
Figure 5.9. Piaget's A-not-B task. First, in 1, an attractive object is hidden at location A (left-hand side). The infant then repeatedly retrieves the object, in 2 and 3, from the correct location A. In 4, the object is then hidden at location B (right-hand side) while the infant attends to this. However, with a delay between seeing the hiding and retrieval, the infant fails to retrieve the object at the correct location B.

What is the underlying mechanism in these examples of U-shaped development? Gershkoff-Stowe and Thelen (2004) argue that U-shaped development is not caused by the regression or loss of a single element such as one in the motor, perceptual, or memory system alone. Instead, U-shaped behavior is the result of a continuously changing configuration between mutually interacting components, including both mental and behavioral components. They write, "The issue is not how a behavior is lost or gets worse, but how the component processes can reorganize to produce such dramatic nonlinearities in performance" (Gershkoff-Stowe & Thelen, 2004, p. 16).
In the case of perseverative reaching, although it can be considered
that repeated recoveries from location A can reinforce a memory
bias to select location A again upon next reaching, this is not the
only cause. It was found that the hand trajectories in repeated recov-
eries of 8-month-old infants become increasingly similar to those
of 5-month-old infants who are relatively immature in controlling
their hand reaching movements. It was also found that changing the
hand trajectory by adding weights to the infants' arms significantly
decreased the perseveration. The point here is that the mutual reinforcement of the memory bias and the persistent reaching trajectories through the repeated recoveries results in the formation of a strong habit of reliable perseverative reaching. This account has been supported by simulation studies using the dynamic neural field model (Schöner & Thelen, 2006). This perseverative reaching is at its peak at 8 months of age and starts to drop off thereafter as other functions mature to counter it, such as attention switching and attention maintenance, which allow for tracking and preserving the alternative cue appearing at the second location B. Smith and Thelen (2003) explain that infants who have had more experience exploring environments by self-locomotion show greater visual attention to the desired object and its hidden location.
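A flavor of how such dynamic field accounts work can be given in a few lines of code. The sketch below is a drastically simplified one-dimensional field in the spirit of Schöner and Thelen (2006), not their actual model: all parameter values (resting level, kernel widths, trace gain) are assumptions chosen only to make the qualitative effect visible. Repeated cued reaches to A deposit a memory trace that later wins over a briefly cued B once a delay is imposed:

```python
import numpy as np

n = 101                                    # field sites over reaching direction
x = np.arange(n)
A, B = 30, 70                              # the two hiding locations

def bump(center, width=6.0, amp=1.0):
    return amp * np.exp(-0.5 * ((x - center) / width) ** 2)

d = np.arange(n) - n // 2                  # kernel: local excitation, broad inhibition
kernel = np.exp(-0.5 * (d / 5.0) ** 2) - 0.5 * np.exp(-0.5 * (d / 15.0) ** 2)
f = lambda u: 1.0 / (1.0 + np.exp(-4.0 * u))   # sigmoid output nonlinearity

def settle(trace, cue, steps=120):
    """Relax the field under a cue input plus the habit trace."""
    u = np.full(n, -2.0)                   # resting level -2 (assumed)
    for _ in range(steps):
        interact = 0.2 * np.convolve(f(u), kernel, mode="same")
        u += (-u - 2.0 + cue + interact + 1.5 * trace) / 10.0
    return u

trace = np.zeros(n)
for _ in range(6):                         # six cued reaches to location A
    u = settle(trace, bump(A, amp=3.0))
    trace += 0.15 * bump(np.argmax(u))     # each reach deposits a habit trace

u = settle(trace, bump(B, amp=3.0))        # toy is now cued at B, no delay
print("no delay ->", "A" if np.argmax(u) < n // 2 else "B")
u = settle(trace, np.zeros(n), steps=150)  # delay: cue gone before reaching
print("after delay ->", "A" if np.argmax(u) < n // 2 else "B")
```

With these assumed parameters the field selects B while the cue is present but falls back to the habit peak at A after the delay, which is the qualitative signature of perseverative reaching; the reach decision is read off as the field's maximum, a simplification of the full model.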
This account of how infants determine whether to reach for A or B parallels what Spivey (2007) has discussed in terms of the continuity of minds. He considers that even discrete decisions for selecting actions might be delivered through a process of gradually settling partially active and competing neural activities involving multiple psychological processes. And again, the emergence of U-shaped development is a product of dynamic interactions between multiple contingent processes both internal and external to infants (Gershkoff-Stowe & Thelen, 2004). The next subsection looks at the development of a cognitive competency, namely imitation, which has been considered to play an important role in the cognitive development of children.

5.2.4 Imitation

It has been considered that imitation and observational learning are essential for children to acquire a wide range of behaviors, because learning by imitation is much more efficient than learning through trial and
error by each individual alone. Jean Piaget proposed that imitation in infants develops through six discrete stages up until 18 to 24 months of age (Piaget, 1962). The first stage starts with the sensory-reflex responses of newborns, followed in the second stage by the repetition of certain behavioral repertoires discovered by chance. A drastic differentiation in development comes with deferred imitation at around 8 to 12 months, in the fourth stage. Here, an ability emerges to reproduce a modeled activity that has been observed at some point in the past. Piaget emphasized this change by
suggesting that this stage marks the onset of mentalization capabilities in infants. This mentalization capability is further developed in the sixth stage, at around 18 to 24 months, when some symbolic-level mental representation and manipulation can be observed. A typical example is the appearance of pretend play, as when a child, after observing the actions of his or her parents, pretends to make a call using a banana instead of a real phone.
Although Piaget's emphasis was on cognitive development toward the mentalization or symbolism that appears in the later stages, some recent studies have pursued the suspicion that the roots of human cognition may be found in the analysis of early imitation, and so have focused on how neuronal mechanisms of imitation appear at much earlier stages. A seminal study by Meltzoff and Moore (1977) showed that human neonates can imitate facial gestures of adults such as tongue protrusion, mouth opening, and lip protrusion. This finding was nontrivial because it implies that neonates can match their own unseen behaviors with those demonstrated by others. Even Piaget believed that facial imitation could appear only after 8 months of age.
Although the exact mechanisms enabling these imitative behaviors in neonates are still a matter of debate, Meltzoff (2005) has hypothesized a "like me" mechanism that connects the perception of others "like me" with one's own capacities, thereby grounding an embodied understanding of others' minds in enactive imitation. In the first stage, in newborns, an innate sensory-motor mapping can generate the aforementioned imitative behaviors by means of automatic responses. In the second stage, infants experience regular relationships between their mental states and the actions generated repeatedly, and thus associations between them are learned. Finally, in the third stage, infants come to understand that others who act "like me" have mental states like me.
Along similar lines, Jacqueline Nadel (2002) proposed that imitation is a means to communicate with others. Nadel observed a group of preverbal infants in a natural social play setting involving a type of frequently observed communicative interaction: turn taking, or switching roles, among two or three infants. Typical turn taking was observed when an infant showed another infant an object similar to the one he or she was holding. In most cases, the partner infant took the object and imitated its usage. Sometimes, however, the partner refused to do so or ignored the other. In these cases, the initiator left the object and turned to imitate the partner's ongoing behavior.
Another remarkable finding by Nadel (2002) was that pairs of preverbal infants often exhibited imitation of instrumental activity in synchrony with each other. Figure 5.10 shows that when one infant demonstrated an unexpected use of an object (carrying an upside-down chair on his head), the partner imitated this instrumental activity during their imitative exchanges.
Based on these observations and others, Nadel and colleagues argue that although immediate imitation generated during behavioral exchanges may not always be an intelligent process, as Piaget pointed out, infants at the very least know how to communicate with each other (Andry et al., 2001). This intriguing communicative activity may not require much in the way of mental representation, manipulation, or symbolism, but rather depends on the synchronization and rhythm that appear spontaneously in the dynamical processes of sensory-motor mapping between the perception of others "like me" and one's own actions. The next section describes a new movement in artificial intelligence and robotics guided by these insights and many others from contemporary developmental psychology.

Figure 5.10. Preverbal infants exhibit instrumental activities with synchrony during imitative exchange. Reproduced from Nadel (2002) with permission.
5.3. Behavior-Based Robotics

At the end of the 1980s, a paradigm shift occurred in artificial intelligence and robotics research. This shift came with the introduction of behavior-based robotics by Rodney Brooks at MIT. It should be noted, however, that just a few years before Brooks started his project, Valentino Braitenberg, a German neuroanatomist, published a book entitled Vehicles: Experiments in Synthetic Psychology (Braitenberg, 1984) describing the psychological perspective that led to the behavior-based robotics approach. The uniqueness of the book is its attempt to explore possible brain-psychological mechanisms for generating behavior via synthesis. For example, Braitenberg's "law of uphill analysis and downhill invention" suggests that it is more difficult to understand a working mechanism or system just from looking at it externally than it is to create it from scratch, an insight parallel to the quote from Feynman introducing this chapter.
Another interesting feature of Braitenberg's book is that all of the synthesis described is done through thought experiments rather than with real robots or computer simulations, although many researchers reconstructed these experiments using actual robots years later. Braitenberg's thought experiments are simple, yet provide readers with valuable clues about the cognitive organization underlying adaptive behaviors. Some representative examples of his thought experiments are introduced below, because they offer a good introduction to the behavior-based approach.

5.3.1 Braitenberg's Vehicle Thought Experiments

In his book, Braitenberg introduces thought experiments concerning 14 different types of vehicles. Here, we confine ourselves to looking at Vehicles 2, 3, and 4 as representative examples. Each of the three vehicles is equipped with a pair of sensors on the front left- and right-hand sides of its body. The sensory inputs are transmitted to the left and right wheel-drive motors at the rear through connecting lines that are analogous to synaptic connections. Let's begin with Vehicle 2a, shown in Figure 5.11.
The vehicle has light intensity sensors at the front on each side that are connected to their corresponding rear motors in an excitatory manner
Figure 5.11. Braitenberg vehicles 2a and 2b (top) and 3a and 3b (bottom).

(same-side excitatory connectivity). If a light source is located directly ahead of the vehicle, it will crash into the light source by accelerating
the motors on both sides equally. However, if the light source lies slightly off to one side, the deviation will be increased by the acceleration of the motor on the side closer to the light source. This eventually generates radical avoidance of the light source. On the other hand, if each
sensor is connected to a motor on the opposite side (cross-excitatory
connectivity), as shown for Vehicle 2b in Figure 5.11, the vehicle always
crashes into the light source. This is because the motor on the opposite
side of the light source accelerates more and thus the vehicle moves
toward the light source. Vehicles 2a and 2b are named "Coward" and "Aggressive," respectively.
Now, let's suppose that the connectivity lines, rather than being excitatory as in Vehicle 2, are inhibitory, as in Vehicle 3 (Figure 5.11). Vehicle 3 then has drastically different behavioral characteristics from Vehicle 2. First, let's look at Vehicle 3a, which has same-side inhibitory connectivity. This vehicle slows down in the vicinity of the light source. It is gradually attracted to the light source and finally stops close to it (how close perhaps depending on the friction of the wheels and other factors). If the vehicle deviates slightly to one side of the source, the motor on the opposite side slows down, because it is inhibited by the sensor that perceives the stronger stimulus from the source. If it deviates to the right, then the left wheel is inhibited, and vice versa.
Figure 5.12. Braitenberg vehicle 4. (a) Nonlinear maps from sensory intensity to motor velocity assumed for this vehicle and (b) complex behaviors that emerge with more complex maps.

Eventually, the vehicle shifts back toward the source and finally stops in its vicinity. In the case of Vehicle 3b, which has cross-inhibitory connectivity, although this vehicle also slows down in the presence of a strong light stimulus, it gently turns away from the source, employing the opposite control logic of Vehicle 3a, and heads for another light source. Vehicles 3a and 3b are named "Lover" and "Explorer," respectively.
Vehicle 4 adds a trick to the connectivity lines: the relationship between the sensory stimulus and the motor outputs is changed from a monotonic one to a non-monotonic one, as shown in Figure 5.12a. Because of this nonlinearity in the sensory-motor response, the vehicle no longer just monotonically approaches the light sources or escapes from them. It can happen that the vehicle approaches a source but changes course to deviate away from it when coming within a certain distance. Braitenberg imagined that repetitions of this sort of approaching and moving away from light sources could result in the emergence of complex trajectories, as illustrated in Figure 5.12b. Simply by adding some nonlinearity to the sensory-motor mapping functions of the simple controllers, the resultant interactions between the vehicle and the environment (light sources) can become significantly complex.
These are very interesting results. However, being thought experiments,
this approach is quite limited. Should we wish to consider emergent
behaviors beyond the limits of such thought experiments, we require
computer simulations or real robotics experiments.
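As a small step in that direction, the following sketch simulates Vehicles 2a and 2b as differential-drive agents. The sensor placement, gains, and light model are all illustrative assumptions; the point is only that same-side versus crossed excitatory wiring yields avoidance versus approach, exactly as in the thought experiment:

```python
import numpy as np

LIGHT = np.array([5.0, 5.0])               # light source position (assumed)

def sense(pos, heading, side):
    """Light intensity at a sensor mounted +/-45 degrees from the heading."""
    angle = heading + (np.pi / 4 if side == "left" else -np.pi / 4)
    sensor = pos + 0.2 * np.array([np.cos(angle), np.sin(angle)])
    return 1.0 / (1.0 + np.sum((LIGHT - sensor) ** 2))

def run(cross_wired, steps=400, dt=0.05):
    pos, heading = np.array([0.0, 0.0]), 0.0
    for _ in range(steps):
        s_l, s_r = sense(pos, heading, "left"), sense(pos, heading, "right")
        if cross_wired:                     # Vehicle 2b, "Aggressive"
            v_l, v_r = 1.0 + 4.0 * s_r, 1.0 + 4.0 * s_l
        else:                               # Vehicle 2a, "Coward"
            v_l, v_r = 1.0 + 4.0 * s_l, 1.0 + 4.0 * s_r
        speed = 0.5 * (v_l + v_r)
        pos = pos + dt * speed * np.array([np.cos(heading), np.sin(heading)])
        heading += dt * (v_r - v_l) / 0.3   # differential drive, wheel base 0.3
    return np.linalg.norm(pos - LIGHT)

print("2a (same-side) final distance:", round(run(False), 2))  # flees the light
print("2b (crossed)   final distance:", round(run(True), 2))   # homes in on it
```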
5.3.2 Behavior-Based Robots and Their Limitations

Returning now to behavior-based robotics, Brooks elaborated on thoughts similar to Braitenberg's by demonstrating that even small
and extremely simple insect-like robots could exhibit far more com-
plex, realistic, and intelligent behaviors than the conventional compu-
tationally heavy robots used in traditional AI research. This marked the
beginning of behavior-based robotics research. Argumentative papers published by Brooks, such as "Elephants don't play chess" (Brooks, 1990) and "Intelligence without representation" (Brooks, 1991), present his thoughts on what he calls classical AI and nouvelle AI. He criticized the use of large robots programmed with classical AI schemes, arguing that a lot of the computation time is spent on logical inference or the preparation of action plans in real-world tests even before the robot takes a single step or indeed makes any movement at all.
On the other hand, small robots whose behavior is based on the phi-
losophy of nouvelle AI are designed to move first, taking part in physical
interactions with their environment and with humans while comput-
ing all the necessary parameters in real time in an event-based manner.
Brooks also criticizes the tendency of classical AI to be overwhelmed
with representation. For example, typical mobile robots based on the
classical AI scheme are equipped with global maps or environment
models represented in a three-dimensional Cartesian coordinate sys-
tem. The robots then proceed to match what they have sensed through
devices such as vision cameras with the stored representation through
complicated coordinate transformations for each step of their move-
ment as they find their location in the stored Cartesian coordinate sys-
tem. The behavior-based robots made by Brooks and his students use
only a simple scheme based on the perception-to-motor cycle, in which
the motor outputs are directly mapped from the perceptual inputs at
each iteration.
The problem with the classical AI approach is that the representation
is prepared not through actual actions taken by the agent (the robot), but
by implementing an externally imposed artificial purpose. This problem
can be attributed to the lack of direct experience, which is related to Husserl's discussions on phenomenological reduction (see chapter 3).
Behavior-based robotics could provide AI researchers and cognitive
scientists with a unique means to obtain a view on first-person experi-
ence from the viewpoint of a robot by almost literally putting themselves
inside its head, thereby affording the opportunity to examine the sen-
sory flow experienced by the robot. Readers should note that the idea of
the perception-to-motor cycle with small controllers in behavior-based
robots and Braitenberg vehicles is quite analogous to the aforementioned
Gibsonian theories emphasizing the role of the environment rather than
the internal brain mechanisms (see also Bach, 1987).
Behavior-based approaches that emphasize embodiment currently
dominate the field of robotics and AI (Pfeifer & Bongard, 2006).
Although the paradigm shift made by behavior-based robotics researchers is deeply significant, I feel a sense of discomfort in that the common use of this approach emphasizes only sensory-motor-level interactions. This is because I still believe that we humans have a cogito level that can manipulate our thoughts and actions by abstracting our daily experiences from the sensory-motor level. Actually, Brooks and his students examined this view in their experiments applying the behavior-based approach to the robot navigation problem (Matarić, 1992). The behavior-based robots developed by Brooks' lab employed the so-called subsumption architecture, which consists of layers of competencies or task-specific behaviors that subsume lower levels. Although in principle each behavior functions independently by accessing sensory inputs and motor outputs, behaviors in the higher layers subsume those in the lower ones by sending suppression and inhibition signals to their sensory inputs and motor outputs, respectively. A subsumption architecture employed for the navigation task is shown in Figure 5.13.
The behaviors allocated to the different layers of the subsumption control include avoiding obstacles, wandering, exploring the environment, and building maps and planning. Of particular interest in this architecture is the top-layer module that deals with map building and planning.

Figure 5.13. The subsumption architecture used for the robot navigation problem in research by Brooks and colleagues. From bottom to top, the layers between sensation and motor output are: avoiding objects, wandering, exploring, and building maps and planning.
This layer, which corresponds to the cogito level, is supposed to generate abstract models of the environment through behavioral experiences and to use these in goal-directed action planning.
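The control logic just described can be caricatured in a few lines. The sketch below is a schematic rendering of subsumption-style layering, not Brooks' actual code: each layer maps sensation to a motor proposal or defers, and an active higher layer suppresses all of the layers beneath it, as the text above describes. All layer names and sensation keys are illustrative assumptions:

```python
def build_map_and_plan(s):
    return "follow_planned_path" if s.get("goal_on_map") else None

def explore(s):
    return "head_to_unvisited" if s.get("unvisited_direction") else None

def wander(s):
    return "random_walk" if s.get("bored") else None

def avoid_objects(s):
    return "turn_away" if s.get("obstacle_ahead") else None

LAYERS = [build_map_and_plan, explore, wander, avoid_objects]   # high -> low

def motor_command(sensation):
    for layer in LAYERS:
        proposal = layer(sensation)
        if proposal is not None:        # an active higher layer suppresses the rest
            return proposal
    return "stop"

print(motor_command({"obstacle_ahead": True}))                       # turn_away
print(motor_command({"obstacle_ahead": True, "goal_on_map": True}))  # planning wins
```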
An important remaining problem concerns the way that acquired models or maps of the environment are represented. Daniel Dennett points to this problem when writing, "The trouble is that once we try to extend Brooks' interesting and important message beyond the simplest of critters (artificial or biological), we can be quite sure that something awfully like representation is going to have to creep in" (Dennett, 1993, p. 126). The scheme by Matarić (1992) employed a topological graph representation for the environment map, consisting of nodes representing landmark types and arrows representing their transitions in the course of traveling (see Figure 2.2). As long as symbols understood to be arbitrary shapes of tokens (Harnad, 1990) are used in those nodes for representing the world, they can hardly be grounded in a metric space common to the physical world, as discussed earlier. In light of this, what direction of research should behavior-based robotics researchers pursue? Should we give up involving the cogito level, or accept the usage of symbols for incorporating cogito-level activities, bearing in mind the potential inconsistencies?
Actually, a clue to resolving this dichotomy can be found in one of Braitenberg's vehicles, Vehicle 12. Although the Braitenberg vehicles up to Vehicle 4 have been introduced in numerous robotics and AI textbooks, the thought experiments beyond Vehicle 4, which target higher-order cognitive mechanisms, are equally interesting. These higher-order cognitive vehicles concern logic, concepts, rules, regularities, and foresight. Among them, Vehicle 12 examines how a train of thought can be generated. Braitenberg implemented a nonlinear dynamical system, a logistic map (see section 5.1), in the vehicle, enabling sequences of values or "thoughts" in terms of neuronal activation to be generated in an unpredictable manner, but with hidden regularity, by means of chaos. Braitenberg argues that this vehicle seems to possess free will to manipulate its thoughts, at least from the perspective of outside observers of the vehicle. We will come back to this consideration in later chapters, as the issue of free will constitutes one of the main focuses of this book.
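For readers who want to see this ingredient of Vehicle 12 directly, the logistic map x(t+1) = a·x(t)·(1 − x(t)) at a = 4.0 is fully deterministic yet generates sequences that look random and diverge rapidly from nearby initial conditions. A minimal sketch (the initial values are arbitrary):

```python
def logistic(x, a=4.0):
    # The logistic map: deterministic, but chaotic at a = 4.0
    return a * x * (1.0 - x)

x, y = 0.300, 0.301              # two almost identical initial "thoughts"
for _ in range(20):
    x, y = logistic(x), logistic(y)
print(round(x, 3), round(y, 3))  # typically far apart: sensitive dependence
```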
So far, we have seen that Gibsonian and Neo-Gibsonian researchers
as well as behavior-based robotics researchers who emphasize embodied
cognition tend to regard the role of the brain as only that of a minimal
controller. This is because even very primitive controllers like the Braitenberg vehicles can generate quite complex behaviors when coupled with environmental stimuli. It is only natural to expect that even
higher-order cognition might emerge to some extent if further nonlin-
earity (like that employed in Vehicle 12) or some adaptability could be
added to the controller.
Now, we begin to consider minimal forms of an artificial brain,
namely neural network models that are characterized by their nonline-
arity and adaptability, when put into robot heads. Note, however, that
these attempts do not accord with our knowledge that the brain is a
complex organ, as we have seen in previous chapters. So, let's first contemplate how this discordance can be resolved.

5.4. Modeling the Brain at Different Levels

As a general understanding, neural activity in the brain can be described on the basis of processes that occur at multiple levels, starting from the molecular level (which accounts for processes such as protein synthesis and gate opening in synapses), through the neurochemical level (which accounts for signal transmission), the single-cell activity level (which accounts for processes such as spiking), and the cell assembly level in local circuits, up to the macroscopic regional activation level measurable with technologies such as fMRI or EEG.
The target level depends on the phenomenon to be reproduced. If we
aim to model the firing activity of a single cell, we describe precisely
how the membrane potential changes as a result of ion flow in a single
neuron. If we aim to model neuron interconnection phenomena, as
observed in the hippocampus by Ikegaya and colleagues (2004) using
optical recording techniques, the model should focus on how spik-
ing activity can spread across local circuits consisting of thousands of
interconnected neurons.
On the other hand, if we aim to model neural processing related
to the generation of cognitive behavior, it would not be a good idea to
model a single spiking neuron. Rather, such modeling would require the
reproduction of interactions between multiple brain regions to simulate
the activities of tens of billions of spiking neurons, something that is
impossible to perform with computer technology currently available to
us. Another problem besides computational power is the operation and
maintenance of such a tremendously complex simulator, as well as techniques for processing the results of simulations.
In fact, using supercomputers to reproduce neural circuits in the
brain presents some considerable challenges in terms of making the
simulation realistic. At present, we can obtain experimental data about
connectivity between different types of neurons by using techniques
such as labeling individual neurons with distinctly colored immuno-
fluorescence markers appearing in specially modified transgenic ani-
mals. These labeled neurons can be traced by confocal microscopy for
each section of the sampled tissue, and eventually a three-dimensional reproduction of the entire system of interconnected neurons can be prepared by stacking a number of the images. For example, the Blue Brain project led by Henry Markram (Markram et al., 2015) reconstructed the microcircuitry of the rat somatosensory neocortex, consisting of about 31,000 neurons, in a digital computer model. This simulation coped with neurophysiological details such as the reconstruction of the firing properties of 207 morpho-electrical types of neural cells in the circuit. The project is now attempting to reproduce the entire visual cortex, which consists of about a million columns, each comprising about 10,000 cells. If this is achieved, it may also be possible to create a cellular-level replica of the entire brain! Of course, such an accomplishment would provide us with vast amounts of scientific insight.
At the same time, however, I wonder how tractable such a realistic brain simulator would be. I imagine that for a realistic replica of the brain to function properly, it might also require realistic interactions with its environment. Therefore, it should be connected to a physical body of some sort to attain equally realistic sensory-motor interactions with the environment. It may take several years for the functions of a human-level brain replica to develop to a sufficiently high level by being exposed to realistic sensory-motor interactions, as we know that the development of cognitive capabilities in human infants requires a comparably long period of intensive parental care. Also, if a human-level brain replica must be embedded in various social contexts in human society to ensure its proper development, such an experiment may not be feasible for various other reasons, including the ethical problems associated with building such creatures. These issues will arise again in the final chapters of this book.
If the goal of modeling, though, is to build not a complete replica of
the human brain but rather an artifact for synthesis and analysis that
can be used to obtain a better understanding of the human mind and of cognition in general in terms of its organizational and functional principles, then such models must be built with an adequate level of abstraction to facilitate their manipulability. Analogously, Herbert Simon (1981) wrote that, in modeling humans, we might hope "to be able to characterize the main properties of the system and its behavior without elaborating the detail of either the outer or inner environments." Let us remember the analytical results obtained by Churchland and colleagues (2010), introduced in chapter 4, showing that the principal dimensions of ensembles of neuronal firing can be reduced to a few. Then, it might be reasonable to
to a few, as introduced in chapter 4. Then, it might be reasonable to
assume that the spiking of some hundreds of neurons can be reproduced
by simulating the activities of a few representative neural units modeled
as point masses. An interesting observation is that the macroscopic state
of collective neural activity changes continuously and rather smoothly
in low-dimensional space, even though the activity of each neuron at
each moment is discontinuous and noisy in regard to spiking. So, cogni-
tion and behavior might just correlate with this macroscopic state, which
changes continuously in a space whose dimensionality is several orders
lower than the original dimensionality of the space of spiking neurons.
Consequently, it might be worthwhile to consider a network model
consisting of a set of interacting units in which each unit essentially
represents a single dimension of the original collective activity of the
spiking neurons. Actually, this type of abstraction has been assumed in
the connectionist approach, which is described in detail in the seminal
book Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart, McClelland, and the PDP Research
Group (1986). They showed that simple network models consisting
of sets of activation units and connections can model various cogni-
tive processes, including pattern matching, dynamic memory, sequence
generation-recognition, and syntax processing in distributed activation
patterns of the units. Those cognitive processes are emergent proper-
ties of the interactive dynamics within networks, which result from the
adjustment of connectivity weights between the different activation
units caused by learning.
Among the various types of connectionist network models proposed, I find particularly interesting a dynamic neural network model called the recurrent neural network (RNN) (Jordan, 1986; Elman, 1990; Pollack,
1991). It is appealing because it can deal with both spatial and tem-
poral information structures by utilizing its own dynamic properties.
However, the most important characteristic of RNNs is their generality. As we proceed, we'll see that RNNs, even in their minimal
form, can exhibit general cognitive functions of learning, recognizing,
and generating continuous spatiotemporal patterns that achieve gener-
alization and compositionality while also preserving context sensitiv-
ity. These unique characteristics of RNNs are due to the fact that they
are nonlinear dynamical systems with high degrees of adaptability. It
is well known that any computational process can be reconstructed by
nonlinear dynamical systems as long as their parameters are adequately
set (Crutchfield & Young, 1989). A study by Hava Siegelmann (1995)
has established the possibility that analog computations by RNNs can
exhibit an ultimately complex computational capability that is beyond
the Turing limit. This can be understood by the fact that a nonlinear
dynamical system can exhibit complexity equivalent to an infinite state
machine depending on its parameters, as described in section 5.1.
Next, we start to look at a simpler neural network, the feed-forward
network that can learn input-output mapping functions for static pat-
terns. Then, we show how this feed-forward network can be extended
to RNNs, which can learn spatio-temporal patterns. At the same time,
we examine the basic characteristics of the RNN model from the per-
spective of nonlinear dynamical systems.

5.5. Neural Network Models

This section introduces three types of basic neural network models: the three-layered feed-forward network, the discrete-time RNN, and the continuous-time RNN (CTRNN). All three types have two distinct modes of operation. One is the learning mode, for determining a set of optimal connectivity weights from a training dataset; the other is the testing mode, in which an optimal output pattern is generated from an example test input pattern.

5.5.1 The Feed-Forward Network Model

The feed-forward network model is shown in Figure 5.14. It consists of an input unit layer, a hidden unit layer, and an output unit layer. Neural activations propagate from the input units to the hidden units and on to the output units through the connectivity weights spanning between

Figure 5.14. The feed-forward network model, illustrating the feed-forward activation and error back-propagation schemes. The right side of the figure shows how the delta error and the updated weights can be calculated through the error back-propagation process from the output layer to the hidden layer.

each layer. The objective of learning is to determine a set of optimal connectivity weights that can reconstruct the input-output patterns given in the target training dataset. The learning is conducted by utilizing the error back-propagation scheme, which was conceived independently by Shun-Ichi Amari (1967), Paul Werbos (1974), and Rumelhart and colleagues (1986).
We assume that the network consists of input units (indexed by k), hidden units (indexed by j), and output units (indexed by i), and that it is trained to produce input-output mappings for P different patterns. The activations of the units when presented with the nth pattern are denoted as $in_n^k$, $a_n^j$, and $o_n^i$, respectively, where $in_n^k$ is given as input. The potentials of the hidden and output units are denoted as $u_n^j$ and $u_n^i$, respectively, and the training target of the nth pattern is denoted as $\bar{o}_n^i$. Thus, the forward activation of an output unit is written as:

$$u_n^i = \sum_j w_{ij}\, a_n^j + b^i \qquad \text{(Eq. 7a)}$$

$$o_n^i = f(u_n^i) \qquad \text{(Eq. 7b)}$$

where $b^i$ is a bias value for each unit and $f$ is a sigmoid function. Similarly, for the hidden units:

$$u_n^j = \sum_k w_{jk}\, in_n^k + b^j \qquad \text{(Eq. 8a)}$$
$$a_n^j = f(u_n^j) \qquad \text{(Eq. 8b)}$$

Here, the goal of learning is to minimize the squared error between the target and the output, as shown in Eq. 9:

$$E^n = \frac{1}{2} \sum_i \left(\bar{o}_n^i - o_n^i\right)^2 \qquad \text{(Eq. 9)}$$

First, we formulate how to update the connection weights in the output layer, which are denoted as $w_{ij}$. Because the weights should be updated in the direction that minimizes the squared error, the update can be obtained by taking the derivative of $E^n$ with respect to $w_{ij}$ (with $\eta$ a learning rate):

$$\Delta w_{ij} = -\eta \frac{\partial E^n}{\partial w_{ij}}$$

The right-hand side of this equation can be decomposed as:

$$\frac{\partial E^n}{\partial w_{ij}} = \frac{\partial E^n}{\partial u_n^i} \frac{\partial u_n^i}{\partial w_{ij}}$$

By applying Eq. 7a to the second derivative on the right-hand side, we obtain:

$$\frac{\partial E^n}{\partial w_{ij}} = \frac{\partial E^n}{\partial u_n^i}\, a_n^j \qquad \text{(Eq. 10)}$$

Here, $\frac{\partial E^n}{\partial u_n^i}$ is the delta error of the ith unit, which is denoted as $\delta_n^i$. The delta error represents the contribution of the potential value of the unit to the squared error:

$$\delta_n^i = \frac{\partial E^n}{\partial u_n^i} = \frac{\partial E^n}{\partial o_n^i} \frac{\partial o_n^i}{\partial u_n^i}$$

By applying Eq. 9 to the first term on the right-hand side, and taking the derivative of the sigmoid function with respect to the potential for the second term, the delta error at the ith unit is obtained as follows:
$$\delta_n^i = \left(o_n^i - \bar{o}_n^i\right) o_n^i \left(1 - o_n^i\right) \qquad \text{(Eq. 11)}$$

Furthermore, by substituting the delta error into Eq. 10, the weight update can be written as:

$$\Delta w_{ij} = -\eta\, \delta_n^i\, a_n^j \qquad \text{(Eq. 12)}$$

Next, we obtain the updated connection weights of the hidden layer, which are denoted as $w_{jk}$, by taking the derivative of $E^n$ with respect to $w_{jk}$:

$$\Delta w_{jk} = -\eta \frac{\partial E^n}{\partial w_{jk}} = -\eta \frac{\partial E^n}{\partial u_n^j} \frac{\partial u_n^j}{\partial w_{jk}}$$

By substituting $\frac{\partial E^n}{\partial u_n^j}$ with the delta error at the jth unit, $\delta_n^j$, and replacing $\frac{\partial u_n^j}{\partial w_{jk}}$ with $in_n^k$ by applying Eq. 8a, the weight update can be written as:

$$\Delta w_{jk} = -\eta\, \delta_n^j\, in_n^k \qquad \text{(Eq. 13)}$$

Here, $\delta_n^j$ can be derived from the previously obtained $\delta_n^i$ as follows:

$$\delta_n^j = \frac{\partial E^n}{\partial u_n^j} = \sum_i \frac{\partial E^n}{\partial u_n^i} \frac{\partial u_n^i}{\partial a_n^j} \frac{\partial a_n^j}{\partial u_n^j} = \left( \sum_i \delta_n^i\, w_{ij} \right) a_n^j \left( 1 - a_n^j \right) \qquad \text{(Eq. 14)}$$


It should be noted that $\left(\sum_i \delta_n^i w_{ij}\right)$ on the right-hand side represents the sum of the delta errors $\delta_n^i$ back-propagated to the jth hidden unit, each multiplied by its connection weight $w_{ij}$. If there are more layers, the same error back-propagation scheme is repeated, in the course of which (1) the delta error at each unit in the current layer is obtained by back-propagating the errors from the previous layer through the connection weights, and (2) the incoming connection weights to the units in the
current layer are updated by using the obtained delta errors. The actual update of the connection weights is implemented by summing the individual updates over all P training patterns:

$$w^{new} = w^{old} + \sum_{n=1}^{P} \Delta w_n \qquad \text{(Eq. 15)}$$
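Eqs. 7 through 15 translate almost line for line into code. The following minimal sketch trains a three-layered feed-forward network on XOR as a toy dataset; the learning rate, hidden layer size, and epoch count are illustrative assumptions, and convergence may require a different random seed or more epochs:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda u: 1.0 / (1.0 + np.exp(-u))           # sigmoid, as in Eq. 7b

# XOR as a toy training set: P = 4 patterns, 2 inputs, 1 output
IN  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
TGT = np.array([[0], [1], [1], [0]], dtype=float)

w_jk = rng.normal(0, 1.0, (3, 2)); b_j = np.zeros(3)   # input -> hidden
w_ij = rng.normal(0, 1.0, (1, 3)); b_i = np.zeros(1)   # hidden -> output
eta = 0.5                                               # learning rate (assumed)

for epoch in range(20000):
    dw_ij = np.zeros_like(w_ij); dw_jk = np.zeros_like(w_jk)
    db_i = np.zeros_like(b_i);   db_j = np.zeros_like(b_j)
    for inp, tgt in zip(IN, TGT):
        a_j = f(w_jk @ inp + b_j)                       # Eq. 8
        o_i = f(w_ij @ a_j + b_i)                       # Eq. 7
        d_i = (o_i - tgt) * o_i * (1 - o_i)             # Eq. 11
        d_j = (w_ij.T @ d_i) * a_j * (1 - a_j)          # Eq. 14
        dw_ij += -eta * np.outer(d_i, a_j)              # Eq. 12
        dw_jk += -eta * np.outer(d_j, inp)              # Eq. 13
        db_i  += -eta * d_i; db_j += -eta * d_j
    w_ij += dw_ij; w_jk += dw_jk                        # Eq. 15 (batch update)
    b_i += db_i;   b_j += db_j

print(np.round(f(w_ij @ f(IN @ w_jk.T + b_j).T + b_i[:, None]), 2))
```

After training, the printed outputs should approximate the XOR targets 0 1 1 0, demonstrating that the hidden layer has learned an internal representation that the direct input-output mapping alone cannot supply.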

5.5.2 Recurrent Neural Network Models

Recurrent neural network models have been used to investigate the human cognitive capability of dealing with temporal processes, such as in motor control (Jordan, 1986) and language learning (Elman, 1990). Let's look at the exact form of the RNN model. Although various types of RNNs have been investigated so far (Jordan, 1986; Doya & Yoshizawa, 1989; Williams & Zipser, 1989; Elman, 1990; Pollack, 1991; Schmidhuber, 1992), it might be helpful to look at the Jordan-type RNN (Jordan, 1986), as it is one of the simplest implementations; it is illustrated in Figure 5.15.
This model has context units in addition to the current step input $in_t$ and the next step output $out_{t+1}$. The context units represent the context, or internal state, used in representing dynamic sequence patterns. In the forward dynamics, the current step context unit activation $c_t$ is mapped to its next step activation $c_{t+1}$. Let us consider an example of learning to generate a simple 1-dimensional cyclic sequence pattern of period 3 such as 0 0 1 0 0 1 0 0 1.

Figure 5.15. The Jordan-type RNN. (a) Forward activation and (b) the error back-propagation through time scheme in the cascaded RNN.
In this example, the input is given as the sequence 0 0 1 0 0 1 0 0 and the target output is given as this sequence shifted forward one step, 0 1 0 0 1 0 0 1. The learning of this type of sequence faces the hidden state problem, because the sequence includes the same target output value at different positions (i.e., the two 0s in the first and second steps of this cyclic pattern). Although this type of sequence cannot be learned by the feed-forward network by means of simple input-output mapping, the RNN model with context units can learn it if the context unit activation states can be differentiated for the ambiguous outputs; that is, if a 1-dimensional context activation sequence is formed, such as 0.2 0.4 0.8 0.2 0.4 0.8 0.2 0.4 0.8, which is mapped to the output activation sequence 0 0 1 0 0 1 0 0 1. Note that the Jordan-type RNN operated in discrete time steps can be regarded as a dynamical map, as shown in Eq. 2 in section 5.1, by considering that the current state $X_t$, consisting of the current step input and the current context state, is mapped to the next state $X_{t+1}$, consisting of the input and the context at the next step. The connectivity weights correspond to the parameter P of the dynamical map. Therefore, the RNN model can acquire desired dynamic structures by adequately tuning the connectivity weights as the learnable parameters of the dynamical map. For example, the aforementioned cyclic pattern of repeating 0 0 1 can be learned as a limit cycle attractor of period 3.
One of the important characteristics of RNNs is that they can exhibit dynamic activity autonomously, without receiving any inputs, when operated in a closed loop by feeding the prediction output for the next step back to the input of the current step. This phenomenon is explained qualitatively by Maturana and Varela (1980), who state that neural circuits are closed circuits without any input or output functions. Closed circuits maintain endogenous dynamics that are structurally coupled with sensory inputs and produce motor outputs, wherein sensory inputs are considered to be perturbative inputs to the endogenous dynamics. Although it is tempting to think that motor outputs are generated simply by mapping different sensory states to sensory reflexes in different situations, this process should in fact involve additional steps, including utilization of the autonomous internal dynamics of the closed network. Iterative interactions between interconnected neural units afford the RNN a certain amount of autonomy, which might well constitute the origin of voluntary action or contextual sensitivity. Later sections return to this point as we focus in on the issue of free will.
The RNN employs a learning scheme, back-propagation through time (BPTT) (Rumelhart et al., 1986; Werbos, 1988), which was developed by extending the conventional error back-propagation scheme in
oped by extending the conventional error back-propagation scheme in
the backward time direction to develop adequate dynamic activation
patterns in the context units. In the aforementioned feed-forward net-
work model, the connectivity weights between the output layer and the
hidden layer are updated by using the error generated between the tar-
get output and the generated output. Then the connectivity weights
between the input layer and the hidden layer are updated using the
delta error back-propagated from the output units to the hidden units.
However, in the case of the RNN, there are no error signals for the context output units, because there are no target values for them. Therefore, there is no direct means to update the connectivity weights between the context output units and the hidden units. In this situation, however, if the
delta error back-propagated from the hidden units to the context input
units is copied to the context output units in the previous step, the
connectivity weights between the context output units and the hidden
units can be updated by utilizing this copied information.
This scheme of the BPTT can be well understood by supposing that
an identical RNN is cascaded in the direction of time to form a deep,
feed-forward network as shown in Figure 5.15b. In this cascaded net-
work, the current step activation of the context output units is copied
to the context input units in the next step, which is repeated from the start
step to the end step in the forward computation. On the other hand,
in the backward computation for the BPTT, the error generated in the
output units at a particular time step is propagated through the context
input units to the previous step context output units, which is repeated
until the delta error signal reaches the context input units in the start
step. In the BPTT, the error signals originating from the output units of
different time steps are accumulated as the time step folds back and by
which all the connectivity weights of the identical RNN can be updated.
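The cascading idea can be made concrete with a small program. The sketch below trains a Jordan-type RNN on the period-3 pattern discussed earlier, unrolling the network through time and copying the delta error arriving at the context inputs back to the previous step's context outputs, exactly as described above. The network sizes, learning rate, and epoch count are illustrative assumptions and may need tuning:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda u: 1.0 / (1.0 + np.exp(-u))

T, H, C = 30, 8, 2                          # unrolled steps, hidden, context units
seq = np.tile([0.0, 0.0, 1.0], T // 3 + 1)  # the period-3 pattern 0 0 1 ...
W1 = rng.normal(0, 0.5, (H, 1 + C)); b1 = np.zeros(H)      # [in, context] -> hidden
W2 = rng.normal(0, 0.5, (1 + C, H)); b2 = np.zeros(1 + C)  # hidden -> [out, context]
eta = 0.05                                  # learning rate (assumed)

for epoch in range(3000):
    c, xs, hs, ys = np.zeros(C), [], [], []
    for t in range(T):                      # forward: context output copied onward
        xs.append(np.concatenate(([seq[t]], c)))
        hs.append(f(W1 @ xs[-1] + b1))
        ys.append(f(W2 @ hs[-1] + b2))
        c = ys[-1][1:]
    gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
    gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
    d_c = np.zeros(C)                       # delta arriving at the context outputs
    for t in reversed(range(T)):            # backward through time
        d_y = np.concatenate(([ys[t][0] - seq[t + 1]], d_c))  # output error + copied delta
        d_y = d_y * ys[t] * (1.0 - ys[t])                     # through the sigmoid
        d_h = (W2.T @ d_y) * hs[t] * (1.0 - hs[t])
        gW2 += np.outer(d_y, hs[t]); gb2 += d_y
        gW1 += np.outer(d_h, xs[t]); gb1 += d_h
        d_c = (W1.T @ d_h)[1:]              # copied back to the previous context output
    W1 -= eta * gW1; b1 -= eta * gb1; W2 -= eta * gW2; b2 -= eta * gb2

x_in, c, out = 0.0, np.zeros(C), []         # closed-loop generation test
for _ in range(9):
    y = f(W2 @ f(W1 @ np.concatenate(([x_in], c)) + b1) + b2)
    x_in, c = float(y[0] > 0.5), y[1:]
    out.append(int(x_in))
print(out)        # ideally settles into the period-3 cycle of 0 0 1 repeating
```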
The capability of the RNN for self-organizing, context-dependent information processing can be understood well by looking at a prominent research outcome presented by Jeffrey Elman (1991) on the topic of language learning utilizing RNN models. He showed that a version of the RNN, now called an Elman net (Figure 5.16a), can learn to extract grammatical structures from given exemplar sentences. In his simulation experiment, the example sentences for training the network were generated using a lexicon of 23 items, including 8 nouns, 12 verbs, the relative pronoun
Figure 5.16. Sentence learning experiments done by Elman (1991). (a) The Elman network, in which the current word input and the current context feed forward to a next-word prediction output, with a context loop carrying the context to the next step. (b) The context-free grammar employed:

S → NP VP "."
NP → PropN | N | N RC
VP → V (NP)
RC → who NP VP | who VP
N → boy | girl | cat | dog | boys | girls | cats | dogs
PropN → John | Mary
V → chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives

(c) An example sentence generated from the grammar: "dog who boys feed sees girl."

"who," and a period for indicating the ends of sentences. The sentence generation followed the context-free grammar shown in Figure 5.16b. As described in chapter 2, various sentences can be generated by recursively applying substitution rules starting from S at the top of the tree representing the sentence structure. In particular, the presence of a relative clause with "who" allows the generation of recursively complex sentences such as "Dog who boys feed sees girl" (see Figure 5.16c). In the experiment, the Elman network was used to generate successive predictions of the words in sentences after training on exemplar sentences. More specifically, words were input one at a time at each step, and the network predicted the next word as its output. After the prediction, the correct target output was shown and the resultant prediction error was back-propagated, thereby adapting the connectivity weights. At the end of each sentence, the first word of the next sentence was input. This process was repeated for thousands of exemplar sentences generated from the aforementioned grammar. It is noted that the Elman network in this experiment employed a local representation in a winner-take-all manner, using a 31-bit vector for both the input and the output units. A particular word was represented by the activation of a corresponding unit out of the 31 units. The input and the output units had the same representation.
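For concreteness, the grammar of Figure 5.16b can be turned into a toy sentence generator, as below. Note that this sketch over-generates: Elman's actual generator additionally enforced number agreement and verb argument structure, which are omitted here:

```python
import random

# The context-free grammar of Figure 5.16b as a Python dict; keys are
# nonterminals, anything not in the dict is a terminal word.
GRAMMAR = {
    "S":     [["NP", "VP", "."]],
    "NP":    [["PropN"], ["N"], ["N", "RC"]],
    "VP":    [["V"], ["V", "NP"]],                 # VP -> V (NP)
    "RC":    [["who", "NP", "VP"], ["who", "VP"]],
    "N":     [["boy"], ["girl"], ["cat"], ["dog"],
              ["boys"], ["girls"], ["cats"], ["dogs"]],
    "PropN": [["John"], ["Mary"]],
    "V":     [["chase"], ["feed"], ["see"], ["hear"], ["walk"], ["live"],
              ["chases"], ["feeds"], ["sees"], ["hears"], ["walks"], ["lives"]],
}

def expand(symbol):
    if symbol not in GRAMMAR:                      # terminal: emit the word
        return [symbol]
    body = random.choice(GRAMMAR[symbol])          # pick a substitution rule
    return [word for s in body for word in expand(s)]

random.seed(3)
for _ in range(3):
    print(" ".join(expand("S")))
```

Because NP can rewrite to N RC and RC itself contains NP and VP, the recursion occasionally produces the nested relative clauses that make the prediction task genuinely context dependent.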
The analysis of network performance after the training of the tar-
get sentences showed various interesting characteristics of the network
behaviors. First, look at the simple sentence cases. When a singular noun "boy" was input, all three singular verb categories as well as "who" for a relative clause were activated as possible predicted next words, and all other words were not activated at all. On the other hand, when a plural noun "boys" was input, all plural verbs and "who" were activated. This means that the network seems to capture the singular-plural agreement between subject nouns and verbs. Moreover, the actual activation values encoded the probability distribution over the next coming words, because "boy" or "boys" alone cannot determine the next word deterministically. It was also observed that the network captured verb argument structure as well. For example, after the two successive words "boy lives" were input, a period was predicted. In the case of "boy sees," both a period and noun words were activated as the next prediction. Finally, in the case of "boy chases," only noun words were activated. The network seems to understand that "live" and "chase" are an intransitive verb and a transitive verb, respectively. It also understands that "see" can be both.
Although the presence of a relative clause makes a sentence more complex, singular-plural agreement was preserved. An example exists in the following paired sentences:

1. boy who boys chase chases boy
2. boys who boys chase chase boy

Actually, the network activated the singular verbs after being given "boy who boys chase" as input, and activated the plural ones after "boys who boys chase." To keep singular-plural agreement between subjects and distant verbs, the information about whether the subject is singular or plural had to be preserved internally. Elman found that the context activation dynamics were adequately self-organized in the network for this purpose.

5.5.3 Continuous-Time Recurrent Neural Network Model

Next, we look at an RNN model operated in continuous time, known as the continuous-time recurrent neural network (CTRNN) (Doya & Yoshizawa, 1989; Williams & Zipser, 1989). Let us consider a CTRNN model without an explicit layer structure, in which each neural unit receives synaptic inputs from all other neural units as well as from its own feedback (see Figure 5.17a for an example). In this model, the activation dynamics of each neural unit can be described in terms of the differential equations shown in Eq. 16:

$$\tau \dot{u}^i = -u^i + \sum_j w_{ij}\, a^j + I^i \qquad \text{(Eq. 16a)}$$

$$a^i = 1 / (1 + e^{-u^i}) \qquad \text{(Eq. 16b)}$$

The left-hand side of Eq. 16a represents the time differential of the potential of the ith unit multiplied by a time constant $\tau$, and it is equated with the sum of the synaptic inputs minus the current potential (the term $-u^i$). This means that positive and negative synaptic inputs increase and decrease the potential of the unit, respectively. If the sum of the synaptic inputs is zero, the potential converges toward zero. The time constant $\tau$, with its positive value, plays the role of a viscous damper: the larger or smaller the time constant $\tau$, the slower or faster the change of the potential $u^i$. You may notice that this equation is analogous to Eq. 3, which represents a general form of a continuous-time dynamical system.
Next, let's examine the dynamics of CTRNNs. Randall Beer (1995a) showed that even a small CTRNN consisting of only three neural units can generate complex dynamical structures depending on its parameters, especially the values of the connection weights. The CTRNN model examined by Beer consists of three neural units, as shown in Figure 5.17a. Figure 5.17b-d shows that different attractor configurations can appear depending on the connection weights. An interesting observation is that multiple attractors can be generated simultaneously with a given specific connection weight matrix. The eight stable fixed-point attractors and the two limit-cycle attractors appear with their respective specific connection weights, as shown in Figure 5.17b and c, and the attractor toward which the state trajectories converge depends on the initial state. In Figure 5.17d, a single chaotic attractor appears with a different connection weight matrix. This type of complexity in attractor configurations might be the result of mutual nonlinear interactions between multiple neural units. In summary, then, CTRNNs can autonomously generate various types of dynamic behaviors, ranging from simple fixed-point attractors through limit cycles to complex chaotic attractors, depending on the parameters represented by the connection weights (this characteristic holds for the discrete-time RNN as well [Tani & Fukumura, 1995]). This feature can be used for memorizing multiple temporal patterns of perceptual signals or movement sequences, which will be especially important when we consider MTRNNs later.
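A minimal numerical sketch makes the multistability point tangible. The code below integrates Eq. 16 with Euler's method (i.e., the update of Eq. 17) for three units in mutual inhibition with self-excitation; the weights and biases are illustrative choices, not Beer's published parameters. Different initial states fall into different fixed-point attractors:

```python
import numpy as np

# Three fully connected units: self-excitation +8, cross-inhibition -8 (assumed)
W = 8.0 * np.eye(3) - 8.0 * (np.ones((3, 3)) - np.eye(3))
I = np.full(3, -2.0)                           # constant bias inputs (assumed)

def simulate(u0, tau=2.0, dt=0.1, steps=1000):
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-u))           # Eq. 16b
        u += (dt / tau) * (-u + W @ a + I)     # Eq. 16a; dt/tau is the
        # leaky-integrator rate of the Euler-discretized form (Eq. 17a)
    return 1.0 / (1.0 + np.exp(-u))

print(np.round(simulate([0.5, 0.0, 0.0]), 2))  # unit 1 wins: one attractor
print(np.round(simulate([0.0, 0.0, 0.5]), 2))  # unit 3 wins: another attractor
```

Whichever unit starts slightly ahead ends up fully active while it suppresses the other two, so the same weight matrix stores several coexisting fixed-point attractors, a toy version of the multistability shown in Figure 5.17b.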

Figure 5.17. Different attractor configurations appearing in the dynamics of a continuous-time RNN model consisting of three neural units, each receiving synaptic inputs from the other two units as well as its own recurrent one. (a) The network architecture, (b) eight stable fixed-point attractors denoted as black points, (c) two limit cycles denoted as closed curves with arrows, and (d) a chaotic attractor. Panels (b), (c), and (d) are adapted from Beer (1995a) with permission.

In the case of a CTRNN characterized by the time constant parameter $\tau$, the BPTT scheme for supervised learning is used with slight modifications to its original form. Figure 5.18 illustrates how the BPTT scheme can be implemented in a CTRNN.
First, the forward activation dynamics of the CTRNN over n steps are computed by following Eq. 17 with a given initial neural activation state at each unit. Eq. 17 is obtained by converting Eq. 16 from a differential equation into a difference equation using Euler's method, for the purpose of numerical computation.

$$u_t^i = \left(1 - \frac{1}{\tau_i}\right) u_{t-1}^i + \frac{1}{\tau_i} \left( \sum_j w_{ij}\, a_{t-1}^j + I_{t-1}^i \right) \qquad \text{(Eq. 17a)}$$

$$a_t^i = 1 / (1 + e^{-u_t^i}) \qquad \text{(Eq. 17b)}$$

What we have here is a leaky-integrator neuron with a decay rate of $1 - \frac{1}{\tau_i}$. After the forward computation with these leaky-integrator neural

Figure 5.18. An extension of the error back-propagation scheme to CTRNNs. The figure shows how the error generated at the nth step is propagated back to the (n-2)nd step. Arrows with continuous lines, dotted lines, and chain lines denote the back-propagated error generated at the nth step, the (n-1)st step, and the (n-2)nd step, respectively. These errors continue to back-propagate along the forward connections over time.

units, the back-propagation computation is initiated by computing the error between the training target and the current output at the nth step, giving the delta error $\delta_n^0$ at the output unit (unit 0) in the nth step. This delta error is then back-propagated to the 1st, 2nd, 4th, and 5th units, as denoted by the continuous arrows, through the forward connections. The delta errors propagated to these local units are further back-propagated to the 1st, 2nd, 3rd, and 4th units in the (n-1)st step, and to the 1st, 2nd, and 4th units in the (n-2)nd step. Additionally, the delta errors generated at the output unit in steps n-1 and n-2 are back-propagated in the same manner, as denoted by the dotted-line and chain-line arrows, respectively. This back-propagation process is recursively repeated until the 1st step of the sequence is reached. One important note here is that the way of computing the delta error in the CTRNN differs from that in the conventional RNN, because of the leaky-integrator term in the forward activation dynamics defined in Eq. 17a. The delta error at the ith unit, $\frac{\partial E}{\partial u_t^i}$, whether it is an output unit or an internal unit, is recursively calculated from the following formula, in which $\delta_{ik}$ denotes the Kronecker delta:

$$\frac{\partial E}{\partial u_t^i} =
\begin{cases}
\left( o_t^i - \bar{o}_t^i \right) o_t^i \left( 1 - o_t^i \right) + \left( 1 - \dfrac{1}{\tau_i} \right) \dfrac{\partial E}{\partial u_{t+1}^i} & i \in \mathrm{Out} \\[2ex]
\displaystyle\sum_{k \in N} \dfrac{\partial E}{\partial u_{t+1}^k} \left[ \delta_{ik} \left( 1 - \dfrac{1}{\tau_i} \right) + \dfrac{1}{\tau_k}\, w_{ki}\, a_t^i \left( 1 - a_t^i \right) \right] & i \notin \mathrm{Out}
\end{cases}
\qquad \text{(Eq. 18)}$$
From the right-hand side of Eq. 18 it can be seen that the ith unit in the current step t inherits a large portion $\left(1 - \frac{1}{\tau_i}\right)$ of the delta error $\frac{\partial E}{\partial u_{t+1}^i}$ from the same unit in the next step t+1 when its time constant $\tau_i$ is relatively large. It is noted that Eq. 18 turns out to be the conventional, discrete-time version of BPTT when $\tau_i$ is set to 1.0. This means that, in a network with a large time constant, error back-propagates through time with a small decay rate. This enables the learning of long-term correlations latent in target time profiles by filtering out fast changes in the profiles. All delta errors propagated from different units are summed at each unit in each step. For example, at the 1st unit in the (n−1)st step, the delta errors propagated from the 0th, 2nd, and 1st units are summed to obtain the error for the (n−1)st step. By utilizing the delta errors computed for local units at each step, the updated weights for the input connections to those units in step n−1 are obtained by following Eq. 13.
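As a concrete (if schematic) rendering of Eq. 18, the sketch below computes the delta errors backward through time from recorded forward activations. The weight-gradient step of Eq. 13 is omitted, and all dimensions are illustrative assumptions.

```python
import numpy as np

def ctrnn_deltas(w, tau, a, targets, out_idx):
    """Delta errors dE/du_t^i for a CTRNN, following Eq. 18.

    w       -- (N, N) weights, w[k, i] = connection from unit i to unit k
    tau     -- (N,) time constants
    a       -- (T, N) recorded forward activations
    targets -- (T, N) training targets (used only at output units)
    out_idx -- list of output unit indices
    """
    T, N = a.shape
    is_out = np.zeros(N, dtype=bool)
    is_out[out_idx] = True
    delta = np.zeros((T + 1, N))          # delta[T] = 0 closes the recursion
    for t in range(T - 1, -1, -1):
        # Output units: direct error plus the leak-inherited term (1 - 1/tau_i)
        err = (a[t] - targets[t]) * a[t] * (1.0 - a[t])
        delta[t, is_out] = (err[is_out]
                            + (1.0 - 1.0 / tau[is_out]) * delta[t + 1, is_out])
        # Internal units: sum over all units k that unit i projects to
        for i in np.where(~is_out)[0]:
            kron = np.zeros(N)
            kron[i] = 1.0
            gain = (kron * (1.0 - 1.0 / tau)
                    + (1.0 / tau) * w[:, i] * a[t, i] * (1.0 - a[t, i]))
            delta[t, i] = np.sum(delta[t + 1] * gain)
    return delta[:T]
```

When $\tau_i$ is large, the first term dominates the recursion at that unit, which is exactly the small-decay error flow described above.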
Although the aforementioned models of feed-forward networks, RNNs, and CTRNNs employ the error back-propagation scheme as the central mechanism for learning, its biological plausibility in neuronal circuits has been questioned. However, some supportive evidence has been provided by Mu-ming Poo and colleagues (Fitzsimonds et al., 1997; Du & Poo, 2004), as well as by Harris (2008) in related discussions. It has been observed that the action potential back-propagates through dendrites when postsynaptic neurons on the downstream side fire upon receiving synaptic inputs above a threshold from the presynaptic neurons on the upstream side. What Poo has further suggested is that such activity-dependent synaptic inhibition or potentiation can propagate backward across not just one but several successive synaptic connections. We can, therefore, speculate that the retrograde axonal signal (Harris, 2008) conveying error information might propagate from the peripheral area of sensory-motor input-output to the higher-order cortical area, modulating its contextual memory structures by passing through multiple layers of synapses and neurons in real brains, much as the delta error signal back-propagates from the output units to the internal units in the CTRNN model. In light of this evidence, the biological plausibility of this approach appears promising.
It should also be noted, however, that counterintuitive results have been obtained by other researchers. For example, using the echo-state network (Jaeger & Haas, 2004), a version of the RNN in which internal units are connected with randomly predetermined constant weights and only the output connection weights from the internal units are modulated without using error back-propagation, Jaeger and Haas showed that quite complex sequences can be learned with this scheme. My question here would be what sorts of internal structures can be generated without the influence of error-related training signals. The next section introduces neurorobotics studies that use some of the neural network models described above, including the feed-forward network model and the RNN model.
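The echo-state scheme is easy to reproduce in miniature: the recurrent weights below stay fixed and random, and only the linear readout is trained, here by ridge regression (a common choice; the sizes, spectral radius, and toy task are all illustrative assumptions rather than details from Jaeger and Haas's study).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 1000

# Fixed random reservoir, rescaled so its spectral radius is below 1
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.normal(0.0, 0.5, N)

# Toy task: predict the next value of a sinusoidal sequence
u = np.sin(0.2 * np.arange(T + 1))
x, states = np.zeros(N), []
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # internal weights are never trained
    states.append(x.copy())
X, y = np.array(states), u[1:]

# Train only the readout weights, with ridge regression
w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
print("training MSE:", np.mean((X @ w_out - y) ** 2))
```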

5.6. Neurorobotics from the Dynamical Systems Perspective

Although Rodney Brooks did not delve deeply into research on adaptive or learnable robots, other researchers have explored such topics while seriously considering the issues of embodiment emphasized in the behavior-based approach. A representative researcher in this field, Randall Beer (2000), proposed the idea of considering the structural coupling between the neural system, the body, and the environment, as illustrated in Figure 5.19.
The internal neural system interacts with its body and the body inter-
acts with its surrounding environment, so the three can be viewed as a
coupled dynamical system. In this setting, it is argued that the objective
of neural adaptation is to keep the behavior of the whole system within a
viable zone. Obviously, this thought is quite analogous to the Gibsonian
and Neo-Gibsonian approaches as described in section 5.2.
In the 1990s, various experiments were conducted in which different neural adaptation schemes were applied in the development of sensory-motor coordination skills in robots. These schemes included: evolutionary learning (Koza, 1992; Cliff et al., 1993; Beer, 1995; Nolfi & Floreano, 2000; Di Paolo, 2000; Ijspeert, 2001; Ziemke & Thieme, 2002; Ikegami & Iizuka, 2007), which uses artificial evolution of genomes encoding connection weights for neural networks based on principles such as survival of the fittest; value-based reinforcement learning (Edelman, 1987; Meeden, 1996; Shibata & Okabe, 1997; Morimoto & Doya, 2001; Krichmar & Edelman, 2002; Doya & Uchibe, 2005; Endo et al., 2008), wherein the connection weights are modified in the direction of reward maximization; and supervised and imitation learning (Tani & Fukumura, 1997; Gaussier et al., 1998; Schaal, 1999; Billard, 2000; Demiris & Hayes, 2002; Steil et al., 2004), wherein a teacher or imitation targets exist.

Figure 5.19. The neural system, the body, and the environment are considered as a coupled dynamical system by Randall Beer (2000).

Most of these experiments were conducted in
minimal settings with rather simple robots (mobile robots with range
sensors in many cases) with small-scale neural controllers (influenced
by Gibsonian and behavior-based philosophy). Although the experi-
ments might have lacked scalability both with respect to engineering applications and to accounting for human cognitive competence, they do
demonstrate that nontrivial structures in terms of minimal cognition
can emerge in the structural coupling between simple neural network
models and the environment.
Let's look now at a few examples of these studies from among the many remarkable studies that have been conducted. In particular, the following studies emphasize the dynamical systems perspective in developing and generating minimal cognitive behaviors in neurorobots.

5.6.1 Evolution of Locomotion with Limit Cycle Attractors

It is widely held that rhythmical movements in animals, such as loco-


motion, are generated by neural circuits called central pattern genera-
tors (CPGs), which generate oscillatory signals by means of limit-cycle
dynamics in neural circuits (Delcomyn, 1980). By constructing synthetic

simulation models and conducting robotics studies based on the concept of CPGs, a number of researchers have investigated the adaptation mechanisms of walking locomotion in various animals: six-legged insects (Beer, 1995b), four-legged dogs (Kimura et al., 1999), and two-legged humans (Taga et al., 1991; Endo et al., 2008), as well as walking and swimming via the spinal oscillation of four-legged salamanders (Ijspeert, 2001).
In particular, Beer (1995b) investigated how stable walking can be achieved by six-legged insect-like robots under different conditions of interaction between the internal neural system and the environment by utilizing artificial evolution within CTRNN models. In this artificial evolution scheme, the connectivity weights in the CTRNN are randomly modulated by mutation. If some robots exhibit better performance in terms of the predefined fitness functions with the modulated weights in their networks as compared with others, these robots are allowed to reproduce, with their offspring inheriting the same connectivity weights of the networks. Otherwise, the characteristic connectivity weights within networks are not reproduced. Thus, connectivity weights are adapted in the direction of maximizing fitness over generations of population dynamics.
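The logic of this scheme can be condensed into a few lines. The fitness function below is a placeholder (in Beer's experiments it was the forward distance walked in simulation), and mutation with truncation selection is just one simple variant of the artificial evolution described here.

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(genome):
    # Placeholder: in Beer's setup, this would load the genome into the
    # CTRNN controller, run the simulated body, and return the forward
    # walking distance achieved within a fixed time period.
    return -np.sum((genome - 0.5) ** 2)

pop = rng.normal(0.0, 1.0, (30, 50))          # 30 genomes of 50 weights each
for generation in range(100):
    scores = np.array([fitness(g) for g in pop])
    elite = pop[np.argsort(scores)[-10:]]      # the fittest genomes reproduce
    # Offspring inherit parental connectivity weights plus random mutation
    children = elite[rng.integers(0, 10, 20)] + rng.normal(0.0, 0.1, (20, 50))
    pop = np.vstack([elite, children])
print("best fitness:", max(fitness(g) for g in pop))
```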
In Beer's model, each leg is controlled by a local CTRNN consist-
ing of a small number of neural units. Gait and motor outputs serve as
sensory inputs for the network in terms of the torques generated when
the legs move forward and backward. The six local CTRNNs are sparsely
connected to generate overall body movement. During the evolutionary
learning stage, the connectivity weights within the local CTRNN as well
as the interconnections between the six local CTRNNs are mutated, and
the fitness of the individual is evaluated by measuring the maximum for-
ward walking distance within a specific time period.
An interesting finding from Beer's simulation experiments on artificial
evolution is that evolved locomotion mechanisms were qualitatively dif-
ferent under different evolutionary conditions. First, if the sensory inputs
were constantly enabled during evolution, a reflective pattern genera-
tor evolved. Because the leg movements were generated by means of
reflections of sensory inputs, the locomotive motor pattern was easily
distorted when the sensory inputs were disrupted. Second, if the sensory
inputs were made inaccessible completely from the network during evo-
lution, a CPG-type locomotive controller evolved. This evolved control-
ler could generate autonomous rhythmic oscillation without having any


Figure 5.20. Six-legged locomotion patterns generated by the evolved mixed pattern generator: (a) shows a gait pattern with sensory feedback, and (b) shows one without sensory feedback. The case with sensory feedback shows more stable oscillation with tight coordination among the different legs. Adapted from (Beer, 1995b) with permission.

external drives, by means of the self-organizing limit cycle attractor in the


CTRNN. Third, if the presence of the sensory inputs was made unreli-
able during evolution, a mixed pattern generator evolved. Although this
controller could generate robust basic locomotion patterns even when
the sensory inputs were disrupted, it demonstrated better locomotion performance when sensory feedback was available (Figure 5.20).
In summary, these experiments showed that limit cycle attractors can
emerge in the course of evolving the CTRNN controller for generating
locomotion in different ways, depending on the parameters set for the
evolution process. When the sensory feedback is available, a limit cycle is
organized in the coupling between the internal dynamics of the CTRNN
and the environment dynamics. Otherwise, the limit cycle attractor
appears in the form of an autonomous dynamic in the CTRNN alone.
Beer speculated that the mixed strategy that emerges under the condi-
tion of unreliable sensory feedback is the most typical among biological
pattern generators.

5.6.2 Developing Sensory-Motor Coordination

Figure 5.21. The Khepera robot, which features two wheel motors and eight infrared proximity sensors mounted on the periphery of the body. Source: Wikipedia.

Schemes of evolutionary learning have been applied in robots for various goal-directed tasks beyond locomotion by developing the sensory-motor coordination adequate for such tasks. Scheier, Pfeifer, and Kuniyoshi (1998) showed that nontrivial perceptual categorization capabilities can
be acquired by inducing interactions between robots and their environ-
ments. They prepared a workspace for a miniature mobile robot (55 mm in diameter), called Khepera (Figure 5.21), where large and small cylin-
drical objects were placed at random.
The behavioral task for the robot was to approach large cylindrical
objects and to avoid small ones. This task is far from trivial because the
sensing capabilities of the Khepera robot are quite limited, consisting of
just eight infrared proximity sensors attached to the periphery of the body.
Therefore, the robot can acquire eight directional range images represent-
ing distances to obstacles, but detection occurs only when an obstacle is
within 3 cm, and the images are of low resolution. Scheier and colleagues implemented a feed-forward neural network model that receives six directional range images from sensors at the front and controls the speeds of the left and right motors. The synaptic weights necessary for determining the characteristics of the mapping from sensor inputs to motor outputs were obtained in an evolutionary way. The fitness value for evolutionary selection increased when the robot stayed closer to large cylindrical objects and decreased when the robot stayed closer to small ones.
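Stated as code, the selection criterion is only a few lines; the thresholds and increments here are illustrative stand-ins, with the 3-cm sensing range taken from the text.

```python
def fitness_increment(distance_m, object_is_large, sensing_range=0.03):
    """Per-time-step fitness: reward staying near large cylinders and
    punish staying near small ones; beyond sensing range, no effect."""
    if distance_m > sensing_range:
        return 0.0
    return 1.0 if object_is_large else -1.0
```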
It was reported that when the robot evolved a successful network to accomplish the task, it would wander around the environment until it found an object and then would start circling it (Figure 5.22).

Figure 5.22. An illustration of the behavior trajectory generated by a successfully evolved Khepera robot. It would wander around the environment until it found a cylinder of large size and then would start circling it.

The robot
would eventually leave its trajectory if the object was a small cylindrical one; otherwise, it would keep circling if the object was large. Because it was difficult to distinguish between large and small cylindrical objects by means of passive perception using the installed low-resolution proximity sensors, the evolutionary processes found an effective scheme based on active perception. In this scheme, the successfully evolved robot circled around a cylindrical object, whether small or large, simply by following the curvature of its surface, utilizing information from proximity sensors on one side of its body. A significant difference was found between large and small objects in terms of the way that the robot circled the object, generating different profiles of motor output patterns that enabled the different object types to be identified. This example clearly shows that this type of active perception is essential for the formation of the robot's behavior, whereby perception and action become inseparable. Eventually, sensory-motor coordination was naturally selected for active perception in their experiment.
Nolfi and Floreano (2002) showed another good example of evolution based on active perception, but in this case with the added element of self-organization, the so-called behavior attractor. They showed that the Khepera robot equipped with a simple perceptron-type neural network model can evolve to distinguish between walls and cylindrical objects, avoiding walls while staying close to cylindrical objects. After

the process of evolution, the robot moves around by avoiding walls and
staying close to cylindrical objects whenever encountering them. Here,
staying close to cylindrical objects does not mean stopping. Rather,
the robot continues to move back and forth and/or left and right while
maintaining its relative angular position to the object almost constant.
A steady oscillation of sensory-motor patterns with small amplitude was observed while the robot stayed close to the object. Nolfi and Floreano inferred that the robot could keep its relative position by means of active perception realized by a limit cycle attractor developed in the sensory-motor coupling with the object. These two experimental studies with the Khepera robot show that some nontrivial schemes for sensory-motor coordination can emerge via network adaptation through evolution even when the network structure is relatively simple.
Before closing this subsection, I would like to introduce an intriguing scheme proposed by Gaussier and colleagues (1998) for generating immediate imitation behaviors in robots. The scheme is based on the aforementioned thoughts by Nadel (see section 5.2) that immediate imitation as a means of communication can be generated by synchronization achieved by a simple sensory-motor mapping organized under the principle of homeostasis. Gaussier and colleagues built an arm robot with a vision camera that learned a mapping between the arm's position as perceived in the visual frame and the proprioception (joint angles) of its own arm by using a simple perceptron-type neural network model. After the learning, another robot of a similar configuration was placed in front of the robot and moved its arm (Figure 5.23).

Figure 5.23. A robot generates immediate imitation of another robot's movement by using an acquired visuo-proprioceptive mapping (Gaussier et al., 1998).

When the self-robot perceived the arm of the other robot as its own, its own arm moved in synchrony with that of the other, for the sake of minimizing the difference between the current proprioceptive state and its estimate obtained from the output of the visuo-proprioceptive map under the homeostasis principle. This study nicely illustrates that immediate imitation can be generated as synchronization by using a simple sensory-motor mapping, which also supports the hypothesis of the "like me" mechanism described in section 5.2.
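A minimal sketch of this homeostatic loop follows: a learned map estimates the joint angles that would put the robot's own hand at the visually perceived position, and the arm is driven so as to reduce the discrepancy between that estimate and the current proprioception. The linear map and gain below are stand-ins for the perceptron-type network of the original study.

```python
import numpy as np

def imitation_step(joint_angles, seen_hand_xy, visuo_prop_map, gain=0.2):
    """One homeostatic update: move the joints toward the posture that the
    learned map associates with the visually perceived hand position."""
    estimated_angles = visuo_prop_map(seen_hand_xy)
    return joint_angles + gain * (estimated_angles - joint_angles)

# Stand-in for the learned mapping: a fixed linear transform
A = np.array([[1.0, 0.2],
              [0.3, 1.0]])
vp_map = lambda xy: A @ xy

angles = np.zeros(2)
other_hand_xy = np.array([0.4, -0.1])   # the other robot's hand, seen as "mine"
for _ in range(50):
    angles = imitation_step(angles, other_hand_xy, vp_map)
print(angles)   # settles at the posture matching the perceived position
```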
Next, we look at a robotics experiment that uses sensory-motor mapping, but in a context-dependent manner.

5.6.3 Self-Organization of Internal Contextual Dynamic Structures in Navigation

We should pause here to remind ourselves that the role of neuronal sys-
tems should not be regarded as a simple mapping from sensory inputs
to motor outputs. Recalling Maturana and Varela (1980), neural cir-
cuits are considered to exhibit endogenous dynamics, wherein sensory
inputs and motor outputs are regarded as perturbations of and readouts
from the dynamical system, respectively. This should also be true if
we assume dynamic neural network models with recurrent connections,
such as RNNs or CTRNNs. The following study shows such an example
from my own investigations on learning goal-directed navigation, which
was done in collaboration with Naohiro Fukumura (Tani & Fukumura,
1993, 1997). The experiment was conducted with a real mobile robot
named Yamabico (Figure 5.24a).
The task was designed in such a way that a mobile robot with limited
sensory capabilities learns to navigate given paths in an obstacle envi-
ronment through teacher supervision. It should be noted that the robot
cannot access any global information, such as its position in the X-Y
coordinate system in the workspace. Instead, the robot has to navigate
the environment depending solely on its own ambiguous sensory inputs
in the form of range images representing the distance to surrounding
obstacles.
First, let me explain a scheme called branching that is implemented
in low-level robot control. The robot is preprogrammed with a colli-
sion avoidance maneuvering scheme that determines its reflex behav-
ior by using inputs from the range sensors. The range sensors perceive
range images from 24 angular directions covering the front of the robot.


Figure 5.24. Yamabico robot and its control architecture. (a) The mobile robot Yamabico employed in this experiment. (b) An example of a collision-free movement trajectory that contains four branching points labeled 1 to 4. (c) The corresponding flow of range sensor inputs, where brighter (closer) and darker (farther) parts indicate their ranges. The exact range profile at each branching point is shown on the right. Arrows indicate the branching decision to advance to a new branch or to stay at the current one. (d) The employed RNN model, which receives inputs from range sensors and outputs the branching decision at each branching point.

The robot essentially moves toward the largest open space in a forward
direction while maintaining equal distance to obstacles on its left and
right sides. Then, a branching decision is required when a new open
space appears. Figure 5.24b,c illustrates how branching takes place in
this workspace.

Once this branching scheme is implemented in the robot, the essence of learning how to navigate the environment is reduced to the task of learning the correct branching sequences associated with the sensory inputs at each branching point. Here, the RNN model is used for learning the branching sequences. Figure 5.24d shows how the Jordan-type RNN (Jordan, 1986) explained previously was used in the current navigation task. In this architecture, the original 24-dimensional range images are reduced to a three-dimensional vector by a preprocessing scheme. This reduced sensory vector is provided as input to the RNN at each branching step, and the RNN outputs the corresponding branching decision along with the context outputs. Learning proceeds under supervision, wherein the experimenter trains the robot to generate correct branching on specified target routes. The target route in this experiment is designed such that cyclic trajectories emerge in the form of a figure-8 trajectory and a circular trajectory, alternating between the two, as shown in Figure 5.25a.
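In outline, one branching step of such a Jordan-type RNN can be sketched as below. The dimensions follow the text (a three-dimensional reduced range vector in, one branching decision out, with context units fed back); the random weights are placeholders for trained values.

```python
import numpy as np

rng = np.random.default_rng(3)
N_IN, N_CTX, N_HID = 3, 4, 10
W_h = rng.normal(0.0, 0.5, (N_HID, N_IN + N_CTX))
W_o = rng.normal(0.0, 0.5, (1 + N_CTX, N_HID))   # branching output + context

def rnn_step(range_vec, context):
    """Sensory input plus recurrent context -> branching decision, new context."""
    x = np.concatenate([range_vec, context])
    h = np.tanh(W_h @ x)
    y = 1.0 / (1.0 + np.exp(-(W_o @ h)))
    return y[0], y[1:]         # branching decision in [0, 1], context outputs

context = np.zeros(N_CTX)
for step in range(5):
    range_vec = rng.uniform(0.0, 1.0, N_IN)   # stand-in for the reduced range image
    branch, context = rnn_step(range_vec, context)
    print(step, int(round(branch)))
```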
In the actual training of the robot, the robot is guided repeatedly to enter this target cyclic route by starting from various locations outside the cyclic route (see Figure 5.25b for the traces of the training trajectories). Then, a set of sequential data consisting of the sensory inputs and branching decisions along the branching sequences is acquired. This sequential data is used to train the RNN so that it can generate correct branching decisions upon receiving sensory inputs in the respective sequences. Note that this is not just simple learning of an input-output mapping, because the sensory inputs cannot necessarily determine the branching outputs uniquely. For example, the decision whether to move left and down or straight and down at the switching position denoted as A in Figure 5.25a should depend on the current context (i.e., whether the last travel was a figure-8 or a circular trajectory) instead of solely on sensory inputs, because the latter are the same in both cases. This is called the sensory aliasing problem. It is expected that such differentiation of context unit activation can be achieved through adaptation of the connection weights.
After the training stage, the experimenter examines how the robot
can accomplish the learned task by placing the robot in arbitrary ini-
tial positions. Figure 5.25c shows two examples of evaluation trials, in
which it can be seen that the robot always converges toward the desired
loop regardless of its starting position. The time required for achieving
convergence is different in each case, and even if the robot leaves the


Figure 5.25. Training and evaluation trajectories. (a) The target trajectory, which the robot loops around, forming a sequence of figure-8 and circular trajectories, with A as the switching point between the two sequences; (b) the traces of the training trajectories; and (c) the traces of evaluation trajectories starting from arbitrary initial positions. Adapted from (Tani & Fukumura, 1997) with permission.

loop after convergence under the influence of noise, it always returns


to the loop after a time. These observations indicate that the robot has
learned the objective of the navigation task as embedded in the attractor
dynamics of limit cycles, which are structurally stable.
It is interesting to examine how the task is encoded in the internal
dynamics of the RNN. By investigating the activation patterns of the
RNN after its convergence toward the loop, it is found that the robot is
exposed to a lot of noise during navigation. It is found as well that the
sensing input vector becomes unstable at particular locations and that

the number of branches in one cycle is not constant, even though the
robot seems to follow the same cyclic trajectory. At the switching point A for either route, the sensory input receives noisy jitter in patterns independent of the route. The context units, on the other hand, are clearly distinguishable between the two decisions, which suggests that the task sequence between the two routes is hardwired into the internal contextual dynamics of the RNN, even in a noisy environment.
To sum up, the robot accomplished the navigation task in terms of
the convergence of attractor dynamics that emerge in the coupling of
internal and environmental dynamics. Furthermore, situations in which sensory aliasing and perturbations arise can be disambiguated in navigating repeatedly experienced trajectories by self-organizing the autonomous internal dynamics of the RNN.

5.7. Summary

The current chapter introduced the dynamical systems approach for


modeling embodied cognition. The chapter started with an introduc-
tion of nonlinear dynamics covering characteristics of different classes
of attractor dynamics. Then, it described Gibsonian and Neo-Gibsonian
ideas in psychology and developmental psychology, ideas central to the
contemporary philosophy of embodied minds (Varela et al., 1991).
These ideas fit quite well with the dynamical systems approach, and
this chapter looked at how they have influenced behavior-based robotics
and neurorobotics researchers who attempt to understand the essence
of cognition in terms of the dynamic coupling between internal neural
systems, bodies, and environments.
This chapter also provided brief tutorials on connectionist neural
network models with special focus on dynamic neural network models
including the RNN and CTRNN. The chapter concluded by introducing some studies on neurorobotics that aim to capture minimal cognitive behaviors based on the ideas of nonlinear dynamical systems and by utilizing the schemes of dynamic neural network models.
Although the dynamical systems views introduced in this chapter
in terms of Gibsonian psychology, connectionist level modeling, and
neurorobotics may provide plausible accounts for some aspects of embodied cognition, some readers might feel that these do not solve all

of the essential problems outstanding in the study of cognitive minds.


They may ask how the dynamical systems approach described so far can
handle difficult problems including those of compositionality in cogni-
tion, of free will, and of consciousness. On the other hand, some others
such as Takashi Ikegami have argued that simple dynamic neural net-
work models are sufficient to exhibit a variety of higher-order cognitive
behaviors such as turn taking (Ikegami & Iizuka, 2007) or free decision (Ogai & Ikegami, 2008), provided that the dynamics of the coupling of bodies and environments are developed as specific classes of complex dynamics. The next chapter introduces my own thoughts on the issue, and I put more emphasis on subjectivity than on the objective world as we try to articulate a general view of embodied cognition through the study of neurodynamical robot models.

Part II
Emergent Minds: Findings from Robotics Experiments

6
New Proposals

The examples of learnable neurorobots described in chapter 5 illustrate how various goal-directed tasks can be achieved through self-organizing adequate sensory-motor coupling between the internal neuronal dynamics and the body-environment dynamics. Although the adaptive behaviors presented so far seem to capture at least some of the essence of embodied cognition, I feel that something important is still missing. That something is the subjectivity or intentionality of the system.

6.1. Robots with Subjective Views

Phenomenologists might argue that subjectivity cannot be detected


explicitly because the goal of embodied cognition is to combine the
subjective mind and the objective world into a single inseparable entity
through interactions with the environment. However, I argue that such a line of robotics research focuses only on reactive behavior based on the perception-to-motor cycle and, therefore, might never be able to access the core problem of the dichotomy between the subjective mind and the objective world. All these robots do is generate adequate motor commands reactively to current sensory inputs or to current internal states summed with past sequences of sensory inputs.


When my group conducted the robot navigation experiments aimed at learning cyclic trajectories mentioned in section 5.6, in the beginning I was interested in observing the emergent behaviors of the robot in terms of generating diverse trajectories in its transient states before converging to a limit cycle. However, after a while, I began to feel that robots with such reactive behaviors are simply like the steel balls in pinball machines, repeatedly bouncing against the pins until they finally disappear down the holes. Although we might see some complexity at the surface level of these behaviors, they are fundamentally different from those generally expected of humans in the contexts of both phenomenology and neuroscience. The behaviors of these robots seem too automatic, requiring no effort, as happens in machines, which show no traits of subjectivity. The behaviors of these robots might be analogous to those of patients with alien hand syndrome, whose behaviors are generated automatically, as afforded by the related perception, without subjective or intentional control (see section 4.2).
Going back to Husserl (2002), he considered that the world consists of
objects that the subject can consciously meditate on or describe. However,
the bottom line is that direct experiences for humans originate not in such
consciously representable objects but in the continuity of direct experi-
ences in time. As described in chapter 3, in considering this problem,
Husserl assumed a three-level structure in phenomenological time that
consists of the absolute flow at the deepest level, the preempirical time
level of retention and protention, and the objective time at the surface
level. He also considered that the continuous flow of experiences becomes
articulated into consciously accessible events or objects as a result of
its development though these phenomenological levels. According to
Husserl, this development is achieved through a process of interweaving
double intentionality, namely transversal (retention and protention) and
longitudinal (immanence of levels) intentionality, into the unitary flow
of consciousness. Certainly, robots characterized with reactive behav-
ior have nothing to do with such intentionality for consolidating as-yet-
unknown everyday experiences into describable or narrative objects. This
is both good and bad. Although such robots might be able to mimic smart
insects, such as tumblebugs that skillfully roll balls of dung down path-
ways, at this level of sophistication they are not yet capable of authentic
and of inauthentic being as characterized by Heidegger (see section3.4).
This is to say that current robots cannot, like human beings, construct
their own subjective views of the world by structuring and objectifying

experiences accumulated through interactions with the world, and


especially with other beings more or less like themselves within it.
Constructed by each individual when constantly facing various problems unique to that individual's place within the objective world, such characteristic viewpoints and the experiences that underlie them represent the subjectivity of the individual within the greater social system. What I would like to build are robots that realize what Heidegger considered authentic being, a character that presumably emerges in the dynamic interplay between looking ahead toward possible futures and reflecting on one's own unique past in order to recruit the resources necessary to enact and realize the most possible future shared with others (see section 3.4).
How can such subjective views be constructed? Clearly, we are at the formative stages of this work. However, some clue as to how to begin (and make no mistake, this is the very beginning) appeared first in section 4.2, which explained the possible role of the predictive model for action generation and recognition in the brain. As Gibson and Pick conjectured (2000), a set of perceptual structures obtained when an active learner engages in perceptual interaction with the environment and extracts information from it can be regarded as a subjective view belonging to that individual. Such an agent can have a proactive expectation of what the world should look like as it performs its intended actions. The developmental psychologist Claes von Hofsten has demonstrated that even 4-month-old infants exhibit such anticipatory behaviors. They track moving objects even when these are temporarily hidden from view, making a saccade to the reappearance point before the object reappears there (Rosander & von Hofsten, 2004). When they plan to reach for an object, their hands start to close before the object is encountered, as they take into account the direction of and distance to the object (von Hofsten & Rönnqvist, 1988). These infants have prospects for their actions. These are the formative stages in the development of a potentially authentic being.

6.2. Engineering Subjective Views into Neurodynamic Models

So, as a first step in understanding how an artificial agent such as those


under consideration in this book may be engineered with the capacity to
act and eventually to be responsible for its actions, and moreover for how

the world turns out because of them, we now need to consider a theo-
retical conversion from the reactive-type behavior generated by means of
perception-to-action mapping to the proactive behavior generated by means
of intention-to-perception mapping. Here, perception is active, and should be
considered as a subject acting on objects of perception, as Merleau-Ponty
(1968) explained in terms of visual palpation (see section 3.5). In terms of
the neurodynamic models from which our robots are constructed, the per-
ceptual structure for a particular intended action can be viewed as vector
flows in the perceptual space as mapped from this intention. The vector
flows constitute a structurally stable attractor. Let me explain this idea by
considering some familiar examples. Suppose the intended action is your
right hand reaching to a bottle from an arbitrary posture. If we consider
a perceptual space consisting of the end-point position of the hand that
is visually perceived and proprioception of the hand posture at each time
step, the perceptual trajectories for reaching the bottle from arbitrary posi-
tions in this visuo-proprioceptive space can be illustrated with reduced
dimensionality as shown in Figure 6.1a as a flow toward and a conver-
gence of vectors around an attractor that stands as the goal of the action.
These trajectories, and the actions that arise from them, can be gen-
erated by fixed point attractor dynamics (see section 5.1). In this case,
the position of the fixed point varies depending on the position of the
object in question, but all actions of a similar form can be generated by
this type of attractor.
Another example is that of shaking a bottle of juice rhythmically. In this case, we can imagine the vector flow in the perceptual space as illustrated in Figure 6.1b, which corresponds to limit cycle attractor dynamics. The essence here is that subjective views or images of the intended actions can be developed as perceptual structures represented by the corresponding attractors embedded in the neural network dynamics, as we have seen with CTRNN models that can develop various types of attractors (section 5.5). By switching from one intention to another, the corresponding subjective view in terms of perceptual trajectories is generated in a top-down manner. These perceptual structures might be stored in the parietal cortex, associated with intentions received from the prefrontal cortex, as discussed in section 4.2. This idea is analogous to the Neo-Gibsonian theory (Kelso, 1995) in which movement patterns can be shifted by phase transitions due to changes in the system parameters (see section 5.2).
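The switch between these two classes of attractor under a single intention-like parameter can be illustrated with a textbook normal form (the Hopf bifurcation); this is my own stand-in for the idea, not a model from the robot studies discussed here.

```python
import numpy as np

def flow(xy, mu):
    """Hopf normal form: mu < 0 yields a fixed-point attractor (reaching-like
    convergence), mu > 0 a limit cycle (rhythmic, shaking-like motion)."""
    x, y = xy
    r2 = x * x + y * y
    return np.array([mu * x - y - x * r2,
                     x + mu * y - y * r2])

def simulate(mu, steps=5000, dt=0.01):
    xy = np.array([0.5, 0.0])
    for _ in range(steps):
        xy = xy + dt * flow(xy, mu)
    return xy

print("mu = -1:", simulate(-1.0))   # decays to the origin (fixed point)
print("mu = +1:", simulate(+1.0))   # ends up on the unit circle (limit cycle)
```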
Figure 6.1. The perceptual trajectories for different intended actions in visuo-proprioceptive space, for (a) approaching an object and (b) shaking it.

The top-down projection of the subjective view should (only implicitly) have several levels in general, wherein the views at higher levels

might be more abstract and those at lower levels might be more concrete
and detailed. Also, top-down views of the world should be composi-
tional enough so that proactive views for various ways of intentionally
interacting with the world can be represented by systematically recom-
bining parts of images extracted from accumulated experiences. For
example, to recall once again the very familiar image of everyday rou-
tine action with which this text began, when we intend to drink a cup
of coffee, the higher level may combine a set of subintentions for primi-
tive actions such as reaching-to-cup, grasping-cup, and bringing-cup-to-
mouth in sequences that may be projected downward to a lower level
where detailed proactive images of corresponding perceptual trajectories
can be generated. Ultimately, perceptual experiences, which are associated with various intentional interactions with the world, are semantically combinatorial: a "language of thought" (Fodor & Pylyshyn, 1988).
One essential question is how the higher level can manipulate or com-
bine action primitives or words systematically. Do we need a framework
of symbol representation and manipulation, especially in the higher cog-
nitive level, for this purpose? If Isaid yes to this, Iwould be criticized just
like Dreyfus criticized Husserl or like Brooks criticized conventional AI
and cognitive science research.
What I propose is this: We need a neurodynamic system, well-formed through adaptation, that can afford compositionality as well as

systematicity and gives an impression that as if discrete symbols existed


within the system as well as that as if these symbols were manipulated by
that system. The model of compositional or symbolic mind to which
Iam now pointing is not impossible to achieve in a neurodynamic sys-
tem, if we remember that the sensitivity of chaos toward initial condi-
tions exhibits a sort of combinatory mechanics by folding and stretching
in phase space. This chaotic dynamic can produce combinatory sequences
of symbols in terms of symbolic dynamics via partitioning processes of
the continuous state space with a finite number of labels, as described in
section 5.1. In simpler terms, the continuous space of action can be cut up
into chunks, and these chunks can be referenced as things in themselves,
represented, symbolized. Dale and Spivey (2005) have provided a sym-
pathetic argument, proposing that the promise of symbolic dynamics lies
in articulating the transition from dynamical, continuous descriptions of
perception into the theoretical language of discrete, algorithmic processes
for high-level cognition. What Iam saying here is that the segmentation
of thinking into discrete thoughts, which are represented in terms of
logical operators, as propositions, as combinations of symbols, can be per-
formed by dynamic models of mind that do not employ discrete symbolic
computation in their internal operations (Tani & Fukumura,1995.)
What about creative composition of primitives into novel sequences
of action? Neurodynamic models account for this capacity, as well.
Nonlinear dynamics can exhibit structural changes of varying discrete-
ness, as can be seen in bifurcations from one attractor structure to another
or in phase transitions by means of controlling relatively low-dimensional
external parameters. So, we may suppose that the higher level sending
sequences of parameter values to the lower level in the network results
in sequential switching of primitive actions by means of the parameter
bifurcation in this lower neurodynamic system. And if the neurodynam-
ics in the higher level for generating these parameter sequences is driven
by its intrinsic chaos, various combinatory sequences of the primitive
actions could be generated. Figure 6.2 illustrates the idea.
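A toy caricature of this two-level arrangement: the higher level is a chaotic map whose sampled states are read out as bifurcation parameters, and each parameter selects a qualitatively different lower-level primitive (such as the fixed-point versus limit-cycle regimes of the sketch in section 6.2). The logistic map and the thresholding are illustrative assumptions.

```python
# Higher level: a logistic map in its chaotic regime produces the itinerary
z, params = 0.3, []
for _ in range(10):
    z = 3.9 * z * (1.0 - z)                    # chaotic state trajectory
    params.append(1.0 if z > 0.5 else -1.0)    # sampled state -> parameter

# Lower level: each parameter value selects a different action primitive
for step, mu in enumerate(params):
    primitive = "rhythmic (limit cycle)" if mu > 0 else "reaching (fixed point)"
    print(step, primitive)
```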
Although an agent driven by top-down intentions for action has proac-
tive subjective views on events experienced during its interaction with
the objective environment, its cognitive mind should also reflect on unex-
pected outcomes through the bottom-up process to modify the current
intention. This modification of the intentions for action in the bottom-
up process can be achieved by utilizing information about the prediction error, a possibility that was briefly mentioned in the previous section.

Figure 6.2. An illustration showing how chaos can generate diverse sequential combinations of action primitives. In the higher level, the state trajectory is generated by a chaotic dynamic system with a given initial state, and the state values are sampled each time they cross a Poincaré section. These state values are successively input to a parameterized dynamic system in the lower level as its parameters (along the solid arrow), causing sequential bifurcations in the parameterized dynamic system and its associated action primitives. The lower level predicts the coming visuo-proprioceptive state, and its prediction error is monitored. The state in the higher level is modified in the direction of minimizing this error (along the dashed arrow).

Figure 6.2 illustrates the process whereby the state values in

the higher level are modified to minimize the prediction error in the lower
level. This error signal might convey the experience of consciousness in terms of the first-person awareness of one's own subjectivity, because the subjective intention is directly differentiated from the objective reality and the subject feels, as it were, out of place and thus at a difference from its own self-projection. My tempting speculation is that authentic being could be seen in a certain imminent situation caused by such error or conflict between the two.
In summary, what I am suggesting is that nonlinear neurodynamics
can support discrete computational mechanics for compositionality
while preserving the metric space of real-number systems in which
physical properties such as position, speed, weight, and color can be
represented. In this way, neurodynamic systems are able to host both
semantically combinatorial thoughts at higher levels and the corre-
sponding details of their direct perception at lower levels. Because
both of these share the same phase space in a coupled dynamical sys-
tem, they can interact seamlessly and thus densely, unlike symbols and patterns, which interact somewhat awkwardly in the more common, so-called hybrid architectures. Meanwhile, the significance of symbolic
expression is not only retained on the neurodynamic account but is clarified, and with this newfound clarity we may anticipate that many historical problems regarding the nature of representation in cognition in the philosophy of mind will finally dissolve.

6.3. The Subjective Mind and the Objective World as an Inseparable Entity

Next, let's extend such thinking further and examine how the subjective mind and the objective world might be related. Figure 6.3 illustrates
conceptually how the interactions between top-down and bottom-up
processes take place in the course of executing intended actions.
It is thought that the intention of the subjective mind (top-down) as
well as the perception of the objective world (bottom-up) proceeds as
shown in Figure 6.3 (left panel). These two processes interact, resulting
in the recognition of the perceptual reality in the subjective mind and
the generation of action in the objective world (middle panel). This
recognition results in the modification of the subjective mind (and potential consciousness), whereas the generation of action modifies the objective world, and the interactions continue with the modified
states of the mind and the world (right panel). In this process, we see the circular causality between action and recognition. This circular causality results in inseparable flows between the subjective mind and the objective world as they reciprocally intertwine with each other via action-perception cycles, as Merleau-Ponty proposed (Merleau-Ponty, 1968). If we were able to achieve this scenario in a robot, the robot would be free from Descartes's Cartesian dualism, as its subjective mind and the objective world could finally become inseparable.

Figure 6.3. The subjective mind and the objective world become an inseparable entity through interactions between the top-down and bottom-up pathways. Redrawn from Tani (1998).
I want to conclude this chapter by pointing out what I consider to be essential for constructing models of the cognitive mind:

1. The cognitive mind is best represented by nonlinear


dynamical systems defined in the continuous time and space
domain, wherein their nonlinearity can provide the cognitive
competence of compositionality.
2. Both natural and artificial cognitive systems should be capable
of predicting the perceptual outcome for the current intention
for acting on the outer world via top-down pathways, whereas
the current intention is adapted by using bottom-up signals
of error detected between the prediction and the actual
perceptual outcome in the action-perception cycle.

3. The underlying structure for consciousness and free will


should be clarified by conducting a close examination
of nonstationary characteristics in the circular causality
developed through the aforementioned top-down and the
bottom-up interaction between the subjective mind and the
objective world. The essence of authentic being also might
be clarified via such examination of the apparent dynamic
structure.

The remaining chapters test these conjectures by reviewing a series of


synthetic robotics experiments conducted in my laboratory. Readers
should be aware that my ideas were not at all in a concrete or complete form from the very outset. Rather, they became consolidated over time as the modeling studies were conducted. Moreover, my colleagues and I have never tried to put all of the assumed elements of the mind that we have discussed thus far into our synthetic robotic models. It was not our aim to put all available neuroscience knowledge about local functions, mechanisms, and anatomy into the brains of our tiny robots. Instead, in each trial we varied and developed minimal brains, so to speak, in dynamic neural network models of the RNN type. We tried neither to implement all possible cognitive functions in a particular robotic model nor to account for the full spectrum of phenomenological issues in each specific experiment. We concentrated on models and experiments with specific focuses; therefore, in each new trial we added elements relevant to the focus and removed irrelevant ones. My hope is that, in reviewing the outcomes of our series of synthetic robotic studies, readers will be able to share the deep insights into the nature of the mind, especially how thought and its interaction with the world could arise, which I have come to in performing and reflecting on the actual experiments day-to-day.
The next chapter examines how robots can learn about the outer environment by using a sensory prediction mechanism in the course of exploration. It also explores the issue of self-consciousness as related to this sensory prediction mechanism.

7
Predictive Learning About the World from Actional Consequences

The previous chapter argued that understanding the processes essential


in the development of a subjective view of the world by way of interac-
tive experiences within that world is crucial if we are to reconstruct
the cognitive mind in another medium, such as in our neurodynamic
robots. But, how exactly can robots develop such subjective views from
their own experiences? Furthermore, if a robot becomes able to acquire
a subjective view of the world, how does it also become aware of its
own subjectivity or self? In considering these questions, this chapter
reviews a set of robotics experiments in the domain of navigation learning. These experiments were conducted in a relatively simple setting more than 20 years ago in my lab, but they addressed two essential questions. The first experiment addresses the question of how a compositional representation of the world can be developed by means of the self-organization of neurodynamic structures via the accumulated learning of actional experiences in the environment. The second experiment inquires into the phenomenology of the self or self-consciousness. I attempt to clarify its underlying structure by examining the possible interaction between top-down prediction and bottom-up recognition during robot navigation.


7.1. Development of Compositionality: The Symbol Grounding Problem

In the mid-1990s, I started to think about how robots could acquire


their own images of the world from experiences gathered while inter-
acting with their environment (Tani, 1996). Because humans can men-
tally generate perceptual images for various ways of interacting with the
world, Iwondered if robots could also develop a similar competence via
learning. As my colleagues and I had just completed experiments on
robot navigation learning with homing and cyclic routing, as described
in c hapter 5, I decided to pursue this new problem in the context of
robot navigation.
First, I tried to apply the forward dynamics model proposed by Masao Ito and Mitsuo Kawato (see chapter 4) directly to my Yamabico robot navigation problem. I thought that a recurrent neural network (RNN) would work as a forward dynamics model that predicts how the sensation of range images changes in response to arbitrary motor command inputs for the two wheel drives at every 500-ms time interval. However,
achieving the convergence of learning with the sensory-motor data
acquired in the original workspace proved to be very difficult. The rea-
son for this failure was that it was just asking too much of the network
to learn to predict the sensory outcomes for all possible combinations
of motor commands at each time step. Instead, it seemed reasonable to
assume that the trajectory of the robot should be generated under the
constraint of smooth, collision-free maneuvering. From this assumption,
I decided to employ the scheme of branching with collision-free maneu-
vering shown in section 5.6 again. This branching scheme enables the
robot to move along topological trajectories in a compositional way by
arbitrarily combining branching decisions in sequence.
By utilizing this scheme, the problem could be simplified to one
wherein an RNN learns to predict just the sensation of the next branch-
ing point in response to action commands (branching decision) at the
current branching point. I speculated that the RNN could acquire com-
positional images while traveling around the workspace by combining
various branching decisions, provided that the RNN had already learned
a sufficient number of branching sequences in the topological trajecto-
ries. A focal question that the experiment was designed to address was
this one: What happens when the prediction differs from the actual out-
come of the sensation? In this situation, a robot navigating a workspace
by referring to an internal map with a finite state machine (FSM)-like


representation of the topological trajectories would experience the sym-
bol grounding problem (see Figure 2.2), discussed in chapter 2.

7.1.1 Navigation Experiments with Yamabico

In the learning phase, the robot explores a given environment containing obstacles by taking random branching decisions. Let's assume that the robot arrives at the nth branching point, where it receives sensory input (a range image vector plus the travel distance from the previous branching point) $p_n$ and randomly determines the branching (0 or 1) as $x_n$, after which it moves to the (n+1)st branching point (see the left side of Figure 7.1). The robot acquires a sequence of pairs of sensory inputs and actions ($p_n$, $x_n$) throughout the course of exploring its environment. Using these sample pairs of sensory inputs and actions, the RNN is trained so that it can predict the next sensory input $p_{n+1}$ in terms of the current sensory input $p_n$ and the branching action $x_n$ taken at branching point n (see the right panel in Figure 7.1). In this predictive navigation task, the context units in the RNN play the role of storing the current state in working memory, which is analogous to the previous Yamabico experiment described in chapter 5. The actual training of the RNN is conducted in an offline manner with the sample sequence data saved in short-term memory storage.
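Schematically, the data collection for this predictive learning reduces to the loop below; the toy sensing and movement functions are placeholders for the real robot and environment.

```python
import numpy as np

rng = np.random.default_rng(4)

def sense_at_branch(state):
    # Placeholder for the range-image-plus-travel-distance vector p_n
    return rng.uniform(0.0, 1.0, 5)

def move(state, branch):
    return state + (1 if branch else -1)       # toy stand-in for the workspace

state, sequence = 0, []
for n in range(200):                           # exploration with random branching
    p_n = sense_at_branch(state)
    x_n = int(rng.integers(0, 2))              # random branching decision, 0 or 1
    sequence.append((p_n, x_n))
    state = move(state, x_n)

# Offline training set: predict p_{n+1} from the pair (p_n, x_n)
triples = [(p, x, sequence[n + 1][0]) for n, (p, x) in enumerate(sequence[:-1])]
print(len(triples), "training triples (p_n, x_n, p_n+1)")
```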
Figure 7.1. An RNN learning to predict the sensation at the next branching point from the current branching decision.

Once the RNN is trained, it can perform two types of prediction. One is the online prediction of the sensory inputs at the next branching

point for an action taken at the current branch. The other is the offline
look-ahead prediction for multiple branching steps while the robot
stays at a given branching point. Look-ahead prediction is performed by
making a closed loop between the sensory prediction output units and
the sensory input units of the RNN, as denoted with a dotted line in
Figure 7.1. In the forward dynamics of an RNN with a closed sensory
loop, arbitrary steps for look-ahead prediction can be taken by feeding
the current predictive sensory outputs as sensory inputs in the next step
instead of employing actual external sensory inputs. This enables the
robot to perform the mental simulation of arbitrary branching action
sequences as well as goal-directed planning to achieve given goal states, as described later.
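In code, the closed sensory loop is simply a matter of feeding each prediction back in as the next input. The rnn_step callable below is an assumed trained one-step predictor from (sensation, action, context) to (predicted sensation, next context).

```python
def look_ahead(rnn_step, p0, c0, action_program):
    """Offline look-ahead prediction: the predicted sensation is fed back as
    the next sensory input instead of an actual external one."""
    p, c, predictions = p0, c0, []
    for x in action_program:        # e.g., the branching sequence 1100111
        p, c = rnn_step(p, x, c)    # closed loop: prediction becomes input
        predictions.append(p)
    return predictions
```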
After exploring the workspace for about 1 hour (see the exact trajectories shown in Figure 7.2a) and undergoing offline learning for one night, the robot's performance for online one-step prediction was tested.
In the evaluation after the learning phase, the robot was tested for its
predictive capacity during navigation of the workspace. It navigated the
workspace from arbitrarily set initial positions by following an arbitrary
action program of branching and tried to predict the upcoming sensory
inputs at the next branching point from the sensory inputs at the cur-
rent branching point. Figure 7.2b presents an instance of this process,
wherein the left panel shows the trajectory of the robot as observed and
the right panel shows a comparison between the actual sensory sequence
and the predicted one. The figure shows the nine steps of the branch-
ing sequence; the leftmost five units represent sensory input, the next
five units represent the predicted state for the next step, the following
unit is the action command (branching into 0 or 1), and the rightmost
four units are the context units. Although, initially, the robot could
not make correct predictions, it became increasingly accurate after the
fourth step. Because the context units were initially set randomly, the
prediction failed at the very beginning. However, as the robot contin-
ued to travel, sensory input sequences entrained context activations
into the normal/steady-state transition sequence, after which the RNN
became capable of producing correct predictions.
Figure 7.2. Trajectories of Yamabico (a) during exploration, (b) during online one-step prediction (left) and a comparison between the actual sensory sequence and its corresponding one-step prediction (right), and (c) generated after offline look-ahead prediction (left) and a comparison between an actual sensory sequence and its look-ahead prediction (right). Adapted from Tani (1996) with permission.

We repeated this experiment with various initial settings (different initial positions and different action programs), and the robot always started to produce correct predictions within 10 branch steps. We also found that although the context was easily lost when perturbed by strong noise in the sensory input (e.g., when the robot failed to detect a

branch and ended up in the wrong place), the prediction accuracy was
always recovered as long as the robot continued to travel. This autore-
covery feature of the cognitive process is a consequence of the fact that
a certain coherence in terms of the close matching between the inter-
nal prediction dynamics and the environment dynamics emerges during
their interaction.
Once the robot was situated in the environment by the entrainment process, it was able to perform multistep look-ahead prediction from
branching points. A comparison between a look-ahead prediction and
the actual sensory sequence during travel is shown in Figure 7.2c. The
arrow in the workspace in the left panel of the figure denotes the branch-
ing point where the robot performed look-ahead prediction for an action
program represented by the branching sequence 1100111. The robot,
after conducting look-ahead prediction, actually traveled following the
action program, generating a figure-8 trajectory. The right panel in the
figure shows a comparison between the actual sensory input sequence
and its look-ahead prediction associated with the action program and the
context activation sequence. It can be seen that the look-ahead prediction
agrees with the actual sequence. It is also observed that the context val-
ues as well as the prediction of sensory input at the initial and final steps
are almost the same. This indicates that the robot predicted its return to
the initial position at the end step in its mental simulation for traveling
along a figure-8 trajectory. We repeated this experiment of look-ahead
prediction for various branching sequences and found that the robot was
able to predict sensory sequences correctly for arbitrary action programs
in the absence of severe noise affecting the branching sequence.
Finally, the robot was instructed to generate action plans (branch-
ing sequences) for reaching a particular goal (position) specified by a
sensory image. In the planning process, the robot searched for adequate
action sequences that could be used to reach the target sensory state in
the look-ahead prediction of sensory sequences from the current state
while minimizing estimated travel distance to the goal. Figure 7.3 shows
the result of one particular trial.
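The planning process described above can be sketched as search by mental simulation. The fragment below is a schematic reconstruction under stated assumptions, not the algorithm actually used: it reuses the hypothetical look_ahead function from the earlier sketch, enumerates candidate branching plans up to a fixed depth, and keeps the plan whose predicted final sensation best matches the goal image, with a crude depth penalty standing in for the estimated travel distance.

    from itertools import product
    import numpy as np

    def plan_to_goal(sensation, context, goal_image, params,
                     max_depth=8, distance_weight=0.05):
        # Exhaustive search over branching plans, each evaluated by a
        # mental simulation; feasible only because the workspace is small.
        best_plan, best_cost = None, np.inf
        for depth in range(1, max_depth + 1):
            for plan in product([0, 1], repeat=depth):
                predicted = look_ahead(sensation, context, plan, params)
                goal_error = np.sum((predicted[-1] - goal_image) ** 2)
                cost = goal_error + distance_weight * depth  # distance proxy
                if cost < best_cost:
                    best_plan, best_cost = plan, cost
        return best_plan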
In this example in Figure 7.3, the robot generated three different
action plans, each of which was actually executed. The figure shows the
three corresponding trajectories successfully reaching a given goal from
a starting position in the adopted workspace. Although the third trajec-
tory might look redundant due to the unnecessary loop, the creation of
such trajectories suggests that a sort of compositional mechanics in the
forward dynamics of the RNN had developed as a result of consolida-
tion learning. This self-organized mechanism enabled the robot to gen-
erate diverse navigational plans as if segments of images obtained during
actual navigation were combined by following acquired rules.
Figure 7.3. The result of goal-directed planning. Trajectories corresponding to three different generated action programs are shown, each with its start and goal positions. Adopted from Tani (1996) with permission.

Some may consider that the process of goal-directed planning by the RNN is analogous to that of GPS described in section 2.2, because
the forward prediction of the next sensory state for actions to be taken
at each situation of the robot by the RNN seems to play the same role
as that of the causal rule described for each situation in the problem
space in GPS. However, there are crucial differences between the for-
mer functioning in a continuous state space and the latter in a discrete
state space. We will come to understand the significance of these differ-
ences through the following analysis.

7.1.2 Analysis of the Acquired Neurodynamic Structure

After the preceding experiment, I thought that it would be interesting to see what sorts of attractors or dynamical structures emerged as a result
of self-organization in the RNN and its coupling with the environment,
as well as how such attractors could explain the observed phenomena,
such as the look-ahead prediction of combinatorial branching sequences
and the autorecovery of internal contexts by environmental entrainment.
Therefore I conducted a phase-space analysis of the obtained RNN to examine its dynamical structure, as shown for the Rössler attractor in chapter 5. One difference was that time integration by forward dynamics of the RNN required feeding external inputs in the form of branching
action sequences into the network. Therefore, the RNN in the closed-
loop mode was dynamically activated for thousands of steps while being
fed random branching sequences (1s and 0s). Then, the activation values
of two representative context units were plotted for all steps, in which
the transient part corresponding to the first several hundred steps was
excluded. It was like looking at trajectories from the mental simulation
of thousands of consecutive steps of random branching sequences in the
workspace while ignoring the initial transient period of state transitions.
The resultant plot can be seen in Figure 7.4.
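The analysis procedure itself is straightforward to restate. The following minimal sketch, again reusing the hypothetical rnn_step from above and assuming four context units, drives the closed-loop network with random branching inputs, discards the initial transient, and scatter-plots two representative context units:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_invariant_set(sensation, params, n_steps=5000, transient=500):
        rng = np.random.default_rng(0)
        context = rng.uniform(0.0, 1.0, size=4)  # arbitrary initial context
        points = []
        for t in range(n_steps):
            branch = rng.integers(0, 2)  # random 0/1 branching input
            sensation, context = rnn_step(sensation, branch, context, params)
            if t >= transient:  # exclude the transient part
                points.append(context[:2].copy())
        pts = np.array(points)
        plt.scatter(pts[:, 0], pts[:, 1], s=1)
        plt.xlabel("context-1")
        plt.ylabel("context-2")
        plt.show()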
We can see a set of segments (Figure 7.4a). Moreover, a magnification of a particular segment shows an assembly of points resembling a Cantor set (Figure 7.4b). The plot represents the invariant set of a global attrac-
tor, as the assembly appears in the same shape regardless of the initial val-
ues of the context units or the exact sequences of randomly determined
branching sequences. This means that the context state initialized with
arbitrary values always converged toward steady-state transitions within
the invariant set after some transient period. It was found that, after
convergence was reached, the context state shifted from one segment
to another at each step, and moreover it was found that each segment
corresponded to a particular branching point.

Figure 7.4. Phase space analysis of the trained RNN. (a) An invariant set of an attractor appeared in the two-dimensional context activation space (axes: context-1 and context-2). (b) A magnification of a section of the space in (a). Adopted from Tani (1996) with permission.

Additionally, an analysis of the aforementioned experiments for online prediction revealed that,
whenever the predictability of the robot was lost due to perturbations,
the context state left the invariant set. However, the perturbed context
state always returned to the original invariant set after several branching
steps because the invariant set had been generated as a global attractor.
Our repeated experiments with different robot workspace configura-
tions revealed that the observed properties of the RNN are repeatable
and therefore general.

7.1.3 Is the Problem of Symbol Grounding Relevant?

Given that the context state shifted from one segment to another in the
invariant set in response to branching inputs, we can consider that what
the RNN reproduced in this case was exactly an FSM consisting of nodes
representing branching points and edges corresponding to transitions
between these points, as shown in Figure 2.2. This is analogous to what
Cleeremans and colleagues (1989) and Pollack (1991) demonstrated by
training RNNs with symbol sequences characterized by FSM regulari-
ties. Readers should note, however, that the RNNs achieve much more
than just reconstructing an equivalent of the target FSM.
First, each segment observed in the phase space of the RNN dynam-
ics is not a single node but a set of points, namely a Cantor set spanning
a metric space. The distance between two points in a segment repre-
sents the difference between past trajectories arriving at the node. If the
two trajectories come from different branching sequences, they arrive
at points in the segment that are also far apart. On the other hand, if
the two trajectories come from exactly the same branching sequences
after passing through an infinite number of steps except for the initial
branching points, they arrive at arbitrarily close neighbors in the same
segment. Theoretically speaking, a set of points in the segment consti-
tutes a Cantor set with fractal-like structures because this infinite num-
ber of points should be capable of representing the history of all possible
combinations of branching (this can be proven by taking into account
the theorem of iterative function switching [Kolen, 1994] and random dynamical systems [Arnold, 1995]). This fractal structure is actually a
signature of compositionality, which has appeared in the phase space of
the RNN by means of iterative random shifts of the dynamical system
triggered by given input sequences of random branching. Interestingly,
Fukushima and colleagues (2007) recently showed supportive biological
evidence from electrophysiological recording data that CA1 cells in the rat hippocampus encode sequences of episodic memory with a similar fractal structure.
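This packing of an unbounded input history into a Cantor set is easy to demonstrate with a toy iterated function system in the spirit of Kolen (1994). The sketch below is purely illustrative and is not the trained RNN: two contractive maps on the unit interval are switched by a binary sequence, and each visited point encodes the entire input history, with the most recent bits occupying the coarsest scales.

    import numpy as np

    def ifs_trajectory(bits, x0=0.5):
        # Two contractive maps (contraction ratio 1/3) switched by a binary
        # sequence; the visited points fill a Cantor-like set, and each
        # point uniquely reflects the full bit history.
        x, xs = x0, []
        for b in bits:
            x = x / 3.0 if b == 0 else x / 3.0 + 2.0 / 3.0
            xs.append(x)
        return np.array(xs)

    rng = np.random.default_rng(1)
    points = ifs_trajectory(rng.integers(0, 2, size=10000))
    # Two histories sharing a long recent suffix land arbitrarily close
    # together, mirroring how nearby context states encode similar pasts.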
Second, the observed segments cannot be manipulated or represented
explicitly as symbols attached to nodes in an FSM. They just appear as
a dynamic closure1 as a result of the convergent dynamics of the RNN.
The nature of global convergence of the context state toward steady-
state transitions within the invariant set as a dynamic closure can afford
global stability to the predictive dynamics of the RNN. An illustration
of this concept appears in Figure 7.5.

Figure 7.5. The dynamic closure of steady-state transitions organized as an attractor (solid arrows) associated with a convergent vector flow (dashed arrows).
On the other hand, in the case of an FSM, there is no autorecov-
ery mechanism against perturbations in the form of invalid inputs
because the FSM provides only a description of steady-state transitions
within the graph and cannot account for how to recover such states
from dynamically perturbed ones. As mentioned earlier, when an FSM
receives invalid symbols (e.g., unexpected sensations during navigation),
it simply halts operation. The discussion here is analogous to that in
chapter 5 about the advantages of utilizing dissipative dynamic sys-
tems rather than sinusoidal functions for stably generating oscillatory
patterns.

1. It is called a dynamic closure because the state shifts only between points in the
set of segments in the invariant set (Maturana & Varela, 1980).
It is essential to understand that there is no homunculus that looks at and manipulates representations or symbols in the proposed approach.
Rather, there are just iterations of a dynamical system whereby com-
positionality emerges. Ultimately, it can be said that this system is not
affected by the symbol grounding problem because there are no sym-
bols to be grounded to begin with, at least not internally.
Before moving on, I should mention some drawbacks to this approach.
The current scheme utilizing the forward model is limited to small-
scale problems because of the frame problem discussed in section 7.1.
The model worked successfully because the navigation environment was
small and the branching scheme preprogrammed in the lower level simplified the navigation problem. Although the question of how robots can acquire lower-level sensory-motor skills such as branching or collision-free maneuvering from their own direct sensory-motor experiences is quite important, we did not address it in this study. Another
problem concerns the intentionality of the robot. What the experiment
showed is so-called latent learning in which an agent learns an internal
model of the environment via random exploration without any inten-
tions. If the robot attempts to learn about all possible exploration expe-
riences without any intentions or goals to achieve, such learning will
face the combinatory explosion problem sooner or later. We return to
these issues in the later sections.
The next section explores how sensory prediction learning and phe-
nomena of self-consciousness could be related, by reviewing results of
another type of robot navigation experiment.

7.2. Predictive Dynamics and Self-Consciousness

This section examines how the notion of self or self-consciousness could emerge in artificial systems as well as in human cognitive minds
through the review of further robotics experiments on the topics of
prediction learning in navigation as extended from the aforementioned
Yamabico ones. The following robotics experiments clarify the essential
role of sensory prediction mechanisms in the possible development of
self-consciousness as presumed in the earlier chapters.
Although the experiments with Yamabico described in the previ-
ous section revealed some interesting aspects of contextual predictive
dynamics, they still miss some essential features, one of which is the
utilization of prediction error signals. The error signal is considered to be a crucial cue for recognizing a gap between the subjective image
and objective reality. Recent evidence from neuroscience has revealed
brain waves related to prediction error, as in the case of mismatch negativity, and it is speculated that they are used for fast modification of
ongoing brain processes. Also, Yamabico did not have a particular bias
or attention control for acquiring sensory input. It would naturally be
expected that the addition of some attention control mechanism would
reinforce our proposed framework of top-down prediction-expectation
versus bottom-up recognition. Therefore, we introduced a visual system
with an attention control mechanism in the robot platform that suc-
ceeded Yamabico. Finally, it would be interesting to incorporate such a
system with dynamic or incremental learning of experiences rather than
looking at the result of one-time offline batch learning, as in the case
of Yamabico. Our findings in these robotics experiments enriched with
these new elements suggest a novel interpretation of concepts such as
the momentary self and minimal self, which correspond to ideas devel-
oped by William James (1982) and Martin Heidegger (1962).

7.2.1 Landmark-Based Navigation Performed by a Robot with Vision

I built a mobile robot with vision provided by a camera mounted on a rotating head, as shown in Figure 7.6a (Tani, 1998). The task of this
robot was to learn to dynamically predict landmark sequences encoun-
tered while navigating a confined workspace.
After a successful learning process, the robot was expected to be able
to use its vision to recognize landmarks in the form of colored objects
and corners within a reasonable amount of time before colliding with
them, while navigating the workspace by following the wall and the edge
between the wall and the floor. It should be noted that the navigation
scheme did not include branching as in the case of Yamabico, because
the learning of compositional navigational paths was not the focus of
research in this robot study.

Figure 7.6. A vision-enabled robot and its neural architecture. (a) A mobile robot featuring vision is looking at a colored landmark object. (b) The neural network architecture employed in the construction of the robot, combining an RNN with a context loop for prediction, what and where pathways producing categorical outputs, a Hopfield network with winner-take-all neurons, and the camera and wheel motors. Adopted from Tani (1998) with permission.
The robot was controlled by the neural network architecture shown
in Figure 7.6b. The entire network consisted of parts responsible for
prediction (performed by an RNN) and parts responsible for percep-
tion, the latter being divided into what and where pathways thereby
mimicking known visual cortical structures. In the what pathway,
visual patterns of landmarks corresponding to colored objects were
processed in a Hopfield network, which can store multiple static patterns by using multiple fixed-point attractors. When a perceived visual
pattern converged toward one of the learned fixed-point attractors,
the pattern was recognized and its categorical output was generated
by a winner-takes-all activation network, known as a Kohonen net-
work. Learning was initiated for both the Hopfield and Kohonen net-
works whenever a visual stimulus was encountered. In the where
pathway, accumulated encoder readings of the left and right wheels
from the last encountered landmark to the current one, and the direc-
tions of the detected landmarks in frontal view, were processed by the
Kohonen network, in which its categorical outputs were generated.
Together with both pathways, what categories of visual landmark
objects and where categories of the relative travel distance from the
last landmark to the current one, as well as the corresponding direc-
tion determined by the camera orientation, were sent for prediction in
a bottom-up manner.
In the prediction process, the RNN learned to predict in a top-down
manner the perceptual categories of what and where for landmarks
to be encountered in the future. Note that there were no action inputs
in this RNN because there was no branching in the current setting.
In this model, the bottom-up and top-down pathways did not merely
provide inputs and outputs to the system. Rather, they existed for their
mutual interactions, and the system was prepared for expected percep-
tual categories in the top-down pathway before actually encountering
the landmarks. This expectation ensured that the system was ready for
the next arriving pattern in the Hopfield network and was prepared
to direct the camera toward the landmark with correct timing and
direction. Actual recognition of the landmark objects was established
by dynamic interactions between the two pathways. This means that
if the top-down prediction of the visual pattern failed to match the
currently encountered one, the perception would result in an illusion
constituting a combination of the two patterns. Moreover, a mismatch
in the where perceptual category could result in failure to attend to any of the expected landmarks to be recognized. Such misrecognition
outcomes were fed into the RNN, and the next prediction was made
on this basis. Note that the RNN was capable of engaging in mental
rehearsal of learned sequential images by constructing a closed loop
between the prediction outputs and the sensation inputs in the same
way as Yamabico.
A particular mechanism for internal parameter control was implemented to achieve adequate interactive balance between the top-down
and bottom-up pathways. The mechanism exerted more top-down pressure on the two perceptual categories (what and where) as the error between the predicted perception and its actual outcome decreased. A shorter time period was also allocated for reading the per-
ceptual outcomes in the Hopfield network in this case. On the other
hand, less top-down pressure was exerted when the error between the
predicted perception and its actual outcome was larger, and a longer
time period was allowed for dynamic perception in the Hopfield net-
work. In other words, in the case of fewer errors, top-down prediction
dominated the perception, whereby the attention was quickly turned to
upcoming expected landmarks, which resulted in quick convergence in
the Hopfield network. Otherwise, the bottom-up pathway dominated
the perception, taking longer to look for landmarks while waiting for
convergence in the Hopfield network.
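The balancing rule can be summarized compactly. The following is a schematic sketch under assumed names and bounds (attention_schedule is hypothetical), not the actual controller: the top-down gain rises, and the reading period for the Hopfield network shrinks, as the running prediction error falls.

    import numpy as np

    def attention_schedule(pred_error, min_steps=5, max_steps=50):
        # Map the current prediction error (assumed normalized to [0, 1])
        # to a top-down gain and a reading period for the Hopfield network.
        e = float(np.clip(pred_error, 0.0, 1.0))
        top_down_gain = 1.0 - e  # low error: top-down prediction dominates
        read_period = int(min_steps + e * (max_steps - min_steps))
        return top_down_gain, read_period

For example, an error of 0.1 yields a gain of 0.9 with a short reading period of about 9 steps, whereas an error of 0.9 yields a gain of 0.1 with about 45 steps allowed for bottom-up perception.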
Learning of the RNN was conducted for event sequences associated
with encountering landmarks. More specifically, experienced sequences
of perceptual category outcomes were used as target sequences to be
learned. Incremental training of the RNN was conducted after every
15th landmark by adopting a scheme of rehearsal and consolidation,
so that phenomena such as catastrophic forgetting could be avoided.
RNNs lose previously learned memory content quite easily when new
sequences are learned, thereby altering acquired connection weights.
Therefore, in the new scheme, the RNN rehearsed previously learned
content with the closed-loop operation and stored the generated
sequences in the hippocampus (corresponding to short-term memory)
together with the newly acquired sequences, and catastrophic forgetting
of existing memory was avoided by retraining the RNN with both the
rehearsed sequences and the newly experienced ones.
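A schematic of this consolidation loop might look as follows. It is a sketch of the idea rather than the original code; the methods rehearse_closed_loop and train are hypothetical stand-ins for closed-loop rehearsal and offline retraining.

    def consolidate(rnn, new_sequences, n_rehearsals=10):
        # Avoid catastrophic forgetting: retrain on rehearsed
        # (self-generated) sequences mixed with newly experienced ones.
        rehearsed = [rnn.rehearse_closed_loop() for _ in range(n_rehearsals)]
        short_term_store = rehearsed + list(new_sequences)  # "hippocampus"
        rnn.train(short_term_store)  # offline retraining during "rest"
        return rnn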
This rehearsal and consolidation might correspond to dreaming dur-
ing the REM sleep phase reported in the literature on consolidation
learning (Wilson & McNaughton, 1994; Squire & Alvarez, 1995). It
has been considered that generalization of our knowledge proceeds sig-
nificantly through consolidating newly acquired knowledge with older
knowledge during sleep. Our robot actually stopped for rest when this
rehearsal and consolidation learning was taking place after every fixed
period. However, in reality the process would not be so straightforward
as this if the rehearsed and the newly acquired experiences conflict with
each other. One of the aims behind the next experiment I will describe was to examine this point.

7.2.2 Intermittency During Dynamic Learning

The experiment was conducted in a confined workspace containing five landmarks (two colored objects and three corners). It was repeated
three times, and in each trial the robot circulated the workspace about
20 times, which was a limit imposed by the battery life of the robot.
We monitored three characteristic features of the robot's navigation behavior in each run: prediction error, bifurcation of the RNN dynam-
ics due to iterative learning, and phase plots representing the attractor
dynamics of the RNN at particular times during the bifurcation process.
A typical example is shown in Figure 7.7a.
The prediction error was quite high at the beginning of all trials
because of the initially random connection weights. After the first learn-
ing period, the predictability was improved to a certain extent in all
three trials, but the errors were not eliminated completely. Prediction
failures occurred intermittently in the course of the trials, and we can
see from the bifurcation diagram that the dynamical structure of the
RNN varied. In a typical example, shown in Figure 7.7a, a fixed-point
attractor appearing in the early periods of the learning iterations as a
single point is plotted at each step in the bifurcation diagram, in most
cases before the third learning period. After the third learning period, a
quasiperiodic or weakly chaotic region appears. Then, after the fourth
learning period, it becomes a limit cycle with a periodicity of 5, as can
be seen from the five points plotted in the bifurcation diagram at each
step during this period. In addition, a snapshot is shown in the phase
plot containing five points. After the fifth learning period, a highly cha-
otic region appears, as indicated by the strange attractor in the corre-
sponding phase plot.
Importantly, the state alternates between the strange attractor
(chaos) and the limit cycle attractor with a periodicity of 5. In fact,
limit-cycle dynamics with a periodicity of 5 appeared most frequently
in the course of all trials. A periodicity of 5 is indicative because it cor-
responds to the five landmarks that the robot encountered in a single
turn around the workspace. Indeed, the five points represent a dynamic
closure for the steady-state transitions between these five landmarks.
Figure 7.7. Experimental results for a vision robot. (a) Prediction error (over event steps), bifurcation diagram of the RNN dynamics (over learning times), and phase plots for two context units (c1, c2) at particular times during the learning process. (b) The robot's trajectories as recorded in the unsteady and steady phases. Adopted from Tani (1998) with permission.

However, it should be noted that this limit cycle with a periodicity of 5
does not remain stationary, because the periodicity disappears at times and other dynamical structures emerge. The dynamic closure observed
in the current experiment is not stable but changes in the course of
dynamic learning. From the view of symbolic dynamics (see chapter 5), this can be interpreted as meaning that the robot could mentally simulate various
symbolic sequence structures for encountering landmark labels including deterministic symbol sequences of a period of 5 and the ones with
probabilistic state transitions during the rehearsal.
From these results, we can conclude that there were two distinct
phases:a steady-state phase represented by the limit-cycle dynamics with
a periodicity of 5, and an unsteady phase characterized by nonperiodic
dynamics. We also see that transitions between these two phases took
place arbitrarily over the course of time, and that differences appeared
in the physical movements of the robot concurrently. To clarify why this
happened, we compared the actual robot trajectories observed in these
two periods. Figure7.7b shows the robot trajectories measured in these
two periods with a camera mounted above the workspace. The trajec-
tory was more winding in the unsteady phase than in the steady phase,
particularly in the way objects and corners were approached. From this
it was inferred that the robot's maneuvers were more unstable in the
unsteady phase because it spent more time on the visual recognition of
objects due to the higher prediction error. So, the robot faced a higher
risk of misdetecting landmarks when its trajectory meandered during
this period, which was indeed the case in the experiments. In the steady
phase, however, the detection sequence of landmarks became more
deterministic and travel was smooth, with greater prediction success.
What is important here is that these steady and unsteady dynamics not only arose in the internal cognitive processes of the neural network but also were expressed in the physical movements of the robot's body as it interacted with the external environment.
Finally, we measured the distribution of interval steps between cat-
astrophic error peaks (error >0.5) observed in three different experi-
ments of the robot (Figure 7.8).
The graph indicates that the distribution of the breakdown interval
has a long-tail characteristic with a power-law-like profile. This indi-
cates that the shift from the steady to the unsteady phase takes place
intermittently, without dominant periodicity. The observed intermit-
tency might be due to the tangency developed in the whole dynamics
(see section 5.1.). The observation here might be also analogous to the
so-called phenomenon of chaotic itinerancy (Tsuda et al., 1987; Ikeda
etal., 1989; Kaneko, 1990; Aihara etal., 1990)in which state trajec-
tories tend to visit multiple pseudoattractors one by one itinerantly in a
particular class of networks consisting of dynamic elements. Tsuda and
colleagues (1987) showed that intermittent chaos mechanized by means
of tangency in nonlinear mapping (see section 5.1) generated the chaotic itinerancy observed in his memory dynamics model.

Figure 7.8. Distribution of interval steps between catastrophic prediction error peaks greater than 0.5, where the x axis represents the interval steps and the y axis represents the frequency of appearance in the corresponding range, and both axes are in log scale.
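Checking for such a long-tail profile amounts to plotting the histogram on logarithmic axes. A minimal sketch, assuming intervals holds the measured steps between error peaks above 0.5:

    import numpy as np
    import matplotlib.pyplot as plt

    def loglog_interval_histogram(intervals):
        # Logarithmic bins; a roughly straight line on log-log axes is
        # consistent with a power law, though a rigorous claim would need
        # more data and a proper estimator.
        intervals = np.asarray(intervals, dtype=float)
        bins = np.logspace(0.0, np.log10(intervals.max()), num=8)
        counts, edges = np.histogram(intervals, bins=bins)
        centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
        plt.loglog(centers[counts > 0], counts[counts > 0], "o-")
        plt.xlabel("Interval (steps)")
        plt.ylabel("Frequency (times)")
        plt.show()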
The robotics experiment described in this section has demonstrated
that phenomena similar to chaotic itinerancy could also emerge in the
learning dynamics of a network model coupled with a physical environ-
ment. The dynamic learning processes while interacting with the outer
environments can generate complex trajectories that alternate between
stabilizing the memory contents and their breakdown.

7.2.3 Accounting for the Minimal Self

An interesting observation from the last experiment is that the transitions between steady and unsteady phases occurred spontaneously, even
though the workspace environment was static. In the steady phase, coher-
ence is achieved between the internal dynamics and the environmental
dynamics when subjective anticipation agrees closely with observation.
All the cognitive and behavioral processes proceed smoothly and auto-
matically, and no distinction can be made between the subjective mind
and the objective world. In the unsteady phase, this distinction becomes
rather explicit as conflicts in terms of the prediction error are gener-
ated between the expectations of the subjective mind and the outcome
generated in the objective world. Consequently, it is at this moment of
incoherence that the self-consciousness of the robot arises, whereby
the system's attention is directed toward the conflicts to be resolved. On the other hand, in the steady phase, the self-consciousness is reduced substantially, as there are no conflicts demanding the system's attention.
This interpretation of the experimental observations corresponds to the aforementioned analysis in Heidegger's example of the hammer missing the nail (see section 3.4) as well as in James's concept of the stream
aware momentarily in the discrete event of breakdown. With reference
to the Scottish philosopher David Hume, Gallagher (2000) considered
that this momentary self is in fact a minimal self, which should be dis-
tinguished from the self-referential self or narrative-self provided with a
past and a future in the various stories that we tell about ourselves.
However, one question still remains for us to address here: Why couldn't the coherence in the steady phase last longer, and why did the breakdown into incoherence take place intermittently? It seems that the com-
plex time evolution of the system emerged from mutual interactions
between multiple local processes. It was observed that changes in the
visual attention dynamics due to changes in the predictability caused
drifts in the robot's maneuvers. These drifts resulted in misrecogni-
tion of upcoming landmarks, which led to modification of the dynamic
memory stored in the RNN and a consequent change in predictability.
Dynamic interactions took place as chain reactions with certain delays
among the processes of recognition, prediction, perception, learning,
and acting, wherein we see the circular causality between the subjec-
tive mind and the objective world. So, this circular causality might then
provide a condition for developing a certain criticality.
The aforementioned circular causality can be explained more intu-
itively as follows. When the learning error decreases as learning pro-
ceeds, more strict timing of visual recognition is required for upcoming
landmarks because only a short period for recognition of the objects is
allowed, which is proportional to the magnitude of the current error.
In addition, the top-down image for each upcoming landmark pattern
is shaped into a fixed one, without variance. This is because the same
periodic patterns are learned repeatedly and the robot tends to trace
exactly the same trajectories in the steady phase. If all goes completely
as expected, this strictness grows as the prediction error decreases fur-
ther. Ultimately, at the peak of strictness, catastrophic failure in the
recognition of landmark sequences can occur as a result of even minor
noise perturbation because the entire system has evolved too rigidly by
building up relatively narrow and sharp top-down images.
The described phenomena remind me of a theoretical study con-
ducted on sand pile behavior by Bak and colleagues (1987). In their
simulation study, grains of sand were dropped onto a pile, one at a time.
As the pile grew, its sides became steeper, eventually reaching a critical
state. At that very moment, just one more grain would have triggered an
avalanche. I consider that this critical state is analogous to the situation
of generating catastrophic failures in recognizing the landmarks in the
robotics experiment. Bak found that although it is impossible to pre-
dict exactly when the avalanche will occur, the size of the avalanches is
distributed in accordance with a power law. The natural growth of the
pile to a critical state is known as self-organized criticality (SOC), and it
is found to be ubiquitous in various other phenomena as well, such as
earthquakes, volcanic activity, the Game of Life, landscape formation,
and stock markets. A crucial point is that the evolution toward a certain
critical state itself turns out to be a stable mechanism in SOC. It is as if
a critical situation such as tangency (see section 5.1) can be preserved
with structural stability in the system. This seems to be possible in the
system with relatively larger dimensions allowing local nonlinear interac-
tions inside (Bak et al., 1987).
Although we might need a larger experimental dataset to confirm the
presence of SOC in the observed results, I speculate that some dynamic
mechanisms for generating criticality could be responsible for the auton-
omous nature of the momentary self, which James metaphorically
spoke of as an alternation of periods of flight and perching throughout a
birds life. Here, the structure of consciousness responsible for generat-
ing the momentary self can be accounted for by emergent phenomena
resulting from the aforementioned circular causality.
Incidentally, readers may wonder how we can appreciate a robot with
such fragility in its behavior characterized by SOC: the robot could
die by crashing into the wall due to a large fluctuation at any moment.
I argue, however, that the potential for an authentic robot arises from
this fragility (Tani, 2009), remembering what Heidegger said about the
authentic being of man, who resolutely anticipates death as his own-
most possibility (see section 3.4). By following Heidegger, the vivid
nowness of a robot might be born in this criticality as a consequence of
the dynamic interplay between looking ahead to the future for possibili-
ties and regressing to the conflictive past through reflection. In this, the
robot may ultimately achieve authentic being in terms of its irreplaceable behavioral trajectories.
Finally, we may ask whether the account provided so far could open
a new pathway to access the hard problem of consciousness character-
ized by Chalmers (see section 4.3) or not. I would say yes by observ-
ing the following logic. The top-down pathway of predicting perceptual
event sequences exemplifies subjectivity because it is developed solely
along with the first-person experiences of perceptual events accumu-
lated through iterative interactions in the objective world. Subjectivity
is not a state but a dynamic function of predicting the perceptual
outcomes resulting from interactions with the objective world. If this
is granted, the consciousness that is the first-person awareness of one's own subjectivity can originate only from a sense of discomfort in one's own predictability, that is, the prediction error,2 which is also the first-
person experience but in another level of the second order (where the
contents of prediction are the first order). Subjectivity as a mirror of the
objective world cannot be aware just by itself alone. It requires differen-
tiation from the objective world as another pole by means of interacting
with it. To this end, the subject and the object turn out to be an insepa-
rable entity by means of the circular causality between them wherein
the open dynamics characterized by intermittent transitions between
the predictable steady phase and the conflictive unsteady one emerges.
And as such, this interpretation of experimental results reviewed in
this chapter provides insight into the fundamental structure of con-
sciousness, rather than merely into a particular state of consciousness
or unconsciousness at a moment.

7.3. Summary

This chapter introduced two robotics experiments on the topics of prediction learning in the navigation domain by utilizing mobile robots
with a focus on how robots can acquire subjective views of the exter-
nal world through iterative interactions with it. The first experiment
focused on the problem of learning to extract compositionality from sensory-motor experiences and their grounding.

2. Recently, Karl Friston (2010) proposed that a likelihood measure by prediction error divided by an estimate of its variance can represent the surprise of the system. This measure might quantify the state of consciousness better than simply error itself.

The experimental
results using the Yamabico robot showed that the compositionality hid-
den in the topological trajectory in the obstacle environments can be
extracted by the predictive model instantiated by the RNN. The navigation
of the robot became inherently robust because the mechanism of autore-
covery was supported by means of the development of the global attrac-
tor in the RNN dynamics. We concluded that symbol-like structures
self-organized in neurodynamic systems can be naturally grounded in
the physical environment by allowing active interactions between them
in a shared metric space.
The second experiment addressed the phenomenological problem of
self by further extending the aforementioned robot navigation experi-
ments. In this new experiment, a vision-based mobile robot implemented
with an RNN model learns to predict landmark sequences experienced
during its dynamic exploration of the environment. It was shown that the
developmental learning process during the exploration switches sponta-
neously between coherent phases (when the top-down prediction agrees
with the bottom-up sensation) and incoherent phases (when conflicts
appear between the two). By investigating possible analogies between
this result and the phenomenological literature on the self, we drew the
conclusion that the open dynamic structure characterized by SOC can
account for the underlying structure of consciousness through which
the momentary self appears autonomously.
It is interesting to note that, although I emphasized the grounding
of the subjective image of the world in the first navigation experi-
ment, the second experiment suggested that the momentary self could
appear instead in the sense of the groundlessness of subjectivity. The
apparent gap between these two has originated from two different
research attitudes for exploring cognitive minds, which are revisited
in later chapters. One drawback of the models presented for robot
navigation in this chapter is that the models could not provide direct
experience of perceptual flow to the robots because the model oper-
ated in an event-based manner that was designed and programmed
by the experimenters. The next chapter introduces a set of robotics
experiments focusing on mirror neuron mechanisms in which we con-
sider how event-like perception develops out of the continuous flow
of perceptual experience, as related to the phenomenological problem of time perception.

8

Mirroring Action Generation and Recognition with Articulating Sensory-Motor Flow

In the physical world, everything changes continuously in time, as a river flows. Discontinuity is just a special case. Sensory-motor states change continuously, and neural activation states in essential dimensions do so too, as Churchland observed (2010; also see section 4.3).
If this is granted, one of the most difficult questions in understanding
the sensory-motor system should be how continuous sensory-motor
flows can be recognized as well as generated structurally, that is, recog-
nized as segmented into chunks as well as generated with articulation.
According to the motor schemata theory proposed by Michael Arbib
(1981), a set of well-practiced motor programs or primitives are stored
in long-term memory, and different combinations of these programs in
space and time can generate a variety of motor actions. Everyday actions,
such as picking up a mug to drink some coffee, can be generated by con-
catenating different chunks or behavioral schemes, namely those of the
vision system attending to the mug, the hand approaching the handle of
the mug in the next chunk, followed by the hand gripping the handle
in the final chunk. Similarly, Yasuo Kuniyoshi proposed that complex human actions can be recognized by structurally segmenting the visual perceptual flow into concatenated reusable patterns (Kuniyoshi et al.,
1994). Kuniyoshi and colleagues (2004) also showed in a psychological
experiment that recognition of timing of such segmentation is essential
to extract crucial information about the action observed.
The problem of segmentation is closely related also to the afore-
mentioned phenomenological problem of time perception considered
by Husserl, which is concerned with the question of how a flow of
experiences in the preempirical level can be consciously recalled in the
form of articulated objects or events at the objective time level (section
3.2). Please note that we did not address this problem in the previous
experiments with the Yamabico robot because segmentation of sen-
sory flows was mechanized by the hand-coded program for branching.
Yamabico received sequences of discontinuous sensory states at each
branching point.
In this chapter, our robots have to deal with a continuous flow of
sensory-motor experiences. Then, we investigate how these robots
can acquire a set of behavioral schemes and how they can be used for
recognizing as well as generating whole complex actions by segment-
ing or articulating the sensory-motor flow. I presume that mirror neurons are integral to such processes because I speculate that they encode
basic behavior schemes in terms of predictive coding (Rao & Ballard,
1999; Friston, 2010; Clark, 2015) that can be used for both recogni-
tion and generation of sensory-motor patterns, as mentioned previ-
ously (see section 4.2). This chapter develops this idea into a synthetic
neurorobotics model.
The following sections will introduce our formulation of the basic
dynamic neural network model for the mirror neuron system. The for-
mulation is followed by neurorobotics experiments utilizing the model
for a set of cognitive behavior tasks including creation of novel patterns
via learning a set of behavior patterns, imitative learning, and acquisition
of actional concepts via associative learning between a quasilanguage
and motor behaviors. The analysis of these experimental results provides
us with some insight into how the interaction between the top-down
prediction/generation process and the bottom-up recognition process
can achieve segmentation of a continuous perceptual flow into mean-
ingful chunks, and how distributed representation schemes adopted in
the model can enhance the generalization of learned behavioral skills,
knowledge, and concepts.
8.1. A Mirror Neuron Model: RNNPB

In this section, we examine a dynamic neural network model, the recurrent neural network with parametric biases (RNNPB), that I and my colleagues (Tani, 2003; Tani et al., 2004) proposed as a possible model to account for the underlying mechanism for mirror neurons (Rizzolatti et al., 1996). The RNNPB model adopts the distributed rep-
resentation framework by way of which multiple behavioral schemes
can be memorized in a single network by sharing its neural resources.
This contrasts with the local representation framework in which each
memory content is stored in a distinct local module network separately
(Wolpert & Kawato, 1998; Tani & Nolfi, 1999; Demiris & Hayes, 2002;
Shanahan, 2006).
In RNNPB, the inputs of a low-dimensional static vector, the parametric bias (PB), represent the intention for action to be enacted. The
RNNPB generates prediction of the perceptual sequence for the out-
come of the enactment of the intended action. The RNNPB can model
the mirror neuron system in an abstract sense because the same PB
vector value accounts for both generation and recognition of the same
action in terms of the corresponding perceptual sequence pattern. This
idea corresponds to the aforementioned concept about the predictive
model in the parietal cortex associated with mirror neurons shown in
Figure 4.6. From the viewpoint of dynamical systems, the PB vector
is considered to play the role of bifurcation parameters in nonlinear
dynamical systems as the PB shifts the dynamic structure of the RNN
for generating different perceptual sequences. Let's look at the detailed mechanism of the model (Figure 8.1).
The RNNPB can be regarded as a predictive coding or genera-
tive model whereby different target perceptual sequence patterns,
pt (t = 0, ..., l-1), can be learned for regeneration as mapped from the cor-
responding PB vector values. The PB vector for each learning sequence
pattern is determined autonomously without supervision by utilizing
the error signals back-propagated to the PB units, whereas the synaptic
weights (common to all patterns) are determined during the learning
process as shown in Figure 8.1a. Readers should note that the RNNPB
can avoid the frame problem described in section 4.2 because the
dynamic mapping to be learned is not from arbitrary actions to per-
ceptual outcomes at each time step but from a specific set of actional
intentions to the corresponding perceptual sequences. This makes the learning process feasible because the network is trained not for all pos-
sible combinatorial trajectories but only for selected ones.

Figure 8.1. The system flow of a recurrent neural network with parametric biases (RNNPB) in (a) learning mode, (b) top-down generation mode where intention is set externally in the PB, and (c) bottom-up recognition mode wherein intention in the PB is inferred by utilizing the back-propagated error.
After the learning is completed, the network is used both for generat-
ing (predicting) and recognizing perceptual sequences. The learned per-
ceptual sequences can be re-generated by means of forward dynamics of
the RNNPB by setting the PB to the values determined in the learning
process (see Figure 8.1b). This is the top-down generation process with
the corresponding actional intention represented by the PB. Perceptual
sequences can be generated and predicted either in the open-loop mode
by receiving the current perceptual inputs from the environment, or in
the closed-loop mode, wherein motor imagery is generated by feeding
back the network's own prediction outputs into the inputs (the dotted line indicates the feedback loop).
On the other hand, in (c), the experienced perceptual sequences can
be recognized by searching the optimal PB values that minimize the
errors between the target sequences to be recognized and the output
sequences to be generated, as shown in Figure 8.1c. This is the bottom-
up process of inferring the intention in terms of the PB for the given
perceptual sequences. As an experiment described later shows, gen-
eration of action and recognition of the resultant perceptual sequences
can be performed simultaneously. More specifically, behavior is generated by predicting change in posture in terms of proprioception,
depending on the current PB, while the PB is updated in the direction
of minimizing the prediction error for each coming perceptual input.
By this means, the intention-perception cycle can be achieved in the
RNNPB, whereby the circular causality between intention and percep-
tion appears. Note also that both action learning and generation are
formulated as dynamic processes for minimizing the prediction error
(Tani, 2003), the formulation of which is analogous to the free-energy
principle proposed by Karl Friston (2005; 2010).
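This error-regression view of recognition can be made concrete with a small numerical sketch. It is not the original RNNPB implementation: f stands for any differentiable one-step predictor mapping (input, PB, context) to (prediction, next context), and a finite-difference gradient stands in for back-propagation through time to keep the code short. Recognition adapts only the PB vector; the connection weights inside f stay fixed.

    import numpy as np

    def infer_pb(f, observed_seq, pb0, context0, lr=0.1, n_iters=100,
                 eps=1e-4):
        # Recognition as error regression: adapt the PB vector (intention)
        # so that the generated sequence matches the observed one.
        def seq_error(pb):
            context, err = context0, 0.0
            x = observed_seq[0]
            for target in observed_seq[1:]:
                pred, context = f(x, pb, context)
                err += np.sum((pred - target) ** 2)
                x = target  # open-loop: use the actual inputs
            return err

        pb = pb0.copy()
        for _ in range(n_iters):
            grad = np.zeros_like(pb)  # finite differences stand in
            for i in range(pb.size):  # for the back-propagated error
                d = np.zeros_like(pb)
                d[i] = eps
                grad[i] = (seq_error(pb + d) - seq_error(pb - d)) / (2 * eps)
            pb -= lr * grad
        return pb  # the inferred intention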
Here, I should explain the learning process more precisely, because
its mechanism may not necessarily be intuitive. When learning is com-
menced, the PB vector of each training sequence is set to a small random
value. The forward top-down dynamics initiated with this temporarily
set PB vector generates a predictive sequence for the training perceptual
sequence. The error generated between the target training sequence and
the output sequence is back-propagated along the bottom-up path iter-
ated backward through time steps via recurrent connections, whereby
the connection weights are modified in the direction of minimizing the
error signal. The error signal is also back-propagated to the PB units, in
which their values for each training sequence are modified. Here, we
see that the learning proceeds by having dense interactions between
the top-down regeneration of the training sequences and the bottom-
up regression of the regenerated sequences utilizing the error signals.
The internal structures for embedding multiple behavior schemata can
be gradually developed through this type of bottom-up and top-
down interaction by self-organizing distributed representation in the
network.
It is also important to note that the generation of sequence pat-
terns is not limited to trained ones. The network can create a vari-
ety of similar or novel sequential patterns depending on the values of
the PB vector. It is naturally assumed that if PB vectors are similar,
they would generate similar sequence patterns, otherwise they could
be quite different. The investigation of these characteristics is one of
the highlights in the study of the current model characterized by its
distributed representational nature. The following subsections detail
such characteristics of the RNNPB model by showing robotics experi-
ments using it.
8.2. Embedding Multiple Behaviors in Distributed Representation

A simple experiment involving learning a set of target motor behaviors was conducted to examine PB mapping, in which a structure emerges as
a result of self-organization through the process of learning. PB mapping
shows how points in the PB vector space can be mapped to sequence
patterns to be generated after learning a set of target patterns. In this
experiment, an RNNPB was trained on five different movement pat-
terns of a robotic arm with four degrees of freedom. The five target
movement patterns in terms of four-dimensional proprioceptive (joint angle) sequence patterns are shown in Figure 8.2.
Teach-(1, 2, and 3) are discrete movements with different end points, and Teach-(4 and 5) are different cyclic movements. The arrows associated with those sequence patterns indicate the corresponding PB vector points determined in two-dimensional space in the training. It can be seen that the PB vectors for all three discrete movement patterns appear in the upper right region and the PB vectors for the two target cyclic movement patterns appear in the lower right region in PB space, which was found to be divided into two regions (the boundary is shown as a dotted curve), as shown in Figure 8.2.
The area above the dotted curve is the region for generating dis-
crete movements, and the remaining area under the dotted curve is
for cyclic movement patterns (including nonperiodic ones). An impor-
tant observation is that the characteristic landscape is quite smooth in
the region of discrete movements, whereby if the PB vector is changed
slightly, the destination point of the discrete movement changes only
slightly. Particularly inside of the triangular region defined by these
three PB points corresponding to the trained discrete movements, the
profiles of all generated sequence patterns seem to be generated by
interpolations of these three trained sequence patterns. On the other
hand, the characteristic landscape in the region of periodic movement
patterns is quite rugged against changes in the PB values. The pro-
files of generated patterns could change drastically as compared with
changes in the PB vector in this region. Patterns generated from this
region could include a variety of novel patterns, such as Novel-(1 and
2)shown in Figure 8.2. Novel-2 is a nonperiodic pattern that is espe-
cially difficult to imagine as being derived from the profiles of the
training patterns.
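The PB-mapping analysis can be reproduced schematically: sweep a grid over the two PB dimensions and roll out a closed-loop sequence for each grid point, using the same hypothetical predictor f assumed in the earlier sketch. Comparing the resulting sequences exposes the smooth (interpolative) and rugged regions of the mapping.

    import numpy as np

    def pb_map(f, x0, context0, n_steps=40, grid=11):
        # Sweep the 2-D PB space and generate a closed-loop sequence for
        # each grid point, exposing smooth and rugged regions of the map.
        patterns = {}
        for pb1 in np.linspace(0.0, 1.0, grid):
            for pb2 in np.linspace(0.0, 1.0, grid):
                pb = np.array([pb1, pb2])
                x, context, seq = x0, context0, []
                for _ in range(n_steps):
                    x, context = f(x, pb, context)  # feed predictions back
                    seq.append(x)
                patterns[(round(pb1, 2), round(pb2, 2))] = np.array(seq)
        return patterns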
Figure 8.2. Mapping from PB vector space with two-dimensional principal components to the generated movement pattern space. Proprioception profiles over time are shown for the trained patterns Teach-1 through Teach-5 and for two novel generated patterns, Novel-1 and Novel-2, together with their corresponding points in the (PB1, PB2) space.
One interesting observation here is that two qualitatively distinct regions appeared, namely the discrete movement part and the cyclic
movement part, including nonperiodic patterns. The former successfully
achieves generalization in terms of interpolation of trained sequence pat-
terns because it might be easy to extract common structures shared by
the three trained discrete movements, which exhibit fixed-point dynam-
ics with various destination points. On the other hand, in the latter case
it is difficult to achieve generalization because structures shared between
the two cyclic movement patterns with different shapes, periodicities,
and amplitudes are equally difficult to extract. This results in a highly
nonlinear landscape in this region due to the embedding of quite dif-
ferent dynamic patterns in the same region. In such a highly nonlinear
landscape, diverse temporal patterns can be created by changing the PB
vector.
The aforementioned experiment result fits very well with James's thought (James, 1892) that when the memory hosts complex relations
or connections between images of past experiences, images can be
regenerated with spontaneous variations into streams of consciousness
(see section 3.6). James predicted this type of phenomena without con-
ducting any experiments or simulations but only from formal introspec-
tion. Now that we have covered the basic characteristics of the RNNPB
model, the following subsections introduce a set of cognitive robotics
experiments utilizing the RNNPB model with a focus on mirror neu-
ron functions. First, the next subsection looks at the application of the
RNNPB model to a robot task of imitation learning.

8.3. Imitating Others by Reading Their Mental States

In section 5.2, I briefly explained the development of imitation behavior with emphasis on its early stage in which the imitation
mechanism is accounted for by simple stimulus response. I also intro-
duced a robot study by Gaussier and colleagues (1998) that showed
that robots can generate synchronized imitation with other robots
using acquired visuo-proprioceptive mapping under the homeosta-
sis principle. Rizzolatti and colleagues (2001) suggested the neural
mechanism at this level as response facilitation without understanding
meaning. Experimental results using monkeys indicated that the same
motor neurons in the rostral part of the inferior parietal cortex are acti-
vated both when a monkey generates meaningless arm movements and
when it observes them.
Also, as mentioned in section 4.2, it was observed that the same F5
neurons in monkeys fire when purposeful motor actions such as grasp-
ing an object, holding it, and bringing it to the mouth are either gener-
ated or observed. The neural mechanism at this level is called "response
facilitation with understanding meaning" (Rizzolatti et al., 2001), which
is considered to correspond to the third stage of the "like me" mecha-
nism hypothesized by Meltzoff (2005). In this stage, "my" mental state
can be projected onto those of others who act "like me." I consider that our
proposed mechanism for inferring the PB states in the RNNPB can account
for the "like me" mechanism at this level. Let's look here at the results
of a robotics experiment that my team conducted to elucidate how the
recognition of others' actional intentions can be mirrored in one's own
generation of the same action, wherein the focus falls again on the online
error regression mechanism used in the RNNPB model (Ito & Tani, 2004;
Ogata et al., 2009).

8.3.1 Model and Robot Experiment Setup

This experiment on imitative interactions between robots and humans
was conducted by using the Sony humanoid robot QRIO (Figure 8.3).
In the learning phase of this experiment, the robot learns multi-
ple hand movement patterns demonstrated by the experimenter. The
RNNPB learns to predict how the positions of the experimenter's
hands (perceived as a visual image) change in time in terms of a dynamic
mapping from vt to vt+1. Simultaneously, the network also learns, in an
imitative manner, to predict how its own arms (4-DOF joints for each
arm) move in correspondence with the observed movements performed
by the experimenter. This prediction takes the form of a dynamic map-
ping of arm proprioception from pt to pt+1 through direct training per-
formed by a teacher who guides the movements of the robot's arms
by moving them directly while following the experimenter's hand
movements. The tutoring is conducted for each movement pattern by
determining its corresponding PB vector for encoding. In the interac-
tion phase, when one of the learned movement patterns is demon-
strated by the experimenter, the robot is expected to recognize it by
Figure 8.3. Sony humanoid robot QRIO employed in the imitation learning
experiment. Reproduced from Tani et al. (2004) with permission.

inferring an optimal PB vector for reconstruction of the movement
pattern, through which its own corresponding movement pattern may
be generated. When the experimenter switches his/her demonstration
of hand movement patterns from one to another freely, the movement
patterns generated by the robot should change accordingly by inferring
the optimal PB vector.
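The inference step can be sketched minimally as follows: the PB vector is treated as the only free variable, the (stand-in) connection weights stay fixed, and the PB is slid downhill on the prediction error accumulated over an observed sequence. The gradient is computed numerically here for brevity, whereas the actual model back-propagates the error through time; the sinusoidal "observation" and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n_ctx, n_pb = 10, 2
W_rec = rng.normal(0.0, 0.5, (n_ctx, n_ctx + n_pb + 1))
W_out = rng.normal(0.0, 0.5, (1, n_ctx))

def prediction_error(pb, seq):
    # Open-loop error: feed the observed value at t, score the prediction for t+1.
    c, err = np.zeros(n_ctx), 0.0
    for t in range(len(seq) - 1):
        c = np.tanh(W_rec @ np.concatenate([c, pb, seq[t:t + 1]]))
        err += ((W_out @ c)[0] - seq[t + 1]) ** 2
    return err

observed = np.sin(np.linspace(0.0, 6.0 * np.pi, 60))  # stand-in perceptual flow
pb, eps, lr = np.array([0.5, 0.5]), 1e-4, 0.05
for _ in range(100):  # regression iterations
    grad = np.array([(prediction_error(pb + eps * d, observed)
                      - prediction_error(pb - eps * d, observed)) / (2 * eps)
                     for d in np.eye(n_pb)])
    pb -= lr * grad   # move PB in the direction that reduces the error
print("inferred PB:", pb)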

8.3.2 Results: Reading Others' Mental States by Segmenting Perceptual Flow

In the current experiment, after the robot was trained on four different
movement patterns, it was tested in terms of its dynamic adaptation
to sudden changes in the patterns demonstrated by the experimenter.
Figure 8.4 shows one of the obtained results in which the experimenter
switched demonstrated movement patterns twice during a trial of
160 steps.
It can be seen that when the movement pattern demonstrated by the
experimenter was shifted from one of the learned patterns to another,
[Figure 8.4 appears here: four rows of time-series plots over 160 steps showing the actual human hand position, the predicted human hand position, the generated robot arm joint angles, and the PB unit activations.]
Figure 8.4. Dynamic changes in the movement patterns generated by
the robot triggered by changes in the movements demonstrated by the
experimenter. The time evolution profile of the perceived position of the
experimenter's hand and the profile predicted by the robot are shown in the
first and the second rows, respectively. The third and fourth rows show the
time profiles for the predicted proprioception (joint angles) of the robot's
arm and the PB vectors, respectively. Adopted from Tani et al. (2004) with
permission.
the visual and proprioceptive prediction patterns were also changed cor-
respondingly, accompanied by stepwise changes in the PB vector. Here,
it can be seen that the continuous perceptual flow was segmented into
chunks of different learned patterns via sudden changes in the PB vector
mechanized by bottom-up error regression. This means that RNNPB
was able to read the transition of mental states of the experimenter by
segmenting the flow.
There was an interesting finding that connects the ideas of compo-
sitionality and segmentation. When the same robot was trained for a
long sequence that consisted of periodic switching between two differ-
ent movement patterns, the whole sequence was encoded by a single
PB vector without segmentation. This happened because perception of
every step in the trained sequence was perfectly predictable, includ-
ing the moment of switching between the movement patterns due to the
exact periodicity in the tutored sequence. When everything becomes
predictable, all moments of perception belong to a single chunk without
segmentation. The compositionality entails potential unpredictability
because there is always some arbitrariness, perhaps by free will, in
combining a set of primitives into the whole. Therefore, segmentation of
the whole compositional sequence into primitives can be performed by
using the resultant prediction error. In this situation, what is read from
the experimenter's mind might be his or her free will for alternating
among primitive patterns.
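The segmentation logic itself can be sketched independently of any particular network: track the per-step prediction error, smooth it over a short window, and mark a chunk boundary wherever the smoothed error crosses a threshold, which is where the PB regression would be triggered. The error trace below is synthetic; in the experiment it would come from the RNNPB's own predictions.

import numpy as np

rng = np.random.default_rng(2)
err = rng.normal(0.05, 0.01, 160)   # baseline: a well-predicted flow
err[60:70] += 0.3                   # surprise at a pattern switch
err[110:120] += 0.3                 # surprise at a second switch

window, thresh = 5, 0.15
smoothed = np.convolve(err, np.ones(window) / window, mode="same")
crossings = np.where((smoothed[1:] > thresh) & (smoothed[:-1] <= thresh))[0]
print("segment boundaries near steps:", crossings)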
The aforementioned results accord with the phenomenology of time
perception. Husserl assumed that the subjective experience of now-
ness is extended to include the fringes in the sense of both the expe-
rienced past and the future, in terms of retention and protention, as
described in section 3.3. This description of retention and protention
at the preempirical level seems to correspond directly to the forward
dynamics undertaken by RNNs (Tani, 2004). RNNs perform predic-
tion by retaining the past flow in a context-dependent way. This self-
organized contextual flow of the forward dynamics in RNNs could be
responsible for the phenomenon of retention. Even if Husserl's notion
of nowness in terms of retention and protention is understood as corre-
sponding to contextual dynamics in RNNs, the following question still
remains: What are the boundaries of nowness?
The idea of segmentation could be the key to answering this question.
Our main idea is that nowness is bounded where the flow of experience
is segmented (Tani, 2004). In the RNNPB model, when the external
perceptual flow cannot be matched with the internal flow correspond-
ing to the anticipated outcome, the resultant error drives PB vector
change. When the prediction is not fulfilled, the flow is segmented into
chunks, which are no longer just parts of the flow but rather represent
events that are identified as one of the perceptual categories by the
PB vector. This identification process takes a certain period of effort
accompanied by consciousness because of delays in the convergence of
the PB regression dynamics, as observed in the preceding experiments.
This might also explain the aforementioned observation by Varela
(1999) that the flow of events in the immediate past is experienced just
as an impression, which later becomes a consciously retrieved object
after undergoing segmentation. Finally, I claim that the projection of "my"
mental state onto those of others who act "like me," assumed in the third
stage of Meltzoff's (2005) "like me" mechanism, should accompany
such a conscious process.

8.3.3 Mutual Imitation Game

The previous experiment involved unidirectional interaction in which
only the robot adapted to movements demonstrated by the experi-
menter. Our next experiment examined the case of mutual interac-
tion by introducing a simple game played by the robot and human
subjects. In this new experiment, the robot was trained for four move-
ment patterns by the experimenters and then human subjects who
were unaware of what the robot had learned participated. In the imita-
tion game, the subjects were instructed to identify as many movement
patterns as possible and to synchronize their movements with those
of the robot through interactions. Five subjects participated in the
experiment and each subject was allowed to interact with the robot
for 1 hour.
Although most of the subjects eventually identified all of the move-
ment patterns, the interaction was not trivial for them. If they merely
attempted to follow the robot's movement patterns, convergence could
not be achieved in most instances because the PB values fluctuated
wildly when unpredictable hand movement patterns were demon-
strated. Actually, the robot tended to generate diverse movement pat-
terns due to fluctuations in the PB. Also, if the subjects attempted to
execute their desired movement patterns regardless of the robot's move-
ments, the robot could not follow them unless the movement patterns of
the subjects corresponded to those already learned by the robot.
The movement patterns of the human and the robot as well as the
neural activity (PB units) obtained during interaction in the imitation
game are plotted in Figure 8.5 in the same format as in Figure 8.4. We
can see that diverse movement patterns are generated by the robot
and the human subject, accompanied by frequent shifts during their
interactions.
It can be seen that matching by synchronization between the human
subject's movements and the robot's predictions is achieved after an
exploratory phase (see the sections denoted as Pattern 1 and Pattern
2 in the figure). However, it was often observed that such matching
was likely to break down before a match was achieved for another
pattern.
An interesting observation involves the spontaneous switching of
initiative between the robot and the subjects. In postexperiment inter-
views, the subjects reported that when they felt that the robot's movement
pattern became close to theirs, they just kept following the movements
passively to stabilize the pattern. However, when they felt that their
movements and those performed by the robot could not synchronize,
they often initiated new movement patterns, hoping that the robot
would start to follow them and eventually synchronize its movements
with those of the subject. This observation is analogous to the turn tak-
ing during imitative exchange observed by Nadel (2002) as described in
section5.2.
Another interesting observation was that spontaneous transitions
between the synchronized phase and the desynchronized phase tended
to occur more frequently in the middle of each session, when the sub-
ject was already familiar with the robot's responses to some degree.
When the subjects managed to reach a synchronized movement pat-
tern, they tended to keep the attained synchronization for a short
period of time to memorize the pattern. However, this synchronization
could break down after a while due to various uncertainties in mutual
interactions. Even small perturbations could confuse the subjects if
they were not yet fully confident of the robot's repertoire of movement
patterns. This too can be explained by the mechanism of self-organized
criticality (see section 7.2), which can emerge only during a specific
period characterized by an adequate balance between predictability
[Figure 8.5 appears here: time-series plots over 200 steps showing the actual human hand position, the predicted human hand position, and the PB unit activations, with the sections denoted Pattern 1 and Pattern 2 marked.]
Figure 8.5. A snapshot of parameter values obtained during the imitation game. Movement matching by synchronization between the human subject and the robot took place momentarily, as can be seen from the sections denoted as Pattern 1 and Pattern 2 in the plot.
and unpredictability in the course of the subjects' developmental
learning in the mutual imitation game. Turn taking was observed more
frequently during this period. These results imply that vivid commu-
nicative exchanges between individuals can appear by utilizing and
anticipating such criticality.
The current experimental results of the imitation game suggest that
imitation provides not only a simple function of storing and regenerating
observed patterns but also rich functions of spontaneously
generating novel patterns from learned ones through dynamic interac-
tions with others. In this context, we may say that imitation for human
beings is a means for developing diverse creative images and actions
through communicative interaction, rather than simply for mimicking
action patterns as demonstrated by others "like me."
The next subsection explores how mirror neurons may function in
developing actional concepts through the association of language with
action learning.

8.4. Binding Language and Action

In conventional neuroscience, language processing and action processing
have been treated as independent areas of research simply because of
the different areas of expertise necessary for conducting studies in each
of those areas. However, as mentioned in section 4.2, recent reports
have shown that understanding words or sentences related to actions
may require the presence of specific motor circuits responsible for gen-
erating those actions, and therefore the parts of the brain responsible
for language and actions might be interdependent (Hauk et al., 2004;
Tettamanti et al., 2005).
According to Chomskian ideas in conventional linguistics, linguistic
competence has been regarded as independent from other competen-
cies, including sensory-motor processing (see the argument on the fac-
ulty of language in the narrow sense by Hauser, Chomsky, and Fitch [2002]
in section 2.1). This view, however, is now being challenged by recent
evidence from neuroscience, including the aforementioned studies
examining the interdependence between linguistic and other modali-
ties. If everyday experiences involving speech and its corresponding
sensory-motor signals tend to overlap during child development, synap-
tic connections between the two circuits can be reinforced by Hebbian
learning, as discussed by Pulvermüller (2005). This suggests the pos-
sibility that the meanings of words and sentences as well as associated
abstract concepts can be acquired in association with related sensory-
motor experiences. Researchers working in the area of cognitive lin-
guistics have proposed the so-called usage-based approach (Tomasello,
2009), wherein it is argued that linguistic competency can be acquired
through statistical learning of linguistic and sensory-motor stimuli dur-
ing child development, without the need to assume innate mechanisms
such as Chomsky's universal grammar. Analogous to these ideas is the
view of Arbib (2012) discussed earlier, that the evolution from dex-
terous manual behaviors learned by imitation to the anticipated imi-
tation of conventionalized gestures (protolanguage) is reflected in the
evolution within the primate line and resulted in humans endowed with
language-ready brains.

8.4.1 Model

In this context, we consider the possibly interdependent nature of lan-
guage and motor action in terms of a mirror neuron model. This concept
is based on a predictive coding model for linguistic competence assumed
in the extension of Wernicke's area to Broca's area and another predic-
tive coding model for the action competency assumed in the extension
from Broca's area and the parietal cortex to the motor cortex. Broca's
area, as a hub connecting these two distinct pathways, is assumed to
play the role of unifying the two different modalities by means of mir-
roring recognition in one modality and generation in the other modality
by sharing the intention.
The version of the RNNPB model proposed by Yuuya Sugita and
me (Sugita & Tani, 2005) for investigating the task of recognizing a
given set of action-related imperative sentences (word sequences) and of
also generating the corresponding behaviors (sensory-motor sequences)
is shown in Figure 8.6.
The model consists of a linguistic RNNPB and a behavioral RNNPB
that are interconnected through PB units. The key idea of the model
is that the PB activation vectors in both modules are bound to become
identical for generating pairs of corresponding linguistic and behavioral
sequences via learning. More specifically, in the course of associative
learning of pairs of linguistic and behavioral sequences, the PB activa-
tion vectors in both modules are updated in the direction of minimizing
[Figure 8.6 appears here: schematic of the linguistic and behavioral RNNPB modules connected through shared PB units, in (a) the learning phase and (b) the recognition and generation phase.]
Figure 8.6. RNNPB model extended for language-behavior bound learning.
(a) Bound learning of word sequences and corresponding sensory-motor
sequences through shared PB activation and (b) recognition of word
sequences in the linguistic recurrent neural network with parametric biases
(RNNPB) and generation of corresponding sensory-motor sequences in the
behavioral RNNPB. Redrawn from Tani et al. (2004).

their differences as well as minimizing the prediction error in both
modalities (Figure 8.6a). By using the error signal back-propagated from
both modules to the shared PB units, a sort of unified representation
between the two modalities is formed through self-organization in the
PB activations. After convergence of the bound learning, word sequences
shown to the linguistic RNNPB can be recognized by inferring the PB
activation values by means of error regression. Thereafter, the forward
dynamics of the behavioral RNNPB activated with the obtained PB acti-
vation values generate a prediction of the corresponding sensory-motor
sequences (Figure 8.6b).
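The binding rule can be sketched abstractly. Assume each module exposes the gradient of its own prediction error with respect to its PB vector (simple quadratic stand-ins below). Each PB is then updated both to reduce its modality's error and to close the gap with the other module's PB, so that the two settle on a shared representation. This is a minimal sketch under those assumptions, not the trained model itself.

import numpy as np

def grad_linguistic(pb):   # stand-in for dE_lang/dPB of the linguistic module
    return 2.0 * (pb - np.array([0.3, 0.7]))

def grad_behavioral(pb):   # stand-in for dE_behav/dPB of the behavioral module
    return 2.0 * (pb - np.array([0.4, 0.6]))

pb_l = np.array([0.9, 0.1])   # linguistic PB
pb_b = np.array([0.1, 0.9])   # behavioral PB
lr, bind = 0.1, 0.5           # learning rate and binding strength
for _ in range(200):
    pb_l = pb_l - lr * (grad_linguistic(pb_l) + bind * (pb_l - pb_b))
    pb_b = pb_b - lr * (grad_behavioral(pb_b) + bind * (pb_b - pb_l))
print(pb_l, pb_b)             # both settle near a shared, compromise PB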

8.4.2 Robot Experiments

Yuuya Sugita and I (Sugita & Tani, 2005) conducted robotics experi-
ments on this model by utilizing a quasilanguage with the aim of gain-
ing insights into how humans acquire compositional knowledge about
action-related concepts through close interactions between linguistic
inputs and related sensory-motor experiences. We also addressed the
issue of generalization in the process of learning linguistic concepts,
which concerns the inference of the meanings of as yet unknown combi-
nations of word sequences through a generalization capability related to
the "poverty of stimulus" problem (Chomsky, 1980) in human language
development.
A physical mobile robot equipped with vision and a one-DOF arm
was placed in a workspace in which red, blue, and green objects were
always located to the left, in front, and to the right of the robot, respec-
tively (Figure 8.7).

[Figure 8.7 appears here: (a) the task environment with red, blue, and green objects and the mobile robot with vision and a one-DOF arm at its home position; (b) a trained behavior trajectory.]
Figure 8.7. Robot experiment setup for language-behavior bound learning.
(a) The task environment with the mobile robot in the home position and
three objects in front of the robot. (b) A trained behavior trajectory of the
command "hit red." Adopted from Tani et al. (2004) with permission.
A set of sentences consisting of three verbs (point, push, hit) and six
nouns (left, center, right, red, blue, green) was considered. For exam-
ple, "push red" means that the robot is to move to the red object and
push it with its body, and "hit left" means that the robot is to move
to the object to its left and hit it with its arm (Figure 8.7b). Note that
"red" and "left" are synonymous in the setting of this workspace, as are
"blue" and "center" and as are "green" and "right." For given combinations
of verbs and nouns, corresponding actions in terms of sensory-motor
sequences composed of more than 100 steps are trained by guiding the
robot while introducing slight variations in the positions of the three
objects with each trial. The sensory-motor sequences consist of sensory
inputs in the form of several visual feature vectors, values for motor
torques of the arm and wheel motors, and motor outputs for the two
wheels and the one-DOF arm. To investigate the generalization capabil-
ities of the robot, especially in the case of linguistic training, only 14 out
of the 18 possible sentences were trained. This means that behavioral
categories corresponding to the four untrained sentences were learned
without being bound with sentences.

8.4.3 Compositionality and Generalization

Recognition and generation tests were conducted after convergence in
learning was attained by minimizing the error. Corresponding behav-
iors were successfully generated for all 18 sentences, including the
four untrained ones. To examine the internal structures emerging as
a result of self-organization in the bound learning process, an analysis
of the PB mapping was conducted by taking two-dimensional principal
components in the original six-dimensional PB space. Figure 8.8 shows
the PB vector points corresponding to all 18 sentences as plotted in a
two-dimensional space.
These PB points were obtained as a result of the recognition of cor-
responding sentences. The PB vector points for the four untrained word
sequences are surrounded by dashed circles in the figure. First, it can be
seen that PB points corresponding to sentences with the same verbs fol-
lowed by synonymous nouns appeared close to each other on the two-
dimensional map. For example, "hit left" and "hit red" appeared close
to each other in the space. Even more interesting is that the PB map-
pings for all 18 sentences appeared in the form of a two-dimensional
grid structure with one dimension for verbs and another for nouns.
This means that the PB mapping emerged through self-organization of
an adequate metric space, which can be used for compositional repre-
sentation of acquired meanings in terms of combinations of verbs and
object nouns. Furthermore, it should be noted that even the untrained
sentences ("push red/left" and "point green/right") were mapped to
appropriate points on the grid (see the points surrounded by dotted
circles in Figure 8.8). This explains why untrained sentences were rec-
ognized correctly, as inferred from the successful generation of corre-
sponding behaviors.
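The analysis behind this figure amounts to ordinary principal component analysis of the recovered PB vectors. A minimal sketch follows, with random stand-ins for the 18 six-dimensional PB vectors that would be obtained by recognizing each sentence:

import numpy as np

rng = np.random.default_rng(3)
pb_vectors = rng.random((18, 6))          # one 6-D PB vector per sentence

X = pb_vectors - pb_vectors.mean(axis=0)  # center the data
U, S, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T                     # first two principal components
print(coords.round(2))                    # 2-D point per sentence, cf. Figure 8.8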
These results imply that meanings are acquired through generaliza-
tion when a set of meanings is represented as a distribution of neural
activity while preserving the mutual relationships between meanings in
a binding metric space. Such generalization cannot be expected to arise
if each meaning or concept is stored in a separate local module as is the
case in localist models. It is postulated that mutual interactions between

[Figure 8.8 appears here: the 18 sentences plotted against the first and second principal components of the PB space, forming a grid with a verb axis (point, push, hit) and a noun axis (red/left, blue/center, green/right).]
Figure 8.8. Mapping from PB vector points to generated word sequences.
The two-dimensional grid structure consists of an axis for verbs and
another for nouns. Four PB points surrounded by dotted circles correspond
to untrained sentences (push red, push left, point green, and point right).
Redrawn from Sugita and Tani (2005).
different concepts during learning processes can eventually induce the
consolidation of generalized structures in the memory structure as rep-
resented earlier in the form of a two-dimensional distribution. This idea
is analogous to what the PDP group (1986) argued in their connection-
ist book more than two decades ago (see section 5.4).
Finally, I would like to add one more remark concerning the role of
language in developing compositional conceptual space. When the afore-
mentioned experiments were conducted without binding the linguistic
inputs in learning the same set of action categories, we found that nine
different clusters corresponding to different actional categories were
developed without showing any structural relations among them, such
as is illustrated by the aforementioned two-dimensional grid structure
in the PB space. This result suggests that compositionality explicitly
perceived in the linguistic input channel can enhance the development
of compositionality in the actional channel via shared neural activity,
perhaps, again, within Broca's area of the human brain.

8.5. Summary

We've now covered RNNPB models that can learn multiple behavioral
schemes in the form of structures represented as distributions in a sin-
gle RNN. The model is characterized by the PB vector, which plays an
essential role in modeling mirror neural functions in both the generation
and recognition of movement patterns by forming adequate dynamic
structures internally through self-organization. The model was evalu-
ated through a set of robotics experiments involving the learning of
multiple movement patterns, the imitation learning of others' move-
ment patterns, and the generation of actional concepts via associative
learning of proto-language and behavior.
The hallmark of these robotics experiments lies in their attempt
to explain how generalization in learning as well as creativity for gen-
erating diversity in behavioral patterns can be achieved through self-
organizing distributed memory structures. The contrast between the
proposed distributed representation scheme and the localist scheme in
this context is clear. On the localist scheme, each behavioral schema
is memorized as an independent template in a corresponding local
module, whereas on the distributed representation scheme, learning
is considered to include not just memorizing each template of behav-
ioral patterns but also reconstructing them by extracting the structural
relationships between the templates. If there are tractable relationships
between learned patterns in a set, these relationships should appear in
the corresponding memory structures as embedded in a particular met-
ric space. Such characteristics of distributed representation in the RNNPB
model have been investigated by others (Ogata et al., 2006; Ogata et al.,
2009; Zhong et al., 2014) as well.
The aforementioned characteristics were demonstrated clearly in
the analysis of the PB mapping obtained in the results of learning a set
of movement patterns and of learning bound linguistic and behavioral
patterns. The RNNPB model learned a set of experienced patterns not
just as they were, but also deeply consolidated them, resulting in the
emergence of novel or creative images. This observation might account
for a fascinating mechanism of human cognition by way of which we
humans can develop images or knowledge through multiple stages from
our own limited experiences: In the first stage, each instance of experi-
ence is acquired; in the second stage, generalized images or concepts
are developed by extracting relational structures among the acquired
instances; and in the third stage, even novel or creative ones can be found
in the memory developed with the relational structures after a long
period of consolidation.
Another interesting characteristic of the model is that it
accounts for both top-down generation and bottom-up recognition
processes by utilizing the same acquired generative model. Interactions
between these two processes take place in offline learning processes as
well as during real-time action generation/recognition. In offline learn-
ing, iterations of top-down and bottom-up interactions enable long-
term structural developments of the internal structures for PB mapping
in terms of memory consolidation, as mentioned previously. In real-time
action generation/recognition, shifts of the PB vector by means of error
regression enable rapid adaptation to situational changes. As observed
in the imitative game experiments, nontrivial dynamics emerge in the
close interactions between top-down prediction and bottom-up recog-
nition, leading to segmentation of the continuous perceptual flow into
meaningful chunks. Complexity arises from the intrinsic characteristics
of mutual interactions occurring in the process, whereby recognition
of the actions of others in the immediate past has a profound effect on
the actions generated by the robot in the current step, which in turn
affects the recognition of these perceptual inputs in the immediate
future, thereby forming a circular causality over the continuum of time
between protention and retention.
The same error regression mechanism can account for the
problem of imitation. How can motor acts demonstrated by others be
imitated by reading their intentions or mental states? It was shown that
imitating others by inferring their mental states can be achieved by
segmenting the resultant perceptual flow by regressing the PB states
with the prediction error. This prediction error may result in the subject
becoming conscious while recognizing the shift of mental states of oth-
ers as they alternate their motor acts.
Finally, I assume there might be some concerns about the scalabil-
ity of the RNNPB model, or more specifically whether there are any
limits to the degree of complexity that the learned behavioral pat-
terns can have. Here, I just mention that this scalability issue depends
heavily on how functional hierarchies can be developed that can
decompose complex patterns into sets of simpler ones, or conversely
compose them, in the network. Accordingly, the next two chapters and
the final chapter of this book are entirely dedicated to the investigation
of this problem.
9
Development of Functional Hierarchy for Action

It is generally held that the brain makes use of hierarchical organization
for both recognizing sensory inputs and generating motor outputs. As
an example, chapter 4 illustrated how visual recognition proceeds in the
brain from early signal processing in the primary vision area to object
recognition in the inferior temporal area. It also described how action
generation proceeds from the sequencing and planning of action primi-
tives in the supplementary motor area and prefrontal cortex (PFC) to
motor pattern generation in the primary motor cortex (M1). Although
we don't yet completely understand what hierarchy and what levels
exist in the brain and how they actually function, it is generally accepted
that some form of functional hierarchy exists, whereby sensory-motor
processing is conducted at the lower level and more global controls of
those processes occur at the higher level. Also, this functional hierar-
chy is thought to be indispensable for expressing the essential human
cognitive competency of compositionality, in other words, composition
and decomposition of whole complex action routines from and into
reusable parts.

In speculating about possible neuronal mechanisms for a functional
hierarchy that allows for complex actions to be composed by sequen-
tially combining behavior primitives (a set of commonly used behavior
patterns), readers should note that there are various ways to achieve
such compositions. One possibility is to use a localist representation
scheme. For example, Tani and Nolfi (1997, 1999) proposed a localist
model called a hierarchical mixture of RNNs; Demiris and Hayes
(2002) showed a similar idea in their proposal of Hierarchical Attentive
Multiple Models for Execution and Recognition; and Haruno
and colleagues (2003) did likewise in their proposal of the so-called
hierarchical MOSAIC. The basic idea was that each behavior primitive is stored
in its own independent local RNN at the lower level, and sequential
switching of the primitives is achieved by a winner-take-all-type gate-
opening control of these RNNs performed by the higher level RNN
(see Figure 9.1).
Information processing at the higher level is abstracted in such a
way that the higher level only remembers which RNN in the lower
level should be selected next as well as the timing of switching over
a longer timescale, without concerning itself with details about the
[Figure 9.1 appears here: schematic of a higher level RNN controlling the gate openings of three lower level RNNs, each generating its own primitive pattern of perceptual predictions.]
Figure 9.1. Hierarchical generation of perceptual sequence patterns in
the hierarchical mixture of RNNs. As the higher level RNN dispatches
the lower level RNNs sequentially by manipulating the openings of their
attached gates, sequential combinations of primitive patterns can be
generated.
sensory-motor profiles themselves. Although the proposed scheme
seems to be straightforward in terms of mechanizing a functional hier-
archy, the scheme is faced with the problem of miscategorization in
dealing with perturbed patterns. Moreover, the discrete mechanism of
dispatching behavior primitives through the winner-take-all-type selec-
tion of the lower RNNs tends to generate a certain level of information
mismatch between the higher and lower levels.
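The gating scheme can be stated compactly: the higher level opens exactly one gate per segment, and the generated output is the gate-weighted mixture of the lower-level networks' predictions. In the minimal sketch below, the "experts" are simple stand-in functions rather than trained RNNs, and the switching schedule is fixed rather than learned.

import numpy as np

experts = [lambda t: np.sin(0.3 * t),            # primitive pattern 1
           lambda t: np.sign(np.sin(0.2 * t)),   # primitive pattern 2
           lambda t: 0.5 * np.cos(0.5 * t)]      # primitive pattern 3

def gates(t):
    # Higher level: winner-take-all schedule that switches every 40 steps.
    g = np.zeros(len(experts))
    g[(t // 40) % len(experts)] = 1.0
    return g

# Lower level outputs are mixed through the gates at every step.
seq = [float(np.dot(gates(t), [e(t) for e in experts])) for t in range(120)]
print(np.round(seq[:8], 2), "...")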
Another possible mechanism can be considered by utilizing a distrib-
uted representation scheme in an extension of the RNNPB model. As
I previously proposed (Tani, 2003), if a specific PB vector value is assigned
to each acquired behavior primitive, sequential changes in the PB vector
generated at the higher level by another RNN can cause corresponding
sequential changes in the primitives at the lower level (Figure 9.2).
The higher level RNN learns to predict event sequences in terms
of stepwise changes in the PB vector, as well as the timings of such
events. However, this scheme could also suffer from a similar problem

[Figure 9.2 appears here: schematic of a higher level RNN generating stepwise PB vector sequences that drive the corresponding primitive patterns in a lower level RNNPB.]
Figure 9.2. Possible extension of the RNNPB model with hierarchy, wherein
sequential stepwise changes in the PB vector at the higher level generate
corresponding changes in the primitive patterns at the lower level. Redrawn
from Tani (2003).
of information mismatch between the two levels. If one behav-
ior primitive is concatenated to another by corresponding stepwise
changes in the PB vector, a smooth connection between the two
primitives cannot be guaranteed. A smooth connection often requires
some degree of specific adaptation of profiles at the tail of the preced-
ing primitive and at the head of the subsequent primitive, depend-
ing on their combination. However, such fine adaptation cannot take
place by simply changing the components of the PB vectors in a step-
wise manner within the time necessary for the primitive to change.
The same problem is encountered in the case of gated local network
models if primitives are changed by simply opening and closing the
corresponding gates.
The crucial point here is that the generation of compositional actions
cannot be achieved by simply transforming primitives into sequences
in the same manner as manipulating discrete objects. Instead, the
task requires fluid transitions between primitives by adapting them
via interactions between top-down parametric control exerted on the
primitives and bottom-up modulation of signals implementing such
parametric control. Close interactions could minimize the possible
mismatch between the two sides, whereby we might witness what
Alexander Luria (1973) metaphorically referred to as "kinetic melody"
in the fluid generation of actions. The following sections show that
such fluid compositionality can be achieved without using preexist-
ing mechanisms such as gating and parametric biases. Rather, it can
emerge by using intrinsic constraints on timescale differences in neu-
ral activity between multiple levels in the course of self-organization,
accompanied by iterative interactions between the levels in consolida-
tion learning.
In the following, we see how the functional hierarchy that enables
compositional action generation can be developed through the use of
a novel RNN model characterized by its multiple timescales dynam-
ics. The model is tested in a task involving learning object manipula-
tion and developing this learning. We then discuss a possible analogy
between the synthetic developmental processes observed and real
human infant developmental processes. The discussion helps to explain
how fluid compositionality can be developed in both humans and
artifacts through specific constraints within their brain networks.
9.1. Self-Organization of Functional Hierarchy in Multiple Timescales

9.1.1 Multiple-Timescale Recurrent Neural Network

My colleague Yuuichi Yamashita and I (Yamashita & Tani, 2008) pro-
posed a dynamic neural network model characterized by the dynamics of
its neural activity in multiple timescales. This model, named the multiple-
timescale recurrent neural network (MTRNN), is outlined in Figure 9.3.
The MTRNN consists of interconnected subnetworks to which
dynamics with different timescales are assigned. Each subnetwork takes
the form of a fully connected continuous-time recurrent neural network
(CTRNN) with a specific time constant assigned to its neural
activation dynamics, as can be seen in Eq. 17 in section 5.5.
The model shown in Figure 9.3 is composed of subnetworks with slow,

Intention state
Top-down Generation
Update Set

Slow
Action Plans
Slow Init A

Init B
Bottom-up error regression

Intermediate
Top-down prediction

Intermediate Pool for Primitives

Fast
Fast Compositional
Vision Generations
Proprio module
module
Action A

Pt+1 Action B
Pt Vt Vt+1
Error Error

Pt+1 Vt+1
motort+1

Figure9.3. The multiple-t imescale recurrent neural network (MTRNN)


model. The left panel shows the model architecture and the right panel
the information flow in the case of top-down generation of different
compositional actions, Action Aand Action B as triggered by the
corresponding intention in terms of initial states of Init Aand Init B in the
intention units, respectively.
intermediate, and fast dynamics characterized by leaky-integrator
neural units with larger, medium, and smaller values of the time
constant τ, respectively. Additionally, the subnetwork with fast
dynamics is subdivided into two peripheral modular subnetworks for
proprioception/motor operations and for vision. Our expectation in
the proposed multiple timescales architecture was that the slow
dynamics subnet, using large time constant leaky-integrator units,
should be good at learning long-time correlations, as indicated by
Jaeger and colleagues (2007), whereas the fast dynamics one should be
good at learning precise short-ranged patterns.
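The timescale mechanism is simply the leaky-integrator update referenced above: at each step the internal state u of a unit is updated as u <- (1 - 1/τ)u + (1/τ)(synaptic input), so a large τ makes the state change slowly. A minimal sketch with random stand-in weights and three τ groups:

import numpy as np

rng = np.random.default_rng(4)
n = 12
tau = np.array([1.0] * 4 + [5.0] * 4 + [70.0] * 4)  # fast / intermediate / slow
W = rng.normal(0.0, 0.5, (n, n))                    # stand-in connection weights

u = np.zeros(n)                  # internal states
for t in range(100):
    a = np.tanh(u)               # firing rates
    u = (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ a)
# Fast units (tau = 1) track their inputs almost immediately;
# slow units (tau = 70) drift far more gradually.
print(np.round(np.tanh(u), 2))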
We designed this particular model to generate targets of multiple per-
ceptual sequences that contain a set of primitives or chunks acquired as
a result of supervised learning. In this case, we made use of the sensitiv-
ity of the dynamics toward initial conditions seen in nonlinear dynamics
(see section 5.1) as a mechanism for selecting a specific sequence from
among multiple learned ones as the intended one. The network dynamics
always start with the same neutral neural states for all units, with the
exception of some neural units in the subnetwork with slow dynam-
ics referred to as intention units. By providing specific initial states for
these intention units, corresponding learned perceptual sequences can
be regenerated as intended, and thus the initial states of the intention
units play the role of selecting sequences, similar to the role of PB vec-
tors in RNNPB models. The difference is that selection in the case of PB
is based on parametric bifurcation, whereas in the case of intention units
in MTRNNs it is performed by utilizing the sensitivity of the network
dynamics to the initial conditions. We decided to employ a switching
scheme based on sensitivity to the initial conditions for the MTRNN
because this feature affords learning of sequence patterns with a long
time correlation.
Adequate mappings between the respective initial states of the inten-
tion units and the corresponding perceptual sequences are acquired
by means of the error back-propagation through time learning scheme
applied for CTRNN (Eq. 18 in section 5.5). In the course of error back-
propagation learning, two classes of variables are determined, namely
the connection weights in all subnetworks and a specific set of initial
state values for the intention units for each perceptual sequence to be
learned. When learning is commenced, the initial state of the intention
units for each training sequence is set to a small random value. The for-
ward top-down dynamics initiated with this temporarily set initial state
generates a predictive sequence for the training visuo-proprioceptive
sequence. The error generated between the training sequence and the
output sequence is back-propagated along the bottom-up path through
the subnetworks with fast and intermediate dynamics to the subnetwork
with slow dynamics, and this back-propagation is iterated backward
through time steps via recurrent connections, whereby the connection
weights within and between these subnetworks are modified in the
direction of minimizing the error signal. The error signal is also back-
propagated through time steps to the initial state of the intention units,
in which the initial state values for each training sequence are modified.
Here, we see again that learning proceeds through dense interactions
between top-down regeneration of the training sequences and bottom-
up regression of the regenerated sequences utilizing error signals, just as
the RNNPB does.
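The role of the initial state can be illustrated in isolation. In the sketch below, the connection weights of a small leaky-integrator network are frozen at random values, and only the initial state z0 is adjusted by gradient descent so that the unfolded trajectory approximates one target sequence. The full model trains the weights jointly and computes gradients by back-propagation through time; the numerical gradient here is for brevity only.

import numpy as np

rng = np.random.default_rng(5)
n, tau = 8, 4.0
W = rng.normal(0.0, 0.7, (n, n))        # frozen stand-in weights
W_out = rng.normal(0.0, 0.7, (1, n))
target = np.sin(np.linspace(0.0, 2.0 * np.pi, 30))   # one training sequence

def rollout_loss(z0):
    # Unfold the network from initial state z0 and accumulate squared error.
    u, loss = z0.copy(), 0.0
    for t in range(len(target)):
        u = (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ np.tanh(u))
        loss += ((W_out @ np.tanh(u))[0] - target[t]) ** 2
    return loss

z0, eps, lr = np.zeros(n), 1e-4, 0.02
for _ in range(300):                    # gradient descent on the initial state
    grad = np.array([(rollout_loss(z0 + eps * d) - rollout_loss(z0 - eps * d))
                     / (2 * eps) for d in np.eye(n)])
    z0 -= lr * grad
print("final loss:", rollout_loss(z0))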
One point to keep in mind here is that the dampening of the error
signal in backward propagation through time steps depends on the time
constant, as described previously (see Eq. 18 in section 5.5). It becomes
smaller within the subnetwork with slow dynamics (characterized by
a larger time constant) and greater within the subnetwork with fast
dynamics (characterized by a smaller time constant). This forces the
learning process to extract the underlying correlations spanning lon-
ger periods of time in the training sequences in the parts with slower
dynamics and correlations spanning relatively shorter periods of time in
the parts with faster dynamics in the whole network.
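The dependence is easy to quantify: the internal state carries a factor of (1 - 1/τ) from one step to the next, so an error signal propagated back n steps is attenuated by roughly (1 - 1/τ)^n. The following lines make the contrast concrete:

# Attenuation of a back-propagated error after n steps, for three values of tau.
for tau in (2.0, 5.0, 70.0):
    print(tau, [round((1.0 - 1.0 / tau) ** n, 4) for n in (1, 10, 100)])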
The right panel of Figure 9.3 illustrates how learning multiple per-
ceptual sequences consisting of a set of primitives results in the devel-
opment of the corresponding functional hierarchy. First, it is assumed
that a set of primitive patterns or chunks should be acquired in the
subnetworks with fast and intermediate dynamics through distributed
representation. Next, a set of trajectories corresponding to slower neural
activation dynamics should appear in the subnetwork with slow dynam-
ics in accordance with the initial state. This subnetwork, whose activ-
ity is sensitive to the initial conditions, induces specific sequences of
primitive transitions by interacting reciprocally with the intermediate
dynamics subnetwork. In the slow dynamics subnetwork, action plans
are selected according to intention and are passed down to the interme-
diate dynamics subnetwork for fluid composition of assembled primi-
tives in the fast dynamics subnetwork. It is noted that change in the slow
dynamics activity plays the role of a bifurcation parameter for the intermedi-
ate and fast dynamics, generating transitions of primitives.
As another function, MTRNNs can generate motor imagery by
feeding predicted visuo-proprioceptive states into future inputs, anal-
ogous to the closed-loop forward dynamics of the RNNPB. Diverse
motor imagery can be generated by manipulating the initial state of the
intention units. By this means, our robots with MTRNNs can become
self-narrative about their own possibilities, as described later. Additionally,
MTRNNs can perform both offline and online recognition of per-
ceptual sequences by means of error regression, as in the case of the
RNNPB model. For example, prediction errors caused by unexpected
visual sensory input due to certain changes in the environment are
back-propagated from the visual module of the fast dynamics subnet-
work through the one with intermediate dynamics to the intention
units in the slow dynamics subnetwork, whereby the modulation of
the activity of the intention units in the direction of minimizing the
errors results in the adaptation of the currently intended action to
match the changed environment. These functions have been evaluated
in a set of robotics experiments utilizing this model, as described later
in this chapter.

9.1.2 Correspondence with Neuroscience

Now, let's revisit our previous discussions and examine briefly the corre-
spondence of the proposed MTRNN model to concepts in system-level
neuroscience. Because the neuronal mechanisms for action generation
and recognition are still puzzling due to clear conflicts between differ-
ent experimental results, as discussed in chapter 4, the correspondence
between the MTRNN model and parts of the biological brain can be
investigated only in terms of plausibility at best. First, as shown by Tanji
and Shima (1994), there is a timescale difference in the buildup of neu-
ral activation dynamics between the supplementary motor area (with
slower dynamics spanning timescales of the order of seconds) and M1
(with faster dynamics of the order of a fraction of a second) immediately
before action generation (see Figure 4.5), and therefore our assumption
that the organization of a functional hierarchy involves timescale differ-
ences between regional neural activation dynamics should make sense
in modeling the biological brain. Considering this, Kiebel and colleagues
(2008), Badre and D'Esposito (2009), and Uddén and Bahlmann (2012)
proposed a similar idea to explain the rostral-caudal gradient of times-
cale differences by assuming slower dynamics at the rostral side (PFC)
and faster dynamics at the caudal side (M1) in the frontal cortex to
account for a possible functional hierarchy in the region.
Accordingly, the MTRNN model assumes that the subnetwork with
slow dynamics corresponds to the PFC and/or the supplementary motor
area, and that the modular subnetwork with fast dynamics corresponds
to the early visual cortex in one stream and to the premotor cortex or
M1 in another stream (Figure 9.4).
The subnetwork with moderate dynamics may correspond to the
parietal cortex, which can interact with both the frontal part and the
peripheral part. One possible scenario for the top-down pathway is that
the PFC sets the initial state of activations with slow dynamics assumed
in the supplementary motor cortex, which subsequently propagates to
the parietal cortex assumed to exhibit moderate-timescale dynamics.
Activations in the parietal cortex propagate further into peripheral cor-
tices (the early visual cortex and the premotor or primary motor cortex),
whereby detailed predictions of visual sensory input and propriocep-
tion are made, respectively, by means of neural activations with fast
dynamics.
On the other hand, prediction errors generated in those periph-
eral areas are propagated backward to the forebrain areas through
the parietal cortex via bottom-up error regression in both learning
and recognition, assuming of course that the aforementioned retro-
grade axonal signaling mechanism of brains implements the error

[Figure 9.4 appears here: block diagram linking PFC/SMA (slow), the parietal cortex (medium), and the motor and vision areas (fast), with intention at the top and error signals flowing upward.]
Figure 9.4. Possible correspondence of the MTRNN to parts of the biological
brain. The solid line represents the top-down prediction pathway (from
PFC/SMA and Parietal to Motor and Vision), and the dotted line represents the
bottom-up error regression pathway (from Vision and Parietal to PFC/SMA).
back-propagation scheme (see section 5.5). In this situation, the pari-
etal cortex, wedged between the frontal and peripheral parts, plays the
role of an information hub that integrates multiple input modalities
and motor outputs with the current intention for action. It has been
speculated that populations of bimodal neurons in the parietal cortex,
which have been shown to encode multiple modalities of information
processing, such as vision and motor outputs (Sakata et al., 1995) or
vision and somatosensory inputs (Hyvärinen & Poranen, 1974), are
the consequence of synaptic modulation accompanied by top-down
prediction and bottom-up error regression in the iterative learning of
behavioral skills.
It is worth pausing here a moment to think about what the initial
states actually mean in the brain. Because the initial states unfold into
sequences of behavior primitives, which are expanded into target pro-
prioceptive sequences and finally into motor command sequences, it can
be said that motor programs can be represented by the initial states
of particular neural dynamics in the brain. Coincidentally, as I was
writing this section, Churchland and colleagues published new results
from monkey electrophysiological experiments that support this idea
(Churchland et al., 2012). They conducted simultaneous recordings of
multiple neurons in the motor and premotor cortices while monkeys
repeatedly reached in varying directions and at various distances. The
collective activities of neuron firings were plotted in a two-dimensional
state space from their principal components, in the same way Churchland
had used before (see Figure 4.12).
A nontrivial finding was that, after movement onset, the neural acti-
vation state exhibited a quasirotational movement in the same direction
but with different phase and amplitude in the two-d imensional state
space for each different case of reaching. The differences in the develop-
ment of the neural activation state were due to differences in their ini-
tial state at the moment of movement onset. Churchland and colleagues
interpreted this as follows: The preparatory activity sets the initial
state of the dynamic system for generating quasirotational trajectories
and their subsequent evolution produces the corresponding movement
activity. Their interpretation is quite analogous to the idea Yamashita
and I proposed: Motor programs might be represented in terms of the
initial states of particular neural dynamical systems. The next section
describes a robotics experiment pursuing this line of reasoning utilizing
the MTRNN model.
9.2. Robotics Experiments on Developmental Training of Complex Actions

This section shows how the MTRNN model can be used in humanoid
robot experiments on learning and generating skilled actions.

9.2.1 Experimental Setup

I conducted the following studies to investigate how a humanoid robot
can acquire skills for performing complex actions by organizing a func-
tional hierarchy in the MTRNN through interactive tutoring processes
(Yamashita & Tani, 2008; Nishimoto & Tani, 2009). A small humanoid
robot QRIO was trained on a set of object manipulation tasks in parallel
through iterative guidance provided by a teacher. The robot could move
through iterative guidance provided by a teacher. The robot could move
its arms by activating joint motors with eight degrees of freedom (DOF)
and was also capable of arm proprioception by means of encoder readings
for these joints. The robot used a vision camera that could automatically
track a color point placed in the center of the object. Therefore, reading
the joint angles of the camera head (two DOF) represents visual sensory
input corresponding to the object position. The robot was trained on
three different tasks in sequence (shown in Figure 9.5), each of which

[Figure 9.5 appears here: panels for Task 1 (move up and down; move left and right), Task 2 (home position; move forward and back; touch by each hand; back to home), and Task 3 (touch by both hands; rotate in the air).]
Figure 9.5. A robot trained on three behavioral tasks, each of which is
composed of a sequence of behavior primitives. After the third session, Task 3
was modified, as illustrated by the dotted lines. Adopted from Nishimoto and
Tani (2009) with permission.
consisted of sequential combinations of different cyclic movement pat-
terns of actions applied to the object.
The training was conducted interactively in cycles of training ses-
sions, meaning that the arms were physically guided to follow adequate
trajectories while the robot attempted to generate its own trajectories
based on its previously acquired skills. In this sense, it can be said that
the actual training trajectories were codeveloped by the teacher and
the robot. Through this physical guidance, the robot eventually per-
ceived a continuous visuo-proprioceptive (VP) flow without explicit
cues for segmenting the flow into primitives of movement patterns.
In the course of developmental learning, the robot was trained gradu-
ally in the course of five sessions. During each session, all three tasks
were repeated while introducing changes in the object position, and
the network was trained with all training data obtained during the ses-
sion. After each training session, offline training of the MTRNN was
conducted by utilizing the VP sequences obtained in the process of
guidance, in which the connection weights and the initial states of the
intention units for all task sequences were updated. Subsequently, the
performance of both the open-loop physical behavior and the closed-
loop motor imagery was tested for all three tasks. Novel movement pat-
terns were added to one of the tasks during the development process for
the purpose of examining the capability of the network for incremental
learning of new behavioral patterns (see Task 3 in Figure 9.5).
The employed MTRNN model consisted of 36 units with fast
dynamics for vision and 144 units with fast dynamics for propriocep-
tion (τ = 1.0), 30 units with intermediate dynamics (τ = 5.0), and
20 units with slow dynamics (τ = 70.0). The units with slow and inter-
mediate dynamics were fully interconnected, as were all the units
with fast and intermediate dynamics, whereas the units with slow and
fast dynamics were not connected directly. It was assumed that this kind of
connection constraint would allow functional phenomena such as infor-
mation bottlenecks or hubs to be developed in the subnetwork with
intermediate dynamics.
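A minimal sketch of this structural constraint, using the unit counts and time constants given above: a block mask zeroes the direct slow-to-fast and fast-to-slow weights, so all traffic between the slow and fast levels must pass through the intermediate subnetwork.

import numpy as np

sizes = {"fast": 36 + 144, "intermediate": 30, "slow": 20}
tau = np.concatenate([np.full(sizes["fast"], 1.0),
                      np.full(sizes["intermediate"], 5.0),
                      np.full(sizes["slow"], 70.0)])

n = sum(sizes.values())
mask = np.ones((n, n))
fast = slice(0, sizes["fast"])
slow = slice(n - sizes["slow"], n)
mask[fast, slow] = 0.0   # fast units receive nothing directly from slow units
mask[slow, fast] = 0.0   # slow units receive nothing directly from fast units
# Effective connectivity is W * mask; learning proceeds under this constraint.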

9.2.2 Results

The developmental learning of multiple goal-directed actions success-
fully converged after five training sessions, even in the case of Task 3,
which was modified with the addition of a novel primitive pattern after
the third session. The developmental process can be categorized into sev-
eral stages, and Figure 9.6 shows the process for Task 1 for the first three
sessions. Plots are shown for the trained VP trajectories (left), motor
imagery (middle), and actual output generated by the robot (right). The
profiles for the units with slow dynamics in the motor imagery and the
actual generated behavior were plotted for their first four principal com-
ponents after conducting principal component analysis(PCA).
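As a minimal sketch of this analysis step, assuming the slow-unit activations have been logged as a simple array (the file name and shape here are hypothetical), the first four principal components can be obtained with scikit-learn as follows.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (T, 20) array: activations of the 20 slow units over T steps.
slow_activity = np.load("slow_units.npy")
profiles = PCA(n_components=4).fit_transform(slow_activity)  # (T, 4) PC profiles
```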
In the first stage, which mostly corresponds to Session 1, none of the
tasks were accomplished, as most of the actually generated movement
patterns were premature, and the time evolution of the activations of
the units with slow dynamics was almost flat. In the second stage, cor-
responding to Session 2, most of the primitive movement patterns were
actually generated, showing some generalization with respect to changes
in object position, although correct sequencing of them was not yet
complete. In the third stage, corresponding to Session 3 and subsequent
sessions, all tasks were successfully generated with correct sequencing
of the primitive movement patterns and with good generalization with
respect to changes in object position. The activations of units with slow
dynamics became more dynamic compared with previous sessions in the
case of both motor imagery and generation of physical actions. In sum-
mary then, the level responsible for organization of primitive movement
patterns was developed during the earlier period, and the level respon-
sible for the organization of these patterns into sequences developed in
later periods.
One important point I want to make here is that there was a lag
between the time when the robot became able to generate motor imagery
and the time when it started generating actual behaviors. Motor imag-
ery was generated earlier than the actual behavior, as it was observed
that the motor imagery for all tasks was nearly complete by Session 2,
as compared to Session 3 in the case of actual generated behaviors.
This outcome is in accordance with the arguments of some contempo-
rary developmental psychologists, such as Karmiloff-Smith (1992) and
Diamond (1991), who consider that 2-month-old infants already possess
intentionality toward objects they wish to manipulate, although they
cannot reach or grip them properly due to the immaturity of their motor
control skills. Moreover, this developmental course of the robot's learning supports the view of Smith and Thelen (2003) that development is better understood as the emergent product of many local interactions that occur in real time.
Figure 9.6. Development of Task 1 for the first three sessions with trained VP trajectories (left), motor imagery (middle), and actual generated behavior (right), accompanied by the profiles of units with slow dynamics after conducting principal component analysis. (a) Session 1, (b) Session 2, (c) Session 3. Adopted from Nishimoto and Tani (2009) with permission.

Another interesting observation taken from this experiment was that the profiles of the training trajectories also developed across sessions. It can be seen that the training trajectories in Session 1 were quite distorted. Training patterns such as UD (moving up and down) in the first half and LR (moving left and right) in the second half did not form regular cycles. This is typical when cyclic patterns are taught to
robots without using metronome-like devices. However, it can be seen
that cyclic patterns in the training process became much more regular
as the sessions proceeded. This is due to the development of limit-cycle
attractors in the MTRNN that shaped the trajectories trained through
direct guidance into more regular cyclic ones via physical interactions.
This result shows a typical example of the codevelopment process undertaken by the robot and teacher, whereby the robot's internal structures develop via dense interactions between the top-down intentional generation of the robot's movement and the bottom-up recognition of the teacher's intention in guiding the robot's movement. The interaction modifies not only the robot's action but also the teacher's. When I tried to physically guide the robot's arms to move slightly differently from its own movement by grasping the arms, I became aware of its intention of persistence through the resistance force perceived in my hands. This modified my teaching intention and the resultant trajectory of guidance to some degree. In this sense, it can be said that the robot's behavior trajectory and my teaching trajectory codeveloped during the experiment.
Next, let's see how neurodynamics with different timescales successfully generates sets of action tasks consisting of multiple movement patterns. Figure 9.7 shows how the robot behaviors were generated in the test run after five training sessions.
First, as can be seen in the first and second rows in Figure 9.7, VP
trajectories for trained robots were successfully generated for all three
tasks accompanied by changes in the cyclic movement patterns. Looking
at the activation dynamics of the units with intermediate dynamics
(shown in the fourth row) after conducting PCA, it is clear that their
dynamics are correlated with the VP trajectories.
However, the activation dynamics of the units with slow dynamics, which started from different initial states for each of the three tasks, developed to be uncorrelated with the VP trajectories or the trajectories of the units with intermediate dynamics (see the bottom row). Also, the profiles changed drastically as the movement patterns changed. However, the transitions were still smooth, unlike in the case of gate opening or PB, which were accompanied by stepwise changes, as described in the previous section. Such drastic but smooth changes in the slow context profile were tailored by means of dense interactions between the top-down forward prediction and the bottom-up error regression. The bottom-up error regression tends to generate rapidly changing profiles at the moment of switching, whereas the top-down forward prediction tends to generate only slowly changing profiles because of its large time constant. The collaboration and competition between the two processes result in such natural, smooth profiles. After enough training, all actions are generated unconsciously, because no prediction error is generated in the course of well-practiced trajectories unless unexpected events are encountered, such as dropping the object.

Figure 9.7. Visuo-proprioceptive trajectories (two normalized joint angles denoted as Prop 1 and Prop 2 and the camera direction denoted as Vision 1 and Vision 2) during training ("Teach") and actual generation ("Generation") in Session 5, accompanied by activation profiles of intermediate and slow units after principal component analysis, denoted as PC 1 to PC 4. (a) Moving up and down (UD) followed by moving left and right (LR) in Task 1, (b) moving forward and backward (FB) followed by touching by left hand and right hand (TchLR) in Task 2, (c) touching by both hands (BG) followed by rotating in the air (RO) in Task 3. Adopted from Nishimoto & Tani (2009) with permission.
Further insight was obtained by observing how the robot managed to generate action when perturbed by external inputs. In Task 1, the experimenter, by pulling the robot's hand slightly, could induce the robot to switch action primitives from moving up and down to moving left and right earlier than the four cycles it had been trained to perform. This implies that counting at the higher level is more like an elastic dynamic process than a rigid logical computation, one that can be modulated by external inputs such as being pulled by the experimenter. An interesting observation was that the action primitive of moving up and down was smoothly connected to the next primitive of moving the object to the left, which took place right after locating the object on the floor, even though the switch was made after an incorrect number of cycles. The transitions never took place halfway through an ongoing primitive; they were always made at the same connection point, regardless of the incorrect number of cycles at the transition.
This observation suggests that the whole system was able to generate action sequences with fluidity and flexibility by adequately arbitrating between the higher level, which had been trained to count a specific number of cycles before switching, and the lower level, which had been trained to connect one primitive to another at the same point. In the current observation, the intention from the higher level was elastic enough to give in to an incorrect count under the bottom-up force exerted by the experimenter, whereas the lower level succeeded in connecting the first primitive to the second at the same point as it had been trained. Our proposed dynamic systems scheme allows this type of dynamic conflict resolution between different levels by letting them interact densely.

9.3. Summary

This chapter was entirely dedicated to an examination of functional hierarchy by exploring the potential of the MTRNN model. The experimental results suggest that sequences of primitives are abstractly represented in the subnetwork consisting of units with slow dynamics, whereas detailed patterns of behavior primitives are generated in the subnetworks consisting of units with fast and intermediate dynamics. We can conclude that a sort of fluid compositionality for smooth and flexible generation of actions is achieved through self-organization of a functional hierarchy by utilizing the timescale differences as well as the structural connectivity among different levels in the proposed MTRNN model. These findings provide a possible explanation for how different functional roles can be assigned to different regions in the brain (i.e., the PFC for creating abstract actional plans and the parietal cortex for composing sensory-motor details).
Such assignments in the brain may not be tailor made by a genome program, but may result as a consequence of self-organization via development and learning under various structural constraints imposed on the anatomy of the brain, including connectivity among local regions with bottlenecks and timescale differences in neuronal activities. This can be accounted for by a well-known conception in complex adaptive systems, known as downward causation (Campbell, 1974; Bassett & Gazzaniga, 2011), which denotes a causal relationship from the global to the local parts. It can be said that the functional hierarchy emerges by means of upward causation in terms of collective neural activity, both in the forward activation dynamics and in the error back-propagation, which are constrained by downward causation in terms of timescale differences, network topology, and environmental interaction. The observed fluid compositionality, metaphorically expressed as "kinetic melody" by Luria, should result from this. It was also shown that the capability of abstraction through hierarchy in the MTRNN can provide robots with a competency of self-narrative for their own actional intentions via mental simulation. The reflective selves of robots may start from this point.
Readers may ask a crucial question. Can the time constant parameters in the MTRNN be adapted via learning, or do they have to be set by the experimenters as in the current version? Hochreiter and Schmidhuber (1997) proposed the long short-term memory (LSTM) RNN model, which is characterized by a dynamic memory mechanism implemented in memory cells. A memory cell can keep its current dynamic state for arbitrarily long time steps without specific parameter setting, by means of its associated adaptive gate opening-closing mechanisms learned via the error back-propagation scheme. If the memory cells were allocated in multiple levels of subnetworks, it would be interesting to examine whether a functional hierarchy could be developed by organizing longer-term memory in the higher level and shorter-term memory in the lower level. Actually, the MTRNN model was originally developed with a time-constant adaptation mechanism by using a genetic algorithm (Paine & Tani, 2005). Simulation experiments on robot navigation learning using this model showed that a functional hierarchy for navigation control of the robot was developed by evolving slower and faster dynamics structures between two levels of the subnetworks, provided that a bottleneck connection was prepared between them.
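For readers unfamiliar with the gating mechanism mentioned here, the following minimal sketch shows one LSTM memory-cell update in the spirit of Hochreiter and Schmidhuber (1997); the packed weight layout and the toy dimensions are illustrative assumptions, not the original formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One memory-cell update: gates decide what to keep, add, and emit."""
    z = W @ np.concatenate([x, h]) + b            # all gate pre-activations
    i, f, o, g = np.split(z, 4)                   # input/forget/output/candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # gated state: can persist
    h = sigmoid(o) * np.tanh(c)                   # exposed output
    return h, c

# Toy usage with assumed sizes: 4-dimensional input, 8 memory cells.
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.1, size=(32, 12)), np.zeros(32)
h, c = np.zeros(8), np.zeros(8)
h, c = lstm_step(rng.random(4), h, c, W, b)
```

The point of the comparison is that when the forget gate stays near 1.0, the cell state persists indefinitely, so an effective timescale is learned rather than fixed by a parameter like τ in the MTRNN.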
Some may consider that brains should also involve a spatial hierarchy, as evidenced by the accumulated studies on the visual recognition pathway (see section 4.1). Also, Hasson and colleagues (Hasson et al., 2008) suggested the development of a spatio-temporal hierarchy in the human visual cortex. In response to this concern, our group (Jung et al., 2015; Choi & Tani, 2016) has recently shown that a spatio-temporal hierarchy can be developed successfully in a neurodynamic model referred to as the multiple spatio-temporal neural network (MSTNN) for the recognition as well as generation of compositional human action sequence patterns, as represented in pixel-level video images, when both spatial and temporal constraints are applied to the neural activation dynamics at multiple scales for different levels. Furthermore, the MSTNN and MTRNN have been integrated in a simulated humanoid robot platform (Figure 9.8) by which the simulated robot becomes able to generate object manipulation behaviors corresponding to visually demonstrated human gestures via end-to-end learning from the video image inputs to the motor outputs (Hwang et al., 2015). As a result of end-to-end learning for various combinations of the gesture patterns and the corresponding motor outputs for grasping different shapes of objects, it was found that the intentions for grasping different objects can be developed in the PFC subnetwork, characterized by the slowest timescale in the whole network.
Figure 9.8. A simulated humanoid robot learns to generate object manipulation behaviors as specified by a human gesture demonstrated to the robot by video image. (a) Task space and (b) the integrated model of MSTNN for video image processing and MTRNN for dynamic motor pattern generation. (Panel labels include: higher level (PFC), very slow; slow and fast levels of the MSTNN and MTRNN; categorizing human gesture; intention to manipulate specified object; dynamic vision input (VI); attention control; motor output.)

Going back to the robotics experiment using the MTRNN, we observed that actions could be generated compositionally depending on the initial states of the intention units. However, this naturally poses
the question of how the initial state is set (Park & Tani, 2015). Is there
any way that the initial state representing the intentionality for action
could be self-determined and set autonomously rather than being set
by the experimenter? This issue is related to the problem of the ori-
gin of spontaneity or free will, as addressed in section 4.3. The next
chapter explores this issue by examining the results from several syn-
thetic robotics experiments while drawing attention to possible corre-
spondences with the experimental results of Libet (1985) and Soon and
colleagues (2008).
10
Free Will for Action
and Conscious Awareness

We first explore how intentions for actions can be generated spontaneously in higher cognitive brain areas by reviewing our robotics experiments. As I wrote in section 4.3, Libet (1985) demonstrated that awareness of intention is delayed, a result later confirmed by Soon and colleagues (2008). Later sections investigate this problem by clarifying causal relationships shared by free will and consciousness.

10.1. A Dynamic Account of Spontaneous Behaviors

Although we may not be aware of it, our everyday life is full of spontaneity. Let's take the example of the actions involved in making a cup of instant coffee, something we are all likely to be very familiar with. After I've put a spoonful of coffee granules in my mug and have added hot water, I usually add milk and then either add sugar or not, which is rather unconsciously determined. Then, frequently, I only notice later that I actually added sugar when I take the first sip. Some parts of these action sequences are defined and static (I must add the coffee granules and hot water), but other parts are optional, and this is where I can see spontaneity in the generation of my own actions. A similar comment can be made about improvisations in playing jazz or in contemporary dance,
be made about improvisations in playing jazz or in contemporary dance,
where musical phrases or body movement patterns are created freely
and on the spot in an unpredictable manner.
It seems that spontaneity appears not within a chunk but at junctions between chunks in behavior streams. Chunks are behavior primitives, such as pouring hot water into a mug or repeating musical phrases, which are presumably acquired through practice and experience, as I have mentioned many times already. Junctions between behavior primitives involve weaker relationships than those within the primitives themselves, because junctions appear less frequently than primitives in repeated behavioral experiences. Indeed, psychological observations of child development as well as adult learning have suggested that chunk structures can be extracted through statistical learning with a sufficiently large number of perceptual and behavioral experiences (e.g., Klahr et al., 1983; Saffran et al., 1996; Kirkham et al., 2002; Baldwin et al., 2008). Here, the term chunk structures denotes repeatable patterns of action sequences as unified chunks, together with the probabilistic state transitions between those chunks, namely junctions.
One question essential to the problem of free will arises. How is it
that subsequent chunks or behavior primitives can be considered to be
freely selected if one simply follows a learned statistical expectation? If
we consider someone who has learned that the next behavior primitive
to enact in a certain situation is either A or B, provided that past experience defines equal probabilities for A and B, it is plausible that either
of the primitives might be enacted, so there is at least the apparent
potential for freely chosen action in such instances. However, following
the studies by Libet (1985) and Soon and colleagues (2008) discussed
in section 4.3, voluntary actions might originate from neural activities
in the supplementary motor area, prefrontal cortex, or parietal cortex,
and in no case are these activities accompanied by awareness. Thus,
even though one might believe that the choice of a particular action
from among multiple possibilities (e.g., primitives A, B, and C) has been
entirely conscious, in fact this apparently conscious decision has been
precipitated by neural activity not subject to awareness, and indeed free
will seems not so freely determined at all.
Our MTRNN model can account for these results by assuming
that neural activities preceding apparently freely chosen actions are
represented by the initial states of the intentional units located in the
network with slow dynamics. However, this explanation generates further questions: (1) how are the values of the initial states set for initiating voluntary actions, and (2) how can conscious awareness of the decision emerge with delay? To address these problems, my colleagues and I conducted some neurorobotics experiments involving the statistical learning of imitative actions (Namikawa et al., 2011). The following experimental results highlight the role of cortical itinerant dynamics in generating spontaneity.

10.1.1 Experiment

A humanoid robot was trained to imitate actions involving object manipulation through direct guidance by an experimenter. The setup used for the robot and the way its movements were guided were the same as in our experiment described in section 9.2 (and in Yamashita and Tani, 2008). The target actions to imitate are shown in Figure 10.1.
The target task to be imitated included stochastic transitions between
primitive actions. The object was located on the workbench in one of
three positions (left, center, or right), and the experimenter repeated
primitive actions that consisted of picking up the object, moving it to
one of the other two possible positions, and releasing it by guiding the
hands of the robot while deciding the next object position randomly
with equal probability (50%).

Figure 10.1. Object manipulation actions to be imitated by a Sony humanoid robot. (a) The task consists of stochastic transitions between primitive actions (right to left, right to center, left to center, left to right, center to right, and center to left, each with 50% probability): moving an object to one of two possible positions with equal probability after reaching and grasping it. (b) Trajectory of the center of mass of the object as observed by using the robot's vision system. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.

This process generated 24 training
sequences, each of which consisted of 20 transitions between primitive actions, amounting to about 2,500 time steps of continuous visuo-proprioceptive sequences. The time constants of the employed MTRNN were set to 100.0, 20.0, and 2.0 for units with slow, intermediate, and fast dynamics, respectively. It is noted that in this experiment the lower level was assembled with a set of gated RNNs (Tani & Nolfi, 1999) that interacted directly with the visuo-proprioceptive sequences. The intermediate subnetwork controlled the gate opening via its outputs.
After the offline training of the network, the robot was tested on imitating (generating) each training sequence by setting the network to the corresponding acquired initial state. Although the trained primitive action sequences were reproduced exactly during the initial period consisting of several primitive action transitions, the sequences gradually started deviating from the learned ones. This was considered to be due to the sensitivity to initial conditions in the trained network. Statistical analysis conducted on
the transition sequences generated over longer periods showed that the
probabilities with which the transitions between the primitive actions
were reproduced were quite similar to the ones to which the robot was
exposed during the training period. The same analysis was repeated for
cases of different transition probabilities of the target actions. When the
transition probabilities for some of the target actions were changed to
25% and 12.5%, the same proportion of corresponding sequences were
newly generated in each case. An analysis of the sequences produced
by the trained network for each case showed that the transition prob-
abilities of the reproduced actions mostly followed the target ones, with
deviations of only a few percent. These results imply that the proposed
model, although unable to learn to imitate the long visuo-proprioceptive
sequences exactly, could extract the statistical structures (chunks) with
their corresponding transition probabilities from these sequences.
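The kind of statistical analysis described here can be sketched as follows: count primitive-to-primitive transitions in the categorized output sequences and compare the empirical frequencies with the target probabilities. The L/R/C symbol coding and the stand-in sequence are assumptions for illustration.

```python
from collections import Counter

def transition_probs(seq):
    """Empirical P(next | current) from a symbol string such as 'RCLC...'."""
    pair_counts = Counter(zip(seq, seq[1:]))
    from_counts = Counter(seq[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}

generated = "RCRLCRLCRC" * 50        # stand-in for categorized robot output
for (a, b), p in sorted(transition_probs(generated).items()):
    print(f"{a} -> {b}: {p:.2f}")    # compare against the trained 50%, 25%, ...
```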
Let's now examine the main issue in this context, namely the origin and indeterminacy of spontaneity in choosing subsequent primitive actions. Here, the prevailing opinion might be that spontaneity is simply due to noise in the (external) physical world, which induces transitions between primitive actions represented by different attractors. The following experiment, however, shows that this is not the case. Furthermore, in examining whether the same statistical reproduction could also be observed in the case of motor imagery rather than actual motor action, it turned out that the answer is affirmative. This turns out to be quite important because, as motor imagery is generated
deterministically in offline simulation without any contamination from external sensory noise, the observed stochastic properties should be due to some internally generated fluctuations rather than noise-induced perturbations. In other words, the spontaneity observed at junctions between chunks of action sequences seems to arise from within the robots, and by way of processes perfectly consistent with the results from Libet and Soon.
To gain some insight into this phenomenon, let's look at the neural activation sequences in units with different timescales associated with the visuo-proprioceptive sequences during the generation of motor imagery, as shown in Figure 10.2.

Figure 10.2. Time evolution of neural activities associated with visuo-proprioceptive sequences in motor imagery. Capital letters shown in the first panel denote the primitive actions executed (R: moving to right, L: moving to left, and C: moving to center). Plots in the first panel and in the second panel show predicted vision and proprioception outputs, respectively. Plots in the third and fourth panels show, with different shades of gray, the activities of 30 neural units in the subnetworks with intermediate and slow dynamics, respectively. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.
It can be seen that the neural activities in the subnetworks with intermediate and slow dynamics develop with their intrinsic timescale dynamics. In the plot of intermediate neural activity, it can be seen that its dynamic pattern repeats for the same action primitive generated. On the other hand, in the plot of the slow dynamics, no such apparent regularity or repeated patterns of activity can be observed.
To examine the dynamic characteristics of the networks, a dynamic measure known as the Lyapunov exponent was calculated for the activity of each subnetwork. The Lyapunov exponents form a multidimensional vector that indicates the rates of divergence of adjacent trajectories in a given dynamic system. If the largest component of this vector is positive, this indicates that chaos is generated by means of the stretching and folding mechanism described in section 5.1. In the analysis, it was found that the maximum Lyapunov exponent was positive for the subnetwork with slow dynamics and negative for the subnetworks with intermediate and fast dynamics. The results were repeatable for different runs of training of the network, implying that chaos emerged in the subnetwork with slow dynamics but not in the other subnetworks. Therefore, deterministic chaos emerging in the subnetwork with slow dynamics might affect the subnetworks with intermediate and fast dynamics, generating pseudostochastic transitions between primitive action sequences. Readers may see that this result corresponds exactly with the aforementioned idea (illustrated in Figure 6.2) that chaos in the higher level network can drive compositional generation of action primitives stored in the lower level, as well as with what Braitenberg's Vehicle 12 predicted in section 5.3.2.
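One standard way to obtain such an estimate, sketched below under the assumption that the network update is available as a deterministic map, is a Benettin-style method: follow two nearby trajectories, accumulate their local divergence, and renormalize the separation at every step. The logistic-map usage line is only a self-test against a known exponent.

```python
import numpy as np

def largest_lyapunov(step, x0, n_steps=5000, eps=1e-8):
    """Estimate the largest Lyapunov exponent of the map `step`."""
    rng = np.random.default_rng(1)
    v = rng.normal(size=x0.shape)
    x = x0.copy()
    y = x0 + eps * v / np.linalg.norm(v)   # perturbed twin trajectory
    total = 0.0
    for _ in range(n_steps):
        x, y = step(x), step(y)
        d = np.linalg.norm(y - x)
        total += np.log(d / eps)           # local expansion rate
        y = x + (eps / d) * (y - x)        # renormalize the separation
    return total / n_steps                 # positive average suggests chaos

# Self-test on the logistic map, whose exponent is known to be ln 2 (~0.69).
print(largest_lyapunov(lambda x: 4.0 * x * (1.0 - x), np.array([0.3])))
```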
To clarify the functional role of each subnetwork, we next conducted an experiment involving a lesion artificially created in one of the subnetworks. The trajectory of the manipulated object generated as visual imagery by the original intact network was compared with the one generated by the same network but with a lesion in the subnetwork with slow dynamics (Figure 10.3).
A complex trajectory wandering between the three object positions
was generated in the case of the intact network, whereas a simple tra-
jectory of exact repetitions of moving to left, to right, and to center
was generated in the case of the lesion in the slow dynamics subnet-
work. This implies that the lesion in the subnetwork with slow dynam-
ics deprived the network of the potential to spontaneously combine
primitive actions.
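The lesion manipulation can be sketched as follows, reusing the index layout of the earlier MTRNN sketch purely for illustration (the actual experiment used a different network size): the slow subnetwork's output is clamped to zero while the same forward dynamics are run.

```python
import numpy as np

def mtrnn_step_lesioned(u, W, taus, slow=slice(210, 240)):
    """Forward dynamics with the slow subnetwork silenced (lesioned)."""
    a = np.tanh(u)
    a[slow] = 0.0                                    # no output from slow units
    u_next = (1.0 - 1.0 / taus) * u + (W @ a) / taus
    u_next[slow] = 0.0                               # keep their state clamped
    return u_next
```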
Figure 10.3. Comparison of behaviors between an intact network and a lesioned network. Trajectories of the manipulated object (a) generated as visual imagery by the original intact network and (b) generated by the same network but with a lesion in its subnetwork with slow dynamics. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.

10.1.2 Origin of Spontaneity

The results of the robotics experiments described so far suggest a possible mechanism for generating spontaneous actions and their images in the brain. It is assumed that deterministic chaos emerging in the sub-
network with slow dynamics, possibly corresponding to the prefrontal
cortex, might be responsible for spontaneity in sequencing primitive
actions by destabilizing junctions in chunk structures. This agrees well
with Freeman's (2000) speculation that intentionality is spontaneously generated by means of chaos in the prefrontal cortex. The isolation of chaos to the prefrontal cortex would make sense because the robustness of the generation of physical actions would be lost if chaos governed the whole cortical region. Also, such isolation of chaos in the higher level of
the organized functional hierarchy in the brain might afford the estab-
lishment of two competencies essential to cognitive agency, namely free
selection and combination of actions, and their robust execution in an
actual physical environment.
Our consideration here is analogous to William James's account of the mechanism of free will, as was illustrated in Figure 3.4. He considered that multiple alternatives can be regarded as accidental generations with spontaneous variation from a memory consolidating various experiences, in which one alternative is eventually selected as the next action.
Chaos present at a higher level of the brain may account for this acciden-
tal generation with spontaneous variation. Also, his metaphoric refer-
ence to substantial parts as perchings and transient parts as flights in
theorizing the stream of consciousness might be analogous to the chunk
structures and their junctions apparent in the robotics experiments
described in section 3.5. What James referred to as intermittent transi-
tions between these perches and flights might also be due to the chaos-
based mechanism discussed here. Furthermore, readers may remember
the experimental results of Churchland and colleagues (2010) showing
that the low-dimensional neural activity during the movement prepara-
tory period exhibits greater fluctuation before the appearance of the tar-
get and a more stable trajectory after its appearance. Such fluctuations
in neuronal activity, possibly due to chaos originating in higher levels of
organization, might facilitate the spontaneous generation of actions and
images.
Here, one thing to be noted is that wills or intentions which are spon-
taneously generated by deterministic chaos are not really freely gener-
ated because they are generated by following the deterministic causality
of internal states. They may look as if generated with some randomness,
because the true internal state is not consciously accessible. If we observe
action sequences in terms of categorized symbol sequences, they turn
out to be probabilistic sequences as explained by symbolic dynamics
(see section 5.1). Mathematically speaking, complete free will without
any prior causality may not exist. But, it may feel as if free will exists
when one has only limited awareness of the underlying causal mechanisms.
Now, I'd like to briefly discuss the issue of deterministic dynamics versus probabilistic processes in modeling spontaneity. The uniqueness of the current model study lies in the fact that deterministic chaos emerges in the process of imitating probabilistic transitions of action primitives, provided that sufficient training sequences are used to induce generalization in learning. This result can be understood as a reverse of the ordinary way of constructing symbolic dynamics, in which deterministic chaos produces probabilistic transitions of symbols, as shown in chapter 5. The mechanism is also analogous to what we have seen about the emergence of chaos in conflicting situations encountered by robots, as described in section 7.2.
We might be justified in asking why models of deterministic dynamic systems are considered to be more essential than models of stochastic processes, such as Markov chains (Markov, 1971). A fundamental reason for this preference is that models of deterministic dynamical systems more closely represent physical phenomena that take place in continuous time and space, as argued in previous sections. In contrast, Markov chain models, which are the most popular schemes for modeling probabilistic processes, employ discrete state representations by partitioning the state space into substates. The substates are assigned nodes with labels, and the possible state transitions between those states are denoted by arcs, as in the case of a finite-state machine (FSM). The only difference from an FSM is that arcs represent transition probabilities rather than deterministic paths. In such a discretization scheme, even a slight mismatch between the current state of the model and any inputs from the external environment can result in a failure to match. When inputs with unexpected labels arrive, Markov chain models simply halt and refuse to accept the inputs. On the other hand, at the very least, dynamical system models can avoid such catastrophic events as their dynamics develop autonomously. The intrinsic fuzziness in representing levels, primitives, and intentions in dynamical system models such as the MTRNN can develop robustness and smoothness in interactions with the physical world.
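As a toy illustration of this brittleness argument (my own construction, not from the original studies), compare a labeled transition table that fails outright on an unexpected input with a continuous map that absorbs any input into its ongoing dynamics.

```python
import numpy as np

# Discrete FSM-style model: transitions exist only for expected labels.
fsm = {("A", "x"): "B", ("B", "y"): "A"}

def fsm_step(state, symbol):
    if (state, symbol) not in fsm:
        raise KeyError(f"no transition for {(state, symbol)}")  # model halts
    return fsm[(state, symbol)]

# Continuous dynamical model: any input just perturbs the ongoing state.
def dyn_step(x, inp):
    return np.tanh(0.9 * x + inp)

print(fsm_step("A", "x"))      # 'B' as expected
print(dyn_step(0.2, 5.0))      # an unexpected input still yields a valid state
# fsm_step("A", "z")           # would raise: a slight mismatch is fatal
```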

10.1.3 Creating Novel Action Sequences

My colleagues and I investigated the capability of MTRNNs in generating diverse combinatorial action sequences by means of chaos developed via the tutored learning of a set of trajectories. In such experiments, we often observed that MTRNNs generated novel movement patterns by combining previously learned segments in mental simulation as well as in actual behaviors (Arie et al., 2009; Arie et al., 2012).
In one such humanoid robot experiment involving an object manipulation task, we employed an extended MTRNN model that can cope with dynamic visual images at the pixel level (Arie et al., 2009). In this extension, a Kohonen network model was used for preprocessing of the pixel-level visual pattern, similar to the model described in section 7.2. The pixel pattern received at each step was fed into the Kohonen network as a two-dimensional topological map, the low-dimensional winner-take-all activation pattern of the Kohonen network units was input to the MTRNN, and the output of the MTRNN was fed into the Kohonen network to reconstruct the predicted image of the pixel pattern. In
training for the object manipulation task, the robot was tutored on a set
of movement sequences for manipulating a cuboid object by utilizing the initial sensitivity characteristics in the slow dynamics context units
(Figure 10.4). The tutored sequences were started from two different ini-
tial conditions in which one initial condition was set as an object (a small
block) that stood on the base in front of a small table (a large block).
From this initial condition, the standing object was either moved to the
right side by pushing, put on the small table by grasping, or laid down by
hitting. The other initial condition was set as the same object laid on the
base in front of the table. Then the object was either moved to the right
or was put on the small table. The tutoring was repeated while the posi-
tion of the object in the initial condition was varied.
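The Kohonen preprocessing mentioned above can be sketched as a standard self-organizing map with a winner-take-all readout; the map size, learning rate, neighborhood width, and flattened frame input here are illustrative assumptions rather than the parameters of the original experiment.

```python
import numpy as np

class SOM:
    """Kohonen map: each input frame activates a winner unit on a 2-D grid."""
    def __init__(self, h, w, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.coords = np.indices((h, w)).reshape(2, -1).T  # unit grid positions
        self.W = rng.random((h * w, dim))                  # reference vectors

    def winner(self, x):
        return int(np.argmin(np.linalg.norm(self.W - x, axis=1)))

    def train_step(self, x, lr=0.1, sigma=1.5):
        win = self.winner(x)
        d2 = np.sum((self.coords - self.coords[win]) ** 2, axis=1)
        nbhd = np.exp(-d2 / (2.0 * sigma ** 2))            # neighborhood kernel
        self.W += lr * nbhd[:, None] * (x - self.W)        # pull units toward x

som = SOM(8, 8, dim=64)                   # 8x8 map over flattened 8x8 frames
frame = np.random.default_rng(1).random(64)
som.train_step(frame)
print(som.winner(frame))                  # winner index passed on to the MTRNN
```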
After learning all tutored sequences, the network model was
tested on the generation of visual imagery as well as actual action.
It was observed that the model could generate diverse visual imag-
ery sequences, or hallucinations, both for physically possible ones
and impossible ones depending on the initial slow context state (representing the intention).

Figure 10.4. A humanoid robot tutored on five different movement sequences starting from two different initial conditions of a manipulated object (from initial condition 1, object standing on the base: (1) move the standing object to the right, (2) put the standing object on the table, (3) lay down the standing object; from initial condition 2, object laid on the base: (4) move the laid object to the right, (5) put the laid object on the table). Adopted from Arie et al. (2009) with permission.

For example, as for the physically possible case, the network generated an image of concatenating a partial
sequence of laying down the standing object on the base and that of
grasping it to put on the small table. On the other hand, an example of a physically impossible case involved a slight modulation of the aforementioned possible case. This impossible case involved laying down the standing object and then grasping the lying object to put it on the small table standing up. Although it is physically impossible for the lying object suddenly to stand up after being put on the table, this strange hallucination appeared because the previously learned partial sequence pattern of grasping the standing object and putting it on the table was wrongly concatenated in the image. In the test of actual action generation, the aforementioned physically possible sequence was successfully generated, as shown in Figure 10.5.
The experimental results described here are analogous to the results
obtained by using an RNNPB model. In section 8.2, it was shown that
various action sequences including novel ones were generated by chang-
ing the PB values in the RNNPB. In the current case using MTRNN,
diverse sequential combinations of movement primitives including
novel combinations were spontaneously generated by means of chaos
or transient chaos organized in the higher level network. It can be said
that these robots using the RNNPB or MTRNN generated something novel by trying to avoid simply falling into their own habitual patterns. It is noted again that novel images can be found in the deep memory, developed with a relational structure among experienced images through long-term consolidative learning.

Figure 10.5. The humanoid robot generated an action by spontaneously concatenating two previously learned movement sequences: laying down the standing object on the base and grasping it to put it on the small table. Adopted from Arie et al. (2009) with permission.

An analogous observation was obtained in a robotics experiment using the MTRNN on learning to generate compositional action sequences corresponding to the observation of compositional gesture patterns (Park & Tani, 2015). It was shown that novel action sequences can be adequately generated corresponding to the observation of unlearned gesture pattern sequences conveying novel compositional semantics, after consolidative learning of tutored exemplars that did not contain all possible combination patterns.

***

This is not the end of the story. An important question still remains unanswered. If we consider that the spontaneous generation of actional intentions mechanized by chaos in the PFC is the origin of free will, why is the awareness of a free decision delayed, as evidenced by Libet's (1985) and Soon's (2008) experiments? Here, let us consider how we recognize our own actions in daily life. At the very beginning of the current chapter, I wrote that, after adding coffee granules and hot water, I either add sugar or not, which is rather unconsciously determined, and then only notice later that I actually added sugar when I take the first sip. Indeed, in many situations one's own intention is only consciously recognized when confronted with unexpected outcomes. This understanding, moreover, led me to develop a further set of experiments clarifying the structural relationships between the spontaneous generation of intentions for action and the conscious awareness of these intentions by way of the results of said actions. The next section reviews this set of robotics experiments, the last one in this book.

10.2. Free Will, Consciousness, and Postdiction

This final section explores possible mechanisms accounting for the awareness of one's own actional intentions by examining cases of conflictive interactions taking place between the self and others in a robotics experiment. The idea is essentially this. In conflicting situations, spontaneously generated intentions are not completely free but are modified so that the conflict can be reduced, and it is in this interplay that consciousness arises. To illustrate these processes, we conducted a simple robotics experiment. Through the analysis of the experimental results, we attempt to explain why free will becomes consciously noticed only with a delay, immediately before the onset of the actual action.
10.2.1 Model and Robotics Experiment

In this experiment (Murata et al., 2015), two humanoid robots were used, in which one robot, referred to as the "self" robot, was controlled by an extended version of the MTRNN, and the other robot, referred to as the "other" robot, was teleoperated by a human experimenter. At each trial, after the right hand of the other robot had settled in the center position for a moment, the human experimenter commanded the robot to move the hand either in the left (L) or right (R) direction at random (by using a pseudorandom generator). Meanwhile, the self robot attempted to generate the same movement simultaneously by predicting the decision made by the other robot. This trial was repeated several times.
In the learning phase, the self robot was trained to imitate the ran-
dom action sequences of either moving left or right demonstrated by the
other robot through visual inputs. Because this part of the robot train-
ing is analogous to the one described in the last section, it was expected
that the robot could learn to imitate the random action sequences by
developing chaos in the slow dynamics network of the MTRNN. In the
test phase for interactive action generation with the other robot, the self robot was supposed to decide to move its hand either left or right spontaneously at each juncture. However, at the same time, it had to follow the movement of the other robot by modifying its own intention when its decision conflicted with that of the other robot. It is worth noting here that the chance of conflict is 50% because moving either left or right by the other robot is determined randomly.
Under the aforementioned task condition, we examined possible interactions between the top-down process for spontaneously generating actional intention and the bottom-up process for modifying the intention by recognizing the perceptual reality by means of the error regression mechanism in the conflictive situation. The error regression was applied for updating the activation states of context units in the slow dynamics network over a specific time length of the regression window in the immediate past. Specifically, the prediction errors for the visual inputs for l steps in the immediate past were back-propagated through time for updating the activation values of the context units at the lth step back in the slow dynamics network, toward minimizing those errors. This update reconstructs a new image sequence in the regression window in the immediate past as well as a prediction of the future sequence by means of the forward dynamics in the whole network. This was all done using a
realization of the abstract model proposed in chapter 6 (see Figure 6.2) that can perform regression of the immediate past and prediction of the future simultaneously online.
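The window-based error regression can be sketched abstractly as follows: the context state at the onset of the regression window is treated as a free variable and adjusted to minimize the prediction error accumulated over the window. The actual model uses back-propagation through time; to stay self-contained, this sketch substitutes a numerical gradient, and the toy rollout function is an assumption standing in for the MTRNN forward dynamics.

```python
import numpy as np

def window_error(c0, inputs, targets, rollout):
    """Mean squared prediction error over the regression window."""
    return np.mean((rollout(c0, inputs) - targets) ** 2)

def error_regression(c0, inputs, targets, rollout, lr=0.5, iters=100, eps=1e-5):
    """Adjust the context state at the window onset to fit the observed past."""
    c = c0.astype(float).copy()
    for _ in range(iters):
        base = window_error(c, inputs, targets, rollout)
        grad = np.zeros_like(c)
        for k in range(c.size):                  # numerical gradient per dim
            cp = c.copy(); cp[k] += eps
            grad[k] = (window_error(cp, inputs, targets, rollout) - base) / eps
        c -= lr * grad                           # rewrite the past intention state
    return c

# Toy rollout standing in for the forward dynamics over the window.
rollout = lambda c, xs: np.array([np.tanh(c.sum() + x) for x in xs])
c_new = error_regression(np.zeros(3), np.linspace(0.0, 0.4, 5),
                         np.full(5, 0.5), rollout)
print(c_new)   # context state postdictively adjusted to explain the window
```

Rerunning the forward dynamics from the adjusted context then overwrites both the reconstruction of the immediate past and the prediction of the future, which is the postdiction-plus-prediction behavior analyzed below.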
The test for robot action generation was conducted as a comparison between two conditions, namely with and without using the error regression scheme. Figures 10.6a and b show examples of the robot trials with open-loop one-step prediction, as observed in the experiments without and with using the error regression, respectively. Both cases were tested with the same conflictive situation wherein the intention of the self robot in terms of the initial state of the slow context units was set so that an action sequence "LLRRL" was anticipated while the other robot actually generated an action sequence "RRLLR". The profiles of one-step sensory prediction (two representative joint angles of the self robot and two-dimensional visual inputs representing the hand position of the other robot) are shown in the first row, the online prediction error is shown in the second row, and the slow context and fast context activity are shown in the third and fourth rows, respectively. The dotted vertical lines represent the decision points.
It was observed that the case of one-step prediction without using the error regression was significantly poorer compared with the one with the error regression. In fact, the prediction error became significantly large at the decision point. In this situation, the movement of the self robot became erratic. Although the self robot seemed to try to follow the movements of the other robot by using the sensory inputs, its movements were significantly delayed. Furthermore, at the point of the fourth decision the self robot moved its arm in the direction opposite to that of the other robot (see the cross mark in Figure 10.6a). It seems that the self robot could not adapt to the ongoing conflictive situation just by means of sensory entrainment, because its top-down intention was too strong to be modified. In contrast, one-step prediction using the error regression was quite successful, generating only a spikelike momentary error even at a conflictive decision point (see Figure 10.6b). These results suggest that the error regression mechanism is more effective for achieving immediate adaptation of the internal neural states to the current situation than the sensory entrainment mechanism.
Now, we examine how the neural activity represents the perceptual images of past, present, and future as associated with the current intention, and how such images and intentions can be modulated dynamically through iterative interactions between the top-down intentional process and the bottom-up error regression process during online movement of the robot.

Figure 10.6. The results of the self robot interacting with the other robot by open-loop generation without (a) and with (b) the error regression mechanism. Rows show, from top to bottom, the one-step sensory prediction, the mean square (MSQ) prediction error, and the activities of the slow and fast context units over time steps. Redrawn from Murata et al. (2015).

Figure 10.7 shows plots for the neural activity at several "now" steps (the 221st step, the 224th step, and the 227th step, from left to right) in an event when the prediction was in conflict with immediate sensory input. The plots for the sensory prediction (joint angles and visual inputs), the prediction error, the slow context unit activity, and the fast context unit activity are shown from the first row to the fourth row, respectively. They show profiles for the past and for the future, with the current step of "now" sandwiched between them. The prediction error is shown only for the past, naturally. The regression window is shown as a shaded area in the immediate past.
Figure 10.7. The rewriting of the future by prediction and of the past by postdiction in the case of conflict. Profiles of sensory prediction, prediction error, and activations of slow and fast context units are plotted from past to future for different current "now" steps. The current "now" is shifted from the 221st step in the left panels, to the 224th step in the center panels, and to the 227th step in the right panels. Each panel shows profiles corresponding to the immediate past (the regression window) with solid lines and to the future with dotted lines. Redrawn from Murata et al. (2015).

The hand of the self robot started to move in the right direction around the 215th step, after settling in the home position for a moment (see the leftmost panels). It is noted that, although the joint angles of the self robot were settled, there were dynamic changes in the activity of the fast context units. This dynamic activity prepares a bias to move the hand in a particular direction, which was the right direction in this case. Also, it can be seen that the error arose sharply in the immediate past when the current "now" was at the 221st step. At this moment, the prediction by the self robot was betrayed because the hand of the other
robot moved to the left. Then, the error signal generated was propagated strongly upstream, and the slow context activation state at the starting step of the regression window was modified with effort. Here, we can see discontinuity in the profiles of the slow context unit activity at the onset of the regression window. This modification caused the overwriting of all profiles of the sensory prediction (reconstruction) and the neural activity in the regression window by means of the forward dynamics recalculated from the onset of the window (see the panels of the current "now" at the 224th step). The profiles for future steps were also modified accordingly, while the error decreased as the current "now" shifted to the 224th and to the 227th steps. Then, the arm of the self robot moved to the left.
What we have observed here is postdiction¹ for the past and prediction for the future (Yamashita & Tani, 2012; Murata et al., 2015), by which one's own action can be recognized only in a postdictive manner when one's own actional intention is about to be rewritten. This structure reminds us of Heidegger's characterization of the dynamic interplay between looking ahead to the future for possibilities and regressing to the conflictive past through reflection, where vivid nowness is born (see section 7.2). Surely, at this point the robot becomes self-reflective about its own past and future! In particular, the rewritten window in our model may correspond to the encompassing narrative history as the "space of time" in his thought. Thus, we are led to a natural inference: that people may notice their own intentions in the specious present when confronted with conflicts that must be reduced, with the effort resulting in conscious experience.

1. Postdiction refers to perceptual phenomena in which a stimulus presented later affects the perception of another stimulus presented earlier (e.g., Eagleman & Sejnowski, 2000; Shimojo, 2014).

10.2.2 Interpretation

Can we apply the aforementioned analysis to account for the delayed awareness of free will? The reader may assume that no conflict should be encountered in just freely pressing a button, as in the Libet experiment. However, our experiments show how conflicts might arise due to the nature of embodied, situated cognition. When an intention unconsciously developed in the higher cognitive level by deterministic chaos exceeds a certain threshold, it attempts to drive the lower peripheral parts to generate a particular movement abruptly (see Figure 10.8).

Figure 10.8. Account of how free will can be generated unconsciously and how one can become consciously aware of it later: (1) spontaneous generation of intention by chaos in the PFC; (2) the intention drives the lower level (M1, parietal, prediction of proprioception, motor signal); (3) embodiment entails a certain amount of error; (4) the intention, modulated by the error, becomes conscious.
However, the lower levels may not be able to respond to this impetus immediately, because the internal neural activity in the peripheral areas, including muscle potential states, may not always be ready to initiate physical body movements according to top-down expectations. It is like when a locomotive suddenly starts to move: the following freight train cars cannot follow immediately, and the wheels spin as the system overcomes resistance to new inertia. As the wheels spin, the engineer may slow the engine speed to optimize the acceleration and get the train going properly. Likewise, in terms of the preceding experimental model, when higher levels cannot receive exactly the expected response from lower levels, some prediction error is generated, which can call for a certain modification of the intention for the movement in the direction of minimizing the error. Here, when the intention for the movement that has been developed unconsciously is modified, conscious awareness arises.
This consciously aware intention is different from the original unconscious one because it has already been rewritten by means of postdiction. In short, if actions can be generated automatically and smoothly exactly as intended in the beginning, they are not accompanied by consciousness. However, when they are generated in response to conflicts arising due to the nature of embodiment in the real world, these actions are accompanied by consciousness. This interpretation of our
experimental results is analogous to the aforementioned speculation made by Desmurget and colleagues (2009) (see section 4.3) that the parietal cortex might mediate error monitoring between the predicted perceptual outcome of the intended action and the actual one, a process through which one becomes consciously aware. Freeman (2000) also pointed out that action precedes conscious decision, referring to Merleau-Ponty:

In reality, the deliberation follows the decision, and it is my secret decision that brings the motives to life (Merleau-Ponty, 1962, p. 506).

On this account, the relationship between free will and consciousness can be explained in the following way: (1) deterministic chaos is developed in the higher cognitive brain area; (2) the top-down intention is spontaneously fluctuated by means of this chaotic dynamics, without accompanying consciousness; (3) at the moment of initiating a physical action triggered by this fluctuated intention, prediction error is generated between the intended state and the reality in the external world; (4) the intention, which has been modified by means of the error regression (postdiction), becomes consciously noticed as the cause of the action about to be generated. In terms of human cognition, then, we may say that consciousness is the feeling of one's own embodied neural structure as it physically changes in adaptation to a changing, unpredictable, or unpredicted external environment.
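As a minimal sketch of this four-step account, the following Python fragment couples a chaotically fluctuating intention state to an error regression update; the latent state z, the linear forward model, the learning rate, and the awareness threshold are all hypothetical simplifications, not the MTRNN implementation used in the experiments.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 2))        # hypothetical forward model: intention -> proprioception

    def predict(z):
        return np.tanh(W @ z)          # top-down prediction of proprioception

    z = np.array([0.3, 0.7])           # unconscious intention state
    prev = np.zeros(3)                 # the body's actual (sluggish) state
    for t in range(20):
        z = 3.9 * z * (1.0 - z)        # steps (1)-(2): chaotic fluctuation drives the lower level
        predicted = predict(z)
        # step (3): embodiment; the body only partially follows the top-down drive
        actual = 0.6 * prev + 0.4 * predicted + rng.normal(scale=0.05, size=3)
        error = actual - predicted
        # step (4): postdiction; regress the error back onto the intention itself
        z = np.clip(z + 0.5 * (W.T @ ((1 - predicted**2) * error)), 0.0, 1.0)
        prev = actual
        if np.linalg.norm(error) > 0.2:   # hypothetical threshold for awareness
            print(f"t={t}: intention rewritten postdictively; becomes conscious")

In this toy loop, consciousness is identified simply with the moments at which the error-driven rewriting of z is large, echoing the interpretation given in the text.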
If considered as just discussed, Thomas Hobbes (section 3.6) might be right in saying that there is no space left for free will, because every free action is determined through deterministic dynamics. The point, however, is that our conscious minds cannot see how their intentions develop deterministically through causal chains in unconscious processes; they only notice that each free action seems to pop out all of a sudden, without any cause. Therefore, we feel as if our intentions or wills were generated freely, without cause. To sum up, my account is that free will exists phenomenologically, whereas third-party observation of the physical processes underlying its appearance tells a different story.

10.2.3 Circular Causality, Criticality, and Authenticity

I explored the further possibility of applying the MTRNN model, extended with the error regression mechanism, to a scenario of incremental and interactive tutoring, because such a venture looked so fascinating to me. When I taught a set of movement sequences to the robot, the robot generated various images as well as actual actions by spontaneously combining these sequences (this is analogous to the experimental results shown in section 10.1). While the robot generated such actions, I occasionally interacted with the robot in order to modify its ongoing movement by grasping its hands. In these interactions, the robot would suddenly initiate an unexpected movement by pulling my hands. When I pushed them back in a different direction, the robot responded in yet another way. Now, I understand that novel patterns were more likely to be generated when my response conflicted with that of the robot. This was because the reaction forces generated between the robot's hands and my hands were transformed into an error signal in the MTRNN model in the robot's brain, and consequently its internal neural state was modified by means of the resultant error regression process. Such experiences, resulting from the enactment of such novel intentions, can be learned successively and can induce further modification of the memory structure in the robot's brain. Intentions for a variety of novel actions can be generated again from such reconstructed memory structures. What I witnessed is illustrated in the sketch shown in Figure 10.9a.

Figure 10.9. Circular causality. (a) The chain of circular causality: spontaneous generation of novel intention from the memory structure, enactment of novel action in the environment, unpredicted perception, conscious experience, and restructuring of memory. (b) Its appearance by means of mutual prediction of the future and regression of the past between a robot and myself.

This sketch depicts a circular causality among (1) spontaneous generation of intentions, with various proactive actional images developed from the memory structure; (2) enactment of those actional images in reality; (3) conscious experience of the outcome of the interaction; and (4) incremental learning of these new experiences and the resultant reconstruction of the memory structure. Here, an open dynamic structure emerges by way of the aforementioned circular causality. Consequently, diverse images, actions, and thoughts can be generated, accompanied by spontaneous shifts between conscious and unconscious states of mind after repeated confrontation and reconciliation between the subjective mind and the objective world.
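Purely as an illustration, the loop in Figure 10.9a can be written down as an event cycle; the function names, the memory representation, and the random conflict probability below are hypothetical placeholders, not the mechanisms of the actual model.

    import random

    random.seed(1)
    memory = ["reach", "grasp", "wave"]          # learned primitive patterns

    def generate_intention(memory):
        # (1) spontaneous generation of a proactive actional image
        return random.choice(memory) + "-" + random.choice(memory)

    def enact(intention):
        # (2) enactment in reality; the world sometimes resists
        if random.random() < 0.3:
            return "perturbed:" + intention      # unpredicted perception
        return intention

    for step in range(6):
        intention = generate_intention(memory)
        outcome = enact(intention)
        if outcome != intention:
            # (3) conscious experience of the conflict, then
            # (4) incremental learning restructures the memory
            memory.append(outcome.split(":", 1)[1])
            print(f"step {step}: conflict -> memory now holds {len(memory)} patterns")

Each pass through the loop feeds the restructured memory back into the next round of intention generation, which is exactly the openness the circular causality is meant to convey.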
Furthermore, it is worth noting that the emergent processes described in Figure 10.9a also include me, as I insert myself into the circular causality in the robotics experiment described in this section (see Figure 10.9b). When I concentrated on tactile perception of the movement of the robot in my grasp, I sometimes noticed that an image of my own next movement popped out suddenly, without my conscious control. I also noticed that the tension between me and the robot occasionally rose to a critical level, from which unexpected movement patterns of mine, as well as of the robot, burst out. Although I may be unable to articulate the mechanics behind such experience in greater detail through unaided introspection alone, I became sure that the interaction between the robot and me exhibited its own authentic trajectory. Ultimately, free will or free action might be generated in a codependent manner between me and others who seek their ownmost possibility in the shared social situation of this world. At the same time, finally, I realized that I had conducted robotics experimental studies not only to evaluate the proposed cognitive models objectively, but also to enjoy myself, creating a rich subjective experience in the exploration of my own consciousness and free will through my online interaction with neurodynamic robots.

10.3. Summary

This chapter tackled the problems of consciousness, intention, and free will through the analysis of neurorobotics experimental results. The problems we focused on were how free will for action can emerge and how it can become the content of consciousness. First, our study investigated how intentions for different actions can be generated spontaneously. It was found that actions can shift from one to another spontaneously when a chaotic attractor is developed in the slow-dynamics subnetwork in the higher levels of the cognitive brain. This implies that intention for free action arises from neural activity fluctuating by means of deterministic chaos in the higher cognitive brain area. This interpretation accords with the experimental results delivered by Libet (1985) and Soon and colleagues (2008).
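For readers who want the timescale mechanism in concrete form, here is a minimal sketch of the leaky-integrator update that gives an MTRNN its fast and slow subnetworks; the unit counts, time constants, and random weights are hypothetical stand-ins, since a real MTRNN is trained rather than randomly wired.

    import numpy as np

    rng = np.random.default_rng(0)
    n_fast, n_slow = 8, 4
    tau = np.array([2.0] * n_fast + [50.0] * n_slow)     # hypothetical time constants
    n = n_fast + n_slow
    W = rng.normal(scale=1.2 / np.sqrt(n), size=(n, n))  # recurrent weights

    u = rng.normal(size=n)             # internal states of all units
    for t in range(100):
        y = np.tanh(u)                 # firing rates
        # leaky-integrator dynamics: slow units (large tau) change gradually,
        # so they can carry intention states while fast units handle primitives
        u = (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ y)

With sufficiently strong recurrent gain, the slow units of such a network can wander chaotically, which is the kind of spontaneous intention shifting described above.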
The next question tackled was why conscious awareness of the intention for generating spontaneous actions arises only with a delay, immediately before the actual action is initiated. For the purpose of considering this question, a robotics experiment simulating conflictive situations between two robots was performed. The experiment used an extended version of the MTRNN model employing an error regression scheme for achieving online modification of the internal neural activity in the conflictive situation. The experimental results showed that spontaneously generated intention in the higher level subnetwork can be modified in a postdictive manner by using the prediction error generated by the conflict. It was speculated that one becomes consciously aware of one's own intention for generating action only via postdiction, when the originally generated intention is modified in the face of conflicting perceptual reality. In the case of generating free actions, as in the experiment by Libet, the delayed awareness of one's own intention can be explained similarly, as the conflict emerges between the higher level unconscious intention for initiating a particular movement and the lower level perceptual reality of embodiment, which results in the generation of the prediction error.
These considerations lead us to conjecture that there might be no space for free will, because all phenomena, including the spontaneous generation of intentions, can be explained by causally deterministic dynamics. We nevertheless enjoy an experience of free will subjectively: we feel as if freely chosen actions appear out of a clear sky in our minds without any cause, because our conscious mind cannot trace their secret development in unconscious processes.
Finally, the chapter examined the circular causality appearing among the processes of generating intention, embodying such intention in reality, consciously experiencing perceived outcomes, and successively learning such experience in the robot-human interactive tutoring experiment. It was postulated that, because of this circular causality, all processes time-develop in a groundless manner (Varela et al., 1991), without any convergence to particular situations, whereby images and actions are generated diversely. The vividness and the authenticity of our selves might appear especially at a certain criticality under such groundless situations developed through circular causality. And thus, our minds might become ultimately free only when gifted with such groundlessness.

11
Conclusions

Now, after completing the descriptions of our robotics experiment outcomes, this final chapter presents some conclusions drawn from reviewing these experiments.

11.1. Compositionality in the Cognitive Mind

This book began with a quest for a solution to the symbol grounding problem, by asking how robots can grasp meanings of the objective world from their subjective experiences, such as the smell of cool air from a refrigerator or the feeling of one's own body sinking back into a sofa. I considered that this problem originated from Cartesian dualism, wherein René Descartes suggested that the mind is a nonmaterial, thinking thing essentially distinct from the nonthinking, material body, only then to face the problem of interactionism, that is, expounding how nonmaterial minds can cause anything in material bodies, and vice versa. Actually, today's symbol grounding problem addresses the same concern, asking how symbols, considered as arbitrary shapes of tokens defined in nonmetric space, could interact densely with sensory-motor reality defined in physical and material metric space (Tani, 2014; Taniguchi et al., 2016).


In this book, I attempted to resolve this longstanding problem of mind and body by taking synthetic approaches. The book presents the experimental trials, inspired by Merleau-Ponty's philosophy of embodiment, in which my colleagues and I have engineered self-organizing, nonlinear dynamic systems onto robotic platforms. Our central hypothesis has been that essential cognitive mechanisms self-organize in the form of neurodynamic structures via iterative learning of the continuous flow of sensory-motor experience. This learning grounds higher level cognition in perceptual reality without suffering the disjunction between lower and higher level operations that is often found in hybrid models employing symbolic composition programs. Instead, iterative interactions between top-down, subjective, intentional processes of acting on the objective world and bottom-up recognition of perceptual reality result in the alteration of top-down intention through circular causality. Consequently, our models have successfully demonstrated what Merleau-Ponty described metaphorically as the "reciprocal insertion and intertwining" of the subject and the object, through which those two become inseparable entities.
It might still be difficult for proponents of cognitivism such as Chomsky to accept such a line of thought. As mentioned in chapter 2, the cognitivists' first assumption is that an essential aspect of human cognition can be well accounted for in terms of logical symbol systems, the substantial strength of which is that they can support an infinite range of recursive expressions. The second assumption is that sensory-motor or semantic systems are not necessary for the composition or recursion taking place in terms of symbol systems, and therefore may not be essential components of any cognitive system.
However, one crucial question is whether it is necessary, in the first place, for the daily actions and thoughts of human beings to be supported by such infinite recursive composition. In everyday situations, a human being speaks only with a limited depth of embedded sentences, and makes action plans composed of only a limited length of primitive behavior sequences at each level. An infinite depth of recursive composition is required in neither case. And the series of robotics experiments described in this book confirms this characterization. Our multiple timescale recurrent neural networks (MTRNNs) can learn to imitate stochastic sequences via self-organizing deterministic chaos with the complexity of finite state machines, but not with that of infinite ones. Mathematical studies by Siegelmann (1995) and, more recently, by Graves and colleagues (2014) have proved that analog computational models, including recurrent neural networks (RNNs) with external memory for writing and reading, can exhibit computational capabilities beyond the Turing limit. However, the construction of such Turing machines through learning is practically impossible, because the corresponding parameters, such as connectivity weights, can be found only at singular points in the weight space. Such a parameter-sensitive system may not function reliably when situated in the noisy, sensory-motor reality that its practical embodiment may require, even if an equivalence to such a Turing machine might be constructed in an RNN by chance (Tani et al., 2014).
The same should hold for ordinary human cognitive processes, which rely on relatively poor working memory characterized by the magic number seven (Miller, 1956). My work with robots has attempted to model the everyday analogical processes of ordinary humans generating behaviors and thoughts characterized by an everyday degree of compositionality. This scope may include the daily utterances of children before the age of 5 or 6, who can compose sentences in their mother tongue without explicitly recognizing their syntactic structures, and also the tacit learning of skilled actions, such as grasping an object to pass it to others without thinking about it, or even making a cup of instant coffee. Our robotics experiments have demonstrated that self-organization of particular dynamical structures within dynamic neural network models can develop a finite level of compositionality, and that the contents of these compositions can remain naturally grounded in the ongoing flow of perceptual reality throughout this process.
Of course, this is far from the end of the story. Even though we may have created an initial picture of what is happening in the mind, problems and questions remain. For example, a typical concern people often raise with me is whether symbols really don't exist in the brain (Tani et al., 2014). On this count, many electrophysiological researchers have argued for the existence of so-called grandmother cells, based on studies of animal brains in which local firings are presumed to encode specific meanings in terms of a one-to-one mapping. These researchers argue that such grandmother cells might function like symbols. A neurophysiologist once emphatically argued with me, denying the possibility of distributed representations, saying that "this recorded neuron encodes the action of reaching to pull that object." On the contrary, I thought it possible that this neuron could also fire when generating other types of actions that could not be observed in his experimental setting, in which the movements of the animals were quite constrained. Indeed, recent developments in multiple-cell recording techniques suggest that such mappings are more likely to be many-to-many than one-to-one. Mormann and colleagues' (2008) results from multiple-cell recordings of the human medial temporal lobe revealed that the firing of cells for a particular concept is sparse (around 1% of the cell population firing) and that each cell encodes from two to five different concepts (e.g., an actress's face, an animal shape, and a mathematical formula). Even though concepts are represented sparsely, their representation is not one-to-one but distributed, and so any presumption that something like direct symbolic representations exist in the human brain seems equally to be in error.
That aside, I speculate that we humans use discrete symbols outside of the brain, depending on the situation. Human civilization has evolved through the use of outside-brain devices such as pens and paper to write down linguistic symbols, thereby distributing thought through symbolic representations, an aspect of what Clark and Chalmers (1998) have called the extended mind. This use of external representation, moreover, may be internalized and employed through working memory, like a blackboard in the brain on which to write down our thoughts when we don't have pen or paper handy. In this book, my argument has been that our brain can facilitate everyday compositionality, such as in casual conversation or even regular skilled action generation, by combining primitive behaviors without needing to (fully) depend on symbol representation or manipulation in outside-brain devices.
Still, when we need to construct complicated plans for solving complex problems, such as job scheduling for a group of people in a company or the basic design of complex facilities or machines, we typically compose these plans as flow charts, schematic drawings, or itemized statements on paper or in other media utilizing symbols. Tasks at this level might be solved by cognitive architectures such as ACT-R, GPS, or Soar. Indeed, these cognitive architectures are good at manipulating symbols as they exist outside of brains by utilizing explicit knowledge or rules. So, this poses the question of how these symbols outside of the brain can be grounded in the neurodynamic structures inside the brain. Actually, one of the original inventors of Soar, John Laird, has recently investigated this problem by extending Soar (Laird, 2008). The extended Soar contains additional building blocks that are involved in the learning of tacit knowledge about perception and action generation without using symbolic representation. Such subsymbolic levels are interfaced with symbolically represented short-term memory (STM) at the next level. Next actions are determined by applying production rules to the memory contents in the STM. Similar research trials can be seen elsewhere (Ritter et al., 2000; St Amant & Riedl, 2001; Bach, 2008). Ron Sun (2016) has developed a cognitive architecture, CLARION, characterized by interactions between explicit processes realized by symbol systems and implicit processes realized by connectionist networks, under a similar motivation. Although these trials are worth examining, I speculate that the introduction of symbolic representations in the STM in Soar, or at the explicit level in CLARION, might come too early, because such representations can still be developed in a nonsymbolic manner, such as by analog neurodynamic patterns, as I have shown repeatedly in the current book. The essential questions would be at which level in the cognitive process external symbols should be used, and how such symbols can be interfaced with subsymbolic representations. These questions are left for future studies, and there will undoubtedly be many more we will face.

11.2. Phenomenology

The current book also explored phenomenological aspects of the human mind, including notions of self, consciousness, subjective time, and free will, by drawing correspondences between the outcomes of neurorobotics experiments and some of the literature in traditional phenomenology. Although some may argue that such analysis from the synthetic modeling side can never be more than metaphorical, against this I would argue that models capture aspects essential to a phenomenon, reducing the complexity of a system to only these essential dimensions, and in this way models are not metaphors. They are the systems in question, only simpler, at least in so far as essential dimensions are indeed modeled and nothing more (see further discussion by Jeffrey White [2016]). In this spirit, I believe that interdisciplinary discussions on the outcomes of such neurorobotics experiments can serve to strengthen the insights for connecting aspects of robot and human behaviors more closely. It should be true that human phenomenology, human behavior, and the underlying brain mechanisms can be understood only through their mutual constraints imposed on formal dynamical models, as Varela (1996) pointed out. In this way, robotics experiments of the sort reviewed in this text afford privileged insights into the human condition. To reinforce these insights, let us review these experiments briefly.
In the robot navigation experiment described in section 7.2, it was argued that the self might come to conscious awareness when coherence between internal dynamics and environmental dynamics breaks down, that is, when subjective anticipation and perceptual observation conflict. By referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the subject (carpenter) and the object (hammer) form an enactive unity when all of the cognitive and behavioral processes proceed smoothly and automatically. This process is characterized by a steady phase of neurodynamic activity. In the unsteady phase, the distinction between the two becomes explicit, and the self comes to be noticed consciously. An important observation was that these two phases alternated intermittently, exhibiting the characteristics of self-organized criticality (Bak et al., 1987). It was considered that authentic being might be accounted for by this dynamic structure.
In section 8.4, I proposed that the problem of segmenting the continuous perceptual flow into meaningful, reusable primitive patterns might be related to the problem of time perception as formulated by Husserl. For the purpose of examining this thought, we reviewed an experiment involving robot imitation learning that used the RNNPB model. From the analysis of these experimental results, it was speculated that nowness is bounded where the flow of experience is segmented. When the continuous perceptual flow can be anticipated without generating error, there is no sense of events passing through time. However, when prediction error is generated, the flow is segmented into chunks by means of modification of the parametric bias vector in an effort to minimize the error. With this, the passing of time comes to conscious awareness. The segmented chunks are no longer just parts of the flow, but rather represent discrete events that can be consciously identified according to the perceptual categories encoded, in our model, by the PB vector.
In fact, it is interesting to see that the observation of compositional actions by others is accompanied by momentary consciousness at the moment of segmenting the perceptual flow into a patterned set of primitives. This is because compositional actions generated by others entail potential unpredictability when such actions are composed of primitive acts voluntarily selected by means of the free will of others. Therefore, compositionality in cognition might be related to the phenomenology of free will and consciousness. If some animals live only on sensory-reflex behaviors, without the ability either to recognize or to generate compositional actions, there might be no space for consciousness or for the experience of free will in their minds.
In chapter 9, I wrote that the capability of abstraction through hierarchy in the MTRNN can provide robots with a competency for self-narrative about their own actional intentions in mental simulation. I speculated that the reflective selves of robots may originate from this point. Following this argument, chapter 10 was devoted to the relationship between free will and conscious experience in greater depth. From the results of robotics experiments utilizing the MTRNN model (section 10.1), I proposed that intentions for free actions could be generated spontaneously by deterministic chaos in the higher cognitive brain area. The results of the robotics experiment shown in section 10.2 suggest that conscious awareness of the intention developed by such deterministic dynamics can arise only in a postdictive manner, when conflicts arise between top-down prediction and bottom-up reality. This observation was correlated with the account of the delayed awareness of free will reported by Libet (1985). By considering possible situations in which the intention to enact a particular movement generated in the higher level conflicts with the sensory-motor reality as constituted in the lower level, it was proposed that an effort autonomously mobilized for reducing the conflict would bring the intention to conscious awareness.
Finally, this chapter suggested that there might be no space for free will from an objective view, because all of the mechanisms necessary for generating voluntary actions can be explained by deterministic dynamics grounded in causal physical phenomena, as I have shown in our robotics experiments. Though it is true that in our everyday subjective experience we feel as if free will exists, through the results of our neurorobotics experiments we can see that this phenomenon may arise simply because our minds cannot see the causal processes at work in generating each intentional action. Our minds cannot observe the phase space trajectory of the chaos developed in the higher cognitive brain area. We are conscious of each intention as if it pops up without any prior cause, immediately before the corresponding action is enacted. On this account, we may conclude that free will exists, but merely as an aspect of our subjective experience.
With the relationship between free will and consciousness thus clarified, I will reiterate once more that the problem of consciousness may not be the hard problem after all. If consciousness is considered to be the first-person awareness of embodied physical processes, then an exhaustive account of consciousness should likewise appear via the explanation of the relationships between the subjective and the objective. This stands to reason, of course, provided that the whole of this universe is constituted by these two poles, and that nothing (something supernatural) exists outside of them. When subjectivity is exemplified by the top-down pathway of predicting an actional outcome, and objectivity by the bottom-up recognition of the perceptual reality, these poles are differentiable in terms of the gap between them. Consequently, consciousness at each moment should appear as a sense of an effortful process aimed at minimizing this gap. Qualia, then, might be a special case of conscious experience that appears when the gap is generated only in the lower perceptual level, in which case the vividness of qualia may originate from the residual prediction error at each instant. Along this line, and more specifically, Friston (2010) would say that it arises from the error divided by the estimated variance (uncertainty) rather than from the error itself.
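As a worked illustration of this precision weighting, the following fragment contrasts a raw prediction error with the same error divided by two different estimated variances; the numbers are invented for illustration and are not drawn from Friston (2010).

    # Precision-weighted prediction error: the same raw error is salient
    # or negligible depending on the estimated variance (uncertainty).
    error = 0.4                  # raw prediction error (invented)
    sigma2_noisy = 4.0           # large estimated variance: low precision
    sigma2_sharp = 0.1           # small estimated variance: high precision

    print(error / sigma2_noisy)  # 0.1 -> little salience, dim experience
    print(error / sigma2_sharp)  # 4.0 -> high salience, vivid experience

The same residual error thus contributes very differently to experience depending on how certain the system believes its prediction should have been.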
However, a more essential issue is to understand the underlying structure of consciousness, rather than just a conscious state at a particular moment measured post hoc in terms of integrated information (Tononi, 2008), for example, or in terms of the aforementioned gap or prediction error. We have to explain the underlying structural mechanism accounting for, for example, the stream of consciousness formulated by William James (1892) as spontaneous alternation between conscious and unconscious states. The crucial proposal in the current book is that the circular causality developed between the subjective mind and the objective world is responsible for consciousness and also for the appearance of free will, as these two are dependent on each other within the same dynamic structure. The top-down proactive intention acting on the objective world induces changes in this world, whereas the bottom-up postdictive recognition of such changes, including unexpected ones, may induce changes in memory and intention in the subjective mind. This can result in a further emergence of free action by means of the potential nonlinearity of the system. In the loop of circular causality, spontaneous shifts between the unconscious state, in terms of the coherent phase, and the conscious state, in terms of the incoherent phase, occur intermittently as the dynamic whole develops toward criticality.

To sum up, this open dynamic structure developed in the loop of circular causality should account for the autonomy of consciousness and free will. Or, it can be said that this open dynamic structure explains the inseparable nature of the subjective mind and the objective world in terms of autonomous mechanisms moderating the breakdown and unification of this system of self and situation. Conclusively, the criticality developed in this open, dynamic structure might account for the authenticity conceived by Heidegger, which generates a trajectory toward one's ownmost possibility by avoiding merely falling into habitual or conventional ways of acting (Tani, 2009). Reflective selves of robots that can examine their own past and future possibilities should originate from this perspective.

11.3. Objective Science and Subjective Experience

Readers might have noticed that two different attitudes toward conducting robotics experiments appear by turns in Part II of the current book. One type of my robotics experiments focuses more on how adequate action can be generated based on the learning of a rational model of the outer world, whereas the other type focuses more on the dynamic characteristics of possible interactions between the subjective mind and the objective world.
For example, chapter 7 employed these two different approaches in the study of robot navigation learning. Section 7.1 described how the RNN model used in mobile robots can develop compositional representations of the outer environment and how these representations can be grounded. On the other hand, section 7.2 explored characteristics of groundlessness (Varela et al., 1991) in terms of fluctuating interaction between the subjective mind and the objective world. Section 8.3 described the one-way imitation learning of a robot, showing that the RNNPB model can learn to generate and recognize a set of primitive behavior patterns by observing the movements of its human partner. Afterward, I introduced the imitation game experiment, in which two-way mutual imitation between robot and human was the focus. It was observed that some psychologically plausible phenomena, such as turn-taking of initiative, emerged in the course of the imitation game, reinforcing our emphasis on the interaction between the first-personal subjective and the objective, in this case social, world. In chapter 9, I described how the MTRNN model can learn compositional action sequences by developing an adequate functional hierarchy in the network model. Then, chapter 10 examined how circular causality can develop among different cognitive processes, for the purpose of investigating the free will problem by using the same MTRNN model. This chapter also reported how novel images and actions can be generated on both the robot and human sides during interactive tutoring of robots by human tutors.
To sum up, my research attitude has been shifting between one side, investigating rational models of cognitive mechanisms from an objective view, and the other side, exploring subjective phenomena by putting myself inside the interaction loop in robotics experiments. Matsuno (1989) and Gunji (Gunji & Konno, 1991) wrote that the former type of research attitude takes the view of a so-called external observer and the latter that of a so-called internal observer. They used the term observation as mostly equivalent to the term interaction. When the relationship between the observer and the observed can alter because of the interactions between them, such an observer is regarded as an internal observer, because it is included in the internal loop of the interactions. On the other hand, the external observer assumes only one-way, passive observation from observed to observer, without any interactive feedback.
Observation itself consists of a set of embodied processes that are physically constrained in various ways, such as by imprecision in perception and in motor generation, time delays in neural activation and body movement, limitations in memory capacity, and so on. Such physical constraints in time and space do not allow the system to be uniquely optimized and thus give rise to incompleteness and inconsistency. Actually, in our robot experiments, such inconsistencies arise in every aspect of cognitive processing, including action generation, recognition of perceptual outcomes, and the learning of resultant new experience. However, at the moment of encountering such an inconsistency, the processes cannot simply be terminated. Instead, each process attempts to change its current relations, as if it expected that the inconsistency would be resolved sometime in the future, as long as the interaction continues (Gunji & Konno, 1991).
We can experience something analogous to this when we go to a gig of cutting-edge contemporary jazz. A brilliant tenor sax player like the late Michael Brecker would often start a tune calmly, with familiar improvised phrases, but his playing and that of the other band members would grow tense gradually through mutual responses. Near the peak of this tension, likely to break down at any moment, his playing sometimes got stuck for an instant, as his bodily control of blowing or tonguing seemed unable to catch up with his rushing imagery any longer. In the next moment, however, an unbelievable tension of sound and phrase burst out. His genuine creativity in such thrilling playing resulted not merely from his outstanding skills at improvising phrases or his perfect control of the instrument, but originated in the urgent struggle to enact his exploding mental image and intention.
It is interesting to note that cognitive minds appear to maintain two processes moving in opposite directions, one toward stability and the other toward instability. Goal-directedness can be considered an attempt to achieve the stability of the system by resolving its currently observed inconsistencies. All processes of recognition, generation, and learning can be regarded as goal-directed activities, which can be accounted for, for example, by the prediction error minimization principle employed in our models. These activities are geared toward grounding, as shown in some of our robotics experiments. However, such goal-directed attempts always entail instability, because of their embodiment as well as the potential openness of the adopted environment, which results in the groundlessness we have witnessed in our other robotics experiments. The coexistence of this stable and unstable nature does not allow the system state simply to converge, but imbues the system with autonomy for generating itinerant trajectories (Tsuda, 2001; Ikegami & Iizuka, 2007; Ikegami, 2013), wherein we can find the vividness of a living system.
Looking back over my research history, I am now sure that both research attitudes are equally important for the goal of understanding the mind via synthesis. On the one side, it is crucial to build rational models of cognition with the goal of optimizing and stabilizing each elementary cognitive process. On the other side, it is equally crucial to explore the dynamic aspects of mind while the optimization is yet to be achieved, during the ongoing process of robots acting in the world. The former line of research can be advanced considerably by using recent results from the booming research programs on machine learning and deep learning, in which the connectionist approach employing the error back-propagation scheme has been revived by introducing more elegant mathematics into the models than in the 1980s. For further advancement of the latter, we need to explore a methodology for articulating the subjective experience of the experimenters who are within the interaction loop in the robotics experiment.
What we need to do is to further enhance the circular loop between the objective science of modeling cognitive mechanisms and the practice of articulating subjective experience. This exactly follows what Varela and colleagues proposed in The Embodied Mind (Varela et al., 1991) and in their so-called neurophenomenology program (Varela, 1996). Varela and colleagues proposed to build a bridge between mind in science and mind in experience by articulating a dialogue between the two traditions of Western cognitive science and Buddhist meditative psychology (Varela et al., 1991, p. xviii). Why Buddhist meditation for the analysis of subjective experience? Because the Buddhist tradition of meditation practice, spanning more than 26 centuries, has achieved systematic and pragmatic disciplines for accessing human experience. Parts of the Buddhist meditation disciplines could be applied directly to our problem of how to articulate the subjective experience of the experimenter in the robotics experiment loop.
The Buddhist mindful awareness tradition starts with practices to suspend the habitual attitudes taken for granted in everyday life (Varela et al., 1991). By practicing this suspension of the habitual attitude, meditators become able to let their minds present themselves or go by themselves, by developing a mood of stepping back. Analogously, if we attempt to develop ultimately natural, spontaneous, mindful interactions between robots and humans, we should rid the human subjects of arbitrary thinking, such as about what robots or humans should or should not do, which has been assumed in the conventional human-robot interaction framework. In my own experience of interacting with the robot as described in section 10.2, when I was more absorbed in the interaction, concentrating on tactile perception of the movement of the robot in my grasp, I felt more vividness in the robot's movement and also experienced a more spontaneous arousal of kinesthetic imagery for my own movement. The ongoing interaction was dominated neither by my subjectivity nor by the objectivity of the robot. It was like floating in the middle way between these two extremes. Such intensive interaction alternated between a more tense, conflictive phase and a more relaxed one, as I have already mentioned. It should be noted that the continuance of such subtle interaction depended on how diverse memory patterns were consolidated through the development of a generalized deep structure in the dynamic neural network used in the robot. The more deeply the memory structure develops, the more intriguing the generated images become. The enhancement of the employed models thus contributes greatly to the realization of sensible interactions between the robots and the human subjects.
In summary, it is highly expected that the goal of understanding the mind can be achieved by making efforts both in objective science and in subjective experience, one investigating more effective cognitive models that assure better performance and scalability, and the other practicing to achieve truly mindful interaction with the robots. The true features of the mind should be captured by undertaking research trials that move back and forth between the exploration of objective science and that of subjective experience.

11.4. Future Directions

Although this book has not concentrated on modeling the biological reality of the brain in detail, recent exciting findings in system-level neuroscience draw me to explore this area of research more explicitly. The sizeable amount of human brain imaging data gathered to date has enabled a global map to be created of both the static connectivity and the dynamic connectivity between all the different cortical areas (Sporns, 2010). Thanks to such data, now might be a good time to start trying to reconstruct a global model of the brain, so that we can synthetically examine what sorts of brain functions appear locally and globally under both static and dynamic connectivity constraints. In the process, we may also examine how these models correspond with evidence from neuroscience.
An exciting future task might be to build a large-scale brain network, using either rate-coding neural units or spiking neurons, for artificial humanoid brains. Such experiments have already been started by some researchers, including Edelman's group (Fleischer et al., 2007) and Eliasmith (2014), who have introduced millions of spiking neurons into their models. I should emphasize, however, that large scale does not mean a complete replica of real brains. We still need a good abstraction of the biological reality to build tractable models. We may not need to reconstruct the whole brain by simulating the activity of 100 billion biologically plausible neurons interconnected in columnar structures, as aimed at by the Blue Brain project (see section 5.4).

Interestingly, it has recently been shown that some connectionist-type neural network models, using several orders of magnitude fewer rate-coding neural units, can exhibit human-level performance in specific tasks such as visual object recognition. It was shown that the so-called convolutional neural network (CNN; LeCun et al., 1998), developed as inspired by the hierarchical organization of the visual cortex, can learn to classify visual images of hundreds of object types, such as bicycles, cars, chairs, tables, and guitars, in diverse views and sizes, with an error rate of 0.0665, using a training set of 1 million static visual images (Szegedy et al., 2015). Although this classification accuracy is close to that of humans (Szegedy et al., 2015), a surprising fact is that the CNN used, consisting of 30 layers, contains only around a million almost homogeneous rate-coding neural units, as opposed to the real visual cortex, which contains 10 billion spiking neurons of hundreds of different morpho-electrical types (see section 5.4). This implies that the activity of ten thousand spiking neurons could be represented by that of a single rate-coding neural unit, as a point mass, in connectionist models without degrading their performance level, as I presumed in section 5.4. It can also be inferred that the known diversity in cell types, as well as in synaptic connection types, can be regarded as biological detail that may not contribute to a primary system-level understanding of brain mechanisms, such as how visual objects can be classified in brains. Building a large-scale brain network model consisting of a dozen major brain areas as its subnetworks, by allocating around 10 million rate-coding neural units in total, may not be so difficult even in the current computational environment using clusters. Now, we can start such an enterprise, which might be referred to as a Humanoid Brain project. This project would clarify the underlying mechanism of the functional differentiation observed across local areas in our brains, in terms of downward causation by the functional connectivity and the multiple spatiotemporal scales property evidenced in human brains, and by embodiment in terms of the structural coupling of the peripheral cortical areas with sensory-motor reality.
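For concreteness, here is a minimal sketch of such a convolutional classifier written with the PyTorch library; the layer sizes, input resolution, and number of classes are arbitrary stand-ins, nothing like the 30-layer network discussed above.

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        # A toy convolutional classifier; all sizes are invented for illustration.
        def __init__(self, n_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                  # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                  # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(32 * 8 * 8, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # one fake RGB image
    print(logits.shape)                            # torch.Size([1, 10])

Each unit here is exactly the kind of rate-coding point mass discussed above: a single real-valued activation standing in for the pooled activity of many spiking neurons.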
Another line of meaningful extension, in terms of neuro-phenomenological robotics, would be the exploration of the underlying mechanisms of various psychiatric diseases, including schizophrenia, autism, and depression. Actually, my colleagues and I have started studies in this direction, which have already shown initial results. Yamashita and Tani (2012) proposed that disturbance of self, a major symptom of schizophrenia, can be explained as compensation for adaptive behavior by means of the error regression. A neurorobotics model was built as inspired by the disconnectivity hypothesis of Friston (1998), which suggests that the basic pathology of schizophrenia may be associated with functional disconnectivity in the hierarchical network of the brain (i.e., between prefrontal and posterior brain regions). In the neurorobotics experiment (Yamashita & Tani, 2012) using an MTRNN model, a humanoid robot was trained on a set of behavioral tasks. After the training, a certain amount of perturbation was added to the connectivity weights between the higher level and the lower level to represent the disconnectivity. When the robot performed the trained tasks with online error regression, inner prediction error was generated because of the introduced disconnectivity. Consequently, the intention state in the higher level was modulated autonomously by the error signal back-propagated from the lower perception level. This observation suggests that aberrant modulatory signals induced by internally generated prediction error might be a source of the patient's feeling that his intention is affected by some outside force.
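A minimal sketch of that manipulation, under invented dimensions and weights, might look as follows; the perturbation below is a stand-in for the disconnectivity lesion, not the procedure actually used by Yamashita and Tani (2012).

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.5, size=(6, 3))      # intact higher -> lower pathway

    # "Lesion" the hierarchy: perturb the inter-level connectivity weights.
    W_lesioned = W + rng.normal(scale=0.3, size=W.shape)

    intention = rng.normal(size=3)               # higher-level intention state
    target = np.tanh(W @ intention)              # what the intact pathway would yield

    for step in range(50):
        predicted = np.tanh(W_lesioned @ intention)  # prediction via lesioned weights
        error = target - predicted                   # inner prediction error
        # online error regression: the error back-propagates and
        # autonomously modulates the intention state itself
        intention += 0.1 * W_lesioned.T @ ((1 - predicted**2) * error)

By the end of the loop, the intention state has drifted away from where it started, which is the toy analogue of an intention that feels moved by an outside force.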
Furthermore, the experimental result of Yamashita and Tani (2012) suggests a hypothetical account of another schizophrenia symptom, cognitive fragmentation (Perry & Braff, 1994), in which the patients lack continuity in spatiotemporal perception. It is speculated that such cognitive fragmentation might be caused by frequent occurrences of the inner prediction error, because the subjective experience of time passing can be considered to be associated with prediction error at segmentation points in the perceptual flow, as I analyzed in section 8.3.
In future research, the mechanism of autism could be clarified in terms of another type of malfunction in the predictive coding scheme presumed in the brain. Recently, Van de Cruys and colleagues (Van de Cruys et al., 2014) proposed that a hyper-prior with less tolerance of prediction error results in a failure of generalization in learning, which is the primary cause of autism. Intuitively, the prediction network can develop an overfitting problem, with generalization error, when the top-down pressure for minimizing the error in learning is imposed on the network too strongly. This generalization error in predicting the coming perceptual state could be considered a main cause of autism, given the accumulated evidence for the patients' typical symptom that they are significantly good at learning by rote but lack capability in structural learning (Van de Cruys et al., 2014; Nagai & Asada, 2015). A robotics experiment reconstructing the symptom could be conducted by modeling the hyper-prior, implementing estimation of the inverse precision used in the Bayesian predictive coding framework (Friston, 2005; Murata et al., 2015), as Van de Cruys and colleagues (2014) rationalized that overestimation of the precision under noisy real-world circumstances can result in overfitting of the prediction model. Future studies should examine other psychiatric diseases, including attention deficit hyperactivity disorder and obsessive-compulsive disorder. In summary, if a particular neurorobotics model represents a good model of the human mind, it should also be able to account for the underlying mechanisms of these common psychiatric pathologies, because the brain structures of these patients are known to be not so different from normal ones.
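To illustrate only the overfitting intuition, the following sketch shows how an overestimated precision on the error term forces a flexible predictor to chase noise; the data, the precision values, and the use of ridge-regularized polynomial regression are all invented stand-ins for the predictive coding models cited above.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 12)
    y = np.sin(2 * x) + rng.normal(scale=0.3, size=x.size)  # noisy observations
    X = np.vander(x, 10)               # degree-9 polynomial features

    def fit(precision, ridge=1.0):
        # Minimize precision * ||y - Xw||^2 + ridge * ||w||^2.
        # A large precision overwhelms the prior, so the fit chases the noise.
        A = precision * X.T @ X + ridge * np.eye(X.shape[1])
        return np.linalg.solve(A, precision * X.T @ y)

    for precision in (1.0, 1e4):       # tolerant vs. hyper-precise weighting
        w = fit(precision)
        residual = np.mean((X @ w - y) ** 2)
        print(f"precision={precision:g}: training residual={residual:.4f}")

The hyper-precise setting drives the training residual toward zero, rote memorization of the noise, at the cost of generalization to new inputs, which is the intuition behind the hyper-prior account of autism.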
Another crucial question is how far we can scale up the neurorobots described in this book, as I know well that my robots can still work only in toy environments. Although I would say that the progress made in neurorobotics has thus far been steady, actually scaling robots to a near-human level might be very difficult. Confronted with this challenge, recent pragmatic studies of deep learning (Hinton et al., 2006; Bengio et al., 2013) have revived aging connectionist approaches, supercharged with the huge computational power latent within (multiple) graphics processing units in standard desktop PCs. Already, some deep learning schemes have demonstrated significant advances in perception and recognition capabilities by using millions of exemplar datasets for learning. For example, a convolutional neural network (LeCun et al., 1998) can perform visual object classification with near-human-level performance by learning (Szegedy et al., 2015), as described previously in this subsection, and a speech recognition system provided a far better recognition rate on noisy speech signals of unspecified speakers than widely used, state-of-the-art commercial speech recognition systems (Hannun et al., 2014). The handwriting recognition system using long short-term memory by Doetsch and colleagues (2014) demonstrated almost human-equivalent recognition performance.
Such promising results seem to justify some optimism that artificially upscaling to human-like cognitive capabilities using these methods may not be so difficult. Optimists may say that these systems already exhibit near-human-level perceptual capabilities. Although this may be true for the recognition of a single perceptual modality, it is clear that a deep understanding of the world at a human level cannot be achieved by this alone. Such understanding requires associative integration among multiple modalities of perceptual flow, experienced through iterative interactions of the agent with the world.

Regardless, these and other recent advances in deep learning suggest that neurorobotics studies could be scaled significantly with the aforementioned large-scale brain network model, if massive training libraries are used alongside multimodal, high-dimensional perceptual flows: the pixel-level visual stream, like the one shown by Hwang et al. (2015) and briefly described in section 9.3; tactile sensation via hundreds of thousands of points of contact covering an entire skin surface; and likewise for auditory signals, olfactory organs, and so on. So empowered, with online experience actively associated with its own intentional interaction with the world, deep minds near the human level might appear as a consequence. When a robot becomes able to develop subjective, proactive self-images in huge numbers of dimensions alongside its own unique real-time perceptual flow as it interacts with the world, we may approach the reconstruction of real human minds! Attempts to scale neurorobots toward human-like beings are, of course, scientifically fascinating, and the developmental robotics community has already begun investigating this issue seriously (Kuniyoshi & Sangawa, 2006; Oudeyer et al., 2007; Asada et al., 2009; Metta et al., 2010; Asada, 2014; Cangelosi & Schlesinger, 2015; Ugur et al., 2015).
However, what is crucially missing from current models is general intelligence, by way of which various tasks across different domains can be completed by adaptively combining available cognitive resources through functions such as inference, induction, inhibition of habituation, imitation, improvisation, simulation, working memory retrieval, and planning, among many others. One amazing aspect of human competency is that we can perform a wide variety of tasks, like navigating, dancing, designing intricate structures, cleaning rooms, talking with others, painting pictures, deliberating over mathematical equations, and searching the Internet for information on neurorobotics, simply to name a few. Compared with this, what our robots can do is merely navigate a given workspace or manipulate simple objects. So, taking our work one stage further logically involves educating robots to perform tasks in multiple domains, toward multiple goals, with increasing degrees of complexity. Success in this endeavor should lead to a more general intelligence.
Toward this end, the crucial question becomes how to increase the amount of learning. This is not easy, however, because we cannot train robots simply by connecting them to the Internet or to a database. Robots must act on the physical environment to acquire their own experiences. So, researchers must provide a developmental educational environment wherein robots can be tutored every day for months, or possibly for years. And, as robots must be educated within various task domains, this environment is necessarily more complex than a long series of still photos.
In considering the developmental education of robots, an essential question still remaining is how humans, or artifacts like robots, can acquire structural representations of the world by learning through experience under the constraint of the poverty of the stimulus, as Noam Chomsky (1972) once asked. This is to ask how generalization in learning can be achieved, for example, in robots with a limited amount of tutoring experience. For this question, developmental robotics could provide a possible solution by using the concept of staged development considered by Piaget (1951). The expectation is that learning in one developmental stage can provide a prior for learning in the next stage, by which the dimensionality of the learning can be drastically reduced, and therefore generalization with a smaller amount of tutoring experience becomes possible. Based on this conception, developmental stages would proceed from the physically embodied level to more symbolic levels. Trials should require a lengthy period wherein physical interactions between robots and tutors involve scaffolding: guiding support provided by tutors that enables the bootstrapping of the cognitive and social skills required in the next stage (Metta et al., 2010). With scaffolding, higher level functions are entrained alongside foundational perceptual abilities during tutoring, and the robot's cognitive capacities develop from the grounding of simple sensory-motor skills to more complex compositional cognitive ones. It could be that the earlier stages require merely sensory-motor-level interaction with the environment, physically guided by tutors, whereas the later stages provide tutoring more in a demonstration and imitation style, without introducing physical guidance. The very final stage of education may require only the use of virtual environments (like learning from watching videos) or symbolically represented materials (like reading books). For the implementation of such staged tutoring and development of robots, research on the methods for the tutor or educator side may become equally important.
In the aforementioned developmental tutoring process, a robot should not be a passive learner. Rather, it should be an active learner that acts creatively in exploring the world, not merely repeating acquired skills or habits. For this purpose, robots should become authentic beings, as I have mentioned repeatedly, by reflecting seriously on their own pasts and by acting proactively toward their ownmost possibilities shared with the tutors. Tutoring interaction between such active-learner robots and human tutors will inevitably become highly intensive on occasion. To carry out long-term and sometimes intensive educational interactions, the development of emotions within the robot would be an indispensable aid. Although this issue has been neglected in this book, in order to take care of robots like children, human tutors would require emotional responses from the robots. Otherwise, many human tutors may not be able to continue cordial interactions with stone-cold, nonliving machines for such long periods. The development of adequate emotional responses should deepen the bonds between tutors and robots, by which long-term, affectively reinforced education would become possible. Minoru Asada has proposed so-called affective developmental robotics (Asada, 2015), in which he assumes multiple stages of emotional development, from simple to complex, including emotional contagion, emotional empathy, cognitive empathy, and sympathy. His crucial premise is that the development of emotion and that of embodied social interaction are codependent on each other. Consequently, the long-term educational processes of robots by human caregivers should be accompanied by these two codependent channels of development.
Finally, a difficult but important problem to be considered is whether artifacts can embody and express moral virtue. Aristotle says that moral virtues are not innate, but that they can be acquired through habitual practice. It is said that an individual becomes truthful by acting truthfully, or becomes unselfish by acting unselfishly. Simultaneously, human beings are motivated to do good for others because they share in the consequences of their actions by means of mirror neurons. The net effect is that, as one human seeks happiness for him- or herself, he or she experiences happiness in bringing happiness to others similarly embodied. In principle, robots can do the same by learning the effects of their own actions on the happiness expressed by others, reinforced through mirroring neural models. I would like to prove that robots can be developed or educated to acquire not only sophisticated cognitive competency but also moral virtue. Nowadays, robots may start to have free will, as I have postulated in this book. This means that those robots could also happen to generate bad behaviors toward others by their own wills. However, if the robots can learn about moral virtue, they would generate only good behaviors by inhibiting themselves from generating bad ones. Such robots would contribute to true happiness in a future human-robot coexisting society.
262

262 Exploring RoboticMinds

11.5 Summary

This final section reviews the whole book once more in order to provide concluding remarks.
This book sought to account for subjective experience, characterized on the one hand by the compositionality of higher-order cognition and on the other hand by fluid and spontaneous interaction with the outer world, through the examination of synthetic neurorobotics experiments conducted by the author. In essence, this is to inquire into the essential, dynamical nature of the mind. The book was organized into two parts, namely Part I, "On the Mind," and Part II, "Emergent Minds: Findings from Robotics Experiments." In Part I, the book reviewed how different questions about minds have been explored in different research fields, including cognitive science, phenomenology, brain science, psychology, and synthetic modeling. Part II started with new proposals for tackling open problems through neurorobotics experiments. We now look briefly at each chapter once again in order to summarize it.
Part I started with an introduction to cognitivism in chapter 2, emphasizing compositionality, considered to be a uniquely human competency whereby knowledge of the world is represented by utilizing symbols. Some representative cognitive models were introduced that address the issues of problem solving in problem spaces and the abstraction of information by using chunking and hierarchy. This chapter suggested, however, a potential difficulty in utilizing symbols internal to the mechanics of minds, especially in any attempt to ground symbols in real-time, online, sensory-motor reality and context.
Chapter 3, on phenomenology, introduced views on the mind from the other extreme, emphasizing direct or pure experience prior to its articulation into particular knowledge or symbols. The chapter covered the ideas of subjective time by Husserl, being-in-the-world by Heidegger, embodiment by Merleau-Ponty, and the stream of consciousness by James. By emphasizing the cycle of perception and action in the physical world via embodiment, we explored how philosophers have tackled the problem of the inseparable complex that is the subjective mind and the objective world. It was also shown that notions of consciousness and free will may be clarified through phenomenological analysis.
Chapter 4 attempted to explain how human brains can support cognitive mechanisms through a review of current knowledge in the field of neuroscience. To start with, we looked at a possible hierarchy in brains that supports complex visual recognition and action generation. We then considered the possibility that two cognitive functions, generating actions and recognizing perceptual reality, are just two sides of the same coin, by reviewing empirical studies on mirror neurons and the parietal cortices. This chapter also examined the issue of the origin of free will by reviewing the experimental study conducted by Libet (1985). Despite the recent accumulation of various experimental findings in neuroscience, the chapter concluded that it is not yet possible to attain a complete understanding of the neuronal mechanisms accounting for the cognitive functions of interest, due to conflicting evidence and the limitations inherent in experimental observation in neuroscience.
Chapter 5 introduced the dynamical systems approach for modeling embodied cognition in both natural and artificial systems. The chapter began with a tutorial on nonlinear dynamical systems. Building on this tutorial, the chapter described Gibsonian and neo-Gibsonian ideas in psychology that fit quite well with the dynamical systems framework, and it also explained how they have influenced the communities of behavior-based robotics and neurorobotics. Some representative neurorobotics studies were introduced, investigating how primitive behaviors can develop and be explained from the dynamical systems perspective.
Chapter 6, as the first chapter of Part II, proposed new paradigms for understanding cognitive minds by taking a synthetic approach utilizing neurorobotics experiments. First, the chapter postulated the potential difficulty of clarifying the essence of minds by pursuing only the bottom-up pathway emphasized by the behavior-based approach. It was then argued that what is missing are the top-down subjective intentions for acting on the objective world and their iterative interaction with the bottom-up perceptual reality. It was speculated that human-like capabilities for dealing with compositional language-thoughts, or even much simpler cognitive schemes, should emerge as the result of iterative interactions between these two pathways, top to bottom and bottom to top, rather than by one-way processes along the bottom-up pathway alone. It was furthermore speculated that a key to solving the so-called hard problem of consciousness and free will could be found on close examination of such interactions.
Based on the thoughts described in chapter 6, the new challenges discussed in chapters 7 through 10 concerned the reconstruction of various cognitive or psychological behaviors in a set of synthetic neurorobotics experiments. In these robotics studies, our research focus went back and forth between two fundamental issues. On the one hand, we explored how compositionality for cognition can be developed via iterative sensory-motor-level interactions of agents with their environments, and how these compositional representations can be grounded. On the other hand, we also examined the codependent relationship between the subjective mind and the objective world that emerges in their dense interaction, for the purpose of investigating the underlying structure of consciousness and free will.
In the first half of chapter 7, we investigated the development of compositionality by reviewing a robotics experiment on predictive navigation learning using a simple RNN model. The experimental results showed that the compositionality hidden in the topological trajectories in the obstacle environment can be extracted as embedded in a global attractor with fractal structure in the phase space of the RNN model. It was shown that the compositional representation developed in the RNN can be naturally grounded in the physical environment by allowing iterative interactions between the two in a shared metric space.
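The mechanism behind such fractal embedding can be illustrated compactly. The following sketch is mine, not the original navigation setup: the three branch symbols and corner coordinates are hypothetical stand-ins for decision points in the environment. It iterates a contractive map indexed by a symbol sequence, which is the textbook route by which symbolic combinatorics become a fractal point set in a metric state space.

```python
import numpy as np

# Minimal sketch (not the original robot experiment): a contractive map
# driven by a symbol sequence embeds the combinatorics of the sequence as
# a fractal point set in state space. The three "symbols" here are
# hypothetical branch labels (e.g., turns taken at obstacles).
corners = {0: np.array([0.0, 0.0]),
           1: np.array([1.0, 0.0]),
           2: np.array([0.5, 1.0])}

rng = np.random.default_rng(0)
x = np.array([0.5, 0.5])            # initial context state
points = []
for _ in range(20000):
    s = rng.integers(0, 3)          # next symbol in the experienced sequence
    x = 0.5 * x + 0.5 * corners[s]  # contractive update indexed by the symbol
    points.append(x.copy())

points = np.array(points)
# The point set approximates a Sierpinski gasket: states lie close together
# exactly when their recent symbol histories agree.
print(points[:5])
```

Two states end up near one another precisely when their recent symbol histories coincide, which is one concrete way to read the claim that compositional structure can live in, and be grounded through, a shared metric space.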
In the second half of chapter 7, on the other hand, we explored a sense of groundlessness (a sense of not being completely grounded) through the analysis of another navigation experiment. It was shown that the developmental learning process during exploration switched spontaneously between coherent phases and incoherent phases when chain reactions took place among the different cognitive processes of recognition, prediction, perception, learning, and acting. By referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the distinction between the two poles of the subjective mind and the objective world becomes explicit in breakdown, as shown in the incoherent phase whereby the self rises to conscious awareness. We drew the conclusion that an open dynamic structure characterized by self-organized criticality (SOC) can account for the underlying structure of consciousness, by way of which the momentary self appears spontaneously.
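For readers who want a concrete handle on SOC itself, the following is a minimal Bak-Tang-Wiesenfeld sandpile (after Bak et al., 1987), offered only as an illustration of what criticality means dynamically; it is not the robot model, and the grid size and threshold are the standard textbook choices.

```python
import numpy as np

# Minimal Bak-Tang-Wiesenfeld sandpile (Bak et al., 1987): long quiescent
# (coherent) stretches are punctuated by avalanches (incoherent phases)
# whose sizes show no characteristic scale.
N = 20
grid = np.zeros((N, N), dtype=int)
rng = np.random.default_rng(1)
avalanche_sizes = []

for _ in range(20000):
    i, j = rng.integers(0, N, size=2)
    grid[i, j] += 1                     # slow external driving: add one grain
    size = 0
    while True:
        unstable = np.argwhere(grid >= 4)
        if len(unstable) == 0:
            break
        for a, b in unstable:           # topple every over-threshold site
            grid[a, b] -= 4
            size += 1
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if 0 <= na < N and 0 <= nb < N:
                    grid[na, nb] += 1   # grains falling off the edge are lost
    avalanche_sizes.append(size)

# A log-log histogram of avalanche_sizes approximates a power law,
# the signature of a system poised at criticality.
print(max(avalanche_sizes))
```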
Chapter 8 introduced the RNNPB as a model of the mirror neurons that have been considered crucially responsible for the composition and decomposition of actions. The RNNPB can learn a set of behavior primitives for generation as well as for recognition by means of error minimization in a predictive coding framework. The RNNPB model was evaluated through a set of robotics experiments, including the learning of multiple movement patterns, an imitation game, and the associative learning of protolanguage and action, whereby the following characteristics emerged: (1) the model can recognize aspects of a continuous perceptual flow by segmenting it into a sequence of chunks or reusable primitives; (2) a set of actional concepts can be learned with generalization by developing relational structures among those concepts in the neural activation space, as shown in the experiment on associative learning between protolanguage and actions; and (3) the model can generate not only learned behavior patterns but also novel ones by means of twists or dimples generated in the manifold of the RNNPB due to the potential nonlinearity of the network.
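The recognition side of this scheme can be sketched in a few lines. The sketch below is a simplified illustration of the idea, not the published implementation: with the network weights frozen after learning, a small parametric-bias (PB) vector is adjusted by gradient descent so as to minimize the prediction error over an observed sequence. All weight shapes are hypothetical placeholders, and a numeric gradient is used for brevity.

```python
import numpy as np

def rnn_step(x, h, pb, W):
    """One step of a small RNN whose dynamics are modulated by a
    parametric-bias (PB) vector. The weights W are placeholders."""
    h = np.tanh(W["xh"] @ x + W["hh"] @ h + W["ph"] @ pb)
    return W["hy"] @ h, h

def prediction_error(seq, pb, W):
    """Mean squared one-step prediction error over an observed sequence."""
    h = np.zeros(W["hh"].shape[0])
    err = 0.0
    for t in range(len(seq) - 1):
        y, h = rnn_step(seq[t], h, pb, W)
        err += np.sum((y - seq[t + 1]) ** 2)
    return err / (len(seq) - 1)

def recognize(seq, W, dim_pb=2, lr=0.05, iters=100, eps=1e-4):
    """Recognition as inference: with the weights frozen, descend the
    error surface with respect to PB only (numeric gradient for brevity)."""
    pb = np.zeros(dim_pb)
    for _ in range(iters):
        grad = np.zeros(dim_pb)
        for k in range(dim_pb):
            d = np.zeros(dim_pb); d[k] = eps
            grad[k] = (prediction_error(seq, pb + d, W)
                       - prediction_error(seq, pb - d, W)) / (2 * eps)
        pb -= lr * grad
    return pb  # the PB value that best "explains" the observation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = {"xh": rng.normal(0, 0.5, (8, 3)),
         "hh": rng.normal(0, 0.5, (8, 8)),
         "ph": rng.normal(0, 0.5, (8, 2)),
         "hy": rng.normal(0, 0.5, (3, 8))}
    seq = np.sin(np.linspace(0, 6, 40))[:, None] * np.ones(3)
    print(recognize(seq, W))
```

Generation uses the same machinery in the opposite direction: a chosen PB value is clamped and the network unrolls the corresponding primitive.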
Chapter 9 addressed the issue of hierarchy in cognitive systems. For this purpose, we proposed a dynamic model, the MTRNN, that is characterized by its multiple timescales, and we examined how a functional hierarchy for action can be developed in the model through robotics experiments employing it. The results showed that a set of behavior primitives was developed in the fast-timescale network at the lower level, while the whole action plan that sequences the behavior primitives was developed in the slow-timescale network at the higher level. It was also found that the initial neural activation state in the slow-timescale network encoded the top-down actional intention that triggers the generation of a corresponding slow dynamics trajectory in the higher level, which in turn triggers the projection of an intended sequence of behavior primitives from the lower level of the network to the outer world. It was concluded that a sort of fluid compositionality for the smooth and flexible generation of actions was achieved in the proposed MTRNN model through the self-organization of a functional hierarchy, by adopting neuroscientifically plausible constraints including timescale differences among different local networks and structural connectivity among them as downward causation.
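The core architectural idea, leaky-integrator units with different time constants, can be written down directly. The sketch below is a minimal two-level continuous-time RNN with randomly chosen placeholder weights; in the actual experiments the weights are trained on sensory-motor sequences (e.g., with BPTT), and the sizes and time constants here are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fast, n_slow = 20, 5
tau_fast, tau_slow = 2.0, 50.0      # illustrative time constants

# Placeholder weights; trained in the real experiments.
W_ff = rng.normal(0, 0.3, (n_fast, n_fast))
W_fs = rng.normal(0, 0.3, (n_fast, n_slow))
W_sf = rng.normal(0, 0.3, (n_slow, n_fast))
W_ss = rng.normal(0, 0.3, (n_slow, n_slow))

u_fast = np.zeros(n_fast)
u_slow = rng.normal(0, 1, n_slow)   # initial slow state ~ top-down intention

for t in range(1000):
    x_fast, x_slow = np.tanh(u_fast), np.tanh(u_slow)
    # Leaky-integrator (CTRNN) update: du/dt = (-u + W x) / tau
    u_fast += (-u_fast + W_ff @ x_fast + W_fs @ x_slow) / tau_fast
    u_slow += (-u_slow + W_ss @ x_slow + W_sf @ x_fast) / tau_slow
# Because tau_slow >> tau_fast, the slow units drift gradually and act as
# a slowly varying context that sequences the faster primitive dynamics.
```

The design choice doing the work is simply the ratio of time constants: nothing labels one subnetwork "higher" except that it integrates over a longer horizon, which is why the functional hierarchy is said to self-organize.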
Chapter 10 considered two problems concerning free will. One involved its origin, and the other the conscious awareness of it. From the results of experiments employing the MTRNN model, I proposed that actional intention can be generated spontaneously by means of chaos in the higher cognitive brain areas. It was postulated that an intention or will developed unconsciously in the higher cognitive brain by chaos would come to conscious awareness only in a postdictive manner. More specifically, when a gap emerges between the top-down intention for acting and the bottom-up perception of reality, the intention may be noticed as the effort of minimizing this gap is exercised.
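As a toy illustration of how deterministic dynamics can yield apparently spontaneous switching, consider the logistic map coarse-grained into two hypothetical "intention states." This is a stand-in for the chaotic higher-level dynamics analyzed in the experiments, not the MTRNN itself.

```python
# Toy illustration of "spontaneous generation by chaos": the logistic map
# is fully deterministic, yet its trajectory visits regions in an
# effectively unpredictable order. Reading each region as a hypothetical
# intention state gives switching caused by internal dynamics rather than
# by external input or noise.
r, x = 3.97, 0.3
intentions = []
for t in range(200):
    x = r * x * (1.0 - x)                    # deterministic chaotic update
    intentions.append(0 if x < 0.5 else 1)   # coarse-grain into two states
print("".join(map(str, intentions[:60])))
```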
Furthermore, the chapter examined the circular causality developed among different cognitive processes in human–robot interactive tutoring experiments. It was conjectured that free will could exist in the subjective experience of the human experimenter as well as of the robot, each seeking its ownmost possibility in their conflictive interaction, when they feel as if whatever creative image for the next act could pop out freely in their minds. The robot as well as the human at such moments could be regarded as authentic beings.
Finally, some concluding remarks are in order. The argument presented here leads to the following:

1. The mind should emerge via intricate interactions between the top-down subjective view for proactively acting on the external world and the bottom-up recognition of the perceptual reality.
2. Structures and functions constituting mechanisms driving higher-order cognition, such as for compositional manipulations of symbols, concepts, or linguistic thoughts, may develop by means of the self-organization of neurodynamic structures through the aforementioned top-down and bottom-up interactions, aiming at the reduction of any apparent conflict between these two processing streams. It is presumed that such compositional cognitive processes embedded in neurodynamic attractors could be naturally grounded in the physical world, provided they share the same metric space for interaction.
3. Images or knowledge can be developed through multiple stages of learning from an agent's limited experiences. In the first stage, each instance of experience is acquired; in the second stage, generalized images or concepts are developed by extracting relational structures among the acquired instances in memory; in the third stage, novel or creative structures can be found in the memory developed with nonlinearity. Such a developmental process should take place in a large network consisting of the PFC, the parietal cortex, and the sensory-motor peripheral areas that are assumed to be the neocortical target of consolidative learning in humans or mammals.
4. However, the most crucial aspect of minds is the sense of groundlessness that arises from circular causality, understood in the end as the inseparability of subjectivity and the objective world. This understanding could shed light on the hard problem of consciousness and its relationship to the problem of free will through a unification of theoretical studies on the SOC of the holistic dynamics evolved and Heidegger's thoughts on authenticity.
5. The exploration of cognitive minds should continue in close dialogue between objective science and subjective experience (as suggested by Varela and others), to which synthetic approaches including cognitive, developmental, or neuronal robotics could contribute by providing effective research platforms.
Glossary for Abbreviations

BPTT  back-propagation through time
CPG  central pattern generator
CTRNN  continuous-time recurrent neural network
DOF  degree of freedom
EEG  electroencephalography
fMRI  functional magnetic resonance imaging
IPL  inferior parietal lobe
LGN  lateral geniculate nucleus
LIP  lateral intraparietal area
LSBN  large-scale brain network
LSTM  long short-term memory
LRP  lateralized readiness potential
M1  primary motor cortex
MST  medial superior temporal area
MSTNN  multiple spatiotemporal neural network
MT  middle temporal area
MTRNN  multiple timescale recurrent neural network
PB  parametric biases
PC  parietal cortex
PCA  principal component analysis
PFC  prefrontal cortex
PMC  premotor cortex
PMv  ventral premotor area
RNN  recurrent neural network

RNNPB  recurrent neural network with parametric biases
RP  readiness potential
SMA  supplementary motor area
SOC  self-organized criticality
STS  superior temporal sulcus
TEO  inferior temporal area
TPJ  temporoparietal junction
V1  primary visual cortex
VIP  ventral intraparietal area
VP  visuo-proprioceptive
References

Aihara, K., Takabe, T., & Toyoda, M. (1990). Chaotic neural networks. Physics Letters A, 144, 333–340.
Aristotle. (1907). De anima (R. D. Hicks, Trans.). Oxford: Oxford University Press.
St Amant, R., & Riedl, M. O. (2001). A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 55(1), 15–39.
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, 3, 299–307.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Andry, P., Gaussier, P., Moga, S., Banquet, J. P., & Nadel, J. (2001). Learning and communication via imitation: An autonomous robot perspective. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 31(5), 431–442.
Arbib, M. A. (1981). Perceptual structures and distributed motor control. In V. B. Brooks (Ed.), Handbook of physiology: The nervous system. II. Motor control (pp. 1448–1480). Bethesda, MD: American Physiological Society.
Arbib, M. (2010). Mirror system activity for action and language is embedded in the integration of dorsal and ventral pathways. Brain & Language, 112, 12–24.
Arbib, M. (2012). How the brain got language: The mirror system hypothesis. New York: Oxford University Press.
Arie, H., Endo, T., Arakaki, T., Sugano, S., & Tani, J. (2009). Creating novel goal-directed actions at criticality: A neuro-robotic experiment. New Mathematics and Natural Computation, 5(1), 307–334.
Arnold, L. (1995). Random dynamical systems. Berlin: Springer.
Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y., Ogino, M., & Yoshida, C. (2009). Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development, 1(1), 12–34.
Asada, M. (2015). Towards artificial empathy: How can artificial empathy follow the developmental pathway of natural empathy? International Journal of Social Robotics, 7(1), 19–33.
Bach, J. (2008). Principles of synthetic intelligence: Building blocks for an architecture of motivated cognition. New York: Oxford University Press.
Bach, K. (1987). Thought and reference. Oxford: Oxford University Press.
Badre, D., & D'Esposito, M. (2009). Is the rostro-caudal axis of the frontal lobe hierarchical? Nature Reviews Neuroscience, 10, 659–669.
Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality: An explanation of the 1/f noise. Physical Review Letters, 59, 381–384.
Baldwin, D., Andersson, A., Saffran, J., & Meyer, M. (2008). Segmenting dynamic human action via statistical structure. Cognition, 106, 1382–1407.
Balslev, D., Nielsen, F. A., Paulson, O. B., & Law, I. (2005). Right temporoparietal cortex activation during visuo-proprioceptive conflict. Cerebral Cortex, 15(2), 166–169.
Baraglia, J., Nagai, Y., & Asada, M. (in press). Emergence of altruistic behavior through the minimization of prediction error. IEEE Transactions on Cognitive and Developmental Systems.
Bassett, D. S., & Gazzaniga, M. S. (2011). Understanding complexity in the human brain. Trends in Cognitive Sciences, 15(5), 200–209.
Beer, R. D. (1995a). On the dynamics of small continuous-time recurrent neural networks. Adaptive Behavior, 3(4), 471–511.
Beer, R. D. (1995b). A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, 72(1–2), 173–215.
Beer, R. D. (2000). Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4(3), 91–99.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Billard, A. (2000). Learning motor skills by imitation: A biologically inspired robotic model. Cybernetics and Systems, 32, 155–193.
Blakemore, S.-J., & Sirigu, A. (2003). Action prediction in the cerebellum and in the parietal cortex. Experimental Brain Research, 153(2), 239–245.
Bor, D., & Seth, A. K. (2012). Consciousness and the prefrontal parietal network: Insights from attention, working memory, and chunking. Frontiers in Psychology, 3, 63.
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.
Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6, 3–15.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence Journal, 47, 139–159.
Campbell, D. T. (1974). "Downward causation" in hierarchically organized biological systems. In Studies in the philosophy of biology (pp. 179–186). Macmillan Education UK.
Cangelosi, A., & Schlesinger, M. (2015). Developmental robotics: From babies to robots. Cambridge, MA: MIT Press.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219.
Chomsky, N. (1972). Language and mind. New York: Harcourt Brace Jovanovich.
Choi, M., & Tani, J. (2016). Predictive coding for dynamic vision: Development of functional hierarchy in a multiple spatio-temporal scales RNN model. arXiv.org preprint arXiv:1606.01672.
Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.
Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G. S., Newsome, W. T., Clark, A. M., Hosseini, P., Scott, B. B., Bradley, D. C., Smith, M. A., Kohn, A., Movshon, J. A., Armstrong, K. M., Moore, T., Chang, S. W., Snyder, L. H., Lisberger, S. G., Priebe, N. J., Finn, I. M., Ferster, D., Ryu, S. I., Santhanam, G., Sahani, M., & Shenoy, K. V. (2010). Stimulus onset quenches neural variability: A widespread cortical phenomenon. Nature Neuroscience, 13(3), 369–378.
Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Nuyujukian, P., Foster, J. D., Ryu, S. I., & Shenoy, K. V. (2012). Structure of neural population dynamics during reaching. Nature, 487, 51–56.
Clark, A. (1998). Being there: Putting brain, body, and world together again. Cambridge, MA: MIT Press.
Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19.
Clark, A. (1999). An embodied cognitive science? Trends in Cognitive Sciences, 3(9), 345–351.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.
Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1, 372–381.
Cliff, D., Husbands, P., & Harvey, I. (1993). Explorations in evolutionary robotics. Adaptive Behavior, 2(1), 73–110.
Crutchfield, J. P., & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63, 105–108.
Dale, R., & Spivey, M. J. (2005). From apples and oranges to symbolic dynamics: A framework for conciliating notions of cognitive representation. Journal of Experimental & Theoretical Artificial Intelligence, 17(4), 317–342.
Delcomyn, F. (1980). Neural basis of rhythmic behavior in animals. Science, 210, 492–498.
Demiris, Y., & Hayes, G. (2002). Imitation as a dual-route process featuring predictive and learning components: A biologically plausible computational model. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 327–361). Cambridge, MA: MIT Press.
Dennett, D. (1993). Review of F. Varela, E. Thompson and E. Rosch (Eds.), The embodied mind. American Journal of Psychology, 106, 121–126.
Desmurget, M., & Grafton, S. (2000). Forward modeling allows feedback control for fast reaching movements. Trends in Cognitive Sciences, 4(11), 423–431.
Desmurget, M., Reilly, K. T., Richard, N., Szathmari, A., Mottolese, C., & Sirigu, A. (2009). Movement intention after parietal cortex stimulation in humans. Science, 324, 811–813.
Devaney, R. L. (1989). An introduction to chaotic dynamical systems (Vol. 6). Reading, MA: Addison-Wesley.
Diamond, A. (1991). Neuropsychological insights into the meaning of object concept development. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and knowledge (pp. 67–110). Hillsdale, NJ: Erlbaum.
Di Paolo, E. A. (2000). Behavioral coordination, structural congruence and entrainment in a simulation of acoustically coupled agents. Adaptive Behavior, 8(1), 27–48.
Doetsch, P., Kozielski, M., & Ney, H. (2014). Fast and robust training of recurrent neural networks for offline handwriting recognition. In IEEE 14th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 279–284).
Downar, J., Crawley, A. P., Mikulis, D. J., & Davis, K. D. (2000). A multimodal cortical network for the detection of changes in the sensory environment. Nature Neuroscience, 3(3), 277–283.
Doya, K., & Uchibe, E. (2005). The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior, 13(2), 149–160.
Doya, K., & Yoshizawa, S. (1989). Memorizing oscillatory patterns in the analog neuron network. Proceedings of the 1989 International Joint Conference on Neural Networks, I, 27–32.
Dreyfus, H. L., & Dreyfus, S. E. (1988). Making a mind versus modeling the brain: Artificial intelligence back at a branch point. Daedalus, 117(1), 15–43.
Dreyfus, H. L. (1991). Being-in-the-world: A commentary on Heidegger's Being and Time. Cambridge, MA: MIT Press.
Du, J., & Poo, M. (2004). Rapid BDNF-induced retrograde synaptic modification in a developing retinotectal system. Nature, 429, 878–883.
Eagleman, D. M., & Sejnowski, T. J. (2000). Motion integration and postdiction in visual awareness. Science, 287(5460), 2036–2038.
Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books, Inc.
Ehrsson, H., Fagergren, A., Johansson, R., & Forssberg, H. (2003). Evidence for the involvement of the posterior parietal cortex in coordination of fingertip forces for grasp stability in manipulation. Journal of Neurophysiology, 90, 2978–2986.
Eliasmith, C. (2014). How to build a brain: A neural architecture for biological cognition. New York: Oxford University Press.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3), 195–225.
Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2008). Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot. The International Journal of Robotics Research, 27(2), 213–228.
Eskandar, E., & Assad, J. (1999). Dissociation of visual, motor and predictive signals in parietal cortex during visual guidance. Nature Neuroscience, 2, 88–93.
Evans, G. (1982). The varieties of reference. Oxford: Clarendon Press.
Fitzsimonds, R., Song, H., & Poo, M. (1997). Propagation of activity-dependent synaptic depression in simple neural networks. Nature, 388, 439–448.
Fleischer, J., Gally, J., Edelman, J., & Krichmar, J. (2007). Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device. Proceedings of the National Academy of Sciences of the USA, 104(9), 3556–3561.
Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critique. Cognition, 28, 3–71.
Fogassi, L., Ferrari, P., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005). Parietal lobe: From action organization to intention understanding. Science, 308, 662–667.
Freeman, W. (2000). How brains make up their minds? New York: Columbia University Press.
Fried, I., Katz, A., McCarthy, G., Sass, K. J., Williamson, P., Spencer, S. S., & Spencer, D. D. (1991). Functional organization of human supplementary motor cortex studied by electrical stimulation. Journal of Neuroscience, 11, 3656–3666.
Friston, K. (1998). The disconnection hypothesis. Schizophrenia Research, 30(2), 115–125.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.
Frith, C. D., & Frith, U. (2012). Mechanisms of social cognition. Annual Review of Psychology, 63, 287–313.
Fukushima, Y., Tsukada, M., Tsuda, I., Yamaguti, Y., & Kuroda, S. (2007). Spatial clustering property and its self-similarity in membrane potentials of hippocampal CA1 pyramidal neurons for a spatio-temporal input sequence. Cognitive Neurodynamics, 1, 305–316.
Gallagher, S. (2000). Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences, 4(1), 14–21.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2, 493–501.
Gaussier, P., Moga, S., Quoy, M., & Banquet, J. P. (1998). From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence, 12(7–8), 701–727.
Georgopoulos, A. P., Kalaska, J. F., Caminiti, R., & Massey, J. T. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. The Journal of Neuroscience, 2, 1527–1537.
Gershkoff-Stowe, L., & Thelen, E. (2004). U-shaped changes in behavior: A dynamic systems perspective. Journal of Cognition and Development, 5, 11–36.
Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development. New York: Oxford University Press.
Gibson, J. J. (1986). The ecological approach to visual perception. Boston: Houghton Mifflin.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349(6305), 154–156.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv.org preprint arXiv:1410.5401.
Graziano, M., Taylor, C., & Moore, T. (2002). Complex movements evoked by microstimulation of precentral cortex. Neuron, 34, 841–851.
Gunji, Y., & Konno, N. (1991). Artificial life with autonomously emerging boundaries. Applied Mathematics and Computation, 43, 271–298.
Haggard, P. (2008). Human volition: Towards a neuroscience of will. Nature Reviews Neuroscience, 9(12), 934–946.
Haken, H. (1983). Advanced synergetics. Berlin: Springer.
Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., & Ng, A. Y. (2014). DeepSpeech: Scaling up end-to-end speech recognition. arXiv.org preprint arXiv:1412.5567.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.
Harnad, S. (1992). Connecting object to symbol in modeling cognition. In A. Clarke & R. Lutz (Eds.), Connectionism in context. Berlin: Springer Verlag.
Haruno, M., Wolpert, D. M., & Kawato, M. (2003). Hierarchical MOSAIC for movement generation. In International congress series (Vol. 1250, pp. 575–590). Amsterdam: Elsevier.
Harris, K. (2008). Stability of the fittest: Organizing learning through retroaxonal signals. Trends in Neurosciences, 31(3), 130–136.
Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. The Journal of Neuroscience, 28(10), 2539–2550.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41(2), 301–307.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569–1579.
Heidegger, M. (1962). Being and time (J. Macquarrie & E. Robinson, Trans.). London: SCM Press.
Molesworth, W. (1841). The English works of Thomas Hobbes (Vol. 5). London: J. Bohn.
Hinton, G., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Husserl, E. (1964). The phenomenology of internal time consciousness (J. S. Churchill, Trans.). Bloomington, IN: Indiana University Press.
Husserl, E. (1970). Logical investigations (Vol. 1). London: Routledge & Kegan Paul Ltd.
Husserl, E. (2002). Studien zur Arithmetik und Geometrie. New York: Springer-Verlag.
Hyvärinen, J., & Poranen, A. (1974). Function of the parietal associative area 7 as revealed from cellular discharges in alert monkeys. Brain, 97, 673–692.
Hwang, J., Jung, M., Madapana, N., Kim, J., Choi, M., & Tani, J. (2015). Achieving synergy in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination. In Proceedings of 2015 IEEE-RAS 15th International Conference on Humanoid Robots (pp. 817–824).
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of imitation. Science, 286, 2526–2528.
Ijspeert, A. J. (2001). A connectionist central pattern generator for the aquatic and terrestrial gaits of a simulated salamander. Biological Cybernetics, 84, 331–348.
Ikeda, K., Otsuka, K., & Matsumoto, K. (1989). Maxwell-Bloch turbulence. Progress of Theoretical Physics, 99, 295–324.
Ikegami, T., & Iizuka, H. (2007). Turn-taking interaction as a cooperative and co-creative process. Infant Behavior and Development, 30(2), 278–288.
Ikegami, T. (2013). A design for living technology: Experiments with the mind time machine. Artificial Life, 19(3–4), 387–400.
Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., et al. (2004). Synfire chains and cortical songs: Temporal modules of cortical activity. Science, 304, 559–564.
Iriki, A., Tanaka, M., & Iwamura, Y. (1996). Coding of modified body schema during tool use by macaque postcentral neurones. Neuroreport, 7(14), 2325–2330.
Ito, M. (1970). Neurophysiological basis of the cerebellar motor control system. International Journal of Neurology, 7, 162–176.
Ito, M. (2005). Bases and implications of learning in the cerebellum: Adaptive control and internal model mechanism. Progress in Brain Research, 148, 95–109.
Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system. Adaptive Behavior, 12(2), 93–115.
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless telecommunication. Science, 304, 78–80.
Jaeger, H., Lukoševičius, M., Popovici, D., & Siewert, U. (2007). Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 20(3), 335–352.
James, W. (1884). The dilemma of determinism. Unitarian Review (Vol. XXII, p. 193). Reprinted (1956) in The will to believe (p. 145). Mineola, NY: Dover Publications.
James, W. (1892). The stream of consciousness. Cleveland, OH: World.
James, W. (1918). The principles of psychology (Vol. 1). New York, NY: Henry Holt.
Jeannerod, M. (1994). The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17, 187–202.
Johnson-Pynn, J., Fragaszy, D. M., Hirsh, E. M., Brakke, K. E., & Greenfield, P. M. (1999). Strategies used to combine seriated cups by chimpanzees (Pan troglodytes), bonobos (Pan paniscus), and capuchins (Cebus apella). Journal of Comparative Psychology, 113(2), 137–148.
Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (pp. 531–546). Hillsdale, NJ: Erlbaum.
Jung, M., Hwang, J., & Tani, J. (2015). Self-organization of spatio-temporal hierarchy via learning of dynamic visual image patterns on action sequences. PLoS One, 10(7), e0131214.
Kaneko, K. (1990). Clustering, coding, switching, hierarchical ordering and control in a network of chaotic elements. Physica D, 41, 137–172.
Karmiloff-Smith, A. (1992). Beyond modularity: A developmental perspective on cognitive science. Cambridge, MA: MIT Press.
Kawato, M. (1990). Computational schemes and neural network models for formation and control of multijoint arm trajectory. In T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 197–228). Cambridge, MA: MIT Press.
Kelso, S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kiebel, S., Daunizeau, J., & Friston, K. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4, e1000209.
Kimura, H., Akiyama, S., & Sakurama, K. (1999). Realization of dynamic walking and running of the quadruped using neural oscillator. Autonomous Robots, 7(3), 247–258.
Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83, B35–B42.
Klahr, D., Chase, W. G., & Lovelace, E. A. (1983). Structure and process in alphabetic retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(3), 462.
Kolen, J. F. (1994). Exploring computational complexity of recurrent neural networks. (PhD thesis, The Ohio State University).
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37(2), 333–346.
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
Krichmar, J. L., & Edelman, G. M. (2002). Machine psychology: Autonomous behavior, perceptual categorization and conditioning in a brain-based device. Cerebral Cortex, 12, 818–830.
Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10, 799–822.
Kuniyoshi, Y., Ohmura, Y., Terada, K., Nagakubo, A., Eitoku, S. I., & Yamamoto, T. (2004). Embodied basis of invariant features in execution and perception of whole-body dynamic actions: Knacks and focuses of Roll-and-Rise motion. Robotics and Autonomous Systems, 48(4), 189–201.
Kuniyoshi, Y., & Sangawa, S. (2006). Early motor development from partially ordered neural-body dynamics: Experiments with a cortico-spinal-musculo-skeletal model. Biological Cybernetics, 95, 589–605.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1–64.
Laird, J. E. (2008). Extending the Soar cognitive architecture. Frontiers in Artificial Intelligence and Applications, 171, 224.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Li, W., Piëch, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron, 50(6), 951–962.
Libet, B. (1985). Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences, 8, 529–539.
Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes memorized movement sequences. Neuron, 45, 967–973.
Luria, A. (1973). The working brain. London: Penguin Books Ltd.
McCarthy, J. (1963). Situations, actions and causal laws. Stanford Artificial Intelligence Project, Memo 2. Stanford University.
Markov, A. (1971). Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Dynamic Probabilistic Systems, 1, 552–577.
Markram, H., Muller, E., Ramaswamy, S., Reimann, M. W., Abdellah, M., Sanchez, C. A., & Kahou, G. A. A. (2015). Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2), 456–492.
Matarić, M. (1992). Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8(3), 304–312.
Matsuno, K. (1989). Physical basis of biology. Boca Raton, FL: CRC Press.
Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition. Netherlands: Springer.
May, R. M. (1976). Simple mathematical models with very complicated dynamics. Nature, 261(5560), 459–467.
Meeden, L. (1996). An incremental approach to developing intelligent neural network controllers for robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 474–485.
Merleau-Ponty, M. (1962). Phenomenology of perception (C. Smith, Trans.). London: Routledge & Kegan Paul Ltd.
Merleau-Ponty, M. (1968). The visible and the invisible: Followed by working notes (Studies in phenomenology and existential philosophy). Evanston, IL: Northwestern University Press.
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 75–78.
Meltzoff, A. N. (2005). Imitation and other minds: The "like me" hypothesis. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From cognitive neuroscience to social science (pp. 55–77). Cambridge, MA: MIT Press.
Metta, G., Natale, L., Nori, F., Sandini, G., Vernon, D., Fadiga, L., et al. (2010). The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks, 23(8–9), 1125–1134.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.
Morimoto, J., & Doya, K. (2001). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36(1), 37–51.
Mormann, F., Kornblith, S., Quiroga, R. Q., Kraskov, A., Cerf, M., Fried, I., & Koch, C. (2008). Latency and selectivity of single neurons indicate hierarchical processing in the human medial temporal lobe. Journal of Neuroscience, 28, 8865–8872.
Mulliken, G. H., Musallam, S., & Andersen, R. A. (2008). Forward estimation of movement state in posterior parietal cortex. Proceedings of the National Academy of Sciences of the USA, 105(24), 8170–8177.
Murata, S., Yamashita, Y., Arie, H., Ogata, T., Sugano, S., & Tani, J. (2015). Learning to perceive the world as probabilistic or deterministic via interaction with others: A neuro-robotics experiment. IEEE Transactions on Neural Networks and Learning Systems [2015 Nov 18; epub ahead of print]. doi:10.1109/TNNLS.2015.2492140
Mushiake, H., Inase, M., & Tanji, J. (1991). Neuronal activity in the primate premotor, supplementary, and precentral motor cortex during visually guided and internally determined sequential movements. Journal of Neurophysiology, 66(3), 705–718.
Nadel, J. (2002). Imitation and imitation recognition: Functional use in preverbal infants and nonverbal children with autism. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases (pp. 42–62). Cambridge University Press.
Nagai, Y., & Asada, M. (2015). Predictive learning of sensorimotor information as a key for cognitive development. In Proceedings of the IROS 2015 Workshop on Sensorimotor Contingencies for Robotics. Osaka, Japan.
Namikawa, J., Nishimoto, R., & Tani, J. (2011). A neurodynamic account of spontaneous behavior. PLoS Computational Biology, 7(10), e1002221.
Newell, A., & Simon, H. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Newell, A., & Simon, H. A. (1975). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3), 113–126.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Nicolis, G., & Prigogine, I. (1977). Self-organization in nonequilibrium systems. New York: Wiley.
Nishida, K. (1990). An inquiry into the good (M. Abe & C. Ives, Trans.). New Haven: Yale University Press.
Nishimoto, R., & Tani, J. (2009). Development of hierarchical structures for actions and motor imagery: A constructivist view from synthetic neurorobotics study. Psychological Research, 73, 545–558.
Nolfi, S., & Floreano, D. (2000). Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines. Cambridge, MA: MIT Press.
Nolfi, S., & Floreano, D. (2002). Synthesis of autonomous robots through artificial evolution. Trends in Cognitive Sciences, 6(1), 31–37.
Ogai, Y., & Ikegami, T. (2008). Microslip as a simulated artificial mind. Adaptive Behavior, 16(2–3), 129–147.
Ogata, T., Hattori, Y., Kozima, H., Komatani, K., & Okuno, H. G. (2006). Generation of robot motions from environmental sounds using intermodality mapping by RNNPB. In Sixth International Workshop on Epigenetic Robotics, Paris, France.
Ogata, T., Yokoya, R., Tani, J., Komatani, K., & Okuno, H. G. (2009). Prediction and imitation of other's motions by reusing own forward-inverse model in robots. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (pp. 4144–4149). Kobe, Japan.
O'Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral & Brain Sciences, 24, 939–1031.
Oudeyer, P. Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.
Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks, 19(3), 254–271.
Paine, R. W., & Tani, J. (2005). How hierarchical control self-organizes in artificial adaptive systems. Adaptive Behavior, 13(3), 211–225.
Park, G., & Tani, J. (2015). Development of compositional and contextual communicable congruence in robots by using dynamic neural network models. Neural Networks, 72, 109–122.
Pepperberg, I. M., & Shive, H. R. (2001). Simultaneous development of vocal and physical object combinations by a Grey parrot (Psittacus erithacus): Bottle caps, lids, and labels. Journal of Comparative Psychology, 115(4), 376–384.
Perry, W., & Braff, D. L. (1994). Information-processing deficits and thought disorder. American Journal of Psychiatry, 151(3), 363–367.
Pfeifer, R., & Bongard, J. (2006). How the body shapes the way we think: A new view of intelligence. Cambridge, MA: MIT Press.
Piaget, J. (1951). The child's conception of the world. Rowman & Littlefield.
Piaget, J. (1962). Play, dreams, and imitation in childhood (G. Gattegno & F. M. Hodgson, Trans.). New York: Norton.
Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6, 576–582.
Ramachandran, V. S., & Blakeslee, S. (1998). Phantoms in the brain: Probing the mysteries of the human mind. New York: William Morrow.
Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Ritter, F. E., Baxter, G. D., Jones, G., & Young, R. M. (2000). Supporting cognitive models as users. ACM Transactions on Computer-Human Interaction, 7(2), 141–173.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Review Neuroscience, 2, 661–670.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rosander, R., & von Hofsten, C. (2004). Infants' emerging ability to represent object motion. Cognition, 91, 1–22.
Rössler, O. E. (1976). An equation for continuous chaos. Physics Letters, 57A(5), 397–398.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Sakata, H., Taira, M., Murata, A., & Mine, S. (1995). Neural mechanisms of visual guidance of hand action in the parietal cortex of the monkey. Cerebral Cortex, 5(5), 429–438.
Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3, 233–242.
Scheier, C., Pfeifer, R., & Kuniyoshi, Y. (1998). Embedded neural networks: Exploiting constraints. Neural Networks, 11, 1551–1569.
Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.
Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239, 1513–1520.
Schöner, G., & Thelen, E. (2006). Using dynamic field theory to rethink infant habituation. Psychological Review, 113(2), 273–299.
Shanahan, M. (2006). A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15(2), 433–449.
Shibata, K., & Okabe, Y. (1997). Reinforcement learning when visual sensory signals are directly given as inputs. In Proceedings of IEEE International Conference on Neural Networks (Vol. 3, pp. 1716–1720).
Shima, K., & Tanji, J. (1998). Both supplementary and presupplementary motor areas are crucial for the temporal organization of multiple movements. Journal of Neurophysiology, 80, 3247–3260.
Shima, K., & Tanji, J. (2000). Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. Journal of Neurophysiology, 84, 2148–2160.
Shimojo, S. (2014). Postdiction: Its implications on visual awareness, hindsight, and sense of agency. Frontiers in Psychology, 5, 196.
Siegelmann, H. T. (1995). Computation beyond the Turing limit. Science, 268(5210), 545–548.
Simon, H. A. (1981). The sciences of the artificial (2nd ed.). Cambridge, MA: MIT Press.
Sirigu, A., Daprati, E., Ciancia, S., Giraux, P., Nighoghossian, N., Posada, A., & Haggard, P. (2003). Altered awareness of voluntary action after damage to the parietal cortex. Nature Neuroscience, 7, 80–84.
Sirigu, A., Duhamel, J. R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental representation of hand movements after parietal cortex damage. Science, 273(5281), 1564–1568.
Smith, L., & Thelen, E. (2003). Development as a dynamic system. Trends in Cognitive Sciences, 7(8), 343–348.
Soon, C., Brass, M., Heinze, H., & Haynes, J. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11, 543–545.
Spencer-Brown, G. (1969). Laws of form. Wales, UK: George Allen and Unwin Ltd.
Spivey, M. (2007). The continuity of mind. New York: Oxford University Press.
Sporns, O. (2010). Networks of the brain. Cambridge, MA: MIT Press.
Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A neurobiological perspective. Current Opinion in Neurobiology, 5, 169–177.
Steil, J. J., Röthling, F., Haschke, R., & Ritter, H. (2004). Situated robot learning for multi-modal instruction and imitation of grasping. Robotics and Autonomous Systems, 47(2), 129–141.
Sugita, Y., & Tani, J. (2005). Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, 13(1), 33–52.
Sun, R. (2016). Anatomy of mind: Exploring psychological mechanisms and processes with the CLARION cognitive architecture. New York: Oxford University Press.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Taga, G., Yamaguchi, Y., & Shimizu, H. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environments. Biological Cybernetics, 65, 147–159.
Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science, 262, 685–688.
Tani, J. (1996). Model-based learning for mobile robot navigation from the dynamical systems perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 421–436.
Tani, J. (1998). An interpretation of the "self" from the dynamical systems perspective: A constructivist approach. Journal of Consciousness Studies, 5(5–6), 516–542.
Tani, J. (2003). Learning to generate articulated behavior through the bottom-up and the top-down interaction process. Neural Networks, 16, 11–23.
Tani, J. (2004). The dynamical systems accounts for phenomenology of immanent time: An interpretation by revisiting a robotics synthetic study. Journal of Consciousness Studies, 11(9), 5–24.
Tani, J. (2009). Autonomy of "self" at criticality: The perspective from synthetic neuro-robotics. Adaptive Behavior, 17(5), 421–443.
Tani, J. (2014). Self-organization and compositionality in cognitive brains: A neurorobotics study. Proceedings of the IEEE, 102(4), 586–605.
Tani, J., Friston, K., & Haykin, S. (2014). Self-organization and compositionality in cognitive brains [Further thoughts]. Proceedings of the IEEE, 102(4), 606–607.
Tani, J., & Fukumura, N. (1997). Self-organizing internal representation in learning of navigation: A physical experiment by the mobile robot YAMABICO. Neural Networks, 10(1), 153–159.
Tani, J., & Fukumura, N. (1993). Learning goal-directed navigation as attractor dynamics for a sensory motor system (An experiment by the mobile robot YAMABICO). Proceedings of the 1993 International Joint Conference on Neural Networks (pp. 1747–1752).
Tani, J., & Fukumura, N. (1995). Embedding a grammatical description in deterministic chaos: An experiment in recurrent neural learning. Biological Cybernetics, 72(4), 365–370.
Tani, J., & Nolfi, S. (1997). Self-organization of modules and their hierarchy in robot learning problems: A dynamical systems approach. System Analysis for Higher Brain Function Research Project News Letter, 2(4), 1–11.
Tani, J., & Nolfi, S. (1999). Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Networks, 12(7), 1131–1141.
Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks, 17, 1273–1289.
Tani, T. (1998). The physics of consciousness. Tokyo: Keiso-shobo.
Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., & Asoh, H. (2016). Symbol emergence in robotics: A survey. Advanced Robotics. doi:10.1080/01691864.2016.1164622
Tanji, J., & Shima, K. (1994). Role for supplementary motor area cells in planning several movements ahead. Nature, 371, 413–416.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17(2), 273–281.
Thelen, E., & Smith, L. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.
Tokimoto, N., & Okanoya, K. (2004). Spontaneous construction of "Chinese boxes" by degus (Octodon degu): A rudiment of recursive intelligence? Japanese Psychological Research, 46, 255–261.
Tomasello, M. (2009). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tononi, G. (2008). Consciousness as integrated information: A provisional manifesto. The Biological Bulletin, 215(3), 216–242.
Trevena, J. A., & Miller, J. (2002). Cortical movement preparation before and after a conscious decision to move. Consciousness and Cognition, 11(2), 162–190.
Tsuda, I., Körner, E., & Shimizu, H. (1987). Memory dynamics in asynchronous neural networks. Progress of Theoretical Physics, 78, 51–71.
Tsuda, I. (2001). Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behavioral and Brain Sciences, 24(5), 793–810.
Uddén, J., & Bahlmann, J. (2012). A rostro-caudal gradient of structured sequence processing in the left inferior frontal gyrus. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 367, 2023–2032.
Ueda, S. (1994). Experience and Awareness: Exploring Nishida Philosophy


(English Translation from Japanese). Tokyo, Japan:Iwanami Shoten.
Keiken to jikakuNishida tetsugaku no basho wo motomete, Iwanami-shoten.
Ugur, E., Nagai, Y., Sahin, E., & Oztop, E. (2015). Staged development of
robot skills: Behavior formation, affordance learning and imitation with
motionese. IEEE Transactions on Autonomous Mental Development, 7(2),
119139.
Van de Cruys, S., Evers, K., Van der Hallen, R., Van Eylen, L., Boets, B., de-
Wit, L., & Wagemans, J. (2014). Precise minds in uncertain worlds:predic-
tive coding in autism. Psychological Review, 121(4),649.
Varela, F. J., Thompson, E. T., & Rosch, E. (1991). The embodied mind:Cognitive
science and human experience. Cambridge, MA:MITPress.
Varela, F. J. (1996). Neurophenomenology:Amethodological remedy to the
hard problem. Journal of Consciousness Studies, 3, 330350.
Varela, F. J. (1999). Present- t ime consciousness. Journal of Consciousness
Studies, 6 (2-3), 111140.
von Hofsten, C., & Rnnqvist, L. (1988). Preparation for grasping an
object:Adevelopmental study. Journal of Experimental Psychology:Human
Perception and Performance, 14, 610621.
Werbos, P. (1974). Beyond regression:New tools for prediction and analysis in
the behavioral sciences. (PhD thesis, Harvard University).
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
White, J. (2016). Simulation, self-extinction, and philosophy in the service of human civilization. AI & Society, 31(2), 171–190.
Williams, B. (2014). Descartes: The project of pure enquiry. London and New York: Routledge.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270–280.
Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.
Wolpert, D., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11, 1317–1329.
Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: A humanoid robot experiment. PLoS Computational Biology, 4(11), e1000220.
Yamashita, Y., & Tani, J. (2012). Spontaneous prediction error generation in schizophrenia. PLoS ONE, 7(5), e37843.
Yen, S. C., Baker, J., & Gray, C. M. (2007). Heterogeneity in the responses of adjacent neurons to natural stimuli in cat striate cortex. Journal of Neurophysiology, 97, 1326–1341.
Zhong, J., Cangelosi, A., & Wermter, S. (2014). Toward a self-organizing pre-symbolic neural model representing sensorimotor primitives. Frontiers in Behavioral Neuroscience, 7, 22.
Ziemke, T., & Thieme, M. (2002). Neuromodulation of reactive sensorimotor mappings as a short-term memory mechanism in delayed response tasks. Adaptive Behavior, 10(3/4), 185–199.
Index

Note: Page numbers followed by f and t denote figures and tables, respectively.

absolute flow level, 28, 39
abstract sequences, SMA encoding, 52–53
accidental generations with spontaneous variation, 225–26
actional consequences, predictive learning from, 151–73
action generation, 197–98
  brain, 44–68
  hierarchical mechanisms for, 50f, 52f
  through hierarchy, 49–50, 50f
  perception's role in, 55
  sensory-motor flow mirroring, 175–98, 178f
action-related words, 67
actions. See also complex actions; goal-directed actions; intentions
  categories, 196
  conscious decision preceded by, 237
  external inputs perturbing, 215
  free will for, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f
  functional hierarchy developed for, 199–218, 200f, 201f, 203f, 265
  intransitive, 65
  language bound to, 190–96, 192f
  neurodynamics generating tasks of, 213–15, 214f
  parietal cortex meeting of, 56–61, 59f
  perceptual reality changed by, 60–61
  primitives, 145–48, 147f
  recognition's circular causality with, 149
  subjective mind influenced by, 49
  training influencing, 215
  transitive, 65
  unconscious generation of, 215
  as voluntary, 39
action sequences, 15, 219–20. See also chunking
  as compositional, 229–30, 252
  MTRNNs generating, 227–30, 228f, 229f
  as novel, 227–30, 229f
active learner, 143
active perception, 130–31
Act-R, 14
affective developmental robotics, 261
affordance, 93–94. See also Gibsonian approach
agent, 31–32
alien hand syndrome, 50–51
alternative images, 71
alternative thoughts, 71
Amari, Shun-ichi, 113. See also error back-propagation scheme
ambiguity, 63–64
animals, recursion-like behaviors exhibited by, 10–11
A-not-B task, 98–100, 99f
appearance, 24–25, 24f
Arbib, Michael, 9–10, 58, 66, 175, 191
Arie, H., 228f
Aristotle, 4, 261
arm robot, 131–32, 131f
artificial evolution, 126–28
Asada, Minoru, 261
Ashe, J., 52–53
attractive object, 98–100, 99f
attractors, 91–97. See also behavior attractor; limit cycle attractors
  as global, 158–59, 158f
  invariant set, 84, 158–59, 158f
  Rössler, 90, 158
  types of, 84–85, 85f, 91f
attunement, 39
authentic agent, 31–32
authentic being, 31–32, 143, 148, 171–72
authenticity, 31–32, 237–39, 238f, 267
autism, 256, 257–58
autorecovery, 155, 157, 160

back-propagation through time (BPTT), 116f
Badre, D., 206–7, 207f
Bahlmann, J., 206–7, 207f
Bak, P., 171
Ballard, Dona, 48, 60
Beer, Randall, 121, 126f, 127–28, 128f
behavior attractor, 130–31
behavior-based approach, 37
behavior-based robotics, 82, 103–9, 104f, 105f, 107f
behavior primitives, 10, 13. See also chunks
  compositions, 200–202, 200f, 201f
  functional hierarchy development, 203f, 205–6
  localist scheme, 200–201, 200f
  MTRNN, 204–6
  PB vector value assigned to, 201–2, 201f
behaviors. See also skilled behaviors
  distributed representation embedding of, 180–82, 181f
  as imitative, 66, 100–102, 102f
  model, 191–96
  as reactive, 141–42
  as spontaneous, 219–30
Being and Time (Heidegger), 30
being-in-the-world, 29–32, 34
beings, 22, 24–25
  as authentic, 31–32, 143, 148, 171–72
  of equipment, 30–31
  as inauthentic, 142–43
  man reflecting on, 31
  meaning of, 30
bimodal neurons, 53–54, 56–57, 208
Blakemore, S.-J., 58
Blakeslee, S., 63
blind man, 33–34, 61
Blue Brain project, 255
bodies, 33–34, 59–60, 59f. See also Cartesian dualism
bonobos, 11
bottom-up error regression, MTRNN, 207–8, 207f, 215
bottom-up pathway, 63–64, 164–65, 205, 263
bottom-up recognition, 60–61, 197–98, 266
bound learning, 191–96, 192f, 193f, 195f
boys, 119, 119f, 120
BPTT. See back-propagation through time
brains. See also neural network models; specific brain structures
  action generation, 44–68, 65f
  brain science and, 43–79, 65f, 70f, 73f
  chemical plant as, 4–5, 6
  cognitive competencies hosted by, 10
  dynamics in, 40–41
  FLN component in, 11
  hierarchical mechanisms, 44–54, 47f, 50f, 52f
  human language-ready, 66–67, 191
  intention adjustment mechanisms employed by, 60–61
  mind originating in, 4–6
  models, 81–83, 109–12
  outcomes monitored by, 60–61
  overview, 43–79
  recognition in, 55–68, 59f, 62f, 65f
  spatio-temporal hierarchy of, 217
  symbols in, 245–47
  two-stage model mechanized by, 40–41
  visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
brain science, 262–67
  brain and, 43–79
  future directions of, 255–61
  on linguistic competency, 190–91
  MTRNN correspondence, 206–8, 207f
Braitenberg, Valentino, 103–6, 107, 108–9
branching
  overview, 132–34, 133f
  Yamabico, 152–60, 153f
Brecker, Michael, 252–53
Broca's area, 191, 196
Brooks, Rodney, 103, 106, 107–8, 107f, 125, 145
Buddhist meditation, 254–55
button press trial, 69–71, 70f

calculus of indications, 18–19
Cantor set, 158–60, 158f
carpenter, 30–31, 42, 248, 264
Cartesian dualism, 7, 16, 32–37, 36f, 149
cascaded recurrent neural network (cascaded RNN), 116f
catastrophic forgetting, 165
cats, 49
cells, 49, 51–53, 52f, 57. See also neurons
central pattern generators (CPGs), 126–28, 128f
cerebellum, 58, 60
cerebral hemorrhage, 57
CFG. See context-free grammar
Chalmers, David, 75, 172, 246
chaos, 87–90, 91f, 108, 225–27
chaotic attractor, 84, 85f
chaotic itinerancy, 168–69, 169f
chemical plant, 4–5, 6
chiasm, 35
chimps, 10–11
Chomsky, Noam, 10–12, 12f, 190–91, 244, 260. See also faculty of language in broad sense; faculty of language in narrow sense
chunking, 15, 262
chunks, 175–76, 197–98
  junctions, 220, 225–26
  MTRNN, 204–6, 222–23
  QRIO, 186–87
  structures, 220, 225–26
Churchland, M., 72, 175, 208, 226
circuit-level mechanisms, 76–77, 78–79
circular causality, 7, 149, 170–72, 179, 198, 240–41, 265–67
  authenticity and, 237–39, 238f
  criticality and, 237–39, 238f, 250–51
CLARION. See Connectionist Learning with Adaptive Rule Induction On-line
Clark, Andy, 94–95, 246
classical artificial intelligence (classical AI), 106
Cleeremans, A., 159
closed-loop mode, 178
CNN. See convolutional neural network
codevelopment process, 213
cogito, 23, 25, 29–32, 107–9, 107f. See also being
cognitive competencies, 10
cognitive fragmentation, 257
cognitive minds, 149–50, 243–47, 253
cognitive models, 13–15, 14t
cognitive processes, 7, 155, 266. See also embodied cognition; embodiment
cognitivism, 244–45, 262
  composition, 9–13, 12f
  context, 18–19
  models, 13–15, 14t
  overview, 9–20
  recursion, 9–13, 12f
  symbol grounding problem, 15–18, 17f
  symbol systems, 9–13, 12f
coherence, 169–72
collective neurons, 63, 72–73, 73f
collision-free maneuvering, 152
columnar organization, 45, 46, 46f, 49
comb, 50–51
combinatory explosion problem, 161
complex actions
  developmental training of, 209–15
  experiments, 209–15
  QRIO, 209–15, 209f, 212f, 214f
complex object features, 46–47, 46f, 47f
complex objects, 48
complex visual objects, 46f
compositional action sequences, 229–30, 252
compositionality, 248–49, 262, 266
  in cognitive mind, 243–47
  development of, 152–61, 264
  as fluid, 202, 216, 265
  generalization and, 194–96, 195f
  MTRNN, 217–18, 218f
compositions, 145–48, 147f
  behavior primitives, 200–202, 200f, 201f
  cognitivism, 9–13, 12f
  localist scheme, 200–201, 200f
  in symbol systems, 9–13, 12f
concepts, 246
concrete movements, 33–34
Connectionist Learning with Adaptive Rule Induction On-line (CLARION), 247
connection weight matrix, 121
conscious awareness, 248
  free will for, 219–41, 236f, 238f
  intentions, 69–75, 70f, 73f, 230–39, 236f, 238f
conscious decision, action preceded by, 237
conscious memory, 27–28
consciousness, 25, 187, 250. See also streams of consciousness
  absolute flow of, 28
  cogito problem concerning, 29–32
  easy problem of, 75
  free will and, 230–39, 236f, 238f
  hard problem of, 75, 172, 249–50, 263, 267
  postdiction and, 230–39, 236f, 238f
  questions about, 3–4
  structure of, 172
  surprise quantifying, 172n2
conscious states, 37–39
consolidation, 164–69, 167f, 169f, 197, 225–26
context-free grammar (CFG), 11, 12f
contexts, 18–19, 48, 157, 158–60, 158f
continuity of minds, 100
continuous-time recurrent neural network (CTRNN), 120–25, 122f, 123f, 127, 204–5. See also multiple-timescale recurrent neural network
continuous-time systems, 90
contours, 47, 48
convolutional neural network (CNN), 256, 258
corridor, 94, 95f
cortical electrical stimulation study, 73–74
cortical song, 72
counting, 10–12
CPGs. See central pattern generators
creative images, 197
criticality, 237–39, 238f, 250–51
CTRNN. See continuous-time recurrent neural network
cup nesting, 11
cursor, 57

Dale, R., 146
Dasein, 34
death, 31, 171–72
deep learning, 253–54, 258–59
deep minds, 259
degu, 11
Demiris, Y., 200–201, 200f
Dennett, Daniel, 108
depression, 256
Descartes, René, 16, 29, 243. See also Cartesian dualism
Desmurget, M., 73–74, 75, 76, 237
D'Esposito, M., 206–7, 207f
deterministic chaos, 226–27
deterministic dynamics, 226–27
developmental psychology, 97–100, 99f
developmental training, 209–15, 209f, 212f, 214f, 259–61
Diamond, A., 211
difference equation, 83–84
dimension, 35–36, 36f
direct experiences, 22–23, 23f, 26–28, 106–8, 107f, 142–43
direct reference, 34
disconnectivity hypothesis, 257
discrete movements, 180, 180f
discrete time system, 85–90, 86f, 88f, 89f
distributed representation framework, 177, 180–82, 181f, 196–97, 201–2
disturbance of self, 256–57
Doetsch, P., 258
domain specificity, 29
do-re-mi example, 26–28
double intentionalities, 142
dreaming, 165
Dreyfus, H. L., 28–29, 41, 145
dynamical structure, 86–87, 132–36, 133f, 135f, 245
dynamical systems. See also nonlinear dynamical systems
  continuous-time, 90
  difference equation, 83–84
  discrete time, 85–90, 86f, 88f, 89f
  neurorobotics from perspective of, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f
  structural stability, 90–93, 91f, 92f
dynamical systems approach, 79, 263
  embodied cognition modeled by, 81–137, 126f
  self-organization applied by, 7
dynamic closure, 160, 160f, 166–68
dynamic conflict resolution, 215
dynamic learning, intermittency during, 166–69, 167f, 169f
dynamic neural network models, 137, 176–79, 178f. See also recurrent neural network with parametric biases
A Dynamic Systems Approach to the Development of Cognition and Action (Thelen and Smith), 97–98
dynamic systems theory, 83–93, 85f, 86f, 88f, 89f, 91f, 92f

easy problem, 75
echo-state network, 124–25
Edelman, G., 255
edge of chaos, 89
electroencephalography (EEG), 61, 69–71, 70f
electrophysiological experiments, 56–57
Elephants don't play chess (Brooks), 106
Eliasmith, C., 255
Elman, Jeffrey, 118–20, 119f
Elman net, 118–20, 119f
embodied cognition, 235–36. See also dynamic neural network models
  definition of, 82
  dynamical systems approach modeling, 81–137, 85f
embodied mind, 32–37, 36f, 42, 254
embodiment, 78, 79, 107–9, 107f, 236–37
  dimension of, 35–36
  Gibsonian approach and, 94–95
  prediction error generated by, 240
emergence through synthesis, 83
emergency shutdown, 4–5
emotions, 261
end effectors, 67
end-to-end learning, 217
entrainment, 95–96, 154–56
epoch. See suspension of disbelief
error back-propagation scheme, 113–16, 113f, 123f, 257
  CTRNN application of, 204–5
  perceptual sequences acquired by, 204–5, 206
  retrograde axonal signaling mechanism implementing, 207–8
error regression, 231–39, 233f, 234f, 236f, 238f, 257
Evans, Gareth, 9–10
evolution, 126–32
experiences, 266. See also direct experiences; first-person experience; pure experience; subjective experiences
  continuous flow of, 26–28
  perception dependence of, 23–42
  of selfhood, 39
extended mind, 246
external inputs, 215
external observer, 252

facial imitation, 101
faculty of language in broad sense (FLB), 10, 12, 16
faculty of language in narrow sense (FLN), 10, 11–13, 12f, 16, 19
fallenness, 32
fast dynamics, 203f, 204, 205, 206
  at M1, 206–7, 207f
  QRIO, 210
feature representation, 49
feed-forward network model, 112–20, 113f, 116f, 119f, 129–30, 130f
Feynman, Richard, 81, 103
fingers, 96–97, 96f
finite state machine (FSM), 17, 88–89, 153, 160, 227
first-person experience, 106–8, 107f
fixed point attractor, 84, 85, 85f, 94
FLB. See faculty of language in broad sense
flesh, 34–36
FLN. See faculty of language in narrow sense
Floreano, D., 130–31
flow, of subjective experiences, 26–29
fluid compositionality, 202, 216, 265
fMRI. See functional magnetic resonance imaging
focus of expansion (FOE), 94, 95f
forward model, 58, 152, 161
frame problem, 59, 161, 177–78
frame system, 29
Freeman, Walter, 55, 72–73, 225, 237
free will, 69–75, 78, 218, 248–51, 261, 263, 265–67
  for action, 219–41, 236f, 238f
  for conscious awareness, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f
  consciousness and, 230–39, 236f, 238f
  consolidation, 225–26
  definition of, 39
  experiments, 221–25, 221f, 223f, 225f
  intention correlates, 69–75, 70f, 73f
  James considering, 225–26
  model for, 39–41, 40f
  in MTRNN model, 220–22, 221f
  overview, 39
  postdiction and, 230–39, 236f, 238f
  stream of consciousness and, 37–41, 40f, 42
  vehicle possessing, 108
Fried, I., 74, 75–76
Friston, Karl, 172n2, 179, 250, 257
frontal cortex, 207
frontopolar part of prefrontal cortex, 71–73, 73f
FSM. See finite state machine
Fukumura, Naohiro, 132
Fukushima, Y., 159–60
functional magnetic resonance imaging (fMRI), 60–61, 65, 66, 67, 70–71

Gallagher, S., 170
Gallese, Vittorio, 67, 68
gated local network models, 200, 202
gated recurrent neural networks (RNNs), 200, 222
Gaussie, 131f, 182
generalization, 194–96, 195f, 257–58
General Problem Solver (GPS), 13–15, 14t, 17, 156–57
genetic algorithm, 217
Georgopoulos, A., 49–50, 54
Gershkoff-Stowe, L., 98, 99
Gestalt. See structuring processes of whole
Gibson, Eleanor, 55, 58, 143
Gibson, J., 93–95, 95f
Gibsonian approach, 93–95, 95f, 107, 263. See also Neo-Gibsonian approaches
global attractor, 158–59, 158f
goal-directed action plans, 13, 108, 156–57, 157f
goal-directed actions, 65–66, 210–15, 212f, 214f, 253
Goldman, Alvin, 67, 68
Goodale, Mel, 56
GPS. See General Problem Solver
grammar, 11, 12–13, 12f
grandmother cells, 245–46
grasping neurons, 64–66, 65f
Graves, A., 245
Graziano, M., 54
groundlessness, 240–41, 251, 253, 264, 267–68
Gunji, Y., 252

Haas, H., 125
hair, 50–51
hallucinations, 228–29
hammer, 30–31, 42, 60, 170, 248, 264
hands, 34–35, 63
  metronome in synchrony with, 96–97, 96f
  QRIO imitating, 187–90, 189f
  QRIO predicting, 183–87, 184f, 185f
handwriting recognition system, 258
hard problem, 75, 172, 249–50, 263, 267
harmonic oscillator, 92, 93
Harnad, Steven, 16. See also symbol grounding problem
Harris, K., 124
Haruno, M., 200–201, 200f
Hauk, O., 66–67
Hayes, G., 200–201, 200f
Hebbian learning, 190–91
Heidegger, Martin, 21, 39, 41–42, 60, 61, 142–43, 162, 170, 248, 251, 264, 267. See also being-in-the-world
  on future, 235
  on past, 235
hermeneutics, 29–31
Hierarchical Attentive Multiple Models for Execution and Recognition, 200–201, 200f
hierarchical mechanisms, 44–54, 45f, 46f, 47f, 50f, 52f
hierarchical mixture, 200–201, 200f
hierarchical Modular Selection and Identification for Control (MOSAIC), 200–201, 200f
hierarchy, 49–50, 50f, 262, 265
hippocampus, 72–73, 73f, 109, 165
Hobbes, Thomas, 39, 237
Hochreiter, S., 216–17
holding neurons, 65
homeostasis principle, 182
Hopfield network, 164, 165
how pathway, 56, 63
Humanoid Brain project, 256
humanoid robots, 221–22, 221f, 227–30, 228f, 229f, 257. See also other robot; self robot
humans
  cogito level had by, 107–9, 107f
  direct experiences for, 142–43
  imitation for, 190
  intransitive actions of, 65
  language-ready brains, 66–67, 191
  linguistic competency of, 66–67, 190–91
  mirror systems in, 65, 66
  parietal cortex damage in, 57
  presupplementary motor area, 75–76
Hume, David, 170
Husserl, Edmund, 21, 22–23, 23f, 24–25, 24f, 41, 61, 106, 142–43, 145, 176, 186–87, 248–49
  on direct experiences, 26–28
  temporality notion of, 32
  on time perception, 26–29
Hwang, J., 259
hybrid system. See symbol grounding problem

ideational apraxia, 57
ideomotor apraxia, 57
Ikegami, Takashi, 137
Ikegaya, Y., 72, 109
images, 71–72, 182, 197, 266. See also motor imagery; visual imagery
imitation, 100–102, 102f, 131–32, 131f, 188
  game, 187–90, 189f, 251
  for humans, 190
  manipulation, 221–22, 221f
  by mental state reading, 182–90, 184f, 185f, 189f
  prediction error influencing, 198
  QRIO, 187–90, 189f
imitative actions, statistical learning of, 221–22, 221f
imitative behaviors, 66
imperative sentences, 191–96, 192f, 193f, 195f
impression, 27
inauthentic agent, 31–32
inauthentic being, 142–43
incoherence, 169–72
index fingers, 96–97, 96f
indexing, 18–19
infants
  developmental psychology, 97–100, 99f
  imitation in, 100–102, 102f
  intentionality possessed by, 211
  object used by, 101–2, 102f, 143, 211
  preverbal, 101–2, 102f
inferior frontal cortex, 61
inferior parietal cortex, 183
inferior parietal lobe (IPL), 65–66
inferior temporal area (TE) (TEO), 46–47, 46f, 47f
inferotemporal cortex, 46–47, 46f, 47f
infinite regression, 19
information bottlenecks, 210
information hubs, 208, 210
information mismatch, 201, 202
information processing, 200–201
initial states, 208–15, 209f, 212f, 214f
  of intention units, 220–22, 221f
  setting, 218
inner prediction error, 257
instrumental activities, 102, 102f
Intelligence without representation (Brooks), 106
intentionalities, 28, 142, 161, 211, 225. See also subjectivity
intentions, 56–61, 59f, 78
  conscious awareness, 69–75, 70f, 73f, 230–39, 236f, 238f
  free will neural correlates, 69–75, 70f, 73f
  initiation of, 71–75, 73f
  intention switched to from, 144
  mirror neurons coding, 68
  organization of, 74–75
  parietal cortex as involved in, 76
  from PFC, 144
  prediction error generated by, 240
  rising of, 69–75, 70f, 73f
  spontaneous, 69–75, 70f, 73f, 230–40, 236f, 238f
  top-down subjective, 263
intention-to-perception mapping, 144
intention units, 204–6, 218, 220–22, 221f
interaction, 252–55
interactionism, problem of, 16, 243–47
intermediate dynamics, 203f, 204, 205
  MTRNN, 223f, 224
  parietal cortex, 207–8
  QRIO, 210
  VP trajectories, 213–15, 214f
intermittency, during dynamic learning, 166–69, 167f, 169f
intermittent chaos, 89, 168
intermittent transitions, 226
internal contextual dynamic structures, 132–36, 133f, 135f
internal observer, 252
intransitive actions, 65
intraparietal sulcus, 62
invariant set, 84, 158–59, 158f
IPL. See inferior parietal lobe
Iriki, Atsushi, 61–62, 62f
Ito, Masao, 58, 152

Jaeger, H., 125, 204
James, William, 21, 37–41, 40f, 42, 69, 71–72, 162, 170, 182, 250. See also streams of consciousness
  free will consideration of, 225–26
  momentary self spoken of by, 171
Jeannerod, M., 59
Johnson-Pynn, J., 11
Jordan-type recurrent neural network (Jordan-type RNN), 116f, 133f, 134
joystick task, 57

Karmiloff-Smith, A., 211
Kawato, Mitsuo, 58, 152
Kelso, Scott, 95–97, 96f
Khepera, 129–30, 129f, 130f
Kiebel, S., 206–7, 207f
kinetic melody, 202, 216
knowledge, 57, 266
Kohonen network, 164, 227–30, 228f, 229f
Kourtzi, Z., 48
Kugler, 95–96
Kuniyoshi, Y., 128–30, 129f, 130f, 175–76

Laird, John, 15, 246–47
landmark-based navigation, mobile robot performing, 162–72, 163f, 167f, 169f, 248
landmarks, 17–18, 17f, 170–71
language, action bound to, 190–96, 192f
language-ready brains, 66–67, 191
latent learning, 161
lateral intraparietal area (LIP), 46
Lateralized Readiness Potential, 70
learnable neurorobots, 141
learning, 259–61. See also consolidation; deep learning; dynamic learning; error back-propagation scheme; imitation; predictive learning
  bound, 191–96, 192f, 193f, 195f
  as end-to-end, 217
  Hebbian, 190–91
  of imitative actions, 221–22, 221f
  as latent, 161
  offline processes, 197–98
  in RNNPB, 177–82, 178f, 181f
  as statistical, 221–22, 221f
lesion, 224, 225f
Li, W., 48
Libet, Benjamin, 69–71, 70f, 218, 219, 220, 223, 230, 235, 240, 249, 263
like me mechanism, 101, 132, 183, 187, 190
limbs, 33, 62–63, 73–74
limit cycle attractors, 84, 85, 85f, 92–93
  locomotion evolution with, 126–28, 128f
  in MTRNN, 213
  periodicity of, 166–68
limit torus, 84, 85f
linguistic competency, 66–67, 190–91
LIP. See lateral intraparietal area
local attractors, 84, 85, 85f
localist scheme, 196–97, 200–201, 200f
local representation framework, 177
locomotion, limit attractor evolution, 126–28, 128f. See also walking
locomotive controller, 127–28
logistic maps, 85–89, 86f, 88f, 89f, 90, 108
longitudinal intentionality, 28
long-term and short-term memory recurrent neural network (RNN) model, 216–17
look-ahead prediction, Yamabico, 154–57, 155f, 157f
Lu, X., 52–53
Luria, Alexander, 202, 216
Lyapunov exponent, 224

M1. See primary motor cortex
macaque monkeys, 45f, 46
Mach, Ernst, 22–23, 23f
man, 31, 33–34, 61
manipulation, 63–64
  imitation, 221–22, 221f
  of QRIO, 209–15, 209f, 212f, 214f
  symbol, 145–48, 147f
  tutored sequences, 227–30, 228f, 229f
  of visual objects, 56–57
Markov chains, 226–27
Massachusetts Institute of Technology (MIT), 103
Matarić, M., 108
matching, 188
Matsuno, K., 252
Maturana, H., 117, 132
May, Robert, 85. See also logistic maps
meanings, 195–96
medial superior temporal area (MST), 45–46
medial temporal lobe, 246
melody, 26–28
Meltzoff, A., 101, 183, 187
memory cells, 217
mental rehearsal, 164–69, 167f, 169f
mental simulation, 154, 156
mental states, imitating others by reading, 182–90, 184f, 185f, 189f
Merleau-Ponty, Maurice, 21, 25, 32–37, 36f, 42, 61–64, 78, 144, 237, 244. See also embodiment; Schneider
middle temporal area (MT), 45
middle way, 254–55
Miller, J., 70
Milner, David, 56
miming, 57
mind/body dualism. See Cartesian dualism
mind-reading, 68
minds, 3–8. See also cognitive minds; consciousness; embodied cognition; subjective mind
  continuity of, 100
  deep, 259
  embodiment of, 32–37, 36f, 42, 254
  as extended, 246
  overview, 262–67
  theory of, 67
minimal cognition, 126
minimal self, 169–72
Minsky, Marvin, 29
mirror box, 63
mirror neurons, 55–56, 261. See also recurrent neural network with parametric biases
  dynamic neural network model for, 176–79, 178f
  evidence for, 64–67, 65f
  grasping, 64–66, 65f
  holding, 65
  implementation, 67–68
  intention coded by, 68
  IPL, 65–66
  model, 177–79, 191–96, 192f, 193f, 195f
  of monkeys, 76
  overview, 64–68, 65f
  in parietal cortex, 76, 177
  tearing, 65
mirror systems, in humans, 65, 66
mismatches, 60–61, 201, 202
MIT. See Massachusetts Institute of Technology
mixed pattern generator, 128, 128f
mobile robots, 16–18, 17f. See also Yamabico
  example, 5–6, 16–18, 17f
  landmark-based navigation performed by, 162–72, 163f, 248
  in office environment, 16–18, 17f
  problem, 16–18, 17f
  with vision, 162–72, 163f, 167f, 169f, 173, 193–96, 193f
models, 57
modularity, 44–49, 45f, 46f, 47f
momentary self, 170, 171, 173, 264
monkey–banana problem, 14–15, 14t
monkeys, 45f, 46, 48, 50, 51–52, 52f, 53–54, 57
  inferior parietal cortex of, 183
  IPL of, 65–66
  mirror neurons of, 76
  motor cortex of, 208
  motor neurons of, 183
  parietal cortex of, 61–62, 62f, 76
  PMC of, 208
  PMv controlling, 64–65, 65f
  presupplementary motor area, 75–76
  primitive movements of, 75–76
Moore, M., 101
moral virtue, 261
Mormann, F., 246
mortality, 32
motifs, 72
motor cortex, 208, 222–23
motor imagery, 59, 206, 211, 222–24, 223f
motor neurons, of monkeys, 183
motor programs, 208–15, 209f, 212f, 214f
motor schemata theory, 9–10, 175–76
movements
  discrete, 180, 180f
  parietal cortex, 73–74
  patterns, 180–82, 181f, 187–90, 189f, 213–15, 214f
  PMC, 73
MST. See medial superior temporal area
MSTNN. See multiple spatio-temporal neural network
MT. See middle temporal area
MTRNNs. See multiple-timescale recurrent neural networks
Mulliken, G. H., 57
multiple spatio-temporal neural network (MSTNN), 217, 218f
multiple-timescale recurrent neural networks (MTRNNs), 252, 257, 265
  action sequences generated by, 227–30, 228f, 229f
  behavior primitives, 204–6
  bottom-up error regression, 207–8, 207f, 215
  brain science correspondence, 206–8, 207f
  chunks, 204–6, 222–23
  compositionality, 217–18, 218f
  experiment, 208–15, 209f, 212f, 214f, 230–35, 233f, 234f
  free will in, 220–22, 221f
  limit-cycle attractors in, 213
  motor imagery generated by, 206
  overview, 203–8, 203f, 207f, 216–18, 218f
  perceptual sequences, 204–6
  recognition performed by, 206
  RNNPB as analogous to, 229–30
  top-down forward prediction, 215
  top-down pathway, 207–8, 207f
  tutoring, 237–39, 238f
Mu-ming Poo, 124
Murata, A., 233f
Mushiake, H., 53–54
mutual imitation game, 187–90, 189f

Nadel, Jacqueline, 101–2, 102f, 131, 188
Namikawa, J., 221f, 223f, 225f
navigation, 251. See also landmark-based navigation; mobile robot
  dynamical structure in, 132–36, 133f, 135f
  internal contextual dynamic structures in, 132–36, 133f, 135f
  problem, 107–9, 107f
  self-organization in, 132–36, 133f, 135f
  Yamabico experiments, 132–36, 153–62
Neo-Gibsonian approaches, 95–97, 96f, 144, 263
neonates, 101
neural activation sequences, 222–24, 223f
neural activation state, 208
neural circuits, 117, 132–36, 133f, 135f
neural correlates, 76–77, 78–79
neural network models, 255–56. See also feed-forward network model
  overview, 112–25, 113f, 116f, 119f, 122f, 123f
  types of, 112–25, 113f, 116f, 119f, 122f, 123f
neurodynamic models, subjective views in, 143–48, 145f, 147f
neurodynamic structure, 157–59, 158f
neurodynamics with timescales, 213–15, 214f
neurodynamic system, 145–48, 147f
neurons, 46. See also mirror neurons; motor neurons; neural network models
  bimodal, 53–54, 56–57, 208
  collective, 63, 72–73, 73f
  hard problem, 75
  as motifs, 72
  PMC, 72–73, 73f
  postsynaptic, 124
  presynaptic, 124
  as spiking, 109–10, 255–56
  V1, 48
neuro-phenomenological-robotics, 256–57
neurophenomenology program, 254
neurorobotics
  from dynamical systems perspective, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f
  model, 257–59
neuroscience. See brain science
Newell, Allen, 13, 15. See also General Problem Solver
newness, 248–49
Nishida, Kitaro, 21, 22–23, 25
Nishimoto, R., 209f, 212f
Nolfi, S., 130–31, 200–201, 200f
nonlinear dynamical systems, structural stability of, 90–93, 91f, 92f. See also logistic maps
nonlinear dynamics, 83–93
nonlinear mapping, tangency in, 89, 89f
nouvelle artificial intelligence (nouvelle AI), 106
novel action sequences, 227–30, 228f, 229f
nowness, 27, 171–72, 186–87, 235

objectification, 26–29
objective science, subjective experience and, 251–55, 267
objective time, 27–28, 38–39
objective world, 266–67
  phenomenology, 7, 23–42, 24f, 36f, 40f
  subjective mind as tied to, 49, 148–50, 149f
  subjective mind's distinction from, 7, 23–42, 24f, 36f, 40f
  subjectivity as mirror of, 172
objectivity, 250, 254–55
objects. See also manipulation; tools; visual objects
  as attractive, 98–100, 99f
  chimps and, 10–11
  complex, 46f
  counting of, 10–12
  features, 46f
  infants using, 101–2, 102f, 143, 211
  perception of, 33–37, 36f
  shaking, 144–45, 145f
  skilled behaviors for manipulating, 57–61, 59f
  subject as separated from, 22–23
  subject iterative exchanges, 36
  subject's unified existence with, 25, 36, 244, 248
  as three-dimensional, 35–36, 36f
  as two-dimensional, 35–36, 36f
offline learning processes, 197–98
offline look-ahead prediction. See look-ahead prediction
one-step prediction, 154–55, 155f, 232
online prediction, 153–54
open-loop mode, 178
operating system (OS), 79
optical constancy, 95f
optical flow, 94, 95f
OS. See operating system
other robot, 231–35, 233f, 234f
outfielder, 94–95
overregularization, 98
Oztop, E., 58

palpation, 34–35, 144
Parallel Distributed Processing (PDP) Research Group, 196
parametric bias (PB), 177–82, 178f, 181f, 215
  activations, 191–93, 192f
  prediction error, 198
  self-organization, 192
  vectors, 183–87, 185f, 191–93, 192f, 194–96, 195f, 197, 201–2, 201f, 204
parietal cortex, 55, 78, 237, 266. See also precuneus
  action intention meeting of, 56–61, 59f
  bimodal neurons in, 208
  cells, 57
  damage to, 57
  as information hub, 208
  intention involvement of, 76
  intermediate dynamics, 207–8
  mirror neurons in, 76, 177
  of monkeys, 61–62, 62f, 76
  movements, 73–74
  overview, 56–61, 59f
  perceptual outcome meeting of, 56–61, 59f
  perceptual structures in, 144
  predictive model in, 59f, 68
  stimulation of, 73–74
  visual objects involvement of, 56–57
parrots, 11
past, 235
pastness, 27
PB. See parametric bias
PCA. See principal component analysis
PDP Research Group. See Parallel Distributed Processing Research Group
perception. See also active perception; what pathway; where pathway
  action changing reality of, 60–61
  action generation role of, 55
  cogito as separate from, 25
  experience as dependent on, 23–42, 24f, 36f, 40f
  intention altered by reality of, 60–61
  of objects, 33–37, 36f
  outcome, 56–61, 59f
  parietal cortex meeting of, 56–61, 59f
  of square, 24–25, 24f
  of time, 26–29, 176, 186–87, 248–49
perception-to-action mapping, 144
perception-to-motor cycle, 106, 107
perceptual constancy, 95
perceptual flows, 258
perceptual sequences, 177–79, 178f, 203f, 204–6
perceptual structures, in parietal cortex, 144
perchings, 38, 226
periodicity, of limit cycle attractors, 166–68
perseverative reaching, 98–100, 99f
PFC. See prefrontal cortex
Pfeifer, R., 128–30, 129f, 130f
phantom limbs, 33, 62–63
phase transitions, 95–97, 96f, 144
phenomenological reduction, 21
phenomenology, 20
  being-in-the-world, 29–32
  direct experience in, 22–23, 23f
  embodiment of mind, 32–37, 36f
  objectification, 26–29
  objective world, 7, 23–42, 24f, 36f, 40f
  overview, 21–42, 23f, 24f, 36f, 40f, 247–51
  subjective experiences, 26–29
  subjective mind, 7, 23–42, 24f, 36f, 40f
  time perception, 26–29, 176, 186–87, 248–49
Piaget, Jean, 98–101, 99f, 102, 260
Pick, Anne, 55, 58, 143
pilots, 94
PMC. See premotor cortex
PMv. See ventral premotor area
Poincaré section, 90, 91f
polarity, 25
poles, 36
Pollack, J., 159
postdiction, 230–39, 233f, 234f, 236f, 238f
postsynaptic neurons, 124
posture, 59–60, 59f
poverty of stimulus problem, 193, 260
precuneus, 71–73, 73f
prediction. See also one-step prediction
  errors, 166–72, 167f, 169f, 192–93, 198, 206, 207–8, 231–32, 236, 240, 257–58
  as offline, 154
  as online, 153–54
  RNNs as responsible for, 186
  of sensation, 153–57, 153f, 155f, 157f
  top-down, 60–61, 164–65, 197–98
  Yamabico, 153–57, 153f, 155f, 157f
predictive coding, 48, 191–96, 192f, 193f, 195f
predictive dynamics, self-consciousness and, 161–72, 163f, 167f, 169f
predictive learning
  from actional consequences, 151–73
  about world, 151–73, 153f, 155f, 157f, 158f, 160f, 163f, 167f, 169f
predictive model, 57–64, 59f, 62f, 68
preempirical time, 26–27
prefrontal cortex (PFC), 144, 206–7, 207f, 225–26, 266. See also frontopolar part of prefrontal cortex
premotor cortex (PMC), 49–54, 50f, 52f, 77–78
  of monkey, 208
  movements, 73
  neurons, 72–73, 73f
  role of, 76
  stimulations of, 73
present, 31–32
presentness, 27
presupplementary motor area, direct stimulation of, 74–76
presynaptic neurons, 124
pretend play, 101
preverbal infants, 101–2, 102f
primary motor cortex (M1), 49–53, 50f, 52f, 54, 60, 77–78
primary visual cortex (V1), 44–45, 48
primitive actions, stochastic transitions between, 221–22, 221f
primitive movements, 51–52, 52f, 53, 75–76
principal component analysis (PCA), 211
Principles of Psychology (James), 37–39
private states of consciousness, 38–39
probabilistic processes, 226–27
problem of interactionism, 16, 243–47
proprioception, 59–60, 59f, 179, 183–87, 184f, 185f
protention, 26–27, 61, 186, 198
protosigns, 66–67
Pulvermüller, F., 191
pure experience, 22–23, 26–28, 39

Quest for cuRIOsity (QRIO), 183–90, 184f, 189f
  complex actions, 209–15, 209f, 212f, 214f
  developmental training, 209–15, 209f, 212f, 214f
  fast dynamics, 210
  intermediate dynamics, 210
  manipulation of, 209–15, 209f, 212f, 214f
  slow dynamics, 210

rake, 62
Ramachandran, V., 63
Rao, Rajesh, 48, 60
rapid eye movement (REM) sleep phase, 165
rats, hippocampus of, 72–73, 73f
reactive behaviors, 141–42
Readiness Potential (RP), 69–71, 70f
recognition, 22. See also visual recognition
  action's circular causality with, 149
  bottom-up, 60–61, 197–98, 266
  in brain, 55–68, 59f, 62f, 65f
  of landmarks, 170–71
  MTRNNs performing, 206
  of perceptual sequences, 206
reconstruction, 22
recurrent neural networks (RNNs), 111–12, 150, 245. See also cascaded recurrent neural network; Jordan-type recurrent neural network
  as forward dynamics model, 152
  as gated, 222
  models, 116–20, 116f, 119f, 124, 202, 216–17, 264
  prediction responsibility of, 186
  Yamabico, 153–61, 153f, 155f, 157f, 160f
recurrent neural network with parametric biases (RNNPB), 177–79, 205. See also parametric bias
  characteristics of, 179–82, 181f
  distributed representation characteristics in, 197
  frame problem avoided by, 177–78
  learning in, 177–82, 178f, 181f
  models, 191–98, 192f, 193f, 195f, 201–2, 201f, 204, 229–30, 248–49, 264–65
  MTRNN as analogous to, 229–30
  overview, 176–79, 178f
  segmentation of, 186–87
  system flow of, 178f
recursion, 9–13, 12f
reflective pattern generator, 127
reflective selves, 216
refrigerator, 5, 19, 41–42
refusal of deficiency, 33
rehearsal, 164–69, 167f, 169f
REM phase. See rapid eye movement sleep phase
representation, 25, 28, 106, 108, 145–48, 147f
response facilitation with understanding meaning, 183
retention, 26–27, 186, 198
retrograde axonal signal, 124
retrograde axonal signaling mechanism, 124, 207–8
Rizzolatti, G., 64–66, 65f, 76, 182–83
RNNPB. See recurrent neural network with parametric biases
RNNs. See recurrent neural networks
robotics, 5–6, 261. See also behavior-based robotics; neurorobotics
robots. See also arm robot; behavior-based robotics; mobile robots; other robot; self robot
  Cartesian dualism freedom of, 149
  as humanoid, 221–22, 221f, 227–30, 228f, 229f, 257
  Khepera, 129–30, 129f, 130f
  navigation problem, 107–9, 107f
  reflective selves of, 216
  as self-narrative, 206, 216, 249
  with subjective views, 141–43
  walking of, 126–28, 128f
Rössler attractor, 90, 158
Rössler system, 90, 91f
rostral-caudal gradient, 206–7, 207f
RP. See Readiness Potential
rules, 11, 12f, 14–15, 14t
Rumelhart, D., 113. See also error back-propagation scheme

Sakata, Hideo, 57
sand pile behavior, 171
scaffolding, 260
Scheier, C., 128–30, 129f, 130f
schizophrenia, 256–57
Schmidhuber, J., 216–17
Schneider, 33–34, 56
see-ers, 34–35
segmentation, 176, 186–87
self-consciousness, 161–72, 163f, 167f, 169–70, 169f
selfhood, 39
self-organization, 7, 98, 130, 202, 244–45
  in bound learning process, 194–96, 195f
  of dynamical structure, 132–36, 133f, 135f, 245
  dynamical systems approach applying, 7
  of functional hierarchy, 203–8, 203f, 207f
  multiple timescales, 203–8, 203f, 207f
  in navigation, 132–36, 133f, 135f
  PB, 192
self-organized criticality (SOC), 171–72, 188–90, 264, 267
self robot, 231–35, 233f, 234f
selves, 248, 264
  disturbance of, 256–57
  as minimal, 169–72
  momentary, 170, 171, 173, 264
  range of, 33
  as reflective, 216
semantically combinatorial language of thought, 145
sensationalism, 24
sensations, prediction of, 153–57, 153f, 155f, 157f. See also synesthesia
sensory aliasing problem, 134
sensory cortices, 63–64
sensory-guided actions, in PMC, 53–54
sensory-motor coordination, 128–32, 129f, 130f, 131f
sensory-motor flow
  action generation mirrored by, 175–98, 178f
  articulating, 175–98, 185f
sensory-motor sequences model, 191–96
sentences, 190
  Elman net generating, 118–20, 119f
  model, 191–96, 192f
  recursive structure of, 11, 12f
sequence patterns, 177–79, 178f. See also recurrent neural network with parametric biases
sequential movements, 53–54
shaking, 144–45, 145f
Shima, K., 51–52, 52f, 53, 76, 206
short-term memory (STM), 247
Siegelmann, Hava, 112, 244–45
Simon, Herbert, 13. See also General Problem Solver
simulation theory, 67, 68
single-unit recording, 46
sinusoidal function, 92
Sirigu, A., 58, 59, 76
skilled behaviors, 50–51, 57–61, 59f
slow dynamics, 203–4, 203f, 205, 206. See also intention units
  MTRNN, 223f, 224
  at PFC, 206–7, 207f
  QRIO, 210
SMA. See supplementary motor area
Smith, Linda, 97–98, 211
Soar, 14, 15, 246–47
SOC. See self-organized criticality
Soon, C., 70–71, 74, 218, 219, 220, 223, 230, 240
speech recognition system, 258
Spencer-Brown, G., 18–19
spiking neurons, 109–10, 255–56. See also neural network models
Spivey, M., 100, 146
spoken grammar, 12–13
spontaneity, 226–27
spontaneous behaviors, 219–30
spontaneous generation
  of intention, 69–75, 70f, 73f, 230–40, 236f, 238f
  overview, 71–72
staged development, 260
statistical learning, of imitative actions, 221–22, 221f
steady phase, 168, 169, 170
STM. See short-term memory
stochastic transitions between primitive actions, 221–22, 221f
streams of consciousness, 170
  characteristics of, 37–39
  definition of, 37
  flights, 38, 226
  free will and, 37–41, 40f, 42
  images in, 182
  overview, 37–41, 40f
  perchings, 38, 226
  states, 37–39
stretching and folding, 87, 88f
structural stability, 90–93, 91f, 92f
structuring processes of whole (Gestalt), 34
subject
  object as separated from, 22–23
  object iterative exchanges, 36
  object's unified existence with, 25, 36, 244, 248
subjective experiences, 26–29, 251–55, 267
subjective mind
  actions influencing, 49
  objective world as tied to, 49, 148–50, 149f
  objective world's distinction from, 7, 23–42, 24f, 36f, 40f
  phenomenology, 7, 23–42, 24f, 36f, 40f
subjective sense of time, 32
subjective views, 141–48, 145f, 147f
subjectivity, 141, 172, 250, 254–55, 266–67
subrecursive functions, 13
substantial parts, 226
subsumption architecture, 107–9, 107f
Sugita, Yuuya, 191–96, 192f, 193f, 195f
Sun, Ron, 247
superior parietal cortex. See precuneus
supplementary motor area (SMA), 49–54, 50f, 52f, 61, 63–64, 77–78
  EEG activity, 70–71
  M1 and, 206–7, 207f
surprise, 172n2
suspension of disbelief (epoch), 25
symbol grounding problem, 15–18, 17f, 159–61, 160f, 243
symbolic dynamics, 88–89
symbolic processes, 87–88
symbols, 19–20, 108, 145–48, 147f, 245–47
symbol systems, 9–13, 12f, 18–20
synchronization, 188
synchrony, 102, 102f, 131–32
synesthesia, 34, 63
synthesis, emergence through, 83
synthetic modeling approach, 79, 82. See also dynamical systems approach; embodiment
synthetic neurorobotics studies, 6, 7, 263–64
synthetic robotics approach, 7, 267
synthetic robotics experiments, 150, 218

tactile palpation, 34
Tanaka, Keiji, 46
tangency, 89, 89f, 171
Tani, Tohru, 17f, 28, 35, 38
Tanji, J., 51–52, 52f, 53–54, 76, 206
TE. See inferior temporal area
tearing neurons, 65
temporality, 32
temporal patterns, 182
temporoparietal junction (TPJ), 61
TEO. See inferior temporal area
Tettamanti, M., 67
that which appears, 24–25, 24f
Thelen, Ester, 97–98, 99, 211
thinking, thought segmentation of, 146
thoughts, 71–72
  chaos generating, 108
  experiments, 103–5, 104f, 105f
  semantically combinatorial language of, 145
  thinking segmented into, 146
three-dimensional objects, 35–36, 36f
time, subjective sense of, 32. See also objective time
time perception phenomenology, 26–29, 176, 186–87, 248–49
tokens, 10, 13
tools, 57–61, 59f
top-down forward prediction, 215
top-down pathway, 63, 164–65, 172, 207–8, 207f, 250
top-down prediction, 60–61, 164–65, 197–98
top-down projection, 144–45
top-down subjective intentions, 263
top-down subjective view, 266
touched, 35
touching, 35
toy, 98–100, 99f
TPJ. See temporoparietal junction
training, actions influenced by, 215. See also developmental training
transient parts, 226
transition rules, 14–15, 14t
transition sequences, 222
transitive actions, 65
transversal intentionality, 28
Trevena, J., 70
Tsuda, I., 168–69
Turing limit, 112, 245
turn taking, 188, 190
Turvey, 95–96
tutored sequences, 227–30, 228f, 229f
tutoring, 237–39, 238f, 240–41, 259–61
two-dimensional objects, 35–36, 36f
two-stage model, 39–41, 40f

Uddén, J., 206–7, 207f
Ueda, Shizuteru, 23
universal grammar, 191
unsteady phase, 168, 169–70
usage-based approach, 191
U-shaped development, 98–100, 99f

V1. See primary visual cortex
V2, 45, 48
V4, 46, 47, 48
Van de Cruys, S., 257–58
Varela, Francisco, 27, 42, 117, 132, 187, 248, 254
vector field, 91–92, 92f
vector flow, 91–92, 92f
vehicles, 103–5, 104f, 105f, 107, 108–9
Vehicles: Experiments in Synthetic Psychology (Braitenberg), 103–6
ventral intraparietal area (VIP), 46
ventral premotor area (PMv), 64–65, 65f
VIP. See ventral intraparietal area
virtual-reality mirror box, 63
virtue, 261
vision
  Merleau-Ponty on, 34–35
  mobile robot with, 162–72, 163f, 167f, 169f, 173, 193–96, 193f, 195f
visual agnosia, 56
visual alphabets, 47
visual cortex, 44–49, 45f, 46f, 47f
visual imagery, 228–29
visual objects, 46f, 56–57
visual palpation, 144
visual receptive field, 62
visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
visuo-proprioceptive (VP) flow, 210
visuo-proprioceptive mapping, 131–32, 131f
visuo-proprioceptive (VP) trajectories, 213–15, 214f
voluntary actions, 39
voluntary sequential movements in SMA, 50–53, 52f
von Hofsten, Claes, 143
VP flow. See visuo-proprioceptive flow
VP trajectories. See visuo-proprioceptive trajectories

walking, 126–28, 128f
walking reflex, 98
water hammers, 4–5, 6
Werbos, Paul, 113. See also error back-propagation scheme
Wernicke's area, 191
what pathway, 45, 46, 47f, 162–65, 163f
where pathway, 45, 162–65, 163f
will, 5–6. See also free will
Wittgenstein, Ludwig, 18
w-judgment time, 69–71, 70f
Wolpert, Daniel, 58
words, 67, 145–48, 147f, 190
World War II (WWII), 94

Yamabico, 132–36, 133f, 135f, 164, 173
  branching, 133, 152–60, 153f, 155f, 157f, 158f, 160f, 176
  intentionality of, 161
  look-ahead prediction, 154–57, 155f, 157f
  navigation experiments with, 153–62, 153f, 155f, 157f, 158f, 160f
  in neurodynamic structure, 157–59, 158f
  prediction, 153–57, 153f, 155f, 157f
  RNN, 153–61, 153f, 155f, 157f, 160f
  symbol grounding problem, 159–61, 160f
  trajectories of, 155, 156–57, 157f
Yamashita, Yuuichi, 208, 256–57. See also multiple-timescale recurrent neural network
Yen, S. C., 49