
Psychologica Belgica, Vol. 36, No. *, pages ***–***.

ON THE NEED FOR COMPUTER MODELING: THE CASE OF LANGUAGE PROCESSING

Alain Content¹ and Uli H. Frauenfelder²
¹ Laboratoire de Psychologie Expérimentale, Université libre de Bruxelles
² Laboratoire de Psycholinguistique Expérimentale, Université de Genève

Abstract

Computational modeling is a fundamental extension to the psychological scientific toolkit. The present contribution aims at clarifying the pros and cons of modeling techniques, using examples from language processing. We present some strategies that may help avoid potential pitfalls of the computational modeling approach. The traditional relationship between theory and experimentation in psycholinguistic research is also considered, as well as some limitations associated with the standard experimental approach. Finally, we insist upon the complementarity between theory, experimentation and modeling.
1. Introduction

A few years ago, during a lunch discussion, Paul Bertelson came up, in his usual perceptive and provocative manner, with something like "what is all this fuss about computational psychology? Isn't that what we have been doing all the time since cognitive psychology was launched? Then what is the difference?" This paper is an attempt to answer Bertelson's questions. We argue that the defining feature of the computational approach is the use of computer simulation techniques to develop models of processing systems, and we suggest that, within an information-processing framework, such tools constitute a natural complement to experimental procedures.
Correspondence should be addressed to Alain Content. The writing of this paper has benefited from support from the Swiss National Fund for Scientific Research (grant 11-39553.93) and the Belgian National Fund for Scientific Research (grant 9.4565.92). We thank Daniel Holender, Guy Lories, and Dominic Massaro for their comments on a previous version, and Axel Cleeremans for stimulating discussions and encouragements.

Most psychologists interested in the study of human perception and cognition share the fundamental assumption that mental activity can be described and analyzed as the functioning of a particular kind of physical machine. Most authors would also argue that descriptions of information processing mechanisms provide explanations of perception, cognition and human behavior. On the other hand, the defining feature of the computational approach is the recourse to computer techniques as tools to model or simulate information processing systems through numerical or symbolic computation. Thus, in that context, we believe that Bertelson was right in pointing out the natural filiation between the basic assumptions of cognitive psychology and the recourse to computer modeling. As far as cognitive psychology's agenda is to explain mental life through descriptions in terms of information processing mechanisms, there may not be any principled disagreement between that programme and the computational approach. In fact, as often noted, the information-processing framework is largely derived from the application of the computer metaphor to mental activity. So, within the information processing approach to psychology, whether one appeals to computational modeling or not may be more a matter of research strategy than a matter of principle.
Still, not all current metatheories in psychology seem to stick to the principle of mechanistic explanation, and several well-known scholars have expressed concerns about the potential and limitations of the information-processing approach as an answer to psychological inquiry (e.g., Neisser, 1976; Norman, 1980). For instance, in a recent paper entitled "Has psychology a future?", Eleanor Gibson (1994, p. 70) claims:

When someone asks me (as they quite often do), "But what is the mechanism?" my answer is that I am not a mechanist and I do not believe in separation of mental processes and action.
We believe that such disagreements may be more apparent than real and may result from an unduly restricted notion of mechanism. In our view, the question "what is the mechanism?" entails little more than a prompt to provide explanations framed as answers to "How?" questions. While we agree in principle that some psychologically relevant explanations may not need to take the form of responses to "How?" questions, we assume that mechanistic descriptions of information processing provide valuable explanations for several aspects of human cognition. Massaro and Cowan (1993) similarly argue that ecological realism, the physical symbol system hypothesis, connectionism, and the modularity hypothesis constitute four variants of a more general information processing framework, and we tend to agree with this analysis. We would add the dynamic system modeling approach (Port & van Gelder, 1995b) as one further variant. As a consequence, "computational" is used in this paper as a cover term including all frameworks that use automatic computation devices to simulate mental activity. Thus, our usage diverges from that of others, who use the term to refer more specifically to the metaphor of the Von Neumann computer, excluding approaches based on connectionist or dynamical systems.


Even within the circle of psychologists who adhere to the information processing framework, there is no clear agreement on the role of the computational modeling enterprise in cognitive psychology. Some (Loftus, 1993) warn of the dangers of superpowerful tools leading to supercomplex models. Others seem to remain skeptical or agnostic (MacKay, 1993); others still put great hopes in the contribution of computational modeling or even consider that computational models are a requisite of psychological theories (Broadbent, 1987; Estes, 1993; Johnson-Laird, 1983, 1988; Parisi & Burani, 1988). Our own feeling is that the relevance and the contribution of the computational approach in current cognitive research is often misunderstood. We believe that the modeling endeavor is a fundamental improvement to the psychological scientific toolkit. So, one aim of the present contribution is to clarify the pros and cons of modeling techniques, with a particular focus on aspects of language processing. Another aim, without any intention of being prescriptive, is to suggest some strategies that may help avoid potential pitfalls of the computational modeling approach.
In the first section of the paper, we offer some general definitions of the nature and function of theories and models within psychological science. In the second section, we reconsider the traditional relationship between theory and experimentation in psycholinguistic research, and we discuss limitations associated with the standard experimental approach. The final section examines the contribution of computational modeling to psychological inquiry, and underscores the complementarities between theory, experimentation and modeling.
2. Theories and Models as Information Processing Explanations

Psychology, as any other science, aims at producing theories. To articulate in some detail the nature of the modeling endeavor and its relation to psychological theorizing, it is necessary to clarify the relationship between theories and models.
2.1. Theories
A theory is a structured set of mental constructs (concepts, propositions, definitions) that provides an explanation for some set of phenomena. According to the Encyclopædia Britannica, a scientific theory is a

systematic ideational structure of broad scope, conceived by the human imagination, that encompasses a family of empirical (experiential) laws regarding regularities existing in objects and events, both observed and posited. A scientific theory is a structure suggested by these laws and is devised to explain them in a scientifically rational manner.
Theories must explicitly and accurately describe the phenomena under consideration, but descriptive adequacy is, of course, not sufficient: Theories are expected to explain things. The notion of "explanation," however, is not an easy term to define, and it is unclear what exactly scientists and philosophers mean by terms such as "explain" or "understand." Intuitively, what counts as an appropriate explanation depends on a collection of criteria. To explain is "to make plain." In general, the process of explanation can be viewed as presenting some phenomenon that we do not understand in terms of other concepts that we believe we understand, or which are deemed to be simpler, or maybe which are generally accepted.
Some examples may help clarify what constitutes an appropriate scientific explanation. Newton's law of universal attraction is a prototypical instance of a useful theory in physics. It is (at least at a macroscopic level) descriptively correct, its formulation is extremely synthetic, and it is universal since it applies to an infinite number of situations. The law of universal attraction provides an explanation for a number of phenomena. Note, however, that Newton's laws are not the final word: As such, they do not offer any explanation of why bodies are attracted at a distance. In other words, we do not yet fully understand the causal mechanism that determines the attraction between bodies.
Another example, from a field of psychology with which we are familiar, concerns the nature of the determinants of reading ability. Current theories of reading acquisition state that there is a causal relation between children's ability to understand spoken words as a sequence of phonemic segments and their later success in reading. This claim is supported by several empirical observations that have been replicated often by different research groups in various languages (see, e.g., Morais, Alegria & Content, 1987; Rohl & Pratt, 1995; Wagner & Torgesen, 1987, for reviews). We do not know, at this time, how children come to understand speech as a discrete string of segments, nor why some children fail to develop such segmental representations. Yet, the notion of a causal relation between phonological awareness and reading acquisition has important implications, most notably in the domain of reading instruction as well as for the prevention of learning difficulties and the correction of reading disabilities.
Some authors apparently believe that the difference between descriptive and explanatory adequacy is in terms of predictive power: A good theory should not only describe accurately what is already known, but should also be able to predict new phenomena. Pylyshyn (1984), for instance, argues that cognitive explanations must be stated in terms of cognitive vocabulary because this is the level at which useful generalizations and predictions can be made. Other researchers point out that predictive power in itself is not a sufficient criterion and argue that explanatory adequacy depends on conformity to universal, independently motivated principles or constraints (e.g., Chomsky, 1965; Johnson-Laird, 1983; Seidenberg, 1993).
Both of the above examples, different as they may be, constitute statements of reliable and established regularities that may be empirically observed and verified. Any phenomenon that appears as a logical consequence of such regularities would naturally be considered explained by them, although the regularities themselves are only descriptive. Ideally then, a complete theory should reduce phenomena to a limited set of general or even universal principles. However, the available explanations are often more limited in scope and, apparently, the "grand unified theory" of psychology is nowhere close to emerging. Whether such a theory is at all possible in psychology is even debatable. Thus, what seems crucial in accepting an explanation for some phenomena is that the phenomena appear as logical consequences of independently motivated explanatory statements. We cannot afford to ignore this type of partial knowledge, which happens to be the rule rather than the exception in psychology. So the critical question regards how to pursue simultaneously the quest for the most general principles and the discovery of such local generalizations and explanations.
2.2. Models
What is the difference between a theory and a model? According to the Merriam-Webster Collegiate Dictionary, a model can be

a miniature representation of something; an example for imitation or emulation; a description or analogy used to help visualize something (as an atom) that cannot be directly observed; a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs.
These various definitions all capture some of the uses of the notion of model in scientific circles. Our own idea is that a model is a particular type of theoretical elaboration, which is most often expressed as a metaphorical device: Proposing a model of a system amounts to devising another system in such a way that its behavior will mimic relevant aspects of the behavior of the target system. This allows one to explain these properties of the target system by referring them to the relevant characteristics of the model.
Of course, models can be formulated in different guises and need not be made of real stuff. One can think of various types of models: concrete ones (such as plastic models of complex molecules), symbolic or iconic ones (such as, within cognitive psychology, those expressed in the box-and-arrow flow-diagram type of symbolism), mathematical models, or computer models. One could even extend the notion to verbal models, which would then be descriptions of imaginary devices (such as Morton's, 1969, notion of logogens).
We suggest that the essential difference between theories and models is in terms of the conceptual media employed to express the ideas, but that this difference in media also entails differences in scope. Theories are generally expressed in the form of abstract principles of general applicability. When facing complex phenomena, in which multiple factors interact in intricate ways and vary dynamically over time, it may be difficult or even impossible to specify or imagine the behavior of the system from the general principles alone. Models serve the function of making the interplay of the general abstract principles more concrete and more accessible to our understanding, within a delimited domain. Thus, most psychologists would use the term "theory" to refer to the notion of spreading activation and would refer to Quillian's "model" of semantic memory. The notion of spreading activation refers to a general principle of how information flows, whereas Quillian's model applies that principle in one more circumscribed domain. Similarly, one can read about information "theory," and of its application to human attention in Broadbent's filter "model."
In other domains, likewise, one can distinguish between general laws and their applications. Meteorology, for instance, illustrates the gap between general physical laws and physical processes (such as heat radiation, convection or conduction) and their intervention in models of global climatic change, local weather, hurricane dynamics or the greenhouse effect. In the latter example, despite the universal acceptance of the general laws, it appears that most phenomena have not received a detailed and satisfactory model yet.
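To make the theory/model distinction more tangible, the sketch below applies the general principle of spreading activation within a toy semantic network, loosely in the spirit of Quillian's model. The network, the node names, and the decay value are hypothetical illustrations, not a description of any published implementation.

```python
# Minimal sketch of spreading activation in a toy semantic network.
# The nodes, links, and parameter values are hypothetical illustrations.

network = {
    "canary": ["bird", "yellow"],
    "bird": ["animal", "wings"],
    "animal": ["living thing"],
}

def spread(source, steps=2, decay=0.5):
    """Propagate activation from a source node; each link passes on
    a decayed share of the sender's activation."""
    activation = {source: 1.0}
    frontier = {source}
    for _ in range(steps):
        next_frontier = set()
        for node in frontier:
            for neighbour in network.get(node, []):
                passed = activation[node] * decay
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return activation

print(spread("canary"))
# e.g. {'canary': 1.0, 'bird': 0.5, 'yellow': 0.5, 'animal': 0.25, 'wings': 0.25}
```

The general principle (activation flows along links and decays with distance) is the "theory"; committing to a particular network, decay value and update schedule is what turns it into a model of a circumscribed domain.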
In sum, models can be seen as simplified representations of some parcel of reality. They are instantiations of general theoretical hypotheses, in a form that lends itself to more detailed investigation. By virtue of their analogical structure, they provide intuitive understanding of their object. Indeed, several psychologists studying human reasoning and inference have argued that much of our understanding in everyday life settings is based on the elaboration of mental models of the situation or problem domain. To quote Johnson-Laird (1983, p. 2):

the psychological core of understanding consists in your having a "working model" of the phenomenon in your mind. If you understand inflation, a mathematical proof, the way a computer works, DNA or a divorce, then you have a mental representation that serves as a model of an entity in much the same way as, say, a clock functions as a model of the earth's rotation.
How do theories and models come into play in psychological research? One basic tenet of modern cognitive psychology is the belief that interesting explanations are to be found in the understanding of the mechanisms at work. Palmer and Kimchi (1986) have tried to identify the fundamental assumptions of the information processing framework from a psychological viewpoint. They consider five assumptions, of which two seem more directly relevant in the present context. These are the assumptions they call informational description and recursive decomposition. The principle of informational description states that mental phenomena can be described as informational events, each consisting of three parts: the input information, the operation performed on the input, and the output information. The principle of recursive decomposition states that

any complex (i.e., nonprimitive) informational event at one level of description can be specified more fully at a lower level by decomposing it into (1) several components, each of which is itself an informational event, and (2) the temporal ordering relations among them that specify how the information "flows" through the system of components (p. 39).
Flow diagrams have been used abundantly in the form of box-and-arrow representations. They provide a compact description and a clear decomposition of a process into a sequence of stages. This strategy of process decomposition has received much attention in some domains (especially when strong interactions with neuropsychology were possible). In fact, in some areas, the strategy of process decomposition has been so influential that the issue of componential architecture became for some time a major focus of the research effort, culminating in the modularity hypothesis.


The endeavor is nicely illustrated by the following quotation from the physicist Lord Kelvin (cited by Johnson-Laird, 1993):

I never satisfy myself until I can make a mechanical model of a thing. If I can make a mechanical model I can understand it. As long as I cannot make a mechanical model all the way through I cannot understand.
Lord Kelvin's saying could be taken as a motto for cognitive psychology. Indeed, one underlying driving force behind current research must be the belief that we will reach some understanding of mental life and behavior by analyzing perception, recall, language and reasoning as information-processing mechanistic systems. For instance, MacKay (1988, 1993) contrasts two research strategies in psychology, which he labels the empirical and the theoretical epistemology respectively. The mission assigned to science by an empirical epistemology is to gather a body of reliable facts and regularities, whereas for a theoretical epistemology it is to develop theories that explain available facts. MacKay attributes the unsatisfactory state of advancement of knowledge in psychological research to over-reliance on the empirical, result-centered strategy. He thus appeals to a more theoretically oriented research strategy in psychological science, and claims (1993, p. 237) that

the sine qua non of theories within the theoretical epistemology is mechanistic explanation: Theories are not just descriptive, but explain phenomena in terms of underlying mechanisms.
Therefore, it comes as no surprise that most current theorizing is about models. In a sense, the program of modern cognitive psychology could be seen, or even defined, as the project of modeling mental activity. So, why not use the best tools available?

3. Verbal models and data

One widely accepted strategy in science is empirical falsification. Thus, in principle, one should start with a theoretical hypothesis (induced from preliminary observations or inferred from established results), and generate an empirical prediction, as a relation between one or several dependent variables and one or several independent variables. Then, one would design an experiment manipulating the independent variables and monitoring the effect on dependent variables. According to the falsification principle, the interesting case is when the data do not fit with the theory, since this should trigger its revision or its rejection. In short, science would make progress through negative feedback.

Within cognitive psychology, many authors have pointed out the limitations of the falsification strategy (MacKay, 1993; Newell, 1990). Moreover, attempts to conform to the falsification precepts have generally resulted in disappointment and disillusion. This state of affairs may be attributed to three different factors: the complexity of the phenomena under scrutiny, the lack of specification of verbal models, and the general issue of model identifiability.


3.1. Complexity of phenomena


The effects upon modeling of the complexity faced in characterizing language processing can be illustrated by examining the history of models of spoken word recognition. The original cohort model (Marslen-Wilson & Welsh, 1978) represents the first attempt to provide a systematic description of spoken word recognition. The model assumes two successive stages of processing. During the first, all words that exactly match the onset (i.e., the initial one or two segments) of the target word are activated, thus creating a set of competitors which constitute the initial cohort of the target. This initial activation phase is followed by a stage of deactivation during which the cohort members that do not match later sensory input are eliminated from the cohort. The number of cohort members decreases as more stimulus information becomes available. This model makes precise predictions about the moment at which any word can be recognized in a given lexicon from an analysis of its cohort members. The recognition point is assumed to correspond to the word's uniqueness point, or the moment that the word becomes unique with respect to all other words in the lexicon. A given target word spoken in isolation is assumed to be recognized when it is the only word remaining in the cohort. For example, a word like "elephant" heard in isolation is predicted to be identified at the sound /f/, since there are no other words in the lexicon sharing the initial sequence /elef/.
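Because Cohort I is so explicit, its central prediction can be written down in a few lines of code. The sketch below computes a word's uniqueness point by pruning a toy lexicon segment by segment; the lexicon and the quasi-phonemic spellings are hypothetical illustrations, and the prefix-matching rule is a simplification of the original activation/deactivation stages.

```python
# Minimal sketch of the Cohort I uniqueness point, assuming a toy lexicon
# of segment strings (both the lexicon and the transcriptions are made up).

LEXICON = ["elefant", "element", "elegant", "telefon"]

def uniqueness_point(word, lexicon):
    """Return the 1-based position at which `word` becomes the only
    remaining cohort member, or None if it never does."""
    cohort = set(lexicon)
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        cohort = {w for w in cohort if w.startswith(prefix)}
        if cohort == {word}:
            return i
    return None

print(uniqueness_point("elefant", LEXICON))
# 4: "elefant" parts company with "element" and "elegant" at the fourth segment, /f/.
```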
By its clarity and simplicity, Cohort I generates precise predictions about the time-course of recognition: Recognition should be a linear function of the position of the uniqueness point. The fact that these predictions can be tested and falsified makes the model attractive. However, some critics were quick to point out various ways in which this simple description fails to account for the robustness of human language perception (e.g., Norris, 1990).
To incorporate some psychologically more realistic assumptions, Marslen-Wilson (1987) proposed a new version of his model, Cohort II. This model appeals to the notion of level of activation to express the varying degree of match possible between the input and different lexical competitors. Cohort members vary in activation as a function of their fit with the input but also as a function of their frequency. While the status of words in the original model is binary (either in or out of the cohort), in the new formulation of the model cohort membership is a matter of degree. Still, the model does not specify how the frequency of words and their degree of match with the input determine activation. These factors and their relative contribution to lexical activation cannot be quantified in a verbal model, so no precise definition of the competitor set is yet available in Cohort II.
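The underspecification can be made concrete by writing the missing equation down. The sketch below scores cohort members with one arbitrary way of combining goodness of fit and word frequency; the weighting scheme, the log-frequency term, and all the numbers are hypothetical choices, which is precisely the kind of decision the verbal formulation of Cohort II leaves open.

```python
# One arbitrary instantiation of graded activation in the spirit of Cohort II.
# The combination rule and its parameters are hypothetical, not Marslen-Wilson's.
import math

def activation(match_score, frequency, match_weight=1.0, freq_weight=0.2):
    """Combine bottom-up fit (0..1) with log word frequency into a single
    activation value. Any other monotonic combination would satisfy the
    verbal model equally well -- which is exactly the problem."""
    return match_weight * match_score + freq_weight * math.log(frequency)

candidates = {"cigarette": (0.8, 120), "cigar": (0.7, 45), "shiver": (0.4, 30)}
for word, (fit, freq) in candidates.items():
    print(word, round(activation(fit, freq), 2))
```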
The preceding discussion of the two versions of the cohort model illustrates a general dilemma confronting efforts to model lexical processing. Cohort I makes clear and testable predictions, but at the price of several simplifying assumptions. In contrast, Cohort II is a more complex verbal model and presumably fits better with what we know about lexical processing. However, it does not provide direct answers to the questions concerning the competitor set and therefore cannot predict the time-course of word recognition.


We take the lesson to be the following. Simple, verbal models are helpful in shaping and formalizing questions and issues: As far as they capture the major dimensions of the problem, they provide a good account of it. However, psychological phenomena are generally affected by a large number of variables, and language performance is no exception. Some factors that must be dealt with in word recognition research include the form properties of words (quality of sensory input, length, phonological structure, etc.), the grammatical and abstract properties of words (syntactic form class, semantic category, word frequency and morphological structure) and the properties of the lexicon (number of competitors, form properties of competitors, grammatical and abstract properties of competitors). Most of these factors cannot be adequately taken into account using dichotomous categories, and psycholinguistics is often faced with intricate interactions and covariations of multiple factors.
Because verbal models are intrinsically limited in their ability to describe the influence of multiple factors and their interactions, they lead to inappropriate simplification for the sake of prediction. Simplification takes two forms: limiting the number of factors taken into account, and treating the factors as dichotomous rather than multi-valued.
One important consequence of these characteristics is the introduction of a bias toward an analytical methodology. Typical experimental designs manipulate only a small number of independent variables (often defined in a binary fashion) and attempt to control or neutralize other potential factors. While this analytical approach may be appropriate at a first stage in experimental research, it clearly fails to deal with the full complexity of cognitive phenomena. As we will argue below, the addition of computer simulation helps overcome this hurdle.
Another unfortunate consequence of the approach is that it leads to local theorizing, and (in MacKay's, 1993, terms) empirical theorization which is data-driven and domain-specific. There are several risks to local theorizing. One is the lack of integration of the research. This is abundantly illustrated in the often-blamed paradigm-driven research strategy. As pointed out by MacKay and others, miniature models designed to account for a small number of results have proliferated rather than merged into a single general theory (MacKay, 1988; Newell, 1973; Norman, 1980). Moreover, because local theorizing proceeds in isolation, a further danger is that it rarely refers to general principles and thus remains inherently descriptive or, at best, weakly explanatory. Accounts of phenomena such as lexical decision performance provide a good example of this. Lexical decision has mainly been studied as a specific language task, rather than as an example of a binary decision task applied to the domain of language, and thus without consideration of what is known about the mechanisms of binary decision tasks in general.
3.2. Underspecification
A related difficulty is that verbal models and information flow diagrams most often leave many details unspecified. The focus on the global architecture induced by the functional decomposition strategy has generally resulted in insufficient explicitness in the description of both the nature of representations at each stage and the processing mechanisms operating from one stage to the next. Many models of visual word recognition proposed in the last twenty years could be taken as illustrations of that limitation: Word recognition is decomposed into a sequence of transcoding operations, which are only specified in terms of their input-output relations. Little is known about the transcoding processes themselves. This feature is epitomized in Neisser's (1976) caricature of an information processing model of perception, in which the three successive boxes are labeled "processing," "more processing," and "still more processing," respectively. Even when the nature of the representations or the transcoding operations are more clearly specified, one important dimension that is not explicitly handled is the dynamic characterization of the processing, particularly for chronometric data (see Parisi & Burani, 1988, for further discussion). For instance, despite the large impact of the dual-route model of visual word recognition, the nature and the time course of grapheme-phoneme conversion have never been explicitly included in the formulation of the verbal models, and this has prevented attempts to disentangle the dual-route model from competing lexical analogy accounts. Thus, the lack of attention to the detailed picture often makes it hard to generate predictions that can be put to empirical test.
3.3. System Identifiability
Even if fully specified verbal models were available, however, they would not be immune to a third problem, that of model or system identifiability. The notion of system identifiability is discussed in some detail by Massaro and Cowan (1993). It was introduced by Moore (1956) in the context of formal automata theory, and refers to the problem of describing the inner workings of a machine when only its input and outputs are available. Moore demonstrated that any input-output mapping can be reproduced by many different automata, so that it would in general be impossible to uniquely identify the processing mechanism underlying some set of input-output pairs. Applied to psychological research, this claim seems to strongly undermine the information processing enterprise. However, as Massaro and Cowan aptly point out, there is only a partial similarity between the problems addressed by psychological inquiry and formal automata theory. One difference is that psychological investigation does not need to restrict its observations to the inputs and outputs of a processing component. It can extend the database by considering other measures of performance, such as chronometric data, neuropsychological or developmental observations.
Furthermore, one can add other constraints on the space of potential models by taking into account formal conditions such as simplicity and parsimony (see Jacobs & Grainger, 1994, for an extended discussion), external sources of evidence, such as neural limitations or neuroanatomical characteristics, or general principles of processing. Whether such external constraints will ever be sufficient to solve the issue of model identifiability is perhaps a matter of faith. However, one important consequence to which we will return later is that external constraints are crucially needed to restrict the set of admissible models.


4. The computational approach

4.1. A definition
Appealing again to the Encyclopædia Britannica, a computer simulation refers to

the use of a computer to represent the dynamic responses of one system by the behavior of another system modeled after it. A simulation uses a mathematical description, or model, of a real system in the form of a computer program. This model is composed of equations that duplicate the functional relationships within the real system. When the program is run, the resulting mathematical dynamics form an analog of the behavior of the real system, with the results presented as data.
We will restrict ourselves to a general discussion of the advantages, potential drawbacks and limitations of the modeling approach. We will not elaborate on the issue of model evaluation, which has been discussed recently by others in psycholinguistic research (see, e.g., Dijkstra & de Smedt, 1996; Jacobs & Grainger, 1994). Nor will we enter here into the debate about which particular modeling framework (e.g., symbolic, connectionist, distributed) is preferable or optimal.
Computational modeling refers to the use of computer programs to simulate some set of phenomena. Two benefits of the use of computer modeling in psychology are often mentioned. One is the requirement of full specification of the process under consideration, and the other is the model's ability to deal with empirical complexity. In view of the limitations of verbal accounts that we have described in the previous section, these advantages are important and deserve further discussion.
These benefits are particularly relevant within a perspective on model construction in which the designer starts with a verbal model and aims at implementing it as a computer program. However, we think that there is more to the modeling enterprise and that this restrictive vision of modeling-as-theory-implementation severely limits the benefits we can expect from the modeling endeavor. Borrowing partly from an analogy previously proposed by McCloskey (1991), Jacobs and Grainger (1994) describe two strategies for model construction that appeal to two different professions: the architect and the gardener. The architect starts from an explicit (verbal) theory of the target function, and implements it in a computational system. The gardener's strategy consists of

growing a model or network that mimics in some respect a human cognitive function, without necessarily having an explicit theory of that function (p. 1327).
In recent years, the gardener's strategy has become more feasible and promising, thanks to the availability of powerful automated learning algorithms in various fields of computer science, such as artificial neural networks, symbolic manipulation systems (see, e.g., Ling & Marinov, 1993), or probabilistic systems such as hidden Markov models.
Yet, it would probably be misleading to associate the gardener's approach too closely with the use of artificial neural networks, or even with the deployment of automatic adaptive procedures. The architect and the gardener are two extremes on a continuum ranging from a strict implementation strategy to a mere data-fitting strategy. It seems likely that every architect is endowed with a bit of the gardener's art, and that every gardener secretly entertains a sketch of the final accomplishment. In other words, interesting modeling work involves elements from an intentional and theoretically based design, but also unexpected features that emerge from the interplay of the assembled mechanisms. There are many examples in the history of Artificial Intelligence illustrating how unforeseen consequences arise from computational implementations.
In the remaining part of this section, we first discuss the three issues identified earlier from the architect's perspective, namely, detail specification, complexity, and system identifiability, and then continue by developing the specific issues that may arise from adopting the gardener's point of view.
4.2. Detail specification
Designing a running model of a given set of phenomena obviously forces the modeler to fill the details missing in the verbal theory, and this immediately pays off by permitting detailed, quantitative tests of predictions derived from the model's actual behavior. However, fixing the details to transform an abstract scheme into a working system is not easy. As any architect would know, the final appearance of the work may depend on the wallpaper choice as much as on the initial blueprints. Similarly, in creating a computer model, designers will encounter many unsettled issues, and their decisions, even totally arbitrary ones, may have a crucial influence on the performance of the system.
In this regard, it is instructive to examine the evolution from the verbal formulations of Cohort I to a related computational realization, the TRACE model. There is a direct filiation between these two models, as the following quotation testifies:

Although the [cohort] model is vague and fails to address many important issues, it is attractive enough so that we have used it as the basis for our initial attempt to build an interactive model of speech perception. (Elman & McClelland, 1984, p. 349)

It is thus interesting to examine how the models diverge from each other, and to establish what the constraints are that play a role in the implementation process.
TRACE is an interactive activation model made up of distinctive feature, phoneme, and word units that represent hypotheses about the sensory input. These three types of units are organized hierarchically (see Figure 1). There are bottom-up and top-down facilitatory connections between units on adjacent levels (feature-phoneme, phoneme-word, and word-phoneme) and inhibitory connections between units within levels (feature-feature, phoneme-phoneme, and word-word). Incoming sensory input provides bottom-up excitation of distinctive feature units, which in turn excite phoneme units. Phoneme units are activated as a function of their match with the activated distinctive features, so that several alternative phonemic units are activated for a given input. As the phonemes become excited, they increase the level of activation of words that contain them. As words receive some activation, they begin to inhibit each other. In addition, as words become activated, they also excite the phonemes that they contain in a top-down fashion.

Figure 1. A sketch of the TRACE model of spoken word recognition.
TRACE diverges from the cohort model in its assumptions concerning information or activation flow. These assumptions are derived from the principles of interactive activation models. Unlike Cohort, TRACE includes both lateral inhibition between word units and top-down activation from the word to the phoneme level. By the lateral inhibition mechanism, the target word inhibits its competitors, but is also inhibited by them. The degree to which one word inhibits another depends on the former's activation level: The more activated a word is, the more it can inhibit its competitors. The dynamics of interactive activation and, in particular, this lateral inhibition of competitors allow TRACE to keep the actual activated competitor set small and to converge on a single lexical entry despite the mass of lexical candidates that contend for recognition. According to the top-down activation mechanism, activated words provide top-down excitatory feedback to the phoneme units they contain by increasing the latter's level of activation. These phoneme units can in turn excite the connected word units.
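The following sketch condenses these interaction assumptions into a single activation update step for a handful of units. The connection weights, the decay value, and the two-word lexicon are invented for illustration and are far simpler than the actual TRACE implementation (which also involves feature-level units and time-specific copies of every unit).

```python
# Toy interactive-activation update in the spirit of TRACE (not the real model:
# weights, decay, and the tiny unit inventory are hypothetical).

DECAY = 0.1
EXCITE = 0.05    # between-level facilitation (phoneme-word, word-phoneme)
INHIBIT = 0.04   # within-level lateral inhibition (word-word)

words = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"]}

def step(phoneme_act, word_act):
    """One synchronous update: bottom-up excitation, top-down feedback,
    lateral inhibition between words, and passive decay; activations are
    clamped to [0, 1]."""
    new_words = {}
    for w, segments in words.items():
        bottom_up = EXCITE * sum(phoneme_act.get(p, 0.0) for p in segments)
        lateral = INHIBIT * sum(word_act[o] for o in words if o != w)
        new_words[w] = min(1.0, max(0.0, word_act[w] * (1 - DECAY) + bottom_up - lateral))
    new_phons = {}
    for p in phoneme_act:
        top_down = EXCITE * sum(word_act[w] for w, segs in words.items() if p in segs)
        new_phons[p] = min(1.0, max(0.0, phoneme_act[p] * (1 - DECAY) + top_down))
    return new_phons, new_words

phoneme_act = {"k": 1.0, "a": 1.0, "t": 0.6, "p": 0.2}  # bottom-up input favours /t/ over /p/
word_act = {w: 0.0 for w in words}
for _ in range(20):
    phoneme_act, word_act = step(phoneme_act, word_act)
print(word_act)  # "cat" ends up more active than "cap"
```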
The sequential and continuous properties of speech create a major challenge for computational models like TRACE. Indeed, since words can, in principle, begin at any point in the signal, TRACE must be able to represent every lexical candidate for each incoming input segment and to assign these candidates a position in the signal. TRACE proposes that time is represented spatially. For each time-slice, it constructs a complete network in which all the units at every level are represented. Thus, to recognize an input made up of four phonemes, TRACE constructs at least four (in fact, 4 × 6, since each phoneme extends over 6 time-slices) complete lexical networks and retains the time cycle at which each lexical unit begins. This solution of spatial reduplication is neither psychologically realistic nor efficient, as was pointed out by Norris (1990), who suggested that an alternative solution to the problem of representing time is provided by recurrent networks (see also Content & Sternon, 1994).
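A few lines of arithmetic make the cost of this spatial reduplication concrete. The sketch below counts the units needed when every phoneme and word unit is copied at every time slice; the inventory sizes are invented, and the counting rule is a simplification of the real TRACE bookkeeping.

```python
# Back-of-the-envelope cost of TRACE-style time reduplication.
# Inventory sizes are hypothetical; the point is the scaling behaviour.

N_PHONEME_TYPES = 15      # assumed phoneme inventory
N_WORDS = 200             # assumed lexicon size
SLICES_PER_PHONEME = 6    # as described in the text

def units_needed(n_input_phonemes):
    """Total phoneme and word units when each unit type is copied once per
    time slice spanned by the input."""
    n_slices = n_input_phonemes * SLICES_PER_PHONEME
    return n_slices * (N_PHONEME_TYPES + N_WORDS)

for n in (4, 10, 40):
    print(n, "input phonemes ->", units_needed(n), "units")
# 4 -> 5160, 10 -> 12900, 40 -> 51600: the unit count grows linearly with
# input length and multiplicatively with the size of the lexicon.
```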
We can thus distinguish three essential sources in the elaboration of the TRACE model: the pre-existing cohort model, the general assumptions derived from the interactive activation framework (graded activation, cascade processing, lateral inhibition, top-down excitation), and implementation constraints (i.e., the particular way the whole network is reduplicated to account for the time dimension).
Some implementation decisions on which TRACE is based have directly influenced the course of empirical research and have contributed to launch new issues or to reshape existing ones (see Frauenfelder, 1996, for a discussion). For instance, the role of lateral inhibition has led researchers to explore the nature and influence of lexical neighbors on auditory word recognition. Similarly, the reduplication of the network in time makes it possible to investigate the processing of continuous sequences of words, and has attracted attention to the issue of lexical segmentation and to the processing of words embedded in longer words (see Frauenfelder & Peeters, 1990).
Other examples abound. There is a similar filiation between the interactive activation model of visual word perception (McClelland & Rumelhart, 1981) and Morton's logogen model (1969):

Our model also draws on earlier work in the area of word perception. There is, of course, a strong similarity between this model and the logogen model of Morton (1969). What we have implemented might be called a hierarchical, nonlinear, logogen model with feedback between levels and inhibitory interactions among logogens at the same level. We have also added dynamic assumptions that are lacking from the logogen model (McClelland and Rumelhart, 1981, p. 388).
Yet, the two models have largely diverged in their influence on subsequent research. The logogen model has essentially inspired discussions in the neuropsychological literature about the componential architecture of the lexical function, leading to a multiplication of specific subsystems (Ellis & Young, 1988; Morton, 1980). The interactive activation model, besides its adoption in various areas of language and cognitive processing, has promoted renewed interest in more microscopic issues about lexical processing, such as the influence of lexical neighbors in the recognition process.


Design decisions can be motivated by different concerns, ranging from general theoretical postulates, empirical findings, and epistemological considerations (such as Occam's razor principle), to pragmatic constraints such as expediency and efficiency. As we have argued previously, it is always the case that neither the preexisting verbal theory nor the empirical database fully determines the model. This poses a problem in that pragmatic constraints may lead to decisions that are theoretically unmotivated, arbitrary, or ad hoc. One common problem involves representational choices in connectionist modeling. As noted by Dijkstra and de Smedt (1996), present empirical techniques provide scanty information regarding the format of mental representations. Thus, model designers are forced to refer to other constraints. For instance, the use of "wickelgraph" and "wickelfeature" representations in Seidenberg and McClelland's (1989) distributed model of visual word recognition and Rumelhart and McClelland's (1986) model of past-tense acquisition was partly guided by design considerations. In both cases, the authors acknowledged that their choices were meant to facilitate generalization, given other known characteristics of the connectionist framework adopted.
Critics and skeptics have been quick to question the role of such implementation choices in shaping the models' behavior. If, as some asserted (Bever, 1992; Lachter & Bever, 1988), these TRICS ("The Representations It Crucially Supposes") are primarily responsible for the models' successes, the interest of the demonstration is strongly undermined. More recent simulation work by Plaut, McClelland, Seidenberg and Patterson (1996) indeed suggests that the nature of the orthographic and phonological representations has a direct influence on the model's ability to generalize.
Two potential strategies may help clarify the extent to which the behavior of models depends on theoretically irrelevant implementation details. One is to test the robustness of the behavior across variations of implementation details. An example of this approach is provided by Plaut and Shallice's (1993) simulations of deep dyslexia, in which the authors carefully showed that the main behavioral characteristics resisted variations in network topology, sites of lesion, and training algorithms. A complementary approach is to abstract away the general design principles that are operating and which account for the functional characteristics of the realized model (Stone & Van Orden, 1994; Van Orden & Goldinger, 1994; Van Orden, Pennington & Stone, 1990).
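In practice, the first strategy amounts to a sweep over the implementation details one hopes are irrelevant, followed by a check that the behavioral signature of interest survives. The skeleton below shows that logic in schematic form; the variant labels, the simulated error counts, and all function names are hypothetical placeholders, not code from Plaut and Shallice's (1993) study.

```python
# Schematic robustness check across implementation details, in the spirit of
# Plaut and Shallice (1993). Everything below is a hypothetical placeholder.
import itertools
import random

TOPOLOGIES = ["small-hidden-layer", "large-hidden-layer"]
LESION_SITES = ["input-to-hidden", "hidden-to-output"]
ALGORITHMS = ["backprop", "contrastive-hebbian"]

def run_variant(topology, site, algorithm, rng):
    """Stand-in for building, training, and lesioning a network, then
    summarizing its error pattern. Here the counts are simply simulated;
    in a real study they would come from the lesioned model."""
    return {"semantic_errors": rng.randint(10, 30),
            "visual_errors": rng.randint(0, 15)}

def signature_holds(pattern):
    """The behavioral signature of interest, e.g. semantic errors
    outnumbering visual errors."""
    return pattern["semantic_errors"] > pattern["visual_errors"]

def robustness_report(seed=0):
    rng = random.Random(seed)
    report = []
    for combo in itertools.product(TOPOLOGIES, LESION_SITES, ALGORITHMS):
        report.append((combo, signature_holds(run_variant(*combo, rng))))
    return report

for combo, holds in robustness_report():
    print(combo, "signature holds:", holds)
```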
One question that may arise from the previous discussion is whether the modeling endeavor is worthwhile, given the apparent insufficiencies of the empirical database. Should we not wait until we know enough? Our answer is to turn the claim the other way around: We believe that the modeling enterprise provides an important side benefit, besides the immediate outcome of having a running computer model. By facing the constraints of implementation directly, modelers are forced to identify theoretical issues that might otherwise be overlooked. If we do not face these implementation constraints, we may remain ignorant of our own ignorance.


4.3. Complexity
Human behavior unfolds in time and is subtly sensitive to a huge number of factors. It thus seems natural to resort to dynamic systems to describe, formalize and simulate the complex interactions that determine the observed phenomena. Indeed, other sciences which share some of the same characteristics, such as economics or meteorology, gradually moved to computer modeling when hardware and software of sufficient power became available.
Within psychology, a similar move is occurring, and modeling techniques appear more and more as the appropriate interface between theoretical formulations and empirical observations. For instance, in a recent introductory paper on mathematical models in psychology, Estes (1993) notes: "Models are essential to set the stage for tests of hypotheses about theoretical concepts." Furthermore, he adds (pp. 9-10),

We are dealing with complex systems in which processes or mechanisms do not exist alone. [...] Models are also essential to the analysis of complex situations. In psychological research, we are always dealing with complex systems in which any observed behavior can be the resultant of many different, and often interacting, causal factors. Thus the outcomes of experiments can only be interpreted by comparing what is observed with what was expected from some simplified view of the situation, that is, a model.
What appears as one major achievement of computer models is (or should be) the generation of precise and detailed predictions encompassing rich ensembles of factors from a simple and limited set of assumptions. Besides the obvious precision gain (which may not be in itself the most interesting feature, given the limitations of empirical techniques), we see two more important improvements that depend upon the availability of more realistic simulation models. In short, we argue that simulation models may provide a partial solution to the limiting influence of the analytical bias in empirical research and to the ubiquitous problem of observational fragility.
4.3.1. Avoiding analytic bias
The power of current computing technology makes it possible to develop models which apply to relatively large bodies of stimulations. In recent years, many published simulation studies have incorporated realistic stimulus sets. When models compare adequately in scale, one immediate consequence is the possibility of comparing simulation and empirical results at the most detailed, fine-grained level. The availability of real-scale models makes it possible to obtain estimates of simulated performance for large sets of words, and thus to transcend some limitations associated with the standard factorial design in experimental research. Indeed, in recent years, an increasing number of research teams have begun to augment the standard experimental methodologies with studies using much larger stimulus samples and multivariate statistical analysis techniques (Seidenberg, Plaut, Petersen, McClelland & Patterson, 1994; Treiman, Mullennix, Bijeljac-Babic & Richmond-Welty, 1995).
These methods nicely complement the more traditional approach. First, they provide a welcome relief to those enduring the torturing task of searching for appropriate language stimuli varying along many selected dimensions and controlled for even more other dimensions (Cutler, 1980). Second, they go beyond factorial manipulations in handling the combination and interaction of factors that are characteristic of the real world. Moreover, when combined with appropriate simulations, they provide extremely powerful tools to assess the fine-grained adequacy of the model.
4.3.2. Observational fragility
Broadbent (1987) argued that small-scale computational models may offer a response to what he calls "the problem of observational fragility," that is, the fact that a minimal variation in task demands or experimental conditions can drastically modify the outcome of the experiment, leading researchers to question the generality of their accounts. Broadbent further suggested that this state of affairs is primarily due to the use of theoretical terms that are too imprecise and that do not capture the details of the experimental conditions or do not allow direct and explicit comparisons between predictions and observations. He illustrated the point by showing how a simple random walk model could account for the four typical result patterns observed in visual and memory search experiments, through limited variations of the model's parameters. McClelland (1988) reported another illustrative example showing how the recourse to simulation with the interactive activation model helped reconcile findings that previously appeared contradictory.
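To give a feel for Broadbent's point, the sketch below implements a generic random walk decision process and shows how changing only its drift and threshold parameters reshapes the predicted latencies; the parameter values and the mapping onto any particular search task are hypothetical, not Broadbent's (1987) actual model.

```python
# Generic random walk decision model; parameter values are illustrative only.
import random

def decision_time(drift, threshold, noise=1.0, rng=None, max_steps=10_000):
    """Accumulate noisy evidence until it crosses +threshold (one response)
    or -threshold (the other); return the number of steps taken."""
    rng = rng or random.Random(0)
    evidence, steps = 0.0, 0
    while abs(evidence) < threshold and steps < max_steps:
        evidence += drift + rng.gauss(0.0, noise)
        steps += 1
    return steps

def mean_time(drift, threshold, n=2000):
    rng = random.Random(42)
    return sum(decision_time(drift, threshold, rng=rng) for _ in range(n)) / n

# Small parameter changes alter the predicted latency pattern.
for drift, threshold in [(0.2, 5), (0.2, 10), (0.5, 5), (0.5, 10)]:
    print(f"drift={drift}, threshold={threshold}: mean steps ~ {mean_time(drift, threshold):.1f}")
```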
Interestingly, observational fragility or variability may be a problem for computer systems as much as it is a problem for experimentation. We recently experienced such difficulties in simulating the experimental results of a set of studies devoted to examining how the presence of an initial minimal mismatch in an auditory word (i.e., "shigarette," "focabulary") affected recognition. Our findings (Frauenfelder, Content & Scholten, 1995) suggested that such a minimal deviation did not prevent the activation of the target word. We then set up simulations to assess whether TRACE could account for the observations. Unfortunately, the implementation characteristics of the model (only a subset of the phoneme inventory of English is available) prevented us from using exactly the same stimuli as in the experiment. However, it was possible to run a "simulation experiment" that was close to the human situation. One intriguing result of the simulation was that the ability of the model to recover the intended word despite a minimal deviation was far from clear and varied to a large extent as a function of several factors.

With the original lexicon (which contained only about 200 words) and the parameter set provided by the authors, simulations confirmed that TRACE could recognize a fair proportion (75%) of stimuli with one feature onset mismatch. However, this finding was not replicated for a larger lexicon (approximately 1000 words), for which recognition performance on minimal onset deviations plummeted to below 25%.


This result is quite unexpected, since part of the original justification for the model (McClelland & Elman, 1986) was its supposed ability to activate words despite minor initial mismatches (as in the "shigarette" example). In addition, when the parameter controlling the top-down feedback from word to phoneme was turned off, the recognition rate for the original but especially for the mismatch stimuli improved considerably with the larger lexicon. Nonetheless, the words were still recognized relatively poorly with mismatching inputs (about 50% for minimal mismatches). The results suggest that, contrary to what is generally believed, TRACE does not reliably recognize words with minimal mismatches. Limited recognition of such stimuli can only be achieved at the expense of the key mechanism of top-down feedback required to account for lexical effects at the phoneme level.
This example drawn from our current research illustrates several issues. One is the problem of scaling. Because the behavior of the system depends in complex ways on its database, there is little guarantee that properties observed with a limited lexicon will generalize to a larger, more realistic one. Note, however, that the only way to assess the influence of corpus size is to explore it directly through simulations, and this, obviously, is only possible when a computer model is made available.
Second, it is extremely interesting that TRACE displays variability in its ability to recover from minimal mismatches. One could, of course, wonder whether this pattern corresponds to variability observed in human subjects. One way to answer that question would be to directly compare the performance of the computer system with the human data across stimuli, and to examine the fit on a point-to-point basis. Unfortunately, this cannot be done with the current version of TRACE. Another approach is to consider the computer system as an object of study in itself, and to use experimental and statistical techniques to identify the factors that explain the observed variability in its behavior.
Frauenfelder and Peeters (1990) appealed to quantitative lexical analyses to understand the behavior of TRACE. Their objective was to find the members of the activated lexical competitor set and their influence on the time-course of word recognition in TRACE. They tried to determine how the simulated recognition durations for a set of words could be predicted by different definitions of the competitors of these words (for example, candidates matching the input exactly or those with a small mismatch in their onset, like those in the experiments just described). The results show that competitors that match and are aligned with the target input, the cohort competitors, play the dominant role in determining the time-course of word recognition. Words with mismatching onsets did not affect the recognition time-course.
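The logic of that kind of analysis can be sketched as relating simulated recognition times to competitor counts under different definitions. Everything below (the toy lexicon, the made-up simulated recognition cycles, and the two competitor definitions) is a hypothetical reconstruction of the general approach, not Frauenfelder and Peeters' (1990) actual analysis.

```python
# Hypothetical illustration: relate simulated recognition times to competitor
# counts under two definitions (cohort-aligned vs. onset-mismatch competitors).

def count_cohort(word, lexicon, k=3):
    """Competitors sharing the word's first k segments (a crude 'cohort')."""
    return sum(1 for w in lexicon if w != word and w[:k] == word[:k])

def count_onset_mismatch(word, lexicon, k=3):
    """Competitors matching from the second segment on, with a deviant onset."""
    return sum(1 for w in lexicon if w != word and w[1:k] == word[1:k] and w[0] != word[0])

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Toy lexicon and invented simulated recognition cycles for a few target words.
lexicon = ["karpet", "karton", "kastle", "barton", "garden", "gardel", "marker"]
sim_cycles = {"karpet": 42, "kastle": 35, "garden": 44, "marker": 30}

targets = list(sim_cycles)
times = [sim_cycles[w] for w in targets]
for name, counter in [("cohort", count_cohort), ("onset-mismatch", count_onset_mismatch)]:
    counts = [counter(w, lexicon) for w in targets]
    print(name, "competitor counts:", counts, "r =", round(correlation(counts, times), 2))
```

With these invented numbers, the cohort-competitor count correlates strongly with the simulated recognition times while the onset-mismatch count does not, mirroring the kind of conclusion such an analysis is meant to support.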
This approach of relating the simulation results to quantitative analyses of the properties of the lexicon gives the researcher some leverage to pry open the black box and to understand the model's behavior. Indeed, although TRACE can generate activation curves and word recognition latencies for each word in its lexicon, it is still difficult to understand how it produces these results and to predict the outcome for a new input. As we have seen, the model often shows unexpected patterns of behavior. Part of the difficulty lies in understanding the complex interaction between the processing mechanisms (bottom-up activation, lateral inhibition and top-down activation) postulated by interactive activation theory. In this context, simulation models lead us well beyond the exercise of formalizing and implementing a verbal theory. Computer models are also of great heuristic value. As proposed by McCloskey (1991), they give us the equivalent of concrete animal models, which allow further exploration, permit identification of neglected factors, and lead to new research questions which deepen our understanding of the target cognitive system.
4.3.3. Locus of complexity
Loftus (1993) expressed the concern that the availability of extremely powerful automatic computation resources would deter researchers from the quest for general and simple principles. There is unanimous agreement that theories must be simple and general. However, the notion of simplicity is itself far from transparent, and there is no accepted scale to evaluate the simplicity of a theory or a model (but see Jacobs and Grainger, 1994, for some suggestions). Furthermore, simplicity, as a feature of the description of the system (human or artificial), should not be confused with simplicity as a characteristic of the system's behavior. Anybody who has ever approached dynamic system theories, chaos or fractals is aware of the paradoxical complexity associated with extremely simple mathematical functions.
The complexity of the phenomena that we are studying is a feature that we can enjoy or deplore, but we can do nothing to change it. As noted by Seidenberg (1993), the issue is far from new in psychology. To quote from a classical source (Miller, Galanter & Pribram, 1960): "No benign and parsimonious deity has issued us an insurance policy against complexity" (p. 182). By contrast, the use of simulation tools that embody simple mechanisms while producing complex behavior, familiarity with their functioning, and an analytic understanding of their properties are most likely to generate insights leading to simplified accounts.
4.4. System Identifiability
We have mentioned previously the general problem of system identifiability: Any
input-output mapping is compatible with an infinite equivalence class of algorithms.
This raises the possibility that the whole enterprise of developing process models
(be they verbal or computational) of cognitive abilities is futile and doomed to
undecidability, unless cognitive science can provide further constraints that reduce
the search space. How do we choose between models that appear equivalent in
descriptive adequacy?
A partial response is that theoretical models should be preferred not only on the basis
of their descriptive adequacy, but also in view of other characteristics, such as their
simplicity, scope, generality, heuristic value, and conformity to general principles.
A second response to this difficulty is the observation that theories may
be confronted with a rich empirical database, including measures other than input-output pairings. In most research areas of cognitive psychology, chronometric data
are available and can be used to assess the validity of theoretical models. As many
authors have noted, most verbal models cannot predict latency patterns directly.


Parisi and Burani (1988) argue that most verbal models are static, because they
rarely specify the fine-grained operations of the hypothesized components. At the
very best, they make predictions on nominal scales (the regularity effect in visual word
naming) or ordinal scales (the frequency effect; see Jacobs and Grainger, 1994),
although the dependent variable used is based on a ratio scale. In contrast, certain
computational models¹ can predict mean latencies at the level of an interval or ratio
scale and, if they involve some stochastic component, they might even be used to
account for variations in distributions (see Grainger & Jacobs, in press).

¹ Port and van Gelder (1995a) argue that computational models based on the symbol manipulation paradigm are intrinsically incapable of predicting the temporal course of processing, because "they leave time out of the picture, replacing it only with ersatz 'time': a bare sequence of symbolic states." Latency predictions are usually obtained by some transformation of response probabilities. By contrast, dynamical models describing how the state of the system evolves in time appear most appropriate to account for reaction time data.
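The contrast in measurement scales can be made concrete with a toy stochastic accumulator: a deterministic run yields a single interval-scale latency, while adding a noise term yields a whole distribution of latencies. The accumulator, its parameters and the threshold are illustrative assumptions, not the mechanism of any particular published model.

# A toy evidence accumulator: activation grows toward a threshold and the
# cycle on which the threshold is crossed is read off as the predicted
# latency. All parameter values are illustrative assumptions.
import random
random.seed(1)

def latency(drift=0.02, noise=0.0, threshold=1.0, max_cycles=5000):
    activation, cycles = 0.0, 0
    while activation < threshold and cycles < max_cycles:
        activation += drift + random.gauss(0.0, noise)   # noise = 0: deterministic
        cycles += 1
    return cycles

print("deterministic latency:", latency())               # one interval-scale value
samples = [latency(noise=0.02) for _ in range(2000)]      # a latency distribution
mean = sum(samples) / len(samples)
sd = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"stochastic: mean = {mean:.1f} cycles, sd = {sd:.1f} cycles")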
Models of wider generality are now appearing that handle not only the final
state of a particular cognitive ability but also its development, its inter-individual
fluctuations, and its pathological deterioration. As discussed above, thanks to their
specification of implementational details, process models can be confronted with a
much richer set of observations and be submitted to more stringent empirical tests.
Finally, one additional source of constraints that may help limit the search space
is the appeal to a limited set of computational principles that define a metatheoretical framework (or a scientific paradigm) for information processing theories. An example of such a set of general principles is delineated by McClelland
(1993; see also Plaut et al., 1996) under the acronym of "GRAIN" (Graded Random
Adaptive Interactive Nonlinear) networks. Other principles central to this approach
involve the notions of distributed representations and distributed knowledge. Obviously, neither this particular set of statements nor any other is currently universally
adopted or even accepted by the scientific community. Shouldn't we then first focus
on the abstract general principles, such as the componential structure of the system,
the characteristics of information flow, or the nature of computational primitives
and representations, rather than building detailed models and thereby incurring the
risk of getting lost in a forest of implementation details?
The trouble is that it may well be impossible to evaluate the validity of principles
in isolation. In discussing the psychological motivations of each principle, McClelland (1993) insists on their interdependence, and a similar argument was made by
Newell (1973) in his twenty-questions paper. Besides, such general abstract computational principles cannot be subjected to the empirical test of the falsification
strategy. Rather, as argued by MacKay (1993), among others, the fundamental assumptions that define a theoretical framework emerge gradually and gather support
through their repeated successes in generating simple, elegant and appropriate accounts of specific cognitive and linguistic processes; they are eliminated only when
an alternative set of principles becomes available.
A useful illustration of this process comes from the debate between supporters of
the connectionist framework and partisans of the symbolic approach concerning the
acquisition of morphology. Critics of the initial simulation study (Rumelhart &
McClelland, 1986) have pushed the conclusion that the connectionist approach was
in principle unable to account for the facts of language acquisition. Yet further
research (MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1989, 1990) has
shown that none of the criticisms was beyond the reach of connectionist techniques.
Although it is still unclear which of the current approaches has the best chance of
providing the most accurate and parsimonious account of morphological acquisition,
rejecting the whole framework because of the inadequacy of a particular instantiation
is logically unsound.
Another example can be found in an ongoing controversy about the effect of
word context upon phoneme processing. Massaro (1988, 1989b) observed that the
interactive activation model incorrectly predicted an interaction between phoneme
and context information, because of the feedback connections from word to phoneme
units. He thus concluded that the interactivity assumption was inappropriate. Yet,
McClelland (1991) later showed that the inclusion of a stochastic component in
the network changed the system's behavior, in a way that was more compatible
with empirical observations. Thus here also, two assumptions (stochasticity and
interactivity) may have interdependent consequences.
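A minimal sketch of where such a stochastic component can enter: Gaussian noise added to each unit's net input on every cycle, so that repeated presentations of the same input yield graded response probabilities rather than a single deterministic outcome. The two-unit network and its parameter values are hypothetical simplifications; they only locate the noise term and do not reproduce McClelland's (1991) stochastic interactive activation model or Massaro's data.

# One place where a stochastic component can enter: Gaussian noise added to
# each unit's net input on every cycle. A hypothetical two-unit competition,
# not McClelland's (1991) model; parameter values are arbitrary.
import random
random.seed(0)

def trial(evidence=(0.52, 0.48), inhibition=0.3, noise=0.15, cycles=50):
    act = [0.0, 0.0]
    for _ in range(cycles):
        new = []
        for i in (0, 1):
            net = evidence[i] - inhibition * act[1 - i]
            net += random.gauss(0.0, noise)          # intrinsic noise
            new.append(max(0.0, min(1.0, 0.9 * act[i] + 0.1 * net)))
        act = new
    return act.index(max(act))                       # which unit wins this trial

wins = sum(trial() == 0 for _ in range(1000)) / 1000
print(f"unit 0 wins on {wins:.0%} of trials")        # graded, not all-or-none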
In sum, given the interdependency of various assumptions, modeling projects
provide the most appropriate testing ground for the general principles that they
instantiate. Yet designers should pay attention not only to the descriptive adequacy
of their models but also to the relation between their models and general principles.
4.5. The gardener's problem: from simulation to theory
One conceptual difficulty that sometimes afflicts discussions of the role of modeling
techniques is the conflation between the computer program and the theory. Some
authors have gone as far as claiming that "Theories can be stated as computer
programs" (Simon, 1992, p. 152). In contrast, we consider that it is crucial to
insist on the distinction and complementarity between the simulation system and
the accompanying theoretical gloss. Computer simulations complement rather than
replace verbal descriptions. A clear statement of this complementarity appeared in
Palmer and Kimchi (1986), who argue against the notion that the computer program
as such constitutes a psychological theory, and insist on the importance of the
accompanying description:

a running simulation is only an IP [information processing] theory by
virtue of the fact that it too can be described by a flow diagram plus
mini-mapping theories of its components (p. 57).
Their major argument is that a computer program can be described at various
levels of specification, and that it may be difficult, without a verbal account, to
decide which levels of description are psychologically relevant. This is the problem
of mapping hypothetical constructs in the model onto their psychological counterparts. There is also, however, a related but distinct difficulty, which we call the
redescription problem. Modelers must specify the properties and characteristics
underlying the model's functioning at a level of abstractness that permits useful
and appropriate generalizations.
4.5.1. The mapping problem
The first point may seem obvious. A model is a metaphor, and a metaphor is illuminating only insofar as one clarifies the relevant features that the metaphorical object
shares with the target system, or better, the relevant level(s) of analysis at which
a correspondence may be established between the two systems. Yet, in practice,
making explicit and understanding the relationship between a simulation model and the
corresponding human process is far from trivial. A major cause of this difficulty is
that both human cognitive processes and computer programs are complex objects
that allow for a multiplicity of levels of description.
The classic reference on the issue of description levels is Marr's (1982) proposal, which identifies three levels of analysis of information
processing tasks. The three levels correspond to the computational description of
the system (the input-output mapping that the system realizes), its algorithmic
description (the algorithm used to perform the mapping), and its hardware implementation. Marr's discussion makes it clear that all three levels may contribute
to the understanding of the observed phenomena: some are explicable through
hardware properties (afterimages), while others (the Necker cube) require consideration
of both hardware properties and the algorithmic description. Furthermore, the notion
of algorithmic description masks the fact (known to everyone who has engaged in
any sort of computer programming project) that an algorithm can be described
at various grains, independently of the hardware specifications (cf. Palmer and
Kimchi's notion of recursive decomposition).
Given the multiplicity of potential algorithmic descriptions, a simulation model at
the algorithmic level could in principle be constructed to match the real function at
many different levels, from the most abstract level of the input-output mapping (as
happens, for instance, if a regression technique were used to derive a mathematical
function), to the finest-grained level of elementary processes, with all intermediate possibilities (such as, for instance, Massaro's, 1989a, Fuzzy Logical Model of
Perception, which assumes three stages of perceptual processing, namely evaluation of perceptual features, integration and decision, but restricts the simulation to an abstract
mathematical description of the integration and decision operations). Concerning
evaluation, it seems obvious that a (hypothetical) simulation model in which the correspondence goes down to the most elementary level is better, in scope and power,
than a model restricted to the most abstract level of mapping. Nevertheless, this
does not mean that starting at the most detailed level is the best research strategy.
As Marr suggested, it may be easier to start from a broad abstract characterization
of the function and gradually focus the microscope.
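For the intermediate case, the integration and decision stages of the Fuzzy Logical Model of Perception are usually rendered, for two response alternatives, as multiplicative integration of the support from each source followed by a relative-goodness (Luce choice) decision rule. The sketch below uses invented feature values to show the resulting pattern, namely large context effects for ambiguous segments and small ones for clear segments.

# An abstract-level rendering of the FLMP integration and decision stages
# for two response alternatives: support from each source is combined
# multiplicatively, and the decision stage applies a relative-goodness
# (Luce choice) rule. Feature values below are invented for illustration.
def flmp_two_alternatives(segment_support, context_support):
    """P(respond A) given two independent sources of support for A."""
    support_a = segment_support * context_support
    support_b = (1 - segment_support) * (1 - context_support)
    return support_a / (support_a + support_b)

# An ambiguous segment (0.5) is pulled around by context, a clear one (0.9)
# much less so: the characteristic FLMP prediction.
for segment in (0.5, 0.9):
    row = [round(flmp_two_alternatives(segment, c), 2) for c in (0.2, 0.5, 0.8)]
    print(f"segment support {segment}: {row}")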
These issues pertain not only to symbolic approaches to modeling, but also to the
artificial neural networks framework. Willshaw (1995) describes a formal technique
through which sets of symbolic and subsymbolic algorithms may be organized hierarchically in terms of their level of abstraction and implementation, and concludes
that "symbolic and subsymbolic algorithms are not neatly divided into two distinct
classes, with the one being at a 'higher' level than the other" (p. 16).
4.5.2. The redescription problem
The problem of redescription (extracting an appropriate description of the model's
functioning from simulation results and knowledge of its design, so as to allow useful
generalizations) may appear more acute if one adopts the gardener's approach,
though in no way would we argue that it is specific to that strategy. As we have
repeatedly stated, any reasonably complex model may at some point produce unexpected behavior. Indeed, our recent results with trace illustrate one case in
which the behavior of the system did not correspond to the description given by
its designers. It is the job of the designers (or, for that matter, of any serious user
of the model) to explore the details of the system's performance, the way it changes
with variations in the stimulus set or in parameter values, and to provide principled
and accurate accounts of how and why the system behaves the way it does.
The gardener's approach may, with much know-how and perhaps a bit of luck,
lead to an outcome that matches the empirical observations. Still, that is only
the beginning of the hard work. Simulations are not explanations. If we do not
understand the simulation process any more than we understand the real one, having
a running simulation of a given function is of little help. To borrow from a judicious
analogy introduced by Forster (1994), this would be no more helpful than having
a next-door neighbor capable of predicting, without explaining how, the outcome
of any experiment that we might design and run. To some extent, the problem is
similar to the use of statistical data-fitting techniques: A mathematical equation
may provide a descriptively and predictively adequate account of some regularity,
but not an explicit description of the process that produces the regularity itself,
and this strongly restricts possible generalizations.
This issue has arisen in recent years in the context of the assessment of the distributed artificial neural networks framework, and the discussion has centered on
Seidenberg and McClelland's (1989) model of visual word recognition and naming,
and its more recent derivatives (Plaut & McClelland, 1993; Plaut et al., 1996).
Note that the issue is not whether any of these models is empirically adequate, but
rather whether they provide or even lead to adequate theories of cognitive functions.
McCloskey (1991) argued that the theoretical claims formulated by Seidenberg and
McClelland are vague and too general, and that the theoretical elaboration fails
to describe how the network accomplishes its task, because of our limited understanding of complex connectionist networks. Yet such a description of processing
is certainly no less appropriate or informative than any other type of model currently available. As noted by Seidenberg (1993), "there is a rich theory here: it
has only to be acknowledged" (p. 233). Granted, the description leaves many details
unspecified, it may be incomplete, the mechanics of the model rest on new
and unfamiliar notions, it is implausible in some respects, and many aspects of its
performance could be further explored. However, similar remarks could be made
about any other modeling effort.


McCloskey (1991) concluded by arguing that the design (or, for that matter,
the growing) of connectionist networks should be viewed as analogous to the
use of animal models rather than as the simulation of theories of human cognitive functions.
He further stated that, just like animal models, connectionist systems are objects
of study in themselves, which may aid in developing theories of cognitive systems
thanks to their similarity to the human system. We would simply add that one
difference between animal models and computer models is that the availability of
the former is limited and constrained by natural selection, whereas the latter are
afforded through design principles and constrained by preexisting theoretical hypotheses. From that perspective, the study of artificial simulation systems may be
the only way to examine the implications of a set of computational principles and
assess their validity in accounting for human information processing.
5. Conclusions
We started our discussion by asking some simple questions: Why use computer
modeling in cognitive psychology? In what ways does the exercise of computer
modeling techniques modify the nature of psychological research?

We consider that a defining characteristic of cognitive psychology is the search
for a particular kind of scientific explanation, one that accounts for the
behavioral characteristics of human performance in terms of the organization and
mechanisms of mental functions. Thus, empirical regularities observed in performance are used to draw a number of conclusions regarding a hypothetical mental
function, the architecture and components it requires, and its probable mode of operation, so that the empirical observations can be reduced to logical and necessary
consequences of the characteristics of that mental machinery.
In this framework, it seems to us that a useful heuristic (perhaps even the only
one) is to create models, that is, to produce theoretical elaborations that describe the relevant characteristics of the function, and to explore how well they
account for the empirical observations. The use of computer modeling is a natural
and obvious extension of this endeavor. Rather than limiting themselves to a verbal
description of an imaginary mechanism, designers of computer models attempt to
concretize the mechanism as a computer program.
Is this modeling enterprise worth the effort? We have analyzed several types of
difficulties encountered in current empirical research, and have argued that computer modeling provides appropriate tools to confront these problems.
One basic problem stems from the great complexity of our object of study, that
is, the graded and multidimensional nature of mental functions. Computer modeling provides a good way of dealing with this intrinsic complexity and with the
dynamic nature of information processing systems. In contrast, verbal models can
make only simple processing predictions, and our capacity to grasp these predictions
is even more limited. Very limited, indeed: those readers who have tried to present
in any detail the subtleties of the dual-route model of visual word recognition to
their students will know how limited our capacity to compute mentally the logical
consequences of a (very) simple architecture may be. Few of us can imagine without external help the combined evolution of more than two elementary differential
equations over time.
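As a small illustration, even two coupled linear differential equations describing mutually inhibiting units already call for numerical help; the equations and parameter values in the sketch below are arbitrary illustrative choices.

# Two mutually inhibiting units, each governed by one differential equation:
#     da_i/dt = input_i - decay * a_i - inhibition * a_j
# Even this minimal pair is hard to run "in the head"; a few lines of Euler
# integration do it for us. Parameter values are arbitrary.
def simulate(inputs=(1.0, 0.8), decay=1.0, inhibition=0.6, dt=0.05, steps=200):
    a1, a2 = 0.0, 0.0
    for _ in range(steps):
        da1 = inputs[0] - decay * a1 - inhibition * a2
        da2 = inputs[1] - decay * a2 - inhibition * a1
        a1, a2 = a1 + dt * da1, a2 + dt * da2
    return round(a1, 3), round(a2, 3)

print(simulate())   # the unit with the stronger input wins by more than the input ratio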
We have also argued that modeling forces researchers to elaborate more detailed
and fully specified accounts. Although any implemented model involves many arbitrary decisions, the "full specification constraint" is a positive pressure that may drive
scientific progress. In fact, any arbitrary implementation choice hides a potential
empirical issue: it suffices that another designer suggest a different solution, and
that the resulting models perform differently or lead to distinct predictions.
We have also suggested that the use of modeling techniques helps delimit the
space of potential explanations by enlarging the scope of theoretical accounts and
also by referring to general principles of processing. By exploring the intrinsic
characteristics of the model, psychologists may be led toward accounts that are
more strongly motivated theoretically. Models are also concrete objects, which lend
themselves to further study. Access to computational models gives psychologists
collections of hypothetical devices that may be constructed, deconstructed, and
manipulated at will. We have illustrated how the elaboration of computer models
leads to the identification of new research issues, and how the exploration and
systematic study of their performance can help us understand the behavioral
characteristics of hypothetical processing systems.
Finally, we have claimed that to be useful from a psychological viewpoint, computer programs should be accompanied by an appropriate description. Jointly these
make it possible to establish how the elements of the designed system map onto the
real function, and how the behavioral characteristics of the system emerge from its
design features. In our view, the major change that computer models introduce into
psychological research is that they allow a dyadic confrontation between empirical
observations and verbal models to be transformed into a triadic and interactive
confrontation between data, theories and implemented simulation systems.
6. References
Bever, T. (1992). The demons and the beast - modular and nodular kinds of knowledge. In R. G. Reilly & N. E. Sharkey (Eds.), Connectionist approaches to
natural language processing, (pp. 213-252). Hillsdale, NJ: Lawrence Erlbaum.
Broadbent, D. (1987). Simple models for experimentable situations. In P. Morris
(Ed.), Modelling cognition, (pp. 169-185). London: Wiley.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, Ma: MIT Press.
Content, A., & Sternon, P. (1994). Modelling retroactive context eects in spoken
word recognition with a simple recurrent network, Proceedings of the 16th
Annual Conference of the Cognitive Science Society, 207-212. Hillsdale, NJ:
Lawrence Erlbaum.


Cutler, A. (1980). Making up materials is a confounded nuisance, or: Will we be
able to run any psycholinguistic experiments at all in 1990? Cognition, 10, 65-70.
Dijkstra, A., & de Smedt, K. (1996). Computer models in psycholinguistics: an introduction. In A. Dijkstra & K. de Smedt (Eds.), Computational psycholinguistics: AI and Connectionist Models of Human Language Processing (pp.
1-23). London: Taylor & Francis.
Ellis, A. W., & Young, A. W. (1988). Human cognitive neuropsychology. Hove:
Lawrence Erlbaum.
Elman, J. L., & McClelland, J. L. (1984). Speech perception as a cognitive process:
the interactive activation model. In N. Lass (Ed.), Speech and language:
Advances in basic research and practice, (Vol. 10, pp. 337-374). New York:
Academic Press.
Estes, W. K. (1993). Mathematical models in psychology. In G. Keren & C. Lewis
(Eds.), A handbook for data analysis in the behavioral sciences: methodological issues, (pp. 3-19). London: Lawrence Erlbaum.
Forster, K. I. (1994). Computational modeling and elementary process analysis
in visual word recognition. Journal of Experimental Psychology : Human
Perception and Performance, 20, 1292-1310.
Frauenfelder, U. (1996). Computational modelling of spoken word recognition. In
A. Dijkstra & K. de Smedt (Eds.), Computational psycholinguistics: AI and
Connectionist Models of Human Language Processing (pp. 114-138). London: Taylor & Francis.
Frauenfelder, U. H., Content, A. & Scholten, M. (1995). Activation and deactivation
in spoken word recognition. Paper presented at the 36th annual meeting of
the Psychonomics Society, Los Angeles, November 1995.
Frauenfelder, U. H., & Peeters, G. (1990). On lexical segmentation in trace :
an exercise in simulation. In G. T. M. Altmann (Ed.), Cognitive Models of
Speech Processing, (pp. 50-86). Cambridge, Ma: M.I.T. Press.
Gibson, E. J. (1994). Has psychology a future? Psychological Science, 5, 69-76.
Grainger, J., & Jacobs, A. M. (in press). Orthographic processing in visual word
recognition: a multiple read-out model. Psychological Review.
Jacobs, A. M., & Grainger, J. (1994). Models of visual word recognition - Sampling
the state of the art. Journal of Experimental Psychology : Human Perception
and Performance, 20, 1311-1334.


Johnson-Laird, P. N. (1983). Mental models. Cambridge: Cambridge University Press.
Johnson-Laird, P. N. (1988). The computer and the mind. Cambridge, Ma.: Harvard University Press.
Lachter, J., & Bever, T. G. (1988). The relation between linguistic structure and
associative theories of language learning - A constructive critique of some
connectionist learning models. Cognition, 28, 195-247.
Ling, C., & Marinov, M. (1993). Answering the connectionist challenge: a symbolic
model of learning the past tenses of English verbs. Cognition, 49, 235-290.
Loftus, G. (1993). Computer simulation: some remarks on theory in psychology.
In G. Keren & C. Lewis (Eds.), Data analysis in the behavioral sciences: Statistical issues,
(pp. 477-491). Hillsdale, NJ: Lawrence Erlbaum.
MacKay, D. G. (1988). Under what conditions can theoretical psychology survive
and prosper? Integrating the rational and empirical epistemologies. Psychological Review, 95, 559-565.
MacKay, D. G. (1993). The theoretical epistemology: a new perspective on some
long-standing methodological issues in psychology. In G. Keren & C. Lewis
(Eds.), A handbook for data analysis in the behavioral sciences: methodological issues, (pp. 229-255). London: Lawrence Erlbaum.
MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121-157.
Marr, D. (1982). Vision. New York: Freeman.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word recognition.
Cognition, 25, 71-102.
Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical
access during word recognition in continuous speech. Cognitive Psychology,
10, 29-63.
Massaro, D. W. (1988). Some criticisms of connectionist models of human performance. Journal of Memory and Language, 27, 213-234.
Massaro, D. W. (1989a). Multiple book review of Speech perception by ear and by
eye: a paradigm for psychological inquiry. Behavioral and Brain Sciences,
12, 741-794.


Massaro, D. W. (1989b). Testing between the trace model and the Fuzzy Logical
model of speech perception. Cognitive Psychology, 21, 398-421.
Massaro, D. W., & Cowan, N. (1993). Information processing models: microscopes
of the mind. Annual Review of Psychology, 44, 383-426.
McClelland, J. L. (1988). Connectionist models and psychological evidence. Journal
of Memory and Language, 27, 107-123.
McClelland, J. L. (1991). Stochastic interactive processes and the eect of context
on perception. Cognitive Psychology, 23, 1-44.
McClelland, J. L. (1993). Toward a theory of information processing in graded,
random and interactive networks. In D. E. Meyer & S. Kornblum (Eds.), Attention and Performance, (Vol. XIV, pp. 655-688). Hillsdale, NJ: Lawrence
Erlbaum.
McClelland, J. L., & Elman, J. L. (1986). The trace model of speech perception.
Cognitive Psychology, 18, 1-86.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of
context eects in letter perception: Part 1. An account of basic ndings.
Psychological Review, 88, 375-405.
McCloskey, M. (1991). Networks and theories : the place of connectionism in
cognitive science. Psychological Science, 2, 387-395.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of
behavior. New York: Holt, Rinehart, & Winston.
Moore, E. F. (1956). Gedanken-experiments on machines. In C. E. Shannon & J.
McCarthy (Eds.), Automata studies. Princeton, NJ: Princeton University Press.
Morais, J., Alegria, J., & Content, A. (1987). The relationship between segmental
analysis and alphabetic literacy: An interactive view. Cahiers de Psychologie
Cognitive, 7, 415-438.
Morton, J. (1969). The interaction of information in word recognition. Psychological Review, 76, 165-178.
Morton, J. (1980). The logogen model and orthographic structure. In U. Frith
(Ed.), Cognitive processes in spelling, (pp. 117-135). London: Academic
Press.
Neisser, U. (1976). Cognition and reality. San Francisco: Freeman.


Newell, A. (1973). You can't play 20 questions with nature and win. In W. E. Chase
(Ed.), Visual information processing, (pp. 283-308). New York: Academic
Press.
Newell, A. (1990). Unified theories of cognition. Cambridge, Ma: Harvard University Press.
Norman, D. A. (1980). Twelve issues for cognitive science. In D. A. Norman
(Ed.), Perspectives on cognitive science, (pp. 265-295). Hillsdale: Lawrence
Erlbaum.
Norris, D. (1990). A Dynamic-net model of human speech recognition. In G. T. M.
Altmann (Ed.), Cognitive Models of Speech Processing , (pp. 87-104). Cambridge,
Ma: M.I.T. Press.
Palmer, S. E., & Kimchi, R. (1986). The information processing approach to cognition. In T. J. Knapp & L. C. Robertson (Eds.), Approaches to cognition:
contrast and controversies, (pp. 37-77). Hillsdale: Lawrence Erlbaum.
Parisi, D., & Burani, C. (1988). Observations on theoretical models in neuropsychology of language. In F. Denes, C. Semenza, P. Bisiacchi, & E. Andreewsky
(Eds.), Perspectives on cognitive neuropsychology. London: Lawrence Erlbaum.
Plaut, D., & McClelland, J. L. (1993). Generalization with componential attractors: word and nonword reading in an attractor network. Paper presented at
the 15th Annual Conference of the Cognitive Science Society, Hillsdale.
Plaut, D., & Shallice, T. (1993). Deep dyslexia: a case study of connectionist
neuropsychology. Cognitive Neuropsychology, 10, 377-500.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: computational principles in
quasi-regular domains. Psychological Review, 103, 56-115.
Plunkett, K., & Marchman, V. (1989). Pattern association in a back-propagation
network: implications for child language acquisition (Technical Report 8902):
Center for Research in Language, University of California at San Diego.
Plunkett, K., & Marchman, V. (1990). From rote learning to system building
(Technical Report 9020): Center for Research in Language, University of
California at San Diego.
Port, R. F., & van Gelder, T. (1995a). It's about time: an overview of the dynamical
approach to cognition. In R. F. Port & T. van Gelder (Eds.), Mind as motion,
(pp. 1-44). Cambridge, Ma.: MIT Press.


Port, R. F., & van Gelder, T. (1995b). Mind as motion. Cambridge, Ma.: MIT
Press.
Pylyshyn, Z. W. (1984). Computation and cognition. Cambridge, Ma: MIT Press.
Rohl, M., & Pratt, C. (1995). Phonological awareness, verbal working memory and
the acquisition of literacy. Reading and Writing, 7, 327-360.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English
verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel Distributed
Processing, (Vol. 2, pp. 216-271). Cambridge, Ma: M.I.T. Press.
Seidenberg, M. S. (1993). Connectionist models and cognitive theory. Psychological
Science, 4, 228-235.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model
of word recognition and naming. Psychological Review, 96, 523-568.
Seidenberg, M. S., Plaut, D. C., Petersen, A. S., McClelland, J. L., & Patterson, K.
(1994). Nonword pronunciation and models of word recognition. Journal of
Experimental Psychology : Human Perception and Performance, 20, 1177-1196.
Simon, H. A. (1992). What is an "explanation" of behavior? Psychological Science,
3, 150-161.
Stone, G. O., & Van Orden, G. C. (1994). Building a resonance framework for
word recognition using design and system principles. Journal of Experimental
Psychology : Human Perception and Performance, 20, 1248-1268.
Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995).
The special role of rimes in the description, use, and acquisition of English
orthography. Journal of Experimental Psychology : General, 124, 107-136.
Van Orden, G. C., & Goldinger, S. D. (1994). Interdependence of form and function
in cognitive systems explains perception of printed words. Journal of Experimental Psychology : Human Perception and Performance, 20, 1269-1291.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identi cation
in reading and the promise of subsymbolic psycholinguistics. Psychological
Review, 97, 488-522.
Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing
and its causal role in the acquisition of reading skills. Psychological Bulletin,
101, 192-212.
Willshaw, D. (1995). Symbolic and subsymbolic approaches to cognition. In L. S.
Smith & P. J. B. Hancock (Eds.), Neural computation and psychology, (pp. 3-18).
London: Springer.
