
Approaches to Phonological Complexity

Phonology and Phonetics


16
Editor
Aditi Lahiri
Mouton de Gruyter
Berlin New York
Approaches to
Phonological Complexity
edited by
François Pellegrino
Egidio Marsico
Ioana Chitoran
Christophe Coupé
Mouton de Gruyter
Berlin New York
Mouton de Gruyter (formerly Mouton, The Hague)
is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
Printed on acid-free paper which falls within the guidelines
of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Approaches to phonological complexity / edited by François Pelle-
grino [et al.].
p. cm. (Phonology and phonetics ; 16)
Includes bibliographical references and index.
ISBN 978-3-11-022394-1 (hardcover : alk. paper)
1. Complexity (Linguistics) 2. Phonetics. 3. Grammar, Com-
parative and general–Phonology. I. Pellegrino, François, 1971–
P128.C664A77 2009
414–dc22
2009043030
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.
ISBN 978-3-11-022394-1
ISSN 1861-4191
Copyright 2009 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin.
All rights reserved, including those of translation into foreign languages. No part of this
book may be reproduced in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system, without permission
in writing from the publisher.
Cover design: Christopher Schneider, Laufen.
Printed in Germany.

Table of contents

Introduction 1
François Pellegrino, Egidio Marsico, Ioana Chitoran and
Christophe Coupé
Part 1: Complexity and phonological primitives
Complexity in phonetics and phonology: gradience, categoriality,
and naturalness 21
Ioana Chitoran and Abigail C. Cohn
Languages' sound inventories: the devil in the details 47
John J. Ohala
Signal dynamics in the production and perception of vowels 59
René Carré
Part 2: Typological approaches to measuring complexity
Calculating phonological complexity 85
Ian Maddieson
Favoured syllabic patterns in the world's languages and
sensorimotor constraints 111
Nathalie Vallée, Solange Rossato and Isabelle Rousset
Structural complexity of phonological systems 141
Christophe Coupé, Egidio Marsico and François Pellegrino
Scale-free networks in phonological and orthographic wordform
lexicons 171
Christopher T. Kello and Brandon C. Beltz



Part 3: Phonological representations in the light of complex
adaptive systems
The dynamical approach to speech perception: From fine phonetic
detail to abstract phonological categories 193
Noël Nguyen, Sophie Wauquier and Betty Tuller
A dynamical model of change in phonological representations:
The case of lenition 219
Adamantios Gafos and Christo Kirov
Cross-linguistic trends in the perception of place of articulation in
stop consonants: A comparison between Hungarian and French 241
Willy Serniclaes and Christian Geng
The complexity of phonetic features' organisation in reading 267
Nathalie Bedoin and Sonia Krifi
Part 4: Complexity in the course of language acquisition
Self-organization of syllable structure: a coupled oscillator model 299
Hosung Nam, Louis Goldstein and Elliot Saltzman
Internal and external influences on child language productions 329
Yvan Rose
Emergent complexity in early vocal acquisition: Cross-linguistic
comparisons of canonical babbling 353
Sophie Kern and Barbara L. Davis
Index 377
List of Contributors 383





Introduction
François Pellegrino, Egidio Marsico, Ioana Chitoran
and Christophe Coupé
1. The study of complexity in phonology and phonetics
What is complex? What is not complex, or simple? Is there a gap between
simple and complex? Or is complexity gradient? While universal answers
to these questions are probably of limited relevance, their resolution in
specific fields of research may be crucial, especially in biology or social
sciences, where complexity factors may play a highly significant role in the
emergence and the evolution of systems, whatever they are (Edmonds,
1999).
In phonetics and phonology, these questions have been present for more
than a century. For example, according to Zipf (1935:49), there exists "an
equilibrium between the magnitude or degree of complexity of a phoneme
and the relative frequency of its occurrence". In this controversial work, he
thus tried to evaluate the magnitude of complexity of phonemes from articu-
latory effort (Zipf, 1935:66; but see also Joos, 1936) under the assumption
that it plays a major role in phonetic changes as well as in the structure of
phonological systems. Soon afterwards, Trubetzkoy reanalysed this interac-
tion in terms of markedness (1938:282), leading the way to a long-lasting
tradition of intricate relationships between the notions of markedness, fre-
quency, complexity and functional load, well exemplified by this quotation
from Greenberg, forty years later:
Are there any properties which distinguish favored articulations as a group
from their alternatives? There do, as a matter of fact, appear to be several
principles at work. [There is one] which accounts for a considerable number
of clusters of phonological universals (…) This is the principle that of two
sounds that one is favored which is the less complex. The nature of this
complexity can be stated in quite precise terms. The more complex sound
involves an additional articulatory feature and, correspondingly, an addi-
tional acoustic feature which is not present in the less complex sound. This
additional feature is often called a mark and hence the more complex, less
favored alternative is called marked and the less complex, more favored al-
ternative the unmarked. (…) It may be noted that the approach outlined here
avoids the circularity for which earlier formulations, such as those of Zipf,
were attacked. (…) In the present instance, panhuman preferences were in-
vestigated by formulating universals based in the occurrence or non-
occurrence of certain types, by text frequency and other evidence, none of
which referred to the physical or acoustic nature of the sounds. Afterward, a
common physical and acoustic property of the favored alternatives was
noted employing evidence independent of that used to establish the univer-
sals (Greenberg, 1969:476-477).
Indeed, the notion of phonological complexity is implicitly present in nu-
merous works dealing with linguistic typology and universals (as in
Greenberg's quotation), language acquisition (e.g. Demuth, 1995) and historical
linguistics. Articulatory cost, perceptual distinctiveness and systemic con-
straints have thus been proposed as driving forces for explaining sound
changes (Lindblom & Maddieson, 1988; Lindblom, 1998:245), beside an
undisputed social dimension. The role of such mechanisms has also been
extended to the structure of language systems, leading some linguists to
postulate a balance of complexity within language grammar, a lack of com-
plexity in one component being compensated by another more complex
component (e.g. Hockett, 1958:180-181). However, this assumption is
highly debated and still unresolved (Fenk-Oczlon & Fenk, 1999, 2005;
Shosted, 2006).
However, one must acknowledge that the word complexity itself has not
often been explicitly referred to, even if it underlies several salient ad-
vances in phonetics and phonology.
For example, when Ohala pointed out that an exotic consonant invento-
ry such as { k ts m r | } is not observed in languages with few conso-
nants, he suggested that a principle of economy is at work at the systemic
level (Ohala, 1980; but see also, Ohala, this volume). Consequently, one
can infer that the above system is too complex to be viable; but too com-
plex with respect to what? And how can this complexity be measured? Is it
a matter of the global number of articulatory features, of intrinsic phonemic
complexity, or of the overall size of the phonetic space used in this language?
Contrary to what Greenberg said, measuring complexity is not straightfor-
ward, even when the problem is narrowed, for instance to articulatory com-
plexity (e.g. Ohala, 1990:260) and we still lack relevant tools. Lindblom
and Maddieson (1988) began to address this question and proposed to di-
vide consonants into three sets (simple, elaborated and complex) according
to their articulatory complexity. They analysed the distribution of these
segments in the UPSID database (Maddieson, 1984) and they sug-
gested that languages have a tendency to use consonants and vowels picked
from an adaptive phonetic space according to the number of elements in
their inventories. This influential paper combined a typological survey and
a theoretical attempt to decipher the mechanisms responsible for the ob-
served patterns (see also, Lindblom, 1998; Lindblom, 1999). In this sense,
it built a bridge between the bare issue of complexity measurement and
the use of methods fostered by physics and cybernetics to account for the
general behaviour of languages, viewed as dynamical systems. In a late
work, Jakobson judged that:
Like any other social modelling system tending to maintain its dynamic
equilibrium, language ostensively displays its self-regulating and self-
steering properties. Those implicational laws which build the bulk of pho-
nological and grammatical universals and underlie the typology of languag-
es are embedded to a great extent in the internal logic of linguistic struc-
tures, and do not necessarily presuppose special 'genetic' instructions
(Jakobson, 1973:48).
Phenomena such as self-organisation, evoked above, and emergence, which
also comes to mind in this view, are commonly found in the study of com-
plex adaptive systems, a subfield of the science of complexity. These ap-
proaches connect the microscopic level (the components and their interac-
tions) to the macroscopic level (the system and its dynamic behaviour), and
they aim at explaining complex patterns with general mechanisms without
any teleological considerations.¹ As far as phonetics and phonology are
concerned, these perspectives have already generated a noteworthy litera-
ture (e.g. Kelso, Saltzman & Tuller, 1986; Lindblom, 1999) and several
recent developments are described in this book (mostly in Part 3). The next
section provides some landmarks necessary to grasp the aims and con-
tent of this book.
2. Complex adaptive systems and the science of complexity
Since the middle of the twentieth century, scientists from numerous fields
of research, ranging from physics to graph theory, and from biology to
economics and linguistics, have built a web of theories, models and no-
tions, known today as the Science of Complexity. This paradigm pertains to
our everyday experience, and has provided us with insights into phenomena
as distinct at first glance as properties of ferromagnetic materials with
respect to temperature, motion patterns of persons on crowded sidewalks or
of fish schools, social behaviour of ants or termites, fluctuations of finan-
cial markets, etc. (e.g. Markose, 2005; Theraulaz et al., 2002; Gazi and Pas-
sino, 2004). The strength of this approach probably dwells in its protean
capacity, an adaptability that has been described by Lass (1997:294) as a
"syntax without a semantics" preventing any ontological commitment.
The exact scope of disciplines and methodologies which can potentially
benefit from this new science is therefore not restricted, and a reanalysis of
long-lasting open issues in the light of complexity leads to exciting connec-
tions to most areas of research, and linguistics is not an exception.
The main focus of the Science of Complexity is the study of complex
systems. A system is said to be complex when its overall behaviour exhibits
properties that are not easily predicted from the individual description of
the parts of the system. Hence, a car is not really complex but just compli-
cated: it consists of many interacting parts, but the behaviour of the car is
predictable from its components (and that is why we can safely drive it).
On the contrary, when the same car is caught in a traffic jam, it becomes
very difficult to predict the evolution of the blockage and even the individ-
ual trajectory of this particular car: the interaction of the cars (and of their
respective drivers) generates a complex collective pattern.
An essential element of complex systems lies in the interaction between
each component and its environment. Systems may differ in terms of the
reactivity of their components (an ant cannot match a human being when it
comes to analyzing and reacting to the environmental input), but a minimal
threshold has to be exceeded for complex behaviours to appear. Besides,
complex systems are generally explained by recourse to the notions of non-
linearity and emergence.
Nonlinearity refers to phenomena for which the effect of a perturbation
is not proportional to its initial cause, due to the complex network of inte-
ractions in which it is entangled. The famous butterfly effect, popularized
in chaos theory, illustrates the sensitivity to the initial conditions that de-
rives from this entanglement. Emergence refers to the appearance of structures
at the overall level, from the interactions of the components of a dynamic
complex system. Such structures can apply to relevant dimensions of the
system (like its spatial organization) but also unfold in time with the con-
sistent occurrences of transient or stable states. These emergent properties
often result from trade-offs between conflicting constraints and from self-
organizing processes that can stabilize the system enough for such regulari-
ties to appear.
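This sensitivity to initial conditions can be illustrated with a standard textbook example, the logistic map; the example is purely illustrative and is not drawn from any chapter of this volume:

```python
# Minimal illustration of nonlinearity: the logistic map
# x -> r * x * (1 - x). In its chaotic regime (r = 4), two
# trajectories started a billionth apart diverge until they are
# effectively unrelated.

def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-9)  # perturb the start by 10^-9

# Difference after one step vs. maximal difference over later steps.
print(abs(a[1] - b[1]),
      max(abs(x - y) for x, y in zip(a[25:], b[25:])))
```

A linear system would amplify the initial perturbation at most by a constant factor per step; here the perturbation grows until it saturates at the scale of the trajectory itself.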
Most complex systems do not follow deterministic paths because of the
existence of degrees of freedom leading to a wide range of possible states
in answer to internal and external constraints. Thus, various evolutionary
trajectories may be observed, and from a given initial state, these systems
may reach various final configurations, whose likelihood is a function of
the self-organizing forces at hand. In other words, it is impossible to predict
the evolution of a single system, but it is possible to draw reliable conclu-
sions for a large enough set of them: a collection of complex systems may
indeed exhibit a diversity of states, with some more frequent than others,
and some very unlikely but still explainable in terms of probability laws for
rare events, etc.²
The human language faculty is a complex system, both as an outcome of
interacting linguistic components within each individual and as a collective
set of conventions resulting from the interactions among individuals.
On the one hand, linguistic products themselves (words, sentences, sets
of sentences) are the outputs of a cognitive system composed of linguistic
components, as well as a set of complex relationships between them. Com-
peting pressures over lexicon and grammar (such as articulatory/auditory
constraints mentioned above) widely influence language production and
understanding by human beings, as do dynamical processes (e.g. activation
propagation and decay in the mental lexicon, or the interactive construction
of sentence meaning from lexicon and grammar). On the other hand, lan-
guage seen as a dynamical distributed system of conventions in a communi-
ty can also be analysed as a complex system, given the intricacy of the
linguistic interactions taking place between the speakers.
Indeed, the science of complexity has successfully addressed tremend-
ous challenges in our understanding of the human language faculty. Theo-
retical approaches that integrate self-organization, emergence, nonlinearity,
adaptive systems, information theory, etc., have already shed new light on
the duality between the observed linguistic diversity and the human cogni-
tive faculty of language. Most of the recent literature written in this frame-
work focuses either on the syntactic level addressed through computational
complexity (Barton et al., 1987; Ristad, 1993; among others) or perform-
ance optimization (e.g. Hawkins, 2004), or explicitly on the emergence and
evolution of language as a communication convention (e.g. Galantucci,
2005; Steels, 2005, 2006; Ke, Gong and Wang, 2008). Other linguistic
components have been less thoroughly investigated, Dahl (2004) and Oud-
eyer (2006) providing noteworthy exceptions offering stimulating ap-
proaches to long-lasting questions. However, no unified framework has yet
come into sight, and the field is characterized by a wide variety of ap-
proaches.
3. Goal and contribution of the present volume
This book is the first to propose an outline of this multi-faceted field of
research in the general framework of phonetics and phonology. It is orga-
nized in four parts and covers a large spectrum of issues addressed by the
community of specialists in two directions shaped by the concepts of com-
plexity and complex systems. The first branch ranges from the measure-
ment of complexity itself to the assessment of its relevance as an explana-
tion to typological phonology and to phylogenetic or ontogenetic
trajectories. The second branch ranges from the quest for phonet-
ic/phonological primitives to the dynamical modelling of speech communi-
cation (perception and/or production) as a complex system in an emergent
and self-organized attempt to explain phonetic and phonological processes.
Beyond this diversity, all the contributors of this book consider that the
notions of complexity and complex adaptive systems offer today a huge
potential for developing groundbreaking research on language and lan-
guages, to the extent that they may partially reveal the "invisible hand" for
the organization and evolution of speech communication (a metaphor bor-
rowed from Adam Smith's work in economics, already developed in
Keller (1994) in a diachronic perspective). As said above, however, no uni-
fied framework exists yet, and the contributions gathered here bring togeth-
er different pieces of the puzzle investigated from several points of view
and methodologies. Consequently, a reflection on phonological complexity
is present in all chapters to some degree, and the analyses are always based
on experimental data or cross-linguistic comparison.
In Part I, the question of the nature of the relevant primitives in sound
systems is addressed in the light of complexity at the phonetics/phonology
interface. In chapter 1, Ioana Chitoran and Abigail C. Cohn bring together a
number of different notions that correspond to interpretations of phonologi-
cal complexity (e.g., markedness, naturalness), building on them a clear and
comprehensive overview of the main points of debate in phonetics and
phonology. These debates revolve around: (i) the interaction between pho-
netics and phonology; (ii) their gradient vs. categorical nature; (iii) the role
of phonetic naturalness in phonology; (iv) the nature of units of representa-
tion. Ioana Chitoran and Abigail C. Cohn argue that a clear and complete
understanding of what complexity is in phonetics and phonology must nec-
essarily engage these four points, and must take into account phenomena
that have generally been interpreted as lying at the interface between pho-
netics and phonology. As such, it must crucially take into account variabili-
ty.
Chapters 2 (John J. Ohala) and 3 (René Carré) in this section both ad-
dress the issue of variability and challenge traditional representations of
phonetic primitives. Ohala further develops the idea that the degree of
complexity of a sound system should not be limited to the number and
combination of distinctive features. Rather, one has to consider the balance
between symmetry and economy as described in phonology, and asymme-
try and absence of categorical boundaries, as found in phonetics. Starting
from the idea that distinctive phonetic features in language X can be
present non-distinctively in language Y, Ohala argues that phonetic varia-
tion must be included in a measure of phonological complexity, because it
is part of a speaker's knowledge of the language. The concept of coarticula-
tion, for example, is not entirely relevant for a phonological system, but the
systematic variation it introduces in the speech signal can, over time, affect
the composition of segmental inventories.
Carré (chapter 3) presents results from production and perception expe-
riments suggesting that the identification of vowels in V1V2 sequences is
possible based exclusively on dynamic stimuli, in the absence of static tar-
gets. Carré proposes that reliable information on vowel identities in V1V2
sequences lies in the direction and rate of transitions. He connects this find-
ing to the known importance of transition rate in the identification of con-
sonants. The implication of this connection is a possible unified theory of
consonant and vowel representation, based on the parameter of transition
rate: consonants are characterized by fast transitions and vowels by slow
transitions. Carré's dynamic approach thus presents an intriguing challenge
to more traditional views of phonetic specification, based primarily on stat-
ic primitives.

Part II starts with a contribution by Ian Maddieson, in which he proposes
several factors contributing to phonological complexity, departing from the
traditional counts of consonant and vowel inventories, tone systems or syl-
lable canons. The approach benefits from tests on a large representative
sample of the world's languages and from a thorough analysis of the litera-
ture. The first factor deals with inherent phonetic complexity. The author
proposes various ways of establishing a scale of complexity for the seg-
ments, on which we can then base the measure of the system complexity by
summing the complexity of its particular components. The second factor
assesses the combinatorial possibilities of the elements (segments, tones,
stress) present in a given phonological system, and one possibility sug-
gested by the author is to calculate the number of possible distinct syllables
per language. The third factor focuses on the frequency of types of the dif-
ferent phonological elements of a system. The idea put forward by the au-
thor is that the complexity of a language regarding a particular element is
the inherent complexity of that element weighted by its frequency of occur-
rence in the lexicon. In other words, the more a language uses a complex
element, the more complex it is. The major concern then lies in the way one
calculates the type frequencies for a large sample of languages: shall it be
based on lexicons or texts? The last potential complexity factor is called
variability and transparency. It has to do with phonological processes and
no longer with inventories. The author suggests evaluating the motivations
behind phonological alternations; these variations can be ranked from more
conditioned ones (highly motivated, thus less complex) to free variations
(no motivations, thus more complex). This complexity value can be
weighted against the number of resulting variants in the alternation, giving
a combined score of variability and transparency. The author concludes by
reckoning that even if all the proposed complexity factors are proved to be
relevant, the main problem will remain how to combine all of them in one
overall measure of complexity.
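The frequency-weighting idea can be sketched in a few lines of code. The complexity scores and the toy lexicons below are invented for illustration and are not taken from Maddieson's chapter:

```python
# Sketch of the frequency-weighting idea: a language's segmental
# complexity is each segment's inherent complexity weighted by how
# often that segment occurs in the lexicon. The complexity scale
# (loosely in the spirit of the simple/elaborated/complex division)
# and the toy lexicons are hypothetical.

from collections import Counter

INHERENT = {"p": 1, "t": 1, "k": 1, "a": 1, "i": 1, "u": 1,
            "ts": 2, "kw": 2, "kʼ": 3}

def weighted_complexity(lexicon):
    """Mean inherent complexity per segment token in the lexicon."""
    counts = Counter(seg for word in lexicon for seg in word)
    total = sum(counts.values())
    return sum(INHERENT[seg] * n for seg, n in counts.items()) / total

# Two toy lexicons over the same inventory: the second uses the
# ejective /kʼ/ much more often, so it scores as more complex.
lex_rare  = [("p", "a"), ("t", "i"), ("k", "u"), ("ts", "a"), ("kʼ", "i")]
lex_heavy = [("kʼ", "a"), ("kʼ", "i"), ("kʼ", "u"), ("ts", "a"), ("p", "i")]

print(weighted_complexity(lex_rare))   # lower score
print(weighted_complexity(lex_heavy))  # higher score
```

Note that the same inventory yields different scores depending on usage, which is exactly the point of the third factor: the more a language uses a complex element, the more complex it is.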
The second contribution of Part II is by Nathalie Vallée, Solange
Rossato and Isabelle Rousset. It echoes the second factor proposed by Ian Mad-
dieson regarding combinatorial possibilities of segments. The authors ana-
lyze some languages' preferred sound sequences (syllabic or not) using a
17-language syllabified lexicon database (ULSID) in the light of the
frame/content theory (MacNeilage, 1998). They focus on the alternations of
consonants and vowels, looking at their place of articulation. They confirm
previous findings stating preferred associations like coronal consonants
with front vowels, bilabial consonants with central vowels, and velar con-
sonants with back vowels. They also examine the so-called labial-coronal
effect according to which CV.CV words are predominantly composed of a
labial first consonant and a coronal second one. Their data extend this result
by showing the existence of the labial-coronal effect in other syllabic pat-
terns as well. Finally, they look at sequences of plosive and nasal conso-
nants, revealing that preferred associations question the validity of the so-
nority scale. To account for all their typological findings, the authors put
forward convincing arguments from articulatory, acoustic and perceptual
domains; they conclude that the patterns of sound associations encountered
in the worlds languages find their source partly outside of phonology in
the sensorimotor capacities that underlie them.
The third contribution by Christophe Coupé, Egidio Marsico and
François Pellegrino departs from the two previous papers, as it does not aim
at proposing any measure or scale of phonological complexity for phono-
logical segments or sound patterns. The contributors rather consider phono-
logical systems as complex adaptive systems per se and consequently, they
propose to characterize their structure in the light of several approaches
borrowed from this framework. The main rationale is that, by applying
models designed outside phonology and linguistics to a typological data-
base of phonological systems (namely UPSID), the influence of theoretical
a priori is limited and consequently allows the emergence of data-driven
patterns of organisation for the phonological systems. They propose two
different approaches. The first one, inspired by graph theory, consists in
analysing the structure of phonological systems by constituting graphs in
which phonemes are nodes and connections receive weights according to
the phonetic distance between these phonemes. Using a topological meas-
ure of complexity, this approach is used to compare the distribution of the
structural complexities among broad areal groups of languages. In the
second approach, they model the content of phonological inventories by
considering the distribution of co-occurrences of phonemes in order to de-
fine attraction and repulsion relations between them. These relations are
then used to propose a synchronic measure of coherence for the phonologi-
cal systems, and then diachronically extended to a measure of stability.
Emergent patterns of stability among phonological systems are demonstrat-
ed, showing that this approach is effective in extracting a part of the in-
trinsic information present in the UPSID database and avoiding as much as
possible the use of any linguistic a priori.
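The graph construction can be sketched as follows. The binary feature table stands in for the chapter's phonetic distances, and the mean edge weight at the end is a simple stand-in for its topological complexity measure, not the measure itself:

```python
# Sketch of the graph construction: phonemes are nodes, and each pair
# is connected by an edge weighted by a phonetic distance (here, the
# Hamming distance over a hypothetical binary feature table).

from itertools import combinations

# Hypothetical feature vectors: (voiced, labial, coronal, nasal)
FEATURES = {
    "p": (0, 1, 0, 0),
    "b": (1, 1, 0, 0),
    "t": (0, 0, 1, 0),
    "d": (1, 0, 1, 0),
    "m": (1, 1, 0, 1),
}

def feature_distance(a, b):
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))

def inventory_graph(inventory):
    """Complete weighted graph over an inventory, as an edge dict."""
    return {(a, b): feature_distance(a, b)
            for a, b in combinations(sorted(inventory), 2)}

graph = inventory_graph(["p", "b", "t", "d", "m"])
mean_distance = sum(graph.values()) / len(graph)
print(len(graph), mean_distance)
```

On such a graph one can then compare inventories: a system whose phonemes cluster tightly in feature space yields short average distances, while a system scattered across the space yields long ones.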
The last contribution of this second part is by Christopher T. Kello and
Brandon C. Beltz who propose an exciting hypothesis on the dynamical
equilibrium leading to a relationship between phonological systems and
phonotactics on the one hand, and the process of word formation in the
lexicon on the other. Their approach, like the one by Coupé, Marsico and
Pellegrino, imports the mathematical theory of graphs into linguistics. Phe-
nomena that exhibit behaviour described by power-laws are widespread in
physics, biology and social systems. When observed, these laws generally
signify that a principle of least-effort is operating, and that a dynamical
equilibrium results from the interaction between several competing con-
straints. Christopher T. Kello and Brandon C. Beltz observe power-law
behaviours in word forms and phonological networks of American English,
built according to an inclusion relation between the lexical items (in contrast
with semantic or purely morphological rules). As endorsed by the contribu-
tors, this result may stem from a trade-off between distinctiveness and effi-
ciency pressures. In other words, a valid language deals both with the
need to maintain a sufficient distance between the words of its lexicon and
with a constraint of parsimony leading to the reuse of existing phonotactic
or orthographic sequences. Their assumption is extended to the lexical net-
works of four other languages, and then assessed by comparison with artifi-
cial networks. While power-laws were first shown in lexical networks in
Zipf's seminal work a half-century ago (Zipf, 1949), Kello and Beltz's work
goes further by demonstrating that several kinds of constraints interact and
generate the same type of behaviour in word formation mechanisms. In a
sense, this study fills a part of the gap between the lexicon and the phonol-
ogy of a language, and provides a convincing link that will be essential for
developing a systemic view of languages able to take all linguistic compo-
nents into account.
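The inclusion-based network construction can be sketched as follows; the toy word list and the substring criterion are illustrative stand-ins, not the authors' actual corpus or definition of inclusion:

```python
# Sketch of an inclusion network: wordforms are nodes, with an edge
# whenever one form is contained inside another (e.g. "cat" in
# "cats"). On a real lexicon one would then ask whether the degree
# distribution is heavy-tailed; here we just build the network and
# tabulate degrees on a toy word list.

from collections import Counter

WORDS = ["cat", "cats", "catalog", "at", "a", "log", "logs", "dog"]

def inclusion_edges(words):
    edges = set()
    for w in words:
        for v in words:
            if w != v and w in v:          # substring inclusion
                edges.add(frozenset((w, v)))
    return edges

edges = inclusion_edges(WORDS)
degree = Counter()
for e in edges:
    for node in e:
        degree[node] += 1

# Short, highly embeddable forms ("a", "at") accumulate many links --
# the kind of hub that dominates a heavy-tailed degree distribution.
print(sorted(degree.items(), key=lambda kv: -kv[1]))
```

Even on eight words the asymmetry is visible: a few forms collect most of the connections while others remain isolated, which is the qualitative signature that, at lexicon scale, shows up as a power law.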

Part III is specifically dedicated to approaches that aim at revealing the
nature and organisation of human phonological representations in a multid-
isciplinary framework and in the light of complexity.
Noël Nguyen, Sophie Wauquier and Betty Tuller's contribution devel-
ops a dynamical approach to explore the nature of the representations acti-
vated during speech perception. In the first section, they set the debate be-
tween abstractionist and exemplar-based models of speech perception.
Since arguments exist in favour of these two antagonistic hypotheses, they
argue that this state of affairs results from the dual nature of speech perception.
In this view, phonetic details are retained, not as exemplars but as a dy-
namical tuning of a complex and continuous shape, and an abstractionist-
like behaviour is also possible, based on the existence of several stable
attractors. A dynamical model is developed and a review of several tasks of
speech categorization is proposed. The existence of a hysteresis cycle in the
behavioural performances observed during the task indicates that percep-
tion does not operate in a basic deterministic manner since it is sensitive to
the previous state of the system in a way typical of nonlinear dynamical
systems. These results strongly support the proposal of a hybrid and dy-
namical model of speech perception bringing together the properties of
both exemplar and abstractionist models.
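Hysteresis of this kind is easy to reproduce in a minimal bistable system. The toy dynamics below are a generic illustration of the phenomenon, not the model proposed in the chapter:

```python
# Minimal illustration of hysteresis: a bistable system
# x' = c + x - x**3. Sweeping the control parameter c up and then
# back down, the state the system settles into at c = 0 depends on
# where it came from -- the signature of sensitivity to the previous
# state described above.

def settle(x, c, dt=0.01, steps=5000):
    for _ in range(steps):
        x += dt * (c + x - x**3)
    return x

def sweep(cs):
    x, states = -1.5, []
    for c in cs:
        x = settle(x, c)               # carry the state over
        states.append(x)
    return states

cs_up = [i / 10 for i in range(-10, 11)]   # c: -1.0 ... 1.0
up = sweep(cs_up)
down = sweep(list(reversed(cs_up)))

# At c = 0 (index 10 of each sweep) the system sits in a different
# attractor depending on sweep direction.
print(up[10], down[10])
```

In the perceptual analogue, the swept parameter is the acoustic continuum and the two attractors are the competing phonemic categories: the category boundary crossed on the way up differs from the one crossed on the way down.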
In connection with John Ohala's contribution in the first part of this
book, and with the dynamical model of speech perception detailed in the
previous chapter (Nguyen, Wauquier and Tuller), Adamantios Gafos and
Christo Kirov implemented a nonlinear dynamical model of phonetic
change illustrated with the case of lenition. They assume that phonological
representations consist of feature-like components that could be theoreti-
cally modelled using activation fields borrowed from the dynamic field
theory, and ruled by differential equations. In their view, produc-
tion/perception loops self-generate the well-known word frequency effect
reported for lenition. More specifically, the interaction between field acti-
vation (biased toward the inputs of the perception stage) and memory decay
is the backbone that enables gradual phonetic change. Produc-
tion/perception loops are thus responsible for both the potential shift of the
phonetic realization and the positive feedback that leads to the emergence of
a new stable variant of the phonetic parameters.
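The loop can be caricatured in a few lines of code. The update rule and all parameters below are invented for illustration and are far simpler than the authors' activation-field equations:

```python
import random

# Toy caricature of a production/perception loop: each time a word is
# used, the stored phonetic parameter is produced with a small
# lenition bias plus noise, and the percept is blended back into
# memory. Words used more often loop more often, so they drift
# faster -- the word-frequency effect described above.

def simulate(uses, start=1.0, bias=-0.001, noise=0.005,
             blend=0.5, seed=1):
    rng = random.Random(seed)
    memory = start                      # e.g. degree of constriction
    for _ in range(uses):
        produced = memory + bias + rng.gauss(0, noise)
        memory = (1 - blend) * memory + blend * produced
    return memory

frequent = simulate(uses=1000)   # high-frequency word
rare = simulate(uses=100)        # low-frequency word

# The frequent word has drifted further from the starting value.
print(frequent, rare)
```

The point of the caricature is the mechanism, not the numbers: a small per-use bias fed back through perception accumulates, and usage frequency sets the rate of accumulation.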
The third contribution of this part, from Willy Serniclaes and Christian
Geng, investigates the bases of categorical boundaries in the perception of
the place of articulation of stop consonants. It compares the perceptual
boundaries of Hungarian and French, using artificial stimuli differing in
terms of formant transitions and generated with the DRM model (see René
Carré's contribution in the first part of this volume). Four places of articula-
tion are distinctive in Hungarian, while only three are phonologically rele-
vant in French. Consequently, comparing the positions of their boundaries
is informative on the influence of universal phonetic predispositions on the
organisation of phonological categories. Results show that the perceptual
boundaries are similar for the two languages, dividing the formant transi-
tion space into three salient areas. It turns out, however, that the
palatal-alveolar boundary is less salient than the other boundaries and that an additional
feature (besides burst and formant transition) probably plays a role. These
results are discussed in the perspective of the emergence of distinctive
boundaries from coupling between natural phonetic boundaries; they also
echo John Ohala's contribution on the importance of so-called secondary
features in language evolution (see the first part of this book).
Nathalie Bedoin & Sonia Krifi's contribution deals with the fundamental
issue of the organisation of phonetic features, as revealed in the context
of reading tasks. They provide a thorough review of this literature, and a
series of visual priming and metalinguistic experiments. These experiments
explore the temporal course of reading by manipulating not only phonetic
feature similarity between primes and targets, but also the nature of these
features. Taken as a whole, their results suggest that voicing, manner and
place are processed at different rates and that a complex pattern of activa-
tion propagation and lateral inhibition is involved. More specifically, voic-
ing seems to be processed first but, depending on the experimental condi-
tions, a prominent impact of manner over place and voicing may also be
evidenced when processing time is no longer a relevant factor. Nathalie
Bedoin & Sonia Krifi's contribution thus highlights the complexity of the
organisation of phonetic features in both the temporal and the hierarchical
dimensions. Additional information is provided by the replication of these
experiments with second- and third-grade children, revealing a gradual
setting of the underlying processes during language acquisition and
development.

The relevance of the approaches to phonology borrowed from the sci-
ence of complexity can only be assessed by evaluating whether such mod-
els succeed in tackling some of the challenges that limit our knowledge and
understanding of human language capacity and linguistic diversity. If
granting a significant role to complexity is correct, one of the fields whose
comprehension it will change most radically is language acquisition, along
two directions in particular. First, computational
dynamical models of emergence of linguistic patterns may assess hypothe-
ses related to the mechanisms of linguistic bootstrapping (e.g. Morgan &
Demuth, 1996; Pierrehumbert, 2003). Second, cross-linguistic comparison
of courses of language acquisition may reveal universal tendencies, not
necessarily in terms of phonological units (gestures, features, segments or
syllables) but in terms of their intrinsic complexity and of their interactions
in the communication system. In the longer run, these two approaches will
probably give rise to unified models of phonological acquisition; they
have already yielded significant results on the balance between universal
and language-specific constraints in acquisition, as shown in Part IV.
In the first paper, Hosung Nam, Louis Goldstein and Elliot Saltzman
promote a dynamical model of the acquisition of syllable structures,
compatible with what is attested in the world's languages. More specifically,
the emergence of asymmetries between the frequencies of syllables with
onsets (CV structure) versus syllables with codas (VC structure) is
observed with their model, which avoids the partially circular notion of the
unmarkedness of the CV structure. These effects emerge as a consequence
of the interaction between the ambient language and the intrinsic character-
istics of the oscillators that control the phasing of the articulatory gestures
in the child model. By implementing a nonlinear coupling between these
oscillators, multiple stable modes can emerge as attractors, given the ge-
neric assumption that in-phase and anti-phase coordination of gestures are
preferred. Without additional hypotheses, differences between the duration
in the acquisition processes of CV and VC structures also emerge,
regardless of the target ambient distribution. Next, the computational model
is shown to capture the faster acquisition of VCC over CCV. Hence, the
model successfully reproduces two seemingly contradictory phenomena
regarding the course of acquisition of CV vs. VC
structures on the one hand and of CCV vs. VCC on the other. In this
framework, the contributors demonstrate that data from linguistic typology
and from longitudinal studies of language acquisition can foster
methodologies inspired by complex adaptive systems in an extraordinarily
fruitful approach.
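The preference for in-phase and anti-phase coordination can be illustrated with the standard HKB-style relative-phase dynamics commonly used in this literature. This is a generic sketch, not Nam, Goldstein and Saltzman's implementation, and the coefficients are arbitrary: the relative phase φ between two coupled gestural oscillators descends the potential V(φ) = −a cos φ − b cos 2φ, whose minima at φ = 0 (in-phase) and φ = π (anti-phase) act as attractors:

```python
import math

def step(phi, a=1.0, b=1.0, dt=0.01):
    """One Euler step of dphi/dt = -dV/dphi for V = -a*cos(phi) - b*cos(2*phi)."""
    return phi + dt * -(a * math.sin(phi) + 2 * b * math.sin(2 * phi))

def settle(phi0, n=5000):
    """Let the relative phase relax from an initial value to its attractor."""
    phi = phi0
    for _ in range(n):
        phi = step(phi)
    return phi

# A small initial phase difference relaxes toward in-phase coordination (0),
# while a large one relaxes toward anti-phase coordination (pi): two stable
# coordination modes emerge from a single nonlinear coupling.
```

The point of the sketch is that multiple stable coordination modes need not be stipulated: they fall out of one nonlinear coupling function, which is the generic assumption the chapter builds on.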
The last two chapters of this book (contributed by Yvan Rose, and by
Sophie Kern & Barbara L. Davis) do not implement any computational
models. However, they thoroughly explore the driving forces underlying
phonological acquisition in a multi-language framework. Yvan Rose argues
for the necessary synthesis of diverging approaches and urges the
development of a multi-faceted approach in order to overcome some failures of
current approaches in accounting for patterns observed during early phono-
logical acquisition. Reanalysing several papers from the literature on early
acquisition, he suggests that the role of the statistical patterns of the
ambient language has been overestimated, and he introduces an alternative
explanation based on structural complexity. In the rest of the chapter, the
contributor discusses a series of phonological patterns taken from data
published on early acquisition in terms of interactions of driving forces
grounded in several potentially relevant facets (articulation, perception,
statistics of the ambient language, child grammar as a cognitive system).
Yvan Rose's contribution thus offers a strong argument in favour of
the multi-faceted approach and a rich and stimulating interpretation of
existing data built upon factors of phonological complexity.
Sophie Kern & Barbara L. Davis's contribution tackles the issue of
cross-linguistic variability in canonical babbling, drawing on an
unprecedented amount of empirical data from five languages. The
contributors take advantage of this unique material to investigate the
universality and/or
language-specificity of canonical babbling. The theoretical bases support-
ing the existence of universal driving forces are introduced and developed
in the vein of the Frame/Content perspective, and the impact of the ambient
language is discussed from a review of the literature. More specifically,
Sophie Kern and Barbara L. Davis highlight that the lack of common
ground strongly limits the cross-linguistic relevance of these studies (be-
cause of the different procedures applied in different languages). Then, they
analyse the similarities and differences observed in their data, at the seg-
mental level and in terms of subphonemic and phonemic co-occurrences in
the babbling structures. The discussion of these results draws a coherent
scheme that emphasizes the role of speech-like prelinguistic babbling as a
first step into language complexity, albeit one still dominated by universal
characteristics of the production system.
The editors would like to warmly thank all the authors of this volume.
We are also greatly indebted to the colleagues who have contributed as
reviewers for the submitted chapters: René Carré, Barbara Davis, Christian
DiCanio, Christelle Dodane, Emmanuel Ferragne, Cécile Fougeron,
Adamantios Gafos, Ian Maddieson, Noël Nguyen, Pierre-Yves Oudeyer,
Gérard Philippson, Yvan Rose, Willy Serniclaes, Caroline Smith, Kenny
Smith (in alphabetical order), and to the participants of the workshop
"Phonological Systems & Complex Adaptive Systems", held in Lyon in
July 2005, and more specifically Didier Demolin, Björn Lindblom and
Sharon Peperkamp,
for their comments.
We also thank Aditi Lahiri for her thorough and fruitful suggestions and
Mouton de Gruyter's anonymous reviewer. The editors gratefully acknowledge
the financial support of the French ACI "Systèmes complexes en SHS"
and of the French Agence Nationale de la Recherche (project NT05-
3_43182 CL, P.I. F. Pellegrino).


Notes

1. It may be more correct to state that these approaches imply a limited teleology, in the
sense that they are often based on the optimization of a given criterion and thus can be
seen as targeted to this optimization. See Blevins (2004: 71-78) for a thorough
discussion of the nature of teleological and functional explanations in sound change.
2. This statement can obviously be put in perspective with considerations on language
universals and the distribution of patterns among languages; see e.g. Greenberg
(1978): "In general one may expect that certain phenomena are widespread in
language because the ways they can arise are frequent and their stability, once they
occur, is high. A rare or non-existent phenomenon arises only by infrequently occurring
changes and is unstable once it comes into existence. The two factors of probability of
origin from other states and stability can be considered separately" (Greenberg,
1978: 75-76).
References
Barton, G. Edward, Robert Berwick & Eric Sven Ristad
1987 Computational Complexity and Natural Language. The MIT Press:
Cambridge, MA, USA.
Blevins, Juliette
2004 Evolutionary phonology: the emergence of sound patterns. Cam-
bridge University Press: New York.
Dahl, Östen
2004 The Growth and Maintenance of Linguistic Complexity. Studies in
Language Companion Series 71. John Benjamins.
Demuth, Katherine
1995 Markedness and the Development of Prosodic Structure. In Proceed-
ings of the North East Linguistic Society. Jill N. Beckman (ed.). Am-
herst: Graduate Linguistic Student Association. pp. 13-25.
Fenk-Oczlon, Gertraud & August Fenk
1999 Cognition, quantitative linguistics, and systemic typology. Linguistic
Typology 3.2: 151-177.
2005 Crosslinguistic correlations between size of syllables, number of cases,
and adposition order. In Sprache und Natürlichkeit, Gedenkband für
Willi Mayerthaler. G. Fenk-Oczlon & Ch. Winkler (eds). Tübingen.
Galantucci, Bruno
2005 An experimental study of the emergence of human communication
systems. Cognitive Science 29: 737-767.
Gazi, Veysel & Kevin M. Passino
2004 Stability analysis of social foraging swarms. IEEE Transactions on
Systems, Man, and Cybernetics, Part B 34.1: 539-557.
Greenberg, Joseph H.
1969 Language Universals: A Research Frontier. Science 166: 473-478,
24 October 1969.
1978 Diachrony, synchrony, and language universals. In Universals of
Human Language. J. H. Greenberg, C. A. Ferguson & E. A. Moravcsik
(eds.). Vol. 1. Stanford University Press: Stanford. pp. 61-93.
Hawkins, John A.
2004 Efficiency and Complexity in Grammars, Oxford University Press:
Oxford.
Hockett, Charles F.
1958 A Course in Modern Linguistics. The MacMillan Company: New York.

Jakobson, Roman
1973 Main trends in the Science of Language. Main trends in the social
Sciences Series. Harper & Row: New-York.
Joos, Martin
1936 Review of The Psycho-Biology of Language by George K. Zipf.
Language 12.3: 196-210.
Ke, Jinyun, Tao Gong & William S-Y. Wang
2008 Language Change and Social Networks. Communications in
Computational Physics 3.4: 935-949.
Keller, Rudi
1994 On Language Change: The Invisible Hand in Language. Routledge:
London & New York.
Kelso, J. A. S., E. L. Saltzman & B. Tuller
1986 The dynamical perspective on speech production: Data and theory.
Journal of Phonetics 14.1: 29-59.
Lass, Roger
1997 Historical Linguistics and Language Change, Cambridge University
Press: Cambridge.
Lindblom, Björn
1998 Systemic constraints and adaptive change in the formation of sound
structure. In Approaches to the evolution of language. Hurford J.R.,
Studdert-Kennedy M. & Knight C. (eds). Cambridge University
Press: Cambridge. pp. 242-264.
1999 Emergent phonology. Perilus XXII: 1-15.
Lindblom, Björn & Ian Maddieson
1988 Phonetic universals in consonant systems. In Language, Speech, and
Mind. Hyman L.M. & Li C.N. (eds). Routledge: New York. pp. 62-78.
MacNeilage, Peter F.
1998 The Frame/Content Theory of Evolution of Speech Production. Be-
havioral and Brain Sciences 21: 499-511.
Maddieson, Ian
1984 Patterns of Sounds. Cambridge University Press: Cambridge.
Markose, Sheri M.
2005 Computability and Evolutionary Complexity: Markets as Complex
Adaptive Systems (CAS). The Economic Journal 115.504: F159-F192.
Morgan, James L. & Katherine Demuth
1996 Signal to Syntax: Bootstrapping from Speech to Grammar in Early
Acquisition, Lawrence Erlbaum Associates: Mahwah.
Ohala, John J.
1980 Moderator's summary of symposium on "Phonetic universals in
phonological systems and their explanation". Proceedings of the 9th
International Congress of Phonetic Sciences, Vol. 3. Institute of
Phonetics: Copenhagen. pp. 181-194.
1990 The phonetics and phonology of aspects of assimilation. In Papers in
Laboratory Phonology I: Between the grammar and the physics of
speech. J. Kingston & M. Beckman (eds). Cambridge University
Press: Cambridge. pp. 258-265.
Oudeyer, Pierre-Yves
2006 Self-Organization in the Evolution of Speech. Studies in the Evolution of
Language. Oxford University Press. (Translation by James R. Hurford)
Pierrehumbert, Janet B.
2003 Phonetic diversity, statistical learning, and acquisition of phonology.
Language and Speech 46.2-3: 115-154.
Ristad, Eric S.
1993 The Language Complexity Game. The MIT Press: Cambridge, MA.
Shosted, Ryan K.
2006 Correlating complexity: A typological approach. Linguistic Typology
10.1: 1-40.
Steels, Luc
2005 The emergence and evolution of linguistic structure: from lexical to
grammatical communication systems. Connection Science 17.3: 213-230.
2006 Experiments on the emergence of human communication. Trends in
Cognitive Sciences 10.8: 347-349.
Theraulaz, Guy, Eric Bonabeau, Stamatios C. Nicolis, Ricard V. Solé, Vincent
Fourcassié, Stéphane Blanco, Richard Fournier, Jean-Louis Joly, Pau
Fernández, Anne Grimal, Patrice Dalle & Jean-Louis Deneubourg
2002 Spatial patterns in ant colonies. PNAS 99: 9645-9649.
Trubetzkoy, Nikolai S.
1939 Grundzüge der Phonologie. French edition 1970. Klincksieck:
Paris.
Zipf, George K.
1935 The Psycho-Biology of Language: An Introduction to Dynamic
Philology. MIT Press: Cambridge. First MIT Press paperback edition,
1965.
1949 Human Behaviour and the Principle of Least Effort: An Introduction
to Human Ecology. Addison-Wesley: Cambridge.



Part 1:
Complexity and phonological primitives



Complexity in phonetics and phonology: gradience,
categoriality, and naturalness
Ioana Chitoran and Abigail C. Cohn
1. Introduction
In this paper, we explore the relationship between phonetics and phonol-
ogy, in an attempt to determine possible sources of complexity that arise in
sound patterns and sound systems. We propose that in order to understand
complexity, one must consider phonetics and phonology together in their
interaction. We argue that the relationship between phonology and phonet-
ics is a multi-faceted one, which in turn leads us to a multi-faceted view of
complexity itself. Our goal here is to present an overview of the relevant
issues in order to help define a notion (or notions) of complexity in the
domain of sound systems, and to provide a backdrop to a constructive dis-
cussion of the nature of complexity in sound systems.
We begin in §2 by considering possible definitions of phonological
complexity based on the different interpretations that have been given to
this notion. The issue of complexity has previously been addressed, implic-
itly or explicitly, through notions such as markedness, effort, naturalness,
and information content. Concerns with a measure of phonological or phonetic
complexity are therefore not new, even though the use of the term com-
plexity per se to refer to these questions is more recent. In this section we
survey earlier endeavors in these directions.
We then turn to the multi-faceted nature of the relationship between
phonology and phonetics. In this regard, we address two main questions
that have traditionally played a central part in the understanding of the pho-
netics-phonology relationship. The first of these, addressed in §3, is the
issue of gradience vs. categoriality in the domain of linguistic sound sys-
tems, and its implications for the question of an adequate representation of
linguistic units.
In §4 we discuss the second issue: the role of phonetic naturalness in
phonology. These are major questions, and we do not attempt to provide a
comprehensive treatment here. Rather, our goal is simply to consider them
in framing a broader discussion of phonological complexity.

In §5 we return to the question of complexity and consider the ways in
which measures of complexity depend on the type of unit and representa-
tion considered. The conclusion of our discussion highlights the multi-
faceted nature of complexity. Thus more than one type of unit and more
than one type of measure are relevant to any characterization of complex-
ity.
2. Definitions of complexity
In our survey of earlier implicit and explicit definitions of complexity, we
review past attempts to characterize the nature of phonological systems. We
discuss earlier concerns with complexity in §2.1; then we turn in §2.2 to the
issue of theoretical framing in typological surveys, where we compare two
types of approaches: theory-driven and data-driven ones. The first type is
illustrated by Chomsky and Halle's (1968) The Sound Pattern of English
(SPE), and the second by Maddieson's (1984) Patterns of Sounds.


2.1. Early approaches to complexity
A concern with complexity in phonetics and phonology can be traced back
to discussions of several related notions in the literature: markedness, ef-
fort, naturalness, and more recently, information content. While none of
these notions taken individually can be equated with complexity, there is an
intuitive sense in which each one of them can be considered as a relevant
element to be included in the calculation of complexity.
Studies of phonological complexity started from typological surveys,
which led to the development of the notion of markedness in phonological
theory. The interpretation of markedness as complexity is implicit in the
original understanding of the term, the sense in which it is used by
Trubetzkoy (1939, 1969): the presence of a phonological specification (a
mark) corresponds to higher complexity in a linguistic element. Thus, to
take a classic example, voiced /d/ is the more complex (marked) member of
an opposition relative to the voiceless (unmarked) /t/.
Later, the interpretation of markedness as complexity referred to coding
complexity (see Haspelmath, 2006 for a detailed review). Overt marking or
coding is seen to correspond to higher complexity than no coding or zero
expression. This view of complexity was adopted and further developed
into the notion of iconicity of complexity, recently critiqued by Haspelmath
(to appear). What is relevant for the purposes of our paper is noting the
actual use of the terms complex and complexity in this literature. Several of
the authors cited in Haspelmath (2006; to appear) use these terms
explicitly. Thus, Lehmann (1974) maintains that there is a direct correlation
between complex semantic representation and complex phonological
representation. Givón (1991) treats complexity as tightly related to markedness.
He considers complex categories to be those that are cognitively marked,
and tend to be structurally marked at the same time. Similarly, in
Newmeyer's formulation: "Marked forms and structures are typically both
structurally more complex (or at least longer) and semantically more
complex than unmarked ones" (Newmeyer, 1992: 763).
None of these discussions includes an objective definition of complex-
ity. Only Lehmann (1974) proposes that complexity can be determined by
counting the number of features needed to describe the meaning of an ex-
pression, where the term feature is understood in very broad, more or less
intuitive terms. The study of complexity through the notions of markedness
or iconicity has not been pursued further, and as highlighted by both Hume
(2004) and Haspelmath (2006), neither notion constitutes an explanatory
theoretical tool.
Discussions of complexity in the earlier literature have also focused on
the notion of effort, which has been invoked at times as a diagnostic of
markedness. It is often assumed, for example, that phonetic difficulty corre-
sponds to higher complexity, and things that are harder to produce are
therefore marked. While many such efforts are informal, see Kirchner
(1998/2001) for one attempt to formalize and quantify the notion of effort.
Ironically, however, Jakobson himself criticized the direct interpretation of
this idea as the principle of least effort, adopted in linguistics from the
18th-century naturalist Georges-Louis Buffon:
"Depuis Buffon on invoque souvent le principe du moindre effort: les
articulations faciles à émettre seraient acquises les premières. Mais un fait
essentiel du développement linguistique du bébé contredit nettement cette
hypothèse. Pendant la période du babil l'enfant produit aisément les sons
les plus variés" (Jakobson, 1971: 317) [Since Buffon, the principle of least
effort is often invoked: articulations that are easy to produce are supposedly
the first to be acquired. But an essential fact about the child's linguistic
development strictly contradicts this hypothesis. During the babbling stage
the child produces with ease the most varied sounds]¹

Jakobson's critique is now substantiated by experimental work showing
that articulatory effort is not necessarily avoided in speech production.
Convincing evidence comes from articulatory speech error experiments
carried out by Pouplier (2003). Pouplier's studies show that speech errors
do not involve restricted articulator movement. On the contrary, in errors
speakers often add an extra gesture, resulting in an even more complex
articulation, but in a more stable mode of intergestural coordination.
Similarly, Trubetzkoy cautions against theories that simply explain the
high frequency of a phoneme by the less difficult production of that pho-
neme (Trubetzkoy, 1969, chapter 7). He advocates instead a more sophisti-
cated approach to frequency count, which takes into account both the real
frequency of a phoneme and its expected frequency:
"The absolute figures of actual phoneme frequency are only of secondary
importance. Only the relationship of these figures to the theoretically
expected figures of phoneme frequency is of real value. An actual phoneme
count in the text must therefore be preceded by a careful calculation of the
theoretical possibilities (with all rules for neutralization and combination in
mind)" (Trubetzkoy, 1969: 264).
We return to this view below, in relation to Hume's (2006) proposal of
information content as a basis for markedness.
In general, however, the usefulness of insights gained by considering
speculative notions such as effort, described in either physical or processing
terms, has been limited. Nevertheless these attempts have at least served to
show, as Maddieson (this volume) points out, that "difficulty can itself be
difficult to demonstrate."
Another markedness diagnostic that has been related to complexity is
naturalness. Even though the term naturalness is explicitly used, it over-
laps on the one hand with the diagnostic of effort and phonetic difficulty,
and on the other hand with frequency. The discussion of naturalness can be
traced back to Natural Phonology (Donegan and Stampe, 1979, among
others). A natural, unmarked phenomenon is one that is easier in terms of
the articulatory or acoustic processes it involves, but also one that is more
frequent. In the end it becomes very difficult to tease apart the two con-
cepts, revealing the risk of circularity: processes are natural because they
are frequent, and they are frequent because they are natural.
Information content is proposed by Hume (2006) as an alternative to
markedness. In her proposal she accepts Trubetzkoy's challenge, trying to
determine a measure of the probability of a phoneme, rather than just its
frequency of occurrence. She argues that what lies at the basis of marked-
ness is information content, a measure of the probability of a particular
element in a given communication system. The higher the probability of an
element, the lower its information content; and conversely, the lower its
probability, the higher its information content. Markedness diagnostics can
thus be replaced by observations about probability, which can be
determined based on a number of factors.² While the exact nature of these
factors, their interaction, and the specific definition of probability require
further empirical investigation, it is plausible to hypothesize a relationship
between complexity and probability. For example, if low probability corre-
lates with higher information content, then it may in turn correlate with
higher complexity. At the same time, a related hypothesis needs to be
tested, one signalled by Pellegrino et al. (2007): it is possible that informa-
tion rate (the quantity of information per unit per second) may turn out to
be more relevant than, or closely related to information content (the quan-
tity of information per unit).
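Hume's measure can be made concrete with Shannon's standard surprisal, −log₂ p: an element's information content is the negative log of its probability, and an information rate can then be obtained by dividing by duration. The phoneme probabilities and durations below are invented purely for illustration:

```python
import math

def information_content(p):
    """Shannon surprisal in bits: low-probability elements carry more information."""
    return -math.log2(p)

# Hypothetical phoneme probabilities and mean durations (seconds) in some system.
probs = {"t": 0.12, "d": 0.06, "th": 0.006}
durations = {"t": 0.08, "d": 0.07, "th": 0.12}

content = {seg: information_content(p) for seg, p in probs.items()}  # bits per unit
rate = {seg: content[seg] / durations[seg] for seg in probs}         # bits per second

# A tenfold drop in probability adds log2(10), about 3.3 bits, so rarer
# phonemes are the more informative (and, by hypothesis, more complex) ones.
```

On this toy computation the rare phoneme carries the most information per unit, while the rate measure additionally weighs how much time each unit takes, which is the distinction drawn at the end of the paragraph above.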


2.2. Theory-driven vs. data-driven approaches
Overall we identify two main types of studies of phonological complexity,
which we refer to as theory-driven and data-driven, respectively.
The theory-driven approach is well illustrated by Chomsky and Halle's
(1968) SPE, where counting distinctive features is considered to be the
relevant measure of complexity, not unlike Lehmann's (1974) proposal,
albeit restricted to phonology. In chapter 9 of SPE, Chomsky and Halle
develop a complexity metric. Starting from the assumption that a natural
class should be defined with fewer distinctive features than a non-natural
(or less natural) class, Chomsky and Halle observe some contradictions.
For example, the class of voiced obstruents is captured by more features
than the class of all voiced segments, including vowels. Nevertheless, the
first class is intuitively more natural than the second one, and would there-
fore be expected to have the simpler definition. The solution they propose
is to include the concept of markedness in the formal framework, and to
revise the evaluation measure so that unmarked values do not contribute to
complexity (Chomsky and Halle, 1968:402). This adjustment allows them
to define complexity, and more specifically the complexity of a segment
inventory, in the following way: The complexity of a system is equal to
the sum of the marked features of its members (Chomsky and Halle,
1968:409), or in other words, related to the sum of the complexities of the
individual segments (Chomsky and Halle, 1968:414). Thus, a vowel sys-
tem consisting of /a i u e o/ is simpler (and therefore predicted to be more
common) than / i u e o/. By counting only the marked features, the first
system has a complexity of 6, while the second one has a complexity of 8.
The authors themselves acknowledge the possible limitations of their
measure: summing up the marked features predicts, for example, that the
inventory /a i u e / is as simple and common as /a i u e o/, both with a
complexity of 6. One potentially relevant difference between these systems,
which the measure does not consider, is the presence vs. absence of sym-
metry, the first inventory being more symmetrical than the second one. In
general in theory-driven approaches, complexity is defined through a par-
ticular formal framework, and thus the insights gained are inevitably lim-
ited by the set of operational assumptions.
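The arithmetic of Chomsky and Halle's metric is easy to reproduce. The per-vowel marked-feature counts below are a hypothetical assignment chosen only so that the five-vowel system /a i u e o/ totals 6, matching the figure cited above; they are not SPE's actual marking conventions:

```python
# Hypothetical marked-feature counts per segment (illustrative only):
# peripheral /a i u/ are cheap, while mid /e o/ each carry two marked features.
marked_features = {"a": 0, "i": 1, "u": 1, "e": 2, "o": 2}

def system_complexity(inventory):
    """SPE-style metric: complexity of a system = sum of its members' marked features."""
    return sum(marked_features[segment] for segment in inventory)

five_vowels = ["a", "i", "u", "e", "o"]
# system_complexity(five_vowels) -> 6

# The limitation noted in the text follows directly: any rearrangement of the
# same counts yields the same sum, so the metric is blind to properties of the
# system as a whole, such as symmetry.
```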
A data-driven study of phonological complexity is Maddieson's (1984)
Patterns of Sounds and the UPSID database on which it is based. The
database focuses on the segment, so the implicit measure of complexity
involves counting segments. This raises the crucial issue of representation,
which we will return to later. In Maddieson's survey each segment
considered phonemic is represented by its "most characteristic allophone"
(Maddieson, 1984: 6). The representative allophone is determined by
weighing several criteria: (i) the allophone with the widest distribution,
when this information is available; (ii) the allophone most representative of
the phonetic range of variation of all allophones; (iii) the allophone from
which the others can be most easily derived. Maddieson thus codifies an
atheoretical, descriptive definition of the segment, adopting a somewhat
arbitrary, intermediary level of representation between phonology and pho-
netics, that is in between the underlying contrastive elements, and the pho-
netic output characterizable as a string of phones. The database captures the
output of the phonology, a discrete allophonic representation, which is
neither purely phonemic nor purely phonetic, and described as
"phonologically contrastive segments (…) characterized by certain phonetic
attributes" (Maddieson, 1984: 160).
Following Maddieson's example, linguists have continued to make
sophisticated use of typological surveys for many purposes, including that of
evaluating complexity (e.g., Lindblom and Maddieson, 1988; Vallée, 1994;
Vallée et al., 2002; Marsico et al., 2004).



2.3. Summary
Both theory-driven and data-driven approaches can offer useful insight into
the nature and organization of phonological systems. It is also important to
bear in mind the implicit assumptions even in what are taken to be data-
driven approaches. (See also Hayes and Steriade, 2004, pp. 3-5, for a
discussion of inductive vs. deductive approaches to the study of markedness.)
One critical aspect of these efforts is the question of the relevant linguis-
tic units in measuring complexity. This question is addressed explicitly by
Marsico et al. (2004) and Coupé et al. (this volume). Feature-hood and
segment-hood can both tell us something about complexity. But neither
concept is as clear-cut as often assumed. Under many views (such as SPE),
features are taken as primitives. Segments are built out of bundles of fea-
tures. Other views take the segment to be primary, or even suggest that
segments are epiphenomenal, as is argued by some exemplar theorists. We
take the view that in adult grammar, both segments and features have a role
to play in characterizing the inventories and patterns of sound systems. As
seen above, the question of the nature of segments is also a complex one:
do we mean underlying contrastive units, or do we mean something more
concrete, such as Maddieson's surface allophones? The question about the
nature of segments leads to broader questions about the nature of phonol-
ogy and phonetics and their relationship. In the next section, we turn to this
relationship.
3. The relationship between phonology and phonetics
Chomsky and Halle provided an explicit answer about the nature of repre-
sentations, drawing a distinction between underlying representations, cap-
tured in terms of bundles of binary feature matrices, and surface forms,
which were the output of the phonology. At this point in the derivation a
translation of binary values to scalar values yielded the phonetic transcrip-
tion. They assumed a modular relationship between phonology and phonet-
ics, where phonology was categorical, whereas phonetics was gradient and
continuous. It was also assumed that phonology was the domain of the
language specific and phonetics the domain of universal (automatic) as-
pects of sound patterns. Research since that time has investigated this rela-
tionship from many angles, enriching the view of phonetics in the grammar,
showing that the dichotomy between phonology and phonetics is not as
sharp as had been assumed. (See Cohn, 1998, 2006a & b for discussion).
We briefly review the nature of this relationship.
First, as discussed by Cohn (2006b), there are actually two distinct ways
in which phonology and phonetics interact. A distinction needs to be
drawn between the way phonology affects or drives phonetics (what Cohn
terms "phonology in phonetics") and the way that phonetics affects
phonology (what Cohn terms "phonetics in phonology"). In the first, the
nature of the correlation assumed by SPE (that phonology is discrete and
categorical, while phonetics is continuous and gradient) is important.
second, the place of naturalness, as internal or external to the grammar, is
central. From both of these perspectives, we conclude that phonology and
phonetics are distinct, albeit not as sharply delineated as implied by strictly
modular models.


3.1. Phonology in Phonetics
Phonology is the cognitive organization of sounds as they constitute the
building blocks of meaningful units in language. The physical realization
of phonological contrast is a fundamental property of phonological systems
and thus phonological elements are physically realized in time. Phonology
emerges in the phonetics, in the sense that phonological contrast is physi-
cally realized.
This then is the first facet of the relationship between phonology and
phonetics: the relationship between these cognitive elements and their
physical realization. Implicit in the realization of phonology is the division
between categorical vs. gradient effects: phonology captures contrast,
which at the same time must be realized in time and space. This leads to the
widely assumed correlations in (1).

(1) The relationship between phonology and phonetics:
phonology = discrete, categorical

phonetics = continuous, gradient

The correlations in (1) suggest the following relationships:
(2) a. Categorical phonology b. Gradient phonology
c. Categorical phonetics d. Gradient phonetics

If the correlation between phonology and categoriality on the one hand and
between phonetics and gradience on the other were perfect, we would ex-
pect there to be only categorical phonology (a) and gradient phonetics (d).
There are reasons why the correlation might not be perfect but might
nevertheless be strong enough to reinforce the view that phonology and phonetics are
distinct. On the other hand, perhaps there is in fact nothing privileged about
this correlation. In §3.2, we review the evidence for categorical phonology
and gradient phonetics. We consider categorical phonetics and gradient
phonology in §3.3.


3.2. Categorical phonology and gradient phonetics
A widely assumed modular view of grammar frames our modeling of more
categorical and more gradient aspects of such phenomena as belonging to
distinct modules (e.g. phonology vs. phonetics). We refer to this as a map-
ping approach. Following a mapping approach, categorical (steady state)
patterns observed in the phonetics are understood to result from either lexi-
cal or phonological specification and gradient patterns are understood to
arise through the implementation of those specifications.
In an approach growing out of Pierrehumbert's (1980) study of English
intonation, gradient phonetic patterns are understood as resulting from
phonetic implementation. Under the particular view developed there, termed generative
phonetics, these gradient patterns are the result of interpolation through
phonologically unspecified domains. Keating (1988) and Cohn (1990) ex-
tend this approach to the segmental domain, arguing that phenomena such
as long distance pharyngealization and nasalization can be understood in
these terms as well. Within generative phonetics, the account of gradience
follows from a particular set of assumptions about specification and under-
specification.
It is generally assumed that categoriality in the phonology also follows
directly from the nature of perception and the important role of categorical
perception. The specific ways in which perception constrains or defines
phonology are not well understood, although see Hume and Johnson (2001)
for recent discussions of this relationship.
A modular mapping approach has been the dominant paradigm to the
phonology-phonetics interface since the 1980s and such approaches have
greatly advanced our understanding of phonological patterns and their re-
alization. The intuitive difference between more categorical and more gra-
dient patterns in the realization of sounds corresponds to the division of
labor between phonology and phonetics within such approaches and this
division of labor has done quite a lot of work for us. Such results are seen
most concretely in the success of many speech-synthesis-by-rule systems,
both in their modeling of segmental and suprasegmental properties of
sound systems. (See Klatt, 1987 for a review.)
A modular approach also accounts for the sense in which the phonetics,
in effect, acts on the phonology. In many cases, phonological and phonetic
effects are similar, but not identical. This is the fundamental character of
what Cohn (1998) terms phonetic and phonological doublets, cases where
there are parallel categorical and gradient effects in the same language,
with independent evidence suggesting that the former are due to the pho-
nology and the latter result from the implementation of the former. For
example, this is seen in patterns of nasalization in several languages (Cohn,
1990); palatalization in English (Zsiga, 1995); vowel devoicing in Japanese
(Tsuchida, 1997, 1998); as well as vowel harmony vs. vowel-to-vowel
coarticulation, investigated by Beddor and Yavuz
(1995) in Turkish and by Przezdziecki (2005) in Yoruba. (See Cohn, 2006b
for fuller discussion of this point.)
What these cases and many others have in common is that the patterns
of coarticulation are similar to, but not the same as, assimilation and that
both patterns cooccur in the same language. The manifestations are differ-
ent, with the more categorical effects observed in what we independently
understand to be the domain of the phonology and the more gradient ones
in the phonetic implementation of the phonology. To document such dif-
ferences, instrumental phonetic data are required, as impressionistic data
alone do not offer the level of detail needed to make such determinations.
Following a mapping approach, assimilation is accounted for in the
phonological component and coarticulation in the phonetic implementation.
Such approaches predict categorical phonology and gradient phonetics, but
do they fully capture observed patterns? What about categorical phonetics
and gradient phonology?


3.3. Categorical phonetics and gradient phonology
We understand categorical phonetics to be periods of stability in space
through time. These result directly from certain discontinuities in the pho-
netics. This is precisely the fundamental insight in Stevens's (1989) Quan-
tal Theory, where he argues that humans in their use of language exploit
articulatory regions that offer stability in terms of acoustic output.³
There
are numerous examples of this in the phonetic literature. To mention just a
few, consider Huffman's (1990) articulatory landmarks in patterns of nasal-
ization, Kingston's (1990) coordination of laryngeal and supralaryngeal
articulations (binding theory), and Keating's (1990) analysis of the high
jaw position in English /s/.
There are many ways to model steady-state patterns within the phonetics
without calling into question the basic assumptions of the dichotomous
model of phonology and phonetics. To mention just one approach, within a
target-interpolation model, phonetic targets can be assigned based on pho-
nological specification as well as due to phonetic constraints or require-
ments. Such cases then do not really inform the debate about the gray area
between phonology and phonetics.
The more interesting question is whether there is evidence for gradient
phonology, that is, phonological patterns best characterized in terms of
continuous variables. It is particularly evidence claiming that there is gradi-
ent phonology that has led some to question whether phonetics and phonol-
ogy are distinct. The status of gradient phonology is a complex issue (for a
fuller discussion see Cohn, 2006a). Cohn considers evidence for gradient
phonology in the different aspects of what is understood to be phonology
(contrast, phonotactics, morphophonemics, and allophony) and concludes
that the answer depends in large part on what is meant by gradience and
which aspects of the phonology are considered. The conclusions do suggest
that strictly modular models involve an oversimplification.
While modular models of sound systems have achieved tremendous re-
sults in the description and understanding of human language, strict modu-
larity imposes divisions, since each and every pattern is defined as either X
or Y (e.g., phonological or phonetic). Yet along any dimension that might
have quite distinct endpoints, there is a gray area. For example, what is the
status of vowel length before voiced sounds in English, bead [bi:d] vs. beat
[bit]? The difference is greater than that observed in many other languages
(Keating, 1985), but does it count as phonological?
An alternative to approaches that assume that phonology and phonetics
are distinct, with a mapping between these two modules or domains, is
offered by approaches in which phonology and phonetics are understood
and modeled with the same formal mechanisms (what we term
unidimensional approaches). A seminal approach in this re-
gard is the theory of Articulatory Phonology, developed by Browman and
Goldstein (1992 and work cited therein), where it is argued that both pho-
nology and phonetics can be modeled with a unified formalism. This view
does not exclude the possibility that there are aspects of what has been un-
derstood to be phonology and what has been understood to be phonetics
that show distinct sets of properties or behavior. This approach has served
as fertile ground for advancing our understanding of phonology as resulting
at least in part from the coordination of articulatory gestures.
More recently, a significant group of researchers working within con-
straint-based frameworks has pursued the view that there is not a distinction
between constraints that manipulate phonological categories and those that
determine fine details of the representation. This is another type of ap-
proach that assumes no formally distinct representations or mechanisms for
phonology and phonetics, often interpreted as arguing for the position that
phonology and phonetics are one and the same thing.
The controversy here turns on the question of how much phonetics there
is in phonology, to what extent phonetic detail is present in phonological
alternations and representations. Three main views have been developed in
this respect:
(i) phonetic detail is directly encoded in the phonology (e.g.,
Steriade, 2001; Flemming, 1995/2002, 2001; Kirchner, 1998/2001);
(ii) phonetic detail (phonetic naturalness) is only relevant in the con-
text of diachronic change (e.g., Ohala, 1981 and subsequent work;
Hyman, 1976, 2001; Blevins, 2004);
(iii) phonetic detail is indirectly reflected in phonological constraints,
by virtue of phonetic grounding (e.g., Hayes, 1999; Hayes and
Steriade, 2004).
While there is general agreement on the fact that most phonological
processes are natural, that is, make sense from the point of view of
speech physiology, acoustics, perception, the three views above are quite
different in the way they conceptualize the relationship between phonetics
and phonology and the source of the explanation.
The first view proposes a unidimensional model, in which sound pat-
terns can be accounted for directly by principles of production and percep-
tion. One argument in favor of unidimensional approaches is that they offer
a direct account of naturalness in phonology, the second facet of the rela-
tionship: phonetics in phonology, a topic we will turn to in §4. Under the
second view the effect of naturalness on the phonological system is indi-
rect. Under the third view, some phonological constraints are considered to
be phonetically grounded, but formal symmetry plays a role in constraint
creation. The speaker/learner generalizes from experience in constructing
phonetically grounded constraints. The link between the phonological sys-
tem and phonetic grounding is phonetic knowledge (Kingston and Diehl,
1994).
An adequate theory of phonology and phonetics, whether modular,
unidimensional, or otherwise needs to account for the relationship between
phonological units and physical realities, the ways in which phonetics acts
on the phonology, as well as to offer an account of phonetics in phonology.
We turn now to the nature of phonetics in phonology and the sources of
naturalness.
4. Naturalness
In this section we consider different views of the source of naturalness in
phonology (§4.1). We then present evidence bearing on this question
(§4.2). The case we examine concerns patterns of consonant timing in
Georgian stop clusters (Chitoran et al., 2002; Chitoran and Goldstein,
2006).


4.1. Sources of naturalness
Many understand naturalness to be part of phonology. The status of natu-
ralness in phonology relates to early debates in generative phonology about
natural phonology (Stampe, 1979; Donegan and Stampe, 1979). This view
is also foundational to Optimality Theory (e.g. Prince and Smolensky,
2004), where functional explanations characterized in scalar and gradient
terms are central in the definition of the family of markedness constraints.
"Contrary to the view that the principles that the rules subserve (the 'laws')
are placed entirely outside the grammar [...] When the scalar and the gradi-
ent are recognized and brought within the purview of theory, Universal
Grammar can supply the very substance from which grammars are built."
(Prince and Smolensky, 2004: 233-234). Under such approaches the expla-
nations of naturalness are connected to the notion of markedness.
It is sometimes argued that explicit phonological accounts of naturalness
pose a duplication problem. Formal accounts in phonological terms (often
attributed to Universal Grammar) parallel or mirror the phonetic roots of
such developments, thus duplicating the phonetic source or historical de-
velopment driven by the phonetic source (see Przezdziecki, 2005 for recent
discussion). We return to this point below.
Others understand naturalness to be expressed through diachronic
change. This is essentially approach (ii), the view of Hyman (1976, 2001).
Hyman (1976) offers an insightful historical understanding of this relation-
ship through the process of phonologization, whereby phonetic effects can
be enhanced and over time come to play a systematic role in the phonology
of a particular language. Under this view, phonological naturalness results
from the grammaticalization of low-level phonetic effects. While a particu-
lar pattern might be motivated historically as a natural change, it might be
un-natural in its synchronic realization (see Hyman, 2001 for discussion).
Phonetic motivation is also part of Blevins's (2004) characterization of
types of sound change. According to this view, only sound change is moti-
vated by phonetic naturalness; synchronic phonology is not. A sound
change which is phonetically motivated has consequences which may be
exploited (phonologized) by synchronic phonology. Once phonologized, a
sound change is subject to different principles, and naturalness becomes
irrelevant (see also Anderson, 1981).
Hayes and Steriade (2004) propose an approach offering middle ground
between these opposing views, worthy of close consideration. They argue
that the link between the phonetic motivation and phonological patterns is
due to individual speakers' phonetic knowledge. This shared knowledge
"leads learners to postulate independently similar constraints" (p. 1). They
argue for a deductive approach to the investigation of markedness:
"Deductive research on phonological markedness starts from the assump-
tion that markedness laws obtain across languages not because they reflect
structural properties of the language faculty, irreducible to non-linguistic
factors, but rather because they stem from speakers' shared knowledge of
the factors that affect speech communication by impeding articulation, per-
ception, or lexical access" (Hayes and Steriade, 2004: 5).
This view relies on the Optimality Theoretic (OT) framework. Unlike rules,
the formal characterization of an OT constraint may include its motivation,
and thus offers a simple way of formalizing phonetic information in the
grammar. Depending on the specific proposal, the constraints are evaluated
either by strict domination or by weighting. Phonetically grounded con-
straints are phonetically sensible; they ban structures that are phonetically
difficult, and allow structures that are phonetically easy, thus relying heavi-
ly on the notion of effort. Such constraints are induced by speakers based
on their knowledge of the physical conditions under which speech is pro-
duced and perceived. Consequently, while constraints may be universal,
they are not necessarily innate. To assess these different views, we consider
some evidence.


4.2. Illustrating the source of naturalness and the nature of sound change
We present here some evidence supporting a view consistent with phonolo-
gization and with the role of phonetic knowledge as mediated by the
grammar, rather than being directly encoded in it. We summarize a recent
study regarding patterns of consonant timing in Georgian stop clusters.
Consonant timing in Georgian stop clusters is affected by position in the
word and by the order of place of articulation of the stops involved (Chito-
ran et al., 2002; Chitoran and Goldstein, 2006). Clusters in word-initial
position are significantly less overlapped than those in word-internal posi-
tion. Also, clusters with a back-to-front order of place of articulation (like
gd, tp) are less overlapped than clusters with a front-to-back order (dg, pt).

(3) Georgian word-initial clusters

Front-to-back                  Back-to-front
bgera 'sound'                  g-ber-av-s 'fills you up'
pʰtʰila 'hair lock'            tʰb-eb-a 'warms you'
dg-eb-a 'stands up'            gd-eb-a 'to be thrown'

The authors initially attributed these differences to considerations of per-
ceptual recoverability, but a subsequent study (Chitoran and Goldstein,
2006) showed that this explanation is not sufficient. Similar measures of
overlap in clusters combining stops and liquids also show that back-to-front
clusters (kl, rb) are less overlapped than front-to-back ones (pl, rk), even
though in these combinations the stop release is no longer in danger of be-
ing obscured by a high degree of overlap, and liquids do not rely on their
releases in order to be correctly perceived. The timing pattern observed in
stop-liquid and liquid-stop clusters is not motivated by perceptual recove-
rability. Consequently, the same explanation also seems less likely for the
timing of stop-stop clusters. This suggests, in fact, that perceptual recoverability
is not directly encoded in the phonology after all, but rather that the syste-
matic differences observed in timing may be due to language-specific coor-
dination patterns, which can be phonologized, that is, learned as grammati-
cal generalizations.
Moreover, in addition to the front-to-back / back-to-front timing pat-
terns, stop-stop clusters show overall an unexpectedly high degree of sepa-
ration between gestures, more than needed to avoid obscuring the release
burst. Some speakers even tend to insert an epenthetic vowel in back-to-
front stop clusters, the ones with the most separated gestures. While this
process of epenthesis is highly variable at the current stage of the language,
it occurs only in the naturally less overlapped back-to-front clusters, sug-
gesting a further step towards the phonologization of natural timing pat-
terns in Georgian.
The insertion of epenthetic vowels could ultimately affect the phonotac-
tics and syllable structure of Georgian. This would be a significant change,
especially in the case of word-initial clusters. Word-initial clusters are sys-
tematically syllabified as tautosyllabic onset clusters by native speakers.
The phonologization of the epenthetic vowels may lead to the loss of word-
initial clusters from the surface phonology of the language, at least those
with a back-to-front order of place of articulation.
Although the presence of an epenthetic vowel is not currently affecting
speakers syllabification intuitions, articulatory evidence shows that the
syllable structure of Georgian is being affected in terms of articulatory or-
ganization. In a C1vC2V sequence with an epenthetic vowel the two conso-
nants are no longer relatively timed as an onset cluster, rather C1 is timed
as a single onset relative to the epenthetic vowel (Goldstein et al., 2007).
In the model recently developed by Browman & Goldstein (2000) and
Goldstein et al. (2006) syllable structure emerges from the planning and
control of stable patterns of relative timing among articulatory gestures. A
hypothesis proposed in this model states that an onset consonant (CV) is
coupled in-phase with the following vowel. If an onset consists of more
than one consonant (CCV), each consonant should bear the same coupling
relation to the vowel. This would result in two synchronous consonants,
which would make one or the other unrecoverable. Since the order of con-
sonants in an onset is linguistically relevant, it is further proposed that the
consonants are also coupled to each other in anti-phase mode, meaning in a
sequential manner. The result is therefore a competitive coupling graph
between the synchronous coupling of each consonant to the vowel, and the
sequential coupling of the consonants to each other. Goldstein et al. (2007)
examined articulatory measures (using EMMA) which showed that in
Georgian, as consonants are added to an onset (CV → CCV → CCCV) the
time from the target of the rightmost C gesture to the target (i.e., the center)
of the following vowel gesture gets shorter. In other words, the rightmost C
shifts progressively to the right, closer to the vowel. This is the predicted
consequence of the competitive coupling.
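The competitive coupling graph described here lends itself to a small numerical sketch. The following simulation is our own illustration, not the authors' implementation: it treats each gesture as a phase variable, fixes the vowel at phase 0, couples each consonant in-phase with the vowel and the two consonants anti-phase with each other, and relaxes the system by gradient descent on a standard coupling potential (the function name, step size, and starting phases are our assumptions).

```python
import math

def equilibrium_phases(c1=-0.5, c2=0.5, lr=0.05, steps=5000):
    """Relax the phases of two onset consonants, with the vowel fixed
    at phase 0: each C is coupled in-phase with V, while C1 and C2 are
    coupled anti-phase (sequentially) with each other."""
    for _ in range(steps):
        # gradient of -cos(c - 0) (in-phase C-V coupling) plus
        # cos(c1 - c2) (anti-phase C-C coupling) for each consonant
        g1 = math.sin(c1) - math.sin(c1 - c2)
        g2 = math.sin(c2) + math.sin(c1 - c2)
        c1 -= lr * g1
        c2 -= lr * g2
    return c1, c2

c1, c2 = equilibrium_phases()
# The mean of the consonant phases (the "c-center") settles at the
# vowel's phase, while the rightmost consonant is displaced rightward,
# past where a lone onset C (coupled only in-phase with V) would sit.
print(abs((c1 + c2) / 2) < 1e-6, c2 > 0)  # True True
```

Under these assumptions a single-consonant onset settles exactly at phase 0, so the positive displacement of C2 is the analogue of the rightward shift measured articulatorily in Georgian CCV and CCCV onsets.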
In this study, two Georgian speakers produced the triplet rial-i 'commo-
tion', krial-i 'glitter', tskrial-a 'shiny clean'. One of the speakers
shows the rightward shift of the [r], as expected. This effect has previously
also been observed in English, and is known as the c-center effect
(Browman and Goldstein, 1988, Byrd, 1996). The second speaker, how-
ever, did not show the shift in this set of data. This speaker produced an
audible epenthetic vowel in the back-to-front sequence [kr] in all forms.
This suggests that [k] and [r] do not form an onset cluster for this speaker,
and in this case no rightward shift is predicted by the model. The rightward
shift is absent from this speaker's data because the competitive coupling is
absent. Instead, [r] is coupled in-phase with the following [i], and [k] is
coupled in-phase with the epenthetic vowel.
The longer separation observed in Georgian back-to-front clusters may
have been initially motivated by phonetic naturalness (perceptual recover-
ability in stop-stop clusters). But the generalization of this timing pattern to
all back-to-front clusters, regardless of segmental composition, and the
further development of epenthetic vowels in this context can no longer be
attributed directly to the same phonetic cause. An appropriate conclusion to
such facts is the phrase coined by Larry Hyman: "Diachrony proposes,
synchrony disposes" (Hyman, 2005). Once phonologized, synchronic proc-
esses become subject to different factors; therefore, the study of phonetic
naturalness is relevant primarily within the context of diachronic change.
Phonology is the intersection of phonetics and grammar (Hyman, 1976).
The naturalness of phonetics (in our example, the reduced gestural overlap
in back-to-front clusters) thus interacts with grammatical factors in such a
way that the phonetic naturalness observable in phonology (the insertion of
epenthetic vowels) is not the direct encoding of phonetic knowledge, but
rather phonetic knowledge mediated by the principles of the grammar. This
suggests that, as with the case of phonology in phonetics, here too, phonet-
ics and phonology are not reducible to one and the same thing.
Processes may be natural in terms of their motivation. In terms of their
effect they can be more categorical or more gradient. Studies such as the
one outlined above suggest that examining phonetic variability, both within
and across languages, may reveal additional facets of complexity, worthy
of investigation. This brings us back to the two facets of the relationship
between phonology and phonetics.
As discussed above, it is not the case that coarticulation and assimilation
are the same thing, since these patterns are not identical and the coarticula-
tory effects are built on the phonological patterns of assimilation. It is an
illusion to say that treating such patterns in parallel in the phonology and
phonetics poses a duplication problem, as has been suggested by a number
of researchers focusing on the source of naturalness in phonology. Rather
the parallel effects are due indirectly to the ways in which phonology is
natural, not directly in accounting for the effects through a single vocabu-
lary or mechanism. Thus we need to draw a distinction between the source
of the explanation, where indeed at its root some factors may be the same
(see Przezdziecki, 2005 for discussion), and the characterization of the
patterns themselves, which are similar, but not the same.
Since assimilation and coarticulation are distinct, an adequate model
needs to account for both of them. The view taken here is that while as-
similation might arise historically through the process of phonologization,
there is ample evidence that the patterns of assimilation and coarticulation
are not reducible to the same thing, thus we need to understand how the
more categorical patterns and the more gradient patterns relate. In the fol-
lowing section we consider how the issues discussed so far relate to the
question of the relevant units of representation.
5. The multi-faceted nature of complexity
Based on our discussion of the relationship between phonetics and phonol-
ogy, it becomes increasingly clear that the notion of complexity in phonol-
ogy must be a multi-faceted one. As discussion in this chapter highlights,
and as also proposed by Maddieson (this volume), Marsico et al. (2004)
and subsequent work, different measures of complexity of phonological
systems can be calculated, at different levels of representation, notably
features, segments, and syllables. The question of the relevant primary units
is therefore not a trivial one, as it bears directly on the question of the rele-
vant measure of complexity. Moreover, it brings to the forefront the triad
formed by perception units, production units, and units of representation. The
following important questions then arise:
- in measuring complexity, do we need to consider all three members
of the triad in their interrelationship, or is only one of the three
relevant?
- does the understanding of the triad change depending on the pri-
mary categories chosen?
In this section we briefly formulate the questions that we consider relevant
in this respect, and we provide background to start a discussion.
We distinguish here between units at two levels: units at the level of
cognitive representation, and units of perception. The fact that these two
types of representations may or may not be isomorphic suggests that a rele-
vant measure of complexity should not be restricted to only one or the
other. We propose that the choice of an appropriate unit may depend on
whether we are considering: (i) representations, (ii) sound systems, or (iii)
sound patterns. For example, when considering exclusively sound systems,
the segment or the feature has been shown to be appropriate (Lindblom and
Maddieson, 1988; Maddieson, 2006; Marsico et al., 2004), but when con-
sidering the patterning of sounds within a system, a unit such as the gesture
could be considered equally relevant.
The number of representation units proposed in the literature is quite
large.⁴
So far, concrete measures of complexity have been proposed or at
least considered for features, segments, and syllables. The most compelling
evidence for units of perception has also been found for features, pho-
nemes, and syllables (see Nguyen, 2005 for an overview). A clear consen-
sus on a preferred unit of perception from among the three has not been
reached so far. This suggests that all three may have a role to play. In fact,
recent work by Grossberg (2003) and Goldinger and Azuma (2003) sug-
gests that different types of units, of smaller and larger sizes, can be acti-
vated in parallel. Future experiments will reveal the way in which multiple
units are needed in achieving an efficient communication process. If this is
the case, then multiple units are likely to be relevant to computations of
phonological complexity. Obviously, this question cannot be answered
until a fuller understanding of the perception of the different proposed units
has been reached.
Although representation units other than features, segments, and syllables
have been proposed in the literature, we will limit our discussion to
this subset, which overlaps with that of plausible perception units. The
relevance of features for complexity has already been investigated. Marsico
et al. (2004) compare measures of complexity based on different sets of not
so abstract phonetic dimensions, for example features of the type 'high',
'front', 'voiced', etc. Distinctive features as such have not been considered
in calculations of complexity, but their role has been investigated in a re-
lated measure, that of feature economy (Clements, 2003). The hypothesis
based on feature economy predicts that languages tend to maximize the
number of sounds in their inventories that use the same feature set, thus
maximizing the combinatory possibilities of features. Clements thorough
survey of the languages in the UPSID database confirms this hypothesis.
Speech sounds tend to be composed of features that are already used else-
where in a given system. The finding that is most interesting relative to
complexity is that feature economy is not a matter of the total number of
features used per system, but rather of the number of segments sharing a
given feature. This is interesting because feature economy can be seen as a
measure of complexity at the feature level. Nevertheless, this measure
makes direct reference to the segment, another unit of representation. This
again brings up the possibility that more than one unit, at the same time,
may be relevant for computations of phonological complexity. As pointed
out by Pellegrino et al. (2007), the relevance of segments is hard to ignore.
While the authors agree that the cognitive relevance of segments is still
unclear, they ask: if we give up the notion of segments, then what is the
meaning of phonological inventories? Thus, at least intuitively, segments
cannot be excluded from these considerations. As discussed earlier, this is
the level of unit used by Maddieson (1984), and has been the level at which
many typological characterizations have been successfully made. More
recent approaches to complexity have considered the third unit, the sylla-
ble. Maddieson (2007; this volume) has studied the possible correlations
between syllable types and segment inventories, and tone contrasts.
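The observation that economy hinges on how many segments share a feature, rather than on the total feature count, can be made concrete with a small sketch. The inventory, the informal feature labels, and the segments-per-feature index below are illustrative assumptions (a simplified reading of Clements' economy measure), not his actual feature system or the UPSID data:

```python
from collections import Counter

# Toy six-stop inventory with informal feature labels (an illustrative
# assumption, not Clements' feature system or the UPSID data).
inventory = {
    "p": {"labial", "stop", "voiceless"},
    "b": {"labial", "stop", "voiced"},
    "t": {"coronal", "stop", "voiceless"},
    "d": {"coronal", "stop", "voiced"},
    "k": {"dorsal", "stop", "voiceless"},
    "g": {"dorsal", "stop", "voiced"},
}

def feature_sharing(inv):
    """Count how many segments use each feature."""
    counts = Counter()
    for feats in inv.values():
        counts.update(feats)
    return counts

def economy_index(inv):
    """Segments per distinct feature: higher values mean the same
    features are re-used across more segments."""
    features = set().union(*inv.values())
    return len(inv) / len(features)

print(feature_sharing(inventory)["stop"])  # 6: all six segments share 'stop'
print(economy_index(inventory))            # 1.0: 6 segments over 6 features
```

Note that the index makes direct reference to both units at once: features in the denominator, segments in the numerator, which is precisely the point made above about multiple units entering one computation.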
Other units have not yet been considered in the measure of complexity.
Their relevance will depend in part on evidence found for their role in per-
ception and cognitive representation. In addition to this aspect, we believe
that relevant measures will also depend on the general context in which the
interaction of these units is considered: sound inventories or phonological
systems including processes. Moreover, within processes, we expect that
the measures will also differ depending on whether we are considering
synchronic alternations or diachronic change. Finally, to return to the inter-
action between phonetics and phonology, the topic with which we started
this paper, we believe that understanding phonological complexity may
also require an understanding of the relevance of phonetic variation (for example, the phoneme-allophone relation) for a measure of phonological complexity.
Complexity in phonetics and phonology 41


Notes

1. Author's translation.
2. See Jurafsky et al. (2001) for discussion of the role of predictability in language processing and production.
3. Pierrehumbert et al. (2000) make similar observations.
4. Here we only consider abstractionist models, acknowledging the importance of exemplar models (Johnson, 1997; Pierrehumbert, 2001, 2002, among others). At this point in the development of exemplar models the question of complexity has not been addressed, and it is not easy to tell what, in an exemplar model, could be included in a measure of complexity.
References
Anderson, S.
1981 Why phonology isn't natural. Linguistic Inquiry 12: 493-539.
Beddor, P. and H. Yavuz
1995 The relationship between vowel-to-vowel coarticulation and vowel
harmony in Turkish. Proceedings of the 13th International Congress
of Phonetic Sciences, volume 2, pp. 44-51.
Blevins, J.
2004 Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.
Browman, C. and L. Goldstein
1988 Some notes on syllable structure in articulatory phonology. Pho-
netica 45: 140-155.
1992 Articulatory Phonology: an overview. Phonetica 49: 155-180.
2000 Competing constraints on intergestural coordination and self-
organization of phonological structures. Bulletin de la Communica-
tion Parle 5: 25-34.
Byrd, D.
1996 Influences on articulatory timing in consonant sequences. Journal of
Phonetics 24:209-244.
Chitoran, I., L. Goldstein, and D. Byrd
2002 Gestural overlap and recoverability: Articulatory evidence from
Georgian. In C. Gussenhoven and N. Warner (eds.) Laboratory Pho-
nology 7. Berlin: Mouton de Gruyter. pp. 419-48.
Chitoran, I. and L. Goldstein
2006 Testing the phonological status of perceptual recoverability: Articu-
latory evidence from Georgian. Poster presented at Laboratory Pho-
nology 10, Paris, France, June-July 2006.
Chomsky, N. and M. Halle
1968 The Sound Pattern of English. New York, NY: Harper and Row.
Clements, G. N.
2003 Feature economy in sound systems. Phonology 20.3: 287-333.
Cohn, A.C.
1990 Phonetic and Phonological Rules of Nasalization. UCLA PhD dis-
sertation. Distributed as UCLA Working Papers in Phonetics 76.
1998 The phonetics-phonology interface revisited: Where's phonetics?
Texas Linguistic Forum 41: 25-40.
2006a Is there gradient phonology? In G. Fanselow, C. Féry, R. Vogel and M. Schlesewsky (eds.) Gradience in Grammar: Generative Perspectives. Oxford: OUP, pp. 25-44.
2006b Phonetics in phonology and phonology in phonetics. To appear in
Working Papers of the Cornell Phonetics Laboratory 16.
Coupé, C., E. Marsico, and F. Pellegrino
this volume Structural complexity of phonological systems.
Donegan, P. and D. Stampe
1979 The study of natural phonology. In D.A. Dinnsen (ed.) Current ap-
proaches to phonological theory. Bloomington: Indiana University Press
Flemming, E.
2001 Scalar and categorical phenomena in a unified model of phonetics
and phonology. Phonology 18: 7-44.
2002 Auditory Representations in Phonology. New York, NY: Routledge.
[Revised version of 1995 UCLA Ph.D. dissertation.]
Givón, T.
1991 Markedness in grammar: Distributional, communicative and cogni-
tive correlates of syntactic structure. Studies in Language 15.2:
335-370.
Goldinger, S. and T. Azuma
2003 Puzzle-solving science: the quixotic quest for units in speech percep-
tion. Journal of Phonetics 31: 305-320
Goldstein, L., D. Byrd, and E. Saltzman
2006 The role of vocal tract gestural action units in understanding the evolution
of phonology. In M. Arbib (ed.) From action to language: The mirror
neuron system. Cambridge: Cambridge University Press. pp. 215-249.
Goldstein, L., I. Chitoran, and E. Selkirk
2007 Syllable structure as coupled oscillator modes: Evidence from Georgian
vs. Tashlhiyt Berber. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, August 6-10, 2007. pp. 241-244.
Grossberg, S.
2003 Resonant neural dynamics of speech perception. Journal of Phonet-
ics 31: 423-445.

Haspelmath, M.
2006 Against markedness (and what to replace it with). Journal of Lin-
guistics 42: 25-70.
to appear Frequency vs. iconicity in explaining grammatical asymmetries.
Cognitive Linguistics 18:4.
Hayes, B.
1999 Phonetically-driven phonology: The role of optimality theory and
inductive grounding. In M. Darnell, E. Moravcsik, M. Noonan, F. Newmeyer, and K. Wheatley (eds.) Functionalism and Formalism in Linguistics, Volume I: General Papers. Amsterdam: John Benjamins, pp. 243-285.
Hayes, B. and D. Steriade
2004 Introduction: the phonetic bases of phonological markedness. In B.
Hayes, R. Kirchner, and D. Steriade (eds) Phonetically Based Pho-
nology. Cambridge: CUP, pp. 1-33.
Huffman, M.
1990 Implementation of Nasal: Timing and Articulatory Landmarks.
UCLA PhD dissertation. Distributed as UCLA Working Papers in
Phonetics 75.
Hume, E.
2004 Deconstructing markedness: A predictability-based approach. To
appear in Proceedings of The Berkeley Linguistic Society 30.
2006 Language Specific and Universal Markedness: An Information-
Theoretic Approach. Paper presented at the Linguistic Society of
America Annual Meeting, Colloquium on Information Theory and
Phonology. Jan. 7, 2006.
Hume, E. and K. Johnson
2001 The Role of Speech Perception in Phonology. San Diego: Academic
Press.
Hyman, L.
1976 Phonologization. In A. Juilland (ed.) Linguistic Studies Offered to
Joseph Greenberg. Vol. 2. Saratoga: Anna Libri, pp. 407-418.
2001 The limits of phonetic determinism in phonology: *NC revisited. In
E. Hume and K. Johnson (eds.) The Role of Speech Perception in
Phonology. San Diego: Academic Press, pp. 141-185.
2005 Diachrony proposes, synchrony disposes: Evidence from prosody (and chess). Talk given at the Laboratoire Dynamique du Langage, Université de Lyon 2, Lyon, March 25, 2005.
Jakobson, R.
1971 Les lois phoniques du langage enfantin et leur place dans la phonologie générale. In Roman Jakobson: Selected Writings, vol. 1: Phonological Studies. 2nd expanded edition. The Hague: Mouton. pp. 317-327.

Johnson, K.
1997 Speech perception without speaker normalization. In K. Johnson
and J. Mullennix (eds.) Talker Variability in Speech Processing. San
Diego: Academic Press, pp. 146-165.
Jurafsky, D., A. Bell, M. Gregory and W. D. Raymond
2001 Probabilistic relations between words: Evidence from reduction in
Lexical Production. In J. Bybee and P. Hopper (eds) Frequency and
the Emergence of Linguistic Structure. Amsterdam: John Benjamins, pp. 229-254.
Keating, P.
1985 Universal phonetics and the organization of grammars. In V. Fromkin (ed.) Phonetic Linguistics: Essays in Honor of Peter Ladefoged. Orlando: Academic Press, pp. 115-132.
1988 The window model of coarticulation: articulatory evidence. UCLA
Working Papers in Phonetics 69: 3-29.
1990 The window model of coarticulation: articulatory evidence. In Kingston
and Beckman (eds.) Papers in Laboratory Phonology I: Between the
Grammar and the Physics of Speech. Cambridge: CUP, pp. 451-470.
Kingston, J.
1990 Articulatory binding. In J. Kingston and M. Beckman (eds.) Papers
in Laboratory Phonology I: Between the Grammar and the Physics
of Speech. Cambridge: CUP, pp. 406-434.
Kingston, J. and R. Diehl
1994 Phonetic knowledge. Language 70: 419-53.
Kirchner, R.
1998/2001 An Effort-Based Approach to Consonant Lenition. New York, NY:
Routledge. [1998 UCLA Ph.D dissertation]
Klatt, D.
1987 Review of text-to-speech conversion for English. JASA 82.3: 737-793.
Lehmann, C.
1974 Isomorphismus im sprachlichen Zeichen. In H. Seiler (ed.) Linguistic workshop II: Arbeiten des Kölner Universalienprojekts 1973/4. München: Fink (Structura, 8). pp. 98-123.
Lindblom, B. and I. Maddieson
1988 Phonetic universals in consonant systems. In C.N. Li and L.H.
Hyman (eds.) Language, Speech and Mind: Studies in Honor of Vic-
toria H. Fromkin. pp. 62-78. Beckenham: Croom Helm.
Maddieson, I.
1984 Patterns of Sounds. Cambridge: Cambridge University Press
2006 Correlating phonological complexity: Data and validation. Linguistic
Typology 10: 108-125.
2007 Issues of phonological complexity: Statistical analysis of the relationship
between syllable structures, segment inventories and tone contrasts. In
Solé, M-J., P.S. Beddor, M. Ohala (eds.) Experimental Approaches to
Phonology. Oxford University Press, Oxford and New York.
this volume Calculating Phonological Complexity.
Marsico, E., I. Maddieson, C. Coupé, F. Pellegrino
2004 Investigating the hidden structure of phonological systems. Pro-
ceedings of The Berkeley Linguistic Society 30: 256-267.
Newmeyer, F.
1992 Iconicity and generative grammar. Language 68: 756-796.
Nguyen, N.
2005 La perception de la parole. In Nguyen, N., S. Wauquier-Gravelines, J. Durand (eds.) Phonologie et phonétique: Forme et substance. Paris: Hermès. pp. 425-447.
Ohala, J.J.
1981 The listener as a source of sound change. In C. Masek, R. Hendrick,
and M. Miller (eds.) Papers from the Parasession on Language and
Behavior. Chicago : CLS. pp. 178-203.
Pellegrino, F. , C. Coup, and E. Marsico
2007 An information theory-based approach to the balance of complexity
between phonetics, phonology and morphosyntax. Paper presented
at the Linguistic Society of America, LSA Complexity Panel, January
2007.
Pierrehumbert, J.
1980 The Phonology and Phonetics of English Intonation. MIT Ph.D.
dissertation.
2001 Exemplar dynamics: Word frequency, lenition and contrast. In J.
Bybee and P. Hopper (eds.) Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. pp. 137-157.
2002 Word-specific phonetics. In C. Gussenhoven and N. Warner (eds.)
Laboratory Phonology 7. Berlin: Mouton de Gruyter. pp. 101-139.
Pierrehumbert, J., M. Beckman, and D. R. Ladd
2000 Conceptual foundations in phonology as a laboratory science. In N.
Burton-Roberts, P. Carr and G. Docherty (eds.) Phonological
Knowledge: Conceptual and Empirical Issues, New York: Oxford
University Press. 273-304.
Pouplier, M.
2003 Units of phonological encoding: Empirical evidence. PhD disserta-
tion, Yale University.
Prince, A. and P. Smolensky
2004 Optimality Theory: Constraint Interaction in Generative Grammar.
Malden, MA: Blackwell.

Przezdziecki, M.
2005 Vowel Harmony and Coarticulation in Three Dialects of Yorùbá: Phonetics Determining Phonology. Cornell University PhD dissertation.
Stampe, D.
1979 A Dissertation in Natural Phonology. New York: Garland Press.
[1973 University of Chicago Ph.D dissertation]
Steriade, D.
2001 Directional asymmetries in assimilation: A directional account. In E.
Hume and K. Johnson (eds.) The Role of Speech Perception in Pho-
nology. San Diego: Academic Press, pp. 219-250.
Stevens, K.
1989 On the quantal nature of speech. Journal of Phonetics 17: 3-45.
Trubetzkoy, N.S.
1939 Grundzüge der Phonologie. Publié avec l'appui du Cercle Linguistique de Copenhague et du Ministère de l'instruction publique de la République Tchéco-slovaque. Prague.
1969 Principles of Phonology [Grundzüge der Phonologie] translated by C. Baltaxe. Berkeley: University of California Press.
Tsuchida, A.
1997 Phonetics and Phonology of Japanese Vowel Devoicing. Cornell
University, PhD dissertation.
1998 Phonetic and phonological vowel devoicing in Japanese. Texas
Linguistic Forum 41: 173-188.
Vallée, N.
1994 Systèmes vocaliques: de la typologie aux prédictions. PhD dissertation, Université Stendhal, Grenoble.
Vallée, N., L-J Boë, J-L Schwartz, P. Badin, C. Abry
2002 The weight of phonetic substance in the structure of sound invento-
ries. ZAS Papers in Linguistics 28: 145-168.
Zsiga, E.C.
1995 An acoustic and electropalatographic study of lexical and postlexical
palatalization in American English. In B. Connell and A. Arvaniti (eds.)
Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV.
Cambridge: Cambridge University Press. pp. 282-302.


Languages' sound inventories: the devil in the details
John J. Ohala
1. Introduction
In this paper I am going to modify somewhat a statement made in Ohala
(1980) regarding languages' speech sound inventories exhibiting the "maximum use of a set of distinctive features." In that paper, after noting that vowel systems seem to conform to the principle of maximal acoustic-perceptual differentiation (as proposed earlier by Björn Lindblom), I observe:
... it would be most satisfying if we could apply the same principles to
predict the arrangement of consonants, i.e., posit an acoustic-auditory space
and show how the consonants position themselves so as to maximize the in-
ter-consonantal distance. Were we to attempt this, we should undoubtedly
reach the patently false prediction that a 7 consonant system should include
something like the following: , k, ts, , m, r, . Languages which do have
few consonants, such as the Polynesian languages, do not have such an ex-
otic inventory. In fact, the languages which do possess the above set (or
close to it) such as Zulu, also have a great many other consonants of each
type, i.e., ejectives, clicks, affricates, etc. Rather than maximum differentia-
tion of the entities in the consonant space, we seem to find something ap-
proximating the principle which would be characterized as maximum utili-
zation of the available distinctive features. This has the result that many of
the consonants are, in fact, perceptually quite close, differing by a minimum, not a maximum, number of distinctive features.¹

Looking at moderately large to quite large segment inventories like those in
English, French, Hindi, Zulu, Thai, this is exactly the case. Many segments
are phonetically similar and as a consequence are confusable.
Some data showing relatively high rates of confusion of certain CV syl-
lables (presented in isolation, hi-fi listening condition) (from Winitz et al.
1972) are given in Table 1.


Table 1. Confusion matrix from Winitz et al. (1972). Spoken syllables consisted of
stop burst plus 100 msec of following transition and vowel; high-fidelity
listening conditions. Numbers given are the incidence of the specified re-
sponse to the specified stimulus.
                 Response:  /p/    /t/    /k/
    Stimulus:  /pi/         .46    .38    .17
               /pa/         .83    .07    .11
               /pu/         .68    .10    .23
               /ti/         .03    .88    .09
               /ta/         .15    .63    .22
               /tu/         .10    .80    .11
               /ki/         .15    .47    .38
               /ka/         .11    .20    .70
               /ku/         .24    .18    .58
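The asymmetries in such a matrix can be pulled out mechanically. The sketch below (response proportions transcribed from Table 1) reports each stimulus's modal response; note that /ki/ is more often reported as /t/ than as /k/:

```python
# Response proportions from Table 1 (Winitz et al. 1972); columns /p/ /t/ /k/.
confusions = {
    "pi": (0.46, 0.38, 0.17), "pa": (0.83, 0.07, 0.11), "pu": (0.68, 0.10, 0.23),
    "ti": (0.03, 0.88, 0.09), "ta": (0.15, 0.63, 0.22), "tu": (0.10, 0.80, 0.11),
    "ki": (0.15, 0.47, 0.38), "ka": (0.11, 0.20, 0.70), "ku": (0.24, 0.18, 0.58),
}
responses = ("p", "t", "k")

for stim, props in confusions.items():
    modal = responses[props.index(max(props))]  # most frequent response
    correct = props[responses.index(stim[0])]   # proportion of correct responses
    note = "" if modal == stim[0] else "  <- modal response is wrong"
    print(f"/{stim}/  correct: {correct:.2f}  modal: /{modal}/{note}")
```

Run over the table, only /ki/ yields a modal response that differs from the stimulus, and the front-vowel syllables /pi/ and /ki/ show the lowest correct-identification rates, in line with the discussion of [pi]~[ti] confusions below.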

I do actually believe that the degree of auditory distinctiveness plays some role in shaping languages' segment inventories, especially when auditory distinctiveness is low. Sound change, acting blindly (i.e., non-teleologically), weeds out similar sounding elements through confusion which results in mergers and loss. The loss in some dialects of English of /θ/ and /ð/ and their merger with either /f/ and /v/ (respectively) or with /t/
and /d/ (respectively) is a probable example. I also believe it is sound
change, again acting blindly, which is largely responsible for the introduc-
tion of new series of segments which involve re-use of some pre-existing
features. In some cases there is historical evidence of this. Proto-Indo-
European had only three series of stops: voiced, voiceless, and breathy-voiced (i.e., among labials: /b/, /p/, /bʱ/). The voiceless aspirated series, /pʰ/,
exemplified in Sanskrit and retained in many of the modern Indo-Aryan
languages (like Hindi) developed by sound change from the (simple) voiceless series. And a fifth (!) series of stops, the voiced implosives, /ɓ, ɗ, etc./
in Sindhi developed from geminated versions of the (simple) voiced stops.
Similarly we know that the nasal vowels in French and Hindi developed
out of the pre-existing oral vowels plus following nasal consonant (with the
nasal consonant lost). (E.g., French saint [sɛ̃] < Latin sanctus 'holy'; Hindi dant 'tooth' [dãt] < IE dont-, dent- 'tooth'.) It is also relevant to my case
that historically French once had as many nasal as oral vowels and then
over the centuries reduced the nasal vowel inventory due to, I have argued,
auditory similarity (Ohala and Ohala, 1993).
But the point that I want to revise or distance myself from somewhat is
the idea that re-use of distinctive features always results in a cost-minimal
augmentation, vis-à-vis the introduction of segments that are distinguished
by virtually all new distinctive features.
I suppose the basic message I am emphasizing here is that the apparent
symmetry found in many languages' segment inventories (or possibly the symmetry imposed by the analyst who put segments in matrices where all rows and columns are uniformly filled) obscures a more complicated situation. There is a great deal of what is referred to as allophonic variation, usually lawful contextual variation. What this means is that the neat symmetrical matrices of speech sound inventories are really abstractions. The complications (the devilish details referred to in my title) have been "swept under the rug"! Can we ignore this variation when speculating about common cross-language tendencies in the form of languages' segment inventories? I say no, since in many cases the same principles are at work whether they lead to apparent symmetry or asymmetry, and I'll give some examples in the following sections.
2. Some examples of devilish details
2.1. [p] is weaker than other voiceless stops
Among languages that have both voiced and voiceless stops, the voiceless
bilabial [p] is occasionally missing, e.g., Berber, and this gap is much more
common than a gap at any other place of articulation among voiceless stop
series.² In Japanese the /p/ has a distribution unlike other voiceless stops: it doesn't occur in word-initial position except in onomatopoeic vocabulary (e.g., /patʃapatʃa/ 'splash') or medially as a geminate (e.g., /kappa/ 'cucumber sushi') or in a few other medial environments. Phonetically, in English and many other languages, the burst of the /p/ has the lowest intensity of any of the voiceless stops. The reason, of course, is that there is no downstream resonator to amplify the burst. We should see that the latter phonetic fact is the unifying principle underlying all these patterns. (And this is, in part, the reason why the sequence [pi] is confused with [ti], as documented in Table 1.)


2.2. Voicing in stops and place of articulation
Among voiced stops, the velar, [g], is often missing in languages' stop inventories even though they may have a voicing contrast in stops articulated
at more forward places of articulation, e.g., in Thai, Dutch, and Czech (in
native vocabulary). In some languages, morphophonemic variations involving the gemination of voiced stops show an asymmetry in their behavior
depending on how far front or back the stop is articulated. E.g., in Nubian
(see Table 2), the geminate bilabial stop retains its voicing; those made
further back become voiceless.

Table 2. Morphophonemic variation in Nubian (from Bell, 1971)
Noun stem Stem + and English gloss
/fab/ /fabn/ father
/sd/ /stn/ scorpion
/kad/ /katn/ donkey
/m/ /mkn/ dog

The usual description of the allophonic variation of voiced stops in English (/b d g/) is that they are voiceless unaspirated in word-initial position but voiced between sonorants. In my speech, however, and that of another male native speaker of American English, I have found that /g/ is voiceless even intervocalically. See Figure 1 (subject DM), which gives the waveform and accompanying pharyngeal pressure (sampled with a thin catheter inserted via the nasal cavity). The utterance, targeted as /gɑ/, is manifested as [kɑ]. However, as shown in Figure 2, when the pharyngeal pressure was artificially lowered (by suction applied via a second catheter inserted in the other side of the nasal cavity), the /g/ was voiced!



Figure 1. Acoustic waveform (top) and pharyngeal pressure (bottom) of the utterance /gɑ/ spoken by subject DM, a male native speaker of American English. Condition: no venting of pharyngeal pressure. Phonetically the realization of the intervocalic stop was voiceless.

Figure 2. Acoustic waveform (top) and pharyngeal pressure (bottom) of the utterance /gɑ/ spoken by subject DM, a male native speaker of American English. Condition: artificial venting of pharyngeal pressure. Phonetically the realization of the intervocalic stop was voiced (evident in the pressure signal).
All of these patterns, from the absence of [g] in Thai to the voiceless realization of /g/ intervocalically in American English speakers, are manifestations of the same universal aerodynamic principle: the possibility of voicing during stops requires a substantial pressure drop across the glottis, and this depends partly on the volume of the cavity between the point of articulation and the larynx and, more importantly, on the possibility of passive expansion of that cavity in order to make room for the incoming air flow. Velars and back-articulated stops have less possibility to accommodate the incoming airflow and so voicing is threatened.
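This aerodynamic account can be made concrete with a back-of-the-envelope simulation of oral pressure build-up behind a stop closure. All constants below (pressures, flows, cavity sizes, wall-expansion rates) are rough illustrative assumptions, not measured values:

```python
# Toy simulation of the aerodynamic voicing constraint: glottal airflow fills
# the cavity behind the stop closure; voicing is assumed to cease once the
# transglottal pressure drop falls below a threshold.

P_SUB = 8.0        # subglottal pressure (cm H2O); illustrative
THRESHOLD = 2.0    # minimum transglottal drop sustaining voicing (cm H2O)
P_ATM_CM = 1033.0  # atmospheric pressure in cm H2O (for the Boyle's-law step)
FLOW = 100.0       # glottal airflow into the closed cavity (cm^3/s)
DT = 0.001         # time step (s)

def voicing_duration(cavity_cm3, wall_expansion_cm3_s):
    """Time (s) until the oral-subglottal pressure drop hits THRESHOLD."""
    p_oral, t = 0.0, 0.0
    while P_SUB - p_oral > THRESHOLD:
        # net air accumulating once passive wall expansion is subtracted
        net_in = max(FLOW - wall_expansion_cm3_s, 0.0) * DT
        p_oral += P_ATM_CM * net_in / cavity_cm3  # Boyle's-law increment
        t += DT
        if t > 0.5:  # safety cap: voicing effectively unthreatened
            break
    return t

# Bilabial closure: large cavity, compliant cheeks; velar: small, stiff cavity.
print(voicing_duration(cavity_cm3=80.0, wall_expansion_cm3_s=40.0))
print(voicing_duration(cavity_cm3=20.0, wall_expansion_cm3_s=5.0))
```

With these toy numbers the large, compliant cavity behind a bilabial closure sustains the transglottal pressure drop several times longer than the small, stiff cavity behind a velar closure, mirroring the bias against [g] described above.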
3. On the various cues for obstruent voicing in English
Lisker (1986) listed several features in addition to presence/absence of
voicing or differences in VOT by which the so-called voicing distinction
in English obstruents is differentiated perceptually.


3.1. F0 perturbations on vowels following stops
The vowels immediately after voiced and voiceless obstruents show a sys-
tematic F0 variation. Figure 3 shows data from Hombert et al. (1979).
These curves represent unnormalized averages of 100 msec of the F0 con-
tours following /b d g/ (lower curve) and /p t k/ (upper curve) from 5
speakers of American English. Each curve is the average of 150 tokens.
(Given that /p t k/ have a positive VOT whereas /b d g/ have VOT close to zero, the onsets of the curves are phase-shifted with respect to the moment of stop release.)³ Such F0 differences can be explained as being mechanically
caused due to differences in vocal cord state, i.e., they seem not to be pur-
poseful on the part of speakers; see Ohala et al. (2004). Nevertheless Fuji-
mura (1971) has presented evidence that such F0 contours are used by na-
tive speakers of English to differentiate this contrast when all other cues
have been neutralized. Does this mean that English is a tone language? We
would probably answer no since the speaker doesnt have to separately
produce and control the tension of the laryngeal muscles to implement
these F0 differences. So it is English listeners, if not the speakers, that have
the added complexity in their perceptual task of recognizing F0 differences
just as native speakers of tone languages do. It is not much of a simplifica-
tion of the sound system of a language if the language users (in their role as
listeners) have to have skill in categorical recognition of short-term F0 con-
tours in addition to recognizing voicing itself or VOT differences.


Figure 3. Average fundamental frequency values of vowels following English
stops (data from five speakers). The curves labelled [p] and [b] represent
the values associated with all voiceless and voiced stops, respectively
regardless of place of articulation. The zero point on abscissa represents
the moment of voice onset; with respect to stop release, this occurs later
in real time in voiceless aspirated stops (from Hombert et al., 1979).


3.2. Secondary cues to voicing in coda obstruents
As is well known, the class of supposedly voiced and voiceless obstruents
in coda position are reliably differentiated by vowel duration, longer dura-
tion of the vowel before voiced obstruents than before voiceless ones (by
ratios of up to 3:2). Since this ratio is so large and there is no apparent mechanical cause of this difference, as Lisker (1974) concluded, this means
that in this case both speaker and listener have to have distinctive vowel
length in their grammars.


3.3. Vowel-influenced variations in VOT
Several studies have shown that the positive VOT of the voiceless aspirated
stops in English show vowel-specific variations (Lisker and Abramson
1964, 1967; Ohala 1981a): VOT is longer before (actually when the stop is
coarticulated with) high close vowels than before open vowels. These
variations are probably an automatic consequence of differences in degree
of aerodynamic resistance to the exiting airflow. The higher resistance of-
fered by the close vowels delays the decay of Pₒ and thus the onset of voicing. Figure 4 (from Ohala, 1981a) gives data from English and Japanese (from unpublished studies by Robert Gaskins and Mary Beckman, respectively). The English data provide further evidence on the tendency of back-articulated voiced stops to be voiceless since here, even the so-called "voiced" velar has a positive VOT. Lisker and Abramson (1967) have
shown that listeners are sensitive to these vowel-specific variations: cross-
over points in the identification of the two categories of stops when pre-
sented in a VOT continuum also vary with the quality of the following
vowel. This phonetic detail therefore must be part of the English-speaking
listeners knowledge about the sound pattern of the language.

I could add many other examples where there are numerous acoustic fea-
tures characteristic of specific consonant-vowel sequences or at least spe-
cific classes of sounds in the context of other specific classes. The net result
of this is to add complexity to the signaling system of language that goes
beyond what is implied by simply adding another row or column to the
language's phoneme inventory.


Figure 4. VOT variation for stops as a function of following vowel in English (a)
and Japanese (b) (from Ohala, 1981a).
4. Conclusion
If we conceive our task as phonologists as one of characterizing and under-
standing the function of speech to serve as a medium of communication, then we want to know the implications for this function of the differences between the phonological systems of, say, Rotokas with its 11 phonemes and !Xũ with its 141 phonemes. Just listing the segmental inventories in the traditional articulatory matrix does not tell the whole story. Adding new columns or rows can complicate the task of the language's speakers. The evidence that experimental phonetics has uncovered about so-called secondary distinctive features in virtually every language whose sound system has been studied in some detail, especially the findings that these features may be different in different contexts, makes it clear that a language's phonological complexity is itself a complex issue.
In the model of sound change that I have proposed (Ohala 1981b, 1993)
the so-called secondary features are very important for understanding why
change takes place and why it takes a particular direction. A feature that
was secondary can become one of the primary distinctive features of a pho-
nological contrast if the primary feature(s) are not detected or are misinter-
preted. The important point in this model is that some of the elements of the "after" state were already present in the "before" state, without being explicitly listed in the inventory.
What type of (re)presentation would correct the deficiencies found in the familiar matrices of languages' segment inventories that I have argued reduce their utility? To begin with, to the extent that the detailed phonetic
character of sounds has been uncovered through empirical means, then
segment inventories should show that. E.g., when presenting the English
stops, rather than labeling them as [voiced] and [voiceless], they should be
given with more accurate labels: [voiceless unaspirated] and [voiceless
aspirated], respectively. Another example: if certain dialects of Swedish
have an alveolar or apical-retracted voiced stop vs. a dental voiceless stop
(Livijn and Engstrand, 2001), then they should not both appear in the same
place of articulation column. Many other languages show the same pat-
tern, i.e., where the voiced coronal stop has a place of articulation that is
more posterior than that of its voiceless counterpart (Sprouse, Sol and
Ohala, 2008). Of course, systematic contextual variation should also be
noted, e.g., that the voiced apical stop in English is often realized as a tap [ɾ] in some contexts. Also, an elaborated but much more useful account of
the contrasts in a language should reflect in some way whether there is any
evidence that there are differences in the cognitive task/abilities required of
speakers vs. listeners in processing allophonic differences due to phonetic
context. For example, as mentioned above, the slight differences in VOT
before different vowels (close vowels showing slightly longer VOT than
those before more open vowels) is probably not something the speaker has
to implement purposefully; they will occur due to physical constraints (the
greater impedance to the exiting airflow delays the achievement of the
transglottal pressure difference necessary to initiate voicing). Nevertheless,
listeners are no doubt aware of these differences and use them when identi-
fying pre-vocalic stops. Thus the knowledge of the speaker and hearer about contextual variation may be different and the compleat psychologically valid phonological grammar of a language should reflect this.
Thus we need to pay more attention to the devilish details in the imple-
mentation of phonological contrasts. It may help us to understand better
both sound change and the communicative function of speech.


Notes

1. The IPA symbols in this quote conform to current conventions, not those in 1980.
2. For a survey of gaps in consonant inventories, see Sherman (1975).
3. See also Ohala (1974) for similar data on F0 following the release of /s/ and /z/.
References
Bell, H.
1971 The phonology of Nobiin Nubian. African Language Review 9: 115-
139.
Fujimura, O.
1971 Remarks on stop consonants: synthesis experiments and acoustic
cues. In: L. L. Hammerich, R. Jakobson and E. Zwirner (eds.), Form
and substance: phonetic and linguistic papers presented to Eli
Fischer-Jørgensen. pp. 221-232. Copenhagen: Akademisk Forlag.
Hombert, J.-M., Ohala, J. J., & Ewan, W. G.
1979 Phonetic explanations for the development of tones. Language 55: 37-58.
Lisker, L.
1974 On explaining vowel duration variation. Glossa 8: 233-246.
1986 Voicing in English: A catalogue of acoustic features signaling /b/
vs. /p/ in trochees. Language and Speech 29: 3-11.
Lisker, L., & Abramson, A.
1964 A cross-language study of voicing in initial stops: Acoustical meas-
urements. Word, 20: 384-422.
1967 Some effects of context on voice onset time in English stops. Lan-
guage and Speech, 10: 1-28.
Livijn, P. & Engstrand, O.
2001 Place of articulation for coronals in some Swedish dialects. Pro-
ceedings of Fonetik 2001, the XIVth Swedish Phonetics Conference,
Örenäs, May 30 - June 1, 2001. Working Papers, Department of
Linguistics, Lund University 49, pp. 112-115.


Ohala, J. J.
1974 Experimental historical phonology. In: J. M. Anderson & C. Jones
(eds.), Historical Linguistics II, Theory and description in phonol-
ogy: Proceedings of the 1st International Conference on Historical
Linguistics, Edinburgh. pp. 353-389. Amsterdam: North-Holland.
1980 Moderator's summary of symposium on 'Phonetic universals in pho-
nological systems and their explanation.' Proc., 9th Int. Cong. of
Phon. Sc. Vol. 3. Copenhagen: Institute of Phonetics. pp. 181-194.
1981a Articulatory constraints on the cognitive representation of speech. In:
T. Myers, J. Laver, and J. Anderson (eds.), The Cognitive Represen-
tation of Speech. Amsterdam: North Holland. pp. 111-122.
1981b The listener as a source of sound change. In: C. S. Masek, R. A.
Hendrick, & M. F. Miller (eds.), Papers from the Parasession on
Language and Behavior. Chicago: Chicago Ling. Soc. pp. 178 - 203.
1993 The phonetics of sound change. In Charles Jones (ed.), Historical
Linguistics: Problems and Perspectives. London: Longman.
pp. 237-278.
Ohala, J. J., A. Dunn, & R. Sprouse.
2004 Prosody and Phonology. Speech Prosody 2004, Nara, Japan.
Ohala, J. J. & Ohala, M.
1993 The phonetics of nasal phonology: theorems and data. In M. K.
Huffman & R. A. Krakow (eds.), Nasals, nasalization, and the ve-
lum. San Diego, CA: Academic Press. pp. 225-249.
Sherman, D.
1975 Stop and fricative systems: a discussion of paradigmatic gaps and the
question of language sampling. Stanford Working Papers in Lan-
guage Universals. 17: 1-31.
Sprouse, R., Solé, M.-J. and Ohala, J.
2008 Oral cavity enlargement in retroflex sounds. Paper delivered at the
8th International Seminar on Speech Production, Strasbourg.
Winitz, H., Scheib, M. E. & J. A. Reeds.
1972 Identification of stops and vowels. Journal of the Acoustical Society
of America 51.4: 1309-1317.


Signal dynamics in the production and perception
of vowels
René Carré
Vowels can be produced with static articulatory configurations represented
by dots in acoustic space (generally by formant frequencies in the F1-F2
plane). But because vowel characteristics vary with speaker, consonantal
environment (co-articulation) and production rate (reduction phenomena),
vowel formant frequencies can also be represented by their mean values
and standard deviations, according to different categories (language, age
and gender of speaker). The use of targets means that vowels are generally
studied from a static point of view. But several questions can be raised:
How are vowel representations set up if vowel realizations rarely reach
their targets in running speech production (vowel reduction (Lindblom,
1963))? Is representation the same from one person to another? How is a
given vowel, produced several times with different acoustic characteristics
and in different environments, identified? By using contextual information?
By normalization? These questions lead to studying vowels from a dynamic
point of view.
Here, we first propose a theoretical deductive approach to vowel-to-
vowel dynamics which leads to a specification in terms of vocalic trajecto-
ries in the acoustic space characterized by their directions. Then, results on
V1V2 transitions produced and perceived by subjects will be presented. In
production, measurements of the F1 and F2 transition rates are represented
in the F1 rate/F2 rate plane. In perception, direction and rate of synthesized
transitions are studied for transitions situated outside the traditional F1/F2
vowel triangle. This situation enables the study of transitions characterized
only by their directions and rates without relation to any vowel targets of
the vowel triangle. Such experiments show that these transitions can be
perceived as V1V2. Several issues can then be revisited in the light of this
dynamic representation: vocalic reduction, hyper and hypo speech, norma-
lization, perceptual overshoot, etc. A fully dynamic representation of both
vowels and consonants is proposed.
1. Introduction
Vowels are generally characterized by the first two or three formant fre-
quencies. Each of them can be represented in the acoustic space (F1-F2
plane) by a dot (Peterson and Barney, 1952). This specification is static.
Vowels can be produced in isolation without articulatory variations, but in
natural speech such cases are atypical since their acoustic characteristics
are not stable. They vary with the speaker and with the age and gender of
the speaker, with the consonantal context (coarticulation), with the speak-
ing rate (reduction phenomena), and with the language (Lindblom, 1963).
So, vowels are classified into crude classes, first according to the language,
and then according to speaker categories. Within each category, vowels can
be specified in terms of underlying targets corresponding to the context-
and duration-independent values of the formants as obtained by fitting de-
caying exponentials to the data points (Moon and Lindblom, 1994). The
point in focus here is that this specification is static and, significantly, may
be taken to imply that the perceptual representation corresponds to the tar-
get values (Strange, 1989).
At this point, several questions can be raised: How is the perceptual rep-
resentation obtained if the vowel targets depend on the speaker, and are
rarely reached in spontaneous speech production? Are the representations
the same from one person to another (Johnson, et al., 1993; Carré and
Hombert, 2002; Whalen, et al., 2004)? How is this perceptual representa-
tion built: by learning, or is it innate? How is the vowel perceived with its
different acoustic characteristics according to the context and the speaker:
by normalization (Nordström and Lindblom, 1975; Johnson, 1990; John-
son, 1997)? Why is vowel perception less categorical than consonant per-
ception (Repp, et al., 1979; Schouten and van Hessen, 1992)?
Many studies have been undertaken to answer these questions. The re-
sults are generally incomplete and contradictory. They cannot be used to set
up a simple theory explaining all the results. But they help highlight the
importance of dynamics in vowel perception (Shankweiler, et al., 1978;
Verbrugge and Rakerd, 1980; Strange, 1989).
In view of the fact that sensory systems have been shown experimen-
tally to be more sensitive to changing stimulus patterns than to purely
steady-state ones, it appears justified to look for an alternative to static tar-
gets - a specification that recognizes the true significance of signal time
variations. One possibility is that dynamics can be characterized by the
direction and the rate of the vocalic transitions:
Vowel-vowel trajectories in the F1/F2 plane are generally rectilin-
ear (Carré and Mrayati, 1991), so they can be characterized by
their direction. Moreover, privileged directions are observed in the
production of vowels (in single-vowel or CV syllables), called
"vowel inherent spectral changes" (VISC) (Nearey and Assmann,
1986). Perception experiments also show the importance of
VISC in improving vowel identification (Hillenbrand, et al., 1995;
Hillenbrand and Nearey, 1999).
On the topic of transition rate, we recall the results of Kent and
Moll (1969): "the duration of a transition and not its velocity
tends to be an invariant characteristic of VC and CV combina-
tions." Gay (1978) confirmed these observations with different
speaking rates and with vowel reduction: "the reduction in duration
during fast speech is reflected primarily in the duration of the
vowel"; the transition durations within each rate were relatively
stable across different vowels. If the transition duration is in-
variant across a set of CVs with the same C and varying Vs, it fol-
lows that the transition rate depends both on the consonant and the
vowel to be produced.
So the time domain could play an important role in the identification of
vowels (Fowler, 1980). For example, to discriminate the sequences [ae],
[a], and [ai], what acoustic information does the listener need? The answer
is that the second vowel V2 can be detected by using the transition rate as a
cue. This parameter can be specific to the speaker and/or related to the syl-
labic rate. At the very beginning of the transition and throughout the transi-
tion there is sufficient information to detect V2. There are no privileged
points in time (for example the middle of V2 to measure the formant fre-
quencies) for V2 detection. The rate measure is therefore very appropriate
in a noisy environment. It can also explain the perceptual results obtained
by Strange et al. (1983) in "silent center" experiments that replaced the cen-
ter of the vowel by silence of equivalent duration. This manipulation pre-
serves the direction and the rate of the transition as well as the temporal
organization (syllabic rate). Also relevant are experiments by Divenyi et al.
(1995) showing that, in V1V2 stimuli, V2 was perceived even when V2
and the last half of the transition were removed by gating. Finally, it can be
observed that both Arabic (Al-Tamimi, et al., 2004) and Vietnamese sub-
jects (Castelli and Carré, 2005) have difficulties in producing and perceiv-
ing isolated vowels.
In this paper, a deductive theoretical approach to the study of vowel-to-
vowel dynamics is proposed. It leads to a specification of vocalic trajecto-
ries in the acoustic space characterized by their directions. Then, results of
the production and the perception of V1V2 transitions by French subjects
are presented. Production measurements of the F1, F2 transition rates are
represented in a F1 rate / F2 rate plane. In a perceptual study, we focus on
the direction and rate of synthesized transitions situated outside the tradi-
tional F1/F2 vowel triangle. This situation enables the study of transitions
characterized only by their directions and rates without reference to any
vowel targets in the vowel triangle.
2. Deductive approach and vowel-to-vowel trajectories
Here, we try to infer vowel properties, not from data on vowel production
and perception, but by a deductive approach starting from general physical
properties of an acoustic tube. If the goal is to build an efficient device for
acoustic communication, i.e. a device that gives maximum acoustic contrast
for minimum area shape deformation, then the tube must be structured into
specific regions leading to a corresponding organization of the acoustic
space (Carré, 2004; Carré, 2009). A recursive algorithm using the calcula-
tion of the sensitivity function (Fant and Pauli, 1974) deforms efficiently
the shape of the tube in order to increase (or decrease), step by step, the
formant frequencies. By "efficiently" we mean that a small or minimal area
shape deformation gives maximum formant variations in the F1/F2 plane.
The acoustic space automatically obtained corresponds to the vowel trian-
gle (which is, consequently, maximal in size; it cannot be larger) (Carré,
2009). [a] is obtained with a back constriction and a front cavity; [i] with a
front constriction and a back cavity; [u] with a central constriction and two
lateral cavities. In the first two cases, the front end of the tube is open; in
the last case the front end is almost closed. The specific regions automati-
cally obtained correspond to the main places of articulation (front, back and
central) used to produce vowels (Carré, 2009). The shape deformations can
be represented by deformation gestures corresponding to three different
tongue gestures and one lip gesture. The three different tongue gestures
are: a transversal deformation gesture from front to back places of articula-
tion (and vice-versa) producing [ia] (or [ai]), a longitudinal deformation
gesture from front to central constriction producing [iu] and a longitudinal
displacement gesture from back to central place of articulation producing
[au] (Carré, 2009). The lip gesture is used to reach [u] (low F1 and F2). The
deformations can easily be modelled by the Distinctive Region Model
(DRM) (Mrayati, et al., 1988; Carré, 2009).


Figure 1. Vocalic trajectories obtained by deduction from acoustic characteristics
of a tube and corresponding vowels. Dotted lines are labialized trajecto-
ries (Carré, 2009).

From the DRM model, eight more or less rectilinear trajectories structuring
the acoustic space were obtained: [ai], [u], [iu], [ay], [], [uy], [i],
[yu] (Figure 1). The maximum acoustic space obtained by this approach fits
well with the vowel triangle. The use of the DRM does not lead to charac-
terising vowels first, but rather privileges vocalic trajectories. A maximum
acoustic contrast criterion (Carré, 2009) would select the endpoints, and
intermediate points on the trajectories which correspond well with the vow-
els given for example by Catford (1988).
Recall that the recursive algorithm calculates, from an initial shape of
the area function of a tube, a new shape according to a minimum of energy
criterion (minimum deformation leads to maximum acoustic variation)
(Carré, 2004). This operation is repeated until the maximum acoustic limits
of the tube are reached. Thus, the algorithm simulates an evolutionary
process (the goal is not pre-specified at the beginning of the process), by
simply increasing acoustic contrast, step by step, according to a minimum
of energy criterion. The resulting trajectories in the acoustic plane can be
characterized by their directions.
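The recursive step just described can be sketched numerically: at each iteration, find the shape deformation of unit size that produces the largest acoustic displacement, and move a small step along it. The acoustic mapping below is a hypothetical two-parameter stand-in, not a real tube model (whose formants would have to be derived from the area function, e.g. via the sensitivity functions of Fant and Pauli); only the selection of the most efficient deformation direction is illustrated.

```python
import numpy as np

def formants(shape):
    # Hypothetical stand-in for the area-function-to-(F1, F2) mapping;
    # a real model would solve the acoustics of the tube.
    a, b = shape
    return np.array([500.0 + 300.0 * np.tanh(a),
                     1500.0 + 900.0 * np.tanh(b)])

def deform_step(shape, step=0.05, eps=1e-4):
    """One recursive step: pick the unit deformation direction giving
    maximum acoustic displacement in the F1-F2 plane (maximum contrast
    for minimum shape deformation), then move a small step along it."""
    f0 = formants(shape)
    # Numerical Jacobian of the acoustic mapping at the current shape.
    jac = np.zeros((2, len(shape)))
    for i in range(len(shape)):
        d = np.zeros(len(shape))
        d[i] = eps
        jac[:, i] = (formants(shape + d) - f0) / eps
    # The most efficient direction is the leading right singular vector.
    _, _, vt = np.linalg.svd(jac)
    return shape + step * vt[0]
```

Iterating deform_step until the formants stop moving mimics the evolutionary process: the trajectory is not aimed at a pre-specified goal but simply increases acoustic contrast, step by step, until the acoustic limits of the (toy) tube are reached.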
On the basis of the above discussion stressing the importance of the
formant trajectories we hypothesize that the perception of vowels might be
understood, not in terms of static targets, but in terms of a dynamic measure
of the direction and rate of spectral change characterizing such trajectories.
We will test this hypothesis in a series of studies of vowel-to-vowel se-
quences.
3. Vowel-vowel production
[V1V2] sequences were produced 5 times by 5 male and 5 female speakers,
all French, at 2 different rates (normal and fast). In the following experi-
ments, V1 is always /a/ and V2 is one of the French vowels situated on the
[ai] ([i, , e]), [ay] ([y, , ]) or [au] ([u, o, ]) trajectories. A French word
containing V2 appears on the computer screen with alphabetic representa-
tion to help the subject who may have no phonetic knowledge. To exem-
plify, the instructions were the following (at fast rate): "say a-i as in the
word lit". The recording process was controlled by PC software that ran-
domly presented the succession of the items to be recorded. In the case of
bad pronunciation, or hesitation, the speaker had the possibility of pro-
nouncing the item again. Formant frequencies were measured using Praat
software every 6.25 ms. The formant variations were smoothed with a
43.75 ms time window by calculating the mean values of the formants ob-
tained for 7 successive frames (running mean value). Then, the derivation
was taken to obtain the formant transition rate. The formant rate was also
smoothed with a 43.75 ms window (running mean value).
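The measurement chain just described (sampling, running mean, differentiation, second smoothing) can be sketched as follows; the function names are ours:

```python
import numpy as np

FRAME_STEP_MS = 6.25   # formants measured every 6.25 ms
WINDOW_FRAMES = 7      # 7 successive frames = 43.75 ms time window

def running_mean(x, n=WINDOW_FRAMES):
    # Mean value over n successive frames (running mean).
    return np.convolve(np.asarray(x, dtype=float),
                       np.ones(n) / n, mode="valid")

def transition_rate(formant_hz):
    """Smooth a formant track, differentiate to get the transition
    rate in Hz/ms, then smooth the rate with the same window."""
    smoothed = running_mean(formant_hz)
    rate = np.diff(smoothed) / FRAME_STEP_MS
    return running_mean(rate)
```

For a formant rising linearly by 100 Hz per frame, for instance, the estimated rate is constant at 100 / 6.25 = 16 Hz/ms.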
Figure 2 shows, for [ai] as produced by speaker RC at normal rate, the
formant transitions, the formant transition rate, and the formant transition ac-
celeration, both in the time domain and in the plane defined by the F2/F1 parame-
ters. Maxima and minima of the F1 and F2 frequencies, rates and accelera-
tions were measured to characterize the formant transitions.

Figure 2. [ai] production, at normal rate, for speaker RC, a) F1 and F2 formant
transition in the time domain, b) corresponding formant trajectory in the
F1-F2 plane, c) F1 and F2 rate in the time domain, d) formant rate trajec-
tory in the F1 rate-F2 rate plane, e) F1 and F2 acceleration in the time
domain, and f) formant acceleration trajectory in the F1 acceleration-F2
acceleration plane.


3.1. Vowel formants
The vowel formant frequencies for [aV] as produced by speaker EM are
represented in the F1/F2 plane (figure 3). The data points are the mean
values of the 5 occurrences for normal and fast production (N and F). Stan-
dard deviations for F1 and F2 are also indicated. There are no significant
differences between normal and fast productions, as could be expected
because, in V1-V2 production, the target V2 is always reached. The vowels
can be easily separated.


Figure 3. Mean F1 and F2 frequencies and standard deviations plotted in the F1/F2
plane for [aV] tokens produced by speaker EM, at normal (N) and fast
(F) rates. Each item is produced 5 times (45 times for [a]).


3.2. [aV] transitions
3.2.1. [aV] characteristics in the F1-F2 plane
Figure 4 shows the different formant trajectories for [ai], [ae], [a], [ay],
[a], [a] and [au], [ao], [a] in the F1-F2 plane for one speaker EM. Each
trajectory is a single production pronounced at normal rate. The trajectories
are rather rectilinear and follow, as far as [ai] and [au] are concerned, the
basic trajectories obtained by deduction (figure 1): [e], [] are situated
along the formant movement of [ai]; [o] and [] on the [au] trace. Figure 4
also shows that the end parts of the trajectories corresponding to V can be
characterized by small changes along the rectilinearity of the trajectory.
This result corresponds to the "vowel inherent spectral changes" observed
by Nearey and Assmann (1986) for the final vowel in VCV sequences and
by Carré et al. (2004) for isolated vowels. These characteristics are also
observed for the other speakers.


Figure 4. [aV] formant trajectories for speaker EM (normal rate). The small
changes at the ends of the trajectories corresponding to V do not deviate
significantly from the rectilinearity of the trajectories.

3.2.2. [aV] transition rate
The representation of the F1-F2 transition rate as in figure 2d is used to
compare the [aV] transitions for all V. Figure 5a shows the results for
speaker EM with one utterance of each [aV]. It can be observed that if, for
example, [ai], [ae], [a] are compared, three distinct rate trajectories can be
discriminated. The rate trajectory of [ai] is longer than that of [ae] and still
longer than that of [a]. In other words, the maximum rate of [ai] is greater
than the maximum rates of [ae] and of [a]. Figure 5b shows, for [ai], [ae],
[a] and for one production by speaker EM, the first formant rates in the
time domain. The three vowels can be discriminated according to the
maximum rates corresponding more or less to the middle of the transition
([ai] maximum rate > [ae] maximum rate > [a] maximum rate). Discrimi-
nation can also be obtained throughout the transition and especially from
the very beginning of the transition (from the very beginning of the produc-
tion task). Figure 5b shows that the three transitions (for [ai], [ae], [a])
synchronized at the beginning (at about t = 50 ms), stop at about t = 150 ms.
Because of the more or less constant duration of the transi-
tion, the 3 rates, throughout the transition, follow the inequality: [ai] rate >
[ae] rate > [a] rate. In principle, discrimination between the three final
vowels is thus possible throughout the transition and especially at its begin-
ning.

Figure 5. a) [aV] trajectory rates in the F1 rate-F2 rate plane for speaker EM (nor-
mal rate) and b) F1 rates in the time domain for [ai], [ae], [a].

Figure 6a shows the formant transition rates (mean data and standard devia-
tions for the 5 productions) in the F1 rate/F2 rate plane for the speaker EM,
for normal and fast production. The rates indicated are the maximum rates
of the transitions. We do not observe large differences between normal and
fast production and the vowels can be discriminated according to their
rates.
According to the vowel target approach, identification would be based
on formant frequency information at the end of the transition. It would not
be necessary to know the characteristics of the preceding vowel (here [a]).
In contrast, the dynamic approach assumes that directions and slopes of the
transitions are important parameters. The identification of the vowel V
would depend on the departure point in acoustic space. Standard deviations
can be reduced by normalization based on the formant values of the initial
[a] (Figure 6b).
If we compare the two results: vowel targets (Figure 3 with F1, F2) and
formant transition rates (Figure 6 with F1 rate, F2 rate), the difference in
distinctiveness is not evident. However, in this experiment at normal and
fast rates of V1V2 production, the vowel targets V2 are always reached. A
further analysis on items such as V1V2V1 with possible vowel reduction
effect at fast rate production is thus necessary.

Figure 6. a) Vowel transition maximum rates of the transition [aV] for normal (N)
and fast production (F) (speaker EM); b) Same data but the formant fre-
quencies F1 and F2 of each [a] vowel at the beginning of the transition
are taken into account to normalize the rates.
3.2.3. [aV] transition duration
The preceding results suppose that the transition durations are more or less
constant for all the [aV] produced by the same speaker. Figure 7 shows the
transition durations for the speaker EM at normal and fast production. The
transition duration is defined as the interval between the maximum and
minimum of the acceleration curve for F1 (see Figure 2e). The duration of
the transition is around 10% smaller for faster production. The standard
deviation is small for both. Our results correspond to those of Kent and
Moll (1969) and Gay (1978), but have to be confirmed with data from more
speakers.
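The duration criterion used here (the interval between the maximum and the minimum of the F1 acceleration curve, cf. Figure 2e) can be sketched as follows, with a plain second difference standing in for the smoothed acceleration estimate (our simplification):

```python
import numpy as np

FRAME_STEP_MS = 6.25  # formant sampling step used in the study

def transition_duration_ms(f1_hz):
    """Transition duration: time between the maximum and the minimum
    of the F1 acceleration curve (here a plain second difference)."""
    accel = np.diff(np.asarray(f1_hz, dtype=float), n=2)
    return abs(int(np.argmax(accel)) - int(np.argmin(accel))) * FRAME_STEP_MS
```

For a track that is flat, rises linearly for 16 frames, and is flat again, the acceleration peaks at the onset corner and dips at the offset corner, giving 16 x 6.25 = 100 ms.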


Figure 7. Transition durations for all the [aV] produced by speaker EM at normal
and fast rate.

3.2.4. [aV] transition for 10 speakers
Our first results on [aV] production for a single speaker lead us to hypothe-
size that transition rates ought to be invariant across speakers (male and
female). To test the hypothesis, the first two formants of the [aV] tokens as
produced by 10 speakers (5 males and 5 females) were calculated with the
Praat software. The formants of the V are represented in Figure 8 in the F1-
F2 plane. The standard deviations indicate that these static target represen-
tations show significant variability. However representations of transition
maximum rates exhibit even more variability which is the opposite of our
hypothesis (Figure 9)! These findings raise two questions: a) How accurate
is formant estimation using classical techniques (e.g., linear prediction),
especially for female voices, and b) Would it be possible to reduce
variability by taking syllable rate or transition rate into account?


Figure 8. F1-F2 plane representation of the vowels [V] from [aV] produced at
normal (N) and fast (F) rate by 10 speakers (5 males and 5 females).

First, good formant estimation is very difficult to obtain, especially for fe-
male voices. Furthermore, formant measurement errors are amplified by
the derivative process used to compute transition rates. For example, a
formant frequency error of 10% can lead to an error in transition rate of
100%. Using a large time window to compute mean values can reduce
these errors, but delays rate measurement. Problematic aspects of formant
detection will be discussed further below.
Second, each of the speakers has his own transition rate, which might
also change slightly with the syllable rate (normal/fast production); see,
for instance, the transition durations for normal and fast production for
speaker EM in Figure 7.

Figure 9. Maximum transition rates in the plane F1 rate versus F2 rate for [aV]
uttered by 10 speakers (5 males and 5 females) for normal (N) and fast
(F) production.

Figure 10. Maximum F2 and F1 transition rates for [aV] uttered by 10 speakers (5
males and 5 females) for normal (N) and fast (F) production after nor-
malization based on transition durations.
In view of these considerations we decided to normalize the rate meas-
urements with respect to transition duration. Transition durations were ob-
tained from the time interval between the maximum and minimum of the
acceleration curve for F1 (see Figure 2e). Figure 10 shows the new results
(the reference transition duration was 100ms). The standard deviations are
clearly reduced but are still greater than the ones obtained with the target
formant measurements.
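The text does not spell out the normalization formula; one simple reading, rescaling each measured rate to the rate the same formant excursion would imply over the 100 ms reference duration, can be written as follows (names and formula are our assumption):

```python
REFERENCE_DURATION_MS = 100.0  # reference transition duration

def normalize_rate(max_rate_hz_per_ms, transition_duration_ms,
                   ref_ms=REFERENCE_DURATION_MS):
    # Rescale a measured maximum transition rate so that speakers with
    # different transition durations become comparable: the same formant
    # excursion spread over the reference duration.
    return max_rate_hz_per_ms * transition_duration_ms / ref_ms
```

A speaker whose 50 ms transition peaks at 8 Hz/ms is thus credited with the same excursion as a 4 Hz/ms peak over 100 ms.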
4. Transition perception
Since direction and rate of transitions provide discriminating acoustic in-
formation on the vowel identities in vowel to vowel sequences, it seems
possible that these two attributes could be used in perception. In fact, the
rates of F1 and F2 give the direction in the F1/F2 plane. Under such a hy-
pothesis, the starting point must be known (Carré et al., 2007), but it can
also be considered that, for example, a high negative F1 rate and a high posi-
tive F2 rate lead to /ai/ without prior information on the first vowel. To test
this hypothesis, trajectory stimuli outside the vowel triangle were chosen.
So, the use of normal target values for the vowels was abandoned but typi-
cal rates and directions in the acoustic space were retained. Four different
stimuli (A, B, C, D) were synthesized with 2 formants. The trajectories of
these sequences are shown in the F1/F2 plane in Figure 11, and in the time
domain in Figure 12. F0 is 300 Hz at the beginning of each sequence, held
constant during the first quarter of the total duration, decreasing to 180 Hz
at the end. Possible responses for identification tests were chosen during a
pre-test, i.e. [iai], [u], [aua], [aoa]. A fifth case, labelled ????, was of-
fered in case of impossible identification (no response). Twelve subjects
took part in the perception tests.
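Under this hypothesis, the F1 and F2 rates jointly define a direction in the rate plane; a minimal way to extract it (our helper function, not part of the study's procedure):

```python
import math

def transition_direction_deg(f1_rate, f2_rate):
    # Direction of a transition in the F1-rate/F2-rate plane, in degrees.
    # A falling F1 with a rising F2, as in a transition toward [i],
    # points into the upper-left quadrant.
    return math.degrees(math.atan2(f2_rate, f1_rate))
```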


Figure 11. The four trajectories (A, B, C, D) in the plane F1-F2 and the vowel
triangle. The trajectories are outside the vowel triangle. Their directions
and sizes in the acoustic plane vary.


Figure 12. F1 and F2 in the time domain for the four sequences (A, B, C, D). The
duration of the first part of each sequence was 100 ms, the duration of the
transition was constant and equal to 100 ms. The duration of the last part
was 150 ms. The rates of the transitions in Hz/ms vary. The first and last
parts of each sequence were stable and equal in formant frequency. The
transitions of the four sequences reached more or less the same point in
the acoustic plane.

The responses (in %) are given in Figure 13. The sequence A is identified
71% of the time as [iai]. B is identified 87% of the time as [u]. C is identi-
fied 95% and D 96% of the time as [aua] or [aoa]; the long trajectory, cor-
responding to a faster transition rate, is perceived more as [aua], and the
short one more as [aoa]. The option of no response is generally avoided. The sequence A
which has the same direction in the acoustic plane and transition rate as [ai]
is perceived as /iai/; B, which has more or less the same direction and rate as
[u], is perceived as /uu/; C, which has the same direction and rate as [au], is
perceived as /aua/; and D, which has the same direction as [au] but a lower
rate, is more often perceived as /aoa/.
These results can be summarized by saying that the region where the 4
trajectories converge (acoustically close to [a]) is perceived as /a/ or /u/ or
/o/ depending on the direction and length (i.e. rate of the transition) of the
trajectories.


Figure 13. Results of the perception tests. The sequence A is mainly perceived as
[iai], B as [iui], C as [aua] (long trajectory), D as [aoa] (short trajectory).
5. Discussion
At different levels our preliminary results raise several problems and ques-
tions about the dynamic approach and its consequences for the theory of
speech production and perception. Also the findings motivate a closer ex-
amination of current speech analysis techniques and the methodology of
perception tests.
76 René Carré

The dynamic approach is very attractive because it may permit consonants
and vowels to be integrated within a single theory. Conceivably, using the
parameter of transition rate, one might propose that fast transitions tend
to produce consonants, whereas slow transitions produce vowels.
In the case of perceiving V1V2 sequences, we have reported acoustic
measurements indicating that signal information on V2 is available
throughout the transition and especially at its very beginning. This strategy
presupposes that the identity of the previous V1 has been determined. The
question is: How is this information to be obtained? According to a target
theory of speech perception, V2 can only be identified on the basis of its
target, the goal being to reach the target irrespective of the starting point.
One of the aims of the present study has been to suggest that dynamic
parameters such as direction of spectral change in acoustic space and transi-
tion rate could be more invariant across males, females and children than
vowel targets. This hypothesis would make normalization in terms of static
targets unnecessary. However, normalization of transition rate with respect
to the different transition durations observed in production would seem
necessary. Such normalization could be readily available perceptually,
thanks to temporal coding and the sensitivity of the auditory system to rate
(derivatives) and acceleration (Pollack, 1968; Divenyi, 2005).
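The two dynamic parameters invoked here can be made concrete: given F1 and F2 trajectories, the direction of spectral change is the orientation of the trajectory in the F1-F2 plane and the rate is the spectral distance covered per unit time. The sketch below computes both; the [ai]-like formant values are invented for illustration, not measured data.

```python
import math

def transition_direction_and_rate(f1, f2, t):
    """Direction (degrees in the F1-F2 plane) and rate (Hz/ms) of a formant
    transition, given F1/F2 trajectories sampled at times t (in ms)."""
    df1 = f1[-1] - f1[0]          # total F1 change over the transition (Hz)
    df2 = f2[-1] - f2[0]          # total F2 change over the transition (Hz)
    dt = t[-1] - t[0]             # transition duration (ms)
    direction = math.degrees(math.atan2(df2, df1))  # orientation of the trajectory
    rate = math.hypot(df1, df2) / dt                # spectral change per ms
    return direction, rate

# Illustrative [ai]-like transition: F1 falls 700 -> 300 Hz while F2 rises
# 1200 -> 2300 Hz over 100 ms (hypothetical values).
d, r = transition_direction_and_rate([700, 500, 300], [1200, 1750, 2300], [0, 50, 100])
print(round(d, 1), round(r, 1))
```

On this view, normalizing for speaking rate amounts to rescaling `dt`, while the direction term is unaffected by it.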
Hyper-hypo speech, and reduction phenomena (Lindblom, 1963; Lind-
blom, 1990) of fast and normal speech (Kent and Moll, 1969; Gay, 1978;
van Son and Pols, 1990; van Son and Pols, 1992; Pols and van Son, 1993)
should be further studied with respect to the parameters of transition direc-
tion and rate. The results obtained from 'silent-center' experiments
(Strange et al., 1983) can be explained in terms of dynamic specification:
it is not necessary to compensate for undershoot at the production level (a
target not reached because of coarticulation, fast speech or hypo-speech)
by perceptual overshoot (Lindblom and Studdert-Kennedy, 1967) if the vowel
is identified not solely from the formant-frequency pattern at the point of
closest approach to the target, but also from the direction and rate of the
adjacent formant transitions. This finding is compatible with the assumption
that, given a specification of their point of origin in phonetic (acoustic)
space, the direction and rate of formant transitions could be sufficient to
specify the following vowel.
Our preliminary results on vowel production represent a first few steps
in support of a full dynamic approach. More studies on the normalization
process must also be undertaken.
It is well known that predictive coding based on a model of speech pro-
duction is not well adapted for analyzing speech signals with high funda-
mental frequencies or with noise. Furthermore, such a technique is ill-
suited to measuring spectral variations. A dynamic approach necessitates a
reconsideration of analysis techniques in light of our knowledge of the
auditory system. The spikes observed in auditory nerve fibers are statisti-
cally synchronized by the time domain shape of the basilar membrane exci-
tation around the characteristic frequencies (Sachs et al., 1982). So they can
give information not only on the amplitudes of spectral components but
also on the shape in the time domain of the components and thus on the
phases. The phase variation (−180° around formant frequencies for second-
order filters describing the transfer function of the vocal tract (Fant,
1960)) could be used to measure the rate of the transitions.
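The phase behaviour referred to here can be illustrated with a single second-order resonance: below its centre frequency the phase is near 0°, it passes through −90° at the centre frequency, and it approaches −180° above it, so a moving formant drags this phase sweep with it. The centre frequency and bandwidth in the sketch are arbitrary illustrative values.

```python
import cmath
import math

def resonator_phase(f, fc, bw):
    """Phase (degrees) of a second-order resonance with centre frequency fc
    and bandwidth bw, evaluated at frequency f (all in Hz)."""
    w, w0 = 2 * math.pi * f, 2 * math.pi * fc
    s = 1j * w
    # Standard second-order resonator transfer function H(s).
    h = w0 ** 2 / (s ** 2 + 2 * math.pi * bw * s + w0 ** 2)
    return math.degrees(cmath.phase(h))

# Around a 500 Hz "formant" (bandwidth 80 Hz) the phase sweeps from
# near 0 deg below resonance toward -180 deg above it.
for f in (100, 500, 2000):
    print(f, round(resonator_phase(f, 500, 80)))
```

Tracking where this −180° sweep sits in frequency over time would, in principle, give the transition rate directly, without an explicit formant-peak detector.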
To attain some of these goals, new tools would be needed. For example,
Chistovich et al. (1982) described a model of the auditory system which
detects spectral transitions without specific formant detection.
These considerations make it evident that in order to test the hypothesis
of greater invariance in transition rates than in formant targets, it would
be necessary both to improve current analysis techniques and to study more
deeply the normalization of transition durations.
Perception tests of formant transitions outside the vowel triangle en-
courage us to study general dynamic properties of the auditory system that
may be used in speech. Formant transitions can be converted into sine-
wave analogues: preliminary tests have shown results close to those obtained
with formants, although the remaining differences still have to be explained.
Many experiments creating speech illusions could be undertaken with such a
tool.
6. Conclusions
This paper follows up on previously published results on the deductive
approach (Carré, 2004) proposing a dynamic view of speech production, on
acoustic modelling (Mrayati et al., 1988), on the structuring of an acoustic
tube into regions corresponding to the main places of articulation, and on
the prediction of vocalic systems (Carré, 2009). The preliminary results
presented here on vowels must be extended to consonants, many of which are
intrinsically dynamic. At the same time, the evident importance of dynamic
characteristics does not mean that static targets are not used in perception.
The limits of the dynamic approach, and the balance between static and
dynamic parameters in perception, remain to be established. The dynamic
approach also calls for new ways of thinking and new tools: formant
transitions are characterized not by a succession of static values but by
directions and slopes, so a new tool able to measure these characteristics
directly has to be developed. The dynamic approach is not a static approach
with dynamic parameters added on; it must be intrinsically dynamic. It calls
for an epistemological study of the dynamic nature of speech (Carré et al.,
2007).
Acknowledgements
The author thanks Eric Castelli, Pierre Divenyi, Björn Lindblom, Egidio
Marsico, François Pellegrino, Michael Studdert-Kennedy and Willy Serniclaes
for their very helpful comments and stimulating discussions. He also thanks
Claire Grataloup for the efficient management of the perception tests.
References
Al-Tamimi, J., Carré, R. and Marsico, E.
2004 The status of vowels in Jordanian and Moroccan Arabic: insights
from production and perception. Journal of the Acoustical Society of
America 116: S2629.
Carré, R.
2004 From acoustic tube to speech production. Speech Communication 42:
227-240.
2009 Dynamic properties of an acoustic tube: Prediction of vowel systems.
Speech Communication, 51: 26-41.
Carré, R. and Hombert, J. M.
2002 Variabilité phonétique en production et perception de parole : straté-
gies individuelles. In Invariants et Variabilité dans les Sciences
Cognitives, J. Lautrey, B. Mazoyer and P. van Geert (Eds.). Presses
de la Maison des Sciences de l'Homme: Paris.
Carré, R. and Mrayati, M.
1991 Vowel-vowel trajectories and region modeling. Journal of Phonetics
19: 433-443.
Carré, R., Pellegrino, F. and Divenyi, P.
2007 Speech dynamics: epistemological aspects. In: Proc. of the ICPhS,
Saarbrücken, pp. 569-572.
Castelli, E. and Carré, R.
2005 Production and perception of Vietnamese vowels. In: ICSLP, Lisbon,
pp. 2881-2884.


Catford, J. C.
1988 A practical introduction to phonetics. Clarendon Press: Oxford.
Chistovich, L. A., Lublinskaja, V. V., Malinnikova, T. G., Ogorodnikova, E. A.,
Stoljarova, E. I. and Zhukov, S. J.
1982 Temporal processing of peripheral auditory patterns of speech. In
The representation of speech in the peripheral auditory system R.
Carlson and B. Granström (Eds.). Elsevier Biomedical Press: Ams-
terdam. pp. 165-180.
Divenyi, P., Lindblom, B. and Carré, R.
1995 The role of transition velocity in the perception of V1V2 complexes.
In: Proceedings of the XIIIth Int. Congress of Phonetic Sciences,
Stockholm, pp. 258-261.
Divenyi, P. L.
2005 Frequency change velocity detector: A bird or a red herring? In Au-
ditory Signal Processing: Physiology, Psychology and Models D.
Pressnitzer, A. de Cheveigné and S. McAdams (Eds.). Springer-Verlag:
New York, pp. 176-184.
Fant, G.
1960 Acoustic theory of speech production. Mouton: The Hague.
Fant, G., Pauli, S.,
1974 Spatial characteristics of vocal tract resonance modes. In: Proc. of
the Speech Communication Seminar. Stockholm, pp. 121132.
Fowler, C.
1980 Coarticulation and theories of extrinsic timing. Journal of Phonetics
8: 113-133.
Gay, T.
1978 Effect of speaking rate on vowel formant movements. Journal of the
Acoustical Society of America 63: 223-230.
Hillenbrand, J. M., Getty, L. A., Clark, M. J. and Wheeler, K.
1995 Acoustic characteristics of American English vowels. Journal of the
Acoustical Society of America 97: 3099-3111.
Hillenbrand, J. M. and Nearey, T. M.
1999 Identification of resynthesized /hVd/ utterances: Effects of formant
contour. Journal of the Acoustical Society of America 105: 3509-
3523.
Johnson, K.
1990 Contrast and normalization in vowel perception. Journal of Phonet-
ics 18: 229-254.
1997 Speaker perception without speaker normalization. An exemplar
model. In Talker Variability in Speech Processing K. Johnson and J.
W. Mullennix (Eds.) Academic Press: New York, pp. 145-165.



Johnson, K., Flemming, E. and Wright, R.
1993 The hyperspace effect: Phonetic targets are hyperarticulated. Lan-
guage 69: 505-528.
Kent, R. D. and Moll, K. L.
1969 Vocal-tract characteristics of the stop cognates. Journal of the
Acoustical Society of America 46: 1549-1555.
Lindblom, B.
1963 Spectrographic study of vowel reduction. Journal of the Acoustical
Society of America 35: 1773-1781.
1990 Explaining phonetic variation: a sketch of the H and H theory. In
Speech Production and Speech Modelling A. Marchal and W. J.
Hardcastle (Eds.). NATO ASI Series. Kluwer Academic Publishers.
Dordrecht, pp. 403-439.
Lindblom, B. and Studdert-Kennedy, M.
1967 On the role of formant transitions in vowel perception. Journal of the
Acoustical Society of America 42: 830-843.
Moon, J. S. and Lindblom, B.
1994 Interaction between duration, context and speaking style in English
stressed vowels. Journal of the Acoustical Society of America 96: 40-
55.
Mrayati, M., Carré, R. and Guérin, B.
1988 Distinctive regions and modes: A new theory of speech production.
Speech Communication 7: 257-286.
Nearey, T. and Assmann, P.
1986 Modeling the role of inherent spectral change in vowel identification.
Journal of the Acoustical Society of America 80: 1297-1308.
Nordström, P. E. and Lindblom, B.
1975 A normalization procedure for vowel formant data. In: 8th Interna-
tional Congress of Phonetic Sciences, Leeds.
Peterson, G. E. and Barney, H. L.
1952 Control methods used in the study of the vowels. Journal of the
Acoustical Society of America 24: 175-184.
Pollack, I.
1968 Detection of rate of change of auditory frequency. J. Exp. Psychol.
77: 535-541.
Pols, L. C. W. and van Son, R. J.
1993 Acoustics and perception of dynamic vowel segments. Speech Com-
munication 13: 135-147.
Repp, B., Healy, A. F. and Crowder, R. G.
1979 Categories and context in the perception of isolated steady-state
vowels. Journal of Experimental Psychology: Human Perception
and Performance 5: 129-145.


Sachs, M., Young, E. and Miller, M.
1982 Encoding of speech features in the auditory nerve. In The Represen-
tation of Speech in the Peripheral Auditory System, R. Carlson and
B. Granström (Eds.). Elsevier Biomedical: Amsterdam. pp. 115-130.
Schouten, M. and van Hessen, A.
1992 Modeling phoneme perception. I: Categorical perception. Journal of
the Acoustical Society of America 92: 1841-1855.
Shankweiler, D., Verbrugge, R. R. and Studdert-Kennedy, M.
1978 Insufficiency of the target for vowel perception. Journal of the
Acoustical Society of America 63: S4.
Strange, W.
1989 Evolving theories of vowel perception. Journal of the Acoustical
Society of America 85: 2081-2087.
Strange, W., Jenkins, J. J. and Johnson, T. L.
1983 Dynamic specification of coarticulated vowels. Journal of the Acous-
tical Society of America 74: 695-705.
van Son, R. J. and Pols, L. C. W.
1990 Formant frequencies of Dutch vowels in a text, read at normal and
fast rate. Journal of the Acoustical Society of America 88: 1683-
1693.
1992 Formant movements of Dutch vowels in a text, read at normal and
fast rate. Journal of the Acoustical Society of America 92: 121-127.
Verbrugge, R. R. and Rakerd, B.
1980 Talker-independent information for vowel identity. Haskins Labora-
tory Status Report on Speech Research SR-62: 205-215.
Whalen, D. H., Magen, H. S., Pouplier, M. and Kang, A. M.
2004 Vowel production and perception: hyperarticulation without a hyper-
space effect. Language and Speech 47: 155-174.




Part 2:
Typological approaches to measuring complexity


Calculating phonological complexity

Ian Maddieson
1. Introduction
Several simple factors that can be considered to contribute to the complex-
ity of a language's phonological system have been investigated in recent
papers (including Maddieson, 2006, 2007). The object in these papers was
to see whether, in a large sample of languages, these factors positively cor-
related with each other or displayed a compensatory relationship in which
the elaboration of one factor was counterbalanced by greater simplicity in
others. In general, the factors examined tended to show a pattern of positive
correlation. That is, languages tended to be distributed along a continuum
from lesser to greater phonological complexity, with several factors simul-
taneously contributing to the increase in complexity.
The factors considered in these studies only involved the inventories of
consonant and vowel contrasts, the tonal system, if any, and the elaboration
of the syllable canon. It is relatively easy to find answers for a good many
languages to such questions as how many consonants does this language
distinguish? or how many types of syllable structures does this language
allow? although care must be taken to ensure that similar strategies of
analysis have been applied so that the data are comparable across lan-
guages. It is reasonable to suppose that a language which requires its
speakers to encode and to discriminate between a larger number of distinc-
tions is in this respect more complex than one with fewer distinctions.
However, these properties of a language's phonological system are far from
the only ones that might be considered. Some might even argue that they
are not among the most important.
In this paper, a number of other aspects that plausibly contribute to pho-
nological complexity will be discussed. The factors that will be considered
are: the inherent phonetic complexity of elements in the phonological in-
ventory, the role played by freedom vs limitation of combinatorial possi-
bilities, the contribution of the frequency of occurrence of different proper-
ties, and the relative transparency of the relationships between
phonological variants. The discussion of each factor will consider the
current feasibility of establishing a basis for multi-language comparisons. A
final section will consider possible ways of demonstrating that these intui-
tively plausible contributors to complexity are actual contributors to com-
plexity.
Before proceeding with this discussion, however, it might be valuable to
reiterate why measures of phonological complexity are of interest. Many
linguists assert in one way or another that all natural human languages are
equally complex. Such a view seems primarily to be based on the humanis-
tic principle that all languages are to be held in equal esteem and are equal-
ly capable of serving the communicative demands placed on them. In re-
jecting the notion of primitive languages linguists seem to infer that a
principle of equal complexity must apply. For example, in a widely-used
basic linguistics textbook Akmajian et al. (1979:4) state that all known
languages are at a similar level of complexity and detail there is no such
thing as a primitive human language. A common corrolary derived from
this view is that languages vary in the complexity of parts of their grammar
but trade off complexity in one sub-part of the grammar against simplicity
elsewhere. The introduction to a recent book on language complexity
(Miestamo et al., 2008) asks "is the old hypothesis true that, overall, all
languages are equally complex, and that complexity in one grammatical
domain tends to be compensated by simplicity in another?" Several of the
contributors (e.g. Fenk-Oczlon and Fenk, Sinnemäki) answer in the affir-
mative.
This view is not shared by all. In particular, McWhorter has argued in a
number of places (e.g. 2001a, b) that languages vary considerably in their
complexity, and it is especially a mark of languages that have passed
through a stage of creolization to exhibit reduced complexity. Everett
(2004) argued that, compared to other languages, Pirahã is a language with a
substantial number of elements missing from its grammar and lexicon. Sev-
eral commentators on this paper construed this as a suggestion that Pirahã
was in fact a primitive language. If indeed it is true that languages differ
substantially in complexity, then it follows that such tasks as on-line
processing of language input, as well as the initial acquisition of language
abilities as a child, would place quite variable demands on individuals, depend-
ing on the language involved. Performance might be expected to vary
commensurate with task difficulty. If languages today vary in complexity
then hypotheses positing an original evolution of language through the
elaboration over time of a less complex pre-language gain plausibility (see
Bickerton, 2007 and contributions to Givón & Malle, 2002).
These are among a number of significant scientific issues that arise in
connection with linguistic complexity. However, it is not possible to ad-
dress these questions without some prior consideration of how to define and
how to measure complexity in linguistic systems. This paper is offered as
one contribution to this discussion.
2. Inherent phonetic complexity
The studies mentioned above looked at the number of consonants, the
number of basic vowel qualities, the total number of vowels and the num-
ber of tonal contrasts, as well as the syllable canon. The number of basic
vowel qualities is the number of vowel distinctions involving the primary
properties of height, backness, and rounding as well as tongue root, but not
including any distinctions which depend only on such features as length,
nasalization or phonation type. The total number of vowels includes any
additional vowels distinguished by these features. Only in the case of the
syllable canon was any account taken of what might be called inherent
complexity. Languages were classified as belonging to one of three groups
based on the maximally elaborated syllable type permitted. Languages al-
lowing nothing more elaborate than a CV structure were classified as Sim-
ple with respect to their syllable structure. Languages permitting no more
than a single consonant in the coda and/or onsets of the most common Ob-
struent + Sonorant types (such as stop + liquid or glide) were classed as
Moderately Complex, and those allowing a sequence of two or more con-
sonants in the coda and/or other types of onset sequences, such as Obstru-
ent + Obstruent, Sonorant + Sonorant or Sonorant + Obstruent or any clus-
ters with three or more members were classed as Complex. This particular
division into three classes reflects a judgment that in CC onsets it is not just
the number of consonants in the sequence that contributes to complexity,
but also the nature and order of those consonants. This judgment is based
on the likelihood that the constituents of onsets such as /tw, bl, fr/ etc can
be more readily recognized than the constituents of onsets such as /sf, tk,
ln/ because they display a greater acoustic modulation than the latter (cf.
Ohala & Kawasaki, 1984, Kawasaki-Fukumori, 1992). This judgment is
indirectly supported by the relative frequency of the two types across the
languages of the world. A similar principle of taking into account some
evaluation of inherent phonetic complexity might be extended to other
domains. This is easy to imagine with consonant and vowel inventories, as
well as with tones.
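The three-way syllable-canon classification described above lends itself to a simple rule-based procedure. In the sketch below, onsets and codas are encoded as tuples of broad segment classes ('O' obstruent, 'S' sonorant); this encoding is an assumption of the sketch, not part of the original proposal.

```python
# Rule-based sketch of the Simple / Moderately Complex / Complex
# syllable-canon classification. Each onset or coda is a tuple of
# segment classes: 'O' = obstruent, 'S' = sonorant.

def classify_canon(onsets, codas):
    def moderate_onset(c):
        # A single consonant, or the common Obstruent + Sonorant cluster
        # type (stop + liquid or glide), counts as "moderate".
        return len(c) <= 1 or (len(c) == 2 and c == ('O', 'S'))

    if all(len(o) <= 1 for o in onsets) and all(len(c) == 0 for c in codas):
        return 'Simple'               # nothing beyond CV
    if all(moderate_onset(o) for o in onsets) and all(len(c) <= 1 for c in codas):
        return 'Moderately Complex'   # 1-C codas and/or O+S onsets only
    return 'Complex'                  # CC codas, O+O / S+S / S+O onsets, CCC-, ...

print(classify_canon({('O',)}, {()}))                        # CV-only language
print(classify_canon({('O',), ('O', 'S')}, {(), ('S',)}))    # /bl/-type onsets, 1-C codas
print(classify_canon({('O',), ('O', 'O')}, {(), ('O', 'O')}))  # /sf/-type onsets, CC codas
```

The point of the sketch is that the class boundary is sensitive to the composition of a cluster, not only to its length, exactly as argued in the text.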
A more refined evaluation of the complexity of a consonant inventory
would thus take into account both the nature and the number of the conso-
nants, rather than simply the number. Each consonant type can be assigned
a complexity score, and the complexity of the consonant inventory calcu-
lated by summing the complexity scores of the consonants it contains.
What would be the basis of such scores? A possible scheme was proposed
by Lindblom and Maddieson (1988). Consonants were assigned to one of
three categories Basic, Elaborated and Complex. Elaborated consonants
have one of a set of 'complicating' properties, Complex consonants have
more than one of these properties; the residue are Basic. These assignments
were made in a largely intuitive way which primarily took into account
estimations of the articulatory complexity of a prototypical production of
the consonant in question, but which was also influenced by factors made
less explicit in the article.
According to the scheme proposed, any action of the larynx other than
simple voicing or voicelessness co-occurring with the oral articulation,
such as breathy voicing, aspiration, or ejective articulation is a complicat-
ing factor. In addition, because of the aerodynamic difficulty of producing
local friction in the oral cavity with the reduced air-flow that occurs with
voicing, voiced fricatives and affricates are also classed as Elaborated.
Voiceless sonorants are also considered elaborated as they depart from a
default mode of phonation. Any superposition or sequencing of different
oro-nasal articulatory configurations is also a complicating factor. For ex-
ample, prenasalization, lateral release, and secondary articulations are all
complicating factors, although simple homorganic affrication is not. Clicks
and doubly-articulated consonants such as /kp, gb/ which require two sets
of oral articulatory gestures for their production, are also (at least) in the
Elaborated class. Perhaps the least satisfactory part of this classification is
the treatment of place of articulation. In principle, configurations repre-
senting departures from the near-rest positions of the lips, tongue-tip and
tongue-body components of an articulatory model are Elaborated. The list
of places so defined was given as labio-dental, palato-alveolar, retroflex,
uvular and pharyngeal (Lindblom and Maddieson, 1988:67). While labio-
dental and retroflex articulations involve displacements of an articulator
toward a surface not opposite that articulator's rest position, it is unclear
that this is equally the case for the remaining places in this list. In standard
phonetic textbooks (e.g. Ladefoged, 2006) they are not treated as displaced
articulations whose articulatory dynamics are for this reason inherently
more complex. The classification may well have been influenced by
the relative frequency of these places, since retroflex, uvular and pharyn-
geal consonants are all globally quite rare. Consideration of the interactions
between place and manner also seem to have implicitly played a role.
Labio-dental fricatives and palato-alveolar fricatives and affricates are
common, but plosives at either of these places are rare. Thus, /f/ and /tʃ/
were counted as Basic segments, as was /w/, which has a double articula-
tion.
While this scheme could undoubtedly be improved (for example, by finding
a more uniform basis for evaluating articulatory complexity, or by taking
into account considerations of perceptual salience), it provides a
basis for a demonstration of cross-language differences in inherent phonetic
complexity of consonant systems. In the sample of languages used in Mad-
dieson (2005), the modal number of consonants in the inventory is 22.
Three of the languages with this consonant inventory size are Indonesian
(Alwi et al., 1998), Birom (Wolff, 1959; Bouquiaux, 1970) and Kiowa
(Watkins, 1984; Sivertsen, 1956). The consonant inventories of these three
languages, based on the references cited above, are given in Table 1. The
lateral in Kiowa is interpreted as a voiced affricate, although it only has this
realization in coda position.

Table 1. Consonant inventories of three languages with 22 consonants

Indonesian (Austronesian):
p t k
b d g
tʃ dʒ
f s ʃ x h
z
m n ɲ ŋ
l r
w j

Birom (Niger-Congo; Nigeria):
p t k kp
b d g gb
tʃ dʒ
f s h
v z
m n ŋ
l r
w j

Kiowa (Kiowa-Tanoan; USA):
p t̪ k ʔ
pʰ t̪ʰ kʰ
b d̪ g
p' t̪' k'
ts ts'
s h
z
m n̪
d̪l
j

Numeric values for each segment corresponding to a consonant class in the
Lindblom & Maddieson scheme can be substituted such that Basic = 1,
Elaborated = 2, and Complex = 3, as in Table 2. For the purposes of this
exercise it is stipulated that /f, ʃ, tʃ, w/ are among the Basic consonants.
Otherwise the definitions given above are applied. A summed score for
each inventory can then be calculated. Indonesian has a score of 24, Birom
a score of 27, and Kiowa, a score of 32. The three languages are thus quite
well differentiated when the phonetic content of the inventories enters,
however imperfectly, into the picture. These relative rankings correspond
quite well with linguistic intuitions. For example, it seems likely that the
Kiowa consonant system would be harder for learners to master if prior
experience of particular languages could somehow be eliminated from in-
fluencing the results (perhaps by selecting a pool of learners with a wide
variety of language backgrounds).

Table 2. Complexity scores for the consonants of the three languages
(scores follow the rows of Table 1)

Indonesian = 24:
1 1 1
1 1 1
1 2
1 1 1 1 1
2
1 1 1 1
1 1
1 1

Birom = 27:
1 1 1 2
1 1 1 2
1 2
1 1 1
2 2
1 1 1
1 1
1 1

Kiowa = 32:
1 1 1 1
2 2 2
1 1 1
2 2 2
1 2
1 1
2
1 1
3
1

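The summing exercise behind Table 2 can be sketched programmatically: each consonant is looked up in a score table and the inventory score is the sum. The score table below is a hypothetical fragment following the stipulations in the text (voiced fricatives and affricates, aspirates and ejectives Elaborated; /f, tʃ, w/ Basic), not the full Lindblom & Maddieson classification.

```python
# Hypothetical fragment of a per-segment score table:
# 1 = Basic, 2 = Elaborated, 3 = Complex.
SCORES = {'p': 1, 't': 1, 'k': 1, 'b': 1, 'd': 1, 'g': 1,
          'tʃ': 1, 'dʒ': 2, 'f': 1, 's': 1, 'h': 1, 'z': 2,
          'pʰ': 2, 'tʰ': 2, 'kʰ': 2, "p'": 2, "t'": 2, "k'": 2}

def inventory_complexity(inventory):
    """Sum the per-segment complexity scores over a consonant inventory."""
    return sum(SCORES.get(seg, 1) for seg in inventory)  # default: Basic = 1

print(inventory_complexity(['p', 't', 'k', 'b', 'd', 'g', 'tʃ', 'dʒ', 'z']))
```

Two inventories of identical size can thus receive different scores, which is exactly the differentiation the three 22-consonant languages above illustrate.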
A different approach to defining a scale of elaboration based on phonetic
content has been proposed by Marsico et al. (2004). Their approach pro-
ceeds through an analysis of the structure found in the classification of
segments by phonetic categories or phonological features. One principle
explored in this work is that a segment is Basic if, when any term is re-
moved from its phonetic description, the remaining properties no longer
define an existing segment. Thus, /p/ is a basic segment because, if any one
of the terms voiceless, bilabial, plosive is removed from the description,
the remaining terms define a class of segments, rather than an individual
one. Applied in this way, the procedure divides segments into just two sets,
Basic and Complex, and therefore produces a considerably larger class of
Basic segments than the one used by Lindblom and Maddieson. It is,
interestingly, dependent on choices made concerning the set of phonetic
properties or features used to define segments. For example, in the classification
scheme of the IPA, 'nasal' is a primitive term applied to segments with an
oral closure and a lowered velum, so that all air-flow is directed out through
the nose. But in some other feature schemes, any segment with nasal air-
flow is defined as nasal. Consequently, nasals, in the IPA sense, must be
distinguished from all other segments with a nasal air-flow component by
some additional feature such as 'stop'. If /m/ is defined in the IPA as a
voiced bilabial nasal, it is a Basic segment. However, if /m/ is defined as a
voiced bilabial nasal stop (and 'nasal' is taken as a privative feature), then
this segment is no longer Basic, because the term 'nasal' can be removed,
leaving 'voiced bilabial stop', which defines the valid segment /b/.
An advantage of the Marsico et al. procedure is that, once a feature sys-
tem has been settled on, the assignment of degree of complexity can pro-
ceed algorithmically. Decisions on segmental complexity are not made
directly by a linguist, but are derived indirectly from feature assignment,
making the procedure more 'hands-off'. A list of legal segments with
featural descriptions is still required, though. Marsico et al. (2004) used the set
of 900-odd distinct consonant types listed in the expanded UPSID database
(Maddieson and Precoda, 1990) and a slightly modified version of the fea-
ture set used in Maddieson (1984). The default application of the proce-
dure to the three languages compared above, assigning 1 to Basic and 2 to
non-Basic segments, results in complexity scores for their consonant inven-
tories of 22, for both Indonesian and Birom, and 26 for Kiowa. Only the
aspirated stops and the lateral affricate of Kiowa are non-Basic, since ejec-
tive-stop, ejective-affricate and labial-velar are treated as single features in
the default analysis. A reduced feature set in which each is split into two
gives scores of 22 for Indonesian, 24 for Birom and 30 for Kiowa, which
seems closer to an intuitively-appropriate ranking.
This method could easily be extended to include additional levels of
complexity. For example, one could take the maximum number of succes-
sive removal steps that can be applied as the index of the starting segment's
complexity, or one could jointly factor in some of the other indices dis-
cussed by Marsico et al. (2004), like the Derivationality index. The latter
measures the number of different legal segment types that can be built by
adding features to a given segment's description. The more segments that
can be derived, the more Basic the starting segment.
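Both the basicness test and the Derivationality index can be stated over a set of attested segment descriptions. The sketch below uses a three-segment toy universe with an invented feature encoding (following the 'nasal stop' analysis of /m/ discussed above), not the UPSID feature set.

```python
# Toy universe of attested segments, each described as a set of feature terms.
# The encoding is invented for illustration; it is not the UPSID feature set.
SEGMENTS = {
    'p': frozenset({'voiceless', 'bilabial', 'plosive'}),
    'b': frozenset({'voiced', 'bilabial', 'plosive'}),
    'm': frozenset({'voiced', 'bilabial', 'plosive', 'nasal'}),
}
DESCRIPTIONS = set(SEGMENTS.values())

def is_basic(seg):
    """Marsico et al. test: a segment is Basic if deleting any single term
    from its description never yields the full description of another
    attested segment."""
    feats = SEGMENTS[seg]
    return all(feats - {f} not in DESCRIPTIONS for f in feats)

def derivationality(seg):
    """Number of attested segments built by adding features to this one."""
    feats = SEGMENTS[seg]
    return sum(1 for d in DESCRIPTIONS if feats < d)

print(is_basic('p'), is_basic('m'))   # /m/ is not Basic: drop 'nasal' -> /b/
print(derivationality('p'), derivationality('b'))
```

The sketch also makes the feature-dependence concrete: replace the description of /m/ with {'voiced', 'bilabial', 'nasal'} (the IPA-style analysis) and it becomes Basic, since no single deletion then lands on another attested segment.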
There seems to be no decisive advantage overall to one of these two ap-
proaches over the other. A drawback of the Marsico et al. scheme is that it
is based on a finite list of segments: those which happen to occur in the
languages in a particular sample. The addition or deletion of a language
with unique segments (from the point of view of the sample) will change
some rankings of remaining segments. In Lindblom and Maddieson's
scheme, any newly encountered segment is evaluated on the basis of its
production. Its value is independent of other segments.
On the other hand, the approach based on Lindblom and Maddieson
gives a, perhaps, unwarranted priority to articulatory aspects of complexity
over perceptual ones. The procedure used by Marsico et al. is more neutral
in this regard, as different features may be based on the articulatory, acous-
tic or perceptual properties of segments. It is easier to visualize an intui-
tively-satisfying extension to vowel inventories in the latter approach, as
articulatory complexity does not seem to clearly distinguish among the set
of oral modally-voiced vowels in the way that acoustic/perceptual factors
can, as employed in the Quantal Theory of Stevens (1972, 1989) or in the
definition of the focalization parameter of Schwartz et al. (1997).
In any event, it is not hard to envisage ways of extending these ap-
proaches to include the computation of a measure of the inherent complex-
ity of vowel inventories as well as a language's inventory of tones. We note
that non-peripheral or 'interior' vowels and those with secondary charac-
teristics such as nasalization are generally considered more complex than
plain, peripheral vowels (Crothers, 1978). Unitary contour tones are gener-
ally considered more complex than level tones, those that rise more com-
plex than those that fall, and those that include both a rising and a falling
component the most complex (see, e.g. Gandour & Harshman, 1978). The
calculation of the inherent inventory complexity would be expected to build
on these insights.
The question remains as to whether any scale of inherent segmental or
tonal complexity which seems intuitively appropriate is measuring some-
thing real about the complexity of a language's phonology. There are two
main sides to this question. One relates to the underlying concept of what
complexity is, the other to the problems of demonstrating in some empirical
fashion that what is intuitive has some reality. Since the problems are paral-
lel in regard to the various potential complexity-related factors being con-
sidered in this paper, the discussion will be deferred until the final section.
Calculating Phonological Complexity 93
3. Combinatorial possibilities
Phonological systems consist, of course, not only of an inventory of seg-
ments (and, for some languages, tones), but of patterns of combinations of
these elements within larger structures. Languages differ sharply in how
freely such elements combine and this again seems to provide a natural
scale of complexity. For example, consonant contrasts are typically more
limited in coda than onset position, but the degree of limitation varies con-
siderably. In a quasi-random subsample of 25 languages which permit CVC
structures, 21 allow fewer of their consonants as singleton codas than as
singleton onsets (more complex structures including clusters were not
counted), but the number of singleton coda possibilities ranges from 1 (/m/
in Kiliva) to 32 in Tlingit. To a small degree, differences in combinatorial
possibility enter into the syllabic complexity scale mentioned above, but the
number of different ways that a structure such as CVC can be constituted
played no role. One proposal for a more elaborate scale attempts to calcu-
late the number of possible distinct syllables allowed by the language. In a
simple case this is given by the number of permitted onsets x number of
permitted nuclei x number of permitted codas x number of tones and/or
contrastive accent placements. However, almost all languages impose some
broad-based constraints limiting the combinatorial freedom between onsets
and nuclei, nuclei and codas, tones and rhyme structures, and so on. For
example, in some languages with nasalized vowels in their inventory, the
contrast between oral and nasalized vowels is absent or limited after an
onset nasal (e.g. in Yoruba and Yl Dnye); in numerous languages with
labialized consonants there is no contrast with their plain counterparts be-
fore rounded vowels (e.g. in Hausa and Huastec). Frequently, there are also
limitations constraining the pairing of particular nuclei and codas. It is es-
sential that such limitations be known and factored into the calculation in
order to avoid potentially large errors. Since this calculation is multiplica-
tive, errors compound exponentially.
To see how far astray calculations can go, consider Standard Thai.
There are 21 consonants in the inventory, only 9 of which can occur as
codas (to which zero coda must be added); /ʔ/ occurs only in coda. A lim-
ited number of onset clusters consisting of a stop plus a liquid or /w/ in-
crease the number of onsets to 38 (including zero) (Li, 1977). There are
also 21 vowel nuclei, made up of 9 vowel qualities which occur long or
short, plus the three diphthongs /iə uə ɯə/. Finally, the language has 5
tones: 3 level, one rising and one falling. A stress contrast must also be
94 Ian Maddieson
added. If we assume no further combinatorial limitations, this would pro-
duce the estimate of 79800 (38 x 21 x 10 x 5 x 2) as the number of distinct
possible syllables. However, short vowels cannot occur without a coda
except in unstressed syllables, where only three vowels are distinctive and
no coda is permitted. In stressed syllables with a short vowel and an obstru-
ent coda, only two, not five tones contrast. Taking just these limitations into
account, the number of unstressed syllables is not over 648, the number of
stressed syllables with short vowels not over 1520, and the number of
stressed syllables with long vowels or diphthongs not over 22800, for a
total of 24968, less than a third of the original estimate. This number is
still an over-estimate because other regularities are not yet incorporated,
such as the absence of rhymes containing a long vowel and a glottal stop
coda or with a final glide following a cognate vowel (such as /-uuw, -uw/).
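The arithmetic of this paragraph can be made explicit. The short Python sketch below reproduces the naive multiplicative estimate for Standard Thai and the reduced total once the stress, vowel-length and tone restrictions are taken into account; the three subtotals are copied directly from the figures given above, since the full set of constraints is not spelled out here.

```python
# Naive estimate: free combination of all components (Standard Thai).
onsets = 38        # 21 consonants, plus clusters, plus zero onset
nuclei = 21        # 9 vowel qualities x 2 lengths, plus 3 diphthongs
codas = 10         # 9 consonants plus zero coda
tones = 5
stress = 2

naive = onsets * nuclei * codas * tones * stress
print(naive)  # 79800

# Constrained estimate: subtotals as given in the text, reflecting
# (a) unstressed syllables (3 vowels, no coda),
# (b) stressed short-vowel syllables (coda required, fewer tones),
# (c) stressed syllables with long vowels or diphthongs.
unstressed = 648
stressed_short = 1520
stressed_long = 22800

constrained = unstressed + stressed_short + stressed_long
print(constrained)          # 24968
print(constrained / naive)  # about 0.31, i.e. less than a third
```

Because the calculation is multiplicative, any overlooked constraint in a single component inflates the final product, which is why the naive figure is more than three times too large.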
An advantage of the syllabic computation is that since the numbers of
consonants, vowels and tonal/accentual patterns enter into the calculation, it
gives a single measure which might substitute for all of these, as well as the
syllable complexity measure. It also sidesteps some of the often difficult
decisions on segmentation. For example, whether the onset in the Hausa
word ƙwai 'egg' is analyzed as a sequence of /kʼ/ and /w/ or as a single
labialized segment /kʷʼ/ does not affect the number of onsets computed for
the language. On the other hand, decisions must be made regarding syllabi-
fication, which can be equally or even more problematical. In Hausa, one
question would concern the status of geminates. All consonants in Hausa
can occur as geminates in word-medial position between vowels, but few
consonants are permitted as singleton codas. Given that /k/ never surfaces
as a singleton coda (Newman, 2000:404), is it appropriate to syllabify a
word such as bukka 'grass hut' as /buk.ka/ rather than as /bu.kka/? In
this case, tonal patterns help decide in favor of the former, since falling
tones are only permitted on syllables containing a long vowel or a coda,
and they do occur preceding geminates. But decisive arguments are not
always available.
A few languages were compared with respect to their syllable count in
Maddieson (1984), with Hawaiian having the smallest number of possible
syllables among the 9 languages in the set, namely 162, and Thai the largest
at 23638. Shosted (2006) made a similar computation for a sample of 32
languages, selected to represent the areal-typological groupings established
by Johanna Nichols in Nichols (1992) and later work. Shosted estimates
Egyptian Arabic as the highest with 108400 possible syllables, and assigns
the lowest total to Koiari with 70 syllables, though Tukang Besi with 115 is
not far behind. Since Dutton (1996:11) indicates that "word stress ... is
phonemic" in Koiari (albeit with some strong regularities), the Koiari total
should probably be doubled to 140, leaving Tukang Besi with the smallest
syllable inventory in the set. There are probably other totals that need ad-
justment in Shosteds data, as a number of uncertainties are clearly identi-
fied and some factors may have escaped attention altogether. However, if
the maximum and minimum values of 115 and 108400 are accepted, this
should not be taken as an indication that Egyptian Arabic is over 900 times
as complicated as Tukang Besi in its phonological patterning. A better
comparison is based on the log transform, given the multiplicative nature of
the calculation, which produces exponentially rising values of the syllable
total for any increase of any single component of the input. This transform
also yields a normal distribution for the index, instead of heavy rightward
skewing, making it amenable to analysis by statistical methods that assume
normality. The ratio between the log-transformed totals for Egyptian Arabic and Tukang Besi is 2.44.
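The effect of the log transform can be checked directly. The sketch below uses the two extreme syllable totals from Shosted's data as reported above:

```python
import math

arabic = 108400      # estimated syllable total for Egyptian Arabic
tukang_besi = 115    # smallest total in the set, after the Koiari adjustment

# The raw ratio suggests a difference of nearly three orders of magnitude...
print(arabic / tukang_besi)   # about 942.6

# ...but on the log scale, which suits a multiplicative index,
# the difference is far more modest.
ratio = math.log10(arabic) / math.log10(tukang_besi)
print(round(ratio, 2))        # 2.44
```
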
Syllable counts of this kind can only be reliably computed for a given
language if there is either a careful enough statement of the combinatorial
possibilities in an accessible linguistic description, or if a sufficiently large
lexicon is available which can be searched for the patterns. The lexicon
must be syllabified and transcribed, or at least in a form that enables sylla-
ble boundaries and phonemic structure to be determined in the entries. For
the economically more important languages of the world, such material is
readily available, but for many minor languages, no reliable distributional
statements have been published and a lexicon of only a few hundred words
may be all that is available. At present, it would be impossible to calculate
the combinatorial possibilities for a large and diversified sample of lan-
guages. However, in some traditions of language description, such as the
French structuralist model, the required information is regularly provided
and the number of languages for which suitable data is available continues
to increase.
4. Frequency of types
If segments and sequential patterns differ in complexity, then it is reason-
able to propose that the complexity of a phonological system varies accord-
ing to the relative frequency of more versus less complex elements. That
such frequency differences occur is relatively easy to demonstrate. For
example, in many of the languages for which segment frequency counts are
available the most frequent consonant is one of the most basic ones, such as
/k/ as in Andoke (Landaburu, 1979), Kuku Yalanji (Patz, 2002), Hawaiian
(Pukui and Elbert, 1979), and Northern Khmuʔ (Svantesson, 1983), or /t/ as
in Russian (Kučera and Monroe, 1968), Japanese (Bloch, 1950), Tulu
(Bhat, 1967) and Maori (Bauer, 1993). However in the frequency counts
for French in Delattre (1965) and Malcot (1974) as well as in more recent
computational counts (Pellegrino, p.c.) the most frequent consonant is /ʁ/,
which would be considered a Complex consonant in the scheme of Lind-
blom and Maddieson (1988) (as it is a voiced fricative with a displaced
articulation). Since a speaker of French is called on not merely to produce
this Complex consonant but to produce it quite frequently, French might be
considered more complex in this respect than German, say, which has a
similar consonant in its inventory, but where the most common consonant
is Basic /n/ (Kučera and Monroe, 1968).
Similarly, the frequency of syllabic patterns also varies. Both Noon
(Soukka, 2000) and Maybrat (Dol, 1999) seem to allow CVC as the maxi-
mally elaborate syllable type but whereas in Noon this is the predominant
pattern (at least in the case of monosyllabic stems, which form a very large
part of the vocabulary), in Maybrat CV, not CVC, is the predominant sylla-
ble structure. In this respect, therefore, Noon might be evaluated as more
complex than Maybrat.
The problem is how to develop a generally applicable method of com-
paring a large number of languages, given that the data are incomplete for
the majority of them. The frequency data cited above were not distin-
guished as to whether these frequencies were calculated on the basis of
frequency in the lexicon or frequency in text. The two are broadly corre-
lated but far from identical. However, a lexicon is available for more lan-
guages than a large body of reliably-transcribed texts, which is required to
avoid the biases that text styles and content can introduce. Hence, lexical
frequencies provide a better opportunity for large-scale comparison. The
size of available lexicons is also very variable. When it is relatively small,
an analysis of the total set of frequency patterns is impossible; the relative
frequencies of types with lower frequency will be unreliably represented
and rare types may be altogether absent. A feasible comparison might,
therefore, be based on only the consideration of patterns which are among
the most frequent.
With respect to segments, it would be feasible to compile a count of,
say, just the ten most frequent in lexical forms. This also provides a solu-
tion to the problem posed by the variation in the number of segments across
languages (the smallest phonemic inventory reported is 11 segments in
Rotokas and Pirahã). A sample comparison is provided in Table 3. The
columns on the left show the frequencies of the 10 most frequent consonant
segments in English (British RP dialect), as reported by John Higgins of the
University of Bristol, based on an analysis of the lexicon included in a
popular learner's dictionary (Hornsby et al., 1973). The columns on the
right provide comparable data for Totonac extracted from the considerably
smaller lexicon provided by Reid & Bishop (1974).

Table 3. Comparison of segment frequency by rank in English and Totonac

        English (RP)              Totonac
        segment   frequency       segment   frequency
        t         34260           t         2138
        s         33922           n         2123
        n         31934           q         1629
        l         27373           k         1529
        ɹ         23069           l         1278
        k         22453           ɬ         1032
        d         21275           p          996
        z         19972           m          994
        p         15553           s          777
        m         14823           tʃ         471

To compare these data, it is useful to calculate some kind of index.
There are a number of ways this might be done. One possibility is to calcu-
late a summed frequency x complexity score over the top ten segments, in
which each segment contributes decreasingly according to its rank, and
increasingly according to its complexity. A simple procedure would be to
take the level of complexity of each segment suggested by Lindblom and
Maddieson (1988), and multiply it by its rank, expressed as decreasing
decimal fractions of 1 (i.e. 1, 0.9, 0.8, 0.7, etc.). For instance, the segment
/n/ of English would contribute 0.8 x 1 to the score and the segment /z/
would add 0.3 x 2, i.e. 0.6. If all of the top ten segments by frequency are
Basic, then the score on this index would be 5.5. Assuming that /ɹ/ is Basic
(which is perhaps debatable), this variety of English scores just 5.8. On the
other hand, Totonac has two segments in its top ten which count as Elabo-
rated, namely /q/ and /ɬ/. The score for this language is 7.1. This method is
somewhat crude, but its appeal is that it can be easily applied to a large
sample of languages despite differences in their inventory size and the
quantity of lexical forms available for counting.
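The index just described can be sketched in a few lines. The function below assumes, following the worked example in the text, that Basic segments count as level 1 and that English /z/ (at rank 8 in Table 3) counts as level 2; complexity levels for other languages would have to be taken from the full Lindblom and Maddieson classification.

```python
def freq_complexity_index(levels):
    """Sum rank weight (1.0, 0.9, ..., 0.1) x complexity level over the
    ten most frequent consonants, given their levels in frequency order."""
    weights = [1 - 0.1 * i for i in range(10)]  # 1.0, 0.9, ..., 0.1
    return sum(w * lev for w, lev in zip(weights, levels))

# English (RP), ranks as in Table 3: all segments Basic (level 1)
# except /z/ at rank 8, which is counted at level 2.
english = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1]

print(round(freq_complexity_index([1] * 10), 1))  # 5.5 (all-Basic floor)
print(round(freq_complexity_index(english), 1))   # 5.8
```

This reproduces the two English figures given above: the all-Basic floor of 5.5 and the actual score of 5.8, where /z/ at rank 8 contributes its extra 0.3.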
A lexicon can also be used to calculate relative frequency of syllable types.
A simple procedure would be to calculate the proportion of each of the
three categories of syllabic complexity used in Maddieson (2006, 2007)
that occur in the lexicon. An index summing the proportions multiplied by
the complexity level would then provide a more gradient estimate of a lan-
guages complexity at the syllabic level. All languages whose (native) vo-
cabulary contains only Simple syllables with the pattern (C)V (including
syllabic nasals in the V position in some cases) would have an index of 1.
This set would include, for example, Fijian, Yoruba and Guarani. But even
at the next level of syllabic complexity, different languages would have a
considerable range of values. Eight languages which allow only single coda
consonants were compared with respect to the proportion of closed versus
open syllables found in the lexicon available for counting. The languages
are Yupik (Jacobson, 1984), Kadazan (Lasimbang & Miller, 1995), Gbaya
(Blanchard & Noss, 1982), Darai (Kotapish & Kotapish, 1975), Mandinka
(Tarawale et al., 1980), Lhasa Tibetan (Goldstein & Nornang, 1984), Co-
manche (Wistrand-Robinson & Armagost, 1990) and Wa (Yunnan Minzu
Chubanshe, 1981). These languages either have no onset clusters or allow a
limited set of common onset clusters (C + liquid or glide) which only occur
infrequently. A syllable complexity x frequency index was calculated by
multiplying the proportion of open syllables by 1 and the proportion of
syllables with codas by 2 and summing the products. The closer the index
is to 2, the greater the share of closed syllables in the lexicon. The mean
number of syllables per word was also calculated. The results are shown in
Table 4.
Table 4. Open syllable frequency in the lexicon of selected languages.
Language          # syllables   % open      index   mean syllables
                  counted       syllables           per word
Comanche          17485         85.7        1.14    3.63
Lhasa Tibetan      4698         84.5        1.16    1.84
Darai              5597         77.6        1.22    2.34
Mandinka           9795         78.2        1.22    2.73
Gbaya             10342         75.8        1.24    2.35
Kadazan (Dusun)    5339         63.0        1.37    2.64
Yupik              7973         52.7        1.47    3.40
Wa (Parauk)        3180         22.7        1.77    1.00

Comanche, which allows no word-final codas, has an index not much
above 1 (and it might be even lower if a different interpretation of the role
of /h/ in the phonology was adopted), whereas Yupik approaches 1.5, and
Wa is over 1.7. If the assumption that CVC syllables are more complex
than CV syllables is correct, then it is reasonable to argue that Yupik or
Wa is more complex than Comanche or Tibetan because a larger proportion
of their syllables have the more complex structure.
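For languages of this type the index is a simple weighted proportion, and the Table 4 values can be recomputed directly from the percentage of open syllables:

```python
def syllable_index(pct_open):
    """Complexity x frequency index for a language allowing only CV and
    CVC syllables: open syllables weighted 1, closed syllables weighted 2."""
    p = pct_open / 100.0
    return p * 1 + (1 - p) * 2

# Percentages of open syllables from Table 4.
for lang, pct in [("Comanche", 85.7), ("Yupik", 52.7), ("Wa", 22.7)]:
    print(lang, round(syllable_index(pct), 2))
# Comanche 1.14, Yupik 1.47, Wa 1.77
```
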
It is interesting to note that these results cannot be predicted from word
length. A lower proportion of (C)VC structures might be expected to re-
quire words to be longer in order to create sufficiently rich lexical re-
sources. This expectation is not borne out by the comparison of the index
and the mean number of syllables per word, as these are not significantly
correlated with each other regardless of whether or not the low word-
length value for Wa (see below) is included. Comanche and Yupik have
the greatest mean word length but are at opposite ends of the open syllable
frequency index. Similarly, Tibetan and Wa have the two lowest index
scores but have the shortest mean word length. Rather, the relative frequen-
cies of longer versus shorter words might be considered to be another inde-
pendent, contributing factor in evaluating the complexity of phonological
forms.
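The absence of a significant correlation can be verified from the Table 4 figures themselves. The sketch below computes the Pearson coefficient between the index and mean word length; the significance check assumes the conventional two-tailed .05 critical value for n = 8 pairs (|r| must exceed roughly 0.707).

```python
import math

# (index, mean syllables per word) pairs from Table 4.
data = [(1.14, 3.63), (1.16, 1.84), (1.22, 2.34), (1.22, 2.73),
        (1.24, 2.35), (1.37, 2.64), (1.47, 3.40), (1.77, 1.00)]

def pearson_r(pairs):
    """Pearson product-moment correlation coefficient."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(data)
print(round(r, 2))     # about -0.51: a weak negative association
print(abs(r) < 0.707)  # True: not significant at p < .05 for n = 8
```
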
A major caution in using lexical lists must be noted. The choice of what
form is entered as the lemma can have a major impact on the segment and
syllable frequencies in the lexicon for the many languages in which word-
forms vary according to case, tense, gender, phonological environment, etc.
For example, in the dictionary by Pogonowski (1983), which was used to
obtain frequency counts of Polish segments, the conventional choice of
entering verbs in their infinitive form was made. The vast majority of
infinitives end in the affricate /tɕ/ and hence, this segment has a much higher
frequency than if some other choice had been made. It accounts for over
12% of the codas in this count, whereas the next most frequent segment in
this position, /t/, accounts for only about 3% of the codas. In some lan-
guages with elaborate polymorphemic structures, such as those in the
Iroquoian family (see Chafe (1967) on Seneca or Doherty (1993) on Cayuga)
more fundamental problems arise over what to consider a word and hence,
how to select any single form for a lexical entry. Choice of the form of
lexical entries obviously also affects the word length calculation. For ex-
ample, the compilers of the Wa dictionary consulted for this study made the
decision to treat all lexical entries as consisting of one or more monosyl-
labic words. An orthography in use in Burma (Myanmar) instead writes a
number of these items as disyllabic words (Watkins & Kunst, 2006), which
would yield a mean word-length slightly greater than 1 syllable.
A final comment on frequency is in order. It might be argued that as the
frequency of any segment or pattern in a given language increases, the
more familiar speakers become with it. This familiarity reduces the com-
plexity of the item. While it is undoubtedly true that (over-)learned behav-
iors require less effort than novel behaviors in various ways (e.g. less
attention, less time) this does not constitute an argument that different
learned behaviors are all equally complex. Any spoken language behavior
is highly learned, but specific patterns can differ in the amount of muscular
coordination required, the difficulty of identifying them, or other factors
that can reasonably be considered as making one more complex than an-
other.
5. Variability and Transparency
A further plausible assumption about phonological complexity is that lan-
guages for which the patterns of variation in the phonology are more
transparent are simpler than those for which the variations are more arbi-
trary. For example, consider patterns of tonal variations in two Chinese
languages. In Cantonese, the tone found in the isolation form of a word
remains unchanged in context in most cases. The only regular phonological
alternation is that the mid-high fall [53] becomes level high [55] when it
precedes a syllable with a high tone onset (Chao, 1947, Hashimoto, 1972)
(although some more recent analyses describe this variation as free rather
than conditioned). In Fuzhou, an Eastern Min variety, on the other hand,
the nonfinal elements of a disyllabic or longer sequence often have a differ-
ent tone from their isolation tones, and in some cases, the vowel will also
differ (Maddieson, 1975, Chan, 1985). A couple of examples are given in
Table 5 with tones represented by numbers where 5 is the highest. These
changes in vowel quality and tone shape are not straightforwardly explica-
ble in terms of adaptation to the phonological environment, for example via
assimilation, as is the case for Cantonese. Surely in this respect, Fuzhou is
more complex than Cantonese.

Table 5. Tone sandhi examples from Fuzhou Chinese

     Isolation forms                     Combined form
     sei 2 'ten' + ŋuo 5 'month'         sei 2 ŋuo 5 'October'
     sei 24 'lose' + tsiu 22 'hand'      si 35 tsiu 22 'slip from hand'

The most serious attempt to establish a basis for comparison of a significant
number of languages at the level of phonological alternation was the com-
pilation of phonological rules undertaken for the Stanford Phonology Ar-
chive (see Vihman, 1977 for a general description), a major component of
the Language Universals Project conducted at Stanford University under
the direction of Joseph Greenberg and Charles Ferguson. Although a few
studies were made of particular processes, such as reduplication (Moravc-
sik, 1978), or alternations between [d] and [D] (Ferguson, 1978), no method
of comparing the phonological systems in a more global way was devel-
oped and the data remained little exploited.
The problem is a formidable one. Very likely the only practicable way
to develop a measure would be to tightly circumscribe the scope of the
comparison. A standardized basic vocabulary, say of 100 words similar to
the Swadesh list employed to get rough-and-ready genetic groupings, but
perhaps more carefully selected, might be used. The number of variant
forms of these words that arise strictly from different phonological condi-
tions might then be countable and the level of transparency of the processes
which create these variants could also be rated. A combined score
taking into account the number and nature of the variants could then be
computed over this restricted wordlist. Unlike the situation with the other
suggestions in this paper, no preliminary studies of this kind have been
carried out, and a major problem would be to separate strictly phonological
processes from those with a morphological or syntactic basis. This pro-
posal, of course, begs the question of whether such a separation is actually
possible.
6. Defining and demonstrating complexity
In the discussion so far an essentially informal or intuitive sense of relative
complexity has been appealed to. In this final section, I want to briefly con-
sider the issues of seeking a more explicit definition, and finding ways to
demonstrate the appropriateness of a definition and directly measure the
relevant properties. Methods that can be widely applied across different
fieldwork settings will be especially considered.
One basic understanding of the meaning of the word "complex" is a
straightforwardly quantitative one: any structure or system that contains
more elements than another is the more complex of the two. This is easy to
apply in comparing, say, length of syllables or number of consonants in an
inventory. However, it does not answer many questions; for example,
whether CCV or CVC is a more complex syllable pattern. A more compre-
hensive approach may be sought by making the common equation of com-
plexity with difficulty. In this view a given linguistic element or pattern is
more complex than another if it is more difficult to execute, more difficult
to process, more difficult to learn, or more difficult to retain in memory.
Difficulty can itself be hard to demonstrate, but there are a number of ac-
cepted ways of operationalizing a test for difficulty. One frequently used
way to measure the difficulty of human tasks is to compare reaction times
for different tasks or different variants of a task. A task that takes longer is
assumed to be more difficult. Reaction time experiments can be designed
for a range of linguistic execution and processing tasks, and can be adapted
for use in remote locations and under a variety of cultural conditions. For
example, whether CCV or CVC syllable structures are more difficult might
be tested by picture identification or naming tasks using either appropri-
ately structured real words or nonsense forms taught in pre-test training
sessions. If the time to react to presentation of a target is slower in one case
than another, it may be taken as evidence of greater difficulty. For purposes
of comparing complexity, it may not be necessary to demonstrate the
source of the difficulty: problems of execution, recognition, or recall can
all be taken as marks of higher complexity.
Slowness of learning a task is also commonly equated with its degree of
difficulty, and this perspective is often appealed to in explaining the order
of acquisition of phonological contrasts and patterns by children. However,
observing natural first-language acquisition is a demanding process. There
are many idiosyncracies in individual children and we are never likely to
have broad enough data on the large number of languages which would be
required in order to come up with, say, an overall scale of segmental diffi-
culty covering all segment types and motivated by actual learning diffi-
culty. However, ease of learning for adults can be explored with experi-
mental paradigms in which the measure of difficulty is error rate or a
similar variable. Demonstrations that arbitrary phonological alternations are
harder to learn than regular ones have been made by Pycha et al. (2003,
2007) and Wilson (2003) using limited artificial languages. This approach
draws on earlier experimental work with children, such as the famous "wug"
test of Jean Berko Gleason (1958). The protocols used in the artificial
grammar learning paradigm seem adaptable enough to be employed outside
the university settings in which they have been developed, and the example
provided by research with children confirms that subjects do not need to be
familiar with formal experimental settings to participate.
A somewhat different notion of complexity might be based on measures
reflecting amount of work done. Lindblom and colleagues (Lindblom &
Moon, 2000, Davis et al., 1999) have considered whether levels of oxygen
consumption can provide an index of effort levels during speech produc-
tion. Whalen and his co-workers have proposed that brain activation levels
revealed by fMRI can indicate which syllable structures are more complex
than others (Whalen, 2007). These techniques demand technological tools
that are not widely available or portable, but potentially provide direct
measures of complexity independent of experimenter judgments. They are
unlikely to be applied to speakers of a large number of languages, but could
furnish a basis for generalization to untested conditions.
One of the most problematic issues concerns how to integrate the various
facets of phonetic and phonological complexity discussed above into more
inclusive measures. At a simple level, the question might be taken as one of
finding appropriate weights for each facet considered. However, assigning
these weights is problematic. In an inventory, does increasing the number
of vowels contribute more complexity than increasing the number of con-
sonants? One more vowel typically increases the number of potential syl-
lables more than one more consonant, but consonants typically require
greater articulatory effort than vowels. With respect to phonotactics, are
more constrained combinatorial patterns more complex than free combina-
tion? One contributes more constraints to learn, but the other increases the
total number of possible syllables. These kinds of issues are unlikely to be
resolved soon. In the meantime, the best procedure might be to treat each
evaluated variable as equivalent to every other, assigning index values for
languages, respectively at, above, and below an average complexity level
and then summing the results for an overall score. This procedure is crude,
but readily achievable. As Nichols (2007) has shown, a similar procedure
can be applied to incorporate measures of morphological or syntactic com-
plexity into more global estimates of the complexity of individual lan-
guages.
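The equal-weight procedure proposed here can be sketched very simply. In the fragment below, the three variables and all numeric values are invented purely for illustration; only the scoring scheme (below, at, or above the cross-linguistic average) is taken from the text.

```python
def overall_score(values, means):
    """Score each variable -1, 0 or +1 according to whether a language
    falls below, at, or above the cross-linguistic mean, then sum."""
    score = 0
    for v, m in zip(values, means):
        if v > m:
            score += 1
        elif v < m:
            score -= 1
    return score

# Hypothetical illustration: three complexity variables (say, consonant
# inventory size, syllable index, number of tonal contrasts).
means = [22.0, 1.3, 1.5]
print(overall_score([34, 1.77, 5], means))  # 3: above average on all three
print(overall_score([13, 1.14, 1], means))  # -3: below average on all three
```
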
References
Akmajian, Andrew, Richard A. Demers, & Robert M. Harnish.
1979 Linguistics: An Introduction to Language and Communication. MIT
Press, Cambridge, MA.
Alwi, Hasan, Soenjona Dardjowidjojo, Hans Lapoliwa and Anton M. Moeliono.
1998 Tata Bahasa Baku Bahasa Indonesia: Edisi Ketiga. Balai Pustaka,
Jakarta.
Bauer, Winifred
1993 Maori. Routledge, London.
Berko Gleason, Jean
1958 The Child's Learning of English Morphology. Word 14: 150-177.
Bhat, D. N. S.
1967 Descriptive analysis of Tulu. Deccan College Postgraduate and Re-
search Institute, Poona [Pune].
Bickerton, Derek.
2007 Language evolution: A brief guide for linguists. Lingua 117: 510-526.
Blanchard, Yves and Philip A. Noss
1982 Dictionnaire Gbaya-Français: Dialecte Yaayuwee. Centre de Tra-
duction Gbaya, Mission Catholique de Meiganga et Eglise Evangé-
lique Luthérienne du Cameroun, Meiganga.
Bloch, Bernard
1950 Studies in colloquial Japanese, IV: Phonemics. Language 26: 86-125.
Bouquiaux, Luc
1970 La Langue Birom (Nigeria septentrional): Phonologie, Morphologie,
Syntaxe. Bibliothèque de la Faculté de Philosophie et Lettres de
l'Université de Liège, Liège.
Chafe, Wallace L.
1967 Seneca Morphology and Dictionary. Smithsonian Institution, Wash-
ington, DC.
Chan, Marjorie K-M.
1985 Fuzhou Phonology: A Non-Linear Analysis of Tone and Stress. Ph.
D. dissertation, University of Washington, Seattle.
Chao, Yuan-Ren.
1947 Cantonese Primer. Harvard University Press, Cambridge MA.
Crothers, John.
1978 Typology and universals of vowel systems. In J. H. Greenberg, ed.,
Universals of Human Language, Volume 2: Phonology. Stanford
University Press, Stanford: 93-152.
Davis, J. H., B. Lindblom, R. Spina and Z. Simpson.
1999 Energetics in phonetics. Paper presented at Speech Communication
and Language Development Symposium. Stockholm University.
Delattre, Pierre
1965 Comparing the Phonetic Features of English, German, Spanish and
French. Julius Groos Verlag, Heidelberg.
Doherty, Brian
1993 The Acoustic-Phonetic Correlates of Cayuga Word-Stress. Ph. D.
dissertation, Harvard University, Cambridge MA.
Dol, Philomena
1999 A Grammar of Maybrat. University of Leiden, Leiden.
Dutton, Tom E
1996 Koiari. Lincom Europa, München and Newcastle.
Fenk-Oczlon, Gertraud and August Fenk.
2008 Complexity trade-offs between subsystems of language. In Miestamo
et al., 2008: 43-66.
Ferguson, Charles A.
1978 Phonological processes. In J. H. Greenberg, C. A. Ferguson, and E.
A. Moravcsik (eds.), Universals of Human Language, vol. 3, Stan-
ford, Stanford University Press, 403442.
Gandour, Jackson T. and Richard A. Harshman.
1978 Crosslanguage differences in tone perception: A multi-dimensional
scaling investigation. Language and Speech 21: 1-33.
Givón, T. and Bertram F. Malle, editors.
2002 The Evolution of Language out of Pre-language. John Benjamins,
Amsterdam.
Goldstein, Melvyn C. and Nawang L. Nornang
1984 Modern Spoken Tibetan: Lhasa Dialect, 3rd edition. Ratna Pustak
Bhandar, Kathmandu.
Hashimoto, Oi-Kan Yue.
1972 Studies in Yue Dialects 1: The Phonology of Cantonese. Cambridge
University Press, Cambridge.
Hornsby, A. S., E. V. Gatenby and H. Wakefield.
1973 Advanced Learner's Dictionary of English (Oxford Text Archive
edition). Longman, London.
Jacobson, Steven A.
1984 Yupik Eskimo Dictionary. Alaska Native Language Center, Univer-
sity of Alaska, Fairbanks.
Kawasaki-Fukumori, Haruko.
1992 An acoustic basis for universal phonological constraints. Language
and Speech 35:73-86.
Kotapish, Carl and Sharon
1975 A Darai-English, English-Darai Glossary. Summer Institute of Lin-
guistics, and Institute of Nepal and Asian Studies, Tribhuvan Uni-
versity, Kathmandu.
Kučera, Henry and George K. Monroe
1968 A Comparative Quantitative Phonology of Russian, Czech, and
German. American Elsevier, New York.
Ladefoged, Peter
2006 A Course in Phonetics, Fifth Edition. Thomson/Wadsworth, Bos-
ton.
Landaburu, Jon
1979 La Langue des Andoke (Amazonie colombienne) Grammaire. Centre
National de la Recherche Scientifique, Paris.
Lasimbang, Rita and John Miller
1995 Kadazan Dusun - Malay - English Dictionary. Kadazan Dusun Cul-
tural Association, Kota Kinabalu. (pre-publication version actually
consulted)
Li, Fang-Kuei
1977 A Handbook of Comparative Thai. University Press of Hawaii,
Honolulu.
Lindblom, Björn and Ian Maddieson
1988 Phonetic universals in consonant systems. In Language, Speech and
Mind (eds. C. Li and L. M. Hyman). Routledge, London. 62-78.
Lindblom, Björn and Seung-Jae Moon
2000 Can the energy costs of speech movements be measured? A prelimi-
nary feasibility study, Journal of the Acoustical Society of Korea
19.3E, 25-32.
Maddieson, Ian and Kristin Precoda
1990 Updating UPSID. UCLA Working Papers in Phonetics 74: 104-114.
Calculating Phonological Complexity 107

Maddieson, Ian
1975 The intrinsic pitch of vowels and tone in Foochow. San Jose State
Occasional Papers in Linguistics 1: 150-161. (Proceedings of the
Fourth California Linguistics Conference).
1984 Patterns of Sounds. Cambridge University Press, Cambridge.
2005 Consonant inventories. In M. Haspelmath, M. S. Dryer, D. Gil, & B.
Comrie, eds., World Atlas of Language Structures, Oxford Univer-
sity Press, Oxford and New York: 10-13.
2006 Correlating phonological complexity: data and validation. Linguistic
Typology 10: 108-125.
2007 Issues of phonological complexity: Statistical analysis of the rela-
tionship between syllable structures, segment inventories and tone
contrasts. In M.-J. Solé, P. Beddor and M. Ohala (eds.), Experimental
Approaches to Phonology. Oxford University Press, Oxford and
New York: 93-103.
Malécot, André
1974 The frequency of occurrence of French phonemes and consonant
clusters. Phonetica 29: 158-170.
Marsico, Egidio, Ian Maddieson, Christophe Coupé, and François Pellegrino.
2004 Investigating the hidden structure of phonological systems. Pro-
ceedings of the 30th Berkeley Linguistic Society Meeting: 256-267.
McWhorter, John H.
2001a The Power of Babel: A Natural History of Language. Times Books,
New York.
2001b The World's Simplest Grammars Are Creole Grammars. Linguistic
Typology 5:125-166.
Miestamo, Matti, Kaius Sinnemäki and Fred Karlsson, editors
2008 Language Complexity: Typology, Contact, Change. John Benjamins,
Amsterdam.
Moravcsik, Edith A.
1978 Reduplicative constructions. In J. H. Greenberg, C. A. Ferguson, and
E. A. Moravcsik (eds.), Universals of Human Language, vol. 3,
Stanford, Stanford University Press, 297334.
Newman, Paul
2004 The Hausa Language: An Encyclopedic Reference Grammar. Yale
University Press, New Haven and London.
Nichols, Johanna
1992 Linguistic Diversity in Space and Time. University of Chicago Press,
Chicago
2007 The distribution of complexity in the worlds languages. Paper pre-
sented at symposium on Approaches to Language Complexity. An-
nual Meeting of the Linguistic Society of America, Anaheim.

Ohala, John J. and Kawasaki-Fukumori, Haruko
1997 Alternatives to the sonority hierarchy for explaining segmental se-
quential constraints. In Stig Eliasson & Ernst Håkon Jahr (eds.),
Language And Its Ecology: Essays In Memory Of Einar Haugen.
Berlin: Mouton de Gruyter. 343-365
Patz, Elisabeth
2002 A Grammar of the Kuku Yalanji Language of North Queensland
(Pacific Linguistics 526). Research School of Pacific and Asian
Studies, Australian National University, Canberra.
Pogonowski, Iwo
1983 Dictionary Polish-English, English-Polish. Hippocrene Books, New
York.
Pukui, M. K. and Elbert, S. H.
1979 Hawaiian Grammar. University Press of Hawaii, Honolulu.
Pycha, Anne, P. Nowak, E. Shin and R. Shosted
2003 Phonological rule-learning and its implications for a theory of vowel
harmony. G. Garding and M. Tsujimura (eds) Proceedings of
WCCFL 22. Somerville, Cascadilla Press: 423-435.
Pycha, Anne, Eurie Shin, and Ryan Shosted
2007 Directionality of assimilation in consonant clusters: An experimental
approach. Paper presented at Symposium Towards an Artificial
Grammar Learning Paradigm in Phonology. Linguistic Society of
America Annual Meeting, Anaheim, CA.
http://www.linguistics.berkeley.edu/~shosted/dacc.pdf
Reid, Aileen A. and Ruth G. Bishop
1974 Diccionario Totonaco de Xicotepec de Juárez, Puebla. Instituto
Lingüístico de Verano, Mexico.
Schwartz, J.-L., L.-J. Boë, N. Vallée and C. Abry
1997 The dispersion-focalization theory of vowel systems. Journal of
Phonetics 25: 255-286.
Shosted, Ryan K.
2006 Correlating complexity: a typological approach. Linguistic Typology
10: 1-40.
Sinnemäki, Kaius.
2008 Complexity trade-offs in core argument marking. In Miestamo et al.,
2008: 67-88.
Sivertsen, Eva
1956 Pitch problems in Kiowa. International Journal of American Lin-
guistics 22: 117-130
Soukka, Maria
2000 A Descriptive Grammar of Noon: A Cangin Language of Senegal.
Lincom Europa, München.

Stevens, Kenneth N.
1972 The quantal nature of speech: evidence from articulatory-acoustic
data. In E. E. David & P. B. Denes (eds) Human communication: A
unified view. London: Academic Press. 51-66.
1989 On the quantal nature of speech. Journal of Phonetics 17: 3-45.
Svantesson, Jan-Olof
1983 Kammu phonology and morphology. (Travaux de L'Institut de Lin-
guistique de Lund, 18.) Lund: Gleerup.
Tarawale, Ba, Fatumata Sidibe and Lasana Konteh
1980 Mandinka-English Dictionary. National Literacy Advisory Commit-
tee, Bathurst.
Vihman, Marilyn
1977 A Reference Manual and User's Guide for the Stanford Phonology
Archive. Part I. Department of Linguistics, Stanford University,
Stanford.
Watkins, Justin and Richard Kunst
2006 Writing the Wa language. On-line document at
http://mercury.soas.ac.uk/wadict/wa_orthography.html
Watkins, Laurel
1984 A Grammar of Kiowa. University of Nebraska, Lincoln.
Whalen, Douglas H.
2007 Brain activations related to changes in speech complexity. Paper
presented at symposium on Approaches to Language Complexity.
Annual Meeting of the Linguistic Society of America, Anaheim.
Wilson, Colin
2003 Experimental investigation of phonological naturalness. In G. Gard-
ing and M. Tsujimura (eds). Proceedings of WCCFL 22. Cascadilla
Press, Somerville, MA: 533-546.
Wistrand-Robinson, Lila, and James Armagost
1990 Comanche Dictionary and Grammar. Summer Institute of Linguis-
tics and The University of Texas at Arlington, Arlington.
Wolff, Hans
1959 Subsystem typologies and area linguistics. Anthropological Linguis-
tics 1/7: 1-88.
Yunnan Minzu Chubanshe [Yunnan Minorities Publishing House] (Yan Qixiang,
Zhou Zhizhi et al., compilers)
1981 Pug lai cix ding yiie si ndong lai Vax mai Hox (A Concise Diction-
ary of Wa and Chinese). Kunming.



Favoured syllabic patterns in the world's languages
and sensorimotor constraints
Nathalie Vallée, Solange Rossato and Isabelle Rousset
The general aim of this study is to investigate the linear organization of
speech sound sequences in natural languages. Using a 17-language syllabi-
fied lexicon database (ULSID), we examine several preferred syllabic and
lexical patterns and we discuss them in light of data from speech produc-
tion and perception experiments. First, we observe that some of the
preferences found for tautosyllabic relations are consistent with the
predictions of the Frame/Content Theory, while others extend them. Then,
we investigate trends in the co-occurrence of prevocalic and postvocalic
consonants appearing in the same syllable or in the onsets of two
consecutive syllables.
We find that the Labial-V-Coronal sequence is widespread in many
lexica of ULSID. Our results not only confirm the presence of the so-called
Labial-Coronal (LC) effect found by MacNeilage et al. (1999) in disyllabic
words, but also show that it occurs in various syllabic patterns. We then
report the results of two experiments which provide phonetic bases for the
LC effect. Finally, we focus on consonant sequences involving plosives and
nasals, showing that the preferred sequences are inconsistent with the
sonority hierarchy. We claim that the disfavored patterns can be predicted
from a more complex velum gesture due to aerodynamic constraints. More
broadly, to better understand the complexity of sound sequences in
languages, we discuss the relationship between human sensorimotor
capacities and phonology.
1. Introduction
Although there is no unambiguous definition of what a complex system
is, some key aspects like evolution, adaptation, interaction, dynamics and
self-organization are common characteristics. These aspects are examined
in studies on speech production, speech comprehension and speech acquisi-
tion. They therefore make it possible to regard human speech processing as
a complex system (for discussion on complexity in phonetics and phonolo-
gy see the Introduction of this volume by Pellegrino, Marsico, Chitoran,
112 Nathalie Vallée, Solange Rossato and Isabelle Rousset

and Coupé). Speech is known to be structured at many different levels and,
at each level, preferred types of organization are found, both within and
across languages. There is no doubt that the description and analysis of
an organization common to many languages of different genetic groups
allows us to gain a better understanding of such a complex system. This
research attempts to answer the following questions: Why are some
structures very common in the world's languages, regardless of linguistic affiliation
and geography? Where do these structures come from? How can they be
explained?
Since Trubetzkoy's classification of speech sound systems in the 1930s
(Trubetzkoy, 1939), several typological studies have demonstrated that
languages do not structure their sound systems arbitrarily. According to
the approach introduced by Liljencrants and Lindblom (1972), typological
analyses associated with advances in predictive models have shown that
human languages do not make random use of both the vocal apparatus and
the auditory system to organize their phonological structures. Several stud-
ies have contributed to building a theory for predicting most of the vowel
systems of the world's languages from principles based on perceptual
distinctiveness (Lindblom, 1986 and later; Schwartz, Boë, Vallée and Abry,
1997; Vallée, Schwartz, and Escudier, 1999; De Boer, 2000). To our
knowledge, no general theory accounts for predicting consonant structures,
though certain works have provided natural phonetic bases for preferences
in consonant inventories. According to these works, consonant structures
are shaped by acoustic and aerodynamic constraints that provide sufficient
auditory contrast (Kingston and Diehl, 1994; Kingston, 2007; Mawass,
1997; McGowan, Koenig and Löfqvist, 1995; Ohala, 1983; Ohala and
Jaeger, 1986; Stevens, 1989; 2003; Stevens and Keyser, 1989), and for some
of them, by visual contrast (Mawass, Badin and Bailly, 2000; Vallée et al.,
2002; Boë et al., 2005). Moreover, physiological constraints regarding
possible or impossible gestures have also been suggested (Lindblom and
Maddieson, 1988).
Beyond the segmental level, languages structure the linear order of
sounds at other levels, within syllables and grammatical units such as mor-
phemes, words, clauses, sentences. It is widely accepted that the syllable is
a structural unit of speech, in production and perception. Derwing (1992)
specified that speakers of many languages are able to count the number
of syllables of a word, and obtain significant and consistent results in a
syllabification task, even for words with geminated consonants. However,
even if relevant studies have shown that speakers and listeners are generally
Favoured syllabic patterns and sensorimotor constraints 113

able to agree on the number of syllables in an utterance, sometimes their
syllabification differs (e.g. Content, Kearns and Frauenfelder, 2001). Be
that as it may, many psycholinguistic studies have found that syllables arise
spontaneously when speakers and listeners have to segment the speech
stream (e.g. Morais, Cary, Alegria and Bertelson, 1979; Segui, Dupoux,
and Mehler, 1990; Treiman, 1989; Sendlmeier, 1995; Treiman and Kessler,
1995). Moreover, the syllable is invoked in speech errors and word games
(e.g. Treiman, 1983; Bagemihl, 1995; MacNeilage, 1998) and in secret
languages, e.g. Verlan, derived from French (Plénat, 1995). It has also been
proposed as a processing unit in visual word recognition (Carreiras,
Álvarez and De Vega, 1993) and in handwriting production (Kandel,
Álvarez and Vallée, 2006). In addition, neurophysiological works have suggested
that the syllable is a central unit of language, specifically in its emergence,
acquisition and function (MacNeilage and Davis, 1990; MacNeilage, 1998;
MacNeilage and Davis, 2000).
Despite its well-established role in all these studies, the syllable is an
enigmatic linguistic unit. The question of its nature is still unresolved be-
cause its definition remains a problem at both the phonetic (see Krakow,
1999 for a review) and phonological (Ohala and Kawasaki, 1984;
Kenstowicz, 1994) level. Whatever the linguistic status of the syllable is,
the detailed examination of prevalent cross-linguistic sound combinations
within syllables, and combinations of syllables within morphemes and
words, provides knowledge of the syllabic organization of segments, on
both phonetic and phonological levels (Blevins, 1995), and can contribute
to the definition of the syllable. As for sound systems that are found in a wide
range of languages, attempts have been made to explain them in terms of
human capacities of perception (distinctiveness), production (vocal ges-
tures) or both (Kawasaki, 1982; Janson, 1986; Krakow, 1999; Redford and
Diehl, 1999; Maddieson and Precoda, 1992; Lindblom, 2000).
In this paper, we present results on phonotactic regularities found in
lexical and syllabic patterns from several languages. The study was con-
ducted using the ULSID (UCLA Lexical and Syllabic Inventory Database),
partly provided by Ian Maddieson (Maddieson and Precoda, 1992). This
database was created like UPSID (UCLA Phonological Segment Inventory
Database) (Maddieson, 1984; Maddieson and Precoda, 1989), for the un-
derstanding of speech sound structures in the world's languages. ULSID is
being developed in our laboratory and currently has syllabified lexicons
from 17 languages which are representative of the major language families
and fairly well distributed geographically. In line with Maddieson (1993:1):
"Recent loan words, especially those of wide international currency
relating to modern technological, political or cultural concepts (telephone,
democracy, football), have been excluded wherever recognizable." The
database currently contains more than 90,000 words, from 2,000 in Ngizim
to 12,200 in French. The mean number of words per language is 5,908. The
other languages included in the study are Afar, Finnish, Kannada, Kanuri,
Kwakw'ala, Navaho, Nyah Kur, Quechua, Sora, Swedish, Thai,
Vietnamese, Wa, Yup'ik and !Xóõ. Each lexical entry consists of an IPA
transcription with marks indicating its syllabic structure, representing the following
information: the division in syllables and, for each syllable, its conventional
sub-syllabic components, such as onset and rhyme. In addition to
Maddieson and Precoda's (1992) database, other languages were included
using similar sources of information. The syllabification was done either
from published (printed or computer-readable) syllabified lexicons (French:
BDLEX-Syll from BDLEX 50.000, Pérennou and de Calmès, 2002;
Swedish: Berlitz, 1981) or manually by at least two native speakers of the
language (as in Vietnamese). The lexical entries consisted of lemmas only.
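The syllabified-entry format described above lends itself to a simple decomposition routine. The sketch below assumes a hypothetical dot-separated notation and a toy vowel set; ULSID's actual encoding (full IPA with sub-syllabic markup) is richer and is not reproduced here.

```python
# A minimal sketch of decomposing a syllabified lexical entry into
# onset/nucleus/coda triples. The dot notation and the toy vowel set
# are assumptions for illustration, not ULSID's actual file format.

VOWELS = set("aeiou")  # stand-in for a real vowel inventory

def split_syllable(syl):
    """Split one syllable string into (onset, nucleus, coda)."""
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1  # consume onset consonants
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1  # consume the vocalic nucleus
    return syl[:i], syl[i:j], syl[j:]

def parse_entry(entry):
    """Return (onset, nucleus, coda) for each syllable of a dotted word."""
    return [split_syllable(s) for s in entry.split(".")]

print(parse_entry("pa.tak"))  # [('p', 'a', ''), ('t', 'a', 'k')]
```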
We investigated the sound organization in the lexical units by consider-
ing the syllable boundaries. Adjacent sounds in the same syllable (tautosyl-
labic) and those of two consecutive syllables within lexical items were ex-
amined according to the nature of the segments involved. Clear patterns
emerged from the statistical analyses and showed that syllables, like pho-
nemes, are not arbitrary linguistic units. Three favored combinations of C
and V were found by using computed ratios between observed and ex-
pected values in CV syllables, but also in both ( )CV( ) and ( )VC( ) tem-
plates, with a stronger association in the latter. This provides evidence favoring
sequences where the articulators do not make extensive movements be-
tween the consonant and vowel gestures. The results, presented in
Section 2, are discussed with respect to the predictions of the Frame/Content
Theory (MacNeilage, 1998).
We also confirmed the presence of the LC effect (MacNeilage and
Davis, 2000) in our data. This effect refers to a strong preference for a spe-
cific intersyllabic pattern: disyllabic words more often start with a labial
stop consonant preceding the vowel or nucleus, which is itself followed by
a coronal stop consonant (MacNeilage et al., 1999). In Section 3 we show
that this tendency, strongly attested in ULSID, occurred in several syllabic
patterns. We outline the results of two recent experimental studies, suggest-
ing an articulatory explanation for the existence of a perceptual correlate of
the LC effect in French adults.
The last point concerns the asymmetry we observed in sequences
involving a nasal (Nasal+Plosive (NP) and Plosive+Nasal (PN) consonant
sequences). Results from a preliminary study using EMA (Carstens
electromagnetic articulography) and EVA (S.Q.LAB) measurements (Rossato,
2003) are given. They suggest a possible explanation for the preference for
the VNP pattern (V = syllable nucleus) over PNV (even though the latter
follows the Sonority Sequencing Principle); these results are discussed in Section 4.
Thus, our work attempts to provide explanatory reasons for several pre-
ferred sound sequences, showing that the widespread patterns in many
natural languages are shaped by aerodynamic, articulatory and/or
perceptual constraints. More precisely, it reveals a preference for simple
speech movements: languages favour patterns that require less effort while
providing sufficient perceptual distinctiveness over those that require more
effort. More broadly, our findings are relevant to the relationship between
sensorimotor capacities and phonology.
2. Interaction between tautosyllabic segments
Our first objective was to estimate the role of the syllabic frame in the lexi-
cal organization of ULSID's languages. According to MacNeilage's
Frame, then Content Theory (1998), the universal alternation of consonants
and vowels in speech sound sequences is a consequence of jaw movement.
Consonants and vowels are articulated during one of the two phases of the
mandibular oscillation (raising and lowering of the jaw). Consonants are
produced during the closing of the mouth, while vowels are articulated
during the opening phase. Pure frames are CV syllables produced by only
one basic movement of the jaw (elevation then lowering) without any ac-
tion of other articulators throughout the syllable, and they are the simplest
syllables. They correspond to the most economical sound sequences. Ac-
cording to MacNeilage and Davis (2000), pure frames are the most fre-
quent CV-like articulations in babbling and the most frequent syllables in
first words, languages, and proto-languages.
Are the simplest syllables (in the sense of MacNeilage's theory) the
most widespread in ULSID's lexicons? What are the favored tautosyllabic
sequences in ULSID?
The database contains almost 250,000 syllables. We first looked at the
most common type of syllable in all languages. A basic CV structure ac-
counted for almost 54% of ULSID's syllables. To facilitate the observa-
tion of syllabic content, we created co-occurrence matrices. In order to be
able to compare our results with previous studies (MacNeilage, 1998;
MacNeilage and Davis, 2000), we grouped segments according to their
phonetic features (Vallée et al., 2002): six manners (plosive, fricative, na-
sal, affricate, approximant and trill/tap/flap), ten places of articulation for
consonants (bilabial, labio-dental, coronal, palatal, labial-palatal, labial-
velar, velar, uvular, pharyngeal, glottal), and three places of articulation for
vowels (front, central, and back). The coronal class grouped apical conso-
nants together i.e. dentals, alveodentals, alveolars, postalveolars and retro-
flexes (following Keating, 1990). Vowel height was not considered here
because we did not find any clear tendencies between vowel aperture and
consonant manner. We counterbalanced the observed syllable frequency
with the expected frequency estimated from its segments. For example, the
frequency of C₁V₁ was divided by the product of the individual frequencies
of C₁ and V₁. These ratios, calculated for each language on an individual
basis, showed what type of syllable was favored (ratio > 1) in our data (see
Table 1 for Afar and Table 2 for Thai).
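The observed/expected ratio just described can be sketched in a few lines. The counts below are invented for illustration, and the normalization over all syllables is one plausible reading of the procedure rather than the chapter's exact computation.

```python
from collections import Counter

# Sketch of the observed/expected ratio for CV syllables: the relative
# frequency of an onset-nucleus pair divided by the product of the
# relative frequencies of its consonant and vowel classes.
# All counts here are made up, not ULSID data.

syllables = ([("coronal", "front")] * 30 + [("coronal", "back")] * 10
             + [("bilabial", "central")] * 25 + [("bilabial", "front")] * 5
             + [("velar", "back")] * 20 + [("velar", "central")] * 10)

n = len(syllables)
pair_freq = Counter(syllables)
c_freq = Counter(c for c, v in syllables)
v_freq = Counter(v for c, v in syllables)

def ratio(c, v):
    """Observed relative frequency of (c, v) over its expected value."""
    observed = pair_freq[(c, v)] / n
    expected = (c_freq[c] / n) * (v_freq[v] / n)
    return observed / expected

# A ratio above 1 marks a favoured combination, below 1 a disfavoured one.
print(round(ratio("coronal", "front"), 2))   # 2.14
print(round(ratio("bilabial", "front"), 2))  # 0.48
```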

Table 1. Ratios between observed and expected frequencies of CV syllables in
Afar.
Afar Coronal Bilabial Velar
Front 1.16 0.68 0.84
Central 0.88 1.22 0.98
Back 0.99 1.08 1.31

Table 2. Ratios between observed and expected frequencies of CV syllables in
Thai.
Thai Coronal Bilabial Velar
Front 1.07 0.99 0.71
Central 0.94 0.94 1.08
Back 1.01 1.04 1.22

Three favored onset-nucleus combinations emerged for Afar: coronal-front,
bilabial-central, velar-back (Table 1), whereas in Thai, velar-back was clear-
ly preferred and coronal-front was slightly favored (Table 2).
The mean value of the ratios for the 17 languages was worked out for
each simple CV combination. Three favored onset-nucleus patterns
emerged: coronal-front, bilabial-central and velar-back (Table 3). All three
favored CV patterns occurred in seven languages, while both bilabial-
central and velar-back combinations were found in twelve languages; the
coronal-front was found in eleven. Our results were consistent with previ-
ous observations in the babbling utterances of infants, and in a database of
ten natural languages which share French and Quechua data with ULSID
(MacNeilage and Davis, 2000; MacNeilage, Davis, Kinney and Matyear,
1999).
In previous studies, only simple CV sequences were observed and we
decided to examine whether the presence or absence of a coda influenced
the relationship between the onset and the nucleus in a syllable. The ob-
served/expected ratios for onset-nucleus combinations were calculated for
CVC syllables in each language. Figure 1 shows the mean onset-nucleus
ratios across languages obtained on CV with CVC syllables, on simple CV
syllables and on CVC syllables. No significant influence of the coda on
the preferred onset-nucleus pattern was observed: all three favored co-
occurrence patterns between onsets and nuclei were also found in CVC
structures.

Table 3. Mean value of the 17 ratios between observed and expected frequencies of
CV syllables (χ² significant, p < 0.001 for each column).
CV syllables Coronal Bilabial Velar
Front 1.15 0.82 0.90
Central 0.92 1.11 0.99
Back 1.02 1.01 1.19

We extended our analysis to the interaction between nuclei and codas in the
VC syllabic structure. VC syllables accounted for only 2.5% of ULSIDs
syllables. Although VC syllables were disfavored, Table 4 shows that there
was the same relationship as in CV structures. The three favored co-
occurrence patterns were front-coronal, central-bilabial and back-velar. All
three patterns occurred in five languages, the central-bilabial combination
emerged in eleven languages, the front-coronal was found in eleven and the
back-velar in nine. This result demonstrates that the preferred VC se-
quences were pure-frames like the preferred CV sequences.


Figure 1. Mean observed/expected ratios across individual languages according to
the places of onset: bilabial on the upper left, coronal on the upper right
and velar on the bottom row. X-axis indicates the place of the nucleus:
front, central, back. Each group of three bars represents the mean ratio
for respectively: CV with CVC syllables (left bar), CV syllables (middle
bar), CVC syllables (right bar).

We also noticed that the link between vowel and consonant tended to
be stronger in VC than in CV syllables. This result is in
line with the concept of the rhyme proposed by phonologists who claim that
the structure of the syllable is hierarchical rather than linear (Selkirk, 1982).
In this approach, the relationships between the syllabic constituents are
represented by a tree-like diagram in which the nucleus and the coda are
linked together in a node located at the same structural level as the onset.
Blevins (1995:214-215) makes a strong case for the rhyme constituent
based on i) syllable weight, ii) restrictions on the number of segments oc-
curring in the syllable rhyme, and iii) the close relationship between nu-
cleus and coda. Bagemihl (1995:707), reviewing the interaction of language
games with linguistic theory, points out that the rhyme was one of the lin-
guistic units manipulated in word games. All of these observations con-
verge and substantiate that the rhyme is a subsyllabic constituent.

Table 4. Mean ratios across the individual languages between observed and ex-
pected frequencies of VC syllables (χ² significant, p < 0.001 for each column).
VC syllables Front Central Back
Coronal 1.24 0.91 0.95
Bilabial 0.54 1.28 1.13
Velar 0.76 0.84 1.37
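The χ² significance cited in the table captions rests on Pearson's chi-squared statistic, which can be computed directly from a count table and its margins. A minimal sketch with made-up counts (not ULSID data):

```python
# Minimal Pearson chi-squared computation of the kind summarized in
# Tables 3-6, using only the standard library. The 3x3 counts below are
# invented (rows: front/central/back nuclei, columns: coronal/bilabial/
# velar consonants); expected cell counts come from the margins.

counts = [
    [120, 60, 70],
    [70, 110, 80],
    [80, 75, 115],
]

row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
total = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(counts):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / total
        chi2 += (obs - exp) ** 2 / exp

df = (len(counts) - 1) * (len(counts[0]) - 1)
print(f"chi2 = {chi2:.1f}, df = {df}")
# With df = 4, the critical value for p < 0.001 is about 18.47, so a
# statistic this large would count as significant at that level.
```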

Regarding the stronger effect of the frame on VC syllables, the properties
of the jaw cycle could help explain this trend. Redford (1999) points out
inherent asymmetries in the jaw cycle. From experiments carried out in the
framework of the Frame/Content Theory, she shows that the closing phase
of the mouth was articulated with greater velocity peaks and shorter
durations than the opening phase. In complex syllables the opening phase
was articulated with greater displacement (distance between min and max
opening within a phase) than the closing phase. Finally, the degree of arti-
culatory stiffness (slope between distance and velocity) was smaller at the
opening phase than at the closing phase. If these trends were assessed in
additional cross-linguistic studies, they could explain the following cross-
linguistic preferences: vowels and consonants are more strongly connected
in VC than in CV sequences; CV are more frequent than VC or CVC se-
quences; single consonants are preferred over consonant clusters or com-
plex consonants; clusters or complex consonants are more frequent in onset
position than in coda position. Redford claims that "the mechanical
and temporal constraint of the jaw cycle is manifested in the phonological
and phonetic sound patterns that are perceived as syllables. […] jaw
cycle may provide an articulatory basis for the syllable in language"
(Redford 1999:25). According to this view, the dominance of the jaw cycle in
adult speech, as described by MacNeilage in the Frame/Content Theory,
seems not to be weakened by within- and across-language phonological
asymmetries and co-occurrence constraints in VC patterns. On that account,
our results are in line with MacNeilage and Davis's (2000) study. The
languages of ULSID favor pure-frame syllables (over 30% of all ULSID's
syllables), which correspond to economical sound patterns. According to
MacNeilage (1998), such less complex syllabic patterns originate from a
simple mandibular cycle (a frame), with the vowel and the adjacent
consonant adopting quite similar tongue positions.
Our final analysis examined tautosyllabic combinations between nuclei
and consonants in the ( )CV( ) and ( )VC( ) templates (the brackets indicate
that one or more consonants can appear in this position). All the syllabic
structures of the 17 languages were considered (except the minimal syllable
V which accounted for less than 5% of the ULSID syllables). We con-
ducted statistical analyses taking into account the prevocalic consonant and
the postvocalic one, as long as they appeared in the same syllable.
Our results (Tables 5-6) clearly showed that front vowels were strongly
related to coronals, central vowels to bilabials and back vowels to velars.
Thus, regardless of the complexity of the syllabic structures (the degree of
complexity is estimated from the number of consonants in the onset or coda
position), languages seem to favor consonant-vowel or vowel-consonant
combinations in which the tongue does not make large displacements in the
front-back dimension between the vowel gesture and the gesture of the
immediately preceding or following consonant.
Our analyses confirmed that the syllabic frame was one of the aspects of
the sound organization in lexical units. As a next step, we will extend our
analyses to other consonants, grouping bilabials and labiodentals in a
"labial" category, and dorsopalatals, uvulars, and velars in a "dorsal" category
(since we noticed that palatal consonants were very frequently combined
with back vowels, as in Vietnamese; ratio observed/expected = 1.9).




Table 5. Mean value of the 17 ratios between observed and expected frequencies of
( )CV( ) combinations (χ² significant, p < 0.001 for each column).
CV patterns Coronal Bilabial Velar
Front 1.09 0.94 1.05
Central 0.87 1.10 0.70
Back 1.01 0.93 1.32

Table 6. Mean value of the 17 ratios between observed and expected frequencies of
( )VC( ) combinations (χ² significant, p < 0.001 for each column).
VC patterns Front Central Back
Coronal 1.11 1.00 0.89
Bilabial 0.53 1.32 1.18
Velar 0.92 0.86 1.22
3. The Labial-Coronal Effect
We explored other prevocalic and postvocalic consonants that appeared in
the same syllable, or in two consecutive syllables, within lexical items.
Regarding structures with less complex onset and coda, we focused on the
relation between the initial and the final consonants in CVC syllables or in
the consecutive onsets in CV.CV words (the dot indicates the syllable
boundary). We observed that sequences with different places of articulation
for the two consonants were favored. For example, [pap] and [tat] syllables were less fre-
quent than [tap] or [pat], although coronals were the most frequent conso-
nants in onsets, as well as in coda positions (Rousset, 2004). In the favored
patterns, there were no place repetitions between onset and coda in the
CVC structures. The mean observed/expected ratios for ULSID's
languages were below 0.9 for the CVC syllables when the two consonants
122 Nathalie Vallée, Solange Rossato and Isabelle Rousset

shared the same place of articulation. This result contrasts with the place
assimilation between C and V found in both the favored CV and VC struc-
tures. The most widespread CVC patterns in ULSID's lexica were, in
descending order: Bilabial-V-Coronal, Coronal-V-Velar, Coronal-V-Bilabial,
Velar-V-Coronal. While no difference was observed between the ratios of
the two inverse patterns Coronal-V-Velar and Velar-V-Coronal (respective
ratios observed/expected = 1.18 and 1.2), this was not the case for Bilabial-
V-Coronal and Coronal-V-Bilabial (respective ratios = 3 and 1.13). We
observed the same findings for sequences of two consecutive open syllables,
except that coronal repetition was the most common CV.CV pattern.
The predominance of the Bilabial-V-Coronal-V combination over the Cor-
onal-V-Bilabial-V pattern is strongly attested in CV.CV sequences. We
calculated the same ratios as for CVC patterns, and found an LC effect in
15 languages (at statistically significant levels for 13 of them, p < 0.005),
either in CVC structures (12 languages, significant for 9) or in CV.CV se-
quences (14 languages, significant for 13), regardless of their position
within the word. In 3 languages (Kanuri, Navaho, Thai), the trend was
present only in disyllabic patterns and was not found in CVC syllable
structures. The Vietnamese lexicon has few compound words with the CV.CV
structure (fewer than a hundred); its ratio of LC to CL was 0.92. In addition,
Wa and !Xóõ had no LC effect in any of the observed structures (Wa has
only CVC). Finally, French, Finnish and Quechua had a stronger LC effect
in intra-syllabic patterns than in disyllabic sequences, even though their
basic syllable structure is CV.
When only the bilabial consonants were taken into account, the mean ratio
of LC to CL was 2.77 over all the disyllabic words of ULSID's languages;
with bilabials and labiodentals together, the ratio was 2.79. The mean values
of LC/CL as a function of the syllable type and its position within the word
were also examined. Table 7 shows that the trend appeared not only at
word onsets but also elsewhere in the words, both in intersyllabic and intra-
syllabic patterns. Nevertheless, the mean ratio was higher for disyllabic
words and CV.CV disyllabic sequences at word onset position.
These results confirmed the presence in ULSID of the so-called Labial-
Coronal (LC) effect (MacNeilage and Davis, 2000). According to the au-
thors, there is a strong preference for a specific intersyllabic pattern such
that the onset of the first syllable of a disyllabic word CV.CV is a labial
stop consonant and the onset of the second syllable is a coronal stop conso-
nant. This pattern is absent during babbling, but appears during the first
word stage (MacNeilage, Davis, Kinney and Matyear, 1999). This trend is
so strong that infants produce LC patterns even if the words in the adult
language are CL. The authors reported, for example, that soup was pro-
nounced with the inverse sequence pooch [pu:tʃ]. In a review of seven
previous studies, MacNeilage and Davis (1998) noted the predominance of
the LC patterns in infants from five different language communities
(French, American-English, Dutch, Spanish, Czech). In MacNeilage et al.
(1999), the authors observed both CVC and CVCV words produced by 21
English-speaking infants during the first-50-word stage (12-18 months).
They calculated the overall occurrences of the two opposite patterns and
found a ratio of LC to CL equal to 2.55. They observed also the LC effect
in samples of words of ten natural languages. In this cross-linguistic study,
the mean ratio of LC to CL was 2.23. In a previous study carried out by
Locke (1983), this effect of "anterior-to-posterior progression" was ob-
served in the utterances of children and also in the vocabularies of English
and French.

Table 7. Mean values of LC/CL ratios for all word lengths as a function of syllable
type and position within the word: (1) Bilabial consonants (2) All labial
consonants (bilabials + labiodentals).
Position Word Onset Elsewhere
Sequences CVC CV.CV CVC CV.CV
LC/CL (1) 1.53 2.41 1.68 1.75
LC/CL (2) 1.73 2.28 1.89 1.68

MacNeilage and Davis (2000) propose developmental arguments, based on
the biomechanical properties of the vocal apparatus, to explain the LC ef-
fect in infants and adults. The authors put the LC preference within the
context of the Frame/Content Theory. Findings from investigations in neu-
rophysiology and clinical neurology led the authors to suggest that the LC
effect could be a consequence of articulatory properties and more precisely,
a consequence of selecting the simplest gestures first. They argue that LC
patterns begin with a less complex task than CL patterns due to the fact that
labials are easier to produce than coronals: labials are produced with a basic
movement of the jaw while coronals require an additional movement of the
tip of the tongue. It can be assumed that velars, which involve both the
backward movement of the tongue body and the raising of the tongue dor-
sum toward the soft palate, are also more complex to articulate than labials.
In addition, many previous studies on tendencies in babbling and early
speech across languages have observed that the emergence of the velar
closing phase followed the emergence of both labial and coronal closures
(Locke, 1983; Davis and MacNeilage 1994; Robb and Bleile, 1994; Stoel-
Gammon, 1985).
However, comparing three articulatory models, Vilain, Abry, Brosda
and Badin (1999) showed that the frame can produce labial or coronal clo-
sure, where the place of the closure depends on the position of the tongue at
the beginning of the mandibular cycle. So, in this study, coronal closure
seemed just as easy as labial closure to produce.
In a recent study, Rochet-Capellan and Schwartz (2005a,b) proposed an
original explanation of the LC effect. They analyzed the gestural overlap of
the consonant closures (both occlusion and constriction) in a task of accel-
erated repetition of a given CVCV sequence (V = /a/ C = {/p/, /t/, /f/, /s/})
pronounced by thirty-two French subjects. The articulatory and acoustic
measurements revealed that the effect of accelerating the repetition put the
LVCV sequences in phase with a single jaw-opening gesture; in contrast,
there were two jaw cycles when producing CVLV sequences (one cycle for
each syllable). At a slow speech rate (2.5 Hz), the lower lip and the tongue
tip in LVCV sequences were in phase with the mandibular gestures, but
acceleration to five cycles per second prompted a loss of phasing for the
tongue apex. At a fast rate, the labial release coincided with the preparation
or launching phase of the jaw cycle, and the coronal release with the
jaw-lowering phase. Their results show that LVCV sequences require more
economical mandibular gestures than CVLV sequences. This is because there is
stronger articulatory cohesion between the jaw oscillations and lip and
tongue movements in LVCV sequences. The authors conclude that the
overlap of the two consonantal gestures is easier because, in LC patterns,
the coronal closure could be prepared during the labial gesture, making LC
simpler to produce than CL.
Another recent study, exploiting the Verbal Transformation Effect
(Warren, 1961), is in line with this idea (Sato, Vallée, Schwartz and Rous-
set, 2007). The Verbal Transformation Effect refers to the perceptual
changes experienced while listening to a speech form cycled in rapid and
continuous repetition. Sato et al. assumed that LC percepts would be more
stable (lasting longer before the next transformation) than CL percepts.
They proposed two experiments using either voiced or voiceless plosive
consonants, which confirm a greater stability and attractiveness for LC
percepts.
Several recorded CV syllables with C = {/p/, /t/, /b/, /d/} and V = {/a/,
/i/, /o/} were paired with respect to both vowel quality and consonant
voicing. The pairs of CV syllables were selected meticulously,
taking into account their acoustic similarities (duration, intensity, formant
and pitch values). The disyllabic stimuli were /pata/, /tapa/, /piti/, /tipi/,
/poto/ and /topo/ for experiment A; /bada/, /daba/, /bidi/, /dibi/, /bodo/ and
/dobo/ for experiment B. For each stimulus, different lexical factors were
calculated in order to eliminate a possible lexical effect on verbal transfor-
mations: the disyllabic frequencies, the bigram frequencies, and the
neighborhood density. The twenty-four participants heard a disyllabic
LVCV or CVLV sequence (such as "pata" or "tapa") repeated 300 times with
no interval between repetitions. They had to report what they perceived as
soon as the sequence seemed to change into another form (even if it
changed into one they had heard previously). The authors estimated the
perceptual stability duration by measuring the time spent perceiving a given
form before switching to another one. The results showed that the stability
durations of /pV.tV/ percepts were significantly higher than those of /tV.pV/
percepts, regardless of the stimulus, being on average 1.40 times longer.
The same trend was observed for the patterns involving voiced plosives:
the stability durations of /bV.dV/ were significantly higher than those of
/dV.bV/, on average 1.36 times longer.
Sato et al. noted that, in the French lexicon, the number of lexical entries
beginning with /p/ was twice as high as the ones beginning with /t/. In turn,
the number of lexical entries beginning with /b/ was half the number of
entries beginning with /d/. Thus, they observed a clear perceptual prefer-
ence for the LC forms over the CL forms, even when the disyllabic patterns
heard by the subjects contained voiced consonants.
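The stability measure can be illustrated with a toy computation: given the times at which a listener reports each new percept, the stability of a form is the time spent on it before the next switch. The helper below is a hypothetical reconstruction with invented event times, not the authors' analysis code.

```python
# Sketch: mean perceptual stability per reported form in one
# verbal-transformation run. Event times (in seconds) are invented.
def mean_stability(events, end_time):
    """events: chronologically sorted (time, percept) switch reports."""
    durations = {}
    for i, (t, form) in enumerate(events):
        t_next = events[i + 1][0] if i + 1 < len(events) else end_time
        durations.setdefault(form, []).append(t_next - t)
    return {form: sum(d) / len(d) for form, d in durations.items()}

run = [(0.0, "pata"), (12.5, "tapa"), (18.0, "pata"), (33.0, "tapa")]
print(mean_stability(run, end_time=40.0))
# pata: (12.5 + 15.0) / 2 = 13.75 s; tapa: (5.5 + 7.0) / 2 = 6.25 s
```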
Although this finding cannot account completely for the LC effect in
languages, it clearly indicates that, in a (…)CLCLCLCLC(…) sequence, the
listener more naturally segments the stream into LC chunks. This may point
to a perceptual correlate of the LC effect in French adults.
4. Consonant sequences with nasal and plosive
Languages with similar phoneme inventories may have distinct phonotactic
and distributional patterns in order to shape syllables, morphemes, and
words, and utterances. Within syllables, it is generally accepted that speech
sounds are linearly ordered according to their intrinsic sonority, with the
nucleus as the most sonorous element. Although languages organize syllabic
constituents into a hierarchical structure along the lines of the well-known
Sonority Sequencing Principle (SSP), there is still no accepted
universal scale of sonority. As pointed out by Ohala and Kawasaki (1997),
the main reason is that there is no clear definition of sonority, not even one
based on phonetic properties, like the correspondence between jaw open-
ness and the sonority scale suggested by Lindblom (1983). On the other
hand, it is well established that the SSP is unable to account for some sound
sequences within and across languages (see Blevins (1995: 211) for /sp st
sk/ onsets in English; Broselow (1995: 177) for prenasalized stops in sylla-
ble onsets; Montreuil (2000) for negative-slope sonority in two-consonant
onset clusters in Raeto-Romance and Gallo-Italic, as well as in Slavic). Most of the
proposed sonority scales and other sound hierarchies (from Saussure, 1916,
to Steriade, 1982, Selkirk, 1984, Anderson and Ewen, 1987, Klein,
1993, and Angoujard, 1997) fail to predict Nasal+Plosive sequences at the
onset of syllables, and instead predict the inverse pattern, Plosive+Nasal
(see Clements (1990) for a detailed and well-documented discussion of the
role of sonority in syllabification). As a result, the SSP cannot explain
the observed combinations between nasals and plosives. In order to account
for the favoured sound sequences involving nasals and plosives, we inves-
tigated the phonetic properties of the segments in the sequences. We at-
tempted to estimate the role of both articulatory and aerodynamic factors in
preferred and non-preferred sound sequences with a plosive and a nasal in
ULSID's languages. First, the distributions of liquids and of nasals adjacent
to plosives were analyzed within words and within syllables. Our choice to
compare liquids with nasals was justified in that the sonority rankings of
nasals and liquids are similar (irrespective of the selected sonority scale)
while their articulatory features are very different.
The distributions of both Plosive+Nasal and Plosive+Liquid sequences
within the words and within the syllables and the inverse combinations
(Nasal+Plosive and Liquid+Plosive) were surveyed using 15 syllabified
lexicons from the ULSID database. The segments under analysis were de-
scribed in terms of manner: Plosive (P), Fricative (F), Nasal (N), and
Liquid (L) (the Liquid category groups voiced lateral approximants and rhotics).
The number of observed syllables was compared to the number of ex-
pected syllables. For example, the number of expected NPV( ) syllables
was calculated using the number of CCV( ) syllables weighted by both the
probability of occurrence of the Nasal manner and of the Plosive manner.
The same calculation was carried out for all the sequences under analysis.
Thus, the number of expected NPV( ) sequences was estimated using the
number of CCV( ) sequences in each syllabified lexicon.
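This expected-count computation can be sketched as follows. The manner coding follows the paper's P/F/N/L labels, but the toy onset list and the slot-wise probability estimates are illustrative assumptions.

```python
# Sketch: observed/expected ratio for manner patterns in CCV onsets,
# assuming the two onset slots are filled independently.
from collections import Counter

# Toy CCV onset clusters coded by manner (invented data).
ccv_onsets = [("P", "L"), ("P", "L"), ("P", "N"), ("N", "P"),
              ("P", "L"), ("F", "L"), ("P", "N"), ("P", "L")]

n_ccv = len(ccv_onsets)
slot1 = Counter(c1 for c1, _ in ccv_onsets)   # manner frequencies, first slot
slot2 = Counter(c2 for _, c2 in ccv_onsets)   # manner frequencies, second slot

def oe_ratio(c1, c2):
    observed = sum(1 for pair in ccv_onsets if pair == (c1, c2))
    expected = n_ccv * (slot1[c1] / n_ccv) * (slot2[c2] / n_ccv)
    return observed / expected if expected else float("nan")

print(oe_ratio("N", "P"))  # NPV-type onsets
print(oe_ratio("P", "N"))  # PNV-type onsets
```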
Within syllables, both complex onsets and complex codas were ana-
lyzed (complex syllables accounted for 2.7% of all ULSID's syllables). As
no complex syllables were observed in their lexicons, !Xóõ and Yupik were
not considered in this analysis. Three other languages were also ruled out
(Afar, Navaho and Ngizim) because the expected frequency for a given
syllable type was too low, generating artefactually large differences (when
the ratio of observed to expected frequencies is calculated from low counts,
a small absolute difference produces a large change in the resulting value).
We therefore required a sufficient number of complex syllables to compute
the ratio: when a lexicon contained fewer than 30 complex syllables, it was
excluded from the analyses. Accord-
ing to this limit, 9 languages with complex onsets (Finnish, French, Kan-
nada, Kanuri, Nyah kur, Quechua, Sora, Thai, Wa) and 5 languages with
complex codas (Kanuri, Nyah kur, Quechua, Thai and Wa) were retained
for the analysis. It is worth noting that Kwakw'ala had complex codas
without complex onsets. Table 8 presents the mean values of the ratios
observed in tautosyllabic sequences involving either a nasal or a liquid
contiguous to a plosive consonant.
The results revealed that the patterns involving a liquid consonant fol-
lowed the SSP: PLV sequences were mainly favored while LPV sequences
were not found in ULSIDs languages. Complex codas with falling sonority
(VLP) were more widespread than VPL rhymes. The latter pattern was
found only in the French lexicon, without however being favored (ra-
tio = 0.52 < 1). We observed different patterns in the distribution of nasals.
PN sequences in onset position were rare, but followed the SSP. Most of the
languages studied had no NPV sequences, with two exceptions (Kanuri
and Nyah kur); moreover, both of these languages favored this type of
sequence (the ratios were, respectively, 3.24 and 1.65). Some NPV syllables were present
in Ngizim, with 11 tokens out of the twenty CCV syllables. With respect to
complex codas, VPN sequences were disfavored while VNP sequences
were favored, in accordance with the SSP.
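Conformity to the SSP can be checked mechanically once a sonority scale is chosen. The scale in the sketch below is one common choice and, as the text stresses, an assumption rather than a settled standard:

```python
# Sketch: check a tautosyllabic cluster against the Sonority Sequencing
# Principle. Manner codes and scale values are illustrative assumptions.
SONORITY = {"P": 1, "F": 2, "N": 3, "L": 4}  # plosive < fricative < nasal < liquid

def obeys_ssp(cluster, position):
    """cluster: manner codes in linear order, e.g. ('P', 'L').
    Onsets must rise in sonority toward the nucleus; codas must fall."""
    values = [SONORITY[c] for c in cluster]
    steps = [b - a for a, b in zip(values, values[1:])]
    if position == "onset":
        return all(s > 0 for s in steps)
    return all(s < 0 for s in steps)

print(obeys_ssp(("P", "L"), "onset"))  # PLV: rising sonority -> True
print(obeys_ssp(("N", "P"), "onset"))  # NPV (Kanuri, Nyah kur): False
print(obeys_ssp(("N", "P"), "coda"))   # VNP: falling sonority -> True
```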



Table 8. Mean value of the observed/expected ratios for each tautosyllabic combi-
nation. Underscores indicate the position of the consonant (nasal or liq-
uid) in the sequence. The sound sequences predicted by the SSP are in
the gray columns.
Complex onset (9 languages) Complex coda (5 languages)
P_V _PV VP_ V_P
Nasal 0.01 0.54 0.03 3.89
Liquid 6.93 0 0.10 1.66

Analyses were extended to Nasal+Plosive and Liquid+Plosive sequences
(and opposite combinations) across syllable boundaries. In the 15-language
ULSID database, Wa has a monosyllabic lexicon and !Xóõ has too few
clusters, so both were excluded from the following analysis. For each
combination, we calculated the mean ratio value among the 13 languages.
The results presented in Table 9 reveal that both the VL.P and the L.PV
sequences were widely favored across syllable boundaries when compared
to the other sequences (VP.L and P.LV). With respect to nasals, the distri-
butional patterns across syllable boundaries revealed another clear prefe-
rence: N.PV was widely preferred to P.NV for 11 of the thirteen languages
(except for both French and Yupik), and a similar tendency was found
between VN.P and VP.N patterns (the respective values of the ratios were
2.14 and 0.21).

Table 9. Mean value of the observed/expected ratios for each combination ana-
lyzed across syllable boundaries. The underscore gives the position of the
consonant (nasal or liquid).
P._V _.PV VP._ V_.P
Nasals 0.27 2.34 0.21 2.14
Liquids 0.32 1.09 0.17 1.50

The ratios indicated that both Nasals and Liquids demonstrated a clear pre-
ference for coda position, while Plosives tended to appear in onsets. This
result follows the Syllable Contact Law which predicts that sonority drops
maximally across syllable boundaries (Clements, 1990). Similar trends
were found for Nasal+Plosive and Liquid+Plosive sequences across sylla-
ble boundaries even though Nasals and Liquids behaved differently within
syllables.
We suggest that such behavior can be explained by articulatory and
aerodynamic factors.

In a pilot experiment, the production patterns of the feature [+nasal] were
observed for a French male speaker with an electromagnetic articulograph
EMA (Carstens) (Rossato, Bouaouni and Badin, 2003). The velum height
was measured with a pellet glued to the inferior area of the velum for VCV
sequences, where C = {/p/, /t/, /k/, /b/, /d/, //, /f/, /s/, //, /v/, /z/, //, /m/,
/n/, /l/, //}. Each consonant was repeated in a symmetrical vocalic context
for all French oral and nasal vowels. A series of articulatory measurements
showed that the voiceless plosives /p t k/ were always produced with a high
velum position and the voiced ones with a slightly lower one. The nasal
consonants were produced with a wide range of velum heights (0.8 cm),
since their production required an open velopharyngeal port. Great varia-
tion due to the vocalic context (high vowels, low vowels, or nasal vowels)
was observed in these cases. Figure 2 shows the mean velum height trajec-
tory for the following categories: Voiceless plosives, voiced plosives, nas-
als, and liquids (other categories omitted in this figure).
The same corpus was used to record data with the EVA workstation
(S.Q. LAB) on the same speaker in order to measure the intraoral pressure,
oral airflow and nasal airflow. Intraoral pressure was estimated using a
PVC tube placed in the subjects mouth and connected to the pressure sen-
sor device of the EVA workstation. Therefore the measurements of the
intraoral pressure were not available for velar stops in /u/ and /o/ contexts,
and for uvular consonants in any vocalic contexts. Figure 3 shows that the
intraoral pressure maintained a high value during the closure phase of the
voiceless plosives, progressively increased during the voiced plosives, and
stayed low for the nasals /n, m/ and the liquid /l/ (other consonants were
unanalyzed in this study).

[Figure: mean velum height (cm) as a function of time (s); one trajectory each for Nasal, Voiced Plosive, Unvoiced Plosive, and Liquid consonants]
Figure 2. Mean trajectories of velum height during the production of each category
of consonant.

[Figure: intraoral pressure (hPa) as a function of time (s), two panels; curves for Nasal, Liquid, Voiced Plosive, and Unvoiced Plosive consonants]
Figure 3. Intraoral pressure curve for each category of consonants.

Although these articulatory and aerodynamic data were obtained from VCV
sequences, they shed light on possible coarticulation effects between adja-
cent nasal and plosive consonants. The articulation of Nasal+Plosive se-
quences involves both closing the oral tract and lowering the velum to pro-
duce the articulation of the nasal consonant. The escape of airflow through
the nose prevents intraoral pressure from building up. To produce the
following plosive, the velum must rise until the velopharyngeal port is
closed. This gesture is probably facilitated by the increase of intraoral pres-
sure due to the closure of the vocal tract. When the two contiguous conso-
nants have the same place of articulation, velum raising and laryngeal con-
trol seem to be sufficient to produce such a sequence.
On the other hand, the opposite sequences, Plosive+Nasal, start with
both a high velic position and high intraoral pressure. At the release of the
plosive, the intraoral pressure drops, the velum lowers and opens the velo-
pharyngeal orifice, while the closure of the vocal tract during the articula-
tion of the nasal consonant produces very slight variation of intraoral pres-
sure. When the plosive is unreleased, the intraoral pressure stays high until
the velopharyngeal port is open. Figure 2 shows that the mean trajectories
of the velum height rose slightly during the closure phase of the voiced and
unvoiced plosives. This means that the high intraoral pressure increases the
volume of the vocal tract. This pushes the velum upward and causes a wid-
ening of the vocal tract in the velar region. Consequently, to produce a
following nasal consonant, the velum must be lowered against this high
intraoral pressure in order to open the velopharyngeal port, which then
causes the intraoral pressure to drop. This suggests that the articulatory cost of a Plo-
sive+Nasal sequence, with or without the plosive release, should be higher
than that of a Nasal+Plosive sequence, since the latter can be produced
with a simple raising of the velum. These articulatory and aerody-
namic constraints on velum movements could explain the preference in
ULSID's languages for Nasal+Plosive over Plosive+Nasal sequences, re-
gardless of the syllable boundary.
5. Conclusion
Although phoneme sequences in a language exhibit a high degree of com-
plexity, cross-language investigations of common sound sequences indicate
that physical and cognitive factors shape, at least partly, the linear organiza-
tion of phonemes in words and syllables. The non-exhaustive set of obser-
vations on cross-language syllabified lexicons proposed in this paper was
selected with the aim of better understanding the relationship between sen-
sori-motor capacities and phonology. The present results suggest that sev-
eral mechanisms underlie both intra-syllabic and inter-syllabic patterns.
They confirm previous findings on CV co-occurrence patterns
(MacNeilage, 1998) and also on inter-syllabic patterns involving labial and
coronal consonants (MacNeilage and Davis, 2000).
In both CV and VC sequences, consonants and vowels often share the
same place of articulation. According to MacNeilage's Frame/Content
Theory (1998), these CV patterns stem from the syllabic frame, the basic
raising-lowering movement of the jaw. The same trend is observed for the
reverse syllabic patterns, with a stronger effect of the frame across nuclei
and codas. This indicates that the frame shapes a greater part of the syllabic
inventories of languages than MacNeilage and Davis's (2000) findings
suggested. Although only two of them are pure-frame (in both CV and VC
structures), these results reveal that gesture economy plays a role in shaping
the preferred intra-syllabic patterns.
Likewise, our analyses showed a more extensive LC effect compared to
previous work (MacNeilage and Davis, 2000). We found LC patterns in 15
out of 17 languages, not only in disyllabic words or CVC syllables at the
beginning of the words, but also in CVC(V) sequences. Contrary to what
MacNeilage and Davis claim about the frequency of LC patterns in sound
sequences (preference for simple first), the results of recent experimental
studies conducted in our laboratory suggest i) a perceptual correlate of the
LC effect, showing that, when listening to a (…)CLCLC(…) sequence, a French
subject tends to segment it into LC chunks (Sato, Vallée, Schwartz and
Rousset, 2007); ii) a greater ease of coarticulation in labial-coronal struc-
tures compared with coronal-labial ones (Rochet-Capellan and Schwartz,
2005a,b). In this speeded speech production experiment, the disyllabic
LVCV patterns were produced with a single jaw-opening gesture at a fast
speech rate, while CVLV sequences were still produced with two jaw
frames. At a normal rate, in the production of both LVCV and CVLV pat-
terns, the lower lip and the tongue apex each were in phase with the onset
of the jaw-opening gesture. These results indicate that a CV syllable could
be out of phase with the jaw-opening gesture (the frame), which can be
observed with an increased speech rate paradigm, syllabic cycles being in
phase with mandibular cycles before and after acceleration (both cycles had
a similar duration). According to Rochet-Capellan and Schwartz, the ex-
planatory reason for the LC effect is the easier overlap of the two consonant
gestures in LVCV compared to CVLV sequences.
Among the aspects of the sound organization in lexical units, sound se-
quences involving contiguous nasals and plosives were examined. A series
of articulatory measurements suggests that the preferred sequences avoid a
higher articulatory cost and obey strong aerodynamic constraints on velum
movements.
The analyses of preferred sound combinations across languages provide
insight into the grounds of the structure of sound sequences in languages.
Explaining these preferred patterns with non-phonological principles
stresses the importance of the link between language sound structures
and physical factors such as aerodynamic principles, besides other
characteristics of the human speech production and perception systems. Our
findings shed light on how sound sequences are structured in language with
respect to articulatory factors and aerodynamic constraints. They also con-
tribute to the understanding of how and which phonological and phonetic
patterns form the basis of syllable perception.
Acknowledgements
We are particularly grateful to Ian Maddieson for letting us use ULSID,
Mathieu Maupeu for programming the ULSID interface and Thuy Hien
Tran for the Vietnamese lexicon. We thank Barbara Davis and Peter
MacNeilage who provided helpful discussion of the findings presented in
sections 1 and 2. The research presented in Section 3 was funded by the
GIP ANR AAP project "Dynamique de la nasalité. Émergence et phonolo-
gisation des voyelles nasales". The research in Section 2 was funded by the
CNRS-SHS Project Complex Systems in Human and Social Sciences "Pati,
papa? Modélisation de l'émergence d'un langage articulé dans une société
d'agents sensori-moteurs en interaction".
References
Anderson, J. M., & Ewen, C. J.
1987 Principles of Dependency Phonology. Cambridge: Cambridge Uni-
versity Press.
Angoujard, J.-P.
1997 Théorie de la syllabe. Rythme et qualité. Gap: CNRS Éditions.
Bagemihl, B.
1995 Language Games and Related Areas. In J.A. Goldsmith (ed.), Handbook
of Phonological Theory: 697-712. Oxford: Blackwell Publishers.

Berlitz (Ed.)
1981 Ordbok Fransk-Svensk / Dictionnaire Suédois-Français. Oxford:
Berlitz Publishing Company Ltd.
Blevins, J.
1995 The syllable in Phonological Theory. In J.A. Goldsmith (ed.), Hand-
book of Phonological Theory: 206-235. Oxford: Blackwell Publish-
ers.
Boë, L.-J., Abry, C., Cathiard, M., Schwartz, J.-L., Badin, P., & Vallée, N.
2005 Comment les exceptions des handicaps révèlent les universaux pho-
nologiques bimodaux : contraintes audiovisuelles des systèmes con-
sonantiques des langues du monde. Faits de Langues 25: 175-189.
Broselow, E.
1995 Skeletal Positions and Moras. In J.A. Goldsmith (ed.), Handbook of
Phonological Theory: 175-205. Oxford: Blackwell Publishers.
Carreiras, M., Álvarez, C. J., & De Vega, M.
1993 Syllable frequency and visual word recognition in Spanish. Journal
of Memory and Language 32: 766-780.
Clements, G. N.
1990 The role of the sonority cycle in core syllabification. In: J. Kingston
& M Beckman (eds), Papers in Laboratory Phonology 1: Between
the Grammar and Physics of Speech: 283-333. New York: Cam-
bridge University Press.
Content, A., Kearns, R. A., & Frauenfelder, U. H.
2001 Boundaries versus Onsets in Syllabic Segmentation. Journal of
Memory and Language 45: 177-199.
Davis, B., & MacNeilage, P. F.
1994 Organization of Babbling: A Case Study. Language and Speech 37
(4): 341-355.
De Boer, B.
2000 Emergence of sound systems through self-organization. In M. Stud-
dert-Kennedy, J. Hurford & C. Knight (eds.), The Evolutionary Emer-
gence of Language: Social Function and the Origins of Linguistic Form.
Cambridge: Cambridge University Press.
Derwing, B. L.
1992 A 'Pause-Break' task for eliciting syllable boundary judgments from
literate and illiterate speakers preliminary results for five diverse
languages. Language and Speech 35 (1-2): 219-235.
Janson, T.
1986 Cross-linguistic trends in the frequency of CV sequences. Phonology
Yearbook 3: 179-195.
Kandel, S., Álvarez, C., & Vallée, N.
2006 Syllables as processing units in handwriting production. Journal of Expe-
rimental Psychology: Human Perception and Performance 32 (1): 18-31.
Kawasaki, H.
1982 An acoustical basis for universal constraints on sound sequences.
PhD Thesis, University of California.
Kenstowicz, M.
1994 Phonology in generative grammar. Cambridge, MA: Blackwell.
Keating, P.A.
1990 Coronal places of articulation. UCLA Working Papers in Phonetics
74: 35-60.
Kingston, J.
2007 The phonetics-phonology interface. In P. de Lacy (ed.), Handbook of
Phonology: 435-456. Cambridge, UK: Cambridge University Press.
Kingston, J., & Diehl, R.
1994 Phonetic knowledge, Language 70: 419-454.
Klein, M.
1993 La syllabe comme interface de la production et de la perception pho-
niques. In B. Laks & M. Plénat (eds.), De Natura Sonorum : essais de
phonologie: 99-141. Saint-Denis: Presses Universitaires de Vincennes.
Krakow, R.
1999 Physiological organization of syllables: A review. Journal of Phone-
tics 27: 23-54.
Liljencrants, J., & Lindblom, B.
1972 Numerical simulation of vowel quality systems: The role of percep-
tual contrast. Language 48: 839-862.
Lindblom, B.
1983 On the Teleological Nature of Speech Processes. Speech Communi-
cation 2: 155-158.
1986 Phonetic Universals in Vowel Systems. In J.J. Ohala (ed.), Experi-
mental Phonology: 13-44. New-York: Academic Press.
2000 The Interplay between Phonetic Emergents and the Evolutionary
Adaptations of Sound Patterns. Phonetica 57: 297-314.
Lindblom, B. & Maddieson, I.
1988 Phonetic universals in consonant systems. In L.H. Hyman & Li C.N.
(eds.), Language, Speech and Mind: 62-78. London and New-York:
Routledge.
Locke, J. L.
1983 Phonological Acquisition and Change. New York: Academic Press.
MacNeilage, P.F.
1998 The Frame/Content Theory of Evolution of Speech Production. Be-
havioral and Brain Sciences 21: 499-511.
MacNeilage, P.F., & Davis, B.
1990 Acquisition of speech production: Frame then content. In M. Jeanne-
rod (ed.), Attention and Performance XIII: Motor Representation
and Control: 453-475. Hillsdale, NJ: Lawrence Erlbaum.
136 Nathalie Valle, Solange Rossato and Isabelle Rousset

1998 Evolution of speech: The relation between phylogeny and ontogeny.
In: C. Knight, J. R. Hurforf & M. Studdert-Kennedy (eds), The Evo-
lutionary Emergence of Language: Social function and the origin of
linguistic form. Cambridge, Cambridge University Press: 146-160.
2000 On the Origin of Internal Structure of Word Forms. Sciences
288: 527-531.
MacNeilage, P.F., Davis, B.L., Matyear, C.M. & Kinney, A.
1999 Origin of speech output complexity in infants and in languages.
Psychological Science 310 (5): 459-460.
Maddieson, I.
1984 Patterns of sounds. Cambridge: Cambridge University Press.
1986 The Size and Structure of Phonological Inventories: Analysis of
UPSID. In J.J. Ohala (ed.), Experimental Phonology: 105-123. New
York: Academic Press.
1993 The structure of segment sequences. UCLA Working Papers in Pho-
netics 83: 1-8.
Maddieson, I., & Precoda, K.
1989 Updating UPSID. UCLA Working Papers in Phonetics. 74: 104-111.
1992 Syllable structure and phonetic models. Phonology 9: 45-60.
Mawass, K.
1997 Synthse articulatoire des consonnes fricatives du franais. Thse de
doctorat, INP Grenoble, France.
Mawass, K., Badin, P., & Bailly, G.
2000 Synthesis of French fricatives by audio-video to articulatory inver-
sion. Acta Acustica 86: 136-146.
McGowan, R.S., Koenig, L.L., & Lfqvist, A.
1995 Vocal tract aerodynamics in [aCa] utterances: Simulations. Speech
Communication 16: 67-88.
Montreuil, J.-P.
2000 Sonority and derived clusters in Raeto-Romance and Gallo-Italic. In
L. Repetti (ed.), Phonological Theory and the Dialects of Italy. Ams-
terdam & Philadelphia: John Benjamins Publishing Compagny.
Morais, J., Cary, L., Alegria, J., & Bertelson, P.
1979 Does awareness of speech as a sequence of phones arise sponta-
neously? Cognition 7: 323-331.
Ohala, J.J.
1983 The origin of sound patterns in vocal tract constraints. In P.F. Mac-
Neilage (ed.), The production of speech: 189-216. New-York:
Springer Verlag.
Ohala, J.J., & Jaeger, J.J. (eds.)
1986 Experimental Phonology. Orlando: Academic Press Inc.
Ohala, J. J., & Kawasaki, H.
1984 Prosodic phonology and phonetics. Phonology Yearbook 1: 113-128.
Favoured syllabic patterns and sensorimotor constraints 137

Ohala, J.J., & Kawasaki-Fukumori, H.
1997 Alternatives to the sonority hierarchy for explaining segmental sequen-
tial constraints. In E. Stig & Hkon Jahr E. (eds.), Language And Its
Ecology: Essays In Memory Of Einar Haugen. Trends in Linguistics.
Studies and Monographs 100: 343-365. Berlin: Mouton de Gruyter.
Prennou, G., & de Calms, M.
2002 BDLEX 50000. French Lexical Database: Lexical Resources V2.1.2.
IRIT, Toulouse: ELRA/ELDA.
Plnat, M.
1995 Une approche prosodique de la morphologie du verlan. Lingua 95:
97-129.
Redford, M.A.
1999 An Articulatory Basis for the Syllable. PhD thesis, The University of
Texas, Austin.
Redford, M., & Diehl, R.
1999 The relative perceptual distinctiveness of initial and final consonants
in CVC structures. Journal of the Acoustical Society of America
106: 1555-1565.
Robb, M., & Bleile, K.
1994 Consonant inventories of young children from 8 to 25 months. Clini-
cal Linguistics and Phonetics 8: 295-320.
Rochet-Capellan, A., & Schwartz, J.-L.
2005a The labial-coronal effect and CVCV stability during reiterant speech
production: An acoustic analysis. Proceedings of
INTERSPEECH2005: 1009-1012. Lisbon.
2005b The labial-coronal effect and CVCV stability during reiterant speech
production: An articulatory analysis. Proceedings of
INTERSPEECH2005: 1013-1016. Lisbon.
Rossato, S., Badin, P., & Bouaouni, F.
2003 Velar movements in French: An articulatory and acoustical analysis
of coarticulation. In M.J. Sol, D. Recasens, & Joaquim R. (eds.),
Proceedings of the 15th International Congress of Phonetic
Sciences: 3141-3145. Barcelona.
Rousset, I.
2004 Des Structures syllabiques et lexicales des langues du monde : don-
nes, typologies, tendances universelles et contraintes substantielles.
Thse de doctorat en Sciences du Langage (PhD), Universit Sten-
dhal. Grenoble, France.
Sato, M., Valle, N., Schwartz, J.-L., & Rousset, I.
2007 A perceptual correlate of the Labial-Coronal Effect. Journal of
Speech, Language and Hearing Research 50: 1466-1480.
Saussure, F.D.
1916 Cours de linguistique gnrale. Payot, Paris.
138 Nathalie Valle, Solange Rossato and Isabelle Rousset

Schwartz, J.-L., Bo, L.-J., Valle, N., & Abry, C.
1997 The Dispersion-Focalization Theory of vowel systems. Journal of
Phonetics 25 (3): 255-286.
Segui, J., Dupoux, E., & Mehler, J.
1990 The role of the syllable in speech segmentation, phoneme identifica-
tion and lexical access. In G. Altmann (ed.), Cognitive Models of
Speech Processing: 263-280. MIT Press.
Selkirk, E.
1982 Syllables. In H. van der Hulst & Smith N. (eds.), The Structure of
Phonological Representations (2): 337-383. Dordrecht: Foris.
1984 On the major Class Features and Syllable Theory. In M. Aronoff &
Oerhle R.T. (eds.), Language and Sound Structure. Cambridge: MIT
Press.
Stefanuto, M., & Valle, N.
1999 Consonant systems: From universal trends to ontogenesis. Proceed-
ings of the XIVth International Congress of Phonetic Sciences (3):
1973-1976. San Francisco.
Steriade, D.
1982 Greek Prosodies and the Nature of Syllabification. Phd Thesis, MIT.
Stevens, K.N.
1989 On the Quantal Nature of Speech. Journal of Phonetics 17 (1): 3-46.
2003 Acoustic and perceptual evidence for universal phonological features. In
M.J. Sol, D. Recasens, & Joaquim R. (eds.), Proceedings of the 15th
International Congress of Phonetic Sciences: 33-38. Barcelona.
Stevens, K.N. & Keyser S.J.
1989 Primary features and their enchancement in consonants. Language
65 (1): 81-106.
Stoel-Gammon, C.
1985 Phonetic Inventories, 15-24 Months: A Longitudinal Study. Journal
of Speech and Hearing Research 28: 505-512.
Treiman, R.
1983 The structure of spoken syllables: Evidence from novel word games.
Cognition 15: 49-74.
1989 The internal structure of the syllable. In G.N. Carlson & Tanenhaus
M.K. (eds.), Linguistic Structure in Language Processing: 27-52.
Dordrecht: Kluwer.
Treiman, R. & Kessler, B.
1995 In defense on Onset-Rime syllable structure for English. Language
and Speech 38 (2): 127-142.
Trubetzkoy, N.S.
1939 Grundzge der Phonologie. Travaux du Cercle Linguistique de
Prague, 7. Translation by Cantineau, J.
1970 Principes de phonologie. Paris: Klincksieck.
Favoured syllabic patterns and sensorimotor constraints 139

Valle, N., Bo, L.-J., Schwartz, J.-L., Badin, P., & Abry, C.
2002 The weight of substance in phonological structure tendencies of the
worlds languages. ZAS Papers in Linguistics 28: 145-168. Berlin.
Valle, N., Schwartz, J.-L., & Escudier, P.
1999 Phase Spaces of vowel systems. A typology in the light of the Dis-
persion-Focalization Theory (DFT). Proceedings of the XIVth Inter-
national Congress of Phonetic Sciences: 333-336. San Francisco.
Vilain, A., Abry, C., Brosda, S. & Badin, P.
1999 From idiosyncratic pure frames to variegated babbling: Evidence
from articulatory modelling. Proceedings of the XIVth International
Congress of Phonetic Sciences: 1973-1976. San Francisco.
Warren, M.R.
1961 Illusory changes of distinct speech upon repetition The verbal
transformation effect. British Journal of Psychology 52: 249-258.



Structural complexity of phonological systems
Christophe Coupé, Egidio Marsico and François Pellegrino
1. Introduction
In the linguistic tradition, including phonology, complexity has often been
invoked when looking for explanatory arguments (e.g. a given phenomenon is
rarer because it is more complex than another), when looking for a balance of
complexity within subsystems of a language, or when directly comparing and
ranking several languages along a linguistic dimension (for a review,
see Chitoran and Cohn, and Pellegrino et al. this volume). In this perspec-
tive, the concept of complexity is intrinsically relative and necessarily
leads to judging something as more or less complex than something else
regarding a particular property or even globally. Thus, anyone involved in
the enterprise of evaluating phonological complexity faces the tricky issue
of, as a first step, defining a set of (phonological) properties and for each
property defining a scale of complexity. Then, and only then, is one able to
start comparing the phonological complexity of the chosen phonological
elements. In that perspective, being "complex" or not is a (possibly gradient)
quality assigned to a particular set of elements. This task is anything but
straightforward. While choosing the set of properties can be quite simple,
characterising them with a scale of complexity is much trickier. Moreover, as
one tries to combine several properties of an element to evaluate its overall
complexity, the issue of weighting these different dimensions can easily
lead to a dead end. In this regard, Maddieson (this volume) is very insightful
and presents an excellent summary of where to find and how to
define phonological complexity, as well as the limits of this notion.
Interestingly, an alternative conception of complexity has been developing
for half a century, stemming from cybernetics, systems theory and systems
dynamics (e.g. Abraham, 2001 for an epistemological view). First found in
statistical physics, biology and computer science, it has rapidly proven
relevant within the field of humanities and social sciences and it is now
definitively associated with the notion of complex system. In that frame-
work, a system is or is not complex according to whether its structure and
behaviour satisfy particular characteristics. The picture is thus substantially
different from the arithmetic view of complexity because one no
longer needs to look for the very dimensions on which to compute com-
plexity, more or less objectively, but rather to check whether a system
fulfils some properties known a priori. This way, complexity is no longer a
relative notion. To illustrate what the properties of complex systems can be,
we refer to Steels (1997):
"a complex system consists of a set of interacting elements where the
behaviour of the total is an indirect, non-hierarchical consequence of the
behavior of the different parts […]. In complex systems, global coherence is
reached despite purely local nonlinear interactions. There is no central
control source."
A system can thus be said to be complex if:
i) it is structured in different levels;
ii) the properties of the global level (the systemic ones) differ from those
of the elements of the basic level;
iii) the systemic properties cannot be derived linearly from the basic
ones.
Seeds of this new paradigm can be found in Warren Weaver's seminal article
(1948), where he emphasized the understanding of "organized complexity" as
one of the key issues to be addressed by modern science. Lying somewhere
between the simple problems that could be solved with pre-20th-century
science and the "disorganized complexity" that was handled with new
statistical and probabilistic tools in the first half of the 20th century,
this complexity involves dealing with a number of factors that do not be-
have independently, but interact into what Weaver called an "organic
whole". Rather than the basic number of factors or constituents in the sys-
tem (that would be low for simple problems and potentially high in the case
of disorganized complexity), it is the nature of their interrelations that ac-
tually matters. The step forward lies precisely in this differentiation of le-
vels where the elements and the properties of each level may differ; and
what matters is the way the structure of the systemic level emerges from
interactions at the basic level.
This view, stemming from the science of complex systems, leads to
modifying the way phonological complexity is addressed: we no longer
intend to compare the overall complexity of phonological systems in terms
of which one is more complex than the others. Instead, we aim at character-
ising their structure. Explaining why there are so many different structures
seems now even more crucial than knowing if one is more complex than
another. After all, all languages seem to work with the same efficiency; no
one has ever reported a language with communicative disabilities or non-
impaired children failing to learn a particular language. For all we know, all
languages are functionally equal (and all complex enough), and yet as Fer-
guson (1978, p. 9) wrote:
"As soon as human beings began to make systematic observations about
one another's languages, they were probably impressed by the paradox that
all languages are in some fundamental sense one and the same, and yet they
are also strikingly different from one another."
Indeed, typological research has shown that despite the fact that certain
types of linguistic structures are clearly more frequent than others, even the
uncommon ones can be relatively numerous and very different. Thus, this
coexistence of numerous viable types of linguistic elements and structures,
although unevenly distributed, reveals that language is a poorly constrained
system, or at least one presenting numerous degrees of freedom.
In this contribution, we develop a study of the structural complexity of
the phonological systems of the UPSID database¹ in line with these state-
ments. To set the stage, let us just look at the variation present in the lan-
guages of the database. They have from 11 to 141 segments, from 3 to 28
vowels, from 6 to 95 consonants, from 6 to 51 distinctive features. This has
to do with the variations of types, but discrepancies are even wider when
one looks at tokens. To give a few examples, some segments are present in
only one language whereas others can cover up to 90% of the sample; only
one language has 28 vowels but more than 20% have five; stop consonants
are present in all languages, etc. These two sources of data (types and to-
kens) offer different kinds of information. Looking at types raises issues
regarding the set of possible phonological elements (be they features, seg-
ments or systems), and at first glance, the observed diversity could push
toward considering phonological systems as simple sets of unorganized
segments. However, when compared to the theoretical number of possible
combinations of features and segments, the number of attested types is
relatively low, showing instead that phonological systems are not randomly
composed. Moreover, when looking at tokens, the uneven distribution of
types among languages reveals that some systems prevail. Consequently,
we need to understand what parameters are making one system more wide-
spread than another. This means also, from a methodological point of view,
that frequencies of distribution are not an explanation per se and thus
should not be considered as inputs in a model, but rather as what is to be
explained. They are the emergent properties of an underlyingly organized
structure.
The notion of 'emergence' is a key concept of the dynamical complex
system framework. As mentioned before, the different structures of the
systems are considered as emerging from the specific interactions of their
elementary units. To some extent, then, a system can be seen as the reflec-
tion of the constraints at work. The citation below, from Björn Lindblom,
illustrates this from the diachronic perspective:
"The new form [i.e. the new pronunciation that yields a potential sound
change] gets tested implicitly on a number of dimensions: 'articulatory ease',
'perceptual adequacy', 'social value' and 'systemic compatibility'. If the
change facilitates articulation and perception, carries social prestige and
conforms with lexical and phonological structure, its probability of
acceptance goes up. If the change violates the criteria, it is likely to be
rejected." (Lindblom, 1998: 245).
Again, the notion of "systemic compatibility" pushes forward the idea that
the whole (the system) is more than the sum of its parts. Following this line
of thinking, in a previous paper (Marsico et al., 2004), we explored
phonological inventories (hereafter PI), assuming that this can lead to the
(even partial) understanding of their structure. We began with a bottom-up
approach where we intended to i) set the different levels of structuration of
PI, ii) identify the properties of each level and iii) characterize the rela-
tion(s) between the levels. We got reasonable results as far as points (i) and
(ii) are concerned, but our approach showed its limits with point (iii), espe-
cially when dealing with the systemic level. The main index we used to
monitor the systemic behaviour of PI deals with the notion of redundancy.
We wanted to evaluate the longstanding idea of PI as being economic sys-
tems (i.e. the MUAF principle, Maximal Use of Available Features, first
introduced by Ohala, 1980). Our redundancy measure evaluates the average
distance between each segment of a PI and its nearest neighbour. Although
the quantitative results seem to show that PIs are indeed based on a principle
of economy favoring systems with minimal phonological oppositions (i.e.
based on only one feature), the qualitative analysis of these results revealed
that our measure is not really a systemic one. As a matter of fact, our re-
dundancy index deals more with one-to-one relationships between segments
than with collective behaviour. The lowest redundancy index is obtained
as long as each segment has its minimal counterpart in the system (i.e. a
segment differing by only one feature) without considering any of the un-
derlying systemic principles on which MUAF is based: maximal use of
features, consistent series of segments. The lowest index can be obtained
with a system made of what Lindblom calls "a collection of 'assorted
bonbons'" (Lindblom, 1998: 250).
This has led us to change our perspective and to adopt a top-down ap-
proach directly based on the systemic level. We will develop this approach
in the remainder of this paper. Section 2 deals with a structural approach
where PIs are considered as networks of connected phonemes. In Section 3,
PIs are modelled by considering the distribution of co-occurrences of pho-
nemes, in order to define attraction and repelling relations between them.
These relations are then used to propose a synchronic measure of coherence
for the phonological systems, and then diachronically extended to a meas-
ure of stability.
2. Considering phonological inventories in the light of graph theory
2.1. From a feature-based distance to phonological graphs
2.1.1. About graph theory
Mathematical graph theory, also known as network theory, has had a
significant impact in various scientific fields during the last decade, for two
main reasons. The first is the acknowledgment of the range of this theory,
which proposes a set of tools and generic concepts that can be applied to a
wide range of questions. The second is linked to the theoretical progress
made in the understanding of the properties of networks half-way between
regular and random networks (e.g. Erdős and Rényi, 1960). If the detailed
analyses of these two specific kinds of networks go back several decades,
those of intermediate networks are much more recent, and illustrate the
difficulty of apprehending Weaver's "organized complexity". The "small-
world" or "scale-free" networks are by far the most cited today (e.g. Watts
and Strogatz, 1998), since they are very commonly encountered in the
study of non-living, living or social phenomena. Several concepts borrowed
from graph theory (such as the notions of shortest path, robustness,
aggregation, hub, or resilience) have led to substantial breakthroughs in
a wide range of applications: the functionality and robustness of internet
networks, the understanding of the interactions between proteins, or within
complex eco-systems, or the propagation of epidemics (Dorogotsev &
Mendes, 2001; Pastor-Satorras & Vespignani, 2001). This statement is also
correct in linguistics, where scientists have studied the properties of lexical,
146 Christophe Coup, Egidio Marsico and Franois Pellegrino

syllabic or phonological graphs (Cancho & Sol, 2001; Cancho & al., 2004,
Dorogotsev & Mendes, 2001; Sol, 2004).
In general, a graph is defined by a set of nodes and a set of connections.
The way nodes are connected leads potentially to graphs of different types
but for which a common set of properties may be calculated. While some of
these properties depend on the size of the network, others may be invariant,
for a given type of network, and regardless of their respective size. In our
approach, each phonological system is considered as a set of two networks,
one for the vowel segments and one for the consonants (diphthongs are not
considered so far). For each graph, the segments are the nodes and the con-
nections are derived from phonetic-phonological relations between them
using the algorithm described in the next section.

2.1.2. From phonemes to feature-based phonemic distances
One way of quantifying the relation between any two phonemes is to rely
on the features that compose them. In this approach, the degree of interac-
tion corresponds to the distance in terms of features, where these features
are compared within the natural classes they belong to. The following ex-
amples will illustrate this calculation.
/i/ /u/
Height high high
Backness front back
Lip Rounding unrounded rounded
Distance = 2
/o/ /õ:/
Height high-mid high-mid
Backness back back
Lip Rounding rounded rounded
Length short long
Nasality oral nasal
Distance = 2
/p/ /v/
Place labial labio-dental
Manner plosive fricative
Voicing unvoiced voiced
Distance = 3
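The distance above simply counts, within each natural class, the features on which the two descriptions disagree. A minimal sketch of this count (the feature dictionaries below are illustrative and do not reproduce UPSID's actual feature coding):

```python
# Feature-based phonemic distance: count the natural classes
# (Height, Place, ...) on which two phoneme descriptions disagree.
# Feature names and values are illustrative, not UPSID's coding.
def feature_distance(p1, p2):
    classes = set(p1) | set(p2)
    return sum(1 for c in classes if p1.get(c) != p2.get(c))

i = {"Height": "high", "Backness": "front", "Lip": "unrounded"}
u = {"Height": "high", "Backness": "back",  "Lip": "rounded"}
p = {"Place": "labial",       "Manner": "plosive",   "Voicing": "unvoiced"}
v = {"Place": "labio-dental", "Manner": "fricative", "Voicing": "voiced"}

print(feature_distance(i, u))  # 2 (Backness, Lip rounding)
print(feature_distance(p, v))  # 3 (Place, Manner, Voicing)
```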

This rough distance could certainly be refined by taking the shared features
into account as well, but the main problem would remain: the nature of the
relation between phonemes is hard to establish a priori, first because of the
lack of common ground for their internal description (see the first part of
this volume, on the phonological primitives) and second because the prin-
ciples explaining the relations between phonological segments are still a
controversial and open issue. The proposed methodology by no means aims
at being the ultimate formalism but it provides a reasonably adequate bal-
ance between the need for some phonetic rationale and the possibility of
being consistently applied to any phonological system.

2.1.3. Construction of phonological graphs
With a quantification of the distance between phonemes, we can now turn
to the construction of a graph where all the phonemes of a PI would be
connected and fulfil the following principles:
1) There must be a path between any two phonemes, direct or indirect;
2) This path must be minimal in a way compatible with the notion of
economy or parsimony.
The first principle is consistent with the view, promoted here, of PIs as systems; no
phoneme is isolated within a PI, and consequently each phoneme is at least
related to one of the other phonemes of the system. This principle stems
from the traditional idea of opposition between phonemes. For each pho-
neme, the second principle aims at selecting the connections occurring
within its neighbourhood (in terms of phonetic similarity) since we consid-
er that long-range connections are meaningless. The neighbourhood is not
defined using an a priori distance (for instance a hard threshold of 3 be-
tween segments) but by selecting the path that preserves a minimal cost as
illustrated below².
Let us concentrate on the potential paths linking /o:/ and /a/ in the five
vowel system given in Figure 1. The direct path (based on the feature dis-
tance between the two phonemes) is 4. Besides, there are several indirect
paths, such as for example /o:/ => /e:/ => /a/ or /o:/ => /u/ => /a/ or even
/o:/ => /u/ => /i/ => /a/. This last one is especially interesting because the
biggest jump between two nodes only involves a distance of 2 (in fact all
the jumps in this path are of a distance of 2). In our approach, this path is
then the least costly since, step by step, it involves skipping from neighbour
to neighbour. In this view, the number of jumps doesn't matter; what counts
is their size. For this reason, the direct path (/o:/ => /a/, distance = 4) has
been removed from the network in favor of the indirect one.
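Under the two principles above, a direct connection survives only when no indirect path links the same pair of phonemes with strictly smaller jumps; equivalently, an edge is kept when its weight equals the minimax ("bottleneck") path cost between its endpoints. The sketch below reconstructs this pruning with a Floyd–Warshall-style pass over a toy system; it is an interpretation of the algorithm described here, not the authors' actual code:

```python
# Prune a complete weighted graph: a direct edge is dropped whenever some
# indirect path links the same pair with a strictly smaller maximal jump.
# bn[u][v] is the bottleneck (minimax) path cost, computed Floyd-Warshall style.
def prune(dist):
    nodes = list(dist)
    bn = {u: dict(dist[u]) for u in nodes}
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if u != v and k not in (u, v):
                    bn[u][v] = min(bn[u][v], max(bn[u][k], bn[k][v]))
    # keep an edge only if no indirect path beats its direct distance
    return {(u, v) for u in nodes for v in dist[u]
            if u < v and dist[u][v] == bn[u][v]}

# toy system: the direct A-C connection (distance 4) is beaten by the
# indirect path A -> B -> C, whose biggest jump is only 2
d = {"A": {"B": 2, "C": 4},
     "B": {"A": 2, "C": 2},
     "C": {"A": 4, "B": 2}}
print(sorted(prune(d)))  # [('A', 'B'), ('B', 'C')]
```

In the toy system the A–C edge of weight 4 is pruned because the path A → B → C never jumps more than 2, mirroring the /o:/ to /a/ example above.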


STEP 1: We compute the direct phonetic distance for each pair of phonemes.
Each node is linked with every other node of the network; the values
correspond to the phonetic distances.
STEP 2: Identification of the pairs of phonemes for which an indirect path
requires smaller "jumps" than the direct one. Dotted lines show costly paths.
STEP 3: Suppression of costly direct paths. The resulting network only keeps
the least costly connections (direct or indirect).

Figure 1. Description of the algorithm of construction of phonological graphs

The inspection of the various networks or graphs built with this approach
reveals properties close to the classical ones in phonology in terms of serial
or derivative structures. The next figure illustrates this point with the most
frequent five vowel system in our data on the left, and a ten vowel system
on the right, composed of the same five vowels plus their nasal counterparts.
A layered structure is visible: the sub-network consisting of the vo-
wels /i, e, a, o, u/ mirrors the one composed of the nasal counterparts, and is
connected to it in a regular fashion.







Figure 2. Two examples of phonological graphs


2.2. Measuring the structural complexity of phonological inventories
The previous step explained how we have built phonological graphs from
PIs; we will now show how we can compare these PIs in terms of structural
complexity using a specific measure relying on the corresponding graphs.
The interest of this approach lies in the fact that it is anchored outside pho-
nology and linguistics, and is not the result of an ad hoc measure based on
language comparison. As such, this approach can make possible a compari-
son of PIs structural complexity with as little a priori bias as possible.
However, estimating the complexity of a graph is not a straightforward
matter. Several measures exist (Neel & Orrison, 2006; Jukna, 2006; Bon-
chev & Buck, 2005) and as always, choosing one over the others seems to
depend on implicit considerations.

2.2.1. The notion of off-diagonal complexity
Among the various possible measures found in the literature, our choice fell
on the off-diagonal complexity proposed by Claussen (2004). This measure
offers different characteristics that parallel simple intuitions linguists have
on PIs. Indeed, this measure:
- Doesn't explicitly take into account graph size (i.e. its number of
nodes or connections). This method consequently does not postulate that a
large PI will be more complex than a small one;
- Is sensitive to the presence of hierarchical sub-structures in the net-
work. This can happen for example when a whole primary system is con-
i
u
e
a
o


u i
a
e

o
150 Christophe Coup, Egidio Marsico and Franois Pellegrino

trasted by a secondary feature (see above the ten vowel system in figure 2,
or below, the system of Chipewyan);
- Is minimal for regular graphs and maximal for scale-free graphs. It
thus provides a benchmark for PIs' structural complexity (for which a
scale-free structure is very unlikely).

The calculation of the off-diagonal complexity follows several steps:
1. Calculation of the degree of each node by counting its connections;
2. Construction of a matrix M defined by M(k_1, k_2) = number of
connections existing between nodes of degree k_1 and nodes of degree k_2;
3. Calculation of the entropy C of the distribution of the normalized sums,
m_i, of the values of the minor diagonals of M, with the following formula:

C = -\sum_{i=0}^{k_{\max}} m_i \log m_i

where k_{\max} is the maximum degree of a phoneme in the graph.
Such a measure can seem complicated, but it is actually able to detect the
structural regularities existing at the level of the relations between nodes.
Figure 3 gives an example.

Figure 3. The different steps of the calculation of the off-diagonal complexity.

In (a) we have the initial graph with 19 nodes and 18 connections. In (b) we
added the degree for each node. (c) presents the corresponding matrix M
and the sums of the values of the minor diagonals. The resulting
off-diagonal complexity is thus:

C = -\left[ \tfrac{4}{18}\log\tfrac{4}{18} + \tfrac{1}{18}\log\tfrac{1}{18}
+ \tfrac{1}{18}\log\tfrac{1}{18} + \tfrac{1}{18}\log\tfrac{1}{18}
+ \tfrac{5}{18}\log\tfrac{5}{18} + \tfrac{6}{18}\log\tfrac{6}{18}
+ 0 \right] = 1.538

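The three steps (node degrees, the degree-degree matrix M, and the entropy of the normalized minor-diagonal sums) can be sketched as follows. This is a reconstruction from the summary given here, using natural logarithms as in the worked example; Claussen's original formulation includes further normalization details, and the star graph is merely a toy illustration:

```python
import math
from collections import defaultdict

# Off-diagonal complexity of an unweighted graph given as an adjacency
# mapping node -> set of neighbours (a sketch of the steps listed above).
def offdiagonal_complexity(adj):
    deg = {v: len(adj[v]) for v in adj}          # step 1: node degrees
    kmax = max(deg.values())
    # step 2: M[(k1, k2)] = number of edges between a degree-k1 node
    # and a degree-k2 node
    M = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            if u < v:                             # count each edge once
                k1, k2 = sorted((deg[u], deg[v]))
                M[(k1, k2)] += 1
    # step 3: sum M along minor diagonals (constant k2 - k1), normalize,
    # and take the entropy of the resulting distribution
    diag = [0] * kmax
    for (k1, k2), n in M.items():
        diag[k2 - k1] += n
    total = sum(diag)
    m = [x / total for x in diag]
    return -sum(x * math.log(x) for x in m if x > 0)

# a 5-node star: one hub of degree 4, four leaves of degree 1;
# all edges fall on a single minor diagonal, so the entropy is zero
star = {"h": {"a", "b", "c", "d"},
        "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}
print(offdiagonal_complexity(star))  # 0.0
```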
As the preceding graphs have shown, off-diagonal complexity can only be
calculated for non-valued graphs, i.e. graphs where the connections have no
intrinsic value or weight. This is a serious limitation since, in our approach,
the connections stand for distances between phonemes and are thus naturally
valued. Because Claussen's measure doesn't allow taking this information
into account, the distances are only used when pruning the full graph
by removing the costly connections.
Figure 4 gives the off-diagonal complexity of several relatively simple
PIs, whereas Figure 5 illustrates that the measure can also be applied to a
much more complicated system.
Figure 4. Simple vowel systems and values of their off-diagonal complexity.
Figure 5. Off-diagonal complexity of the Chipewyan 14 vowel system; C=0.89.
[Figure 4 panels: (1) C = 0.64; (2) C = 0.69; (3) C = 0.69; (4) C = 0.97]
These few examples are clear evidence of the absence of a direct relation
between the number of phonemes and the value of the off-diagonal complexity.
Figure 4 compares systems of the same size (4.1 vs. 4.2 for 5-vowel systems,
and 4.3 vs. 4.4 for 7-vowel systems) but with different complexity, and
systems with the same complexity but different cardinality (4.2 vs. 4.3).
The PI in 4.1, with only primary phonemes (or basic ones, according to
Marsico et al., 2004), is less complex than the one in 4.2, which has the
same number of vowels but a secondary, non-contrastive feature (length).
This latter system is as complex as the PI in 4.3, which has two more
vowels, all primary ones. Chipewyan (Figure 5) presents a smaller
complexity than the PI in 4.4 despite having twice as many vowels, due to
its more regular structure.

2.3. Comparisons between phonological inventories from UPSID and
random ones regarding off-diagonal complexity
The table below gives the off-diagonal complexity of vocalic and
consonantal systems³ for the whole set of languages from UPSID.

UPSID Vowel systems Consonant systems
C mean 0.794 1.670
C min 0 0
C max 1.700 2.379
Std. Dev. 0.313 0.325

No correlation between vocalic and consonantal complexity was found
(r=0.0006). Thus, there is no compensation between structural complexity
of vowels and consonants (no negative correlation); nor any parallel beha-
viour between them, like the smaller the one the smaller the other (no posi-
tive correlation). These results confirm, with a different measure, the ones
described in Maddieson (2006).
To assess whether the off-diagonal complexity was really capturing
meaningful information on PIs, we compared the 451 UPSID PIs with a set
of 451 generated PIs. These PIs were randomly composed by picking pho-
nemes (from the whole set of existing phonemes) respecting the distribu-
tion of PI size from UPSID. Thus, every UPSID system was matched with
a random one of the same size, but for which the content did not obey any
linguistic motivation. Our hypothesis was that if random and actual systems
lead to similar distribution of structural complexity, the offdiagonal com-
Structural complexity of phonological systems 153

plexity is pointless. The table below gives the distribution of random sys-
tems and is to be compared with the previous one.

RANDOM Vowel systems Consonant systems
C mean 1.071 1.965
C min 0 1.045
C max 2.106 2.788
Std. Dev. 0.470 0.316

On average, the complexity of random systems is significantly higher than that of real systems, both for vocalic (t(450) = -10.41; p < 0.001) and consonantal systems (t(450) = -13.85; p < 0.001). These results support the idea that this measure of complexity does capture part of the organization of PIs; random systems are more complex, i.e. they are less structured than real ones. On the other hand, there is a large overlap in the ranges of variation of complexity, especially for vowel systems, where both random and real systems reach a minimal value of zero. A possible interpretation is that the off-diagonal complexity is not discriminative enough, due to the limited number of structures observed in vowel systems.
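The size-matched random baseline used in this comparison can be sketched as follows; the segment pool and the size list are toy stand-ins, not UPSID's actual segment set and size distribution.

```python
import random

def random_inventory(size, pool):
    """Draw `size` distinct segments uniformly from a pool of attested phonemes."""
    return set(random.sample(pool, size))

# Toy stand-ins: one random system is generated per real language,
# matched in size but with linguistically unconstrained content.
pool = [f"seg{i:03d}" for i in range(180)]
upsid_sizes = [3, 5, 5, 7, 9, 14]
random_sample = [random_inventory(n, pool) for n in upsid_sizes]
assert [len(s) for s in random_sample] == upsid_sizes
```

Each generated system can then be scored with the same complexity measure as its real counterpart, and the two distributions compared.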
In order to further evaluate the performance of the algorithm, we considered the possible variations in terms of complexity among the main linguistic groups to which the UPSID languages belong. Following Maddieson (2006), we separated our sample into the 6 major geographical areas presented in the next table, along with the total number of languages per area and the average vocalic and consonantal complexity values.
Two one-factor ANOVAs, independently run on vocalic and consonantal systems, reveal significant differences among the groups (F(5) = 6.02; p < 0.001 and F(5) = 23.25; p < 0.001, respectively). Post-hoc Scheffé tests furthermore reveal that the structural complexity of the vocalic systems of the Australia & New Guinea area differs significantly from that of the Europe, South and West Asia and East and South-East Asia areas. Regarding consonantal systems, several areas, when considered in pairs, show significant differences. For example, the Africa area presents a complexity significantly greater than any other, except for the Europe, South and West Asia area, which presents a very close average value, as shown in the next table:





                              Number of   Structural complexity   Structural complexity
Area                          languages   of vocalic system       of consonantal system
Europe, South and West Asia       71            0.90                    1.83
East and South-East Asia         108            0.87                    1.61
Africa                            74            0.79                    1.84
North America                     68            0.73                    1.67
South and Central America         66            0.81                    1.51
Australia and New Guinea          64            0.65                    1.45


2.4. Conclusion
As the previous paragraphs have shown, the off-diagonal complexity seems a promising measure for analyzing the structure of PIs. However, although it coincides rather well with linguists' intuitions when applied to some specific systems, when the whole set of PIs from UPSID is considered, the distribution of complexity values is very compact, thus limiting the comparison between systems. This is due to the fact that the Claussen measure is better suited to larger graphs with more diverse internal structures. In our data, the limited typological structural variation is therefore a problem. One possible improvement could be to take into account the weight of the connections of the graphs (i.e. the phonetic distance between phonemes), but this is not possible yet with this measure. Still, these results suggest the following: (i) there are differences in terms of PI structure among linguistic areas; (ii) there is no relation whatsoever between the complexities of vocalic and consonantal systems; and (iii) real PIs display a degree of regularity that random systems don't.

3. From molecules to phonemes: calculating cohesion and stability for
phonological inventories
In this section, we will present an alternative approach, also aiming at characterizing systemic features of PIs. We will propose a measure evaluating the cohesion of PIs, borrowing the concept of energy familiar from statistical physics. Unlike the measure introduced in the previous section, which was based on the topology of the systems, this one focuses more on the phonemes themselves and their interactions within systems. We will first present a measure of the interactions between phonemes considered two by two, and then an extension of this measure to the evaluation of the overall cohesion of PIs. Last, an evolutionary model of PIs will be presented and commented on. For reasons of clarity, we will only describe here the case of vocalic systems, but our approach also applies to full systems without separating vowels from consonants, as done in section 2.


3.1. On the notion of attraction and repulsion between phonemes
In the topological approach combined with the off-diagonal complexity, the degree of relationship between phonemes was evaluated using a simple feature-based distance. Here, we propose to measure the interaction between two given segments using their patterns of cooccurrence among the languages present in the UPSID database. This approach is based on the assumption that if two segments recurrently appear, or recurrently fail to appear, together in PIs, an underlying constraint is probably responsible for this pattern. The study of this kind of regularity in the PIs of UPSID has been partially addressed by Clements (2003), who found convincing arguments in favour of the feature economy theory. To do so, Clements studied contingency matrices (see example below) in order to see whether phonetically close segments have a tendency to attract (be present in the same PI) or repel each other (not appear together). He used a χ² test to ensure that only significant interactions are considered, given the intrinsic frequency of each phoneme.
Clements' approach can be extended in two directions. First, the χ² test is limited when rare events are at play, a problem Clements did not have to deal with in his study. A solution is to apply the exact Fisher test instead, which can be used with any number of occurrences; actually, the χ² test is just an approximation of the Fisher test, less costly in terms of calculation, but for which stronger hypotheses must be met.

A second improvement consists in considering not only the cooccurrence of two phonemes A & B, but more generally the four possibilities A & B, !A & B, A & !B, !A & !B, where "!" stands for the absence of a phoneme. This allows for capturing a larger set of possibly relevant phenomena than considering only the case where the two segments are present at the same time. The table below gives the contingency matrix for the phonemes /a/ and /ã/ in UPSID:

         /ã/    !/ã/
/a/       82    310
!/a/       1     58

As we can see, only one language has /ã/ without /a/, whereas 310 others present the reverse situation, /a/ without /ã/. If we only calculate the statistical significance of the cooccurrence between /a/ and /ã/, we are bound to find a rather weak interaction, because /a/ has a high intrinsic frequency (/a/ is present in 392 languages out of 451). Nevertheless, the contingency matrix is highly asymmetrical. Taking into account all four possibilities allows us to measure not only the direct interaction between two phonemes of a system, but also the impact of the presence of one of the two when the other is absent: is the system indifferent, or is it going to evolve to "recruit" the missing one or "get rid" of the other?
Other pairs of oral vowels and their nasal counterparts follow the same pattern. This particular distribution may be linked to the mechanism of transphonologisation, by which nasal vowels are derived from their oral counterparts by extension of the nasal feature of an adjacent consonant. The nasal vowel cannot appear without the oral one, and the rare cases where it does occur only because the oral vowel disappears afterward (usually by quality change4). This example clearly illustrates that PIs can reflect diachronic processes, although these are only implicitly expressed. The approach we are following amounts to a binarisation of PIs, as they are now described not only by the set of phonemes they contain but also by the set of all the others they don't contain. For example, a system with 5 vowels (out of the 180 possible in UPSID) will be described by the presence of these 5 vowels AND by the absence of the 175 others.
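The binarisation just described can be sketched as follows, with a toy segment space standing in for the 180 possible vowels:

```python
def binarise(system, space):
    """Describe a PI by the presence AND absence of every possible segment."""
    return {p: (p in system) for p in space}

# Toy segment space; UPSID's actual vowel space has 180 members.
space = ["i", "e", "a", "o", "u", "ã"]
rep = binarise({"i", "e", "a", "o", "u"}, space)
assert rep["a"] and not rep["ã"]   # /a/ present, /ã/ absent
assert sum(rep.values()) == 5      # 5 segments present, the rest absent
```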
To quantify the interaction between two phonemes, we took the logarithm of the exact Fisher test. Since the logarithm of probabilities only provides negative values, it is also necessary to evaluate the direction of the interaction: when two phonemes appear together more often than their respective frequencies would predict if they were independent, a "+" sign is given to their interaction, whereas a "-" sign is attributed when the frequency of cooccurrence is less than what is expected under the independence hypothesis. Finally, values have been normalized between -1 and +1, giving the strength of the interaction I. Figure 6 below presents some of the strongest interactions found in UPSID PIs.

Figure 6. Dashed lines represent the strongest repelling interactions, and solid
ones, the strongest attracting interactions.
The relations illustrated in figure 6 reveal that systems have a tendency to harmonize front and back vowels for a given height: /i/ and /u/ attract each other, as do /e/ and /o/ or /ɛ/ and /ɔ/. Another positive interaction groups together the three nasal vowels derived from /i, a, u/. Negative interactions are relevant as well. We notice that the three low vowels repel each other; and so do the pairs /i - / and /u - /. The interactions /i - / and / - u/ can be the reflection of the harmony between /i/ and /u/.
In a perhaps counterintuitive way, the strongest interactions do not involve /a/ with /i/ or /u/, even though these three segments are present together in a vast majority of languages. This can be explained by the fact that these segments are all very frequent (considered independently), so frequent that the Fisher test does not recognize their interactions as significant. The same comment applies when the pair of segments involves an extremely rare secondary feature (like breathy voice or creaky voice, for example); the test is then not powerful enough to assign strong interactions. This limitation prevents us from saying anything about relations between the most frequent or the rarest segments. It can seem odd, especially for the very frequent segments (/a/, /i/ and /u/ for instance), for which several theories have explained their frequency explicitly in terms of their interaction (in the line of the maximum or adaptive dispersion theory; Liljencrants & Lindblom, 1972). However, it guarantees that only the information present in the database (and statistically assessed) is considered, without any theoretical a priori. Thus, this approach offers a theory-neutral point of view that is worth exploring further as a way to access additional information on PIs.
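As an illustration of this computation, the counts for /a/ and its nasal counterpart (82, 310, 1, 58 in UPSID) can be tested with a two-sided exact Fisher test built directly on the hypergeometric distribution. The final rescaling into [-1, +1] is not spelled out above, so the log-based strength below is our own unnormalized stand-in.

```python
from math import comb, log

def fisher_two_sided(a, b, c, d):
    """Two-sided exact Fisher p-value for the 2x2 table [[a, b], [c, d]]."""
    n, r1, c1 = a + b + c + d, a + b, a + c
    def p(x):  # hypergeometric probability of a table with top-left cell x
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)
    p_obs = p(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-9))

a, b, c, d = 82, 310, 1, 58                     # /a/ vs. its nasal counterpart in UPSID
p_val = fisher_two_sided(a, b, c, d)
expected = (a + b) * (a + c) / (a + b + c + d)  # expected cooccurrences if independent
sign = 1 if a > expected else -1                # "+" above independence, "-" below
strength = sign * -log(p_val)                   # unnormalized interaction strength
assert sign == 1 and p_val < 0.05               # the two vowels significantly attract
```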
In order to take into account both the interaction and the intrinsic information relative to phoneme frequency in PIs, we also calculate the exact Fisher test for the frequency of distribution of a particular segment compared to a theoretical frequency of 50%. Segments that are present in less than 50% of the languages are given a negative intrinsic value, and those present in more than 50% a positive one. These values are obtained by a transformation of the result of the Fisher test similar to the normalization used for the interactions. This intrinsic value V is linked to the frequency of phonemes through a nonlinear relation that takes the sampling effect into account.
In the current approach, we only consider pairs of segments, but it is theoretically possible to deal with interactions of n-tuples with n > 2. Nevertheless, the size of the UPSID database would dramatically limit the number of phoneme triplets for which significant interactions could be assessed.


3.2. From pairs of segments to the whole system
We have defined, on the one hand, the intrinsic value V for individual segments and, on the other hand, the interaction forces I for pairs of phonemes. Since the exact Fisher test, when applied to the interactions, neutralizes the weight of the intrinsic frequency of the segments, these two measures are statistically independent and can thus be combined into a global measure of cohesion. We now define this measure as C:

(1)    C(Sv) = Σi V(Pi) + Σi,j I'(Pi, Pj)

In this equation, Sv is a vocalic system, Pi and Pj are vowels, and I'(Pi, Pj) = I(Pi, Pj) if Pi and Pj are both present, I(!Pi, Pj) if Pj is present and Pi is absent, and so on for the remaining presence/absence patterns. This way, we integrate for each pair of segments of a
system the relevant combination among the four possible ones (not only present-present). Besides the fact that this doesn't discard useful information, it makes the global cohesion independent of the size of the PI. One potential drawback, though, is the smoothing of the values of C, and thus the resulting small range of variation between PIs. However, the study of the distribution of PIs remains relevant.
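A minimal sketch of equation (1) follows. The values of V and I below are illustrative toys, not those estimated from UPSID, and summing V over present segments only is our reading of the formula.

```python
from itertools import combinations

def interaction(p, q, p_in, q_in):
    """Toy I': /i/ and /u/ attract when co-present; /ã/ present without /a/ is penalized."""
    if {p, q} == {"i", "u"} and p_in and q_in:
        return 0.5
    if {p, q} == {"a", "ã"} and {x for x, f in ((p, p_in), (q, q_in)) if f} == {"ã"}:
        return -0.8
    return 0.0

def cohesion(system, space, V):
    """C(Sv) = sum_i V(Pi) + sum_{i,j} I'(Pi, Pj): for each pair of the full
    segment space, pick the interaction value matching its presence/absence
    pattern in the system (not only present-present)."""
    total = sum(V[p] for p in system)
    for p, q in combinations(sorted(space), 2):
        total += interaction(p, q, p in system, q in system)
    return total

space = ["a", "i", "u", "ã"]
V = {"a": 1.0, "i": 0.9, "u": 0.8, "ã": -0.3}
# A system with /ã/ but no /a/ is less cohesive than the plain /a, i, u/ one.
assert cohesion({"a", "i", "u"}, space, V) > cohesion({"i", "u", "ã"}, space, V)
```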
Our approach echoes Pablo Jensen's recent economic study of the interactions between retail stores (Jensen, 2006). In his work, the interactions between the various stores, positive or negative, are calculated on the basis of the frequency of their cooccurrence in a close neighbourhood (which plays a role similar to that of PIs in our approach). All the interactions are then summed to calculate an energy value, corresponding to our value I, characterizing the organization of an economic and geographic space. This measure can also be calculated for any new potential store in order to evaluate its fitness in the anticipated location.
Our approach also echoes previous research on the maximization of perceptual distance, by replacing phoneme-to-phoneme perceptual similarity with synchronic phonemic interactions measured from UPSID. On the one hand, this definitely limits the explanatory power, since the interactions revealed probably result from several factors without really identifying them. On the other hand, it enables us to examine the phonological system as a whole, and not only the primary vowels for instance, since all possible pairs of phonemes can be considered. Moreover, it provides a way to reveal interactions that would have been ignored in other, more traditional approaches. Still, an important drawback remains, since a kind of circularity is present: we a priori use the frequency of distribution of segments to produce results on the same inventories.
The concept of cohesion, defined as above, may intuitively be connected with a kind of global fitness of PIs: a system consisting of a set of antagonistic phonemes that tend to repel each other would be ill-fitted; conversely, a well-fitted system would consist of phonemes that go well with each other. Yet, this approach strongly relies on the implicit postulate that summing, over the inventory, the pairwise interactions within each pair of phonemes is able to capture the complexity of the whole system. We thus hypothesise that we are dealing with a nonlinear second-order complexity, and not a higher-order one. If, based on this hypothesis, we obtain good results, for example in predicting the evolution of PIs, then it would seem reasonable to say that PIs are of a relatively small complexity compared to other systems with complexities of higher order. More explicitly, it would indicate that the model based on second-order interactions is a good approximation. On the contrary, if no valid result is reached, higher-order complexity (involving patterns of interactions with 3 or more segments) might have to be assumed.

3.3. Cohesion of the UPSID phonological inventories
The presentation of the results starts with a comparison of the vowel systems of the 451 languages from UPSID with random systems, distinguishing the contribution of the intrinsic value V from the impact of the interactions I (Figures 7 to 9).

Figure 7. Intrinsic values V for UPSID vowel systems (in dark) and random systems (in grey). Standard deviation bars are displayed for the distribution of random systems.

Figure 7 shows that the intrinsic values V are higher for real systems than for random ones. Besides, V tends to decrease when the size of the system increases, corresponding to the appearance of rarer segments in the system. If we look at the maximal values of V reached for given system sizes, with S(n) the system of size n with maximum intrinsic value, one can observe the following hierarchy:
V{S(5)} > V{S(6)} > V{S(7)} > V{S(4)} > V{S(3)}.



Figure 8 deals with the interaction forces. I is higher, on average, for real systems than for random systems, although the distributions overlap. The overlap decreases for larger inventories, since I increases for a significant proportion of real systems while, on average, it monotonically decreases for random systems.
Figure 8. Interaction values I for UPSID vs. random vowel systems. Color code is
the same as for Figure 7.

A plausible explanation is that the more the size of the system increases, the more likely it is to contain phonemes with a low intrinsic value V; however, the recruited phonemes have a tendency to interact positively with each other (high I).
Figure 9 represents the global measure of cohesion C. At first sight, the results are similar to those for the intrinsic values V. The main reason is that the range of variation of I is much smaller than that of V (this is visible by comparing the ordinate scales of Figures 7 and 8). There are however differences to be highlighted with respect to Figure 7: the ranking of the systems with the strongest cohesion for given sizes leads to the following order: C{S(5)} > C{S(7)} > C{S(3)} > C{S(6)}. As for the intrinsic value, the maximum is obtained for a 5-vowel system (/i, e, a, o, u/), but the rest of the hierarchy is different, suggesting that interactions play a role in the global cohesion of a system.


Figure 9. Global cohesion value for UPSID vs. random vocalic systems. Dinka is
circled.

It is worth noticing that Dinka (Nilotic family, circled on the graph), with its 13 vowels, doesn't follow the general trend of UPSID PIs but falls within the variability of random systems. This system is indeed extremely uncommon, since it contains 7 breathy-voiced and 6 creaky-voiced vowels with no modal voiced vowels (although we should probably take such a description of Dinka with caution). This fact decreases the intrinsic value of the system, even though its interactional strength is not especially low (3.40).
As a last remark, we would mention that none of the most cohesive systems violates principles such as symmetry or gradual filling of the vocalic space, even though these principles are not directly specified in the calculations we've presented. Among the most cohesive systems, the first to use a secondary feature is a ten-vowel system in which the five vowels /i, e, a, o, u/ are contrasted in terms of nasality.



3.4. From static measures of cohesion to evolutionary dynamics: the
notion of stability
Using the measure of cohesion of PIs as a fitness measure, we can now build a relatively simple model of stochastic evolution in which various possible evolutionary trajectories are implemented and evaluated. The main driving force (and hypothesis) of this model is that a change is more likely to happen if it increases the global cohesion of the system in which it takes place. This does not imply that changes decreasing the cohesion of a PI are impossible, for example under the influence of social constraints, but such changes are less likely to happen and, consequently, rarer in the simulations.
The evolutionary algorithm proceeds as follows:

* For a given PI S, 100 new systems are built, differing from S by 0, 1 or more segments. These systems represent possible evolutions from S to a neighbour system.

* The probability of each potential evolution is calculated by comparing its global cohesion with that of S, and then normalizing the differences in order to obtain a set of probabilities ranging from 0 to 1. The changes leading to an increase of cohesion have the highest probabilities.

* A system among the 100 is randomly chosen with respect to this distribution of probabilities. This system is considered as the new state of S.

Several mechanisms have been tested to explore the energetic landscape of the given PI S, as well as for the normalization of the differences between initial and final cohesion. They all lead to comparable results.
Our model can thus test several evolutionary routes and estimate the stability of a system as a function of its cohesion compared to that of its neighbouring systems. The stability is evaluated as follows: given a particular system, we consider 500 independent evolutionary hypotheses (as described above) and evaluate the percentage of evolutions that maintain the system in its initial state (no change). The more cohesive the system S compared to its neighbours, the more likely the continuation of this state, and thus the higher the stability. Conversely, a system surrounded by more cohesive systems is unstable and very likely to change.
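The evolution step and the stability estimate can be sketched together as follows. The cohesion function is a toy stand-in for the UPSID-estimated one, and the mutation scheme (keep the system or swap one segment) and the weight normalization are our own assumptions; the text only fixes the 100 candidates and the 500 independent runs.

```python
import random

def candidates(system, pool, k=100):
    """Build k possible evolutions of `system`, differing by 0 or 1 segment."""
    out = []
    for _ in range(k):
        s = set(system)
        if random.random() < 0.8:  # mutate: swap one segment for an absent one
            s.remove(random.choice(sorted(s)))
            s.add(random.choice([p for p in pool if p not in system]))
        out.append(frozenset(s))
    return out

def evolve_step(system, pool, cohesion):
    """Choose the next state with probability increasing in cohesion gain."""
    cands = candidates(system, pool)
    gains = [cohesion(c) - cohesion(system) for c in cands]
    lo = min(gains)
    weights = [g - lo + 1e-6 for g in gains]  # shifted to non-negative (our normalization)
    return random.choices(cands, weights=weights, k=1)[0]

def stability(system, pool, cohesion, runs=500):
    """Share of independent one-step evolutions that leave the system unchanged."""
    s0 = frozenset(system)
    return sum(evolve_step(s0, pool, cohesion) == s0 for _ in range(runs)) / runs

random.seed(0)
pool = ["i", "e", "a", "o", "u", "y"]
target = frozenset({"i", "a", "u"})

def coh(s):
    """Toy cohesion, peaked at the target system /i, a, u/."""
    return len(s & target) - 0.5 * len(s - target)

assert stability(target, pool, coh) > 0.8  # the peak behaves as a stable attractor
```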

Figure 10 presents several indicators derived from the stability simulation for UPSID vowel systems. For a given system size (X axis), the graph displays the stability of the least stable system (diamonds), that of the most stable one (triangles) and the average stability of all the systems of that size (squares).
Figure 10. Stability of UPSID vowel systems sorted by increasing size. Numbers
along the top curve give the size of the corresponding system.
Interestingly, whereas the maximal global cohesion decreased as soon as systems reached 7 or 8 segments, we notice that maximal stability remains high even for 12 or 14 vowels. Thus, the simulated evolutions reveal that large systems can play the role of stable attractors even if their cohesion is not particularly high; what matters is that it is higher than that of their neighbours.
Another interesting result is the change of mode that operates at sizes close to 9 vowels: for smaller systems, odd numbers of vowels are the most stable (empty triangles in the graph), whereas for larger systems, stability comes with an even number of segments (full triangles). This particularly salient effect is worth linking to the change of organization observable in the contents of phonological systems precisely around 9 vowels (Vallée, 1994). Below this threshold, we mostly find primary vowels in systems; above it, systems tend to reorganize in series, contrasting a primary set of vowels with a secondary feature. In this regard, Kolokuma Ijo (spoken in Nigeria) is a good example, as it has 18 vowels: 9 different qualities

and their 9 nasal counterparts; it turns out to be the most stable 18-vowel
system in UPSID (around 63%).
4. Conclusions and perspectives
Phonological systems, because of their variety and their structure, constitute an archetype of complex organized systems. They reflect physiological, cognitive and linguistic constraints together, as well as sociolinguistic ones linked to the interactions between speakers. Our understanding of these constraints, of their interactions and of their impact on the evolution of the systems themselves is still limited, due to their complexity. In this picture, the science of complexity provides particularly powerful tools to shed new light on the issues at hand, especially to better understand the connections between the microscopic level (the phonological constituents) and the macroscopic level (each system, considered as a whole). However, the development of these tools and their adaptation to linguistics are not straightforward, and even if the first results are promising, we must keep this difficulty in mind.
The different approaches developed in this paper aim at extracting the intrinsic information hidden in a typological database of phonological inventories, avoiding as much as possible the traditional a prioris of linguistic theories. More precisely, we paid attention to two factors of the complexity of PIs: structural complexity and interactional complexity. The issue of the hierarchy of phoneme complexity was partially addressed in a previous paper (Marsico et al., 2004), by correlating the frequency of occurrence of phonemes in PIs with their capacity to generate new phonemes derived by the addition of secondary features.
The methodology used to evaluate the structural complexity comes from graph theory; it tries to take into account some regular patterns of organisation potentially important in phonological systems (such as principles of economy or symmetry) and to dissociate the effects of the topology of a system from those of its size. These metrics shed light on the fact that the systems of the languages of the world are globally more structured than randomly constituted ones, and that significant topological differences exist between linguistic areas. Furthermore, the complexity of consonantal systems is higher than that of vocalic systems.
This last result raises a recurrent question relative to phonological systems: can we apply the same analysis to consonantal and vocalic spaces? For a while, the structural differences seemed irreducible (discontinuity vs. continuity, different acoustic cues and articulatory gestures). However, we think, following Lindblom and Maddieson (1988), that a common theory is possible, at least concerning the main internal principles structuring these spaces, with a balance between a perceptual principle of sufficient contrast and an articulatory one of least effort or economy.
Regarding the interactional complexity of PIs (and their intrinsic complexity as well), we used a methodology directly inspired by the interactional forces within physical systems and by the calculation of the resulting energy of the system. We have thus calculated indices of intrinsic value (linked to the identity of the different phonemes of a PI) and of interactional force (linked to the reciprocal influences of phonemes among themselves) for all the vocalic systems of our database. Here again, this approach has shown that a significant difference exists between real and random systems. Furthermore, these measures confirm that when the size of the vocalic system increases, the existing phonemes exert a strong reciprocal positive influence (i.e. the interactional force increases), partially compensating for the fact that these phonemes may have a smaller intrinsic value. This compensation, similar to the positive feedback loops typical of numerous complex systems, may furthermore be a determining factor in the mechanisms of evolution of phonological systems.
To test this hypothesis, we modelled a stochastic evolution of PIs based on real systems, taking into account the global cohesion reached by the systems at each step of evolution. This led us to estimate a stability value for each system, and it showed that even systems with a relatively small cohesion (most often because of a large number of vowels) are judged stable by the model. Moreover, stable systems with more than 9 vowels use two distinct series of vowels (oral vs. nasal, for example), thus illustrating a principle of feature economy or parsimony. We see here an emergent regularity which, if not predictable on the basis of cohesion alone, is nevertheless compatible with the fact that systems with a relatively large number of vowels are not rare (45% of UPSID languages have 8 or more vowels).
More than the results themselves, our intention is to validate the fact that an interdisciplinary approach drawing on the science of complexity allows the effective extraction of relevant information from PIs. This should not hide the fact that various issues are still at stake and require further study. One of the most important points is to define an approach that better validates the predictions of the evolutionary model, despite the limited size of UPSID. As a matter of fact, the frequency of distribution of systems (which can be linked to some extent to the quality or fitness of their response to synchronic and diachronic constraints) cannot be properly estimated with this database. If this major problem is solved, we will be able to evaluate more precisely whether the second-order complexity (consideration of pairwise interactions at the microscopic level) gives a good approximation of the fitness of systems, or whether the relations existing between the micro- and macroscopic levels are even more complex.
Finally, another interesting aspect deals with the study of the evolutionary routes themselves, so as to discover potential attractors and cyclic attested trajectories in the history of languages. In the long term, the instantiation of these elements within a multi-agent model will also allow us to address the external (sociolinguistic) factors of evolution, and to confront this approach with rich theoretical frameworks, such as the one proposed in Mufwene (2001).


Notes

1. All our data come from a slightly modified version of the UPSID database (Maddieson, 1984; Maddieson & Precoda, 1990), which contains 451 languages balanced with respect to geographical distribution and genetic affiliation.
2. This approach was selected from among several potential methods because it preserves interesting properties in terms of structure (see below).
3. The sets of features describing vowels and consonants being disjoint, we applied the algorithm separately to the two sub-systems.
4. The only language in UPSID presenting that situation is Kashmiri, with an opposition between // and //.
References

Abraham, R.
2001 The genesis of complexity. Unpublished ms., available at http://www.ralph-abraham.org/articles/MS%23108.Complex/complex.pdf (consulted in December 2007).
Bonchev, D. and Buck, G. A.
2005 Quantitative measures of network complexity. In Complexity in Chemistry, Biology and Ecology, Bonchev, D. and Rouvray, D. (eds.). New York: Springer.
168 Christophe Coupé, Egidio Marsico and François Pellegrino

Cancho, R. F. i. and Solé, R. V.
2001 The small-world of human language. Santa Fe Institute Working Paper 01-03-016.
Cancho, R. F. i., Solé, R. V. and Köhler, R.
2004 Patterns in syntactic dependency networks. Physical Review E 69: 051915.
Claussen, J. C.
2004 Off-diagonal complexity: a computationally quick complexity measure for graphs and networks. arXiv: q-bio.MN/0410024.
Clements, G. N.
2003 Feature economy as a phonological universal. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 371-374.
Dorogovtsev, S. N. and Mendes, J. F. F.
2001 Language as an evolving word web. Proceedings of the Royal Society of London, Series B, Biological Sciences 268: 2603-2606.
Erdős, P. and Rényi, A.
1960 On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5: 17-61.
Ferguson, Charles A.
1978 Historical background of universals research. In J. H. Greenberg, C. A. Ferguson & E. A. Moravcsik (eds.), Universals of Human Language, vol. 1, pp. 61-93. Stanford, CA: Stanford University Press.
Jensen, P.
2006 Network-based predictions of retail store commercial categories and optimal locations. Physical Review E 74: 035101.
Jukna, S.
2006 On graph complexity. Combinatorics, Probability & Computing 15: 1-22.
Liljencrants, J. and Lindblom, B.
1972 Numerical simulation of vowel quality systems: the role of perceptual contrast. Language 48: 839-862.
Lindblom, B.
1986 Phonetic universals in vowel systems. In Experimental Phonology, Ohala, J. and Jaeger, J. (eds.), pp. 13-44. Orlando: Academic Press.
1998 Systemic constraints and adaptive change in the formation of sound structure. In Approaches to the Evolution of Language, Hurford, J. R., Studdert-Kennedy, M. and Knight, C. (eds.), pp. 242-264. Cambridge: Cambridge University Press.
Lindblom, B. and Maddieson, I.
1988 Phonetic universals in consonant systems. In Language, Speech and Mind, Li, C. and Hyman, L. (eds.), pp. 62-78. London: Routledge.
Maddieson, I.
1984 Patterns of Sounds. Cambridge: Cambridge University Press.
2006 Correlating phonological complexity: data and validation. Linguistic Typology 10.1: 106-123.
Maddieson, I. and Precoda, K.
1990 Updating UPSID. UCLA Working Papers in Phonetics 74: 104-111.
Marsico, E., Maddieson, I., Coupé, C. and Pellegrino, F.
2004 Investigating the hidden structure of phonological systems. Proceedings of the 30th Meeting of the Berkeley Linguistics Society, Berkeley, pp. 256-267.
Mufwene, S. S.
2001 The Ecology of Language Evolution. Cambridge: Cambridge University Press.
Neel, D. L. and Orrison, M. E.
2006 The linear complexity of a graph. The Electronic Journal of Combinatorics 13.
Ohala, J. J.
1980 Moderator's summary of symposium on 'Phonetic universals in phonological systems and their explanation'. Proceedings of the 9th International Congress of Phonetic Sciences, vol. 3, pp. 181-194. Copenhagen: Institute of Phonetics.
Pastor-Satorras, R. and Vespignani, A.
2001 Epidemic spreading in scale-free networks. Physical Review Letters 86: 3200-3203.
Schwartz, J.-L., Boë, L.-J., Vallée, N. and Abry, C.
1997 The Dispersion-Focalization Theory of vowel systems. Journal of Phonetics 25.3: 255-286.
Solé, R. V.
2004 Scaling laws in language evolution. In Power Laws in the Social Sciences, Cioffi, C. (ed.). Cambridge: Cambridge University Press.
Steels, L.
1997 The synthetic modeling of language origin. Evolution of Communication 1: 1-34.
Vallée, N.
1994 Systèmes vocaliques : de la typologie aux prédictions. Thèse de doctorat, Grenoble: Université Stendhal.
Watts, D. J. and Strogatz, S. H.
1998 Collective dynamics of 'small-world' networks. Nature 393: 440-442.
Weaver, W.
1948 Science and complexity. American Scientist 36: 536-544.


Scale-free networks in phonological and
orthographic wordform lexicons
Christopher T. Kello and Brandon C. Beltz
1. Competing constraints on language use
Languages are constrained by the physical, perceptual, and cognitive prop-
erties of human communication systems. For instance, there are upper
bounds on the amount of time available for communication. These bounds
constrain the lengths of phonological and orthographic codes so that com-
munication can proceed apace. There are also constraints on the amount of
linguistic information that can be condensed into a given span of perception
or production (Liberman, 1967). These constraints place lower bounds on
the amounts of speech activity needed for phonological and orthographic
codes.
Constraints on languages often work in opposition to one another, perhaps the most famously proposed example being Zipf's principle of least effort (Zipf, 1949). On the one hand, memory constraints produce a tendency towards using fewer words to reduce the memory effort
needed to store and access them. A vocabulary that requires minimal mem-
ory effort on the part of the speaker is one that uses a single word for all
purposes. On the other hand, ambiguity constraints produce a tendency
towards using larger numbers of words to reduce the number of meanings
per word, and thereby reduce effort needed to disambiguate word mean-
ings. A vocabulary that requires minimal disambiguation effort on the part
of the listener is one that uses a different word for every distinct concept.
The principle of least effort states that natural languages are constrained to
minimize both speakers' and listeners' efforts, and only by balancing them
can effective communication be achieved.
It is generally accepted that language usage must strike a balance be-
tween these two kinds of effort. However, Zipf controversially claimed that
the principle of least effort is responsible for a particular kind of scaling
law (also known as a power law) that appears to be true of word usage
throughout the world. The scaling law states that the probability of using a
given word W in language L is approximately inversely proportional to its
frequency rank,

    P(W) ∝ r_L(W)^(−α)

where α ≈ 1. For instance, the highest ranked word in English (THE) is about twice as likely to occur as the second highest ranked word (OF), which is about twice as likely as the fourth highest, and so on.
This scaling law in the distribution of word frequencies means that a
few words are used very often and most words are used rarely. This dichot-
omy creates a combination (balance) of high frequency words requiring
little memory effort (because they are general-purpose words used often in
many different contexts), and low frequency words requiring little disam-
biguation effort (because they are specialized words with particular mean-
ings and contexts). The connection between word frequency and word
meaning is evident, for instance, in the fact that closed-class words tend to
be the most frequent of their language, and also appear in the most general
contexts (e.g., the English word THE may be followed by virtually any
noun, adjective, or adverb, albeit some words follow more frequently than
others). Rare words are often from highly specialized domains and there-
fore appear in very particular contexts (e.g., terms specific to a given pro-
fession).
Zipf's law transparently corresponds to a continuous balance across the
frequency range, from minimizing memory effort in the few frequent, con-
text-general words, to minimizing disambiguation effort in the many rare,
context-specific words (see Morton, 1969). This balance is present at all
measureable scales because the function between word frequency and fre-
quency rank is the same regardless of the scale at which these variables are
measured (i.e., the relation is invariant over multiplication by a common
factor).
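A quick numerical sketch (hypothetical counts, with the exponent fixed at α = 1) shows how these frequency ratios fall out of the rank law:

```python
# Sketch of Zipf's law: with exponent alpha = 1, a word's probability
# of use is inversely proportional to its frequency rank r.
alpha = 1.0
ranks = range(1, 9)

# Unnormalized weights P(W) ~ r**(-alpha), then normalize to probabilities.
weights = [r ** -alpha for r in ranks]
total = sum(weights)
probs = [w / total for w in weights]

# Rank 1 is twice as likely as rank 2, which is twice as likely as rank 4:
ratio_1_2 = probs[0] / probs[1]  # -> 2.0
ratio_2_4 = probs[1] / probs[3]  # -> 2.0

# Scale invariance: multiplying all ranks by a common factor c rescales
# every probability by c**(-alpha), so the ratios above are unchanged.
```

Because the normalization constant cancels in every ratio, the same relation holds at any scale of measurement, which is the scale invariance noted above.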
The idea that Zipf's principle of least effort leads to this scaling law
makes some intuitive sense, but Zipf never gave a rigorous proof of it.
More problematically, other candidate hypotheses came to light that ap-
peared to provide simpler explanations. Mandelbrot (1953), Miller (1957),
and Li (1992) each showed that scaling law frequency distributions could
be obtained from texts composed of random letter strings. Their proofs
have led many researchers to discount such distributions as inevitable and
therefore trivial facts of language use.
However, others have pointed out that corpora composed of random
strings differ in important ways from natural language corpora (Tsonis,
Schultz, & Tsonis, 1997). For instance, the most frequent random strings
are necessarily those of middling length, whereas in natural languages these
tend to be the shorter words. Random strings also cannot speak to the rela-
tionship between word frequency and word meaning. More generally, ran-
dom strings do not have the capacity for structure that is requisite of real
wordforms. Thus it appears that random strings exhibit scaling laws be-
cause string frequency has a particular relationship with string length, but
this relationship is not what creates scaling laws in real word frequencies.


1.1. Criticality in language use
Spurred by the inadequacies of random string accounts, Ferrer i Cancho
and Solé (2003) conducted an information theoretic analysis to investigate
Zipf's hypothesized connection between the principle of least effort and
scaling law frequency distributions. The authors showed that, under fairly
general assumptions, the balance of memory effort and disambiguation
effort can be shown to produce a scaling law in the frequency distribution
of word usage. Their analysis was motivated by theories of critical pheno-
mena that were developed in the area of physics known as statistical me-
chanics (Huang, 1963; Ma, 1976).
The aim of statistical mechanics is to describe the probabilistic, ensem-
ble (global) states of systems with many interacting components. Ferrer i
Cancho and Solé (2003) modeled communication systems by treating lan-
guage users as system components and word usage as the result of compo-
nent interactions. From this perspective, ensemble states correspond to
distributions of word usage, and the authors focused on two kinds of distri-
butions that often constitute opposing phases of a system's behavior. One
phase is characterized by high entropy in that systems may exhibit different
behaviors with roughly equal probability (i.e., a flat probability distribu-
tion). The other is characterized by low entropy in that some behaviors may
occur more often than others (i.e., a peaked probability distribution).
In this framework, the high entropy phase corresponds to minimizing
disambiguation effort in that many different words are used in order to
distinguish among many different meanings (i.e., a relatively flat probabil-
ity distribution of word usage). The low entropy phase corresponds to
minimizing memory effort in that only one or a few words are used for
most meanings (i.e., a relatively peaked probability distribution of word
usage). As explained earlier, an effective communication system is one that
strikes a balance between these two opposing phases.
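The contrast between the two phases can be illustrated by computing Shannon entropy for a flat versus a peaked usage distribution (the numbers below are illustrative only, not taken from the model):

```python
import math

def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# High-entropy phase: many words used with roughly equal probability.
flat = [0.25, 0.25, 0.25, 0.25]

# Low-entropy phase: one word dominates usage.
peaked = [0.97, 0.01, 0.01, 0.01]

h_flat = entropy(flat)      # maximal for 4 outcomes: 2 bits
h_peaked = entropy(peaked)  # close to 0 bits
```

The flat distribution corresponds to minimizing disambiguation effort, the peaked one to minimizing memory effort, matching the two phases described in the text.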

Theory from statistical mechanics is useful here because it has been
shown that, when complex systems transition between phases of low and
high entropy, the transition often occurs abruptly rather than gradually (Ma,
1976). In thermodynamic terms, low memory effort and low disambigua-
tion effort may be two opposing phases of the communication system that
have a sharp phase transition between them. Systems poised near phase
transitions are said to be in critical states, and critical states are known to
universally exhibit scaling laws in their behaviors, including scaling law
distributions like Zipf's law (Bak & Paczuski, 1995).
Thus evidence of Zipf's law suggests that communication systems tend
to be poised near critical states between phases of low memory effort and
low disambiguation effort. To investigate this hypothesis, Ferrer i Cancho
and Sol (2003) built a very simple, information theoretic model of a com-
munication system, and they optimized the model according to two oppos-
ing objectives: To minimize the entropy of word usage on the one hand
(minimize memory effort), while also minimizing the entropy of meanings
per word on the other hand (minimize disambiguation effort). These entro-
pies were opposed to one another and the model contained a parameter that
governed their proportional influence on communication.
Model results revealed a sharp transition between the phases of low
memory effort and low disambiguation effort. Moreover, Zipf's law was
obtained when communication was poised near this phase transition. These
simulation results provide a theoretically grounded explanation of Zipf's
law, but one might question whether the authors have built a bridge too far:
why would theories of critical phenomena developed for physical systems
apply to systems of human communication? The answer is that systems in
critical states exhibit general principles of behavior that hold true regardless
of the particular kinds of components that comprise the system, a phe-
nomenon known as universality in theoretical physics (Sornette, 2004).
Thus interacting atoms or interacting words or interacting people may all
share in common certain principles of emergent behavior.
2. Competing constraints on wordform lexicons
If principles of criticality are general to language systems, then scaling laws
analogous to Zipf's law should be found in language systems wherever
there is a phase transition between low and high entropy. In the present
study, we adopt and adapt Ferrer i Cancho and Solé's (2003) information
theoretic analysis to investigate an analogously hypothesized phase transi-
tion in language systems.
The language domain that we focus on is wordform lexicons. For the
sake of simplicity let us represent wordforms as linear strings of phonemes
or letters. The appearances of words in speech or text can be coarsely rep-
resented as such strings, in which case wordform lexicons consist of all
strings that appear as wordforms in a given language (token information
about individual appearances is discarded). Language users must know
their wordform lexicons to communicate, and thus communication con-
straints should apply to lexicon structure, just as they apply to word usage
(the latter being defined in terms of token information instead of lexicon
structure). We investigated two competing constraints on lexicon structure
that are analogous to the ambiguity and memorability constraints hypothe-
sized for Zipf's law, namely, distinctiveness and efficiency constraints.
On the one hand, the efficiency of lexicon structure should be maximal in order to minimize the resources needed to represent its wordforms. Analogous to Zipf's law, a maximally efficient lexicon is one that uses the fewest number
of letter strings necessary to distinguish among all wordforms. This means
that letter strings are reused across wordforms as much as possible. If one
allows homophones or homographs to occur without limit (i.e., using the
same wordforms to represent multiple word meanings, as in /mit/ for MEAT or MEET in the case of homophones, and WIND as /wɪnd/ or /waɪnd/ in the case of homographs), then a maximally, overly efficient lexicon would use only one
letter string to code all words.
On the other hand, the mutual distinctiveness of wordforms in a lexicon
should be maximal in order to minimize the chance of confusing them with
each other during communication. A distinctive lexicon is one whose word-
forms use unique letter strings, with minimal substring overlap across
wordforms. For instance, the English orthographic wordform YACHT is
distinctive because substrings like YACH, ACHT, YAC, and CHT are not
themselves English wordforms (note that substrings are position-
independent). By contrast, the wordform FAIRED is less distinctive be-
cause FAIR, AIR, AIRED, IRE, and RED are all wordforms themselves.
Note that a wordform like FARED would also be less distinctive (e.g.,
FAR, ARE, RED, FARE), even though FARED is not a substring of
FAIRED.
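The YACHT/FAIRED contrast can be checked mechanically. The sketch below (a toy lexicon for illustration, not the authors' word list) enumerates all position-independent contiguous substrings of a wordform and intersects them with the lexicon:

```python
def substrings(word):
    """All contiguous substrings of a word, at every position and length."""
    return {word[i:j] for i in range(len(word))
            for j in range(i + 1, len(word) + 1)}

# Toy lexicon; a real analysis would use a full wordform list.
lexicon = {"YACHT", "FAIRED", "FAIR", "AIR", "AIRED", "IRE", "RED", "FAR"}

def embedded_wordforms(word, lexicon):
    """Wordforms (other than the word itself) contained in `word`."""
    return substrings(word) & (lexicon - {word})

print(sorted(embedded_wordforms("FAIRED", lexicon)))
# -> ['AIR', 'AIRED', 'FAIR', 'IRE', 'RED']
print(sorted(embedded_wordforms("YACHT", lexicon)))
# -> []
```

Note that FAR is not found in FAIRED: only contiguous spans count as substrings, which is why FARED (containing FAR, ARE, RED) is a separate case in the text.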
We define these competing constraints in terms of all substrings (i.e.,
wordforms of all sizes) because there does not appear to be any privileged
scale of substring analysis. One can see this in the fact that, collectively
speaking, languages of the world use all scales of substrings to express their
phonological, orthographic, and morphological structures. In English, for
instance, some inflectional morphemes are expressed as single letters (e.g.,
-s for pluralization), whereas others conveying whole word meanings are
expressed by strings as large as the wordforms themselves. Between these
extremes one can find morphological structures expressed as substrings at
any given scale, in any given position.
Because distinctiveness and efficiency constraints are defined over all
substrings, an analysis of any given language will include substrings that
are not linguistically relevant to the wordforms containing them. For in-
stance, the wordform RED does not correspond to a linguistic unit in the
wordform FAIRED, yet it is included below in our analysis of an English
orthographic wordform lexicon. Conversely, substrings will not capture all
possible morphological structures (e.g., infixes in languages like Hebrew).
One-to-one correspondence between substrings and linguistic structures is
not necessary for our analysis because substrings are not meant to capture
all the factors that might help to shape a wordform lexicon; this would not
be feasible. Substrings are only meant to capture one facet of the hypothe-
sized balancing act between distinctiveness and efficiency, albeit a salient
one.
The face validity of our analysis can be seen in the functional impor-
tance of balancing distinctiveness and efficiency: If distinctiveness is over-
emphasized, then structure will not be sufficiently shared across word-
forms. If efficiency is over-emphasized, then structure is not sufficiently
heterogeneous across wordforms. Our research question is whether the
need to balance these competing constraints poises wordform lexicons near
a phase transition between states of low and high entropy. If so, then a scal-
ing law is predicted to occur in the distributions of substrings that comprise
wordform lexicons.


2.1. Scale-free wordform networks
We explain how a scaling law is predicted in the next section, but it is help-
ful to first point out that our prediction corresponds to what is commonly
referred to as a scale-free network. To illustrate by contrast, note how the
word frequency distributions following Zipf's law do not have any explicit
connections among the words. This is because only frequency counts are
relevant to Zipf's law. Substring frequency distributions are different be-
cause substring counts are related to the substring structure of wordform
lexicons.
For instance, each substring count for the English wordform RED corre-
sponds to its connection with another English wordform (RED is a sub-
string of FAIRED, REDUCE, PREDICT, and so on). These connections
form a structure that can be formalized as a network (i.e., a directed graph) for
which each node is a different wordform, and one node is linked to another
whenever the former is a substring of the latter. A small piece of the net-
work created from an English wordform lexicon is diagrammed in Fig 1.

Figure 1. Piece of English orthographic wordform network

The inclusion of all substring relations among wordforms creates a densely
interconnected network with a tree-like branching structure from shortest to
longest wordforms. The shortest wordforms serve as the tree trunks; they
have no incoming links because no wordforms are contained within them.
The longest and most unique wordforms are at the branch tips; they have no
outgoing links because they are not substrings of other wordforms. The
progression from trunks to tips is highly correlated with wordform length,
but not strictly tied to it: Some longer but common substrings are more
trunk-like than shorter but unusual substrings (e.g., SING is more trunk-like
than YO).
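The construction just described can be sketched in a few lines (toy lexicon for illustration; Python's `in` operator plays the role of the contiguous substring test):

```python
from collections import defaultdict

lexicon = ["RED", "FAIR", "AIRED", "FAIRED", "REDUCE", "PREDICT"]

# Directed graph: link u -> v whenever u is a substring of v (u != v).
out_links = defaultdict(set)
in_links = defaultdict(set)
for u in lexicon:
    for v in lexicon:
        if u != v and u in v:   # `in` tests contiguous substrings
            out_links[u].add(v)
            in_links[v].add(u)

# RED links out to every wordform containing it:
print(sorted(out_links["RED"]))
# -> ['AIRED', 'FAIRED', 'PREDICT', 'REDUCE']

# Out-degree relates to efficiency (reuse of a wordform as a substring);
# in-degree relates to distinctiveness (fewer incoming links = more distinct).
k_out = {w: len(out_links[w]) for w in lexicon}
```

Branch tips such as PREDICT have no outgoing links in this toy graph, while short, common wordforms such as RED act as trunks, mirroring the tree-like structure described above.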
This wordform network is relevant to our research question because the
links are directly related to the distinctiveness and efficiency of the word-
form lexicon. In particular, distinctiveness increases as the number of in-
coming links decreases, and efficiency increases as the number of outgoing
links increases. Thus wordform networks serve as tools for conceptualizing
and analyzing the hypothesized distinctiveness and efficiency constraints
on lexicon structure.
In terms of the network formalism, our predicted scaling law can be
found in the counts (i.e., degrees) of outgoing links per node (i.e., the number of times that a given wordform appears as a substring of another wordform in the lexicon). Rather than use the frequency rank distribution as for Zipf's law, network link distributions are often expressed in terms of the cumulative probability distribution: the probability of choosing a wordform node at random whose number of outgoing links is at least k is predicted to be

    P(≥ k) ∝ k^(−γ)

where γ ≈ 1; a network whose link distribution follows such a law is typically referred to as a scale-free network. The cumulative probability distribution is a popular means of expressing scale-free networks, in part because exponents can be more directly and reliably estimated from it (see Kirby, 2001).
Casting our predicted scaling law as a scale-free network is also poten-
tially useful because scale-free networks have attracted a great deal of at-
tention in recent years throughout the sciences. Many systems in nature and
society can be represented as networks, and it turns out that such networks
are often scale-free. For instance, scale-free network structures have been
found in computer networks (Barabási, Albert, & Jeong, 2000; Albert, Jeong, & Barabási, 1999), business networks (Wasserman & Faust, 1994), social networks (Barabási, Jeong, Néda, Ravasz, Schubert, & Vicsek, 2002), and biological networks of various kinds (Jeong, Tombor, Albert, Oltvai, & Barabási, 2000; Solé, 2001).
In the context of language, Steyvers and Tenenbaum (2005) found that
semantic networks of words have scale-free structures when constructed
using either behavioral or encyclopedic methods. They built one semantic
network from word association data by linking any two word nodes for
which one was given as an associate of the other (e.g., a participant might
associate the word NURSE with DOCTOR). Two other networks were
similarly built using encyclopedic methods, one based on a thesaurus and
the other on an on-line encyclopedia. All three methods yielded semantic
networks whose link distributions obeyed a scaling law.
Semantic networks have the connotation of spreading activation across
the nodes via their links, and many other networks also entail transmission
of information or materials among the nodes. However, it is important to
clarify that our wordform networks do not come with an assumption of
spreading activation or information transmission among wordforms. We
employ the network formalism only for its structural properties.



2.2. Information theoretic analysis
To show how a scale-free wordform network is predicted in the balance of
distinctiveness and efficiency constraints, we parallel Ferrer i Cancho and
Solé's (2003) information theoretic analysis that showed how Zipf's law
can be predicted from Zipf's principle of least effort.
We represent a wordform network as a binary matrix A = {a_ij}. Each row i represents a wordform w_i, where 1 ≤ i ≤ n and n is the number of words in the lexicon. Each column j also represents a wordform numbered from 1 to n. Each a_ij = 1 if w_i is a substring of w_j (wordforms are treated as substrings of themselves, i.e., a_ij = 1 for all i = j), and a_ij = 0 otherwise. The probability that wordform w_i appears as a substring, relative to all other wordforms, is given by (all sums are from 1 to n)

    P(w_i) = Σ_j a_ij / Σ_k Σ_l a_kl

The efficiency of a wordform lexicon is defined in terms of the entropy of the substring probability distribution,

    H_n(w) = −Σ_i P(w_i) log_n P(w_i)

H_n(w) = 0 when a single wordform is used for all words, and H_n(w) = 1 when all wordforms appear as substrings of other wordforms equally often (the upper boundary is 1 because the log is base n). Thus a lexicon is efficient to the extent that it uses as few wordforms as possible, where fewer corresponds not only to the number of different wordforms, but also to the frequency with which each is used.

The distinctiveness of a wordform w_i is defined in terms of its diagnosticity, that is, the amount that uncertainty is reduced about the identity of a word W given that it contains w_i. The negative of this amount can be quantified by the entropy over the probability distribution of wordforms conditioned by the presence of w_i,

    H_n(W|w_i) = −Σ_j P(w_j|w_i) log_n P(w_j|w_i)

H_n(W|w_i) = 1 when the presence of w_i provides no information about the identity of W, and H_n(W|w_i) = 0 when the presence of w_i assures the identity of W. Each conditional probability is given by

    P(w_j|w_i) = a_ij / Σ_k a_ik



Finally, the overall distinctiveness of a wordform lexicon is defined as the average distinctiveness over wordforms (the average is used to normalize both H_n(w) and H_n(W|w) between 0 and 1),

    H_n(W|w) = Σ_i H_n(W|w_i) / n

The balancing of distinctiveness and efficiency now translates into the simultaneous minimization of H_n(w) and H_n(W|w). These constraints are in opposition to each other because H_n(W|w) = 1 when H_n(w) = 0, i.e., when a single wordform is used for all words. However, when wordforms appear as substrings equally often, H_n(w) = 1, there is no guarantee that substrings will be as diagnostic as possible, H_n(W|w) = 0. This is true because wordforms may be equally overused as substrings. Thus these constraints are not isomorphs of each other. The balance of minimizing H_n(w) versus H_n(W|w) is parameterized by 0 ≤ λ ≤ 1 in

    Ω(λ) = λ H_n(W|w) + (1 − λ) H_n(w)

In their parallel analysis, Ferrer i Cancho and Solé (2003) created matrices A that minimized Ω(λ) at numerous sampled values of 0 ≤ λ ≤ 1 (see also Ferrer i Cancho, 2006). They showed that at λ ≈ 0.4, a sharp transition existed in the values of their entropic measures that were analogous to H_n(w) and H_n(W|w). Moreover, they found that the frequency of word usage was distributed according to Zipf's law at the transition point. Thus it appears that this point is a phase transition exhibiting a scaling law.

Our analysis parallels Ferrer i Cancho and Solé's (2003) in order to make the same kind of scaling law prediction, but in terms of substring structure in a wordform lexicon, rather than word usage in communication. Thus our analysis predicts a scaling law in the distribution of outgoing links across wordform nodes, that is, it predicts a scale-free network. This scale-free network is predicted at the transition point between phases of lexicon distinctiveness versus lexicon efficiency. Languages are predicted to evolve towards this transition point because all lexicons need to distinguish between wordforms while simultaneously minimizing the resources needed.
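On a toy matrix, the definitions above can be computed directly. This is a sketch under the stated definitions only; it does not reproduce the full optimization or the λ ≈ 0.4 transition:

```python
import math

# Toy substring matrix A for n = 3 wordforms: a_ij = 1 iff w_i is a
# substring of w_j (wordforms count as substrings of themselves).
A = [[1, 1, 1],   # w_0 is a substring of every wordform
     [0, 1, 0],
     [0, 0, 1]]
n = len(A)

total = sum(sum(row) for row in A)
P_w = [sum(A[i]) / total for i in range(n)]          # P(w_i)

def H(probs):
    """Entropy with base-n logs, so values lie in [0, 1]."""
    return -sum(p * math.log(p, n) for p in probs if p > 0)

H_w = H(P_w)                                          # efficiency term

def H_given(i):
    """H_n(W | w_i): entropy of wordforms conditioned on containing w_i."""
    row_sum = sum(A[i])
    cond = [A[i][j] / row_sum for j in range(n)]      # P(w_j | w_i)
    return H(cond)

# Overall distinctiveness: average of H_n(W | w_i) over wordforms.
H_W_given_w = sum(H_given(i) for i in range(n)) / n

def omega(lam):
    """Objective balancing distinctiveness (lam) vs. efficiency (1 - lam)."""
    return lam * H_W_given_w + (1 - lam) * H_w
```

Here the overused substring w_0 is maximally undiagnostic (H_n(W|w_0) = 1), while w_1 and w_2 each identify a word uniquely (conditional entropy 0), which is exactly the trade-off the objective Ω(λ) balances.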

3. Empirical evidence for scale-free wordform networks
Our predicted scaling law is relatively straightforward to test. It simply
requires the creation of wordform networks from real languages, and the
examination of their structure for a scaling law in their link distributions.
We begin with networks created from phonological and orthographic word-
forms in English, and we then report the same analyses for four other lan-
guages.


3.1. English wordform networks
A total of 104,347 printed words and 91,606 phonetically transcribed words
were drawn from the intersection of the Carnegie Mellon University pro-
nunciation dictionary and the Wall Street Journal corpus. The letter strings
comprised an orthographic wordform lexicon, and the phoneme strings
were used to create two different phonological wordform lexicons, one with
lexical stress markings on the vowels (primary, secondary, and tertiary) and
one without stress markings. The frequency of wordform usage was not
part of the wordform lexicons.
A wordform network was created for each of the three lexicons. Each
node in each network corresponded to an individual wordform, and within
each network one node was linked to another if the former wordform was a
substring of the latter. For the stress-marked lexicon, one wordform was a
substring of another only if both the phonemes and stress markings of the
former were contained in the latter. Each node i of a network had k_i outgoing links, where 1 ≤ k_i ≤ n and n is the total number of wordforms in the corresponding lexicon. As mentioned earlier, the predicted scaling law is usefully expressed in terms of the cumulative probability distribution, which is linear under a logarithmic transform (the intercept is zero),

    log P(≥ k) ∝ −γ log k

This expression facilitates visualization and analysis of the data.
Cumulative probability distributions for the three wordform networks
are plotted on a log-log scale in Fig 2. Clear evidence for a scaling law can
be seen in the negative linear relation between log P(≥ k) and log k. The
exponent of the scaling relation for each distribution was estimated by the
slope of a linear regression line fit to the data between 1 ≤ log k ≤ 3. In
theory, scaling laws range over all scales (i.e., the entire distribution), but

empirical observations rarely if ever achieve this ideal because of limited
amounts of data and other practical limitations. These limitations typically
show up in cumulative probability distributions as deviations in the tails
from the scaling relation. These deviations are slight for the wordform net-
works plotted in Fig 2, but to avoid them exponents were estimated from
the middle of the distribution.
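The estimation procedure (a regression slope over the middle of the log-log cumulative distribution) can be sketched with synthetic degrees standing in for the real lexicons:

```python
import bisect
import math
import random

random.seed(1)

# Synthetic out-degrees with P(>= k) = k**(-gamma) for k >= 1, drawn by
# inverse transform sampling (a stand-in for a real wordform network).
gamma = 1.0
degrees = sorted((1 - random.random()) ** (-1 / gamma)
                 for _ in range(50_000))
N = len(degrees)

def p_geq(k):
    """Empirical cumulative probability P(degree >= k)."""
    return (N - bisect.bisect_left(degrees, k)) / N

# Sample log10 P(>= k) vs. log10 k over the middle of the distribution
# (1 <= log10 k <= 3), as in the text.
xs, ys = [], []
for step in range(9):
    x = 1.0 + 0.25 * step
    p = p_geq(10 ** x)
    if p > 0:                      # guard against an empty tail
        xs.append(x)
        ys.append(math.log10(p))

# Least-squares slope estimates the scaling exponent (close to -gamma).
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
```

Restricting the fit to the middle of the distribution, as the authors do, avoids the tail deviations that finite samples inevitably produce.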
Figure 2. English wordform networks

The estimated exponents are close to the canonical value of −1 for scale-free networks. The exponent estimate for the orthographic wordform network is slightly less negative than the others, indicating that it is slightly more densely interconnected (and likewise for the phonological network without stress versus with stress).
investigation but they may be partly due to the differences in morphological
transparency among the lexicons: In English, orthographic wordforms
represent morphological structure more directly, e.g., SIGN is a substring
of SIGNATURE in the orthographic network but not the phonological net-
works. Differences aside, the results generally confirm the predicted scaling
law.



[Figure 2: log–log plot of the cumulative link distributions, log P(≥ k)
against log k, for the three networks. Estimated exponents: orthographic,
−0.82; phonological with stress, −1.05; phonological without stress, −0.90.]
Scale-free networks in phonological and orthographic wordform lexicons 183

3.2. Wordform networks in other languages
The same wordform network analyses were also conducted on orthographic
wordform lexicons for Dutch, German, Russian, and Spanish. These par-
ticular lexicons were chosen only because they were readily analyzable and
downloadable at ftp://ftp.ox.ac.uk/pub/wordlists. These languages represent
a sample of the Indo-European language family. In terms of their morpho-
logical structure, they are mostly characterized as synthetic languages (i.e.,
high morpheme-to-word ratios). Comparing these languages to English,
which is more of an isolating language (i.e., low morpheme-to-word ratio),
provides an initial gauge of the degree to which language type influences
the results of our analyses.

Figure 3. Other wordform networks

The cumulative probability distribution for each wordform network is plot-
ted in Fig 3 with the lexicon size and the estimated scaling exponent (the
axes are the same as in Fig 2). All four languages show evidence of a scal-
ing relation in the center of their link distributions, with estimated exponents
near −1. Estimates varied slightly across languages, as did the amount of
deviation in the tails of the distributions.
[Figure 3 panels: Dutch (178,339 wordforms, exponent −1.01); Spanish
(85,947, −0.91); Russian (31,801, −1.07); German (159,102, −1.06).]

The wordform statistics of the orthographic lexicons are reported in Ta-
ble 1. N is the number of wordforms analyzed, M and SD are the mean and
standard deviation of wordform lengths respectively, and the final column
gives the estimated scaling exponent of the link distributions. Evidence for
the isolating quality of English morphology is reflected in its shorter mean
wordform length compared with the other languages (fewer and smaller
morpheme combinations), which are more synthetic by comparison. The
slightly less negative scaling exponent for English may be due to its isolating
quality, but the cross-linguistic differences in the exponents may also be due
to idiosyncrasies in the corpora used. For our current purposes, it is sufficient
that all the languages exhibit a scaling law as predicted.

Table 1. Summary statistics for orthographic lexicons
               N       M     SD   Exponent
English    104,347    7.3   2.3    −0.82
Dutch      178,339   10.2   3.0    −1.01
German     159,102   11.9   3.5    −1.06
Russian     31,801    8.1   2.4    −1.07
Spanish     85,947    8.9   2.5    −0.91


3.3. Ruling out an artifactual explanation
Altogether, our network analyses appear to provide considerable evidence
for the scaling law predicted to occur in the balance of distinctiveness and
efficiency constraints on the structure of wordform lexicons. But before
coming to this conclusion, we must first determine whether these results
may be an inevitable and therefore trivial property of wordform networks
created from substring relations. In particular, it may be that lexicons com-
posed of variable-length random letter strings also produce the predicted
scaling law. This may seem possible because, even for random letter
strings, shorter wordforms will tend to have more outgoing links compared
with longer wordforms, and the longest wordforms will have no outgoing
links. Thus variations in wordform length alone may be sufficient to create
the predicted scaling law.
We tested this artifactual explanation by creating a wordform lexicon
composed of random letter strings, using essentially the same method as
used by Mandelbrot (1953), Miller (1957), and Li (1992). Each wordform

was incrementally built up by repeatedly adding a letter with probability p
= 0.82, or completing the wordform with probability 1 − p = 0.18. Each let-
ter was chosen at random with equal probability, and the completion prob-
ability was chosen so that the average wordform length would be the same
as that for our corpus of English orthographic wordforms. A total of
104,347 random wordforms were created, which is the size of our English
orthographic wordform lexicon.
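A sketch of this generator (our own illustration; the alphabet, seed, and sample size are arbitrary, and the precise stopping scheme is an assumption since the chapter does not spell it out):

```python
import random

def random_wordform(p_continue=0.82,
                    alphabet="abcdefghijklmnopqrstuvwxyz", rng=random):
    """Grow a wordform letter by letter: start with one uniformly chosen
    letter, then keep appending with probability p_continue and stop
    with probability 1 - p_continue, so lengths are geometric with mean
    1 / (1 - p_continue)."""
    word = rng.choice(alphabet)
    while rng.random() < p_continue:
        word += rng.choice(alphabet)
    return word

rng = random.Random(0)  # fixed seed so the run is reproducible
lexicon = [random_wordform(rng=rng) for _ in range(10000)]
```

Feeding such a lexicon into the substring-network construction reproduces the tiered, non-scaling distribution described below.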
The cumulative probability distribution for the random wordform net-
work is plotted in Fig 4. The graph shows that the distribution does not at
all resemble the scaling relation observed for the English orthographic
wordform network, whose distribution is also plotted for purposes of com-
parison. Instead of a scaling relation, the random wordforms yielded a
tiered distribution that is indicative of characteristic numbers of outgoing
links per node. For instance, the majority of nodes had only one or a few
outgoing links, but a second large group of nodes had 30-35 links. Hardly
any nodes had between 6 and 18 links. Five other random lexicons were
generated and each one resulted in a similarly tiered distribution.
Figure 4. Artificial wordform networks
The failure of random wordform lexicons to yield a scaling relation shows
that our results with real lexicons were not an artifact of length variability
in wordform lexicons. It therefore appears that the observed scaling rela-
tions reflect a property of the structural relations among wordforms in natu-
ral languages. To provide further support for this conclusion, we tried to
recreate the scaling relation by creating an artificial wordform lexicon us-
ing the bigram frequencies of English orthography. Wordforms were again
built up incrementally, except that the probability of each letter being cho-
sen was conditioned on the previous letter, and the conditional probabilities
were estimated from the Wall Street Journal corpus. So for instance, if the
letter Q happened to be chosen as the first letter of a given wordform, there
was a 97% chance that the second letter would be U. This method created a
wordform lexicon that mimicked the statistical properties of English word-
forms.
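A sketch of a bigram-conditioned generator in this spirit (our own illustration; the start/stop markers, the toy training list, and the treatment of word endings are assumptions, and the real model was trained on Wall Street Journal bigram frequencies):

```python
import random
from collections import Counter, defaultdict

def train_bigrams(corpus_words):
    """Estimate P(next letter | previous letter) from a word list.
    '^' marks the start of a word, '$' the decision to stop."""
    counts = defaultdict(Counter)
    for w in corpus_words:
        for prev, nxt in zip("^" + w, w + "$"):
            counts[prev][nxt] += 1
    return counts

def bigram_wordform(counts, rng=random):
    """Build a wordform letter by letter, each choice conditioned on
    the previous letter, until the stop marker is drawn."""
    word, prev = "", "^"
    while True:
        letters, weights = zip(*counts[prev].items())
        nxt = rng.choices(letters, weights)[0]
        if nxt == "$":
            return word
        word += nxt
        prev = nxt

corpus = ["ban", "banana", "nab", "an"]  # toy corpus, not the WSJ data
counts = train_bigrams(corpus)
print(bigram_wordform(counts, random.Random(1)))
```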
The cumulative probability distribution for the bigram wordform net-
work is also plotted in Fig 4. This distribution is much closer to the pre-
dicted scaling law in that the tiers are gone and the slope of the overall de-
scent is near -1. However, there is a bump over most of the center of the
distribution that deviates from the nearly perfect linear relation of the Eng-
lish wordform network. This result indicates that the statistical structure of
English wordforms did, in fact, play a role in the observed scaling relation.
However it also suggests that not all relevant aspects of wordform structure
are captured by bigram frequencies because the scaling relation was not
entirely recovered. Work is underway to determine whether more of the
scaling relation can be recovered with artificial lexicons that more closely
mimic the statistical structure of English.
4. Conclusions
In this chapter, theories of criticality were used to predict a heretofore un-
examined scaling law in the structure of phonological and orthographic
wordform lexicons. Evidence for the predicted scaling law was found in the
wordforms of five different languages, and analyses of artificial lexicons
showed that the scaling law is not artifactual. The law is hypothesized to
emerge from the balance of two competing constraints on the evolution of
wordform lexicons: Lexicons must be as distinctive as possible by mini-
mizing substring overlap among wordforms, while also being as efficient as
possible by reusing substrings as much as possible. A phase transition is
hypothesized at the balance of these high and low entropy phases, respec-
tively. Empirical and theoretical work on critical phenomena predicts a
scaling law distribution near the hypothesized phase transition.
The predicted scaling law distribution was expressed in terms of scale-
free networks in which wordforms were connected whenever one was a
substring of another. In general, some of these substring links reflect the
linguistic structure that underlies wordforms. For instance, root morphemes

like FORM will often be substrings of their inflected and derived forms like
FORMED and FORMATION, respectively. Also, monosyllabic wordforms
like /fit/ are substrings of multisyllabic wordforms like /d'fit/. However,
substring relations do not always respect linguistic structure, and not all
linguistic structure is reflected in substring relations. For instance, LAND
is a substring of BLAND even though there is no morphological relation
between them, and /ɹid/ is not a substring of /ɹɛd/ even though the latter
verb is the past tense of the former.
This partial correspondence between our wordform networks and lin-
guistic structure makes their relationship unclear. Substring relations
among wordforms fall within the purview of linguistics, but they do not
appear to have a place in current linguistic theories. Nonetheless, the ob-
served scaling relations are lawful and non-trivial, as we have argued, and
may be universal as well. If so, then it may prove informative to investigate
whether and how scale-free wordform networks may be accommodated by
linguistic theory.
For instance, there are some phonological processes that may fit with
our explanation of scale-free wordform networks. Processes like assimila-
tion, elision, syncope, and apocope may generally help to make wordform
lexicons more efficient by creating more overlap among wordforms,
whereas processes like dissimilation, epenthesis, and prothesis may help to
make wordform lexicons more distinctive by creating less overlap among
wordforms.
Finally, similar ideas have been explored in Lindblom's Theory of
Adaptive Dispersion (Lindblom, 1986; Lindblom, 1990) and in Ohala's
Maximum Use of Available Features (Ohala, 1980). It has been proposed
that the phonological contrasts of a language are chosen to simultaneously
1) maximize the distinctiveness of contrasts, and 2) minimize articulatory
effort. Constraint 1 is analogous to distinctiveness as we have defined it,
except that phonological contrasts are more fine-grained than substrings.
Constraint 2 stands in opposition to Constraint 1 and phonological systems
must strike a balance between these opposing constraints, analogous to how
lexicons must strike a balance between distinctiveness and efficiency.
Flemming (2004) proposed a third constraint on maximizing the number of
contrasts, which was intended to ensure lexicons of sufficient size without
excessively long words. Flemming's constraint is useful for bridging the gap
between phonological inventories and lexicon efficiency.
The similarities between our theory and theories like Lindblom's and
Flemming's suggest possible avenues of fruitful exchange. In one direction,

for instance, Lindblom's theory may benefit from principles of critical phe-
nomena. In the other direction, our analysis may benefit from the inclusion
of articulatory effort and distinctiveness among sounds, which clearly have
important influences on the structure of wordforms. Such theoretical ex-
changes exemplify the kind of transdisciplinary work that is currently going
on throughout the complexity sciences.
References
Albert, R., Jeong, H., & Barabási, A.-L.
1999 Diameter of the World Wide Web. Nature, 401, 130.
Bak, P. & Paczuski, M.
1995 Complexity, contingency, and criticality. Proceedings of the Na-
tional Academy of Sciences, 92, 6689-6696.
Barabási, A., Albert, R., & Jeong, H.
1999 Mean-field theory for scale-free random networks. Physica A,
272(1), 173-187.
2000 Scale-free characteristics of random networks: the topology of the
world-wide web. Physica A, 281(1-4), 69-77.
Barabási, A., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T.
2002 Evolution of the social network of scientific collaborations. Physica
A, 311(3-4), 590-614.
Fellbaum, C.
1998 WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Ferrer i Cancho, R.
2006 When language breaks into pieces: A conflict between communica-
tion through isolated signals and language. BioSystems, 84, 242-253.
Ferrer i Cancho, R., & Sol, R.V.
2003 Least effort and the origins of scaling in human language. Proceedings
of the National Academy of Science, 100(3), 788-791.
Flemming, E.
2004 Contrast and perceptual distinctiveness. In Hayes, Kirchner &
Steriade (Eds), Phonetically-Based Phonology (pp 232-276). Cam-
bridge University Press.
Huang, K.
1963 Statistical Mechanics. New York: Wiley.

Jeong, H., Tombor, B., Albert, R., Oltvai, Z., & Barabási, A.
2000 The large-scale organization of metabolic networks. Nature,
407(6804), 651-654.


Kirby, S.
2001 Spontaneous evolution of linguistic structure: An iterated learning
model of the emergence of regularity and irregularity. IEEE Transac-
tions on Evolutionary Computation, 5(2), 102-110.
Li, W.
1992 Random texts exhibit Zipf's-law-like word frequency distribution.
IEEE Transactions on Information Theory, 38(6), 1842-1845.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., & Studdert-Kennedy, M.
1967 Perception of the speech code. Psychological Review, 74(6), 431-461.
Lindblom, B.
1986 Phonetic universals in vowel systems. In J.J. Ohala and J.J. Jaeger
(Eds), Experimental Phonology. Orlando, Florida: Academic Press.
1990 Phonetic content in phonology. PERILUS, 11.
Ma, S.
1976 Modern theory of critical phenomena. Reading: Benjamin / Cummings.
Mandelbrot, B.
1953 An Informational Theory of the Statistical Structure of Language. In
W. Jackson (Ed), Communication Theory. London: Bettersworths.
Miller, G.
1957 Some effects of intermittent silence. American Journal of Psychol-
ogy, 70, 311-314.
Morton, J.
1969 Interaction of information in word recognition. Psychological Re-
view, 76(2), 165-178.
Newman, M.
2005 Power laws, Pareto distributions and Zipf's law. Contemporary
Physics, 46(5), 323-351.
Ohala, J.J.
1980 Chairman's introduction to symposium on phonetic universals in
phonological systems and their explanation. In Proceedings of the
Ninth International Congress of Phonetic Sciences, 1979, 184-185.
Copenhagen: Institute of Phonetics, University of Copenhagen.
Oudeyer, P.
2005 The self-organization of speech sounds. Journal of Theoretical Biol-
ogy, 233(3), 435-439.
Sol, R. V.
2001 Complexity and fragility in ecological networks. Proceedings of the
Royal Society B: Biological Sciences, 268(1480), 2039-2045.
Sornette, D.
2004 Critical phenomena in natural sciences: chaos, fractals, selforgani-
zation, and disorder: concepts and tools (2nd ed.). Berlin; New
York: Springer.


Steyvers, M. & Tenenbaum, J. B.
2005 The large-scale structure of semantic networks: Statistical analyses
and a model of semantic growth. Cognitive Science, 29(1), 41-78.
Tsonis, A., Schultz, C., & Tsonis, P.
1997 Zipf's law and the structure and evolution of languages. Complexity,
2(5), 12-13.
Wasserman, S. & Faust, K.
1994 Social Network Analysis: Methods and Applications. Cambridge:
Cambridge University Press.
Zipf, G. K.
1949 Human Behaviour and the Principle of Least Effort. New York:
Hafner.



Part 3:
Phonological representations in the light of complex
adaptive systems


The dynamical approach to speech perception:
From fine phonetic detail to abstract phonological
categories
Nol Nguyen, Sophie Wauquier, and Betty Tuller


Much research has been devoted to exploring the representations and
processes employed by listeners in the perception of speech. In this domain
there is a longstanding debate between two opposing approaches. Abstrac-
tionist models, on the one hand, are based on the assumption that an ab-
stract and speaker-independent phonological representation is associated
with each word in the listener's mental lexicon. In exemplar models of
speech perception, on the other hand, words and frequently-used grammati-
cal constructions are represented in memory as large sets of exemplars con-
taining fine phonetic information. In the present paper, the opposition be-
tween abstractionist and exemplar models will be discussed in light of
recent experimental findings that call this opposition into question. In keep-
ing with recent proposals made by other researchers, we will argue that
both fine phonetic detail and abstract phonological categories are likely to
play an important role in speech perception. A novel hybrid approach that
aims to move beyond the abstractionist vs. exemplar dichotomy and draws on
the theory of nonlinear dynamical systems, as applied to the perception of
speech by Tuller et al. (1994), will be outlined.
1. Two approaches to the perception of speech
In spite of the huge variability shown by the speech signal both within and
across speakers, listeners are in most situations able to identify spoken
words and get access to their meaning in an effortless manner. A major
challenge in studies of speech perception is to understand how such varia-
tions in the pronunciation of words are dealt with by listeners in the sound-
to-meaning mapping. There is still much disagreement about the nature of
the processes and representations that this mapping may involve.
According to proponents of the highly influential abstractionist ap-
proach, the speech signal is converted by listeners into a set of context-

independent abstract phonological units, such that variations in the
acoustic instantiation of a given word are factored out at an early stage of
processing (e.g., Fitzpatrick & Wheeldon, 2000; Lahiri & Reetz, 2002;
Stevens, 2002). It is often hypothesized that inter-individual variations are
removed prior to building up this abstract representation by means of a
speaker normalization procedure (see Johnson 2005b for a recent review).
A clear demarcation is established in that framework between the surface
phonetic form of a word and the underlying phonological representation
associated with that word. The abstractionist approach contends that the
phonological representation for each word is both unique and permanently
stored in memory. Readers are referred to Cutler et al. (to appear), Eulitz &
Lahiri (2004), Lahiri & Reetz (2002), Pallier et al. (2001), and Stevens
(2002), for recent important papers in the abstractionist framework.
Abstractionist models such as Lahiri and Reetz's (2002) Featurally Un-
derspecified Lexicon (FUL) model offer a representation-based solution to
the speech variability problem (see discussion in Pitt & Johnson, 2003). In
FUL, phonological representations in the lexicon are underspecified for
certain features such as [coronal] and listeners are insensitive to surface
variations shown by words for these features. In this way assimilation of
word-final coronals (such as /n/ in the word green) to the place of articula-
tion of the following consonant (as in green bag, phonetically realized as
[ɡɹiːm bæɡ]) is considered to remain consistent with the underspecified
phonological representation of the carrier word. Because of this so-called
no-mismatch relationship between the word's surface and underlying forms,
assimilation is expected to have no disruptive effect on the identification of
the word. Other abstractionist models rely on processing, rather than repre-
sentations, and assume that recognizing the assimilated variant of a word
entails recovering the word's unassimilated shape through phonological
inference.
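The three-way match/mismatch/no-mismatch logic described here can be sketched as a toy matcher (our own simplification; the feature labels and conflict table are illustrative choices, not FUL's actual specification):

```python
# Toy place-of-articulation matcher in the spirit of FUL
# (Lahiri & Reetz, 2002).  Labels and the conflict table are simplified.
CONFLICTS = {("labial", "coronal"), ("coronal", "labial"),
             ("dorsal", "coronal"), ("coronal", "dorsal"),
             ("labial", "dorsal"), ("dorsal", "labial")}

def ful_match(signal_place, lexical_place):
    """lexical_place is None when the entry is underspecified
    (e.g., coronals are stored without a place feature)."""
    if lexical_place is None:
        return "no-mismatch"   # underspecified entry tolerates variation
    if signal_place == lexical_place:
        return "match"
    if (signal_place, lexical_place) in CONFLICTS:
        return "mismatch"
    return "no-mismatch"

# 'green' ends in /n/, stored with place underspecified; the assimilated
# surface labial [m] therefore does not conflict with the entry:
print(ful_match("labial", None))       # no-mismatch
# A fully specified labial entry rejects a coronal signal:
print(ful_match("coronal", "labial"))  # mismatch
```

The asymmetry is the point: surface variation toward a specified place conflicts with the entry, but variation away from an underspecified entry does not, so the assimilated word is still recognized.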
In spite of the fact that exemplar models are representation-oriented,
they differ from abstractionist models on many important dimensions. A
major difference between the two approaches is that prototypical exemplar
models view each exemplar as corresponding to a language chunk that is
stored in memory along with all the details specific to the particular cir-
cumstances in which it has been produced or encountered. These include
sensory-motor, semantic and pragmatic characteristics, but also indexical
information about the speaker's identity and the situation of occurrence, to
mention but a few properties. Exemplars are therefore deeply anchored into
their context of occurrence in the largest possible sense and this has drastic

implications for how spoken language may be represented in the brain. In
some exemplar-based theories, this is at variance with viewing linguistic
utterances as being built from a pre-defined set of context-independent
phonological primitives. In a radical departure from this widely accepted
combinatorial view to language, Bybee & McClelland (2005) extend sensi-
tivity to context and non-uniformity to all levels of linguistic analysis, and
go as far as to claim that "there is no analysis into units at any level or set
of levels that will ever successfully and completely capture the realities of
synchronic structure or provide a framework in which to capture language
change".
As pointed out by Johnson (2005a), the exemplar approach to sensory
memory has been well established in cognitive psychology for more than a
century. Goldinger (1996, 1998) and Johnson (1997b) drew on this general
theoretical framework to develop explicit exemplar-based models of speech
perception (see Pierrehumbert, 2006, for a historical overview), although
these models have a number of prominent precursors: Klatt's (1979) Lexi-
cal Access from Spectra model, and Elman's (1990) Recurrent Neural Net-
work, for example, were also based on the assumption that lexical represen-
tations are phonetically rich. Exemplar models today present a major
alternative to the better-established abstractionist approach, with far-
reaching implications not only for phonetics and phonology, but also and
more generally for our understanding of language structure and language
use (e.g., Barlow & Kremmer, 2000). In their current form, exemplar mod-
els of speech perception also raise a number of important theoretical and
empirical questions, and it is on some of these questions that we focus in
the following section.
2. Fine phonetic detail and abstract phonological categories in speech
perception
In recent years, research has increasingly focused on the listener's sensitivi-
ty to properties of the speech signal that are generically referred to as fine
phonetic detail (FPD, hereafter). This research suggests that FPD has a
significant impact on speech perception and understanding, in some cir-
cumstances at least. FPD includes allophonic variation, sometimes specific
to certain words or classes of words (Pierrehumbert, 2002), as well as soci-
ophonetic variation, broadly construed as being associated with the speak-
er's identity, age, gender, and social category (Foulkes & Docherty, 2006).

Fine phonetic detail is designated as such in the sense that it is to be distin-
guished from the local and most perceptually prominent cues associated
with phonemic contrasts in the speech signal (Hawkins, to appear). It may,
however, encompass substantial acoustic variations, such as those that dis-
tinguish male from female voices. FPD is therefore identified as "detail"
only with respect to a specific theoretical viewpoint, namely the traditional
segmental approach to speech perception and production. A form of antiph-
rasis, the term FPD refers to phonetic properties judged non-essential for the
identification of speech sounds in a theoretical framework whose limits the
exemplar approach endeavors to demonstrate. The goal of current research
on FPD is to show that FPD is important in speech perception, and, there-
fore, that a change of theoretical perspective is warranted.
Recent studies on the role of FPD in spoken word recognition have pro-
vided evidence that perceptually-relevant allophonic variation includes
vowel-consonant acoustic transitions (e.g., Marslen-Wilson & Warren,
1994), within-category variations in voice onset time (Allen & Miller,
2004; Andruski et al., 1994; McMurray et al., 2002, 2003), long-domain
resonance effects associated with liquids (West, 1999), and graded assimi-
lation of place of articulation in word-final coronals (e.g., Gaskell, 2003;
the studies cited here were conducted on either American or British Eng-
lish). To a certain extent, however, the fact that listeners are sensitive to
allophonic variation was established much earlier. For example, studies
conducted in the 1970s and 1980s consistently showed that coarticulation
between neighboring segments provides listeners with perceptually-
relevant cues to segment identity (and by extension to word recognition). A
well-known example is regressive vowel-to-vowel coarticulation in Eng-
lish, which allows the identity of the second vowel to be partly predictable
from the acoustic cues associated with it in the first vowel (Martin & Bun-
nell, 1981, 1982). It has also been repeatedly demonstrated that word rec-
ognition is sensitive to the individual characteristics of the speaker's voice
(e.g., Goldinger, 1996, 1998). Although allophonic and between-speaker
variation are often lumped together under the generic term FPD, recent
evidence suggests that these two types of phonetic variation are dealt with
in different ways by listeners (Luce & McLennan, 2005).
The phonetics of conversational interaction is another area in which evi-
dence for the role of FPD in speech perception is growing (e.g., Couper-
Kuhlen & Ford, 2004). A major finding of these studies is the tendency
shown by participants in a conversation to imitate each other. Imitation
seems to occur at every level of the conversational exchange, including the

phonetic level (Giles et al., 1991). For example, Pardo (2006) had different
talkers produce the same lexical items before, during, and after a conversa-
tional interaction, and found that perceived similarity in pronunciation be-
tween talkers increased over the course of the interaction and persisted
beyond its conclusion. Phonetic imitation, or phonetic convergence, is a
mechanism that may be actively employed by talkers to facilitate conversa-
tional interaction. It has also been suggested that phonetic imitation plays
an important role in phonological and speech development (Goldstein,
2003) and is rooted in the ability that human neonates have to imitate facial
gestures (Meltzoff & Moore, 1997). In addition, phonetic imitation has
been assumed in recent work to be one of the key mechanisms that underlie
the emergence and evolution of speech sound systems (e.g., de Boer, 2000).
The behavioral tendency shown by humans to imitate others may be con-
nected at the brain level with the presence of mirror neurons, whose role in
the production, perception and acquisition of speech now seems well estab-
lished (Studdert-Kennedy, 2002; Vihman, 2002; Jarick & Jones, 2008).
Crucially for the present paper, phonetic convergence demonstrates that
listeners are sensitive to speaker-dependent phonetic characteristics which
have an influence on both the dynamics of conversational interaction, and,
across a longer time range, the representations associated with words in
memory after the interaction has ended. Such sensitivity to context in lis-
teners has led researchers like Tuller and her colleagues to contend that
speech perception studies should focus on the listener's individual behavior
in its situation of occurrence, as opposed to abstract linguistic entities (Case
et al., 2003; see also Tuller, 2004).
It is important to point out that by itself, the fact that listeners are sensi-
tive to FPD is not inconsistent with abstractionist models of speech percep-
tion. For example, Stevens (2004) contends that in addition to what he re-
fers to as the defining articulatory/acoustic attributes associated with
distinctive features (e.g., the spectrum of the release burst of stop conso-
nants), the phonetic implementation of these features involves so-called
language-specific enhancing gestures, which allow the features' perceptual
saliency to be strengthened (e.g., tongue-body positioning for tongue-blade
stop consonants, see Stevens, 2004). Although enhancing gestures can be
regarded as FPD in some models and hence non-essential for the identifica-
tion of speech sounds, they are attributed an important role in Stevens' ab-
stractionist model of lexical access (Stevens, 2002). Likewise, the TRACE
model of speech perception (McClelland & Elman, 1986) relies on the as-
sumption that fine-grained acoustic properties may have an impact on word

recognition, although TRACE too may be regarded as an abstractionist mod-
el (it contains an infralexical phonemic level of processing and, at the lexi-
cal level, each word is represented by a single processing unit). TRACE
accounts for part of the listener's sensitivity to FPD by modelling the flow
of acoustic information within the speech processing system by means of a
set of continuous parameters. It is also designed to explain how fine-
grained coarticulatory variation is taken into account in the on-line identifi-
cation of phonemes (Elman & McClelland, 1988). Thus, the assumption
that FPD has a role to play in speech perception is not specific to exemplar
models and is also found in at least some abstractionist models. Exemplar
models do diverge from abstractionist models, however, in assuming that in
addition to being relevant to on-line speech perception and understanding,
FPD is stored in long-term memory. Thus, in exemplar models, lexical
representations are phonetically rich.
To our knowledge, much of the available evidence for long-term storage
of FPD in the mental lexicon comes from studies of speech production. As
shown by Bybee (2001, 2006a) and Pierrehumbert (2001, 2002), frequen-
cy-dependent differences in the phonetic realization of words that meet the
same structural description (e.g., words underlyingly containing a schwa
followed by a sonorant, such as the high-frequency word every [ev], pro-
duced with no schwa, compared with the mid-frequency word memory
[mem ], produced with a syllabic /r/)
1
, must be learned and stored in the
mental lexicon by speakers in the course of language acquisition. This is
also true of sociophonetic variation, which has to be learned inasmuch as
the relationship between phonetic forms and social categories is arbitrary
(Foulkes & Docherty, 2006). Because these sometimes subtle patterns of
phonetic variation have to be detected by the speaker, either explicitly or
implicitly, before she/he is able to reproduce them, these studies lend strong
albeit indirect support to the assumption that perceived FPD is stored in the
lexicon. More direct evidence is available from a variety of sources. In a
well-known series of experiments, Goldinger (1996, 1998) showed that
prior exposure to a speaker's voice facilitates later recognition of words
spoken by the same speaker compared with a different speaker. Strand
(2000) found that listeners respond more slowly to nonstereotypical male
and female voices than to stereotypical voices in a speeded naming task.
These studies suggest that the individual characteristics of the speaker's
voice as well as the acoustic/phonetic properties associated with the speak-
er's gender are retained in memory by listeners. Both Johnson (1997b) and
Goldinger (1998) consider that direct storage of FPD in the lexicon allows
The dynamical approach to speech perception 199

listeners to deal with between-speaker variations in the production of words
without having to resort to a normalization procedure (but see Mitterer,
2006, for experimental counterevidence).
Little is known yet about the possible forms of exemplars stored in
memory. A survey of the relevant literature indicates that exemplars are
generally considered as multimodal sensory-motor representations of lan-
guage chunks of various size (more on this later), and can therefore be cha-
racterized in a general way as being a) non-symbolic, b) parametric, c) in a
relationship of intrinsic similarity with the input speech signal, and d) ab-
stract, up to a certain extent, since the auditory trace of speech fades away
after about 400 ms (Pardo & Remez, 2007). Because of the limits of our
current knowledge in that domain, exemplar models of speech production
and perception can be, in a paradoxical way, far more abstract than they
purport to be. In these models, exemplars are sometimes represented in a
highly schematized form which bears little resemblance to the fine-grained
acoustic structure of speech. Much research still needs to be done to charac-
terize the representation of exemplars in the listener's brain.
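Though the precise form of exemplars remains an open question, the general properties just listed (non-symbolic, parametric, similarity-based) can be given a concrete, if highly simplified, shape. The following sketch categorizes an input by summing its similarity to stored traces, in the spirit of similarity-based exemplar models such as Johnson's (1997a); the two-dimensional feature space, the exponential similarity function, and all numeric values are our own illustrative assumptions, not a description of any particular implementation:

```python
import math

def similarity(x, trace, c=1.0):
    # Exponentially decaying similarity in a parametric (non-symbolic) space
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, trace)))
    return math.exp(-c * dist)

def categorize(x, memory):
    # memory: list of (feature_vector, word_label) exemplar traces;
    # the input is assigned to the label with the highest summed similarity
    activation = {}
    for trace, label in memory:
        activation[label] = activation.get(label, 0.0) + similarity(x, trace)
    return max(activation, key=activation.get)

memory = [((0.2, 0.8), "say"), ((0.25, 0.75), "say"),
          ((0.8, 0.3), "stay"), ((0.75, 0.35), "stay")]
print(categorize((0.3, 0.7), memory))  # -> say
```

On such a scheme, no category boundary is stored anywhere: the boundary is an emergent consequence of where the accumulated traces happen to lie in the parametric space.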
Exemplar theories also differ among themselves in important ways. For
example, Hintzman's (1986) rationalist approach to memory may be in-
compatible with the neo-empiricist viewpoint advocated by Coleman
(2002)². Importantly, there is a lack of consensus among proponents of the
exemplar approach with regard to the status of phonological representations
in speech perception. In some models, such as Johnson's (1997a, 2005a)
XMOD, exemplars have no internal structure, and are conceived as unana-
lyzed auditory representations associated with whole words. This, however,
does not mean that sublexical units such as segments and syllabic constitu-
ents cannot have psychological reality. Although it is assumed that such
units are not explicitly represented in memory, they can nevertheless be
brought to the listener's consciousness as the speech signal is mapped onto
the lexicon. These units temporarily emerge as a by-product of lexical acti-
vation, as connections between time-aligned, phonetically-similar portions
of exemplars are established. In this framework, listeners are assumed to be
simultaneously sensitive to units of different sizes in the speech signal,
albeit with a natural bias for larger units to prevail over smaller ones
(Grossberg, 2003). What may be viewed as a phonological structure, with a
certain degree of abstraction, is therefore built up by the listener in the on-
line processing of speech, although this structure is said to be but a fleet-
ing phenomenon - emerging and disappearing as words are recognized
(Johnson, 1997a). Other researchers (Hawkins, 2003, 2007; Luce &
200 Noël Nguyen, Sophie Wauquier and Betty Tuller

McLennan, 2005; McLennan & Luce, 2005; Pierrehumbert, 2006) have
proposed a hybrid approach in which exemplars are encoded in memory in
conjunction with permanently-stored abstract phonological representations.
In Hawkins' POLYSP model of speech perception and understanding (Haw-
kins & Smith, 2001; Hawkins, 2003, to appear) for example, FPD is
mapped onto abstract prosodic structures as characterized in the Firthian
Prosodic Analysis phonological framework.
The whole-word exemplar hypothesis, as it is adopted in some models
of speech perception (e.g., Goldinger, 1998; Johnson, 1997b), raises a
number of issues that have been highlighted by different authors. First, it is
not always clear why words should indeed be postulated as basic units of
processing and storage. It seems more likely that fragments of speech of
many different sizes would empirically come to the surface in the utterances to
which listeners are exposed over the course of their life. If the logic that
governs the exemplar approach is to be fully followed, one should assume
that word sequences of high frequency, such as I don't know, should be
stored as single units in what then becomes a highly extended mental lex-
icon (see Bybee, 2001, 2006b). Second, the whole-word exemplar hypothe-
sis in perception is inconsistent with what Pierrehumbert (2006) refers to as
the phonological principle, i.e., that languages have basic building blocks,
which are not meaningful in themselves, but which combine in different
ways to make meaningful forms, as shown by the fact that classically-
defined allophonic rules are found to apply to a large majority of words
sharing the same structural description, even if they may not extend to all
of these words. Pierrehumbert (2006) also points out that it is difficult to
see how whole-word exemplar models can account for the bistable charac-
ter of speech perception, i.e., that an ambiguous speech sound potentially
associated with two categories will be perceived as a member of one and
only one of these categories at any one time (since such response patterns
seem to rely on a winner-take-all competition between two underlying ab-
stract units).
There is a well-known and extensive body of evidence in support of the
idea that infra-lexical phonological representations come into play in spo-
ken word recognition (e.g., Cutler et al., to appear; Lahiri & Marslen-
Wilson, 1991; Lahiri & Reetz, 2002; Pallier, 2000). In addition, numerous
experimental studies (e.g., Lahiri & Reetz, 2002, among others) have
shown that, in some circumstances at least, abstract phonological categories
seem to prevail over fine phonetic detail in the mapping of speech sounds
onto meaning³. In the following section, we focus on two studies that we
and our colleagues recently carried out and whose results also point to a
role for abstract phonological representations in speech perception.
3. Further evidence for the role of abstract phonological
representations in speech perception
Dufour et al. (2007) examined the influence that regional differences in the
phonemic inventory of French may have on how spoken words are recog-
nized. Whereas the phonemic system of standard French is traditionally
characterized as containing three mid vowel pairs, namely /e/-/ɛ/, /ø/-/œ/,
and /o/-/ɔ/, as in épée /epe/ 'sword' vs. épais /epɛ/ 'thick', and côte /kot/
'hill' vs. cote /kɔt/ 'rating', southern French is viewed as having three
mid-high vowel phonemes only, /e/, /ø/ and /o/ (Durand, 1990). [ɛ], [œ] and
[ɔ] appear at the phonetic level but they are in complementary distribution
with respect to the corresponding mid-high variants, according to a variant
of the so-called loi de position (a mid-vowel phoneme is realized as mid-
high in an open syllable and as mid-low in closed syllables and whenever
the next syllable contains schwa, see Durand, 1990). Thus, in southern
French, épée and épais will both be pronounced [epe] and côte and cote will
both be pronounced [kɔt]. Dufour et al. (2007) asked how words such
as épée, épais, côte and cote, as produced by a speaker of standard French,
i.e., with a contrast in vowel height in the word-final syllable, were per-
ceived by speakers of both standard and southern French. Using a lexical
decision task combined with a long-lag repetition priming paradigm, Du-
four et al. found that pairs of words ending in a front mid vowel (e.g. épée -
épais) were not processed in the same way by both groups of subjects.
Standard French speakers perceived the two words as being different from
each other, as expected, whereas southern French speakers treated one
word as a repetition of the other. By contrast, both groups of subjects per-
ceived the two members of /o/-/ɔ/ word pairs as different from each oth-
er. Thus, the results showed that there are within-language differences in
how isolated words are processed, depending on the listener's regional ac-
cent. Note that southern speakers are far from being unfamiliar with stan-
dard French. On the contrary, they are widely exposed to it through the
media and at school in particular. According to Dufour et al., the observed
response patterns for southern speakers may be accounted for by assuming
that the /o/-/ɔ/ contrast is better defined than the /e/-/ɛ/ contrast in these
speakers' receptive phonological knowledge of standard French. The /o/-/ɔ/
contrast is a well-established and highly recognizable feature of standard
French, which is as such well-known to southern speakers, even if this con-
trast is neutralized in these speakers' dialect. By comparison, the distribu-
tion of /e/ and /ɛ/ in word-final position in standard French is characterized
by greater complexity both across and within speakers, and there is evi-
dence showing that word-final /e/ and /ɛ/ are in the process of merging in
Parisian French (although the speaker used in Dufour et al.'s study did
make the distinction between the two vowels, as confirmed by the fact that
standard French subjects did not process the second carrier word as being a
repetition of the first one). Dufour et al. hypothesized that because of the
unstable status of the /e/-/ɛ/ contrast, both vowels were perceptually assimi-
lated to the same abstract phonological category by speakers of southern
French. Thus, Dufour et al.'s results suggest that abstract phonological re-
presentations are brought into play by listeners in spoken word recognition.
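The loi de position invoked above can be sketched as a simple decision rule; the function name and the boolean encoding of syllable structure are ours, for illustration only:

```python
def mid_vowel_height(open_syllable, next_syllable_has_schwa):
    # Southern French 'loi de position' (simplified): a mid vowel surfaces
    # as mid-high in an open syllable, and as mid-low in a closed syllable
    # or when the next syllable contains schwa
    if open_syllable and not next_syllable_has_schwa:
        return "mid-high"
    return "mid-low"

# Word-final open syllable, as in épée/épais -> mid-high [e]
print(mid_vowel_height(open_syllable=True, next_syllable_has_schwa=False))
# Word-final closed syllable, as in côte/cote -> mid-low [ɔ]
print(mid_vowel_height(open_syllable=False, next_syllable_has_schwa=False))
```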
Nguyen et al. (2007b) recently undertook a study on the perceptual
processing of liaison consonants in French. Liaison in French is a well-
known phenomenon of external sandhi that refers to the appearance of a
consonant at the juncture of two words, when the second word begins with
a vowel, e.g. un [œ̃] + enfant [ɑ̃fɑ̃] → [œ̃nɑ̃fɑ̃] 'a child', petit [pəti] + ami
[ami] → [pətitami] 'little friend'. In earlier work, Wauquier-Gravelines
(1996) showed that listeners found it more difficult to detect a target pho-
neme (e.g., /n/) in a carrier phrase, when that phoneme was a liaison conso-
nant (son avion [sɔ̃navjɔ̃] 'her plane') compared with a word-initial conso-
nant (son navire [sɔ̃naviʁ] 'her ship'). The proportion of correct detection
proved significantly lower for the liaison than for the word-initial target
consonant. According to Wauquier-Gravelines, the listeners' response pat-
tern was attributable to the specific phonological status that liaison conso-
nants have in French. More particularly, and in the autosegmental phonolo-
gy framework (Encrev, 1988) espoused by Wauquier-Gravelines, liaison
consonants are treated as floating segments with respect to both the skeletal
and syllabic tiers, as opposed to fixed segments, which are lexically anc-
hored to a skeletal slot, and which include word-initial consonants, but also
word-final (e.g., /n/ in la bonne [labɔn] 'the maid') and word-internal (e.g.,
/n/ in le sénat [ləsena] 'the senate') ones. Using a speeded phoneme detec-
tion task, Nguyen et al. (2007b) aimed to confirm that detecting liaison
consonants in speech is difficult. They examined to what extent differences
in the detection rate of liaison consonants vs. word-initial consonants could,
at least in part, stem from the phonetic properties of these consonants, by
systematically manipulating these properties. The potentially distinctive
status of liaison consonants compared with fixed consonants in perception
was further explored by inserting word-final and word-medial fixed conso-
nants as well as word-initial ones in the material⁴. The results first showed
that the percent correct detection systematically varied depending on the
position of the target consonant in the carrier sentence: listeners tended to
miss liaison consonants more often than fixed consonants, whether these
were in word-initial, word-final or word-medial position. Second, manipu-
lating the liaison consonants' and word-initial consonants' fine phonetic
properties had no measurable influence on how accurately these consonants
were detected by listeners. Nguyen et al. (2007b) pointed out that the lis-
teners' response pattern was partly consistent with an exemplar-based
theory of French liaison such as the one proposed by Bybee (2001). In this
approach, liaison consonants are deeply entrenched in specific grammatical
constructions, and the realization of liaison is highly conditioned by the
strength of the associations between words within such constructions. Al-
though little is said in Bybee's theory about how liaison consonants may be
processed in speech understanding, a prediction that may be derived from
this theory is that listeners will process liaison consonants as part and par-
cel of the constructions in which these consonants appear. As a result, it
may be difficult for listeners to identify liaison consonants as context-
independent phonemic units, as explicitly required in a phoneme-detection
task. This, however, should be true for all the segments a construction may
contain. On this account, listeners should not experience less difficulty in
detecting a word-initial consonant compared with a liaison consonant,
when these consonants appear in word sequences that are highly similar to
each other with respect to their morpho-syntactic and phonetic make-up, as
was the case in Nguyen et al.'s (2007b) material. The lower detection rates
observed for liaison than for word-initial target consonants were consistent
with the assumption that liaison consonants have a specific phonological
status and, to that extent, provided better evidence for the abstractionist
autosegmental account of liaison than for the exemplar-based account.
Readers are referred to Nguyen et al. (2007b: 8-21) for further detail about
the experiment and its potential theoretical implications.
4. The dynamical view of speech perception: Beyond the exemplars
vs. abstractions dichotomy?
The above review of the literature suggests that the dichotomy that is some-
times erected between exemplar-based and abstractionist approaches to
speech perception may be to a large extent artificial. Experimental evidence
is available that provides support for the role of both fine phonetic detail
and abstract phonological categories in speech perception. The recent de-
velopment of so-called hybrid models (Hawkins, 2003, to appear; Luce &
McLennan, 2005; McLennan & Luce, 2005; Pierrehumbert, 2006) is go-
verned by the assumption that FPD and abstract phonological categories
combine with each other in the representations associated with words in
memory.
Over the last decade, Tuller and colleagues (Case et al., 1995; Tuller et
al., 1994; Tuller, 2003, 2004) have developed a model that shares some of
the characteristics of the hybrid approach. This model, referred to as the
TCDK model hereafter, uses concepts from the theory of nonlinear dynam-
ical systems to account for the mechanisms involved in the categorization
of speech sounds. In this model, there are two complementary aspects to
speech perception. On the one hand, speech perception is assumed to be a
highly context-dependent process sensitive to the detailed acoustic structure
of the speech input. On the other hand, it is viewed as a nonlinear dynami-
cal system characterized by a limited number of stable states, or attractors,
which allow the system to perform a discretization of perceptual space and
which are associated with abstract perceptual categories. In this section,
after a brief and schematic presentation of the TCDK model, we report the
results of a study recently conducted on the categorization of speech sounds
in French with a view to testing some of the model's predictions. The im-
plications of the model for the exemplar vs. abstraction debate will then be
discussed.
The model was first designed to account for listeners' response patterns
in a binary-choice speech categorization task. In the experiments reported
in Tuller et al. (1994), listeners were presented with stimuli ranging on an
acoustic continuum between say and stay and their task was to identify
each stimulus as one of these two words. Listeners' responses were mod-
eled using a nonlinear dynamical system characterized as follows:

V(x) = kx - x²/4 + x⁴/4

In this equation, x represents the perceptual form (in this case, say or stay),
k a control parameter, and V(x) a potential function which may have up to
two stable perceptual forms indicated by minima in the potential function,
depending on the value of k. The control parameter k itself depends on the
acoustic characteristics of the stimulus, on the one hand, and the combined
effects of learning, linguistic experience and attentional factors, on the oth-
er hand, in a way described by the following equation:
k(ε) = k₀ + ε + λ/2 + θ(n - n_c)(λ_f - λ)

where k₀ refers to the system's initial state, ε represents the acoustic para-
meter that is manipulated in the stimuli (in the present case, the duration of
the silent gap between the fricative and the diphthong), λ is a parameter that
characterizes the lumped effect of learning, linguistic experience and atten-
tion, θ is the discrete form of the Heaviside step function, n is the number
of perceived stimulus repetitions in a given run, n_c represents a critical
number of accumulated repetitions, and λ_f denotes the value of λ at the
other extreme from its initial value.
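Under one possible reading of this equation, the Heaviside term simply replaces the experience parameter λ with its final value λ_f once the number of accumulated repetitions n passes the critical number n_c. The sketch below encodes that reading; the function names and all argument values are purely illustrative:

```python
def heaviside(z):
    # Discrete Heaviside step function: 0 up to the critical point, 1 beyond
    return 1.0 if z > 0 else 0.0

def control_parameter(eps, lam, lam_f, n, n_c, k0=0.0):
    # One possible reading of the equation for k: the acoustic parameter
    # eps tilts the potential, and once enough repetitions have accumulated
    # (n > n_c) the experience term switches from lam to lam_f
    return k0 + eps + lam / 2 + heaviside(n - n_c) * (lam_f - lam)

# Early in a run (n < n_c) versus after many repetitions (n > n_c):
print(control_parameter(eps=0.3, lam=0.2, lam_f=-0.2, n=2, n_c=5))
print(control_parameter(eps=0.3, lam=0.2, lam_f=-0.2, n=8, n_c=5))
```

The same acoustic stimulus thus maps onto a different value of k after sufficient exposure, which is how the model captures experience-dependent shifts in categorization.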
For a given value of k, the system's state evolves in the x perceptual
space to get trapped into a local minimum, or attractor, of V(x). Each of the
two possible responses in the categorization task corresponds to one attrac-
tor in the perceptual space. Figure 1 shows the shape of the potential func-
tion for five values of k between -1 and 1. The potential function has one
minimum only for extreme values of k, which correspond to stimuli unam-
biguously associated with either of the two categories, and two minima in
the middle range of k, where both categories are possible. As k increases in
a monotonic fashion (from left to right in Figure 1), and in the vicinity of a
critical value k_c, the system's state, represented by the filled circle in Figure
1, abruptly switches from the basin of attraction in which it was initially
located, to the second basin that has gradually formed as the first one dis-
appears.
Figure 1. Shape of the potential function V(x) for five values of k. Adapted from
Tuller et al. (1994).
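The one-minimum/two-minima pattern shown in Figure 1 can be checked numerically. The sketch below counts the local minima of the quartic potential on a grid; the sign conventions of the quartic form, the grid bounds, the resolution and the test values of k are our own choices:

```python
def potential(x, k):
    # Tilted quartic potential, written here as V(x) = k*x - x**2/4 + x**4/4
    return k * x - x ** 2 / 4 + x ** 4 / 4

def local_minima(k, lo=-2.0, hi=2.0, n=2001):
    # Grid search for interior local minima of V(x) at a fixed k
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    vs = [potential(x, k) for x in xs]
    return [xs[i] for i in range(1, n - 1)
            if vs[i] < vs[i - 1] and vs[i] < vs[i + 1]]

for k in (-1.0, -0.1, 0.0, 0.1, 1.0):
    # Extreme k: a single attractor; middle range: two coexisting attractors
    print(k, len(local_minima(k)))
```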
In Tuller et al.'s (1994) experiments, the stimuli on the say-stay continuum
were presented to the listener in either a randomized order, or a sequential
order. In the latter case, listeners heard the entire set of stimuli twice, going
from one of the two endpoints (e.g., say) to the other (stay), and then back
to the first one (say) again. In such sequential presentations, three possible
response patterns were to be expected: a) hysteresis, defined as the tenden-
cy for the listener's response at one endpoint to persist across the ordered
sequence of stimuli towards the other endpoint, b) enhanced contrast, in
which the listener quickly switches to the alternate percept and does not
hold on to the initial categorization, and c) critical boundary, where the
switch between the two percepts is associated with the same stimulus re-
gardless of the direction of presentation across the continuum. The results
showed that critical boundary was much less frequent than hysteresis and
contrast, which occurred equally often. These data provided strong support
for the assumption that speech perception is a highly context-dependent
process, characterized by a rich variety of dynamical properties. Readers
are referred to Tuller et al. (1994) and Case et al. (1995) for further detail
on these experiments.
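These response patterns can be illustrated by letting the percept x relax down the gradient of the potential while k is swept across the continuum in each direction, the state being carried over from one stimulus to the next. This is our own noise-free toy simulation (the quartic form of V, the relaxation scheme and all step sizes are assumptions), not the implementation used in the original experiments:

```python
def dVdx(x, k):
    # Gradient of the tilted quartic potential V(x) = k*x - x**2/4 + x**4/4
    return k - x / 2 + x ** 3

def settle(x, k, steps=4000, dt=0.01):
    # Overdamped relaxation of the percept into an attractor at fixed k
    for _ in range(steps):
        x -= dt * dVdx(x, k)
    return x

def sweep(ks, x0):
    # Sequential presentation: the settled state carries over between stimuli
    x, trajectory = x0, []
    for k in ks:
        x = settle(x, k)
        trajectory.append(x)
    return trajectory

ks = [i / 10 for i in range(-10, 11)]       # control parameter from -1 to 1
up = sweep(ks, x0=1.0)                      # ascending presentation
down = sweep(list(reversed(ks)), x0=-1.0)   # descending presentation
switch_up = next(k for k, x in zip(ks, up) if x < 0)
switch_down = next(k for k, x in zip(reversed(ks), down) if x > 0)
print(switch_up, switch_down)
```

Because each basin of attraction persists beyond k = 0, the switch point falls on opposite sides of the midpoint in the two directions of presentation, which is the hysteresis pattern; a mechanism that biases k with accumulated repetitions would instead push the system towards enhanced contrast.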
Nguyen et al. (2005) and Nguyen et al. (2007a) recently undertook to
extend Tuller and colleagues' (1994) hypotheses and experimental para-
digm to the categorization of speech sounds in French. In Nguyen et al.
(2007a), the material was made up of 21 stimuli on an acoustic continuum
between cèpe /sɛp/ (a type of mushroom) and steppe /stɛp/ (in physical
geography, a steppe, that is, a plain without trees). Each stimulus contained
a silent interval between /s/ and /ɛ/ whose duration increased from 0 ms
(Stimulus 1) to 100 ms (Stimulus 21) in 5-ms steps. These stimuli were
used in a speeded forced-choice identification task administered to eleven
native speakers of French naive as to the purposes of the experiment and
with no known hearing defects. Listeners were presented with the 21 stimu-
li in both random and sequential (1→21→1 or 21→1→21) order. The ex-
periment comprised 20 randomized presentations alternating with 20 se-
quential presentations. The inter-stimuli interval was two seconds and the
experiment lasted about an hour.
An index referred to as the CH (Contrast-Hysteresis) index was devised
to measure the amount that hysteresis or enhanced contrast contributed to
each subject's responses to sequential presentation. This entailed locating
the position on the continuum of the stimulus associated with the switch
from one response to the other in the first part of the presentation (e.g.,
1→21, in a 1→21→1 sequence), on the one hand, and in the second part of
the presentation (e.g., 21→1, in a 1→21→1 sequence), on the other hand.
The distance between these two points was then measured, in such a way
that positive values corresponded to hysteresis, negative values to enhanced
contrast, and 0 to critical boundary. The distribution of the CH index across
the 20 sequential presentations for all the subjects is shown in Figure 2.
These data indicate that hysteresis prevailed over enhanced contrast and
critical boundary. The CH index reached a grand average value of 3.5 that
proved to be significantly higher than 0 in a linear mixed-effects model
using the CH values as the predicted variable, the intercept as predictor
(fixed effect) and the subjects as a blocking factor (random effect; t(208) =
3.93, p < 0.001).
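The computation of the CH index can be sketched as follows; the 0-based indexing, the response labels and the sign convention are our own assumptions about the procedure described above:

```python
def switch_position(responses):
    # 0-based index of the first stimulus at which the response changes
    for i in range(1, len(responses)):
        if responses[i] != responses[i - 1]:
            return i
    return None

def ch_index(up_responses, down_responses, n_stimuli):
    # Distance between the switch points of the ascending (1 -> n) and
    # descending (n -> 1) halves of a sequential presentation: positive
    # values indicate hysteresis, negative values enhanced contrast, and
    # 0 a critical boundary
    up = switch_position(up_responses)
    down = n_stimuli - 1 - switch_position(down_responses)  # back to 1 -> n scale
    return up - down

# A listener who holds on to the initial percept in each direction:
up_run = ["cepe"] * 14 + ["steppe"] * 7    # stimuli 1 -> 21
down_run = ["steppe"] * 13 + ["cepe"] * 8  # stimuli 21 -> 1
print(ch_index(up_run, down_run, 21))      # positive: hysteresis
```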
Figure 2. Observed values of the CH index in Nguyen et al.'s (2007a) speech cate-
gorization experiment. Left panel: distribution across all the presenta-
tions; right panel: mean value and standard deviation for each of the 20
sequential presentations.
Figure 2 also shows the mean and standard deviation of the CH index for
each of the 20 sequential presentations, in chronological order from the
beginning of the experiment. An important prediction of the TCDK model
is that the amount of hysteresis should decrease as the subject gets more
experienced with the task and stimuli (e.g., Case et al., 1995; Tuller, 2004;
Nguyen et al., 2005). This is because the model accommodates increased
experience by using a step function that causes more of a change in the tilt
of the potential for each step change in the acoustic stimulus. This predic-
tion was borne out by the data, as a decrease in the CH value over the
course of the experiment was observed which proved statistically signifi-
cant in a linear mixed-effects model using the CH value as predicted varia-
ble, the rank of presentation as predictor and the subjects as blocking factor
(t(208) = -2.792, p < 0.01). These results offered further confirmation that
the speech perception system can be modeled as a nonlinear dynamical
system whose current state simultaneously depends on the input speech
sound, the system's past state, and higher-level cognitive factors that in-
clude the listener's previous experience with the sounds she/he has to cate-
gorize. In the following, we concentrate on what the TCDK model may
contribute to the general issue of abstractionist and exemplar-based views
of phonological representation.
In spite of its being linked to a specific experimental task (forced-choice
categorization), the TCDK model shows a number of general properties
that, in our view, open the way towards a novel, hybrid view to speech
perception that gets beyond the dichotomy traditionally established be-
tween exemplars and abstract phonological representations. Clear differ-
ences arise between the nonlinear dynamical approach exemplified by the
TCDK model and the abstractionist approach. In the former, to a greater
extent and more systematically than in the latter, speech categorization is
viewed as being sensitive to the detailed acoustic characteristics of the in-
put signal. For example, it is assumed that small variations in the acoustic
structure of an ambiguous speech sound can lead to large changes in the
perceptual system's response. This will be the case when these variations
cause the system to move across a saddle point in the potential function
(see Figure 1). Note, however, that perceptual sensitivity to small acoustic
change is not assumed to be the same in all regions of the acoustic space.
Variations in stimuli close to a prototypical sound unambiguously asso-
ciated with a given perceptual category will have little impact on the listen-
er's response. This is viewed in the model as being governed by a one-
attractor potential function (see left and right panels of Figure 1).
The TCDK model also predicts that the relative stability of a category
will vary depending on how frequently that category has been perceived in
the preceding sequence of speech sounds. This is attributed to the fact that
the tilt of the potential function changes more sharply in response to a vari-
ation in the input sound when a given category has been perceived more
often. Yet another prediction of the model is that the location of the percep-
tual switch from one category to another in the acoustic space tends strong-
ly to depend on the trajectory followed by the stimuli in that space, as is the
case in both hysteresis and enhanced contrast. Over a longer time scale,
increasing experience with the stimuli is expected to affect the dynamics of
speech categorization, which will tend to move away from hysteresis to-
wards enhanced contrast. The predicted sensitivity of the speech perception
system to gradient acoustic properties, frequency of occurrence of per-
ceived categories, trajectory of speech sounds in the acoustic space, and
training, is consistent with listeners' observed response patterns, and seems
difficult to account for by purely abstractionist models of speech perception
in the absence of a role for FPD. In the TCDK model, this sensitivity partly
derives from the fact that speech sounds are mapped onto a discrete and
finite set of perceptual categories by means of a continuous potential func-
tion, as opposed to the sharp division between sounds and percepts often
posited in abstractionist models.
While the dynamical nature of speech categorization is central to the
TCDK model, it is also, to a certain extent, emphasized in the exemplar
approach. Exemplars are assumed to accumulate in memory as listeners are
exposed to them, and this causes boundaries between categories to be con-
tinuously pushed around in the perceptual space. As a result, more frequent
categories (represented by a higher number of exemplars) gradually come
to prevail over less frequent ones (Pierrehumbert, 2006). Perceptual catego-
ries are taken in the exemplar approach to be time-dependent, and to evolve
continuously in the course of the conversational interactions in which
speakers/listeners engage. This of course has major theoretical implica-
tions, as frequency of use is expected to have an impact on the very form of
phonological representations in memory. As indicated above, however,
dynamics in speech perception is not restricted to the incremental effect of
exposure on perceived categories but encompasses a much wider range of
phenomena such as hysteresis, contrast, bifurcation, and stability⁵. The
TCDK model aims to take advantage of the powerful theory of nonlinear
dynamical systems to account for these phenomena in all their variety and
along short as well as long time scales. It offers an explanation of the bista-
bility of speech perception, attributed to the coexistence of two mutually
exclusive attractor states in the perceptual space. In addition, theoretical
and methodological tools (e.g., Erlhagen et al., 2006) are available that may
allow the nonlinear dynamical framework to be extended to the study of
conversation interaction between two or several speakers, and to model the
dynamics of speech processing and its influence on the organization of
perceived categories as this interaction unfolds in time.
5. Acknowledgments
This work was partly supported by the ACI Systèmes complexes en SHS
Research Program (CNRS & French Ministry of Research). Betty Tuller
was supported by grants from the National Science Foundation (0414657
and 0719683) and the Office of Naval Research. A first version of the
present paper was presented at the Workshop on Phonological Systems and
Complex Adaptive Systems held in Lyon in July 2005. Feedback from
Abby Cohn, Adamantios Gafos, and Sharon Peperkamp, among other par-
ticipants, is gratefully acknowledged. We are also grateful to two anonym-
ous reviewers for their critical comments and suggestions.


Notes

1. However, Lavoie (1996), cited by Cohn (2005), did not find the assumed posi-
tive correlation between rate of schwa deletion and lexical frequency.
2. In Hintzman's MINERVA 2 model, each memory trace is assumed to be inter-
nally represented as a configuration of so-called primitive properties, some of
which may be abstract, and which are not themselves acquired by experience.
In Coleman's empiricist view, by contrast, properties shown by word forms in
memory are by default considered as deriving from sensory experience, unless
empirical evidence is obtained that cannot be accounted for without bringing
abstract phonological units into play.
3. To take but one example, Lahiri & Reetz (2002) provide experimental evidence
suggesting that listeners are insensitive to surface variations in place of articula-
tion of word-final coronals, in accord with the assumption that coronals are un-
specified for place of articulation in the lexicon.
4. The material contained twenty groups of four test sentences. Within each
group, the target consonant (e.g., /z/) was located in word-initial position (e.g.,
[…] des zéros […] /dezero/ 'zeros') in the first sentence, in word-final position
(e.g., […] seize élèves […] /sɛzelɛv/ 'sixteen pupils') in the second sentence,
in word-medial position (e.g., […] du raisin […] /dyrezɛ̃/ 'some grapes') in the
third sentence, and in liaison position at the juncture between two words (e.g.,
[…] des écrous […] /dezekru/ 'some nuts') in the fourth sentence. Two differ-
ent versions of Sentences 1 and 4 were created. In the cross-spliced versions,
the target consonant and preceding vowel were exchanged between the two
sentences (for example, /ez/ in /dezekru/ was substituted to /ez/ in /dezero/, and
vice-versa). In the identity-spliced versions, the target consonant and preceding
vowel in each sentence originated from another repetition of that sentence. Lis-
teners' performance in the target detection task was expected to be poorer for
cross-spliced sentences than for identity-spliced sentences if liaison consonants
and word-initial consonants showed perceptually-salient differences in their
phonetic realization.
5. Because the TCDK model presented here is aimed to account for the dynamics
of speech perception in a binary-choice categorization task, both the number of
attractors and the association between attractors and perceptual categories were
set by design. How new attractors can emerge in the perceptual space, as the
listener is exposed to non-native speech sounds for example, is a major issue
addressed by Case et al. (2003), Tuller (2004), and Tuller et al. (2008), in par-
ticular. Although the nature of the perceptual categories associated with attrac-
tors remains to be established, work by Tuller & Kelso (1990, 1991) suggests
that these categories may combine both articulatory and auditory information.
References
Allen, J. and Miller, J.
2004 Listener sensitivity to individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 115:3171–3183.
Andruski, J., Blumstein, S., and Burton, M.
1994 The effect of subphonetic differences on lexical access. Cognition, 52:163–187.
Barlow, M. and Kemmer, S. (eds)
2000 Usage-Based Models of Language. Center for the Study of Language and Information, Stanford, CA.
Bybee, J.
2001 Phonology and Language Use. Cambridge University Press, Cambridge.
2006a Frequency of Use and the Organization of Language. Oxford University Press, Oxford.
2006b From usage to grammar: the mind's response to repetition. Language, 82:529–551.
Bybee, J. and McClelland, J.
2005 Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22:381–410.
Case, P., Tuller, B., Ding, M., and Kelso, J.
1995 Evaluation of a dynamical model of speech perception. Perception & Psychophysics, 57:977–988.
Case, P., Tuller, B., and Kelso, J.
2003 The dynamics of learning to hear new speech sounds. Speech Pathology, November 17, 1–8. URL: http://www.speechpathology.com/articles/arc_disp.asp?article_id=50&catid=560.
Cohn, A.
2005 Gradience and categoriality in sound patterns. Paper presented at the Workshop on Phonological Systems and Complex Adaptive Systems, Lyons, France, 4–6 July 2005.
212 Noël Nguyen, Sophie Wauquier and Betty Tuller

Coleman, J.
2002 Phonetic representations in the mental lexicon. In Durand, J. and Laks, B., editors, Phonetics, Phonology, and Cognition, pp. 96–130. Oxford University Press, Oxford.
Couper-Kuhlen, E. and Ford, C. (eds)
2004 Sound Patterns in Interaction. Cross-linguistic Studies from Conversation. John Benjamins, Amsterdam.
Cutler, A., Eisner, F., McQueen, J., and Norris, D.
to appear Coping with speaker-related variation via abstract phonemic categories. In Fougeron, C., D'Imperio, M., Kühnert, B., and Vallée, N., editors, Papers in Laboratory Phonology X. Mouton de Gruyter, Berlin.
de Boer, B.
2000 Self-organization in vowel systems. Journal of Phonetics, 28:441–465.
Dufour, S., Nguyen, N., and Frauenfelder, U.
2007 The perception of phonemic contrasts in a non-native dialect. Journal of the Acoustical Society of America Express Letters, 121:EL131–EL136.
Durand, J.
1990 Generative and Non-Linear Phonology. Longman, London.
Elman, J.
1990 Finding structure in time. Cognitive Science, 14:179–211.
Elman, J. and McClelland, J.
1988 Cognitive penetration of the mechanisms of perception: compensation for coarticulation of lexically restored phonemes. Journal of Memory and Language, 27:143–165.
Encrevé, P.
1988 La liaison avec et sans enchaînement. Seuil, Paris.
Erlhagen, W., Mukovskiy, A., and Bicho, E.
2006 A dynamic model for action understanding and goal-directed imitation. Brain Research, 1083:174–188.
Eulitz, C. and Lahiri, A.
2004 Neurobiological evidence for abstract phonological representations in the mental lexicon during speech recognition. Journal of Cognitive Neuroscience, 16:577–583.
Fitzpatrick, J. and Wheeldon, L.
2000 Phonology and phonetics in psycholinguistic models of speech perception. In Burton-Roberts, N., Carr, P., and Docherty, G., editors, Phonological Knowledge: Conceptual and Empirical Issues, pp. 131–160. Oxford University Press, Oxford.
Foulkes, P. and Docherty, G.
2006 The social life of phonetics and phonology. Journal of Phonetics, 34:409–438.

Gaskell, M.
2003 Modelling regressive and progressive effects of assimilation in speech perception. Journal of Phonetics, 31:447–463.
Giles, H., Coupland, N., and Coupland, J.
1991 Accommodation theory: Communication, context, and consequence. In Giles, H., Coupland, N., and Coupland, J., editors, Contexts of Accommodation: Developments in Applied Sociolinguistics, pp. 1–68. Cambridge University Press, Cambridge.
Goldinger, S.
1996 Words and voices: episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 22:1166–1183.
1998 Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105:251–279.
Goldstein, L.
2003 Emergence of discrete gestures. In Proceedings of the XVth International Congress of Phonetic Sciences, pp. 85–88, Barcelona, Spain.
Grossberg, S.
2003 Resonant neural dynamics of speech perception. Journal of Phonetics, 31:423–445.
Hawkins, S.
2003 Roles and representations of systematic fine phonetic detail in speech understanding. Journal of Phonetics, 31:373–405.
to appear Phonetic variation as communicative system: Perception of the particular and the abstract. In Fougeron, C., D'Imperio, M., Kühnert, B., and Vallée, N., editors, Papers in Laboratory Phonology X. Mouton de Gruyter, Berlin.
Hawkins, S. and Smith, R.
2001 Polysp: A polysystemic, phonetically-rich approach to speech understanding. Rivista di Linguistica, 13:99–188.
Hintzman, D.
1986 "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93:411–428.
Jarick, M. and Jones, J.A.
2008 Observation of static gestures influences speech production. Experimental Brain Research, 189:221–228.
Johnson, K.
1997a The auditory/perceptual basis for speech segmentation. Ohio State University Working Papers in Linguistics, 50:101–113.
1997b Speech perception without speaker normalization. In Johnson, K. and Mullennix, J., editors, Talker Variability in Speech Processing, pp. 145–166. Academic Press, San Diego.

2005a Decisions and mechanisms in exemplar-based phonology. UC Berkeley Phonology Lab Annual Report, pp. 289–311.
2005b Speaker normalization in speech perception. In Pisoni, D. and Remez, R., editors, The Handbook of Speech Perception, pp. 363–389. Blackwell, Malden, MA.
Klatt, D.
1979 Speech perception: a model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7:279–312.
Lahiri, A. and Marslen-Wilson, W.
1991 The mental representation of lexical form: a phonological approach to the recognition lexicon. Cognition, 38:245–294.
Lahiri, A. and Reetz, H.
2002 Underspecified recognition. In Gussenhoven, C. and Warner, N., editors, Papers in Laboratory Phonology VII, pp. 637–675. Mouton de Gruyter, Berlin.
Lavoie, L.
1996 Lexical frequency effects on the duration of schwa-resonant sequences in American English. Poster presented at LabPhon 5, Chicago, June 1996.
Luce, P. and McLennan, C.
2005 Spoken word recognition: The challenge of variation. In Pisoni, D. and Remez, R., editors, The Handbook of Speech Perception, pp. 591–609. Blackwell, Malden, MA.
Marslen-Wilson, W. and Warren, P.
1994 Levels of perceptual representation and process in lexical access: words, phonemes, and features. Psychological Review, 101:653–675.
Martin, J. and Bunnell, H.
1981 Perception of anticipatory coarticulation effects. Journal of the Acoustical Society of America, 69:559–567.
1982 Perception of anticipatory coarticulation effects in vowel-stop consonant-vowel sequences. Journal of Experimental Psychology: Human Perception and Performance, 8:473–488.
McClelland, J. and Elman, J.
1986 The TRACE model of speech perception. Cognitive Psychology, 18:1–86.
McLennan, C. and Luce, P.
2005 Examining the time course of indexical specificity effects in spoken word recognition. Journal of Experimental Psychology: Learning, Memory and Cognition, 31:306–321.
McMurray, B., Tanenhaus, M., and Aslin, R.
2002 Gradient effects of within-category phonetic variation on lexical access. Cognition, 86:B33–B42.
McMurray, B., Tanenhaus, M., Aslin, R., and Spivey, M.
2003 Probabilistic constraint satisfaction at the lexical/phonetic interface: Evidence for gradient effects of within-category VOT on lexical access. Journal of Psycholinguistic Research, 32:77–97.
Meltzoff, A. and Moore, M.
1997 Explaining facial imitation: A theoretical model. Early Development and Parenting, 6:179–192.
Mitterer, H.
2006 Is vowel normalization independent of lexical processing? Phonetica, 63:209–229.
Nguyen, N., Lancia, L., Bergounioux, M., Wauquier-Gravelines, S., and Tuller, B.
2005 Role of training and short-term context effects in the identification of /s/ and /st/ in French. In Hazan, V. and Iverson, P., editors, ISCA Workshop on Plasticity in Speech Perception (PSP2005), pp. A38–39, London.
Nguyen, N., Lancia, L., and Tuller, B.
2007a The dynamics of speech categorization: Evidence from French. In preparation.
Nguyen, N., Wauquier-Gravelines, S., Lancia, L., and Tuller, B.
2007b Detection of liaison consonants in speech processing in French: Experimental data and theoretical implications. In Prieto, P., Mascaró, J. and Solé, M.-J., editors, Segmental and Prosodic Issues in Romance Phonology, pp. 3–23. John Benjamins, Amsterdam.
Pallier, C.
2000 Word recognition: do we need phonological representations? In Cutler, A., McQueen, J., and Zondervan, R., editors, Proceedings of the Workshop on Spoken Word Access Processes (SWAP), pp. 159–162, Nijmegen.
Pallier, C., Colomé, A., and Sebastián-Gallés, N.
2001 The influence of native-language phonology on lexical access: exemplar-based vs. abstract lexical entries. Psychological Science, 12:445–449.
Pardo, J.
2006 On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119:2382–2393.
Pardo, J. S. and Remez, R. E.
2007 The perception of speech. In Traxler, M. and Gernsbacher, M., editors, The Handbook of Psycholinguistics, Second Edition. Elsevier, Cambridge, MA. In press.
Pierrehumbert, J.
2001 Exemplar dynamics: Word frequency, lenition, and contrast. In Bybee, J. and Hopper, P., editors, Frequency Effects and the Emergence of Linguistic Structure, pp. 137–157. John Benjamins, Amsterdam.
2002 Word-specific phonetics. In Gussenhoven, C. and Warner, N., editors, Papers in Laboratory Phonology VII, pp. 101–140. Mouton de Gruyter, Berlin.
2006 The next toolkit. Journal of Phonetics, 34:516–530.
Pitt, M. and Johnson, K.
2003 Using pronunciation data as a starting point in modeling word recognition. In Proceedings of the XVth International Congress of Phonetic Sciences, Barcelona, Spain.
Stevens, K.
2002 Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America, 111:1872–1891.
2004 Invariance and variability in speech: Interpreting acoustic evidence. In Slifka, J., Manuel, S., and Matthies, M., editors, Proceedings of From Sound to Sense: 50+ Years of Discovery in Speech Communication, pp. B77–B85, Cambridge, MA: MIT. URL: www.rle.mit.edu/soundtosense/.
Strand, E.
2000 Gender Stereotype Effects in Speech Processing. PhD thesis, Ohio State University.
Studdert-Kennedy, M.
2002 Mirror neurons, vocal imitation and the evolution of particulate speech. In Stamenov, M. and Gallese, V., editors, Mirror Neurons and the Evolution of Brain and Language, pp. 207–227. John Benjamins, Amsterdam.
Tuller, B.
2003 Computational models in speech perception. Journal of Phonetics, 31:503–507.
2004 Categorization and learning in speech perception as dynamical processes. In Riley, M. and Van Orden, G., editors, Tutorials in Contemporary Nonlinear Methods for the Behavioral Sciences. National Science Foundation. URL: www.nsf.gov/sbe/bcs/pac/nmbs/nmbs.jsp.
Tuller, B., Case, P., Ding, M., and Kelso, J.
1994 The nonlinear dynamics of speech categorization. Journal of Experimental Psychology: Human Perception and Performance, 20:3–16.
Tuller, B., Jantzen, M., and Jirsa, V.
2008 A dynamical approach to speech categorization: Two routes to learning. New Ideas in Psychology, 26:208–226.
Tuller, B. and Kelso, J.
1990 Phase transitions in speech production and their perceptual consequences. In Jeannerod, M., editor, Attention and Performance XIII: Motor Representation and Control, pp. 429–452. Lawrence Erlbaum, Hillsdale, NJ.
1991 The production and perception of syllable structure. Journal of Speech and Hearing Research, 34:501–508.
Vihman, M.
2002 The role of mirror neurons in the ontogeny of speech. In Stamenov, M. and Gallese, V., editors, Mirror Neurons and the Evolution of Brain and Language, pp. 305–314. John Benjamins, Amsterdam.
Wauquier-Gravelines, S.
1996 Organisation phonologique et traitement de la parole continue. Unpublished PhD dissertation, Université Paris 7, Paris.
West, P.
1999 Perception of distributed coarticulatory properties in English /l/ and /ɹ/. Journal of Phonetics, 27:405–426.


A dynamical model of change in phonological representations: The case of lenition¹

Adamantios Gafos and Christo Kirov
1. Introduction
This paper presents a model of diachronic changes in phonological repre-
sentations. The broad context is that of sound change, where word repre-
sentations evolve at relatively slow time scales as compared to the time
scale of assembling a phonological representation in synchronic word pro-
duction. The specific focus is on capturing certain key properties of an un-
folding lenition process.
Diachronic changes in phonological representations accumulate gradual-
ly during repeated production-perception loops, that is, through the impact
of a perceived word on the internal representation and subsequent produc-
tion of that word or other related words. To formally capture the accumula-
tion of such changes, we capitalize on the continuity of parameter values at
the featural level. Our model is illustrated by contrasting it with another
recent view, exemplified by an exemplar model of lenition, which also
embraces continuity in its representational parameters. A primary concern
is to provide a formal basis for change in phonological representations using
basic concepts from the mathematics of dynamical systems.
2. The case of lenition
The term 'lenition' is used to describe a variegated set of sound alternations
(such as voicing of obstruents between two vowels, spirantization of
stops in a prosodically weak position, and devoicing of obstruents in sylla-
ble-final position) which are either attested synchronically or have resulted
diachronically in a restructuring of the phonemic inventory of a language.
One diachronic example of lenition is Grimm's Law, according to which
Proto-Indo-European voiceless stops became Germanic voiceless fricatives
(e.g. PIE *[t] > Gmc *[θ]). Other examples of sound changes described as
cases of lenition are given below (for recent studies see Gurevich, 2004;
Cser, 2003, and Lavoie, 2001). In each case, a stop turns to a fricative simi-
lar in place of articulation to the original stop.
(1) a. Southern Italian dialects: [b d g] → [v ð ɣ] intervocalically.
b. Greek (Koine): [p t k] → [f θ x] except after obstruents.
c. Proto-Gaelic: [t k] → [θ x] intervocalically.
d. Hungarian: [p] → [f] word-initially.
Consider any single transition between two states of a lenition process, say,
starting with a stop [b] and resulting in a fricative [v], [b] > [v]. At a broad
level, one can describe two kinds of approaches to this kind of transition.
The symbolic approach, as exemplified by Kiparsky's classic paper on
linguistic universals and sound change (Kiparsky, 1968), studies the internal
composition of the individual stages (e.g. feature matrices at each stage)
tions. The continuity of sound change, that is, how the representation of the
lexical item containing a [b] changes in time to one containing a [v], is not
studied. This is in part due to the theoretical assumption that representa-
tions are discrete. That is, there is no symbol corresponding to an interme-
diate degree of stricture between that of a stop and a fricative. In the dy-
namical approach, the transition process between the stages is studied at the
same time as the sequence of stages. In what follows, we instantiate a
small, yet core part of a dynamical alternative to the symbolic model of
sound change.


2.1. An exemplar model of lenition
It is useful to describe the main aspects of our model by contrasting it with
another model proposed recently by Pierrehumbert. This is a model of
sound change aimed at accounting for certain generalizations about leni-
tion, extrapolated from observations of synchronic variation or sound
changes in progress. The model proposed in Pierrehumbert (2001) has two
attractive properties. It offers a way to represent the fine phonetic substance
of linguistic categories, and it provides a handle on the effect of lexical
frequency in the course of an unfolding lenition process.
In Pierrehumbert's discussion of lenition, it is assumed that the production
side of a lenition process is characterized by the following set of properties.

Table 1. Properties of lenition
1. Each word displays a certain amount of variability in production.
2. The effect of word frequency on lenition rates is gradient.
3. The effect of word frequency on lenition rates should be observ-
able within the speech of individuals; it is not an artifact of aver-
aging data across the different generations which make up a
speech community.
4. The effect of word frequency on lenition rates should be observ-
able both synchronically (by comparing the pronunciation of
words of different frequency) and diachronically (by examining
the evolution of word pronunciations over the years within the
speech of individuals.)
5. The phonetic variability of a category should decrease over time,
a phenomenon known as entrenchment. The actual impact of en-
trenchment on lenition is not clear, and Pierrehumbert does not
cite any data specific to entrenchment for this particular dia-
chronic effect. In fact, while a sound change is in progress, it
seems equally intuitive (in the absence of any data to the contrary)
that a wider, rather than narrower, range of pronunciations is
available to the speaker. Pierrehumbert uses the example of a
child's productions of a category becoming less variable over
time, but this may only apply to stable categories, rather than ones
undergoing diachronic change. It may also be orthogonal to the
child's phonetic representations, and rather be due to an initial
lack of biomechanical control. For these reasons, therefore, our
own model is not designed to guarantee entrenchment while
sound change is taking place, but does show entrenchment effects
for diachronically stable categories.

The first property, variability in production, does not apply exclusively to
lenition processes, but rather is a general characteristic of speech production
that any lenition model should be able to capture. The frequency-related
properties are based on previous work by Bybee, who claims that at
least some lenition processes apply variably based on word frequency
(Bybee, 2003). Examples include schwa reduction (e.g. memory tends to
be pronounced [mɛmɹi]) and t/d-deletion (e.g. told tends to be pronounced
[tol]). Once a lenition process has begun, Bybee's claim amounts to saying
that words with high frequency will weaken more quickly over time than
rare words. Consequently, lenition effects can be seen both synchronically
and diachronically. Synchronically, a more frequent word will be produced
more lenited (with more undershoot) than a less frequent word in the
current speech of a single person. Diachronically, all words in a language will
weaken across all speakers, albeit at different rates.
What are the minimal prerequisites for accounting for the lenition proper-
ties above? First, it is clear that individuals must be capable of storing pho-
netic detail within each lexical item. We also need a mechanism for gra-
diently changing the lexical representations over time. To do this, the
perceptual system must be capable of making fine phonetic distinctions, so
that the information carried by these distinctions can reach the currently
spoken item in the lexicon.
Pierrehumbert's exemplar-based model of lenition gives explicit formal
content to each of these prerequisites (Pierrehumbert, 2001). The model is
built on a few key ideas, which can be described in brief terms. Specifical-
ly, in the exemplar-based model, a given linguistic category is stored in a
space whose axes define the parameters of the category. In Pierrehumbert
(2001), it is suggested that vowels, for example, might be stored in an
F1/F2 formant space. This space is quantized into discrete cells based on
perceptual limits. Each cell is considered to be a bin for perceptual expe-
riences, and Pierrehumbert views each bin to be a unique potential exem-
plar. When the system receives an input, it places it in the appropriate bin.
All items in a bin are assumed to be identical as far as the perceptual sys-
tem is concerned, and the more items in a particular bin, the greater the
activation of the bin is. All bins start out empty and are not associated with
any exemplars that have actually been produced and/or perceived (memory
begins as a tabula rasa). When a bin is filled, this is equivalent to the sto-
rage of an exemplar. The new exemplar is given a categorical label based
on the labels of other nearby exemplars. This scheme limits the actual
memory used by exemplars. There is a limited number of discrete bins, and
each bin only stores an activation value proportional to the number of ex-
emplar instances that fall into it. Thus, not all the exemplar instances need
to be stored. A decay process decreases the activation of an exemplar bin
over time, corresponding to memory decay. Figure 1, taken from Pierrehumbert
(2001), shows the F2 space discretized into categorically labeled bins.
Figure 1. Exemplar bins with varying activations.
The set of exemplars with a particular category label constitutes an exten-
sional approximation of a probability distribution for that category over the
storage space. Given the coordinates in the storage space over which a cat-
egory is defined, that distribution would provide the likelihood that a token
with those coordinates would belong to that category (e.g. how likely is it
that the token is an /a/). During production, a particular exemplar from
memory is chosen to be produced, where the likelihood of being chosen
depends on how activated the exemplar is. The chosen exemplar is shifted
by a bias in the direction of lenition. This bias reflects the synchronic pho-
netic motivation for lenition. This includes at least the tendency to weaken
the degree of oral constriction in contexts favoring segmental reductions,
e.g. in non-stressed syllables, syllable codas, or intervocalically. For rele-
vant discussion see Beckman et al. (1992) and Wright (1994). To account
for entrenchment (see Table 1(5)), Pierrehumbert extends this production
model by averaging over a randomly selected area of exemplars to generate
a produced candidate. Since the set of exemplars defines a probability dis-
tribution (in an extensional sense), weighting the average by each exemplar's
probability results in a production candidate pushed toward the center
of the distribution.
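The mechanics just described (binned storage, memory decay, activation-weighted selection, neighborhood averaging, and a lenition bias) can be sketched in a small simulation. This is a minimal sketch, not Pierrehumbert's implementation: the bin count, decay rate, window size, and bias magnitude are illustrative assumptions.

```python
import random

class ExemplarCategory:
    """Minimal sketch of an exemplar store: a discretized parameter axis
    whose bins hold activation values subject to memory decay."""

    def __init__(self, lo, hi, n_bins=100, decay=0.02):
        self.lo, self.hi, self.n = lo, hi, n_bins
        self.act = [0.0] * n_bins          # all bins start empty (tabula rasa)
        self.decay = decay                 # per-step memory decay rate

    def _bin(self, x):
        i = int((x - self.lo) / (self.hi - self.lo) * self.n)
        return min(max(i, 0), self.n - 1)

    def _value(self, i):                   # center value of bin i
        return self.lo + (i + 0.5) * (self.hi - self.lo) / self.n

    def perceive(self, x):
        """Store a token: decay all bins, then bump the token's bin."""
        self.act = [a * (1 - self.decay) for a in self.act]
        self.act[self._bin(x)] += 1.0

    def produce(self, bias=0.0, window=5):
        """Sample a bin with probability proportional to activation, average
        over its neighborhood (entrenchment), then add the lenition bias."""
        i = random.choices(range(self.n), weights=self.act)[0]
        lo, hi = max(0, i - window), min(self.n, i + window + 1)
        w = sum(self.act[lo:hi])
        mean = sum(self.act[j] * self._value(j) for j in range(lo, hi)) / w
        return mean + bias

    def center(self):
        """Activation-weighted mean of the whole store."""
        total = sum(self.act)
        return sum(self.act[i] * self._value(i) for i in range(self.n)) / total

random.seed(1)
cat = ExemplarCategory(lo=0.0, hi=10.0)
for _ in range(50):                        # seed the category near 5.0
    cat.perceive(5.0 + random.gauss(0, 0.3))
before = cat.center()
for _ in range(500):                       # production/perception loops
    cat.perceive(cat.produce(bias=-0.1))   # each loop nudges the store down
after = cat.center()                       # after < before: gradual lenition
```

Each production/perception loop feeds a slightly biased output back into the store, so the distribution skews gradually in the direction of lenition, exactly the accumulation mechanism described above.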
The exemplar scheme described in this section derives the five proper-
ties of lenition discussed earlier as follows. Variability in production is
directly accounted for since production is modeled as an average of the
exemplar neighborhood centered around a randomly selected exemplar
from the entire set stored in the system. Each lexical item has its own ex-
emplars, and each production/perception loop causes the addition of a new
exemplar to the set. This new exemplar is more lenited than the speaker
originally intended due to biases in production, so the distribution of exem-
plars skews over time. In a given period of time, the number of produc-
tion/perception loops an item goes through is proportional to its frequency.
Thus, the amount of lenition associated with a given item shows gradient
variation according to the item's frequency (Dell, 2000). As all processes
directly described by the exemplar model occur within a single individual,
lenition is clearly observable within the speech of individuals. Diachroni-
cally, lenition will proceed at a faster rate for more frequent items because
they go through more production/perception loops in a given time frame.
The synchronic consequence of this is that at a point in time, more frequent
items will be more lenited in the speech of an individual than less frequent
items. Finally, entrenchment is a consequence of averaging over several
neighboring exemplars during production, shifting the resulting production
towards the mean of the distribution described by all the exemplars.
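The frequency asymmetry follows from loop counts alone. A deliberately stripped-down calculation (one drifting value per word, no bins, decay, or averaging; the starting value, bias, and loop counts are arbitrary) makes this concrete:

```python
def lenite(start, bias, loops):
    """Each production/perception loop shifts the stored value by the bias."""
    value = start
    for _ in range(loops):
        value += bias
    return value

# Same per-loop bias; a frequent word simply goes through more loops in the
# same period of time, so it ends up more lenited than a rare word.
rare = lenite(start=56.0, bias=-0.01, loops=100)       # -> approx 55.0
frequent = lenite(start=56.0, bias=-0.01, loops=1000)  # -> approx 46.0
```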
In sum, the exemplar-based model offers a direct way to encode phonet-
ic details, and captures the assumed effects of frequency on lenition. Pier-
rehumbert further claims that the exemplar model is the only type of model
that can properly handle the above conception of lenition (Pierrehumbert,
2001:147). In what follows, we will propose an alternative dynamical mod-
el of lenition. The dynamical model can also account for the lenition prop-
erties reviewed above. But it is crucially different from the exemplar-based
model in two respects. The dynamical model encodes phonetic details
while maintaining unitary category representations as opposed to represen-
tations defined extensionally by collections of exemplars. In addition, the
dynamical model also admits a temporal dimension, which is currently not
part of the exemplar-based model.


2.2. A dynamical model of lenition
2.2.1. Description of the model
Studying language change as a process occurring in time broadly motivates
a dynamical approach to modeling. A dynamical model is a formal system
whose internal state changes in a controlled and mathematically explicit
way over time. The workings of the proposed model are based on a dynam-
ical formalism called Dynamic Field Theory (Erlhagen & Schöner, 2002).
A central component of our model is the spatio-temporal nature of its
representations. Take a lexical item containing a tongue tip gesture as that
for /d/. We can think of the specification of the speech movements asso-
ciated with this gesture as a process of assigning values to a number of
behavioral parameters. In well-developed models that include a speech
production component, these parameters include constriction location and
constriction degree (Guenther, 1995; Saltzmann & Munhall, 1989; Brow-
man & Goldstein, 1990). A key idea in our model is that each such parame-
ter is not specified exactly but rather by a distribution depicting the conti-
nuity of its phonetic detail.
Although our model does not commit to any specific phonological fea-
ture set or any particular model for the control and execution of movement,
to illustrate our proposal more explicitly let us assume the representational
parameters of Articulatory Phonology (Saltzman & Munhall, 1989;
Browman & Goldstein, 1990). Thus, let us assume that lexical items must
at some level take the form of gestural scores. A gestural score, for current
purposes, is simply a sequence of gestures (we put aside the intergestural
temporal relations that also must be specified as part of a full gestural
score). For example, the sequence /das/ consists of three oral gestures - a
tongue tip gesture for /d/, a tongue dorsum gesture for /a/, and a tongue tip
gesture for /s/. Gestures are specified by target descriptors for the vocal
tract variables of Constriction Location (CL) and Constriction Degree
(CD), parameters defining the target vocal tract state. For example, /d/ and
/s/ have the CL target descriptor {alveolar}. The CD descriptor of /d/ is
{closure} and for /s/ it is {critical}. These descriptors correspond to actual
numerical values. For instance, in the tongue tip gesture of a /d/, {alveolar}
corresponds to 56 degrees (where 90 degrees is vertical and would corres-
pond to a midpalatal constriction) and {closure} corresponds to a value of 0
mm.
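A gestural score of this kind is straightforward to write down as a data structure. In the sketch below, the 'alveolar' (56 degrees) and 'closure' (0 mm) targets are the two values given in the text; the remaining descriptor values are illustrative placeholders, not claims about the actual parameter set.

```python
from dataclasses import dataclass

# Descriptor-to-target mappings. 'alveolar' (56 degrees) and 'closure' (0 mm)
# are the values cited in the text; the others are illustrative placeholders.
CL_TARGETS = {"alveolar": 56.0, "pharyngeal": 115.0}          # degrees
CD_TARGETS = {"closure": 0.0, "critical": 1.0, "wide": 10.0}  # mm

@dataclass
class Gesture:
    organ: str  # e.g. "tongue tip", "tongue dorsum"
    cl: str     # Constriction Location descriptor
    cd: str     # Constriction Degree descriptor

    def targets(self):
        """Numerical (CL, CD) targets for this gesture."""
        return CL_TARGETS[self.cl], CD_TARGETS[self.cd]

# /das/ as a (simplified) gestural score: oral gestures for /d/, /a/, /s/.
das = [Gesture("tongue tip", "alveolar", "closure"),
       Gesture("tongue dorsum", "pharyngeal", "wide"),
       Gesture("tongue tip", "alveolar", "critical")]
```

Note how /d/ and /s/ share a CL descriptor but differ in CD, the contrast the field representation below preserves.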
In our model, each parameter is not specified by a unique numerical
value as above, but rather by a continuous activation field over a range of
values for the parameter. The field captures among other things a distribu-
tion of activation over the space of possible parameter values so that a
range of more activated parameter values is more likely to be used in the
actual execution of the movement than a range of less activated parameter
values. The parameter fields then resemble distributions over the continuous
details of vocal tract variables. A lexical item therefore is a gestural
score where the parameters of each gesture are represented by their own
fields. Schematic fields corresponding to the (oral) gestures of the conso-
nants in /das/ are given in Figure 2.
Figure 2. Component fields of /d/, /s/, and /a/. y-axis represents activation. /d/ and
/s/ have nearly identical CL fields, as they are both alveolars, but they
differ in CD.
Formally, parameters are manipulated using the dynamical law from Dynamic
Field Theory (Erlhagen & Schöner, 2002). The basic dynamics governing
each field are described by:

τ dp(x,t) = -p(x,t) + h + input(x,t) + noise (1)

where p is the field in memory (a function of the continuous variables x and t), h
is the field's resting activation, dp(x,t) is the change in activation at x at
time t, τ is a constant corresponding to the rate of decay of the field (i.e. the
rate of memory decay), and input(x,t) is a field representing time-dependent
external input to the system (i.e. a perceived token) in the form of a localized
activation spike.
The equation can be broken down into simpler components to better un-
derstand how it functions. The core component τ dp(x,t) = -p(x,t) + h is an
instance of exponential decay. If we arbitrarily select a value for x, and plot
p(x,t) over time, we will see behavior described by the exponential decay
equation. In the absence of any input or interaction, the activation at p(x,t)
will simply decay down to its resting level, h, as shown in Figure 3. If p(x,t)
starts at resting activation, it will remain there forever. In the terminology
of dynamical systems, the starting activation of a point is known as an ini-
tial condition, and the activation it converges to, in this case the resting
activation, is known as an attractor. If the input term, input(x,t), is non-
zero, then the system will move towards a point equivalent to its resting
activation plus the input term. The speed of the process is modulated using
the τ term.
Figure 3. Top left: In the absence of input, field activation at a particular point
converges to the resting level h = 1 (dashed line) (τ = 10). Top right:
With added input input(x,t) = 1, activation converges to resting level h =
1 plus input (top dashed line) (τ = 10). Bottom left: In the absence of input,
node activation converges to resting level r = 1 (τ = 20). Bottom right:
With added input input(x,t) = 1, activation converges to resting level r =
1 plus input (τ = 20).
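The behavior of equation (1) at a single field point can be checked with a few lines of Euler integration. The step size, duration, and the particular values of h, τ, and the initial activation below are illustrative choices, not parameters from the model.

```python
def relax(p0, h, tau, inp=0.0, dt=0.01, steps=5000):
    """Euler-integrate tau * dp/dt = -p + h + input for one field point."""
    p = p0
    for _ in range(steps):
        p += ((-p + h + inp) / tau) * dt
    return p

# Without input, activation relaxes to the resting level h; with a constant
# input, it settles at h plus the input, as in Figure 3.
no_input = relax(p0=3.0, h=1.0, tau=10)             # -> approx 1.0
with_input = relax(p0=3.0, h=1.0, tau=10, inp=1.0)  # -> approx 2.0
```

Doubling τ halves the rate of approach to the attractor, which is exactly the sense in which τ modulates the speed of the process.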
Fields are spatio-temporal in nature. Thus specifying the value of a gestural
parameter is a spatio-temporal process in our model. We describe each of
these aspects, spatial and temporal, in turn. The spatial aspect of the gestur-
al specification process corresponds to picking a value to produce from any
of the fields in Figure 2, e.g. choosing a value for Constriction Location for
/d/ and /s/ from within the range of values corresponding to the [alveolar]
228 Adamantios Gafos and Christo Kirov
category. This is done by sampling the Constriction Location field, much as
we might sample a probability distribution. Since each field encodes variability
within the user's experience, we are likely to select reasonable parameter
values for production. A demonstration of this is shown in Figure 4.
The noisy character of the specification process allows for variation in the
value ultimately specified but, as the series of simulations in Figure 4 verifies,
the selected values cluster reliably around the maximally activated
point of the field.
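The sampling step can be sketched as follows; this is our own illustration (the function name, field shape, and noise level are assumptions, not the authors' implementation):

```python
import random, math

# Minimal sketch (our own illustration): "specifying" a gestural parameter by
# sampling a field. Activation acts as an unnormalized weight, so values near
# the field maximum are selected most often, with noise producing variation.
def sample_field(xs, activations, noise_sd=0.1, rng=random):
    weights = [max(a, 0.0) for a in activations]   # activation as selection weight
    x = rng.choices(xs, weights=weights, k=1)[0]   # biased toward the field peak
    return x + rng.gauss(0.0, noise_sd)            # noisy specification

# A Gaussian-shaped field peaked at x = 0.5 (e.g. a Constriction Location value)
xs = [i / 100 for i in range(101)]
field = [math.exp(-((x - 0.5) ** 2) / 0.005) for x in xs]
samples = [sample_field(xs, field) for _ in range(1000)]
print(sum(samples) / len(samples))   # clusters near the field maximum, ~0.5
```

Repeated sampling reproduces the clustering near the field maximum shown in Figure 4.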
Figure 4. Variability in production. Histogram of selected values over 100 simula-
tions of gestural specification. Histogram overlaid on top of field to show
clustering of selected values near the field maximum.
The specification process presented here is similar but not identical to the
sampling of a probability distribution. Fields have unique properties that
make them useful for modeling memory. Unlike distributions, fields need
not be normalized to an area under the curve of one. The key addition here
is the concept of activation. Fields can vary from one another in total acti-
vation while keeping within the same limits of parameter values. Because
of this added notion of activation, the specification process is more biased
towards the maximally activated point in the field (i.e. the mean of the distribution)
than a true random sampling would be. This leads to an entrenchment
effect for categories not undergoing change. This behavior is shown
in Figure 5. In addition, fields have a resting activation level (a lower-limit
on activation). This level slowly tends to zero over time, but increases
every time the field is accessed during production or perception. Thus, lexi-
cal items whose fields are accessed more frequently have higher resting
activation levels than lexical items whose component fields are accessed
less frequently. Finally, much as memory wanes over time, activation along
a field decays if not reinforced by input.
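The resting-level behavior described in this paragraph can be sketched as a toy simulation; the bump size and decay constant are our own illustrative assumptions:

```python
# Minimal sketch (our own illustration, not the authors' code): resting
# activation drifts toward zero over time but is bumped on every access,
# so frequently accessed lexical items keep a higher resting level.
def resting_level(access_times, total_time, bump=1.0, decay=0.05, dt=1.0):
    level, t = 0.0, 0.0
    accesses = set(access_times)
    while t < total_time:
        if t in accesses:
            level += bump                 # field accessed in production/perception
        level -= decay * level * dt       # slow drift toward zero
        t += dt
    return level

frequent = resting_level(access_times=range(0, 100, 5), total_time=100)
rare = resting_level(access_times=range(0, 100, 25), total_time=100)
print(frequent > rare)   # True: more frequent access -> higher resting level
```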
The other crucial aspect of the specification process is its time-course.
Formalizing gestural parameters with fields adds a time-course dimension
to the gestural specification process. Thus, if a lexical representation con-
tains a /d/, the CD and CL parameters for this /d/ are not statically assigned
to their (language- or speaker-specific) canonical values, e.g., CL = [alveolar].
Rather, assigning values to these parameters is a time-dependent
process, captured as the evolution of a dynamical system over time. In
short, lexical representations are not static units. This allows us to derive
predictions about the time-course of choosing or specifying different ges-
tural parameters.
Figure 5. Output of entrenchment simulation. The x-axis represents a phonetic
dimension (e.g. constriction degree). The field defining the distribution
of this parameter is shown at various points in time. As time progresses,
the field becomes narrower.
The specification process begins with a temporary increase in the resting
activation of the field, i.e. pushing the field up, caused by an intent to pro-
duce a particular lexical item (which includes a gesture ultimately specified
for a parameter represented by this field). Activation increases steadily but
noisily until some part of the field crosses a decision threshold and be-
comes the parameter value used in production. This scheme ensures that the
areas of maximum activation are likely to cross the decision threshold first.
After a decision has been made, resting activation returns to its pre-
production level. The following equation represents this process mathemat-
ically:
τ · dh/dt = −h + h0 + f(d, max(p)) · h + noise    (2)
where h is the temporarily augmented resting activation, τ is a time scaling
parameter, h0 is the pre-production resting activation level, f(d, max(p)) is a
nonlinear sigmoid or step function over the distance between the decision
threshold d and the maximum activation of field p, and noise is scaled
Gaussian noise. While the distance is positive (the decision threshold has
not yet been breached), the function is also positive and greater than 1,
overpowering the −h term and causing a gradual increase in the resting
activation h. When the decision threshold is breached, the function becomes
0, and remains clamped at 0 regardless of the subsequent field state,
allowing the −h term to bring activation back to h0.
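A rough numerical sketch of this threshold dynamics follows. It is our own reconstruction: the step-function value f = 2 and all parameter settings are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of the process around Equation (2) (our own reconstruction;
# the step-function value f = 2 and all parameters are illustrative).
def specify(h0, base_peak, d, tau=10.0, dt=0.01, max_t=200.0):
    """Raise resting activation h until the field maximum crosses threshold d,
    then let h relax back to its pre-production level h0."""
    h, t, decision_time, breached = h0, 0.0, None, False
    while t < max_t:
        max_p = base_peak + h                    # field maximum rides on resting level
        if max_p >= d and not breached:
            breached, decision_time = True, t    # parameter value is read out here
        f = 0.0 if breached else 2.0             # > 1 before the breach, clamped at 0 after
        h += ((-h + h0 + f * h) / tau) * dt
        t += dt
    return decision_time, h

t_decision, h_final = specify(h0=1.0, base_peak=2.0, d=5.0)
print(t_decision is not None)    # True: the threshold was breached
print(abs(h_final - 1.0) < 0.1)  # True: h has relaxed back toward h0
```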
The gestural specification process is affected by the pre-production resting
activation of the field, in that a field with high resting activation is already
"presampled", and thus automatically closer to the decision threshold.
This leads to faster decisions for more activated fields, and by
extension for more frequent parameter values. The relevant simulations are
described below. Figure 6 shows representative initial fields, and Figure 7
shows the progression of the featural specification process over time. We
see that given two fields identical in all respects except for resting activation,
the field with the higher resting activation reaches the decision threshold
first.
Figure 6. The two fields are identical except for resting activation: h0 = 1 (left),
h0 = 2 (right). The x-axis is arbitrary.
Figure 7. Sampling was simulated with a decision threshold d = 5, τ = 10, and
noise = 0. The first field (left) reached the decision threshold at t = 25,
and the second field (right) reached the decision threshold at t = 9 (where
t is an arbitrary unit of simulation time). The field with higher initial resting
activation reached the decision threshold faster. Both fields return to
their pre-production resting activation after the decision threshold is reached.
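The pattern in Figures 6 and 7 can be reproduced in a few lines. This is our own illustration of the growth regime of Equation 2, with made-up parameter values:

```python
# Minimal sketch (illustrative, not the authors' code): with all else equal,
# a field with higher pre-production resting activation h0 crosses the
# decision threshold sooner.
def decision_time(h0, base_peak=2.0, d=5.0, tau=10.0, dt=0.01, max_t=500.0):
    h, t = h0, 0.0
    while base_peak + h < d:                      # field maximum rides on resting activation
        h += ((-h + h0 + 2.0 * h) / tau) * dt     # growth regime of Equation 2 (f = 2)
        t += dt
        if t > max_t:
            return None
    return t

t_slow = decision_time(h0=1.0)
t_fast = decision_time(h0=2.0)
print(t_fast < t_slow)   # True: higher resting activation decides faster
```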
We now discuss the ways in which representing gestural parameters by
fields relates to other proposals.
The field equation used in our model parallels the exemplar model in
many ways, but encapsulates much of the functionality of that model in a
single dynamical law which does not require the storage of exemplars.
Memory wanes over time as the field decays, much as older exemplars are
less activated in the exemplar model. Input causes increased activation at a
particular area of the field, much as an exemplar's activation is increased
with repeated perception. This activation decays with time, as memory
does.
Perhaps the most crucial difference between our model and the exem-
plar model described earlier is the time-course dimension. In the exemplar
model discussed, the assignment of a value to a parameter does not have
any time-course. The process is instantaneous. The same is true for the
relation between our model and those of Saltzman & Munhall (1989) and
Browman & Goldstein (1990).
Using fields is a generalization of a similar idea put forth in Byrd &
Saltzman (2003), where gestural parameters are stored as ranges of possible
values. In our model, each range is approximated by an activation field in
memory. Finally, representing targets by activation fields is also a generalization
of two well-known proposals about the nature of speech targets,
Keating's "windows" (Keating, 1990) and Guenther's "convex regions"
(Guenther, 1995). In Guenther's model of speech production, speech targets
take the form of convex regions over orosensory dimensions. Unlike
other properties of targets in Guenther's model, the convexity property does
not fall out from the learning dynamics of the model. Rather, it is an en-
forced assumption. No such assumption about the nature of the distribu-
tions underlying target specification need be made in our model.
2.2.2. Lenition in the dynamical model
When a lexical item is a token of exchange in a communicative context,
phonetic details of the item's produced instance may be picked up by perception.
This will have some impact on the stored instance of the lexical
item. Over longer time spans, as such effects accumulate, they trace out a
path of a diachronic change. Our model provides a formal basis for captur-
ing change at both the synchronic and the diachronic dimensions.
We focus here on how a single field in a lexical entry is affected in a
production-perception loop. The crucial term in the field equation is
input(x,t), which represents sensory input. More specifically, input is a
peak of activation registered by the speech perception module. This peak is
located at some detected x-axis value along the field. This value is assumed
to be sub-phonemic in character. For example, we assume that speakers can
perceive gradient differences in Voice Onset Time values, constriction
location, and constriction degree within the same phonemic categories. In
the current model, the input term is formulated as a Gaussian spike
e^(−(x − off)²), where off is the detected value or offset along the x-axis of
the field.
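Adding such a spike to a field can be sketched as follows; the gain and width values are our own illustrative assumptions:

```python
import math

# Minimal sketch (our own illustration): perceiving a token adds a localized
# Gaussian spike around the detected value `off`, raising activation around
# that sub-phonemic value (cf. Figure 8). Gain and width are assumptions.
def add_input(xs, field, off, gain=1.0, width=0.01):
    spike = [gain * math.exp(-((x - off) ** 2) / width) for x in xs]
    return [f + s for f, s in zip(field, spike)]

xs = [i / 100 for i in range(101)]
field = [0.0] * len(xs)
field = add_input(xs, field, off=0.30)
peak_x = xs[max(range(len(xs)), key=lambda i: field[i])]
print(peak_x)   # 0.3: the field now peaks at the perceived value
```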
The spike corresponding to the input term input(x,t) is directly added to
the appropriate field, resulting in increased activation at some point along
the field's x-axis. A concrete example is presented in Figure 8. Once input
is presented, a system can evolve to a stable attractor state, that is, a loca-
lized peak at a value corresponding to the input. The state is stable in the
sense that it can persist even after the input has been removed. In effect, the
field for the lexical item has retained a memory of the sub-phonemic detail
in the recently perceived input. The process of adding Gaussian input spikes
to an existing field is analogous to the storage of new exemplars in the ex-
emplar model. The field, however, remains a unitary function. It is an in-
tensional representation of a phonetic distribution. A growing set of exem-
plars is an extensional representation.
Figure 8. (Left) Field representing a phonetic parameter of a lexical item in memory.
(Middle) Input function (output of perception corresponding to
input(x,t) in Equation 1). Represents a localized spike in activation
along the field, corresponding in location to, for example, the constriction
degree of the input. (Right) Field of lexical item in memory after input
is added to it. Field shows increased activation around the area of input.
Since activation fades slowly over time, only areas of the field that receive
reinforcement are likely to remain activated. Thus, a peak in activation may
shift over time depending on which region of the field is reinforced by input.
In terms of the lenition model, this means that regions of the field
representing a less lenited parameter fade, while regions representing a
more lenited parameter are kept activated by reinforcement from input.
This interaction between a localized increase in activation based on input
and the slow fading of the field due to memory decay is the basic mechanism
for gradual phonetic change.
Given an initial field (a preshape) representing the current memory state
of a lexical item, we can simulate lenition using the model described above.
Figure 9 shows the results of one set of simulations. Shown is the state of
the simulation at the start, after 50 samples of a token, and after
100 samples (in the simulations, the number of samples is small but each
sample produces a large effect on the field). Each time step of the simula-
tion corresponds to a production/perception loop. Production was per-
formed as described above by picking a value from the field and adding
noise and a bias to it. This produced value, encoded by an activation spike
of the form e^(−(x − off)²), where off = sample(p) + noise + bias, was fed back into
the system as input.
As can be seen in Figure 9, at the point when lenition begins, the field
represents a narrow distribution of activation and there is little variability
when sampling the field during production. As lenition progresses, the dis-
tribution of activation shifts to the left. During this time the distribution
becomes asymmetrical, with a tail on the right corresponding to residual
traces of old values for the parameter. It also grows wider, corresponding to
an increase in parameter variation while the change occurs.
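The lenition loop just described can be sketched as follows. This is our own toy version, not the authors' simulation: the field shape, bias, noise, and decay values are illustrative assumptions.

```python
import random, math

# Minimal sketch of a lenition simulation (illustrative parameters): each
# production samples near the field peak, adds noise and a leftward bias, and
# feeds the result back as a perceptual input spike; the field also decays a
# little on every loop, so the activation peak drifts left over time.
def lenition(n_loops, bias=-0.02, noise_sd=0.01, decay=0.05, seed=1):
    rng = random.Random(seed)
    xs = [i / 200 for i in range(201)]
    field = [math.exp(-((x - 0.7) ** 2) / 0.002) for x in xs]   # initial preshape
    for _ in range(n_loops):
        peak = xs[max(range(len(xs)), key=lambda i: field[i])]
        off = peak + rng.gauss(0.0, noise_sd) + bias            # produced value
        field = [f * (1.0 - decay) + math.exp(-((x - off) ** 2) / 0.002)
                 for f, x in zip(field, xs)]                    # decay + input spike
    return xs[max(range(len(field)), key=lambda i: field[i])]

print(lenition(0))     # 0.7: the peak before lenition starts
print(lenition(100))   # a smaller value: the peak has drifted left, as in Figure 9
```

Setting the seed makes the run reproducible; interleaving biased and unbiased loops, as the text describes next, would keep the field narrower.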
Figure 9. Output of lenition simulation. The x-axis represents a phonetic dimension
(e.g. constriction degree during t/d production). Each curve represents a
distribution of a particular category over the x-axis at a point in time. As
time progresses, the distribution shifts to the left (i.e. there is more un-
dershoot/lenition) and becomes broader.
With small changes in parameterization, our model can more closely
represent the entrenchment behavior seen in Pierrehumbert (2001). In Figure
10, lowering the strength of memory decay, by introducing a constant
factor smaller than 1 on the p(x,t) term in Equation 1, results in less flattening
of the parameter field as lenition proceeds. However, the distribution retains a wide
tail of residual activation around its base.
To keep the field narrow as time proceeds, we can alternate between
production/perception cycles with a production bias and without. This re-
sembles production of the category in contexts where the phonetic motiva-
tion for the bias is present versus contexts where it is absent (e.g. prosodi-
cally weak versus strong positions). In effect, Figure 11 was created by
biasing only every other simulated production. This was done in addition to
lowering the strength of memory decay as discussed above.
Figure 10. Lowering memory decay results in less flattening of the field as the
lenition simulation proceeds.
Like the exemplar model above, the model described in this section can
derive the properties of lenition assumed by the exemplar model. Here we
enumerate the functional equivalences between the two models with respect
to these properties. In the dynamical
model, variability in production is accounted for by noise during the ges-
tural specification process. Each lexical item has its own fields and each
production/perception loop causes a shift in the appropriate field towards
lenition due to biases in production (see Figure 8 for an example of a field
starting to skew to the left). In a given period of time, the number of pro-
duction/perception loops an item goes through is proportional to its fre-
quency. Thus, the amount of lenition associated with a given item shows
gradient variation according to the items frequency. All the processes de-
scribed here occur within a single individual, so lenition is clearly observa-
ble within the speech of individuals. Diachronically, lenition will proceed
at a faster rate for more frequent items, again because they go through more
production/perception loops in a given time frame. This same mechanism is
evident synchronically as well, since at any single point in time, more fre-
quent items will be more lenited than less frequent items.
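The frequency effect falls out of loop counts, which can be sketched directly. Again this is our own toy version with illustrative parameters, not the authors' code:

```python
import random, math

# Minimal sketch (illustrative, not the authors' code): a frequent item
# undergoes more production/perception loops per unit time than a rare one,
# so its field peak drifts further toward the lenited end of the dimension.
def lenited_peak(n_loops, bias=-0.02, noise_sd=0.01, decay=0.05, seed=7):
    rng = random.Random(seed)
    xs = [i / 200 for i in range(201)]
    field = [math.exp(-((x - 0.7) ** 2) / 0.002) for x in xs]
    for _ in range(n_loops):
        peak = xs[max(range(len(xs)), key=lambda i: field[i])]
        off = peak + rng.gauss(0.0, noise_sd) + bias
        field = [f * (1.0 - decay) + math.exp(-((x - off) ** 2) / 0.002)
                 for f, x in zip(field, xs)]
    return xs[max(range(len(field)), key=lambda i: field[i])]

frequent_item = lenited_peak(n_loops=80)   # many loops in the same time span
rare_item = lenited_peak(n_loops=20)       # few loops in the same time span
print(frequent_item < rare_item)           # True: the frequent item is more lenited
```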
Figure 11. Interleaving biased and non-biased productions leads to a consistently
narrow field.
In sum, the broad proposal of this section is that diachronic change can be
seen as the evolution of lexical representations at slow time scales. The
specific focus has been to demonstrate that certain lenition effects, de-
scribed in a previous exemplar model, can also be captured in our model of
evolving activation fields.
3. Conclusion
We have presented a dynamical model of speech planning at the featural or
vocal tract variable level. This model allows us to provide an alternative
account for lenition in lieu of an exemplar-based model. The dynamical and
exemplar models cover the same ground as far as their broad agreement
with the assumed properties of an evolving lenition process is concerned.
However, there are fundamental high level differences between the two.
Tables 2 and 3 contrast properties of the exemplar and dynamical models.
Table 2. Properties of the exemplar model
Exemplar Model
1. Every token of a category (where a category could be any item
capable of being recognized: word, phoneme, animal cry, etc.) is
explicitly stored as an exemplar in memory. A new experience
never alters an old exemplar (Hintzman, 1986).
2. The complete set of exemplars forms an extensional definition of a
probability distribution capturing the variability of a category.
3. Distributions are altered by storing more exemplars.
Table 3. Properties of the dynamical model
Dynamical Model
1. Every token of a category is used to dynamically alter a single
representation in memory associated with that category, and is
then discarded. No exemplars are stored.
2. Variability is directly encoded by the singular representation
of a category. The parameters of a category exist as field approximations
to probability distributions which are defined intensionally.
That is, they are represented by functions, rather
than a set of exemplars.
3. Distributions are altered by dynamical rules defining the impact
of a token on a distribution, and by changes to the distribution
related to the passage of time.
Two key differences are highlighted. First, the dynamical model remains
consistent with one key aspect of generative theories of representation.
Instead of representing categories extensionally as arbitrarily large exemplar
sets, linguistic units and their parameters can have singular representations.2
These are the fields in our specific proposal. It is these unitary representations,
rather than a token-by-token expansion of the exemplar sets, that
drift in sound change. In this sense, our model is similar to other non-exemplar-based
models of the lexicon, such as Lahiri & Reetz's (2002)
model, while still admitting phonetic detail in lexical entries (see the previous
chapter of this volume by Nguyen, Wauquier & Tuller, for relevant discus-
sion).
Second, the dynamical model is inherently temporal. Since both the ex-
emplar and the dynamical model are at least programmatically designed to
include production and perception, which unfold in time, this seems to be a
key property. In an extension of the present model, we aim to link percep-
tual to motor representations and to provide an account of the effects of
certain lexical factors (such as neighborhood density and frequency) on the
time-course of speech production. Such an account would contribute to the
larger goal of establishing an explicit link between the substantial literature
on the time-course of word planning and linguistic theories of representa-
tion.
Acknowledgments
We wish to thank Matt Goldrick for his comments on an earlier draft. Many
thanks also to the two anonymous reviewers and the editors for detailed and
cogent commentary on the manuscript. Research supported by NIH Grant
HD-01994 to Haskins Labs. AG also acknowledges support from an Alex-
ander von Humboldt Research Fellowship.
Notes
1. Authors' names are listed in alphabetic order. Correspondence should be addressed
to both authors: adamantios.gafos@nyu.edu, kirov@cogsci.jhu.edu.
2. It is useful to distinguish the exemplar approach from a version of the dynamical
one where multiple different instances of a category are stored, corresponding
to different registers, different speakers, etc. For our purposes, each of these
subcategories is considered unique and has a singular representation.
References
Beckman, Mary E., de Jong, Ken, Jun, Sun-Ah, & Lee, Sook-hyang
1992 The interaction of coarticulation and prosody in sound change. Language
and Speech, 35: 45–58.
Browman, Catherine P., & Goldstein, Louis
1990 Gestural specification using dynamically defined articulatory structures.
Journal of Phonetics, 18: 299–320.
Bybee, Joan
2003 Lexical diffusion in regular sound change. In Restle, D., & Zaefferer,
D. (eds), Sounds and Systems: Studies in Structure and Change.
Berlin: Mouton de Gruyter, pp. 58–74.
Byrd, Dani, & Saltzman, Elliot
2003 The elastic phrase: modeling the dynamics of boundary-adjacent
lengthening. Journal of Phonetics, 31(2): 149–180.
Cser, András
2003 The Typology and Modelling of Obstruent Lenition and Fortition
Processes. Budapest: Akadémiai Kiadó.
Dell, Gary S.
2000 Lexical representation, counting, and connectionism. In Broe, Michael
B., & Pierrehumbert, Janet (eds), Papers in Laboratory Phonology
V. Cambridge: Cambridge University Press, pp. 335–348.
Erlhagen, Wolfram, & Schöner, Gregor
2002 Dynamic field theory of movement preparation. Psychological Review,
109: 545–572.
Guenther, Frank H.
1995 Speech sound acquisition, coarticulation, and rate effects in a neural
network model of speech production. Psychological Review, 102: 594–621.
Gurevich, Naomi
2004 Lenition and Contrast: Functional Consequences of Certain Phonetically
Conditioned Sound Changes. Outstanding Dissertations in Linguistics
series. New York: Routledge.
Hintzman, Douglas H.
1986 "Schema abstraction" in a multiple-trace memory model. Psychological
Review, 93(4): 411–428.
Keating, Patricia A.
1990 The window model of coarticulation: articulatory evidence. In
Beckman, M. E., & Kingston, J. (eds), Papers in Laboratory Phonology
I: Between the Grammar and the Physics of Speech. Cambridge:
Cambridge University Press, pp. 451–470.
Kiparsky, Paul
1968 Linguistic universals and linguistic change. In Bach, Emmon, &
Harms, Robert (eds), Universals in Linguistic Theory. New York:
Holt, Rinehart and Winston, pp. 170–202.
Lahiri, Aditi, & Reetz, Henning
2002 Underspecified recognition. In Gussenhoven, C., & Warner, N. (eds),
Papers in Laboratory Phonology VII. Berlin: Mouton de Gruyter,
pp. 637–675.
Lavoie, Lisa
2001 Consonant Strength: Phonological Patterns and Phonetic Manifestations.
Outstanding Dissertations in Linguistics series. New York: Garland.
Pierrehumbert, Janet
2001 Exemplar dynamics: word frequency, lenition, and contrast. In Bybee,
J., & Hopper, P. (eds), Frequency Effects and the Emergence of
Linguistic Structure. Amsterdam: John Benjamins, pp. 137–157.
Saltzman, Elliot L., & Munhall, Kevin G.
1989 A dynamical approach to gestural patterning in speech production.
Ecological Psychology, 1: 333–382.
Wright, Richard
1994 Coda lenition in American English consonants: an EPG study. The
Journal of the Acoustical Society of America, 95: 2819–2836.
Cross-linguistic trends in the perception of place of
articulation in stop consonants: A comparison
between Hungarian and French
Willy Serniclaes and Christian Geng
1. Introduction
A basic question in the study of speech development during infancy is to
understand how the perceptual predispositions of the pre-linguistic child
contribute to phonological features in a given language. During a period
extending from birth to some six months of age, children perceive many of
the phonological contrasts present in the world's languages. Pre-linguistic
children react to the acoustic differences between various phonological
categories, whereas they do not react to differences within categories
(Kuhl, 2004). This indicates that the perceptual processes are somehow
prepared from birth for handling phonological contrasts. Being independent
of any specific language, these perceptual processes correspond to universal
capacities, henceforth "predispositions", and there is a long-standing debate
on their nature. For the Motor theory of speech perception (Liberman &
Mattingly, 1985), the predispositions are part of a phonetic module specia-
lized for the perception of articulatory features, such as tongue position,
interarticulator timing, etc. Alternatively, the predispositions would be psy-
choacoustic in nature (Jusczyk, 1997; Gerken & Aslin, 2005) and corres-
pond to acoustic features such as the direction of formant transition fre-
quencies, tone onset time, etc. While the issue of the phonetic vs. acoustic
nature of the predispositions is certainly of great interest, another fundamental
question is about the adaptation of the universal predispositions to
the perception of language-specific categories. We will address this last
question in the present paper, leaving aside as far as possible the issue of
the specific nature of these predispositions.
Possible answers in current theories of speech development are that
phonological categories (1) are acquired through selection of predisposi-
tions relevant for perceiving universal features (Pegg & Werker, 1997); (2)
emerge by building up prototypes in an acoustic space, without straightfor-
ward relationship with the predispositions (Kuhl, 2004). According to se-
lectionist approaches in their strongest form, phonological contrasts should
conform to the perceptual predispositions and adaptation to the language
environment would then proceed by mere selection of predispositions
(Pegg & Werker, 1997). The orthogonal position argues for a language-specific
morphing of the acoustic space and an almost free configurability of
vowels in this space. A compromise between the strong innatist position
and the prototypical approach is that adaptation of the perceptual predispo-
sitions to the linguistic paradigm of the ambient language proceeds not only
by selection but also by combinations between predispositions (Serniclaes,
2000; Hoonhorst, Colin, Radeau, Deltenre, & Serniclaes, 2006). As there is
fairly strong evidence for the existence of predispositions for the perception
of virtually all possible phonetic contrasts in the world's languages (e.g.
Vihman, 1996), we cannot avoid contemplating the constraints imposed by
such predispositions on the build-up of phonological categories prevailing
in a specific language.
A similar question to the one raised by the acquisition of phonological
systems by individuals is to understand how universal features contribute to
the genesis of phonological systems. These questions are fairly similar as
they pertain to the build-up of language-specific phonological features from
a universal set of features. Somewhat analogously to the developmental
models, phonetic models of cross-linguistic diversity describe phonological
systems as (1) combinations between universal features (Jakobson, Fant, &
Halle, 1952); (2) optimization of distances between categories in some
acoustic space (Liljencrants & Lindblom, 1972). In the featural approach of
phonological systems, categories are the subproduct of feature combina-
tions and the latter can either be orthogonal or not. Cross-linguistic differ-
ences in consonant place of articulation offer a typical example of non-
orthogonal combination between two different binary features, many lan-
guages displaying three place-of-articulation categories instead of the 17
potential ones for consonants in the world's languages (Ladefoged & Mad-
dieson, 1996). More specifically for the purpose of the present study on the
perception of stop consonants, two different binary features generally give
rise to three categories, /b, d, g/, instead of the four potential ones,
/b, d, ɟ, g/, in the Jakobsonian framework, which was explicitly defined
with reference to acoustics and perception (Jakobson et al., 1952). Non-
orthogonal combinations between features in system building are conceptu-
ally similar to those occurring during phonological development (see here-
after feature "couplings"): in both cases the features are used in such a way
as to construct, or to appropriate, the categories prevailing in a given lan-
guage. Distance models take quite another way. Liljencrants & Lindblom
(1972) were the first to relate phonetic principles of perceptual contrast to
the structure of vowel inventories and their sizes. In short, languages prefer
vowels which are maximally distinct for the perceiver and are produced
with the least effort for the producer. Dispersion approaches to phonologi-
cal systems match nicely with the prototype approach of speech develop-
ment: in both cases categories freely emerge in a uniform psycho-acoustic
space. The language maximizes the perceptual distances between categories
(Liljencrants & Lindblom, 1972) and the perceptual system recovers the
categories by creating maximally different prototypes (Guenther & Bohland,
2002).
However, neither the prototypical approach to speech development nor
distance models of phonological systems take account of the possible
role of initial conditions in the build-up of phonological spaces. In
their strongest forms, phonological contrasts should conform to the percep-
tual predispositions and adaptation to the language environment would then
proceed by mere selection of predispositions. However, this seems hardly
tenable in view of the diversity of phonological contrasts and their plastic-
ity across phonetic contexts. Here we present some further evidence in
support of the hypothesis that adaptation of the universal predispositions to
the ambient language proceeds by combinations between predispositions
(Serniclaes, 2000).
1.1. Models of speech perception development
In many instances, adult listeners display categorical perception: they are
much more sensitive to differences between speech sounds belonging to
two different feature categories than to a similar acoustic variation between
sounds belonging to the same category, i.e. when no boundary is crossed
(see for a review: Harnad, 1987). Still, there are also examples of non-
categorical perception in adults, illustrated by better discrimination of with-
in vs. between-category differences (e.g. Massaro, 1987). However, the
origins of this non-categorical perception are not entirely clear and might at
least partly be due to the existence of subordinate categories, i.e. categori-
cal distinctions without explicit labels that are finer-grained than those un-
der scope in a given task (Snowdown, 1987). The existence of subordinate
categories along speech continua is congruent with the hypothesis that the
perception of the phonological categories evidenced in the adults arises
from combinations of predispositions present in the pre-linguistic child (see
hereafter "feature couplings").
Predispositions for categorical perception have been evidenced in the
pre-linguistic child (below six months of age, see for a review: Vihman,
1996). For instance, infants younger than six months are sensitive to both
negative and positive natural VOT boundaries whatever their linguistic
background (Spanish, Kikuyu: Lasky, Syrdal-Lasky & Klein, 1975). Simi-
lar predispositions were evidenced for place of articulation (Eimas, 1974).
The initial ability to discriminate the universal set of phonetic contrasts
however appears to decline in the absence of specific language experience.
The decline occurs within the first year of life (Werker & Tees, 1984a) and
it involves a change in processing strategies rather than a sensorineural loss
(Werker & Tees, 1984b). Finally, repeated exposure to the sounds of a
given language also gives rise to facilitation effects (Kuhl, Stevens, Haya-
shi, Deguchi, Kiritani, & Iverson, 2006).
How do phonological features arise from predispositions? One possible
answer to this question is that phonological features are acquired through
selection of predispositions relevant for perceiving categories in a given
language (Pegg & Werker, 1997). Another possibility is that phonological
categories emerge by exposure to the sounds present in a given language
without relationship to the predispositions (Kuhl, 2004). While it seems
evident that adaptation to a specific language does not proceed entirely
through selective processes, there is evidence that the emergence of phono-
logical percepts is somehow constrained by predispositions. For instance, in
a French speaking environment the discrimination of VOT in young infants
below the age of six months is organized around universal boundaries located at approximately -30 ms and +30 ms, whereas infants above six months
discriminate the adult VOT boundary which is located at 0 ms (Hoonhorst
et al., 2006).
The phonological coupling hypothesis (Serniclaes, 2000) explains the emergence of language-specific boundaries that do not coincide with natural ones by interactions between the universal predispositions. For example, the predispositions for
perceiving either negative or positive VOT cannot directly account for the 0
ms VOT boundary, as children below some six months of age are not sensi-
tive to this boundary. This boundary might simply result from the addition
of the raw acoustic inputs, which would imply that the predispositions are
deactivated. However, experiments with stimuli generated by factorial
variation of negative and positive VOT (i.e. stimuli in which the two cues
are in conflict) suggest that the 0 ms boundary is obtained by integrating
Cross-linguistic trends in the perception of place of articulation 245

the cues after interactive processing by the predispositions, i.e. after cate-
gorisation of positive VOT has affected categorisation of negative VOT,
and conversely (Serniclaes, 2000). Such interactive integration is a special
instance of the general concept of coupling between perceptual entities
(for a review see Masin, 1993).
It is well known that different acoustic cues contribute to the perception
of the same phonological feature (Repp, 1982). One might wonder whether
couplings between predispositions underlie the integration of all the cues
which contribute to the same feature. This would be the case if each acous-
tic cue was processed by a different predisposition. There are however two
arguments against generalized couplings. First, the integration of some acoustic cues results from psychoacoustic interactions, an obvious example being the trade-off between burst duration and intensity for the perception of the voicing feature (Repp, 1979). The second argument against generalized couplings is that acoustic cues which are independent on acoustic grounds might nevertheless be part of the same predisposition because they are tied to the same articulatory dimension. However, this latter argument is only valid if we take for granted that predispositions are phonetic in nature.


1.2. Place of articulation perception
The perception of place of articulation in stop consonants offers additional
evidence for couplings between features. There are basically two different
kinds of acoustic cues involved in place perception: those carried by the
burst and those carried by the formant transitions. Although these cues
might correspond to different predispositions, as a starting point we will consider here that burst and transitions are part of the same predisposition (be it for psychoacoustic or phonetic reasons). Further, we will start from the transition-based description of the place features afforded by the Distinctive Region Model (DRM) of place production (Carré & Mrayati, 1991). The
DRM is organized around the neutral vowel (schwa) as a central reference.
In the neutral vowel context, place boundaries tend to correspond to flat F2-
F3 transitions, the categories being characterized by rising vs. falling transi-
tions (Figure 1). The four possible combinations between F2 and F3 transi-
tions directions generate four Distinctive Regions on the anterior-posterior
direction with the following specifications: F2-F3 both rising (R8), F2 ris-
ing-F3 falling (R7), F2-F3 both falling (R6), F2 falling-F3 rising (R5). Al-
though there are no clearcut correspondences between Distinctive Regions
and articulatory descriptions of place categories, the R8, R7, R6, and R5
regions are usually ascribed to labial (/b/), dental (void in French), alveolar
(/d/) and velar (/g/) places of articulation, in that order.
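The four-way mapping from transition directions to Distinctive Regions can be sketched as a small function. This is an illustration of ours, not part of the DRM itself; the function name is hypothetical, and the default endpoint frequencies are the neutral-vowel stimulus targets assumed in this study.

```python
def drm_region(f2_onset, f3_onset, f2_end=1500.0, f3_end=2500.0):
    """Classify a CV transition into a Distinctive Region (R5-R8)
    from the directions of its F2 and F3 transitions.

    A transition is 'rising' when its onset frequency lies below
    its endpoint frequency (all values in Hz).
    """
    f2_rising = f2_onset < f2_end
    f3_rising = f3_onset < f3_end
    if f2_rising and f3_rising:
        return "R8"  # labial, /b/
    if f2_rising:
        return "R7"  # dental (void in French)
    if not f3_rising:
        return "R6"  # alveolar, /d/
    return "R5"      # velar, /g/
```

For instance, a stimulus with both F2 and F3 transitions rising, such as `drm_region(1200, 2300)`, falls in R8.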
[Figure 1a (graphic): F3 onset plotted against F2 onset (Hz), showing the stimuli S1-S14 and the regions labelled labial (R8), dental (R7), alveo-palatal and velar (R6, R5).]
[Figures 1b, 1c, 1d (graphics): alternative perceptual boundary configurations.]
Figure 1. Locations of potential place categories in the F2-F3 transition onset
space according to the Distinctive Region Model of speech production
(Fig.1a) and of perceptual boundaries according to different models of
language acquisition (Figs. 1b, 1c, 1d). According to the DRM, changes
in size of four different articulatory regions allow separate control over
the direction of F2 and F3 transitions. S1 S14 indicate the location of
the stimuli of the present experiment in the F2-F3 transition onset space
(Fig. 1a).
For the purpose of relating the DRM to the four Hungarian stop categories, our working hypothesis in the present study was that palatal stops (/ɟ/) would occupy the R6 region in this language and that the remaining category (/d/) would occupy the R7 region. To locate the Hungarian palatal in the
R6 region is congruent with its complex nature (Keating, 1988) with both
dorsal (like the velar) and coronal (like the alveolar) checkmarks. Although
Hungarian /d/ stops are usually considered as alveolars rather than dentals,
there are large individual variations in coronal place of articulation (as evi-
denced by articulatory investigations in English and French: Dart, 1998).
Therefore, it is not unreasonable to postulate that the presence of palatals in
the R6 region might push the Hungarian /d/ stops inside the dental region
(R7).
Congruent with the acoustic differences between place categories, place
perception is grounded on changes in transition direction in the neutral
context (Serniclaes & Carré, 2002). However, while the perceptual place-
of-articulation boundaries correspond to flat transitions in the neutral con-
text, they undergo specific adjustments in other contexts. The place boun-
dary is shifted towards falling transitions before back rounded vowels, ris-
ing transitions before front unrounded vowels, and intermediate positions
before front rounded vowels. The radial model of place perception states
that the contextual adjustments of the transition boundary follow a rotation-
al movement in the F(onset) F(endpoint) plane around a central point
corresponding to the flat transition in the neutral context, the direction of
the boundary line depending on the perceived identity of the following
vowel (Serniclaes & Carré, 2002).
Though place perception is strongly dependent on phonetic context, the
fact that place boundaries correspond to flat F2 and F3 transitions in the
neutral vowel context points to a relationship with natural psychoacoustic
settings. Both infants below 9 months of age and adults are much less sen-
sitive to a difference between two different falling or rising frequency tran-
sitions than to a difference between a falling and a rising transition (Aslin,
1989). This suggests that flat transition boundaries correspond to basic
psychoacoustic limitations. Although psychoacoustic in nature, the sensitiv-
ity to changes in the direction of frequency transitions might be adapted for
perceiving place of articulation during language development. Alternatively, it is also possible that sensitivity to differences in the frequency transition direction was integrated into a speech-specific module during phylogenetic evolution (Liberman, 1998). What is clear is that flat transition
boundaries are not directly usable for perceiving consonant place of articu-
lation in all the languages.
The flat F2-F3 transitions might be straightforwardly used for perceiv-
ing place of articulation contrasts in a four-category language such as Hun-
garian. Each category would then occupy a single region of the formant
transition onset F2-F3 acoustic space (Figure 1b). In a three-category language such as French, the perceptual boundaries afforded by flat F2 and
F3 transitions might also be used as such for perceiving place contrasts but
one region would then be perceptually void. This is probably what an en-
tirely selective model of predispositions would predict, although the propo-
nents of selective models did not address the present conjecture. However,
as explained above, a more realistic model of speech development with
predispositions is that there are also couplings between predispositions. In
the present case, coupling between the predispositions for perceiving the F2
and F3 transitions might give rise to a new boundary that would be settled
in the middle of the void region, while the two other boundaries stick to
the natural settings (Figure 1c). While these two models of perceptual de-
velopment are grounded on predispositions, a distance optimization view
would call on the categories present in the environment for positioning the
boundaries. Under this view, categories would tend to divide the space into
three equal regions and the boundaries would be settled accordingly (Figure
1d), without evident relationship with natural settings.
It should be stressed that while coupling between perceptual predisposi-
tions such as those for perceiving changes in the direction of frequency
transitions might proceed in a variety of ways, the simplest hypothesis is
that coupling is linear and simply additive, i.e. with equal weightings. Lin-
ear relationships are simpler on mathematical grounds than are nonlinear
functions. Further, there are positive reasons for preferring linear functions
when dealing with frequency transitions. Linear relationships between the
onset and offset points of frequency transitions contribute to the discrimina-
tion between /b, d, g/ place of articulation categories (Sussman, Fruchter,
Hilbert, & Sirosh, 1998). Finally, linear processing of frequency transitions
might be anchored in highly efficient processes evidenced in animals (bats), which might also be present in some phylogenetically derived form in humans (Sussman et al., 1998). Now, if linear relationships prevail in the processing of single frequency transitions, couplings between
transitions should naturally follow linear rules. As to the additivity hy-
pothesis, couplings might well rely on weighted combinations in which one
transition might be more important than the other. However, if this was
true, the boundaries might take a wide range of different directions in the
F2-F3 space. The predictions of the coupling model would then not be dif-
ferent from those of the Free Dispersion model.


1.3. The present study
We recently collected perceptual evidence in support of combinations be-
tween phonetic features for place of articulation (in French: Serniclaes,
Bogliotti, & Carré, 2003; in Hungarian: Geng, Mády, Bogliotti, Messaoud-Galusi, Medina, & Serniclaes, 2005). While phonological boundaries did not always correspond to those included in the universal predispositions, thereby confirming that simple selectionist approaches cannot account for perceptual development on their own, all the phonological boundaries were somehow related to the universal ones. This lends support to the hypothesis
that phonological boundaries which are seemingly unrelated to the percep-
tual predispositions arise from couplings (i.e. interactive combinations)
between predispositions (Serniclaes, 2000). Also consistent with couplings
is that similar place of articulation boundaries were found for the distinc-
tions that are shared by French and Hungarian, despite the fact that Hunga-
rian uses four place categories whereas French only uses three place cate-
gories. However, no direct comparisons between the French and Hungarian
boundaries were performed in our previous reports.
Here we present some new evidence in support of coupling between
predispositions in the perception of phonological features based on cross-
linguistic comparisons. A direct comparison between the labelling re-
sponses of either French or Hungarian listeners to the same stimuli was
used to test the similarities and differences between these two languages.
We expected that both languages would display the same perceptual
boundaries for distinctions they share in common in the F2-F3 transition
onset space. Further, we wanted to confirm that these boundaries corre-
spond to natural boundaries or to some coupling between natural bounda-
ries (see Figure 1c).
2. Method
2.1. Participants
Participants for the Hungarian subset were (a) participants of an undergra-
duate linguistics course or (b) volunteers contacted via a mailing list. Apart
from their first language, all of them were familiar with at least one of the
languages German, French or English. Most of them were participants in undergraduate exchange programs. They were between 18 and 53 years old
with no reading or hearing impairment reported. The French dataset was
similar in age structure, with subjects' ages ranging between 17 and 59 years.
Likewise, there were no known auditory problems.


2.2. Stimuli
Twenty-three CV stimuli were generated with a parallel formant synthesizer (conceived by R. Carré: http://www.tsi.enst.fr/~carre/). F1-F2-F3 transitions ended at 500, 1500 and 2500 Hz respectively after a 27 ms transition.
The VOT was set to -95 ms and the stable vocalic portion had a duration of
154 ms. The stimuli differed as to the onset of F2 and F3 transition. 14
stimuli were generated by separate modification of the F2 and F3 onsets
along a "phonetic" continuum, normal to the locations of the natural boundaries corresponding to either flat F2 or F3 transitions, as shown in Figure 1. Nine other stimuli were generated by joint modifications of the F2
and F3 onsets along a "phonological" continuum normal to the expected
category boundaries separating the F2/F3-space into three distinct regions
(for further details see: Serniclaes et al., 2003). Successive stimuli were 1
Bark apart on both continua. The present paper only deals with the data of
the phonetic continuum. The same number of stimuli was generated from the same basic data but with an additional, constant burst-like signal portion.
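The 1-Bark spacing of the stimuli presupposes a hertz-to-Bark conversion. The chapter does not state which formula was used; Traunmüller's (1990) approximation, sketched below, is one common choice and lands close to the reference values quoted later in the Results (1500 Hz near 11 Bark for a flat F2, 2500 Hz near 14.5 Bark for a flat F3).

```python
def hz_to_bark(f_hz):
    """Hz-to-Bark conversion using Traunmüller's (1990) approximation.

    This formula is an assumption of ours: the chapter does not name
    the Bark scale variant used for stimulus construction.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53
```

Under this formula, `hz_to_bark(2500)` is about 14.5 Bark and `hz_to_bark(1500)` about 11.1 Bark.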


2.3. Procedure
Both continua were presented to each of the participants. The continua with
and without burst were presented in alternating order resulting in a be-
tween-subject factor (order of presentation) which was used for control
purposes. Hungarian participants were told that they would hear one of the
four sounds "b", "d", "gy" or "g" and were instructed to report which of the four sounds they had heard. They were told that the sounds were not necessarily presented with equal frequency, and to judge each sound separately. For the French participants, the procedure was the same except that only three response alternatives were available.


2.4. Statistical models
The data were fitted by Nonlinear Regression with a model in which the
effect of F2 was nested in the effect of F3 and the latter was nested in the
effects of Residual cues (i.e. the acoustic cues for place which were con-
stant in the stimuli). This model is instantiated by Equations 1 and 2. Label-
ling responses depend on a Logistic Regression (LR) equation including
Residual cues, a nested LR equation including F3, itself including a nested
LR equation including F2. Each LR equation included different variables
representing the effects of Burst and Language.

Equation 1. Logistic function

Λ(λ(cues)) = 1 / (1 + e^(-λ(cues)))

Where Λ is the Logistic function and λ is a Linear function.
The model used here is composed of three nested Logistic functions, as specified in Equation 2.

Equation 2. Coupling model

Labelling response = Λ(λ(R, Λ(λ(F3, Λ(λ(F2))))))

Where R stands for Residual Cues. The λ functions have the following parameters:

λ(R, Λ(λ(F3, Λ(λ(F2))))) = A1 + B11·Burst + B12·Lang + B13·Burst·Lang + K1·Λ(λ(F3, Λ(λ(F2))))

λ(F3, Λ(λ(F2))) = A21 + A22·F3 + B21·F3·Burst + B22·F3·Lang + B23·F3·Burst·Lang + K2·Λ(λ(F2))

λ(F2) = A31 + A32·F2 + B31·F2·Burst + B32·F2·Lang + B33·F2·Burst·Lang

Where F2 and F3 are the formant frequencies scaled in Barks. Burst and Lang are dichotomous variables: Burst stands for the presence vs. absence of a noise burst in the stimuli, and Lang stands for the Hungarian vs. French group of participants.

This is a hierarchical coupling model. Coupling means perceptual inter-
dependencies in the processing of different features and, accordingly, the
model includes interdependencies in the perception of the different acoustic
cues which convey these features. However, rather than being symmetrical,
couplings are hierarchical in Equation 2, a working assumption for the sake
of simplicity. A symmetrical model would indeed require feedback loops in
the processing of the different cues.
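For concreteness, the nested structure of Equations 1 and 2 can be written out as code. This is a sketch of ours, with parameter names following the equations above; the actual parameter values were estimated by Nonlinear Regression and are not reproduced here.

```python
import math

def logistic(x):
    """Equation 1: the logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def coupling_model(f2, f3, burst, lang, p):
    """Hierarchical coupling model (Equation 2).

    f2, f3: transition onsets in Bark; burst, lang: 0/1 dummy codes;
    p: dict of parameters named after the equations in the text.
    Returns the predicted labelling probability.
    """
    # Innermost level: linear function of F2 and its interactions.
    lam_f2 = (p["A31"] + p["A32"] * f2 + p["B31"] * f2 * burst
              + p["B32"] * f2 * lang + p["B33"] * f2 * burst * lang)
    inner = logistic(lam_f2)
    # Middle level: F3 terms plus the coupled (nested) F2 output.
    lam_f3 = (p["A21"] + p["A22"] * f3 + p["B21"] * f3 * burst
              + p["B22"] * f3 * lang + p["B23"] * f3 * burst * lang
              + p["K2"] * inner)
    middle = logistic(lam_f3)
    # Outer level: residual cues plus the coupled F3/F2 output.
    lam_r = (p["A1"] + p["B11"] * burst + p["B12"] * lang
             + p["B13"] * burst * lang + p["K1"] * middle)
    return logistic(lam_r)
```

With all parameters at zero the model returns chance level (0.5), which makes the nesting easy to check before fitting.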
3. Results
[Figure 2 (graphics): four labelling-curve panels (Hungarian without burst, Hungarian with burst, French without burst, French with burst), each plotting percent responses (0-100%) against stimulus number (1-14) for the categories labial, alveolar, palatal and velar.]
Figure 2. Labelling curves for the stimuli with or without burst in Hungarian and
French.

The labelling curves for the stimuli with or without burst in French and
Hungarian are presented in Figure 2. Although there are obvious differenc-
es between the labelling curves for the stimuli with vs. without burst, the
locations of the boundaries (i.e. the stimuli collecting an equal number of
responses for two adjacent categories) are only marginally affected. In
French, there are three boundaries corresponding to (from left to right in
Figure 2) the alveolar/labial, the labial/velar and the velar/alveolar distinc-
tions. Interestingly, there is a secondary peak of velar responses around the
alveolar/labial boundary, mainly for the stimuli without burst. In Hunga-
rian, there are four boundaries corresponding to (from right to left in Figure
2) the palatal/alveolar, alveolar/labial, the labial/velar and the velar/alveolar
distinctions.
The distinctions between palatals, alveolars and velars are not very
clearcut (Figure 2). However, the Hungarian palatal and alveolar functions,
when taken together, correspond fairly well to the French alveolar function.
This is clear from Table 1 which gives the different boundaries, assessed by
linear interpolations around the 50% response points on the mean labelling
functions (Figure 2). Examination of these boundaries indicates that while the alveolar/palatal boundary in Hungarian is fairly close to the expected
flat F2 transition value, the alveolar/labial boundary in this language is far
apart from the expected flat F3 transition value and much closer to the
French alveolar/labial boundary.
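The boundary estimation described here, linear interpolation around the 50% response point, amounts to the following short routine (our own reimplementation of the stated procedure, not the authors' code).

```python
def boundary_50(stimuli, proportions):
    """Estimate a labelling boundary as the 50% crossover point.

    stimuli: ordered stimulus values along the continuum;
    proportions: labelling proportions (0-1) for one category.
    Interpolates linearly between the two stimuli straddling 50%.
    """
    for i in range(len(stimuli) - 1):
        y0, y1 = proportions[i], proportions[i + 1]
        # A crossover occurs when 0.5 lies between y0 and y1.
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            x0, x1 = stimuli[i], stimuli[i + 1]
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # the labelling function never crosses 50% here
```

For example, for stimuli 1-4 with proportions 0.9, 0.7, 0.3, 0.1, the crossover falls at stimulus 2.5.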

Table 1. Values of formant transitions at the perceptual boundary for the different
place contrasts prevailing, for each burst condition (without vs. with),
and for each language (Hungarian vs. French). The boundary values are
given separately for the F2 and F3 transitions and scaled as the distances
in Barks between the onset and offset frequencies. Positive values indi-
cate falling transitions, negative values indicate rising transitions. Rele-
vant values for assessing the hypotheses are presented in bold. For the
labial/velar contrast, the F2 boundary values are fairly close to the ex-
pected flat boundary transition, as indicated by the near zero values, both
for the stimuli with and without bursts and in both languages. For the ve-
lar/alveolar-palatal contrast, the values reported correspond to the ve-
lar/alveolar boundary in French and to the velar/palatal boundary in
Hungarian. As expected, the F3 boundary values relative to this contrast
are also fairly close to the flat boundary transition for both for the stimuli
with and without bursts and in both languages. For the palatal/alveolar
contrast, which is only present in Hungarian, the boundary is close to the
expected flat F2 transition for the stimuli with burst but not for the stim-
uli without burst. Finally, the Hungarian alveolar/labial boundary differs
from the expected flat F3 boundary transition both for the stimuli with
burst and for those without burst. The Hungarian alveolar/labial bound-
ary is similar to the French alveolar/labial boundary, both corresponding
approximately to a stimulus in which a rising F2 transition is compen-
sated by a falling F3 transition although the trade-off is not perfect.
                    Without burst            With burst
                 Hungarian    French      Hungarian    French
                  F2    F3    F2    F3     F2    F3    F2    F3
labial/velar     0.3  -1.4   0.1  -1.4    0.1  -1.4  -0.1  -1.4
velar/alv.-pal.  1.7   0.0   1.7   0.2    1.7   0.3   1.7   0.3
palatal/alv.     1.0   1.4    -     -     0.3   1.4    -     -
alveolar/labial -1.1   1.4  -2.0   1.4   -2.1   1.4  -1.9   1.4

The data were fitted with Non Linear Regressions (NLR) run on a hierar-
chical coupling model (see Method, Equations 1 and 2). A separate NLR
was run for each contrast, i.e. labial/velar, velar/alveolar-palatal and alveo-
lar-palatal/labial. NLR was used for testing the effect of language on place
identification as well as specific hypotheses on the location of the place
boundaries in the F2-F3 onset transition space. As explained in the Intro-
duction, we expected that the place contrasts which are common to both
languages display the same perceptual boundaries. We also wanted to con-
firm previous analyses conducted separately on the data collected for each
language and which showed that the place boundaries correspond to natural
boundaries or to some coupling between natural boundaries. Specifically,
we expected that the labial/velar boundary would correspond to a flat F2
transition, that the velar/alveolar-palatal boundary would correspond to a
flat F3 transition and that the alveolar-palatal/labial boundary would cor-
respond to a tradeoff between a rising F2 transition and a falling F3 transi-
tion (see Figure 1c).
The boundary estimations are given in Table 2. For the labial/velar con-
trast, the model only included an F2 component nested in a Residual cues component (cf. Equation 2: Λ(λ(R, Λ(λ(F2)))) submodel). The effect of F3 and
its interactions with Burst and Language were not significant. There were 7
significant parameters. Burst and Language biases were not significant. The
effects of F2 (bias and slope), the Burst x F2, Language x F2 (all p < .001)
and Burst x Language x F2 (p < .05) interactions were significant. The labi-
al/velar boundary corresponds to an almost flat F2 transition in both lan-
guages, both for the stimuli with and without bursts (Table 2).

Table 2. Values of formant transitions at the perceptual boundary for the place
contrasts common to both languages, for each burst condition (without
vs. with), and for each language (Hungarian vs. French). Each data cell
gives the observed values, NLR estimations and 95% CI limits. For the
labial/velar contrast, the boundary values are fairly close to the flat F2
boundary transition (11.2 Bark, 1500 Hz F2) in both languages, both for
the stimuli with and without bursts. For the velar/alveolar-palatal con-
trast, the boundary values are close to the flat F3 boundary transition
(14.5 Bark, 2500 Hz F3) in both languages, both for the stimuli with and
without bursts. For the alveolar-palatal/labial contrast, boundary values
are indexed by ratio of the extent of the F3 vs. the extent F2 transition in
Bark. The F3/F2 transition extent ratio is fairly close to 1, except for the
Hungarian data for stimuli with burst, and never significantly different
from 1. This suggests that the alveolar-palatal/labial boundary corre-
sponds to a trade-off between a rising F2 and a falling F3 transition in
both languages.
Contrast                        Without burst             With burst
                             Hungarian    French       Hungarian    French
labial/velar     Observed       11.5       11.3           11.3       11.1
(F2 onset)       NLR            11.6       11.1           11.3       11.0
                 CI limits   11.2-12.2  10.7-11.6      10.9-11.9  10.7-11.5
velar/alv.-pal.  Observed       14.3       14.7           14.8       14.8
(F3 onset)       NLR            14.3       14.7           14.8       14.9
                 CI limits   13.8-14.6  14.6-14.9      14.7-14.9  14.9-15.0
alv.-pal./labial Observed        1.0        0.7            0.5        0.7
(F3 extent/      NLR             0.9        0.9            0.7        0.7
 F2 extent)      CI limits    0.5-1.3    0.5-1.3        0.4-1.0    0.4-1.0

For the velar/alveolar-palatal contrast, the model included an F3 component
nested in a Residual cues component (cf. Equation 2: Λ(λ(R, Λ(λ(F3)))) submodel). The effect of F2 (bias and slope) and its interactions with Burst and
Language were not significant. There were 8 significant parameters. The
effect of the Residual cues, Burst bias, Language bias and Burst x Language bias were significant (all p < .001). The effects of F3 (p < .001) and
the Burst x F3 interaction (p < .01) were also significant. The ve-
lar/alveolar-palatal boundary corresponds to an almost flat F3 transition for
the stimuli without burst and a slightly falling F3 transition for the stimuli
with burst (Table 2).
For the alveolar-palatal/labial contrast, the model included an F2 component nested in an F3 component (cf. Equation 2: Λ(λ(F3, Λ(λ(F2)))) submodel). The effects of the Residual cues as well as Burst and Language
biases were not significant. There were 6 significant parameters. The ef-
fects of F2 and F3 (bias and slope), as well as the Burst x F2 and Burst x F3
interactions were significant (all p < .001). The tradeoffs between F2 and
F3 transition onset values are presented in Table 2, per language and burst
condition. A rising F2 transition is compensated by a falling F3 transition in
both languages and both burst conditions, indicating that the alveolar-
palatal/labial boundary corresponds to a trade-off between a rising F2 and a
falling F3 transition.

Figure 3. Examples of the relative failure of the Logistic Regression (LR) vs. Non
Linear Regression (NLR) for assessing perceptual boundaries (50% re-
sponse points). Observed and expected response scores for the la-
bial/velar contrast in French (right) and for the alveolar-palatal/velar con-
trast in Hungarian (left). In both cases the assessment of the boundary is
much better with NLR than with LR.

The performances of the NLR models were compared to those of the sim-
ple Logistic Regressions with the same number of parameters. The percen-
tage of explained variance amounted to 63.4 % with NLR vs. 61.8 % with
LR for the labial/velar contrast, to 40 % with NLR vs. 38 % with LR for the
velar/alveolar-palatal contrast, to 64.1 % with NLR vs. 60.4 % with LR for
the alveolar-palatal/labial contrast. The NLR models fitted the data better
than simple Logistic Regressions although the quantitative differences are
fairly small. However, these differences are far from being negligible be-
cause the differences between expected and observed boundaries are much
larger with the LR vs. NLR models. This is illustrated with two different
examples in Figure 3.
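The "percentage of explained variance" used in this comparison is presumably the usual R² statistic computed on the response scores; under that assumption, a minimal version looks like this.

```python
def pct_explained_variance(observed, predicted):
    """Percentage of explained variance (R-squared):
    100 * (1 - SS_residual / SS_total)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)
```

A perfect fit returns 100%; for observed scores 0, 0, 1, 1 predicted as 0.25, 0.25, 0.75, 0.75 it returns 75%.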

[Figure 4 (graphics): two territorial-map panels, without burst and with burst, each plotting F3 onset (Bark, 12-16) against F2 onset (Bark, 9-13).]

Figure 4. NLR estimations of the territorial maps per burst condition with both
French boundaries (plain lines) and Hungarian boundaries (dotted lines).
The labial/alveolar-palatal boundaries of the two languages overlap.

Territorial maps of the place categories in the F2-F3 onset frequencies are
presented in Figure 4. These maps were obtained by calculating the boun-
daries between categories from the outputs of the Non-Linear Regressions
(Equation 2). For both the stimuli with and without burst, the velar region cor-
responds to the lower right quadrant with boundaries corresponding to fair-
ly flat F2 and F3 transitions (see Table 2 for details). The labial/alveolar-
palatal boundary corresponds to the tradeoff between a rising F2 and a fall-
ing F3 transition. There is some tendency for the velar region to be narrow-
er in Hungarian but differences between languages are fairly small. As for
the differences between the dispersion and coupling theories (Fig. 1c vs.
1d), the confidence interval for the labial/velar boundary includes the 11.5
Bark value forecasted by the coupling theory for each Burst and Language
condition whereas the 10.6 Bark value forecasted by the dispersion theory
falls outside the confidence interval in all the conditions. For the ve-
lar/alveolar-palatal boundary, the confidence interval includes the 14.5
Bark value forecasted by the coupling theory in one of the four conditions whereas the 14.8 Bark value forecasted by the dispersion theory falls inside the confidence interval in two of the four conditions.
4. Discussion
4.1. Stability of place boundaries across languages
The present results show that transitional features are used in much the
same way in both Hungarian and French. Strikingly, the contrasts which are common to both languages use almost the same perceptual boundaries,
especially for the stimuli with burst. Further, these common boundaries are
not selected at random but correspond to qualitative changes in the direc-
tion of frequency transitions.


4.2. On the enrootment of place perception in natural boundaries
The place boundaries evidenced in the present study are clearly related to
natural settings. The labial/velar distinction is based on rising vs. falling
direction of the F2 transition which obviously corresponds to a natural
boundary. Similarly, the alveolar-palatal/velar distinction is based on the
direction of the F3 transition. These represent two clear examples of direct
implementation of natural boundaries in the phonological framework.
Both infants and adults display a natural sensitivity for perceiving dif-
ferences between rising and falling transitions in non-speech sounds (Aslin,
1989). This suggests that flat transition boundaries evidenced in the present
study correspond to basic psychoacoustic limitations in the processing of
frequency transitions. The relevance of these boundaries for speech percep-
tion in two languages with quite different place of articulation settings, i.e.
French and Hungarian, clearly demonstrates the role of predispositions in
speech development. However, it is also clear that predispositions for
speech perception are not always directly suited for phonological purposes.
Evidently, the distinction between labial and alveolar-palatal stops does not
directly depend on either the sole F2 or the F3 transition. However, the
bilabial/alveolar-palatal contrast calls on a tradeoff between the two transi-
tions, a rising F2 being compensated by a falling F3 to yield a globally flat
F2-F3 compound. This tradeoff does not depend on the specificities of
place production in the two different languages. Rather, it corresponds to
yet another qualitative difference between speech sounds: the difference
between globally rising vs. globally falling F2 and F3 transitions. Though
the difference in the global direction of the transitions is more complex
than the differences between the individual directions of F2 and F3 transitions, they
all correspond to qualitative changes in transition direction. Such qualita-
tive changes are highly specific and the present results suggest that they
impose strong constraints on the location of place boundaries.
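The qualitative boundaries just described can be folded into a toy decision rule. This is a minimal sketch under an assumed sign convention (the text specifies which cue separates which pair of categories, not the absolute directions; the function name, the zero thresholds, and the example values are all illustrative):

```python
def classify_place(delta_f2, delta_f3):
    """Toy place classifier from formant-transition directions.

    delta_f2, delta_f3: change (in Hz) of F2 and F3 from consonant onset
    to vowel target. Zero plays the role of the 'flat transition' natural
    boundary discussed in the text.
    """
    if delta_f2 < 0 and delta_f3 < 0:
        # Assumed convention: velars fall on the 'falling' side of both the
        # F2 boundary (vs. labials) and the F3 boundary (vs. alveolar-palatals).
        return "velar"
    if delta_f2 + delta_f3 < 0:
        # Globally falling F2+F3 compound: labial side of the
        # labial/alveolar-palatal tradeoff boundary.
        return "labial"
    # Globally rising F2+F3 compound.
    return "alveolar-palatal"

print(classify_place(-400, -300))  # velar
print(classify_place(600, -900))   # labial (rising F2 outweighed by falling F3)
print(classify_place(500, 300))    # alveolar-palatal
```

The point of the sketch is only that a few binary, qualitative tests on transition direction suffice to separate three of the four categories, mirroring the boundary locations reported above.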
Finally, the present data do not allow us to decide between the dispersion
and coupling models. Dispersion Theory (Liljencrants & Lindblom, 1972;
for a tentative expansion of this theory to consonants, see Abry, 2003)
claims that languages tend to optimally divide the acoustic space between
phonological categories. This theory would predict an equal sharing of the
acoustic space between perceptual categories, a prediction which is fairly
well supported by the location of the alveolar-palatal/velar boundary in the
present results. However, the location of the labial/velar boundary con-
forms more to the predictions of the coupling theory. The present data are
probably not sensitive enough to distinguish between these theories as they
make fairly equivalent predictions on the locations of the boundaries. Still
another possibility is that perceptual development proceeds in two stages,
couplings between predispositions being followed by an optimal share-out
of the acoustic space. This is an interesting perspective in view of the fairly
slow development of speech perception after a rapid adaptation of predis-
positions during the first year of life (Burnham, Tyler, & Horlyck, 2002).
While the present data were not designed to test the hypothesis of a two-stage
development (coupling followed by optimal dispersion), it should be addressed
in future research comparing children and adults.
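The dispersion prediction of an equal sharing of the acoustic space can be illustrated with a minimal one-dimensional sketch. This is not Liljencrants and Lindblom's actual simulation; the normalized [0, 1] dimension and both helper functions are illustrative assumptions:

```python
def dispersed_prototypes(k, lo=0.0, hi=1.0):
    """Prototypes that maximize the minimal pairwise distance on a bounded
    1-D acoustic dimension: evenly spaced points, endpoints included."""
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]

def category_boundaries(protos):
    """Perceptual boundaries at the midpoints between adjacent prototypes,
    i.e. an equal division of the acoustic space between categories."""
    return [(a + b) / 2 for a, b in zip(protos, protos[1:])]

# Three place categories on a normalized dimension:
protos = dispersed_prototypes(3)        # [0.0, 0.5, 1.0]
print(category_boundaries(protos))      # [0.25, 0.75]
```

A coupling account, by contrast, would pin boundaries to fixed psychoacoustic landmarks (such as the flat-transition point) rather than to midpoints, which is why the two models can disagree on boundary locations even for the same inventory.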


4.3. Place of articulation categories and distinctive regions
The results show that the perception of the fourfold place of articulation
contrasts in Hungarian is partially based on the direction of F2 and F3
formant transitions. However, these features are clearly not sufficient for
supporting the alveolar/palatal contrast. Our working hypothesis in the
present study was that the Hungarian /d/ stops should occupy the Distinc-
tive Region with rising F2-falling F3 transitions (dental region, R7 in the
260 Willy Serniclaes and Christian Geng

DRM) and that palatal stops should occupy the Distinctive Region with
rising F2-rising F3 transitions (R6 in the DRM). The present results seem
to support our working hypothesis in two different ways. First, the Hunga-
rian /d/ responses peaked inside the R7 region (Figure 2). Second, the al-
veolar peak was fairly weak. This last finding is in agreement with the fact
that the R7 region can only produce fairly unstable transitional F2-F3 patterns
(Carré, Serniclaes & Marsico, 2003): if, as we assumed, the Hungarian
/d/ stops are produced in this region, these percepts should only be
weakly represented in the F2-F3 space and this is indeed what we found.
Finally, as our data indicate that the French /d/ category covers the same
regions as the Hungarian alveolar and palatal categories, it would seem that
the French /d/ is perceived as alveo-palatal rather than alveo-dental. This is
not directly compatible with the articulatory descriptions of /d/ as alveo-
dental (see Ladefoged & Maddieson, 1996, p. 23). While the present findings
need confirmation with natural speech tokens, it is not excluded that
perceptual representations of place categories do not readily fit the articula-
tory representations. Perceptual representations do not have to be veridical
representations of the vocal tract categories in order to support reliable
distinctions between sounds. The intrinsic value of the /d/ category does not
matter: all that matters is that /d/ is distinguished from /b/ and from /g/ on
acoustic grounds. However, such acoustically-driven perceptual representations
have to be transformed in specific ways to be related to motor
representations.


4.4. Transitions vs. burst as vectors of place perception
The poor representation of the alveolar/palatal distinction between Hunga-
rian consonants in the F2-F3 transition space suggests that broader coupl-
ings are necessary for stabilizing the alveolar percepts.
Other features, among which those provided by the burst spectrum,
might be necessary for the addition of the palatal to the three principal
places of articulation. To reveal the contribution of the burst-related fea-
tures one has to use stimuli generated by factorial variation of burst and
transitions. There have been several attempts in the past at separating the
contributions of burst and transitions to place perception in stop consonants,
and most of these studies point to the functional equivalence of the two
cues across phonetic contexts (e.g. Dorman, Studdert-Kennedy & Raphael,
1977). However, these results were collected in languages with only three
place categories, which makes it rather difficult to evidence autonomous
contributions of the two kinds of cues because both contributed to all
three possible contrasts. Things might turn out differently in a four-category
language like Hungarian, in which the different place contrasts
might rely on different cues: as shown in the present results, transitions are
sufficient as long as there is no alveolar/palatal contrast present. It is then
possible that the perception of this contrast might rely on burst properties
which are independent of the onset frequencies of the formant transitions.
Future experiments with stimuli generated by factorial variation of burst
and transitions should allow us to clarify this point.
Here one has to be aware that the phonemic status of the Hungarian pa-
latal has been the topic of a long-standing debate on whether it should be
classified as a true stop or an affricate (for a summary on the phonological
treatment of this matter, see Siptár & Törkenczy, 2000). Our own recordings
of the Hungarian palatal (Geng & Mooshammer, 2004) suggested a
high degree of acoustic variability in the realization of this sound, comprising
clear stop, affricate, and even fricative-like realizations, with signal
portions we interpreted as a residual burst. More thorough and detailed
spectral analyses of these data would be required before a conclusive cate-
gorization of the observed patterns could be possible.


4.5. Some implications for phonological systems
From a systemic point of view, the present study lends further support to
the idea that languages do not use all the possible combinations between
two (or several) universal features. The data suggest that in languages with
four place categories, such as Hungarian, F2 and F3 transitions are not suf-
ficient for separating these categories and a third feature is necessary. The
examination of laryngeal timing contrasts leads to similar conclusions. Al-
though there are two different predispositions for perceiving negative and
positive VOT, there are only three rather than four reliable VOT categories
in the world's languages, and a third phonetic feature (manner of vocal fold
vibration) is used in languages with four homorganic categories (Lisker &
Abramson, 1964).
The fact that the category boundaries stay relatively stable across the two
languages under consideration, even though one of them adds a place of
articulation, suggests that there is an upper limit on the number of place
categories implementable on transitional cues alone, which is potentially interesting for the
architecture of phonological systems. We do not want to overemphasize
this point, as there are other well-established mechanisms for the same
explanandum, i.e. the distribution of voiced and voiceless stops at different
places of articulation in the world's languages: Ohala (1983) related the
relative frequencies of stop categories in the world's languages to aerodynamic
and acoustic constraints, with /g/ and /p/ in particular being rare
due either to the difficulty of maintaining voicing (for /g/) or to low
acoustic salience (for /p/). Maddieson (2003) referred to these data as
"missing /g/" and "missing /p/" phenomena, analyzing the latter as a
regional phenomenon, though. In any case, it hardly seems possible to
disentangle the effects of the different phonetic mechanisms on frequency
data such as those available in the UPSID database. Nor do the present
results rule out such speculations: the density effect observed in this
study and the aerodynamic considerations made popular by Ohala could
even act in a synergistic fashion in the constitution of the emergent
asymmetries observed in the obstruent inventories of the world's languages.
5. Conclusion
The present findings bear several implications for both speech development
and phonological systems. Firstly, the fact that, both in French and in Hun-
garian, perceptual boundaries align along natural boundaries for transition
perception or along some combination of these natural boundaries gives
further support to the coupling model of speech development. Further, as
the alveolar/palatal Hungarian boundary is poorly represented in the transi-
tion space, at least one additional phonetic feature seems necessary for
perceiving the fourfold place distinctions in Hungarian, thereby providing
a further example of coupling between universal features in the build-up of
phoneme categories. Secondly, the fact that the boundaries are generated by
lawful combinations of perceptual predispositions shows that the latter
impose strong constraints on phonological development. An important
question for future research will be to understand how these constraints
might converge with processes based on distance between categories.
References
Abry, C.
2003 [b]-[d]-[g] as a universal triangle as acoustically optimal as [i]-[a]-[u]. In
Solé, M.-J., Recasens, D. & Romero, J. (Eds.). Proceedings of the 15th
International Congress on Phonetic Sciences, pp. 727-730.
Aslin, R.N.
1989 Discrimination of frequency transitions by human infants. Journal of
the Acoustical Society of America, 86:582-590.
Burnham, D., Tyler, M., & Horlyck, S.
2002 Periods of speech perception development and their vestiges in
adulthood. In Burmeister, Piske & Rohde (Eds.). An integrated View
of Language Development (Papers in Honor of Henning Wode).
Trier: WVT Wissenschaftlicher Verlag. pp. 281-300.
Carré, R., & Mrayati, M.
1991 Vowel-vowel trajectories and region modeling. Journal of Phonetics,
19:433-443.
Carré, R., Serniclaes, W., & Marsico, E.
2003 Formant transition duration versus prevoicing duration in voiced stop
identification. In Solé, M.-J., Recasens, D. & Romero, J. (Eds.).
Proceedings of the 15th International Congress on Phonetic
Sciences, pp. 415-418.
Dart, S.N.
1998 Comparing French and English coronal consonant articulation.
Journal of Phonetics, 26:71-94.
Dorman, M.F., Studdert-Kennedy, M., & Raphael, L.J.
1977 Stop consonant recognition: Release bursts and formant transitions
as functionally equivalent, context-dependent cues. Perception and
Psychophysics, 22:109-122.
Eimas, P.D.
1974 Auditory and linguistic processing of cues for place of articulation
by infants. Perception and Psychophysics, 16:513-521.
Geng, C., Mády, K., Bogliotti, C., Messaoud-Galusi, S., Medina, V., & Serniclaes, W.
2005 Do palatal consonants correspond to the fourth category in the
perceptual F2-F3 space? In: V. Hazan & P. Iverson (Eds.).
Proceedings of the ISCA Workshop on Plasticity in Speech
Perception, London, June 15-17 2005, pp. 219-222.
Geng, C., & Mooshammer, C.
2004 The Hungarian palatal stop: phonological considerations and
phonetic data. ZAS Papers in Linguistics, 37: 221-243.
Gerken, L. & Aslin, R.N.
2005 Thirty years of research on infant speech perception: The legacy of
Peter W. Jusczyk. Language Learning and Development, 1:5-21.
264 Willy Serniclaes and Christian Geng

Guenther, F., & Bohland, J.
2002 Learning sound categories: A neural model and supporting
experiments. Acoustical Science & Technology, 23:213-220.
Harnad, S.
1987 Categorical perception: the groundwork of cognition. New York:
Cambridge University Press.
Hoonhorst, I., Colin, C., Deltenre, P., Radeau, M., & Serniclaes, W.
2006 Emergence of a language specific boundary in perilinguistic infants.
Early Language Development and Disorders, Latsis Colloquium of
the University of Geneva, Program and Abstracts, 45.
Jakobson, R., Fant, G., & Halle, M.
1952 Preliminaries to Speech Analysis. Cambridge: MIT Press.
Jusczyk, P. W.
1997 The discovery of spoken language. Cambridge, MA: MIT Press.
Bradford Books.
Keating, P.
1988 Palatals as complex segments: X-ray evidence. UCLA Working
Papers in Phonetics, 69:77-91.
Kuhl, P.K.
2004 Early language acquisition: cracking the speech code. Nature
Reviews, 5:831-843.
Kuhl, P.K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P.
2006 Infants show a facilitation effect for native language phonetic perception
between 6 and 12 months. Developmental Science, 9:F13-F21.
Ladefoged, P., & Maddieson, I.
1996 The sounds of the world's languages. Cambridge, Mass.: Blackwell.
Lasky, R.E., Syrdal-Lasky, A., & Klein, R.E.
1975 VOT discrimination by four to six and a half month-old infants from
Spanish environment. Journal of Experimental Child Psychology,
20:215-225.
Liberman, A.M.
1998 Three questions for a theory of speech. The Journal of the Acoustical
Society of America, 103:3024.
Liljencrants, J., & Lindblom, B.
1972 Numerical simulation of vowel quality systems: the role of
perceptual contrast. Language, 48:839-862.
Lisker, L., & Abramson, A.S.
1964 A cross-language study of voicing in initial stops: acoustical
measurements. Word, 20:384-422.
Maddieson, I.
2003 Phonological typology in geographical perspective. In Solé, M.-J.,
Recasens, D. & Romero, J. (Eds.). Proceedings of the 15th
International Congress on Phonetic Sciences, pp. 719-722.
Masin, S.C.
1993 Some philosophical observations on perceptual science. In S. Masin (ed.),
Foundations of perceptual theory. Amsterdam: Elsevier. pp. 43-73.
Massaro, D. W.
1987 Categorical Partition: A fuzzy logical model of categorization behavior.
In S. Harnad, (ed.), Categorical perception: the groundwork of cognition.
New York: Cambridge University Press, pp. 254-286.
Ohala, J. J.
1983 The origin of sound patterns in vocal-tract constraints. In: P. F.
MacNeilage (ed.), The Production of Speech. New York: Springer.
pp. 189-216.
Pegg, J.E., & Werker, J.F.
1997 Adult and infant perception of two English phones. The Journal of
the Acoustical Society of America, 102:3742-3753.
Repp, B.H.
1979 Relative amplitude of aspiration noise as a voicing cue for syllable-
initial stop consonants. Language and Speech, 22:173-189.
1982 Phonetic trading relations and context effects: New experimental
evidence for a speech mode of perception. Psychological Bulletin,
92:81-110.
Serniclaes, W.
2000 La perception de la parole. In P. Escudier & J.L. Schwartz (eds.), La
parole: Des modèles cognitifs aux machines communicantes. Paris:
Hermes, Science Publications. pp. 159-190.
Serniclaes, W., & Carré, R.
2002 Contextual effects in the perception of place of articulation: a
rotational hypothesis. In J.L.H. Hansen & B. Pellom (eds.),
Proceedings of the 7th International Conference on Spoken
Language Processing, pp. 1673-1676.
Serniclaes, W., Bogliotti, C., & Carré, R.
2003 Perception of consonant place of articulation: phonological
categories meet natural boundaries. In Solé, M.-J., Recasens, D.
& Romero, J. (Eds.). Proceedings of the 15th International Congress
on Phonetic Sciences, pp. 391-394.
Siptár, P., & Törkenczy, M.
2000 The phonology of Hungarian. Oxford: Oxford University Press.
Snowdon, C. T.
1987 A naturalistic view of categorical perception. In S. Harnad, (ed.),
Categorical perception: the groundwork of cognition. New York:
Cambridge University Press, pp. 332-354.
Sussman, H. M., Fruchter, D., Hilbert, J., & Sirosh, J.
1998 Linear correlates in the speech signal: The orderly output constraints.
Behavioral and Brain Sciences, 21:241-259.
Vihman, M.V.
1996 Phonological development: The origins of language in the child.
Cambridge (MA): Blackwell.
Werker, J.F., & Tees, R.C.
1984a Cross-language speech perception: evidence for perceptual
reorganization during the first year of life. Infant Behavior and
Development, 7:49-63.
1984b Phonemic and phonetic factors in adult cross-language speech
perception. The Journal of the Acoustical Society of America,
75:1866-1878.



The complexity of phonetic features' organisation in
reading
Nathalie Bedoin and Sonia Krifi
1. Introduction
The goal of the experimental work described here is twofold. First, we in-
vestigate the contribution of sub-phonemic knowledge to the phonological
code in reading, using feature similarity as a manipulated factor in priming
experiments. Second, we intend to show that the sensitivity to sub-
phonemic similarity is not directly determined by the number of shared
phonetic features, but is more complex and depends on the type of phonetic
feature. An implicit hierarchical organisation of phonetic feature types is
assumed to affect the time-course of priming effects for sequentially dis-
played printed stimuli, and to guide similarity judgements in a metalinguis-
tic task.
2. Sub-phonemic effects in lexical processing
Lexical activation in speech decoding has been described as a gradual proc-
ess and a phoneme-based evaluation metric appears insufficient to account
for the results of various studies on mismatches in lexical access
(McQueen, Dahan, and Cutler 2003). Consequently, sub-phonemic infor-
mation has been argued to modulate lexical activation (McMurray, Tanen-
haus, and Aslin 2002; Blumstein 2004) and a few lexical processing models
include a level in which phonetic features are mapped onto lexical forms
(Marslen-Wilson and Warren 1994).
Indeed, semantic facilitative priming effects have been recorded in
Dutch for target words preceded by a pseudo-word that differs from an
associated prime word by only one phoneme, provided that the difference
did not exceed one or two phonetic features (Marslen-Wilson, Moss, and
van Halen 1996). Similarly, facilitative priming effects in a cross-modal
paradigm have been shown to be restricted to pairs differing by one, two,
but not three phonetic features in English (e.g., tennis was better processed
after zervice, but not after gervice) (Connine, Blasko, and Titone 1993).
Additionally, the lexical advantage classically observed for phoneme monitoring
decreased if the target was embedded within a pseudo-word differing
from a word by one phonetic feature, while it completely disappeared if the
pseudo-word differed by additional features (Connine, Titone, Deelman,
and Blasko 1997). Therefore, the mere evaluation of phonemic similarity
fails to explain such effects: the number of shared phonetic features
between the signal and a lexical unit is critical to lexical access.
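The feature-counting logic behind these mismatch effects can be sketched as follows. The three-feature coding below is a simplified hypothetical table, not the feature set used in the cited studies:

```python
# Hypothetical (voicing, place, manner) coding for a few consonants.
FEATURES = {
    "p": ("voiceless", "labial",   "stop"),
    "b": ("voiced",    "labial",   "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced",    "alveolar", "stop"),
    "g": ("voiced",    "velar",    "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced",    "alveolar", "fricative"),
}

def feature_distance(c1, c2):
    """Number of phonetic features on which two consonants differ."""
    return sum(a != b for a, b in zip(FEATURES[c1], FEATURES[c2]))

# 'zervice' vs. 'service': /z/-/s/ differ by voicing only;
# 'gervice' vs. 'service': /g/-/s/ differ on all three features.
print(feature_distance("z", "s"))  # 1 -> close enough to prime 'tennis'
print(feature_distance("g", "s"))  # 3 -> too distant to prime 'tennis'
```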
Featural similarity effects have been more directly assessed with prime-
target sub-phonemic similarity manipulation: in phonetic priming, prime
and target share many phonetic features, but no entire phoneme, whereas
they share at least one of their constituent phonemes in phonological prim-
ing. Opposite effects have been observed for phonetic and phonological
priming. Auditory perception of words is improved if prime and target
share one phoneme (phonological priming, Slowiaczek, Nusbaum, and
Pisoni 1987), while inhibitory effects are observed if they are phonetically
similar without any shared phoneme (phonetic priming, Goldinger, Luce,
and Pisoni 1989; Goldinger, Luce, Pisoni, and Marcario 1992; Luce, Gold-
inger, Auer, and Vitevitch 2000). To explain this inhibitory phonetic prim-
ing effect, the authors assumed separate levels of representation for fea-
tures, phonemes, and words, as described in interactive-activation models
of word recognition (McClelland and Rumelhart, 1981). Excitatory activa-
tion is supposed to pass between levels, while inhibitory lateral connections
may be present among nodes within each level. In priming experiments,
lateral inhibition at the phonemic level would suppress competitors of the
phonemes identified in the prime (i.e., phonemes that share many phonetic
features with the phonemes of the prime). As a consequence, a high pho-
netic similarity between prime and target would impair the identification of
the phonemes in the target, as was observed in Goldinger et al.'s experiments
(1989, 1992). Interestingly, speech production latencies also increased
if the onsets of visual prime and target shared phonetic features
(Rogers and Storkel 1998). Inhibitory effects of phonetic similarity in both
speech perception and speech production led us to investigate the existence
of analogous effects in reading, since the involvement of phonological
knowledge has been evidenced in printed stimuli processing.
3. Phonological knowledge in printed word recognition
Although the role of visual and orthographic information is crucial in read-
ing, a great deal of research has also concerned the involvement of phono-
logical knowledge in printed word recognition. Accumulated data argue for
the fast activation of phonemic knowledge, which contributes to written
word recognition in skilled readers (for reviews, Berent and Perfetti 1995;
Frost 1998). Some models propose that lexical entries can be activated only
on the basis of phonological representations (Van Orden's Verification
Model, 1987). Other models assume that sub-lexical phonological units
activate the word meaning in parallel with orthographic units (Ferrand and
Grainger 1992, 1994), or participate in word recognition via bidirectional
relations in the non-lexical route (i.e., the Grapheme-Phoneme Correspondence
system in the Dual Route Cascaded Model, Coltheart, Rastle,
Perry, Langdon, and Ziegler 2001). Finally, some models rule out the no-
tion of independent routes of lexical access and describe a coherent visual-
phonological dynamic system based on self-consistent relations between
three families of inter-dependent letter-nodes, phoneme-nodes, and seman-
tic nodes (Bosman and Van Orden 1997; Van Orden and Goldinger 1994;
Van Orden, Jansen op de Haar, and Bosman 1997; Van Orden, Johnston,
and Hale 1988). However, the strength of the connections between node
families is assumed to depend on the consistency of relations, which may
be higher between letter-nodes and phoneme-nodes than between semantic-
nodes and the two other node families. Indeed, despite the fact that lan-
guages such as English and French have a phonologically deep written
system (in comparison to the German language, for instance, which has
more consistent grapheme-to-phoneme and phoneme-to-grapheme conver-
sion rules, Ziegler and Jacobs 1995), the relations between letters and pho-
nemes in alphabetic languages support the stronger bidirectional correla-
tions. Therefore, activation would feed forward from letter-nodes to
phoneme-nodes, which would in turn feed activation back to letter-nodes.
The resonance emerging between letter-nodes and phoneme-nodes would
provide the rapid and efficient selection of a combination of coherent
nodes, which would explain why phonology might supply powerful con-
straints on printed word recognition.
Despite the involvement of phonological knowledge in favour of rapid
recognition of printed words, various experimental data show that its
inescapable role may sometimes be detrimental to performance in reading
tasks. On the contrary, an optional and/or controlled involvement of
phonological knowledge would preclude such negative effects. For instance,
homophony has been shown to increase error rates in semantic categorisation
tasks (Van Orden 1987; Van Orden, Johnston, and Hale 1988; Peter and
Turvey 1994), in semantic relatedness decision (Luo, Johnson, and Gallo
1998), in proofreading (Bosman and de Groot 1996; Sparrow and Miellet
2002), and in semantic relatedness judgement (Lesch and Pollatsek 1998).
Phonological processing also leads readers to misleading responses in
letter detection tasks. More false alarms are made in detecting the letter "i"
in a target where it is absent (brane) but whose homophone (brain) contains
"i" (Ziegler and Jacobs 1995). Additionally, readers fail to detect "i" in a
visual stimulus if this stimulus has a homophone which does not contain
"i" (Ziegler, Van Orden, and Jacobs 1997). The difficulty of preventing
phonological knowledge from being involved in reading is also evidenced
by orthographic-phonological regularity effects obtained in lexical decision,
although the list included many pseudo-homophone foils to discourage the
involvement of phonology (Gibbs and Van Orden 1998).
Further lines of evidence supporting the strong impact of phonological
constraints in reading come from phonologically mediated semantic prim-
ing experiments. Written word recognition and naming increase if the target
is preceded by the homophone of a semantically related word. Facilitation
in naming has been observed with a 100 msec-SOA in English (Lesch and
Pollatsek 1993; Lukatela, Lukatela, and Turvey 1993; Lukatela and Turvey
1991) and in lexical decision in French regardless of whether the prime is a
word or a pseudo-word (Bedoin 1995). Additionally, phonological priming
effects have also provided convincing data. Target recognition is improved
by homophonic or phonologically similar word or pseudo-word primes in
Serbo-Croatian, Chinese, Dutch, English, and French (Berent 1997; Brys-
baert 2001; Ferrand and Grainger 1992, 1994; Grainger and Ferrand 1996;
Lukatela and Turvey 1994; Perfetti and Bell, 1991; Perfetti and Zhang
1991; Rayner, Sereno, Lesch and Pollatsek 1995). Prime-target phonologi-
cal similarity effects are generally investigated using pairs of stimuli shar-
ing the complete phonological form, or at least the onset, the rime or sev-
eral phonemes, based on the implicit idea that the early phonological code
in printed word recognition is coarse-grained. However, the data presented
in the next section indicate that it is fine enough to involve sub-phonemic
information.
4. Sub-phonemic effects in printed word recognition
We have already conducted a series of priming experiments with skilled
French adult readers to assess their sensitivity to voicing similarity in reading
(Bedoin 1998). Prime and target differed only by their initial consonant, and
both onsets shared voicing or not. Adult skilled readers were invited to read
silently and to make a lexical decision on each target by pressing one of
two keys. Response latencies were significantly longer when there was an
additional voicing similarity between prime and target, whatever the stimuli
onset asynchrony (SOA = 33, 66, 100 msec) and the frequency and lexical
status of the prime (word or pseudo-word). The detrimental effect of the
additional voicing similarity in printed target words has been confirmed
with a 33 msec-SOA in another lexical-decision experiment (Bedoin &
Chavand 2000). Taken together, these data argue for the contribution of
sub-phonemic knowledge to the phonological code in reading and the rapid
extraction of voicing.

We have not yet systematically investigated manner and place similarity
effects in reading, but preliminary results suggest that, contrary to voicing,
the influence of place and manner similarity does not produce very
consistent effects. Indeed, we sometimes observed that place or manner
similarity produced a facilitative (and not negative) priming effect, which
varied with the SOA. Facilitative phonetic priming effects in reading have
been also observed by Lukatela, Eaton and Turvey (2001) with a 57 msec-
SOA. In their high-similarity condition, prime and target only differed by
voicing, while they additionally differed by place or manner in the low-
similarity condition. Therefore, the authors assessed place or manner simi-
larity effects between printed primes and targets, and they observed im-
proved performances in case of place or manner prime-target similarity.
To account for facilitative and negative effects of sub-phonemic similar-
ity, we proposed that two mechanisms were involved in phonetic priming
(Bedoin 2003).
On the one hand, potential phoneme candidates may be activated before
the identification of a letter is completed using phonetic feature detectors. If
a visual mask interrupts the stimulus processing at this moment, the acti-
vated phonetic feature detectors are assumed to reinforce consistent pho-
neme detectors via bottom-up excitatory relations to identify the corre-
sponding letter. If the SOA is very short, this bottom-up excitatory
mechanism may still be in progress when the subsequent stimulus appears,
providing facilitative effects in case of high phonetic similarity between
prime and target. Although such an effect has not been observed with voic-
ing similarity manipulation in adults, it could be expected with feature
similarity based on other phonetic feature types, especially in case of very
short SOA. This bottom-up mechanism could account for the facilitative
priming effect observed by Lukatela et al. (2001) in case of place or man-
ner similarity.
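This first, bottom-up mechanism amounts to a head start computed from shared features. The sketch below is our own illustration under a hypothetical three-feature coding; `preactivation` and its gain parameter are not part of the cited models:

```python
# Hypothetical feature sets for a prime and two possible targets.
FEATURES = {
    "p": {"labial", "stop", "voiceless"},
    "t": {"alveolar", "stop", "voiceless"},
    "v": {"labial", "fricative", "voiced"},
}

def preactivation(prime, target, gain=0.1):
    """Bottom-up head start for the target phoneme: every feature
    detector already switched on by the prime feeds excitation forward."""
    return gain * len(FEATURES[prime] & FEATURES[target])

print(preactivation("p", "t"))  # 0.2: two shared features, more facilitation
print(preactivation("p", "v"))  # 0.1: one shared feature, less facilitation
```

At very short SOAs this head start is still in play, which yields the facilitative regime; lateral inhibition is triggered later and reverses the sign of the effect.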
On the other hand, phoneme detectors are assumed to be linked to each
other via lateral inhibitory relations whose weight is proportional to the
phonetic similarity between phonemes (Bedoin 2003). Therefore, within-
level competition would occur between overlapping phoneme candidates,
which would impair the identification of phonemes if the target is preceded
by a phonetically similar prime. This mechanism would be triggered
slightly later than the inter-level phonetic features-to-phonemes relations.
This view is consistent with models including within-level competition due
to inhibitory links between overlapping candidates, such as the connection-
ist Interactive Activation Model (McClelland and Rumelhart 1981), the
Neighborhood Activation Model (Luce, Pisoni, and Goldinger 1990), and
the TRACE Model (McClelland and Elman 1986). Such a mechanism could
account for the negative effect of phonetic similarity observed in our previ-
ous priming experiments (Bedoin, 1998; Bedoin and Chavand, 2000).
Similarly, the inhibitory effect of a prime upon a phonetically similar
target has been recorded in speech perception, and it has been accounted for
by positing inhibitory between-phonemes connections (Goldinger et al.
1992). Our results argue for the involvement of such connections in read-
ing. For example, if the voiceless feature value is activated from the initial
consonant of the prime (e.g., /p/ in passe), feature-to-phoneme connections
may preactivate voiceless phonemes. Once the level of activation for /p/
exceeds a certain threshold, lateral inhibitory relations between /p/ and
other voiceless phonemes may inhibit the competitors of /p/, such as /t/,
which would increase the emergence of /p/ as the better candidate. As a
consequence, /t/ would be more difficult to identify as a subsequent target
than /d/ for instance.
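The /p/, /t/, /d/ scenario just described can be simulated with a minimal lateral-inhibition sketch. All parameter values are hypothetical (this is not a fitted model): each processing step adds the same bottom-up excitation to the target but subtracts inhibition proportional to the target's feature overlap with the prime /p/:

```python
# Feature overlap with /p/ (labial, stop, voiceless), out of 3 features:
# /t/ shares stop and voiceless (2), /d/ shares only stop (1).
OVERLAP_WITH_P = {"t": 2, "d": 1}

def target_activation(target, excite=0.2, inhibit=0.15, steps=5):
    """Activation reached by a target phoneme presented after prime /p/,
    with lateral inhibition weighted by feature overlap with /p/."""
    act = 0.0
    for _ in range(steps):
        act += excite - inhibit * OVERLAP_WITH_P[target] / 3
    return round(act, 3)

print(target_activation("t"))  # 0.5: strongly inhibited, harder to identify
print(target_activation("d"))  # 0.75: weakly inhibited, easier to identify
```

The ordering, /t/ below /d/, is the whole point: sharing voicing (and manner) with the prime's winning phoneme yields more lateral inhibition, matching the observed cost of additional voicing similarity.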
Results obtained in backward-masking experiments provide support for
this interpretation (Bedoin and Chavand 2000). In such experiments, par-
ticipants were invited to recall a briefly presented visual target that had
been immediately replaced (masked) by another printed stimulus. This task
is difficult because the target processing is disrupted by the mask. How-
ever, if the mask processing is itself impaired, it is expected to exert a
weaker masking effect on the target. Consequently, because of lateral inhibitory
The complexity of phonetic features' organization in reading 273
relations, a mask preceded by a phonetically similar target may impair the
target recall to a lesser extent than a mask that shares fewer phonetic fea-
tures with the preceding target. This difference has indeed been observed in
an experiment displaying the target at first (e.g., TYPE), immediately re-
placed with a mask that differed or not by voicing (e.g., zyve versus syfe).
In sum, identity in voicing between sequentially presented written stimuli
reduces performance on the second item (prime-target paradigm), while it
improves performance on the first one (target-mask paradigm).
This opposition has been confirmed in a letter identification task, where
phonetic priming and masking effects were assessed within a single printed
stimulus. Subjects were briefly presented with a C1VC2V pseudo-word for
50 msec (adult readers) or 85 msec (average-reading and dyslexic children).
Then, the target letter appeared, printed in another case, and subjects had to
decide whether it was present or not in the stimulus. In these experiments,
decisions were more rapid for C1 than for C2, suggesting that letter iden-
tification was achieved at different rates according to its position in the
printed stimulus. As expected, voicing similarity between consonants was
detrimental to C2 identification, but it increased performance for C1 in
adults and in third-graders (Krifi, Bedoin, and Mérigot 2003). However, a
reversed pattern of results was observed in second-graders and in dyslexic
children, with voicing similarity improving C2 identification and decreasing
C1 identification, which is consistent with the involvement of excitatory
phonetic feature-phoneme connections in reading, but a lack of inhibitory
inter-phonemic relations in such subjects (Krifi, Bedoin, and Herbillon
2003).
Taken together, these results argue for the involvement of a sub-lexical
phonological level of knowledge, composed of a complex (but organised)
set of phoneme detectors in turn relying on sub-phonemic detectors. We
propose that the weight of the inhibitory relations between phoneme detec-
tors depends on the phonetic similarity between phonemes, defined in terms
of phonetic features. However, we assume that the lateral inhibition
strength between phonemes does not depend directly on the number of
shared phonetic features. Different phonetic features could have distinct
weights, or participate in lateral inhibition at different rates. To address this
issue, we conducted lexical decision experiments to assess the impact of
manner and place of articulation similarity.
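As a toy illustration of this assumption, the inhibitory link between two phoneme detectors can be written as a weighted sum over the feature types they share, rather than as a count of shared features. The weight values below are hypothetical, chosen only to show that a single shared type can outweigh two others:

```python
# Sketch of the hypothesis that lateral inhibition between two phoneme
# detectors is a *weighted* function of the feature types they share,
# rather than a bare count of shared features. Weights are hypothetical
# placeholders, not estimates.

WEIGHTS = {"voicing": 1.0, "manner": 0.4, "place": 0.4}

def inhibition_weight(shared_types):
    """Strength of the inhibitory link, given the feature types shared."""
    return sum(WEIGHTS[t] for t in shared_types)

# Sharing voicing alone can then inhibit more than sharing both manner
# and place, even though fewer feature types are shared:
assert inhibition_weight({"voicing"}) > inhibition_weight({"manner", "place"})
```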
274 Nathalie Bedoin and Sonia Krifi
5. Types of phonetic features in reading: voicing, manner and place of
articulation
Many current phonological theories consider that segments are organised in
terms of phonetic features, which may be characterized by an internal struc-
ture (Clements 1985). Feature values that are mutually interdependent may
form a feature type, and each type may be represented on a separate tier
(e.g., laryngeal feature, nasal feature, manner feature, place of articulation
feature). Evidence from aphasia is generally consistent with this view, with
phoneme substitution errors reflecting changes within a single tier rather
than across tiers (Blumstein 1990). Indeed, speech production errors may
involve only voicing, only manner, or only place and are less likely to in-
volve both place and voicing or manner and voicing. Similarly, difficulties
experienced by aphasics in discriminating phonemes have been shown to
be restricted to phonemes differing in place of articulation and not in voic-
ing (Miceli, Caltagirone, Gainotti, and Payer-Rigo 1978; Oscar-Berman,
Zurif, and Blumstein 1975). Conversely, a selective disturbance of voicing
contrast discrimination has been described in another patient (Caplan and
Aydelott Utman 1994). This double dissociation and the stability of the
selective impairments over time and between tests (Gow and Caplan 1996)
confirm the relative independence of place and voicing as phonetic types.
Additionally, a clear distinction is assumed between articulator-free fea-
tures (manner and sonorance) and articulator-bound features (place and
voicing): in models of phoneme identification in connected speech, it has
been assumed that the identification of articulator-free features provides the
basis for the subsequent discrimination of articulator-bound features, since
they establish regions in the signal where acoustic evidence for the articula-
tor-bound features can be found (Stevens 2002). This is in accordance with
an advantage for the discrimination of articulator-free over articulator-
bound features observed in aphasic patients (Gow and Caplan 1996). Fi-
nally, neurophysiological and neuropsychological data converge in show-
ing differences in the hemispheric functional asymmetry associated with
place and voicing representation and processing. While the left hemisphere
(LH) is sensitive to the place of articulation feature, voicing appears to be
represented and processed to a greater extent in the right hemisphere (RH)
than in the LH (Cohen and Segalowitz 1990a; Rebattel and Bedoin 2001).
An RH advantage for the acquisition of a nonnative voic-
ing distinction in adults (Cohen and Segalowitz 1990b), auditory evoked
potentials recorded over the RH and LH during voicing contrast discrimina-
tion (Molfese 1978; Molfese and Molfese 1988; Segalowitz and Cohen
1989), and performances of neurologically impaired patients (Miceli et al.
1978; Oscar-Berman et al. 1975) converge in supporting the view that the
RH plays a special role in the processing of voicing (Simos, Molfese, and
Brenden, 1997).
Therefore, there is evidence from various domains that phonetic features
pattern in natural classes, but their potential hierarchical organisation is still
under question.
Manner of articulation has been proposed to be at the prominent level,
as it defines the representation of a segment within a syllable (Van der
Hulst 2005). Estimates of psychological distance between consonants have
been derived from similarity ratings performed by listeners on spoken con-
sonants (Peters 1963). Manner of articulation was proven to be the most
important auditory dimension, followed by voicing, and subsequently place
of articulation. Moreover, articulator-free features, such as manner, have
been considered to provide the basis for the subsequent discrimination of
articulator-bound features (voice and place) (Stevens 2002). The advantage
for the discrimination of articulator-free over articulator-bound features in
aphasic patients provides support to this claim (Gow and Caplan 1996).
Additionally, an early sensitivity to sound similarities in sequentially pre-
sented syllables has been evidenced in nine-month-old children, who lis-
tened longer to lists that embodied some sub-phonemic regularity (Jusczyk,
Goodman, and Baumann 1999). Infants exhibited sensitivity to shared
manner features occurring at the beginning of syllables, while they were
insensitive to place similarity. In addition, Rogers and Storkel (1998)
pointed out the negative effect of phonetic similarity between pairs of
words upon speech production latencies, and showed that manner similarity
was the most detrimental factor.
However, the relative importance of manner of articulation has been
challenged by data obtained in speech perception and showing better pres-
ervation of voicing and nasality under noisy listening conditions (Miller
and Nicely 1955). Voicing information is still transmitted at signal-to-noise
levels 18 dB below those needed for place of articulation. Additionally,
voicing and nasality are much less affected by a random masking noise
than are the other features. Therefore, voicing has been claimed to be one
of the more salient and robust features of English consonants. Nevertheless,
the debate still remains, since Wang and Bilger (1973) showed better pres-
ervation of manner features under noisy listening conditions, perhaps be-
cause of their robust acoustic correlates. The relative importance of voicing
versus place is also equivocal. There is evidence from discrimination tasks
that listeners are more sensitive to place (Miceli et al. 1978), whereas
metalinguistic tasks requiring listeners to rate similarity of pairs of conso-
nants found that voice contributed either equally (Perecman and Kellar
1981) or more to judgement than did place (Peters 1963). Additionally, the
hierarchy of phonetic categories may evolve during childhood. Indeed, in a
study about the slip-of-the-tongue phenomenon, Jaeger (1992) showed
that place substitutions were the most frequent errors in both adults and
children aged 1;7 to 6;0, but children made fewer voicing errors than
adults, which suggested a more important role of voicing as an organisation
criterion for them.
Taken together, evidence for a prominent status of manner and voicing
phonetic features has been provided, but the comparative importance of
these features is not clear and it may depend on the task (phoneme identifi-
cation, discrimination, production, and metalinguistic tasks). Specifically, the
structure of the representations used to consciously estimate phonemic
similarities in metalinguistic tasks could be different from some aspects of
the organisation of phonetic features and phoneme detectors involved in the
first stages of phoneme and letter identification. In our previous experi-
ments, phonetic similarity effects in reading were assessed through voicing
similarity; Experiments 1 and 2 now manipulate phonetic similarity with
other phonetic feature types. The time-course of the rapid
involvement of phonetic features in directing the lateral inhibitory relations
between phoneme detectors could vary with the feature types, and some
kind of hierarchical organisation of the feature types could emerge.
The first series of new experiments presented in this paper further as-
sessed the role of sub-phonemic units during printed word recognition, and
investigated the time course of manner and place of articulation involve-
ment in priming. In Experiments 1a, 1b, 1c, French readers were invited to
perform lexical decision on targets that were briefly primed by a printed
pseudo-word, with 33 msec-, 66 msec-, and 100 msec-SOA, respectively.
Prime and target initial consonants shared either voicing, or voicing and
manner, or voicing and place. Since voicing similarity effects had been
previously assessed with prime-target pairs that additionally differed by
one or more other phonetic features, place and manner similarity effects
were also tested with prime-target pairs that additionally differed by an-
other feature (i.e., voicing) in Experiments 2a, 2b and 2c. We addressed the
question of whether place or manner prime-target similarity provided nega-
tive effects on the target processing (like voicing similarity in previous
experiments), which should reflect the involvement of intra-level inhibitory
relations. On the contrary, better performance in case of place or manner
prime-target similarity should be explained by inter-level excitatory rela-
tions. It may be the case that phoneme detectors are organised on different
tiers, allowing an asynchronous involvement of different phonetic feature
types in the process of neighbourhood inhibition. As a consequence, it is
hypothesized that additional prime-target phonetic similarity determined by
a phonetic feature that does not have temporal priority would not be able to
trigger inhibitory relations within the intra-level phonemic organisation in
case of very short SOA, leaving the way for rapid inter-level activations to
exert facilitative effects in our experiments. Hence, different patterns of
priming effects (facilitative versus negative effects) observed for different
phonetic feature types with short SOAs may provide information about the
time-course of phonetic processing in reading and the hierarchical organisa-
tion of phonetic features.
In Experiment 1, participants performed lexical decision on a printed
target that was briefly preceded by a printed prime. In experimental trials,
primes consisted of pseudo-words that differed from the target only by the
initial consonant. Across three lists, three kinds of primes were paired with
each target, so that each condition was tested on the same targets. Each
participant processed only one list to preclude any target repetition.
VM condition: prime and target (9 words and 9 pseudo-words) differed
by place of the initial consonant, but shared Voicing and Manner (e.g.,
BAME-dame; /bam/-/dam/);
VP condition: prime and target (9 words and 9 pseudo-words) differed
by manner of the initial consonant, but shared Voicing and Place (e.g.,
ZAME-dame; /zam/-/dam/);
V condition: prime and target (9 words and 9 pseudo-words) differed by
place and manner, but shared Voicing (e.g., VAME-dame; /vam/-/dam/).
In our experiments, the consonants were either voiced or voiceless, and
either plosive or fricative in terms of manner of articulation. Three values
of place of articulation were considered for plosive consonants (bilabial -
/p, b/; dental - /t, d/; velar - /k, g/), and three for fricative ones (labio-
dental - /f, v/; dental - /s, z/; postalveolar - /ʃ, ʒ/). To simplify the coding
of place of articulation, the consonants were distributed into three catego-
ries: Category 1 was composed of /p, b, f, v/, Category 2 contained
/t, d, s, z/, and Category 3 contained /k, g, ʃ, ʒ/.
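Under this coding, each consonant reduces to a triple, and each priming condition corresponds to the set of feature types that the prime's initial consonant shares with the target's. A sketch (the data structure and function names are illustrative; /ʃ, ʒ/ stand for the postalveolar pair):

```python
# The 12 consonants used in the experiments, coded on the three feature
# types described above: (voicing, manner, place category 1-3).
CONSONANTS = {
    "p": ("voiceless", "plosive", 1),   "b": ("voiced", "plosive", 1),
    "t": ("voiceless", "plosive", 2),   "d": ("voiced", "plosive", 2),
    "k": ("voiceless", "plosive", 3),   "g": ("voiced", "plosive", 3),
    "f": ("voiceless", "fricative", 1), "v": ("voiced", "fricative", 1),
    "s": ("voiceless", "fricative", 2), "z": ("voiced", "fricative", 2),
    "ʃ": ("voiceless", "fricative", 3), "ʒ": ("voiced", "fricative", 3),
}

def shared_types(a, b):
    """Feature types (voicing, manner, place) on which two consonants agree."""
    names = ("voicing", "manner", "place")
    return {n for n, x, y in zip(names, CONSONANTS[a], CONSONANTS[b]) if x == y}

# The three priming conditions of Experiment 1, with /d/ as target onset:
assert shared_types("b", "d") == {"voicing", "manner"}   # VM: BAME-dame
assert shared_types("z", "d") == {"voicing", "place"}    # VP: ZAME-dame
assert shared_types("v", "d") == {"voicing"}             # V:  VAME-dame
```

The same function recovers the Experiment 2 conditions, e.g. /f/-/v/ (FAGUE-vague) shares manner and place.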
Target words contained 1 or 2 syllables (mean = 1.78), 4-6 letters (mean
= 5.26), and 3-5 phonemes (mean = 4.44). Pseudo-words used as targets or
primes were phonotactically legal sequences. Primes had no lexical
neighbour more frequent than the target. In addition, 216 filler prime-target
pairs without overlapping phonemes were included.
In Experiment 1a (N = 27), the SOA lasted 33 msec, while a 66 msec-
SOA and a 100 msec-SOA were respectively used in Experiment 1b (N =
27) and Experiment 1c (N = 27). The 81 participants were native French
speakers and participated in only one version of the experiment. They were
University students, had normal or corrected-to-normal vision with no
known speech or hearing disorder.
The design of Experiment 2 was quite similar and three versions were
proposed to test phonetic similarity effects with a 33 msec-SOA (Experi-
ment 2a, N = 27), a 66 msec-SOA (Experiment 2b, N = 27), and a 100
msec-SOA (Experiment 2c, N = 27). No participant performed both Ex-
periments 1 and 2. Three priming conditions were used:
PM condition: prime and target differed by voicing of the initial conso-
nant, but shared Place and Manner (e.g. FAGUE-vague; /fag/-/vag/);
M condition: prime and target differed by voicing and place, but shared
Manner (e.g. SAGUE-vague; /sag/-/vag/);
P condition: prime and target differed by voicing and manner, but
shared Place (e.g. PAGUE-vague; /pag/-/vag/).
Contrary to voicing similarity, which decreased lexical decision perfor-
mance in previous priming experiments regardless of the SOA, place and
manner similarity effects, when statistically significant, were facilitative
with a 33 msec-SOA, while they were detrimental to performance with a 66
msec- or a 100 msec-SOA. More precisely, when a 33 msec SOA was used
in Experiment 1a, lexical decision latencies were longer if the initial conso-
nants of prime and target only shared voicing rather than both voicing and
manner, F(1, 48) = 5.42, p = .024, or voicing and place, F(1, 48) = 4.93, p
= .031 (see Figure 1, left). Data recorded in Experiment 2a were consistent
with the facilitative priming effect of additional manner similarity, since
responses tended to be shorter in PM condition than in P condition for word
targets, F(1, 48) = 4.00, p = .051, and for pseudo-word targets,
F(1, 48) = 5.04, p = .029. Additionally, responses were more rapid in case
of manner similarity (M condition) than in case of place similarity (P con-
dition) for word targets, F(1, 48) = 5.05, p = .028 (Figure 1, right). The
error rates did not provide any significant effect in Experiments 1a and 2a
(Table 1).
Table 1. Percentages of errors in the six priming experiments.

                             VM      VP      V       PM      M       P
Experiment 1a  Word          4.93%   4.52%   5.34%
               Pseudo-Word   5.34%   4.93%   3.41%
Experiment 2a  Word                                  4.93%   1.64%   3.29%
               Pseudo-Word                           2.47%   1.23%   4.52%
Experiment 1b  Word          7.41%   7.00%   3.70%
               Pseudo-Word   4.52%   3.70%   0.82%
Experiment 2b  Word                                  3.29%   3.70%   5.18%
               Pseudo-Word                           3.29%   1.23%   2.06%
Experiment 1c  Word          4.93%   4.52%   5.34%
               Pseudo-Word   5.34%   8.34%   0.41%
Experiment 2c  Word                                  3.70%   2.88%   2.88%
               Pseudo-Word                           3.70%   4.11%   2.88%
Taken together, results obtained with a short SOA in Experiments 1-2 suggest
that sub-phonemic similarity can produce facilitative priming effects in reading,
since manner similarity, and to a lesser extent place similarity, results in shorter
response latencies. Inter-level excitatory connections can account for such faci-
litative priming effects produced by manner or place similarity. Therefore, dur-
ing the first 33 msec of printed stimulus processing, phonetic feature knowledge
is involved, but at different rates according to the feature types. At this
processing stage, lateral inhibitory intra-level relations can already account for
voicing similarity effects (Bedoin 1998, 2003), while only rapid inter-level exci-
tatory features-to-phonemes links can account for manner and place similarity
effects. Therefore, it seems that during the first 33 msec of print processing, the
time-course of lateral inhibitory relations differs with the phonetic feature types,
and voicing may have some temporal priority with respect to this aspect of orga-
nisation. In addition, manner similarity provides a greater (or more systematic)
facilitative effect than place similarity at this step of printed stimulus processing.
Figure 1. Mean response latencies and standard errors recorded with a 33 msec-
SOA in Experiment 1a (left panel), and Experiment 2a (right panel).
Figure 2. Mean response latencies and standard errors recorded with a 66 msec-
SOA in Experiment 1b (left panel), and Experiment 2b (right panel).
With longer SOAs, the pattern of results was more consistent with that
obtained for voicing similarity in previous experiments. Indeed, when
feature similarity effects were significant, performance always decreased
regardless of whether the additional phonetic similarity involved place or
manner. With the 66 msec-SOA (Experiments 1b), more errors were made
in VM condition than in V condition, F(1, 48) = 7.31, p = .010, and in VP
than in V condition, F(1, 48) = 5.08, p = .029, reflecting the negative effect
of manner and place similarity. The manner similarity negative priming
effect was also significant for latencies, which were longer in VM condi-
tion than in V condition, F(1, 48) = 6.80, p = .012 (Figure 2, left). In Expe-
riment 2b, increased response times were observed in case of place simi-
larity (PM M difference), F(1, 48) = 4.57, p = .038 (Figure 2, right), and
no effect reached significance for error rates.
Figure 3. Mean response latencies and standard errors recorded with a 100 msec-
SOA in Experiment 1c (left panel), and Experiment 2c (right panel).
Finally, less feature similarity priming occurred with a 100 msec-SOA.
Consistent with data observed in Experiment 1b, error rates were lower in
V condition than in VM and VP conditions in Experiment 1c, but the ef-
fects did not reach significance. The only significant effect on latencies was
obtained in Experiment 2c (Figure 3): place similarity decreased perform-
ance for words, since response times were longer in PM condition than in
M condition, F(1, 48) = 4.82, p = .030.
Together with data from previous experiments assessing voicing simi-
larity effects, these results suggest that phonetic priming in reading is not a
phenomenon based merely on the number of shared phonetic features.
Complexity in phonetic priming may be due to differences in the status of
phonetic feature types and in the rate at which each type of feature partici-
pates in lateral inhibitory relations between phonemes. Our data showed
that every feature prime-target similarity effect reaching significance with a
66 msec-SOA decreased performance, in line with the lateral inhibitory
relations assumed to organize phoneme detectors. It can be noticed that a
place similarity negative effect remained with a 100 msec-SOA, whereas
manner similarity no longer influenced lexical decision. Phonetic simi-
larity effects recorded with a 33 msec-SOA also differed according to the
phonetic feature type, which is of major importance regarding the time-
course of phonetic processing in reading and provides some information
about the hierarchical organisation of phonetic features. With this very
short SOA, voicing similarity already decreases performance, which can
be accounted for by lateral inhibitory relations between phoneme detectors
(Bedoin 1998).
On the contrary, the present data show that manner similarity, and to a
lesser extent, place similarity, facilitates target processing, which can be
interpreted as the result of pre-activation based on bottom-up excitatory
relations. Therefore, it seems that, 1) place and manner release a mechan-
ism based on lateral inhibitory relations between phoneme detectors around
66 msec of print processing, whereas voicing triggers this mechanism as
soon as 33 msec after the stimulus appeared (Bedoin, 1998, 2003), 2) the
role of place similarity in this mechanism is still observed with a 100 msec-
SOA, which is not the case for manner similarity, and 3) manner is of ma-
jor importance in the inter-level excitatory mechanism during the 33 first
msec of print processing (see Table 2).
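The summary above can be caricatured as a single net-effect computation: early bottom-up excitation opposed by type-specific inhibition that switches on at a type-specific latency. All numeric weights and latencies below are hypothetical placeholders chosen only to reproduce the qualitative sign pattern, and the fading of the manner effect by 100 msec is not captured by this static sketch:

```python
# Caricature of the observed sign pattern: each feature type contributes
# early bottom-up excitation, opposed by lateral inhibition that switches
# on at a type-specific latency. All values are hypothetical placeholders.

EXCITE = {"voicing": 0.2, "manner": 1.0, "place": 0.5}   # present from the start
INHIBIT = {"voicing": (1.0, 33),                          # (weight, onset in msec)
           "manner":  (1.5, 66),
           "place":   (1.0, 66)}

def net_effect(feature, soa):
    """Positive = facilitative priming; negative = inhibitory priming."""
    weight, onset = INHIBIT[feature]
    return EXCITE[feature] - (weight if soa >= onset else 0.0)

assert net_effect("voicing", 33) < 0   # voicing: already negative at 33 msec
assert net_effect("manner", 33) > 0    # manner: facilitative at 33 msec
assert net_effect("manner", 66) < 0    # manner: negative by 66 msec
assert net_effect("place", 100) < 0    # place: negative effect persists at 100 msec
```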
Table 2. Summary of sub-phonemic similarity effects in Experiments 1a, 2a, 1b,
2b, 1c and 2c (+ and - respectively for facilitative and negative effects).

Phonetic feature similarity      SOA
                                 33 msec   66 msec   100 msec
Manner                           + +       - -
Place                            +         - -       -
The facilitative priming effect observed in case of manner or place similari-
ty with a 33 msec-SOA is in line with facilitative sub-phonemic priming
effects observed by Lukatela, Eaton and Turvey (2001) with a short SOA
(57 msec). These authors investigated phonetic prime-target similarity ef-
fects using conditions that differed as regards the place and/or manner
prime-target similarity. Thus, the facilitative priming effect that they rec-
orded with a short SOA is consistent with our data in Experiments 1a and
1b. Therefore, phonetic similarity effects in reading are rather complex and
it appears that the feature types govern the time-course of their involvement
within the internal mental organisation of phoneme detectors. According to
the present results, the voicing feature enjoys one kind of priority over
place and manner within the hierarchical organisation of phonetic feature
types, since it seems to be the most rapid to direct lateral inhibitions among
phoneme detectors. The prolonged negative effect of place similarity, in
comparison to manner similarity effect, may be brought together with data
reported by Valle, Rossato and Rousset (this volume). They show that
various articulation places for two consonants are favoured within CVC
syllables as for CV.CV consecutive syllables. It can be noticed that the
preference for patterns where the first consonant is a labial and the second
one is a coronal (i.e. the Labial-Coronal effect, MacNeilage and Davis
2000), and the place similarity avoidance principle (Frisch, Pierrehumbert,
and Broe 2004) are also in line with this idea. Negative place similarity
priming effects may contribute to preclude place similarity between conso-
nants of consecutive syllables.
6. Syllable matching: evidence for a hierarchical organisation of
phonetic feature types from a metalinguistic task
Experiments 1-2 suggested that the time-course of some phonetic process-
ing in reading depends on the type of phonetic features, and that the inhibi-
tory effect of voicing similarity occurs first. Does this mean that voicing is
the feature most readily available to readers in the conceptualization of
phonological information? Since some data emphasize the prominent role
of manner as a phonetic feature type (Gow and Caplan 1996; Jusczyk,
Goodman, and Baumann 1999; Peters 1963; Rogers and Storkel 1998; Ste-
vens 2002; Van der Hulst 2005; Wang and Bilger 1973) but also the role of
voicing (Jaeger 1992; Miller and Nicely 1955; Peters 1963), we assumed
that adults rely strongly on voicing but also on manner to estimate the simi-
larity between consonants. Therefore, our second goal is to provide evi-
dence for an implicit hierarchical representational structure of phonetic
feature types from a metalinguistic task. Classification data have been re-
corded in forced-choice syllable matching experiments in adults and chil-
dren to investigate how the mental organisation of phonemes progressively
sets up in normally developing readers (Krifi, Bedoin, and Herbillon 2005).
In Experiments 3a,b, 4a,b and 5a,b, we assessed the relative weight implic-
itly granted to voicing, manner and place of articulation, to guide responses
in a forced choice syllable matching task. Systematic biases in decisions
would argue for an implicit hierarchy of phonetic feature types, since the
proposed competitors did not differ by the number of shared feature values,
but by the type of shared features. Twenty-four students participated in
Experiments 3a, 4a, and 5a (visual versions), and 17 others were tested in
the audio-visual versions (Experiments 3b, 4b, and 5b). All were native
French speakers, had normal or corrected-to-normal vision, and had no
known speech or hearing disorder. Each trial was composed of one printed
CV target syllable and two other CVs were simultaneously displayed
above. The three syllables remained on the screen until the subject pressed
one of the keys to indicate which of the two response CVs could be paired
with the target according to intuitively estimated acoustic similarity. In the
audio-visual version, the three printed syllables were additionally heard
from headphones. There were 6 voiced consonants (/d/, /b/, /g/, /v/, /z/, /ʒ/)
and 6 voiceless ones (/t/, /p/, /k/, /f/, /s/, /ʃ/) and vowels were always /a/ (see
Table 3 for examples). As in the previous experiments, plosive and fricative
consonants were distributed into three categories, regarding place of articu-
lation: Category 1 (/p, b, f, v/), Category 2 (/t, d, s, z/), and Category 3
(/k, g, ʃ, ʒ/).

Table 3. Examples of trials in Experiments 3, 4, and 5 (the two response
choices are on the first row, the target syllable below).

           Experiment 3       Experiment 4         Experiment 5
           Manner vs. Place   Manner vs. Voicing   Voicing vs. Place
Choices    ba  za             da  sa               ga  pa
Target     ta                 pa                   va

Two features types were pitted against each other in each experiment: man-
ner and place (Experiment 3), manner and voicing (Experiment 4), place
and voicing (Experiment 5). Additionally, we evaluated if one feature type
had priority whatever its value, which would reflect the consistency of the
feature type hierarchical status. For instance, we tested if manner similarity
was preferred, regardless of whether it was represented by plosive or frica-
tive pairs.
In Experiment 3, subjects' responses differed from chance with a pref-
erence for manner similarity over place similarity in the visual version
(67%), t(23) = 2.92, p = .0038, as in the audio-visual one (76%),
t(16) = 9.64, p < .0001. Matching was consistently guided by manner simi-
larity, regardless of whether the paired syllables were plosives,
t(23) = 2.82, p = .0048 (version a), t(16) = 6.61, p < .0001 (version b) or
fricatives, t(23) = 2.73, p = .0060 (version a), t(16) = 10.06, p < .0001 (ver-
sion b).
Data recorded in Experiment 4 confirmed the prominent status of man-
ner as a criterion to evaluate similarity, since choices differed from chance
to the advantage of manner (rather than voicing) in the visual version
(77%), t(23) = 8.04, p < .0001, as in the audio-visual one (80%),
t(16) = 6.08, p = .0001, regardless of whether the phonetic similarity con-
cerned the plosive value, t(23) = 6.43, p < .0001, t(16) = 5.51, p = .0003, or
the fricative one, t(23) = 6.24, p < .0001, t(16) = 5.76, p = .0002 (versions a
and b, respectively). However, the percentage of manner choices was lower
when the competition came from two voiced consonants than from two voice-
less ones, in version a (difference = 8%, t(23) = 2.80, p = .0079) and in
version b (difference = 10%, t(16) = 5.1, p = .0353).
In Experiment 5, the pattern of results differed between the visual and
the audio-visual versions. Place similarity was preferred to voicing similar-
ity above chance (59%) in the visual version, t(23) = 2.89, p = .0042, which
was confirmed for pairs of consonants whose place of articulation value
was of Category 1, t(23) = 3.36, p = .0013, or Category 2, t(23) = 2.00,
p = .0284, but not Category 3. Additionally, place similarity was not pre-
ferred to voicing similarity when it competed with a pair of voiced conso-
nants. In the audio-visual version, responses did not globally differ from
chance, except that voicing similarity was preferred over place similarity
for pairs of voiced consonants (62%), t(16) = 2.57, p = .0152.
To investigate the progressive organisation of a hierarchy across pho-
netic feature types, Experiments 3-5 were presented to normal reading chil-
dren (10 second graders and 10 third graders in the visual version, and 11
third graders in the audio-visual version). Choices never differed from
chance in second graders; third graders exhibited a growing sensitivity to
manner similarity (57%) over place similarity, since their responses dif-
fered from chance in Experiment 3, t(9) = 2.72, p = .0117 (version a),
t(10) = 4.00, p = .0013 (version b). However, this effect was restricted to
fricative consonants, t(9) = 2.39, p = .0203 (version a), t(10) = 2.86,
p = .0085 (version b). When manner similarity was pitted against voicing
similarity, third graders matched consonants that shared manner more often
than would be expected by chance (56% in version a, t(9) = 2.64,
p = .0135; 55% in version b, t(10) = 2.60, p = .0133). However, the
prominent status of manner was less pervasive than in adults, since children
preferred manner similarity only for plosive consonants in the visual version,
t(9) = 2.58, p = .0148, and only for fricative consonants in the audio-visual
one, t(10) = 2.10, p = .0310. Finally, when place and voicing similarity
were set in competition with one another, third graders responses did not
differ from chance.
Taken together, the data recorded in Experiments 3-5 suggest that sylla-
bles can be processed in terms of phonetic features not only in speech per-
ception but also in reading. The phonetic feature types shared by two
printed CVs critically bias syllable sorting, and the data allow us to
sketch the outline of a hierarchical organisation of these feature types.
7. Hierarchies of phonetic feature types in the first steps of printed
word recognition and in a metalinguistic task
First of all, this research has confirmed the existence of sub-phonemic
priming effects for printed stimuli. The results obtained in Experiments 1-2
with various SOAs are partly in agreement both with the few existing data obtained
286 Nathalie Bedoin and Sonia Krifi

with phonetic priming experiments in reading (Bedoin 1998, 2003; Krifi et
al. 2003, 2005; Lukatela et al. 2001) and with sub-phonemic priming ef-
fects in speech processing experiments (Goldinger et al. 1989, 1992; Luce
et al. 2000). Additionally, they provide arguments for major differences in
status for the phonetic feature types. However, these categories do not seem
to be organized in a single hierarchical structure. Indeed, depending on the
task requirements, the investigated feature hierarchy can refer either to the
weight of feature types in intuitively guided but consciously accessed rep-
resentations in metalinguistic tasks (Experiments 3-5), or to the time-course
of their involvement in rapid, automatic and transient processes taking part
in printed word recognition (Experiments 1-2). Taken together, the data
presented in this chapter are consistent with the existence of reliable hierar-
chies, which can nevertheless slightly differ according to the investigated
mechanisms.
In the first series of experiments, manner and place similarity between
prime and target produced either facilitative or negative effects, depending
on the processing step. Indeed, manner or place similarity provided facilita-
tive priming effects only with a very brief SOA (33 msec), while the pat-
tern of results was quite different if the SOA lasted 66 msec. This con-
firmed the facilitative priming effect observed by Lukatela and colleagues
(2001) with a SOA that was also shorter than 66 msec (57 msec). The
authors compared conditions in which prime and target differed only by
voicing, or by voicing and manner, or by voicing and place. The facilitative
priming effect obtained with a brief SOA in the case of manner and, to a
lesser extent, place similarity is in accordance with the prediction of the
previously proposed model (Bedoin 2003), according to which the more rapid
influence of phonetic features extracted from a printed stimulus is based on
inter-level feature-to-phoneme detector links and mainly provides
pre-activation of phonemes. In an experiment conducted with second grad-
ers who had a normal reading level and with older dyslexic children, we
showed that such a facilitative priming effect also occurred for them in the
case of voicing similarity, which could be accounted for by a mere inter-level
mechanism (Krifi et al. 2003).
However, in previous experiments we showed that voicing similarity
provided a negative priming effect although the SOA was very brief in
adult good readers and in third graders with a normal reading level. Addi-
tionally, with a longer SOA, this negative priming effect remained for voic-
ing similarity (Bedoin, 1998), and the present paper showed that it appeared
for manner and place similarity. Accordingly, another mechanism that also
involved phonetic knowledge may occur in print processing. Indeed, this
negative priming effect can be accounted for by an intra-level transient
mechanism of lateral inhibition, intended to reduce neighborhood effects
between phoneme detectors (Bedoin, 2003), consistent with connectionist
models (McClelland and Rumelhart 1981). According to the present results,
it seems that such a mechanism is dependent on the type of phonetic fea-
tures: voicing may be able to provide such lateral inhibition after only 33
msec of print processing, whereas manner and place features require more
time (at least 66 msec) to play this role. The involvement of each phonetic
feature type in reading actually seems to be a question of time: manner and
place similarity exert the same negative priming effect as voicing similar-
ity, provided that the SOA is longer. In light of the present results, the pho-
neme detectors seem to be organized by shared phonetic features, and the
working of this structure may itself depend on time constraints applied
to the phonetic feature types.
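The two-mechanism account sketched above (fast inter-level excitation, slower intra-level lateral inhibition, with voicing inhibiting earliest) can be made concrete with a toy simulation. This is our illustrative sketch, not the authors' implementation: the feature coding is deliberately simplified and all weights are arbitrary, chosen only so that the qualitative SOA pattern falls out.

```python
# Toy sketch of the proposed mechanisms (illustrative weights only):
# shared features excite the target's phoneme detector via inter-level
# links, then feed a slower intra-level lateral inhibition, here assumed
# to act faster for voicing than for manner or place.

FEATURES = {  # (manner, place, voicing), deliberately simplified
    "b": ("plosive", "labial", "voiced"),
    "p": ("plosive", "labial", "voiceless"),
    "s": ("fricative", "coronal", "voiceless"),
}
EXCITATION = 0.2                                             # per shared feature
INHIBITION = {"manner": 0.1, "place": 0.1, "voicing": 0.3}   # per time step

def target_activation(prime, target, steps):
    """Activation of the target's phoneme detector after `steps`
    processing steps of the prime (more steps ~ longer SOA);
    baseline activation = 1.0."""
    act = 1.0
    for ftype, p, t in zip(INHIBITION, FEATURES[prime], FEATURES[target]):
        if p == t:
            act += EXCITATION - INHIBITION[ftype] * steps
    return act

# b/p share manner and place: facilitation at a short SOA,
# net inhibition once lateral inhibition has accumulated.
print(target_activation("b", "p", 1) > 1.0)  # True: early facilitation
print(target_activation("b", "p", 3) < 1.0)  # True: later negative priming
# p/s share only voicing: inhibition already at the shortest SOA.
print(target_activation("p", "s", 1) < 1.0)  # True
```

With these (arbitrary) settings, manner and place similarity are facilitative after one step but inhibitory after three, while voicing similarity is inhibitory from the first step, mirroring the 33 vs. 66 msec contrast reported above.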
In the second series of experiments, the participants were not con-
strained by the presentation time and the required evaluation was made
consciously, even though intuitive criteria were probably used. The forced-
choice syllable matching task may provide information about an internal
organisation of feature types; the possible priority of one type was
evaluated not only on the basis of the expressed preferences, but also took
into account the consistency of these preferences. For instance, the promi-
nent status of manner in this task is shown by the higher frequency of syl-
lable matching according to manner than to place similarity (Experiment 3)
or voicing similarity (Experiment 4), but it is also supported by the mainte-
nance of this preference whatever the manner value. Indeed, it was as ap-
parent for a pair of plosive consonants as for a pair of fricative ones. This
bias in favour of manner similarity was observed both in the visual and the
audio-visual versions, and slightly increased in the latter version. The
prominent impact of manner over place and voicing similarity is in
accordance with models where this phonetic category is assumed to occupy
the top level, as it defines the representation of a segment within a syllable (Van der
Hulst 2005), and as it may provide the basis for the subsequent discrimina-
tion of articulator-bound features such as place and voicing (Stevens 2002).
Our results are also consistent with other experiments, where manner was
shown to be more important than voicing and place in rating the similarity
of syllables (Peters 1963). Additionally, an early sensitivity to sub-phonemic
similarity based on manner of articulation has been demonstrated in
infants (Jusczyk, Goodman, and Baumann 1999), and our results show that
the first type of shared phonetic features that emerges as a salient one for
sorting printed syllables by third graders is also manner.
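The dominance of manner in the matching data can be summarized as a ranked-similarity rule. The sketch below is our illustration under assumed values: the manner > voicing > place ordering is only one reading of the results (the data actually leave place and voicing unresolved), and the feature coding is simplified.

```python
FEATURES = {  # (manner, place, voicing), simplified coding
    "b": ("plosive", "labial", "voiced"),
    "t": ("plosive", "coronal", "voiceless"),
    "f": ("fricative", "labial", "voiceless"),
}
# Assumed dominance order; the reported data only establish firmly
# that manner outranks the other two feature types.
HIERARCHY = ("manner", "voicing", "place")
INDEX = {"manner": 0, "place": 1, "voicing": 2}

def rank(ref, cand):
    """Rank of the highest-ranked feature type shared with the reference
    (lower = more dominant); len(HIERARCHY) if nothing is shared."""
    for i, ftype in enumerate(HIERARCHY):
        if FEATURES[ref][INDEX[ftype]] == FEATURES[cand][INDEX[ftype]]:
            return i
    return len(HIERARCHY)

def choose(ref, cand1, cand2):
    """Forced choice: pick the candidate whose shared feature type
    is the more dominant one."""
    return cand1 if rank(ref, cand1) <= rank(ref, cand2) else cand2

# /t/ shares only manner with /b/; /f/ shares only place: manner wins.
print(choose("b", "t", "f"))  # t
```

Such a rule reproduces the adult preference for manner-based pairings, but not the value-dependent asymmetries (voiced vs. voiceless) reported above, which would require weighting individual feature values rather than feature types.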
However, the major importance of manner for syllable matching is at
odds with the very rapid involvement of voicing in the lateral inhibitory
relations, which is assumed to explain why negative priming effects arise
at a shorter SOA in the case of voicing similarity than in the case of manner
or place similarity. Voicing thus seems to come first in the time-course of
this automatic and transient phonetic mechanism of lateral inhibition in
reading. By contrast, manner plays the most important role in metalinguistic
tasks designed to investigate how categories are represented. This
difference underlines the importance of the paradigms used to investigate
the relative weight of feature types, since they could address different kinds
of hierarchies: functional ones and representational ones.
Although manner has a prominent status and shows good consistency as
a category of phonetic features in the metalinguistic task, the effects
of manner similarity and place similarity appear to be reduced when
they are in competition with similarity based on the shared voiced (but not
voiceless) feature value. This result shows that voicing can modulate the
evaluation of syllable similarity, which is consistent with data showing its
robustness for speech perception in noise (Miller and Nicely 1955), its
important role in similarity rating (Peters 1963), and in children's productions
(Jaeger 1992). Additionally, some data have shown that voicing may be
less adversely affected than place by conditions of dichotic competition
(Studdert-Kennedy and Shankweiler 1970). Nevertheless, our result also
reveals that voicing is less consistent than manner as a similarity criterion:
the voiced value represents this category better than the voiceless one when
competing with manner.
Previous investigations of the relative impact of shared place or voicing
to evaluate syllable similarity have provided mixed results. When they
were invited to sort triads of stop consonants into pairs judged most
similar, normal subjects sorted on the basis of place and voicing equally often
(Perecman and Kellar 1981). By contrast, the benefit provided by shared
phonetic features in processing dichotic stimuli has been reported to be
greater in the case of shared place than in the case of shared voicing, which
suggests that voicing is less affected by the negative effect of competition
in speech perception (Studdert-Kennedy and Shankweiler 1970), but this
difference has not been systematically replicated (Oscar-Berman et al.
1975; Studdert-Kennedy, Shankweiler, and Pisoni 1972). In addition, place
substitution errors were the most frequent slip-of-the-tongue errors (Jaeger
1992), while listeners were more responsive to place than to voicing in
discrimination tasks (Miceli et al. 1978). These mixed results are in
agreement with the central importance of task requirements in the assessment
of phonetic feature hierarchies. In our metalinguistic experiments, the
relative importance of place and voicing similarity also remains unresolved.
When place and voicing similarity were pitted against each other, different
patterns of results were observed in the visual and the audio-visual ver-
sions. Namely, pairings were more frequently based on shared place (at
least within Category 1 and Category 2) in the visual version, while shared
voicing guided responses more clearly, at least for voiced consonants, in
the audio-visual version. Additionally, contrary to manner, place and voic-
ing were not consistently represented by their various values. Indeed, sub-
jects favoured place similarity over voicing similarity in the visual version
to match syllables sharing place values of Category 1 or Category 2 but not
syllables sharing place value of Category 3. Similarly, voicing was better
represented by the voiced value than by the voiceless one, since syllable
pairing on the basis of manner or place decreased when it was in competi-
tion with a pair of voiced consonants but not with a pair of voiceless ones,
both in the visual and the audio-visual versions.
Therefore, the data presented in this chapter argue for the prominent
status of manner as a phonetic feature type to guide intuitively estimated
acoustic similarity between printed consonants, both in the visual and au-
dio-visual versions of the syllable matching task. Voicing seems to play a
secondary role, but influences decisions all the same, especially when voic-
ing similarity is represented by a pair of voiced consonants.
Finally, the hierarchy of phonetic feature types seems to be progres-
sively taken into account during childhood in our experiments. In second
graders, no significant bias to pair syllables according to one or another
phonetic category was observed. On the contrary, responses provided by
third graders reflect the emergence of the prominent status of manner simi-
larity over place and voicing similarity. However, this prominence was
building up and was not as consistent as in adults, since manner was not yet
preferred regardless of the feature value (plosive or fricative) shared by the
consonants. The testing of fourth and fifth graders is in progress and may
allow us to investigate more precisely the gradual involvement of a hierar-
chical organisation of phonetic feature types in metalinguistic tasks.

References
Bedoin, Nathalie
1995 Articulation de codages phonologiques et sémantiques en lecture
silencieuse. Revue de Phonétique Appliquée 115: 101-117.
1998 Phonological feature activation in visual word recognition: The case
of voicing. Paper presented at the Xth Conference of the European
Society for Cognitive Psychology (ESCOP), September 1998, Jerusalem, Israel.
2003 Sensitivity to voicing similarity in printed stimuli: Effect of a training
programme in dyslexic children. Journal of Phonetics 31: 541-546.
Bedoin, Nathalie and Hubert Chavand
2000 Functional hemispheric asymmetry in voicing feature processing in
reading. Paper presented at the Tenth Annual Meeting of the Society
for Text and Discourse, July 2000, Lyon, France.
Berent, Iris
1997 Phonological priming in the lexical decision task: Regularity effects
are not necessary evidence for assembly. Journal of Experimental
Psychology: Human Perception and Performance 23: 1727-1742.
Berent, Iris and Charles A. Perfetti
1995 A Rose is a REEZ: The two-cycles model of phonology assembly in
reading English. Psychological Review 102: 146-184.
Blumstein, Sheila E.
1990 Phonological deficits in aphasia: Theoretical perspectives. In: Alfonso
Caramazza (ed.), Cognitive neuropsychology and neurolinguistics:
Advances in models of cognitive function and impairment, 33-53.
Hillsdale, NJ: Lawrence Erlbaum Associates.
Bosman, Anna M. T. and Annette M. B. de Groot
1996 Phonologic mediation is fundamental to reading: Evidence from
beginning readers. The Quarterly Journal of Experimental Psychol-
ogy 49: 715-744.
Bosman, Anna M. T. and Guy C. Van Orden
1997 Why spelling is more difficult than reading. In: Charles A. Perfetti,
Laurence Rieben and Michel Fayol (eds.), Learning to spell, 173-
194. Hillsdale, NJ: Lawrence Erlbaum Associates.
Brysbaert, Marc
2001 Prelexical phonological coding of visual words in Dutch: Automatic
after all. Memory and Cognition 29: 765-773.
Caplan, David and Jennifer Aydelott Utman
1994 Selective acoustic phonetic impairment and lexical access in an aphasic
patient. The Journal of the Acoustical Society of America 95: 512-517.
Clements, Nick
1985 The geometry of phonological features. Phonology Yearbook 2: 225-252.
Cohen, Henri and Norman S. Segalowitz
1990a The role of linguistic prosody in the perception of time-compressed
speech: A laterality study. Journal of Clinical and Experimental
Neuropsychology 12: 39.
1990b Cerebral hemispheric involvement in the acquisition of new phonetic
categories. Brain and Language 38: 398-409.
Coltheart, Max, Kathleen Rastle, Conrad Perry, Robyn Langdon and Johannes
Ziegler
2001 DRC: A dual route cascaded model of visual word recognition and
reading aloud. Psychological Review 108: 204-256.
Connine, Cynthia M., Dawn G. Blasko and Debra Titone
1993 Do the beginnings of spoken words have a special status in auditory
word recognition? Journal of Memory and Language 32: 193-210.
Connine, Cynthia M., Debra Titone, Thomas Deelman and Dawn G. Blasko
1997 Similarity mapping in spoken word recognition. Journal of Memory
and Language 37: 463-480.
Ferrand, Ludovic and Jonathan Grainger
1992 Phonology and orthography in visual word recognition: Evidence
from masked non-word priming. The Quarterly Journal of Experi-
mental Psychology 45: 353-372.
1994 Effects of orthography are independent of phonology in masked
form priming. The Quarterly Journal of Experimental Psychology
47(A): 365-382.
Frisch, Stefan A., Janet B. Pierrehumbert and Michael B. Broe
2004 Similarity avoidance and the OCP. Natural Language and Linguistic
Theory 22: 179-228.
Frost, Ram
1998 Toward a strong phonological theory of visual word recognition:
True issues and false trails. Psychological Bulletin 123: 71-99.
Gibbs, Patrice and Guy C. Van Orden
1998 Pathway selection's utility for control of word recognition. Journal
of Experimental Psychology: Human Perception and Performance
24: 1162-1187.
Goldinger, Stephen D., Paul A. Luce and David B. Pisoni
1989 Priming lexical neighbors of spoken words: Effects of competition
and inhibition. Journal of Memory and Language 28: 501-518.
Goldinger, Stephen D., Paul A. Luce, David B. Pisoni and J. K. Marcario
1992 Form-based priming in spoken word recognition: The roles of com-
petition and bias. Journal of Experimental Psychology: Learning,
Memory and Cognition 18: 1211-1238.
Gow, David W. and David Caplan
1996 An examination of impaired acoustic-phonetic processing in aphasia.
Brain and Language 52: 386-407.
Grainger, Jonathan and Ludovic Ferrand
1996 Masked orthographic and phonological priming in visual word rec-
ognition and naming: Cross-task comparisons. Journal of Memory
and Language 35: 623-647.
Jaeger, Jeri J.
1992 Phonetic features in young children's slips of the tongue. Language
and Speech 35: 189-205.
Jusczyk, Peter W., Mara B. Goodman and Angela Baumann
1999 Nine-month-olds' attention to sound similarities in syllables. Journal
of Memory and Language 40: 62-82.
Krifi, Sonia, Nathalie Bedoin and Vania Herbillon
2003 Phonetic priming and backward masking in printed stimuli: A better
understanding of normal reading and dyslexia. Paper presented at
the XIIIth Conference of the European Society for Cognitive Psy-
chology (ESCOP), September 2003, Granada, Spain.
2005 The hierarchy of phonetic features categories in printed syllables
matching: Normal reading and developmental dyslexia. Poster pre-
sented at the XIVth Conference of the European Society for Cogni-
tive Psychology (ESCOP), September 2005, Leiden, Netherlands.
Krifi, Sonia, Nathalie Bedoin and Anne Mérigot
2003 Effects of voicing similarity between consonants in printed stimuli,
in normal and dyslexic children. Current Psychology Letters: Behaviour,
Brain, and Cognition 10: 1-7.
Lesch, Mary F. and Alexander Pollatsek
1993 Automatic access of semantic information by phonological codes in
visual word recognition. Journal of Experimental Psychology:
Learning, Memory and Cognition 19: 285-294.
1998 Evidence for the use of assembly phonology in accessing the mean-
ing of printed words. Journal of Experimental Psychology: Learning,
Memory and Cognition 24: 573-592.
Luce, Paul A., Stephen D. Goldinger, Edward T. Auer Jr. and Michael S. Vitevitch
2000 Phonetic priming, neighborhood activation, and PARSYN. Percep-
tion and Psychophysics 62: 615-625.
Lukatela, Georgije, Thomas A. Eaton, C. Lee and M. T. Turvey
2001 Does visual word identification involve a sub-phonemic level?
Cognition 78: 41-52.
Lukatela, Georgije, Katerina Lukatela and M. T. Turvey
1993 Further evidence for phonological constraints on visual lexical access:
TOWED primes FROG. Perception and Psychophysics 53: 461-466.
Lukatela, Georgije and M. T. Turvey
1991 Phonological access of the lexicon: Evidence from associative prim-
ing with pseudohomophones. Journal of Experimental Psychology:
Human Perception and Performance 17: 951-966.
1994 Visual lexical access is initially phonological: 2. Evidence from
phonological priming by homophones and pseudohomophones.
Journal of Experimental Psychology: General 123: 331-353.
Luo, Chun R., Reed A. Johnson and David A. Gallo
1998 Automatic activation of phonological information in reading: Evi-
dence from the semantic relatedness decision task. Memory and
Cognition 26: 833-843.
MacNeilage, Peter and Barbara L. Davis
2000 On the origin of internal structure of word forms. Science 288: 527-531.
Marslen-Wilson, William D., Helen E. Moss and Stef van Halen
1996 Perceptual distance and competition in lexical access. Journal of
Experimental Psychology: Human Perception and Performance 22:
1376-1392.
Marslen-Wilson, William D. and Paul Warren
1994 Levels of perceptual representation and process in lexical access:
Words, phonemes, and features. Psychological Review 101: 653-675.
McClelland, James L. and David E. Rumelhart
1981 An interactive activation model of context effects in letter perception:
Part 1. An account of basic findings. Psychological Review 88: 375-407.
McMurray, Bob, Michael K. Tanenhaus and Richard N. Aslin
2002 Gradient effects of within-category phonetic variation on lexical
access. Cognition 86: 33-42.
McQueen, James M., Delphine Dahan and Anne Cutler
2003 Continuity and gradedness in speech processing. In: Niels Schiller
and Antje Meyer (eds.), Phonetics and phonology in language com-
prehension and production: Differences and similarities, 39-78.
Berlin: Mouton de Gruyter.
Miceli, Gabriele, Carlo Caltagirone, Guido Gainotti and Paola Payer-Rigo
1978 Discrimination of voice versus place contrasts in aphasia. Brain and
Language 6: 47-51.
Miller, George A. and Patricia E. Nicely
1955 An analysis of perceptual confusions among some English consonants.
The Journal of the Acoustical Society of America 27: 338-352.
Molfese, Dennis L.
1978 Neural correlates of categorical speech perception in adults. Brain
and Language 5: 25-35.
Molfese, Dennis L. and Veronica J. Molfese
1988 Right-hemisphere responses from preschool children to temporal
cues to speech and non-speech materials: Electrophysiological corre-
lates. Brain and Language 33: 245-259.
Oscar-Berman, Marlene, Edgar B. Zurif and Sheila Blumstein
1975 Effects of unilateral brain damage on the processing of speech
sounds. Brain and Language 2: 345-355.
Perecman, Ellen and Lucia Kellar
1981 The effect of voice and place among aphasic, nonaphasic right-
damaged, and normal subjects on a metalinguistic task. Brain and
Language 12: 213-223.
Perfetti, Charles A. and Laura Bell
1991 Phonemic activation during the first 40 ms of word identification:
Evidence from backward masking and priming. Journal of Memory
and Language 30: 473-485.
Perfetti, Charles A. and Sulan Zhang
1991 Phonemic processes in reading Chinese words. Journal of Experi-
mental Psychology: Learning, Memory and Cognition 17: 633-643.
Peter, Mira and M. T. Turvey
1994 Phonological codes are early sources of constraint in visual semantic
categorization. Perception and Psychophysics 55: 497-504.
Peters, Robert W.
1963 Dimensions of perception of consonants. The Journal of the
Acoustical Society of America 35: 1985-1989.
Rayner, Keith, Sara C. Sereno, Mary F. Lesch and Alexander Pollatsek
1995 Phonological codes are automatically activated during reading: Evi-
dence from an eye movement priming paradigm. Psychological Sci-
ence 6: 26-32.
Rebattel, Magalie and Nathalie Bedoin
2001 Cerebral hemispheric asymmetry in voicing and manner of articula-
tion processing in reading. Poster presented at the XIIth Conference
of the European Society of Cognitive Psychology (ESCOP 12), Sep-
tember 2001, Edinburgh, Scotland.
Rogers, Margaret A. and Holly L. Storkel
1998 Reprogramming phonologically similar utterances: The role of pho-
netic features in pre-motor encoding. Journal of Speech, Language,
and Hearing Research 41: 258-274.
Segalowitz, Norman S. and Henri Cohen
1989 Right hemisphere EEG sensitivity to speech. Brain and Language
37: 220-231.
Simos, Panagiotis G., Denis L. Molfese and Rebecca A. Brenden
1997 Behavioral and electrophysiological indices of voicing-cue discrimi-
nation: Laterality patterns and development. Brain and Language 57:
122-150.
Slowiaczek, Louisa M., Howard C. Nusbaum and David B. Pisoni
1987 Phonological priming in auditory word recognition. Journal of Ex-
perimental Psychology: Learning, Memory and Cognition 13: 64-75.
Sparrow, Laurent and Sébastien Miellet
2002 Activation of phonological codes during reading: Evidence from errors
detection and eye movements. Brain and Language 81: 509-516.
Stevens, Kenneth N.
2002 Toward a model for lexical access based on acoustic landmarks and
distinctive features. The Journal of the Acoustical Society of America
111: 1872-1891.
Studdert-Kennedy, Michael and Donald Shankweiler
1970 Hemispheric specialization for speech perception. The Journal of
the Acoustical Society of America 48: 579-594.
Studdert-Kennedy, Michael, Donald Shankweiler and David B. Pisoni
1972 Auditory and phonetic processes in speech perception: Evidence
from a dichotic study. Cognitive Psychology 3: 455-466.
Vallée, Nathalie, Solange Rossato and Isabelle Rousset
To appear in this volume.
Van der Hulst, Harry
2005 Molecular structure of phonological segments. In: Philip Carr,
Jacques Durand and Colin J. Ewen (eds.), Headhood, elements,
specification and contrastivity, 193-234. John Benjamins Publishing
Company.
Van Orden, Guy C.
1987 A ROWS is a ROSE: Spelling, sound, and reading. Memory and
Cognition 15: 181-198.
Van Orden, Guy C. and Stephen D. Goldinger
1994 Interdependence of form and function in cognitive systems explains
perception of printed words. Journal of Experimental Psychology:
Human Perception and Performance 20: 1269-1291.
Van Orden, Guy C., Marian A. Jansen op de Haar and Anna M. T. Bosman
1998 Complex dynamic systems also predict dissociations, but they do not
reduce to autonomous components. In: Alfonso Caramazza (ed.), Access
of phonological and orthographic lexical forms: Evidence from disso-
ciations in reading and spelling, 131-166. Hove: The Psychology Press.
Van Orden, Guy C., James C. Johnston and Benita L. Hale
1988 Word identification in reading proceeds from spelling to sound to
meaning. Journal of Experimental Psychology: Learning, Memory,
and Cognition 14: 371-386.
Wang, Marilyn D. and Robert C. Bilger
1973 Consonant confusions in noise: A study of perceptual features. The
Journal of Acoustical Society of America 54: 1248-1266.
Ziegler, Johannes C. and Arthur M. Jacobs
1995 Phonological information provides early sources of constraint in the proc-
essing of letter strings. Journal of Memory and Language 34: 567-593.
Ziegler, Johannes C., Guy C. Van Orden and Arthur M. Jacobs
1997 Phonology can help or hurt the perception of print. Journal of Ex-
perimental Psychology: Human Perception and Performance 23:
845-860.


Part 4:
Complexity in the course of language acquisition


Self-organization of syllable structure: a coupled
oscillator model
Hosung Nam, Louis Goldstein and Elliot Saltzman
1. Syllable structure
It has generally been claimed that every language has syllables with onsets
(CV structure), while languages may or may not allow coda consonants
(VC structure). While recent work on Arrernte (Breen & Pensalfini, 1999)
has cast doubt on the absolute universality of onsets, it is clear that there is
a significant cross-linguistic preference for CV structure (Clements &
Keyser, 1983; Clements, 1990). In addition, evidence from phonological
development shows that CV structure is typically acquired before VC struc-
ture (e.g., Vihman & Ferguson, 1987; Fikkert, 1994; Demuth & Fee, 1995;
Gnanadesikan, 1996; Salidis & Johnson, 1997; Levelt et al., 2000). This
preference for CV structure in distribution and acquisition has been claimed
to arise from universal grammar (UG: Chomsky, 1965), in which CV is the
unmarked, core syllable structure. Yet the UG hypothesis does not answer
the question "Why is CV the most unmarked structure?" This study aims
to provide a rationale, grounded in dynamical systems theory, for why CV
is favored across languages. The self-organization of syllable structure in
phonological development is simulated using a model in which syllable
structures are defined by the coupling graph in a system of gestural plan-
ning oscillators that control patterns of relative intergestural timing. The
simulation shows that, due to the hypothesized stronger coupling inherent
in CV graphs compared to VC graphs, CVs emerge earlier than VCs. The
model of syllable structure based on coupled planning oscillators (see sec-
tion 2 below) has been developed to account for a variety of empirical ob-
servations about speech production (Nam & Saltzman, 2003; Goldstein et
al., 2006; Nam et al., submitted a, b), independently of any consideration of
acquisition facts.
In addition to the explanatory weakness of the UG hypothesis, two
additional empirical observations about phonological development are not
easily accommodated by the UG hypothesis. One is that the delay in the
emergence of coda consonants varies across languages as a function of the
frequency of coda consonants in adults' word production (Roark & Demuth
2000). Thus, both intrinsic and extrinsic factors interact in the development
of syllable structure. The second observation is that, unlike the acquisition
patterns of single consonants, it has been shown in several languages that
consonant clusters can appear earlier in word- (or syllable-) final position
than in word-initial position (Mexican Spanish: Macken, 1977; Telugu:
Chervela, 1981; German and Spanish: Lleó & Prinz, 1996; Dutch: Levelt,
Schiller, & Levelt 2000; English: Templin, 1957, Paul & Jennings, 1992,
Dodd, 1995, Watson & Scukanec, 1997, McLeod et al. 2001, Kirk and
Demuth 2003), though the opposite pattern can also be observed (for ex-
ample, Dutch: Levelt et al., 2000).
In the self-organization model presented in this paper, both of these
facts are readily accounted for. The first follows from hypothesizing a self-
organizing process that includes both intrinsic dynamic constraints on
planning intergestural timing and the attunement of speaker/listener agents
to the behavior of other agents. The second follows from the independently
motivated hypothesis (Browman & Goldstein, 2000; Nam & Saltzman,
2003) that the production of consonant clusters can involve competition
between the coupling of (all of) the consonants to the vowel, and the cou-
pling of the consonants to one another. As we will see in the rest of the
paper, the fact that CV coupling is stronger than VC coupling makes it
easier to learn to produce single onset Cs than coda Cs, but competition
provided by the stronger CV coupling in onsets makes it more difficult to
learn to coordinate consonant clusters.
2. A coupled oscillator model of syllable structure
Within the framework of articulatory phonology (e.g., Browman & Gold-
stein, 1992; 1995), word forms are analyzed as molecules built up by com-
bining discrete speech gestures, which function simultaneously as units of
speech production (regulating constriction actions) and units of (phonologi-
cal) information. In these molecules, the relative timing of gestures is also
informationally or phonologically significant. For example, the words
"mad" and "ban" are composed of an identical set of velum, tongue tip,
tongue body, and lip gestures. The only difference between these two
words is in the velum gesture's timing with respect to the other gestures. Thus,
there must be some temporal glue in speech production that keeps the ges-
tures appropriately coordinated. As with the gestures themselves, this glue
appears to have both a regulatory and an informational function.
2.1. A coupled oscillator model of intergestural timing
We have been developing a model of speech production planning in which
dynamic coupling plays the role of temporal glue (Saltzman & Byrd, 2000;
Nam & Saltzman, 2003; Goldstein et al., 2006; Nam, in press; Nam et al.,
submitted a, b). The central idea is that each speech gesture is associated
with a nonlinear planning oscillator, or "clock," and the activation of that
gesture is triggered at a particular phase (typically 0°) of its oscillator. A
pair of gestures can be coordinated in time by coupling their corresponding
oscillators to one another so that the oscillators settle into a stable pattern of
relative phasing during planning. Once this pattern stabilizes, the activation
of each gesture is triggered by its respective oscillator, and a stable relative
timing of the two gestures is achieved.
There are two sources of evidence supporting the hypothesis that the
relative timing of gestures is controlled by coupling their planning oscilla-
tors. The first comes from experiments on phase-resetting. When subjects
repeat a particular word, the gestures composing that word exhibit stable
relative phasing patterns. When the ongoing repetition is mechanically
perturbed, the characteristic pattern of phasing is quickly re-established
(reset) in a way that is consistent with the behavior of coupled oscillators
(Saltzman et al., 1998). When a word is produced only once, instead of
repeatedly, qualitatively similar phase-resetting is also observed.
The second source of evidence comes from experiments on the kinemat-
ics of speech errors (Goldstein et al., in press). The subjects repeat phrases
like 'cop top', in which the tongue dorsum gesture for /k/ and the tongue
tip gesture for /t/ alternate. Over time, subjects' productions tend to shift to
a new pattern, in which the tongue tip and the tongue dorsum gestures are
produced synchronously at the beginning of both words, causing the
perception of speech errors (Pouplier & Goldstein, 2005). These errors have
been interpreted as resulting from a shift to a more stable mode of fre-
quency-locking among the gestural timing oscillators that compose these
words. Specifically, in normal productions, there is a 1:2 relation between
the frequency of the tongue tip (or dorsum) oscillators and the oscillators
for a vowel or final C. The new (more effortful) pattern exhibits a more
stable, 1:1 frequency relation. Such shifts to more stable modes of fre-
quency-locking have been observed in several types of bimanual coordina-
tion in humans (Turvey, 1990; Haken et al., 1996). Very similar kinds of
changes are observed in tasks that do not involve any overt repetition (Pouplier, in press), in which case the shifts must occur in the planning process.
302 Hosung Nam, Louis Goldstein and Elliot Saltzman

To illustrate the planning model, consider Fig. 1. The word 'bad' is
composed of three gestures: a Lip closure, a wide palatal constriction of the
Tongue Body, and a Tongue Tip closure. A typical arrangement of these
gestures in time, as can be observed from kinematic data, is shown in the
gestural score in (a). The width of each box represents that gesture's activation
interval, i.e., the time during which its dynamics control the appropriate
constricting system (lips, tongue body, tongue tip). These activation inter-
vals are the output of the coupled oscillator model of planning. The input to
the planning model is a coupling graph, shown in (b). The graph specifies
how the oscillators controlling the gestures' timing are coupled to one an-
other. The graph shows that the oscillator for the palatal Tongue Body (/a/)
gesture is coupled to both the Lip (labial closure) and Tongue Tip (alveolar
closure) oscillators. The solid line connecting the TB gesture with the Lip
gesture indicates that the coupling target of those gestures is specified as an
in-phase relation between the oscillators, while the dotted line connecting
the TB gesture to the Tongue Tip gesture indicates that an anti-phase cou-
pling target is specified.

Figure 1. (a) Gestural score for 'bad'. Time is on the horizontal axis. (b) Coupling
graph for 'bad'. Solid line indicates in-phase coupling target, dotted line
indicates anti-phase coupling target. After Goldstein et al. (2006).
At the onset of the planning simulation for an utterance, each (internal)
oscillator is set to an arbitrary initial phase and the oscillators are set into
motion. Over time, the coupling among the oscillators causes them to settle
into a stable pattern of relative phasing. The model that accomplishes this,
the task dynamics model of relative phasing, first developed by Saltzman &
Byrd (2000) for single pairs of gestures, has been extended to a network of
multiple couplings (Nam & Saltzman, 2003; Nam et al., submitted a, b).
The coupling between each pair of gestures is controlled by a (cosine-
shaped) potential function defined over their relative phase, with a mini-
mum at the target (intended) relative phase value. When the relative phase
of two oscillators differs from its target, forces derived from the potential
functions are applied to the individual oscillators that have the effect of
bringing their relative phase closer to the target value. The steady-state
output of this planning process is a set of oscillations with stabilized rela-
tive phases. From this output, the gestural score (for example, the one for
'bad' in (a)) is produced, in which gestural onset and offset times are specified
as a function of the steady-state pattern of inter-oscillator phasing. So, be-
cause the TB and Lip oscillators (Fig. 1b) settle into a steady-state, in-phase
pattern, their activation intervals (Fig. 1a) are triggered synchronously. The
TB and TT oscillators settle into an anti-phase pattern, and their activations
show the TT gesture to be triggered substantially later than the TB gesture.
Gestural scores can be the input for the constriction dynamics model
(Saltzman & Munhall, 1989), which generates the resulting time functions
of constriction (tract) variables and articulator trajectories. Articulator tra-
jectories can then be used to calculate the acoustic output in our vocal tract
model.
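The settling process just described can be sketched in a few lines. This is an illustrative toy, not the task-dynamics implementation: two planning "clocks" advance at their natural frequencies while a force derived from a cosine-shaped coupling potential (minimum at the target relative phase) nudges them toward a stable pattern; the function name and all parameter values are our own.

```python
import math

def plan_relative_phase(target_phase, omega1=1.0, omega2=1.0,
                        coupling=0.5, dt=0.01, steps=20000):
    """Settle two planning oscillators into a target relative phase.

    Each oscillator advances at its natural frequency; a force derived
    from a cosine-shaped potential V(psi) = -cos(psi - target) pulls the
    relative phase psi = theta2 - theta1 toward the target value.
    """
    theta1, theta2 = 0.3, 2.9  # arbitrary initial phases
    for _ in range(steps):
        psi = theta2 - theta1
        # force = -dV/dpsi, applied with opposite signs to the two clocks
        force = -coupling * math.sin(psi - target_phase)
        theta1 += (omega1 - force) * dt
        theta2 += (omega2 + force) * dt
    return (theta2 - theta1) % (2 * math.pi)

# Both an in-phase (0 rad) and an anti-phase (pi rad) target stabilize.
print(plan_relative_phase(0.0))      # settles near 0
print(plan_relative_phase(math.pi))  # settles near pi
```

Once the relative phase has stabilized, gestural activations would be triggered at fixed phases of each clock, yielding the stable relative timing described in the text.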


2.2. Intrinsic modes of coupling
One theoretical advantage of modeling timing using oscillator coupling
graphs is that systems of coupled nonlinear oscillators can display multiple
stable modes. These modes have been shown to play a role in the coordina-
tion of oscillatory movements of multiple human limbs (fingers, arms, legs;
see Turvey, 1990, for a review). Such experiments show that when asked to
oscillate limbs in a regular coordinated way, subjects can do so readily,
without any training or learning, as long as the task is to coordinate them
using in-phase (0° relative phase) or anti-phase (180° relative phase) patterns. Other phase relations can be acquired through learning, e.g. complex
drumming, but only after significant training.
While these two modes of coupling are (intrinsically) available without
training, they are not equally stable. This has been demonstrated in experi-
ments in which the frequency (rate) of subjects oscillation is manipulated.
When subjects oscillate two limbs in an anti-phase pattern and the fre-
quency is increased, the relative phasing undergoes a spontaneous transi-
tion to the in-phase pattern. However, if subjects begin oscillating in the
in-phase pattern, an increase of oscillation frequency has no effect on the
relative phase. From these results, it has been concluded that the in-phase
mode is the more stable one.
These experimental results were the basis for Haken, Kelso and Bunz's
(1985) model of coordinated, rhythmic behavior, which can be understood
as a self-organized process governed by low-dimensional nonlinear dynam-
ics. They developed a simple potential function (the HKB potential func-
tion) that can account quantitatively for the results of these experiments.
V(φ) = −a cos(φ) − b cos(2φ);  (φ = θ₂ − θ₁)
Figure 2. HKB potential function. a is varied from 1 (left) to 2 (center) to 4 (right).
The function, shown in Fig.2, is the sum of two cosine functions of relative
phase, one of which has half the period of the other. a and b are weighting
coefficients of the two cosine functions respectively and their ratio, b/a,
determines the shapes of the potential landscapes. The left-most example in
Fig. 2 shows the shape of the function for b/a = 1. There are two potential
minima at 0 and 180 degrees and, depending on initial conditions, relative
phasing can stabilize at either of the two minima, making them attractors.
However, the valley associated with the in-phase minimum is both deeper
and broader. Thus, it technically has a larger basin because there is a larger
range of initial values of φ that will eventually settle into that minimum.
The experimental results suggest that frequency of oscillation (rate) is a
control parameter for the system: as it is scaled upwards in a continuous
fashion, the behavior of the system will undergo a qualitative change at
some critical point. If the value of b/a is specified as an inverse function of
oscillation frequency, then the HKB model in Fig. 2 predicts an abrupt
phase transition from the anti-phase to in-phase pattern as the rate in-
creases. This can be seen by comparing the function for the different values
of b/a shown in Fig. 2. As b/a decreases from 1.0, the basin of the anti-
phase mode becomes shallower. Eventually, the attractor disappears when
b/a = 1/4. At this point, the anti-phase pattern becomes unstable and the
relative phase will be attracted to the minimum at 0°.
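The disappearance of the anti-phase basin can be checked numerically from the HKB potential itself. The following is a small sketch of our own (the grid test for a local minimum is not from the original model, just a convenient numerical check):

```python
import math

def hkb_potential(phi_deg, a=1.0, b=1.0):
    """HKB potential V(phi) = -a*cos(phi) - b*cos(2*phi), phi in degrees."""
    phi = math.radians(phi_deg)
    return -a * math.cos(phi) - b * math.cos(2 * phi)

def is_local_minimum(phi_deg, a, b, eps=1.0):
    """Crude test: V at phi is lower than at its neighbors eps degrees away."""
    v = hkb_potential(phi_deg, a, b)
    return (v < hkb_potential(phi_deg - eps, a, b)
            and v < hkb_potential(phi_deg + eps, a, b))

# With b fixed at 1, increasing a lowers b/a: the anti-phase minimum at
# 180 deg survives for b/a = 1 and 1/2, but not for 1/4 or 1/5.
for a in (1.0, 2.0, 4.0, 5.0):
    print(f"b/a = 1/{a:g}: anti-phase minimum exists:",
          is_local_minimum(180.0, a, b=1.0))
```

Analytically, the second derivative of V at 180° is −a + 4b, so the anti-phase basin vanishes exactly when b/a falls to 1/4, as the text states.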


2.3. Syllable structure and coupling modes
For a system like speech, which is acquired early in a child's life and without explicit training, it would make sense for the early coordination of
speech gestures to take advantage of these intrinsically available modes. It
has been proposed (e.g., Goldstein et al., 2006) that phonological systems
make use of the in-phase and anti-phase modes, which form the basis of
syllable structure in phonology. If we treat phonology as a fundamentally
combinatorial system, we can consider the problem of coordinating two gestures, a
consonant gesture with a vowel gesture, given the predisposition to exploit
the presence of the intrinsically distinct in-phase and anti-phase modes.
Goldstein et al. (2006) have proposed the coupling hypothesis of syllable
structure: in-phase coupling of C and V planning oscillators underlies what
we observe as CV structures, and more generally underlies the relation
between onset and nucleus gestures. The anti-phase mode of coupling plan-
ning oscillators underlies VC structures and the relation between nucleus
and coda gestures.
Evidence for the in-phase mode in CV structures can be found in the
fact that the constriction actions for the C and V gestures in CVs are initi-
ated synchronously. For example, in Fig. 1, the activation of the Lip gesture
and the Tongue Body gestures begins at the same time. This synchrony
follows from the hypothesis that the oscillators associated with the C and V
gestures are coupled so as to settle into an in-phase mode, together with the
model assumption that gestural activation is triggered at phase 0 of a ges-
tures oscillator. The idea that consonant and vowel gestures are triggered
synchronously goes back to the pioneering work of Kozhevnikov & Chis-
tovich (1965). Kinematic data for V₁pV₂ and V₁bV₂ utterances presented by
Löfqvist & Gracco (1999) show that the onset of lip movement for /p/ or /b/
and the onset of tongue body movement for V₂ occur within 50 ms of one
another, across all 4 subjects and all six different V₁V₂ patterns, with only 2
outlier values. In the case of a coda /p/, the relation to the vowel is obvi-
outlier values. In the case of a coda /p/, the relation to the vowel is obvi-
ously not one of synchrony. The evidence that the oscillators exhibit the
anti-phase relation is necessarily indirect. The anti-phase relation implies
that the final /p/ will be triggered at 180° of the vowel gesture. The point in
time that corresponds to 180° will, of course, depend on the frequency of
the vowel oscillator. In Nam et al. (submitted b), we show that simple hy-
potheses about the frequencies of vowel and consonant oscillators, com-
bined with the hypothesis of in-phase CV and anti-phase VC coupling, can
account for a rich set of quantitative phonetic timing data.
The coupling hypothesis can also be used to explain a variety of qualita-
tive properties of CV and VC structures. These include the following:
Universality. CV syllables occur in all human languages, while VC
ones do not. Since the in-phase mode is more stable, stronger (the
potential well in Fig. 2 is deeper for the in-phase mode), and has a
larger basin of attraction than the anti-phase mode, it follows that
the in-phase (CV) mode should always be available for coordinating
Cs and Vs in a language, while the anti-phase (VC) mode may not
be.
Combinatoriality. Even in languages which allow VC structures,
vowels and codas often exhibit restrictions when combined. Onsets
and rimes, in contrast, can usually combine freely in languages. In-
deed, their relatively free combinatoriality is the major source of
phonological generativity and the basis for the traditional decompo-
sition of the syllable into onset and rime. Goldstein et al. (2006)
propose that there is a relation between the stability/strength of cou-
pling and combinatoriality. The idea is that it is possible to jointly
perform any two actions as long as they are coordinated in-phase
because this coordination is intrinsically the most stable. Even
though anti-phase coordinations are more stable than other out-of-phase modes, speakers may learn not to use the more stable in-phase
coordination for the forms that have coda Cs. Some combinations
may be difficult to learn, leading to combinatorial restrictions.
Re-syllabification. Single, intervocalic coda consonants may be re-
syllabified into onset position in running speech, particularly as
speech rate increases (Stetson, 1951; Tuller & Kelso, 1991; de Jong,
2001a, b). This follows automatically from the HKB model (Fig. 2),
where CV is defined as in-phase and VC is defined as anti-phase.
(Re-syllabification takes place on the following syllable, rather than
on the preceding one, because the final C and the following V are
already roughly synchronous, even though they may not be coupled
with each other at all. So, they would fall into the basin of an in-
phase attractor.)
Another key hypothesis in the coupled oscillator planning model is that
incompatible coupling specifications can compete with one another and
that, during planning, the system of oscillators settles to a set of steady-
state relative phases that is the result of the competition. The use of com-
petitive coupling was originally proposed for CC and CV coupling in onset
clusters (Browman & Goldstein, 2000). All C gestures in an onset were
hypothesized to be coupled in-phase with the vowel (this is what defines an
onset consonant). For some combinations of C gestures, such as oral con-
strictions with velic or glottal gestures, synchronizing multiple C gestures
results in a recoverable structure. The result is what is usually analyzed as a
multiple-gesture segment (nasal, lateral, voiceless stop). In other cases
(clusters such as /sp/ for example), synchronous coordination does not pro-
duce a recoverable structure, and the two gestures must be at least partially
sequential (Goldstein et al., 2006; Nam, in press). Therefore, the oral conso-
nant gestures must also be coupled anti-phase to each other. Browman and
Goldstein (2000) proposed that this competitive structure could account for
a previously observed generalization about the relative timing of consonant
and vowel gestures in forms with onset clusters (the so-called c-center
effect, Browman & Goldstein, 1988; Byrd, 1995). As Cs are added to an
onset, the timing of all Cs relative to the vowel is shifted: the C closest to
the vowel shifts rightward to overlap the vowel more, while the first C
slides leftward away from the vowel. The temporal center of the sequence,
the c-center, maintains relatively invariant timing with respect to the vowel.
This effect has since been modelled in coupled oscillator simulations (Nam
& Saltzman, 2003; Nam et al., submitted b). In-phase coupling of two onset
Cs with the V and anti-phase coupling of the two Cs with each other results
in an output in which the phasing of C₁ and C₂ to the V is −60° and 60°,
respectively, and the C₁C₂ phasing is 120°, a pattern consistent with available
data.
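This competitive compromise can be reproduced with a minimal numerical sketch (our own illustration, not the published simulation): both Cs are attracted in-phase (0°) to V while being attracted anti-phase (180°) to each other, and descending the summed pairwise cosine potentials lands on the reported pattern.

```python
import math

def settle_onset_cluster(steps=20000, lr=0.01):
    """Gradient descent on E(x, y) = -cos(x) - cos(y) + cos(y - x), where
    x and y are the C1-V and C2-V relative phases (radians). The first two
    terms pull each C in-phase with V; the last pulls C1 and C2 anti-phase
    with each other."""
    x, y = -0.3, 0.4   # arbitrary initial relative phases
    for _ in range(steps):
        gx = math.sin(x) + math.sin(y - x)   # dE/dx
        gy = math.sin(y) - math.sin(y - x)   # dE/dy
        x, y = x - lr * gx, y - lr * gy
    return math.degrees(x), math.degrees(y)

c1, c2 = settle_onset_cluster()
print(round(c1), round(c2))   # -> -60 60
```

The compromise phases are −60° and +60° relative to V, so the C₁C₂ phasing is 120°, exactly the configuration described in the text; the midpoint (c-center) of the two Cs stays locked to the vowel.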
However, available evidence suggests that this kind of competitive
structure may or may not be found in coda consonants, depending on the
language (Nam, in press), or possibly the speaker. In English, coda clusters
do not exhibit the c-center effect consistently (Honorof & Browman, 1995),
though it may be found for some speakers (Byrd, 1995). Browman & Gold-
stein (2000) hypothesized a non-competitive structure for codas in English:
the first coda C is coupled anti-phase with the vowel and the second coda C
is coupled anti-phase with the first. More recent work (Nam & Saltzman,
2003; Nam et al., submitted b) has shown that this hypothesized difference
between onset and coda clusters for English accounts for the lack of a coda
c-center effect and also for the fact that gestures in onset clusters exhibit
less variability in relative timing than do gestures in coda clusters. When
noise is added to the coupled oscillator simulation, the competitively-
coupled onset oscillators exhibit less trial-to-trial variability than do the
noncompetitively-coupled coda oscillators.
This hypothesized difference in coupling topology between onsets and
codas has also been argued (Nam, in press) to be part of the explanation for
the cross-linguistic generalization that coda clusters can add metrical
weight to a syllable while onset clusters rarely do. (One exception is Ratak, a
dialect of Marshallese; Bender, 1999.) The idea is that weight is partly de-
termined by the duration of a syllable (Gordon, 2004). While adding a C to
a coda increases the duration of the rime (and the whole syllable) by the
duration of that consonant, adding a C to the onset increases the duration of
the whole syllable by only about half the duration of the C.
Cross-language differences in the presence of competitive vs. non-
competitive coupling structure in codas have been proposed (Nam, in press)
in order to account for the differing moraic status of coda Cs across lan-
guages. English, and other languages in which coda Cs are moraic, are
modeled with a non-competitive structure in codas: adding Cs to a coda is
predicted not to decrease the duration of the vowel, so the added C in-
creases the duration of the entire syllable substantially. Languages in
which coda Cs are not moraic (e.g. Malayalam), are modeled with a com-
petitive structure in the coda, which causes vowel shortening as Cs are
added to the coda, and, as a result, a lack of weight associated with the
added C. (Nam, in press, showed how these hypothesized coupling differences
can account for acoustic data from these language types; Broselow et
al., 1997.)
Thus, there is a strong asymmetry between onsets and codas. Onsets al-
ways have a competitive structure, but this may be lacking in codas. How-
ever, regardless of topological differences in the coupling structures of
onsets and codas, onset Cs are characterized by in-phase couplings while
coda Cs are characterized by anti-phase couplings. The asymmetry be-
tween onsets and codas led to the hypothesis that, due to the greater intrin-
sic strength of in-phase coupling, all prevocalic Cs are pulled into the in-
phase relation with the V, whereas coda Cs can (and do in some languages)
escape the pull of anti-phase coupling with the V (Nam, in press).
In this paper, we show that the difference in coupling modes between
onsets and codas can also account for the difference in the time course of
acquisition between CV and VC structures. To show this, we performed
simulation experiments investigating the self-organization of syllable struc-
ture, in which the only relevant pre-linguistic structure attributed to the
child is that (s)he comes equipped with (a) the HKB potential function for
the pairwise coordination of multiple actions and (b) the ability to attune
his/her behavior to the behaviors of others in his/her environment.
3. Self-organization of syllable structure
We investigated the self-organization of syllable structure in a series of
simulations with a computational agent model. Agent models have been
employed to investigate several aspects of phonological and phonetic struc-
ture such as partitioning of physical continua into discrete phonetic catego-
ries (Oudeyer, 2002, 2005, 2006; Goldstein, 2003), the structure of vowel
systems (de Boer, 2001; Oudeyer, 2006), consonant-vowel differentiation
(Oudeyer, 2005), and sequentiality in consonant sequences (Browman &
Goldstein, 2000). In these simulations, the agents interact using a very sim-
plified set of local behaviors and constraints. Through these interactions,
the agents internal states evolve, as do the more global properties of the
system. Depending on the choice of constraints, the system may evolve to
have quite different properties. Thus, for example, the importance of some
constraint (k) in the evolution of some property of interest (P) can be evalu-
ated by contrasting the results of simulations with and without that con-
straint. The models are not meant to be faithful simulations of the detailed
process of (phylogenetic or ontogenetic) evolution of some property, but
rather a way of testing the natural attractors of a simple system that in-
cludes the constraint of interest.
The models employed here involve a child agent with no syllable struc-
ture, and an adult agent with a developed syllable structure. The child
comes to the simulation with two distinct classes of actions (C and V), and
it attempts to coordinate them in time. The existence of distinct C and V
actions early in the childs development (e.g. during babbling) has been
denied in the frame-content model of speech production development
(MacNeilage and Davis, 1998). That model's treatment of syllable structure
and its emergence contrasts, therefore, with the one proposed here. This
conflict will be addressed further in the Discussion. At this point, however,
we note that even if the frame-content view is correct in excluding a C-V
distinction during the babbling stage, this distinction could still have
evolved by the time the child is producing words with onset and coda con-
sonants, which is the age we are simulating here.

3.1. Emergence of CV vs. VC structures
Learning in this model is accomplished by self-organization under condi-
tions imposed by both intrinsic constraints on coordination and attunement
to the coordination patterns implicit in, and presumably recoverable from,
the acoustic environment structured by the ambient language. A Hebbian
learning model was employed, in a manner similar to that used by Oudeyer
(2006) to model the emergence of discrete phonetic units. The simulation
includes a child agent and an adult agent. Both have a probabilistic repre-
sentation of the distribution of intended relative phasing between a pair of
gestures that evolves over time. The adult representation includes modes
corresponding to CV (in-phase) and VC (anti-phase), where the relative
strength of these modes can differ from language to language, correspond-
ing to the relative frequency of CV and VC structures in that language. At
the onset of learning, the child's representation does not include any CV or
VC modes, so the child displays no preference for producing any relative
phases over others; the relative phases produced are randomly distributed.
As a result of the learning process, modes develop that correspond to the
modes found in the adult speakers' representations and to their relative
strength. What we predict is that even though the child will develop the
same modes as the adult partner, the rate at which the CV mode develops
will be faster than that of the VC mode, regardless of the ultimate relative
strength of the modes.

Figure 3. Self-organization learning model for emergence of CV and VC structures.
The learning simulation proceeds as follows (Fig. 3). On a given learning
iteration, the child randomly selects a relative phase value, φ_SEL, to produce
from its evolving distribution of relative phases, and a single-well intended
potential function with a minimum at that value is added to the double-well
(HKB) intrinsic potential function to create a resultant composite potential
function. The agent then plans the production of a pair of gestures by using
the composite potential to specify the coupling function between a pair of
corresponding planning oscillators. Oscillator motion is initialized with a
random pair of initial phases, and oscillator motions settle into a stabilized
relative phase, φ_OUT, in accordance with the shape of the composite potential.
The child then produces a pair of gestures with relative phase φ_OUT; the
child also compares φ_OUT to the (veridically) perceived relative phase of an
utterance token produced by the adult, φ_ADULT. If the difference between
these two relative phases falls within a criterion tolerance, the child tunes
its relative phase density distribution to increase the likelihood of producing
that phase, φ_OUT, again. The details of the model are now described.

3.1.1. Phase representation and selection model
The target relative phase parameter values of the interoscillator coupling
function between C and V gestures are represented (for both the child and
the adult agents) by a set of virtual (neural) units, ψᵢ, each of which
represents some value of the relative phase parameter. At the outset of the
simulation, values of relative phase are assigned to these neural units in
one-degree increments from −179° to 360°. Thus, ψ₁ = −179°, ψ₂ = −178°, ...,
ψ₅₄₀ = 360°, for a total of 540 units.¹ On a given learning trial, one of the 540
units is selected and its relative phase value, φ_SEL, is used as the agent's
intended relative phase. Since, at the beginning of the simulation, the units'
relative phase values are uniformly distributed across the relative phase
range, the value of φ_SEL will be completely random. As learning progresses
through attunement (section 3.1.3 below), the values of the neural units will
come to be clustered, with most units having values near 0° or 180°. We
will represent this clustering at various points in the simulation by plotting
a frequency histogram, showing the number of units as a function of relative
phase. We will refer to this distribution as the density distribution, and
to the number of units sharing a value of relative phase as its density. The
density distribution will develop peaks at 0° and 180°; therefore, since φ_SEL
values are chosen by randomly sampling the density distribution, the values
of φ_SEL will also tend to be either 0° or 180° as the distribution develops
over the course of learning.
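A minimal sketch of this representation (the function names and the tolerance used to count density are ours):

```python
import random

# 540 "neural" units whose values tile relative phase from -179 deg to
# 360 deg in one-degree steps, as in the model's initial state.
units = list(range(-179, 361))
assert len(units) == 540

def select_intended_phase(units):
    """phi_SEL: sample one unit's value. Uniform at the outset, but biased
    toward 0 and 180 deg once tuning has clustered the unit values."""
    return random.choice(units)

def density(units, phase, tol=0.5):
    """D(phi): number of units whose value lies within tol of phase."""
    return sum(1 for psi in units if abs(psi - phase) <= tol)
```

Because selection simply samples a unit at random, the probability of choosing a given phase is proportional to its density, which is exactly the sampling behavior the text describes.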

3.1.2. Planning and production model
Once an intended relative phase, φ_SEL, is selected by the child, it is used to
construct the system's composite potential function, which will shape the
evolution of interoscillator relative phase between V and C gestures. To
model the fact that not all relative phase patterns can be as easily learned
or produced, not only does the intended single-well relative phase potential
contribute to the shape of this composite potential, but so does the HKB
double-well potential function (see section 2.2) that represents the intrinsic
modes of coupling two oscillators. The intended potential, P_intended, is
modeled by a cosine function whose minima are at φ_SEL and whose peaks have
been flattened (see the intended potential inset in Fig. 3), according to the
following expression:

P_intended(φ) = −α cos(φ − φ_SEL) / cosh²( cos(φ − φ_SEL) − 1 )     (1)

where the coupling strength α can vary between 0.5 and 1.0 according to the
value of the density distribution at φ_SEL (see Equation 2 below). The intended
potential function is added to the HKB intrinsic potential function to build the
composite potential function. The relative contribution of the intended potential
in the composite should depend on how well learned, or well-practiced, the
intended pattern is. To implement this, the coupling strength associated
with the intended potential, α (Equation 1), is scaled between 0.5 and 1.0
according to the density (i.e., the number of relative phase neural units)
defined for φ_SEL in the child's evolving, experience-dependent probability
density distribution for relative phase (section 3.1.1). The coupling strength
associated with the intrinsic HKB potential is defined as (1 − α). α is given
by a standard logistic squashing function that takes the density of φ_SEL,
D(φ_SEL), as its argument, as defined in equation (2). In this equation,
γ = 0.5, δ = 0.5, β = 0.15, and D₀ = 70. This function is shown in Fig. 4.
Once the composite potential is specified for the planning simulation, the
initial phases of the planning oscillators are chosen at random; over the
course of the simulation, the oscillators settle to a stable relative phase un-
der the control of the composite potential function. This final, steady-state
relative phase value, φ_OUT, is thus determined both by the landscape of the
composite potential and the basin selected by the randomly chosen initial

Figure 4. Logistic squashing function relating the strength of intended coupling,
α, to the probability density of φ_SEL, D(φ_SEL).

α = γ / (1 + e^(−β(D(φ_SEL) − D₀))) + δ     (2)
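A sketch of this logistic squashing function. The Greek parameter names (γ, δ, β, D₀) follow the reconstruction of Equation (2) used here; only their numerical values (0.5, 0.5, 0.15, and 70) are given in the text:

```python
import math

def coupling_strength(density, gamma=0.5, delta=0.5, beta=0.15, d0=70.0):
    """alpha as a logistic function of D(phi_SEL): grows from 0.5 toward
    1.0 as the density of the selected phase increases."""
    return gamma / (1.0 + math.exp(-beta * (density - d0))) + delta

# An unpracticed phase (low density) gets alpha near 0.5, so the intrinsic
# HKB potential, weighted (1 - alpha), dominates; a well-practiced phase
# (high density) gets alpha near 1.0, letting the intended well dominate.
```

The midpoint of the sigmoid sits at a density of 70 units, where α = 0.75 and the intended and intrinsic potentials contribute 3:1.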
Gᵢ = (1/√(2π)) e^(−(1/2)((ψᵢ − φ_SEL)/σ)²)     (3)

ψᵢ′ = ψᵢ − r Gᵢ (ψᵢ − φ_SEL)     (4)
conditions. As a result, the produced φ_OUT may correspond neither to the
intended relative phase, φ_SEL, nor to the in-phase or anti-phase modes
intrinsic to the HKB potential. This final relative phase of the
planning oscillators could then be used to trigger activation of C and V
gestures but, in this agent model, the simulation simply stops with the oscillators
settling at their final, steady-state pattern.
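The planning step of a single learning cycle can be sketched as descent on the composite potential. This is an illustrative simplification of our own: the intended well is a plain cosine rather than the flattened form of Equation (1), and the oscillator pair is reduced to its relative phase, with numerical gradient descent standing in for oscillator settling.

```python
import math
import random

def composite_potential(psi, phi_sel, alpha, a=1.0, b=1.0):
    """alpha-weighted intended well plus (1 - alpha)-weighted HKB potential."""
    intended = -math.cos(psi - phi_sel)
    hkb = -a * math.cos(psi) - b * math.cos(2 * psi)
    return alpha * intended + (1 - alpha) * hkb

def settle(phi_sel, alpha, steps=5000, lr=0.01, h=1e-4):
    """phi_OUT: relax a random initial relative phase down the composite
    potential; which basin is reached depends on the initial condition."""
    psi = random.uniform(-math.pi, math.pi)
    for _ in range(steps):
        grad = (composite_potential(psi + h, phi_sel, alpha)
                - composite_potential(psi - h, phi_sel, alpha)) / (2 * h)
        psi -= lr * grad
    return math.degrees(psi) % 360.0

# With alpha = 0 (no learned preference) the outcome is one of the two
# intrinsic HKB modes, near 0 or 180 deg, depending on the initial basin.
```

Because the outcome depends on the randomly chosen initial phase, φ_OUT can match an intrinsic mode rather than φ_SEL, which is exactly how the intrinsic modes bias what the child ends up producing.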

3.1.3. Attunement model
The attunement of the child to the language environment is modeled by
comparing the child's produced relative phase (φ_OUT) on a given learning
cycle to a randomly sampled relative phase from the adult probability
distribution, φ_ADULT. If the child's produced value matches the randomly
chosen adult value such that |φ_OUT − φ_ADULT| < 5°, then the intended phase
used by the child on that trial (φ_SEL) is gated into the tuning (or learning)
process. Tuning occurs as the units of phase representation that have values
(ψᵢ) similar to φ_SEL respond by increasing their level of activation as a
function of the proximity of ψᵢ and φ_SEL in phase space, ψᵢ − φ_SEL.
Specifically, the receptive field of each unit i is a Gaussian function of ψᵢ
with mean φ_SEL and standard deviation σ = 40°, as described in (3).
The values of all the units, ψᵢ, are then attracted to φ_SEL in proportion to
their activation levels, according to the parameter-dynamics equation in (4),
where ψᵢ′ is the new unit value and r is a learning-rate parameter (equal to 1
in this simulation). The result of this parameter-dynamic tuning process is
an evolution of the density distribution of units along the relative phase
continuum. The example in Fig. 5 shows the initial uniform state of the
child's density distribution and the effect on the units of gating a value of
φ_SEL = 2° into the tuning process. Tuning ends one cycle of the phase
learning process.
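One tuning step, following the reconstructed forms of Equations (3) and (4), can be sketched as follows (variable names are ours):

```python
import math

def tune(units, phi_sel, sigma=40.0, r=1.0):
    """One tuning step: every unit value psi_i is pulled toward the gated
    phase phi_sel in proportion to its Gaussian receptive-field activation
    G_i (sigma = 40 deg, learning rate r = 1)."""
    new_units = []
    for psi in units:
        g = math.exp(-0.5 * ((psi - phi_sel) / sigma) ** 2) / math.sqrt(2 * math.pi)
        new_units.append(psi - r * g * (psi - phi_sel))
    return new_units

units = [float(v) for v in range(-179, 361)]   # initial uniform tiling
units = tune(units, phi_sel=2.0)               # cf. Fig. 5: gate phi_SEL = 2 deg
# Units near 2 deg have moved closer to 2 deg; distant units barely move,
# so the density distribution develops a local peak around the gated phase.
```

Repeating this step with gated values drawn mostly from near 0° and 180° (the adult modes) gradually sculpts the uniform distribution into the peaked one described in the text.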
Figure 5. Visualization of tuning (learning) process in self-organization model
The adults phase distribution was varied across simulations to model am-
bient languages with different properties. For example, both English and
Spanish exhibit preference for onsets (CV) over codas (VC) in production
but coda consonants are more frequently produced in English than in Span-
ish. This kind of asymmetry can be expressed by the difference between the
phase distributions of English and Spanish adult speakers, with the proba-
bility of units clustered in the anti-phase region being higher for English
than Spanish. In the simulations presented here, three different hypothetical
languages with different probabilities of in-phase and anti-phase modes
were tested: a) CV>VC (in-phase = .6, anti-phase = .4); b) CV=VC (in-
phase = anti-phase = .5); and c) CV<VC (in-phase = .4, anti-phase = .6).
The first two types can be thought of as modeling Spanish and English,
respectively. The third might be an Australian language, many of which
have a preference for coda Cs (e.g., Tabain et al., 2004), the extreme case
being Arrernte, for which the claim of no onsets has been made (Breen &
Pensalfini, 1999).
316 Hosung Nam, Louis Goldstein and Elliot Saltzman

3.1.4. Results
The three simulations are summarized in Fig. 6.
Figure 6. The learning process in three hypothetical language environments with
different corresponding densities of in-phase and anti-phase modes. Top
left: CV>VC; top right: CV=VC; bottom left: CV<VC. Solid line=in-
phase (CV) mode; dashed line=anti-phase (VC) mode.
For each simulation, the figure shows the density of in-phase and anti-
phase modes as a function of iteration, or learning cycle, for 1000 iterations
(main panels), the adult mode probabilities (upper right insets in main pa-
nels), and the child agent's density distribution after 200, 400, 600, and 800
iterations (bottom row of insets).
In all simulations, the child agent's learning of each mode was halted when
its probability (= mode density / 540) reached half of the corresponding
adult mode probability. The results show that the in-phase (CV) mode is
stabilized more quickly than the anti-phase (VC) mode regardless of differ-
ences in the adult agents' mode distributions. The numbers of iterations
required to reach the halting criterion are: a) CV 263, VC 664 (Figure 6,
top left); b) CV 214, VC 563 (Figure 6, top right); and c) CV 228, VC 514
(Figure 6, bottom left). If we assume that adult-like production of CV or
VC depends on development of the corresponding mode, the advantage of
CV in acquisition is predicted. In addition, as the data from different lan-
guages show, the lag between production of onsets and codas is less in
languages with a higher frequency of coda consonants (Figure 6, bottom
left). Finally, due to our halting criterion for the learning procedure, the
mode densities of the child agent come to match those of the adult. Thus, the
adult pattern is learned but, more interestingly, CVs are always acquired
first.
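The halting criterion can be paraphrased in code as a short sketch (names are illustrative, assuming the 540-unit normalization stated above):

```python
def mode_probability(mode_density, n_units=540):
    # The chapter normalizes a mode's density by the number of units (540)
    return mode_density / n_units

def learning_halted(child_mode_density, adult_mode_probability):
    # Tuning of a mode stops once the child's mode probability reaches
    # half of the corresponding adult mode probability
    return mode_probability(child_mode_density) >= adult_mode_probability / 2.0
```

For the CV>VC language, for instance (adult in-phase probability .6), the in-phase mode would halt once its density reaches 0.3 × 540 = 162 units of mass.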


3.2. Emergence of CCV vs. VCC structures
Now let's consider a child who has begun to learn the distinct in-phase and
anti-phase modes of coordinating C and V, and who now intends to pro-
duce a structure (s)he perceives as having two prevocalic Cs (CCV) or two
postvocalic Cs (VCC). We assume that (s)he doesn't know anything about
coordinating the two Cs, so when (s)he attempts to produce the CCV, for
example, she uses the learned in-phase CV mode to coordinate both Cs to
the V. This will have the effect of synchronizing the two Cs, since they will
both be coordinated in-phase with the V. The output, therefore, will be a
form that does not typically match the adult form. By hypothesizing an
additional (evolving) CC coordination pattern that can compete with the
CV pattern, the child can begin to produce adult-like structures once
the CC pattern is well enough established to push the Cs apart, despite the
synchronizing pressures of the CV pattern.
If we embed this scenario in an agent model like that described in sec-
tion 3.1, and treat VCCs as completely parallel in structure to CCVs, then
we would predict that the child's CC (anti-phase) mode would emerge
more quickly for VCC than for CCV. This would be the case because cou-
pling strength in the model for an intended relative phase is dependent on
mode density, which will typically be less for the anti-phase VC than for
the in-phase CV early in the child's experience. Thus, it would be easier to
pull the Cs apart for the relatively less stable anti-phase VC couplings of
VCC than for the stronger in-phase CV couplings of CCV. In real lan-
guages, the structure of VCC is not always parallel to CCV, as discussed in
section 2.3, and it is not clear whether there is any cross-language differ-
ence in acquisition of coda clusters as a function of moraic status. How-
ever, if the child never even attempts to produce coda clusters with a com-
petitive structure, then we would also expect VCC to develop more rapidly
than CCV, since there would be no synchronizing force working against
learning to produce the sequence of Cs in VCC. It seemed to us more sound
methodologically to test the more challenging case in which the structures
of CCV and VCC are completely parallel and differences are in the strength
of CV vs. VC coupling alone. We tested this prediction by performing ad-
ditional agent simulations, one for CCV structures and one for VCC struc-
tures.

3.2.1. Extending the model to clusters
Both CCV and VCC simulations began identically to those in section 3.1,
with adult input on each trial restricted to CVs or VCs. These simulations
assume that the frequencies of adult CV and VC are equal (see top right
panel of Fig. 6).
By iteration 200, there are already two established modes of C^V coor-
dination (see the leftmost density distribution in the bottom row insets of
Fig. 6's top right panel), though the frequencies associated with those mo-
des continue to grow. (We use the term C^V to refer to the coordination of V
and C gestures, without regard to whether they correspond to CV or VC
structures.) Thus, the child can effectively choose to produce either a CV or
a VC by this time, depending on the part of the C^V distribution the child
samples from (near 0° or near 180°). In our simulations, the C^V distribu-
tions are frozen at this iteration, and the child partitions the C^V distribu-
tion into CV and VC subparts (Fig. 7) and begins to attempt to produce
CCVs or VCCs using the appropriate subpart for the simulation being run.
At this point (iteration 200), relative frequencies associated with the CV
and VC modes have not reached their adult levels, and all tested language
types show a stronger mode for CV.
Figure 7. Child agent's developed CV (top) and VC (bottom) modes captured at
iteration 200.
Selection and Planning. From this point on, the child selects three intended
phases on each trial: two are selected from the CV sub-distribution (or VC
sub-distribution, depending on the simulation), specifying the CV (or VC)
phase for each of the two Cs, φ_SEL(C1V) (or φ_SEL(VC1)) and φ_SEL(C2V)
(or φ_SEL(VC2)). The third one comes from a new CC distribution,
φ_SEL(CC), which begins completely flat. The adult output is also now
restricted to CCVs or VCCs (depending on the simulation): two CV (or
VC) phases, and also a C-C phase, drawn from a distribution around 120°,
which represents the result of competition of the adult CV (or VC) mode
with an anti-phase mode between Cs (discussed in section 2.3).
Thus, starting from iteration 200, the child's coupling graph input to the
planning model includes three coupling links with selected target relative
phases. For the CCV simulation, these are φ_SEL(C1V), φ_SEL(C2V) and
φ_SEL(CC), and for the VCC simulation, these are φ_SEL(VC1), φ_SEL(VC2)
and φ_SEL(CC). There is no addition of the intrinsic HKB potential in this
simulation². The final set of relative phases is computed from the coupled
oscillator model, and the final relative phase of the consonant cluster,
φ_OUT(CC), is compared to the value selected from the adult distribution
with a mode at 120°, φ_ADULT(CC).
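The selection step might be sketched as follows, assuming each sub-distribution is represented simply as a list of sampled phase values (function and variable names are illustrative, not the authors' implementation):

```python
import random

rng = random.Random(0)  # seeded for reproducibility

def select_targets(cv_samples, cc_samples):
    # One CCV trial: two intended C-V phases drawn from the child's CV
    # sub-distribution and one C-C phase drawn from the (initially flat)
    # CC distribution; the coupled-oscillator planning model (not shown)
    # then settles these three coupling links to final relative phases.
    phi_c1v_sel = rng.choice(cv_samples)
    phi_c2v_sel = rng.choice(cv_samples)
    phi_cc_sel = rng.choice(cc_samples)
    return phi_c1v_sel, phi_c2v_sel, phi_cc_sel
```

For a VCC trial the same routine would draw the first two phases from the VC sub-distribution instead.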
Attunement. When first attempting CCV or VCC outputs, the CV and
VC modes already developed by the child both tend to foster CC synchro-
nization. Hence, early cluster productions will be nowhere near the adult
phase values (which, we would argue, is why in real life we do not hear
the child producing any clusters). Thus, tuning of the CC mode would
never get off the ground if it depended on a child-adult matching criterion
as stringent as that used in the simulation in 3.1. Consequently, we hy-
pothesize a different form of attunement here. A selected value (φ_SEL(CC))
is gated into the learning process whenever φ_OUT(CC) is such that the Cs
are planned to be triggered in the same temporal order as the adult Cs (C1
before C2). Thus, the condition for gating is that the final relative phase is
positive: φ_OUT(CC) > 0. Then, in order to make the simulation somewhat
dependent on the goodness of the match between the correctly ordered Cs,
the learning rate (r in equation 4) was set to be proportional to the inverse
of the relative phase mismatch between output and adult:
r = min( a / |φ_ADULT(CC) − φ_OUT(CC)| , 3 ) (5)
where a = 20.
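A sketch of this gating-and-rate rule, assuming phases in degrees (names are illustrative):

```python
def cc_learning_rate(phi_cc_out, phi_cc_adult, a=20.0, cap=3.0):
    # Gate: the selected CC phase enters learning only if the planned Cs
    # come out in the adult temporal order, i.e., the final relative
    # phase is positive
    if phi_cc_out <= 0.0:
        return 0.0
    mismatch = abs(phi_cc_adult - phi_cc_out)
    if mismatch == 0.0:
        return cap  # perfect match: maximal (capped) rate
    # Equation (5): the rate grows as the output approaches the adult
    # phase, capped at 3
    return min(a / mismatch, cap)
```

So a mis-ordered cluster contributes nothing, a 20° mismatch yields r = 1, and a near-perfect match saturates at r = 3.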

3.2.2. Results
Results of the CCV and the VCC simulations are shown in Fig. 8. The den-
sity of the CC mode grows much more quickly in the VCC simulation than
in the CCV simulation. We assume that until stable sequential coupling of
CC is acquired, phasing to the vowel will result in multiple Cs being pro-
duced synchronously.
Figure 8. Density of CC anti-phase mode, as a function of iteration number for
CCV and VCC simulations.

Therefore, they will not be readily perceivable in the child's output, and the
child would be described as not producing clusters of the relevant type.
Thus, the model predicts that we should perceive children as producing
VCC structures before CCV structures, because CC coupling stabilizes
earlier in VCC structures.
4. Discussion
In summary, the results presented here show that it is possible to model the
course of acquisition of CV vs. VC and CCV vs. VCC structures as emerg-
ing from a self-organized process if we make three basic hypotheses that
form the boundary conditions for the process: 1) syllable structure can be
modeled in terms of modes of coupling in an ensemble of gestural planning
oscillators; 2) infants come to the learning process with very generic con-
straints that predispose them toward producing in-phase and anti-phase
coordinations between pairs of gestures; and 3) infants attune their action
patterns to those they perceive in the ambient language environment. Our
results are striking because the seemingly contradictory acquisition trends
in the emergence of onsets and codas with single Cs vs. C clusters follow
from the same principle in this model: the relatively greater strength of in-
phase than anti-phase coupling.
There are, of course, many limitations to the type of modeling presented
here. One major limitation is that we do not provide an explicit account of
how the child agent is able to extract relevant phase information from the
articulatory and acoustic patterns that result from adults' phasing patterns.
Behavioral evidence across several domains shows that sensory informa-
tion must make contact in some common form with motor plans (evidence
for "common currency" in speech gestures: Goldstein & Fowler, 2003;
more generally, a "common coding" principle in action systems: Prinz,
1997; Galantucci et al., 2006), and the discovery of mirror neurons (Riz-
zolatti et al., 1988) has made this notion seem more biologically tractable.
However, it would certainly strengthen the kind of simulations presented
here if we could show how that is accomplished in the case of a relatively
abstract property like phase.³
The cluster simulation has a more specific limitation. For reasons dis-
cussed in section 3.2, we assumed in both CCV and VCC learning simula-
tions that both Cs are identically coupled to the V, in-phase in CCV, and
anti-phase in VCC. This was appropriate to do, we argued, as we wanted to
assume complete parallelism between onsets and codas. We also wanted to
show that the observed differences could emerge from only differential
coupling strength of in-phase vs. anti-phase modes. However, in our model,
the child acquires the coupling associated with a language in which coda Cs
are not moraic, and we are left with the problem of how a child might ac-
quire the pattern exhibited by English and other languages in which coda
Cs are moraic. One possible answer is that in this case, adult CC phasing in
VCCs would presumably be 180
o
(as opposed to the 120
o
employed here),
so perhaps the infant would never, in this case, attempt to use the VC coor-
dination to produce the final coda C. However, the implications of this
would have to be tested in further simulations.
We should also consider how the model would handle cases (like
Dutch) where some children are reported to acquire onset clusters before
coda clusters. Note that the effect obtained in our model crucially de-
pended on the fact that the mode associated with onsets is stronger than the
one associated with codas. However, if the mode associated with codas
were stronger at the time clusters begin to be acquired, then the model out-
put would be reversed. A strong mode associated with codas could indeed
develop in Dutch, with the possibility of relatively frequent coda clusters in
child-directed speech (Levelt et al., 2000).
There are also predictions made by the model that could, in theory, be
tested, although the relevant data are not yet available. One prediction is
that CV structures should appear earlier than VC even in a language like
Arrernte, in which the CV structures would presumably be phonologically
ill-formed, as Arrernte has been argued to have no onsets. Such develop-
mental data are not presently available. A second is that careful analysis of
children's early productions of intended CCVs that are perceived by adult
transcribers as CV should reveal cases in which both Cs are being pro-
duced, but synchronously. Testing this would require articulatory data from
children that is also not presently available.
Conversely, there are patterns of acquisition that have been reported
(e.g., Rose, this volume) that our model has not yet been tested against. For
example, CCV is reported as acquired earlier than CCVC. Testing such
patterns would require developing a more complete (and much slower)
model in which the syllable position modes and the cluster sequentiality
mode are all allowed to evolve together.
Finally, we should consider alternative accounts for distributional and
developmental regularities of syllable structure. Ohala (1996) attributes the
CV preference in languages to the perceptual robustness of initial versus
final Cs because, particularly in the case of stops, acoustic information that
affords perceptual recovery is more salient in CV (e.g., the intensity of
release bursts). While it is plausible that such differences form part of the
explanation for the CV preference, it is not clear how this explanation could
be extended to the developmental lag of CCV compared to VCC. For ex-
ample, a stop-liquid onset cluster would still retain these desirable burst
properties, yet such structures can in many languages be acquired later than
liquid-stop coda clusters that lack these burst cues.
The other major alternative model of the development of syllable struc-
ture is the frame-content model (e.g., MacNeilage, 1998; MacNeilage &
Davis, 2000). The model hypothesizes that a syllable structure frame,
based on mandibular oscillation, develops before the content provided by
individual C or V gestures. While the model has some plausibility, argu-
ments have been raised that MacNeilage and Davis' evidence for jaw-only
oscillations in children's babbling (a preponderance of certain CV combi-
nations) cannot by itself be used as evidence for the jaw-only strategy
(Goldstein et al., 2006; Giulivi et al., 2006). Regardless of how that issue is
resolved, it is not clear how the frame-content model would account for the
pattern of results predicted by the coupled oscillator model: the earlier ac-
quisition of CV compared to VC, but the earlier acquisition of VCC com-
pared to CCV.


Notes

* We gratefully acknowledge support of the following NIH grants: DC-00403,
DC-03663, and DC-03782.

1. Since a cycle of tuning is done by attracting neural units to an experienced
stimulus, the density of the units can grow differently at the ends of the range
covered by the units. The following two-step procedure was performed in order
to prevent mode growths from emerging at the boundaries, i.e., to eliminate
boundary effects in the space of neural units: First, 179 units were added below
in-phase (0°) and 180 units were added above anti-phase (180°), where 0° and
180° are the predicted modes in this simulation. Second, prior to their use in the
tuning process, phases were wrapped between −90° and 270°, i.e., phases less
than −90° or greater than 270° were re-expressed as equivalent values within
the interval [−90°, 270°], which is medially positioned within the range of the
units.
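The wrapping step described in this note can be sketched as a one-line function (illustrative, assuming phases in degrees):

```python
def wrap_phase(phi):
    # Map any phase (in degrees) to its equivalent value in [-90, 270),
    # the interval medially positioned within the padded range of units
    return (phi + 90.0) % 360.0 - 90.0
```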
2. We assumed that, at this stage of the simulations, the intrinsic HKB function is
so weak relative to the intended φ_SEL potentials that it could be ignored. When
we experimented by adding a relatively strong intrinsic potential function to
each of the φ_SEL intended potential functions, the result was to create multiple
attractors due to the competitive structure of the onset and coda graphs used in
the simulations. As a result of this multistability, the simulations could produce
relative phase patterns that were linguistically inappropriate. It is interesting to
speculate, however, that some of these inappropriate patterns may underlie ges-
tural misorderings observed developmentally during the acquisition of conso-
nant sequences.
3. It is encouraging to note that some progress has been made along these lines in
extracting syllabic phase (a continuously varying, normalized measure of tem-
poral position within syllables) from speech acoustics (Hartley, 2002).
References
Bender, B.
1999 Marshallese grammar (Chapter 1, 2). Ms., University of Hawaii.
Breen, G. & R. Pensalfini
1999 Arrernte: A language with no syllable onsets. Linguistic Inquiry, 30: 1-26.
Browman, C. P. & L. Goldstein
1988 Some notes on syllable structure in articulatory phonology. Pho-
netica 45: 140-155.
1992 Articulatory phonology: An overview. Phonetica 49: 155-180.
1995 Gestural syllable position effects in American English. In F. Bell-
Berti & L. Raphael (Eds.) Producing speech: Contemporary issues.
19-33. NY: American Institute of Physics.
2000 Competing constraints on intergestural coordination and self-
organization of phonological structures. Les Cahiers de l'ICP, Bulle-
tin de la Communication Parlée 5: 25-34.
Byrd, D.
1995 C-Centers revisited. Phonetica, 52:263-282.
Chervela, N.
1981 Medial consonant cluster acquisition by Telugu children. Journal of
Child Language, 8, 63-73.
Chomsky, N.
1965 Aspects of the Theory of Syntax. Cambridge: The MIT Press.
Clements, G. N.
1990 The role of the sonority cycle in core syllabification. In John King-
ston & Mary Beckman, (Eds.), Papers in Laboratory Phonology I,
283-333. Cambridge: Cambridge University Press.
Clements, G. N. & S. J. Keyser
1983 CV phonology. Cambridge, MA: MIT Press.
de Boer, B.
2001 The Origins of Vowel Systems. Oxford: Oxford University Press.
de Jong, K.
2001a Effects of syllable affiliation and consonant voicing on temporal
adjustment in a repetitive speech production task. Journal of Speech,
Language, and Hearing Research, 44, 826-840.
2001b Rate-induced resyllabification revisited. Language and Speech, 44,
197-216.
Demuth, K. & E. J. Fee
1995 Minimal prosodic words. Ms., Brown University and Dalhousie
University.
Dodd, B.
1995 Children's acquisition of phonology. In B. Dodd (Ed.), Differential diag-
nosis and treatment of speech disordered children, 21-48. London: Whurr.
Fikkert, P.
1994 On the acquisition of prosodic structure. Doctoral dissertation, Uni-
versity of Leiden, The Netherlands.
Galantucci, B., C. A. Fowler & M. T. Turvey
2006 The motor theory of speech perception reviewed. Psychonomic
Bulletin & Review 13 (3): 361-377.
Giulivi, S., D. H. Whalen, L. M. Goldstein, & A. G. Levitt
2006 Consonant-vowel place linkages in the babbling of 6-, 9- and 12-
month-old learners of French, English, and Mandarin. Journal of the
Acoustical Society of America 119: 3421.
Goldstein, L.
2003 Emergence of discrete gestures. Proceedings of the 15th Interna-
tional Congress of Phonetic Sciences, Barcelona, Spain, August 3-9,
2003. Universitat Autònoma de Barcelona.
Goldstein, L. & C. A. Fowler
2003 Articulatory phonology: A phonology for public language use. In
Schiller, N.O. & Meyer, A.S. (eds.), Phonetics and Phonology in
Language Comprehension and Production, pp. 159-207. Mouton de
Gruyter.
Goldstein, L., D. Byrd, & E. Saltzman
2006 The role of vocal tract gestural action units in understanding the
evolution of phonology. In Michael Arbib (Ed.), Action to Language
via the Mirror Neuron System, 215-249. Cambridge: Cambridge
University Press.
Goldstein, L., M. Pouplier, L. Chen, E. Saltzman, & D. Byrd
in press. Dynamic action units slip in speech production errors. Cognition


Gordon, M.
2004. Syllable weight. In Bruce Hayes, Robert Kirchner, and Donca
Steriade (eds.), Phonetic Bases for Phonological Markedness, pp.
277-312. Cambridge: Cambridge University Press.
Gnanadesikan, A.
1996 Markedness and faithfulness constraints in child phonology. Ms.,
University of Massachusetts, Amherst.
Haken, H., J. A. S. Kelso & H. Bunz
1985 A theoretical model of phase transitions in human hand movements.
Biological Cybernetics 51: 347-356.
Haken, H., C. E. Peper, P. J. Beek & A. Daffertshofer
1996 A model for phase transitions in human hand movements during
multifrequency tapping. Physica D 90 (1-2): 179-196.
Hartley, T.
2002 Syllabic phase: A bottom-up representation of the temporal structure
of speech. In J. A. Bullinaria, & W. Lowe, (Eds). Proceedings of the
7
th
Neural Computation and Psychology Workshop: Connectionist
Models of Cognition and Perception. New York: World Scientific
Press, Pp. 277-288.
Honorof, D.N. & C. P. Browman
1995 The center or edge: how are consonant clusters organised with respect to
the vowel? Proceedings of the XIIIth International Congress of Phonetic
Sciences (3), K. Elenius and P. Branderud (eds.), 552-555. Stockholm,
Sweden: Congress Organisers at KTH and Stockholm University.
Kirk, C. & K. Demuth
2003 Onset/coda asymmetries in the acquisition of clusters. In Proceed-
ings of the 27th Annual Boston University Conference on Language
Development, Barbara Beachley, Amanda Brown, and Frances
Conlin (eds.), 437-448. Somerville, MA: Cascadilla Press.
Kozhevnikov, V. A. & L. A. Chistovich
1965 Speech: Articulation and Perception. English translation: U. S. Dept.
of Commerce, Clearing House for Federal Scientific and Technical
Information.
Levelt, C., N. Schiller, & W. Levelt
2000 The acquisition of syllable types. Language Acquisition, 8, 237-264.
Lleó, C. & M. Prinz
1995 Consonant clusters in child phonology and the directionality of syl-
lable structure assignment. Journal of Child Language 23: 31-56.
Löfqvist, A. & V. L. Gracco
1999 Interarticulator programming in VCV sequences: lip and tongue move-
ments. Journal of the Acoustical Society of America 105: 1854-1876.


Macken, M. A.
1977 Developmental reorganization of phonology: A hierarchy of basic
units of acquisition. Papers and Reports in Child Language Devel-
opment, 4, 1-36.
MacNeilage, P. F.
1998 The frame/content theory of evolution of speech production. Behav-
ioral and Brain Sciences 21: 499-511.
MacNeilage, P.F. & B.L. Davis
1998 Evolution of speech: The relation between phylogeny and ontogeny.
Paper presented at the Second International Conference on the Evo-
lution of Language, London.
2000 On the origin of internal structure of word forms. Science, 288, 527-531.
McLeod, S., J. van Doorn, & V. A. Reed
2001 Normal acquisition of consonant clusters. American Journal of
Speech-Language Pathology 10 (2): 99-110.
Nam, H.
in press A competitive, coupled oscillator model of moraic structure: Split-
gesture dynamics focusing on positional asymmetry. Laboratory
Phonology 9.
Nam, H., L. Goldstein, & E. Saltzman
submitted a Intergestural timing in speech production: the role of graph structure.
submitted b A dynamical model of gestural coordination.
Nam, H. & E. Saltzman
2003 A competitive, coupled oscillator model of syllable structure. Proceed-
ings of the XVth International Congress of Phonetic Sciences (3): 2253-2256.
Ohala, J. J.
1996 Speech perception is hearing sounds, not tongues. Journal of the
Acoustical Society of America 99: 1718-1725.
Oudeyer, P-Y.
2002 The origins of syllable systems: an operational model. In Proceed-
ings of the 23rd Annual Conference of the Cognitive Science Society,
COGSCI2001, J. Moore & K. Stenning (Eds.), 744-749. London:
Lawrence Erlbaum Associates.
2005 The self-organization of speech sounds. Journal of Theoretical Biol-
ogy 233: 435-449.
2006 Self-Organization in the Evolution of Speech. Studies in the Evolu-
tion of Language. Oxford University Press.
Paul, R. & Jennings, P.
1992 Phonological behaviour in toddlers with slow expressive language
development. Journal of Speech and Hearing Research, 35, 99-107.
Pouplier, M.
in press Tongue kinematics during utterances elicited with the SLIP tech-
nique. Language and Speech.
Pouplier, M. & L. Goldstein
2005 Asymmetries in the perception of speech production errors. Journal
of Phonetics 33, 47-75.
Prinz, W.
1997 Perception and action planning. European Journal of Cognitive
Psychology 9: 129-154.
Rizzolatti, G., R. Camarda, L. Fogassi, M. Gentilucci, G. Luppino & M. Matelli
1988 Functional organization of inferior area 6 in the macaque monkey: II.
Area F5 and the control of distal movements. Experimental Brain
Research 71: 491-507.
Roark, B. & K. Demuth
2000 Prosodic constraints and the learners environment: A corpus study. In
Proceedings of the 24th Annual Boston University Conference on Lan-
guage Development, S. Catherine Howell, Sarah A. Fish, and Thea
Keith-Lucas (eds.), 597-608. Somerville, MA: Cascadilla Press.
Salidis, J. & J.S. Johnson
1997 The production of minimal words: a longitudinal case study of pho-
nological development. Language Acquisition 6: 1-36.
Saltzman, E. & D. Byrd
2000 Task-dynamics of gestural timing: Phase windows and multifre-
quency rhythms, Human Movement Science, vol. 19, pp.499-526.
Saltzman, E., A. Löfqvist, B. Kay, J. Kinsella-Shaw & P. Rubin
1998 Dynamics of intergestural timing: a perturbation study of lip-larynx
coordination. Experimental Brain Research, 123 (4): 412-424.
Saltzman, E. & K. Munhall
1989 A dynamical approach to gestural patterning in speech production.
Ecological Psychology 1: 333-382.
Stetson, R. H.
1951 Motor Phonetics. Amsterdam: North-Holland
Templin, M.
1957 Certain language skills in children (Monograph Series No. 26). Minnea-
polis, MN: University of Minnesota, The Institute of Child Welfare.
Tuller, B. & J.A.S. Kelso
1991 The Production and Perception of Syllable Structure. Journal of
Speech and Hearing Research, 34: 501-508.
Turvey, M.
1990 Coordination, American Psychologist, vol. 45, 938-953.
Vihman, M.M. & Ferguson, C.A.
1987 The acquisition of final consonants. In: Viks, U. (Ed.), Proceedings
of the Eleventh International Congress of Phonetic Sciences. Tallinn,
Estonia, USSR, pp. 381-384.
Watson, M. M. & G. P. Scukanec
1997 Profiling the phonological abilities of 2-year-olds: A longitudinal
investigation. Child Language Teaching and Therapy, 13, 3-14.


Internal and external influences on child language
productions
Yvan Rose
1. Introduction*
Over the past three decades, statistical approaches have been successfully
used to explain how young language learners discriminate the sounds of
their mother tongue(s), perceive and acquire linguistic categories (e.g. pho-
nemes), and eventually develop their mental lexicon. In brief, input statis-
tics, i.e. the relative frequency of the linguistic units that children are ex-
posed to (e.g. phones, syllable types), appear to provide excellent predictors
in the areas of infant speech perception and processing. This research offers
useful insight into both the nature of the linguistic input that infants attend
to and how they sort out the evidence from that input (see Gerken 2002 for
a recent overview).
Building on this success, a number of linguists have recently proposed
statistical explanations for patterns of phonological productions that were
traditionally accounted for through typological universals, representational
complexity, grammatical constraints and constraint rankings, or lower-level
perceptual and articulatory factors. For example, Levelt, Schiller and Levelt
(1999/2000) have proposed, based on longitudinal data on the acquisition
of Dutch, that the order of acquisition of syllable types (e.g. CV, CVC,
CCV) can be predicted through the relative frequency of occurrence of
these syllable types in the ambient language. Following a similar approach,
Demuth and Johnson (2003) have proposed that a pattern of syllable trunca-
tion resulting in CV forms attested in a learner of French was triggered by
the high frequency of the CV syllable type in this language.
However, important questions need to be addressed before one can con-
clude that statistical approaches, or any mono-dimensional approach based
on a single source of explanation, truly offer strong predictions for devel-
opmental production patterns. For example, one must wonder whether input
statistics, which are mediated through the perceptual system and computed
at the cognitive level, can have such an impact on production, given that
production, itself influenced by the nature of phonological representations,
involves a relatively independent set of cognitive and physiological mecha-
nisms, some of which are presumably independent of statistical processing.
In this paper, I first argue that while statistics of the input may play a
role in explaining some phenomena, they do not make particularly strong
predictions in general, and, furthermore, simply cannot account for many of
the patterns observed in early phonological productions. Using this as a
stepping-stone, I then argue that the study of phonological development,
similar to that of any complex system, requires a multi-dimensional ap-
proach that takes into consideration a relatively large number of factors.
Such factors include perception-related representational issues, physiologi-
cal and motoric aspects of speech articulation, influences coming from
phonological or statistical properties of the target language and, finally, the
child's grammar itself, which is constantly evolving throughout the acquisi-
tion period and, presumably, reacting or adapting itself to some of the limi-
tations that are inherent to the child's immature speech production system. I
conclude from this that any analysis based on a unique dimension, be it
statistical, perceptual or articulatory, among many others, restricts our abil-
ity to explain the emergence of phonological patterning in child language.
To illustrate this argument, I discuss a number of patterns that are well
attested in the acquisition literature. I argue that explanations of these pat-
terns require a consideration of various factors, some grammatical, some
external to the grammar itself.
The paper is organized as follows. In section 2, I discuss the predictions
made by statistical approaches to phonological development, using the
results from Levelt et al. (1999/2000) for exemplification purposes. I then
confront these predictions with those made by more traditional approaches
based on structural complexity and language typology, in section 3. I intro-
duce the approach favoured in this paper in section 4. In section 5, I discuss
a series of examples that provide support for the view that the acquisition of
phonology involves a complex system whose sub-components may interact
in intricate ways. I conclude with a brief discussion in section 6.
2. Statistical approaches to phonological productions: an example
Statistical approaches, when used to account for production patterns, make
three main predictions, listed in (1). All other things being equal, they pre-
dict that the most frequent units found in the ambient language should ap-
pear first in the child's speech. As opposed to this, the least frequent units
Internal and external influences on child language productions 331

should appear last. Finally, units of equivalent frequency are predicted to
emerge during the same acquisition period (itself determined through rela-
tive frequency) but to display variation in their relative order of appearance.

(1) Statistical approaches to phonological development: predictions
a. Frequent units: acquired early
b. Infrequent units: acquired late
c. Units with similar frequencies: variable orders of acquisition
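The predictions in (1) can be encoded in a few lines. The sketch below is purely illustrative: the staging function, the frequency figures and the similarity tolerance are toy assumptions of mine, not part of any proposal in the literature discussed here.

```python
# Toy encoding of predictions (1a-c): units are ordered by ambient
# frequency, and units whose frequencies fall within a tolerance of one
# another share a stage, with a variable internal order (1c).
def predicted_stages(freqs, tolerance=0.5):
    # Most frequent first (1a), least frequent last (1b).
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    stages = [[ranked[0]]]
    for unit in ranked[1:]:
        if freqs[stages[-1][0]] - freqs[unit] <= tolerance:
            stages[-1].append(unit)  # similar frequency: same stage
        else:
            stages.append([unit])    # clearly less frequent: later stage
    return stages

# Invented frequency values (percentages), for illustration only.
freqs = {"CV": 35.0, "CVC": 30.0, "VC": 8.0, "V": 7.8, "CCVCC": 1.0}
print(predicted_stages(freqs))
# [['CV'], ['CVC'], ['VC', 'V'], ['CCVCC']]
```

On this encoding, VC and V land in one stage and may thus emerge in either order, while the stages themselves are strictly ordered by frequency.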

A clear illustration of the predictions made by the statistical approach
comes from Levelt et al. (1999/2000), who conducted a study of the acqui-
sition of syllable types by twelve monolingual Dutch-learning children.
Their main observations are schematized in (2). As we can see, all learners'
first utterances were restricted to the four types of syllables that are the
least complex (CV, CVC, V, VC). Following this, the learners took one of
two different paths, defining groups A (nine children) and B (three chil-
dren). During this second phase, the groups either acquired pre-vocalic
clusters before post-vocalic ones (CCV > VCC) or vice versa (VCC >
CCV). Finally, all learners acquired the more complex CCVCC syllable
towards the end of the acquisition period.

(2) Acquisition of syllable types in Dutch (Levelt et al. 1999/2000)
Initial stage (all learners): CV > CVC > V > VC
Group A: CVCC > VCC > CCV > CCVC
Group B: CCV > CCVC > CVCC > VCC
Final stage (all learners): CCVCC

We can see in (3) that the four syllable types acquired early (in (2)) are also
the most frequently occurring ones in Dutch. The following four types,
which distinguish the two groups of learners in (2), display relatively simi-
lar frequencies of occurrence in the language. Finally, the last syllable type
acquired by all children (CCVCC) is also the one that occurs with the low-
est frequency in the language.

(3) Frequency of syllable types in Dutch (Levelt et al. 1999/2000)
CV > CVC > VC > V > {CVCC, CCVC, CCV, VCC} > CCVCC

The correlation between the relative frequency of syllable types in Dutch
and their order of acquisition thus seems to provide support for Levelt et
al.'s suggestion that the emergence of production patterns in child language
can be predicted through input statistics. For example, both orders of ap-
pearance and the variability that we observe between groups A and B seem
to correspond to the statistical facts observed. In the next section, however,
I introduce an alternative perspective on these same data.
3. Statistical frequency or representational complexity?
In light of the above illustration, one could be tempted to extend the statis-
tical approach to a larger set of phenomena observed in child language. For
example, we could hypothesise that the development of syllable structure in
a given language is essentially governed by input statistics. However, im-
portant issues remain to be addressed before we can jump to such a conclu-
sion and favour the statistical approach over more traditional ones. Such
approaches have indeed been successful at accounting for various phenom-
ena in child language, for example the acquisition of multi-syllabic word
shapes (and related truncation patterns), or that of syllable structure (e.g.
Ferguson and Farwell 1975, Fikkert 1994, Demuth 1995, Freitas 1997,
Pater 1997, Rose 2000).
As was noted in the preceding section, the rate of acquisition of a given
structure may be correlated with its frequency of occurrence in the target
language. In contrast to this, an approach based on representational com-
plexity predicts that the phonologically simplest units (e.g. singleton on-
sets) should be acquired before more complex units (e.g. complex onsets).
However, in the case at hand (as well as, presumably, in most of the litera-
ture on the development of syllable structure), both the frequency-based
and the complexity-based approaches make essentially identical predic-
tions, because of the fact that, as far as syllable types are concerned, the
most frequent also tend to be the simplest ones. This is certainly the case in
Dutch where we can see that the four syllable types that were acquired first
by all children in (2) are the ones that are the most frequent in (3) and also
those that arguably show no complexity in their internal constituents.¹ From
this perspective, we are at best witnessing a tie between the two approaches
under scrutiny.
However, a further look at the data that enable a distinction in learning
paths between groups A and B in (2) actually raises doubts on the predic-
tive power of the statistical approach. Indeed, if we consider only the ac-
quisition order of the four syllable types that differentiate the two groups of
learners, which are deemed to have equivalent frequency values in the tar-
get language, the statistical approach predicts a total of 24 possibilities
(4! = 4 × 3 × 2 × 1 = 24). Yet only two of these 24 potential learning paths are attested
in the data, despite the fact that twelve children were included in the study.
While one may be tempted to blame the relatively small population investi-
gated for this, it is important to note that the two sequences attested corre-
spond exactly to those that an approach based on phonological complexity
would predict. Indeed, as mentioned above, the learners from group A ac-
quired post-vocalic consonant clusters before complex onsets ([CVCC >
VCC] >> [CCV > CCVC]), while the learners from group B followed the
opposite path and acquired complex onsets before post-vocalic clusters
([CCV > CCVC] >> [CVCC > VCC]). However, none of the potential
paths intertwining pre-vocalic clusters with post-vocalic ones is attested.

(4) Unattested patterns
a. *CCV > CVCC > CCVC > VCC
b. *VCC > CCV > CVCC > CCVC
c. *

Under the assumption that the representations of only two units have in fact
been acquired (those for pre-vocalic versus post-vocalic clusters), but at
different times, these data would suggest that a complexity-based approach
enables both an accurate description of the data and an explanation for the
non-attested acquisition paths. In contrast, the statistical approach over-
generates; it predicts many more learning paths than the ones attested.
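The over-generation point can be checked by brute enumeration. The sketch below is my own illustration (the `unmixed` test is simply one way of encoding the complexity-based ban on intertwining the two cluster types): frequency alone allows 24 learning paths, only 8 of these keep pre- and post-vocalic clusters separate, and the 2 attested paths fall within that subset.

```python
from itertools import permutations

# The four syllable types of near-equivalent frequency in Dutch.
types = ["CVCC", "VCC", "CCV", "CCVC"]
post_vocalic = {"CVCC", "VCC"}  # types involving a post-vocalic cluster

# Frequency alone permits any ordering of the four types: 4! = 24 paths.
all_paths = list(permutations(types))

def unmixed(path):
    # True if one cluster type is acquired completely before the other,
    # i.e. pre- and post-vocalic clusters are never intertwined.
    labels = ["post" if t in post_vocalic else "pre" for t in path]
    return labels in (["post", "post", "pre", "pre"],
                      ["pre", "pre", "post", "post"])

coherent = [p for p in all_paths if unmixed(p)]
attested = [("CVCC", "VCC", "CCV", "CCVC"),   # group A
            ("CCV", "CCVC", "CVCC", "VCC")]   # group B

print(len(all_paths))  # 24 paths permitted by frequency alone
print(len(coherent))   # 8 paths that keep the two cluster types unmixed
print(len(attested))   # 2 paths actually observed, both unmixed
```

Even the weaker contiguity requirement rules out two thirds of the statistically possible paths; the fixed internal order within each cluster type then narrows the 8 down to the 2 observed.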
As rightly pointed out by an anonymous reviewer, if only two units
(representations for pre- and post-vocalic clusters) need to be acquired by
the children, then the syllable types containing a single new unit (e.g.
CVCC and VCC, both of which show a post-vocalic cluster), should be
acquired during the same developmental stage (see also Fikkert 1994 and
Pan and Snyder 2003 for related discussions). While the data description
provided by Levelt et al. (1999/2000) does not enable a complete verifica-
tion of this prediction, it certainly points in its direction. Three data points
are discussed by Levelt et al., namely after the first, third, and sixth re-
cording sessions. I address each of these data points in the following para-
graphs.
After the first recording session, while most (eight of the twelve) chil-
dren systematically failed to produce pre- or post-vocalic clusters, child
David had CVCC but not VCC, Catootje had CVCC, VCC, CCV but not
CCVC, Enzo had CCV, CCVC, CVCC but not VCC, while Leon had the
four syllable types with complex constituents, and only lacked CCVCC (the
type also missing from all of the other children's productions). While these
results are relatively mixed, the productions (or absence thereof) from the
first eight children fully support the current hypothesis, since they display
no unsystematic gaps. Also, given that the data were naturalistically re-
corded, the few apparently unsystematic gaps in the other four children's
productions (e.g. the fact that both David and Enzo displayed CVCC but
lacked VCC) may have occurred simply because the children did not at-
tempt a particular syllable type. It is indeed likely that the sample available
in the corpus underestimates the children's true phonological abilities, since
the non-occurrence of a given syllable type may simply be an artefact of
data sampling, especially for the rarely occurring types in the language.
This conjecture is in fact supported by Levelt et al. (1999/2000:259), who
show that VCC displays the second lowest frequency of all syllable types in
Dutch, with a frequency value (1.03) only slightly above that of
the CCVCC type (0.97). As opposed to this, the CVCC type shows a much
higher relative frequency, at 5.51. Given these figures, we can hypothesize
that both VCC and CCVCC syllable types were very seldom attempted by
Dutch-learning children. This empirical issue suggests that an approach
considering attempted syllables, in addition to the attested ones, should
have been favoured (see Pan and Snyder 2003 for further discussion).
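The sampling-artefact conjecture is easy to quantify. Treating Levelt et al.'s frequency values as percentages of syllable tokens, and assuming for illustration that a child's recorded productions amount to 200 independent draws at those ambient rates (the sample size is hypothetical, as is the independence assumption), the probability that a type never surfaces is (1 − p)ⁿ:

```python
# P(type absent from a naturalistic sample) under independent draws at
# ambient rates. Frequency values from Levelt et al. (1999/2000), read
# as percentages; the 200-token sample size is purely illustrative.
rates = {"VCC": 0.0103, "CCVCC": 0.0097, "CVCC": 0.0551}
n_tokens = 200
p_absent = {syll: (1 - p) ** n_tokens for syll, p in rates.items()}
for syll, p in p_absent.items():
    print(f"{syll}: P(absent) = {p:.3f}")
```

On these assumptions, VCC and CCVCC each go unattested in roughly 13–14% of such samples, whereas CVCC is virtually guaranteed to appear, which is consistent with the claim that gaps involving the rare types may simply be sampling accidents.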
At the second data point, six children still had no complex constituents.
One child, Tirza, had post-vocalic but no pre-vocalic clusters. Three chil-
dren (David, Catootje and Leon) had both pre- and post-vocalic clusters but
no CCVCC syllables, while child Eva had CVCC but not VCC. Finally,
Enzo lacked VCC syllables yet displayed CVCC and CCVCC. Similar
to the first data point, the apparently unsystematic gaps again come from
the rarely occurring (and presumably rarely attempted) VCC and CCVCC
syllable types. Aside from this issue, the patterns from this second data
point reveal generally systematic behaviours, if taken from a representa-
tional complexity perspective.
This latter observation is further reinforced by the third sample, where
nine children (those from group A in (2)) show either post-vocalic or both
pre- and post-vocalic clusters. Also, the rarely occurring CCVCC syllable
type is only attested in the productions of children who independently dis-
played both clusters. Finally, of the three children from group B, two dis-
play pre-vocalic but no post-vocalic clusters, while the last one has the
CCV but not the CCVC syllable type. This gap is the only one left unex-
plained by the complexity approach, though again we have no means to verify
whether that syllable type was even attempted by the child.
We can see from the above discussion that the vast majority of the ob-
servations lend support to an approach based on representational complex-
ity, especially if one considers the possibility that the absence of a given
cluster may be attributed to the fact that it was not attempted. Put in the larger
context of linguistic universals, the representational approach advocated
here also finds independent motivation in factorial typology. As reported by
Blevins (1995), word-initial and word-final consonant clusters pattern in
independent ways across languages. We can see in (5) that genetically un-
related languages such as Finnish (Finno-Ugric) and Klamath (Plateau Pe-
nutian) allow for post-vocalic but not pre-vocalic consonant clusters. As
opposed to these, languages such as Mazateco (Oto-Manguean) and Se-
dang (North Bahnaric) allow for pre-vocalic clusters but ban post-vocalic
ones.

(5) CC clusters across languages (Blevins 1995)
a. Finnish, Klamath: CVCC but not *CCV
b. Mazateco, Sedang: CCV but not *CVCC

An analysis of the distribution of these clusters requires a formal distinction
between the two cluster types (pre- and post-vocalic), such that complexity
can be allowed in one independently of the other. Under the view that chil-
dren's grammars are not fundamentally different from those of adults (e.g.
Pinker 1984, Goad 2000, Inkelas and Rose 2008), children can acquire
these clusters in various orders. Also, as predicted by an approach based on
phonological complexity (as opposed to frequency), discontinuous learning
paths such as the unattested ones in (4) should generally not occur.²

Finally, when we consider the issue of the predictive power of the statis-
tical approach from a larger perspective, other questions arise as well. Child
phonological patterns often have no direct correlates with the target lan-
guages being acquired (e.g. Bernhardt and Stemberger 1998). These emer-
gent patterns include, among many others, consonant harmony (e.g. gâteau
'cake' [gato] → [tato]; Smith 1973, Goad 1997, Pater 1997, Rose 2000, dos
Santos 2007), velar fronting (e.g. go → [do]; Chiat 1983, Stoel-Gammon
1996, Inkelas and Rose 2008), segmental substitutions (e.g. vinger 'finger'
["vINr] → ["sIN]; Levelt 1994, Dunphy 2006), consonant cluster reductions
(e.g. brosse 'brush' → [bɔs]; Fikkert 1994, Freitas 1997, Rose 2000), sylla-
ble truncations (e.g. banana → [bana]; Ferguson and Farwell 1975, Fikkert
1994, Pater 1997) and syllable reduplication (e.g. encore 'again' → [kɔkɔ];
Rose 2000). Because of their emerging nature, these processes cannot be
predicted from the kind of statistical tendencies that would enable one to
distinguish either languages or language learners from one another. While a
certain relationship obviously exists between the manifestation of these
processes and the sound patterns that compose the target language, this
relationship typically relates to phonological or lower-level articulatory
aspects of child language development, not statistics. Furthermore, the oc-
currence of a given process seems to be randomly distributed among the
population of learners (e.g. Smit 1993). Despite some implicational rela-
tionships which have been argued for in the acquisition literature (e.g.
Gierut and OConnor 2002), no one can predict, given any population of
learners, which children will or will not display a given process. Therefore,
no direct relationship seemingly exists between emergent processes and the
statistical properties of target languages. Note however that this claim does
not rule out the possibility that specific statistics of the target language
affect the actual manifestation of a process. As I will discuss further below,
it is logical to think that a child may select a given segment or articulator as
default because of its high frequency in the language.
When taken together, the observations above suggest that while statis-
tics of the input should not be dismissed entirely, they should only be taken
as one of several factors influencing phonological productions. In the next
section, I discuss a number of additional factors, all of which should also be
considered.
4. A more encompassing proposal
In order to provide satisfactory explanations for the patterns observed in
child language, I argue that one needs to consider the two general types of
factors listed in (6), which may either manifest themselves independently or
interact with one another in more or less complex ways in child phonologi-
cal productions.

(6) Factors influencing child language phonological productions
a. Grammatical (internal)
b. Non-grammatical (external)

Approaching child language through (6a) is by no means a novel idea. It
has been pervasive in the acquisition literature since the 1970s (see Bern-
hardt and Stemberger 1998 for a comprehensive survey) and in works on
learnability (e.g. Dresher and van der Hulst 1995). However, in contrast to
most grammatical analyses proposed in the literature, I propose to bring the
study of early productions into a broader perspective, one that extends be-
yond grammatical considerations and incorporates factors that relate to
perception, physiology, articulation as well as statistics, to name a few (see
also Inkelas and Rose 2008, and Fikkert and Levelt, in press).
In the next section, I discuss a series of phonological patterns observed
in child language, some of which have been discussed extensively in the
literature, often because of the analytical challenges they offer. I argue that
each of these patterns lends support to the multi-dimensional approach
advocated in this paper.
5. The multiple sources of phonological patterning in child language
I begin the discussion with the process of positional velar fronting, in 5.1,
which highlights the interaction between grammatical and articulatory fac-
tors. In section 5.2, I discuss in turn a number of patterns that have been
described as opaque chain shifts in the literature. I argue that these patterns
are in fact opaque in appearance only. I propose that they are entirely pre-
dictable within a transparent grammatical system once we take into account
the possible impacts of non-grammatical factors. Following a similar rea-
soning, I discuss, in section 5.3, a potential interaction between articulatory
and statistically-induced pressures on the emergence of consonant harmony
in a Dutch learners productions. Finally, in 5.4, I briefly highlight observa-
tions that would be difficult to explain through lower-level (e.g. articulatory
or perceptual) influences. All of these observations point to strong gram-
matical influences on child language development.
Because of space limitations, more comprehensive accounts than the
ones sketched below would ideally be required, as would a consideration of
issues such as variation, both within and across language learners. My aim
is thus limited here to suggesting what I consider to be a sensible approach
to the data, leaving the fine details of analysis for future work.


5.1. Grammatically-induced systematic mispronunciations
Velar fronting consists of the pronunciation of target velar consonants as
coronal (e.g. go → [do]). What is peculiar about this process is that when it
does not apply to all target velars, it affects velars in prosodically strong
positions (e.g. in word-initial position or in word-medial onsets of stressed
syllables; see (7a)) without affecting velars in weak positions (e.g. medial
onsets of unstressed syllables, codas; see (7b)) (e.g. Chiat 1983, Stoel-
Gammon 1996, Inkelas and Rose 2008).

(7) Positional velar fronting (data from Inkelas and Rose 2008)
a. Prosodically strong onsets
[tp] cup 1;09.23
[do] go 1;10.01
[dn] again 1;10.25
[hksdn] hexagon 2;02.22
b. Prosodically weak onsets; codas
[mki] monkey 1;08.10
[bejgu] bagel 1;09.23
[bk] book 1;07.22
[pdjk] padlock 2;04.09

As discussed by Inkelas and Rose (2008), positional velar fronting is, on
the face of it, theoretically unexpected, because positional neutralization in
phonology generally occurs in prosodically weak, rather than strong, posi-
tions. Taking this issue as their starting point, Inkelas and Rose offer an
explanation that incorporates both an articulatory and a grammatical com-
ponent. The articulatory component of their explanation relates to the fact
that young children are equipped with a vocal tract that is different in many
respects from that of an adult, as illustrated in (8).

(8) Child vocal tract

Source: http://www.ling.upenn.edu/courses/Fall_2003/ling001/infant.gif

Inkelas and Rose emphasize the facts that (a) the hard palate of children is
proportionally shorter than that of adults and (b) the tongue is proportion-
ally larger and its mass is located in a more frontal area of the vocal tract.
Adult vocal tract shapes and proportions are attained between six and ten
years of age (e.g. Kent and Miolo 1995, Mnard 2002). In addition, young
children do not possess the motor control abilities that adult speakers gen-
erally take for granted (e.g. Studdert-Kennedy and Goodell 1993).
These differences in vocal tract shape and control, Inkelas and Rose ar-
gue, are not without consequences for the analysis of early phonological
productions. Certain sounds and sound combinations are inherently more
difficult to produce for children than for adults. This is particularly evident
in the acquisition of phonological contrasts that involve lingual articula-
tions. For example, in languages like English in which we find a contrast
between /s/ and /θ/ (e.g. sick /sɪk/ ~ thick /θɪk/), this contrast is often ac-
quired late (e.g. Smit 1993, Bernhardt and Stemberger 1998). In addition, it
is often the case that young children across languages show frontal lisp-like
effects (e.g. /s/ → [θ]). The relative size and frontness of the tongue body,
compounded by an imperfect control of motor abilities, may both be at least
partly responsible for the emergence of this phenomenon.
Coming back to positional velar fronting, Inkelas and Rose further argue
that the positional nature of this phenomenon is not simply the result of
articulatory pressures; it also has a significant grammatical component. It is
well known that speech articulations are more emphasized in prosodically
strong positions such as word-initial or stressed syllables (e.g. Fougeron
and Keating 1996). It is also well known that children's developing gram-
mars are particularly sensitive to the prosodic properties of their target lan-
guage (e.g. Gerken 2002). It follows from this that children should be faith-
ful to the strengthening of speech articulations in prosodically strong posi-
tions that they identify in the adult language. Building on these observa-
tions, Inkelas and Rose propose that the children who display positional
velar fronting are in fact attempting to produce these stronger articulations
in prosodically strong contexts. However, because of the articulatory fac-
tors listed above, the strengthening of their target velars results in an articu-
lation that extends too far forward, into the coronal area of the hard palate,
yielding the fronted velars on the surface. Inkelas and Rose's argument for
the grammatical conditioning of positional velar fronting is further sup-
ported through another process observed in the same learner, that of posi-
tional lateral gliding, which takes place following the same strong/weak
dichotomy of contexts as velar fronting even though the articulatory under-
pinnings of gliding are completely independent from those of fronting. In
both cases, the child is cosmetically unfaithful to target segments yet
abides by strong requirements of the target grammar. This explanation has
the advantage over previous analyses of reconciling the positional velar
fronting facts with phonological theory, especially given that articulatory
strengthening should occur in prosodically strong, not weak, positions.³ In
the context of the current argument, it also provides a clear case where non-
grammatical, articulatory factors can interact with developing grammatical
systems to yield the emergence of systematic patterning in child language.
In the next section, I address other patterns which may look suspicious
from a grammatical perspective, as they suggest opacity effects in child
grammars. I argue that once they are considered in their larger context,
these apparently opaque processes can be explained in transparent ways.


5.2. Apparent chain shifts
A number of child phonological patterns that take the shape of so-called
chain shifts have been considered cases of grammatical opacity in the lit-
erature, thereby posing theoretical and learnability problems (e.g. Smith
1973, Smolensky 1996, Bernhardt and Stemberger 1998, Hale and Reiss
1998, Dinnsen 2008). In line with Hale and Reiss's (1998) suggestion that
(apparent) chain shifts are not a problem for theories that consider both
competence and performance, I argue that these patterns can in fact be seen
as entirely transparent if one incorporates factors pertaining to speech per-
ception and/or articulation into the analysis.

Consider first the data in (9). As we can see in (9a), the child produces
the target consonant /z/ as [d] in words like puzzle. This process of stop-
ping, often observed in child language data (e.g. Bernhardt and Stemberger
1998), may by itself be related to articulatory or motor factors such as the
ones listed in the preceding section. However, as we can see in (9b), target
/d/ is itself pronounced as [g] in words like puddle.

(9) Chain shift (data from Amahl; Smith 1973)
a. puzzle /pʌzl/ → [pʌdəl] (/z/ → [d])
b. puddle /pʌdl/ → [pʌgəl] (/d/ → [g]; *[d])

If the child were grammatically able to produce [d] in puzzle, why is it that
he could not produce this consonant in puddle? Schematically, if A → B, then
why B → C (and not *B → B)? This apparent paradox, previously discussed by
Macken (1980), reveals the importance of another non-grammatical factor,
that of perception, which may have indirect impacts, through erroneous
lexical representations, on the child's speech productions. As Macken ar-
gues, the child, influenced by the velarity of word-final [ɫ], perceived the
/d/ preceding it in puddle as a velar consonant (/g/). Because of this faulty
perception, he built a lexical representation for puddle with a word-medial
/g/. The production in (9b) thus results from a non-grammatical, perceptual
artefact which, itself, contributes to the emergence of a paradoxical produc-
tion pattern. The paradox is only apparent, however; it is not inherent to the
grammar itself.⁴

Another possibility for chain shifts emerges when both perceptual and
articulatory factors conspire to yield phenomena that should be unexpected,
at least from a strict grammatical perspective. An example of this, also
from Smith (1973) is provided in (10) (see also Smolensky 1996 and Hale
& Reiss 1998 for further discussion of this case). As we can see, /T/ is real-
ized as [f] (in (10a)), even though it is used as a substitute for target /s/ (in
(10b)).

(10) Circular chain shift (data from Amahl; Smith 1973)
a. /θ/ → [f] (thick /θɪk/ → [fɪk])
b. /s/ → [θ] (sick /sɪk/ → [θɪk])

Again here, why can the child not realize target /θ/ as such if [θ] is other-
wise possible in output forms (from target /s/)? Consistent with the current
approach, I argue that patterns such as the one in (10) should simply not be
considered for grammatical analysis, because they arise from a conspiracy of
independent factors, namely perception, which affects the building of lexi-
cal representations, and articulation, which yields surface artefacts in output
forms. First, the realization of /θ/ as [f] can arise from a perceptual problem
caused by the phonetic similarity between these two segments. Indeed, the
contrast between these two sounds is often neutralized by both first and
second language learners, who tend to realize both consonants as [f] (e.g.
Levitt, Jusczyk, Murray and Carden 1987, Brannen 2002). This phenome-
non is peculiar because it involves consonants with different places of ar-
ticulation. However, since /f/ and /θ/ are acoustically extremely similar
(e.g. Levitt et al. 1987), the merger is not surprising: if the contrast cannot
be perceived by the learner, it cannot be represented at the lexical level and,
consequently, cannot be reproduced in production. Coming back to the
examples in (10), the child thus perceives /θ/ as [f] and, consequently, lexi-
cally encodes a target word such as thick with a word-initial /f/ (/fɪk/). This
enables an account of the assimilation observed in (10a). Second, if the
same child has not yet mastered the precise articulation required for the
production of /s/, which is realized as [θ] for reasons such as the ones men-
tioned in section 5.1, we obtain the second element of the apparent chain
shift in (10b).
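The transparency claim can be made concrete with a toy two-module model: one mapping that (mis)perceives segments when lexical representations are built, and one that applies articulatory substitutions at output. Both mappings below are minimal illustrations of my own, not Smith's (1973) full data.

```python
# Two independent, individually transparent mappings whose composition
# mimics a chain shift. Segments are single characters for simplicity.
perceive = {"θ": "f"}    # misperception at lexical encoding: thick stored as /fɪk/
articulate = {"s": "θ"}  # lisping at production: /s/ surfaces as [θ]

def produce(target):
    # Stage 1: build the (possibly faulty) lexical representation.
    lexical = "".join(perceive.get(seg, seg) for seg in target)
    # Stage 2: apply articulatory substitutions to the stored form.
    return "".join(articulate.get(seg, seg) for seg in lexical)

print(produce("θɪk"))  # thick: θ → f at perception, hence [fɪk]
print(produce("sɪk"))  # sick: s → θ at articulation, hence [θɪk]
```

Neither module maps θ to f and s to θ at the same time; the apparent chain /θ/ → [f], /s/ → [θ] emerges only from their composition, which is precisely the point made in the text above.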
The examples discussed thus far highlight ways in which phonetic con-
siderations may affect the child's analysis of the ambient language, for
example by imposing perceptually driven biases on lexical representations
or articulatorily induced artefacts on speech production. Building on this
argument, Hale and Reiss (1998) would further suggest, quite controver-
sially, that examples such as this one basically discredit the study of child
language phonology from a production perspective. I argue that Hale and
Reiss are in fact making a move that is tantamount to throwing the baby out
with the bath water. Contra Hale and Reiss, and in line with most of the
researchers in the field of language development, I support the claim that
the child's developing grammatical system plays a central role in the pro-
duction patterns observed, with the implication that productions are worthy
of investigation in our quest to unveil the grammatical underpinnings of
child language development. This position is further substantiated in the
next two sections, where I discuss examples of processes that reveal more
abstract aspects of phonological (grammatical) processing.


5.3. Interaction between cognitive and articulatory factors
Despite the criticisms formulated against statistical approaches in section 3,
I reiterate that the argument of this paper is not about rejecting statistical
influences altogether, but rather about incorporating them into the larger picture
of what factors can influence grammatical development. This is especially
true in cases where a given unit (e.g. sound, syllable type) can be singled
out as statistically prominent in the ambient language and thus selected by
the learner's grammar as representing a default value. As discussed in sec-
tion 3, if this default option correlates with articulatory simplicity, then
there is no easy way to firmly conclude which factor (statistical or articula-
tory) is the determining one. However, if the default option from a statisti-
cal perspective does not correlate with articulatory simplicity, then we
should expect children to display variation between the two alterna-
tives. In this section, I discuss patterns of segmental substitution attested in
the productions of Jarmo, a young learner of Dutch. We will see that when
confronted with a sound class that he cannot produce, Jarmo opts for vari-
ous production strategies, which themselves suggest a number of influences
on his developing grammar.
As Dunphy (2006) reports, Jarmo displays difficulties with the produc-
tion of labial continuants (e.g. /f, v, ʋ, w/) in onsets. However, instead of
producing these consonants as stops, a strategy that would appear to repre-
sent the simplest solution, his two most prominent production patterns con-
sist of either substituting labial continuants by coronals or debuccalizing
these consonants through the removal of their supralaryngeal articulator.
Stopping occurs but is only the third preferred strategy, as evidenced by the
breakdown in (11).

(11) Realization of labial continuants in onsets (Dunphy 2006)
Attempted forms: 229
Target-like: 44 (19%)
Coronal substitution: 98 (43%)
Debuccalization: 34 (15%)
Stopping: 22 (10%)
Velar substitution: 11 (5%)
Other: 19 (8%)

The two main strategies, coronal substitution and consonant debuccaliza-
tion, are exemplified in (12a) and (12b), respectively.
(12) Examples of substitution strategies for labial continuants
a. Coronal substitution
vis ["vIs] → ["siS]
fiets ["fits] → ["tIt]
vinger ["vINr] → ["sIN]
b. Debuccalization
visje [v] → [isj]
willy [li] → [hili]
fiets [fits] → [i]

In the face of these data, we must find out why the child favoured two
strategies affecting the major place of articulation of the target consonants.
It is also necessary to determine whether there is a formal relationship be-
tween coronals and laryngeals in the child's grammar, given that both of
them act as favoured substitutes for target labial continuants.
First, the distribution of coronals in Dutch (as well as in many of the
world's languages; see contributions to Paradis and Prunet 1991) provides
support for the hypothesis that the child can analyze them as default (statis-
tically unmarked) consonants in the language. Indeed, coronals account for
55% of all onset consonants and 65% of all coda consonants in spoken
Dutch (van de Weijer 1999). In addition, from the perspective of syllable
structure, coronals are the only consonants that can occupy appendix posi-
tions in Dutch (see, e.g. Fikkert 1994 and Booij 1999 for summaries of the
research on syllable structure in Dutch). From both statistical and distribu-
tional perspectives, coronals can thus appear to the learner as having a spe-
cial, privileged status. Second, laryngeals are considered to be the simplest
consonants from an articulatory perspective by many phonologists and
phoneticians (e.g. Clements 1985). Indeed, these consonants do not involve
any articulation in the supralaryngeal region of the vocal tract. Both cor-
onals and laryngeals thus offer the child good alternatives, which manifest
themselves in output forms.


5.4. Grammatical influences
Finally, the argument presented above would not be complete without a
discussion of influences on the child's productions that seem to be inherent
to the grammatical system itself. Despite perceptual and articulatory effects
such as the ones discussed in the preceding sections, several facts docu-
mented in the literature on phonological development strongly suggest the
presence of general grammatical principles whose effects can be observed
independently in language typology, as already discussed in section 3. For
example, while various combinations of perceptual and articulatory factors
Internal and external influences on child language productions 345

should yield fairly extensive variation between learners, even within the
same target language, it is generally noted that variation is in fact fairly
restricted. Also, several works attribute some of the variability observed
between learners to differences between individual rates of acquisition
rather than actual discrepancies in grammatical analyses once the target
phonological structure is mastered by the learners (e.g. Fikkert 1994, Levelt
1994, Freitas 1997, Goad and Rose 2004).
In addition, relationships between various levels of phonological repre-
sentation, for example, the role of prosodic domains such as the stress foot,
the syllable, or syllable sub-constituents in segmental patterning, all point
towards clear grammatical influences over child language productions (e.g.
contributions to Goad and Rose 2003; see also section 5.1 above).
Note also that in the vast majority of the cases documented in the litera-
ture, the emerging properties of child language are grammatically similar to
those of adult languages. There are also strong reasons to believe that ap-
parent counter-examples to this generalization are in fact cosmetic rather
than reflective of truly unprincipled grammatical patterns (e.g. Inkelas and
Rose 2008), in the sense that these counter-examples derive from non-
grammatical factors such as those discussed in the above subsections. In-
deed, we can generally account for sound patterns in child language using
theories elaborated on the basis of adult languages. This in itself implies a
strong correspondence between the formal properties of developing gram-
mars and those of end-state (adult) systems. This correspondence in turn
reveals a set of grammatical principles that should be considered in analy-
ses of child language productions. In this regard, it is also important to
highlight the fact that most of the analyses proposed in the literature on
phonological development require a certain degree of abstraction, one that
extends beyond perception- or articulation-related issues such as the ones
noted in preceding sub-sections.
While more observations should be added to this brief survey, we can
reasonably conclude that despite the fact that child language is subject to
non-grammatical influences, its careful study reveals a wealth of sys-
tematic properties. In turn, these properties can be used to formally charac-
terize the stages that the child proceeds through while acquiring his/her
target grammar(s).

6. Discussion
In this paper, I have discussed phonological patterns that offer strong em-
pirical arguments against mono-dimensional approaches to phonologi-
cal development, be they based solely on statistical, phonetic or grammati-
cal considerations. I argued that an understanding of many developmental
patterns of phonological production requires a multi-dimensional approach
incorporating, among others, perceptual factors that can affect the elabora-
tion of lexical representations, articulatory factors that can prevent the re-
alization of certain sounds, as well as the phonological properties of the
target language itself (e.g. phonological and phonetic inventories, distribu-
tions and statistics; prosodic properties). A consideration of these factors
offers many advantages, including both the avoidance of unnecessary ana-
lytical issues imposed by true grammatical opacity and, crucially, the ex-
planatory power of the more transparent analyses proposed.
As in all multi-factorial approaches, one of the main challenges lies in
the determination of what factors are involved and of how these factors
interact to yield the outcomes observed in the data. For example, one im-
portant issue that was left open in this paper concerns the fact that while
statistics of the input seem to play a central role in infant speech perception,
such statistics appear to be only one of the many factors underlying patterns
observed in speech production. The relationship between perception and
production thus remains one that warrants further research. In order to
tackle this issue, we should favour strong empirical, cross-linguistic inves-
tigations within which all of the languages involved would be compared on
the basis of their distinctive linguistic properties. By combining the results
obtained through such investigations with those from research on speech
perception and articulation by children, we should be in a better position to
improve our understanding of phonological development, from the earliest
months of life through the most advanced stages of attainment.


Notes

* Earlier versions of this work were presented during a colloquium presentation
at the Universidade de Lisboa (May 2005), during the Phonological Systems
and Complex Adaptive Systems Workshop at the Laboratoire Dynamique du
Langage, Université Lumière Lyon 2 (July 2005) and at the 2006 Annual Con-
gress of the Canadian Linguistic Association. I am grateful to all of the partici-
pants to these events for enlightening discussions, especially Peter Avery, Abigail
Cohn, Christophe Coupé, Elan Dresher, Maria João Freitas, Sónia Frota, Sophie
Kern, Alexei Kochetov, Ian Maddieson, Egidio Marsico, Noël Nguyen, Fran-
çois Pellegrino, Christophe dos Santos and Marina Vigário. I would also like to
thank one anonymous reviewer for useful comments and suggestions. Of course
all remaining errors or omissions are my own.
1. One could argue that post-vocalic consonants in VC and CVC forms involve
complexity at the level of the rhyme constituent. This position is however con-
troversial; several authors have in fact noted asymmetrical behaviours in the
development of word-final consonants and argued that these consonants cannot
always be analyzed as true codas (rhymal dependents) in early phonologies and
should be considered as onsets of empty-headed syllables (e.g. Rose 2000, 2003,
Barlow 2003, Goad and Brannen 2003).
2. Of course, one should not rule out the possibility that a regression in the acqui-
sition of consonant clusters yields one of the patterns in (4). The presumption
here is that such regressions are unlikely to occur, especially in typically devel-
oping children (e.g. Bernhardt and Stemberger 1998).
3. As correctly noted by an anonymous reviewer, it is not clear whether the child
analyses the strong and weak velars as allophones or separate phonemes. This
issue is however tangential to the analysis proposed.
4. An anonymous reviewer notes that there may be perceptual or articulatory fac-
tors involved in the pronunciation of /z/ as [d]. This point reinforces the argu-
ment of this paper about the need to entertain several potential factors in the
analysis of child phonological data.
References
Barlow, Jessica
2003 Asymmetries in the Acquisition of Consonant Clusters in Spanish.
Canadian Journal of Linguistics 48(3/4):179-210.
Bernhardt, Barbara and Joseph Stemberger
1998 Handbook of Phonological Development from the Perspective of Con-
straint-Based Nonlinear Phonology. San Diego: Academic Press.
Blevins, Juliette
1995 The Syllable in Phonological Theory. In The Handbook of Phono-
logical Theory, John A. Goldsmith (ed.). Cambridge, MA: Black-
well. 206-244.
Booij, Geert
1999 The Phonology of Dutch. Oxford: Oxford University Press.
Brannen, Kathleen
2002 The Role of Perception in Differential Substitution. Canadian
Journal of Linguistics 47(1/2):1-46.
Chiat, Shulamuth
1983 Why Mikey's Right and My Key's Wrong: The Significance of
Stress and Word Boundaries in a Child's Output System. Cognition
14:275-300.
Clements, George N.
1985 The Geometry of Phonological Features. Phonology 2:225-252.
Demuth, Katherine
1995 Markedness and the Development of Prosodic Structure. In Pro-
ceedings of the North East Linguistic Society, Jill N. Beckman
(ed.). Amherst: Graduate Linguistic Student Association. 13-25.
Demuth, Katherine and Mark Johnson
2003 Truncation to Subminimal Words in Early French. Canadian Jour-
nal of Linguistics 48(3/4):211-241.
Dinnsen, Daniel A.
2008 A Typology of Opacity Effects in Acquisition. In Optimality The-
ory, Phonological Acquisition and Disorders, Daniel A. Dinnsen
and Judith A. Gierut (eds.). London: Equinox Publishing. 121-176.
dos Santos, Christophe
2007 Développement phonologique en français langue maternelle: Une
étude de cas. Ph.D. Dissertation. Université Lumière Lyon 2.
Dresher, B. Elan and Harry van der Hulst
1995 Global Determinacy and Learnability in Phonology. In Phonologi-
cal Acquisition and Phonological Theory, John Archibald (ed.).
Hillsdale, NJ: Lawrence Erlbaum. 1-21.
Dunphy, Carla
2006 Another Perspective on Consonant Harmony in Dutch. M.A. The-
sis. Memorial University of Newfoundland.
Ferguson, Charles and Carol B. Farwell
1975 Words and Sounds in Early Language Acquisition. Language
51:419-439.
Fikkert, Paula
1994 On the Acquisition of Prosodic Structure. HIL Dissertations in
Linguistics 6. The Hague: Holland Academic Graphics.
Fikkert, Paula and Clara Levelt
2008 How does Place Fall into Place? The Lexicon and Emergent Con-
straints in Childrens Developing Grammars. In Contrast in Pho-
nology, Peter Avery, B. Elan Dresher and Keren Rice (eds.). Ber-
lin: Mouton de Gruyter. 231-268.
Fougeron, Ccile and Patricia A. Keating
1996 Articulatory Strengthening in Prosodic Domain-initial Position.
UCLA Working Papers in Phonetics 92:61-87.
Freitas, Maria Joo
1997 Aquisição da Estrutura Silábica do Português Europeu. Ph.D. The-
sis. University of Lisbon, Lisbon.
Gerken, LouAnn
2002 Early Sensitivity to Linguistic Form. In Annual Review of Lan-
guage Acquisition, Volume 2, Lynn Santelmann, Maaike Verrips
and Frank Wijnen (eds.). Amsterdam: John Benjamins. 1-36.
Gierut, Judith A. and Kathleen M. OConnor
2002 Precursors to Onset Clusters in Acquisition. Journal of Child Lan-
guage 29:495-517.
Goad, Heather
1997 Consonant Harmony in Child Language: An Optimality-theoretic
Account. In Focus on Phonological Acquisition, S. J. Hannahs and
Martha Young-Scholten (eds.). Amsterdam: John Benjamins. 113-
142.
2000 Phonological Operations in Early Child Phonology. SOAS collo-
quium talk. University of London.
Goad, Heather and Kathleen Brannen
2003 Phonetic Evidence for Phonological Structure in Syllabification. In
The Phonological Spectrum, Vol. 2, Jeroen van de Weijer, Vincent
van Heuven and Harry van der Hulst (eds.). Amsterdam: John Ben-
jamins. 3-30.
Goad, Heather and Yvan Rose
2004 Input Elaboration, Head Faithfulness and Evidence for Representa-
tion in the Acquisition of Left-edge Clusters in West Germanic. In
Constraints in Phonological Acquisition, René Kager, Joe Pater
and Wim Zonneveld (eds.). Cambridge: Cambridge University
Press. 109-157.
Goad, Heather and Yvan Rose (eds.)
2003 Segmental-prosodic Interaction in Phonological Development: A
Comparative Investigation: Special Issue, Canadian Journal of
Linguistics 48(3/4): 139-152.
Hale, Mark and Charles Reiss
1998 Formal and Empirical Arguments Concerning Phonological Acqui-
sition. Linguistic Inquiry 29(4):656-683.
Inkelas, Sharon and Yvan Rose
2008 Positional Neutralization: A Case Study from Child Language.
Language 83(4):707-736.
Kent, Ray D. and Giuliana Miolo
1995 Phonetic Abilities in the First Year of Life. In The handbook of
Child Language, Paul Fletcher and Brian MacWhinney (eds.).
Cambridge, MA: Blackwell. 303-334.
Levelt, Clara
1994 On the Acquisition of Place. HIL Dissertations in Linguistics 8.
The Hague: Holland Academic Graphics.
Levelt, Clara, Niels Schiller and Willem Levelt
1999/2000 The Acquisition of Syllable Types. Language Acquisition 8:237-
264.
Levitt, Andrea, Peter Jusczyk, Janice Murray and Guy Carden
1987 Context Effects in Two-Month-Old Infants' Perception of Labio-
dental/Interdental Fricative Contrasts. Haskins Laboratories Status
Report on Speech Research 91:31-43.
Macken, Marlys
1980 The Childs Lexical Representation: The Puzzle-Puddle-Pickle
Evidence. Journal of Linguistics 16:1-17.
Ménard, Lucie
2002 Production et perception des voyelles au cours de la croissance du
conduit vocal : variabilité, invariance et normalisation. Ph.D. Dis-
sertation. Institut de la communication parlée, Grenoble.
Pan, Ning and William Snyder
2003 Setting the Parameters of Syllable Structure in Early Dutch. In
Proceedings of the 27th Boston University Conference on Lan-
guage Development, Barbara Beachley, Amanda Brown and Fran-
ces Conlin (eds.). Somerville, MA: Cascadilla Press. 615-625.
Paradis, Carole and Jean-François Prunet (eds.)
1991 The Special Status of Coronals: Internal and External Evidence.
San Diego: Academic Press.
Pater, Joe
1997 Minimal Violation and Phonological Development. Language
Acquisition 6(3):201-253.
Pinker, Steven
1984 Language Learnability and Language Development. Cambridge,
MA: Harvard University Press.
Rose, Yvan
2000 Headedness and Prosodic Licensing in the L1 Acquisition of Pho-
nology. Ph.D. Dissertation. McGill University.
2003 Place Specification and Segmental Distribution in the Acquisition
of Word-final Consonant Syllabification. Canadian Journal of Lin-
guistics 48(3/4):409-435.
Smit, Ann Bosma
1993 Phonologic Error Distribution in the Iowa-Nebraska Articulation
Norms Project: Consonant Singletons. Journal of Speech and
Hearing Research 36:533-547.
Smith, Neilson
1973 The Acquisition of Phonology, a Case Study. Cambridge: Cam-
bridge University Press.
Smolensky, Paul
1996 On the Comprehension/Production Dilemma in Child Language.
Linguistic Inquiry 27:720-731.
Stoel-Gammon, Carol
1996 On the Acquisition of Velars in English. In Proceedings of the
UBC International Conference on Phonological Acquisition, Bar-
bara H. Bernhardt, John Gilbert and David Ingram (eds.). Somer-
ville: Cascadilla Press. 201-214.
Studdert-Kennedy, Michael and Elizabeth Goodell
1993 Acoustic Evidence for the Development of Gestural Coordination
in the Speech of 2-year-olds: A Longitudinal Study. Journal of
Speech and Hearing Research 36(4):707-727.
van de Weijer, Joost
1999 Language Input for Word Discovery. Ph.D. Dissertation. Max Planck
Institute.


Emergent complexity in early vocal acquisition:
Cross linguistic comparisons of canonical babbling
Sophie Kern and Barbara L. Davis
Phonetic complexity, as evidenced in speech production patterns, is based
on congruence of production system, perceptual, and cognitive capacities in
adult speakers. Pre-linguistic vocalization patterns in human infants afford
the opportunity to consider first stages in emergence of this complex sys-
tem. The production system forms a primary site for considering determi-
nants of early output complexity, as the respiratory, phonatory, and articula-
tory subsystems of human infants support the types of vocal forms
observed in early stages as well as those maintained in phonological sys-
tems of languages. The role of perceptual input from the environment in
earliest stages of infant learning of ambient language phonological regulari-
ties is a second locus of emergent complexity. Young infants must both
attend to and reproduce regularities to master the full range of phonological
forms in their language. Cross-linguistic comparisons of babbling in infants
acquiring typologically different languages, including Dutch, Romanian,
Turkish, Tunisian Arabic and French, are described to consider production-
system-based regularities and early perceptually based learning supporting
emergence of ambient language phonological complexity.
1. Theoretical background
1.1. Common trends in babbling
Canonical babbling marks a seminal step into the production of syllable-
like outputs in infants. Canonical babbling is defined as rhythmic alterna-
tions between consonant- and vowel-like properties, giving a percept of
rhythmic speech that simulates adult output without conveying meaning
(Davis & MacNeilage, 1995; Oller, 2000). These rhythmic alternations
between consonants and vowels are maintained in adult speakers and form
the foundation for complexity in languages (Maddieson, 1984). Longitudi-
nal investigations of the transition from canonical babbling to speech have
shown continuity between phonetic forms in infant pre-linguistic vocaliza-
tions and earliest speech forms (Oller, 1980; Stark, 1980; Stoel-Gammon &
Cooper, 1984; Vihman et al., 1986). This continuity supports the impor-
tance of considering canonical babbling as a crucial first step in the young
child's journey toward mastery of ambient language phonology.
Strong similarities in sound and utterance type preferences in canonical
babbling across different language communities have been documented,
suggesting a universal basis for babbling (Locke, 1983). For consonants,
stop, nasal and glide manner of articulation are most frequently reported
(Locke, 1983; Robb & Bleihle, 1994; Roug et al., 1989; Stoel-Gammon,
1985; Vihman et al., 1985). Infants tend to produce consonants at the cor-
onal and labial consonant place of articulation (Locke, 1983) and few dor-
sals are noted (Stoel-Gammon, 1985). Vowels from the lower left quadrant
of the vowel space (i.e. mid and low front and central vowels) are most
often observed (Bickley, 1983; Buhr, 1980; Davis & MacNeilage, 1990;
Kent & Bauer, 1985; Lieberman, 1980; Stoel-Gammon & Harrington,
1990).
The phenomenon of serial ordering is one of the most distinctive proper-
ties of speech production in languages (Maddieson, 1984). In a typical ut-
terance, consonants and vowels do not appear in isolation but are produced
serially. Within-syllable patterns for contiguous consonants and vowels
provide a site for considering the emergence of complexity in utterance
structures, as rhythmic consonant and vowel syllables emerge typically at
around 8-9 months; in previous stages infant vocalizations do not exhibit
rhythmic syllable-like properties (see Oller, 2000, for a review). Three pre-
ferred within-syllable co-occurrence patterns have been reported in studies
of serial properties; coronal (tongue tip closure) consonants with front vo-
wels (e.g. /di/), dorsal (tongue back closure) consonants with back vowels
(e.g. /ku/), and labial (lip closure) consonants with central vowels (e.g.
/ba/). These widely observed serial patterns are predicted by the Frame
Content hypothesis (MacNeilage & Davis, 1990). The Frame Content hy-
pothesis proposes that the tongue does not move independently from the
jaw within syllables, but remains in the same position for the consonant
closure and the open or vowel portions of rhythmic cycles. Within-syllable
consonant-vowel characteristics are thus based on rhythmic close-open
jaw cycles, without movement of articulators independent of
the jaw. In studies of 6 English-learning infants during babbling (Davis &
MacNeilage, 1995) and 10 infants during the single word period (Davis et
al., 2002), all three predicted co-occurrences of the Frame Content perspec-
tive were found at above chance levels; other potential co-occurrences did
not occur above chance. Evidence for these serial patterns has also been
found in analyses of 5 French, 5 Swedish and 5 Japanese infants from the
Stanford Child Language database (Davis & MacNeilage, 2000), 2 Brazilian
Portuguese-learning children (Teixeira & Davis, 2002), 7 infants acquir-
ing Quechua (Gildersleeve-Neumann & Davis, 1998) and 7 Korean-learning
infants (Lee, 2003).
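The "above chance" assessment behind such findings can be approximated as an observed-versus-expected analysis over a consonant-place by vowel-class contingency table, where each cell's expected count is derived from the marginal totals. The counts below are invented for illustration; the sketch does not reproduce any cited study's data or statistical procedure:

```python
# Hypothetical syllable counts: rows are consonant places, columns vowel classes.
observed = {
    "labial":  {"front": 20, "central": 50, "back": 10},
    "coronal": {"front": 60, "central": 25, "back": 15},
    "dorsal":  {"front": 10, "central": 10, "back": 30},
}

places = list(observed)
vclasses = ["front", "central", "back"]
row_tot = {p: sum(observed[p].values()) for p in places}
col_tot = {v: sum(observed[p][v] for p in places) for v in vclasses}
n = sum(row_tot.values())

# Observed/expected ratio per cell; ratios above 1 mark co-occurrences
# that are more frequent than the marginals alone would predict.
ratio = {(p, v): observed[p][v] / (row_tot[p] * col_tot[v] / n)
         for p in places for v in vclasses}

for pair in [("labial", "central"), ("coronal", "front"), ("dorsal", "back")]:
    print(pair, round(ratio[pair], 2))
```

With these invented counts, the three Frame Content cells all come out above 1, which is the pattern the cited studies report for real babbling corpora.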
Some counterexamples to these CV co-occurrence trends have been re-
ported (Boysson-Bardies, 1993; Oller & Steffens, 1993; Tyler &
Langsdale, 1996; Vihman, 1992). However, most differences in outcome
may result from methodological differences. A labial-central association in
initial syllables was shown by Boysson-Bardies (1993) for French, Swedish
and Yoruba infants but not for English: the English-speaking infants in her
study preferred the labial-front association. However, Boysson-Bardies
analyzed the first and second syllables of utterances separately, resulting in
very small databases for statistical analysis. Oller and Steffens (1993)
evaluated their results against the expected frequencies of consonants. They
did not include expected frequencies of vowels, complicating comparison
of results. The three predicted co-occurrences were observable in Tyler
and Langsdale's (1996) data if the small number of observations in the
three age groups studied were pooled. An alveolar-front association was not
found in 3 English-speaking and 2 Swedish-speaking subjects by Vihman
(1992). However, she counted // as a central vowel, also complicating the
interpretation of her results relative to the predicted CV co-occurrences.
Vocalization patterns across syllables are also important for consider-
ing the emergence of vocal complexity. In languages, most words contain
varied consonants and vowels across syllables; phonological reduplication,
or repetition of the same syllable, is infrequent (Maddieson, 1984). In con-
trast, two types of canonical babbling in pre-linguistic infants have been
described: reduplicated and variegated. Reduplicated or repeated syllables
(e.g. /baba/) account for half or more of all vocal patterns in babbling and
more than half of early word forms (Davis et al., 2002). In variegated
forms, infants change vowels and/or consonants in two successive syllables
(e.g. /babi/ or /bada/). Several studies have shown concurrent use of both
reduplication and variegation during babbling (Mitchell & Kent, 1990;
Smith, Brown-Sweeney & Stoel-Gammon, 1989). In variegated babbling,
more manner than place changes for consonants (Davis & MacNeilage,
1995, Davis et al., 2002) and more height than front-back changes for vow-
els have been shown during babbling and first words (Bickley, 1983; Davis
& MacNeilage, 1995, Davis et al., 2002). The preference for manner
changes for consonants and height changes for vowels is consistent with the
Frame Content hypothesis (MacNeilage & Davis, 1990). As patterns are
based on rhythmic jaw oscillations without independent tongue movement,
predominance of manner and height changes over place and front-back
changes are predicted when successive syllables show different levels of
jaw closure.
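The reduplicated/variegated distinction, and the manner-versus-place coding of consonant changes, can be made concrete with a small classifier. The segment-to-feature mappings below cover only a handful of consonants and are simplified for illustration; they do not reproduce any study's coding scheme:

```python
# Simplified place/manner coding for a handful of consonants.
PLACE = {"b": "labial", "m": "labial", "d": "coronal", "n": "coronal",
         "g": "dorsal"}
MANNER = {"b": "stop", "d": "stop", "g": "stop", "m": "nasal", "n": "nasal"}

def classify(syllables):
    """Classify a babbled form given as a list of (consonant, vowel) pairs."""
    if len(set(syllables)) == 1:
        return "reduplicated"                 # e.g. baba
    changes = []
    for (c1, _), (c2, _) in zip(syllables, syllables[1:]):
        if c1 != c2:
            if MANNER[c1] != MANNER[c2]:
                changes.append("manner")
            elif PLACE[c1] != PLACE[c2]:
                changes.append("place")
    return "variegated:" + "+".join(changes) if changes else "variegated"

print(classify([("b", "a"), ("b", "a")]))   # baba -> reduplicated
print(classify([("b", "a"), ("m", "a")]))   # bama -> variegated:manner
print(classify([("b", "a"), ("d", "a")]))   # bada -> variegated:place
```

Forms whose syllables differ only in their vowels (e.g. babi) fall out as variegated with no consonant change recorded, mirroring the separate coding of vowel height and front-back changes described above.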


1.2. Early ambient language effects
Infants exhibit abilities to learn rapidly from language input regularities as
early as 8-10 months, based on responses in experimental lab settings (e.g.
Saffran et al., 1996; Werker & Lalonde, 1988). It has also been proposed
that learning from ambient language input may influence and shape vocali-
zation preferences in the late babbling and/or first word periods. The ap-
pearance of ambient language influences in production repertoires has been
examined for utterance and syllable structures (Boysson-Bardies, 1993;
Kopkalli-Yavuz & Topbaş, 2000), vowel and consonant repertoires and
distribution (Boysson-Bardies et al., 1989 and 1992) as well as CV co-
occurrence preferences (e.g. Lee, 2003).
Some studies of early appearance of ambient language regularities have
focused on adult capacities for perception of differences in children from
different language environments. Thevenin et al. (1985) failed to find sup-
port for adults' ability to discriminate the babbling of 7- to 14-month-old
English- and Spanish-learning infants. However, their stimuli consisted of
short 1 to 3 sec stretches of canonical babbling. Boysson-Bardies et al.
(1984) presented naïve adults with sequences of early babbling of French,
Arabic and Cantonese infants. Participants were asked to identify babbling
of French infants. Listeners were correct in judging 70% of the tokens,
suggesting that babbling in the pre-linguistic period may exhibit perceptu-
ally apparent ambient language characteristics. Adults were able to cor-
rectly identify language differences at 6 and 8 months, but not babbling of
10-month-olds. According to Boysson-Bardies et al. (1984), this result
could be explained by stimulus differences: stimuli from 10-month-olds
showed less consistency in intonation contours. Despite discrepancies in
results, where adults were less accurate listening to older infants, these
perceptual studies suggest a potential role of prosodic cues in adult listen-
ers' abilities to judge the language background of young infants.
Other studies targeting acoustic and phonetic properties of infants' bab-
bling output have provided some support for early ambient language learn-
ing. Boysson-Bardies et al. (1989) compared vocalizations of French, Eng-
lish, Cantonese and Algerian 10-month-olds. Based on computation of
mean vowels (i.e. mean F1 and F2), they proposed that the acoustic
vowel distribution was significantly different for the 4 language groups.
There was also close similarity between infant and adult vowels in each
of the four linguistic communities. Boysson-Bardies, Hall, Sagart & Du-
rand (1992) also suggested an early influence of the language environment
on consonants in the four languages. They found significant differences in
the distribution of place and manner of articulation across the four lan-
guages. Stop consonants represented the largest proportion for all infants.
From 10 months, French infants produced fewer stops than American and
Swedish infants. Levitt & Utman (1992) compared one French and one
English-learning infant. English shows higher frequencies of fricatives,
affricates and nasals than French; approximants are more frequent in
French than in English. Each infant's consonant inventory moved toward
its own ambient language in composition and frequency; both infants
showed the closest match to the ambient frequencies at 5 months. The French
child also favored low front vowels and the English child preferred mid
central vowels, consistent with frequencies in their ambient language. The
study reported on a very small sample of data for the two children, how-
ever, complicating generalization of results on timing of early ambient lan-
guage learning. In general, available studies are limited in the size of the
databases and number of participants, so conclusions must be considered as
needing further confirmation.
Strongly consistent trends in production patterns as well as preliminary
indications about the timing of learning from ambient language input are
apparent. However, empirical investigations of early ambient language
learning do not provide strong evidence due to methodological issues (e.g.
adult perceptual studies vs. infant production patterns, amount of data ana-
lyzed, age of observation, number of participants, longitudinal vs. cross-
sectional data collection and use of perception based phonetic transcription
vs. acoustic analysis). To evaluate the emergence of early learning from the
ambient language more fully, the issue must be considered in the context of
common production patterns seen across languages. Larger cohorts of chil-
dren in varied language environments illustrating diverse ambient language
targets are necessary. Consistent data collection and analysis procedures are
also essential to comprehensively evaluate this question.

2. Predictions
In this work, a uniform analysis profile is imposed on large corpora for five
different languages, with the goal of understanding the timing of emergence
and precise characteristics of ambient language learning in the context of
reports on common production trends. Predictions based on common trends
will be tested as follows:
There will be a significantly higher proportion of:
 - stop, nasal and glide consonant manners of articulation,
 - coronal and labial consonant places of articulation,
 - mid and low front and central vowels.
Within-syllable consonant-vowel co-occurrences will show a significant
tendency for:
 - labial consonants and central vowels,
 - coronal consonants and front vowels,
 - dorsal consonants and back vowels.
Across syllables, there will be a significant tendency for:
 - co-occurrence of both reduplication and variegation,
 - manner over place changes for consonants in variegated syllables,
 - height over front-back changes for vowels in variegated syllables.
3. Method
3.1. Participants
Twenty infants (4 infants per language) were observed in their normal daily
environment. Infants were described as developing typically according to
community standards and reports from parents and physicians regarding
developmental milestones. All infants were monolingual learners of Turk-
ish, French, Romanian, Dutch and Tunisian Arabic. These languages repre-
sent diverse language families: French and Romanian are Romance lan-
guages, Dutch is a West-Germanic language, Turkish is a Ural-Altaic
language and Tunisian belongs to the Arabic language family. Table 1
summarizes descriptive data for participants.



3.2. Data collection
One hour of spontaneous vocalization data was audio and video recorded
every two weeks from 8 through 25 months in the infants' homes. Parents
were told to follow their normal types of activities with their child. No ex-
tra materials were introduced into the environment, so that samples re-
flected the infants' typical vocalizations in familiar surroundings.


3.3. Data analysis
Spontaneous vocalization samples during canonical babbling were ana-
lyzed. Canonical babbling was defined as beginning with the onset of
rhythmic speech-like syllables based on parent report. The data were col-
lected until each child was chronologically 12 months of age.
A total of 165 hours of spontaneous data was phonetically transcribed using the In-
ternational Phonetic Alphabet with broad phonetic transcription conven-
tions. All singleton consonants and vowels as well as perceptually rhythmic
syllable-like vocalizations were transcribed. Tokens were treated as single
utterance strings when separated by 1 second of silence, noise or adult
speech. Transcribed data were entered into Logical International Phonetic
Programs (LIPP; Oller & Delgado, 1990) for analysis of patterns.

Table 1. Participants and data analyzed.

Language    Language family   Number of participants   Number of one-hour sessions
French      Romance           4                        32
Romanian    Romance           4                        33
Tunisian    Arabic            4                        27
Turkish     Ural-Altaic       4                        34
Dutch       West-Germanic     4                        39
Total                         20                       165

A variety of phonetic characteristics were considered. Consonants were
grouped according to 1) manner of articulation: oral and nasal stops, oral
and glottal fricatives, glides, and other (i.e. trills, taps and affricates) and 2)
place of articulation: labial (bilabial, labiodental, labiopalatal and labio-
velar), coronal (dental, alveolar, postalveolar and palatal), dorsal (velar and
uvular) and guttural (pharyngeal and glottal). Glides were considered as
360 Sophie Kern and Barbara L. Davis

consonants, as they share the consonantal property of accompanying the
mouth closing phase of babbling. Vowels were grouped according to 1)
backness: front, central and back, and 2) height: high, mid and low. An
"other" category included all segments that could not be perceptually
recognized by transcribers as specific consonants or vowels (UC = undefined
consonant, UV = undefined vowel). For all sounds occurring in perceptually
rhythmic syllable contexts, within-syllable consonant-vowel (CV) co-
occurrence patterns were analysed. For this analysis, consonants were
grouped into 3 categories according to consonant place of articulation: la-
bial, coronal and dorsal. Vowels were grouped into front, central and back
dimensions. For across syllable patterns, utterance strings were considered
reduplicated if all consonant and vowel types were identical. Variegated
strings were designated by changes in consonant place or manner, vowel
height or front-back, or both.
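The across-syllable criterion just described can be sketched as a small classifier over transcribed CV syllable sequences. This is an illustrative sketch only: the segment-class tables below are tiny hypothetical stand-ins for the full IPA groupings used in the study.

```python
# Minimal sketch of the across-syllable classification described above.
# An utterance is "reduplicated" if all consonant types and all vowel types
# are identical; otherwise it is "variegated", and variegated strings are
# tagged by whether consonants change in place or manner and vowels in
# height or front-back. The class tables are illustrative subsets.

PLACE  = {"b": "labial", "m": "labial", "d": "coronal", "n": "coronal", "g": "dorsal"}
MANNER = {"b": "stop", "d": "stop", "g": "stop", "m": "nasal", "n": "nasal"}
HEIGHT = {"i": "high", "u": "high", "e": "mid", "o": "mid", "a": "low"}
FRONT  = {"i": "front", "e": "front", "a": "central", "u": "back", "o": "back"}

def classify(syllables):
    """syllables: list of (consonant, vowel) pairs, e.g. [('b','a'), ('d','a')]."""
    cons = [c for c, v in syllables]
    vows = [v for c, v in syllables]
    if len(set(cons)) == 1 and len(set(vows)) == 1:
        return "reduplicated", set()
    changes = set()
    if len({PLACE[c] for c in cons}) > 1:
        changes.add("C-place")
    if len({MANNER[c] for c in cons}) > 1:
        changes.add("C-manner")
    if len({HEIGHT[v] for v in vows}) > 1:
        changes.add("V-height")
    if len({FRONT[v] for v in vows}) > 1:
        changes.add("V-front/back")
    return "variegated", changes

print(classify([("b", "a"), ("b", "a")]))   # reduplicated
print(classify([("b", "a"), ("d", "a")]))   # variegated: consonant place change
print(classify([("b", "a"), ("m", "i")]))   # variegated: manner and vowel changes
```
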
4. Results
4.1. Utterance structures
Table 2 displays frequency of occurrence for utterances, segments and the
C/V ratio. The number of utterances for all languages was 38,719 ranging
from 3,409 (Turkish) to 10,623 (Dutch). Overall the number of segments
totalled 168,145. In all languages, number of vowels exceeded consonants
as illustrated by the C/V ratio. Overall, 57,472 consonants were analysed.
The number of consonants ranged from 6,771 (Turkish) to 16,760 (Tunisian)
across languages. For each language, percentages and totals of consonants
occurring at more than 5% are given in Appendix A. Overall, 69,007 vowels
were transcribed (see Appendix B for percentages and totals of vowels
occurring at more than 5%).

Table 2. Frequency of occurrence of segments and utterances.

Language    Utterances   Consonants   Vowels   C/V ratio   Other   Total segments
French      10,085        9,462       12,196     0.78        320       32,063
Romanian     8,280        9,512       11,807     0.80         19       29,618
Tunisian     6,322       16,760       19,145     0.88         82       42,309
Turkish      3,409        6,771        8,201     0.83      1,595       19,967
Dutch       10,623       14,967       17,658     0.85        940       44,188
Total       38,719       57,472       69,007     0.83      2,956      168,145

4.2. Consonant characteristics
Manner of articulation: Some similarities were apparent as well as some
striking differences across languages relative to manner of articulation.
Figure 1 displays results for manner of articulation for all consonants in the
corpus. Oral stops were most frequent (43.5%). Four languages out of five
exhibited this trend: oral stops accounted for 51.5% in French, 51% in Ro-
manian, 42.5% in Dutch and 57.5% in Turkish. Tunisian infants produced
only 29.5% stops. A high percentage of glottal fricatives was observed in
Tunisian (31.5%) and Dutch (25.5%); in Tunisian, the glottal fricative [h]
was the single most frequent consonant type, almost equal to the proportion
of stops (29.5%). When glottal fricatives were not counted, glides (15%)
and nasals (12%) were the second most frequent manner of articulation for
all languages. French infants produced twice the group average for nasals.
Dutch and Tunisian infants produced far fewer nasals. Finally, in all
languages children produced more oral stops, nasals and glides than other
manners of articulation (Z-test, p < 10⁻⁶). This result confirms our first
hypothesis of a significantly higher proportion of stop, nasal and glide
manners of articulation.
[Figure: percentage of occurrence by language (French, Romanian, Dutch,
Tunisian, Turkish, and average) for oral stops, nasals, glides, oral
fricatives, glottal fricatives and others.]
Figure 1. Consonant manner of articulation.
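The Z-tests on proportions reported throughout this section can be sketched as standard two-proportion tests. A minimal illustration using only Python's standard library follows; the counts are illustrative placeholders, not figures from the corpus.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between proportions x1/n1 and
    x2/n2, using the pooled-proportion standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative: is one manner class more frequent than another in a
# hypothetical sample of 1,000 consonants?
z, p = two_proportion_z(550, 1000, 450, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```
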

Place of articulation: Figure 2 displays place of articulation results for all
consonants in the corpus. Coronals were the most frequent at 47%. How-
ever in French, labials (47%) were most frequent; in particular, the labial
nasal [m] was frequently produced (21.5%). Tunisian infants produced
more glottals, with a high frequency of the glottal fricative [h] as noted
above for manner of articulation. The second most frequent Tunisian place
category was coronals. Across all languages, there were more glottals than
dorsals, due to Tunisian and Dutch. Our second hypothesis is confirmed:
labials and coronals are significantly more frequent than dorsals and
glottals in each of the 5 languages (Z-test, p < 10⁻⁶).
[Figure: percentage of occurrence by language for labial, coronal, dorsal
and glottal consonants.]
Figure 2. Consonant place of articulation.


4.3. Vowel characteristics
Vowel frequencies ranged from 8,201 to 19,145. Two or three vowels
accounted for 50% of all types. Only the low central vowel [a] occurred at
a frequency of > 5% in all 5 languages.
Vowels in the lower left quadrant of the vowel space were separated and
compared with other vowel types (Figure 3). Overall, mid and low front
and central vowels were most frequent. Combining the five languages, the
lower left quadrant category yielded 66% of all vowels. This analysis con-
firms our third hypothesis that children will produce more vowels from the
lower left quadrant than other vowel types. In each language, the difference
between the two groups is statistically significant, showing a predominance
of vowels from the lower left quadrant (Z-test, p < 10⁻⁶).

In French, the low central vowel [a] and the mid-front rounded vowel []
represented approximately 60%; 3 other vowels occurred at > 5%. In Tunisian,
the two most frequent vowels were [] and [e]; only [a] also occurred at
> 5%. Dutch infants exhibited a high percentage of the central vowels []
and [a]; 3 others occurred at > 5%. In Turkish, [] occurred at more than
29%; other vowels at more than 5% were [], [], [a], [], [u], [] and [].
Romanian infants produced [a] most frequently (29.5%); five other vowels
occurred at frequencies > 5%.

Figure 3. Vowels from the lower left quadrant versus other vowel types.

Group and language trends for vowels were apparent for both vowel height
and front-back dimensions. Figure 4 displays the distribution of vowels by
front-back dimensions for each language and overall. As shown in Figure 4,
the three types of vowels are not homogeneously distributed in the different
languages: Front vowels have a higher frequency in French and Tunisian
(Z-test, p < 10⁻⁶); central vowels have a higher frequency in all languages
except Tunisian (Z-test, p < 10⁻⁶); back vowels were the least represented
category in all 5 languages (Z-test, p < 10⁻⁶).

[Figure: percentage of occurrence by language for front, central and back
vowels.]
Figure 4. Vowel front back dimension.

[Figure: percentage of occurrence by language for high, mid and low
vowels.]
Figure 5. Vowel height dimension.
Figure 5 displays the percentage of vowels in the height dimension for the 5
languages. Again, the three categories do not exhibit a homogeneous distri-
bution. Mid vowels are significantly more frequent in French (Z-test,
p < 10⁻⁶), Dutch (Z-test, p < 10⁻⁶) and Turkish (Z-test, p < 0.001); low
vowels are more frequent in Tunisian (Z-test, p < 10⁻⁶); high vowels are low
in frequency in all languages except Romanian (Z-test, p < 10⁻⁶ in French,
Dutch and Tunisian; p < 0.001 in Turkish).


4.4. Within syllable CV Co-occurrences
The predicted within syllable CV co-occurrence trends tested were: la-
bial consonants with central vowels, coronal consonants with front vowels,
and dorsal consonants with back vowels. This test was based on the
Frame/Content prediction (MacNeilage & Davis, 1990) that consonants and
vowels within syllables will be articulatorily contiguous rather than
showing tongue movement between the consonant and vowel portions of the
close-open syllable alternation.

Table 3. Ratio of observed to expected occurrences of labial, coronal and
dorsal consonants with front, central and back vowels (predicted values are
in boldface).

Language   Vowels    Coronal   Labial   Dorsal
French     Front      1.13      .83      1.11
           Central    1.07     1.06      .70
           Back        .71     1.15     1.36
Tunisian   Front      1.02      .98     1.05
           Central     .83     1.24     1.44
           Back       1.07      .64     1.22
Romanian   Front      1.24      .65     1.07
           Central    1.01      .94      .95
           Back        .28     4.70      .20
Turkish    Front      1.11      .99      .59
           Central     .90     1.03     1.09
           Back        .69      .72     2.87
Dutch      Front      1.03      .62     1.43
           Central    1.19      .93      .73
           Back        .87     2.21      .67

Predicted labial-central vowel co-occurrences occurred in 3/5 languages;
Dutch and Romanian infants did not show the preferred association between
labial consonants and central vowels. The predicted coronal-front vowel
association occurred in all language groups. Predicted associations between
dorsal consonants and back vowels were found in 3/5 language groups; Dutch
and Romanian infants did not show the expected dorsal-back association.
Overall, there was a tendency for the 20 infants to prefer the three
predicted CV co-occurrence patterns over non-predicted patterns. For
individual infants, 36 of the 60 predicted cells (20 infants × 3
associations) were above chance; only 52 of 117 non-predicted cells were
above chance. Three infants did not produce some types of syllables, so
those cells contained no observations. The Chi-square value was 4.72
(df = 1; p < .05).
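The ratios in Table 3 are observed co-occurrence counts divided by the counts expected if consonant place and vowel class were independent (row total × column total / grand total). A sketch of that computation on toy counts (hypothetical values, not the study's data):

```python
def obs_over_expected(counts):
    """counts: dict {(consonant_place, vowel_class): observed count}.
    Expected count under independence = row_total * col_total / grand_total;
    returns observed/expected ratios (> 1 means over-represented)."""
    rows = sorted({c for c, v in counts})
    cols = sorted({v for c, v in counts})
    row_tot = {c: sum(counts.get((c, v), 0) for v in cols) for c in rows}
    col_tot = {v: sum(counts.get((c, v), 0) for c in rows) for v in cols}
    grand = sum(counts.values())
    return {(c, v): counts.get((c, v), 0) / (row_tot[c] * col_tot[v] / grand)
            for c in rows for v in cols}

# Toy counts showing a labial-central and coronal-front preference
counts = {("labial", "central"): 60, ("labial", "front"): 20,
          ("coronal", "central"): 30, ("coronal", "front"): 70}
ratios = obs_over_expected(counts)
print(ratios[("labial", "central")])   # > 1: labial-central over-represented
print(ratios[("coronal", "front")])    # > 1: coronal-front over-represented
```
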


4.5. Across syllable reduplication and variegation
Analysis of CVCV sequences for these 5 languages confirms tendencies
toward co-occurrence of reduplication and variegation. Reduplicated and
variegated utterances co-occurred in all languages. French and Tunisian
infants produced significantly more variegation than reduplication (Z-test,
p < 10⁻⁶); Dutch and Romanian infants showed the reverse trend (Z-test,
p < 10⁻⁶), whereas Turkish infants used as much variegated as reduplicated
babbling (see Figure 6).

[Figure: percentage of occurrence by language for reduplicated versus
variegated utterances.]
Figure 6. Reduplicated and variegated babbling.
When syllables were variegated across utterances, these 20 infants tended
to follow common tendencies described previously. Consonant manner
changes predominated over place changes in 2 languages (Figure 7), as
predicted by the Frame/Content hypothesis: Tunisian (Z-test, p < 10⁻⁶) and
Romanian (Z-test, p < 0.001). For French, Dutch and Turkish infants the
relative frequencies were almost equal. Vowel changes in variegated babbling
showed height changes predominating over front-back changes in all 5
language groups: Turkish and Tunisian (Z-test, p < 10⁻⁶), Romanian and
French (Z-test, p < 0.001), and Dutch (Z-test, p < 0.01) (Figure 8).
[Figure: percentage of occurrence by language for consonant manner versus
place changes.]
Figure 7. Manner vs. place changes of consonant articulation in CVCV sequences.

[Figure: percentage of occurrence by language for vowel height versus
front-back changes.]
Figure 8. Height vs. back-front changes of vowel articulation in CVCV sequences.
5. Discussion and conclusions
The emergence of complexity, as exhibited in early speech acquisition pat-
terns, necessitates consideration of both universal patterns based on charac-
teristics of the production subsystems common to all infants and early
learning from adult speakers in an infant's ambient language community.
Complexity in babbling is founded on convergence of multiple physio-
logical sub-systems to enable the first appearance of syllable based speech-
like output. It is illustrated in interactions within underlying physiological
mechanisms and at the level of emergence of the syllable in observable
behavioral output. Canonical babbling represents the earliest appearance of
language-like regularities in infant productions of perceptually rhythmic
syllables that simulate adult speech. Patterns in canonical babbling have
been shown to be continuous with vocal patterns in the early language-
based single word stage. This continuity emphasizes the importance of
considering speech-like prelinguistic babbling as a first step into language
complexity.
Sophisticated early learning capacities have been shown in experimental
laboratory paradigms (e.g. Saffran, Aslin & Newport, 1996). Observation of
spontaneous vocalizations can reveal an infant's ability to exhibit learning
of ambient language regularities in naturally occurring output. In this
study, we
analyzed naturally occurring spontaneous vocalization patterns in canonical
babbling for 20 infants in five different language communities to consider
the strength of common patterns versus the early appearance of language
specific tendencies.
To the extent that potentially universal vocalization patterns predomi-
nate across languages, infants can be described as manifesting characteris-
tics of the maturing production system common to neotenous humans
(Locke, 2006). These data on five diverse languages provide strong evidence
for a near-universal basis for naturally occurring vocal output patterns
during pre-linguistic canonical babbling. Consistencies on any index
analyzed across these languages were far more pervasive than differences;
infant vocalization patterns looked more like one another and less like the
adult speakers in their own ambient language environment.
Overall, the predictions regarding common vocalization tendencies were
confirmed. Infants produced more stop and nasal manners of articulation and
more coronal and labial places of articulation. These consonant
tendencies can be characterized as a general pattern of complete closure for
consonant manner (i.e. oral and nasal stop manner) and forward articulation
with lip or tongue tip for place (i.e. labial and coronal place). A strong
preference for vowels in the lower left vowel space emerged, reflecting
more jaw opening and forward tongue placement during open portions of
rhythmic syllabic output. A perceptual basis for these early patterns is not
apparent (see Davis, 2000, for a review of this issue). These patterns for
consonants and vowels match trends reported in a variety of previous studies
of the babbling and early word periods.
Within and across syllable patterns also showed strong trends toward
common patterns in these 5 languages. Concerning within-syllable patterns,
the three predicted trends were largely confirmed. Eleven out of 15 pre-
dicted within syllable associations were confirmed. All coronal front asso-
ciations were confirmed. Three out of 5 labial central associations were
confirmed; Romanian and Dutch infants did not produce these predicted
patterns at above chance levels. Three out of 5 dorsal back associations
were confirmed; Dutch and Romanian infants did not produce these pre-
dicted patterns above chance. CV-co-occurrences have previously been
observed mainly in studies of infants in an English language environment.
The general though not universal confirmation of these early patterns in
five additional languages confirms the influence of serial organization ten-
dencies predicted by the Frame Content hypothesis. These patterns indicate
predominance of jaw open close over independent tongue movements
within syllables as largely characteristic of infant output, regardless of lan-
guage environment. Exploration of the sources of variations found within
and across languages in these patterns will help to understand the emer-
gence of within syllable patterns of complexity more fully.
In the case of multi-syllabic utterances, the prediction for co-occurrence
of reduplication and variegation was confirmed. In variegated utterances,
predicted common trends based largely on previous observations of Eng-
lish-learning infants were confirmed as well. For vowels, all infants varie-
gated more in the height than the front-back dimension. For consonants,
manner variegation significantly exceeded place variegation in 2 out of 5 languages.
Both these tendencies indicate a predominance of jaw close open variega-
tion over tongue movement within utterances. This predominance of man-
ner height variegation patterning is consistent with Frame Content predic-
tions as well, supporting the strong presence of production system-based
influences. Tongue movement within utterances would be signalled by
variegated utterances containing a predominance of consonant place and
vowel front-back variegation.
To fully explore the meaning of these common trends as well as ob-
served differences in infant patterns during babbling, analysis of adult val-
ues in each ambient language is required. These data indicate relatively few
clear examples of ambient language learning in the context of strong com-
mon tendencies across languages. In Tunisian, glottal fricatives were the
most frequent manner/place of articulation pattern. The Tunisian phonemic
inventory includes 14 fricatives and 5 glottal consonants while the other
four languages had 2 or fewer glottal phonemes. A frequent vowel type in
Romanian is the high central vowel. This frequency of occurrence of a
vowel type in Romanian infants that is not commonly reported during early
acquisition may be related to the occurrence of this high central vowel pho-
neme in their language input. In addition, the range of variation of individ-
ual infants from the group means should be explored to consider the nature
and importance of individual differences across infants on these patterns. In
Dutch, one infants strong use of dorsals skewed the patterns for the lan-
guage overall. Ongoing analysis of this group of children at later ages could
illuminate the timing and nature of emergence of learning of ambient lan-
guage characteristics.
The present results suggest that common tendencies based on character-
istics of the production system predominate during the babbling period.
Observable characteristics appear to be based less on learning than on in-
trinsic self-organizing propensities of the system and how they are revealed
in the human infants spontaneous vocal output. Kauffman (1995) has
called this level of organized output from a complex living organism "order
for free". Early stages of patterned order may thus be viewed as emergent
from the characteristics of the production mechanism of young human
speakers. This hypothesis will need to be explored with comparative analy-
sis of children and languages.
Appendix A. Distributions of consonants in each language
French        Tunisian      Romanian      Turkish       Dutch
Cons. (%)     Cons. (%)     Cons. (%)     Cons. (%)     Cons. (%)
[m] (21.5)    [h] (31.5)    [d] (24.0)    [d] (33.0)    [h] (25.5)
[d] (17.0)    [j] (13.5)    [j] (12.0)    [j] (17.0)    [d] (15.0)
[b] (11.0)    [t] (9.5)     [t] (9.5)     [b] (11.5)    [t] (14.0)
[t] (10.5)    [w] (9.0)     [b] (7.5)     [g] (8.0)     [k] (7.5)
[g] (5.5)     [d] (7.5)     [n] (7.0)     [n] (8.0)     [l] (6.0)
              [] (7.5)      [w] (5.5)     [m] (7.0)     [j] (5.5)
                                                        [n] (5.1)
6,206         13,186        6,232         5,558         11,763
(65.5%)       (78.5%)       (65.5%)       (84.5%)       (78.6%)

This table displays the relative proportion of the consonants corresponding
to more than 5% of the tokens in each language. The last line indicates the
corresponding number of tokens and the cumulative proportion relative to the
total number of consonants.
Appendix B. Distributions of vowels in each language
French        Tunisian      Romanian      Turkish       Dutch
Vowel (%)     Vowel (%)     Vowel (%)     Vowel (%)     Vowel (%)
[a] (30.0)    [] (48.0)     [a] (29.5)    [] (29.0)     [] (26.5)
[] (29.0)     [e] (26.0)    [] (19.5)     [] (22.0)     [a] (20.0)
[] (7.5)      [a] (10.5)    [e] (15.5)    [a] (14.5)    [] (18.0)
[] (7.0)                    [] (12.5)     [] (6.5)      [] (12.5)
[e] (6.0)                   [] (11.5)     [e] (5.5)     [] (5.5)
                            [i] (7.5)     [] (5.5)
9,707         16,223        11,357        6,803         14,565
(79.5%)       (84.5%)       (96.0%)       (83.0%)       (82.5%)

This table displays the relative proportion of the vowels corresponding to
more than 5% of the tokens in each language. The last line indicates the
corresponding number of tokens and the cumulative proportion relative to the
total number of vowels.
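The appendix tables can be derived mechanically from raw segment counts: keep every segment whose share of tokens exceeds 5% and report the number of tokens and cumulative proportion those segments cover. A sketch on toy counts (hypothetical values, not the corpus data):

```python
def frequent_segments(counts, threshold=0.05):
    """counts: dict {segment: token count}. Returns the segments whose share
    of all tokens exceeds `threshold`, sorted by descending frequency, plus
    the number of tokens and the cumulative proportion they cover."""
    total = sum(counts.values())
    kept = sorted(((seg, n / total) for seg, n in counts.items()
                   if n / total > threshold),
                  key=lambda item: -item[1])
    covered = sum(counts[seg] for seg, _ in kept)
    return kept, covered, covered / total

# Toy counts: a few frequent segments and a long tail
counts = {"d": 330, "j": 170, "b": 115, "g": 80, "k": 40, "s": 15}
kept, covered, share = frequent_segments(counts)
print(kept)              # segments above 5%, most frequent first
print(covered, share)    # tokens covered and their cumulative proportion
```
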


Notes

Acknowledgements: This work is supported by the EUROCORES Program The
Origin of Man, Language and Languages (OMLL), the French CNRS program
Origine de l'Homme, du Langage et des Langues (OHLL), and research grant #
HD 27733-10 from the U.S. Public Health Service. Project participants are:
Sophie Kern (project leader) & Laetitia Savot (research assistant),
Laboratory Dynamique du Langage, Lyon, France; Inge Zink (principal
investigator), Mieke Breuls & Annemie Van Gijsel (research assistants),
Lab. Exp. ORL/ENT-dept, K.U. Leuven, Belgium; Aylin Küntay (main
investigator) & Dilara Kobas (research assistant), Koç University, Turkey;
Barbara L. Davis, Peter MacNeilage & Chris Matyear, Speech Production
Laboratory, Austin, Texas, USA.



References
Bickley, C.
1983 Acoustic evidence for phonological development of vowels in young
infants. Paper presented at the 10th Congress of Phonetic Sciences,
Utrecht.
Buhr, R. D.
1980 The emergence of vowels in an infant. Journal of Speech and Hear-
ing Research, 12:73-94.
Chomsky, N. & Halle, M.
1968 The Sound Pattern of English. New York: Harper & Row.
Davis, B. L. & MacNeilage, P. F.
1995 The Articulatory basis of babbling. Journal of Speech and Hearing
Research, 38:1199-1211.
2000 An embodiment perspective on the acquisition of speech perception.
Phonetica, 57(Special Issue):229-241.
Davis, B. L., MacNeilage, P. F. & Matyear, C. L.
2002 Acquisition of Serial Complexity in Speech Production: A Compari-
son of Phonetic and Phonological Approaches to First Word Produc-
tion. Phonetica, 59, 75-107.
de Boysson-Bardies, B., Sagart, L. & Durand, C.
1984 Discernible differences in the babbling of infants according to target
language. Journal of Child Language, 11(1):1-15.
de Boysson-Bardies, B., Hallé, P., Sagart, L. & Durand, C.
1989 A cross linguistic investigation of vowel formants in babbling. Jour-
nal of Child Language, 16:1-17.
de Boysson-Bardies, B., Vihman, M. M., Roug-Hellichius, L., Durand, C.,
Landberg, I., & Arao, F.
1992 Evidence of infant selection from target language: A cross linguistic
phonetic study. In C. A. Ferguson & L. Menn & C. Stoel-Gammon
(Eds.), Phonological development: Models, research, implications.
Monkton, MD: York Press, pp. 369-392.
de Boysson-Bardies, B.
1993 Ontogeny of language-specific syllabic production. In B. de Boys-
son-Bardies & S. de Schoen & P. Jusczyk & P. F. MacNeilage & J.
Morton (Eds.), Developmental neurocognition: Speech and face
processing in the first year of life. Dordrecht: Kluwer Academic
Publishers. pp. 353-363.
Fenson, L., Dale, P., Reznick, S., Thal, D., Bates, E., Hartung, J., Tethick, S. &
Reilly, J.
1993 MacArthur Communicative Development Inventories: User's guide
and technical manual. San Diego, CA: Singular Publishing Group.


Gildersleeve-Neuman, C. & Davis, B. L.
1998 Production versus ambient language influences on speech develop-
ment in Quechua. Paper presented at the Annual Meeting of the
American Speech, Hearing and Language Association, San Antonio,
Texas.
Kent, R. D. & Bauer, H. R.
1985 Vocalizations of one-year olds. Journal of Child Language, 12:491-
526.
Kauffman, S.
1995 At Home in the Universe: The Search for the Laws of Self-
Organization and Complexity, New York: Oxford University Press.
Kopkalli-Yavuz, H. & Topbas, S.
2000 Infants' preferences in early phonological acquisition: How does it
reflect sensitivity to the ambient language? In A. Göksel & C. Kerslake
(Eds.), Studies on Turkish and Turkic Languages. Wiesbaden:
Harrassowitz. pp. 289-295.
Lee, S.
2003 Perceptual influences on speech production in Korean learning
infant babbling. Unpublished manuscript, Texas, Austin.
Levitt, A. G. & Utman, J. G. A.
1992 From babbling towards the sound systems of English and French - a
longitudinal 2-case study. Journal of Child Language, 19:19-49.
Lieberman, P.
1980 On the development of vowel production in young infants. In G. H.
Yeni-Komshian & J. F. Kavanagh & C. A. Ferguson (Eds.), Child
phonology 1: Production. New York, NY: Academic Press.
Locke, J. L.
1983 Phonological acquisition and change. New York, NY: Academic
Press.
MacNeilage, P. F. & Davis, B. L.
1990 Acquisition of Speech Production: Frames, Then Content. In M.
Jeannerod (Ed.), Attention and Performance XIII: Motor Representa-
tion and Control. Hills: Lawrence Erlbaum. pp. 453-476.
1993 Motor explanations of babbling and early speech patterns. In B. de
Boysson-Bardies & S. de Schonen & P. Jusczyk & P. F. MacNeilage
& J. Morton (Eds.), Changes in Speech and Face Processing in In-
fancy: A Glimpse at Developmental Mechanisms of Cognition.
Dordrecht: Kluwer, pp. 341-352.
2000 On the Origin of Internal Structure of Word Forms. Science,
288:527-531.
MacNeilage, P. F., Davis, B. L., Kinney, A. & Matyear, C. L.
2000 The Motor Core of Speech: A Comparison of Serial Organization Pat-
terns in Infants and Languages. Child Development, 2000(1):153-163.

Maddieson, I.
1984 Patterns of sounds. Cambridge University Press.
Oller, D. K., Wieman, L. A., Doyle, W. J. & Ross, C.
1976 Infant babbling and speech. Journal of Child Language, 3:1-11.
Oller, D. K.
1980 The emergence of the sounds of speech in infancy. In G. H. Yeni-
Komshian & J. F. Kavanagh & C. A. Ferguson (Eds.), Child phonol-
ogy 1: Production. New York, NY: Academic Press, pp. 93-112.
Oller, D.K. & Delgado, R.
1990 Logical international phonetic programs. Miami: Intelligent Hearing
Systems.
Oller, D. K. & Eilers, R. E.
1982 Similarity of babbling in Spanish- and English-learning babies.
Journal of Child Language, 9:565-577.
Oller, D. K. & Steffans, M. L.
1993 Syllables and segments in infant vocalizations and young child
speech. In M. Yavas (Ed.), First and second language phonology.
San Diego: Singular Publishing Co. pp. 45-62.
Roug, L., Landburg, I. & Lundburg, L.
1989 Phonetic development in early infancy: A study of four Swedish
infants during the first eighteen months of life. Journal of Child
Language, 17:19-40.
Rousset, I.
2004 Structures syllabiques et lexicales des langues du monde: Données,
typologies, tendances universelles et contraintes substantielles. Un-
published doctoral dissertation, Université Stendhal, Grenoble,
France.
Saffran, J.R., Aslin, R.N. & Newport, E.L.
1996 Statistical learning by 8-month old infants. Science, 274: 1926-1928.
Smith, B.L., Brown-Sweeney, S. & Stoel-Gammon, C.
1989 A quantitative analysis of reduplicated and variegated babbling. First
Language, 17: 147-153.
Stark, R. E.
1980 Stages of speech development in the first year of life. In G. H. Yeni-
Komshian & J. F. Kavanagh & C. A. Ferguson (Eds.), Child phonol-
ogy 1: Production. New York, NY: Academic Press, pp. 73-91.
Stoel-Gammon, C.
1985 Phonetic inventories 15-24 months - a longitudinal study. Journal of
Speech and Hearing Research, 28:505-512.
Stoel-Gammon, C. & Cooper, J.
1984 Patterns of early lexical and phonological development. Journal of
Child Language, 11:247-271.


Stoel-Gammon, C. & Harrington, P.
1990 Vowel systems of normally developing and phonologically disor-
dered infants. Clinical Linguistics and Phonetics, 4:145-160.
Teixeira, E. R. & Davis, B. L.
2002 Early sound patterns in the speech of two Brazilian Portuguese
speakers. Language and Speech, 45(2):179-204.
Thevenin, D.M., Eilers, R.E., Oller, D.K. & Lavoie, L.
1985 Where is the drift in babbling? A cross-linguistic study. Applied
Psycholinguistics, 6:1-15.
Tyler, A. A. & Langsdale, T. E.
1996 Consonant-vowel interaction in early phonological development.
First Language, 16:159-191.
Vihman, M. M., Macken, M. A., Miller, R., Simmons, H. & Miller, J.
1985 From babbling to speech: A reassessment of the continuity issue.
Language, 61:397-445.
Vihman, M. M., Ferguson, C. A. & Elbert, M. F.
1986 Phonological development from babbling to speech: Common ten-
dencies and individual differences. Applied Psycholinguistics, 7:3-
40.
Vihman, M. M.
1992 Early syllables and the construction of phonology. In C. A. Ferguson
& L. Menn & C. Stoel-Gammon (Eds.), Phonological development:
Models, research, implications. Monkton, MD: York Press, pp. 393-
422.
Werker, J. F. & Lalonde, C. E.
1988 Cross language speech perception: initial capabilities and develop-
mental change. Developmental Psychology, 24:672-683.



Index


!Xóõ, 114, 122, 127, 128
!Xũ, 56

abstract phonological categories,
193, 195, 200, 202, 204
abstractionist model, 10, 193, 194,
197, 198, 209
acoustic tube, 62, 77
aerodynamic factors, 126, 129
aerodynamic principle, 52, 133
Afar, 114, 116, 117, 127
agent model, 309, 314, 317
Algerian, 357
allophonic variation, 49, 50, 195,
196
ambient language, 12-14, 242, 243,
310, 315, 321, 329, 330, 342,
343, 353, 354, 356-358, 368-370
Andoke, 96
anti-phase, 13, 36, 302-308, 310,
314-317, 319-322
Arabic, 61, 95, 356, 358, 359
Egyptian, 94, 95
Tunisian, 353, 358
area function, 63
Arrernte, 299, 315, 322
articulatory gestures, 12, 32, 36, 88,
166
Articulatory Phonology, 31, 225,
300
assimilation, 30, 38, 101, 122, 187,
194, 196, 342
asymmetry, 7, 12, 49, 50, 115, 119,
120, 262, 274, 308, 315
attractor, 10, 13, 164, 167, 204, 205,
208, 209, 227, 232, 304, 305,
307, 309
attunement, 300, 310, 312, 314, 320
auditory contrast, 112
auditory similarity, 48
Australian language, 315
Austronesian, 89

babbling
canonical, 13, 353-356, 359, 368
reduplicated, 355, 360, 366
variegated, 219, 355, 358, 366,
369
Bahnaric, 335
basic segments, 89-91
basin of attraction, 205, 306
Berber, 49
Birom, 89-91
brain activation levels, 103
Brazilian-Portuguese, 355

Cantonese, 100, 356, 357
categorical
pattern, 38
perception, 29, 243, 244
phonetics, 29, 30
phonology, 29, 30
Cayuga, 100
chain shifts, 337, 340, 341
Chinese, 100, 101, 270
Chipewyan, 150-152
clusters, 1, 33, 35-37, 87, 93, 98,
120, 126, 128, 300, 307, 308,
317, 318, 320-323, 331, 333-335
coarticulation, 7, 30, 38, 60, 76, 130,
132, 196
cohesion, 124, 155, 158, 159, 161,
162, 163, 164, 166
Comanche, 98, 99
complex system, 4-6, 111, 141, 142,
144, 166, 174, 330, 353
complexity
articulatory, 2, 88, 89, 92
coding, 22
emergent, 353
inherent, 7, 8, 85, 87, 89, 92
interactional, 165, 166
measure of, 8, 9, 25, 26, 38-40,
153
metric, 25
off-diagonal, 149, 152, 153, 155
organized, 142, 145
scale of, 7, 93, 141
score, 88, 97
segmental, 91
structural, 13, 143, 149, 150, 152,
153, 165, 330
syllabic, 93, 98
vocal, 355
constraints, 2, 4, 5, 10, 12, 31-34, 57,
93, 104, 111, 112, 115, 120, 131,
133, 144, 155, 163, 165, 167,
171, 175-177, 179, 180, 184, 186,
187, 242, 259, 262, 269, 270,
287, 309, 310, 321, 329
coupled oscillator model, 300-302,
319, 323
coupled oscillators, 301
coupling graph, 302, 303
coupling hypothesis, 244, 305, 306
critical boundary, 206, 207
critical states, 174
Czech, 50, 123

Darai, 98, 99
data-driven approach, 25, 27
deductive approach, 27, 34, 59, 62,
77
degrees of freedom, 4, 143
diachronic change, 32, 34, 37, 40,
219, 221, 232, 236
Dinka, 162
dispersion, 158, 187, 243, 257, 259
Distance models, 242, 243
distinctive region, 63, 245, 246, 259,
260
Distinctive Region Model, 63, 245,
246
distinctiveness, 2, 10, 48, 69, 112,
113, 175-177, 179, 180, 184, 187,
188
Dual Route Cascaded Model, 269
Dutch, 50, 123, 183, 184, 267, 270,
300, 322, 329, 331, 332, 334,
337, 343, 344, 353, 358-366, 369-
371
dynamic approach, 7, 68, 75-77
Dynamic Field Theory, 225, 226
dynamical model, 10-12, 224, 231,
235-238
dynamical systems, 3, 10, 193, 204,
209, 219, 227, 299

ease of learning, 103
economy, 2, 7, 40, 132, 144, 147,
155, 165, 166
efficiency constraints, 175-177, 179,
184
effort
articulatory, 1, 24, 104, 187, 188
disambiguation, 171-174
index of, 103
memory, 171-174
principle of least, 23, 171-173,
179
elementary units, 144
emergent behavior, 174
emergent properties, 4, 143
English, 10, 22, 29-31, 37, 47-56,
97, 123, 126, 172, 175-177, 181-
186, 196, 247, 250, 267, 269,
270, 275, 300, 307, 308, 315,
322, 339, 354-357, 369
American, 10, 50-52
British, 196
enhanced contrast, 206, 208
entropy, 150, 173, 174, 176, 179,
186
evolutionary dynamics, 163
evolutionary process, 63
exemplar model, 193-195, 198-200,
219, 224, 231, 232, 235, 236
expected frequency, 24, 116

Featurally Underspecified Lexicon,
194
feature couplings, 244
Fijian, 98
fine phonetic detail (FPD), 193, 195-
198, 200, 204, 209
Finnish, 114, 122, 127, 335
Finno-Ugric, 335
Firthian Prosodic Analysis, 200
fitness measure, 163
focalization, 92
Frame/Content Theory, 8, 309, 310,
323, 354, 356, 365, 366, 369
Free Dispersion model, 249
French, 11, 14, 47, 48, 62, 64, 95,
96, 113, 114, 117, 122-125, 127-
129, 132, 201, 202, 204, 206,
209, 215, 244, 246-258, 260, 262,
269-271, 276, 278, 283, 329, 353,
355-366, 370, 371
Parisian, 202
southern, 201, 202
standard, 201, 202
frequency count, 24
Fuzhou, 101

Gallo-Italic, 126
Gbaya, 98, 99
Georgian, 33, 35, 36, 37
German, 96, 183, 184, 250, 269, 300
Germanic, 219, 358, 359
gestural score, 225, 226, 302, 303
gestures, 12, 13, 36, 62, 112-114,
123, 124, 132, 197, 225, 226,
300-303, 305, 307, 308, 310-312,
314, 318, 321, 323
gradient
effects, 28, 30
patterns, 29, 30, 38
phonetics, 29, 30
phonology, 29, 30, 31
graph theory, 3, 9, 145, 165
Greek (Koine), 220
Guarani, 98

Hausa, 93, 94
Hawaiian, 94, 96
Hebbian learning model, 310
Hebrew, 176
hierarchical organisation, 267, 275-
277, 281-283, 285, 289
Hindi, 47, 48
Huastec, 93
Hungarian, 11, 220, 247-262
hysteresis, 10, 206-209

Indo-Aryan, 48
Indonesian, 89-91
information content, 21, 22, 24, 25
information theoretic analysis, 173,
175, 179
inhibitory effects, 268
innate, 35, 60
in-phase, 13, 36, 37, 302-308, 310,
314-317, 321
input statistics, 329, 332
Interactive Activation Model, 272
intergestural coordination, 24
intergestural timing, 299, 300, 301
intrinsic dynamic constraints, 300
Italian (Southern), 220

Japanese, 30, 49, 54, 55, 96, 355

Kadazan, 98, 99
Kannada, 114, 127
Kanuri, 114, 122, 127
Kikuyu, 244
Kiliwa, 93
Kiowa, 89, 90, 91
Kiowa-Tanoan, 89
Klamath, 335
Koiari, 94
Korean, 355
Kuku Yalanji, 96
Kwakwala, 114, 127

Latin, 48
lawful contextual variation, 49
lenition, 11, 219-224, 231, 233-236
Lexical Access from Spectra model,
195
listener's knowledge, 54

Malayalam, 308
Mandinka, 98, 99
Maori, 96
markedness, 1, 6, 21-25, 27, 33, 34
Marshallese, 308
Maximal Use of Available Features,
47, 144, 187
Maybrat, 96
Mazateco, 335
measure of coherence, 9, 145
memory constraints, 171
minimum of energy, 63
modular approach, 30
modular models, 28, 31
motor control abilities, 339
Motor theory of speech perception,
241

naturalness, 6, 21, 22, 24, 28, 32-35,
37, 38
Navaho, 114, 122, 127
Neighborhood Activation Model,
272
Ngizim, 114, 127
Niger-Congo, 89
Nilotic, 162
Noon, 96
normalization, 59, 60, 68, 72, 76, 77,
158, 163, 194, 199
Northern Khmu, 96
Nubian, 50
Nyah kur, 114, 127

opposing phases, 173, 174
Optimality Theory, 33
Oto-Manguean, 335

Parauk, 99
Penutian, 335
perception units, 38, 39
perceptual
distance, 159
overshoot, 59
predispositions, 241-243, 248,
249, 262
salience, 89
phase-resetting, 301
phoneme detectors, 271-273, 276,
277, 281, 282, 286, 287
phonetic feature detectors, 271
phonetic imitation, 197
phonetic knowledge, 33-35, 37, 64,
287
phonetic priming, 268, 271, 273,
281, 286
phonetic variability, 37, 221
phonological priming, 268, 270
phonological principle, 200
phonologization, 34, 35, 36, 38
Pirahã, 86, 97
Polish, 100
Polynesian languages, 47
preferred syllabic patterns, 111
primary distinctive features, 56
priming effect, 267, 268, 271, 272,
277-280, 282, 285, 286, 288
primitives, 6, 7, 27, 147, 195
Proto-Gaelic, 220
Proto-Indo-European, 48, 219
pure-frame syllables, 120

Quantal Theory, 31, 92
Quechua, 114, 117, 122, 127, 355

Raeto-Romance, 126
Ratak, 308
rate of transition, 7, 73, 75
reaction times, 102
Recurrent Neural Network, 195
redundancy, 144
relative phasing, 301, 303, 304, 310
representation
abstract, 194
allophonic, 26
articulatory, 260
cognitive, 39, 40
dynamic, 59
lexical, 195, 198, 222, 229, 236,
341, 342, 346
motor, 199, 238, 260
perceptual, 260
phonological, 10, 11, 23, 193,
194, 199-202, 208, 209, 219,
269, 329, 345
vowel, 7, 59
Romance, 358, 359
Romanian, 353, 358-361, 365, 366,
369-371
Rotokas, 56, 97
Russian, 96, 183, 184

Sanskrit, 48
scale-free network, 145, 176, 178,
180, 182, 187
scaling law, 171-174, 176-178, 180,
181, 183, 184, 186, 187
secondary distinctive features, 56
Sedang, 335
segment inventories, 40, 47, 48, 49,
56
segmental difficulty, 103
self-organization, 5, 111, 299, 300,
309, 310, 315
Seneca, 100
sensorimotor capacities, 9, 111, 115
Serbo-Croatian, 270
Slavic, 126
Sonority Sequencing Principle
(SSP), 115, 126, 127, 128
Sora, 114, 127
sound change, 34, 35, 48, 56, 57,
144, 219, 220, 221, 237
sound patterns, 9, 21, 27, 32, 39,
120, 336, 345
sound systems, 6, 21, 27, 30, 31, 39,
112, 113, 197
Spanish, 123, 183, 184, 244, 300,
315, 356
Mexican, 300
speech errors, 24, 113, 301
stability, 9, 30, 125, 145, 155, 163,
164, 166, 208, 209, 258, 274, 306
statistical approach, 329-333, 335,
343
Swedish, 56, 114, 355, 357
syllable structure, 36, 87, 96, 299,
300, 305, 309, 321-323, 332, 344
syllable types, 40, 98, 329, 331-334
symmetry, 7, 26, 32, 49, 162, 165
synchronic processes, 37
systemic compatibility, 144

target-interpolation model, 31
targets, 7, 11, 31, 59, 60, 62, 64, 68,
76, 77, 231, 271, 276-278, 357
TCDK model, 204, 207-209
Telugu, 300
temporal organization, 61
Thai, 47, 50, 52, 93, 94, 114, 116,
117, 122, 127
theory-driven approach, 25, 26
Tibetan, 99
Lhasa, 98, 99
Tlingit, 93
Totonac, 97, 98
TRACE model, 197, 272
transitions, 7, 11, 59, 60, 62, 64, 66-
68, 74, 76, 77, 174, 196, 245,
246-248, 250, 253, 255, 257-261,
263
transparency, 8, 85, 101, 182
Tukang Besi, 94
Tulu, 96
Turkish, 30, 353, 358-361, 363-366,
370, 371

ULSID, 8, 111, 113-115, 117, 120,
122, 126-128, 131
underspecification, 29
Universal Grammar, 33
UPSID, 2, 9, 26, 40, 91, 113, 143,
152-162, 164-166, 262
Ural-Altaic, 358, 359

Verbal Transformation Effect, 124
Verification Model, 269
Verlan, 113
Vietnamese, 61, 114, 120, 122
vowel inherent spectral changes, 61,
67
vowel reduction, 59, 61, 69

Wa, 98, 99, 100, 114, 122, 127, 128

Yélî Dnye, 93
Yoruba, 30, 93, 98, 355
Yupik, 98, 99, 114, 127, 128

Zipf's law, 172, 174-176, 178-180
Zulu, 47





List of Contributors

Nathalie Bedoin
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Brandon C. Beltz
George Mason University, USA

René Carré
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Ioana Chitoran
Dartmouth College, USA

Abigail C. Cohn
Cornell University, USA

Christophe Coupé
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Barbara L. Davis
University of Texas at Austin, USA

Adamantios Gafos
New York University & Haskins
Laboratories, USA

Christian Geng
Center for General Linguistics, Typology and Universals Research (ZAS), Berlin, Germany

Louis Goldstein
University of Southern California &
Haskins Laboratories, USA

Christopher T. Kello
Cognitive Science Program
University of California, Merced,
USA

Sophie Kern
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Christo Kirov
New York University, USA

Sonia Krifi
Hôpital Debrousse, Lyon, France

Ian Maddieson
University of California, Berkeley
and University of New Mexico, USA

Egidio Marsico
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Hosung Nam
Haskins Laboratories, USA




Noël Nguyen
Laboratoire Parole et Langage, Aix-Marseille Université & CNRS, France

John J. Ohala
University of California, Berkeley,
USA

François Pellegrino
Laboratoire Dynamique Du Langage, Université de Lyon & CNRS, France

Yvan Rose
Memorial University of Newfoundland, Canada

Solange Rossato
GIPSA-Lab, Département Parole et Cognition, INPG, Université Stendhal, Université Joseph Fourier, Grenoble, France

Isabelle Rousset
GIPSA-Lab, Département Parole et Cognition, INPG, Université Stendhal, Université Joseph Fourier, Grenoble, France

Elliot Saltzman
Boston University & Haskins Laboratories, USA

Willy Serniclaes
Laboratoire de Psychologie de la Perception, CNRS & Université Paris Descartes, France


Betty Tuller
Center for Complex Systems and
Brain Sciences, Boca Raton, FL,
USA

Nathalie Vallée
GIPSA-Lab, Département Parole et Cognition, INPG, Université Stendhal, Université Joseph Fourier, Grenoble, France

Sophie Wauquier
Laboratoire Structures formelles du langage, Université de Paris 8 & CNRS, France
