
Tones and Features

De Gruyter Mouton
Studies in Generative Grammar 107
Editors
Harry van der Hulst
Jan Koster
Henk van Riemsdijk
Tones and Features
Phonetic and Phonological Perspectives
edited by
John A. Goldsmith, Elizabeth Hume,
and W. Leo Wetzels
De Gruyter Mouton
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
ISBN 978-3-11-024621-6
e-ISBN 978-3-11-024622-3
ISSN 0167-4331
Library of Congress Cataloging-in-Publication Data
Tones and features : phonetic and phonological perspectives / edited by John A.
Goldsmith, Elizabeth Hume, Leo Wetzels.
p. cm. (Studies in generative grammar; 107)
Includes bibliographical references and index.
ISBN 978-3-11-024621-6 (alk. paper)
1. Phonetics. 2. Grammar, Comparative and general--Phonology.
I. Goldsmith, John A., 1951– II. Hume, Elizabeth V., 1956– III. Wetzels, Leo.
P217.T66 2011
414'.8--dc23
2011030930
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.
© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: RefineCatch Ltd, Bungay, Suffolk
Printing: Hubert & Co. GmbH & Co. KG, Göttingen

Printed on acid-free paper


Printed in Germany
www.degruyter.com
Contents
Preface vii
John Goldsmith, Elizabeth Hume and W. Leo Wetzels
1. The representation and nature of tone
Do we need tone features? 3
G. N. Clements, Alexis Michaud and Cédric Patin
Rhythm, quantity and tone in the Kinyarwanda verb 25
John Goldsmith and Fidèle Mpiranya
Do tones have features? 50
Larry M. Hyman
Features impinging on tone 81
David Odden
Downstep and linguistic scaling in Dagara-Wulé 108
Annie Rialland and Penou-Achille Somé
2. The representation and nature of phonological features
Crossing the quantal boundaries of features: Subglottal
resonances and Swabian diphthongs 137
Grzegorz Dogil, Steven M. Lulich,
Andreas Madsack, and Wolfgang Wokurek
Voice assimilation in French obstruents: Categorical or gradient? 149
Pierre A. Hallé and Martine Adda-Decker
An acoustic study of the Korean fricatives /s, s'/:
implications for the features [spread glottis] and [tense] 176
Hyunsoon Kim and Chae-Lim Park
Autosegmental spreading in Optimality Theory 195
John J. McCarthy
Evaluating the effectiveness of Unified Feature Theory
and three other feature systems 223
Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume
Language-independent bases of distinctive features 264
Rachid Ridouane, G. N. Clements and Rajesh Khatiwada
Representation of complex segments in Bulgarian 292
Jerzy Rubach
Proposals for a representation of sounds based on their
main acoustico-perceptual properties 306
Jacqueline Vaissière
The representation of vowel features and vowel
neutralization in Brazilian Portuguese (southern dialects) 331
W. Leo Wetzels
Index 361
Preface
The papers in this volume are all concerned with two current topics in
phonology: the treatment of features, and the treatment of tone. Most of them
grew out of a conference at the University of Chicago's Paris Center in June
of 2009 which was organized by friends and colleagues of Nick Clements in
tribute to decades of contributions that he had made to the field of phonology,
both in the United States and in France. Nick's work served as a natural focus
for the discussions and interactions that resulted in the papers that the reader
will find in this book. We, the editors, would like to say a bit about Nick's
career and his work in order to set the context.
1. G. N. Clements
Nick was an undergraduate at Yale University, and received his PhD from
the School of Oriental and African Studies, University of London, for a
dissertation on the verbal syntax of Ewe in 1973, based on work that he did in
the field. In the 1970s, he spent time as a post-doctoral scholar at MIT and then
as a faculty member in the Department of Linguistics at Harvard University.
Throughout this period he published a series of very influential articles and
books on areas in phonological theory, a large portion of which involved
linguistic problems arising out of the study of African languages. His work
in this period played an essential role in the development of autosegmental
phonology, and his work in the 1980s, when he was a professor of linguistics
at Cornell University, was crucial in the development of many of the current
views on features, feature geometry, sonority, and syllabification. He worked
closely with students throughout this time (including one of us, Elizabeth
Hume) at Cornell. He also co-wrote books with several phonologists
(Morris Halle, Jay Keyser, John Goldsmith) and collaborated on many
research projects.
In 1991, Nick moved to Paris, where he and his wife, Annie Rialland,
worked together on projects in phonetics, phonology, and many other things,
both linguistic and not. Visiting Nick in Paris became an important thing for
phonologists to do when they had the opportunity to come to Paris. Over the
next twenty years or so Nick continued to work selflessly and generously
with students and more junior scholars, and was widely sought as an invited
speaker at conferences.
Nick passed away a few months after the conference, late in the summer
of 2009. Many of his friends (and admirers) in the discipline of phonology
had been able to express their admiration for his contributions through their
papers and their kind words at the time of the conference in June. This book is
offered as a more permanent but equally heartfelt statement of our affection
and respect for Nick's work in phonology and in linguistics more broadly.
2. Tone
The proper treatment of tonal systems has long been an area of great activity
and curiosity for phonologists, and for several reasons. Tonal systems appear
exotic at first blush to Western European linguists, and yet are common
among languages of the world. The phonology of tone is rich and complex, in
ways that other subdomains of phonology do not illustrate, and yet each step
in our understanding of tonal systems has shed revelatory light on the proper
treatment of other phonological systems. At every turn, tonal systems stretch
our understanding of fundamental linguistic concepts: many languages
exhibit tonal contrasts, in the sense that there are lexical contrasts that are
physically realized as different patterns of fundamental frequency distributed
globally over a word. But from a phonological point of view, words are not
unanalyzable: far from it; they are composed in an organized fashion from
smaller pieces, some mixture of feet, syllables, and segments. Breaking a
pitch pattern (when considering an entire word) into pieces that are logically
related to phonological or morphological subpieces (which is ultimately
ninety percent of a phonologist's synchronic responsibility) has proven time
and time again to be an enormous challenge in the arena of tone. One of
the classic examples of this challenge can be found in Clements and Ford's
paper (1979) on Kikuyu tone. In Kikuyu, the surface tone of each syllable
is essentially the expression of the previous syllable's tonal specification.
Each syllable (often, though not always, a distinct morpheme) thus has an
underlying (we are tempted to say, a logical) tone specification, but that
specification is realized just slightly later in the word than the syllable that
comprises the other part of the underlying form. Morphemes in such a system
show utter disregard for any tendency to try to be realized in a uniform way
across all occurrences; tones seem to assert their autonomy and the privileges
that come with that, and use it to produce a sort of constant syncopation in
the beat of syllable against tone.
Is tone, then, different from other phonological features? This question
is directly posed by three papers in this volume, that by Nick Clements and
colleagues, that by Larry Hyman, and that by David Odden. Each is written
with the rich background of several decades of research on languages (largely
African tone languages, at least as far as primary research is concerned, but also
including the fruits of research done on Asian languages over decades as well). In
the end, Clements, Michaud, and Patin conclude that tonal features may well be
motivated in our studies of tonal systems, but the type of motivation is different
in kind from that which is familiar from the study of other aspects of phonology.
Hyman, for his part, is of a similar conviction: if tones are analyzed featurally
in the ultimate model of phonology, it is not a step towards discovering ultimate
similarity between tone and every other phonological thing: tone's diversity
in its range of behavior keeps it distinct from other parts of phonology. David
Odden's chapter also focuses on the motivation for tonal features. However, his
focus is on the types of evidence used to motivate a given feature. Along these
lines, he argues that tonal features, like other phonological features, are learned
on the basis of phonological patterning rather than on the basis of the physical
properties of the sounds (for related discussion, see Mielke 2008).
Goldsmith and Mpiranya's contribution addresses not features for tone,
but rather one particular characteristic of tone that keeps it distinct from other
aspects of phonology: tone's tendency to shift its point of realization (among a
word's syllables) based on a global metrical structure which is erected on the
entire word. This is similar to the pattern we alluded to just above in Kikuyu,
but in Kinyarwanda, certain High tones shift their autosegmental association
in order to appear in weak or strong rhythmic positions: a bit of evidence that
rhythmicity is an important organizing principle of tonal assignment, in at
least some languages, much like that seen in accent assignment and rarely, if
ever, seen in other aspects of a phonological system.
The theme of rhythmicity is continued in the paper by Annie Rialland and
Penou-Achille Somé. They hypothesize that there is a relationship between the
linguistic scaling in Dagara-Wulé, as manifested in downstep sequences, and the
musical scaling in the same culture, as found in an eighteen-key xylophone. They
suggest that downstep scaling and xylophone scaling may share the property of
being comprised of relatively equal steps, defined in terms of semitones.
3. Features
The hypothesis that the speech chain can be analyzed as a sequence of
discrete segments or phonemes, themselves decomposable into a set of
phonological features, has been at the core of almost a century of research
in the sound structure of human language. By virtue of their contrastive
nature, phonological features function as the ultimate constitutive elements
of the sound component in the sound-to-meaning mapping, while, being
both restricted in number at the individual language level and recurrent
across languages, their intrinsic characteristics are often associated with
general properties of human anatomy and physiology. Apart from being
distinctive, phonological features appear to be economical in the way they
combine to construct phoneme systems and to express, individually or in
combination, the regularity of alternating sound patterns, both historically
and synchronically.
It was discovered by Stevens (1972) that small articulator movements in
specific areas of the articulatory space may lead to large acoustic changes,
whereas, in other regions, relatively large movements lead to only minor
acoustic variations. Stevens' quantal model of distinctive features forms the
theoretical background of the study by Dogil and his colleagues, who discuss
the function of subglottal resonances in the production and perception of
diphthongs in a Swabian dialect of German. It is observed that Swabian
speakers arrange their formant movements in such a way that the subglottal
resonance region is crossed in the case of one diphthong and not the other.
In Stevens' model, the defining acoustic attributes of a feature are a direct
consequence of its articulatory definition. The relation between articulation
and acoustics is considered to be language-independent, although a feature
may be enhanced language-specifically to produce additional cues that aid
in its identification. As required by the naturalness condition, phonological
features relate to measurable physical properties. Therefore, to the extent
that features can be shown to be universal, it is logical to ask what the
defining categories of a given feature are that account for the full range of
speech sounds characterized by it. This problem is explicitly addressed in
the chapter by Ridouane, Clements, and Khatiwada, who pose the question
of how [spread glottis] segments are phonetically implemented, and propose
a language-independent articulatory and acoustic definition of this feature.
Also following the insights of Stevens' quantal theory, Vaissière elaborates
a phonetic notation system based on the combination of acoustic and
perceptual properties for five reference vowels and discusses its advantages
over Jones' articulation-based referential system of cardinal vowels. Kim and
Park address the issue of how the opposition between the Korean fricatives
/s, s'/ is best characterized in phonetic terms. From their acoustic data they
conclude that the most important parameter that distinguishes these sounds
is frication duration, which is significantly longer in /s'/ than in /s/. They
propose that this difference is best expressed by reference to the feature
[tense].
Discovering the smallest set of features able to describe the world's sound
patterns has been a central goal of phonological theory for close to a century,
leading to the development of several different feature theories. The chapter
by Mielke, Magloughlin, and Hume compares the effectiveness of six theories
to classify actually occurring natural and unnatural classes of sounds. They
show that a version of Unified Feature Theory (Clements and Hume 1995)
with binary place features, as suggested by Nick Clements in 2009, performs
better than other proposed theories.
Another important topic in feature research concerns the relation between
the feature structure of phonological representations and phonological
processes or constraints. How are segments, morphemes or words represented
in terms of their feature composition, and which features pattern together
in phonological processes and bear witness to their functional unity? Hallé
and Adda-Decker study the latter question by examining whether voice
assimilation in French consonant clusters is complete or partial. They show
that, of the acoustic parameters involved in the assimilation process, voicing
ratios change categorically, whereas secondary voicing cues remain totally
or partially unaffected. They propose to describe voicing assimilation in
French as a single-feature operation affecting the [voice] feature. Rubach
addresses the question whether palatalized and velarized consonants should
be treated as complex or as simplex segments in terms of their geometrical
representation. Looking at Bulgarian data, he concludes that palatalization
as well as velarization on coronals and labials are represented as separate
secondary articulations. In his study on mid-vowel neutralizations in Brazilian
Portuguese, Wetzels argues for a gradient four-height vowel system for this
language. The interaction between vowel neutralization and independent
phonotactic generalizations suggests that vowel neutralization cannot be
represented as the simple dissociation from the relevant contrastive aperture
tier, but is best expressed by a mechanism of marked-to-unmarked feature
substitution. McCarthy's paper provides a detailed discussion of how vowel
harmony should be accounted for in Optimality Theory. Since proposals for
dealing with vowel harmony as embedded in parallel OT make implausible
typological predictions, he proposes a theory of Serial Harmony that contains
a specific proposal about the constraint that favors autosegmental
spreading within a derivational harmonic serialism approach to phonological
processes.
In addition to the authors noted above and the participants at the 2009
Paris symposium, we would like to acknowledge others who contributed
to this tribute to our friend and colleague, Nick Clements. The University
of Chicago generously provided its Paris Center where the symposium was
held, and we would like to thank Françoise Meltzer and Sébastien Greppo,
Director and Administrative Director of the Paris Center, respectively, for
their invaluable assistance in organizing the event. We are also grateful to
Deborah Morton of The Ohio State University Department of Linguistics
for editorial help in preparing the manuscripts for publication, and to Julia
Goldsmith for her assistance in creating the index. Likewise, our appreciation
extends to the editorial staff at Mouton de Gruyter, including Julie Miess,
and the late Ursula Kleinhenz for her enthusiastic support of this project.
John A. Goldsmith, Elizabeth Hume, W. Leo Wetzels
References
Clements, G.N. and Kevin C. Ford
1979 Kikuyu tone shift and its synchronic consequences. Linguistic Inquiry
10 (2): 179–210.
Clements, G.N. and Elizabeth Hume
1995 The internal organization of speech sounds. In John A. Goldsmith
(ed.), The Handbook of Phonological Theory, 245–306. Oxford:
Blackwell.
Mielke, Jeff
2008 The Emergence of Distinctive Features. Oxford: Oxford University
Press.
Stevens, K.N.
1972 The quantal nature of speech: Evidence from articulatory-acoustic
data. In P.B. Denes and E.E. David Jr. (eds.), Human Communication:
A Unified View, 51–66. New York: McGraw-Hill.
1. The representation and nature of tone
Do we need tone features?
G.N. Clements, Alexis Michaud, and Cédric Patin
Abstract. In the earliest work on tone languages, tones were treated as
atomic units: High, Mid, Low, High Rising, etc. Universal tone features
were introduced into phonological theory by Wang 1967 by analogy to the
universal features commonly used in segmental phonology. The implicit
claim was that features served the same functions in tonal phonology as in
segmental phonology. However, with the advent of autosegmental phonology
(Goldsmith 1976), much of the original motivation for tone features
disappeared. Contour tones in many languages were reanalyzed as sequences
of simple level tones, calling into question the need for tonal features such
as [falling]. Processes of tone copy such as L(ow) → H(igh) / __ H(igh)
were reinterpreted as tone spreading instead of feature assimilation. At about
the same time, a better understanding of downstep emerged which allowed
many spurious tone levels to be eliminated. As a result, in spite of the vast
amount of work on tone languages over the past thirty years, the number
of phenomena that appear to require tone features has become significantly
reduced, raising the issue of whether the notion of tone features is at all useful.
This paper first reviews the basic functions for which segmental features
have been proposed, and then examines the evidence that tone features are
needed to serve these or other functions in tone languages. The discussion
focuses successively on level tones, contour tones, and register, building on
examples from Africa and Asia. Our current evaluation of the evidence is
that tone features, to the extent that they appear motivated at all, do not serve
the same functions as segmental features.
1. Introduction
In this introduction, we review criteria that are commonly used in feature
analysis in segmental phonology, and suggest that these criteria have not, in
general, been successfully extended to tonal phonology.
Some important functions of features in segmental phonology are
summarized in Table 1.
4 G.N. Clements, Alexis Michaud, Cdric Patin
Table 1. Some common functions of features in segmental phonology

Function                                          example (segments)
distinctive:    distinguish phonemes/tonemes      /p/ and /b/ are distinguished
                                                  by [voice]
componential:   define correlations (sets         [-voiced] p t c k
                distinguished by one feature)     [+voiced] b d j g
classificatory: define natural classes (rule      [-sonorant] sounds are devoiced
                targets, rule contexts)           word-finally
dynamic:        define natural changes            obstruents become [+voiced]
                (such as assimilation)            before [+voiced] consonants
It is usually held, since the work of Jakobson, Fant and Halle (1952), that
one small set of features largely satises all functions. We have illustrated
this point by using the feature [voiced] in the examples above. It is also
usually believed that each feature has a distinct phonetic denition at the
articulatory or acoustic/auditory level, specic enough to distinguish it from
all other features, but broad enough to accommodate observed variation
within and across languages. In this sense, features are both concrete and
abstract.
With very few exceptions, linguists have also maintained that features
are universal, in the sense that the same features tend to recur across lan-
guages. Thus the feature [labial] is used distinctively to distinguish
sounds like /p/ and /t/ in nearly all languages of the world. Such recur-
rence is explained by common characteristics of human physiology and
audition.
Although all the functions in Table 1 have been used in feature analysis
at one time or another, the trend in more recent phonology has been to
give priority to the last two functions: classificatory and dynamic. We will
accordingly give these functions special consideration here.
Feature theory as we understand it is concerned with the level of
(categorical) phonology, in which feature contrasts are all-or-nothing, rather
than gradient. Languages also have patterns of subphonemic assimilation
or coarticulation which adjust values within given phonological categories.
Such subphonemic variation does not fall within the classical functions of
features as summarized in Table 1, and it should be obvious that any attempt
to extend features into gradient phenomena runs a high risk of undermining
other, more basic functions, such as distinctiveness.
Do we need tone features? 5
Traditionally, rather high standards have been set for confirming proposed
features or justifying new ones. The most widely-accepted features have
been founded on careful study of evidence across many languages. Usual
requirements on what counts as evidence for any proposed feature analysis
include those in (1).
(1) a. phonetic motivation: processes cited in evidence for a feature are
phonetically motivated.
b. recurrence across languages: crucial evidence for a feature must
be found in several unrelated languages.
c. formal simplicity: the analyses supporting a given feature are
formally and conceptually simple, avoiding multiple rules, brackets
and braces, Greek letter variables, and the like.
d. comprehensiveness: analyses supporting a given feature cover all
the data, not just an arbitrary subset.
Proposed segmental features that did not receive support from analyses
meeting these standards have not generally survived (many examples can be
cited from the literature).
The case for tone features, in general, has been much less convincing
than for segmental features. One reason is that much earlier discussion was
vitiated by an insufficient understanding of:
– autosegmental properties of tone: floating tones, compositional
  contour tones, toneless syllables, etc.
– downstep: for example, !H tones (downstepped High tones) being
  misinterpreted as M(id) tones
– intonational factors: downdrift, final lowering, overall declination
– contextual variation, e.g. H(igh) tones are often noncontrastively
  lower after M(id) or L(ow) tones
As a result, earlier analyses proposing assimilation rules must be reexamined
with care. Our experience in the African domain is that most, if not all, do not
involve formal assimilation processes at all.
A second reason, bearing on more recent analysis, is that the best
arguments for tone features have often not satised the requirements shown
in (1). Feature analyses of tonal phenomena, on close examination, very often
prove to be phonetically arbitrary; idiosyncratic to one language; complex
(involving several rules, Greek-letter variables, abbreviatory devices, etc.);
and/or noncomprehensive (i.e. based on an arbitrary selection of
cherry-picked data).
A classic example in the early literature is Wang's celebrated analysis
of the Xiamen tone circle (Wang 1967; see critiques by Stahlke 1977,
Chen 2000, among others). Wang devised an extremely clever feature
system which allowed the essentially idiosyncratic tone sandhi system of
Xiamen to be described in a single (but highly contrived) rule in the style of
Chomsky & Halle 1968, involving angled braces, Greek letter variables, etc.
Unfortunately, the analysis violated criteria (1a–c), viz. phonetic motivation,
recurrence across languages, and formal simplicity. As it had no solid
crosslinguistic basis, it was quickly and widely abandoned.
The following question can and should be raised: when analyses not
satisfying the criteria in (1) are eliminated, do there remain any convincing
arguments for tone features?
2. The two-feature model
Though there have been many proposals for tone feature sets since Wang's
pioneering proposal (see Hyman 1973, Anderson 1978), recent work on
this topic has converged on a model which we will term the Two-Feature
Model.
In its essentials, and abstracting from differences in notation and
terminology from one writer to another, the Two-Feature Model posits
two tone features, one dividing the tone space into two primary registers
(upper and lower, or high and low), and the other dividing each primary
register into secondary registers. The common core of many proposals since
Yip [1980] 1990 and Clements 1983 is shown in (2). This model applies
straightforwardly to languages that contrast four level tones.
(2)              top    high    mid    low
    register      H      H       L      L
    subregister   h      l       h      l
We use the conventional terms 'top', 'high', 'mid', and 'low' for the four
tones of the Two-Feature Model in order to facilitate comparison among
languages in this paper. The model outlined in (2) analyzes these four tones
into two H-register tones, top and high, and two L-register tones, mid and
low. Within each of these registers, the subregister features, as we will call
them, divide tone into subregisters; thus the top and high tone levels are
assigned to the higher and lower subregisters of the H register, and the mid
and low tones are likewise assigned to the higher and lower subregisters of
the L register.
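The decomposition in (2) is small enough to state exhaustively. The following sketch (ours, not the chapter's; all identifiers are our own illustrative choices) writes out the four tones as feature pairs and shows how each feature value picks out a two-member natural class:

```python
# Illustrative sketch of the Two-Feature Model's decomposition in (2).
# The names TONES and natural_class are our own, hypothetical choices.

TONES = {
    # tone: (register, subregister)
    "top":  ("H", "h"),
    "high": ("H", "l"),
    "mid":  ("L", "h"),
    "low":  ("L", "l"),
}

def natural_class(register=None, subregister=None):
    """Return the tones matching the given feature values."""
    return sorted(
        tone for tone, (reg, sub) in TONES.items()
        if (register is None or reg == register)
        and (subregister is None or sub == subregister)
    )

print(natural_class(register="H"))     # ['high', 'top']
print(natural_class(subregister="h"))  # ['mid', 'top']
```

Each single feature value defines a two-member class, which is what the model's predictions about natural classes and single-feature changes trade on.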
The Two-Feature Model, like any model of tone features, makes a number
of broad predictions. Thus:
– attested natural classes should be definable in terms of its features
– natural assimilation/dissimilation processes should be describable by
  a single feature change
– recurrent natural classes and assimilation/dissimilation processes
  which cannot be described by this model should be unattested (or
  should be independently explainable)
We add two qualications. First, more developed versions of the Two-
Feature Model have proposed various feature-geometric groupings of tone
features. We will not discuss these here, as we are concerned with evidence
for tone features as such, not for their possible groupings. Second, there
exist various subtheories of the Two-Feature Model. Some of these, such
as the claim that contour tones group under a single Tonal Node, have been
developed with a view to modeling Asian tone systems (most prominently
those of Chinese dialects), while others were proposed on the basis of
observations about African languages. Again, we will not discuss these
subtheories here except to the extent that they bear directly on evidence for
tone features.
3. Assimilation
As we have seen, much of the primary evidence for segmental features
has come from assimilation processes in which a segment or class of seg-
ments acquires a feature of a neighboring segment or class of segments,
becoming more like it, but not identical to it. (If it became identical to it we
would be dealing with root node spreading or copying rather than feature
spreading).
We draw a crucial distinction between (phonological) assimilation, which
is category-changing, and phonetic assimilation, or coarticulation, which
is gradient. A rule by which a L tone acquires a higher contextual variant
before H in a language with just two contrastive tone levels, L and H, is not
phonological. In contrast, a rule L → M in a language having the contrastive
tone levels L, M, and H is neutralizing and therefore demonstrably
category-changing. As we are concerned here with phonological features, we will be
focusing exclusively on phonological assimilation.
Now when we look through the Africanist literature, an astonishing
observation is the virtual absence of clear cases of phonological assimilation
in the above sense. The vast number of processes described in the literature
since the advent of autosegmental phonology involve shifts in the alignment
between tones and their segmental bearing units. Processes of apparent tone
assimilation such as L → H / __ H are described as tone spreading rather than
feature assimilation.
One apparent case of assimilation that has frequently been cited in the
recent literature proves to be spurious. Yala, a Niger-Congo language spoken
in Nigeria, has three distinctive tone levels: H(igh), M(id), and L(ow). This
language has been described as having a phonological assimilation rule by
which H tones are lowered to M after M or L (Bao 1999, Yip 2002, 2007, after
Tsay 1994). According to the primary source for this language, Armstrong
1968, however, Yala has no such rule. Instead, Yala has a downstep system
by which any tone downsteps a higher tone: M downsteps H, L downsteps
H, and L downsteps M.
Downstep is non-neutralizing, so that, e.g. a downstepped H remains
higher than a M. Yala is typologically unusual, though not unique, in having
a three-level tone system with downstep, but Armstrong's careful description
leaves no doubt that the lowering phenomenon involves downstep and not
assimilation.
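The non-neutralizing character of total downstep can be made concrete with a toy register model. The sketch below is ours, with made-up numbers (it is not Armstrong's analysis): every time a tone is followed by a higher tone, the register is lowered by a step smaller than the distance between tone levels, so a downstepped H still surfaces above a preceding M.

```python
# Toy model of a total-downstep system (illustrative values only):
# any tone downsteps a following higher tone by lowering the register.
# Because the step is smaller than the spacing between levels,
# downstep never neutralizes the H/M/L contrast.

LEVEL = {"L": 0, "M": 1, "H": 2}   # abstract tone heights
STEP = 0.4                          # assumed size of one downstep

def realize(tones):
    """Map a tone string to relative pitch values with downstep."""
    downsteps, pitches = 0, []
    for prev, tone in zip([None] + tones, tones):
        if prev is not None and LEVEL[tone] > LEVEL[prev]:
            downsteps += 1          # this tone is downstepped
        pitches.append(LEVEL[tone] - downsteps * STEP)
    return pitches

pitches = realize(["M", "H", "M"])
# the downstepped H is still realized above the preceding M
assert pitches[1] > pitches[0]
```

The lowering is cumulative over the phrase, so every later tone is realized in the lowered register, which is the familiar staircase profile of downstep systems.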
Our search through the Africanist literature has turned up one possible
example of an assimilation process. Unfortunately, all data comes from a
single source, and it is possible that subsequent work on this language may
yield different analyses. However, as it is the only example we have found to
date, it is worth examining here.
Bariba (also known as Baatonu), a Niger-Congo language spoken in
Benin (Welmers 1952), has four contrastive tone levels. We give these with
their feature analysis under the Two-Feature Model in (3). (Tone labels
'top', 'high', 'mid', and 'low' are identical to those of Welmers, but we
have converted his tonal diacritics into ours, as given in the last line.)
(3)              top    high    mid    low
    register      H      H       L      L
    subregister   h      l       h      l
                  a̋      á       ā      à
By a regular rule, a series of one or more high tones at the end of a word
becomes mid after low at the end of a sentence (Welmers 1952: 87). In
rule notation, this gives H₁ → M / L __ ]S. Examples are given in (4a–b)
(alternating words are underlined):
(4) a. n b`:r bu 'I broke a stick' (b`:r 'a stick')
    b. n b w 'I saw a goat' / n b w 'I saw a child'
Example (4a) illustrates one condition on the rule: the target H tone of /b`:r/
in 'I broke a stick' occurs after L, as required, but does not occur
sentence-finally, and so it does not lower; in the second example ('a stick'), however,
both conditions are satisfied, and H lowers to M. (4b) illustrates the other
condition: the target H tone of /w/ in 'I saw a goat' occurs sentence-finally,
but does not occur after a L tone, and so it does not lower; in the second
example ('I saw a child'), both conditions are satisfied, and the H tone lowers
as expected.
Considering the formal analysis of this process, it is obvious that the Two-
Feature Model provides no way of describing this assimilation as spreading.
Consider the LH input sequence as analyzed into features in (5):

(5)               low   high
    register       L     H
    subregister    l     l
We cannot spread the L register feature from the L tone to the H tone,
as this would change it to L, not M. Nor can we spread the l subregister
feature from the L tone to the H tone, as this would change nothing (H would
remain H).
Other analyses of the Bariba data are possible, and we briefly consider one here, in which what we have so far treated as a M tone is reanalyzed as a downstepped H tone.⁶ There is one piece of evidence for this analysis:
according to Welmers' data, there are no M-H sequences. (Welmers does not make this observation explicitly, so we cannot be sure whether such sequences could be found in other data, but for the sake of argument we will assume that this is an iron-clad rule.) We can see two straightforward interpretations for such a gap. One is that M is a downstepped H synchronically, in which case any H following it would necessarily be downstepped. The other is that M is synchronically M, as we have assumed up to now, but has evolved from an earlier stage in which M was !H (see Hyman 1993 and elsewhere for numerous examples of historical *!H > M shifts in West African languages). The absence of M-H sequences would then be a trace of the earlier status of M as a downstepped H.
Looking through Welmers' description, we have found no further evidence for synchronic downstep in the Bariba data. If Bariba were a true downstepping language, we would expect iterating downsteps, but these are not found in the language. Welmers presents no sequences corresponding to H !H !H, as we find pervasively in classic downstep systems; we would expect that if the second of two successive M tones were produced on a new contrastive lower level in some examples, Welmers would have commented on it. Also, M does not lower any other tone, notably the top tone. A downstep analysis would therefore have to be restricted by rather tight conditions. In contrast, if M is really M, the only statement needed is a constraint prohibiting M-H sequences, which accounts for all the facts.
We conclude that Bariba offers a significant prima facie challenge to the Two-Feature Model, while admitting that further work on this language is needed before any definitive conclusion can be drawn.
4. Interactions between nonadjacent tones
We have so far examined possible cases of interactions between adjacent
tones. A particularly crucial question for the Two-Feature Model concerns
the existence of interactions between nonadjacent tones. We show the Two-
Feature Model again in (2):

(2)               top   high   mid   low
    register       H     H      L     L
    subregister    h     l      h     l
This model predicts that certain nonadjacent tones may form natural classes and participate in natural assimilations. In a four-level system, top and mid share the feature h on their tone tier, and high and low the feature l. Thus, under the Two-Feature Model we expect to find interactions between top and mid tones, on the one hand, and between high and low tones, on the other, in both cases skipping the intermediate tone. A few apparent cases of such interactions were cited in the early 1980s, all from African languages, and have been cited as evidence for the Two-Feature Model, but no new examples have been found since, as far as we know. Reexamination of the original cases would seem to be called for.
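The prediction can be stated mechanically: under (2), the tones that share a feature value on a given tier form a candidate natural class. The following sketch (ours; the dictionary simply transcribes (2)) makes the skipping pattern explicit:

```python
# Feature assignments of the Two-Feature Model in (2):
# each tone level is a (register, subregister) pair.
FEATURES = {
    'top':  ('H', 'h'),
    'high': ('H', 'l'),
    'mid':  ('L', 'h'),
    'low':  ('L', 'l'),
}

def natural_class(tier, value):
    """Tones sharing `value` on a tier (0 = register, 1 = subregister)."""
    return sorted(t for t, f in FEATURES.items() if f[tier] == value)

print(natural_class(0, 'H'))   # ['high', 'top']  adjacent levels
print(natural_class(1, 'h'))   # ['mid', 'top']   skips the intervening high
print(natural_class(1, 'l'))   # ['high', 'low']  skips the intervening mid
```

The register tier groups adjacent levels, but the subregister tier groups levels that are non-adjacent on the pitch scale; it is the latter grouping that the model predicts should be phonologically active.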
A small number of African languages, including Ewe and Igede, have alternations between non-adjacent tone levels. We will examine Ewe here, as it has often been cited as offering evidence for the Two-Feature Model (Clements 1983, Odden 1995, Yip 2002). We will argue that while the alternations between nonadjacent tones in Ewe are genuine, they do not offer evidence for a feature analysis, either synchronically or historically.
The facts come from a rule of tone sandhi found in a variety of Ewe spoken in the town of Anyako, Ghana, as originally described by Clements 1977, 1978. While most varieties of Ewe have a surface three-level tone system, this variety has a fourth, extra-high level. We will call this the 'top' level, consistent with our usage elsewhere in this paper. These four levels are characterized in the Two-Feature Model in the same way as the other four-level systems discussed so far (see (2) above).
The tone process of interest was stated by Clements 1978 as follows. Whenever an expected M tone is flanked by H tones on either side, it is replaced by a T(op) tone, which spreads to all flanking H tones except the very last. Examples are shown in (6).

(6) /kp + mqb/          →  kpe me qb        stone behind             'behind a stone'
    /tyk + dy/          →  tyke dy          medicine on              'on medicine'
    /g + hm + g + q/    →  g hme g a q      money sum large INDEF    'much money'
    /ny:nv + + w + v/   →  ny:n-v a w v     girl DEF PL come         'the girls came'
In the first example, the M tone of the second word /mqb/ 'behind' shifts to T since it is flanked by H tones. The second example shows that this sandhi process is not sensitive to the location of word boundaries (but see Clements 1978 for a discussion of syntactic conditions on this rule). In the third example, the targeted M tone is borne by the last syllable of /hm/ 'sum'; this M tone meets the left-context condition since the rising tone on the first syllable of /hm/ consists formally of the two level tones LH (see Clements 1978 for further evidence for the analysis of contour tones in Ewe into sequences of level tones). The fourth example shows the iteration of T spreading across tones to the right. This rule must be regarded as phonological since the Top, i.e. extra-high, tones created by this process contrast with surface high tones at the word level:

(7) /n + ny + l/  →  n-nya-l    thing wash AGENT    'washer (wo)man'
    /n + ny + l/  →  n-ny-l     thing know AGENT    'sage, scholar'
In Clements' original analysis (1983), as recapitulated above, the tone-raising process involves two steps, both invoking tone features. First, the H register feature spreads from the H tones to the M tone, converting it into T. Second, the h subregister feature of the new T tone spreads to adjacent H tones, converting them into T tones (the last H tone is excluded from the spreading domain). It is the first of these processes that is crucial, as it gives evidence for tone assimilation between nonadjacent tone levels: prime evidence for the Two-Feature Model.
The analysis we have just summarized is simple, but it raises a number of problems. First, there is no apparent phonetic motivation for this process: not only does it not phonologize any detectable natural phonetic trend, it renders the location of the original M tone unrecoverable. Second, no other phonologically-conditioned raising process of this type has come to light; this process appears to be unique to Anyako Ewe, and is thus idiosyncratic. Third, though the analysis involves two rules, there is in fact no evidence that two distinct processes are involved; neither of the hypothesized rules applies elsewhere in the language. (Top tones arising from other sources do not spread to H tones.) Thus, the rule seems arbitrary in almost every respect. Notably, it does not satisfy the first three criteria for feature analysis as outlined in (1).
Are other analyses of these data possible? We will consider one here that draws on advances in our knowledge of West African tonal systems in both their synchronic and diachronic aspects. More recent work on tone systems has brought to light two common processes in West African languages. First, H tones commonly spread onto following L tone syllables, dislodging the L tone. This is a common source of downstep. Schematically, we can represent this process as H L H → H H !H. Second, by a common process of H Tone Raising, H tones are raised to T before lower tones. Thus we find H → T / __ L in Gurma (Rialland 1981) and Yoruba (Laniran & Clements 2003).
There is some evidence that such processes may have been at work in the Ewe-speaking domain. Clements 1977 observes that some speakers of western dialects of Ewe (a zone which includes Anyako Ewe) use nondistinctive downstep. Welmers 1973: 91 observes distinctive downstep in some dialects, and observes that the last H preceding a downstep + H sequence is considerably raised.
Accordingly, we suggest a historical scenario in which original H M H sequences underwent the following changes:

(8) Processes                                                result
    introduction of nondistinctive downstep                  H M !H
    H spread, downstep becomes distinctive                   H H !H
    H raising before downstep, rendering it nondistinctive   H T !H
    loss of downstep                                         H T H
    T spreads to all flanking H tones but the last           T T H

In this scenario, there would have been no historical stage in which M shifted directly to T. Any synchronic rule M → T would have to conflate two or three historical steps.
Inspired by this scenario, we suggest an alternative analysis in which M Raising is viewed as the telescoped product of several historical processes. In a first step, all consecutive H tones in the sandhi domain are collapsed into one; this is reminiscent of a cross-linguistic tendency commonly referred to as the Obligatory Contour Principle (see in particular Odden 1986, McCarthy 1986). The final H remains extraprosodic, perhaps as the result of a constraint prohibiting final T tones in the sandhi domain. Second, H M H sequences (where M is singly linked) are replaced by T: see Table 2.
Table 2. A sample derivation of 'the girls came', illustrating the reanalysis of M Raising as the product of several historical processes.

    ny: n v a w v
    H   M  H H H H      underlying representation

    ny: n v a w v
    H   M  H (H)        1. OCP(H), subject to extraprosodicity (no overt change)

    ny: n v a w v
    T      (H)          2. replacement of H M H by T
This analysis is, of course, no more natural than the first. We have posited a rule of tone replacement, which has no phonetic motivation. However, it correctly describes the facts. Crucially, it does not rely on tone features at all.
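As a string-rewriting sketch, the two steps reproduce the derivation in Table 2 (this is our schematic rendering, not an autosegmental implementation; single vs. multiple linking is not modeled):

```python
import re

def ocp_collapse(tones):
    """Step 1: OCP(H) collapses consecutive H tones into one,
    with the sandhi-final H set aside as extraprosodic."""
    final_h = tones.endswith('H')
    core = tones[:-1] if final_h else tones
    core = re.sub('H+', 'H', core)   # HH... -> H within the sandhi domain
    return core, ('H' if final_h else '')

def m_raising(tones):
    """Step 2: replace the resulting H M H span by T."""
    core, extra = ocp_collapse(tones)
    return core.replace('HMH', 'T') + extra

# 'the girls came' (Table 2): H M H H H H -> H M H (H) -> T (H)
print(m_raising('HMHHHH'))   # 'TH', i.e. T plus the extraprosodic final H
```

The extraprosodic final H survives both steps unchanged, which is what keeps the last high tone from raising to T.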
Ewe is not the only African language which has been cited as offering evidence for interactions among nonadjacent tone levels. Perhaps the best-described of the remaining cases is Igede, an Idomoid (Benue-Congo, Niger-Congo) language spoken in Nigeria (see Bergman 1971, Bergman & Bergman 1984). We have carefully reviewed the arguments for interactions among nonadjacent tone levels in this language as given by Stahlke 1977 and find them unconvincing. In any case, no actual synchronic analysis of this language has yet been proposed (Stahlke's analysis blends description and historical speculation). Such an analysis is a necessary prerequisite to any theoretical conclusions about features.
In sum, examining the evidence from natural assimilations and predicted natural classes of tones, the Two-Feature Model appears to receive little if any support from African languages. Confirming cases are vanishingly few, and the best-known of them (Ewe) can be given alternative analyses not requiring tone features. We have also described a potential disconfirming case (Bariba). Perhaps the most striking observation to emerge from this review is the astonishingly small number of clearly-attested assimilation processes of any kind. Whether this reflects a significant fact about West African tonology, or merely shows that we have not yet looked at enough data, remains to be seen.
5. Register features in Asian languages
The concept of register has long been used in studies of Asian prosodic
systems, with agreement regarding several distinct points. Specialists agree
that Asian prosodic systems give evidence of register at the diachronic level:
the present-day tonal system of numerous Far Eastern languages results
from a tonal split conditioned by the voicing feature of initial consonants
that created a high and a low register (Haudricourt 1972). The question
we will raise here is whether register features in the sense of the Two-Feature
Model are motivated at the synchronic level. In view of a rather substantial
literature on this topic, this question might seem presumptuous were it not for
our impression that much of the evidence cited in favor of register features
suffers from the same shortcomings that we have discussed in the preceding
sections in regard to African languages.
To help organize the discussion, we begin by proposing a simple typology
of East Asian tone languages, inspired by the work of A.-G. Haudricourt
1954, 1972, M. Mazaudon 1977, 1978, M. Ferlus 1979, 1998, E. Pulleyblank
1978, and others. This is shown in Table 3.
Each type is defined by the questions at the top of the table. The first question is: Is there a voiced/voiceless contrast among initial consonants? In certain East Asian languages, mostly reconstructed, a distinctive voicing contrast is postulated in initial position (e.g. [d] vs. [t], [n] vs. [n̥]). This
contrast transphonologized to a suprasegmental contrast in the history of most languages; it is preserved in some archaic languages (e.g. some dialects of Khmou). The second question is: are there distinctive phonation registers? By phonation register we mean a contrast between two phonation types, such as breathy voice, creaky voice, and so on. Phonation registers usually include pitch distinctions: in particular, in languages for which reliable information is available, breathy voice always entails lowered pitch, especially at the beginning of the vowel. Various terms have been proposed for distinctive phonation types, including 'growl' (Rose 1989, 1990). Phonetically, phonation register is often distributed over the initial segment and the rhyme. In this sense, phonation register can usually be best viewed as a 'package' comprising a variety of phonatory, pitch, and other properties, and it may sometimes be difficult to determine which of these, if any, is the most basic in a linguistic or perceptual sense. The third question is: are there distinctive tone registers? The putative category of languages with two distinctive tone registers consists of languages that allow at least some of their tones to be grouped into two sets (high vs. low register), such that any tone in the high register is realized with higher pitch than its counterpart(s) in the low register. In languages with distinctive tone registers, any phonation differences between a high-register tone and its low-register counterpart must be hypothesized to be derivative (redundant with the register contrast).
The typology set out in Table 3 is synchronic, not diachronic, and is not intended to be exhaustive. Further types and subtypes can be proposed, and some languages lie ambiguously on the border between two types.

Table 3. A simple typology of East Asian tone languages, recognizing 4 principal types

              voicing contrast   distinctive phonation   distinctive tone   examples
              among initials?    registers?              registers?
    Type 1           +                    –                     –           Early Middle Chinese (reconstructed)
    Type 2           –                    +                     –           Zhenhai
    Type 3           –                    –                     +           Cantonese (see below)
    Type 4           –                    –                     –           most Mandarin dialects; Vietnamese; Tamang

Interestingly, however, successive types in this table are often found to constitute successive stages in historical evolutions. Also, since voicing
contrasts are typically lost as tone registers become distinctive, there is no
direct relation between consonant voicing and tone; this fact explains the
absence of a further type with a voicing contrast and distinctive tone registers.
It should be noted that only type 3 languages as defined above can offer crucial evidence for a phonologically active tone register feature. Such evidence could not, of course, come from Type 1, 2 or 4 languages, which lack (synchronic) tone registers by definition.
In our experience, clear-cut examples of type 3 languages (pure tone register languages) are not easy to come by. Some alleged type 3 languages prove, on closer study, to be phonation register languages. In others, the proposed registers are historical and are no longer clearly separated at the synchronic level. Most East Asian languages remain poorly described at the phonetic level, so that the typological status of many cannot yet be determined. The small number of clear-cut type 3 languages may be due in part to insufficient documentation, but it could also be due to the historical instability of this type of system, as suggested by Mazaudon 1988.⁷ The defining properties of type 3 languages are the following:

1. no voicing contrast in initials
2. no phonation register
3. distinctive high vs. low tone registers, as schematized below:

                    melodic type 1   melodic type 2   melodic type 3   etc.
    high register        Ta               Ta               Ta
    low register         Tb               Tb               Tb

In each column, Ta is realized with higher pitch than Tb (some tones may be unpaired).
As a candidate type 3 language we will examine Cantonese, a member of the Yue dialect group spoken in southern mainland China. This language is a prima facie example of a type 3 language as it has no voicing contrast in initial position, only marginal phonation effects at best, and a plausible organization into well-defined tone registers. Our main source of data is Hashimoto-Yue 1972, except that, following Chen 2000 and other sources, we adopt the standard tone values given in the Hanyu Fangyin Zihui, 2nd ed. (1989).
There are several ways of pairing off Cantonese tones into registers in such a way as to satisfy the model of a type 3 tone language. The standard pairings, based on Middle Chinese (i.e. etymological) categories, are shown in (9).

(9)                   I           II     III    IVai    IVaii
    high register  [53]~[55]    [35]   [44]   [5q]    [4q]
    low register   [21]~[22]    [24]   [33]           [3q]
The [53]~[55] variants are conditioned by individual and morphosyntactic variables (Hashimoto-Yue 1972: 178–180, who considers the high falling variant [53] as underlying). Of course, this particular set of pairings has no analytical priority over any other in a purely synchronic analysis. The implicit assumption is that these are the most likely to form the basis of synchronic constraints and alternations. These pairings (as well as the alternatives) satisfy our third criterion for a Type 3 language. However, we have been unable to find any phonetic studies that confirm the pitch values above, which are partly conventional.
The crucial question for our purposes is whether or not Cantonese activates register distinctions in its phonology. That is, is there evidence for a feature such as [high register] in Cantonese in the form of rules, alternations, etc.? Contrary to some statements in the literature, Cantonese has a rather rich system of tonal substitutions and tone sandhi, and two of these phenomena are particularly relevant to this question.
Cantonese tonal phonology is well known for its system of 'changed tones'. According to this system, some words, mostly nouns, are produced with the changed tones 35 or (less productively) 55, instead of their basic lexical tones. This shift is usually associated with an added component of meaning, such as 'familiar' or 'opposite' (Hashimoto-Yue 1972: 93–98). Some examples are shown in (10).
(10) replacement by 35:                      replacement by 55:
     yy:21 → yy:35      'fish'               A:44 i:21 → A:44 i:55    'aunt'
     le23 → le35        'plum'               tsʰ21 'long' → tsʰ55 'short'
     ty:n22 → ty:n35    'satin'              jyy:n23 'far' → yy:n55 'near'
     kr33 → kr35        'trick'              sA:m53 → sA:m55          'clothes'

A feature-based analysis of the changed tones is possible, but requires a complex analysis with otherwise unmotivated 'housekeeping' rules (see Bao 1999: 121–127, for an example).
A more interesting source of evidence for a register feature comes from a regular rule of tone sandhi which Hashimoto-Yue describes as follows (1972: 112): a falling tone becomes a level tone if followed by another tone that begins at the same level, whether the latter is level or falling. She states the following rules:

(11) 53 → 55 / __ 53/55/5
     21 → 22 / __ 21/22

Some examples follow:

     i53 k::i53 → i55 k::i53      'should, must'
     mA:21 r21 → mA:22 r21        'sesame oil'
Let us consider the analysis of these alternations. A rather simple analysis is possible under the Two-Feature Model, if we allow Greek-letter variables or an equivalent formal device to express the identity of two feature values, as in (12):

(12) register tier        [α register]      [α register]
                            /       \            |
     subregister tier      h     l → h           h

This rule states that the low component of a falling tone shifts to high, provided it is followed by a tone beginning with a high component and that both tones belong to the same register. This analysis makes crucial use of both register features and subregister features, assigned to separate tiers. It correctly describes both cases.
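In the Chao pitch-digit notation used above, the sandhi in (11) amounts to a single generalization: a falling tone levels out before a tone that begins at the same height. A short sketch (ours; the function and the surrounding loop are illustrative only, not a claim about Cantonese grammar):

```python
def sandhi(tones):
    """Apply the leveling rule of (11) to a list of Chao-digit tone
    strings, e.g. '53', '21', '5'. A falling tone (first digit higher
    than second) becomes level before a tone beginning at the same
    pitch level, whether that tone is level or falling."""
    out = list(tones)
    for i in range(len(out) - 1):
        t, nxt = out[i], out[i + 1]
        if len(t) == 2 and t[0] > t[1] and nxt[0] == t[0]:
            out[i] = t[0] * 2          # 53 -> 55, 21 -> 22
    return out

print(sandhi(['53', '53']))   # ['55', '53']   cf. 'should, must'
print(sandhi(['21', '21']))   # ['22', '21']   cf. 'sesame oil'
print(sandhi(['53', '35']))   # ['53', '35']   35 begins at 3: no change
```

Comparing pitch digits as characters works here because each level is a single digit; the condition `nxt[0] == t[0]` covers the level, falling, and checked (e.g. '5') triggers listed in (11).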
A notable aspect of this rule, however, is that it describes alternations among variants of the same tone. That is, as we saw in (11), [53] and [55] are variants of the same tone, as are [21] and [22]. The rules are therefore subphonemic, raising the question of whether they are phonological in the strict sense (that is, category-changing rules) or gradient phonetic rules. In the latter case, they would not constitute evidence for tone features, since features belong to the phonological level (see our introductory discussion). To make a clear case for a phonological alternation we would need a set of alternations between contrastive tones, such as [53] ~ [35] and [21] ~ [24]. Thus, in spite of the rather elegant analysis that can be obtained under the Two-Feature Model, these facts do not make a clear-cut case for features.
We know of no other alternations that support a feature-based analysis of Cantonese tones. However, certain static constraints described by Hashimoto-Yue (110–111) are most simply stated in terms of a low register feature, and possibly in terms of the level/contour distinction, if [53] and [21] are taken to be underlying⁸ (Roman numerals refer to the categories in (9)):

– unaspirated initial consonants do not occur in syllables with the low-register I and II tones (contour tones?)
– aspirated (voiceless) initial consonants do not occur in syllables with the low-register III and IV tones (level tones?)
– zero-initial syllables do not occur with low-register tones

These constraints, which are clearly phonological, might be taken as evidence for a low-register feature. However, static constraints have never carried the same weight in feature analysis as patterns of alternation, the question being whether they are actually internalized as phonological rules by native speakers.
We conclude that Cantonese does not offer a thoroughly convincing case for
tone features. The interest of looking at these facts is that Cantonese represents
one of the best candidates for a type 3 language that we have found.
We have also surveyed the literature on tone features in other Asian languages. Up to now, we have found that arguments for tone features typically suffer from difficulties which make arguments for a register feature less than fully convincing:

– evidence is often cited from what are actually Type 2 or 4 languages
– very many analyses do not satisfy the criteria for feature analysis outlined in (1)

One reason for these difficulties, in the Chinese domain at least, is the long history of phonetic evolution that has tended to destroy the original phonetic basis of the tone classes. This has frequently led to synchronically unintelligible tone systems. As Matthew Chen has put it, the vast assortment of tonal alternations 'defy classification and description, let alone explanation. As one examines one Chinese dialect after another, one is left with the baffling impression of random and arbitrary substitution of one tone for another without any apparent articulatory, perceptual, or functional motivation' (Chen 2000, 81–82).
The near-absence of simple, phonetically motivated processes which can be used to motivate tone features contrasts with the wealth of convincing crosslinguistic data justifying most segmental features. This may be the reason why most tonologists, whether traditionalist or autosegmentalist, have made little use of (universal) features in their analyses. As Moira Yip has tellingly observed, 'Most work on tonal phonology skirts the issue of the features' (Yip 2007, 234).
6. Why is tone different?
Why is it that tones do not lend themselves as readily to feature analysis as segments? We suggest that the answer may lie in the monodimensional nature of level tones:

– segments are defined along many intersecting phonetic parameters (voicing, nasality, etc.); such free combinability of multiple properties may be the condition sine qua non for a successful feature analysis
– tone levels (and combinations thereof) are defined along a single parameter, F0; there is no acoustic (nor as yet, articulatory) evidence for intersecting phonetic dimensions in F0-based tone systems

The latter problem does not arise in phonation-tone register systems, in which phonation contrasts are often multidimensional, involving several phonetic parameters (voicing, breathy voice, relative F0, vowel quality, etc.), and can usually be identified with independently-required segmental features.
Given the monodimensional nature of level tones, it is difficult to see how a universal tone feature analysis could emerge from exposure to the data. Unless 'wired in' by Universal Grammar, tone features must be based on observed patterns of alternation, which, as we have seen, are typically random and arbitrary across languages. In contrast, patterns based on segmental features, such as homorganic place assimilation, voicing assimilation, etc., frequently recur across languages (see Mielke 2008 for a description of recurrent patterns drawn from a database of 628 language varieties).
7. Conclusion
We have argued that the primitive unit in tonal analysis may be the simple tone level, as is assumed in much descriptive work. Tone levels can be directly interpreted in the phonetics, without the mediation of features (Laniran & Clements 2003). Tone levels are themselves grouped into scales. (The issue whether all tone systems can be analyzed in terms of levels and scales is left open here.)
Although this paper has argued against universal tone features, it has not argued against language-particular tone features, which are motivated in some languages. We propose as a null hypothesis (for tones as for segments) that features are not assumed unless there is positive evidence for them. (For proposed language-particular features in Vietnamese, involving several phonetic dimensions, see Brunelle 2009.)
Acknowledgments
Many thanks to Jean-Michel Roynard for editorial assistance.
Notes
1. Another theoretically important function, namely bounding (defining the maximum number of contrasts), will not be discussed here.
2. Some linguists have maintained that features are innate in some (usually vaguely-defined) sense. However, recurrence across languages does not entail innateness, which is an independent hypothesis; for example, some current work is exploring the view that features can be developed out of experience (Mielke 2008). This issue is peripheral to the questions dealt with in this paper and will not be discussed further here.
3. Yip 1980 originally proposed two binary features called [upper register] and [raised]. However, since the development of feature-geometric versions of this model (Bao 1999, Chen 2000, and others), these have tended to be replaced by H and L, or h and l.
4. In a broader sense of the term 'phonology', any rule, categorical or gradient, which is language-specific might be regarded as phonological. This indeed was the view of Chomsky & Halle 1968, though it is less commonly adopted today.
5. The facts of Yala are summarized in Anderson 1978 and Clements 1983.
6. We are indebted to Larry Hyman for e-mail correspondence on this question.
7. Mazaudon's 'Stage B' languages correspond approximately to our type 3 languages.
8. However, we have not seen convincing evidence for taking either of the alternating tones [53]~[55] or [21]~[22] as basic.
References
Anderson, Stephen R.
1978 Tone features. In: Fromkin, Victoria A. (ed.) Tone: a linguistic survey.
New York/San Francisco/London: Academic Press, 133176.
Armstrong, Robert G.
1968 Yala (Ikom): A terraced-level language with three tones. Journal of
West African Languages 5, 4958.
Bao, Zhiming
1999 The Structure of Tone. New York/Oxford: Oxford University Press.
Bergman, Richard
1971 Vowel sandhi and word division in Igede. Journal of West African
Languages 8, 1325.
Bergman, Richard & Bergman, Nancy
1984 Igede. In: Bendor-Samuel, J ohn (ed.) Ten Nigerian tone systems. J os
and Kano: Institute of Linguistics and Centre for the Study of Nigerian
Languages, 4350.
Brunelle, Marc
2009 Tone perception in Northern and Southern Vietnamese. Journal of
Phonetics 37, 7996.
Chen, Matthew Y.
2000 Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge
University Press.
Chomsky, Noam & Halle, Morris
1968 The Sound Pattern of English. New York: Harper & Row.
Clements, Nick
1977 Four tones from three: the extra-high tone in Anlo Ewe. In: Kotey,
P.F.A. & Der-Houssikian, H. (eds.), Language and Linguistic Problems
in Africa. Columbia (South Carolina): Hornbeam Press, 168191.
1978 Tone and syntax in Ewe. In: Napoli, D.J . (ed.) Elements of Tone,
Stress, and Intonation. Washington: Georgetown University Press,
2199.
1983 The hierarchical representation of tone features, in: Dihoff, Ivan R.
(ed.) Current Approaches to African Linguistics. Dordrecht: Foris,
145176.
Ferlus, Michel
1979 Formation des registres et mutations consonantiques dans les langues
mon-khmer. Mon-Khmer Studies 8, 176.
1998 Les systmes de tons dans les langues viet-muong. Diachronica 15,
127.
Goldsmith, J ohn
1976 Autosegmental phonology. Ph.D. diss., M.I.T., New York: Garland
Publishing, 1980.
Do we need tone features? 23
Hashimoto-Yue, Anne O.
1972 Phonology of Cantonese. Cambridge: Cambridge University Press.
Haudricourt, Andr-Georges
1954 De lorigine des tons en vietnamien. Journal Asiatique 242, 6982.
1972 Two-way and three-way splitting of tonal systems in some Far Eastern
languages (Translated by Christopher Court) In: Harris, J immy G.
& Noss, Richard B. (eds.), Tai phonetics and phonology. Bangkok:
Central Institute of English Language, Mahidol University, 5886.
Hyman, Larry M. (ed.)
1973 Consonant Types and Tone. Los Angeles: Department of Linguistics,
University of Southern California.
Hyman, Larry M.
1993 Register tones and tonal geometry. In: van der Hulst, Harry & Snider,
K. (eds.), The Phonology of Tone: the Representation of Tonal
Register. Berlin & New York: Mouton de Gruyter, 75108.
J akobson, Roman, Fant, Gunnar & Halle, Morris
1952 Preliminaries to Speech Analysis. Cambridge, Massachusetts: MIT
Acoustics Laboratory.
Laniran, Yetunde O. & Clements, Nick
2003 Downstep and high raising: interacting factors in Yorb tone
production. Journal of Phonetics 31, 203250.
Linguistics Centre of the Department of Chinese, Beijing University (;,
])_){{) (ed.)
1989 ), [Phonetic Dictionary of Chinese Dialects], second
edition. Beijing: Wenzi Gaige Publishing House ([[1).
Mazaudon, Martine
1977 Tibeto-Burman tonogenetics. Linguistics of the Tibeto-Burman Area 3,
1123.
1978 Consonantal mutation and tonal split in the Tamang subfamily of
Tibeto-Burman. Kailash 6, 157179.
1988 An historical argument against tone features. Proceedings of Congress
of the Linguistic Society of America. New Orleans. Available online:
http://hal.archives-ouvertes.fr/halshs-00364901/
McCarthy, J ohn
1986 OCP Effects: Gemination and Antigemination. Linguistic Inquiry 17.
Mielke, J eff
2008 The Emergence of Distinctive Features. Oxford: Oxford University
Press.
Odden, David
1986 On the role of the Obligatory Contour Principle in phonological theory. Language 62, 353–383.
1995 Tone: African languages. In: Goldsmith, John (ed.), Handbook of Phonological Theory. Oxford: Blackwell.
Pulleyblank, Edwin G.
1978 The nature of the Middle Chinese tones and their development to Early Mandarin. Journal of Chinese Linguistics 6, 173–203.
Rialland, Annie
1981 Le système tonal du gurma (langue gur de Haute-Volta). Journal of African Languages and Linguistics 3, 39–64.
Rose, Philip
1989 Phonetics and phonology of Yang tone phonation types in Zhenhai. Cahiers de linguistique Asie Orientale 18, 229–245.
1990 Acoustics and phonology of complex tone sandhi: An analysis of disyllabic lexical tone sandhi in the Zhenhai variety of Wu Chinese. Phonetica 47, 1–35.
Stahlke, Herbert
1977 Some problems with binary features for tone. International Journal of American Linguistics 43, 1–10.
Tsay, Jane
1994 Phonological Pitch. Ph.D. diss., University of Arizona.
Wang, William
1967 Phonological Features of Tones. International Journal of American Linguistics 33, 93–105.
Welmers, William E.
1952 Notes on the structure of Bariba. Language 28, 82–103.
1973 African language structures. Berkeley: University of California Press.
Yip, Moira
1990 The Tonal Phonology of Chinese. New York: Garland Publishing. Original edition: Ph.D. diss., MIT (1980), distributed by the Indiana University Linguistics Club.
2002 Tone. Cambridge, U.K.: Cambridge University Press.
2007 Tone. In: De Lacy, Paul (ed.), The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 229–252.
Authors' affiliations
G.N. (Nick) Clements, LPP (CNRS/Université Paris 3)
Alexis Michaud, LACITO (CNRS/Université Paris 3): alexis.michaud@vjf.cnrs.fr
Cédric Patin, LLF (CNRS/Université Paris 7): cedric.patin@gmail.com
Rhythm, quantity and tone in the Kinyarwanda verb
John Goldsmith and Fidèle Mpiranya
1. Introduction
In this paper, we discuss some aspects of the tonology of the verbal inflectional system in Kinyarwanda. There is a considerable amount of literature on tone in Kinyarwanda and in Kirundi (for example, Sibomana 1974, Coupez 1980, Mpiranya 1998, Kimenyi 2002), two languages which are so similar that the two can be considered dialects of a single language. We have benefited from previous analyses of both languages, and especially from work done in collaboration with Firmard Sabimana (see Goldsmith and Sabimana 1985) and with Jeanine Ntihirageza, both linguists and native speakers of Kirundi. Nonetheless, the focus in the present paper is Kinyarwanda, which is the native language of one of the present authors (FM). We wish to emphasize that even restricting ourselves to the material discussed below, there are some differences between Kirundi and Kinyarwanda, and while the differences are small, they are significant. Despite the considerable work that exists already on the tone of the verbal system, a number of important questions, even basic ones, remain relatively obscure, and we hope that the present study will contribute to a better understanding of them. We plan to present a more comprehensive account of the tonology of the verbal system in the future. We use the following abbreviations:
SM   Subject marker
TM   Tense marker
FOC  Focus marker
OM   Object marker
FV   Final vowel
inf  Infinitive marker
B    Basic (underlying) tone
Our goal has been to develop a formal account of tone which is as similar
as possible to the analysis of tone in the other Bantu languages that are
reasonably closely related. But the fact is that despite our bias in this regard,
the analysis that we present here is quite different from what we expected,
and from those proposed for nearby Bantu languages. In keeping with some
earlier analyses, our account leans heavily on postulating metrical structure
established from left to right, needed in order to account for the shifting and
spreading of high tone. But the most surprising aspect of this analysis is that
there is no general tonology of the verbal High tone as such: each High tone
has a behavior that is directly tied to its morphological status or origin, and
the shift of High tone occurs both towards a metrically Weak and a metrically
Strong position, depending on the morphological status of the High tone in
question, a fact that we did not expect, and that we were, in retrospect, biased
against.
We will begin by sketching the overall analysis in general terms, and
we describe the conclusions which we have reached. The motivation and
justification will be presented over the course of the paper, and indeed, our
reasons for formulating the generalizations as we do may not be entirely
clear until the data is seen in detail.
1. The general structure of the Kinyarwanda verb is similar to that found in a range of familiar, and relatively closely related, Bantu tone languages; see Figure 1, where we present a schema of the Bantu verb, one that is incomplete but sufficiently detailed for our present purposes.
[Figure 1 lays out the template of the Kinyarwanda verb: Negative Marker, Subject Marker, Tense Marker, Focus Marker, Object Marker(s), Radical, Extensions, Final Vowel, with the stem and macrostem domains and the association windows for Hroot and Hpost indicated. The illustrative parse is tu-ra-ki-mu-bon-er-a 'we will see it for him': tu 'we', ra (TM), ki (class 7), mu (class 1), bon 'see', er (applicative), a (unmarked mood).]
Figure 1. Verbal structure: tone windows
2. Some morphemes have underlying tones and others do not.
3. There is a High/Low tonal contrast among the verb roots, although
there is no evidence that what we might call Low toned verb roots
bear a Low tone as such; they are best analyzed as bearing no tone.
Speaking of a High/Low tonal contrast is a matter of convenience.
4. There is no lexical tonal contrast among the subject markers. In most
environments, the Subject Marker (SM) appears on a low tone, the
result of no tone associating with it. In a few environments, a High
tone is associated with the Subject Marker.
5. There is a suffixal high tone, a suffixal morpheme which we indicate as Hpost, that appears in certain morphological environments. When there are no Object Markers in the verb, the suffix Hpost appears on the second syllable of the stem, but when there are OM prefixes, it appears further to the left. For specialists in historical or comparative Bantu tone, this tone is especially interesting. Its behavior is quite different from the verbal suffix High tone, or tones, that we observe in closely related Bantu languages. In particular, it is common to find a High tone that appears on the mora that follows the first mora of the verb radical, and in those languages in which there is a tonal contrast among the verb radicals, this High tone typically appears when the verb radical is Low (or toneless). This tone, however, never appears shifted to a position earlier in the word, as far as we are aware. In addition, there is a distinct High tone that is associated with the Final Vowel in a number of verbal patterns, such as the subjunctive. This difference does not naturally carry through to the Kinyarwanda system, as far as we can see at the present time.
6. There is a leftward shift of High tone in some cases that appears to be rhythmically motivated. If we group moras into groups of two from left to right, then it is natural to label one as strong and one as weak, even if the choice is a bit arbitrary. We label these feet as trochees (Strong-Weak). Hroot shifts leftward to a Strong position; Hpost shifts leftward to a Weak position: this is the conclusion that we mentioned just above that was surprising, and it will become clearer when we consider some specific examples.
7. Kinyarwanda is relatively conservative among the Bantu languages in maintaining a vowel length contrast, and it appears to us to be impossible to avoid speaking of moras in the analysis of the prosodic system. However, not all moras show the same behavior, and in some cases, the second mora of a long vowel acts differently than the mora of a short syllable in a weak position. That much is perhaps not surprising: the first and the second mora in a bimoraic syllable may not have all the same privileges. But there are cases where a High tone that we might expect (based simply on counting moras, and distinguishing odd from even positions) to appear on the second mora of a long vowel will instead associate with the immediately following mora, which is to say, in the following syllable. We interpret this as an expression of quantity-sensitivity in the accentual system: in particular, if the left-to-right assignment of metrical positions should encounter (so to speak) a long (i.e., two-mora) vowel in a Strong position, it treats the long vowel as comprising the strong position of the trochee, with the weak position falling in the subsequent syllable.
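The left-to-right foot construction just described can be stated as a small procedure. The sketch below is our own illustrative reading, not a formalism from the chapter: syllables are represented simply by their mora counts, and a long vowel that would begin a foot claims the whole Strong position, pushing the Weak position onto the following syllable.

```python
def assign_feet(syllable_moras):
    """Label each mora Strong (S) or Weak (W), building trochaic feet
    left to right. A bimoraic (long) vowel encountered where a Strong
    position begins occupies that Strong position by itself, so the
    Weak position falls on the following syllable."""
    labels = []
    nxt = "S"  # the next metrical position to be assigned
    for moras in syllable_moras:  # 1 = short syllable, 2 = long vowel
        if nxt == "S" and moras == 2:
            labels.extend(["S", "S"])  # the long vowel fills the Strong slot
            nxt = "W"
        else:
            for _ in range(moras):
                labels.append(nxt)
                nxt = "W" if nxt == "S" else "S"
    return labels

# Four short syllables alternate as expected:
# assign_feet([1, 1, 1, 1]) -> ['S', 'W', 'S', 'W']
# With an initial long vowel, the Weak slot shifts to the next syllable:
# assign_feet([2, 1, 1]) -> ['S', 'S', 'W', 'S']
```

A quantity-insensitive version would instead label the moras of the long vowel S-W and place the following syllable in a new Strong position; the contrast between the two is what Figure 4 below illustrates for negative present forms with -teek-.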
One of the aspects of the verbal tone pattern that makes its analysis so difficult is the fact that there are few generalizations that hold for High tones in general. Instead, we find that in order to make sense of the data, we must talk about several different High tones; these tones are different in the sense that what makes them different is their grammatical function, rather than their phonetic description. In brief,
1. One of the High tones is the High tone that is part of a verb radical's underlying form;
2. Another High tone is a formal marker (some would say, a formal morpheme, if we allow ourselves to speak of morphemes that do not have a specifiable sense or unique grammatical function) that appears typically to the right of the verb radical;
3. A third High tone is part of the negative prefix nti- (although in the surface representation, that High tone is typically associated with a different syllable).
We will try to show that the principles that account for the appearance of each of these tones are different. We thus are not led to a set of rules which must be applied sequentially, as has often been the case in the analysis of other related Bantu languages. The analysis does not draw us towards an optimality-theoretic analysis, either, because the complexities of the analysis involve morphological specifications that appear to be inconsistent with a view that makes strong claims about the universality of phonological constraints.
We list below the four distinct High tones, according to this analysis. One of the reasons for distinguishing these classes is that not only is the left-to-right position of each tone in the word determined by different principles (as we have just said), but in addition, there is a sort of competition among the tones, in the sense that when the post-radical High Hpost is present, the radical's High tone does not appear (it is deleted, in generative terminology); and when nti's High tone appears, neither the radical's tone nor the post-radical High tone appears. However, there are pairs of High tones that can co-exist: the Tense Marker raa's High tone, for example, can occur along with the radical High tone.
(1) Types of High tone
name    type                            normal position
Hneg    nti (negation)                  syllable after nti
Hpost   post-radical grammatical tone   syllable after radical
Hroot   radical lexical tone            first syllable of radical
HTM     tense marker ra, za             on TM
In Figure 1, we have given a schematic of the most important positions for morphemes in the Kinyarwanda verb. We have indicated towards the bottom the range of positions in which the Hroot tone can (or does) associate, and the range of positions for the Hpost. We are not quite certain as to whether these domains have a real status in the system, or whether the range of positions that we have indicated there is simply the logical consequence of the other rules and constraints posited in the grammar. There is one case below which suggests the former interpretation is correct, in connection with the tonal behavior of the inceptive (ra) tense: viewing this tonal domain as having some linguistic reality would perhaps provide the best account for the placement of the radical High tone there.
We draw the reader's attention to the curious fact that while this analysis depends more heavily on tones' morphological status than is found in analyses of related Bantu tone languages, the analysis is not thereby more concrete. That is, it is often the case that diachronic development leads a language from a situation in which a phonological effect is governed by phonological considerations only, to one, a little later, where the conditioning factor is not the phonological environment, but rather the specific morphological identity of the neighboring morphemes (velar softening in English, for example). In such cases, the triggering environment is present, visible, and directly observable. In the present case, however, the High tones that we observe do not wear their categorization on their sleeves, so to speak: it requires an analytic leap to decide that a given High tone in a given word is marked as Hneg or Hpost.
The most complex aspect of the tone system is the shifting and reassociation of these tones. To understand this, we must distinguish between the placement of the radical High tone and the post-radical High tone. Both of these tones shift to the left, and in their reassociation they remain within the macrostem (which is to say, they remain to the right of the Tense Marker). But they shift according to different principles. By macrostem, we mean the part of the verb that begins after the Tense Marker, consisting of all Object Markers and the verb stem as well. In addition, the macrostem includes the secondary prefixes which appear in much the same position as Object Markers do; this is depicted graphically in Figure 1.
The radical High Hroot shifts to the beginning of the macrostem, that is, the first syllable of the macrostem. Actually, what we find on the surface suggests that it might be more appropriate to say that the radical High tone spreads to the first syllable of the macrostem. Essentially what we find is this: the radical High tone appears on (i.e., is associated with) the first syllable of the macrostem, but in addition, we may find the tone spread further to the right, as far to the right as the radical itself, the only condition being that the entire span of Highs must be odd in number (which here means one or three). Such a condition seems to make more sense on the view that the radical High spreads to the left, and is then delinked in a right-to-left fashion to satisfy a parity condition, to which we will return below. This is illustrated in Figure 2.
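Read this way, the surface span of the radical High can be computed by spreading and then delinking. The helper below is a hypothetical sketch of that reading, under the simplifying assumption that each OM and the radical occupy one syllable each (syllable 0 is the first syllable of the macrostem, and the radical sits immediately after the OMs).

```python
def radical_high_span(num_oms):
    """Syllables (indexed from the start of the macrostem) that surface
    with the radical High tone: spread from the first syllable of the
    macrostem up to the radical, then delink right-to-left until the
    span is odd in length (here, one or three syllables)."""
    span = list(range(num_oms + 1))  # macrostem start through the radical
    while len(span) % 2 == 0:        # parity condition on the High span
        span.pop()                   # delink the rightmost association
    return span

# One OM: only the OM is High, e.g. n-da-MU-bon-a in (5):
# radical_high_span(1) -> [0]
# Two or three OMs: a three-syllable High span, e.g. n-da-KI-MU-BON-er-a:
# radical_high_span(2) -> [0, 1, 2]
```

With three OMs this yields [0, 1, 2], matching the forms in (5) where the High covers the three macrostem-initial syllables but not the radical.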
The post-radical High Hpost shifts leftward to a position within the macrostem which is an even-numbered position, but there are two slightly different principles that determine how we count. If the macrostem has two or more Object Markers (secondary prefixes are counted in this, as they behave like Object Markers quite generally), then counting begins at the beginning of the macrostem; otherwise, counting begins at the beginning of the word. The even-numbered positions are strong in the sense that they attract the post-radical High: Hpost shifts to the leftmost even-numbered position in the macrostem.
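On a simple mora count (setting aside the quantity-sensitivity discussed above), the landing site of Hpost can be sketched as follows; the function and its mora-index bookkeeping are our own hypothetical rendering of the counting rule, not the authors' formalism.

```python
def hpost_landing(prefix_moras, macrostem_moras, num_oms):
    """Return the 1-based mora index (counted from the start of the
    word) where Hpost docks: the leftmost even-numbered position
    within the macrostem. Counting starts at the macrostem when there
    are two or more Object Markers, otherwise at the word's beginning."""
    first = prefix_moras + 1               # first mora of the macrostem
    origin = first if num_oms >= 2 else 1  # where counting begins
    for i in range(first, prefix_moras + macrostem_moras + 1):
        if (i - origin + 1) % 2 == 0:      # even-numbered from the origin
            return i
    return None  # no even-numbered position inside the macrostem

# Counting from the word: a 3-mora prefix string puts the landing on mora 4.
# hpost_landing(3, 5, num_oms=0) -> 4
# With two OMs, counting restarts at the macrostem: its second mora, mora 5.
# hpost_landing(3, 5, num_oms=2) -> 5
```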
It is difficult to avoid the sense that the Hpost is a syncopated High tone: in music, the term syncopation refers to a prosodic impulse that is on an off-beat, or in present terms, a Weak metrical position. The shifting of association of this tone preserves this aspect of syncopation, and we suspect that this is an important fact.
1.1. Infinitive
We look first at the infinitive. Its negative is formed with the prefix -ta-, not nti-. In the tabular representation of the verbal tone pattern, we use B to indicate the basic or inherent tone of the verb radical.
[Figure 2 builds trochaic feet left to right over tu-ra-bon-a, tu-ra-mu-bon-a, tu-ra-ki-mu-bon-er-a, and tu-ra-ki-ha-mu-bon-er-a, with the first Strong position within the macrostem, the docking site of Hroot, marked with a dotted-line box.]
Figure 2. Foot marking for Hroot association
(2) Affirmative infinitive Gloss
Low tone ku rim a to cultivate
ku rer a to raise (children)
ku rog a to poison
ku rut a to surpass
ku geend a to go
High tone ku bn a to see
ku br a to lack
ku bk a to crow
ku bag a to butcher
ku ber a to suit
(Continued)
Negative infinitive Gloss
Low tone ku t rim not to cultivate
ku t rer not to raise (children)
ku t rog not to poison
ku t rut not to surpass
ku t geend not to go
ku t geend n a not to go with
High tone ku t bon not to see
ku t bur not to lack
ku t bik not to crow
ku t baag not to butcher
ku t beer not to suit
ku d teek not to cook
ku d teek r a not to cook for
(3) Basic tone assignment for each morphological pattern
                TM   Hroot   Hpost
infinitive            B
2. Present tense
2.1. Affirmative
In the simple case where there are no Object Markers (OMs), the basic or lexical tone marking of the verb radical appears on the radical itself. However, one of the central issues in Kinyarwanda morphotonology is how to account for what appears to be a shifting of a High tone's position, or association, from the radical when we compare the surface tone pattern of verbs with no Object Markers (OMs) and verbs with a single OM. As noted above, we propose that this leftward shift is best understood in terms of a rhythmic pattern which is established by creating binary feet from left to right from the beginning of the word, including the negative prefix nti- in the case of Kinyarwanda. We look first at the affirmative present tense form of the verb.
It is clear that the High tone in these forms is the High tone of the verb radical, but it will be associated to a position to the left of the radical if there is such a position within the macrostem. Furthermore, this tone may appear associated with either one or three syllables: the maximum number possible if the tone's association is not to move outside of its domain, defined as the macrostem up to the radical. Consider first the behavior of verb radicals with a short vowel, given in (5), and next the behavior of verb radicals with a long vowel, given in (6). The long vowel stems do not behave differently in any important way in this tense. In Figure 2, we present the foot construction made on these verbs, and one can see that the Hroot always associates to the first (i.e., leftmost) Strong position within the macrostem, which is indicated with a dotted-line box.
(4) Basic tone assignment for each morphological pattern
                                      TM   Hroot   Hpost
infinitive                                 B
present tense affirmative (focus)          B
(5) Present tense affirmative
short vowel
Singular subject Plural subject
Root: -rim- (Low tone: to cultivate)
No OM n da rim a tu ra rim a
u ra rim a mu ra rim a
a ra rim a ba ra rim a
One OM ki (cl 7) n da ki rim a tu ra ki rim a
u ra ki rim a mu ra ki rim a
a ra ki rim a ba ra ki rim a
Two OMs n da ki mu rim ir a tu ra ki mu rim ir a
ki mu (cl 7, 1) u ra ki mu rim ir a mu ra ki mu rim ir a
a ra ki mu rim ir a ba ra ki mu rim ir a
Three OMs       n da ki ha mu rim ir a    tu ra ki ha mu rim ir a
ki ha mu        u ra ki ha mu rim ir a    mu ra ki ha mu rim ir a
(cl 7, 16, 1)   a ra ki ha mu rim ir a    ba ra ki ha mu rim ir a
(Continued)
Root: -bn- (High tone: to see)
No object marker n da bn a tu ra bn a
u ra bn a mu ra bn a
a ra bn a ba ra bn a
One OM mu n da m bon a tu ra m bon a
(him/her) u ra m bon a mu ra m bon a
a ra m bon a ba ra m bon a
Two OMs mu n da k m bn er a tu ra k m bn er a
(him/her) u ra k m bn er a mu ra k m bn er a
a ra k m bn er a ba ra k m bn er a
Three OMs mu   n da k h m bon er a    tu ra k h m bon er a
(him/her)      u ra k h m bon er a    mu ra k h m bon er a
               a ra k h m bon er a    ba ra k h m bon er a
(6) Present tense affirmative
long vowel
Singular subject Plural subject
Low tone: -geend- (to go)
No OM n da geend a tu ra geend a
u ra geend a mu ra geend a
a ra geend a ba ra geend a
One OM n da ha geend a tu ra ha geend a
u ra ha geend a mu ra ha geend a
a ra ha geend a ba ra ha geend a
High tone: -teek- (to cook)
No OM n da tek a tu ra tek a
u ra tek a mu ra tek a
a ra tek a ba ra tek a
One OM n da g teek a tu ra g teek a
u ra g teek a mu ra g teek a
a ra g teek a ba ra g teek a
Two OMs n da k m tek er a tu ra k m tek er a
mu (him/her) u ra k m tek er a mu ra k m tek er a
a ra k m tek er a ba ra k m tek er a
Three OMs n da k h m teek er a tu ra k h m teek er a
mu (him/her) u ra k h m teek er a mu ra k h m teek er a
a ra k h m teek er a ba ra k h m teek er a
2.2. Negative
When we turn to the negative form of the present tense, we see a different pattern of a shifting High tone. When there is no OM, the suffixal High tone appears on the second syllable of the stem (-er-, in the cases examined here). However, when there is a single OM, we see a complex set of data when we look at long and short vowels in Kinyarwanda. In (8), we present these forms.
(7) Basic tone assignment for each morphological pattern
                                      TM   Hroot   Hpost
infinitive                                 B
present tense affirmative (focus)          B
present tense negative                             H
(8) Present tense negative
short vowel
Tone neutralized Singular subject Plural subject
-bn- (to see)
No OM sii m bon r a nti tu bon r a
ntu u bon r a nti mu bon r a
nta a bon r a nti ba bon r a
One OM sii n ki bn er a nti tu ki bn er a
ntu u ki bn er a nti mu ki bn er a
nta a ki bn er a nti ba ki bn er a
Two OMs, nta a ki m bon er a nti ba ki m bon er a
3rd person
(Continued)
Reflex OM, nti y ii bn er a nti b ii bn er a
3rd person
Three OMs, nta a ha k mu bon er a nti ba ha k mu bon er a
3rd person
long vowel
Tone neutralized Singular subject Plural subject
-geend- (to go)
(3rd person only)
No OM nt aa teek r a nti ba teek r a
One OM nt aa mu tek er a nti ba mu tek er a
Two OMs nt aa ki m teek er a nti ba ki m teek er a
Reflex nti y ii tek er a nti b ii tek er a
In Figure 3, we observe the behavior of a tone that appears to shift leftward, though that is simply a metaphor derived from comparing different forms from the same inflectional paradigm. The situation is more complex when the vowel in the verb radical is long. A generalization that merely counts odd- and even-numbered positions fails to generate the correct data, and the details of this are shown in Figure 4.
3. Inceptive: -ra-
3.1. Kinyarwanda
This tense only exists in the negative in Kinyarwanda; see (10) and Figure 6. The verb radical keeps its lexical tone, High or Low, but in the presence of OM prefixes, the radical's lexical tone is pulled leftward: if there is one OM, the tone is maintained on the root, and if there are two OMs, the tone is placed on the second OM (counting, as ever, from left to right). If there are 3 OMs, the tone is placed on the second OM, just as it is in the two-OM case, but we find spreading of the High tone from that second OM to the verb radical. We note that in all cases, the High tone moves to an even-numbered position, and furthermore, if the original position of the High tone
[Figure 3 builds feet over nti-tu-bon-er-a, nti-tu-ki-bon-er-a, nti-tu-ki-mu-bon-er-a, and nti-tu-ki-ha-mu-bon-er-a, with the macrostem marked, showing where Hpost docks in each form.]
Figure 3. Rhythmic structure in negative present tense (short vowel)
[Figure 4 contrasts quantity-insensitive foot assignment (incorrect) with quantity-sensitive foot assignment (correct) over nti-tu-teek-er-a, nti-tu-gi-teek-er-a, nti-tu-ki-mu-teek-er-a, and nti-tu-ki-ha-mu-teek-er-a.]
Figure 4. Rhythmic structure in negative present tense (long vowel -teek-)
was in an even-numbered position (here, mora 6 of the word), it remains in place, and we find spreading from the 4th to the 6th position. If the High tone had been on an odd-numbered position, the tone moves, rather than spreads, leftward to the 4th mora.
In putting things this way, we have overlooked the fact that our description of quantity-sensitive rhythm-assignment does not correctly deal with the case with no OM, for both the short- and the long-vowel radicals, and we highlight this in Figure 5. Why do we find (b) in reality, and not (a)? This is clearly a radical High tone, not a Hpost, so according to the analysis presented here, it should associate with a strong position. Why does the ra TM not take the first mora of the radical as the weak position in its foot? If we follow the analysis presented here, the TM ra does just that, as it should, when there is one or more OM. So why do we find the situation as we do in Figure 5?
The only answer we have is both partial and tentative: if the Hroot must associate with a strong position within the macrostem (and that is the heart of our present proposal), then the only such position is the one indicated in Figure 5, and it is to the right of the first mora of the verb stem. In no case does a root's High tone appear to the right of the first mora of the radical; that is what we indicated in Figure 1 above. If that generalization has some real status in the language, and the language uses that domain-based generalization to govern where the tone associates, then we have perhaps the basis of an account, or answer, to this question.
[Figure 5 contrasts (a) the expected association with (b) the observed association of Hneg, HTM, and Hroot in nti-ba-raa-bon-er-a.]
Figure 5. raa with no OMs
(9) Basic tone assignment for each morphological pattern
                                      nti   TM     Hroot   Hpost
infinitive                                         B
present tense affirmative (focus)                  B
present tense negative                H                    H
inceptive                             H     (ra)   B
(10) -ra-
Low tone radical
short vowel long vowel
No OM nti ba ra rim a nti ba ra geend a
1 OM nti ba ra ha rim a nti ba ra ha geend a
2 OM nti ba ra bi ha rim a nti ba ra bi ha geend an a
3OM nti ba ra bi ha mu rim er a nti ba ra bi ha mu geend er a
[Figure 6 shows feet and tone association for (a) nti-ba-ra-bon-a, (b) nti-ba-ra-bi-bon-a, (c) nti-ba-ra-bi-ha-bon-a, and (d) nti-ba-raa-ki-ha-mu-bon-er-a, with Hneg, HTM, and Hroot marked and the macrostem indicated.]
Figure 6. Rhythmic structure in negative inceptive with High-toned radical
High tone radical
short vowel long vowel
No OM nti ba ra bn a nti ba ra tek a
1 OM nti ba ra ha bn a nti ba ra ha tek a
2 OM nti ba ra bi h bon a nti ba ra bi h teek a
3OM nti ba ra bi h m bn er a nti ba ra bi h m tek er a
4. Future
The future is marked by TM za/zaa in Kinyarwanda. The tone pattern
behaves just like the parallel case of ra.
4.1. Affirmative indicative future
In Kinyarwanda, nothing special happens in the case of 2 OMs, other than the shift to even-numbered positions.
Note that in the affirmative, all syllables are Low:
(11) Future afrmative -zaa- or -za-
Low tone radical
short vowel long vowel
No OM ba zaa rim a ba zaa geend a
1 OM ba zaa ha rim a ba zaa ha geend a
2 OM ba zaa bi ha rim a ba zaa bi ha geend an a
3 OM ba zaa bi ha mu rim ir a ba zaa bi ha mu geend an ir a
High tone radical
short vowel long vowel
No OM ba zaa bon a ba zaa teek a
1 OM ba zaa ha bon a ba zaa ha teek a
2 OM ba zaa bi ha bon a ba zaa bi ha teek a
3 OM ba zaa bi ha mu bon er a ba zaa bi ha mu teek er a
4.2. Negative
(12) Basic tone assignment for each morphological pattern
                                      nti   TM       Hroot   Hpost
infinitive                                           B
present tense affirmative (focus)                    B
present tense negative                H                      H
inceptive negative                    H     H (ra)   B
future affirmative
future negative (non-focused)         H     H (za)   B
(13) Future negative nti- +-za-
Low tone radical
short vowel long vowel
No OM nti ba za rim a nti ba za geend a
1 OM nti ba za ha rim a nti ba za ha geend a
2 OM nti ba za bi ha rim a nti ba za bi ha geend an a
3 OM nti ba za bi ha mu rim ir a nti ba za bi ha mu geend an ir a
High tone radical
short vowel long vowel
No OM nti ba za bn a nti ba za tek a
1 OM nti ba za ha bn a nti ba za ha tek a
2 OM nti ba za bi h bon a nti ba za bi h teek a
3 OM nti ba za bi h m bn er a nti ba za bi h m tek er a
5. Far past
5.1. Far past affirmative
In the affirmative, there is neutralization between radicals of High and Low tone; both have a High tone (or, in the non-focused forms, no tone, i.e. low tone).
(14) Far Past (non-focused) -á-
Low tone radical
short vowel long vowel
No OM ba rim aga ba geend aga
1 OM ba ha rim aga ba ha geend aga
2 OM ba bi ha rim aga ba bi ha geend an aga
3 OM ba bi ha mu rim ir aga ba bi ha mu geend an ir aga
High tone radical
short vowel long vowel
No OM ba bon aga ba teek aga
1 OM ba ha bon aga ba ha teek aga
2 OM ba bi ha bon aga ba bi ha teek aga
3 OM ba bi ha mu bon er aga ba bi ha mu teek er aga
(15) Far Past (focused) -ra-
Low tone radical
short vowel long vowel
No OM ba ra rm aga ba ra gend aga
1 OM ba ra ha rmaga ba ra ha gend aga
2 OM ba ra bi h rim aga ba ra bi h geend an aga
3 OM ba ra bi h mu rim ir aga ba ra bi h mu geend an ir aga
High tone radical
short vowel long vowel
No OM ba ra bn aga ba ra tek aga
1 OM ba ra ha bn aga ba ra ha tek aga
2 OM ba ra bi h bon aga ba ra bi h teek aga
3 OM ba ra bi h mu bon er aga ba ra bi h mu teek er aga
5.2. Far past negative
This form is necessarily non-focused, and (we believe) this is why there is no High tone on the radical (in the case of High-toned verbs).
(16) Far Past (non-focused) -á-
Low tone radical
short vowel long vowel
No OM nti ba rim aga nti ba geend aga
1 OM nti ba ha rim aga nti ba ha geend aga
2 OM nti ba bi ha rim aga nti ba bi ha geend an aga
3 OM nti ba bi ha mu rim ir aga nti ba bi ha mu geend an ir aga
High tone radical
short vowel long vowel
No OM nti ba bon aga nti ba teek aga
1 OM nti ba ha bon aga nti ba ha teek aga
2 OM nti ba bi ha bon aga nti ba bi ha teek aga
3 OM nti ba bi ha mu bon er aga nti ba bi ha mu teek er aga
The negative Far Past does not have a focused form.
(17) Basic tone assignment for each morphological pattern
                                      nti   TM       Hroot   Hpost
infinitive                                           B
present tense affirmative (focus)                    B
present tense negative                H                      H
inceptive negative                    H     H (ra)   B
future affirmative
future negative (non-focus)           H     H (za)   B
far past affirmative (focus)                H (ra)           H
far past negative                     H     (á)
6. Recent past
6.1. Recent past affirmative
The only difference from the Far Past here is that the TM is on a low tone.
(18) Recent Past (non-focused) -a-
Low tone radical
short vowel long vowel
No OM ba a rim aga ba a geend aga
1 OM ba a ha rim aga ba a ha geend aga
2 OM ba a bi ha rim aga ba a bi ha geend an aga
3 OM ba a bi ha mu rim ir aga ba a bi ha mu geend an ir aga
High tone radical
short vowel long vowel
No OM ba a bon aga ba a teek aga
1 OM ba a ha bon aga ba a ha teek aga
2 OM ba a bi ha bon aga ba a bi ha teek aga
3 OM ba a bi ha mu bon er aga ba a bi ha mu teek er aga
In the following forms, we note a sequence of three adjacent moras in
each case, but on the surface this is not distinct from other two-mora vowels.
(19) Recent Past (focused) -aa-
Low tone radical
short vowel long vowel
No OM ba aa rim aga ba aa geend aga
1 OM ba aa ha rim aga ba aa ha geend aga
2 OM ba aa bi ha rim aga ba aa bi ha geend an aga
3 OM ba aa bi ha mu rim ir aga ba aa bi ha mu geend an ir aga
High tone radical
short vowel long vowel
No OM ba aa bn aga ba aa tek aga
1 OM ba aa ha bn aga ba aa ha tek aga
2 OM ba aa bi h bon aga ba aa bi h teek aga
3 OM ba aa bi h mu bon er aga ba aa bi h mu teek er aga
6.2. Recent past negative
(20) Recent Past (non-focused) -a-
Low tone radical
short vowel long vowel
No OM nti ba a rim aga nti ba a geend aga
1 OM nti ba a ha rim aga nti ba a ha geend aga
2 OM nti ba a bi ha rim aga nti ba a bi ha geend an aga
3 OM nti ba a bi ha mu rim ir aga nti ba a bi ha mu geend an ir aga
High tone radical
short vowel long vowel
No OM nti ba a bon aga nti ba a teek aga
1 OM nti ba a ha bon aga nti ba a ha teek aga
2 OM nti ba a bi ha bon aga nti ba a biha teek aga
3 OM nti ba a bi ha mu bon er aga nti ba a bi ha mu teek er aga
(21) Basic tone assignment for each morphological pattern
nti   TM   H (root)   H (post)
infinitive B
present tense affirmative (focus) B
present tense negative H H
inceptive negative H H (ra) B
future affirmative
future negative (non-focus) H H (za) B
far past affirmative (focus) H (ra) H
far past negative H ()
recent past affirmative (focus) H (a) B
recent past negative (non-focus)
7. Subjunctive
In the affirmative subjunctive, we see the interaction between two generalizations: first, the special placement of a High on the second of two (or more) OMs, and second, the placement of the suffixal tone on an even-numbered mora, counting from the beginning of the word (as long as there are at least 4 moras to the word). However, this account does not yet cover all the data, as we illustrate in Figure 7.
In the negative subjunctive, we see a situation in which the SM (subject
marker) is associated with a High tone, which we analyze functionally as
part of the negative nti-prefix.
(22) Subjunctive affirmative
Kinyarwanda
short vowel long vowel
No OM ba rim e ba geend e
1 OM ba ha rim ba ha gend e
2 OM ba bi h rim e ba bi h geend an e
3 OM ba bi h mu rim ir e ba bi h mu geend an ir e
(23) Subjunctive negative
Kinyarwanda
short vowel long vowel
No OM nti b rim e nti b geend e
1 OM nti b ha rim e nti b ha geend e
2 OM nti b bi ha rim e nti b bi ha geend an e
3 OM nti b bi ha mu rim ir e nti b bi ha mu geend an ir e
8. Conclusion
We have only begun to deal with the complexities of tone assignment to
the Kinyarwanda verb in this paper, but we hope that the material that we
have presented is at the very least suggestive of how rhythmic structure may
interact with tone association in Kinyarwanda, and by implication in Kirundi
and perhaps in some other Lacustrine Bantu languages.
Figure 7. Rhythmic structure in the affirmative subjunctive. (The figure compares, for the forms barime, ba bi harime, and ba bi ha mu rimir e, the placement of the post-radical H tone predicted by the rhythmic account with the correct placement; the predicted and correct forms coincide only in part.)
Do tones have features?
Larry M. Hyman
1. Introduction: Three questions about tone
In this paper I address the question of whether tones have features. Given
that most phonologists accept either binary features or privative elements
in their analyses of segmental systems, it may appear surprising that such
a question needs to be asked at all. However, as I discuss in Hyman (in
press) and below, tone has certain properties that appear to be unique within
phonological systems. Hence, it could also be that featural analyses of tones
are not necessary, even if they are well-founded in consonant and vowel
phonology. Before considering whether tones have features, there are two
prior questions about tone which will bear on my conclusion:
(1) Question #1: Why isn't tone universal?
Question #2: Is tone different?
Question #3: Do tones have features?
The first question is motivated by the fact that all languages exploit pitch in one way or another, so why not lexical or grammatical tone? It is generally assumed that somewhere around 40-50% of the world's currently spoken languages are tonal, although the distribution is highly areal, covering most of Subsaharan Africa and East and Southeast Asia, as well as significant parts of Mexico, the Northwest Amazon, and New Guinea. There would seem to be several advantages for universal tone: First, tone presents few, if any, articulatory difficulties vs. consonants (which all languages have). Second, tone is acoustically (hence perceptually?) simple, F0, vs. consonants and vowels. Third, tone is acquired early (Li and Thompson 1978, Demuth 2003), such that nativists may even want to claim that human infants are prewired for it. Thus, if all of the languages of the world had tone, we would have no problem explaining why this is. The more interesting question, to which I will return in §5, is why tone isn't universal.
The second question is whether tone is different. In Hyman (in press) I suggested that tone is like segmental phonology in every way, only more so, in two different senses: (i) Quantitatively more so: tone does certain things more frequently, to a greater extent, or more obviously (i.e. in a more straightforward fashion) than segmental phonology; (ii) Qualitatively more so: tone can do everything segments and non-tonal prosodies can do, but segments and non-tonal prosodies cannot do everything tone can do. This "more so" property contrasts with the articulatory and perceptual simplicity referred to in the previous paragraph. As Myers and Tsay (2003: 105-6) put it, "...tonal phenomena have the advantages of being both phonologically quite intricate and yet phonetically relatively straightforward (i.e. involving primarily a single perceptual dimension, although laryngeal physiology is admittedly more complex)". There is so much more you can do with tone. For example, as seen in the Giryama [Kenya] forms in (2), the tones of one word may be realized quite distantly on another (Philippson 1998: 321):
(2) a. ku-tsol-a ki-revu 'to choose a beard'   /-tsol-/ 'choose'
    b. ku-on-a ki-rvu 'to see a beard'   /-n-/ 'see'
                  |  =
                  H
In (2a) all of the TBUs are toneless, pronounced with L(ow) tone by default.
In (2b), the H(igh) of the verb root /-n-/ 'see' shifts long distance to the
penult of the following word, which then ends with a H-L sequence. Put
simply, segmental features and stress can't do this. They are typically
word-bounded or interact only locally at the juncture of words. Thus, no
language has been known to transfer the nasality of a vowel to the penult of
the following word. Similarly, one word does not normally assign stress to
the next. While tone is capable of a rich lexical life as well, it has an equal
potential at the phrase level, where the local and long-distant interaction of
tones can produce a high degree of opacity (differences between inputs and
outputs) and analytic open-endedness.
In short, tone can do everything that segmental and accentual phonology
can do, but the reverse is not true. Some of this may be due to the fact that tone
systems can be extremely paradigmatic or syntagmatic, exclusively lexical
or grammatical. Thus consider the eight tone patterns of Iau [Indonesia:
Papua] in (3).
(3) Tone   Nouns                   Verbs
    H      b 'father-in-law'       b 'came'                   totality of action, punctual
    M      be 'fire'               b 'has come'               resultative, durative
    ꜛH     b 'snake'               b 'might come'             totality of action, incompletive
    LM     b 'path'                b 'came to get'            resultative, punctual
    HL     b 'thorn'               b 'came to end point'      telic punctual
    HM     b 'flower'              b 'still not at endpoint'  telic incompletive
    ML     be` 'small eel'         b` 'come (process)'        totality of action, durative
    HLM    b 'tree fern'           b 'sticking, attached to'  telic durative

As seen on the above monosyllables (where ꜛH = super-high tone), the same eight tones contrast paradigmatically on both word classes, although with a lexical function on nouns vs. a grammatical function on verbs (Bateman 1990: 35-36).
Compare this with the representative final vs. penultimate H tone in the Chimwiini [Somalia] paradigm in (4):
(4) singular plural
n-ji:l I ate chi-ji:l we ate
ji:l you sg. ate ni-ji:l you pl. ate
j:le s/he ate wa-j:le they ate
The properties of Chimwiini are as follows (Kisseberth 2009): (i) there is grammatical tone only, i.e. no tonal contrasts on lexical morphemes such as noun stems or verb roots; (ii) H tone is limited to the last two moras; (iii) final H is morphologically conditioned, while penultimate H is the default; (iv) first and second person subjects condition final H vs. third person which takes the default penultimate H. As seen, the only difference between the second and third person singular [noun class 1] is tonal: ji:l vs. j:le. However, as seen now in (5), the final or penultimate H tone is a property of the phonological phrase:
(5) a. jile: n am you sg. ate meat jile ma-tu:nd you sg. ate fruit
b. jile: n ma s/he ate meat jile ma-t:nda s/he ate fruit
In fact, when there is wide focus, as in (6), each phonological phrase gets the appropriate final vs. penultimate H tone:
(6) a. -wa-tind il il e w-aan] n am] ka: chi-s] 'you sg. cut for the children meat with a knife'
    b. -wa-tind il il e w-ana] n ma] ka: ch-su] 's/he cut for the children meat with a knife'
Although phrasally realized, the Chimwiini final vs. penultimate patterns reflect an original tonal difference on the subject prefixes. Thus, compare the following from the Cahi dialect of Kirimi (where ꜜ = downstep):

(7) a. /o-ko-tng-a/ → o-ko-tng- 's/he is tying'
    b. /o-ko-tng-a/ → o-ko-ꜜtng- 'you sg. are tying'
          |    |
          H    H
As seen, the second person subject prefix has a H tone, while the segmentally homophonous [noun class 1] third person singular subject prefix is toneless. This suggests the following implementation of the Chimwiini facts: (i) first and second person subject markers have an underlying /H/ tone; (ii) this H tone links to the last syllable of the phonological phrase; (iii) any phonological phrase lacking a H tone receives one on its penult.
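The three-step implementation in (i)-(iii) is algorithmic enough to be stated as a small sketch. The following Python fragment is our own schematic (not Kisseberth's formalism), with a phonological phrase represented simply as a list of syllables:

```python
# Sketch of (i)-(iii): 1st/2nd person subjects contribute a /H/ that links to
# the phrase-final syllable; any phrase lacking a H gets a default penult H.
# Returns a list of tone marks parallel to the syllable list.
def assign_phrasal_h(syllables, subject_person):
    tones = [None] * len(syllables)
    if subject_person in (1, 2):      # (i) underlying /H/ from 1st/2nd person
        tones[-1] = 'H'               # (ii) ...linked to the last syllable
    else:
        tones[-2] = 'H'               # (iii) default H on the penult
    return tones
```

On a phrase of five syllables, a second person subject yields a final H and a third person subject a penult H, matching the contrast illustrated in (5).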
While tone is dense and paradigmatic in Iau, it is sparse and syntagmatic in Chimwiini, so much so that the question even arises as to what the final vs. penultimate H tone contrast is:

(8) a. morphology? (a property of [+1st pers.] and [+2nd pers.] subject prefixes)
    b. phonology? (property of the phonological phrase: H is semi-demarcative)
    c. syntax? (property of the syntactic configurations which define the P-phrases)
    d. intonation? (not likely that there would be a first/second person intonation)

Note also that since the final H tone targets the end of a phonological phrase, it is not like phrasal morphology, e.g. English -s, which is restricted to the right edge of a syntactic noun phrase. Again, tone is different: there does not seem to be a segmental or metrical equivalent.
This, then, brings us to the third question: Do tones have features? If yes, are they universal in the sense that all languages define their speech sounds in terms of a small feature set (http://nickclements.free.fr/featuretheory.html)? If no, how do we talk about different tone heights and contours and their laryngeal interactions? As Yip puts it:
A satisfactory feature system for tone must meet the familiar criteria of characterizing all and only the contrasts of natural language, the appropriate natural classes, and allowing for a natural statement of phonological rules and historical change. In looking at East Asian tone systems the main issues are these: (a) How many different tone levels must be represented? (b) Are contour tones single units or sequences of level tones? (c) What is the relationship between tonal features and other features, especially laryngeal features? (Yip 1995: 477; cf. Yip 2002: 40-41)
These and other issues will be addressed in subsequent sections. In §2 I will outline the issues involved in responding to this question. In the following two sections we will look at whether features can capture tonal alternations which arise in multiple tone-height systems, first concerning tonal morphology (§3) and second concerning abstract tonal phonology (§4). The conclusion in §5 is that although tone features may be occasionally useful, they are not essential. I end by suggesting that the existence of tone features is not compelling because of their greater autonomy and unreliable intersection with each other and other features. This explains as well why tone is different and not universal.
2. Do tones have features?
In addressing the above question, the central issue of this paper, it should first be noted that there has been no shortage of proposals of tone features and tonal geometry. (See Anderson 1978, Bao 1999, Snider 1999, and Chen 2000: 96 for tone-feature catalogs.) However, there has been little agreement other than: (i) we would like to avoid features like [RISING] and [FALLING];
(ii) we ought in principle to distinguish natural classes of tones by features;
(iii) we ought in principle to be able to capture the relation of tones to
laryngeal features, e.g. voicing, breathiness, creakiness. However, at the same
time, there has been a partial disconnect between tone features and tonal
analysis: Tone features are barely mentioned, if at all, in most theoretical and
descriptive treatments of tone. Tone features are, of course, mentioned in a
textbook on tone, but read on:
Although I have left unresolved many of the complex issues bearing on the
choice of a feature system, in much of the rest of this book, it will not be
necessary to look closely at the features of tone. Instead we will use just H,
M, L, or tone integers, unless extra insights are to be gained by formulating
the analysis in featural terms. (Yip 2002: 64)
In actual practice, unless a researcher is specifically working on tone features, s/he is likely to avoid them. Thus compare two recent books on Chinese tonology, Bao (1999) vs. Chen (2000). Bao is specifically interested in developing a model of tonal geometry and tone features, which thus pervade the book. Chen, on the other hand, is interested in a typology of tone sandhi rules and how they apply, hence almost totally avoids features, using Hs and Ls instead.
Since tone and vowel height are both phonetically scalar, it is not surprising that similar problems arise in feature analyses. For example, the respective coalescence of /a+i/ and /a+u/ to [e] and [o] is hard to describe if /a/ is [+low] and /i/ and /u/ are [+high], since the desired output is [-high, -low]. Similarly, the coalescence of a HL or LH contour to [M] is hard to describe if H = [+STIFF] and L = [+SLACK], since the desired output is [-STIFF, -SLACK]. Scalar chain shifts such as i → e → ɛ and H → M → L are notorious problems for any binary system. Still, phonologists do not hesitate to use binary height features for vowels, but often not for tones.
The problem of tone features is largely ignored in two-height systems,
where there is little advantage to using, say, [UPPER] over H and L. Instead,
the issue concerns the nature of the H/L contrast, which can be privative and/
or binary, as in (9).
(9) a. /H, L/     e.g. Baule, Bole, Mende, Nara, Falam, Kuki-Thaadow, Siane, Sko, Tanacross, Barasana
    b. /H, Ø/     e.g. Afar, Chichewa, Kirundi, Ekoti, Kiwai, Tinputz, Una, Blackfoot, Navajo, Seneca
    c. /L, Ø/     e.g. Malinke (Kita), Ruund, E. Cham, Galo, Kham, Dogrib, Tahltan, Bora, Miraa
    d. /H, L, Ø/  e.g. Ga, Kinande, Margi, Sukuma, Tiriki, Munduruku, Puinave, Yagua
Another variant is to analyze level tones as /H/ vs. /Ø/, but contour tones as /HL/ and /LH/, as in Puinave: L-tones are considered phonetic entities, which are therefore not specified lexically, except for the L-tones that are part of the contrastive contour tones (Girón Higuita and Wetzels 2007).
Assuming the possibility of underspecification, similar analytical possibilities occur in three-height tone systems, as in (10).

(10) a. /H, M, L/   b. /H, Ø, L/    c. /H, M, L, Ø/
                       /Ø, M, L/
                       /H, M, Ø/
Beyond the above possibilities is the fact that in some systems M is a distinct third tone equally related to H and L, while in others M may be asymmetrically related to one of the tones. This produces output possibilities such as the following, where the ↑ and ↓ arrows represent raising and lowering, respectively:
(11) a. /H, M, L/    M is equally related to /H/ and /L/   e.g. Tangkhul Naga (pers. notes)
     b. /↑H, H, L/   M is a non-raised variant of /H/      e.g. Engenni (Thomas 1978)
     c. /H, ↓H, L/   M is a lowered variant of /H/         e.g. Kom (Hyman 2005)
     d. /H, ↑L, L/   M is a raised variant of /L/          e.g. Kpelle (Welmers 1962)
     e. /H, L, ↓L/   M is a non-lowered variant of /L/     e.g. Ewe (Smith 1973, Stahlke 1971, Clements 1978)
While some languages have three underlying contrastive tone heights (11a), others derive the third height by the indicated process in (11b-e). As indicated in (12), both Kom and Ik have two underlying, but three surface tone heights:

(12) a. Kom /H, L/   L-H → L-M (→ M)   (Hyman 2005)
     b. Ik  /H, L/   L-H → M-H (→ M)   (Heine 1993)
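The two mappings in (12) are mirror-image context-sensitive rules, described in the prose that follows. As a minimal sketch of our own (not part of the original analyses), they can be applied to strings of tones:

```python
# (12a) Kom: H becomes M after L.  (12b) Ik: L becomes M before H.
# Both rules read their context from the input string.
def kom_lower(tones):
    return ['M' if t == 'H' and i > 0 and tones[i - 1] == 'L' else t
            for i, t in enumerate(tones)]

def ik_raise(tones):
    return ['M' if t == 'L' and i + 1 < len(tones) and tones[i + 1] == 'H' else t
            for i, t in enumerate(tones)]
```

So kom_lower(['L', 'H']) gives ['L', 'M'] and ik_raise(['L', 'H']) gives ['M', 'H']; once the conditioning L or H is lost, the derived M becomes surface-contrastive.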
Whereas Kom regularly lowers a H to M after L, Ik raises a L to M before H. Since the triggering tone may be lost, the M becomes surface-contrastive in both languages. Finally, M may derive from the simplification of a HL or LH contour tone, e.g. Babanki L-H L-H → L-M-H (Hyman 1979a: 23).
The above possibilities arise independent of whether the raising or lowering process creates only one additional pitch level (as in the cited languages) or whether there can be multiple upsteps and downsteps. The above all assumes that tone features define pitch levels rather than pitch changes. In a pitch-change system such as Clark's (1978), the H, M and L tone heights could be represented as /↑/, // and /↓/.
In principle, even more interpretations should be possible in systems with four or five surface-contrasting tone heights. Some such systems can be shown to derive from three (or even two) underlying tones, e.g. Ngamambo, whose four heights H, M, ꜜM, L can be derived from /H/ and /L/ (Hyman 1986a). While it is sometimes possible to argue that the four (~five) tone heights form natural classes (see below), equally common are cases such as in (13) where such evidence is weak or lacking:
(13) a. Five levels: Kam (Shidong) [China] (Edmondson and Gregerson 1992) (5 = highest, 1 = lowest)
        [a¹¹ 'thorn'   [a²² 'eggplant'   [a³³ 'father'   [a⁴⁴ 'step over'   [a⁵⁵ 'cut down'

     b. Four level + five contour tones in Itunyoso Trique [Mexico] (DiCanio 2008)
        Level                Falling          Rising
        e⁴ 'hair'            li⁴³ 'small'     yh⁴⁵ 'wax'
        nne³ 'plough (n.)'   nne³² 'water'    yah¹³ 'dust'
        nne² 'to tell lie'   nne³¹ 'meat'
        nne¹ 'naked'
Where multiple contrasting tone heights join into natural classes, the assumption is that they share a feature. For this purpose numerous tone-feature proposals have appeared in the literature, among which those in the following table, based on Chen (2000: 96), where 5 = the highest and 1 = the lowest pitch:
(14) (5 = highest = H, 3 = M, 1 = lowest = L)
 a. Halle and Stevens (1971):  STIFF + − −,  SLACK − − +  (for H, M, L)
 b. Yip (1980):                UPPER + + − −,  HIGH + − + −
 c. Clements (1983):           ROW 1 h h l l,  ROW 2 h l h l
 d. Bao (1999):                STIFF + + − −,  SLACK + − + −
 Number of languages (out of 545 catalogued) with 5, 4, 3 and 2 tone heights: 12, 26, 140, 367
As seen in the top row, linguists often identify the tone heights with integers, as it is not even clear what to call the tones. Thus, in a four-height system, the middle two tones are sometimes called 'raised mid' and 'mid', sometimes 'mid' and 'lowered mid'. There also is no agreement on which accents to use to indicate these two tones: While [¯] unambiguously indicates M tone in a three-height system, in a four-height system it sometimes indicates the lower of the two M tones, sometimes the higher. The numbers in the bottom line of (14) indicate how many tone systems I have catalogued out of 545 with five, four, three and two underlying tone heights. As seen, systems with more than three heights are relatively rare as compared with two- and three-height systems.
For the purpose of discussion let us assume the following feature system, with Pulleyblank's (1986: 125) replacement of Yip's HIGH with RAISED:

(15) Yip/Pulleyblank tone feature system (M̱ = a lower-mid tone)
             H    M    M̱    L
   UPPER     +    +    −    −
   RAISED    +    −    +    −
             4    3    2    1
The natural classes captured by such a system are the following:

(16)  [+UPPER]   [−UPPER]   [+RAISED]   [−RAISED]
       H, M       M̱, L       H, M̱        M, L
       4, 3       2, 1       4, 2        3, 1
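The bookkeeping in (15)-(16) can be made fully explicit; the following sketch is our own (the label 'M_lower' for the lower-mid tone is ours), computing a natural class from a feature specification:

```python
# (15): each tone is a pair of binary features (UPPER, RAISED); True = '+'.
TONES = {
    'H':       (True,  True),   # level 4
    'M':       (True,  False),  # level 3
    'M_lower': (False, True),   # level 2, the lower-mid tone
    'L':       (False, False),  # level 1
}

def natural_class(feature, plus):
    """Return the set of tones sharing [+feature] or [-feature], as in (16)."""
    i = {'UPPER': 0, 'RAISED': 1}[feature]
    return {t for t, spec in TONES.items() if spec[i] == plus}
```

Here natural_class('RAISED', True) returns {'H', 'M_lower'}, i.e. the non-contiguous 4,2 grouping discussed immediately below.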
The interesting groupings are those captured by [RAISED], since the tone heights 4,2 and 3,1 are not contiguous. While such pairings are sometimes observed (see Gban in §3), there are problems inherent in this and the other feature proposals in (14):
(17) a. 5-height systems: no way to characterize a fifth contrasting tone height
     b. 4-height systems: no way to characterize the inner two tone heights (3,2) as a natural class
     c. 3-height systems: potential ambiguity between two kinds of mid tones (3 vs. 2)
Prior to the establishment of the feature system in (15), when features such as [HIGH] and [LOW] were in currency, the general response to the problem in (17a) was to propose a third feature such as MID (Wang 1967), to expand the inventory in the mid range, or EXTREME (Maddieson 1971) which, expanding the inventory at the top and bottom, has the dubious property of grouping 1,5 as a natural class. Concerning the problem in (17b), a [−EXTREME] specification, like [+MID], could group together the 3,2 tones in a four-height system. However, such features have not gained currency and appear almost as ad hoc as [αUPPER, −αRAISED]. Given that there are only five contrasting levels, the argument for three binary tone features is considerably weakened if there is no principled way to pare the eight logical feature combinations down to five height values. Of course, there is always the possibility that the same tone height might have different feature values in different tone systems, which brings us to the problem in (17c): The M tone in a three-height system can be either [+UPPER, −RAISED] or [−UPPER, +RAISED], an issue which is taken up in §3 and §4 below. All of these problems raise the question of how abstract the tonal representations should be allowed to be: A scalar pitch system with 2, 3, 4 or 5 values would be much more concrete, hence arguably the more natural solution, were it not for the general acceptance of binary features or privative elements in segmental phonology and elsewhere, e.g. in morphology (Corbett and Baerman 2006).
In the following two sections we will take a close look at how the features in (15) fare in the analysis of selected three-height tone systems. §3 is concerned with tonal morphology and §4 with abstract tonal phonology. Both involve the potential featural ambiguity of phonetically identical M tones as [+UPPER, −RAISED] and [−UPPER, +RAISED], even in the same language. Although Bao (1999: 186) sees the dual representation of M as a virtue of the theory, we shall see that such tone features do not always yield a revealing account of M tone properties.
3. Tonal morphology and M tone
In this section we will examine how the tone features in (15) account for tonal morphology. Focus will be on tonal marking on verbs. One argument for tone features would be that they can function independently as tonal morphemes, e.g. marking the inflectional features of tense, aspect, mood, polarity, person and number. We begin with two four-level tone systems whose inflectional tones tell two quite different stories. The first is Iau, whose eight tone patterns in (3) were seen to be lexical on nouns, but morphologically determined on verbs, as in (18).
(18)                 telic    totality of action    resultative
      punctual       HL       H                     LM
      durative       HLM      ML                    M
      incompletive   HM       ꜛH
Although Iau verbs lend themselves to a paradigmatic display by morpheme
features, the portmanteau tonal melodies do not appear to be further
segmentable into single tones or features.
A quite different situation is found in the subject pronoun tones in
Gban [Ivory Coast], as reported by Zheltov (2005: 24):
(19)               present             past
                   sg.      pl.       sg.      pl.
   1st pers.       i²       u²        i⁴       u⁴       [+UPPER]
   2nd pers.       ɛɛ²      aa²       ɛɛ⁴      aa⁴
   3rd pers.       ɛ¹       :¹        ɛ³       :³       [−UPPER]
                   [−RAISED]          [+RAISED]

In the present tense, third person subject pronouns are marked by a 1 tone (= lowest), while first and second person pronouns have a 2 tone. In the past tense, each tone is two levels higher: third persons receive 3 tone, while first and second persons have 4 tone. In this case tone features work like a charm: As indicated, first/second persons can be assumed to be marked by [+UPPER] and third person by [−UPPER]. These pronouns receive a [−RAISED] specification in the present tense vs. a [+RAISED] specification in the past tense. (The same result would be achieved if we were to reverse [UPPER] and [RAISED] to mark tense and person, respectively.)
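The compositional logic of the Gban paradigm can be sketched in a few lines (our own schematic): person fixes [UPPER], tense fixes [RAISED], and in this system [RAISED] shifts the pitch by two levels while [UPPER] shifts it by one.

```python
# (19): pitch level 1 (lowest) to 4 (highest) falls out from two binary choices.
def gban_pitch_level(person, tense):
    upper = person in (1, 2)       # 1st/2nd person = [+UPPER]
    raised = (tense == 'past')     # past tense = [+RAISED]
    return 1 + int(upper) + 2 * int(raised)
```

This reproduces the four cells of (19): third person present is level 1, first/second present level 2, third past level 3, first/second past level 4.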
It is cases like Gban which motivate Yip's (1980) original proposal, based on tonal bifurcation in East and Southeast Asia: If [UPPER] represents the original tonal opposition, often attributable to a laryngeal distinction in syllable finals, [RAISED] can potentially modify the original contrast and provide the four-way opposition (which does not always produce four tone levels in the Asian cases). As (19) demonstrates, the same historical development has produced a four-height system whose natural classes include 1,2 (present tense), 3,4 (past tense), 2,4 (first and second person) and 1,3 (third person). Although Gban is a Mande language, similar four-level systems are found in other subgroups of Niger-Congo, e.g. in Igede [Nigeria; Benue-Congo] (Stahlke 1977: 5) and Wobe [Liberia; Kru] (Singler 1984).
Given the neatness of the Gban example, let us now consider how the
features [UPPER] and [RAISED] function as tonal morphemes in three-height
systems. A number of languages have the tonal properties in (20).
(20) a. noun stems contrast /H/, /M/ and /L/ lexically
b. verb roots contrast only two levels lexically but are realized
with all three levels when inflectional features are spelled out
Again, it is the assignment of verb tones which is of interest. The relevant
tone systems fall into two types, which are discussed in the following two
subsections.
3.1. Type 1: H/M vs. M/L verb tones
In the first, represented by Day [Chad] (Nougayrol 1979), the two verb classes have the higher/lower variants H/M vs. M/L:
(21) a.                    /yuu/ [+U] 'put on, wear'    /yuu/ [−U] 'drink'
        completive [+R]    yuu                          yuu
        incompletive [−R]  yuu                          yuu

     b.                    /yuu, H/ 'put on, wear'      /yuu, M/ 'drink'
        completive         yuu                          yuu
        incompletive L     yuu                          yuu

     c.                    /yuu/ [+2] 'put on, wear'    /yuu/ [+1] 'drink'
        completive         yuu [+2]                     yuu [+1]
        incompletive [−1]  yuu [+1]                     yuu [0]
In (21a) the lexical contrast is assumed to be [UPPER], while the (in)completive aspect assigns [RAISED]. This produces a situation where both [+UPPER, −RAISED] and [−UPPER, +RAISED] define phonetically identical M tones. The question is how one might account for the above facts without features. (21b) posits a lexical contrast between /H/ and /M/. The completive aspect is unmarked, while the incompletive aspect has a /L/ prefix which combines with the lexical tone of the verb. The resulting LH and LM contours would then have to simplify to M and L, respectively. Since contours are rare in the language (Nougayrol 1979: 68), this is not problematic. A corresponding scalar solution is sketched in (21c), where it is assumed that the /H/ and /M/ verbs have values of [+2] and [+1], respectively. As seen, completive aspect is unmarked, while incompletive aspect contributes a value of [−1]. When the integers combine, there are again two sources of [+1] M tone and one source each of [+2] H and [0] L tone.
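The scalar bookkeeping of (21c) is simple addition. As a sketch of our own (with the encoding H = [+2], M = [+1], L = [0]):

```python
# (21c): the lexical verb value plus the aspectual value gives the surface level.
LEVELS = {2: 'H', 1: 'M', 0: 'L'}

def day_surface(lexical_value, aspect):
    aspect_value = {'completive': 0, 'incompletive': -1}[aspect]
    return LEVELS[lexical_value + aspect_value]
```

Both day_surface(2, 'incompletive') and day_surface(1, 'completive') give 'M': the two sources of M tone just noted.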
While all three analyses capture the limited data in (21), the question is how they fare when the verb is bi- or trisyllabic. The regular tone patterns are schematized in (22).

(22) completive     M   H   HL   H-M   H-L   M-M   M-L    H-H-L
     incompletive   L   M   ML   M-M   M-L   L-M   L-ML   M-M-L

As seen, bi-syllabic verbs must end M or L. (The final contour of L-ML will be discussed shortly.) The one regular trisyllabic pattern shows that it is only the last syllable that is unaffected, with inflectional [RAISED] targeting the H-H ~ M-M on the first two syllables. Let us, therefore, add to the analysis in (21a) that the final syllable is [−UPPER] and contrastively prespecified for [RAISED]. This produces the feature specifications in (23).
(23)                          σ1 σ2    σ1 σ2    σ1 σ2    σ1 σ2
  underlying     UPPER         +        +        −        −
                 RAISED           +        −        +        −
  completive                  H-M      H-L      M-M      M-L
                 UPPER        + −      + −      − −      − −
                 RAISED       + +      + −      + +      + −
  incompletive                M-M      M-L      L-M      *L-L
                 UPPER        + −      + −      − −      − −
                 RAISED       − +      − −      − +      − −
                                                         correct: L-ML
As seen, all of the tones come out correctly except for the bottom right-hand form, where completive M-L is predicted to alternate with L-L rather than the correct L-ML. The /H, M, L/ analysis in (21b) is better equipped to get the right output. Recall that in this analysis verb roots are /H/ vs. /M/. When the incompletive L is prefixed to M-M and M-L inputs, we obtain the intermediate representations LM-M and LM-L. The LM-M becomes L-M by delinking the M from the first syllable. Assuming that the same happens in the second case, all that needs to be said is that the delinked M reassociates to the second syllable to produce the ML contour.
Since there is no input M in either the featural or scalar analyses, one might attempt to provide one by fully specifying verb roots, with a [+UPPER, +RAISED] /H/ verb becoming [+UPPER, −RAISED] M in the incompletive. (There would no longer be any need for a completive [+RAISED] prefix.) However, this still does not solve the problem. Since the M-L verb would have a [−UPPER, +RAISED] specification on its first syllable, the [−RAISED] incompletive prefix would only change the value of [RAISED], not delink it. We therefore would have to propose that the incompletive prefix is fully specified as [−UPPER, −RAISED]. What this does is make the analysis exactly identical to the /H, M, L/ analysis in (21b), where there was no need to refer to features at all. The same is true of the scalar analysis, where the [−1] incompletive prefix would have to contour with the [+1] M, as if it were a real tone, not a pitch-change feature. We conclude that there is no advantage to a featural analysis of tone in Day or in Gokana [Nigeria], which has a similar system (Hyman 1985).
3.2. Type 2: H/L vs. M verb tones
There is a second type of system where nouns have a three-way lexical contrast between H, M and L and verbs a two-way contrast. While in the type 1 languages the two-way contrast is identifiable as a relatively higher vs. lower verb tone, in type 2 one verb class alternates between H and L, while the other is a non-alternating M. First documented in Bamileke-Fe'fe' (Hyman 1976), consider the H~L alternations on the first (=root) syllable of verbs in Leggbó (Hyman et al. 2002), where the second tone is suffixal:

(24)            MCA/ORA       SRA           NEG
Root tone:      /L/   /M/     /L/   /M/     /L/   /M/
Perf./Prog.     H-M   M-M     L-M   M-M     H-M   M-M
Habitual        L-L   M-L     L-L   M-L     H-M   M-M
Irrealis        L-L   M-L     L-L   M-L     L-L   M-L

(MCA: main clause affirmative; SRA, ORA: subj./obj. relative affirmative; NEG: negation)
Unless we adopt an ad hoc feature such as MID or EXTREME, there is no synchronic reason why H and L should alternate to the exclusion of M. Paster's (2003) solution is to propose that L is the underspecified tone in Leggbó such that H or L prefixes can be assigned to it. A M root would resist these prefixal tones since it is specified. The solution has some appeal as Leggbó has only a few LH and HL tonal contours, hence little need to prespecify L tone. However, it cannot work for Bamileke-Fe'fe', which has numerous LM contours and floating L tones. While Hyman (1976) provided an abstract analysis involving floating H tones on both sides of the L, the alternative is to simply accept the arbitrariness of the H/L alternations, which represent morphological processes of replacive tone. In this respect they no more need to have a featural account than the replacive tone sandhi of Southern Min dialects, e.g. Xiamen 24, 44 → 22 → 21 → 53 → 44 (Chen 1987). Type 2 systems thus provide even less evidence for tone features than type 1.
4. Tonal phonology and M tone
While the previous section sought evidence for features from the behavior of tonal morphemes which are assigned to verb forms, in this section we shall seek purely phonological evidence for features in three-height tone systems. Since the systems in (14b-d) provide four distinct feature configurations, they also make the prediction that a three-height system could have two phonologically contrasting tones which are phonetically identical, as summarized in (25):

(25) a. /4/ and /3/ could be two kinds of phonetic H tone
     b. /3/ and /2/ could be two kinds of phonetic M tone
     c. /2/ and /1/ could be two kinds of phonetic L tone

In the following subsections we shall consider Villa Alta Yatzachi Zapotec, which represents (25c), and Kagwe (Dida), which represents (25b). The question will be whether tone features can be helpful in accounting for such behaviors.
4.1. Two kinds of L tone in Villa Alta Yatzachi Zapotec
According to Pike ([1948] 1975), Villa Alta Yatzachi Zapotec [Mexico]
has three surface tones, H, M, and L, as well as HM and MH contours on
monosyllabic words. However, there are two kinds of L tones: those which
remain L in context vs. those which alternate with M. Pike refers to these as class A vs. class B, respectively. In (26), these are identified as L_a and L_b:
(26) a. L_b → M / __ {M, H}
     b. L_a: ba 'cactus'       ba gli 'old cactus'
        L_b: ba 'animal'       bia gli 'old animal'
Rule (26a) says that class B L tones are raised to M before a M or H tone. As seen in (26b), there are actual minimal pairs, i.e. words which are phonetically identical in isolation but which have different behaviors in the raising context. Assuming that we do not want to identify the two L tones by means of a diacritic, as Pike does, there are two possible featural strategies we might attempt. The first in (27a) is to fully specify L_b as [-UPPER, +RAISED], a lower-mid (M̠) tone, featurally distinct from both M and L:
(27) a. L_b is fully specified as /M̠/

                 H    M    L_b   L
        UPPER    +    +    -     -
        RAISED   +    -    +     -

     b. L_b is underspecified for [RAISED]

                 H    M    L_b   L
        UPPER    +    -    -     -
        RAISED   +    +          -
The second strategy in (27b) is to underspecify L_b for exactly the feature that alternates, namely [RAISED]. This makes L_b featurally non-distinct from both /M/ and /L/. The rules needed under each analysis are formulated in (28).
(28) a. if L_b is fully specified as [-UPPER, +RAISED]:
        [-UPPER, +RAISED] → [-RAISED] / __ [-UPPER, -RAISED]
     b. if L_b is underspecified for [RAISED]:
        [0 RAISED] → [α RAISED] / __ [α RAISED]
In (28a) the lower-mid tone becomes L when followed by L. Since the lowering has to occur also before pause, we would have to assume a prepausal L% boundary tone. In (28b), the underspecified [RAISED] feature acquires the same value as what follows it, thereby becoming [+RAISED] before H and M, but [-RAISED] before L(%). Except for the use of the alpha notation to represent feature spreading, both analyses seem reasonable up to this point.
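Up to this point the underspecification analysis can even be made mechanical. The following is a minimal sketch (my own toy encoding of (27b)/(28b), with schematic tone strings rather than Pike's transcriptions) of how the two phonetically identical L tones diverge:

```python
# Tones as (UPPER, RAISED) pairs, following the matrix in (27b);
# None marks the underspecified [0 RAISED] of the class B L tone.
H = (True, True)     # [+UPPER, +RAISED]
M = (False, True)    # [-UPPER, +RAISED]
L = (False, False)   # [-UPPER, -RAISED]
Lb = (False, None)   # class B L: [-UPPER, 0 RAISED]

def fill_raised(tones):
    """Rule (28b): an underspecified [RAISED] copies the [RAISED] value
    of the following tone; the prepausal L% boundary tone supplies
    [-RAISED] at the end of the string."""
    out = list(tones)
    for i in range(len(out) - 1, -1, -1):   # right to left, so L% feeds chains
        upper, raised = out[i]
        if raised is None:
            raised = out[i + 1][1] if i + 1 < len(out) else False
            out[i] = (upper, raised)
    return out

# In isolation the class B L surfaces as plain L ...
assert fill_raised([Lb]) == [L]
# ... but before M or H it is raised to M, unlike the class A L:
assert fill_raised([Lb, H]) == [M, H]
assert fill_raised([L, H]) == [L, H]
```

Under this analysis the class A/B contrast is carried entirely by the presence vs. absence of a [RAISED] specification.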
Now consider a second process where the H of the second part of a compound is lowered to M after both L_a and L_b:
(29) a. /d-/ (L_a) denominalizer + /zz/ 'sweet' → dziz a 'sweet'
     b. /ns/ (L_b) 'water' + /y?/ 'fire' → nisyi' 'kerosene'
Assuming this is assimilation rather than reduction (perhaps questionable),
the rules would be as follows:
(30) a. if L_b is fully specified as fourth tone [-UPPER, +RAISED]:
        [+UPPER] → [-RAISED] / [-UPPER] # __
     b. if L_b is underspecified for [RAISED]:
        [+UPPER] → [-UPPER] / [-UPPER, {-RAISED, 0 RAISED}] # __
Each of the above rules has a problem. In (30a), the change of feature value is not explicitly formalized as an assimilation, e.g. by spreading of a feature. Instead, [+UPPER] changes to [-RAISED] after [-UPPER]. The rule in (30b) can be expressed as the spreading of a preceding [-UPPER], but requires the awkward disjunction in the environment so that M tone, which is [-UPPER, +RAISED], does not condition the rule. Note that one cannot first fill in [0 RAISED] as [-RAISED], since, as seen in (29b), [0 RAISED] becomes [+RAISED] by the rule in (28b). It is thus not obvious that features are helpful in distinguishing the two kinds of L tone in this language.
4.2. Two kinds of M tone in Kagwe (Dida)
The problem is even more acute in Kagwe (Dida) [Ivory Coast], which has
two types of M tone (Koopman and Sportiche 1982): /M/ (class A) alternates
between M and H, while /M/ (class B) remains M. The rule in question is
formulated in (31a).
(31) a. M_a → H / M_a __       otherwise M_a → M (= M_b)
     b. M_a: le 'spear'        mn l 'this spear'
             j 'child'         mn j 'this child'
     c. M_b: kp 'bench'        mn kp 'this bench'
             l 'elephants'     mn l 'these elephants'
As indicated, M_a becomes H after another M_a. Alternations are seen after the L-M_a word mn 'this/these' in (31b). M_b tones do not change after mn in (31c).
As in the case of Zapotec L_b, two possible underlying representations of M_a are considered in (32).
(32) a. M_a is fully specified as /M/

                 H    M_a   M_b   L
        UPPER    +    +     -     -
        RAISED   +    -     +     -

     b. M_a is underspecified for [UPPER]

                 H    M_a   M_b   L
        UPPER    +          -     -
        RAISED   +    +     +     -
In (32a), M_a is fully specified as M vs. phonetically identical M_b, which has the features of a lower-mid. In (32b), M_a is underspecified for the feature which alternates, namely [UPPER], hence is non-distinct from both /H/ and /M_b/. The rules needed under each of these analyses are formulated in (33).
(33) a. M_a is fully specified as [+UPPER, -RAISED]:
        [+UPPER, -RAISED] → [+RAISED] / [+UPPER, -RAISED] __
     b. M_a is underspecified for [UPPER]:
        [0 UPPER] → [+UPPER] / [0 UPPER] __
        [0 UPPER] → [-UPPER]
In (33a), the raising rule appears to be dissimilatory, perhaps an OCP effect? The question here is why the language would not permit a succession of abstract [+UPPER, -RAISED] tones, at the same time allowing phonetically identical [M-M] sequences from three other sources: /M_a-M_b/, /M_b-M_a/, /M_b-M_b/. The rule would make sense only if Kagwe has an output condition *[+UPPER, -RAISED], with all remaining such tones converting to [-UPPER, +RAISED]. However, this would be a very abstract analysis indeed. The rule in (33b) is even more suspect: Why should [0 UPPER] become [+UPPER] only if preceded by another [0 UPPER]?
While Koopman and Sportiche (1982) do point out that other Dida dialects have four contrasting tone heights, as suggested by the matrix in (32a), there are other possible analyses of M_a. One is to treat M_a as /M/ and M_b as /M̠/. The dissimilation rule would thus become M → H / M __. Even better is to represent M_a either as a MH contour tone, as in (34a), or as a M tone followed by a floating H, as in (34b).
(34) a. M_a as a contour          b. M_a as M + floating H

              V                            V
             / \                           |
            M   H                          M    H
     c. M_a → H as plateauing

              V         V
              |         =
              M    H    M    H

        (the floating H of the first M_a spreads to the following V,
        whose own M delinks, marked '=')
If M_a is analyzed as M followed by a floating H, as in (34b), the raising rule can be formulated as a common case of H tone plateauing, as in (34c). In fact, one might even attempt such an interpretation of Villa Alta Yatzachi Zapotec L_b, which could be a L followed (preceded?) by a floating M. What this means is that featural analyses may in some cases be denecessitated by the availability of contour representations and floating tones. Both of the representations in (34a,b) at least give a principled reason why M_a becomes H after another M_a.
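The floating-H alternative can likewise be simulated. In this sketch (my own illustrative encoding, not Koopman and Sportiche's formalism), M_a is a M tone followed by a floating H, as in (34b), and M_b is a plain M:

```python
# Each tone: (linked tone, whether a floating H follows), as in (34b).
Ma = ('M', True)    # M plus floating H
Mb = ('M', False)   # plain M
L = ('L', False)
H = ('H', False)

def surface(tones):
    """H-tone plateauing, as in (34c): a M flanked by a floating H on
    each side (i.e. a Ma preceded by another Ma) is raised to H;
    otherwise floating Hs are simply left unrealized."""
    out = []
    for i, (tone, floats_h) in enumerate(tones):
        after_floating_h = i > 0 and tones[i - 1][1]
        if tone == 'M' and floats_h and after_floating_h:
            out.append('H')    # Ma -> H after Ma
        else:
            out.append(tone)
    return out

# The demonstrative 'this' is L-Ma: a following Ma noun surfaces as H,
# while a Mb noun does not change.
assert surface([L, Ma, Ma]) == ['L', 'M', 'H']
assert surface([L, Ma, Mb]) == ['L', 'M', 'M']
```

Nothing here appeals to tone features: the contrast between the two Ms lies entirely in the floating H, and the alternation follows from an independently common process.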
4.3. Lowered or downstepped M tone?
In the preceding two subsections we have considered two three-height tone systems which have two classes of phonetically identical tones: L_a vs. L_b in Villa Alta Yatzachi Zapotec and M_a vs. M_b in Kagwe. While these L_b and M_a alternate with M and H, respectively, the output system still remains one of three tone heights. A slightly different situation is found in Jibu [Nigeria] (van Dyken 1974: 89), whose class 1 vs. class 2 M tone properties are summarized and exemplified in (35).
(35) a. a class 2 mid tone is lowered when it follows a class 1 mid tone:
        ti wan 'he is buying cloth' (ti = M_1, wan = M_2)
     b. both a class 1 mid tone and a class 2 mid tone are lowered when
        they follow a lowered mid tone:
        k s bi b 'he made bad thing' (k, s, b = M_1, bi = M_2)
As indicated, Jibu appears to have a surface four-height system with the need to distinguish between two types of M tone. Since it is M_2 which undergoes lowering, it seems appropriate to analyze it as involving a L+M sequence in one of the ways in (36).

(36) a. M_2 as a contour          b. M_2 as floating L + linked M

              V                            V
             / \                           |
            L   M                     L    M
What is crucial in this process is that M_2 establishes a new (lower) M level to which all subsequent M tones assimilate. Thus, in (35b), the M_1 syllable b is realized on the same level as the preceding tone of ꜜbi and not higher. The prediction of the [UPPER] and [RAISED] tone features is that the inner two tones of a four-height system should not be systematically related, since they bear opposite values of both features. In fact, in every case I know where M assimilates to M after another M, the latter can be interpreted as a non-iteratively downstepped ꜜM, as in Jibu, Gwari, Gokana, Ngamambo etc. (Hyman 1979a, 1986a), and possibly Bariba, the example which Clements, Michaud and Patin (this volume) cite. This observation raises the question of whether iterative ꜜH, ꜜM, and ꜜL downsteps should be captured by a feature system vs. an independent register node or tier (cf. the same question concerning the relation between vowel height and ATR (Clements 1991)).
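The register alternative can be made concrete with a toy scaling sketch (the pitch integers are arbitrary choices of mine, purely for illustration): a downstep lowers a running register, and every subsequent tone is realized relative to it:

```python
def scale(tones):
    """Map a tone string to toy pitch levels. A '!'-prefixed tone
    (downstep) lowers the register by one step; all subsequent tones
    are realized in the lowered register."""
    base = {'H': 5, 'M': 3, 'L': 1}
    register = 0
    out = []
    for t in tones:
        while t.startswith('!'):
            register -= 1
            t = t[1:]
        out.append(base[t] + register)
    return out

# (35a): the Jibu class 2 M is realized a step below a preceding class 1 M
assert scale(['M', '!M']) == [3, 2]
# (35b): later Ms assimilate to the lowered level -- not higher
assert scale(['M', '!M', 'M', 'M']) == [3, 2, 2, 2]
# iterative downstep: each new '!' lowers the register further
assert scale(['H', '!H', '!H']) == [5, 4, 3]
```

On such a register treatment, the fact that later M tones surface on the newly lowered M level is automatic, whereas the two inner levels of the resulting four-height system bear opposite values of both [UPPER] and [RAISED] and are not expected to be related at all.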
In summary, while M tones should provide unambiguous evidence for features, instead questions arise due to their phonological properties (recall (11)). For every case where tone features appear to be useful, or at least usable, there is another case where they either don't provide any insight or run into difficulties. Why this may be so is the issue with which I conclude in §5.
5. Conclusion
From the preceding sections we conclude that the case for tonal features is not particularly strong. This is revealed both from the specific examples that have been examined as well as the widespread practice of referring to tones in terms of H, M, L or integers. Let us now revise and reorder the questions that were raised in (1) and ask:
(37) a. Why is tone different?
b. Why is the case for tone features so weak?
c. Why isnt tone universal?
It turns out that the answer to all three questions is the same: Tone is different
because of its greater diversity and autonomy compared to segmental
phonology. Because of its diversity tone is hard to reduce to a single set of
features that will do all tricks. Because of its autonomy, feature systems that
have been proposed, even those which relate tones to laryngeal gestures,
are not reliable except perhaps at the phonetic level. Given that tone is so
diverse and so poorly gridded in with the rest of phonology, it is not a good
candidate for universality. Let us consider the two notions of diversity and
autonomy a bit further.
In the preceding sections we have caught only a glimpse of the extraordinary diversity of tone systems. Languages may treat tone as privative, /H, Ø/, equipollent, /H, L/, or both, /H, L, Ø/. Given that F0, the primary phonetic correlate of tone, is scalar, the question is whether some systems treat tone as gradual:
Gradual oppositions are oppositions in which the members are characterized
by various degrees or graduations of the same property. For example: the
opposition between two different degrees of aperture in vowels... or between
various degrees of tonality.... Gradual oppositions are relatively rare and not
as important as privative oppositions. (Trubetzkoy [1939] 1969: 75)
Because of the phonetically gradient nature of tone, the use of integers to represent tone heights has some appeal. Speakers are capable of distinguishing up to five tone heights and all of the pitch changes between them, whether as contours within a single syllable or as steps up and down between syllables. Preserving the pitch changes between syllables sometimes has interesting effects in tonal alternations. As seen in (38a), in the Leggbó 'N_1 of N_2' construction, if the second noun has a L prefix, it will be raised to M (the genitive marker is optionally deleted):
(38) a. L-L → M-L
        g-b 'squirrel'        ldzil g-b 'food of squirrel'
        l-gwl 'leaf'          iz ligwl 'odor of leaf'
     b. L-M → M-H
        l-zl 'bird'           gcmm lizl 'beak (mouth of bird)'
        g-di 'palm'           nn ged 'palm oil'
     c. L-M-M → L-H-M
        gc-km 'disease'       zci gckm 'cause of disease'
        c-kl 'European'       ctt ckl 'house of European'
     d. M-M → M-M
        c-ppy 'market'        ldzl cppy 'day of market'
        H-M → H-M
        l-dzil 'food'         cvvcn ldzil 'place of food'
As seen in (38b), if N_2 has a L prefix and a M stem-initial syllable, the L-M sequence will become M-H. It is as if the raising process were one of upstep, ꜛ/L-M/, designed to preserve the step up between the L and M syllables of the input. The fact that only the first syllable of a M-M stem is affected in (38c)
is neatly accounted for by Steriade (2009): Although the output preserves the pitch change of /L-M/, there is no requirement to preserve the lack of a pitch change of a /M-M/ input. The examples in (38d) show that there is no change if the N_2 has a M or H prefix.
To appreciate further how some tone systems care about such syntagmatic faithfulness, consider the realization of /L-HL-H/ in the following Grassfields Bantu languages [Cameroon]:
(39)    Language    Output    Process                 Reference
     a. Mankon      L-H-ꜛH    H-upstep                Leroy (1979)
     b. Babanki     L-M-H     HL-fusion               Hyman (1979b)
     c. Babadjou    L-H-ꜜH    H-downstep              (personal notes)
     d. Dschang     L-ꜜH-H    HL-fusion + downstep    Hyman and Tadadjeu (1976)
     e. Kom         L-M-M     H-lowering              Hyman (2005)
     f. Aghem       L-H-H     L-deletion              Hyman (1986b)
While all six languages simplify the HL input, thereby 'minimizing the number of ups and downs' (Hyman 1979a: 24), and all but the last preserve a trace of both the H and the L, they make different choices as to what to preserve in terms of the syntagmatic relations. The upstep in Mankon is similar to what was seen in Leggbó: when the L of HL-H delinks, the rise to the next tone is preserved by means of upstepping the following H. Similarly, the step up is preserved in Babanki, this time by fusing the HL to a M tone. While the H-ꜜH in Babadjou realizes the drop that should have occurred between the two Hs, there is no pitch change between the second and third syllables in Bamileke-Dschang and Kom, which unambiguously encode the lost L, or in Aghem, which shows no trace of the L at all.
Having established some of the extraordinary diversity of tone systems, let us now address the issue of autonomy. Tone, of course, was the original 'autosegmental' property (Goldsmith 1976), and there is no problem demonstrating the advantages of representing tone on a tier separate both from its TBU and from the segmental features. Although tones require segments in order to be pronounced, I would argue that tones are not reliably integrated into a system of articulatory or acoustic features the way consonants and vowels are. For example, [+high, -low] not only defines a class of high vowels, /i, ü, ɨ, u/, with F1 and F2 defining a two-dimensional gridded vowel space, but also a systematic intersection with palatal and velar consonants (Chomsky and Halle 1968). [+UPPER, +RAISED], on the other hand, only defines a H tone,
not a class of tones. We might therefore switch to [+STIFF, -SLACK] (Halle and Stevens 1971) to relate H tone to voiceless obstruents and implosives and L tone to voiced and breathy voiced obstruents. While intersections of tones with laryngeal features or phonation types (aspiration, breathiness, glottalization, voicing) appear to provide evidence that tone features are gridded in, note first that [STIFF, SLACK] define only three possibilities, whereas there can be up to five contrasting tone heights. More importantly, tone-laryngeal interactions are notoriously unreliable. As has been long known from diachronic studies in Southeast Asia and Athabaskan, the same laryngeal source can correspond diachronically to either H or L (see the various papers in Hargus and Rice 2005). Within Southern Bantu, so-called depressor consonants are not necessarily voiced (Schachter 1976, Traill 1990, Downing 2009). Even implosives, long held to be pitch raisers, show inconsistent tonal correspondences (Tang 2008). A particularly striking anti-phonetic case comes from Skou [Indonesia: Papua], where 'there are no words with a L tone melody in which any syllable has a voiced stop onset' (Donohue 2004: 87). This is reminiscent of Newman's (1974: 14) description of Kanakuru verbs, which are H-L after voiced obstruents, L-H after voiceless obstruents and implosives, and contrastively H-L vs. L-H when sonorant-initial.
While [UPPER] and [RAISED] and the comparable systems in (14c,d) were designed to mirror diachronic, laryngeally-induced tonal bifurcations in Chinese and elsewhere, the synchronic reflexes may involve a level vs. contour contrast, rather than producing a four-height tonal system:

(40) a. Thakali (Hari 1971: 26)
                  level    falling
         tense    H        HL
         lax      L        LHL

     b. Grebo (based on Newman 1986: 178)
                       level    rising
         [+RAISED]     M        MH
         [-RAISED]     L        LM

Starting with a /H/ vs. /HL/ contrast in Thakali [Nepal], a lower ('lax') register adds an initial L feature which converts /H/ to L, but combines with the HL to produce a LHL contour tone. However, as Mazaudon and Michaud (2008: 253-4) point out for closely related Tamang, the 2 x 2 pairings are not always obvious. The same can be said about Chinese, where Bao's (1999) tone sandhi analyses in terms of two sets of features {H, L} and {h, l}, as well as {hl} and {lh} contours, lend themselves to alternative interpretation and do not come without complication (Hyman 2003: 281).
In fact, it is not clear that diachronic developments inevitably lead to the positing of tone features. Mazaudon (1988: 1) argues that tones do not change by shared features; rather, 'Jeder Ton hat seine eigene Geschichte' [each tone has its own history]. As in the present paper, she finds little value in analyzing tones in terms of features:

It seems to me that tones are simply different from segments and should be treated differently in the phonology.... My best present proposal would be that tones do not break up into features until the phonetic level, and that consequently these features (which I propose to call 'parameters' to distinguish them clearly from distinctive features) are inaccessible to the phonology. (Mazaudon 1988: 7)
Nowhere is this clearer than in those systems where one tone is arbitrarily replaced by another. As mentioned above, in non-phrase-final position in Xiamen every tone is replaced by an alternate tone, as follows: 24, 44 → 22 → 21 → 53 → 44 (Chen 1987). Despite attempts, any featural analysis of such scales is hopeless. Mortensen (2006) cites a number of other 'tone chains' which are quite abstract and diverge significantly from following a phonetic scale such as L → M → H. I would argue that tone is capable of greater abstractness than segmental phonology or, at least, that comparable abstract analyses are better supported in tone than elsewhere. This has to do with the greater extractability of pitch and tonal patterns than segmental distributions: Thus, the Xiamen tone circle is clearly productive, while the synchronic status of the Great English Vowel Shift is more controversial. The greater autonomy and extractability of tone are also responsible for its more extensive activity at the phrase level, as seen in morphophonemic alternations in Chinese, Otomanguean, African and other tone systems.
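The arbitrariness of such chains is easy to state and impossible to derive featurally: the whole Xiamen substitution is just a lookup table that happens to form a circle. A minimal sketch (tones named by Chao pitch numbers; the non-phrase-final condition is simplified here to 'every tone but the last'):

```python
# Xiamen tone circle (Chen 1987): each tone's non-phrase-final alternate.
SANDHI = {'24': '22', '44': '22', '22': '21', '21': '53', '53': '44'}

def apply_sandhi(phrase):
    """Replace every tone except the phrase-final one by its alternate."""
    return [SANDHI[t] for t in phrase[:-1]] + list(phrase[-1:])

assert apply_sandhi(['44', '24']) == ['22', '24']

# Following the arrows from 44 returns to 44 in four steps: a true circle,
# which cannot be aligned with any phonetic scale of tone heights.
t = '44'
for _ in range(4):
    t = SANDHI[t]
assert t == '44'
```

The table is learnable and productive as a table, which is just what a privative, non-featural treatment of the tones would lead one to expect.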
In short, tone is the most isolable gesture-based phonological property.
This property is undoubtedly related to the fact that pitch also provides the
best, if not universal expression of intonation, marking whole clauses and
utterances. However, lexical, post-lexical, and intonational tones cannot
be pronounced by themselves, unlike vowels and most consonants whose
features may produce pronounceable segments of themselves. In fact,
Harris and Lindsey (1995) and Harris (2004) have developed a minimalist
approach to segmental features where no representation is unpronounceable.
It is hard to see how this could be extended to tone, since a pitch feature
cannot be pronounced by itself. While one might think that this would
force tone to become inextricably tied to segments, just the reverse is true:
Tone is highly independent ('autosegmental') and free to enter into abstract relationships, including many which defy a featural interpretation. Of course
tone is not alone in having these properties. Length and metrical stress,
two other non-featural prosodic properties, also show high autonomy.
However, neither vowel nor consonant length has the complexity of tone,
as contrasts are normally limited to two values, long vs. short. While stress
is both complex and abstract like tone, it is typically (definitionally?) word-bound. We thus return to the initial observation: Tone can do everything
that non-tonal phonology can do, but not vice-versa. While some languages
require every word to have a H tone, like word-stress, no language requires
every word to have a stop or a high vowel. Thus, if tones consist of features,
they are the only features that can be obligatorily required of a word. To
conclude, there seems to be little advantage to treating tones other than
the way that most tonologists treat them: as privative elements that are
related to each other through their relative and scalar phonetic properties
(cf. Mazaudon above). It thus may make most sense to adopt the integer
system even for two-height systems: /H, L/ = /2, 1/, /H, M, L/ = /3, 2, 1/, and
so forth.
References
Anderson, Stephen R.
1978 Tone features. In: Victoria Fromkin (ed.), Tone: A Linguistic Survey, 133-175. New York: Academic Press.
Bao, Zhiming
1999 The Structure of Tone. New York and Oxford: Oxford University Press.
Bateman, Janet
1990 Iau segmental and tonal phonology. Miscellaneous Studies of Indonesian and other Languages in Indonesia (1): 29-42.
Chen, Matthew
1987 The syntax of Xiamen tone sandhi. Phonology Yearbook 4: 109-149.
2000 Tone Sandhi. Cambridge: Cambridge University Press.
Chomsky, Noam and Morris Halle
1968 The Sound Pattern of English. New York: Harper and Row.
Clark, Mary M.
1978 A dynamic treatment of tone with special attention to the tonal system
of Igbo. Bloomington: IULC.
Clements, G. N.
1978 Tone and syntax in Ewe. In: Donna Jo Napoli (ed.), Elements of Tone, Stress and Intonation, 21-99. Washington, D.C.: Georgetown University Press.
1983 The hierarchical representation of tone features. In: Ivan R. Dihoff (ed.), Current Approaches to African Linguistics (vol. 1), 145-176. Dordrecht: Foris.
1991 Vowel height assimilation in Bantu languages. Proceedings of the Seventeenth Annual Meeting of the Berkeley Linguistics Society, Special Session on African Language Structures, 25-64.
2005 The role of features in phonological inventories. Presented at Journée 'Les géométries de traits / Feature geometries', Université de Paris 8 and Fédération Typologie et Universaux en Linguistique (TUL), Paris, 3 December 2005 (PowerPoint in English).
Clements, G. N., Alexis Michaud and Cédric Patin
2011 Do we need tone features? Paper presented at the Symposium on Tones and Features, University of Chicago Paris Center, June 18-19, 2009.
Corbett, Greville G. and Matthew Baerman
2006 Prolegomena to a typology of morphological features. Morphology 16: 231-246.
Demuth, Katherine
2003 The acquisition of Bantu languages. In: Derek Nurse and Gérard Philippson (eds.), The Bantu Languages, 209-222. London: Routledge.
DiCanio, Christian
2008 The phonetics and phonology of San Martín Itunyoso Trique. Ph.D. diss., University of California, Berkeley.
Donohue, Mark
2004 A grammar of the Skou Language of New Guinea. Ms. National University of Singapore. http://rspas.anu.edu.au/~donohue/Skou/index.html
Downing, Laura J.
2009 On pitch lowering not linked to voicing: Nguni and Shona group depressors. In: Michael Kenstowicz (ed.), Data and Theory: Papers in Phonology in Celebration of Charles W. Kisseberth. Language Sciences 31: 179-198.
Edmondson, Jerald A. and Kenneth J. Gregerson
1992 On five-level tone systems. In: Shin Ja J. Hwang and William R. Merrifield (eds.), Language in Context: Essays for Robert E. Longacre, 555-576. SIL and University of Texas at Arlington.
Girón Higuita, J.M. and W. Leo Wetzels
2007 Tone in Wnsht (Puinave), Colombia. In: W. Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area, 129-156. Leiden: CNWS.
Goldsmith, John
1976 Autosegmental phonology. Ph.D. diss., Department of Linguistics, Massachusetts Institute of Technology.
Halle, Morris and Kenneth Stevens
1971 A note on laryngeal features. Quarterly Progress Report (101): 198-213. Cambridge, MA: MIT Research Laboratory of Electronics.
Hargus, Sharon and Keren Rice (eds.)
2005 Athabaskan Prosody. Amsterdam: John Benjamins.
Hari, Maria
1971 A guide to Thakali tone. Part II to Guide to Tone in Nepal. Tribhuvan University, Kathmandu: SIL.
Harris, John
2004 Release the captive coda: the foot as a domain of phonetic interpretation. In: J. Local, R. Ogden and R. Temple (eds.), Phonetic Interpretation: Papers in Laboratory Phonology 6, 103-129. Cambridge: Cambridge University Press.
Harris, John and Geoffrey Lindsey
1995 The elements of phonological representation. In: Jacques Durand and Francis Katamba (eds.), Frontiers of Phonology: Atoms, Structures, Derivations, 34-79. Harlow, Essex: Longman.
Heine, Bernd
1993 Ik Dictionary. Köln: Rüdiger Köppe Verlag.
Hyman, Larry M.
1976 D'où vient le ton haut du bamiléké-fe'fe'? In: Larry M. Hyman, Leon C. Jacobson and Russell G. Schuh (eds.), Papers in African Linguistics in Honor of Wm. E. Welmers, 123-134. Studies in African Linguistics, Supplement 6. Los Angeles: University of California, Los Angeles.
1979a A reanalysis of tonal downstep. Journal of African Languages and Linguistics (1): 9-29.
1979b Tonology of the Babanki noun. Studies in African Linguistics (10): 159-178.
1985 A Theory of Phonological Weight. Dordrecht: Foris Publications.
1986a The representation of multiple tone heights. In: Koen Bogers, Harry van der Hulst, and Maarten Mous (eds.), The Phonological Representation of Suprasegmentals, 109-152. Dordrecht: Foris.
1986b Downstep deletion in Aghem. In: David Odden (ed.), Current Approaches to African Linguistics, vol. 4, 209-222. Dordrecht: Foris Publications.
2003 Review of Bao, Zhiming, 1999, The Structure of Tone, New York and Oxford: Oxford University Press. Linguistic Typology (7): 279-285.
2005 Initial vowel and prefix tone in Kom: Related to the Bantu Augment? In: Koen Bostoen and Jacky Maniacky (eds.), Studies in African Comparative Linguistics with Special Focus on Bantu and Mande: Essays in Honour of Y. Bastin and C. Grégoire, 313-341. Köln: Rüdiger Köppe Verlag.
In press Tone: is it different? In: John Goldsmith, Jason Riggle and Alan Yu (eds.), The Handbook of Phonological Theory, 2nd Edition. Blackwell.
Hyman, Larry M., Heiko Narrog, Mary Paster, and Imelda Udoh
2002 Leggbó verb inflection: A semantic and phonological particle analysis. Proceedings of the 28th Annual Berkeley Linguistics Society Meeting, 399-410.
Hyman, Larry M. and Maurice Tadadjeu
1976 Floating tones in Mbam-Nkam. In: Larry M. Hyman (ed.), Studies in Bantu Tonology, 57-111. Southern California Occasional Papers in Linguistics 3. Los Angeles: University of Southern California.
Kisseberth, Charles W.
2009 The theory of prosodic phrasing: the Chimwiini evidence. Paper presented at the 40th Annual Conference on African Linguistics, University of Illinois, Urbana-Champaign, April 9-11, 2009.
Koopman, Hilda and Dominique Sportiche
1982 Le ton abstrait du Kagwe. In: Jonathan Kaye, Hilda Koopman and Dominique Sportiche (eds.), Projet sur les Langues Kru, 46-59. Montreal: UQAM.
Leroy, Jacqueline
1979 À la recherche des tons perdus. Journal of African Languages and Linguistics (1): 55-71.
Li, Charles N. and Sandra A. Thompson
1977 The acquisition of tone in Mandarin-speaking children. Journal of Child Language (4): 185-199.
Maddieson, Ian
1971 The inventory of features. In: Ian Maddieson (ed.), Tone in Generative Phonology, 3-18. Research Notes 3. Ibadan: Department of Linguistics and Nigerian Languages, University of Ibadan.
Mazaudon, Martine
1988 An historical argument against tone features. Paper presented at the Annual Meeting of the Linguistic Society of America, New Orleans.
2003 Tamang. In: Graham Thurgood and Randy J. LaPolla (eds.), The Sino-Tibetan Languages, 291-314. London and New York: Routledge.
Mazaudon, Martine and Alexis Michaud
2008 Tonal contrasts and initial consonants: A case study of Tamang, a missing link in tonogenesis. Phonetica (65): 231-256.
Mortensen, David R.
2006 Logical and substantive scales in phonology. Ph.D. diss., University of California, Berkeley.
Myers, James and Jane Tsay
2003 A formal functional model of tone. Language and Linguistics (4): 105-138.
Newman, Paul
1974 The Kanakuru Grammar. Leeds: Institute of Modern English Language Studies, University of Leeds, in association with the West African Linguistic Society.
1986 Contour tones as phonemic primes in Grebo. In: Koen Bogers, Harry van der Hulst, and Maarten Mous (eds.), The Phonological Representation of Suprasegmentals, 175-193. Dordrecht: Foris.
Nougayrol, Pierre
1979 Le Day de Bouna (Tchad). I. Éléments de Description Linguistique. Paris: SELAF.
Paster, Mary
2003 Tone specification in Leggbo. In: John M. Mugane (ed.), Linguistic Description: Typology and Representation of African Languages. Trends in African Linguistics (8): 139-150.
Philippson, Gérard
1998 Tone reduction vs. metrical attraction in the evolution of Eastern Bantu systems. In: Larry M. Hyman and Charles W. Kisseberth (eds.), Theoretical Aspects of Bantu Tone, 315-329. Stanford: C.S.L.I.
Pike, Eunice Victoria
1975 Problems in Zapotec tone analysis. In: Ruth M. Brend (ed.), Studies in Tone and Intonation by Members of the Summer Institute of Linguistics, University of Oklahoma, 84-99. Basel: S. Karger. Original edition, IJAL (14): 161-170, 1948.
Pulleyblank, Douglas
1986 Tone in Lexical Phonology. Dordrecht: D. Reidel.
Schachter, Paul
1976 An unnatural class of consonants in Siswati. In: Larry M. Hyman, Leon C. Jacobson and Russell G. Schuh (eds.), Papers in African Linguistics in Honor of Wm. E. Welmers, 211-220. Studies in African Linguistics, Supplement 6.
Singler, John Victor
1984 On the underlying representation of contour tones in Wobe. Studies in African Linguistics (15): 59-75.
Smith, Neil
1973 Tone in Ewe. In Eric E. Fudge (ed.), Phonology. London: Penguin.
Original edition, Quarterly Progress Report (88), 290304.
Cambridge, MA: MIT Research Laboratory of Electronics.
Snider, Keith
1999 The geometry and features of tone. Dallas: Summer Institute of
Linguistics.
Stahlke, Herbert
1971 The noun prex in Ewe. Studies in African Linguistics, Supplement 2,
141159.
1977 Some problems with binary features for tone. International Journal of
American Linguistics (43): 110.
Steriade, Donca
2009 Contour correspondence: tonal and segmental evidence. Paper
presented at Tones and Features: A Symposium to Honor Nick
Clements, Paris, J une 1819, 2009.
Tang, Katrina Elizabeth
2008 The phonology and phonetics of consonant-tone interaction. Ph.D.
diss. University of California, Los Angeles.
Thomas, Elaine
1978 A Grammatical Description of the Engenni Language. University of
Texas at Arlington: Summer Institute of Linguistics.
Traill, Anthony
1990 Depression without depressors. South African Journal of African
Languages (10): 166172.
Trubetzkoy, N.S.
1969 Grundzge der Phonologie [Principles of Phonology]. Translated
by Christiane A. M. Baltaxe. Berkeley: University of California
Press. Original edition: Travaux du cercle linguistique de Prague 7,
1939.
van Dyken, J ulia
1974 J ibu. In: J ohn Bendor-Samuel (ed.), Ten Nigerian Tone Systems,
8792. Studies in Nigerian Languages, 4. J os and Kano: Institute of
Linguistics and Centre for the Study of Nigerian Languages.
Wang, William S.-Y.
1967 Phonological features for tone. International Journal of American
Linguistics (33): 93105.
Welmers, William E.
1962 The phonology of Kpelle. Journal of African Languages (1): 69–93.
1973 African Language Structures. Berkeley and Los Angeles: University
of California Press.
Yip, Moira
1980 The tonal phonology of Chinese. Ph.D. diss., Department of
Linguistics, Massachusetts Institute of Technology.
1995 Tone in East Asian Languages. In: John Goldsmith (ed.), Handbook of Phonological Theory, 476–494. Oxford: Basil Blackwell.
2002 Tone. Cambridge: Cambridge University Press.
Zheltov, Alexander
2005 Le système des marqueurs de personnes en gban: Morphème syncrétique ou syncrétisme des morphèmes. Mandenkan (41): 23–28.
Features impinging on tone*

David Odden
A long-standing puzzle in phonological theory has been the nature of
tone features. Chomsky and Halle (1968) offers features for most other
phonological properties of language, but no proposals for tone were advanced
there. Research into the nature of tone features has focused on three basic
questions. First, how many levels and features exist in tone systems? Second,
what natural classes and phonological changes are possible in tonal grammars?
Third, how should segmental effects on tone be modeled? Although many
proposals provide enlightening answers to individual questions, no proposal
handles all of the facts satisfactorily.
The purpose of this paper is to address the unity of tone features. I
argue that the basic source of the problem of answering these questions
lies in incorrect assumptions about the nature of features, specifically
the assumption that there is a single set of predetermined features with a
tight, universal mapping to phonetics. I argue for Radical Substance Free
Phonology, a model where phonological features are learned on the basis
of grammatically-demonstrated segment classes rather than on the basis of
physical properties of the sounds themselves, making the case for such a
theory from the domain of tone. Empirically, I show that, like voicing, vowel
height is a feature relevant to synchronic tonal phonology, drawing primarily
on facts from the Adamawa language Tupuri.
* Research on this paper was made possible in part with the support of CASTL, University of Tromsø. I would like to thank Dieudonne Ndjonka, who provided me with my data for Tupuri, and Molapisi Kagisano, who provided me with my data for Shua, and Mike Marlo, Charles Reiss and Bridget Samuels for comments on an earlier version of this paper. Earlier versions of this paper have been presented at the Universities of Tromsø, Amsterdam, Indiana, and Harvard, as well as at the symposium in honor of G. Nick Clements.
1. The nature of features
The point of theoretical departure for this investigation is the set of
representational and computational assumptions which have characterized
research in non-linear phonology for numerous years. Features are obviously
essential in this theorizing, since they are the basis for grouping sounds
together in phonological rules. There has been a long-standing question
regarding the ontology of features, whether they are fundamentally phonetic
descriptions of sounds which phonologies refer to, or purely formal in nature,
only serving the purposes of phonological classification and lacking intrinsic
phonetic content.
The traditional viewpoint is that a phonetic classication of speech
sounds should provide the conceptual underpinnings for phonological
analysis, early exemplications of this view being found in the works
of Sweet, Sievers, Jespersen and Jones. The emergence of a distinction
between phonetics (speech) and phonology (language) following de
Saussure raises questions as to the proper relationship between phonetics and
phonology. Trubetzkoy (1939) observes (p. 11) that most of these [acoustic
and articulatory] properties are quite unimportant for the phonologist...[who]
needs to consider only that aspect of sound which fullls a specic function
in the system of language [emphasis mine], and that (p. 13) ...the linguistic
values of sounds to be examined by phonology are abstract in nature. They
are above all relations, oppositions etc., quite intangible things, which can be
neither perceived nor studied with the aid of the sense of hearing or touch.
The grasping of this distinction between phonetics and phonology leads
to the essential question about the ontology of these relations. Trubetzkoy
states (p. 91): "The question now is to examine what phonic properties form phonological (distinctive) oppositions in the various languages of the world." An important presupposition contained in this question is that
phonetic properties do indeed form phonological oppositions, which is
to say, Trubetzkoy (and others) assume that the elements of phonology
are phonetically defined. Despite various differences between theories of
phonology over many years, the assumption of phonetic-fundamentality has
been at the root of most phonological theorizing.
In the line of research pertaining to formal feature theory that starts with
Jakobson, Fant and Halle (1952), up through Chomsky and Halle (1968), it
has been a standard assumption that phonological classes are stated in terms
of universal phonetically-defined features which are physical properties. For
example, The Sound Pattern of English (SPE) (Chomsky and Halle 1968)
states that "The total set of features is identical with the set of phonetic properties that can in principle be controlled in speech; they represent the phonetic capabilities of man and, we would assume, are therefore the same for all languages." (pp. 294–5), and that the phonetic features are physical scales and "may thus assume numerous coefficients, as determined by the rules of the phonological component" (p. 297). The perspective that features
are fundamentally phonetic descriptions of language sounds, which happen
to be used in phonological grammars, has been a major claim of generative
grammar.¹
A problem with the SPE theory of features is highlighted in Campbell
(1974), namely the connection between round vowels and labial consonants.
The formal problem is that labiality can trigger or be triggered by vocalic
rounding, but feature theory does not explain this fact.
(1)  Finnish:  k → v / u__u  (in weak-grade context)
     Tulu:     i → u / labial C C₀ __
Finnish [v] is not round, nor are Tulu [p,m], therefore the changes in (1) are
formally arbitrary changes, which was considered undesirable.
In the autosegmental era, attention was paid to restricting rules and
representations to disallow arbitrary changes. McCarthy (1988) nicely
summarizes reasoning in this period with his observation that:
...phonological theory has made great progress toward this goal by adhering
to two fundamental methodological premises. The first is that primary
emphasis should be placed on studying phonological representations rather
than rules. Simply put, if the representations are right, then the rules will
follow. [p. 84]
The problem, according to McCarthy, is that common nasal place assimilation
is predicted to be no more likely than an impossible one that assimilates any
arbitrary set of three features, like [coronal], [nasal], and [sonorant] (p. 86).
The resolution to this problem lies in an interaction between a better theory
of representations and a better theory of rules:
The idea that assimilation is spreading of an association line resolves
the problem raised by (3). Assimilation is a common process because it is
accomplished by an elementary operation of the theory addition of an
association line [p. 86]
A representational theory such as that of Clements (1985) or Clements and
Hume (1995) would then let representations dictate rules for us.
It is correct that some emphasis must be placed on the representational
atoms of phonological grammars, but attention must also be paid to the
theory of rules, since simply knowing that [coronal], [labial] and [dorsal]
exhaust the constituent [place] does not thereby explain the impossibility of
a rule copying any arbitrary set of three features, such as [coronal], [nasal],
and [sonorant]. The computational issue can be addressed by positing, as
Clements (1985: 244) states, that assimilation processes only involve single
nodes in tree structure, which naturally leads to the conclusion of Clements
and Hume (1995: 250) that phonological rules perform single operations
only. Taken together, theories of rules and representations should delimit
the class of possible phonological operations.
The single-operation theory is intended to rule out a vast set of arbitrary
phonological operations, so that only (2a), (3a) would be possible rules of
post-nasal voicing or postvocalic spirantization.
(2) P ost-nasal voicing
a. [+voice] *b. [+voice]



C

C

C C
[+nas] [+nas]
(3)  Post-vocalic spirantization

     a.   [+cont]                *b.       [+cont]
           |      ....                        |
           V        C                 V       C
The theory of assimilation as spreading leads to a problem if features are
fairly precise descriptions of speech events. In the Sagey (1986) analysis of
Tulu (4), [labial] spreads from a consonant to following i, so that the vowel
gains a labial specication. Then, the feature [round] is added by default,
because [labial] is linked to a vowel, which gives an actually round vowel.
Since there is just one multiply-linked token of [labial] in the representation,
the labial stop itself should also be round; but there is no reason to believe
that the phonetic output has round p.
(4)   a  p  t  i         a  p  t  i         a  p  t  u
          |                  |  ....             |   \
       [labial]           [labial]            [labial]
                                                  |
                                              [round]
It is implausible that the phonetic output is *[apʷtu], and similar issues arise in the unification of vowel fronting and coronal (Hume 1994), where the fronted vowel of /ot/ does not palatalize the consonant (*[tʲ]). If features are interpreted narrowly and strictly by the classical definitions, an ad hoc method of cleaning up outputs is required.
Unified Features Theory allows a different solution to the problem in
(5), by claiming that features are more abstract than in SPE, and are subject
to more complex phonetic interpretation. Fine-grained interpretation, in
the case of place features, depends not just on the terminal feature, but
on its relationship to a dominating node, so labial immediately dominated
by C-Place is realized as bilabial or labiodental but labial immediately
dominated by V-Place is realized as lip rounding.
(5)   a  p  t  i         a  p  t  i         a  p  t  i
      |  |  |  |         |  |  |  |         |  |  |  |
      CP CP CP CP        CP CP CP CP        CP CP CP CP
         |        |         |  ....  |         |        |
         |        VP        |        VP        |        VP  = [round]
         |                  |                  |       /
      [labial]           [labial]           [labial]
The core, universal features [labial], [coronal] etc. would be less definite from the physical perspective, and would be neutral as to interpreting labial as rounding with lip protrusion versus as labial compression / approximation.
Analogous reasoning equates the features for consonantal coronal and vowel-
frontness in Hume (1994), and voicing and low tone in Bradshaw (1999).
Such abstracting away from phonetic specifics is crucial to realizing the goal
of an empirically tenable representational theory which works in tandem
with a computational theory that prohibits arbitrary feature-insertions such
as (6).
(6)  i → u after a labial               i → a after uvular, laryngeal

        C          V                       C          V
     [ +ant ]   [ +hi   ]               [ −ant ]   [ −hi  ]
     [ −cor ]   [ +back ]               [ −cor ]   [ +back]
                [ +round]                          [ +low ]
The possibility of viewing processes of fronting triggered by coronals and rounding triggered by labials as constituent-spreading depends substantially on a degree of detachment between features and their phonetic content.
There is a basic logical flaw in the idea of a formal principle whereby assimilation must be treated as spreading. Such a principle does not address the possibility of a non-assimilatory rule where o becomes ö before a labial, or i becomes a before a coronal. That is, attempting to restrict the formalism of just assimilatory rules will ultimately yield no restriction on phonology, since the fact of being an assimilatory rule is not a self-evident linguistic primary: it is an analyst's concept. From the formal perspective, a crucial part of the logic is that arbitrary feature changes such as (7), with insertion of [−back] triggered by presence of Labial and insertion of [+low] triggered by Coronal, should also not be possible rules.
(7)  *       V                        *       V
             |                                |
           Place     Place                  Place     Place
             |         |                      |         |
    [o]    [−bk]      Lab            [i]   [+low]      Cor
In short, formal phonology must severely limit insertions, if McCarthy's desideratum is to be realized. Allowing rules such as (7) under any guise
undermines the claim that correct representations yield a more restrictive
view of possible rules.
A complete prohibition on feature insertion would face serious empirical problems, since insertion certainly exists: segments are inserted to provide well-formed syllables (onsets are created, vowel-epenthesis exists), unspecified features are filled in, and the OCP drives insertion of opposite values. Understanding the nature of insertions is a vital part of understanding
phonological computations. I do not focus on that matter here, and assume
that strong limits on insertions are possible. The central point of this section
is to show that a predictive formal theory cannot rely just on representations; it also needs a valid theory of how representations are acted on so as to rule
out a vast range of direct feature changes of the type that would be possible
in SPE theory. We consider the consequences of such a restrictive formal
theory for the concept of features in section 4.
2. A substance-free perspective
Given that phonological rules operate in terms of variously-intersecting sets
of segments accessed by features, two fundamental questions about features
are whether they are universally pre-wired and unlearned, and whether they
are ontologically bound to phonetic substance. In the generative context, the
dominant trend has followed Chomsky and Halle (1968), and presumes that
the atoms of phonological representation are phonetically defined. Recent trends in phonology (exemplified by Archangeli and Pulleyblank 1994 and
Hayes, Kirchner and Steriade 2004) have returned to the SPE program of
denying the distinction between phonetics and phonology, elaborating the
theory of phonology with teleological principles designed to model optimal
production and perception within grammar.
An alternative to features as phonetic descriptions is that they are formal,
substance-free descriptions of the computational patterning of phonemes in
a language, the classes which the sounds of a language are organized into for
grammatical rules. One of the earliest phonological analyses of a language (Sanskrit), Pāṇini's Aṣṭādhyāyī, is founded on a strictly algebraic system of referring to segment classes by specifying the first and last members of the class, as they appear in a phonologically-ordered list of Sanskrit segments. Thus the class 'voiceless stops' is identified by the formula khay, and sibilants are śar; the class 'voiceless aspirate' cannot be described in this system, which is appropriate since it plays no role in the grammar of Sanskrit
as a distinct class. In the modern era, Hjelmslev (1939) is a well-known pre-
generative proponent of the view that phonological analysis should not be
founded on assumptions of phonetic substance. Thus a phonological theory
which eschews reference to the physical manifestation of speech is certainly
possible.
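Pāṇini's device of naming a class by its first and last members over a fixed ordering can be sketched as an interval computation. The following is a toy illustration with an invented ordering and ASCII segment labels, not the actual Śivasūtra inventory:

```python
# Toy sketch of Panini-style class reference (invented ordering, not the
# real Sivasutra list): a class denotes the contiguous span between two
# segments in a fixed, phonologically ordered inventory.
ORDER = ["a", "i", "u", "y", "v", "r", "kh", "ph", "k", "p", "sh", "s"]

def span(first, last):
    """All segments from `first` through `last` in the fixed ordering."""
    return ORDER[ORDER.index(first): ORDER.index(last) + 1]

print(span("kh", "p"))   # ['kh', 'ph', 'k', 'p']  e.g. 'voiceless stops'
print(span("sh", "s"))   # ['sh', 's']             e.g. 'sibilants'
# A set that is not a contiguous span in ORDER (say, {'kh', 'ph', 's'})
# simply cannot be named in this system.
```

The point of the sketch is the purely formal character of the device: class membership follows from position in an ordered list, not from any phonetic definition.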
There have been such countervailing tendencies in generative phonology,
for example Foley (1977), Dresher, Piggott, and Rice (1994), Harris and
Lindsey (1995), Hale and Reiss (2000, 2008), Samuels (2009), which
emphasize the formal computational aspects of phonology and which
assert that phonology is autonomous from phonetics, thus the atoms of
phonological computation and representation have only an accidental
relationship to principles of production and perception. Such a perspective
emphasizes the independence of synchronic grammatical computations from
the historical causes of those computations, and places substance-dependent
considerations outside of the domain of grammar and inside of the domain
of the study of language change and acquisition.
Within this substance-free tradition, the Parallel Structures Model (PSM) of representations (Morén 2003, 2006) advances a formal theory of segmental representation by minimizing the number of representational atoms (features) and maximizing relational resources (hierarchical structure), pursuing certain essential ideas of UFT to their logical limit. An important aspect of this model of representation is the high degree of phonetic abstractness of features, especially manner-type features, where PSM eliminates numerous features such as 'nasal', 'lateral', 'continuant', 'consonantal' and 'sonorant' in favor of combinations of two abstractly defined features, 'open' and 'closed',
and more free combinations of these features with the nodes V-place and
C-place. In PSM, nasal consonants across languages can, in principle, be
represented in many different ways, effectively answering the question 'are nasals [+continuant] or [−continuant]?' by saying 'either', depending on the facts of the language.
Pursuing the logic of this model even further, the theory of Radical
Substance-Free Phonology (RSFP) in Odden (2006), Blaho (2008) posits an
entirely substance-free formal theory of phonology, holding that a phonology
is a strictly symbolic computational system, that the terminal representational
primitives which a phonological computation operates on is the feature,
but features have no intrinsic physical denition or interpretation. Physical
interpretation of a phonological representation is handled by the phonetic
interpretive component, which has a wholly different nature from that of the
phonological component.
Phonological segments are the perceptual primitives that feature induction
starts with: a child learning English knows that [p] is not [t], and must
discover what formal properties distinguish these sounds (and unite these
sounds in being opposed to [b], [d]). The general nature of that difference
in a grammar is feature specification, and rules operate as usual, being
stated in terms of features, because the device of features is mandated by the
theory of rule syntax which is the locus of grammatical universality. The
crucial difference between RSFP and theories using SPE or similar universal
features is that in RSFP, the features used in a language and the relationship
between physical properties of phoneme realization and featural analysis
must be learned from grammatical patterns, and are not predetermined by
acoustic or articulatory events. RSFP holds that principles of UG do not refer
to specific features, and that UG is unaware of the substance of features.
Phonetically-grounded facts, especially markedness, are outside the scope of
what grammar explains, but are within the scope of non-grammatical theories
of perception, language change and acquisition, which partially explain the
data patterns that the rule-system generates. Functional factors are relevant
only when they affect the actual data which the next generation uses to learn
a particular grammar.
If certain segments {a,b,d,f} function together in the operation of a
rule, those segments have some feature(s) in common, e.g. [W]. Universal physical pre-definitions of phonological features are unnecessary to be able to pronounce outputs or identify inputs: actual experience with the language is. Since the primary data for language acquisition are a complex function
of numerous antecedent human factors, there are many (but not unlimited)
possible mappings from phonetic fact to grammar. It follows from this that
languages can give different featural analyses to the same phonetic fact. If
the patterns of segment classes of two languages are substantially different,
it is expected that the featural analyses of those segments will differ, even
when the phonetic events that they map to are essentially the same.
The logic of RSFP is inductive, working from the facts to the conclusion, and is not deductive, working from pre-existing conclusions to language-particular facts. The primary fact which the child knows and builds on is that the language has particular segments such as i, e, o, p, b, tʰ, n, k, ŋ. When the facts of the language show that tʰ, n pattern as a class to the exclusion of other segments, the child knows that those segments have some common feature that classifies them and can induce the class labeled [coronal]; when p, b, tʰ, k act as a class separate from n, ŋ, the child knows that some other feature, e.g. [oral], distinguishes tʰ and n.²
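The induction step described above can be given a minimal computational sketch. Everything here is my illustration, not the author's formalism: the segment inventory, the observed classes, and the feature labels F1/F2 are hypothetical ("ng" stands in for the engma), and the labels are purely formal, with no phonetic content, in the spirit of RSFP.

```python
# Minimal sketch of RSFP-style feature induction (hypothetical data and
# feature names, not from the paper).
segments = {"i", "e", "o", "p", "b", "th", "n", "k", "ng"}

# Classes demonstrated by the grammar's rules (the child's primary data):
observed_classes = {
    "F1": {"th", "n"},            # the class a rule treats as [coronal]
    "F2": {"p", "b", "th", "k"},  # the class a rule treats as [oral]
}

def induce_features(segments, observed_classes):
    """Assign +F to members of each demonstrated class and -F to the rest;
    the feature names carry no universal phonetic definition."""
    return {s: {f: ("+" if s in cls else "-")
                for f, cls in observed_classes.items()}
            for s in segments}

features = induce_features(segments, observed_classes)
# th and n share +F1 but are distinguished by F2, which is all the
# grammar needs in order to treat them differently.
print(features["th"])  # {'F1': '+', 'F2': '+'}
print(features["n"])   # {'F1': '+', 'F2': '-'}
```

The same inventory with different observed classes would yield a different featurization, which is exactly the crosslinguistic variability the text predicts.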
Numerous competing feature systems to describe a given phonetic
distinction could be induced, but such competition is always crosslinguistic.
Thus nasals may be stops in one language and continuants in another; within
a single language, a sound cannot simultaneously have and not have the same
feature. The facts of the primary linguistic data determine what grammar is
induced, and within a language there is a single, non-contradictory analysis
of the facts (modulo the possibility of feature changing operations such as
default fill-in applying at a particular derivational stage, a possibility available
to substance-dependent theories of features as well). While there cannot be a
single theory of the substance of features across languages, there is a single
theory of the syntax of features. Especially relevant to the concerns of this
paper, the only predicted grammatical limits on natural classes and segment/
tone effects are of the kind 'could not be part of a linguistically computable rule'.
3. Tone features
There have been a number of contradictory proposals for tonal representations,
including Wang (1967), Sampson (1969), Woo (1969) and Maddieson
(1972), all of which are capable of describing 5 tone levels (an important
consideration for a theory of tone, since there are languages with 5 distinctive
levels). These theories differ in terms of the classes that they predict to be
possible, for example the proposal of Wang (1967) predicts that tone levels 1, 2, 4, 5, excluding 3, could function as a class defined as [Mid] whereas the competing theories do not allow this. Sampson (1969) and Maddieson (1972) predict that levels 1, 2, 3 can function as a class defined as [Low] whereas the competitors do not. Woo (1969) predicts that levels 1, 3, 5 can be a class defined as [Modify]. None of these theories predicts the well-known interaction between consonant voicing and tone (see below), which is addressed by the features of Halle and Stevens (1971), but on the other hand the latter theory does not handle more than 3 tone levels.
One widely-adopted tone feature theory is the Yip-Pulleyblank model (8),
which assumes a register feature [upper] dividing tone space into upper and
lower registers, and [raised] which subdivides registers into higher and lower
internal levels. All upper register tones are higher than any lower register
tones, and a raised tone in the lower register is physically lower than the non-
raised tone of the higher register.
(8)           SH    H    M    L
     upper     +    +    −    −
     raised    +    −    +    −
A significant empirical advantage of this system is that it explains a surprising phonological alternation, the physically-discontinuous assimilation of the feature [+raised], as Clements (1978) documents for Anlo Ewe. In that language, a Mid (M) tone becomes Superhi (SH) when flanked by H tones. What is surprising is that the M of the postposition [megbe] actually becomes higher than the triggering H. From the perspective of a featural analysis of tones in the raised/upper model, this is perfectly sensible as a rule assimilating the feature [upper].
(9)  kplɔ megbe → kplɔ me̋gbe    'behind a spear'
     e kpe megbe → e kpe me̋gbe  'behind a stone'

          H           M           H
      [+upper]    [−upper]    [+upper]
      [−raised]   [+raised]   [−raised]

      (the M takes on the [+upper] of the flanking H tones)
Since M is the raised tone of the lower register and H is the non-raised tone
of the upper register, it follows that if M takes on just the register feature of
H, then it becomes the raised tone of the upper register, which is SH.
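The logic of (8)–(9) can be made concrete with a small sketch. This is my illustration only; the data structures and function names are not the author's or Clements's formalism. Each tone is a bundle of the two binary features of (8), and the Ewe rule copies only [upper] from the flanking H tones onto M, which automatically yields SH:

```python
# Sketch of the Yip-Pulleyblank register model in (8) (illustration only).
TONES = {
    "SH": {"upper": "+", "raised": "+"},
    "H":  {"upper": "+", "raised": "-"},
    "M":  {"upper": "-", "raised": "+"},
    "L":  {"upper": "-", "raised": "-"},
}
NAMES = {tuple(v.items()): k for k, v in TONES.items()}

def assimilate_upper(seq):
    """H M H -> H SH H, as in (9): M copies just the register feature
    [upper] of the flanking H tones; its own [+raised] is untouched."""
    out = list(seq)
    for i, t in enumerate(seq):
        if t == "M" and 0 < i < len(seq) - 1 and seq[i-1] == seq[i+1] == "H":
            feats = dict(TONES["M"], upper=TONES["H"]["upper"])
            out[i] = NAMES[tuple(feats.items())]
    return out

# The M surfaces *higher* than the triggering H tones:
print(assimilate_upper(["H", "M", "H"]))  # ['H', 'SH', 'H']
```

The "discontinuous" raising falls out with no extra machinery: copying one feature value relabels the bundle as SH.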
The prediction of RSFP, with respect to tones as well as other features, is that
other arrangements of the same phonetic facts into phonological systems are
possible, because numerous competing feature systems can be induced from
the simple fact of having 4 tone levels, and the competition is only narrowed
down by looking at natural class behavior. To see that this predicted outcome
is realized, we turn to a language with a different treatment of 4 levels, namely
Kikamba. The 4-level tonal space of this language is divided by a distinction
between high and low tones, which are further differentiated by being plain
versus [extreme], that is, at the outsides of the tonal space. Note that the
feature [extreme], which groups together the inner and outer tones into
natural classes, was one of the tone features proposed in Maddieson (1972).
Following that model of tone representation, the highest tone in Kikamba, SH,
is a [+extreme] H, and the lowest tone, SL, is a [+extreme] L.
(10)  Kamba (Roberts-Kohno and Odden notes; Roberts-Kohno 2000)

                  v̋    v́    v̀    v̏
                  SH    H    L    SL
      H            +    +    −    −
      extreme      +    −    −    +
This analysis of 4 tones is induced from the natural class patterns of the
phonology, which show a paradigmatic connection between H and SH
triggered by SL. Infinitives have a final SL, which spreads to the second mora of a long penult. Verbs are also lexically differentiated as to whether their first root mora has an underlying H versus L.
(11) L verbs ko-kon-a hit ko-klk-a stir
ko-kc c`l-a strain ko-sitaa k-a accuse
H verbs ko-k lakely-a tickle ko-k olok-a advance
ko-k tek-a occur ko-t laa g-a count randomly
As shown in (12), whenever H comes before SL, that H becomes SH, which
spreads to a final SL vowel.
(12) ko-tal-a count ko-ko o ly-a ask
ko-tw -a pluck
/kot la / /kok o lya / /ko-tw `/
The raising of H to SH before SL is easily comprehensible as the assimilation rule (13) if this language employs a feature [extreme] grouping SH and SL together.
(13)      H
          |
          V          V
           . . . .   |
          [+extreme]
Expressed in the raised/upper model, the rule would be an arbitrary feature change, which we have sought to rule out on theoretical grounds.
(14)  [+upper]        [−upper]
          |               |
          V               V
          |               |
      [+raised]       [−raised]

      (insertion of [+raised] on a [+upper] vowel before a [−upper, −raised] vowel)
Other evidence for analysis of Kikamba tones in terms of [extreme] is seen in (15), which illustrates the fact that a lexical [extreme] specification in nouns is deleted when the noun is followed by a modifier.
(15) N big N
maio maio mancnc bananas
mabaat mabaat mancnc ducks
ekwaase ekwaas encnc sweet potato
moemi moemi moncnc farmer
This is a simple feature deletion with the feature [extreme], but is an arguably
unformalizable rule under an upper-raised analysis.
(16)  Extreme-theory:       [+extreme] → ∅ / ___ ... X ]NP
      Upper-raised theory:  *[αupper, αraised] → [−αraised] / ___ ... X ]NP
Finally in (17), certain verb forms cause a SL, which is just a specification [+extreme], to shift to the end of their complement, explaining the alternations
on postverbal Mocma and maio.
(17) maio bananas
m aMocma of Moema
ng at l maio I will count bananas
ng at l maio m aMocma I will count bananas of Moema
Definite forms of nouns have SH tone on the first syllable, which of course involves a [+extreme] specification. The presence of SH then blocks the shift of SL from the verb.
(18) maio the bananas
ng at l maio m aMocma I will count the bananas of Moema
Blockage is expected given that [+extreme] originates from the verb and thus precedes the target phrase-final position, since there is an intervening [extreme] specification.
(19) [+ex] [+ex] [+ex]
ng at l maio m aMocma ng at l maio m aMocma
Thus the same surface tone system, four levels, is analyzed in different ways in different languages, supporting the claim of RSFP that features are learned and not universally identical.
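The Kikamba side of this contrast can be sketched in the same style as the Ewe example. Again the dictionaries and names are my hypothetical illustration, not the author's formalism: under the [H, extreme] analysis in (10), the raising in (12)–(13) is a single-feature spread, whereas under [upper, raised] it would require changing two feature values at once.

```python
# Sketch of the Kikamba feature analysis in (10) (illustration only).
KAMBA = {
    "SH": {"H": "+", "extreme": "+"},
    "H":  {"H": "+", "extreme": "-"},
    "L":  {"H": "-", "extreme": "-"},
    "SL": {"H": "-", "extreme": "+"},
}
NAMES = {tuple(v.items()): k for k, v in KAMBA.items()}

def spread_extreme(seq):
    """Rule (13): [+extreme] spreads leftward from SL onto a preceding H,
    a single-feature operation that turns H into SH."""
    out = list(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "H" and seq[i + 1] == "SL":
            feats = dict(KAMBA["H"], extreme="+")
            out[i] = NAMES[tuple(feats.items())]
    return out

print(spread_extreme(["L", "H", "SL"]))  # ['L', 'SH', 'SL']
```

The same four phonetic levels are thus carved up by different features than in the Ewe sketch, which is the crosslinguistic variability RSFP predicts.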
A similar point can be made with respect to segment/tone interactions.
The best-known effect is the so-called depressor effect, whereby voiced
consonants are associated with L-tone behavior. See Bradshaw (1999) for
an extensive treatment and a theoretical account of the facts within UFT. An
example is the pattern in Nguni languages such as Siswati, where H becomes
a rising tone after a depressor. In (20), underlying H from the infinitive prefix (underlined) shifts to the antepenult.
(20) k -k-a to arrive ku-f k-el-a to arrive for
k -gez-a to bathe ku-g z-el-a to bathe for
k -ge|-a to chop ku-ge|- l-a to chop for
When the onset of a syllable with a H is a depressor, H appears as a rising
tone. That rising tone is sometimes eliminated by shifting H to the penult as
in (21a), but this does not take place when the onset of the penult is also a
depressor as in (21b).
(21) a. k -ge|ela kug |ela kug |ela kuge| la
b. k -gezela kug zela kug zela
This connection between L tone behavior and voiced obstruents has been known for many years, at least since Maspéro (1912), and is recognised in the Halle and Stevens feature system by reducing voicing and L tone to a single feature, [slack]. The Halle and Stevens account faces the problem that
a complete identication of voicing and L tone accounts for the phonological
relevance of voiced consonants in creating rising tones and blocking rise-
decomposition in Siswati, but it does not account for the irrelevance of
voiced consonants with respect to the rule shifting H to the antepenult, which
operates across all consonants.
This is resolved in the model of Bradshaw (1999) which equates L tone
and voicing as one feature, L/voice, which allows multiple dominating nodes.
When dominated by a tone node it is realized as L tone, and when dominated
by a segmental laryngeal node it is realized as voicing.
(22)   L tone          Voiced consonant

                           root
                            |
        Tone            Laryngeal
         |                  |
        L/voi             L/voi
This accounts for the facts of consonant-tone interaction in Siswati by a
spreading rule.
(23)
[L/v] [L/v]
H
Laryngeal Lar
Tone Tone
g e z e l a
The alternative would be an arbitrary feature-insertion rule, of the kind that nonlinear representations are supposed to render unnecessary.

(24)  ∅ → L /    C     ____
              [+voice]
Since rules can be plane- and tier-sensitive, this predicts that there can be "an interweaving of transparency and opacity" (Bradshaw 1999: 106) as seen in the differential blocking or transparency of tone shifts by intervening depressors.³
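The Siswati pattern in (20)–(21) can be approximated procedurally. This is a hypothetical sketch that abstracts away from the autosegmental geometry; the depressor set and the syllable encoding (a word as a list of onset consonants) are mine. The generalizations it encodes are the ones in the text: H docks on the antepenult across any consonant; a depressor onset makes it a rise; the rise resolves by shifting to the penult unless the penult's onset is also a depressor.

```python
# Procedural sketch of the Siswati depressor facts (illustration only).
# Depressors are voiced obstruents (hypothetical, partial set).
DEPRESSORS = {"b", "d", "g", "v", "z"}

def realize(onsets):
    """Return a tone per syllable for a word bearing one shifted H."""
    tones = [""] * len(onsets)
    i = len(onsets) - 3                  # H docks on the antepenult, as in (20)
    if onsets[i] in DEPRESSORS:          # depressor onset: H becomes a rise
        if onsets[i + 1] in DEPRESSORS:  # penult also a depressor: (21b)
            tones[i] = "LH"              # the rise stays put
        else:                            # otherwise the rise resolves: (21a)
            tones[i + 1] = "H"           # H shifts on to the penult
    else:
        tones[i] = "H"
    return tones

print(realize(["k", "f", "k", "l"]))  # ['', 'H', '', '']   plain antepenult
print(realize(["k", "g", "l", "l"]))  # ['', '', 'H', '']   (21a) type
print(realize(["k", "g", "z", "l"]))  # ['', 'LH', '', '']  (21b) type
```

The sketch makes the asymmetry in the text explicit: the antepenult-shift step ignores consonants entirely, while the rise-resolution step is sensitive to them, which is what motivates separating L/voice's tonal and laryngeal attachments.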
This then is the fundamental and most common interaction between tone
and segmental content, that voiced consonants bring about L tone behavior. The success of this explanation depends very much on relaxing the degree of phonetic specificness of features. In the next section, I turn to a different
effect, one that is quite rare in synchronic phonologies and also much more
abstract than the consonant / tone connection, namely the effect of vowel
height on tone.
4. Vowel height and tone
I begin with methodologically-instructive data from the Khoisan language
Shua. This language also has 4 tone levels, and certain facts suggest a
historical sound change relating tone and vowel height. H, M and L tones are
illustrated in (25), whose appearance is fairly unrestricted.
(25) LL // y (v) LM / snake
// ngernail know
t r rat b e zebra
LH h r dish LHL kh give
c cry kh person
d m turtle tsh daytime
d grass g// run
k speak // m` low
ML ! at MM //a
-

i buy
pe jump
// m

cut MH !hu

push
mu

`
see mwe d moon
HL j b axe HM // m hit
k

u

exit s e take
sh r tobacco n/ thigh
HH sh n breathe x m lion
xw white z r bird
What is noticeable in (26) is that the Superhi tone appears almost exclusively
on high vowels.
(26) SH // song SS sh b light
k m heavy kar hard
nj black
x m sand SL ?

y u

sit
send
There are no melodic tones or alternations in the language to support a
synchronic connection between vowel height and tone. It may be assumed
that there was a historical change in the language explaining the uneven
synchronic distribution of the tones, but that provides no warrant for encoding
that relationship in the synchronic grammar. Induction of a featural relation
between vowel height and tone requires a definite, categorial patterning in
the grammar, which Shua lacks.
A strong case for synchronically connecting tone and vowel height
comes from Tupuri, a member of the Mbum group of Adamawa languages
spoken in Chad and Cameroun. Tupuri also has 4 tone levels, Superhi (SH = v̋),
High (H = v́), Mid (M = v̄) and Low (L = v̀), and the language presents a
grammaticalized transplanar segment effect between vowel height and the
sub-register feature [raised].
Static distributional evidence from nouns in Tupuri proves to be as
unrevealing as it was in Shua. As the forms in (27a) show, there is a
predominance of high vowels in nouns with superhi tone, but as (27b) shows,
this is just a tendency.
(27) a. d name h bone
        r hair t house
        sok ear ok smoke
        to hole t head
     b. fay field rat har palm leaf
        ks card game pcr priest
There is also a tendency, visible in (28a), for high tone to appear on non-high
vowels, but as (28b) shows, this is just a statistical tendency.
(28) a. ' y bean d hare
        fc k smile k k chicken
        k w relative l w afternoon
        p milk s k haunch
        s m sheep t y feather
     b. dt race ht grudge
        pt n beard o horn
        o flour t y rabbit
As (29a,b) show, mid and low tones appear freely on any vowel.
(29) a. b y testicle d arm
        f amusement h y nose
        h n calabash kr wing
        ko wood pt r horse
        t sh tl m tongue
        fu y fur ko leg
     b. w y female j k mouth
        j w spear w y dog
        h n brother no oil
        wt l boy yo middle
Thus the non-alternating static lexical distribution is unrevealing of the
grammar of Tupuri: no phonological rules affect tones in nouns, and there is
no reason to posit any rules of grammar to account for these data.
Verb tone is entirely different, since tones in verbs alternate paradigmatically,
meaning that there is something which the grammar must account
for. Unlike nouns, verbs have no lexically-determined tone. Instead, verbs
receive their tones in a classical autosegmental fashion via concatenation of
morphemes, which include floating-tone tense markers. Verb tone is entirely
predictable according to the following informal rules. First, root vowels
are generally M-toned. An expected M on the syllable after the 3s pronoun
becomes SH. The entire verb stem tone becomes H and the 3s pronoun itself
has a SH in the present tense. The paradigm in (30) shows the pattern for
monosyllabic roots.
(30) infinitive 1s past 3s past 1s present 3s present
     -g nj aa nj a dig
     dcf-g nj dcf dcf nj dcf a dcf cook
     bm-g nj bm bm nj bm a bm play
     c i k-g nj c i k c k nj c k a c k pound
     dtk-g nj dtk dtk nj dtk a dtk think
     ycr-g nj ycr ycr nj ycr a ycr write
Polysyllabic verbs particularly show that SH appearing after the 3s past
pronoun only affects the first syllable of the verb, whereas the H of the
present tense is manifested on all syllables of the verb.
(31) infinitive 1s past 3s past 1s present 3s present
     bolol-g nj bolol bolol nj bolol a bolol roll over
     ktlcr-g nj ktlcr ktlcr nj ktlcr a ktlcr draw
The analysis of this pattern is that verb roots have no underlying tone, and
by default any toneless vowel receives M. Pronouns have tones, so the
invariantly L-toned 1sg pronoun nj has L, whereas the 3sg pronoun is toneless
/a/ plus a floating SH. The floating SH of /a/ docks to a following vowel if it
is toneless, and otherwise docks to the preceding toneless pronoun. Whether
the following verb has a tone is determined by tense-aspect. The present/
imperfective has a floating H which spreads to the vowels of the root. In the
presence of such an inflectional H, the prefixal SH docks to the pronoun (a
'he digs'). Otherwise the SH from the pronoun appears on the verb (aa
'he dug'). Illustrative derivations are given in (32).
(32) /l:-g/ /nj l:/ /a l:/ /nj l:/ /a l:/ underlying
     [nj l:] a l: melody mapping
     a l: rightward SH docking
     [a l:] T docking
     [l:-g] [nj l:] [l:] default M
In one tense, the imperative, the tone of verbs is determined by properties
of segments in the verb. Relevant segmental factors include both the voicing
of the root-initial consonant and the height of the vowel. We begin by
considering verb roots with a non-high vowel, where the consonant determines
the verb's tone. In (33a) we observe voiceless consonants conditioning a H
tone, and in (33b) we see implosives and glottalized sonorants triggering H
tone.
(33) a. ' c k fry ' k braise
        h t eat fufu dry h give
        fc r return frc k scratch
        k p plant klc w⁴ squeeze
        s t sweep c cut
        t braid t m chew
     b. b l nail bc s divide
        d r insult d w hold
        w r kill
In contrast, (34a) shows a L tone when the initial consonant is a voiced
obstruent, and (34b) shows the same tone after a (voiced) plain sonorant.
(34) a. b m play b r cover
        dc dip fufu dc f make soup
        d k repeat d want
        g c raise g huddle
        gr k put across g s sift
        jc l stoop ja fray
     b. lc fall l bite
        l hear l s maltreat
        m beat m carry
        na undress ny take
        r t burn r promenade
        wa speak w k scratch
        yc r write y k bathe
The examples in (35) show that when the root begins with a prenasalized
consonant (which is always voiced), the verb's tone is L.
(35) mbc t stretch mb k diminish
     mb r bear ndc p fill a hole
     g decapitate g p measure
Finally, (36) shows that in polysyllabic verbs, the tone of both syllables is
determined by the consonant property of the root-initial syllable.
(36) r k tear up h r k break
     g r k dress up g r s undercook
Thus with non-high vowels, the choice H vs. L is determined by the voicing
of the first consonant, as predicted by Bradshaw's model of consonant-tone
interaction. The particular arrangement of which consonants behave as tone-depressors
is unique to Tupuri, but similar to the pattern of other languages.
Voiced obstruents are depressors, which is the common case, and implosives
(which are predictably voiced, and thus not necessarily phonologically voiced)
are not tone depressors. Voiced sonorants are also tone depressors, which is
not the common pattern (but is attested in some languages); the fact that
glottalized glides are non-depressors, in contrast to plain glides, has no
known analog in other languages, since glottalized glides have not been
attested in languages with consonant-tone effects.⁵
Conceptualized in terms of Yip-Pulleyblank features, the feature analysis
of the 4 Tupuri tones is analogous to that of Ewe. Taking into consideration
the segmental analogs, though, L/voice is the opposite of upper register; that
is, in Tupuri, a better name would be 'low register'. H is treated as the
non-raised tone of the upper register (SH being the raised tone in that
register), and L is the lower tone of the lower register (M being the raised tone
of that register).
(37)              v̋    v́    v̄    v̀
                  SH   H    M    L
L/voi (−upper)    −    −    +    +
hi (raised)       +    −    +    −
The H ~ L alternation is thus a consonantally-triggered register change:
L/voice, a.k.a. −upper, spreads qua register from the initial consonant, and
combines with an existing [−raised] (i.e. H) tone to yield the lowest tone, L.
The rule is, essentially, the same as the standard spreading of low register
(L/voice) in (23), targeting a [−raised] tone in the imperative (spreading of low
register from a consonant is not a general phenomenon in the language; it is
morphologically restricted to the imperative). The imperative tense itself is
characterized by a floating H ([−raised]) tone, which spreads to all vowels.
This results in the following contrast.
(38)    r k             ng r k
                         |
                        L/v
        [−raised]       [−raised]
          |               |
        Tone node       Tone node
Whether or not [r k] is further specified [−L/v] by default depends on
whether features are binary and fully specified in the output.
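The register-and-raising bookkeeping in (37) can be sketched as a toy model. The feature names and tone inventory follow the chapter; the encoding itself (Python dictionaries, the helper functions) is purely illustrative:

```python
# Tupuri tone features as in (37): each tone is a pair of binary features,
# low register (L/voi, the opposite of upper) and raised.
TONES = {
    "SH": {"low_register": False, "raised": True},
    "H":  {"low_register": False, "raised": False},
    "M":  {"low_register": True,  "raised": True},
    "L":  {"low_register": True,  "raised": False},
}

def name_of(features):
    """Look up the tone name for a feature bundle."""
    return next(t for t, f in TONES.items() if f == features)

def depressor_lowering(tone):
    """Spread low register (L/voi) from a depressor consonant onto a tone."""
    features = dict(TONES[tone])
    features["low_register"] = True
    return name_of(features)
```

On this encoding, lowering H yields L, matching the H ~ L alternation in the text; that lowering SH would yield M is a prediction of the encoding, not a pattern discussed in the chapter.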
However, when the vowel of the verb root is [+high], a different pattern is
found. Consider first the examples in (39), where the initial consonant is a
non-depressor and the syllable is either VV or VR (R = liquid, glide or nasal).
Observe that the tone pattern of the syllable is H-SH.
(39) ' y arrange ' r y
     c ii start f u y pull
     h l cover ko r fight
     k turn around k spend the year
     k l have blisters s announce
     bl entertain d pound
I assume that sonorant coda consonants are moraic; thus the surface
generalization is that the first mora has H and the second has SH.
Compare the following examples with the same initial consonant and an
obstruent coda: the verb in the imperative has just SH on its one mora.
(40) 'k pant c k pound
htk dry kop cover
krtk scratch t f spit
dtk think
A compact statement of the distribution of tone in the imperative, when the
vowel is [+high], is that the final mora bears SH tone.
As the data of (41a,b,c) show, the phonatory properties of the root-initial
syllable do not influence tone.
(41) a. b ol open bo m thrash
        dii deform g r stir sauce
        g m beat millet g n witch
     b. lo try l learn
        lt w taste mo n break
        ro m pinch r advise
        r yell at w say
     c. nd come ngl provoke
        ndl pierce
Analogous monomoraic roots with an initial voiced consonant are seen in (42).
(42) a. d k vomit gos taste
g p suffocate
b. lk swallow l p immerse
r k ripen wot swell up
ytk dry in sun
c. ngt turn
Finally, disyllabic verbs can be observed in (43).
(43) bo lol roll over nd l p overcook meat
The analysis of this pattern is as follows. A segmentally conditioned rule,
Imperative Raising, changes the H tone of the imperative, bleeding the
consonantally-triggered tone-lowering rule motivated above. By Imperative
Raising, a [+high] vowel raises H to SH on the last mora; therefore the post-depressor
tone is no longer [−raised]. In Yip-Pulleyblank terms, [+upper,
−raised] becomes [+raised] when there is a high vowel in the root.
The central theoretical question, then, is how the tone feature [+raised] can
be acquired from a [+high] vowel, in a theory which prohibits arbitrary feature
insertions. The solution is simple, and is parallel to the equation of voicing and
L tone (register). In Tupuri, the tonal feature [raised] and the vowel feature [hi]
are one and the same feature. When linked to a tone node, [hi] is realized as
the high tone in the register, i.e. [+raised], and when linked to a vowel place
node, it is realized as a high vowel. The rule is formalized in (44).
(44) Imperative raising
        [hi/raised]
        /         \
     V-place     Tone ]   (in the imperative)
The effect of this rule is seen in (45).
(45)     T  T
     n d u        = [nd ]
         |
        Vpl
         |
     [hi/raised]
This solution to the problem of height-conditioned tone raising in Tupuri
is possible only if features are abstract: they do not intrinsically describe
specific physical events.
5. Conclusions
A central claim of Radical Substance Free Phonology is that features are
not universally pre-defined, and that only the formal mechanism of features
exists as part of universal grammar. Each feature must be learned on the basis
of the fact that it correctly defines classes of segments within the grammar. A
prediction of this claim is that a given phonetic fact or phonological contrast
could be analysed into features in a number of different ways. We have seen
above that this is the case, regarding the analysis of tone-heights.⁶
While the theory formally allows essentially any logically coherent
organization of segments into classes, mediated by learned features, this does
not mean that the theory predicts that all or many of those computational
possibilities will actually be realized. As emphasized by Hale and Reiss
(2008), attested languages are a function not just of the theory of computation,
but also are indirectly the result of extragrammatical constraints on
acquisition and language change, which determine the nature of the data
which a child uses to induce a grammar. Grammars are created by children
in response to language facts, so any factors that could diachronically affect
the nature of the primary data could indirectly influence the shape of a
synchronic grammar. Since phonetic factors obviously affect what the child
hears and thus the inductive base for grammatical generalizations, there are
reasons for grammatical processes to have a somewhat phonetically-natural
appearance.
Is there a sensible functional explanation for why such a correlation between
vowel height and tone raising would have arisen? The phenomenon of intrinsic
pitch is well-known in phonetics: high vowels universally have higher F0 than
comparable non-high vowels, on the order of 15 Hz (see Whalen and Levitt
1995 for a crosslinguistic study). This is often explained by mechanical pulling
on the larynx by the tongue, increasing vertical tension on the vocal cords,
and is sometimes explained based on perception of F1 with reference to F0
(the fact that high vowels are ones where F1 and F0 are close, so raising F0
in high vowels enhances this percept). While this effect is generally believed
to be imperceptible (Silverman 1987, Fowler and Brown 1997), see Diehl and
Kluender (1989a,b), who claim that intrinsic pitch is under speaker control.
It is apparent that this phonetic tendency was in fact noticed and
amplified by pre-modern Tupuri speakers, yielding the grammaticalization of
an earlier low-level physically-based trend. While extremely rare,⁷ such
patterns are essential to understanding the nature of possible grammars.
Notes
1. See Bromberger and Halle (2000) for a more contemporary affirmation of the
phonetic grounding of features, in terms of an intention to produce an articulatory
action.
2. Standard feature names may be conventionally kept; or, features may be labeled
with arbitrary indices such as F5, as noted in Hall (2007). There is no significance
to the name assigned to a feature in a language in RSFP, any more than SPE
phonology claims that the intrinsic name for the tongue-blade raising feature
is [coronal] rather than [lingual]. Whether features are binary or privative, on the
other hand, is a fundamental question about UG.
3. The formal details of plane- and tier-sensitivity remain to be worked out, in
the framework of the theory of adjacency conditions developed in Odden (1994).
4. Notice that the stem-initial consonant is a voiceless obstruent, a non-depressor,
but the consonant immediately preceding the H-toned vowel is a sonorant, a
depressor. The alternation between H and L is triggered by the root-initial
consonant.
5. Words which might be thought to begin with a vowel have a noticeable phonetic
glottal stop, which is preserved phrasally, and is included in the transcriptions
here. Note that glottal stop, if it is phonologically present, does not behave as a
tone-depressor; see e.g. 'c k 'fry'. This is somewhat noteworthy, since in Kotoko
(Odden 2007), glottal stop is a tone depressor.
6. A further example of feature-duality is the relationship between vowel laxness
and L tone, demonstrated by Becker and Jurgec (n.d.) for Slovenian, which they
argue has tone-lowering alternations triggered by lax mid vowels.
7. To the best of my knowledge, the only other case of a synchronically-motivated
tone / vowel-height connection is found in certain Japanese dialects, discussed in
Nitta (2001).
References
Archangeli, Diana and Douglas Pulleyblank
1994 Grounded Phonology. Cambridge, MA: MIT Press.
Becker, Michael and Peter Jurgec
n.d. Interactions of tone and ATR in Slovenian.
http://roa.rutgers.edu/les/9951008/995BECKER-00.PDF.
Blaho, Sylvia
2008 The Syntax of Phonology: A Radically Substance-Free Approach.
Ph. D. dissertation, University of Tromsø.
Bradshaw, Mary
1999 A Crosslinguistic Study of Consonant-Tone Interaction. Ph. D. diss.,
The Ohio State University.
Bromberger, Sylvain and Morris Halle
2000 The ontology of phonology (revised). In: Noel Burton-Roberts,
Philip Carr and Gerard Docherty (eds.), Phonological Knowledge:
Conceptual and Empirical Issues, 18–37. Oxford: Oxford University
Press.
Campbell, Lyle
1974 Phonological features: problems and proposals. Language 50: 52–65.
Chomsky, Noam and Morris Halle
1968 The Sound Pattern of English. New York: Harper & Row.
Clements, G. Nick
1978 Tone and syntax in Ewe. In: Donna Jo Napoli (ed.), Elements of Tone,
Stress, and Intonation, 21–99. Washington: Georgetown University
Press.
1983 The hierarchical representation of tone features. In: Ivan R. Dihoff
(ed.), Current Approaches to African Linguistics, Volume 1, 145–176.
Dordrecht: Foris.
1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
1991 Place of articulation in consonants and vowels: a unified theory.
Working Papers of the Cornell Phonetics Laboratory 5: 77–123.
Clements, G. Nick and Elizabeth Hume
1995 The internal organization of speech sounds. In: John Goldsmith (ed.),
The Handbook of Phonological Theory, 245–306. London: Blackwell.
Diehl, Randy and Keith Kluender
1989a On the objects of speech perception. Ecological Psychology 1: 121–144.
1989b Reply to commentators. Ecological Psychology 1: 195–225.
Dresher, B. Elan, Glyne Piggott and Keren Rice
1994 Contrast in phonology: overview. Toronto Working Papers in
Linguistics 14: iii–xvii.
Foley, James
1977 Foundations of Theoretical Phonology. (Cambridge studies in
linguistics, 20). Cambridge: Cambridge University Press.
Fowler, Carol A. and Julie M. Brown
1997 Intrinsic F0 differences in spoken and sung vowels and their perception
by listeners. Perception and Psychophysics 59: 729–738.
Hale, Mark and Charles Reiss
2000 Substance abuse and dysfunctionalism: current trends in
phonology. Linguistic Inquiry 31: 157–169.
2008 The phonological enterprise. Oxford: Oxford University Press.
Hall, Daniel Currie
2007 The role & representation of contrast in phonological theory. Ph. D.
diss., University of Toronto.
Halle, Morris and Ken Stevens
1971 A note on laryngeal features. RLE Quarterly Progress Report 101:
198–213. MIT.
Harris, John and Geoff Lindsey
1995 The elements of phonological representation. In: Jacques Durand and
Francis Katamba (eds.), Frontiers of Phonology: Atoms, Structures,
Derivations, 34–79. London, New York: Longman.
Hayes, Bruce, Robert Kirchner and Donca Steriade
2004 Phonetically-Based Phonology. Cambridge: Cambridge University
Press.
Hjelmslev, Louis
1939 Forme et substance linguistiques. In Essais de linguistique II.
Copenhagen: Nordisk Sprog- og Kulturforlag.
Hume, Elizabeth
1994 Front Vowels, Coronal Consonants and their Interaction in Nonlinear
Phonology. New York: Garland.
Jakobson, Roman, C. Gunnar M. Fant and Morris Halle
1952 Preliminaries to Speech Analysis: The Distinctive Features and their
Correlates. Technical Report 13. Massachusetts: Acoustics laboratory,
MIT.
Maddieson, Ian
1972 Tone system typology and distinctive features. In: André Rigault
and René Charbonneau (eds.), Proceedings of the 7th International
Congress of Phonetic Sciences, 957–961. The Hague: Mouton.
Maspéro, Henri
1912 Études sur la phonétique historique de la langue annamite. Les
initiales. Bulletin de l'École française d'Extrême-Orient 12 (1):
1–124.
McCarthy, John
1988 Feature geometry and dependency: a review. Phonetica 43: 84–108.
Morén, Bruce
2003 The Parallel Structures Model of feature geometry. Working Papers of
the Cornell Phonetics Laboratory 15: 194–270.
2006 Consonant-vowel interactions in Serbian: features, representations
and constraint interactions. Lingua 116: 1198–1244.
Nitta, Tetsuo
2001 The accent systems in the Kanazawa dialect: the relationship between
pitch and sound segments. In: Shigeki Kaji (ed.) Cross-Linguistic
Studies of Tonal Phenomena, 153–185. Tokyo: ILCAA.
Odden, David
1994 Adjacency parameters in phonology. Language 70: 289–330.
2006 Phonology ex nihilo. Presented at the University of Tromsø.
2007 The unnatural phonology of Zina Kotoko. In: Tomas Riad and Carlos
Gussenhoven (eds.), Tones and Tunes, Volume 1: Typological Studies
in Word and Sentence Prosody, 63–89. Berlin: de Gruyter.
Pulleyblank, Douglas
1986 Tone in Lexical Phonology. Dordrecht: Reidel.
Roberts-Kohno, R. Ruth
2000 Kikamba phonology and morphology. Ph. D. diss., The Ohio State
University.
Sagey, Elizabeth
1986 The representation of features and relations in nonlinear phonology.
Ph. D. diss., Massachusetts Institute of Technology.
Sampson, Geoffrey
1969 A note on Wang's 'Phonological features of tone'. International
Journal of American Linguistics 35: 62–66.
Samuels, Bridget
2009 The structure of phonological theory. Ph. D. diss., Harvard University.
Silverman, Kim
1987 The structure and processing of fundamental frequency contours.
Ph. D. diss., University of Cambridge.
Trubetzkoy, Nikolai S.
1969 Grundzüge der Phonologie [Principles of Phonology]. Translated by
Christiane A. M. Baltaxe. Berkeley: University of California Press.
Original edition: Travaux du cercle linguistique de Prague 7, 1939.
Wang, William
1967 Phonological features of tone. International Journal of American
Linguistics 33: 93–105.
Whalen, Doug and Andrea Levitt
1995 The universality of intrinsic F0 of vowels. Journal of Phonetics 23:
349–366.
Woo, Nancy
1969 Prosody and phonology. Ph. D. diss., Massachusetts Institute of
Technology.
Yip, Moira
1980 The tonal phonology of Chinese. Ph. D. diss., Massachusetts Institute
of Technology.
Downstep and linguistic scaling
in Dagara-Wulé
Annie Rialland and Penou-Achille Somé
1. Introduction
This chapter is in keeping with George N. Clements's work on tonal
languages and downstep, particularly his work with Yetunde Laniran
on Yoruba's downstep. Moreover, it concerns subjects which were very
dear to Nick's heart and life: phonology, music and Africa.
There is a tradition of analysing downstep in musical terms: intervals,
register, key-lowering. The present chapter will continue and extend this
tradition, proposing to bring linguistic and musical scalings closer together,
at least in some African tone languages.
Downstep has been studied in many languages and from various points
of view. In this chapter, we will concentrate on phonetic studies that deal
with the nature of intervals in downstep, their calculation, and the whole
geometry of the system with its reference lines. Depending upon the
language, various types of scaling and reference lines have been found. However,
other factors aside, all downstep calculations involve a constant ratio
between downsteps. A constant ratio is precisely also a central characteristic
of musical intervals.
We begin by presenting the various types of downstep which have been
found, taking into account reference lines and scalings (both parameters
are linked, as shown below in (1)). Then, by studying Dagara downstep, we
propose that it can be viewed as being based on (roughly) equal intervals
within a musical scale (semitones), in the same way as in another African
language, Kono, whose tonal system shares many common points
with the Dagara one (Hogan and Manyeh 1996). As downstep cannot
be an isolated phenomenon, we expect other manifestations of musical-type
intervals. Using evidence from a repetition task, we hypothesize that
the remarkable parallelism found among the five speakers' productions is
due to a common linguistic scale involving musical-type intervals (tones,
semitones, cents). Thus, we bring together two types of arguments in favor
of a linguistic scaling based on a musical scaling. Finally, as a first step in
comparing downstep scaling and Dagara-Wulé music scaling, we present a
preliminary study of the scaling of an eighteen-key xylophone, which also
involves (roughly) equal intervals.
We begin by providing some background on the studies of downstep
(or downdrift), their scalings and the reference lines involved in their
calculation.
2. Downsteps, scalings and reference lines
The phonetic implementation of downstep (either automatic downstep,
also called downdrift, or non-automatic downsteps) has been studied in
various languages. Since the seminal article by Liberman and Pierrehumbert
(1984), the calculation of downstep has been found to involve a ratio (or
some type of unit related in an exponential way to hertz, such as ERBs),
and reference lines. Based on available studies of downstep, we can
distinguish three types, depending upon the reference line involved in their
calculation: 1) downstep with an H tone reference line; 2) downstep with an
asymptote between the last H and the bottom of the speaker's range;
3) downstep without a reference line. We will see that the whole geometry of a
downstep system generally involves several reference lines with various
roles (see below).
The best-studied language with a H tone reference line is Yoruba. In-depth
studies were performed by Y. Laniran and G. N. Clements (see Laniran
1992; Laniran and Clements 2003). Yoruba is a three-tone language
spoken in Nigeria. Its downstep (or downdrift) is triggered by L tones
alternating with H tones. There is no non-automatic or distinctive downstep;
for example, there is no downstep due to a floating L tone. The fact that this
downstep is not distinctive is important, as it can be absent without any loss
of distinctivity.
Studying the phonetic realizations of downstep, Laniran and Clements
(2003) found that the basic H tone value (given by H realizations in all-H-tone
sequences) provides a reference line for the realizations of downstepped
H in a sequence of alternating H and L tones. This reference line is reached
by the second or third downstepped H tone.
In the following graph, reproduced from Laniran (1992), values of H in
all-H-tone utterances are represented by empty circles. Black circles
correspond to the F0 values of tone realizations in an utterance with alternating
H and L tones, and triangles represent values of L tones in an all-L-tone
utterance.
In the alternating H and L realization, we observe that the second H, which
is downstepped, is lowered almost to the basic H value and that the third
one is right on the H tone line. Once this H tone reference line is reached,
the following Hs are not lowered any more but instead are realized on the H
tone reference line. We note also that the first HL interval is the largest one.
Downstep strategies vary depending upon the speaker, thus determining the
number and size of the downsteps above the basic tone value and the ways
of landing on the H reference line ('soft landing' / 'hard landing'). Soft
landing, observed in one speaker out of three, refers to an asymptotic decay,
while hard landing is a more abrupt pitch lowering.
In many languages, the lowering of H tones is not limited by a H reference
line but is asymptotic to a reference line below the last H tone. In their article
on English downstep, Liberman and Pierrehumbert (1984) recognize equal
steps between downsteps, given an exponential scale based on a constant
ratio d and a reference line.
Consider the equation that they propose:

Hn = d(Hn−1 − r) + r

Hn is the F0 of the nth downstep, d is a ratio (between 0 and 1), Hn−1 is the
value in hertz of the (n−1)th downstep, and r is a reference line and the
asymptote of the system. The reference line is between the last H and the
bottom of the pitch
range. It is an abstract line, without linguistic meaning.

Figure 1. F0 curves of all-H-tone utterances (empty circles), all-L-tone utterances
(empty triangles) and utterances with alternating L and H tones (black
circles). Reproduced from Laniran (1992).

In the space over the
reference line, the ratio between a downstep and a following one is constant
(0.8, for example). This means that the steps between downsteps are equal
within an exponential scale related to the hertz scale. A similar equation has
been found to be a good predictor of downstep (more precisely downdrift) in
Chichewa, a two-tone Bantu language (Myers 1996).
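As a rough illustration of how the Liberman and Pierrehumbert equation behaves, the sketch below computes a downstep series and checks the constant-ratio property. The starting value, ratio and reference line are made-up figures, not measured data from any of the languages discussed:

```python
# Sketch of the Liberman and Pierrehumbert (1984) downstep equation:
#   H_n = d * (H_{n-1} - r) + r
# H0 = 120 Hz, d = 0.8 and r = 70 Hz are hypothetical values.

def downstep_series(h0, d, r, n):
    """Return F0 values (Hz) for n successive downstepped H tones."""
    values = [h0]
    for _ in range(n - 1):
        values.append(d * (values[-1] - r) + r)
    return values

hs = downstep_series(120.0, 0.8, 70.0, 6)

# The distance to the asymptote r shrinks by the constant ratio d at each
# step, so the series decays exponentially toward r without crossing it.
ratios = [(h - 70.0) / (prev - 70.0) for prev, h in zip(hs, hs[1:])]
```

The equal steps are thus equal in the exponential space over r, while the raw Hz intervals shrink from one downstep to the next.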
In the same line of analysis, Pierrehumbert and Beckman (1988) proposed
a calculation of Japanese downstep with equal steps and a reference line
below the last H tone. However, the calculation is more complex, as a
conformed space was introduced. In Spanish, Prieto et al. (1996) propose
a model without an asymptote but with a limitation at a rather low level
within speakers' pitch ranges, which is generated by the equation itself.
A second asymptote (for the L tones) has been included in the calculation
to account for downstep in two African languages: Igbo (Liberman et al.
1993) and Dschang Bamileke (Bird, 1994). Igbo is a two-tone Kwa language
with downdrift and phonological downstep. Interestingly, Liberman et al.
(1993) note that their equation fits downdrift realizations but not downstep
realizations, which indicates that, in Igbo, downstep and downdrift are
phonetically different. Dschang-Bamileke is known for its complex tonology.
It has no downdrift and the phonological nature of its downstep has been
debated (partial/total). Bird's mode of calculation is partly different
from Liberman et al.'s and is applied to alternating L and !H tones and not to
sequences of downsteps (H!H!...).
Van Heuven (2004) studied the realization of Dutch downstep, which is
not an automatic downstep as in English, Spanish or Japanese. It does not
result from the alternation of H and L tones but can be analyzed as triggered
by floating L tones and H spreading, as in many African languages. Thus,
sequences of downsteps in Dutch are realized as successions of terraces. Van
Heuven found that equal steps between downsteps can be retrieved, provided
that measurements are given in ERB. The conversion of hertz into ERB
involves a logarithm, which means, again, an exponential relationship. In this
view, there is no need for a reference line in the equation. Moreover, Van
Heuven noticed three reference values (or lines) in the system: 1) an H value,
which is the value of the first H tone of an utterance, independently of the
number of downsteps it contains; 2) a last H value, the value of the last H in
a sequence of downsteps, independently of the number of downsteps; 3) a
last L value, the ending point of all utterances. Note that these lines, which
will also be found in Dagara (see section 2), do not intervene in the downstep
calculation in Dutch.
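Since the chapter does not give the exact conversion Van Heuven used, here is a sketch with one common ERB-rate formula (Glasberg and Moore's); the Hz values are illustrative only:

```python
import math

# Hertz-to-ERB-rate conversion (Glasberg and Moore's formula, a common
# choice; the chapter does not specify which formula Van Heuven used).
def hz_to_erb(f):
    return 21.4 * math.log10(0.00437 * f + 1.0)

# Because the conversion is logarithmic, equal steps on the ERB scale
# correspond to a roughly constant ratio between successive Hz values,
# which is why no reference line is needed in the equation.
erbs = [hz_to_erb(f) for f in (100.0, 150.0, 200.0)]
```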
The last study that we will mention, by J. T. Hogan and M. Manyeh
(1996), concerns Kono, a two-tone Mande language with automatic
downstep or downdrift and phonological downsteps. The utterances that were
studied contained automatic as well as non-automatic downsteps (sequences
such as H!H). These authors found equal steps between downsteps when
measurements were in musical intervals. Consequently, they did not need
a reference line in their calculation of downsteps. The relationship between
two downsteps can simply be expressed by the number of semitones between
them.
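The semitone measure used by Hogan and Manyeh can be computed directly from a pair of F0 values; the sketch below (with invented Hz figures) shows that a constant-ratio downstep comes out as equal steps on the semitone scale:

```python
import math

# Interval between two F0 values in semitones: a semitone is a frequency
# ratio of 2**(1/12), so the interval is 12 * log2(f1 / f2).
def semitones(f1, f2):
    return 12.0 * math.log2(f1 / f2)

# A downstep that scales each H by the same factor (here 0.944, roughly
# one semitone down) yields equal semitone intervals between downsteps.
steps = [120.0, 120.0 * 0.944, 120.0 * 0.944 ** 2]
intervals = [semitones(a, b) for a, b in zip(steps, steps[1:])]
```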
From the preceding review of the articles on downdrift/downstep, we
can draw one important conclusion: equal steps without an asymptote (in ERB
or semitones) have been found in languages with sequences of downsteps
(H!H) or terraces, as in Dutch or Kono.
In the next part, we consider Dagara downsteps, more precisely the
realizations of non-automatic downsteps, or downsteps triggered by a
floating L tone, showing that they are equal-step downsteps based on
measurements in musical intervals. We also consider Akan-Asante downsteps
based on Dolphyne's data (1994), proposing the same kind of analysis as in
Dagara. Some data on Dagara downdrift is also considered, showing that
downstep and downdrift with alternating H and L tones are implemented
differently.
3. Dagara-Wulé downstep
The present study concerns the Wulé dialect of Dagara, spoken in Burkina
Faso. Dagara is a Gur language of the Oti-Volta sub-family. The main
references on Dagara-Wulé tonology and phonology are Systématique du
signifiant en Dagara, variété Wulé (P.-A. Somé 1982), L'influence
des consonnes sur les tons en dagara: langue voltaïque du Burkina Faso
(P.-A. Somé 1997), and 'Dagara downstep: How speakers get started'
(A. Rialland and P.-A. Somé 2000). The last publication provides our starting
point in this chapter.
Dagara-Wulé is a two-tone language with many polar-tone affixes. As
a result, all-L-tone utterances are very short; it is not possible to get long
sequences of L tones which could provide a reliable L reference line similar
to the Yoruba one. We will consider only the H reference line, the relevant
line for our purposes.
Besides downdrift (or automatic downstep), triggered by L tones realized on a syllable, Dagara has phonological downstep due to floating L tones. These floating L tones occur in many words or across word boundaries and, consequently, an utterance can contain several downsteps, such as the following one:
Downstep and linguistic scaling in Dagara-Wul 113
(1) db m!n jc!l l!n
    H H HH HH HH
    H H HLH HLH HLH
    1 1 1 2 2 3 3 4
    man turtledove egg fell down
    'The egg of the man's turtledove fell down.'
The numbers represent the levels of the H tones in the traditional way: 1 being the highest H tone and 4 the lowest. Downsteps are due to floating L tones, which are indicated with an underlined L in the second line of tonal notation. Note that a coda consonant is moraic and bears a tone. In this example, -n at the end of m!n and l at the end of jc!l both bear a downstepped H tone.
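The level numbering just illustrated can be computed mechanically. The sketch below is our own illustration, not the authors' formalism: it walks through a tone string and lets every (floating) L push all subsequent H tones down one level.

```python
def downstep_levels(tones):
    """Assign a register level to each H tone: start at level 1, and let
    every (floating) L push all following H tones down one level."""
    level = 1
    levels = []
    for t in tones:
        if t == "L":        # a (floating) L triggers a downstep
            level += 1
        else:               # an H tone surfaces at the current level
            levels.append(level)
    return levels

# Tone string of example (1): H H H (L)H H (L)H H (L)H
print(downstep_levels(["H", "H", "H", "L", "H", "H", "L", "H", "H", "L", "H"]))
# → [1, 1, 1, 2, 2, 3, 3, 4], matching the level numbers in the example
```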
The phonological nature of these downsteps, due to floating L tones, is antagonistic to the presence of a H reference line similar to the Yoruba one, as they are distinctive and cannot simply be cancelled, as is the case with Yoruba non-distinctive downstep. Considering the Dagara-Wulé system, the questions are the following: How does Dagara keep its phonological downsteps realized? How do the downstep intervals vary: are they kept constant or not? Are there asymptotes or baselines? Are there anticipatory raisings associated with this type of downstep and with sequences of downsteps, and what is their nature (phonological, phonetic or paralinguistic)?
Attempting to answer these questions, we will mainly consider two corpora. The first corpus contains sentences with an increasing number of downsteps as well as all-H-tone sentences of various lengths; the second one includes a large set of sentences, randomly selected. They also differ in the task involved: reading for the first corpus and repetition after the second author for the second corpus.
2.1. Analysis of a read corpus
Consider the first corpus, which includes utterances with an increasing number of H tones, such as the following:

1. db 'a man'
2. db b 'a man's child'
3. db b p:g trn 'a wife of a man's child is arriving'

It also contains utterances with an increasing number of downsteps, such as the following:
3. db 'a man'
4. db m!n
   man turtle-dove
   'a man's turtle-dove'
5. db m!n jc!l ln!
   man turtle-dove egg fell down
   'the egg of the man's turtle-dove fell down'
6. db m!n jc!l p!r mi!n pu
   man turtle-dove egg burst sun in
   'the egg of the man's turtle-dove bursts in the sun'
7. b!rc m!n jc!l p!r mi!n pu
   Baare turtle-dove egg burst sun in
   'the egg of Baare's turtle-dove bursts in the sun'
The sentences were read and recorded in various orders, interspersed with distractors, by three bilingual French-Dagara male speakers in Paris. Each sentence was presented at least twice and repeated three times. Some results based on this corpus have been presented in Rialland and Somé (2000). A fourth male speaker (speaker A) was recorded in Burkina Faso.
In Rialland and Somé (2000), we found that, as in Yoruba, all-H-tone utterances are basically flat with an optional final lowering, and that they do not exhibit any anticipatory raising associated with their length or the number of H tones that they contain. The flatness of all-H-tone utterances and the absence of anticipatory raising in these sentences are also confirmed by the second corpus and will be exemplified by examples taken from this corpus (see Figures 7 and 8). In the following paragraphs, we will refer to the regression lines of the all-H-tone utterances which were calculated in Rialland and Somé (2000).
In utterances with downstep, we will consider the nature of the intervals and the question of the equality of intervals between downsteps. We begin with the following graphs, which show the realization of five-downstep utterances (5D utterances) by four Dagara-Wulé speakers. The sentence is the following:

7. b!rc m!n jc!l p!r mi!n pu
   Baare turtle-dove egg burst sun in
   'the egg of Baare's turtle-dove bursts in the sun'
Measurements were taken in order to minimize consonantal influence (in general in the middle of the vowels) and transitional effects. F0 is measured on the following syllable when a downstep domain begins with a moraic
consonant. Thus, the value of the second downstep (point 3 on the abscissa) is measured on the syllable jc!l. The F0 value is taken on the second syllable when two syllables form the downstep domain: on mi! (point 5) in p!r mi!n, for example.

Figure 2. Downstep F0 measurements in 5D utterances, as realized by 4 Dagara speakers (A, B, C and D).
Each point corresponds to the mean of 6 repetitions. The unit chosen for this graph is the semitone (different from Rialland and Somé 2000). The conversion between hertz and semitones is based on the following equation: f_st = 12 · log2(f_Hz / 127.09), with a 0 line at 127.09 Hz, a reference value which can be used for male and female voices (cf. Traunmüller and Eriksson 1995, Traunmüller 2005). This line is helpful for comparing speakers. The semitone is used as the unit in this graph since semitones are considered the best unit for speaker comparisons (see Nolan 2003, in particular) and since we expect semitones to be the appropriate units for our study of downstep (see Hogan and Manyeh 1996, for Kono).
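The conversion (and its inverse) can be written as a small helper; this is merely an executable restatement of the equation above, with function names of our own choosing:

```python
import math

REF_HZ = 127.09  # the 0 st reference line, usable for male and female voices

def hz_to_st(f_hz):
    """f_st = 12 * log2(f_hz / 127.09)."""
    return 12 * math.log2(f_hz / REF_HZ)

def st_to_hz(f_st):
    """Inverse conversion: semitones (re 127.09 Hz) back to hertz."""
    return REF_HZ * 2 ** (f_st / 12)

# One octave above the reference line is +12 st:
print(round(hz_to_st(2 * REF_HZ), 2))  # → 12.0
```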
The mean values of downstep intervals, referred to as i, are the following: speaker A: i = 1.8 st (σ = 0.3 st); speaker B: i = 2 st (σ = 0.2 st); speaker C: i = 1 st (σ = 0.2 st); speaker D: i = 1 st (σ = 0.2 st). The downsteps are roughly equal (σ being between 0.2 and 0.3 st) along the utterances but differ depending upon the speaker (mean values between 1 and 2 st). This speaker-dependent difference is not surprising: it is related to the pitch range of each speaker. Two speakers (A and B) have a relatively large and similar pitch range (9 semitones), while being different in terms of global pitch height (speaker A being 7 semitones above speaker B). The two other speakers (C and D) have a much smaller pitch range (5 semitones) and differ
only slightly in terms of global pitch height (2 semitones). These values of
downsteps (between 1 and 2 semitones) are rather small, as pointed out by
various Africanists who listened to our recordings.
Note that since downstep intervals in semitones are equal (or roughly equal), the ratio of any two successive downsteps in hertz is constant (or nearly constant). This ratio is comparable to the constant d in Liberman and Pierrehumbert's (1984) equation. Constant ratios could account for F0 values if the intervals in semitones remain stable, in the following way: H_n = d(H_(n-1)). The difference with Liberman and Pierrehumbert's equation is that there is no reference line (r), as in the calculation of musical intervals. In our 5D utterances, the mean values of the ratios between two downsteps are the following: Speaker A: 0.90, Speaker B: 0.89, Speaker C: 0.94, Speaker D: 0.94. These mean ratios account rather well for the measurements, as the standard deviation of the differences between measured values and predicted values based on these mean ratios is 2 Hz, and as no difference between a measured value and a predicted value exceeds 4 Hz.
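The equivalence between a constant hertz ratio d and equal semitone steps can be illustrated with a short sketch; the 0.90 ratio is speaker A's mean value from above, while the 140 Hz starting point is an invented example:

```python
import math

def downstep_series(h1_hz, d, n):
    """F0 targets of n successive H tones under H_n = d * H_(n-1)."""
    series = [h1_hz]
    for _ in range(n - 1):
        series.append(series[-1] * d)
    return series

targets = downstep_series(140.0, 0.90, 5)
steps_st = [12 * math.log2(b / a) for a, b in zip(targets, targets[1:])]
# Every step equals 12 * log2(0.90), about -1.8 st: a constant hertz
# ratio and a constant semitone interval are the same statement.
```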
We now consider graphs of utterances with between one and six downsteps by our four speakers. There is an overlay of three reference lines: the all-H-tone regression line is dashed, and the last-H reference line for 1 to 4 downsteps as well as the last-H reference line for a higher number of downsteps are plain. An arrow indicates the distance between the highest H and the all-H-tone regression line.

Figures 3a, b, c, d. Downstep F0 measurements in 1-7D utterances, as realized by four Dagara speakers. Three reference lines are overlaid: the all-H-tone regression line is dashed and the two last-H-tone lines (one up to 5D utterances, the second in 5-7D utterances) are plain. An arrow also indicates the distance between the highest H and the all-H-tone regression line. (a) Speaker A; (b) Speaker B; (c) Speaker C; (d) Speaker D.
A visual examination of these graphs indicates that downsteps within a given sequence (with 1 to 7 downsteps) tend to be equal, except for the last one in short utterances, which is larger. Note that Dagara provides a mirror image of Yoruba: in Dagara it is the last step which tends to be larger, not the first one. In order to provide a quantitative approach to the question of the equality of downsteps, we will consider the ratios between downsteps, which are directly related to the size of intervals in semitones, as seen above, together with the means of these ratios and their standard deviations.

Speaker A has rather equal downsteps in 5D and 6D utterances (mean ratio = 0.91, σ = 0.03) and in 3D utterances (mean ratio = 0.90, σ = 0.03), excluding the last step, which is larger. These values are very close to the ones we found in 4D utterances.
Speaker B has fairly stable downstep intervals in 5D utterances (mean ratio = 0.90, σ = 0.03) and in 3D utterances (mean ratio = 0.90), with only two steps considered.

Speaker C also has rather equal intervals in 5D utterances (mean ratio = 0.94, σ = 0.02) and in 3D utterances (0.92). Note that intervals are slightly larger in 3D utterances than in 5D utterances, which indicates a slight compression of the intervals in longer utterances.

Speaker D behaves in the same way as speaker C: he has equal steps in 5D utterances (mean ratio = 0.94, σ = 0.02) and in 3D utterances (mean ratio = 0.92).
The stability of the ratios between two downsteps, as expressed by their means and standard deviations, confirms the visual examination of the graphs in semitones: the intervals between downsteps are fairly equal in semitones (and, as a consequence, in the ratios corresponding to the musical intervals). We now consider the reference lines.
In Yoruba, realizations of all-H-tone utterances provide a reference line for the realization of H tones. In Dagara, the H tone lines overlaid on our graphs are regression lines of all-H-tone utterances. We observe that the position of this line varies greatly from one speaker to another. For example, it is rather low in speaker A's range, close to the last downstep in 1 to 4D utterances. It is higher within other speakers' ranges, in particular in that of speaker B. Thus, the H tone line varies within a speaker's range and is clearly not an asymptote, as many downstepped H tones are realized below its level.
We now consider the final H tone reference line. We observe that final H tones tend to be realized on the same line, independently of the number of downsteps. The tendency for the last tone to be realized on a baseline provides an explanation for the larger step when the sentence is shorter: a bigger step is needed to reach this baseline.
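This trade-off (regular steps, with the final step enlarged as needed to land on the baseline) can be sketched as follows; all numeric values are invented for illustration:

```python
def downstep_targets(start_st, step_st, baseline_st, n_downsteps):
    """H tone targets (in st): equal downsteps of step_st, except that the
    final downstepped H is placed on the baseline, stretching the last
    step whenever the regular steps would not reach it."""
    targets = [start_st - i * step_st for i in range(n_downsteps)]
    targets.append(baseline_st)  # the final H sits on the baseline
    return targets

# A shorter utterance needs a bigger final step to reach the same baseline:
short = downstep_targets(6.0, 2.0, -2.0, 2)  # [6.0, 4.0, -2.0]: last step 6 st
long_ = downstep_targets(6.0, 2.0, -2.0, 4)  # [6.0, 4.0, 2.0, 0.0, -2.0]: last step 2 st
```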
Moreover, a second H tone baseline can be recognized for the final H tone of speakers A and B when utterances get longer. These two speakers lower their voice one more degree in order to implement an increased number of downsteps (up to six or seven), while the maximum number of downsteps realized by the other speakers is five.
The arrow on the left of each graph indicates the difference between the
highest H tone and the H tone line, and consequently the maximum amount
of anticipatory raising in the realization of the H tones. The maximum can
reach 6 st or be much smaller (2 st).
These data show that all speakers start higher than the H tone reference
line, if there is at least one downstep. However, the amount of anticipatory
raising varies. Our corpus, which includes sentences with an increasing
number of words and downsteps, clearly favors this anticipatory raising.
Nonetheless, it should be noted that there is a clear-cut difference between
the absence of anticipatory raising in all-H-tone utterances and its presence in
sequences with H and L tone alternations. The large variation in anticipatory
raisings suggests a global adjustment in pitch range in order to accommodate
a larger number of intervals.
Let's recapitulate our conclusions. Dagara-Wulé downstep intervals tend to be fairly constant, except for the final one, when expressed in semitones. Further, speakers have different pitch ranges and intervals. It was also found that the all-H-tone line is not an asymptote. Speakers tend to reach a baseline at the end of the downstep sequence and increase the last step if necessary to reach it. There is also a lower baseline for some speakers at the end of utterances with more than five downsteps. Moreover, anticipatory raising is always present, but its amplitude varies depending upon the speaker.
In the following section, we briefly consider downdrift in order to compare its patterning with downstep.
2.2. Comparison with downdrift
Consider downdrift realization in speaker A's utterances with alternating L and H tones (Figure 4).
At first blush, it can be seen that the pattern shows asymptotic effects. One of the signatures of an asymptotic pattern is the fact that the first step is the largest. In this case, it is between 3 and 5 semitones, while it was around 2 semitones in the downstep realizations of the same speaker (see Figure 3a). The following steps then decrease rapidly. After the third step, the lowering tends to be quite small until the end of the utterance, where a final lowering is observed, strongly pulling down the pitch of the last H tone and slightly modifying the pitch of the penultimate H tone.
In this chapter, we do not attempt to determine the calculation of this downdrift. However, like the other downdrifts mentioned previously, it does not have equal steps in terms of semitones, and its general profile is asymptotic. It can be compared to a progressively dampened oscillation.

Figure 4. F0 measurements in utterances with L and H tone alternations, as realized by speaker A.

2.3. Similarities with Akan-Asante downstep and downdrift

Akan-Asante downstep seems to share many properties with Dagara downstep. Like Dagara, Akan-Asante has two tones with downstep. Further, anticipatory raisings related to downstep realizations have been observed in this language (Stewart 1965).

We converted data taken from Dolphyne (1994) into semitones and plotted them as we did for Dagara. The following graph (Figure 5) shows realizations of 3D utterances by five Akan-Asante speakers.

The graph clearly shows regularity and confirms the fairly uniform pitch drop between downsteps noticed by Dolphyne on the basis of her measurements in hertz and her own knowledge of Akan-Asante as a native speaker of the language.
2.4. Analysis of a corpus of repetitions
The first corpus, produced by bilinguals, included sentences with an increasing number of words and downsteps. We found that their structure could be easily predicted, and this could influence anticipatory raisings. The second corpus involves speakers who are not bilingual.

Additional recording sessions were organized in Burkina Faso, in the Wulé dialect region. However, as monolingual Dagara speakers are illiterate, the second author, who is a native speaker of Dagara-Wulé, read the sentences and had the speakers repeat after him. In this way, the data came from what could be referred to as a repetition task. Each speaker was recorded independently. The sentences were presented in three different random orders and repeated three times. Thus, each sentence was recorded nine times.
Figure 5. Downstep F0 measurements in a 3D utterance by five speakers in Akan-Asante (based on Dolphyne's data, 1994).

A variety of repetition tasks have been used in psycholinguistics. Repetition tasks and shadowing tasks (an extreme form of repetition task) have been used to test various theories of speech processing and to explore links between perception and production (cf. Marslen-Wilson 1973, Mitterer and Ernestus 2008, among others). In these various tasks, it has been shown that speakers do not imitate but instead rely on their own semantic, syntactic and phonological processing. In fact, the difference between repetition and imitation belongs to everyday experience. When repeating, speakers keep their own variants of consonants, vowels and phonological rules (for example, their own pattern of dropping schwas in French), while when imitating, they attempt to reproduce precisely the model speaker's variants.
Keeping these observations in mind, let's consider the second corpus, which includes forty utterances of different lengths and tone patterns, such as the following:
(8) nib c w n!
    the people came
    'The people came.'
(9) b b kl
    the child Neg. is gone Neg.
    'The child is not gone.'
(10) s mn!k zu m: y!
    the rain rained and fish got out
    'It rained and the fish got out.'
(11) b b l ku n wn wl
    the child who-Neg. hear adviser
    ?u l: nu k b: zi ` c m wl
    him-it-is that hole-red Hab. advise
    'The child who does not listen to an adviser is the one who is advised by the red hole (= tomb).'
(12) nn nb! b dng b mg!ngd
    chief cow Neg. never be other-side-river Neg.
    'The chief's cow must never be on the other side of the river.' (proverb)
We begin by comparing the same long utterance as produced by the five speakers (1-5). The values are plotted in hertz (Figure 6a), ERBs (Figure 6b) and semitones (Figure 6c). The utterance is:

(13) dn bn mi nn!yd dk d on
    'There is millet beer in the compound of Nminna in Daaka's house.'
Figure 6. F0 curves of the utterance dn bn mi nn!yd dk d on 'There is millet beer in the compound of Nminna in Daaka's house.' by 5 speakers: (a) in hertz, (b) in ERBs, (c) in semitones.
Each point corresponds to three repetitions. There is one measurement on short vowels and two on long vowels or long diphthongs, such as ion [y::] in d on. The tessitura of the speakers differs as follows: three male speakers (speakers 2, 4, 5) have a low-pitched voice, one male speaker (speaker 3) has a higher-pitched voice than the three other male speakers, and the female speaker (speaker 1) has the highest tessitura. This sample of voices covers one octave.
At first blush, it can be seen that there is a striking parallelism between the five realizations, which is even clearer when represented in ERBs or semitones. We evaluated the parallelism numerically, using coefficients of variation of the differences between curves. If this coefficient is 0%, it means that the distance between two curves does not vary and, consequently, that the two curves are parallel. We compared the realizations of speakers 1-3, two at a time, leaving aside speakers 4 and 5, who are quite similar to speaker 2. The means of the coefficients of variation between the curves of these speakers are: hertz: c̄ = 24%; ERBs: c̄ = 11%; semitones: c̄ = 10%.
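This parallelism measure can be sketched in a few lines; the function name and the two curves below are our own invented illustration, not the authors' data:

```python
import statistics

def cv_of_differences(curve_a, curve_b):
    """Coefficient of variation (%) of the point-by-point differences
    between two F0 curves; 0% means the curves are exactly parallel."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    return 100 * statistics.pstdev(diffs) / abs(statistics.mean(diffs))

# Two invented curves (in st) that differ by a near-constant transposition:
speaker_1 = [10.0, 9.0, 8.2, 7.1, 6.0]
speaker_2 = [4.1, 3.0, 2.3, 1.0, 0.1]
print(round(cv_of_differences(speaker_1, speaker_2), 1))  # → 1.3
```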
A higher score for hertz was expected, as it is well known that hertz is not a suitable unit for comparing speakers. There is a slight advantage to using semitones as opposed to ERBs in terms of parallelism among the three speakers. These results are consistent with Nolan's (2003) results, which were based on an imitation task in English. Based on our results and Nolan's similar results, we chose to use the semitone as the unit for our study, keeping in mind that we might have obtained rather similar results with ERBs. The semitone was also a more convenient unit since it is a musical unit, which could then be used for comparison when considering musical scales.
All speakers execute the same score with similar intervals in terms of semitones. The parallelism between the five realizations indicates that the process involved is a simple transposition. There are two possible explanations for this parallelism. We can hypothesize either that speakers extract the musical score directly from Achille Somé's speech and conform their speech to his score, or that they parse the sentence at all linguistic levels (phonetically, syntactically, etc.) and produce an analog of it, according to their linguistic knowledge and patterns. The second hypothesis implies that they produce the same score because it is the score that they would have produced anyway, given their linguistic experience and the whole context. Traditional arguments in favor of the second hypothesis come from mistakes (Marslen-Wilson 1973). In fact, we also found mistakes in our corpus. One of the most common is the omission of the definite article or, conversely, the addition of a definite marker. This type of mistake supposes that the speakers go through a complete linguistic path (semantic, syntactic, morphological) in order to produce their utterances. This second hypothesis would imply that speakers encode similar intervals in similar contexts.
Consider additional utterances by the five speakers, beginning with two all-H-tone utterances (except for the first tone) of different lengths (Figures 7 and 8). These sentences are rather flat (except for the L tone at the beginning), with a slight optional lowering on the last word of the longer utterances. There is no significant difference between the H maxima of these sentences (plus another all-H-tone utterance), despite their length differences (ANOVA: F(2, 44) = 0.36, p = 0.7). These sentences were interspersed with many non-all-H sentences, which prevents any influence between them. The curves show that all speakers tend to be consistent in the production of their H tone line, independently of the length of the utterance. The data also confirm that there is no anticipatory raising associated with sequences of H tones. In all of these sentences, there is a L tone at the beginning, and we assume that it has no influence on the following H tones, except a local influence on the first H tone, which is lowered. This is verified in examples where an initial L-toned element is present in one form but absent in another: its presence does not modify the pitch of the following H tones, except for the first one. Noticing the same fact in Yoruba, Laniran and Clements (2003) concluded that downstep is triggered only by the HL order and not by the reverse tonal order.
Figure 7. F0 curves of the all-H-tone utterance (except for the L at the beginning) b b kl 'The child is not gone.' by 5 speakers.

Figure 8. F0 curves of the all-H-tone utterance (except for the L at the beginning): db btrn 'A man's child is arriving' by 5 speakers.

Let's now consider an utterance with one downstep (Figure 9). The literal translation of this utterance is as follows:

tu nb: z!m n
the work overcome me Assert
'I have too much work'

Figure 9. F0 curves of the 1D utterance to nb z!mn 'I have too much work' by 5 speakers.
The single downstep in the utterance is quite large. Its interval corresponds to 3 or 4 semitones, depending upon the speaker. The realizations of the five speakers remain quite parallel.

However, the realizations differ in terms of the amplitude of the anticipatory raisings, as shown by the graphs in Figure 10, which combine for each speaker two F0 curves shown previously: his/her mean F0 curve of an all-H-tone utterance (Figure 8) and his/her mean F0 curve of a 1D utterance (Figure 9).

The F0 curves of all-H-tone utterances with a L at the beginning (or (L)H) are represented by lines with empty circles, while lines with plain circles refer to F0 curves of 1D utterances. The figures have been ranked in descending order of anticipatory raisings, the first speaker (speaker 1) having the largest anticipatory raising and the last speaker (speaker 5) having almost none. These examples confirm our findings from the first corpus: anticipatory raisings vary considerably from one speaker to another, and the H tone line does not provide an asymptote in the system.
We now consider an utterance with two downsteps (Figure 11):

snn w!nin n!c nd kn
the foreigner brought meat fat
'The foreigner brought fat meat'

The realization of this sentence illustrates the difference between the first downstep and the last one in a sequence with two downsteps: the mean value for the first downstep interval is 2 st, while it is 4 st for the last one.
We now consider an utterance with three downsteps, one of them triggered by a L tone and the two others by a floating L tone (Figure 12):

k: n b!kpic d-u!c
hunger NEG get into him NEG
'He is not hungry.'

The downstep interval is around 2 semitones. It can be noted that the dropping interval due to a low tone realized on a mora (on c of kpic d-o) is larger than the downstep interval. These data also confirm the previous finding that the last step of the downstep is not increased when the tone realizations get closer to a low baseline.
In this second corpus, we found that the speakers' transpositions are parallel and the variations in the speakers' pitch ranges are small. This parallelism could not be achieved if the scores played by all of the speakers
Figure 10. F0 curves of a (L)H utterance (line with light circles) and of a 1D utterance (line with black circles) for five speakers: (a) Speaker 1, (b) Speaker 4, (c) Speaker 2, (d) Speaker 3, (e) Speaker 5.

Figure 11. F0 curves of the 2D utterance snn w!ninn!c nd kn 'The foreigner brought fat meat' by 5 speakers.
were not based on similar intervals. The speakers parsed the score of the
model in terms of intervals and transposed it within their tessitura.
While the score was reproduced by the speakers in a similar manner, we showed that they vary in terms of tessitura, as well as in terms of their anticipatory raisings. This confirms that the raising is part of a general adjustment of the voice in order to make room for the realization of numerous intervals.
While we expected to see variations in pitch range in the repetitions, since pitch range is speaker-dependent, the variation is also linguistically significant. Expansion of pitch range, for example, is typically used in questions and focus. Reduction of pitch range is common in post-focus position. In Dagara, there are important pitch-range variations in discourse (foregrounding and backgrounding, in particular). We suggest that in these repetitions, pitch ranges and the intervals associated with them have been reproduced, probably because they belong to the linguistic system of Dagara.
4. Comparison with the scaling of the Dagara-Wulé eighteen-key xylophone
In this section, which is quite tentative, we make an attempt to compare linguistic and musical scaling in Dagara-Wulé, based on a preliminary study of the scaling of an eighteen-key xylophone belonging to the second author's family.
Figure 12. F0 curves of the 3D utterance k n b!kpic d-o!c 'He is not hungry.' by 5 speakers.
The xylophone, which was in the second author's family compound in the Dagara-Wulé region, was recorded over the phone from Paris. The keys were struck by a xylophone player in decreasing order of pitch, from highest to lowest. Note that only seventeen of the eighteen keys are played in Dagara-Wulé music; thus, the eighteenth key was not considered in our test. F0 measurements were made on the stable part of each note with PRAAT (Boersma and Weenink 2010), from narrow-band spectrograms and spectrum slices. When the first harmonic was not available, F0 was inferred from the pitch difference between two successive harmonics. A musical notation was also provided by a French musician and singer who was not familiar with African temperaments. It includes usual notes such as A, B, etc., and + or − symbols to indicate whether the pitch of a key was higher or lower than the note used to transcribe it. This notation appears on line (1) in the table below. The table also shows the values of each key in hertz (line 2) and in semitones (line 3), with a baseline at 127.09 Hz (the same as in the previous studies). The scaling is pentatonic, with a D note (or C#) recurring regularly after four other notes (keys 1, 6, 11, 16). Note that the pitch of one key (key 14) could not be established.
We now consider the intervals, as they have been transcribed by our French musician and established from the measurements. In the musical notation, intervals vary basically between 2 semitones and 2.5 semitones (except for the last one, which is larger). These two intervals do not seem to be organized into any recursive pattern within the pentatonic scale. The measured intervals show a slightly larger dispersion: from 1.8 st to 2.7 st. Again, no recursive pattern seems to emerge. The mean of these
Table 1. Musical notation, hertz values, and semitone values of a 17-key Dagara-Wulé xylophone.

key        1     2     3     4     5     6     7     8
(1) note   C#    B     A     G-    E     D     B     A+
(2) hertz  560   506   450   393   336   292   251   223
(3) st     25.7  23.9  21.9  19.5  16.8  14.4  11.8  9.7

key        9     10    11    12    13    14    15    16    17
(1) note   F#+   E     D     B+    A     ?     E     D     B
(2) hertz  190   163   147   126   110   ?     85    73    62
(3) st     7     4.3   2.5   0.1   -2.5  ?     -7    -9.6  -12.4
intervals (excluding the last one) is 2.4 semitones, with a 0.4 st standard deviation. Based on the absence of recursive patterns and the rather small standard deviation, we suggest analyzing the Dagara-Wulé xylophone intervals as (roughly) equal, within a pentatonic scale.
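The semitone values in Table 1 can be recomputed from the hertz values with the same conversion used for the speech measurements. The sketch below is restricted to keys 1-13, since the pitch of key 14 could not be established; the keys below it are left out so that the interval list stays contiguous:

```python
import math

REF_HZ = 127.09  # same 0 st reference line as in the speech measurements

# Hertz values of keys 1-13 from Table 1:
keys_hz = [560, 506, 450, 393, 336, 292, 251, 223, 190, 163, 147, 126, 110]

keys_st = [12 * math.log2(f / REF_HZ) for f in keys_hz]
intervals = [a - b for a, b in zip(keys_st, keys_st[1:])]
mean_interval = sum(intervals) / len(intervals)
# mean_interval is about 2.35 st: roughly equal steps, consistent with
# the 2.4 st mean reported in the text
```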
Consequently, we hypothesize that there is a relationship between the linguistic scaling in Dagara-Wulé, as manifested in downstep sequences, and the musical scaling in the same culture, as found in an eighteen-key xylophone. Both seem to share a common basis: (roughly) equal steps in terms of semitones. However, this common point might be coincidental, as downstep is widespread in African languages and there is a large variety of tunings found in African music.
5. Conclusion
In the first part of this chapter, we examined the realization of downsteps in
Dagara-Wulé by five speakers and showed rather equal intervals between
them when they are expressed in semitones. In the second part, it was shown
that in a repetition task, the productions of the speakers were quite parallel
within a musical scale (with tones, semitones, cents).
These two sets of data converge towards a hypothesis: There is a linguistic
pitch scaling in a language such as Dagara (with two tones and downstep)
based on musical-type intervals (defined by a ratio between two frequencies).
This pitch scaling emerges in these two phenomena but does not determine
the whole tone realization. Thus, an equal step-based downstep co-exists
Table 2. Intervals between the 17 xylophone keys, as transcribed by a French
musician (line 1) and measured with PRAAT (line 2). The unit is the
semitone.

Intervals between keys   1/2   2/3   3/4   4/5   5/6   6/7   7/8   8/9
(1) Notation             2     2     2.5   2.5   2.5   2.5   2.5   2.5
(2) Measur.              1.8   2     2.4   2.7   2.4   2.6   2.1   2.7

Intervals between keys   9/10  10/11 11/12 12/13 13/14 14/15 15/16 16/17
(1) Notation             2.5   2     2.5   2.5   ?     ?     2     3
(2) Measur.              1.8   1.8   2.6   2.4   ?     ?     2.6   3.2
with an asymptotic downdrift, an oscillating configuration which might be
triggered by constraints on the production of alternating L and H tones.
The Dagara culture is also well known for its xylophone music.
Considering the scaling of the eighteen-key xylophone of Penou-Achille
Somé's family, we hypothesise that downstep scaling and xylophone scaling
might have a common point: (roughly) equal steps, in terms of semitones.
This study is quite tentative, being based on the scaling of one instrument,
and could only be considered a first step in the investigation of relationships
between linguistic and musical scalings in this culture. Analyzing Dagara
pentatonic xylophone music in its various components as well as speech
transposition on the xylophone would shed light on the role of intervals and
reference lines in music and in speech.
References

Bird, Steven
	1994	Automated tone transcription. In: S. Bird (ed.), Proceedings of the First Meeting of the ACL Special Interest Group in Computational Phonology. Las Cruces, NM: ACL.
Boersma, Paul, and David Weenink
	2010	PRAAT: doing phonetics by computer. http://www.fon.hum.uva.nl/praat
Dolphyne, Florence A.
	1994	A phonetic and phonological study of downdrift and downstep in Akan. Ms.
van Heuven, Vincent J.
	2004	Planning in speech melody: production and perception of downstep in Dutch. In: H. Quené and V. J. van Heuven (eds.), On Speech and Language: Studies for Sieb G. Nooteboom, 83–93. LOT Occasional Series. Utrecht: Utrecht University.
Hogan, John T., and Morie Manyeh
	1986	Study of Kono tone spacing. Phonetica 53: 221–229.
Laniran, Yetunde O.
	1992	Intonation in tone languages: the phonetic implementation of tones in Yoruba. Ph.D. diss., Cornell University.
Laniran, Yetunde O., and George N. Clements
	2003	Downstep and high raising: interacting factors in Yoruba tone production. Journal of Phonetics 31: 203–250.
Liberman, Mark, J. Michael Shultz, Soonhyun Hong, and Vincent Okeke
	1993	The phonetic interpretation of tone in Igbo. Phonetica 50: 147–160.
Liberman, Mark, and Janet Pierrehumbert
	1984	Intonational invariance under changes in pitch range and length. In: M. Aronoff and R. T. Oehrle (eds.), Language and Sound Structure, 157–233. Cambridge, MA: MIT Press.
Marslen-Wilson, William
	1973	Linguistic structure and speech shadowing at very short latencies. Nature 244: 522–523.
Mitterer, Holger, and Mirjam Ernestus
	2008	The link between perception and production is phonological and abstract: Evidence from the shadowing task. Cognition 109: 163–173.
Myers, Scott
	1996	Boundary tones and the phonetic implementation of tone in Chichewa. Studies in African Linguistics 25: 29–60.
Nolan, Francis
	2003	An experimental evaluation of pitch scales. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 771–774.
Pierrehumbert, Janet B., and Mary E. Beckman
	1988	Japanese Tone Structure. Linguistic Inquiry Monographs 15. Cambridge, MA: MIT Press.
Prieto, Pilar, Chilin Shih, and Holly Nibert
	1996	Pitch downtrend in Spanish. Journal of Phonetics 24: 445–473.
Rialland, Annie, and Penou-Achille Somé
	2000	Dagara downstep: how speakers get started. In: V. Carstens and F. Parkinson (eds.), Advances in African Linguistics, 251–262. Trends in African Linguistics 4. Trenton, NJ: Africa World Press.
Somé, Penou-Achille
	1982	Systématique du signifiant en Dagara: variété wulé. Paris: L'Harmattan-ACCT.
Somé, Penou-Achille
	1997	Influence des consonnes sur les tons du Dagara: langue voltaïque du Burkina Faso. Studies in African Linguistics 27 (1): 3–47.
Stewart, John M.
	1965	The typology of the Twi tone system. Preprint of the Bulletin of the Institute of African Studies 1, Legon.
Traunmüller, Hartmut
	2005	Auditory scales of frequency representations. http://www.ling.su.se/staff/hartmut/bark.htm
Traunmüller, Hartmut, and Anders Eriksson
	1995	The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. Journal of the Acoustical Society of America 97 (3): 1905–1915.
2. The representation and nature of
phonological features
Crossing the quantal boundaries of features:
Subglottal resonances and Swabian diphthongs

Grzegorz Dogil, Steven M. Lulich,
Andreas Madsack, and Wolfgang Wokurek¹

1. Introduction
In phonology, as it has been laid out since Trubetzkoy (1939/1969), distinctive
features organize natural classes of sounds. Classes of sounds are considered
natural if their members function together in phonological rules and sound
laws across languages. These functional criteria of defining and choosing
a set of distinctive features prevailed into generative models of phonology
(Chomsky and Halle 1968), although they have been substantially enriched
by formal considerations such as feature hierarchy (Clements and Hume
1995) and feature economy (Clements 2003). The criteria of phonetic and
physiological naturalness played a minor role in the systems of distinctive
features, with the exception of considerations of auditory distinctiveness
(Liljencrants and Lindblom 1972; Flemming 2005).
The acoustic theory of speech production at its outset (Fant 1960)
defined a universal set of features which allowed an unconstrained set of
speech sounds from which languages were supposed to select a subset as
their distinctive oppositions (Jakobson, Fant and Halle 1952). Further
research on distinctive features within the acoustic theory of speech
production has led to the seminal discovery of the quantal theory of speech
(Stevens 1972, 1989, 1998). In his argument, at first hermetic but by now a
textbook proof, Stevens has shown that an acoustically motivated set of distinctive
features is universally constrained by a set of nonlinear articulation-to-
acoustic mappings characteristic of the human speech production apparatus.
Stevens' quantal model proved that equal movements of the articulators
do not lead to equal movements in the acoustic parameters of speech. On
the contrary, he discovered that some small articulator movements lead
to large acoustic changes, and, in other areas of articulatory space, large
movements lead to small variation in the acoustic parameters. Following
Lulich (2010), we will call the regions in which a small articulatory
change leads to a large acoustic change 'boundaries', and the areas
in which there is little acoustic change in spite of large articulatory
movements 'states'. The boundary and its two flanking states form the
basis of the definition of any distinctive feature within the quantal theory
(Lulich 2010; Stevens and Keyser 2010). Moreover, the speech production
system is constrained by the avoidance of boundary areas, because of the
great acoustic instability caused by the movement of articulators across these
areas.
One set of natural, physiologically motivated boundaries and states is
defined by the subglottal cavities (the trachea and the main bronchi). The
subglottal airway, just like the vocal tract, has its own natural resonant
frequencies. Unlike the vocal tract, the subglottal airway does not have
articulators. Hence, subglottal resonances are roughly constant for each
speaker and do not vary much within and across utterances of a single
speaker. As such they are ideal as a set of boundaries by which a distinctive
feature can be defined. Stevens (1998: 299–303) has shown that subglottal
resonances (labeled Sg1, Sg2, etc. hereafter), when coupled with the
supraglottal resonating system (giving rise to formants F1, F2, etc.), lead to
formant discontinuities in strictly defined narrow-band frequency regions.²
The discontinuities are not only spectrally visible (Stevens 1998; Stevens and
Keyser 2010) but they also affect the perception of vowels and diphthongs
(Lulich, Bachrach and Malyska 2007). The narrow-band regions of acoustic
instability defined by subglottal resonances are an ideal candidate for what
quantal theory considers a boundary between +/− values of distinctive
features. Indeed, convincing evidence, particularly for the feature [back],
has been provided for English (Chi and Sonderegger 2004; Lulich 2010).
In this paper we will provide additional evidence for the boundary
character of subglottal resonances in German, a language with a particularly
crowded vowel space. Moreover, we will show that this boundary is used
to distinguish two types of otherwise indistinguishable diphthongs in a
Swabian dialect of German.
2. Subglottal resonances
Recent studies have shown that subglottal resonances can cause discontinuities
in formant trajectories (Chi and Sonderegger 2007), are salient in speech
perception (Lulich, Bachrach and Malyska 2007), and are useful in speaker
normalization (Wang, Lulich and Alwan 2010), suggesting that variability
in the spectral characteristics of speech is constrained in ways not previously
noticed.
Specifically, it is argued that (1) for the same sound produced in different
contexts or at different times, formants are free to vary, but only within
frequency bands that are defined by the subglottal resonances; and (2) for
sounds which differ by certain (place of articulation) distinctive features,
certain formants must be in different frequency bands. For instance, given
several productions of a front vowel, the second formant (F2) is
free to vary only within the band between the second and third subglottal
resonances (Sg2 and Sg3), but in the back vowel [a] F2 must lie between the
first and second subglottal resonances (Sg1 and Sg2). The feature [+/−back]
is therefore thought to be defined by whether F2 is below Sg2 ([+back]) or
above it ([−back]).
An analysis of formant values in multiple languages published in the
literature has shown that the second subglottal resonance (Sg2) lies at the
boundary between [−back] and [+back] vowels (cf. Sonderegger 2004 and Chi
and Sonderegger 2004 for the analysis of 53 languages). Moreover,
individual formant values tend to avoid the individual subglottal resonance
boundaries. Individual adult speakers of English (Chi and Sonderegger
2004) and Hungarian (Csapó et al. 2009) tended to produce [−back] vowels
with F2 higher than Sg2 (F2 > Sg2), and [+back] vowels with F2 lower than
Sg2 (F2 < Sg2).
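The quantal criterion amounts to a one-line decision rule on F2 relative to the speaker's Sg2. A hypothetical Python sketch (the frequencies are invented for illustration; real use would require a per-speaker Sg2 estimate, e.g. from an accelerometer signal):

```python
def backness(f2_hz, sg2_hz):
    """Subglottal criterion for the feature [back]: a vowel token is
    [+back] when F2 lies below the speaker's second subglottal
    resonance, and [-back] when F2 lies above it."""
    return "[+back]" if f2_hz < sg2_hz else "[-back]"

SG2 = 1400.0  # hypothetical speaker's Sg2 in Hz
print(backness(1900.0, SG2))  # front-vowel token -> [-back]
print(backness(1100.0, SG2))  # back-vowel token  -> [+back]
```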
Traditionally, the [+back]/[−back] distinction is based on the concept
of center of gravity (COG) developed in the classical experimental phonetic
literature (cf. Chistovich 1985 for relevant experiments on speech perception,
and Syrdal and Gopal 1986 for experiments on speech production). These
studies define front and back vowels by the distance between the spectral peaks
of the second and third formants. If the distance between F2 and F3 is smaller
than 3.5 bark, the vowels are perceived as front; if the distance is greater,
the vowels are perceived as [+back]. Although the concept of center of
gravity and its operationalization in the formula F3 − 3.5 bark has proven useful in
experimental, laboratory settings, it has underperformed in the task of speaker
normalization (Adank, Smits and van Hout 2004).
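For comparison, the center-of-gravity criterion can be sketched in the same style. The Hz-to-Bark conversion below uses Traunmüller's approximation; the 3.5 bark threshold is the one cited in the text, and the token values are invented:

```python
def hz_to_bark(f_hz):
    """Traunmüller's approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def cog_backness(f2_hz, f3_hz):
    """Center-of-gravity criterion: front when F2 and F3 are less than
    3.5 bark apart, back otherwise."""
    distance = hz_to_bark(f3_hz) - hz_to_bark(f2_hz)
    return "[-back]" if distance < 3.5 else "[+back]"

print(cog_backness(2200.0, 3000.0))  # F2 close to F3  -> [-back]
print(cog_backness(800.0, 2400.0))   # wide separation -> [+back]
```

Unlike the subglottal criterion, this rule needs two formant values per token but no speaker-specific resonance estimate.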
Lulich (2010) tested both the Sg2 and the F3 − 3.5 bark boundaries on data
from adult English speakers (n = 14), English-speaking children (n = 9), and
the running speech of a single adult speaker of American English. The Sg2
boundary outperformed the traditional F3 − 3.5 bark boundary as a predictor
of the [−back]/[+back] distinction in the child data and proved functional in
running speech. The Sg2 boundary hypothesis has also proven equally predictive
as the center of gravity hypothesis on its own ground, i.e. vowel perception.
Lulich, Bachrach and Malyska (2007) not only ruled out the F3 − 3.5 bark and
COG hypotheses as explanations of their perceptual results but provided
evidence that the discontinuity in F2 caused by Sg2 affects the perception of
front and back vowels.
3. Subglottal resonances in German and Swabian
We present new evidence that the same relation of F2 and Sg2 applies to both
High German and Swabian German monophthongs, and suggest that the F2
discontinuity caused by the Sg2 boundary can account for the contrast between
the Swabian German diphthongs [aj] and [ej].
Standard German has a particularly rich vowel system (Mangold 2000).
No less than 15 vowels are considered phonemic. The space of front vowels
is particularly crowded: 10 vowels are classified as [−back] in German (they
contain rounded/unrounded as well as tense/lax oppositions).
The Swabian dialects of German are spoken in the southern region of
Baden-Württemberg, in a large area around Stuttgart. The dialects possess
strong lexical and grammatical characteristics, but are immediately distinguishable
from other German dialects because of their pronunciation. The
most characteristic features that distinguish Swabian from other varieties
of German are the loss of aspiration in stops (Swabian voiceless stops are
perceived as voiced by Germans speaking northern dialects), the loss of
the voiceless alveolar fricative (which has been replaced by the voiceless
postalveolar fricative), the distinctive tonal alignment (Kügler 2007), and
strong diphthongization of Standard German monophthongs.
The Swabian diphthongs [aj] and [ej] in particular are hard for speakers
of Standard German. Swabians use these diphthongs in minimal pairs which
are homophones in Standard German. For example, weiß 'white' and weiß 'I
know', Weide 'willow' and Weide 'pasture', and die Taube 'pigeon' and der Taube
'deaf person' are homophones in Standard German but are distinguished
by the critical diphthongs in Swabian. Interestingly, speakers of Standard
dialects often do not perceive this distinction at all. Moreover, as most
Swabian speakers state (with some pride), speakers of Standard
German and other German dialects are unable to produce this distinction: it
simply never sounds right when produced by a stranger. Although Swabian
is difficult to understand for speakers of Standard German, it is spoken with
pride by the majority of the local population.³
3.1. Subjects and material
We made recordings of twelve native speakers of German, including eight
Swabians, and four fluent non-native speakers (whose native languages were
Russian (1), Georgian (2), and Turkish (1)). The speakers read sentences
which included both the [aj] and [ej] diphthongs (non-Swabians produced
them without a distinction), as well as sentences containing nonsense words
designed to elicit monophthongs in a neutral phonetic environment. In the
latter case, the carrier sentence was Peter hat hVd gesagt ('Peter said hVd'),
where the vowel was any of the Standard German monophthongs.
3.2. Recording procedure
The recordings were made at the Institute for Natural Language Processing
(IMS) at Universität Stuttgart in a room that is reasonably anechoic above 200
Hz and has a reverberation time of about 27 milliseconds. Eight simultaneous
recording channels are available. We collected data using a microphone
(AKG K62ULS, 1 recording channel), a standard electroglottograph
(Glottal Enterprises, 3 recording channels), and an acceleration sensor
(4 recording channels).
The acceleration sensors used to sample subglottal resonances were
developed at IMS (Wokurek and Madsack 2009). The sensor which is
presently used is a cube of 1 cm edge length containing two micromechanical
acceleration sensors that record the acceleration along all three axes. The axis
recording the main movement that is perpendicular to the skin of the neck is
converted by both sensor devices and added during the digital measurement
processing to improve the signal-to-noise ratio.
At the side of the cube that is pressed to the neck, a 5 mm nose made of
hot glue is mounted to improve contact and to maintain this contact while
the larynx moves up and down during speaking. The opposite side of the
cube is glued to a balloon inflated to a diameter of about 6 cm. The balloon
is a good solution to provide a steady gentle force towards the neck and still
allow the sensor to follow the movements of the ligamentum conicum. The
ligamentum conicum is the critical spot below which the subglottal resonances
can be registered (Cheyne 2002). It can be found by touching the larynx and
searching for a small gap in the otherwise hard larynx structure. The plastic
nose of the sensor is pressed gently to the neck where the gap was found. The
balloon is held laterally by two or three fingers during the recording.
The mass of the nose, the sensor cube and the connecting cable is about
1.5 grams. This elastically mounted mass shows its own resonances.
The resonances depend on the balloon and the amount it is inflated. For
our sensor they are located in the range between 100 Hz and 200 Hz. This
is well below the range of the subglottal resonances (500–700 Hz for
Sg1 and 1300–1600 Hz for Sg2) and is filtered out together with the
fundamental frequency components by a digital high-pass filter during the
recording.
The signal from the sensor is recorded at 48 kHz and downsampled to 8 kHz,
since all frequency measurement results are located below 4 kHz. A
bandpass filter with a pass band between 250 Hz and 2 kHz is applied to
the subglottal signal in order to remove all signal components above and
below the expected range of the first and second subglottal resonances. The
manual measurements of the first two subglottal resonances are made with
WaveSurfer, using the spectrogram, the short-time Fourier transform, and
linear prediction (Wokurek and Madsack 2009).
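The signal chain described here (48 kHz capture, reduction to 8 kHz, 250 Hz–2 kHz band-pass) can be approximated in a few lines with scipy. This is only a sketch: the filter type and order are our choices, not a description of the authors' exact implementation.

```python
import numpy as np
from scipy import signal

FS_IN, FS_OUT = 48_000, 8_000

def preprocess_subglottal(x):
    """Downsample an accelerometer signal from 48 kHz to 8 kHz and
    band-pass it to 250 Hz - 2 kHz, the expected range of Sg1 and Sg2."""
    x = signal.decimate(x, FS_IN // FS_OUT)  # anti-aliased 6:1 reduction
    sos = signal.butter(4, [250.0, 2000.0], btype="bandpass",
                        fs=FS_OUT, output="sos")
    return signal.sosfiltfilt(sos, x)

# Synthetic check: a 600 Hz (Sg1-like) component survives, while a
# 100 Hz sensor-resonance component is strongly attenuated.
t = np.arange(0, 1.0, 1.0 / FS_IN)
x = np.sin(2 * np.pi * 600 * t) + np.sin(2 * np.pi * 100 * t)
y = preprocess_subglottal(x)
```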
A standard procedure based on linear prediction was used to estimate the first
two formants of the microphone signal. The formant tracks from WaveSurfer
were used in the manual measurements, and the formant program from ESPS
was used in automatic formant estimation.
4. Results and discussion
Front vowels, including the rounded front vowels, had F2 > Sg2, and back
vowels had F2 < Sg2. There was some variability with regard to the low back
vowel [a], for which the relative frequencies of F2 and Sg2 were dependent
on the speaker. Hence, the Sg2 quantal boundary appears to be a good
predictor of the feature [back] even in a language with a very crowded vowel
space. The low vowel [a] generally had F1 > Sg1, whereas all other vowels
had F1 < Sg1. This is congruent with the hypothesis in Stevens (1998) that the
feature [+/− low] is defined by a relation between Sg1 and F1 (see Table 1).
The notorious Swabian diphthongs [aj] (e.g. Weide, 'willow') and [ej]
(e.g. Weide, 'pasture') are an interesting case because no consistent spectral
distinction between them has been found, and speakers of non-Swabian
German dialects do not consistently perceive a difference between them.
They are also unable to produce the difference between these diphthongs,
even when a lot of instruction, talent and diligence is put into the task.
It has been suggested that different temporal patterns of F2 movement
during the diphthongs underlie the contrast (Geumann 1997; Hiller 2003).
Table 1. Percentage of monophthong tokens obeying the subglottal hypothesis. The
vowel [a] was considered a back vowel, although for some speakers it is
consistently front according to the Sg2 criterion.

                            Swabian   Non-Swabian German   Non-German
Front vowels    F2 > Sg2    96.88%    96.88%               97.92%
Back vowels     F2 < Sg2    62.50%    70.83%               61.11%
Low vowels      F1 > Sg1    72.92%    87.50%               83.33%
Non-low vowels  F1 < Sg1    90.63%    97.92%               98.61%
Figure 1. Spectrograms of the minimal pairs A) Weide ([vajde], 'willow') and B)
Weide ([vejde], 'pasture'), and C) Saite ([sajte], 'string') and D) Seite
([sejte], 'page') from one Swabian speaker. The horizontal lines mark the
approximate frequency of Sg2, and arrows point out where F2 is below
Sg2 in the [aj] diphthongs.
We conducted a new analysis of the original speech data reported by
Geumann (1997), as well as an analysis of tokens produced by our High
German and Swabian German subjects. The High German speakers showed
no systematic pattern to distinguish the two diphthongs, as expected. Of the
six speakers of Swabian in Geumann (1997), there was a consistent spectral
difference between the two diphthongs in all but one case. F2 in [aj] either
began below Sg2 before crossing into higher frequencies, or it dipped below
Sg2 briefly before crossing again into the higher frequencies (the latter case
occurred when a consonant with a high F2 locus preceded the diphthong).
In [ej], on the other hand, F2 began above Sg2 in almost all cases and rarely
dipped below it (see Figures 1–3).
The results suggest that for Swabian German speakers who distinguish
the [aj] and [ej] diphthongs, this contrast is accomplished by a spectral
cue, namely, the frequency of F2 relative to Sg2 at or near the beginning
of the diphthong. Apparently, the speakers produce the distinction by using their
subconscious knowledge of the placement of the quantal boundary in speech.
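The cue just described can be phrased as a simple test on the F2 track. A hypothetical Python sketch (the track values, Sg2 value, and onset window are invented for illustration; real tracks would come from a formant tracker):

```python
def is_aj_diphthong(f2_track_hz, sg2_hz, onset_fraction=0.4):
    """Swabian [aj] vs [ej] by the subglottal criterion: a token counts
    as [aj] if F2 falls below Sg2 at or near the diphthong's beginning."""
    onset_len = max(1, int(len(f2_track_hz) * onset_fraction))
    return min(f2_track_hz[:onset_len]) < sg2_hz

sg2 = 1400.0  # hypothetical speaker's Sg2
aj_track = [1250.0, 1300.0, 1500.0, 1800.0, 2000.0]  # starts below Sg2
ej_track = [1550.0, 1650.0, 1800.0, 1950.0, 2100.0]  # stays above Sg2
print(is_aj_diphthong(aj_track, sg2), is_aj_diphthong(ej_track, sg2))  # True False
```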
Figure 2. Spectrograms of the minimal pairs (A) weiß ([vajs], 'to know') and
(B) weiß ([vejs], 'white'), and (C) reißt ([rajst], 'tear up') and (D) reißt
([rejst], 'to travel') from one Swabian speaker. The horizontal lines mark
the approximate frequency of Sg2, and arrows point out where F2 is
below Sg2 in the [aj] diphthongs.
Figure 3. Spectrograms of the minimal pairs A) Steig ([stajk], 'steep slope') and
B) steig ([stejk], 'climb'), and C) Laib ([lajp], 'Eucharist') and D) Leib
([lejp], 'body') from one Swabian speaker. The horizontal lines mark the
approximate frequency of Sg2, and arrows point out where F2 is below
Sg2 in the [aj] diphthongs.
They arrange their formant movements so that their individual subglottal
resonance region is crossed in the case of one diphthong but not the other.
Given our present knowledge of the production/perception loop which has
to be mastered by every speaker in the speech learning process (Guenther
and Perkell 2004; Dogil 2010) the skill of controlling the relation between
subglottal and the supraglottal resonances may appear simple. However,
learning this skill at an adult age through an auditorily-controlled imitation
process appears impossible.
Previous findings that the difference between the diphthongs is temporal
(Geumann 1997; Hiller 2003) must be revisited in light of our results. It
may be that the temporal differences are due to the interaction between F2
and Sg2, or that both spectral and temporal cues independently contribute to
this contrast. In sum, these data support the hypothesis that speech spectral
variability is constrained by subglottal articulatory-acoustic non-linearities
which underlie distinctive features.
Acknowledgments
This work is part of SFB 732 and was partly funded by the German Research
Foundation (DFG).
Notes
1. Names of the authors appear in alphabetical order. For correspondence contact
grzegorz.dogil@ims.uni-stuttgart.de
2. The lower airway acoustics and its coupling to the vocal tract have been explored
in Cranen and Boves (1987), Stevens (1998) and most recently in Stevens and
Keyser (2010: 2.3.1) and in Lulich (2010: 2.2).
3. The Baden-Württemberg chamber of commerce was highly praised for its
   advertising campaign with the slogan 'Wir können alles. Außer Hochdeutsch.',
   which means 'We can do anything. Except speak Standard German.'
   The campaign boosted Swabian pride in their dialect and industrial
   achievements.
References

Adank, Patti, Roel Smits, and Roeland van Hout
	2004	A comparison of vowel normalization procedures for language variation research. Journal of the Acoustical Society of America 116: 3099–3107.
Cheyne, H. A.
	2002	Estimating glottal voicing source characteristics by measuring and modeling the acceleration of the skin on the neck. Ph.D. diss., Massachusetts Institute of Technology.
Chi, Xuemin, and Morgan Sonderegger
	2007	Subglottal coupling and its influence on vowel formants. Journal of the Acoustical Society of America 122: 1735–1745.
Chistovich, L. A.
	1985	Central auditory processing of peripheral vowel spectra. Journal of the Acoustical Society of America 77: 789–805.
Chomsky, Noam, and Morris Halle
	1968	The Sound Pattern of English. New York: Harper and Row.
Clements, George N.
	2003	Feature economy in sound systems. Phonology 20: 287–333.
Clements, George N., and Elizabeth Hume
	1995	The internal organization of speech sounds. In: John Goldsmith (ed.), Handbook of Phonological Theory, 245–306. Oxford: Basil Blackwell.
Cranen, B., and L. Boves
	1987	On subglottal formant analysis. Journal of the Acoustical Society of America 81: 734–746.
Csapó, Tamás Gábor, Zsuzsanna Bárkányi, Tekla Etelka Gráczi, Tamás Bőhm, and Steven M. Lulich
	2009	Relation of formants and subglottal resonances in Hungarian vowels. In: Maria Uther, Roger Moore, and Stephen Cox (eds.), Proceedings of Interspeech, 484–487. Brighton: Causal Productions Pty Ltd.
Dogil, Grzegorz
	2010	Hard wired phonology. In: Cécile Fougeron and Barbara Kühnert (eds.), LabPhon 10, 343–379. Berlin: Mouton de Gruyter.
Fant, Gunnar
	1960	Acoustic Theory of Speech Production. The Hague: Mouton.
Flemming, Edward
	2005	Speech perception and phonological contrast. In: D. Pisoni and R. Remez (eds.), The Handbook of Speech Perception, 156–181. Oxford: Blackwell.
Guenther, F. H., and J. S. Perkell
	2004	A neural model of speech production and its application to studies of the role of auditory feedback in speech. In: B. Maassen, R. Kent, H. Peters, P. Van Lieshout, and W. Hulstijn (eds.), Speech Motor Control in Normal and Disordered Speech, 29–49. Oxford: Oxford University Press.
Geumann, Anja
	1997	Formant trajectory dynamics in Swabian diphthongs. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München 35: 35–38.
Hiller, Markus
	2003	The diphthong dynamics distinction in Swabian. In: van de Weijer, van Heuven, and van der Hulst (eds.), The Phonological Spectrum. Amsterdam: John Benjamins.
Jakobson, Roman, Gunnar Fant, and Morris Halle
	1952	Preliminaries to Speech Analysis. Cambridge, MA: MIT Press.
Kügler, Frank
	2007	The Intonational Phonology of Swabian and Upper Saxon. Tübingen: Niemeyer.
Liljencrants, Johan, and Björn Lindblom
	1972	Numerical simulation of vowel quality systems: the role of perceptual contrast. Language 48: 839–862.
Lulich, Steven M., Asaf Bachrach, and Nicolas Malyska
	2007	A role for the second subglottal resonance in lexical access. Journal of the Acoustical Society of America 122: 2320–2327.
Lulich, Steven M.
	2010	Subglottal resonances and distinctive features. Journal of Phonetics 38: 20–32.
Mangold, Max, et al.
	2000	Duden Aussprachewörterbuch: Wörterbuch der deutschen Standardaussprache [German Pronouncing Dictionary]. Mannheim: Duden Verlag, Band 6.
Sonderegger, Morgan
	2004	Subglottal Coupling and Vowel Space: An Investigation in Quantal Theory. Undergraduate thesis, Massachusetts Institute of Technology.
Stevens, Kenneth N.
	1972	The quantal nature of speech: evidence from articulatory-acoustic data. In: E. E. David, Jr. and P. B. Denes (eds.), Human Communication: A Unified View, 51–66. New York: McGraw-Hill.
	1989	On the quantal nature of speech. Journal of Phonetics 17: 3–45.
	1998	Acoustic Phonetics. Cambridge, MA: MIT Press.
Stevens, Kenneth, and Samuel J. Keyser
	2010	Quantal theory, enhancement and overlap. Journal of Phonetics 38: 10–19.
Syrdal, Ann, and H. S. Gopal
	1986	A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America 79: 1086–1100.
Trubetzkoy, Nikolaus
	1939/1969	Principles of Phonology. Originally published in German (Grundzüge der Phonologie) as Travaux du Cercle Linguistique de Prague 7. Translated by Christiane Baltaxe. Berkeley and Los Angeles: University of California Press.
Wang, Shizhen, Steven M. Lulich, and Abeer Alwan
	2010	Automatic detection of the second subglottal resonance and its application to speaker normalization. Journal of the Acoustical Society of America 126 (6): 3268–3277.
Wokurek, Wolfgang, and Andreas Madsack
	2009	Comparison of manual and automated estimates of subglottal resonances. In: Maria Uther, Roger Moore, and Stephen Cox (eds.), Proceedings of Interspeech, 1671–1674. Brighton: Causal Productions Pty Ltd.
Voice assimilation in French obstruents:
Categorical or gradient?
Pierre A. Hallé and Martine Adda-Decker
Abstract. This work contributes to the issue of categoricity versus gradiency
in natural assimilations. We focused on voice assimilation in French and
started from the assumption that the main cue to obstruent voicing is glottal
pulsing. We quantified glottal pulsing continuously with a single acoustic
measure, the proportion in duration of voiced portion(s) within a consonant,
which we call the v-ratio. We used a large corpus of French radio and television
speech to compute v-ratios for all the obstruents appearing in word-final to
word-initial obstruent contacts. The results were analyzed in terms of v-ratio
distributions, which were compared with theoretical distributions predicted
by two contrasting hypotheses on the mechanisms of assimilation: categorical
switch versus v-ratio shift. The comparisons strongly suggested that, although
voicing itself can be incomplete, voice assimilation is essentially categorical
in terms of v-ratio. We discuss this result in light of recent perceptual data
showing sensitivity to extremely subtle acoustic differences: secondary cues
to voicing do not seem to follow the same pattern of categoricity as glottal
pulsing.
1. Introduction
In one of his often-cited papers, "The geometry of phonological features",
Nick Clements wrote: "there should be three common types of assimilation
processes in the world's languages: TOTAL assimilation processes in which the
spreading element A is a root node, PARTIAL assimilation processes in which
A is a class node, and SINGLE-FEATURE assimilation processes in which A is
a single feature" (Clements 1985: 231). An example of total assimilation
would be sweeb boy for sweet boy: the last phoneme in sweet is substituted
with the first in boy. But we know that place assimilation in English is only
partial by Clements' definition: sweet boy becomes sweep boy, that is, only
the place node spreads, not the entire root node. Common to both types of
assimilation, however, is that spreading of a node from a source to a target
segment (the assimilating and assimilated segments, respectively) is
accompanied by the delinking of a "stray" node from the target segment. This
part of the process is known as stray erasure.
On that view of assimilation, assimilated segments retain nothing of the
substituted node, whether assimilation is total or partial. At a phonetic level
of description, this might entail that, for instance, /p/ in sweep boy is realized
as a nonambiguous labial [p], with no traces of the underlying coronal place.
This view of categorical assimilation is illustrated in (1). In (1) and (2), level
1 corresponds to an abstract level of representation, before any contextual
rule has applied, whereas level 2 integrates the application of such rules. The
phonetic level describes the output phonetically.
          level 1                 level 2          phonetic level

(1)    X     X              X     X
       |     |              =  \  |
     [COR] [LAB]          [COR] [LAB]          labial   labial

(2)    X     X             *X     X
       |     |              |  \  |
     [COR] [LAB]          [COR] [LAB]          "mixed"  labial
(1) and (2) both assume that assimilation systematically happens when
immediate context conditions are met. But on a slightly more complex
elaboration of these descriptions, assimilation may or may not take place
based on immediate context alone: assimilation conditioning is more
complex. To make the issue simpler, however, we will assume that (1) and
(2) are further qualified in that assimilation may not always occur.
Let us go back to the sweet boy example. By the categorical account in (1),
assimilation, when it takes place, substitutes /p/ for /t/ at level 2, hence [p] to
[t] at the phonetic level. Because both the intended sound /p/ and its context
/b/ agree on place specication, no coarticulation for place is expected. When
assimilation does not take place, /t/ is left unchanged and place coarticulation
potentially affects phonetically the sounds in contact: [t] and [b]. Although
such coarticulations may produce phonetically intermediate sounds, the
categorical nature of assimilation in (1) is not challenged.
By a gradient, incomplete view of assimilation, assimilated sounds
are not so fully modied that they change category. They take on part
of the assimilating contexts phonetic characteristics yet retain part of
their original characteristics, just like when they are coarticulated. But
incomplete assimilation, if this refers to an intentionally planned, active
process, must entail something more than coarticulation, albeit something
less than complete, categorical assimilation. In our view, coarticulation
cannot be such an active process (but see Flemming 1997), and coarticulation
effects result from unintended biomechanical constraints.¹ In the sketch
of incomplete assimilation shown in (2), the non-categorical nature of the
process is therefore implemented at level 2. Since (2) lacks delinking
of the original specification, the process described is hardly acceptable
phonologically (hence the "*" mark): It violates the basic premise that
phonological representations be categorical. But how else can we formalize
the notion of an assimilation process that is intentionally incomplete?
Whatever the appropriate phonological formalism, the implication is clear at
the phonetic level: When a sound is assimilated, its phonetic characteristics
should always be intermediate between their original specification and that
of the assimilating sound.
In section 2, "Gow versus Gaskell", we review the claim that assimilations
occurring in natural speech are not categorical but gradient (in particular,
Gow 2001, 2002; Gow and Im 2004). This claim is in line with the account
of assimilation described in (2), whereas it is obviously contrary to that in (1).
This latter account is de facto ruled out in the gradient view of assimilation. The
claim that natural assimilations are incomplete has important consequences
for our understanding of how the human system of speech processing copes
with assimilation, that is, compensates for assimilation. Gow proposes a
compensation mechanism, which he calls "feature cue parsing", based on the
premise that assimilations are always incomplete. We review briey some
recent counter-evidence to that view.
In section 3, "Regressive voicing assimilation", we narrow in on voicing
assimilation, which is the focus of the present chapter. We review phonetic
evidence for incomplete assimilations and conclude that only distributional
data could convincingly settle the ongoing debate.
In section 4, "Corpus analyses", we present our own analyses of a large
corpus of French broadcast speech, using the rough but robust index of
voicedness that we call "voicing ratio" or "v-ratio". The distributional data
for v-ratio in assimilating contexts are compared with modeled distributions
based on (a) the distributional data in non-assimilating contexts, and (b) the
changes in distribution that assimilation imposes as predicted by categorical
versus gradient accounts of assimilation mechanisms. Our modeling data
suggest assimilation is more categorical than gradient in nature and that the
phonetic basis for [voice] is "fully voiced" versus "not fully voiced". But these
results must be tempered since we only looked at v-ratio. Although this index
seems to capture most of the variation determining the perception of [+voice]
vs. [−voice] in French, voicing is also cued by several other well documented
characteristics.
Some of these characteristics are addressed in section 5, "Subtle traces of
voicelessness". In this section, we present a brief survey of a recent study
(Snoeren, Segui, and Hallé 2008) suggesting that listeners can recover an
original [−voice] specification even though v-ratio alone would indicate
complete voicing. Traces of the original [−voice] seem to involve durational
patterns as well as, to a lesser extent, microprosodic variation and amplitude
of glottal pulsing during consonant closure.
We conclude that the picture of voice assimilation in French is more
complex than previously thought. Voice assimilation is largely categorical
with respect to v-ratio. Yet, the secondary cues to voicing do not seem to
undergo complete neutralization after voice assimilation. We discuss some
possible ways to integrate such dissociation in a phonological description.
2. Gow versus Gaskell
A few years ago, David Gow published a series of studies on the issue
of compensation for assimilation. Previous studies had defended two
different accounts of this perceptual device. On the regressive inference
view defended by Gaskell and Marslen-Wilson, the right phonetic context
licenses or not the recovery of the assimilated sounds underlying identity.
For example, leam in leam bacon can be understood as an instance of intended
lean, whereas leam in leam gammon cannot (Gaskell and Marslen-Wilson
1996). In cases of ambiguity created by place assimilation (e.g., a quick
rum/run picks you up), regressive inference does not help in recovering the
intended word (here, run or rum) (Gaskell and Marslen-Wilson 2001). On the
"underspecification" view (Lahiri and Reetz 2010), [CORONAL] is, as a rule,
left unspecified. Thus, /n/ in lean, for instance, would not be specified for
place, and neither leam nor lean would mismatch the lexical representation
of lean.
Gow noted that the studies cited above had used deliberate, complete
assimilations, although natural assimilations, that is, those produced in natural
speech, are often incomplete. His "feature cue parsing" proposal relies on the
assumption that natural assimilations are always incomplete. In potentially
ambiguous utterances such as right/ripe berries, the assimilated form of right
is neither [raip] nor [rait]: cues to [CORONAL] and to [LABIAL] coexist at the
contact between words and are assigned to right and berries, respectively.
Thus, by the feature cue parsing mechanism, ambiguity is avoided and at the
same time, the intended word right is recovered, even if it is realized closer
to [raip] than to [rait].
This elegant mechanism has been criticized, however, on several grounds.
One concern is that compensation for assimilation would function identically
across languages differing with respect to their specific phonological
process(es) of assimilation. This point is controversial: Gow found positive
evidence for compensation for assimilation with both Hungarian and American
listeners tested on Hungarian voice-assimilated utterances such as orosz
dinasztia 'Russian dynasty' (Gow and Im 2004); Darcy et al. found language-
specific compensations for assimilation with English and French listeners on
language-specific place (English) and voice (French) assimilations (Darcy
et al. 2009). Another concern is that feature cue parsing allows recovery of
the assimilated consonant only if it retained traces, however subtle, of its
underlying, original identity. Recent data from large corpora suggest that
complete assimilations do occur (Dilley and Pitt 2007). More critically,
Gaskell and Snoeren (2008) found that compensation for place assimilation
also applies to word forms with no possible traces of the recovered place.
For example, rum in a quick rum picks you up, intended as rum not run,
is interpreted as an instance of run more often than the same rum in a quick
rum does you good. Gaskell and Snoeren interpreted their results in terms of
learned statistical inference: listeners have experienced a significant number
of rum picks you up utterances in which unambiguously labial rum was a
strongly assimilated form of run, and therefore (mistakenly) interpret rum
as an instance of run rather than rum more often in a labial than alveolar
context. Whether or not this interpretation is correct, the feature cue parsing
mechanism proposed by Gow clearly fails to explain such occurrences of
counterproductive compensation.
Compensation for assimilation can thus rely on something other than
traces of the initially intended category. Moreover, are mechanisms
exclusively relying on such traces ever needed? At this point, it is necessary to
gauge the likelihood of encountering assimilations that leave traces of the
intended category, that is, gradient assimilations. To this end, we focus on
voicing assimilation.
3. Regressive voicing assimilation
There is little controversy about the phonetic substrate of voicing in French,
contrary to, for example, English. French voicing is mainly cued by the
presence/absence of glottal pulsing during the constriction of obstruent
consonants, be they stops or fricatives. That voicing assimilation in French
is regressive is also a matter of consensus. But its complete versus partial
nature is debated. In the opinion of Grammont, assimilation is not complete
when occurring between words, as in une robe courte 'a short dress': "la
cessation des vibrations glottales préparée pour le c commence dès le b, qui
devient une occlusive sourde tout en restant une douce" [the interruption
of glottal pulsing planned for the c already takes effect at the b, which becomes
a voiceless stop yet remains a soft [consonant]] (Grammont 1933: 186; also
see Fouché 1969). In other words, voiceless and voiced stops are characterized
by a soft versus strong quality (i.e., lenis versus fortis) in addition to the
presence versus absence of glottal pulsing. The phonetic transcription [b̥] would
capture rather well the idea that /b/ retains its soft quality before a voiceless
obstruent. (And likewise, [p̬] would retain the hard quality of /p/ when voice-
assimilated before a voiced obstruent.) Grammont also notes that within-word
assimilation is always complete, provided it does not follow schwa deletion,
as in médecin pronounced (in free variation) [medəsɛ̃], [metsɛ̃] or [mɛtsɛ̃]. For
example, obtenir 'to obtain' is pronounced optenir "avec un p fort et non avec
un b sourd" [with a strong p, not a soft voiceless b]. Grammont explains this
is because the phonetic context of b in obtenir cannot change, contrary to that
of d in médecin. Put differently, the phonological form of the word obtenir is
fixed at the surface level of representation: It is [ɔptənir].²
Grammont's account of voicing assimilation was left unquestioned until
an aerodynamic and phonetic study by Rigault (1967). Rigault convincingly
showed that voicing assimilation was complete in both within-word and
between-word assimilation situations. For example, he found that d in either
médecin [metsɛ̃] 'medical doctor' or guide savant [gitsavɑ̃] 'learned guide'
was pronounced [t], with the same acoustic and aerodynamic characteristics
as t in fuite secrète 'secret escape', and in stark contrast with d pronounced
[d] in a voiced context (e.g., guide zoulou 'Zulu guide'). Rigault concluded
that the soft vs. strong distinction proposed by Grammont and elaborated
by Fouché has no phonetic basis and moreover is questionable in terms of
representational economy (Martinet 1955).
What can graded or partial assimilation mean at a phonetic level of
description? For place assimilation, it clearly means that acoustic cues in the
assimilated sound are intermediate between the initially intended sound and
its assimilating context. For example, vowel-to-consonant formant transitions
in assimilated right compared to plain (unassimilated) ripe and right would
point to a stop intermediate between alveolar and labial (cf. Gow, 2002). For
voice assimilation, we may expect that graded changes in voicing be signaled
not only by intermediate cue values but also by incomplete change of the
temporal extension of these cues. This is what Jansen (2004; also see Jansen
and Toft 2002) reported finding in Hungarian regressive voice assimilation. He
found that voicing duration in plosives and fricatives was the main acoustic
parameter sensitive to assimilation but did not undergo complete changes that
would neutralize the voicing contrast. Although the complete picture Jansen
reported was complex, voicing duration (measured as the duration of glottal
pulsing within the obstruents under scrutiny) varied substantially for most
obstruents between a no-assimilation baseline condition and an assimilation
condition. For example, voicing duration for /k/ changed from 27 ms in a
voiceless context (e.g., vak találkozott) to 53 ms in a voiced context (e.g.,
vak darabolta). Conversely, voicing duration for /g/ changed from 70 ms in
a voiced context (e.g., vég dominál) to 30 ms in a voiceless context (e.g., vég
távolodik). On average, assimilation did not lead to complete neutralization
with respect to voicing duration. In particular, voice-assimilated /k/ did not
change to a plain [g]. Interestingly, other cues to voicing, such as duration
patterns (closure, release, preceding vowel) were less consistently sensitive
to regressive assimilation. Jansen concluded that Hungarian regressive voice
assimilation is governed by different processes at the same time: phonetic
(coarticulation) and phonological (categorical switch) processes. Gow and
Im (2004) reported similar phonetic data on Hungarian voice assimilation in
fricatives, using the same measurements as Jansen and Toft (2002). They found
no assimilation for word-final voiced fricatives such as /z/ and incomplete
assimilation for voiceless fricatives such as /s/ in orosz 'Russian'. Note that,
instead of voicing durations, Gow and Im reported latencies of glottal pulsing
onset, which they called "VOT". This implicitly entails that they consistently found
glottal pulsing in the right portion of the fricatives and absence of glottal
pulsing in the left portion, contrary to the findings of Barry and Teifour (1999)
on voice assimilation in Syrian Arabic. Our own data, reported in section 4.3,
are in line with Barry and Teifour's. In Gow and Im's data, the mean VOT
for voiceless fricatives in a voiced context is much shorter than for plain
voiceless fricatives but not as short as for plain voiced fricatives, suggesting
incomplete assimilation. But behind the mean values suggesting incomplete
assimilation may lie a qualitatively different reality: complete assimilation in
most occurrences, as observed by Rigault in French, and no assimilation at all
in some other occurrences. That is, assimilation might actually be categorical,
not gradient. To resolve the categorical vs. gradient issue, distributional data,
not just means, are therefore necessary. We expand on this point in the next
section. For the moment, we simply note that categorical assimilation may
appear as incomplete on average because it does not occur all the time but
nevertheless is complete when it does occur.
One attempt at examining distributional data was made in Snoeren, Hallé,
and Segui (2006), who measured assimilation degrees for voice-assimilated
stops in French. They argued that voice assimilation in French is gradient
because about 45% of the assimilations they observed fell in the [20%, 80%]
intermediate range of assimilation degree values. Such a pattern indeed
suggests that assimilation might not be extreme in a substantial number of
cases. The statement is, however, problematic for methodological reasons.
For underlyingly voiceless stops, the assimilation degree in Snoeren, Hallé
and Segui's (2006) study is the percentage of voicing. This index has been
used in other studies for either stops or fricatives (e.g., Burton and Robblee
1997) and is equivalent, in percentage, to the v-ratio we use (Hallé and Adda-
Decker 2007). By this definition of assimilation degree, it is thus implicitly
assumed that the percentage of voicing is zero for plain, unassimilated
voiceless stops. But this latter assumption is rarely met: Voiceless obstruents
seldom lack glottal pulsing completely (as we show in section 4.1). A
no-assimilation baseline is therefore needed to estimate any assimilation
degree. Snoeren, Hallé and Segui (2006) did not measure such a baseline.
For the opposite direction of assimilation (devoicing), the lack of a baseline
is less problematic. Indeed, voiced stops in a voiced context are almost
always voiced throughout, that is, their percentage of voicing is very often
100%. The definition of the assimilation degree for underlyingly voiced
stops (100% minus percentage of voicing) is therefore rather plausible. But
interpreting incomplete voicing remains problematic: How is the percentage
of voicing distributed in plain, non-assimilated voiceless consonants? How
does it compare with the distribution observed for assimilated consonants?
Does the comparison suggest a continuous or a discrete assimilation process?
To answer these questions, we undertook a distributional analysis of v-ratios
on a large corpus of journalistic speech, comparing contexts licensing
assimilation to no-assimilation contexts.
4. Corpus analyses
The corpus data under scrutiny have been described elsewhere (Hallé and
Adda-Decker 2007). We first summarize the main aspects that have already
been reported, then present new distributional analyses.
The initial corpus consisted of over 100 hours of speech from radio
and television news with several levels of transcription aligned on the
speech signal (phonemic, lexical, and morphosyntactic). About 100,000
speech passages with word-final to word-initial obstruent contacts
(C1#C2) were extracted. A few examples with possible assimilation are
given in (3).

(3) avec des /avɛk#de/
    neuf décembre /nœf#desɑ̃br/
    trouvent que /truv#kə/
    quinze chars /kɛ̃z#ʃar/³
Voicing ratios (v-ratios) were computed for both C1 and C2 in all the
passages. V-ratio is simply the proportion of voicing, in duration, within an
obstruent (see above). We focus here on the v-ratios on C1. As we discuss
in section 5, the temporal extension of voicing within an obstruent is not
the only cue to voicing but it is a primary one (cf. Jansen 2004). Moreover,
v-ratio is relatively easy to measure in a large corpus, provided robust voiced-
voiceless decision and alignment procedures are used.⁴ Here, we relied on
the automatic alignment system developed at LIMSI laboratories (Adda-
Decker and Lamel 1999; Gauvain et al. 2005), and the F0 extraction cross-
correlation algorithm implemented in Praat (Boersma 2001): Glottal pulsing,
that is, voicing, was considered to occur wherever F0 could be computed.⁵ We
took the variation of v-ratio between the no-assimilation baseline situations
and the potential assimilation situations as a measure of assimilation. Note that
the control situations are all the more needed given that there is uncertainty
in the measurements, as is unavoidable with automated procedures. The
overall averaged results are shown in Table 1.
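The computation just described can be sketched in a few lines (a schematic pure-Python reimplementation for illustration; `v_ratio` and `assimilation_measure` are our own names, not part of the LIMSI alignment or Praat pipeline):

```python
def v_ratio(voiced):
    """Proportion of voicing, in duration, within one obstruent.

    voiced: per-frame voicing decisions over the segment (True wherever
    F0 could be computed).  With equal-duration analysis frames, the
    duration ratio reduces to a frame-count ratio.
    """
    return sum(voiced) / len(voiced)


def assimilation_measure(assim_vratios, baseline_vratios):
    """Variation of mean v-ratio between potential-assimilation contexts
    and the no-assimilation baseline contexts."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(assim_vratios) - mean(baseline_vratios)


# A /k/ closure voiced over the first 4 of its 10 frames
# (e.g., the voicing tail of a preceding vowel):
assert v_ratio([True] * 4 + [False] * 6) == 0.4
```

On the stop-stop figures of Table 1, for instance, the measure for voicing assimilation would be .74 minus .46, i.e. .28.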
Table 1. v-ratio for C1 in control vs. assimilating contexts in all possible C1#C2
contacts (UV and V for voiceless and voiced, respectively)

type of contact        voicing assimilation       devoicing assimilation
                       UV-UV   UV-V   diff.       V-V    V-UV   diff.
stop–stop              .46     .74    .28         .91    .79    .11
stop–fricative         .47     .74    .26         .89    .74    .15
fricative–stop         .22     .68    .46         .85    .36    .49
fricative–fricative    .35     .57    .22         .85    .54    .31
means                  .38     .68    .30         .88    .61    .27
4.1. Baseline v-ratios
As can be seen in Table 1, the v-ratio of voiceless obstruents in a voiceless
context is much larger than zero (between .22 and .47) and is larger for stops
than fricatives (.46 vs. .28). In the case of stops preceded by a vowel, the
closure portion overlaps with the voicing lag of the vowel. It usually begins
with a short voiced portion of decreasing amplitude, which corresponds to
the vowel offset. This makes the use of baseline voiceless-voiceless contacts
necessary for quantifying assimilation. In the case of fricatives, the v-ratio
observed might also reflect the voicing lag of a preceding vowel. But why is
it smaller than for stops? In order to fully understand this pattern, we need to
know where and how the glottal pulses occur within the closure portion of
stops or in the entire constriction of fricatives. We address this point in the
discussion of voicing patterns.
The largest v-ratios are found for voiced obstruents in a voiced context:
on average, .9 for stops and .85 for fricatives. The baseline v-ratio for V-V
is thus much closer to the theoretical extreme (1) than the UV-UV baseline
v-ratio is to the opposite extreme (0), in line with our previous comments.
4.2. v-ratios of assimilated obstruents
The change in v-ratio with respect to baseline when voiceless obstruents are
followed by voiced obstruents is the largest for fricative–stop contacts. On
average, it is about .27 for stops and .34 for fricatives. For voiced obstruents
followed by voiceless obstruents, the change in v-ratio is rather modest for
stops (.13 on average) and about three times larger for fricatives (.40 on
average), the largest, again, in the case of fricative–stop contacts (.49).
Overall, then, voicing and devoicing assimilations induce more or less
symmetrical changes in v-ratio for fricatives but not stops. For stops there is
much less devoicing than voicing, confirming the earlier findings in Snoeren,
Hallé and Segui's (2006) study, although that study did not use appropriate
baselines to estimate assimilation degrees.
The issue at stake in the present paper is whether assimilations are
complete or not. The data in Table 1 seem to provide a possible answer
to this question in that the v-ratios for assimilated C1s never reach those
for plain C1s. For example, whereas v-ratio for plain voiceless C1s is .38,
v-ratio for devoiced C1s is .61 (on average). Likewise, the v-ratios for plain
voiced C1s and voice-assimilated, phonologically voiceless C1s are .88 and
.68, respectively. Based on these averages, assimilations could be viewed
as incomplete. Yet, as we already noted, intermediate averages may hide an
all-or-none underlying reality: Assimilations may not occur all the time but
nevertheless be complete when they do occur. It is of course also possible
that assimilations, when they occur, are incomplete. Only distributional data
can help decide which account is the most plausible. We present such data
for v-ratio in sections 4.4 and 4.5 and argue that they support the former
rather than the latter account.
Before we look at v-ratio distributions, however, we need to resolve the
concern mentioned earlier as to what v-ratios tell us about voicing. Do changes
in v-ratio reflect changes in the temporal extension of voicing (changes in
the cumulated durations of glottal pulsing) or changes in the strength of
glottal pulsing? Because our v-ratio data were based on F0 (or, alternatively,
harmonicity) computation, they must be sensitive to signal amplitude. More
specifically, a 0.5 v-ratio, for example, may reflect that periodicity indeed
occurs on half of the segment under scrutiny, or that it occurs on the entire
segment but with such intensity fluctuations that periodicity is detected only
half of the time. In order to resolve this issue, we need to know where voicing
occurs in segments.
4.3. v-patterns: Where does voicing lie?
While v-ratio indicates that a segment is not voiced throughout its entire
duration, the question arises as to which portion or portions of the segment
have been found to be voiced. In particular, we need to track situations in
which voiced portions are scattered throughout the entire segment. Such
situations would likely correspond to weak voicing but with full time
extension rather than voicing with partial time extension. We considered
four configurations in addition to fully voiced and fully voiceless: a single
voiced portion located at the left edge, at the right edge, or in the middle of
the segment, or scattered voicing with possible contact at one or both edges.
We call these configurations voicing patterns or "v-patterns".
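These configurations can be recovered mechanically from the per-frame voicing decisions. A sketch (our own illustrative code, not the corpus tooling; the labels follow Table 2, with "middle voicing" kept distinct before it is pooled with "scattered voicing"):

```python
def v_pattern(voiced):
    """Classify a segment's per-frame voicing decisions into a v-pattern."""
    if not any(voiced):
        return "no voicing"
    if all(voiced):
        return "full voicing"
    # Collect maximal runs of voiced frames as (first, last) index pairs.
    runs, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(voiced) - 1))
    if len(runs) == 1:
        first, last = runs[0]
        if first == 0:
            return "left voicing"
        if last == len(voiced) - 1:
            return "right voicing"
        return "middle voicing"  # pooled under "scattered" in Table 2
    return "scattered voicing"
```

For example, `v_pattern([True, True, False, False])` returns "left voicing", the dominant pattern for voiceless obstruents in Table 2.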
Table 2 shows the frequency of occurrence of each v-pattern according
to C1 manner and to voicing contact, with the two last configurations
pooled together under "scattered voicing". The v-pattern data clearly show
that scattered voicing is rather infrequent. Most of the time, voicing lies on
the entire segment duration or on the left edge (about 83% of the time).
This suggests that decreases/increases in v-ratio are not due to an overall
weakening/strengthening of voicing (e.g., in terms of glottal pulsing
amplitude) but, rather, to a reduction/increase in the time extension of
voicing. Obstruents generally have a single voiced portion starting from
obstruent onset and extending to the right, possibly until obstruent offset.
The non-zero baseline v-ratios found for voiceless obstruents must therefore
measure preceding vowel voicing lag in most situations, for fricatives as
well as for stops. This resolves the concern expressed in 4.1. This finding is
in stark contrast with the "VOT" data reported by Gow and Im (2004) for
Hungarian fricatives, which seem to always begin with a voiceless portion
and end with a voiced portion.⁶

Do these patterns tell us something about the discrete nature of
assimilations? They suggest a partial exchange between the "fully voiced"
pattern on the one hand and the "left" and "no voicing" patterns on the
other: For voiceless fricatives, the "left" and "no voicing" populations
decrease uniformly, to the benefit of the "full voicing" population.
We examine this point in more detail in the next section.
4.4. v-ratio distributions: Changes induced by voice context
When voiceless obstruents are followed by voiced obstruents, their v-ratio
globally increases. But how does the v-ratio distribution change? Figure 1
shows the v-ratio distributions for the UV-UV (baseline) and UV-V
(assimilation) conditions.
A striking aspect of these distributions is that, within the [0, 0.9] range,
the v-ratio is not differently distributed for the baseline and assimilation
conditions (χ²(8) = 1.22, p = .996). This suggests that voicing assimilations
can be understood as a simple exchange between two categories, leaving
unchanged the internal organization of each category. The two categories
would be "fully voiced" (v-ratio > 0.9) on the one hand and "not fully
voiced" (v-ratio in the [0, 0.9] range) on the other.

Table 2. Percentages of v-patterns in C1 according to voicing contact for stops and
fricatives (UV and V for voiceless and voiced, respectively)

C1 manner   contact   no        left      full      right     scattered
                      voicing   voicing   voicing   voicing   voicing
stop        UV-UV     15.6      63.6      14.4      2.2       4.2
            UV-V       9.5      17.5      63.0      3.7       6.3
            V-V        1.9       5.3      88.5      1.6       2.7
            V-UV       8.5      29.2      58.9      1.2       2.2
fricative   UV-UV     24.6      64.1       3.6      2.6       5.1
            UV-V       7.9      27.6      52.0      3.1       9.5
            V-V        1.5      16.0      77.1      0.7       4.6
            V-UV      10.7      74.7      10.8      0.6       3.2

However, the close
similarity of the distributions within [0, 0.9] v-ratios between baseline and
assimilation conditions does not hold across the board. It is found with stops,
for both voicing and devoicing assimilation, and with fricatives for voicing
assimilation but not clearly for devoicing assimilation. In the latter situation,
v-ratios tend to decrease within the [0, 0.9] range. This suggests a different
process than a simple exchange between the two categories proposed above
or, perhaps, different v-ratio definitions of these categories. In section 4.5,
we adopt a modeling approach to test further the categorical and graded
accounts of voice assimilation.
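The distributional comparison reported above is a chi-square test of homogeneity over binned v-ratio counts. A self-contained sketch (the function and the toy counts are ours; the real input would be the per-bin counts of Figure 1 restricted to the [0, 0.9] intervals):

```python
def chi2_homogeneity(counts_a, counts_b):
    """Chi-square statistic comparing two binned distributions,
    e.g., baseline vs. assimilation v-ratio counts per interval."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = column * row_total / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Proportional counts have identical shapes, so the statistic is 0:
assert chi2_homogeneity([10, 20, 30], [20, 40, 60]) == 0.0
```

With the nine intervals covering [0, 0.9], such a statistic has 8 degrees of freedom, as in the test reported above.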
4.5. v-ratio distributions: Modeling the changes caused by assimilation
By a categorical, discrete account of voice assimilation, there are two
phonetically definable categories, voiced and voiceless, and the voice
assimilation process simply is a switch from one category to the other.
From a radical view of categorical assimilation, category switch always
occurs in assimilation-licensing contexts. From a less radical view, there is
either category switch or no category change. By the gradient, continuous
account, assimilation can be viewed as a phonetic shift toward one of the two
categories.⁷
Figure 1. v-ratio distributions for C1 in the (A) UV-UV and (B) UV-V conditions;
the black bar roughly corresponds to fully voiced C1s and is truncated in
(B) to enhance detail in the [0, 0.9] range of v-ratios.

How can we model these two contrasting views of assimilation
with respect to the single parameter examined so far, v-ratio? Figures 2
and 3 illustrate possible "shift" and "switch" scenarios, respectively, in
the case of voicing assimilations. Devoicing assimilations are assumed
to yield symmetrical scenarios. In these hypothetical scenarios, phonetic
shift is modeled by an increase in v-ratio along the entire range of v-ratio
values. This basically entails a rightward shift of the initial distribution. In
particular, the leftmost peak, which corresponds to the lowest v-ratios in the
baseline condition (UV-UV), is shifted to the right by a constant amount. In
our modeling, we have incorporated limit conditions and stochastic variation
around a constant v-ratio shift d, as shown in (4).
(4) f_a(v) = α · f_b(min(1, max(0, v + d + e(v))))

(where v stands for v-ratio, f_a(v) and f_b(v) for the frequency of v-ratio v in the
baseline (f_b) and assimilation (f_a) conditions; the scaling factor α ensures a
constant cumulated frequency; d is the mean v-ratio shift; e(v) is the stochastic
variation around it.)
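The shift model in (4) can be sketched numerically. The function and histogram values below are our own illustration (hypothetical, not the implementation or data of this study), and the stochastic term e(v) is omitted for simplicity:

```python
# Minimal sketch of the shift model in (4): every v-ratio is shifted by d,
# clamped to [0, 1] (the "limit conditions"), re-binned, and rescaled so that
# the cumulated frequency is unchanged. Bin values are invented.

def shift_model(baseline, d, n_bins=10):
    """baseline: bin frequencies over [0, 1]; d: mean v-ratio shift."""
    width = 1.0 / n_bins
    shifted = [0.0] * n_bins
    for i, freq in enumerate(baseline):
        center = (i + 0.5) * width           # center v-ratio of bin i
        v = max(0.0, min(1.0, center + d))   # clamp to [0, 1]
        j = min(n_bins - 1, int(v / width))  # destination bin
        shifted[j] += freq
    scale = sum(baseline) / sum(shifted)     # keep cumulated frequency constant
    return [f * scale for f in shifted]

# Made-up baseline (UV-UV-like) distribution peaking at low v-ratios:
baseline = [40, 25, 10, 5, 3, 2, 2, 3, 4, 6]
predicted = shift_model(baseline, d=0.3)
```

With d = 0.3 the whole distribution moves rightward, and the mass shifted past v-ratio = 1 piles up in the last bin, mirroring the rightward shift of the leftmost peak described above.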
The categorical switch scenario is modeled as a partial exchange between
two posited categories, "voiced" and "voiceless", with no category-internal
changes with respect to v-ratio.

Figure 2. Shift model for underlyingly voiceless obstruents: made-up v-ratio
distribution in voiceless context (plain line: baseline condition) and
predicted distribution in voiced context (dashed line: assimilation).

Voice assimilation in French obstruents 163

In such a model, the definition of
the two categories is critical: The category boundary between [−voice] and
[+voice] must be specified in the v-ratio dimension. The data discussed in
4.4 suggested a category boundary at a rather high v-ratio (cf. Figure 1).
This is illustrated as boundary 2 in Figure 3. For the sake of comparison, a
low v-ratio boundary is illustrated as boundary 1, hence two variants of the
switch model: switch 1 and switch 2 (see Figure 3). The exchange model
shown in (5) ensures that the within-category distributions of v-ratio are left
unchanged after assimilation has applied.
(5) f_a(v) = r · f_b(v)           if v ≥ v_c
    f_a(v) = (1 − s) · f_b(v)     if v < v_c

    with  s = (r − 1) · Σ_{u ∈ [v_c, 1]} f_b(u) / Σ_{u ∈ [0, v_c]} f_b(u)

(where v_c stands for a boundary v-ratio between the hypothetical voiceless
and voiced categories; r and s specify the amount of exchange between the
two categories, with no change in cumulated frequency.)
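A numerical sketch of the exchange model in (5), with invented bin values (not this study's data or code): frequencies at or above the boundary v_c are scaled by r, those below it by (1 − s), with s set so that the cumulated frequency is conserved.

```python
# Minimal sketch of the category-switch (exchange) model in (5). Within-category
# distribution shapes are preserved: each category is only rescaled as a whole.
# Bin values are invented for illustration.

def switch_model(baseline, r, v_c, n_bins=10):
    width = 1.0 / n_bins
    centers = [(i + 0.5) * width for i in range(n_bins)]
    voiced = sum(f for c, f in zip(centers, baseline) if c >= v_c)
    voiceless = sum(baseline) - voiced
    # conservation of cumulated frequency: s * voiceless = (r - 1) * voiced
    s = (r - 1) * voiced / voiceless
    return [f * r if c >= v_c else f * (1 - s)
            for c, f in zip(centers, baseline)]

baseline = [40, 25, 10, 5, 3, 2, 2, 3, 4, 6]        # made-up UV-UV distribution
predicted = switch_model(baseline, r=6.0, v_c=0.9)  # "switch 2"-like boundary
```

Only the bins at or above v_c grow; all lower bins shrink by one common factor, so the within-category shapes stay intact, as the model requires.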
Figure 3. Category switch model for underlyingly voiceless obstruents: made-up
v-ratio distribution for UV-UV (plain line: baseline) and predicted
distributions for UV-V (assimilation) for two categorical boundaries
(dashed line: boundary 1 at 0.2; dotted line: boundary 2 at 0.8).
Figure 4. v-ratio distributions for voiceless fricatives: (A) as observed in UV-UV
(bold plain line) and UV-V (bold dashed line with triangles) contexts.
(Continued)
Which of these models best predicts the observed data? We computed,
for each model and each underlying voicing, the v-ratio distribution in the
assimilation condition predicted from the baseline distribution. This was
done separately for stops and fricatives. The parameters d for the shift model
(amount of shift), and r and s for the switch models (amount of exchange)
were estimated so that modeled and observed assimilations yield the
same overall v-ratio. Figures 4A–C provide an illustration for the voicing
assimilation of voiceless stops.
To compare the models, root mean square deviations between modeled
and observed distributions were computed. The results are shown in Table 3.
The switch model with [0, .9] and ].9, 1] v-ratios defining [−voice] and
[+voice], respectively, clearly yields a better fit than the other two models,
with a mean prediction error of about 2%. In detail, the adjustment is very
good for all conditions except for devoicing in fricatives (5% error).
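The goodness-of-fit measure used here, root mean square (RMS) deviation between modeled and observed bin frequencies, can be sketched as follows; the two distributions are made up, not the paper's:

```python
# RMS deviation between a modeled and an observed v-ratio distribution,
# both expressed as frequencies (e.g., in %) per bin. Values are invented.

def rms_deviation(modeled, observed):
    assert len(modeled) == len(observed)
    n = len(modeled)
    return (sum((m - o) ** 2 for m, o in zip(modeled, observed)) / n) ** 0.5

observed = [10.0, 5.0, 3.0, 2.0, 2.0, 2.0, 3.0, 4.0, 9.0, 60.0]
modeled = [11.0, 4.0, 3.0, 2.0, 2.0, 2.0, 3.0, 5.0, 8.0, 60.0]
error = rms_deviation(modeled, observed)  # lower means a better fit
```

Comparing this error across the shift, switch 1, and switch 2 predictions is what Table 3 reports.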
A closer inspection of the data reveals that assimilation does not affect
within-category mean v-ratios except for this latter condition. Thus, for
fricatives only, and for the devoicing direction of assimilation, there is a
slight trend toward gradient assimilation in terms of v-ratio.

Figure 4. (Continued) (B) and (C) as observed in UV-V and predicted by the models;
(C) shows the [0, .9] interval of (B) with zoomed-in frequencies.

Yet, for the most part,
voice assimilations in French seem categorical in nature with respect to the
voicing ratio parameter. Moreover, the data suggest a rather narrow phonetic
definition of the categories. Voiced obstruents seem to be fully voiced in
terms of v-ratio, whereas the voiceless category can be loosely specified as
"not fully voiced". This points toward a default, unmarked [−voice] value of
the [voice] feature. The marked value [+voice] is signaled phonetically by
full voicing, with v-ratio = 1. But is v-ratio = 1 a sufficient condition for a
segment to be [+voice]? Logically, that condition is necessary but perhaps
not sufficient. The consequence in perception is that obstruents that are not
fully voiced in terms of v-ratio (v-ratio < 1) should be perceived as [−voice],
whereas obstruents that are fully voiced (v-ratio = 1) may be perceived as
[+voice]. In the last section we examine recent perceptual data suggesting
that v-ratio = 1 is not sufficient for the perceptual system to treat a segment
as [+voice], at least in cases where the ambiguity between [+voice] and
[−voice] cannot be resolved at the lexical level, that is, in cases where the
surface form could correspond to different underlying forms, as in [sud] for
either soude or soute.
5. Subtle traces of voicelessness
In a recent paper, Snoeren, Segui, and Hallé (2008) used cross-modal
associative priming to test for the effect of voice assimilation on lexical
access. They used potentially ambiguous words such as soute /sut/ 'hold',
which is confusable with soude /sud/ 'soda' when strongly assimilated, that
is, when pronounced close to [sud]. Other examples of minimal pairs for final
consonant voicing included trompe, jatte, bec, rite, bac, rate, etc. (There are
only about twenty such minimal pairs in French.)

Table 3. Adjustment scores (RMS prediction errors in %) for the three models
examined; for the switch models, the [−voice] and [+voice] categories
are defined by ranges of v-ratio variation: [0, .1] and ].1, 1] for switch 1,
[0, .9] and ].9, 1] for switch 2

    model       V-UV               UV-V
                stop   fricative   stop   fricative   average fit
    shift       16.9   11.7        14.4   17.3        15.1
    switch 1     6.6   18.6        16.0   18.0        14.8
    switch 2     0.9    5.1         1.4    1.5         2.2

Snoeren, Segui and Hallé (2008) asked whether strongly voice-assimilated
soute (pronounced close
to [sud]) would activate the word soute not only at a phonological form
level but further, at a lexical-conceptual level. In order to do so, they used
natural assimilations of soute (pronounced in such utterances as une soute
bondée 'a crammed compartment') that were very strongly assimilated (with
v-ratio = 1). These word forms, extracted from the embedding utterances,
were used as auditory primes in a cross-modal association priming
experiment. For instance, bagage 'luggage' was paired with either soute
pronounced [sud] or unrelated gratte. Other assimilated word forms, such
as jupe pronounced [ʒyb], which has no minimal pair for final consonant
voicing, were used for comparison purposes: one possible outcome was
indeed that only these unambiguous word forms would be accessed at a
lexical-conceptual level. But the results clearly showed that unambiguous
and potentially ambiguous word forms induced a comparable priming
effect of about 40 ms. Hence, the lexical entry soute was activated by the
strongly assimilated form [sud]. The critical question was whether the
word form [sud] for plain soude would also activate soute. Indeed, spoken
word recognition is known to be relatively tolerant of mispronunciations
(Bölte and Coenen 2000; Connine, Blasko, and Titone 1993; etc.). A second
experiment showed that [sud] extracted from une soude brute 'a raw soda'
did not prime bagage at all. The priming effect found with assimilated
soute thus could not be due to form similarity with soude.
The only possible explanation of these data was that strongly assimilated
forms (with v-ratio = 1), such as soute pronounced [sud], retain something
of their underlying [−voice] specification. Snoeren, Segui and Hallé (2008)
therefore set out to analyze the detailed acoustic characteristics of the
assimilated stimuli they used. Table 4 summarizes the measurements that
showed assimilated soute indeed retained something of /sut/. V/(V+closure)
summarizes the classic durational cues to voicing in obstruents: Longer
preceding vowel and shorter closure for voiced obstruents, hence larger V/
(V+closure). It seems virtually unaffected by voice assimilation. F0 on the
preceding vowel offset seems to almost neutralize. Finally, the amplitude
of glottal pulsing seems weaker for assimilated soute than for plain soude,
suggesting that gradiency in voicing may be reflected not only by graded
temporal extension, as found in several studies (Barry and Teifour 1999;
Gow and Im 2004; Jansen and Toft 2002), but also by graded amplitude of
glottal pulsing.
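The durational cue V/(V+closure) is simply the preceding vowel's duration relative to the vowel-plus-closure interval; a minimal sketch with hypothetical durations (in ms, not measurements from this study):

```python
# V/(V+closure): classic durational cue to obstruent voicing. A longer
# preceding vowel and a shorter closure (voiced context) yield a larger ratio.
# The durations below are hypothetical, for illustration only.

def duration_ratio(vowel_ms, closure_ms):
    return vowel_ms / (vowel_ms + closure_ms)

voiced_like = duration_ratio(120, 80)     # longer vowel, shorter closure
voiceless_like = duration_ratio(100, 90)  # shorter vowel, longer closure
```

In Table 4 this ratio stays near its plain-soute value under assimilation, which is why it counts among the cues that remain virtually unaffected.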
To summarize, Snoeren, Segui and Hallé's (2008) study clearly suggested
that v-ratio cannot account entirely for the patterns of assimilation that are
found in natural speech. Whereas some acoustic parameters seem to vary
in an all-or-none manner (v-ratios change categorically, and durational
parameters do not change), some others seem to vary in a graded manner (for
example, amplitude of glottal pulsing). The picture of voicing assimilation is
thus far more complex than previously thought.
6. Discussion
Let us summarize the observations that our corpus study made possible. The
v-ratio means computed for the four voicing contacts suggested that voice-
assimilated obstruents have intermediate v-ratios between those observed for
their underlying voicing and those for the opposite voicing (Table 1). We
showed that these means masked a different reality which could only be
uncovered by examining distributions. Distributional data indeed suggested
that assimilation takes place only part of the time but is complete, with respect
to v-ratio, when it does take place. More precisely, two voicing categories
may be dened phonetically (again, with respect to v-ratio): full-voicing
and partial-voicing. Assimilation, when it takes place, is basically a switch
between these two phonetic categories. How often does assimilation take
place? A rough estimate can be obtained from the inspection, in Table 2, of
the variation in frequency of the full voicing pattern according to voicing
contact. Since the frequency of this pattern increased by about 48% from
UV-UV to UV-V contacts, for both stops and fricatives, we may infer that
voicing assimilation takes place about 48% of the time. Likewise, devoicing
assimilation seems to occur about 30% of the time for stops but 67% of the
time for fricatives, an asymmetry already noted in 4.2. In section 5, we noted
that secondary cues to voicing must remain unaffected or partially unaffected
by assimilation since listeners can recover the intended voicing of fully
voiced items, such as soute pronounced [sud]. Indeed, acoustic measurements
revealed subtle differences between such items. In other words, apparently
fully voice-assimilated forms retain traces of their underlying voicelessness.
How can we reconcile the divergent observations for v-ratios and secondary
cues?

Table 4. Acoustic measurements of plain soute and soude (bold face) and of strongly
assimilated soute (v-ratio = 1) used in Snoeren, Segui and Hallé (2008)

                          V-V           UV-V           UV-UV
                          soude brute   soute bondée   soute pleine
    v-ratio               1             1              0.38
    V/(V+closure)         0.605         0.564          0.568
    F0 at V offset        224 Hz        231 Hz         249 Hz
    energy in closure     69.1 dB       67.4 dB        65.2 dB
Such dissociation between primary and secondary cues is reminiscent
of the recent findings of Goldrick and Blumstein (2006) on tongue twisters
inducing slips of the tongue. They found that when /k/ was erroneously
produced as [g] or /g/ as [k], traces of the targeted consonant's VOT were
found in the faulty productions. However, the slip-of-the-tongue productions
showed no traces of the targeted consonant in local secondary cues to
voicing (F1 onset frequency, burst amplitude). As for the non-local cue
examined, the following vowel's duration, it was faithful to the targeted
consonant. For example, erroneous [k]s for targeted /g/s had a slightly
shorter VOT than [k]s for plain /k/s, but had F1 and burst characteristics
typical of /k/s, and maintained the long following vowel duration observed
for plain /g/s.⁸
(Symmetrical patterns obtained in the case of erroneous [g]s
for targeted /k/s.) Goldrick and Blumstein claimed their data supported a
cascade mechanism translating phonological planning into articulatory
implementation: both the targeted and the slipped segments' representations
were activated during phonological planning, resulting in a mix of both
during articulation implementation. They also found evidence of cascading
activations between the posited lexical level (or lexeme selection) and
phonological planning, all this supporting a cascade processing architecture
across the board.
The assimilation data can be analyzed within the same framework of
speech production planning and articulation (Levelt 2002). Following lexical
selection, phonological planning may proceed in several steps: A first step
may activate a canonical representation at level 1 posited in (12); when
words are assembled together, contextual phonological processes may apply,
activating level 2 representations in a subsequent step. Thus, similar to the
tongue twister situation, the voice assimilation process (or, more generally,
any phonological alternation process) entails the activation of several
phonological representations and representation levels, cascading to the
articulation implementation stage. Hence, the possible mixed articulation
implementation. This, together with coarticulation effects, must contribute to
phonetically mixed outputs. Like Goldrick and Blumstein (2006), Snoeren,
Segui and Hallé (2008) found a dissociation of cues in the observed voice
assimilations but at the same time, the observed patterns were quite different.
In assimilating contexts, v-ratios changed categorically, but F0 on the
preceding vowel and waveform amplitude during stop closure underwent
incomplete change, whereas preceding vowel duration did not change at all
(just like Goldrick and Blumstein's following vowel durations). Goldrick and
Blumstein (2006) interpreted the observed dissociation of cues as revealing
the role of subsyllabic assembly mechanisms in articulatory implementation.
They regarded the fate of secondary cues as explained by a lesser perceptual
motivation. But this explanation obviously lacks consistency: Why should
some perceptually unimportant cues be completely neutralized and some
others entirely maintained? We propose instead that the dissociation of cues
is due to different time-courses of phonological planning and articulation
implementation. For instance, the resistance to assimilation for flanking
vowel duration might be due to an early step of metric/prosodic planning
completed before the assimilation process switches the [voice] specication
of the assimilated segment. In the same way, we might interpret the weaker
glottal pulsing during closure in voicing assimilations than in plain voicing
as due to a later-occurring specication of voicedness in the assimilation
case. We are of course aware that these interpretations are for the time being
quite speculative and that more specic research is necessary to address, for
instance, the issue of timing within the phonological planning stage and its
possible consequences for articulation implementation.
Before closing, let us examine briey the classical articulatory phonology
account of assimilations in terms of gestural overlap (cf., for example,
Browman and Goldstein 1992). Whatever the gestures involved for voicing
in French (plausibly, glottal opening-closing for voiceless obstruents and
glottal critical adduction for voiced ones; see Best and Hallé 2010 for an
overview), the gestural overlap account predicts that assimilations occur
in perception rather than in production, and are all the more likely to occur
when speech rate is fast, or prosodic conditioning entails increased overlap.
In other words, according to standard articulatory phonology, no discrete
modication of phonological specication ever occurs in phonological
alternations. Gestural specifications are neither deleted nor switched: gestures
may only overlap and hide each other in perception, especially at fast rates.
One might want to test for this contention in corpus data: Assimilation degree
should be stronger at faster rates. We attempted to do this by separating the
data into four C1#C2 duration ranges (from less than 120 ms to more than
240 ms) and found a trend toward more frequent assimilations for shorter
durations. This leaves open the possibility that high speech rate favors
assimilation.
To conclude, our data seem more readily amenable to a discrete rather than
graded account of voice assimilation. In the scenario we propose, the classic
description of assimilation, as found in (1), applies within the phonological
planning stage in speech production: level 1 /sut/ produces level 2 [sud].
This takes us back to the typology of phonological alternations offered by
Nick Clements: Voice assimilation belongs to the single-feature type. The
qualifications we propose to his typology are twofold. First, we propose,
following Goldrick and Blumstein (2006), that a cascading architecture
characterizes the translation from phonological code to articulation: Both
/sut/ and /sud/ feed articulation implementation. In that view, the assimilations
that are incomplete at the phonetic level, either quantitatively for a single
cue (e.g., v-ratio, amplitude of glottal pulsing) or in terms of dissociation
between cues, reect cascading translation from phonological planning
to articulation implementation, with different time courses of activation/
deactivation for different levels of representation. In other words, whereas
the classic description of the assimilation process in (1) offers a static picture,
we propose to consider the dynamics of its component parts. As a second
qualication, we introduced the notion of occurrence in the application of
a phonological process. Immediate context determines whether assimilation
is applicable or not. Yet, it seems that the actual occurrence of assimilation
requires further determinants. What determines whether assimilation takes
place or not? This question is indeed open to future investigation on the
licensing factors that might operate beyond immediate context.
Acknowledgments
This research was supported by an ANR grant (PHON-REP) to the first
author.
Notes
1. Coarticulation is viewed here as a mechanical consequence of temporal overlap in
articulation between consecutive sounds (Fowler and Saltzman 1993; Browman
and Goldstein 1990, 1992). Coarticulation occurs with vowels (Öhman 1966;
Magen 1993) or tones (Abramson 1979; Xu 1994), and indeed with consonants,
in all the situations whereby sounds in contact differ in some phonetic dimension.
2. Should we consider the pronunciation [ɔptəniʁ] instead of [ɔbtəniʁ] for obtenir
as a case of within-word voice assimilation? This is a matter of debate. From
a synchronic point of view, we may argue that the lexical form of obtenir is
simply stored as /ɔptəniʁ/ and there is no phonological context around or within
that word possibly licensing an alternation with the /ɔbtəniʁ/ form. However, at
the abstract morphophonemic level, obtenir contains the prefix {ob-}, hence the
phoneme /b/. The fact that obtenir has a /p/ at a less abstract level can be captured
by a transformation rule governing the alternation between /p/ and /b/ in {ob-},
that is, by an assimilation rule taking place between levels of representation.
The case of médecin is different because its pronunciation can alternate between
[medəsɛ̃] and [metsɛ̃] or [mɛtsɛ̃]. Interestingly, é in médecin is pronounced [e]
more often than [ɛ], although it should be [ɛ] in the closed syllable /mɛt/ of
/mɛt.sɛ̃/. This deviant pronunciation is symptomatic of a morphophonemic level
of representation in which é is indeed /e/, as reflected in the surface forms of
médical, médicament, etc.
3. Note that place assimilation may additionally occur in this example (Niebuhr,
Lancia, and Meunier 2008).
4. A discussion of the reliability and precision of the measurements presented here
falls outside the scope of this paper. There are indeed potential shortcomings in any
automatic alignment system as well as in any automatic decision on acoustic
voicing. (Manual labeling and measurement procedures are not error free either.)
But the analyses proved to produce rather consistent and homogeneous patterns
of results, which is about all that is needed for the present study.
5. We compared this voicing decision procedure with a procedure based on the
harmonics-to-noise ratio (HNR: a measure of acoustic periodicity) exceeding a
fixed threshold. We set this threshold to 0 dB, which corresponds to equal energy
in the harmonics and in the noise. The two methods yielded similar patterns of
results.
6. The opposite pattern we observe for French is also contrary to the naive intuition
about regressive voice assimilation that the right edge of C1 should be affected
by a following C2 with a different underlying voicing.
7. Similar ideas have been offered by Massaro and Cohen (1983) in a different
context. They proposed a new test for categorical perception in which listeners
had to rate stimuli of a /b/–/d/ continuum on a 1–5 scale, as "1" if they heard /b/ up to
"5" if they heard /d/. Categorical perception predicts that subjects' ratings to a given
stimulus be distributed along the 1–5 scale as two modes centered on the extreme
ratings 1 and 5, whereas continuous perception predicts a single mode centered
on a rating value depending on the stimulus, from 1 for /b/s to 5 for /d/s. That is,
continuous perception predicts a distributional shift from one stimulus to another,
whereas categorical perception predicts a switch between the two modes 1 and 5.
8. In this study, the speech materials were strictly controlled and distributions
restricted to limited dispersion around mean values. In other words, virtually all
observed slips had slightly non-canonic VOT values.
References
Abramson, Arthur S.
1979 The coarticulation of tones: An acoustic study of Thai. In T.
Thongkum, P. Kullavanijaya, V. Panupong, and T. Tingsabadh (eds.),
Studies in Tai and Mon-Khmer Phonetics and Phonology in Honour
of Eugénie J.A. Henderson, 1–9. Bangkok: Chulalongkorn University
Press.
Adda-Decker, Martine, and Lori Lamel
1999 Pronunciation variants across system configuration, language and
speaking style. Speech Communication 29 (2–4): 83–98.
Barry, Martin, and Ryad Teifour
1999 Temporal patterns in Arabic voicing assimilation. In Proceedings of
the 14th International Congress of Phonetic Sciences, 2429–2432.
Best, Catherine T., and Pierre A. Hallé
2010 Perception of initial obstruent voicing is influenced by gestural
organization. Journal of Phonetics 38: 109–126.
Bölte, Jens, and Else Coenen
2000 Domato primes paprika: Mismatching pseudowords activate semantic
and phonological representations. In Proceedings of the SWAP
Conference, 59–62. Nijmegen, The Netherlands.
Boersma, Paul
2001 Praat, a system for doing phonetics by computer. Glot International 5
(9/10): 341–345.
Browman, Catherine, and Louis Goldstein
1990 Gestural specification using dynamically-defined articulatory
structures. Journal of Phonetics 18: 299–320.
1992 Articulatory phonology: an overview. Phonetica 49: 155–180.
Burton, Martha W., and Karen E. Robblee
1997 A phonetic analysis of voicing assimilation in Russian. Journal of
Phonetics 25: 97–114.
Clements, George N.
1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
Connine, Cynthia, Dawn Blasko, and Debra Titone
1993 Do the beginnings of spoken words have a special status in auditory
word recognition? Journal of Memory and Language 32: 193–210.
Darcy, Isabelle, Franck Ramus, Anne Christophe, Katherine Kinzler, and Emmanuel
Dupoux
2009 Phonological knowledge in compensation for native and non-native
assimilation. In Frank Kügler, Caroline Féry, and Ruben van de
Vijver (eds.), Variation and gradience in phonetics and phonology,
265–310. Berlin: Mouton De Gruyter.
Dilley, Laura C., and Mark A. Pitt
2007 A study of regressive place assimilation in spontaneous speech and its
implications for spoken word recognition. Journal of the Acoustical
Society of America 122 (4): 2340–2353.
Flemming, Edward
1997 Phonetic detail in phonology: Evidence from assimilation and
coarticulation. In K. Suzuki and D. Elzinga (eds.), Southern Workshop
on Optimality Theory: Features in OT. Coyote Papers.
Fouché, Pierre
1969 Traité de prononciation française. Paris: Klincksieck.
Fowler, Carol A., and Elliot Saltzman
1993 Coordination and coarticulation in speech production. Language and
Speech 36 (2, 3): 171–195.
Gaskell, Gareth, and William Marslen-Wilson
1996 Phonological variation and inference in lexical access. Journal of
Experimental Psychology: Human Perception and Performance 22:
144–158.
2001 Lexical ambiguity resolution and spoken word recognition: Bridging
the gap. Journal of Memory and Language 44: 325–349.
Gaskell, Gareth, and Natalie Snoeren
2008 The impact of strong assimilation on the perception of connected
speech. Journal of Experimental Psychology: Human Perception and
Performance 34 (6): 1632–1647.
Gauvain, Jean-Luc, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen,
Véronique Gendner, Lori Lamel, and Holger Schwenk
2005 Where are we in transcribing French broadcast news? In Proceedings
of Interspeech 2005–Eurospeech, 1665–1668.
Goldrick, Matthew, and Sheila Blumstein
2006 Cascading activation from phonological planning to articulatory
processes: Evidence from tongue twisters. Language and Cognitive
Processes 21 (6): 649–683.
Gow, David W.
2001 Assimilation and anticipation in continuous spoken word recognition.
Journal of Memory and Language 24: 133–159.
2002 Does English coronal place assimilation create lexical ambiguity?
Journal of Experimental Psychology: Human Perception and Performance 28: 163–179.
Gow, David W., and Aaron M. Im
2004 A cross-linguistic examination of assimilation context effects. Journal
of Memory and Language 51: 279–296.
Grammont, Maurice
1933 Traité de Phonétique. Paris: Delagrave.
Hallé, Pierre A., and Martine Adda-Decker
2007 Voicing assimilation in journalistic speech. In Proceedings of the 16th
International Congress of Phonetic Sciences, 493–496.
Jansen, Wouter
2004 Laryngeal contrast and phonetic voicing: a laboratory phonology
approach to English, Hungarian, and Dutch. Ph.D. diss., University of
Groningen.
Jansen, Wouter, and Zoë Toft
2002 On sounds that like to be paired (after all): An acoustic investigation of
Hungarian voicing assimilation. SOAS Working Papers in Linguistics
12: 19–52.
Lahiri, Aditi, and Henning Reetz
2010 Distinctive features: Phonological underspecification in representation
and processing. Journal of Phonetics 38: 44–59.
Levelt, Willem
2002 Phonological encoding in speech production. In Carlos Gussenhoven
and Natasha Warner (eds.), Papers in Laboratory Phonology VII,
87–99. Berlin: Mouton De Gruyter.
Magen, Harriet S.
1993 The extent of vowel-to-vowel coarticulation in English. Journal of
Phonetics 25: 187–205.
Martinet, André
1955 Économie des changements phonétiques: traité de phonologie
diachronique. Berne: Francke.
Massaro, Dominic W., and Michael M. Cohen
1983 Categorical or continuous speech perception: a new test. Speech
Communication 2: 15–35.
Niebuhr, Oliver, Leonardo Lancia, and Christine Meunier
2008 On place assimilation in French sibilant sequences. In Proceedings of
the 8th International Seminar on Speech Production, 221–224.
Öhman, Sven E. G.
1966 Coarticulation in VCV utterances: spectrographic measurements.
Journal of the Acoustical Society of America 39 (1): 151–168.
Rigault, André
1967 L'assimilation consonantique de sonorité en français: étude acoustique
et perceptuelle. In B. Hála, M. Romportl, & P. Janota (eds.),
Proceedings of the 6th International Congress of Phonetic Sciences,
763–766. Prague: Academia.
Snoeren, Natalie, Pierre A. Hallé, and Juan Segui
2006 A voice for the voiceless: Production and perception of assimilated
stops in French. Journal of Phonetics 34 (2): 241–268.
Snoeren, Natalie, Juan Segui, and Pierre A. Hallé
2008 On the role of regular phonological variation in lexical access:
Evidence from voice assimilation in French. Cognition 108 (2):
B512–B521.
Xu, Yi
1994 Production and perception of coarticulated tones. Journal of the
Acoustical Society of America 95 (4): 2240–2253.
An acoustic study of the Korean fricatives /s, s'/:
Implications for the features [spread glottis]
and [tense]
Hyunsoon Kim and Chae-Lim Park
1. Introduction
Halle and Stevens (1971) classified the Korean non-fortis fricative as
aspirated with the specification of [+spread glottis] (henceforth, [s.g.])
for glottal opening, and suggested that the fortis fricative /s'/ in Korean is
specified for the feature [−s.g.], for glottal closing. Moreover, Kagaya's
(1974) fiberscopic data of the Korean fricatives showed that the maximum
glottal opening of the non-fortis fricative is as wide as that of the aspirated
stops /pʰ, tʰ, tsʰ, kʰ/ in word-initial position, though it reduces to almost half
of that of the stops in word-medial position. In his acoustic data, aspiration
was found after the non-fortis fricative when followed by the vowel /i/, /e/ or
/a/ word-initially and word-medially, but such aspiration was not observed
in the fortis fricative in the same contexts. From the phonetic data, Kagaya
proposed that the non-fortis fricative is aspirated with the specification of
[+s.g.] and that the fortis fricative is specified as [−s.g.] in line with Halle and
Stevens (1971) (see Kim et al. (2010) for a more detailed literature review).
However, based on recent stroboscopic cine-MRI data on the Korean
fricatives, Kim, Maeda, and Honda (2011) have shown that the two fricatives
are similar to the lenis and fortis coronal stops /t, ts, t', ts'/, not to the aspirated
ones /tʰ, tsʰ/ in terms of glottal opening both word-initially and word-medially,
and that aspiration occurs during transitions from a fricative to a vowel and
from a vowel to a fricative, regardless of the phonation type of the fricatives.
In addition, in the comparison of the phasing between the tongue apex and the
glottal width of the fricatives with that of the aspirated stops /tʰ, tsʰ/ in Kim,
Honda, and Maeda (2011), it was found that the tongue apex-glottal phasing
of the non-fortis fricative is not like that of the aspirated stops. Thus, Kim,
Maeda, and Honda (2011) have proposed that the Korean non-fortis fricative
is lenis (/s/), not aspirated (/sʰ/), and that the two fricatives are specified as
[−s.g.]. Kim et al. (2010) have provided further acoustic and aerodynamic
evidence for the feature specification of [−s.g.] in the fricatives. The acoustic
data have shown that the absence or presence of aspiration is not relevant for
the distinction of the fricatives, because aspiration can occur during transitions
from the two fricatives to a following vowel both word-initially and word-
medially, regardless of the phonation type of the consonants. The aerodynamic
data have revealed that the fricatives are similar to the lenis and fortis coronal
stops, not to the aspirated stops /tʰ, tsʰ/, in terms of airflow.
According to Kim, Maeda, and Honda (2011), what differentiates the
fricatives is the tensing of the tongue blade and the vocal folds during the
oral constriction of the fricatives, in line with the newly defined feature
[tense] in Kim, Maeda, and Honda (2010). The stroboscopic cine-MRI study
of the fricatives has shown that oral constriction is narrower and longer,
with the apex being closer to the roof of the mouth in /s'/ than in /s/, the
pharyngeal width is longer in /s'/ than in /s/, and the highest tongue blade
and glottal height is sustained longer in /s'/ than in /s/. It is proposed then that
the concomitant tongue/larynx movements are incorporated into the feature
[tense]: the fortis /s'/ is specified as [+tense], like fortis and aspirated stops,
and the lenis /s/ as [-tense], like lenis stops. The aerodynamic data of Kim et
al. (2010) provide further evidence for the feature [tense] in the fricatives.
Airflow resistance (that is, oral-constriction resistance) is significantly
greater for /s'/ than for /s/ during oral constriction. Given that airflow resistance
is directly related to the oral constriction shape and that it is consistently
higher in /s'/ than in /s/, Kim et al. (2010: 154) have suggested that the
constriction during the frication of /s'/ is stronger than during that of /s/ in
that the stronger the constriction is, the higher the resistance (e.g., Stevens
1998). The stronger constriction during /s'/ is articulatorily correlated
with narrower and longer oral constriction and the higher or longer glottal
raising in /s'/ in Kim, Maeda, and Honda (2011).
The present paper is a follow-up study to the acoustic part of Kim et al.
(2010) and examines whether the laryngeal characterization of the fricatives
in terms of the features [s.g.] and [tense] is also acoustically supported.
We extended the scope of the acoustic experiment of Kim et al. (2010), in
which two subjects (one male and one female) took part, by recruiting ten
native speakers of Seoul Korean (five male and five female). In addition,
we investigated not only the presence/absence of aspiration and voicing of
the fricatives, as in Kim et al. (2010), but also the frication duration of the two
fricatives and F0 at the beginning of a following vowel.
If the fricatives are both specified as [-s.g.], we can say that aspiration
has nothing to do with the distinction of the fricatives. Thus, one might
expect aspiration to occur during transitions from a fricative to a following
178 Hyunsoon Kim and Chae-Lim Park
vowel, regardless of the phonation type of the fricatives in both word-initial
and word-medial positions, as shown in Kim et al. (2010). If the fricatives
are differentiated by the feature [tense], as in Kim, Maeda, and Honda
(2011), then frication duration, which is articulatorily correlated with oral
constriction, would be longer in /s'/ than in /s/ both word-initially and word-
medially. In addition, given that the highest glottal position in /s'/ is often the
same as in /s/, though the duration of the highest glottal position tends to be
longer in /s'/ than in /s/ (Kim, Maeda, and Honda 2011), it is probable that
F0 values at the voicing onset of a vowel would often be the same after /s/
and /s'/. We also examined whether voicing could occur in both /s/ and /s'/,
as in Kim et al. (2010), or only in the intervocalic word-medial fricative /s/,
as observed in Cho, Jun, and Ladefoged (2002) for their proposal that /s/ is
lenis, like its counterpart stops, due to intervocalic voicing.
This paper is structured as follows. In sections 2 and 3, we provide the
method and results of our acoustic experiments and discuss the implications
of the acoustic data, respectively. A brief conclusion is in section 4.
2. Acoustic experiments
2.1. Method
As in Kim et al. (2010), we put the two fricatives /s, s'/ in /_V_V/, where V is
one of the eight Korean monophthongs /a, ɛ, ʌ, e, o, u, ɨ, i/, as shown in (1).

(1) /sasa/   /s'as'a/
    /sɛsɛ/   /s'ɛs'ɛ/
    /sʌsʌ/   /s'ʌs'ʌ/
    /sese/   /s'es'e/
    /soso/   /s'os'o/
    /susu/   /s'us'u/
    /sɨsɨ/   /s'ɨs'ɨ/
    /sisi/   /s'is'i/
The test words, which are all nonsense words, were embedded in the frame
sentence /nɛka __ palɨmhapnita/ 'I pronounce __'. On a single page, sentences
with the test words written in Korean orthography were randomized, with two
filler sentences at the top and the bottom. The sentences were read five times
at a normal speech rate by ten subjects (five male, five female), all of whom
were in their early 20s. The average age of our subjects was 24.5 years.
Each subject familiarized him/herself with the test words by reading them
a few times before recording, and then read them as naturally as possible
during recording. A Shure SM57LC microphone was connected to a PC
(SONY-VGN-T236L/W) and Praat was used in recording the subjects. All
800 tokens obtained in this way (16 test words x 10 subjects x 5 repetitions)
were then analyzed in Praat.
Figure 1 shows how the duration of aspiration was measured after the
offset of the fricative /s/ in /sɛsɛ/, as well as how frication during the oral
constriction was measured word-initially and word-medially. The frication
phase of the fricatives is marked by an arrow with a dotted line at the bottom
of the spectrogram, and is identified by the major region of noise energy
above 4 kHz, characteristic of an alveolar fricative (e.g., Fant 1960; Kent and
Read 2002). Aspiration following the frication noise is marked by an arrow
with a solid line. The aspiration phase is identified by noise covering a broad
range of frequencies with relatively weak energy.
In addition, F0 values at the onset of a vowel following /s/ and /s'/ were
measured both word-initially and word-medially. Also, we examined whether
the vocal folds vibrated or not during the frication of the two fricatives.
2.2. Results
2.2.1. Frication
Table 1 presents the frication duration of the fricatives /s, s'/ averaged over
five repetitions from our ten subjects, both word-initially and word-medially,
in the context /_V_V/, where V is one of the eight Korean monophthongs
/a, ɛ, ʌ, e, o, u, ɨ, i/, as in (1).

Figure 1. Wide-band spectrogram of /sɛsɛ/ taken from a female subject
(segment labels: /s/, aspiration, /ɛ/, aspiration, /s/, aspiration, /ɛ/).
We can note that the frication duration of /s'/ is longer than that of /s/ in
all the vowel contexts, both word-initially and word-medially. A paired-
samples two-tailed t-test showed that frication duration is significantly longer
in the fortis fricative /s'/ than in the lenis /s/ both word-initially and word-
medially (t(7) = 5.7, p < .0008 for /s/ vs. /s'/ in word-initial position;
t(7) = 24.3, p < .0001 for /s/ vs. /s'/ in word-medial position). Another paired-
samples two-tailed t-test showed that the average frication duration of /s/
is significantly longer in word-initial position than in word-medial position
(t(7) = 8.8, p < .0001). However, the frication duration of /s'/ is significantly
greater in word-medial position than in word-initial position (t(7) = 24.8,
p < .0001).
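As an aside, the across-vowel comparisons can be recomputed from the Table 1 means alone. The following short sketch (our illustration, in plain Python, not part of the original analysis) reproduces the reported word-initial and word-medial t(7) values from the eight per-vowel mean durations:

```python
# Paired two-tailed t statistic over the eight vowel contexts, using the
# mean frication durations (ms) from Table 1; /s'/ is written s2 below.
from math import sqrt
from statistics import mean, stdev

initial_s  = [56.5, 61.5, 61.3, 63.6, 66.4, 92.0, 95.3, 101.4]    # Table 1a, /s/
initial_s2 = [79.1, 85.3, 84.2, 85.8, 90.9, 99.8, 106.2, 104.3]   # Table 1a, /s'/
medial_s   = [48.0, 55.1, 50.4, 55.8, 55.5, 75.7, 80.9, 86.4]     # Table 1b, /s/
medial_s2  = [106.1, 108.7, 107.4, 105.0, 112.6, 119.6, 129.6, 129.5]  # Table 1b, /s'/

def paired_t(x, y):
    """Return (t, df) for a paired t-test on two matched samples."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

t_init, df = paired_t(initial_s, initial_s2)
t_med, _ = paired_t(medial_s, medial_s2)
print(f"word-initial: t({df}) = {t_init:.1f}")  # t(7) = 5.7, as reported
print(f"word-medial:  t({df}) = {t_med:.1f}")   # t(7) = 24.3, as reported
```

Note that df = 7 reflects pairing over the eight vowel contexts; the per-context tests reported as t(49) instead pair the 50 individual tokens per context.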
In addition, we compared the frication duration of /s/ and /s'/ in each vowel
context. As shown in Table 2, paired-samples two-tailed t-tests revealed that
frication duration is significantly longer in /s'/ than in /s/ both word-initially
and word-medially before all the vowels, except before /i/ in word-initial
position.
2.2.2. Aspiration
Table 3 presents the average aspiration duration after the offset of the
fricatives word-initially and word-medially in the test words in (1). It is
noteworthy that aspiration occurs not only after the offset of the fricative /s/
Table 1. The average frication duration (ms) of the fricatives /s, s'/ (a) word-initially
and (b) word-medially in /_V_V/, where V is one of the eight Korean
monophthongs /a, ɛ, ʌ, e, o, u, ɨ, i/.

a. Word-initial position        b. Word-medial position
       /s/     /s'/                     /s/     /s'/
/_a/   56.5    79.1             /a_a/   48      106.1
/_ɛ/   61.5    85.3             /ɛ_ɛ/   55.1    108.7
/_ʌ/   61.3    84.2             /ʌ_ʌ/   50.4    107.4
/_e/   63.6    85.8             /e_e/   55.8    105
/_o/   66.4    90.9             /o_o/   55.5    112.6
/_u/   92      99.8             /u_u/   75.7    119.6
/_ɨ/   95.3    106.2            /ɨ_ɨ/   80.9    129.6
/_i/   101.4   104.3            /i_i/   86.4    129.5
but also after that of the fortis fricative /s'/, no matter which vowel follows
the two fricatives.
In word-initial position, aspiration duration is the longest before the vowel
/a/ after the offset of the fricatives /s/ and /s'/. In word-medial position, it is
the longest before /ʌ/ after the offset of the fricative /s/ and before /e/ after
the offset of the fricative /s'/. No matter which vowel follows the fricatives,
aspiration duration is longer after the offset of the fricative /s/ than of /s'/ in
Table 3. A paired-samples two-tailed t-test showed that aspiration duration
is significantly longer after the offset of the lenis fricative /s/ than after the
offset of the fortis /s'/ both word-initially and word-medially (t(7) = 4.9, p <
.0017 for /s/ vs. /s'/ in word-initial position; t(7) = 4.8, p < .0019 for /s/ vs. /s'/
in word-medial position). Another paired-samples two-tailed t-test showed
Table 2. Paired-samples two-tailed t-tests of frication duration in (a) word-initial
and (b) word-medial fricatives /s/ vs. /s'/ in /_V_V/.

a. Word-initial position               b. Word-medial position
       /s/ vs. /s'/                           /s/ vs. /s'/
/_a/   t(49) = 9.7, p < .0001          /a_a/  t(49) = 15.3, p < .0001
/_ɛ/   t(49) = 8.6, p < .0001          /ɛ_ɛ/  t(49) = 14.8, p < .0001
/_ʌ/   t(49) = 9, p < .0001            /ʌ_ʌ/  t(49) = 14.5, p < .0001
/_e/   t(49) = 9.8, p < .0001          /e_e/  t(49) = 14.6, p < .0001
/_o/   t(49) = 11.8, p < .0001         /o_o/  t(49) = 16.3, p < .0001
/_u/   t(49) = 2.8, p < .0067          /u_u/  t(49) = 13.5, p < .0001
/_ɨ/   t(49) = 4.1, p < .0002          /ɨ_ɨ/  t(49) = 12.8, p < .0001
/_i/   t(49) = 1.1, p > .2961          /i_i/  t(49) = 12.5, p < .0001
Table 3. The average aspiration duration (ms) after the offset of the fricatives /s, s'/
(a) word-initially and (b) word-medially in /_V_V/.

a. Word-initial position        b. Word-medial position
       /s/     /s'/                     /s/     /s'/
/_a/   20.1    13.5             /a_a/   12      10.8
/_ɛ/   16.6    10.7             /ɛ_ɛ/   12.8    11.3
/_ʌ/   18.9    10.5             /ʌ_ʌ/   13.1    10.3
/_e/   17.9    10.6             /e_e/   12.5    11.9
/_o/   18.7    11.1             /o_o/   12.9    10.7
/_u/   11.7    9.6              /u_u/   9.5     8.7
/_ɨ/   11.4    9.7              /ɨ_ɨ/   10      9.4
/_i/   10      8.7              /i_i/   9.6     8.5
that the average aspiration duration after the offset of /s/ is significantly
longer in word-initial position than in word-medial position (t(7) = 4.4,
p < .003). In contrast, the aspiration duration after the offset of /s'/ in word-
initial position does not differ significantly from that in word-medial position
(t(7) = .8, p > .4236).
Multiple repeated-measures ANOVAs with Vowel context as the main
factor and aspiration duration as the dependent variable showed that the effect
of vowel context on aspiration duration is highly significant after the
offset of /s/ (F(7, 280) = 20.5, p < .0001 for word-initial /s/; F(7, 280) = 7.8,
p < .0001 for word-medial /s/) and also after the offset of /s'/ (F(7, 280) = 6.3,
p < .0001 for word-initial /s'/; F(7, 280) = 6.5, p < .0001 for word-medial /s'/).
This indicates that aspiration duration is affected by vowel context after the
offset of the two fricatives in both word-initial and word-medial positions.
Paired-samples two-tailed t-tests also revealed that aspiration duration
is dependent on vowel context both word-initially and word-medially,
regardless of the phonation type of the fricatives. For example, in word-
initial position, aspiration duration after the offset of /s/ is significantly longer
before /a/ than it is before /i/ (t(49) = 6.1, p < .0001), /ɨ/ (t(49) = 4.8, p < .0001)
and /u/ (t(49) = 4.7, p < .0001), whereas the differences are not significant in
the comparisons of /_a/ vs. /_ɛ/ (t(49) = 1.7, p > .105), /_a/ vs. /_ʌ/ (t(49) = .9,
p > .3977), /_a/ vs. /_e/ (t(49) = 1.4, p > .1639), and /_a/ vs. /_o/ (t(49) = .8,
p > .4463). In contrast, the comparison of aspiration duration after the offset
of the fricative /s'/ in word-initial position shows that it is significantly longer
(p < .05) before the vowel /a/ than before the other vowels. In word-medial
position, aspiration duration is also dependent on vowel context not only after
the fricative /s/ but also after /s'/: aspiration duration after the two fricatives is
significantly longer in /a_a/ than in /i_i/ (t(49) = 3, p < .0047 for /s/; t(49) = 3.6,
p < .0008 for /s'/), /u_u/ (t(49) = 2.6, p < .0116 for /s/; t(49) = 3.1, p < .0036
for /s'/) and /ɨ_ɨ/ (t(49) = 2.9, p < .0061 for /s/; t(49) = 3.1, p < .0036 for /s'/),
whereas it is not significant (p > .1 for both /s/ and /s'/) in the comparisons of
/a_a/ vs. /ɛ_ɛ/, /a_a/ vs. /ʌ_ʌ/, /a_a/ vs. /e_e/, and /a_a/ vs. /o_o/.
2.2.3. F0
The average F0 values at the voice onset of a vowel after the fricatives are
presented in Table 4.
We can note that F0 at the voice onset of a vowel is higher after /s'/
when followed by the vowel /a/, /ʌ/ or /o/ in word-initial position and when
followed by /ʌ/, /o/ or /ɨ/ in word-medial position. Yet, it is not always
higher after /s'/. For example, F0 is higher when the fricative /s/ is followed
by /ɛ/, /e/, /u/, /ɨ/, or /i/ in word-initial position, and by /ɛ/, /e/, /u/, or
/i/ in word-medial position. A paired-samples two-tailed t-test showed that
the average F0 values after /s/ and /s'/ do not differ significantly either word-
initially or word-medially (t(7) = .7, p > .504 for /s/ vs. /s'/ in word-initial
position; t(7) = .8, p > .439 for /s/ vs. /s'/ in word-medial position).
We also compared F0 at the voice onset of each vowel after /s/ and /s'/
in word-initial and word-medial positions. Paired-samples two-tailed t-tests
showed that F0 at the voice onset of a vowel does not differ significantly
after /s/ and /s'/ in either word-initial or word-medial position, no matter
which vowel follows the fricatives, as shown in Table 5.
Table 4. The average F0 values at the voice onset of a vowel after the fricatives
/s, s'/ (a) word-initially and (b) word-medially in /_V_V/ (unit: Hz).

a. Word-initial position        b. Word-medial position
       /s/     /s'/                     /s/     /s'/
/_a/   194.6   198.1            /a_a/   204.4   204.2
/_ɛ/   203.6   197.8            /ɛ_ɛ/   212.9   206.6
/_ʌ/   199.3   201.3            /ʌ_ʌ/   201.5   206.9
/_e/   201.8   198              /e_e/   212.3   206.3
/_o/   195.6   202.3            /o_o/   209.6   211.8
/_u/   208.6   204.6            /u_u/   214.6   211.4
/_ɨ/   204.3   203.5            /ɨ_ɨ/   211.3   215.2
/_i/   207.3   199.6            /i_i/   210.4   203.1

Table 5. Paired-samples two-tailed t-tests of F0 values at vowel onsets after
(a) word-initial and (b) word-medial fricatives /s/ vs. /s'/.

a. Word-initial position               b. Word-medial position
       /s/ vs. /s'/                           /s/ vs. /s'/
/_a/   t(49) = 1.5, p > .1464          /a_a/  t(49) = .1, p > .9388
/_ɛ/   t(49) = 1.4, p > .1572          /ɛ_ɛ/  t(49) = 1.5, p > .1369
/_ʌ/   t(49) = .7, p > .466            /ʌ_ʌ/  t(49) = .8, p > .4366
/_e/   t(49) = 1.1, p > .2861          /e_e/  t(49) = 1.8, p > .0709
/_o/   t(49) = 1.6, p > .1214          /o_o/  t(49) = .7, p > .495
/_u/   t(49) = 1.8, p > .0834          /u_u/  t(49) = 1.3, p > .1845
/_ɨ/   t(49) = .2, p > .8465           /ɨ_ɨ/  t(49) = 3.8, p > .0949
/_i/   t(49) = 1.8, p > .0754          /i_i/  t(49) = 1.9, p > .0603
2.2.4. Voicing
Figure 2 presents the number of tokens of the fricatives /s/ and /s'/ which
have voice bars (i) at the beginning of, (ii) up to the middle of, and
(iii) throughout the frication in (a) word-initial and (b) word-medial position.

Figure 2. The number of tokens where voicing occurs (i) at the beginning of, (ii) up
to the middle of, and (iii) throughout the frication of the fricatives /s, s'/ in
(a) word-initial and (b) word-medial position. [Bar charts of token counts,
scale 0-400. Word-initial: (i) /s/ 249, /s'/ 280; (ii) /s/ 112, /s'/ 119;
(iii) /s/ 10, /s'/ 1. Word-medial: (i) /s/ 229, /s'/ 301; (ii) /s/ 113, /s'/ 95;
(iii) /s/ 58, /s'/ 4.]

As shown in Figure 2, voicing occurs at the beginning of, up to the middle
of, and also throughout the frication of the fricatives, regardless of phonation
type, not only in word-initial position but also in word-medial position. The
percentage of voicing at the beginning of the frication of the two fricatives is
66% (249 for /s/ and 280 for /s'/ among 800 tokens) in word-initial position.
The same percentage is observed in word-medial position (229 for /s/ and
301 for /s'/ among 800 tokens). However, the percentage of voicing observed
up to the middle of the frication is reduced to 29% (112 for /s/ and 119 for /s'/
among 800 tokens) in word-initial position and 26% (113 for /s/ and 95 for
/s'/ among 800 tokens) in word-medial position. The occurrence of complete
voicing throughout the frication is much further reduced to 1% (10 for /s/ and
1 for /s'/ among 800 tokens) in word-initial position and 8% (58 for /s/ and 4
for /s'/ among 800 tokens) in word-medial position.
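The percentages above follow directly from the token counts, which can be checked with a few lines of arithmetic (our illustration, not part of the original study):

```python
# Voicing percentages per position, pooling /s/ and /s'/ over the 800 tokens
# per position (token counts as reported for Figure 2 and Table 6).
counts = {
    "word-initial": {"beginning": (249, 280), "middle": (112, 119), "throughout": (10, 1)},
    "word-medial":  {"beginning": (229, 301), "middle": (113, 95),  "throughout": (58, 4)},
}

def pct(pair, total=800):
    """Rounded percentage of all tokens in a position (/s/ and /s'/ pooled)."""
    return round(100 * sum(pair) / total)

for position, phases in counts.items():
    print(position, {phase: f"{pct(pair)}%" for phase, pair in phases.items()})
# word-initial: 66%, 29%, 1%; word-medial: 66%, 26%, 8%, matching the text
```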
In order to examine whether or not voicing of the two fricatives is
dependent on vowels, we checked the frequency of voicing in each vowel
context word-initially and word-medially. Table 6 shows the number of
tokens where voicing occurs (i) at the beginning of, (ii) up to the middle
of, and (iii) throughout the frication of the fricatives in each vowel context
(a) word-initially and (b) word-medially. Voicing occurs at the beginning
of the frication of the two fricatives, no matter which vowel follows the
consonants. This is also true of voicing up to the middle of the frication,
except that the tokens of voiced /s/ and /s'/ are relatively few before /u/ in
word-initial position and in /u_u/ in word-medial position, respectively. In
the case of complete voicing throughout the frication, voicing occurs in the
fricative /s/ in word-medial position in all the vowel contexts, though the
tokens of voiced /s/ are relatively few in /o_o/ and /ɨ_ɨ/.
3. Discussion
We suggest that the present acoustic data support the laryngeal characterization
of the fricatives in terms of the two binary features [s.g.] and [tense] in
Kim, Maeda, and Honda (2011), as in Kim et al. (2010): the lenis fricative
/s/ is specified as [-s.g., -tense] and the fortis fricative /s'/ as [-s.g., +tense].
First, the present acoustic data on aspiration duration confirm that the
two fricatives /s, s'/ are specified as [-s.g.], because aspiration noise occurs
during transitions from a fricative to a following vowel both word-initially
and word-medially, regardless of phonation type, as shown in Kim, Maeda,
and Honda (2011) and Kim et al. (2010). For example, regarding the tongue
apex-glottis phasing in /sasa/ and /s'as'a/, Kim, Maeda, and Honda (2011) have
noted that aspiration could arise when glottal width is less than the distance of
the tongue apex from the roof of the mouth, regardless of the phonation type
Table 6. The number of tokens with voicing (i) at the beginning of, (ii) up to the
middle of, and (iii) throughout the frication of the two fricatives /s, s'/ in
each vowel context (a) word-initially and (b) word-medially.

a. Word-initial position
         /s/                 /s'/
         (i)   (ii)  (iii)   (i)   (ii)  (iii)
/_a/     29    18    3       33    17    0
/_ɛ/     32    16    1       38    12    0
/_ʌ/     37    11    2       35    15    0
/_e/     31    17    2       32    17    1
/_o/     30    19    1       39    11    0
/_u/     46    4     0       38    12    0
/_ɨ/     39    11    0       32    18    0
/_i/     33    16    1       33    17    0
total:   249   112   10      280   119   1

b. Word-medial position
         /s/                 /s'/
         (i)   (ii)  (iii)   (i)   (ii)  (iii)
/a_a/    22    17    11      31    18    1
/ɛ_ɛ/    24    17    9       40    10    0
/ʌ_ʌ/    27    14    9       34    15    1
/e_e/    29    14    7       37    13    0
/o_o/    26    22    2       36    14    0
/u_u/    34    10    10      46    3     1
/ɨ_ɨ/    36    11    3       43    7     0
/i_i/    31    12    7       34    15    1
total:   229   113   58      301   95    4
of the two fricatives. Thus, aspiration could arise not only word-medially
but also word-initially during transitions from a fricative to a vowel and
from a vowel to a fricative, but its duration is likely to be shorter after /s'/.
Shorter aspiration duration after /s'/ both word-initially and word-medially
was confirmed in the present study (Table 3), as well as in Kim et al. (2010).
Longer aspiration in /s/ than in /s'/ in both word-initial and word-medial
positions, as in Table 3 and also in Kim et al. (2010), can be attributed to
a longer transition, or a slower speed of transition, due to the wider glottal
opening of /s/ than that of the fortis fricative, in line with Kim, Maeda, and
Honda (2011). That is, the wider glottal opening in the fricative /s/ takes a little
longer to achieve the adduction necessary for a following vowel, resulting
in longer aspiration than in the fortis fricative, which has a narrower glottal
opening.
Furthermore, note that aspiration duration is significantly longer after
/s/ in word-initial position than in word-medial position, whereas the
aspiration duration after /s'/ does not differ significantly between the two
positions, in the present study as in Kim et al. (2010). The longer aspiration
duration after /s/ in word-initial position than in word-medial position can be
attributed to a wider glottal opening during /s/ word-initially than word-medially.
As shown in Kim, Maeda, and Honda (2011), the glottal width of the lenis
fricative /s/ in word-initial position is almost twice as large as that in word-
medial position in their two subjects. In contrast, the glottal width of the
fortis fricative /s'/ does not change much, no matter whether the fricative is
in word-initial or word-medial position in Kim, Maeda, and Honda (2011).
Thus, we can expect the aspiration duration after /s'/ not to differ significantly
between word-initial and word-medial position, as in the present study and
also in Kim et al. (2010).
Moreover, it is noteworthy that in word-initial position, aspiration duration
after the fricative /s/ is significantly longer before /a/ than before /i/, /ɨ/ and
/u/ in the present study. This receives the same account, in line with Kim,
Maeda, and Honda (2011): a longer transition, or a slower speed of transition,
is expected from the fricative to the low vowel than to the high vowels,
because the distance of the tongue apex from the roof of the mouth is greater
at the transition from /s/ to the low vowel /a/ than to the high vowels. Recall
also that aspiration duration after the two fricatives is significantly longer in
/a_a/ than in /i_i/, /ɨ_ɨ/ and /u_u/ in word-medial position. The same account
can be given for the longer aspiration duration after /s/ and /s'/ in /a_a/.
Given the data on aspiration duration in the present study and also in Kim
et al. (2010), as well as the articulatory study in Kim, Maeda, and Honda
(2011), we can say that aspiration does not give rise to the distinction of the
two fricatives. Therefore, both the lenis /s/ and the fortis /s'/ are specified
as [-s.g.] (see H. Kim (2011) for phonological arguments for the feature
specification of [-s.g.] in the fricatives).
The second piece of evidence for the feature [tense], as proposed in Kim,
Maeda, and Honda (2011), comes from the data on frication duration in
Tables 1 and 2. In the MRI study, it was found that during oral constriction,
/s'/ as opposed to /s/ occurs with a longer oral constriction with the apex
being closer to the roof of the mouth, longer pharyngeal width and a longer
highest tongue blade. The difference in oral constriction duration between
the two fricatives is acoustically correlated with that in frication duration
in the present study (see also Cho, Jun, and Ladefoged (2002) for longer
frication duration in /s'/ than in /s/ when followed by the vowel /a/). Thus,
the frication duration of the fricatives is considered an acoustic correlate of
the feature [tense].
As for the difference in frication duration between /s/ and /s'/, which can
be expressed by virtue of the feature [tense], we can refer to recent perception
studies (e.g., S. Kim 1999; Lee and Iverson 2006) and loanword data (H.
Kim 2007, 2009) in Korean. According to S. Kim (1999), Korean speakers
are sensitive to acoustic differences in the frication duration of English [s].
Given that the frication duration is shorter in the English [s] in consonant
clusters than in the single [s] (e.g., Klatt 1974), Korean speakers perceived
the English [s] in consonant clusters as the lenis fricative /s/ and the single
[s] as the fortis fricative /s'/. The results of the perception studies can be
explained by reference to the feature [tense], as in the Korean adaptation
of English [s] (H. Kim 2007, 2009). As shown in (2a), the English single
[s], which is longer than [s] in consonant clusters, is borrowed as the fortis
fricative /s'/ into Korean. Yet, short [s] in consonant clusters in English is
borrowed as the lenis fricative /s/, as in (2b).
(2) Korean treatment of the English [s]
       English words     Korean adapted forms
    a. salad             s'ɛl.lʌ.tɨ
       sign              s'a.in
       excite            ik.s'a.i.tʰɨ
       bus               pʌ.s'ɨ
       kiss              kʰi.s'ɨ
    b. sky               sɨ.kʰa.i
       snap              sɨ.nɛp
       disco             ti.sɨ.kʰo
       display           ti.sɨ.pʰɨl.le.i
According to H. Kim (2007, 2009), the subphonemic duration difference
in the English [s] is interpreted in Korean in terms of the feature [tense].
Hence, the longer duration of the English single [s] is interpreted as a cue
to the [+tense] fricative /s'/, while the shorter duration of the English [s]
occurring in consonant clusters is a cue to the [-tense] fricative /s/. The same
is true of Korean adaptation of the French [s]. Similar to English [s], the
French [s] has a shorter frication duration in consonant clusters than when it
is a single [s] (e.g., O'Shaughnessy 1981). As in the Korean adaptation of the
English [s] in (2), H. Kim (2007) has proposed that the acoustic difference
in frication duration between the fricatives in terms of the feature [tense]
also plays a major role when Korean speakers borrow the French fricative
[s]. Thus, as shown in (3a), the French single [s], which is longer than [s] in
consonant clusters, is borrowed as the fortis fricative /s'/ into Korean. Yet,
the shorter [s] occurring in consonant clusters in French is borrowed as the
lenis fricative /s/, as in (3b).
(3) Korean treatment of the French [s]
       French words      Korean adapted forms
    a. Sartre            s'a.lɨ.tʰɨ.lɨ
       Sorbonne          s'o.lɨ.pon.nɨ
       Seine             s'en.nɨ
       Nice              ni.s'ɨ
       Provence          pʰɨ.lo.paŋ.s'ɨ
    b. Bastille          pa.sɨ.tʰi.ju
       Pasteur           pʰa.sɨ.tʰe.lɨ
       Jospin            tso.sɨ.pʰɛŋ
       Basque            pa.sɨ.kʰɨ
In short, Korean adaptation of the English and French [s] (H. Kim 2007,
2009), as well as perception studies (e.g., S. Kim 1999; Lee and Iverson
2006) indicates that the difference in frication duration between /s/ and /s'/
gives rise to the distinction of the two fricatives. Moreover, it is expressed by
reference to the feature [tense].
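The adaptation pattern in (2) and (3) can be restated as a simple decision rule. The sketch below is our toy illustration, not the authors' formalism: the only cue consulted is whether the source [s] stands in a consonant cluster, a proxy for its shorter frication duration:

```python
# Toy restatement of the loanword generalization: an [s] in a cluster (shorter
# frication) is adapted as lenis /s/ ([-tense]); a singleton [s] (longer
# frication) is adapted as fortis /s'/ ([+tense]).
def adapt_s(in_cluster: bool) -> str:
    """Korean fricative category chosen for a source-language [s]."""
    return "/s/ [-tense]" if in_cluster else "/s'/ [+tense]"

examples = [
    ("sky", True), ("snap", True), ("Basque", True),      # clusters -> lenis
    ("salad", False), ("Nice", False), ("Seine", False),  # singletons -> fortis
]
for word, in_cluster in examples:
    print(f"{word}: {adapt_s(in_cluster)}")
```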
The third type of evidence for the proposed specication of Korean /s,
s'/ concerns F0 in the vowel onset after the fricatives which, in the present
study, does not show the systematic variation found for stops. In the literature
it has been reported that the onset value of F0 after fortis and aspirated stops
is higher than that of lenis ones in Korean (e.g., C.-W. Kim 1965; Han and
Weitzman 1970; Hardcastle 1973; Kagaya 1974; Cho, J un, and Ladefoged
2002; M.-R. Kim, Beddor, and Horrocks 2002 among others). The systematic
variation of the onset value of F0 after Korean stop consonants is articulatorily
confirmed in the MRI data on larynx raising in Korean labial, coronal and
dorsal stops (Kim, Honda, and Maeda 2005; Kim, Maeda, and Honda 2010):
the glottis rises from low to high in the order lenis <aspirated (<) fortis stop
consonants.
However, there is no significant difference in F0 at the voice onset of
each vowel after /s/ and /s'/, as shown in Table 5. In particular, the lack of a
signicant difference in F0 at the voice onset of the vowel /a/ after /s/ and /s'/
is reminiscent of the MRI data on glottal height of the two fricatives in Kim,
Maeda, and Honda (2011) where two subjects of Seoul Korean (one male
and one female) participated. When the fricatives were put in the contexts
/ma_a/ and /_a_a/, it was found in the articulatory study that the highest
glottal position is the same during the oral constriction of /s/ and /s'/ in the
female subject. In the case of the male subject, the maximum glottal height
during /s'/ is the same as that during /s/ in the word-medial position in the
context /_a_a/. On the other hand, in the MRI study, the male subject had a
higher glottal position during /s'/ in /ma_a/ and word-initially in /_a_a/. From
this, one might expect F0 to be higher at the onset of a vowel after /s'/ than
after /s/. For example, Cho, J un, and Ladefoged (2002: 215) found that F0 is
generally higher for /s'/ than for /s/ at p <0.1 (see also the EMG studies of
Hirose, Park, and Sawashima (1983) and Hong, Niimi, and Hirose (1991)).
In their EMG study of word-initial plosive and fricative consonants, Hong,
Niimi, and Hirose (1991) found that the thyroarytenoid (VOC) is higher in
/s'/ than it is in the lenis fricative, just as in /p/ vs. /p/, /t/ vs. /t'/ and /k/ vs.
/k/. However, Hirose, Park, and Sawashima (1983) showed that there is no
difference in VOC activity for the two types of fricatives in their EMG study
of word-initial plosive and fricative consonants.)
Given that either a higher F0 after /s'/ or the same F0 after /s'/ and /s/ is
expected from the MRI data on glottal height during the two fricatives,
we can cautiously say that F0 might not reflect the laryngeal characteristics
of the fricatives in the way it does for stop consonants. Rather, in that the
same glottal position tends to be sustained longer during /s'/ than during /s/,
and that frication duration, which is articulatorily correlated with oral
constriction, is most likely to be significantly longer during /s'/, we suggest
that the laryngeal characteristics of the fricatives occur during oral
constriction, as shown in Kim, Maeda, and Honda (2011) (see also H. Kim
(2011)). This is further supported by the aerodynamic data in Kim et al.
(2010), according to which airflow resistance (that is, oral-constriction
resistance) is significantly greater for /s'/ than for /s/ during oral constriction.
Finally, the present study shows that voicing in the intervocalic fricative
/s/ cannot be equated with the voicing occurring with intervocalic lenis
plosives, as in Kim et al. (2010). Cho, J un, and Ladefoged (2002) propose
that the fricative /s/ is lenis because about 47% of their /s/ tokens were fully
voiced in intervocalic position, like lenis stops. In their acoustic study of
the fricatives, four male speakers of Seoul Korean in their late 50s and early
60s living in Los Angeles, USA, and eight male speakers of Cheju Korean
in their mid 50s and mid 70s read eight test words in a frame sentence six
times and twice, respectively. Thus, the total number of their tokens was
320. In addition, Cho, J un, and Ladefoged (2002:213) have suggested that
An acoustic study of the Korean fricatives /s, s/ 191
voicing for /s/ occurs over a wide range varying in degree, as it does with
lenis stops. That is, full voicing was observed throughout the frication of
/s/, as well as partial voicing during frication. Therefore, partial voicing was
observed at 20, 40, 60 and 80% of frication duration, yet its frequency is less
than 20%.
However, we have already noted in Figure 2 that voicing occurs not only
in /s/ but also in /s'/ word-initially and word-medially, as in Kim et al. (2010).
In addition, the frequency of voicing at the beginning of frication duration is
much higher than that up to the middle of frication duration, and complete
voicing rarely occurs throughout the frication of the two fricatives.
This indicates that the presence/absence of intervocalic voicing in /s/
cannot be a criterion for determining whether or not the fricative /s/ is lenis.
Rather, the much higher frequency of voicing at the beginning of the frication
of the two fricatives may be due to neighboring vowels. For example, as
shown in Figure 1, voice bars continue during transition, that is, aspiration
after the vowel // and even at the beginning of the frication of the intervocalic
fricative /s/. This is also true of voicing during transition and at the beginning
of the frication of /s'/. However, the vocal fold vibration drastically weakens
in the middle of frication both word-initially and word-medially, no matter
whether it is /s/ or /s'/, because of continuous airflow during oral constriction.
We can attribute the more complete voicing throughout the frication of /s/
but not in /s'/ in intervocalic position (Figure 2b, Table 6b) to less tensing of
the vocal folds in the lenis fricative ([-tense]) than in the tense one, i.e. /s'/
([+tense]).
4. Conclusion
In the present study, we have examined frication duration of the two fricatives
/s/ and /s'/, aspiration duration after the fricatives, F0 at the voice onset of a
vowel following the fricatives and voicing of the fricatives not only in word-
initial position but also in word-medial position. From our acoustic data, we
have found that frication duration is significantly longer in /s'/ than in /s/
both word-initially and word-medially. We have also found that aspiration
occurs during the transition from a fricative to a following vowel, regardless
of the phonation type of fricatives. Moreover, the results indicated that the
comparison of the onset value of F0 after /s/ and /s'/ shows no statistical
significance in either vowel context, and that voicing occurs during the
frication of /s/ and /s'/ in both word-initial and word-medial positions.
192 Hyunsoon Kim and Chae-Lim Park
Based on the results of our present acoustic data, we have suggested that aspiration
after the fricatives has nothing to do with the laryngeal characterization
of the fricatives, which are specified as [-s.g.]. Further, the difference in
frication duration between the fricatives is expressed by reference to the
feature [tense]. We have also suggested that the laryngeal characteristics of
the fricatives occur during the oral constriction rather than in F0 at the voice
onset of a vowel after the fricatives. Finally, voicing during frication does not
carry the laryngeal characterization of the fricatives.
Acknowledgements
We would like to express our thanks to our subjects. Any errors remain our
own.
References
Cho, Taehong, Sun-Ah Jun, and Peter Ladefoged
2002 Acoustic and aerodynamic correlates of Korean stops and fricatives.
Journal of Phonetics 30: 193–228.
Fant, Gunnar
1960 Acoustic Theory of Speech Production. The Hague: Mouton & Co.
Halle, Morris, and Kenneth N. Stevens
1971 A note on laryngeal features. Quarterly Progress Report 101: 198–212.
Cambridge, MA: Research Laboratory of Electronics, MIT.
Han, M. S., and R. S. Weitzman
1970 Acoustic features of Korean /P, T, K/, /p, t, k/ and /ph, th, kh/. Phonetica
22: 112–128.
Hardcastle, W. J.
1973 Some observations on the tense-lax distinction in initial stops in
Korean. Journal of Phonetics 1: 263–272.
Hirose, Hajime, Hea Suk Park, and Masayuki Sawashima
1983 Activity of the thyroarytenoid muscle in the production of Korean
stops and fricatives. Annual Bulletin Research Institute of Logopedics
and Phoniatrics 17: 73–81. University of Tokyo.
Hong, Kihwan, Seiji Niimi, and Hajime Hirose
1991 Laryngeal adjustments for Korean stops, affricates and fricatives:
an electromyographic study. Annual Bulletin Research Institute of
Logopedics and Phoniatrics 25: 17–31. University of Tokyo.
Kagaya, Ryohei
1974 A fiberscopic and acoustic study of the Korean stops, affricates and
fricatives. Journal of Phonetics 2: 161–180.
Kent, Raymond D., and Charles Read
2002 Acoustic Analysis of Speech. 2nd ed. San Diego, CA: Singular
Publishing Group.
Kim, Chin-Wu
1965 On the autonomy of the tensity feature in stop classification (with
special reference to Korean stops). Word 21: 339–359.
Kim, Hyunsoon
2007 A feature-driven model of loanword adaptation: Evidence from
Korean. (submitted).
2009 Korean adaptation of English affricates and fricatives in a feature-
driven model of loanword adaptation. In Andrea Calabrese & W. Leo
Wetzels (eds.), Loan Phonology, 155–180. Amsterdam/Philadelphia:
John Benjamins Publishing Company.
2011 What features underlie the /s/ vs. /s'/ contrast in Korean? Phonetic
and phonological evidence. In G. Nick Clements & Rachid Ridouane
(eds.), Where Do Phonological Contrasts Come From? Cognitive,
Physical and Developmental Bases of Distinctive Speech Categories,
99–130. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Kim, Hyunsoon, Kiyoshi Honda, and Shinji Maeda
2005 Stroboscopic-cine MRI study of the phasing between the tongue and
the larynx in the Korean three-way phonation contrast. Journal of
Phonetics 33: 1–26.
Kim, Hyunsoon, Shinji Maeda, and Kiyoshi Honda
2010 Invariant articulatory bases of the features [tense] and [spread glottis]
in Korean: New stroboscopic cine-MRI data. Journal of Phonetics 38:
90–108.
2011 The laryngeal characterization of Korean fricatives: Stroboscopic cine-
MRI data. Journal of Phonetics, doi:10.1016/j.wocn.2011.06.001.
Kim, Hyunsoon, Shinji Maeda, Kiyoshi Honda, and Stéphane Hans
2010 The laryngeal characterization of Korean fricatives: Acoustic and
aerodynamic data. In Susanne Fuchs, Martine Toda & Marzena Zygis
(eds.), Turbulent Sounds: An Interdisciplinary Guide, 143–166. Berlin/
New York: Mouton de Gruyter.
Kim, Mi-Ryoung, Patrice Beddor, and Julie Horrocks
2002 The contribution of consonantal and vocalic information to the
perception of Korean initial stops. Journal of Phonetics 30: 77–100.
Kim, Soohee
1999 Subphonemic duration difference in English /s/ and few-to-many
borrowing from English to Korean. Ph.D. diss., University of
Washington.
Klatt, Dennis
1974 The duration of [s] in English words. Journal of Speech and Hearing
Research 17: 51–63.
Lee, Ahrong, and Gregory K. Iverson
2006 Variation in Korean integration of English word-final /s/. Euhakyenkwu
[Language Research] 42: 239–251. Seoul: Seoul National University.
O'Shaughnessy, Douglas
1981 A study of French vowel and consonant durations. Journal of
Phonetics 9: 385–406.
Stevens, Kenneth N.
1998 Acoustic Phonetics. Cambridge, Mass.: MIT Press.
Autosegmental spreading in Optimality Theory
John J. McCarthy
1. Introduction
Nick Clements's contributions to phonological theory have profoundly
influenced my own work as well as that of many others. Among these
contributions is his formalization of the principles of autosegmental
association. The core idea of autosegmental phonology is that the pieces of
phonological representation (tones, segments, and features) are separate but
coordinated by association lines (Goldsmith 1976a, 1976b). The principles
of autosegmental association in Clements and Ford (1979) define an
initial or default association that can be altered by subsequent phonological
rules.
Among these rules is autosegmental spreading. Spreading of a feature
or tone increases its temporal span; in short, spreading is assimilation or
harmony. For example, in Johore Malay nasal harmony (1), the feature
[nasal] spreads rightward to vowels and glides.
(1) Nasal harmony in Johore Malay (Onn 1980)
mãʔãp 'pardon'
pəŋãw̃ãsan 'supervision'
mə̃ratappi 'to cause to cry'
baŋõn 'to rise'
In most implementations of autosegmental phonology, spreading is obtained
by iterative application of rules like (2), whose effect in Johore Malay is
schematized in (3):

(2) Autosegmental spreading rule

      [+nas]
        |    \
     [+seg] [-cons]        Direction: left to right
(3) /mawara/ → [mãw̃ãra]

        [+nas]
       / |  |  \
      m  a  w  a  r  a
Iterative rules apply to their own output, proceeding directionally until no
further changes can be made (Anderson 1980; Howard 1972; Johnson 1972;
Kenstowicz and Kisseberth 1977; and others). Spreading therefore continues
until it runs out of segments or is blocked by a segment with an incompatible
feature specification (e.g., true consonants in Johore Malay).
Although Optimality Theory (Prince and Smolensky [1993] 2004) has
no direct equivalent to spreading rules, OT markedness constraints that
favor candidates with spreading have been used in analyses of harmony
phenomena. It turns out (section 2) that standard proposals for the
pro-spreading markedness constraint make implausible typological predictions.
This leads in section 3 to a new proposal with two novel elements:
(i) The motive for harmony is a constraint on autosegmental
representations, SHARE(F), that is violated by any pair of adjacent segments
that are not linked to the same [F] autosegment.
(ii) Harmony and all other phonological processes occur serially rather
than in parallel. This assumption is a consequence of adopting
Harmonic Serialism as the overall analytic framework.
I will refer to this theory as Serial Harmony (SH). After explaining SHARE(F)
and Harmonic Serialism in section 3, I go on in sections 4 and 5 to show how
this system eliminates the problems with previous approaches described in
section 2.
Throughout this chapter, I often illustrate problems and results by using
variations on the Johore Malay nasal harmony pattern in (1). This is just a
matter of convenience. Neither the problems that I address nor SH as a whole
are specific to nasal harmony; rather, they pertain to the range of phenomena
attributable to iterative autosegmental spreading.
2. Problems with current approaches to spreading in OT
If unimportant details are set aside, then there are only two main approaches
to the pro-spreading markedness constraint in OT: local AGREE and
long-distance ALIGN. Both have problems.
2.1. Local AGREE
The constraint AGREE is perhaps closest conceptually to iterative rules like
(2). AGREE(F) says that, if a segment bears the feature-value [F], then the
immediately preceding/following segment must also bear that feature value
(Bakovic 2000; Eisner 1999; Lombardi [1995] 2001, 1999; Pulleyblank
2004). A directional version of AGREE, appropriate for Johore Malay, appears
in (4):
(4) AGREE-R([nasal])
In a sequence of adjacent segments xy, if x is associated with [nasal],
then y is also associated with [nasal].
The [ŋa] sequence in *[pəŋawasan] violates this constraint because the
[nasal] feature of the [ŋ] is not shared with the immediately following [a].
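A quick way to see how a constraint like AGREE-R([nasal]) evaluates candidates is to count the offending pairs directly. The sketch below is my own illustration, not anything proposed in this chapter: it encodes a candidate as a list of booleans recording which segments are linked to [nasal].

```python
def agree_r_nasal(nasal_linked):
    """AGREE-R([nasal]) violations: one for each adjacent pair in which
    a [nasal]-linked segment is followed by a segment that is not linked."""
    return sum(1 for left, right in zip(nasal_linked, nasal_linked[1:])
               if left and not right)

# Hypothetical /mawa/ with full spreading from the initial nasal: no violations.
print(agree_r_nasal([True, True, True, True]))                   # 0
# Faithful [mawara]: only the m-a pair offends.
print(agree_r_nasal([True, False, False, False, False, False]))  # 1
# Spreading up to the blocking r: still exactly one offending pair.
print(agree_r_nasal([True, True, True, True, False, False]))     # 1
```

The tie between the last two counts is why a constraint of this kind cannot prefer partial spreading to no spreading at all.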
The problem with AGREE arises in languages where harmony is blocked.
Nasal harmony is often blocked by featural co-occurrence restrictions that,
in general, discountenance nasality in lower-sonority segments (Cohn 1993;
Piggott 1992; Pulleyblank 1989; Schourup 1972; Walker 1998). Walker
formalizes these restrictions in OT with the following universally fixed
constraint hierarchy:

(5) Nasalizability constraint hierarchy (after Walker 1998: 36)
*NASPLO >> *NASFRIC >> *NASLIQ >> *NASGLI >> *NASVOW

For example, *NASLIQ is violated by [r̃]. If AGREE-R([nasal]) is ranked
below *NASLIQ, then liquids will not undergo harmony. Under the further
assumption that nasal spreading cannot skip over segments, liquids will
block the propagation of nasality. In Johore Malay, AGREE-R([nasal]) is
ranked between *NASLIQ and *NASGLI.
AGREE fails because it has a sour-grapes property: it favors candidates
with spreading that is fully successful, but it gives up on candidates where
spreading is blocked (McCarthy 2003; Wilson 2003, 2004, 2006). For this
reason, it predicts for Johore Malay that hypothetical /mawa/ will become
[mãw̃ã], with total harmony, but hypothetical /mawara/ will become
[mawara], with no harmony at all. The tableaux in (6) and (7) illustrate this
prediction. When all AGREE violations can be eliminated (6), then they are.
But when a blocking constraint prevents complete spreading (7), there is no
spreading at all. (The sequences that violate AGREE have been underlined to
make them easy to find. Tableaux are in comparative format (Prince 2002).)
(6) AGREE without blocker

/mawa/          *NASLIQ   AGREE-R([nas])   IDENT([nas])
a. → mãw̃ã                                  3
b.   mawa                 1 W              L
c.   mãwa                 1 W              1 L
d.   mãw̃a                 1 W              2 L
(7) Sour-grapes effect of AGREE with blocker

/mawara/        *NASLIQ   AGREE-R([nas])   IDENT([nas])
a. → mawara               1
b.   mãwara               1                1 W
c.   mãw̃ara               1                2 W
d.   mãw̃ãra               1                3 W
e.   mãw̃ãr̃a     1 W       1                4 W
f.   mãw̃ãr̃ã     1 W       L                5 W
The intended winner is [mãw̃ãra] in (7d), but it is harmonically bounded by
the candidate with no spreading, [mawara] in (7a). Therefore, the intended
winner cannot actually win under any ranking of these constraints.
Clearly, AGREE is unable to account for real languages like Johore Malay.
Worse yet, it predicts the existence of languages with sour-grapes spreading
like (6) and (7), and such languages are not attested.
A devotee of AGREE might offer to solve this problem by building the
blocking effect into the AGREE constraint itself, instead of deriving this
effect from interaction with higher-ranking constraints like *NASLIQ. In
Johore Malay, for instance, the AGREE constraint would have to prohibit
any sequence of a nasal segment immediately followed by an oral vowel or
glide: *[+nasal][-cons, -nasal]. Since [mãw̃ãra] satisfies this constraint but
no candidate with less spreading does, it would do the job.
This seemingly innocent analytic move misses the point of OT (Wilson
2003, 2004). The fundamental descriptive and explanatory goals of OT are
(i) to derive complex patterns from the interaction of simple constraints
and (ii) to derive language typology by permuting rankings. If AGREE in
Johore Malay is defined as *[+nasal][-cons, -nasal], then we are deriving a
more complex pattern by complicating a constraint and not by interaction.
That becomes apparent when we look at a language with a different set of
blockers, such as Sundanese (Anderson 1972; Robins 1957). Because glides
are blockers in Sundanese, a slightly different AGREE constraint will be
required. If we adopt this constraint, then we are deriving language typology
by constraint parametrization rather than ranking permutation. The move of
redefining AGREE to incorporate the blocking conditions, while technically
possible, is antithetical to sound explanation in OT.
2.2. Long-distance ALIGN
Alignment constraints require that the edges of linguistic structures coincide
(McCarthy and Prince 1993; Prince and Smolensky 2004). When alignment
constraints are evaluated gradiently, they can discriminate among candidates
that are imperfectly aligned.
Gradient alignment constraints have often been used to enforce
autosegmental spreading by requiring an autosegment to be associated with the
leftmost or rightmost segment in some domain (Archangeli and Pulleyblank
2002; Cole and Kisseberth 1995a, 1995b; Kirchner 1993; Pulleyblank 1996;
Smolensky 1993; and many others). In Johore Malay, the gradient constraint
ALIGN-R([nasal], word) ensures that every [nasal] autosegment is linked
as far to the right as possible. In (8), the rightward spreading of [nasal] is
indicated by underlining the segments associated with it:
(8) ALIGN-R([nasal], word) illustrated

/mawara/        *NASLIQ   ALIGN-R([nasal], word)   IDENT([nasal])
a.   mawara               5 W                      L
b.   mãwara               4 W                      1 L
c.   mãw̃ara               3 W                      2 L
d. → mãw̃ãra               2                        3
e.   mãw̃ãr̃a     1 W       1 L                      4 W
f.   mãw̃ãr̃ã     1 W       L                        5 W
Candidate (8d) wins because its [nasal] autosegment is linked to a segment
that is only two segments away from the right edge of the word. (Diagram (3)
illustrates.) In candidates with more ALIGN violations, [nasal] has not spread
as far, whereas candidates with fewer ALIGN violations contain the forbidden
segment *[r̃].
The blocking situation illustrated in (8) is the source of ALIGN's problems
as a theory of spreading in OT, as Wilson (2003, 2004, 2006) has shown.
ALIGN creates an impetus to minimize the number of peripheral segments
that are inaccessible to harmony because of an intervening blocker. Many
imaginable ways of doing that (such as deleting segments, forgoing
epenthesis, or choosing shorter allomorphs) are unattested but predicted
to be possible under ranking permutation. These wrong predictions will be
discussed in section 5, after SH has been presented.
3. The proposal: Serial Harmony
The theory of Serial Harmony (SH) has two novel elements: a proposal
about the constraint that favors autosegmental spreading (section 3.1), and a
derivational approach to phonological processes (section 3.2).
The proposal is worked out here under the assumption that distinctive
features are privative, since this seems like the most plausible view (see
Lombardi 1991; Steriade 1993a, 1993b, 1995; Trigo 1993; among others).
Whether this proposal can be made compatible with equipollent features
remains to be determined.
3.1. Autosegmental spreading in SH
We saw in section 2 that the markedness constraint favoring autosegmental
spreading is a crucial weakness of previous approaches to harmony in OT.
SH's constraint looks somewhat like one of those earlier constraints, AGREE,
but there are important differences as a result of other assumptions I make.
The constraint SHARE(F) requires adjacent elements (here, segments) to
be linked to the same [F] autosegment:
(9) SHARE(F)
Assign one violation mark for every pair of adjacent elements that
are not linked to the same token of [F].
Example (10) illustrates the only way that a pair of adjacent segments can
satisfy this constraint, while example (11) shows the several ways that a pair
of segments can violate it. Below each form I show the simplied notation
I will be using in the rest of this chapter.
Autosegmental spreading in Optimality Theory 201
(10) Example: SHARE([nasal]) obeyed

      [nas]
      /   \
     m     ã
    [mã]

(11) Examples: SHARE([nasal]) violated

a.  [nas]
      |
      m   a
    [m|a]

b.      [nas]
          |
      b   ã
    [b|ã]

c.  [nas] [nas]
      |     |
      m     ã
    [m|ã]

d.
      b   a
    [b|a]
The three kinds of SHARE violation exemplified in (11) are: in (a) and (b), a
[nasal] autosegment is linked to one segment but not the other; in (c), each
segment is linked to a different [nasal] autosegment; in (d), neither segment is
linked to a [nasal] autosegment. In the simplified notation, these violations
are indicated by a vertical bar between the offending segments.
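One convenience of the simplified notation is that SHARE violations can be read off mechanically: each bar is one violation. The snippet below is my own illustration (ASCII tildes stand in for the nasalization diacritics):

```python
def share_violations(form):
    """SHARE(F) violations in the simplified bar notation:
    one violation per bar separating adjacent segments."""
    return form.count('|')

print(share_violations('m|a'))    # 1, as in (11a)
print(share_violations('b|a~'))   # 1, as in (11b)
print(share_violations('m|a~'))   # 1, as in (11c)
print(share_violations('b|a'))    # 1, as in (11d)
print(share_violations('ma~'))    # 0, as in (10)
```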
Like ALIGN-R([nasal], word), which it replaces, SHARE([nasal]) favors
(10) over (11a) and (11c). Unlike ALIGN-R([nasal], word), SHARE([nasal])
also favors (10) over (11d), the form with no [nasal] feature to spread.
This difference is addressed in section 3.2. And because it has no inherent
directional sense, SHARE([nasal]) disfavors (11b) as much as (11a), whereas
ALIGN-R([nasal], word) finds (11b) inoffensive. Limitations of space
do not permit me to present SH's theory of directionality, which is an
obvious extension of recent proposals that the source segment in
autosegmental spreading is the head of the featural domain (Cassimjee and Kisseberth
1997; Cole and Kisseberth 1995a; McCarthy 2004; Smolensky 1995, 1997,
2006).
3.2. SH and Harmonic Serialism
Harmonic Serialism (HS) is a version of OT in which GEN is limited to making
one change at a time. Since inputs and outputs may differ in many ways, the
output of each pass through HSs GEN and EVAL is submitted as the input to
another pass through GEN and EVAL, until no further changes are possible. HS
was briefly considered by Prince and Smolensky (2004), but then set aside.
Lately, the case for HS has been reopened (see Jesney to appear; Kimper to
appear; McCarthy 2000, 2002, 2007a, 2007b, 2007c, 2008a, 2008b; Pater to
appear; Pruitt 2008; Wolf 2008). Besides Prince and Smolensky's work, HS
also has connections with other ideas about serial optimization (e.g., Black
1993; Chen 1999; Goldsmith 1990; 1993; Kenstowicz 1995; Kiparsky 2000;
Norton 2003; Rubach 1997; Tesar 1995).
An important aspect of the on-going HS research program is determining
what it means to make 'one change at a time'. Answering this question for
the full range of phonological phenomena is beyond the scope of this chapter,
but before analysis can proceed it is necessary to adopt some assumptions
about how GEN manipulates autosegmental structures:
(12) Assumptions about GEN for autosegmental phonology in HS

GEN's set of operations consists of:
a. Insertions:
A feature and a single association line linking it to some pre-
existing structure.
A single association line linking two elements of pre-existing
structure.
b. Deletions:
A feature and a single association line linking it to some pre-
existing structure.
An association line linking two elements of pre-existing
structure.
Under these assumptions, GEN cannot supply a candidate that differs from the
input by virtue of, say, spreading a feature from one segment and delinking
it from another. This means that feature 'flop' processes require two steps in
an HS derivation (McCarthy 2007a: 91–93).
3.3. SH exemplified
We now have sufficient resources to work through an example in SH. The
grammar of Johore Malay maps /mawara/ to [mãw̃ãra] by the succession of
derivational steps shown in (13). At each step, the only candidates that are
considered are those that differ from the step's input by at most one
GEN-imposed change. The grammar evaluates this limited set of candidates in
exactly the same way as in parallel OT. The optimal form then becomes the
input to another pass through GEN, and so on until the unchanged candidate
wins (convergence).
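The GEN-EVAL loop just described can be sketched as a small program. The sketch below is my own illustration, not OT-Help or anything in the chapter: GEN is restricted to the single operation of adding one association line (the insertion and deletion operations of (12) are omitted), a candidate is a vector of booleans recording which segments are linked to the single [nasal] token, and EVAL compares violation vectors under the ranking of tableau (13).

```python
LIQUIDS, GLIDES, VOWELS = set("rl"), set("wj"), set("aeiou")

def violations(step_input, segs, linked):
    """Violation vector under the ranking in (13):
    *NASLIQ >> SHARE([nasal]) >> *NASGLI >> *NASVOW >> IDENT([nas])."""
    nas = lambda cls: sum(1 for s, n in zip(segs, linked) if n and s in cls)
    share = sum(1 for a, b in zip(linked, linked[1:]) if not (a and b))
    ident = sum(1 for a, b in zip(step_input, linked) if a != b)
    return (nas(LIQUIDS), share, nas(GLIDES), nas(VOWELS), ident)

def gen(linked):
    """One-change GEN: the faithful candidate plus every candidate that adds
    a single association line to a segment adjacent to a linked one."""
    yield tuple(linked)
    for i, n in enumerate(linked):
        if n:
            for j in (i - 1, i + 1):
                if 0 <= j < len(linked) and not linked[j]:
                    cand = list(linked)
                    cand[j] = True
                    yield tuple(cand)

def derive(segs, linked):
    """Iterate GEN and EVAL until the faithful candidate wins (convergence)."""
    linked = tuple(linked)
    while True:
        best = min(gen(linked), key=lambda c: violations(linked, segs, c))
        if best == linked:
            return list(best)
        linked = best

print(derive("mawara", [True, False, False, False, False, False]))
# -> [True, True, True, True, False, False]
```

Run on /mawara/ with only the initial /m/ linked to [nasal], the loop spreads one segment per step and converges once spreading onto the liquid would violate top-ranked *NASLIQ, matching the derivation in (13).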
(13) SH derivation of /mawara/ → [mãw̃ãra] (cf. (8))

Step 1
m|a|w|a|r|a       *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. → mã|w|a|r|a             4                          1         1
b.   m|a|w|a|r|a            5 W                        L         L
c.   b|a|w|a|r|a            5 W                        L         1

Step 2
mã|w|a|r|a        *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. → mãw̃|a|r|a              3                1         1         1
b.   mã|w|a|r|a             4 W              L         1         L

Step 3
mãw̃|a|r|a         *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. → mãw̃ã|r|a               2                1         2         1
b.   mãw̃|a|r|a              3 W              1         1 L       L

Step 4: Convergence
mãw̃ã|r|a          *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. → mãw̃ã|r|a               2                1         2
b.   mãw̃ãr̃|a      1 W       1 L              1         2         1 W
3.4. A difference between HS and parallel OT
HS's architecture imposes limitations on the kinds of mappings that languages
can perform. Recall that SHARE([nasal]) favors [mã] over [b|a]. In parallel
OT, SHARE([nasal]) can compel insertion and spreading of [nasal] to change
/b|a/ into [mã], as shown in tableau (14).
(14) Spontaneous nasalization with SHARE([nasal]) in parallel OT

b|a             SHARE([nas])   IDENT([nas])
a. → mã                        2
b.   b|a        1 W            L
c.   m|a        1 W            1 L
This prediction is obviously undesirable; languages with nasal harmony do
not also have spontaneous nasalization in oral words.
HS cannot produce this mapping with these constraints. (This claim has
been verified using OT-Help 2, which is described in section 5.) The winning
candidate [mã] differs from the input /ba/ by two changes: nasalization of
one of the segments and spreading of [nasal] to the other. In HS, these two
changes cannot be effected in a single pass through GEN. Starting with input
/b|a/, the candidate set after the first pass through GEN includes faithful [b|a]
and nasalized [m|a] or [b|ã] but not [mã], which has both inserted [nasal]
and spread it. Tableau (15) shows that SHARE([nasal]) does not favor either of
these unfaithful candidates over [b|a].
(15) Convergence to [b|a] on first pass through GEN and EVAL

/b|a/           SHARE([nas])   IDENT([nas])
a. → b|a        1
b.   m|a        1              1 W
c.   b|ã        1              1 W
Clearly, there is no danger of SHARE([nasal]) causing spontaneous nasalization,
since all three candidates violate this constraint equally.
This example typifies the difference between parallel OT and HS.
In parallel OT, the (spurious) advantage of spontaneous nasalization
and spreading is realized immediately, and so the unwanted /ba/ → [mã]
mapping is possible. In HS, however, any advantage accruing to spontaneous
nasalization must be realized without the benefit of spreading, which comes
later. HS has no capacity to look ahead to the more favorable result that
can be achieved by spreading once [nasal] has been inserted. Since none
of the constraints under discussion favors spontaneous nasalization, the
/ba/ → [mã] mapping is impossible in HS with exactly the same constraints
and representational assumptions that made it possible in parallel OT.
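The absence of look-ahead can be checked by brute force with the bar notation: none of the one-change candidates from b|a improves on SHARE, while the form that does improve requires two changes and is therefore never generated in one step. A small sketch of my own:

```python
share = lambda form: form.count('|')  # SHARE violations = bars

# One-change candidates from b|a: stay faithful, nasalize b, or nasalize a.
one_change = ['b|a', 'm|a', 'b|a~']
print([share(c) for c in one_change])  # [1, 1, 1]: no improvement yet
print(share('ma~'))                    # 0, but reachable only in two changes
```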
Differences like this between parallel OT and HS form the basis for most
arguments in support of HS in the literature cited at the beginning of this
section. This difference is also key to SH's ability to avoid the problems of
AGREE and ALIGN, as we will now see.
4. SH compared with AGREE
SH does not share AGREE's sour-grapes problem described in section 2.1. This
problem is AGREE's inability to compel spreading that is less than complete
because of an intervening blocking segment. AGREE has this problem because
it is not satisfied unless the feature or tone spreads all the way to the periphery.
That SHARE does not have this problem is apparent from (13). The
mapping /mawara/ → [mãw̃ãra] is exactly the kind of situation where
AGREE fails, since faithful [mawara] and the intended winner [mãw̃ãra] each
violate AGREE once. But SHARE deals with this situation successfully because
[m|a|w|a|r|a] has more violations than [mãw̃ã|r|a].
Another advantage of SHARE over AGREE is that it does not support feature
deletion as an alternative to spreading. The violation of AGREE in /mawara/
could be eliminated by denasalizing the /m/. Thus, AGREE predicts the
existence of a language where nasal harmony alternates with denasalization:
/mawa/ → [mãw̃ã] vs. /mawara/ → [bawara]. No such language exists, and
SHARE makes no such prediction. Step 1 of (13) shows that the mapping
/mawara/ → [bawara] (candidate (c)) is harmonically bounded by the faithful
mapping. Therefore, the constraints in (13), including SHARE([nasal]), can
never cause denasalization under any ranking permutation.
5. SH compared with ALIGN
As I noted in section 2.2, a constraint like ALIGN-R([nasal], word) could in
principle be satised not only by spreading [nasal] onto segments to its right
but also by other methods. Wilson (2003, 2004, 2006) has identied several
such methods, none of which actually occur. These pathologies, as he calls
them, are problematic for a theory of harmony based on ALIGN, though, as
I will argue, they are no problem in SH.
All of the pathologies have one thing in common: they minimize the
number of segments between the rightmost (or leftmost) segment in the
[nasal] span and the edge of the word. Deleting a non-harmonizing segment
comes to mind as one way of accomplishing that, but there are several
others, including metathesis, affix repositioning, blocking of epenthesis, and
selection of shorter allomorphs.
All of the claims in this section about what SH can and cannot do have
been verified with OT-Help 2 (Becker et al. 2009). There are principled
methods for establishing the validity of typological claims in parallel OT
(Prince 2006), but no such techniques exist for HS. Thus, typological claims
in HS, such as those in this section, can be confirmed only by following
all derivational paths for every ranking. OT-Help 2 implements an efficient
algorithm of this type. Moreover, it does so from a user-defined GEN and CON,
so it calculates and evaluates its own candidates, starting only with user-
specified underlying representations. In the present instance, the typologies
were calculated using all of the SH constraints in this chapter and operations
equivalent to autosegmental spreading, deletion, metathesis, epenthesis, and
morpheme spell-out, as appropriate.
5.1. Segmental deletion
This is the first of the pathologies that we will consider. Because
ALIGN-R([nasal], word) is violated by any non-harmonizing segment that follows a
nasal, it can be satisfied by deletion as well as spreading. Tableau (16) gives
the ranking for a language that deletes non-harmonizing /r/ (and perhaps
the vowel that follows it, depending on how ONSET is ranked). This type of
harmony has never been observed, to my knowledge.
(16) Harmony by deletion pathology with ALIGN

/mawara/        *NASLIQ   ALIGN-R([nasal], word)   MAX   IDENT([nas])
a. → mãw̃ã.ã                                        1     4
b.   mãw̃ãra               2 W                      L     3 L
d.   mãw̃ãr̃ã     1 W                                L     5 W
SH does not make this prediction. It does not by virtue of the hypothesis
that segmental deletion is the result of gradual attrition that takes place over
several derivational steps (McCarthy 2008a). This assumption is a very
natural one in light of developments in feature geometry (Clements 1985)
and parametric rule theory (Archangeli and Pulleyblank 1994). GEN can
perform certain operations on feature-geometric structures, among which is
deletion of feature-geometric class nodes. A segment has been deleted when
all of its class nodes have been deleted, one by one. Thus, what we observe as
total segmental deletion is the 'telescoped' (Wang 1968) result of a series
of reductive neutralization processes. This proposal explains why segmental
deletion is observed in coda position: codas are independently subject to
deletion of the Place and Laryngeal nodes.
With this hypothesis about segmental deletion, SH does not allow SHARE
(or ALIGN) to compel segmental deletion. The argument is similar to the one
in section 3.4: the first step in deleting a segment does not produce immediate
improvement in performance on SHARE, and HS has no look-ahead ability.
Imagine that the derivation has reached the point where [mãw̃ã|r|a] is the
input to GEN. The form [mãw̃ã|a], with outright deletion of [r] and consequent
elimination of a SHARE([nasal]) violation, is not among the candidates that
GEN emits. There is a candidate in which [r] has lost its Place node, but the
resulting Place-less segment still violates SHARE([nasal]).
The deletion pathology arises in parallel OT because GEN produces
candidates that differ from the underlying representation in many ways; for
instance, from /mawara/, it directly produces [mãw̃ã.ã], which is optimal
under the ranking in (16). In this tableau, [mãw̃ã.ã] is the global minimum
of potential for further harmonic improvement. Parallel OT always finds this
global minimum. HS's GEN is incapable of such fell-swoop derivations. As a
result, HS derivations sometimes get stuck at a local minimum of harmonic
improvement potential. The evidence here and elsewhere (McCarthy 2007b,
2008a) shows that it is sometimes a good thing to get stuck.
5.2. Metathesis
Though there are skeptics, metathesis really does seem to be securely attested
in synchronic phonology (Hume 2001). Certain factors are known to favor
metathesis (Ultan 1978), and it is clear that harmony is not among them. Yet
metathesis is a possible consequence of enforcement of ALIGN in parallel OT,
as tableau (17) shows. Here, [r] and final [a] have metathesized to make [a]
accessible to spreading of [nasal], thereby eliminating a violation of ALIGN.
(17) Metathesis pathology with ALIGN

/mawara/        *NASLIQ   ALIGN-R([nasal], word)   LINEARITY   IDENT([nas])
a. → mãw̃ã.ãr               1                        1           4
b.   mãw̃ãra               2 W                      L           3 L
c.   mãw̃ãr̃ã     1 W       L                        L           5 W
SH makes no such prediction. Metathesis and spreading are distinct operations
that require different derivational steps, so the winner in (17) is never among
the candidates under consideration. Imagine once again that the derivation
has reached the point where [mãw̃ã|r|a] is the input to GEN. The candidate
set includes [mãw̃ã|a|r], with metathesis, and [mãw̃ãr̃|a], with spreading, but
[mãw̃ã.ãr] is not possible at this step, because it differs from the input in two
distinct ways. This result is similar to the one in (15): because there is no
look-ahead, satisfaction of SHARE in HS will never be achieved with a two-
step derivation that first sets up the conditions that make spreading possible
and then spreads at the next step.
5.3. Epenthesis
Wilson also points out that parallel OT predicts a pathological interaction
between ALIGN and epenthesis. Because ALIGN disfavors segments that are
inaccessible to spreading, epenthesis into an inaccessible position is also
disfavored. For instance, suppose a language with nasal harmony also has
vowel epenthesis, satisfying NO-CODA by inserting [i]. Obviously, NO-CODA
dominates DEP. Suppose further that NO-CODA is ranked below
ALIGN-R([nasal], word). In that case, epenthesis will be prevented if the epenthetic
vowel is inaccessible to nasal harmony because of an intervening blocking
segment:
(18) ALIGN-R([nasal], word) preventing epenthesis

/mar/     *NASLIQ   ALIGN-R([nasal], word)   NO-CODA   DEP
a. → mãr             1                        1
b.   mãri            2 W                      L         1 W
c.   mãr̃ĩ   1 W      L                        L         1 W
Words that contain no nasals vacuously satisfy ALIGN-R([nasal], word), so
this constraint is irrelevant in such words. Thus, nasalless words are able to
satisfy NO-CODA by vowel epenthesis: /pas/ → [pasi]. Furthermore, words
that contain a nasal but no blockers will also undergo epenthesis, since the
epenthetic vowel is accessible to nasal spreading:
(19) No blocker: /maw/ → [mãw̃ĩ]

       /maw/      *NASLIQ   ALIGN-R([nasal], word)   NO-CODA   DEP
  a. ☞ mãw̃ĩ                                                    1
  b.   mãw̃                                          1 W       L
Autosegmental spreading in Optimality Theory 209
A language with this grammar would fit the following description: final
consonants become onsets by vowel epenthesis, unless preceded at any
distance by a nasal and a true consonant, in that order. This is an implausible
prediction.
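The categorical evaluation of ALIGN-R([nasal], word) assumed here can be sketched as follows (a toy model in Python, not from the chapter; segments are (symbol, nasal) pairs, and the count is one violation per segment separating the rightmost [nasal]-linked segment from the word edge):

```python
def align_r_nasal(form):
    # ALIGN-R([nasal], word): one violation per segment between the
    # rightmost [nasal]-linked segment and the right word edge;
    # vacuously satisfied when the word contains no [nasal] at all.
    nasal_ix = [i for i, (_, nas) in enumerate(form) if nas]
    return 0 if not nasal_ix else len(form) - 1 - nasal_ix[-1]

mar  = [('m', True), ('a', True), ('r', False)]      # [mãr]
mari = mar + [('i', False)]                          # [mã.ri]: epenthesis
pas  = [('p', False), ('a', False), ('s', False)]    # [pas]
pasi = pas + [('i', False)]                          # [pa.si]: epenthesis

assert align_r_nasal(mar) == 1 and align_r_nasal(mari) == 2  # epenthesis worsens ALIGN
assert align_r_nasal(pas) == 0 and align_r_nasal(pasi) == 0  # vacuous: no nasal
```

The asymmetry in the two assertion lines is exactly the implausible pattern in the text: epenthesis is penalized only when a nasal precedes a blocker, at any distance.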
Epenthesis of a vowel and spreading of a feature onto that vowel are
separate changes, so HS's GEN cannot impose them simultaneously on a
candidate. Rather, epenthesis and spreading must take place in separate steps,
and hence the constraint hierarchy evaluates the consequences of epenthesis
without knowing how spreading might subsequently affect the epenthetic
vowel.
It follows, then, that vowel epenthesis always adds a violation of
SHARE([nasal]), regardless of context: [mã|r] vs. [mã|r|i], [mãw̃] vs. [mãw̃|i].
If SHARE([nasal]) is ranked above NO-CODA, then it will simply block
epenthesis under all conditions, just as DEP will block epenthesis if ranked
above NO-CODA. Ranking SHARE([nasal]) above NO-CODA may be a peculiar
way of preventing epenthesis, but there is no pathology. There are languages
with no vowel epenthesis, and the grammar just described is consistent with
that fact.
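The contrast with SHARE([nasal]) is easy to verify (a toy counter in Python over (segment, nasal) pairs; adjacent segments count as sharing [nasal] just in case both are nasalized, a simplification of the definition in (9)):

```python
def share_nasal(form):
    # SHARE([nasal]): one violation per adjacent pair of segments that do
    # not share a single [nasal] autosegment (modeled here as a pair in
    # which both members are nasalized).
    return sum(1 for a, b in zip(form, form[1:]) if not (a[1] and b[1]))

m_ar = [('m', True), ('a', True), ('r', False)]   # [mã|r]
maw  = [('m', True), ('a', True), ('w', True)]    # [mãw̃]
epenthetic = ('i', False)

# Appending an oral vowel adds exactly one violation, blocker or no blocker:
for base in (m_ar, maw):
    assert share_nasal(base + [epenthetic]) == share_nasal(base) + 1
```

Since the increment is constant, SHARE([nasal]) can only block epenthesis across the board, never selectively at a distance.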
5.4. Affix repositioning
By dominating affixal alignment constraints, markedness constraints can
compel infixation (McCarthy and Prince 1993; Prince and Smolensky 2004;
and others). They can even cause affixes to switch between prefixal and
suffixal position (Fulmer 1997; Noyer 1993).
ALIGN-R([nasal], word) is among the markedness constraints that could
in principle have this effect, as Wilson observes. Its influence on affix
placement is much like its influence on epenthesis. When the stem contains
a nasal consonant followed by a blocker like [r], then an oral affix can
be forced out of suffixal position to improve alignment of [nasal] (20a).
But if the stem contains no [nasal] segments, then there is no threat of
improper alignment, and so the affix can be a suffix, as is its wont (20b). The
affix will also be suffixed if it is itself nasalizable and no blocker precedes it
in the stem (20c). Nothing like this behavior has been observed among the
known cases of phonologically-conditioned affix placement. It is presumably
impossible.
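The pathological placement logic of (20) can be simulated directly (hypothetical Python, not from the chapter; [r] is the only blocker, the fully nasalized *NASLIQ-violating candidate is set aside, and a simple suffixing preference stands in for ALIGN-R(-o, word)):

```python
BLOCKERS = {'r'}          # *NASLIQ: liquids block nasal spreading
NASALS = {'m', 'n'}

def nasal_flags(word):
    # Which segments end up [nasal] after rightward spreading from a
    # nasal stop, halting at a blocker.
    flag, out = False, []
    for seg in word:
        if seg in NASALS:
            flag = True
        elif seg in BLOCKERS:
            flag = False
        out.append(flag)
    return out

def align_r(word):
    # ALIGN-R([nasal], word): segments between the rightmost nasalized
    # segment and the right word edge (vacuous if no nasal).
    flags = nasal_flags(word)
    if True not in flags:
        return 0
    return len(word) - 1 - max(i for i, f in enumerate(flags) if f)

def place(stem, affix='o'):
    # Parallel evaluation of prefixed vs. suffixed candidates, with
    # ALIGN-R([nasal], word) outranking the suffixing preference.
    suffix_pref = lambda w: 0 if w.endswith(affix) else 1
    return min([affix + stem, stem + affix],
               key=lambda w: (align_r(w), suffix_pref(w)))

assert place('mar') == 'omar'   # (20a): nasal-plus-blocker forces prefixation
assert place('par') == 'paro'   # (20b): no nasal, ordinary suffixation
assert place('maw') == 'mawo'   # (20c): harmony reaches the suffix
```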
210 John J. McCarthy
(20) ALIGN-R([nasal], word) affecting affix placement

  a. Prefixation when inaccessible to harmony

        /mar, o/     *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
  i.  ☞ omãr                   1                        3
  ii.    mãro                  2 W                      L
  iii.   mãr̃õ       1 W        L                        L

  b. Suffixation with no nasal to harmonize

        /par, o/     *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
  i.  ☞ paro
  ii.    opar                                           3 W

  c. Suffixation when accessible to harmony

        /maw, o/     *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
  i.  ☞ mãw̃õ
  ii.    omãw̃                                          3 W
We will now look at how cases like this play out in SH. We first need a
theory of phonology-morphology interaction in HS to serve as the basis
for analyzing affix displacement phenomena. To this end, I adopt the
framework of Wolf (2008). Wolf proceeds from the assumption that the
input to the phonology consists of abstract morphemes represented by their
morphosyntactic features, e.g., /DOG-PLURAL/. Spelling out each morpheme
requires a single step of a HS derivation: <DOG-PLURAL, dɔɡ-PLURAL, dɔɡz>.
Spell-out is compelled by the constraint MAX-M, which is satisfied when an
abstract morpheme is spelled out by some formative.
Affix displacement phenomena show that the location of spell-out is not
predetermined. Thus, [dɔɡz], [dɔzɡ], [dzɔɡ], etc. are all legitimate candidates
that satisfy MAX-M. The actual output [dɔɡz] is selected by the constraint
MIRROR, which favors candidates where the phonological spell-out of a
feature matches its location in morphosyntactic structure. Affix displacement
is violation of MIRROR to satisfy some higher-ranking constraint.
We now have the resources necessary to study the consequences of SH
for our hypothetical example. Small capitals MAR, PAR, MAW will be used
for the morphosyntactic representation of roots, and the [o] suffix will spell
out PLURAL. We begin with PAR. The input is the morphosyntactic structure
Autosegmental spreading in Optimality Theory 211
[PAR PLURAL]. The first derivational step spells out the morphosyntactic
representation PAR as the phonological string [par]. This change improves
performance on the constraint MAX-M (see (21)), but because it introduces
phonological structure where previously there was none, it brings violations
of phonological markedness constraints, including SHARE([nasal]). (In
subsequent examples, the root spell-out step will be omitted.)

(21) First step: [PAR PLURAL] → [par PLURAL]

       [PAR PLURAL]        *NASLIQ   MAX-M   SHARE([nas])
  a. ☞ [p|a|r PLURAL]                1       2
  b.   [PAR PLURAL]                  2 W     L
Further improvement on MAX-M is possible by spelling out PLURAL as [o]. GEN
offers candidates that differ in where PLURAL is spelled out, and MIRROR chooses
the correct one. MIRROR is shown as separated from the rest of the tableau
because its ranking cannot be determined by inspecting these candidates:

(22) Second step: [par PLURAL] → [paro]

       [par PLURAL]        *NASLIQ   MAX-M   SHARE([nas])   MIRROR
  a. ☞ [p|a|r|o]                             3
  b.   [p|a|r PLURAL]                1 W     2 L
  c.   [o|p|a|r]                             3              3 W
  d.   [p|o|a|r]                             3              2 W
Since no further harmonic improvement is possible (relative to the constraints
under discussion), the derivation converges on [paro] at the third step.
When the input to the second step contains a nasal, like [mar PLURAL],
there is a choice between spelling out PLURAL or spreading [nasal]. Since
MAX-M is ranked higher, spell-out takes precedence:
(23) Second step: [mar PLURAL] → [maro]

       [mar PLURAL]        *NASLIQ   MAX-M   SHARE([nas])   MIRROR
  a. ☞ [m|a|r|o]                             3
  b.   [m|a|r PLURAL]                1 W     2 L
  c.   [mã|r PLURAL]                 1 W     1 L
  d.   [o|m|a|r]                             3              3 W
  e.   [m|o|a|r]                             3              2 W
This is the crucial tableau. It shows that SHARE([nasal]), unlike ALIGN in
(20b), is unable to affect the placement of the affix. All placements of the
212 John J. McCarthy
affix [o] equally affect performance on SHARE([nasal]), adding one violation
of it. Thus, there is no advantage to shifting this affix out of the position
preferred by the constraint MIRROR.
It might seem that SHARE([nasal]) could affect affix placement by favoring
[õm|a|r] or [mõ|a|r], but these are not legitimate candidates at the affix spell-
out step. HS's one-change-at-a-time GEN cannot simultaneously spell out a
morpheme and spread a feature onto it. Although SHARE([nasal]) would make it
advantageous to spell out [o] next to [m], that advantage cannot be discovered
until it is too late, when the location of the affix has already been determined. An
affix's accessibility to autosegmental spreading is irrelevant to its placement,
because the effect of spreading and the location of spell-out cannot be decided
simultaneously, since it is impossible under HS for competing candidates to
differ in both of these characteristics at the same time.
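The equal-violation point is mechanical and easy to check (a toy SHARE counter in Python over (segment, nasal) pairs, not from the chapter; the affix is the single oral segment [o]):

```python
def share_nasal(form):
    # SHARE([nasal]): one violation per adjacent pair of segments that do
    # not share a single [nasal] autosegment (modeled here as both
    # members of the pair being nasalized).
    return sum(1 for a, b in zip(form, form[1:]) if not (a[1] and b[1]))

# [m|a|r]: only [m] is linked to [nasal], so both adjacencies violate SHARE.
mar = [('m', True), ('a', False), ('r', False)]
o = ('o', False)

base = share_nasal(mar)  # 2, as for candidate (b) of tableau (23)
placements = [mar[:i] + [o] + mar[i:] for i in range(len(mar) + 1)]
# [o|m|a|r], [m|o|a|r], [m|a|o|r], [m|a|r|o] each add exactly one violation,
# so SHARE cannot favor any particular placement of the oral affix.
assert all(share_nasal(c) == base + 1 for c in placements)
```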
5.5. Allomorph selection
In phonologically conditioned allomorphy, a morpheme has two or more
surface alternants that are selected for phonological reasons but cannot
be derived from a common underlying form. In Korean, for example, the
nominative suffix has two alternants, [i] and [ka]. There is no reasonable
way of deriving them from a single underlying representation, but their
occurrence is determined phonologically: [i] follows consonant-final stems
and [ka] (voiced intervocalically to [ɡa]) follows vowel-final stems:

(24) Korean nominative suffix allomorphy
     cib-i   house-NOM
     c'a-ɡa  car-NOM

Research in OT has led to the development of a theory of phonologically
conditioned allomorphy based on the following premises (e.g., Burzio 1994;
Hargus 1995; Hargus and Tuttle 1997; Mascaró 1996, 2007; Mester 1994;
Perlmutter 1998; Tranel 1996a, 1996b, 1998):
(i) The allomorphs of a morpheme are listed together in the underlying
representation: /cip-{i, ka}/, /c'a-{i, ka}/.
(ii) GEN creates candidates that include all possible choices of an
allomorph: [cib-i], [cip-ka], [c'a-i], [c'a-ɡa]. (Intervocalic voicing is
an allophonic alternation that I will not be discussing here.)
(iii) Faithfulness constraints like MAX and DEP treat all allomorph choices
equally.
Autosegmental spreading in Optimality Theory 213
(iv) So markedness constraints determine which allomorph is most
harmonic. In Korean, the markedness constraints ONSET and NO-
CODA correctly favor [cib-i] and [c'a-ɡa] over [cip-ka] and [c'a-i],
respectively.
The following tableaux illustrate:

(25) Allomorph selection in Korean

  a.     /cip-{i, ka}/    ONSET   NO-CODA
     i.  ☞ ci.bi
     ii.   cip.ka                 1 W

  b.     /c'a-{i, ka}/    ONSET   NO-CODA
     i.  ☞ c'a.ɡa
     ii.   c'a.i          1 W
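Premises (i)-(iv) can be sketched in a few lines (hypothetical Python with a deliberately crude CV parse, not from the chapter; intervocalic voicing is set aside, so 'cipi' stands for [ci.bi] and 'caka' for [c'a.ɡa]):

```python
VOWELS = set('aiu')

def violations(form):
    # (ONSET violations, NO-CODA violations) under a toy CV parse:
    # a vowel lacks an onset when not immediately preceded by a consonant;
    # a consonant is a coda when no vowel immediately follows it.
    onset = sum(1 for i, s in enumerate(form)
                if s in VOWELS and (i == 0 or form[i - 1] in VOWELS))
    coda = sum(1 for i, s in enumerate(form)
               if s not in VOWELS
               and not (i + 1 < len(form) and form[i + 1] in VOWELS))
    return onset, coda

def select(stem, allomorphs):
    # Candidates cross the stem with each listed allomorph (premises i-ii);
    # ranked markedness decides (premise iv): tuple comparison encodes
    # ONSET >> NO-CODA, though any ranking works for these data.
    return min((stem + a for a in allomorphs), key=violations)

assert select('cip', ['i', 'ka']) == 'cipi'   # consonant-final stem takes [i]
assert select('ca', ['i', 'ka']) == 'caka'    # vowel-final stem takes [ka]
```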
Wilson shows that a pathology emerges when ALIGN-R([nasal], word) is
allowed to participate in allomorph selection. This constraint will prefer the
shorter suffix allomorph when the stem contains a [nasal] feature that cannot
spread onto the suffix. Furthermore, it can exercise this preference even in a
language that has no nasal harmony at all, since the potential effect of ALIGN-
R([nasal], word) on allomorph selection is independent of its ranking with
respect to faithfulness to [nasal].
The pseudo-Korean example in (26) illustrates. Although ONSET favors
the allomorph [-ɡa] after vowel-final stems, its effect is overridden by ALIGN-
R([nasal], word) when the stem contains a nasal consonant. But with roots
that do not contain a nasal, ALIGN-R([nasal], word) is vacuously satisfied by
both candidates, and ONSET favors [-ɡa].
(26) Allomorph selection pathology

       /mi-{i, ka}/    ALIGN-R([nasal], word)   ONSET
  a. ☞ mi.i            2                        1
  b.   mi.ɡa           3 W                      L
In a language with the ranking in (26), the choice between [i] and [ka] will
be determined by ONSET except when the stem contains a nasal consonant
214 John J. McCarthy
at any distance, in which case the shorter allomorph will win despite the
marked syllable structure it creates. Furthermore, this effect has nothing to
do with the ranking of IDENT([nasal]) or any similar faithfulness constraint.
It is therefore possible for ALIGN-R([nasal], word) to have this effect in
languages without an inkling of nasal harmony. This prediction is surely an
implausible one.
SHARE([nasal]) does not make these predictions. It simply favors the
shorter allomorph, [i], since this allomorph introduces one SHARE([nasal])
violation while the longer allomorph [k|a] introduces two. SHARE([nasal]) has
this effect regardless of whether the stem contains a nasal consonant:
(27) No pathology with SHARE([nasal])

  a. No nasal in stem

         /t|a-{i, k|a}/         SHARE([nas])   ONSET
     i.  ☞ t|a|i                2              1
     ii.   t|a|ɡ|a              3 W            L

  b. Nasal in stem

         /n|a|m|i-{i, k|a}/     SHARE([nas])   ONSET
     i.  ☞ n|a|m|i|i            4              1
     ii.   n|a|m|i|ɡ|a          5 W            L
This effect of SHARE([nasal]) in systems of allomorphy might seem a bit
odd, but it is not pathological. As in the case of epenthesis (section 5.3),
SHARE([nasal]) predicts a system that we already predict in another, more
obvious way. The language in (27) is simply one where ONSET does not choose
among allomorphs; the suffix always surfaces as [i] because SHARE([nasal])
favors the shorter allomorph consistently. Presumably the learner would be
content to represent this suffix as just /i/ instead of taking the roundabout route
in (27). But a language without allomorphy is a possible human language, so
there is no pathological prediction being made.
Although (27) is a language without nasal harmony, the result is the same in
a language with harmony. The reason is the same as in section 5.4: HS's GEN is
limited to doing one thing at a time. In Wolf's (2008) theory, morpheme spell-
out is one of the things that HS's GEN can do. Since spell-out and spreading
cannot occur simultaneously, the possible consequences of spreading cannot
influence spell-out, so an allomorph's amenability to spreading does not
improve its chances. In general, SHARE([nasal]) favors shorter allomorphs, but
it does so in a non-pathological way: it does not distinguish between bases
that contain nasals and those that do not, so it cannot produce the odd long-
distance affix-minimizing effect that ALIGN predicts.6
5.6. Summary
When SHARE and its associated representational assumptions are combined
with HS, the pathologies identified by Wilson (2003, 2004, 2006) are resolved.
The shift to SHARE eliminates the long-distance segment-counting effect of
ALIGN, where a nasal anywhere in the word could affect the possibility of
epenthesis, the location of an affix, or the selection of an allomorph. HS
addresses the deletion and metathesis pathologies, and it also explains
why inserting [nasal] is not a legitimate way of improving performance on
SHARE([nasal]). Furthermore, HS denies SHARE the power to have even local
effects on epenthesis or allomorph selection.
6. Conclusion
Harmonic Serialism has OT's core properties: candidate competition judged
by ranked, violable constraints. HS differs from parallel OT in two related
respects: HS's GEN is limited to making one change at a time, and the
output is fed back into GEN until convergence. In their original discussion
of HS, Prince and Smolensky (2004: 95–96) noted that "[i]t is an empirical
question of no little interest how Gen is to be construed" and that "[t]here
are constraints inherent in the limitation to a single operation." This chapter
is an exploration of that question and those constraints in the domain of
autosegmental spreading processes.
I have argued that a particular approach to autosegmental spreading,
embedded in HS and called Serial Harmony, is superior to alternatives
embedded in parallel OT. The parallel OT theories of harmony make incorrect
typological predictions, while Serial Harmony does not.
Notes
1. This work is much the better for the feedback I received from the participants
in the UMass Phonology Grant Group in Fall, 2008: Diana Apoussidou, Emily
Elfner, Karen Jesney, Peter Jurgec, Kevin Mullin, Kathryn Pruitt, Brian Smith,
Wendell Kimper, and especially Joe Pater. Grace Delmolino provided welcome
stylistic support. This research was funded by grant BCS-0813829 from the
National Science Foundation to the University of Massachusetts Amherst.
2. In the earliest literature on autosegmental phonology such as Goldsmith (1976a,
1976b) or Clements and Ford (1979), spreading was effected by constraints rather
than rules. In place of iteration, which makes sense for rules but not constraints,
Clements and Ford recruit the Q variable of Halle (1975).
3. The definition of SHARE in (9) is intended to allow some leeway depending on
how phenomena like neutral segments or problems like locality are handled.
Thus, the adjacent elements referred to in the definition of SHARE could be
feature-geometric V-Place nodes (Clements and Hume 1995), segments, moras,
syllables, or other P-bearing units (Clements 1980, 1981). Adjacency is also an
abstraction, as the adjacency parameters in Archangeli and Pulleyblank (1987,
1994) make clear.
4. Under the assumptions about GEN in (12), feature spreading is an iterative
process, affecting one segment at a time. Nothing in this paper depends on that
assumption, though Pruitt (2008) has argued that stress assignment must iterate
in HS, while Walker (2008) presents evidence from Romance metaphony against
iterative spreading.
5. Wilson cites one more pathological prediction of ALIGN. In a language with
positional faithfulness to [nasal] in stressed syllables, such as Guaraní (Beckman
1998), stress could be shifted to minimize ALIGN([nasal]) violations. I do not
address this here because it is one of many pathologies associated with positional
faithfulness, pathologies that are eliminated in HS, as Jesney (to appear)
demonstrates.
6. Wilson also points out a related prediction. If it dominates MAX-BR, ALIGN-
R([nasal], word) can cause a reduplicative suffix to copy fewer segments when
the stem contains a nasal consonant: /pataka-RED/ → [pataka-taka] versus
/makasa-RED/ → [makasa-sa] (if other constraints favor a disyllabic reduplicant
that can shrink to monosyllabic under duress). This behavior is also unattested,
and cannot arise in SH. The reasoning is similar to the allomorphy case.
References
Anderson, Stephen R.
1972 On nasalization in Sundanese. Linguistic Inquiry 3: 253–268.
1980 Problems and perspectives in the description of vowel harmony. In:
Robert Vago (ed.), Issues in Vowel Harmony, 1–48. Amsterdam: John
Benjamins.
Archangeli, Diana and Douglas Pulleyblank
1987 Minimal and maximal rules: Effects of tier scansion. In: Joyce
McDonough and Bernadette Plunkett (eds.), Proceedings of the North
East Linguistic Society 17, 16–35. Amherst, MA: GLSA Publications.
1994 Grounded Phonology. Cambridge, MA: MIT Press.
2002 Kinande vowel harmony: Domains, grounded conditions and one-
sided alignment. Phonology 19: 139–188.
Baković, Eric
2000 Harmony, dominance, and control. Ph.D. diss., Department of
Linguistics, Rutgers University.
Becker, Michael, Patrick Pratt, Christopher Potts, Robert Staubs, John J. McCarthy
and Joe Pater
2009 OT-Help 2.0 [computer program].
Beckman, Jill
1998 Positional faithfulness. Ph.D. diss., Department of Linguistics,
University of Massachusetts Amherst.
Black, H. Andrew
1993 Constraint-ranked derivation: A serial approach to optimization. Ph.D.
diss., Department of Linguistics, University of California, Santa Cruz.
Burzio, Luigi
1994 Metrical consistency. In: Eric Sven Ristad (ed.), Language Com-
putations, 93–125. Providence, RI: American Mathematical Society.
Cassimjee, Farida and Charles Kisseberth
1997 Optimal Domains Theory and Bantu tonology: A case study from
Isixhosa and Shingazidja. In: Larry Hyman and Charles Kisseberth
(eds.), Theoretical Aspects of Bantu Tone, 33–132. Stanford: CSLI
Publications.
Chen, Matthew
1999 Directionality constraints on derivation? In: Ben Hermans and Marc
van Oostendorp (eds.), The Derivational Residue in Phonological
Optimality Theory, 105–127. Amsterdam: John Benjamins.
Clements, G. N.
1980 Vowel Harmony in Nonlinear Generative Phonology: An
Autosegmental Model. Bloomington: Indiana University Linguistics
Club Publications.
1981 Akan vowel harmony: A nonlinear analysis. Harvard Studies in
Phonology 2: 108–177.
1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
Clements, G. N. and Kevin C. Ford
1979 Kikuyu tone shift and its synchronic consequences. Linguistic Inquiry
10: 179–210.
Clements, G. N. and Elizabeth Hume
1995 The internal organization of speech sounds. In: John A. Goldsmith
(ed.), The Handbook of Phonological Theory, 245–306. Cambridge,
MA, and Oxford, UK: Blackwell.
Cohn, Abigail
1993 A survey of the phonology of the feature [nasal]. Working Papers of
the Cornell Phonetics Laboratory 8: 141–203.
Cole, Jennifer S. and Charles Kisseberth
1995a Nasal harmony in Optimal Domains Theory. Unpublished, University
of Illinois.
1995b An Optimal Domains theory of harmony. Unpublished, University of
Illinois.
Eisner, Jason
1999 Doing OT in a straitjacket. Unpublished, Johns Hopkins University.
Fulmer, S. Lee
1997 Parallelism and planes in Optimality Theory. Ph.D. diss., Department
of Linguistics, University of Arizona.
Goldsmith, John
1976a Autosegmental phonology. Ph.D. diss., Department of Linguistics,
MIT.
1976b An overview of autosegmental phonology. Linguistic Analysis 2: 23–
68.
1990 Autosegmental and Metrical Phonology. Oxford and Cambridge,
MA: Blackwell.
1993 Harmonic phonology. In: John Goldsmith (ed.), The Last Phonological
Rule: Reflections on Constraints and Derivations, 21–60. Chicago:
University of Chicago Press.
Halle, Morris
1975 Confessio grammatici. Language 51: 525–535.
Hargus, Sharon
1995 The first person plural prefix in Babine-Witsuwit'en. Unpublished,
University of Washington.
Hargus, Sharon and Siri G. Tuttle
1997 Augmentation as affixation in Athabaskan languages. Phonology 14:
177–220.
Howard, Irwin
1972 A directional theory of rule application in phonology. Ph.D. diss.,
Department of Linguistics, MIT.
Hume, Elizabeth
2001 Metathesis: Formal and functional considerations. In: Elizabeth
Hume, Norval Smith and Jeroen Van de Weijer (eds.), Surface Syllable
Structure and Segment Sequencing, 1–25. Leiden: Holland Institute of
Linguistics (HIL).
Jesney, Karen
to appear Positional faithfulness, non-locality, and the Harmonic Serialism
solution. Proceedings of NELS 39.
Johnson, C. Douglas
1972 Formal Aspects of Phonological Description. The Hague: Mouton.
Kenstowicz, Michael
1995 Cyclic vs. non-cyclic constraint evaluation. Phonology 12: 397–
436.
Kenstowicz, Michael and Charles Kisseberth
1977 Topics in Phonological Theory. New York: Academic Press.
Kimper, Wendell
to appear Local optionality and Harmonic Serialism. Natural Language &
Linguistic Theory.
Kiparsky, Paul
2000 Opacity and cyclicity. The Linguistic Review 17: 351–367.
Kirchner, Robert
1993 Turkish vowel harmony and disharmony: An Optimality Theoretic
account. Unpublished, UCLA.
Lombardi, Linda
1991 Laryngeal features and laryngeal neutralization. Ph.D. diss.,
Department of Linguistics, University of Massachusetts Amherst.
1999 Positional faithfulness and voicing assimilation in Optimality Theory.
Natural Language & Linguistic Theory 17: 267–302.
2001 Why Place and Voice are different: Constraint-specific alternations in
Optimality Theory. In: Linda Lombardi (ed.), Segmental Phonology
in Optimality Theory: Constraints and Representations, 13–45.
Cambridge: Cambridge University Press. Originally circulated in
1995.
Mascaró, Joan
1996 External allomorphy as emergence of the unmarked. In: Jacques
Durand and Bernard Laks (eds.), Current Trends in Phonology:
Models and Methods, 473–483. Salford, Manchester: European
Studies Research Institute, University of Salford.
2007 External allomorphy and lexical representation. Linguistic Inquiry 38:
715–735.
McCarthy, John J.
2000 Harmonic serialism and parallelism. In: Masako Hirotani (ed.),
Proceedings of the North East Linguistics Society 30, 501–524.
Amherst, MA: GLSA Publications.
2002 A Thematic Guide to Optimality Theory. Cambridge: Cambridge
University Press.
2003 OT constraints are categorical. Phonology 20: 75–138.
2004 Headed spans and autosegmental spreading. Unpublished, University
of Massachusetts Amherst.
2007a Hidden Generalizations: Phonological Opacity in Optimality Theory.
London: Equinox Publishing.
2007b Restraint of analysis. In: Sylvia Blaho, Patrik Bye and Martin Krämer
(eds.), Freedom of Analysis, 203–231. Berlin and New York: Mouton
de Gruyter.
2007c Slouching towards optimality: Coda reduction in OT-CC. In:
Phonological Society of Japan (ed.), Phonological Studies 10, 89–
104. Tokyo: Kaitakusha.
2008a The gradual path to cluster simplification. Phonology 25: 271–319.
2008b The serial interaction of stress and syncope. Natural Language &
Linguistic Theory 26: 499–546.
McCarthy, John J. and Alan Prince
1993 Generalized Alignment. In: Geert Booij and Jaap van Marle (eds.),
Yearbook of Morphology, 79–153. Dordrecht: Kluwer.
Mester, Armin
1994 The quantitative trochee in Latin. Natural Language & Linguistic
Theory 12: 1–61.
Norton, Russell J .
2003 Derivational phonology and optimality phonology: Formal comparison
and synthesis. Ph.D. diss., Department of Linguistics, University of
Essex.
Noyer, Rolf
1993 Mobile affixes in Huave: Optimality and morphological well-
formedness. In: Erin Duncan, Donka Farkas and Philip Spaelti (eds.),
The Proceedings of the West Coast Conference on Formal Linguistics
12, 67–82. Stanford, CA: Stanford Linguistics Association.
Onn, Farid M.
1980 Aspects of Malay Phonology and Morphology: A Generative
Approach. Kuala Lumpur: Universiti Kebangsaan Malaysia.
Pater, Joe
to appear Serial Harmonic Grammar and Berber syllabification. In: Toni
Borowsky, Shigeto Kawahara, Takahito Shinya and Mariko Sugahara
(eds.), Prosody Matters: Essays in Honor of Lisa Selkirk. London:
Equinox Publishing.
Perlmutter, David
1998 Interfaces: Explanation of allomorphy and the architecture of
grammars. In: Steven G. Lapointe, Diane K. Brentari and Patrick M.
Farrell (eds.), Morphology and its Relation to Phonology and Syntax,
307–338. Stanford, CA: CSLI Publications.
Piggott, G. L.
1992 Variability in feature dependency: The case of nasality. Natural
Language & Linguistic Theory 10: 33–78.
Prince, Alan
2002 Arguing optimality. In: Angela Carpenter, Andries Coetzee and Paul
de Lacy (eds.), University of Massachusetts Occasional Papers in
Linguistics 26: Papers in Optimality Theory II, 269–304. Amherst,
MA: GLSA.
2006 Implication and impossibility in grammatical systems: What it is and
how to find it. Unpublished, Rutgers University.
Prince, Alan and Paul Smolensky
2004 Reprint. Optimality Theory: Constraint Interaction in Generative
Grammar. Malden, MA, and Oxford, UK: Blackwell. Originally
circulated in 1993.
Pruitt, Kathryn
2008 Iterative foot optimization and locality in stress systems. Unpublished,
University of Massachusetts Amherst.
Pulleyblank, Douglas
1989 Patterns of feature co-occurrence: The case of nasality. In: S. Lee
Fulmer, M. Ishihara and Wendy Wiswall (eds.), Coyote Papers 9, 98–
115. Tucson, AZ: Department of Linguistics, University of Arizona.
1996 Neutral vowels in Optimality Theory: A comparison of Yoruba and
Wolof. Canadian Journal of Linguistics 41: 295–347.
2004 Harmony drivers: No disagreement allowed. In: Julie Larson and
Mary Paster (eds.), Proceedings of the Twenty-eighth Annual Meeting
of the Berkeley Linguistics Society, 249–267. Berkeley, CA: Berkeley
Linguistics Society.
Robins, R. H.
1957 Vowel nasality in Sundanese: A phonological and grammatical study.
In: Philological Society of Great Britain (ed.), Studies in Linguistic
Analysis, 87–103. Oxford: Blackwell.
Rubach, Jerzy
1997 Extrasyllabic consonants in Polish: Derivational Optimality Theory.
In: Iggy Roca (ed.), Derivations and Constraints in Phonology, 551–
582. Oxford: Oxford University Press.
Schourup, Lawrence
1972 Characteristics of vowel nasalization. Papers in Linguistics 5: 550548.
Smolensky, Paul
1993 Harmony, markedness, and phonological activity. Unpublished,
University of Colorado.
1995 On the structure of the constraint component Con of UG. Unpublished,
Johns Hopkins University.
1997 Constraint interaction in generative grammar II: Local conjunction,
or random rules in Universal Grammar. Unpublished, Johns Hopkins
University.
2006 Optimality in phonology II: Harmonic completeness, local constraint
conjunction, and feature-domain markedness. In: Paul Smolensky
and Géraldine Legendre (eds.), The Harmonic Mind: From Neural
Computation to Optimality-Theoretic Grammar, 585–720. Cambridge,
MA: MIT Press/Bradford Books.
Steriade, Donca
1993a Closure, release and nasal contours. In: Marie Huffman and Rena
Krakow (eds.), Nasality. San Diego: Academic Press.
1993b Orality and markedness. In: B. Keyser and J. Guenther (eds.), Papers
from BLS 19. Berkeley: Berkeley Linguistic Society.
1995 Underspecification and markedness. In: John Goldsmith (ed.),
Handbook of Phonological Theory, 114–174. Cambridge, MA:
Blackwell.
Tesar, Bruce
1995 Computational Optimality Theory. Ph.D. diss., Department of
Linguistics, University of Colorado.
Tranel, Bernard
1996a Exceptionality in Optimality Theory and final consonants in French. In:
Karen Zagona (ed.), Grammatical Theory and Romance Languages,
275–291. Amsterdam: John Benjamins.
1996b French liaison and elision revisited: A unified account within
Optimality Theory. In: Claudia Parodi, Carlos Quicoli, Mario Saltarelli
and Maria Luisa Zubizarreta (eds.), Aspects of Romance Linguistics,
433–455. Washington, DC: Georgetown University Press.
1998 Suppletion and OT: On the issue of the syntax/phonology interaction.
In: Emily Curtis, James Lyle and Gabriel Webster (eds.), The
Proceedings of the West Coast Conference on Formal Linguistics 16,
415–429. Stanford, CA: CSLI Publications.
Trigo, L.
1993 The inherent structure of nasal segments. In: Marie Huffman and
Rena Krakow (eds.), Nasality. San Diego: Academic Press.
Ultan, Russell
1978 A typological view of metathesis. In: Joseph Greenberg (ed.),
Universals of Human Language, 367–402 (vol. ii). Stanford: Stanford
University Press.
Walker, Rachel
1998 Nasalization, neutral segments, and opacity effects. Ph.D. diss.,
Department of Linguistics, University of California, Santa Cruz.
2008 Non-myopic harmony and the nature of derivations. Unpublished,
University of Southern California.
Wang, William S.-Y.
1968 Vowel features, paired variables, and the English vowel shift.
Language 44: 695–708.
Wilson, Colin
2003 Unbounded spreading in OT (or, Unbounded spreading is local
spreading iterated unboundedly). Unpublished, UCLA.
2004 Analyzing unbounded spreading with constraints: Marks, targets, and
derivations. Unpublished, UCLA.
2006 Unbounded spreading is myopic. Unpublished, UCLA.
Wolf, Matthew
2008 Optimal Interleaving: Serial phonology-morphology interaction in
a constraint-based model. Ph.D. diss., Department of Linguistics,
University of Massachusetts Amherst.
Evaluating the effectiveness of Unified Feature
Theory and three other feature systems
Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume
1. Introduction
This chapter is in part a response to a suggestion made by Nick Clements
in 2009. Commenting on the results of a comparison of feature theories
presented in Mielke (2008), Clements proposed the following in an e-mail
message to the third author:
...the Clements & Hume [Unified Feature Theory] system fares rather poorly
in capturing natural classes compared with the other systems discussed. If
you look closely at the examples, you'll notice... that most of the natural
classes it fails to capture correspond to minus values of the features labial,
coronal, and dorsal, in both vowels and consonants... If these are provided,
I believe the system may perform better than the others... One way of doing
so would be in terms of [the model proposed in Clements 2001], in which
features are potentially binary but specified only as needed, marked values
tend to be prominent, and only prominent features are projected onto separate
tiers. Assuming that plus values are marked, the asymmetry between e.g.
+labial and -labial is still captured... If a revised version of the model does
indeed prove to be a competitive system, even for capturing natural classes
(a criterion we did not emphasize), it would be well worth making this point
somewhere in print.
We have thus carried out a large-scale investigation of the ability of several
feature systems to define natural classes. One of these systems is a revised
version of Unified Feature Theory (UFT, Clements and Hume 1995) in which
place features are represented with both plus and minus values. As Clements
predicts, such a model does indeed emerge as competitive. This chapter also
examines two means by which unnatural classes are formally expressed in
rule- and constraint-based theories of phonology: the union or subtraction of
natural classes. Based on a survey of 1691 unnatural classes, both techniques
are shown to be effective at modeling unnatural yet phonologically active
classes.
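The definability test underlying such comparisons can be illustrated with a toy fragment (hypothetical Python; the mini feature table below is illustrative, not P-base): a class counts as captured when the segments matching every feature value shared by its members are exactly the class itself.

```python
# Toy inventory: each segment mapped to feature values (illustrative fragment).
INV = {
    'p': {'labial': '+', 'voice': '-', 'cont': '-'},
    'b': {'labial': '+', 'voice': '+', 'cont': '-'},
    't': {'labial': '-', 'voice': '-', 'cont': '-'},
    'd': {'labial': '-', 'voice': '+', 'cont': '-'},
    's': {'labial': '-', 'voice': '-', 'cont': '+'},
}

def captured(cls, inv):
    # A class is a natural class iff the segments matching the feature
    # values shared by all its members are exactly the class itself.
    shared = {(f, v) for f, v in inv[next(iter(cls))].items()
              if all(inv[s].get(f) == v for s in cls)}
    matches = {s for s, feats in inv.items()
               if all(feats.get(f) == v for f, v in shared)}
    return matches == set(cls)

# A privative variant: delete the minus value of labial from every segment.
PRIV = {s: {f: v for f, v in feats.items() if not (f == 'labial' and v == '-')}
        for s, feats in INV.items()}

assert captured({'p', 'b'}, INV)                 # [+labial]
assert captured({'p', 't', 's'}, INV)            # [-voice]
assert not captured({'p', 'd'}, INV)             # no conjunction isolates these
assert captured({'t', 'd', 's'}, INV)            # [-labial]: binary place
assert not captured({'t', 'd', 's'}, PRIV)       # undefinable without [-labial]
```

The last two assertions reproduce, in miniature, the gap Clements identified: dropping the minus values of a place feature makes the complement class undefinable.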
224 Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume
2. Six feature systems
In this chapter we report on a comparison of the performance of six sets of
distinctive features in accounting for recurrent phonologically active classes.
This study builds on Mielke (2008), where three feature systems were
compared. In the current work, we increase the number of feature systems
to six and examine differences among the theories in more detail. Similar
to the earlier investigation, we use the P-base database of phonologically
active classes (Mielke 2008) to evaluate the systems. The six feature theories
compared in this paper are:
1. Preliminaries to the Analysis of Speech (Jakobson, Fant, and Halle 1952)
2. The Sound Pattern of English (Chomsky and Halle 1968)
3. Problem Book in Phonology (Halle and Clements 1983)
4. Unified Feature Theory (Clements and Hume 1995)
5. Unified Feature Theory with binary place features
6. Unified Feature Theory with full specification of all features
The first three feature systems, all proposed by Morris Halle and colleagues,
can be viewed as descendants of one another. UFT is distinct from
these feature systems in its use of privative features and its emphasis on
feature organization. We consider a version similar to the one proposed
by Clements and Hume, as well as two variations of this system. The first of
these uses binary place features, as suggested by Nick Clements. The other
variant (with full specification of all features) was included for comparison
with the others, and to our knowledge, has never been proposed in the
literature.
Figure 1 shows the features used in these feature systems and the
relationships between them. Arrows connect features with their counterparts
in the earlier feature system which match for the largest number of segments
in P-base. The labels on the arrows indicate the degree of match between the
two features, and dotted arrows are used for features that match for fewer
than 90% of the segments. For example, the four systems largely agree on
the features [continuant] and [interrupted/continuant], but the [distributed]
feature found in three of the systems has no counterpart in the Preliminaries
system, most closely matching [vocalic].
Adding a feature to a feature system creates an opportunity to handle
additional phonologically active classes, but also gives the theory more
power, thus creating an opportunity for it to specify phonetically unnatural
classes that may not actually participate in a sound pattern in any language.

Figure 1. Relationships between features from four systems. [Multi-page
diagram relating the features of Preliminaries, SPE, Halle and Clements, and
Unified Feature Theory: arrows connect each feature with the counterpart in
the earlier system that matches it for the largest number of P-base segments,
arrow labels give the percentage of segments on which the two features
match, and dotted arrows mark matches below 90%.]
We consider two ways to evaluate the effectiveness of adding a feature. The
first is to measure the performance of each feature system with a matching
set of randomly-generated classes, with the idea that a good feature system
will succeed with naturally-occurring phonologically active classes, but fail
with randomly-generated classes.
The real classes were the 6159 naturally-occurring phonologically
active classes in P-base (Mielke 2008). Each of these is a subset of
the segments in an inventory which undergo or trigger a phonological
process, to the exclusion of the other segments in the inventory. For each
of these classes a corresponding randomly-generated class was created by
drawing a class of the same size from the same inventory as the real class.
This is similar to the procedure used by Mackie and Mielke (2011) for
randomly-generated inventories. Table 1 shows four phonologically active
classes in Japanese with their matched randomly generated classes, which
were produced by drawing the same number of segments randomly from the
segment inventory shown in Table 2.¹
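The matched-class procedure just described can be sketched in a few lines of Python (a hypothetical illustration, not the original P-base code; the segment names are ASCII stand-ins):

```python
import random

def matched_random_class(inventory, real_class, rng):
    """Draw a random class of the same size as a real phonologically
    active class, from the same segment inventory."""
    assert set(real_class) <= set(inventory)
    return set(rng.sample(sorted(set(inventory)), len(real_class)))

# Hypothetical ASCII stand-ins for segments:
inventory = ["p", "t", "k", "b", "d", "g", "s", "z", "h",
             "m", "n", "r", "j", "w", "i", "u", "e", "o", "a"]
real = {"i", "u"}  # e.g. the high vowels that devoice
fake = matched_random_class(inventory, real, random.Random(0))
assert len(fake) == len(real) and fake <= set(inventory)
```

Because the random class is matched for size and inventory, any difference in how well a feature system describes the two sets reflects the system's fit to attested patterns rather than to class size.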
Feature analyses were conducted on the 6159 real and 6159 randomly-
generated classes, using the feature analysis algorithm described in Mielke
(2008: ch. 3). Figure 2 shows the success rates of the six feature systems in
specifying these classes. An ideal feature system would represent as many of
the real classes as possible using a conjunction of feature values. One way to
achieve this is for the system to make good choices about which features to
include. Another way is to use a massive number of features, so that virtually
any class can be represented. The analysis with randomly-generated classes is
meant to safeguard against this. In Figure 2, a higher position along the y axis
means more success with real classes, and having a lower x value means this
is being done using well-chosen features, rather than by brute force.
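A simplified version of the natural-class test behind these success rates can be sketched as follows (a toy illustration of the usual definition, not the algorithm of Mielke 2008: ch. 3; segments and specifications are hypothetical): a class counts as natural if some conjunction of feature values picks out exactly its members within the inventory, and the tightest candidate conjunction is everything the members share.

```python
def is_natural(cls, inventory, feats):
    """True if some conjunction of feature values describes exactly cls.
    feats maps each segment to a set of (feature, value) pairs."""
    cls = set(cls)
    # The tightest candidate conjunction: all values shared by members.
    shared = set.intersection(*(set(feats[s]) for s in cls))
    matched = {s for s in inventory if shared <= set(feats[s])}
    return matched == cls

# Hypothetical binary specifications for a four-segment inventory:
feats = {
    "p": {("voice", "-"), ("cont", "-"), ("labial", "+")},
    "b": {("voice", "+"), ("cont", "-"), ("labial", "+")},
    "t": {("voice", "-"), ("cont", "-"), ("labial", "-")},
    "s": {("voice", "-"), ("cont", "+"), ("labial", "-")},
}
inv = list(feats)
assert is_natural({"p", "t"}, inv, feats)      # [-voice, -cont]
assert not is_natural({"p", "s"}, inv, feats)  # nothing shared excludes /t/
```

If the shared conjunction picks up extra segments, no stricter conjunction can help, since any conjunction true of all members is a subset of the shared one.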
Table 1. Phonologically active classes in Japanese and matched randomly-generated
classes. X in pattern descriptions represents the active class

Phonological behavior                                  Active class    Random class
1. X → vls / C[−vc] __ {C[−vc], #}                     i u             i q
2. X → voiced at start of non-initial morpheme         t k s J h       J n d s i
   without voiced obstruent
3. high vowels → vls / X __ {X, #}                     p t k s J h     n g b J j e
4. / t k s J h / → voiced at start of non-initial      b d q z J       s q
   morpheme without X
Evaluating the effectiveness of Unified Feature Theory 229
Going in chronological order, Preliminaries (Jakobson et al. 1952)
handles about 60% of real classes and fewer than 5% of the random ones. The
differences between the Preliminaries and SPE (Chomsky and Halle 1968)
Table 2. The segment inventory of Japanese

p t k      i u
b d q      e o
s J h      a
z          i: :
m n        e: o:
r          a:
j          :
Figure 2. Success rates with real and random classes. [Scatter chart titled
"Naturalness according to six feature systems": proportion of real classes
accounted for (y axis, 0.60–0.75) against proportion of random classes
accounted for (x axis, 0.00–0.15), for Preliminaries, SPE, H&C, UFT,
UFTplace, and UFTfull; toward the upper left is good, toward the lower
right is bad.]
systems mostly involve adding features (see Figure 1) that were motivated
by observed sound patterns, and this enabled SPE features to account for an
additional 11% of real classes, with the by-product of accounting for another
2% of random classes as well. The Halle and Clements system (Halle and
Clements 1983) was an update of SPE that involved removing seldom-used
features and adding the feature [labial]. This resulted in small increases
in the number of real and random classes accounted for. The UFT system
(Clements and Hume 1995) involves many of the same features as the Halle
and Clements system. However, many of the features were not treated as
binary, and the emphasis was on feature organization rather than on capturing
phonologically active classes. The original UFT system accounts for slightly
more observed classes than Preliminaries, and represents the fewest random
classes of all the feature systems. The UFT variant with binary place features
(proposed by Nick Clements, as noted above) is a substantial improvement
over the original UFT, and performs slightly better than the SPE and Halle
and Clements systems, accounting for more observed classes, and fewer
random classes. The full specification variant of UFT (proposed by no one,
to our knowledge) accounts for more real and random classes than any of
the other approaches. The fact that three of the feature systems can represent
70–73% of the naturally-occurring classes suggests that this might be a
natural threshold, and that 27–30% of naturally-occurring classes are not
phonetically natural enough to be represented in terms of the best systems
of phonetically-defined distinctive features. If this is the case, we expect the
apparent improvements achieved by the full specification version of UFT to
involve a seemingly random assortment of phonetically unnatural classes that
happen to be handled by this feature system. We also expect the additional
classes handled by other incremental changes (if the feature proposals were
on the right track) to involve multiple instances of phonetically natural
classes that could not be expressed in the earlier feature systems. This is
investigated in the remainder of this section.
Comparing the total numbers of real and random classes handled by
each feature system gives a very general comparison of the performance
of the feature systems. The second, more specific approach is to inspect the
phonologically active classes that are natural according to one feature system
but not another, and to determine whether these include recurrent phonetically
natural classes overlooked by one of the feature systems, or just an assortment
of classes that happen to be natural in an overly strong feature system. Table 3
shows how many classes are natural according to combinations of the
six feature systems. Of the 63 logically possible combinations, there are
28 that correspond to sets of classes natural in only those feature systems.
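The bookkeeping behind Table 3 amounts to grouping classes by the exact set of systems in which they come out natural; a sketch with hypothetical names:

```python
from collections import Counter

def combination_profile(naturalness):
    """naturalness maps each class id to the set of feature systems in
    which that class is natural; returns counts per exact combination
    (with six systems there are 2**6 - 1 = 63 non-empty combinations,
    plus the case of a class natural in none)."""
    return Counter(frozenset(systems) for systems in naturalness.values())

# Hypothetical toy input:
profile = combination_profile({
    "c1": {"Prelim", "SPE", "H&C", "UFT", "UFT-place", "UFT-full"},
    "c2": set(),                                           # natural in none
    "c3": {"SPE", "H&C", "UFT", "UFT-place", "UFT-full"},  # all but Prelim
})
assert profile[frozenset()] == 1
assert len(profile) == 3
```

Sorting the resulting counter by count reproduces the row order of a table like Table 3.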
The most frequent combination by far is the 3110 phonologically active
classes that are natural according to all six feature systems. 1367 classes are
unnatural according to all six. The remaining sets of classes are revealing
about differences among the feature systems. For example, the 570 classes
that are natural according to all feature systems except Preliminaries indicate
Table 3. Classes that are natural according to combinations of feature systems

classes  median size  Prelim.  SPE    H&C    UFT    UFT-place  UFT-full
3110     4
570      7
370      3
127      6
84       4
73       3
63       4
55       4
47       2
37       3
34       4
34       4
33       2
30       3
24       3
19       3
15       4
10       2
9        3
8        5
8        3
7        4
5        4
5        5
2        3
2        6
1        15
1        2
1367     5
6159     4            3676     4347   4389   3921   4468       4651
                      59.7%    70.6%  71.3%  63.7%  72.5%      75.5%

[The marks indicating which combination of feature systems each row
corresponds to did not survive reproduction; the text identifies the largest
sets: 3110 classes natural in all six systems, 1367 in none, 570 in all but
Preliminaries, 370 in all but UFT, and 127 only in the three versions of UFT.]
a gap in the coverage of this system. Similarly, the 370 classes that are natural
for all except regular UFT indicate a different gap, while the 127 classes that
are natural only according to the three versions of UFT indicate aspects of
the UFT system that were a step in the right direction.
2.1. Comparing the Preliminaries and SPE systems
The combinations of feature sets shown in Table 3 include a total of 760
(714+46) classes that are natural according to SPE features but not according
to Preliminaries features. Table 4 focuses on how classes are accounted for
by the Preliminaries system and its two direct descendants. This subsection
and the next one explore these classes in more detail in pairwise comparisons
between the feature systems.
The most frequent feature descriptions among these classes involve the
feature [syllabic], which is included only in SPE: 164 [−syllabic], 29
[+syllabic]. Many of the other recurrent classes in this category involve
[−syllabic] in conjunction with other features, e.g., 23 [+voice, −syllabic]
and 17 [−low, −syllabic]. The most frequent classes without [syllabic]
involve [sonorant], another feature not included in the Preliminaries
feature set, e.g., 14 [+coronal, +sonorant], 13 [−heightened subglottal
pressure, +coronal, +sonorant], and 11 [+consonantal, +sonorant].
Figure 3 compares how frequently each SPE feature is used in describing
classes that are unnatural according to Preliminaries features (y-axis) with
its frequency in describing classes that are natural according to both feature
Table 4. The Preliminaries feature system and its direct descendants

classes  median size  Preliminaries  SPE    H&C
3560     3
714      5
86       4
65       3
46       4
24       3
22       2
1633     5
6159     4            3676           4347   4389
                      59.7%          70.6%  71.3%

[The marks indicating which systems each row's classes are natural in did
not survive reproduction; per the text, the 714 and 46 rows are the classes
natural according to SPE but not Preliminaries.]
Figure 3. Usage of SPE features (vs. Preliminaries). [Two scatter charts:
frequency of each SPE feature value among classes natural in both SPE and
Preliminaries (x axis, n=3582) against its frequency among SPE-only classes
(y axis, n=760). The second chart shows only the SPE feature values with no
counterpart in the Preliminaries system.]
systems (x-axis). This is meant to highlight the particular features that allow
SPE to represent more classes than Preliminaries, which appear above the
diagonal because they are used more frequently for the classes that are not
natural in the Preliminaries system. The first chart shows all SPE feature
values, and the second chart shows only the feature values that have no
counterpart in the Preliminaries system.
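The comparison plotted in Figure 3 can be reproduced by tallying, for each feature value, the fraction of class descriptions in which it appears in the two sets of classes; a sketch with hypothetical toy data:

```python
from collections import Counter

def usage_rates(descriptions):
    """Fraction of class descriptions using each feature value.
    descriptions: list of sets of strings like '-syllabic'."""
    counts = Counter(v for d in descriptions for v in d)
    return {v: c / len(descriptions) for v, c in counts.items()}

def above_diagonal(only_descs, shared_descs):
    """Feature values used more for one-system-only classes than for
    classes natural in both systems (points above the diagonal)."""
    only, shared = usage_rates(only_descs), usage_rates(shared_descs)
    return {v for v in only if only[v] > shared.get(v, 0.0)}

# Hypothetical toy data:
spe_only = [{"-syllabic"}, {"-syllabic", "+voice"}, {"+sonorant"}]
both = [{"+voice"}, {"+sonorant"}, {"+voice", "-nasal"}]
assert above_diagonal(spe_only, both) == {"-syllabic"}
```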
As seen in Figure 3, [−syllabic] is used much more frequently to describe
classes that are unnatural for Preliminaries. This is consistent with the
observation above that the class [−syllabic] appears 164 times among the
classes that SPE is able to account for but Preliminaries is not. [+syllabic]
is used less frequently for SPE-only classes, probably because SPE's
improvements apply mostly to classes of consonants. [+son] is also used
more frequently to account for SPE-only classes, while [−sonorant] is used
about the same in both sets of classes. [−heightened subglottal pressure] is
also used more to account for SPE-only classes. Other features that have
no counterpart in Preliminaries ([movement of glottal closure], [delayed
release], [covered], [lateral], and [delayed release of secondary closure]) do
not appear to be used more for SPE-only classes than for classes that are also
natural using the feature system of Preliminaries.
The most frequent recurring class involving [heightened subglottal
pressure] is the class of coronal sonorants excluding /r/ (which is [+heightened
subglottal pressure]), as seen in Table 5. 385 out of 628 language varieties
in P-base (61.3%) have /r/, including all of the languages in Table 5. The
fact that classes involving [heightened subglottal pressure] are dominated
by classes requiring the [−] value to exclude /r/ suggests that a cross-cutting
feature like [heightened subglottal pressure] is not well motivated, and these
classes specifically involve the exclusion of a trill from sound patterns that
Table 5. Phonologically active classes defined as [−heightened subglottal
pressure, +coronal, +sonorant] in SPE and unnatural in Preliminaries.
References for languages referred to in examples from P-base are given in
the appendix.
segments    excluded  languages
/ n l /    / r /     Acehnese, Catalan, Dhaasanac, Ecuador Quichua (Puyo Pongo variety), Harar Oromo, Kalenjin, Kilivila, Maasai, Slovene, Sri Lanka Portuguese Creole, Yiddish
/ n l /    / r /     Faroese
/ l l /     / r /     Uvbie
involve other coronal sonorants. This observation is consistent with the
abandonment of [heightened subglottal pressure] shortly after it was
introduced.
Figure 4 compares Preliminaries features with those from SPE. There
are no Preliminaries features without corresponding features in SPE (so
there is only one chart in this figure), but not all of the feature definitions
are the same. Ten of the twelve classes occurring more than once among the
Preliminaries-only classes involve [grave/acute]. The most frequent of these
are 5 [unvoiced, grave] and 5 [grave, non-vocalic]. As seen in Table 6,
these are all grave classes consisting of labial and velar consonants. What
is special about these ten cases is that they are all in languages with palatal
consonants of the same manner class. 273 out of 628 language varieties
(43.5%) in P-base have at least one palatal stop or nasal, including all of the
languages in Table 6. These classes are handled by Preliminaries but not SPE
because palatals are [acute] in Preliminaries but [+coronal] in SPE. While
[grave/acute] was replaced by [coronal], the boundary between the two
feature values is not exactly the same in both systems, crucially involving
the status of palatals. For more detailed discussion of the need for [grave],
see Hyman (1972), Clements (1976), Odden (1978), and Hume (1994). See
Mielke (2008: 158–161) for related discussion on differences in place feature
boundaries across feature systems.
Figure 4. Usage of Preliminaries features (vs. SPE). [Scatter chart: frequency
of each Preliminaries feature value among classes natural in both
Preliminaries and SPE (x axis, n=3582) against its frequency among
Preliminaries-only classes (y axis, n=89).]
2.2. Comparing the SPE and Halle and Clements systems
The combinations of feature sets shown above in Table 3 include 110
classes that are natural according to the Halle and Clements system, but
not according to the SPE system. Figure 5 shows that [labial], especially
[−labial], is used more among the Halle and Clements-only classes. The
other features common to both systems that are used more among these
classes are primarily features for consonants, which are involved in most
of the classes involving [−labial]. The Halle and Clements feature [labial],
in conjunction with other features, is found in 92 phonologically active
classes that cannot be handled by SPE, which has no feature [labial]. The
most frequent of these include: 7 [−labial, −back, −syllabic], 4 [−labial,
−back, −syllabic, −low], 4 [−labial, +voice, −strid, −son], 4 [−labial,
+voice, −continuant, −sonorant], and 4 [−labial, +nasal]. The first two
of these involve [−back] in addition to [−labial], as shown in Table 7.
These are classes of coronal consonants, including palatals, the opposite
of the [grave] classes in Table 6. This indicates that this use of [−labial]
(in conjunction with [−back]) is better interpreted as a proxy for a different
interpretation of the feature [coronal]. The other [−labial] classes with at least
four instances more directly involve [labial], as seen in Table 8. These are
classes of coronals and velars to the exclusion of labials. This is effectively
the lingual class that is not available in SPE but is available in some other
feature systems.
On the other hand, 68 classes that were natural in SPE features are
unnatural using Halle and Clements features. However, most of these do
not recur very often, and only three sets of natural classes recur three
times ([−high, −heightened subglottal pressure, +coronal], [−heightened
Table 6. Phonologically active classes defined as [unvoiced, grave], [consonantal,
oral, grave], and [interrupted, grave] in Preliminaries and unnatural in SPE

segments          excluded  languages
/ p k /           / c /     Nandi Kalenjin, Martuthunira, Midland Mixe (Puxmetacan), South Highland Mixe (Tlahuitoltepec), Muruwari
/ b q /           / j /     Gooniyandi, Gunin/Kwini
/ p m k /       / c /   Dieri (Diyari), Muruwari
/ p b m k q /   / c j /  Khmu?
Figure 5. Usage of Halle and Clements features (vs. SPE). [Two scatter
charts: frequency of each Halle and Clements feature value among classes
natural in both Halle and Clements and SPE (x axis, n=4274) against its
frequency among Halle and Clements-only classes (y axis, n=110). The
second chart shows only [−lab] and [+lab].]
Table 8. Phonologically active classes defined as [−labial, +voice, −strident,
−sonorant], [−labial, +voice, −continuant, −sonorant], or [−labial, +nasal]
in the Halle and Clements system and unnatural in SPE

segments        excluded  languages
/ d q /         / b /     Batibo Moghamo (Meta), Koromfé, Supyire Senoufo, Yiddish
/ n  /          / m /     Hungarian, Northern Tepehuan
/ n  /          / m /     Nandi Kalenjin, Yidiɲ
/ d q /         / b qb /  Dgr
/ d dʲ q qʲ /   / b bʲ /  Irish
/ d qʲ /        / b /     Muscat Arabic
subglottal pressure, +voice, −nasal, −syllabic, +sonorant], and [+consonantal,
+continuant]), and seven sets recur twice. Figure 6 shows that [−heightened
subglottal pressure] and [+continuant] are used more for SPE-only classes
than for the classes that are also natural according to Halle and Clements
features.
Table 7. Phonologically active classes defined as [−labial, −back, −syllabic] or
[−labial, −back, −syl, −low] in the Halle and Clements system and unnatural
in SPE

segments                                    excluded                        languages
/ t d s n l r [ [ c j /                  / p b m w k q x p /            San Pedro de Cajas Junín Quechua, Tarma Quechua
/ [t n l t n r r l p j | c j /              / p m w k /                     Arabana
/ t n d n r l q p j | j j /                 / b m w g /                     Gooniyandi
/ t s n r t n r l J c j /                  / p m w k x kʷ xʷ q /           Kumiai (Diegueño)
/ t n l t n r r l [ p j | c j /             / p m w k /                     Wangkangurru
/ t d n s q [ t J d j /                     / p b m w k q ? h /             O'odham
/ t d ts tl s n l r r tJ J j /              / p b m w f k kʷ g h /          Tetelcingo Nahuatl
/ t d d s z n r l c j J j /                 / b b m w k g q h /             Tirmaga
/ t s n r tJ j /                            / p m w f k /                   Asmat (Flamingo Bay dialect)
/ t t d ts ts s l n n l l tJ tJ d J j j /   / p p kʷ kʷ qʷ xʷ qq` qʷ qʷ ʷ ? h /  Coeur d'Alene
Figure 6. Usage of SPE features (vs. the Halle and Clements system)
[Two scatter charts: frequency of each SPE feature value among classes
natural in both SPE and Halle and Clements (x axis, n=4274) against its
frequency among SPE-only classes (y axis, n=68). The second chart shows
only the delayed release and glottal closure movement features.]
The feature [continuant] appears here because of how it is handled
differently by the Halle and Clements system as opposed to the SPE feature
system, particularly with respect to laterals (which are [−continuant] in SPE
and [+continuant] in the Halle and Clements system). The most frequent
types of classes involving [−heightened subglottal pressure] are shown in
Table 9. As seen above in Table 5, the function of this feature is to exclude
a trill from classes that would otherwise be expected to include it. This is
consistent with the special conditions needed to produce a trill (Solé 2002)
and with the quick abandonment of [heightened subglottal pressure] as a
feature shared by aspirated consonants and other non-trills.
2.3. Comparing the Halle and Clements and UFT systems
There are 617 classes that are natural within the Halle and Clements system,
but not within the UFT system, as summarized in Table 10. Of these, the
Halle and Clements features [labial], [back], and [high], in conjunction with
other features, are found in 451 natural classes that cannot be handled by
UFT, which has the privative feature [Labial] but no [−Labial]. This can
be seen in Figure 7, where the values of these three features are shown to
describe a great number of classes that UFT is unable to account for.
The most frequent of these include 38 [−labial, +syllabic], 32 [+back,
+vocalic], 18 [−high, +back], 12 [−high, +nasal], 10 [+high, −labial,
+back], and 10 [−high, −labial, +syllabic]. These are mostly subsets of
Table 9. Phonologically active classes defined using [−heightened subglottal
pressure] in SPE and unnatural in the Halle and Clements system

segments               excluded  languages
/ r l w r j /         / r /     Okpe, Uvbie
/ l | w r j /          / r /     Edo
/ z l w j /          / r /     Mising
/ r l /                / r /     Agulis Armenian
/ l j w /              / r /     Ehueun
/ l /                 / r r /   Ukue
/ y l /              / r /     Epie
/ t s n l /            / r /     Estonian
/ t d s z n l /        / r /     Catalan
/ t d ts dz s z n l /  / r /     Ukrainian
vowels, which would require minus values of UFT's place features to be
natural classes in that feature system. The most frequent classes defined by
[−labial, +syllabic] consist of front vowels and low vowels (but not nonlow
back vowels, which are round). In addition, there are classes of unrounded
vowels, including some nonlow back or central vowels (but not back round
vowels), and classes of unrounded vowels (excluding rounded vowels,
including front rounded vowels). Most of the rest are straightforwardly
classes of back vowels. The 32 classes that are analyzed as [+back, +vocalic]
in the Halle and Clements system but unnatural in the UFT system involve
back vowels, which are [Dorsal], and low vowels, which are not, to the
exclusion of front vowels. These would be natural in the UFT feature system
if [−Coronal] were an option.
Conversely, 149 classes that were unnatural using Halle and Clements
features are natural using UFT features. Figure 8 does not show many
features standing far above the diagonal.
The most frequent feature descriptions among these classes involve
the feature [vocoid], which is similar, but not identical, to the opposite
of [consonantal], e.g., 7 [−vocoid] and 2 [−vocoid, −spread glottis].
The primary difference between [+consonantal] and [−vocoid] concerns
glottal consonants, which are [−consonantal] in the Halle and Clements
system and [−vocoid] in UFT. The classes involving [−vocoid] are all
examples of consonants, including glottals, as opposed to glides and vowels
(Table 11).
Other recurrent classes involve [approximant], a feature found only in
UFT: [−approximant, −nasal] occurs twice. This is the class of obstruents
plus / ? / and/or / h /. This is probably better understood as a slightly different
definition of obstruent that includes glottals. See Miller (2011) and
Mielke (to appear) for discussions of glottals and the sonorant-obstruent
Table 10. Classes natural according to the Halle and Clements system and UFT

classes  median size  H&C    UFT
3767     4            ✓      ✓
617      3            ✓
149      5                   ✓
1617     5
6159     4            4389   3921
                      71.3%  63.7%
Figure 7. Usage of Halle and Clements features (vs. UFT). [Two scatter
charts: frequency of each Halle and Clements feature value among classes
natural in both Halle and Clements and UFT (x axis, n=3767) against its
frequency among Halle and Clements-only classes (y axis). The second
chart shows only [−cor], [−lab], [−tense], and [+tense].]
Figure 8. Usage of UFT features (vs. the Halle and Clements system)
[Two scatter charts: frequency of each UFT feature value among classes
natural in both UFT and Halle and Clements (x axis, n=3767) against its
frequency among UFT-only classes (y axis, n=149). The second chart shows
only the UFT feature values with no counterpart in the Halle and Clements
system.]
boundary. [+approx] (vowels, liquids, and glides) occurs twice. Many of the
other recurrent classes in this category involve [−continuant] in conjunction
with other features: (6 [−continuant, +sonorant], 3 [−continuant,
−distributed], etc.)
2.4. UFT: Comparing versions with and without binary place features
We have seen in the previous subsection that a weakness of UFT, in
particular compared to the Halle and Clements system, is the absence of
[] values for place features, as observed by Clements in the quote at the
beginning of this chapter. Going from the Halle and Clements system to UFT
involves a reduction in the number of classes considered to be natural. As
noted in the previous subsection, a big part of this is the lack of [Labial]
in UFT. By giving UFTs place features both plus and minus values, the
system is thus able to describe more classes as natural. Table 12 shows the
phonologically active classes accounted for by the three versions of UFT.
The only difference between UFT and UFT-place is the addition of minus
values of place features, and the only difference between UFT-place and
UFT-full is feature values specied in UFT-full but unspecied in UFT-
place . Thus, the classes captured by UFT-full are a superset of the classes
accounted for by UFT-place , and the classes accounted for by UFT-place
are a superset of those accounted for by UFT.
There are 547 more natural classes captured by UFT-place than by
conventional UFT, including 167 classes involving [Coronal], 151
involving [Labial], and 106 involving [Dorsal]. These include many of
the grave classes in Table 6 (above), which made use of the feature [grave]
Table 11. Phonologically active classes defined using [−vocoid] in UFT and
unnatural in the Halle and Clements system

segments                             excluded           languages
Consonants including glottals        vowels and glides  Ilocano, Irish, Oneida, Thompson
Consonants including glottals        glides             Ecuador Quichua (Puyo Pongo variety)
Consonants including syllabic nasal  vowels             Dagur
from Preliminaries, and many of the nonlabial classes in Table 8, which
made use of [−labial] in the Halle and Clements system. Figure 9 shows that
[−Coronal], [−Labial], and [−Dorsal] are the only feature values that are far
above the diagonal.
2.5. UFT: Comparing versions with binary place features
and full specification
Compared to the earlier feature systems, UFT is restrictive, in that it excludes
minus values of certain features and does not have values for features that
are dependents of privative features that are not present. It was seen in the
previous subsection that including the minus values of the place features
permits 547 more classes, mostly involving the minus values of [Coronal],
[Labial], and [Dorsal]. An additional step is to specify all features for
every segment (e.g. [distributed], [anterior], and [lateral] for non-coronals).
In this scenario, the difference between the full-specification UFT system
and the non-UFT feature systems is just the choice of features, and not the
restrictions on specifying them. Full specification was achieved by giving a
[−] in place of all unspecified values.
There are 183 classes that are natural according to UFT-full, but not
according to UFT-place. The most frequent feature descriptions among
these classes involve the feature [distributed], which accounts for 75 of
the recurrent natural classes in this category, [syllabic], which accounts
for 65, and [sonorant], which accounts for 51. Figure 10 shows that versions
of [distributed], [anterior], and [coronal] are more frequent among the
classes not accounted for by the other version of UFT.
Table 12. UFT variants

classes   median size   natural in
3916      4             UFT, UFT-place, UFT-full
547       3             UFT-place, UFT-full only
183       3             UFT-full only
1504      5             none of the three
6159      4             totals: UFT 3921 (63.7%), UFT-place 4468 (72.5%), UFT-full 4651 (75.5%)
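The totals row of Table 12 can be checked directly: each variant's count is the previous one plus the newly admitted classes, and the percentages are fractions of the 6,159 attested classes. The snippet below is a quick arithmetic check, not part of the original study.

```python
# Totals from Table 12: each UFT variant adds the newly admitted classes.
total = 6159
uft = 3921
uft_place = uft + 547        # minus values of place features added
uft_full = uft_place + 183   # full specification added
shares = [round(100 * n / total, 1) for n in (uft, uft_place, uft_full)]
# shares reproduces the table's bottom row: 63.7, 72.5, 75.5
```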
[Figure 9. Usage of UFT features, with and without binary place features.
Scatter plots ("Feature usage for binary Unified Feature Theory") of each
feature value's frequency among classes natural for UFT with or without
binary place features (n=3916) against its frequency among classes natural
only for UFT with binary place features (n=547).]
[distributed] and [anterior] are features that are only specified in coronals
in the other UFT-based feature systems. They are used more when they are
fully specified because there are many segments that otherwise would
not have values for them, not necessarily because there are additional
phonetically natural classes that were neglected by the other feature systems.
The feature bundles with three or more phonologically active classes that
become natural if the rest of the UFT features are made binary are 4 [+nasal,
−distributed] (nonpalatal nasals, an analysis that depends, perhaps dubiously,
on labials being [−distributed]), 4 [−C-place Labial, −C-place Lingual]
(vowels, glottals and glides, which in three of these cases pattern together by
being transparent to spreading that is blocked by other segments), [C-place
−anterior] (noncoronal consonants plus posterior coronals, which are active
in the same sound pattern occurring in three varieties of Slavey), and 3
[−C-place Lingual, −Labial] consonants (glottals and / j /). These classes,
being the most frequent of all the classes gained by adding full specification,
suggest that we are now scraping the bottom of the barrel.
[Figure 10. Usage of binary UFT features, with and without full specification.
Scatter plot of each feature value's frequency among classes natural for
binary UFT with or without full specification (n=4463) against its frequency
among classes natural only for UFT with full specification (n=183).]
2.6. Discussion: improving UFT-place
While UFT-place appears to do the best of the three feature systems that
handle similar numbers of phonologically active classes (as seen above
in Figure 2), there are still 201 classes that are handled by SPE but not
by UFT-place (Table 3, above). Figure 11 shows that the feature values
frequently involved in the classes that are only natural for SPE are the
major class feature values [+continuant] and [−vocalic], the place feature
values used to refer to labial+coronal classes, [−high] and [+anterior], and
also [heightened subglottal pressure] and [+sonorant], which are not
involved in as many recurrent classes.
Most of these recurrent classes are also handled by the Halle and Clements
system. Many of them involve subgroupings of place of articulation,
including the two most frequent SPE classes that are unnatural in the UFT-place
system: 8 [−high, +nasal] and 6 [+anterior, +nasal], both of which
define classes of labial and coronal (but not velar) nasals. Other recurrent
classes involve the major class features [vocalic] and [consonantal],
which are defined somewhat differently in SPE and UFT: 5 [−coronal,
+continuant, −vocalic], 4 [−consonantal], 4 [−vocalic], 4 [+vocalic],
and 4 [+high, −consonantal, −vocalic].

[Figure 11. Usage of SPE features (vs. UFT-place features). Scatter plot of
each feature value's frequency among classes natural for both SPE and
binary UFT (n=4141) against its frequency among classes natural for SPE
only (n=201).]
At issue here are features that define slightly different classes, such as
slightly different definitions for major class features, and the difference
between [high] and [Dorsal]. The problem that these classes highlight
is not the particular features chosen, but the expectation that a limited set
of universally-defined features should be able to define the wide range
of classes that are phonologically active. These examples suggest that
traditional distinctive feature systems underdetermine the phonologically
relevant boundaries between different places of articulation.
2.7. Summary
Most of the historical changes between feature systems described above
involved adding the ability to represent a recurrent phonetically natural class.
The exception is the last step, going to UFT-full. This supports the idea that
there may be a natural limit around 70–75%, beyond which there are mostly
just non-recurrent phonetically unnatural classes. The next section explores
two strategies for dealing with the residue of featurally unnatural classes.
3. Union and Subtraction
The phonologically active classes that are unnatural according to the feature
systems explored in the previous section are generally not random assortments
of segments. Many of them may be due to the combined effects of what were
originally distinct sound patterns, and as such may be modeled effectively by
using more than one natural class. This section investigates and interprets the
relative success of rule- and constraint-based approaches to phonologically
active classes that cannot be represented by a conjunction of UFT-place
distinctive features. The OT-style conflict-based approach is compared with
the SPE-style bracket notation. We examine featurally unnatural classes to
see how well they can be represented in terms of two natural classes, i.e., as
the union of two natural classes, or as the result of subtracting one natural
class from another.
Unnatural classes are groups of sounds that may be active in the phonology
of a language but which cannot be represented as a single feature bundle in
a particular feature theory. Unnatural classes that are phonologically active
are not rare (Mielke 2008). A classic unnatural class is involved in the ruki
rule of Sanskrit, whereby / s / in verbal roots is retroflexed following any of
the segments {r u k i} (Whitney 1960, Renou 1961, Zwicky 1970).
Phonological theory has had ways of dealing with unnatural classes. For
example, in rule-based phonology (e.g., Chomsky and Halle 1968), bracket
notation allows rules to reference unnatural classes if they are the union of two
or more classes which are themselves natural. In Optimality Theory (Prince
and Smolensky 1993), constraint interaction permits unnatural classes to
act together if a constraint referring to a natural class is opposed by one or
more constraints referring to overlapping natural classes, as described by
Flemming (2005). Caveats include the fact that both approaches allow any
logically possible class of sounds to be represented (given a feature system
rich enough to capture all the necessary segmental contrasts), and each can
use the other's approach too. The point here is about the relative merits of
union and subtraction as methods for describing unnatural classes.
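The two notions can be made concrete with a small brute-force checker. The sketch below is ours, not the authors' implementation: it uses an invented, fully specified toy inventory loosely modeled on the Afrikaans example in Figures 12 and 13, and simply enumerates candidate analyses (feasible only for toy inventories).

```python
from itertools import combinations

def natural(cls, feats):
    """A class is natural if the feature values shared by all of its
    members pick out exactly those members from the whole inventory."""
    cls = set(cls)
    shared = set.intersection(*(set(feats[s].items()) for s in cls))
    return {s for s in feats if shared <= set(feats[s].items())} == cls

def union_analysis(cls, feats):
    """SPE-style: is cls the union of two natural classes?"""
    cls = set(cls)
    return any(natural(part, feats) and natural(cls - set(part), feats)
               for n in range(1, len(cls))
               for part in combinations(cls, n))

def subtraction_analysis(cls, feats):
    """OT-style: is cls a natural class minus another natural class?"""
    segs, cls = sorted(feats), set(cls)
    nats = [set(c) for n in range(1, len(segs) + 1)
            for c in combinations(segs, n) if natural(c, feats)]
    return any(a - b == cls for a in nats for b in nats)

# Invented toy inventory; each string gives +/- values for the features
# voice, continuant, Dorsal, Coronal, approximant, nasal, vocoid.
def seg(spec):
    names = ["voice", "cont", "Dors", "Cor", "approx", "nas", "voc"]
    return dict(zip(names, spec))

feats = {s: seg(v) for s, v in {
    "p": "-------", "t": "---+---", "k": "--+----",
    "f": "-+-----", "s": "-+-+---", "x": "-++----",
    "b": "+------", "d": "+--+---", "r": "++-++--",
    "l": "++-++--", "j": "+++-+-+", "m": "+----+-",
    "n": "+--+-+-"}.items()}

unnatural_class = {"k", "x", "r", "l"}  # triggers of the Afrikaans lowering
```

With this toy system, `natural(unnatural_class, feats)` is False, while both `union_analysis` (voiceless dorsals plus coronal approximants) and `subtraction_analysis` (nonnasal nonvocoids minus nondorsal nonapproximants) succeed, mirroring the Afrikaans case in which both styles of analysis are available.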
3.1. Predictions
The two approaches make different predictions about the frequency of
occurrence of unnatural classes. The SPE-style approach predicts that the
most frequent unnatural classes should be those easily represented with the
union of natural classes, while the OT-style approach predicts that the most
frequent unnatural classes should be those which are easily represented
through the subtraction of one natural class from another.
Figures 12–13 show the segments that cause a preceding /ɛ/ to be realized
as [æ] in a variety of Afrikaans spoken in the Transvaal and the Free State.
Figure 12 shows how the class can be analyzed using UFT-place features
as the union of two natural classes. In this approach, the unnatural class
/ k x r l / is assembled by combining the natural class of voiceless dorsals
(/ k x /) with the natural class of coronal approximant consonants (/ r l /).
Figure 13 shows how the same class can be analyzed by subtracting one
natural class from another. The unnatural class / k x r l / is assembled in this
approach by starting with the class of nonnasal nonvocoids and subtracting
the class of nondorsal nonapproximants. Notice that an OT-style union
account is possible using two markedness constraints that refer to different
classes, and an SPE-style subtraction account is possible using rule
ordering.
Figure 12. Union analysis of an unnatural class in a variety of Afrikaans
(the segments that cause a preceding /ɛ/ to be realized as [æ])

Subset: the active class / k x r l / within the inventory p t k, f s x, b d, ɦ v, r, l, j, m n ŋ

Disjunction: [−voice, +Dorsal] ∪ [+approximant, +C-place Coronal]

Rule: /ɛ/ → [+open1] / __ { [−voice, +Dorsal] ∨ [+approximant, +C-place Coronal] }

Figure 13. Subtraction analysis of an unnatural class in a variety of Afrikaans

Subset: the active class / k x r l / within the inventory p t k, f s x, b d, ɦ v, r, l, j, m n ŋ

Subtraction: [−vocoid, −nasal] − [−approximant, −Dorsal]

Ranking: *æ[−approximant, −Dorsal] >> *ɛ[−vocoid, −nasal] >> IDENT[open1]
Figures 14 and 15 illustrate phonologically active classes that can each be
analyzed in only one of these two ways. Figure 14 shows the segments that
cause a preceding nasal to be realized as [n] in Diola-Fogny (Sapir 1965);
this unnatural class can be analyzed in UFT-place as the union of two
natural classes. Figure 15 shows the segments that are nasalized before a
nasalized vowel in Aoma (Elugbe 1989); this unnatural class can be analyzed
in UFT-place by subtracting one natural class from another.
3.2. Testing
The predictions of these two approaches to unnatural classes were tested
using the 1691 P-base classes that are unnatural according to UFT-place.
These phonologically active unnatural classes were compared with 1691
matching randomly generated classes. The random classes were generated as
above, by randomly drawing a class of the same size from the same inventory
as the real class. We exclude the 14 randomly generated classes that turned
out to be natural according to the feature system and focus on the 1677 pairs
of real and randomly generated classes that are unnatural according to this
Figure 14. Union analysis of an unnatural class in Diola-Fogny: the segments that
cause a preceding nasal to be realized as [n]

Subset: the active class within the inventory p t c k, b d g, f s h, m n ŋ, j w, l

Disjunction: [+Labial, +continuant] ∪ [+anterior, −lateral]

Rule: [+nasal] → [+Coronal] / __ { [+Labial, +continuant] ∨ [+anterior, −lateral] }
Figure 15. Subtraction analysis of an unnatural class in Aoma: the segments that are
nasalized before nasal vowels

Subset: the active class within the inventory p t k k͡p, b d g g͡b, f s x h, v z ɣ, r, j w, l, m

Subtraction: [−nasal, −syllabic] − [−lateral, −spread, −vocoid]

Ranking: *[+nasal, −lateral, −spread, −vocoid] >> *[−nasal, −syllabic]Ṽ >> IDENT[nasal]
feature system. The union approach (combining two natural classes) was able
to handle 1173 real classes (69.9%), and the subtraction approach was able
to handle 857 real classes (51.1%). While these results look favorable for the
union approach, union is also better at handling randomly generated classes.
Union represents 434 of the random classes (25.9%), while subtraction
represents only 129 random classes (7.7%). Thus, union successfully
represents 2.7 times as many real unnatural classes as randomly generated
ones, whereas subtraction successfully represents 6.6 times as many. This
suggests that union's success with real unnatural classes is perhaps not due
to its being a good model of phonology, but due to its being able to handle
many logically possible (but not necessarily attested) combinations of
segments. However, the randomly generated classes that are handled by
union and subtraction are mostly the very small classes. Two-segment classes
are trivially represented by union (i.e., the class of segment A plus the class of
segment B), but almost no randomly generated classes of seven segments
or larger can be represented by either technique, while both techniques can
[Figure 16. Representing unnatural real classes: union vs. subtraction. Bar
chart of analysis success rate by class size: 2 (n=261), 3 (n=321), 4 (n=229),
5 (n=175), 6 (n=129), 7+ (n=562); bars split into union only, union and
subtraction, and subtraction only.]
represent about half of the real unnatural classes of seven or more segments.
Figures 16–17 show the percentage of classes of different sizes that are
handled by the two techniques.
We conclude that very small classes have little value for testing these
approaches to unnatural classes. Focusing on the classes that are large
enough that their random counterparts are not trivially representable shows
that both techniques are quite effective at modeling the observed unnatural
classes. Union has a higher success rate, and the classes that are handled
by subtraction are nearly a subset of the classes that are handled by
union. Since many of the superficially unnatural classes of sounds can
be understood in terms of the interaction of more than one sound pattern,
deciding whether subtraction or union is more appropriate in a particular
instance requires looking at each sound pattern in more detail (for this, see
e.g., Hall 2010).
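The size-matched random baseline of section 3.2 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' script; the inventory shown is invented, while the ratio computation at the end uses the counts reported above.

```python
import random

def matched_random_class(real_class, inventory, rng):
    """Draw a random class of the same size from the same inventory,
    as in the baseline construction described in section 3.2."""
    return set(rng.sample(sorted(inventory), len(real_class)))

rng = random.Random(0)                # fixed seed for reproducibility
inventory = set("ptkbdgfsxvzmnrlwj")  # invented toy inventory
real = {"k", "x", "r", "l"}           # the attested Afrikaans class
fake = matched_random_class(real, inventory, rng)

# Success ratios reported in the text: real vs. random classes handled.
union_ratio = 1173 / 434        # union: ~2.7 times as many real as random
subtraction_ratio = 857 / 129   # subtraction: ~6.6 times as many
```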
4. Conclusion
A central goal of an adequate feature system is to capture classes of sounds
that pattern together in language systems, and ideally only those classes.
Achieving this goal requires a theory powerful enough to represent naturally
occurring classes, yet sufficiently constrained so that it does not predict
unnatural classes. The six feature theories examined in this study display
varying degrees of success in capturing the 6,159 naturally occurring
phonological classes drawn from P-base. Consistent with the prediction
of Nick Clements, a version of UFT enhanced with binary place features
[Figure 17. Representing unnatural random classes: union vs. subtraction.
Bar chart of analysis success rate by class size for the randomly generated
classes, with the same size bins as Figure 16; bars split into union only,
union and subtraction, and subtraction only.]
emerged as competitive with the SPE and Halle and Clements systems. Each
of these three systems was able to represent between 70% and 73% of the
naturally occurring phonologically active classes. UFT-place performed
slightly better than the SPE and Halle and Clements systems, accounting
for more observed classes and fewer random classes. We have suggested
that this percentage may reflect distinctive feature theory's natural threshold
for capturing classes, and that the remaining 27–30% of real phonological
classes are not phonetically natural enough to be expressed in terms of the
best systems of phonetically-defined distinctive features.
Of the approximately 6,000 real classes used in this study, about 3,000
phonologically active classes are treated as natural by all six feature systems
and about 1,300 classes are unnatural for all. As discussed above, the
remaining classes reveal differences among the feature systems. In some
cases, the differences are due to the absence of a feature in a theory, e.g.
[labial] in SPE. Other differences reflect the fact that the partitioning of
classes can vary from one system to another, such that the same value of a
feature with the same name targets different sounds depending on the system.
This is seen with the use of [continuant] in the Halle and Clements system
as opposed to the SPE system, as reflected, for example, in how laterals are
classified. A slight variation in the definition of similar features also creates
differences in how sounds are classified, e.g. [high] and [Dorsal]. Instead
of converging on well-defined boundaries between places of articulation or
major classes of segments, the different feature systems succeed because
they define the boundaries in slightly different places, essentially defining
the class of coronals with and without palatals (see Table 6) by using the
features [−grave], [+coronal], and [−labial, −back], and defining the class of
consonants or obstruents with and without glottals and/or glides (see Table
11) by using features such as [−sonorant], [+consonantal], [−approximant]
and [−vocoid]. These slightly different features succeed in allowing slight
variations on familiar classes, but often do so circuitously, much like
using [heightened subglottal pressure] in order to define the class / n l / in
a language that also has /r/ (see Table 5). The issue that these differences
highlight does not have as much to do with the particular features chosen as
with the view that a limited set of phonetically-defined features should
be able to express the full range of observed classes. While the top theories
do remarkably well, capturing almost 73% of observed classes, the fact that
historical pathways can lead to the creation of phonetically unnatural classes
and to slight variations on familiar classes means that no phonetically-based
feature system with a small number of features will ever be able to represent
all phonologically active classes (see Mielke 2008 for discussion).
Given the significant number of classes that are unnatural, yet
phonologically active, our study also evaluated the success of rule- and
constraint-based theories of phonology in expressing these classes by means
of the union or subtraction of feature values. The results reveal that the union
approach fares slightly better in capturing phonologically active unnatural
classes.
We conclude that the Unified Feature Theory augmented with minus
values of place features, as suggested by Nick Clements, is the most effective
of the six systems we compared at describing phonologically active classes
without overgenerating too much. Improving upon this system brings us to
the point of adding slightly different versions of the same features, blurring
the boundaries between two feature values, dealing with unnatural classes,
or otherwise departing from the original notion of a small set of distinctive
features to describe sound patterns. While these modifications are necessary
in order to provide feature-based analyses of all observed sound patterns, we
can conclude that UFT-place has achieved the goal of maximizing the range
of recurrent phonetically natural classes of sounds that are phonologically
active in the world's languages and can be accounted for by a small set of
distinctive features.
Appendix
5. References for languages from P-base mentioned in the chapter
Acehnese (Durie 1985)
Afrikaans spoken in the Transvaal and the Free State (Donaldson 1993)
Agulis Armenian (Vaux 1998)
Aoma (Elugbe 1989)
Arabana (Hercus 1994)
Asmat (Flamingo Bay dialect) (Voorhoeve 1965)
Batibo Moghamo (Meta) (Stallcup 1978)
Catalan (Wheeler 1979)
Coeur d'Alene (Johnson 1975)
Dàgáárè (Bodomo 2000)
Dagur (Martin 1961)
Dhaasanac (Tosco 2001)
Dieri (Diyari) (Austin 1981)
Diola-Fogny (Sapir 1965). Ecuador Quichua (Puyo Pongo variety) (Orr 1962)
Edo (Elugbe 1989)
Ehueun (Elugbe 1989)
Epie (Elugbe 1989)
Estonian (Harms 1962)
Faroese (Lockwood 1955)
Gooniyandi (McGregor 1990)
Gunin/Kwini (McGregor 1993)
Harar Oromo (Owens 1985)
Hungarian (Abondolo 1988)
Ilocano (Rubino 2000)
Irish (Siadhail 1989)
Japanese (Vance 1987)
Kalenjin (Toweett 1979)
Khmuʔ (Smalley 1961)
Kilivila (Lawton 1993)
Koromfe (Rennison 1997)
Kumiái (Diegueño) (Gorbet 1976)
Maasai (Hollis 1971)
Martuthunira (Dench 1995)
Melayu Betawi (Ikranagara 1975)
Midland Mixe (Puxmetacan) (Wichmann 1995)
Mising (Prasad 1991)
Muruwari (Oates 1988)
Muscat Arabic (Glover 1989)
Nandi Kalenjin (Toweett 1979)
Northern Tepehuan (Willett 1988)
O'odham (Saxton 1979)
Okpe (Elugbe 1989)
Oneida (Michelson 1983)
San Pedro de Cajas Junín (Adelaar 1977)
Slovene (Herrity 2000)
South Highland Mixe (Tlahuitoltepec) (Wichmann 1995)
Sri Lanka Creole Portuguese (Smith 1981)
Supyire Senoufo (Carlson 1994)
Tarma Quechua (Adelaar 1977)
Tetelcingo Nahuatl (Tuggy 1979)
Thompson (Thompson and Thompson 1992)
Tirmaga (Bryant 1999)
Ukrainian (Bidwell 1967–68)
Ukue (Elugbe 1989)
Uvbie (Elugbe 1989)
Wangkangurru (Hercus 1994)
Yiddish (Katz 1987)
Yidiɲ (Dixon 1977)
Note
1. The voiceless ~ voiced pairs in Japanese include h ~ b.
References
Abondolo, D. M.
1988 Hungarian inflectional morphology. Budapest: Akadémiai Kiadó.
Adelaar, W. F. H.
1977 Tarma Quechua grammar, texts, dictionary. Lisse: The Peter de
Ridder Press.
Austin, Peter.
1981 A grammar of Diyari, South Australia. Cambridge: Cambridge
University Press.
Bidwell, Charles E.
1967–68 Outline of Ukrainian morphology. University of Pittsburgh.
Bodomo, Adams B.
2000 Dàgáárè. München: Lincom Europa.
Bryant, Michael Grayson.
1999 Aspects of Tirmaga grammar. Ann Arbor: UMI.
Carlson, Robert.
1994 A grammar of Supyire. Berlin: Mouton de Gruyter.
Chomsky, Noam, and Morris Halle.
1968 The Sound Pattern of English. Cambridge, Mass.: MIT Press.
Clements, G.N.
1976 Palatalization: Linking or assimilation? In Chicago Linguistic Society
12, 96–109.
2001 Representational economy in constraint-based phonology. In
Distinctive feature theory, ed. T. Alan Hall, 71–146. Berlin: Mouton
de Gruyter.
Clements, G.N., and Elizabeth V. Hume.
1995 The internal organization of speech sounds. In The handbook of
phonological theory, ed. John Goldsmith, 245–306. Cambridge,
Mass.: Blackwell.
Dench, Alan Charles.
1995 Martuthunira: A language of the Pilbara region of Western Australia.
Canberra: Pacific Linguistics.
Dixon, R. M. W.
1977 A grammar of Yidin. New York: Cambridge University Press.
Donaldson, Bruce C.
1993 A grammar of Afrikaans. New York: Mouton de Gruyter.
Durie, Mark.
1985 A grammar of Acehnese on the basis of a dialect of North Aceh.
Bloomington/The Hague: Foris.
Elugbe, Ben Ohiomamhe.
1989 Comparative edoid: Phonology and lexicon. Port: University of Port
Harcourt Press.
Flemming, Edward S.
2005 Deriving natural classes in phonology. Lingua 115: 287–309.
Glover, Bonnie Carol.
1989 The morphophonology of Muscat Arabic. Ann Arbor: UMI.
Gorbet, Larry Paul.
1976 A grammar of Diegueño nominals. New York: Garland Publishing, Inc.
Hall, Daniel Currie.
2010 Probing the unnatural. In Linguistics in the Netherlands 2010, ed.
Jacqueline van Kampen and Rick Nouwen, 71–83. Amsterdam: John
Benjamins.
Halle, Morris, and G.N. Clements.
1983 Problem Book in Phonology. Cambridge, Mass.: The MIT Press.
Harms, Robert T.
1962 Estonian grammar. Bloomington/The Hague: Mouton & Co./Indiana
University.
Hercus, Luise A.
1994 A grammar of the Arabana-Wangkangurru language, Lake Eyre
Basin, South Australia. Canberra: Pacific Linguistics.
Herrity, Peter.
2000 Slovene: A comprehensive grammar. New York: Routledge.
Hollis, Alfred C.
1971 The Masai: Their language and folklore. Freeport, NY: Books for
Libraries Press.
Hume, Elizabeth V.
1994 Front vowels, coronal consonants and their interaction in nonlinear
phonology. New York: Garland.
Hyman, Larry.
1972 The feature [grave] in phonological theory. Journal of Phonetics
1: 329–337.
Ikranagara, Kay.
1975 Melayu Betawi grammar. Ann Arbor: UMI.
Jakobson, Roman, C. Gunnar M. Fant, and Morris Halle.
1952 Preliminaries to speech analysis: the distinctive features and their
correlates. Cambridge, Mass.: MIT Press.
Johnson, Robert Erik.
1975 The role of phonetic detail in Coeur d'Alene phonology. Ann Arbor:
UMI.
Katz, Dovid.
1987 Grammar of the Yiddish language. London: Duckworth.
Langacker, Ronald W., ed.
1979 Studies in Uto-Aztecan Grammar. Arlington: The Summer Institute of
Linguistics and The University of Texas at Arlington.
Lawton, Ralph.
1993 Topics in the description of Kiriwina. Canberra: Pacific Linguistics.
Lockwood, W. B.
1955 An introduction to Modern Faroese. København: Ejnar Munksgaard.
Mackie, Scott, and Jeff Mielke.
2011 Feature economy in natural, random, and synthetic inventories. In
Where do phonological contrasts come from?, ed. G. N. Clements and
Rachid Ridouane. Amsterdam: John Benjamins.
Martin, Samuel Elmo.
1961 Dagur Mongolian grammar, texts, and lexicon; based on the speech of
Peter Onon. Bloomington: Indiana University.
McGregor, William.
1990 A functional grammar of Gooniyandi. Philadelphia: John Benjamins
Publishing Company.
McGregor, William B.
1993 Gunin/Kwini. München: Lincom Europa.
Michelson, Karin Eva.
1983 A comparative study of accent in the Five Nations Iroquoian languages.
Ann Arbor: UMI.
Mielke, Jeff.
2008 The Emergence of Distinctive Features. Oxford: Oxford University
Press.
to appear A phonetically-based metric of sound similarity. Lingua.
Miller, Brett.
2011 Feature patterns: Their sources and status in grammar and re-
construction. Doctoral Dissertation, Trinity College, Dublin.
Oates, Lynette F.
1988 The Muruwari language. Canberra: Pacific Linguistics.
Odden, David.
1978 Further evidence for the feature [grave]. Linguistic Inquiry 9: 141–144.
Orr, Carolyn.
1962 Ecuador Quichua phonology. In Studies in Ecuadorian Indian
languages: I, ed. Benjamin Elson. Norman: Summer Institute of
Linguistics of the University of Oklahoma.
Ó Siadhail, Mícheál.
1989 Modern Irish: Grammatical structure and dialectal variation. New
York: Cambridge University Press.
Owens, Jonathan.
1985 A grammar of Harar Oromo (Northeastern Ethiopia). Hamburg:
Helmut Buske Verlag.
Prasad, Bal Ram.
1991 Mising grammar. Mysore: Central Institute of Indian Languages.
Prince, Alan, and Paul Smolensky.
1993 Optimality theory: Constraint interaction in generative grammar. Ms,
Rutgers University, New Brunswick and University of Colorado,
Boulder.
Rennison, J ohn R.
1997 Koromfe. New York: Routledge.
Renou, L.
1961 Grammaire Sanscrite. Paris: Adrien-Maisonneuve.
Rubino, Carl Ralph Galvez.
2000 Ilocano dictionary and grammar. Honolulu: University of Hawai'i Press.
Sapir, J. David.
1965 A grammar of Diola-Fogny. Cambridge: Cambridge University Press.
Saxton, Dean.
1979 Papago. In Langacker (1979).
Smalley, William A.
1961 Outline of Khmu? structure. New Haven: American Oriental Society.
Smith, Ian Russell.
1981 Sri Lanka Portuguese Creole phonology. Ann Arbor: UMI.
Solé, Maria-Josep.
2002 Aerodynamic characteristics of trills and phonological patterning.
Journal of Phonetics 30: 655–688.
Stallcup, Kenneth Lyell.
1978 A comparative perspective on the phonology and noun classification
of three Cameroon Grassfields Bantu languages: Moghamo, Ngie,
and Oshie. Ann Arbor: UMI.
Thompson, Laurence C., and M. Terry Thompson.
1992 The Thompson language. University of Montana Occasional Papers in
Linguistics No. 8.
Tosco, Mauro.
2001 The Dhaasanac language: grammar, text, vocabulary of a Cushitic
language of Ethiopia. Köln: Rüdiger Köppe Verlag.
Toweett, Taaitta.
1979 A study of Kalenjin linguistics. Nairobi: Kenya Literature Bureau.
Tuggy, David H.
1979 Tetelcingo Nahuatl. In Langacker (1979).
Vance, Timothy J .
1987 An introduction to Japanese phonology. Albany, N.Y.: State University
of New York Press.
Vaux, Bert.
1998 The phonology of Armenian. Oxford: Clarendon Press.
Voorhoeve, C. L.
1965 The Flamingo Bay dialect of the Asmat language. 's-Gravenhage:
Martinus Nijhoff.
Wheeler, Max.
1979 Phonology of Catalan. Oxford: Basil Blackwell.
Whitney, W.D.
1960 Sanskrit grammar [9th issue of 2nd ed.]. Cambridge, Mass.: Harvard
University Press.
Wichmann, Søren.
1995 The relationship among the Mixe-Zoquean languages of Mexico. Salt
Lake City: University of Utah Press.
Willett, Thomas Leslie.
1988 A reference grammar of Southeastern Tepehuan. Ann Arbor: UMI.
Zwicky, Arnold M.
1970 Greek-letter variables and the Sanskrit ruki class. Linguistic Inquiry 1:
549–55.
Language-independent bases of distinctive features
Rachid Ridouane, G. N. Clements,
and Rajesh Khatiwada
1. Introduction
A basic principle of human spoken language communication is phonological
contrast: distinctions among discrete units that convey different grammatical,
morphological or lexical meanings. Among these units, features have achieved
wide success in the domain of phonological description and play a central
role as the ultimate constitutive elements of phonological representation.
Various principles are claimed to characterize these elements. They are
universal in the sense that all languages define their speech sounds in terms
of a small feature set. They are distinctive in that they commonly distinguish
one phoneme from another. They delimit the number of theoretically possible
speech sound contrasts within and across languages. They are economical
in allowing relatively large phoneme systems to be defined in terms of a
much smaller feature set. They define natural classes of sounds observed in
recurrent phonological patterns.
A main theme of the literature is that existing distinctive features are generally
satisfactory for phonological purposes but may not be phonetically adequate
(e.g. Ladefoged 1973, 1980, 1993; Löfqvist and Yoshioka 1981). At least
three positions have been taken: (1) existing features should be improved by
better phonetic definitions (e.g. Halle 1983; Stevens 1989, 2003; Stevens
and Keyser 2010); (2) existing features should be supplemented with
phonetic features (e.g. Flemming 1995; Boersma 1998); and (3) existing
features should be replaced with phonetically more adequate primitives (e.g.
the gestures of Browman and Goldstein 1986). We take up the first position
and argue that both the acoustic and articulatory structure of speech should
and argue that both the acoustic and articulatory structure of speech should
be incorporated into the denition of phonological features. Features are
typically dened, according to the researcher, either in the acoustic-auditory
domain (e.g. J akobson, Fant and Halle 1952), or in the articulatory domain
(e.g. Chomsky and Halle 1968). After several decades of research, these
conicting approaches have not yet led to any widely-accepted synthesis
(Durand 2000). A problem for purely acoustic approaches is the widely-
Language-independent bases of distinctive features 265
acknowledged difculty in nding acoustic invariants for a number of
fundamental features, such as those characterizing the major places of
articulation. A problem for purely articulatory approaches is raised by the
existence of articulator-independent features such as [continuant], which is
implemented with different gestures according to the articulator employed
(e.g. no invariant gesture is shared by the continuants [f], [s], and [x]).
These and other problems suggest that neither a purely acoustic nor a purely
articulatory account is self-sufficient. In recent years, a new initiative has
emerged within the framework of the Quantal Theory of speech, developed
by K.N. Stevens and his colleagues (e.g. Stevens 1989, 2002, 2003, 2005;
Stevens and Keyser 2010). Quantal theory claims that there are phonetic
regions in which the relationship between an articulatory configuration and
its corresponding acoustic output is not linear. These regions form the basis
for a universal set of distinctive features, each of which corresponds to an
articulatory-acoustic coupling within which the auditory system is insensitive
to small articulatory movements. A main innovation of this theory is the
equal status it accords to the acoustic, auditory, and articulatory dimensions
of spoken language. For a feature to be recovered from a speech event,
not only must its articulatory condition be met, but its acoustic definition
must be satisfied, or else further enhancing attributes must be present. The
defining acoustic attributes of a feature are a direct consequence of its
articulatory definition. These are considered to be language-independent.
The enhancing attributes of a feature are additional cues that aid in its
identification. These may vary from language to language (Stevens and
Keyser 2010).
Our objective will be to propose a language-independent phonetic
definition of the feature [spread glottis]. We will show that an articulatory
definition of this feature in terms of a single common glottal configuration or
gesture would be insufficient to account for the full range of speech sounds
characterized by this feature; an acoustic definition is also necessary.
1.1. The feature [spread glottis]
Laryngeal features for consonants are used to define the following
phonologically distinctive dimensions: voicing, aspiration, and glottalization.
These have been expressed with different sets of features. Depending on
the particular author, the differences between the features used mainly
reflect different interpretations of how these laryngeal dimensions are
physically produced. Voicing is defined either by the feature [voice] or
by a combination of the features [slack vocal cords]/[stiff vocal cords].
Glottalization is defined either by the feature [checked] or by [constricted glottis].
Good reviews of the use of these features are provided by Keating (1988:
17–22), Lombardi (1991: 130), Jessen (1998: 117–136), and the references
therein.
Aspiration, which is the main concern of this paper, has traditionally been
defined as a "puff of air" (Heffner 1950) or "breath" (Jones 1964) following
the release of a consonant. Lisker and Abramson (1964) relate the contrast
of aspiration (and voicing) mainly to different timing of laryngeal activity
relative to the supralaryngeal constriction. According to them, "the feature
of aspiration is directly related to the timing of voice onset" (Lisker
and Abramson 1967: 15). With the exception of Browman and Goldstein
(1986), who incorporated this timing into phonological analysis, Lisker and
Abramson's framework has had almost no impact on phonologists (the VOT
criterion is discussed in more detail in section 3). Rather, phonologists
used various timeless features to define aspiration: [tense], [heightened
subglottal pressure], [spread glottis], or [aspirated]. Jakobson, Fant and
Halle (1952) use the feature [tense] to distinguish aspirated from unaspirated
stops in Germanic languages such as English. Jessen (1998) provides
the same analysis for German stops. Languages using both voicing and
aspiration distinctively (such as Nepali, Thai and Hindi) are said to use both
the feature [voice] and the feature [tense]. For Chomsky and Halle (1968),
aspiration is represented by the feature [heightened subglottal pressure].
This feature, meant to represent the extra energy required for aspiration, was
highly controversial and has never been widely used. Subglottal pressure
data from several languages, such as Hindi (Dixit and Shipp 1985), have shown that
aspirated stops are not systematically produced with heightened subglottal
pressure compared to their unaspirated counterparts (but see Ladefoged and
Maddieson 1996 for data on Igbo).
The feature [spread glottis] was first formally proposed as a phonological
laryngeal feature by Halle and Stevens (1971), and
has since achieved notable success among both phonologists and
phoneticians (Ladefoged 1973; Kingston 1990; Iverson 1993; Kenstowicz
1994; Lombardi 1991, 1995; Iverson and Salmons 1995, 2003; Avery 1996;
Jessen 1998; Avery and Idsardi 2001; Vaux and Samuels 2005; among
others). The majority of linguists working within nonlinear phonology and
Optimality Theory assume it exists as part of the universal set of features.
[spread glottis] singles out classes that play a linguistic role. It has a lexical
function in that it distinguishes otherwise similar words in many languages (e.g.
Standard Chinese [pʰa] 'flower' vs. [pa] 'eight'; Nepali [tʰiti] 'condition'
vs. [titʰi] 'date'). It has a phonological function in that it defines natural
classes that figure in phonological patterns. In Nepali, for example, both
consonants in a CVC sequence may not be aspirated, i.e., [+spread glottis]:
[tʰiti] 'condition', [titʰi] 'date', but *[tʰitʰi]. The following types of segments
are commonly assumed to bear the [spread glottis] specification:
Voiceless aspirated stops (e.g. Standard Chinese): [pʰ]
Voiced aspirated or breathy voiced stops (e.g. Nepali): [dʱ]
Voiceless sonorants (e.g. Burmese): [n̥]
While this class of sounds is generally agreed upon, the [+spread glottis]
specification has also been proposed for voiceless fricatives. For Halle
and Stevens (1971), fricatives can exceptionally be [+spread glottis], as in
Burmese, where it is required to distinguish plain /s/ from aspirated /sʰ/, but
not in English, for instance, where these segments are specified as [−spread
glottis]. Numerous researchers, however, argue that voiceless fricatives,
including those occurring in languages where they don't contrast in terms of
aspiration, should be specified as [+spread glottis] (e.g. Rice 1988; Kingston
1990; Cho 1993; Iverson and Salmons 1995; Vaux and Samuels 2005). Both
phonological and phonetic data are claimed to motivate this analysis. It is
claimed to be phonologically motivated since, in English for example, it
allows a unified treatment of stop deaspiration after fricatives (e.g. in 'speed')
and sonorant devoicing after fricatives (e.g. in 'slim'). The claim is that these
clusters represent a sharing of [spread glottis]. It is phonetically motivated
since articulatory data from various unrelated languages have shown that
voiceless fricatives are produced with a large degree of glottal opening
(Kingston 1990; Stevens 1998; see Ridouane 2003 for a review).
In the approach assumed here, a segment can be said to bear the feature
[spread glottis] at the phonetic level only if it satisfies both its articulatory
and acoustic definitions. We will first examine two proposed articulatory
definitions of the feature [spread glottis], and will show that neither is able to
account for the full class of aspirated sounds across languages. We will then
propose a new articulatory definition coupled with an acoustic definition,
and show that it covers all the data.
2. Two proposed articulatory definitions of [spread glottis]
[spread glottis] is typically defined in the articulatory domain. In the current
view, the basic correlate of this feature involves the spreading of the glottis,
as its name suggests. This view originates in the work of C.-W. Kim (1970)
based on cineradiographic data from the Korean voiceless stop series: tense
unaspirated, heavily aspirated, and lax slightly aspirated. Kim defined
aspiration as a function of glottal opening amplitude. In his words (1970:
111):
[…] it seems to be safe to assume that aspiration is nothing but a function
of the glottal opening at the time of release. This is to say that if a stop is
n degree aspirated, it must have an n degree glottal opening at the time of
release of the oral closure. […] no stop is aspirated if it has a small glottal
opening, and […] a stop cannot be unaspirated if it has a large glottal opening
at the time of the oral release.
In this view, the differences in aspiration duration among the three
Korean stop series are due to different degrees of glottal opening. The
heavily aspirated stops have the largest glottal opening (with a peak around
10 mm according to Kim's cineradiographic tracings), the unaspirated stops
have the smallest (less than 1 mm), and the lax slightly aspirated stops have an
intermediate glottal opening (around 3 mm).
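Kim's hypothesis amounts to a monotone mapping from glottal aperture at release to degree of aspiration. The sketch below is ours, not Kim's: the function name and the thresholds are illustrative values interpolated between the approximate apertures read off his cineradiographic tracings.

```python
def korean_stop_category(aperture_mm: float) -> str:
    """Classify a Korean voiceless stop from its glottal aperture at the
    moment of oral release, following Kim's (1970) glottal width
    hypothesis: the wider the glottis at release, the more aspiration.
    Thresholds are illustrative midpoints between Kim's reported values
    (~10 mm heavily aspirated, ~3 mm lax, <1 mm tense unaspirated)."""
    if aperture_mm >= 6.0:
        return "heavily aspirated"
    elif aperture_mm >= 2.0:
        return "slightly aspirated (lax)"
    else:
        return "unaspirated (tense)"

print(korean_stop_category(10.0))  # heavily aspirated
print(korean_stop_category(3.0))   # slightly aspirated (lax)
print(korean_stop_category(0.5))   # unaspirated (tense)
```

The rest of section 2.1.1 shows why such a width-only mapping both undercollects and overcollects the class of aspirated sounds.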
Kim's work was followed by a series of studies on the factors controlling
aspiration vs. non-aspiration. Two main theories can be distinguished in
these studies: the glottal width theory, which views aspiration primarily as
a function of the degree of glottal width (e.g. Kagaya 1974; Hutters 1985); and
the glottal timing theory, which views aspiration as a function of a specific
temporal coordination of the laryngeal gesture in relation to supralaryngeal
events (e.g. Pétursson 1976; Löfqvist 1980). We consider each of these in
turn and show that both fail to account for the full range of facts.
2.1. The glottal width theory
The relevance of the size of glottal opening for the presence vs. absence of
aspiration has been widely demonstrated: in various languages, voiceless
aspirated stops are produced with a relatively large glottal opening gesture,
whereas their unaspirated counterparts are produced with a much narrower
glottal opening, whose closing phase is almost completed at the time of oral release.
Table 1 lists some such languages, in which aspirated voiceless stops have
been shown to be invariably produced with a wide-open glottis.
In Nepali, for example, where aspiration is distinctive, maximal glottal
opening is greater in aspirated stops than in unaspirated stops (Figure 1).
Table 1. A list of languages where aspirated stops have been shown to require a large
glottal opening amplitude
Language     References
Cantonese    Iwata et al. (1981)
Danish       Fukui and Hirose (1983); Hutters (1985)
English      Lisker and Abramson (1971); Löfqvist (1980); Cooper (1991)
Fukienese    Iwata et al. (1979)
German       Hoole, Pompino-Marschall, and Dames (1984); Jessen (1998)
Hindi        Benguerel and Bhatia (1980); Dixit (1989)
Icelandic    Pétursson (1976); Löfqvist and Pétursson (1978)
Korean       Kagaya (1970); Kim (2005)
Maithili     Yadav (1984)
Swedish      Löfqvist and Pétursson (1978); Löfqvist and Yoshioka (1980)
Tibetan      Kjellin (1977)
Figure 1. States of the glottis during the production of the Nepali items [tʰiti]
'condition' (above) and [titʰi] 'date' (below). The figures show that the
amplitude of glottal opening is larger during the production of aspirated
[tʰ], compared to unaspirated [t]. (occ = occlusion phase, v = vowel, rel =
release phase).
There is no doubt that differences in the size of glottal opening are
relevant for the presence or absence of aspiration. The question is whether
these size differences are the controlling factor of aspiration, rather than,
say, interarticulatory timing differences between glottal and supralaryngeal
gestures, as assumed by Löfqvist and colleagues. For Hutters (1985: 15),
based on data from Danish, the production of aspiration is primarily a matter
of the type of glottal gesture rather than of the timing between this gesture and
supraglottal articulations: "The difference between aspirated and unaspirated
stops in the timing of the explosion relative to the glottal gesture is primarily
due to the different types of glottal gesture rather than to a different timing of
the glottal and supraglottal articulations." Similarly, Ladefoged (1993: 142)
holds that "In general, the degree of aspiration (the amount of lag in the
voice onset time) will depend on the degree of glottal aperture during the
closure. The greater the opening of the vocal cords during a stop, the longer
the amount of the following aspiration." The feature [spread glottis], as used
by most phonologists, is also assumed to encode the size of glottal spreading
without incorporating timing dimensions (e.g. Goldsmith 1990; Kenstowicz
1994).
2.1.1. Problems with glottal width theory
There are at least three problems with the view that [spread glottis] should
be defined in terms of glottal opening amplitude alone. First, not all aspirated
sounds are produced with a wide glottal opening. In voiced aspirated stops,
the glottis is only slightly open. This is the case, for example, in Nepali,
as shown in Figure 2. In this language, voiced aspirated stops are produced
with a closed glottis during part of the closure phase and with a slightly
open glottis during part of the closure and the release phase, with the vocal
folds vibrating throughout. Data from Hindi (Dixit 1989; Kagaya and Hirose
1975; Benguerel and Bhatia 1980) and Maithili (Yadav 1984) also show
that aspirated voiced stops are produced with a narrow glottal opening. This
limitation of the glottal width theory of [spread glottis] in characterizing
voiced aspirates has already been pointed out by Ladefoged (1972: 77):
"Since what are commonly called aspirated sounds can be made with
two different degrees of glottal stricture (voiceless and murmur), it seems
inadvisable to try to collapse the notion of aspiration within that of glottal
stricture as has been suggested by Kim (1970)." This suggests that [spread
glottis] should not be defined solely in terms of glottal width.
Second, aspiration as reflected in the VOT duration of pre-vocalic voiceless
stops does not always covary with the maximal degree of glottal opening.
In Hindi, for example, aspiration duration is not proportional to the
degree of glottal opening amplitude (Dixit 1989). In Tashlhiyt Berber,
geminate stops /tt, kk/ and their singleton counterparts /t, k/, though
produced with virtually identical VOT durations (56 ms for singletons
and 50 ms for geminates), have different glottal opening amplitudes. A
photoelectroglottographic (PGG) study, based on one subject, showed that
the geminates are systematically produced with a larger glottal opening than
the singletons (data and procedures of the PGG study are described in Ridouane
2003: Chapter 4). This is illustrated in Figure 3, which is arranged so as
to show the glottal opening of a minimal pair involving a geminate and a
singleton stop in intervocalic position (see also Figure 7). In addition to
amplitude differences, singletons and geminates also differ in the timing
of laryngeal-supralaryngeal gestures. While for singletons
the peak of glottal opening occurs at or closely around the release, for
geminates peak glottal opening is timed well before it. For dentals, the interval
from peak glottal opening to stop release varies between 0 and 10 ms for
singletons and between 55 and 120 ms for geminates. For velars, the interval
varies between 10 and 20 ms for singletons and between 55 and 70 ms
for geminates (cf. Ridouane 2003). This suggests that radically different
amplitude and timing patterns of glottal opening may lead to similar degrees
of aspiration duration (this aspect of the glottal timing theory is dealt with in
more detail in section 2.2.1 below).
Figure 2. Maximal glottal opening during the production of intervocalic [dʱ] in
[bidʱi] 'procedure' (left) and final [dʱ] in [bibidʱ] 'variety' (right), as
produced by a native speaker of Nepali. This maximal opening is produced
during the release. These two figures show that the glottis is only slightly
open during aspirated voiced stops.
A third problem with the glottal width theory of [+spread glottis] is that a
wide glottal opening during the production of a stop does not always result
in an aspirated sound. Examples of unaspirated sounds produced with a wide
glottal opening include at least the unaspirated geminate stops of Icelandic
(Ladefoged and Maddieson 1996), voiceless stops in Kabiye (Rialland et
al. 2009), and the voiceless uvular stop /q/ in Tashlhiyt (Ridouane 2003).
Icelandic has three types of voiceless stops: unaspirated (e.g. [pp] in
[kʰ:ppar] 'young seal'), post-aspirated (e.g. [pʰ] in [kʰ:pʰar] 'small pot'),
and pre-aspirated (e.g. [ʰp] in [kʰ:ʰpar] 'small pot'). As Ladefoged and
Maddieson (1996: 71) showed, based on data from Ní Chasaide (1985), the
degree of glottal aperture for post-aspirated [pʰ] is virtually identical to that
of unaspirated [pp], suggesting that glottal width alone is not the defining
characteristic of aspiration in this language (see also Pétursson 1976). Kabiye
has a contrast between voiceless unaspirated and voiced stops. Fiberscopic
data, drawn from the production of one subject, show that the voiced stops are
produced with an adducted glottis and vibrating vocal folds. The voiceless stops,
on the other hand, are sometimes produced with a large glottal opening, as
shown in Figure 4. These stops, however, never display aspiration.
Figure 3. Schematic illustration of the amplitude and duration of glottal opening
during the production of a singleton and a geminate aspirated stop
in Tashlhiyt. The vertical bar shows the point of oral release and the
horizontal bar shows the degree of glottal opening at this point.
(Figure 3 plot: glottal opening amplitude as a function of time (ms), with separate curves for the singleton and the geminate.)
Language-independent bases of distinctive features 273
In Tashlhiyt, where aspiration is not distinctive, /t/ and /k/ are aspirated,
whereas the dorsopharyngealized /tˤ/ and the uvular /q/ are systematically
produced with no aspiration and a VOT duration of less than 30 ms.
Fiberscopic data from two subjects showed, however, that /q/ is produced
with the largest glottal opening, whereas /tˤ/ is produced with the smallest.
Figure 4. State of the glottis during the production of unaspirated [t] in Kabiye,
showing a wide glottal opening during the occlusion phase (boxes 2, 3,
4). Utterance: [eti] 'he demolishes'.
(Figure 5 plots: degree of glottal opening (arbitrary units) over time (40 ms divisions) for /k/, /t/, /q/ and /tˤ/ in initial, intervocalic, and final position.)
Figure 5. Averaged glottal pattern for Tashlhiyt voiceless stops in three word
positions, based on 5 repetitions from one native speaker. The figure
shows that /q/, though unaspirated, is produced with a wider glottal
opening than phonetically aspirated [tʰ] and [kʰ] and unaspirated
dorsopharyngealized [tˤ].
/t/ and /k/ display intermediate amplitudes (data and procedures of the
fiberscopic study are described in Ridouane 2003: Chapter 3).
The reason why the Icelandic geminate /pp/, the Kabiye dental stop /t/,
and the Tashlhiyt /q/ are not aspirated although they are produced with a
large glottal opening can be related to how this laryngeal opening is aligned
relative to the oral release. In Tashlhiyt /q/, for example, peak glottal opening
is reached during the closure phase, so that by the time this stop is released
the glottal opening is so small that voicing for the following vowel
starts a few milliseconds later, yielding an unaspirated stop. This suggests
that the timing relationship between laryngeal and supralaryngeal gestures is
an important factor in the control of aspiration. As Kingston (1990) posits,
aspiration can be implemented when the feature [spread glottis] is tightly
bound to the release of a stop. In light of such facts, it might appear
necessary to abandon the glottal width theory for the glottal timing theory,
according to which aspiration is a function of the alignment of peak glottal
opening with the point of release.
2.2. The glottal timing theory
Studies by Löfqvist and colleagues on various Germanic languages have
argued rather persuasively for the importance of laryngeal-oral timing
relationships in contrasting aspirated and unaspirated plosives (Löfqvist
1980; Löfqvist and Pétursson 1978; Löfqvist and Yoshioka 1981; Munhall
and Löfqvist 1992; Yoshioka, Löfqvist, and Hirose 1981). Based on data
from Swedish, Löfqvist (1980) showed that the timing of the laryngeal gesture
in relation to supralaryngeal events is the primary factor in the control
of aspiration: "Even if differences in peak glottal opening were a regular
phenomenon in the production of different stop categories, it should be noted
that, in the published studies, these size differences always appear to be
accompanied by timing differences [...]. Thus it appears to be unwarranted
to claim that the size difference is more basic than the timing difference."
Specifically, he showed that if the glottal opening gesture starts at implosion
and peak glottal opening occurs early during stop closure, the stop is
unaspirated, whereas if peak glottal opening occurs late during closure,
aspiration results. For Löfqvist and Yoshioka (1981: 31): "Specifying glottal
states along dimension of spread/constricted glottis and stiff/slack vocal
cords [Halle and Stevens 1971] would thus not only seem to be at variance
with the phonetic facts, but also to introduce unnecessary complications. The
difference between postaspirated and unaspirated voiceless stops is rather
one of interarticulator timing than of spread versus constricted glottis."
Adopting this view, however, requires that timing relations or other dynamic
information be incorporated into feature representation. The theoretical issue
here is whether timing must be specified in the definition of the feature itself,
at the level at which it is coordinated with other features, as in Steriade's
aperture node model (1994), or at the level of gestural coordination in the
sense of Browman and Goldstein's (1986) articulatory phonology. We show
below that this is an unnecessary complication: timing relations and
other dynamic information need not be included in feature definitions.
2.2.1. Problems with glottal timing theory
Including timing information in the definition of [spread glottis] would not
be sufficient to account for the full class of aspirated sounds, for at least
three reasons: (1) in some aspirated sounds, peak glottal opening is not
aligned with the release; (2) in fricative-stop clusters, aspiration can result
from different interarticulatory timings; and (3) a voiceless stop can be
produced with a wide glottal opening at the point of release without being
aspirated.
The first problem is illustrated by the voiceless sonorants, normally
defined as [+spread glottis]. Voiceless sonorants are contrastive in several
languages, such as Icelandic [ni:ta] 'to use' vs. [n̥i:ta] 'to knot' and Burmese
[na] 'pain' vs. [n̥a] 'nose'. Data on airflow during the production of voiceless
sonorants in Burmese suggest a relatively wide glottal aperture. According to
Ladefoged and Maddieson (1996: 113): "There is a high volume of airflow ...
suggesting that these nasals are produced with a wide open glottis and might
therefore be characterized as aspirated." PGG data from Icelandic also show
that voiceless sonorants are sometimes produced with a wide glottal opening
(Bombien 2006). The problem with the glottal timing theory is that for
these aspirated sounds, peak glottal opening is not aligned with the release.
The second problem for the glottal timing theory of [spread glottis]
concerns the way it accounts for the presence or absence of aspiration in
fricative-stop clusters (e.g. English 'speed'). A common phonological account
of stop de-aspiration in this context is that word-initial clusters contain
a single specification of [spread glottis], shared between the fricative and
the stop (Kingston 1990; Iverson and Salmons 1995). This analysis echoes
the PGG and electromyographic studies of Löfqvist and colleagues on the
time-course of glottal movement in consonant clusters in some Germanic
languages (see e.g. Löfqvist 1990 for an overview). These studies showed that
a word-initial fricative-stop cluster (e.g. [#sk] in 'I may scale') is produced
with only one glottal opening-closing gesture, with the peak reached during
the fricative. In heteromorphemic sequences (e.g. [s#kʰ] in 'my ace caves'),
however, the fricative and the stop each require a separate laryngeal peak. In
other words, the stop [k] is aspirated in a heteromorphemic sequence but not
in a tautomorphemic one, because it is associated with a separate glottal
opening peak in the former but not in the latter. For Browman and Goldstein
(1986), this single-peaked glottal opening is a phonological regularity of
syllable-initial position in English, suggesting that it is a property of the
whole syllable onset. They capture the relevant timing of laryngeal-oral
coordination for stops and fricative-stop clusters in the following rule:
If a fricative gesture is present, coordinate the peak glottal opening with the
midpoint of the fricative. Otherwise, coordinate the peak glottal opening with
the release of the stop gesture.
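The rule can be stated procedurally. In the sketch below (our paraphrase; the Gesture class and the function name are invented for illustration), a syllable onset is a list of oral gestures with onset and release times, and the function returns the target time of the single peak glottal opening.

```python
from typing import List, Optional

class Gesture:
    """A minimal oral gesture: kind ('fricative' or 'stop') plus the
    onset and release times of its constriction, in seconds."""
    def __init__(self, kind: str, onset: float, release: float):
        self.kind = kind
        self.onset = onset
        self.release = release

def peak_glottal_target(onset_cluster: List[Gesture]) -> Optional[float]:
    """Browman and Goldstein's (1986) coordination rule: if a fricative
    gesture is present, align peak glottal opening with the fricative's
    midpoint; otherwise align it with the release of the stop gesture."""
    for g in onset_cluster:
        if g.kind == "fricative":
            return (g.onset + g.release) / 2.0
    for g in onset_cluster:
        if g.kind == "stop":
            return g.release
    return None  # no oral gesture: no glottal target defined by the rule

# English 'scale' [#sk]: the peak falls at the fricative midpoint, so the
# following stop surfaces unaspirated.
s = Gesture("fricative", 0.00, 0.10)
k = Gesture("stop", 0.10, 0.18)
print(peak_glottal_target([s, k]))   # 0.05
print(peak_glottal_target([k]))      # 0.18 (plain stop: peak at release)
```

On this statement, the stop in a tautomorphemic [#sk] cluster inherits no laryngeal peak of its own, which is why it surfaces unaspirated.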
This rule has been tested on various clusters in German by Hoole, Fuchs
and Dahlmeier (2003). Their results suggest that Browman and Goldstein's
(1986) rule is not completely accurate. The only generalization
that could be made from their study is that, if a fricative is present, the peak
glottal opening almost always occurs within the fricative. In Tashlhiyt, the
PGG analysis of fricative-stop as well as stop-fricative clusters also shows that
the peak glottal opening is not systematically coordinated with the midpoint
of the fricative (Ridouane et al. 2006). As in German, the only generalization
that can be drawn from the Tashlhiyt data is that peak glottal opening is almost
always located within the fricative, both for stop-fricative and for fricative-stop
sequences. The timing of this opening peak tends to shift to a relatively
earlier point in the fricative when it follows a stop (at 23.49% of the fricative)
and to a later point in the fricative when it precedes a stop (at 66.06% of the
fricative), regardless of the location of the word boundary.
What is more interesting for the topic at issue is that Tashlhiyt stops
are aspirated following fricatives in word-initial position (e.g. #sk in [skʰijf]
'make someone smoke'). Nevertheless, the results make clear that only one
glottal opening peak occurs. In other words, stops can be aspirated after /s/
even if they share a single glottal gesture with the preceding fricative, and
even if the peak glottal opening is not timed to coincide with the release
of the stop. This pattern may well be related to the above-cited fact that
voiceless geminates also show aspiration, in fact a very similar amount of
aspiration to the singleton consonants: when the stop is released in [kkV] and
[skV], the glottis has about the same size as it has in [kV], yielding similar
aspiration values in all cases (Figure 6). In sum, different interarticulatory
timings can result in the presence of aspiration after /s/: on the one hand, a
large amplitude and a delay in peak glottal opening relative to fricative onset
(as in Tashlhiyt); on the other, two peak glottal openings, each corresponding
to one of the two obstruents (as in English).
The third problem with the glottal timing theory is that it predicts
aspiration in cases where stops satisfy the articulatory requirement though
no aspiration is present acoustically. We examine two cases: unaspirated
voiceless stops in utterance-final position and voiceless stops followed by
an obstruent. In Tashlhiyt, voiceless stops are not aspirated in word-final
position. The patterns of glottal dynamics of these stops show that the glottis
starts opening at closure onset and continues to open towards a respiratory
open position, so that when the stop is released the glottis is largely open. This
is illustrated in Figure 7 with the final unaspirated [k#] of [ifik] 'he gave
you'. A similar wide-glottis configuration has also been reported for voiceless
word-final stops in Moroccan Arabic (Zeroual 2000), English (Sawashima
1970), Korean (Sawashima et al. 1980), Maithili (Yadav 1984), and Swedish
(Lindqvist 1972). In none of these languages, however, do voiceless stops
display aspiration in this context.
Stops are also unaspirated when followed by a fricative, as in English
[læps] and [dɛpθ], or by a stop, as in English [dɑktɹ̩]. Though the glottal configuration
of English stops in these positions has not been explicitly examined, one can
infer from the PGG studies on English clusters that these stops are produced
while the glottis is largely open. In the glottographic curves presented in
Yoshioka, Löfqvist, and Hirose (1981), for instance, the first unaspirated /k/
of [sks#k] in 'He masks cave' is produced with a larger glottal opening
than the second /k/, which is aspirated! Voiceless words in Tashlhiyt provide
additional evidence that a stop can be produced with a large glottal opening
at the point of release without being aspirated (e.g. in [tfkt] 'you gave',
Figure 6. Schematic illustration of the degree of glottal opening at the point of
release for pre-vocalic /k/, /kk/ and /#sk/.
[tkkststt] 'you took it off', [tsskʃftstt] 'you dried it'). The stop consonants
in [tkkststt], for instance, are not followed by aspiration noise. Yet PGG
examination of the abduction patterns during the production of this item
indicates that the glottis is largely open at the release of these stops. This is
illustrated in Figure 8, which shows the averaged glottographic pattern for
the item [tkkststt]. As can be seen, word-internal voiceless stops are released
while the glottis is almost maximally open (the peak opening is reached
during the two fricatives contained in the item).
Data from voiceless stops followed by a voiceless obstruent thus show that
a segment may satisfy the articulatory definition of [spread glottis] without
satisfying its acoustic definition. This is because glottal function is
constrained by the degree of constriction within the supraglottal vocal
cavity. For a glottal opening to be manifested as aspiration, it must be
timed to coincide, at least in part, with an unobstructed vocal tract (cf. Dixit
1993). That is, there must be no narrower constriction in the supralaryngeal
cavities. This requirement is met by aspirated stops in prevocalic position
and before non-homorganic sonorants. It is not met, however, by stops
before obstruents.
3. On the acoustics of [spread glottis]

As already mentioned, one well-established acoustic criterion for distinguishing
aspirated and unaspirated voiceless stops is the notion of positive

Figure 7. States of the glottis during the production of Tashlhiyt word-final
unaspirated [k] in [ifik] 'he gave you'. The figure shows that a segment
may be produced with a large glottal opening at stop release without being
aspirated.
Language-independent bases of distinctive features 279
Figure 8. Averaged glottographic pattern for the item [tkkststt] 'you took it off',
as realised by a speaker of Tashlhiyt. The pattern indicates the duration,
degree, and number of glottal-opening peaks. The vertical axis shows the
amount of light in arbitrary units. The dashed lines delimit the onset and
offset of each segment. The figure also displays the number of amplitude
peaks as well as the number of abduction and adduction velocity peaks.
The number of repetitions is indicated between parentheses. Arrows show
how large the glottal opening is near the offset of the voiceless unaspirated
stops /kk/ and /t/.
VOT, which is longer in the former (Lisker and Abramson 1964). While
this is a highly effective measure for differentiating pre-vocalic aspirated
and unaspirated stops in various languages, a number of problems arise in
defining aspiration in terms of VOT alone (Bhatia 1976; Dixit 1989; Tsui
and Ciocca 2000; Cho and Ladefoged 1999; Jessen 2001; Vaux and Samuels
2005; Mikuteit and Reetz 2007). First, VOT theory cannot account for the
presence of aspiration in word-final position in the languages which maintain
the contrast in this position (e.g. Eastern Armenian (Vaux 1998), Nepali
minimal pairs like [ruk] 'stop!' (imp.) vs. [rukʰ] 'tree' (Bandhu et al. 1971)),
since there is no onset of voicing in this context. Second, VOT does not
provide for a distinction between plain voiced and aspirated voiced stops
such as /d/ vs. /dʱ/ in Hindi, Maithili, and Nepali, since they are produced
with vibrating vocal folds throughout. Third, positive VOT alone cannot
distinguish aspirated from unaspirated stops in languages contrasting
ejectives and aspirated segments (e.g. Athabaskan languages, Oowekyala,
Lezgian, Haisla, Hupa). In the Lezgian example, shown in Figure 9, aspirated
/kʰ/ and ejective /k'/ have virtually identical VOT durations (see also data
from Hupa presented in Cho and Ladefoged (1999), where aspirated /kʰ/ has
a VOT duration of 84 ms and ejective /k'/ a duration of 80 ms).
The acoustic information occurring within the period from stop release
to voicing onset in aspirated stops is important both for characterizing and
recovering the feature [spread glottis]. In Cantonese, for instance, Tsui
and Ciocca (2000) showed that VOT per se is not a sufficient cue to the
perception of aspiration. They manipulated the duration of the VOT interval
of naturally produced initial aspirated and unaspirated stops to create long-
VOT conditions with or without aspiration noise between the release and the
onset of voicing. They found that long-VOT stimuli manipulated by adding
a silent interval between the burst and the onset of voicing of unaspirated
stops were perceived as unaspirated stops by native listeners.
Following Fant (1973), we recognize three phases in the interval from the
aspirated stop release to the onset of voicing: (1) aperiodic transient noise,
known as the release burst, when the pressure behind the constriction is released
and the resulting abrupt increase in volume velocity excites the entire vocal
tract; (2) a frication segment, when turbulent noise generated at the supraglottal
constriction excites primarily the cavity in front of the constriction; and
(3) an aspirated segment, when turbulent noise generated near the
approximating vocal folds excites the entire vocal tract (see also Stevens
1998: 457–465).¹ Adopting this view, we define aspiration as glottal frication,
Figure 9. Acoustic waveforms and spectrograms illustrating the virtually identical
VOT durations of an aspirated [kʰ] in [kʰymekar] 'help, pl.' (left) and
ejective [k'] in [sikar] 'fox-pl.' (right). In this example, aspirated [kʰ]
has a VOT duration of 53 ms and ejective [k'] a VOT duration of 58 ms
(courtesy of Ioana Chitoran).
displayed as a mid- and high-frequency formant pattern partly masked
by noise.² In other words, we contend, contra Kim's (1970) view, that the
aperiodic energy corresponding to aspiration noise is created not at the point of
constriction of the following vowel but at the glottis. This acoustic definition
makes it possible to distinguish aspirated stops not only from the ejectives
presented above, but also from affricated stops which arise through stop
assibilation. There is good evidence from various languages lacking an aspiration
contrast that the VOT of /t/ is longer when followed by /i/ than by /a/; /ti/
in this context is allophonically realized [tˢi]. Languages in which this is the
case include Maori (Maclagan et al. 2009), Moroccan Arabic (Shoul 2007),
Japanese, Romanian, Cheyenne, Ek, Canadian French (Kim 2001; see also
Hall and Hamann 2006 for a cross-linguistic review). These affricated stops,
like aspirated stops, are produced with a positive voicing lag. They differ,
however, in the acoustic information occurring within this lag. The turbulent
noise produced after the burst of affricated stops is created, not at the glottis,
but at the point of constriction for the following vowel, whose configuration is
formed through coarticulation, during the stop. Their affricate-like properties
involve a supralaryngeal constriction (Kim 2001). Affricated stops may be
produced with a large glottal opening, but this glottal area is greater than
the oral constriction area, so that the frication noise generated at the oral
constriction becomes dominant over aspiration noise at the glottis.
The [spread glottis] contrast may be signaled by additional acoustic
cues, which may vary depending on the speaker and the structural context
in which the feature occurs. In pre-vocalic position, where aspirated sounds
are most commonly attested, aspiration may be additionally cued by
increased F0 values in the first periods of the following vowel. According to
Stevens (1998), this increased fundamental frequency of glottal vibrations
is presumably a result of the increased stiffness of the vocal folds that is an
attribute of these voiceless consonants. Languages with this cue include
Cantonese (Zee 1980), Mandarin Chinese (Iwata and Hirose 1976), and
Nepali (Clements and Khatiwada 2007). Aspiration can also be cued by a
greater difference in amplitude between the first and second harmonics in
the first few glottal pulses of the vowel. Languages with this acoustic cue
include English (Chapin Ringo 1988), German (Jessen 1998), and Nepali
(Clements and Khatiwada 2007).
4. A new proposal: Combine articulation and acoustics
In the approach assumed here, a segment can be said to bear a distinctive
feature F at the phonetic level only if it satisfies both its articulatory and
acoustic definition. We have shown that an articulatory definition of the
feature [+spread glottis] in terms of a single common glottal configuration or
gesture is problematic and would be insufficient to account for the full range
of speech sounds characterized by this feature. Indeed, different glottal sizes
and different interarticulator timings of the laryngeal and supralaryngeal gestures
can result in aspiration. From this point of view, we suggest that the class of
aspirated segments must be defined in both articulatory and acoustic terms,
as in (1).

(1) Defining attributes of [+spread glottis]:
    a. (Articulatory) presence of a glottal noise source
    b. (Acoustic) presence of aspiration noise, i.e. aperiodic energy in
       the second and higher formants, with a duration of around 30 ms
       or more in deliberate speech.

The suggested definition does not require that timing relations or other
dynamic information be included in feature definitions. Timing relations
follow from the requirement that the acoustic goal associated with the feature
be manifested in the signal.
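The conjunctive logic of definition (1) can be made explicit in a small sketch. The Python fragment below is the present writer's illustration, not part of the chapter: the field names and the toy measurements are hypothetical, and only the joint articulatory-and-acoustic requirement and the 30 ms figure come from (1).

```python
# Sketch of definition (1): a segment counts as [+spread glottis] only if it
# satisfies BOTH the articulatory criterion (1a) and the acoustic one (1b).
# Field names and values are illustrative, not from the chapter.

from dataclasses import dataclass

@dataclass
class SegmentMeasures:
    has_glottal_noise_source: bool      # (1a) articulatory: glottal frication
    aspiration_noise_duration: float    # (1b) seconds of aperiodic energy in F2+

def is_spread_glottis(m: SegmentMeasures, min_duration: float = 0.030) -> bool:
    """(1a) and (1b) must hold jointly: a wide-open glottis whose noise is
    masked by an oral constriction (the Tashlhiyt cluster case) fails (1b)."""
    return m.has_glottal_noise_source and m.aspiration_noise_duration >= min_duration

prevocalic_kh = SegmentMeasures(True, 0.053)  # aspirated stop before a vowel
cluster_k = SegmentMeasures(True, 0.0)        # open glottis, released into a fricative
print(is_spread_glottis(prevocalic_kh))  # True
print(is_spread_glottis(cluster_k))      # False
```

On this view the timing facts fall out for free: the code never mentions gestural coordination, yet a segment whose glottal opening is mistimed relative to an open vocal tract simply fails the acoustic clause.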
This requirement is satisfied by voiceless stops in contexts where they
are followed by significant aspiration: stops before vowels, stops before
sonorants, and stops in final position (for languages contrasting aspirated and
unaspirated stops in this context). Figure 10 illustrates the characteristics of
the aspiration phase that follows a pre-vocalic voiceless stop. F2, F3, and F4
patterns are visible during this phase. It is also satisfied by voiced aspirated
stops. In Nepali /dʱa/, for example, a fricative segment and a voiced aspirated
segment can be seen in the transition from the voiced stop release to a vowel
(Figure 11).
This definition of aspiration extends to the Burmese voiceless nasals
mentioned earlier and to the aspirated fricatives that have been documented in
Burmese (Ladefoged and Maddieson 1996) and Korean (Kagaya 1974). It is
not satisfied, however, by stops which are not followed by aspiration, even
if they are produced with a spread glottis: pre-vocalic voiceless stops with
an open glottis but no aspiration (as in Kabiye), and stops before fricatives (as
in English and Tashlhiyt). The suggested definition is not satisfied by plain
fricatives, even though they are produced with a spread glottis. The reason
is that their glottal opening tends to coincide with a narrower supralaryngeal
constriction, so that oral noise becomes dominant over glottal noise. This large
glottal opening, which may be considered an enhancing gesture, is due to
the aerodynamics of these segments; according to Löfqvist and McGarr (1987:
Figure 10. Spectrogram of Tashlhiyt [t'ut'id] 'she passes' (scale: 0–5 kHz)
illustrating the acoustic characteristics of the aspiration phase that
follows the burst of a voiceless stop.

Figure 11. Spectrogram of the Nepali item [dʱada] (scale: 0–5 kHz) illustrating
the acoustic characteristics of the aspiration phase following a voiced
aspirated stop.
399): 'a large glottal opening [in fricatives] not only prevents voicing but
also reduces laryngeal resistance to air flow and assists in the build-up of
oral pressure necessary for driving the noise source'. Differences in glottal
opening amplitude between anterior and posterior fricatives in languages
such as Moroccan Arabic (Zeroual 2000) and Tashlhiyt (Ridouane 2003)
provide additional evidence that the constrictions at the glottal and supraglottal
levels have to be adjusted to meet the aerodynamics of fricatives. While
[f] and [s] exhibit almost the same glottal opening, the backer fricatives,
produced with a less narrow constriction, are clearly produced with a larger
opening amplitude, yielding a relationship of the form f = s < ʃ < χ. In [χ], the
constriction at the uvula is not narrow, and hence a larger glottal opening
is needed to supply the amount of airflow necessary to generate the required
turbulence.
5. Conclusion
Language-independent feature definitions specify both articulatory and
acoustic attributes. We have shown that a definition of the feature [spread
glottis] in terms of a single common glottal configuration or gesture would
be insufficient to account for the full range of aspirated sounds characterized
by this feature; an acoustic definition is also necessary. From this point of
view, we have provided a language-independent definition specifying both
articulatory (i.e. presence of glottal friction) and acoustic (i.e. presence of
aperiodic energy in the second and higher formants) attributes of this feature.
The suggested definition does not require that timing relations or other
dynamic information be included in feature definitions. Timing relations
follow from the requirement that the acoustic goal associated with the feature
be manifested in the signal.
Notes
1. The boundary between the second and third phases is not always clear-cut. As
   shown by Hanson and Stevens (2003), the third phase can sometimes display a
   mix of frication noise and aspiration.
2. According to Kluender et al. (1997), during the delay between release and voice
   onset there is little or no energy in the first formant because, with an open glottis,
   almost all low-frequency energy is cancelled by the tracheal tube, which introduces
   an antiresonance in the low-frequency region of F1.
References
Avery, Peter
1996 The Representation of Voicing Contrasts. Ph.D. diss., University of
Toronto.
Avery, Peter and William Idsardi
2001 Laryngeal dimensions, completion and enhancement. In: T. Alan Hall
(ed.), Distinctive Feature Theory, 41–71. Berlin/New York: Mouton
de Gruyter.
Bandhu, C. M., Dahal, B. M., Holzhausen, A., and Hale, A.
1971 Nepali Segmental Phonology. SIL and Tribhuvan University, Kirtipur.
Benguerrel, A. P. and Tej Bhatia
1980 Hindi stop consonants: an acoustic and fiberscopic study. Phonetica
37: 149–158.
Bhatia, Tej
1976 On the Predictive Role of the Recent Theories of Aspiration. Phonetica
33: 62–74.
Boersma, Paul
1998 Functional Phonology: Formalizing the interactions between
articulatory and perceptual drives. The Hague: Holland Academic
Graphics.
Bombien, Lasse
2006 Voicing alternations in Icelandic sonorants: a photoglottographic and
acoustic analysis. AIPUK 37: 63–82.
Browman, Catherine P. and Louis Goldstein
1986 Towards an articulatory phonology. In: C. Ewen and J. Anderson
(eds.), Phonology Yearbook 3, 219–252. Cambridge: Cambridge
University Press.
Chapin-Ringo, Carol
1988 Enhanced amplitude of the first harmonic as a correlate of voicelessness
in aspirated consonants. Journal of the Acoustical Society of America
83 (Suppl. 1), S70 [Abstract].
Cho, Taehong and Peter Ladefoged
1999 Variations and universals in VOT: evidence from 18 languages.
Journal of Phonetics 27: 207–229.
Chomsky, Noam and Morris Halle
1968 The Sound Pattern of English. New York: Harper and Row.
Clements, G. Nick
1985 The Geometry of Phonological Features. Phonology Yearbook 2:
225–252.
Clements, G. Nick and Elizabeth Hume
1995 The internal organization of speech sounds. In: Goldsmith, J. (ed.),
Handbook of Phonological Theory, 245–306. Oxford: Basil Blackwell.
Clements, G. Nick and Rajesh Khatiwada
2007 Phonetic realization of contrastively aspirated affricates in Nepali.
Proceedings of the 16th International Congress of Phonetic Sciences,
629–632. Saarbrücken, Germany.
Cooper, Andre M.
1991 Laryngeal and oral gestures in English /p, t, k/. Proceedings of the
12th International Congress of Phonetic Sciences 2, 50–53. Aix-en-
Provence, France.
Dixit, Prakash R.
1989 Glottal gestures in Hindi plosives. Journal of Phonetics 17: 213–237.
Dixit, Prakash R.
1993 Spatiotemporal patterns of glottal dynamics and control of voicing
and aspiration in Hindi stops. Indian Linguistics 54: 1–36.
Dixit, Prakash R. and Shipp, T.
1985 Study of subglottal air pressure during Hindi stop consonants.
Phonetica 42: 53–77.
Durand, Jacques
2000 Les traits phonologiques et le débat articulation/audition. In: Busuttil,
P. (ed.), Points d'interrogation: Phonétique et phonologie de l'anglais,
56–70. Pau: Presses Universitaires de Pau.
Fant, Gunnar
1973 Stops in CV syllables. In: Speech Sounds and Features, 110–139.
Cambridge, MA: MIT Press.
Flemming, Edward
1995 Auditory representations in phonology. Ph.D. diss., UCLA.
Fukui, N. and Hirose, H.
1983 Laryngeal adjustments in Danish voiceless obstruent production.
Annual Report of the Institute of Phonetics, University of Copenhagen
17, 61–71.
Goldsmith, John
1990 Autosegmental and Metrical Phonology. Cambridge, MA: Blackwell
Publishers.
Hall, Tracy A. and Silke Hamann
2006 Towards a typology of stop assibilation. Linguistics 44 (6): 1195–1236.
Halle, Morris
1983 On Distinctive Features and their articulatory implementation. Natural
Language and Linguistic Theory 1: 91–105.
Halle, Morris and Kenneth N. Stevens
1971 A note on laryngeal features. Quarterly Progress Report of the
Research Laboratory of Electronics 101 (MIT), 198–213.
Hanson, Helen M. and Kenneth N. Stevens
2003 Models of aspirated stops in English. Proceedings of the 15th
International Congress of Phonetic Sciences, 783–786. Barcelona,
Spain.
Heffner, Roe-Merrill S.
1950 General Phonetics. Madison: University of Wisconsin Press.
Hoole, Phil, B. Pompino-Marschall, and M. Dames
1984 Glottal timing in German voiceless occlusives. Proceedings of the
10th International Congress of Phonetic Sciences, 309–403. Utrecht,
Netherlands.
Hoole, Phil, Susanne Fuchs, and Klaus Dahlmeier
2003 Interarticulator timing in initial consonant clusters. In: S. Palethorpe
and M. Tabain (eds.), Proceedings of the 6th International Seminar
on Speech Production, 101–106. Macquarie University, Sydney,
Australia.
Hutters, Birgit
1985 Vocal fold adjustments in aspirated and unaspirated stops in Danish.
Phonetica 42: 1–24.
Iverson, Gregory K.
1983 On Glottal Width Features. Lingua 60: 331–339.
Iverson, Gregory K. and Joseph C. Salmons
1995 Aspiration and laryngeal representation in Germanic. Phonology 12:
369–396.
Iverson, Gregory K. and Joseph C. Salmons
2003 Laryngeal Enhancement in Early Germanic. Phonology 20: 43–74.
Iwata, Ray and Hajime Hirose
1976 Fiberoptic, acoustic studies of Mandarin stops and affricates. Ann.
Bull. RILP (Annual Bulletin, Research Institute of Logopedics and
Phoniatrics) 10: 47–60.
Iwata, Ray, Masayuki Sawashima, Hajime Hirose, and Seiji Niimi
1979 Laryngeal adjustments of Fukienese stops: Initial plosives and final
plosives. Ann. Bull. RILP 13: 61–81.
Iwata, Ray, Masayuki Sawashima, and Hajime Hirose
1981 Laryngeal adjustments for syllable-final stops in Cantonese. Ann.
Bull. RILP 15: 45–54.
Jakobson, Roman, Gunnar Fant, and Morris Halle
1952 Preliminaries to Speech Analysis: The Distinctive Features and their
Correlates. Cambridge, MA: MIT Press.
Jessen, Michael
1998 Phonetics and Phonology of Tense and Lax Obstruents in German.
Amsterdam/Philadelphia: John Benjamins Publishing Company.
Jessen, Michael
2001 Phonetic implementation of the distinctive auditory features [voice]
and [tense] in stop consonants. In: Hall, T. A. (ed.), Distinctive Feature
Theory, 237–294. Berlin/New York: Mouton de Gruyter.
Jones, Daniel
1964 An Outline of English Phonetics. 9th ed. Cambridge: W. Heffer &
Sons Ltd. First edition 1918.
Kagaya, Ryohei
1974 A fiberscopic and acoustic study of Korean stops, affricates and
fricatives. Journal of Phonetics 2: 161–180.
Kagaya, Ryohei and Hajime Hirose
1975 Fiberoptic, electromyographic and acoustic analysis of Hindi stop
consonants. Ann. Bull. RILP 9: 27–46.
Keating, Patricia
1988 A survey of phonological features. Distributed by Indiana University
Linguistics Club.
Kenstowicz, Michael
1994 Phonology in Generative Grammar. Oxford: Blackwell Publishers.
Kim, Chin-Wu
1970 A theory of aspiration. Phonetica 21: 107–116.
Kim, Hyunsoon
2001 A phonetically based account of phonological stop assibilation.
Phonology 18: 81–108.
2005 The representation of the three-way laryngeal contrast in Korean
consonants. In: Marc van Oostendorp and Jeroen van de Weijer (eds.),
The Internal Organization of Phonological Segments, 263–293. Berlin/
New York: Mouton de Gruyter.
Kingston, John
1990 Articulatory binding. In: Kingston, John and Mary E. Beckman
(eds.), Papers in Laboratory Phonology I: Between the Grammar
and Physics of Speech, 406–434. Cambridge: Cambridge University
Press.
Kjellin, Olle
1977 Observations on consonant types and tone in Tibetan. Journal of
Phonetics 5: 317–338.
Ladefoged, Peter
1973 The features of the larynx. Journal of Phonetics 1: 73–84.
1980 Articulatory parameters. Language and Speech 23: 25–30.
1993 A Course in Phonetics. 3rd ed. Fort Worth: Harcourt College
Publishers.
Ladefoged, Peter and Ian Maddieson
1996 The Sounds of the World's Languages. Oxford: Blackwell Publishers.
Lindqvist, Jan
1972 Laryngeal articulation studied on Swedish subjects. Speech
Transmission Laboratory, Quarterly Progress and Status Report
(STL-QPSR), 10–27. Royal Institute of Technology, Stockholm.
Lisker, Leigh and Arthur S. Abramson
1964 A cross-language study of voicing in initial stops: acoustic
measurements. Word 20: 384–422.
Lisker, Leigh and Arthur S. Abramson
1967 Some effects of context on voice onset time in English stops. Language
and Speech 10: 1–28.
1971 Distinctive features and laryngeal control. Language 47: 767–785.
Löfqvist, Anders
1980 Interarticulator programming in stop production. Journal of Phonetics
8: 475–490.
1990 Speech as audible gestures. In: W. J. Hardcastle and A. Marchal (eds.),
Speech Production and Speech Modelling, 289–322. Dordrecht:
Kluwer Academic Publishers.
Löfqvist, Anders and Magnus Pétursson
1978 Swedish and Icelandic stops: A glottographic investigation. In:
Weinstock, J. (ed.), The Nordic Languages and Modern Linguistics 3,
454–461. Austin: The University of Texas at Austin Press.
Löfqvist, Anders and Hirohide Yoshioka
1980 Laryngeal activity in Swedish obstruent clusters. Journal of the
Acoustical Society of America 68 (3): 792–799.
Löfqvist, Anders and Hirohide Yoshioka
1981 Interarticulator programming in obstruent production. Phonetica 38:
21–34.
Lombardi, Linda
1991 Laryngeal features and laryngeal neutralization. Ph.D. diss., University
of Massachusetts, Amherst.
1995 Laryngeal features and privativity. The Linguistic Review 12: 35–59.
Maclagan, Margaret, Catherine Watson, Ray Harlow, Jeanette King, and Peter
Keagan
2009 /u/ fronting and /t/ aspiration in Māori and New Zealand English.
Language Variation and Change 21: 175–192.
Mikuteit, Simone and Henning Reetz
2007 Caught in the ACT: The timing of aspiration and voicing in East
Bengali. Language and Speech 50 (2): 249–279.
Munhall, Kevin and Anders Löfqvist
1992 Gestural aggregation in speech: laryngeal gestures. Journal of
Phonetics 20: 111–126.
Ní Chasaide, Ailbhe
1985 Preaspiration in phonological stop contrasts. Ph.D. diss., University
College of North Wales, Bangor.
Pétursson, Magnus
1976 Aspiration et activité glottale. Phonetica 33: 169–198.
Rice, Keren
1994 Laryngeal features in Athapaskan languages. Phonology 11: 107–147.
Rialland, Annie, Rachid Ridouane, and Balaibaou Kassan
2009 A physiological investigation of voice quality in Kabiye assertions
and yes/no questions. Paper presented at the 6th World Congress of
African Linguistics. Cologne, Germany.
Ridouane, Rachid
2003 Suites de consonnes en berbère chleuh: phonétique et phonologie.
Ph.D. diss., Sorbonne-Nouvelle University-Paris 3.
Ridouane, Rachid, Susanne Fuchs, and Phil Hoole
2006 Laryngeal adjustments in the production of voiceless obstruent
clusters in Berber. In: Harrington, J. and M. Tabain (eds.), Speech
Production: Models, Phonetic Processes, and Techniques, 275–301.
New York: Psychology Press.
Sawashima, Masayuki
1970 Glottal adjustments for English obstruents. Haskins Laboratories:
Status Report on Speech Research SR-21/22, 186–200.
Sawashima, Masayuki and Seiji Niimi
1974 Laryngeal conditions in articulations of Japanese voiceless consonants.
Ann. Bull. RILP 8: 13–18.
Sawashima, Masayuki, Hea Suk Park, Kiyoshi Honda, and Hajime Hirose
1980 Fiberscopic study on laryngeal adjustments for syllable-final
applosives in Korean. Ann. Bull. RILP 14: 125–138.
Shoul, Karim
2007 Étude physiologique, articulatoire, acoustique, et perceptive de
l'emphase en arabe marocain oriental. Ph.D. diss., Sorbonne-
Nouvelle University-Paris 3.
Stevens, Kenneth N.
1989 On the quantal nature of speech. Journal of Phonetics 17: 3–46.
1998 Acoustic Phonetics. Cambridge, MA: MIT Press.
2002 Toward a model for lexical access based on acoustic landmarks and
distinctive features. Journal of the Acoustical Society of America 111:
1872–1891.
2003 Acoustic and perceptual evidence for universal phonological features.
Proceedings of the 15th International Congress of Phonetic Sciences,
33–38. Barcelona, Spain.
2005 Features in Speech Perception and Lexical Access. In: Pisoni, D. E.
and Remez, R. E. (eds.), Handbook of Speech Perception, 125–155.
Cambridge, MA: Blackwell.
Stevens, Kenneth N. and Samuel J. Keyser
2010 Quantal theory, enhancement, and overlap. Journal of Phonetics 38
(1): 10–19.
Tsui, Ida Y. H. and Valter Ciocca
2000 Perception of aspiration and place of articulation of Cantonese
initial stops by normal and sensorineural hearing-impaired listeners.
International Journal of Language & Communication Disorders 35
(4): 507–525.
Vaux, Bert
1998 The laryngeal specifications of fricatives. Linguistic Inquiry 29:
497–511.
Vaux, Bert and Bridget Samuels
2005 Laryngeal markedness and aspiration. Phonology 22: 395–436.
Yadav, Ramawatar
1984 Voicing and aspiration in Maithili: A fiberoptic and acoustic study.
Indian Linguistics 45 (1–4): 1–30.
Yeou, Mohamed, Kiyoshi Honda, and Shinji Maeda
2008 Laryngeal adjustments in the production of consonant clusters
and geminates in Moroccan Arabic. Proceedings of the 8th
International Seminar on Speech Production, 249–252. Strasbourg,
France.
Yoshioka, Hirohide, Anders Löfqvist, and Hajime Hirose
1981 Laryngeal adjustments in the production of consonant clusters and
geminates in American English. Journal of the Acoustical Society of
America 70 (6): 1615–1623.
Zee, Eric
1980 The Effect of Aspiration on the F0 of the Following Vowel in
Cantonese. UCLA Working Papers in Phonetics 49: 90–97.
Zeroual, Chakir
2000 Propos controversés sur la phonétique et la phonologie de l'arabe
marocain. Ph.D. diss., Université Paris 8.
Representation of complex segments in Bulgarian
Jerzy Rubach
1. Introduction

Looking at the data from Bulgarian, this chapter* asks the question of
how palatalized and velarized consonants should be represented in terms
of features. In particular, the question is whether such consonants should
be treated as complex segments or as simplex segments, the distinction
being that the former but not the latter involve a feature tree with two
nodes, one reflecting primary articulation and the other showing secondary
articulation.

Section 1 explains how palatalized and velarized consonants are represented
in three geometric theories: Articulator Theory, Unified Feature
Theory, and Modified Articulator Theory. Section 2 considers the loss of
palatalization and velarization before front vowels in Bulgarian and argues
that the process should be analyzed as the elimination of complex segments.
Section 3 pursues the theoretical consequences of this analysis, arguing that
Modified Articulator Theory must include a constraint that palatalization
and velarization of labials and coronals involve a dorsal articulation.
The conclusion is that palatalized/velarized labials and coronals must be
represented as complex segments.
2. Representation
The contrast between velarized and palatalized consonants in Bulgarian is
limited to the class of laterals.¹

(1) lud [ɫ] 'mad'     lut [l'] 'sharp'
    laf [ɫ] 'word'    lav [l'] 'left'

The opposition illustrated in (1) is classic in the sense that the sole difference
between the consonants is the type of secondary articulation that they carry:
the velarized [ɫ] is produced with the back part of the tongue raised toward
the velum and the palatalized [l'] is articulated with the front part of the
tongue raised toward the palate.² These are the configurations that Chomsky
and Halle (1968) characterize as [+high, +back] for the velarized consonant
and [+high, −back] for the palatalized consonant. The understanding is that
these features refer to secondary articulations, the primary articulation being
[+coronal, +anterior] in the case of Bulgarian laterals.
Feature theories developed since Chomsky and Halle (1968) and inspired
by the ground-breaking work of Clements (1985) differ from each other with
respect to the representation of the opposition in (1). In what follows, I look
at three theories: Articulator Theory, Unified Feature Theory, and Modified
Articulator Theory.

Articulator Theory, developed in Clements (1985), Sagey (1986), Halle
(1992, 1995) and Halle, Vaux, and Wolfe (2000), represents consonants
and vowels by different features and makes the assumption that features
are dependents of nodes, but not of other features. Place of articulation in
consonants is determined by the articulator that produces a given consonant:
the lips (labial articulator), the tongue blade (coronal articulator) or the tongue
body (dorsal articulator). Since the primary articulator in vowels is the
tongue body, the features of vowels referring to the posture of the tongue are
dependents of the DORSAL node. The relevant fragment of the feature tree is
cited from Halle (1995).³
(2) Articulator Theory

        ROOT
         |
        PLACE
         ├─ LABIAL:  [round]
         ├─ CORONAL: [ant] [distr]
         └─ DORSAL:  [back] [high] [low]
Given this feature tree, the opposition of palatalized [l'] versus velarized [ɫ]
exemplified in (1) is expressed as the opposition of [−back] and [+back] under
the DORSAL node. In other regards, the structure of the tree for [l'] and [ɫ] is
identical. The essential observation is that both consonants are represented as
complex segments in the sense that they involve two articulators. This fact is
expressed as the involvement of two dependents of the PLACE node: CORONAL,
which dominates the features for primary articulation, and DORSAL, which
dominates the features for secondary articulation. In contrast, plain [l], which
is neither palatalized nor velarized, is a simplex segment. Plain [l] occurs in
Bulgarian, but its occurrence is limited to the context of front vowels, as in
krale 'kings' (see section 2 below).

The three-way contrast of plain [l], palatalized [l'] and velarized [ɫ]
is illustrated in (3), where I look at the relevant fragment of the feature
tree only.⁴
(3) Articulator Theory

    a. plain [l]
       PLACE
        └─ CORONAL: [+ant]

    b. palatalized [l']
       PLACE
        ├─ CORONAL: [+ant]
        └─ DORSAL: [−back]

    c. velarized [ɫ]
       PLACE
        ├─ CORONAL: [+ant]
        └─ DORSAL: [+back]
Unified Feature Theory, developed by Clements (1989), Hume (1992, 1996),
and Clements and Hume (1995), differs from Articulator Theory in several
fundamental assumptions, but only two of them are relevant for the purposes
of this article. First, vowel features are grouped under the VOCALIC node and
there is no DORSAL node of the type employed by Articulator Theory. VOCALIC
is a dependent of the CONSONANTAL-PLACE node (C-PLACE henceforth). Second,
primary articulation for consonants is characterized by the features directly
under C-PLACE, where features can be dependents of other features rather
than of nodes. A further assumption is that the same set of features is used
for vowels and consonants. The tree corresponding to the one in (2) is as
follows.
(4) Unified Feature Theory

    ROOT
     └ C-PLACE
        ├ [labial]
        ├ [coronal]: [ant], [distr]
        ├ [dorsal]
        └ VOCALIC
           ├ V-PLACE
           │   ├ [lab]
           │   ├ [cor]: [ant]
           │   └ [dors]
           └ APERTURE: [open ...]
The understanding, not shown graphically in (4), is that each feature or node
defines a tier, so, for example, all instances of [coronal] are on the same
tier. All features are privative, including [open] under the APERTURE node
which characterizes height. Since height represents multiple values, there
can be more than one token of [open] under the APERTURE node. The feature
[coronal] corresponds to [−back] while [dorsal] corresponds to [+back] in
Articulator Theory. This correspondence holds when [coronal] and [dorsal]
are under V-PLACE. When directly under C-PLACE, [coronal] and [dorsal] refer
to primary articulation, so they correspond to the nodes CORONAL and DORSAL
in Articulator Theory. The essential point is that secondary articulation (here:
palatalization and velarization) and primary articulation are characterized by
features under different nodes.
Given these assumptions, the representation of the three types of laterals
shown in (3) for Articulator Theory is now as follows. The proponents of
Unified Feature Theory have not made it clear how they would represent
velarization, but I assume that it would be a mirror image of palatalization,
so it would be expressed as [dorsal] under V-PLACE.⁵
(5) Unified Feature Theory

    a. plain [l]
         C-PLACE
          └ [cor]: [+ant]

    b. palatalized [l']
         C-PLACE
          ├ [cor]: [+ant]
          └ VOCALIC
             └ V-PLACE
                └ [cor]: [−ant]

    c. velarized [ɫ]
         C-PLACE
          ├ [cor]: [+ant]
          └ VOCALIC
             └ V-PLACE
                └ [dorsal]
In spite of rather substantial differences in both the features and the structure
of the tree, Articulator Theory and Unified Feature Theory are in agreement
in one regard: they represent consonants with a secondary articulation as
complex segments.⁶
The most recent proposal regarding feature geometry, which I dub
Modified Articulator Theory, is that of Halle (2005). It constitutes a radical
departure from the earlier theories in the sense that every feature occupies an
autosegmental tier of its own, so the tree is flat and there are no intermediate
nodes between the terminal features and the Root node. Halle (2005) refers
to this structure as the 'bottle brush' model.
In spite of the radical change as compared to the earlier theories, Halle
(2005) preserves the essential properties of Articulator Theory. In particular,
in addition to the binary features familiar from Articulator Theory, Halle
(2005) introduces unary features that represent designated articulators
that are active in the articulation of a given segment (DA features). Thus,
[DACoronal] represents the tongue blade (coronal articulator), [DADorsal]
refers to the tongue body (dorsal articulator) and [DALabial] denotes the lips
(labial articulator). The feature tree is given in (6), but, for reasons of space,
the list of features is not complete.
(6) Modified Articulator Theory

        ROOT
         │
    [DALabial] [+round] [DACoronal] [+ant] [+distr] [DADorsal] [+back] [+high] [+low] ...
Given the feature tree in (6), the representation of plain, palatalized and
velarized laterals in Modified Articulator Theory is different from the
representations in Articulator Theory (3) and Unified Feature Theory (5).
(7) Modified Articulator Theory

    a. plain [l]
         ROOT
          │
         [DACor] [+ant]

    b. palatalized [l']
         ROOT
          │
         [DACor] [+ant] [−back]

    c. velarized [ɫ]
         ROOT
          │
         [DACor] [+ant] [+back]
The essential observation is that consonants with secondary articulation
(here: palatalization and velarization) are simplex rather than complex
segments because they involve single occurrences of DA features. This
constitutes a difference with regard to Articulator Theory and Unified
Feature Theory. In contrast to these theories, Halle (2005) assumes that the
articulation of front vowels involves two articulators: the dorsal articulator
and the coronal articulator. That is, all vowels activate [DADorsal] but front
vowels additionally activate [DACoronal].⁷ In effect then, front vowels are
complex segments.
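The predictions of the three theories about segmental complexity can be made concrete with a small illustrative encoding (the dictionary layout and function name below are my own bookkeeping, not part of any of the three proposals): each theory assigns every segment the set of articulators it activates, and a segment counts as complex when it activates more than one.

```python
# Illustrative bookkeeping (mine, not any theory's official formalism):
# for each theory, every segment is paired with the set of articulators
# it activates; a segment is "complex" when it activates more than one.
# "l~" stands in for the velarized lateral.

ARTICULATORS = {
    "Articulator Theory": {
        "l":  {"CORONAL"},                 # plain lateral
        "l'": {"CORONAL", "DORSAL"},       # palatalized: [-back] under DORSAL
        "l~": {"CORONAL", "DORSAL"},       # velarized: [+back] under DORSAL
        "k'": {"DORSAL"},                  # palatalized dorsal: simplex
        "i":  {"DORSAL"},                  # front vowel
    },
    "Unified Feature Theory": {
        "l":  {"C-coronal"},
        "l'": {"C-coronal", "V-coronal"},  # palatalization = [cor] under V-PLACE
        "l~": {"C-coronal", "V-dorsal"},   # velarization = [dorsal] under V-PLACE
        "k'": {"C-dorsal", "V-coronal"},   # palatalized dorsals are complex too
        "i":  {"V-coronal"},
    },
    "Modified Articulator Theory": {
        "l":  {"DACoronal"},
        "l'": {"DACoronal"},               # [-back] sits directly under ROOT
        "l~": {"DACoronal"},               # [+back] sits directly under ROOT
        "k'": {"DADorsal"},
        "i":  {"DADorsal", "DACoronal"},   # front vowels use two articulators
    },
}

def is_complex(theory, segment):
    """A segment is complex iff it activates more than one articulator."""
    return len(ARTICULATORS[theory][segment]) > 1
```

On this encoding, palatalized [l'] and velarized [ɫ] come out complex in Articulator Theory and Unified Feature Theory but simplex in Modified Articulator Theory, while palatalized [k'] is complex only in Unified Feature Theory; these are exactly the asymmetries that the Bulgarian facts in sections 2 and 3 turn on.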
In the following section, I test Articulator Theory, Unified Feature Theory
and Modified Articulator Theory by evaluating how they can handle a process
of depalatalization in Bulgarian.
2. Depalatalization
Bulgarian⁸ is a typical Slavic language in the sense that it exhibits a rich
system of contrasts involving palatalization.⁹ The point of interest is that
the contrast is between plain and palatalized consonants rather than between
velarized and palatalized consonants, as is the case in Russian. Actually the
contrast between palatalization and velarization is attested but, as remarked
in section 1, it is limited to laterals (see (1) in section 1). Furthermore,
palatalization is contrastive for dorsals. The relevant examples have been
gathered in (8).
(8) Palatalization contrasts:

    bal [b] 'ball'              b'al [b'] 'white'
    tupam [t] 'beat'            t'utun [t'] 'tobacco'
    dal [d] 'he gave'           d'al [d'] 'part'
    saraf [s] 'money changer'   s'ara [s'] 'sulphur'
    zandan [z] 'dungeon'        z'an [z'] 'waste'
    raz [r] 'once'              r'azka [r'] 'cat'
    nula [n] 'zero'             n'ux [n'] 'scent'
    karta [k] 'card'            k'ar [k'] 'profit'
    gol [g] 'naked'             g'ol [g'] 'puddle'
There is a restriction on the occurrence of palatalized consonants: these
consonants are never found before front vowels. Inspection of further
facts shows that Bulgarian exhibits an entirely exceptionless process of
depalatalization before front vowels. This process is motivated by alternations
between soft (palatalized) and plain consonants such as those in (9).
(9) Bulgarian depalatalization:

    zem+a [m'a] 'earth'            zem+i [mi] 'earths'
                                   zem+en [mɛ] 'earthly'
    kon+o [n'ɔ] 'horse' (voc.)     kon+e [nɛ] 'horses'
                                   kon+ik [ni] (diminutive)
    car+ət [r'ə] 'the emperor'     car+ic+a [ri] 'empress'
                                   car+ev [rɛ] 'imperial'
    pət+ət [t'ə] 'the road'        pət+i [ti] 'roads'
                                   pət+em [tɛ] 'on the way'
The words in (9) have underlying rather than derived palatalized consonants
because they occur before back vowels, that is, in a context that does not
warrant palatalization. The occurrence of plain consonants before i and e must
therefore be an effect of depalatalization. From the perspective of Optimality
Theory (Prince and Smolensky [1993] 2004, McCarthy and Prince 1995),
what we need is a constraint that prohibits palatalized consonants before
front vowels.
The statement of the depalatalization constraint is aided further by
two observations. First, in full parallel to the data in (9), underlying soft
/l'/ depalatalizes to plain [l] before front vowels (10a). Second, underlying
velarized /ɫ/ loses velarization in the same context (10b). In effect
then, Bulgarian exhibits phonetically three types of laterals: palatalized [l'],
velarized [ɫ], and plain [l]. The latter is restricted distributionally to the context
of front vowels.
(10) a. kral+u [l'u] 'king' (voc.), kral+ət [l'ə] 'the king'
        versus
        kral+ic+a [li] 'queen', kral+e [lɛ] 'kings'
     b. metal [ɫ] 'metal', metal+ət [ɫə] 'the metal'
        versus
        metal+i [li] (pl.), metal+en [lɛ] 'metallic'
The generalization is now clear: Bulgarian does not admit secondary
articulation before front vowels, no matter whether it is palatalization (9)
and (10a) or velarization (10b). Since in both Articulator Theory and Unified
Feature Theory secondary articulation is represented as a complex segment,
the relevant constraint prohibits complex segments before front vowels.

(11) PLAIN-SEG: No complex segments before front vowels.
The reason for the absence of palatalization before front vowels is now clear:
PLAIN-SEG, a surface-true constraint, dominates PAL constraints, so candidates
with palatalized consonants before front vowels can never be optimal.
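This ranking argument can be sketched as a minimal Optimality Theory evaluation (the candidate encoding and the PAL formulation below are simplified for illustration and are not Rubach's literal definitions): since PLAIN-SEG outranks PAL, a candidate that keeps a complex segment before a front vowel loses no matter how well it satisfies PAL.

```python
# Minimal OT evaluation sketch for /zem' + i/ 'earths' (schematic, not
# Rubach's literal constraint definitions). Each candidate is a list of
# (segment, is_complex) pairs, where is_complex marks palatalization or
# velarization.

FRONT_VOWELS = {"i", "e"}
VOWELS = {"i", "e", "a", "o", "u"}

def plain_seg(candidate):
    """PLAIN-SEG: one violation per complex segment before a front vowel."""
    return sum(1 for (seg, cplx), nxt in zip(candidate, candidate[1:])
               if cplx and nxt[0] in FRONT_VOWELS)

def pal(candidate):
    """Schematic PAL: one violation per plain consonant before a front vowel."""
    return sum(1 for (seg, cplx), nxt in zip(candidate, candidate[1:])
               if seg not in VOWELS and not cplx and nxt[0] in FRONT_VOWELS)

RANKING = [plain_seg, pal]   # PLAIN-SEG >> PAL

def evaluate(candidates):
    """Pick the candidate whose violation profile is lexicographically best."""
    return min(candidates, key=lambda c: [con(c) for con in RANKING])

# Faithful [zem'i] keeps the complex [m']; the rival candidate depalatalizes.
faithful      = [("z", False), ("e", False), ("m", True),  ("i", False)]
depalatalized = [("z", False), ("e", False), ("m", False), ("i", False)]
```

With PLAIN-SEG ranked first, the depalatalized candidate wins over the faithful one, reproducing the alternations in (9); reranking PAL above PLAIN-SEG would instead select the palatalized candidate, which is the situation footnote 11 describes.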
Depalatalization provides an argument against Modified Articulator
Theory. This theory, unlike the other two theories discussed here, cannot
offer a single denominator for both the loss of palatalization and the loss of
velarization because there is no structural similarity between these processes
in Modified Articulator Theory. Such structural similarity, namely the occurrence
of a complex segment, is captured in Articulator Theory and Unified Feature
Theory, both of which are equally successful. However, the consideration
of further facts in section 3 shows that Articulator Theory and Modified
Articulator Theory fare better than Unified Feature Theory.
3. Palatalization
The fact that Bulgarian has a depalatalization process might suggest that
it cannot have an active palatalization process. The point of interest is that
this conjecture is false. Bulgarian has fully productive and exceptionless
palatalization before i and e but, oddly, the process is limited to dorsal inputs.
Thus, the consonants in (12a) are plain while those in (12b) are palatalized.¹⁰
(12) a. bik [bi] 'bull', beda [bɛ] 'misfortune',
        rib+a [b] 'fish', rib+i [bi] (pl.), rib+eck+a [bɛ] (dimin.)
        mir [mi] 'peace', mek [mɛ] 'soft'
        tip [ti] 'type', testo [tɛ] 'dough'
        div [di] 'wild', godina [di] 'year', delo [dɛ] 'work',
        vod+a [d] 'water', vod+i [di] (pl.)
        sila [si] 'strength', sega [sɛ] 'now'
        mas+a [s] 'mass', mas+i [si] (pl.)
        mes+o [s] 'meat', mes+en [sɛ] (Adj.)
        zima [zi] 'winter', zet [zɛ] 'son-in-law'
        niva [ni] 'field', neka [nɛ] 'let', vin+a [n] 'guilt', vin+i [ni] (pl.)
        vin+o [n] 'wine', vin+en [nɛ] (Adj.)
        riza [ri] 'shirt', red [rɛ] 'row'
     b. kino [k'i] 'cinema', kelner [k'ɛ] 'waiter', tuk e [k'#ɛ] 'he is here'
        gibon [g'i] 'gibbon', general [g'ɛ] 'general'
        drog+a 'drug', drog+i [g'i] (pl.)
        xitrec [x'i] 'sly person', xele [x'ɛ] 'at last'
        mux+a [x] 'fly', mux+i [x'i] (pl.),
        tax e jasno [x'#ɛ] 'it is clear to them'
The question is why it is only dorsals that palatalize before front vowels.
The answer to this query lies with the representation of palatalized
dorsals.
Both Articulator Theory (13a) and Modified Articulator Theory (13b)
treat palatalized dorsals as simplex segments: they are consonants that carry
a [−back] specification under the DORSAL node (13a) or directly under the
ROOT node (13b). Therefore, these theories predict, correctly, that PLAIN-SEG
has no jurisdiction over the palatalization of dorsals. In contrast, Unified
Feature Theory does not express an asymmetry between palatalized dorsals
and other palatalized consonants. All palatalized consonants are represented
in the same way and they are complex segments (13c). The prediction
for Bulgarian is that dorsals, like all other consonants, should escape
palatalization, the wrong generalization. In (13), I show the relevant part
of the tree representing a palatalized [k'] in the three competing theories of
feature geometry.
(13) representation of [k']

    a. Articulator Theory
         ROOT
          └ PLACE
             └ DORSAL: [−back]

    b. Modified Articulator Theory
         ROOT
          │
         [DADorsal] [−back]

    c. Unified Feature Theory
         ROOT
          └ C-PLACE
             ├ [dorsal]
             └ VOCALIC
                └ V-PLACE
                   └ [cor]: [−ant]
Modified Articulator Theory adds an interesting angle to the treatment of
depalatalization. Recall (see section 1) that in this theory front vowels are in fact
complex segments because they involve two articulator features: [DADorsal]
and [DACoronal]. Given this fact, PLAIN-SEG, which prohibits complex
segments before front vowels, can be viewed as a generalization forbidding
the occurrence of two successive complex segments, a natural rationale for this
constraint.¹¹ However, Modified Articulator Theory fails in an important way:
it treats secondary articulation (palatalization and velarization) as exhibiting
a simplex rather than a complex segment configuration. This drawback is
easily repaired: we need to assume that Optimality Theory has a constraint
mandating that labials and coronals exhibiting palatalization or velarization are
represented as complex segments in the sense that in addition to [DALabial] or
[DACoronal], which express primary articulation, they also involve [DADorsal],
which signals the presence of secondary articulation. The representations of
palatalized [l'] and velarized [ɫ] should therefore be as follows.
(14) Modified Articulator Theory: new representation of secondary
     articulation

    a. palatalized [l']
         ROOT
          │
         [DACor] [+ant] [DADor] [−back]

    b. velarized [ɫ]
         ROOT
          │
         [DACor] [+ant] [DADor] [+back]
The tenet that coronals and labials with secondary articulation are complex
segments must be a constraint on GEN because it determines how a well-formed
output is configured. Constraints on GEN are inviolable, which
predicts that palatalized and velarized segments behave as complex segments
in all the languages of the world, a prediction that calls for further study.
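The effect of the revision in (14) can be summarized with another illustrative encoding (the data layout is mine, not Halle's formalism): secondary articulation now contributes [DADorsal], so palatalized or velarized labials and coronals activate two DA features, while a palatalized dorsal still activates only one and therefore still escapes a PLAIN-SEG recast as a ban on two successive complex segments.

```python
# Sketch of the revised Modified Articulator Theory representations in (14)
# (illustrative layout, not Halle's formalism). DA features are listed per
# segment; [-back]/[+back] sit directly under ROOT and add no articulator,
# so palatalized dorsals stay simplex. "l~" stands for the velarized lateral.

REVISED_MAT = {
    "l":  {"DACoronal"},                  # plain lateral
    "l'": {"DACoronal", "DADorsal"},      # (14a): palatalization adds DADorsal
    "l~": {"DACoronal", "DADorsal"},      # (14b): velarization adds DADorsal
    "p'": {"DALabial", "DADorsal"},       # palatalized labial, same logic
    "k'": {"DADorsal"},                   # palatalized dorsal: one DA feature
    "i":  {"DADorsal", "DACoronal"},      # front vowel (complex, per Halle 2005)
}

def is_complex(segment):
    return len(REVISED_MAT[segment]) > 1

def violates_plain_seg(consonant, vowel):
    """PLAIN-SEG recast as a ban on two successive complex segments."""
    return is_complex(consonant) and is_complex(vowel)
```

On this bookkeeping, [l'] before [i] violates the constraint while [k'] before [i] does not, matching the Bulgarian pattern in which only dorsals surface palatalized before front vowels.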
Notes
* I would like to thank Ivan Ivanov and Roumyana Slabakova for their help with
the Bulgarian data.
1. Other classes of segments show an opposition between plain and palatalized
consonants, as I explain in section 2 below.
2. In this regard, Bulgarian is exactly like Russian, compare the Russian contrast
   lukavyj [lu] 'clever' versus l'udi [l'u] 'people'.
3. The Revised Articulator Theory of Halle, Vaux & Wolfe (2000) uses LIPS, TONGUE
   BLADE and TONGUE BODY rather than LABIAL, CORONAL and DORSAL shown in (2)
   in the function of articulator nodes, and treats [labial], [coronal] and [dorsal] as
   designated articulator features. These features, assigned to LIPS, TONGUE BLADE and
   TONGUE BODY, respectively, distinguish primary from secondary articulation. For
   instance, a palatalized coronal such as [l'] carries the specification for [coronal]
   as the primary articulation feature but not for [dorsal], because TONGUE BODY
   functions as a secondary articulator in palatalization. The relevant fragment of the
   tree for [l'], showing contrastive features below the PLACE node, looks as follows:

       PLACE
        ├ TONGUE BLADE: [cor], [+ant]
        └ TONGUE BODY: [−back]

   In what follows, I use the more widely known terms from the earlier version of
   Articulator Theory adopted from Halle (1995). Also, I simplify representations
   in feature trees by not showing the designated articulator for primary articulation.
4. Palatalized and velarized consonants are [+high], but here and below I simplify
   representations by omitting this specification.
5. Note that [cor] occurs together with [−ant], which plays a role in the analysis
   of coronalization, a process that turns dorsals into [−ant] coronals before
   front vowels, for instance, [x] > [ʃ]. For discussion, see Clements and Hume
   (1995).
6. This is not true for the representation of palatalized dorsals in Articulator Theory,
a point that I develop in the analysis of the Bulgarian data in section 3.
7. The idea that front vowels are characterized as coronal has its roots in the work
of Clements (1989), Hume (1992, 1996), and Clements & Hume (1995).
8. The data are drawn from standard grammars, including Tilkov (1982, 1983),
   Scatton (1975, 1984 and 1993) and Stojanov (1962). These data have been
   confirmed and expanded by my fieldwork with Ivan Ivanov, whom I would like
   to thank here. I also thank Roumyana Slabakova for consultation.
9. The vowel system includes six vowels: [i u ɛ ə ɔ a].
10. Recall (see the data in 8) that palatalized dorsals are underlying segments in
Bulgarian, so the changes shown in (12b) are phonemic rather than allophonic.
11. PLAIN-SEG, like any other constraint in Optimality Theory, is violable, so it does
not eliminate palatalization before front vowels. Palatalization occurs if PAL
constraints dominate PLAIN-SEG.
References
Chomsky, Noam, and Morris Halle
1968 The Sound Pattern of English. New York: Harper and Row.
Clements, George N.
1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
1989 A unified set of features for consonants and vowels. Ms., Cornell University, Ithaca.
Clements, George N., and Elizabeth V. Hume
1995 The internal organization of speech sounds. In: John A. Goldsmith (ed.), The Handbook of Phonological Theory, 245–306. Oxford: Blackwell.
Halle, Morris
1992 Phonological features. In: William Bright (ed.), International Encyclopedia of Linguistics, 207–212. Oxford: Oxford University Press.
1995 Feature geometry and feature spreading. Linguistic Inquiry 26: 1–46.
2005 Palatalization/velar softening: What it is and what it tells us about the nature of language. Linguistic Inquiry 36: 23–41.
Halle, Morris, Bert Vaux, and Andrew Wolfe
2000 On feature spreading and the representation of place of articulation. Linguistic Inquiry 31: 387–443.
Hume, Elizabeth V.
1992 Vowels, coronal consonants and their interaction in non-linear phonology. Ph.D. diss., Cornell University, Ithaca.
1996 Coronal consonant, front vowel parallels in Maltese. Natural Language and Linguistic Theory 14: 163–203.
McCarthy, John J., and Alan Prince
1995 Faithfulness and reduplicative identity. In: Jill N. Beckman, Laura Walsh Dickey, and Suzanne Urbanczyk (eds.), University of Massachusetts Occasional Papers in Linguistics 18: 249–384. Amherst, Massachusetts: GLSA Publications.
Prince, Alan, and Paul Smolensky
2004 Optimality Theory: Constraint Interaction in Generative Grammar.
Oxford: Blackwell. [Original edition, 1993 technical report, Rutgers
University Center for Cognitive Sciences. Available on Rutgers
Optimality Archive, ROA-537.]
Sagey, Elizabeth
1986 The representation of features and relations in non-linear phonology.
Ph.D. diss., MIT, Cambridge, Massachusetts.
Scatton, Ernest A.
1975 Bulgarian Phonology. Cambridge, MA: Slavica Publishers, Inc.
1984 A Reference Grammar of Modern Bulgarian. Columbus, OH: Slavica
Publishers, Inc.
1993 Bulgarian. In: Bernard Comrie and Greville G. Corbett (eds.), The Slavonic Languages, 188–248. London and New York: Routledge.
Stojanov, Stojko
1962 Bulgarska dialektologia [Bulgarian dialectology]. Sofija: Nauka i izkustvo.
Tilkov, Dimitur
1982 Gramatika na suvremennija bulgarski knizhoven ezik. Fonetika [Grammar of Contemporary Literary Bulgarian. Phonetics]. Sofija: BAN.
1983 Gramatika na suvremennija bulgarski knizhoven ezik. Morfologija [Grammar of Contemporary Literary Bulgarian. Morphology]. Sofija: BAN.
Proposals for a representation of sounds based on their main acoustico-perceptual properties

Jacqueline Vaissière
1. Introduction
The behavior of speech sounds involves a large number of natural,
panchronic processes that take place at different points in time and space, in
unrelated languages. More or less subtle synchronic, dialectal, stylistic, or
allophonic variations may progressively become full-fledged sound changes.
An overview of these sound changes brings out preferences for natural
sounds, that are easier to perceive or produce, or more audible in noisy
environments. These preferences are reflected synchronically in typologically
preferred systems of contrasts, including cross-linguistic patterns in phonotactic
restrictions. Additional evidence for these general tendencies comes from
various sources, including widespread speech errors by children, hearing-impaired
subjects or language learners, in ambient noise or in spectrogram
reading. (On panchronic phonology: Haudricourt 1969; on language
universals: Greenberg [1966] 2005; on sound systems: Maddieson 1984; on
the phonetics of sound changes: Ohala 1993, Blevins 2004; on universals in
syllable structure: Clements and Keyser 1983, Vennemann 1988.)
Phonetic sciences are concerned with explaining the panchronic behavior
of speech sounds by contributing the widest possible range of plausible
phonetic explanations of sound change. The constraints on sound patterns
come from (i) the physiology of the speech production organs and the
perception apparatus, (ii) the laws of aerodynamics and acoustics, and
(iii) neurological-psychological facts. The constraints exert forces that are
gradient; they interact in a nonlinear manner and may trade off against one
another. Thanks to progress in the understanding of the fundamentals of the
speech process, and to technical advances in exploratory techniques, there
now exists the potential for complex modelling which would incorporate
all the different types of constraints, and which would improve gradually as
new experimental evidence comes in. Earlier models that were developed in
this spirit are the three-parameter model by Stevens and House (1955), and
Fant's model (1960). Both approximate the vocal tract by three articulatory
variables: place of articulation, degree of constriction and amount of lip
rounding.
Maeda's (1989) articulatory model continues in the same vein, but
with more parameters. It is based on a statistical analysis of X-ray data, and
allows for the study of compensatory phenomena in a more realistic way than
the former three-parameter models. Other prominent models include Ohala's
aerodynamic model of speech (Ohala 1997), the Task Dynamic model of
inter-articulator coordination in speech (Saltzman and Munhall 1989), and
vowel inventory simulations (Lindblom 1986, Schwartz et al. 1997).
This paper aims to propose a notation system based on the combination
of acoustic and perceptual properties of sounds. The proposal is mainly
based on Stevens's (1989), Fant's (1960) and Maeda's (1989) work and our
own experience in spectrogram reading in different languages. The use of
an articulatory model was instrumental in exploring the full potential of a
given vocal tract in a more realistic manner than the previous models and in
hearing the resulting percept (Vaissière 2008, 2009). After the introduction,
Section 2 provides a short summary of the articulatory, aerodynamic,
acoustic and perceptual properties of sounds that have previously been put
forward as explaining natural phonetic changes. Some relevant aspects of
the acoustic theory of speech production (Fant 1960, Stevens [1998] 2000)
will be recalled for the non-specialists in acoustic phonetics in Section 3.
Section 4 will present a method originally based on articulatory modelling
for placing five vowels in the maximally stretched three-dimensional vowel
space: these five vowels are proposed as references for language comparison
or to transcribe detailed variations in the realisation of the phonemes (for
a similar approach, see Lindblom and Sundberg 1969). The determination
of the five vowels is intimately related to the Quantal Theory proposed by
Stevens (1989). Since these vowels are meant to serve as references, they
are compared with Daniel Jones's (1918) cardinal vowels and their rendition
by Peter Ladefoged (the sounds corresponding to the cardinal vowels as
uttered by Jones and Ladefoged are available on the UCLA internet web
site: http://www.phonetics.ucla.edu/course/chapter9/cardinal/cardinal.html).
Section 5 describes possible uses of the same type of notation for enhancing
the similarities between vowels and consonants and describing the effect of
coarticulation.
2. The explanations of natural processes
This section provides a short review on the nature of proposed explanations
for natural processes of sound change.
2.1. Articulatory-based explanations
Since Panini, in the 5th century BC, most of the sound changes in historical
phonetics and natural processes have mainly been interpreted in terms of
natural articulatory processes. The distinctive features are defined by the
position of the articulators (Chomsky and Halle 1968), e.g. high, back,
anterior, nasal, etc. The International Phonetic Alphabet (IPA) labels refer to
articulation, such as height and backness of tongue body and lip position for
the vowels. Finally, the basic cardinal vowels (Jones 1918) are also mainly
defined in articulatory terms, at least according to their author.
The explanations based on articulation have proven to be very powerful.
Minimum articulatory effort and economy of gestures (Lindblom 1983)
lead to a decrease in the articulatory distance between the phonemes in a
sequence. The overlapping of the gestures by the different organs required
for the production of the successive phonemes (Hardcastle and Hewlett 1999)
leads to a further reduction of the articulatory distance between the successive
phonemes. The reduction of effort is not uniform and it is determined
according to the prosodic status of each phoneme, mainly its position
relative to word stress and to word boundaries. Sounds in word-initial and
in syllable-initial position or in pre-stressed position are less likely to be
lenited, i.e. reduced or suppressed. Being in a strong position leads to stronger
constriction for the consonants and more opening for the vowels, resulting in
a larger articulatory contrast between the onset consonant and the following
vowel and in fewer coarticulatory phenomena. For a review of experimental
studies on the effect of prosodic status on the speech organs, see Fougeron
1999.
A number of models emphasize the primacy of articulation over
other aspects of speech: the listener decodes speech by identifying the
underlying vocal tract gestures intended by the speaker (the Motor theory
of speech perception: Liberman and Mattingly 1985); the basic units of
phonological contrast are articulatory gestures (Articulatory phonology:
Browman and Goldstein 1992; Task-dynamic model: Saltzman and Munhall
1989).
2.2. Acoustic-based explanation
In their groundbreaking Preliminaries to Speech Analysis, Jakobson, Fant
and Halle ([1952] 1967) viewed features as acoustic entities and defined them
mainly in the acoustic domain. The criteria are the sharpness of the formant
structure, the level of total intensity and the way the energy is concentrated
in a central area of the spectrum, the range of frequencies where the energy is
concentrated (e.g. the energy is concentrated in the low frequencies for grave
sounds), the level of noise intensity, the presence of periodic low frequency
excitation, the existence of additional formants and less intensity in existing
formants (nasal/oral), etc. Reference to the acoustic properties of the sounds
allows for the explanation of sound behaviours that are not explainable in
articulatory terms. For example, the feature [grave] sheds light on the change
of [x] into [f] (as in the final consonant of the English word rough), which
are both [+grave]. While there is no articulatory connection between the
back of the tongue and the lips, both collaborate in lowering the energy.
For Fant's later views on the distinctive features proposed in the Preliminaries,
see Fant 1973.
Ceteris paribus, the length of the front cavities for the front rounded and
back unrounded vowels, or for rounded /s/ and spread /ʃ/, tends to be similar, and
such sounds share the same acoustic characteristics (for simulation: Vaissière 2006)
and become acoustically similar if the tongue is not retracted (or retroflex)
to compensate for the shortening of the front cavity: French listeners will
not perceive as an /u/ a vowel which is not rounded, nor properly distinguish
the coronal sibilant /s/ from post-alveolar /ʃ/ (for articulatory simulation:
Kamiyama and Vaissière 2009). An open glottis and nasalization both produce
a large damping of the first formant, explaining spontaneous nasalisation
(Ohala et al. 1993). In spectrogram reading, vowels for which bandwidth
(amplitude) and formant intensities do not covary with frequency are judged
to be nasalized. Different configurations can produce essentially equivalent
acoustic characteristics, leading to articulatory tradeoffs: increased constriction
length, increased front cavity length or decreased constriction area for the
production of the /r/ sounds in English all lead to a low F3, but only two
gestures are compatible (Guenther et al. 1999). The basic unit is an auditory
target (Perkell et al. 1995).
Stevens has developed a number of enlightening theories based on the
acoustic properties of sounds. According to the Invariance theory, each
distinctive feature has an invariant acoustic property. For example, the gross
shape of the spectrum sampled at the consonantal release shows a distinctive
shape for each place of articulation: a prominent mid-frequency spectral
peak for velars, a diffuse-rising spectrum for alveolars, and a diffuse-falling
spectrum for labials (Stevens and Blumstein 1978). The Quantal Theory
predicts that the languages of the world show a preference for regions
of acoustic stability in the acoustic signal for phonemes. These regions
correspond to quantal states where there are minimal acoustic consequences
to the small perturbations resulting from the position of the articulators
(Stevens 1989). According to Stevens's Enhancement theory, the distinctive
features are often accompanied by redundant features that strengthen
the acoustic realization of distinctive features and contribute additional
properties which help the listener perceive the distinction. For example,
lip protrusion enhances distinctive [+back] by further lowering the second
formant and therefore enhancing the contrast between [+back] and [−back]
sounds; lip protrusion also serves to make post-alveolar [ʃ] more distinct
from [s], by lowering the resonances due to the front cavity (Keyser and
Stevens 2006).
2.3. Aerodynamic constraints
John Ohala has documented the aerodynamic laws underlying a number
of sound changes related to voicing. Intra-oral air pressure build-up in
non-sonorant consonants inhibits voicing. Non-sonorants in the world's
languages tend to be uttered without vocal fold vibration (i.e. as voiceless).
The phonologically voiced non-sonorants tend to devoice when they are long.
Glides and high vowels have a greater tendency to devoice than comparable
lower vowels because of a higher intraoral pressure. Tense vocal tract walls
and a back place of articulation inhibit longer voicing of the phonologically
voiced stops by preventing the expansion of the vocal tract volume necessary for
the maintenance of a sufficient transglottal pressure; voiced velar consonants
are accordingly missing more often than labial consonants in the inventories
of the world's languages. The voicing of a voiced fricative is gained at the
expense of the energy of its frication: fricatives favor voicelessness (more
than the corresponding stops), etc. (Ohala 1997; see also Passy 1890: 161–162).
The coupling of a side cavity, such as the nasal cavity, the tracheal
cavity, the sub-lingual cavity or a lateral cavity, or the back cavity in the case
of fricatives, is responsible for the presence of zeroes in the speech signal:
nasals, fricatives, affricates, laterals, nasalized and breathy vowels share the
presence of zeroes, which sheds light on some sound changes in which they
pattern together. On the spontaneous nasalization of vowels in fricative
context, see Ohala 1996.
2.4. Auditory-based explanation
Perception is known to play a role in preferred sound patterns. Vowel systems
in the worlds languages tend to maximize the perceptual space between
Proposals for a representation of sounds 311
vowels (or, in later versions of the theory, to ensure a sufficient space)
independently of the ease or difficulty of their production (the Dispersion
Theory: Liljencrants and Lindblom 1972). The speaker adapts his/her way
of producing speech to his/her estimation of the decoding capacities of the
listener: the speaker will increase or decrease his/her articulatory effort
depending on the context (the Hypo- and Hyperarticulation theory: Lindblom
1990). A large number of substitutions between sounds are explainable in
simple auditory terms and interpreted as misparsings of the acoustic signal
due to perceptual limitations in rapid speech (the theory of misperception:
Ohala 1981; see also Durand 1955). For example, Chang, Plauché and Ohala
(2001) provide an interesting account of asymmetries in sound change
based on asymmetries in perception. For a recent collection of papers on
the importance of perception in shaping phonology, see Hume and Johnson
2001; for auditory-based features, see Flemming 2002.
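The dispersion idea lends itself to a toy computation: treat each vowel as a point in a two-formant space and score a system by the summed inverse-squared pairwise distances, following the energy measure of Liljencrants and Lindblom's model. The sketch below is illustrative only; the formant values and the function name are my own assumptions, not data from this chapter.

```python
def dispersion_energy(vowels):
    """Sum of inverse-squared pairwise distances between vowels
    (lower energy = better dispersed system), in the spirit of
    Liljencrants & Lindblom (1972)."""
    pts = list(vowels.values())
    energy = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d2 = sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))
            energy += 1.0 / d2
    return energy

# Illustrative (F1, F2) values in Hz for two three-vowel systems.
peripheral = {"i": (280, 2250), "a": (700, 1300), "u": (300, 700)}
crowded    = {"i": (280, 2250), "e": (400, 2000), "a": (700, 1300)}

# The peripheral system /i a u/ is better dispersed (lower energy)
# than a system that crowds two front vowels together.
assert dispersion_energy(peripheral) < dispersion_energy(crowded)
```

Real implementations work in an auditory (mel- or Bark-scaled) space and search over candidate systems; the inequality above only illustrates why /i a u/ is the canonical three-vowel inventory.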
3. Acoustic theory of speech production
The acoustic theory of speech production offers a well-established method
to relate a given articulatory configuration to the physical characteristics
of the produced sounds. As mentioned in the introduction, the three basic
components of the models simulating the relationship between articulation
and acoustics are: (i) the location of the constriction formed by the tongue or
the lips; (ii) the magnitude of the constriction; and (iii) the lip configuration
(Stevens and House 1955, Fant 1960). More sophisticated models include
the shape of the tongue, larynx height, the length and shape of the constriction,
more details on lip configuration (with a distinction between protrusion
and rounding), or side cavities such as the nasal passage and secondary
constriction(s) (Maeda 1996; Fant and Båvegård 1997).
For the sake of simplicity, this complex reality is maximally simplified in
the present paper. A single nomogram (Figure 1, adapted from Fant 1960:
82) is used to exemplify the principles underlying the effect of the speech
organs' movements on the acoustics of the speech signal. A nomogram
gives a rough but very useful indication of the behaviour of the first five
formants when the narrowest passage in the vocal tract is moved from the
lips (left in Figure 1) to the glottis (right), and when the lips are more or less
rounded. It has been verified that such simplified modelling provides a useful
approximation of the behavior of formant frequencies (for a comparison
between Fant's nomogram, Maeda's model and the rendition of the
nomogram by phoneticians, see Badin, Perrier and Boë 1990). In Figure 1,
312 Jacqueline Vaissière
only the effect of strong rounding is represented. The third parameter, the
degree of constriction, is fixed at 0.65 cm² in Figure 1. Five formants are
visible under 5 kHz: the average spacing of the formants is 1 kHz for a male
speaker; the spacing is wider for female (and child) speakers, because of
[Figure 1 appears here: the nomogram (frequency axis 0–5000 Hz; constriction location from the lips, left, to the glottis, right) with the encircled focal points labelled (F3F4)F3↑ 3200 Hz, (F2F3) 1900 Hz, (F1F2)↓ 400 Hz and (F1F2)↑ 1000 Hz, above the corresponding spectrograms.]
Figure 1. Top: Nomogram adapted from Fant using the three-parameter vocal tract
model. The tongue constriction size is fixed at 0.65 cm². Plain lines and
dashed lines correspond respectively to no lip rounding and to a 0.16 cm²
lip area. Glottis at the right and lips at the left [after Fant 1960: 82]. See text
for discussion of the circles. Bottom: spectrograms of the corresponding
four cardinal vowels, as spoken by a female native speaker of French,
plus the intermediate back vowels /o/ and /ɔ/ and their notation. See
text for further discussion. The notation used below the spectrograms is
discussed in Section 4.
their shorter vocal tracts. The five formants represent the so-called F-pattern
(F1 to F5).
The five formants are not always visible on spectrograms because some
of them may not be excited by a sound source, or not sufficiently excited,
or their intensity may be reduced because of the presence of anti-formants.
For vowels, the (voice) source is at the glottis, and the whole F-pattern is
excited. For obstruents, a noise source is created close to the narrowest
point of constriction: this noise source excites mainly the resonances of the
cavity between the constriction and the lips. One of the differences between
vowels, glides and consonants is the size of the narrowest passage: the size
of the constriction varies from large to zero for vowels, glides, fricatives and
stops. The F-pattern depends mainly on the tongue and lip configuration and
remains approximately the same, independent of the size of the constriction
and the location of the source(s). The F-pattern is always calculable, for any
vocal tract configuration, once the shape of the sagittal profile is known
(see Fant 1960, for calculations based on X-ray data, for the complete set
of Russian vowels and consonants). The place of articulation, from the lips
to the glottis, determines an almost continuously varying aspect of segment
patterns, reflecting the continuous movements of the speech articulators.
Some discrete breaks in the F-pattern are due to a change in the type of source,
or to the sudden coupling of a side cavity, such as the nasal cavity, the tracheal
cavity or a lateral cavity (Fant 1960; Stevens [1998] 2000).
The four circles in Figure 1 approximate some of the points where one of
the first three formants reaches a local maximum (circle 1 for F3) or a local
minimum (circle 1 for F4, circle 2 for F3, circles 3 and 4 for F2). The figure
will be described in more detail further below.
The following remarks concern the points where one formant reaches a
local maximum or a local minimum. First, when a formant is maximally high
or low, it tends to converge with another formant. The points of converging
formants are called focal points and correspond to quantal regions, as
described by Stevens (1989): relatively large changes in the position
of the constriction around the focal points will cause little change in the
acoustic signal (at least for the frequency of the two formants concerned).
When two formants converge, there is also an increase in their amplitude of
6 dB per halving of their distance (Fant 1960: 58). Furthermore, the closeness
of the two formants and their increased amplitude create a sharp spectral
salience in a well-defined frequency range. Two close formants are perceived
as a single formant (Chistovich and Lublinskaya 1979). As is well known,
vowel quality can be obtained by two-formant synthesis, F1 and F'2. F'2 is
obtained by matching the quality of the vowel by one single formant, F1
being fixed. F'2 therefore gives an indication of the perceptual contribution
of the upper formants and represents an integrated value of F2, F3 and F4. F'2
seems to be attracted by clustered formants. For Swedish listeners, F'2 is
highest for /i/ (close to the cluster F3F4, closer to F4) and lowest for /u/
(close to the cluster F1F2 and closer to F1). It is close to the cluster (F2F3)
for /y/ and to (F1F2) for /o/ (but closer to F2 than F1) (Carlson, Granström,
and Fant 1970). (Swedish /y/ does not correspond to cardinal /y/, as will be
discussed further below.) When there are no converging formants, F'2 is not
attracted by a single formant. We may conclude then that the clustering of
two formants inhibits the perceptual contribution of the upper formants and
of F2 in the case of /i/ (observe the weaker amplitude of F2 than of the cluster
(F3F4) in Figure 2). Second, when two formants are converging, one of the
converging formants may be extremely sensitive to both lip configuration
and degree of constriction. Such points seem to be good points of departure to
create new contrasts. Cardinal /i/ is characterized by converging (F3F4). The
sensitivity of F3 to lip rounding for the very front constriction is employed to
contrast /i/ and /y/ by lowering F3 (note that /i/ and /y/ have about the same
F1 and F2; they differ by F3). The sensitivity to the degree of constriction is
employed to create glides, such as /j/, at every point where formants converge
(see later).
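The 6 dB figure can be turned into a back-of-the-envelope formula: if the level of two approaching formants grows by 6 dB every time their separation is halved, the gain relative to a reference separation is 6·log2(d0/d). This is my simplified reading of Fant's relation, with illustrative numbers:

```python
import math

def convergence_gain_db(ref_sep_hz, sep_hz):
    """Amplitude gain (dB) of two converging formants relative to a
    reference separation, assuming +6 dB per halving of their distance
    (after Fant 1960: 58)."""
    return 6.0 * math.log2(ref_sep_hz / sep_hz)

# Halving an F3-F4 separation from 600 Hz to 300 Hz boosts the peak by 6 dB;
# a further halving to 150 Hz adds another 6 dB.
print(convergence_gain_db(600, 300))  # 6.0
print(convergence_gain_db(600, 150))  # 12.0
```

This growing, concentrated peak is what makes focal points perceptually salient and is why two close formants are heard as one (Chistovich and Lublinskaya 1979).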
Figure 2 illustrates the spectrograms of the cardinal vowels /i y ɑ ɔ o u/ as
produced by Daniel Jones and imitated by Peter Ladefoged (the sounds are
available at the UCLA internet site). The vowels have converging formants as a
Figure 2. Spectrograms corresponding to six of the cardinal vowels as pronounced
by Daniel Jones (left) and Peter Ladefoged (right) (The sounds can
be found at http://www.phonetics.ucla.edu/course/chapter9/cardinal/
cardinal.html). The two converging formants often form a single peak
visible in the spectrogram: (F3F4) for Jones's /i/, (F2F3) for Ladefoged's
/y/, (F1F2) for Jones's /u/ and Ladefoged's /ɑ, ɔ, u/.
common characteristic: F4 and F3 for /i/, F2 and F3 for /y/, and F1 and F2 for
the back vowels. The vowels are very close to the corresponding six French
vowels, except for Jones's /y/, which does not sound French and which does
not have converging F2 and F3, unlike Ladefoged's /y/. /i y ɑ u/
correspond to the four encircled turning points in the nomogram represented
in Figure 1.
4. Description of the cardinal vowels
based on their acoustico-perceptual quality
The vowels are now described using our notation. In the expression
(FnFn+1)xHz, the parentheses (FnFn+1) indicate that the two formants Fn
and Fn+1 converge; ↑ (↓) in the expression Fn↑ (Fn↓) indicates that the
formant Fn is maximally high (low) in frequency for that formant; ↑ in the
expression (FnFn+1)↑xHz indicates that the whole cluster formed by Fn and
Fn+1 is made as high as possible (and where neither Fn nor Fn+1 represents
a turning point). xHz indicates the approximate location of the spectral
concentration (its exact location depending on characteristics of the speaker,
in particular his/her sex and vocal tract length); an underlined Fn indicates
that the formant is a resonance of the front cavity (and therefore especially
sensitive to lip configuration; such an Fn is also excited by the supraglottic noise
during fricatives and in the frication part of stops). The downward arrow
in Fn↓ indicates that the formant is low.
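This notation is regular enough to be mechanized. The small helper below renders a formant specification as a string in the chapter's format; the encoding (cluster, arrow, frequency, front-cavity flag) is my own assumption about how to represent it, and underlining is shown with a trailing underscore since plain text has no underline.

```python
def render_spec(fa, fb=None, arrow="", hz=None, front_cavity=False):
    """Render a formant specification in the chapter's notation,
    e.g. render_spec(3, 4, arrow="up", hz=3200) -> '(F3F4)↑3200Hz'.
    The encoding conventions here are illustrative assumptions."""
    core = f"(F{fa}F{fb})" if fb is not None else f"F{fa}"
    if front_cavity:          # underlined formant = front-cavity resonance
        core += "_"
    mark = {"up": "↑", "down": "↓", "": ""}[arrow]
    freq = f"{hz}Hz" if hz is not None else ""
    return core + mark + freq

# The four focal cardinal vowels of Table 1:
print(render_spec(3, 4, arrow="up", hz=3200))   # (F3F4)↑3200Hz
print(render_spec(2, 3, hz=1900))               # (F2F3)1900Hz
print(render_spec(1, 2, arrow="down", hz=400))  # (F1F2)↓400Hz
print(render_spec(1, 2, arrow="up", hz=1000))   # (F1F2)↑1000Hz
```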
4.1. Cardinal vowel No. 1: C1 [i] = prepalatal (F3F4)F3↑3200Hz

(F3F4)3200Hz is a point where F3 and F4 converge (circle 1 in Figure 1) at
about 3200 Hz (for a male speaker; higher for a female or a child speaker),
and where F3 represents a local maximum. F3 is a half-wave resonance (no
closed end) of the front cavity, which is made as short as possible to obtain the
Table 1. Correspondence between the focal vowels and four of the cardinal vowels

Place of constriction    Spread lips                   Rounded lips
Back                     (F1F2)↑1000Hz   C5 [ɑ]
Mid                                                    (F1F2)↓400Hz   C8 [u]
Front prepalatal         (F3F4)F3↑3200Hz   C1 [i]      (F2F3)1900Hz   C9 [y]
highest possible F3 value. In that position, neither F1 (a Helmholtz resonance)
nor F2 (a half-wavelength resonance of the back cavity) is independently
controllable. F2 and F3 correspond to two half-wavelength resonances,
the type of resonance that produces the highest resonance frequency. The
(F3F4)F3↑3200Hz vowel corresponds to the cardinal /i/ produced by D. Jones
and P. Ladefoged (see bottom of Figure 2), to French /i/ (Schwartz et al.
1997, Vaissière 2007), and to Swedish /i/ (Fant 1973: 96, 2004: 29). As Jones
stated, cardinal /i/ is the sound in which the raising of the tongue is as far
forward as possible and as high as possible, and the lips are spread. It does
not correspond to midpalatal /i/, often observed in English (see Delattre 1965
for an X-ray study comparing French and English; Gendrot, Adda-Decker,
and Vaissière 2008 for a comparison of statistical data concerning /i/ in a
large number of languages; Willerman and Kuhl 1996 for a perception study
showing differences in the identification of /i/-like stimuli between English and
Swedish listeners). Prepalatal /i/ has a higher F3 and lower F2 than
midpalatal /i/. A vowel with the highest F2 will be represented in our notation as
(F2↑)2500Hz. This (non-cardinal) /i/-like sound could be taken as a reference,
with an F2 maximally high (around 2500 Hz), corresponding to a constriction
at about 11 cm from the glottis, but where F2 is clustered neither with F3 nor
with F4. When F3 and F4 are clustered, the amplitude of F1 and F2 is minimal (as
pointed out above concerning Figure 2), again enhancing the acuteness of
the vowel: the lower F3 in midpalatal /i/ leads to a duller quality than in
prepalatal /i/. In languages using prepalatal /i/, /i/ does not necessarily have
the highest F2 as compared to the other vowels (see an example in Swedish,
Fant 1973: 96, where /e/ has a higher F2 than /i/). Raising the larynx favors
a high F2, by shortening the back cavity, but does not favor a low F1 (a
Helmholtz resonance), because it reduces the volume of the back cavity. It is
thus preferable to advance the tongue root to increase the volume of the back
cavity (to lower F1), while keeping the back cavity short (for a high F2).
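The resonance types invoked here (half-wavelength, quarter-wavelength, Helmholtz) have simple closed forms, so the claims are easy to check numerically. The cavity dimensions below are rough illustrative values, not measurements from this chapter; c is an approximate speed of sound in the vocal tract.

```python
import math

C = 35000.0  # speed of sound in warm, moist air, cm/s (approximate)

def half_wave(length_cm):
    """Lowest resonance of a tube with matching ends: f = c / 2L."""
    return C / (2.0 * length_cm)

def quarter_wave(length_cm):
    """Lowest resonance of a tube closed at one end: f = c / 4L."""
    return C / (4.0 * length_cm)

def helmholtz(neck_area_cm2, neck_len_cm, volume_cm3):
    """Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (l * V))."""
    return (C / (2.0 * math.pi)) * math.sqrt(
        neck_area_cm2 / (neck_len_cm * volume_cm3))

# A short front cavity of ~5.5 cm in half-wave mode lands near the
# 3200 Hz (F3F4) focus of prepalatal [i]:
print(round(half_wave(5.5)))         # 3182

# The same length in quarter-wave mode resonates at half that frequency:
print(round(quarter_wave(5.5)))      # 1591

# A narrow constriction (A = 0.3 cm^2, l = 2 cm) backed by a 30 cm^3
# cavity gives a Helmholtz resonance near the 400 Hz (F1F2) focus of [u]:
print(round(helmholtz(0.3, 2.0, 30.0)))  # 394
```

These single-tube formulas only sketch the idea; the nomogram of Figure 1 comes from a full multi-tube model in which the cavities are acoustically coupled.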
4.2. Cardinal C9 [y] = (F2F3)1900Hz

(F2F3)1900Hz also corresponds to the narrowest passage in the prepalatal
region, where F3 is most sensitive to rounding. For the production of /y/, the
lips are rounded, but only moderately protruded when compared to the rounding
necessary to create the cardinal vowel /u/. The lengthening of the front cavity
allows for an abrupt decrease in F3 frequency. F3 becomes clustered with F2,
creating a spectral peak around 1900 Hz, after F2 has become a resonance
of the front cavity. Further protrusion of the lips would lower F2 and there
would be no clustering with F3, resulting in a vowel quality that would not
sound like /y/. Front (F2F3)1900Hz corresponds neither to Jones's /y/ nor
to Swedish /y/ (where F3 is equidistant from F2 and F4, Fant 1973: 98).
However, it clearly corresponds to the rendition of cardinal vowel /y/ by
Peter Ladefoged and to French /y/. Note that languages contrasting /i/ and
/y/ seem to prefer a prepalatal position for both (Wood 1986), but this is not
true of Swedish (as described by Fant 1973: 94–99); there is lip protrusion
in French and German, but not in the Scandinavian languages (Malmberg
1974: 139). As far as I have observed on spectrograms, German /y/ does not
correspond to (F2F3)1900Hz either: F2 is separated from F3. The notation for
Swedish /y/ is given in Vaissière (2007).
4.3. Cardinal C8 [u]: (F1F2)↓400Hz

(F1F2)↓400Hz corresponds to the narrowest passage in the middle of the vocal
tract (at about 6.5 cm, Fant and Båvegård 1997). The use of Maeda's model
shows that C8 [u] is the vowel with the lowest possible concentration of energy
that a human vocal tract is capable of producing: F1 and F2 correspond to
two Helmholtz resonances, the type of resonance that produces the lowest
resonance frequency. It can thus be represented by (F1F2)↓400Hz. A strong
rounding of the lips allows for a decrease in F2, which reaches its minimum.
F2 of /u/ may be considered mainly as a resonance of the front cavity (Fant
1960: 211). Strictly speaking, the narrowest passage for /u/ is much fronter
than for /o/, in the velar region. (F1F2)↓400Hz corresponds to Jones's cardinal
vowel /u/, to its rendition by Ladefoged, and to the French and Swedish vowel
/u/, but not to the English /u/, which usually has a higher F2. French /u/
does not have the same spectral quality as Spanish /u/, which is close to
French /o/.
4.4. Cardinal C5 [ɑ]: (F1F2)↑1000Hz

(F1F2)↑1000Hz is a sound where F1 is very high and still clustered with
F2, creating a sharp peak around 1000 Hz. F1 is not exactly maximal, and
F2 could be made lower. The location of the constriction corresponds to the
highest clustering (F1F2). Note that a constriction at the root of the tongue
leads to an even higher F1 (see Figure 1), but to a separation of F1 and
F2, since it raises the frequency of F2. A constriction at the root creates an
[a]-like sound (Fant and Båvegård 1997). C5 [ɑ], [a] and [æ] share a high
F1, but strictly speaking, only C5 [ɑ] is a quantal vowel. In the two other
vowels, the first two formants are separated and the vowels do not sound like back
vowels.
4.5. Mid vowel [ɚ] = (F2F3)↓1500Hz

Another vowel, which is not considered cardinal, nonetheless represents an
extreme in terms of F3, which gets as low as 1500 Hz. Three constrictions
are necessary for the production of such a low F3, since there are three points
along the vocal tract where the volume velocity nodes of F3 are located
(Chiba and Kajiyama 1941). The production of (F2F3)↓1500Hz is achieved by
a constriction in the pharyngeal region, lip rounding and a bunching of the
tongue toward a node corresponding to the third resonance.
4.6. Creating the back (F1F2) series /u o ɔ ɑ/

The whole back series, /u o ɔ ɑ/, is characterized by the clustering of the first
two formants, and by the weak intensity of the upper formants. The series can be
synthesized using a single formant at equal intervals in frequency between
/u/ and /ɑ/ (Delattre et al. 1952). To keep the first two formants close together,
the tongue constriction has to move back from /u/ to /ɑ/ synchronously with
the delabialization gesture. X-ray data show that the continuum /u o ɔ ɑ/
corresponds to a backing of the constriction and not to an increase in the
area of the constriction (Wood 1979). When the constriction is in the back
region of the vocal tract, jaw opening has much less effect on the formants
than when it is in the front. Strictly speaking, the vocal tract is as closed
for /ɑ/ as for /u/, but the highest point of the tongue is actually higher for /u/
than for /ɑ/.
4.7. Creating the series C2 /e/, C3 /ɛ/ and C4 /a/

Unlike the vowels described above, these vowels have no regrouped formants.
The constricted part is less narrow (Fant and Båvegård 1997)
than for the focal vowels. Since they are not focal, and do not correspond
to turning points, these vowels are more difficult to define in acoustic
terms.
5. From vowels to glides and to consonants:
Coarticulation processes
5.1. From vowels with a strong constriction to glides
Table 2 illustrates the specification of the glides corresponding to the point
vowels described earlier: /j/, /ɥ/, /ɻ/, /ʕ/ and /w/ have F-patterns similar to /i/, /y/,
/ɚ/, /ɑ/ and /u/, with clustered formants. The formant frequencies of glides
are more extreme: as the constriction is made tighter in the front region of
the vocal tract, F2 gets higher than for the corresponding vowel; when the
constriction is closer to the glottis (as for /ʕ/ and /w/), it gets lower (on the
effect of reducing a constriction, see Fant 1960: 81). When extracted from
the sequences /iji/, /yɥy/, /ɚɻɚ/, /ɑʕɑ/ and /uwu/, the portions corresponding to
/j/, /ɥ/, /ɻ/, /ʕ/ and /w/ are respectively identified as the vowels /i/, /y/, /ɚ/, /ɑ/
and /u/.
The palatal approximant /j/, palatal fricatives, palatalized liquids, and
palatalized allophones of /l/ or /g/ share a low F1 and a high F2, and (F3F4)
are often clustered. Their acoustic similarity is generally not perceived.
However, if the consonantal portion of /lili/ is extracted and presented
to naïve listeners, it is perceived as having a vowel quality close to /i/.
Similarly, if some portion of the glide /j/ is left before /i/, the stimulus is
perceived as [gi] (Vaissière 2006). As will be discussed below, all phonemes
sharing a similar F-pattern will have the same effect on the surrounding
phonemes.
Table 2. F-pattern of the glides in relation to the corresponding point vowels

Vowel   Corresponding glide   Type of clustering   Main effect on the surrounding phonemes
i       j                     High (F3F4)          F1↓ F2↑ F3↑
y       ɥ                     High (F2F3)          F1↓ F2↑ F3↓
ɚ       ɻ                     Low (F2F3)           F3↓
ɑ       ʕ                     High (F1F2)          F1↑ F2↓
u       w                     Low (F1F2)           F1↓ F2↓
5.2. Stop and fricative consonants

The same type of F-pattern as illustrated for vowels and glides applies to
more constricted vocal configurations, such as those for fricatives and stops.
The parameters used to describe the vocal tract and lip configurations
for vowels and glides pertain also to the description of the tongue and
lip configurations for fricatives and stops. Other parameters pertaining
to the length and shape of the constriction may improve the modelling
(Maeda 1996; Fant and Båvegård 1997), but such details are not relevant
for our present purpose. As for vowels and glides, the F-pattern for stops
and fricatives contains about five resonances up to 5 kHz (see for example
Fant 1973: 100–139 for calculations of the F-pattern of stops in CV
syllables).
In contrast to oral vowels, the entire F-pattern is not excited in fricatives
and stops. Simplifying slightly, we can consider that only the resonances due
to the cavity between the constriction and the lips are excited. The effective
length of that cavity depends on lip rounding and protrusion, on the front-
back position of the tongue and on the shape of the constriction. Depending
on the shape of the constriction, for example, the type of resonance may
be of a half-wavelength type (as for /t/) or a quarter-wavelength type (as for
/k/). For the same length of the cavity, half-wavelength resonances are
twice as high as quarter-wavelength resonances. Large compensation
manoeuvers are therefore possible, which are easy to understand. Figure 3
(top) represents the resonances due to the front cavity on the nomogram
illustrated in Figure 1. As the constriction moves from the lips (no formants
excited) to the pharyngeal region, the cavity in front of the constriction tends
to become longer, and as a consequence, lower and lower formants
are excited. The lower formants up to F5 are not excited during the labials,
because there is no front cavity. The formants above F5 are excited for the
anterior consonants (dental and alveolar). F3 is excited in the case of a post-
alveolar constriction and F2 in the case of a pharyngeal constriction; again,
F3 is excited when the constriction is close to the root, e.g. for /n/.
Figure 3 (bottom) illustrates the spectrograms corresponding to /ki/, /ke/,
/ka/, /ko/ and /ku/, where the constriction location of /k/ adjusts from anterior
to posterior due to perceptual requirements. Lower and lower formants are
excited as the constriction moves from very front to velar. Note that the lower
resonance in the case of /ku/ as compared to /ko/, which has a more backed
constriction, is most likely due to rounding: the front cavity is
longer in the case of /ku/ than in the case of /ko/.
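The statement that half-wavelength resonances are twice as high as quarter-wavelength resonances for the same cavity length follows directly from f = c/2L versus f = c/4L, and it is what licenses the compensation manoeuvers just mentioned: a shorter cavity in half-wave mode and a longer one in quarter-wave mode can excite the same frequency region. A sketch with illustrative lengths (the function and values are my own, not the chapter's model):

```python
C = 35000.0  # speed of sound, cm/s (approximate)

def front_cavity_resonance(length_cm, mode):
    """Lowest front-cavity resonance excited by a supraglottal noise source.
    mode='half' suits a /t/-like constriction shape, 'quarter' a /k/-like
    one (a simplified reading of the description above)."""
    return C / ((2.0 if mode == "half" else 4.0) * length_cm)

# Same cavity length, two resonance types: half-wave is exactly twice as high.
L = 4.0
assert front_cavity_resonance(L, "half") == 2 * front_cavity_resonance(L, "quarter")

# Compensation: an 8 cm cavity in half-wave mode and a 4 cm cavity in
# quarter-wave mode excite the same frequency region.
print(front_cavity_resonance(8.0, "half"))     # 2187.5
print(front_cavity_resonance(4.0, "quarter"))  # 2187.5
```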
Depending on the relative size of the constriction area and the glottis
opening, approximants may become fricatives and vice versa. For example,
the realisation of /j/ may be accompanied by noise if the constriction is made
tighter: the higher formants are excited by the noise and the lower formants by
the glottal source. The creation of noise is not favourable to the maintenance
of voicing; the sound may be devoiced and become acoustically a fricative (only the
formants in front of the constriction are excited). Uvular and pharyngeal
fricatives, when voiced, have the characteristics of approximants (Yeou and
Maeda 1995), etc. The tendency for the constriction of a consonant to be
tighter or less tight than expected, and the opening of the glottis, depend on the
prosodic status of the phoneme. Most of the variations make excellent sense
from an acoustic point of view, and the gradient changes can be modelled.
Figure 3. Top: nomogram with the representations of the resonances due to the front
cavity extracted from the complete nomogram shown in Figure 1. Bottom:
spectrograms corresponding to /ki/, /ke/, /ka/, /ko/ and /ku/. See text.
5.3. Coarticulatory processes

The F-pattern for a phoneme is the result of the coarticulation of the tongue
and lip configurations of the surrounding phonemes and those required for
the phoneme itself (Öhman 1966). The direction and extent of coarticulatory
overlapping are language-specific (Manuel 1990) and depend on a number
of factors, such as the duration of the phoneme and its prosodic
status. Coarticulation leads to the neutralisation of certain contrasts in
given contexts, and to possible sound changes.
The effects of coarticulation, and well-described phenomena such as
palatalization or pharyngealization, can be accounted for by similar principles.
The F-patterns of consonants with a secondary articulation, such as palatalized
or pharyngealized consonants, show the same influences relative to the
F-patterns of the consonants with a single place of constriction. Palatalization,
labialisation and retroflexion modify the effective length of the front cavity,
and therefore the frication part of stops and fricatives. Pharyngealization, on
the contrary, causes only a minor change to the resonance patterns in front of the
main constriction, because the secondary place of articulation is at the larynx
and does not have much influence on the shape of the front cavity (Fant 1960:
219). Table 3 summarizes these effects. A fronted position of the tongue due
to the phonological palatalization of consonants (such as in Russian, see Fant
1960: 220–221; Fant 1973: 69), or to surrounding front phonemes, always
has the effect of lowering F1 (as in /tʲatʲ/, where /a/ tends to be perceived
as /ɛ/ if extracted) and of raising the formants due to the front cavity (generally
F2) (as in /tʲutʲ/, where /u/ is centralized). Similarly, a backed position
of the tongue due to phonological pharyngealization (as in Arabic) or to
surrounding back phonemes always has the effect of raising F1 (as in /tˤutˤ/,
where /u/ tends to be perceived as /o/ if extracted) and of lowering the formants
due to the back cavity.
To conclude, Table 4 summarizes the commonalities between vowels
and consonants, and the gestures for achieving extreme (low or high) values
for the first three formants. In short, there is one main gesture for lowering
F1 (fronting of the constriction), two for manipulating F2 (backing and
rounding) and three for F3 (backing, rounding and retroflexion).
6. Conclusion
The articulatory description of phonemes is extremely useful, but sometimes
difficult to achieve in sufficient detail. Vowels defy articulatory description
Table 3. The columns group together the phenomena which have the same raising or lowering effects on the first four formants, due to a similar tongue or lip configuration, the influences of phonetic context, and the effects of secondary constrictions

                              /i/-ness               /ɑ/-ness               /u/-ness                 others
Effect on formants            F1↓ F2↑ F3↑            F1↑ F2↓                F1↓ F2↓                  F3↓ or F4↓
                              closing of open        opening of closed
                              phonemes and fronting  phonemes and backing
                              of back phonemes       of front phonemes
Tongue or lip configuration   fronted                retracted              tongue mid               retroflexed
                                                                            (+ labialisation)
1) Contextual influences      front consonant        back consonant         round consonant          retroflex consonant
                              or vowel (palatal)     or vowel (pharyngeal)  or vowel (labio-velar)   or vowel (retroflex)
2) Secondary constriction     palatalized            pharyngealized         labio-velarized          retroflexed
Processes                     palatalization         pharyngealization      labialization and        F3↓: palatal retroflexion
                                                                            velarization             F4↓: alveolar retroflexion
Table 4. Combined gestures causing one of the first three formants to be maximally high or low. The values in Hz are approximate.

F1, lowest: narrowest constriction at the anterior part, advanced tongue root, lowered larynx, raised velum (larger pharyngeal volume) + small lip opening. (F1↓): /i/ (≈300 Hz). All stop consonants. F1 = Helmholtz resonance.
F1, highest: narrow constriction at the far back part, constricted tongue root + large lip opening. (F1↑): /ɑ/ (≈700 Hz). Uvular and pharyngeal consonants (/ʁ/, /ʕ/). F1 = quarter-wave, back-cavity resonance.
F2, lowest: narrow constriction at the middle (velar) region + lip rounding and protrusion. (F2↓): /u/ (≈700 Hz). Labio-velar consonants. F2 = Helmholtz resonance.
F2, highest: narrow constriction at the mid-palatal region + lip spreading + glottal constriction. (F2↑): mid-palatal /i/ (≈2300 Hz). Mid-palatal consonants. F2 = half-wave, back-cavity resonance.
F3, lowest: narrow constriction at the back (pharyngeal) region + bunching of the tongue, retroflexion + lip rounding and lip protrusion. (F3↓): /ɚ/ (≈1500 Hz). Retroflex consonants. Front-cavity resonance.
F3, highest: narrow constriction at the front (apical and prepalatal) regions + lip spreading + (larynx lowering). (F3↑): prepalatal /i/, /j/ (≈3000 Hz). Pre-palatal consonants. F3 = half-wave, front-cavity resonance.
because they do not have a precise place of constriction. For consonants,
the shape of the tongue may play a role; an acoustic contrast such as plain
vs. flat may be produced by a set of articulatory manoeuvers from different
parts of the vocal tract, which conspire to produce a certain percept for the
consonants.
Phonetic transcription using IPA has proven useful, but in practice
transcription raises fundamental issues, since the choice of IPA symbols
depends in no small part on the transcriber's native language and on the
instructions that they received during their training. In addition to the set of
symbols for vowels and consonants, the IPA proposes diacritics to transcribe
some differences between similar sounds; however, in order to describe fine
differences such as those between the vowels transcribed as /u/ in French,
English and Spanish, there is a clear need for well-established references
for comparison. The cardinal vowels devised by Daniel Jones can be used
as references, but there are disturbing discrepancies between Daniel Jones's
production and the rendering of the same vowels by Peter Ladefoged. I have
shown that some of the cardinal vowels have a clear acoustic definition and
correspond to quantal regions as described by Ken Stevens. They are good
candidates to be defined and used as reference vowels. The quantal vowels as
described in Stevens's Quantal Theory (with converging formants) may not
correspond precisely to the most frequent vowels in the world's languages,
but this does not detract from their usefulness as references in the description
of vowel systems.
The specification of the phoneme in terms of distinctive features does
not always reflect the acoustic-perceptual similarity between sounds:
the (only) back consonant in French is actually acoustically close to the
back vowel /o/; /l/ and /g/ in the /i/ context share acoustic characteristics
with /i/ and /j/ (such as a clustering of (F3F4), visible during /g/ when the
closure is not complete). Their short acoustic distance is not reflected in
their definition (see examples in Vaissière 2007), but it is reflected in sound
changes.
Acoustic description, based on observations of data and on modelling, is
a welcome addition to description in terms of articulation and in terms of
distinctive features. Acoustic description is sometimes done in the literature,
but it is often incomplete. The habit of using only the first two formants to
represent vowels still persists, but it is not entirely justified, at least for front
vowels: the notion of F'2 has been well established, and F3 plays a very
large role in languages such as Swedish and French. The correlation between
front/back constriction and F2, on the one hand, and high/low and F1, on
the other, is overestimated, whereas the role of the lips (in determining F2
of the back vowels) and the relative amplitude of the formants (which plays
a role in contrasting oral and nasal vowels) are often neglected. The lack of
information on F3 and F4 makes it difficult to determine the position of a
vowel relative to the cardinal vowels.
The point vowels as defined here may go beyond the extreme vowels
produced by the speaker in his/her native language, even when his/her
vocalic triangle is maximally stretched. The real-time visualisation of the
formants while he/she tries to utter the point vowels as defined in the present
article is very useful. The vowels in his/her own language may be located
relative to these point vowels. The point vowels may then be used for speaker
normalisation. The use of a common descriptive apparatus for all sounds
brings out the continuity between vowels, glides, fricatives and stops. A solid
grounding in the laws of acoustics is also a promising basis for studies of
coarticulation. Clearly, these constitute challenges for future research.
Acknowledgements
Many thanks to Shinji Maeda and Alexis Michaud for revising the final
version of this paper and to Beth Hume for her corrections!
References
Badin, Pierre, Pascal Perrier, and Louis-Jean Boë
1990 Vocalic nomograms: Acoustic and articulatory considerations upon formant convergences. Journal of the Acoustical Society of America 87: 1290–1300.
Blevins, Juliette
2004 Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.
Boë, Louis-Jean, Pascal Perrier, Bernard Guérin, and Jean-Luc Schwartz
1989 Maximal vowel space. First European Conference on Speech Communication and Technology (EUROSPEECH-1989), 2281–2284.
Carlson, R., Björn Granström, and Gunnar Fant
1970 Some studies concerning perception of isolated vowels. Speech Transmission Laboratory Quarterly Status Report (STL-QPSR 2/3): 19–35.
Chang, Steve S., Madelaine C. Plauché, and John J. Ohala
2001 Markedness and consonant confusion asymmetries. In: Elizabeth Hume and Keith Johnson (eds.), The Role of Speech Perception in Phonology, 79–102. San Diego, CA: Academic Press.
Chiba, T., and M. Kajiyama
1941 The Vowel: Its Nature and Structure. Tokyo: Tokyo-Kaiseikan Publishing Company.
Chistovich, Ludmilla A., and Valentina V. Lublinskaya
1979 The center of gravity effect in vowel spectra and critical distance between the formants: Psychoacoustical study of the perception of vowel-like stimuli. Hearing Research 1: 185–195.
Proposals for a representation of sounds 327
Chomsky, Noam, and Morris Halle
1968 The Sound Pattern of English. New York: Harper & Row.
Clements, G. N., and Samuel Jay Keyser
1983 CV Phonology: A Generative Theory of the Syllable. Cambridge, MA: MIT Press.
Delattre, Pierre
1965 Comparing the vocalic features of English, German, Spanish and French. International Review of Applied Linguistics 2: 71–97.
Delattre, Pierre, Alvin Liberman, Frank Cooper, and Louis J. Gerstman
1952 An experimental study of the acoustic determinants of vowel color. Word 8: 195–210.
Durand, Marguerite
1955 Du rôle de l'auditeur dans la formation des sons du langage. Journal de Psychologie Normale et Pathologique 52: 347–355.
Fant, Gunnar
1970 Reprint. Acoustic Theory of Speech Production, with Calculations Based on X-ray Studies of Russian Articulations. The Hague: Mouton. Original edition, The Hague: Mouton, 1960.
1973 Speech Sounds and Features. Cambridge, MA and London, UK: MIT Press.
2004 Reprint. The relations between area functions and the acoustical signal. In: Gunnar Fant (ed.), Speech Acoustics and Phonetics: Selected Writings, 29–57. Dordrecht: Kluwer. Original, Phonetica 37: 55–86, 1980.
Fant, Gunnar, and Mats Båvegård
1997 Parametric model of VT area functions: Vowels and consonants. Speech, Music and Hearing Quarterly Progress and Status Report (Stockholm) 38(1): 1–20.
Flemming, Edward S.
2002 Auditory Representations in Phonology. New York: Routledge.
Fougeron, Cécile
1999 Prosodically conditioned articulatory variations: A review. UCLA Working Papers in Phonetics 97: 1–74.
Gendrot, Cédric, Martine Adda-Decker, and Jacqueline Vaissière
2008 Les voyelles /i/ et /y/ du français: aspects quantiques et variations formantiques. Proceedings of the Journées d'Étude de la Parole, 2008, Avignon, France.
Greenberg, Joseph H.
2005 Reprint. Language Universals: With Special Reference to Feature Hierarchies. Berlin: Mouton de Gruyter. Original edition, The Hague: Mouton, 1966.
Hardcastle, William J., and Nigel Hewlett
1999 Coarticulation: Theory, Data, and Techniques. Cambridge: Cambridge University Press.
Haudricourt, André
1969 La linguistique panchronique nécessaire à la linguistique comparée, science auxiliaire de la diachronie sociologique et ethnographique. Ethnies 3: 23–26.
Hume, Elizabeth, and Keith Johnson (eds.)
2001 The Role of Speech Perception in Phonology. San Diego, CA: Academic Press.
Jakobson, Roman, Gunnar Fant, and Morris Halle
1967 Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates. Cambridge, MA: MIT Press. Original edition, Cambridge, MA: MIT Press, 1952.
Jones, Daniel
1918 An Outline of English Phonetics. Cambridge: Cambridge University Press.
Kamiyama, Takeki, and Jacqueline Vaissière
2009 Perception and production of French close and close-mid rounded vowels by Japanese-speaking learners. Revue Acquisition et Interaction en Langue Étrangère 2: 9–41.
Keyser, Samuel Jay, and Kenneth N. Stevens
2006 Enhancement and overlap in the speech chain. Language 82(1): 33–63.
Liberman, Alvin M., and Ignatius G. Mattingly
1985 The motor theory of speech perception revised. Cognition 21: 1–36.
Liljencrants, Johan, and Björn Lindblom
1972 Numerical simulation of vowel quality systems: The role of perceptual contrast. Language 48: 839–862.
Lindblom, Björn
1986 Phonetic universals in vowel systems. In: John Ohala and J. J. Jaeger (eds.), Experimental Phonology, 13–44. Orlando, FL: Academic Press.
1983 Economy of speech gestures. In: Peter F. MacNeilage (ed.), The Production of Speech. New York: Springer-Verlag.
1990 Explaining phonetic variation: A sketch of the H&H theory. In: William Hardcastle and Alain Marchal (eds.), Speech Production and Speech Modelling, 403–439. Dordrecht: Kluwer.
Lindblom, Björn, and Johan Sundberg
1969 A quantitative theory of cardinal vowels and the teaching of pronunciation. Speech Transmission Laboratory Quarterly Progress Status Report (Stockholm) 10: 19–25.
Maddieson, Ian
1984 Patterns of Sounds. Cambridge: Cambridge University Press.
Maeda, Shinji
1989 Compensatory articulation in speech: Analysis of X-ray data with an articulatory model. First European Conference on Speech Communication and Technology (EUROSPEECH-1989), 2441–2445.
1996 Phonemes as concatenable units: VCV synthesis using a vocal-tract synthesizer. In: Adrian Simpson and M. Pätzold (eds.), Sound Patterns of Connected Speech: Description, Models and Explanation. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel 31: 127–232.
Malmberg, Bertil
1974 Manuel de phonétique générale. Paris: Éditions A. & J. Picard.
Manuel, Sharon Y.
1990 The role of contrast in limiting vowel-to-vowel coarticulation in different languages. Journal of the Acoustical Society of America 88: 1286–1298.
Ohala, John
1981 The listener as a source of sound change. In: C. S. Masek, R. A. Hendrick, and M. F. Miller (eds.), Papers from the Parasession on Language and Behavior, 178–203. Chicago: Chicago Linguistic Society.
1993 The phonetics of sound change. In: Charles Jones (ed.), Historical Linguistics: Problems and Perspectives, 237–278. London: Longman.
1996 Speech perception is hearing sounds, not tongues. Journal of the Acoustical Society of America 99: 1718–1725.
1997 Aerodynamics of phonology. Proceedings of the 4th Seoul International Conference on Linguistics [SICOL], 92–97.
Öhman, Sven E. G.
1966 Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America 39: 151–168.
Passy, Paul
1890 Étude sur les changements phonétiques et leurs caractères généraux. Paris: Librairie Firmin-Didot.
Saltzman, Elliot L., and Kevin G. Munhall
1989 A dynamical approach to gestural patterning in speech production. Ecological Psychology 1: 333–382.
Schwartz, Jean-Luc, Louis-Jean Boë, Nathalie Vallée, and Christian Abry
1997 The dispersion-focalization theory of vowel systems. Journal of Phonetics 25: 255–286.
Stevens, Kenneth N.
1989 On the quantal nature of speech. Journal of Phonetics 17: 3–45.
2000 Reprint. Acoustic Phonetics. Cambridge, MA and London, UK: MIT Press. Original edition, Cambridge, MA and London, UK: MIT Press, 1998.
Stevens, Kenneth N., and Arthur House
1955 Development of a quantitative description of vowel articulation. Journal of the Acoustical Society of America 27: 401–493.
Stevens, Kenneth, and Sheila Blumstein
1978 Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America 64: 1358–1368.
Vaissière, Jacqueline
2006 Using an articulatory model as an integrated tool for a better understanding of the combined articulatory, acoustic and perceptive aspects of speech. INAE Workshop on Image and Speech Processing, Chennai, India.
2007 Area functions and articulatory modeling as a tool for investigating the articulatory, acoustic and perceptual properties of sounds across languages. In: Maria-Josep Solé, Patricia Beddor, and Manjari Ohala (eds.), Experimental Approaches to Phonology, 54–71. Oxford: Oxford University Press.
2008 On acoustic salience of vowels and consonants predicted from articulatory models. Proceedings of the 8th Phonetic Conference of China and the International Symposium on Phonetic Frontiers, Beijing, China.
2009 Articulatory modeling and the definition of acoustic-perceptual targets for reference vowels. The Chinese Phonetics Journal 2: 22–33.
Vennemann, Theo
1988 Preference Laws for Syllable Structure and the Explanation of Sound Change: With Special Reference to German, Germanic, Italian, and Latin. Berlin/New York/Amsterdam: Mouton de Gruyter.
Willerman, Rachel, and Patricia Kuhl
1996 Cross-language speech perception of front rounded vowels: Swedish, English, and Spanish speakers. In: Tim Bunnell and W. Idsardi (eds.), Proceedings of the International Conference on Spoken Language Processing (ICSLP) 96(1): 442–445.
Wood, Sydney
1986 The acoustical significance of tongue, lip, and larynx maneuvers in rounded palatal vowels. Journal of the Acoustical Society of America 80: 391–401.
1979 A radiographic analysis of constriction locations for vowels. Journal of Phonetics 7: 25–43.
Yeou, Mohamed, and Shinji Maeda
1995 Uvular and pharyngeal consonants are approximants: An acoustic modelling and study. Proceedings of the XIIIth International Congress of Phonetic Sciences, Stockholm, 586–589.
The representation of vowel height and vowel height neutralization in Brazilian Portuguese (Southern Dialects)
W. Leo Wetzels
1. Introduction¹
The phenomenon of vowel neutralization (henceforth VN) is a traditional
topic of interest for phoneticians and phonologists. In the recent literature on
the subject of VN, Brazilian Portuguese (henceforth BP) is recurrently used
as an example of a language that uses a reduced vowel system in unstressed
syllables as compared to the one in stressed syllables. In this paper we will
address the process of neutralization from a broader perspective by studying
the global functionality of the mid vowel contrast in this language. It
will be demonstrated that upper and lower mid vowels display a strong
tendency towards complementary distribution governed by the stressed/
unstressed distinction, such that upper mid vowels are found in unstressed
syllables, whereas lower mid vowels are typical of syllables that carry
primary stress. We will also argue that the phonological distinction between
the BP high, upper mid, lower mid, and low vowels is best represented
in a 4-height system, which will be modeled by way of a multiple [open]
feature, as proposed in Clements (1991). Taking into account the interaction
of VN and other processes of BP phonology, we will furthermore show
that the traditional view of neutralization as the erasure of the neutralized
contrastive features is problematic, and we will formalize VN as a process of feature
substitution.
2. Mid vowel neutralization
In BP, a symmetrical system of seven contrastive oral vowels containing two
series of mid vowels is found in stressed syllables, whereas in unstressed
syllables a symmetrical five vowel system occurs, with only one mid vowel
series². The complete system is illustrated in (1) in word-final stressed vowels,
and the reduced system is exemplified in (2) with word-initial unstressed
syllables.
(1) Seven oral vowels in stressed syllables
abacax[i] pineapple  urub[u] vulture
canjar[e] voodoo ritual  camel[o] street vendor
jacar[ɛ] alligator  igap[ɔ] swampland
maracuj[a] passion fruit
(2) Five oral vowels in unstressed syllables
pi'loto [i] pilot pu'lando [u] jumping
pe'lado [e] naked po'lido [o] polite
pa'lito [a] toothpick
Alternations between upper and lower mid vowels are exemplified in (3), in
which alternating vowels are italicized³:
(3) Alternations between stressed and unstressed mid vowels
Stressed  Unstressed
brejo [ɛ] swamp  bre'jal [e] large swamp
vela [ɛ] sail  ve'leiro [e] sailing ship
ferro [ɛ] iron  fe'rragem [e] ironware
café [ɛ] coffee  cafe'teira [e] coffee pot
maré [ɛ] tide  mare'sia [e] smell of the sea
cérebro [ɛ] brain  cere'bral [e] cerebral
moda [ɔ] fashion  mo'dista [o] modist
colo [ɔ] neck  co'lar [o] necklace
sorte [ɔ] fortune  sor'tudo [o] lucky person
igapó [ɔ] flood land  igapo'zal [o] flood land
pólen [ɔ] pollen  polenífero [o] polliniferous
sócio [ɔ] member  soci'al [o] social
In most Brazilian dialects, word-final unstressed mid vowels are altogether
eliminated and realized as high vowels. As a consequence, in word-final
unstressed syllables only [i, u, a] occur⁴. The examples in (4) illustrate the
productivity of this alternation.
(4) Alternations between mid vowels and unstressed word-final high
vowels
comemos [e] eat-1pl pr ind  come [i] eat-3sg pr ind
abatedor [e] butcher abate [i] slaughter
levezinho [e] very light leve [i] light
camponês [o] countryman  campo [u] field
matorral [o] dense bush mato [u] bush
povoar [o] to populate povo [u] people
Alternations like the ones given in (3) and (4) show the productivity of mid
vowel neutralization in unstressed syllables. However, even in stressed
syllables the functional load of the aperture contrast in mid vowels is very
low, in verbs as well as in non-verbs. Metrical, morphological, or segmental
contexts usually allow one to predict the aperture of the stressed mid vowels,
such that metrical and morphologically-conditioned distributions favor the
lower quality, whereas segmentally-conditioned distributions, which are
mainly assimilatory in nature, yield either upper or lower qualities. The
predictability of the aperture degree for mid vowels can be demonstrated
most easily in verbs, in which stressed mid vowels are generally lower mid,
unless they are subject to a process of Vowel Harmony, as we show below.
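The distributional facts in (1)–(4) amount to a two-step mapping, which can be sketched in code. The encoding below (the function name, its arguments, and the use of IPA symbols as Python strings) is ours, for illustration only; it is not the formalization the chapter goes on to defend:

```python
# Toy model of BP unstressed mid-vowel neutralization, cf. (1)-(4):
# the seven stressed oral vowels reduce to five in unstressed syllables,
# and word-final unstressed mid vowels raise further to high vowels.
STRESSED_INVENTORY = {"i", "e", "ɛ", "a", "ɔ", "o", "u"}  # cf. (1)

def neutralize(vowel, stressed, word_final=False):
    """Surface quality of a vowel given stress and word position."""
    if stressed:
        return vowel                       # full seven-way contrast, cf. (1)
    if vowel in {"ɛ", "e"}:
        return "i" if word_final else "e"  # lower mid merges with upper mid
    if vowel in {"ɔ", "o"}:
        return "u" if word_final else "o"
    return vowel                           # /i, u, a/ are unaffected
```

So, for example, the lower mid of brejo surfaces as [e] in bre'jal, and the final mid vowel of campo surfaces as [u].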
2.1. The quality of stressed mid vowels in verbs
We will adopt the morphological structure of BP regular verbs, as illustrated
in (5) for lavar to wash.
(5) [[lav+a]St +se +mos] lavássemos (that) we washed
[[lav+a]St +va +mos] lavávamos we washed
[[lav+a]St + +o] 'lavo I wash
We follow Câmara (1970: 94) in assuming that the BP verb stem consists
of a root followed by a theme vowel. In turn, the stem is followed by a
TAM morpheme and a Person/Number marker. In (5), +se and +va represent
the imperfect subjunctive and imperfect indicative morphemes respectively,
whereas +mos represents the first person plural marker and +o the first person
singular. The morphological structure of the finite verb is summarized in (6).
(6) Morphological structure of the BP finite verb form
[Root +Theme]Stem +Tense/Mood/Aspect +Person/Number
The structure in (6) is also assumed for the verb forms in which the theme
vowel does not surface, as in lavo I wash, or the forms of the present
subjunctive (see (7)–(8)). However, the theme vowel surfaces in almost all
the verb forms, as can be verified in (7) (the finite forms represent the third
person singular).
(7) lavar inf, lavando ger, lava pres ind, lavava impf ind, lavou (<lava+o)
pret, lavará fut, lave pres subj, lavasse impf subj, lavar fut subj,
lavado pa part, lavara pluperf
Verb stress in BP is on the suffix in future forms and on the theme vowel in forms
marked for past tense, while in the present tenses it is on the root or on the
theme vowel⁵. The only mid vowel that serves as a theme is upper mid /e/,
which always surfaces as such: mov[e]mos we move-pr ind, mov[e]ssemos
we move-impf subj. In some forms of the past or future tenses, stressed /e/
occurs as the syllabic part of the falling diphthong /ej/, a context in which it
is predictably upper mid in BP: fal[ej] I say-pret, morar[ej] I reside-fut.
(8) 1st Conjugation 2nd Conjugation 3rd Conjugation
(a-themes) (e-themes) (i-themes)
morar reside  mover move  servir serve
a. Present Indicative
m[ɔ]ro m[o]'ramos  m[o]vo m[o]'vemos  s[i]rvo s[e]r'vimos
m[ɔ]ras m[o]'rais  m[ɔ]ves m[o]'veis  s[ɛ]rves s[e]r'vis
m[ɔ]ra m[ɔ]ram  m[ɔ]ve m[ɔ]vem  s[ɛ]rve s[ɛ]rvem
b. Present Subjunctive
m[ɔ]re m[o]'remos  m[o]va m[o]'vamos  s[i]rva s[i]r'vamos
m[ɔ]res m[o]'reis  m[o]vas m[o]'vais  s[i]rvas s[i]r'vais
m[ɔ]re m[ɔ]rem  m[o]va m[o]vam  s[i]rva s[i]rvam
c. Imperative
m[ɔ]ra m[o]'rai  m[ɔ]ve m[o]'vei  s[ɛ]rve s[e]r'vi
In the verb forms provided above we find three types of mid vowels:
(i) Stressed lower mid vowels [ɛ, ɔ], as in m[ɔ]ve he moves-pr ind,
s[ɛ]rve he serves-pr ind
(ii) Stressed upper mid vowels, as in m[o]vo I move-pr ind, m[o]va
move-1/3sg pr sub
(iii) Unstressed upper mid vowels, as in m[o]ramos we reside-pr ind,
s[e]rvimos we serve-pr ind.
Since [ɛ, ɔ] can only contrast with [e, o] in stressed syllables, the exclusive
occurrence of [e, o] in unstressed roots is expected. We will therefore
concentrate on the stressed mid vowels. The forms containing stressed [ɛ, ɔ]
satisfy a constraint that requires stressed mid vowels in verb roots to be lower
mid. Where stressed [e] and [o] appear, their upper mid quality results from
Vowel Harmony, which assimilates a mid vowel in the root to the height of
the underlying theme vowel, if the latter is in prevocalic position. The high
and low vowels /i, u, a/ are never affected by harmony. The assimilatory effect
of the theme vowels on root-final mid vowels is schematically presented
in (9)⁶:
(9) RV TV
1st conjugation
E + a > ɛ  /pEg+a+o/ [ˈpɛgu] pego I take (compare pegar to take)
O + a > ɔ  /mOr+a+o/ [ˈmɔru] moro I reside (compare morar to reside)
2nd conjugation
E + e > e  /bEb+e+o/ [ˈbebu] bebo I drink (compare beber to drink)
O + e > o  /mOv+e+o/ [ˈmovu] movo I move (compare mover to move)
3rd conjugation
E + i > i  /sErv+i+o/ [ˈsirvu] sirvo I serve (compare servir to serve)
O + i > u  /dOrm+i+o/ [ˈdurmu] durmo I sleep (compare dormir to sleep)
Let us compare the forms m[o]vo I move-pr ind and m[ɔ]ves you move-pr
ind, which derive from underlying /mOv+e+o/ and /mOv+e+s/, respectively.
In the first of these forms, the theme vowel is in prevocalic position, where
it is deleted, but not without leaving behind its aperture features on the mid
vowel of the root, which therefore surfaces as upper mid. In the second form,
the theme vowel, which is not in prevocalic position, is not erased and Vowel
Harmony does not apply. Similarly, in the present subjunctive forms the
stressed upper mid vowels derive from assimilation to the underlying theme
vowel, which is deleted in front of the TAM morpheme, as in /mOv+e+a/
→ m[o]va move-1/3sg pr subj. This process of deletion-cum-spreading
was formalized as a stability (of aperture features) under deletion effect in
Wetzels (1995).
Independently of whether VH is accounted for by a phonological
operation as proposed in Wetzels (1995) or by some morphology-based
generalization, it is clear that, except for the upper mid quality of the
underlying /e/-theme, the quality of the stressed mid vowels in the BP verb is
completely predictable⁷: stressed mid vowels in verbs are lower mid, except
when targeted by assimilation triggered by the upper mid or high theme
vowel⁸, in which case they surface with the aperture features of the (deleted)
theme vowel.
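The generalization just stated can be summarized as a small decision procedure. The sketch below is our own toy encoding of (9) (the archiphoneme labels E/O and the helper names are ours); it abstracts away from the feature-based stability analysis of Wetzels (1995):

```python
# Stressed root mid vowels in BP verbs, cf. (9): lower mid by default,
# unless the prevocalic theme vowel deletes and spreads its aperture.
LOWER = {"E": "ɛ", "O": "ɔ"}       # default for stressed verb-root mids
HARMONY = {                         # theme vowel -> quality passed to root
    "a": {"E": "ɛ", "O": "ɔ"},     # 1st conjugation: stays lower mid
    "e": {"E": "e", "O": "o"},     # 2nd conjugation: upper mid (movo)
    "i": {"E": "i", "O": "u"},     # 3rd conjugation: high (sirvo, durmo)
}

def root_vowel(mid, theme, theme_prevocalic):
    """Surface quality of an underlying root mid vowel /E/ or /O/."""
    if theme_prevocalic:            # theme deletes, aperture spreads
        return HARMONY[theme][mid]
    return LOWER[mid]               # e.g. m[ɔ]ves, s[ɛ]rve
```

For instance /mOv+e+o/ has a prevocalic theme, so the root surfaces with the theme's upper mid quality, while /mOv+e+s/ does not and surfaces lower mid.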
The question of predicting the quality of stressed mid vowels in non-
verbs is more complex, although the overall picture is similar to the one seen
in verbs. To assess the functionality of the mid vowel contrast in non-verbs,
we need to consider the position of the stressed syllable, syllable weight,
the quality of the nominal theme vowel, and, for stressed vowels in closed
syllables, the nature of the coda consonant.
2.2. The quality of stressed mid vowels in non-verbs
The BP rhyme has maximally two positions filled (except for /s/, which can be
added as a third element). The right-hand rhyme position may be occupied by a
nasal mora /N/, the lateral consonant /l/ (realized as the dorsal glide [w] in most
dialects), the rhotic /R/ (realized as [x], [h], or [ɾ]), the coronal fricative /s/, or
the high vowels /i/ or /u/ (realized as the glides [j] or [w], respectively)⁹. Stress
in BP non-verbs can be final (mulher woman, jacaré alligator), penultimate
(bolo cake, próton proton) or antepenultimate (médico physician,
íngreme steep). Final stress in words that end in a heavy syllable as well as
prefinal stress in words that end in a light syllable are considered productive or
regular in Wetzels (2007); all other stress types are unproductive.
2.2.1. Proparoxytonic stress
BP has a large class of words stressed on the third syllable counting from the
right word edge. Some examples are given in (10).
(10) m[ɛ]dico physician  ab[ɔ]bora pumpkin
h[ɛ]lice propeller  c[ɔ]digo code
c[ɛ]rebro brain  l[ɔ]bulo lobule
r[ɛ]dea bridle  [ɔ]sseo bony
Although exceptions exist in non-derived words, the vast majority of mid
vowels in this position are lower mid. The alternations in (11) show the
productivity of the constraint, called Dactylic Lowering in Wetzels (1992).
To our knowledge, no exceptions to this constraint exist in derived words.
(11) esquel[e]to skeleton  esquel[ɛ]tico skeletal
visig[o]do visigoth  visig[ɔ]tico visigothic
n[o]do'ar to stain  n[ɔ]doa stain
camel[o] street vendor  camel[ɔ]dromo covered market
ásp[e]ro rough  asp[ɛ]rrimo very rough
We formalize the constraint as in (12):
(12) Dactylic Lowering: stressed mid vowels must be lower mid in the
antepenultimate syllable (M represents the class of mid vowels
[−open1, +open2]):
'σ σ σ)ω   Domain: phonological word (non-verbs¹⁰)
 |
 M → [+open3]
The constraint in (12) is modeled within the system of vowel height
representation proposed by Clements (1991) for the Romance languages,
given in (13), and requires the class of mid vowels /e, o/, defined as
[−open1, +open2] in (13), to be realized as [+open3], i.e. as [ɛ] or [ɔ], in the
context defined by (12).
(13) BP vowel system in stressed syllables
aperture:  i/u  e/o  ɛ/ɔ  a
open1:  −  −  −  +
open2:  −  +  +  +
open3:  −  −  +  +
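The way specifications like those in (13) pick out natural classes can be made concrete in a few lines of code. The encoding is ours (feature tuples with True/False standing for +/−, and an illustrative helper name), not part of Clements's or the chapter's formalism:

```python
# The 4-height system of (13) as [open1, open2, open3] feature vectors.
FEATURES = {
    "i": (False, False, False), "u": (False, False, False),
    "e": (False, True, False),  "o": (False, True, False),
    "ɛ": (False, True, True),   "ɔ": (False, True, True),
    "a": (True, True, True),
}

def natural_class(**spec):
    """Vowels matching a partial spec, e.g. open1=False, open2=True."""
    idx = {"open1": 0, "open2": 1, "open3": 2}
    return {v for v, feats in FEATURES.items()
            if all(feats[idx[k]] == want for k, want in spec.items())}
```

On this encoding the mid vowels targeted by Dactylic Lowering in (12) are natural_class(open1=False, open2=True), and adding open3=True yields the lower mid outputs {ɛ, ɔ}.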
2.2.2. Paroxytonic stress
In this section we will study the surface quality of mid vowels in stressed
prefinal syllables. We will first look at words with a heavy final syllable
before turning our attention to paroxytonic words with a final open syllable.
2.2.2.1. Final syllable closed
Another irregular stress type in BP is represented by words that show prefinal
stress despite their final heavy syllable. Again, when the stressed vowel is
a mid vowel, there is no upper/lower mid contrast: the mid vowel
is always realized as lower mid. This constraint, called Spondaic
Lowering in Wetzels (1992), is formalized in (14):
(14) Spondaic Lowering¹¹
'σ σ)ω   Domain: phonological word (non-verbs)
 |  /\
 M → [+open3]
Examples are given in (15):
(15) m[ɔ]vel mobile  C[ɛ]sar Caesar
c[ɔ]dex codex  indel[ɛ]vel unerasable
d[ɔ]lar dollar  est[ɛ]ril sterile
m[ɔ]rmon Mormon  el[ɛ]tron electron
B[ɔ]ston Boston  h[ɛ]rpes herpes
c[ɔ]smos cosmos  w[ɛ]stern western
[ɔ]scar Oscar  poli[ɛ]ster polyester
There are only a small number of exceptions in underived words, e.g. t[e]xtil
textile, p[o]ster poster.
Derivational suffixes that surface as (or with) a heavy rhyme are rare in
BP. The few that exist show the productivity of Spondaic Lowering in
derived contexts, as can be seen in words derived with the learned suffix -on,
illustrated in (16):
(16) bárion (<bar(i)+on)
b[ɔ]son (<Bose+on)
c[ɔ]don (<code+on)
f[ɔ]non (<fono+on)
f[ɔ]ton (<foto+on)
magn[ɛ]ton (<magneto+on)
Exceptions to Spondaic Lowering may be systematic, in which case they are
explained by a competing constraint. For example, the word fonon in (16)
is pronounced by many speakers of BP with a stressed upper mid vowel
f[o]non by virtue of a constraint that requires mid vowels to be upper mid
when nasalized. Another systematic class of exceptions to Spondaic Lowering
is found in words in which a prefinal stressed vowel is followed by a vowel-
initial syllable (hiatus), which may be light or heavy: 'M.V(C)#. Consider
the words in (17), which contain a stressed dorsal mid vowel before another
vowel.
(17) cor[o]a crown
lag[o]a lagoon
Alag[o]as state of ~
b[o]er Boer
The mid vowel quality shown by the words in (17) is systematic: when
immediately followed by a heterosyllabic vowel, a prefinal stressed /O/ is
always realized as upper mid [o]. The situation is identical for the coronal
mid vowel /E/, although for /E/ the generalization only holds at the lexical
(as opposed to postlexical) level of representation, as we will demonstrate
next.
If the upper and lower mid vowel distinction were surface contrastive in
hiatus, we would expect all the sequences given in (18) to exist, the starred
ones included:
(18) ['o.V(C)#] lag[o]a lagoon  *['ɔ.V(C)#]
*['e.V(C)#]  *['ɛ.V(C)#]
We have just seen that there is a hiatus constraint in BP that requires dorsal
mid vowels to be upper mid before an immediately following syllabic
vowel, which explains the absence of the sequence *['ɔ.V(C)#]. If we now
search for the coronal mid vowel in hiatus, it turns out that we find neither
*['e.V(C)#] nor *['ɛ.V(C)#]. The sequence [E.V] does exist when /E/ is
unstressed, as in the infinitive arear [are'ax] to scour, derived from the
noun areia [a'reja] sand. As the reader observes, there is an alternation in
the latter words which not only involves the position of the word stress,
but also the absence/presence of a coronal intervocalic glide: [e.a] ~ ['e.ja].
As it turns out, the glide is obligatory when the mid vowel is stressed.
The question therefore arises as to how we must account for the observed
alternation, or, in other words, whether or not the glide is underlying. To
find a possible answer to this question, we must consider the surface patterns
in (19).
(19) *['o.ja#]  ['ɔ.ja#] jib[ɔ]ia boa constrictor
['e.ja#] ar[e]ia sand  ['ɛ.ja#] estr[ɛ]ia première
Consider also the alternations in (20):
(20) estr['ɛ.ja] première ~ estr[e.'a]r to make one's first appearance
ar['e.ja] sand ~ ar[e.'a]r to scour
jib['ɔ.ja] boa constrictor ~ jib[o.'ja]r to sleep off a big meal
Given the contrast between voar [vo'ax] to fly and jiboiar [ʒibo'jax], we
must assume that the coronal glide is underlying after the dorsal mid vowel
in the noun jib[ɔ]ia and in the verb jiboiar, derived from it. Furthermore, given
the absence of *['o.ja#], we can use the presence/absence of the intervocalic
glide to predict the surface quality of underlying /O/ in words like lagoa and
jibóia: /'O.a/ → ['o.a] and /'O.ja/ → ['ɔ.ja]. Of course, the upper mid quality of
the unstressed [o] in the derived infinitive jiboiar follows from unstressed
vowel neutralization.
There are two possible explanations for the surface contrast between
estr[ɛ]ia and ar[e]ia. One possibility is to posit an underlying contrast between
upper and lower mid quality in the context /'M.V(C)#/ and to predict the glide
after a stressed coronal mid vowel. This option implies an underlying pattern
for coronal and dorsal mid vowels that is asymmetrical in two respects:
(21) /'o.a/  /'ɔ.ja/
/'e.a/
/'ɛ.a/
As shown in (21), the proposed account implies that there is an underlying
contrast in hiatus for the coronal mid vowels, but not for the dorsal mid
vowels. Furthermore, coronal mid vowels cannot be followed by a coronal
glide onset underlyingly, while dorsal mid vowels can. Let us next consider
the other option, which posits a fully symmetrical pattern for coronal and
dorsal mid vowels, as in (22), where the intervocalic glide in ar[eja] sand is
put between parentheses to indicate its alleged epenthetic nature.
(22) /'o.a/  */'ɔ.a/  lag[o]a   /'ɔ.ja/  */'o.ja/  jib[ɔ]ia
/'e.a/  */'ɛ.a/  ar[e(j)a]   /'ɛ.ja/  */'e.ja/  estr[ɛ]ia.
The proposal in (22) is based on the assumption that ar[eja] derives from
underlying ar/ea/, whereas estr[ɛja] derives from estr/ɛja/. In this scenario,
the same (lexical) constraints that account for the mid-vowel quality in ['o.a#]
and ['ɔ.ja#] yield (intermediate) ['e.a] and ['ɛ.ja]. A (postlexical) constraint of
Hiatus Avoidance, or Glide Insertion for those who prefer to think in terms
of processes, which would also be necessary under proposal (21), creates
ar[eja] from ar/ea/. We may subsequently generalize the constraints of Hiatus
Raising and PreGlide Lowering¹² to apply to all underlying mid vowels, as
shown in (23b).
(23) lexical  a. Hiatus Raising  b. PreGlide Lowering
'M.V(C)# → [−open3]   'M.jV(C)# → [+open3]
Glide Insertion is formalized in (24a). The price we pay for the symmetrical
system and for the possibility of predicting the mid vowel quality of all mid
vowels in the relevant contexts is the (postlexical) process of Intervocalic
Glide Deletion (24b), which accounts for the absence of the intervocalic glide
in words like estrear.
(24) postlexical  a. GlInsert  b. IntVocGlDel
'E.V# → 'E.jV#   E.jV → E.V (where E is unstressed)
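The division of labour between the lexical constraints in (23) and the postlexical processes in (24) can be made concrete with a toy derivation that applies the lexical rules before the postlexical ones. The encoding below is ours (a form is reduced to a mid-vowel quality, glide presence, stress, and place of the mid vowel); unstressed vowel neutralization is deliberately left out:

```python
# Toy rule ordering for BP stressed mid vowels before V or jV,
# cf. (23) (lexical) and (24) (postlexical).
def derive(mid, glide, stressed, coronal):
    """Return the surface (mid-vowel quality, glide presence)."""
    LOWER = {"e": "ɛ", "o": "ɔ"}
    RAISE = {"ɛ": "e", "ɔ": "o"}
    # lexical level, cf. (23)
    if stressed:
        if glide:
            mid = LOWER.get(mid, mid)   # PreGlide Lowering: 'M.jV -> [+open3]
        else:
            mid = RAISE.get(mid, mid)   # Hiatus Raising: 'M.V -> [-open3]
    # postlexical level, cf. (24)
    if stressed and coronal and not glide:
        glide = True                    # Glide Insertion: 'E.V -> 'E.jV
    elif not stressed and coronal and glide:
        glide = False                   # Intervocalic Glide Deletion
    return mid, glide
```

Because insertion is postlexical, an inserted glide does not trigger PreGlide Lowering, so ar/ea/ surfaces as ar[e.ja] with an upper mid vowel, while underlying estr/ɛja/ keeps its lower mid vowel.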
We conclude that there is no systematic contrast between upper and lower
mid vowels in words with prefinal stress that have a closed final syllable.
The exceptions found are either sporadic (and restricted to underived contexts)
or the result of some competing constraint, such as Nasal-Vowel Raising or
Hiatus Raising.
Let us next turn to the class of words that show prefinal stress but
end in an open syllable. Here, the important factors that help predict the
quality of the stressed mid vowels are the nature of the syllable coda and the
quality of the nominal theme vowel.
2.2.2.2. Final open syllable
As is shown in (25) below, there are two types of words with prefinal stress
that end in a light syllable in BP. In one type the stressed syllable is closed,
in the other it is open. In this paper, we will only discuss
type (25i) in detail.
(25) 'σ.C1V#:
i. 'MC.C1V#
ii. 'M.C1V#
The BP rhyme has maximally two positions filled (except for /s/, which can be
added as a third element). The coda may be occupied by one of the sonorant
consonants of the language, which are the nasal mora /N/, the lateral consonant
/l/ (realized as the dorsal glide [w] in most dialects), the rhotic /R/, or the glides
[j] or [w]. The only possible non-sonorant coda is the coronal fricative /s/.
In syllables closed by a nasal mora, the quality of the mid vowel is
always upper mid, as a consequence of the Nasal-Vowel Raising constraint
mentioned earlier (see also note 2).
(26) 'MN.C1V#  obligatory nasalization: nasal mid vowels are upper mid
In prefinal syllables closed by /l/ (in most dialects pronounced as a dorsal
glide [w]), the relevant factor that interferes with the quality of the mid
vowel appears to be the quality of the theme vowel. Nominal theme vowels
are either /e/ (in most dialects pronounced as [i]), /o/ (in most dialects
pronounced as [u]), or /a/.
(27a) 'Ml.C₁e#
When the theme vowel is /e/, both the coronal and dorsal mid vowels are
lower mid.
reb[ɛ]lde rebel    m[ɔ]lde mould
p[ɛ]lve pelvis     g[ɔ]lpe blow
G[ɛ]LNE acronym    p[ɔ]lme pulp
(27b) 'Ml.C₁o#
When the theme vowel is /o/, the coronal mid vowel can be both lower and
upper mid (there are only a few words of this type), whereas the dorsal mid
vowel is almost always upper mid.
esb[ɛ]lto slender                 s[o]lto loose
c[ɛ]lso excellent                 b[o]lso pocket
su[e]lto small journal article    s[o]ldo soldier's payment
f[e]ltro felt                     f[o]lgo breath
p[o]lvo octopus
but: [ɔ]lmo elm
(27c) 'Ml.C₁a#
When the theme vowel is /a/, the coronal mid vowel is almost always lower
mid, whereas there is no clear preference for the aperture quality of the dorsal
mid vowel.
b[ɛ]lga Belgian    b[o]lsa purse
ac[ɛ]lga beet      p[o]lpa pulp
c[ɛ]lta Celt       c[o]lza rape(seed)
d[ɛ]lta delta      f[ɔ]lga pause
s[ɛ]lva jungle     v[ɔ]lta return
r[ɛ]lva grass      p[ɔ]lca polka
but: f[e]lpa fur
To conclude, in the context 'Ml.C₁V#, the contrast between upper and
lower mid is only significant when the theme vowel is /-a/ and M is /o/, as
summarized below:
ɛl.C₁e#        ɔl.C₁e#        No contrast for E, O
{ɛ,e}l.C₁o#    ol.C₁o#        Few words with E, no contrast for O
ɛl.C₁a#        {ɔ,o}l.C₁a#    No contrast for E, contrast for O
In the sequences above, the usual quality for the mid vowel is lower mid,
except when the theme vowel is /o/ (phonetic [u]), in which case /O/ is
realized as upper mid (assimilatory raising).
We next turn to paroxytonic words in which the stressed syllable is closed
by /R/. When the theme vowel is /e/, the coronal and dorsal mid vowels are
almost always lower mid.
(28a) 'Mr.C₁e#
imb[ɛ]rbe beardless    c[ɔ]rte cut
in[ɛ]rte inert         m[ɔ]rte death
v[ɛ]rve verve          s[ɔ]rte luck
v[ɛ]rme worm           sup[ɔ]rte holder
but: v[e]rde green     n[ɔ]rte north
                       but: t[o]rpe vile
                            c[o]rte court
(28b) 'Mr.C₁o#
When the theme vowel is /o/, the coronal mid vowel tends to be lower mid
(with a considerable number of exceptions), whereas the dorsal mid vowel is
almost always upper mid.
adv[ɛ]rso adverse          c[o]rpo body
c[ɛ]rto certain            f[o]rno oven
mod[ɛ]rno modern           h[o]rto little garden
ext[ɛ]rno external         ret[o]rno return
v[ɛ]rbo verb               g[o]rdo fat
alt[ɛ]rno alternate(d)     ad[o]rno ornament
but: f[e]rvo conflict      but: ac[ɔ]rdo musical instrument
     c[e]rdo hog
(28c) 'Mr.C₁a#
When the theme vowel is /a/, there is a strong tendency for both the coronal
and dorsal mid vowels to be lower mid, although there are a handful of
exceptions for both vowels.
[ɛ]rva herb                  p[ɔ]rta door
p[ɛ]rla pearl                [ɔ]rla rim
cons[ɛ]rva preserve          [ɔ]rca killer whale
p[ɛ]rna leg                  h[ɔ]rta vegetable garden
of[ɛ]rta offer               c[ɔ]rda rope
conv[ɛ]rsa conversation      a[ɔ]rta aorta
but: ac[e]rca near           but: f[o]rça strength
     p[e]rda loss                 s[o]rva tree sp.
The picture we see for 'Mr.C₁V# is very similar to the one we have seen for
'Ml.C₁V#. The contrast between upper and lower mid vowels has a very low
functional load in both these contexts.

ɛr.C₁e#    ɔr.C₁e#    No contrast for E, O
ɛr.C₁o#    or.C₁o#    No contrast for E, no contrast for O
ɛr.C₁a#    ɔr.C₁a#    No contrast for E, O
In paroxytonic words with a prefinal syllable closed by /s/ that end with the
theme vowel /e/, both coronal and dorsal mid vowels are lower mid.
(29a) 'Ms.C₁e#
equ[ɛ]stre equestrian    p[ɔ]ste post
sem[ɛ]stre semester      b[ɔ]sque forest
t[ɛ]ste test             p[ɔ]stre dessert
p[ɛ]ste pest             h[ɔ]ste multitude
(29b) 'Ms.C₁o#
When the theme is /o/, there is no clear preference for the coronal mid vowel
to be upper or lower mid, while the dorsal mid vowel is almost exclusively
upper mid (notice that one of the rare exceptions has a variant with a final
heavy syllable, which is regularly lower mid because of Dactylic Lowering).
UN[e]SCO idem             ag[o]sto August
refr[e]sco soft drink     f[o]sco dull
cr[e]spo curly            g[o]sto taste
r[ɛ]sto remainder         m[o]sto must
sequ[ɛ]stro kidnapping    p[o]sto place
g[ɛ]sto gesture           but: c[ɔ]smo(s) cosmos
(29c) 'Ms.C₁a#
When the theme is /a/, the coronal mid vowel is lower mid (with some
exceptions), but there is no clear preference for the lower or upper mid
quality of the dorsal mid vowels.
f[ɛ]sta party         b[ɔ]sta dung
pal[ɛ]stra lecture    c[ɔ]sta coast
p[ɛ]sca fishing       bir[ɔ]sca (type of) child play
t[ɛ]sta forehead      resp[ɔ]sta answer
but: b[e]sta beast    r[o]sca spiral
     v[e]spa wasp     lag[o]sta lobster
     c[e]sta basket   m[o]sca housefly
ɛs.C₁e#        ɔs.C₁e#        No contrast for E, O
{e,ɛ}s.C₁o#    os.C₁o#        Contrast for E, no contrast for O
ɛs.C₁a#        {o,ɔ}s.C₁a#    No contrast for E, contrast for O
When the prefinal syllable is closed by /s/, we find some more instances of
contrasting mid vowel qualities than we have encountered with /N/ or /l/
codas, although, also for 'Ms.C₁V#, the mid vowel quality is predictable
in most contexts. The quality of the theme vowel appears to be the
determining factor in predicting whether the mid vowel surfaces as upper or
lower mid.
Besides the syllable types we have examined above, there are two more
sounds that may close a BP syllable, namely the glides /j, w/. Here, the
theme vowel appears not to interfere with the mid vowel quality, but it is
exclusively the nature of the glides that conditions the surface quality of the
mid vowel, in all cases.
(30) 'Mj.C₁V#
l[ej]te milk     [oj]to eight     rec[ej]ta income
n[oj]te night    p[ej]to chest    l[oj]ra blond-fem
There is no contrast in prefinal syllables closed by /j/. It seems that the glide
has an assimilatory raising effect on the tautosyllabic mid vowel.
ej.C₁e#    oj.C₁e#    No contrast for E, O
ej.C₁o#    oj.C₁o#    No contrast for E, O
ej.C₁a#    oj.C₁a#    No contrast for E, O
(31) ...'Mw.C₁V#
enfit[ew]se emphyteusis    pentat[ew]co pentateuch    d[ew]sa goddess
c[ow]ve cabbage            p[ow]co little             l[ow]ça china
ew.C₁e#    ow.C₁e#    No contrast for E, O
ew.C₁o#    ow.C₁o#    No contrast for E, O
ew.C₁a#    ow.C₁a#    No contrast for E, O
There is no contrast in prefinal syllables closed by /w/. As in the case of
/j/, the labial glide has an assimilatory raising effect on the tautosyllabic
mid vowel, which obligatorily realizes an upper mid quality. However,
one important proviso must be made with regard to those dialects, which
represent the majority, in which coda /l/ is systematically vocalized to [w].
Since in the prefinal syllable coda /l/ never alternates with [w], as happens
with word-final /l/, as in papel [papɛw] paper ~ papelão [papelãw] carton,
the BP speaker has no way to reconstruct /l/ as underlying in this context.
This means that words like selva [sɛwva] jungle, folga [fɔwga] pause, etc.
reintroduce a contrast in prefinal syllables between /ew, ow/ and /ɛw, ɔw/ in
the /l/-vocalizing dialects.
To predict the quality of mid vowels in prefinal open syllables,
'M.C₁V#, a number of factors are involved, such as the quality of the theme
vowel, but also the onset of the final syllable. Space does not allow an
elaborate discussion of this context here, for which we refer the reader to
a forthcoming study (Wetzels, in preparation). In the next section, we turn
briefly to the quality of mid vowels in word-final stressed syllables, where
the facts are more straightforward.
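As a compact summary of the coda-and-theme generalizations surveyed above for 'Ml, 'Mr and 'Ms paroxytones, the usual quality of the stressed mid vowel can be encoded as a small lookup table. The sketch below is our illustrative encoding, not part of the original study; the labels lower/upper/varies paraphrase the tendencies reported for each context.

```python
# Usual aperture of the stressed mid vowel in 'MC.C1V# paroxytones,
# keyed by (coda, theme vowel); values give (coronal E, dorsal O).
# "lower"/"upper" = lower vs. upper mid; "varies" = both qualities attested.
MID_QUALITY = {
    ("l", "e"): ("lower", "lower"),
    ("l", "o"): ("varies", "upper"),
    ("l", "a"): ("lower", "varies"),
    ("r", "e"): ("lower", "lower"),
    ("r", "o"): ("lower", "upper"),
    ("r", "a"): ("lower", "lower"),
    ("s", "e"): ("lower", "lower"),
    ("s", "o"): ("varies", "upper"),
    ("s", "a"): ("lower", "varies"),
}

def predict(coda, theme, place):
    """Usual quality of the stressed mid vowel; place is 'E' or 'O'."""
    e_val, o_val = MID_QUALITY[(coda, theme)]
    return e_val if place == "E" else o_val

# b[E]lga 'Belgian': l-coda, a-theme, coronal -> lower mid
print(predict("l", "a", "E"))  # lower
# c[O]rpo 'body': r-coda, o-theme, dorsal -> upper mid
print(predict("r", "o", "O"))  # upper
```

Glide codas /j, w/ need no entry of their own: there the quality is invariably upper mid, as shown in (30)-(31).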
2.2.3. Word-final stress
Word-final stress is productive in BP when the final syllable is heavy. Stress
on a word-final open syllable represents an exceptional stress pattern and
typically occurs in loans from the indigenous Brazilian languages, mostly
Tupi, and from African languages, French, or English. The quality of mid
vowels in words of this type is unpredictable (for examples, see the words
in (1) above).
Like word-internal closed syllables, word-final syllables may be closed
by /N/, /l/ (> [w]), /R/, /i/ (> [j]), /u/ (> [w]), and /s/. As we have seen, it is a
general rule of BP that the mid vowel contrast is neutralized in nasal(ized)
vowels, which always surface with upper mid qualities. Since nasalization
is obligatory in syllables closed by a nasal mora, word-final syllables of the
type in (32) show no mid vowel contrast.
(32) CMN#  No contrast for E, O
Let us briefly consider the remaining closed syllable types.
(33) ...ɛl#  No contrast for E
     coquet[ɛ]l cocktail
     m[ɛ]l honey
     pap[ɛ]l paper
     ...ɔl#  No contrast for O
     s[ɔ]l sun
     anz[ɔ]l hook
     futeb[ɔ]l football
The only exception is the loan g[o]l goal.
(34) ...ɛr#  No contrast for E
     colh[ɛ]r spoon
     Xavi[ɛ]r Xavier
     praz[ɛ]r pleasure
Exceptions to the normally encountered lower mid quality are nouns derived
from second conjugation infinitives ending in /eR/ [x], like dever duty,
ser being.
...or#  No contrast for O
fl[o]r flower
te[o]r tenor
ulteri[o]r ulterior
In final syllables closed by /R/ the dorsal mid vowel is usually upper mid,
with very few exceptions, such as su[ɔ]r sweat, red[ɔ]r circle, as well as
the bisyllabic comparative adjectives mai[ɔ]r bigger, men[ɔ]r smaller,
melh[ɔ]r better, pi[ɔ]r worse.
(35) ...ej#  No contrast for E
     r[e]i king
     l[e]i law
     fr[e]i friar
     ...ɔj#  Few words ending in [ɔj]
     her[ɔ]i hero             b[o]i ox
     Niter[ɔ]i city of ~      dep[o]is after
There are very few non-verbs that end in stressed [ɔj]. On the other hand, a
couple of irregular second conjugation forms show a third person singular
present indicative form ending in [ɔj]: m[ɔ]i grind, d[ɔ]i feel pain.
(36) ...{e,ɛ}w#  Contrast for E
     pigm[e]u pygmy
     mus[e]u museum
     c[ɛ]u sky
     chap[ɛ]u hat
The possessive pronouns m[e]u mine, s[e]u yours also show the upper
mid variant. Although word-finally the falling diphthong /Ew/# realizes
contrastive mid vowel qualities, there are few words with the lower mid
quality that are part of the average speaker's vocabulary.
...ow#  No contrast for O
Stressed word-final /ow/# is only found in words that are not in use, but for
which the dictionary prescribes the upper mid quality. Maybe a new contrast
is created as a consequence of /l/-vocalization in words that historically end
in /ɔl/# and for which there are no derived words with vowel-initial suffixes
that would allow one to set up /l/ underlyingly in the synchronic phonology. On
the other hand, since words ending in /ow/# may not be part of the common
speaker's vocabulary, it could be argued that there still is no synchronic
contrast, but that /ow/# is realized as [ɔw]#.
The remaining stressed word-final rhyme type is the one closed by /s/
(where /s/ is not the plural morpheme), illustrated below.
(37) ...{e,ɛ}s#  Contrast for E
     atrav[ɛ]s through
     gr[ɛ]s millstone
     m[e]s month
     cort[e]s polite
     ...ɔs#  No contrast for O
     v[ɔ]z voice
     fer[ɔ]z ferocious
     albatr[ɔ]z albatross
Here, the vowel quality appears to be unpredictable for the coronal mid
vowel, while the dorsal mid vowel, with very few exceptions, e.g. arr[o]z
rice, realizes the lower mid quality.
3. Evaluating the neutralization facts
Taken together, the neutralization facts discussed in this paper show that the
contrast between upper and lower mid vowels in BP functions only in a very
small part of the contexts in which it could a priori be contrastive. However,
there are a number of uncertainties regarding the question of how to interpret
some of the observations we have made. The reader may have noticed that in
many contexts the absence of contrast is categorical, whereas in others, there
is only a strong lexical preference for one of the mid vowel values. Since
generalizations may have exceptions, this may not be disturbing.
Also, the fact that quite a few exceptions concern highly frequent words
may in itself explain their exceptional status. The more interesting question is
whether all the different contexts in which stressed mid vowel neutralization
applies must be represented as different generalizations or whether one could
claim that one of the mid vowel values represents the default value. The
distribution of mid vowel qualities in verbs, which are lower mid except
when they are targeted by VH, strongly supports the idea that lower mid is the
default value for stressed mid vowels. Also in non-verbs, the neutralizations
yielding upper mid qualities are the ones that are most clearly conditioned
by segmental properties. For example, where neutralization applies to mid
vowels that are the nucleus of a falling diphthong, /'M{j,w}/, the upper
mid quality appears, suggesting an assimilatory raising effect exercised
by the glides in the syllable coda. Moreover, one sees a systematic upper
mid quality for /O/ in prefinal syllables of non-verbs ending in the theme
vowel /o/, which is in most dialects pronounced as [u]. Hiatus Raising
as in lag[o(w)]a lagoon could be understood as facilitating transitional glide
formation.
If lower mid does indeed represent the default value for stressed mid
vowels, the question arises as to how to interpret the lower mid qualities
in contexts that are (or at least seem to be) segmentally conditioned. For
example, how must one understand the lower mid value in words ending in
ɔs#? An explanation based on assimilation seems unlikely. Dissimilation,
however, may provide the answer. Similarly, the systematic use of the lower
mid quality in stressed prefinal syllables in e-themes cannot be considered
assimilatory, even less so since word-final unstressed /e/ is pronounced [i] in
most dialects. On the other hand, one could posit an assimilatory effect of
the /a/-theme, which may be responsible for the lower mid vowels in stressed
prefinal syllables. These are systematic in the case of /E/ whatever the nature
of the coda consonant and for both /{E,O}/ in the context 'Mr.C₁a#.
Alternatively, one could think that the lower mid quality, wherever it
is attested, represents the default value for stressed mid vowels and that,
consequently, the context-dependent generalizations that are relevant from
the point of view of the synchronic phonological grammar of BP are only
the ones that account for the upper mid qualities. The position one wishes
to take with regard to this issue is certainly interesting from the point of
view of exploring the psychological validity of phonotactic generalizations.
However, irrespective of how one wishes to deal with the systematic
appearance of lower mid vowels in specific contexts, it is clear that the mid
vowel quality that surfaces in contexts that, in any type of analysis, do not
refer to any phonological feature specifications (such as vowel lowering in
verbs and metrically-conditioned lowering in non-verbs), is always lower
mid. We tend to interpret this fact within the context of the cross-linguistic
preference for prominent syllables to be relatively sonorant. The stress-to-
sonorance coupling is especially visible in languages that distribute stress as
a function of the sonority of the syllable nucleus, discussed in Kenstowicz
(1994, 1997) and Crosswhite (2004), among others. In the remainder of this
study we will therefore assume that [e, o] represent the unmarked phonetic
realizations of a neutralized mid vowel contrast in unstressed position,
but that [ɛ, ɔ] represent the unmarked realizations of mid vowel (aperture)
neutralization in stressed syllables.
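This working assumption, that upper mid [e, o] is the unmarked outcome of neutralization in unstressed position while lower mid [ɛ, ɔ] is unmarked under stress, can be sketched as a tiny realization function. The encoding below is ours and purely illustrative:

```python
# Default realizations of the neutralized mid vowel archiphonemes E, O:
# upper mid when unstressed, lower mid when stressed.
DEFAULTS = {
    "E": {"unstressed": "e", "stressed": "ɛ"},
    "O": {"unstressed": "o", "stressed": "ɔ"},
}

def realize(archiphoneme, stressed):
    """Unmarked surface quality of a neutralized mid vowel."""
    return DEFAULTS[archiphoneme]["stressed" if stressed else "unstressed"]

print(realize("E", stressed=True))   # ɛ (default under stress)
print(realize("O", stressed=False))  # o (default in unstressed syllables)
```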
4. The Representation of Vowel Height and Vowel Height Neutralization
In the previous discussion we referred to the contrast between /e, o/ and
/ɛ, ɔ/ as involving a different degree of aperture, instead of, for example, a
different value for the feature [tense] or [ATR]. Our reasons for thinking
that the vowel system of BP is best represented as a system of four degrees
of aperture are the following:
a. In the process of Vowel Harmony in verbs, the theme vowels /i/ and /e/
act as a class: /bEb+e+o/ → b[e]bo I drink-pr ind (compare b[ɛ]be
he drinks-pr ind), /mOv+e+o/ → m[o]vo I move-pr ind (compare
m[ɔ]ve he moves-pr ind), /sErv+i+o/ → s[i]rvo I serve-pr ind
(compare s[ɛ]rve he serves-pr ind), /dOrm+i+o/ → d[u]rmo I sleep-
pr ind (compare d[ɔ]rme he sleeps-pr ind; see note 13). If the default value for
stressed mid vowels is lower mid, the upper mid quality in the stressed
mid vowels targeted by VH in verbs of the second conjugation must
be explained as the consequence of spreading. This means that in a
system in which upper and lower mid vowels are distinguished by,
say, [ATR], the e-theme must be underlyingly specified as [+ATR].
Assuming that assimilation involves either a single feature or a class
node, there must be a class node that dominates [ATR] as well as the
aperture feature, e.g. [low]. In our view this is equivalent to saying
that in BP [ATR] functions as an aperture feature, which is why we
prefer defining the phonological contrast to which it refers as one of
aperture to begin with.
b. There are Brazilian dialects in which lower mid vowels occur in
unstressed syllables. In the same dialects, these lower mid vowels may
assimilate to the aperture degree of the following stressed vowel. Viegas
(2009) observes the following variation in the speech of speakers in
the state of Minas Gerais: [sɛgũndu] ~ [segũndu] ~ [sigũndu] second,
[kɔmida] ~ [komida] ~ [kumida] food. This variation is most straight-
forwardly interpreted as involving different degrees of raising.
c. If we are right in assuming that the preference in BP stressed syllables
for lower mid vowels resulting from neutralization finds its motivation
in the greater sonority of lower mid vowels as compared to upper
mid vowels, and given the cross-linguistic coupling between relative
sonority and relative aperture for peripheral vowels, the difference
between [e, o] and [ɛ, ɔ] is probably one of aperture.
We will therefore assume that the BP vocalic system distinguishes four
degrees of aperture and adopt Clements' (1991) hierarchical model of vowel
height.
In Clements' model, vowel height distinctions are defined as a series of
ordered grades, expressed by the use of integers which occur as indices on
a multiple [open] feature. The contrast between low and non-low vowels is
considered to be the primary aperture distinction, expressed as a contrast
between the [-open₁] and [+open₁] registers, as in (38a).
(38a) Vowel system with two degrees of aperture, also the system used in
BP unstressed word-final open syllables

aperture:    i/u    a
open₁         -     +
The opposition between either of these classes and a unique series of mid
vowels is defined on the [open₂] tier, as a secondary division of the [open₁]
register, as shown in (38b). The primary and secondary [open] tiers together
define the mid vowels as a single class, as opposed to /i, u/ on the one hand,
and /a/ on the other.
(38b) Vowel systems with three degrees of aperture, also the BP
unstressed vowel system

aperture:    i/u    e/o    a
open₁         -      -     +
open₂         -      +     +
It is only on the [open₃] tier that the contrast between upper and lower mid
vowels is introduced.
(38c) Vowel systems with four degrees of aperture, also the BP stressed
vowel system

aperture:    i/u    e/o    ɛ/ɔ    a
open₁         -      -      -     +
open₂         -      +      +     +
open₃         -      -      +     +
As we indicated above tables (38a–c), the three aperture systems provided
are exactly the ones used in BP in different contexts. The idea of representing
vowel height as a uniform phonetic dimension divided into primary height
registers and subregisters not only makes it possible to predict the order
in which aperture classes merge as a result of neutralization, but it also
allows for the description of neutralization domains as positions which
realize a smaller number of associations, and the formalization of height
neutralization as a dissociation of a vowel set from one or more contrasting
[open] tiers. This, in turn, establishes a parallel between the formal operation
of height neutralization in systems such as the one in BP and other types of
neutralization, which in autosegmental phonology are generally described
as dissociation operations (cf. Goldsmith (1987), Mascaró (1985), and
McCarthy (1988), among others). This was also the position taken in Wetzels
(1995), where unstressed vowel neutralization and word-final unstressed
vowel neutralization were formulated as in (39) and (40).
(39) Unstressed Vowel Neutralization
     V            Domain: Phonological word
     |
     =            Condition: V does not carry primary stress
  [open₃]
(40) Word-final Vowel Neutralization
     V(s)]W       Condition: V does not carry primary stress
     |
     =
  [open₂]
The statement in (39) has the effect of neutralizing the contrast between
upper and lower mid vowels in all unstressed syllables, whereas (40), which
accounts for the further neutralization of the contrast between mid and high
vowels, is restricted to word-final unstressed vowels in open syllables or
syllables closed by /s/. However, a closer examination of the phonological
grammar of BP leads to the conclusion that the formalization of VN as
segment despecification is problematic.
The problem originates from the well-known fact that the definition of
segments in terms of their distinctive features depends on the nature and
number of co-existing segments in the system. For example, the phoneme
/e/ in a system that also contains the phoneme /ɛ/ is defined as [-open₁,
+open₂, -open₃] (cf. 38c), whereas in a system without the lower mid vowel,
/e/ is defined as [-open₁, +open₂] (cf. 38b). It is the difference in the feature
definitions for /e/ that generates a problem in a language in which both
systems co-occur and which contains a constraint that involves /e/ in both
the neutralizing and non-neutralizing context. This is because the feature
definition of /e/ in the context of neutralization matches with that of both /e/
and /ɛ/ in the non-neutralizing context and, conversely, the definition of /e/
in the context within which it contrasts with /ɛ/ does not find a match in the
context of neutralization. Consequently, it is not possible to state an across-
the-board generalization that either involves /e/ alone, or that needs /e/ only
as part of its conditioning environment.
To the best of our knowledge, there is no constraint in BP that needs to
refer to /e/ across-the-board. However, at least intuitively, there is little reason
to consider this independent evidence for the formalization of neutralization
as despecification. Indeed, BP does allow us to show that the formalization
of neutralization as a dissociation operation is undesirable. Proof comes from
a process of affrication that must take /i/ in its conditioning environment, but
for which no adequate definition of /i/ can be given. The neutralization of
mid and high vowels word-finally (see (40)) makes it impossible to define /i/
such that its definition matches /i/ and /i/ alone in all the subsystems in which
affrication applies, as we show next.
In BP the coronal stops /t, d/ are realized as affricates before /i/. The
examples in (42) illustrate the application of the affrication process in all of
the subsystems created by vowel neutralization.
(41) /t, d/ → [tʃ, dʒ] / _ i
(42) Before stressed /i/:            tia [tʃi.a] aunt
     [-open₁, -open₂, (-open₃)]      dia [dʒi.a] day

     Before unstressed /i/
     word-internally:                tibetano [tʃibetãnu] Tibetan
     [-open₁, -open₂]                dinheiro [dʒiɲejru] money

     Before unstressed /i/
     word-finally:                   peste [pɛstʃi] plague
     [-open₁]                        verde [verdʒi] green
                                     promete [promɛtʃi] promise-3sg pr ind
                                     cf. promet[e]mos promise-1pl pr ind
As shown by the words in (42), affrication applies to coronal stops before
every /i/, whether it belongs to the stressed, the word-internal unstressed,
or the word-final unstressed domain or whether it is underlying or derived
by neutralization. If we define a different system of aperture contrasts for
each of these domains, we also have a different feature definition for /i/, as
indicated in (42). Observe that none of the feature definitions for /i/ in (42) is
adequate for triggering affrication before every instance of /i/ and not before
any of the other vowels. In line with the discussion above, we propose to
define neutralization as a mechanism by which contrastive feature values are
replaced by their opposite values on the tier on which the contrast is defined.
Constraints (39) and (40) can then be replaced by the simple statements (43)
and (44), while stressed vowel neutralization may be formulated as in (45).
(43) Neutralize [open₃] in unstressed vowels, within the phonological
word
(44) Neutralize [open₂] in word-final unstressed vowels
(45) Neutralize [open₃] in primary stressed vowels
These statements are interpreted such that on the [openₓ] tier on which a
contrast is introduced, the contrasting vowels receive identical feature values
for the [openₓ] feature. Without any further statement in the grammar, we
assume that the unmarked features will be provided automatically, which
is [-open₃] for unstressed mid vowels, and [+open₃] for stressed ones, as
we have argued above. We will furthermore assume that for both stressed
and unstressed vowels the unmarked [open₂] value is [-open₂] for non-low
vowels. Under this interpretation, VN will create the systems (46a–c) for the
different contexts.
(46a) Unstressed vowel neutralization (across the board)

aperture:    i/u    E/O [e/o]    a
open₁         -         -        +
open₂         -         +        +
open₃         -         -        +

(46b) Word-final neutralization

aperture:    i/u    a
open₁         -     +
open₂         -     +
open₃         -     +

(46c) Stressed vowel neutralization

aperture:    i/u    E/O [ɛ/ɔ]    a
open₁         -         -        +
open₂         -         +        +
open₃         -         +        +
In the three systems above, the vowel /i/ is minimally defined as [-open₁,
-open₂, ([-open₃])], which is adequate for yielding the attested affrication
patterns.
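The substitution analysis in (43)-(44) and its payoff for affrication can be made concrete in a few lines. The sketch below is our own illustrative encoding (the feature strings and helper names are not from the chapter): because neutralization replaces the marked value rather than deleting the specification, /i/ keeps the definition [-open₁, -open₂] in every subsystem of (46a-c), and a single statement of /t, d/ affrication covers underlying and derived /i/ alike.

```python
# Stressed (four-height) specifications on [open1][open2][open3], as in (38c).
BASE = {
    "i": "---", "u": "---", "e": "-+-", "o": "-+-",
    "ɛ": "-++", "ɔ": "-++", "a": "+++",
}

def neutralize(feats, tier, unmarked):
    """Feature substitution: set [open_tier] to the unmarked value."""
    i = tier - 1
    return feats[:i] + unmarked + feats[i + 1:]

def vowel_in(v, context):
    feats = BASE[v]
    if context == "unstressed":      # (43): [open3] -> unmarked '-'
        feats = neutralize(feats, 3, "-")
    elif context == "word-final":    # (43) plus (44): [open2] -> '-'
        feats = neutralize(neutralize(feats, 3, "-"), 2, "-")
    return feats

def triggers_affrication(v, context):
    """(41): /t, d/ -> [tʃ, dʒ] before any vowel that is [-open1, -open2]."""
    feats = vowel_in(v, context)
    return feats[0] == "-" and feats[1] == "-"

print(triggers_affrication("i", "stressed"))    # True
print(triggers_affrication("e", "unstressed"))  # False: word-internal /e/
print(triggers_affrication("e", "word-final"))  # True: verde -> [verdʒi]
```

Under the dissociation analysis, by contrast, word-final /i/ would simply lack an [open₂] specification, and no single definition of the trigger would match /i/ in all three subsystems.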
5. Conclusion
In this paper, we have studied the functional load of the mid vowel contrast
between /e, o/ and /ɛ, ɔ/ in BP, a language which is commonly cited as
displaying a seven-vowel system in stressed syllables. We have shown
that the contrast between upper and lower mid vowels is far from being
systematically exploited even in stressed position. The conclusion that BP is
evolving towards a five-vowel system seems justified. Accordingly, at least
in the southern dialects, /e, o/ are typically found in unstressed syllables,
whereas in stressed syllables the default mid vowel quality appears to be
represented by /ɛ, ɔ/, which compete with /e, o/ created by assimilatory
neutralization. In word-final unstressed syllables, only the basic triangle
/i, u, a/ is realized. We have furthermore argued in favor of defining the
contrast between the two mid vowel series as involving different degrees of
aperture and have shown that VN should not be formally expressed as the
dissociation of the vowels from the tier on which the contrast is defined, but
rather as involving a process of feature substitution by which for a given
feature the marked value is replaced by the unmarked value, which may be
context-dependent.
Notes
1. Thanks to Ben Hermans and Rachel Santos for their comments on an earlier draft
of this paper.
2. In BP, nasal vowels do not contrast for upper and lower mid. Nasal mid vowels
always have an upper-mid quality (see Wetzels, 1997).
3. BP orthography uses accent marks in words in which primary accent is considered
exceptional, such as stress on word-final open syllables, proparoxytonic stress,
and prefinal stress in words with a final heavy syllable. The accent marks that
are used distinguish between lower mid vowels (acute accent) and upper mid
vowels (circumflex accent). We follow this praxis when words are given in their
orthographic form.
4. The three-vowel system in (4) is also the one used in unstressed syllables
before word-final /s/, as in the city names Patos or Londres, in common nouns,
such as cosmos cosmos, diabetes diabetes, and in inflected words: médicos
physicians, hélices helixes, velhos old-pl, verdes green-pl, comemos we
eat, etc.
5. See Wetzels 2007, for discussion of BP primary stress.
6. Below and henceforth, we will use the capitals O, E to refer to the classes of
dorsal and coronal mid vowels, respectively. We will use capital M to refer to all
the mid vowels.
7. Exceptional cases of non-harmonic stressed upper mid vowels exist in irregular
verb roots.
8. Since stressed mid vowels in verbs are lower mid by default, we will assume
here that only /i/ and /e/ trigger VH. Notice also that if /a/ is included in the
trigger set for VH in the four-height system used here, some proviso must be
made to avoid /E, O/ being lowered to the degree of /a/.
9. See Wetzels 1997, for discussion of the BP syllable.
10. Observe that Dactylic Lowering does not affect the underlying upper mid quality
in verb forms like com[e]ssemos we ate subj.
11. Dactylic Lowering and Spondaic Lowering could be restated as a single
generalization by way of a binary-branching structure that takes moras as its
terminal elements, as suggested in Wetzels 1992.
12. Although we call this rule PreGlide Lowering for the purpose of the exposition,
the theme vowel /a/ is a necessary part of its conditioning environment. Below,
the role of the theme vowel in predicting the quality of the stressed mid vowels
will be discussed in words with a prefinal closed syllable. The nature of the theme
vowel, among other factors, is equally relevant for the mid-vowel quality in
stressed prefinal open syllables 'M.CV#. Suffice it to mention here that, whatever
the theme vowel quality in words of the type 'M.(j)V(C)# is, the quality of M is
predictable by rule. For a complete discussion, see Wetzels, in preparation.
13. The final vowel in the forms bebe he drinks and move he moves represents
the theme vowel /e/ (compare bebemos we drink and movemos we move),
whereas in the forms serve he serves and dorme he sleeps, the same vowel
represents the theme vowel /i/ (compare servimos we serve and dormimos we
sleep). BP orthography does not distinguish these vowels in the forms of
the 3sg pres ind, among others.
References
Câmara Jr., Joaquim Mattoso
1970 Estrutura da Língua Portuguêsa. Petrópolis: Editora Vozes, second
edition.
Clements, George N.
1991 Vowel Height Assimilation in Bantu Languages. Working Papers of
the Cornell Phonetics Laboratory 5, 37–76.
Crosswhite, Katherine M.
2004 Vowel Reduction. In: Bruce Hayes, Robert Kirchner, and Donca
Steriade (eds.), Phonetically Based Phonology. Cambridge: Cambridge
University Press.
Goldsmith, John
1987 Towards a Theory of Vowel Systems. Chicago Linguistic Society 23:2,
116–133.
Kenstowicz, Michael
1994 Sonority-Driven Stress. Manuscript, MIT. ROA-33.
1997 Quality-Sensitive Stress. Rivista di Linguistica 9, 157–188.
Mascaró, Joan
1985 A Reduction and Spreading Theory of Voicing and Other Sound
Effects. Ms., Universitat Autònoma de Barcelona.
McCarthy, John
1988 Feature Geometry and Dependency: A Review. Phonetica 45, 84–108.
Viegas, Maria do Carmo
2009 Projeto VARFON-Minas: As Vogais Átonas em Minas Gerais - Uma
Interpretação. In: Lee, S. H. et al. (eds.), Anais do II Sisvogais. BH:
UFMG.
Wetzels, W. Leo
1992 Mid Vowel Neutralization in Brazilian Portuguese. In: Wetzels &
Abaurre (eds.), Fonologia do Português, Número Especial dos
Cadernos de Estudos Lingüísticos. IEL/UNICAMP, 19–55.
1995 Mid-Vowel Alternations in the Brazilian Portuguese Verb. Phonology
12: 281–304.
1997 The Lexical Representation of Nasality in Brazilian Portuguese.
Probus 9.2: 203–232.
2007 Primary Word Stress in Brazilian Portuguese and the Weight
Parameter. Journal of Portuguese Linguistics 5/6: 9–58.
In preparation. The Phonology of Brazilian Portuguese.
Index
Abramson, Arthur S., 171, 266, 278
Adamawa, 81
Adank, Patti, 139
Adda, Gilles, 157
Adda-Decker, Martine, xi, 156157, 316
Africa (Subsaharan), 50
Afrikaans, 250
Aghem, 71
Akan-Asante,112, 120121
Allauzen, Alexandre, 157
Alwan, Abeer, 138
Amazon (Northwest), 50
Anderson, Stephen R., 6, 21, 54, 196, 199
Anyaka, 11
Apoussidou, Diana, 215
Arabic
Moroccan, 277, 281, 283
Syrian, 155
Archangeli, Diana, 87, 199, 206
Armstrong, Robert G., 8
Articulator Theory, 292–298, 300–301, 303
articulator, 297
Asia (East and Southeast), 50
aspirate, voiceless, 87
assimilation, 66, 84
nasal, 83
subphonemic, 4
natural, 7
phonetic, 7
phonological, 7
autosegmental phonology, 3, 5
Avery, Peter, 266
Baatonu (Bariba), 8
Babadjou, 71
Babanki, 56, 71
Bachrach, Asaf, 138, 140
Badin, Pierre, 311
Baerman, Matthew, 59
Bakovic, Eric, 197
Bamileke
Dschang, 71, 111
Fe'fe', 63–64
Bandhu, C. M., 279
Bantu, 25, 27, 28, 29, 71–72, 11
Bao, Zhiming, 8, 17, 54, 55, 57, 59, 72
Bariba, 8–10, 14, 69
Bárkányi, Zsuzsanna, 139
Barry, Martin, 155, 167
Bateman, Janet, 51–52
Båvegård, Mats, 311, 317–318, 320
Becker, Michael, 104, 206
Beddor, Patrice, 189
Berber (Tashlhiyt dialect), 271, 273–274, 276–278, 282–283
Bergman, Richard, 13
Best, Catherine T., 170
Bhatia, Tej, 270, 278
Bird, Steven, 111
Black, H. Andrew, 202
Blaho, Sylvia, 88
Blasko, Dawn, 167
Blevins, J uliette, 306
Blumstein, Sheila, 169, 171, 309
Boë, Louis-Jean, 311
Boersma, Paul, 131, 157, 264
Böhm, Tamás, 139
Bölte, Jens, 167
Bombien, Lasse, 275
Boves, L., 146
Bradshaw, Mary, 85, 93–94
Browman, Catherine P., 170–171, 264, 266, 275–276, 308
Brown, Julie M., 103
Brunelle, Marc, 21
Bulgarian, xi, 292, 294, 298300, 303
Burkina Faso, 112, 114
Burmese, 267, 275
Burton, Martha W., 156
Burzio, Luigi, 212
Kimiri (Cahi dialect), 53
Câmara Jr., J. Mattoso, 333
Cameroon, 71
Campbell, Lyle, 83
Cantonese, 15–17, 19, 280–281
Carlson, Rolf, 314
Cassimjee, Farida, 201
Chad, 61
Chang, Steve S., 311
Chapin Ringo, Carol, 281
Chen, Matthew Y., 6, 19, 54, 55, 57, 64,
73, 202
Cheyenne, 281
Cheyne, H. A., 141
Chi, Xuemin, 138–139
Chiba, T., 318
Chichewa, 111
Chimwiini, 52–53
Chinese, 15, 72–73, 267
Chistovich, Ludmilla A., 139, 313
Cho, Taehong, 178, 188–190, 267, 279–280
Chomsky, Noam, 6, 71, 81–83, 87, 137, 229, 250, 264, 266, 293, 308
Christophe, Anne, 153
Ciocca, Valter, 279–280
Clark, Mary M., 57
classes, natural, 7, 59–60, 81, 223, 230, 232, 238, 241, 244, 250, 253, 264
Clements, G. N., vii–ix, xi, 6, 11–12, 21, 57, 69, 81, 83–84, 90, 108–109, 125, 149, 171, 195, 206, 223, 230, 235–238, 255–257, 281, 293–294, 304, 306, 331, 337, 352
coarticulation, 7, 151, 155, 322
coda, 342, 346, 350
Coenen, Else, 167
Cohen, Michael M., 172
Cohn, Abigail, 197
Cole, J ennifer S., 199
Connine, Cynthia, 167
consonants, voiceless, 98
contrast
lexical, 61
voice, 14–15
Cooper, Franck, 318
Corbett, Greville G., 59
Coupez, André, 49
Cranen, B., 146
Crosswhite, Katherine M., 351
Csapó, Tamás Gábor, 139
Dactylic Lowering, 337–338, 357
Dagara, ix, 111–113, 118–120, 130
Dahal, B. M., 279
Dahlmeier, Klaus, 276
Danish, 270
Darcy, Isabelle, 153
Day, 61, 63
Delattre, Pierre, 316, 318
Delmolino, Grace, 215
Demuth, Katherine, 50
Dicanio, Christian, 57
Dida (dialects), 67
Diehl, Randy, 103
Dilley, Laura C., 153
dissimilation, 7, 350
Dixit, Prakash R., 270–271, 279
Dolphyne, Florence A., 112, 120121
Donohue, Mark, 72
Downing, Laura J., 72
Dupoux, Emmanuel, 153
Durand, Jacques, 264
Durand, Marguerite, 311
Dutch, 111112
Van Dyken, Julia, 68
Ek, 281
Eisner, J ason, 197
Elfner, Emily, 216
English, 111, 139, 154, 188, 266–267, 275–277, 281–282, 316–317, 325, 347
Enhancement theory, 310
Ernestus, Mirjam, 122
Fant, Gunnar, 137, 179, 229, 264, 266, 280, 306–309, 311–314, 316–320, 322
features
anterior, 248, 252, 293–298, 302–303
approximant, 241, 244, 251, 256
aspirated, 266
assimilation, 3
back, 139–140, 142–143, 236, 240–241, 253, 256, 293–294, 297–298, 301, 310
bilabial, 85
closed, 87
consonantal, 232, 236, 241, 248–249, 256
continuant, 88, 224, 238, 240, 244, 248–249, 252, 265
coronal, 84–86, 89, 152–153, 232, 235–236, 241, 244–245, 252, 256, 293–296, 302–303
distributed, 244–245, 247, 293, 295
dorsal, 84, 241, 244–245, 249, 251, 293–296, 298, 301–302
extreme, 91–93
front, 140
grave, 309
heightened subglottal pressure, 232, 234–236, 238, 240, 256, 266
high, 100, 102, 240, 248–249, 293, 304, 330
labial, 4, 84–86, 150, 153, 223, 236, 240–241, 244–245, 252, 295
labiodental, 85
lateral, 252–253
low, 232, 293, 351
nasal, 84, 195, 197–199, 201, 203–204, 207, 209–214, 236, 238, 240–241, 248, 251–253
open, 87, 291, 352–355
oral, 89
raised, 90, 92, 100, 102
register, 14, 16
round, 84, 293
segmental, 20
sonorant, 84, 100, 232, 244, 247–248, 252, 256
spread glottis, 176, 265–267, 270, 272, 275, 278, 281–282
subregister, 6
syllabic, 232, 234, 238, 241, 245, 252–253, 256
tense, 187–189, 191, 266, 351
upper, 90, 92
vocalic, 224, 241, 249, 295–296
vocoid, 241, 248, 251, 253, 256
voice, 4, 152, 163, 166, 232, 238, 251, 266
Flemming, Edward S., 137, 151, 250,
264, 311
Foley, James, 87
Ford, Kevin C., viii, 195
Fouché, Pierre, 154
Fougeron, Cécile, 308
Fowler, Carol A., 103, 171
French, 152–155, 172, 188, 316–317, 325, 347
Canadian, 281
Fuchs, Susanne, 276
Fulmer, S. Lee, 209
Gaskell, Gareth, 152153
Gauvain, J ean-Luc, 157
Gban, 60–61
Gendner, Véronique, 157
Gendrot, Cédric, 316
Georgian, 141
German Research Association, 146
German, 138, 140–141, 143, 276, 281, 317
Swabian (dialect), x, 138, 140, 142–144
Gerstman, L. J., 318
Geumann, Anja, 142–144
Girón Higuita, J. M., 55
Giryama, 51
Glide Insertion, 341
glide, 99
dorsal, 342
glottalized, 99
glottal stop, 104
glottal width theory, 268
glottis, 312313
Gokana, 63
Goldrick, Matthew, 169
Goldsmith, John, vii, 49, 71, 195, 202, 270, 353
Goldstein, Louis, 170–171, 264, 266, 275–276, 308
Gopal, H. S., 139
Gow, David W., 151152, 155, 160, 167
Gráczi, Tekla Etelka, 139
Grammont, Maurice, 154
Granström, Björn, 314
Great English Vowel Shift, 73
Greenberg, J oseph H., 306
Gregerson, Kenneth J ., 57
Guenther, F. H., 145, 309
Gur, 112
Gurma, 12
Hale, A., 279
Hale, Mark, 87, 103
Hall, Robert A., 255
Hall, Tracy A., 281
Halle, Morris, vii, 6, 57, 71–72, 81–83, 87, 90, 137, 176, 229–230, 236–238, 250, 256, 264, 266–267, 274, 293, 296–297, 303, 308
Hallé, Pierre A., xi, 152, 156, 166–170
Hamann, Silke, 281
Han, Mieko S., 189
Hans, Stéphane, 176–178, 185–187, 191
Hanson, Helen M., 284
Hardcastle, William J ., 189, 308
Hargus, Sharon, 76, 212
Hari, Maria, 72
Harlow, Ray, 281
Harris, J ohn, 73, 87
Hashimoto-Yue, Anne O., 17–19
Haudricourt, Andr, 306
Hayes, Bruce, 87
Heffner, Roe-Merrill S., 266
Heine, Bernd, 76
Hermans, Ben, 357
van Heuven, Vincent J ., 111
Hewlett, Nigel, 308
Hiller, Markus, 142, 145
Hindi, 270–271, 279
Hirose, Hajime, 190, 270, 274, 277,
281
Hogan, J ohn T., 108, 111
Holzhausen, A., 279
Honda, Kiyoshi, 176–178, 185–187, 189–191, 277
Hong, Kihwan, 190
Hong, Soonhyun, 111
Hoole, Phil, 276
Horrocks, J ulie, 189
House, Arthur, 306, 311
van Hout, Roeland, 139
Howard, Irwin, 196
Hume, Elizabeth, xi, 83–85, 137, 207, 223, 235, 294, 304, 311, 326
Hungarian, 139, 153, 155, 160
Hupa, 280
Hutters, Birgit, 268, 270
Hyman, Larry M., ix, 6, 9, 21, 50, 57, 63–64, 69, 71, 73, 239
Iau, 51, 59, 60
Icelandic, 274–275
Idsardi, William, 266
Igbo, 111
Igede, 10, 13, 60
Ik, 56
Im, Arron M., 151, 155, 160, 167
Imperative Raising, 101
implosives, 98–99
International Phonetic Alphabet, 308, 324–325
Invariance theory, 309
Ivanov, Ivan, 303
Iverson, Gregory K., 188–189, 266–267, 275
Ivory Coast, 66
Iwata, Ray, 281
Jakobson, Roman, 82, 137, 229, 264, 266,
308
Jansen, Wouter, 155–156, 167
Japanese, 111, 227–228
dialects, 104
Jesney, Karen, 201, 215
Jespersen, Otto, 82
Jessen, Michael, 266, 279, 281
Johnson, C. Douglas, 196, 311
Johore Malay, 195–198, 202
Jones, Daniel, 82, 307–308, 314–316, 325
Jun, Sun-Ah, 178, 188–190
Jurgec, Peter, 104, 215
Kabiye, 272–273, 282
Kagaya, Ryohei, 176, 189, 268, 270,
282
Kagisano, Molapisi, 81
Kagwe, 64, 6668
Kajiyama, M., 318
Kam (Shidong), 57
Kanakuru, 72
Keegan, Peter, 281
Keating, Patricia, 266
Kenstowicz, Michael, 196, 202, 266, 270,
351
Kent, Raymond D., 179
Keyser, S. Jay, vii, 138, 146, 264–265, 267, 306, 310
Khatiwada, Rajesh, 281
Khmou, 15
Khoisan, 95
Kikamba, 91–92
Kikuyu, vii–viii
Kim, Chin-Wu, 189, 264, 270, 281
Kim, Hyunsoon, 176–178, 185–191
Kim, Mi-Ryoung, 189
Kim, Soohee, 188
Kimper, Wendell, 201, 215
King, J eanette, 281
Kingston, J ohn, 266267, 274275
Kinyarwanda, 25–27, 29, 32, 35–36, 41, 48
Kinzler, Katherine, 153
Kiparsky, Paul, 202
Kirchner, Robert, 87, 199
Kirundi, 25
Kisseberth, Charles W., 52, 196, 199, 201
Hjelmslev, Louis, 87
Klatt, Dennis, 188
Kluender, Keith, 103
Kom, 56, 71
Kono, 108, 111112
Koopman, Hilda, 6667
Korean, 176, 188–189, 212–213, 268,
277, 282
Cheju, 190
Seoul, 177, 190
Kügler, Frank, 140
Kwa (languages), 111
Ladefoged, Peter, 178, 188–190, 264, 266, 270, 272, 275, 279–280, 282, 307, 314–317, 325
Lahiri, Aditi, 152
Lamel, Lori, 157
Lancia, Leonardo, 172
Laniran, Yetunde O., 12, 21, 109–110, 125
laterals, 292–293
Lee, Ahrong, 188–189
Leggbó, 63–64, 70–71
Levelt, Willem, 169
Levitt, Andrea, 103
Lezgian, 280
Li, Charles N., 50
Liberman, A. M., 308, 318
Liberman, Mark, 109–111, 116
Liljencrants, Johan, 137, 311
Lindblom, Björn, 137, 307–308, 311
Lindsey, Geoffrey, 73, 87
Lisker, Leigh, 266, 278
Löfqvist, Anders, 264, 268, 274–275, 277, 282
Lombardi, Linda, 197, 200, 266
Lublinskaya, Valentina V., 313
Lulich, Steven M., 137–140, 146
Maclagan, Margaret, 281
macrostem, 30, 33, 39
Maddieson, Ian, 59, 89–91, 272, 275, 282, 306
Madsack, Andreas, 141–142
Maeda, Shinji, 176–178, 185–187, 189–191, 307, 311, 320–321, 326
Magen, Harriet S., 171
Magloughlin, Lyra, xi
Maithili, 270, 277, 279
Malmberg, Bertil, 317
Malyska, Nicolas, 138, 140
Mandarin (dialects), 15
Mandé (languages), 111
Mande, 60
Mangold, Max, 140
Mankon, 71
Manuel, Sharon Y., 322
Manyeh, Morie, 108, 111
Maori, 281
Marlo, Mike, 81
Marslen-Wilson, William, 122, 124, 152
Mascaró, Joan, 212, 353
Maspero, Henri, 93
Massaro, Dominic W., 172
Mattingly, Ignatus G., 308
Mazaudon, Martine, 14, 16, 21, 72–73
McCarthy, John J., xi, 13, 83, 86, 197, 201–202, 206–207, 209, 299, 353
McGarr, Nancy S., 282
Mester, Armin, 212
Meunier, Christine, 172
Mexico, 50, 64
Michaud, Alexis, ix, 69, 72, 326
Mielke, Jeff, xi, 20, 228, 235, 256
Mikuteit, Simone, 279
minimal pairs, 65, 140, 166
Mitterer, Holger, 122
Modified Articulator Theory, 292–293, 297–298, 300–302
mora, 27–28, 36, 39, 45, 47, 52, 91, 101, 113–114, 342
Morén, Bruce, 87
morphotonology, 32
Mortensen, David R., 73
Mpiranya, Fidèle, ix, 49
Mullin, Kevin, 215
Munhall, Kevin G., 274, 308
Myers, James, 51
Myers, Scott, 111
Ndjonka, Dieudonne, 81
Nepal, 72
Nepali, 267–268, 279, 281–283
neutralization (between radicals), 42
New Guinea, 50
Newman, Paul, 72
Ngamambo, 57
Ní Chasaide, Ailbhe, 272
Nibert, Holly, 111
Niebuhr, Oliver, 172
Nigeria, 63, 109
Niimi, Seiji, 190
Nitta, Tetsuo, 104
Nolan, Francis, 124
Norton, Russell J ., 202
Nougayrol, Pierre, 61–62
Noyer, Rolf, 209
Ntihirageza, Jeanine, 25
Obligatory Contour Principle, 13
obstruent, voiced, 99
Odden, David, ix, 11, 13, 88, 91, 104, 235
Ohala, J ohn, 306307, 309311
Öhman, Sven E. G., 171, 322
Okeke, Vincent, 111
Onn, Farid M., 195
Optimality Theory (OT), 196–198, 200, 204, 207, 212, 215, 250, 266, 299, 302
O'Shaughnessy, Douglas, 188
Oti-Volta, 112
Otomanguean (tone system), 73
Parallel Structures Model (PSM), 87
Park, Hea Suk, 190, 277
Passy, Paul, 310
Paster, Mary, 64
Pater, Joe, 202, 206, 215
Patin, Cédric, ix, 69
Perkell, Joseph S., 309
Perlmutter, David, 212
Perrier, Pascal, 311
Pétursson, Magnús, 268, 272, 274
Philippson, Gérard, 51
phonetic motivation, 5, 12
phonological theory, 3
Pierrehumbert, Janet, 109–110, 116
Piggott, G. L., 87, 197
Pike, Mary, 6465
pitch, 50, 73, 103, 111, 115, 124–125, 130
change, 56
levels, 56
system, 59
Pitt, Mark A., 153
Plauché, Madelaine C., 311
Portuguese, Brazilian, xi, 331–336, 338, 341–342, 346–347, 352–354, 356
Potts, Christopher, 206
Pratt, Patrick, 206
Prieto, Pilar, 111
Prince, Alan, 196, 199, 201, 206, 209,
215, 250, 299
Pruitt, Kathryn, 202, 215
Pulleyblank, Douglas, 58, 87, 197, 199,
206
Pulleyblank, Edwin G., 14
pulsing, glottal, 149, 155, 157, 170
Quantal Theory of Speech, 265, 309
Radical Substance Free Phonology, 81,
88, 102
Ramus, Franck, 153
Read, Charles, 179
Reetz, Henning, 152, 279
registers, phonation, 15–16
Reiss, Charles, 81, 87, 103
Renou, Louis, 250
Rialland, Annie, ix, 12, 112, 114–115
Rice, Keren, 76, 87, 267
Ridouane, Rachid, 267, 274, 276, 283
Robblee, Karen E., 156
Roberts-Kohno, R. Ruth, 91
Robins, R. H. 199
Romanian, 281
rounding, vocalic, 83
Roynard, Jean-Michel, 21
Rubach, Jerzy, xi, 202
Russian, 141, 298, 303
Sabimana, Firmard, 25, 49
Sagey, Elizabeth, 84, 293
Salmons, Joseph C., 266–267, 275
Saltzman, Elliot L., 171, 308
Sampson, Geoffrey, 8990
Samuels, Bridget, 81, 87, 266, 271, 279
sandhi, 11, 13, 55, 64, 72
Sanskrit, 87, 250
Santos, Rachel, 357
Sapir, J. David, 252
de Saussure, Ferdinand, 82
Sawashima, Masayuki, 277
Scatton, Ernest A., 304
Schachter, Paul, 72
Schourup, Lawrence, 197
Schwenk, Holger, 157
Segui, Juan, 152, 156, 166–169
semitones, 108, 116, 123–124, 131–132
Shih, Chilin, 111
Shoul, Karim, 281
Shua, 95
Shultz, J . Michael, 111
Sibomana, Leonidas, 49
Sievers, Eduard, 82
Silverman, Kim, 103
simplicity, formal, 5
single-operation theory, 84
Singler, J ohn Victor, 60
Siswati, 9394
Skou, 72
Slabakova, Roumyana, 303–304
Slavic language, 298
Slovenian, 104
Smith, Brian, 215
Smits, Roel, 139
Smolensky, Paul, 196, 199, 201, 209, 215,
250, 299
Snider, Keith, 54
Snoeren, Natalie, 152–153, 156, 166–169
Somé, Penou-Achille, ix, 112, 114–115, 124, 133
Sonderegger, Morgan, 138–139
sonorants
glottalized, 98
voice, 98
Southern Min (dialects), 64
Spanish, 111
Spondaic Lowering, 338, 357
Sportiche, Dominique, 6667
spreading, root node, 7
Stahlke, Herbert, 6, 14, 60
Staubs, 206
Steriade, Donca, 87, 200, 275
Stevens, Kenneth N., x, 57, 72, 90, 137–138, 142, 146, 176, 264–267, 274, 280–281, 284, 306–307, 310–311, 313, 325
Stewart, J ohn M., 120
stops, voiceless, 87
Sundberg, J ohan, 307
Sawashima, Masayuki, 190
Swedish, 277, 314, 316–317
Sweet, Henry, 82
syncopation, 30
Syrdal, Ann, 139
systems, Asian prosodic, 14
Takeki, Kamiyama, 309
Tamang, 15, 72
Tang, Katrina Elizabeth, 72
Task Dynamic model, 307
Teifour, Ryad, 155, 167
Tesar, Bruce, 202
Thakali, 72
Thompson, Sandra, 50
Tilkov, Dimitur, 304
Titone, Debra, 167
Toft, Zoë, 155, 167
tone
contour, 3, 62, 64, 68, 72
copy, 3
levels, 20, 8990, 9596
spreading, 3
Traill, Anthony, 72
Tranel, Bernard, 212
Trigo, Loren, 200
Trubetzkoy, N.S., 70, 82, 137
Tsay, J ane, 8, 51
Tsui, Ida Y. H., 279–280
Tulu, 8384
Tupuri, 81, 9697, 99, 102103
Turkish, 141
Tuttle, Siri G., 212
Ultan, Russell, 207
Unified Feature Theory, xi, 85, 257, 292–293, 295–298, 300
Vaissière, Jacqueline, x, 307, 309, 316,
319, 325
variation, contextual, 5
Vaux, Bert, 266–267, 279, 293, 303
Vennemann, Theo, 306
Vietnamese, 15
vocal tract, 138, 141, 306, 311, 315, 317
vowel neutralization, 331
v-ratio, 149, 151, 157–159, 161–162, 164, 166, 168–169
Walker, Rachel, 197
Wang, Shizhen, 138
Wang, William S.-Y., 6, 59, 89–90, 206
Watson, Catherine, 281
Weenink, David, 131
Weitzman, R. S., 189
Welmers, William E., 8–9, 12, 14
Wetzels, W. Leo, xi, 55, 336–338, 347,
353, 357
Whalen, Doug, 103
Whitney, William Dwight, 250
Wilson, Colin, 197–198, 200, 205, 209,
215
Wobe, 60
Wokurek, Wolfgang, 141142
Wolf, Matthew, 202, 210, 214
Wolfe, Andrew, 293, 303
Woo, Nancy, 89–90
Wood, Sydney, 317319
Wulé, 112
Xiamen, 64, 73
tone circle, 6
Xu, Yi, 171
xylophone, 131133
Yadav, Ramawatar, 270
Yala, 8
Yeou, Mohamed, 321
Yip, Moira, 6, 8, 11, 20–21, 54–55, 58, 60
Yip-Pulleyblank model, 90, 99
Yoruba, 12, 109, 113, 118, 125
Yoshioka, Hirohide, 264, 274, 277
Zapotec (Villa Alta Yatzachi dialect), 64,
67–68
Zee, Eric, 281
Zeroual, Chakir, 277, 283
Zheltov, Alexander, 60
Zhenhai, 15
Zwicky, Arnold, 250