
Structures in the Mind


Essays on Language, Music, and Cognition in Honor of
Ray Jackendoff

Ida Toivonen, Piroska Csúri, and Emile van der Zee, editors

The MIT Press


Cambridge, Massachusetts
London, England
© 2015 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or informa-
tion storage and retrieval) without permission in writing from the publisher.

MIT Press books may be purchased at special quantity discounts for business or
sales promotional use. For information, please email special_sales@mitpress.mit.edu.

This book was set in Times by Toppan Best-set Premedia Limited. Printed and
bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Structures in the mind : essays on language, music, and cognition in honor of Ray
Jackendoff / edited by Ida Toivonen, Piroska Csúri, and Emile van der Zee.
pages cm
Includes bibliographical references and index.
ISBN 978-0-262-02942-1 (hardcover : alk. paper) 1. Psycholinguistics.
2. Cognitive science. 3. Neurolinguistics. 4.
Cognition. I. Jackendoff, Ray, 1945- honoree. II. Toivonen, Ida. III.
Csúri, Piroska. IV. Zee, Emile van der.
P37.S846 2015
401.9dc23
2015009287

10 9 8 7 6 5 4 3 2 1
Contents

Acknowledgments ix
Introduction xi
0.1 The Scholar Ray Jackendoff, by Ida Toivonen, Piroska Csúri, and
Emile van der Zee xi
0.2 Some Brief Reflections on Ray Jackendoff, by Paul Bloom xii
0.3 Ray Jackendoff's Scholarship, by Noam Chomsky xii
0.4 The Brilliant Ray of Linguistics, by Adele E. Goldberg xiii
0.5 Meeting Ray Jackendoff, by Georgia M. Green xv
0.6 Ray's Influence on a Young Generative Semanticist, by Frederick J.
Newmeyer xvi
0.7 Ray Jackendoff in the Semantic Pantheon, by Barbara
H. Partee xvii
0.8 The Man Who Made Language a Window into Human Nature, by
Steven Pinker xix
0.9 Ray Jackendoff, Cognitive Scientist, by Thomas Wasow xxiii
0.10 Why Ray Is Special, by Moira Yip xxv
0.11 The Organization of This Volume, by Ida Toivonen, Piroska Csúri,
and Emile van der Zee xxvi

I LINGUISTIC THEORY 1

1 Simpler Syntax and the Mind: Reflections on Syntactic Theory and Cognitive Science 3
Peter W. Culicover

2 What Makes Conceptual Semantics Special? 21
Urpo Nikanne

3 Semantic Coordination without Syntactic Coordinators 41
Daniel Büring and Katharina Hartmann

4 Out of Phase: Form-Meaning Mismatches in the Prepositional Phrase 63
Joost Zwarts

5 The Light Verbs Say and SAY 79
Jane Grimshaw

6 Cognitive Illusions: Non-Promotional Passives and Unspecified Subject Constructions 101
Joan Maling and Catherine O'Connor

7 Agentive Subjects and Semantic Case in Korean 119
Max Soowon Kim

8 Lexical Aspect and Natural Philosophy: How to Untie Them 137
Henk J. Verkuyl

II PSYCHOLINGUISTICS 165

9 An Evolving View of Enriched Semantic Composition 167
María Mercedes Piñango and Edgar B. Zurif

10 Height Matters 187
Barbara Landau and Lila R. Gleitman

11 Accessibility and Linear Order in Phrasal Conjuncts 211
Bhuvana Narasimhan, Cecily Jill Duffield, and Albert Kim

12 Sleeping Beauties 235
Willem J. M. Levelt

III LANGUAGE AND BEYOND 257

13 Evolution of the Speech Code: Higher-Order Symbolism and the Grammatical Big Bang 259
Daniel Silverman

14 Arbitrariness and Iconicity in the Syntax-Semantics Interface: An Evolutionary Perspective 277
Heike Wiese and Eva Wittenberg

15 The Biology and Evolution of Musical Rhythm: An Update 293
W. Tecumseh Fitch

16 Neural Substrates for Linguistic and Musical Abilities: A Neurolinguist's Perspective 325
Yosef Grodzinsky

17 Structure and Ambiguity in a Schumann Song 347
Fred Lerdahl

18 The Friar's Fringe of Consciousness 371
Daniel Dennett

19 Climbing Trees and Seeing Stars: Combinatorial Structure in Comics and Diverse Domains 379
Neil Cohn

Contributors 393
Index 395
Acknowledgments

Many people have helped in the creation of this volume, and we would
like to express our sincere gratitude to all. We wish to especially thank
the authors. It was very exciting to receive and read the chapters, and we
truly believe that Ray as well as all other readers will appreciate their
efforts as much as we do. We also want to thank the authors for providing
comments on each other's initial drafts; the resulting volume bears witness to the ideas, care, and effort they poured into this task. We also thank the scholars who contributed to the introduction of the volume. Their contributions provided insights, depth, and color that we could not have managed on our own. In addition, the chapter authors as well as the authors of the introduction have patiently helped us with general
advice and encouragement. We must especially mention Joan Maling and
Daniel Dennett for support and practical advice.
Our special thanks also go to the external reviewers who have gener-
ously lent their time to the texts included in this volume: Erik Anonby,
Ash Asudeh, Andrew Brook, Liz Coppock, Simon Durrant, Evan
Houldin, Tabish Ismail, Jonah Katz, Kumiko Murasugi, Diane Nelson,
Dan Siddiqi, and Raj Singh. We are impressed by and grateful for the
reviewers' enthusiasm, expertise, care, and general willingness to help.
Many thanks to Paul Melchin for his excellent editorial and formatting
assistance.
We also thank all the people at MIT Press who have worked on this
with us. We especially thank Sarah Courtney, Christopher Eyer, Philip
Laughlin, Mary Reilly, and Marcy Ross.
From behind the curtains, Hildy Dvorak has given us advice and
support from the very beginning, and we are very grateful for all her help.
Thank you all!
Introduction

0.1 The Scholar Ray Jackendoff


Ida Toivonen, Piroska Csúri, and Emile van der Zee

These are exciting times for those who study cognition. Just a few decades
ago, it was commonly assumed to be futile to directly study aspects of the mind such as knowledge of language or consciousness. Today, cogni-
tive science is an established area of study. We are slowly moving from
a collection of disciplines with distinct methodologies and different out-
looks on what is important towards a truly interdisciplinary field. Cogni-
tive scientists are no longer necessarily linguists, psychologists, computer
scientists, etc., with a common interest in the mind. It is now common-
place to identify as a student of a topic or research area in cognition:
language, perception, consciousness, attention, memory, moral reasoning,
or learning. Researchers are cross-listed across traditional departments,
students are supervised by teams of scholars with different areas of spe-
cialization, and ties between universities and industry are flourishing.
The field brims with activity and enthusiasm, and for this we owe a huge
debt to pioneering researchers in cognitive science, scholars who took
the study of the mind seriously and had the audacity to reach out across
disciplines. One such pioneer is Ray Jackendoff. This volume is intended
as a thank you and a tribute to him and his work.
Ray Jackendoff was born in 1945. He studied mathematics at Swarth-
more College and then linguistics at MIT. He received his doctorate from
MIT in 1969. After a short stint at UCLA, he was hired at Brandeis
University, where he taught for 35 years. In 2005, he became the Seth
Merrin Professor of Humanities at Tufts University and co-director (with
Dan Dennett) of the Center for Cognitive Studies.
An introduction to the different areas to which Ray Jackendoff con-
tributed (syntax, morphology, semantics, phonology, musical cognition,
comparative psychology, psycholinguistics, cognitive science, philosophy,
etc.) would either do him injustice or would require a book in itself.
We therefore invited a number of scholars in the area of cognition
to reflect on Ray Jackendoff's scholarly and personal influences, each
individual contribution forming part of an impressionist picture of
what is hard to put into a few words. Please enjoy the different specks
of paint.

0.2 Some Brief Reflections on Ray Jackendoff


Paul Bloom

I first met Ray Jackendoff when I was an undergraduate at McGill; he had come to give a talk and was visiting with John Macnamara, who was
my advisor. I said hello, but was too much in awe of Ray to say much
else. We only became friends a few years later, when I went to MIT for
graduate school, and grew closer when I spent a semester teaching at
Brandeis in 1989. We've kept in touch ever since, and at one point we
co-edited a book (along with my colleague and wife, Karen Wynn) in
honor of John Macnamara, who had died a few years earlier.
I'm still a little bit in awe of Ray Jackendoff. He is a leading figure in
linguistics, of course, but has done significant research in related areas,
such as evolution, social cognition, and music, and has had a profound
influence on cognitive science more generally. Ray's influence reflects
his gifts as a thinker and communicator. He is also an unusually clear
and thoughtful writer, with little patience with obscurity, pomposity, or
appeal to authority.
Ray is an alarmingly sweet person, with a reputation for supporting
students and young scholars. But reticence is not one of his strengths, so I'll end with a warning: Ray is a dangerous man to sit next to at a conference, because he will provide a running commentary on any speaker (expansions of the idea, objections, counter-arguments, jokes) that will inevitably be more interesting than whatever the person on stage is actually saying.

0.3 Ray Jackendoff's Scholarship


Noam Chomsky

The earliest of Ray's major contributions, when he was still a student, contributed to undermining the so-called Standard Theory of language
structure, which incorporated the Katz-Postal thesis that deep structure
determined meaning. Ray's work was instrumental in showing that
surface structure made crucial contributions to semantic interpretation,
helping to lay the basis for the Extended Standard Theory (Y-model)
that prevailed in much subsequent work.
Ray continued to produce many further contributions on anaphora,
theory of phrase structure, and other related topics, weaving them into
an array of original and provocative ideas about the basic design of the
language faculty itself. From an early stage, much of his thinking turned
to conceptual structures, a domain in which he made many fundamental
contributions. These interests led him very naturally, step by step, into
broader realms of cognitive science, developing what he called "mental anatomy" in a richly articulated form, meanwhile extending his inquiries
into spatial and social cognition, the human musical faculty, the much-
contested debates over the nature of consciousness, and in fact just about
every corner of mental life. The result is a fascinating tapestry of ideas,
insights, theoretical constructions, and empirical discoveries over a remarkably broad range. It's a very impressive record, doubtless with much more
to come.

0.4 The Brilliant Ray of Linguistics


Adele E. Goldberg

Ray Jackendoff is officially (as this year's recipient of the prestigious Rumelhart Prize) one of the most influential researchers in all of the
cognitive sciences today. It is impossible not to be impressed by the
clarity, precision, depth of analysis, encyclopedic knowledge, and
extremely broad range of data he brings to every debate. His prolific and
profound writing is an inspiration to linguists, philosophers, and psy-
chologists, and his insights have formed the foundation of hundreds of
dissertations and thousands of papers. Ray has truly delivered on the
field's promise that linguistics should both inform and be informed by
the cognitive sciences.
Jackendoff has been key in the introduction of semantic roles, semantic
decomposition, lexical redundancy rules, and X-bar theory into linguistics.
He has made lasting contributions to the fields of control, anaphora,
binding, ellipsis, idioms, constructions, and just about every other topic
in syntax. He was an early proponent of recognizing the importance of
speaker-based construals as opposed to objective truth conditions, and
has offered one of the most comprehensive representations for seman-
tics available. His formalism neatly captures cross-domain mappings
(metaphors) and serves to explicitly relate semantics and information
structure with syntactic form.
He is always ready to wrangle over a fine detail of an analysis or an
overarching foundational idea, and he is also always ready with a telling
example or counterexample, keeping every paper and discussion lively
and data-oriented. Ray consistently looks beyond the tiny question at
hand to see connections to a much broader array of phenomena. When
he considers argument structure, he does not stop at verbs, but looks at
nouns; in an analysis of recursion, he recalls lessons learned from work
in vision; when he focuses on ellipsis, he brings in arguments involving
whistling (really!).
"A foolish consistency is the hobgoblin of little minds." Emerson
would not have found a little mind in Ray Jackendoff. Ray has evolved
from being Chomsky's right-hand man to being a leading light in the
rising tide of alternative work that allows syntax to be simpler than ever
envisioned in mainstream generative grammar. Ray has been outspoken
in challenging a wide variety of cherished views (e.g., that the lexicon
contains only words, that syntax is binary branching, that there exists a
MOVE operation, that there are VP-shells, that aspectual properties
determine argument structure, that language did not evolve). At the
same time, he has consistently brought new arguments to bear in favor
of certain aspects of the more traditional framework (e.g., the existence
of autonomous syntax and innate abilities that are specific to language).
Because of his careful attention to the data and his theoretical eclecti-
cism, or maybe in spite of it, his contributions have had and continue to
have a strong and lasting impact on the field and are respected by all
sides of the debates.
On a more personal note, Ray has been a leader, a mentor, and a role
model to many of us in the field. I was lucky enough to get to know him
when I was a graduate student back in the '80s. I sent him a small paper
and I remember being surprised that he had found time to read it
and being so pleased by his hand-written, supportive message back.
Now I realize that is the way he is: just a very approachable person who
reads voraciously and makes time for everyone. In the days before PDF
files were commonplace, he would send copies of his new papers to his
many contacts, always with a friendly greeting in his memorable
longhand.
I recognize his continuing generous mentorship in the myriad ways he
continues to encourage young people in the field. He is a regular attendee
at a wide array of conferences, and he can very often be found sitting
with a student or postdoc, sharing comments on talks. When he is not
enthusiastically introducing them, he is privately recommending them to
others.
Ray is also very human. At one meeting, when asked why he had not
cited a particular paper (most of us who have presented recognize the awkwardness of the situation), Ray acknowledged with honesty and
without apology that it is simply impossible to read everything. At the
same time, you can bet he read it the following day.
Perhaps what I love most about Ray is the twinkle that is ever present
in his eye. He embodies the idea that work should be fun. And beyond
being one of the very best syntacticians ever, he is also a devoted family
man, always talking of Hildy, his daughters, and his grandchildren with
the greatest affection.

0.5 Meeting Ray Jackendoff


Georgia M. Green

I first met Ray at a party at the conclusion of one of the San Diego Syntax
Festivals held in La Jolla in maybe 1970. These were informal roundtable
affairs during which Chomsky's students who had left the nest after earning their degrees discussed their own and each other's work. Grad
students were there as observers. Ray and I were somewhat wary of each
other. I had a reputation in Cambridge as the Wicked Witch of the West
after an ill-advised implicature in a review I wrote for Language, and in
the heartland of what came to be known as generative semantics, he was
the face of the lexicalist heresy. I thought dancing might defuse (or
diffuse) the unspoken tension, but the music was not suitable, so speaking
happened, but dancing did not. It was probably twenty or twenty-five
years before we spoke again. By the mid-1990s, our views of the relations
between syntax and the speaker's construal of the world and how it works, and of what a grammar is, had converged to a degree neither of
us could have foreseen, following the disparate paths of interpretive
semantics (Jackendoff 1972, 1983, 1990, 1997) and modern formal phrase
structure grammar (Gazdar, Klein, Pullum, and Sag 1985; Pollard and Sag
1987; Pollard and Sag 1994; Green 2011), which in the head-driven incar-
nations was strongly lexicalist. Our differences seemed trivial and local.
When Ray came back to Illinois a few years ago (he had taught here at
two LSA Linguistic Institutes), we had a lively dinner after his talk, and
I was sorry we had never realized how much our work was going in the
same direction, from the same general principles.

0.6 Ray's Influence on a Young Generative Semanticist


Frederick J. Newmeyer

I spent the academic year 1968–69 writing a University of Illinois Ph.D. dissertation that analyzed English modal auxiliaries, as well as aspectual
verbs such as start, continue, keep, and stop. Given the (in retrospect)
happy fact that practically my entire Illinois committee was on leave at
that time, I organized for myself a guestship at the MIT Research Labo-
ratory of Electronics for the entire year. (Actually, it turned out to be
less than a full year because on Thanksgiving Day my apartment behind
Tech Square burned to the ground and I had to return to Illinois for a
couple of months.) In keeping with my training at Illinois, my dissertation
was pure generative semantics, with each element of meaning endowed
with its own level of sentential embedding. As I recall, I gave the sentence
"John began the book" an analysis with five subordinate clauses.
MIT was not then a place where someone from the outside working
in a scorned framework could feel at home. All of the students that I met
were quite collegial at a personal level, and some came to be very good
friends of mine. But for most, their reaction to my verbal description of
how I was treating the English modals was one of bemused lack of inter-
est. They certainly never asked to see any of my work, and I was too
intimidated to offer them samples of it. As far as they were concerned,
Chomsky, in his lectures of the year before, showed what was wrong with
abstract approaches to syntax and that was it. Ray, however, was differ-
ent. From the first day that we met, he not only took a deep interest in
what I was working on, but, much more importantly, he spent more than a little time trying to convince me that I was on the wrong track.
That involved actually reading part of my dissertation, which totally blew
me away. Ray obviously did not convince me to change direction, since
the final product was still a work of generative semantics. But I am happy
to say that I did come away from our interactions with an appreciation
of the lexicalist model that Ray was arguing for. And more importantly,
I developed a deep understanding of the subtleties of syntactic argumen-
tation that would have been impossible if I had never left Illinois. I can
thank Ray more than anybody else for the extent to which I have become a
synthesizer and a person who can see more than one side to an issue.
There is an amusing postscript to my discussions with Ray in 1968 and
1969. For many years, one of my University of Washington colleagues
was Karen Zagona, who is one of the finest syntacticians in the genera-
tion that followed mine. She too set out to analyze the English modal
auxiliaries, in her case from the viewpoint of the minimalist program. I couldn't refrain from pointing out to her how similar her analysis was to
the one in my dissertation, notwithstanding the vastly higher degree of
sophistication of hers. Grammatical elements that in my thesis were
analyzed as heads of an embedded clause were analyzed in her work as
having their own projection. I realized right away that every argument
that Ray had challenged me with in our discussions of forty years earlier
I was repeating, practically verbatim, in my exchanges with Karen. As I
see it, there could be no better testament to the influence that Ray has
had on my professional career.

0.7 Ray Jackendoff in the Semantic Pantheon


Barbara H. Partee

Ray Jackendoff has been and remains a pioneer in semantics, clearing paths and blazing trails through uncharted and difficult territory, often
as a loner, but fortunately not unsung. He did his Ph.D. under Chomsky,
and had an early and notorious stint as an able Chomsky hatchet-man
in the Generative Semantics–Interpretive Semantics Wars.
More important and of greater lasting influence were his early positive
contributions. Among his early works the first to make a great impression
on me, and which I immediately started teaching in my semantics classes,
was his magnificent and wide-ranging book Semantic Interpretation in
Generative Grammar (Jackendoff 1972). It broke new ground in the
study of focus and presupposition and the role of intonation in focus
constructions, and has remained a foundational work in that area. It
included exciting insights about the semantic role of thematic relations
in the interpretation of control structures in surprising configurations,
offering a solution to puzzling passives like "The problem was tried to be solved" and "Mary was promised to be allowed to leave." And there was
much more, on negation, quantification, anaphora, non-specific noun
phrases. Oh, and beautiful work on the interpretation of adverbs in dif-
ferent syntactic positions. That book is on my list of all-time important
works. I'm not surprised to see that it's far and away #1 among Jackendoff's works as cited on Google Scholar.
And it was soon followed by two memorable articles on very different
topics, but both changing the way one could and should look at really
important issues, and both still very widely cited. The first, "Morphological and semantic regularities in the lexicon" (Jackendoff 1975), was a
major breakthrough in trying to understand the status of what we had
been calling "lexical redundancy rules." The prevailing view, which came
from Chomsky's ideas about evaluation metrics for grammars, was that
the best grammar was the shortest grammar, and theoretical adequacy
was all about finding a theory of grammar for which the grammar picked
by the evaluation metric was indeed the best grammar. This was all in
the service of language acquisition, which was presented schematically
as a matter of choosing among all the possible grammars consistent with
the primary data. And the idea that the evaluation metric would prize
the shortest grammar was undoubtedly influenced by mathematicians' aesthetics in designing axiomatic theories, where it makes sense to search for a non-redundant and minimal set of axioms, or by physicists' search
for some small set of physical laws that explain a wide range of
phenomena.
This paper of Jackendoff's really helped to change that whole attitude;
it was probably one important step on the path from taking physics as
the model science to the more recent ethos of seeing linguistics, and
psychology more generally, as parts of biology. What Jackendoff argued
convincingly is that the lexicon is full of important subregularities, both
in semantics and in morphology, that cannot be reduced to pure redun-
dancy rules because they are not fully productive and the actual forms
that occur are therefore not predictable. Whereas a real redundancy rule
like [+human] → [+animate] means that you do not have to enter the value of the animacy feature for a noun with the [+human] feature, the forms and meanings of nominalizations like discussion, argument, rebuttal show a similar nominalizing semantics ("result of V-ing") associated with three different suffixes, and discussion, congregation, copulation show three different semantic effects ("result of V-ing," "group that Vs," "act or process of V-ing") associated with the same suffix (Jackendoff
1975, 650). He used examples like these, and examples of morphologi-
cally complex forms with no synchronically live base forms (like retri-
bution, fission), to argue that lexical redundancy rules are not productive
derivational rules; one needs full entries in the lexicon. But he did not
rest with his demonstration of the need for full entries. To me the most
interesting part of the paper was his exploration of the question of what
redundancy rules can be good for if they do not let us omit information
from the lexicon. And his answer was an important advance in psycho-
linguistics: he proposed a view on which redundancies make new lexical
items easier to learn, and on which redundancy rules can also be used
creatively in the formation of new lexical items, suggesting that lexicon
and syntax are not so sharply different as had been supposed: the lexicon is not all simply memorized (and larger structures are not all
simply generated). This was all consistent with Chomsky's lexicalism, but
it may be hard to realize now how new and surprising was the idea of
having both redundancy rules and full lexical entries.
The other article I remember most vividly from that period is the one
in which he looked closely at the semantics of case roles and offered
structural-metaphorical extensions of basic relations involving locations,
paths, and motion: "Toward an explanatory semantic representation" (Jackendoff 1976). That's the one where he likens explaining an idea to someone to putting an object into a container, among many other such analogies. That's a good example of some of the insights that come from conceptual semantics that don't have any direct counterpart in formal
semantics.
In all of his work, from the earliest to the most recent, he has been a
champion of the importance of semantics as a part of linguistics proper,
and at the same time has increasingly forged ties with other aspects of
psychology, especially perception. He began as a Chomskyan, and has
kept the mentalistic stance, but has argued against Chomsky's syntactocentric view of linguistics. He agrees with Lakoff on the importance of a
conceptual perspective on semantics, but disagrees with him in many
other ways. He agrees with Fodor on some fundamental issues about the
language of thought, but parts company with him on realism. He appre-
ciates much of what has been done in formal semantics, while arguing
strenuously against classical versions of its foundations. And he takes
pains to note that not all model-theoretic semanticists insist on realistic
models, citing Bach's "Natural language metaphysics" (Bach 1986) as
compatible with the view of basing semantics on models as conceptual-
ized by the language user (Jackendoff 1998). In short, Jackendoff has
been an important and independent thinker, making a myriad of major
substantive contributions while taking on the important foundational
issues of our field. We are all in his debt. And that's even without men-
tioning his musical contributions, which have brightened many occasions,
including the 1974 Linguistic Institute in Amherst, from which those of us
who were there carry happy memories of Ray's beautiful chamber recital.

0.8 The Man Who Made Language a Window into Human Nature
Steven Pinker

When the science of linguistics was revolutionized in the 1960s by the theories of Noam Chomsky, hopes ran high that language would become
a window into human nature. Language is the principal means by which
members of our species share their inner life, so an understanding of
language promised unprecedented insights into the composition of
thought. By bringing a formal rigor into the study of language, which is
out there for all to hear and see, the new linguistics promised to deliver
a comparable rigor into our understanding of the airier and less acces-
sible recesses of the mind. Chomsky argued that beneath the surface of
a spoken sentence lay a deep structure that embodied hitherto unknown
patterns of richness and elegance. And beneath the bewildering Babel
of the world's languages was an even deeper Universal Grammar. UG
offered a genuinely new way to think about the ancient nature-nurture
problem, and since it was a major part of the genetic patrimony of Homo
sapiens, the part that was most distinctive to our species, it promised to
shed light on what it meant to be human. During the heady days of the
Chomskyan revolution, the concepts of generative grammar, deep struc-
ture, and universal grammar were bruited in discussions of other faculties
of mind, from vision and reasoning to music, art, and literature.
But for various reasons, it turned out not to be Chomsky himself who
led us to the promised land of a new understanding of human nature but
his student Ray Jackendoff. More than anyone else, Ray has fulfilled the
hope that the revolution in linguistics would illuminate the human
psyche. His contributions are almost embarrassingly far-reaching.
Ray's 1975 paper "Morphological and Semantic Regularities in the Lexicon" was among the first modern analyses of the interaction between
memory and computation in language. He shone a light on the then-
prevalent assumption that every linguistic regularity should be distilled
out of lexical representations and captured in generative rules, leaving
only a compressed residue of irreducibly arbitrary information to be
stored in the lexicon. By documenting patterns across families of related
verbs that were neither random nor genuinely productive, Ray proposed
that lexical memory is organized by a different species of rule, one that
embodied redundancies without necessarily generalizing them. A decade
later, Alan Prince and I argued that these redundancies bespeak a fun-
damental property of human memory, namely the superposition of pat-
terns, and that this was the key property of cognition captured by
connectionist (neural network) models of language and cognition.
X-Bar Syntax, published in 1977, proposed a universal template for the
phrase structure component of language, eliminating the embarrassing
vagueness in a core component of generative grammar and serving as a
paradigm for what a theory of universal grammar should look like. By
tying the geometry of parse trees to their meanings, Ray also fleshed out
the crucial linkage between syntax and semantics. On a personal note, I
can add that reading X-Bar Syntax as a postdoctoral fellow in 1980 was
a revelation to me. The theory immediately suggested a way in which a
child could work backward from the wording of a parental utterance and
an understanding of its context to the phrase structure rules that gener-
ated it. This could allow children to bootstrap their way into the syntax
of the target language, and it served as the heart of a theory of language
acquisition that I developed in several articles and books.
"Grammar as evidence for conceptual structure," included in the
seminal 1978 collection Linguistic Theory and Psychological Reality, was
yet another revelation. Building on ideas by Jeffrey Gruber, Ray showed
that abstract concepts of motion and location lay at the heart of a vast
array of expressions that were not ostensibly about the physical world
at all. This insight truly made language a window into thought, and
it anticipated vast research enterprises in the decades to come on
analogy, conceptual metaphor, and embodied cognition. In subsequent works, among them Semantics and Cognition (1983), Semantic Structures (1990), and "Parts and Boundaries" (1991), Ray carried out breathtaking analyses of
the cognitive representation of space, time, motion, matter, agency, goals,
causation, and social relationships, perhaps coming closer than anyone
to laying down a spec sheet for the contents of thought.
And there was more to come. In 1997, Ray turned to the question of "How language helps us think," and probed the relation between language and thought in a far deeper way than the trendsetters of the neo-
Whorfian fads of the 2000s. In his 1993 paper with Barbara Landau,
"What and where in spatial cognition," he suggested that it's no coincidence that neuroscientists co-opted interrogative pronouns for the two
major divisions of the visual system: the divisions represent two different
kinds of spatial information encoded respectively in the meanings of
nouns and prepositions. Recent studies of the neurobiological basis of
spatial cognition using neuroimaging techniques that did not exist when
Ray and Barbara wrote their paper have vindicated this ambitious idea.
One of the first and most famous applications of generative linguistics
to other domains was Leonard Bernstein's 1973 lecture series called The Unanswered Question, which loosely applied Chomsky's theories to music. In 1977 Ray published a critical review of this premature effort,
but Ray is never satisfied with just tearing things down. His 1983 book
with Fred Lerdahl, A Generative Theory of Tonal Music, outlined a
sophisticated analysis of the cognitive structures underlying melody and
rhythm and how they overlap with the structures of language, a topic he returned to in his 2006 essay "The Capacity for Music: What Is It, and What's Special about It?" and his 2009 essay "Parallels and Nonparallels between Language and Music." In my view, it remains the richest and
most insightful analysis of the mental representation of music.
As if language, space, thought, and music were not a broad enough
range of topics, Ray turned in 1987 to the problem of consciousness.
Unlike the many cognitive scientists and neuroscientists who use the
topic as an excuse to do bad philosophy, Ray came up with a substantive,
non-obvious, and plausible hypothesis about the contents of conscious-
ness, namely that we are aware of intermediate levels of representation
in the hierarchy from sensation to abstract knowledge. He also contrib-
uted the invaluable concept of the "computational unconscious," the
infrastructure of information processing that makes reasoning and
awareness possible.
This leaves the nature of language itself, and here we have Ray's two
capstone contributions. In The Architecture of the Language Faculty
(1997) and Simpler Syntax (with Peter Culicover, 2005), Ray outlined a
theory of language that (unlike the allegedly minimalist theories of his
mentor) implements Einstein's dictum that everything should be made as simple as possible, but no simpler. Ray's parallel architecture model,
which posits multiple generative components whose outputs contain
variables which are unified by interface rules, embraces both the open-
ended combinatorial power of language and the idiosyncrasies it toler-
ates at every level. Best of all, it harmonizes with Ray's other capstone,
Foundations of Language (2002), which presents nothing less than a
theory of the place of language in nature, integrating grammatical theory,
parsing, acquisition, evolution, and neuroscience.
Ray's stunning record of contributions comes from a happy conglom-
eration of traits: a concentration on the deepest questions about lan-
guage, mind, and human nature; full use of the theoretical precision and
empirical richness made available by modern linguistics; a judicious
level of formalism, which avoids the extremes of woolliness and fussi-
ness; an intuitive feel for the texture of mental phenomena; and a refusal
to be swayed by fads, fashions, ideologies, dogmas, or daringness for its
own sake.
Ray's oeuvre is distinctive for another reason. He blazed a trail into
the center of the mind without the appurtenances and perquisites of
academic power. He could not dine out on the brand-name appeal of his
university; did not preside over a factory of graduate student helpers and
high-tech toys; originated no school of thought or cult of personality;
could not tap a war chest bulging with grant funds. Yet, with little more
than a pencil, a library, and his own ingenuity, Ray has elucidated the
workings of the mind perhaps better than anyone alive today. In an era
of big science and academic celebrityhood, Ray has shown that there is
still a place in scientific life for the solitary scholar and deep thinker.

0.9 Ray Jackendoff, Cognitive Scientist


Thomas Wasow

The last paragraph of Ray's first book (Jackendoff 1972, 386) includes
the following:
If we open up a human being, what do we find inside? The answers have been of the form: "We find a four-chambered heart, a spine, some intestines, and a transformational grammar with two or more syntactic levels." The question of this
section has been: What function do the things we have found serve? Why do they
have the structure they have as opposed to any other?

When I first read this, I was struck by the audacity of comparing the
psychological reality of then-current grammatical theory with the physi-
ological reality of hearts, spines, and intestines. Reflecting on the passage
over forty years later, what impresses me is rather different. The impor-
tant part is not the sentence about physical organs and grammar, but the
two questions that follow. The question about function demands a deeper
level of explanation than has been the norm in generative grammar. The
why-question invites explanations in terms of biological evolution.
It has been a hallmark of Ray's work over the decades to seek expla-
nations of linguistic phenomena in terms of fundamental properties of
human cognition, and to inquire into the origins of those properties. In
doing this, he has connected his own linguistic discoveries with research
in psychology and biology. Perhaps more than any other linguist, he has
worked to integrate linguistics into cognitive science.
Of course, Ray is not alone in this. Indeed, since the late 1950s, Chomsky
has touted linguistics for the insight it can provide into human cognitive
abilities. Chomskys work played a major role in the birth of cognitive
science, combining insights from linguistics, philosophy, and psychology,
and using new tools made available by the development of computers,
to create a new science of the mind. In the 1960s and 70s, he sometimes
referred to linguistics as a branch of psychology; later, to emphasize his
strong claims about the innateness of much linguistic knowledge, he
began to refer to linguistics as a branch of biology. Such claims led to a
great deal of interest in linguistics from scholars in the other branches
of cognitive science and helped nurture the tremendous growth of the
discipline during those decades.
But since the 1970s the mainstream of theoretical linguistics has
become increasingly inward-directed. Chomsky and his closest followers
have done little to find connections with work in other disciplines, or to
make their own work comprehensible to anyone else. Ray's Presidential
Address to the Linguistic Society of America in 2004 noted this develop-
ment with dismay. Ray himself has worked hard to articulate a theory of
language structure that comports well with research in the other cogni-
tive sciences.
Since the 1970s, Ray has produced both influential technical research
on syntactic and semantic phenomena and high-level discussions of how
the language faculty is structured and connected to other faculties. On
the one hand, he has done seminal work on anaphora, quantifier scope,
phrase structure, the structure of the lexicon, resultatives, and ellipsis,
among many other topics. On the other, he has addressed such big-
picture questions as modularity, innateness, and consciousness. Each of
these facets of his work informs the other: the detailed technical inves-
tigations are motivated by the general questions about human cognition,
and the high-level claims about mental organization are supported by
the grammatical research.
One aspect of Ray's work that clearly sets him apart from orthodox
Chomskyans is his interest in explaining how human linguistic abilities
could have evolved. As many commentators (e.g., Dennett 1995) have
noted, while Chomsky posits a task-specific and species-specific innate
language organ, he has resisted calls for an account of how it could have
evolved in a biologically realistic timeframe. Instead, he has speculated
that the human capacity for language may be an accidental side effect
of other evolutionary developments. Ray has accepted the idea of a
highly specific innate language faculty,1 but has taken up the challenge
of presenting a plausible evolutionary account of its origins (see for
example Jackendoff [2002]).
Taking the question of language evolution seriously naturally involves
asking the sort of functional questions Ray posed in the opening quote.
A feature of language can enhance the fitness of language users only
if it serves some useful function for survival and/or reproduction.
The obvious function to invoke in this connection is communication,
and that is the basis of Ray's account: "I assume that language arose primarily in the interests of enhancing communication" (Jackendoff 2002, 236). While most cognitive scientists would find this assumption unproblematic, Ray is again breaking with Chomsky, who asserts, "human language is not particularly or specifically a communication system" (http://www.nancho.net/advisors/chomsky.html). Ray, in contrast, writes, "I will assume without justification that any increase in explicit expressive power of the communicative system is adaptive, whether for cooperation in hunting, gathering, defense [footnote omitted], or for social communication such as gossip" (Jackendoff 1999, 272). Ray's common-sense approach to this issue puts him out of the
mainstream of theoretical linguistics, but very much in the mainstream of
cognitive science.
Ray was recently awarded the David E. Rumelhart Prize for Contribu-
tions to the Theoretical Foundations of Human Cognition. Though not
the first linguist to win this prestigious award, he is the first whose work
involves neither laboratory experimentation nor computational model-
ing. He follows the work of the experimentalists and the modelers, and
synthesizes their findings into theories of mental architecture that he
then tests using more traditional linguistic methods. In this way, he has
been able to do more to integrate linguistics into cognitive science than
any other linguist.

0.10 Why Ray Is Special


Moira Yip

These remarks are a personal case study based on the years of my contact
with Ray. I hope that it illuminates how broad-ranging his mind is, and
what a great debt many of us owe him.
For me as a phonologist, it is unusual to find myself collaborating with,
or being encouraged by, a syntactician or a semanticist. But Ray has
always resisted being typecast, hence his extraordinary breadth of knowl-
edge and enthusiasm. As a graduate student, I read his work on X-Bar
syntax, but despite this my interests moved towards phonology. Then, as
luck would have it, when my son was a few months old, Ray hired me to
fill a part-time temporary job replacing Joan Maling, who was on mater-
nity leave. The following year Jane Grimshaw took maternity leave, and
the year after that Joan Maling took a second leave, and so it happened
that after three years of me hanging about Brandeis, Ray went to bat on
my behalf, and with great resourcefulness persuaded the administration
to create a part-time tenure-track position in phonology, for which I was
duly hired. The point of this personal tale is that Ray has always seen the
big picture: in 1983 part-time tenure-track jobs were a new idea, but that
has never stopped Ray.
This comes through with great clarity in his scholarly work: he does
not get boxed in by the wisdom-du-jour. I have only collaborated with
him once, on a 1987 Language paper with Joan Maling (on which, typi-
cally, they insisted on making me first author because I didnt have tenure
yet and they thought it might help me). The paper was on quirky case,
and it used mechanisms drawn from autosegmental phonology to assign
case markings, so we called it "Case in Tiers." In the context of Ray's oeuvre
it is a mere bagatelle, but I remember what sheer fun he was to bounce
ideas off.
When I needed a keynote speaker to launch the University College
London (UCL) interdisciplinary Centre for Human Communication, the
only person I considered was Ray. He has a skill that is desperately rare
among theoretical linguists: he can build bridges to researchers from
other branches of language sciences, as well as cognitive sciences and
philosophy. And of course he gave a superb talk.
More recently, when I began to develop an interest in comparisons
between human language and animal communication, especially bird-
song, he was one of the very few linguists to whom I sent a draft paper,
and, typically for Ray, he quickly responded with thoughtful and encour-
aging comments, including suggestions as to where to submit it. Since
then, his work on the evolution of human language has helped form my
thinking on the issues, and I always assign his papers to my students. I
plan on continuing to do so for many years to come.
This Festschrift is an indication that I am not alone in my admiration
for Rays work, or in my gratitude for being his colleague and friend.

0.11 The Organization of This Volume


Ida Toivonen, Piroska Csúri, and Emile van der Zee

The chapters in this Festschrift are written by colleagues and/or former students of Professor Jackendoff. The topics reflect Jackendoff's wide
contributions to scholarship, as they cover various subfields of linguistics
(phonology, morphology, syntax, semantics), while they also branch out
into other disciplines, such as music, philosophy, neuroscience, and psy-
chology. In the spirit of Jackendoff's work, many of the contributing
authors to this volume reach across disciplines.
The volume is divided into three parts. The first part contains chapters
that pertain directly to core linguistics. The second part is broadly
classified as psycholinguistics, or linguistics and psychology. The third
part is a collection of chapters that touch on language but reach beyond
linguistics into other fields.
Part I, Linguistic Theory, opens with Peter Culicover's chapter "Simpler Syntax and the Mind: Reflections on Syntactic Theory and Cognitive Science," which addresses how the theory of language fits into a more
general theory of the mind. With Jackendoff and Culicover's Simpler Syntax Hypothesis as a starting point, Culicover focuses on the role of syntax in grammar. In particular, he argues that much of grammar can be viewed as constructions, a key feature of Jackendoff's Parallel Archi-
tecture. Culicover shows how this view of grammar has consequences
for how we understand language acquisition and the competence/
performance distinction.
Similarly to Culicover's chapter, Urpo Nikanne in his chapter "What Makes Conceptual Semantics Special?" places linguistic theory in a
broader context. Where Culicover considers linguistic theory in the
context of a more general theory of the mind, Nikanne discusses linguis-
tics and Jackendoff's Conceptual Semantics from the perspective of
general scientific inquiry. He outlines the cornerstones of science: scien-
tific work relies on specific research goals, well-defined background
assumptions, methodological guidelines and formal mechanisms. The
goal of Nikanne's chapter is to discuss Conceptual Semantics at all these
levels. He thereby situates Conceptual Semantics within science in
general, and within linguistics and cognitive science more specifically.
Next, Daniel Büring and Katharina Hartmann address the border
between syntax and semantics, and how a linguistic puzzle can be solved
by carefully dividing up the semantic and syntactic puzzle pieces. Their
chapter "Semantic Coordination without Syntactic Coordinators" introduces a number of coordinators in German that display characteristics that at first make them seem to defy clear description and
categorization. The authors carefully go through the characteristics, and
then, in the tradition of Jackendoff and Culicover, they show that the
typology is quite clear once the syntactic and semantic properties
are regarded separately, and once it is recognized that semantic and
syntactic properties do not necessarily align in the same way for all
coordinators.
Joost Zwarts also addresses a phenomenon that lies at the syntax-semantics interface. His chapter "Out of Phase: Form-Meaning Mismatches in the Prepositional Phrase" brings into focus a number of
different types of prepositional phrases that all have in common that the
syntactic complement of the preposition is not the expected semantic argument; it is not the "ground." The data are drawn from Dutch,
German, and English. Zwarts shows how this can be captured in
Jackendoff's Parallel Architecture, and he also discusses how these kinds
of mismatches can arise historically.
In a lexical semantic case study, "The Light Verbs Say and SAY," Jane Grimshaw explores a class of verbs in English that she calls SAY verbs. The class includes ask, announce, assert, grunt, and gripe. Grimshaw proposes a universal schema that captures the commonalities between these verbs. She then shows that the differences between the verbs divide them neatly into four clearly distinct groups: the light verb say, SAY + discourse role, SAY-by-means, and SAY-with-attitude. The analysis of SAY verbs presupposes that there are universal semantic primitives, much in the tradition of Jackendoff's Lexical Conceptual Semantics.
Joan Maling and Catherine O'Connor revisit passives in their chapter "Cognitive Illusions: Non-Promotional Passives and Unspecified Subject Constructions." They focus on constructions where the subject is
demoted but no other argument has been promoted to subject position.
In the spirit of Jackendoff's Parallel Architecture, the paper takes seriously the interfaces between morphology, syntax, and semantics. Maling and O'Connor investigate a variety of constructions from different languages, and they argue that the forms can be ambiguous, synchronically
or diachronically, as passives or impersonal actives. Under this view, non-
promotional constructions are like optical illusions: they can be inter-
preted in more than one way.
Max Soowon Kim's chapter "Agentive Subjects and Semantic Case in Korean" continues on the theme of grammatical functions. He introduces
some very interesting data pertaining to case marking and subjecthood
in Korean. In particular, the chapter focuses on subjects with locative
case, which have previously received little attention in the theoretical
literature. Kim compares the Korean data to case marking data from
other languages. The analysis of locative subjects in Korean developed
in this chapter draws upon Yip, Maling, and Jackendoff's "Case in Tiers" theory.
What is the basic nature of the properties of verbs and sentences that we call aspect? This question is asked in Henk Verkuyl's "Lexical Aspect and Natural Philosophy: How to Untie Them." In discussing
various ways in which linguists characterize aspect, his chapter calls into
question a number of assumptions that are generally adopted in the
linguistics literature, such as the distinction between Aktionsart and
grammatical aspect. He also questions whether notions like movement
and actualities in natural philosophy should be tied to linguistic aspec-
tual distinctions. He argues that an understanding of aspect can only
come from a careful consideration of a number of complex semantic
factors together. Verkuyls paper bridges philosophy and linguistics and
concludes the first part of the volume.
Part II of the volume, Psycholinguistics, consists of a collection of
papers that reach from linguistics into the areas of psychology and neu-
roscience. The first chapter of the section is "An Evolving View of Enriched Semantic Composition," where María Mercedes Piñango and
Edgar Zurif discuss psycholinguistic and neurolinguistic research on
semantic composition. After outlining their earlier studies bearing on the
processing and neurological properties of meaning composition provided
by syntactic operations, they present their current views on aspectual
coercion and complement coercion, two semantic phenomena where
meaning is introduced beyond that which is introduced by the syntax.
Their research crucially shows support for one key proposal of Jackendoff's: that semantics is combinatorial and generative and can be studied independently of syntax, properties that are observable through real-time, lesion-based, and functional neuroimaging measures.
Barbara Landau and Lila Gleitman's chapter "Height Matters" is in a sense the syntactic counterpart to Piñango and Zurif's semantics chapter. While for Piñango and Zurif some parts of the interpretation are purely
part of the semantic level of grammar, Landau and Gleitman demon-
strate that the syntax alone can also influence interpretation. They
point to a number of phenomena where the hierarchical relationship
in a syntactic structure influences the interpretation of a sentence
beyond the meaning of the words. They contrast argument path PPs
with adjunct source PPs, and they also discuss asymmetrical readings of
symmetrical predicates such as collide. Finally, they point to studies that
show that asymmetry in linguistic representations aids in children's memory of visual stimuli. Together, the Piñango and Zurif chapter and
the Landau and Gleitman chapter illustrate that, in order to understand
how people encode meaning in language, we need to consider syntax,
word semantics and compositional semantics both separately and
together. This view of course reflects the essence of Jackendoff's Parallel
Architecture.
Bhuvana Narasimhan, Cecily Jill Duffield, and Albert Kim revisit
linear order orthodoxy in their chapter "Accessibility and Linear Order in Phrasal Conjuncts." Adults robustly present old information before
new information in the linguistic string, but children do not; in child
language, new information tends to precede old information. In a clev-
erly designed study, Narasimhan, Duffield, and Kim attempt to tease
apart factors such as ease of production (a speaker-oriented explanation)
and ease of comprehension (a listener-oriented explanation), working
memory, frequency, and priming effects. The results of their study are
consistent with the view that new-before-old (which is what children
prefer) is indeed easier to produce. Their account is presented as a pref-
erence rule system, as developed by Jackendoff in Semantics and Cogni-
tion (1983) and elsewhere.
This part of the volume concludes with Willem Levelt's chapter "Sleeping Beauties," which takes us on a journey through the history of recent Western science that considers the nature of scientific discovery and rediscovery. Levelt discusses a number of discoveries, starting with Steinthal's work on consciousness, which (in Levelt's words) was "kissed back from enchantment" by Jackendoff in Consciousness and the Computational Mind (1987). In addition to Steinthal's theory of consciousness,
Levelt identifies six other breakthroughs in research on language and
cognition that were forgotten and later rediscovered: Meringer's analysis of spontaneous speech errors, Exner's cohort theory, Wundt's grammar of sign language and his introduction of tree diagrams, Reinach's and Lipps's speech act theory, and, finally, Isserlin's adaptation theory. Levelt
poses and attempts to answer the question of what factors play a role in
the spreading of new ideas and findings.
Part III of the volume, Language and Beyond, explores issues beyond
linguistic structures, and ventures into topics such as evolution, music,
and the grammar of comics. This section opens with Daniel Silverman's chapter "Evolution of the Speech Code: Higher-Order Symbolism and the Grammatical Big Bang," which addresses the question of the origin
of human linguistic abilities. Taking a communicational perspective on
language, he presents a proposal as to how hierarchical representations
might have emerged as a necessary by-product of the ever-increasing
complexity of phonetic strings that, on rare occasions, were semantically
ambiguous. In Silverman's view, the communicational pressure on
such structurally ambiguous forms emerging from this interlocutionary
process would have led up to a grammatical Big Bang: the emergence
of hierarchical structure. As a possible motivation and explanation for
the successive qualitative jumps in phonetic complexity resulting in
higher-order symbolism in the speech code, he draws on an array of
diachronic sound changes and synchronic variations widely attested in
human language.
Heike Wiese and Eva Wittenberg's chapter "Arbitrariness and Iconic-
ity in the Syntax-Semantics Interface: An Evolutionary Perspective"
continues on the theme of cognitive architecture. They investigate the role
of rituals for language evolution and describe how parallelisms in sound
and meaning domains might have acted as stepping stones for the emer-
gence of linguistic symbols. Processing data from present-day language
use highlight the explanatory power of the Jackendovian Parallel Archi-
tecture for the emergence and preference of linguistic parallelisms. The
chapter ties together many themes in Jackendoff's work: language, lin-
guistic processing, music, and evolution.
Continuing on the theme of evolutionary foundation, Tecumseh Fitch
focuses his attention on the hierarchical metrical structure that underlies
human musical and linguistic abilities. Delving deep into cognitive
biology, his chapter "The Biology and Evolution of Musical Rhythm: An
Update" provides a critical overview of studies on capacities observed
in different non-human species that have been proposed as possible
evolutionary precursors to hierarchical rhythmical organization: sponta-
neous synchronization of rhythmic behavior (i.e., pulse perception and
entrainment) and vocal learning. In the spirit of Jackendoff's Parallel
Architecture approach to cognitive computation and the evolution of
language, Fitch explores recent research that promises insight into the
emergence of the intricate multi-component interplay of cognitive
resources that provide the basis for human linguistic and musical abili-
ties. The studies under consideration suggest that at least some of these
components may actually be shared with other species.
In his contribution "Neural Substrates for Linguistic and Musical Abil-
ities: A Neurolinguist's Perspective," Yosef Grodzinsky offers a different
angle on the comparative study of linguistic and musical abilities, a field
where Jackendoff has done groundbreaking work. He offers a critique
of previous research done under normal and disrupted cognitive condi-
tions (such as aphasia and amusia). The previous research has had two
parallel goals: (a) to investigate the functional modularity of these two
abilities, and (b) to explore the possibility that they are neurally modular.
Grodzinsky develops a novel experimental paradigm that involves the
analysis of focus as conveyed by pitch. He proposes to specifically use
cases where pitch discrimination is key to the proper determination of
truth conditions. In the proposed studies, the study of pitch in language,
together with its musical analogues, would be couched in semantic con-
siderations more deeply than is presently done.
Fred Lerdahl, Jackendoff's co-author on the 1983 foundational book
A Generative Theory of Tonal Music (GTTM), embarks on a formal
analysis of Robert Schumann's "Im wunderschönen Monat Mai" along
multiple dimensions in his chapter "Structure and Ambiguity in a
Schumann Song." Through a careful analysis of the song's rhythmic orga-
nization, event hierarchies, and path through the tonal space, he addresses
the listener's implicit understanding of instabilities and ambiguities
present in this musical piece. The well-formedness and preference rules
posited by GTTM for structural constituency and prominence relations
are complemented by a quantitative tension analysis in tonal pitch space,
introduced and elaborated in later work by Lerdahl. These theoretical
tools together yield an account of the perceived open form and tonal
ambiguity that this song is famous for.
In "The Friar's Fringe of Consciousness," Daniel Dennett revisits a
proposal made in Jackendoff's Consciousness and the Computational
Mind whereby consciousness arises at, and only extends to, an intermedi-
ate level of diverse, interactive levels of representation, giving rise to an
experience of meaningfulness. In sync with a characteristically Jacken-
dovian move to tease apart diverse cognitive functions and establish the
intricate interplay between representations of different ilk, Dennett sets
out to deconstruct a widely-held view by dethroning consciousness as the
Subject, usurping a control function that oversees cognition in the Car-
tesian Theater of the mind. In his model, consciousness fulfills a monitor-
ing function, serving as an expediter or interface between diverse levels
of cognitive representation: it is conscious experience that allows for
internal (first-person) cognitive functions, as well as for making its con-
tents available for inter-personal communication.
Neil Cohn's chapter finishes the volume with an excellent illustration
of Jackendoffs influence across different subfields of cognitive science.
"Climbing Trees and Seeing Stars: Combinatorial Structure in Comics
and Diverse Domains" begins with Jackendoff's work on the structure
of language, music and complex domains, and then explores the structure
of visual narratives such as comics. Following Jackendoff, he argues the
benefits of studying the mind by taking a wide view that encompasses a
variety of cognitive domains, topics and disciplines. The chapter high-
lights the point that distinct cognitive domains have combinatorial struc-
ture in common.
The chapters of this book form a true celebration of cognitive science
today: creative, daring, interesting, and thought-provoking. They are a
tribute to and celebration of Ray Jackendoffs contribution to the field.
Thank you, Ray!

Notes

1. In fact, when Chomsky coauthored a paper seemingly repudiating his earlier
strong claims about the detailed and task-specific character of humans' innate
linguistic abilities (Hauser et al. 2002), Ray coauthored a response (Pinker
and Jackendoff 2005), defending the idea of very specific innate linguistic
mechanisms.

References

Bach, Emmon. 1986. Natural language metaphysics. In Logic, Methodology, and
Philosophy of Science VII, edited by Ruth Barcan Marcus, Georg J.W. Dorn, and
Paul Weingartner, 573–595. Amsterdam: North-Holland.
Dennett, Daniel. 1995. Darwin's Dangerous Idea. New York: Simon and
Schuster.
Gazdar, Gerald, Ewan H. Klein, Geoffrey K. Pullum, and Ivan A. Sag. 1985.
Generalized Phrase Structure Grammar. Oxford: Blackwell, and Cambridge, MA:
Harvard University Press.
Green, Georgia M. 2011. Elementary principles of HPSG. In Non-transformational
Syntax: a guide to current models, edited by Kersti Börjars and Robert Borsley,
9–53. Oxford: Wiley-Blackwell.
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch. 2002. The faculty of
language: What is it, who has it, and how did it evolve? Science 298 (5598):
1569–1579.
Jackendoff, Ray. 1972. Semantic Interpretation in Generative Grammar. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray S. 1975. Morphological and semantic regularities in the lexicon.
Language 51 (3): 639–671.
Jackendoff, Ray S. 1976. Toward an explanatory semantic representation. Linguis-
tic Inquiry 7 (1): 89–150.
Jackendoff, Ray. 1977. X-Bar Syntax. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1978. Grammar as evidence for conceptual structure. In Linguis-
tic Theory and Psychological Reality, edited by Morris Halle, Joan Bresnan, and
George Miller, 201–228. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1987. Consciousness and the Computational Mind. Cambridge, MA: MIT
Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1991. Parts and Boundaries. Cognition 41 (1–3): 9–45. Reprinted
in Lexical and Conceptual Semantics, edited by Beth Levin and Steven Pinker,
9–45. Cambridge, MA: Blackwell, 1992. Reprinted in Ray Jackendoff, Meaning
and the Lexicon, xxx–xxx. Oxford: Oxford University Press, 2010.
Jackendoff, Ray. 1997. How language helps us think. Pragmatics and Cognition 4
(1): 1–34. Revised version in Ray Jackendoff, The Architecture of the Language
Faculty, 179–208. Cambridge, MA: MIT Press, 1997.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge, MA:
MIT Press.
Jackendoff, Ray. 1998. Why a conceptualist view of reference? A reply to Abbott.
Linguistics and Philosophy 21 (2): 211–219.
Jackendoff, Ray. 1999. Possible stages in the evolution of the language capacity.
Trends in Cognitive Sciences 3 (7): 272–279.
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar,
Evolution. Oxford: Oxford University Press.
Jackendoff, Ray. 2006. The capacity for music: What is it and what's special about
it? Cognition 100 (1): 33–72.
Jackendoff, Ray. 2009. Parallels and nonparallels between language and music.
Music Perception 26 (3): 195–204. (In special issue celebrating the 25th anniver-
sary of Lerdahl and Jackendoff's Generative Theory of Tonal Music). Reprinted
as Music and Language, in The Routledge Companion to Philosophy and Music,
edited by Theodore Gracyk and Andrew Kania, 101–112. New York: Routledge,
2011.
Jackendoff, Ray, and Peter Culicover. 2005. Simpler Syntax. Oxford: Oxford
University Press.
Jackendoff, Ray, and Barbara Landau. 1993. What and where in spatial cogni-
tion. Behavioral and Brain Sciences 16 (2): 217–238.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Pinker, Steven, and Ray Jackendoff. 2005. The faculty of language: What's special
about it? Cognition 95 (2): 201–236.
Pollard, Carl, and Ivan A. Sag. 1987. Information-based Syntax and Semantics. Vol.
1: Fundamentals. Stanford: CSLI Publications.
Pollard, Carl, and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar.
Chicago: University of Chicago Press.
Yip, Moira, Joan Maling, and Ray Jackendoff. 1987. Case in tiers. Language 63
(2): 217–250.
I LINGUISTIC THEORY
1 Simpler Syntax and the Mind: Reflections on Syntactic
Theory and Cognitive Science1

Peter W. Culicover

1.1 Introduction

There are many fundamental and far-ranging questions about language
that Ray Jackendoff has touched on in his work; see, for example, Jack-
endoff (2002) Foundations of Language. These questions are often raised
in cognitive science but less often within linguistics itself: What is a rule?
What is (a) grammar? What are syntactic constraints, and where do they
come from? What is the difference between competence and perfor-
mance? What is the difference between grammaticality and acceptabil-
ity? Where do universals come from? What is the relationship between
linguistic theory and language acquisition, language processing and lan-
guage variation?2
In this paper I focus on a question that touches upon many of these
topics: how is knowledge of language represented in the mind? An
answer that Ray has offered over the years to this question is that rules
of grammar are taken to be pieces of structure stored in memory, which
can be assembled online into larger structures (Culicover and Jackend-
off 2006, 415).3
I explore this idea here from the perspective of Simpler Syntax (SS,
see Culicover and Jackendoff 2005), which assumes that grammars are
composed of constructions. The structures posited by SS are motivated
by the Simpler Syntax Hypothesis (SSH):
Simpler Syntax Hypothesis: Syntactic structure is only as complex as it
needs to be to establish interpretation.
SS thus contrasts with mainstream approaches to syntactic theory in
which a primary motivation for structure is the maximization of struc-
tural and derivational uniformity; see Culicover and Jackendoff (2005,
chaps. 1–3) for discussion.
I suggest below a particular implementation of the idea of pieces of
structure stored in memory, which I refer to in this paper as memory
structures. It is an instantiation of Marr's (1982) algorithmic level that
carries out the computation of some cognitive function. The grammar,
on this view, is a description of the function that the mind is computing,
but not a description of the mental architecture itself. Competence is
embodied in the device in the mind that computes correspondences
between form and meaning, correspondences that are described in terms
of constructions. This is one way of understanding Chomsky's idea that
linguistic competence is incorporated into a performance mechanism
that produces and understands language (Chomsky 1965, 15).
This view raises a number of difficult questions. I can only hope to
raise them here, and suggest where some solutions might lie. Section 1.2
sets the stage by briefly summarizing the constructional perspective of
SS. Consideration of the acquisition of constructions by learners takes
us in section 1.3 to the question of representation, in particular, what
memory structures might look like. Crucially, I assume that constructions
are represented in the mind as computational routines for mapping
between strings and meanings. Section 1.4 considers how constructions
might function in real time in the processing of sentences. Section 1.5
concludes with the question of where universals come from in a theory
such as SS that makes minimal assumptions about the human language
faculty, and in particular does not assume that generalizations such as
island constraints are part of grammar.

1.2 The Constructional Perspective

On the constructional view, a native speaker's knowledge of language
consists of form-meaning correspondences and how to construct expres-
sions that exemplify them. Following prior constructional theorizing
(e.g., Goldberg 1995), I assume that each individual word, with its
meaning, is acquired by learners as an individual correspondence, that
some correspondences between strings of words that constitute phrases
and their meanings are acquired by learners as sui generis, and that more
general correspondences are formed through generalization over sets of
these individual correspondences that share common properties of form
and meaning.
This constructional perspective is central to Jackendoffs Parallel
Architecture and SS. The essence of the notion CONSTRUCTION is that the
grammar consists of correspondences between sound and meaning,
mediated by syntax, and stored in the (extended) lexicon. A sentence is
well-formed if every part of its form and meaning is properly licensed
by some construction. Instantiations of this view of well-formedness can
be found in work in Construction Grammar (see Kay and Fillmore 1999;
Kay 2002, 2005; Müller 2006; Sag 2012).
In the Parallel Architecture, a word is a correspondence between a
phonological form and a meaning, mediated by syntactic information,
such as category and formal features (gender, number, etc.). An indi-
vidual phrase and a sentence composed of several phrases is also such a
correspondence. Constructional approaches account for creativity by
allowing for generalization based on sets of individual correspondences.
While speakers may store exemplars of some individual expressions, they
are able to go beyond their experience through generalization.
For example, the word pizza is a construction, a correspondence
between a phonological representation, here [pitsə], a syntactic represen-
tation N, and a meaning PIZZA. (I use boldface for the elements of con-
ceptual structure representations.) The correspondence is shown in (1):
(1) PHON [pitsə]1
SYN N1
CS PIZZA1
Similarly, the representation for the lexical item eat is given in (2). It
takes two arguments, an AGENT and a PATIENT.
(2) eat
PHON [it]1
SYN V1
CS λy.λx.EAT1(AGENT:x, PATIENT:y)
For simplicity of exposition, I ignore the constructional details of
inflection.
A correspondence for eat the pizza is given in (3). The first term is pho-
nological, the second is syntactic, and the third is semantic. By definition,
the phonological term incorporates information about temporal order-
ing, while the syntactic and semantic terms do not. The subscripts indi-
cate correspondences between the constituents of each representation.
(3) PHON [it1 ðə2 pitsə3]
SYN [VP V1, [NP ART2, N3]]
CS λx.EAT1(AGENT:x, PATIENT:PIZZA3[DEF2])
This correspondence says that in the linearized phonological representa-
tion, [it] corresponds to a V with the meaning EAT, [ðə pitsə] corresponds
to an NP consisting of ART and N with the meaning of PIZZA[DEF],
and this NP meaning is the PATIENT argument of EAT.
It is plausible that a learner that experiences many exemplars of indi-
vidual correspondences that share certain properties will generalize over
these exemplars and hypothesize a generalized construction. There is
some evidence that children start generalizing quite early in the course
of language acquisition (Naigles 2002; Naigles, Hoff, and Vear 2009;
Gertner, Fisher, and Eisengart 2006), but the extent of such generaliza-
tion remains a contentious issue. Tomasello (2003) has suggested that the
grammar of a learner never reaches maximal generality, and that typical
rules of grammar such as phrase structure rules are actually collections
of more specific constructions.
Whatever the timing and extent of generalization might be, learners
do acquire lexical items and more complex constructions. These are
instances of memory structures. The question of generalization concerns
the extent to which individual pieces of structure are ultimately sub-
sumed by more general representations. But, even given very conserva-
tive assumptions about generalization, it appears that English speakers
eventually arrive at the notion of a transitive VP. That is, given enough
transitive verb phrases, learners eventually generalize the pattern exem-
plified by (3).
The pattern seen in (3) and other transitive VPs is as follows: the
phonological form corresponding to the V precedes the phonological
form corresponding to the NP, and the meaning corresponding to the NP
is the argument of the meaning corresponding to the V. Using the nota-
tion of (3), this generalized construction may be represented as (4),
where α is a variable, and V and NP are constituents of VP:
(4) Transitive VP
PHON α1-α2
SYN [VP V1, NP2]
CS V1(NP2)
This representation describes a piece of structure in memory.4 Since the
meaning of eat is that in (2), eat the pizza in (3) is licensed by (4).
Idioms and constructions with idiomatic properties take a similar form,
where again PHON specifies the linear order of elements, SYN describes
the structure, and CS the corresponding interpretation. Representations
for kick the bucket, take a walk, and sell NP down the river are given in
(5), (6), and (7), respectively:5
(5) kick the bucket
PHON [[kɪk]1-[ðə2 bʌkɪt3]4]5
SYN [VP V1, [NP ART2, N3]4]5
CS λx.DIE5(EXPERIENCER:x)
(6) take a walk
PHON [[teyk]1-[ə2 wɔk3]4]5
SYN [VP V1, [NP ART2, N3]4]5
CS λx.WALK5(AGENT:x)
(7) sell NP down the river
PHON [sɛl]1-α2-[dɑʊn3 ðə4 rɪvər5]6
SYN [VP V1, NP2, [PP P3, [NP ART4, N5]]6]
CS λy.λx.BETRAY1+6(AGENT:x, PATIENT:y)(NP2)
Notice that in the last example, the NP2 in SYN is a variable. The
description of the construction guarantees that the phonological form
of this constituent is situated after that of sell and before that of
down, and that its interpretation functions as the argument of the
meaning BETRAY, which corresponds to the idiom sell1 [down the
river]6.
Because SS is a constructional theory, it strongly favors minimal syn-
tactic structures to account for correspondence with interpretation. For
instance, given the sequence V-NP, if the corresponding interpretation is
V(NP), it is simpler to state this directly in terms of the structure [VP V
NP] rather than posit a more abstract syntactic structure such as [vP Vi
NPj [VP ti tj ]] or something even more complex. In other words, SS
does not rule out complex structures with filler-gap chains, but such
structures would have to be strongly motivated by the linguistic facts.
So SS does assume a filler-gap chain in A′ constructions, for example,
but only because doing so explains properties of the interpretation, sim-
plifies the grammatical description, accounts for reconstruction effects,
and so on.
A constructional theory is also well-suited to account for semi-
regularities, idiosyncrasies, and exceptions, locating these phenomena
in the degree of specificity of the terms of the syntactic description.
In a more categorical theory (e.g., Principles and Parameters Theory)
that makes a sharp distinction between core and periphery, such
phenomena are typically ruled out of consideration because they
do not fall within the range of the descriptive devices (Culicover
2011).
1.3 Representations

A fundamental characteristic of constructions, that is, memory structures,
is that they include knowledge of temporal ordering of forms, repre-
sented by the ordered subscripted components of PHON in our descrip-
tions. This is a point that Jackendoff has made often, and quite clearly in
Jackendoff (2002), but it is important enough to bear repeating and
restating. A speaker knows that stringing one form after the other in time
in a particular way corresponds to a particular meaning, and that in order
to express a meaning, one orders certain forms one after the other in time.
This view contrasts with the familiar (and conventional) idea that knowl-
edge of a language (I-language) consists of knowledge of the well-
formed structures, and linear order and interpretation are merely the
consequence of processes applying at the interfaces (Chomsky 1986).
In this section I summarize a characterization of memory structures in
terms of the metaphor of TRAJECTORIES in a LINGUISTIC SPACE. The linguis-
tic space is the memory; the individual pieces of structure are the primi-
tive trajectories. Each point on a trajectory is a correspondence embodied
in a linguistic element, that is, a phoneme, a word, or a morpheme.6 The
ordering of the points on the trajectory represents the linear ordering of
expressions in the language; constraints on what trajectories are possible
in a language represent knowledge of grammatical structure. The lin-
guist's grammar is a description of the configuration of this space, of how
the trajectories are arranged and how they relate to one another, to some
level of precision.
Consider the acquisition by a learner of a specific correspondence, one
that might in principle be an idiom (but ultimately turns out not to be),
for example, pet the kitty. The learner learns that there is a correspon-
dence between the phonological form and the meaning, along the lines
outlined in the preceding section, and learns that to express the meaning,
one produces the words in the specified order.
Crucially, learning how to produce the form that conveys this meaning
is not something that follows the identification, abstraction, and gener-
alization of the syntactic structure, but precedes it. In other words, what
is acquired first is the individual correspondence, including the particular
actions that one must perform in order to express (or understand) the
expression (Tomasello 2003). Generalization to a construction specified
in terms of lexical and phrasal categories (i.e., forming a rule) abstracts
over the syntactic and semantic categories of the elements, and preserves
the linearization information, along the lines of the constructions in
(4) and (7).
Culicover (1998) and Culicover and Nowak (2003) characterize lan-
guage acquisition in terms of gradually filling the linguistic space with
trajectories representing individual sound-meaning correspondences.
Acquisition is a process of gradually abstracting and generalizing over
the properties of related trajectories. Culicover and Nowak (2003)
assume that each individual construct is a point in the linguistic space,
and the individual constructions are trajectories connecting points.
Each point corresponds to a distinct word, and words that are similar
in meaning are assumed to be near one another in the space. In the
acquisition of generalized constructions, individual trajectories are
engraved in this space on the basis of experience with individual corre-
spondences. A syntactic category is a connected region of the space, and
the structure of an expression is a description of the path that the trajec-
tory takes through the regions of this space.7 For example, the correspon-
dence in (3) says that in order to express the given meaning, follow the
trajectory denoted by PHON, passing through the corresponding regions
of the linguistic space.8

When several trajectories go from one region to another region, a flow
develops. A simple illustration is given in figure 1.1.

Figure 1.1
Development of flow between regions at times (a), (b), and (c)

On this metaphor, generalization is a matter of filling in the entire set
of trajectories between two regions when a sufficient number of indi-
vidual trajectories between them have been established.
The description of an individual construction is the trajectory that is
followed in processing this expression (Culicover 1998). For example,
(3) says to traverse the VP region by going first to the word eat in the
V region, and then to the NP region, where first the in the ART
region is processed, and then pizza in the N region is processed. Our
linguistic description of a very general construction abstracts away from
individual lexical items and specifies trajectories simply in terms of the
regions, that is, the categories. Other constructions, such as the one
embodying sell down the river, are a mix of individual lexical items and
categories.
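As an informal illustration only, a trajectory can be rendered as the ordered list of region visits obtained by flattening a bracketed structure. The representation and the function name below are invented for this sketch, and the phrasal regions (VP, NP) are left implicit for brevity.

# A toy rendering of a "trajectory": the ordered sequence of category
# regions and words that an expression visits.

def trajectory(node):
    """Flatten a bracketed structure into an ordered list of (region, word) visits.
    A node is either (category, word) for a lexical point
    or (category, [subnodes]) for a phrase."""
    category, content = node
    if isinstance(content, str):
        return [(category, content)]
    visits = []
    for sub in content:
        visits.extend(trajectory(sub))
    return visits


eat_the_pizza = ("VP", [("V", "eat"), ("NP", [("ART", "the"), ("N", "pizza")])])

print(trajectory(eat_the_pizza))
# [('V', 'eat'), ('ART', 'the'), ('N', 'pizza')]

# Generalizing over many such trajectories amounts to keeping only the
# regions visited, in order, and dropping the particular words:
print([region for region, _ in trajectory(eat_the_pizza)])
# ['V', 'ART', 'N']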
On this view, call it the Spatial Implementation, the syntactic structure
of an expression is simply a description of the trajectory: what regions
of the space it visits, in what order, and what interpretations it is linked
to along the way. Syntactic structure is essential to generalization beyond
individual exemplars. If the syntactic description were simply linearized
V, ART, and N, it would say visit a V, then an ART, and then an N,
which would yield the correct sequence. But if the syntactic description
is [VP V, [NP ART N]], then the syntax says that the sequence corresponds
to an interpretation in which ART-N itself has a meaning, this meaning
is an argument of V, and the entire sequence corresponds to a meaning.
In effect, syntactic structure is the link between linear order and struc-
tured meaning.
I suggest that the Spatial Implementation is a useful way to understand
pieces of structure stored in memory. Memory is not static but dynamic.
That is, it contains a sequence of instructions for processing the sen-
tence in production and comprehension. The radical speculation here is
that the knowledge that underlies this capacity is represented in the
processor and not in some other mental component. In other words,
there is no architectural distinction between competence and perfor-
mance. What exists is performance, and competence is embodied in how
this device is organized. Taking this position has a number of implications
and raises a number of fundamental questions, which I take up in the
next section.

1.4 Processing Constructions

Consider what happens when we take the representations in section 1.3
to be memory structures. The processing of a sentence proceeds from the
beginning of the sentence by projecting possible continuations of the
string, as reflections of projected structure. These possible continuations
are alternative paths that can be followed in the linguistic space. Since
at many points in a sentence there is typically more than one possible
continuation, a plausible theory of sentence processing takes a probabi-
listic, parallel perspective. The set of possible trajectories may be
expressed as a probabilistic phrase structure grammar, where the prob-
ability of each construction at any point in the processing of the string
is determined by its relative frequency in the corpus on which the learner
has been trained (Hale 2001). In computational linguistics, such a learner
is a parser for the language (Nguyen, Van Schijndel, and Schuler 2012).
Computational parsers are trained on annotated corpora such as those
in the Penn Treebank. The human parser is trained on the corpus of the
learner's experience.
A probabilistic phrase structure grammar has rules of the form in (8),
where A, B, C . . . are categories and p is the probability of the particular
expansion.
(8) [p] A → B C . . .
When the parser encounters an instance of B, it projects the structure
[A B C . . .] with probability p. The probability is determined by the
frequency of the full structure initiated by B in the corpus that the
parser is trained on. These probabilities correspond in our physical
description of processing in a linguistic space to the width and density
of trajectories.
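To give a rough sense of what such training amounts to, the following sketch is purely illustrative: the grammar, the counts, and the function names are invented, and Hale's (2001) surprisal is properly computed incrementally over prefixes of the string rather than over whole trees, so scoring trees as products of rule probabilities is only a crude proxy. The sketch does show the relevant point that a configuration whose rule never occurs in the training counts receives probability zero, and hence unbounded surprisal.

import math
from collections import defaultdict

# Hypothetical rule counts, as if read off an annotated training corpus.
rule_counts = {
    ("S", ("NP", "VP")): 900,
    ("S", ("S", "VP")): 0,          # e.g., a sentential subject: unattested here
    ("NP", ("ART", "N")): 700,
    ("NP", ("N",)): 300,
    ("VP", ("V", "NP")): 600,
    ("VP", ("V",)): 400,
}

parent_totals = defaultdict(int)
for (parent, _expansion), count in rule_counts.items():
    parent_totals[parent] += count


def rule_prob(parent, expansion):
    """Relative-frequency estimate of p for one expansion of a parent category."""
    total = parent_totals[parent]
    return rule_counts.get((parent, expansion), 0) / total if total else 0.0


def tree_prob(tree):
    """Score a tree as the product of its rule probabilities.
    A tree is (category, [children]); children are subtrees, or bare word
    strings under a preterminal."""
    category, children = tree
    if all(isinstance(child, str) for child in children):
        return 1.0                  # preterminal over a word: ignored in this toy
    expansion = tuple(child[0] for child in children)
    prob = rule_prob(category, expansion)
    for child in children:
        prob *= tree_prob(child)
    return prob


def surprisal(prob):
    return math.inf if prob == 0 else -math.log2(prob)


attested = ("S", [("NP", [("ART", ["the"]), ("N", ["shares"])]),
                  ("VP", [("V", ["rose"])])])
unattested = ("S", [("S", [("NP", [("N", ["pigs"])]), ("VP", [("V", ["fly"])])]),
                    ("VP", [("V", ["surprise"])])])

print(surprisal(tree_prob(attested)))    # finite
print(surprisal(tree_prob(unattested)))  # inf: the S-over-S rule never occurs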
My experiments using a parser trained on a tagged corpus have shown
that configurations that are locally well-formed but globally non-existent
in the corpus cannot be correctly parsed.9 To take just one example, it is
well-known that extraction from a sentential subject in English, as in (9),
is unacceptable (Ross 1967):
(9) *These are the shares whichi [S that the president sold ti] has
surprised the market.
This sentence is locally well-formed, in that a sentence may be a subject
in English, the wh-phrase is where a wh-phrase may be, and the gap is
where a gap may be.
Interestingly, the filler-gap configuration exemplified here does not
occur in the corpus. The reason may be that (9) is ungrammatical in the
traditional sense, or it may be nonexistent in the corpus for reasons other
than grammar per se. In any case, the parser is not trained on sentences
like (9), and hence does not handle such a sentence properly, as shown
in figure 1.2. The feature g (for gap) should appear on the node
RP-IM, but actually is passed down through VS-gNS-II. This is an error,
since the extraction is from the subject, not the matrix VP.
The traditional explanation for the unacceptability of sentences such
as (9) is that it violates a grammatical constraint. However, there is an
alternative possibility: that such cases reflect processing complexity
(Hofmeister, Casasanto, and Sag 2013). On this view, more complex
configurations, like genuine cases of ungrammaticality, are rare in the
experience of the learner. This rarity gives rise to high surprisal, reflect-
ing the low or zero probability of the configuration (Hale 2001, 2003;
Levy 2005, 2008). High surprisal in turn correlates with the subjective
experience of unacceptability (Crocker and Keller 2006).

Figure 1.2
Parse of extraction from subject

The idea that extraction from subjects introduces complexity was pro-
posed by Kluender (1992, 1998, 2004). Similar arguments have been
made for other island constraints in the recent literature (see, e.g., Hof-
meister 2011; Hofmeister and Sag 2010; Hofmeister et al. 2007; Hofmeis-
ter et al. 2013; Hofmeister, Culicover, and Winkler, forthcoming; Sag,
Hofmeister, and Snider 2007). While a fully explicit processing account
of these constraints in terms of complexity is yet to be formulated, SS
points in this direction, on the assumption that grammatical knowledge
consists only of constructions. The task of the processor is to take these
constructions, that is, memory structures in the performance mechanism,
and fit them together in order to compute representations for more
complex expressions. On this view, any judgment that cannot be tied
directly to the well-formedness conditions imposed by constructions
must have an extra-grammatical explanation.

1.5 Where Do Universals Come From?

The preceding sections suggest that no matter what the linguistic experi-
a view does not explain where the linguistic experience comes from,
or what if anything constrains its properties. But it does appear that
languages share certain properties and lack others, and that some proper-
ties, at least, are good candidates for universals. So we come to what is
probably the most fundamental issue in syntactic theory, which is that of
universals: how are they represented in the mind, and where do they
come from?
Regarding the first question, we propose in SS that Universal Grammar,
that is, the human language faculty, is a toolkit that learners draw upon
in constructing grammars of their languages; this is an idea that has been
prominent in Jackendoff's work (see e.g., Jackendoff 2002, chap. 4).
Something that is in the toolkit need not be in every grammar, but it
must be universally available. The toolkit assumed in SS is very restricted,
compared with more traditional grammatical theories (Culicover and
Jackendoff 2005, chap. 1).
Regarding the second question, in Culicover (1999) and Culicover
(2013), I suggest that universals are in part reflections of economy in
the formulation of SYN-CS correspondences. The notion of economy is of
course familiar from the Minimalist Program, where it is envisioned in
terms of computational perfection (Chomsky 1995). I take economy to
be a matter of the actual complexity of the form-meaning correspondence.
Let us begin with the plausible assumption that what is evolutionarily
prior to language is essentially human CS, as articulated by Jackendoff
(1972, 1983, 1990, 1997, 2002). In particular, assume that it represents ref-
erence to objects, relations between objects and properties of objects, and
events and states, that is, representations of the form λx.F(θ:x). Kirby
(1997, 2002) and his colleagues (Kirby, Smith, and Brighton 2007) have
conducted computational experiments to model the evolution of lan-
guage. These experiments show how groups of agents, that is, learners, in
a generation can settle on increasingly more general grammatical hypoth-
eses about the correspondences between strings and meanings produced
by the preceding generation. Once a group of agents hits upon the idea of
using sounds to refer to and distinguish objects and their properties, syn-
tactic representations may evolve that are as complex as the CS represen-
tations, and in fact closely mirror the structure of these representations.
A key advance in the evolution of such representations is the forma-
tion of categories based on similarity of properties and distribution. So
it is reasonable to assume that three key universals are the following:
(i) CS is structured and recursive.
(ii) Sound corresponds to CS.
(iii) Form categories.
Universal (ii) is, of course, the notion of a construction, essentially
Jackendoff's (2002, sec. 8.3) use of symbols, and adding (iii) gives us
syntax.
I hypothesize that these universals provide a way to get a linguistic
system under way without assuming universals formulated in terms spe-
cific to linguistic structure.
Next, we must assume a notion of economy, the SSH:
(iv) SSH: Syntactic structure is only as complex as it needs to be to
establish interpretation.
Beyond this, processing considerations suggest that dependent elements
are as close to one another in time as possible, and that logical scope is
reflected in linear order (see Culicover and Nowak 2002; Culicover 2013;
Hawkins 1994, 2004).
In Culicover (2013) I also argue, following early ideas about marked-
ness in generative grammar (Chomsky 1965), that maximal generality
consistent with the evidence also follows from economy. This is prin-
ciple (v):
(v) Generalize maximally, consistent with the evidence.
There are two ways in which such generalization might simplify construc-
tions. The first is the identification of a particular phonological form with
a particular CS function. An example of such an innovation would be
the introduction of case to represent the correspondence between a
phrase and its thematic role. The second is the identification of a particu-
lar linear order with a particular CS function. An example would be the
introduction of grammatical functions defined in terms of structural posi-
tions, again to represent the correspondence between a phrase and its
thematic role.
Suppose that grammatical devices such as inflection and grammatical
functions are not biologically evolved, that is, that they are not part of
the human language faculty. But if they are not biologically evolved, how
do we account for their ubiquity, if not universality? By appealing to the
role of economy in language change and language contact, we can make
some sense of the fact that there are certain tools that are universally
available without appealing to biological evolution (Briscoe 2000, 2002;
Brighton, Smith, and Kirby 2005; Kirby, Smith, and Brighton 2007; Chater
and Christiansen 2010).
Assuming (i)–(v), we can understand the introduction of devices such
as case and grammatical functions into the toolbox as a consequence of
linguistic evolution. A language is far from a perfect system; it may
incorporate non-optimal ways of computing the form-meaning corre-
spondence, a point that Jackendoff has often made.
Suppose now that a particular grammatical device is discovered that
reduces the cost of computing some aspect of the form-meaning corre-
spondence. Once such a grammatical device is invented, it will compete
successfully with less effective devices (Culicover 2013). Further general-
ization of a device might result in syntactic autonomy, where the device
becomes a condition on constructional well-formedness in a language. For
example, English has a requirement that there must be a grammatical
subject in a finite sentence. The result is that when there is no θ-role linked
to the subject position, there is an expletive subject, as in extraposition (It
is obvious that S), raising verbs (it seems that S), and there-sentences (there
is a fly in my soup; there suddenly entered the room a rowdy bunch of
drunken partygoers). Similarly, English constructions that require an aux-
iliary verb show expletive do when no such verb is available, as in inver-
sion (Did you call?), sentential negation (I did not call), and so on.
On this view, part of the toolbox is transmitted through language itself
as the learner acquires the constructs, and then, through generalization,
the constructions that embody these grammatical devices. In other words,
the grammar itself, and UG, are embodied in the set of correspondences
in the linguistic experience of the language learner and in the construc-
tions that the learner formulates on the basis of this experience.
To sum up, I have proposed here an interpretation and implementa-
tion of Jackendoff's idea that knowledge of language is represented in
the mind as pieces of structure stored in memory (Culicover and Jack-
endoff 2006, 415). These memory structures are constructions. This idea
fits well with Simpler Syntax, which holds that much of what has been
assumed to be in the language faculty is in fact not part of it. Some (more
or less) universal aspects of language are cultural artifacts that are trans-
mitted to learners and speakers through language acquisition and lan-
guage contact. Others follow from processing complexity, which leads to
non-representation in learners experience and corresponding judgments
of unacceptability by speakers.
Naturally, considerable future research will be required to determine
the extent to which these ideas are on the right track and to fill in the
myriad details.

Notes

1. I have to confess that I (deviously) got Ray to comment on another piece that
I was working on at the same time as this one that dealt with some of the same
issues. As always, his comments have been very much to the point, and have led
to substantial improvements. He is of course not responsible for any errors. More
generally, I am pleased to once again have the opportunity to thank him for his
friendship, his kindness, his patience, and his generosity, to acknowledge the
enormous influence he has had on me and my work, and to thank him for afford-
ing me the privilege of collaborating with him for (wait for it!) . . . over FORTY
fabulous years.
For very helpful comments on this piece in its present form, I thank Dan Siddiqi
and an anonymous reviewer. I am also grateful to Richard Samuel for stimulating
discussions about many issues, including the competence-performance distinc-
tion. Naturally, none of them are responsible for any errors, either.
2. Of course, linguistics is a branch of cognitive science, since language is a cre-
ation of the human mind. But much of linguistic research is not explicitly con-
cerned with the mental representation of language, while mental representation
is the central concern of cognitive science.
3. This particular quotation is from a joint article, but it has been Jackendoff's
idea for some time; see, e.g., Jackendoff (2002, chap. 6).
4. We argue in SS that the grammatical functions Subj and Obj must also be
represented in correspondences. I leave these out here in part to simplify the
exposition, and in part because in simple correspondences the grammatical func-
tions are redundant. They appear to play a role, however, in capturing relation-
ships between constructions such as active-passive.
5. I include the phonetic form of these expressions for explicitness, although it
is inherited from the forms of the individual words and the normal syntactic
structure of the English VP.
6. Treating the elements as points is of course a simplification, since they too
have temporal characteristics.
7. Since the syntactic part of the space is not structured prior to experience,
categories will vary across languages, as suggested by Culicover (1999) and Croft
(2001, 2005), among others. However, since the semantic part of the space is
universal, it will constrain the types of categories that form, under reasonable
assumptions about economy and generalization. See section 1.5 for further
discussion.
8. The traversal of a trajectory is neutral with respect to speaker and hearer. A
speaker starts with the CS representation, producing the sounds while going
through the corresponding syntactic representation and from that to the phono-
logical form. A hearer is driven through the trajectory by the phonological
form, which corresponds to the syntactic structure, which in turn corresponds to
the interpretation. In fact, in the course of real time processing, the hearer
is likely to entertain multiple alternative structures, a point that I return to in
section 1.4.
9. The experiments use the parsing environment described in Nguyen et al.
(2012), and were carried out in collaboration with William Schuler and Marten
van Schijndel.

References

Brighton, Henry, Kenneth Smith, and Simon Kirby. 2005. Language as an evolu-
tionary system. Physics of Life Reviews 2 (3): 177–226.
Briscoe, Edward. 2000. Grammatical acquisition: Inductive bias and coevolution
of language and the language acquisition device. Language 76 (2): 245–296.
Briscoe, Edward. 2002. Linguistic Evolution through Language Acquisition:
Formal and Computational Models. Cambridge: Cambridge University Press.
Chater, Nick, and Morten H. Christiansen. 2010. Language acquisition meets
language evolution. Cognitive Science 34 (7): 1131–1157.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT
Press.
Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin, and Use. New
York: Praeger.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Crocker, Matthew W., and Frank Keller. 2006. Probabilistic grammars as models
of gradience in language processing. In Gradience in Grammar, edited by Gisbert
Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky, 227–245. Oxford:
Oxford University Press.
Croft, William. 2001. Radical Construction Grammar: Syntactic Theory in Typo-
logical Perspective. Oxford: Oxford University Press.
Croft, William. 2005. Logical and typological arguments for radical construction
grammar. In Construction Grammars: Cognitive Grounding and Theoretical
Extensions, edited by Mirjam Fried and Jan-Ola Östman, 273–314. Amsterdam:
John Benjamins.
Culicover, Peter W. 1998. The minimalist impulse. In The Limits of Syntax,
edited by Peter W. Culicover and Louise McNally, 47–77. New York: Academic
Press.
Culicover, Peter W. 1999. Syntactic Nuts: Hard Cases, Syntactic Theory, and Lan-
guage Acquisition. Oxford: Oxford University Press.
Culicover, Peter W. 2011. Core and periphery. In The Cambridge Encyclopedia
of the Language Sciences, edited by Patrick Colm Hogan, 227–230. Cambridge:
Cambridge University Press.
Culicover, Peter W. 2013. Grammar and Complexity: Language at the Intersection
of Competence and Performance. Oxford: Oxford University Press.
Culicover, Peter W., and Ray Jackendoff. 2005. Simpler Syntax. Oxford: Oxford
University Press.
Culicover, Peter W., and Ray Jackendoff. 2006. The Simpler Syntax Hypothesis.
Trends in Cognitive Sciences 10 (9): 413–418.
Culicover, Peter W., and Andrzej Nowak. 2002. Learnability, markedness, and the
complexity of constructions. In Language Variation Yearbook, vol. 2, edited by
Pierre Pica and Johan Rooryck, 5–30. Amsterdam: John Benjamins.
Culicover, Peter W., and Andrzej Nowak. 2003. Dynamical Grammar. Oxford:
Oxford University Press.
Gertner, Yael, Cynthia Fisher, and Julie Eisengart. 2006. Abstract knowledge of
word order in early sentence comprehension. Psychological Science 17 (8):
684–691.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar Approach to
Argument Structure. Chicago: University of Chicago Press.
Hale, John T. 2001. A probabilistic Earley parser as a psycholinguistic model. In
Proceedings of the Second Meeting of the North American Chapter of the Associa-
tion for Computational Linguistics, 1–8. Morristown, NJ: Association for Com-
putational Linguistics.
Hale, John T. 2003. The information conveyed by words in sentences. Journal of
Psycholinguistic Research 32 (2): 101–123.
Hawkins, John A. 1994. A Performance Theory of Order and Constituency. Cam-
bridge: Cambridge University Press.
Hawkins, John A. 2004. Complexity and Efficiency in Grammars. Oxford: Oxford
University Press.
Hofmeister, Philip. 2011. Representational complexity and memory retrieval in
language comprehension. Language and Cognitive Processes 26 (3): 376–405.
Hofmeister, Philip, Inbal Arnon, T. Florian Jaeger, Ivan A. Sag, and Neal Snider.
2013. The source ambiguity problem: Distinguishing the effects of grammar and
processing on acceptability judgments. Language and Cognitive Processes 28
(1–2): 48–87.
Hofmeister, Philip, Peter W. Culicover, and Susanne Winkler. Forthcoming.
Effects of processing on the acceptability of frozen extraposed constituents.
Syntax 19.
Hofmeister, Philip, T. Florian Jaeger, Ivan A. Sag, Inbal Arnon, and Neal Snider.
2007. Locality and accessibility in wh-questions. In Roots: Linguistics in Search
of Its Evidential Base, edited by Sam Featherston and Wolfgang Sternefeld,
185–206. Berlin: de Gruyter.
Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects.
Language 86 (2): 366–415.
Hofmeister, Philip, Laura Staum Casasanto, and Ivan A. Sag. 2013. Islands in the
grammar? Standards of evidence. In Experimental Syntax and Island Effects,
edited by Jon Sprouse and Norbert Hornstein, 42–63. Cambridge: Cambridge
University Press.
Jackendoff, Ray. 1972. Semantic Interpretation in Generative Grammar. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge, MA:
MIT Press.
Jackendoff, Ray. 2002. Foundations of Language. Oxford: Oxford University Press.
Kay, Paul. 2002. An informal sketch of a formal architecture for Construction
Grammar. Grammars 5 (1): 1–19.
Kay, Paul. 2005. Argument structure constructions and the argument-adjunct
distinction. In Grammatical Constructions: Back to the Roots, edited by Mirjam
Fried, 71–98. Amsterdam: John Benjamins.
Kay, Paul, and Charles J. Fillmore. 1999. Grammatical constructions and
linguistic generalizations: The What's X doing Y? construction. Language 75 (1):
1–33.
Kirby, Simon. 1997. Competing motivations and emergence: Explaining implica-
tional hierarchies. Linguistic Typology 1 (1): 5–32.
Kirby, Simon. 2002. Learning, bottlenecks, and the evolution of recursive syntax.
In Linguistic Evolution through Language Acquisition: Formal and Computa-
tional Models, edited by Edward J. Briscoe, 173204. Cambridge: Cambridge
University Press.
Kirby, Simon, Kenny Smith, and Henry Brighton. 2007. From UG to universals:
Linguistic adaptation through iterated learning. In What Counts as Evidence in
Linguistics: The Case of Innateness, edited by Martina Penke and Anette Rosen-
bach, 117–138. Amsterdam: John Benjamins.
Kluender, Robert. 1992. Deriving island constraints from principles of predica-
tion. In Island Constraints: Theory, Acquisition and Processing, edited by Helen
Goodluck and Michael Rochemont, 223–258. Dordrecht: Kluwer.
Kluender, Robert. 1998. On the distinction between strong and weak islands: A
processing perspective. In The Limits of Syntax, edited by Peter W. Culicover and
Louise McNally, 241–279. New York: Academic Press.
Kluender, Robert. 2004. Are subject islands subject to a processing account?
In WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal
Linguistics, edited by Vineeta Chand, Ann Kelleher, Angelo J. Rodríguez,
and Benjamin Schmeiser, 101–125. Somerville, MA: Cascadilla Press. http://
babel.ucsc.edu/~wagers/islands/readings/Kluender_WCCFL04.pdf.
Levy, Roger. 2005. Probabilistic Models of Word Order and Syntactic Discontinu-
ity. PhD diss., Stanford University.
Levy, Roger. 2008. Expectation-based syntactic comprehension. Cognition 106
(3): 1126–1177.
Marr, David. 1982. Vision. San Francisco: W.H. Freeman and Co.
Müller, Stefan. 2006. Phrasal or lexical constructions? Language 82 (4):
850–883.
Naigles, Letitia R. 2002. Form is easy, meaning is hard: Resolving a paradox in
early child language. Cognition 86 (2): 157–199.
Naigles, Letitia R., Erika Hoff, and Donna Vear. 2009. Flexibility in Early Verb
Use: Evidence From a Multiple-n Diary Study. Boston: Wiley-Blackwell.
Nguyen, Luan, Marten van Schijndel, and William Schuler. 2012. Accurate
unbounded dependency recovery using generalized categorial grammars. In Pro-
ceedings of COLING 2012: Technical Papers, 2125–2140. http://www.aclweb.org/
anthology/C/C12/C12-1130.pdf.
Ross, John R. 1967. Constraints on Variables in Syntax. PhD diss., MIT.
Sag, Ivan A. 2012. Sign-based Construction Grammar: An informal synopsis. In
Sign-based Construction Grammar, edited by Hans C. Boas and Ivan A. Sag,
39–170. Stanford, CA: CSLI.
Sag, Ivan A., Philip Hofmeister, and Neal Snider. 2007. Processing complexity in
subjacency violations: The complex noun phrase constraint. In Proceedings of the
43rd Annual Meeting of the Chicago Linguistic Society, edited by Malcolm
Elliott, James Kirby, Osamu Sawada, Eleni Staraki, and Suwon Yoon, 215–229.
Chicago: Chicago Linguistic Society.
Tomasello, Michael J. 2003. Constructing a Language. Cambridge, MA: Harvard
University Press.
2 What Makes Conceptual Semantics Special?

Urpo Nikanne

2.1 A Brief History of Conceptual Semantics

Noam Chomsky argued in his article "Remarks on Nominalization"
(1970) for two very influential hypotheses: (1) interpretative semantics
and (2) the lexicalist hypothesis. In interpretative semantics, the semantic
interpretation is based on the surface structure, that is, not on the deep
structure, as supposed by Generative Semantics. The lexicalist hypothesis
assumes that even derived lexical entries are in the lexicon, not derived
in syntax, as supposed by Generative Semantics. These hypotheses were
crucial for the theoretical development of Ray Jackendoff's Conceptual
Semantics, and in generative linguistics in general.
In his 1972 book Semantic Interpretation in Generative Grammar,
Jackendoff pointed out that it follows from the lexicalist hypothesis that
semantics cannot be derived from syntax or vice versa. This gives rise
to a need for a theory of language in which semantics plays as central
a role as syntax, while at the same time the theory of semantics is
compatible with the theory of syntax. Jackendoff introduced a set of
semantic functions (CAUSE, BE, GO, STAY) that are the building
blocks of event structure. In addition, he developed the first version of
the theory of linking syntactic and semantic structures. In his 1975 article
"Toward an explanatory semantic representation," he developed his
ideas further.
Jackendoffs 1983 book Semantics and Cognition is the most important
declaration of the Conceptual Semantics program for research. In
Semantics and Cognition, Jackendoff introduced the research program
and formulated its main principles. The idea of an integrated formal
theory of the human mind was set as the ultimate goal of the research,
and the theory of a modular mind was developed. In his later
publications, for example, Consciousness and the Computational Mind
(1987a), Semantic Structures (1990), Foundations of Language (2002),
and Language, Consciousness, Culture (2007), Jackendoff developed his
ideas further, but the big picture can already be found in Semantics and
Cognition.
I have been inspired by Jackendoff's research and Conceptual Seman-
tics since the early 1980s and agree that the theory of language should
be formal and integrated with the theory of the rest of the human mind.
My own work and the research done at Åbo Akademi University within
Conceptual Semantics have aimed at building a logical theory on the
tenets established by Jackendoff (Nikanne 1990, 1995, 1996, 2006, 2008;
Pörn 2004; Paulsen 2011; Petrova 2011).
Researchers are often occupied with their formalisms, statistics, gram-
matical details, etc., in their daily work and it may be hard for some to
see what the deepest essence of the research program actually is. This
chapter is my interpretation of the methodology and the linguistic world
view of Conceptual Semantics. Some of the principles are formulated by
Jackendoff. My goal is to find a systematic way to formalize the funda-
mental methodological building blocks of the approach, especially the
methodological guidelines.

2.2 Goals, Background Assumptions, and Methodological Guidelines

When characterizing a particular school of thought in science, the fol-
lowing things should be taken into account:
1. Goals of research
2. Background assumptions
3. Methodological guidelines
4. Formalisms and technical solutions
As the purpose of science is to find out the true nature of, or at least to
better understand, natural phenomena, the goals of the research are the
most fundamental features of a scientific approach. The goals consist of
two parts: the research topic (i.e., the natural phenomenon that the
research is about, such as language for linguistics) and the point of view
(i.e., the angle from which the natural phenomenon is approached, such
as whether language is studied as a social phenomenon, as a part of the
human mind, or as a formal apparatus, etc.). In order for the research to
make sense, the research topic and the point of view must, naturally, be
such that they can be studied by scientific methods.
Every theory has some background assumptions, that is, well-
motivated hypotheses about the nature of the research topic. These
assumptions are necessary because otherwise the research would
be based on wild guessing, without any direction. The background
assumptions are the basis for choosing the guidelines and tools for the
research.
Methodological guidelines are instructions for the researcher based on
the idea of how to do scientific research in a proper way. As the meth-
odological guidelines are supposed to provide a foundation for a solid
theory, they must be in accord with the background assumptions of the
research topic and the goals of the research. The methodological guide-
lines can even be seen as an action plan for the researcher: they guaran-
tee that the research is disciplined and the theory is developed in a
controlled manner.
The descriptions and explanations must be expressed in order for
the researcher to operate with them and in order for researchers to
communicate with each other. Therefore, formalisms and technical
solutions are necessary. Even though they do not define the theory,
they are nonetheless a very important part of the whole. Formalisms
are supposed to express something essential about the true nature of
the research topic from the chosen point of view. In addition, they
should be compatible with the background assumptions and the meth-
odological guidelines. This is the level of expression of scientific thinking.
The logic of the formalism is supposed to express the logic of the research
topic.
Formalisms and technical solutions are subject to change as the
research makes progress, the goals of research are approached, and
new things are learned about the research topics. Because background
assumptions are hypotheses, the researcher may find out that these
hypotheses do not hold, and they can be abandoned or modified.
One can also come up with new, motivated background assumptions
during a successful research process. Because the methodological
guidelines are dependent on the background assumptions, they are
also subject to change. The goals of research, however, are more
stable. One may abandon a goal of research, for instance, if it turns
out that it cannot be scientifically studied. A theoretical approach
may also spark interest in new research topics and new goals if it
turns out that they are closely related to the old ones and there is an
available methodology capable of revealing something real about their
nature.
The following illustration summarizes the discussion:

Goals of research: the parts of the world that the research tries to find out about and explain.
Background assumptions: motivated hypotheses on the nature of the research topic.
Methodological guidelines: the principles of the right way to do scientific work, given the goals of research and the background assumptions about the research topic.
Formalism and technical solutions: the formal expression of the research topic. The formalism and technical solutions must be compatible with the goals of research, background assumptions, and methodological guidelines.

Figure 2.1
Hierarchical levels of a theoretical approach

2.3 Characterization of Conceptual Semantics as a Scientific Approach

Conceptual Semantics is a cognitively-oriented theory of the language system, and it aims at an integrated theory of the mind. Conceptual Semantics seeks a better understanding, within the scientific and linguistic community, of how language functions as a part of human cognition. According to Nikanne (2008), Conceptual Semantics can be characterized as a scientific approach as illustrated in figure 2.2:

Goals of research: integrated theory of the human mind and language as a part of it.
Background assumptions: the systematic nature of the mind and language, partly universal mind and language, system-based form-oriented view, modularity of mind and language, cognitive constraints.
Methodological guidelines: formal approach, analytical organization, simple formation of modules, importance of linking, regularities before irregularities.
Formalism and technical solutions: representations, tiers, linking principles, compositionality of lexical semantics, no separate language-specific semantic representation; semantic functions, semantic fields, etc.

Figure 2.2
Hierarchical levels of Conceptual Semantics as a scientific approach. The goals of research
form the innermost level, and the formalism and technical solutions the outermost one (cf.
Nikanne 2008).

In what follows, I will explain briefly what the different layers in figure
2.2 stand for. I will concentrate on the goals, background assumptions,
and methodological guidelines. Many characteristic properties of
Conceptual Semantics, such as compositionality of lexical meanings,
semantic functions, semantic fields, etc., belong to the formalisms and
technical solutions of the theory, and they fall outside of the scope of this
chapter.

2.4 Goals of Research

Conceptual Semantics shares its goals with all cognitively-oriented theories of language: What is the best way to describe and explain language as a part of human cognition? How does language function and what are
the relationships between language and other cognitive domains?
At this level, the following two things must be taken as given:
1. The relevance of the research topic: language is a natural phenomenon
(i.e., the research topic is something real).
2. The relevance of the point of view: language is a part of the human
mind.
These assumptions may sound self-evident, but they are still something
to be aware of. If it, against all odds, turns out that there is reason to
believe that language is not a real phenomenon or that it is not a part of
the human mind, the approach would not be scientific.
Fortunately, there seems to be no reason to give up the goals of Con-
ceptual Semantics: language has been described successfully and gram-
mars have been written for thousands of years in different cultural
traditions (Itkonen 1991), so this long experience of research gives us
reason to believe that language is a relevant scientific research topic.
Some linguists study language primarily as a social phenomenon, a
tool for communication, while other linguists study language as a part of
the human mind. There is, however, no contradiction between these
points of view: even though language is a tool and a medium of com-
munication between people, it must be processed in the minds of indi-
vidual people.
In addition, communication consists of messages with a form and
content. The content of linguistic communication often refers to different
aspects of human life: emotions, actions, social relations, visual obser-
vations, aesthetic experiences, and so on. Language must link together
all this different information and give it a linguistic form (see Macnamara
[1978]; Jackendoff [1983]). There is, thus, a connection between language
and the other domains of human cognition (see the discussion on cogni-
tive constraints below). Language can even be used for communicating
information that is a result of imagination: lies, fairy tales, surrealistic
jokes, etc. A cognitive approach to language is a crucial part of the puzzle
when science tries to understand what human life consists of.
Note that, for example, Cognitive Linguistics (see, e.g., Langacker
[1987a,b] and Lakoff [1987], among many other texts by the same authors)
shares the goals of research with Conceptual Semantics but not the same
background assumptions and methodological guidelines. Conceptual
Semantics aims at a formal theory, whereas the Cognitive Linguistic approaches do not.

2.5 Background Assumptions

In order to have a plan for how to meaningfully approach the research object, one must have some motivated assumptions about its nature.
Background assumptions may be based on previous research, theoretical
argumentation, or sometimes even on common sense. These background
assumptions are needed for developing methodological tools for the
research. I will argue that Conceptual Semantics makes the following
background assumptions (cf. Nikanne 2008):
1. Systematic nature of language and the mind
2. Partly universal mind and language
3. System-based form-oriented view
4. Modularity of mind
5. Cognitive constraints
These background assumptions can be characterized as follows:

2.5.1 Systematic Nature of Language and the Mind


According to Nikanne (2008), an assumption regarding the systematic
nature of language (and the mind, mutatis mutandis) can be of four
degrees of strength:
1. All linguistic phenomena are governed by regular principles.
2. The essential linguistic phenomena are governed by regular principles.
3. There are linguistic phenomena governed by regular principles.
4. No linguistic phenomena are governed by regular principles.
The background assumptions of Conceptual Semantics are of degree 2
on this scale. They posit that core grammar is based on a system of rules,
but outside of the core grammar, there is also room for irregularities, for
example idiomatic expressions that are stored as wholes and may con-
stitute exceptions to the principles that govern the core grammar.

2.5.2 Partly Universal Mind and Language


As is well-known, languages have been successfully described using the
same formal means and categories: for example, predicates and argu-
ments, verbs and nouns, vowels and consonants, syllables and words.
Linguistic categories even seem to be organized as larger units according
to similar general principles.
In spite of their similarities, the grammatical systems of different languages and dialects differ when it comes to small details. To take a concrete example from phonology, Arabic has three vowels (a, i, u) but Finnish has eight (a, e, i, o, u, y, ä, ö). Still, both the Arabic and Finnish
vowel systems are based on the same universal set of possibilities. It is
not uncommon that two dialects have the same set of phonemes, but
different principles of combining phonemes. In Finnish, some dialects (in
the Botnia and Savo dialect areas) do not allow certain consonant clus-
ters, for example, lm and hm, even though most dialects, including standard Finnish, do. An epenthetic vowel is added between the consonants
in these dialects (for details, see, e.g., Kettunen [1940]; Karlsson [1983];
Suomi [1990]; Harrikari [1999]).
In Conceptual Semantics, the innateness hypothesis of language, which
often goes hand in hand with the hypothesis of universal grammar, has
not been a major issue. The discussion regarding semantics has empha-
sized that it is methodologically better to link the linguistic representa-
tions directly to conceptual structures that are assumed to be universal
(Jackendoff 1983). Linking principles may differ in different languages,
and this also includes the lexicon, which has been seen to be a part of
the linking rule system (see, e.g., Jackendoff [1987b, 1990]). When it
comes to the universal nature of linguistic representations (phonology,
syntax), Conceptual Semantics follows the mainstream of generative
linguistics in assuming that they are at least partly based on universal
categories and combinatory principles (on the innateness hypothesis, see,
e.g., Chomsky [1986]).
The universal part of grammar consists of principles that are the same
for all human languages. The core grammar of a particular language is a
realization of the possibilities offered by the universal principles that
restrict and govern the structure of human language (see, e.g., Chomsky
[1986] for a discussion).
The irregular parts are not always completely irregular, and the border
between the core grammar and the periphery (see Chomsky [1986]) is
not well-defined. For example, the English idiom kick the bucket 'die' can be described as any regular VP consisting of a transitive verb taking a direct object: The captain has kicked the bucket (cf. The captain has kicked the football). The irregular nature of the VP kick the bucket lies in the fact that the whole can be understood to mean 'die', which leaves, for instance, the NP the bucket without a clear semantic referent. Therefore,
the bucket seldom gets a modifier or is topicalized. (See, however, Petrova
[2011] for a thorough discussion on the variation of idioms in language use.) Unlike mainstream generative grammar, Conceptual Semantics (see, e.g., Jackendoff [1990]; Nikanne [1990, 2004]; Pörn [2004]; Petrova
[2011]; Paulsen [2011]) has been interested in the periphery and has
studied irregular linking principles, constructions, etc.
The layers of the grammatical system can be visualized as in
figure 2.3:

[Figure 2.3 shows three nested layers: the universal part of grammar, within the language-specific part of the core grammar of L, within the irregular part of the grammar of L; the two inner layers together constitute the CORE GRAMMAR OF LANGUAGE L, and all three together the GRAMMAR OF LANGUAGE L.]

Figure 2.3
The layers of grammar. The dashed lines indicate that there is no clear-cut borderline
between the layers. The core grammar of L consists of the universal part and the language
specific part. The whole grammar of L consists of the core grammar and an irregular part.

2.5.3 System-Based Form-Oriented View


Most theories assume that language has a form (i.e., structure: phonology,
morphology, syntax, etc.) and a system that can, at least to some extent,
make a distinction between grammatical and ungrammatical forms. In
addition, both language as a whole and its parts are used for particular
functions. Linguistic forms are not useful if they are not used in concrete
utterances, and the utterances always appear in some context. We can
define four different views on language by resorting to a simple model
to analyze which aspect of language the theory takes as primary and what
consequences follow from it:
1. System-based function-oriented view
2. System-based form-oriented view
3. Occurrence-based use-oriented view
4. Occurrence-based form-oriented view
Language is used for a variety of functions, and the different parts of the
language system (syntactic categories, affixes, phonemes, etc.) typically
serve particular functions. A theory may take the function of language
as its starting point and consider the form of language to be subordinate
to the function. This is the view generally adopted by so-called functional
theories.

[Figure 2.4 relates SYSTEM, FORM, USE, and OCCURRENCE: the function defines the well-formed (grammatical) structures; the system defines the function of the parts of the structure; the structures occur in concrete utterances; the utterances occur in some context.]

Figure 2.4
System-based function-oriented view

Another possibility is to take form as the fundamental aspect of language. Forms are primary, and certain forms are conventionalized to
serve specific functions. Therefore, in this approach, functions are subor-
dinate to forms. Generative theories, for instance, tend to take this
approach.

[Figure 2.5 relates SYSTEM, FORM, USE, and OCCURRENCE: the system defines the well-formed (grammatical) structures; the system defines the function of the parts of the structure; the structures occur in concrete utterances; the utterances occur in some context.]

Figure 2.5
System-based form-oriented view

Some linguists aim at basing their theories on the most concrete appear-
ance of language, namely the context. This perspective on language is
quite different from the system-based ones. Occurrence-based and use-
oriented approaches tend to take the frequency of particular parts of
structure as the fundamental tool in their analysis. These approaches are
frequency-based and probabilistic when it comes to the analysis of words
and expressions. In this view the system is an approximation based
on the typical (the most frequent) way the forms occur in concrete
contexts.
The fourth possibility is to take concrete utterances as formal units,
without their contexts, as the starting point. Taking this view would mean
that the primary aspect of language consists of concrete utterances that
would somehow be recognized. Then they would be interpreted in the context in which they occur. Function and structure are then subordinate to the concrete utterances and their concrete contexts. This is an unintuitive perspective on language, and as it is not widely represented among linguistic theories, I will not discuss it further.
The analysis above is only a tool for analyzing and understanding the
view of language as a background assumption of a linguistic theory. One
can easily come up with more possibilities by changing the order of the
boxes and the direction of the arrows.

[Figure 2.6 relates SYSTEM, FORM, USE, and OCCURRENCE: grammar (structure) is an approximation of the potential of structures to occur in concrete contexts; the function of the parts of structure is based on their occurrence in concrete contexts; the structures occur in concrete utterances; the utterances occur in some context.]

Figure 2.6
Occurrence-based use-oriented view

As pointed out already, Conceptual Semantics represents a system-based view of language. This approach posits that there is an underlying
system that governs both producing and decoding concrete utterances in
their contexts. The regular part of grammar is based on rules and prin-
ciples, but there is also room for an irregular part of grammar (see Jack-
endoff [1990] and the account of adjuncts).
Conceptual Semantics aims at describing and explaining how language
functions: how language translates ideas of human life (cf. section 2.4, Goals of Research, above and section 2.5.5, Cognitive Constraints, below) into perceptible linguistic forms (sounds, signs, etc.) and vice versa. I would still say that the Conceptual Semantics view of language is form-oriented. One of the
basic assumptions of Conceptual Semantics is that the several subsys-
tems of the human mind are built out of simple primitive building blocks,
and their combinations are governed by a set of principles. The form-
orientation is strongly reflected throughout the methodological guide-
lines (see below).

2.5.4 The Modularity of Mind


One of the background assumptions of Conceptual Semantics posits
that the human mind consists of several subsystems, that is, modules.
The modular hypothesis was put forth by Fodor (1983), but the modular-
ity assumed by Conceptual Semantics differs from the one suggested
originally by Fodor in two respects (see, e.g., Jackendoff [1987b]; Nikanne [1990, 2008]):
(i) In Conceptual Semantics, the autonomy of the modules is reflected
in the formalism. Each level of representation is a module of
its own.
(ii) In Conceptual Semantics, there is not only one-way traffic from the
peripheral modules to the central ones. Interaction in both directions
is possible.

2.5.5 Cognitive Constraints


According to Jackendoff (1983), "There must be levels of mental representation at which information conveyed by language is compatible with information from other peripheral systems such as vision, nonverbal audition, smell, kinesthesia, and so forth" (16). This is the content of the cognitive constraints. A necessary prerequisite for the cognitive constraints is, of course, the modularity hypothesis (see the discussion above).
The cognitive constraints are fundamental for the theoretical develop-
ment of Conceptual Semantics. The ultimate goal of the theory is to come
up with an integrated model of the human cognitive system, and without
the cognitive constraints, a representationally modular human mind
could not function as a whole.

2.6 Methodological Guidelines

In this section, I will explain what the methodological guidelines mean and how they are motivated. Specifically, I discuss five guidelines that
are characteristic of Conceptual Semantics.
1. Formal approach
2. Analytical organization
3. Simple formation of modules
4. Importance of linking
5. Regularities before irregularities
In the spirit of Martin Luther's Small Catechism, after each commandment I will explain what the guideline means ("What does this mean?") and what its motivation is ("Why is this?"). Certainly, there are further
guidelines that could be mentioned and the discussions could be longer.
My goal is to give the reader an idea of the methodological principles of
Conceptual Semantics without digressing into deep philosophical discussions.

2.6.1 Formal Approach


Formalize your statements.
What does this mean?
The Conceptual Semantics approach is formal; that is, statements about the research topic should be based on and presented in well-defined terms.
Why is this?
This guideline is based on the background assumption that language and
the mind are organized as systems. If language is a system, it should be
described as a system, and its behavior is to a large extent a consequence
of the properties of the system. There is no way around this. As Itkonen
(1983) points out, this is the requirement of explicitness that any scien-
tific theory must fulfil.

2.6.2 Analytical Organization


Keep the formation of formally independent sub-systems apart.
What does this mean?
If it can be shown that there is a part of the system that has its own
primitives and principles of combination, it constitutes a module of its
own.
Why is this?
It makes sense methodologically to keep the independent systems apart.
The understanding achieved of the independent modules is always useful.
If it turns out that two previously assumed independent subsystems are
actually in such close relationship that they should not be kept apart, the
knowledge of both subsystems can be used for the theory of the inte-
grated system. If the research tried to describe and explain all phenom-
ena with one large and complicated representation, the possibility of
independent subsystems would not arise, or at least it would be much
more improbable. Research that formalizes the model (cf. the guideline Formal approach above), keeps the formation of independent systems separate, and seeks and formalizes the links between subsystems (cf. the guideline Simple formation of modules below) is more likely to recognize whether the model includes subsystems that serve the same function and should therefore be merged together.

2.6.3 Simple Formation of Modules


Keep the formation of the sub-systems simple.
What does this mean?
The formation of sub-systems should contain as few primitives and as
simple principles of their combination as possible.
Why is this?
This is an application of Occam's Razor: "One should not increase, beyond what is necessary, the number of entities required to explain anything." The guideline Analytical organization above also suggests (even if it does not logically entail) that the representations be simple.

2.6.4 Importance of Linking


Study carefully the interaction between the modules.
What does this mean?
The principles that govern the correspondences between the subsystems
are a crucial part of the system.
Why is this?
As, according to the background assumptions of Conceptual Semantics,
language is a part of the mind and the mind works as a whole, the theory
should describe and explain how the whole works. As the different parts
of language together form a whole that can be expressed and understood, and as linguistic expressions include information from other cognitive domains (social relations, spatial relations, etc.), they must be linked together somehow. The model must include explicit assumptions about such links. Notice that the linking between representations does not always have to be one-to-one, because then the representations would be analogous to one another, and in practice both the guideline Analytical organization and the guideline Simple formation of modules would be violated.

2.6.5 Regularities before Irregularities


Try to find principles as general as possible.
What does this mean?
Check the possibility for regular principles before assuming irregular
principles. Even though the importance of irregularities is accepted, the
possibility of referring to regularities must be checked first.
Why is this?
This guideline, too, is based on Occam's Razor. The more general the
principles are, the more they cover. And in that way, we can learn what
the deep-down tendencies behind particular phenomena are. If each particular phenomenon were described with a particular principle, the
research would not lead to generalizations. One could describe practi-
cally anything using particular principles, but that would not help us
understand human language or the mind as a whole. Using only particu-
lar principles would not help us understand what is exceptional and what
is typical. Chomsky (1965) calls the general principles "weak" and the particular principles "strong," and he claims that a good theory should use
as weak principles as possible.
When the methodological guidelines Formal approach and Analytical
organization are combined, it follows that one should carefully study
what is the most natural representation for each phenomenon. If there
is a subsystem whose form is independent of other subsystems, it should
be treated as a module of its own. As the representations are kept as
simple as possible (in keeping with the guideline Simple formation of
modules), the linking between representations plays a fundamental role
in the theory.
In Conceptual Semantics there has never been any tendency to assume
that the linking between representations is trivial or even has to be very
simple (see Jackendoff 1990). In this respect, Conceptual Semantics is
similar to construction grammars (Fillmore and Kay 1996; Fillmore, Kay,
and O'Connor 1988; Goldberg 1995; Fried and Östman 2004; Kay 1995;
Croft 2001). Conceptual Semantics differs from some construction gram-
mars because of the guideline Regularities before irregularities. This
guideline is based on the background assumption that the nature of mind
and language is systematic.

2.7 Conclusion

I have discussed Conceptual Semantics from a general methodological point of view. Conceptual Semantics is a school of thought that concentrates on studying language as a part of the human mind, but the ultimate goal of Conceptual Semantics is an integrated formal theory of the human mind as a whole. The characteristic property of Conceptual Semantics is the conviction that language, as well as the rest of the human mind, is a form-based system. This makes it possible to successfully describe and explain the structure of language by formal means.
In this respect, Conceptual Semantics differs from many other cogni-
tively oriented approaches to language. The methodological guidelines
formulated in this chapter are based on this idea of language and mind
as form-based systems.

2.8 Acknowledgments

I would like to thank Åbo Akademi University for a six-month research period in the fall of 2013, which has made it possible for me to write this chapter. I would also like to thank the Finnish Society of Sciences and Letters for their financial support. Last but not least, I would like to thank the Faculty of Arts and Sciences of Tufts University for the opportunity to spend the fall semester of 2013 as a Research Fellow at the Center for Cognitive Studies. An earlier version of this chapter was presented at the Tufts Linguistic Seminar in August 2013. I would like to thank Professor Ray Jackendoff and the participants of the seminar for their valuable comments.

References

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT
Press.
Chomsky, Noam. 1970. Remarks on nominalization. In Readings in English Trans-
formational Grammar, edited by Roderick A. Jacobs and Peter S. Rosenbaum,
184–221. Waltham, MA: Ginn.
Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin, and Use.
Westport, CT: Praeger Publishers.
Croft, William. 2001. Radical Construction Grammar. Oxford: Oxford University
Press.
Fillmore, Charles J., and Paul Kay. 1997. Berkeley Construction Grammar. Latest update: February 27, 1997. Accessed August 19, 2013. http://www1.icsi.berkeley.edu/~kay/bcg/ConGram.html.
Fillmore, Charles J., Paul Kay, and Mary C. O'Connor. 1988. Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64 (3): 501–538.
Fodor, Jerry A. 1983. The Modularity of Mind. Cambridge, MA: MIT Press.
Fried, Miriam, and Jan-Ola Östman, eds. 2004. Construction Grammar in a Cross-language Perspective. Amsterdam: John Benjamins.
Goldberg, Adele. 1995. Constructions. Chicago: University of Chicago Press.
Harrikari, Heli. 1999. Epenthesis, geminates, and the OCP in Finnish. Nordic
Journal of Linguistics 22 (1): 3–26.
Itkonen, Esa. 1983. Causality in Linguistic Theory. Kent: Croom Helm.
Itkonen, Esa. 1991. Universal History of Linguistics: India, China, Arabia, Europe.
Amsterdam: John Benjamins.
Jackendoff, Ray S. 1972. Semantic Interpretation in Generative Grammar. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray S. 1975. Toward an explanatory semantic representation. Linguistic Inquiry 7 (1): 89–150.
Jackendoff, Ray S. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray S. 1987a. Consciousness and the Computational Mind. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray S. 1987b. The status of thematic relations in linguistic theory. Linguistic Inquiry 18: 369–411.
Jackendoff, Ray S. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray S. 2002. Foundations of Language: Brain, Meaning, Grammar,
Evolution. Oxford: Oxford University Press.
Jackendoff, Ray S. 2007. Language, Consciousness, Culture: Essays on Mental
Structure. Cambridge, MA: MIT Press.
Karlsson, Fred. 1983. Suomen kielen äänne- ja muotorakenne. Helsinki: WSOY.
Kay, Paul. 1995. Construction Grammar. In Handbook of Pragmatics: Manual,
edited by Jef Verschueren, Jan-Ola Östman, and Jan Blommaert, 171–177. Amsterdam: John Benjamins.
Kettunen, Lauri. 1940. Suomen murteet III. B, selityksiä murrekartastoon. Hel-
sinki: Finnish Literature Society.
Lakoff, George. 1987. Women, Fire, and Dangerous Things. Chicago: University
of Chicago Press.
Langacker, Ronald W. 1987a. Foundations of Cognitive Grammar. Vol. 1. Theo-
retical Prerequisites. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1987b. Foundations of Cognitive Grammar. Vol. 2. Descrip-
tive Application. Stanford, CA: Stanford University Press.
Macnamara, John. 1978. How can we talk about what we see? MS, Department
of Psychology, McGill University.
Nikanne, Urpo. 1990. Zones and Tiers: A Study of Argument Structure. Helsinki:
Finnish Literature Society.
Nikanne, Urpo. 1995. Action tier formation and argument linking. Studia Lin-
guistica 49 (1): 1–31.
Nikanne, Urpo. 2005. Constructions in Conceptual Semantics. In Construction
Grammars: Cognitive Grounding and Theoretical Extensions, edited by Jan-Ola
Östman and Mirjam Fried, 191–242. Amsterdam: John Benjamins.
Nikanne, Urpo. 2006. Aspectual case marking of object in Finnish. Research in
Language 4: 215–242.
Nikanne, Urpo. 2008. Conceptual Semantics. In Handbook of Pragmatics,
edited by Jan-Ola Östman and Jef Verschueren, 338–343. Amsterdam: John
Benjamins.
Östman, Jan-Ola, and Mirjam Fried, eds. 2005. Construction Grammars: Cognitive Grounding and Theoretical Extensions. Amsterdam: John Benjamins.
Paulsen, Geda. 2011. Causation and Dominance: A Study of Finnish Causative Verbs Expressing Social Dominance. Åbo: Åbo Akademi University Press.
Petrova, Oksana. 2011. Of Pearls and Pigs: A Conceptual-Semantic Tiernet Approach to Formal Representation of Structure and Variation of Phraseological Units. Åbo: Åbo Akademi University Press.
Pörn, Michaela. 2004. Suomen tunnekausatiiviverbit ja niiden lausemaiset täydennykset. Helsinki: Finnish Literature Society.
Suomi, Kari. 1990. Huomioita yleiskielen konsonanttien yhdistelyrajoituksista ja pohjalaismurteiden epenteettisestä vokaalista. Virittäjä 94 (2): 139–160.
3 Semantic Coordination without Syntactic Coordinators

Daniel Büring and Katharina Hartmann

One (of very many) important life lessons we can learn from Ray Jackendoff's work is to eschew quick identification of semantic properties with syntactic properties. Rather, we must allow for a good amount of independence between syntax and semantics, so that each realm stays
simpler. Plus, with a little luck, phenomena that resist analysis in either
dimension alone can be nicely divided and conquered (e.g., Culicover
and Jackendoff 2006).
Culicover and Jackendoff (1997) present arguments that a con-
struction can at the same time involve syntactic coordination and
semantic subordination, explaining many of its otherwise puzzling
properties. In this paper, we aim to make a similar argument for a
type of coordination in German in which the syntactic coordinator aber 'but' unexpectedly appears in a position characteristic of
conjunct-internal particles. We argue that, indeed, in these cases aber
is syntactically a sentence-internal particle, yet semantically it is the
coordinator it always was. Such an analysis is empirically adequate
and is arguably simpler than either of the alternatives (to wit: syntactic
displacement of a coordinator or analysis as juxtaposition, rather than
coordination).

3.1 Introduction

The German adversative coordinator aber 'but' allows for two classes of
syntactic construals. First, just like English but, it can occur between two
constituents of the same syntactic category, for example, V1 in (1a) and
S in (1b);1 in other words, it behaves like and, except that it carries an
adversative meaning:
(1) a. Lola ist reich, aber gönnt sich nie etwas.
L. [V1 is rich] but [V1 treats self never something]
Lola is rich but never treats herself to anything.
b. Lola soll sehr reich sein, aber sich nie etwas gönnen.
L. shall [S very rich be] but [S self never s.th. treat]
Lola is said to be very rich but to never treat herself to anything.
We refer to coordinators that appear in canonical coordinator positions
as SYNTACTIC COORDINATORS.
Second, however, aber can occur within the second conjunct:
(2) a. Lola ist reich, gönnt sich aber nie etwas.
L. [V1 is rich] [V1 treats self but never s.th.]
Lola is rich but never treats herself to anything.
b. Lola soll sehr reich sein, sich aber nie etwas gönnen.
L. shall [S very rich be] [S self but never s.th. treat]
Lola is said to be very rich but to never treat herself to anything.
The position aber occupies in (2) is a typical position for adverbials and
particles in German, but, needless to say, not for coordinators; compare (2a)/(2b) to (3a)/(3b):
(3) a. * Lola ist reich, gönnt sich und oft etwas.
L. [V1 is rich] [V1 treats self and often s.th.]
intended: Lola is rich and often treats herself to something.
b. * Lola soll sehr reich sein, sich und oft etwas gönnen.
L. shall [S very rich be] [S self and often s.th. treat]
intended: Lola is said to be very rich and to often treat
herself to something.
The sentences in (1) and (2) are equally acceptable and do not seem to
differ in meaning, not even broadly construed (i.e., neither in truth conditions nor, as best we can tell, in use conditions). The question is how
to analyze cases like (2), which we refer to as BURIED COORDINATORS.
Given that buried and unburied aber are identical in meaning, two
hypotheses suggest themselves immediately:
H1: Buried aber is truly a coordinator (just like conjunct-initial aber); its
surface position inside the second conjunct is deceptive.
H2: Buried aber is truly an adverb/particle (which happens to be
homophonous with the coordinator); the coordination is in fact
asyndetic (lacking a coordinator between the conjuncts).

[Figure 3.1 cross-classifies BUT (aber), HOWEVER (jedoch, allerdings), and NEVERTHELESS (trotzdem, dennoch) by syntactic category (syntactic coordinator vs. particle) and by semantic category (semantic coordinator vs. non-coordinating).]

Figure 3.1
The players: Adversative markers discussed in this paper

In this paper, we will argue for a synthesis of these positions:


H3: Buried aber is semantically a coordinator, but syntactically a (clause-
internal) particle.2
Sentences like (2) are thus syntactically asyndetic, but semantically
equivalent to regular coordinations. Aber has one meaning, but occurs
in two syntactic categories.
Our arguments for this conclusion call upon a number of other adver-
sative elements in German, which we will briefly introduce now; the full
cast of players, along with what we want to claim about them, is presented
in figure 3.1.
Allerdings and jedoch, both of which we gloss as 'however', are syntactically particles, and thus occur buried within the second conjunct; on the other hand, we argue, they are semantic coordinators, which means they have the same mix of properties as buried aber; these three elements will be called COORDINATING PARTICLES. These contrast with trotzdem and dennoch, glossed as 'nevertheless', which may occupy the same positions as allerdings/jedoch ('however') and buried aber (and are hence analyzed as syntactic particles, too), but are not semantic coordinators. We argue for this, among other things, by showing that asyndetic coordinations with the coordinating particles (buried aber and allerdings/jedoch 'however') are semantically complete (because these are semantic coordinators), but those with trotzdem and dennoch ('nevertheless'), or without any particle, are not.

3.2 Aber (and Jedoch, Allerdings) Is a Semantic Coordinator

This section presents in detail two arguments that buried aber (as well as jedoch and allerdings 'however') has a truly coordinating semantics. In the context of our final diagnosis, this is taken to indicate that, unlike semantically similar particles such as trotzdem and dennoch 'nevertheless', they are coordinating adversative particles.

3.2.1 Pragmatic Completeness in Coordination


If two conjuncts are juxtaposed without an overt coordinator, coordina-
tion is said to be asyndetic. Asyndetic coordination typically occurs
between all but the last two conjuncts of a multi-part coordination; see (4):
(4) Sie ist reich, besitzt eine Yacht und fährt Ski in St. Moritz.
she [V1 is rich] [V1 owns a yacht] and [V1 drives ski in St. Moritz]
She is rich, owns a yacht, and skis in St. Moritz.
Pure asyndetic coordination, as illustrated in (5a), gives an impression of
incompleteness, a notion of the sentence still being up-in-the-air. We
indicate this orthographically by . . . at the end of the sentence:
(5) a. Sie ist reich, besitzt eine Yacht . . .
she [V1 is rich] [V1 owns a yacht]
She is rich, owns a yacht . . .
b. Ich glaube, dass sie reich ist, eine Yacht besitzt . . .
I think that she [VP rich is] [VP a yacht owns]
I think that she is rich, owns a yacht . . .
Such coordinations are typically realized with a major prosodic break
between the conjuncts and both conjuncts ending in an intonational high
plateau (an H-L% in ToBI notation; Hirschberg and Beckman 1994;
Beckman, Hirschberg, and Shattuck-Hufnagel 2005). Although we are
not going to pursue this here, it seems plausible to assume that such
coordinations are in fact syndetic coordinations in which the final
conjunct(s) simply remain unuttered, which would explain their prag-
matic and prosodic signature.
No sense of incompleteness is found, of course, if a syntactic coordina-
tor is inserted, as in (6):
(6) Ich glaube, dass sie reich ist und eine Yacht besitzt.
I think that she [VP rich is] and [VP a yacht owns]
I think that she is rich and owns a yacht.
Likewise, the prosodic juncture between the two conjuncts in (6) can be
much less dramatic or even absent, and the second conjunct will typically
be realized with a final fall, as is characteristic of declarative sentences. A structure with a buried coordinator (and this constitutes our Exhibit A) clearly patterns with the syndetic coordination in (6), rather than the
asyndetic ones in (5):
(7) a. Sie ist nicht reich, besitzt aber eine Yacht.
she [V1 is not rich] [V1 owns but a yacht]
She is not rich, yet owns a yacht.
b. Ich glaube, dass sie nicht reich ist, ihr Bruder aber eine
I think that [S she not rich is] [S her brother but a
Yacht besitzt.
yacht owns]
I think that she isn't rich, but (that) her brother owns a yacht.
We submit that this contrast between sentences like (5) on the one hand
and those like (6) and (7) on the other should be taken seriously, even
though it merely involves intonation and pragmatic intuitions about
up-in-the-air-ness; asyndetic coordinations without aber are a different
species from those with buried aber.
In order to make the point we are arguing more perspicuous, we intro-
duce the term BARE COORDINATIONS for coordination structures that
involve neither a syntactic coordinator, nor buried aber (nor its class-
mates jedoch or allerdings, which will be discussed in more detail in
section 3.3 below). Our claim is that bare coordinations are pragmatically
incomplete and are marked so intonationally, but coordinations with
buried aber, and coordinating particles in general, are not. We conclude
from this that buried aber, apart from expressing adversativity, has a
genuinely coordinating function even in asyndetic coordinations (which
we will model by making it a semantic coordinator in section 3.4 below).
Before going on, let us note that the bare coordination counterparts
to (7) are even more marked than the bare coordinations in (5):
(8) a. ?? Sie ist nicht reich, besitzt eine Yacht . . .
she [V1 is not rich] [V1 owns a yacht]
She is not rich, owns a yacht . . .
b. ?? Ich glaube, dass sie nicht reich ist, ihr Bruder eine
I think that [S she not rich is] [S her brother a
Yacht besitzt . . .
yacht owns]
I think that she is not rich, (that) her brother owns a yacht . . .
We assume that the sentences in (8) suffer from an additional defect, namely a failure to mark the pragmatic opposition between the
conjuncts lexically. It seems fair to say that the syndetic coordinations
in (9) are odd in the same way, but of course lack any sense of
incompleteness:
(9) a. ?? Sie ist bettelarm und besitzt eine Yacht.
she [V1 is destitute] and [V1 owns a yacht]
She is destitute and owns a yacht.
b. ?? Ich glaube, dass sie bettelarm ist, und ihr Bruder
I believe that [S she destitute is] and [S her brother
eine Yacht besitzt.
a yacht owns]
I think that she is destitute and (that) her brother owns a
yacht.
The problem in (9) can be remedied by inserting adversative particles in
the second conjunct, for example dennoch or trotzdem 'nevertheless', as
in (10) (and of course aber, as seen in (7) above):3
(10) a. Sie ist nicht reich und besitzt dennoch
she [V1 is not rich] and [V1 owns nevertheless
eine Yacht.
a yacht]
She is not rich and owns a yacht nevertheless.
b. Ich glaube, dass sie nicht reich ist, und ihr Bruder
I believe that [S she not rich is] and [S her brother
trotzdem eine Yacht besitzt.
nevertheless a yacht owns]
I believe that she is not rich and (that) her brother owns a
yacht nevertheless.
In a manner of speaking, then, the addition of und 'and' removed the sense of incompleteness from (8), and the addition of dennoch/trotzdem 'nevertheless' remedied the marginality (??) of (9) that was due to the
lack of any indication of adversativity. Crucially, and as expected from
our perspective, adding dennoch/trotzdem to an asyndetic coordination
like (8) alone is not sufficient to make it pragmatically complete, though
it does remove the additional oddness:
(11) a. Sie ist nicht reich, besitzt dennoch eine Yacht . . .


she [V1 is not rich] [V1 owns nevertheless a yacht]
She is not rich, owns a yacht nevertheless . . .
b. Ich glaube, dass sie nicht reich ist, ihr
I believe that [S she not rich is] [S her
Bruder trotzdem eine Yacht besitzt . . .
brother nevertheless a yacht owns]
I believe that she is not rich, (that) her brother owns a yacht
nevertheless . . .
We summarize this state of affairs as follows: dennoch/trotzdem and
buried aber are all particles and adversative markers, but only buried
aber is a semantic coordinator as well. Therefore, asyndetic coordinations
with dennoch/trotzdem are pragmatically incomplete (they are the adversative variant of a bare coordination), while those with buried aber are not.

3.2.2 Zwar
Our Exhibit B for arguing that buried aber is a true semantic coordinator
involves the concessive particle zwar, inserted in the first conjunct.
Similar to English true . . . but, German zwar absolutely requires an
adversative coordinator in the second conjunct, which can be aber either
in coordinator position or, crucially, buried:
(12) a. Sie ist zwar nicht reich, aber sie besitzt eine Yacht.
[V2 she is zwar not rich] but [V2 she owns a yacht]
True, she is not rich, but she owns a yacht.
b. Sie ist zwar nicht reich, besitzt aber eine Yacht.
she [V1 is zwar not rich] [V1 owns but a yacht]
True, she is not rich, but she owns a yacht.
Crucially, the adversative particles dennoch and trotzdem 'nevertheless'
we met earlier do not qualify well as confederates for zwar, with
or without a syntactic coordinator; compare (13) to (10a) and (11a)
above:4
(13) * Sie ist zwar nicht reich, (und) besitzt dennoch/trotzdem
she [V1 is zwar not rich] (and) [V1 owns nevertheless
eine Yacht.
a yacht]
intended: True, she is not rich, (but) owns a yacht nevertheless.
From a distributional point of view, this suffices to make the argument: buried aber patterns with the syntactic coordinator aber in allowing zwar,
and not with adversative particles like dennoch/trotzdem 'nevertheless', which cannot co-occur with zwar.
We claimed above that the difference between buried aber and the
particles dennoch/trotzdem is that only the former is a semantic coordi-
nator. We can then conveniently blame the unacceptability of (13) on the
same fact: zwar requires a contrasting second conjunct with an adversa-
tive semantic coordinator.

3.3 Other Buried Coordinators

In the previous section we have shown that buried aber behaves just
like the syntactic coordinator aber: it makes for a pragmatically
complete coordination, and it can satisfy zwar's appetite for an adversative second conjunct, two things the regular adversative 'nevertheless'-
type particles cannot. This may seem like evidence in favor of H1. Given
that aber also occurs as an uncontroversial syntactic coordinator, why
not claim that buried aber is in fact the same as the syntactic coordinator,
shuffled into the second conjunct by some syntactic displacement
operation?
In this section we will turn to two other adversative particles,
jedoch and allerdings ('however'). The crucial observation is that these (unlike trotzdem and dennoch 'nevertheless', discussed in the previous section) share all the properties we took to be indicative of buried aber's coordinator status, but that they cannot occur as syntactic coordinators.
This means there has to be an analysis of these properties that does not
rely on being a syntactic coordinator.

3.3.1 Jedoch and Allerdings Are Semantic, but Not Syntactic, Coordinators
First, asyndetic coordinations with allerdings or jedoch ('however') are
pragmatically complete, just like their counterparts with buried aber;
compare (14) to (7) above:5
(14) a. Sie ist nicht reich, besitzt jedoch/allerdings eine Yacht.


she [V1 is not rich] [V1 owns however a yacht]
She isn't rich, owns a yacht, however.
b. Ich glaube, dass sie nicht reich ist, ihr
I think that [S she not rich is] [S her
Bruder jedoch/allerdings eine Yacht besitzt.
brother however a yacht owns]
I think that she isn't rich, (that) her brother, however, owns a
yacht.
Jedoch/allerdings ('however') thus pattern with buried aber, and not with the adversative particles trotzdem/dennoch ('nevertheless'). Second, jedoch/allerdings ('however') occur with zwar in the first conjunct, just like buried aber, and unlike trotzdem/dennoch ('nevertheless');
compare (15) to (12b) above:
(15) Sie ist zwar nicht reich, besitzt jedoch eine Yacht.
she [V1 is zwar not rich] [V1 owns however a yacht]
She might well not be rich, however owns a yacht.
All of this would indicate that jedoch/allerdings are just like aber,
were it not for the fact that they are not syntactic coordinators: (16a) is
completely impossible, in sharp contrast to the impeccable (1a), repeated
as (16b):
(16) a. * Lola ist reich, jedoch/allerdings gönnt sich nie etwas.
L. [V1 is rich] however [V1 treats self never s.th.]
intended: Lola is rich, however never treats herself to
anything.
b. Lola ist reich, aber gönnt sich nie etwas.
L. [V1 is rich] but [V1 treats self never s.th.]
Lola is rich but never treats herself to anything.
This is straightforwardly modelled if we say that jedoch/allerdings
are particles, but not syntactic coordinators. However, that in turn
means that none of the properties of buried aber discussed above can
be blamed on its syntactic status as a coordinator, for jedoch/allerdings
share all of them. Put differently, if we insisted that all the differences
between aber and the 'nevertheless' particles trotzdem/dennoch discussed in section 3.2 could ultimately be reduced to aber being a syntactic coordinator, we would still be left with the task of explaining the difference between the 'nevertheless' particles and the 'however' particles jedoch/allerdings, and in particular why the latter behave exactly like buried aber.
The strategy we choose instead is to assume that jedoch/allerdings and
aber, but not trotzdem/dennoch, are semantic coordinators. The existence
of jedoch/allerdings shows that being a semantic coordinator is indepen-
dent of being a syntactic coordinator. Aber is both, jedoch/allerdings only
the former, and trotzdem/jedoch neither. And of course, as und shows,
not every semantic coordinator can occur buried within the second
conjunct.

3.3.2 Semantic Coordinators Cannot Be Doubled


Having identified other buried coordinators besides aber brings us into
a position to mount a final argument for our claim that these are in fact
semantic coordinators, while other particles are not: allerdings, jedoch,
and buried aber cannot co-occur with syntactic coordinators, while the
non-coordinating adversative particles trotzdem and dennoch can; this
contrast is illustrated in (17) (recall that (17a) without und or aber is a
perfectly acceptable instance of buried coordinators):6
(17) a. * Sie ist reich und/aber arbeitet jedoch/allerdings/
she [V1 is rich] and/but [V1 works however/
aber an der Tankstelle.
but at the gas station]
intended: She is rich, however (she) works at the gas station.
b. Sie ist reich und/aber arbeitet trotzdem/dennoch
she [V1 is rich] and/but [V1 works nevertheless
an der Tankstelle.
at the gas station]
She is rich and/but works at the gas station nevertheless.
This fact once again confirms our contention that the elements in (17a)
are themselves semantic coordinators; adding an additional syntactic
coordinator is redundant. In fact, assuming that syntactic coordinators (und and non-buried aber) are also semantic coordinators, this follows
from the semantics we will sketch in section 3.4 below.

3.4 Semantics

We assume that aber, as well as the coordinating particles allerdings and jedoch ('however'), denotes a relation between propositions (sentence meanings), a relation we simply write as adv, so that adv(s2)(s1) implies
something like "s2 contradicts an expectation triggered by s1" (how to spell out the adversative relation in detail is immaterial for the purposes of this paper; see, e.g., Umbach 2004; Vicente 2010). This meaning, we assume, is a conventional implicature, though nothing hinges on this; the literal meaning of s1 aber s2 would then be the same as that of s1 und s2: s1 & s2.
When coordinating two complete sentences, as schematized in (18),
semantic composition proceeds smoothly via function application or
something equivalent to it:
(18) [(c) [she is rich] [(b) aber [(a) her brother works at the gas station]]]
a. that her brother works at the gas station
b. adv(that her brother works at the gas station)
c. adv(that her brother works at the gas station)(that she is rich)
When aber occurs in embedded position, as, for example, in (2), its syn-
tactic argument does not denote a complete proposition; see (19a). In
this case, semantic composition with aber/adv proceeds via function com-
position7 to yield (19b);8 the resulting two-place function combines with
the remainder of the second clause by function application again, yield-
ing (19c):
(19) [(d) [she is rich][(c) her brother [(b) aber [(a) works at the gas
station]]]]
a. λx. that x works at the gas station
b. λx. adv(that x works at the gas station) (= adv ∘ (19a))
c. [λx. adv(that x works at the gas station)](her brother)
   = adv(that her brother works at the gas station)
d. adv(that her brother works at the gas station)(that she is rich)
Note that (19c) is the same function as (18b) (as becomes clear after β-reduction; see the second line in (19c)). That is, [aber [S2 . . .]] in (18)
and [S2 . . . aber . . .] in (19) denote the same function, as desired; whether
aber occurs as a coordinator or within the second conjunct makes no
semantic difference.
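A minimal type-theoretic recap of this step may be helpful; the type labels (individuals of type e, propositions of type t) and the abbreviation W(x) for the proposition that x works at the gas station are introduced here purely for illustration, as one way of making the composition explicit:
\[
\mathrm{adv} : \langle t,\langle t,t\rangle\rangle
\qquad\qquad
(19\mathrm{a}) \;=\; \lambda x.\,W(x) \;:\; \langle e,t\rangle
\]
\[
\mathrm{adv} \circ (19\mathrm{a}) \;=\; \lambda x.\,\mathrm{adv}(W(x)) \;:\; \langle e,\langle t,t\rangle\rangle
\]
Since adv expects a proposition (type t) but (19a) is of type ⟨e,t⟩, direct function application is undefined; composing first and then applying the result to her brother (type e) yields adv(W(her brother)), which is exactly (18b)/(19c).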
It is rather straightforward to explain, finally, why a clause with buried
aber or allerdings/jedoch cannot be inserted in a syndetic coordination;
recall (17a) above. Assume that the coordinators und and aber denote
the logical 'and', that is, λp1.λp2. p1 & p2. Then combining, for example,
(18b)/(19c) with it will, again by function composition, yield the function
in (20):
(20) λp3.λp2. adv(that her brother works at the gas station)(p3) & p2
This, however, cannot combine with another proposition to yield a sentence meaning (it would, in fact, require two propositions to do so). It is
not an appropriate meaning for a coordination lacking a first coordinate.
This seems sufficient as a starting point to explain the impossibility of
coordinator doubling.
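The type mismatch behind this can be made explicit in the same illustrative notation (again, the type labels are only a sketch, assuming propositions of type t):
\[
\mathit{und} \;=\; \lambda p_1.\lambda p_2.\, p_1 \mathbin{\&} p_2 \;:\; \langle t,\langle t,t\rangle\rangle
\qquad\qquad
(19\mathrm{c}) \;:\; \langle t,t\rangle
\]
\[
\mathit{und} \circ (19\mathrm{c}) \;=\; \lambda p_3.\lambda p_2.\,\mathrm{adv}(\ldots)(p_3) \mathbin{\&} p_2 \;:\; \langle t,\langle t,t\rangle\rangle
\]
Applying this to the single overt conjunct (type t) still leaves an unsaturated ⟨t,t⟩ function rather than a proposition, so no sentence meaning results for (17a).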
Let us address two loose ends before closing this section. First, con-
sider adversative particles that are not semantic coordinators, such as
trotzdem and dennoch ('nevertheless'). Although these are not synony-
mous with aber/jedoch/allerdings, we still assume that their semantic
content, too, involves an adversative relation, which we will call adv*, so
that again adv*(p1)(p2) implies that the simultaneous truth of p1 and p2
is less expected than that of not p1 and p2; for short, 'p2 despite p1'.
The crucial difference, we want to suggest, is that the second argument
of trotzdem/dennoch is anaphoric. For illustration, consider trotzdem,
which conveniently provides morphological evidence for this idea, as it
is literally 'despite.that'. Assume that trotz in fact denotes adv*, while
dem is a propositional anaphor that receives its meaning from context.
Trotzdem, then, denotes not a relation between propositions, but the
property 'despite δ', where δ is the denotation of dem. This property holds of any proposition p1 whose truth leads one to expect that not δ rather than δ.
The crucial part in this is that a sentence containing trotzdem denotes
the same kind of semantic object as one without it: a proposition. The
second propositional argument of adv* is saturated by dem, and hence
no semantic argument place for a first conjunct is provided by trotzdem.
This is what it means to not be a semantic coordinator. The analysis for
dennoch proceeds identically, except that the propositional anaphor here
has no morphological reflex.
The second loose end regards sentences with aber, jedoch, or allerd-
ings, but without coordinations, such as in (21):
(21) Sie ist reich. Sie gibt ihr Geld aber/jedoch nicht gern aus.
she is rich she give her money but however not happily out
She is rich. She doesn't like to spend her money, though.
From what we said so far, it follows that the second sentence in (21)
denotes a function from propositions to propositions, not a proposition
(as a declarative sentence should). It is, in effect, a second conjunct
waiting for a first. In contrast, the same sentence with trotzdem in place
of aber should denote a proposition.
We acknowledge that this is a puzzling result, although no more puzzling than the fact that in general final conjuncts can occur as indepen-
dent sentences, or indeed as independent utterances:
(22) Speaker A: She is rich.
Speaker B: And/But she has good taste.
Descriptively speaking, the linearly first argument of a syntactic coordi-
nator can remain unuttered when its content is salient in the context.
Our claim is that buried coordinators are semantically identical to run-
of-the-mill syntactic coordinators (the kind that precede the final con-
junct), and that their semantic content is sufficient to create a pragmatically
complete coordination. However one goes about explaining cases like
Speaker B in (22), the explanation will carry over to (21). What is impor-
tant is simply that one would not attempt to explain Speaker B's utter-
ance in (22) by saying that and/but are not ever relational, so by the same
token, (21) is not an argument against our claim that buried coordinators
are relational.

3.5 The Syntax of Embedded Coordinators

This section looks at the syntactic distribution of the adversative ele-


ments discussed in this paper. So far we have suggested that all adversa-
tive particles, whether semantically coordinating or not, including buried
aber, belong to the same syntactic class, particles. This leads us to
expect that they have the same distribution within their clause, which is
by and large correct, but not entirely. Although we cannot offer an analy-
sis of the distributional differences, we will document them in some detail
in this section.
For clarity of exposition, we will consider three topological regions of
the German clause: (i) positions following the finite verb (the so-called
Mittelfeld), (ii) positions immediately following the initial constituent
in V2 clauses, and (iii) the initial position in V2 clauses itself.

3.5.1 Particles in the Mittelfeld


As far as we can tell, all elements considered in this paper show the same
distribution when they occur after the second-position, finite verb. As is
typical for particles and adverbials, they appear after weak pronouns if
present, as in (23a), or immediately after the finite verb if there are none,
as in (23b):

(23) a. Sie ist reich, gesteht es sich jedoch nicht ein.


she [V1 is rich] [V1 admits it self however not in]
She is rich, however doesn't admit it to herself.
b. Sie ist reich, kauft aber beim Hofer ein.
she [V1 is rich] [V1 buys but at H. in]
She is rich, but shops at the Hofer store.
The particle may be realized above or below a scrambled constituent,
which, following Riemsdijk (1978) and many others, appears outside
of VP. Example (24), from Haider and Rosengren (1998, 13), shows
scrambling of a quantified NP. The sentence is ambiguous between
a wide scope and a narrow scope interpretation of the quantified NP
fast jedes Bild, which is taken as evidence for scrambling (the ambiguity
is lost if the accusative NP occurs in its base-position below the
dative NP):
(24) . . . dass man fast jedes Bild mindestens einem
that one almost every painting at least one.dat
Experten zeigte.
expert.dat showed
. . . that almost every painting was shown to at least one expert.
If a scrambled structure like (24) occurs as the second conjunct of an
adversative coordination, the buried coordinator may occur above the
scrambled constituent, as in (25a), or below it, as in (25b):
(25) Man konnte eine Fälschung nicht ausschließen, . . .
one [V1 could a counterfeit not exclude]
One couldn't exclude a forgery . . .
a. . . . hat allerdings fast jedes
[V1 has however almost every
Bild mindestens einem Experten gezeigt.
painting at least one expert shown]
b. hat fast jedes Bild allerdings
[V1 has almost every painting however
mindestens einem Experten gezeigt.
at least one expert shown]
(for a–b): . . . but showed almost every painting to at least one
expert.
In the preceding examples we have alternated aber, jedoch, and allerd-
ings, but the claim is that any of these can occur in any example.

Furthermore, when the asyndetic coordinations in these examples are


made syndetic, aber/jedoch/allerdings can be, and must be, replaced by
trotzdem/dennoch:
(26) Die Polizei vermutet, dass die gestohlenen
the police assumes [S that the stolen
Bilder in diesem Haus versteckt sind,
paintings in this house hidden are]
und dass [ diese Tür aufzubrechen] trotzdem/dennoch
and [S that this door to break open nevertheless
keiner je versucht hat.
nobody ever tried has]
The police assumes that the stolen paintings are hidden in this
house and that nevertheless nobody ever tried to break this door.
Again, we surmise that this holds in general for both dennoch and trotz-
dem in all examples discussed.
This picture is what we would expect if indeed all of these elements were of
the same syntactic category, particle, and their clause-internal distribu-
tion depended on that alone. Alas, this comforting identity of distribution
breaks down in two positions, to which we now turn.

3.5.2 Post-Initial Position


All coordinating particles can occur in a position between the initial
constituent and the finite verb, as in (27):
(27) Sie ist eher konservativ, ihr Bruder aber/
[V2 she is rather more conservative] [V2 her brother
jedoch/allerdings ist bei den Grünen.
but/however is at the Greens]
She is rather more conservative, but her brother is with the
Greens.
Standard wisdom has it that at most one constituent can precede the
finite verb in a German main clause, which implies that the coordinating
particles in (27) should be analyzed as right-adjoined to the initial subject
DP, against any semantic intuitions.
To make matters worse, all other candidate elements we can think of,
including crucially the adversative particles in the nevertheless class, are
impossible in that position:

(28) * Sie ist eher konservativ, (und) ihr Bruder


[V2 she is rather more conservative] and [V2 her brother
trotzdem/dennoch ist bei den Grünen.
nevertheless is at the Greens]
intended: She is rather conservative, (and) her brother is with the
Greens nevertheless.
So here we have a case in which the syntactic class of adversative par-
ticles in fact splits up into two subclasses: those that may occur in post-
initial position (whatever structural position that may be) and those that
may not. The former coincides with the class of semantic coordinators,
but at present we do not have a hunch as to why these two properties
(being a coordinating particle and occurring in post-initial position)
should be correlated.
Pasch et al. (2003, 498) observe that the post-initial placement of coor-
dinating particles requires that the initial constituent be contrastive, as
in (29a). If the initial constituent is given, and is hence a fortiori not
contrastive, post-initial placement is impossible, as in (29b), even though
there is nothing wrong with having a given constituent in the initial posi-
tion in general, as in (29c):9
(29) Sie hat in ihrem Berufsleben nicht ein einziges
she has in her professional career not a single
Mal das Flugzeug benutzt, . . .
time the airplane used
She has not once used an airplane in her professional career . . .
a. . . . [auf das Auto] aber konnte sie nicht verzichten.
on the car but could she not dispense
b. * . . . [sie] aber konnte auf das Auto nicht verzichten.
she but could on the car not dispense
c. . . . [sie] konnte (aber) nicht auf das Auto verzichten.
she could but not on the car dispense
(for ac): . . . but she could not do without a car.
Intuitively, this cashes in on the affinity of adversative particles to con-
trastive topic constructions, of which (29a) (and (27)) are arguably
instances, but again we do not have a more precise statement to offer,
nor do we think that the non-coordinating adversative particles in (28)
are prima facie any less appropriate for such uses.

3.5.3 Initial Position


However- and nevertheless-type particles (i.e., coordinating or not)
may occur as the sole preverbal constituent in a second conjunct:10
(30) a. Sie ist reich, jedoch/allerdings arbeitet ihr Bruder
[V2 she is rich] [V2 however works her brother
an der Tankstelle.
at the gas station]
She is rich, her brother, however, works at the gas station.
b. Sie ist reich, und trotzdem/dennoch arbeitet ihr
[V2 she is rich] and [V2 nevertheless works her
Bruder an der Tankstelle.
brother at the gas station]
She is rich, and nevertheless her brother works at the gas
station.
This is expected since, as a rule, any constituent may occupy this position
in German. What is utterly surprising in this light is that aber does not
have this option:
(31) * Sie ist reich, aber arbeitet ihr Bruder an der Tankstelle.
[V2 she is rich] [V2 but works her brother at the gas station]
intended: She is rich, but her brother works at the gas station.
We have so far assumed that the possible positions for aber were
those for a syntactic coordinator, or whatever positions jedoch and
allerdings occur in (which in turn are by and large the positions that
any particle can occur in, plus those discussed in section 3.5.2). In light
of (31), however, this neat subset relation breaks down: aber may
occur in one position impossible for bona fide particles (the syntactic
coordinator position) and is banned from one of the positions particles
are possible in (initial position in V2-clauses); this is summarized in
figure 3.2.
It seems to us that this distributional picture resists modelling in terms
of primitive syntactic categories; rather, lexical items must be assigned
to one or more specific distributional classes, without obvious external
correlates.

3.6 Summary

Our contribution gives a glimpse of the complex field of adversative


particles in German. These elements do not form a unified class, and even

Figure 3.2
Adversative markers with refined syntactic distribution
[Diagram: aber ('BUT'), jedoch/allerdings ('HOWEVER'), and trotzdem/dennoch ('NEVERTHELESS') cross-classified by syntactic category (syntactic coordinator vs. particle, with the positions [initial], (anywhere), and [post-initial]) and by semantic category (semantic coordinator vs. non-coordinating).]

resist a division into neat subclasses. Syntactically, the traditional separa-


tion between syntactic coordinator and particle appears to allow for a
classification, the only complication being aber, which belongs to both
classes. Semantically, however, we argue that aber patterns with a sub-
class of the adversative particles in that it acts as a true coordinator. This
result is noteworthy especially with respect to buried particles, including
aber, whose ability to coordinate is not evident at first. The proof is fur-
nished mainly by asyndetic coordinations, which are pragmatically com-
pleted only in the presence of a truly coordinating element.
In conclusion, this article contributes a further argument to the everlasting
debate around the proper architecture of the syntax-semantics interface,
showing that neither the syntactic position of an element nor its
membership in a class of like elements necessarily reveals its semantic properties.
With respect to coordination, it shows that two conjuncts may be seman-
tically coordinated in the absence of a genuine syntactic coordinator.

Notes

1. We somewhat agnostically use the following syntactic labellings for German


examples: V2 for complete verb-second clauses (CP in most contemporary analy-
ses), V1 for finite verb-initial constituents, S′ for complementizer-initial verb-
final clauses (≈ CP), S for a clausal constituent with final finite verb, and VP for
a constituent with final verb and no subject.

Note that S constituents do not necessarily contain a subject, either, as that may
be outside the coordination; in such cases, S-hood is diagnosed by the presence
of other uncontroversially VP-external elements such as weak pronouns (e.g.,
sich in (1b), (2b); see also section 3.5.1).
2. What we call particles in this paper are equally commonly classified as
adverbials; nothing hinges on this distinction here.
3. An anonymous reviewer suggests that the oddness of failing to mark prag-
matic opposition seen in (8) and (9) might be explained as an instance of failure
to maximize presupposition: aber, dennoch, trotzdem, etc. grammatically express
opposition, while plain und does not, so the former are in a sense stronger,
and, where they are appropriate, block using the latter due to some principle
of 'Maximize presupposition!' (Heim 1991).
We think this is a plausible suggestion, except that it is unclear to us how the
contrastive or adversative content of aber and its ilk could be a presupposi-
tion (given that A aber B clearly presupposes neither A nor B, how could it
presuppose any relation between them?). Assuming instead that it is a conven-
tional implicature, we could perhaps derive the intended effect from a generaliza-
tion of 'Maximize presupposition!' to something like 'Maximize non-at-issue
content!'
4. We find examples like (13) seriously degraded. A reviewer suggests, however,
that examples similar to (13) could be found in corpora, and that they do occur
in Google search results.
To obtain a more systematic picture, we ran a search on a 22,248,965-word corpus
of German newspaper texts, Berliner Morgenpost, October 1997, May–December
1998, January–December 1999, using the COSMAS II web interface provided by
the Institut für deutsche Sprache, Mannheim. We found that of the 7,962 occur-
rences of zwar, only 1.39% occur as sentence-mates with dennoch/trotzdem but
without one of aber/jedoch/doch/allerdings (more than half of them clause ini-
tially, incidentally); in contrast, 63.26% of zwar co-occur with aber/jedoch/doch/
allerdings (and without dennoch or trotzdem) in the same sentence (we didn't
search for co-occurrences across sentence boundaries, which probably accounts
for most of the remaining 35%).
Even considering that aber/jedoch/doch/allerdings are more than 15 times more
frequent than dennoch/trotzdem in total, they are still in fact more than 45 times
more frequent with zwar and without dennoch/trotzdem than with dennoch/
trotzdem, and without aber/jedoch/doch/allerdings. We take this to confirm our
original judgment that there is a marked and systematic difference between the
two classes.
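As a back-of-the-envelope check of the figures just cited (a sketch only; the counts and percentages are simply those reported above, and the implied absolute counts are derived from them):

```python
# Redoing the arithmetic behind the comparison reported in note 4.
zwar_total = 7962                   # occurrences of 'zwar' in the corpus
pct_with_dennoch_trotzdem = 1.39    # % of 'zwar' with dennoch/trotzdem, without aber etc.
pct_with_aber_etc = 63.26           # % of 'zwar' with aber/jedoch/doch/allerdings,
                                    # without dennoch/trotzdem

print(round(pct_with_aber_etc / pct_with_dennoch_trotzdem, 1))   # 45.5 -> "more than 45 times"

# Absolute counts implied by the percentages:
print(round(zwar_total * pct_with_dennoch_trotzdem / 100))       # ~111 sentences
print(round(zwar_total * pct_with_aber_etc / 100))               # ~5037 sentences
```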
5. The English translations with however work less than perfectly (we think
because however prefers to have its contrasting element in a separate sentence);
we provide them nonetheless in order to conveyas best as possiblea feel for
the German construction.
6. Again, prompted by a reviewers Google result similar to (17a), we conducted
a search on a 4,491,138-word tagged corpus of German newspaper texts
(Tagged-C), using the COSMAS II web interface provided by the Institut für

deutsche Sprache, Mannheim. The results confirm our intuitive judgements.


While allerdings and jedoch occur about three times more often in the corpus
than trotzdem and dennoch, the latter occur more than 50 times more often (1,315
and 1,187 occurrences, respectively) than the former (25 and 23 occurrences,
respectively) in the context . . . und/oder V allerdings/jedoch/trotzdem/dennoch
(excluding V und/oder V coordinations, as the most blatant case of non-clausal
coordination).
7. Function Composition: For any functions f : X → Y and g : Y → Z, g ∘ f is that
function h such that for any x ∈ X, h(x) = g(f(x)).
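A short Python rendering of this definition (purely illustrative; the sample functions are arbitrary choices, not from the chapter):

```python
# Function composition as in note 7: (g . f)(x) = g(f(x)).
def compose(g, f):
    """Return the function x -> g(f(x))."""
    return lambda x: g(f(x))

increment = lambda x: x + 1
double = lambda x: 2 * x

h = compose(double, increment)   # h(x) = double(increment(x))
print(h(3))                      # 8
```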
8. An alternative suggested by an anonymous reviewer would, as we under-
stand it, give up the assumption that every syntactic constituent is associated
with its own denotation, so that in particular adv never needs to compose seman-
tically with a predicate. This would obviously solve the problem of interpreting
buried aber. The derivations given in this section assume the worst-case sce-
nario, i.e., that semantic composition proceeds entirely compositionally, in
lockstep with the syntactic structure.
9. Nor is the post-initial position of the coordinating particle obligatory in order
to get a contrastive focus interpretation of the fronted XP. Example (i) presents
a further option for a second conjunct to (29) with a fronted contrasting constitu-
ent but a low embedded coordinator:
(i) . . . [ auf das Auto] konnte sie aber nicht verzichten.
on the car could she but not dispense with
. . . but she could not do without a car.
10. As expected, a syntactic coordinator must precede a second conjunct contain-
ing the particles trotzdem/dennoch, but cannot precede one containing the
semantic coordinators jedoch/allerdings.

References

Beckman, Mary E., Julia Hirschberg, and Stefanie Shattuck-Hufnagel. 2005. The
original ToBI system and the evolution of the ToBI framework. In Prosodic
Typology: The Phonology of Intonation and Phrasing, edited by Sun-Ah Jun,
9–54. Oxford: Oxford University Press.
Culicover, Peter W., and Ray Jackendoff. 1997. Semantic subordination despite
syntactic coordination. Linguistic Inquiry 28 (2): 195–217.
Culicover, Peter W., and Ray Jackendoff. 2006. The simpler syntax hypothesis.
Trends in Cognitive Sciences 10 (9): 413–418.
Haider, Hubert, and Inger Rosengren. 1998. Scrambling. Sprache und Pragmatik
49. Lund: University of Lund.
Heim, Irene. 1991. Artikel und Definitheit. In Semantik: Ein internationales
Handbuch der zeitgenössischen Forschung. Handbücher zur Sprach- und Kom-
munikationswissenschaft, vol. 6, edited by Arnim von Stechow and Dieter Wun-
derlich, 487–534. Berlin: Walter de Gruyter.
Hirschberg, Julia, and Mary E. Beckman. 1994. The ToBI annotation conventions.
Ms., The Ohio State University.
Pasch, Renate, Ursula Brauße, Eva Breindl, and Ulrich Hermann Waßner. 2003.
Handbuch der deutschen Konnektoren: Linguistische Grundlagen der Beschreibung
und syntaktische Merkmale der deutschen Satzverknüpfer (Konjunktionen,
Satzadverbien und Partikeln). Schriften des Instituts für Deutsche Sprache,
Band 9. Berlin, New York: Walter de Gruyter.
Riemsdijk, Henk van. 1978. A Case Study in Syntactic Markedness: The Binding
Nature of Prepositional Phrases. Ph.D. diss., University of Amsterdam.
Umbach, Carla. 2004. On the notion of contrast in information structure and
discourse structure. Journal of Semantics 21 (2): 155–175.
Vicente, Luis. 2010. On the syntax of adversative coordination. Natural Lan-
guage and Linguistic Theory 28 (2): 381–415.
4 Out of Phase: Form-Meaning Mismatches in the
Prepositional Phrase

Joost Zwarts

This paper presents two cases in which the syntactic and semantic struc-
tures of a prepositional phrase (PP) do not line up. This is in line with
the relative independence of these levels of representation in the Parallel
Architecture framework of Jackendoff (2002). At the same time, these
mismatches can be analyzed as restricted lexical exceptions to the oth-
erwise rather tight correspondence between syntax and semantics in this
domain.
In the Parallel Architecture view of grammar (e.g., Jackendoff 2002),
a linguistic expression can be taken as a bundle of different types of
information, each with their own structural primitives and principles.
Take the (partial) representation of the phrase under the table in (1):
(1) Phonology: (ʌn)(dər)(ðə)(teɪ)(bəl)
Syntax: [PP P [NP D N ]]
Semantics: UNDER (THE (TABLE))
There is a piece of phonology, consisting of sound segments, organized
into syllables, a syntactic structure with parts of speech, and a representa-
tion of the expressions meaning in terms of function application. Within
this bundle, parts correspond to each other, like the phonological form
(ʌn)(dər) with the syntactic category P and the semantic function
UNDER, and (ðə)(teɪ)(bəl) with [NP D N ] and THE (TABLE),
forming smaller bundles, some basic, some derived.
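As a rough illustration of this bundle view (a sketch offered purely for exposition, not the chapter's formalism; the attribute names and the orthographic stand-ins for the phonological pieces are invented):

```python
# Hypothetical data-structure sketch of the bundle in (1): three parallel
# structures plus explicit links recording which pieces correspond.
bundle = {
    "phonology": ["under", "the table"],   # orthographic stand-ins
    "syntax":    ["P", "[NP D N]"],
    "semantics": ["UNDER", "THE (TABLE)"],
    # Interface links: indices into the three lists above.
    "links": [
        {"phonology": 0, "syntax": 0, "semantics": 0},   # under ~ P ~ UNDER
        {"phonology": 1, "syntax": 1, "semantics": 1},   # the table ~ NP ~ THE (TABLE)
    ],
}

# Reading off one linked sub-bundle:
print([bundle[level][0] for level in ("phonology", "syntax", "semantics")])
# ['under', 'P', 'UNDER']
```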
In mainstream generative grammar, especially in its current minimalist
form, the syntactic structure forms the combinatorial backbone of an
expression. Sound and meaning components are derived by mapping the
syntactic structure to a phonological and a semantic structure. The syn-
tactic representation tends to be quite rich, allowing the mappings to
sound and meaning to be as simple and direct as possible. In the Parallel
Architecture, however, all three components function as relatively inde-
pendent pieces of structure, held together by interface rules that leave

room for potential mismatches between the phonological, syntactic, and


semantic organization of an expression. The syntax can also be simpler
than in the minimalist architecture (Culicover and Jackendoff 2005),
partially because it is no longer the only generative component.
The goal of this paper is to demonstrate the fruitfulness of the Parallel
Architecture for two phenomena in the prepositional domain in which
there are mismatches between form and meaning. I argue that the
simplest and most natural analyses of these cases involve relatively
simple syntactic and semantic structures that the lexicon brings into cor-
respondence in an idiosyncratic way, going against the optimal interface
between the syntax and semantics of (prepositional) phrases for different
reasons.
I sketch the background assumptions about prepositional phrases and
their semantics in section 4.1. This sets the stage for two types of mis-
matches in this domain, one at the level of objects and arguments (section
4.2) and another one at the level of heads and functions (section 4.3).
Section 4.4 concludes with two more constructions (from a much wider
range) that deserve further study in this respect.

4.1 Prepositional Phrases and Their Meanings

Simplifying matters considerably, we can say that the sentences in (2a)


and (3a), taken from Jackendoff (1983, 163), have the syntactic structures
in (2b) and (3b) and the semantic structures in (2c) and (3c),
respectively:
(2) a. The mouse is under the table.
b. [IP NP [VP V [PP P NP ]]]
c. BE (MOUSE, UNDER (TABLE ; DEF))
(3) a. The mouse ran from under the table.
b. [IP NP [VP V [PP P [PP P NP ]]]]
c. GO (MOUSE, FROM (UNDER (TABLE ; DEF)))
The phonological structures of the sentences are simply represented
by their written forms in (2a) and (3a). In (2b)/(3b) and (2c)/(3c), many
details of syntactic and semantic structure are ignored, in particular
tense. This allows us to focus on the issues that are important for this
paper, namely the correspondences that hold between the different
levels in and around the prepositional phrase. The ; is used in (2c) and
(3c) and in the rest of this paper to introduce conceptual information
that specifies or modifies what precedes it.

The first type of correspondence concerns the level of grammatical


and conceptual argument positions. The sentences contain two NPs: the
mouse, which functions grammatically as the subject of the sentence, and
the table, the object of the preposition under. Such grammatical functions
might ultimately require a dedicated tier of syntactic representation (e.g.,
Culicover and Jackendoff 2005), but for my purposes it is sufficient to
assume that they can be read off from the phrase structures in (2b)/(3b)
(Chomsky 1965, 69). In the semantic representation we find a functional
distinction that bears different names in different traditions, but for
which I will use the terms figure and ground (originally introduced
for this purpose in Talmy [1972]). MOUSE, as the first argument of the
BE or GO function, is the figure of the situation, the entity of which the
location or motion is represented relative to the ground, TABLE.
The relation between grammatical and conceptual functions that we
see here is typical for prepositions. The ground of the relation expressed
by the preposition corresponds to its object and the figure to a gram-
matical function outside the PP, usually the subject. Building on Talmy
(2000) and others, Svenonius (2007) argues extensively for this general-
ization and compares it to the constraints that govern the linking between
semantic arguments and grammatical functions in the verbal domain.
The general correspondence rule in (4) covers this generalization (derived
from an even more general rule in Jackendoff [1990, 25]):
(4) If a syntactic head X corresponds to a one-place semantic function
F, then the object of X corresponds to the argument of F.
In other words, the semantic configuration of function application cor-
responds to the syntactic configuration of complementation. When a
place or path function corresponds to a preposition, the argument of such
a one-place function (i.e., the ground) corresponds to the object of the
preposition. In section 4.2, I consider one construction in which this
constraint does not hold.
Let us turn to the second type of correspondence exemplified in (2)
and (3). Implicit in the semantic representations of those examples is a
fundamental distinction between two types of spatial concepts, places
and paths, introduced most explicitly in this form in Jackendoff (1983).
Functions like UNDER and FROM define an entity of a particular
ontological category that Jackendoff (1983) made explicit in the follow-
ing way:
(5) a. [Place UNDER ([Thing TABLE ; DEF ])]
b. [Path FROM ([Place UNDER ([Thing TABLE ; DEF ])])]

In line with common practice, I will omit these labeled brackets


because they are always uniquely defined by the functions and therefore
somewhat superfluous. A place is a region of space where something can
be (a location, region). In addition to UNDER, there is a range of other
place functions, like IN, ON, BEHIND, mapping objects to places in
particular ways. A path is a stretch of space (a trajectory, curve) along
which something can move, extend, or be oriented (Jackendoff 1983, 174).
As Jackendoff explains, path concepts can be derived from place con-
cepts in different ways. (3c) contains a path that has its source under the
table, as indicated by the path function FROM. Other path functions are
TO (specifying the goal of the path, e.g., into) and VIA (its route, e.g.,
through).
The prepositional part of example (3) exhibits a perfect match between
meaning and form. The path and place functions correspond one-to-one
to the prepositions from and under, respectively, and the hierarchical
orderings of the two levels also match. A path function usually applies
to the result of a place function and not the other way around. This is
because a path can only be defined once a place has been identified. The
only exception involves the place function ON (Jackendoff 1983, 166–167)
that defines a place as the end-point of a path (as in The house is up the
hill, i.e., at the end of the path that goes up the hill). This conceptual
asymmetry of paths and places is paralleled by the syntactic structure
(Van Riemsdijk and Huijbregts 2001): the path preposition from in (3)
is outside the place preposition under and not the other way around
(*The mouse ran under from the table). (6) shows this isomorphism
schematically:
(6) [PP P1 [PP P2 . . . ]]
PATH1 ( PLACE2 (. . .) )
The situation in (6) instantiates a more general correspondence
pattern:
(7) If a semantic function F applies to the result of a semantic
function G, and F and G correspond to different syntactic
elements, then the syntactic element corresponding to F governs
the syntactic function that corresponds to G.
In other words, the semantic hierarchy of function composition cor-
responds to the syntactic hierarchy of government.
The correspondence rule is not intended to rule out the common situ-
ation that the path and place functions are together lexicalized as one

preposition. Through, for example, can be analyzed as involving the func-


tions VIA and IN (Jackendoff 1990, 72). If the mouse ran through the
maze, then it followed a path that involved places in the maze:
(8) a. through the maze
b. [PP P [NP D N ]]
c. VIA (IN (MAZE ; DEF))
There are of course numerous cases where one single lexical item cor-
responds to a semantic representation with multiple functions, like the
verb enter, which lexicalizes the functions GO, TO, and IN (Jackendoff
1983, 183). One might say that in (8) the syntactic P head and the pho-
nological form through correspond to a composite semantic function
VIA∘IN.
It is also possible that a function at the semantic level does not have
any formal correspondent. This is what we see in (9), from Jackendoff
(1983, 163):
(9) a. (The mouse ran) under the table.
b. [PP P [NP D N ]]
c. TO (UNDER (TABLE ; DEF))
The TO function does not have any counterpart at the other levels,
neither as a separate form (compare from under), nor as part of a special
lexicalization (compare through). Another example is the ON function
mentioned above, which is never lexicalized, as far as I know. Such
covert semantic operations, which are quite common (Jackendoff 1990,
72), do not go against the correspondence formulated in (7). In section
4.3, I will consider a much less common mismatch that goes directly
against this correspondence.

4.2 Objects and Grounds: The Temporal Distance Construction

This section describes a construction in which the object of a preposition


is not a ground, but another semantic element, going against the normal
correspondence formulated in (4) above.
Compare the following two sentences, featuring a relational temporal
preposition in the terminology of Verkuyl (1973):
(10) a. John left three years after the accident.
b. John left after three years.
Both sentences can describe the same temporal relation between two
events: the figure is the event of John leaving and the ground is the event

of the accident, which is explicit in (10a) and implicit in (10b). The


implicit argument takes its value from the context. (10b) can be under-
stood as (10a) if an accident is being discussed, but in another context
its value could be something else, for example, Johns arrival. The tem-
poral interval between the two events is specified through the measure
phrase three years. In the terminology of Fillmore (2002) there is a vector
pointing from the Landmark (the accident) to the Target (John's depar-
ture) over a Distance of three years in the Direction of the future.
In order to be able to (partially) represent the meaning of the PP three
years after the accident, I adopt two more elements from Jackendoffs
conceptual semantics (Jackendoff 1983). First, there can also be locations
in time, in accordance with the localist hypothesis. The PP corresponds
to a Place concept in the semantic field of time, defined by a temporal
function AFTER applied to an event. Second, this concept has an amount
modifier that restricts it in an appropriate way, as indicated by the
semicolon:
(11) AFTER (ACCIDENT; DEF ); [ 3 YEARS ]
ACCIDENT functions here as the ground, in the same way in which
the TABLE functions as the ground in the examples of the previous
section. It is the entity with respect to which Johns departure (the figure)
is located in time. This is done by the function AFTER, which maps it to
a temporal place.
If we now compare (10a) and (10b), we can see that the first sentence
complies with the generalization formulated in (4), but that in the second
sentence it is not the ground that corresponds to the object of the prepo-
sition after, but rather the amount component. The ground is not
expressed in (10b), but it is left implicit and is picked up from the context.
This disturbs the isomorphism between syntax and semantics: it is not an
argument of a function that forms the object of the preposition, but a
modifier.
We can now distinguish two lexical entries for after, corresponding to
(10a) and (10b), each with pieces of phonology, syntax, and semantics,
and coindexed variables over such pieces (Jackendoff 2002):
(12) a. after1 Phon2
[PP P1 NP2 ]
AFTER1 (Event2)
b. after1 Phon2
[PP P1 NP2 ]
AFTER1 (R) ; Amount2

This makes explicit what is special about the use of after in (10b) in
comparison to (10a): a modifier is treated as if it were an argument,
and the ground becomes implicit (because there is nothing in the syntax
or phonology corresponding to the reference event or time R in (12b)).
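A schematic, hypothetical rendering of the two entries in (12) may help (the Entry structure and its field names are invented here for exposition and are not part of the chapter's notation):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """A lexical entry as a triple of phonology, syntax, and semantics;
    shared indices (written _1, _2, ...) mark corresponding pieces."""
    phon: str
    syn: str
    sem: str

# (12a): the object NP (index 2) corresponds to the semantic argument of AFTER.
after_relational = Entry(
    phon="after_1 Phon_2",
    syn="[PP P_1 NP_2]",
    sem="AFTER_1(Event_2)",
)

# (12b): the object NP (index 2) corresponds to an amount MODIFIER; the
# ground argument R stays implicit (no index 2 inside AFTER's argument).
after_distance = Entry(
    phon="after_1 Phon_2",
    syn="[PP P_1 NP_2]",
    sem="AFTER_1(R) ; Amount_2",
)

print(after_relational.sem, "|", after_distance.sem)
```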
The pattern in (12b) occurs in many different languages, with a variety
of temporal prepositions that describe temporal relations (Haspelmath
1997; Caha 2010). A temporal distance is expressed from the speech time
S or a reference time R, in the direction of the past or the future. The
German PP (13b), for example, locates an event one month before the
speech time S:
(13) a. einen Monat vor dem Unfall
a.ACC month before the accident
a month before the accident
b. vor einem Monat
before a.DAT month
a month ago
In German, measure phrases are usually accusative, as einen Monat a
month in (13a), but when they follow the preposition, in the temporal
distance construction, they carry the dative case that is typical for the
locative use of prepositions. This constitutes fairly direct evidence that
the measure phrase in (13a) behaves as the syntactic object of the prepo-
sition vor even though it is semantically a modifier. The two different
lexical entries of vor that figure in (13a) and (13b) are shown in (14a)
and (14b), respectively, ignoring dative case for the time being:
(14) a. vor1 Phon2
[PP P1 NP2 ]
BEFORE1 (Event2)
b. vor1 Phon2
[PP P1 NP2 ]
BEFORE1 (S) ; Amount2
The English construction a month ago does not fit the pattern of (12b)
and (14b): ago can better be treated as an intransitive preposition with
an obligatory modifier, as argued by Fillmore (2002) and Coppock
(2009).
Haspelmath (1997) and Caha (2010) choose opposite strategies in
explaining away the mismatch in (13b), either pragmatically or syntacti-
cally. For Haspelmath einem Monat is semantically the argument of vor
and for Caha it is syntactically a modifier. Haspelmath paraphrases the

meaning of vor einem Monat as 'immediately before a one-month period
ending now', which he analyzes as resulting from the normal temporal

ending now, which he analyzes as resulting from the normal temporal
meaning of vor 'before' with pragmatic mechanisms of strengthening. In
Caha's analysis, the measure phrase starts at the normal position for
modifiers, but it moves into the dative-marked object position, which is
preceded, after movement, by the preposition. In both cases the underly-
ing assumption is that the syntactic and semantic structures must be
isomorphic, and pragmatic or syntactic complexities are necessary to
maintain this assumption. A much simpler analysis is possible if we do
not hold that assumption, as I showed, but rather allow for lexical items
in which form and meaning are out of phase, aligned in an idiosyncratic
way. Note that not every temporal preposition with the appropriate
meaning allows the measure phrase to be put in the object position.
German vor allows it, but English before does not. This makes it neces-
sary to store patterns like (12b) and (14b) in the lexicon.
The question is now why the type of mismatch discussed here would
arise in the first place. Why would a measure phrase that functions as the
modifier of the P end up as its object, violating the general correspon-
dence in (4)? I have presented (12a)/(12b) and (14a)/(14b) as completely
separate lexical entries, but this is not realistic when we want to capture
the rich network of relations among elements in the lexicon (e.g., Jack-
endoff 2008). It seems more likely that (12a)/(12b) and (14a)/(14b) are
specific instantiations of more general prepositional patterns in the
lexicon, like the couplings of the phonological forms /after1 Phon2/ and
/vor1 Phon2/ with the syntax [PP P1 NP2 ]. By themselves, hierarchy and
default inheritance are not enough to explain why measure-phrase modi-
fiers can be direct objects of prepositions. It seems reasonable to assume
that a strong prepositional pattern in the language puts pressure on cases
like (10b) and (13b) to realize the modifier as an object. Phrased differ-
ently, modifiers are realized as objects in (12b) and (14b) in analogy with
the frequent and canonical prepositional construction in (12a) and (14a),
even when this mixes up the usual correspondence between form and
meaning in that pattern.
I now turn to a situation in which the mismatch involves syntactic
heads and semantic functions.

4.3 Heads and Functions: The Spatial Case Alternation

Many languages in the Indo-European language family show a mean-


ingful alternation between two types of grammatical case within

prepositional phrases. I focus here on German, but the pattern can


also be seen to various degrees in other IE languages (see Gehrke
2008; Caha 2010; Lestrade 2010). The German case alternation is
well-covered in descriptive and theoretical work (in both cognitive
and generative grammar, e.g., Smith 1995; Zwarts 2006; Van Riemsdijk
2007).
Some spatial prepositions in German can occur either with the dative
or accusative case on their object:
(15) a. Anna stand in dem Zimmer
Anna stood in the.DAT room
Anna stood in the room
b. Otto trat in das Zimmer
Otto stepped in the.ACC room
Otto stepped into the room
The dative case is used when the PP describes a place and the accusa-
tive when it describes a path to that place. The prepositions with which
this happens are an 'on', auf 'on', hinter 'behind', in 'in', neben 'next to', über
'above', unter 'under', vor 'in front of', zwischen 'between', which constitute
almost all the primary locative prepositions of German, covering both
topological and projective relations. The set of alternating prepositions
is not the same in every language that shows the alternation and it is not
stable in German either: it varies somewhat across dialects (Draye 1996)
and across time (Dal 1966). That motivates a lexical treatment of the case
assignment properties of individual prepositions.
We can make more precise what (15a) and (15b) mean in terms of the
semantics sketched in section 4.1, but ignoring the contribution of the
verbs:
(16) a. BE (ANNA, IN (ROOM ; DEF))
b. GO (OTTO, TO (IN (ROOM ; DEF)))
When the preposition governs the accusative case, the TO function
applies in the semantics, but when it governs the dative, this function is
absent. This is the pattern with all the alternating prepositions mentioned
above.
What do the PPs in (15) look like in the Parallel Architecture? Sim-
plifying matters considerably, I syntactically represent dative and accusa-
tive case as features on the NP (which are mostly spelled out on the
determiner):

(17) a. in1 dem2,3 Zimmer4


[PP P1 [NP[DAT3] D2 N4 ]]
IN1 (ROOM4 ; DEF2)
b. in1 das2,3 Zimmer4
[PP P1 [NP[ACC3] D2 N4 ]]
TO3 (IN1 (ROOM4 ; DEF2))
The dative marker in (17a) has no semantic component corresponding
to it since location is characterized by the absence of a path function.
The dative case can be treated as a default case for several reasons
(Zwarts 2006), and this is one of them. The accusative case, however, is
directly linked to the semantic TO function.
We can take the next analytical step by assuming two lexical entries
for German in, one governing dative and having place semantics, and
another one governing accusative with path semantics:
(18) a. in1 Phon2
[PP P1 NP2[DAT] ]
IN1 (Thing2)
b. in1 Phon2
[PP P1 NP2[ACC3] ]
TO3 (IN1 (Thing2))
I assume that all the alternating prepositions in German have two
entries like this. A noun phrase can only be inserted in or unify with
the open place in (18) if it has the right case, as determined by the
feature on the syntactic variable. In the construction grammar view of
Jackendoff (2008) and others, (18a) and (18b) might be part of two more
general constructions in which a syntactic form [PP P1 NP2[DAT] ] would
correspond to the meaning Place1 (Thing2) and [PP P1 NP2[ACC3] ] to
TO3 (Place1 (Thing2)), both instantiated by the phonological string Phon1
Phon2.
If we now consider (18b), or the schematic construction that it instanti-
ates, more closely, we can see that it violates the correspondence prin-
ciple formulated in (7). The order of function application between TO
and IN does not correspond to the order of government between the
preposition in and the accusative case marker. The accusative marker is
in the wrong place, semantically speaking; it should be outside the
preposition. For the Parallel Architecture, this is no problem. Even
though the correspondence principle in (7) captures the default situation,
the lexicon can contain idiosyncratic exceptions that go against the
default.
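To make the lexical treatment concrete, here is a hypothetical sketch of the two stored pairings for in in (18), with the case feature selecting the associated semantics (the names and the representation format are invented for illustration, not the chapter's formalism):

```python
# Two entries for German 'in', mirroring (18a)/(18b): the case feature on the
# object NP is paired, non-compositionally, with place or path semantics.
IN_ENTRIES = {
    "DAT": {"syn": "[PP P_1 NP_2[DAT]]",   "sem": "IN_1(Thing_2)"},          # (18a): place
    "ACC": {"syn": "[PP P_1 NP_2[ACC_3]]", "sem": "TO_3(IN_1(Thing_2))"},    # (18b): path
}

def interpret_in(case: str) -> str:
    """Look up the stored pairing of case and meaning."""
    return IN_ENTRIES[case]["sem"]

print(interpret_in("DAT"))  # IN_1(Thing_2)        -> 'in dem Zimmer', location
print(interpret_in("ACC"))  # TO_3(IN_1(Thing_2))  -> 'in das Zimmer', goal path
```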

In minimalist syntax, the approach is different. Following work by


Koopman (2000) on Dutch, it has become customary to assume that the
semantic articulation of paths and places is actually part of the syntax
(see Cinque and Rizzi [2010] for a representative collection of papers).
There are different versions of this but, roughly speaking, the central idea
is that a directional PP consists of a PathP on top of a PlaceP, as illus-
trated here for the phrase from under the table in (19a):
(19) a. [PathP from [PlaceP under [DP the table ]]]
b. [PathP vandaan [PlaceP onder [DP de tafel ]]] (Dutch)
b′. [PathP [PlaceP onder [DP de tafel ]] vandaan t ] (Dutch)
There can be no real mismatches between syntax and semantics,
because the semantic hierarchy of place and path has become an integral
part of the syntax of the prepositional phrase. It is possible to move ele-
ments within the structure to account for postpositional structures, like
the Dutch translation of (19a) in (19b′), which is then derived from (19b)
by movement. The actual analysis is usually more complicated than this.
Applications of this idea to German case alternation can be found in
Van Riemsdijk (2007), Den Dikken (2010), and Caha (2010). Here I
focus on Caha's treatment, which explains the connection between accu-
sative case marking and path semantics by moving the object noun
phrase to the accusative position associated with the PathP, followed by
another movement that puts the preposition in front of the object again.
The first step could be taken as parallel to the derivation of postposi-
tional structures in Dutch (see (19b) and (19b′)), but it is unclear what
motivates the crucial second step of putting the locative preposition in
front. As a result, the way goals are marked by accusative case inside
German prepositional phrases does not really fall out naturally from
general principles but has to be stipulated in a way that is much more
complicated than a lexically stipulated correspondence between more
independent pieces of structure, as in (18b), in line with the Parallel
Architecture.
But why would such a mismatch between the position of a syntactic
element (a case marker) and a semantic element (the TO function) exist?
Stipulating the existence of pairs like those in (18) is insufficient. We also
want to know why German (and other IE languages) have such pairs.
The explanation does not lie within the workings of the synchronic
grammar itself, but outside it, in the historical development of Indo-
European languages and in the process of the grammaticalization of
cases. The case system of Proto-Indo-European that the system of modern

German derives from was not only richer in its inventory, but it also
allowed the spatial use of cases without any prepositions, something that
can be seen in Latin. The accusative form Romam has the meaning TO
(ROME) and the ablative form Carthagine the meaning FROM (CAR-
THAGE). It is assumed that prepositions came in later in the IE lan-
guages, developing out of adverbs (see Dal [1966] for German). This
means that nouns were already carrying obligatory case markers with
elementary directional meanings and prepositions were combined with
those case-marked nouns, adding locative meanings. The accusative case
in German is closer to the noun than the preposition is, because it represents an
older layer, and the locative preposition is outside it, grown as a newer layer
(see Vincent [1999] for this situation in Latin and Romance).
In order to allow these non-compositional combinations, the gram-
matical system has to reanalyze them as lexical units, as in (18). It
would be impossible to first build an accusative noun phrase das
Zimmer with the meaning TO (ROOM ; DEF) and then apply in with
the meaning IN in such a way that the place function gets squeezed
between TO and the ground ROOM. The only option is to take the
combination in+ACC as a lexical unit, non-compositionally associated
with the meaning TO∘IN.

4.4 More Mismatches

I have taken a detailed look at two form-meaning mismatches in the


prepositional domain, demonstrating that they allow for a simple repre-
sentation in the Parallel Architecture, giving syntactic and semantic
structures their due. The class of mismatches in the prepositional phrase
is not exhausted by the two cases discussed here. Let me mention two
other cases here that deserve further study.
There is a class of locative PPs that refer to the body part of the figure
that makes contact with a supporting surface, as in the following
example (with his anaphoric to Bob):
(20) Bob stood on his head.
Crucially, his head is not the ground of the relation, because Bob is not
located relative to his own head. The ground is implicit in (20); it is the
floor, for instance. One might think that on does not have a spatial sense,
but simply marks body parts involved in location, but Dutch shows that
the preposition still functions with its usual spatial component. Dutch
has two versions of on: roughly speaking, op is used for situations where
the figure is supported from below and aan is used when it is supported

from above, that is, hanging (Van Staden, Bowerman, and Verhelst 2006).
Now consider the following examples:
(21) a. Bob stond op zijn handen (op de tafel).
Bob stood on his hands (on the table)
a′. * Bob stond aan zijn handen (op de tafel).
b. Bob hing aan zijn handen (aan de dakgoot).
Bob hung on his hands (on the gutter)
b′. * Bob hing op zijn handen (aan de dakgoot).
The preposition op or aan used to introduce the body part of the figure
object that makes contact with the ground object (op/aan zijn handen
on his hands) is always the same contact preposition that is used to
express the type of contact made by the figure object with the ground
object (op de tafel on the table, aan de dakgoot on the gutter). If there
is support from below, then op is used with both body part and ground
object; if there is support from above, then aan is used with both body
part and ground object.
Suppose now that semantically the preposition on in example (20) still
applies to an implicit ground and that his head refers to the figure of the
spatial relation and not the ground. The representation of the contribu-
tion of the PP could be as given in (22):
(22) on1 his2 head3
[PP P1 [NP D2 N3 ]]
BE (HEAD3 (BOB2), ON1(Ground))
Although many aspects of this construction need further study, it
seems a potential example of a PP that involves a mismatch between
form and meaning because the syntactic object of the preposition cor-
responds to what is conceptually the figure.
A different type of mismatch is presented by doubling in the preposi-
tional phrase, which is rare in English, but common in many other lan-
guages (see, for example, Aelbrecht and Den Dikken [2013]). The FROM
function can be expressed in Dutch by a preposition van, a postposition
vandaan (with a meaningless cranberry morpheme daan), but interest-
ingly, also by a combination of the two:
(23) a. van onder de tafel
from under the table
b. onder de tafel vandaan
under the table from-DAAN
c. van onder de tafel vandaan
from under the table from-DAAN

Such a situation is potentially problematic for a model that encodes


meaning in the syntax through a unique PathP. In the Parallel Architec-
ture representation of (23c), however, there might be just one FROM,
corresponding to a combination of adpositions:
(24) van Phon1 vandaan
[PP [PP P PP1 ] P ]
FROM (X1)
Of course, such a representation does not release us from the obliga-
tion to make generalizations about doubling patterns like those in (24)
and to explain how and why they occur, but such generalizations and
explanations are not driven by a syntax that directly embodies the
semantics of space, but by a system that flexibly aligns form and meaning
on the basis of a variety of factors and constraints.

Acknowledgments

The research for this paper was made possible by a grant from the Neth-
erlands Organization for Scientific Research (NWO), grant 360-70-340.
Parts of this paper were presented at various workshops in the past
couple of years, and I thank the audiences there for helpful comments
and questions. Urpo Nikanne and Henk Verkuyl are gratefully acknowl-
edged for their remarks on an earlier version of this paper.

References

Aelbrecht, Lobke, and Marcel den Dikken. 2013. Preposition doubling in Flemish
and its implications for the syntax of Dutch PPs. Journal of Comparative Ger-
manic Linguistics 16 (1): 33–68.
Caha, Pavel. 2010. The German locative-directional alternation: A peeling
account. Journal of Comparative Germanic Linguistics 13 (3): 179–223.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT
Press.
Cinque, Guglielmo, and Luigi Rizzi, eds. 2010. Mapping Spatial PPs: The Cartog-
raphy of Syntactic Structures. Vol. 6. Oxford: Oxford University Press.
Coppock, Elizabeth. 2009. The Logical and Empirical Foundations of Baker's
Paradox. PhD diss., Stanford University.
Culicover, Peter, and Ray Jackendoff. 2005. Simpler Syntax. Oxford: Oxford Uni-
versity Press.
Dal, Ingerid. 1966. Kurze deutsche Syntax auf historischer Grundlage. Tübingen:
Max Niemeyer Verlag.

Den Dikken, Marcel. 2010. On the functional structure of locative and directional
PPs. In Mapping Spatial PPs: The Cartography of Syntactic Structures, vol. 6,
edited by Guglielmo Cinque and Luigi Rizzi, 74–126. Oxford: Oxford University
Press.
Draye, Luk. 1996. The German dative. In The Dative, vol. 1, Descriptive Studies,
edited by William van Belle and Willy van Langendonck, 155–215. Amsterdam/
Philadelphia: John Benjamins.
Fillmore, Charles. 2002. Mini-grammars of some time-when expressions in
English. In Complex Sentences in Grammar and Discourse: Essays in Honor of
Sandra A. Thompson, edited by Joan Bybee and Michael Noonan, 31–59.
Amsterdam/Philadelphia: John Benjamins.
Gehrke, Berit. 2008. Ps in Motion: On the Semantics and Syntax of P Elements
and Motion Events. PhD diss., Utrecht University.
Haspelmath, Martin. 1997. From Space to Time: Temporal Adverbials in the
World's Languages. München: LINCOM Europa.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evo-
lution. Oxford: Oxford University Press.
Jackendoff, Ray. 2008. Construction after construction and its theoretical chal-
lenges. Language 84 (1): 8–28.
Koopman, Hilda. 2000. Prepositions, postpositions, circumpositions, and particles:
The structure of Dutch PPs. In The Syntax of Specifiers and Heads, edited by
Hilda Koopman, 204–260. London: Routledge.
Lestrade, Sander. 2010. The Space of Case. PhD diss., Radboud University
Nijmegen.
Smith, Michael B. 1995. Semantic motivation vs. arbitrariness in grammar: Toward
a more general account of the dative/accusative contrast with German two-way
prepositions. In Insights in Germanic Linguistics I. Methodology in Transition,
edited by Irmengard Rauch and Gerald Carr, 293–323. Berlin/New York: Mouton
de Gruyter.
Svenonius, Peter. 2007. Adpositions, particles, and the arguments they introduce.
In Argument Structure, edited by Eric Reuland, Tanmoy Bhattacharya, and
Giorgos Spathas, 63–103. Amsterdam/Philadelphia: John Benjamins.
Talmy, Leonard. 1972. Semantic Structures in English and Atsugewi. PhD diss.,
University of California, Berkeley.
Talmy, Leonard. 2000. Toward a Cognitive Semantics. Cambridge, MA: MIT
Press.
Van Riemsdijk, Henk. 2007. Case in spatial adpositional phrases: The dative-
accusative alternation in German. In Pitar Moş: A Building with a View. Papers
in Honour of Alexandra Cornilescu, edited by Gabriela Alboiu, Larisa Avram,
Andrei Avram, and Daniela Isac, 265–283. Bucharest: Editura Universității
București.

Van Riemsdijk, Henk, and Riny Huijbregts. 2001. Location and locality. In Prog-
ress in Grammar: Articles at the 20th Anniversary of the Comparison of Gram-
matical Models Group in Tilburg, edited by Marc van Oostendorp and Elena
Anagnostopoulou, 1–23. Amsterdam: Rocquade. Reprinted in Phrasal and
Clausal Architecture: Syntactic Derivation and Interpretation. In honor of Joseph
E. Emonds, edited by Simin Karimi, Vida Samiian, and Wendy K. Wilkins, 339–
364. Amsterdam: John Benjamins, 2007.
Van Staden, Miriam, Melissa Bowerman, and Mariet Verhelst. 2006. Some prop-
erties of spatial description in Dutch. In Grammars of Space: Explorations in
Cognitive Diversity, edited by Stephen C. Levinson and David P. Wilkins, 475–511.
Cambridge: Cambridge University Press.
Verkuyl, Henk J. 1973. Temporal prepositions as quantifiers. In Generative
Grammar in Europe, edited by Ferenc Kiefer and Nicolas Ruwet, 582–615. Dor-
drecht: D. Reidel.
Vincent, Nigel. 1999. The evolution of c-structure: Prepositions and PPs from
Indo-European to Romance. Linguistics 37 (6): 1111–1153.
Zwarts, Joost. 2006. Case marking direction: The accusative in German PPs. In
Proceedings of the 42nd Annual Meeting of the Chicago Linguistic Society, vol.
2, The Panels, edited by Jacqueline Bunting, Sapna Desai, Robert Peachey, Chris-
topher Straughn, and Zuzana Tomková, 93–107. Chicago: Chicago Linguistic
Society.
5 The Light Verbs Say and SAY

Jane Grimshaw

5.1 The SAY Schema

This paper proposes a universal schema for what I refer to as SAY verbs,
and shows how their shared syntactic and semantic properties derive
from the schema. The proposal is that SAY verbs fall into four distinct
types: the light verb say, verbs which encode SAY and discourse role, SAY-
by-means verbs, and SAY-with-attitude verbs.
The verb say is a light verb which corresponds to the abstract light
verb SAY, which is the shared semantic component of all SAY verbs.
Verbs such as ask, announce, assert, maintain, note, order, remark,
report, tell, and wonder encode aspects of the discourse role of the events
they report: asserting, ordering, questioning, and commenting, among
others. Mode verbs, which subdivide into SAY-by-means (mutter, grunt,
write) and SAY-with-attitude (bitch, gripe), encode other aspects of the
saying event by combining with an independent activity predicate.
Discourse-role verbs and mode verbs impose restrictions on their
arguments beyond those imposed by the SAY schema. The English light
verb say directly lexicalizes the SAY schema: it does not encode the prop-
erties that distinguish among discourse-role verbs (it can be used to
report events of asserting, questioning, and commenting), nor does it
encode the properties that distinguish among mode verbs such as mutter,
grunt, and bitch. It is therefore compatible with all of the grammatical
contexts that any of the other SAY verbs occurs in.
The SAY schema proposal builds on a long-standing hypothesis origi-
nating in works like Dowty (1979), Talmy (1985), Jackendoff (1990), and
Hale and Keyser (1993) that the syntactic and semantic properties of
predicates derive from universal semantic components and the principles
governing their realization. The core characteristics of SAY verbs are
schematized in (1). SAY requires an agentive subject and a Linguistic

Material argument. It admits a Goal.1 Universal principles determine


which argument is realized as the subject as well as other aspects of
argument realization. Given that the schema includes an Agent, it can
only be realized as an event, so this is not stipulated in (1). Some of the
arguments may be introduced syntactically by a separate verbal head v:
the schema simply evaluates the well-formedness of a predicate-argument
complex, however it is formed.
(1) SAY   Agent / i
          Linguistic Material / j
          Goal / k
In (2) all three of the arguments are present, and the Linguistic Material
corresponds to the complement clause.
(2) The teacher said to the students that the exam was easy.
Direct quotes can be interpreted only as Linguistic Material, so they
can combine only with SAY verbs. In contrast, wh-CPs and that-CPs can
realize other argument types in addition. The unique distribution of
quotes provides important evidence for a unified analysis of say, SAY-by-
means, SAY-with-attitude, and discourse-role verbs.
The SAY verbs are grammatically distinct both from other verbs
pertaining to the domain of speech or language (such as address, con-
verse, discuss, speak, talk, and utter) and from other verbs taking finite
sentential complements, such as doxastic predicates like believe and
know, and emotive predicates like regret.2 None of these verbs have
Linguistic Material arguments, and they are not instances of the SAY
schema.
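A toy rendering of the schema in (1) may make the classification concrete (a sketch only; the verb list and the well-formedness check below are simplifications introduced here for illustration, not part of the proposal):

```python
# Toy check of the SAY schema in (1): SAY verbs require an Agent and a
# Linguistic Material argument and optionally admit a Goal.
SAY_VERBS = {"say", "ask", "announce", "remark", "wonder", "mutter", "grunt", "bitch"}

def instantiates_say_schema(verb, args):
    """True iff verb is a SAY verb and args realize the schema's arguments."""
    if verb not in SAY_VERBS:
        return False
    required = {"Agent", "LinguisticMaterial"}
    allowed = required | {"Goal"}
    return required <= set(args) <= allowed

print(instantiates_say_schema("say", {"Agent", "LinguisticMaterial", "Goal"}))  # True, cf. (2)
print(instantiates_say_schema("remark", {"Agent"}))                             # False, cf. (3)
print(instantiates_say_schema("believe", {"Agent", "LinguisticMaterial"}))      # False: not a SAY verb
```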

5.2 The Linguistic Material Argument of SAY

The schema in (1) entails that the complement of a SAY verb should be
obligatory. Discourse-role SAY verbs like those in (3) and (4) transpar-
ently fit this pattern:3
(3) *The students {said / remarked / reported / noted / maintained}.

(4) The students {said / remarked / reported / noted / maintained} that the
exam was easy.
The Linguistic Material argument can correspond to a variety of
syntactic complements, including that-CPs as in (4) and wh-CPs as
in (5):
(5) The students {asked / wondered} whether the exam was easy.
Note that the latter two complement structures are not unique to SAY
verbs. Verbs like believe, discover, and feel also allow that-CPs, and verbs
such as know and find out allow wh-CPs. The examples in (6) illustrate
this point:
(6) a. The students {believed / discovered / felt} that the exam was easy.
b. The students {knew / found out} whether the exam was easy.
However, since they have Linguistic Material arguments, SAY verbs can
combine with direct quotes.4 They do so in three contexts. In the first,
the quote is in complement position.5 In the second, the quote hosts a
parenthetical quotation fragment (QF). In the final case, the quote com-
bines with a copula in a pseudo-cleft.6 In the last two configurations the
quote is identified indirectly with the verb's complement position through
an operator that is coindexed with the quote (see Grimshaw 2013). Of
the verbs in (3)–(6) only the SAY verbs can appear with quotes in any of
these configurations.
Every SAY verb appears in all three syntactic configurations, provided
that no independent principles interfere. One factor concerns the struc-
ture of pseudo-clefts, in which the DP what fronts from the complement
position of the SAY verb. Any SAY verb that does not admit a DP comple-
ment is excluded from the pseudo-cleft SAY verb contexts, while it is
allowed in the others. This is discussed in section 5.6.
The examples in (7) and (9) are well-formed because say and remark
are SAY verbs: those in (8) and (10) are not. The verbs believe, discover,
feel, know, and find out do not combine with quotes: they are not SAY
verbs.7

(7) The students {said / remarked} "Our exam was easy."
(8) *The students {believed / discovered / felt} "Our exam was easy."
(9) The students {asked / wondered} "Will our exam be easy?"
(10) *The students {knew / found out} "Will our exam be easy?"
In (11)–(14) the quote hosts a QF, which contains an embedding verb
missing its embedded clause. I use only clause-final examples here, but
the parenthetical can appear within the quote instead of clause-finally
(Grimshaw 2013). Again, only the SAY verbs are possible.
(11) "Our exam was easy," the students {said / remarked}.
(12) *"Our exam was easy," the students {believed / discovered / felt}.
(13) "Will our exam be easy?" the students {asked / wondered}.
(14) *"Will our exam be easy?" the students {knew / found out}.
In the representation of QFs, the quote is not embedded. The comple-
ment of the verb in the parenthetical is a trace, which is bound by an
operator, which in turn is identified with the quote. (The works cited in
note 5, as well as Corver and Thiersch [2001], give evidence for the pres-
ence of the chain.) Hence, indirectly, the quote provides the complement
of the embedding verb, and only a Linguistic Material complement can
license the quote. In (15), which is the representation of (11), the func-
tional projection (FP) hosting the operator is right-adjoined to the TP
which dominates the quote:
(15) [TP [TPi [DP Our exam] [T′ T [VP [V′ [V was] [AdjP easy]]]]]
     [FP [XP Opi] [F′ F [TP [DP the students] [T′ T [VP [V′ [V said] XPi]]]]]]]
The pseudo-cleft evidence again shows that SAY verbs combine with
direct quotes, but the other verbs do not:
(16) What the students {said / announced} was "Our exam was easy."
(17) *What the students {believed / discovered / felt} was "Our exam was easy."
(18) What the students {asked / wondered} was "Will our exam be easy?"
(19) *What the students {knew / found out} was "Will our exam be easy?"
All of the examples of quotation in (7)–(14) and (16)–(19) are consistent
with the effects of selection, which I return to in section 5.7. (4) and
(6) show that say, remark, believe, discover, and feel are compatible with
CPs introduced by that. (5) and (7) show that ask, wonder, know, and
find out are compatible with CPs introduced by whether. Nevertheless,
the direct quote counterparts are impossible with believe, discover, feel,
know, and find out. The proposal is that this is because these verbs do
not combine with Linguistic Material arguments.
Interpreted this way, the evidence from direct quotes shows that some
unexpected verbs show SAY properties. The verb wonder, used above, is
one example. The expression want to know can replace ask or wonder in
(5), (9), (13), and (18), while know itself is impossible in combination
with direct quotes. This shows that wonder and want to know combine
with Linguistic Material complements, that is, can realize the SAY schema.
The same is true for the morpheme think: it combines with quotes as
complements, in QFs and in pseudo-clefts.
(20) a. The students thought "This exam is easy."
b. "This exam is easy," the students thought.
c. What the students thought was "This exam is easy."
This case of think is not a doxastic synonym of believe, and indeed is not
even stative: all of the examples in (20) are well-formed with think in the
progressive. In such examples, wonder, want to know, and think are
instances of SAY which report internal linguistic formulation only, an
instance of the SAY schema in which the Agent and Goal arguments are
not distinct.
In sum, the evidence from direct quotes separates say and discourse-
role SAY verbs from others that take clausal complements.

5.3 SAY Verbs with Mode Specifications

SAY-by-means and SAY-with-attitude combinations arise from the enrich-
ment of the SAY schema by the properties of an independent verb encod-
ing means (manner, sound, or form) or attitude. (21) lists a representative
example for each:8
(21) How SAY + mode verbs are constructed

     Means     Manner    SAY + mutter    mutter
               Sound     SAY + grunt     grunt
               Form      SAY + write     write
     Attitude            SAY + bitch     bitch
The properties of SAY + mode combinations follow from the SAY schema
and the mode verb, together with independent principles constraining
thematic roles and aspect. The mode verb (e.g., mutter, grunt, write, and
bitch) provides the morphological realization for the SAY + mode combi-
nation.9 It is an activity predicate:
(22) The customer {muttered / grunted / wrote / bitched} ({at / to} the manager) for a few seconds (then left).

Like the SAY verbs discussed above, SAY + mode combinations are all
verbal and project achievements, despite the activity status of the mode
component. Projections headed by SAY verbs seem to lack the internal
structure of accomplishments.
The activity verb has its own argument-taking capacities, illustrated in
(23). Here, bitch combines with an Agent, and optionally with a Goal
(introduced by to) or a Target (introduced by at):10
(23) bitch (Agent/…, Goal or Target/…)
The mode verb combines with the SAY schema as indicated in (24). The
indexing on the arguments is carried over from (1) and (23):
(24) SAY-bitch (Agent/i,…, Linguistic Material/j, Goal or Target/k,…)

The subject of the SAY + mode verb is an argument of both components
of SAY + bitch, represented formally by the fact that it carries two indices,
and the same holds for the Goal/Target. The Linguistic Material argu-
ment is related only to the SAY component.
As for the other SAY verbs, the Linguistic Material argument of SAY +
mode verbs can be realized as a CP or a direct quote as in (25)-(28). In
(25) and (26) the quote is in complement position:
(25) The students {muttered / grunted / wrote / bitched} that the exam was too difficult.
(26) The students {muttered / grunted / wrote / bitched} "The exam was too difficult."
SAY + mode verbs occur in QFs, as in (27), combining with quotes as
indirect complements:
(27) "The exam was too difficult," the students {muttered / grunted / wrote / bitched}.
SAY-by-means verbs can also occur in pseudo-clefts:
(28) What the students {muttered / grunted / wrote} was "the exam was too difficult."

The SAY schema contributes the Linguistic Material argument, and hence
these complementation possibilities.11
The semantic complexity of SAY + mode structures is not unique. It is
well known that a manner component can form part of the representa-
tion of motion verbs, forming verbs that would be analyzed in the present
terms as GO-by-means. For recent examples of the line of research initi-
ated in Talmy (1985), see Zubizarreta and Oh (2007), Beavers, Levin, and
Tham (2010), Beavers and Koontz-Garboden (2012).
Yet another case of conflation can be found in verbs indicating ges-
tures or facial expressions, like shrug and beam, and allowing them to
occur with clausal complements. These are not SAY verbs, though, and
they do not combine with quotes.

5.4 Subjects of Say and SAY Verbs

The SAY verbs looked at so far have agentive subjects and are eventive,
but say also occurs with a subject that encodes the location of Linguistic
Material. Nouns like shelf and river cannot appear as the subject in (29)
because of their non-linguistic character.
(29) The {sign / poster / book / article} said that the park was closed.
When the subject is a Location rather than an Agent, the entire clause
is stative. It is therefore odd in the progressive, and incompatible with a
Goal argument:12
(30) ??The {sign / poster / book / article} was saying that the park was closed.

(31) ??The {sign / poster / book / article} said to the tourists that the park was closed.
Nevertheless, say continues to display the hallmarks of a SAY verb: it
combines with quotes in complement position, in QFs, and in
pseudo-clefts:
(32) The {sign / poster / book / article} says the park is closed.
(33) a. "The park is closed," the sign says.
b. What the sign says is, "The park is closed."
The verb say thus has two variants, corresponding to two variants of the
schema. The second is in (34):
(34) SAY (Location/i, Linguistic Material/j)
All SAY verbs should occur with non-agentive subjects in principle.
Whether they do or not will depend upon the demands of their discourse
role or mode. Certain discourse roles are clearly compatible with non-
agentive subjects:
(35) a. The survey asks whether people work more than 40 hours a
week.
b. The article comments that most people lie about their work
habits.
Which ones are compatible and why remains to be investigated.
The restrictions on the subject of the SAY + mode verb are compara-
tively transparent. The subject of SAY is identified with the subject
argument of the mode predicate, as shown for the agentive schema in
(24). So the subject of the complex SAY verb must be semantically com-
patible with both the SAY schema and the mode verb. (36) shows that
mutter, grunt, and write do not allow the non-agentive subjects of (29);
their subjects must be agentive, or at least animate:
(36) #The {sign / poster / book / article} {muttered / grunted / wrote} all year.

Hence the corresponding SAY-by-means structures do not allow these
subjects either. With non-agentive subjects, SAY-by-means verbs cannot
have quotes in complement position, cannot appear in QFs, and do not
allow that-complements. The examples in (37) illustrate the point for
that-complements:13
(37) #The {sign / poster / book / article} {muttered / grunted / wrote} that the park was closed.

SAY-with-attitude verbs allow some, but not all, of the non-agentive
subjects of (29):
(38) The {book / article / #sign / #poster} {bitched / complained} ...

This seems to be due to a construal in which the Source is identified with
the authors of books and articles, whose attitudes are expressed in the
documents. Such a construal is only marginally possible for sign and
poster.

5.5 Goals and Targets of SAY Verbs

SAY licenses a Goal, and means or attitude components license a Goal
or Target, as (23) indicates. I assume that, as for subjects, the Goal argu-
ment of SAY is identified with the Goal argument of the mode verb, so
the to-PP is an argument of both:
(39) They {muttered / grunted / wrote / bitched} to the instructor that the exam was too difficult.

Target PPs introduced by at are not permitted by say or by discourse-
role verbs. They are allowed by some mode verbs, but not by all. The
verbs complain and write, for example, do not occur with at-PPs:14
(40) a. mutter at ... , grunt at ... , bitch at ...
b. *say at ... , *remark at ... , *announce at ...
When a mode verb with a Target is conflated with SAY, the resulting
structure contains the subject, the Target, and the Linguistic Material
argument, realized by a CP complement (shown in (41) below) or a quote
(not shown here):
(41) They {muttered / grunted / bitched} at the instructor that the exam was too difficult.

Following this line of analysis, the properties of SAY-by-means and SAY-
with-attitude predicates are composed from those of SAY and those of
the activity predicates contributing the means or the attitude.

5.6 DP Realizations of the Linguistic Material Argument of SAY

The analysis of SAY verbs divides them into four distinct cases: say,
discourse-role verbs, SAY-by-means, and SAY-with-attitude. These distinc-
tions coincide with differences in the realization of the Linguistic Mate-
rial argument.

5.6.1 What as the Linguistic Material Argument of SAY


One discrepancy among the four SAY verb types is found in pseudo-clefts.
The discourse-role verbs in (16) and (18) are say, announce, ask, and
wonder. Their counterparts with comment and remark are ungrammati-
cal, even though both verbs allow that-CPs.
(42) *What the students {commented / remarked} was "Our exam was difficult."
Similarly, SAY-with-attitude verbs are ungrammatical in pseudo-clefts,
while SAY-by-means verbs are allowed (see (28) above).
(43) *What the students {bitched / griped} was "Our exam was difficult."
The source of these discrepancies lies in the expression of the Linguis-
tic Material argument, which is realized by the DP what in a pseudo-cleft.
The table in (44) summarizes the generalization, based on
pseudo-clefts:
(44) Linguistic Material argument realized as a DP

             say      Discourse role verbs        SAY + means    SAY + attitude
             Yes      Yes           No            Yes            No
             say      announce      comment       mutter         bitch
                      ask           remark        grunt          complain
                      assert        insist        write          gripe

Since the verbs in the "No" columns in (44) ban DP realization of the
Linguistic Material argument of SAY, they are also incompatible with DP
wh-phrases in questions, when the wh-phrase corresponds to a Linguistic
Material argument. The following examples illustrate the judgments for
say, and the discourse-role verbs.
(45) What did the students say?
(46) What did the students announce?
(47) *What did the students {comment / remark}?
The examples in (48) and (49) contrast SAY-by-means and SAY-with-atti-
tude verbs. Only the means verbs are possible:
(48) What did the students {mutter / grunt / write}?
(49) *What did the students {bitch / gripe}?
SAY verbs in QFs are possible because it is the trace of the moved
operator, rather than what, that is the complement of the SAY verb, as
the structure in (15) indicates. The trace apparently counts as clausal,
rather than nominal, otherwise the SAY verbs which do not allow DP
complements would be ungrammatical in QFs. This point merits further
investigation.
A further generalization, which follows the same fault lines as pseudo-
clefts and interrogatives, concerns the possibility of passivization with a
Linguistic Material argument. The passive of say with a clausal comple-
ment is a little marginal:
(50) ?{That the accident wasn't their fault / "It wasn't my fault"} has been {said / announced} by every youthful driver at some point.
Replacing the verbs in (50) with comment or remark renders the sen-
tence ungrammatical:
(51) *{That the accident wasn't their fault / "It wasn't my fault"} has been {commented / remarked} by every youthful driver at some point.
The same contrast is found between the SAY-by-means and SAY-with-
attitude verbs:
(52) ?{That the accident wasn't their fault / "It wasn't my fault"} has been {muttered / grunted / written} by every youthful driver at some point.

(53) *{That the accident wasn't their fault / "It wasn't my fault"} has been {bitched / griped} by every youthful driver at some point.
In sum, DP realization for the Linguistic Material argument of SAY,
along with passivization, is permitted by say and SAY-by-means verbs,
rejected by SAY-with-attitude verbs, and permitted by just some discourse-
role verbs.

5.6.2 Saying a Few Words


Other DP realizations for the Linguistic Material argument are informa-
tive: when the DP is a few words, a subtly different picture emerges. The
verb say permits this realization, and the SAY-by-means means verbs also
permit it:
(54) a. The student said a few words (and sat down).
b. The student {muttered / grunted / wrote} a few words (and sat down).

The attitude verbs and all discourse-role verbs are ungrammatical
with a few words as the Linguistic Material argument. This is true for
announce, ask, assert, comment, maintain, note, remark, report, and tell,
as illustrated in (55):
(55) a. *The student {bitched / griped} a few words (and sat down).
b. *The student {announced / reported} a few words (and sat down).
The grammatical sentences in (54) entail their (less informative) para-
phrases with utter and emit, verbs disallowing clausal complements alto-
gether. This suggests that the DP a few words is able to satisfy the
requirements imposed on the Linguistic Material argument by say and
the SAY-by-means verbs because the verbs can be instances of emit and
correspond to a different SAY schema. Their complements denote linguis-
tic units (such as words and sentences) but not Linguistic Material.
Neither discourse-role verbs nor SAY-with-attitude verbs realize the emit
schema.
The complement of a discourse-role verb must be capable of playing
the discourse role encoded by the verb. Since a few words cannot express
an assertion, an order, a question, or a comment, it is not a valid comple-
ment for these verbs. The light verb say does not encode a discourse
role, so emitting a few words can be validly described as saying, despite
the fact that it involves no assertion or other discourse move. With
an attitude verb, the complement must express the state of affairs or
proposition that is the target of the attitude. Mere words cannot do this,
hence the impossibility of a few words as the complement to bitch and
gripe.

5.7 The General Properties of SAY Verbs

5.7.1 Aspectual Properties of SAY Verbs


While some SAY verbs can be stative (section 5.4), no SAY verb is exclu-
sively stative. SAY cannot combine with BECOME and CAUSE to encode a
change of state or a caused change of state. Therefore the SAY system
contains no counterparts to doxastics like believe, conclude, or convince.
A related point is that no SAY verb is factive or semi-factive.15 Factive and
semi-factive complements occur only with emotive, evaluative and dox-
astic predicates. The properties of SAY verbs are unlike those of other
clausal-complement-taking predicates in some central respects.
5.7.2 Productivity and Regularity: The Means and Attitude Verbs


SAY-by-means and SAY-with-attitude verbs are lexically unrestricted and
show no accidental gaps. An activity verb expressing means or attitude
can be conflated with SAY, subject only to the general restriction that it
express a mode that is compatible with SAY (e.g., mutter but not wiggle,
see note 8). Both GO-by-means verbs and verbs like beam, touched on in
section 5.3, seem equally free of arbitrary restrictions.
Section 5.6.2 shows that SAY-by-means and SAY-with-attitude verbs are
completely regular in their ability to occur with DPs realizing the Lin-
guistic Material argument. The means verbs allow it, and the attitude
verbs disallow it.
Similarly, the Goal arguments of SAY-by-means and SAY-with-attitude
verbs are entirely regular: a Goal argument is possible with every verb,
it is optional for every verb, and it is realized as a PP for every verb. As
the next two groups of examples illustrate, these SAY verbs simply carry
over the argument-taking properties of the activity predicates that they
incorporate.
(56) a. She {muttered / bitched} to the teacher.
b. *She {muttered / bitched} the teacher.
(57) a. She {muttered / bitched} to the teacher that the exam was too difficult.
b. *She {muttered / bitched} the teacher that the exam was too difficult.

5.7.3 Variation in Argument Realization: Discourse-Role SAY Verbs


Discourse-role SAY verbs seem more idiosyncratic. They vary in whether
or not they admit a DP as their Linguistic Material argument, as shown
in 5.6.1. Similarly, some discourse-role SAY verbs (tell, for example) allow
a Goal to be realized as a DP. Others, like explain, do not. It is possible
that these properties will prove to be less arbitrary than they now seem,
once SAY verbs are separated from other predicates which take clausal
complements.

5.7.4 Limits on What a SAY Verb Can Encode


The following combinations are impossible in a SAY verb: means with atti-
tude; discourse role with attitude; and discourse role with means. No verb
has the "whisper" means and the "bitch" or "grouch" attitude, combining
means with attitude to yield verbs with the rough paraphrases "bitch in a
whisper" or "whisper grouchily." No verb has a structure which encodes
"ask bitchily" or "assert grouchily" (combining discourse role with atti-
tude), and no verb has a structure which encodes "ask by whispering" or
"assert by shouting" (combining discourse role with means). The para-
phrases indicate that these are not logically impossible meanings, but
they seem to be linguistically impossible, suggesting that discourse role,
means, and attitude compete for a single position in the structure of
complex SAY verbs. This conclusion is reminiscent of the hypothesis that
manner and result components are incompatible in verb meanings.
(See Beavers and Koontz-Garboden (2012) for a recent review.)

5.7.5 Selection for Clausal Complements


Embedding verbs and their clausal complements are subject to restric-
tions on clause type: some embedding verbs combine with wh-clauses,
some with that-clauses, some with infinitives, and so forth. Within the SAY
verb system there is a very clear pattern. The discourse-role verbs show
effects of selection for their complement: for example, assert combines
only with propositions/declaratives, ask only with interrogatives/
questions. Thus in (58) it is ungrammatical to switch the quotes in a and
b while leaving the verbs unchanged. (I use quotes hosting parentheticals
to illustrate the point, rather than ordinary clausal complements, in order
to avoid complexities stemming from the syntactic form of clausal com-
plements; Grimshaw (2014).)
(58) a. "The exam was too difficult," the students asserted.
b. "Will the next exam be that difficult?" the students asked.
The key to these selectional effects is that assert reports events of asser-
tion, and ask reports questioning events. I suggest that discourse role is
the only source of sensitivity to clause type within the SAY system. This
predicts that other SAY verbs should be free of such effects.
Let us consider first the SAY-by-means verbs. They do not encode dis-
course role, since they encode means, and only one specification is pos-
sible for each SAY verb (see section 5.7.4). Therefore they are predicted
to occur with both complement types, and indeed we find that all SAY-
by-means verbs allow both interrogative and declarative quotes:
(59) a. "The exam was too difficult," the students {muttered / grunted}.
b. "Will the next exam be that difficult?" the students {muttered / grunted}.
The next case to consider is the English light verb say. Continuing to use
QFs, we can show that say is similarly indifferent to the distinction
between interrogatives and declaratives:16
(60) a. "The exam was too difficult," the students said.
b. "Will the next exam be that difficult?" the students said.
Finally we turn to the SAY-with-attitude verbs, which show a slightly dif-
ferent pattern. They are a little odd with interrogatives, as in (61b):
(61) a. "The exam was too difficult," the students {bitched / griped}.
b. ?"Will the next exam be that difficult?" the students {bitched / griped}.
The attitude that these verbs encode when they combine with clauses
is an attitude toward a state of affairs, and an interrogative complement
does not denote a state of affairs. Hence the combination in (61b) is
possible only in a context in which the current exam was regarded as too
difficult and the students are indirectly complaining about this state of
affairs. If this line of reasoning is correct, bitch and gripe combine freely
with clausal arguments, provided that the argument supplies the state of
affairs that the attitude is related to.
Under this reasoning, the only SAY verbs that exercise control over the
clausal arguments that they combine with are the discourse-role verbs.
If discourse role is the source of selection effects among SAY verbs,
selection by verbs that do not encode discourse role, that is, non-SAY
verbs, must be different in nature from the selection observed with SAY
verbs. This is the starting point of Grimshaw (2014).

5.8 Conclusion

A skeletal verb meaning determines core grammatical properties of SAY
verbs. Verbs built on this skeleton are of four types. Setting aside the
emit cases in section 5.6.2, the verb say directly realizes the light verb
SAY. Verbs like assert, ask, and comment add information about the role
in discourse of the event that they report. SAY-by-means verbs and SAY-
with-attitude verbs are constructed by grafting SAY onto independent
activity verbs in a principled fashion.
The light verb say entails the general characteristics of verbs of saying.
The specific aspects are encoded by individual morphemes. The analysis
is a step toward a theory that distinguishes sharply between aspects of
words that are specific to a morpheme, and must be learned piece by
piece; aspects that are determined by a grammar, and must be learned
once for the target language; and aspects that are governed by universal
grammar, and need not be learned at all.

Acknowledgements

My gratitude goes to the editors for making this volume possible, and to
Ray Jackendoff for making it necessary. I would like to thank Pranav
Anand, Veneeta Dayal, Valentine Hacquard, Florian Jaeger, Angelika
Kratzer, Julien Musolino, Sara ONeill, Alan Prince, Ken Safir, Roger
Schwarzschild, Chung-chieh Shan, the Colloquium audience at the
Rutgers University Center for Cognitive Science, and participants in the
2013 Rutgers Syntax I course. Their input into this research has been
enormously helpful. The paper has also benefitted from the astute com-
ments of an anonymous reviewer.

Notes

1. For the sake of simplicity, I will assume that interrogative-taking verbs such
as ask, wonder, and inquire also have Goal arguments. A more refined treatment
might modify this.
2. For related studies on say and verba dicendi see Munro (1982), Lehrer (1988),
Suñer (2000). The special status of these verbs is recognized in typological
studies, such as Dixon (2006) and Noonan (2007).
3. Other verbs (e.g., tell) allow their complements to be elided in null comple-
ment anaphora (Grimshaw 1979, Depiante 2000) but still require the presence
of their complement if there is no appropriate antecedent in the discourse. See
note 8 on the status of manner-of-speaking verbs without complements.
4. The verbs hear and read also take Linguistic Material arguments and combine
with quotes. This suggests that it is the argument itself, rather than the SAY predi-
cate, which licenses direct quotes.
5. Whether the direct quote is the actual complement of the verb is controversial.
Obviously direct quotes are not just CPs like the complements in (2), (4), and
(6). The case for their complement status is argued in Grimshaw (2013, 2014).
See also Bonami and Godard (2008), de Vries (2006).
6. Only examples where the quote follows the copula as in (16) and (18) are
given here. The quote may instead be the subject as in (i):
(i) "Our exam was easy" is what the students said.
7. For the sake of brevity, I illustrate the behavior of verbs with Linguistic Mate-
rial arguments only in configurations where the quote is their sole argument. The
point can be replicated for verbs such as ask and tell, versus convince and show,
which take a DP in addition to their clausal argument, as in ask someone whether
it's raining or convince someone that it's raining.
8. The best-known example is the manner-of-speaking verbs (Zwicky 1971). I
do not use this term, because the additional component must be the means of
saying. This is why sentences like *He wiggled that it was time to leave are not
possible. The verb wiggle encodes an appropriate means for GO but not for SAY.
The SAY-by-means verbs are treated as a group here, but they are not uniform in
all respects. See Labendz (1998) on differences among them.
9. A very similar relationship holds between verbs like shrug and beam, which
report gestures or facial expressions. These exist as independent verbs, and also
form verbs with a complex meaning, roughly, express. In this combination they
take that-complements. They are not SAY verbs, however, and they do not combine
with quotes unless they are coerced.
10. The SAY-by-means and the SAY-with-attitude verbs typically combine with an
about-PP, which does not co-occur with a Linguistic Material argument. This
suggests that the about-PP is part of the schema for the activity predicates but
is not incorporated into their SAY versions. Presumably it is incompatible with
the Linguistic Material argument of these complex verbs, despite being compat-
ible with say even when a clausal argument is present.
11. The schema leads us to expect that the linguistic complement will be
obligatory. This is necessarily difficult to test. Any sentence which contains
the verb with no complement can in principle be analyzed as either the indepen-
dent activity verb or the SAY + mode verb with no complement. Since the
two have different aspectual characteristics (activity versus achievement), it
is possible in principle to distinguish the two analyses, but I do not pursue the
issue here.
12. Some SAY verbs such as say, tell, and hint are associated with yet another
schema meaning (approximately) show, i.e., constitute a source of evidence. In
this reading a Goal is possible.
13. Metaphorical extensions are possible, often accompanied by the adverb posi-
tively, e.g., That sign positively shrieks that people may not walk on the grass.
14. The explanation for this pattern may lie in the fact that write, say, and the
discourse-role verbs lack an affective element to license the at-PP. However, this
does not cover the case of complain.
15. A possible counterexample is the verb remind, which combines with quotes
(and is thus a SAY verb) and with that complements. In both cases the truth of
the linguistically expressed proposition seems to be presupposed. It is constant
under negation. Two points are of relevance in assessing this case: first, remind
is also a doxastic predicate, which may have ramifications for its use as a SAY
predicate. Second, other SAY verbs that seem factive at first glance are better
analyzed as reporting repetitions, and it is possible that remind is a special case
of this.
16. With non-quote clausal complements, say is awkward with interrogative
complements except in a question or negated as in: He hasn't said when he is
arriving.
References

Beavers, John, Beth Levin, and Shiao Wei Tham. 2010. The typology of motion
expressions revisited. Journal of Linguistics 46 (2): 331–377.
Beavers, John, and Andrew Koontz-Garboden. 2012. Manner and result in the
roots of verbal meaning. Linguistic Inquiry 43 (3): 331–369.
Bonami, Olivier, and Danièle Godard. 2008. On the syntax of direct quotation
in French. In Proceedings of the HPSG08 Conference, edited by Stefan Müller,
355–377. Stanford, CA: CSLI Publications.
Corver, Norbert, and Craig Thiersch. 2001. Remarks on parentheticals. In Prog-
ress in Grammar: Articles at the 20th Anniversary of the Comparison of
Grammatical Models Group in Tilburg, edited by Marc van Oostendorp and
Elena Anagnostopoulou. http://www.meertens.knaw.nl/books/progressingrammar/
corver.pdf.
Depiante, Marcela Andrea. 2000. The Syntax of Deep and Surface Anaphora: A
Study of Null Complement Anaphora and Stripping/Bare Argument Ellipsis.
PhD diss., University of Connecticut.
Dixon, Robert M. W. 2006. Complement clauses and complementation strategies
in typological perspective. In Complementation: A Cross-linguistic Typology,
edited by Robert M. W. Dixon and Alexandra Y. Aikhenvald, 1–48. Oxford:
Oxford University Press.
Dowty, David. 1979. Word Meaning and Montague Grammar. Dordrecht:
D. Reidel.
Grimshaw, Jane. 1979. Complement selection and the lexicon. Linguistic Inquiry
10 (2): 279–326.
Grimshaw, Jane. 2013. Quotes, subordination, and parentheticals. MS, Depart-
ment of Linguistics, Rutgers University.
Grimshaw, Jane. 2014. Direct quotes and sentential complementation. MS,
Department of Linguistics, Rutgers University.
Hale, Kenneth, and Samuel Jay Keyser. 1993. On argument structure and the
lexical expression of syntactic relations. In The View from Building 20: Essays in
Linguistics in Honor of Sylvain Bromberger, edited by Kenneth Hale and Samuel
Jay Keyser, 53–109. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Labendz, Jacob. 1998. Using standard American English manner-of-speaking and
sound-emission verbs as speech verbs. Senior essay, Brandeis University.
Lehrer, Adrienne. 1988. Checklist for verbs of speaking. Acta Linguistica Hun-
garica 38 (1–4): 143–161.
Munro, Pamela. 1982. On the transitivity of say verbs. In Studies in Transitivity,
Syntax and Semantics 15, edited by Paul J. Hopper and Sandra A. Thompson,
301–318. New York: Academic Press.
Noonan, Michael. 2007. Complementation. In Language Typology and Language
Description, edited by Timothy Shopen, 42–140. Cambridge, UK: Cambridge
University Press.
Suñer, Margarita. 2000. The syntax of direct quotes with special reference to
Spanish and English. Natural Language and Linguistic Theory 18 (3): 525–578.
Talmy, Leonard. 1985. Lexicalization patterns. In Grammatical Categories and the
Lexicon, vol. 3, edited by Tim Shopen, 57–149. Cambridge, UK: Cambridge Uni-
versity Press.
Vries, Mark de. 2006. Reported direct speech in Dutch. Linguistics in the Neth-
erlands 23: 212–223.
Zubizarreta, María Luisa, and Eunjeong Oh. 2007. On the Syntactic Composition
of Manner and Motion. Cambridge, MA: MIT Press.
Zwicky, Arnold. 1971. In a manner of speaking. Linguistic Inquiry 2 (2): 223–233.
6 Cognitive Illusions: Non-Promotional Passives and
Unspecified Subject Constructions

Joan Maling and Catherine O'Connor

6.1 Introduction1

Ambiguous illusions are one well-known subclass of cognitive illusions:
pictures or objects that elicit a perceptual switch between alternative
interpretations. A famous example is the Rubin vase. Rubin's vase (some-
times known as the figure–ground vase) is an ambiguous (i.e., reversing)
two-dimensional form that presents the viewer with two shape interpre-
tations. As the brain subconsciously attempts to distinguish the fore-
ground from the background, the viewer perceives either two black faces
looking at each other, or a white vase against a black background.
Each percept is consistent with the retinal image, but only one of them
can be maintained at a given moment.

Figure 6.1
The Rubin vase
While the most well-known cognitive illusions are visual, we argue in
this paper that impersonal constructions offer a syntactic analogue to the
ambiguous image on the previous page, a surface string which can
be analyzed in two different ways: as passive or active. As with visual
illusions, the surface string remains identical, but the grammatical
representation (the linguistic analogue of the percept) changes. Just as
visual illusions underscore how the brain organizes its visual environ-
ment, the rare linguistic analogue may reveal how the brain organizes
linguistic input.
Our case for a linguistic parallel to the ambiguous visual image is
complex and it takes place in two different arenas. One is the arena of
language change. As we will illustrate in this paper, a variety of languages
show evidence through patterns of historical change that speakers of a
language may be interpreting these ambiguous impersonals in divergent
ways: some speakers reading the construction as a passive, and other
speakers interpreting it as an active. We will show that linguistic change
reveals the consequences of constructional ambiguity, and provides the
basis for our parallel with the Rubin vase.
Another quite different arena where the parallel can be explored is
amidst conflicts among linguists about the linguistic status of certain
impersonal passives as either active or passive. The passive is surely
one of the most thoroughly examined constructions in the world's lan-
guages. To the uninitiated, it is puzzling to see experienced linguists
arguing about whether a particular construction in a given language is
or is not a passive. How could there be disagreement on something so
foundational? Yet the constructions that we focus on here, sometimes
called non-promotional passives, lead to responses that resemble viewers
arguing over the Rubin vase.
In what follows, we will track both arenas at the same time, looking
at data from language change (a result of speakers' interpretations
over time) and at linguists' responses to that data. We will show that
the ambiguity of impersonal constructions makes them an active nexus
of potential for change. At the same time, we will show linguists
struggling in their own analyses to discern what cues are most important
for speakers. Of course, the linguistic ambiguity is not completely
parallel with the Rubin vase: two viewers arguing over whether the image
really depicts two profiles or really depicts a vase are missing the
point. For linguists or speakers, there will be a decidable fact of the
matter: for this construction, at this time, for this speaker, is it active or
passive? The underlying assignment of grammatical functions to the
syntactic form by a speaker is the real reading. It is the shift across
speakers and across time, and consequently across linguists, that we are
following here.

6.2 The Problem Space

Linguists generally have no problem agreeing that some constructions
are passives. Focal exemplars (i) are formed on transitive verbs, (ii) map
the theme or patient of that verb onto the grammatical function of
subject with all the morphological properties that ordinary subjects
in that language display, and (iii) display the option of expressing the
agent of the verb in an oblique agent phrase (sometimes called the
by-phrase).2
Beyond these focal exemplars, however, are many constructions that
are not so easy to categorize. Some constructions do not allow any
surface expression of the transitive agent; no by-phrase is possible. At
this point, definition of a passive might follow a decision tree of sorts.
First, is the theme/patient argument of the transitive verb promoted to
subject status? Constructions in which the patient shows evidence of
holding the grammatical function of subject are common.3 But even if
the theme/patient argument appears to be a subject, there are still debates
about category membership. Possibilities include middles, inchoatives,
reflexives, and adjectival participles, as well as real passives without the
capacity for a by-phrase.
On the other branch of this decision tree are constructions in which
the theme/patient argument of the transitive verb simply remains in
the morphosyntactic setting reserved for direct objects; that is, it is
not expressed as a subject. If there is no subject NP expressed on
the surface (neither agent nor theme/patient), then we have another
set of puzzles. Perhaps it is what is sometimes called a non-promotional
passive. Or perhaps it is an active clause with a silent subject argu-
ment. It is this problematic branch of the decision tree for passives
that we focus on in this paper. We present evidence that speakers,
over time, may find this a Rubin vase-like task. At another level of
description, linguists are trailing after the speakers, arguing about the
analysis.
Part of the classification problem lies in the fact that surface morphol-
ogy is often ambiguous. A given morpheme may have two (or more)
interpretations, each of which is consistent with the grammar, but only
one of which can be maintained at a given moment.
Consider the italicized verb forms in the following examples from Jane
Austen:
(1) a. Our garden is putting in order, by a Man who bears a remarkably
good character, has a very fine complexion & asks something less
than the first.
(Austen 1884, letter to her sister Cassandra dated February 8,
1807)
b. The clock struck ten while the trunks were carrying down.
(Austen 1803, Northanger Abbey, chap. 20)
The clauses containing these verb forms are semantically passive, yet
nothing in the verb forms themselves identifies the clause as passive
voice. The progressive passives that we are familiar with, containing
two consecutive occurrences of the auxiliary be, did not appear in
English until the late 18th century.4 The two constructions co-existed for
about a century, and during this period the verbal morphology, for
example, were carrying, continued to be ambiguous between active and
passive voice:
(2) a. The men were carrying the trunks down the stairs. Active
b. The trunks were carrying down the stairs. Passive
Our conclusion is that morphology is an unreliable indicator of voice.
Sells, Zaenen, and Zec (1987) make a similar argument that the morpho-
logical status of the reflexive as either a suffix or a free-standing pronoun
is independent of its syntactic behavior. Such mismatches pose chal-
lenges for both the linguist and the child learner, and can lead to syntactic
reanalysis and grammatical change.
We begin by briefly reviewing one example of this kind of construction
in Irish, the so-called autonomous construction. This has been described
as a non-promotional passive (Stenson 1989; Noonan 1994). However,
McCloskey (2007) has argued convincingly that it is not a passive, but is
rather an impersonal active. Using the same syntactic tests, we then
briefly compare the no/to construction in two closely related Slavic
languages, Ukrainian and Polish, showing that this cognate construction
is syntactically passive in one language, but syntactically active in the
other. The contrasting syntactic behavior shows definitively that although
many of the constructions designated as non-promotional passives or
transitive passives are actually impersonal actives, there are some
that are indeed passives according to standard diagnostic syntactic
properties.
Finally, we look briefly at a new construction in Icelandic that has
emerged over the past few decades and has occasioned a great deal of
disagreement over its categorization as active or passive. Maling and
Sigurjónsdóttir (2002) argue that it is an impersonal active, like the other
constructions we review here. However, because the change is ongoing,
the evidence is not as categorical as it is for Irish or for Polish versus
Ukrainian. We conclude by reflecting on two different sources of defini-
tional indeterminacy in all these cases, and in others located in the same
nebulous part of constructional space.

6.3 The Irish Autonomous Construction

Irish has a form of the finite verb known as the "free" (form of the) verb,
or the "autonomous" form:5
(3) a. Tógadh suas an corpán ar bharr na haille.
raise-PST-AUT up the body on top the cliff-GEN
'The body was raised to the top of the cliff.'
(McCloskey 2007, 826, ex. 1a)
b. h-Itheadh, h-óladh, ceoladh, . . .
eat-PST-AUT drink-PST-AUT sing-PST-AUT
'There was eating, drinking, singing, (and then the storytelling
began).'
(McCloskey 2007, 826, ex. 2c)
The autonomous form is derived by adding a distinctive suffix ((e)adh
in the Past) to the verbal stem, one for each tense (Present, Past, Future,
Conditional Mood, Past Habitual). The autonomous inflection is derived
historically from the passive; however, as illustrated in (3b), it can be
added not only to transitive verb stems, but also to intransitive verbs.
McCloskey (2007) argues that "[d]espite its origin, and despite the fact
that it fulfills many of the same discourse functions as short passives in
English, the autonomous construction is not a passive, or not at least if
by a passive form we mean one in which the underlying object of a transi-
tive verb is rendered as a surface subject" (827).
of an autonomous form derived from a transitive verb stem behaves like
any other direct object in Irish: (a) it is marked accusative rather than
nominative; (b) if it is a light pronoun, it may be postposed to clause-final
position, an option available to direct objects but not to subjects; and (c)
it may be a resumptive pronoun, also an option available to direct objects
but not to subjects in Irish (see McCloskey [2007] and references cited
there). Scholars agree that the patient is not promoted to surface subject
in the autonomous form, but some still analyze it as a passive, albeit
impersonal in the sense of not having a grammatical subject (see
Stenson [1989] and Noonan [1994]).
What, then, happens to the subject of the corresponding active verb,
the most prominent of the verb's arguments: the external argument of
a transitive verb, the internal argument of an unaccusative, the experi-
encer argument of a psych-predicate, and so on? Is this silent subject
completely absent? McCloskey (2007) argues that the silent subject of
an autonomous verb is like an arbitrary subject pronoun, but unlike an
implicit agent, in being syntactically active (828n3).
Cross-linguistically, the syntactic presence of an external argument can
be detected in standard ways. For example, an external argument can
bind anaphors, act as a controller, and support subject-oriented adverbi-
als. An apparent challenge to the active analysis for the Irish autonomous
form comes from the fact that it does not license reflexives (see e.g.,
Stenson 1989, 384; Noonan 1994, 287288):
(4) *Gortaíodh é féin.
hurt-PST-AUT him REFL
Intended: 'People hurt themselves.'
(McCloskey 2007, ex. 11)
But as McCloskey argues, the impossibility of such examples can be
explained by a failure of agreement; the reflexive fin is added to the
3rd person singular masculine pronoun to make the corresponding
reflexive pronoun; the base pronoun must agree in person, number (and
for 3rd singular pronouns, also gender) with the binder. If the null argu-
ment of the autonomous form lacks the necessary person and number
features, it would not be surprising that it cannot bind the reflexive.
Support for this suggestion comes from the fact that the autonomous
form does allow the reciprocal pronoun, which has a single invariant
form a chile:
(5) a. Chuirtí geall len-a chéile.
put-PST-HABIT-AUT bet with each.other
'People used to place bets with each other.'
b. Tógadh suas an corpán ar bharr na haille
raise-PST-AUT up the body on top the cliff
ansan le cabhair a chéile.
then with help each.other
'The body was raised to the top of the cliff then with each
other's help.' (McCloskey 2007, 830, ex. 13a,b)
As McCloskey notes, the ungrammaticality of the English passive trans-
lation in (5b) reinforces the contrast between the autonomous form and
agentless passives; unlike the autonomous argument, the implicit agent
of a passive cannot bind an anaphor. A more faithful translation of (5b)
might be an active impersonal such as Then they raised the body to the
top of the cliff with each others help (830, n4).
What about an agentive by-phrase? Noonan (1994, 284285) claims
that the agent can be realized overtly as an oblique (though with a dif-
ferent preposition than is used for the by-phrase of a canonical passive),
and provides the example in (6):
(6) Bualadh Seán (le Liam).
hit-PST-IMPERS John (with Bill)
'John was hit (by Bill).'
(Noonan 1994, 280, ex. 2)
Many speakers do not accept this sentence, however; attested examples
are all from texts that are in other respects also sort of archaic (either
naturally or by artifice) (McCloskey, pers. comm., February 25, 2013).
The most thorough study is S (2006), who documents that although
overt agents with autonomous forms were common in earlier stages of
the language, they are rare and marginal in modern varieties of Irish
(McCloskey 2007, 828n3). Indeed, this topic is famous in Irish linguistic
politics. At the time of the first revival efforts, there was a big debate
about what form of the language should be the target of revival efforts:
the literary language of the 17th century, or caint na ndaoine 'the speech
of the people.' Those who favored the vernacular pointed out that if the
old literary language were to be revived, Irish would have things like
autonomous verbs with overt agents, something which, they claimed, was
unknown in the living language of the day.
Thus it seems that the autonomous form has the same argument struc-
ture as the corresponding active verb. When attached to a finite verb, the
autonomous inflection licenses the appearance of a silent argument with
semantic properties close to those of pronominal elements usually called
arbitrary or impersonal subjects.

6.4 Polish versus Ukrainian

The Irish controversy indicates the importance of developing concrete
syntactic diagnostics for an active vs. a passive analysis when the direct
object shows no signs of promotion to subject, yet no subject argument
is expressed on the surface. Based on her study of the Polish and Ukrai-
nian participial no/to constructions and the Irish autonomous construc-
tion, Maling (1993) selected the four syntactic properties listed in (7) to
use as diagnostics. The values given below would indicate that a given
construction is active:
(7) a. No agentive by-phrase is possible.
b. Binding of anaphors (reflexive/reciprocal) by the null argument
is possible.
c. Control of subject-oriented adjuncts by the null argument is
possible.
d. Nonagentive (unaccusative) verbs can occur in the
construction.
The underlying assumption is that a syntactically present subject argu-
ment licenses binding of lexical anaphors and control of subject-oriented
adjuncts, but blocks an agentive by-phrase. Furthermore, unaccusative
verbs should be able to occur in the construction provided that the verb
selects for a human (internal) argument. A syntactically active imper-
sonal construction with an overt grammatical subject, for example,
French on or German man, has all four of these properties; in contrast,
the canonical passive construction lacks all four properties.6
Using this diagnostic framework, Maling and Sigurjónsdóttir (2002,
100–107) contrasted the syntactic properties of the accusative-assigning
participial no/to construction in Polish versus Ukrainian:
(8) a. Świątynię zbudowano w 1640 roku. (Polish)
church-F.ACC built-no in 1640 year
'The church was built in 1640.'
(Maling and Sigurjónsdóttir 2002, ex. 8b)
b. Cerkvu bulo zbudovano v 1640 roci. (Ukrainian)
church-F.ACC was built-no in 1640 year
'The church was built in 1640.'
(Sobin 1985, 653)
This contrast is puzzling, because in addition to the null subject and non-
promoted direct object, both constructions display the same verbal mor-
phology. Maling and Sigurjónsdóttir showed that despite their common
historical origin, and the shared morphological properties of assigning
accusative case and consequent lack of agreement, the Polish and Ukrai-
nian constructions are polar opposites in terms of syntactic behavior. The
Table 6.1
Syntactic properties of various constructions in Polish and Ukrainian

Syntactic property                        Pol/Ukr Active   Pol/Ukr Passive   Polish no/to   Ukrainian no/to
agentive by-phrase                        *                ok                *              ok
bound anaphors in object position         ok               *                 ok             *
control of subject-oriented adjuncts      ok               *                 ok             *
unaccusative (nonagentive) verbs          ok               *                 ok             *

comparison is summarized in table 6.1. As Maling and Sigurjónsdóttir
document, the Ukrainian no/to construction behaves like a true passive,
whereas its Polish counterpart does not (for Polish, see also Blevins
[2003]; Kibort [2001, 2004]). Note that in addition to the no/to construc-
tion, Polish and Ukrainian both have a canonical passive with the
expected syntactic properties.
The take-home lesson from this comparison is that we cannot tell
what the syntactic behavior of a construction is by looking at super-
ficial morphological properties such as case and agreement. Despite
their clearly cognate verbal morphology, Polish and Ukrainian have
evidently evolved two syntactically distinct versions of what must
have been the same construction at some earlier point. The syntactic
properties of the Ukrainian no/to construction show that the ability
to assign accusative case does not necessarily decide between the two
possible mental representations (contra Haspelmath [1990, 35]; Blevins
[2003, 481]).7
The Polish and Ukrainian outcomes, one active and one passive as
judged by our syntactic tests, show us that over time, speakers may waver
between the two interpretations of their active-passive Rubin vase. One
might guess that linguists could agree on the facts, following speakers'
eventual stabilization. However, the next case shows that the linguistic
Rubin vase may continue to evoke disagreement for linguists, even after
speakers have reached a stable interpretation.

6.5 The Icelandic New Transitive Impersonal Construction

A new transitive impersonal construction is developing in Icelandic. The
New Transitive Impersonal (NTI) takes the form in (9); it appears to have
a passive participle but differs from the canonical passive in that the
verbal object (marked in bold) remains in situ and gets assigned
accusative rather than nominative case (if that argument does not bear
a lexical case, dative or genitive):
(9) Loks var fundið stelpuna eftir mikla leit.
finally was found-NEUT girl.the-ACC after great search
'The girl was finally found after a long search.' or
'They finally found the girl after a long search.'
This innovation is a system-internal change that is neither the result of
borrowing nor the result of any phonological change or morphological
weakening. What exactly is the nature of the change? The analysis of the
innovative construction has been the subject of lively debate in recent
years; scholars differ in their assessment of whether the NTI is a transi-
tive passive or an active impersonal construction.8 Everyone agrees that
the postverbal NP in the NTI is an object; the disagreement lies in what
is assumed to occupy the syntactic subject position. Under one analysis,
the NTI is a non-promotional passive resembling the Ukrainian parti-
cipial no/to construction (Eythórsson 2008). Under the alternative
analysis, the null subject is proarb, a thematic [+human] subject that can
serve as a syntactic binder; the construction is syntactically active like the
Polish counterpart (Maling and Sigurjónsdóttir 2002; Maling 1993, 2006).
Icelandic also has a productive impersonal passive of intransitive
verbs, which presents an important backdrop to the NTI. The fact that
the understood subject of an impersonal passive of an intransitive verb
can be interpreted only as a volitional agent (typically human), even if
the verb allows inanimate subjects in the active voice, surely supports
the plausibility of the proarb analysis for the NTI. The subject of the verb
flauta 'whistle' can be many things, including tea kettles or trains, but the
impersonal passive Það var flautað 'it-EXPL was whistled' can be under-
stood only as describing human whistlers.9
The syntactic characteristics of the NTI have been investigated in two
nationwide surveys, the first of which was conducted in 1999–2000 and
reported in Maling and Sigurjónsdóttir (2002). A questionnaire was dis-
tributed to 1,731 tenth graders (age 15–16) in 65 schools throughout
Iceland; this number represents 45% of the children born in Iceland in
1984. More than half of the adolescents in most parts of the country (n
= 1475) accepted sentences with an accusative definite postverbal object
like the one in (9), with a range between 51%–69% across the various
test sentences. However, only 28% of adolescents in Inner Reykjavík
(n = 220) accepted these sentences, and very few of the adult controls
(n = 200).
A surprising and unexpected result of the survey came from the adult
controls. In spite of their disagreements about the syntactic status of the
NTI, all scholars of Icelandic considered traditional impersonal passives
of intransitive verbs to be true passives. Thus it was a surprise to discover
that about half of the adult speakers in the survey accepted two of the
diagnostics for active constructions (reflexives and subject-oriented
adjuncts) in traditional impersonal passives. An example containing a
subject-oriented adjunct is shown in (10):
(10) Það var komið skellihlæjandi í tímann.
it-EXPL was come laughing.out.loud into class
People came into class laughing out loud.
(Maling and Sigurjónsdóttir 2002, ex. 37a)
Maling and Sigurjónsdóttir pointed out that the more subject-oriented
participles are accepted, the more simple reflexives are accepted (126).
For adolescents, the correlation was highly significant (r = 0.433, n = 1693,
p < 0.001, 2-tailed); for adults the correlation was also highly significant
(r = 0.532, n = 199, p < 0.001, 2-tailed) (Maling and Sigurjónsdóttir 2002,
126n15). This correlation supports the suggestion that these speakers
have a syntactically active representation for the traditional so-called
impersonal passives.
In contrast, there are other speakers who allow neither reflexives nor
subject-oriented adjuncts; these judgments reflect a passive analysis.
When asked about a sentence like (10), one such speaker, a woman in
her seventies, remarked: Það vantar einhvern 'someone is missing'. Her
remark suggests that her grammar did not make available a referent for
the controller of the adjunct.
We take no position on whether the grammar of an individual speaker
can have both or only one of the representations. We simply observe that,
in the aggregate, there is evidence for both grammatical analyses among
contemporaneous speakers.
Since almost no adults accepted the NTI, there is an implicational
relation: speakers who accept the NTI with accusative objects also accept
traditional impersonal passives with reflexive verbs, but not vice versa.
Maling and Sigurjónsdóttir (2002) suggested that sentences with a reflex-
ive object represent the first step in the reanalysis of the past participle
in the NTI from passive to syntactically active. They interpreted the age-
related variation for reflexive impersonals as reflecting three stages in
the diachronic development of the NTI. Support for the claim that reflex-
ive impersonals are an intermediate stage in the development of what is
now called the NTI comes from the fact that the reflexive impersonal is
a relative newcomer. Eythórsson (2008, 189) observed that impersonal
passives of reflexive verbs were not found in Old Icelandic, but rather
seem to be an innovation of Modern Icelandic that is increasingly gaining
ground. A corpus search on an open-access digital library (timarit.is),
which hosts digital editions of newspapers and magazines from the 17th
century to the early 21st century, found only sporadic examples of reflex-
ive impersonal passives in the earlier periods, but after about 1890, the
number of examples increases significantly (Árnadóttir, Eythórsson, and
Sigurðsson 2011). Thus a crucial first step in the reanalysis of the imper-
sonal passive as a syntactically active construction in Icelandic seems to
have been the extension to reflexive predicates; this then extends to
other bound anaphors, and finally to the inclusion of (definite) object
NPs, both full NPs and personal pronouns that are morphologically
accusative, as expected in an active clause.
The ongoing syntactic change in Modern Icelandic indicates that
native (adult) speakers do not all necessarily come to the same gram-
matical analysis of every construction; on the contrary, speakers them-
selves may come to radically different analyses of the same data. The
readily observable data underdetermines the analysis; it is only by
pushing the speaker to judge more complex, or less common (even van-
ishingly rare), sentences that we can see the empirical consequences of
choosing one syntactic representation over another.

6.6 Discussion and Conclusion

In our view, morphology and syntax conspire in the cases discussed here
to assemble a linguistic Rubin vase. Although we have discussed only
cases where an apparently passive construction has been reanalyzed as
an impersonal active, grammatical change can occur in the opposite
direction as well (Siewierska [2010], drawing on Broadwell and Duncan
[2002]). Kaqchikel, an Eastern Mayan language of highland Guatemala,
has a variety of passive constructions, including one marked with the affix
ki. The verb in the ki-passive shows active morphology, with an active
transitive verbal suffix /-Vj/ and the 3rd plural ergative agreement marker
ki, as would be appropriate for a transitive verb with an impersonal
'they' subject.
Broadwell and Duncan (2002) argue that this verb form has evolved
into a construction with the syntactic properties of a passive. It can
co-occur with an agentive by-phrase, which can be singular or plural, and
Table 6.2
Mismatch between morphology and syntax

                      Active syntax                     Passive syntax

Active morphology     French on; German man             Kaqchikel ki-passive
Passive morphology    Polish no/to; Irish autonomous    Ukrainian no/to

even 1st or 2nd person. But in contrast with the Polish no/to construc-
tion, the ki-passive is a promotional passive: it is the patient argument
and not the agent which has the grammatical properties of a subject,
as shown by syntactic tests including the use of subject-oriented
adverbials.
Taken together, our exemplars reveal that every possible association
between surface morphology and syntactic behavior is attested cross-
linguistically, as shown in table 6.2. In each case that we have discussed
above, there are several potential sources of indeterminacy within the
constructional object itself. One is structural: a construction that derives
an intransitive impersonal is inherently ambiguous (Haspelmath 1990,
35; Maling and Sigurjnsdttir 2002, 126; Blevins 2003, 481), as are sub-
jectless transitives. As we have seen here, the morphology may be associ-
ated with a canonical passive historically or in other languages in the
family, while the syntax may suggest an active construction. Because of
the inherent morphosyntactic ambiguity, speakers of the same language
may construe one of these constructions in different ways, leading to
eventual change, as in the Icelandic case, and in Polish versus Ukrainian.
This leads to actual divergences in the data, even as linguists seek to
discern the unifying reality.
But as we have mentioned, it is not just speakers who diverge in their
interpretation of these ambiguous linguistic objects. In their typological
survey, Keenan and Dryer (2007) delve further into disputes among
linguists over exactly the class of objects we have discussed: those at the
border of impersonal actives (which they call unspecified subject con-
structions) and non-promotional passives. They make reference to several
long-standing debates about such cases. Wolfart (1973) analyzes the
ikawi suffixed construction in Cree as an unspecified subject construc-
tion, while Dahlstrom (1991) analyzes it as a passive. Wolfart's analysis
relies on the morphology, Dahlstrom's on a syntactic test. Reportedly,
Hockett and Bloomfield sparred over the analogous construction in
Ojibwa. MacKay (1999) describes a Tepehua construction as passive, and
the cognate construction in Misantla Totonac as an unspecified subject
(active) construction. O'Connor and Maling (2014) examine the proper-
ties of the ya construction in Northern Pomo (O'Connor 1992), arguing
that it resembles the Irish autonomous construction; Mithun (2008) ana-
lyzes the Central Pomo counterpart as an impersonal passive. The histori-
cal dimension is significant, both for speakers and for linguists. As the
Icelandic, Irish, and Polish/Ukrainian cases tell us, the syntactic behavior
of such constructions can change over time, perhaps as a result of the
ambiguity created by this linguistic analogue to the Rubin vase illusion:
the non-promotional passive/impersonal active.
It is a mystery why speakers might vacillate in their interpretation of
such constructions, finally settling on one interpretation over time. Yet
linguists, one might think, should be able to simply follow the speakers'
judgments and apply the syntactic tests. Why should there be persistent
disagreement among linguists as to whether they are seeing faces or a
vase? In O'Connor and Maling (2014), we drew a comparison with the
International Astronomical Union's debate over the status of Pluto as a
planet, and suggested that linguists' trailing disagreements about the
categorization of a construction reflect their commitments to the relative
importance of distinct dimensions of the construction (information struc-
ture, syntax, morphology, etc.) for a theory of grammar. We suggest that
for linguists, these examples provide an opportunity to move beyond
the shifting linguistic image to a discussion of the grounding reality
beyond it, the grammatical interpretations assigned to morphosyntactic
complexes. Thoughtful dialogues about such cases may provide a
space for reaching rare but valuable agreements on our theoretical
commitments.

Notes

1. We thank the editors for the invitation to contribute to this volume in honor
of Ray Jackendoff, who has made such insightful and important contributions to
our understanding of the semantics-syntax interface. The material in this paper
is based in part on work done while the first author was serving as Director of
NSF's Linguistics Program. Any opinions, findings, and conclusions expressed in
this material are those of the authors, and do not necessarily reflect the views of
the U.S. National Science Foundation. Special thanks to Jim McCloskey for
helpful discussions of the syntactic issues surrounding the Irish autonomous
construction, and to Jane Grimshaw for help in clarifying our exposition.
2. Note that here we are not saying that constructions with an oblique agent
phrase, or by-phrase, are the most common type of passives. Siewierska and
Bakker (2012), following Corbett (2005), discuss how the by-phrase constitutes
a unique feature that excludes other analyses and thus serves to identify the
canonical passive.
3. We purposely problematize the promotion of the patient/theme argument in
this paper. Oddly, Keenan and Dryer (2007), in their impressive typological
survey of passives, do not prioritize our property (ii) as characteristic of a basic
passive. Rather, they seem to take it for granted and select the following proper-
ties as basic: "We shall refer to passives like (1b), John was slapped, as basic
passives. What makes them distinct from other passives is (i) no agent phrase
(e.g., by Mary) is present, (ii) the main verb in its non-passive form is transitive,
and (iii) the main verb expresses an action, taking agent subjects and patient
objects. Our justification for calling such passives basic is that they are the most
widespread across the world's languages" (328).
4. The earliest attested example is from 1772, First Earl of Malmesbury, cited by
Warner (1995):
(i) I have received the speech and address of the House of Lords; probably, that
of the House of Commons was being debated when the post went out.
Visser (1963–1973, vol. IV) has a long and amusing collection of vitriolic com-
ments on this new usage that go almost right through to the end of the 19th
century.
5. The following abbreviations are used:
ACC accusative (case) M masculine
ADV adverbial M.EV multiple event
AUT autonomous NEUT neuter
CAUS causative OBL oblique
COMP complementizer PL plural
DEM demonstrative PPART past participle
EXPL expletive PROX proximate
F feminine PST past (tense)
GEN genitive (case) REFL reflexive
HABIT habitual S singular
IMPERS impersonal SPEC specifier
LOG logophoric 3 third person

6. The dichotomy is not always this clear-cut. For example, in German, imper-
sonal passives allow a by-phrase, but also reflexives and reciprocals. Both inher-
ent and noninherent reflexive predicates form impersonal passives (see Plank
[1993], and especially Schäfer [2012] for discussion); moreover, at least some
unaccusative verbs can form impersonal passives (Primus 2011). A Google
search turns up examples like Es wurde gestorben auf beiden Seiten 'it was died
on both sides'. Clearly further investigation of the lexical restrictions is needed.
For Icelandic, see Sigurðsson (1989, 322, n. 48) and Thráinsson (2007, 266–273).
7. See Dubinsky and Nzwanga (1994) for discussion of a transitive impersonal
construction in Lingala, a western Bantu language.
8. A good survey of the empirical facts and theoretical issues can be found in
Thráinsson (2007, 273–283).
9. The situation for German and Dutch is more nuanced (see the discussion in
Primus [2011]). For impersonal passives in Norwegian, see Maling (2006).

References

Árnadóttir, Hlíf, Thórhallur Eythórsson, and Einar Freyr Sigurðsson. 2011. The passive of reflexive verbs in Icelandic. In Relating to Reflexives, edited by Tania E. Strahan, special issue, Nordlyd 37: 39–97. http://septentrio.uit.no/index.php/nordlyd/article/view/2024/1884.
Austen, Jane. 1803. Northanger Abbey. Ebook, Project Gutenberg. http://www.gutenberg.org/files/121/121-h/121-h.htm.
Austen, Jane. 1884. Letters of Jane Austen. Edited with an introduction and critical remarks by Edward Lord Brabourne. London: Bentley. http://www.pemberley.com/janeinfo/brablet7.html#letter37.
Blevins, James P. 2003. Passives and impersonals. Journal of Linguistics 39 (3): 473–520.
Broadwell, George Aaron, and Lachlan Duncan. 2002. A new passive in Kaqchikel. Linguistic Discovery 1 (2): 1–16.
Corbett, Greville. 2005. The canonical approach in typology. In Linguistic Diversity and Language Theories, edited by Zygmunt Frajzyngier, Adam Hodges, and David S. Rood, 25–49. Amsterdam: John Benjamins.
Dahlstrom, Amy. 1991. Plains Cree Morphosyntax. Outstanding Dissertations in Linguistics. New York: Garland Publishing.
Dubinsky, Stanley, and Mazemba Nzwanga. 1994. A challenge to Burzio's generalization: Impersonal transitives in western Bantu. Linguistics: An Interdisciplinary Journal of the Language Sciences 32 (1): 47–64.
Eythórsson, Thórhallur. 2008. The New Passive in Icelandic really is a passive. In Grammatical Change and Linguistic Theory: The Rosendal Papers, edited by Thórhallur Eythórsson, 173–219. Amsterdam: John Benjamins.
Haspelmath, Martin. 1990. The grammaticization of passive morphology. Studies in Language 14 (1): 25–72.
Keenan, Edward, and Matthew Dryer. 2007. Passive in the world's languages. In Language Typology and Syntactic Description, vol. I, 2nd ed., edited by Timothy Shopen, 325–361. Cambridge: Cambridge University Press.
Kibort, Anna. 2001. The Polish passive and impersonal in Lexical Mapping Theory. In Proceedings of the LFG 01 Conference, University of Hong Kong, Hong Kong, edited by Miriam Butt and Tracy Holloway King, 163–183. Stanford, CA: CSLI Publications.
Kibort, Anna. 2004. Passive and Passive-like Constructions in English and Polish: A Contrastive Study with Particular Reference to Impersonal Constructions. Ph.D. diss., University of Cambridge.
MacKay, Carolyn Joyce. 1999. A Grammar of Misantla Totonac. Salt Lake City: University of Utah Press.
Maling, Joan. 1993. Unpassives of unaccusatives. Unpublished ms., Brandeis University. Short version published as Maling (2010).
Maling, Joan. 2006. From passive to active: Syntactic change in progress in Icelandic. In Demoting the Agent: Passive, Middle and Other Voice Phenomena, edited by Benjamin Lyngfelt and Torgrim Solstad, 197–223. Amsterdam: John Benjamins.
Maling, Joan. 2010. Unpassives of unaccusatives. In Hypothesis A / Hypothesis B: Linguistic Explorations in Honor of David M. Perlmutter, edited by Donna B. Gerdts, John Moore, and Maria Polinsky, 275–292. Cambridge, MA: MIT Press.
Maling, Joan, and Sigríður Sigurjónsdóttir. 2002. The new impersonal construction in Icelandic. Journal of Comparative Germanic Linguistics 5: 97–142.
McCloskey, James. 2007. The grammar of autonomy in Irish. Natural Language and Linguistic Theory 25 (4): 825–857.
Mithun, Marianne. 2008. Does passivization require a subject category? In Case and Grammatical Relations: Studies in Honor of Bernard Comrie, edited by Greville Corbett and Michael Noonan, 211–240. Amsterdam: John Benjamins.
Noonan, Michael. 1994. A tale of two passives in Irish. In Voice: Form and Function, edited by Paul Hopper and Barbara Fox, 279–312. Amsterdam: John Benjamins.
O'Connor, Mary Catherine. 1992. Topics in Northern Pomo Grammar. Outstanding Dissertations in Linguistics. New York: Garland Publishing.
O'Connor, Catherine, and Joan Maling. 2014. Non-promotional passives and unspecified subject constructions: Navigating the typological Kuiper Belt. In Perspectives on Linguistic Structure and Context: Studies in Honor of Knud Lambrecht, edited by Stacey Katz Bourns and Lindsy L. Myers, 17–38. Amsterdam: John Benjamins.
Ó Sé, Diarmuid. 2006. Agent phrases with the autonomous verb in Modern Irish. Ériu 56 (1): 85–115.
Plank, Frans. 1993. Peculiarities of passives of reflexives in German. Studies in Language 17 (1): 135–167.
Primus, Beatrice. 2011. Animacy and telicity: Semantic constraints on impersonal passives. In Case Variation, edited by Klaus von Heusinger and Helen de Hoop, special issue, Lingua 121 (1): 80–99.
Schäfer, Florian. 2012. The passive of reflexive verbs and its implications for theories of binding and case. Journal of Comparative Germanic Linguistics 15 (3): 213–268.
Sells, Peter, Annie Zaenen, and Draga Zec. 1987. Reflexivization variation: Relations between syntax, semantics and lexical structure. In Working Papers in Grammatical Theory and Discourse Structure, edited by Masayo Iida, Stephen Wechsler, and Draga Zec, 169–238. Stanford, CA: CSLI Publications.
Siewierska, Anna. 2010. From third plural to passive: Incipient, emergent and established passives. Diachronica 27 (1): 73–109.
Siewierska, Anna, and Dik Bakker. 2012. Passive agents: Prototypical vs. canonical passives. In Canonical Morphology and Syntax, edited by Dunstan Brown, Marina Chumakina, and Greville G. Corbett, 151–189. Oxford: Oxford University Press.
Sigurðsson, Halldór Ármann. 1989. Verbal Syntax and Case in Icelandic in a Comparative GB Approach. Ph.D. diss., Lund University. Reprint, Reykjavík: Linguistic Institute, University of Iceland, 1992.
Sobin, Nicholas. 1985. Case assignment in Ukrainian morphological passive constructions. Linguistic Inquiry 16 (4): 649–662.
Stenson, Nancy. 1989. Irish autonomous impersonals. Natural Language and Linguistic Theory 7 (3): 379–406.
Thráinsson, Höskuldur. 2007. The Syntax of Icelandic. Cambridge: Cambridge University Press.
Visser, Fredericus Theodorus. 1963–1973. An Historical Syntax of the English Language. 4 vols. Leiden: E. J. Brill.
Warner, Anthony R. 1995. Predicting the progressive passive: Parametric change within a lexicalist framework. Language 71 (3): 533–557.
Wolfart, H. Christoph. 1973. Plains Cree: A Grammatical Study. Transactions of the American Philosophical Society, New Series, vol. 63, part 5. Philadelphia: American Philosophical Society.
7 Agentive Subjects and Semantic Case in Korean

Max Soowon Kim1

7.1 Introduction

Agentive subjects in Korean are typically marked nominative (i/ka) as
the unmarked case.2 But when a main verb is an unaccusative verb with
no agent argument, its (nonagentive) (surface) subject can also appear
in the dative. This is illustrated in (1).
(1) a. Wuli sensayngnim-i/*kkey piano-lul chi-si-ess-ta.
our teacher-NOM/*DAT piano-ACC play-HON-PAST-IND
Our teacher played the piano.
b. Wuli sensayngnim-i/kkey casik-i epsu-si-ta.
our teacher-NOM/DAT child-NOM lack-HON-IND
Our teacher has no child.
The subject status of the dative NP wuli sensayngnim in (1b) is confirmed
by the fact that it still licenses the verbal honorific affix si. Dative case,
however, is not possible for the agentive subject in (1a).
A less well-known and little-studied fact is that under certain condi-
tions agent subjects in Korean can also be marked with locative case (Suh
1996, 208–209; Sells 2004):
(2) Tutie wuli team-eyse/i kyelsungcen-ul ikiessta.
finally our team-LOC/NOM final.match-ACC won
At long last, our team won the final match.
Despite its locative case, wuli team in (2) retains the agent role of the
verb and exhibits all properties of grammatical subjects. This locative
case can alternate with nominative case, too, since nominative is always
available for subjects. The subject status of locative agent NPs can be
independently confirmed by attested examples like (3), where the loca-
tive NP licenses the verbal honorific affix.3
(3) Swutokwon haksayngtul-eykey hakkyochuk-eyse thukpyelhi
capital.area students-DAT administration-LOC specially
paylyehaycwu-si-ess-umyen hanta.
consider-HON-PAST-COND hope
We hope the (university) administration would consider
specifically the students from the Metropolitan Seoul area.
Examples like (2) and (3) raise interesting and important questions
about the properties of morphological case-marking within and across
languages. For Korean, they provide evidence against the widely-accepted
generalization that "there must be at least one nominative-marked NP
in every [finite] clause" (Y. Kim 1990, 206), which is corroborated by the
fact that no verb in Korean can take a dative subject alone (for the latest
overview of morphological case-marking in Korean, see Maling [2013]
and the references therein).4 From a crosslinguistic viewpoint, too, loca-
tive agentive subjects in Korean are of considerable interest. Since there
is no (autonomous) semantic case specifically for agents, agentive sub-
jects tend to be marked uniformly by syntactic case (i.e., nominative or
ergative case) across languages (Blake 1994). Thus, agent subjects marked
with a locative case (i.e., semantic case) have interesting implications for
the observed crosslinguistic generalization as well.5
This paper investigates the morphosyntactic property of locative agen-
tive subjects in Korean in detail and will provide relevant descriptive
generalizations (section 7.2). The paper also examines and refutes an
alternative analysis whereby the locative agentive subject is treated as
an adjunct phrase of sorts (section 7.3). The paper then discusses two
major consequences (section 7.4), including the implication of the analy-
sis for the Case-in-Tiers approach to morphological case (Yip, Maling,
and Jackendoff 1987).

7.2 Agentive Subjects and Semantic Case: Some Generalizations

7.2.1 Typology of Case


Studies of morphological case-marking have recognized a distinction
between semantic case and syntactic (or grammatical) case. Generally,
semantic case marks the semantic or thematic relations of NPs to the
governing head (e.g., verb), and syntactic case marks the syntactic or
grammatical relations of NPs to the governing head, irrespective of its
thematic properties (see e.g., Blake 1994). Thus, crosslinguistically,
while a particular semantic case is used for a particular semantic role or
a class of related semantic roles (e.g., various case forms expressing
source, location, or direction), a single syntactic case (e.g., nominative,
absolutive) is used for the common grammatical function shared by NPs
with diverse semantic roles.
Two salient points can be established about the locative case eyse. First,
eyse is a semantic case used normally for adjunct NPs and expresses the
physical location of an event:
(4) Hansoo-ka kyosil-eyse/*i/*∅ cemsim-ul mekessta.
Hansoo-NOM classroom-LOC/*NOM/*∅ lunch-ACC ate
Hansoo ate his lunch in the classroom.
The locative phrase in (4) is clearly an (optional) adjunct since informa-
tion pertaining to location or time is only peripherally construed with
respect to the act of eating. Importantly, this locative adjunct can only
be case-marked with eyse. Since eyse expresses a homogeneous semantic
relation (i.e., location in particular), it is construed as a semantic case
(for the classification of case, see Blake [1994], 3234). If, on the other
hand, a locative NP is not an adjunct but a verbal argument and its
co-occurrence with the verb becomes obligatory, the locative NP must
bear dative case in Korean, not locative case. This is an important point
to be discussed in section 7.2.2.
Second, eyse is not a lexical case. As a verb-governed case, not only is
lexical case used exclusively for argument NPs, but its assignment is also
idiosyncratically determined by verbs (Zaenen, Maling, and Thráinsson 1985;
Sigurðsson 2003). Moreover, unlike syntactic case, lexical case is retained
throughout derivation and is unaffected by syntactic operations that
produce case alternations (e.g., passivization). There then are two com-
pelling reasons why eyse cannot be a lexical case. First, lexical idiosyn-
crasy is irrelevant since the subject of any agentive verb may potentially
be a candidate for eyse-marking. Second, eyse can alternate with nomina-
tive case when it marks an agentive subject, but this is precisely what
lexical case is known to resist (i.e., it exhibits case preservation). Thus,
we are dealing with true instances of agentive subjects marked with a
semantic case.

7.2.2 Two Types of Locative Phrases


Korean distinguishes locative arguments and locative adjuncts by means
of case-marking. While locative adjuncts must be marked with locative
case, as shown in section 7.2.1, locative arguments must be marked with
the dative (ey) and cannot be marked with locative case. This bifurcation
is illustrated by the contrasts in (5).
(5) a. Insoo-ka uycawuy-ey/*eyse hwapwun-ul nohassta.
Insoo-NOM chair.top-DAT/*LOC flower.pot-ACC put
Insoo put a flower pot on the chair.
b. Ehang-ey/*eyse yelekaci koki-ka issta.
tank-DAT/LOC various fish-NOM exist
There are a variety of fish in the tank.
The morphosyntactic behavior of dative case and locative case, then,
results in the following robust generalization:6
(6) Mutual Exclusivity of Dative Case and Locative Case
Dative case and locative case in Korean are in complementary
distribution. In particular, dative case cannot mark an agentive
subject (i.e., external argument) and locative case cannot mark a
locative argument (i.e., internal argument).
That is, the distribution of dative case and locative case yields an interest-
ing subject-object asymmetry: locative case can be used for adjunct loca-
tive phrases and external arguments with an agent or actor role, but it
cannot be used for VP-internal arguments, that is, NPs that assume a
direct, indirect, or oblique object role underlyingly (for discussion of the
overall architecture of argument structure, see Grimshaw [1990]; Jack-
endoff [1990]).

7.2.3 A Semantic Condition


For an agentive subject to receive locative case, however, it must meet a
certain semantic condition, since agentivity alone is not sufficient, as
illustrated in (7).
(7) a. Pyengwon-i/eyse mwupohem hwanca-lul kepwuhayssta.
hospital-NOM/LOC no.insurance patient-ACC refused
The hospital refused patients with no health insurance.
b. Uysa-(tul)-i/*eyse mwupohem hwanca-lul kepwuhayssta.
doctor-(PL)-NOM/LOC no.insurance patient-ACC refused
The doctor(s) refused patients with no health insurance.
The minimal pairs in (7a,b) illustrate the semantic condition in question:
while the subject in (7a) can be construed as a collective noun that refers
to an organization or institution, the subject in (7b) cannot be so con-
strued. For sentence (7a) to be semantically well-formed, the subject
pyengwon must be understood as making reference to people at the
hospital (i.e., it gets a coerced reading, as Ray Jackendoff [pers. comm.]
pointed out to me) rather than to a mere location; otherwise, the verb's
agent theta role would be incompatible with its subject NP's reference
to a physical location (unless there is evidence that the verb in (7a) can
have a different argument structure from the (same) verb in (7b), which
seems unlikely; see also Sells [2004]).
Interestingly, it is on this sort of coerced reading that an agent
subject referring to an organization triggers plural agreement on the
verb in British English, as pointed out to me by Jane Grimshaw
(pers. comm.):
(8) a. My team are/*is playing tonight.
b. The hospital are/*is going to refuse patients with no health
insurance.
The existence of locative agentive subjects is certainly not an isolated
fact about Korean. Japanese, a typologically similar language, shows loca-
tive agentive subjects as well, as exemplified in (9) (Suh 1996, 209):
(9) Kore-ni tzuite-wa yatogawa-de tzuyoi hantai-o
this-DAT regard-TOP opposition.party-LOC strong objection-ACC
simesi-teiru. (Japanese)
show-PROG
On this matter, the opposition party is raising a strong objection.
The de-marked NP yatogawa in (9) also acts as a collective noun whose
referent is a political organization.
Why is it that semantic reference to an organization or institution is
an important prerequisite for locative marking in Korean? Since an
organization or institution that a collective noun refers to typically estab-
lishes itself with a location (i.e., office, mailing address, material exis-
tence), the intrinsic meaning of location becomes an integral part of the
denotation of a collective noun. The crucial point, then, is that for an
agentive subject to receive locative case, it must have an inherent meaning
of location, and this semantic condition explains why the agent subject
of (7b), which refers to individuals and not to an organization, cannot
receive locative case.
On the other hand, the contrasts in (10) show that some qualification
is needed for the required semantic condition vis-à-vis the integrated
meaning of location.
(10) a. *Phiko-eyse cayphan-ul ikiessta.
defendant-LOC trial-ACC won
The defendant(s) won the trial.
b. Phiko-chuk-eyse cayphan-ul ikiessta.
defendant-side-LOC trial-ACC won
The defense won the trial.
Why is locative case possible in (10b) but not in (10a)? Notice the crucial
difference: in (10b) a nominal suffix with a locative meaning (e.g., chuk,
phyen, or ccok 'side/part (of)') has been added to the subject. The ratio-
nale for this seems clear. An NP like phiko-chuk 'defendant's side'
includes every member of the party referred to collectively as the defen-
dant (i.e., all of the accused and their lawyers), rendering it a collective
noun eligible for locative case, even if the defendants side consists of a
singleton member (i.e., a lone defendant who represents himself or
herself without a lawyer).
We now state the requisite semantic condition in (11).
(11) Eligibility for Locative Case-Marking
For an agentive subject to receive locative case, it must have an
intrinsic meaning of location (by virtue of being a collective
noun) or an acquired meaning of location (by means of a
location-denoting nominal suffix).
The availability of location-denoting nominal suffixes, then, makes virtu-
ally any agentive subject eligible for locative case in Korean, predicting
a large linguistic corpus.
It is worth noting two related properties of Japanese and comparing
them with Korean. In Japanese, the suffix tati, which Nakanishi and
Tomioka (2004) analyze as a group-denoting suffix rather than a mere
plural marker, behaves similarly to the location-denoting suffix in Korean,
as illustrated in (12a) (Sells 2004, exx. 20–21).
(12) a. Gakusei-tati-de/*Gakusei-de bokoo-o otozureta.
student-PL-LOC/*student-LOC alma.mater-ACC visited
A group of students visited their alma mater.
b. Taroo-dake-de bokoo-o otozureta.
Taroo-only-LOC alma.mater-ACC visited
c. *Insoo-man-eyse mokyo-lul chacassta.
Insoo-only-LOC alma.mater-ACC visited
Without the suffix tati, the agentive subject in (12a) cannot receive loca-
tive case. On the other hand, a subset of locative agentive subjects in
Japanese clearly differs from their Korean counterparts. As the contrasts
in (12b,c) show, adding dake only to the agentive subject makes the
Japanese example eligible for locative case, whereas that still does not
salvage the Korean counterpart. These and related issues require further
research.

7.2.4 Case Stacking


Case stacking refers to the agglutination of two or more morphological
case markers producing a single NP that bears multiple morphological
cases. In Korean the most common examples of case stacking involve
dative-nominative stacking (although restricted to southwestern dia-
lects), but examples of dative-accusative stacking also exist with indi-
vidual variation (see Gerdts and Youn 1999; Yoon 2007).
Given that agent subjects can alternate between locative and nomina-
tive case, the obvious question is whether locative and nominative case
can stack. Sells (2004, 155) remarks that eyse [locative] and ka [nomina-
tive] represent options that the speaker must choose between and do
not allow the sequence eyse-ka, which is confirmed by (13) (cf. ex. (2)
in section 7.1).
(13) *Wuli team-eyse-ka kyelsungcen-ul ikiessta.
our team-LOC-NOM final.match-ACC won

7.3 Refuting the Alternative Analysis

I have argued that locative agent subjects have the properties of true
grammatical subjects. Should they not be true grammatical subjects, what
could be an alternative analysis? The alternative analysis (raised by a
reviewer) assumes that the true subject is nonovertmost likely a null
or elided argument NP (see S. Kim [1999] for NP ellipsis)and that for
the purposes of case marking, the nonovert subject claims the nomina-
tive. This is sketched in (14).
(14) NPNULL-(NOM) NP-LOC NP-ACC Verb (order irrelevant)
I will examine this null-NOM analysis and show why it cannot be a
correct analysis.
The null-NOM analysis can be shown to work well for a subset of
locative agentive subjects. Relevant examples are given below:
(15) a. Wuli hakkyo-eyse kummeytal-ul ttassta.
our school-LOC gold.medal-ACC won
Our school won the gold medal.
b. Wuli hakkyo-eyse chwukkwupwu-ka kummeytal-ul ttassta.
our school-LOC soccer.team-NOM gold.medal-ACC won
The soccer team of our school won the gold medal.
The locative NP in (15a) as an institution takes the verb's agent role and
is interpreted as indicated; it does not refer to the location of the event
(where the athletic games were played and the medals were awarded).
But in (15b) there is an (extra) nominative subject that assumes the
verb's agent role. On closer inspection, however, (15b) exemplifies a
part-whole relation whereby the part-NP (i.e., the soccer team) and the
possessor NP (i.e., our school) must share the agent role (see Maling and
Kim [1992] for the case-marking of part-whole relations). Nonetheless,
it does have the structure expected under the grammatical representa-
tion sketched in (14).
But the real problem with the alternative analysis is posed by examples
where the agent subject is an individual but still receives locative case
by virtue of the location-denoting suffix it bears. This is illustrated in (16).
(16) a. John-ccok-eyse cangki-lul ikiessta.
John-side-LOC chess-ACC won
John/John's side won the chess game.
b. proj-(NOM) NPj-LOC NP-ACC Verb
c. NPj-LOC proj-(NOM) NP-ACC Verb
An example like (16a) is semantically and pragmatically well-formed
even when a chess game was played by only two individuals, say, John
and Peter, and John won the game by beating Peter. According to the
null-NOM analysis sketched in (14), there must be a null subject NP that
claims the nominative and either of the structures in (16b,c) must be true.
But since the overt locative NP in (16a) and the null subject NP postu-
lated in (16b) or (16c) must refer to the same individual (i.e., John), the
two NPs must co-refer as indicated. Neither of the structures, however,
can survive as grammatical since (16b) is a Condition C violation and
(16c) is a Condition B violation (for Binding Theory, see Chomsky
[1981]). This means that the locative agent NP in (16a) must be the true
subject and directly take the verb's agent theta role. The conclusion, then,
is that the alternative analysis cannot be an adequate account for the
whole set of data, and therefore must be rejected in favor of the Locative
Agentive Subject Analysis proposed here, which takes an eyse-marked
subject to be a true grammatical subject.

7.4 Some Consequences

7.4.1 Honorific Case Markers


The honorific case kkeyse has been almost unanimously analyzed
as a nominative case (e.g., Sells [1995]; Suh [1996]; cf. Martin [1992]
and Yoon [2005] as the only exceptions) primarily because an NP
that bears it licenses the verbal honorific affix si (a property indic-
ative of subjecthood) and also because of the widely-accepted gen-
eralization that every finite clause in Korean must have at least one
nominative NP.
(17) Apeci-kkeyse na-lul pwulu-si-ess-ta.
father-kkeyse I-ACC call-HON-PAST-IND
My father called me.
Note, however, that being a subject does not imply nominative case, and
non-nominative NPs of certain kinds have also been analyzed as subjects
(Yoon 2005).
As I will show, kkeyse is in fact the honorific form of eyse and is there-
fore a locative case, not a nominative case. The kkeyse-marked subject
in (17) has the same grammatical property as the eyse-marked agent
subjects examined in this chapter, and the two case particles are in fact
allomorphs (see below). The following table summarizes the morpho-
logical paradigm of plain vs. honorific case particles.
(18)
Table 7.1
Plain Versus Honorific Case Forms

Case Plain Form Honorific Form

Dative (inanimate) ey kkey


Dative (animate) eykey *kkeykey
Locative (inanimate) eyse kkeyse

Notice the gap in the paradigm: the dative case for animate NPs has no
honorific form. Why is it that only case markers used for inanimate NPs
have honorific forms? Given that honorifics are used to express defer-
ence towards individuals, the link between deference and inanimacy at
first seems counterintuitive and contradictory. I argue, however, that that
is what honorifics are truly for: avoidance of direct reference. In Korean
culture, avoidance of direct eye contact during a conversation is a way
to show respect for the other party, since direct eye contact may be
interpreted as confrontation rather than attention. The linguistic coun-
terpart of this cultural aspect of indirectness is that avoidance of
making direct reference to a person also implies respect in a similar
manner, since direct personal reference may be interpreted more likely
as accusation rather than attribution. My argument is thus that it is no
coincidence that the honorific case particles take the form of locative
case in Korean.
It is interesting to note that, historically, kkeyse is believed to have
derived from (Noun)-s-kung-eyse after certain phonological processes
(e.g., deletion, consonant tensing), where s is a possessive marker and
kung is a by-now obsolete suffix meaning 'vicinity' (Yoon 2005, n. 20),
that is, 'in the vicinity of (a person)', which is very much like the meaning
of (Noun)-ccok-eyse 'at the side of (a person)' after the addition of a
location-denoting suffix discussed in section 7.2.3.
The mistaken identification of kkeyse as a nominative case particle has
come at a price for the research community. Assuming that kkeyse marks
an honorific nominative subject, Sells (1995) claims that kkeyse has
no syntactic or semantic postpositional properties . . . or spatial or tem-
poral meaning (294) and that its position in the morphemic template
provides crucial evidence against any view of Korean case whereby
postpositional case is supposed to represent underlying levels of struc-
ture, and grammatical case (nominative and accusative) is supposed to
represent surface levels (312). This mistaken identification has had
unfortunate consequences for the morphosyntactic analysis of Korean
case particles (cf. Sells 1995; Yu-Cho and Sells 1995).
That the honorific kkeyse is a locative case, not a nominative case,
helps explain certain puzzles in morphological case marking in Korean.
Unlike the regular nominative case marker (i/ka), kkeyse cannot be used
to mark predicate complement NPs in Korean:
(19) Kim kyoswunim-kkeyse/i choncangnim-i/*kkeyse toy-si-essta.
Kim professor-LOC/NOM president-NOM/*LOC become-HON-
PAST-IND
Prof. Kim became the (college) president. (Yoon 2005, ex. 32)
This restricted distribution of kkeyse, noted first by Sells (1995,
n. 21), would be a puzzle if it were a nominative case particle. But
this contrast is explained in simple terms when we recognize that it
is a subset of the same restriction that bans a locative case from
marking an internal argument or a complement NP (see generalization
(6) above).
My analysis that eyse and kkeyse are allomorphs makes an interesting
prediction: the two will be in complementary distribution in that kkeyse
cannot be used where eyse is used, and vice versa. This is indeed con-
firmed (Yoon 2005, exx. 38a,c):
(20) a. Apenim-kkeyse/*eyse mence ceyyu-lul hasiessta.
father-LOC first suggestion-ACC made
My father made a suggestion first.
b. Apenim-ccok-eyse/*ccok-kkeyse mence ceyyu-lul
father-side-LOC first suggestion-ACC
hasiessta.
made
(Lit.) My fathers side made a suggestion first.
This otherwise puzzling distribution of the two, noted by Yoon (2005),
however, is now explained as a classic linguistic behavior of allomorphs.
(The same holds true for the dative pair of ey and kkey as well, but space
limitations do not allow any further discussion here.)

7.4.2 Case Shift versus Case Overlay


What impact does lexical case marking have on assignment of syntactic
case? Lexical case marking may or may not block syntactic case marking,
hence two-way variation has been observed crosslinguistically: case shift
or case overlay. The Icelandic example in (21a) (Sigurðsson 2003, 231)
illustrates case shift of the nominative over to the postverbal object, and
the (spoken) Faroese example in (21b) (Barnes 1986, 18), case overlay
of the dative over the nominative subject:
(21) a. Henni líkaði þessi hugmynd. (Icelandic)
her(D) liked this idea(N)
She liked this idea.
b. Mær líkar henda filmin. (Faroese)
me(D) likes this film.the(A)
I like this film.
Behind these two modes of interaction is the influence and privilege of
the nominative in structural case assignment, which has been widely
accepted as a crosslinguistic property (and often referred to as Burzio's
Generalization). The idea is that the nominative has priority over the
accusative, or conversely, the accusative cannot be activated until the
nominative is activated first. Thus, if a lexically case-marked subject NP
(i.e., henni in (21a)) refuses the nominative, the nominative must find
another NP for assignment (if there is one in the same finite clause),
resulting in case shift as in Icelandic. If, however, the nominative is taken
by a lexically case-marked subject NP, the accusative then has its chance
for assignment, as in Faroese (a language closely related to Icelandic),
resulting in case overlay (i.e., mær in (21b)). Thus, on this analysis, case
overlay is tantamount to case stacking with a nonovert nominative case,
and this happens if a language-specific restriction does not allow the stacked
cases to be morphologically visible.
Theories differ as to how syntactic case is assigned in general and how
case shift vs. case overlay should be handled in particular. In this chapter
I will consider the Case-in-Tiers (CiT) theory of morphological case
assignment proposed by Yip, Maling, and Jackendoff (1987) and discuss
how finer adjustment to its provisions can make the theory better able
to deal with crosslinguistic variation, including locative marking of agen-
tive subjects in Korean. I suggest that the minimal revision necessary for
the CiT theory should be the addition of syntactic domains as a para-
metric choice to the Case Tiers:
(22) A Parametric Option for the Case-in-Tiers Theory
A language-specific parametric option is whether an entire finite
clause is selected as the syntactic domain in which the hierarchical
assignment of syntactic case is implemented.
In the remainder of this section I will justify this revision by showing how
the revised theory can better handle the crosslinguistic facts.
The distribution of lexical case and its interaction with syntactic case
in Icelandic strongly supports the CiT theory as originally proposed. In
Icelandic, the nominative gets assigned to another NP in the same finite
clause whenever there is a lexically case-marked subject. This is illus-
trated with the mappings in (23). (To save space, I will give relevant data
only when necessary, and refer the reader instead to the cited references
for verification.)
(23) a. GF-Tier:    NPSUBJ   NPOBJ
        Case-Tier:  NOM      ACC

     b.             LEX
        GF-Tier:    NPSUBJ   NPOBJ
        Case-Tier:  NOM      ACC      (case shift)


The diagram in (23a) illustrates the typical one-to-one and left-to-right
association between the GF-Tier and the Case-Tier. Sentence (21a) is
accounted for by the mapping in (23b). Because the mapping between
the GF-Tier and Case-Tier is one-to-one and left-to-right, assignment of
the nominative skips the lexically-marked subject and instead targets the
object NP. This CiT account also has been shown to handle the intricate
patterns of morphological case marking in Finnish with remarkable sim-
plicity (Maling 2008).
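To make the mechanics of this association concrete, the following small Python sketch models the one-to-one, left-to-right linking of the Case-Tier to the GF-Tier described above. It is my own illustration under simplified assumptions (a two-member case tier and a flat list of grammatical functions, with invented names), not a piece of the CiT formalism itself.

    # A minimal sketch (mine, not Yip, Maling, and Jackendoff's notation) of the
    # core Case-in-Tiers association: syntactic cases are linked one-to-one, left
    # to right, to grammatical functions, skipping any NP that already bears
    # lexical case, which yields case shift as in (23b).

    CASE_TIER = ["NOM", "ACC"]

    def associate(gf_tier):
        """gf_tier: list of (label, lexical_case_or_None), ordered left to right."""
        cases = iter(CASE_TIER)
        result = []
        for label, lex in gf_tier:
            if lex is not None:
                result.append((label, lex))                # lexical case is preserved
            else:
                result.append((label, next(cases, None)))  # next free syntactic case
        return result

    # Plain transitive, as in (23a):
    print(associate([("SUBJ", None), ("OBJ", None)]))   # [('SUBJ', 'NOM'), ('OBJ', 'ACC')]
    # Icelandic (21a): dative experiencer subject; the nominative shifts to the object:
    print(associate([("SUBJ", "DAT"), ("OBJ", None)]))  # [('SUBJ', 'DAT'), ('OBJ', 'NOM')]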
When applied to languages that exhibit case overlay, however,
the CiT theory faces certain problems that need to be resolved.
The diagram in (24) illustrates case overlay for the Faroese example
in (21b).
(24)                LEX
     GF-Tier:       NPSUBJ   NPOBJ
     Case-Tier:     NOM      ACC      (case overlay)


The problem is coherence or predictability as to when case overlay must
occur and when it must not, since the choice between case shift and case
overlay appears to be a matter of language-internal variation as well as
crosslinguistic variation. In Faroese, passives are the ones that defy the
expected case pattern of (24), giving rise to a dative-nominative pattern
instead (Barnes 1986). In Korean, locative agentive subjects induce case
overlay (i.e., LOC-ACC), as shown in (10b), while dative nonagentive sub-
jects trigger case shift (i.e., DAT-NOM), as shown in (1b).
Upon scrutiny, it appears that we should recognize the role of an
external argument or its syntactic domain for these other languages.
Unlike Finnish and Icelandic, which show evidence that the domain of
syntactic case assignment is an entire finite clause (see Maling [2008] for
Finnish examples where case shift extends into nonfinite complement
clauses as well), Faroese, Japanese, and Korean show evidence that a
critical division that separates IP (or S) from VP needs to be made for
the purpose of syntactic case assignment. In other words, in Faroese,
Japanese, and Korean, case shift occurs in a syntactic domain that lacks
an external argument, while case overlay occurs in a syntactic domain
that has an external argument.
Note that case overlay creates a conundrum. If case overlay has to
trigger assignment of the accusative in (10b), what blocks case overlay
from applying in (1b) is unclear. Besides, an overt case-stacked form like
*eyse-ka (LOC-NOM) is plainly ill-formed in Korean, and case stacking is
altogether disallowed in Japanese. The null hypothesis then is that if an
overt form of case stacking does not exist, neither does a covert form of
case stacking.
Moreover, on the IP vs. VP segregation, the CiT theory can be further
revised to have case shift and case overlay unified as one and the same
operation, namely, case shift.
(25) Case Shift (unified)
     Assignment of the nominative skips any subject of the following type:
     a non-nominative subject (e.g., quirky subject, oblique subject) or PRO.
Coupled with the IP vs. VP division, the revised CiT theory works
as follows. For Icelandic or Finnish, the theory applies as before:
a quirky subject induces case shift, and the nominative targets an object
NP since the domain of syntactic case assignment is the entire finite
clause that includes IP and VP. For Faroese, Japanese, and Korean,
a quirky (or oblique) subject also induces case shift. But if it is an exter-
nal argument, the nominative is left unassigned because the external
argument is the sole NP within the IP domain (excluding VP); accusative
case then looks into VP for assignment. If, however, a quirky subject is
not an external argument, hence the IP domain is either nonexistent
or there is no NP in the IP domain, the nominative then looks into
VP for assignment and takes priority over the accusative. This is sketched
in (26).
(26) a.                 LEX/SEM
        GF-Tier:    [IP NPEXT.ARG [VP NPOBJ ]]
        Case-Tier:  NOM      ACC

     b.                          LEX/SEM
        GF-Tier:    [IP [e] [VP NPOBJ  NPOBJ ]]
        Case-Tier:  NOM      ACC


The fact that the nominative is assigned only when there is an appro-
priate overt NP in its syntactic domain must be kept distinct from any
requirement that the nominative be mandatorily assigned (or discharged).
For one thing, if a finite clause has only one argument NP and if that NP
is lexically case-marked, the sentence shows up well-formed without a
nominative NP, as seen in Icelandic (Andrews 1982); we certainly do not
want to treat it as case overlay.
(27) Mér kólnar. (Icelandic)
me(D) is-getting-cold
I am getting cold.
For another, the revised operation of case shift can readily extend
to PRO:
(28) a. Eat them/*they!
b. Syö suklaa/*suklaan! (Finnish)
eat chocolate-NOM/*ACC
Eat the chocolate!
English and Finnish show the opposite property: while a null subject in
Finnish induces case shift (Maling 2008), a PRO subject in English does
not. The crucial difference is that in English the first domain of syntactic
case assignment is IP (excluding VP), but in Finnish the first domain of
syntactic case assignment is always the entire clause. Case shift applies
as stated in (25), with the outcome that the nominative is left unassigned
(or vacuously assigned) in English (PRO having to be caseless), which is
what happens to a single dative subject in Icelandic, while in Finnish the
nominative moves down to the object NP. This differs from the position
of Yip, Maling, and Jackendoff (1987) that a PRO subject is assigned
nominative case in English.
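A rough procedural rendering of the revised proposal may also be useful. The sketch below is again my own formalization, with invented names and a deliberately simplified clause representation; it adds the domain parameter of (22) and the unified case shift of (25): when the domain is split into IP and VP and the sole IP-internal NP bears lexical or semantic case, the nominative is simply left unassigned, so no covert case stacking arises.

    # A rough sketch of the revised procedure (my own formalization, not the
    # chapter's). split_domain=False approximates Icelandic/Finnish (the whole
    # finite clause is one assignment domain); split_domain=True approximates
    # Faroese/Japanese/Korean (IP vs. VP). NPs are pairs of (label,
    # lexical_or_semantic_case or None); ext_arg is the index of the external
    # argument, if there is one.

    def assign(nps, ext_arg=None, split_domain=False):
        cases = ["NOM", "ACC"]
        out = list(nps)

        def fill(indices):
            for i in indices:
                if not cases:
                    break
                if out[i][1] is None:                    # case shift: skip marked NPs
                    out[i] = (out[i][0], cases.pop(0))

        if not split_domain or ext_arg is None:
            fill(range(len(out)))                        # whole clause, left to right
        else:
            fill([ext_arg])                              # IP domain first
            if out[ext_arg][1] != "NOM":
                cases.remove("NOM")                      # NOM left unassigned; no overlay/stacking
            fill([i for i in range(len(out)) if i != ext_arg])   # then the VP domain
        return out

    # Korean (2)/(10b): locative agentive subject; the object surfaces accusative.
    print(assign([("SUBJ", "LOC"), ("OBJ", None)], ext_arg=0, split_domain=True))
    # Korean (1b): dative non-agentive subject; the remaining NP gets nominative.
    print(assign([("SUBJ", "DAT"), ("OBJ", None)], split_domain=True))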

7.5 Conclusion

In this chapter I have shown that agentive subjects in Korean can be case
marked with a locative case, both in plain form (eyse) and honorific form
(kkeyse), quite generally under a proper semantic condition. Based on
the morphosyntactic properties of locative agentive subjects, I argued
that the honorific case is a locative case and its linguistic function is to
avoid direct personal reference. In light of the semantic case marking of
agentive subjects, I reexamined certain aspects of the Case-in-Tiers
theory of morphological case assignment, especially the distinction
between case shift and case overlay, and suggested that case overlay be
treated as a subset of case shift.

Notes

1. At various stages of writing this paper I benefitted from the suggestions and
comments made by the following people: Jane Grimshaw, Ray Jackendoff, Joan
Maling, Keiko Murasugi, Mamoru Saito, James Yoon, and the anonymous review-
ers. I also benefitted from the audience at the Nanzan University Syntax Work-
shop (Nagoya, July 2011), where parts of the material were presented. I am solely
responsible for any flaws and errors. Last, but not least, I am grateful to the
editors of this volume, especially Ida Toivonen, for the opportunity to participate
in the much-awaited celebration of Ray's 70th birthday. His intellectual influence
on my linguistic thinking has been enormous, and it has been my pleasure and
privilege to learn from and be around such a gifted linguist. I fondly remember
when I was housesitting his Belmont home in the summer of 1996. I had the
privilege of using his marvelous home office on the third floor. There was a small
additional room that was accessible only from his office. What did I see there?
His stunning artistic and engineering talent: the whole room was railed for the
lovely locomotive Ray had built for himself! Who would have thought that while
writing a train of books, this world-class linguist was building a train and rails?
2. A topic marker (un/nun) must not be taken as a morphological case since it
can mark any NP, regardless of the NP's GF, theta role, or argumenthood, hence
it lies outside the morphological case system.
3. Available at: http://news.hallym.ac.kr/news/articleView.html?idxno=4045 (with
translation and Romanization added by this author).
4. The status of dative case in Korean is an unresolved issue. If dative is a syn-
tactic case rather than a semantic case, as suggested to me by Joan Maling (pers.
comm.), much of the generalization can be explained (i.e., it has to compete
with nominative case for priority). Stacking of dative case and nominative
case or accusative case then may become an issue unless two syntactic cases
can stack.
5. Even in Icelandic, which is famous for a rich array of lexically case-marked
subjects, agentive subjects always receive the nominative, which Sigurðsson
(2003, ex. 44) states as a universally-quantified proposition: ∀x: (DP(x) &
subject(x) & agent(x)) → nominative(x).
6. A reviewer has suggested that a bimorphemic analysis of locative case as
dative (ey) plus location (se) can account for the observed mutual exclusivity.
That is quite plausible, and it can also extend readily to animate dative case
(eykey) as dative (ey) plus animacy (key).

References

Andrews, Avery. 1982. The representation of case in modern Icelandic. In The Mental Representation of Grammatical Relations, edited by Joan Bresnan, 427–503. Cambridge, MA: MIT Press.
Barnes, Michael P. 1986. Subject, nominative, and oblique case in Faroese. Scripta Islandica 37: 13–46.
Blake, Barry J. 1994. Case. Cambridge: Cambridge University Press.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris Publications.
Gerdts, Donna B., and Cheong Youn. 1999. Case stacking and focus in Korean. In Harvard Studies in Korean Linguistics VIII, edited by Susumu Kuno et al., 325–339. Seoul: Hanshin Publishing.
Grimshaw, Jane. 1990. Argument Structure. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Kim, Soowon. 1999. Sloppy/strict identity, empty objects, and NP ellipsis. Journal of East Asian Linguistics 8 (4): 255–284.
Kim, Young-Joo. 1990. The Syntax and Semantics of Korean Case: The Interaction Between Lexical and Syntactic Levels of Representation. Ph.D. diss., Harvard University.
Maling, Joan. 2008. The Case tier: A hierarchical approach to morphological case. In Handbook of Case, edited by Andrej Malchukov and Andrew Spencer, 72–87. Oxford: Oxford University Press.
Maling, Joan. 2013. What Korean tells us about case theory: A retrospective. Plenary presentation at the Harvard International Symposium on Korean Linguistics, Harvard University, Cambridge, MA, July 2013.
Maling, Joan, and Soowon Kim. 1992. Case assignment in the inalienable posses-
sion construction in Korean. Journal of East Asian Linguistics 1 (1): 3768.
Martin, Samuel. 1992. A Reference Grammar of Korean. Rutland, VT: Charles E.
Tuttle Co.
Nakanishi, Kimiko, and Satoshi Tomioka. 2004. Japanese plurals are exceptional.
Journal of East Asian Linguistics 13 (2): 113140.
Sells, Peter. 1995. Korean and Japanese morphology from a lexical perspective.
Linguistic Inquiry 26 (2): 277325.
Sells, Peter. 2004. Oblique case marking on core arguments. Perspectives on
Korean Case and Case Marking, edited by Jong-Bok Kim and Byung-Soo Park,
151182. Seoul: Thaehaksa.
Sigursson, Halldor Arman. 2003. Case: Abstract vs. morphological. In New
Perspective on Case Theory, edited by Ellen Brandner and Heike Zinsmeister,
223268. Stanford, CA: CSLI Publications.
Suh, Cheong-Soo. 1996. Kwuke Mwunpep [Korean grammar]. Seoul: Hanyang
University Press.
Yip, Moira, Joan Maling, and Ray Jackendoff. 1987. Case in tiers. Language 63
(2): 217250.
Yoon, James H. 2005. Non-morphological determination of nominal particle
ordering in Korean. In Clitic and Affix Combinations: Theoretical Perspectives,
edited by Lorie Heggie and Francisco Ordóñez, 239–282. Amsterdam: John
Benjamins.
Yoon, James H. 2007. Raising of major arguments in Korean and Japanese.
Natural Language and Linguistic Theory 25 (3): 615–653.
Yu-Cho, Young-mee, and Peter Sells. 1995. A lexical account of inflectional suf-
fixes in Korean. Journal of East Asian Linguistics 4 (2): 119–174.
Zaenen, Annie, Joan Maling, and Höskuldur Thráinsson. 1985. Case and gram-
matical functions: The Icelandic passive. Natural Language and Linguistic
Theory 3 (4): 441–483.
8 Lexical Aspect and Natural Philosophy: How to Untie Them

Henk J. Verkuyl

8.1 Introduction

Let me begin with two quotations. The first one is from Jackendoff:
[. . .] the learning of language isn't just a passive soaking up of information
from the environment. Rather, language learners actively construct unconscious
principles that permit them to make sense of the information coming from the
environment. (1993, 35)

This is one of the many statements made by Jackendoff stressing the
importance of the cognitive perspective on how to do linguistics. It says
that he does not accept the physicalist idea of the mind being a mirror
of what is going on out there, raising the question of which unconscious
principles come into play.
This question is also raised in the second quotation:
Language is the medium to think and to make one's thoughts known; the instru-
ment with which everyone shapes his inner world and makes it known to others.
It expresses what one thinks, not how reality is; the meaning of the words should
therefore not be sought in this [reality] but in the world of thoughts. . . . The
reader should consult his own representation of the things, not the things
themselves.

For the author there is a clear distinction between an external world and
an internal one. The external one is the world of reality, nature, life, and
what these bring about; the internal world is the mental world, the world
of thoughts, giving shape to the external world in so far as this is know-
able to a human being; language teaches this in the clearest way. Then
he continues:
An action expressed by a verb is thought of as going on, as an action in progress,
or as having been done, as a completed action. An action is really the ever-
continuing transition from an action in progress to a completed action. A verb
captures an action either in the middle of this transition or at the other end, when
it has become a totally completed action.

Here clearly a linguist must be speaking about a well-known aspectual
opposition. The first two quotations invoke the question of how the
active construction of unconscious principles permits us to make sense of
our dealing with the external world. The third raises two questions central
to the present chapter: (a) what sort of property of a verb allows it to
express what is and what is not a completed action?, and (b) what is the
nature of this ever-continuing transition from action in progress to
completed action?
There is a gap of nearly one hundred and fifty years between Jackendoff's
quotation and the other two: they are from Te Winkel (1866), a
paper about the Dutch tense system.1 By his choice of a binary system,
Te Winkel did not accept the tripartition Past–Present–Future as a prin-
ciple making sense of the information coming from the environment,
but he organized the Dutch tense system in terms of three oppositions:
(i) Past – Present, (ii) Synchronous – Posterior, and (iii) Action in
Progress – Completed Action.2
This is a clear example of modeling semantic competence by the active
construction of a binary principle that enables speakers and hearers to
use language as a guide to the interpretation of the world out there. To
consider future a part of the non-actualized present is to give priority to
modal rather than to temporal considerations in spite of the fact that we
use temporal terms based on the conceptual tripartition into Past, Present,
and Future (cf. Broekhuis and Verkuyl 2014). The third opposition (in
terms of operators, an opposition between IMP and PERF) will be central
to the present chapter. It is therefore important to see that the primary
operators PRES and PAST of opposition (i) differ from the other four in
(ii) and (iii) by connecting a tenseless predication to the real time of
speaker and hearer, making the predication tensed. This means that the
opposition IMP and PERF may be seen as belonging to the tenseless part
of a tense system but also, as in Russian, as being complementary to
tense. Their role in the cognitive organization of our dealing with tense
and aspect will be argued to be crucial for providing the structure necessary
for rendering abstract verbal information into actualization in real time.
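As a purely illustrative aid (my own sketch, not part of Te Winkel's system or of the present proposal), the bookkeeping behind these three binary oppositions can be spelled out as nested operators over a tenseless predication, as in note 2:

```python
# Sketch (my own bookkeeping, following note 2): the three binary oppositions
# rendered as nested operators over a tenseless predication, with the real-time
# operators PRES/PAST outermost.

def tense(past, posterior, perfect, predication):
    ops = ["PAST" if past else "PRES",
           "POST" if posterior else "SYN",
           "PERF" if perfect else "IMP"]
    expr = predication
    for op in reversed(ops):          # innermost operator is applied first
        expr = f"{op}({expr})"
    return expr

print(tense(False, True, False, "Mary leave"))  # PRES(POST(IMP(Mary leave)))  'Mary will leave'
print(tense(False, True, True,  "Mary leave"))  # PRES(POST(PERF(Mary leave))) 'Mary will have left'
```

The only point of the sketch is that PRES and PAST sit outermost, anchoring the otherwise tenseless expression to the real time of speaker and hearer.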

8.2 Lexical Aspect vs. Grammatical Aspect

In the aspectual literature, often a distinction is made between grammati-
cal aspect and what is called lexical aspect. Grammatical aspect is
generally seen as a matter of viewpoint at the clausal level ever since
Smith (1991), who followed Comrie (1976) in distinguishing it from
lexical aspect. Lexical aspect concerns the internal constituency of
actions, also known under the name of the German term Aktionsart. In
spite of its popularity, there are some problems with the distinction itself.
The difference between Russian imperfective and perfective aspect in
so-called aspectual pairs such as pisati – napisatp 'write', vyigryvati –
vyigratp 'win', igrati – sygratp 'play' might be considered a lexical matter
because the semantic difference between sentences like (1a) and (1b) is
located in the verb morphology:
(1) a. Tibor igrali sonaty Bethovena.
Tibor was playing Beethoven sonatas.
b. Tibor sygralp sonaty Bethovena.
Tibor played the (= these/those, a certain number of)
Beethoven sonatas.
However, if the difference between the imperfective past-tense form
igrali and its perfective counterpart sygralp is to be understood as a
matter of lexical semantics, one is bound to attribute the difference in
meaning between (1a) and (1b) to the difference between the verbs
igrati and sygratp without any appeal to their arguments, because these
are the same in both sentences. One is also forced into saying that, on
top of the meaning element PLAY, the verb in (1a) has an imperfective
aspectual meaning element Ai requiring that (1a) be interpreted as per-
taining to an action-in-progress, whereas, apart from what is expressed
by PLAY, the verb in (1b) has a perfective meaning element Ap which
allows (1b) to pertain to a completed action. In both cases, the meaning
element A should be seen as strictly independent from the arguments of
PLAY, because otherwise one would be doing structural semantics, which is
where the analysis of the relation between a verb and its arguments belongs. In
other words, on a strictly lexical analysis of the aspectual difference, one
is bound to distinguish between the verb PLAY+Ai(X,Y) and the verb
PLAY+Ap(X,Y) before the values for the arguments X and Y enter the stage
in (1a) and (1b). This is what Slavic linguists generally do and certainly
what didactic grammars such as Kolni-Balozky ([1938] 1960) and Beyer
(1992) do: they speak of the vidy glagolov (the aspects of the verb) and
do not worry about arguments.
Yet, those who advocate a separation between grammatical aspect
and lexical aspect consider the aspectual difference between (1a) and
(1b) structural, not lexical in the strict sense just described. This is
understandable, because the interpretation of these sentences can be
shown to be determined by information not restricted to the verb itself:
(1b) pertains to a specific set of sonatas due to the presence of the per-
fective prefix s-. Likewise, the negation element in (2) as well as the NP
nikto nobody interact with the verb determining the resulting aspect.
(2) Nikto ne igrali sonatu Bethovena.
Nobody played a Beethoven sonata.
The sentence does not say that nobody was playing a Beethoven sonata;
rather, (2) expresses a statement about a fact, about a complete non-
action, so to say. And the same holds for (2) with sygralp instead of igrali.
This can only be accounted for by structural considerations.
At this point, it is necessary to extend the scope by taking into account
non-Slavic languages for which the opposition between grammatical
aspect (also called viewpoint aspect) and lexical aspect (also called situ-
ational aspect) has become popular. The motivation for the opposition
in non-Slavic languages like English and French is the feeling that one
should (roughly) follow the Slavic way of dealing with what in English
is expressed by the Progressive Form and in French by the Imparfait (the
examples in (3) are from De Swart [2012]).
(3) a. She was writing her thesis in 2009.
Elle crivait sa thse en 2009.
b. She wrote her thesis in 2009.
Elle crivit sa thse en 2009.
(4) a. She wrote papers with him.
b. She wrote three papers with him.
Those who distinguish viewpoint aspect from lexical aspect mostly
restrict the former term to the sentences in (3) and talk about the opposi-
tion as a difference between the eventuality from the inside (in (3a)) and
from the outside of it (in (3b)) as parallel to the Russian opposition
discussed in (1). For this, it is necessary to assume that the two sentences
have the same eventuality in common, say W. The inside viewpoint in
(3a) presumes [a < p < b] where p is taken as the point of perspective
from which one can observe what is going on (somewhere in the middle
of the W-interval [a,b]). For the outside point of view, one would have
[a < b] < p, with the point p located outside [a < b].3
The next step is to assume that the opposition between the durative
sentence (4a) and the terminative sentence (4b) is situational due to our
knowledge that writing papers is different from writing three papers,
again independent of the language under analysis. This knowledge about
the world is then located in the lexicon, so a large majority of linguists
tends to consider situational aspect (in the German literature: Aktion-
sart) identical to lexical aspect and by this they allow the terms lexical
and structural to be blurred, although they continue to speak about
lexical aspect (e.g., Gvozdanovic [2012, 782]; De Swart [2012]; Dowty
[1979], among many others). Thus many scholars consider the Russian
opposition in (1) and the opposition between (3a) and (3b) a matter of
viewpoint aspect as opposed to the lexical aspectual opposition between
(4a) and (4b).
However, this picture is disturbed in view of Russian sentences such
as (5).
(5) Vchera Tibor igrali sonaty Bethovena.
Yesterday Tibor played Beethoven sonatas.
In this case, the translational equivalent of (5) is more correct than a
translation with the Progressive Form would be: (5) expresses (or at
least may express) a fact without focusing on action in progress. This
might have to do with the fact that yesterday provides a closed domain
and so (5) may convey that Tibor practiced the whole day without claim-
ing that each sonata was played completely and without excluding that
he played some of them completely. Here it is quite hard to understand
why the difference between igrati and sygratp should be a matter of
grammatical aspect, because one cannot maintain that the factual inter-
pretation of (5) is from the inside and its perfective counterpart Vchera
Tibor sygralp sonaty Bethovena from the outside.
The difference between (6a) and (6b) goes in the other direction.
(6) a. Tibor sygralp sonaty Bethovena.
lit.: Tibor played Beethoven sonatas.
b. Tibor sygralp obe sonaty Bethovena.
Tibor played both Beethoven sonatas.
In Russian, the difference between (6a) and (6b) cannot be explained in
terms of viewpoint aspect because both sentences express completed
action. It can only be explained in terms of the overt absence of quanti-
ficational information in the NP sonaty Bethovena and overt presence of
it in obe sonaty Bethovena. This explanation is based on the idea that
playing Beethoven sonatas is crucially different from playing both
Beethoven sonatas. This language-independent ontological difference is
inherent to these eventualities and leaves no room for a viewpoint of the
speaker or hearer. What matters linguistically is that the Russian perfec-
tive prefix s- is not very welcome in sentences describing an open situa-
tion, so what it does in (6a) is to assign to its internal argument a
quantificational restriction, called [+SQA] in Verkuyl (1972), imposing an
interpretation 'some sonatas but not unboundedly many'. Informally, the
feature [+SQA] assigned to an NP stands for Specified Quantity of A,
where A is the denotation of the Noun and where Specified Quantity
stands for information expressed by the Determiner (cf. Verkuyl 1993).4
The [+SQA]-information in (6b) is compatible with the presence of sygralp,
so that (6b) is about a completed action.
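As a rough illustration of how little machinery this informal characterization requires, one might read the feature off the determiner as in the sketch below; the determiner list is an arbitrary assumption of mine and not part of the proposal.

```python
# Illustrative sketch only: reading [+/-SQA] off the determiner of an NP, in the
# informal sense described above. The determiner list is an assumption of mine.

SPECIFIED_QUANTITY_DETERMINERS = {"the", "a", "an", "both", "three", "her", "this", "these"}

def sqa(np_words):
    """Return '+SQA' if the NP starts with a specified-quantity determiner,
    '-SQA' otherwise (e.g. for bare plurals)."""
    return "+SQA" if np_words[0].lower() in SPECIFIED_QUANTITY_DETERMINERS else "-SQA"

print(sqa(["three", "papers"]))  # +SQA (cf. the terminative (4b))
print(sqa(["papers"]))           # -SQA (cf. the durative (4a), bare plural)
```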
The discussion so far amounts to the following conclusions:
The term situational aspect as opposed to grammatical aspect is, strictly
speaking, a terminological misnomer: situations are in the world out
there and so an ontological notion is opposed to a linguistic one.
Lexical aspect is also a terminological misnomer. The terminativity of
(4b) She wrote three papers with him is a matter of predication. Aspect
is always a matter of structure and so lexical aspect ceases to be lexical
as soon as compositionality is allowed to do its work.
There is no need for a distinction between grammatical/viewpoint
aspect and lexical/situational aspect because both are structural.
Lexical semantics is a proper part of aspectual analysis only in so far
as it focuses on a lexical item in order to get at an atomic aspectual
element expressed by it that participates in aspectual composition.
It is impossible to untie something that does not exist, so in the remain-
der of the present chapter the term lexical aspect will be used as corre-
sponding to those aspectual phenomena that are generally discussed
under this label.

8.3 Natural Philosophy

The reason for combining the notion of lexical aspect with the notion of
natural philosophy in the title of the present chapter is for me the obser-
vation made by Filip (2012b, 721) that the origins of our understanding
of lexical aspect lie in Aristotle's distinction of kinesis and energeia.
Both Filip (2012a,b) and Rothstein (2004), a book about lexical
aspect, give credit for this insight to Dowty (1979), a work that indeed
can be seen as linguistically completing the foundations for seeing aspec-
tual classes as organizing our ontological view of the world, foundations
laid around the fifties by natural language philosophers like Ryle (1949),
Vendler (1957), and Kenny (1963).
In a section titled "The development of verb classification," Dowty
observes that Aristotle distinguishes between kineseis (translated
'movements') and energiai ('actualities'), a distinction which corresponds
roughly to the distinction we shall be making between accomplishments
and activities/states (1979, 5253).5 By this, Dowty opens the door for
an external justification of his linguistic classification by the authority of
philosophers, with Vendler in the lead. Many linguists have gone through
this door.
Let me first give the crucial quotation for understanding the Aristo-
telian distinction mentioned by Dowty:
(7) Now of these processes we should call the one type motions (kinēseis), and
the other actualizations (energeias). Every motion is incomplete – the pro-
cesses of thinning (ischnasia), learning (mathēsis), walking (badisis), building
(oikodomēsis) – these are motions (kinēseis), and incomplete (ateleis) at that.
For it is not the same thing which at the same time is walking and has walked,
or is building and has built, or is becoming and has become, or is being moved
and has been moved, but two different things; and that which is causing
motion is different from that which has caused motion. But the same thing
at the same time is seeing and has seen, is thinking and has thought. The
latter kind of process, then, is what I mean by actualization, and the former
what I mean by motion. (Metaphysics 1048b, 28–34; in Aristotle ([1933] 1961),
transl. by H. Tredennick; Greek key words as they occur in the text are added
to the translation.)

As pointed out in Ackrill (1965, 1978) and Charles (1985), the distinction
between a process aimed at a goal (kinēsis) and actualization, the situa-
tion in which the goal has been achieved (energeia) makes them mutually
exclusive. Each process is incomplete as long as the telos has not been
reached, whereas each actualization is in itself complete. Each completed
process is an actuality (energeia), which has no goal in itself. Aristotle
made motion dependent on a force (a mover) as a precondition of
change to keep it going, all motion ultimately being reduced to the
(Unmoved) Prime Mover described in Physics, Book 8 (Aristotle 1985,
vol. 1, 418–446). As a consequence of this idea (totally absent in the
Galilean perspective), change had to be seen as always related to a telos,
a goal.
In the seventeenth century, Aristotelian natural philosophy came to
its end in the domain which today is called natural science. Physicists made
a supreme step-by-step effort by saying farewell to what nowadays is
considered at best a form of naïve physics. The Galilean vision on motion
(Galilei [1632] 1953, The Second Day) is quite different from the Aris-
totelian one in that in the former, motion is in principle eternal
(unbounded in aspectual terminology) unless there is some force bring-
ing the moving object to a stop. It should be added that Aristotle was
only interested in motion as carrier of change.
Many linguists working on aspect are attracted to the Aristotelian
kinetics, the key term for them being telicity.6 After Aristotle was put
aside as a guide in physics it took the Catholic Church nearly four
hundred years before Galileo's work finally was declared compatible
with the papal doctrine. This might explain why Aristotle remained an
ipse dixit-guide in logic and philosophy, under the label natural philoso-
phy, until far into the 20th century. In many European countries, the
humaniora were imbued with natural philosophy.7
One could, of course, argue that Aristotle's ontology has nothing to do
with the notion of motion in modern physics and that it perfectly
addresses the problem of how to account for the cognitive organization
of our dealing with the world out there. In other words, his Metaphysics
could be taken as providing a serious model for the semantics of expres-
sions in natural language in spite of severe criticisms from outside lin-
guistics, appropriately or not. I am on the side of those who hold that
one cannot exclude that the construction of unconscious principles deter-
mining our cognitive organization could turn out to be compatible with
the Aristotelian view after all. Yet the best attitude for those who take
Aristotle as an aspectual guide seems to be to remain skeptical or at least
to be prepared to follow a different track. The appeal to Aristotles theo-
retical terms such as telos, change, and motion in the aspectual literature
is not only amazing in view of the fact that his analysis of motion has
been shown to be insufficient but also that nowadays there is still a lot
of uncertainty among philosophers about the correct interpretation of
the distinctions in Metaphysics 1048.8
At any rate, it is rather difficult to escape from misleading conclusions
made on the basis of translations. For example, in (7) Aristotle uses the
Greek verb badidzein, which is translated as walk even in the case of
authoritative translations, such as Hugh Tredennick's and William David
Ross's translations of Metaphysics (Aristotle [1933] 1961, 1908) and
Ross's translation of Nicomachean Ethics in Aristotle (1985, vol. 2).
Albert Rijksbaron (pers. comm.) points out that the English walk has as
its most natural translation the verb peripatein, where peri-, of course,
already indicates that there is no telos, hence no kinēsis, as in choreuein
('dance') and aulein ('play the flute'). Badidzein most generally implies or
is connected to an explicit goal, and Aristotle used this verb because in
his analysis of motion, he was interested in change, not in activity. In
other words, in (7) walking must be seen as an incomplete motion.
The same inaccuracy holds for the translation of the Greek verb
oikodomein, which means 'to build a house'. In the Bywater edition of
Nicomachean Ethics 1174a, 21, available on the website of the Perseus
Digital Library (Aristotle 1894), the adjectival form oikodomikē, which
modifies the noun kinesis, is translated as 'building a house(-movement)',
whereas the Ross/Urmson-translation in Aristotle (1985, vol. 2) trans-
lates it as 'building(-movement)'. The fact that translations are insensi-
tive to important aspectual differences such as between building and
building a house and between walk and walk to the beach might justify
skepsis against the Aristotelian ontology as a beacon for research into
aspectual information expressed in natural language.
One of the key problems is that Aristotle did not distinguish between
talking about situations and talking about the meaning of verbs. Still,
his Metaphysics is to be seen as an exercise in ontology in the first
place. So the problem boils down to the question of whether an ontologi-
cal analysis can be equated with a lexical semantic analysis. In this
respect, there is something interesting in the role of Vendler as an aux-
iliary guide. Vendler (1957) is an essay called "Verbs and times," which
also appeared in Vendler (1966), a collection of essays with the title
Linguistics in Philosophy. Linguists fond of the Vendler-classes seem to
have forgotten that Vendler did not write for linguists but for philoso-
phers interested in ontology. The title "Verbs and times" suggests that
Vendler analyzes the meaning of English verbs as to how they express
temporal information. But for the philosopher Vendler there is no objec-
tion to calling push a cart or run a mile or draw a circle a verb while
taking each of them as a (complex) predicate (e.g., 1966, 102). As a phi-
losopher interested in metaphysics, Vendler was justified in seeing
meaning analysis of verbs (= predicates) as instrumental for discovering
different ontological classes corresponding with them. After all, Vendler
follows Aristotle in the art of blurring the difference between object
language and metalanguage.
What really counts against using Aristotle's work as the foundation for
aspectual analysis seems to be that he is not interested at all in aspectual-
ity. If one interprets his analysis of the Greek equivalent of build as a
lexical aspectual analysis (which it is certainly not), then one should at
least see that he treats the meaning of build as prototypical (in the
Wittgenstein/Rosch sense of prototypicality): it is about one house or
temple, whereas a linguistic analysis should at least raise the question:
what is the telos of building houses? And what is the telos of building
on talents? And of building data bases? Does the presence of a bare
plural in the VP have the effect of ruling out the telos so that we are
back at the incomplete motion? If so, what then is the telos of this incom-
plete motion? Aristotle's ontological bias about the natural end of a
motion prevailed by prototyping. And we see this again in the current
aspectual literature, at the cost of compositionality.
From a linguistic point of view, the examples by Aristotle given in (7)
concern at best the semantics of the Greek counterparts of verbs like
walk, build, see, think, etc.9 This would bring many of those who deal with
Accomplishments and Achievements in terms of completion into trouble.
Moreover, Aristotle explicitly used the perfect tense form bebadiken for
expressing that a walking process is completed (i.e., actualized) and the
perfect form oikodomēken for the completion of a building process. Is
perfect tense necessary in order for a telos to become actualized? What
sort of completion is expressed by the perfect and what sort of comple-
tion is expressed at a telos? Questions like these should be raised before
accepting telicity as an aspectual notion. Without an appeal to tense, the
interpretation of energeia as actuality and actualization in Aristotle's
Metaphysics remains mysterious, in particular when energeia is also taken
as expressing a state. With tense, a present-tense form introduces real
time in which motions are or have been actualized with respect to the
present of the speaker/hearer.10
Summarizing, there is no reason for taking Aristotle as an a priori
guide to revealing the principles by which we organize our temporal
experience in language. This implies that there is no need to accept his
ontology as soon as we can analyze the contribution of a verb apart from
the content of its arguments. Aspect can be studied without natural
philosophy.

8.4 Verbs and Time

Linguists often look down on lexicography, and indeed it is quite easy to
say a lot of negative things about the consistency, correctness, and com-
pleteness of dictionaries, but it is also comforting to see that the Oxford
English Dictionary (OED) defines a verb as "a word used to describe an
action, state, or occurrence, and forming the main part of the predicate
of a sentence" and to observe that this definition is far stricter than
what most of the linguists writing about aspect in the Aristotelian tradition
feel themselves obliged to give. The OED seems to make a clear distinc-
tion between a verb and a verb phrase, staying close to what is (or should
be) the linguistic norm: to consider a verb the kernel (head) of a predi-
cate and not a VP, let alone a predication at the S-level. In this respect
the OED does not pay lip service to the corresponding linguistic labels,
because (leaving the irrelevant things out) the verb walk is described as
in (8) and play as in (9).11
(8) walk [no obj., usu. with adverbial] move at a regular pace by lifting
and setting down each foot in turn, never having both feet off the
ground at once: I walked across the lawn | she turned and walked a
few paces.
(9) play [with obj.] produce (notes) from a musical instrument;
perform (a piece of music): they played a violin sonata.
This is exactly what is to be expected from a dictionary. In the definition
of walk there is no room for information about complements that
walk may take except in terms of some examples. This allows walk to
be the same semantic unit occurring in Mary walked in the park,
Mary walked a few paces, Mary walked to the station, and in Nobody
walked. Thus it makes no sense to take walk as pertaining to an incom-
plete action with a fixed goal. Also, we find our experiences with moving,
lifting and setting down feet and staying with one foot on the ground
back in definition (8), but the definition abstracts away from temporality
in the sense of unique actualization in real time. Verbs are atemporal
as long as they are in the lexicon, and it is only in the use of a sentence
in a particular discourse situation that actualization can play a role.
Note that, in spite of the prototyping example in (9), the definition
leaves room for all sorts of complements, due to the use of produce and
perform.
Given that temporality is to be blocked from the lexicon and given the
need to evade prototypicality, when it comes to analyzing aspectual
information, it is time to discard terms like telicity, culmination, homoge-
neity, cumulativity, and other popular terms such as DO and CAUSE in the
Aristotle-Vendler-Dowty tradition as explanatory terms. This gives room
for starting with the verb alone abstracting from the content of its argu-
ments, directed by the question: what property must a verb have to
provide the abstract structure necessary for obtaining actualization in
real time in a spoken utterance? Referring back to the separation between
PLAY and the aspectual information A in the paragraph after (9), it
seems appropriate from the methodological point of view to locate this
structure in the A-part of a verb. That is, in all verbs. So, the next question
is: what sort of structure is it that verbs have in common and that is
involved in aspectual composition?

8.5 Sobering Down Ontology

Verkuyl (1993, chap.13) argues that the interaction between the number
systems R and N is essential for aspectual information, where R models
our experience with continuity and density, and N our experience with
discreteness, repetition and habituality, but in both cases outside real
time. In the present section, I will follow that line, but the outcome will
be different in important respects. Given the task of analyzing verbs
without taking into account the content of their arguments, the following
list of assumptions announces itself:
a. each verb represents type structure in the sense that the time axis
T (modeling actualization in real time at the moment of speech) is
never part of the lexicon itself; therefore the denotation of each verb
in the lexicon contributes atemporal R-structure because R can be
seen as isomorphic to T;
b. certain verbs require lexically a mapping from R+ into N;
c. certain verbs require an additional mapping from N into N in order
to be able to lexically express repetition, habituality, plurality, etc.;
d. verbs not falling under (b) and (c) can occur with complements that
cause a structural shift from R+ to N;
e. the Progressive Form requires the presence of R+.
It is not possible to work this out in detail within the space allowed here.
What follows is therefore a programmatic description of the architecture
necessary for the construction of a coherent account of how a verb con-
tributes to complex aspectual information.12
The basic idea underlying the present analysis is that the tense opera-
tors PRES and PAST map from numerical systems like R and N into the
time axis T. This idea is based on the assumption that the (mental)
lexicon has no timeline, because its essence is to abstract from actual
situations and to store our knowledge of them independently of tensed
time. A verb does not provide a token in real time, it provides type struc-
ture, and the idea is that number systems are more appropriate for
expressing type structure, and that the time axis is more suitable for
actualizing a type in real time as a unique token. This means that a
tenseless verb is to be interpreted as expressing structure in R, here
modeled in terms of the function fco : R+ → R+ defined as:
(10) fco(x) = x
This is a function that provides in an elementary way the basic atem-
poral properties necessary to experience time as linearly ordered, dense,
continuous, directional, and so on.13 The subscript co stands for continu-
ity in R+. On the assumption that the function in (10) belongs to the
denotation of every verb in the lexicon, it provides the sense that each
verb is founded in R+ and that its meaning can be expressed in terms of
intervals mapped onto the time axis T by tense.14

8.5.1 Modeling Aspectual Information of Intransitive Verbs Lexically


At this point it is necessary to separate the intransitive verbs from the
other verbs, because we want to consider them in their capacity of having
just one (external) argument and taking no complement. This means that
we have to consider the question of how to integrate (10) as part of our
lexical knowledge. The function fco provides a stretch in R+ for verbs like
hang, sit, stumble, laugh, lie, sleep, purr, escape, die, etc. by delivering an
unrestricted range of images: for example, Wagner's Brünnhilde might
have slept forever, had Siegfried not awakened her. However, for a
subset of them (ignite, knock, resign, die), fco should be restricted because
these verbs demand a discrete output.
In order to distinguish the restricted verbs from the unrestricted ones,
one can make use of two functions in (11): the ceiling function fce : R →
N defined as in (11a) and the floor (or entier) function fen : R → N
defined as in (11b).
(11) a. fce(x) = ⌈x⌉, where ⌈x⌉ is the smallest integer not smaller than x.
     b. fen(x) = ⌊x⌋, where ⌊x⌋ is the highest integer smaller than or equal to x.
In an interval (0,1), fce maps all real numbers x such that 0 < x < 1 to 1,
whereas fen maps them to 0.15
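For concreteness, the two functions in (11) are just the standard ceiling and floor operations; a minimal check with Python's math module (my own illustration) gives:

```python
# The ceiling and floor functions of (11): every x with 0 < x < 1 goes to 1
# under f_ce and to 0 under f_en.
import math

f_ce = math.ceil    # (11a): smallest integer not smaller than x
f_en = math.floor   # (11b): highest integer smaller than or equal to x

print(f_ce(0.3), f_en(0.3))   # 1 0
print(f_ce(1.0), f_en(1.0))   # 1 1
```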
Given (11a), verbs like die, ignite, and knock can be separated from
verbs like walk and hang in terms of a composition of the functions (10)
and (11a), that is, as fce ∘ fco : R+ → N defined as:
(12) (fce ∘ fco)(x) = fce(fco(x)) = ⌈x⌉
In this way, the verb die is to be interpreted as pertaining to discretization
on the basis of a mapping from R+ into N. This makes it possible to
interpret John died on the evening of August 25 as a discrete event leaving
in the dark how many images of the function fco were involved in the
mapping into N. The Past tense locates the (unique) actualization of the
discrete event in real time without requiring that the range of fce be taken
as a point in time at which he died (if that is possible anyhow). In this
way, the truth conditions are clearly revealed.
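The composition in (12) can be mimicked directly; the sketch below (my own illustration) simply chains the identity function of (10) with the ceiling function of (11a).

```python
# Sketch of (12): the identity f_co on R+ followed by the ceiling function f_ce,
# turning a stretch in R+ into a discrete value in N.
import math

def f_co(x):
    return x                      # (10)

def f_ce_o_f_co(x):
    return math.ceil(f_co(x))     # (12): f_ce(f_co(x)) = ceiling of x

print(f_ce_o_f_co(2.7))           # 3
```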
The most natural way of looking at a verb meaning is that the functions
associated with it start with the origin 0, and that the ceiling function
maps to 1 in order to provide discreteness (in terms of Te Winkel [1866]:
completed action).16 However, such an assumption would not suffice
because verbs like stutter, knock, hit lexically allow for repetition, albeit
not necessarily. This repetition is not expressed by fco in R+; it belongs
clearly to N. In other words, for the correct lexical characterization of
these verbs, the output of fco is to be taken by fce so as to provide a unit
that may be repeated.
To account for this sort of higher level repetition, there is a well-
known function available: the successor function s : N → N standardly
defined as:
(13) s(n) = n + 1
Suppose that this function is also part and parcel of the information
lexically expressed by a verb on top of the shift from R into N. Then it
accounts for the unbounded repetition that may be expressed by the
verbs tick, knock, and hit. But it does not account for verbs like die and
melt and, as we shall discuss later, for sentences like The bullet hit the
target on the interpretation that the sentence is about one hit as opposed
to The bullets hit the target. This implies that if one decides to attribute
the successor function to a verb (whether lexically or structurally), there
should always be a way to block it by information outside the verb itself
to stop the function s from being applied. Thus one may think of defining
a function fs : N → N as:
(14) fs(x) = m if x ≥ m; x + 1 otherwise
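A small sketch of (14); the concrete value chosen for m below is an arbitrary stand-in for the contextually determined bound discussed in the next paragraph.

```python
# Sketch of the bounded successor function f_s of (14); m is a parameter here
# because its value is context-dependent (the number 6 below is my assumption).

def f_s(x, m):
    """(14): once the bound m has been reached, stay there; otherwise add one."""
    return m if x >= m else x + 1

n, m = 0, 6
for _ in range(10):               # repeated application never exceeds m
    n = f_s(n, m)
print(n)                          # 6
```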
Applying this to sentences like Bill belched his way out of the restau-
rant and Harry moaned his way down the road, discussed in Jackendoff
(1990, 211–243), requires that the value of m be context-dependent.
In the restaurant case, the size of the restaurant as well as information
about the amount of beer drunk by Bill could bring the hearer to esti-
mate m as lying between 4 and 8. In the moaning case, it clearly depends
on the length of the road. In both cases, m can be taken as a contextually
determined natural number higher than or equal to 2. In speech situa-
tions, people mostly are not forced to be very precise: what counts is
finiteness in N.

Table 8.1
Verbs Occurring with One (External) Argument

  Continuous                          Discrete
  fco                                 fce ∘ fco                              fs ∘ fce ∘ fco
  exist, sunbathe, laugh              die, arrive, thin, resign, ignite      jump
  rotate, walk, swim, burn,           melt, start, stop, freeze, win,        knock, hit, belch
  sit, lie, hang                      hang up

Restricting ourselves to the expression of the (optional or obligatory)
built-in repetition in verbs like knock and stutter, one may think of
the composition of three functions. To intransitive verbs like knock,
hit, blink, stutter, etc., but not to die, melt, and stop, one may assign
in their role of one-place predicate the function (fs ∘ fce ∘ fco) : R+ → N
defined as:
(15) (fs ∘ fce ∘ fco)(x) = fs(fce(fco(x))) = m if ⌈x⌉ ≥ m; ⌈x⌉ + 1 otherwise
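Putting the pieces together, the composite function of (15) can be sketched as follows (again with an arbitrary illustrative value for the bound m):

```python
# Sketch of (f_s . f_ce . f_co) as defined in (15): a stretch in R+ is first
# discretized by the ceiling function and then bounded by m.
import math

def f_co(x):
    return x                            # (10)

def composite(x, m):
    n = math.ceil(f_co(x))              # f_ce applied to f_co(x)
    return m if n >= m else n + 1       # f_s of (14)

print(composite(0.4, m=3))    # 2 (= ceiling(0.4) + 1)
print(composite(5.0, m=3))    # 3 (capped at the contextual bound m)
```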
If the value of m is taken as 2 or more, table 8.1 gives the three resulting
classes in its three columns based on what has been discussed above.
Note that all verbs share the fco-information, that fce ∘ fco-verbs are a
proper subset of them, and that the verbs in the third column are a proper
subset thereof. The arrangement into three classes follows from proper-
ties of the functions involved: fco operates inside an interval, fce maps
the interval into N, and fs operates on the whole interval taken as a
discrete unit.17
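Purely as an illustration of how such a classification might be stored, table 8.1 can be encoded as a small lexicon pairing each verb with its function composition; the encoding below is my own toy representation, not a claim about the mental lexicon.

```python
# Toy encoding (illustrative only) of the three classes of table 8.1: each verb
# is paired with the composition of functions it is taken to express.
LEXICON = {
    "walk":  ("f_co",),                   # continuous
    "hang":  ("f_co",),
    "die":   ("f_ce", "f_co"),            # discrete
    "melt":  ("f_ce", "f_co"),
    "knock": ("f_s", "f_ce", "f_co"),     # discrete, with built-in repetition
    "hit":   ("f_s", "f_ce", "f_co"),
}

def aspect_signature(verb):
    """Return the function composition the verb is assumed to express."""
    return " . ".join(LEXICON[verb])

print(aspect_signature("die"))      # f_ce . f_co
print(aspect_signature("knock"))    # f_s . f_ce . f_co
```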
The first row of table 8.1 contains purely intransitive verbs, the second
row pseudo-intransitive verbs: they may occur without a complement but
in many situations they allow for them: rotate pictures, walk to school.
They will be discussed below in the analysis of verbs taking a comple-
ment because these require structural information. The verbs in the left
column express only unbounded linear monotone increase along the
number line in R: one can hang, walk, smile, rotate, exist, age linearly and
eternally (given the right conditions such as being allowed to eternal
life). Verbs like arrive and die in the second column belong to the same
class as the verbs thin, melt, and freeze, irrespective of the length of the
interval in which values of fco are mapped to a value in N.
The advantage of sobering down on ontology is that, from the aspec-
tual point of view, it is now possible to remove animacy and activity from
the aspectual domain by treating a verb like hang in the same way as
walk: they both presume a stretch in R+ brought about by fco. Of course,
except for robots, walk is mostly taken as an activity of [+animate] beings,
whereas hang is generally used with an [-animate] subject. This allows
for two interpretations of John hung in the tree: in one he was alive and
in control so that the sentence is to be interpreted as expressing an activ-
ity of John, and in the other he was dead or without control, in which
case it is impossible to associate the sentence with activity. In the first
case, John may even have interrupted the hanging in the same way in
which walkers take a rest or stop for a shop window. But what the verbs
walk and hang have in common lies outside the domain of animacy and
activity: they require fco.
By sobering down, we need not distinguish between two verbs hang.
The information [±animate] is located in the HANG-part, not in the A-part
of the verbal meaning of hang. Thus, the much-discussed issue of homo-
geneity expressed by verbs ceases to be of interest. Our knowledge about
the need to interrupt a walk will fall under knowledge of WALK not under
A: we know that people involved in physical action sometimes interrupt
their action. Interruption and stops are not aspectually relevant proper-
ties of walk itself, but follow from our knowledge about its external
argument. A walking robot need not have a rest; John will need one if
he is hanging alive and kicking. A pleasant consequence of sobering
down is that it allows for explaining why meaning extensions, figurative
expressions and metaphors retain the aspectual properties of their non-
figurative counterparts.
A remark about the floor function is in order here. As far as I can
see, there are four options: First, one may use the mapping to 0 by fen in
the composite function fen ∘ fco as characteristic for states. In that case, one
would obtain the tripartition State (fen) – Process (fco) – Event (fce), which
would run counter to the wish to exclude activity as an aspectual factor
in view of verbs like hang and walk. Second, one may use fen as a way to
account for negation: fen takes what did not happen out of time, so to say.
Third, one may use the floor function for dealing with the so-called
ingressive aspect that could be said to focus on the beginning of the
interval yielded by the fco-function. And fourth, one may not use it at all.
I will not make a choice here but take the second option as the most
promising one.

8.5.2 Modeling Aspectual Information of Transitive Verbs

Table 8.2
Verbs with an External Argument and an Internal Argument/Complement

  Continuous                           Discrete
  fco                                  fce ∘ fco                           fs ∘ fce ∘ fco
  cover, contain, process              reach, discover, realize
  draw, hide, rotate, walk, play,      pay, melt, start, stop, freeze,     knock, hit, belch
  sing, see, swim, burn                win, cross

At this point it is helpful to take into account transitive verbs as pre-
sented in table 8.2.
Its first row contains verbs that are necessarily transitive in requiring
an overt argument. The verbs in the second row are the pseudo-(in)
transitive verbs of table 8.1. As to the columns, the general idea is that
verbs in the leftmost one express continuity lexically so that it depends
entirely on the arguments whether or not the resulting sentence is ter-
minative. For example, The devil possessed me has a terminative inter-
pretation The devil took possession of me, whereas She possessed a sense
of humor is durative. I saw him for hours makes I saw him durative
(although I need not have seen him permanently), but #For hours, I saw
on TV that the plane had crashed expresses the well-known forced
repetition.
At this point, it should be observed that the floor function might be
necessary for explaining why some verbs do not allow for function com-
position so that state-verbs are to be distinguished from process-verbs,
but the pros of this option should be weighed against the cons of distin-
guishing verbs possess, cover, and contain (hold vs. restrain). For the
verbs of the leftmost column, we assume that the ceiling function fce is
to be contributed from outside the verb itself in the course of the com-
positional process making phrases. For example, the complement of walk
in a sentence like She walked to school contains information that makes
walk to school discrete in N, and this information operates on the fco-
information of walk in a structural way.
The second column contains verbs that are lexically marked as express-
ing the composite fce ∘ fco function. An argument for allowing lexical
fce ∘ fco-composition in verbs such as win or discover is derived from observ-
ing the sentences in (16):
[Figure 8.1 (VP tree diagram): The ceiling function positioned at θ]
(16) a. She won medals. - - - - - - . . .
     b. She won the three medals here on the wall. - - -
     c. #For hours, she won the three medals here on the wall. - - - - - - . . .
Suppose that one may win only one medal per occasion and that (16a)
with the [-SQA]-NP medals reports about a series of successes. The quan-
tification over these occasions is in N: the successor function fs operates
unboundedly on the output of the lexical fce ∘ fco. Sentence (16b) is ter-
minative due to the [+SQA]-NP. Note that the forced repetition in (16c)
displays the same pattern as in (16a). As said before, there is no reason
to subdivide the discrete verbs in table 8.2 into verbs pertaining to a
longer or shorter interval. The mapping from R+ into N can be under-
stood as offering certainty about the eventuality being engrained in the
Reals. Aspectually, it makes no sense to compare belch, knock, and win
in terms of length even if a belch mostly takes a couple of milliseconds
longer than a knock on the door.
The problem raised by the verbs in the second and third columns is
to match their fce-information with the [+SQA]-information contributed
by their internal argument or complement. The plausible way of bringing
the lexical application of fce together with the [+SQA]-information of an
argument or complement is to position it such that it serves its duty at
the edge between the verb and its complement. One may think here in
terms of a thematic role θ assigned by the verb to its argument, as proposed
by Verkuyl (1993, 298–304) and pictured in figure 8.1.18
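The division of labor pictured in figure 8.1 can be caricatured as follows; the toy sketch is my own illustration of the idea that fce is triggered at θ by a [+SQA] complement, not the author's formalism.

```python
# Toy sketch (my illustration): for an f_co-verb, f_ce is triggered at the
# theta-position by a [+SQA] complement; without one, the predication stays
# unbounded in R+.

def vp_aspect(complement_is_sqa):
    if complement_is_sqa:
        return "terminative: f_ce . f_co (discretized in N)"
    return "durative: bare f_co (unbounded in R+)"

print(vp_aspect(True))    # e.g. 'She walked to school', 'She played her card'
print(vp_aspect(False))   # e.g. 'She wrote papers' (bare plural, [-SQA])
```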
This also works for pseudo-(in)transitive verbs in the leftmost column,
but differently: they are only marked as fco-verbs, so in the case of She
walked to school the fce-information is contributed by the PP-complement.
In many sentences in which fco-verbs occur, the need for the application
of fce comes from outside the lexicon and so the θ-position qualifies as a
place where fce can be triggered. In She played her card, the quantifica-
tional information in the internal argument her card forces fco into N by
fce. In other words, the function fce (crucial for making an action discrete)
is structurally available for receiving the complement-NP so as to yield
the VP-meaning. The place where this happens can be argued to be at
θ. This makes it unnecessary to go into the Noun-information itself as
in Krifka's work, so that it is not necessary to follow the process of eating
an apple in Mary ate an apple by mapping events to objects and reversely.
It suffices to have the determiner-information expressed by an available
plus the information that apple is a count noun.19
An argument in favor of figure 8.1 comes from VPs like to play matches.
The verb play itself is marked as a fco-verb, and in its intransitive use
it may pertain to something going on eternally (think of mythological
gods playing on Olympus). But in playing matches one ends up in
the same category as winning medals: one needs a way to discretize
locally, that is, per match. Due to the plural, one needs the successor
function to get a(n unbounded) sequence of playing situations. This is
only an option in the case of She played her cards. Here it should remain
underdetermined whether fce is stopped after one application or may
continue unboundedly, but the difference between She played her cards
and She played matches is not aspectual at all. That is a matter of PLAY,
not of A.

8.5.3 Particles in Phrasal Verbs


The θ-position in figure 8.1 is clearly the place where phrasal particles
belong. Historically they come from prepositions, adverbs, separable pre-
fixes, etc. Claridge (2000, 50) observes that they should either have the
feature motion in general (not location and not direction in the sense
of -ward(s) adverbials) or the feature result, or both, with the last
perhaps being the prototypical cases (cf. also Bolinger [1971]). This is a
description that fits perfectly in the present analysis, if one is prepared
to stay away from the physical connotation of the terms motion and
result, trading them in for fco and fce, respectively.
Jackendoff (2002) divides phrasal particles into several subclasses,
among them the set of aspectual particles, from which I take some
examples. Sentences like Elena drank the milk up roughly mean 'V NP
completely', i.e. up is not directional as it is in toss the ball up (2002,
76). Other features of aspectual particles are: (i) some of them are redun-
dant (close up the suitcase); (ii) they are not idiomatic: their meaning is
fully predictable; (iii) they are independent, free to combine with verbs
(2002, 76). These are exactly conditions compatible with the functions fco
and fce. For sentences like Bill slept away and Bill wrote on, Jackendoff
observes: "These mean roughly 'Bill kept on V-ing', i.e. away is not direc-
tional as in run away, and on is definitely not locational" (2002, 77). In
the present framework these cases are dealt with automatically in terms
of the presence or absence of the ceiling function in the θ-position. This
also holds for cases as discussed in Toivonen (2006). A sentence like The
children jumped on is to be analyzed as a case in which the particle on
feeds the fs ∘ fce ∘ fco function expressed by verbs like knock, hit, and jump,
with the instruction to take the second option in definition (15): the
repetition in N is unbounded.

8.6 Testing the Interplay between the Aspectual Functions

The above reduction of A-information in verbs looks promising for the
cases we have dealt with in explaining how the function composition with
the functions fco, fen, fce, and fs may be accounted for. In the present
section, I will inspect some of the sentences (1)(6) discussed above in
order to see whether or not the reduction yields the correct picture.
The perfective prefix s- in (1b) Tibor sygralp sonaty Bethovena (Tibor
played a certain number of Beethoven sonatas) can be seen as enforcing
the transition from R+ into N. There are two ways of dealing with the
verb sygratp. The first is to assign the fce ∘ fco-structure to the verb as a
whole (as in the case of verbs like win, realize, reach, etc.); the second is
to break down the verb into an s- and a -grat-part and to locate fce in s-.
In both cases, the right result is obtained along the lines sketched above.
With regard to (1a) Tibor igrali sonaty Bethovena (Tibor was playing
Beethoven sonatas), there are two possibilities if one also takes into
account (5) Vchera Tibor igrali sonaty Bethovena (Yesterday Tibor
played Beethoven sonatas). One can characterize the (covert) IMP-prefix
in igral as preventing fce from being applied, so that the remaining tense-
less predication is forced to stay in R, yielding the progressive reading.
In (5), the factual interpretation of the imperfective aspect is to be under-
stood in terms of an inference: the adverbial Vchera (yesterday) pro-
vides a closed domain, and an appropriate context may imply the
application of fce in spite of the absence of information in the internal
argument itself triggering fce.
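In the same illustrative spirit, the treatment of the Russian pair in (1) comes down to whether fce is contributed (by the perfective prefix s-) or blocked (by the covert IMP marking); a toy sketch, my own and not part of the analysis proper:

```python
# Toy sketch: the perfective prefix s- contributes f_ce (enforcing the shift
# from R+ into N), whereas the covert IMP marking blocks f_ce, so that the
# tenseless predication stays in R+.

def russian_aspect(perfective_prefix):
    if perfective_prefix:
        return "completed action: f_ce applies (shift into N)"
    return "action in progress: f_ce blocked (stays in R+)"

print(russian_aspect(True))     # (1b) Tibor sygral sonaty Bethovena
print(russian_aspect(False))    # (1a) Tibor igral sonaty Bethovena
```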
For the English equivalent of (1a) and the non-Slavic sentences in
(3) and (4), the analysis appears to work properly. Many analyses of the
Lexical Aspect and Natural Philosophy 157

Table 8.3
Continuity, Imparfait, and the Present Participle

Infinitive Present Participle Imparfait (3rd singular) Gloss

crire criv-ant criv-ait to write


traire tray-ant tray-ait to milk
plaire plais-ant plais-ait to please
connatre connaiss-ant connaiss-ait to know

progressive form assume an operator PROG outside a tenseless predica-


tion. For (3a) She was writing her thesis in 2009 the only assumption to
be made is to make BE + -ING part of the tenseless predication. I do not
see obstacles for an analysis PAST(she BE writ-ING her thesis in 2009).20
Assuming a bottom-to-top interpretation, the suffix -ING should then be
seen as the instruction to block or overrule the application of fce, which
brings about the unbounded continuity expressed by the progressive
form. It prevents the shift from R+ to N by its focus on the going-on part
of the predication.
Such an analysis might also apply to the French data in (3a) Elle criv-
ait sa thse en 2009 (She was writing her thesis in 2009) and (3b)
Elle crivit sa thse en 2009 (She wrote her thesis in 2009). Consider
table 8.3.
The general situation is this: if one deprives the French imparfait
of its tense suffixes one obtains the stem écriv-, which also occurs in
the present participle, the present subjunctive forms and the plural
present tense forms. This is a striking correlation.21 It opens the way
for assuming that in tenseless predication the verb stem expresses only
fco, allowing the present tense forms, the imparfait and the present par-
ticiple to express unbounded progress. Admittedly, the passé simple
also makes use of this stem, but in a considerable number of cases this
is not the case. Note that this can be understood in terms of an option
in definition (15): the passé simple requires the first option in (15) with
m = 1. This corresponds to the use of the Greek aorist. The other forms
in table 8.3 do not have this requirement because they are defined as
expressing fco. One may think of a morphological marking for the ceiling
function of French verbs in a tenseless predicate. This would result in an
interesting parallel with the Russian situation: if the present participle
stem is not marked perfectively as requiring fce ∘ fco, then the resulting
tense will be the imparfait; otherwise the resulting tense will be the
passé simple.22
8.7 Summary

In the present chapter, I have argued that in the literature about aspec-
tual classes there is an annoying imbalance in allowing too much a priori
ontology. This is due to considering the lexicon as the place to be for
having a free ride to the secrets of ontological structure. As a conse-
quence, the main aspectual difference visible in the sentences in (1) and
in the first two sentences of (3) is put aside in the domain of grammatical
aspect so that one can focus on lexical aspect, hence on aspectual classes,
hence on ontological classes. As long as one takes the position that aspect
is a complex of different semantic factors contributed by different
meaning elements in a phrase or sentence, one is bound to work more
soberly on the question of how to account for aspectual phenomena such
as continuation, compulsory repetition, unboundedness, etc., apart from
what we know about verbs. One way to sober down is to look for abstract
mathematical principles that guide our construction of complex informa-
tion. The first step is then to remove tense from the analysis and to
restrict the focus to tenseless predication. The second step is to abstract
from the content of the arguments verbs may have and to see verbs as
carrying type structure expressed by elementary functions operating on
number systems that can be assumed to underlie our temporal organiza-
tion. The combined action of the functions discussed above seems to be
sufficient for doing away with the distinction between grammatical and
lexical aspect. I am aware that the present sketch is not the whole story,
so I refer the reader to Verkuyl (forthcoming) for a more detailed
account. That paper has been revised quite drastically after handing in
the present chapter. In particular, the function fco is to be replaced by
two functions having the same format but with different restrictions, the
first one accounting for stative verbs, and the other one for nonstative
verbs. In that sense, the present chapter turns out to be the first of two
steps necessary to rescue the theory of tense and aspect from the trap of
nave physics.

8.8 Acknowledgments

I would like to thank Emmon Bach, Robert Binnick, Olga Borik, Theo
Janssen (ILLC), Jan Luif, Pim Levelt, Remko Scha, Joost Zwarts and an
anonymous reviewer for their comments on earlier versions. I thank
Albert Rijksbaron for guiding me through thorny questions of how to
interpret Aristotles analysis of motion without doing injustice to him,
and Floris Cohen for an instructive conversation about Galileo and
Aristotle from the point of view of physics. I was very happy to be invited
by Hana Filip for a lecture about lexical aspect and natural philosophy,
and I thank her and the audience at the Heinrich-Heine-Universität in
Düsseldorf for fruitful remarks and questions. I am also grateful for nice
and sometimes long discussions about the technical part of the machin-
ery proposed in the present chapter with Harry Sitters and Remko Scha,
and peripatetically with Johan van Benthem and with Ray Jackendoff, in
the latter case during a stay in Amsterdam where Ray was totally unaware
of the telos of the conversation: the present chapter.

Notes

1. The two quotations are from Te Winkel (1866, 67, 69). Translations of the quotes
are mine. Lammert A. te Winkel (1810–1868) was a prominent nineteenth-
century Dutch grammarian. Levelt's brilliant A History of Psycholinguistics
(2013) makes clear that the so-called Cognitive Revolution, by many scholars
located in the United States in the decades of the fifties and sixties of the past
century, has a flourishing European pre-history going back to the 18th century.
Te Winkel participated in the mid-nineteenth century linguistic discussion, being
part of that history.
2. Taking (i)–(iii) as oppositions between tense operators, this means that Mary
will leave is to be analyzed as PRES(POST(IMP(Mary leave))) and Mary will have
left as PRES(POST(PERF(Mary leave))). It should be underscored that PRES and PAST
are in fact the only operators expressing real time, whereas SYN, POST, IMP, and
PERF operate on and yield a tenseless predication. This excludes posteriority from
being identified with future because it lacks the sense of being directly related
to the point of speech. For a formalized account of the binary system, see Verkuyl
(2008).
3. Nowadays both the term aspect and its Russian equivalent vid cannot escape from a visual interpretation. However, there are other ways of dealing with the opposition between (3a) and (3b). In a historical analysis of the heteroclite origin of the term aspect, De Vogüé et al. (2004, 118) show that its visual connotation may simply be due to a dubious interpretation of the word vid. Apart from its visual meaning connected to the Latin verb videre, the Russian vid may also mean "sort" or "(conceptual) subdivision," in the sense of "branch." This means that originally aspect was seen as a form opposition, not a semantic one. In the early use of the term vid in the eighteenth- and nineteenth-century linguistic literature, this non-visual meaning was predominant. The visual metaphor crept in by translation. For a perspicuous sketch of the history of the study of aspect and Aktionsart, see Mynarczyk (2004, 33–67).
4. Krifka (1989, 1998) uses the term quantized for this restriction, but the two
terms have a totally different content because the notion quantized presumes
notions such as cumulativity, part structure, etc.; see Krifka (1998, 200).
5. Albert Rijksbaron (pers. comm.) points out that for classicists the identification of activities/states with energeia is absolutely wrong, even with the use of the modifier "roughly."
6. Te Winkel's distinction between Action in Progress and Completed Action does not appeal (overtly) to Aristotle's ontology. This holds for virtually all lin-
guists who wrote about aspect before the second half of the last century, such as
Poutsma (1926) in his chapter on aspect and the German grammarians that I
mentioned in the first chapter of Verkuyl (1972), among them Streitberg (1889),
Herbig (1896), and Jacobsohn (1933).
7. In the early sixties, there was a huge clash in Holland between the leading
mathematical logician Evert Beth and one of the leading linguists at the time,
Anton Reichling, who was in his earlier days trained as a Jesuit priest. Elffers
(2006) describes in detail how Beth decided to discontinue the discussion about
Chomsky's Syntactic Structures with the frustrated feeling that Reichling approached science as an Aristotelian natural philosopher. Is it accidental that Zeno Vendler in his earlier life was also thoroughly trained as a Jesuit priest, that Anthony Kenny was trained as a Roman Catholic priest, and that Gilbert Ryle was a philosopher who felt at home in the tradition of phenomenology (like Meinong and Heidegger)? It might explain their intimacy with Aristotle's mental legacy and their natural-philosophical approach to the role of language in ontological issues. A highly interesting picture of the role of the Church
in scientific research is given in Jaspers and Seuren (forthcoming).
8. A clear account of the long-standing difficulties in interpreting Aristotle's notion of motion is Sachs (2005); see also Rijksbaron (1989).
9. It should be said that Vendler had a keen eye for translational problems (1966,
10ff.).
10. Albert Rijksbaron (pers. comm.) pointed out to me that Aristotle's analysis of the difference between motions and actualities is restricted to the indicative; see also Rijksbaron (2002). He explains this in terms of the dominant role of truth in Aristotle's Metaphysics.
11. Of course, this is not the only information provided about the two verbs, but
(8) and (9) suffice to make the point at issue.
12. A more detailed account of the sketch following in section 8.5.1 is given in
Verkuyl (forthcoming).
13. In other terms, fco = {⟨x, y⟩ | y = x ∧ x ≥ 0}. What (10) does is also inherent to the notion of continuity as used in Jackendoff (1996, 351).
14. For languages without the Pres/Past-distinction, e.g. Chinese, see Verkuyl
(2008, 162–179).
15. Applied to π, the floor function yields ⌊π⌋ = 3 and the ceiling function fce yields ⌈π⌉ = 4; applied to 0.658, they yield ⌊0.658⌋ = 0 and ⌈0.658⌉ = 1, respectively. The floor and ceiling functions are generally defined as functions from R or Q to the set of integers Z, but in the present analysis we restrict ourselves to positive integers including 0. I will continue with the ceiling function, returning to the floor function later on.
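As a quick numerical check of the values in this note and the mapping in note 16, here is a minimal illustrative sketch in Python (only the name fce for the ceiling function is taken from the chapter; everything else is assumed):

import math

fce = math.ceil  # the chapter's ceiling function

for x in (math.pi, 0.658):
    # floor and ceiling of pi are 3 and 4; of 0.658 they are 0 and 1
    print(x, math.floor(x), fce(x))

print(fce(3.6789))  # 4, the first discrete number above 3.6789 (cf. note 16)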
16. This would have as a consequence that if the ceiling function fce maps, say, the image 3.6789 of the function fco to 4 as the first discrete number in its range,
the number 4 will be replaced by 1. Technically, this would require an adaptation
in the definition of fce, but more importantly, conceptually it would mean that the
length of the interval created by fco is not reflected in the N-structure: the holes
between the natural numbers are made independent of the intervals between
them if they occur in R. This independence is exactly what is necessary to account
for repetition and habituality.
17. In Verkuyl (forthcoming), I explore a different route from the one sketched
here on the basis of skipping the successor function fs. The ceiling function fce is
a so-called step function which allows us to account for the repetition possibly
expressed by verbs like belch and jump in terms of a continued mapping from
R+ to N providing a sequence of similar steps. In terms of the present chapter
this amounts to allowing m to be set as m1. This alternative would lead to blur-
ring the difference between the two discrete classes in table 8.1.
18. Such a structure would even allow intransitive fce ∘ fco-verbs to have the fce function located at the -location where the place for the NP is blocked. For space considerations, I will not go into the technicalities here.
19. See Verkuyl (1993, 168–187) for a formal account (in the framework of the
theory of Generalized Quantification) of how to deal with quantificational infor-
mation in NPs with a count or mass noun.
20. In a binary tense system as sketched in Verkuyl (2008) this would be a normal procedure. It runs counter to the proposal made in Verkuyl (1993, 318–327) where PROG is considered an operator external to the tenseless predicate p: TENSE(PROG(p)).

21. Exceptions are avoir and être. In a number of cases, singular present tense forms back out of the regularity.
22. The same appears to apply to Italian. Lenci and Bertinetto (2000) discuss the impossibility of sentences like Gianni andava al mare con Maria (Gianni went-IMP to the beach with Mary) occurring with adverbials like due volte (twice) or molte volte (many times), as opposed to these sentences occurring with the present perfect.

References

Ackrill, John L. 1965. Aristotle's distinction between Energeia and Kinēsis. In New Essays on Plato and Aristotle, edited by Renford Bambrough, 121–141. New York: Humanities Press.
Ackrill, John L. 1978. Aristotle on action. Mind 87 (348): 595–601.
Aristotle. 1894. Ethica Nicomachea. Edited by Ingram Bywater. Oxford: Claren-
don Press. http://www.perseus.tufts.edu/hopper/searchresults?q=bywater.
Aristotle. 1908. Metaphysics. Translated by William David Ross. In The Works of Aristotle, Oxford Translations, vol. 8, edited by William David Ross and John Alexander Smith. Oxford: Clarendon Press, 1908–1954. http://en.wikisource.org/wiki/Metaphysics_(Ross,_1908).
Aristotle. [1933] 1961. The Metaphysics. Translated by Hugh Tredennick. Cam-
bridge, MA: William Heinemann LTD/Harvard University Press.
Aristotle. 1985. The Complete Works of Aristotle. The Revised Oxford Transla-
tion. 2nd edition. Edited by Jonathan Barnes. Bollingen Series LXXI. Princeton,
NJ: Princeton University Press.
Beyer, Thomas R., Jr. 1992. 501 Russian Verbs Fully Conjugated in All the Tenses
Alphabetically Arranged. New York: Barron's Educational Series, Inc.
Bolinger, Dwight Le Merton. 1971. The Phrasal Verb in English. Cambridge, MA:
Harvard University Press.
Broekhuis, Hans, and Henk J. Verkuyl. 2014. Binary tense and modality. Natural
Language and Linguistic Theory 32 (3): 973–1009.
Charles, D. 1985. Aristotle's distinction between Energeia and Kinesis: Inference, explanation and ontology. In Language and Reality in Greek Philosophy: Papers Read at the Second International Philosophy Symposium Organised by the Greek Philosophical Society, May 1984, 173–181. Athens: Greek Philosophical Society.
Claridge, Claudia. 2000. Multi-word Verbs in Early Modern English: A Corpus-
Based Study. Amsterdam: Rodopi.
Comrie, Bernard. 1976. Aspect. Cambridge Textbooks in Linguistics. Cambridge:
Cambridge University Press.
De Swart, Henriëtte. 2012. Verbal aspect. In The Oxford Handbook of Tense and Aspect, edited by Robert I. Binnick, 752–780. New York: Oxford University Press.
De Vogüé, Sarah, Rémi Camus, Maryse Dennes, Ilse DePraetere, Sylvie Mellet, Albert Rijksbaron, and Maria Tzevelekou. 2004. Aspect. L'aspect entre parole, langues et langage. In Vocabulaire Européen des philosophies. Dictionnaire des intraduisibles, edited by Barbara Cassin, 116–144. Paris: Le Robert/Seuil.
Dowty, David. 1979. Word Meaning and Montague Grammar. The Semantics of Verbs and Times in Generative Semantics and in Montague's PTQ. Synthese
Language Library 7. Dordrecht: D. Reidel.
Elffers, Els. 2006. Evert Beth vs. Anton Reichling: Contrary forces in the rise of
Dutch generativism. In Linguistics in the Netherlands, edited by Jeroen van den
Weijer and Bettelou Los, 89–100. Amsterdam: John Benjamins.
Filip, Hana. 2012a. Aspectual class and Aktionsart. In Semantics. An International
Handbook of Natural Language Meaning, vol. 3, edited by Claudia Maienborn,
Klaus von Heusinger, and Paul Portner, 1186–1217. Berlin: De Gruyter.
Filip, Hana. 2012b. Lexical aspect. In The Oxford Handbook of Tense and Aspect,
edited by Robert I. Binnick, 721–751. New York: Oxford University Press.
Galilei, Galileo. [1632] 1953. Dialogue Concerning the Two Chief World Systems.
Translated by Stillman Drake. Berkeley, CA: University of California Press.
Gvozdanović, Jadranka. 2012. Perfective and imperfective aspect. In The Oxford Handbook of Tense and Aspect, edited by Robert I. Binnick, 781–802. New York:
Oxford University Press.
Herbig, Gustav. 1896. Aktionsart und Zeitstufe. Beiträge zur Funktionslehre des Indogermanischen Verbums. Indogermanische Forschungen 6: 157–269.
Jackendoff, Ray. 1990. Semantic Structures. Current Studies in Linguistics Series
14. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1993. Patterns in the Mind: Language and Human Nature. New
York: Harvester/Wheatsheaf.
Jackendoff, Ray. 1996. The proper treatment of measuring out, telicity, and
perhaps even quantification in English. Natural Language and Linguistic Theory
14 (2): 305–354.
Jackendoff, Ray. 2002. English particle constructions, the lexicon, and the auton-
omy of syntax. In Verb-Particle Explorations, edited by Nicole Dehé, Ray Jackendoff, Andrew McIntyre, and Silke Urban, 67–94. Berlin & New York: Mouton
de Gruyter.
Jacobsohn, Hermann. 1933. Aspektfragen. Indogermanische Forschungen 51:
292–318.
Jaspers, Dany, and Pieter A. M. Seuren. Forthcoming. The square of opposition
in Catholic hands: A chapter in the history of 20th-century logic. Logique et
Analyse.
Kenny, Anthony. 1963. Action, Emotion and Will. London: Routledge &
Kegan Paul.
Kolni-Balozky, J. [1938] 1960. A Progressive Russian Grammar. 6th ed. London:
Pitman & Sons.
Krifka, Manfred. 1989. Nominal reference, temporal constitution and quantifica-
tion in event semantics. In Semantics and Contextual Expression, Groningen-
Amsterdam Studies in Semantics 11, edited by Renate Bartsch, J. van Benthem,
and Peter van Emde Boas, 75–115. Dordrecht: Foris Publications.
Krifka, Manfred. 1998. The origins of telicity. In Events and Grammar, edited by
Susan Rothstein, 197–235. Dordrecht: Reidel.
Lenci, Alessandro, and Pier Marco Bertinetto. 2000. Aspect, adverbs, and events:
Habituality vs. perfectivity. In Speaking of Events, edited by James Higginbotham
and Fabio Pianesi, 245–287. New York: Oxford University Press.
Levelt, Willem J. M. 2013. A History of Psycholinguistics: The Pre-Chomskyan
Era. Oxford: Oxford University Press.
Mynarczyk, Anna. 2004. Aspectual Pairing in Polish. PhD diss., Utrecht Univer-
sity. http://dspace.library.uu.nl/handle/1874/633.
Poutsma, Hendrik. 1926. A Grammar of Late Modern English. II. The Parts of
Speech. Groningen: Noordhoff.
Rijksbaron, Albert. 1989. Aristotle, Verb Meaning and Functional Grammar:
Towards a New Typology of States of Affairs. Amsterdam: J.C. Gieben
Publisher.
Rijksbaron, Albert. 2002. The Syntax and Semantics of the Verb in Classical
Greek. An Introduction. 3rd ed. Amsterdam: J.C. Gieben Publisher.
Rothstein, Susan. 2004. Structuring Events: A Study in the Semantics of Lexical
Aspect. Oxford: Blackwell.
Ryle, Gilbert. 1949. The Concept of Mind. New York: Barnes and Noble.
Sachs, Joe. 2005. Aristotle: Motion and its place in nature. In Internet Encyclope-
dia of Philosophy, edited by James Fieser and Bradley Dowden. http://
www.iep.utm.edu/aris-mot/.
Smith, Carlota S. 1991. The Parameter of Aspect. Studies in Linguistics and Phi-
losophy 43. Dordrecht: Kluwer.
Streitberg, Wilhelm. 1889. Perfective und Imperfective Actionsart im Germanischen
I. Halle a.S.: Karras.
Te Winkel, Lammert. 1866. Over de wijzen en tijden der werkwoorden. De Taal-
gids 8: 66–75.
Toivonen, Ida. 2006. On continuative on. Studia Linguistica 60 (2): 181–219.
Vendler, Zeno. 1957. Verbs and times. The Philosophical Review 66 (2): 143–160. Reprinted in Zeno Vendler, Linguistics in Philosophy, 97–121. Ithaca, NY: Cornell
University Press, 1966.
Vendler, Zeno. 1966. Linguistics in Philosophy. Ithaca, NY: Cornell University
Press.
Verkuyl, Henk J. 1972. On the Compositional Nature of the Aspects. Foundations
of Language Supplemental Series 15. Dordrecht: D. Reidel.
Verkuyl, Henk J. 1993. A Theory of Aspectuality. The Interaction between Tempo-
ral and Atemporal Structure. Cambridge Studies in Linguistics 64. Cambridge:
Cambridge University Press.
Verkuyl, Henk J. 2008. Binary Tense. CSLI Lecture Notes 187. Stanford, CA: CSLI
Publications.
Verkuyl, Henk J. Forthcoming. Events and temporality in a binary approach to
tense and aspect. In The Oxford Handbook of Event Structure, edited by Robert
Truswell. Oxford: Oxford University Press.
II PSYCHOLINGUISTICS
9 An Evolving View of Enriched Semantic Composition

María Mercedes Piñango and Edgar B. Zurif

Our chapter focuses on the psycholinguistic and neurolinguistic proper-
ties of syntax-independent semantic composition, also referred to as
enriched semantic composition. This line of research has its roots in Ray
Jackendoff's conception of a parallel architecture that centrally features a generative semantic component. Our interest in pursuing Ray's lead grew from a rather remote set of findings, namely those concerning
temporal constraints on the construction of syntactic dependencies
during the course of sentence comprehension. Accordingly, before
addressing enriched semantic composition, our chapter will describe how
we were led to this work via analyses of real-time syntactic processing,
and how the link between the two initially depended upon some neuro-
anatomical considerations based on observations taken from focal-lesion
studies.

9.1 Syntactic Composition as Real-Time Dependency Composition: Gap-Filling and its Neurological Underpinnings

Our experiments dealt with the phenomenon of gap-filling, that is,
with the real-time formation of a syntactic link between a displaced
constituent and the position in the sentence from which it has been dis-
placed and where it must be interpreted. This process is normally auto-
matic and fast-acting, or, to use Merrill Garrett's term, reflexive.
To reveal this evanescent, intermediate stage in the process of com-
prehension, we relied on a commonly used low-tech paradigm called
priming, whereby the activation of a word's meaning is revealed by the
facilitating effect it has on the processing of a following word or target
(Meyer, Schvaneveldt, and Ruddy 1975). So, if in a lexical decision task
the word cat is followed by the related word dog in one instance and the
unrelated word bank in another, it will take a subject less time to decide
that dog is a word than that bank is a word. In effect, the meaning of the
preceding word cat has been activated to facilitate the processing of all
words within its underlying semantic network.
By using a cross-modal version of this technique, we were able to
follow the real-time activation of a displaced constituent during the
course of sentence comprehension. Consider the annotated sentence
This is the cat_k^1 that_k/i the young girl^2 followed (GAP)_i^3 last night in^4 the
dark, in which the subscript i shows the syntactic dependency existing
between the relative pronoun and the gap position (GAP), and the
superscripts show the locations of the visual target sites, that is, the sites
at which the experimenter examines if the word cat has been activated
to prime the targets or probes. The subjects' task was twofold: to listen
to the sentence, and while listening, to decide if a letter string flashed on
a computer screen (the target) was a word or not. When sentences were
spoken at a normal speed, neurologically intact subjects showed faster
lexical decisions for the target dog than for the target bank at positions
1 and 3, but, crucially, not at position 2. In effect, while listening to the
sentence, subjects had activated the meaning CAT immediately after
hearing it, following which they held it in a non-active memory store, and
then reactivated it at the gap site; thus filling the gap.
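As an illustration of how the priming measure is read off the lexical decision times, here is a minimal sketch in Python; the reaction times are invented placeholders for illustration only, not data from the studies discussed:

def priming_effect(related_rts, unrelated_rts):
    # Mean lexical-decision advantage (ms) for related over unrelated targets.
    return sum(unrelated_rts) / len(unrelated_rts) - sum(related_rts) / len(related_rts)

# Hypothetical per-position reaction times (ms): a positive effect at probe
# positions 1 and 3 (antecedent active) and none at position 2 (antecedent
# held in a non-active store).
rts = {
    1: ([610, 620, 600], [660, 670, 650]),  # (related "dog", unrelated "bank")
    2: ([648, 652, 655], [650, 649, 656]),
    3: ([615, 605, 625], [665, 655, 675]),
}
for position, (related, unrelated) in rts.items():
    print(position, round(priming_effect(related, unrelated), 1))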
Although likely capable only of coarse lexical coding, Wernicke's aphasic patients also showed this so-called gap-filling effect. However, Broca's aphasic patients did not (Zurif et al. 1993; Swinney et al. 1996). This difference between the two groups' capacities bears on functional neuroanatomy. That is, lesions yielding Broca's aphasia, which tend to be large and somewhat variable within an imprecisely bounded left inferior frontal region, are distinguishable from lesions yielding Wernicke's aphasia, which emerge from damage to the left posterior superior region. So, even though the two lesion sites are only imprecisely specified, the classical syndromes of Broca's and Wernicke's aphasia do have lesion-
localizing value (see, e.g., Alexander, Naeser, and Palumbo 1990; Naeser
et al. 1989; Vignolo 1988). The evidence suggests then that gap-filling
capacity appears to depend crucially upon an intact left anterior region,
but not upon the left posterior language region.
The Broca's patients' inability to form a syntactic link during the course of comprehension ties into Grodzinsky's Trace Deletion Hypothesis (1986, 1990, 2000), which characterizes their failure to understand agentive semantically reversible sentences as an inability to represent gaps (see also Hickok, Zurif, and Canseco-Gonzalez 1993; Avrutin 2006; Piñango 2000). However, our subsequent work indicates that the
syntactic linkage problem that we charted for this patient group does not
reflect a limitation of linguistic knowledge. In one of the follow-up
studies of Broca's aphasic patients, we widened the temporal window in
order to probe for priming not only at the post-verbal gap position but
also at 500ms after the gap (position 4). At this last position we observed
reactivation of the antecedent (Burkhardt, Piñango, and Wong 2003;
Burkhardt et al. 2008; Love et al. 2008). Thus, left frontal damage did not
disallow the antecedent's reactivation; rather, it slowed the process such
that the syntactic linking operation was no longer formed in a timely
manner. A second study also pointed to a temporal alteration following
left frontal damage. When the rate of input was decreased by one third
(from a normal speaking rate of six syllables per second to one of four
syllables per second), Broca's patients did reliably reactivate a displaced
constituent at the gap position (Love et al. 2008).
We conclude from these findings that the failure to form syntactic
dependencies can be explained not as the result of any specific loss of
syntactic knowledge, but rather as the consequence of a disruption to
elemental processing resources that sustain the speed of lexical activa-
tion necessary for implementing this knowledge in real time. These
resources appear to depend crucially upon the integrity of the left frontal
region associated with Broca's aphasia. Indeed, recent fMRI results
provide considerable support for this characterization of the functional
commitment of the left frontal cortex, pointing to the recruitment of this
area as a function of searching for the gap, a processing consideration, and not as a function of licensing the gap, a representational matter (Piñango et al. 2009).
As described above, the reflexive formation of a relative pronoun/wh-
dependency, which fundamentally involves the creation of a syntactic
link between the wh- element and its gap, seems not to involve the left
posterior superior cortical region connected to Wernicke's aphasia. Still, Wernicke's aphasic patients do have sentence comprehension problems, comprehension problems that are not always accountable by reference either to single word meanings or to other morphosyntactic factors. That is, even when they understand the meaning of the individual words in a structurally simple sentence, Wernicke's patients oftentimes are unable
to combine them to gain the meaning of the sentence. This seemed to us
to reflect a specifically sentence-level (i.e., compositional) semantic limi-
tation, an observation that connected in promising ways with the funda-
mental proposal of the Tripartite Parallel Architecture (Jackendoff 1997):
the possibility that not only phonology, morphology, and syntax, but also
meaning could be combinatorial and generative. What our early observa-
tions suggested to us was that this possibility of distributed combinato-
riality could also underlie the processing and neurological architecture
of the linguistic system. To fully test this possibility and complement our
work on the cortical basis of syntactic composition, we began to explore
real-time semantic composition. Our first experiments sought to establish
normal processing patterns of real-time semantic composition. But, since
our ultimate goal was to elaborate upon functional differentiation within
the language region of the left hemisphere in terms of a syntax-semantics
partition, we continued also to capitalize on the lesion-localizing value
of Broca's and Wernicke's aphasia.

9.2 Real-Time Semantic Composition: Aspectual and Complement Coercion and their Neurological Underpinnings

Although there is evidence that argument structure information is nor-
mally available immediately after hearing the verb in a sentence (e.g.,
Shapiro, Zurif, and Grimshaw 1987, 1989), other observations suggest that
the deployment of such information in thematic role assignment (that is, in the determination of the specific semantic role that an entity bears in a sentence) has a longer time course than does the assignment of the specific syntactic form associated with that position (e.g., McElree and
Griffith 1998). So there seems to be a brief time period between the
completion of a segment's syntactic operations and the completion of that segment's semantic operations, a period in which, we reasoned, we might observe sentence-level semantic processes. Concerning this period, our questions were these: In line with Ray's notion of a parallel archi-
tecture (Jackendoff 1997), would we find evidence for a real-time pro-
cessing component attributable only to semantic compositionality? And
if so, and in contrast to the real-time formation of a syntactic dependency,
would the normal implementation of this component crucially depend
upon an intact left temporo-parietal region, that is, the region associated with Wernicke's aphasia?
For this exploration, we focused on two phenomena that had been
independently described by Ray (e.g., Jackendoff 1997) and by James
Pustejovsky (e.g., Pustejovsky 1995) in the context of semantic composi-
tion: aspectual coercion and complement coercion, so called because they
refer to compositional operations that are syntactically underspecified.
Their interpretation does not arise directly from the syntactic composi-
tion of the lexical items involved. Rather, their interpretation demands
that specific semantic combinatorial processes be invoked to explain it;
semantic combinatorial processes that must be deployed independently
of syntactic structure. In what follows, we a) briefly describe each of
the phenomena, b) present the respective traditional analyses on which
our early experimental work was originally based, c) present some new
empirical and experimental evidence from our lab and others that have
challenged those traditional analyses, and conclude with d) a presenta-
tion of the new ways in which, given this new evidence, these phenomena
should be understood, with implications for how we conceptualize the
Parallel Architecture from processing and neurological perspectives.
Aspectual coercion refers to the mechanism that gives rise to the itera-
tive interpretation found in sentential constructions involving a semel-
factive verb (e.g., jump, sneeze, or clap) and a temporal modifier (e.g.,
for an hour). The first observation that emerges in these cases is that the
sense of iteration is not rooted in any of the lexical items contained in a
sentence, yet it is clearly the preferred reading. Consider, for example,
The girl jumped for a long time during recess. In this sentence, the sense
of repeated jumping does not come from the meaning of any of its indi-
vidual words or from any other overt morphosyntactic element; yet it is
the clearly salient reading. So, in a grammatical model whereby sentence
meaning is composed strictly from the word meanings in the sentence,
this kind of construction should not be possible. What, then, allows it?
The possibility, explicitly proposed in the Tripartite Parallel Architecture
(Jackendoff 1997), was that the existence of constructions like this was
in fact due to the possibility that the semantic representation could
emerge from syntax-independent meaning generation. In the case of
aspectual coercion, the meaning of iteration would be introduced by a
semantic operation, coercion, which is implemented on the semantic
representation of the sentence alone and observed more clearly (although
not exclusively, as Deo and Piñango [2011] would later argue) in con-
structions with a semelfactive and a temporal modifier. In this way,
aspectual coercion is said to enrich (in fact, complete) the meaning of
the sentence, something that is not required for a syntactically transpar-
ent sentence such as The girl rested for a long time during recess, whose
interpretation comes directly from the meaning of its lexical items put
together by syntactic processes alone.
The second phenomenon, complement coercion, is observable in the
sentence The girl began the book. In this sentence the aspectual verb BEGIN1 combines with an NP that denotes some particular book, but the
resulting interpretation refers instead to the beginning of an event whose
participants are the girl and the book. The observation is that the explicit
morphosyntax of the sentence [NP[V[NP]]] does not contain expres-
sions that refer to any event, let alone the beginning of one. Yet, the only
coherent interpretation demands that there be such an event. Moreover,
BEGIN is an aspectual verb that modifies temporal reference. Conse-
quently, it can be posited that the verb BEGIN (and other predicates with
similar behavior) is restricted to arguments of a temporal or eventive
nature. And herein lies the issue: the book, which denotes an entity par-
ticipant, does not directly provide an argument that satisfies such a
lexical restriction. Nevertheless, despite this verb-complement mismatch,
the sentence The girl began the book receives a coherent interpretation
with an eventive component. This indicates that a temporal/eventive
argument is supplied (or coerced) as comprehension unfolds. The mecha-
nism that accomplishes this, the analysis goes, must be semantic in nature
as the (overt) morphosyntax of the sentence does not suggest a syntactic
basis for this eventive meaning. In a manner similar to aspectual coercion, then, complement coercion is also taken to enrich the meaning of the
sentence: that is, it introduces meaning beyond that introduced via syn-
tactic composition alone, thus providing the sentence with an accessible
interpretation.

9.2.1 Early Experimentation on Aspectual and Complement Coercion


Our first study of enriched composition in real time focused on the
normal processing of aspectual coercion, for example, the boy jumped for
an hour after the teacher called him. To this end we used a dual-task
interference paradigm seeking to measure the hypothesized cost of
implementing the coercion operation, without which the sentence would
not be interpretable. Specifically, we found that the time it took neuro-
logically intact subjects to carry out an independent secondary task
(lexical decision) was significantly greater when the primary task was to
listen to and understand a sentence requiring coercion than when the
sentence was one whose interpretation resulted directly through syntac-
tically transparent composition (jump for an hour vs. sleep for an hour,
respectively). In effect, we discovered the normal processing cost of
aspectual coercion. We also found that the temporal parameter of this
processing operation was different from that of gap filling. Whereas the
latter was normally observed to occur as soon as it was structurally
licensed, the cost of semantic composition did not arise until 250 milli-
seconds after the operation was licensed (Piñango, Zurif, and Jackendoff 1999; Piñango et al. 2006).
Complement coercion has also been shown to be costly and delayed
(McElree et al. 2001; Traxler, Pickering, and McElree 2002; Traxler et al.
2005), a finding we have replicated recently via an eye-tracking proce-
dure (Katsika et al. 2012). However, as our recent work has shown, the
existence of the complement coercion operation appears to be attribut-
able only to aspectual verbs (e.g., begin, continue, and finish) and not to
psychological verbs (e.g., enjoy, tolerate, and endure), previously claimed
to be part of the coercion set (Katsika et al. 2012; Utt et al. 2013; Lai
et al. 2014).
As with our work on syntactic composition, we based our exploration
of the brain correlates of semantic composition on the performance of
aphasic patients. However, the dual task interference paradigm that we
had used to study normal performance could not be adapted to fit our
analyses of aphasia. Instead, we used an offline task where we queried
the meaning of the sentence just heard directly or through picture-
matching. So, for aspectual coercion, after each sentence, we had the
patients respond to a binary-choice question focusing on the presence or
absence of iteration. And for complement coercion, we made use of
sentence-picture matching. The data were clear for both forms of coer-
cion: the Broca's patients had little or no trouble with either the syntacti-
cally transparent sentences (e.g., The boy began reading the book) or
those requiring coercion (e.g., The boy began the book). And this was
expected: given the more relaxed time course of semantic processing,
there was no reason to anticipate that the implementation of coercion
would be adversely affected by the slower-than-normal lexical activation
pattern observed with Broca's patients.
By contrast, the Wernicke's patients showed a statistically significant contrast between transparent sentences and coercion counterparts (T > E) (Piñango and Zurif 2001). Given these initial data, and given also Shapiro and Levine's (1990) finding that argument structure is available online to Broca's patients but not to Wernicke's patients, we hypothe-
sized a double dissociation: a specific computational role at the sentence-
semantic level for left posterior superior cortex set against the role of
syntactically-based composition charted for the left inferior frontal
region.
Our first compositional semantic studies were carried out well over 10
years ago. And our claim of a full double cortical dissociation seems now
to have been too strong. Recent work has suggested that although the
syntax/semantics divide connects in important ways to an anterior/
posterior distinction within the language region of the left hemisphere,
this functional division is not likely to be so cleanly categorical (see Price
[2012] for a relevant review). Of perhaps greater relevance to this volume,
however, is our evolved understanding of coercion as a semantic enrich-
ment operation. We no longer think it to be a process based on type-
mismatching and type-repair via operator insertion. One of us (MMP),
together with Ashwini Deo, now proposes a lexico-conceptual approach
whereby all the semantic requirements for sentence interpretation are
lexically encoded: in the for-adverbial in the case of aspectual coercion,
and in the aspectual verb in the case of complement coercion. These two
claims (one for aspectual coercion, the other for complement coercion)
are summarized below as are the initial signs that our more nuanced view
of enrichment is compatible with emerging work in aphasia research,
brain imaging, and electrophysiology.2

9.2.2 New Analyses and Experimental Evidence on Aspectual and Complement Coercion
9.2.2.1 Aspectual Coercion as Partition-Measure Retrieval Two linguistic
observations fundamentally challenge the traditional characterization of
aspectual coercion as iteration resulting from a selectional mismatch
between a telic predicate, normally a verb, and a durative modifier, spe-
cifically a for-adverbial (or for-adverb as that kind of modifier is referred
to in the literature): 1) non-syntactically supported iteration, similar to
that observed in aspectual coercion, may arise even though no verb-
modifier mismatch has taken place, and 2) a telic predicate may be modi-
fied by a for-adverbial without triggering iteration (Deo and Piñango 2011, 306–307). These observations challenge the traditional character-
ization because they cannot naturally be accounted for by the proposal
that iteration interpretation is introduced by an ITER operator whose
appearance is triggered by the aspectual mismatch. In what follows, we
present the two observations and sketch the structure of the analysis that
captures them.3
Observation (1) is illustrated by sentences such as Mary played a
sonata for two months, Mary walked a mile for a year, or John biked
to Whole Foods/drove to the university for a year. In these sentences
the verbal predicates in question are telic, and are modified by a dura-
tive phrase (for two months, for a year), a combination that predictably
induces iteration. However, in contrast to the standard aspectual coer-
cion cases, no mismatch in composition (requiring repair) is claimed to
have taken place. Instead, these cases are normally explained through
pragmatic contextualization whereby the knowledge of the temporal
constraints of playing a sonata/walking a mile or driving to the uni-
versity lead to the inference that if the denoted events must occur for
two months/a year they must have occurred in an iterated fashion.
These cases therefore represent a puzzle for the standard analysis
because they have exactly the same general structure as the aspectual
coercion cases, yet they cannot be accounted for through the same
means.
Observation (2) is illustrated by sentences such as John read a book/
built a sand castle/baked a cake for an hour. In sentences like these, the
predicate is also telic and modified by a durative phrase yet their inter-
pretation is not iterative. Instead, it is durative. This represents an impor-
tant challenge to the traditional ITER-insertion approach because it
shows that the iterative interpretation does not necessarily depend on
the telicity of the modified predicate, the cornerstone of the ITER-
insertion analysis.
In sum, observation (1) tells us that a mismatch-and-repair approach
to aspectual coercion does not exhaust the possible approaches to capture
the interpretation observed. Observation (2) tells us that characterizing
the trigger for coercion in terms of telicity properties on the verbal
predicate wrongly predicts the presence of iteration.
In contrast to the traditional account, we have proposed (Deo and
Piñango 2011) that it is the semantics of the for-adverbial and not the
telicity property of the verb that contains the trigger for iteration. Spe-
cifically, on this account, the lexico-conceptual structure of the for-
adverbial is proposed to introduce in the semantic representation of the
sentence a regular partition of intervals (i.e., a set of collectively exhaus-
tive, non-overlapping, equimeasured subsets). The actual measure of
such intervals is not part of the lexicalized content. Rather, such a
measure is retrieved from the other lexical items in the sentence (e.g.,
the modified predicate, the subject, etc.) or from the larger discourse
context. This means that, in order for a for-adverbial to be interpreted,
a specific partition measure, which provides the structure of the set of
subsets, must be retrieved as comprehension unfolds.
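To make the notion of a regular partition concrete, here is a minimal illustrative sketch in Python (the function and its parameters are ours, not part of Deo and Piñango's formalism): it divides a measuring interval into non-overlapping, collectively exhaustive, equimeasured cells, with the cell size playing the role of the contextually retrieved partition measure.

def regular_partition(start, end, measure):
    # Partition [start, end) into consecutive cells of equal length `measure`;
    # `measure` is assumed to divide the interval length evenly.
    cells = []
    left = start
    while left < end:
        cells.append((left, left + measure))
        left += measure
    return cells

# "She swam in the local pool for a year": a 12-month measuring interval with
# a contextually retrieved partition measure of one month, each cell hosting
# at least one swimming event.
print(regular_partition(0, 12, 1))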
On this analysis, the interpretation of Mary played a sonata/walked a
mile/swam two miles for two months follows from the possibility of inter-
preting predicates like play a sonata for two months, swim in the local
pool for a year, or jump for an hour, as being instantiated at regular
intervals across the measuring interval. The precise size of the partition
measure may vary: playing a sonata for two months may involve regular
and frequent playing events; swimming in the local pool may occur at
least once a week throughout the year, and jumping for an hour need
not require jumping exhaustively during the hour, only frequently enough
during that period of time. Under this analysis then, the interpretation
of these sentences (including those normally said to involve coercion)
does not demand the involvement of an ITER operator as the traditional
approach has it, but rather depends on the retrieval (from the conceptual
structure associated with the lexical items in the sentence) of a measure
that allows the partitioning of the specific interval along which the event
predicate distributes.
When the absolute length of the measuring interval is large in com-
parison to the duration of a typical event in the predicate (for example, a swimming practice session or a drive to the local market), the partition measure is assumed to be correspondingly large. When the length of the measuring interval is short in comparison to the duration of a typical event in the predicate (for example, the duration of a jump or a sneeze), the partition measure is correspondingly short. In both kinds of cases,
however, the source of the iteration is the same. This allows us to con-
clude that iterative readings with for-adverbials do not depend on the
(a)telicity of the verb as had been previously claimed, but rather on
the interaction between knowledge of the typical duration of events, the
length of the measuring interval, and the availability of a partition
measure from the contextual conceptual structure to determine the
intervals internal structure.
Viewing the partition measure-retrieval analysis from a processing
perspective, the account of the comprehension effect reported is straight-
forward: when a for-adverbial is combined with a predicate in the process
of sentence composition, a search for a partition measure must take place.
If the predicate is atelic (e.g., SWIM in swim for an hour), the partition
measure comes at minimal cost, as the preferred interpretation can make
do with the infinitesimal partition, which is available by default. If the
predicate is telic, then the processor can still opt for the infinitesimal
partition, but in most cases this will yield an implausible interpretation
(e.g., ???sneeze for an hour, whereby only one sneeze has taken place
covering the whole hour period). When that is the case, a search through
context must take place in order for the processor to retrieve a more
plausible partition measure. It is this search that manifests itself as pro-
cessing cost. According to this analysis then, the interpretation of sen-
tences such as Mary skipped/jumped for an hour entails no break in
interpretation or repair of any sort. Instead, it requires the satisfaction
of the requirements of the lexical items in the sentence. One of these
requirements, encoded as a variable in the lexical item of for, is the retrieval of a partition measure.
Finally, the analysis makes a prediction. It predicts that long itera-
tions, for example, She swam in the local pool for a year, will elicit
the same processing cost as She jumps for an hour. Deo et al. (2012)
tested the predictions of the traditional mismatch-and-repair approach
against those of the partition measure-retrieval account. Using self-paced
reading, they compared the processing of punctual iteration with that
of durative iteration (e.g., John jumped for an hour vs. John jogged
for a year), containing respectively durative verbs and long durative
adverbials. Crucially, durative iterationfor example, jogged for a year
contains no mismatch between the verbal predicate and the temporal
adverbial, yet the condition requires iterationperiodic jogging for a
yearwhich is predicted to elicit cost by partition-measure retrieval
account, but not by the mismatch approach. That is, partition-measure
retrieval predicts durative iteration and punctual iteration to be equally
costlier than no iteration. By contrast, mismatch-and-repair predicts
only punctual iteration to differ from no iteration. Results from planned
comparisons support the partition measure-retrieval account by showing
a significant difference between no iteration and punctual iteration
at the adverbial window (p = 0.035), replicating previous work. Cru-
cially, they also show a significant difference between no iteration and
durative iteration at the adverbial window (p = 0.048). Finally, no dif-
ference was found between punctual iteration and durative iteration
(p = 0.87).
As can be seen, the partition measure-retrieval analysis answers the
question of multiple sources of iteration raised above, and captures in a
more parsimonious manner the processing profile reported for aspectual
coercion, a profile that can now be understood as a process of searching
for a partition-measure through the local conceptual structure in a
lexically-guided manner; this is the kind of process that one would expect
in an architecture where semantico-conceptual composition can take
place in a syntax-independent manner. From a processing perspective,
the analysis also addresses the question of the source of the iteration
interpretation (i.e., why must the resulting interpretation be iteration?).
Under mismatch-and-repair, iteration is introduced through an operator,
ITER. And iteration introduced in this way has no cognitive basis other
than its own existence. Under partition-measure retrieval, iteration is an
outcome from a search through the local conceptual structure triggered
by the lexically-encoded requirement to fill the partition measure
variable in the semantic representation of the temporal modifier. In this
way, the iterative interpretation is grounded, on the one hand, in the
lexical semantics of the durative predicate (no extra operation is required)
and, on the other, in the larger conceptual structure associated with the
sentence itself. Such grounding, which we take to be one key contribution
of the analysis, is desirable not only because it contextualizes the unifica-
tion of the empirical and processing observations, but also because it
provides the basis for understanding purely semantic composition as a
formal connection between sentence-level, real-time composition and
conceptual structure dynamics. In doing so, it further blurs the lexicon-
grammar divide. And here we find another point of convergence, since
the elimination of the divide between grammar and lexicon is a central
feature of the Parallel Architecture. This is one of the several ways in
which the partition measure-retrieval analysis connects fundamentally
with the core substance of Ray's work (see Jackendoff 2007 for exten-
sive discussion of the arguments for lexicon-grammar consolidation). In
what follows, we provide a similar analytical approach, this time in con-
nection to complement coercion.

9.2.2.2 Complement Coercion as Dimension-Ambiguity Resolution Our
recent work on complement coercion builds upon two observations, one
experimental and the other empirical. The experimental observation
comes from Katsika et al. (2012) (recently replicated by Utt et al. 2013
and Lai et al. 2014), which states that only aspectual verbs (e.g., begin,
finish) and not psychological verbs (e.g., enjoy, endure) exhibit the
increased processing cost previously associated with the computation of
complement coercion. The empirical observation (based on linguistic
judgments and actual examples attested in corpora) comes from Lai et
al. (2014) and Piñango and Deo (forthcoming), according to which, and contrary to standard assumptions, aspectual verbs do not select only for event-denoting complements, but also for entity-denoting complements. Moreover, when they select for entity-denoting comple-
ments, they do so without triggering an event interpretation. This is not
only seen in cases such as A very long dedication begins the book, where
the subject is inanimate, but also in cases with animate subjects such as
The little girl begins the queue, where the intended interpretation is con-
stitutive: the little girl is the first person in the queue. Accordingly, the
fundamental assumption that coercion verbs select ONLY for events is
untenable. This assumption is taken to be the defining property of complement-coercion verbs and indeed the basis for the mismatch-and-repair approach that accounts for the eventive interpretation that they trigger; showing that the assumption is not valid thus renders the approach itself untenable.
In light of these recent observations, we have proposed an analysis that
restricts the coercion phenomenon to aspectual verbs (e.g., begin, finish,
or continue) and again seeks to ground the compositional facts in con-
ceptual structure by distributing the labor, as it were, between lexicalized
selectional restrictions and the principles that provide access to the spe-
cific conceptual domain that satisfies such restrictions. The starting point
here is our observation that the interpretation of sentences containing
aspectual verbs in their transitive form systematically makes reference
to parthood relations between conceptual objects along a range of
dimensions (the dimensions understood here as perspectives on objects)
that capture the cognitively salient perspective of an entity. The dimen-
sions are themselves conceptual in nature, based on the physical appear-
ance and varieties of usage of the entities. Consequently, not only may
any given entity be constituted by more than one such dimension, but
the set of possible dimensions associated with an entity does not have to
constitute a natural class. Sentences illustrating specific parthood rela-
tions along with their dimensions are This is the famous perch that offi-
cially begins the Appalachian Trail (spatial dimension), A thunderstorm
began the morning (temporal dimension), A prayer started the banquet
(temporo-eventive dimension), The penultimate stanza continues the
poem's resonance (spatio-informational dimension) (all examples are from Piñango and Deo [forthcoming] and are web-attested). As Piñango
and Deo note, this deeper exploration into the environment of aspectual
verbs reveals the unviability of characterizing this semantic class in terms
of restrictions on the complement. Instead, this verbal class appears to
license any complement as long as it (the complement) can have some
dimension along which a parthood structure (or axis) can be identified
or induced. And that is what in essence is proposed: aspectual verbs lexi-
cally select for structured individuals or entities that can be construed as
having a totally ordered axis (a directed path structure) along a
dimension.
On this analysis, the interpretation of all sentences with aspectual
verbs, including the ones involved in the coercion configuration set (animate subject + aspectual verb + entity-denoting object, e.g., The girl began the book), can only be determined after the relevant dimension-
specific function encoded in the meaning of the verb has been chosen.
Since any given complement can have more than one salient dimension,
these sentences can receive multiple interpretations (one for each dimen-
sion) which, crucially, are mutually exclusive.
So, in the case of The girl began the book, upon encountering began,
the processor must exhaustively retrieve all dimension-specific functions
encoded in its lexical representation, and upon encountering the book, a
potential structured-individual under at least the spatial, informational,
and eventive dimensions, all the dimensions associated with the book are
retrieved as possible candidates for the required axis. In this situation, at
least two possible functions are viable: the eventive-dimension function, which leads to the interpretation "began event involving the book" (whereby the girl is mapped onto the agent role whereas the book is mapped onto the patient role of the event; e.g., the girl began the book = the girl began writing/reading/restoring the book), and the informational-dimension function, which leads to an interpretation whereby the girl is the source of information for the segment (e.g., the girl began the book = the anecdote/story about the girl began the book). The availability of these two inter-
pretations represents an ambiguity that must be resolved, as only one of
the readings can be the intended one at any given time.
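The retrieval-and-disambiguation process just described can be pictured with a small illustrative sketch in Python; the dimension inventories below are toy assumptions of ours, not the authors' lexical entries:

# Toy inventory: dimension-specific functions lexically encoded in an
# aspectual verb, retrieved exhaustively upon encountering the verb.
ASPECTUAL_VERB_DIMENSIONS = {
    "begin": {"spatial", "temporal", "eventive", "informational"},
}

# Salient dimensions under which a complement can be construed as a
# structured individual (again, toy values).
COMPLEMENT_DIMENSIONS = {
    "the book": {"spatial", "informational", "eventive"},
    "the queue": {"spatial"},
}

def candidate_readings(verb, complement):
    # Dimensions shared by verb and complement each yield one mutually
    # exclusive candidate reading; more than one shared dimension means an
    # ambiguity that must be resolved.
    return ASPECTUAL_VERB_DIMENSIONS[verb] & COMPLEMENT_DIMENSIONS[complement]

print(candidate_readings("begin", "the book"))   # several candidates: ambiguity to resolve
print(candidate_readings("begin", "the queue"))  # a single (constitutive) reading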
As can be seen, and in contrast to the traditional account, our analysis
of aspectual verbs and of the complement-coercion effect does not
depend on the implementation and processing cost of introducing spe-
cialized entity-to-event (type-shifting) operators into the semantic rep-
resentation. Rather, our analysis tells us that the interpretation of an
aspectual verb + complement segment depends upon two processes: (1)
the exhaustive activation of all possible dimension-specific functions that
are lexically encoded in each of the predicates in the aspectual class (that is, we claim, part of what defines the class as aspectual), and (2)
the lexically-guided search through the conceptual structure associated
with the complement seeking to determine the dimension (eventive,
informational, etc.) along which the axis (structured object) is to be
determined. In light of these observations, we further propose that it is
the combination of these two processes, exhaustive activation of functions (at the verb) and dimension determination (at the complement),
that is the source of the cost observed in the comprehension of coercion
configuration sentences. At this point, we note that this analysis extends
to other possible configurations involving aspectual verbs. We focus on
this one because this one happens to be the only experimentally studied
subclass of aspectual verb sentences.4
Early support for our analysis is found in Traxler et al. (2005), who reported that whereas previous exposure to an activity, for example,
building a condominium, does not facilitate the subsequent parsing of
the animate subject + aspectual verb + entity-denoting complement con-
figuration (e.g., John began the condominium), previous exposure to an
aspectual verb plus its entity-denoting complement does (e.g., began the
condominium). This facilitation is observed in fact in the elimination of
the coercion effect in the subsequent processing of the same string (e.g.,
John began the condominium). Whereas the existence of this contrast
remains unexplained in the mismatch-and-repair approach, it finds a
natural explanation in the present account. Indeed, in our account, we
interpret the disappearance of the processing cost after the subject has
parsed the critical configuration as the result of facilitation of dimension
retrieval. This is so because in both presentations of the animate subject
+ aspectual verb + entity-denoting complement (e.g., John began the
condominium) the intended dimensional-reading is the same. So, only in
the first presentation is a dimension-disambiguation required. In the
subsequent one, one dimension has already been privileged, and the
processing cost is therefore lessened. Of course, this interpretation
depends upon the notion that conceptual-semantic processes (such as
dimension-determination) can penetrate and constrain at least some
sentence-level semantic processes during the course of their operation.
And this stands in contrast to the impenetrability of the syntactic gap-
filling process we have charted (Hickok et al. 1992), a difference that
brings a consideration of temporal constraints to the fore. The gap-filling
process is reflexive, encapsulated, as it were, by its required speed. But
the sentence-level semantic processes described here have a slower rise
time, and therefore can interact with other forms of contextual informa-
tion whenever such information is available.
As for the neurological underpinnings of complement coercion, a
recent fMRI study from our lab (Lai et al. 2014) offers continued support
for our initial claim (Piñango and Zurif 2001) that the left posterior cortical region associated with Wernicke's aphasia is crucially involved
in the processing of complement coercion. However, it also points to the
recruitment of other brain regions in the frontal cortex involved in this
semantic operation, which had already been suggested by previous work
(Husband, Kelly, and Zhu 2011). Two particulars of our study are worth
mentioning: the test sentences differed only with respect to the nature
of the verb: aspectual (e.g., begin) vs. psychological (e.g., enjoy), and
our fMRI analyses were performed on both full sentences and, indepen-
dently, on sentence segments. Based on this design, the basic findings
were as follows: activation of Wernicke's area (BA40), bilateral parietal
cortex (BA7), and frontal cortex (BA6, BA24) was associated with computation of subject + verb (aspectual-psychological: (The boy began) vs. (The boy loved)), and a separate activation of subregions within the left frontal cortex, involving BA44, BA45, BA47, BA6 (bilateral), and marginally BA8, was associated with comprehension of the complement, which in this case was identical for each aspectual/psychological pair ((the book...) vs. (the book...)).
Given these data, it seems reasonable to suggest that the activation of
the region associated with Wernicke's aphasia (BA40) and some other
regions within the fronto-temporal network such as BA7 and BA6 indi-
cates their role in the exhaustive retrieval of the dimension-specific
functions encoded in the verb in anticipation of the complement. It also
seems reasonable to suggest that the functional role of the cortical area
associated with Broca's aphasia be expanded to include its participation
in some semantic operations. Still, this area remains set apart from the
other cortical regions activated in the experiment in light of its crucial
role in gap-filling. This last consideration highlights the difference
between, on the one hand, a fast-acting, impenetrable syntactic compo-
sitional process whose overriding objective for any given utterance
is to mark constituency and subcategorization displacements and, on
the other, a slower-acting, penetrable semantic compositional process
whose objective is to build the local meaning of an utterance not in isola-
tion, but inextricably embedded in the larger non-linguistic conceptual
system.

9.3 Acknowledgments

We feel fortunate, indeed, to have interacted almost daily with Ray
during our Brandeis years. We probably would not have undertaken our
initial real-time analyses of sentence-level semantic processing were it
not for Ray's insistence on the need for an independent generative
semantic component in the architecture of language, his openness to the
possibility of psycholinguistic and neurolinguistic evidence as valid
sources of constraint on linguistic theory, and his intellectual generosity
as we explored together and tried to understand and model the process-
ing implications of such an architecture. Nor would we have continued
with the task of trying to understand the nature of sentence-level seman-
tic processing. But he offered us more than a theory of semantic enrich-
ment; he also enriched our lives. And we are deeply honored to count
him as a close friend.

Notes

1. A notational point: the use of SMALL CAPS is intended to indicate reference to
the decontextualized meaning of the expression (and semantic class to which the
expression belongs), not to the specific morphosyntax or phonology.
2. For detailed reports of this work see Deo and Piñango (2011), Lai et al. (2014),
and Piñango and Deo (forthcoming).
3. For the complete analysis, which goes beyond the scope of this article, please
see Deo and Piñango (2011).
4. See Piñango and Deo (forthcoming) and Lai et al. (2014) for a more detailed
presentation of the analysis, a summary of all the experimental work on comple-
ment coercion published to date, and a discussion of experimental extensions of
the analysis.

References

Alexander, Michael P., Margaret A. Naeser, and Carole Palumbo. 1990. Broca's
area aphasias: Aphasia after lesions including the frontal operculum. Neurology
40 (2): 353–362.
Avrutin, Sergey. 2006. Weak syntax. In Broca's Region, edited by Yosef Grodzin-
sky and Katrin Amunts, 49–62. New York: Oxford University Press.
Burkhardt, Petra, Sergey Avrutin, María M. Piñango, and Esther Ruigendijk.
2008. Slower-than-normal syntactic processing in agrammatic Broca's aphasia:
Evidence from Dutch. Journal of Neurolinguistics 21 (2): 120–137.
Burkhardt, Petra, María M. Piñango, and Carol Wong. 2003. The role of the
anterior left hemisphere in real-time sentence comprehension: Evidence from
split intransitivity. Brain and Language 86 (1): 9–22.
Deo, Ashwini, and María M. Piñango. 2011. Quantification and context in measure
adverbs. In Proceedings of the 21st Semantics and Linguistic Theory Conference,
edited by Neil Ashton, Anca Chereches, and David Lutz, 295–312. http://
elanguage.net/journals/salt/article/view/21.295/2516.
Deo, Ashwini, María M. Piñango, Yao-Ying Lai, and Emily Foster-Hanson. 2012.
Building multiple events: The cost of context retrieval. Paper presented at the
AMLaP Conference, Riva del Garda, Italy, September 2012. Poster 224. http://
pubman.mpdl.mpg.de/pubman/item/escidoc:1563764:3/component/
escidoc:1563765/Smit_huettig_monaghan_amlap2012.pdf.
Grodzinsky, Yosef. 1986. Language deficits and the theory of syntax. Brain and
Language 27 (1): 135–159.
Grodzinsky, Yosef. 1990. Theoretical Perspectives on Language Deficits. Cam-
bridge, MA: MIT Press.
Grodzinsky, Yosef. 2000. The neurology of syntax: Language use without Broca's
area. Behavioral and Brain Sciences 23 (1): 1–21.
Hickok, Gregory, Enriqueta Canseco-Gonzalez, Edgar Zurif, and Jane Grim-
shaw. 1992. Modularity in locating wh-gaps. Journal of Psycholinguistic Research
21 (6): 545–561.
Hickok, Gregory, Edgar Zurif, and Enriqueta Canseco-Gonzalez. 1993. Structural
description of agrammatic comprehension. Brain and Language 27 (3):
371–395.
Husband, E. Matthew, Lisa A. Kelly, and David C. Zhu. 2011. Using complement
coercion to understand the neural basis of semantic composition: Evidence from
an fMRI study. Journal of Cognitive Neuroscience 23 (11): 3254–3266.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge, MA:
MIT Press.
Jackendoff, Ray. 2007. A Parallel Architecture perspective on language process-
ing. Brain Research 1146: 2–22.
Katsika, Argyro, Dave Braze, Ashwini Deo, and María Mercedes Piñango. 2012.
Complement coercion: Distinguishing between type-shifting and pragmatic infer-
encing. The Mental Lexicon 7 (1): 58–72.
Lai, Yao-Ying, Cheryl Lacadie, Todd Constable, Ashwini Deo, and María Mer-
cedes Piñango. 2014. Complement coercion as the processing of aspectual
verbs: Evidence from self-paced reading and fMRI. In Proceedings from the
36th Annual Cognitive Science Conference, Quebec City, Canada, edited by Paul
Bello, Marcello Guarini, Marjorie McShane, and Brian Scassellati, 2525–2530.
https://mindmodeling.org/cogsci2014/papers/438/paper438.pdf.
Love, Tracy, David Swinney, Matthew Walenski, and Edgar Zurif. 2008. How left
inferior frontal cortex participates in syntactic processing: Evidence from aphasia.
Brain and Language 107 (3): 203–219.
McElree, Brian, and Teresa Griffith. 1998. Structural and lexical constraints on
filling gaps during sentence comprehension: A time-course analysis. Journal of
Experimental Psychology: Learning, Memory, and Cognition 24 (2): 432–460.
McElree, Brian, Matthew J. Traxler, Martin J. Pickering, Rachel E. Seely, and Ray
Jackendoff. 2001. Reading time evidence for enriched composition. Cognition 78
(1): B17–B25.
Meyer, David E., Roger W. Schvaneveldt, and Margaret G. Ruddy. 1975. Loci of
contextual effects on visual word recognition. In Attention and Performance, vol.
V, edited by P. M. A. Rabbitt and Stanislav Dornic, 98–118. New York: Academic
Press.
Naeser, Margaret A., Carole L. Palumbo, Nancy Helm-Estabrooks, Denise Stiassny-
Eder, and Martin L. Albert. 1989. Severe nonfluency in aphasia: Role of the
medial subcallosal fasciculus and other white matter pathways in recovery of
spontaneous speech. Brain 112 (1): 1–38.
Piñango, María Mercedes. 2000. Canonicity in Broca's sentence comprehension:
The case of psychological verbs. In Language and the Brain: Representation and
Processing, edited by Yosef Grodzinsky, Lewis P. Shapiro, and David Swinney,
327–350. San Diego, CA: Academic Press.
Piñango, María Mercedes, and Ashwini Deo. Forthcoming. A general lexical
semantics for aspectual verbs. Journal of Semantics.
Piñango, María Mercedes, Emily Finn, Cheryl Lacadie, and Todd Constable. 2009.
The role of the left inferior frontal gyrus in sentence composition: Connect-
ing fMRI and lesion-based evidence. Paper presented at the 47th Annual
Meeting of the Academy of Aphasia, Boston, MA, October 2009.
Piñango, María Mercedes, Aaron Winnick, Rashad Ullah, and Edgar Zurif. 2006.
Time-course of semantic composition: The case of aspectual coercion. Journal of
Psycholinguistic Research 35 (3): 233–244.
Piñango, María Mercedes, and Edgar Zurif. 2001. Semantic operations in aphasic
comprehension: Implications for the cortical organization of language. Brain and
Language 79 (2): 297–308.
Piñango, María Mercedes, Edgar Zurif, and Ray Jackendoff. 1999. Real-time
processing implications of enriched composition at the syntax-semantics inter-
face. Journal of Psycholinguistic Research 28 (4): 395–414.
Price, Cathy J. 2012. A review and synthesis of the first 20 years of PET and fMRI
studies of heard speech, spoken language, and reading. Neuroimage 62 (2):
816–847.
Pustejovsky, James. 1995. The Generative Lexicon. Cambridge, MA: MIT Press.
Shapiro, Lewis P., and Beth A. Levine. 1990. Verb processing during sentence
comprehension in aphasia. Brain and Language 38 (1): 21–47.
Shapiro, Lewis P., Edgar Zurif, and Jane Grimshaw. 1987. Sentence processing and
the mental representation of verbs. Cognition 27 (3): 219–246.
Shapiro, Lewis P., Edgar Zurif, and Jane Grimshaw. 1989. Verb representation
and sentence processing: Contextual impenetrability. Journal of Psycholinguistic
Research 18 (2): 223–243.
Swinney, David, Edgar Zurif, Penny Prather, and Tracy Love. 1996. Neurological
distribution of processing resources underlying language comprehension. Journal
of Cognitive Neuroscience 8 (2): 174–184.
Traxler, Matthew J., Brian McElree, Rihana S. Williams, and Martin J. Pickering.
2005. Context effects in coercion: Evidence from eye movements. Journal of
Memory and Language 53 (1): 1–25.
Traxler, Matthew J., Martin J. Pickering, and Brian McElree. 2002. Coercion in
sentence processing: Evidence from eye-movements and self-paced reading.
Journal of Memory and Language 47 (4): 530–547.
Utt, Jason, Alessandro Lenci, Sebastian Padó, and Alessandra Zarcone. 2013. The
curious case of metonymic verbs: A distributional characterization. In Proceed-
ings of the 10th International Conference on Computational Semantics (IWCS
2013), Workshop Towards a Formal Distributional Semantics. http://aclweb.org/
anthology/W/W13/W13-0604.pdf.
Vignolo, Luigi. 1988. The anatomical and pathological basis of aphasia. In
Aphasia, edited by Frank Clifford Rose, Renata Whurr, and Maria A. Wyke,
227–255. London: Whurr.
Zurif, Edgar, David Swinney, Penny Prather, Julie Solomon, and Camille Bushell.
1993. An on-line analysis of syntactic processing in Broca's and Wernicke's
aphasia. Brain and Language 45 (3): 448–464.
10 Height Matters

Barbara Landau and Lila R. Gleitman

If two bodies collide, then the first of them collides with the second, the
second collides with the first, and they collide with each other. Surpris-
ingly, assenting to these mutual entailments does not imply that these
sentence forms are semantically equivalent, at least in a court of law. For
if a scooter collides with a bus then the scooter's insurance company
pays, and the reverse obtains if the bus collides with the scooter.
Although any collision must be a single event, the asymmetry of syntac-
tic structure in these cases imparts a further semantic element to the
interpretation. Of course, if the bus and the scooter collide (or the
scooter and the bus collide), that is simply a tragic accident and money
doesn't change hands. This set of syntactic structures is a striking case
whereby even a single symmetrical motion event (colliding) can be lin-
guistically framed so as to alter the relative prominence of the partici-
pants, resulting in additional interpretive values of path direction and
even, as in the present case, attributions of instigation and cause. Ray
Jackendoff's career in linguistics and psychology has been materially
involved with uncovering and explicating such subtle framing properties
by means of which languages add perspective to description in render-
ing the representations of events. In this essay, we focus on one such
powerful framing device that centrally influences listeners' interpreta-
tions. This is the modulation of meaning conveyed through the relative
prominence of sentential constituents, established through height in the
syntactic structure. This structural property is a major controller of lis-
teners' semantic interpretations even in the face of countervailing con-
ceptual biases. The syntactic patternings that we will discuss, though
partly unique to English, fall within a range of parametric cross-language
variability that is sufficiently narrow so that children can use them
to recover the meanings of words. For English, as we shall see, and
to varying degrees in all languages, height in the observed (that is,
surface) parse tree plays a crucial restrictive role in organizing the
lexicon and in interpreting sentences.
We begin with the oft-cited point that observations of objects and
events in the world are inherently indeterminate with respect to their
possible descriptions (cf. Quine 1960; Goodman 1955), potentially creat-
ing a problem for the linguist who visits a new linguistic community or
a child who is learning his native tongue. In Quines example, hearing
the utterance Gavagai! while observing a rabbit run by leaves the
listener-observer open to an indefinite number of interpretations, includ-
ing those that are lucky guesses (such as rabbit) and those that are no
use at all (some fur momentarily obscured by brush).1 Even if the
observers are fortunately biased in their interpretations of objects and
events (rabbits rather than bits of fur; running rather than widely sepa-
rated temporal slices of motion), it is by no means assured that the
interpretation they derive from their observation of the scene will be the
same as that of the speaker. The case of collision is a useful example:
given one and the same event, the participants in traffic accidents may
have quite different accounts of what went on, as revealed by their syn-
tactic choices and by the consequent decisions of judge and jury.
The three interrelated studies that we review here describe several
syntactic framing devices built into Englishand presumably all
languagesthat selectively restrict the choice among the multiple
descriptions that are always available for any one scene. The first focuses
on two natively available conceptual biases (priority of agents over
patients, and of goal paths over source paths) that influence children's and
adults' initial preferences in interpreting scenes and events. These con-
ceptual biases are reflected in language by prominence in syntactic struc-
tures, with agents canonically assigned to subject and patients to object
position, and goal paths more likely to be arguments than source paths.
The second case study concerns symmetrical predicates, including the
case of collide that we have just mentioned. Paradoxically, the placement
of symmetrically compared entities at different heights in a phrase-
structure tree can set up an asymmetry that systematically influences the
interpretations of the symmetrically compared nominals themselves.
The third case study builds on the first two by showing that lexical
and syntactic informationin particular, the role assignments of
words representing visual features in briefly presented displayscan
improve children's memory for fleetingly observed stimuli. Children can
recall the left/right placement of colors on a rectangular figure if the
figure is first described with a predicate that requires an asymmetric
(subject-complement) structure. This is another striking case where the
listener's interpretation of the heights of noun phrases in a parse tree
alters the represented prominence of constituent entities in the observed
visual world.

10.1 Two Kinds of Asymmetry: Thematic Roles and Path Types

Here we first take up the linguistic representation of agents and patients,
and then, source and goal paths.

10.1.1 Thematic Roles


The notion agent of an action, though hard to define precisely (see Dowty
[1991] for a magisterial discussion), has to do with animacy and volition,
and applies to sentient creatures, prototypically humans, as they move
about and influence events and objects in the world. Something like this
agentive concept is mentally present before one's native language is
learned, as evidenced by pre-linguistic infants sensitivity to the inten-
tions and volitional acts of animate beings (Gergely, Bekkering, and
Király 2002; Gordon 2003; Woodward 1998). Languages universally
reflect the conceptual prominence of agents over other thematic roles
by canonical assignment of this role to the syntactic subject position, the
highest NP in the parse tree, for almost all languages. Isolated deaf chil-
dren manifest this same tendency in their home-made gestural language
(Feldman, Goldin-Meadow, and Gleitman 1978) and in the elaboration
of such languages in stable deaf communities (Senghas, Kita, and Özyürek
2004). As Otto Jespersen (1909–1949) put it, the alignment of agency and
grammatical subjecthood reflects "the greater interest felt for persons
than for things" (54).
Consider as an example the event depicted in figure 10.1: it is as much
a drama of the dog chasing the man as of the man fleeing the dog. But
experimental participants do not come down fifty-fifty if they are asked
to describe what's happening in such pictures. Rather, there is always a
preponderance of "The dog is chasing a man" over "The man is running
away from a dog." This choice of verb, given the scene, accomplishes two
desiderata at once, by selecting the dog as the entity who most plausibly
set this event in motion (its cause, its instigator), and placing this item as
the highest node in the parse tree, the subject of the sentence.
These default descriptions are mutable, however, as shown by Gleit-
man et al. (2007). The idea is to manipulate the observers focus of atten-
tion by visual capture: as each test picture is displayed, a flash of light
Figure 10.1
Visual capture and the interpretation of scenes. When the dog is subliminally highlighted
for 60 msec as experimental participants view this scene, they are more likely to describe
the scene as "A dog is chasing the man" than if the highlight is placed at a neutral point
or, especially, if the man is highlighted, in which case they are more likely to say "A man
is running away from the dog" (after Gleitman et al. [2007]).

appears for 60 milliseconds (too brief a time to be consciously detected)
behind either the dog or the man, or (as a control) in some neutral place
between the two of them. When the man is covertly highlighted, the
proportion of flee/run away responses increases over that proportion in
the neutral case. By contrast, when the dog is highlighted, this further
increases the natural prominence of chasing over fleeing, resulting in
increased chase responses.
This universal tendency to encode agents as subjects is likely to provide
an easy point of entry in acquisition; indeed, many have assumed that
the mapping between agents and subjects is among the earliest realized
(possibly innate) linking rules. As Bever pointed out as early as 1970, it
is a plausible explanation for why active voice sentences are easily under-
stood by very young children whereas passives are systematically misun-
derstood. Actives align the conceptual and linguistic-representational
facts, whereas passives decouple them. Alignment among semantic and
syntactic hierarchies is likely to facilitate acquisition across the board
(Grimshaw 1981; Fisher and Song 2006; Gleitman 1990; Pinker 1989).
Young learners will use the observed linguistic structures, and the bias
to align conceptual prominence with height in the parse tree, to recon-
struct the meanings of new words that are uttered in situationally ambig-
uous circumstances. This is quite a feat of reverse engineering: because
the verb argument structure is a projection from the verb meaning, the
learner can venture a verb meaning by recovering the linguistic structure
in the heard sentence as this co-occurs with a scene. This was documented
by Fisher, Hall, Rakowitz, and Gleitman (1994) by using the kinds of
scenario illustrated in figure 10.1, which offer a choice as to which event
participant an accompanying sentence can be about. They showed
children (aged 3 and 4) videotaped events of puppets giving/getting,
chasing/fleeing, or feeding/eating, asking whether syntactic choice would
determine the interpretation of new (nonsense) verbs. Thus different
groups of children heard the scenarios described as either "Look! Glorp-
ing!" (syntactically uninformative context) or "Look! The dog is glorping
the man" (the most plausible instigator-as-agent option) or "Look! The
man is glorping the dog" (the description incongruent with the default
plausible agent but still compatible with the scenario). The cover story
was that experimenters were partly speaking puppet language, and the
child's job was to say what glorping meant. In the syntactically neutral
case, children chose as do adults; for example, by interpreting glorping
as chase rather than flee. This tendency was increased even further by
syntactically congruent input (where the dog was subject), but heavily
diminished by incongruent input; for example, "The man is glorping the
dog." In fact, in this final case the modal response now became run away
from. No matter that it is usually, in real life, an act of chasing that initi-
ates an act of fleeing, the syntactic structure plays the determinative role,
so that now the prototypical patient (the man) must be who the sentence
is about. So, given the scene, the unknown verb must mean flee.2
Speaking more generally, providing children or adults with a sentence
frame that specified who was the agent and who was the patient/recipient/
undergoer disambiguated the scenes, resulting in an interpretation con-
sistent with chasing/fleeing. A remaining question has to do with the
varying potency of cues to word meaning. For instance, much evidence
shows that eye-gaze direction of the speaker (reflecting his focal atten-
tion) as he utters a word is a powerful influence on word learning (e.g.,
Baldwin 1991). Nappa et al. (2009) replicated this finding, but only when
there was no countervailing syntactic cue. That is, when syntax and
social-pragmatic cues such as eye-gaze were directly pitted against one
another in these studies, the syntax trumped the pragmatics for every age
group, from 2-year-olds to adults.

10.1.2 Path Types: Goal Paths versus Source Paths


While it is in principle possible that languages could encode a very large
number of distinct path types, they usually get by with a very few basic
terms that encode paths. This is consistent with the idea that linguistic
encoding does not exactly reflect our non-linguistic spatial representa-
tions (Landau and Jackendoff 1993). As described by Jackendoff (1983),
the major ontological type, [PATH], includes just two types of bounded
paths. Goal-paths represent paths whose endpoint is the object of the
prepositional phrase (PP, encoded in English by to plus an NP) and
Source-paths represent paths whose starting point is the object of the PP
(usually encoded by from).
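To fix ideas, the two path types can be written out in a conceptual-structure format loosely modeled on Jackendoff (1983); the bracketings below are our own illustrative simplification, not full analyses of any particular sentences:

   The bird flew to the glass:     [Event GO ([Thing BIRD], [Path TO ([Thing GLASS])])]
   The bird flew from the bucket:  [Event GO ([Thing BIRD], [Path FROM ([Thing BUCKET])])]

The moving entity and the landmark can be held constant; what differs is the path function (TO versus FROM), and, as the studies reviewed below show, it is the TO variant that speakers overwhelmingly prefer.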
Linguistic analyses support asymmetry between these two path types
on a number of grounds: goal PPs tend to be unmarked by inflectional
material in a wide range of languages, whereas source PPs tend to be
marked (Fillmore 1997; Ihara and Fujita 2000; Jackendoff 1983); goal PPs
tend to be arguments of verbs, whereas source PPs tend to be adjuncts
(despite exceptions such as English remove and empty, Nam [2004]); goal
and source PPs also distinguish themselves on other properties such as
movement and behavior in locative alternations (Nam 2004). Typological
groupings based on the collapsing of either goal or source paths with
marking of place led Nikitina (2006) to suggest that goal and source
paths are maximally distinct in universal semantic space. In addition
to the linguistic evidence for asymmetries, there is now evidence for the
prominence of goal paths over source paths in human pre-linguistic
understanding of events (Lakusta et al. 2007). The prominence of goals
prior to language learning, combined with universal prominence in syn-
tactic structures that express paths, leads naturally to the prediction that
the source-goal asymmetry should be reflected in young children's lan-
guage. It is, as we describe next. However, as in the case of the agent/
patient asymmetry, we also show that preferences for expressing the goal
path can be reversed by providing contrary linguistic information, in
this case by the choice of source-path lexical verbs; for example, get
rather than give.
The goal bias in language has been demonstrated in several experi-
ments. Lakusta and Landau (2005) showed 3-year-olds videotaped events
in which an object or person moved from one landmark-specified loca-
tion to another, with both origin and endpoint indicated by landmarks
and visible throughout. For example, in one event, a toy bird emerged
from a bucket, moved in an arc to a glass, and came to rest in it (see
figure 10.2).
Children responded to the question "What happened?" by saying "The
bird flew to the glass," rather than "The bird flew out of the bucket" or
even "The bird flew out of the bucket and into the glass." That is, although
Figure 10.2
The goal bias in the expression of spatial events. When children or adults are shown a
motion event in which an object moves from one location (the source) to another (the
goal), they are strongly biased to express the event in terms of the goal path rather than
the source path. In the example event above, they are more likely to say "The bird flew
into the glass" than either "The bird flew out of the bucket" or "The bird flew out of the
bucket and into the glass." See text for discussion of findings (Lakusta and Landau 2005).

the physical event that was depicted afforded at least these three differ-
ent descriptions (motion from the source, to the goal, or both), children
and adults were strongly biased to describe the events in terms of motion
towards the goal. Whenever possible, they included path expressions that
encoded the goal path (with goal as argument) rather than the source
path. Lakusta and Landau found that this bias is quite general, holding
for manner-of-motion events (e.g., running, hopping) that do not have
an inherent directionality as well as for transfer events (giving, getting),
and even events involving change of state (saddening, brightening). The
generalization of the goal bias from spatial domains to non-spatial
domains accords with Gruber's (1965) observations, further articulated
by Jackendoff (1983) as the Thematic Relations Hypothesis.
The finding of a goal bias in language has now been replicated and
extended, and is robustly present in other languages that have been
investigated (e.g., Lakusta et al. 2006; Ihara and Fujita 2000) and across
both animate/intentional events and physical events (Lakusta and
Landau 2012). The goal bias is not a simple reflection of the non-linguistic
bias present in infancy, however. When children and adults are given a
non-linguistic task in which they must identify changes to either goal or
source across a sequentially-presented pair of events, they detect goal
changes more accurately, but only for animate/intentional events.
In light of the strength of the goal bias, it can be nullified or reversed
only by introducing blatantly contrary information. Lakusta and Landau
(2005) documented this by showing 3-year-olds videotaped events that
could be described using either of a pair of verbs such as giving versus
getting or throwing versus catching. Children were told that their job was
to view each movie and tell the experimenter what happened. But they
were also supplied with a hint verb ("Your hint is throw," "Your hint
is catch") after viewing each event. The children complied, describing the
event from each perspective, as instructed. This in itself shows that lan-
guage can serve as the trigger to force a change in the interpretation of
an event. Equally important is the structure of the sentences that the
children produced. When using a verb whose natural complement is a
goal path (give/throw), they usually (in 70% of the trials) included the
goal path expression (e.g., "The man gave the ball to the girl" or "The
man gave the girl the ball"). But when using a verb whose natural
complement is a source path (get/catch), the path expression was included
less than 10% of the time ("The girl got the ball from the man") and
omitted in the overwhelming proportion of cases ("The girl got the
ball"). Clearly, 3-year-olds were easily able to reverse their natural bias
to encode events in terms of goal-oriented verbs, now encoding the same
event as one of getting rather than giving (and analogous to the reversal
of choice from chase to flee that we discussed earlier). When they did so,
they also tended to omit the relevant source-path complement, though
this was compatible with the scene and despite the fact that English
provides a ready linguistic means to express it ( . . . from the man).
Following Nams (2004) analysis, we conjecture that the children were
more likely to include the goal path when they used a goal-oriented verb
because goals are more likely than sources to appear as arguments; sym-
metrically, the children were more likely to omit source paths when they
used source-oriented verbs because sources are more likely to surface as
adjuncts, hence less prominent syntactically. This is the kind of conspiracy
of conceptual and linguistic tendency that languages embody, and that
serves as an entry point for acquiring the verb lexicon.
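Schematically, and abstracting away from the details of the analyses just cited (the bracketings are our own illustration, not the children's productions), the contrast looks like this:

   give:  [VP gave [NP the ball] [PP to the girl]]       (goal PP realized as an argument)
   get:   [VP got [NP the ball]]  ([PP from the man])    (source PP as an omissible adjunct)

The parenthesized source PP marks its adjunct status, and hence its lower syntactic prominence and its ready omission in the children's descriptions.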

10.2 Asymmetries in the Linguistic Encoding of Symmetry

In this section we consider the semantic effects of syntactic argument
position when nominals are being compared symmetrically. To presage
the relevance of this case: if a comparison is indeed symmetrical, the
compared entities should appear as constituents at the same height in a
syntactic tree structure.
But as we now show, the facts about syntactic-semantic alignment are far
[Figure 10.3 displays two simplified phrase structure trees: (a) [S [NP [NP John] [conj and] [NP Peter]] [VP [V meet]]]; (b) [S [NP [NP Jean] [conj et] [NP Pierre]] [VP [recip se] [V rencontrent]]].]

Figure 10.3
Simplified phrase structure descriptions for a sentence with the English verb meet (3a) and
the French verb se rencontrer (3b). Figure 10.3a represents an intransitive use of the verb
meet with a conjoined subject noun-phrase. This English format for meet is identical to that
for eat or walk or any ordinary intransitive, that is, it gives no indication of the reciprocal
interpretation (each other) associated with meet and other symmetrical verbs in this
construction. Rather, the symmetricality is assumed to be represented as part of the lexical
(rather than syntactic) specification for symmetrical predicates (Gleitman et al. 1996). In
contrast, as figure 10.3b shows, French and many other languages mark the reciprocal for
symmetrical predicates with a pronominal clitic (se), thus morphosyntactically (as well as
lexically) differentiating symmetrical from nonsymmetrical predications. According to
some accounts (e.g., Gleitman 1965), such a reciprocal element occurs as well in the under-
lying morphosyntactic representation of English symmetricals. In essence, under such a
syntactic rather than lexical account of the reciprocal inference structure, the underlying
syntactic tree for English meet is just like that for French se rencontrer, only in English the
reciprocal occurs on the surface as a phonetically empty item.

more complex than this. We begin with the orthodox definition of a sym-
metrical relation:
(i) For all x, y, xRy ↔ yRx
This property is expressed in some hundreds of English language predi-
cates, including such stative relational terms as match, equal, near, and
friend, and in inherently reciprocal activity terms such as meet, argue,
and marry. For instance, if x is equal to y, so must y be equal to x, and if
John and Peter are friends then each of them stands in this relation to
the other. Because symmetrically compared entities necessarily play a
single thematic role, we would expect them to surface as sisters in a single
syntactic argument position, and so indeed they do (see figure 10.3a), as,
for example:
(1) John and Peter meet.
[Figure 10.4 displays a simplified phrase structure tree: [S [NP John] [VP [V meets] [NP Peter]]].]

Figure 10.4
Simplified phrase structure description for John meets Peter. Here, the subject/
complement structure is asymmetric, implying some within-category distinction in promi-
nence of the compared nominals or of their roles in the predication even though the
verb itself is a symmetrical one. Notice that even with the two names (John, Peter)
used here, the positioning may imply that the complement noun-phrase expresses a more
prominent individual (as in, say, My sister met Barack Obama) or the ground in a figure-
ground perspective such that the subject noun-phrase went to meet the complement
noun-phrase.

Moreover, the semantic interpretation of such sentences is roughly
reciprocal; that is, the compared terms stand in the designated relation
to each other, a fact reflected on the surface in some languages (e.g.,
French se rencontrer, see figure 10.3b) but present only by implication in
others, including English. In contrast, although in principle John and
Peter could drown or see each other, the intransitive plural forms
(2) John and Peter/the men drown/see.
do not at all imply that they see or drown each other, and plural subjects
are not preferred to singulars for these verbs. A fortiori, John and Peter
are fathers cannot imply mutual fatherhood, while John and Peter are
cousins invites the reciprocal reading.3
So far this picture seems quite straightforward in the sense that the
structural facts (sisterhood of the compared entities in the tree struc-
ture) of examples (1) and (2) line up with the semantic facts; the mapping
is simple. But the contrapositive should hold as well: in light of the defini-
tion (i) of symmetry, it should follow that entities compared symmetri-
cally should not ever appear at different heights in a phrase-structure
tree; that is, by implication as arguments of different kinds or bearing
different semantic relations to the predicate. And yet they do, as shown
in figure 10.4.
This asymmetry of interpretation due to differences in NP heights
holds across a wide range of symmetrical predicates, for example:
(3) The button matches the shirt.
(4) North Korea is similar to Red China.
(5) The Walmart is near my apartment house.
How are we to deal with this shadow in a beautiful picture?

10.2.1 The Hypothesis that Similarity and Similar Concepts Are Asymmetrical
A popular response to these representational oddities is to deny that the
concepts that such words express are symmetrical in the first place, but
rather encode something like a match to standard (Tversky 1977) or
figure/ground relation (Talmy 1983). Tversky and Gati (1978) put this
hypothesis to the test by asking experimental participants to rate (on a
5-point scale) various similarity comparisons, with results apparently
incompatible with the definition of symmetry. For instance, participants
consistently rate the similarity of North Korea to Red China as higher
than the similarity of Red China to North Korea. These authors' interest
was to explicate the notion of psychological similarity in particular, but
in fact the same analysis and the same experimental effects can be gener-
ated for the entire class of so-called symmetrical predicates, including
equal, meet, and match (see Gleitman et al. [1996]). This effect holds not
only when people make similarity judgments between entities, but also
when they estimate distances between places, with judgments from well-
known to less well-known places smaller than the distances from
unknown to known places, though in fact it is always the case that my
house is just as close to the Empire State Building as the Empire State
Building is to my house (Sadalla, Burroughs, and Staplin [1980]; see also
Rosch [1975]).
For both judgments of similarity and of distance, any difference of
rating with argument position reversed appears to be an explicit denial
of the definition of symmetry in (i). That is, by averring that North Korea
is very much like Red China ("North Korea is similar to Red China" is
enthusiastically rated 4 or 5 on a 5-point scale of how similar) but
that Red China isn't so similar to North Korea (rating this comparison
only a 2 or so), subjects in the laboratory seem to be claiming that
similarity is a one-way entailment, rather than the symmetrical relation
defined in (i). Notice that to accept this conclusion, based on the subjects'
responses, is a potentially general argument for denying the paradox of
ratings differences when the syntactic position of the compared entities
is reversed. If the predicates are asymmetrical, then the subject-
complement syntax in figure 10.4 (asymmetric in terms of the different
relative heights of the two nominal phrases in the tree structure) reflects
the semantics in this best of all possible worlds.

10.2.2 Problems with the Panglossian Solution


So far, as we've described, psychologists and linguists have accounted for
subject-complement contexts for words like similar by denying the sym-
metricality of the concepts they express. But the cost now is that there
is no account for the facts graphed in figure 10.3, namely the preferred
argument-plurality and the reciprocal interpretation of meet (etc.) in
intransitive structures. The other major problem is that on this analysis
all predicates heretofore thought to be symmetrical are now deemed
asymmetrical. This is not really desirable. For instance, Orwell aside, it
should not be expected that some things are more equal than others, and
yet there is an interpretive distinction between
(6) The least of the citizens is equal to the president.
(7) The president is equal to the least of the citizens.
The two sentences bring to mind very different presidents of the USA,
a new way of distinguishing between a Lincoln and a Ford. Experimental
participants are easily coaxed to assert such distinctions when they are
asked, Which would you rather say? for various symmetrical predica-
tions such as (6) and (7). If these preference judgments were driven by
inherent relations between the compared nominals, the experimental
participants should be nonplussed if asked ". . . would you ever want to
say (7) in preference to (6)?", but instead they readily come up with
alternatives that switch the valence of the compared terms, or the very
basis of comparison: "Well maybe if you are talking about the height of
the presidents and he is a midget." This implies that it is not, as usually
thought, inherent relations between the nominals themselves (for these
do not change when their subject/complement ordering switches) but
rather their placement in the syntactic tree that controls the interpretive
asymmetry. To solidify this claim, Gleitman et al. (1996) showed that the
asymmetries could readily be reproduced by asking participants to assign
semantic properties to nonsense words like yig and zav when these
appeared as the nominal arguments of symmetrical predications. For
example, given the sentence "The yig is similar to the zav," people were
asked to judge the semantic values of yig and zav on the basis of relative
fame, size/mobility, power, and birth order; for example, "Which is older,
the yig or the zav?" People assigned different scores as a function of the
positioning of the nonce items in the syntactic structure, with the higher
score always given to the item in complement position.4
As follows from these findings, the intuition that the concepts/terms
are symmetrical is put on a more secure footing via the twin linguistic
diagnostics of plurality preference and inference-structure characteristics
(figure 10.3). Syntactically distinct placement of the compared nominal
items as subject and complement (figure 10.4) establishes their place-
ment in a conceptual hierarchy but does not alter the symmetry of the
predicate itself. Thus there is no paradox in the unequal ratings of simi-
larity as between the president comparisons in (6) and (7): these were
never the same comparison at all, and therefore the definition of sym-
metry (i) was never violated. Two entities compared on property p (say,
prominence or competence as a leader) may be very similar, but when
compared on property q (say, relative physical size or strength) may be
very different.
Armed with these findings and interpretations, we can now return to
our main focus of attention: how does the language of agents and patients
and sources and goals behave in relevant regards?

10.2.3 Agents and Patients under Symmetrical Predication


Switching the syntactic positions of symmetrical predicates has a small
but significant effect on people's judgments of change of meaning in case
the predicate is a formal stative term such as similar (as in the compari-
son of (4) to its reversed-order twin "Red China is similar to North
Korea"). But when the symmetrical predicate is an activity term, for
example,
(8) My sister met/argued/collided with Meryl Streep.
(9) Meryl Streep met/argued/collided with my sister.
the result of reversing nominal positions is typically that people judge
there to be a larger difference in meaning between the two sentences
(Gleitman et al. 1996). The difference has to do with the assignment of
agency. For the formal relations such as similarity, North Korea is not the
agent/cause/instigator of Red China's similarity. In contrast, agents are
prototypical figures that operate as movers and doers in the conceptual
and physical worlds and thus are preferentially assigned subject position.
Moreover, and rather more surprising, if the compared nominals differ
in power, fame, or birth order, the reordering requires the listener to
explicitly reassign these properties; for example, "Well, if my sister was
even more famous than Meryl Streep." At the limit, if the reversal places
an inanimate in subject position ("The lamppost collided with the
drunk"), the effect is comical or fantastical. Thus the switch in agency
causes a mental jolt just because of the a priori identification of animate
beings as causal agents.

10.2.4 Spatial Relations under Symmetrical Predication


If two things are near each other, then the first is as near to the second
as the second is to the first. Yet participants, as we have seen, have a
preference for where to place the nominals. They would rather say
that the bicycle is near the garage than that the garage is near the
bicycle. This finding was one basis for saying that space is not treated in
a strictly metric fashion by humans, but rather like a layout in which a
figure moves on a constant ground (Rosch 1975; Talmy 1983; Sadalla et
al. 1980). However, again it can be shown that the causal factor is not the
inherent or natural distinctions between garages and bicycles per se in
regards to their size and mobility, but rather the assignment of size/
mobility distinctions to any pair of nouns in virtue of their position in
a symmetrical predication. When the positions of the nominals are
switched to "The garage is near the bicycle," participants have no trouble;
in fact they are spontaneously creative in supplying new interpretations,
including these (actual) participant responses (from Gleitman et al.
[1996], 246–247):
Power: "Well, if it was a very famous bicycle."
Size/mobility: "If you had a humongous concrete statue of a bicycle and a little
garage on wheels going round and round it."
Familiarity: "If I parked my bicycle somewhere and while I was gone they built a
garage next to it."

Notice then that a perceptual property has been assigned to common
concepts as a function of their assignments to positions in the syntactic
structure.5 This is exactly what has happened, as well, in the litigious situ-
ation we described in our introductory remarks: if the scooter instigates
contact with a bus, its rider is culpable no matter the inherent sizes and
dynamic powers of the colliding bodies. The preference judgments, when
sentences are presented without context, simply reflect the plausibility
of scenarios that the listener is able to conjure up, and are readily changed
by actual circumstances, or further reflection.
10.2.5 Acquisition of Symmetrical Predicates


Given the subtle and complex relationship between symmetrical versus
asymmetrical predicates and their syntactic encoding, one might wonder
whether young children are sensitive to these distinctions early in life.
Although there is little research on this issue, two sets of empirical findings
show remarkable sensitivity in children as young as 3 or 4 years of age.
First, children respect the order of nominals for symmetrical predi-
cates, acting out sentences like "The dog is fighting the bunny" in systemati-
cally different ways from "The bunny is fighting the dog" (Miller [1998];
see also Gürcanlı and Landau [2011]). This shows that, assuming they
know that fight is symmetrical, they also know that the nominal assign-
ment determines their interpretive ranking. Second, they show evidence
of clearly sorting out symmetrical from asymmetrical predicates by rec-
ognizing that surface intransitives with symmetrical predicates can
express mutual action, whereas asymmetrical predicates cannot. Thus,
they act out "The dog and the bunny meet" by showing the two toys
engaging in mutual action at the same time (i.e., moving towards each
other simultaneously); whereas they act out "The dog and the bunny
kick" by showing the two toys engaging in independent action (each one
kicking in the air, but not kicking the other; Gürcanlı and Landau [2011]).
Such sensitivity to the difference between symmetrical and asymmetri-
cal predicates and their syntactic encoding is remarkable for several
reasons. First, as we have discussed, the range of the symmetrical con-
cepts is very broad, applying across domains of space, animacy, and
quantity in subtle ways. Second, the syntactic reflexes of symmetrical
versus asymmetrical predicates are subtle and rather complex. The child's
job is to learn the mapping of the two types of concept onto the
two types of predicate; that is, to learn that meet is a symmetrical verb,
but kick is not. We hypothesize that children's acquisition is guided in
part by implicit attention to height in the phrase structure tree, which
allows them, for example, to recognize that whichever nominal is higher
must be the actor. More generally, the feat of distinguishing between
symmetrical and asymmetrical predicates is particularly interesting
because there is considerable cross-language variability in the patterns
that must be identified with symmetricality (for example, reflexive-
reciprocal clitics in Romance languages, see figure 10.3). Whatever cues
are provided in the linguistic structure, the child's job must be to recover
the underlying semantics from surface-variable encoding; that is, though
Figure 10.5
Two search situations. In one case (left panel), search for a red L among green Ls is easy
and fast; it requires a search of only one feature, color. The red L appears to pop out of
the display. In the second case (right panel), search for a red L among green Ls and red
Os is difficult; it requires a search of two features, color and shape.

language is innate and organized under highly restrictive semantic-
syntactic principles, each language must be learned, because within the
universal parameters there is considerable surface variability.

10.3 Asymmetries Have Powerful Effects on Encoding and Memory

In previous sections, we have argued that children and adults represent


and are highly sensitive to the relative prominence of elements in a syn-
tactic structure and that they regularly use prominence to modulate their
initial interpretive biases. In this section, we build on these findings,
reporting a further powerful effect of prominence: here, the lexical
semantics of an asymmetric predicate, together with the arguments' rela-
tive height in the syntactic frame, overrides a representational fragility in
the visual system.
Our case draws on a well-known fragility in the visual system: the
maintenance of feature conjunctions (e.g., "red L," combining the color
and shape of a stimulus). This fragility results in what have been
called "illusory conjunctions," errors in which the color of one stimulus
appears to combine with the shape of another (Treisman and Schmidt
1982). For example, people observing a red L next to a green O may
report that they have seen a red O or a green L. One can gain a sense
of what this would mean by comparing two different search situations
(figure 10.5).
If a person searches for a red L among a set of green Ls, s/he need only
use the feature red to identify the target. In such a case, one subjectively
feels that the target stimulus "pops out" of the display; indeed, search
time does not increase as the number of green elements increases. By
contrast, if a person searches for the same red L in a display that contains
both green Ls and red Os, s/he will need to search for both red and L
to find the target, differentiating it from green Ls and red Os. This search
feels much more effortful, and search times increase linearly with set size.
The difference between the feature and conjunction searches illustrates
that the latter is much more difficult. Although the mechanisms underly-
ing such illusory conjunctions are debatable, one theory is that active
allocation of attention must be deployed in order to accurately represent
and maintain feature conjunctions (Treisman and Gelade 1980).
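For readers who find a toy model helpful, the set-size signature just described can be sketched as follows. The sketch is ours, purely for illustration (it is not a model used in Treisman's work or in any study cited here), and the intercept and slope values are arbitrary assumptions.

def predicted_rt_ms(display_size, search_type, base_ms=450.0, slope_ms=25.0):
    # Toy sketch of feature vs. conjunction search, with made-up parameters.
    # Feature search ("pop-out"): predicted response time stays flat as the display grows.
    # Conjunction search: predicted response time grows roughly linearly with display size.
    if search_type == "feature":
        return base_ms                              # color alone identifies the target
    if search_type == "conjunction":
        return base_ms + slope_ms * display_size    # color and shape must both be checked
    raise ValueError("search_type must be 'feature' or 'conjunction'")

for n in (4, 8, 16, 32):
    print(n, predicted_rt_ms(n, "feature"), predicted_rt_ms(n, "conjunction"))

Run on display sizes 4 through 32, the feature column stays at the baseline while the conjunction column climbs, which is the qualitative pattern the visual-search literature reports.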
The fragility of the visual system in binding object properties under
certain conditions raises the more general question of whether language
can help resolve the potential ambiguity in the visual representation. If
the visual properties fail to bind properly, then our representation will
be indeterminate with respect to which properties go together. What is
needed is a format that establishes just a single correct representation of
the several that are possible.
In a series of experiments, Dessalegn and Landau (2008, 2013) showed
that linguistic information can indeed disambiguate the potential misas-
signment of two properties, resulting in improved memory for the right
combination of color and location in a stimulus. Their studies probed the
ability of young children to encode and then remember a simple visual
stimulus that combined color and location, specifically a square that is
split, with one red and one green half. The details of the findings show
that the effects hold only under highly specific conditions. To work, the
linguistic information must establish the choice between two possible
interpretations available to the visual system. This is accomplished by the
use of an asymmetric predicate (e.g., left, right) together with the syntac-
tic frame in which the two NPs (red, green) are situated in different
positions.
In Dessalegn and Landaus experiments, 4-year-olds were shown a
square that was split in half vertically by two colors (e.g., red on the right,
green on the left), and were told they would have to remember it. The
square then disappeared for one second, after which a display appeared
containing the original square, its reflection (e.g., red on left, green on
right), and a third square with a different geometric split (e.g., a diagonal
or horizontal split with red and green in each half; see figure 10.6 for an
Figure 10.6
Sample test set from Dessalegn and Landau (2008, 2013). Children were shown a target
stimulus (top row), which was then removed for a 1-second delay, after which three choices
appeared: the Same stimulus, a Reflection of the original target, or an Other stimulus which
displayed the same two colors in a different geometric split from the target. 3- and 4-year-
olds performed better than chance, but frequently erred by choosing the Reflection, sug-
gesting that they failed to combine the colors with the correct locations. Adding specific
linguistic instructions to the task enhances 4-year-olds' performance, but not 3-year-olds'.
See text for discussion.

example). Children were correct on only about 60% of the trials; almost
all errors were choices of the target's reflection rather than the target
itself. That is, children rarely chose the square with the different geomet-
ric split (e.g., a horizontal or diagonal split), showing that they retained
the type of split they had seen; their errors reflected fragility in remem-
bering the assignment of color to each side of the split square.
This pattern held over a number of experiments that manipulated the
context of presentation. In one, the target square was named with a novel
noun ("See this? This is a dax") in order to evaluate whether simply
naming the square could draw sufficient attention to disambiguate the
two descriptions (red left/red right). Results remained the same as base-
line. In another, children were asked to "Point to the red part" in order
to evaluate whether perceptual-motor activity might ground the child's
representation of what she saw. Results again remained the same as
baseline. These findings tentatively ruled out explanations holding that
the children simply needed to deploy more attention in order to store
and remember the correct color/location assignments.
By contrast, when children were instructed with the sentence "The red
is left/right/top/bottom of the green," their performance increased by
approximately 20%, now around 80% or better. This instruction contains
two elements. First, the predicate is inherently asymmetrical: if X is left
of Y, this entails that Y cannot be left of X. Accordingly, it is ungram-
matical to say "The red and green are to the left of each other." The
second element is a consequence of the first. Using an asymmetrical
predicate forces the NPs to be placed in different positions of promi-
nence, with red now corresponding to the figure object (to be compared)
and green corresponding to the reference object (the comparator). Red
and green do not have inherent prominence differences (the way, say,
Meryl Streep and my sister might, as we observed earlier). However,
placing them in their relative positions in the sentence establishes the
directionality of the asymmetry. Now green is the reference object, red is
the figure, located with respect to that.
Subsequent experiments provided surprising confirmation that the
asymmetry of the predicate was crucial in allowing children to retain the
representation of the visual stimulus and match correctly (Dessalegn and
Landau 2013). In one experiment, Dessalegn and Landau instructed with
the same syntactic frame, but used a predicate that was not inherently
asymmetric. When children were instructed, "The red is touching/near/
adjacent to the green," they remembered the stimulus no better than in
the baseline condition. In this case, the meaning of the predicate provides
a truthful representation of the stimulus structure, but it does not estab-
lish a directional asymmetry. Perhaps more surprisingly, using an inher-
ently asymmetric predicate that does not convey spatial directionality
did improve childrens memory and matching performance. When the
4-year-olds were instructed that, in the experimenter's view, "The red is
prettier/lighter/happier than the green," their matching performance was
reliably better than baseline, and no different from the right/left/top/
bottom instructions. This suggests that the asymmetric value of the predi-
cate prettier, etc., was sufficient to establish directionality; combined with
the relative prominence of the two NPs, children were able to remember
the directionality of the two colors. Since these predicates do not in any
way provide spatial information, it is likely that the structure provided
the abstract asymmetric relationship between colors, but the visual
system provided the spatial directionality. That is, children hearing these
sentences and looking at the squares could combine the information
about color and relative location and hold it in memory for matching
one second later.
These powerful effects turn out to have a distinct developmental sig-
nature. 4-year-olds benefited from the spatial asymmetric predicates in
asymmetric frames (red is left of green), as well as the non-spatial
asymmetric predicates (red is prettier than green). 3-year-olds never ben-
efited from any of the manipulations; they hovered around 50–60%
correct across all linguistic instructions. 6-year-olds did not need the
linguistic instruction: in the baseline, they were already performing close
to ceiling levels. Dessalegn and Landau speculated that by the age of 6,
children were spontaneously encoding the stimuli in a linguistic format
that preserved spatial directionality between the two color patches. This
idea is supported by findings showing that adults who are required to
shadow verbally during the matching task perform reliably more poorly
than they do with either no shadowing or non-verbal shadowing.

10.4 Concluding Remarks

We have discussed in this essay three of the many ways that syntactic
and semantic representations interact, concentrating on a single aspect
of linguistic geometry: height in a phrase structure tree. In the first
section, we showed how anchor points in the perceptual-conceptual
domains of animacy and motion, owing to their special psychological
saliency, are prototypically assigned to higher nodes in these linguistic
representations: animates over inanimates, and motions toward over
motions from a goal. In these cases, the causal flow is from conceptual
prominence to linguistic representation, with central features capturing
the higher nodes in the configural tree. Second, we looked at the curious
case of symmetrical comparison where, one might suppose, the very
definition of symmetricality should lead us to expect that the compared
entities would appear at the same height in phrase structure trees. As we
discussed, however, though sometimes they do (as in the intransitive uses
in figure 10.3), sometimes they do not (as in the tree in figure 10.4). In
this latter case we see effects of the kinds of variable we looked at earlier,
with various influences of perceptual and semantic prominence predict-
ing which entity will surface linguistically in the subject rather than
complement position, thus higher in the tree. Third, we showed that
learners under cognitive stress (in this case, very young children trying
to distinguish spatial and hue aspects of fleetingly glimpsed symmetrical
figures) lean on asymmetrical structural information as an effective
boost to memory. This time, it is the linguistic structure that plays the
causal role, facilitating memory for the relevant aspect of the visually
perceived world.
We want to end as we began: by acknowledging our significant indebt-
edness to Ray Jackendoff, both for coaxing linguists and psychologists to
think about these interface issues and for developing the formal frame-
work that allows them to be investigated and explained. Ray: you are
very high in our personal phrase structure trees.

Notes

1. Notationally we use double quotes for utterances, italics for the mention
(rather than use) of a word or phrase, and single quotes for the concept that
the word or phrase expresses.
2. Notice that the more conceptually difficult it is to conceive of some nominal
element as an agent, the more grotesque the outcome of switching the compo-
nent noun phrases becomes, e.g., "Fame/his dreams fled the man" as alternatives to
"The man chased fame/his dreams."
3. The interpretive contrast between the kin terms father and cousin is perhaps
the clearest indication that symmetry is a lexical-semantic rather than a syntactic
feature. When appearing in the same linguistic environments, father does not
elicit inferences of symmetry, whereas cousin (defeasibly) does.
4. Psychologists have been quick to embrace some version of the view that the
structures just discussed say something useful about the concept of similarity (see
in particular the analyses from Medin, Goldstone, and Gentner [1993] and Smith
and Heise [1992]), for once there has been at least a hint of the respects in
which entities are similar to each other, via structured syntactic representations,
then the relation of similarity itself can be rehabilitated. The relation famously
vilified (as "vacuous" and "fickle") by Nelson Goodman (1972) can by the same
token now be viewed with the more positive descriptors "dynamic" and "flexible."
Still, as Goodman pointed out, the rehabilitation deals with similarity only
"on the streets" (for practical but not theoretical purposes) because to say that
"two things are similar in having a specified property in common is to say nothing
more than that they have that property in common" (445), so the term similar is
doing no independent work despite its retitling.
5. These reassignments apply across the symmetrical class, e.g., if asked when
they would say "Red China is similar to North Korea," participants conjecture
preferences for the climate, the opportunities for surfing, etc., in regards that may
favor North Korea.

References

Baldwin, Dare A. 1991. Infant contribution to the achievement of joint reference.
Child Development 62 (5): 875–890.
Bever, Thomas G. 1970. The cognitive basis for linguistic structures. In Cognition
and the Development of Language, edited by John R. Hayes, 279–362. New York:
Wiley.
Dessalegn, Banchiamlack, and Barbara Landau. 2008. More than meets the eye:
The role of language in binding visual properties. Psychological Science 19 (2):
189–195.
Dessalegn, Banchiamlack, and Barbara Landau. 2013. Interaction between lan-
guage and vision: It's momentary, abstract, and it develops. Cognition 127 (3):
331–344.
Dowty, David. 1991. Thematic proto-roles and argument selection. Language 67
(3): 547–619.
Feldman, Heidi, Susan Goldin-Meadow, and Lila R. Gleitman. 1978. Beyond
Herodotus: The creation of language by linguistically deprived deaf children. In
Action, Symbol, and Gesture: The Emergence of Language, edited by Andrew
Lock, 351–414. New York: Academic Press.
Fillmore, Charles J. 1997. Lectures on Deixis. Stanford, CA: CSLI Publications.
Fisher, Cynthia, D. Geoffrey Hall, Susan Rakowitz, and Lila R. Gleitman. 1994.
When it is better to receive than to give: Syntactic and conceptual constraints on
vocabulary growth. Lingua 92: 333–375.
Fisher, Cynthia, and Hyun-Joo Song. 2006. Who's the subject? Sentence struc-
tures as analogs of verb meanings. In Action Meets Word: How Children Learn
Verbs, edited by Kathy Hirsh-Pasek and Roberta Michnick Golinkoff, 392–425.
New York: Oxford University Press.
Gergely, György, Harold Bekkering, and Ildikó Király. 2002. Rational imitation
in preverbal infants. Nature 415 (6873): 755.
Gleitman, Lila R. 1990. The structural sources of verb meanings. Language Acqui-
sition 1 (1): 3–55.
Gleitman, Lila R., Henry Gleitman, Carol Miller, and Ruth Ostrin. 1996. Similar,
and similar concepts. Cognition 58 (3): 321–376.
Gleitman, Lila R., David January, Rebecca Nappa, and John C. Trueswell. 2007.
On the give and take between event apprehension and utterance formulation.
Journal of Memory and Language 57 (4): 544–569.
Goodman, Nelson. 1955. Fact, Fiction, and Forecast. Cambridge, MA: Harvard
University Press.
Goodman, Nelson. 1972. Seven strictures on similarity. In Problems and Proj-
ects, edited by Nelson Goodman, 437–446. Indianapolis, IN: Bobbs-Merrill
Company, Inc.
Gordon, Peter. 2003. The origin of argument structure in infant event representa-
tion. In Proceedings of the 28th Annual Boston University Conference on Lan-
guage Development, edited by Alejna Brugos, Linnea Micciulla, and Christine E.
Smith, 189–198. Somerville, MA: Cascadilla.
Grimshaw, Jane. 1981. Form, function, and the Language Acquisition Device. In
The Logical Problem of Language Acquisition, edited by Carl Lee Baker and
John Joseph McCarthy, 165–182. Cambridge, MA: MIT Press.
Gruber, Jeffrey Steven. 1965. Studies in Lexical Relations. Ph.D. diss., MIT. Pub-
lished, Bloomington, IN: Indiana University Linguistics Club, 1970.
Gürcanlı, Özge, and Barbara Landau. 2011. Representation and acquisition of
symmetrical verbs. Poster presented at the Cognitive Science Society, Boston,
MA, July 2011.
Ihara, Hiroko, and Ikuyo Fujita. 2000. A cognitive approach to errors in case
marking in Japanese agrammatism: The priority of goal-ni over the source-kara.
In Constructions in Cognitive Linguistics: Selected Papers from the Fifth Interna-
tional Cognitive Linguistics Conference, Amsterdam, 1997, edited by Ad Foolen
and Frederike Van der Leek, 123–140. Amsterdam: John Benjamins.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jespersen, Otto. 1909–1949. A Modern English Grammar on Historical Principles.
Vol. 7. Copenhagen: Munksgaard; London: Allen & Unwin.
Lakusta, Laura, and Barbara Landau. 2005. Starting at the end: The importance
of goals in spatial language. Cognition 96 (1): 1–33.
Lakusta, Laura, and Barbara Landau. 2012. Language and memory for motion
events: Origins of the asymmetry between source and goal paths. Cognitive
Science 36 (3): 517–544.
Lakusta, Laura, Laura Wagner, Kirsten O'Hearn, and Barbara Landau. 2007.
Conceptual foundations of spatial language: Evidence for a goal bias in infants.
Language Learning and Development 3 (3): 179–197.
Lakusta, Laura, Hanako Yoshida, Barbara Landau, and Linda Smith. 2006.
Cross-linguistic evidence for a goal/source asymmetry: The case of Japanese.
Poster presented at the International Conference on Infant Studies, Kyoto,
Japan, June 2006.
Landau, Barbara, and Ray Jackendoff. 1993. "What" and "where" in spatial lan-
guage and spatial cognition. Behavioral and Brain Sciences 16 (2): 217–265.
Medin, Douglas L., Robert L. Goldstone, and Dedre Gentner. 1993. Respects for
similarity. Psychological Review 100 (2): 254–278.
Miller, Carol A. 1998. It takes two to tango: Understanding and acquiring sym-
metrical verbs. Journal of Psycholinguistic Research 27 (3): 385–411.
Nam, Seungho. 2004. Goal and source: Asymmetry in their syntax and seman-
tics. Paper presented at the Workshop on Event Structure, Leipzig, Germany,
March 2004.
Nappa, Rebecca, Allison Wessel, Katherine L. McEldoon, Lila R. Gleitman, and
John C. Trueswell. 2009. Use of speaker's gaze and syntax in verb learning. Lan-
guage Learning and Development 5 (4): 203–234.
Nikitina, Tatiana. 2006. Subcategorization pattern and lexical meaning of motion
verbs: A study of the source/goal ambiguity. Linguistics 47 (5): 1113–1141.
Pinker, Steven. 1989. Learnability and Cognition: The Acquisition of Argument
Structure. Cambridge, MA: MIT Press.
Quine, Willard. 1960. Word and Object. New York: Wiley.
Rosch, Eleanor. 1975. Cognitive reference points. Cognitive Psychology 7 (4):
532–547.
Sadalla, Edward K., W. Jeffrey Burroughs, and Lorin J. Staplin. 1980. Reference
points in spatial cognition. Journal of Experimental Psychology: Human Learn-
ing and Memory 6 (5): 516–528.
Senghas, Ann, Sotaro Kita, and Aslı Özyürek. 2004. Children creating core prop-
erties of language: Evidence from an emerging sign language in Nicaragua.
Science 305 (5691): 1779–1782.
Smith, Linda B., and Diana Heise. 1992. Perceptual similarity and conceptual
structure. In Percepts, Concepts and Categories: The Representation and Process-
ing of Information. Advances in Psychology 93, edited by Barbara Burns, 233–
272. Oxford: North Holland.
Talmy, Leonard. 1983. How language structures space. In Spatial Orientation:
Theory, Research, and Application, edited by Herbert L. Pick and Linda P.
Acredolo, 225–282. New York: Plenum Press.
Treisman, Anne M., and Garry Gelade. 1980. A feature-integration theory of
attention. Cognitive Psychology 12 (1): 97–136.
Treisman, Anne, and Hilary Schmidt. 1982. Illusory conjunction in the perception
of objects. Cognitive Psychology 14 (1): 107–141.
Tversky, Amos. 1977. Features of similarity. Psychological Review 84 (4):
327–350.
Tversky, Amos, and Itamar Gati. 1978. Studies of similarity. In Cognition and
Categorization, edited by Eleanor Rosch and Barbara B. Lloyd. Hillsdale, NJ:
Erlbaum.
Woodward, Amanda L. 1998. Infants selectively encode the goal object of an
actor's reach. Cognition 69 (1): 1–34.
11 Accessibility and Linear Order in Phrasal Conjuncts

Bhuvana Narasimhan, Cecily Jill Duffield, and Albert Kim

11.1 Introduction

Consider the following utterance from a conversation in which two
acquaintances have been discussing the difficulties of saving money for
their children's college tuition:

So, it's only recently that we've had the money where we could start putting away
large sums of it for, uh, long-range goals like college and sickness and travel and
that kind of thing.

While the participants of the conversation have been discussing college,
the topics of sickness and travel are new, as they have not been previ-
ously mentioned. How does the relative newness of these nominals
(college, sickness, and travel) affect the speaker's choice to order them in
a particular way?
When communicating with their conversational partners, speakers
refer to different entities in the discourse, some of which are new,
whereas others are old. In talking about these old and new referents,
speakers must choose which one to mention first in their utterances;
that is, speakers must linearize their thinking for the purpose of speak-
ing (Wundt 1900; Levelt 1989). In distinguishing between old and new
referents, we are talking about an asymmetry in the information status
of referents: whether or not a referent has been previously encoun-
tered in the discourse (or nonlinguistic) context. It has been found that
typically, speakers prefer to mention old referents before they mention
referents that are new. The old-before-new ordering preference
has been documented in a variety of construction types in languages
such as English (Bock and Irwin 1980) as well as in scrambled vs.
unscrambled utterances in languages such as Japanese (Ferreira and
Yoshita 2003).
In many of these studies, the motivation for the old-before-new order-
ing preference has been attributed to speaker-oriented processes. The
speaker-oriented approach accounts for the old-before-new ordering
preference in terms of the incrementality of speech production and
the relative accessibility of (labels for) old referents vis-à-vis new refer-
ents (Bock and Irwin 1980; Levelt 1989; Ferreira and Yoshita 2003).
Since language production is incremental, we do not complete an entire
thought before speaking; rather, we begin formulating our utterance as
soon as a piece of information becomes available to us. As a conse-
quence, information that is more activated at the time of grammatical
encoding, and hence easier to retrieve, tends to be ordered early in the
utterance, before information that is less activated. Old referents are
more activated than new referents because they have been encountered
in a prior context, and are therefore more available for early mention.
For example, Branigan, McLean, and Reeve (2003) demonstrate that
speakers tend to order old referents earlier in their utterances than new
referents irrespective of the newness of the referent for the hearer,
appearing to disregard the nature of the information in the common
ground shared by speaker and hearer in favor of their own ease of
processing.
Other research has stressed the role of addressee-oriented processes,
also called "audience design," in explaining why speakers choose to for-
mulate their utterances as they do. There is evidence that addressee-
oriented considerations influence the formulation of utterances in a
variety of ways (Haywood, Pickering, and Branigan 2005), for example
in the production of referential expressions (Clark and Wilkes-Gibbs
1986), in the specificity of information required to identify novel objects
(Bölte et al. 2009), and in the modulation of the rate and redundancy in
speech when addressing children versus adults (Hoff-Ginsberg 1997).
The addressee-oriented approach accounts for the old-before-new pref-
erence in terms of speakers' motivation to facilitate hearers' comprehen-
sion. In this view, speakers produce the old-before-new order not because
it is easier to produce that order but because speakers tailor their utter-
ances to meet the informational needs of hearers. Speakers assume that
hearers find comprehension easier when they have a structure already
available to which incoming information can be linked (Clark and Havi-
land 1977), and it is plausible that the earlier this link with prior discourse
can be established (e.g., by mentioning the old referent first), the more
it aids hearers' comprehension (Bock 1977). In other words, "speakers
mention Given [old] entities first so that addressees know which part of
their knowledge store to address, and then update that entry with the
New information contained in the later part of the sentence" (Branigan,
McLean, and Reeve 2003, 181).

11.2 Accessibility and Word Order in Phrasal Conjuncts

The speaker-oriented and addressee-oriented accounts are not necessar-
ily incompatible. They are both based on the assumption that the old-
before-new order is the preferred order. However, the interpretation of
prior findings relies on how we define the terms "old" and "new." The
definitions commonly used often confound two distinct dimensions
regarding entities in discourse: accessibility and aboutness. The terms
"old" and "new" are often used to label the accessibility of entities in the
discourse. Entities that have been activated recently or are in the focus
of the speaker's and hearer's attention are "old," whereas entities that
have not been mentioned at all are "new." The accessibility dimension is
distinct from the second dimension of aboutness relating to the topic-
comment distinction. Simply put, the "topic" is what the utterance is
about, whereas the "comment" is what the speaker wants to say about
the topic (related terminology includes theme vs. rheme, presupposi-
tion vs. focus, and topic vs. focus; see Jackendoff [1972]; Halliday
[1994]; Von Stutterheim and Klein [2002], among others). The dimen-
sions of accessibility and aboutness are often confounded since topical
entities are typically old (accessible, activated), whereas comments intro-
duce information that is new.
In an attempt to separate the two dimensions, and to investigate the
effects of accessibility alone on word order, Narasimhan and Dimroth
(2008) investigated the ordering of labels in conjoined noun phrases.
Conjoined noun phrases (e.g., an apple and a spoon in the sentence "An
apple and a spoon are on the table") are particularly interesting as a test
domain to investigate accessibility effects in the ordering of old and new
referents since the two noun phrases in such a construction do not differ
with respect to aboutness. They can jointly constitute the topic or the
comment portion of the utterance, depending on context. Similarly they
share the same grammatical and thematic role (e.g., the phrase an apple
and a spoon constitutes the grammatical subject as well as expresses the
thematic role Theme in the sentence "An apple and a spoon are on the
table"). Hence the relative accessibility of the noun phrases within the
conjunct phrase can be manipulated independently of grammatical
status, thematic role, and aboutness/topicality.
Narasimhan and Dimroth (2008) asked adult and 3- to 5-year-old
speakers of German to describe pictures of pairs of objects (e.g., an apple
and a spoon) that were not visible to the addressee, who then matched
them with the corresponding picture from a stack of other pictures. One
of the objects in the object-pair was "old" for both the speaker and the
addressee, having been encountered and labeled by the speaker in the
presence of the addressee in the prior discourse, whereas the other was
"new." The study showed a dissociation between adults and children in
order-of-mention preferences. Adults showed a robust preference for the
old-before-new order, replicating prior findings (Bock and Irwin 1980;
Ferreira and Yoshita 2003). Thus, when topicality, grammatical role, and
thematic role are controlled for, adults typically labeled the more acces-
sible (old) entity first, before labeling the new entity.
Interestingly, 3- to 5-year-old children in the same study exhibited the
opposite ordering preference: new-before-old. In order to verify that this
preference was not simply a reproduction of ordering patterns found in
the child-directed speech register, the experiment was replicated with
adult caregivers directing speech to children. As in the previous experi-
ment, adults were asked to describe pairs of objects to their children,
who then matched the descriptions to a set of pictures. Findings showed
that when adult caregivers produce conjunct phrases in input directed to
children, the noun phrases in the conjuncts are ordered in the old-before-
new sequence, just as they are when adults direct speech to other adults.
Hence children's non-adult-like ordering preference is unlikely to stem
from the patterns in the ambient language. One possible explanation is
that it is motivated by a novelty preference that has its roots in early
infancy (and forms the basis for methodologies such as high-amplitude
sucking and preferential looking). Children's attention is focused on the
salient novel object, making it more activated and hence more acces-
sible for production. They utter first the label for the object in the here-
and-now before producing the label for the previously encountered
entity.
Further evidence for young children's robust preference for the new-
before-old order in conjunct noun phrases comes from a follow-up study
conducted with German-speaking children. Dimroth and Narasimhan
(2012) investigated whether 4- to 5-year-old children could be encour-
aged to employ the adult-like old-before-new order by making a referent
not only old (accessible) as in the prior study, but also topical (what the
discourse is about). It was hypothesized that such a discourse manipula-
tion would induce a temporary shift to the old-before-new ordering
preference even in younger children, whose habitual ordering pattern at
the phrasal level is new-before-old. The experimental manipulation
involved having the experimenter make comments about the old referent
after it had been introduced to the participant, thus increasing its topical-
ity in the discourse relative to the new referent. Young children showed
no change in their new-before-old ordering preference even though a
referent was made both accessible (by prior mention) and topical (by
producing comments about it in prior discourse), factors that are pre-
dicted to encourage its early mention in the conjunct noun phrase. This
finding suggests that young childrens preference for the new-before-old
order is robust and resistant to the manipulation designed to encourage
use of the opposite, old-before-new, ordering pattern. The strong prefer-
ence for new-before-old disappears between the ages of 5 and 9 years.
In an additional study, Dimroth and Narasimhan (2012) demonstrated
that 9-year-olds exhibit an old-before-new preference that is similar to
adults' basic ordering preferences even without the manipulation of
topicality.
Why do we see such differences between adult and young child prefer-
ences for word order in phrasal conjuncts? If the old-before-new order,
which dominates adult production, is due primarily to ease of processing
(as suggested by speaker-oriented accounts of word order preferences),
then children seem to overcome this processing hurdle quite easily.
However, it seems unlikely that children will suppress an easier response
and prefer to produce a word order that they do not typically encounter
in the input. An alternative possibility is that the new-before-old order
may well be easier to produce for young children. And if the new-before-
old order is indeed easier for the children to produce, the question arises
as to whether this order is also, at some level, easier for adults (see
further discussion in section 11.6).
There is some prior evidence consistent with the suggestion that new-
before-old may, in fact, be easier to produce, or, at least, that old-before-
new is not easier. Some prior research shows that old-before-new is not
always preferred in adults, depending on construction type and process-
ing load. In sentence comprehension studies, Clifton and Frazier (2004)
found that processing was facilitated when the postverbal arguments in
double-object constructions (e.g., The senator mailed the woman a report)
followed the definite-indefinite (old-before-new, e.g., . . . mailed the
woman a report) order versus the indefinite-definite (new-before-old,
e.g., . . . mailed a woman the report) order. Yet a similar old-before-new
facilitation was not found for NP-PP constructions (in Clifton and
Frazier's [2004] terminology) where the definite postverbal noun phrase
was followed by a prepositional phrase containing an indefinite noun
phrase (e.g., The senator mailed the report to a woman). More relevant
to the present study, Slevc (2011) reports a reduction in the old-before-
new (or given-new) ordering preference in the production of (prepo-
sitional or double-object) datives under certain conditions. Speakers are
less likely to describe dative-eliciting pictures using the old-before-new
order of postverbal constituents when under a verbal processing load
compared to a condition in which no processing load is imposed.
To briefly summarize, the empirical research discussed so far suggests
that the old-before-new order is not always the dominant preference in
speakers. Rather, the linear ordering of old and new referents during
utterance production is influenced by a multiplicity of factors, including
speaker-oriented factors such as ease of production, addressee-oriented
factors such as addressee comprehension, the salience of novel entities
(which may be age-related), construction type, and processing load.
Other factors include learned conventions based on experience with
frequent ordering patterns in the ambient language. In many languages
the subject typically precedes the predicate, and since subjects frequently
encode topics, which tend to be old in the discourse, there is an overall
high prevalence of the old-before-new order in discourse.
However, we know little about the interplay of these competing factors
during language production in different contexts. Here we ask: is it pos-
sible to modulate speakers ordering preferences by manipulating the
role of one or the other factors that are posited to influence ordering
preferences? Specifically, if ease of processing plays a role in influencing
speakers' ordering preferences, then will increasing speakers' cognitive
load change their ordering patterns, even when other factors, such as the
construction type and the communicative goals of the speaker, are kept
constant? We investigate this issue by introducing a dual-task manipula-
tion that makes it harder for speakers to retrieve referent labels, but
which keeps constant the information that is shared between the speaker
and the hearer.

11.3 The Influence of Cognitive Load on the Accessibility of Noun Phrases
in Conjuncts

In this section, we describe two studies involving a picture-matching
game between the participant and a confederate to elicit descriptions of
old and new referents from adult speakers of English. Participants
perform the same task under two different conditions. In the naming
study, the participants label pictures of pairs of objects shown on a com-
puter screen that is visible only to themselves (e.g., "apple," "pencil"). One
of the objects in the picture is "old," having been labeled in the immedi-
ately prior trial (e.g., "apple"), whereas the other object is "new," not having
been encountered in prior trials (e.g., "pencil"). The confederate then finds
the picture that matches the participant's description from among a stack
of similar pictures. Based on prior research employing a similar paradigm
with adult native speakers of German (Narasimhan and Dimroth 2008),
we predict that adult English speakers are more likely to use the old-
before-new ordering within conjunct noun phrases (e.g., an apple and a
pencil) versus the new-before-old ordering (e.g., a pencil and an apple).
In the naming-under-load study, a second group of participants per-
forms the identical labeling task described above, but concurrently mem-
orizes and rehearses a list of distractor words that are semantically
related to both the old and the new referents. We hypothesize that inter-
ference from semantically related distractor words (Gordon, Hendrick,
and Levine 2002; Ferreira and Firato 2003) will make it harder for speak-
ers to retrieve labels for the old and the new referents in the naming-
under-load condition. A straightforward prediction is that speakers will
tend to show an increased tendency to use the order that is easier for
them to produce when under a processing load relative to the simple
naming condition.
But what is the easier order likely to be? If accessibility leads to ease
of processing, then old referents are mentioned early in the utterance
because their mention in discourse makes them more activated and avail-
able for retrieval earlier than new referents (Bock and Irwin 1980; Fer-
reira and Yoshita 2003), freeing up working memory capacity for other
processes (Baddeley 1986; Jackendoff 2002; Just and Carpenter 1992).
But as 3- to 5-year-olds prefer the new-before-old order in phrasal con-
juncts (Narasimhan and Dimroth 2008), it may be the new-before-old
order that is easier for speakers to produce. Owing to the multiplicity of
factors that favor production of the old-before-new order (discussed
earlier), adult speakers may not exhibit the new-before-old order in
typical discourse contexts that do not tax the language processing system
in any way. But we conjecture that when speakers are placed under a
cognitive load, their processing resources are taxed in such a way that
ease-of-processing considerations become paramount during utterance
production. For instance, speakers may want to produce the new item
first because it is novel, salient, and therefore in the current focus of
attention. Additionally, it is more fragile in representation than the old
information, and producing it first will get this item off the stack quickly,
before its representation becomes unavailable. The old information may
be more stable, having been processed more deeply, and therefore can
be counted on to remain available for longer in working memory. Alter-
natively, the working memory load may cause the old information to
become displaced from focus of attention, rendering it susceptible to
interference from the items currently in focus when it is retrieved for
encoding purposes (Slevc 2011). Under such circumstances, adult speak-
ers may well exhibit a new-before-old preference, or at least a reduced
old-before-new preference.
In summary, our predictions are that (a) adult speakers under no load
will show a preference to order old entities before new entities in phrasal
conjuncts and that (b) a processing load will modulate this preference
such that speakers will exhibit a greater propensity to use the order that
is easier for them to produce. If the easier order is old-before-new we
expect to see an increased tendency to use this order. But if the opposite
order is easier to process, a decrement in the old-before-new preference
is expected.

11.4 Experiment 1: Naming

The study employs a picture-matching task that elicits descriptions of
objects in contexts that are interactive yet controlled (Yule 1997).

11.4.1 Participants
Participants were 18 native English-speaking adults (11 females), with
no history of language disorders, ranging in age from 18 to 38 years,
recruited on the University of Colorado Boulder campus. Two partici-
pants were excluded from the study, one due to showing strong influence
of a second language, and one due to equipment failure.

11.4.2 Materials
The stimulus items consisted of photographs of 24 inanimate objects. The
object names were grouped into 12 pairs. Object pairs were matched
based on the frequency of their labels in the CHILDES (Child Language
Data Exchange System) database (MacWhinney 2000) given that this
database was used to generate object labels in preparation for future
comparisons with young children. Additional matching criteria for object
pairs were based on the phonological features of object labels, ease of
labeling the objects, and the size of the real world objects that the labels
named. Three warm-up pairs and 14 filler pairs were also included (see
appendix A).
Short film clips of the two items moving in random paths across the
screen were created. Two versions of each clip were created, with the
items initially appearing in different locations on the screen in order to
avoid any spatial bias that might influence order of mention of the
objects. The stimulus items were randomized and organized in eight
conditions based on order of list presentation, version of film clip shown
(as described above), and order of stimulus presentation (item A or item
B in a pair presented first). Film clips of items were presented on a
15-inch MacBook Pro.
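To make the counterbalancing concrete, the following is a minimal sketch (not the authors' stimulus-preparation script; all labels are invented placeholders) of how the eight conditions just described can be enumerated as the crossing of three binary factors.

```python
from itertools import product

# A sketch of the eight counterbalancing conditions described above:
# 2 list-presentation orders x 2 film-clip versions x 2 within-pair orders.
# All labels below are placeholders.
list_orders = ["list-order-1", "list-order-2"]
clip_versions = ["clip-version-A", "clip-version-B"]  # items start in different screen locations
pair_orders = ["item-A-first", "item-B-first"]        # which member of a pair is presented first

conditions = list(product(list_orders, clip_versions, pair_orders))
for number, condition in enumerate(conditions, start=1):
    print(number, condition)  # prints eight conditions in total
```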

11.4.3 Procedure
Participants labeled a single item (e.g., a flower) shown on the computer
screen. An experimenter who could not see the screen found a matching
picture out of a set of pictures and repeated the participant's object label.
Participants then saw a clip of two items, one of which had been shown
in the immediately prior clip (e.g., a flower and a crayon), and again
labeled the objects such that the experimenter could find the matching
picture.

11.4.4 Analysis and Results


All test trials were transcribed and coded by two coders. Participants'
first spontaneous responses were coded, with 4 responses being excluded,
due to experimenter error, from a total of 192 responses. The remaining
188 responses were coded for order of mention (new-before-old or old-
before-new) and for the variables described below.
We ran a mixed-effect logistic regression model (Baayen 2008), using
contrast coding for the fixed effects, with order of mention as the outcome
variable. There were two random effect factors: participant and item.
Participants' descriptions often varied from the canonical target response
("X and Y"). For instance, participants varied in the words they selected
to label the same object. They also used labels of differing lengths (or
"weights"), different conjunctions and determiners, and utterances
varying in fluency (e.g., utterances not produced within a single, smooth
intonation contour, or containing false starts, hesitations, or longer
pauses). The experimenter also sometimes repeated an object label more
than once, or occasionally, not at all. In order to examine the influence
of information status (old vs. new) independently of these factors that
might influence linear order, we entered, as control variables, the follow-
ing: weight, conjunction, determiner, repetitions, label type, and fluency.
Additional control variables included the order in which the trials were
presented, the order in which the two objects were visually displayed
when they first appeared on the screen, and whether item one or item
two from the object pair shown in the test trials was the old object.
There were no main effects of any of the control variables. Keeping
participant and item as random effects with only an intercept, we get an
intercept significantly different from zero (β = 3.27, SE = 0.59, Z value =
5.55, p < 0.001).
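For readers less familiar with logistic regression output, the following sketch (illustrative only, and ignoring the random-effects structure of the actual model) shows how the reported intercept can be read: the Wald z statistic is the estimate divided by its standard error, and an intercept of zero would correspond to a 50/50 split between the two orders. The outcome coding assumed here (1 = old-before-new) is our assumption.

```python
import math

# Illustrative only: reading the intercept of an intercept-only logistic
# regression over order of mention (random effects omitted here).
# Assumed outcome coding: 1 = old-before-new, 0 = new-before-old.
beta, se = 3.27, 0.59          # reported fixed-effect intercept and its SE
z = beta / se                  # Wald z statistic (about 5.5, as reported)
p_old_first = 1.0 / (1.0 + math.exp(-beta))  # predicted proportion of old-before-new

print(f"z = {z:.2f}, predicted P(old-before-new) = {p_old_first:.2f}")
# An intercept of 0 would mean a 50/50 split, i.e., no ordering bias; an
# intercept reliably above zero therefore indicates an old-before-new bias.
```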

11.4.5 Discussion
The results of the current study replicated the results seen with the adults
in the Narasimhan and Dimroth (2008) study with German speakers.
Adults prefer to label first a referent made relatively more accessible by
prior mention versus a newly introduced referent. This preference may
be due to ease of processing; speakers find it easier to produce more
accessible items first. Alternatively, this word order preference may
reflect a learned convention. Perhaps participants prefer old-before-new
because it is the most frequent order to which they have been exposed,
or because they adopt an audience-design strategy; speakers may have
assumed that the old-before-new order would facilitate the confederate's
comprehension and picture-matching activity.
In order to examine whether manipulating ease of processing inde-
pendently of other factors influences linear ordering preferences, we next
employed a concurrent recall task that increases retrieval difficulty of
the labels for the old and new referents.

11.5 Experiment 2: Naming-Under-Load

11.5.1 Participants
Participants were 18 native English-speaking adults (9 females) recruited
on the University of Colorado Boulder campus ranging in age from 19
to 34 years, with no history of language disorders. Two participants' data
were excluded, one due to failure to name all items presented during the
test trials, and a second due to experimenter error. None of the partici-
pants had participated in Experiment 1.

11.5.2 Materials
Stimulus items. Stimulus items were identical to those in the naming
study.
Distractor words. The materials for the concurrent verbal recall task
consisted of a list of 6 distractor words for each trial. Three words were
related to each of the two test items in the trial. Distractor items were
selected from the WordNet online database (Fellbaum 1998; Princeton
University 2010) or semantic associates chosen by the experimenters.
Distractors were also matched for concreteness, familiarity, and image-
ability ratings from the MRC Psycholinguistic Database (Coltheart
1981). In filler trials, distractor words were randomly selected. Distractor
words were presented in an ABABAB format, such that all A-distractors
were related to one item in a pair, and all B-distractors were related to
the second item (see appendix B).
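The ABABAB interleaving can be illustrated with a small sketch; the distractor words below are invented placeholders (the actual items are listed in appendix B).

```python
# A small sketch of the ABABAB interleaving; the words below are invented
# placeholders for distractors related to an (apple, pencil) pair, not items
# taken from appendix B.
a_words = ["fruit", "orchard", "cider"]    # related to one member of the pair
b_words = ["eraser", "sharpen", "crayon"]  # related to the other member

distractor_list = [word for pair in zip(a_words, b_words) for word in pair]
print(distractor_list)
# ['fruit', 'eraser', 'orchard', 'sharpen', 'cider', 'crayon']
```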

11.5.3 Procedure
The procedure for the distractor experiment consisted of two tasks: a
naming task and a recall task. In the recall task, participants saw six
words that they had to memorize on a computer screen. Participants
engaged in rehearsal of the distractors: they were instructed to continu-
ously repeat the distractor words aloud to aid their memorization until
they saw a screen with a question mark. Upon seeing the question mark,
participants then completed the recall task with a test: they were
instructed to recall as many of the distractor words as they could. Par-
ticipants were told that they would be rated on the number of correct
words recalled.
The naming task was concurrent with the repetition (rehearsal) stage
of the recall task: in between the presentation of the distractor words
and the question mark signaling the recall task (i.e., while participants
were repeating the distractor words aloud), participants were shown the
stimulus items for the object naming task on the computer screen: first
the old object, and then the old and new objects together. At this point,
they would name the objects (in the first instance, by labeling the old
object, and in the second, by using a phrasal conjunct labeling both the
old and new objects).
An example of the procedure is presented in appendix C.

11.5.4 Data Treatment


All test trials were transcribed and coded as in the naming study. A total
of 192 responses were collected, from which 21 were excluded due to
failure to name both items in the object pair, inconsistent labeling of one
of the items presented, or experimenter error. The remaining 171 score-
able responses were coded for order of mention (old-before-new or
new-before-old). All responses were coded for the same categories used
to code responses in the naming study, in addition to seven coding cat-
egories relevant only to the naming-under-load study:
(1) Last-produced distractor type during repetition (rehearsal) is a
coding category that included the type of distractor word uttered imme-
diately prior to the production of the phrasal conjunct (the naming task),
which varied because participants were rehearsing the distractor list at
their own pace. Since producing a distractor word that is semantically
related to either the old or the new referent label might influence the
order in which they are subsequently mentioned, we noted whether the
distractor was related to the old item (R-old), related to the new item
(R-new), or whether the last word mentioned during repetition/
rehearsal prior to the naming task was not on the original distractor list
(Else).
Other factors that might influence the order in which the labels in the
phrasal conjunct were produced were also coded for, including:
(2) The number of repetitions by the participants during rehearsal of
distractor words related to the new item (i.e., prior to the naming task,
and therefore prior to the test at the completion of the recall task).
(3) The number of repetitions by the participants during rehearsal of
distractor words related to the old item (i.e., prior to the naming task,
and therefore prior to the test at the completion of the recall task).
(4) The number of repetitions by the participants during rehearsal of a
novel distractor (a word inserted by the participant that was not on the
distractor list), that is, prior to the naming task, and therefore prior to
the test at the completion of the recall task.
(5) The number of distractor words related to the old item that were
correctly recalled at the end of each trial (i.e., at the test at the comple-
tion of the recall task).
(6) The number of distractor words related to the new item that were
correctly recalled at the end of each trial (i.e., at the test at the comple-
tion of the recall task).
(7) The number of words produced during recall that were not on the
distractor list (i.e., at the test at the completion of the recall task).
Each of the categories (2)–(7) had the values "few" or "many," corre-
sponding to a pre-determined range of values in the data. For categories
(2)–(4), the average number of repetitions in each of the categories was
computed, and all values below the average value were coded as "few,"
otherwise as "many" (decimals were rounded to the nearest whole
number). For categories (5)–(7), if the number of words produced at the
test at the completion of the recall task was 0 or 1, it was coded as "few,"
otherwise as "many."
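On one reading of this description, the binning rule can be sketched as follows (a hypothetical helper, not the authors' coding script; the function and variable names are ours).

```python
# A hypothetical helper expressing one reading of the "few"/"many" binning
# rule described above (not the authors' coding script).

def bin_repetitions(count, average):
    """Categories (2)-(4): values below the rounded average are 'few'."""
    return "few" if count < round(average) else "many"

def bin_recall(count):
    """Categories (5)-(7): producing 0 or 1 words at test counts as 'few'."""
    return "few" if count <= 1 else "many"

# Purely illustrative values:
print(bin_repetitions(2, average=3.4))  # -> few
print(bin_recall(1))                    # -> few
print(bin_recall(2))                    # -> many
```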

11.5.5 Analysis and Results


Responses were included based on the criteria outlined in the naming
study. The scoreable responses (171 out of 192 responses) were annotated
for order of mention.
As before, we ran a mixed-effect logistic regression model with order
of mention as the outcome variable, and two random effect factors, par-
ticipant and item. The naming study's control variables were included, as
well as the following additional variables: (a) last-produced distractor
type; (b) number of participants' mentions of distractor items related to
the new item; (c) number of participants' mentions of distractor items
related to the old item; (d) number of novel distractors produced;
(e) number of correctly recalled distractors related to the new item; (f)
number of correctly recalled distractors related to the old item; and (g)
number of novel distractors produced during the recall task.
Results showed that participants' word order preferences in the
naming-under-load study differed from the preferences of participants
in the naming study. When placed under a processing load, participants
do not display the old-before-new bias found in the naming study (figure
11.1). This was demonstrated by an intercept that does not significantly
differ from zero in a model of naming-under-load responses that kept
only participant and item as random effects (β = 0.43, SE = 0.37, Z value
= 1.16, p = 0.245).
There were no main effects of any of the control variables except for
last-produced distractor type: participants were significantly more likely
to employ the new-before-old order if they produced a distractor word
that was semantically related to the new referent just before producing
a phrasal conjunct (β = 1.36, SE = 0.62, Z value = 2.19, p < 0.05).
To address the question of whether the change in word order prefer-
ence between the naming study and the naming-under-load study was
due to factors other than cognitive load, we considered whether the
overall reduction in old-before-new orders in the naming-under-load
study could arise entirely from the aforementioned effect of distractor
type. In other words, perhaps the lack of old-before-new bias in this
second study was not due to the addition of the recall task, but rather
due to the high number of new-before-old responses produced in test
[Figure 11.1, a bar graph, appears here: mean proportion of responses (y-axis,
0.00–1.00) for the new_old and old_new orders in the Naming and Naming + Load
conditions.]
Figure 11.1
Mean proportions of old-before-new and new-before-old responses in the naming study
and the naming-under-load study

trials that were immediately preceded by a distractor word semantically
related to the new item.
We therefore examined whether the reduction in old-before-new pref-
erence is due to the processing load manipulation or to effects from a
distractor word related to new referents. To do this, we compared the
preference for old-before-new responses in the naming study with the
preference for old-before-new responses in each of the three different
distractor type conditions in the naming-under-load study, viz. in the test
trials immediately preceded by a distractor word that was semantically
related to the new item, a distractor word that was related to the old
item, or the word was not on the original distractor list at all. If it were
the case that the reduction in old-before-new bias seen in the naming-
under-load study were the result of a distractor word semantically related
to the new item, then we would expect to see the reduction only in those
test trials prior to which a distractor word semantically related to the
new item is produced. However, if cognitive load is also responsible for
the reduction in old-before-new bias seen in Experiment 2, then we
would expect to see the reduction in old-before-new bias even in those
test trials in which the distractor word produced immediately prior to
the response is not related to the new item.
We pooled the data obtained from both naming and naming-under-
load experiments and ran a mixed-effect logistic regression model with
[Figure 11.2, a bar graph, appears here: mean proportion of responses (y-axis,
0.00–1.00) for the new_old and old_new orders in the Naming, R-new, R-old, and
Else (Naming + Load) trial groups.]
Figure 11.2
Mean proportions of old-before-new and new-before-old responses in the naming study
(first two bars to the left of the graph) and in the R-old, R-new, and Else trial groups of
the naming-under-load study. (R-old: last-produced distractor word was related to the first,
older item of the pair; R-new: last-produced distractor word was related to the newer item;
Else: the last word mentioned prior to the object labeling was a word not on the original
distractor list.) Note: Although participants reproduced distractor words that they were
instructed to memorize, sometimes they randomly produced a word that was not on the
list; such occurrences were coded as "Else."

order of mention as the outcome variable, and two random effect
factors, participant and item. As a predictor variable, we included distrac-
tor status: "None" (no distractor item used in the naming study), "R-old"
(the last distractor item mentioned was related to the old item in
the naming-under-load study), "R-new" (the last distractor item men-
tioned was related to the new item in the naming-under-load study), and
"Else" (the last word mentioned prior to the object labeling was not
on the original distractor list in the naming-under-load study). The
control variables included those variables that were common in both
experiments.
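The structure of this pooled analysis can be sketched as follows. This is not the authors' analysis: it uses randomly generated toy data, a plain logistic regression without the random effects for participant and item (the chapter's model is a mixed-effects logistic regression, e.g., as fit with lme4 in R), and a placeholder level name "no-load" standing in for the "None" (naming-study) condition.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative sketch of the pooled model's structure, fit on randomly
# generated toy data. It omits the random effects for participant and item
# and all control variables; "no-load" is a placeholder label.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({"distractor": rng.choice(["no-load", "R-new", "R-old", "Else"], size=n)})
# Simulate order of mention (1 = old-before-new) with an arbitrary bias
# in the no-load condition, purely for illustration.
toy["old_first"] = rng.binomial(1, np.where(toy["distractor"] == "no-load", 0.9, 0.5))

fit = smf.logit("old_first ~ C(distractor, Treatment(reference='no-load'))", data=toy).fit()
print(fit.summary())
```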
The responses in the naming study showed a significantly higher old-
before-new preference compared to the responses in each of the three
distractor status conditions in the naming-under-load study (figure 11.2).
There were no effects of any of the control variables except for fluency
and a marginally significant effect of weight, and only fluency survived
a likelihood ratio test. Hence the analysis was rerun to examine the
interaction of fluency and distractor status (table 11.1). In this latter
Table 11.1
Effects of the Last-Produced Distractor Word and Fluency on New-Before-Old versus
Old-Before-New Responses

                                                 Estimate  Std. Error  Z value  p value
(Intercept)                                      3.44      0.58        5.89     0.00***
Last distractor word: Else                       2.36      1.02        2.33     0.02*
Last distractor word: New                        4.45      0.83        5.37     0.00***
Last distractor word: Old                        3.17      0.80        3.97     0.00***
Fluency: Nonfluent                               1.01      0.60        1.69     0.09
Last distractor word: Else * Fluency: Nonfluent  0.45      1.10        0.41     0.68
Last distractor word: New * Fluency: Nonfluent   0.27      0.88        0.31     0.76
Last distractor word: Old * Fluency: Nonfluent   0.90      0.84        1.08     0.28

analysis, the only significant coefficient obtained was for distractor status.
Participants were more likely to produce old-before-new responses in
the naming study than in any of the three distractor status conditions of
the naming-plus-recall study: the R-old group (β = 3.17, SE = 0.80, Z value
= 3.97, p < 0.001), the R-new group (β = 4.45, SE = 0.83, Z value =
5.37, p < 0.001), and the Else group (β = 2.37, SE = 1.02, Z value =
2.33, p < 0.05). Thus we find a reduction in the old-before-new responses
in the naming-under-load task relative to the naming task irrespective of
the semantic relatedness of the distractor word to either new or old refer-
ent labels.

11.5.6 Discussion
Our results show that speakers demonstrate an overall elimination of the
old-before-new bias under a processing load. These findings provide
empirical support for the role of speaker-oriented considerations such as
ease of processing in modulating word-order preferences. They are also
compatible with the specific hypothesis that speakers first produce new
information that has a relatively less robust representation in working
memory. Possibly, the saliency of new objects also contributes to their
activation and ease of retrieval. Furthermore, when speakers produce a
distractor that is semantically related to the new item immediately prior
to naming the objects in the test trials, the old-before-new bias is com-
pletely reversed. One possible explanation is that the related distractor
primes the new item, perhaps having an additive effect along with
saliency, increasing its activation and likelihood of being produced first.
Distractors related to the old items do not have a similar effect, a finding
that requires further research for an explanation.
While the introduction of cognitive load in our second study reduced
the preference for old-before-new responses in adults, it did not result in
a strong, across-the-board preference for new-before-old, as seen in the
young children's production even without added cognitive load (Nara-
simhan and Dimroth 2008; Dimroth and Narasimhan 2012). If the
saliency of the new object makes it the more easily accessible item, or if
the fragility of its mental representation motivates early encoding in the
utterance by speakers, why do adults not show a basic preference for
new-before-old? As discussed earlier, one possibility is the influence
of competing factors that favor the old-before-new order in adults.
For instance, adults have had more exposure than children to the
(putatively) more frequently occurring old-before-new order pattern
across different construction types over the course of their linguistic
experience. For instance, speakers may be more likely to use, and encoun-
ter, the old-before-new order when using active declarative construc-
tions, which they may often hear used with the old-before-new order, as
opposed to phrasal conjuncts (see Stephens [2010] for evidence of an
old-before-new preference in children's production of ditransitive con-
structions, and Slevc [2011] for a similar preference in adults in the
absence of a cognitive load). The influence of addressee-oriented con-
siderations favoring the old-before-new as a way to facilitate listener
comprehension may also play a role, albeit in an attenuated matter under
conditions in which speakers lack the cognitive resources to engage in
audience design.
An alternate explanation for the response pattern seen in the naming-
under-load study does not rely on competition between factors favoring
the old-before-new versus the new-before-old order. Rather, it accounts
for the reduction in the old-before-new bias in terms of the influence of
cognitive load on how the old information is encoded, maintained, or
retrieved. For instance, Slevc (2011) suggests that speakers' old-new
preference in the production of dative constructions is attenuated when
under a verbal processing load because of interference-based effects
from items held in memory from the concurrent recall task: "WM
[working memory] load either made it difficult to keep [old] information
sufficiently active to warrant early mention or led to increased interfer-
ence at the point of retrieving that otherwise accessible item. . . . a plau-
sible alternative is that the WM load interfered with the encoding of the
accessible item" (2011, 1511). Since there was no preference for the old-
before-new or new-before-old order in our second study (except in those
cases where a distractor word semantically related to a new item was
produced before a test trial), it is possible that a similar explanation can
be provided for our results. That is, it is possible that the old information
was not more robustly represented than the new information, or that it
was not even retained in memory at all. Although it is not mutually
exclusive with our account, several factors suggest that an explanation
along the lines provided by Slevc (2011) is unlikely to be the sole factor
motivating the reduction in the old-before-new bias in our second study.
First, anecdotal evidence suggests that participants are maintaining the
old vs. new distinction: participants used the definite determiner "the" to
label old referents (7 responses; no responses showed the definite deter-
miner used with new referents). Furthermore, in 19 excluded responses,
participants named only the new referent (there were no cases in which
participants named only the old referent and omitted the new one).
Second, producing a distractor word semantically related to a new refer-
ent label facilitates the retrieval of the new item to a greater extent than
when a distractor word related to an old referent label is produced (see
figure 11.2). This suggests that the representation of old versus new ref-
erents is distinct. Third, there is no relationship between the number of
correctly recalled distractor words and ordering preference. If impaired
memory for the old object led to the decrement in old-before-new order,
participants' ordering preferences should be influenced by differences in
their recall abilities, but this is not the case. Finally, if participants were
simply using random ordering patterns, we would expect to see a roughly
50-50 split in choice of orders at the individual level. Instead, we see a
bimodal pattern (table 11.2), where almost all the participants either
have a predominantly old-before-new preference or a predominantly
new-before-old ordering preference.
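The 60% criterion used in table 11.2 to identify a participant's dominant pattern can be expressed as a small helper (illustrative only, not the authors' script; the example proportions below come from the table).

```python
# A small illustrative helper expressing the 60% criterion used in table 11.2.

def dominant_order(prop_new_old, prop_old_new, threshold=0.60):
    """Classify a participant's dominant ordering pattern, if any."""
    if prop_old_new >= threshold:
        return "old-before-new"
    if prop_new_old >= threshold:
        return "new-before-old"
    return "no dominant pattern"

print(dominant_order(0.27, 0.73))  # naming-under-load participant 1 -> old-before-new
print(dominant_order(0.82, 0.18))  # naming-under-load participant 3 -> new-before-old
print(dominant_order(0.58, 0.42))  # naming participant 7 -> no dominant pattern
```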

11.6 Conclusions

Adult speakers of English have a robust preference to order old referents
before new referents, but such a preference is eliminated when partici-
pants' processing resources are taxed. These findings demonstrate that
speakers' ordering preferences can be shifted by boosting ease of pro-
cessing considerations. The direction of the effect in the cognitive
load condition is towards an overall reduction in the old-before-new
preference. The decreased preference for the old-before-new order under
Table 11.2
Proportion of New-Before-Old and Old-Before-New Responses Per Participant in the
Naming-Under-Load and Naming Tasks*

naming study naming-under-load study

Participant new_old old_new Participant new_old old_new

1 0.00 1.00 1 0.27 0.73
2 0.17 0.83 2 0.25 0.75
3 0.00 1.00 3 0.82 0.18
4 0.00 1.00 4 0.83 0.17
5 0.25 0.67 5 0.89 0.11
6 0.00 1.00 6 0.17 0.83
7 0.58 0.42 7 1.00 0.00
8 0.00 0.92 8 0.36 0.64
9 0.00 0.92 9 0.75 0.25
10 0.08 0.92 10 0.67 0.33
11 0.00 1.00 11 0.58 0.42
12 0.00 0.92 12 0.20 0.80
13 0.08 0.92 13 0.91 0.09
14 0.08 0.92 14 0.86 0.14
15 0.00 1.00 15 0.25 0.75
16 0.33 0.67 16 0.44 0.56

*Bolded items represent participants' preferred pattern of responses (60% or more of their
responses).

cognitively taxing circumstances is compatible with the notion that the
new-before-old order is a more basic preference in children and adults
alike, at least in phrasal conjuncts. The elimination of the old-before-new
bias in the overall data (see figure 11.1), combined with the complete
reversal of the bias within the R-new trial group (see figure 11.2), pro-
vides supporting evidence for a new-before-old preference in circum-
stances involving processing load that is not related to the communicative
needs of an addressee.
In children, the new-before-old preference can be linked to the new
object being in the focus of attention, hence highly activated and acces-
sible. At the same time, not having been as deeply processed as the old
item, its representation is also relatively less stable, motivating its early
mention in the utterance. In adults, the new-before-old preference com-
petes with other variables that combine to promote the old-before-new
order, such as addressee-oriented considerations or the frequency of the
old-before-new order observed across various construction types in the
language. Hence we do not see an overall new-before-old preference,
only a reduction in the old-before-new bias.
The research presented here shows that it is possible to separate the
influences of aboutness and accessibility on sentence production in
order to examine the effects of accessibility alone on linear ordering preferences. In doing so, the empirical findings demonstrate that the
assumption regarding a putatively universal linguistic preference for old-
before-new needs to be reexamined. In conjunction with the prior studies
discussed in the introduction, the findings of the research reported here
suggest that speakers' linear ordering preferences can be modulated in gradient ways depending on which of several factors prevails, including the processing resources available to the speaker, the
degree to which addressee comprehension is facilitated by using a par-
ticular word order, the salience of novel entities relative to other entities
in the discourse-pragmatic context (which may be influenced by the age
of the speaker), and the frequency with which the old-before-new order
occurs in the construction type used by the speaker in the language,
among others.
This view is compatible with the idea that linear ordering preferences
are governed by something akin to a preference rule system, broadly
construed (Jackendoff 1983, 1990; Lerdahl and Jackendoff 1983). A preference rule system consists of a set of conditions, none of which is necessary but any one of which is sufficient to license a specific phenomenon, whether it is a particular musical structure or the extension of a lexical item to a particular referent (Jackendoff 1983). Our work suggests
that while the interface principle(s) governing the mapping between
information status (new vs. old) and linear order may be simple (old-first
or new-first), the relative strength of the preference for a particular order
during utterance production is influenced by a set of linguistic, commu-
nicative, and cognitive conditions that may be satisfied to varying degrees.
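To make this concrete, the following minimal sketch in Python shows how a preference-rule-like system could combine partially satisfied conditions into a gradient ordering preference. The condition names, weights, and degrees of satisfaction are illustrative assumptions only; they are not estimates drawn from the experiments reported above.

# A minimal sketch of a weighted preference-rule system for conjunct order.
# Positive contributions favor old-before-new; negative ones favor
# new-before-old. No single condition is necessary, and each condition may
# be satisfied to a varying degree between 0 and 1. All names and numbers
# below are hypothetical.

def order_preference(degrees, weights):
    """Combine weighted condition satisfactions into a gradient score:
    positive favors old-before-new, negative favors new-before-old."""
    return sum(weights[c] * degrees.get(c, 0.0) for c in weights)

weights = {
    "addressee_orientation": 0.6,    # easing the listener's comprehension
    "construction_frequency": 0.3,   # old-before-new is frequent in the language
    "new_referent_salience": -0.5,   # the new item is in the focus of attention
}

# Unloaded speakers: communicative conditions fully in play.
baseline = order_preference(
    {"addressee_orientation": 1.0, "construction_frequency": 1.0,
     "new_referent_salience": 1.0}, weights)

# Under cognitive load: addressee-oriented planning is curtailed.
loaded = order_preference(
    {"addressee_orientation": 0.2, "construction_frequency": 1.0,
     "new_referent_salience": 1.0}, weights)

print(baseline, loaded)  # approximately 0.4 (old-before-new) versus -0.08 (weakly new-first)

On this toy model, weakening a single addressee-oriented condition is enough to tip the balance from a clear old-before-new preference toward a weak new-before-old one, mirroring the gradient shifts described above.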

Acknowledgements

We would like to express our appreciation to our research assistants at the Language, Development, and Cognition Lab for their help with the
experiments: Steve Duman, Skye Smith, Celeste Smitz, and Cecil Yeatts.
We benefited greatly from feedback given to us by our colleagues and
students, especially Susan Brown, Steve Duman, David Harper, Alison
Hilger, Lise Menn, and Les Sikos, as well as the audiences at the Institute
for Cognitive Science Colloquium (University of Colorado) and the
Competing Motivations Workshop at the Max Planck Institute for Evo-
lutionary Anthropology, Leipzig, where previous versions of this research
were presented.

Appendix A

Target labels for object pairs for each trial (filler items not shown)
Pair Item 1 Item 2

1 book chair
2 clock plate
3 flower crayon
4 cup shoe
5 key knife
6 hat egg
7 cookie bottle
8 tree bus
9 ball spoon
10 car bed
11 apple pencil
12 glass shirt

Appendix B

Target labels for object pairs and related distractors for each trial
Pair Item 1 Item 2 Distractors

1 book chair text, ottoman, newspaper, bench, magazine, sofa
2 clock plate sundial, dish, timer, platter, watch, saucer
3 flower crayon seed, pen, fruit, chalk, leaf, marker
4 cup shoe can, slipper, teapot, boot, mug, clog
5 key knife lock, razor, phone, axe, wallet, scissors
6 hat egg cap, nest, turban, chick, hood, hen
7 cookie bottle muffin, ladle, brownie, jar, cupcake, vat
8 tree bus shrub, train, bramble, shuttle, bush, van
9 ball spoon racket, napkin, court, fork, hoop, chopstick
10 car bed motorcycle, nightstand, jeep, pillow, taxi, blanket
11 apple pencil orange, eraser, banana, ruler, peach, stapler
12 glass shirt vase, blouse, pitcher, sweater, flask, vest

Appendix C

Idealized example from a single trial in the distractor task (target items
are underlined)
(distractor words presented on the screen: ORANGE, ERASER, BANANA, RULER,
PEACH, STAPLER; Participant begins rehearsal for the recall task)

Participant: Orange, eraser, banana, ruler, peach, stapler . . . Orange, eraser, banana, ruler, peach, stapler . . .
(distractor words disappear from the screen)

Participant: Orange, eraser, banana, ruler, peach, stapler . . . Orange, eraser, banana, ruler, peach, stapler . . .
(Object A appears on the screen, signaling the first part of the naming
task)
Experimenter: What's on the screen?
Participant: An apple. Orange, eraser, banana, ruler, peach,
stapler . . . Orange, eraser, banana, ruler, peach, stapler . . .
Experimenter (producing matching picture): An apple.
Participant: Orange, eraser, banana, . . . yes . . . ruler, peach,
stapler . . . Orange, eraser, banana, ruler, peach, stapler . . .
(Objects A and B appear on the screen, signaling the second part of the
naming task)
Experimenter: What's on the screen?
Participant: An apple and a pencil. (Or: A pencil and an apple.) Orange,
eraser, banana, ruler, peach, stapler . . . Orange, eraser, banana, ruler,
peach, stapler . . .
Experimenter: An apple and a pencil.
Participant: Orange, eraser, banana, . . . yes . . . ruler, peach, stapler . . .
Orange, eraser, banana, ruler, peach, stapler . . .
(question mark appears on the screen, signaling the test for the recall
task)
Participant: Orange, eraser, banana, ruler, peach, stapler.

References

Baayen, R. Harald. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.
Baddeley, Alan D. 1986. Working Memory. Oxford: Oxford University Press.
Bock, J. Kathryn. 1977. The effect of a pragmatic presupposition on syntactic structure in question answering. Journal of Verbal Learning and Verbal Behavior 16 (6): 723–734.
Bock, J. Kathryn, and David E. Irwin. 1980. Syntactic effects of information availability in sentence production. Journal of Verbal Learning and Verbal Behavior 19 (4): 467–484.
Bölte, Jens, Andrea Böhl, Christian Dobel, and Pienie Zwitserlood. 2009. Effects of referential ambiguity, time constraints and addressee orientation on the production of morphologically complex words. European Journal of Cognitive Psychology 21 (8): 1166–1199.
Branigan, Holly P., Janet F. McLean, and Hannah Reeve. 2003. Something old, something new: Addressee knowledge and the given-new contract. In Proceedings of the 25th Annual Conference of the Cognitive Science Society, edited by Richard Alterman and David Kirsch, 180–185. Boston, MA: Psychology Press.
Clark, Eve V., and Susan E. Haviland. 1977. Comprehension and the given-new contract. In Discourse Production and Comprehension, edited by Roy O. Freedle, 1–40. Norwood, NJ: Ablex.
Clark, Herbert H., and Deanna Wilkes-Gibbs. 1986. Referring as a collaborative process. Cognition 22 (1): 1–39.
Clifton, Charles, Jr., and Lynn Frazier. 2004. Should given information come before new? Yes and no. Memory and Cognition 32 (6): 886–895.
Coltheart, Max. 1981. The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology 33 (4): 497–505.
Dimroth, Christine, and Bhuvana Narasimhan. 2012. The development of linear ordering preferences in child language: The influence of accessibility and topicality. Language Acquisition 19 (4): 312–323.
Fellbaum, Christine, ed. 1998. WordNet: An Electronic Lexical Database. Cam-
bridge, MA: MIT Press.
Ferreira, Victor S., and Carla E. Firato. 2002. Proactive interference effects on sentence production. Psychonomic Bulletin and Review 9 (4): 795–800.
Ferreira, Victor S., and Hiromi Yoshita. 2003. Given-new ordering effects on the production of scrambled sentences in Japanese. Journal of Psycholinguistic Research 32 (6): 669–692.
Gordon, Peter C., Randall Hendrick, and William H. Levine. 2002. Memory load interference in syntactic processing. Psychological Science 13 (5): 425–430.
Halliday, Michael A.K. 1994. Introduction to Functional Grammar. London:
Edward Arnold.
Haywood, Sarah L., Martin J. Pickering, and Holly P. Branigan. 2005. Do speakers avoid ambiguities during dialogue? Psychological Science 16 (5): 362–366.
Hoff-Ginsberg, Erica. 1997. Language Development. Pacific Grove, CA: Brooks/
Cole.
Jackendoff, Ray S. 1972. Semantic Interpretation in Generative Grammar. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar,
Evolution. New York: Oxford University Press.
Just, Marcel A., and Patricia A. Carpenter. 1992. A capacity theory of comprehen-
sion: Individual differences in working memory. Psychological Review 99 (1):
122–149.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Levelt, Willem J. M. 1989. Speaking: From Intention to Articulation. Cambridge,
MA: MIT Press.
MacWhinney, Brian. 2000. The CHILDES Project: Tools for Analyzing Talk. 3rd
ed. Mahwah, NJ: Lawrence Erlbaum Associates.
Narasimhan, Bhuvana, and Christine Dimroth. 2008. Word order and information
status in child language. Cognition 107 (1): 317–329.
Princeton University. 2010. About WordNet. WordNet. Princeton University.
http://wordnet.princeton.edu.
Slevc, L. Robert. 2011. Saying what's on your mind: Working memory effects on syntactic production. Journal of Experimental Psychology: Learning, Memory, and Cognition 37 (6): 1503–1514.
Stephens, Nola. 2010. Given-before-new: The Effects of Discourse on Argument
Structure in Early Child Language. Ph.D. diss., Stanford University.
Von Stutterheim, Christiane, and Wolfgang Klein. 2002. Quaestio and
L-perspectivation. In Perspective and Perspectivation in Discourse, edited by Carl
F. Graumann and Werner Kallmeyer, 59–88. Amsterdam: John Benjamins.
Wundt, Wilhelm M. 1900. Die Sprache. Leipzig: Engelmann.
Yule, George. 1997. Referential Communication Tasks. Mahwah, NJ: Lawrence
Erlbaum Associates.
12 Sleeping Beauties

Willem J. M. Levelt

12.1 Mendel's Laws: The Prototype of Scientific Rediscovery

During the decade around 1860 Gregor Mendel ran his classic experiments on the hybrids of pea plants in the botanical garden of his Augustinian monastery in Brünn, Austria. There he discovered the basic principles of heredity, later called Mendel's laws: the law of segregation (the existence of dominant and recessive traits) and the law of independent assortment (traits being independently inherited). In 1866 he published these discoveries as Versuche über Pflanzenhybriden in the journal of the local natural science society, not exactly a journal that featured on Charles Darwin's shelves. Mendel then became abbot of his monastery and spent little further effort on promoting his discoveries. They became sleeping beauties for the next three decades. By the end of the 1890s, four princes, more or less independently, kissed them back to life: Hugo de Vries from Amsterdam, Erich Tschermak-Seysenegg, assisted by his brother Armin, from Vienna, and Carl Correns from Tübingen. Their papers, all three reporting the rediscovery of Mendel's laws, appeared almost simultaneously in 1900, two of them acknowledging Mendel's priority, the third one, Hugo de Vries, soon joining in.
This is undoubtedly the most famous case of rediscovery in modern
science. However, rediscovery is not limited to the natural sciences. The
present chapter will review a number of sleeping beauties in linguistics
and psycholinguistics: discoveries, tools, and theories that reawakened
after long periods of slumber. I came across them while writing A History
of Psycholinguistics (2013).1 One of these beauties, the first to be dis-
cussed, was kissed back from enchantment by Ray Jackendoff in his
theory of consciousness (1987).

12.2 Heymann Steinthal on Consciousness

The discovery of the Indo-European language family by the end of the eighteenth century engendered a concerted search for the proto-language
from which these languages had evolved. It was a search for the
Ur-Wurzeln, the original lexical core roots from which all later lexicons
had evolved. Sanskritist Max Müller, for instance, didn't hesitate to claim that there were 121 core roots: "These 121 concepts constitute the stock-in-trade with which I maintain that every thought that has passed through the mind of India, so far as it is known to us in its literature, has been expressed" (Müller 1887, 406).
With the widely accepted notion of a proto-language, with its core of
lexical roots, a dilemma arose for linguists. Should they try to explain
how these roots in their turn had come about, or should they simply stop
at this so-called root barrier? Among those refusing to go beyond the
root barrier were leaders such as Franz Bopp, August Pott, August
Schleicher, and William Dwight Whitney. But others, such as Max Müller, Lazarus Geiger, and Ludwig Noiré, were more adventurous, coming up
with wonderful stories about how primordial human society produced
its first lexical roots.
Heymann Steinthal (1823–1889) was the first to develop, just for that purpose, a serious psychology of language. We may in fact consider him the inventor of psycholinguistics. Steinthal was a comparative linguist; he taught at the Hochschule für die Wissenschaft des Judentums in Berlin. He was overly impressed by the work of Wilhelm von Humboldt,
in particular his indeed original idea that language is not a bunch of texts,
as studied by the historical linguists, but an activity of mind. For Hum-
boldt, language is what the speaker does. Linguistics should explain how
this works, and this requires a developed psychology. "Successful advances in the science of Linguistics are dependent on a mature Psychology,"2 Steinthal wrote in 1855 (234), but there was no advanced psychology around. Together with his life-long friend Moritz Lazarus, Steinthal founded in 1859 the Zeitschrift für Völkerpsychologie. They argued that
the psychology needed was an ethnic, social, or in modern terms, anthro-
pological psychology. Curiously enough, Steinthal never developed any-
thing of the sort. The psychology he adopted was Herbart's.
Johann Friedrich Herbart was Immanuel Kant's successor in Königsberg. He developed a very clever mathematical psychology of how ideas
(Vorstellungen) get in and out of consciousness, mutually associating
or dispelling each other (Herbart 1824). Herbart provides the precise
differential equations that govern this mental mechanics. The basic idea is quite simple. Consciousness is like a stage. On the stage are one
or a few actors; it cannot contain more. All other actors push to get onto
the stage, using their associations to actors on the stage, and dispelling
other actors from the stage. Below consciousness are conglomerates of
associated ideas. New ideas on the stage are easily drawn into existing
conglomerates, for instance by similarity. This process is called appercep-
tion. The conglomerate into which a new idea gets associated, Herbart
calls the apperceptive mass.
Steinthal further developed this theory in order to explain how an
original spontaneous vocal response to some exciting, consciously per-
ceived event got perceived, landing on the stage of consciousness. The
short co-presence of perceived sound and perceived event on the stage
of consciousness leads to their association because they share the affect
of excitement. Here is an Ur-Wurzel (primal root) in statu nascendi.
Steinthal developed this theory in exceeding detail, including a phenom-
enology of consciousness far ahead of its time.
What do we mostly have on the stage of our consciousness? Steinthal's
answer was: words, language, and specifically inner speech. Inner speech,
according to him, is the consciousness of the connection of a word to its
apperceptive mass. Psychologically speaking, the apperceptive mass is
the word's meaning. It is the conglomerate of ideas we have come to
associate with that particular spoken word. Meaning almost never enters
consciousness itself because of its complexity. Consciousness is too
narrow for it. What consciousness can contain is the internal speech form; that is, the consciousness of the word's connection to the dark apperceptive mass below. "We translate the content of our thoughts in words . . . the content sends its word substitutes into consciousness because it cannot get there itself"3 (Steinthal 1881, 437).
Steinthal now drastically narrows the notion of idea. In the civilized
language user practically any idea is a word-idea. Any idea in conscious-
ness is just the abstract reference of a word to its unconscious meaning
conglomerate. This internal speech form itself has little or no content,
but in the listener it can activate the underlying apperceptive mass, which
is, psychologically speaking, the word's meaning.
Steinthal then goes on to discuss the economy of language and thought.
Words in consciousness are only lightweight references to the underlying,
unconscious apperceptive structures, their meanings. Any thinking or
creative mental process is unconscious, according to Steinthal. It is the
never ending apperceptive interaction of association and dissociation
among unconscious conglomerates. These highly complex events are consciously represented as words and sentences. In this way, lightweight
consciousness can represent and affect heavyweight unconscious thought
processes.
This is almost exactly Ray Jackendoff's theory of consciousness, initially outlined in Jackendoff (1987) and (1997). "We experience language as organized sequences of sounds . . . the content of our experience, our understanding of the sounds, is encoded in different representations, in particular conceptual structure and spatial representations. The organization of this content is completely unconscious" (1997, 189). In 2007 Jackendoff writes that we are conscious of our thoughts not through awareness of the thoughts themselves, but through the awareness of phonological structure associated with thoughts (Jackendoff 2007, 84). In Jackendoff (2012), this is called the "unconscious meaning hypothesis" (90), and the author acknowledges Steinthal's original work. Conscious
inner speech is phonological, according to Jackendoff. We are never
conscious of word class, syntax, or even meaning. We are only conscious
of the meaningfulness of our phonological images. A classic insight
indeed.

12.3 Sigmund Exner on Cohort Theory

Sigmund Exner (1846–1926) was a brilliant Viennese neurologist. He had been a student of Hermann von Helmholtz and was co-inventor of the gramophone record, with which he established the Sound Archive in the Austrian Academy, an institution still in existence. He had also proposed a graphic/writing center in the brain, but here I want to mention Exner's invention of cohort theory.
In 1978, William Marslen-Wilson, as a member of the beginning Max
Planck enterprise in Nijmegen, formulated his cohort theory together
with Alan Welsh. It is a theory of how we recognize spoken words. The
core idea of cohort theory is that the initial speech sound of a word acti-
vates all words in the listener's lexical memory beginning with that sound.
As further speech sounds follow, the initial cohort of activated words
narrows down, step by step, excluding non-fitting members until just a
single word, the target, remains. Marslen-Wilson and his research team
developed entirely new experimental paradigms to test the theory, which
as a consequence went through several subsequent versions. The theory
made quite non-trivial predictions, which made it an attractive experi-
mental target. The strictly incremental nature of the activation predicted
that we cannot recognize a word when its initial speech sound is experi-
mentally changed. We will not recognize "cold" when we hear "told." Still, we might recognize the spoken non-word "gypothesis" as "hypothesis." Later versions of the theory allow for slight activation of (candidate) words that were not in the original cohort (in the example, the word "hypothesis").
Another attractive feature of cohort theory is the notion of uniqueness
point. Each new incoming speech sound further reduces the cohort, till
just one candidate word is left, which is then recognized as the target
word. That can happen before all of the word's speech sounds have come
in. Take the word "snorkel." When the input has reached the stage "snor-", the cohort has been reduced to snorkel, snorer, snort, snorter, and snorty. But as soon as "k" comes in, only snorkel remains. Hence, the speech sound "k" is snorkel's uniqueness point. A word's uniqueness point thus depends on the set of word-initial alternatives in the listener's lexicon.
The theory predicts that a word is recognized as soon as its uniqueness
point is reached. This was nicely confirmed in the initial experiments, and
the notion is still a basic one in spoken word perception.
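As an illustration, here is a minimal sketch in Python of cohort narrowing and the uniqueness point, applied to the snorkel example. The toy lexicon and function names are illustrative assumptions only; this is not a reconstruction of Marslen-Wilson and Welsh's actual model.

# A minimal sketch of incremental cohort narrowing and the uniqueness
# point, using the toy lexicon from the snorkel example. The lexicon and
# function names are hypothetical.

LEXICON = {"snorkel", "snorer", "snort", "snorter", "snorty",
           "cold", "told", "hypothesis"}

def cohorts(word, lexicon=LEXICON):
    """Yield (prefix, cohort) pairs as the word's segments come in."""
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        yield prefix, {w for w in lexicon if w.startswith(prefix)}

def uniqueness_point(word, lexicon=LEXICON):
    """Return the prefix length at which only the target word remains."""
    for prefix, cohort in cohorts(word, lexicon):
        if cohort == {word}:
            return len(prefix)
    return None  # the word never becomes unique within this lexicon

for prefix, cohort in cohorts("snorkel"):
    print(prefix, sorted(cohort))
print("uniqueness point:", uniqueness_point("snorkel"))  # 5, i.e., at the "k"

Written letters stand in here for speech sounds; a phonemic transcription would make no difference to the logic, which is simply that each incoming segment restricts the cohort to the lexical items compatible with the input so far.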
Sigmund Exner had been ahead of Marslen-Wilson by over eight
decades. He formulated the essence of the theory in 1894. Here is the
relevant text in English translation (from Levelt 2013, 81):
When you for instance hear the sound K, with [. . .] very low intensity the traces
are activated which in many earlier cases were simultaneously active with the
perception of K and which correspond to the images of Knabe [boy], Kuh
[cow], Kirsche [cherry], Kugel [ball], Kern [kernel], etc. [. . .] This activation
doesn't disappear however with the disappearance of the sound K, but continues
[. . .] as a trace for a duration of a number of seconds [. . .]. If during the existence
of this activation [. . .] also the sound I is heard, then a further bit of activation
will be received by those traces that are associatively connected to the sound I.
This should not mean that the image of Fisch [fish] is not also activated by the
I-sound because of its connection to the I-sound, but it is obvious that all images
whose name begins with KI have a remarkable advantage, because they were
already activated by the previous K-sound. [. . .] Hence, the image Kirsche will
be closer to the activation value needed for clear consciousness as the image
Fisch. In addition, it [the I-sound] will [. . .] suppress the vague images Knabe,
Kuh, Kugel, Kern, etc. [. . .] [Kirsche] will however still be at the same
activation level with other words beginning with Ki [. . .]. If then the further
sound R is added, the total activation process of the traces in the brain is nar-
rowed down following the same principle, so that only the traces representing
the images Kirsche and Kirche are activated; the further sound Sch then hits
a relatively very small number of active brain traces, but it is intensive and it will,
during the pause that follows completion of the word, develop itself into the full
activation of the image traces of Kirsche.4 (German original: Exner 1894,
307–308)

Exner does not formulate a notion equivalent to the uniqueness point, but he does allow for words outside the word-initial cohort to be also activated by later speech sounds. He mentions the word Fisch (fish), which will also receive some activation from the second speech sound i.
I have not seen a single later reference to Exner's cohort theory.
Neither Exner nor anybody else set out to test the theory experimentally,
although this would in principle have been possible at some time before
Marslen-Wilson re-invented the idea. It would certainly have speeded up
our understanding of spoken word perception.

12.4 Rudolf Meringer and Carl Mayer on Speech Errors

A most remarkable sleeping beauty has been Rudolf Meringer and Carl
Mayer's (1895) theory of speech errors and its further extension in
Meringer (1908). There is indeed great beauty here. The thoroughly data-
based theory is the first to explain speech errors from an explicit psycho-
logical theory of utterance production, a theory that in its essentials still
stands today. It is, moreover, almost incomprehensible how this work
could suffer the fate of a decades-long sleep state. Let us briefly consider these two features of the case.
The linguist Rudolf Meringer (1859–1931) was born in Vienna, and held teaching positions there and, from 1899, in Graz. He was a confirmed empiricist: "one who cannot observe is not a researcher, but a bookworm"5 (Meringer 1909, 597). His grand empirical project became
the systematic collection, analysis, and psycholinguistic explanation of
spontaneous speech errors. Meringer organized the systematic collection
by involving the participants in a regular lunch-time meeting. They
agreed to stick to certain rules, such as having only one person speak at a time and halting all conversation as soon as a tongue slip occurred. The latter
would allow for proper recording of the error and for immediate intro-
spection on the part of the speaker concerned. This procedure introduced
an important methodological feature: all occurring speech errors were
recorded, not just the remarkable, interesting, or funny ones as had been
the tradition, and as would regrettably become the tradition again.
Medical doctor Carl Mayer was only marginally involved with data col-
lection and analysis and not at all with the writing. However, his
co-authorship was important for Meringer because it would mark empir-
ical speech error research as natural science. The total corpus recorded
amounted to some 2500 slips of the tongue.

The three basic error categories Meringer distinguished are still in good use: exchanges, anticipations, and perseverations, and the core
observation in all three categories was that the exchanged elements are
functionally similar. In the exchange denile Semenz, for instance, two word-initial consonants are exchanged; the anticipation lässen nämlich (for lassen nämlich) involves two stressed vowels in word-initial syllables; and the perseveration konkret und kontrakt (for abstrakt) perseverates the first word-initial syllable as the second word's initial syllable. Meringer considered speech errors as resulting from the regular speech-producing mechanism: "Only attention fails in a speech error, the machine runs without a supervisor, is left to its own devices"6 (Meringer and Mayer
1895, vii). Linguistic elements, whether consonants, vowels, syllables,
roots, prefixes, suffixes, words, or phrases, get ordered by the production
machine. They should end up in particular target positions. There are
always multiple elements simultaneously conscious in inner speech.
Occasionally, an active element ends up in a wrong but functionally
similar target position, with an ordering error as outcome. Target posi-
tions differ in weight. Word-initial consonants, for instance, are heavy.
Vowels in unstressed syllables are light. Heavy elements have better
access to consciousness than light elements and hence are better intrud-
ers into functionally similar target positions. Meringer's weight hierarchy
is a good predictor of the frequency distribution of the sound errors he
had observed. We will not go into further details of the cogs in Meringer's clockwork (his own terms: Räder, Uhrwerk), but they have stood the test of time. They figure in one way or another in all modern theories of error generation. If any work deserves the qualification that Gregor Mendel expressed about his own work, it is Rudolf Meringer's: "It still requires some courage to submit oneself to such a far going enterprise; but it seems nevertheless to be the only proper way" (Mendel 1866, 4).
That, however, was not appreciated for almost seven long decades.
How did this wonderful work get lost? One source of obliteration has
been Sigmund Freud's psychoanalysis of speech errors. The first edition of his Zur Psychopathologie des Alltagslebens (1901) makes reference to Meringer and Mayer's book as a Vorarbeit (preliminary work) to his own. However, their views are "fernab von den meinigen" (far away from my own; Freud [1901] 1954, 52–53). He then does away with the proposed mechanical explanations: "In a major set of substitutions slips of the tongue fully ignore such sound laws"7 (Freud's own emphasis, 74). He then comes up with a number of speech errors, some from Meringer and
Mayer, many more of his own or his colleagues', supporting an entirely different story: speech errors result from something suppressed from consciousness, forcing its way out. For example, "Sie werden Trost finden, indem Sie sich völlig Ihren Kindern widwen" (target: widmen 'devote'), spoken by a gentleman to a beautiful young widow (you will find consolation in fully widowing yourself to your children). Here is Freud's explanation for this mechanically obvious perseveration: the suppressed thought indicated a different kind of consolation, namely that a beautiful young widow will soon enjoy new sexual pleasures. No wonder that Meringer describes such analyses as "jenseits von gut und böse" (beyond good and evil; 1908, 129). In subsequent editions of Zur Psychopathologie des
Alltagslebens, Freud's stories become ever wilder and more offensive to Meringer. Ultimately, after its sixth edition in 1919, Meringer had had enough and wrote a detailed, totally devastating, and hilarious review (Meringer 1923). After carefully deconstructing Freud's phantom interpretations case after case, Meringer concludes, "How much clearer spoke Pythia than the way Fate reveals itself to modern Freud-humans! One should even despair, if the same Fate hadn't also blessed the same human beings with psychoanalysis"8 (140). However, it was to no avail. Freud's storytelling about speech errors had conquered the world; by 1923, the 11th printing of the English edition was already available.
This brings us to the other cause of obliteration. There was never an
English translation of Meringer and Mayer's (1895) treatise. After World War I, and especially after the establishment of the Third Reich in 1933, the center of gravity of psycholinguistics shifted to the Anglo-Saxon world, especially North America. As we will consider, research lines were drastically broken, knowledge of German was limited, and mental machinery was anathema to the dominant behaviorism. Behavioristic psycholinguistics culminated in Burrhus Frederic Skinner's Verbal Behavior of 1957, or rather already in his William James Lectures of 1947, which were generally considered holy writ. Verbal Behavior is essentially a book about the speaker, in which the theoretical framework of operant conditioning is applied to the phenomena of language production, an enormous scaling up from the elementary behaviors of rats and pigeons in Skinner boxes to the most complex of all behaviors, speaking. Not surprisingly, the book lacks an empirical, let alone experimental, basis; it is a discursive text. It does, however, discuss speech errors. They can occur when two verbal operants (verbal responses such as "snarl" and "tangle") have the same strength and become simultaneously emitted (as "snangle"). Here, Skinner rejects Freud's approach of looking for explanations in highly
selective observations: "A careful study of large samples of recorded speech would be necessary to determine the relative frequency of different types of fragmentary recombination" (294). But Skinner makes no reference to Meringer and Mayer, who had done just that. He should have known better, because the one source he used for his slip examples was Wells (1906), an English monograph that makes repeated reference to Meringer and Mayer. He either wasn't able or willing to consult that German monograph, or he decided to ignore it because of its psycho-mechanical explanations (or probably both). As a result, Skinner left the
issue without a new empirical basis, that is, an unbiased corpus of spon-
taneous speech errors, and without theoretical explanation.
However, the cognitive revolution was already on its way, and linguists
began systematically collecting error corpora, analyzing them linguisti-
cally, providing explanations in line with Meringer's, and making due
reference to his work. To the best of my knowledge the first prince to
kiss this sleeping beauty was the Dutch linguist Anthony Cohen (1968),
but he was soon followed by many others on both sides of the Atlantic.
Ann Cutler and David Fay erected a monument for Meringer and Mayer
in 1978 by editing a facsimile reproduction of their 1895 book, with an
introduction that offered a detailed and lucid discussion of both the
empirical and theoretical accomplishments of this work, which they char-
acterized as modern in all major respects.

12.5 Wilhelm Wundt, Grammarian of Sign Language and Inventor of Phrase Structure Diagrams

Theorizing about language origins has always fluctuated between surmising vocal or gestural origins. Steinthal, we saw, opted for a vocal theory.
Sign languages or their precursors have, according to him, no grammati-
cal categories because they lack inflections and particles. This was in fact
the dominant view during the second half of the 19th century. But
Wilhelm Wundt (1832–1920) took the opposite position. Language, he
argued, originated from a gestural base. The deep motivation for compos-
ing his magnum opus Die Sprache (1900) was to provide the ultimate
psychological theory on the origins of language. We can still observe the
primordial state of language in the spontaneously arising, natural, and
largely universal sign language of the Deaf, Wundt maintained, and sign
language is grammatical.
How do signers express their thoughts? It starts out by being conscious
of some state of affairs that they want to express, which Wundt calls the
Gesamtvorstellung, the total image. The signer then successively focuses on elements of three kinds in that total image: entities, properties, and states. Here, elements that are salient in the total image get precedence over less salient elements. The elements can entertain a small set of binary, logical relations to one another, such as subject-predicate (subject = what the sentence is about) or modification relations.

Figure 12.1
S    A        O      V    A
man  furious  child  hit  hard
Here is an example from sign language. Wundt was the first to produce
a (very partial) grammar of Deaf sign language. In this example (figure 12.1), the deaf person's Gesamtvorstellung is of the furious man hitting the child hard.
Most salient is (the) furious man. It is focused on first. Its elements
entertain a logical, binary relation of modification. In sign language,
according to Wundt, the modifier follows the modified. That also holds
for the other modifier relation between hit and hard. Another binary
relation in the total image that the signer will focus on is between hard
hitting and (the) child. Finally, there is the highest level partition in the
Gesamtvorstellung, between subject and predicate: the sentence is about
(the) furious man, and what is said about him is that he hits (the) child
hard. Wundt argues in much detail that sign language is an SOV lan-
guage, but his work on sign language went into oblivion. We would have
to wait for six decades before the next grammar of a sign language
appeared (Stokoe 1960).
In the final chapter 9 of Die Sprache, Wundt goes through an amazing
tour de force in spelling out how spoken languages emerged from sign
language. We will not follow him there. What is relevant here is that
Wundt was the first to draw phrase structure diagrams, such as the one
above. They are at the same time structural representations of logical,
grammatical relations and representations of the partitioning process
involved in the generation of sentences. Wundt introduced these dia-
grams in his Logik (1880), but then went into much more detail when
developing his theory of the speaker in Die Sprache. Here is a phrase
structure diagram for a sentence produced by speaker/writer Johann
Wolfgang von Goethe:

Figure 12.2
Als er sich den Vorwurf sehr zu Herzen zu nehmen schien (a–b) und immer aufs neue beteurte (c), daß er gewiß gern mitteile (d), gern für Freunde tätig sei (e), so empfand sie (A–B), daß sie sein zartes Gemüt verletzt habe (a1–b1), und sie fühlte sich als seine Schuldnerin (A–D). [As he seemed to take the reproach to heart (a–b) and again and again proclaimed (c) that he certainly gladly intimated (d) to be eagerly active for his friends (e), then she experienced (A–B) that she injured his tender heart (a1–b1), and she felt indebted to him (A–D).]

It represents two types of connection: logical ones (curved arcs) and associative ones (straight arcs). Logical connectors are always binary
partitionings; associative connections can create strings of arbitrary
length (Hans is blonde, tall, kind, . . . and fresh).
These were the first phrase diagrams in linguistics, but also the last to
be seen for half a century; Nida reintroduced them in 1949. He had not
yet used them in the first, 1946 edition of his text, but the second edition
featured on page 87 this phrase diagram, the very first diagrammatic
representation of an IC (immediate constituent) analysis:

Figure 12.3
Peasants throughout China work very hard

This is particularly surprising because Leonard Bloomfield, the father of IC analysis, had paid a study visit to Germany (1913–1914) and Leipzig in particular, where he attended Wundt's lectures. He was deeply impressed. In his preface to An Introduction to the Study of Language, Bloomfield (1914) wrote: "It will be apparent, especially, that I depend for my psychology, general and linguistic, entirely on Wundt" (vi). For a
century, this textbook was the best English-language introduction to
Wundt's theory of language. But it did not contain a single phrase diagram.
For some reason, Bloomfield denied himself the luxury of using the
obvious formal tool for representing his immediate constituent analyses.

12.6 Adolf Reinach and Hans Lipps on Speech Act Theory

Whereas Wilhelm Wundt took the speaker's perspective in his theory of language, Philipp Wegener took the dialogical perspective in a still attractive text of his from 1885: "The purpose of our speaking is always to influence the will or knowledge of a person in such a way as seems useful to the speaker"9 (1885, 67). Speakers will either try to involve the listener in their own states or value judgments or, alternatively, express their involvement with the listener's states or value judgments. Wegener then
sketches the ethical dimension of dialogue, which proceeds from its func-
tion of affecting the will of the interlocutor.
Lawyer and student of Husserl's, Adolf Reinach (1883–1917) was the first to formulate the quasi-legal nature of dialogical speech acts (Reinach 1913). A command, for instance, is "an action of the subject to which is essential not only its spontaneity and its intentionality towards alien subjects, but also its need of being perceived"10 (707). Social acts are initiated
by ego (spontaneous) and are intended to be perceived. This holds as
much for commands as for requesting, admonishing, questioning, inform-
ing, or answering. They are all cast to an alien subject, in order to pitch
into his mind (um sich in seine Seele einzuhacken; 707). These, clearly,
are the performatives of modern speech act theory. An essential feature
of social acts, in particular speech acts, is that they are registered by the
intended audience. These social acts always have a purpose and a presup-
position. A command, for instance, has as its purpose to induce some
response in the interlocutor. Its presupposition is the speaker's will that
the response is executed. This is nowadays called the sincerity condition
of the speech act. Reinach also introduces the notion of commitment
(Verbindlichkeit) in his speech act theory. Reinach died at the age of 33, which tragically broke off the further development of his speech act theory, but the notion of commitment was further worked out by Hans Lipps ([1937, 1938], republished in 1958). Hans Lipps (1889–1941), along with Adolf Reinach, had belonged to the Göttingen Philosophical Society formed around Edmund Husserl. Later he also became a botanist and a medical doctor. His 1938 paper was entitled "The binding nature of language" (Die Verbindlichkeit der Sprache). It maintained that each spoken
word implicates a commitment and the addressee executes (vollzieht)
the meaning of the words. In case of a promise, for instance, the addressee
is informed about the speaker's intention and is at the same time accepting it, taking the speaker's word. For the speaker, on the other hand,
the promise is an assurance that he vouches for his words.

Neither Adolf Reinach nor Hans Lipps is referred to in John Austin's famous 1955 William James Lectures (Austin 1962), but they had certainly been pioneers of speech act theory.

12.7 Max Isserlin on Telegram Style as Adaptation in Agrammatism

Heymann Steinthal had introduced the term akataphasia for the inability of certain aphasic patients to build sentences in spite of the fact that the underlying thought or judgment is intact. Adolf Kussmaul, in his wonderful 1877 text on disorders of language, recognized the same syndrome, calling it agrammatism, the term we still use. It is "the inability to inflect words appropriately and to syntactically order them into sentences"11 (164). A more detailed analysis of agrammatic speech style was undertaken by Carl Wernicke's students Karl Bonhoeffer (1902) and Karl
Heilbronner (1906). They characterized this style as telegraphic. Heil-
bronner argued that this style was not voluntary but a real syntactic
inability, a primary effect of a lesion in the speech motor area.
This was the state of the art when Max Isserlin (1879–1941) published his paper Über Agrammatismus (1921). The paper includes extensive protocols of the spoken and written texts of three agrammatic patients. Here is an utterance of case 1 (WD), who describes how his brother-in-law was killed:
Thief been – brother-in-law at job, nothing noticed at all – 2 days – thrown in the Pregel – in Königsberg anyhow very bad – just Goldmarks – nothing to eat. Killer found later – taken out of bed worker.12

Isserlin summarizes this style of speaking as follows: the patient shows "the correct telegraphic style as a free form of expression." This telegram style does not involve real slips in word forms (wrong case, flexion). It is essential that the patient rejects the grammatical mistakes and selects the correct forms offered to him, with great certainty. The patient has a lively awareness of his own defective speech. The patient can give up his telegraphic style under certain conditions, for instance in retelling or in teaching, approaching normal speech, however with occasional errors. Isserlin stresses that correct pure telegraphic style is neither incorrect nor erroneous speech. It is rather a lawfully existing way of speaking, developed in the history of mankind (394–395). Telegraphic style, Isserlin (1936) argues, is the patient's free adaptation to his speech need: "The notion of telegraphic agrammatism as a need phenomenon is supported by the fact that the same patient can choose other forms of utterance in situations of less speech need – in writing – and produce
relatively correct grammatical expressions" (749). Or as one of his patients put it: "Sprechen keine Zeit – Telegrammstil" (Speaking no time – telegram style; 1921, 408).
Steinthal (1881) had already considered this speech need. In order to
build a sentence, the speaker must keep the underlying meaning con-
glomerates vibrating, because consciousness can only hold one word.
If the activation of the relevant sub-conscious meanings is too short-
lived, establishing their syntactic relations and ordering cannot be
achieved. This insight got lost in history. Haarmann and Kolk (1991)
re-introduced it in their computational theory of agrammatism. During
the same period Kolk and Heeschen (1990) published their adaptation
theory of agrammatism, arguing in much linguistic detail that many
agrammatic patients freely opt for a grammatically correct, but less
demanding, telegraphic style. They had become aware of Isserlin's work, which had been lost for over half a century. Here, as in so many other cases, the Nazi regime had silenced a leading scientist. During World War
I, young Max Isserlin had begun establishing a clinic for brain-damaged
war victims in Munich. He directed that federal clinic until 1933, when
he was dismissed for being a Jew. But he stayed in charge of the annex
Bavarian state hospital. There he was dismissed in 1938, ultimately
leaving the country at the last moment in 1939. He emigrated to Shef-
field, England, where he died in 1941.

12.8 Who Was the Wicked Fairy?

We have considered seven sleeping beauties: Steinthal's theory of consciousness, Meringer's analysis of spontaneous speech errors, Exner's cohort theory, Wundt's grammar of sign language and his introduction of tree diagrams, Reinach's and Lipps's invention of speech act theory, and, finally, Isserlin's adaptation theory. How come such remarkable
scientific discoveries, tools, insights, or theories fall into oblivion? There
are specific, but also more common impediments. Mendel's case is rather specific, though not unique. He did not work in an academic setting, and science was not his main occupation, especially after becoming abbot of his monastery in Brünn shortly after publication of his paper. A somewhat similar case in psycholinguistics was John Ridley Stroop's discovery
of what is now called the Stroop effect: naming the color of a printed
word is exceedingly slow if that word is the name of a different color.
Stroop's paper, essentially his dissertation, was published in 1935. It was
to be his last scientific paper. He devoted the rest of his life to religion,
writing religious texts, teaching bible classes, and preaching in his local
Nashville community. It took almost two decades before Stroop's paper
returned to the scientific agenda. By now it is the most cited paper in
the domain of reading research. For both Mendel and Stroop religious
duties took precedence over scientific self-promotion.
Another quite general impediment is the language of publication. This
certainly holds for all seven cases discussed in this paper. All of them
were published in German, and none of the relevant publications by
Steinthal, Exner, Meringer, Wundt, Reinach, Lipps, or Isserlin were trans-
lated into English. With the shift of gravity in the language sciences to
the Anglo-Saxon world, especially North America, during the first half of the 20th century, English became the language of science. Increasingly, the mastery of German was lost in the linguistic community. Secondary English-language sources became the tools of reference to the original sources, often with major misrepresentations or omissions as a consequence. Wundt, for instance, was soon called an introspectionist in the United States and often still is, but he wasn't. Wundt never introduced a method of systematically observing and reporting one's own inner experience, thoughts, and feelings. That was done by his students Oswald Külpe in Würzburg and Edward Titchener at Cornell. It was the latter who ascribed introspectionism to Wundt, whereas Wundt had himself attacked that method in his ferocious 1908 critique of Karl Bühler's Habilitationsschrift (Wundt 1908), which had been supervised by Külpe. As mentioned, the major American source on Wundt's (psycho-)linguistics was Bloomfield's (1914) text, but it left out Wundt's phrase diagrams and didn't mention his grammar of sign language.
One really wicked fairy has been behaviorism, in particular the North American Watsonian variant of it. This played out in linguistics and psychology alike. All the above beauties had originated in the minds of mentalists. Still in 1914, the year John Broadus Watson's Behavior appeared, Leonard Bloomfield put the common view this way: "To demonstrate in detail the role of language in our mental processes would be to outline the facts of psychology" (56), but then the tide quickly turned in the United States, for reasons that are still not well understood. This is how Bloomfield rejected mentalism in 1933: "It remains for linguists to show, in detail, that the speaker has no ideas, and that the noise is sufficient – for the speaker's words to act with a trigger-effect upon the nervous systems of his speech-fellows" ([1933] 1976, 93). Although behavioristic language
scholars deeply disagreed among themselves, they all outlawed explana-
tion in terms of mental constructs. It even became an industry to translate
traditional notions into behaviorese, replacing mental linguistic terminology by an objective one. Here is just one example, from Skinner (1957, 44–45), Otto Jespersen translated into behaviorese:
Jespersen's text: In many countries it has been observed that very early a child
uses a long m (without a vowel) as a sign that it wants something, but we can
hardly be right in supposing that the sound is originally meant by children in this
sense. They do not use it consciously until they see that grown-up people, on
hearing the sound, come up and find out what the child wants. (44; original:
Jespersen [1922, 157])
Skinner's translation: It has been observed that very early a child emits the
sound m in certain states of deprivation or aversive stimulation, but we can
hardly be right in calling the response verbal at this stage. It is conditioned as a
verbal operant only when people, upon hearing the sound, come up and supply
appropriate reinforcement. (45)

The general disdain for mentalism increasingly led to ignorance of the original sources in (psycho-)linguistics.
The most vicious of all fairies was no doubt anti-Semitism and war. By
the end of World War I, the Austro-Hungarian Empire had fallen apart.
Its formerly booming capital Vienna became the impoverished, top-
heavy capital of powerless Austria. The Versailles treaties of 1919 undermined Germany's economy. In both countries science suffered. This triggered the gradual shift of the language sciences' center of gravity to North America, but the deathblow was dealt by Hitler's National Socialism. The havoc raised in the language sciences is best documented by Utz Maas (2010). The exodus of Jewish, but also non-Jewish, language scholars began right upon Hitler's accession to power on January 30, 1933, and the April 7 law that shortly followed, which compelled universities to dismiss their Jewish members of staff.13 This amounted to some 20 percent of the total German university faculty. A second wave of exodus immediately followed the Austrian Anschluß of March 12, 1938. Many of the
great contributors to language science in both countries were Jewish. I
reviewed these tragic developments in my book A History of Psycholin-
guistics (2013). What is relevant here is that in quite a number of cases
the dismissed scientists had no chance to re-establish their reputation in
their new environments. Some died or were killed before the war was
over. Among them were phonologist Nikolay Trubetskoy, who suffered
a heart attack when the Gestapo entered his home in Vienna for a search;
phonetician Elise Richter, the first woman university professor of
Austria, who was murdered in Theresienstadt; psychologist Otto Selz,
who died in a freight wagon on the way to Auschwitz. None of all these
scientists were given the opportunity to further develop and promote
their intellectual heritage. In one case, the two World Wars joined forces
to truncate a promising intellectual development. Both pioneers of
speech act theory were killed on the German front: in 1917 young Adolf Reinach in Diksmuide, Belgium, and in 1941 Hans Lipps on the Russian
front. John Austin could hardly have become aware of their work.

12.9 Prospect

Has modern science successfully banished the wicked fairy? The lan-
guage barriers have been largely removed, with (bad) English as the
generally accepted lingua franca of science. Although dogmatic behav-
iorism has faded from the scene, other forms of intellectual provincialism
have until recently blossomed in linguistics behind impenetrable walls of
defense. But this era of linguistic wars also belongs to the past, it seems. Most importantly, the seven decades since the latest (and hopefully very last) World War have seen a large-scale globalization of the scientific
enterprise, from which the language sciences are profiting immensely.
Language diversity can now, finally, be addressed involving native speak-
ers of all ethnicities and cultures. The beauties on this global academic
scene are very much alive and kicking, but let us stay alert. One menacing
wicked fairy in modern science is its quasi-market model. Frequent pub-
lication in high-impact journals has become the sine qua non for a sci-
entific career. Publication rate, especially among the young and untenured,
has been rocketing in recent years. Journal papers, especially short and
multiple-authored ones, have become the dominant output commodity
of science and (psycho-)linguistics. However, a really functioning market
matches producers and consumers. That healthy situation does not exist in science, as Klein (2012) has argued. Most published papers are hardly
ever cited and quite probably hardly ever carefully read. There is no
guarantee whatsoever that the best ideas will ultimately emerge in the
market. It seems moreover inevitable that especially risky, non-trivial,
and innovative insights will be hard put to survive peer review. In short,
new sleeping beauties are bound to be added to the hidden, overgrown
castle of science. History will keep repeating itself.

Notes

1. Inevitably, the present paper occasionally uses material from that book.
2. Glückliche Fortschritte in der Sprachwissenschaft setzen eine entwickelte
Psychologie voraus. This and all following translations are mine.
3. Alles Sprechen und Denken in Worten beruht darauf [. . .] dass der Inhalt
seine stellvertretenden Wörter in das Bewusstsein schicke, da er selbst nicht
dahin gelangen kann.
4. Mit ähnlicher, sehr geringer Intensität werden beim Hören, z.B. des Lautes K, die Bahnen erregt werden, welche in vielen Fällen gleichzeitig mit der Empfindung des K in Action waren und die den Vorstellungen von Knabe, Kuh, Kirsche, Kugel, Kern etc. entsprechen. . . . Diese Erregung verschwindet aber nicht sofort mit dem Aufhören des Lautes K, sondern besteht als Bahnung, wie wir gesehen haben, noch eine nach Secunden zählende Zeitdauer fort. . . . Wenn nun während des Bestehens der Bahnung dieser Rindenfasern . . . noch der Laut I gehört wird, so werden dadurch aus dem ganzen Bereiche der gebahnten Vorstellungen jene Bahncomplexe einen weiteren Zuschuss an Erregung bekommen, welche assoziativ mit dem Laute I verknüpft sind. Es soll dabei nicht gesagt sein, dass nicht auch die Vorstellung Fisch durch den I-Laut gehoben wird, indem auch sie mit dem Laute I zusammenhängt, aber es leuchtet ein, dass alle Vorstellungen, deren Wortbezeichnung mit KI beginnt, einen bedeutenden Vorsprung haben, da sie durch das vorgehende K bereits gehoben waren. . . . Es wird also die Vorstellung Kirsche näher dem Erregungswerthe liegen, bei dem sie dem Bewusstsein klar vorschwebt, als die Vorstellung Fisch. Sie wird weiterhin nach dem Prinzip der centralen Hemmung die dunkle Vorstellungen Knabe, Kuh, Kugel, Kern etc. unterdrücken, sie wird aber nicht allein dies thun, da sie mit der Lautfolge Ki noch nicht voll entwickelt ist, vielmehr wird sie . . . noch auf gleicher Erregungsstufe stehen mit den Vorstellungen, welche anderen mit Ki beginnenden Worten angehört, und diese werden gemeinschaftlich die centrale Hemmung erwecken. Reiht sich dann weiterhin der Laut R an, so wird der gesammte Erregungsprocess der Rindenbahnen nach demselben Principe noch weiter eingeschränkt, so dass etwa nur mehr die Bahnen, welche der Vorstellung Kirsche und Kirche entsprechen, gebahnt sind; der weitere Laut Sch trifft nur mehr eine verhältnissmässig sehr geringe Anzahl von Rindenfasern gebahnt, diese Bahnung aber ist eine intensive und wird mit der Pause, welche nach Vollendung des Wortes eintritt, sich zur vollen Erregung der Vorstellungsbahnen der Kirsche entwickeln können. (Exner 1894, 307–308).
5. . . . und wer nicht beobachten kann, ist kein Forscher, sondern ein
Bücherwurm.
6. Beim Sprechfehler versagt nur die Aufmerksamkeit, die Maschine läuft ohne
Wächter, sich selbst überlassen.
7. . . . wird beim Versprechen von solchen Lautgesetzen völlig abgesehen.
8. Wieviel klarer sprach die Pythia, als wie sich das Schicksal modernen Freud-
Menschen offenbart! Man müßte verzweifeln, wenn dasselbe Schicksal die Men-
schen nicht auch mit der Psychoanalyse begnadet hätte!
9. Der Zweck unseres Sprechens ist stets der, den Willen oder Erkentniss einer
Person so zu beeinflussen, wie es dem Sprechenden als wertvoll erscheint.
10. Vielmehr ist das Befehlen ein Erlebnis eigener Art, ein Tun des Subjektes,
dem neben seiner Spontaneität, seiner Intentionalität und Fremdpersonalität die
Vernehmungsbedürftigkeit wesentlich ist.
11. . . . das Unvermögen, die Wörter grammatisch zu formen und syntaktisch
im Satze zu ordnen.
12. Dieb gewesen – Schwager auf Posten, gar nichts gemerkt – 2 Tage – in den
Pregel geschmissen – in Königsberg überhaupt sehr schlecht – nur Marken –
nichts zu essen. Mörder später gefunden – aus dem Bett genommen Arbeiter.
13. The law in question is the Law for the Restoration of the Professional Civil
Service (Gesetz zur Wiederherstellung des Berufsbeamtentums).

References

Austin, John Langshaw. 1962. How to Do Things with Words. Oxford: Clarendon
Press.
Bonhoeffer, Karl. 1902. Zur Kenntniss der Rückbildung motorischer Aphasien.
Mitteilungen aus den Grenzgebieten der Medizin und Chirurgie 10: 203–224.
Bloomfield, Leonard. 1914. An Introduction to the Study of Language. New York:
Henry Holt.
Bloomfield, Leonard. [1933] 1976. Language. London: Allen & Unwin.
Cohen, Anthony. 1968. Errors of speech and their implications for understanding
the strategy of language users. Zeitschrift für Phonetik 21 (1–2): 177–181.
Exner, Siegmund. 1894. Entwurf zu einer physiologischen Erklärung der
psychischen Erscheinungen. Vol. 1. Leipzig and Vienna: Franz Deuticke.
Freud, Sigmund. [1901] 1954. Zur Psychopathologie des Alltagslebens. Frankfurt
am Main: Gustav Fischer.
Haarmann, Henk, and Herman H. J. Kolk. 1991. A computer model of the tem-
poral course of agrammatic sentence understanding: The effects of variation in
severity and sentence complexity. Cognitive Science 15 (1): 49–87.
Heilbronner, Karl. 1906. Ueber Agrammatismus und die Störung der inneren
Sprache. Archiv für Psychiatrie und Nervenkrankheiten 41: 653–683.
Herbart, Johann Friedrich. 1824. Psychologie als Wissenschaft, neu gegründet auf
Erfahrung, Metaphysik und Mathematik. 2 vols. Königsberg: Unzer.
Isserlin, Max. 1921. Über Agrammatismus. Zeitschrift für die gesamte Neurologie
und Psychiatrie 75: 332–410.
Isserlin, Max. 1936. Aphasie. In Handbuch der Neurologie, vol. 6, edited by
Oswald Bumke and Otfrid Foerster, 626–807. Berlin: Springer.
Jackendoff, Ray. 1987. Consciousness and the Computational Mind. Cambridge,
MA: MIT Press.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge, MA:
MIT Press.
Jackendoff, Ray. 2007. Language, Consciousness, Culture: Essays on Mental Struc-
ture. Oxford: Oxford University Press.
Jackendoff, Ray. 2012. A User's Guide to Thought and Meaning. Oxford: Oxford
University Press.
Jespersen, Otto. 1922. Language: Its Nature, Development and Origin. New York:
Henry Holt.
Klein, Wolfgang. 2012. Auf dem Markt der Wissenschaften oder: Weniger wäre
mehr. In Herausragende Persönlichkeiten berichten über ihre Begegnung mit
Heidelberg, edited by Karlheinz Sonntag, Heidelberger Profile, 61–84. Heidelberg:
Universitätsverlag Winter.
Kolk, Herman, and Claus Heeschen. 1990. Adaptation symptoms and impairment
symptoms in Broca's aphasia. Aphasiology 4 (3): 221–231.
Kussmaul, Adolf. 1877. Die Störungen der Sprache: Versuch einer Pathologie der
Sprache. In Handbuch der Speciellen Pathologie und Therapie, edited by Hugo
von Ziemssen. Anhang. Leipzig: F. C. W. Vogel.
Levelt, Willem J. M. 2013. A History of Psycholinguistics: The Pre-Chomskyan
Era. Oxford: Oxford University Press.
Lipps, Hans. [1937, 1938] 1958. Die Verbindlichkeit der Sprache. Frankfurt: Vittorio
Klostermann.
Maas, Utz. 2010. Verfolgung und Auswanderung deutschsprachiger Sprach-
forscher. 2 vols. Tübingen: Stauffenburg Verlag.
Marslen-Wilson, William D., and Alan Welsh. 1978. Processing interactions and
lexical access during word-recognition in continuous speech. Cognitive Psychol-
ogy 10 (1): 29–63.
Mendel, Gregor. 1866. Versuche über Pflanzenhybriden. Verhandlungen des
naturforschenden Vereins in Brünn 4: 3–47.
Meringer, Rudolf. 1908. Aus dem Leben der Sprache. Versprechen, Kindersprache,
Nachahmungstrieb. Berlin: Behr.
Meringer, Rudolf. 1923. Die täglichen Fehler im Sprechen, Lesen und Handeln.
Wörter und Sachen 8: 122–140.
Meringer, Rudolf, and Carl Mayer. 1895. Versprechen und Verlesen. Eine
psychologisch-linguistische Studie. Stuttgart: Göschensche Verlagshandlung.
New edition, edited by Anne Cutler and David Fay. Amsterdam: John Benjamins,
1978.
Müller, Friedrich Max. 1887. The Science of Thought. London: Longmans, Green,
and Co.
Nida, Eugene Albert. 1949. Morphology: The Descriptive Analysis of Words. 2nd
edition. Ann Arbor, MI: University of Michigan Press.
Reinach, Adolf. 1913. Die apriorischen Grundlagen des bürgerlichen Rechtes.
Halle: Max Niemeyer.
Skinner, Burrhus Frederic. 1957. Verbal Behavior. Acton, MA: Copley Publishing
Group.
Steinthal, Heymann. 1855. Grammatik, Logik und Psychologie: Ihre Prinzipien
und ihr Verhältniss zu einander. Berlin: F. Dümmler. New edition, Hildesheim:
Georg Olms, 1968.
Steinthal, Heymann. 1881. Einleitung in die Psychologie und Sprachwissenschaft.
Berlin: F. Dümmler.
Stokoe, William C. 1960. Sign Language Structure. Studies in Linguistics Occa-
sional Papers 8. Buffalo, NY: University of Buffalo Press.
Stroop, John Ridley. 1935. Studies of interference in serial verbal reactions.
Journal of Experimental Psychology 18 (6): 643–662.
Watson, John Broadus. 1914. Behavior: An Introduction to Comparative Psychol-
ogy. New York: Henry Holt.
Wegener, Philipp. 1885. Untersuchungen über die Grundfragen des Sprachlebens.
Halle: Max Niemeyer.
Wells, Frederic Lyman. 1906. Linguistic Lapses: With Especial Reference to the
Perception of Linguistic Sounds. Columbia University Contributions to Philoso-
phy and Psychology 14 (3). New York: The Science Press.
Wundt, Wilhelm. 1880. Logik. 2 vols. Stuttgart: Enke.
Wundt, Wilhelm. 1900. Die Sprache. 2 vols. Leipzig: Engelmann.
Wundt, Wilhelm. 1908. Kritische Nachlese zur Ausfragemethode. Archiv für die
gesamte Psychologie 11: 445–459.
III LANGUAGE AND BEYOND
13 Evolution of the Speech Code: Higher-Order
Symbolism and the Grammatical Big Bang

Daniel Silverman

Our speech code may have originated as an accompaniment to a manual
system consisting of iconic gestures (Tomasello 2008). In this scenario,
the speech code broke away from its redundant origins, coming to replace
an iconic visual-receptive system with a symbolic auditory-receptive one.
This qualitative change from (hand-based) iconicity to (speech-based)
symbolism may have quickly evolved to the higher-order symbolic status
that is characteristic of language.
Herein, first-order symbolism refers to a one-to-one correspondence
between (arbitrary) symbol and meaning. It is a consequence of single
vocal symbols produced in isolation. Second-order symbolism evolves
from first-order symbolism as two vocal symbols are juxtaposed, inevi-
tably changing the phonetic character of both. Symbolism of the second
order involves a breakdown of a one-to-one symbol-meaning correspon-
dence, culminating in many-to-one and one-to-many correspondences
between symbol and meaning. Third-order symbolism evolves from
second-order symbolism as a consequence of string-medial phonetic
content being of sporadically ambiguous affiliation between our two
juxtaposed symbols, thus potentially inducing listener confusion: if both
structures are sensibly interpretable, listeners may wonder, "Is the medial
portion of this phonetic event part of the first symbol or the second?"
As will be argued, such semantic ambiguity of structural origin triggers
this phonetic string's analysis into a hierarchical constituent structure by
listeners, thus paving the way for recursion.
As lower orders of symbolism naturally (and perhaps rather suddenly)
evolved to higher orders, we may characterize the beginnings of the
speech code as triggering a grammatical Big Bang.

13.1 Zero-Order Symbolism: The Iconic Manual Gesture

As noted, Tomasello (2008) suggests that our early communication
system may have consisted of iconic hand-based gestures produced in
isolation from one another, just as exists in our primate relatives today.
Such iconic manual gestures were likely to have been non-symbolic (or
zero-order symbolic) in nature. But hand-based visual signaling does
not permit manual multi-tasking, requires close, daytime contact, and
possesses limited cue redundancy, likely rendering it ill-equipped to
jump-start a system as complex as grammar. This is especially true of
iconic visual symbols: regardless of the magnitude of the hand gestures
or the angle from which they were viewed, if there is a direct link
between action and meaning, these gestures' iconic status would likely
have resisted transformation into a symbolic system.
Indeed, even if a manual iconic system has the potential to evolve into
a manual symbolic one, the intervening innovation of a sound-based
system quickly and irrevocably quashed that conceivable trajectory.
Acoustic signaling allows for vocal-manual multi-tasking, does not
require close, daytime contact, and is particularly rife with cue redun-
dancy (Ay, Flack, and Krakauer 2007). Any era of multi-modal commu-
nication (involving both vision and sound) was largely pruned of its
visual component, settling towards a sound mode of sufficient robust
overdesign (Krakauer and Plotkin 2004) to evolve toward higher-order
symbolic status.
Perhaps most importantly, the inherently symbolic character of the
speech code acted to unshackle its semiotic character from the invariant
one-to-one relationship between action and meaning that is characteris-
tic of an iconic, gestural system, culminating in a system possessing both
one-to-many and many-to-one relationships between sound and meaning.
To understand both the simple causes and the complex effects of this
development, we trace the origins of first-order symbolism as sounds are
produced in isolation.

13.2 First-Order Symbolism in the Speech Code: One-to-One Correspondence between Sound and Meaning

The first meaning-imbued sounds of our species (morphemes) may have
quickly settled towards ones involving a sudden expulsion of air from
the mouth due to an oral seal being broken (oral stops) followed by vocal
fold vibration accompanying the oral opening gesture (vowels). There
are articulatory, aerodynamic, acoustic, and auditory reasons for this (the
four As).
Regarding articulation, a complete oral closure followed by its release
is quite easy to produce in comparison to other gestures that have
become part of the speech code, gestures that often require extreme
muscular and timing precision to achieve their characteristic aerody-
namic, acoustic, and auditory traits (Ladefoged and Johnson 2011).
Aerodynamically, this simple articulatory action produces a passively
energized expulsion of air from the vocal tract. As air is the medium of
sound transmission, increased airflow allows for more salient and more
varied sounds. Perhaps especially, upon the breaking of an oral seal and
allowing air to rapidly flow from the lungs and out the mouth, the vocal
folds, when properly postured, may readily engage in vibratory activity
(Rothenberg 1968).
Acoustically, this sudden and forceful expulsion of air produces a
speech signal of comparatively heightened energy, one in which any
number of pitch/phonation (source) and resonance (filter) modifications
might be encoded.
Regarding audition, the mammalian auditory nerve is especially
responsive to sudden increases in acoustic energy (Delgutte 1982; Tyler
et al. 1982); a quick reaction to the sudden breaking of silence provides
obvious survival advantages in predation situations. The incipient speech
code would likely exploit this property from the outset, as it does to this
very day (Bladon 1986).
This nascent oral seal may be at the lips, but also, the flexibility of the
tongue allows both its front to form a seal at the alveolar ridge, and its
back to form a seal at the soft palate. The perceptual product of these
distinct closure locations is three easily-distinguished speech events of
exceptionally short duration. This tripartite perceptual distinction estab-
lishes the conditions for different acoustic signals to encode different
meanings; we might imagine an early stage during which these three
closure postures were in place, coordinated with largely undifferentiated
qualities to their opening postures, perhaps resulting in three sounds,
roughly, pu, ti, ka.
If vocal activity of this nature was indeed harnessed to encode meaning,
the semiotic character of primitive speech was of a first-order state, in
contrast to the zero-order state of the manual-gestural system with which
it may have overlapped: each of the three sounds might encode a single
meaning (maybe "Run!", "Kill!/Eat!", "Sex!"). One arbitrary event cor-
responds to one meaning, and one meaning is cued by one arbitrary
event. Still, despite this move toward a speech-based semiotic system,
this one-to-one correspondence between event and meaning is perhaps
characteristic of almost all animal sound communication systems, though,
to be sure, whereas early human vocalizations were probably both vol-
untary and situation-semantically flexible, animal vocalizations are
almost surely involuntary, situation-reactive, and instinctual (Jackendoff
1999). Even sporadic deceptive and stifled animal calls are amenable to
such an analysis, as such behaviors may be a consequence of a genetically
inherited probability of use and disuse.
Nonetheless, we are far from grammar.

13.3 Second-Order Symbolism in the Speech Code: One-to-Many and Many-to-One Correspondence between Sound and Meaning

How might second-order symbolism have evolved from these modest
beginnings?

13.3.1 Innovating the Juxtaposition of Two Symbols, and the Rise of Compositionality

Consider the physical consequences of producing two of our meaning-
imbued sounds in quick succession. Exhaustively, these are pu-pu, pu-ti,
pu-ka, ti-pu, ti-ti, ti-ka, ka-pu, ka-ti, and ka-ka. There is any number of
ways that such complexity might develop. For example, two-sound
sequences may represent an assemblage of a complex verb-like element,
say Run! Kill!/Eat! (pu-ti) or Kill!/Eat! Run! (ti-pu), either of which
might convey a passive predation warning (Run if you dont want to get
killed and eaten (by that animal)!) or an active predation call (Run to
kill and eat (that animal)!). Alternatively, two sounds may be strung
together to name more objects or events in a nascent form of noun-noun
compounding. Both of these structure-building strategies are present in
perhaps all languages, of course, but while we will return to the increased
semantic complexity that results from such groupings of sounds, for now,
consider their phonetic complexities, complexities that culminate in
second-order symbolism.
Indeed, from the moment that a juxtaposition of two sounds is regu-
larly produced, the speech signal is irrevocably transformed into a
second-order symbolic system. Here's why: as one sound is juxtaposed
to another, each of the sounds undergoes a systematic change in its pho-
netic character. Consider pu-ti. Here, the end of the first sound is sys-
tematically modified by the immediate succession of the second, and
likewise, the second sound is systematically modified by the immediate
precedence of the first. After all, the vocal tract posture that accompanies
one sound cannot instantaneously transform into the posture that accom-
panies another sound. Rather, the postures affect each other, and the
acoustic signal follows suit (hman 1966). So, whereas until this time
there had been a one-to-one correspondence between sound and
meaning, now, instantly and irrevocably, this correspondence is sabo-
taged: there is now a many-to-one correspondence between sound and
meaning (allomorphy), a situation found in all languages (Silverman
2006). Under the plausible assumption that compositionality is main-
tained at these early stages, now it is two sounds that correspond to one
meaning: pu- when immediately followed by -ti is systematically phoneti-
cally distinct (though semantically non-distinct) from pu in isolation; -ti
when immediately preceded by pu- is systematically phonetically
distinct (though semantically non-distinct) from ti in isolation. The jux-
taposition of one sound to another thus opens the floodgates to second-
order symbolism.
Consequently, as these sound complexes are repeated and repeated in
their appropriate real-world contexts, new sounds inevitably arise. This
is certainly true of oral openings when they come to immediately precede
oral closures, but for now, consider the oral closures themselves. While
constant repetition of juxtaposed sounds in appropriate situations may
serve to reinforce their semantic constancy, it is this very repetition that
induces their phonetic change (Kruszewski [1883] 1995). For example, the
medial closure in our pu-ti example may eventually undergo a process
of voicing, becoming pu-di; voicing a mouth closure between mouth
openings is a very natural phonetic development (Rothenberg 1968), one
frequently encountered in the languages of the world. At this point, both
ti- and -di correspond to a single meaning (remember, we are assuming
compositionality): every time ti (with a voiceless closure) is immediately
preceded by another sound, it is replaced by (alternates with) di (with a
voiced closure). Again, this systematic change in sound does not expand
the inventory of meanings, but it does expand the inventory of motor
routines put in service to encoding this meaning.
But now, with a larger garrison of sounds to deploy, a huge expansion
of the semantic inventory becomes possible, one that is able to meet the
needs of our species' increasingly sophisticated cognitive and social
structures. Motor routines and sounds that have heretofore corresponded
to a single meaning may now unhinge themselves from their predictable
contexts, to be cycled and recycled in ever-increasing and unpredictable
ways: -di, for example, may now come to be associated with an additional
meaning, and thus becomes free to appear as the first element of a
complex, for example, di-bu (as opposed to a different complex, ti-bu).
Note that the articulatory properties of these initial di-s are slightly dis-
tinct from -di (typically involving an expanded pharynx and lowered
larynx during oral closure in order to maintain trans-glottal airflow,
hence voicing), but nonetheless correspond to -di quite well in acoustic
terms.
This sort of simple and natural sound change sets in motion a massive
increase in the systems complexity. For example, newly-voiced medial
closures may undergo further sound changes, to be harnessed for new
meanings: when the di of di-bu is placed in second position (for example,
ka-di), it is pronounced with closure voicing, comparable to the closure
voicing that had earlier been added to -ti in this context (for example,
earlier bu-ti, now bu-di). That is, two different meanings are now cued
by the same sounds in comparable or even identical contexts. We may
have bu-di in which -di means one thing, but also bu-di in which -di
means something else. This establishes a one-to-many relationship
between sound and meaning (derived homophony), a development also
found in all languages (Silverman 2012).
If many sounds each came to symbolize more than one meaning, lis-
tener confusion and communicative failure may result. Such a scenario
will not come to pass, however (Martinet 1952; Labov 1994; Silverman
2012). Defeating the pervasiveness of this potentially function-negative
development, the di- of di-bu may passively undergo another change
when placed in second position: some spontaneous productions of origi-
nal -di that possess a slight weakening of their voiced closures may evolve
towards a new value, perhaps, -zi, so we have bu-di (earlier bu-ti), and a
different form, bu-zi (earlier bu-di; still earlier, bu-ti). Indeed, such sound
patterns are likely to take hold exactly because of their function-positive
consequences: creeping phonetic patterns that inhibit undue listener
confusion are likely to be replicated and conventionalized. In short, suc-
cessful speech propagates; failed speech falls by the wayside.
This means we now have di- alternating with -zi, both meaning one
thing, and, recall, we have -di alternating with ti-, both meaning another.
The co-evolution of these many-to-one relationships between sound and
meaning results in many meaningful elements of the speech signal pos-
sessing both systematic phonetic variation and semantic stability, even
across varied contexts. Now, in turn, this new sound zi may unhinge itself
from its context and be deployed to signal new meanings.
Such speech patterns are found time and again in both (diachronic)
sound changes and (synchronic) sound alternations (Gurevich 2004).
It is now clear that the mere juxtaposition of two simple sounds trig-
gers remarkable growth and complexity of both the phonetic and the
semantic inventories. Both one-to-many and many-to-one correspon-
dences between sound and meaning naturally emerge. This is second-
order symbolism.
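
The combinatorics behind this growth can be made concrete with a small sketch. The following Python fragment is purely illustrative and is not the author's formalism: the three-item lexicon, the meanings, and the blanket rule that voices the stop of a non-initial syllable are our own simplifying assumptions. Juxtaposing pu, ti, and ka and applying the voicing change yields surface forms in which one meaning comes to be carried by more than one sound (allomorphy), while a re-used derived form could come to carry more than one meaning (homophony).

```python
from itertools import product

# Hypothetical first-order lexicon: one isolated sound, one meaning (our assumption).
lexicon = {"pu": "RUN", "ti": "KILL/EAT", "ka": "SEX"}

def voice(syllable):
    # Assumed sound change for the sketch: the stop of a non-initial syllable is voiced.
    return syllable.translate(str.maketrans("ptk", "bdg"))

surface_forms = {}
for first, second in product(lexicon, repeat=2):
    surface = first + "-" + voice(second)          # e.g. pu + ti is realized as pu-di
    surface_forms[surface] = (lexicon[first], lexicon[second])

for form, meanings in sorted(surface_forms.items()):
    print(form, "->", meanings)

# Many-to-one (allomorphy): ti in isolation and -di in second position encode the same meaning.
# One-to-many (homophony): if -di is later recruited for a new meaning, a surface string
# such as pu-di comes to map onto more than one meaning.
```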

13.3.2 More Examples, More Complexity


Recall that maintaining vocal fold vibration during an oral closure in
utterance-initial position is aerodynamically unnatural, often involving
an actively expanded pharynx and lowered larynx. Consequently, newly-
evolved bu-, di-, and ga- might gradually lose this voicing, thus running
the risk of sounding the same as pu-, ti-, and ka-. If this occurs, then those
spontaneous productions of original pu-, ti-, and ka- that possess a slight
delay in voicing may emerge as new and different sounds, phu-, thi-, and
kha-, which now, again, may unhinge themselves and acquire new mean-
ings, thus allowing them to appear in second position: -phu, -thi, and -kha.
Alternatively, bu-, di-, and ga- may come to be accompanied by velic
venting during their oral closures, thus again maintaining their phonetic
distinctness, mbu-, ndi-, and ga-, which, as expected now, may unhinge
and recombine as -mbu, -ndi, and -ga, thus opening the gates to pho-
notactic complexity, say, ti-mbu, ti-ndi, and ti-ga, and of course, creating
more fodder for an expanding inventory of sounds with semiotic func-
tion. Another possibility is that the pitch-lowering effect that naturally
accompanies voiced closure releases may, over time, migrate to pervade
the opening, coming to replace closure voicing itself, and so becoming a
pitch distinction that the language may now recycle: bu-, di-, and ga- as
distinct from pu-, ti-, and ka- yields to p-, t-, and k- as distinct from
p-, t-, and k-, respectively.
All of these hypothetical developments are not merely proposed char-
acteristics of the nascent speech-based system. Rather, they are encoun-
tered over and over again in the history of language change. This is not
a coincidence. Modern-day pressures on sound patterning are not merely
characteristic of the modern-day grammatical system. Rather, they
may have been in place long before the grammatical system came into
existence, acting as a driving and inertial pressure on the very develop-
ment of grammar itself. Natural systematic phonetic changes are not
merely a result of grammatical complexity; they are a very cause of this
complexity.
To summarize, the juxtaposition of two simple speech sounds may
evolve to convey increasingly complex meanings. Such juxtapositions
necessarily change the phonetic character of both sounds in systematic
ways. These sound complexes may also be harnessed to encode new
meanings, thus precipitating an explosive growth in the complexity of
both the phonetic and the semantic inventories. The consequent sound
complexes now achieve second-order symbolic status: both many-to-one
and one-to-many sound-meaning correspondences come to be present
in the speech code. Still, on rare occasions, certain of these complexes
may result in semantic ambiguity, hence listener confusion and commu-
nicative failure.

13.3.3 Entrenching the Juxtaposition of Two Symbols, and the Rise of Post-Compositionality

Repeated usage of these compositionally transparent two-symbol struc-
tures not only induces the sorts of phonetic changes just considered, but
may actually trigger the loss of compositionality itself, resulting in even
more complex sounds with semiotic value. For example, compositional
pu-ti possesses a meaning that is transparently built from pu and ti. But
through its constant use and re-use, in addition to its phonetic changes,
it may lose its link to its semantic origins, and thus become stranded as
a semantic primitive (Kruszewski [1883] 1995), becoming post-
compositional. The now-opaque form (perhaps puti, pudi, phuzi, or pt)
becomes a single sound that correlates with a single meaning, thus
embodying a counter-pressure back towards first-order symbolism, even
as the system becomes increasingly phonetically complex.
This tug-of-war between first-order and second-order symbolic states
induces a lengthening of our meaning-impregnated sounds. Whereas
earlier, the juxtaposition of one sound to another involved only two
mouth-opening gestures (of increasingly varied forms), now such
juxtapositions may involve three or four opening gestures, for example,
puti-ka, puti-kati, etc.
We are moving closer to grammar.

13.4 Third-Order Symbolism in the Speech Code: The Ambiguous Affiliation of String-Medial Content, and the Triggering of Hierarchical Constituent Structure and Recursion

Semantic ambiguity of structural origin feeds a hierarchical constituent-structural analysis, which in turn feeds recursion. Let's consider how.

13.4.1 The Tug-of-War between First-Order and Second-Order Symbolism


There are now pressures towards, and pressures against the development
of third-order symbolism. We first consider a passive resistance to the
triggering of third-order symbolism.
We have been assuming that context-induced phonetic changes to
sounds inevitably trigger their unhinging, such that they may now be
assigned additional meanings, and thus come to freely combine in new
ways (recall, if pu-ti becomes pu-di, the new sound involving vocal fold
vibration during the oral closure, -di, may now be assigned an additional
meaning, thus freeing itself from the shackles of its context, allowing for
di-). Still, if more and more sounds combine into wholly unconstrained
sequences, a genuinely damaging ambiguity-of-meaning may result, in
the form of an excess of one-to-many correspondences between sound
and meaning. For example, the string putika may be ambiguous between
compositional pu-tika and puti-ka.
Recall that successful speech propagates and conventionalizes; failed
speech falls by the wayside. Speech sounds may thus be subject to a
passive curtailment in their distribution such that certain sounds are only
found in certain contexts. For example, perhaps our closure-voicing
development comes to be limited to sound-medial position, and never
takes place sound-initially, thus pu-tiga and pudi-ka. Closure voicing now
acts to cue the compositionality of the forms. Every language passively
evolves such patterns, which sometimes go by the name of boundary
signals (Trubetzkoy [1939] 1969). In our example, voiced closures indi-
cate the absence of a boundary; voiceless closures the presence of a
boundary. Heterophony and clarity of meaning are thus maintained in a
decidedly passive way, simply because those speech signals that are not
semantically ambiguous are likely to be the very ones that are commu-
nicated successfully, hence imitated and conventionalized. Indeed, in
many languages, the phonetic properties of word-initial oral obstructions
are different from these properties in word-medial position, thus often
serving this boundary-signaling function.
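
The disambiguating work done by such a boundary signal can be illustrated with a short sketch. Everything in the following fragment is our own toy reconstruction of the chapter's scenario, not a proposal of the author's: the lexicon, the surface forms pu-tiga and pudi-ka, and the segmentation routine are invented for the example. Without the cue, the string putika supports several segmentations; once medial closure voicing is in place, each surface string supports exactly one.

```python
def segmentations(string, words):
    # All ways of segmenting `string` into items of the hypothetical lexicon `words`.
    if string == "":
        return [[]]
    return [[string[:i]] + rest
            for i in range(1, len(string) + 1) if string[:i] in words
            for rest in segmentations(string[i:], words)]

# Without a boundary signal, putika is structurally ambiguous:
plain = {"pu", "ti", "ka", "puti", "tika"}
print(segmentations("putika", plain))
# -> [['pu', 'ti', 'ka'], ['pu', 'tika'], ['puti', 'ka']]

# With medial closure voicing as a boundary signal, compositional pu+tika surfaces as
# pu-tiga and puti+ka as pudi-ka, and each surface string now has a single analysis:
voiced = {"pu", "ti", "ka", "pudi", "tiga"}
print(segmentations("putiga", voiced))   # -> [['pu', 'tiga']]
print(segmentations("pudika", voiced))   # -> [['pudi', 'ka']]
```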
Still, even in the absence of these particular sorts of boundary signals,
most languages have extremely reliable cues to boundaries in the form
of prominence or stress. Let's return to putika. Even in the absence of
medial closure voicing, clarity of compositional structure may be con-
veyed by prominence, thus pu-tika or puti-ka; one prominence per
semantic primitive. These prominence distinctions serve to structurally
(and, in most cases, semantically) disambiguate strings that might other-
wise sound the same.
Reflecting its proposed origins as an aid in disambiguating these early
two-sound structures, prominence (linguistic stress) typically involves a
binary strong-weak or weak-strong rhythmic pattern at word edges, often
iteratively applied in accommodation to the inevitably increased length
of meaningful elements of the speech stream, that is, words and phrases
(Hayes 1995). The role of prominence as a binary phonetic structure that
originally cued a binary semantic structure thus persists, in remarkably
comparable function and form, up to the present day.
In sum, the juxtaposition of a very small inventory of simple meaning-
imbued sounds may inevitably lead to an explosion of phonetic and
semantic complexity, rife with cues to structure and meaning, of the sort
possessed by all languages. This complexity now sets the stage for full-
blown grammar to emerge, as second-order symbolism yields to symbol-
ism of the third order.

13.4.2 The Ambiguous Affiliation of String-Medial Content, and the Rise of Hierarchical Constituent Structure

Boundary signals are not ubiquitous; not in grammar, and almost cer-
tainly not in these early stages of pre-grammar. In the absence of such
signals, a genuine counter-functional ambiguity will, on occasion, be
present in the speech code. Indeed, it is the very ambiguity of some of
our increasingly complex sound strings that establishes the conditions
for third-order symbolism to arise. Consider our putika case again
(assuming the absence of any boundary-signaling content). At these
early stages, recall that at least two structures and meanings may be
paired with this single phonetic string: pu-tika and puti-ka.
In most cases, real-world context will serve a disambiguating function,
but once in a while, genuine ambiguity prompts a deeper structural
analysis by listeners ("Is it pu-tika or puti-ka?"). The very moment listen-
ers consider competing structures and their associated meanings, they
are engaging in constituent analysis: the potential for hierarchically-
structured strings suddenly becomes a reality.
The semantic ambiguity exemplified by pu-tika versus puti-ka is of
another, higher order than what we have considered thus far; it is an
ambiguity rooted in structure. Listeners now-conditioned expectation of
binarity, coupled with the strings semantic ambiguity, triggers its deeper,
higher-order analysis. Structural ambiguity, then, opens the gateway to
third-order symbolism, by requiring listeners to perform a deeper struc-
tural analysis of the sounds than had been heretofore required. The
ambiguous affiliation of the middle term thus opens the gates to hierar-
chical structure.
Of course, these multiple interpretations of particular phonetic strings
should be few and far between, since most strings possess (1) sound-
sequencing cues, (2) meaning-sequencing cues, and (3) pragmatic cues to
the intended structure and meaning of the string. Consequently, and most
interestingly, it is exactly those rarely-encountered ambiguous forms that
are most important for the development of the system toward third-
order symbolic status. We turn to this issue now.

13.4.3 Hierarchical Constituent Structure, and the Rise of Recursion


Consider a longer string that is ambiguous, putikakatipu. This string
might be intended by the speaker as, say, putika-katipu, and yet is open
to a number of interpretations by the listener. For example, imagine the
ambiguous affiliation of its middle content, kaka. As listeners impose
binarity, both putikaka-tipu and puti-kakatipu may be interpreted,
assuming each of these makes sense to the listener. So far, this is exactly
the scenario just considered with respect to putika.
Clearly though, in comparison to putika, this longer string is pregnant
with many more structures and meanings. Consider [[pu-ti]-kaka]-
tipu, or puti-[kaka-[ti-pu]], or [[puti]-ka]-[[kati]-pu], etc., some of which
might be sensibly interpretable by listeners under the appropriate real-
world conditions, even if the speaker intends a flat non-hierarchical
binary structure. Again, it is listeners' expectation of binarity, coupled
with the semantic ambiguity of the string, that triggers these strings'
deeper structural analyses, analyses that quickly culminate in both
hierarchical and now recursive structures, when embedding involves
elements of the same type. Indeed, recursion is considered by some
to be a primary characteristic of grammar (Hauser, Chomsky, and
Fitch 2002).
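
The scale of this structural underdetermination can be made concrete with a small sketch. The fragment below is ours and is only an illustration: the lexicon is invented, and real listeners obviously do not enumerate analyses exhaustively. It lists every binary-branching constituent structure that a toy lexicon makes available for the ambiguous string putikakatipu; the nested analyses it produces are exactly the hierarchical, recursive configurations described above.

```python
from functools import lru_cache

# Hypothetical lexicon of meaning-bearing sounds (our assumption, for illustration only).
LEXICON = frozenset({"pu", "ti", "ka", "puti", "tika", "kaka", "tipu", "kati",
                     "putika", "katipu"})

@lru_cache(maxsize=None)
def parses(s):
    # Every binary-branching constituent structure whose leaves are lexicon items.
    results = [s] if s in LEXICON else []
    for i in range(1, len(s)):
        for left in parses(s[:i]):
            for right in parses(s[i:]):
                results.append((left, right))   # hierarchical, potentially recursive grouping
    return tuple(results)

trees = parses("putikakatipu")
print(len(trees), "analyses; a few of them:")
for tree in trees[:4]:
    print(tree)

# Among the analyses are two-part groupings such as ('putika', 'katipu') as well as
# nested ones such as (('pu', 'ti'), ('kaka', ('ti', 'pu'))): the ambiguous affiliation
# of the middle span is what makes this range of structures available to the listener.
```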
In sum, the phonetic product of two juxtaposed sounds of increased
length may lack semantic clarity, due to the ambiguous affiliation of its
middle span. These ambiguous forms prompt deeper structural analyses
on the part of listeners, culminating in both hierarchical and eventually
recursive configurations. In short, semantic ambiguity of structural origin
drives grammatical complexity.
All the major structural components of grammar have now emerged:
a lexicon, a phonology, a morphology, a syntax, a semantics. All the rest
is commentary. Now go study.

13.5 Discussion

When it comes to the origins of grammar, the search for evidence typi-
cally encompasses four domains:
1. Naturally occurring sub-language states in child learners, pidgins,
innovated signed languages, and impeded speech (due to drunkenness,
semi-consciousness, or pain, for example)
2. Ape-training studies
3. Laboratory experiments
4. Computer simulations
The present proposals exploit a fifth domain of inquiry, one of internal
reconstruction (Saussure 1879) taken to its final frontier. Internal recon-
struction is a method for investigating the origins of grammar in much the same
way that observing the receding of distant galaxies is a method for investigating
the real Big Bang: we observe extant pressures on structure and change,
and extrapolate them to their logical origins.
Several advantages arise from this approach to the origin of grammar.
1. These proposals properly treat language as a complex adaptive
system (Steels 2000; Beckner et al. 2009), one that is inherently social,
involving both speakers and listeners; one that is inherently dynamic,
involving competing pressures, and thus allowing for adaptive change;
one whose structures are wholly emergent; one that affectsand is
affected bythe co-evolutionary interactions of biological, cognitive,
and social structures.
2. The present approach strictly adheres to the tenets of Uniformitari-
anism (Hutton 1795; Lyell 1830–1833). As noted, the proposed pressures
and emergent structures by which the system originated remain in place
to this very day. And Uniformitarianism does not rule out the pos-
sibility of punctuated equilibrium (Eldredge and Gould 1972); indeed,
the proposed grammatical Big Bang embodies this phenomenon. Still,
saltation itself is fully absent: natura non facit saltum.
3. Speaker-based approaches to the evolution of grammar and gram-
matical change, as compared to listener-based approaches, are not equals-
and-opposites: production is solely relevant at the level of the speaker
(not the listener), whereas perception crucially relies on a role for both
the speaker and the listener. That is, perception is inherently dependent
on the interlocutionary event, whereas production is not. With its empha-
sis on the interlocutionary event itself, the present approach properly
situates the origins of grammar in the social world, a domain necessarily
involving both producer and perceiver. Thus, unlike speaker-based
approaches, which sometimes propose a single mutation in a single
individual as the trigger of the grammatical Big Bang (for example,
Bickerton 1990; Hauser, Chomsky, and Fitch 2002), the present approach
allows for a genotypic change in a group of individuals that may have
been in place well before its phenotypic expression.
4. There need be no debate over whether grammar has its origins in a
system of cognitive organization rather than a system of cognitive expres-
sion (Bickerton 1990). Rather, as an emergent consequence of sporadic
semantic ambiguity in the sound signal, grammatical structures passively
come into being due to a necessary interaction between speaker and
listener, and most pertinently, these emergent structures necessarily
affect both organization and expression: speakers' structurally ambigu-
ous productions trigger listeners' higher-order structural analyses.
5. The current proposals take a decidedly holistic or Gestalt-based
view of both language structure and language evolution. Indeed, it would
be incorrect to characterize the present approach as one in which pho-
nology precedes syntax, or syntax precedes phonology, or anything
comparable. Rather, both phonetics and semanticsthe only compo-
nents of language that are empirically discernible by both language users
and language analysts (Kiparsky 1973)are inherently intertwined from
the outset. So-called intermediate levels of grammatical structure
phonology and syntaxpassively emerge from these two components
necessary interaction (and may, in fact, have no independent structural
standing).
6. Some assert that our sound communication system has achieved its
final state in the form of spoken language (Bickerton 1990; Mithen 1996;
Hauser, Chomsky, and Fitch 2002). For example, Mithen proposes that
language was triggered when supposedly distinct modules of intelli-
gence (Fodor 1983) eventually coalesced into one, oddly likening this
supposedly fully-culminated end-state of the human mind to a Christian
house of worship. The present approach imposes no such upper limit on
the evolution of the system. Indeed, perhaps the very same pressures that
gave rise to the system, and that continue to shape and change it,
also allow its evolution towards new, as-yet-unfathomed states of com-
plexity. For example, in coordination with vocal tract, brain, and social
changes, a slow decay of linearity (in the form of increased temporal
overlap of morpho-syntactic content) may result in an increase in both
parallel production (Mattingly 1981) and parallel processing (Rumelhart,
McClelland, and the PDP Research Group 1986); the present-day sen-
tence might shorten to present-day word length, and in turn, these
evolved word-sentences may be subject to an additional level of hier-
archical and recursive arrangement. The semantic content of these
higher-level (fourth-order?) structures, whatever they might turn out to
be, may force a re-evaluation of the present-day system as one of infi-
nite expressivity (Kirby 2007). Indeed, certain present-day languages
already reverberate with the stirrings of such properties: witness the
polysynthetic languages of North America, and the stem-modifying
languages of Meso-America and East Africa.
7. The present approach to the origins of grammar incorporates
degeneracy as an important component in its evolution: comparable
forms may have distinct functions, and single functions may be underlain
by multiple, different forms. Degeneracy may be a crucial element to the
introduction of hierarchical complexity in any complex adaptive system
(Whitacre 2010; see also Firth [1948] for an analysis in a specifically
linguistic context). Earlier employed to characterize genetic and biologi-
cal systems (Edelman and Gally 2001), degeneracy may be characteristic
of any system when categories are at once sufficiently robust to fulfill
and maintain their function (stability) and also sufficiently variable to
be under constant modification (evolvability). Clearly, the presence of
second-order symbolism, with its one-to-many and many-to-one rela-
tions between form (sound) and function (meaning) paving the way to
third-order symbolism (hierarchical and recursive structures), is the
analog of this trait in the evolution of the speech code: a degenerative
grammar.

13.6 Conclusion: The Grammatical Big Bang

It may or may not be relevant that the acquisition of grammar by children
proceeds on a trajectory that reasonably hugs the levels of complexity
proposed herein for the origins of grammar itself, just as it may or may
not be relevant that implicational hierarchies concerning phonotactic
complexity also fit rather snugly into these proposals.
Still, there is likely no evolutionary-biological privilege bestowed upon
the primordial binary configuration that is characteristic of so many
grammatical structures, just as there is no evolutionary-biological privi-
lege bestowed upon the pentadactyl configuration among our planets
tetrapods. In both cases, there was merely a sensitivity to an initial
complex of conditions that culminated in these features' prominent role
in the evolution of species.
Regarding these initial conditions, again, the humble origins of the
speech code may have consisted of extremely short, meaning-imbued
sounds uttered in isolation that first accompanied, and then replaced a
manual iconic communication system. These sounds' yielding to their
juxtaposition in pairs may indeed have triggered a sort of grammatical
Big Bang. Phonetic and semantic pressures came to interact in a way that
inexorably, and perhaps rather suddenly, led to genuine grammatical
complexity: the conditioned expectation of binarity, coupled with the
sporadic semantic ambiguity of these increasingly long structures,
prompted listeners to perform deeper analyses in order to extract their
meaning, which in turn triggered the emergence of hierarchical and
recursive grammatical structures.
Again, semantic ambiguity of structural origin drives grammatical
complexity.
These primordial pressures and their yielded structures, in remarkably
similar function and form, continue to constrain, shape, and change the
speech code, even unto this very day, and beyond.

13.7 Acknowledgments

Many thanks to James Winters, Paul Willis, Devin Casenhiser, and to
Simon Kirby and members of the Edinburgh University Language Evo-
lution and Computation Research Unit. Thanks especially to the editors
of this volume, and the reviewers of this submission. All embarrassing
errors are mine and mine alone. Happy birthday Ray, and thank you for
being my teacher all those years ago!

References

Ay, Nihat, Jessica C. Flack, and David C. Krakauer. 2007. Robustness and com-
plexity co-constructed in multi-modal signaling networks. Philosophical Transac-
tions of the Royal Society of London B 362 (1479): 441–447.
Beckner, Clay, Richard Blythe, Joan Bybee, Morton H. Christiansen, William
Croft, Nick C. Ellis, John Holland, Jinyun Ke, Diane Larsen-Freeman, and Tom
Schoenemann. 2009. Language is a complex adaptive system: Position paper.
Language Learning 59 (s1): 1–26.
Bickerton, Derek. 1990. Language and Species. Chicago: University of Chicago
Press.
Bladon, Anthony. 1986. Phonetics for hearers. In Language for Hearers, edited
by Graham McGregor, 1–24. Oxford: Pergamon Press.
Delgutte, Bertrand. 1982. Some correlates of phonetic distinctions at the level of
the auditory nerve. In The Representation of Speech in the Peripheral Auditory
System, edited by Rolf Carlson and Björn Granström, 131–150. Amsterdam:
Elsevier Biomedical.
Edelman, Gerald M., and Joseph A. Gally. 2001. Degeneracy and complexity in
biological systems. Proceedings of the National Academy of Sciences of the United
States of America 98 (24): 13763–13768.
Eldredge, Niles, and Stephen J. Gould. 1972. Punctuated equilibria: An alterna-
tive to phyletic gradualism. In Models in Paleobiology, edited by Thomas J. M.
Schopf, 82–115. San Francisco: Freeman Cooper.
Firth, John R. 1948. Sounds and prosodies. Transactions of the Philological Society
47: 127–152.
Fodor, Jerry A. 1983. Modularity of Mind: An Essay on Faculty Psychology.
Cambridge, MA: MIT Press.
Gurevich, Naomi. 2004. Lenition and Contrast: The Functional Consequences of
Certain Phonetically Conditioned Sound Changes. New York: Routledge.
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch. 2002. The faculty of
language: What is it, who has it, and how did it evolve? Science 298 (5598):
1569–1579.
Hayes, Bruce. 1995. Metrical Stress Theory: Principles and Case Studies. Chicago:
University of Chicago Press.
Hutton, James. 1795. Theory of the Earth; with Proofs and Illustrations. Edin-
burgh: Creech.
Jackendoff, Ray. 1999. Possible stages in the evolution of the language capacity.
Trends in Cognitive Sciences 3 (7): 272–279.
Kiparsky, Paul. 1973. How abstract is phonology? In Three Dimensions of Lin-
guistic Theory, edited by Osamu Fujimura, 5–56. Tokyo: The TEC Corporation.
Kirby, Simon. 2007. The evolution of language. In Oxford Handbook of Evolu-
tionary Psychology, edited by Robin Ian MacDonald Dunbar and Louise Barrett,
669–681. Oxford: Oxford University Press.
Krakauer, David C., and Joshua B. Plotkin. 2004. Principles and parameters of
molecular robustness. In Robust Design: A Repertoire for Biology, Ecology and
Engineering, edited by Erica Jen, 115–133. Oxford: Oxford University Press.
Kruszewski, Mikołaj. [1883] 1995. Očerk Nauki O Jazyke (An Outline of Linguis-
tic Science). Translated by Gregory M. Eramian. In Writings in General Linguis-
tics, edited by Ernst Frideryk Konrad Koerner, 43–174. Amsterdam Classics in
Linguistics 11. Amsterdam: John Benjamins.
Labov, William. 1994. Principles of Linguistic Change: Internal Factors. Oxford:
Blackwell.
Ladefoged, Peter, and Keith Johnson. 2011. A Course in Phonetics. 6th ed. Inde-
pendence, KY: Wadsworth, Cengage Learning.
Lyell, Charles. 1830–1833. Principles of Geology, Being an Attempt to Explain the
Former Changes of the Earth's Surface, by Reference to Causes Now in Operation.
3 vols. London: John Murray.
Martinet, André. 1952. Function, structure, and sound change. Word 8 (2): 1–32.
Mattingly, Ignatius G. 1981. Phonetic representations and speech synthesis by
rule. In The Cognitive Representation of Speech, edited by Terry Myers, John
Laver, and John Anderson, 415–419. Amsterdam: North-Holland Publishing
Company.
Mithen, Steven J. 1996. The Prehistory of the Mind: The Cognitive Origins of Art
and Science. London: Thames and Hudson.
Öhman, Sven. 1966. Coarticulation in VCV utterances: Spectrographic mea-
surements. Journal of the Acoustical Society of America 39: 151–168.
Rothenberg, Martin. 1968. The Breath-Stream Dynamics of Simple Released-
Plosive Production. Basel: S. Karger.
Rumelhart, David E., James L. McClelland, and the PDP Research Group. 1986.
Parallel Distributed Processing: Explorations in the Microstructure of Cognition.
Vol. 1. Foundations. Cambridge, MA: Bradford Books/MIT Press.
Saussure, Ferdinand de. 1879. Mmoire sur le systme primitif des voyelles dans
les langues indo-europennes. Leipzig: Teubner.
Silverman, Daniel. 2006. A Critical Introduction to Phonology: Of Sound, Mind,
and Body. London/New York: Continuum.
Silverman, Daniel. 2012. Neutralization (Rhyme and Reason in Phonology). New
York: Cambridge University Press.
Steels, Luc. 2000. Language as a complex adaptive system. In Parallel Problem
Solving from Nature. PPSN-VI, edited by Marc Schoenauer, Kalyanmoy Deb,
Gnther Rudolph, Xin Yao, Evelyne Lutton, Juan Julian Merelo, and Hans-Paul
Schwefel, 17–26. Lecture Notes in Computer Science 3242. Heidelberg:
Springer-Verlag.
Tomasello, Michael. 2008. Origins of Human Communication. Cambridge, MA:
MIT Press.
Trubetzkoy, Nikolai S. [1939] 1969. Principles of Phonology. Berkeley: University
of California Press.
Tyler, Richard S., Quentin Summerfield, Elizabeth J. Wood, and Mariano A.
Fernandes. 1982. Psychoacoustic and phonetic temporal processing in normal and
hearing-impaired listeners. Journal of the Acoustical Society of America 72 (3):
740–752.
Whitacre, James M. 2010. Degeneracy: A link between evolvability, robustness
and complexity in biological systems. Theoretical Biology and Medical Modelling
7 (6): 1–17.
14 Arbitrariness and Iconicity in the Syntax-Semantics
Interface: An Evolutionary Perspective

Heike Wiese and Eva Wittenberg

14.1 Introduction

Most of Ray Jackendoff's work is concerned with cognitive systems such
as language (with its subsystems, i.e., syntax, semantics, and phonology),
music, and social and spatial cognition. Throughout his career, he has
been interested in how these systems are represented in the mind, how
they interact, and how they came about through evolution. In this article,
we offer a perspective on several of these systems, focusing on the struc-
tural parallelisms between different levels of representation.
Over the past thirty years, Ray Jackendoff has developed the model
of the Parallel Architecture for the language faculty, a powerful model
that effortlessly integrates insights from other areas of research on the
human mind, such as vision, music, and social cognition (Jackendoff 1997,
2002, 2007, 2012; Culicover and Jackendoff 2005). This model has been
deeply influential not only on research in theoretical linguistics, but also
on work in psycholinguistics and cognitive science. At this point, its basic
insights have become unspoken conventions in much empirically founded
research on language.
On a personal note, for the two of us, Ray's work has had an impact
lasting two generations. Heike first worked with Ray as a postdoc at
Brandeis University in the late 1990s, while on leave from Humboldt
University Berlin, funded by a DAAD stipend. This gave her the chance
to spend time in Ray's department. This visit provided the venue for
many intense discussions with Ray on grammatical organization, seman-
tics, and human cognition; it offered insights that have fundamentally
shaped the way she thinks about linguistic architecture. From these roots
sprung her habilitation thesis, where she carried his notion of interfaces
further, integrated it into a formal model, and related it to questions of
linguistic processing. Ray's influence has also had a lasting impact
generally on the way she approaches linguistic phenomena and tries to
connect them with other domains of human cognition.
Eva, a former master's student and now a PhD student with
Heike, received her first linguistic training in the Parallel Architecture.
After a short visit at Tufts University to do research for her master's thesis,
she returned to work with Ray on the processing of light verb construc-
tions and on a model of semantic composition. The focus of Eva's work is
on psycholinguistics, where many aspects of the Parallel Architecture
have been validated, and her conversations with Ray continue to shape
her thinking about linguistics and cognitive science in general.
In this paper, we spotlight a central feature of the Parallel Architecture,
namely the parallelisms and interfaces between different levels of repre-
sentation. In doing so, we attempt to synthesize many of Jackendoff's
wide-ranging interests, from the overall architecture of language to music,
evolution, and linguistic processing. We introduce a perspective on rituals
into an evolutionary account of linguistic architecture, show how paral-
lelisms in sound and meaning domains might have acted as stepping
stones for the emergence of linguistic symbols, and argue that direct
parallelisms between linguistic levels of representation are still the pre-
ferred option in modern language.

14.2 Linguistic Parallelisms and Symbolic Structure: Dependent Links as a Central Feature of Language

The Parallel Architecture model recognizes a key problem that we face
as language users: the concepts we want to convey are structured hier-
archically within semantic representations, but the sound waves that
carry our message phonologically are linear in time (and the same applies
to visual representations in the case of sign languages). In order to bridge
this gap and thus make a connection between meaning and sound, lan-
guage makes use of a computational syntactic system (see figure 14.1).
From the perspectives of both semantics and phonology, linguistic
signs are arbitrary (in the sense of Saussure [1916]) and conventional;
that is, there is nothing inherent or causal in a sound that links it to a
certain meaning, and the other way around. This means that links between
individual signs and referents rely on memorization from the point of
view of the language user, which is necessarily limited. Nevertheless, we
can form an unlimited number of utterances with such a limited number
of signs because they are embedded within a symbolic system. The linking
of signs ranges from individual (word-level) form-meaning relations to

[Figure 14.1: a diagram linking Syntax with Phonology (PHOL, PHON) and Semantics (SEM, CS); the phonological side carries linear structure, the semantic side hierarchical structure.]
Figure 14.1
Syntax as the combinatorial mechanism that translates linear structure from Phonology
into hierarchical structure in Semantics, and the other way around (with functions PHOL
and SEM that generate grammatically relevant representations of sound and meaning; for
definitions of such functions see Wiese [2003b, 2004]).

relations between expressions and relations between contents of expres-


sions; linguistic signs are crucially part of a system and take their signifi-
cance not primarily as individual elements, but with respect to their
position in this system. Figure 14.2 gives an example for the sentence
Paula bites Fred, where the upper level indicates linguistic signs, and the
lower level indicates their referents:

[Figure 14.2: upper level: the signs Paula, bites, and Fred, linked by "is Subject of" and "is Object of"; lower level: their referents, linked to a [BITING EVENT] by "is Agent of" and "is Patient of".]
Figure 14.2
Links between sign-sign and meaning-meaning relationships for the sentence Paula bites
Fred.

As the illustration shows, in such complex sign-meaning pairings, hori-
zontal relations between signs (such as is Subject/Object of) and those
between their meanings (such as is Agent/Patient of), respectively, are
associated by vertical links that relate to the system. In what follows, we
refer to these links as (system-)dependent links (Wiese 2003a,b): the
form-meaning pairing here is based not only on the individual elements
(such as Paula referring to the cat, and Fred referring to the dog, or bites
referring to a biting event) but also on links that depend on their posi-
tions in their respective systems.
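
A small sketch may help to make the notion of a system-dependent link concrete. The fragment below is only our illustration and is not meant as an implementation of the Parallel Architecture or of the authors' formalism; the dictionaries and names are assumptions made for the example. The point it encodes is that the mapping runs from the relations "is Subject/Object of" to the relations "is Agent/Patient of", so that the interpretation of Paula and Fred depends on their positions in the syntactic system rather than on the isolated signs alone.

```python
# Linear phonological structure: a sequence of signs.
phonology = ["Paula", "bites", "Fred"]

# Relations between signs (the syntactic system) for this sentence.
sign_relations = {"Paula": "is Subject of", "Fred": "is Object of"}

# System-dependent links: what is mapped is the relation, not the isolated sign.
dependent_links = {"is Subject of": "Agent", "is Object of": "Patient"}

# Referents of the individual signs (the arbitrary, memorized pairings).
referent = {"Paula": "the cat", "Fred": "the dog"}

event = "BITING"
roles = {}
for sign, relation in sign_relations.items():
    role = dependent_links[relation]          # the vertical, relation-to-relation link
    roles[role] = referent[sign]
    print(f"{sign} {relation} bites  ->  {referent[sign]} is {role} of the {event} event")

print(roles)   # {'Agent': 'the cat', 'Patient': 'the dog'}
```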
At that level, the link between sign and referent is iconic in nature: it
is not arbitrary, but rather based on a mapping between relations, a
structural correlation between two systems that constitutes something
we can think of as a second-order iconicity, that is, an iconicity not
between individual elements, but between the structures that they
support. Bühler (1934) already discussed this preservation of relations as
Relationstreue, which he interprets as a central feature of language, and
contrasts with the Material- oder Erscheinungstreue (constancy of mate-
rial or of appearance) of single signs:
Because of its entire structure, the reproduction of language does not emphasize
consistency in material (or: appearance) but rather, through temporary constructions,
consistency in relations. (Bühler 1934, III 12.4; our translation from the German
original)

Dependent links, as relation-preserving connections between represen-
tational levels, are a central feature of language in the sense that they
enable the systematic derivation of interpretations for complex signs.
The development of this kind of linking can hence be regarded as an
essential step in the evolution of language. In earlier stages of language
evolution, relevant symbolic relations were plausibly a matter of linear
relations. This organizing principle still reverberates in modern language;
for example, there is a tendency for agent-first word order (see, for
example, Jackendoff and Wittenberg [2014]). Most modern languages,
however, now also rely on hierarchical syntactic relations and morpho-
logical markers to indicate these relations (Jackendoff 1999, 2002). One
way or another, the relations that exist in the respective systems deter-
mine the link between signs and their referents.
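To make the notion of dependent links concrete before turning to the question of their evolution, here is a minimal sketch (a Python illustration, not the chapter's own formalism; all identifiers and the toy lexicon are assumptions) that encodes the two relational structures of figure 14.2 and the structure-preserving mapping between them:

# A minimal sketch of the dependent links illustrated in figure 14.2.
# All names are hypothetical; the point is only that sign-sign relations
# are mapped onto meaning-meaning relations via the arbitrary,
# memorized word-level links of the lexicon.

syntax = {("Paula", "bites"): "is Subject of",
          ("Fred", "bites"): "is Object of"}

semantics = {("THE_CAT", "BITING_EVENT"): "is Agent of",
             ("THE_DOG", "BITING_EVENT"): "is Patient of"}

# first-order, arbitrary sign-referent links
lexicon = {"Paula": "THE_CAT", "Fred": "THE_DOG", "bites": "BITING_EVENT"}

def dependent_links(syntax, semantics, lexicon):
    """Map each sign-sign relation onto the corresponding meaning-meaning
    relation; the iconicity holds between relations, not individual signs."""
    links = {}
    for (sign, head), syn_relation in syntax.items():
        sem_relation = semantics[(lexicon[sign], lexicon[head])]
        links[syn_relation] = sem_relation
    return links

print(dependent_links(syntax, semantics, lexicon))
# {'is Subject of': 'is Agent of', 'is Object of': 'is Patient of'}

On this toy encoding, the interpretation of the whole follows from which relations the signs enter, not from any further property of the individual signs; that is the sense in which the iconicity holds between structures rather than between elements.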
How could dependent links have evolved, and what is a conceivable basis
for their development? In the following, we will examine an element of
human culture that might play a key role in this: rituals. As we will show,
rituals could have accelerated the development of such dependent links
in evolution, contributing to the emergence of complex human language.

14.3 How Rituals Could Have Supported the Emergence of Dependent Links

We assume that an essential step in the evolution of language is the one
that takes us from isolated signs that refer to particular referents (as
we find, for example, in animal communication such as alarm calls in monkeys)
to symbolic signs, whose interpretation is dependent on their position in
a semiotic system, as illustrated in figure 14.2 above. In this development,
signs are no longer represented as isolated items, but function as ele-
ments of a system in particular relations to each other. What is required
is a progression from single signs to a system; this in turn can provide
the basis for dependent links, that is, second-order links between rela-
tions. In other words, some development resulted in brain structures
that permitted the more complex languages that humans speak today
(Jackendoff and Wittenberg 2014). Is it possible that this kind of cogni-
tive step was boosted by ritual contexts?
In modern cultures, we encounter rituals, for example, in religious
contexts, but also in secular ones such as spectator sports.
A central feature of rituals is repetition. This means, on the one hand,
that rituals tend to be repeated in the same way over different occasions,
and on the other hand, that they involve, at the verbal and nonverbal
levels, activities characterized by strong parallelisms. As an illustration
for the latter, compare the following examples from language in a reli-
gious ritual (a passage from the Lord's Prayer) and in a secular ritual (a
fan chant in football) (elements that establish parallelisms are marked
in bold):
(1) Thy kingdom come
Thy will be done
...
For thine is the kingdom
and the power
and the glory,
for ever and ever.
(2) Glory glory Leeds United,
glory glory Leeds United,
glory glory Leeds United.
And the boys go marching
on and on,
on and on,
on and on!
Such repetitive patterns might emphasize structural qualities. According
to Deacon (1997), ritual activities, including vocalizations, might thus
have boosted the development of symbolic signs from indices. As Wiese
(2007) showed, a closer look at the characteristics of rituals supports such
a view. In particular, there are five central features of rituals that could
have formed a basis for establishing sign-sign relations, supported the
transition to expressive aspects rather than instrumental ones, and made
rituals a central and early trait of human communities.
First, rituals are based on structural formalization and parallelisms.
They revolve around stylized, rhythmic sequences that are often highly
repetitive and fixed, with some minor variations between repetitions.
That is, between one part of a ritual and the next, there is a lot of paral-
lelism in sequence, rhythm, and often content. This leads to predictable
and easily memorizable patterns, while emphasizing the rules that form
the basis for these patterns, that is, their internal syntax. Thus, ritualiza-
tion leads to a salience of structure and, by doing so, can support the
emergence of sign-sign relationships.
Second, rituals are often multimodal; activities in one modality can be
enhanced and reinforced by activities in a different modality. This leads
to a further emphasis on structural features as system-internal relations
in the different modalities support each other. The structure of verbal
patterns, in particular, can be supported and thus further emphasized by
associated nonverbal patterns.
Third, ritual activities can evolve from activities that were initially
instrumental but then lost their original purpose, leading them to become
an expressive part of a ritual (Leach 1968). For example, many modern
baptism rituals seem to have evolved from washing and cleaning one's
body to a purification ritual of sprinkling water onto a candidate's head.
Thus, in a baptism ritual, it is not important (or intended) that the person
becomes clean; instead, what is important is the communication of a
religious message. In this way, an activity within a ritual loses its instrumental
character to become expressive within a more abstract, second-order
purpose. In this process, the ritual becomes conventionalized and more
arbitrary, with the effect that its elements can then be changed without
affecting its expressive purpose. For example, baptism can be performed
by aspersion (sprinkling water), immersion (some part of the body is
immersed in water), or submersion (the water completely covers the
candidate). This is possible because the initially instrumental purpose of
cleaning has been replaced by an expressive ritual activity. Similarly, the
rhythmic steps in the Haka performance that some New Zealand rugby
teams show before games (taking up elements of more traditional Maori
rituals) do not fulfill the purpose of locomotion, but are used to express
a feeling of power, thus strengthening the bonds within the team and
intimidating their opponent.
Fourth, rituals have a social and conventional character and trigger
emotional effects in their participants that are important for social orga-
nization within a community. They synchronize affective processes and
have emotional, bonding effects on their participants. Thus, rituals can
promote social integration and also mark social transformation, making
rituals a central feature of human communities. A fact that further sup-
ports this aspect is that rituals have a positive effect on emotional well-
being (Lee and Newberg 2005), and this again seems to be linked to their
repetitive nature: it has been shown recently that structural, ritual-style
parallelisms in verbal stimuli facilitate processing of positive faces
(Czerwon et al. 2012).
Finally, ritual behaviour is a well-known aspect of animal communica-
tion (e.g., in mating rituals); this points to ancient evolutionary origins
for human rituals.
In short, rituals exhibit a high degree of parallelism and formalization
in different modalities, placing emphasis on a message, not the outcome
of an action; they serve to create bonds within a community, and they
are most likely evolutionarily ancient. Thus, rituals might have provided
a boost for dependent links, and ultimately for syntax. But how could
this boost have come about? As we will argue in the next section, music
in rituals might have been the crucial stepping stone, which brings
us back to another domain that has been central to Ray Jackendoff's
research.

14.4 Music, Rituals, and Language Evolution

Music is an important part of human culture. Musical ability, albeit with
considerable variation, seems to be innate (Jackendoff and Lerdahl
2006). Music is also a very common part of rituals, and we argue that its
characteristics are particularly well suited to support the emergence of
dependent links in the domain of sounds. Like language, music involves
a complex combinatorial system, with its own subsystems, such as rhythm
and pitch, and hierarchical organization (Lerdahl and Jackendoff 1983).
Similar to language, complex musical structures are subject to well-
formedness conditions and, even more so, to preference rules.
Jackendoff (2012) devotes a chapter to analyzing the reasoning behind
musical decisions. While there are no changes in meaning when a
passage is played louder or softer, changes along these lines result in
strong intuitive judgments analogous to truth-value judgments of
sentences.
Evidence for the hierarchical representation of music, similar to the
one in language, can be seen in effects such as harmonic or crossmodal
priming and similar processing mechanisms involved in both (Tillmann,
Bigand, and Pineau 1998; Tillmann and Bigand 2002; Patel 2003, 2008;
Slevc, Rosenberg, and Patel 2009). Numerous neuropsychological inves-
tigations also indicate a relation between linguistic syntax and music. For
instance, an MEG study by Maess et al. (2001) shows that similar cere-
bral areas are involved in the sequencing of syntactic relations in lan-
guage and of harmonic and/or functional relations in music, namely, in
particular Broca's area and its right-hemispheric counterpart (in the
inferior frontal-lateral cortex). These results are also supported by studies
that use different methods, such as an fMRI study by Fiebach et al. (2002)
and an ERP study by Patel (1998): Fiebach et al. (2002) found that the
right-hemispheric homotope of Broca's area is sensitive to greater syntactic
complexity, and Patel (1998) showed similar patterns in reaction to
syntactic errors and musical harmonic irregularities.
Musical cognition thus also contains links between linear and hierar-
chical relations in the acoustic domain, similar to language (Jackendoff
2002). However, hierarchical relations in music are not linked with hier-
archical relations in the conceptual system; no systematic connection of
complex sound structures and propositional structures occurs. Instead,
music expresses affective content (cf. also Dienes et al. [2012]):
Unlike language, music does not communicate propositions that can be true or
false. Music cannot be used to arrange a meeting, teach someone how to use a
toaster, pass on gossip, or congratulate someone on his or her birthday (except
by use of the conventional tune). (Jackendoff and Lerdahl 2006, 60–61)

Against this background, we can think of a scenario for the development
of grammatical structures based on music, with its ritual characteristics,
along the following lines. The starting point for this scenario is that rituals
primarily serve to express affective or emotional content instead of prop-
ositional content. Music supports this purpose, and could thus have sup-
ported the development of phonological structures in early vocalizations.
Just like music, phonology in language involves hierarchical sound struc-
tures without mapping them onto hierarchical propositional structures.
At this stage, the internal hierarchical phonological organization enables
the generation of a large number of possible sound combinations and
forms the basis for phonological processes and restrictions, but it does
not link to hierarchical structures in meaning; no dependent links are
created as of yet.
However, with hierarchical sound structures established, it would
be possible for these patterns to be transferred to complex elements
above the syllable level and thus to meaningful elements. Once hierarchi-
cal structures are salient for verbal elements that carry meaning,
this could then give rise to links with hierarchical representations
of meaning. Complex sounds could now obtain their interpretation
through the connection of sign-sign relations with relations between their referents.
This provides the basis for a grammatical system like that found in
modern language, that is, a system based on the correlation of sound
relations with hierarchical semantic relations through a syntactic system
that organizes dependent links tying together parallel structures in dif-
ferent domains.

14.5 The Connection between Syntax and Semantics

Usually, mappings that realize dependent links between syntax and
semantics will reflect a direct second-order iconicity, that is, they will lead
to close parallelisms between the two systems. However, once dependent
links are in place, all that is needed is that these links are dependent on
system-internal relations. Hence, they can also constitute more complex
mappings, allowing deviations from a strict parallelism. Such deviations,
then, reflect a linguistic arbitrariness at a higher, system-targeted level:
the option to link up relations from different grammatical subsystems in
complex patterns (Wiese 2003b).
This option notwithstanding, a direct parallelismthat is, one that
reflects a close similarity between structuresis the default, and it is
highly preferred in the linguistic domain. Sentences where the semantic
structures map straightforwardly onto surface syntactic structure are
easier to understand than sentences where this parallelism is interrupted.
For example, studies that compare the processing of parallel and non-
parallel syntactic and semantic structures for such phenomena as aspec-
tual coercion, light verb constructions, and temporal order in discourse,
have shown that such non-parallel structures can create processing dif-
ficulties. This difference in processing indicates a lasting disposition of
the linguistic system for dependent links that work in a straightforward
manner, relying on direct structural parallelisms between grammatical
subsystems.
In order to discuss such findings in light of our previous argument on
dependent links, consider first examples (3)–(5), in which structural paral-
lelism is preserved, as a background for our comparisons:
(3) Harry read the book.
(4) Harry gave Sally a book.
(5) Sally gave Harry a book last Christmas. Now Harry is reading it.
In (3), the syntactic subject (Harry) corresponds to the semantic Agent
of the reading action; the object (the book) denotes the semantic Patient
(or Theme) of the reading action. This correspondence between syntactic
structure and semantic roles reflects the default case, where the depen-
dent links within and between each level of representation preserve
structures in a relation-consistent way, that is, they are isomorphic
between syntactic and conceptual structures (for a similar idea, cf. Culi-
cover and Nowak [2002]). Looking at this from a comprehenders point
of view, the form of the sentence directly leads to the interpretation.
Similarly, in (4): Harry is the syntactic subject and refers to the Agent of
the giving action; Sally is the first object syntactically and refers to the
Recipient of the giving action; and the book is the second object, and
refers to the Theme. Again, through consistent dependent links, the com-
prehender is able to directly read off the meaning from the syntactic
configuration of the sentence. Finally, (5) is an example of the same
phenomenon at the discourse level: there is a parallelism in the arrange-
ment of information between the two sentencesfirst the sentence
referring to the giving action, followed by the sentence referring to the
reading actionand the order of the intended meaningfirst giving,
then reading.
The coherence and parallelism of the dependent links in these exam-
ples lead to a direct second-order iconicity, a parallelism feeding into
dependent linking that should support linguistic processing. Experimen-
tal evidence indicates that this kind of parallelism is indeed preferred in
processing. Consider (6), the nonparallel counterpart to (3):
(6) Harry began the book.
What we find in (6) can be described as complement coercion (Jack-
endoff 1997): here, the event-selecting verb begin coerces its physical-object
argument into an event reading. In contrast to the canonical case of (3), the paral-
lelism between syntactic and conceptual structure is interrupted: Harry
is the grammatical subject and refers to the Agent, and the book is the
grammatical object and refers to a Patient, but not to that of begin (which
by itself does not denote an action), but of a different event, such as
reading or writing, which needs to be inferred from the Patient. These
coercions occur very frequently, and yet they are harder to process than
such non-coerced sentences as Harry wrote the book. In recent years,
psycholinguists have amassed evidence for this phenomenon using a
number of techniques, such as fMRI, MEG, and ERP (Husband, Kelly,
and Zhu 2011; Pylkkänen et al. 2009; Kuperberg et al. 2010), self-paced
reading and eye-tracking (Katsika et al. 2012; Lapata, Keller, and Scheep-
ers 2003; McElree et al. 2006; Pickering, McElree, and Traxler 2005;
Traxler et al. 2005; Traxler, Pickering, and McElree 2002), and speed
accuracy trade-off (McElree et al. 2006). These studies have repeatedly
shown that processing coerced sentences as in (6) is tied to cognitive
costs that differ from processing non-coerced sentences, as in (3); argu-
ably, they are harder to comprehend (Katsika et al. 2012).
Next, consider (7), where syntactic form is the same as in (4), but where
the parallelism between the dependent links in syntax and semantics is
interrupted:
(7) Harry gave Sally a kiss.
This is a light verb construction where, again, Harry is the grammatical
subject and refers to the Agent, and Sally is the grammatical object. But
this time, Harry is not the Agent of give: in the real world, he is not
giving Sally anything, but he is kissing her. Similarly, Sally is the Patient
of kiss. Thus, in this example, the syntactic form would suggest a transfer
event, as is usually denoted by sentences that have give as their main
verb, but in the semantic domain, the linkage is between the Agent and
Patient of an event of physical contact, namely kissing. Even though
these light verb constructions are very frequent and thus familiar to
speakers, several studies have shown that the disrupted parallelism leads
to longer reaction times and increased demand on working memory
(Piñango, Mack, and Jackendoff 2006; Wittenberg and Piñango 2011;
Wittenberg et al., 2014; see also Wittenberg [2013] on the sensitivity of
methods that is necessary to detect such effects).
Furthermore, light verb constructions offer a glimpse into just how
much comprehenders like to rely on direct, second-order iconicity and
direct parallelisms between syntactic and semantic structures: they
even categorize events of the sort described by giving a kiss as being
somewhat similar to transfer events like giving a book, and significantly
different than events described by the base verb kissing (Wittenberg and
Snedeker 2014).
Finally, consider example (8), which describes the same events as (5),
but in a different order:
(8) Harry read a book. Last Christmas, Sally had given it to him.
In this example, there is a mismatch between the order of events (first
the giving, then the reading) and the syntactic order of the sentences
referring to these events (the sentence referring to the reading event is
followed by that referring to the giving event). Again the parallelism
between semantic and syntactic structure is disrupted, this time at the
level of discourse. Numerous studies using different technologies such as
probe-recognition, recall, or measuring ERPs have shown that scenarios
like (8), in which the direct second-order iconicity is disrupted, require
more processing effort than scenarios in which this order is preserved
(Briner, Virtue, and Kurby 2012; Münte, Schiltz, and Kutas 1998; Ohtsuka
and Brewer 1992). Also, children's understanding of sentences with
chronologically ordered events (Ilkka read the letter before he went to
school) is better than their understanding of sentences where the order
of events is inverse (Before Ilkka went to school, he read the letter;
Johnson 1975; Notley et al. 2012; Pyykkönen and Järvikivi 2012). Thus,
there is ample evidence that the parallelism between conceptual and
discourse structure aids comprehension and memory.

14.7 Conclusion: Parallelism Helps

In this paper, we have examined the parallelisms and interfaces between
levels of representation, cognitive domains, and linguistic subsystems. We
showed that Ray Jackendoff's Parallel Architecture is not only helpful
and theoretically adequate when researching grammatical structures, as
others in this book have described, but it also connects well with phe-
nomena from other cognitive domains, evolution, and arguments on such
aspects of social cognition as ritual behavior in humans.
The starting point of our argumentation was that dependent links were
an essential step in the development of modern language. These links
connect sign-sign relations with relations between (conceptualizations
of) referents. We then explained how rituals could have been a boost for
the development of such dependent links. An important aspect we
focused on is the repetitive nature of rituals: rituals are characterized by
parallelisms and formalizations that emphasize structural features, thus
forming a potential basis for the development of grammatical relations.
Rituals might thus have facilitated the linking of relations with relations,
that is, the development of dependent links, and could have provided the
crucial basis for the development of syntactic structure.
We identified a domain with strong ritual characteristics as particularly
significant in this development, namely, music. A central and evolution-
arily early phenomenon in human cultures, music not only supports the
linking of relations in general but also provides a domain for the linking
of linear and hierarchical relations in the acoustic domain in particular.
This linking can, in further steps, be transferred to meaningful elements
and connect linear representations from the acoustic domain with hier-
archical meanings; in today's language, dependent links are what ulti-
mately get us from sound to meaning. We argued that this works best
when the representational levels that are linked with each other run
closely in parallel with respect to their structures, thus allowing straight-
forward dependent links. We provided three examples for phenomena
where such a parallelism was disruptedcoercion, light verb construc-
tions, and temporal dissociation in discourseand we reviewed psycho-
linguistic evidence that shows how comprehenders rely on direct
parallelism and perform more poorly or slowly when such parallelism is
absent.
In sum, this article has provided something like a voyage through
a number of areas that have benefited from Ray Jackendoff's research:
theories about such diverse topics as grammar, music, cognition, and the
evolution of language are, in his mind, never far apart; and he constantly
seeks evidence for or against his theories in a variety of places, as his
various endeavours with psycholinguists show. In his own words:
[S]cience is a lot like chamber music. You can't just do your own stuff. You have
to be constantly listening to everyone. Sometimes the crucial facts come from
your own field, sometimes from the most unexpected place in someone else's.
We're all in this together, and the goal is to create a coherent story about thought
and meaning and the mind and the brain that will satisfy us – and, we hope,
posterity. (Jackendoff 2012, 213)

References

Briner, Stephen W., Sandra Virtue, and Christopher A. Kurby. 2012. Processing
causality in narrative events: Temporal order matters. Discourse Processes 49 (1):
61–77.
Bühler, Karl. 1934. Sprachtheorie: Die Darstellungsfunktion der Sprache. Jena: G.
Fischer.
Culicover, Peter W., and Ray Jackendoff. 2005. Simpler Syntax. Oxford: Oxford
University Press.
Culicover, Peter W., and Andrzej Nowak. 2002. Learnability, markedness, and
the complexity of constructions. In Linguistic Variation Yearbook, vol. 2, edited
by Pierre Pica and Johan Rooryck, 5–30. Amsterdam: John Benjamins. Reprinted
in Peter W. Culicover, Explaining Syntax, 5–30. Oxford: Oxford University
Press, 2013.
Czerwon, Beate, Annette Hohlfeld, Heike Wiese, and Katja Werheid. 2012. Syn-
tactic structural parallelisms influence processing of positive stimuli: Evidence
from cross-modal ERP priming. International Journal of Psychophysiology 87
(1): 3834.
Deacon, Terrence William. 1997. The Symbolic Species: The Co-evolution of Lan-
guage and the Brain. New York: Norton & Co.
Dienes, Zoltán, Gustav Kuhn, Xiuyan Guo, and Catherine Jones. 2012. Commu-
nicating structure, affect, and movement. In Language and Music as Cognitive
Systems, edited by Patrick Rebuschat, 156–168. Oxford: Oxford University Press.
Fiebach, Christian J., Matthias Schlesewsky, Ina D. Bornkessel, and Angela D.
Friederici. 2002. Specifying the brain bases of syntax: Distinct fMRI effects of
syntactic complexity and syntactic violations. Paper presented at the 8th Annual
Conference on Architectures and Mechanisms for Language Processing (AMLAP
2002), Tenerife, Spain, September 2002.
Husband, E. Matthew, Lisa A. Kelly, and David C. Zhu. 2011. Using complement
coercion to understand the neural basis of semantic composition: Evidence from
an fMRI study. Journal of Cognitive Neuroscience 23 (11): 3254–3266.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge, MA:
MIT Press.
Jackendoff, Ray. 1999. Possible stages in the evolution of the language capacity.
Trends in the Cognitive Sciences 3 (7): 272–279.
Jackendoff, Ray. 2002. Foundations of Language. Oxford: Oxford University
Press.
Jackendoff, Ray. 2007. Language, Consciousness, Culture: Essays on Mental Struc-
ture. Cambridge, MA: MIT Press.
Jackendoff, Ray. 2012. A User's Guide to Thought and Meaning. Oxford: Oxford
University Press.
Jackendoff, Ray, and Fred Lerdahl. 2006. The capacity for music: What is it, and
what's special about it? Cognition 100 (1): 33–72.
Jackendoff, Ray, and Eva Wittenberg. 2014. What you can say without syntax: A
hierarchy of grammatical complexity. In Measuring Linguistic Complexity, edited
by Frederick Newmeyer and Laurel Preston, 65–82. Oxford: Oxford University
Press.
Johnson, Helen L. 1975. The meaning of before and after for preschool children.
Journal of Experimental Child Psychology 19 (1): 88–99.
Katsika, Argyro, David Braze, Ashwini Deo, and María Mercedes Piñango. 2012.
Complement coercion: Distinguishing between type-shifting and pragmatic infer-
encing. The Mental Lexicon 7 (1): 58–76.
Kuperberg, Gina R., Arim Choi, Neil Cohn, Martin Paczynski, and Ray Jackend-
off. 2010. Electrophysiological correlates of complement coercion. Journal of
Cognitive Neuroscience 22 (12): 2685–2701.
Lapata, Mirella, Frank Keller, and Christoph Scheepers. 2003. Intra-sentential
context effects on the interpretation of logical metonymy. Cognitive Science 27
(4): 649–668.
Leach, Edmund R. 1968. Ritual. In International Encyclopedia of the Social Sci-
ences, vol. 13, edited by David L. Sills, 520–526. New York: Macmillan.
Lee, Bruce Y., and Andrew B. Newberg. 2005. Religion and health: A review and
critical analysis. Zygon 40 (2): 443–468.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Maess, Burkhard, Stefan Koelsch, Thomas C. Gunter, and Angela D. Friederici.
2001. Musical syntax is processed in Broca's area: An MEG study. Nature Neu-
roscience 4 (5): 540–545.
McElree, Brian, Liina Pylkkänen, Martin J. Pickering, and Matthew J. Traxler.
2006. A time course analysis of enriched composition. Psychonomic Bulletin and
Review 13 (1): 53–59.
Münte, Thomas F., Kolja Schiltz, and Marta Kutas. 1998. When temporal terms
belie conceptual order. Nature 395 (6697): 71–73.
Notley, Anna, Peng Zhou, Britta Jensen, and Stephen Crain. 2012. Children's
interpretation of disjunction in the scope of before: A comparison of English
and Mandarin. Journal of Child Language 39 (3): 482–522.
Ohtsuka, Keisuke, and William F. Brewer. 1992. Discourse organization in the
comprehension of temporal order in narrative texts. Discourse Processes 15 (3):
317–336.
Patel, Aniruddh D. 1998. Syntactic processing in language and music: Different
cognitive operations, similar neural resources? Music Perception 16 (1): 27–42.
Patel, Aniruddh D. 2003. Language, music, syntax and the brain. Nature Neurosci-
ence 6 (7): 674–681.
Patel, Aniruddh D. 2008. Music, Language, and the Brain. New York: Oxford
University Press.
Pickering, Martin J., Brian McElree, and Matthew J. Traxler. 2005. The difficulty
of coercion: A response to de Almeida. Brain and Language 93 (1): 1–9.
Piñango, María M., Jennifer Mack, and Ray Jackendoff. Forthcoming. Semantic
combinatorial processes in argument structure: Evidence from light verbs. In
Proceedings of the 32nd Annual Meeting of the Berkeley Linguistics Society. Berke-
ley, CA: Berkeley Linguistics Society.
Pyykkönen, Pirita, and Juhani Järvikivi. 2012. Children and situation models of
multiple events. Developmental Psychology 48 (2): 521–529.
Pylkkänen, Liina, Andrea E. Martin, Brian McElree, and Andrew Smart. 2009.
The anterior midline field: Coercion or decision making? Brain and Language
108 (3): 184–190.
Saussure, Ferdinand de. 1916. Cours de linguistique générale. Paris: Éditions Payot
et Rivages.
Slevc, L. Robert, Jason C. Rosenberg, and Aniruddh D. Patel. 2009. Making psy-
cholinguistics musical: Self-paced reading time evidence for shared processing of
linguistic and musical syntax. Psychonomic Bulletin and Review 16 (2): 374–381.
Tillmann, Barbara, and Emmanuel Bigand. 2002. A comparative review of
priming effects in language and music. In Language, Vision, and Music, edited by
Paul Mc Kevitt, Seán Ó Nualláin, and Conn Mulvihill, Advances in Conscious-
ness Research 35, 231–240. Amsterdam: John Benjamins.
Tillmann, Barbara, Emmanuel Bigand, and Marion Pineau. 1998. Effects of
global and local contexts on harmonic expectancy. Music Perception 16 (1):
99–118.
Traxler, Matthew J., Brian McElree, Rihana S. Williams, and Martin J. Pickering.
2005. Context effects in coercion: Evidence from eye movements. Journal of
Memory and Language 53 (1): 1–25.
Traxler, Matthew J., Martin J. Pickering, and Brian McElree. 2002. Coercion in
sentence processing: Evidence from eye-movements and self-paced reading.
Journal of Memory and Language 47 (4): 530–547.
Wiese, Heike. 2003a. Numbers, Language, and the Human Mind. Cambridge:
Cambridge University Press.
Wiese, Heike. 2003b. Sprachliche Arbitrarität als Schnittstellenphänomen [Lin-
guistic Arbitrariness as an Interface Phenomenon]. Habilitation thesis, Hum-
boldt University.
Wiese, Heike. 2004. Semantics as a gateway to language. In Mediating between
Concepts and Language, Trends in Linguistics 152, edited by Holden Härtl and
Heike Tappe, 197–222. Berlin: Mouton de Gruyter.
Wiese, Heike. 2007. Grammatische Relationen und rituelle Strukturen – ein evo-
lutionärer Zusammenhang? In Wahlverwandtschaften – Verben, Valenzen, Vari-
anten: Festschrift für Klaus Welke zum 70. Geburtstag, Germanistische
Linguistik 188/189, edited by Hartmut E. H. Lenk and Maik Walter, 113–136.
Hildesheim: Georg Olms.
Wittenberg, Eva, and María M. Piñango. 2011. Processing light verb constructions.
The Mental Lexicon 6 (3): 393–413.
Wittenberg, Eva, Martin Paczynski, Heike Wiese, Ray Jackendoff, and Gina
Kuperberg. 2014. The difference between giving a rose and giving a kiss:
Sustained neural activity to the light verb construction. Journal of Memory and
Language 73: 31–42.
Wittenberg, Eva. 2013. Paradigmenspezifische Effekte subtiler semantischer
Manipulationen. Linguistische Berichte 235: 293–308.
Wittenberg, Eva, and Jesse Snedeker. 2014. It takes two to kiss, but does it take
three to give a kiss? Conceptual sorting based on thematic roles. Language,
Cognition and Neuroscience 29 (5): 635–641.
15 The Biology and Evolution of Musical Rhythm: An
Update

W. Tecumseh Fitch

15.1 Introduction

Ray Jackendoff stands out in contemporary cognitive science in the
consistency with which he has embraced and furthered a formal approach
to human cognitive abilities. In particular, his work on language and
music provides an excellent illustration of the value of rigorous, formal
conceptions in clarifying our thinking and allowing precise contrasts
and comparisons that would, without formalization, remain fuzzy and
metaphorical. In this essay, I address one of the issues that have come
up repeatedly during Jackendoff's long and productive career: the rela-
tionship between the human capacities to acquire language or music.
Speculations on this topic are legion, with prominent commentators
including Jean-Jacques Rousseau, Charles Darwin, and Leonard Bern-
stein. But I believe it is safe to say that Jackendoff's contributions to the
issue are so fundamental that future discussions of this relationship will
never be the same. His and his colleagues' work is and will remain the
touchstone to which present and future music/language comparisons
must return and against which they will be continually compared and
reevaluated.
I will not try to detail the many ways in which Jackendoff's research
on language and music has led to further productive inquiry (for a
review see Patel 2008). Nor will I survey the large literature comparing
and contrasting the two domains in general terms (Rousseau [1781] 1966;
Darwin 1871; Cooke 1959; Martin 1972; Simon 1972; Bernstein 1981;
Levman 1992; Merker 2002; Patel 2003; Mithen 2005; Vaux and Myler
2012; Lerdahl 2013). My aim is more modest: to update one corner of
the music/language comparisonrhythmbased on recent biological
findings. I will focus on rhythm in music and language from the viewpoint
of cognitive biology, reviewing a body of comparative work that helps
clarify and ground our thinking about rhythmic cognition from a biologi-
cal and evolutionary viewpoint.

15.2 Adopting a Multi-Component Approach: Divide and Conquer

The starting point for any comparison of music and language, following
Jackendoff's lead, is to adopt a "divide and conquer" strategy in both the
musical and linguistic domains. There is an unfortunate tendency in the
cognitive science literature to adopt an overly monolithic view of capaci-
ties like language, music, social intelligence, and similar abilities, rather
than to squarely face their composite, multi-component nature. A mono-
lithic viewpoint leads all too naturally to the wrong questions, such as
"when did language evolve?" (as if all components had evolved at some
specific moment in our evolutionary history) or "where is music located
in the brain?" (as if the complex of perception, abstract cognition, and
production underlying music would occupy a single cortical region). The
antidote to this tendency is to recognize that any complex cognitive
capability will, when properly broken down and understood, prove to
rely upon a suite of interacting cognitive and neural capabilities, each of
which may well have its own independent evolutionary history and
neural implementation.
Jackendoff has clearly and forcefully advocated a multi-component
approach in both of these domains. In language evolution, he has offered
one of the most finely articulated multi-step scenarios for the evolution
of specific components of language, clearly separating the evolution of
phonology, syntax, and semantics (Jackendoff 1999). His general approach
to language as a system, the Parallel Architecture, embodies the need for
the separation of sub-capacities (Jackendoff 2002, 2007). Similarly in
music, his joint work with Fred Lerdahl articulates the multiple interact-
ing layers of rhythm, melody, and harmony, again illustrating that the
clearest path to understanding is to first analytically carve nature at the
joints, investigate the pieces, and then synthetically consider their inter-
actions. This becomes particularly crucial when comparing music and
language, since we can safely assume a mixture of distinctness and
overlap in their individual components.
Overall, Jackendoffs approach to the music/language comparison has
been agnostic: he proposes that we analyze each domain in its own terms,
and then let the chips fall where they may (Jackendoff and Lerdahl
1982; Lerdahl and Jackendoff 1983; Jackendoff and Lerdahl 2006; Jack-
endoff 2009). Not all commentators on this issue have been equally
[Figure 15.1 diagram: three panels, labeled Modularity, Partial overlap, and Identity, each showing a different relation between Music and Language]

Figure 15.1
Three models for the relationship between music and language.

agnostic, and the scholarly literature includes outspoken advocates for a
wide range of hypotheses regarding the relationship between these two
cognitive domains, which I now briefly survey.

15.3 Hypotheses about the Relationship between the Music and Language
Capacities

Both music and language are universal human capacities, found in every
known culture. Both domains appear to rest on some species-specific
biological basis, but nonetheless encompass a large number of culturally-
acquired instantiations (different languages and different musical idioms).
Both are generative systems that make infinite use of finite means,
combining atomic primitives (notes, phonemes) into hierarchical com-
plexes (melodies, words, sentences). But despite these similarities, the
differences between music and language are equally obvious: most prom-
inently, music lacks the form of explicit, proposition-based semantics that
gives language its semantic power (Fitch 2006; Jackendoff and Lerdahl
2006; Jackendoff 2009). Music also has typical features lacking in lan-
guage, such as isochronicity (a steady beat) and a discretized frequency
range (pitch system) (Nettl 2000); Western tonal music also features a
complex harmonic syntax (Jackendoff and Lerdahl 2006). Fitch (2006)
dubbed these design features of music. Understanding this complex
pattern of similarities and differences clearly necessitates a multi-
component approach to comparison (Patel 2008; Jackendoff 2009).
Researchers who have adopted specific multi-component models have
nonetheless reached quite different conclusions (figure 15.1). On the
"different" side, there is a long tradition in neurology of seeing the neural
bases for language and music as being spatially nonoverlapping in the
brain, as evidenced by double dissociations between amusia and aphasia
(reviewed by Peretz and Coltheart 2003). Furthermore, congenital
amusics show a lifelong lack of musical ability, while exhibiting normal
language skills and intelligence (Peretz et al. 2002; Dalla Bella, Giguère,
and Peretz 2009). Such findings have led some researchers to draw rather
clear dividing lines between music and language (Peretz and Morais
1989; Peretz and Coltheart 2003). At the opposite extreme, numerous
writers have recently championed the idea that some sub-components of
music and language are in fact identical. For instance, Katz and Pesetsky
(2009) have advanced an identity thesis, hypothesizing a core compu-
tational component shared by harmonic and linguistic syntax. Similarly,
several linguists (Roberts 2012; Vaux and Myler 2012) have embraced a
strict identity thesis equating the hierarchical metrical structuring of stress
patterns in phonology with that of musical rhythm (to which we return
below). Between these two extremes, many commentators embrace a
mixed model of partially shared computational resources. For example,
Aniruddh Patel (Patel 2003, 2008, 2013) hypothesizes that while the
representations involved in linguistic and musical syntax are distinct
(notes versus words), processing and integration of long-distance depen-
dencies is done using the same neural resources. Jackendoff (2009) also
suggests a mixed model as the one that best captures the empirical reality.
I will now further explore this partial overlap conception, focusing
on specific features of rhythmic cognition. Initially, a clear distinction is
required between the perception and production of an isochronic pulse
or tactus (typical of music but not of speech) and metrical structure,
which may be partially or entirely shared between speech and music
(Liberman and Prince 1977; Lerdahl and Jackendoff 1983; Patel 2008;
Jackendoff 2009). I join Lerdahl and Jackendoff in seeing poetry or song
lyrics as an imposition of a musical structure upon the speech stream
(Lerdahl 2001; Jackendoff 2009). Thus, in ordinary speech the stress
pattern of a phrase is perceived in the absence of isochrony, while in
poetry or lyrics this pattern must be aligned, perhaps imperfectly, to
an independent metrical grid. I will therefore treat isochrony/pulse per-
ception and meter perception as conceptually distinct processes (cf. Fitch
2013c) and explore these two elements in turn, focusing on comparisons
between humans and other animals.
There has until recently been rather little comparative research inves-
tigating rhythm in nonhuman species. Although Darwin, in The Descent
of Man (1871), confidently stated that "The perception, if not the
enjoyment, of musical cadences and of rhythm is probably common to
all animals, and no doubt depends on the common physiological nature
of their nervous systems" (Darwin 1871, 333), he never attempted to
precisely characterize "musical cadence" (meaning melody) or "rhythm"
in detail. Subsequent research strongly suggested that Darwin was (for
once) wrong in these statements, and that indeed even the simple capac-
ity to entrain ones own voice or other actions to an externally-generated
pulse, far from being common to all animals, is very limited in the animal
world.

15.4 Pulse Perception and Entrainment in Nonhuman Animals

The longest-known examples of animal entrainment do not come from
so-called higher vertebrates like birds or mammals, but rather from
some insect and frog species. A striking example of group entrainment
on a massive scale is found in several species of firefly (Buck 1988).
Fireflies are insects, a well-known type of winged beetle in the family
aptly named Lampyridae, which contains roughly 2000 species. In general,
fireflies have a capacity for bioluminescence, and this is often used in a
courtship and mating context, sometimes by both sexes, and often by
males alone.
In several different firefly species, in particular the Indo-Malayan
species Pteroptyx malaccae, large assemblages of males engage in group
entrainment, such that an entire tree full of fireflies all begin flashing in
precise 0° synchrony (that is, all flashing simultaneously). This level of
synchronization is quite outstanding among nonhuman species, and P.
malaccae probably represents the animal species whose synchronization
abilities are most closely analogous to those exhibited in human musical
ensemble playing. Compelling models of the neural and mathematical
basis for such entrainment now exist (Ermentrout 1991; Strogatz and
Stewart 1993; Strogatz 2003), and it appears that the tight synchroniza-
tion of flashing in this species is accomplished, as in human rhythmic
playing or dancing, both by modifying the internal periodicity (tempo
adjustment) and adjusting the phase of an internal, neurally-generated
clock. This tempo/phase combination is very unusual in animals (most
of which can entrain only to a very narrow, fixed tempo) and it matches
closely what a human listener must do in order to clap or dance along
with a novel piece of music.
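To make the tempo/phase distinction concrete, here is a minimal sketch (a Python illustration, not a reimplementation of the models cited above; the function name, gain parameters, and numbers are assumptions) of an oscillator that nudges both its next predicted beat (phase) and its period (tempo) toward each incoming stimulus onset:

def entrain(stimulus_onsets, period=1.0, phase_gain=0.3, period_gain=0.1):
    """Return predicted beat times of an oscillator that adjusts both
    phase and period (tempo) toward an external pulse train."""
    beats = []
    next_beat = period  # time of the first expected internal beat
    for onset in stimulus_onsets:
        # let the internal clock tick freely up to the current stimulus
        while next_beat < onset - period / 2:
            beats.append(next_beat)
            next_beat += period
        error = onset - next_beat           # asynchrony (positive = clock early)
        next_beat += phase_gain * error     # phase correction
        period += period_gain * error       # period (tempo) correction
        beats.append(next_beat)
        next_beat += period
    return beats

# Example: a pulse train slightly faster than the oscillator's initial period;
# the predicted beats drift toward, and then track, the stimulus.
stimulus = [0.9 * (i + 1) for i in range(16)]
print(entrain(stimulus)[-4:])

Dropping the period correction leaves a phase-only corrector that can follow a pulse only near its own fixed tempo; it is the combination of both corrections that the firefly case and human clapping or dancing appear to share.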
Surprisingly, despite many decades of research on these fireflies (Buck
1938, 1988), the adaptive function of Pteroptyx synchronous flashing
remains uncertain (Greenfield 2005). One compelling hypothesis is that
synchronization acts to sum signals together, creating a more powerful
overall signal. Such synchronization
has often been considered to be a cooperative endeavor, in
which, by combining their relatively weak signals, a group of males can
collectively generate a louder or brighter signal. This in turn should
attract females from further away (cf. Merker 2000). While this is intui-
tively plausible, the question remains whether the net number of females
per male would be increased by such an effect. Data on this count for
other species (Gerhardt and Huber 2002) suggest that in general female
preference for choruses is not strong enough to compensate for the dilu-
tion in sex ratio caused by the greater number of competing males. These
basic considerations have led to other adaptive explanations that have
better empirical support.
In the auditory domain, several frog species are known in which males'
mating calls are reasonably well-synchronized (Wells 1977), and in many
insect species (including cicadas, crickets, and katydids) where males call
to attract females, spontaneous entrainment of these calls is observed to
produce large, roughly-synchronized choruses of calling males (Alexan-
der 1975). These acoustic displays rarely, or never, approach the degree
of synchronization seen in Pteroptyx malaccae (Gerhardt and Huber
2002; Greenfield 2005). Probably the species that come closest to human
or firefly accuracy in the auditory domain are cicadas in the genus
Magicicada, especially the seventeen-year cicada Magicicada cassini.
These North American insects spend most of their lives in an under-
ground larval stage, until they emerge en masse simultaneously every
seventeen years to form huge breeding concentrations. This broad scale
clustering in space and time is thought to afford protection from preda-
tion and parasitism via over-satiation. This "safety in numbers" hypoth-
esis is reasonably well supported for this and other chorusing species
(Greenfield 2005), but it doesn't explain why precise synchronization at
the millisecond level would evolve. In any case, unlike in fireflies or a
human symphony orchestra, only neighboring cicadas are synchronized,
and at the larger level (e.g., an entire tree), there is a continuous ebb and
flow of sound, not a concerted pulse by the entire chorus or ensemble.
The apparent failure of the obvious evolutionary hypotheses to explain
synchronous chorusing as an adaptation in itself has led to exploration
of alternative perspectives (e.g., that it is a side-effect of something else).
In at least some species, it now seems clear that synchronization is a
non-adaptive by-product of competitive interactions, resulting from
males attempting to jam each other's signal (Greenfield and Roizen
1993). In this case, rather than inferring a general pulse and adjusting
their phase, males appear to be very rapidly reacting to a neighbor's
individual pulses. The male can then produce his own output after a slight
lag (leading to an alternated staggering or hocketting of calls) or adjust
his call to coincide with, but slightly lead, the calls of other males. This
leads to a leap-frog phenomenon, in which males roughly alternate in
leading and following roles (Ravignani 2014). Since females in many
species appear to be preferentially attracted to the leading male, syn-
chrony in these cases is a non-adaptive global phenomenon: the real
causal agent is a local competitive battle for primacy.
Because frogs and insects have relatively small nervous systems, and
these mate-attraction behaviors are under strong selection, the mecha-
nisms underlying these examples of synchronization and entrainment in
frogs and insects are not usually considered analogous to the abstract
cognitive abilities relevant to human music. In all cases they are
domain-specific, and are presumably underpinned by hard-wired neural
circuitry that evolved to support that specific domain. These systems are
alsoeven in the best-developed casesstrictly periodic, while musical
rhythms are typically not perfectly periodic, but rather involve more
complex temporal patterns. Although some dance music may have a
perfectly even bass drum pulse at the musical surface, it is more typical
to hear patterns in which not all notes of the basic pulse are played,
and where additional notes are interspersed between pulses. This
makes even the simplest aspect of human rhythm (the inference of a
steady pulse from a complex musical surface) demand a cognitive
complexity beyond any of the insect or frog examples just considered
(cf. Fitch 2012). Thus, while the existence of numerous synchronizing
species provides an excellent test bed for adaptive hypotheses about the
evolution of entrainment (cf. Alexander 1975; Wells 1977; Greenfield
1994; Gerhardt and Huber 2002), at a mechanistic level it seems unlikely
that the specific neural circuits underlying synchronization in insects or
frogs will teach us much about the neural circuitry underlying human
rhythmic abilities.
What then of the synchronization abilities of mammals or birds? Here,
until very recently, the comparative data painted a bleak picture, and
there was little or no evidence for entrainment by any nonhuman species
(contra Darwin). This apparent absence seems particularly striking for
nonhuman primates, since both chimpanzees and gorillas do engage in
so-called drumming behavior, where the hands or feet are used to
repeatedly strike resonant objects (cf. Fitch 2006). In the case of gorillas,
the hands typically strike the animals own body, while chimpanzees
more commonly strike a resonant object (Arcadi, Robert, and Boesch
1998). However, there is no published evidence for synchronization of
such drumming, nor evidence that either of these species is able to
entrain their drumming to an external auditory signal. One possible
exception concerns a vocal phenomenon in bonobos dubbed "staccato
hooting" by Frans de Waal: "During choruses, staccato hooting of differ-
ent individuals is almost perfectly synchronized so that one individual
acts as the echo of another or emits calls at the same moments as
another. The calls are given in a steady rhythm of about two per second"
(De Waal 1988, 203). Unfortunately, De Waal presented no data or acous-
tic analysis in support of this statement, and no further reports of staccato
hooting have occurred in the twenty-five years since this tantalizing
statement was published. Thus, in general, until recently there was virtu-
ally no evidence for synchronization in any bird or nonhuman mammal
species, which led some commentators (e.g., Williams 1967) to the conclu-
sion that humans are unique, at least among higher vertebrates, in
our capacity to synchronize our rhythmic movements and vocalizations
among multiple individuals, or to an external sound source.

15.5 The New Wave of Animal Rhythm Studies: Animal Entrainment Confirmed

All this changed abruptly in 2009, when two papers were published
simultaneously in the prestigious journal Current Biology (Patel et al.
2009a; Schachner et al. 2009). The initial indications of well-developed
synchronization to a musical rhythm in birds first surfaced in YouTube
videos purportedly showing dancing in a sulphur-crested cockatoo
(Cacatua galerita) named Snowball. Snowball was anonymously
donated to a bird rescue shelter along with a note indicating that he
enjoyed the music on an enclosed CD. When the CD was played, Snow-
ball began to rhythmically bob his head and lift his legs in time to the
music (figure 15.2). A YouTube video of this dancing went viral (more
than five million views by 2015) and subsequently came to the attention
of scientists, many of whom were initially sceptical about its veracity. But
the videos were suggestive enough for Aniruddh Patel and his colleagues
to travel to Snowball's home in Indiana to explore his synchronization
abilities experimentally.
Figure 15.2
Snowball, a sulfur-crested cockatoo, dancing. See Patel et al. (2009a).

The crucial experiment involved slowing down and speeding up Snow-
ball's preferred song ("Everybody," by the Backstreet Boys) without
changing its pitch, and then recording his subsequent movements.
Although Snowball did not always synchronize to the beat, once a syn-
chronized state was reached, he bobbed his head in nearly perfect time
to the music: the average phase relation between head bobs and pulses
was not significantly different from 0°. This means that the parrot bobbed
neither before nor after the downbeat, but simultaneously with it.
In a purely reactive situation, where a listener moves only after hearing
the relevant event, we would expect consistent lagging (positive) phase.
Snowball's average 0° phase instead indicates a predictive situation in
which a variable pulse period is first inferred, and then subsequent move-
ments are synchronized to it. This study provided the first convincing
evidence that a bird can extract a rhythmic pulse from human music and
synchronize its movements to that pulse: Pulse Perception and Entrain-
ment (PPE) (cf. Fitch 2013c).
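The phase measure behind such comparisons can be made concrete with a minimal sketch (a Python illustration, not the analysis code of the studies just described; the function name and the toy numbers are assumptions): each movement is assigned a phase relative to the nearest beat, and the circular mean of those phases distinguishes predictive synchronization (mean near 0°) from merely reactive following (a consistent positive lag).

import math

def mean_relative_phase(beat_times, event_times, period):
    """Circular mean of each event's phase relative to the nearest beat,
    in radians; 0 means on the beat, positive means lagging behind it."""
    angles = []
    for t in event_times:
        nearest = min(beat_times, key=lambda b: abs(t - b))
        angles.append(2 * math.pi * (t - nearest) / period)
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.atan2(y, x)

beats = [0.5 * i for i in range(20)]        # a steady pulse at 120 beats per minute
bobs = [b + 0.01 for b in beats[2:18]]      # movements landing almost on the beat
print(round(mean_relative_phase(beats, bobs, 0.5), 3))   # close to 0

A purely reactive mover, responding only after each beat it hears, would push the mean phase consistently into positive territory rather than centering it near zero.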
The discovery of PPE in Snowball immediately raised multiple ques-
tions about the origins and frequency of this ability in other species. To
address the zoological generality of such abilities, Adena Schachner and
colleagues performed a large-scale analysis of YouTube videos purport-
ing to show dancing animals (Schachner et al. 2009). Because many
popular videos on the internet that supposedly show dancing animals are
obviously doctored by synchronizing the audio track to the animals'
movements, initial scepticism about each video is clearly warranted.
Schachner and colleagues sifted through more than one thousand such
videos, excluding examples of doctoring, and in the remaining sample
testing whether the animal subjects maintained a consistent phase rela-
tive to the downbeat and/or matched the tempo of the music. Most
videos showed no evidence fulfilling these criteria. However, in thirty-
three videos, they observed what appeared to be PPE.
Among the fifteen species in Schachner and colleagues' videos for
which solid evidence for PPE was observed, an astonishing fourteen were
of parrots; the only exception was a single potential example of PPE in
an Asian elephant. Schachner and colleagues also experimentally inves-
tigated PPE in both Snowball the cockatoo and the African grey parrot
Alex. In both birds, clear evidence for PPE was uncovered, consistent
with the conclusions of Patel and colleagues. Despite hundreds of videos
showing dancing dogs, no dogs showed convincing evidence of PPE.
These data pointed to a rather surprising conclusion: PPE was charac-
teristic of only two taxa among all bird and mammal species: humans
and various parrots (Fitch 2009).
These findings led to a surge of interest in animal rhythmic abilities,
including more carefully controlled laboratory studies. The ability of
another parrot species, budgerigars (parakeets or budgies), to synchro-
nize was studied by Hasegawa and his colleagues (2011), who easily
trained eight birds to tap to an acoustically- and visually-indicated tempo
at a wide range of frequencies. While budgies learned the task more
easily for slow tempos (1200–1800 ms period), they subsequently tapped
more accurately to more rapid tempos (450–600 ms), closer to typical
human tempos. As is typical of human tapping experiments, all of the
budgies tended to lead the beat slightly, so a merely reactive process is
unlikely to account for PPE in this species. They should therefore provide
a suitable model in which to study animal rhythm further.
With the evidence for PPE in parrots now clear, several laboratories
renewed the search for evidence of synchronization abilities in nonhu-
man primates. Two new studies with rhesus macaques confirmed major
differences between the rhythmic abilities of humans and these monkeys
(Zarco et al. 2009; Merchant et al. 2011). In both studies, macaques were
trained to tap a key at a regular rate, and their behaviour was compared
to that of human participants. Despite certain similarities in error pat-
terns, monkeys were unable to synchronize to a metronomic pulse, or to
continue tapping regularly once such a pulse was removed. Furthermore,
humans typically show a distinct advantage when tapping is cued acousti-
cally rather than visually (cf. Patel et al. 2005); such a modality difference
was not seen in macaques (Zarco et al. 2009). These recent experiments
thus lend credence to the notion that human rhythmic abilities are unique
among primates.
However, a final recent primate study provides some glimmer of hope
for other primates. Three common chimpanzees were trained to tap on
alternating, briefly illuminated keys of a MIDI keyboard (Hattori,
Tomonaga, and Matsuzawa 2013). They were required to learn to tap
alternating keys, and after a minimum of thirty consecutive taps, they
received a food reward. After consistently meeting this criterion, each
individual moved on to a test stage in which a repeated "distractor"
note (different from the one produced by their own keyboard press)
was played at a consistent tempo (400, 500, or 600 ms inter-onset inter-
val). Reward was given for completing thirty taps, irrespective of any
synchronization, so while the apes were trained to tap, they were not
trained to synchronize. Nonetheless, one of the three chimpanzees, a
female named Ai, demonstrated spontaneous synchronization to this
regular "distractor" note, but only at the 600 ms tempo. This chimpanzee
spontaneously aligned her taps (mean of roughly 0 phase) to this steady
auditory pulse. The two other chimpanzees showed no evidence of
synchronization.
Unfortunately, Ai did not show synchronization to the other two
tempos, and the authors hypothesized that her successful synchroniza-
tion to the 600 ms tempo stemmed from the fact that her spontaneous
tapping frequency was very close to this (about 580 ms). Although the
limitation to one of three animals and a single tempo suggests that chim-
panzee synchronization abilities remain quite limited compared to the
abilities of humans or parrots, these abilities go well beyond those previously
observed in macaques. This is thus the first well-controlled primate study
demonstrating any component of PPE in a nonhuman primate, though
Ai's performance still does not approach typical human (or parrot)
levels.

15.6 Patel's Vocal Learning Hypothesis Meets Ronan the Sea Lion

During this period of growing interest in animal rhythm, Aniruddh
Patel's (2006) suggestion that entrainment abilities in a given species may
be a by-product of its capacity for vocal learning played a galvanizing
role, and several studies of animal entrainment have been framed as
explicit tests of Patel's (2006) hypothesis (Schachner et al. 2009; Hasegawa
et al. 2011; Cook et al. 2013; Hattori, Tomonaga, and Matsuzawa 2013).
This hypothesis starts with the fact that complex vocal production
learning (that is, the capacity to imitate novel sounds vocally) is an
unusual ability among animals. Nonetheless, this capacity has repeatedly
evolved convergently in both mammalian (Janik and Slater 1997; Janik
and Slater 2000) and avian evolution (Nottebohm 1975; Jarvis 2004; Fitch
and Jarvis 2013). Because vocal learning requires well-developed con-
nections between auditory and vocal motor systems, Patel suggested that,
once such connections are in place, having been driven by selection for
vocal learning, they may allow auditory inputs to modulate motor behav-
iour in general (not just vocal motor behaviour). Patel thus proposed
that a capacity for rhythmic entrainment could arise as a by-product of
selection for vocal learning, dubbing this the "vocal learning and rhyth-
mic synchronization" hypothesis (Patel 2006).
Patel's hypothesis is consistent with the capacity for rhythmic synchro-
nization in humans, who are the only primates known to exhibit complex
vocal production learning (Janik and Slater 1997; Fitch 2000; Egnor and
Hauser 2004). The lack of PPE in non-vocal-learning mammals like dogs
or non-human primates is also correctly predicted by Patel's hypothesis.
The new finding of parrot PPE is consistent with the hypothesis, since
most parrot species are highly competent vocal learners (Nottebohm
1975). Although the evidence for PPE in elephants remains tenuous, if
confirmed, it would also be consistent, since both extant elephant species
have now been shown to have vocal learning capabilities (Poole et al.
2005; Stoeger et al. 2012).
More problematic is the lack of any evidence for PPE in numerous
species that are known to have excellent vocal learning abilities. These
include, most prominently, the songbirds, most of whose roughly five thou-
sand species are vocal learners. Many songbirds that are skilled vocal
learners (including mynahs or starlings) are commonly kept as pets, and

Figure 15.3
Ronan, a California sea lion, bobbing her head up and down to music. See (Cook et al.
2013).

easily learn to imitate speech. Nonetheless, there are no documented
examples of rhythmic entrainment in birds other than parrots. Equally
surprising is the lack of evidence for PPE in dolphins, orcas, or other
toothed whales (members of the cetacean suborder Odontoceti), since
such species are vocal learners that are both common in captivity and
frequently trained to do elaborate performances while music is
played. Nonetheless, there is no evidence (yet?) for PPE in these or other
odontocete species. This absence of evidence for PPE in known vocal
learners strongly suggests that vocal learning may be a necessary but not
sufficient precondition for entrainment (Fitch 2009; Patel et al. 2009b;
Schachner 2010). Although it remains possible that dolphins or mynahs
can be trained to entrain, the data reviewed to this point (late 2014)
suggest that, in addition to vocal learning, some other selective pressures
and neural equipment are required for the form of flexible PPE observed
in parrots and humans to evolve.
However, this otherwise consistent set of data supporting Patels vocal
learning hypothesis has recently been challenged by a laboratory study
demonstrating excellent PPE abilities in a California sea lion, Zalophus
californianus, named Ronan (Cook et al. 2013). Because there is no
evidence for vocal learning in sea lions (Schusterman 2008), or indeed
in the entire family to which they belong (the eared seals, members of
the family Otariidae), this finding presents a clear challenge to Patel's
hypothesis. Ronan was first trained to synchronize her head-bobbing
movements (see figure 15.3) to a simple repetitive sound at two different
tempi (80 and 120 BPM). Crucially, like Snowball, after this training
Ronan spontaneously generalized her synchronized head-bobbing tempo
to five new rates. Equally important, after training solely with a simple
metronomic stimulus, she generalized spontaneously to complex recorded
music at various tempos. This suggests, surprisingly, that once the motor
task of synching periodic motion to a simple repeated sound was learned,
the perceptual task of extracting the beat from a complex acoustic signal
was comparatively trivial for this sea lion.
This study is exemplary from a methodological viewpoint. Cook and
colleagues took particular pains to avoid potential confounds like uncon-
scious cueing by human experimenters, who remained invisible to Ronan
during training and testing. An important control experiment incorpo-
rated stimuli with missing beats (rests), that is, beats omitted from an
otherwise steady tempo. Ronan did not omit her head bobs when they
were preceded by such silences, demonstrating that she does not simply
react to auditory events, but extracts the tempo from either a simple
pulse or from a complex musical surface, and then uses it to entrain her
own inferred inner pulse. Thus, while these results derive from a single
animal, they provide some of the best evidence to date of PPE in a non-
human species that replicates multiple features of human synchroniza-
tion abilities.
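The logic of this control can be illustrated with a small hypothetical example (a sketch only, not Cook and colleagues' procedure): a purely reactive responder produces no movement at an omitted beat, whereas a responder entrained to an inferred pulse keeps moving through the silence.

```python
# Hypothetical pulse train: beats every 0.5 s, with beat number 4 omitted (a rest).
period = 0.5
heard_beats = [n * period for n in (0, 1, 2, 3, 5, 6)]

# A reactive responder moves only after each heard beat (here, 150 ms later);
# a predictive responder keeps producing movements at the inferred period.
reactive_moves = [t + 0.15 for t in heard_beats]
predictive_moves = [n * period for n in range(7)]

silent_beat_time = 4 * period
print(silent_beat_time in reactive_moves)    # False: no movement at the omitted beat
print(silent_beat_time in predictive_moves)  # True: a movement occurs despite the silence
```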
Taken at face value, this finding of PPE in a non-vocal-learning species
presents a sharp challenge to Patel's (2006) "vocal learning and rhythmic
synchronization" hypothesis. But there is enough circumstantial evidence
to suggest that some form of Patel's hypothesis can be salvaged.
First, the larger mammalian clade of pinnipeds includes three families:
the otariid seal family that includes sea lions, the true seals Phocidae,
and the walruses Odobenidae (figure 15.4). Members of both of these
other (non-sea lion) families appear to be vocal learners (Janik and
Slater 1997; Sjare, Stirling, and Spencer 2003; Schusterman 2008; Schus-
terman and Reichmuth 2008). In true seals the most famous example is
provided by a harbor seal (Phoca vitulina) named Hoover, who, after
spending his early childhood with a Maine fisherman, spontaneously
acquired the ability to produce several English words and phrases
(including "Hey, Hoover," "Hello there," and "Get over here!") (Ralls,
Fiorelli, and Gish 1985). Other evidence includes the multiple examples
of striking dialectal differences in various other phocid seal species,

[Phylogenetic tree, figure 15.4; tip labels: * True Seals (Phocidae); Eared Seals (Otariidae); * Walruses (Odobenidae); Non-Pinniped Carnivores]

Figure 15.4
The evolutionary relationships between the three main clades of pinnipeds: true seals,
walruses, and otariid eared seals like sea lions. Following Arnason et al. (2006).

which suggest vocal learning in this clade. For example, Weddell seals
(Leptonychotes weddelli) have songs that vary considerably in spectral
cues between adjacent sites in Antarctica, and each of these neighboring
populations also has its own unique call types. These and other examples
strongly suggest that vocal learning exists in several other species of
phocid seals.
In walruses, strong circumstantial evidence for vocal learning again
comes from captivity, in this case from two walruses (a male and a
female), who were trained to emit novel vocalizations for a reward
(Schusterman 2008; Schusterman and Reichmuth 2008). Both walruses
were easily trained to emit various sounds and to make up new sounds,
demonstrating considerable vocal flexibility and control. More tellingly,
the male walrus developed a novel (and un-reinforced) noise-making
behavior that involved buzzing a floating toy by breathing out through
it. The sound produced seemed to be intrinsically rewarding, and it also
attracted the attention of female walruses in the tank. One of the females
later learned how to do the same thing. So, while these examples do not
conclusively demonstrate walrus vocal learning, they are consistent with
excellent vocal control and flexibility, and show the capacity of walruses
to learn a novel method of sound production.
Walruses are more closely related to otariid seals than to phocid seals,
so this nested pattern of data can be explained in various ways. The
common ancestor of all pinnipeds may have evolved vocal learning, and
then otariids (apparently) lost it, or phocids and walruses may have
independently evolved vocal learning. The third possibility is that all
pinnipeds are vocal learners, and we just don't have evidence for this in
otariids like sea lions yet.
Another possibility that would realign Patel's hypothesis with the sea
lion results relies on the fact that sea lions, like most marine mammals,
can easily be trained to bring their vocalizations under operant control
(Schusterman and Feinstein 1965). While most mammals, including pri-
mates, can (with extensive training) learn to emit vocalizations on
command, this is typically much more difficult than training ordinary operant
responses (e.g., bar pressing) (Adret 1992), and it is very chal-
lenging to train a primate to vocalize on command. For sea lions this task
is easy, which suggests that, instead of considering vocal learning as a
binary feature, we should think of it as a continuum (Fitch and Jarvis
2013), spanning from elaborate and complex vocal learning (humans,
parrots) through good vocal control with little learning (sea lions and
most marine mammals) to very little voluntary vocal control or learning
(most mammals, including primates). By this slight modification of Patel's
hypothesis, which could then be renamed the "vocal control and rhythmic
synchronization" hypothesis, the basic insight of a close mechanistic
link between PPE and vocal motor control would remain valid, and
Ronan's performance would constitute data consistent with this revised
hypothesis.
Summarizing these new animal data, it is now quite clear that the
capacity for rhythmic synchronization exists in several nonhuman species,
including at least sea lions and multiple parrot species. Crucially, and
unlike the long-known examples of entrainment in insects and frogs, both
parrots and sea lions appear to share with humans the ability to entrain
their movements to a wide range of tempos and to infer a pulse from a
complex musical surface. Although the data for Ai the chimpanzee
suggest that some modicum of synchronization abilities may be found in
at least some individual chimpanzees, her failure to generalize to new
tempi still indicates a sharp distinction from human, parrot, or sea lion
rhythmic performance. Thus human pulse perception and entrainment,
while unusual or unique among primates, is shared with these more
distant relatives, almost certainly as a result of convergent evolution.
Returning to the comparison of music and language, however, these
new comparative data remain silent about the second major component
of musical rhythm: hierarchical metrical structure. Because metrical
structure is hypothesized to be a shared aspect of music and speech, while
entrainment to an isochronic rhythm is typical only of music, the cur-
rently available animal data are relevant only to music and not the (argu-
ably more interesting) question of the biological origins of metrical
structure. There is thus a clear need for animal studies of meter percep-
tion: we know virtually nothing at present about any species' ability to
detect different cues to stress in speech or respond to the metrical grid
in music. I thus end with a brief discussion of meter in music and lan-
guage, explaining its relevance to the biology and evolution of both
capacities, in the hope of spurring such comparative research.

15.7 Hierarchical Metrical Structure in Music and Language

One of the most compelling features apparently shared by music and
language is metrical structure (Lerdahl and Jackendoff 1983; Jackendoff
1987; Lerdahl 2001; Jackendoff and Lerdahl 2006). Metrical structure is
the patterning of a series of sonic events into a hierarchically-structured
sequence of strong and weak beats. This implies that when perceiving a
simple series of sonic events such as:
. . . x x x x x x x x x . . .
we tend to structure this into groups of events by imposing a multi-
layered structure of one or more additional, relatively evenly spaced
virtual events:
. . . x   x   x   x   x . . .
. . . x x x x x x x x x . . .
in this case grouping the surface events into pairs, and placing the accent
on the first event (represented by a vertical column containing two x's
in the representation above).
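Such a grid can also be treated as a simple data structure. The following sketch (an illustration only, not part of Lerdahl and Jackendoff's formalism) assigns each surface position a strength equal to the number of levels on which a beat falls at that position; level spacings of two or three correspond to the duple and triple subdivisions discussed just below.

```python
def metrical_grid(n_events, level_spacings=(1, 2, 4)):
    """Beat strength for each of n_events surface positions.

    Each level places a beat every `spacing` events; a position's strength is
    the number of levels whose beats coincide with it. Spacings that double
    (or triple) from level to level give duple (or triple) subdivision.
    """
    strengths = [0] * n_events
    for spacing in level_spacings:
        for i in range(0, n_events, spacing):
            strengths[i] += 1
    return strengths

# Print a three-level duple grid over eight events, with the tallest columns of
# x's on the strongest beats.
grid = metrical_grid(8, level_spacings=(1, 2, 4))
for level in range(max(grid), 0, -1):
    print(" ".join("x" if s >= level else " " for s in grid))
```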
In speech, this is reflected in stress assignment: the English noun "rebel"
and the verb "rebel" are differentiated by whether the stress falls on the
first or second syllable. In music, meter is reflected in the metrical grid:
several layers of rhythmic elements (beats, bars, etc.), where some ele-
ments are picked out as stronger than others at each level. Interestingly,
in many ways these two rather different reflections of meter share similar
constraints. In both cases, the hierarchical structure allows only a very
small number of subdivisions (two or three; Lerdahl and Jackendoff
1983). In speech and music there is a strong preference for regularity and
for a good fit between the sound stream and a consistent metrical grid.
This is reflected in speech by the fact that words that receive one stress
pattern when spoken alone (like "kangaroo," which takes stress on the last
syllable) can have the stress reassigned in the context of other words in
a phrase (in "kangaroo court," the main stress shifts to "court" and away from
the third syllable, "roo"). Finally, in both music and language, there are
multiple potential acoustic realizations of strength, including pitch,
duration, and loudness in both domains (although an additional cue, the
timbral changes in unstressed syllables, as seen in English, Dutch, or
German vowel reduction, appears to be limited to language).
Noting these similarities, Jackendoff and Lerdahl (2006) also point out
two key differences: speech meter is less regular in terms of hierarchical
structure, and also, regarding pulse timing, the beats underlying speech
are not isochronic, at least not in ordinary spoken language. These dif-
ferences suggest a clear distinction between two cognitive processing
domains: isochrony and pulse perception, on the one hand, and meter
and hierarchy perception, on the other (cf. Fitch 2013c). Of course, this
implies no strict dichotomy between music and language: poetry, nursery
rhymes, and song lyrics all occupy an intermediate zone between music
and language and are much more regular than standard speech. Fur-
thermore, even within music, the degree of isochrony varies considerably.
While dance music tends to be strongly isochronic (Temperley 2004),
much Western art music encourages a more flexible interpretation of the
pulse for expressive purposes (Repp 1998), and several known musical
styles are not isochronic at all (Frigyesi 1993; Clayton 1996). Thus isoch-
rony represents a continuum, with dance music and ordinary speech at
opposite ends. In any case, the distinction between meter and pulse
renders these isochrony issues unproblematic for Jackendoff and Ler-
dahl's proposal that meter is a shared cognitive aspect of the two domains.
In a recent book comparing language and music as cognitive systems
(Rebuschat et al. 2012), three chapters explore slightly different view-
points about metrical phonology that extend the previous discussion of
Jackendoff and Lerdahl. The target article by Nigel Fabb and Morris
Halle (Fabb and Halle 2012) first introduces their recently-developed
approach to poetic meter, and then compares this with both ordinary
word stress and musical rhythm. The authors conclude that these three
domains share multiple important features, particularly of grouping, but
Fabb and Halle allow various phenomena in music (such as rests) that
violate their proposed rules of linguistic stress. They consider exceptions
in poetry to be examples of "extrametricality," invoked by artistic pre-
rogative. Fabb and Halle's theory assigns beats exclusively to syllables,
so every abstract beat must be a projection from a pronounced syllable.
In music, by contrast, rests (silences) often project to beats. Their model
thus argues against a strict identity of the cognitive underpinnings of
meter and stress in music and poetry.
In opposition to this partial sharing hypothesis, both Vaux and Myler
(2012) and Roberts (2012) argue for a strict identity between metrical
cognition in these two domains. As Vaux and Myler point out, the metri-
cal patterns of poetic verse can easily be accommodated to those of music
if, instead of allowing only for strict projection from syllables, silences
can also play the role of beats. They illustrate this with the nursery rhyme
"Hickory Dickory Dock" (previously discussed by Jackendoff and
Lerdahl 2006). In this rhyme, the natural way to speak the verse involves
waltz meter, but leaving a pause (like a musical rest) after "dock" and
"clock":
x                 x                 x                 (x)
x     x     x     x     x     x     x     (x)   (x)   (x)   (x)   x
Hick- or-   y,    dick- or-   y,    dock, ___                     the

x                 x                 x                 (x)
x     (x)   x     x     (x)   x     x     (x)   (x)   (x)   (x)   (x)
mouse       ran   up          the   clock. ___
where the parentheses indicate beats that are felt and timed, but which
do not project to any syllable (termed "catalexis" in poetry). In particu-
lar, at the end of the last line, an entire triplet is left silent. Vaux and
Myler suggest that this requires a modification of Fabb and Halle's
model: instead of projection from syllables, they call for a model that
involves a mapping between linguistic syllables and abstract timing
slots. In such a model, they argue, several problems with Fabb and
Halle's approach disappear and, not coincidentally, the cognitive under-
pinnings of musical and speech meter are indeed identical: they conclude
that "poetic metre is music."
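One way to picture the timing-slot idea (a hypothetical sketch based on the reading of the grid given above, not Vaux and Myler's formal model) is as a fixed sequence of beat slots, only some of which are filled by syllables, the remainder being left as silent rests.

```python
# One line of the rhyme as twelve abstract timing slots (waltz meter: groups of
# three, each headed by the first slot of the group). None marks a silent slot.
LINE_1 = ["Hick", "or", "y", "dick", "or", "y", "dock", None, None, None, None, "the"]

def strong_slots(slots, group=3):
    """Indices of the slots that head each group of `group` beats."""
    return set(range(0, len(slots), group))

for i, syllable in enumerate(LINE_1):
    beat = "strong" if i in strong_slots(LINE_1) else "weak"
    print(f"slot {i + 1:2d}: {syllable or '(rest)':>6}  {beat}")
```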
A slightly different identity thesis is outlined by Ian Roberts (2012),
who emphasizes similarities between musical meter and linguistic syntax.
Roberts postulates that an identical combinatory operator ("Merge")
builds structure in both the musical and speech rhythm contexts, and that
the clear surface differences between these two domains reflect the dif-
ferent interface constraints that are entailed by externalization to
musical and spoken domains. In language, stress grids of whole phrases
must be brought into coherence with a lexicon of words, which them-
selves have stress, not to mention meaning and morphological/
phonological structure. In music, by contrast, notes or rests typically have
neither internal structure nor propositional meaning, and thus pose
fewer restrictions on the spell-out into acoustic stimuli. The result is
consistent with Jackendoff and Lerdahl's hypothesis that music reflects
metrical structure much more clearly and directly than language, where
additional (non-metrical) constraints exist. As they put it:
Stress in language is constrained by the fact that it is attached to strings of words,
where the choice of words is in turn primarily constrained by the fact that the
speaker is trying to convey a thought. Therefore regularity of stress usually has
to take a back seat. (44)

In summary, each set of authors has a slightly different take on the
details, but all concur that the similarities in metrical structure in music
and language are deep and non-coincidental, and whether they are iden-
tical is perhaps a terminological issue. To resolve this debate, it would
obviously be welcome to have more empirical data. For example, how
does stress assignment during recitation of poetry compare, at a neural
level, with rhythm assignment when singing or playing a melody? Exist-
ing neural imaging data indicate an important role for the basal ganglia
(and perhaps other traditional motor structures) in rhythmic percep-
tion (Grahn 2009, 2012; Merchant et al. 2015). Are the structures elicited
by reciting poetry comparable, or is strict isochrony required to observe
such activations (cf. Zeman et al. 2013)?
More generally, we currently know almost nothing about the percep-
tion of meter, in either domain, by nonhuman animals. The little evidence
available, based on ERP data with rhesus macaques, provides no indica-
tion that these monkeys perceive hierarchical structure in musical
rhythms (Honing 2012; Honing et al. 2012). In any case, the million-
dollar question concerns not monkeys, but parrots or sea lions, who have
a clear ability to detect and match an isochronic pulse. Whether animals
in these species detect metrical structure remains, at present, unknown.

15.8 Conclusions

Clearly, at present, our understanding of the biological basis for musical
rhythm remains incomplete, and our lack of knowledge concerning the
capacity of nonhuman animals to recognize and process hierarchical
metrical structure represents a central lacuna in current attempts to
compare language and music (cf. Fitch 2013c). As Jackendoff has stressed,
it is not sufficient to ask what music and language have in common: we
also must ask, "In the respects that language and music are the same, are
they genuinely distinct from other human activities?" (Jackendoff 2009,
195). If, for example, a proclivity for hierarchical structure is character-
istic of all aspects of human cognition, noting that music and language
both possess such structure is not evidence of any special relationship
between the two. However, I think this within-species question is incom-
plete, and that, if we are interested in the evolution of music and lan-
guage, we also need to ask whether particular human capacities, possibly
shared across cognitive domains, are also shared with other animals.
To illustrate, consider a somewhat neglected shared aspect of music
and language: both make use of relative, rather than absolute, pitch.
Relative pitch perception develops perfectly naturally, with no training,
in virtually all humans. In contrast, the comparative data indicate clearly
that our capacity for relative pitch perception is very unusual (Hoeschele
et al. 2015): most species perceive absolute rather than relative pitch
(Hulse and Cynx 1985; D'Amato 1988; Cynx 1993). Indeed, it has been
difficult to find any evidence for relative pitch processing in a nonhuman
species, even simple octave generalization, much less melodic transposi-
tion (cf. Wright et al. 2000). From a comparative viewpoint, this suggests
that the human capacity for relative pitch perception is another unusual,
recently-evolved trait, one that may have considerable relevance to the
evolution of both music and language.
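The computational content of this claim is easy to illustrate (a hypothetical sketch, using an arbitrary example melody): a relative-pitch code represents a melody as a sequence of intervals, which is unchanged under transposition, whereas an absolute-pitch code is not.

```python
def intervals(midi_notes):
    """Relative-pitch code: the sequence of successive intervals, in semitones."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

melody = [60, 60, 62, 60, 65, 64]        # an arbitrary example melody (MIDI numbers)
transposed = [n + 2 for n in melody]     # the same melody shifted up two semitones

print(melody == transposed)                        # False: absolute pitches differ
print(intervals(melody) == intervals(transposed))  # True: the relative code is identical
```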
Such comparative issues become particularly acute when we consider
hierarchically-structured cognition. As Jackendoff (2009) notes, and as
previously observed (by Lashley 1951; Miller, Galanter, and Pribram
1960), hierarchical structure is not only typical of music and language,
but also of human action and motor planning. Thus, such mundane
human activities as making a cup of coffee or preparing manioc require
an ability to nest sub-goals within an overall goal; such plans, when considered
formally, have a headed hierarchical structure reminiscent of linguistic
phrase structure. This suggests that motor hierarchy might have been the
evolutionary precursor of hierarchy in music and language (Lenneberg
1967; Allott 1989; Galantucci, Fowler, and Turvey 2006), suggesting in
turn that such motor hierarchies are also typical of nonhuman animals.
Here the comparative data remain incomplete, but no learned motor
behaviour known in any animal species, including chimpanzees, has hier-
archical structure as complex as making a cup of coffee (not to mention
building a house, weaving a basket, or numerous other technological
achievements). The most complex motor plans known to date in animals
involve the use of several different tool types in termite fishing by chim-
panzees (Sanz, Call, and Morgan 2009), which, while impressive, do not
approach the level of even simple modern human technologies (e.g.,
making and using a bow and arrow, fishing with hooks or nets, cooking,
etc.). Note that although hierarchical structure can be assigned to animal
motor behavior, even at the simplest level of ingesting food or locomo-
tion (Lashley 1951; Fitch and Martins 2014), this does not entail that
animals are able to form perceptual hierarchies or to generate and
manipulate cognitive representations of hierarchy. As with firefly synchro-
nization, motor hierarchies may represent highly-encapsulated, hard-
wired abilities, rather than something the organism can reflect upon,
modify, or bring into a mapping relation with other hierarchically-
organized structures.
Elsewhere my colleagues and I have proposed that hierarchical cogni-
tion is a distinctive trait of our species, and that the human love of tree-
structures or dendrophilia is not typical of even our closest primate
relatives (Fitch and Friederici 2012; Fitch 2013a; Fitch 2014; Westphal-
Fitch and Fitch, forthcoming). If true, what would this mean for the
music/language comparison? The flexible and multi-domain capacities
that we humans use to structure multiple aspects of our cognition may
in fact reflect a singular, recently evolved dendrophilic capacity, with
reflexes in music, language, social cognition, and techne (organized motor
planning). This capacity may stem from the emancipation of low-level
hierarchical organization, or reflect a de novo evolutionary innovation
in early hominins that evolved in the context of tool manufacture, and
was later co-opted for use in language and music (Montagu 1976;
Calvin 1983; Leakey 1994; Stout et al. 2008). But it is also possible that
hierarchical cognition evolved first in the context of language (or, as
Darwin (1871) proposed, in music), and was then co-opted for use in
social cognition and technology. All of these possibilities are, to my mind,
plausible, and simply acknowledging that hierarchy is in some sense
domain general in modern humans does not help in deciding among
these evolutionary hypotheses.
Given the manifestly hierarchical nature of meter in both music and
language, it should now be clear why I believe that the question of
whether non-human animals perceive metrical structure is very impor-
tant. If many species turn out to perceive hierarchical stress patterns in
human-generated sounds, or in their own species' vocalizations, this
would falsify the hypothesis that there is anything unique about hierar-
chical capacities in humans. And, while the search for hierarchical struc-
ture in animals has so far been disappointing (ten Cate and Okanoya
2012), it does not seem implausible that Snowball the parrot, Ronan the
sea lion, or members of other rhythmically skilled species might also
appreciate hierarchical structure in speech or music. Musical rhythms
represent perhaps the simplest form of hierarchical structure in human
cognition, both in music and dance (Fitch 2013c). And animal meter
perception would also provide a plausible evolutionary precursor for the
more complex forms of hierarchy seen in language or in harmonic syntax
in music (an alternative to the idea that motor hierarchy provided such
a precursor). Not coincidentally, it would support Darwin's hypothesis
that the origins of language lie in a now-lost "musical protolanguage"
that shared elements of both domains (Darwin 1871; Livingstone 1973;
Richman 1993; Brown 2000; Mithen 2005; Fitch 2006, 2013b).
Returning to the bigger picture, it is clear that the relationship between
music and language remains, in many ways, mysterious (Arbib 2013).
Equally clear is that the most profitable approach to exploring this fas-
cinating relationship is to adopt a formal multi-component approach,
subdividing each of these broad capacities into specific cognitive mecha-
nisms, and then comparing those mechanisms: the approach consistently
advocated by Jackendoff throughout his career. Furthermore, I suggest
that without his important contributions to formal analysis in both lin-
guistics and musicology, the types of detailed comparisons considered in
this chapter would be impossible (or, at least, would look very different).
I hope to have shown that consideration of animal capabilities from a
comparative perspective can play an important role in this endeavor,
even if current knowledge remains too patchy for firm conclusions to be
drawn. The detailed formal analysis of both musical and linguistic struc-
tures championed by Ray Jackendoff has played, and will doubtless
continue playing, a central role in such future work in cognitive biology.
I conclude that, particularly with regard to the biology and evolution of
metrical structure, Jackendoffs perspective provides fertile ground that
should keep linguistically- and musicologically-minded biologists busy
for decades to come.

Acknowledgements

I thank Simon Durrant, Jonah Katz, Fred Lerdahl, and Ida Toivonen for
constructively critical comments on a previous version of the manuscript,
and ERC Advanced Grant #230604 SOMACCA for financial support.

References

Adret, Patrice. 1992. Vocal learning induced with operant techniques: An over-
view. Netherlands Journal of Zoology 43 (12): 125142.
Alexander, Richard D. 1975. Natural selection and specialized chorusing behav-
ior in acoustical insects. In Insects, Science and Society, edited by David Pimentel,
3577. New York: Academic Press.
Allott, Robin. 1989. The Motor Theory of Language Origin. Sussex: The Book
Guild.
Arbib, Michael A., ed. 2013. Language, Music, and the Brain: A Mysterious Rela-
tionship. Cambridge, MA: MIT Press.
Arcadi, Adam Clarke, Daniel Robert, and Christophe Boesch. 1998. Buttress
drumming by wild chimpanzees: Temporal patterning, phrase integration into
loud calls, and preliminary evidence for individual distinctiveness. Primates 39
(4): 505518.
Arnason, Ulfur, Annette Gullberg, Axel Janke, Morgan Kullberg, Niles Lehman,
Evgeny A. Petrov, and Risto Väinölä. 2006. Pinniped phylogeny and a new
hypothesis for their origin and dispersal. Molecular Phylogenetics and Evolution
41 (2): 345354.
Bernstein, Leonard. 1981. The Unanswered Question: Six Talks at Harvard
(Charles Eliot Norton Lectures). Cambridge, MA: Harvard University Press.
Brown, Steven. 2000. The Musilanguage model of music evolution. In The
Origins of Music, edited by Nils Lennart Wallin, Björn Merker, and Steven
Brown, 271300. Cambridge, MA: MIT Press.
Buck, John. 1938. Synchronous rhythmic flashing in fireflies. Quarterly Review of
Biology 13 (3): 301314.
Buck, John. 1988. Synchronous rhythmic flashing in fireflies. II. Quarterly Review
of Biology 63 (3): 265289.
Calvin, William H. 1983. A stones throw and its launch window: Timing precision
and its implications for language and hominid brains. Journal of Theoretical
Biology 104 (1): 121135.
Clayton, Martin R. L. 1996. Free rhythm: Ethnomusicology and the study of
music without metre. Bulletin of the School of Oriental and African Studies,
University of London 59 (2): 323332.
Cook, Peter, Andrew Rouse, Margaret Wilson, and Colleen J. Reichmuth.
2013. A California sea lion (Zalophus californianus) can keep the beat: Motor
entrainment to rhythmic auditory stimuli in a non vocal mimic. Journal of Com-
parative Psychology 127 (4): 116.
Cooke, Deryck. 1959. The Language of Music. Oxford: Oxford University
Press.
Cynx, Jeffrey. 1993. Auditory frequency generalization and a failure to find octave
generalization in a songbird, the European starling (Sturnus vulgaris). Journal of
Comparative Psychology 107 (2): 140146.
D'Amato, Michael R. 1988. A search for tonal pattern perception in Cebus
monkeys: Why monkeys can't hum a tune. Music Perception 5 (4): 452–480.
Dalla Bella, Simone, Jean-François Giguère, and Isabelle Peretz. 2009. Singing in
congenital amusia. Journal of the Acoustical Society of America 126 (1):
414424.
Darwin, Charles. 1871. The Descent of Man and Selection in Relation to Sex.
London: John Murray.
De Waal, Frans B. M. 1988. The communicative repertoire of captive bonobos
(Pan paniscus), compared to that of chimpanzees. Behaviour 106 (3-4):
183251.
Egnor, S. E. Roian, and Marc D. Hauser. 2004. A paradox in the evolution of
primate vocal learning. Trends in Neurosciences 27 (11): 649654.
Ermentrout, Bard. 1991. An adaptive model for synchrony in the firefly Pteroptyx
malaccae. Journal of Mathematical Biology 29 (6): 571585.
Fabb, Nigel, and Morris Halle. 2012. Grouping in the stressing of words, in metri-
cal verse, and in music. In Language and Music as Cognitive Systems, edited by
Patrick Rebuschat, Martin Rohmeier, John A. Hawkins, and Ian Cross, 421.
Oxford: Oxford University Press.
Fitch, W. Tecumseh. 2000. The evolution of speech: A comparative review. Trends
in Cognitive Sciences 4 (7): 258267.
Fitch, W. Tecumseh. 2006. The biology and evolution of music: A comparative
perspective. Cognition 100 (1): 173215.
Fitch, W. Tecumseh. 2009. Biology of music: Another one bites the dust. Current
Biology 19 (10): R403404.
Fitch, W. Tecumseh. 2012. The biology and evolution of rhythm: Unravelling a
paradox. In Language and Music as Cognitive Systems, edited by Patrick Rebus-
chat, Martin Rohmeier, John A. Hawkins, and Ian Cross, 7395. Oxford: Oxford
University Press.
Fitch, W. Tecumseh. 2013a. The biology and evolution of language: A com-
parative approach. In The Language-Cognition Interface, edited by Stephen R.
Anderson, Jacques Moeschler, and Fabien Reboul, 5981. Geneva: Librarie
Droz.
Fitch, W. Tecumseh. 2013b. Musical protolanguage: Darwin's theory of language
evolution revisited. In Birdsong, Speech and Language: Exploring the Evolution
of Mind and Brain, edited by Johan J. Bolhuis and Martin B. H. Everaert. Cam-
bridge, MA: MIT Press.
Fitch, W. Tecumseh. 2013c. Rhythmic cognition in humans and animals: Distin-
guishing meter and pulse perception. Frontiers in Systems Neuroscience 7 (68):
116.
Fitch, W. Tecumseh. 2014. Toward a computational framework for cognitive
biology: Unifying approaches from cognitive neuroscience and comparative cog-
nition. Physics of Life Reviews 11 (3): 329364.
Fitch, W. Tecumseh, and Angela D. Friederici. 2012. Artificial grammar learning
meets formal language theory: An overview. Philosophical Transactions of the
Royal Society B 367 (1598): 19331955.
Fitch, W. Tecumseh, and Erich D. Jarvis. 2013. Birdsong and other animal models
for human speech, song, and vocal learning. In Language, Music, and the Brain:
A Mysterious Relationship, edited by Michael A. Arbib, 499539. Cambridge,
MA: MIT Press.
Fitch, W. Tecumseh, and Mauricio D. Martins. 2014. Hierarchical processing in
music, language and action: Lashley revisited. Annals of the New York Academy
of Sciences 1316: 87104.
Frigyesi, Judit. 1993. Preliminary thoughts toward the study of music without
clear beat: The example of flowing rhythm in Jewish Nusah. Asian Music 24
(2): 5988.
Galantucci, Bruno, Carol A. Fowler, and Michael T. Turvey. 2006. The motor
theory of speech perception reviewed. Psychonomic Bulletin and Review 13 (3):
361377.
Gerhardt, H. Carl, and Franz Huber. 2002. Acoustic Communication in Insects
and Anurans: Common Problems and Diverse Solutions. Chicago: University of
Chicago Press.
Grahn, Jessica A. 2009. The role of the basal ganglia in beat perception: Neuro-
imaging and neuropsychological investigations. Annals of the New York Academy
of Sciences 1169 (1): 3545.
Grahn, Jessica A. 2012. Neural mechanisms of rhythm perception: Current find-
ings and future perspectives. Topics in Cognitive Science 4 (4): 585606.
Greenfield, Michael D. 1994. Cooperation and conflict in the evolution of signal
interactions. Annual Review of Ecology and Systematics 25: 97126.
Greenfield, Michael D. 2005. Mechanisms and evolution of communal sexual
displays in arthropods and anurans. Advances in the Study of Behavior 35 (5):
162.
Greenfield, Michael D., and Igor Roizen. 1993. Katydid synchronous chorusing
is an evolutionarily stable outcome of female choice. Nature 364 (6438):
618620.
Hasegawa, Ai, Kazuo Okanoya, Toshikazu Hasegawa, and Yoshimasa Seki. 2011.
Rhythmic synchronization tapping to an audio-visual metronome in budgerigars.
Scientific Reports 1: 120.
Hattori, Yuko, Masaki Tomonaga, and Tetsuro Matsuzawa. 2013. Spontaneous
synchronized tapping to an auditory rhythm in a chimpanzee. Scientific Reports
3: 1566.
Hoeschele, Marisa, Hugo Merchant, Yukiko Kikuchi, Yuko Hattori, and Carel
ten Cate. 2015. Searching for the origins of musicality across species. Philosophi-
cal Transactions of The Royal Society B 370 (1664): 20140094.
Honing, Henkjan. 2012. Without it no music: Beat induction as a fundamental
musical trait. Annals of the New York Academy of Sciences 1252 (1): 8591.
Honing, Henkjan, Hugo Merchant, Gábor P. Háden, Luis Prado, and Ramón
Bartolo. 2012. Rhesus monkeys (Macaca mulatta) detect rhythmic groups in
music, but not the beat. PLoS One 7 (12): e51369.
Hulse, Stewart H., and Jeffrey Cynx. 1985. Relative pitch perception is con-
strained by absolute pitch in songbirds (Mimus, Molothrus, Sturnus). Journal of
Comparative Psychology 99 (2): 176196.
Jackendoff, Ray. 1987. Consciousness and the Computational Mind. Cambridge,
MA: MIT Press.
Jackendoff, Ray. 1999. Possible stages in the evolution of the language capacity.
Trends in Cognitive Sciences 3 (7): 272279.
Jackendoff, Ray. 2002. Foundations of Language. New York: Oxford University
Press.
Jackendoff, Ray. 2007. A Parallel Architecture perspective on language process-
ing. Brain Research 1146: 222.
Jackendoff, Ray. 2009. Parallels and nonparallels between language and music.
Music Perception 26 (3): 195204. Reprinted as Music and Language, in The
Routledge Companion to Philosophy and Music, edited by Theodore Gracyk and
Andrew Kania, 101112. New York: Routledge. 2011.
Jackendoff, Ray, and Fred Lerdahl. 1982. A grammatical parallel between music
and language. In Music, Mind, and Brain: The Neuropsychology of Music, edited
by Manfred E. Clynes, 83117. New York: Plenum.
Jackendoff, Ray, and Fred Lerdahl. 2006. The capacity for music: What is it, and
whats special about it? Cognition 100 (1): 3372.
Janik, Vincent M., and Peter J. B. Slater. 1997. Vocal learning in mammals. In
Advances in the Study of Behavior, vol. 26, edited by Peter J. B. Slater, Charles
T. Snowdon, Jay Rosenblatt, and Manfred Milinski, 5999. San Diego: Academic
Press.
Janik, Vincent M., and Peter J. B. Slater. 2000. The different roles of social learn-
ing in vocal communication. Animal Behaviour 60 (1): 111.
Jarvis, Erich D. 2004. Learned birdsong and the neurobiology of human language.
Annals of the New York Academy of Sciences 1016 (1): 749777.
Katz, Jonah, and David Pesetsky. 2009. The identity thesis for language and music.
LingBuzz. http://ling.auf.net/lingbuzz/000959.
Lashley, Karl. 1951. The problem of serial order in behavior. In Cerebral Mecha-
nisms in Behavior: The Hixon Symposium, edited by Lloyd A. Jeffress, 112146.
New York: Wiley.
Leakey, Richard E. 1994. The Origin of Humankind. New York: Basic Books.
Lenneberg, Eric H. 1967. Biological Foundations of Language. New York: Wiley.
Lerdahl, Fred. 2001. The sounds of poetry viewed as music. Annals of the New
York Academy of Sciences 930 (1): 337354.
Lerdahl, Fred. 2013. Musical syntax and its relation to linguistic syntax. In Lan-
guage, Music and the Brain, edited by Michael A. Arbib, 257272. Cambridge,
MA: MIT Press.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Levman, Bryan G. 1992. The genesis of music and language. Ethnomusicology 36
(2): 147170.
Liberman, Mark, and Alan Prince. 1977. On stress and linguistic rhythm. Linguis-
tic Inquiry 8 (2): 249336.
Livingstone, Frank B. 1973. Did the Australopithecines sing? Current Anthropol-
ogy 14 (12): 2529.
Martin, James G. 1972. Rhythmic (hierarchical) versus serial structure in speech
and other behavior. Pyschological Review 79 (6): 487509.
Merchant, Hugo, Wilbert Zarco, Oswaldo Pérez, Luis Prado, and Ramón N.
Bartolo. 2011. Measuring time with different neural chronometers during a
synchronization-continuation task. Proceedings of the National Academy of Sci-
ences 108 (49): 1978419789.
Merchant, Hugo, Jessica Grahn, Laurel Trainor, Martin Rohrmeier, and W.
Tecumseh Fitch. 2015. Finding the beat: A neural perspective across humans and
non-human primates. Philosophical Transactions of the Royal Society B 370
(1664): 20140093.
Merker, Björn. 2000. Synchronous chorusing and human origins. In The Origins
of Music, edited by Nils Lennart Wallin, Björn Merker, and Steven Brown,
315–327. Cambridge, MA: MIT Press.
Merker, Björn. 2002. Music: The missing Humboldt system. Musicae Scientiae 6
(1): 321.
Miller, George A., Eugene Galanter, and Karl H. Pribram. 1960. Plans and the
Structure of Behavior. New York: Henry Holt.
Mithen, Steven J. 2005. The Singing Neanderthals: The Origins of Music, Lan-
guage, Mind, and Body. London: Weidenfeld & Nicolson.
Montagu, Ashley. 1976. Toolmaking, hunting and the origin of language. Annals
of the New York Academy of Sciences 280 (1): 266273.
Nettl, Bruno. 2000. An ethnomusicologist contemplates universals in musical
sound and musical culture. In The Origins of Music, edited by Nils Lennart
Wallin, Björn Merker, and Steven Brown, 463–472. Cambridge, MA: MIT Press.
Nottebohm, Fernando. 1975. A zoologist's view of some language phenomena
with particular emphasis on vocal learning. In Foundations of Language Develop-
ment: A Multidisciplinary Approach, edited by Elizabeth Lenneberg, 61103. New
York: Academic Press.
Patel, Aniruddh D. 2003. Language, music, syntax, and the brain. Nature Neurosci-
ence 6 (7): 674681.
Patel, Aniruddh D. 2006. Musical rhythm, linguistic rhythm, and human evolu-
tion. Music Perception 24 (1): 99104.
Patel, Aniruddh D. 2008. Music, Language, and the Brain. New York: Oxford
University Press.
Patel, Aniruddh D. 2013. Sharing and nonsharing of brain resources for language
and music. In Language, Music, and the Brain: A Mysterious Relationship, edited
by Michael A. Arbib, 329355.Cambridge, MA: MIT Press.
Patel, Aniruddh D., John R. Iversen, Micah R. Bregman, and Irena Schulz. 2009a.
Experimental evidence for synchronization to a musical beat in a nonhuman
animal. Current Biology 19 (10): 827830.
Patel, Aniruddh D., John R. Iversen, Micah R. Bregman, and Irena Schulz. 2009b.
Studying synchronization to a musical beat in nonhuman animals. Annals of the
New York Academy of Sciences 1169 (1): 459469.
Patel, Aniruddh D., John R. Iversen, Yanqing Chen, and Bruno H. Repp. 2005.
The influence of metricality and modality on synchronization with a beat. Experi-
mental Brain Research 163 (2): 226238.
Peretz, Isabelle, Julie Ayotte, Robert J. Zatorre, Jacques Mehler, Pierre Ahad,
Virginia B. Penhune, and Benoît Jutras. 2002. Congenital amusia: A disorder of
fine-grained pitch discrimination. Neuron 33 (2): 185191.
Peretz, Isabelle, and Max Coltheart. 2003. Modularity of music processing. Nature
Neuroscience 6 (7): 688691.
Peretz, Isabelle, and Jos Morais. 1989. Music and modularity. Contemporary
Music Review 4 (1): 279293.
Poole, Joyce H., Peter L. Tyack, Angela S. Stoeger-Horwath, and Stephen
Watwood. 2005. Elephants are capable of vocal learning. Nature 434 (7032):
455456.
Ralls, Katherine, Patricia Fiorelli, and Sheri Gish. 1985. Vocalizations and vocal
mimicry in captive harbor seals, Phoca vitulina. Canadian Journal of Zoology 63
(5): 10501056.
Ravignani, Andrea. 2014. Chronometry for the chorusing herd: Hamilton's legacy
on context-dependent acoustic signalling. Biology Letters 10 (1): 20131018.
Rebuschat, Patrick, Martin Rohrmeier, John A. Hawkins, and Ian Cross, eds.
2012. Language and Music as Cognitive Systems. Oxford: Oxford University
Press.
Repp, Bruno H. 1998. A microcosm of musical expression. I. Quantitative analysis
of pianists' timing in the initial measures of Chopin's Étude in E major. Journal
of the Acoustical Society America 104 (2): 10851100.
Richman, Bruce. 1993. On the evolution of speech: Singing as the middle term.
Current Anthropology 34 (5): 721722.
Roberts, Ian. 2012. Comments and a conjecture inspired by Fabb and Halle. In
Language and Music as Cognitive Systems, edited by Patrick Rebuschat, Martin
Rohrmeier, John A. Hawkins, and Ian Cross, 5166. Oxford: Oxford University
Press.
Rousseau, Jean-Jacques. [1781] 1966. Essay on the Origin of Languages. Chicago:
University of Chicago Press.
Sanz, Crickette, Josep Call, and David B. Morgan. 2009. Design complexity in
termite-fishing tools of chimpanzees. Biology Letters 5 (3): 293296.
Schachner, Adena. 2010. Auditory-motor entrainment in vocal mimicking species:
Additional ontogenetic and phylogenetic factors. Communicative and Integrative
Biology 3 (3): 290293.
Schachner, Adena, Timothy F. Brady, Irene M. Pepperberg, and Marc D. Hauser.
2009. Spontaneous motor entrainment to music in multiple vocal mimicking
species. Current Biology 19 (10): 831836.
Schusterman, Ronald J. 2008. Vocal learning in mammals with special emphasis
on pinnipeds. In The Evolution of Communicative Flexibility: Complexity, Cre-
ativity, and Adaptability in Human and Animal Communication, edited by D.
Kimbrough Oller and Ulrike Griebel, 4170. Cambridge, MA: MIT Press.
Schusterman, Ronald J., and Stephen H. Feinstein. 1965. Shaping and discrimina-
tive control of underwater click vocalizations in a California sea lion. Science 150
(3704): 17431744.
Schusterman, Ronald J., and Colleen J. Reichmuth. 2008. Novel sound production
via contingency learning in the Pacific walrus (Odobenus rosmarus divergens).
Animal Cognition 11 (2): 319327.
Simon, Herbert A. 1972. Complexity and the representation of patterned
sequences of symbols. Psychological Review 79 (5): 369382.
Sjare, Becky, Ian Stirling, and Cheryl Spencer. 2003. Structural variation in the
songs of Atlantic walruses breeding in the Canadian High Arctic. Aquatic
Mammals 29 (2): 297318.
Stoeger, Angela S., Daniel Mietchen, Sukhun Oh, Shermin de Silva, Christian T.
Herbst, Soowhan Kwon, and W. Tecumseh Fitch. 2012. An Asian elephant imi-
tates human speech. Current Biology 22 (22): 21442148.
Stout, Dietrich, Nicholas Toth, Kathy Schick, and Thierry Chaminade. 2008.
Neural correlates of Early Stone Age toolmaking: Technology, language, and
cognition in human evolution. Philosophical Transactions of the Royal Society B
363 (1499): 19391949.
Strogatz, Steven H. 2003. Sync: The Emerging Science of Spontaneous Order. New
York: Hyperion.
Strogatz, Steven H., and Ian Stewart. 1993. Coupled oscillators and biological
synchronization. Scientific American 269 (6): 102105.
Temperley, David. 2004. Communicative pressure and the evolution of musical
styles. Music Perception 21 (3): 313337.
ten Cate, Carel, and Kazuo Okanoya. 2012. Revisiting the syntactic abilities of
non-human animals: Natural vocalizations and artificial grammar learning. Philo-
sophical Transactions of the Royal Society B 367 (1598): 19841994.
Vaux, Bert, and Neil Myler. 2012. Metre is music: A reply to Fabb and Halle. In
Language and Music as Cognitive Systems, edited by Patrick Rebuschat, Martin
Rohrmeier, John A. Hawkins, and Ian Cross, 4350. Oxford: Oxford University
Press.
Wells, Kentwood D. 1977. The social behaviour of anuran amphibians. Animal
Behaviour 25 (3): 666693.
Westphal-Fitch, Gesche, and W. Tecumseh Fitch. Forthcoming. Towards a com-
parative approach to empirical aesthetics. In Art, Aesthetics and the Brain, edited
by Marcos Nadal. Oxford: Oxford University Press.
Williams, Leonard. 1967. The Dancing Chimpanzee: A Study of the Origins of
Primitive Music. New York: Norton.
Wright, Anthony A., Jacqueline J. Rivera, Stewart H. Hulse, Melissa Shyan, and
Julie J. Neiworth. 2000. Music perception and octave generalization in rhesus
monkeys. Journal of Experimental Psychology: General 129 (3): 291307.
Zarco, Wilbert, Hugo Merchant, Luis Prado, and Juan Carlos Mendez. 2009.
Subsecond timing in primates: Comparison of interval production between
human subjects and rhesus monkeys. Journal of Neurophysiology 102 (6):
31913202.
Zeman, Adam, Fraser Milton, Alicia Smith, and Rick Rylance. 2013. By heart:
An fMRI study of brain activation by poetry and prose. Journal of Consciousness
Studies 20 (910): 132158.
16 Neural Substrates for Linguistic and Musical Abilities:
A Neurolinguist's Perspective1

Yosef Grodzinsky

It is tempting to say that musical and linguistic abilities, likely among the
hallmarks of humanity, are similar. What comes to mind are not only
formal properties and processing routines that these two abilities may
share, but also common brain mechanisms. In this chapter, I consider the
logic of inquiry and the current state of empirical evidence as they
pertain to the quest for common neural bases for language and music. I
first try to enumerate the properties that any cognitive ability akin
to language should possess (section 16.1), and move to a brief consider-
ation of the neurological argument for the modularity of language from
music (section 16.2). I then proceed to a critical review of studies that
have investigated gross double dissociations between music and language
(section 16.3). In section 16.4, I focus on studies of pitch discrimination
in amusia, which I critique (section 16.5). In section 16.6, I propose a
novel experimental paradigm for the study of pitch in language. In
response to critiques of past studies, I show that this paradigm overcomes them.
The paradigm, which I present in detail, is based on semantic consider-
ations, specifically on the claim that "only" associates with focus (expressed
via pitch accent). When an element in a sentence is focused, a set of
alternative meanings emerges; "only" is a function that picks certain alter-
natives out of the focus set and negates them. This paradigm helps to
create minimal sentence pairs that need not be compared in order to test
sensitivity to pitch accent. Rather, they can be investigated separately.
This property of the materials helps the new paradigm get around criti-
cisms raised in the literature by Patel and his colleagues. I conclude
(section 16.7) by alluding to salient properties of the speech of a famous
amusical individual.
It is most pleasing to use this space for a discussion of focus in the
context of music/language modularity, as these are two areas of inquiry
to which Ray Jackendoff, an early teacher/mentor of mine, has made
multiple, most valuable, contributions throughout his rich career (e.g.,
1972; 1983, passim).

16.1 Human Abilities Akin to the Linguistic

How can we tell that two (or more) classes of behaviors belong in the
same cognitive unit? We must ask whether they are governed by the
same set of building blocks and rules that combine them, structural con-
straints on such combinations, and algorithms that implement them in
use. Osherson (1981) puts it very succinctly:
. . . let C1 and C2 be two classes of processes and structures that conform to two
sets of interlocking and explanatory principles, P1 and P2, respectively. If the
properties of C1 can be proved not to be deducible from P2, and likewise for C2
and P1, then distinct faculties are (provisionally) revealed. (241–242)

Fodor (1983) suggests several perspectives from which the modularity of
cognitive systems from one another can be assessed: a) the computa-
tional perspective, in which we inquire whether the structural principles
(a.k.a. knowledge) that govern one system can be deduced from those
of another; b) the implementational perspective, which examines identity
or distinctness of the processes that implement this knowledge in use;
c) the developmental perspective, which looks at similarities and dif-
ferences in the way cognitive systems unfold in the developing child; and
d) the neurological perspective, which explores anatomical and physio-
logical properties and brain loci that support each system.
Given what we currently know about language, here are some
properties we should require from a neurocognitive ability akin to
language:
I. It must be able to handle (i.e., analyze, perhaps even produce) strings,
or continua, that unfold over time.
II. It must be able to concatenate smaller forms into bigger ones by
combinatorial rules, to ensure rich expressiveness. These must be con-
strained by principles similar to linguistic ones.
III. Its inventory of basic forms must be meaning bearing, where smaller
pieces of meaning compose into larger ones.
IV. Diversity of forms and rules is permissible, as long as it is constrained
by universal principles.
V. Its dedicated mechanisms must be supported by specialized neural
clusters.
This list helps us home in on three suspects: mathematical and musical
abilities, and the ability to sequence motor actions. Each of these seems
to be a serious candidate for satisfying criteria I-IV. Indeed, some have
maintained that language and mathematics share a common cognitive
basis (Changeux, Connes, and DeBevoise 1998; Chomsky 1988; Hen-
schen 1920), while others have argued that the same holds for language
and motor ability (Schuell 1965; Kimura 1973a,b; Rizzolatti and Arbib
1998; Fadiga, Craighero, and Roy 2006).
Neurologically, we know that each of these abilities is associated with
a disorder or deficit, which may lead to the satisfaction of criterion V, the
focus of this chapter:
Linguistic ability – aphasia
Mathematical ability – acalculia
Motor ability – apraxia
Musical ability – amusia
Aphasia, acalculia, and apraxia manifest subsequent to focal brain
damage, which leads to debates regarding criterion V. Recent experimen-
tal evidence has suggested that language and mathematics are neurologi-
cally separable (Brannon 2005; Cohen and Dehaene 2000; Gelman and
Butterworth 2005).2 Regarding motor abilities, there have also been
claims for and against modularity, most notably in the context of the
Mirror Neuron theory (e.g., Rizzolatti and Arbib [1998]; Pulvermüller
and Fadiga [2010]; Fazio et al. [2009]; see Grodzinsky [2006, 2013],
Venezia and Hickok [2009] for critical approaches).
In the case of music, matters are more complicated. Not only is neu-
rological evidence scarce, but also discussions of differences and similari-
ties between language and music have been rather complex. In his
famous Norton Lectures, Leonard Bernstein (1973) proposed to try to
find true parallels between language and music, since "all musical think-
ers agree that there is such a thing as a musical syntax, comparable to a
descriptive syntax of speech" (lecture 2).3 In the same spirit, Lerdahl and
Jackendoff (1980) claimed to have found deep parallels between lan-
guage and music. Jackendoff (2009) has further asked whether there is
domain specificity for language: "What does music share with language
that makes them distinct from other human activities?" (195). Katz and
Pesetsky (2011) have gone even further, formulating the Identity Thesis
for Language and Music: "All formal differences between language and
music are a consequence of differences in their fundamental building
blocks (arbitrary pairings of sound and meaning in the case of language;
pitch-classes and pitch-class combinations in the case of music). In all
other respects, language and music are identical."
Bernstein, Jackendoff and Lerdahl, as well as Katz and Pesetsky, discuss
the relation between language and music from a representational and
operational (processing) perspective. They steer clear of the neurological
perspective, on which I will henceforth focus. The question here, then,
will be: are there common neural substrates for musical and linguistic
processes? At present, the evidence doesn't tell us as much as we'd like
it to. We can nonetheless try to think about new ways to explore it, but
that is not easy. In what follows, I will look at the form of the neurologi-
cal argument in each of its incarnations, and try to see what conclusions,
if any, can be drawn from the evidence at hand.

16.2 The Neurological Argument for the Separability of Language from Music

Schematically, tests of neurological modularity have the structure in (1):

(1)                        Functional anatomy type 1    Functional anatomy type 2
    Measured variable A              +                              −
    Measured variable B              −                              +

That is, to demonstrate neurological modularity and in keeping with
Osherson's dictum, tests that measure variables A and B must produce
different values in neurologically distinct areas of functional types 1 and
2, respectively. The putative result in (1) would therefore indicate that
the neural basis of the cognitive component(s) probed by test A is sup-
ported by area(s) of functional type 1, whereas B is supported by 2.
Crucially, A and B are distinct. This is the well-known argument from
Double-Dissociation (DD henceforth).
The DD argument can be applied in several ways, as the rows and
columns in (1) can have different headers, as detailed in (2):
(2) Pieces of the neurological argument
a. Types of functional anatomy: loci of lesion; loci of activation
clusters
b. Types of measured (dependent) variables: (i) behavior along
some dimension, (ii) brain activity due to behavioral manipulation
c. The behavior pieces chosen (driven by a cognitive theory)
In what follows, I will offer a critical review of past work along the lines
detailed in (1)-(2). I will then follow with a constructive proposal.
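Before turning to specific studies, it may help to make the schematic logic of (1)–(2) concrete. The following is a minimal sketch of my own (not part of the original chapter): it encodes the double-dissociation test as a check over a 2 × 2 table of measurements, with a hypothetical boolean high/low coding standing in for whatever statistical criteria an actual study would apply.

    # Hypothetical illustration of the DD schema in (1): rows are the measured
    # variables (A, B), columns the two functional anatomy types (1, 2).
    # True stands for a high (+) value, False for a low (-) one.

    def is_double_dissociation(table):
        """Return True if the table shows the crossed pattern required by (1)."""
        a1, a2 = table["A"]["1"], table["A"]["2"]
        b1, b2 = table["B"]["1"], table["B"]["2"]
        # Variable A must pattern with one anatomy type and variable B with the
        # other -- the neural analogue of Osherson's mutual non-deducibility.
        return (a1 and not a2 and not b1 and b2) or (not a1 and a2 and b1 and not b2)

    example = {"A": {"1": True, "2": False},   # A patterns with anatomy type 1
               "B": {"1": False, "2": True}}   # B patterns with anatomy type 2
    print(is_double_dissociation(example))     # True: a (provisional) double dissociation

The sketch only restates the argument form; everything of substance lies in choosing the measured variables and the functional anatomies, which is the topic of what follows.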
16.3 Gross Double Dissociations in Disease and in Health

Traditionally, neuropsychologists have been engaged in a search for DDs.
In the present context, the quest has been for cases in which language is
severely disrupted whereas music remains intact, juxtaposed with cases
in which language is intact, but music is gone, as schematized in (3),
where the measured variables A and B of table (1) are replaced by non-
specific tests of musical and linguistic ability, and the functional areas are
replaced by missing (lesioned) brain regions:

(3)                      Lesion in brain locus 1    Lesion in brain locus 2
    Musical ability      High performance           Low performance
    Language ability     Low performance            High performance

Plainly put, the expectation here is to observe aphasia without amusia
and vice versa. Such cases seem to exist (Peretz 1993; Peretz and Col-
theart 2003; Grodzinsky and Finkel 1998), and their functional impair-
ment is described as follows.
G.L. (Peretz 1993) is a Québec man who apparently has amusia without
aphasia. He has lesions in both the right and left superior temporal gyri,
temporal poles, inferior frontal gyri and insulae. Out of 140 musical
excerpts . . . familiar to everyone in Québec . . . he could not identify a
single one . . . he was able to discriminate changes between single
pitches . . . was sensitive to differences in melodic contour in short melo-
dies. Yet he showed an absence of sensitivity to musical key. Language
was largely intact. He scored 32/36 on the Token Test. He scored in the
normal range on standardized aphasia tests.
J.C. (Grodzinsky and Finkel 1998) is a woman who apparently
suffers from aphasia without amusia. She has a fronto-temporal lesion
that includes Broca's area. Her speech is non-fluent and agrammatic:
she speaks in short utterances and omits functional vocabulary. Her
musical abilities are intact; an opera singer and a voice teacher prior
to the cerebro-vascular incident that impaired her, she can still sing
rather well.4

(4)              Patient G.L.: Lesion excludes    Patient J.C.: Lesion includes
                 Broca's region                   Broca's region
    music        Low performance                  High performance
    language     High performance                 Low performance

J.C.'s comprehension deficit was documented in detail (Grodzinsky and
Finkel 1998). It consisted of a syntactic impairment, manifested through
deficient performance on a forced-binary-choice Sentence-to-Picture
Matching task (5), and on a Grammaticality Judgment task (6) that fea-
tured grammatical sentences (6a,c) as well as violations (6b,d). The
symbol __ represents the extraction site:

(5) Comprehension performance                                  % correct
    a. The woman who __ dried the girl was thin                       70
    b. The woman who the girl dried __ was thin                       40

(6) Judgment of well-formedness                                % correct
    a. It seems to Sally that the father rewards himself              80
    b. *It seems to Sally that the father rewards herself             70
    c. The father seems to Sally to reward himself                    40
    d. *The father seems to Sally to insult herself                   30

The documented performance of these patients suggests that aspects
of language and music are indeed doubly dissociated.
A similar logic has guided inquiries with healthy populations: the
idea has been to search for double dissociations in the healthy brain. For
example, in fMRI, a linguistic task is expected to activate neuronal aggre-
gate X but not Y, whereas a musical task would activate cell aggregate
Y but not X. Thus Koelsch (2005) reports ERP and fMRI studies of well-
formedness, in which activations of well-formed sentences were com-
pared to regular and irregular (tonal) musical pieces. This schematic
design is in (7), where Δ represents the difference in activation level
between test and control (i.e., between brain activation with +well-
formed continua stimuli and −well-formed ones):

(7)                           Brain loci for language    Brain loci for music
    Music (Δ well-formed)     Low activation             High activation
    Language (Δ well-formed)  High activation            Low activation

Koelsch reports large bilateral frontal and temporal (perhaps temporo-
parietal) regions that are activated by the musical contrast, which he
juxtaposes to left Brodmann Areas 44, 45, regions traditionally thought
to be activated by syntax.
Using the same logic, Fedorenko et al. (2012) have monitored the
fMRI signal with a different set of contrasts. That is, they used scrambled
sentences, which they compared to songs. Their study had the following
schematic design:

(8)                        Brain loci for language    Brain loci for music
    Music (Scrambled)      Low activation             High activation
    Language (Scrambled)   High activation            Low activation

Loci for language were found in the left inferior frontal gyrus (LIFG,
roughly Broca's region), left middle frontal gyrus, and left anterior,
middle, and posterior temporal regions, as well as the angular gyrus.5
Music areas were found in both hemispheres, from right and left anterior
and posterior temporal regions to right and left premotor and supple-
mentary motor areas. Again, a double dissociation is demonstrated, but not
as sharply as one would have wished.
We might examine the results of these studies: whether they evince
neuroanatomical overlap between language and music, and whether we
observe a match between the lesion studies and those in health, or even
anatomical congruence between the two sets of fMRI studies. However,
before looking at the results, we might question the choice of tasks,
materials, and contrasts:
I. Are the musical and linguistic materials and contrasts uniform? If dif-
ferent studies use different types of contrasts, why would one expect the
resulting errors (in the case of lesion work) or activation patterns (in
health) to be similar in the first place?
II. Are the musical and linguistic tasks matched? The specificity/
modularity agenda requires use of parallel methodology and reasoning
across cognitive domains.
III. How do the tests connect to linguistic and musical structure? The
interest in the relation between language and music stems from the belief
that linguistic and musical strings are structured and governed by rules.
We also know that the neuropsychology of other domains indicates
complex symptomatology that differentiates between different syndrome
types within each domain. How does this structural complexity enter into
the considerations here?
Reviewing the studies above, we begin with the neuropsychological
cases. G.L. and J.C. received a mixed bag of tests. G.L.'s linguistic abilities
were assessed through the Token Test (De Renzi and Vignolo 1962).
However, it is not clear that this test assesses linguistic, as opposed to
general cognitive, skills: it presents a display of shapes in different colors
and sizes, and requests the subject to act on statements whose complex-
ity, on a metric that has little to do with linguistic structure, is varied.
For example, in a context of large objects only, where the properties to
be attended to are shape and color, the command may be "pick up
the yellow circle"; in a context that includes all objects, with more proper-
ties to attend to than before (shape, color, size), the command may
be "pick up the small yellow rectangle AND the large red circle." With
this structure, it is very difficult to ascertain that the dimension on
which difficulty increases here is linguistic. The increased length of
command, and its appeal to a larger number of spatial properties
of shapes, might well tap some general cognitive resource in an incre-
mental fashion. In other words, G.L.'s success on the Token Test is not
indicative of full linguistic ability, as this test may well have missed fine
linguistic deficits. Musically, G.L. was asked (and failed) to identify tunes
that were well-known in his culture at the time. There was nothing in this
test to suggest a direct analogy or parallel to the linguistic test just
described.
J.C. was given a very different language test battery, in which syntactic
structure was varied systematically along the Movement dimension, and
constraints on syntactic movement were occasionally violated. In the
musical domain, she was not tested formally, but as she actively sang, we
are fortunate to have access to recordings of her singing ability. Her
singing, good as it was, did not necessarily tap all her musical abilities; in
particular, it was not designed to match the linguistic materials in terms
of structural complexity or difficulty. Thus on all counts, G.L. and J.C. do
not constitute a double dissociation. The road to such dissociations
appears long and treacherous.
More recent studies of music/language relations in health appear to
be finer grained in terms of the choices made. Still, we might want to
scrutinize the materials and contrasts chosen, as well as the tasks. We
should also review the degree of cross modal matchingthe extent to
which tests carried out in different modalities are matched in terms of
the generic resources they require. One useful review is Koelsch (2005),
which looked at studies in which violations of musical expectation in
musical continua were compared to violations of grammaticality in lan-
guage. Regrettably, there is no discussion of the nature of the violations
in both domains and the rationale behind the choices made; nor is there
a parallelism established between the violations across domains. One
wonders, therefore, whether the contrasts chosen are (i) representative
of their respective domains in a theoretically justifiable fashion, and (ii)
whether they are parallel.
Fedorenko et al. (2012) carried out an fMRI study using a collection
of English sentences and Western musical pieces. Both sentences and
musical pieces were presumably matched, and were further contrasted
to their scrambled versions (i.e., scrambled sentences and songs). Their
participants were asked, in the music task, "how much did you like this
piece?" and, in the language task, "did X feature in the stimulus?" where
X was a given memory probe. The authors search for regions of interest
(ROIs) in the brain, defining them on the basis of a functionally selective
activation pattern they exhibit (i.e., in terms of whether the difference between
brain responses to scrambled vs. non-scrambled stimuli was low for music and
high for language). They are therefore known as fROIs (as opposed to
anatomical ROIs, defined by anatomical properties such as borders or
topography). Fedorenko et al.'s first goal was to identify fROIs sensitive
to language but not music (fROI-1), as well as opposite functional regions
(fROI-2). Finding such a DD, they argue for a "functional double dissocia-
tion" (3294).
But in light of the above discussion, one might wonder about the
motivation for the choice of basic stimuli, whether the language and
musical tasks were on a par, and moreover whether scrambling the
stimuli is a theoretically interpretable manipulation. Finally, it is not
clear that the tasks in the two domains are parallel. The absence of dis-
cussion of any of these issues leaves a reader puzzled. These authors'
justification of the choice of linguistic and musical continua comes from
the fact that all these continua activate clusters in each individual
subject, an interesting observation, but hardly a key to interpretation.
There is also no reason to think that the tasks were on a par, one request-
ing a likeability ranking, another word monitoring. These choices, as
well as the principles that may underlie the irregularity induced by
scrambling in each domain, are not discussed any further. The materials
and tasks are thus left as a black box, which seems to preclude a conclu-
sion of the sort that Fedorenko et al. wish to draw. It would appear, then,
that theoretically motivated and better matched tests would be needed
in order to evaluate whether language and music are supported by the
same brain regions.
Let me be a bit more specific. One would imagine that Fedorenko
et al.'s interest in differences and commonalities between language
and music stems from the fact that sentences and musical pieces are
structured, rule-governed objects. Indeed, they seem to suppose that any
task that involves musical combinations (compared to blatant violations
of combinatorial rules) is comparable to a task that involves linguistic
combinations (compared to blatant violations of combinatorial rules).
By this logic, the tasks they used were on a par.
But differences and commonalities between these two presumed fac-
ulties or modules can only be established through a detailed and
precise specification of the combinatorial rules at issue. Only this way
can a valid comparison be established, and thus ignoring the details, as
Fedorenko et al. do in this case, does not really help. In order to argue
that music and language are distinct, we need to ascertain that similar
musical and linguistic contrasts and tasks indeed tapped different neural
resources. An argument for the modularity of these two faculties would
first require a demonstration that the contrasts used were equally taxing,
and that task demands tapped the same structural principles/combinatorial
rules. This is not likely to have been the case here, and at any rate, no
discussion of this issue is found in the paper. As a result, we are left in
the dark.
The foregoing discussion and critique leads to several desiderata from
a proper design for music/language experiments. To be truly informative,
such experiments should:
• Make an explicit connection to theories of musical and linguistic
knowledge
• Keep task demands parallel across modalities and groups
• Focus on cognitive dimensions that are relevant to structural analysis
in all domains
Once the right cognitive dimension is found, the DD schema in (1)
can be refined. Below are sketchy design tables for studies aimed to
detect double dissociations in disease via selective performance deficien-
cies (9), and in health via localized signal intensity differences (10), which
I develop below:

(9) DDS IN LESION STUDIES                   Functional Deficiency A    Functional Deficiency B
    Music (right cognitive dimension)       Low performance            High performance
    Language (right cognitive dimension)    High performance           Low performance

(10) DDS IN fMRI SIGNAL DETECTION           Brain region A             Brain region B
     IN HEALTHY INDIVIDUALS
     Music (right cognitive dimension)      Low activation             High activation
     Language (right cognitive dimension)   High activation            Low activation

Next, I will try to illustrate how such a research program is implemented.

16.4 Pitch Discrimination in Amusia

Pitch is "that quality of sound that allows us to play musical melodies.
Moreover, it represents an abstraction: many different sounds have the
same pitch" (Schnupp et al. 2011, chap. 3). It is an abstraction, as many
different instruments (and voices) can produce sounds with the same
pitch. It is thus among the most important properties of sound that help
humans make music (Nelken 2011). No wonder, then, that it has featured
in the research program that attempts to connect music to the neural
tissue that supports it (e.g., Ayotte et al. 2002; Hyde et al. 2011; Patel
2012). It is also important for linguistic meaning and communication, as
in the difference between sentences in which a different element is
focused as manifested by pitch accent (e.g., between "HE congratulated
you" and "he congRATUlated you").
Pitch therefore avails us of a possible dimension along which we can
compare the linguistic and the musical. Indeed, several studies have
attempted to identify pitch discrimination problems in the linguistic
context in so-called amusical individuals, who suffer musical pitch defi-
cits. Ayotte et al. (2002) asked these individuals to detect differences in
melody pairs that differed by one semi-tone (positioned quasi-randomly),
and then presented them with a language task, in which they were asked
to indicate whether two sentences were the same or different, where the
pairs consisted of sentences that differ in pitch accent:

Sentences compared (Ayotte et al., 2002)

(11) a. Sing NOW please!


b. SING now please!

Ayotte et al. found that amusical individuals were near normal in dis-
criminating between these sentences. Their success here, contrasted with
their failure on the musical discrimination task, led Ayotte et al. to con-
clude that music and language are modular from one another.
Patel et al. (2008) and Liu et al. (2010) disagree with this conclusion.
To them, the high performance on (11) is not particularly telling.
Amusical individuals may have succeeded on (11) because "salient pitch
changes can be tagged according to the syllable on which this occurs,
thus reducing the memory demands of the task" (Liu et al. 2010, 1683).
The idea is that location of the pitch rise in the sentence could serve as
a cue in the comparison task. Patel and colleagues therefore suggest
ignoring the Ayotte et al. (2002) result, and moving on to instances in
which tagging is not an option, like question/statement pairs as in
(12)–(13), where the pitch difference was always in the same (sentence-
final) position.6 This linguistic material, they argue, would be a better test
of language-music modularity, as the materials are now better matched.
And so these materials were administered with two types of tasks: dis-
crimination (same/different) and identification (question or statement):

Stimuli from Liu et al. (2010)

(12) a. She looks like Ann!


b. She looks like Ann?

(13) a. He was born in Illinois!


b. He was born in Illinois?

Indeed, those individuals who had serious trouble with the musical
comparison task were also not good at distinguishing questions from
statements as in (12)–(13). Patel and colleagues conclude that this
result, the cross-modal co-occurrence of failures, argues against
domain specificity, as the musical deficit co-occurs with a linguistic one.
Still, the conclusion reached by Patel and his colleagues may be a bit
hasty. As the stakes are high (at issue is music/language modularity), I
would like to revisit Ayotte et al.'s results for (11), and see whether a
different interpretation is possible. I will then propose a way to get
around the experimental problems noted by Patel and his colleagues, one
that might lead to an improved test, with the hope of obtaining a some-
what higher resolution than previous studies.

16.5 A Critique of the Pitch Discrimination Studies

The situation as presented, then, is as follows:

(14)                                                      Amusical performance
     Music (Δ pitch)                                      Low discrimination
     Language  a. Focus location (Δ pitch) (11)           High discrimination
               b. Question/statement (Δ pitch) (12)–(13)  Low discrimination
Patel and his colleagues reject the relevance of the focus discrimination
test in (11), arguing that those in (12)–(13) are more informative.
But there may be reasons to take the opposite view: to argue that
in fact the amusical subjects' success on the contrast in (11) is a
better benchmark of their pitch identification in the linguistic context
than their failure in (12)–(13). In what follows, I will try to argue for
the latter view. A successful argument would hopefully reopen the
possibility of an empirical argument in favor of language/music
modularity.
To begin with, let me note that amusical subjects, said to fail in recogni-
tion and imitation tasks with familiar musical pieces, reportedly have
normal communicative skills.7 And yet, if the failure of these individuals
to discern a question from a statement as in (12)–(13) is indicative of a
communication deficit, why is it not manifest in their daily linguistic
functioning? It is true that many communicative acts contain many
cues beyond pitch regarding semantic type, but there surely are instances
in which such a discrimination deficit would manifest in communication.
As Liu et al. (2010) point out, amusics "rarely report problems outside
the musical domain," but proceed to suggest that "it may be expected
that these individuals would struggle with aspects of spoken language
that rely on pitch-varying information" (1682). Curiously, while the
amusical subjects' performance level on the imitation task was lower
than normal (only 87 percent correct), it was much higher than their
chance performance in identification or discrimination. While Liu et al.
acknowledge the absence of noticeable communicative deficits in amusi-
cal subjects, they nonetheless insist that "pitch deficits can be behavior-
ally relevant to both speech and music" (1691), offering no further
discussion.
Next, consider the argument that linguistic pitch is carried by meaning-
and form-bearing objects, whereas musical pitch is not. Patel (2012)
proposed this distinction to account for the amusical subjects success in
the focus discrimination condition (11). The idea is that pitch is linked
to a word (perhaps to a syllable), whereas pitch contours without lan-
guage are not, and that this link might have eased memory demands.
This might be the case, but a question immediately arises: why
are the same subjects worse with question/statement pairs, which
also have syllabic, lexical, and propositional content? Moreover, do we
understand the reasons behind the differential performance found
between discrimination/identification and imitation of questions and
statements?
These questions remain unanswered, leading to apparent inconsisten-
cies in the data. In light of these, I would like to suggest ways to revisit
the language/music modularity question in a manner that gets around
some of the problems.

16.6 A Proposal: Focus Structures with and without Only

In this section, I will put forward a simple proposal for an improved pitch
test in the linguistic domain, one that would get around Patel's tagging
critique of Ayotte et al.'s focus discrimination study, and would also be
on a par with the typical musical recognition task on which amusical
subjects fail. The goal here is to situate the tasks in a more naturalistic
context, which would not require a comparison (made easy by tagging),
and moreover would not be taxing in a way that isn't necessarily relevant
to communication. For that purpose, I will propose a task in which pitch
is required for linguistic (as opposed to meta-linguistic) analysis in sen-
tence comprehension. That is, as amusical subjects fail in simple tasks in
which pitch is crucial, namely recognition of familiar musical pieces (let
alone singing them or detecting deviations from melodic lines), we might
want to create a linguistic analogue, in which difference in pitch accent
would be crucial for language use. Natural candidate tasks involve com-
prehension, question answering, or verification. I will focus on the latter,
in the hope of finding a test for Patel's claim that the discrimination
between different sites of pitch accent within a sentence is not a valid
test of sensitivity to pitch.
In many of the world's languages (though by no means all), semantic
focus is triggered by pitch accent. Semantic focus evokes a set of alterna-
tives, picks out one, and makes it more salient. Ray Jackendoff made
an early contribution to the analysis of focus in the generative frame-
work, analyzing it through the use of the structured-meaning approach;
later, alternative semantics was introduced (Rooth 1985; 1992), which is
what guides my brief presentation below. In (15a), we have a sentence
p, and focus on an element within p evokes a set of alternatives whose
members are all propositions that John introduced Bill to someone in
the context.
Simplifying somewhat, assume a context C that features a scenario
in which John (and only he) is introducing people to one another, and
where the other participants are Bill, Mary, Betty, and Sue. Focus on
Sue, conveyed through pitch accent, asserts the proposition p (15b),
and in addition gives rise to an additional focus semantic value by allowing
a set of alternatives A_C^SUE (15c), of which one is made more salient
by focus:

(15) Focus evokes an alternative set A_C
     a. John introduced Bill to SUE (though there were others present)
     b. p = John introduced Bill to Sue
     c. A_C^SUE = {x ∈ D_e / John introduced Bill to x} =
        {John introduced Bill to Sue, John introduced Bill to Betty, John introduced
        Bill to Mary}8

Focus, then, underscores the meaning in which the alternative contain-
ing p, John introduced Bill to Sue, is made the most salient one. We
need not get into the details of the mechanism here. Suffice it to
note that the critical element for us here is the set of alternatives that
focus gives rise to, and that this set varies with the focused element.
Thus, in (16) below, the set of alternatives A_C is different from that in
(15) above:

(16) Focus evokes an alternative set A_C
     a. John introduced BILL to Sue (though there were others present)
     b. p = John introduced Bill to Sue
     c. A_C^BILL = {x ∈ D_e / John introduced x to Sue} =
        {John introduced Bill to Sue, John introduced Betty to Sue, John introduced
        Mary to Sue}

Sentences (15a) and (16a) make the same assertion p, but differ in
their focus semantic value, as A_C^SUE ≠ A_C^BILL. Namely, the focus value
evoked when the pitch accent is on Bill or Sue is different. Thus a
scenario in which John introduced Bill to someone, who happened to
be Sue, is compatible with (15) but not (16), whereas a situation in
which John introduced someone to Sue, and that someone was Bill, is
compatible with (16) but not (15). The respective acceptability judgments
follow.
The meaning differences between (15) and (16) may seem somewhat
murky, because focus makes a certain alternative more salient than
others, and the notion of salience is somewhat difficult to capture.
However, matters become crystal clear when only is introduced as an
element that associates with focus (Rooth 1985). Sentential only is a
function that combines with a sentence p and a set of alternatives A_C that
focus evokes (i.e., the set of all non-weaker alternative propositions to p
that is supplied by C), and returns a set of propositions in which all but
p are negated. The result is a sentence that asserts p, where all the other
alternatives in A_C are false (Rooth 1985; Fox 2007):
(17) [[only]] (A<st,t>)(pst) = λw: p(w) = 1. ∀q ∈ A: q(w) = 0⁹
     a. Informally: only is a function that takes a proposition p, a world w, and
        a set of alternatives A, presupposes that p is true in w, and makes false in
        w every proposition q that is non-weaker than p.

A concrete application is given in (18):

(18) John only introduced Bill to SUE (and to no other individual present)
     a. p = John introduced Bill to Sue
     b. A_C^SUE = {x ∈ D_e / John introduced Bill to x} = {John introduced Bill to Sue,
        John introduced Bill to Betty, John introduced Bill to Mary}
     c. Only(p)(A_C^SUE) = {John introduced Bill to Sue, ¬(John introduced Bill
        to Betty), ¬(John introduced Bill to Mary)}
     d. It is true that John introduced Bill to Sue, but it is false that John
        introduced Bill to Betty, and it is false that John introduced Bill to
        Mary

When only associates with another focused element in p, the result is a
different meaning, because A_C^BILL ≠ A_C^SUE and the application of only to it
would negate different alternatives:

(19) John only introduced BILL to Sue (and to no other individual present)
     a. p = John introduced Bill to Sue
     b. A_C^BILL = {x ∈ D_e / John introduced x to Sue} = {John introduced Bill to Sue,
        John introduced Betty to Sue, John introduced Mary to Sue}
     c. Only(p)(A_C^BILL) = {John introduced Bill to Sue, ¬(John introduced Betty
        to Sue), ¬(John introduced Mary to Sue)}
     d. It is true that John introduced Bill to Sue, but it is false that John
        introduced Betty to Sue, and it is false that John introduced Mary to
        Sue

We can now see that although (18) and (19) make the same assertion p,
they have different truth-conditions, because pitch accent marks a differ-
ent element in each case, thereby evoking a different set of alternatives.
Only then negates every proposition q ≠ p:

(20) Alternative sets of (18) vs. (19)

     a. A_C^SUE = {x ∈ D_e / John introduced Bill to x} = {John introduced Bill to Sue,
        John introduced Bill to Betty, John introduced Bill to Mary}
     b. A_C^BILL = {x ∈ D_e / John introduced x to Sue} = {John introduced Bill to Sue,
        John introduced Betty to Sue, John introduced Mary to Sue}
The reader may have noticed that the examples chosen above all involve
a ditransitive predicate (introduce). This is done on purpose, in order to
make the meaning contrast that different focus choices produce as
minimal as possible. The idea here is to create a task whose performance
requires sensitivity to pitch accent, and where pitch accent is placed on
elements that are syntactically and semantically on a par, modulo the
task at hand.10 Only needs focus, and our task would include a possible
position for association with focus on each of the two objects of the
ditransitive verb. Normal performance on a verification task, given a
scenario, would require the identification of focus location, which would
occur in the absence of a comparison between two representations. In
this task, tagging, as postulated by Patel and his colleagues, is not
possible.
Let me provide a concrete example of how this meaning contrast is
produced:

(21) Scenario C: John made several introductions. He introduced Bill to Sue.
     He then introduced Mary to Betty. Finally, he introduced Mary to Sue.
     There were no other introductions.

     Sentences: a. John only introduced Bill to SUE        True in C
                b. John only introduced BILL to Sue        False in C

Let's analyze what happened in each case. Scenario (context) C makes
the assertion p in both (21a) and (21b) true. Yet C makes no member of
the set that only returns for (21a) false, so (21a) is true. However, (21b)
contains the proposition ¬(John introduced Mary to Sue), which is false
in C, as the reader may verify.
The reader may likewise verify that scenario C′, described in (22),
produces opposite results:

(22) Scenario C′: John made several introductions. He introduced Bill to Mary.
     He then introduced Mary to Betty. Finally, he introduced Bill to Sue. There
     were no other introductions.

     Sentences: a. John only introduced Bill to SUE        False in C′
                b. John only introduced BILL to Sue        True in C′

The above sketch makes it quite clear, I hope, that this setup, the asso-
ciation of only with focus, allows for the testing of sensitivity to pitch
accent in a task that does not require discrimination. When the right
controls are introduced (and there are many, to be sure), this should
allow for testing through a verification (truth-value judgment) task. It is
equally easy to imagine, I think, a production task with scenarios like
(21) and (22), in which amusical subjects would be forced to use only,
and the issue would be whether or not they can successfully use pitch
accent to mark the associated focus.
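Purely as an illustration of the truth-value logic that such a verification task would rely on (it is not the experimental implementation discussed next, and the domain, the triple encoding of propositions, and the scenario sets are my own hypothetical simplifications), here is a minimal sketch that computes the judgments in (21) and (22) under alternative semantics.

    # Toy verification of "John only introduced X to Y" under alternative semantics.
    # A proposition is a triple (introducer, introduced, introduced_to); a scenario
    # is the set of introductions that actually took place. The alternative sets
    # mirror (15)-(16); entailment-based pruning (note 8) is trivial here because
    # the alternatives are logically independent.

    DOMAIN = {"Bill", "Sue", "Mary", "Betty"}

    def alternatives(p, focused_slot):
        """Alternatives evoked by pitch accent on the direct object (1) or the PP object (2)."""
        who, x, y = p
        if focused_slot == 1:                             # BILL focused: vary who gets introduced
            return {(who, z, y) for z in DOMAIN if z != y}
        return {(who, x, z) for z in DOMAIN if z != x}    # SUE focused: vary the addressee

    def only(p, alts, scenario):
        """'only p' holds iff p holds and every distinct alternative fails."""
        return p in scenario and all(q == p or q not in scenario for q in alts)

    p = ("John", "Bill", "Sue")

    # Scenario C of (21) and scenario C' of (22).
    C      = {("John", "Bill", "Sue"), ("John", "Mary", "Betty"), ("John", "Mary", "Sue")}
    Cprime = {("John", "Bill", "Mary"), ("John", "Mary", "Betty"), ("John", "Bill", "Sue")}

    print(only(p, alternatives(p, 2), C))       # (21a) ...Bill to SUE:  True in C
    print(only(p, alternatives(p, 1), C))       # (21b) ...BILL to Sue:  False in C
    print(only(p, alternatives(p, 2), Cprime))  # (22a) ...Bill to SUE:  False in C'
    print(only(p, alternatives(p, 1), Cprime))  # (22b) ...BILL to Sue:  True in C'

A participant who cannot exploit the pitch accent has, in effect, no way to choose between the two alternative sets, which is precisely what a verification task of this kind is designed to detect.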
An implementation of this proposal is presently unavailable. What is
important about the proposal is that it does not enable tagging, because
no comparison or discrimination between two utterances is required.
Patel et al. would predict that amusical subjects would fail in this verifica-
tion task. Failure on their part would provide strong empirical evidence
against the modularity of language and music. And thus, while at present
no relevant result is available, the jury appears to be still out on the
modularity of language and music, at least until a result of the proposed
experiment, or some related one, is obtained.

16.7 Coda

I tried to revive the notion that amusia, as reported in the clinical litera-
ture, does not co-occur with a language deficit (contra Liu et al. [2010]).
One anecdotal, yet not insignificant, observation relates to the famous
late economist Milton Friedman, believed to be amusical. His fame
allows us to have access to speech samples of his. An important example
is an interview on "Greed" he granted Phil Donahue in 1979.11 If you
haven't seen it, I would urge you to do so, for Friedman's especially
expressive intonation, containing many questions and exclamations
(apparently intended to make his argumentation more convincing) might
make a compelling case for language/music modularity.

Notes

1. An earlier version of this paper was presented at "Music and Brains: The Sur-
prising Link," an ELSC/ICNC conference at Mishkenot Sha'ananim, Jerusalem,
the Hebrew University of Jerusalem, February 10th, 2013. I would like to thank
the organizers, Eli Nelken, Ronny Granot, and Nori Jacoby for their kind invita-
tion. I also thank the following agencies and institutions for their support:
Edmond and Lily Safra Center for Brain Sciences, Canada Research Chairs
(CRC), and the Canadian Social Sciences and Humanities Research Council
(SSHRC). Eli Nelken's comments and the crucial help of Michael Wagner, Direc-
tor of McGill's Prosodylab, are also gratefully acknowledged.
2. This evidence, however, is mostly based on work at the single word level, while
the linguistic perspective focuses on operations that form larger expressions from
more basic units (Varley, Klessinger, Romanowski, and Siegal [2005] being a
possible exception). See Heim et al. (2012), Deschamps et al. (under review) for
further evidence that bears on this issue.
3. Bernstein also proposed to use linguistic tools in order to build an analogy
between musical and linguistic procedures and to seek "the world-wide inborn
musical grammar" (lecture 1).
4. J.C. has courageously participated in a public concert subsequent to her stroke.
Her singing in this event can be viewed at http://www.drunkenboat.com/db7/
feature-aphasia/curtis/index.html.
5. Note that localizing claims here must be taken as rough approximations
rather than precise pointers, as these authors use functional, as opposed to ana-
tomical, localization (fROIs). Indeed, their expressed focus is on DDs of func-
tion, rather than on the identification of the exact anatomical loci of these
functions.
6. The pitch difference in the language task (5–11 semi-tones) is greater than the
one in its musical counterpart, in which, recall, one note was changed by a single
semi-tone. Yet this difference between the musical and linguistic discrimination
tasks cannot be the reason for success, as the same difference did not help the
amusical subjects when asked to discriminate question/statement pairs.
7. Patel et al. (2008) and Stewart (2008) mention the well-known economist and
public figure Milton Friedman, as well as activist Che Guevara, as having been
amusical. One can't help but doubt the possibility that their deficit extended to
linguistic pitch in a manner that would have hampered their ability to distinguish
questions from statements.
8. Notice that the set A is constructed so as to exclude alternatives that p entails
(Fox 2007). E.g., p = that John introduced Bill to Sue entails the alternative that
John introduced someone to Sue. As the latter is weaker than p, it carries the
same truth value as p. Thus, of the set of possible alternatives, we only include
the set of non-weaker (NW) ones, which contains those alternatives to p which
are not entailed by p:
(i) NW(p, A) = {q ∈ A: p does not entail q}
E.g., that John introduced Betty to Sue, an alternative to p, neither entails nor is
entailed by p. It is thus a member of NW. For notational simplicity, I henceforth
assume that Ac = NW. This assumption will become more significant below, in
the context of only.
9. Once again, just non-weaker alternatives are negated by only (Fox 2007); all
others are entailed by p, hence true (as p is presupposed to be true).
10. A reviewer notes that pitch accent on Bill in (19) ensures that Bill is focus-
marked, but pitch accent on Sue in (18) is compatible with F-marking either on
Sue, or on the whole VP introduced Bill to Sue. As empirical evidence, s/he notes
that (19) is a good answer to "what did John do yesterday?" but (18) is not. This
observation, while valid, has no interaction with the present proposal: (18) and
(19) uncontroversially differ in truth conditions due to focus marking, and the
verification task at issue, in which the scenarios mentioned in the text are pro-
vided, would therefore distinguish between the two approaches to amusia as
described in the text.
11. http://www.youtube.com/watch?v=RWsx1X8PV_A.
References

Ayotte, Julie, Isabelle Peretz, and Krista Hyde. 2002. Congenital amusia: A
group study of adults afflicted with a music-specific disorder. Brain 125 (2):
238–251.
Bernstein, Leonard. 1973. The Unanswered Question. Cambridge, MA: Harvard
University Press.
Brannon, Elizabeth M. 2005. The independence of language and mathematical
reasoning. Proceedings of the National Academy of Sciences of the United States
of America 102 (9): 3177–3178.
Changeux, Jean-Pierre, and Alain Connes. 1998. Conversations on Mind, Matter,
and Mathematics. Edited and translated by M. B. DeBevoise. Princeton, NJ:
Princeton University Press.
Chomsky, Noam. 1988. Language and Problems of Knowledge: The Managua
Lectures. New York: Cambridge University Press.
Cohen, Laurent, and Stanislas Dehaene. 2000. Calculating without reading:
Unsuspected residual abilities in pure alexia. Cognitive Neuropsychology 17 (6):
563–583.
De Renzi, Ennio, and Luigi Vignolo. 1962. The Token Test: A sensitive test to detect
receptive disturbances in aphasics. Brain 85 (4): 665–678.
Deschamps, Isabelle, Galit Agmon, Yonatan Loewenstein, and Yosef Grodzinsky.
Under review. Quantities and quantifiers: Weber's law, monotonicity and modu-
larity. MS. McGill University and The Hebrew University, Jerusalem.
Fadiga, Luciano, Laila Craighero, and Alice Roy. 2006. Broca's region: A speech
area? In Broca's Region, edited by Yosef Grodzinsky and Karin Amunts, 137–152.
New York: Oxford University Press.
Fazio, Patrik, Anna Cantagallo, Laila Craighero, Alessandro D'Ausilio, Alice C.
Roy, Thierry Pozzo, Ferdinando Calzolari, Enrico Granieri, and Luciano Fadiga.
2009. Encoding of human action in Broca's area. Brain 132 (7): 1980–1988.
Fedorenko, Evelina, Josh McDermott, and Nancy Kanwisher. 2012. Sensitivity to
musical structure in the human brain. Journal of Neurophysiology 108 (12):
3289–3300.
Fodor, Jerry. 1983. The Modularity of Mind. Cambridge, MA: MIT Press.
Fox, Danny. 2007. Free choice disjunction and the theory of scalar implicatures.
In Presupposition and Implicature in Compositional Semantics, edited by Uli
Sauerland and Penka Stateva, 71–120. New York: Palgrave Macmillan.
Gelman, Rochel, and Brian Butterworth. 2005. Number and language: How are
they related? Trends in Cognitive Sciences 9 (1): 6–10.
Grodzinsky, Yosef. 2006. The language faculty, Broca's region, and the mirror
system. Cortex 42 (4): 464–468.
Grodzinsky, Yosef. 2013. The mirror theory of language: A neuro-linguist's per-
spective. In Language Down the Garden Path: The Cognitive and Biological Basis
for Linguistic Structure, edited by Montserrat Sanz, Itziar Laka, and Michael
Tanenhaus, 333–347. Oxford: Oxford University Press.
Grodzinsky, Yosef, and Lisa Finkel. 1998. The neurology of empty categories:
Aphasics' failure to detect ungrammaticality. Journal of Cognitive Neuroscience
10 (2): 281–292.
Henschen, Salomon Eberhard. 1920. Klinische und anatomische Beiträge zur
Pathologie des Gehirns. Stockholm: Nordiska Bokhandeln.
Heim, Stefan, Katrin Amunts, Dan Drai, Simon Eickhoff, Sara Hautvast, and
Yosef Grodzinsky. 2012. The language-number interface in the brain: A complex
parametric study of quantifiers and quantities. Frontiers in Evolutionary Neuro-
science 4 (4): 1–12.
Hyde, Krista L., Robert J. Zatorre, and Isabelle Peretz. 2011. Functional MRI
evidence of an abnormal neural network for pitch processing in congenital
amusia. Cerebral Cortex 21 (2): 292–299.
Jackendoff, Ray. 1972. Semantic Interpretation in Generative Grammar. Cam-
bridge, MA: MIT Press.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray. 2009. Parallels and non-parallels between language and music.
Music Perception 26 (3): 195–204.
Jentschke, Sebastian, and Stefan Koelsch. 2008. Musical training modulates the
development of syntax processing in children. NeuroImage 47 (2): 735–744.
Katz, Jonah, and David Pesetsky. 2011. The identity thesis for language and music.
MS. Institute Jean Nicod and MIT.
Kimura, Doreen. 1973a. Manual activity during speaking – I. Right-handers. Neu-
ropsychologia 11 (1): 45–50.
Kimura, Doreen. 1973b. Manual activity during speaking – II. Left-handers. Neu-
ropsychologia 11 (1): 51–55.
Koelsch, Stefan. 2005. Neural substrates of processing syntax and semantics in
music. Current Opinion in Neurobiology 15 (2): 207–212.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Liu, Fang, Aniruddh D. Patel, Adrian Fourcin, and Lauren Stewart. 2010. Intona-
tion processing in congenital amusia: Discrimination, identification, and imita-
tion. Brain 133 (6): 1682–1693.
Nelken, Israel. 2011. Music and the auditory brain: Where is the connection?
Frontiers in Human Neuroscience 5: 106.
Osherson, Daniel N. 1981. Modularity as an issue for cognitive science. Cognition
10 (1–3): 241–242.
Patel, Aniruddh. 2012. Language, music, and the brain: A resource-sharing frame-
work. In Language and Music as Cognitive Systems, edited by Patrick Rebuschat,
Martin Rohrmeier, John A. Hawkins, and Ian Cross, 204–223. Oxford: Oxford
University Press.
Patel, Aniruddh, Meredith Wong, Jessica Foxton, Aliette Lochy, and Isabelle
Peretz. 2008. Speech intonation perception deficits in musical tone deafness
(congenital amusia). Music Perception 25 (4): 357–368.
Peretz, Isabelle. 1993. Auditory atonalia for melodies. Cognitive Neuropsychology
10 (1): 21–56.
Peretz, Isabelle, and Max Coltheart. 2003. Modularity of music processing. Nature
Neuroscience 6: 688–691.
Pulvermüller, Friedemann, and Luciano Fadiga. 2010. Active perception: Senso-
rimotor circuits as a cortical basis for language. Nature Reviews Neuroscience 11
(5): 351–360.
Rizzolatti, Giacomo, and Michael Arbib. 1998. Language within our grasp. Trends
in Neurosciences 21 (5): 188–194.
Rooth, Mats. 1985. Association with Focus. PhD diss., University of Massachu-
setts, Amherst.
Rooth, Mats. 1992. A theory of focus interpretation. Natural Language Semantics
1 (1): 75–116.
Schuell, Hildred. 1965. Minnesota Test for Differential Diagnosis of Aphasia. Min-
neapolis, MN: University of Minnesota Press.
Schnupp, Jan, Israel Nelken, and Andrew J. King. 2011. Auditory Neuroscience:
Making Sense of Sound. Cambridge, MA: MIT Press.
Stewart, Lauren. 2008. Fractionating the musical mind: Insights from congenital
amusia. Current Opinion in Neurobiology 18 (2): 127–130.
Varley, Rosemary A., Nicolai J. C. Klessinger, Charles A. J. Romanowski,
and Michael Siegal. 2005. Agrammatic but numerate. Proceedings of the National
Academy of Sciences of the United States of America 102 (9): 3519–3524.
Venezia, Jonathan, and Greg Hickok. 2009. Mirror neurons, the motor system,
and language: From the motor theory to embodied cognition and beyond. Lan-
guage and Linguistics Compass 3 (6): 1403–1416.
17 Structure and Ambiguity in a Schumann Song

Fred Lerdahl

17.1 Introduction

Robert Schumann's "Im wunderschönen Monat Mai," the first song of
the cycle Dichterliebe on poems by Heinrich Heine, is famous for its tonal
ambiguity and open form. This chapter applies the methodology of Ray
Jackendoff's and my A Generative Theory of Tonal Music (Lerdahl and
Jackendoff 1983; hereafter referred to as GTTM) and my Tonal Pitch
Space (Lerdahl 2001a; hereafter TPS) to analyze the song's structure,
thereby elucidating its uncertainties and tensions.1
First, a few words about GTTM. When listening to a piece of music,
the listener does not merely hear a sequence of sounds but also uncon-
sciously organizes them into structures. These structures, which repre-
sent intuitions of constituency, prominence, and tension and
relaxation, constitute the core of the listener's implicit understanding
of the piece. GTTM's goal, outlined in figure 17.1, is to take as input the
musical surface (that is, the pitches and rhythms resulting from psycho-
acoustic processing) and generate by rule the structures inferred from
the input. The rules represent psychological principles of organization.
GTTM adopts from generative linguistics several methodological ide-
alizations, one of which will impinge on the analysis of the Schumann
song: the theory assigns final-state representations, setting aside the com-
plicated problem of how listeners process musical structures in real time.
At points in the analysis, it will be useful instead to take a quasi-processing
perspective.
GTTM's rules are mainly of two types, well-formedness rules and
preference rules. Well-formedness rules describe possible structures
within a given component. Preference rules select from possible struc-
tures those that are predicted to be heard given a specific musical surface.
Preference rules are gradient rather than categorical; that is, they do not
generate a single correct solution but yield a small range of preferred
solutions. A third rule type, transformational rules, which play a minor
role in the system, permit certain alterations on well-formed structures.
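As a purely illustrative aside (my own sketch, not part of GTTM), the division of labor between well-formedness rules and gradient preference rules can be pictured as scoring a set of admissible candidate structures; the candidates, the two toy rules, and their equal weights below are hypothetical.

    # Minimal sketch, not from GTTM: well-formedness rules delimit the candidate
    # structures; gradient preference rules then score how strongly each candidate
    # is heard. Candidates and rules here are hypothetical toys.

    def rank_candidates(candidates, preference_rules):
        """Rank well-formed candidates by their summed preference-rule scores."""
        scored = [(sum(rule(c) for rule in preference_rules), c["name"]) for c in candidates]
        return sorted(scored, reverse=True)

    # Two toy metrical readings of the same surface and two toy preference rules:
    # favor strong beats early in a group, and favor strong beats on stable harmonies.
    candidates = [
        {"name": "interpretation A", "strong_beat_early": True,  "strong_on_stable_harmony": False},
        {"name": "interpretation B", "strong_beat_early": False, "strong_on_stable_harmony": True},
    ]
    rules = [
        lambda c: 1 if c["strong_beat_early"] else 0,
        lambda c: 1 if c["strong_on_stable_harmony"] else 0,
    ]

    for score, name in rank_candidates(candidates, rules):
        print(name, score)   # both score 1: conflicting evidence yields two preferred readings

The tie between the two toy candidates anticipates the situation in section 17.2, where conflicting evidence supports two hypermetrical interpretations of the same passage.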
[Figure 17.1: Form of generative music theory (musical surface → rules → heard structure)]

[Figure 17.2: A flowchart of GTTM's components (musical surface, grouping, meter, time-span segmentation, time-span reduction, stability conditions, prolongational reduction)]

Figure 17.2 gives a flowchart of the components developed in GTTM
to assign hierarchical structure to a musical surface. The grouping com-
ponent parses the musical surface into motives, phrases, and sections. The
metrical component assigns periodic patterns of strong and weak beats.
These two components act together to locate each pitch event (that is,
a pitch or chord) at a particular location within the nested time-span
segmentation. Stability conditions apply to events within this segmenta-
tion to produce a time-span reduction, which represents levels of event
importance in the rhythmic structure. The time-span reduction is input
to a second kind of event hierarchy, prolongational reduction, which
describes patterns of tension and relaxation among events. Relative
tension is again controlled by the stability conditions.
The disposition of these components will emerge from the analysis of
"Im wunderschönen Monat Mai." Figure 17.3 presents the score, and
figure 17.4 gives the German text and English translation.2

[Figure 17.3: Schumann, "Im wunderschönen Monat Mai" (the first song of Dichterliebe)]

Each poetic
strophe is set to the same music, with the unresolved piano introduction
repeating at the end to produce the sense of a fragmentary, unbounded
structure. This striking formal feature reflects the poet's emotions, which
are torn between hope and doubt that his love will be reciprocated. His
uncertainty is also mirrored in the song's ambiguous tonality. The begin-
ning and ending imply F# minor, but this tonic never arrives, and the
song does not resolve. Each vocal stanza begins firmly in A major but
ends in tonal instability, as it were in the midst of a thought, only to circle
back in the piano to the beginning with its implication of F# minor.

[Figure 17.4: The text of "Im wunderschönen Monat Mai" (by Heinrich Heine), with English translation:
    Im wunderschönen Monat Mai,       In the lovely month of May,
    Als alle Knospen sprangen,        when all the buds were bursting,
    Da ist in meinem Herzen           then within my heart
    Die Liebe aufgegangen.            love broke forth.

    Im wunderschönen Monat Mai,       In the lovely month of May,
    Als alle Vögel sangen,            when all the birds were singing,
    Da hab' ich ihr gestanden         then I confessed to her
    Mein Sehnen und Verlangen.        my longing and desire.]

[Figure 17.5: Metrical and grouping analysis of the first vocal phrase (bars 5–8)]

17.2 Rhythmic Organization of the Song

Figure 17.5 provides a rhythmic analysis of the first vocal phrase of the
song. The grouping brackets parse the phrase into two halves. The metri-
cal grid represents strong and weak beats by a dot notation.3 If a beat is
strong at one level, it is also a beat at the next larger level. The note
values to the left of the grid register the distance between beats at each
level. Notice that the grouping boundaries are slightly out of phase with
the time spans between beats, showing an upbeat of one 16th note to bar
1 and three 16th notes to bar 3.
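The nesting of beat levels just described can be pictured with a small data-structure sketch (mine, not GTTM's); the beat positions below are hypothetical 16th-note offsets, not a transcription of figure 17.5.

    # A metrical grid as sets of beat positions per level. The property described
    # in the text: every beat at a larger level is also a beat at each smaller level.
    # Positions are hypothetical 16th-note offsets over two bars.

    grid = {
        "16th":    set(range(0, 32)),        # every 16th-note position
        "8th":     set(range(0, 32, 2)),
        "quarter": set(range(0, 32, 4)),
        "bar":     {0, 16},                  # the two downbeats
    }

    def nested_levels_ok(grid, levels_small_to_large):
        """Check that each level's beats are a subset of the next smaller level's beats."""
        return all(grid[large] <= grid[small]
                   for small, large in zip(levels_small_to_large, levels_small_to_large[1:]))

    print(nested_levels_ok(grid, ["16th", "8th", "quarter", "bar"]))  # True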
[Figure 17.6: Grouping overlap and hypermetrical ambiguity in bars 1–15]

Figure 17.6 shows a rhythmic analysis of the first stanza, ignoring beats
beneath the bar level. Throughout the song, two-bar groups combine to
form four-bar groups, but behind this simple pattern lies a complication.
The two-bar group in bars 9–10 repeats sequentially in bars 11–12. Yet
bars 12–13 also form a two-bar group, echoing bars 1–2 with the bass line
D–C# and a progression into the dominant of F# minor. Thus bar 12 both
ends one group and begins another, producing an overlap. The figure also
displays two plausible hypermetrical interpretations.4 On one hand, there
is a preference for hearing strong beats early in a group, favoring
interpretation A. On the other hand, the relative harmonic stability of
even-numbered bars, together with the crescendos into bars 10 and 12
and the longer harmonic duration in those bars, supports interpretation
B. Further, the grouping overlap in bar 12 causes a metrical shift (or
deletion, as indicated by the dots in parentheses), for under either inter-
pretation A or interpretation B, the listener hears a metrical pattern in
bars 12–13 and 14–15 parallel to that in bars 1–2 and 3–4. For some
listeners, however, the conflicting evidence for the two hypermetrical
interpretations may cancel any intuitions of metrical structure at the
two-bar level.

[Figure 17.7: Ambiguity in the global grouping structure; readings (a) and (b) combine the four-bar groups A1 (bars 1–4), B1 (5–8), C1 (9–12), A2 (12–15), B2 (16–19), C2 (20–23), and A3 (23–26) into larger sections in different ways]
The grouping structure above the four-bar level is ambiguous. Initially
bars 1–12 seem to comprise a three-part section marked A1, B1, and C1
in figure 17.7a, with B1 and C1 grouped together: a four-bar piano intro-
duction followed by an eight-bar verse. Bars 1–4 repeat in bars 12–15, so
by this rationale bars 12–23 form a parallel section, A2B2C2. (Bars 12
and 23 are counted twice because of the overlaps.) But this leaves A3 in
bars 23–26 hanging. The problem is that the A phrase can be either a
beginning (A1) or an ending (A3). A2 functions as both at once. Its first
impression is of a second beginning parallel to A1, but the ritardando
notated throughout A2 (see figure 17.3) has the effect of winding down
B1C1 and foreshadowing the closing role of A3.
Figure 17.7b offers a symmetrical grouping in which A2 explicitly
doubles as beginning and ending. At the eight-bar level, the grouping is
A1B1 and C1A2, followed in parallel fashion by A2B2 and C2A3.
(Strictly speaking, at the surface C1A2 and C2A3 are seven bars long
because of the overlaps in bars 12 and 23.) Finally, the largest groupings
are A1B1C1A2 and A2B2C2A3. This interpretation is supported by


details at the four-bar level. Bar 5 repeats and tonally reinterprets bars
1 and 3 with their first-inversion B minor chords and suspended C#s,
thus binding A1 and B1 together. The overlap in bar 12 similarly joins C1
and A2. The link between B1 and C1, by contrast, is weak, because B1
cadences.
GTTM allows grouping overlaps for single events, not groups such as
A2 that contain multiple events. The analysis in figure 17.7b suggests a
revision to permit complete low-level groups (but not parts of groups)
to be treated as overlaps. Alternatively, the single-event restriction could
be maintained and the grouping analysis seen as evolving over time. In
this perspective, A2 functions first as a beginning as in figure 17.7a but in
retrospect as a concluding group parallel to A3.

17.3 Pitch Organization of the Song

The time-span segmentation component parses the music from beat to
beat at small levels and from group to group at larger levels. If metrical
and grouping segments conflict at intermediate levels, adjustments are
made to prevent violations of grouping boundaries. Within each time
span, the time-span reduction component selects the most stable event,
level by level from the bottom up. Once the phrase level is reached,
cadences (full or half) are marked [c] and preserved up to the highest
level for which they function. Paired with each [c] is a structural begin-
ning, optionally marked [b], which is the most stable event before [c] in
that unit. Figure 17.8 illustrates schematically with four four-bar phrases
grouped symmetrically into two eight-bar paired phrases and one sixteen-
bar group. The trajectory from [b] to [c] takes place in each four-bar
phrase. At the eight-bar level, [b] starting the first phrase goes to [c]

Figure 17.9
Time-span reduction of bars 1–15 on the interpretation that the song is in F# minor

ending the second phrase; similarly for the third and fourth phrases. At
the sixteen-bar level, all that remains are [b] launching the group and [c]
ending it.
Figure 17.9 shows a time-span reduction of bars 1–15 on the interpreta-
tion that the global tonic is F# minor.5 (Later I shall consider the alterna-
tive of A major.) Level f reduces the 16th-note musical surface to 8th notes.
Level e in turn eliminates embellishing events at level f to yield a quarter-
note sequence. Level d continues the process to the half-note level and
level c to the two-bar level. Two-bar groupings are shown beneath level
c. The overlap in bar 12 is represented by two events, a D major arrival
for the previous phrase and a C# dominant 7th for the ensuing phrase.
Levels b and a eliminate less structural events at the four- and eight-bar
levels. The dominant 7ths of F# minor dominate the entire structure
because they act as the structural beginning and cadence of the largest
groups.
Prolongational analysis is represented by a tree structure in which
right branching signifies a tensing motion and left branching a relaxing
motion. In figure 17.10a, dominating event x tenses into subordinate
event y; in figure 17.10b, subordinate x relaxes into dominating y. The tree
notation is an adaptation from syntactic trees in linguistics, but without
syntactic categories. Prolongational trees are often accompanied by a
formally equivalent notation in slurs. The slurs coordinate with branch-
ings. Dashed slurs are reserved for repetitions.
A prolongational analysis derives from global to local levels of its
associated time-span reduction via the interaction principle illustrated in


Figure 17.10
The branching notation for prolongational reduction. In (a), y is subordinate to x, and the
progression from x to y is a tensing motion. In (b), x is subordinate to y, and the progression
from x to y is a relaxing motion.

Figure 17.11
Schematic diagram of the interaction principle, mapping time-span reductional levels a–d onto the corresponding prolongational reductional levels a–d

figure 17.11. As shown by the solid arrows, events at time-span level a find the most stable available connection at prolongational level a, and
so on to levels b, c, etc. This mapping amounts to a claim that, at a given
level, the most stable events in the rhythmic structure are the events that
project patterns of tension and relaxation at that level. If, however, an
identical event appears at the immediately smaller level, it is elevated to
the larger level for connection. This exception, shown by the dashed
arrows, reflects the perceptual force of literal repetition.
Figure 17.12 displays a prolongational analysis of bars 1–13 derived
from the time-span reduction in figure 17.9. Derivational levels are
labeled by letters in the tree. The slurs in the upper system represent
local prolongational connections. Observe that the various first-inversion
B minor chords do not connect to one another but resolve locally to the
dominant 7th of F# minor in bars 1, 3, and 14 and to the dominant 7th
of A major in bars 5 and 7. This detail illustrates a fundamental feature
of tonal music, the interplay between salience and stability. The first-
inversion B minor chords with suspended C#s project the salient sound

Figure 17.12
Prolongational analysis of bars 1–13 on the interpretation that the song is in F# minor

of the song, yet in this analysis they are all unstable. A prolongational
analysis selects stability over salience.
The lower system in figure 17.12 removes repetitions to bring out the
basic harmonic and linear motion. The C#7 chords dominate the struc-
ture. As half cadences they point to F# minor as the global tonic. At level
b, the first dominant 7th progresses to the local tonic of A major, which
then elaborates into the region of B minor. The sequenced modulation
to D major emerges at level c. The dashed branch to the D major chord
in bar 12 receives a double branch because of the grouping overlap dis-
cussed earlier. Its second branch reflects a reinterpretation of that event
as the predominant of F# minor.
At the bottom of figure 17.12 there is a functional harmonic analysis
employing the symbols of T for tonic function, S for subdominant or
predominant function, and D for dominant function.6 Another, Dep,
signifies departure. These symbols represent not chords per se but their
prolongational role: Dep for the branching that departs from the
superordinate event, D for the branching that attaches to or points to T,
and S for the branching that attaches to D.
The prolongational and functional analysis of most phrases takes the
form of figure 17.13: a T prolongation elaborated by a departure, followed
by S that moves into a two-membered cadence, D to T. Whatever else
happens in the phrase, this pattern usually occurs, for it efficiently proj-
ects a tensing-relaxing pattern. In a half cadence, the final T is omitted
from the schema, and occasionally the opening T is absent. Another
variant is the absence of S. The more a phrase deviates from the schema,

Figure 17.13
Normative prolongational and functional structure: T – Dep – S – D – T. Dep stands for departure, [c] for cadence, usually V–I.

Figure 17.14
Time-span reduction of bars 1–15 on the interpretation that the song is in A major

the less stable the overall structure. This normative branching and func-
tional schema also takes place at grouping levels larger than the phrase.
The analysis in figure 17.12 achieves a version of normative prolonga-
tional structure but with an unorthodox functional progression. The
framing prolongation is not T to T but D to D, and the primary departure
in bar 6 is, at a smaller level, T of a related key. At a global level only S
to D in bars 12–13 is standard. This unusual realization of normative structure weakens the sense of F# minor as global tonic.
The theory derives the alternative global tonic of A major if only one change is made in the time-span reduction: by not labeling the C#7 chord in bar 2 (and its repetitions) as half-cadential. The revised time-span reduction in figure 17.14 takes this step. Its justification is that bars 1–2 alone do not firmly establish F# minor. With the removal of the initial

Figure 17.15
Prolongational and functional analyses of bars 1–2: (a) if the C# 7th chord is treated as a
half-cadence in F# minor; (b) if the C# 7th chord is treated as not cadential but as a chro-
matic deviation within A major.

half cadence, TPS's key-finding component waits until the cadence in bars 5–6 to interpret bars 1–6 entirely in A major. As a result, the hier-
archical relationship between the inverted B minor chord and the now
tonally more distant C# 7th chord reverses. This change can be traced by
comparing bars 1–4 at levels c and d in figures 17.9 and 17.14. In the
former figure at level c, the C#7 chord wins; in the latter, the B minor
chord wins.
The prolongational effect is of the B minor chord making a feint away
from the A major cadence before resolving. Figure 17.15 illustrates: in
15a, the C# 7th chord is labeled as half-cadential, and the progression in
bars 1–2 functions as S to D in F# minor; in 15b, the B minor chord
dominates and functions as S in A major, with the C# 7th chord as
neighboring.
Figure 17.16 shows the prolongational and functional analysis derived
from the time-span reduction in figure 17.14. An S-functioning B minor
chord prolongs from bar 1 to bar 5, and the A major arrival in bar 6 acts
as the true structural beginning of the song. The main departure is to B
minor and D major in bars 9–12. The D major arrival in bar 12 doubles as a predominant return that extends to the second A major cadence in bars 16–17. Figure 17.14 is extended to bar 17 to show this connection across strophes. Normative structure with its standard functions resolves well after the second verse begins: T in bar 6, Dep in bar 10, S in bars 12–16, and D to T in bars 16–17.

Figure 17.16
Prolongational analysis of bars 1–17 on the interpretation that the song is in A major

"Im wunderschönen Monat Mai" does not end there, of course. It closes on the dominant 7th of F# minor, reviving the F# minor interpretation. The song is exquisitely poised between these alternatives, a perfect reflection of the poet's emotional ambivalence.

17.4 Tonal Space

GTTM leaves the conditions for pitch stability, which are needed to
construct a prolongational analysis, in an imprecise state. TPS resumes
this thread to develop a quantitative model of pitch stability that cor-
relates with, and in a sense explains, Carol Krumhansl's well-established
empirical data on the relatedness of pitches, chords, and keys (Krumhansl
1990). TPS calculates relatedness in terms of cognitive distance and
provides a quantitative treatment of tonal tension and relaxation.
The fundamental construct of the pitch-space model is the basic space
shown in figure 17.17a, oriented to a tonic chord in C major. (Keys are
represented in boldface, with major keys designated by upper case and
minor keys by lower case.) In figure 17.17b, the same configuration is
represented in standard pitch-class set-theory notation in order to
perform numerical operations. The space represents relationships that
everyone knows intuitively: starting at the bottom row, the chromatic
scale is the collection of available pitches, repeating every octave to form
12 pitch classes; the diatonic scale is built from members of the chromatic
scale; the triad is built from members of the diatonic scale; the root and
fifth of a triad are more stable than the third; and the root is more stable

Figure 17.17
Basic diatonic space: (a) using note-letter names; (b) in numerical format (C = 0, C# = 1, . . . B = 11). Both (a) and (b) are oriented to I/C.
(a) level a (root): C; level b (fifth): C G; level c (triadic): C E G; level d (diatonic): C D E F G A B; level e (chromatic): C C# D D# E F F# G G# A Bb B
(b) level a (root): 0; level b (fifth): 0 7; level c (triadic): 0 4 7; level d (diatonic): 0 2 4 5 7 9 11; level e (chromatic): 0 1 2 3 4 5 6 7 8 9 10 11

δ(x → y) = i + j + k,
where δ(x → y) = the distance between chord x and chord y;
i = the number of moves on the cycle of fifths at level (d);
j = the number of moves on the cycle of fifths at levels (a)–(c);
k = the number of noncommon pcs in the basic space of y compared
to those in the basic space of x.

Figure 17.18
Diatonic chord-distance rule

than the fifth. The basic space can be seen as an idealized form of Krum-
hansl and Kessler's (1982) empirically established tone profile of the
stability of pitches in a major key. If the tonic note C is wrapped around
to itself, the basic space takes the geometric shape of a cone.
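As a concrete illustration (a minimal sketch for exposition only, not notation or code from TPS, and with invented names), the configuration of figure 17.17b can be written down as nested pitch-class sets; the idealized stability profile just described then falls out by counting how many levels contain each pitch class.

# A minimal sketch (not part of TPS) of the basic space of figure 17.17b,
# oriented to I/C. Each level is a set of pitch classes (C = 0, ..., B = 11).
basic_space_I_C = {
    "root":      {0},                          # C
    "fifth":     {0, 7},                       # C G
    "triadic":   {0, 4, 7},                    # C E G
    "diatonic":  {0, 2, 4, 5, 7, 9, 11},       # C major scale
    "chromatic": set(range(12)),               # all twelve pitch classes
}

def stability(pc, space):
    """Count how many levels of the basic space contain a pitch class;
    more containing levels means greater stability (root > fifth > third
    > other scale tones > chromatic tones)."""
    return sum(pc in level for level in space.values())

print([stability(pc, basic_space_I_C) for pc in range(12)])
# [5, 1, 2, 1, 3, 2, 1, 4, 1, 2, 1, 2]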
Any chord in any key is representable by a configuration of the basic
space. The distance rule in figure 17.18 transforms one configuration into
another and measures the distance traversed, utilizing three factors that
combine additively: (1) the number of moves on the chromatic cycle of
fifths to reach another key, for instance C major to G major; (2) the
number of moves on the diatonic cycle of fifths to reach another chord
within a key, for instance the tonic of C major to its dominant; and (3)
the number of new pitch classes, weighted by psychoacoustic salience, in
the new configuration of the basic space.
To illustrate, figure 17.19a calculates the distance from I/C to its domi-
nant. Figure 17.19b does the same from I/C to i/c. The smaller the output
number, the shorter the distance. The pitch-class set-theory notation is
not essential; indeed, a computer implementation of the rule employs
the equivalent binary notation shown at the bottom of the figure.
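To make the rule concrete, here is a small computational sketch. It is an illustrative reading of figure 17.18 rather than the published TPS implementation, and all function and variable names are invented: a chord is modeled as a root, a triad, and a diatonic collection; i is taken from the fifths-shift of the collection, j from the root's motion on the collection's seven-member cycle of fifths, and k by counting, level by level, the pitch classes of y's basic space that are missing from the corresponding level of x's. Under these assumptions the sketch reproduces the two calculations of figure 17.19.

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]

def major_scale(tonic):
    """Diatonic collection built on a tonic pitch class."""
    return {(tonic + s) % 12 for s in MAJOR_STEPS}

def sharpness(collection):
    """Position of a diatonic collection on the chromatic circle of fifths."""
    for n in range(12):
        if major_scale((7 * n) % 12) == collection:
            return n
    raise ValueError("not a diatonic collection")

def levels(root, triad, collection):
    """Levels (a)-(d) of the basic space for a chord within a collection."""
    return [{root}, {root, (root + 7) % 12}, set(triad), set(collection)]

def fifths_cycle(collection):
    """The collection ordered as a seven-member cycle of fifths."""
    start = next(pc for pc in collection if (pc - 7) % 12 not in collection)
    cycle = [start]
    for _ in range(6):
        cycle.append((cycle[-1] + 7) % 12)
    return cycle

def chord_distance(x, y):
    (rx, tx, sx), (ry, ty, sy) = x, y
    # i: moves of the diatonic collection on the chromatic circle of fifths
    d = (sharpness(sy) - sharpness(sx)) % 12
    i = min(d, 12 - d)
    # j: moves of the root on the diatonic circle of fifths of the goal chord
    cycle = fifths_cycle(sy)
    if rx in cycle and ry in cycle:
        steps = abs(cycle.index(rx) - cycle.index(ry))
        j = min(steps, 7 - steps)
    else:
        j = 0   # simplification: roots outside the goal collection not handled
    # k: pcs of y's basic space missing from the same level of x's basic space
    k = sum(len(ly - lx)
            for lx, ly in zip(levels(rx, tx, sx), levels(ry, ty, sy)))
    return i + j + k

I_C = (0, {0, 4, 7}, major_scale(0))        # I of C major
V_C = (7, {7, 11, 2}, major_scale(0))       # V of C major
i_c = (0, {0, 3, 7}, major_scale(3))        # i of C minor (natural-minor collection)

print(chord_distance(I_C, V_C))             # 0 + 1 + 4 = 5  (figure 17.19a)
print(chord_distance(I_C, i_c))             # 3 + 0 + 4 = 7  (figure 17.19b)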
Just as there are many possible routes between cities, so there are many
routes from one chord in one key to another chord in the same or

Figure 17.19
Illustrations of the chord-distance rule, shown in pitch-class set notation and in the equivalent binary notation: (a) δ(I/C → V/C) = 0 + 1 + 4 = 5; (b) δ(I/C → i/c) = 3 + 0 + 4 = 7

another key. A core assumption in TPS is the principle of the shortest path; that is, listeners understand a progression in the most efficient way.
For example, assuming the context of a C major tonic chord, a G major
chord is most likely heard as the dominant of C major, not, for instance,
as the subdominant of D major or mediant of E minor. By the same
token, the first pitch or first triad at the beginning of a piece sounds like
a tonic because the shortest distance is from an event to itself.
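Computationally, the shortest-path principle amounts to a minimization over candidate construals. A toy continuation of the sketch above (the candidate labels are invented to match the example in the text) selects the hearing of a G major chord that lies closest to a prevailing C major tonic:

# Toy illustration of the shortest-path principle, reusing chord_distance
# and major_scale from the sketch above (illustrative names, not TPS code).
candidates = {
    "V of C major":   (7, {7, 11, 2}, major_scale(0)),
    "IV of D major":  (7, {7, 11, 2}, major_scale(2)),
    "III of E minor": (7, {7, 11, 2}, major_scale(7)),  # E natural minor shares the G major collection
}
context = (0, {0, 4, 7}, major_scale(0))    # the prevailing C major tonic chord

best = min(candidates, key=lambda name: chord_distance(context, candidates[name]))
print(best)                                 # "V of C major" -- the shortest path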
Distances among chords and keys can be mapped geometrically such
that distances in the space correspond to distances computed by the dis-
tance algorithms. Figure 17.20 shows a portion of a two-dimensional array
of chordal space within a key. The columns display chords on the cycle
of fifths, the rows chords on the cycle of diatonic thirds. The columns and
rows each wrap around to form orthogonal cylinders or a four-dimensional
sphere. In a four-dimensional representation each chord would have a
single location. Figure 17.21 similarly shows a portion of regional (or key)
space, with cycles of fifths on the vertical axis and cycles of minor thirds
on the horizontal axis, the latter expressing an alternation of relative and
parallel major-minor relationships.

V  vii°  ii  IV  vi
I  iii  V  vii°  ii
IV  vi  I  iii  V
vii°  ii  IV  vi  I
iii  V  vii°  ii  IV
Figure 17.20
A portion of chordal space arrayed in two dimensions

B b D d F
E e G g Bb
A a C c Eb
D d F f Ab
G g Bb bb Db

Figure 17.21
A portion of regional space arrayed in two dimensions. Major keys are in upper-case letters,
minor keys in lower-case letters.

Figure 17.22 combines figures 17.20 and 17.21 into a portion of chordal-
regional space. Each region is designated by a boldface letter, and this
letter simultaneously stands for the tonic of that key. Arrayed within each
key are its other six triads.
Figure 17.23 shows the relevant portion of chordal-regional space for
the Schumann song and traces the path of its harmonic progression on
the interpretation that it is in F# minor. The numbers next to the arrows
give the order of the progression. The double lines represent pivots, that
is, chords that assume two locations in the space. The music passes
through four adjacent regions and reaches the tonic of all of them except
for the tonic of F# minor. The graph brings out the multiple roles of the
B minor chord. At the beginning, it is the subdominant of F# minor. In
bar 5, it migrates to the supertonic of A major. In bar 10, it appears as
tonic of B minor and then pivots as the submediant of D major before
returning to its initial state as subdominant of F# minor.7

17.5 Tension Analysis of the Song

Pitch-space paths such as that in figure 17.23 give a useful but only
approximate picture of distances from one event to the next. To achieve
a precise account, one must return to the distance rule in figure 17.18,

Figure 17.22
A portion of chordal-regional space, showing the regions e, G, and g; a, C, and c; and d, F, and f, each represented by its tonic surrounded by its other six triads

Figure 17.23
Path of the song's harmonic progression in chordal-regional space on the interpretation
that the global tonic is F# minor. The numbers next to the arrows give the order of the
progression. Double lines represent pivots.

which, together with other rules whose discussion lies beyond the scope
of this essay, affords a quantified prediction of patterns of tension and
relaxation. The crucial concept is to equate distance traveled with the
amount of change in tension or relaxation. If the motion is away from a
point of rest, the rule computes an increase in tension; if it is toward a
point of rest, it computes a decrease in tension. A change in tension
can be computed sequentially from one event to the next, as if the lis-
tener had no memory of past events or expectation of future ones; or it
can be computed hierarchically down the prolongational tree, so that
right branches signify connections to past events and left branches

Figure 17.24
Hierarchical tension analysis for the F# minor interpretation of the song

anticipations of future ones. A striking conclusion of Krumhansl's and my empirical study of tonal tension is that listeners, regardless of musical
training, hear tension hierarchically rather than sequentially (Lerdahl
and Krumhansl 2007).
Figure 17.24 accordingly repeats the prolongational analysis from
figure 17.12, in which F# minor is taken as the global tonic, but now with
pitch-space distances included in the tree. The globally dominating V7
chords that frame the song receive a value of 6 because they point to the
unstated tonic of F# minor; δ(V7/f# → i/f#) = 6. Thus a fair degree of
tension is built into the global structure. Hierarchical tension is summed
down the branches, leading to the row of tension numbers shown between
the staves. For example, the ii6/f# in bar 1 receives a tension value of 6 +
8 = 14; the arrival in B minor in bar 9, which is reached through the
intermediate key of A major, receives a value of 6 + 9 + 10 = 25.
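The arithmetic here is a running sum down the branches, which is easy to state as a recursion. The fragment below is schematic only: the node labels, the tree shape, and the representation are invented for illustration, with branch distances chosen to match the two sums just given.

# Schematic sketch: each event's tension is the sum of the pitch-space
# distances on the branches connecting it to the root of the tree.
tree = {
    "V7/f# (global frame)": {"distance": 6, "children": {
        "ii6/f# (bar 1)":  {"distance": 8,  "children": {}},
        "I/A (bar 6)":     {"distance": 9,  "children": {
            "i/b (bar 9)": {"distance": 10, "children": {}},
        }},
    }},
}

def tensions(node, inherited=0, out=None):
    """Sum pitch-space distances down the branches, global to local."""
    if out is None:
        out = {}
    for name, info in node.items():
        total = inherited + info["distance"]
        out[name] = total
        tensions(info["children"], total, out)
    return out

print(tensions(tree))
# {'V7/f# (global frame)': 6, 'ii6/f# (bar 1)': 14, 'I/A (bar 6)': 15, 'i/b (bar 9)': 25}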
Figure 17.25 converts the tension numbers in figure 17.24 into a tension
curve that describes the song's ebb and flow of tension. After an initial
relaxation from ii6 into V7, the curve describes rising-falling waves of
increasing tension until the most distant event is reached, the inverted
G minor chord in bar 9. After this point there is a sharp relaxation as
the D major arrival in bar 11 pivots into VI/f#, bringing the pitch-space
journey back to the home region.
The tension analysis in figures 17.24 and 17.25 is incomplete in two
respects. First, it does not include the factor of surface tension produced
by psychoacoustic dissonance. This factor is most obvious in the repeated
suspensions of C# over the B minor chords, but it is also operative, for
instance, in the fact that the B minor chords are in inversion. Second, the

Figure 17.25
Tension curve for the values in figure 17.24

analysis does not include the factor of melodic and harmonic attractions,
which contribute crucially to expectations of ensuing events. For example,
a leading tone is strongly attracted to its tonic pitch and is expected to
resolve there; likewise a dominant 7th chord to its tonic chord. This factor
is especially powerful for the V7 chords that frame the song: in pitch-
space tension, they are close to the tonic, but in terms of expectation they
are very tense. The full tension model developed in TPS incorporates
both surface-dissonance and attraction factors, and their role in making
accurate tension predictions is demonstrated empirically in Lerdahl and
Krumhansl (2007). If these factors were included in figure 17.25, the most
telling effect would be to increase the composite tension of the framing
V7 chords. In spite of these missing factors, the curve graphed in figure
17.25 reflects essential aspects of the F# minor hearing and shows a
jagged bell-like shape that is typical of tension curves in most tonal
pieces.
The interpretation of the song as globally in A major presents a dif-
ferent picture. Figure 17.26 repeats the prolongational analysis from
figure 17.16 but with tension values in the tree and summed tension
numbers between the staves. Figure 17.27 translates these numbers into
an unorthodox tension curve. (The curve stops at bar 15 in order to
facilitate a comparison with figure 17.25.) After a local tensing motion to
the neighboring C#7 chord, the curve relaxes to zero tension at the A
major cadence in bar 5. At this point it follows a shape for bars 9–12

Figure 17.26
Hierarchical tension analysis on the A major interpretation of the song

Figure 17.27
Tension curve for the values in figure 17.26

similar to that in the F# minor interpretation, only to close on an upswing in tension. If attraction values were added to this analysis, the tension at
the end would be even higher.
The tension curves in figures 17.25 and 17.27 raise larger issues for the
theory. On one hand, the overall shape of the F# minor tension curve
better fits the pattern of normative prolongational structure. Moreover,
this interpretation, with its labeling of the C#7 chords as half-cadential,
reflects the methodology of time-span reduction better than does the A
major interpretation, which sidesteps this labeling. TPS states that at
least two consecutive events in a key are required to establish the key
(218). By this criterion, the first two chords of the song fit within F#

minor, and as a result the C#7 chord ought to be labeled as a half cadence.8 On the other hand, the A major interpretation yields considerably lower tension numbers than does the F# minor interpretation. The tension numbers in figure 17.25 sum to 305, whereas those in figure 17.27 sum to 228 (in both cases, the smaller number is taken for bar 12). Such a calculation has not been part of TPS's tension component, but it
follows from the principle of the shortest path. That is, the A major
interpretation covers less territory and hence is more parsimonious. It is
theoretically suggestive that these curves point to a conflict between two
abstract organizing criteria, normative prolongational structure and the
principle of the shortest path. The F# minor interpretation satisfies the
first criterion, the A major interpretation the second.
I can only guess which curve will better correlate with listeners' intuitions of tension and relaxation in an experimental setting. One might suppose, given the song's ambiguity, that some listeners will more closely follow one curve and other listeners the other curve. The results in Lerdahl and Krumhansl (2007), however, suggest less variability among listeners' intuitions of tension than one might imagine. Alternatively, one
might suppose that listeners compute some sort of average between the
two interpretations. But one does not hear a piece holistically as, say,
60% in F# minor and 40% in A major. Tonic orientation in a tonally
ambiguous piece is more like the Necker cube or duck-rabbit visual illu-
sions familiar in the Gestalt and philosophical literature (Koffka 1935;
Wittgenstein 1953): one can toggle between one perception and the
other, but one does not perceive both at the same time.
This consideration suggests that listeners switch tonic orientation as
the song proceeds, first hearing the song in F# minor (bars 1–4), then in A major (bars 5–8, with subsidiary modulations to B minor and D major in bars 9–12), then in F# minor again (bars 13–15). To represent such
dynamic hearing, the theory must generate shifting tree structures as
the music unfolds in time and arrive at a consolidated representation.
Exactly how this would work is unclear. One place to start is Jackendoff
(1991), which offers a theoretical exploration of how GTTM's structures
are constructed by the listener in real time. Experiments on perceived
tonal tension also give some indication of how prolongational structure
evolves in the course of a piece (Smith and Cuddy 2003; Lerdahl and
Krumhansl 2007). A fruitful next step would be to submit "Im wunderschönen Monat Mai" to empirical study as a guide to further theory
construction.

Notes

1. The GTTM/TPS theory applies not only to classical and romantic tonal music
but equally well to a wide variety of musical styles including pop music (see
Jackendoff and Lerdahl 2006 for an analysis of a Beatles song). I choose to
analyze this particular Schumann song because it fascinates me and because it
challenges the theory in interesting ways. Recordings of it are easily accessible
on the internet, and I urge the reader to listen to it several times before studying
the analysis.
2. The score and translation are taken from Schumann (1971).
3. This notation, first proposed in Lerdahl and Jackendoff (1977), is analogous
to the phonological grid notation introduced at about the same time by Liberman
and Prince (1977). Lerdahl (2001b, 2013) discusses this and other aspects of the
relationship between linguistic and music theory.
4. "Hypermetrical" means metrical structure at a level larger than the notated bar.
5. For reasons of space, it is convenient not to show the pitch analysis of the
entire song. Since the second strophe repeats the structure of the first, this omis-
sion does not affect the analysis in any significant way. Again for convenience,
at larger levels the music is compressed to one staff.
6. These designations are familiar from Riemannian function analysis (Riemann
1893). My use of them departs from that tradition. TPS (chap. 5) explains how
these and other functions arise from prolongational position in combination with
tonic orientation.
7. Figure 17.23 corresponds to an analysis in Cohn (2011) in the context of an inter-
esting comparison between TPS and neo-Riemannian theories.
8. The fleeting A# in the arpeggiation of the B minor chord in bar 1 (see figure
17.3) briefly implies the key of B minor, but this detail reduces out already at the
8th-note level of time-span reduction and is not a factor at larger levels of
analysis.

References

Cohn, Richard. 2011. Tonal pitch space and the (Neo-)Riemannian Tonnetz. In
The Oxford Handbook of Neo-Riemannian Music Theories, edited by Edward
Gollin and Alexander Rehding, 322–348. New York: Oxford University Press.
Jackendoff, Ray. 1991. Musical parsing and musical affect. Music Perception 9 (2):
199–230.
Jackendoff, Ray, and Fred Lerdahl. 2006. The capacity for music: What is it, and
what's special about it? Cognition 100 (1): 33–72.
Koffka, Kurt. 1935. Principles of Gestalt Psychology. New York: Harcourt, Brace
& World.
Krumhansl, Carol L. 1990. Cognitive Foundations of Musical Pitch. New York:
Oxford University Press.

Krumhansl, Carol L., and Edward J. Kessler. 1982. Tracing the dynamic changes
in perceived tonal organization in a spatial representation of musical keys. Psy-
chological Review 89 (4): 334–368.
Lerdahl, Fred. 2001a. Tonal Pitch Space. New York: Oxford University Press.
Lerdahl, Fred. 2001b. The sounds of poetry viewed as music. In The Biological
Foundations of Music, edited by Robert J. Zatorre and Isabelle Peretz. Annals
of the New York Academy of Sciences 930 (1): 337–354. Reprinted with revisions
in The Cognitive Neuroscience of Music, edited by Isabelle Peretz and Robert J.
Zatorre, 412–429. New York: Oxford University Press, 2003.
Lerdahl, Fred. 2013. Musical syntax and its relation to linguistic syntax. In Lan-
guage, Music, and the Brain: A Mysterious Relationship, edited by Michael A.
Arbib, 257–272. Strüngmann Forum Reports 10, series edited by Julia Lupp.
Cambridge, MA: MIT Press.
Lerdahl, Fred, and Ray Jackendoff. 1977. Toward a formal theory of tonal music.
Journal of Music Theory 21 (1): 111–171.
Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Lerdahl, Fred, and Carol L. Krumhansl. 2007. Modeling tonal tension. Music
Perception 24 (4): 329–366.
Liberman, Mark, and Alan Prince. 1977. On stress and linguistic rhythm. Linguis-
tic Inquiry 8 (2): 249–336.
Riemann, Hugo. 1893. Vereinfachte Harmonielehre; oder, Die Lehre von den
tonalen Funktionen der Akkorde. London: Augener.
Schumann, Robert. 1971. Dichterliebe. Edited by Arthur Komar. Norton Critical
Scores. New York: W. W. Norton & Company.
Smith, Nicholas A., and Lola L. Cuddy. 2003. Perceptions of musical dimensions
in Beethoven's Waldstein sonata: An application of tonal pitch space theory. Musicae Scientiae 7 (1): 7–34.
Wittgenstein, Ludwig. 1953. Philosophical Investigations. Oxford: Blackwell.
18 The Friar's Fringe of Consciousness

Daniel Dennett

Ray Jackendoff's Consciousness and the Computational Mind (1987) was decades ahead of its time, even for his friends. Nick Humphrey, Marcel Kinsbourne, and I formed with Ray a group of four disparate thinkers about consciousness back around 1986, and, usually meeting at Ray's house, we did our best to understand each other and help each other clarify the various difficult ideas we were trying to pin down. Ray's book was one of our first topics, and while it definitely advanced our thinking on various lines, I now have to admit that we didn't see the importance of much that was expressed therein. For instance, in my Consciousness Explained (1991), which was dedicated to my colleagues Nick and Marcel, and Ray, I gave only the briefest mention of the contribution of Ray's I want to explore here: the idea that we are conscious of only
an intermediate level of all the nested, interacting levels of representation
that the brain uses to accomplish its cognitive tasks.
Ray Jackendoff (1987) argues . . . that the highest levels of analysis performed
by the brain, by which he means the most abstract, are not accessible in experi-
ence, even though they make experience possible, by making it meaningful. His
analysis thus provides a useful antidote to yet another incarnation of the Carte-
sian Theater as the summit or the tip of the iceberg. (Dennett 1991, 278)

That antidote is still much needed by thinkers about consciousness today, and since I am probably not alone in acknowledging the point while underestimating its implications, I am going to try to saddle it with a memorable image to remind us just what adjustments to our thinking it requires. I hereby dub Ray's vision the Friar's Fringe model of consciousness: like the monk's halo of hair halfway down the crown of his head, it occupies neither the Headquarters nor the Top of the hierarchy of cognitive processes. That fringe of hair may be our chief sign that we are in the presence of a friar, but the hair isn't the source of whatever makes the friar special, and the intermediate level in Ray's
model is not where the work of semantic processing occurs. Ray argues
for this in two detailed chapters in his 1987 book, drawing on phenom-
enological observation of our experience of music, vision, and visual
imagery, and language itself, of course. He also analyzes the difficulties
of other theories. His claim has since been taken up by another fine
theorist, Jesse Prinz (2012). The Cartesian idea, shared by Jerry Fodor,
Tom Nagel, and John Searle, that consciousness is the source (somehow)
of all Understanding and Meaning1 is, I believe, the greatest single cause
of confusion and perplexity in the study of the mind. For some (e.g.,
Fodor and Nagel) it fuels the conviction that a science of the mind is
ultimately beyond us, an unfathomable mystery. For others (e.g., Searle)
it deflects attention from the one kind of science that could actually
explain how understanding happens: a computational approach that in
one way or another breaks down the whole mysterious, holistic, ineffable
kaleidoscope of phenomenology into processes that do the cognitive
work that needs to be done.
Ray has seen that the first step toward any viable theory of conscious-
ness must demote consciousness from its imagined position as the ulti-
mate Inner Control Room (where it all comes together and the
understanding happens), but he doesn't quite carry through on the
second step, which is embodied in the moral I draw from the demise of
the Cartesian Theater:
All the work done by the imagined homunculus in the Cartesian Theater must
be distributed around in space and time to various lesser agencies in the brain.
(Dennett 2005, 69)

All the work. And all the play, too, for that matter: the savoring, enjoying,
delighting, as well as the abhorring, being disgusted by, disdaining. . . . It
all has to be outsourced to lesser entities, none of which is the ego, or
the person, or the Subject. Just as the phenomenon of life is composed,
ultimately, of non-living parts (proteins, lipids, amino acids, . . . ) so con-
sciousness must be dismantled and shown to be the effects of non-
conscious mechanisms that work sub-personally. When this step is taken,
the Subject vanishes, replaced by mindless bits of machinery uncon-
sciously executing their tasks. In Consciousness Explained, I described
what I called the Hard Question: and then what happens? (255). This is
the question you must ask and answer after you have delivered some
item to consciousness. If instead you stop there, in consciousness,
you've burdened the Subject with the task of reacting, of doing some-
thing with the delivery, and left that project unanalyzed. Answering the
Hard Question about the sequelae of any arrival in consciousness reduces one more bit of Cartesian magic to mere legerdemain. Can
this be the right direction for a theory of consciousness to take? Resis-
tance to this step is still ubiquitous and passionate. As so often before,
Jerry Fodor finds a vivid way of expressing it:
If, in short, there is a community of computers living in my head, there had also
better be somebody who is in charge; and, by God, it had better be me. (Fodor
1998, 207)

Another eloquent naysayer is Voorhees:


Daniel Dennett is the Devil. . . . There is no internal witness, no central recog-
nizer of meaning, and no self other than an abstract Center of Narrative Gravity,
which is itself nothing but a convenient fiction. . . . For Dennett, it is not a case
of the Emperor having no clothes. It is rather that the clothes have no Emperor.
(Voorhees 2000, 55–56)

Exactly. If you still have an Emperor in your model, you haven't begun
your theory of consciousness. A necessary condition any theory of con-
sciousness must satisfy in the end is that it portrays all the dynamic
activity that makes for consciousness as occurring in an abandoned
factory, with all the machinery churning away and not a soul in sight, no
workers, no supervisors, no bosses, not even a janitor, and certainly no
Emperor! For those who find this road to progress simply unacceptable,
there is a convenient champion of the alternative option: if you DON'T
leave the Subject in your theory, you are evading the main issue! This is
what David Chalmers (1996) calls the Hard Problem, and he argues that
any theory that merely explains all the functional interdependencies, all
the backstage machinery, all the wires and pulleys, the smoke and mirrors,
has solved the easy problems of consciousness, but left the Hard
Problem untackled. There is no way to nudge these two alternative posi-
tions closer to each other; there are no compromises available. One side
or the other is flat wrong. There are plenty of Hard Questions crying out
for answers, but I have tried to show that the tempting idea that there is
also a residual Hard Problem to stump us once we've answered all the
Hard Questions is simply a mistake. I cannot prove this yet but I can
encourage would-be consciousness theorists to recognize the chasm and
recognize that they can't have it both ways.2
It is one thing to declare that you are abandoning the Cartesian
Theater for good, and another thing to carry through on it. Ray's work offers a nice example of a half measure that needs to be turned into a full measure: his discussion of what he called "affects" in Consciousness
and the Computational Mind and now calls (always in scare-quotes) "feels" or "character tags." Here is how he puts it in A User's Guide to Thought and Meaning:
[An earlier chapter discussed] the "character tags" that contribute the "feel of meaningfulness" and the "feel of reality." . . . In contrast to the complexity of pronunciation and visual surfaces, these "feels" are simple binary distinctions. Is what I'm hearing meaningful or not? Is it a sentence that someone uttered, or is it in my head?
I'd like to look more closely at these character tags, which mark the overall character of the experience. I'll contrast them with content features of conceptual structure and spatial structure, such as that this object belongs to the category "fork," it's heavy and smooth, it has points, you use it to eat with, it belongs to you, it's 17 years old, and so on. (Jackendoff 2012, 139)

The fact that he calls these items "affects" or "feels" is a bit ominous: just whose "feels" are they and how does this Subject, whoever or whatever it is, respond to them? Ray is silent on this score; that is, Ray ducks the Hard Question. But we can try to answer it for him. These "feels" are
present in our phenomenology, and as such are denizens of the fringe
of consciousness, byproducts of the (higher, or more central) uncon-
scious workhouse in which conceptual and spatial structures get built and
analyzed. Ray's excellent half step forward is to dismantle the traditionally mysterious and unanalyzable "grasping" or "comprehending" by
the Subject in the Cartesian Theater, outsourcing all that work to
unconscious high-level processes into which we have no introspec-
tive access at all. Those backstage processes make all the requisite links
to conceptual structures, taking care thereby of our ongoing compre-
hension of the words streaming through the fringe of consciousness.
Those words have phonological properties we experience directly
accompanied by the feeling that they are meaningful (or not). Here
we have the beginnings of a nice division of labor: (almost) all the Work
of Understanding has been assigned to unconscious bits of machinery,
leaving only one task for the conscious Subject: appreciating the mean-
ingfulness or noticing the meaninglessness of whatever is on stage at the
moment.
Calling such a signal a "feeling" at first looks like a step backwards, back into the murky chaos of qualia, but the fact that the distinction is binary is encouraging, since it suggests that it does only a small job; it's a single-throw switch, the effects of which are in need of delegation to some unconscious functionaries. Let's consider some minimal reactions
and then build up from there.

Alternative 1. Discard it unopened. If the arrival in consciousness engenders no further response at all, if becoming conscious doesn't make
the item even the tiniest bit famous or influential, then it never really
entered consciousness at all. The Given was simply not Taken (to revert
to the traditional language Wilfrid Sellars wisely urged us to abandon).
Alternative 2. Log it in short-term memory. This suffices to elevate the
item to the status of reportability, whether or not the person reports it
(saying something like "Hey, weird, I just had this feeling that 'ugnostic' was meaningful!") This is a start, but just what is short-term memory, and what does it do? (The Hard Question again: and then what happens?) The answer, I propose, is that putting an item in short-term memory
permits it to reverberate for a while in the Global Neuronal Workspace
(Baars 1989; Dehaene et al. 1998; Dehaene and Naccache 2001) where
it can contribute to a host of other ongoing projects of conceptual struc-
ture refinement, action guidance, and so forth. It is influential enough to
be reportable, noticeable, memorable, at least for a short period.
Alternative 3. Draw conclusions from it. Among the contributions it
can make while echoing back and forth in short-term memory is to influence what happens next in some of these projects. To take the case in point, a feeling of meaningfulness will typically not disrupt ongoing projects the way its opposite, a feeling of meaninglessness, does. The gist of its normal influence is "All is well. Carry on!," in contrast to "Abort! Caution!," the typical (but not universal) gist of its opposite. The latter may also initiate a new project, the formation and deliverance of a public speech act along the lines of "Hang on there, it sounded like you just said 'turnip voting highway.' What did you mean?" The role of conscious-
ness in this instance is to serve as the expediter or interface between a
struggling central conceptual structure analyzer and some outside source,
another person.
This is the role that accounts for the most striking feature of the Friar's Fringe model of consciousness: the intermediate level of the contents to which we have access. When I say "we," I mean the first-person and the second-person. Our facility of conscious access has been designed (by a combination of genetic evolution, cultural evolution and individual learning histories) to be a user-friendly interface between persons. When Ned Block speaks of "access consciousness" and we ask ourselves "access for whom?," the best answer is: access for other people. Your consciousness is other folks' avenue to what's going on in your head, and it has
some of the features it has because everything has to be couched in terms
that can be communicated to other people readily. (Cf. Chris Frith's recent discussions of similar ideas.)
Just as the desktop screen on your laptop has been designed to convey
to the user only the readily digestible, intuitively natural aspects of
what is going on in your laptop, the requirements for entrance into the
Friar's Fringe (which isn't a neuroanatomical place, of course, but a func-
tional category) are that an item have content that is readily communi-
cable to others.
But what about the fabled ineffability of some contents in conscious-
ness? Isn't this variety of incommunicability a hallmark of the qualia
of experience? This is the inevitable byproduct of the user-friendliness
condition: our capacity to report on any topic bottoms out at a lowest
level, and whenever that level is reached in an attempt to convey what
it is like, a null result occurs: "I can't describe it; it's an ineffable something." Ineffable, but somehow identifiable. This is a feature that is par-
ticularly striking in cases of the tip-of-the-tongue phenomenon, which is
a kind of temporary ineffability: we can't find the word (yet) but we can say a lot about what it isn't and a little about the linguistic neighborhood (it's two syllables with the stress on the first) in which it will be found.
Temporary ineffability is the ubiquitous phenomenon that provides the
best support for this treatment of ineffability, as simply the current limit
of analysis. Ear training, courses in wine tasting, and the like can move
the boundaries, deepening individuals' access to their inner goings on.
The Fringe's boundaries are neither sharp nor permanent, in most regards. There are many "flavors" of ineffability, and we can tell them apart but not say how. (Since we can't say how, it is deeply misleading to say they have "flavors," even in scare-quotes, since that implies we know – it's by taste – precisely what we don't know: how we do it.)
Alternative 4. Monitor. In a different circumstance the role of conscious-
ness might be entirely internal or first-personal, provoking the redirec-
tion of conceptual analysis machinery down new avenues in search of
meaningfulness. The traditional idea of consciousness as a monitor of
one's ongoing activities is not in itself mistaken; it is only when the
monitor is allowed to work away intelligently, unreduced and undistrib-
uted, that it constitutes a bad homunculus, a postponer of theory. When
we talk to ourselves, either aloud or in silent soliloquy, we have expe-
rientially direct access to the words' identities, their sounds and empha-
ses, as Ray points out, and to their meaningfulness or meaninglessness,
but not to the unconscious machinery that does all the heavy lifting, both
producing the speech acts and analyzing them, nor to the factors that are
controlling that machinery. Monitoring our own thought, we can hope
for an insightful breakthrough, but not command one.
These are, of course, the apt and familiar responses we make to feelings
of meaninglessness or its opposite, but notice that once we have cata-
logued a few of them (the highlights from an apparently inexhaustible
list of possibilities), we can leave the feeling out of it, and just have the
binary switch or flag as the triggerer of this family of responses. The
feeling is, as Ray says, ineffable (it has no content beyond just the bare sense of meaninglessness or meaningfulness), and we have, arguably,
captured that content in our catalogue of appropriate responses. The
feeling is not doing any work. One might put it this way (tempting fate):
a zombie, lacking all feelings or qualia, who is equipped with a binary
switch with the input-output conditions we have just described doesn't
lack anything important; it can monitor its own cognition for signs of
meaninglessness, and react appropriately when they are uncovered just
as we conscious folk do; it can tell others about the phenomenology
of its own experiences of meaningfulness and meaninglessness, and that
account will gybe perfectly with our accounts, since there is nothing more
to these feelings than this.
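To see how little work the "feeling" itself is doing, the catalogue can be laid out as a bare dispatch on the binary tag. The toy sketch below is an expository invention, not Dennett's or Jackendoff's model; its only point is that a boolean flag routed to unconscious functionaries suffices, with no Subject consulted anywhere.

# Toy sketch (an expository invention): a bare boolean "meaningful" tag is
# routed to unconscious functionaries; no Subject appears anywhere.
short_term_memory = []                     # Alternative 2: reverberation space

def handle_character_tag(item, meaningful, salient=True):
    if not salient:
        return None                        # Alternative 1: discarded unopened
    short_term_memory.append((item, meaningful))   # Alternative 2: now reportable
    if meaningful:
        return "carry on"                  # Alternative 3: all is well
    # Alternative 3/4: launch a repair project or redirect the analysis
    return f'say: "Hang on, did you just say {item!r}? What did you mean?"'

print(handle_character_tag("turnip voting highway", meaningful=False))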
These binary "character tags" are the easiest cases. Ray did well to put the term "feelings" in scare-quotes, since they are best considered as only "feelings" pro tem, on their way to the junkyard once we answer the Hard
Question about what happens next when we have them. Once we get
used to the move, we can start tackling all the more complicated, multi-
dimensional aspects of our experience and deconstructing them in similar
fashion.3

Notes

1. Ray's innovation in his User's Guide to Thought and Meaning of using a rather sacred font for philosophical terms that are meant to be particularly deep and portentous is irresistible.
2. I can offer intuition pumps to render my claim at least entertainable by those
who find it frankly incomprehensible at first. See especially "The Tuned Deck," in Dennett (2003) (from which some material in the previous paragraphs is drawn), and Dennett (2005, 2013).
3. My favorite example of this kind of further deconstruction ("effing the ineffable," we might call it) is David Huron's analysis of the qualia of musical scale tones, in Sweet Anticipation (2006). What does the stability of do, the tonic,
amount to, compared to the instability of ti, the leading tone, and which families
of metaphors, adjectives, and adverbs tend to go with which families of tones?
With patient and experimentally tested analysis, Huron demonstrates the com-
position of the heretofore ineffable qualia of re and mi and sol and fa, showing
that however atomic and unanalyzable they seem to be at first, their perception
and appreciation is a task that can be outsourced to unconscious neural responses
(Huron 2006, 145).

References

Baars, Bernard J. 1989. A Cognitive Theory of Consciousness. Cambridge: Cambridge University Press.
Chalmers, David. 1996. The Conscious Mind. New York: Oxford University Press.
Dehaene, Stanislas, and Lionel Naccache. 2001. Towards a cognitive neuroscience
of consciousness: Basic evidence and a workspace framework. Cognition 79
(1–2): 1–37.
Dehaene, Stanislas, Michel Kerszberg, and Jean-Pierre Changeux. 1998. A neu-
ronal model of a global workspace in effortful cognitive tasks. Proceedings
of the National Academy of Sciences of the United States of America 95 (24):
14529–14534.
Dennett, Daniel. 1991. Consciousness Explained. Boston: Little, Brown.
Dennett, Daniel. 2003. Explaining the magic of consciousness. Journal of Cul-
tural and Evolutionary Psychology 1 (1): 7–19.
Dennett, Daniel. 2005. Sweet Dreams: Philosophical Obstacles to a Science of
Consciousness. Cambridge, MA: MIT Press.
Dennett, Daniel. 2013. Intuition Pumps and Other Tools for Thinking. New York:
Norton.
Fodor, Jerry. 1998. The trouble with psychological Darwinism. Review of Steven
Pinker's How the Mind Works and Henry Plotkin's Evolution in Mind. London Review of Books, January 22, 1998, 11–13. Reprinted in In Critical Condition, edited by Jerry Fodor, 203–214. Cambridge, MA: MIT Press, 2000.
Huron, David. 2006. Sweet Anticipation. Cambridge, MA: MIT Press.
Jackendoff, Ray. 1987. Consciousness and the Computational Mind. Cambridge,
MA: MIT Press.
Jackendoff, Ray. 2012. A Users Guide to Thought and Meaning. Oxford: Oxford
University Press.
Prinz, Jesse. 2012. The Conscious Brain: How Attention Engenders Experience.
Oxford: Oxford University Press.
Voorhees, Burton. 2000. Dennett and the deep blue sea. Journal of Consciousness
Studies 7 (3): 53–69.
19 Climbing Trees and Seeing Stars: Combinatorial
Structure in Comics and Diverse Domains

Neil Cohn

Note

Any images provided without attribution were created by Neil Cohn (copyright © Neil Cohn).

References

Cohn, Neil. 2013a. Navigating comics: An empirical and theoretical approach to


strategies of reading comic page layouts. Frontiers in Cognitive Science 4: 186.
doi: 10.3389/fpsyg.2013.00186.
Cohn, Neil. 2013b. Visual narrative structure. Cognitive Science 37 (3): 413–452.
Cohn, Neil, and Tymothi Godek. 2007. Comic Theory 101: Loopy Framing.
Comixpedia 5 (3). Original webpage archived. Material available at http://
visuallanguagelab.com/ct101/loopy_framing.html.
Godek, Tymothi. 2006. One Night. http://www.yellowlight.scratchspace.net/
comics/onenight/onenight.html. (Originally posted on March 20, 2006.)
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evo-
lution. Oxford: Oxford University Press.
Jackendoff, Ray. 2007. Language, Consciousness, Culture: Essays on Mental Struc-
ture (Jean Nicod Lectures). Cambridge, MA: MIT Press.
Jackendoff, Ray, and Fred Lerdahl. 2006. The capacity for music: What is it, and
what's special about it? Cognition 100 (1): 33–72.
Marr, David, and Herbert Keith Nishihara. 1978. Representation and recognition
of the spatial organization of three-dimensional shapes. Proceedings of the Royal
Society of London. Series B. Biological Sciences 200 (1140): 269–294.
OMalley, Bryan Lee. 2005. Scott Pilgrim vs. The World. Portland, OR: Oni Press.
Contributors

Daniel Büring, University of Vienna, daniel.buring@univie.ac.at


Neil Cohn, University of California, San Diego,
neilcohn@visuallanguagelab.com
Peter W. Culicover, Ohio State University, culicover.1@osu.edu
Daniel Dennett, Tufts University, Daniel.Dennett@tufts.edu
Cecily Jill Duffield, University of Colorado Boulder, cecily.duffield@colorado.edu
W. Tecumseh Fitch, University of Vienna, tecumseh.fitch@univie.ac.at
Lila R. Gleitman, University of Pennsylvania,
gleitman@psych.upenn.edu
Jane Grimshaw, Rutgers University, grimshaw@ruccs.rutgers.edu
Yosef Grodzinsky, The Hebrew University of Jerusalem; and Institut für Neurowissenschaften und Medizin – Strukturelle und funktionelle Organisation des Gehirns (INM-1), yosef.grodzinsky@mail.huji.ac.il
Katharina Hartmann, University of Vienna, katharina.hartmann@univie.ac.at
Albert Kim, University of Colorado Boulder, albert.kim@colorado.edu
Max Soowon Kim, Columbia University, msk51@live.com
Barbara Landau, Johns Hopkins University, landau@jhu.edu
Fred Lerdahl, Columbia University, fred@music.columbia.edu
Willem J. M. Levelt, Max Planck Institute for Psycholinguistics,
Pim.Levelt@mpi.nl
Joan Maling, Brandeis University, maling@brandeis.edu
Bhuvana Narasimhan, University of Colorado Boulder,
bhuvana.narasimhan@colorado.edu

Urpo Nikanne, Åbo Akademi University, urpo.nikanne@abo.fi


Catherine O'Connor, Boston University, mco@bu.edu
María Mercedes Piñango, Yale University, maria.pinango@yale.edu
Daniel Silverman, San José State University,
daniel.silverman@sjsu.edu
Henk J. Verkuyl, Utrecht University, h.j.verkuyl@uu.nl
Heike Wiese, University of Potsdam, heike.wiese@uni-potsdam.de
Eva Wittenberg, University of California, San Diego,
ewittenberg@ucsd.edu
Edgar B. Zurif, La Loye, France, edgarzurif@gmail.com
Joost Zwarts, Utrecht University, J.Zwarts@uu.nl
Index

Aber, 4143 rnadttir, Hlf, 112


Aboutness. See Topicality Arnason, Ulfur, 307
Accessibility in discourse, 212, 213215, Aspect, 85, 92, 138158
229 Aspectual coercion, 170171, 172, 173,
Accomplishment predicates, 143 174178
Ackrill, John L., 143 Aspectual verbs, xvi, 171172, 173, 178182
Activation, 167169, 180, 212, 238240 Asyndeton, 4243, 4447, 48
Activity predicates, 85, 93, 143 Attitude. See Mode (means and attitude)
Adjuncts, 121122 verbs
Adret, Patrice, 308 Austen, Jane, 103104
Adversatives, 4158 Austin, John Langshaw, 247, 251
Aelbrecht, Lobke, 75 Autonomous construction in Irish, 104,
Affects, 373374 105107
Agent (thematic role) Avrutin, Sergey, 168
expressed with by-phrase, 103, 107 Ay, Nihat, 260
and subject position, 7980, 119134, 188, Ayotte, Julie, 335336, 338
189191, 199200, 280
Agrammatism, 247248 Baars, Bernard J., 375
Ai (chimpanzee), 303304, 308309 Baayen, R. Harald, 219
Aktionsart, 139, 141, 159n3. See also Bach, Emmon, xix
Lexical aspect Backstreet Boys, the, 300
Alexander, Michael P., 168 Baddeley, Alan D., 217
Alexander, Richard D., 298, 299 Bakker, Dik, 114n2
Allerdings, 43, 4850, 57 Baldwin, Dare A., 191
Allott, Robin, 314 Barnes, Michael P., 129, 131
Ambiguity Beavers, John, 86, 94
linguistic, 101102, 103104, 113114, 259, Beckman, Mary E., 44
266269, 271 Beckner, Clay, 270
musical, 347, 349350, 352, 367 Behaviorism, 242243, 249250
Andrews, Avery, 133 Bekkering, Harold, 189
Anti-Semitism, 250 Bernstein, Leonard, xxi, 293, 327, 328
Aphasia, 168169, 173, 181182, 247, 296, Bertinetto, Pier Marco, 122
329330 Bever, Thomas G., 190
Apperception, 237 Beyer, Thomas R., Jr., 139
Arbib, Michael A., 315, 327 Bickerton, Derek, 271
Arbitrariness of signs, 278, 285 Bigand, Emmanuel, 284
Arcadi, Adam Clark, 300 Biology and linguistics, xviii, xxiii
Architecture of the Language Faculty, The, Bladon, Anthony, 261
xxii Blake, Barry J., 120, 121
Arguments and argument positions, 65, Blevins, James P., 109, 113
121122, 131 Block, Ned, 375
Aristotle, 142146, 147 Bloomfield, Leonard, 113, 245, 249

Bock, J. Kathryn, 211, 212, 214, 217 Cognitive linguistics, 26


Boesch, Christophe, 300 Cognitive science
Bolinger, Dwight Le Merton, 155 linguistics and, xxiiixxv, 2526, 137
Bonami, Olivier, 96n5 state of, xi, xxxiii, 294
Bonhoeffer, Karl, 247 Cohen, Anthony, 243
Bonobos, 300 Cohen, Laurent, 327
Boundary signals, 267 Cohn, Neil, 387
Bowerman, Melissa, 75 Cohn, Richard, 368n7
Branigan, Holly P., 212213 Cohort theory, 238240
Brannon, Elizabeth M., 327 Coltheart, Max, 221, 296, 329
Brewer, William F., 288 Combinatorial structure, 380388. See also
Brighton, Henry, 14 Hierarchical cognition
Briner, Stephen W., 288 Comics, 380, 383387
Briscoe, Edward, 14 Communication, xxivxxv
British English, 123 Comparative research on rhythmic
Broadwell, George Aaron, 112 cognition, 293294, 296309
Brocas area and Brocas aphasia, 168169, Competence, 4, 10
173, 182, 284, 329 Complement coercion, 171172, 173,
Broekhuis, Hans, 138 178182, 286287
Brown, Steven, 315 Completeness, pragmatic, 4447, 48
Buck, John, 297 Complexity, and processing, 1112
Budgerigars, 302 Composition, semantic, 5152, 6567, 74,
Bhler, Karl, 249, 280 142, 148156
Burkhardt, Petra, 169 enriched, 167, 171, 172, 174
Burroughs, W. Jeffrey, 197 Compositionality, 262263, 266
Burzios Generalization, 129 Comprehension. See Processing
Butterworth, Brian, 327 Comrie, Bernard, 139
By-phrases, 103, 107, 108, 114n2 Conceptual semantics, xix, 2137, 68
Conceptual structure, xiii, xxi
Caha, Pavel, 6970, 71, 73 Concessives, 47
Call, Josep, 314 Conjoined noun phrases, 213229
Calvin, William H., 314 Connes, Alain, 327
Canseco-Gonzalez, Enriqueta, 168 Consciousness, xxii, 236238, 248,
Carpenter, Patricia A., 217 371377
Cartesian Theater, 371, 372, 373374 Consciousness and the Computational
Casasanto, Laura Staum, 11 Mind, 371, 373374
Case, xxvi, 14, 7074, 119134 Construction grammars, 5, 13, 36, 72
Chalmers, David, 373 Constructions, 415
Changeux, Jean-Pierre, 327 Continuity, 148, 149, 157, 163n13
Charles, D., 143 Convergent evolution, 304, 309
Chater, Nick, 14 Cook, Peter, 304, 305306
Children and (child) language, 110, Cooke, Deryck, 293
190191, 192194, 201, 203206, 214215. Coordinators and coordination, 4158.
See also Language acquisition See also Conjoined noun phrases
Chimpanzees, 299, 303 Coppock, Elizabeth, 69
Chomsky, Noam, xvi, xviii, xixxx, xi, Corbett, Greville, 114n2
xxiiixxiv, xxv, 4, 8, 13, 14, 21, 28, 36, 65, Core and periphery, 7, 2829
126, 160n7, 269, 271, 327 Correspondences, form-meaning, 47, 13,
Ray Jackendoff and, xiv, xvii, xix, xx 63, 6467, 259267. See also Mismatches,
Christiansen, Morten H., 14 form-meaning
Cinque, Guglielmo, 73 Corver, Norbert, 82
Claridge, Claudia, 155 Craighero, Laila, 327
Clark, Eve V., 212 Creativity, 5
Clark, Herbert H., 212 Crocker, Matthew W., 11
Clayton, Martin R. L., 310 Croft, William, 1920n7, 36
Clifton, Charles, Jr., 215216 Cuddy, Lola L., 367
Coercion, 122123, 170182, 286287 Culicover, Peter W., xxii, xxvii, 3, 7, 9, 12,
Cognitive biology, xxxi 13, 14, 15, 16n7, 41, 64, 65, 277, 286
Cynx, Jeffrey, 313 Extended Standard Theory, xiii
Czerwon, Beate, 283 Extraction from sentential subjects, 11
Extrametricality, 311
Dahlstrom, Amy, 113 Eythórsson, Thórhallur, 110, 112
Dal, Ingerid, 71, 74
Dalla Bella, Simone, 296 Fabb, Nigel, 310311
D'Amato, Michael R., 313 Fadiga, Luciano, 327
Darwin, Charles, 235, 293, 296297, 299, Faroese, 129, 131
314, 315 Fazio, Patrik, 327
Dative case in Korean, 119, 121122, 127, Fedorenko, Evelina, 330331, 333334
135n4 Feel, 374
Deacon, Terrence William, 282 Feinstein, Stephen H., 308
Dehaene, Stanislas, 327, 375 Feldman, Heidi, 189
Delgutte, Bertrand, 261 Fellbaum, Christine, 221
Den Dikken, Marcel, 73, 75 Ferreira, Victor S., 211, 212, 214, 217
Dennett, Daniel, xxiv, 371, 372, 373, 377n2 Few words, a, 9192
Dennoch, 43, 4647, 4748, 50, 52 Fiebach, Christian J., 284
Deo, Ashwini, 171, 174, 175, 177, 178, 179, Figure-ground relations, 65, 6769, 7475,
185n4 101, 197, 200
Depiante, Marcela Andrea, 96n3 Filip, Hana, 142
De Renzi, Ennio, 331 Filler-gap chains, 7, 11. See also Gap-filling
Descartes, René, 371, 372–373, 373–374 Fillmore, Charles J., 5, 36, 68, 69, 192
Deschamps, Isabelle, 342n2 Finkel, Lisa, 329330
Dessalegn, Banchiamlack, 203206 Finnish, 131, 133
De Swart, Henriette, 140, 141 Fiorelli, Patricia, 306
De Vogüé, Sarah, 159n3 Firato, Carla E., 217
De Waal, Frans B. M., 300 Firth, John R., 272
Dienes, Zoltán, 284 Fisher, Cynthia, 6, 190, 191
Dikken, Marcel den, 73, 75 Fitch, W. Tecumseh, 269, 271, 295, 296, 299,
Dimroth, Christine, 213215, 217, 220, 227 302, 304, 305, 308, 310, 313, 314, 315
Discourse-role verbs, 79, 8081, 87, 92, 93, Flack, Jessica C., 260
94 Focus, and pitch, 325, 335342
Dixon, Robert M. W., 96n2 Fodor, Jerry A., xix, 3233, 271, 326, 372,
Double dissociation, 173, 296, 328334 373
Dowty, David, 79, 141, 142143, 147, 189 Formalisms and formalization, 22, 23, 34
Draye, Luk, 71 Formal phrase structure grammar, xv
Dryer, Matthew, 113, 115n3 Foundations of Language, xxii
Dubinsky, Stanley, 115n7 Fowler, Carol A., 314
Duncan, Lachlan, 112 Fox, Danny, 339, 343n8, 343n9
Dutch, 7475, 116n9, 138 Frazier, Lynn, 215216
French, 140, 157
Economy, 13, 14 Freud, Sigmund, 241242
Edelman, Gerald M., 272 Fried, Mirjam, 36
Egnor, S. E. Roian, 304 Friederici, Angela D., 314
Eisengart, Julie, 6 Friedman, Milton, 342, 343n7
Eldredge, Niles, 270 Frigyesi, Judit, 310
Elffers, Els, 160n7 Frith, Chris, 376
Emit, 92 Fujita, Ikuyo, 192, 193
Energeia, 142143 Function
Entrainment, 297309 evolutionary/adaptive, xxiii, xxivxxv
Ermentrout, Bard, 297 of language use, 30
Evaluation metrics, xviii Functional harmonic analysis, 356
Evolution of language, xxiii, xxiv, 13, 14,
243, 259273, 280285, 294 Galanter, Eugene, 313
Evolution of rhythm and entrainment, Galantucci, Bruno, 314
297299, 304309 Galilei, Galileo, 143144
Exner, Siegmund, 238240, 248, 249 Gally, Joseph A., 272
Explicitness, 34, 35 Gap-filling, 7, 11, 167169, 182
Garrett, Merrill, 167 Hasegawa, Ai, 302, 304
Gati, Itamar, 197 Haspelmath, Martin, 6970, 109, 113
Gazdar, Gerald, xv Hattori, Yuko, 303, 304
Gehrke, Berit, 71 Hauser, Marc D., xxxiii, 269, 271, 304
Gelade, Garry, 203 Haviland, Susan E., 212
Gelman, Rochel, 327 Hawkins, John A., 14
Generalization, 6, 8, 9, 14 Hayes, Bruce, 268
Generative semantics, xv, xvi, xvii, 21 Haywood, Sarah L., 212
Generative Theory of Tonal Music, A, Heeschen, Claus, 248
xxixxii, xxxii, 347348, 353, 359, 367 Heilbronner, Karl, 247
Gentner, Dedre, 207n4 Heim, Irene, 59n3
Gerdts, Donna B., 125 Heim, Stefan, 342n2
Gergely, György, 189 Heise, Diana, 207n4
Gerhardt, H. Carl, 298, 299 Hendrick, Randall, 217
German, 4158, 6974, 115n6 Henschen, Salomon Eberhard, 327
Gertner, Yael, 6 Herbart, Johann Friedrich, 236237
Gesture and language, 243, 259, 260 Herbig, Gustav, 160n6
Gigure, Jean-Franois, 296 Hickok, Gregory, 168, 181, 327
Gish, Sheri, 306 Hierarchical cognition, 313314. See also
Gleitman, Lila R., 189190, 191, 195, 197, Combinatorial structure
198, 199, 200, 207n6 Hirschberg, Julia, 44
Goal paths, 66, 73, 191194 Hoeschele, Marisa, 313
Godard, Danièle, 96n5 Hoff (formerly Hoff-Ginsberg), Erika, 6,
Godek, Tymothi, 385, 387 212
Goldberg, Adele E., 4, 36 Hofmeister, Philip, 11, 12
Goldin-Meadow, Susan, 189 Honing, Henkjan, 312
Goldstone, Robert L., 207n4 Honorifics, 126129
Goodman, Nelson, 188, 207n4 How language helps us think (paper),
Gordon, Peter, 189 xxi
Gordon, Peter C., 217 Huber, Franz, 298, 299
Gorillas, 299300 Huijbregts, Riny, 66
Gould, Stephen J., 270 Hulse, Stewart H., 313
Grahn, Jessica A., 312 Human nature, xixxx
Grammar as evidence for conceptual Humphrey, Nick, 371
structure (paper), xxi Huron, David, 377378n3
Grammatical aspect, 138142, 156157 Husband, E. Matthew, 181, 287
Grammatical functions, xxviii, 16n4 Hutton, James, 270
Green, Georgia M., xv Hyde, Krista L., 335
Greenfield, Michael D., 298, 299
Griffith, Teresa, 170 Icelandic, 109112, 115n6, 129, 130, 133,
Grimshaw, Jane, xxv, 81, 82, 94, 95, 96n3, 135n5
96n5, 122, 123, 170, 190 Iconicity, 259, 260, 280
Grodzinsky, Yosef, 168, 327, 329330 Identity Thesis for Language and Music,
Grouping structure, 350353, 382 296, 327328
Gruber, Jeffrey Steven, xxi, 193 Ihara, Hiroko, 192, 193
Guevara, Che, 343n7 Immediate constituent analysis, 245
Gürcanlı, Özge, 201 Imparfait in French, 140, 157
Gurevich, Naomi, 265 Impersonal constructions, 101114
Gvozdanovic, Jadranka, 141 Im wunderschönen Monat Mai (song),
347367
Haarmann, Henk, 248 Indo-European language family, 7071,
Haider, Hubert, 54 7374, 236
Hale, John T., 10, 11 Ineffability, 376
Hale, Kenneth, 79 Information status, 211230
Hall, D. Geoffrey, 191 Innateness, xxiv, 28
Halle, Morris, 310311 Intellectual history, 235251
Halliday, Michael A. K., 213 Internal reconstruction, 270
Harrikari, Heli, 28 Interpretive semantics, xv, xvii, 21
Intonation in coordinations, 4445 Kiparsky, Paul, 271
Irish, 104, 105107 Ki-passive, 112113
Irwin, David E., 211, 212, 214, 217 Király, Ildikó, 189
Island constraints, 1112 Kirby, Simon, 13, 14, 272
Isserlin, Max, 247248, 249 Kita, Sotaro, 189
Italian, 161n22 Klein, Ewan H., xv
Iterative interpretation, 171, 174178 Klein, Wolfgang, 213, 251
Itkonen, Esa, 26, 34 Klessinger, Nicolai J. C., 342n2
Kluender, Robert, 12
Jackendoff, Ray S., xv, xviixix, xxxxii, Koelsch, Stefan, 330, 332
xxiii, xxiv, xxv, xxvii, xxviii, xxix, xxx, Koffka, Kurt, 367
xxxi, xxxii, 3, 4, 8, 13, 14, 15, 2122, 26, Kolk, Herman H. J., 248
28, 29, 32, 33, 36, 41, 6364, 65, 66, 67, 68, Kolni-Balozky, J., 139
70, 72, 79, 120, 122, 130, 134, 137, 150, Koontz-Garboden, Andrew, 86, 94
155156, 163n13, 167, 169, 170, 171, 172, Koopman, Hilda, 73
178, 187, 192, 193, 213, 217, 230, 235, 238, Korean, 119134
262, 277278, 280, 281, 283, 284, 286, 287, Krakauer, David C., 260
288, 289, 293, 294, 295, 296, 309, 310, 311, Krifka, Manfred, 155, 159n4
312, 313, 315, 325326, 327, 328, 338, 347, Krumhansl, Carol L., 359, 360, 364, 365,
367, 368n1, 368n3, 371, 374, 377n1, 367
380382 Kruszewski, Mikoaj, 263, 266
academic career of, xi, xxiixxiii Kuperberg, Gina R., 287
contributions of, xiixiv, xviixxv, 388391 Kurby, Christopher A., 288
personal experiences with, xii, xiv, xv, Kussmaul, Adolf, 247
xvixvii, xxvxxvi, 277278, 371, 380 Kutas, Marta, 288
personal qualities of, xii, xivxv
Jacobsohn, Hermann, 160n6 Labendz, Jacob, 97n8
Janik, Vincent M., 304, 306 Labov, William, 264
Japanese, 123, 124 Ladefoged, Peter, 261
Jarvis, Erich D., 304, 308 Lai, Yao-Ying, 173, 178, 181, 185n2, 185n4
Jrviviki, Juhani, 288 Lakoff, George, xix, 26
Jaspers, Dany, 160n7 Lakusta, Laura, 192, 193194
Jedoch, 43, 4850, 57 Landau, Barbara, xxi, 192, 193194, 201,
Jespersen, Otto, 189, 250 203, 204, 205, 206
Johnson, Helen L., 288 Langacker, Ronald W., 26
Johnson, Keith, 261 Language acquisition, xviii, xxi, 89,
Just, Marcel A., 217 190191, 201202
Juxtaposition of sounds, 262265 Language change, 14, 7374, 101, 104105,
109112, 265
Kaqchikel, 112113 Lapata, Mirella, 287
Karlsson, Fred, 28 Lashley, Karl, 313, 314
Katsika, Argyro, 173, 178, 287 Latin, 74
Katz, Jonah, 296, 327328 Leach, Edmund R., 282
Katz-Postal thesis, xiixiii Leakey, Richard E., 314
Kay, Paul, 5, 36 Lee, Bruce Y., 283
Keenan, Edward, 113, 115n3 Lehrer, Adrienne, 96n2
Keller, Frank, 11, 287 Lenci, Alessandro, 161n22
Kelly, Lisa A., 181, 287 Lenneberg, Eric H., 313
Kenny, Anthony, 143, 160n7 Lerdahl, Fred, xxi, xxxii, 230, 283, 284, 293,
Kessler, Edward J., 360 294, 295, 296, 309, 310, 311, 312, 327, 328,
Kettunen, Lauri, 28 347, 364, 365, 367, 368n1, 368n3, 382
Keyser, Samuel Jay, 79 Lestrade, Sander, 71
Kibort, Anna, 109 Levelt, Willem J. M., 159n1, 211, 212, 239
Kim, Soowon, 125, 126 Levin, Beth, 86
Kim, Young-Joo, 120 Levine, Beth A., 173
Kimura, Doreen, 327 Levine, William H., 217
Kinesis, 142143 Levman, Bryan G., 293
Kinsbourne, Marcel, 371 Levy, Roger, 11
Lexical aspect, 138142, 149156 Metrical structure, 296, 309313, 351, 382
Lexical Conceptual Semantics, xxviii Meyer, David E., 167
Lexicalism, xv, xvi, 21 Miller, Carol A., 201
Lexical redundancy rules, xviiixix Miller, George A., 313
Lexical semantics, xxviii, 139, 145 Minimalist Program, 13, 63, 73
Lexicography, 146147 Mirror Neuron theory, 327
Liberman, Mark, 296, 368n3 Mismatches
Light verbs, 79, 287 form-meaning, xviixviii, 6376
Linear order in linguistic representation, 8, morphology-syntax, 104, 109, 113
10 Mithen, Steven J., 271, 293, 315
Linguistic Material arguments, 8084, Mithun, Marianne, 114
8586, 8992 Mittelfeld, 53
Linking rules/principles, 28, 35, 65, 190, Mynarczyk, Anna, 159n3
278280, 284289 Model-theoretic semantics, xix
Lipps, Hans, 246247, 248, 249, 251 Mode (means and attitude) verbs, 79,
Liu, Fang, 335336, 337, 342 8486, 8788, 93
Livingstone, Frank B., 315 Modularity, 3233, 3435, 271, 326
Localist hypothesis, 68 of music and language, 295, 325, 327335,
Locatives and locative case, 69, 74, 336, 337338, 342
119134 Montagu, Ashley, 314
Love, Tracy, 169 Morais, Jos, 296
Lyell, Charles, 270 Morgan, David B., 314
Morphological and semantic regularities
Maas, Utz, 250 in the lexicon (paper), xviixviii, xx
MacKay, Carolyn Joyce, 113114 Morphology, relationship to syntax,
Macnamara, John, xii, 26 103104, 109, 112113
MacWhinney, Brian, 218 Motor hierarchies, 313314
Maess, Burkhard, 284 Müller, Friedrich Max, 236
Maling, Joan, xxv, xxvi, xxviii, 104–105, Müller, Stefan, 5
108109, 110111, 113, 114, 116n9, 120, Munro, Pamela, 96n2
121, 126, 130, 131, 133, 134, 135n4 Münte, Thomas F., 288
Manner-of-speaking verbs, 97n8. See also Music, xxixxii, 347367, 382
Mode (means and attitude) verbs biological basis for, 293315
Market model in science, 251 language, relationship to, 293296,
Marr, David, 4, 381 309312, 313315, 325342
Marslen-Wilson, William D., 238239, 240 in rituals, 283285, 289
Martin, James G., 293 Myler, Neil, 293, 296, 311
Martin, Samuel, 126
Martinet, André, 264 Naccache, Lionel, 375
Martins, Mauricio D., 314 Naeser, Margaret A., 168
Matsuzawa, Tetsuro, 303, 304 Nagel, Tom, 372
Mattingly, Ignatius G., 271 Naigles, Letitia R., 6
Mayer, Carl, 240, 241242, 243 Nakanishi, Kimiko, 124
McClelland, James L., 272 Nam, Seungho, 192, 194
McCloskey, James, 104, 105, 106107 Nappa, Rebecca, 191
McElree, Brian, 170, 173, 287 Narasimhan, Bhuvana, 213215, 217, 220,
McLean, Janet F., 212, 213 227
Meaning. See Semantics Natural philosophy, 142143
Means. See Mode (means and attitude) Necker cube, 367
verbs Nelken, Israel, 335
Medin, Douglas L., 207n4 Nettl, Bruno, 295
Memory, lexical, xx Neuroscience and neurolinguistics, xxi, 167,
Memory structures, 4, 6, 8 168170, 295296, 326327, 328335. See
Mendel, Gregor, 235, 241, 248, 249 also Aphasia
Merchant, Hugo, 303, 312 Newberg, Andrew B., 283
Meringer, Rudolf, 240242, 243, 248, 249 New information. See Information status
Merker, Björn, 293, 298 New Transitive Impersonal construction,
Methodology, 23, 3336, 331334, 341342 109112
Nguyen, Luan, 10, 20n9 Pesetsky, David, 296, 327328
Nida, Eugene Albert, 245 Petrova, Oksana, 22, 2829
Nikanne, Urpo, 22, 2425, 27, 29, 33 Phenomenology, 237, 372, 374, 377
Nikitina, Tatiana, 192 Phrasal verbs, 155156
Nishihara, Herbert Keith, 381 Phrase structure diagrams, origin of,
Nonhuman animals, 296309 244245
Non-promotional passives, 102104, 110, Pickering, Martin J., 173, 212, 287
114 Pieces of structure, 3, 4, 8. See also
Nonsense words, 190191, 198199, 204, Memory structures
207n6 Pilgrim, Scott, 383
Non-weaker alternatives, 343n8, 343n9 Piñango, María M., 168, 169, 171, 172, 173,
Noonan, Michael, 96n2 174, 175, 178, 179, 181, 185n4, 287
Notley, Anna, 288 Pineau, Marion, 284
-no/to construction in Polish and Pinker, Steven, xx, xxxivn1, 190
Ukrainian, 104, 108109 Pinnipeds, 306308
Nottebohm, Fernando, 304 Pitch perception, 313, 329, 335342
Nowak, Andrzej, 9, 14, 286 Place semantics, 6567, 7173
Number systems and aspect, 148149 Plank, Frans, 115n6
Nzwanga, Mazemba, 115n7 Plotkin, Joshua B., 260
Poetry and music, 311, 312
Occam's Razor, 35 Polish, 104, 108–109
O'Connor, (Mary) Catherine, 36, 114 Pollard, Carl, xv
Oh, Eunjeong, 86 Polysynthetic languages, 272
Öhman, Sven, 263 Poole, Joyce H., 304
Ohtsuka, Keisuke, 288 Pörn, Michaela, 22, 29
Okanoya, Kazuo, 315 Poutsma, Hendrik, 160n6
Old information. See Information status Prepositions and prepositional phrases,
O'Malley, Bryan Lee, 383 63–76, 192
Only, 339341 Pribram, Karl H., 313
Ontology, 137, 142146, 158 Price, Cathy J., 174
Operant conditioning, 242 Primates, 299300, 303
Ó Sé, Diarmuid, 107 Primitives, semantic, xxviii
Osherson, Daniel N., 326, 328 Primus, Beatrice, 115n6, 116n9
Östman, Jan-Ola, 36 Prince, Alan, xx, 296, 368n3
Özyürek, Asli, 189 Princeton University, 221
Prinz, Jesse, 372
Palumbo, Carole L., 168 Probabilistic phrase structure grammars,
Parallel Architecture, xxii, 45, 6364, 1011
7172, 169170, 171, 178, 277, 294 Processing, 10, 11, 167170, 172173,
Parrots, 302 215230, 285288
Parsers, 1011 Progressive form in English, 140
Particles, 4243, 5358, 127128, Prolongational analysis, 354356
155156 Prosody of coordinations, 4445
Parts and boundaries (paper), xxi Prototypicality, 145146, 147
Pasch, Renate, 56 Pseudo-clefts, 81, 8384, 86, 87, 8990
Passives, 101114, 190 Psycholinguistics, 167, 236, 242
Patel, Aniruddh D., 284, 293, 295, 296, Pullum, Geoffrey K., xv
300–302, 303, 304–306, 308, 325, 335–336, Pulvermüller, Friedemann, 327
337, 338, 341, 342, 343n7 Pustejovsky, James, 170
Path semantics, 65–67, 71–73, 188, Pylkkänen, Liina, 287
191–194 Pyykkönen, Pirita, 288
Paulsen, Geda, 22, 29
PDP Research Group, 272 Qualia, 374, 376, 379n3
Peretz, Isabelle, 296, 329 Quantification, and aspect, 141142, 154
Perfective and imperfective aspect, 138, Quine, Willard, 188
139, 156 Quotation fragments, 8184, 86, 87, 88, 90,
Performance, 4, 10 95
Performatives, 246 Quotes, 80, 8184, 8586, 87, 88, 9495
Rakowitz, Susan, 191 Schiltz, Kolja, 288
Ralls, Katherine, 306 Schmidt, Hilary, 202
Ravignani, Andrea, 299 Schnupp, Jan, 335
Rebuschat, Patrick, 310 Schools of thought, 2224
Recursion, 259, 266269, 387 Schuell, Hildred, 327
Redundancy rules, xviiixix Schuler, William, 10, 20n9
Reeve, Hannah, 212, 213 Schumann, Robert, 347, 349
Reflexives, 106, 111112 Schusterman, Ronald J., 305, 306, 307, 308
Reichmuth, Colleen J., 306, 307 Schvaneveldt, Roger W., 167
Reinach, Adolf, 246, 247, 248, 249, 251 Science, 2224, 235, 248251
Relationstreue, 280 Sea lions, 305
Repetition in rituals, 281282 Searle, John, 372
Repp, Bruno H., 310 Selection, grammatical, 9495, 174, 178179
Rhesus macaques, 303, 312 Sellars, Wilfrid, 375
Rhythm and rhythmic cognition, 293, Sells, Peter, 104, 119, 123, 124, 125, 126, 128
296312, 350353 Selz, Otto, 250
Richman, Bruce, 315 Semantic composition, 5152, 6567, 74,
Richter, Elise, 250 142, 148156
Riemann, Hugo, 368n6, 368n7 enriched, 167, 171, 172, 174
Riemsdijk, Henk van, 54, 66, 71, 73 Semantic Interpretation in Generative
Rijksbaron, Albert, 144, 159n5, 160n8, Grammar, xvii, 21
160n10 Semantics
Rituals, 281283, 284 conceptual, xix, 2137, 68
Rizzi, Luigi, 173 generative, xv, xvi, xvii, 21
Rizzolatti, Giacomo, 237 interpretive, xv, xvii, 21
Robert, Daniel, 300 lexical, xxviii, 139, 145
Roberts, Ian, 296, 311, 312 model-theoretic, xix
Roizen, Igor, 299 Ray Jackendoff's contributions to, xii–xiii,
Romanowski, Charles A. J., 342n2 xviixix
Ronan (California sea lion), 305306 syntax, relationship to, xv, 21, 41, 6364,
Rooth, Mats, 338, 339 73, 76, 285288
Rosch, Eleanor, 145, 197, 200 Semantics and Cognition, xxi, 21
Rosenberg, Jason C., 284 Semantic Structures, xxi
Rosengren, Inger, 54 Senghas, Ann, 189
Ross, John R., 11 Sentential subjects, 11
Ross, William David, 144, 145 Seuren, Pieter A. M., 160n7
Rothenberg, Martin, 261, 263 Shapiro, Lewis P., 170, 173
Rothstein, Susan, 142 Shattuck-Hufnagel, Stefanie, 44
Rousseau, Jean-Jacques, 293 Siegal, Michael, 342n2
Roy, Alice, 327 Siewierska, Anna, 112, 114n2
Rubin vase, 101102, 103, 109, 112, 114 Sign language, 243244
Ruddy, Margaret G., 167 Sigurðsson, Einar Freyr, 112
Rumelhart, David E., 271 Sigurðsson, Halldór Ármann, 115n6, 121,
Russian, 139, 141142, 156 129, 135n5
Ryle, Gilbert, 143, 160n7 Sigurjónsdóttir, Sigríður, 105, 108–109,
110111, 113
S- (Russian perfective prefix), 139140, Silverman, David, 263, 264
141142, 156 Similarity, 197198, 207n4
Sachs, Joe, 160n8 Simon, Herbert A., 293
Sadalla, Edward K., 197, 200 Simpler Syntax, xxii
Sag, Ivan A., xv, 5, 11, 12 Simpler Syntax Hypothesis, 3, 14
Sanz, Crickette, 314 Sincerity condition, 246
Saussure, Ferdinand de, 270, 278 Sjare, Becky, 306
SAY verbs, 7995 Skinner, Burrhus Frederic, 242243, 250
Schachner, Adena, 300, 302, 304, 305 Slater, Peter J. B., 304, 306
Schäfer, Florian, 115n6 Slevc, L. Robert, 216, 218, 227–228, 284
Scheepers, Christoph, 287 Smith, Carlota S., 139
Schijndel, Marten van, 10, 20n9 Smith, Kenny, 13, 14
Smith, Linda B., 207n4 Temperley, David, 310
Smith, Michael B., 71 ten Cate, Carel, 315
Smith, Nicholas A., 367 Tense, 138
Snedeker, Jesse, 287 Tension and relaxation, in music, 354355,
Snider, Neal, 12 359, 362367
Snowball (sulphur-crested cockatoo), Te Winkel, Lammert, 138, 150, 159n1,
300302 160n6
Sobin, Nicholas, 108 Tham, Shiao Wei, 86
Song, Hyun-Joo, 190 Thematic Relations Hypothesis, 193
Source paths, 66, 191194 Thiersch, Craig, 82
Spatial cognition, xxi Thought bubbles, 387
Spatial Implementation of the idea of Thráinsson, Höskuldur, 115n6, 115n8, 121
pieces of structure, 10 Tillmann, Barbara, 284
Speech act theory, 246247 Time-span reduction, 353354
Speech balloons, 387 Time-span segmentation, 353
Speech errors, 240243 Toivonen, Ida, 156
Spencer, Cheryl, 306 Tomasello, Michael J., 6, 8, 259, 260
[SQA], 142, 154 Tomioka, Satoshi, 124
Staden, Miriam van, 75 Tomonaga, Masaki, 303, 304
Standard Theory of language structure, xii Tonal Pitch Space, 347, 358, 359, 361, 365,
Staplin, Lorin J., 197 366367
Steels, Luc, 270 Topicality, 213, 214215, 216
Steinthal, Heymann, 236238, 243, 247, 248, Topic markers, 134n2
249 Toward an explanatory semantic
Stem-modifying languages, 272 representation (paper), xix, 21
Stenson, Nancy, 104, 106 Trace Deletion Hypothesis, 168
Stephens, Nola, 227 Trajectories, 810
Stewart, Ian, 297 Translation problems, 144145, 160n9
Stewart, Lauren, 343n7 Traxler, Matthew J., 173, 180, 287
Stirling, Ian, 306 Tredennick, Hugh, 143, 144
Stoeger, Angela S., 304 Treisman, Anne M., 202, 203
Stokoe, William C., 244 Trotzdem, 43, 4647, 4748, 50, 52
Stout, Dietrich, 314 Trubetzkoy, Nikolai S., 250, 267
Streitberg, Wilhelm, 160n6 Turvey, Michael T., 314
Stress, 267268 Tversky, Amos, 197
Strogatz, Steven H., 297 Tyler, Richard S., 261
Stroop, John Ridley, 248249
Stutterheim, Christiane von, 213 Ukrainian, 104, 108109
Subject (grammatical), 119134, 188, Umbach, Carla, 51
189191, 216, 280, 286 Unconscious-meaning hypothesis, 238
Suh, Cheong-Soo, 119, 123, 126 Uniformitarianism, 270
Suñer, Margarita, 96n2 Uniqueness point, 239–240
Suomi, Kari, 28 Universal Grammar, xx, 13, 96
Surprisal, 11 Universals, 1214, 2728, 7980
Svenonius, Peter, 65 Ur-Wurzeln, 236, 237
Swart, Henriette de, 140, 141 User's Guide to Thought and Meaning, A,
Swinney, David, 168 374, 377n1
Symbolism, 259273, 278280 Utt, Jason, 173, 178
Symmetrical predicates, 187, 188, 194202
Syntax Van Riemsdijk, Henk, 54, 66, 71, 73
morphology, relationship to, 103104, 109, Van Schijndel, Marten, 10, 20n9
112113 Van Staden, Miriam, 75
semantics, relationship to, xv, 21, 41, Varley, Rosemary A., 342n2
6364, 73, 76, 285288 Vaux, Bert, 293, 296, 311
Vea, Donna, 6
Talmy, Leonard, 65, 79, 86, 197, 200 Vendler, Zeno, 143, 145, 147, 160n7, 160n9
Telegram style, 247248 Venezia, Jonathan, 327
Telicity and telos, 143, 144 Verhelst, Mariet, 75
Verkuyl, Henk J., 67, 138, 142, 148, 154, 158, Zubizarreta, Maria Luisa, 86
159n2, 160n6, 160n12, 160n14, 161n17, Zurif, Edgar, 168, 170, 172, 173, 181
161n19, 161n20 Zwar, 4748, 49
Vicente, Luis, 51 Zwarts, Joost, 71, 72
Vignolo, Luigi, 168, 331 Zwicky, Arnold, 97n8
Vincent, Nigel, 74
Virtue, Sandra, 288
Visser, Fredericus Theodorus, 117n4
Vocal learning and rhythmic
synchronization hypothesis, 304308
Vogüé, Sarah de, 159n3
Von Stutterheim, Christiane, 213
Voorhees, Burton, 373
Vries, Mark de, 96n5

Waal, Frans B. M. de, 300


Walruses, 306, 307308
Warner, Anthony R., 117n4
Watson, John Broadus, 249
Weak and strong principles, 36
Wegener, Philipp, 246
Wells, Frederic Lyman, 243
Wells, Kentwood D., 298, 299
Welsh, Alan, 238
Wernicke's area and Wernicke's aphasia,
168, 169, 173, 181182
Westphal-Fitch, Gesche, 314
What and where in spatial cognition
(paper), xxi
Whitacre, James M., 272
Wiese, Heike, 279, 280, 282, 285
Wilkes-Gibbs, Deanna, 212
Williams, Leonard, 300
Winkel, Lammert te, 138, 150, 159n1, 160n6
Winkler, Susanne, 12
Wittenberg, Eva, 280, 281, 287
Wittgenstein, Ludwig, 145, 367
Wolfart, H. Christoph, 113
Wong, Carol, 169
Woodward, Amanda L., 189
Wright, Anthony A., 313
Wundt, Wilhelm M., 211, 243245, 246, 248,
249

X-Bar Syntax, xxxxi

Yip, Moira, 120, 130, 134


Yoon, James H., 125, 127, 128129
Yoshita, Hiromi, 211, 212, 214, 217
Youn, Cheong, 125
Yu-Cho, Young-mee, 128
Yule, George, 218

Zaenen, Annie, 104, 121


Zarco, Wilbert, 303
Zec, Draga, 104
Zeman, Adam, 312
Zhu, David C., 181, 287
