G61.1310
Introductory Syntax Lectures
Mark R. Baltin
Lecture #1 - Preliminaries
This course is about syntax: the principles by which words combine to
form sentences. The study of syntax tries to answer two main questions:
(i) What are the principles of syntax for a particular language?
(ii) What are the principles of syntax for any human language?
The study of syntax is a branch of the field of linguistics, which has as its
main goal a characterization of human language. As such, linguistics can be
distinguished from the field of semiotics, which studies the properties of
symbolic systems in general. For example, the system of traffic lights in the U.S.
is a sort of symbolic system. There are essentially three symbols: red light,
yellow light, and green light. We call these three states of a traffic light symbols
because each condition symbolizes a different meaning: a red light signals that
the approaching traveler is to stop and not cross the intersection, a yellow light
indicates that the approaching traveler is to stop before reaching the intersection
because the light is about to turn red, and a green light indicates that the
approaching traveler is free to cross the intersection.
We can say that the system of traffic lights has a grammar, which can be
defined as a specification of the possible expressions in the symbolic system,
together with a pairing of expressions with meanings. In this case, the
grammar of traffic lights has three expressions, and three pairings with
meanings.
This grammar is extremely simple. It can be presented as follows:
(1) [[Green]]  -----> Go
    [[Yellow]] -----> Stop if before the intersection
    [[Red]]    -----> Stop
The pairing in (1) is a specific type of relation in mathematics,
known as a function. A function is a pairing of elements from two sets, such
that each element in the first set is paired with no more than one element in the
second set. If the first set has additional elements that are not paired with any
elements in the second set, the function is said to be a partial function. If every
element in the first set is paired, the function is said to be a total function. Let
us assume that the function that pairs the expressions of a natural language,
such as English, Chinese, Welsh, Papago, etc., with their meanings is a total
function.
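The pairing in (1) and the total/partial distinction can be made concrete with a small sketch. It is only an illustration: the names MEANING, EXPRESSIONS, is_total, and is_partial are invented here, and the fourth "expression" is a hypothetical one added to show what an unpaired element looks like. "Partial" is used in the sense defined above (some element of the first set is left unpaired).

```python
# The traffic-light grammar in (1), modeled as a Python dict:
# each expression is paired with at most one meaning.
EXPRESSIONS = {"green", "yellow", "red"}

MEANING = {
    "green": "go",
    "yellow": "stop if before the intersection",
    "red": "stop",
}

def is_total(pairing, domain):
    """True if every element of the domain is paired with a meaning."""
    return all(expression in pairing for expression in domain)

def is_partial(pairing, domain):
    """True if some element of the domain is left unpaired."""
    return not is_total(pairing, domain)

# Hypothetical variant: a fourth expression with no meaning assigned.
EXTENDED = EXPRESSIONS | {"flashing blue"}

print(is_total(MEANING, EXPRESSIONS))   # True
print(is_partial(MEANING, EXTENDED))    # True
```

A dict is a natural fit because a key can map to only one value, which is exactly the "no more than one element in the second set" condition.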
Question: What would it mean for the function that pairs the expressions of,
e.g., English with their meanings to be a partial function?

I. Can the Grammar of English Be Described as Easily as the Grammar of
Traffic Lights?
The grammar of traffic lights has some noteworthy features that are
useful to keep in mind when considering what a human language is.
For one thing, you can count the number of sentences that the grammar of
traffic lights allows. There are three. For another, we cannot really say that the
grammar of traffic lights has a syntax. It has a list of "words" (red, green, and
yellow), but each of these words constitutes a complete expression in
"traffic-lightese", and none of the expressions can be combined with any other
expressions.
To introduce some jargon, we would say that (i) traffic-lightese is
finite, and that (ii) the sentences of traffic-lightese are bounded in length.
When we say that a language is finite, we mean that there is a fixed number to
the expressions of the language. When we say that the sentences of the language
are bounded in length, we mean that we can precisely define how long the
sentences of the language can be.
And to complete the circle, you can see what we mean by the term language.
A language is simply the set of sentences that a grammar generates.
To learn traffic-lightese (which is necessary, although possibly less clearly so
if you live in New York City), you simply had to memorize the "sentences" of
traffic-lightese and learn the function given in (1) that pairs each sentence with
its meaning. Could you memorize the individual sentences of English the way
that you memorized the sentences of traffic-lightese?
Consider the following strings, modeled on Lewis Carroll's poem "Jabberwocky":
(2) The blithy toves did gyre and gimble.
(3) The blithy toves karulized elatically.
Sequences of elements in a language are called strings. The reaction of
native speakers of English to the strings in (2) and (3) is rather interesting. The
strings are recognized as being English-like, so that these strings are felt to be
sentences of English, even though the words in these "sentences" have never been
encountered before. We have a feeling that "toves" is a plural (i.e. a form denoting
more than one) of "tove", which is what we learned in school as a noun (we will
soon learn what the basis of notions like noun and verb is). Furthermore, even
though we have never encountered the "words" "gyre", "gimble", and
"karulized", we perceive them to be verbs, and, furthermore, that "karulized" is
the past tense of "karulize". Finally, we recognize "elatically" as an adverb.
The way in which we deal with strings such as (2) and (3)
illustrates an important difference between English and other languages that are
termed natural languages, on the one hand, and traffic-lightese, on the other.
That difference has been termed linguistic creativity: the ability to produce and
understand strings of a language that have never been previously encountered.
Traffic-lightese is a language with a fixed number of sentences, what is termed a
finite language. Natural languages such as English, on the other hand, are
infinite languages, in the sense that there are an infinite number of sentences in
each natural language. What is the source of this infinity?
Well, for one thing, the words of English seem to be grouped into classes, so
that we can recognize new words coming in as members of these classes. Unlike
traffic-lightese, in which there are a fixed number of words (three, to be precise),
natural languages have an unlimited number of words that simply have to be
fitted into word-classes. The traditional grammar term for a word-class is
part of speech. We term a word-class a grammatical category. We will soon be
examining the basis for the notion of a grammatical category, and contrasting
two views of grammatical categories: the notional view, in which each
grammatical category has a particular meaning, and the distributional view, in
which each grammatical category has a unique distribution. The
Jabberwocky example bears on the comparison of these two views. Let's see
why.
The reason that the Jabberwocky example is so striking is that the
words are nonsense words. We've never encountered them before, so we can't
possibly know what they mean. Nevertheless, we feel that (2) and (3) are
English sentences with unfamiliar words. The basis for our feeling is that the
words are in the right places for words of the appropriate word-classes (let's call
them grammatical categories from now on). To see this more carefully, let's
systematically deform, for example, (2), and see if, at each stage of the
deformation, we still have the feeling that the string of words is an English
sentence.
Let's start by removing the [-s] from the example in (2), and see if the
[-s]'s removal changes our perceptions of the status of the string:
(2) The blithy tove did gyre and gimble.
This string does seem to be English: it's talking about a single tove, which
performed a compound action in the past of gyring and gimbling. Now, let us
remove "did":
?(2) The blithy tove gyre and gimble.
This has a somewhat shakier status as an English sentence, and the sense
that I have gotten in the past, when I've performed this experiment in classes, is
that speakers of English are split. Some people find this sentence to be
non-English, while others find it to be English if "tove" is taken to be an irregular
plural of some sort, like "children" or "cattle". Removing "the" makes the string still
harder to recognize as English:
?(2)Blithy tove gyre and gimble.
Finally, removing "and" causes the sequence of words to be felt by all
speakers as being simply a string of words, with the character of a list:
*(3) Blithy tove gyre gimble.
An asterisk before a set of words is taken by convention to mean that the
sequence of words is an ungrammatical sentence.
Let us stop for a minute and think about how we dealt with this example.
We couldn't have known the words. Rather, we took some words that we knew
(and parts of words, such as the [-s]), and figured out details about the
unfamiliar words from how they were positioned with respect to the familiar
parts of English. In this sense, the distributional account of what grammatical
categories are seems to fare better than the notional account. We had to be
figuring out what kind of structure to assign the string based on the sequencing
of its parts, looking at the unfamiliar parts and seeing where they were relative
to the familiar ones.
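The reasoning just described, where familiar words serve as anchors for classifying unfamiliar ones, can be sketched as a toy procedure. Everything here is an illustrative assumption: the three-word lexicon KNOWN, the category labels, and the handful of positional rules are chosen only to cover the examples in this lecture, and are not a claim about how speakers actually parse.

```python
# Familiar "anchor" words whose categories we already know (assumed lexicon).
KNOWN = {"the": "Det", "did": "Aux", "and": "Conj"}

def guess_categories(sentence):
    """Tag unfamiliar words using only their position relative to familiar ones."""
    words = sentence.lower().rstrip(".").split()
    tags = [KNOWN.get(word) for word in words]
    last = len(words) - 1

    # Pass 1: positions next to an auxiliary.
    for i in range(len(words)):
        if tags[i] is None:
            if i < last and tags[i + 1] == "Aux":
                tags[i] = "N"      # word right before "did" heads the subject
            elif i > 0 and tags[i - 1] == "Aux":
                tags[i] = "V"      # word right after "did" is a verb

    # Pass 2: positions defined by what pass 1 found.
    for i in range(len(words)):
        if tags[i] is None:
            if i > 1 and tags[i - 1] == "Conj":
                tags[i] = tags[i - 2]   # conjuncts share a category
            elif 0 < i < last and tags[i - 1] == "Det" and tags[i + 1] == "N":
                tags[i] = "Adj"    # between a determiner and a noun

    return [(word, tag or "?") for word, tag in zip(words, tags)]

print(guess_categories("The blithy toves did gyre and gimble."))
print(guess_categories("Blithy tove gyre gimble."))
```

With all of the familiar words stripped away, as in the last deformation, the procedure has nothing to anchor on and every word comes back as "?", which mirrors the feeling that the string is merely a list.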
It is important to see what we've just done. We've taken two a
priori plausible views of what a grammatical category is, and we've tested them
by seeing what predictions they each make about phenomena in the part of the
world that we're investigating (i.e., sentences).
In any event, we've seen one reason for the open-endedness of a
language such as English, as opposed to traffic-lightese, and that is the fact that
natural languages (human languages, for our purposes) have a syntax: a set of
rules for arranging elements into more complex units of language (i.e., words
into sentences). Traffic-lightese does not have these principles: every word is a
complete sentence, and there are no principles for stringing words together to
form more complex sentences.
As we'll see very shortly, there are two ways in which natural
languages are infinite, meaning that there's an infinity of sentences in the
language. We have seen the first way, in which sentences are said to be made
up of members of grammatical categories, and new words can enter the
language to instantiate these grammatical categories.
A second way in which natural languages are infinite is that, as opposed
to traffic-lightese, in which you can specify the length of each sentence (because
each sentence is composed of one word and there are no procedures in
traffic-lightese for combining words), there is no specifiable bound on the length of a
sentence in a natural language. To see this, consider the following:
(4) a. The teacher left.
    b. The teacher's mother left.
    c. The teacher's mother's friend left.
    d. The teacher's mother's friend's sister left.
    e. The teacher's mother's friend's sister's boss left.
    f. The teacher's mother's friend's sister's boss's mother left.
I could have kept going with this type of example, and the sentence would have
gotten continually longer. In English, as indeed in all natural languages, the
grammar must contain methods or devices to create sentences of any length.
Obviously, for a sentence to be a sentence of a language, it must stop at some
point, but the grammar of English must allow the sentence to reach any conceivable
length before it does. In technical parlance, the grammar of English must generate an infinite
language.
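The pattern in (4) can be sketched as a procedure that adds one more possessive each time it is asked to. The function name possessive_sentence and the noun list are assumptions made for the illustration; the point is only that nothing in the procedure itself sets an upper limit on length.

```python
# Nouns reused in rotation; the list itself is finite, the sentences are not.
NOUNS = ["mother", "friend", "sister", "boss"]

def possessive_sentence(depth):
    """Build "The teacher's N1's N2 ... left." with `depth` possessed nouns."""
    subject = "The teacher"
    for i in range(depth):
        subject += "'s " + NOUNS[i % len(NOUNS)]
    return subject + " left."

# Depths 0 through 5 reproduce (4)(a) through (4)(f).
for depth in range(6):
    print(possessive_sentence(depth))
```

Each value of depth yields a finite sentence, but for every sentence there is a longer one at depth + 1, which is the sense in which the language is infinite while each sentence is not.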

Competence and Performance


At this point, we must step back for a minute and consider what
we are trying to account for. We have been trying to account for what it means to
know English (just as an example; we could have picked any language to try to
account for). However, in order to get our data for English, we have relied on
the intuitions of speakers of English: how speakers of English feel about the
strings that are presented to them. However, it seems that we cannot go directly
from our intuitions about English to inferences about whether particular strings
are in the language. The reason is that the properties of particular strings may
be due to factors that are not, properly speaking, part of the language at all.
For example, suppose we had continued to elaborate (4)(f) by continuing to add
['s] plus a noun, as in the following extension of (4)(f):
(4)(f) The teacher's mother's friend's sister's boss's mother's cousin's sister's
doctor's father's neighbor's daughter's friend's teacher's cousin's niece's
accountant left.
This string would be felt to be unacceptable, but not, it is usually
thought, because of our knowledge of English. To understand a string such as
(4)(f), and to make sense of it, we have to integrate what we know about the
individual words with a structure for the whole sentence, and a run-on
sentence such as this taxes our ability to remember everything that has come
before when we get to the end of a sentence. There are a number of studies of
memory, and one thing that we know about human memory is that it is limited
(there's a classic paper by the psychologist George Miller, entitled "The Magical
Number Seven, Plus or Minus Two: Some Limits on Our Capacity for
Processing Information", Psychological Review (1956), that proposes a specific
bound on short-term memory across a wide variety of perceptual domains).
In any event, if what is wrong with (4)(f) is due to a memory problem in
understanding the whole sentence, this problem would not be felt to be a
problem with the English of the string, but rather with the fact that people put
their knowledge of English to use by employing the rest of their mental
resources. In other words, our knowledge of English is embedded in the rest of
our capacities, such as memory, limitations on articulation (the fact that our
vocal tract can do some things but not others), etc.
Chomsky, in Aspects of the Theory of Syntax (MIT Press,
1965) made a distinction between what he calls competence and performance.
Competence is our knowledge of language, and performance is the mechanisms
by which our knowledge of language is put to use. As linguists, particularly as
syntacticians, we are interested in specifying competence in a particular
language, rather than performance. However, when we try to determine
what constitutes knowledge of, e.g., English, the raw data that we start with is
people's intuitions about the language, and we don't know what status to ascribe
to those intuitions. When somebody says that a string sounds funny, is it
because of a property of English (competence), or because of some factor other
than language (performance)?
There are really two parts to deciding to relegate the factor to the domain
of performance:

Rationale for Performance Explanation


(i) Putting it into competence would cause competence to have
    complicated restrictions that are seemingly arbitrary from the
    point of view of competence.
(ii) One can come up with a performance account that is plausible
    within the domain of performance.
We've seen the two parts of the argument for using performance as the account
already in the discussion of (4)(f). Putting a limit on the length of an English
sentence would complicate our account of English, and we would have to
explain why the limit exists within our description of English. Furthermore,
the explanation of the restriction in terms of a limitation on memory is a natural
one. It has to be the case that we have memory limitations. Try remembering a
sequence of 20 digits.
There are two other cases of strings that would seem to be unacceptable
for performance reasons. One has to do with the version of (2) from which the [-s]
and "did" were removed, repeated here:
?(2) The blithy tove gyre and gimble.
Recall that this string was not considered to be word-salad, but was not as
acceptable as the original (2). It was intermediate in acceptability, and its acceptability
hinged upon whether "tove" was taken to be an irregular plural. It is instructive to
consider this intermediate unacceptability further. Obviously, English contains
irregular plurals, and we are free to coin new lexical items. This has happened
several times in the history of English, as it has in all languages, surely.
However, we don't tend to assume that a new word is irregular in what is called
its morphology (the distribution of meaningful forms). Let us state a principle
like the following:
(3) Assume initially that a novel form is regular.
The question is: What status does a statement like "Assume initially" have
within a grammar? Grammars tend to rule things in or out. The string above, however,
gets better if we decide that our initial assumption was wrong.
It seems plausible to take an example such as (2) to be due to the
application of what the psychologist Thomas Bever has called a perceptual
strategy (T.G. Bever (1970), "The Cognitive Basis for Linguistic Structures", in
J.R. Hayes, ed., Cognition and the Development of Language, Wiley & Sons). A
perceptual strategy is a sort of heuristic that hearers develop that enables them to
assign a structure to, and hence to understand, a sequence of elements that they
are encountering. Bever illustrated the application of a perceptual strategy with a
now-famous example. Consider a non-nonsense string such as (4):
(4) The horse raced past the barn fell.

Does this sound like an English sentence? Most people would say that
it doesn't; it sounds like a main clause, "The horse raced past the barn", but then we
have no way of integrating the word "fell". This analysis of the string (called
parsing: the assignment of a structure to a string) relies on analyzing "raced" as the
past tense of "race".
However, the past tense of race is homophonous with another form of
race, called a participle form of race. The participle form of race is shown in
examples such as (5):
(5) ??The horse was raced past the barn. [1]
Keep in mind the two uses of the word "raced": the past tense form and the
participle form. Now, let us alter (4) by substituting the word "driven" for the
word "raced":
(6) The horse driven past the barn fell.
This sentence is perfectly acceptable, as is its paraphrase (7):
(7) The horse which was driven past the barn fell.
In (7), the sequence "which was driven past the barn" is an instance of
what is known as a relative clause; strictly speaking, a restrictive relative clause.
Restrictive relative clauses have the function of limiting the class of objects to
which the noun that precedes the relative clause can refer. When a restrictive
relative clause begins with a word like "who" or "which" (known as wh-words, which
we'll talk more about later on) and is followed by a form of the verb "be", there is
under most conditions a synonymous sentence that simply omits the wh-word
plus "be". This construction is known as the reduced relative construction. Further
examples:
(7) a. The girl who was sitting on the stoop was studying for her finals.
b. The girl sitting on the stoop was studying for her finals.
(8) a. People who are angry about this issue should write their elected
representatives.
b. People angry about this issue should write their elected representatives.
If we decide that (4) is ungrammatical, the question that we would ask is why.
Why is there no reduced relative counterpart to (9), which is perfectly
acceptable?
[1] A question mark before a sentence indicates that the sentence is of dubious acceptability,
while an asterisk indicates ungrammaticality. The point of this section, however, is that
acceptability is a pre-theoretic notion, having to do with how we feel about certain strings,
while grammaticality is a post-theoretic notion, having to do with whether or not a certain
string is generated by the grammar. We cannot decide whether or not a certain string is
generated by the grammar until we have constructed the grammar, however, and it is in this
sense that grammaticality is post-theoretic, since the account that we are constructing, a
grammar, is a theory of what it means to know a language.
(9) The horse which was raced past the barn fell.
We have a textbook example here of the Rationale for Performance
Explanation. Obviously, (4) is unacceptable as a paraphrase of (9) because the
word "raced" is taken to be the verb of a main clause. We could of course say that (4) is
ungrammatical. However, in considering the implications of this decision, we
would be saying that the reduced relative clause construction does not occur in
English when the first word of the reduced relative clause would create a
sequence that is homophonous with a simple main clause. This restriction is
mysterious from the standpoint of grammar, but is explained naturally from the
vantage point of perception, given that we, as speakers of English, must have a
psychological mechanism for understanding sentences as they come in. In a
sense, placing the restriction within the grammar of English would make the
restriction look bizarre; the restriction as an instance of what Bever calls a
perceptual strategy is quite natural.
All of this is intended as a cautionary note that, paradoxically, the raw
data for syntactic analysis is speech, which is an instance of performance, but
what we are trying to construct is a model of competence, reflected
psychologically as our knowledge of language (as opposed to performance,
which is how that knowledge is put to use on particular occasions).
As a historical note, the competence-performance distinction that
Chomsky makes is a quite traditional one within linguistics, but under different
names for these concepts: the late Swiss linguist Ferdinand de Saussure coined
the terms langue (for language) and parole (speech).
The Role of Formalism
As linguists, we are trying to mimic the task of children in learning their
native languages. It is uncontroversial that children learn the rules of their
languages without explicit instruction, as can be seen by, e.g., the innovations
and over-regularizations (making forms regular that are irregular in the adult
language, such as "goed" and "buyed" as the past tenses of "go" and "buy",
respectively).
It is clear that children are constructing a grammar, but what form does
this grammar take? It cannot be expressed in, e.g., English, because they don't
know English yet. To borrow a term from the philosophy of language, we
would say that English in this case is the object language, the language being
described, while the rules of English are being formulated by children in what
is known as a meta-language, a language that is outside of the language being
described.
A good deal of what we will be doing will involve discovering the nature
of this meta-language. We will be posing hypotheses about the rules of
grammar, and the way that they interact, by viewing the grammar as what is
known as a formal system, a system in which all of the concepts have a precise
definition. It is by making this assumption that we can make testable predictions
about the grammar. Additionally, formalism has the advantage of ensuring that
the terms that are used in an account have the same meaning to all parties.

Criteria of Adequacy for a Grammar
There is an old saying, "If you stand for nothing, you'll fall for
everything." How do you decide whether the set of rules that you've proposed
for some set of sentences in a language is the right one, or the best one, given
that there is an infinite set of possible grammatical descriptions?
In formulating a grammar, or a set of rules that we assume models the language
user's knowledge of language, we must first decide how we are going to evaluate
the grammar that we propose. Chomsky (1965) proposed three criteria of adequacy
for a grammatical description, which he dubbed:
(i) observational adequacy;
(ii) descriptive adequacy;
(iii) explanatory adequacy.
I will now discuss these concepts.
A. Observational Adequacy
Linguists who formulate grammars of languages that are not their own, and who
work with native speakers of those languages, are called field linguists. They
typically start by finding out the words for various concepts in the target language,
and eventually ask the speaker if (s)he can put the words together in this way or that
way to form an acceptable sentence in the language. After collecting the responses
for some time period (say, an hour and a half), the field linguist
leaves and analyzes the responses, trying to figure out the rules that generate all of
the acceptable strings and none of the unacceptable strings. A set of rules, or
grammar, that achieves this, is said to be observationally adequate. Hence,
observational adequacy can be defined as follows:
Observational adequacy: the ability of a grammar to generate all and only
the grammatical sentences of a language in a fixed body of data (called a corpus).
B. Descriptive Adequacy
As we saw from the Jabberwocky example, natural languages are infinite, and hence
a grammar of a natural language must be able to generate an infinite number of
sentences. To take the example of the field linguist above, after the field linguist has
formulated a grammar that is observationally adequate, he tests the grammar
against the native speaker's intuitions by asking the native speaker if some further set
of sentences that are not in the original corpus are acceptable sentences in the
language.
If a grammar generates all and only the set of grammatical sentences in the
language, it is said to be descriptively adequate.
Descriptive adequacy: the ability of a grammar to generate all and only the
grammatical sentences of the language.
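The difference between the two criteria can be sketched with a deliberately tiny grammar. All of the specifics below are assumptions made for the illustration: the lexicon, the single pattern Det N V, and the acceptability judgments (which stand in for a native speaker's responses). The check itself is just the "all and only" condition: the grammar generates a string if and only if speakers accept it.

```python
# A toy lexicon and a one-pattern grammar (both invented for illustration).
LEXICON = {"the": "Det", "teacher": "N", "mother": "N", "left": "V", "slept": "V"}

def generates(sentence):
    """Toy grammar: a string is generated only if it has the shape Det N V."""
    categories = [LEXICON.get(word) for word in sentence.lower().split()]
    return categories == ["Det", "N", "V"]

def adequate_for(judgments):
    """All and only: generated exactly when judged acceptable."""
    return all(generates(s) == acceptable for s, acceptable in judgments.items())

# The field linguist's fixed corpus of judgments.
CORPUS = {
    "the teacher left": True,
    "the mother slept": True,
    "teacher the left": False,
}

# Further sentences checked afterwards against the speaker's intuitions.
NEW_DATA = {
    "the mother left": True,
    "the teacher's mother left": True,   # acceptable, but not of the shape Det N V
}

print(adequate_for(CORPUS))     # True: observationally adequate for this corpus
print(adequate_for(NEW_DATA))   # False: fails once we look beyond the corpus
```

The toy grammar passes on the corpus and fails on the new data, so it would count as observationally adequate for that corpus without being descriptively adequate.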

However, we are not only interested in generating the right set of strings.
Remember, what we are really interested in modeling is the full set of abilities
that native speakers have, and one of those abilities is the ability to recognize
the meanings of sentences. For example, we know that "The cat is on the mat" does
not mean that John saw Mary. We therefore have to build in this ability as well.
A traditional way of describing a grammar is as an infinite set of pairings of form
and meaning. Let us therefore revise our definitions of observational and
descriptive adequacy as follows:
Observational adequacy (Final Version): the ability of a grammar to generate all
and only the grammatical sentences of a language in a fixed body of data
(called a corpus), and to pair each grammatical sentence with its meaning.
Descriptive adequacy (Final Version): the ability of a grammar to generate all
and only the grammatical sentences of the language, and to pair each grammatical
sentence with its meaning.
C. Explanatory Adequacy
The third requirement is not, strictly speaking, a requirement on grammars,
but, rather, a requirement on the account that underlies the construction of a
particular grammar, i.e. an account of what a possible grammar of a human
language can be. This needs a little more explanation.
When we formulate a grammar, we must have, at some level, a set of
assumptions as to what a possible grammar can be: there are certain possibilities for
rules that don't even occur to us. We therefore have, if only implicitly, a theory of
possible grammars.
It is commonplace to view the task of a linguist, in discovering a descriptively
adequate grammar of a language, as being identical to the task of a child, who is
trying to discover the descriptively adequate (adult) grammar of the language of her
or his community. Because linguists are trying to model the abilities of native
speakers, one of their goals is to try to formulate this theory, called a theory of
universal grammar, as well as the grammars of particular languages. We would
therefore say that the relation of the theory of grammar to grammars of particular
languages could be described as follows:
(10) Theory of Grammar (Universal Grammar) = {G1, ..., Gn}
In other words, a theory of grammar is a specification of the possible grammars,
which we're calling G1 through Gn (instead of French, English, Ewe, Chinese, etc.).
Now, a theory of grammar that is the correct account of what a possible grammar of
a natural language is should predict only the set of actual possible grammars, and
should not predict that some grammar is the grammar of a natural language that is
never realized in fact. In other words, a theory of grammar should not over-predict.

There is a distinction between actual grammars of human languages and
possible grammars of human languages. As Chomsky & Halle put it in the
preface to their classic work in phonology, The Sound Pattern of English (Chomsky
& Halle (1968)), If a nuclear explosion were to wipe out everybody on earth except
for the inhabitants of Tanzania, we would not want to say that pitch is a linguistic
universal (the assumption being that the language spoken in Tanzania is what is
known as a pitch-accent, or tone, language). External circumstances would cause
only one language to be spoken in the world, the speakers of all of the others having
been wiped out by nuclear extinction, but the speakers of that language would have
the capacity to learn other languages that are not pitch-accent languages.
Remember, in constructing an account of what a possible human language is,
we are modeling actual human capacities, and the assumption is that humans will
only consider a certain set of grammars as possible grammars of human languages.
Explanatory adequacy, therefore, can be defined as a requirement that a theory of
grammar only allow for the possible grammars of human languages.
Explanatory adequacy: the ability of a theory of grammar to predict only the set
of possible grammars of human languages.
Summary:
The goal of the study of syntax is two-fold:
(i) characterization of what it means to know a particular language (i.e.,
    formulation of descriptively adequate grammars of specific languages);
(ii) determination of the range of possible grammars of human languages.
You can't do one in the absence of the other. Rather, syntacticians are
always working heuristically (i.e., back-and-forth) between the two
accounts, universal grammar and particular grammar.

Lecture #2 - Parts of Speech


We can think of our task, in characterizing the syntax of
English, as the task of answering the following question:
(1) S = ?
with S being the set of English sentences. A more usual way of
expressing the question is by using an arrow, so that (1) would be
expressed as (2):
(2) S --> ?
the difference being that the arrow expresses the idea that sentences are
structured, so that the arrow means roughly "consists of".
As we saw last time, knowing a language must be more than
knowing a fixed number of sentences, since we have what is known as
linguistic creativity: the ability to create and recognize new forms all of
the time. We must therefore have some rules to fit the new forms into
patterns that we already recognize. The Jabberwocky example last time
illustrated this point. While the particular words were new, we were
able to recognize the stringing together (technically known as
concatenation) of them, because we were able to fit the concatenation
into a pattern, and the environment of each word enabled us to
recognize it as part of a word-class.
So, we can say that part of what allows us to see particular
sequences of words as sentences, and not other sequences, is that we
state the rules for stringing together words not in terms of particular
words, but word-classes, known traditionally as parts of speech, or
grammatical categories.
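To see why it matters that rules mention word-classes rather than particular words, consider a small sketch. The pattern RULE, the lexicon, and the function name sentences are all assumptions made for the illustration; the pattern is not proposed as a rule of English.

```python
from itertools import product

# A single pattern stated over word-classes, not over particular words.
RULE = ["Det", "Adj", "N", "V"]

def sentences(lexicon):
    """Enumerate every string the category pattern licenses for a lexicon."""
    slots = [sorted(w for w, c in lexicon.items() if c == cat) for cat in RULE]
    return [" ".join(choice) for choice in product(*slots)]

lexicon = {"the": "Det", "happy": "Adj", "teacher": "N", "left": "V"}
print(len(sentences(lexicon)))   # 1

# Two new words enter the language; the rule itself is untouched.
lexicon["tove"] = "N"
lexicon["blithy"] = "Adj"
print(len(sentences(lexicon)))   # 4
print("the blithy tove left" in sentences(lexicon))   # True
```

Had the rule been stated over particular words, each new word would have required new rules. Stated over categories, the same rule covers the new words as soon as they are assigned to a class.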
As I mentioned briefly last time, there are two views of the
basis of grammatical categories: the notional and the distributional.
The notional view, which comes from traditional grammar, holds that
each grammatical category is distinguished from the others by a
particular meaning, so that each grammatical category is associated
with a unique meaning. This is the basis for the view that a noun is the
name of a person, place, or thing, that a verb denotes an action, etc.
If we consider this view, we quickly find that there are nouns that
do not fit this criterion. For example, what about nouns such as
"destruction" or "attempt" (known as nominalizations, in that they are nouns
that are thought to be derived morphologically from verbs)?
(1) Rome's destruction of Carthage.
(2) John's attempt to frame Susan.

Also interesting in this connection is a grammatical construction that
was dubbed by the late Danish grammarian Otto Jespersen the light verb
construction. This construction is exemplified by a sentence-type that has the
appearance of containing a transitive verb; however, the verb does not seem to
carry any meaning, and the main predicate of the sentence is carried by the
noun. English shows it in the "make the claim" construction. Note that (3) and
(4) are synonymous.
(3) John made the claim that he was descended from Thomas Jefferson.
(4) John claimed that he was descended from Thomas Jefferson.
Japanese shows it as well (Grimshaw & Mester (1988)). An example is (5)
(their (2)(a)):
(5) John-wa   Bill-to    AISEKI-o           shita.
    John-Top  Bill-with  table-sharing-Acc  suru-Past
    'John shared a table with Bill.'
AISEKI (table-sharing) is a noun, and has the characteristics of
nouns. For example, it takes a particle known as a Case-marking particle, Case
being a flag, roughly, of the grammatical or semantic function of a noun.
Semantically, however, AISEKI (capitalized here simply for typographical
emphasis) functions as the main predicate in the sentence, as can be seen from
the translation, and shita, a form of the element suru, has all of the
characteristics of a verb in Japanese; for example, it is inflected for
Tense.
If we assume that parts of speech are defined in terms of meaning, we
assume, as a null hypothesis, that this form-meaning correspondence is
universal (i.e., holding across all languages). The reason is that it is
hard to see how language-particular principles of form-meaning correspondence,
which would have to be very abstract, could readily be learned by children
who are acquiring their first language. Therefore, we would assume that
Japanese and English would have the same principles relating parts of speech to
meaning.
The light verb construction is interesting because it highlights a mismatch
between form and meaning. The object noun is the main predicate in the
sentence, and the verb, which is usually thought to be the main predicate
(usually described as denoting action; more on this below), does not seem to
have any semantic content at all; rather, it seems to function to carry Tense and
other information which is exclusively associated with verbs, and which every

sentence needs. The verb is called a light verb because it is
semantically light, not carrying much, if any, meaning.
When we look at verbs, as I mentioned above, verbs are often said to
denote actions. However, what do we then make of the underlined elements
here, all of which are thought to be verbs, and none of which seem to describe
actions by the subject?
(5)
a. Germany endured a crushing defeat.
b. Jones underwent surgery.
c. Bill suffered a fatal blow to the head.
The underlined elements above all have an understood beginning
and end, a characteristic of actions, but they do not denote the volition, or free
will, on the part of the subject that is characteristic of actions. For example, the
subject of run chooses to run, while the subject of each of the above elements
doesn't choose to perform the action that the element denotes. You don't
normally choose to endure something, or suffer something, and although you can
choose to undergo surgery (if it's elective), you can also undergo surgery that is
totally involuntary.
In fact, the philosopher Zeno Vendler came up with a classification for
verbs in 1967 (Zeno Vendler, Linguistics in Philosophy, Cornell University
Press). He classified verbs into accomplishments, achievements, activities, and
states. Accomplishments are actions that have a definite result, which the
subject intends to bring about. An example is (6):
(6)
John built a house.
Achievements are events that have a definite result, but the subject does not
intend to bring this about, such as dying or being born:
(7) John died. (a rather dubious achievement, but an achievement
nevertheless!)
Activities are actions that do not necessarily have a result, so that
the notion of success is not an integral part of the felicitous use of the verb. An
example is walking:
(8)
John walked.
States are timeless, and, unlike the other three semantic types, don't have a
beginning and end. Examples:
(9)
a. John knows French.
b. John understands this point.
For our purposes, we see a real heterogeneity in Vendler's classification,
such that it is difficult if not impossible to pick out a single aspect of meaning
that unites all four types of verbs.

If we drop the idea that we classify words into parts of speech based on
meaning, we are left with a distributional basis for parts of speech, and this was
the idea of the structuralists, who essentially founded American descriptive
linguistics (Leonard Bloomfield (1933), Language, Holt, Rinehart, & Winston;
Zellig Harris (1951), Structural Linguistics, University of Chicago Press). In this
view, elements are classified into word classes on the basis of the environments
in which they can appear in sentences. To illustrate, consider the class of
elements that can appear in the slots in (10), (11), and (12). Think of ten
elements that can appear in (10), and ten elements that cannot.
(10)
_____interest me.
(11)
I talked about ___.
(12)
John likes ___.
Restricting our attention to single words, ten elements that can appear in the
underlined slot in (10) are: ideas, people, cards, sheep, pencils, lectures, classes,
dogs, journals, computers. Ten elements that cannot appear there are: at, laugh,
from, angry, yellow, grow, because, incidentally, never, not. We set up classes, then,
of elements that can appear in a large number of identical environments, and
which are said (to introduce a somewhat technical term) to be mutually
substitutable. As a matter of historical accident, we call these classes nouns,
verbs, adjectives, etc., but they could have been called anything. In any event,
a grammatical category is defined as follows:
(13)
A grammatical category = a class of elements whose members
are mutually substitutable (i.e., interchangeable without any
diminution of acceptability of the resulting string) in a
sufficiently wide range of environments.
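The definition in (13) can be made concrete with a small sketch. The fragment below is only an illustration under stated assumptions: the fillers for frames (10)-(12) are taken from the list of nouns discussed above, but the last two frames, their fillers, the threshold min_shared (standing in for "sufficiently wide"), and all function names are hypothetical.

```python
# Toy acceptability judgments: each frame maps to the single words judged
# acceptable in its slot. Frames (10)-(12) come from the lecture; the last
# two frames and their fillers are illustrative assumptions.
JUDGMENTS = {
    "___ interest me.": {"ideas", "people", "dogs"},
    "I talked about ___.": {"ideas", "people", "dogs"},
    "John likes ___.": {"ideas", "people", "dogs"},
    "People ___.": {"laugh", "grow"},
    "People will ___.": {"laugh", "grow"},
}


def environments(word):
    """Return the set of frames in which `word` is judged acceptable."""
    return {frame for frame, fillers in JUDGMENTS.items() if word in fillers}


def mutually_substitutable(w1, w2, min_shared=2):
    """Definition (13), with 'sufficiently wide' modeled as a threshold."""
    return len(environments(w1) & environments(w2)) >= min_shared


def categories(words, min_shared=2):
    """Greedily group words that are substitutable with every class member."""
    classes = []
    for w in words:
        for cls in classes:
            if all(mutually_substitutable(w, o, min_shared) for o in cls):
                cls.append(w)
                break
        else:
            # No existing class accepts the word, so it starts a new one.
            classes.append([w])
    return classes


print(categories(["ideas", "laugh", "people", "grow", "dogs"]))
```

Nothing in the sketch mentions meaning: the two classes it finds fall out of the environments alone, and the labels we attach to them (noun, verb) are arbitrary.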
You will recall that we are trying to model as linguists the linguistic abilities
of fluent native speakers of the languages that we are describing, and it is useful
in this connection to return to the Jabberwocky example that I discussed in the
first lecture. You didn't know the meanings of the words there, by design, since
they were nonsense words, and yet you were able to understand the string The
slithy toves did gyre and gimble by virtue of the environments in which the words
appeared, and were able to assign the words to grammatical categories (known
as parts of speech). The only basis for this assignment had to be by applying the
definition of grammatical category given in (13), since there was no other.
The definition in (13) has a caveat, however, namely the phrase
sufficiently wide range of environments. This is the point at which science
becomes art, in that we really have no way of determining in advance which set
of environments is the right one to pick out the correct set of grammatical
categories. We will now consider the pitfalls in two polar extremes in applying
the mutual substitutability criterion: requiring mutual substitutability in all
environments, and requiring mutual substitutability in only one environment.
Let us first look at the consequences of requiring mutual substitutability in
all environments, so that we are considering definition (14), which we will call
Straw Man I, as a replacement for (13):
(14)
Straw Man I:
Grammatical category= a class of elements whose members
are mutually substitutable in all environments.
Let us now consider the following words:
(15) like, put, elapse, dash, grow (meaning become), become, persuade.
These words all have somewhat different distributions, and do not
occur in all of the same environments. The word like, for example (in the sense
of being fond of), only occurs before a noun:
(15)
a. John likes pizza.
b. * John likes.
The word put only occurs before a noun and a preposition:
(16)
a. John put books on tables.
b. *John put books.
c. *John put.
The word elapse cannot occur directly before a noun, and may occur
finally in the sentence.
(17) a. Time elapsed.
b. *Time elapsed the day.
The word grow can occur before an adjective, but not a noun:
(17)
a. John grew despondent.
b. *John grew a lawyer.
The word become can occur before an adjective, or before a noun:
(18)
a. John became despondent.
b. John became a lawyer.
The word persuade occurs before a noun, and, optionally, a sentence.
(19)
a. John persuaded Sally.
b. John persuaded Sally that Clinton should be impeached.
The word dash requires a following P, as in (20):
(20)
a. He dashed into the room.
b. *He dashed.
The point is that all of these words differ from one another in their distributions;
none of them can be said to be mutually substitutable in all environments. Can
they therefore be members of the same grammatical category if we adopt Straw
Man I, which would require that members of the same grammatical category be
mutually substitutable in all environments?
Obviously not, so we'd have to set up seven totally distinct parts of speech
(call them Thelma, Louise, Bob, Carol, Ted, Alice, Mortimer), so that we would
have the following assignment:
(21) a. like is of category Thelma.
     b. put is of category Louise.
     c. elapse is of category Bob.
     d. grow is of category Carol.
     e. become is of category Ted.
     f. persuade is of category Alice.
     g. dash is of category Mortimer.
What's wrong with having all of these words as separate parts of
speech?
Recall that in our last lecture, we talked about three criteria by which we
could evaluate proposed grammars: observational adequacy, descriptive
adequacy, and explanatory adequacy.
Descriptive adequacy involves a grammar not only generating the right
forms, but doing it in such a way as to capture the regularities that speakers
perceive in their language.
While the seven words above have differences in their distributions, they
also have similarities. For example, they all agree with the preceding nouns:
(22) a. John likes pizza.
     b. People like pizza.
(23) a. John puts books on tables.
     b. People put books on tables.
(24) a. John grows despondent.
     b. People grow despondent.
(25) a. John becomes despondent.
     b. People become despondent.
(26) a. John persuades us that Clinton will be impeached.
     b. People persuade us that Clinton will be impeached.
(27) a. John dashes into rooms.
     b. People dash into rooms.
The (a) forms agree with the singular noun John, while the (b) forms agree
with the (irregular) plural form people.
A second similarity that all of the words above share can be stated as
follows:
(28) Every English declarative sentence must contain an element which is
capable of agreeing with a noun that is at or near the beginning of the
sentence.
When I say is capable of agreeing, I mean to say that the form does not
always have to show agreement. For example, there is a class of elements that
we will discuss soon, known as helping verbs, such as can, could, shall, should,
may, might, have, and be, and when these forms occur, the words above do not
agree. Example:
(29) a. John would become despondent.
b. People would become despondent.
However, in the absence of helping verbs, the words above do show the
normal agreement pattern.
There are, then, two similarities that are shared by the seven words given in
(21). Suppose we were to specify our syntax of English as in (30):
(30) S --> N { Thelma N    }
             { Louise N P  }
             { Bob         }
             { Carol A     }
             { Ted { A }   }
             {     { N }   }
             { Alice N (S) }
             { Mortimer P  }
A word about notation. The symbols { and } are known as curly
brackets or braces. Their use signifies that one must choose one of the rows that
they enclose. Parentheses, ( and ), in grammatical descriptions indicate that
the element is optional: it can be present but need not be.
It is clear that the use of the curly brackets in (30) to describe the
structure of English sentences misses the fact that all of the elements have
something in common, and we would have to use the same set of elements
within curly brackets to formulate agreement in English, along the following
lines:
(30) The first noun in the sentence agrees with the first instance of:
     { Thelma   }
     { Louise   }
     { Bob      }
     { Carol    }
     { Ted      }
     { Alice    }
     { Mortimer }


Because we linguists are trying to model the knowledge and abilities of
native speakers, our descriptions are really models of what goes on in
language users' minds. In other words, the notation of our grammar
makes claims, and the curly brackets notation essentially makes the
claim that the elements within the curly brackets are just an unrelated set
of elements. We are not reflecting the fact that the words have an intrinsic
relationship to one another, a fact that gains prominence when we see that
we would have to use the curly brackets with the same set of elements at
more than one place in the grammar.
Therefore, we would say that a grammar that puts these words
into totally distinct classes fails on the grounds of descriptive adequacy,
because it doesn't capture a linguistically significant generalization: in this
case, that all of these elements are verbs.
We cannot require mutual substitutability in all environments,
then, because to do so would be to have an extremely large number of
grammatical categories, and we would have to formulate the processes of
grammar in terms of large disjunctions of categories that would keep reappearing, leading to the question of why the same list of elements
keeps appearing in curly brackets. We would say, then, that a given
grammatical category does not have all of its members appearing in
exactly the same set of environments, but in enough of the same
environments to warrant putting them in the same class.
However, we must have some way of capturing the differences in
distribution that these members show.
A useful way to do so is to make a distinction in a grammar
between the syntactic rules, or rules of formation, and the lexicon, or
dictionary. To do so, let us set up our first syntactic rule as follows:
(31) S --> N V (N) ( { A } )
                   ( { P } )
                   ( { S } )
As I mentioned earlier, the arrow --> means "consists of", and
furthermore, the parentheses, ( and ), and braces, { and }, are
abbreviations for optionality (parentheses) and choice or disjunction
(braces). Therefore, the rule (31) is really an abbreviation for 8 different
rules, which are unpacked in (32):
(32) a. S --> N V
     b. S --> N V N
     c. S --> N V N A
     d. S --> N V N P
     e. S --> N V N S
     f. S --> N V S
     g. S --> N V P
     h. S --> N V A
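The unpacking of a rule schema into the individual rules it abbreviates can be done mechanically. The sketch below is a minimal illustration, assuming a hypothetical encoding in which each position of the schema is a list of alternatives and None stands for leaving an optional position empty; the names SCHEMA_31 and expand are not part of the lecture's notation.

```python
from itertools import product

# Hypothetical encoding of schema (31): one list of alternatives per
# position. None means "omit this position" (the effect of parentheses);
# several entries in one list mean "choose one" (the effect of braces).
SCHEMA_31 = [
    ["N"],                   # obligatory N
    ["V"],                   # obligatory V
    ["N", None],             # (N)
    ["A", "P", "S", None],   # ( {A} / {P} / {S} )
]


def expand(schema):
    """Return every right-hand side that the schema abbreviates."""
    rules = []
    for choice in product(*schema):
        rules.append(" ".join(sym for sym in choice if sym is not None))
    return rules


for rhs in expand(SCHEMA_31):
    print("S -->", rhs)
```

The two optional positions contribute 2 x 4 = 8 combinations, which is why (31) abbreviates exactly the eight rules in (32).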
The technical term for a rule such as (31), which uses abbreviatory
conventions to collapse a number of rules, is rule schema, an abstraction
over more than one rule which has the appearance of being a single rule.
Furthermore, application of one of the rules in (32) will create a structured
representation, so that application of, for example, (32)(b), will create a
representation as in (33):
(33) S
     N V N
And we could show that the second line was formed from the first by
drawing lines from the first symbol to the elements that are introduced in the
second line, as in (34):
(34)    S
      / | \
     N  V  N
Rules as in (32a-h) are known as phrase-structure rules, because they show
how phrases are structured. We can say that the phrase-structure rules generate,
or create, structured representations of sentences, which are known as
phrase-markers.
Phrase-structure rules have at most one symbol to the left of the arrow, a
necessary restriction, as we will see. The arrow is said to be an instruction to
rewrite the symbol to its left as a sequence of symbols on the right, and the
symbols on the right of the arrow are called the expansion of the symbol on
the left.
However, notice that (34) does not contain any words, and we still do not
have any device for registering the fact that not all verbs can, for example,
occur in the phrase-marker in (34).
Suppose that we extend our definition of grammatical categories, based
on mutual substitutability in a sufficiently wide range of environments,
and say that a grammatical category can be composed of different
subcategories, categories that are members of some larger category but
which have some further characteristic in common.

We now say that phrase-structure rules or, more precisely,
phrase-structure rule schemata, only take us as far as generating the
phrase-markers up to, but not including, the point at which the words are
inserted into the phrase-markers. The words are said to be drawn from
the lexicon, or dictionary. A lexicon contains a list of lexical items (words
in the dictionaries with which we are all familiar), each of which has
a lexical entry, or information about that lexical item which is not
predictable by more general rule.
Recall the words in (15), repeated here:
(15) like, put, elapse, dash, grow (meaning become), become, persuade.
So, a sample lexicon, to take account of the subcategory
membership that each of the seven words given in (15) would have, would
be as in (35):
(35) like, V, +[____N]
elapse, V, +[___#]
put, V, +[___N P]
dash, V, +[___P]
grow, V, +[___A]
become, V, +[____ { N } ]
                  { A }
persuade, V, +[____ N (S)]
The part of the lexical entry that occurs after the italicized lexical
items in (35) which begins with the + sign is called the subcategorization
frame.
We can think of the subcategorization frame, which shows the
subcategory membership of the particular lexical item, as the
information that encodes the restriction as to the particular environment
in which the lexical item may be inserted into the phrase-marker.
Hence, if we add (36) to our lexicon, we can generate a sentence such
as (37).
(36) people, N
pizza, N
(37) People like pizza.
The sentence would be generated as follows. First, our
phrase-structure rule would generate (34), after which we would insert the two
nouns, and the verb, into the phrase-marker. On the other hand, we
could not generate (38) because elapse's subcategorization frame would
only allow it to be inserted into a V position that was final in the
phrase-marker.
(38) * People elapse pizza.
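The way a subcategorization frame restricts lexical insertion can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary encoding of the lexicon, the entry for dash (inferred from the statement that dash requires a following P), and the function generates are hypothetical, and the check assumes the flat sentence shape N V ... of schema (31).

```python
# Hypothetical encoding of the lexicon in (35)-(36): each word maps to its
# category and, for verbs, the list of category sequences that may follow
# it. An empty sequence encodes +[___#], i.e. nothing may follow.
LEXICON = {
    "like":     ("V", [["N"]]),
    "elapse":   ("V", [[]]),
    "put":      ("V", [["N", "P"]]),
    "dash":     ("V", [["P"]]),             # assumed entry, not listed in (35)
    "grow":     ("V", [["A"]]),
    "become":   ("V", [["N"], ["A"]]),
    "persuade": ("V", [["N"], ["N", "S"]]),  # +[___ N (S)]
    "people":   ("N", None),
    "pizza":    ("N", None),
}


def generates(sentence):
    """True if the words fit N V ... and the verb's frame is satisfied."""
    words = sentence.lower().split()
    cats = [LEXICON[w][0] for w in words]
    if len(cats) < 2 or cats[0] != "N" or cats[1] != "V":
        return False
    frames = LEXICON[words[1]][1]
    # The categories after the verb must match one of its frames exactly.
    return cats[2:] in frames


print(generates("People like pizza"))    # sentence (37)
print(generates("People elapse pizza"))  # sentence (38)
```

The phrase-structure rules are untouched by this check; all of the verb-by-verb variation lives in the lexical entries.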
Lecture 3- Phrase-Structure

Let us consider the definition of grammatical category that was
given in the last lecture as (13), repeated here:
(13) A grammatical category = a class of elements whose
members are mutually substitutable (i.e., interchangeable without
any diminution of acceptability of the resulting string) in a
sufficiently wide range of environments.
Last time, we considered single words that are interchangeable in many
of the same places, placing them into a single category. However, the
same definition also shows us that sequences of words are mutually
substitutable for single words.
To see this, let us again consider the set of environments that we
used to test for noun-hood in the last lecture, (10-12), repeated here:
(10)_____interest me.
(11) I talked about ___.
(12) John likes ___.
Last time, we saw that words can be divided up into classes, so that a
certain class of words can fill in the blanks in (10-12), and we call that class the
class of nouns. However, just as single words can be substituted for other single
words in many environments, justifying grouping them into a class, sequences
of words can be substituted for single words. By parity of reasoning, we
would therefore say that such sequences are members of the same classes as the
single words.
For example, if we look at the environments in (10-12), we see that all of
the elements in (1) are mutually substitutable:
(1) a. books
b. big books
c. the books
d. the big books
e. the big books about Nixon
We therefore must allow these sequences of words to form a single class,
and we would say that, e.g., the big books about Nixon is of the same grammatical
category as books, i.e. a noun. However, we must revise our phrase-structure
schema in (31) of Lecture #2 in order to allow for nouns to be rewritten as
sequences of words. It would seem, then, that we must add (2) to our
phrase-structure component:
(2) N --> (Art) (Adj) N (P N)
in which Art stands for the category Article (including the, a, many,
some, few, etc.).
So, let us now generate sentence (3):
(3) The big books about Nixon interest me.
We start with the symbol S (for sentence), and we apply one of the
phrase-structure rules that is included in the phrase-structure rule schema, to yield
the following sequence:
(4)    S
     / | \
    N  V  N
We have two options now for the symbol N. We could simply go to the
lexicon and insert two N's and a V, or we could now apply our new
phrase-structure rule given in (2) to the first N, to yield the sequence
Art Adj N P N, so that we would have the following sequence:
(5)                                S
                   ________________|_________________
                   |                       |        |
                   N                       V        N
     ______________|______________
     |      |      |      |      |
    Art    Adj     N      P      N
We now go to our lexicon, which includes the following:
the, Art
big, Adj
books, N
about, P
Nixon, N
interest, V, +[___ N]
me, N
Let us now introduce some terminology. We would say that the grammar
generates, or creates, a set of sentences by allowing a set of derivations of those
sentences. A derivation is a sequence of representations such that each
representation, except for the initial representation, is formed from the
preceding representation by a rule of grammar. One symbol is designated as
the initial symbol of the grammar, and every derivation must therefore begin
with this symbol. In this case, the designated initial symbol is S, and so every
derivation must begin with S.
A symbol which appears to the left of an arrow in a phrase-structure rule is
said to be a non-terminal symbol, and a symbol which does not appear to the
left of any arrow is said to be a pre-terminal symbol. Lexical items, which are
introduced by the lexicon, are said to be terminal symbols.
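The notions of derivation and of non-terminal versus pre-terminal symbol can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding in which a representation is a list of symbols and only two of the rules abbreviated by the schemata are listed; lexical insertion is left out, and the names licensed and is_derivation are illustrative.

```python
# Two of the rules abbreviated by the schemata for S and N, encoded as
# (left-hand symbol, expansion) pairs. This selection is an assumption
# made to keep the example short.
RULES = [
    ("S", ["N", "V", "N"]),
    ("N", ["Art", "Adj", "N", "P", "N"]),
]
INITIAL = "S"


def licensed(prev, nxt):
    """True if `nxt` results from rewriting one symbol of `prev` by a rule."""
    for i, sym in enumerate(prev):
        for lhs, rhs in RULES:
            if sym == lhs and prev[:i] + rhs + prev[i + 1:] == nxt:
                return True
    return False


def is_derivation(steps):
    """A derivation starts with the initial symbol; each step is licensed."""
    return steps[0] == [INITIAL] and all(
        licensed(a, b) for a, b in zip(steps, steps[1:])
    )


# Symbols to the left of some arrow are non-terminal; the remaining
# category symbols are pre-terminal.
non_terminals = {lhs for lhs, _ in RULES}
pre_terminals = {s for _, rhs in RULES for s in rhs} - non_terminals

steps = [
    ["S"],
    ["N", "V", "N"],
    ["Art", "Adj", "N", "P", "N", "V", "N"],
]
print(is_derivation(steps), sorted(non_terminals), sorted(pre_terminals))
```

Note that N counts as non-terminal under this definition even though words can also be inserted directly under it.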
Therefore, the grammar so far, with the phrase-structure rules in (6) and
the lexicon above, will generate the phrase-marker in (7):

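(6) a. S --> N V (N) ( { A } )
                     ( { P } )
                     ( { S } )
    b. N --> (Art) (Adj) N (P N)
(7)                                S
                   ________________|_________________
                   |                       |        |
                   N                       V        N
     ______________|______________         |        |
     |      |      |      |      |         |        |
    Art    Adj     N      P      N         |        |
     |      |      |      |      |         |        |
    The    big   books  about  Nixon    interest   me
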
The problem with the grammar in (6), however, is that, while it will generate
grammatical sentences such as (3), it will also generate many sentences that we
will want to say are ungrammatical. In this sense, the grammar is said to be too
powerful, in that it does more than we want it to be able to do.
We can see this by considering the symbol N. The phrase-structure rules
operate in a top-down fashion, beginning with the designated initial symbol S.
Therefore, whenever we reach the symbol N, we can, by the phrase-structure
schema expanding N in (6), take any of the options for expanding N that the
schema permits. For instance, in the sequence Art Adj N P N V N, the first
N could itself be expanded as Art N, generating such strings as
*The big the books about Nixon interest me.
The grammar in (6) exhibits a property that is known in mathematics as
recursion, the ability of a device to re-apply to its own output an infinite number
of times. The recursion in the phrase-structure component of the grammar
results from the same symbol that appears to the left of the arrow appearing in
an expansion of that symbol.
Exercise: Generate five ungrammatical sentences using the recursive
power of N.

A way to solve this problem was suggested by Zellig Harris in his book,
Structural Linguistics ((1951), University of Chicago Press). Harris suggested
assigning integers to the different occurrences of N, so that the symbol N that
appears to the left of the arrow would be notated as N1, and the symbol N
that is the simple instance of N would be notated as N0. Hence, the
phrase-structure rule that expands N would be formulated as in (8):
(8) N1 --> (Art) (Adj) N0 (P N1)
and the rule that expands S would be reformulated as in (9):

(9) S --> N1 V (N1) ( { A    } )
                    ( { P N1 } )
                    ( { S    } )
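The effect of indexing the occurrences of N can be checked by enumerating what each version of the noun rule derives. The sketch below is an illustration only: the helper names, the tuple encoding of category strings, and the depth bounds are assumptions, and the enumeration stops after a fixed number of rewriting steps rather than exploring the full (infinite) set.

```python
from itertools import product


def expansions(head, comp):
    """All expansions abbreviated by the schema (Art) (Adj) head (P comp)."""
    out = []
    for art, adj, pp in product([True, False], repeat=3):
        seq = (["Art"] if art else []) + (["Adj"] if adj else []) + [head]
        if pp:
            seq += ["P", comp]
        out.append(seq)
    return out


def derivable(start, rules, depth):
    """Category strings reachable from `start` in at most `depth` steps."""
    seen = {(start,)}
    frontier = {(start,)}
    for _ in range(depth):
        new = set()
        for string in frontier:
            for i, sym in enumerate(string):
                for rhs in rules.get(sym, []):
                    new.add(string[:i] + tuple(rhs) + string[i + 1:])
        frontier = new - seen
        seen |= frontier
    return seen


PLAIN = {"N": expansions("N", "N")}       # the unindexed noun rule
INDEXED = {"N1": expansions("N0", "N1")}  # the indexed noun rule, as in (8)

# The category string behind *the big the books about Nixon.
bad_plain = ("Art", "Adj", "Art", "N", "P", "N")
bad_indexed = ("Art", "Adj", "Art", "N0", "P", "N0")

print(bad_plain in derivable("N", PLAIN, 2))
print(bad_indexed in derivable("N1", INDEXED, 3))
```

In the indexed grammar the only recursive symbol is the N1 after P, so an article can never follow an adjective, while the recursion that yields books about books about Nixon is preserved.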
I should emphasize that while the recursion is unwanted in this instance,
it is not always unwanted. Indeed, if we look at the phrase-structure rule
schema in (9), it still contains some recursion.
Question: Where is the recursion in (9)? Can you think of an instance in
which it correctly describes an aspect of English syntax?
Returning to Harris's superscript notation, we note that it does two things:
(i) it reflects the similarity between N0 and N1 by giving them the same category
label (N); (ii) it differentiates them by the superscripted integer. We would
interpret the superscript notation by saying that the category with the higher
integer is a projection of the category with the lower integer.
This is another way of saying that there are simple nouns (N0) and noun
phrases (N1). Are there higher-level projections of other categories?
It seems that there are. We will now look at the evidence for phrasal
projections of V, A, and P. Let us take these in turn, utilizing our definition of
a grammatical category as a class of elements whose members are mutually
substitutable in a sufficiently wide range of environments.
A. The Category V
Let us consider the environment in (10):
(10) John___.
We can substitute the following for one another in that
environment:
(11) laughs, plays the harmonica, puts his coat on the rack, feels angry.
So again, we see that we can substitute sequences of words for single
words (i.e. plays the harmonica for laughs). However, we need to test for
substitutability in a number of different environments; a single
environment will not do. Otherwise, the underlined elements in (12)
would be members of the same grammatical category.
(12)a.They became angry.
b. They became lawyers.
Were we to place angry and lawyers into the same grammatical
category, we would then have to complicate our grammar by
having extremely complicated subcategorization frames, in order to
account for why, e.g. angry, does not occur in so many of the
environments in which lawyers appears, and vice versa. It seems
that, for become, we need to have a disjunctive subcategorization
frame as in (13):
(12) become, V, +[____ { N1 } ]
                       { A  }
We will return to this subcategorization frame in the next section,
when we see the need for an A1.
With respect to verbal units that are larger than simple verbs, however,
we can find environments that take verbs as well as such larger units.
Specifically, consider subordinate clauses introduced by the word
though, as in (13):
(13) Though he may cry, it won't matter.
The verb may also appear before the word though, as in (14):
(14) Cry though he may, it won't matter.
We now have another environment for verbs, other than the normal
environment at or near the end of the sentence. Notice that when the
verb appears before though, the position of the verb after the subject is
not occupied by the verb.
The position before though can be occupied by sequences that
include verbs plus additional material, and when the sequence precedes though,
none of the material can appear after the subject:
(15) a. Play the harmonica though he may, it won't matter.
     b. Put his coat on the rack though he may, it won't matter.
     c. Feel angry though he may, it won't matter.
Another environment occurs in coordinate sentences, as in (16):
(16)a.He said he would laugh, and he did laugh.
b. He said he would play the harmonica, and he did play the
harmonica.
c. He said he would put his coat on the rack, and he did put his
coat on the rack.
d. He said he would feel angry, and he did feel angry.
The verb can appear before the subject in the second conjunct:
(17) He said he would laugh, and laugh he did.
However, the sequences that consist of the verb plus additional
material also appear before the subject in the second conjunct:
(18) a. He said he would play the harmonica, and play the harmonica
he did.
b. He said he would put his coat on the rack, and put his coat on
the rack he did.

c. He said he would feel angry, and feel angry he did.
Hence, we find the sequences play the harmonica, put his coat on the rack, and feel
angry, as members of the same grammatical category as cry or laugh, and hence
we would be justified in calling them verb phrases. Hence, we would revise
our rule (9) further, repeated here, as (19):
(9)  S --> N1 V (N1) ( { A    } )
                     ( { P N1 } )
                     ( { S    } )

(19) S --> N1 V
     V --> V (N1) ( { A    } )
                  ( { P N1 } )
                  ( { S    } )
Hence, we have a major division of the sentence into two parts,
consisting of a noun phrase and a verb phrase, so that a sentence
such as (20) would have the phrase-marker in (21):
(20) The man read the book.
(21)            S
        ________|_________
        |                |
        N1               V
     ___|___        _____|______
     |     |        |          |
    Art    N0       V          N1
     |     |        |       ___|___
     |     |        |       |     |
     |     |        |      Art    N0
     |     |        |       |     |
    The   man      read    the   book
At this point, a question arises. We established that it was necessary to
distinguish levels of projection for nouns, via the superscript notation. Is it
necessary to distinguish levels of projection in the same way for other categories,
such as verbs?
Recall that we motivated the device of assigning integers to grammatical
categories as a method of preventing unwanted recursion. If we look at the
rule for expanding Vs in (19), it would permit a potentially infinite sequence of
verbs, as in, for example, (22):
(22)              V
                /   \
               V     N1
             /   \
            V     N1
          /   \
         V     N1
       /   \
      V     N1
We can get around this problem by simply using the superscript notation for Vs
as well, so that the phrase-structure component in (19) would be revised to
include the rules in (23):
(23) S --> N1 V1
     V1 --> V0 (N1) ( { P N1 } )
                    ( { A    } )
                    ( { S    } )
B. The Category A1
Let us look at some environments for simple adjectives. We have
seen one, following the verbs become and grow. In addition to this
environment, we can find adjectives occurring after the verb consider
followed by a noun phrase:
(24)
I consider the man crazy.
We can also substitute adjectives that are modified in that environment:
(25)
a. I consider him fond of chocolate.
b. I consider him partial to vanilla.
Furthermore, just as we can question simple adjectives, in which they appear at
the front of the sentence (we will return to question formation later), modified
adjectives can occur in that position:
(26)a. How angry are you?
b. How fond of Sally are you?
c. How partial to vanilla are you?
Hence, we can justify a phrasal projection of A as well.
C. The Category P1
In a sense, the category P1 is the easiest category to motivate, since
prepositions usually occur with following noun phrases, as in (26):
(26) John ran to Mary.
However, Joseph Emonds has argued (in Evidence that Indirect Object
Movement is a Structure-Preserving Transformation, Foundations of Language
(1972)) that, just as certain verbs, such as elapse, are intransitive (i.e., don't
take objects), or are optionally intransitive, such as eat, there are intransitive
prepositions as well. For example, consider the verb put, which requires a
locative prepositional phrase (see the subcategorization frame for put in (35) of
Lecture #2). Hence, (27)(a) is unacceptable:
(27)(a) *John put the book.
(b) John put the book on the table.
However, certain single words can satisfy put's requirement of
having an element after the object:
(27)
John put the book on.
Aside from Emonds's view of words such as on when they do not occur with a
following noun phrase, there has been another view, that of Bruce Fraser, who
analyzed such words in Fraser (1965) (An Examination of the Verb-Particle
Construction, MIT Doctoral Dissertation) as particles. Hence, Fraser posited the
category Prt.
Let us consider the two views more closely, and formalize them in our
phrase-structure grammar. Emonds posited a set of phrase-structure rules that
included (28):
(28) a. V1 --> V0 (N1) (P1)
     b. P1 --> P0 (N1)
Fraser's view can be described as follows:
(29) a. V1 --> V0 (N1) ( { P1  } )
                       ( { Prt } )
     b. P1 --> P0 N1
Let us compare the two views. Morris Halle, in a 1962 paper
entitled Phonology in Generative Grammar (published in the
journal Word), proposed what he called a simplicity metric for
comparing two grammars. Simplicity was measured in terms of
the number of symbols in the grammar, and the idea was that if
two grammatical descriptions of the same outputs were compared
in terms of the number of symbols in each, and Grammar A had fewer
symbols than Grammar B, Grammar A was simpler, and hence to
be preferred.
Suppose we apply the simplicity metric to (28) and (29),
counting grammatical categories and abbreviatory conventions
(parentheses and curly brackets). Notice that Fraser's analysis also
requires an expansion for P1. He would eliminate the parentheses
around the N1 that follows P0, because he is analyzing these words
as particles.
By this count, Emonds' analysis, which has ten symbols, is
simpler than Fraser's, which has eleven. We can also see this if we look
at a wider fragment of both grammars. Let us consider the
subcategorization frame for put in both grammars. Emonds would
posit the subcategorization frame in (35) of Lecture #2, repeated
here:
(30)
put, V, +[____ N1 P1]
and P1 could introduce a P0 that was either transitive,
intransitive, or optionally transitive, depending on the
subcategorization frame of the P0.
Fraser's subcategorization frame would be as in (31):
(31) put, V, +[____ N1 { P1  } ]
                       { Prt }
Clearly, (30) is a simpler subcategorization frame than (31), and
is hence to be preferred.
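The symbol count behind this comparison can be checked mechanically. The following is a minimal sketch, assuming that each category label counts as one symbol and that each pair of parentheses or curly brackets counts as one symbol; the function name symbol_count and the string encoding of the rules are illustrative choices, not part of Halle's proposal.

```python
import re

def symbol_count(rules):
    """Count category symbols plus abbreviatory devices in a list of rules.

    Each rule is a string such as "V1 -> V0 (N1) (P1)". A pair of
    parentheses or a pair of curly brackets counts as one symbol;
    the arrow itself is not counted (an assumption of this sketch).
    """
    total = 0
    for rule in rules:
        categories = re.findall(r"[A-Za-z]+\d*", rule)
        total += len(categories)
        total += rule.count("(")   # one per pair of parentheses
        total += rule.count("{")   # one per pair of curly brackets
    return total

# Rules (28) and (29), written on one line each.
emonds = ["V1 -> V0 (N1) (P1)", "P1 -> P0 (N1)"]
fraser = ["V1 -> V0 (N1) ({P1, Prt})", "P1 -> P0 N1"]

print(symbol_count(emonds))  # 10
print(symbol_count(fraser))  # 11
```

Under these counting assumptions the sketch reproduces the ten-versus-eleven result stated above.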
Another piece of evidence that Emonds adduces is based on the
distribution of the word right, which modifies some prepositions, as in
(32):
(31)
He'll send it right up the stairs.
(32)
I'll send it right to you.
This word doesn't modify all prepositions:
(33) *He was working right at Citibank.
(34) *He was talking right about Sally.
This word, however, can also modify the class of elements that Fraser calls
particles, and, by our definition of a grammatical category, the elements
that right can modify would constitute a single grammatical category:
(35) He'll send it right up.
We could simply say that the word right modifies words that express
direction, and not have a syntactic statement of the distribution of this word.
However, the contrast between (35) and (31) is instructive in the comparison
between Emonds' view and Fraser's. Clearly, (35) should receive much the
same analysis in terms of structure as (31). For one thing, the sentences mean
much the same thing. Let us see what the structure of (35) and (31) would
be under Fraser's analysis (we will omit the helping verb will, expressed
here as 'll, because we will come back to it in the next lecture, and it doesn't
affect the choice between Emonds' analysis and Fraser's). Fraser would
assign (35) the structure in (36), and (31) would receive the structure in (37):

(36) [S [N1 [N0 He]] [V1 [V0 send] [N1 [N0 it]] [Prt1 [? right] [Prt0 up]]]]

(37) [S [N1 [N0 He]] [V1 [V0 send] [N1 [N0 it]] [P1 [? right] [P0 up] [N1 [Art the] [N0 stairs]]]]]
Under Emonds' analysis, (35) would receive the phrase-marker in (38), and
(31) would receive the phrase-marker in (39):
(38) [S [N1 [N0 He]] [V1 [V0 send] [N1 [N0 it]] [P1 [? right] [P0 up]]]]

(39) [S [N1 [N0 He]] [V1 [V0 send] [N1 [N0 it]] [P1 [? right] [P0 up] [N1 [Art the] [N0 stairs]]]]]
Notice that the view of these single-word categories as particles forces us to
assign radically different structures in (36) and (37), while the view of these
categories as intransitive prepositions, in this case optionally intransitive,
makes the structures in (38) and (39) minimally different. Notice that
the analysis of these words as optionally transitive prepositions places
them on a footing parallel to that of verbs, which can, in some instances, be
optionally transitive, as in the case of the verb eat:
(40)
a. He ate something.
b. He ate.
or adjectives:
(41)
a. John is angry with Sally.
b. John is angry.
Hence, we will assume that what have been called particles are really
nothing more than intransitive prepositions.
D. The Parallelism of Grammatical Categories and How to Reflect It in the Grammar
Toward the end of the discussion of prepositions, we appealed to the
notion that there is a certain symmetry in the way that grammatical
categories are constructed. Zellig Harris originally noted this, and
Chomsky developed this idea into what is now known as X-bar theory
(N. Chomsky (1970), Remarks on Nominalization, in R. Jacobs and P.
Rosenbaum, eds., Readings in English Transformational Grammar,
Ginn-Blaisdell). The idea is that grammatical categories are constructed
according to a fixed template. It will be noted that all of the
phrasal categories that we have discussed so far are of the form in (42)
(see footnote 2):
(42)
X1 --> (Y) X0 Z1*
The asterisk is known as the Kleene star (after the mathematician Stephen
Kleene), and it means zero or more occurrences of the symbol to
which it is attached.
X, Y, and Z stand for arbitrary categories, with the only
understanding of this notation being that each symbol stands for the same
category in all of its occurrences in a given statement. It is what is known as a
variable, in this case ranging over categories. In other words, all phrasal
categories of N, V, A, and P have the same arrangement, and are constructed
the same way, so that a P1 would have the same structure as an A1, for
instance.
(42) is another instance of a rule schema. It is an abbreviation for a number of
different rules. Note that each phrase-structure rule has an obligatory
element in the expansion, while all of the other elements in the expansion
are optional. The obligatory element in the expansion is called the head, so
that N0 is the head of N1, A0 is the head of A1, P0 is the head of P1, and V0
is the head of V1.
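To make the idea of a rule schema concrete, here is a small sketch that instantiates (42) for a particular category and reads the head off the result. The function names instantiate_schema and head_of are hypothetical, and treating every complement as optional (in parentheses) is an assumption that matches rules (28a) and (28b) but is not forced by the schema itself.

```python
def instantiate_schema(head_category, specifier=None, complements=()):
    """Build one phrase-structure rule from the schema X1 -> (Y) X0 Z1*."""
    right_side = []
    if specifier is not None:
        right_side.append(f"({specifier})")      # the optional Y
    right_side.append(f"{head_category}0")       # the obligatory head X0
    right_side.extend(f"({c}1)" for c in complements)  # zero or more Z1
    return f"{head_category}1 -> " + " ".join(right_side)

def head_of(rule):
    """Return the head of a rule: the X0 matching the X1 being expanded."""
    left_side = rule.split("->")[0].strip()
    return left_side[:-1] + "0"

verb_rule = instantiate_schema("V", complements=("N", "P"))
prep_rule = instantiate_schema("P", complements=("N",))
print(verb_rule)           # V1 -> V0 (N1) (P1)
print(prep_rule)           # P1 -> P0 (N1)
print(head_of(prep_rule))  # P0
```

The two rules printed are exactly Emonds' rules in (28), which is the sense in which the schema abbreviates them.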
The notion of all categories being constructed in the same way is standard in
modern syntactic theories of all stripes. It is also well-supported in studies of
language typology, the study of the ways in which languages may be said to
differ from one another. In a classic study by Joseph Greenberg (1963) (Some
Universals of Grammar with Particular Reference to the Order of Meaningful
Elements, in J. Greenberg, ed., Universals of Language, MIT Press), for example,
Greenberg classified languages into three main types: V(erb) S(ubject) O(bject),
SVO, and SOV. Describing SOV as verb-final, and the other two as
non-verb-final, he noted some striking correlations between verb-finality
and, for example, the choice between postpositions and prepositions: SOV
languages tended to have postpositions,
while VSO and SVO languages had prepositions. He also noted that VSO
languages always had SVO word order as an alternative, and we will discuss
this more later.
2 We will discuss S in the next lecture.

However, Greenberg's correlations have suggested to people that languages
can be distinguished along the dimension of head position, so that languages
such as Japanese, which is SOV, are really head-final, while languages
such as English are head-initial.

E. Some Terminology
At this point, it would be useful to review where we've come to.
A grammar is a sequence of rules that generates a set of phrase-markers,
which are structured representations of sentences. For natural languages,
the set of phrase-markers is infinite, even though each phrase-marker is
finite in length. There is only a finite set of rules, and so the rules that
generate the phrase-markers for natural languages must be recursive, i.e.,
have the ability to reapply to their own output a potentially infinite number
of times, although there must also be non-recursive rules in the grammars
of natural languages, otherwise phrase-markers would never terminate.
So far, we have only seen one type of grammatical rule, a phrase-structure
rule, which determines how sentences are composed. Phrase-structure
rules have the formal requirement that they can have at most one
symbol to the left of the arrow (called the
symbol to be expanded), and can have, in principle, any number of
symbols to the right of the arrow (called the expansion).
A symbol that appears to the left of the arrow in a phrase-structure rule is
called a non-terminal symbol. A symbol that is introduced by the
phrase-structure rules, but does not appear to the left, is called a pre-terminal
symbol. A grammar is said to generate a set of derivations. A derivation is
a sequence of representations such that each representation is formed from
the immediately preceding representation by a rule of grammar, except for
the first representation, which consists of a distinguished symbol called the
designated initial symbol;
every derivation must start with this symbol. For our grammar so far, the
designated initial symbol is S.
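The notion of a derivation can be sketched in a few lines. The toy grammar below (the RULES and LEXICON tables, and the choice to always rewrite the leftmost rewritable symbol) is an assumption made for illustration; it is far smaller than the grammar developed in these lectures.

```python
# Toy grammar (an assumption for illustration; not the lecture's full grammar).
RULES = {
    "S": ["N1", "V1"],
    "N1": ["N0"],
    "V1": ["V0", "N1"],
}
LEXICON = {"N0": ["John", "Sally"], "V0": ["visited"]}

def derive(initial="S"):
    """Return a leftmost derivation as a list of representations."""
    words = {cat: list(items) for cat, items in LEXICON.items()}
    line = [initial]
    derivation = [list(line)]
    while True:
        for i, symbol in enumerate(line):
            if symbol in RULES:              # expand a non-terminal
                line[i:i + 1] = RULES[symbol]
                break
            if symbol in words:              # insert a word under a pre-terminal
                line[i] = words[symbol].pop(0)
                break
        else:
            return derivation                # nothing left to rewrite
        derivation.append(list(line))

for step in derive():
    print(" ".join(step))
```

Each printed line is one representation; every line after the first differs from the one before it by the application of exactly one rule, and the first line is the designated initial symbol S.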
We also assume that, apart from S, all phrasal categories are expanded
according to a particular template, called an X-bar schema.
The levels of complexity of the various grammatical categories, such as
N0 versus N1, A0 versus A1, etc., are called the levels of projection of the
grammatical categories. So far, our X-bar schema has claimed that all
grammatical categories project up to level 1. The level 1 projection of the
category is said to be the maximal projection of the category (so far).
It is also useful to note some of the relations that are defined on
phrase-markers. Phrase-markers can be represented in any one of a number of
ways, just so long as the groupings are represented. For example, we have
given phrase-markers as trees, but they could also be represented as
labelled bracketings. I will represent them as trees, because I feel that
they are easier to inspect that way, but this is simply an expository
convenience.
The labelled points in the tree diagram are called nodes. A node A that
is above another node B in the phrase-marker, such that A contains B, is
said to dominate node B. If node A is the first node above node B, node A
is said to immediately dominate node B.
Phrase-markers show the groupings, and these groupings of elements
are called constituents. A constituent is a sequence of nodes that are all
immediately dominated by the same node, such that the immediately
dominating node exhaustively dominates the sequence (i.e. immediately
dominates the sequence and nothing else).
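These relations are easy to state over a concrete data structure. The sketch below assumes a phrase-marker encoded as nested tuples and uses a simplified version of (38) with the modifier right left out; the encoding and the function names are illustrative assumptions.

```python
# A phrase-marker as nested tuples: (label, child, child, ...); leaves are words.
TREE = ("S",
        ("N1", ("N0", "He")),
        ("V1", ("V0", "send"),
               ("N1", ("N0", "it")),
               ("P1", ("P0", "up"))))

def children(node):
    """The nodes immediately below `node` (words are not nodes here)."""
    return [c for c in node[1:] if isinstance(c, tuple)]

def immediately_dominates(a, b):
    return any(c is b for c in children(a))

def dominates(a, b):
    return any(c is b or dominates(c, b) for c in children(a))

def words(node):
    """The words exhaustively dominated by `node`, in left-to-right order."""
    out = []
    for c in node[1:]:
        out.extend(words(c) if isinstance(c, tuple) else [c])
    return out

v1 = TREE[2]
p0 = v1[3][1]
print(dominates(TREE, p0))              # True
print(immediately_dominates(TREE, p0))  # False
print(words(v1))                        # ['send', 'it', 'up']
```

The last line shows the constituent that V1 exhaustively dominates: the sequence send it up and nothing else.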
In the next lecture, we will examine the X-bar schema more closely,
and refine the structures of NP, VP, AP, and PP.


Lecture 4: Levels of Projection


So far, our X-bar schema has posited only one level of projection above
the X0 level (essentially, the word level). Hence, our schema has all of the
grammatical categories (other than S, to which we shall return) fitting into
the template of (42) of lecture 3, repeated here:
(43)
X1 --> (Y) X0 Z1*
This analysis makes the claim that, while the sequence Y - X0 - Z1 acts as a
constituent, there is no further constituency among these three elements. C.L.
Baker (1978), in his textbook Introduction to Generative-Transformational Syntax
(Prentice-Hall), presented evidence, however, that X0 and Z1 form a
constituent, if we take X0 to be N0 and Z1 to be P1. The evidence comes from
nominals such as those in (1):
(1)
a. the king of England
b. the picture of Fred
c. the destruction of Rome
An assumption that is made in constructing grammars is that grammatical
processes only operate on constituents. With this in mind, Baker considered a
phenomenon known as ones-pronominalization, an example of which is given
in (2):
(2)
This picture is bigger than that one.
Notice that the word one refers to picture. It is said to be an
anaphoric element, and anaphora is defined as in (3):
(3) Anaphora: the grammatical reflection of the identity of two elements.
We will be returning to other grammatical constructions that are said to be
anaphoric later on, but the point about anaphoric elements is that they cannot
be said to refer to anything on their own, but, rather, get their reference from
some other element that is expressed, which does have an independent
reference. So, in this case, the picture can be said to be referentially
independent, and that one gets its reference from the picture.
It turns out that there are syntactic conditions on the anaphora relation
between a word such as one and the element from which it gets its meaning.
In particular, Baker notes that a PP headed by of cannot follow one, as in
(3):
(3)
a. *The king of England is taller than the one of France.
b. *The destruction of Rome was more horrifying than the one of Carthage.
c. *The picture of Fred was clearer than the one of Bill.
Interestingly enough, however, one can be understood as the noun
followed by the PP headed by of:

(4)
a. This king of England was taller than that one.
b. The prospect of a slow trial of the President is more damaging than a
quick one.
c. This picture of Fred is clearer than that one.
It would seem, therefore, that one can refer to a noun and a PP headed by of.
One way to account for this would be to ascribe a structure such as (5) to a noun
phrase such as the king of England:
(5) [N2 [Det The] [N1 [N0 king] [P1 [P0 of] [N1 [N0 England]]]]]
We then have a unit that consists of the noun and the following
prepositional phrase, namely the constituent N1. We would then say that one
must be anaphoric to an N1. Because the N1 in, e.g., (5) comprises both
the noun and the following PP, the ungrammaticality of (3) stems from the
violation of the requirement that one replace an N1, rather than just an N0.
Interestingly, not all sequences of nouns followed by a PP disallow one
followed by a PP, as pointed out by Radford (1988) (Transformational
Grammar: A First Course, Cambridge University Press). For example, (6) is
perfectly acceptable:
(6)
The man with Sally left, but the one with Susan stayed.
However, we can also allow one to be interpreted as a noun plus a PP headed
by with:
(7)
One man with Susan left, but that one stayed.
We can account for this if we posit a structure in which two things hold: (i)
the noun by itself is an N1, to the exclusion of the with-PP; (ii) the sequence
consisting of the noun plus the with-PP is also an N1.
In short, we would need the following phrase-structure rules for noun
phrases:
(8)
N2 --> (Det) N1
N1 --> { N0 (P1) }
       { N1 P1   }

Hence, the structure of the man with Sally would be as in (9):
(9) [N2 [Det the] [N1 [N1 [N0 man]] [P1 [P0 with] [N2 [N1 [N0 Sally]]]]]]
In short, a recursive expansion of N1 allows, in some instances, a simple N0
to also be an N1, and allows a sequence of N followed by PP to contain
two N1s: a simple N1 consisting of the noun alone, and a larger N1 consisting
of that N1 and the PP.
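The prediction about one can be checked by listing the N1 constituents of a noun phrase. The following is a sketch under stated assumptions: the nested-tuple encoding and the function one_antecedents are illustrative, and the function only follows the projection line of the head noun (it does not look inside the PP).

```python
def words(node):
    """Words dominated by `node`, left to right."""
    out = []
    for c in node[1:]:
        out.extend(words(c) if isinstance(c, tuple) else [c])
    return out

def one_antecedents(node):
    """Word strings of the N1 constituents on the spine of a noun phrase."""
    label, *kids = node
    found = []
    if label == "N1":
        found.append(" ".join(words(node)))
    for kid in kids:
        if isinstance(kid, tuple) and kid[0] == "N1":
            found.extend(one_antecedents(kid))
    return found

# Structure (5): the of-PP is inside the only N1.
KING = ("N2", ("Det", "the"),
        ("N1", ("N0", "king"),
               ("P1", ("P0", "of"), ("N1", ("N0", "England")))))

# Structure (9): the noun alone is an N1, and so is noun + with-PP.
MAN = ("N2", ("Det", "the"),
       ("N1", ("N1", ("N0", "man")),
              ("P1", ("P0", "with"), ("N2", ("N1", ("N0", "Sally"))))))

print(one_antecedents(KING))  # ['king of England']
print(one_antecedents(MAN))   # ['man with Sally', 'man']
```

Since king alone is never an N1 in (5), one cannot stand for it and strand the of-PP, whereas in (9) one can stand for either man or man with Sally.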
Exercise: Show the phrase-marker for the underlined noun phrase in (a),
using the grammar that we have developed so far:
(a) The picture of Sally with the green frame is bigger than the one with
the blue frame.
A. Adjectives within the Noun Phrase
It will also be noted that the assumption that one replaces an N1 tells us
about the hierarchical position of adjectives within the noun phrase, as in the
blue car. Consider sentences such as (10) and (11):
(10)
The blue car was prettier than the green one.
(11)
This blue car was prettier than that one.
Our previous reasoning forces us to posit a phrase-structure rule as in (12) for
N1s:
(12)
N1 --> A N1
Hence, the structure of, e.g., the blue car, would be as in (13):
(13) [N2 [Det the] [N1 [A blue] [N1 [N0 car]]]]
Exercise: What is the structure of the big blue car, given the following sentences?
(b) The big blue car was faster than the small green one.
(c) The big blue car was faster than the small one (can mean either the small blue car or
the small car).
(d) This big blue car was smaller than that one (can mean either that big blue car, that
blue car, or that car).
Summary: We have motivated the following phrase-structure rules for the noun
phrase:
(14) N2 --> Det N1
     N1 --> { A N1    }
            { N1 P1   }
            { N0 (P1) }
We have not yet talked about genitives, such as John's mother's boyfriend's sister's
teacher. Notice that genitives seem to occur in the same position as the determiner.
Therefore, we can modify the rule for expanding N2 above as in (15), putting aside for
the moment the mechanism by which the possessive 's is introduced:
(15) N2 --> { N2  } N1
            { Det }
B. Implications of the structure of the Noun Phrase for Other Categories
In the last section, we examined the structure of the noun phrase, and argued for two
levels of projection of the noun above the N0. If we assume that all categories are created
by the same X-bar schema, this would indicate that our X-bar schema of (42) of the last
lecture, repeated here, should be revised to (16):
(42) X1 --> (Y) X0 Z1*
(15) X2 --> (Z2) X1
     X1 --> { Y2 X1  }
            { X1 Y2  }
            { X0 Y2* }
We might then say that Z is instantiated by articles in noun phrases (as well as
possessives, to which we shall return).
Notice, first of all, that the determiner that is found in noun phrases, a word that
appears before the head, is paralleled by degree words that precede adjectives in adjective
phrases and adverbs that precede verbs and prepositions. Examples are as in (17):
(16) a. The pictures of Sally.
b. quite fond of Sally.
c. completely lost his mind.
d. right up the stairs.
It will be noted that we have direct evidence, from ones-pronominalization, for a
projection that includes the noun and a following PP (in some instances, specifically PPs
headed by of), and we have generalized from that evidence to saying that all categories
have two levels of projection. Are we warranted in leaping from evidence for two levels
of projection in N to two levels of projection in all categories, when we have no evidence
for the latter?
Recall that we are assuming that the grammar is as simple and as general as possible.
We have no direct evidence for two levels of projection in adjectives, verbs, and
prepositions, but we have no evidence against two levels, either. If we were to assume
that there is only one level of projection for these categories, but two for the noun, we
would clearly be assuming a more complicated grammar than if we assumed two levels of
projection for all the categories. For this reason, we assume that there is a single X-bar
schema for all grammatical categories, given in (15). Much of what we will be doing in
the succeeding weeks is finding evidence for the various options given by (15) for
particular sentences.
It will be noted that we have not fit S into the X-bar schema. We have not
defined S so far as the maximal projection of any category. That will change in the next
lecture.
C. Grammatical Relations and Grammatical Categories
Traditional grammar speaks of such notions as subject and object. Do these
notions play a role in grammar? Notice that our grammar generates partial
phrase-markers such as (18):
(17) [S N2 [V2 [V1 V0 N2]]]
The phrase-marker shows the hierarchical (up-and-down dominance) and linear
(left-to-right precedence) relations of grammatical categories, in this case S,
N2, V2, V1, and V0. What is the difference between grammatical categories and
grammatical relations?
There is a once-and-forever characteristic of grammatical categories that is
not present in grammatical relations. An element or sequence of elements either
is or is not a given category depending on the nature of its head. Grammatical
relations, on the other hand, can be deduced from phrase-markers depending
on the positions of the various grammatical categories. Hence, in the partial
phrase-marker in (17), an N is a subject if it is immediately dominated by S,
and an object if it is immediately dominated by V. If the Ns were in different
structural positions, they would bear different grammatical relations. This was
the point made by Chomsky in Aspects of the Theory of Syntax (1965, MIT
Press), who introduced the following formulations of subject and object
(updated for X-bar theory):
(18) subject = [N, S] (meaning the N immediately dominated by S)
     object = [N, V] (meaning the N immediately dominated by V)
If grammatical relations are predictable from the positions of the elements that bear
them, the reasoning goes, they should not be represented in phrase-markers, for the
same reason that we do not represent regular plurals in the lexical entries of nouns. We
only represent in representations what we cannot predict from something else.
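The claim that grammatical relations are predictable from position can be sketched as a lookup over a phrase-marker. Everything in the block is an assumption made for illustration: the nested-tuple encoding, the function name bearers, and in particular the bar-level labels N2, V2, V1, and V0, which follow the two-level X-bar schema rather than anything stated in (18).

```python
def leaves(node):
    """Words dominated by `node`, left to right."""
    out = []
    for c in node[1:]:
        out.extend(leaves(c) if isinstance(c, tuple) else [c])
    return out

def bearers(node, category, mother):
    """Words of every `category` node immediately dominated by a `mother` node."""
    label, *kids = node
    hits = []
    for kid in kids:
        if isinstance(kid, tuple):
            if label == mother and kid[0] == category:
                hits.append(" ".join(leaves(kid)))
            hits.extend(bearers(kid, category, mother))
    return hits

# Bar levels are assumed here: S -> N2 V2, V2 -> V1, V1 -> V0 N2.
TREE = ("S",
        ("N2", "John"),
        ("V2", ("V1", ("V0", "visited"), ("N2", "Sally"))))

print(bearers(TREE, "N2", "S"))   # ['John']   (the subject)
print(bearers(TREE, "N2", "V1"))  # ['Sally']  (the object)
```

Nothing in the tree is labelled "subject" or "object"; both are read off from which node immediately dominates the noun phrase, which is the point of the formulation in (18).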
D. Some Terminology
The grammatical relations of subject and object are akin to other grammatical
relations that are particular to X-bar theory. Let us take a representation as in (20), which
is generable from the schema in (15):

(20) [X2 Z2 [X1 Y2 [X1 [X1 X0 Y2] Y2]]]
In phrase-markers, it is convenient to use the notions sister and daughter, defined in
terms of immediate domination. These notions are defined as in (21) and (22):
(21) A is a sister of B if A and B are both immediately dominated by the same node.
(22) A is a daughter of B if B immediately dominates A.
We can now define the following grammatical relations:
(23) a. A is a specifier if A is a daughter of X2.
     b. A is an adjunct if A is a sister and daughter of X1.
     c. A is a complement if A is a daughter, but not a sister, of X1.
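The three definitions can be turned into a small classifier. This is a sketch under stated assumptions: the bar levels X2, X1, and X0 are written out explicitly, the example tree is one that the two-level schema can generate (a specifier, one adjunct on each side, and one complement), and the function name classify is hypothetical.

```python
def classify(node, out=None):
    """Label every non-head daughter along the projection line of X."""
    out = [] if out is None else out
    label, *kids = node
    sister_labels = [k[0] for k in kids]
    for kid in kids:
        if kid[0] in ("X0", "X1"):           # part of the projection line itself
            classify(kid, out)
        elif label == "X2":
            out.append((kid[0], "specifier"))    # daughter of X2
        elif label == "X1" and "X1" in sister_labels:
            out.append((kid[0], "adjunct"))      # sister and daughter of X1
        elif label == "X1":
            out.append((kid[0], "complement"))   # daughter of X1, sister of X0
    return out

# A tree generable from the schema (an assumed example).
TREE = ("X2", ("Z2",),
        ("X1", ("Y2",),
               ("X1", ("X1", ("X0",), ("Y2",)),
                      ("Y2",))))

for category, relation in classify(TREE):
    print(category, relation)
```

The output lists Z2 as the specifier, the Y2 next to X0 as the complement, and the two Y2s that are sisters of X1 as adjuncts.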
These notions will play a role in the next lecture, when we try to integrate S into the
X-bar system. For now, however, let us note the following.
One of the original motivations for the X-bar system was the attempt by Chomsky to
capture the similarity in understood semantic relations between sentences and
nominalizations, as in (24):
(24) a. Rome destroyed Carthage.
     b. Rome's destruction of Carthage.
We could say that, in (b), the N that realizes the agent semantic role is a specifier, but
we cannot say that the agent is the specifier of the phrase in (a), because the notion of a
specifier is defined in X-bar terms, and S is not (so far) an X-bar projection. We could
say that the object in (a) is really a complement, and the notion of a complement is
realized in sentences (which contain Vs) and noun phrases (which contain Ns).
We will return to this in the next lecture.


Lecture 5- The Helping Verb System: The Need for Transformations


In the last couple of lectures, we have examined the phrase structure of the
projections of nouns, verbs, adjectives, and prepositions. We have posited a single
schema for constructing phrasal categories, namely the one in (16), repeated here as (1):
(1) X2 --> (Z) X1
    X1 --> { Y2 X1  }
           { X1 Y2  }
           { X0 Y2* }
Obviously, however, the rule for expanding S, which is given in (2), does not fit this
schema:
(2) S --> N V
For one thing, there is no element in the expansion of S that could be considered the
head, since both elements are obligatory. For another, both elements in the
expansion are phrasal categories rather than an X0.
We will now bring S into the X-bar schema fold by analyzing it as a projection of Tense.
In so doing, we will introduce another type of syntactic rule known as a transformation,
which converts phrase-markers into other phrase-markers.
A. The Helping Verb System of English
So far, we have analyzed sentences that consist of a subject immediately followed by a
main verb, as in (4). However, sentences may also contain what are known as
helping verbs, as in (5):
(4) John eats steak.
(5) a. John would eat steak.
b. John has been eating steak.
c. John is eating steak.
d. John would have eaten steak.
e. John would have been eating steak.
f. John would be eating steak.
g. John has eaten steak.
Have and be, when they occur as helping verbs, are said to be markers of aspect,
referring to the completedness of an action or state of affairs. Have is said to mark the
perfective aspect, which generally marks completion, while be is said to mark the
progressive aspect, which generally marks an ongoing state of affairs. When the
perfective helping verb have appears, the next verb is marked with what is known as a
perfect participle suffix, generally written as -en, while the progressive helping verb be
triggers the suffix -ing on the next verb. There can be at most one perfective have, and
at most one progressive be:
(6) a. *John has had eaten the steak.
b.* John is being eaten the steak.
Furthermore, perfective have, and progressive be, when they both occur in a simple
sentence, must occur in that order:
(7) *John is having eaten the steak.
There is one set of elements that can occur before perfective have or progressive be,
and this set of elements is known as the set of modals. They include the elements would,
will, can, could, shall, should, may, might, and must, and again, they are mutually
exclusive.
(8) John { can    } eat the steak.
         { could  }
         { will   }
         { would  }
         { shall  }
         { should }
         { may    }
         { might  }
         { must   }
These, then, are the facts concerning linear order of helping verbs. If
we call the class of elements in (8) modals, and abbreviate this class by
the symbol M, we can account for the facts (aside from the affixes on the
verbs that occur with the aspectual helping verbs, which we will put aside
for now), by positing, as a first approximation, the phrase-structure rule
in (9):
(9) S --> N (M) (have) (be) V
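Rule (9) licenses a small, finite set of helping-verb sequences, which can be enumerated directly. The sketch below is an illustration only: it uses uninflected forms, since (9) sets the affixes aside, and the names helping_verb_sequences and is_licensed are assumptions.

```python
from itertools import product

MODALS = ["can", "could", "will", "would", "shall", "should",
          "may", "might", "must"]

def helping_verb_sequences(modals=MODALS):
    """Enumerate every helping-verb string licensed by S -> N (M) (have) (be) V."""
    sequences = []
    options = product([None] + list(modals), [None, "have"], [None, "be"])
    for modal, have, be in options:
        sequences.append(tuple(w for w in (modal, have, be) if w is not None))
    return sequences

def is_licensed(sequence):
    return tuple(sequence) in set(helping_verb_sequences())

print(len(helping_verb_sequences()))          # 40
print(is_licensed(["would", "have", "be"]))   # True
print(is_licensed(["be", "have"]))            # False: wrong order, cf. (7)
print(is_licensed(["have", "have"]))          # False: two haves, cf. (6a)
```

Because each helping verb occupies a fixed optional slot, the rule excludes doubled helping verbs and the order be before have without any further statement.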
B. Yes-No Questions In English
Let us now consider the formation of yes-no questions in English. First, consider
yes-no questions in simple sentences that have all three types of helping verbs- modals,
perfective have, and progressive be.
When all three helping verbs occur, the modal will appear at the beginning of the
question:
(10) a. Would he have been eating?
b. *Have he would been eating?
c. *Been he would have eating?
When the modal is absent, however, and have and be occur, have will appear
at the beginning of the question:
(11) a. Has he been eating?
b. *Been he has eating?
When the modal and have are absent, but be occurs in the simple sentence, be
introduces the question:
(12) Is he eating?
There is a generalization about the helping verb that appears at the beginning of a
yes-no question, if one notices the corresponding order of helping verbs in the declarative.
Can you guess what it is?
You're right. It's (13):
(13) The helping verb that appears at the beginning of a yes-no question
is the helping verb that would appear immediately after the first N in
the declarative version of the sentence.
We might try revising our phrase-structure rule in (9) to account for this, as in (14):
(14) S --> { M N (have) (be) } V
           { have N (be)     }
           { be N            }
The phrase-structure rule in (14), while it gives us the effect in (13), doesn't directly
capture it as a generalization. The fact that the helping verb that appears at the beginning
of the yes-no question is just that helping verb which would appear after the N in the
declarative is accidental. We have three separate expansions of S which give us this
result, but we could just as well have had a grammar that had (15) as the phrase-structure
rule:
(15) S --> ( { M    } ) N (M) (have) (be) V
           ( { have } )
           ( { be   } )
Question: What ungrammatical strings would (15) generate in forming questions?
Chomsky suggested, in Syntactic Structures (1957, Mouton), that we could get around
this problem by not generating yes-no questions in English via a phrase-structure rule,
but instead exploiting the fact that the appearance of helping verbs in questions is predictable
from the order of helping verbs in the corresponding declarative sentences. Specifically,
he posited a new type of grammatical rule known as a transformation, which is defined as
a rule which changes a phrase-marker into another phrase-marker. In this case, the
transformation, which is known as Subject-Helping Verb Inversion, is formulated as in
(16):
(16) N - { M    }
         { have }
         { be   }
     1 -    2     -->  2 - 1
Transformations as in (16) are thought to have two parts: a structural description,
which specifies the properties that the phrase-markers must meet in order to be
transformed, and a structural change, which specifies how they are to be transformed.
The structural description occurs before the arrow, and the structural change occurs after
the arrow. In this case, the rule is stated so that the structural description must consist of
two factors. The first is the N, and the second is the M, have, or be that immediately
follows it in the phrase-marker. To see how this works, let us consider the
phrase-marker for (5)(e):
(17) [S [N John] [M would] have been [V eating the steak]]
In this case, the subject N is factor 1, and the modal is factor 2.
The structural description of Subject-HV Inversion is met, and the
phrase-marker is altered by interchanging the two factors,
resulting in (18):
(18) [S [M Would] [N John] have been [V eating the steak]]

The elements in braces in (16) are arranged vertically, and the
braces indicate that one of the elements within the braces must be
chosen, and, furthermore, that this element must be immediately
after the N, which is Factor #1. Hence, the rule could not apply
to move have or be in (17), because it is the Modal which is
immediately after the N. Hence, (10)(b) and (10)(c) could not be
generated because of the way that Subject-HV Inversion is
formulated.
On the other hand, if the modal were absent, have, if present,
would satisfy the requirements of being Factor #2, as in (19):
(19) [S [N John] have been [V eating the steak]]
     (Factor 1 = [N John]; Factor 2 = have)
And then have and the immediately preceding N
would invert, yielding (20):
(19) [S have [N John] been [V eating the steak]]

And, of course, if be is present, but not a modal or have, then be
will fulfill the requirements of being Factor #2:
(20) [S be [N John] [V eating the steak]]
In short, the rule (16), as a consequence of its structural description,
directly expresses the generalization that the first helping verb after the
subject in the declarative phrase-marker is the helping verb that
appears at the beginning of a yes-no question.
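The effect of rule (16) can be sketched over a flat string of words. The block makes several simplifying assumptions that are not part of the rule itself: the subject N is a single word, the phrase-marker is flattened to a word list, and the sets of inflected forms (HAVE_FORMS, BE_FORMS) are supplied by hand.

```python
MODALS = {"can", "could", "will", "would", "shall", "should",
          "may", "might", "must"}
HAVE_FORMS = {"have", "has", "had"}
BE_FORMS = {"be", "am", "is", "are", "was", "were"}

def invert(words):
    """Apply rule (16): N - {M, have, be} becomes 2 - 1.

    Factor 1 is the first word (assumed to be the whole subject N);
    factor 2 must be the helping verb immediately after it.
    """
    if len(words) < 2:
        return list(words)
    subject, second = words[0], words[1]
    if second in MODALS or second in HAVE_FORMS or second in BE_FORMS:
        return [second, subject] + list(words[2:])
    return list(words)   # structural description not met

print(invert(["John", "would", "have", "been", "eating", "steak"]))
print(invert(["John", "has", "been", "eating", "steak"]))
print(invert(["John", "eats", "steak"]))
```

Only the element immediately after the subject is ever examined, so a string such as *Have he would been eating cannot be produced: have is never factor 2 when a modal intervenes.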
We now have, however, another type of syntactic rule-namely, a
transformation, which has the power to add, move, or delete elements
(although so far, we have only seen one transformation, which moves
elements.) Hence, our grammar is organized as follows:
(21) Phrase-structure Rules + Lexicon --> Phrase-marker 1 --> Transformation(s) --> Final Phrase-marker
Of course, we have only seen one transformation so far. We will now
remedy that.
C. Yes-No Questions Without Helping Verbs
How do we form yes-no questions in English without helping verbs, as
in the question counterparts of (23) and (24)?
(23) John visited Sally.
(24) John visits Sally.
The question counterparts of (23) and (24) are (25) and (26) respectively:
(25) Did John visit Sally?
(26) Does John visit Sally?
It is noteworthy that the declarative forms of (25) and (26), which would be
(27) and (28), respectively, are unacceptable unless emphatic:
(27) *John did visit Sally.
(28) *John does visit Sally.
In short, the tense, which must be a suffix attached to the verb in the
affirmative declarative, is separated from the verb in the interrogative. How
can we account for this fact?
We could account for it by generating the tense away from the verb, and
saying that it must attach to the verb when it is adjacent to it. First, let us revise
our phrase-structure rule for introducing helping verbs from (9), repeated here,
to (29):
(9) S --> N (M) (have) (be) V
(29) S --> N T (M) (have) (be) V
We can then say that the tense element moves in forming yes-no questions as
well, revising (16), repeated here, to (30):
(16) N - { M    }
         { have }
         { be   }
     1 -    2     -->  2 - 1
(30) N - T ( { M    } )
           ( { have } )
           ( { be   } )
     1 -       2        -->  2 - 1
Finally, the rule of tense-hopping would be formulated as in (31):
(31) T - { have }
         { be   }
         { V    }
     1 -    2     -->  0 - 2+1
A word of explanation is in order about the + symbol, and what it is supposed to
mean. It means that the two elements form a unit after movement. Let us illustrate
with the derivation of (23), John visited Sally.
The initial phrase-marker for (23) would be (32), with the Tense generated separately:
(32) [S [N John] [T past] [V [V visit] [N Sally]]]
The phrase-marker that results from tense-hopping is (33):
(33) [S [N John] [V [V [V visit] [T past]] [N Sally]]]
There is a term for this type of forming a unit, as in the forming of a unit
between the verb and the tense in (33). It is called adjunction, which is
defined as follows:
(34) A adjoins to B iff A moves to the periphery (i.e., the beginning or end) of B,
moves out of B, and forms a new instance of B dominating A and B.
If we look at the structural description of (31), however, it requires adjacency
between Factor #1 and Factor #2. It is this requirement that the Tense be
adjacent to the element that it adjoins to which seems to be violated in the
formation of the yes-no question .
There are two transformations that we are looking at here, in the formation
of the yes-no question. One is Subject-Helping Verb Inversion; the other is
Tense-Hopping. Their formulations are repeated here:
(30) N - T ( { M    } )
           ( { have } )
           ( { be   } )
     1 -       2        -->  2 - 1
(31) T - { have }
         { be   }
         { V    }
     1 -    2     -->  0 - 2+1
Suppose the transformations are ordered, in the sense that they apply, or can only get
the chance to apply, in a fixed sequence, and that (30) is ordered before (31). In that
case, the application of (30) to (32) would yield (35):
(35) [S [T Past] [N'' John] [V'' [V visit] [N'' Sally]]]
Tense-hopping, given in (31), cannot apply to this phrase-marker. The structural
description is not met, since Tense is not next to any possible Factor #2.
The tense is instead affixed to the form do. In other words, the form with
do attached to the tense is a kind of default form, in the sense that the
tense, being listed in the lexicon as an affix, needs a stem to affix to. Our final
transformation is called Do-support, and is formulated as in (36):
(36) Tense --> do+Tense, if Tense is not affixed.
So, we have the following three transformations, which apply in this order:
(37)Subject-HV Inversion
Tense-Hopping
Do-Support
The concept of ordering of these transformations is crucial here. When we say that
Subject-HV Inversion is ordered before Tense-Hopping, we do not mean that Tense-Hopping
must apply after Subject-HV Inversion. Rather, we mean that Tense-Hopping gets
its chance to apply only after Subject-HV Inversion applies, and can only apply if its
structural description is met. More precisely, ordering is defined as in (38):
(38) Linear Ordering = The transformations in a grammar are ordered, in the sense
that if A is ordered before B, B can only apply if A has applied or had its
chance to apply.
When I say "had its chance to apply", I direct your attention to Subject-HV
Inversion and Tense-Hopping. Subject-HV Inversion is what is known as an
optional transformation, in the sense that, if its structural description is met, it
can apply, but it doesn't have to. Tense-Hopping, on the other hand, is known
as an obligatory transformation. It must apply if its structural description is
met. Therefore, if Subject-HV Inversion applies, it will remove phrase-markers
from the domain of phrase-markers to which Tense-Hopping can apply. In the
terms of Paul Kiparsky, who discussed an analogous situation in historical
linguistics (the study of linguistic change), Subject-Helping Verb Inversion will bleed
Tense-Hopping, in the sense that it will remove representations to which Tense-Hopping
could otherwise apply.
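The ordering in (37) and the bleeding relation just described can be made concrete with a small sketch. It is only an illustration under simplifying assumptions that are not part of the lecture: a phrase-marker is flattened to a list of tokens, the subject is assumed to be the first token, bar levels are ignored, and the names `invert`, `tense_hop`, `do_support`, and `derive` are hypothetical.

```python
TENSES = {"Past", "Pres"}
AUX = {"will", "can", "have", "be"}   # toy stand-ins for M, have, be
VERBS = {"visit", "eat"}               # toy main verbs

def invert(ts):
    # (30): N - T ({M, have, be}) --> 2 - 1 (optional rule)
    if len(ts) > 1 and ts[1] in TENSES:
        end = 3 if len(ts) > 2 and ts[2] in AUX else 2
        return ts[1:end] + ts[:1] + ts[end:]
    return ts

def tense_hop(ts):
    # (31): T - {have, be, V} --> 0 - 2+1, only under strict adjacency
    for i in range(len(ts) - 1):
        if ts[i] in TENSES and ts[i + 1] in AUX | VERBS:
            return ts[:i] + [ts[i + 1] + "+" + ts[i]] + ts[i + 2:]
    return ts

def do_support(ts):
    # (36): any Tense still unaffixed is spelled out on do
    return ["do+" + t if t in TENSES else t for t in ts]

def derive(ts, question=False):
    # Fixed order: Subject-HV Inversion, Tense-Hopping, Do-Support
    if question:
        ts = invert(ts)
    return do_support(tense_hop(ts))

print(derive(["John", "Past", "visit", "Sally"]))
print(derive(["John", "Past", "visit", "Sally"], question=True))
```

In the second call, inversion separates Past from visit, so `tense_hop` finds nothing to apply to and `do_support` steps in. This is the bleeding configuration in miniature.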
__________________________________________________________________
Question: Will Subject-Helping Verb Inversion totally bleed Tense-Hopping, in the
sense that it will remove any chance for Tense-Hopping to apply? Consider the following
in your answer, and show how they are generated (disregard the verbal suffixes -en and
-ing):
(a) Has he eaten the steak?
(b) Had he eaten the steak?
(c) Is he eating the steak?
(d) Was he eating the steak?
It is worth considering the conception of transformations that we have here,
and the overall organization of the grammar. We have the phrase-structure rules,
which must conform to X-bar theory, and with S as the designated initial symbol,
generating phrase-markers that end with pre-terminal symbols. The terminal symbols
are then inserted from the lexicon. We then have a phrase-marker, which may either
correspond to a sentence of the language or not, depending on whether or not it must
undergo any obligatory transformations.
In Lecture #6, we will see additional evidence for the rule of Tense-hopping, and
for the analysis of Tense that posits a level at which the Tense is separated from the
verb, and will see that we do not need to change the rules of Tense-hopping and
Do-support that we have adopted based on the formation of yes-no questions,
when we look at negation in English and Verb Phrase Ellipsis and their interaction
with Tense-hopping.


Lecture #6- Additional Evidence for Tense-Hopping and a Revision of the
Phrase-Structure Rule for S
In the last lecture, we needed to account for the fact that present and
past tense in English, which usually are realized as affixes on verbs, can be
separated from those verbs in some instances. We accounted for this by positing a
level of representation at which the tense was not part of the verb, generated by
the phrase-structure rule (29) of Lecture #5, repeated here:
(29) S --> N'' T (M) (have) (be) V''
and formulating a transformation that turns it into a unit with the verb when it is adjacent
to it, i.e. the transformation of Tense-hopping, given in (31) of Lecture #5:
(31) T - { have }
         { be   }
         { V    }
     1 -    2     -->  0 - 2+1
We also saw that, in the event that the structural description of Tense-hopping was
not met, so that it could not apply, a rule that inserts do would apply, rule (36) of the
last lecture:
(36) Tense --> do+Tense, if Tense is not affixed.
The analysis in which tense is generated separate from the verb and then affixed to it
transformationally, unless some earlier transformation destroys the adjacency between
Tense and V, was motivated by the fact that we could find an earlier stage of the
derivation at which Tense and the verb were separated from one another, and some
process applies at this earlier point to destroy the environment for Tense to attach to the
V. We found such a process in our examination of Subject-Helping Verb Inversion. We
will now find two other transformations that destroy the required adjacency between
Tense and V, a transformation that places negatives and a transformation that elides verb
phrases, and we will see that our rule of Tense-hopping, required by the interaction of
the occurrence of Tense and the distribution of yes-no questions in the last lecture, carries
over without modification to the analysis of Tense in these other two areas of English
syntax. The set of rules motivated by the analysis of one area of grammar must be
the same set of rules motivated by the analysis of other areas of grammar.
We don't have an analysis of Tense for yes-no questions that is different from the analysis
of Tense for negation. If we required a different analysis for different areas, we would assume
that we must go back to the drawing board, and that one of our two analyses is wrong.
A grammar is an inter-locking system.
A. Negation
There are two types of negation in English: sentential negation and constituent
negation. Sentential negation is the negation of a sentence, and constituent
negation is the negation of a part of the sentence. For example, consider two
possible interpretations of (1):
(1) John could not read the book.
One interpretation is that John is unable to read the book, and could be
paraphrased as (2):
(2) It is not the case that John could read the book.
In this case, we would say that, with reference to the semantics, or meaning, of
the sentence, the negative is taking scope over the modal.

A second interpretation of (1) is that John is able to refrain from reading the book.
These two interpretations correlate with non-semantic distinctions, such as
phonological or morphological ones. For example, it is possible to contract the
negative onto a helping verb, but only if the negative is an example of sentential,
rather than constituent, negation. For example, (3) can only have the
interpretation of (2), and cannot mean that John is able to refrain from reading the
book:
(3) John couldn't read the book.
We will be concentrating on the distribution of sentential negation for now.
The Placement of Sentential Negation
To consider the placement of sentential negation, let us consider an
affirmative sentence, i.e. one that lacks negation. First, consider a sentence with all the
helping verbs present:
(4) John could have been reading the book.
The negative goes perfectly well after the modal:
(5) John could not have been reading the book.
It cannot occur after have:
(6)* John could have not been reading the book.
It also cannot occur after be:
(7)* John could have been not reading the book.
However, if the modal is absent, the negative will most naturally occur after have:
(8) John has not been reading the book.
(9) * John has been not reading the book.
If both the modal and have are absent, the negative will occur after be:
(10) John is not reading the book.
It seems, then, that the generalization about where the negative will occur is the
following:
(11) The negative will occur after the first helping verb in the sentence, and there is
only one sentential negation in a sentence.
Now, we will try to account for (11) within the grammar, putting Tense aside for the
moment. Assume the phrase-structure rule in (12):
(12) S --> N'' (M) (have) (be) V''
It is impossible to introduce negation by the phrase-structure rules and keep to (12). Let
us see why. Suppose we introduced the negative directly after the modal:
(13) S --> N'' (M) (Neg) (have) (be) V''
We could generate (5), but we could not generate (8) or (10). Similarly, if we
generated the negative after have, we could generate (8), but we would also
incorrectly generate (6), and could not generate (5) or (10). Similarly, if we
generated the negative after be, we could generate (10), but we would incorrectly
generate (9) and (7).
The impossible phrase-structure rules that are hypothesized in the preceding
paragraph are (14) and (15):
(14) S --> N'' (M) (have) (Neg) (be) V''
(15) S --> N'' (M) (have) (be) (Neg) V''
If we accounted for the multiplicity of positions for the negation by introducing the
negative at three different points in the phrase-marker, as in (16), we would run
afoul of the generalization that there can only be one negative per simple sentence,
and would incorrectly generate (17):
(16) S --> N'' (M) (Neg) (have) (Neg) (be) (Neg) V''
(17) * John could not have not been not reading books.
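The argument against introducing the negative by phrase-structure rule can be checked mechanically. The following is a minimal sketch, assuming that a rule is written as a flat list of category symbols in which parentheses mark optional items and that bar levels can be ignored; the helper `expand` and the rule names are hypothetical.

```python
from itertools import product

def expand(rule):
    """Yield every category string that a flat phrase-structure rule generates."""
    choices = [[item[1:-1], None] if item.startswith("(") else [item]
               for item in rule]
    for combo in product(*choices):
        yield " ".join(sym for sym in combo if sym is not None)

RULE_13 = ["N", "(M)", "(Neg)", "(have)", "(be)", "V"]
RULE_16 = ["N", "(M)", "(Neg)", "(have)", "(Neg)", "(be)", "(Neg)", "V"]

generated_13 = set(expand(RULE_13))
generated_16 = set(expand(RULE_16))

# Pattern of (5): generated by the rule in (13), as desired.
print("N M Neg have be V" in generated_13)
# Patterns of (8) and (10): wrongly excluded by the rule in (13).
print("N have Neg be V" in generated_13, "N be Neg V" in generated_13)
# Pattern of (17): wrongly generated by the three-slot rule in (16).
print("N M Neg have Neg be Neg V" in generated_16)
```

No placement of a single optional Neg slot covers (5), (8), and (10) together, and three slots overgenerate, which is the dilemma that motivates a transformational treatment.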
Chomsky, in Syntactic Structures, proposed a solution which involved dropping the
assumption that negatives are present in the initial phrase-marker. He proposed, instead,
that negatives are inserted via a transformation after the first helping verb in the phrase-marker. This rule, called negative placement, was formulated as in (18):
(18) N'' - { M    }
           { have }
           { be   }
      1  -    2      -->  1 2 - Neg
As in the formulation of Subject-Helping Verb Inversion in the last lecture, we again
make reference to the notion "first helping verb after the subject", not in the phrase-structure
rules, but in the structural description of a transformation. We will return
shortly to the question of why this set of elements is mentioned in the structural
description of two separate transformations, Subject-Helping Verb Inversion and Negative
Placement, but we will assume the formulation of Negative Placement in (18), and, as
we did with Subject-Helping Verb Inversion, consider the distribution in sentences
without helping verbs:
(18) a. John did not read books.
     b. John does not read books.
Again, the tense does not appear as a suffix on the verb, but is separated from it,
appearing instead on a form of do. We can account for this by generating Tense as an
element separate from the verb, as in the phrase-structure rule (29) in Lecture #5,
repeated here:
(29) S --> N'' T (M) (have) (be) V''
and reformulating negative placement as in (19):
(19) N'' - T ( { M    } )
             ( { have } )
             ( { be   } )
      1  -       2         -->  1 2 - Neg
We would then order negative placement before Tense-hopping, repeated at the
beginning of this lecture. Hence, the deep structure (initial phrase-marker) of (18)(a) would
be (20):
(20) [S [N'' John] [T Past] [V'' [V read] [N'' books]]]
Negative Placement will insert the negative after T, transforming (20) to (21):
(21) [S [N'' John] [T Past] Neg [V'' [V read] [N'' books]]]
It is clear, however, that Tense-hopping cannot apply, because T and V are not
adjacent, the adjacency having been destroyed by the insertion of the negative element.
Do-support will then apply, as in (36) of Lecture #5.
In short, the same analysis of Tense that we needed for the distribution of yes-no
questions is needed for the analysis of negation.
B. Verb-Phrase Deletion
In English, there is a process that allows verb phrases to fail to be expressed. An
example is (22):
(22) John reads books, and Bill does __, too.
which means (23):
(23) John reads books, and Bill reads books, too.
Interestingly enough, verb phrases can only fail to be expressed when they follow a
helping verb. There are verbs in English that take verb phrases as complements, typically
the verbs of temporal aspect: start, begin, continue, stop, and keep on. Verb phrases
that appear after the verb begin, for example, cannot elide (pointed out by Joan Bresnan
in a 1976 article, "On the Form and Functioning of Transformations," Linguistic Inquiry,
Vol. 7):
(24) *First fire began pouring out of the building, and then smoke began ___.

It therefore seems as though Verb Phrase Ellipsis requires what Bresnan calls a
context predicate, or trigger, to be mentioned in the structural description of VP-Ellipsis:
(25) { M    } - V''
     { have }
     { be   }
        1     - 2   -->  1 - 0
There is an aspect of verb phrase ellipsis that is not specifically mentioned in (25), which
is that the verb phrase that is deleted must be identical to another verb phrase that is
specifically mentioned. In the terminology of (3) of Lecture #4, we would say that the
null element must be anaphoric to another verb phrase in the sentence, so that (22)
cannot mean, for instance, (26):
(26) John reads books, and Bill drinks wine, too.
Now, again notice that, in (22), the tense remains as a suffix to do, while the verb,
which is part of the verb phrase, has been deleted. We can account for this by saying that
the rule of Verb Phrase Deletion, formulated finally as in (27), is ordered before Tense-hopping:
(27) (VP-Ellipsis) { T    } - V''
                   { M    }
                   { have }
                   { be   }
                      1     - 2   -->  1 - 0
Applying VP-ellipsis to the deep structure of the second conjunct of (22) generates the
phrase-marker in (28):
(28) [S [N'' Bill] [T Past] 0]
Tense-hopping cannot apply, in this case because there is nothing for the Tense to
hop onto (i.e., no Factor #2), and so Do-support will apply, yielding (29):
(29) Bill does.
In this case, again, the same rules of Tense-hopping and Do-support that we needed
for Yes-No Questions and the placement of negation are needed for the distribution of
tense in sentences with elided verb phrases, confirming the original analysis of Tense.
This is what is meant by the grammar being an inter-locking system, with rules that are
justified on the basis of one set of considerations internal to the grammar having to jibe
with rules that are needed for other areas of grammar.
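The inter-locking point can be illustrated by letting two different earlier rules feed one unchanged pair of later rules. What follows is a minimal sketch under assumptions that go beyond the lecture: phrase-markers are flattened to token lists, the subject is the first token, the elided verb phrase is taken to run to the end of the clause, and all function names are hypothetical. Nothing is claimed about the ordering of Negative Placement relative to VP-Ellipsis, so each is tried on its own.

```python
TENSES = {"Past", "Pres"}
AUX = {"can", "will", "have", "be"}    # toy stand-ins for M, have, be
VERBS = {"read", "drink"}              # toy main verbs

def neg_placement(ts):
    # (19): N - T ({M, have, be}) --> 1 2 - Neg
    if len(ts) > 1 and ts[1] in TENSES:
        end = 3 if len(ts) > 2 and ts[2] in AUX else 2
        return ts[:end] + ["Neg"] + ts[end:]
    return ts

def vp_ellipsis(ts):
    # (27): {T, M, have, be} - V phrase --> 1 - 0
    for i in range(len(ts) - 1):
        if ts[i] in TENSES | AUX and ts[i + 1] in VERBS:
            return ts[: i + 1]
    return ts

def tense_hop(ts):
    # Tense-hopping, exactly as needed for yes-no questions
    for i in range(len(ts) - 1):
        if ts[i] in TENSES and ts[i + 1] in AUX | VERBS:
            return ts[:i] + [ts[i + 1] + "+" + ts[i]] + ts[i + 2:]
    return ts

def do_support(ts):
    # Do-support, exactly as needed for yes-no questions
    return ["do+" + t if t in TENSES else t for t in ts]

def derive(ts, earlier_rule):
    # The earlier rule applies first; the later two rules are never modified.
    return do_support(tense_hop(earlier_rule(ts)))

print(derive(["John", "Past", "read", "books"], neg_placement))
print(derive(["Bill", "Pres", "read", "books"], vp_ellipsis))
```

In both derivations the earlier rule leaves Tense without an adjacent host, and the same `do_support` that served yes-no questions supplies do.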

Lecture #7- S and the X-Bar System
So far, we have the following phrase-structure rule as the initial phrase-structure rule in the grammar:
(1) S --> N'' T (M) (have) (be) V''
All of the other phrasal categories, however, are constructed according to the X-bar schema in (2):
(2) X'' --> (Z'') X'
    X'  --> { Y'' X'  }
            { X' Y''  }
            { X (Y'') }
There are several differences between the phrase-structure rule for S that is
given in (1) and the X-bar schema in (2), standing in the way of reformulating S
as an X'':
(i) There is no X under the S, i.e. a single obligatory category.
(ii) There is a string of non-phrasal elements directly under the S
as sisters, i.e. T, M, have, and be.
There is some evidence that modals, have, and be are not sisters, but rather
that have and be must head their own VPs. Consider the prediction that (1)
makes with respect to VP-ellipsis if we assume, as we did in Lecture #6, the
following formulation of VP-ellipsis:
(3) (VP-Ellipsis) { T    } - V''
                  { M    }
                  { have }
                  { be   }
                     1     - 2   -->  1 - 0
Assuming (1) would give us the phrase-marker in (5) for sentence (4):
(4) Bill would have been reading the book.
(5) [S [N'' Bill] [T Past] [M will] have been [V'' [V reading] [N'' [Det the] book]]]
If we apply VP-ellipsis to (5), however, we predict that all of the helping verbs
would have to remain. They can all remain, as in (6):
(6) Although John wouldn't have been reading the book, Bill would have
been __.
However, it is also possible to just leave the modal, or the modal and have, as in
(7), and this is not predicted by (5), assuming (3):
(7) a. Although John wouldn't have been reading the book, Bill would __.
b. Although John wouldn't have been reading the book, Bill would have __.
If we assume that the ellipses in (7) arise via deletion of V''s, we would have to
assume that the structure of (4) is (8), rather than (5):
(8) [S [N'' Bill] [T Past] [M will] [V''_0 [V have] [V''_1 [V been] [V''_2 [V reading] [N'' [Det the] book]]]]]
This would allow for any of the numbered V''s to delete, assuming (3).
A. Are Modals Tensed?
It has occasionally been suggested that modals are generated directly under T.
This would imply that modals are not themselves tensed. A competing view about modals
is that they can themselves be tensed, implying that they must be generated separately
from Tense, so that the rule of Tense-Hopping should really be formulated as in (9):
(9) T - { M    }
        { have }
        { be   }
        { V    }
    1 -    2     -->  0 - 2+1
First, it can be noted that the modals, while a closed class of items, seem to contain
pairs such as those in (10):
(10) will-would
can-could
shall-should
may-might
This by itself is not persuasive, given the closed nature of this class (there are
fewer than ten modals in the language).
More persuasive is the evidence from idioms. Idioms are sequences of
words whose meaning is non-compositional in nature (i.e., the meaning of
the whole idiom cannot be predicted from the meanings of the individual
words). Examples of idioms are phrases such as make headway, keep tabs
on, kick the bucket, keep track of. Examples are given in (10):
(10) a. John made headway. (means John progressed)
b. John kept tabs on Mary. (John kept apprised of Mary's situation)
c. John kicked the bucket. (John died)
d. John kept track of Mary. (same meaning as (b))
Recalling the role of the lexicon as the repository of idiosyncratic information, idioms,
by the very nature of their unpredictability and irregularity, must be listed in the lexicon.
Hence, an idiom such as, e.g., make headway, will have a lexical entry as in (11):
(11) make headway, [V make] [N headway]
The lexicon has unpredictable information, but recall from our discussion of plurals on
English nouns (i.e., you don't want to specify the plural of book in the lexicon since it's
predictable) that you want to keep the amount of information in the lexicon to the bare
minimum. In this connection, there are idioms that include the modal can: specifically,
the idioms can help but and can afford. The requirement that help but and afford occur
with can is seen in (12):
(12) a. *Did John help but notice?
b. *John afforded a new car.
Obviously, the lexical entries for these idioms will have to mention can, and will
look something like (13):
(13) a. can help but, [M can] [V help] [Conj but] V''
b. can afford, [M can] [V afford] N''
Interestingly enough, could can replace can in these two idioms:
(14)a. Can he help but notice?
b. Could he help but notice?
(15)a. John can afford a new car.
b. John could afford a new car.
If we posit could as a past tense variant of can, formed by Tense-hopping, we can
keep to the lexical entries in (13). If we don't, we would have to have a disjunctive
lexical entry for each of these two idioms:
(16) a. { can   } help but, { [M can]   } [V help] [Conj but] V''
        { could }           { [M could] }
     b. { can   } afford, { [M can]   } [V afford] N''
        { could }         { [M could] }
We would then have to answer the question of why the same set of elements appears in
two separate disjunctive statements (i.e., the two lexical entries in (16)), whereas if we
analyze could as a past tense variant of can, we do not have to posit a lexical entry that
leads to the posing of this question.
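The economy argument can be seen in a toy lexicon. This is a minimal sketch, assuming a hypothetical spell-out table that pairs a modal stem and a tense with a surface form; the table, the dictionary layout, and the function `matches_idiom` are illustrative inventions, not part of the analysis itself.

```python
# Hypothetical spell-out of Tense-hopped modals: (stem, tense) -> surface form.
SPELL_OUT = {
    ("can", "Pres"): "can", ("can", "Past"): "could",
    ("will", "Pres"): "will", ("will", "Past"): "would",
}

# One entry per idiom, as in (13): the modal is listed once, as the stem can.
IDIOMS = {
    "can help but": ["can", "help", "but"],
    "can afford": ["can", "afford"],
}

# Map each surface form back to its stem, so that tense is factored out.
STEM_OF = {form: stem for (stem, _tense), form in SPELL_OUT.items()}

def matches_idiom(words, idiom):
    """True if the surface words realize the idiom, ignoring tense on the modal."""
    stems = [STEM_OF.get(w, w) for w in words]
    return stems == IDIOMS[idiom]

print(matches_idiom(["could", "help", "but"], "can help but"))
print(matches_idiom(["could", "afford"], "can afford"))
print(matches_idiom(["would", "afford"], "can afford"))
```

Because could is related to can by the tense table, neither idiom entry has to list both forms, which is the saving that the disjunctive entries in (16) give up.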
B. The Position of The Modal
Earlier in this lecture, I proposed that the phrase-structure rule for S should be
revised to (17):
(17) S --> N'' T (M) V''

I would now like to propose that M heads its own projection as well, perhaps as V.
First, it is time to re-consider the treatment of sentential negation in English. Earlier, we
analyzed negatives as not being present in deep structures, but as being inserted after Tense and the first
helping verb. The rule was formulated as (19) in Lecture #6:
(19) N'' - T ( { M    } )
             ( { have } )
             ( { be   } )
      1  -       2         -->  1 2 - Neg
However, claiming that negatives will not be present, but rather inserted after T,
predicts that negatives will not be able to occur in clauses that apparently lack Tense.
Such clauses exist, however. Infinitives are a case in point (we will return to infinitives
in more detail later):
(19) For John to leave early.
Sentential negation precedes the to:
(20) For John not to leave early.
If we assume that negatives are only inserted after T, how do we then account for the
presence of negation in clauses that apparently lack T?
Another case that makes the same point is gerunds:
(21) a. John's eating steak bothered me.
b. John's not eating steak bothered me.
One proposal that has been made is due to Jean-Yves Pollock ("Verb Movement,
Universal Grammar, and the Structure of IP," Linguistic Inquiry, Vol. 20 (1989)). He
proposed that negation heads its own projection, so that there is a constituent Neg
Phrase (Neg''). If this is the case, we might adapt (17) to (22):
(22) S --> N'' T (M) { Neg'' }
                     { V''   }
     Neg'' --> Neg'
     Neg'  --> Neg V''
The deep structure (i.e., initial phrase-marker) of (23) would then be (24):
(23) John would not eat steak.

(24) [S [N'' John] [T Past] [M will] [Neg'' [Neg' Neg [V'' [V eat] [N'' steak]]]]]
Note, however, that in the infinitive, to follows the negative. Therefore, if we assume
that negation is a head that is lower than Tense in the phrase-marker, to cannot be a sister
to Tense, but rather must also be lower than Tense in the phrase-marker.
With this in mind, let us consider the distribution of modals in infinitives. They
are absent in infinitives, and, in fact, are the only helping verbs that do not appear in
infinitives. We can account for this fact if we analyze to as occurring in the same position
as modals, so that the consequence of the absence of modals in infinitives is simply a
consequence of the fact that, in English, we can only have one modal per simple
sentence.
We might, therefore, analyze modals as heads of their own projections. Let us
call them M''s. Therefore, the phrase-structure rule for S would be (25) (see footnote 3):
(25) S --> N'' T { Neg'' }
                 { M''   }
                 { V''   }
     Neg'' --> Neg'
     Neg'  --> Neg { M'' }
                   { V'' }
     M'' --> M'
     M'  --> M V''
Looking at (25), however, we see two elements that must appear in every S: N''
and T. N'', being a phrasal constituent, is not a possible head, but T is. We might
therefore view T as being a possible head of S, so that S would really be T''. However,
we would then have to posit a T', consisting of T and a following phrasal constituent,
which would be the complement of T. In other words, the structure of, e.g., (26)
would be (27):
(26) John likes pizza.
(27) [T'' [N'' John] [T' [T Pres] [V'' [V like] [N'' pizza]]]]
3 At this point, note that we have the same disjunction, {M'', V''}, in two separate phrase-structure
rules. There is a way to eliminate this, but we will not go into it at this time.
We can actually find somewhat direct evidence for the constituency of T and
the following phrasal unit, if we assume that only constituents can conjoin.
(This argument is originally due to Ray Dougherty in his (1970) article, "Recent
Studies on Language Universals," Foundations of Language, Vol. 5.)
We must find an element that we know resides in T, based on our analysis so
far. One such element is the do that results from Tense-Hopping being unable to
apply, as in (28):
(28) John does not like pizza.
As Dougherty points out, we can conjoin such sequences as those in (29):
(29) John does not like pizza and does not like steak.
Assuming a T' allows us to conjoin two instances of T', so that the structure of (29) would be (30):
(30) [T'' [N'' John] [T' [T' [T does] [Neg'' Neg [V'' [V like] [N'' pizza]]]] and [T' [T does] [Neg'' Neg [V'' [V like] [N'' steak]]]]]]
Hence, we have direct evidence for the constituency of T and a following phrase. If we
analyze S as the maximal projection of T, we can call this phrase a T', and analyze the
phrase following T as its complement. Hence, our phrase structure rules at the clausal
level are given in (31):
(31) T'' --> N'' T'
     T'  --> T { Neg'' }
               { M''   }
               { V''   }
     Neg'' --> Neg'
     Neg'  --> Neg { M'' }
                   { V'' }
     M'' --> M'
     M'  --> M V''
C. Restructuring
If we assume that negatives are generated between Tense and the main verb, and
are not placed there by a transformation of negative-placement, and we assume that the
helping verbs are generated lower, and to the right of, negatives, we must account for the
fact that the helping verbs precede, rather than follow, sentential negatives:
(31)a. John would not eat the steak.
b. *John not would eat the steak.
(32)a. John has not eaten the steak.
b. *John not has eaten the steak.
(33)a. John is not eating the steak.
b. *John not is eating the steak.
We might account for the ungrammaticality of (31)(b)-(33)(b) by noting that Tense-hopping
is blocked by the negation, but if that were the reason for the unacceptability of
the (b) examples, we would expect Do-support to be able to rescue them, contrary to
fact:
(34) *John did not will eat the steak.
(35) *John does not have eaten the steak.
(36) *John does not be eating the steak.
Rather, assuming that the negatives stay in the position in which they are generated by
the phrase-structure rules, we must move the helping verbs to the left of them. One way
to state this movement is as a movement of the helping verb to Tense, formulated as in
(37), giving the helping verbs a feature [+Aux] (for Auxiliary):
(37) Restructuring
T - (Neg) - [+Aux]
1 -   2   -    3     -->  3+1 - 2 - 0
Hence, the D-Structure of, e.g., (32)(a) would be as in (38):

(38) [T'' [N'' John] [T' [T pres] [Neg'' [Neg' Neg [V'' [V(+Aux) have] [V'' [V eaten] [N'' [Det the] steak]]]]]]]
Factoring the phrase-marker according to the structural description of restructuring gives
us the following factored phrase-marker (see footnote 4):
(39) [Tree diagram: the phrase-marker in (38), factored according to (37) as 1 = T (pres), 2 = Neg, 3 = V[+Aux] (have)]
4 We will return to the question of whether anything is left behind when a head moves out of its
maximal projection.

Finally, restructuring yields (40):
(40) [T'' [N'' John] [T' [T [V(+Aux) have] [T Pres]] [Neg'' [Neg' Neg [V'' [V'' [V eaten] the steak]]]]]]
Positing restructuring, as in (37), enables us to solve a problem about the
helping verb system that we have not talked about, but which is a problem of
descriptive adequacy of the earlier account. Recall that we formulated Subject-Helping Verb Inversion in terms of (41):
(41) T ( { M    } )
       ( { have } )
       ( { be   } )
Negative Placement was also formulated in terms of the set of elements in (41).
We no longer have a rule of negative placement, but it is still the case that
the same set of verbal elements that inverts in questions will appear before the
negative, and the earlier account treated this as an accident. Furthermore, (41) does not
form a constituent. However, if we posit the restructuring operation, we can
simply reformulate Subject-Helping Verb Inversion as in (42), in which case a
constituent is moving:
(42) N'' - T
      1  - 2   -->  2 - 1
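Restructuring and the simplified inversion rule can be sketched together. The code below is an illustration only, assuming flattened token lists with the subject first and a complex T written as a single token such as have+Pres; the names `restructure`, `is_T`, and `invert` are hypothetical.

```python
TENSES = {"Past", "Pres"}
AUX = {"will", "have", "be"}   # toy items bearing the feature [+Aux]

def restructure(ts):
    # (37): T - (Neg) - [+Aux]  -->  3+1 - 2 - 0
    for i, t in enumerate(ts):
        if t in TENSES:
            j = i + 2 if ts[i + 1 : i + 2] == ["Neg"] else i + 1
            if j < len(ts) and ts[j] in AUX:
                return ts[:i] + [ts[j] + "+" + t] + ts[i + 1 : j] + ts[j + 1 :]
    return ts

def is_T(token):
    # A bare tense or a restructured unit such as have+Pres counts as T.
    return token.split("+")[-1] in TENSES

def invert(ts):
    # (42): N - T --> 2 - 1; the single constituent T moves
    if len(ts) > 1 and is_T(ts[1]):
        return [ts[1], ts[0]] + ts[2:]
    return ts

negative = restructure(["John", "Pres", "Neg", "have", "eaten", "the", "steak"])
question = invert(restructure(["John", "Pres", "have", "eaten", "the", "steak"]))
print(negative)
print(question)
```

The same restructured unit precedes the negative in the first derivation and inverts in the second, so the shared behavior follows from one operation instead of from a list repeated in two rules.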

Lecture #8- NP-Movements
In this lecture, we will shift gears a bit and talk about another transformation in
English syntax, but in an area distinct from the helping verb system. Specifically, we
shall motivate a transformation that moves N''s into subject position, showing two
constructions in which this transformation is operative.
I. Passives
First, we shall examine English verbal passives, and we will show that English verbal
passives must be transformationally derived. To see this, we shall examine the
consequences of not assuming that English verbal passives are transformationally
derived, showing that generating English verbal passives directly via the phrase-structure rules would lead us to miss crucial generalizations.
An example of a passive is the following:
(1) John was visited by werewolves.
A. Correspondence With Active Transitive Verbs
It is clear, first of all, that passives, by and large, correspond to transitive active
verbs. Hence, we have pairs such as the following:
(2) a. John is hated by everybody.
b. Everybody hates John.
(3) a. John saw Sally.
b. Sally was seen by John.
(4) a. Lee Harvey Oswald assassinated JFK.
b. JFK was assassinated by Lee Harvey Oswald.
But not pairs like these:
(5) a. John laughed.
b. *John was laughed.
If actives and passives were generated independently by the phrase-structure rules, we would
have an expanded lexicon, and no account of the fact that passives corresponding to active
intransitives, such as (5)(b), are missing.
B. Thematic Roles
The noun phrases that a verb selects, as well as the subject, are said to be the
verb's arguments, and verbs differ on the semantic relations of the arguments to the
verb. For example, the verbs fear and frighten are both transitive, and yet the semantic
relations of the subject and object are reversed. The subject of fear is said to be the
experiencer of the emotion (fear being an emotive predicate), and the object is said to be
the theme. Hence, (6) and (7) are paraphrases:
(6) John fears thunder.
(7) Thunder frightens John.

The semantic relations that the arguments of the predicate bear to the predicate are
termed thematic relations. They include notions such as agent, theme, patient,
experiencer, etc. (for a lucid account of thematic relations, see R. Jackendoff (1987),
"The Status of Thematic Relations in Linguistic Theory," Linguistic Inquiry, Vol. 18).

The thematic relations that are exhibited in passive sentences are the same as the
thematic relations in the corresponding actives, but the thematic relations in passives
are simply realized in different positions. There is a simple algorithm (i.e., method of
computing) for the positions in which thematic relations in passives are realized:
(8) The passive subject bears the thematic relation of the post-verbal NP (see footnote 5) in the
corresponding active, and the passive object of by bears the thematic relation
of the active subject.
Clearly, it would be desirable to have the grammar of English reflect (8) in some
way, rather than stating (8) as a sort of post-hoc, after-the-fact observation.
C. Idiom Chunks
The third regularity between passives and actives concerns the form of idioms,
sequences of words which have meanings that are non-compositional in nature.
Examples of idioms are: keep track of, keep tabs on, make headway.
Sentences with idioms include such sentences as (9):
(9) a. John kept track of Sally.
b. John kept tabs on Sally.
c. John made significant headway.
By their very nature, idioms are unpredictable. Every language has idioms, and
recall that there is a specific place in the grammar to put unpredictable information: the
lexicon, which we have called the repository of idiosyncratic information. One way
of representing idioms is as in (10):
(10) a. track, N, +[keep____[P of X ]]
b. tabs, N, +[ keep___ [P on X ]]
c. headway, N, +[make ____]
Representing the idioms as in (10) reflects the fact that the sequence of words is not just an
isolated list of words, but rather that the words are sequenced in a way that conforms to the
general syntactic patterns of the language. In particular, NPs are generated after verbs, and
keep and make in the idioms above are formally verbs, and track, tabs, and headway are nouns
that are sequenced in the same way that nouns in non-idioms are sequenced.
The nouns in each of the three idioms above can appear as subjects of the verbs in the
passive voice:
(11) a. Careful track was kept of Sally.
b. Close tabs were kept on Sally.
c. Significant headway was made by John.
If we generated actives and passives separately, we would have to have disjunctive
subcategorization frames for these nouns, so that, e.g. (10)(c) would have to be modified to (12):
(12) headway, N, { +[make ___]    }
                 { +[___ be made] }
Clearly, disjunctive subcategorization frames are missing the relationship between actives
and passives. If we allow such disjunctive subcategorization frames, we would then have to ask
why we couldn't have a disjunctive frame for headway as in (13), for instance:
(13) *headway, N, { +[make ___]    }
                  { +[___ be seen] }
Generating actives and passives separately does not predict the non-occurrence of lexical
entries such as (13).
5 There is a reason that I am using the term post-verbal NP rather than object, in that I will be
trying to show in Lecture #10 that there are post-verbal NPs that are not objects, but which
participate in the passive construction.
II. The Solution- Generating Passives Transformationally
Chomsky (1957), in Syntactic Structures (Mouton), after noticing the above
regularities between English actives and passives, proposed to capture them by
not generating verbal passives directly via the phrase-structure rules, but
rather by forming passives from the phrase-markers for the corresponding active
sentences. The transformation was formulated as in (14):
(14) N - X - V - N
     1 - 2 - 3 - 4 →
     4 - 2 - be+en 3 - by+1
We immediately capture the fact that passives correspond to actives in
which the active verbs take post-verbal NPs, because of the mentioning of Ns as Term
#4 in the structural description of the passive transformation. Furthermore, if we
assume that thematic roles are assigned in deep structures, the correspondence stated
in (8) is accounted for. The passive subject is, in deep-structure, the NP that follows
the verb in the corresponding active, and since thematic roles are assigned in deep
structure, whatever thematic role the post-verbal NP got in deep structure will be
retained when it moves. Similarly, if the passive object of by is generated as the
deep-structure subject, whatever thematic role the subject was assigned at deep structure
will be retained if the subject is postposed in the passive transformation.
Finally, given that the distribution of idioms is stated in the lexicon, and lexical
information is only accessed at deep structure, an idiom chunk that is a postverbal
noun phrase will be permitted to move to subject position.
Hence, a transformational derivation of verbal passives will capture all of the
regularities described at the beginning of this section.
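As a rough illustration of the reasoning in this section, the structural change in (14) can be sketched as a function over a four-term factorization, with the idiom frames of (10) checked only against the deep structure. The names Factorization, PARTICIPLES, IDIOM_FRAMES, licensed_at_deep_structure, and passivize are hypothetical, the participle table is a toy stand-in for the morphology of en+V, and be is left uninflected; this is a sketch under those assumptions, not a full implementation of the rule.

```python
from typing import NamedTuple


class Factorization(NamedTuple):
    """The four terms of the structural description in (14)."""
    np1: str   # term 1: the subject
    x: str     # term 2: the variable X (e.g. tense); may be empty
    verb: str  # term 3: the verb, in its bare form
    np2: str   # term 4: the post-verbal NP


# Toy participle table (an assumption standing in for en + V).
PARTICIPLES = {"make": "made", "keep": "kept", "see": "seen"}

# Toy idiom lexicon in the spirit of (10): noun -> verb it must follow.
IDIOM_FRAMES = {"headway": "make", "track": "keep", "tabs": "keep"}


def licensed_at_deep_structure(f: Factorization) -> bool:
    """Check the idiom frame of the post-verbal noun, at deep structure only."""
    head_noun = f.np2.split()[-1]
    required_verb = IDIOM_FRAMES.get(head_noun)
    return required_verb is None or required_verb == f.verb


def passivize(f: Factorization) -> str:
    """Apply 1 - 2 - 3 - 4  ->  4 - 2 - be+en 3 - by+1."""
    if not licensed_at_deep_structure(f):
        raise ValueError("idiom frame not satisfied at deep structure")
    terms = [f.np2, f.x, "be", PARTICIPLES[f.verb], "by", f.np1]
    return " ".join(t for t in terms if t)


deep = Factorization("John", "Past", "make", "significant headway")
print(passivize(deep))  # significant headway Past be made by John
```

Because the frame is consulted before movement, a single entry for headway covers both the active and the passive, and a deep structure such as see + headway is rejected without any disjunctive frame.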
III.
On Restricting the Scope of the Passive Transformation
The passive transformation, as formulated in Section II, has a number of components:
(i) it preposes the post-verbal NP into subject position; (ii) it postposes the original
subject to the position after by; (iii) it inserts be+en and by. We shall now see that the
postposing of the subject, component (ii), and the insertion of be+en and by,
component (iii), are best viewed as not being transformations.
With respect to agent postposing, we can see that the grammar of English needs a
mechanism to give the object of by, in certain instances, the thematic role that it would have
received had it appeared in subject position, without movement being the way of accounting for
this dependency. Norbert Hornstein first noticed nominals such as (14) in S and the X-Bar
Convention, Vol. 3 (1977):
(14) John's portrait of Nixon by Warhol.
Warhol is, of course, interpreted as the agent of the verb related to the nominalization portrait,
i.e. portray, and John is interpreted as the owner of the portrait. However, Warhol could not have
moved from any other position within the nominal, since all of the other positions are occupied.
Hornstein's conclusion is that the agent that occurs as the complement of by must be generated in
that position, and that there must be a semantic mechanism that interprets agents in two places:
subject position, and the object position of by. Nominalizations such as (14) point to the
necessity of such a mechanism in some cases, and it would seem natural, given its necessity, to
posit it for verbal passives. This sheds new light on so-called truncated passives, which lack
by-phrases altogether, as in (15):
(15) John was murdered.
It had previously been thought that there was an underlying agent there which was
deleted by a transformation called Unspecified Agent Deletion, an optional
transformation which, if it did not apply, would yield (16):
(16) John was murdered by someone.
Another way of interpreting truncated passives is to say that by-phrases are
adjuncts, and that the subject thematic role is present but not linked to an argument.
Notice that I say that by takes whatever thematic role the subject would take. It has
occasionally been suggested that by marks agents. This cannot be right, however, in
view of passives such as (17):
(17) a. A crushing defeat was endured by Germany.
b. A glancing blow was suffered by John.
The subjects of the verbs endure and suffer are not agents.
Another problem exists with the idea that passive be is always inserted via a passive
transformation, in that we find passives without be:
(18) I want him given a book.
A rule deleting to be is occasionally invoked, so that the deep structure of (18) would
be the structure corresponding to (19):
(19) I want him to be given a book.
Presumably, to be deletion would assign a common deep structure to (20)(a) and
(20)(b) as well:
(20)(a) I consider him to be crazy.
(b) I consider him crazy.
However, we can find evidence against the rule of to be deletion if we consider
the English expletive there, which needs a verb such as be for its appearance:
(21) There is a valid reason for his absence.
We can have a full infinitival counterpart to (21) within a VP headed by consider, but the copula
must be retained:
(22) a. I consider there to be a valid reason for his absence.
b. *I consider there a valid reason for his absence.
Assuming that the expletive there requires the copula, we must ask why, if there is a
rule of to be deletion, the expletive would not be licensed at deep structure by the
copula, followed by the copula's deletion, yielding (22)(b). Because (22)(b) is not
acceptable, we can account for its unacceptability by not positing, for such instances
of secondary predication, a deep structure that contains be.
If there is no rule of to be deletion, we must conclude, then, that passives
contained in such larger structures as (18) must be derived without be having been
present in their formation.
Hence, the transformation involved in the formation of English verbal passives is
simply a transformation that preposes the post-verbal NP, formulated as in (23):

(23) N - X - V - N
     1 - 2 - 3 - 4 → 4 - 2 - 3 - 0
We will now see that (23), which we will call NP-Preposing, operates in a wider
range of constructions than just passives. In the next section, we will see the operation
of NP-Preposing in a class of superficially intransitive verbs that do not take passive
morphology at all.
IV. Unaccusatives

There is a great deal of evidence from other languages that superficially intransitive
verbs differ in the syntactic position of the one argument that occurs with the verb.
The sole argument generally acts as a surface structure subject, but for some verbs,
there is evidence that the surface subject is an underlying object, while the surface
subjects of other verbs are deep-structure subjects as well. Verbs of the latter class
take subjects that are agents, while verbs of the former class take subjects that are
non-agents. Examples of the two types of verbs are the verbs telephone and arrive:
(23) John telephoned.
(24) John arrived.
Therefore, the deep structures of (23) and (24) are (25) and (26), respectively:

(25) [T [N John] [T Past [V telephone]]]
(26) [T [N ___] [T Past [V arrive [N John]]]]
There is no evidence for this distinction in English, but there is a great deal of
evidence from other languages. Furthermore, there seems to be no basis for
learning this distinction in these other languages, and the two classes in each of
the languages that show overt evidence of the distinction seem to have the same
set of verbs.
We will first look at the evidence from Italian, as first discussed by David
Perlmutter (1978, Impersonal Passives and the Unaccusative Hypothesis,
Proceedings of the Berkeley Linguistic Society). Perlmutter noted that Italian
has two auxiliaries that are used for expressing past tense- the verbs avere
(roughly have) and essere (roughly be). Transitive verbs take avere in the past
tense:
(27) (L. Burzio (1986), Italian Syntax, Reidel, ex. (80)(a)):
L'artiglieria ha affondato due navi nemiche.
The artillery has (A) sunk two enemy ships.
Agentive intransitives take avere:
(28) (Burzio's (79)(b)): Giovanni ha telefonato.
Giovanni has telephoned.
Non-agentives, however, take essere as the past auxiliary, as do passives:
(29) (Burzio's (79)(a)): Giovanni è arrivato.
Giovanni has arrived.
(30) (Burzio's (81)(a)): Maria è stata accusata.
Maria has been accused.
The generalization that Perlmutter arrives at is the following:
(31) Essere is the past tense auxiliary in Italian when the surface structure subject
is not the underlying subject, and avere is the auxiliary that is used when
the surface structure subject is the underlying subject.
Generalization (31) receives immediate support from the fact that essere is the past tense
auxiliary for passives, as in (30), while avere is the auxiliary used for transitives,
assuming that subjects of transitives do not have any possible point of origin within the
verb phrase. If one looks at passives and transitives as the two types of verbs whose
subject origins are transparently justified, we can extrapolate from the auxiliary choice
for these two types of verbs to the two types of intransitives, with the non-agentive
intransitives patterning with passives (in that both take essere) while the agentive
intransitives pattern with transitives (in that both take avere).
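Generalization (31) amounts to a one-line decision rule, which can be sketched as follows. The table SURFACE_SUBJECT_IS_UNDERLYING and the function past_auxiliary are hypothetical names, and the classification of each verb class simply restates the assumptions made in the text; the infinitive forms in EXAMPLES are supplied for illustration.

```python
# For each verb class discussed above: is the surface subject also the
# underlying (deep-structure) subject? These values restate the text's
# assumptions; they are not derived from anything.
SURFACE_SUBJECT_IS_UNDERLYING = {
    "transitive": True,
    "agentive intransitive": True,
    "non-agentive intransitive": False,
    "passive": False,
}


def past_auxiliary(verb_class: str) -> str:
    """Generalization (31): avere if the surface subject is underlying, else essere."""
    return "avere" if SURFACE_SUBJECT_IS_UNDERLYING[verb_class] else "essere"


# Verbs of examples (27)-(30), keyed by infinitive (labels are illustrative).
EXAMPLES = {
    "affondare (27)": "transitive",
    "telefonare (28)": "agentive intransitive",
    "arrivare (29)": "non-agentive intransitive",
    "accusare, passive (30)": "passive",
}

for verb, verb_class in EXAMPLES.items():
    print(f"{verb}: {past_auxiliary(verb_class)}")
```

The point of the sketch is that the two intransitive classes need no separate stipulation: once their subjects are classified by underlying position, the same rule that covers transitives and passives covers them.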
Further support for this distinction in Italian comes from the distribution of the partitive
clitic ne (meaning of them). It can modify quantified objects (the quantified noun phrases that
ne modifies must follow ne, but Italian has subject-postposing, meaning that the subject can
appear in final position in the sentence), but not quantified subjects:
(32) (Burzio's 1.7a): Giovanni ne inviterà molti.
Giovanni of-them will invite many.
(33) (Burzio's 1.5iii): *Ne esamineranno il caso molti.
Of-them will examine the case many.
A postposed subject of a non-agentive intransitive may be modified by ne, while a postposed
subject of an agentive intransitive cannot be:
(34) (Burzio's 1.5i): Ne arriveranno molti.
Of-them will arrive many.
Many of them will arrive.
(35) (Burzio's 1.5ii): *Ne telefoneranno molti.
Of-them will telephone many.
We have the same distinction between agentive intransitives and non-agentive intransitives
that we had in the discussion of auxiliary selection. Furthermore, objects can be modified by
ne, as in (32). We can make sense of these facts if we say that the post-verbal noun phrases
that follow non-agentive intransitives are really not post-verbal subjects, but rather are objects
that have never been moved. The post-verbal noun phrases that follow agentive intransitives,
however, are postposed subjects, adjoined to the verb phrase. Hence, we can say that ne
can only modify objects.
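The analysis of ne can be sketched by distinguishing two post-verbal positions: the object position and the position of a postposed subject adjoined to the verb phrase. The class VerbPhrase and the function ne_can_modify are hypothetical names, and the structures assigned to (32)-(35) are the ones assumed in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbPhrase:
    verb: str
    object_np: Optional[str] = None         # NP in object position
    adjoined_subject: Optional[str] = None  # postposed subject adjoined to the VP


def ne_can_modify(vp: VerbPhrase, np: str) -> bool:
    """ne may be linked only to an NP that sits in object position."""
    return vp.object_np == np


invite = VerbPhrase("inviterà", object_np="molti")                # (32)
examine = VerbPhrase("esamineranno", object_np="il caso",
                     adjoined_subject="molti")                    # (33)
arrive = VerbPhrase("arriveranno", object_np="molti")             # (34)
telephone = VerbPhrase("telefoneranno", adjoined_subject="molti") # (35)

for vp in (invite, examine, arrive, telephone):
    mark = "" if ne_can_modify(vp, "molti") else "*"
    print(f"{mark}ne {vp.verb} ... molti")
```

On this sketch, (34) and (35) look alike on the surface but differ in which slot molti occupies, and that difference alone decides whether ne is possible.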
An intransitive verb whose sole argument is an underlying object is called an
unaccusative verb, while an intransitive verb whose sole argument is an underlying subject is
called an unergative verb. Russian also gives evidence for the unaccusative-unergative contrast,
based on observations of Leonard Babby (Existential Sentences in Russian (1980), Slavica
Publishers). Russian is a heavily Case-marked language, and the object is usually marked in
the accusative Case. However, when the main verb is negated, the object may be marked in
the genitive Case. This is termed the genitive of negation. However, subjects of
non-agentive intransitives may also be marked with the genitive of negation. Examples are given
in (36) and (37):
(36) (Babby's (4)(b)): V-nasem-lesu-ne-rastet-gribov.
In-our-forest-neg-grow (3rd sg.) mushrooms (GEN pl.)
There are no mushrooms growing in our forest.
(37) (Babby's (6)(b)): Ne-ostalos-somnenij.
Neg-remained (3rd n. sg.)-doubts (GEN pl.)
There were no doubts that remained.
Subjects of negated transitive verbs that are nominative in the affirmative cannot take the
genitive:
(39) (D. Pesetsky, Paths and Categories, unpublished Doctoral dissertation, MIT (1982), ex.
(15)):
a. ni odna gazeta ne pecataet takuju erundu.
Not one newspaper (fem. nom. sg.) NEG prints (3 sg.) such nonsense (fem. acc. sg.).
b. *ni odnoj gazety ne pecataet takuju erundu.
(fem. gen. sg.)
Also, agentive subjects of negated intransitive verbs cannot appear in the genitive:
(40) (Pesetsky's (9)):
a. v pivbarax kulturnye ljudi ne pjut.
In beerhalls refined people (masc. nom. pl.) NEG drink (3 pl.).
b. *v pivbarax kulturnyx ljudej ne pet.
(masc. gen. pl.) (3 sg.)
Babby's generalization is that those subjects that can appear in the genitive of negation are
in the scope of negation at D-Structure, and in fact are D-Structure direct objects. The
distinction between unaccusatives and unergatives draws this line correctly for
Russian. Passive subjects, as would be predicted, may also appear in the genitive of
negation:
(41) (Babby's (24)(a)):
ne-naslos-mesta.
NEG-be found (n. sg.)-seat/place (GEN n. sg.).
There was not a seat to be found.
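Babby's generalization can be sketched as a check on the D-Structure position of an argument rather than on its surface function. The names Argument and genitive_of_negation_ok are hypothetical, and the D-Structure positions assigned to the nouns of (36)-(41) are the ones the analysis in the text assumes.

```python
from typing import NamedTuple


class Argument(NamedTuple):
    form: str                  # the noun as it appears in the example
    d_structure_position: str  # "subject" or "object" (assumed by the analysis)


def genitive_of_negation_ok(arg: Argument, clause_negated: bool) -> bool:
    """Genitive is available only to a D-Structure object of a negated verb."""
    return clause_negated and arg.d_structure_position == "object"


EXAMPLES = [
    ("(36) unaccusative subject", Argument("gribov", "object")),
    ("(37) unaccusative subject", Argument("somnenij", "object")),
    ("(39b) transitive subject", Argument("gazety", "subject")),
    ("(40b) unergative subject", Argument("ljudej", "subject")),
    ("(41) passive subject", Argument("mesta", "object")),
]

for label, arg in EXAMPLES:
    mark = "" if genitive_of_negation_ok(arg, clause_negated=True) else "*"
    print(f"{label}: {mark}{arg.form} (GEN)")
```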
To sum up this section, there seems to be strong evidence from Italian and Russian
that some superficially intransitive verbs are subjectless but have underlying objects,
while other superficially intransitive verbs are objectless but have underlying subjects, and
that each class has the same members, abstracting across translation equivalents.
Furthermore, if we assume that children do not get corrected for ungrammaticality, it is
impossible to see how they would learn which verbs were unaccusative and which were
unergative. Therefore, we assume that the distinction is universal. Getting back to our
formulation of passive as simply being NP-preposing, we see that the rule of NP-preposing
that is given in (23), repeated here, will also work for unaccusatives.
(23) N - X - V - N
     1 - 2 - 3 - 4 → 4 - 2 - 3 - 0
We can view the term passive as a term that denotes a particular syntactic construction,
with particular pragmatic properties. We see, however, that it is inappropriate to term the
transformation that derives passives as the passive transformation, because it operates
more widely than just in passives. It operates in unaccusatives as well, and we would
therefore not call it a construction-specific transformation.
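To make the point concrete, here is a minimal sketch of (23) as a single function that applies to unaccusative and passive deep structures alike, but not to unergatives. The names np_preposing and spell_out are hypothetical. The sketch also assumes that the subject position is empty when the rule applies (an empty term is written None) and that the passive participle is generated directly rather than inserted by the rule; both are assumptions of the illustration.

```python
from typing import Optional, Tuple

# Terms 1-4 of the structural description in (23); None marks an empty term.
Terms = Tuple[Optional[str], str, str, Optional[str]]


def np_preposing(terms: Terms) -> Terms:
    """Apply 1 - 2 - 3 - 4  ->  4 - 2 - 3 - 0."""
    subject, x, verb, post_verbal_np = terms
    if post_verbal_np is None:
        raise ValueError("structural description not met: no post-verbal NP")
    if subject is not None:
        # Assumption of this sketch: the subject position must be empty.
        raise ValueError("subject position is already filled")
    return (post_verbal_np, x, verb, None)


def spell_out(terms: Terms) -> str:
    return " ".join(t for t in terms if t)


unaccusative = (None, "Past", "arrive", "John")
passive = (None, "Past", "be accused", "Maria")
unergative = ("John", "Past", "telephone", None)

print(spell_out(np_preposing(unaccusative)))  # John Past arrive
print(spell_out(np_preposing(passive)))       # Maria Past be accused
```

Nothing in the function mentions passive morphology, which is the sense in which the rule is not construction-specific.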


Lecture #9- Clausal Complementation


We will now concentrate on the syntax of clausal complementation, or, more
generally, the mechanism by which clauses function as arguments. Examples are given in
(1) and (2):
(1) (a) That John visited Sally bothered me.
(b) Bill claimed that John visited Sally.
(2) a. For John to visit Sally would bother me.
b. Bill would prefer for John to visit Sally.
A. The Generation of Complementizers
The words that and for which introduce these embedded sentences in English are called
complementizers. The complementizer that introduces finite clauses, and the complementizer for
introduces infinitives. Hence, we have the following pattern of acceptability:
(3) a. That John visited Sally.
b. *That John to visit Sally.
c. *For John visited Sally.
d. For John to visit Sally.
The earliest treatment of sentential complementation in modern syntactic theory was P.
Rosenbaum's A Grammar of English Sentential Complementation (MIT Press (1967)), and
Rosenbaum proposed that complementizers were not present in deep structure but, rather, were
inserted transformationally. The two transformations that Rosenbaum proposed were along the
lines of (4):
(4) a. T → that T
b. T → for-to T
    to - N - T
    1 - 2 - 3 →
    0 - 2 - 1
However, Joan Bresnan (1970) (On Complementizers: Toward A Syntactic Theory of
Complement Types, Foundations of Language Vol. 5) argued that complementizers should be
present in deep structure. Her arguments basically showed that complementizers are on a par with
prepositions. Just as different meanings are signalled by different prepositions, choice of
complementizer can affect meaning, as in the pair in (5):
(5)(a) John would hate it that Fred is more popular than him.
(b) John would hate it for Fred to be more popular than him.
Sentence (5)(a), in which the clausal complement is introduced by the that-complementizer,
presupposes the truth of the complement- in other words, the utterer of (5)(a) is committed to the
belief that Fred is more popular than John. Presupposition is defined as follows:
(6) A presupposes B if and only if B is true whenever A or the negation of A is true.
In other words, just as the speaker must believe that Fred is more popular than John in (5)(a),
the utterer of (7) is likewise committed to that belief:
(7) John would not hate it that Fred is more popular than him.
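Definition (6) can be checked mechanically over a small set of situations, provided that a sentence is allowed to lack a truth value when its presupposition fails. The sketch below models a sentence as a function from a world to True, False, or None (None meaning that neither the sentence nor its negation is true). The names presupposes, WORLDS, hate_that, hate_for_to, and fred_more_popular are hypothetical, the three worlds are invented for illustration, and treating (5)(b) as always having a truth value is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, Optional

World = Dict[str, bool]
Sentence = Callable[[World], Optional[bool]]  # None: neither true nor false


def presupposes(a: Sentence, b: Sentence, worlds: Iterable[World]) -> bool:
    """(6): B is true in every world where A or the negation of A is true."""
    for w in worlds:
        a_or_its_negation_is_true = a(w) is not None
        if a_or_its_negation_is_true and b(w) is not True:
            return False
    return True


WORLDS = [
    {"fred_more_popular": True, "john_hates_it": True},
    {"fred_more_popular": True, "john_hates_it": False},
    {"fred_more_popular": False, "john_hates_it": False},
]


def hate_that(w: World) -> Optional[bool]:
    """(5)(a): has a truth value only if Fred is in fact more popular."""
    return w["john_hates_it"] if w["fred_more_popular"] else None


def hate_for_to(w: World) -> Optional[bool]:
    """(5)(b): modelled here as always having a truth value (non-factive)."""
    return w["john_hates_it"]


def fred_more_popular(w: World) -> Optional[bool]:
    return w["fred_more_popular"]


print(presupposes(hate_that, fred_more_popular, WORLDS))    # True
print(presupposes(hate_for_to, fred_more_popular, WORLDS))  # False
```

Since the negation in (7) is defined in exactly the worlds where (5)(a) is, the check gives the same answer for (7) as for (5)(a).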
Verbs such as hate are termed factive (the term is due originally to Paul and Carol
Kiparsky in an important paper, Fact, which appeared in M. Bierwisch and K.
Heidolph (1970), Recent Progress in Linguistics, Mouton), in that they presuppose the
truth of their complements. The important point is that while that-complements may be

interpreted factively, for-to complements can never be, as noted by Kiparsky &
Kiparsky (1970).
A second point about the choice of complementizer is that it is lexically restricted,
a point made by Bresnan. While verbs such as hate can take either that-complements
or for-to complements, verbs such as claim can only take that-complements, and verbs
such as wait can only take for-to complements:
(8) a. John claimed that Fred was more popular than him.
b. *John claimed for Fred to be more popular than him.
(9) a. *John waited that Fred was more popular than him.
b. John waited for Fred to be more popular than him.
Therefore, if there were a complementizer-placement transformation, as proposed
by Rosenbaum, there would actually have to be two complementizer-placement
transformations-one to insert the that-complementizer, and the other to insert the for-to
complementizer. These transformations would have to be lexically restricted, so that
claim would trigger the that-placement transformation, wait would trigger the for-to
placement transformation, and hate would trigger either one, with a rule of
interpretation interpreting the that-complement as factive.
We have been making a division thus far between the lexicon, which contains
unpredictable information that is peculiar to particular lexical items, and the syntax,
which is more regular. Syntactic rules are thought to apply maximally generally, but
the marking of a large number of lexical items as to which of a family of
transformations apply to them undercuts this division between lexical (idiosyncratic)
and grammatical (systematic). Therefore, Bresnan proposes to base-generate
complementizers (i.e., generate them directly via the phrase-structure rules). She
originally proposed the phrase-structure rule in (10):
(10) S → Comp S
and the selection by particular predicates is now a simple matter of selection, rather
than features that trigger particular transformations. Hence, the lexical entries for
hate, claim, and wait would be as in (11):
(11) a. hate, V, +[___ [S [Comp {that, for} ]]]
b. claim, V, +[___ [S [Comp that ]]]
c. wait, V, +[___ [S [Comp for ]]]
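The entries in (11) reduce complementizer choice to a lookup in the lexicon, which can be sketched as follows. COMP_SELECTION and selects are hypothetical names; the table contains only the three verbs discussed in the text.

```python
# Lexical entries in the spirit of (11): verb -> complementizers it selects.
COMP_SELECTION = {
    "hate": {"that", "for"},
    "claim": {"that"},
    "wait": {"for"},
}


def selects(verb: str, comp: str) -> bool:
    """True if the verb's lexical entry allows a clause headed by comp."""
    return comp in COMP_SELECTION.get(verb, set())


# The pattern in (8) and (9): starred combinations are not selected.
for verb, comp in [("claim", "that"), ("claim", "for"),
                   ("wait", "that"), ("wait", "for")]:
    mark = "" if selects(verb, comp) else "*"
    print(f"{mark}{verb} + {comp}-complement")
```

No verb needs to be marked for which transformation it triggers; the idiosyncratic information stays in the lexicon, as the division between lexicon and syntax requires.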
Updating (10) into current X-bar terms, we would say that Comp is the head of this
clausal projection, so that (10) would be replaced by (12):
(12) a. C → C
     b. C → C T
Bresnan (1974) ( The Position of Certain Clause-Particles in Phrase-Structure,
Linguistic Inquiry Vol. 5) later provides direct evidence for the constituency in which
the rest of the sentence forms a constituent that is sister to the complementizer. There is
a construction known as the Right-Node-Raising construction, in which, if the rightmost
elements of the conjuncts in a conjoined phrase are identical, the final rightmost
element is set off intonationally by a pause, and the previous rightmost elements are
deleted. An example is (13):
(13) Mary wrote, and John performed, a beautiful Peruvian love song.
which is presumably related to (14):
(14) Mary wrote a beautiful Peruvian love song, and John performed a beautiful
Peruvian love song.
The structure of (13) is plausibly (15):

(15) [T [T [N Mary] [T Past [V wrote ___]]] and [T [N John] [T Past [V performed ___]]] [N a beautiful Peruvian love song]]

The assumption is that only constituents can appear in the position after the pause in
the Right-Node-Raising construction. With this in mind, Bresnan notes that the
sequence after the complementizer can appear in this position:
(16) I'm wondering whether, but I'm not sure that, your hypothesis is correct.
Hence, we have evidence for the constituency in which the complementizer is set off
from the rest of the clause.
B. For Infinitives
Let us now consider the position of the infinitive marker to. As we noted in
Lecture #7, it follows the sentential negation, and is incompatible with the presence
of a modal. The incompatibility of to with the modal could follow from the fact that
only one modal is permitted per clause if we analyzed the to itself as a modal.
We could therefore assume that the infinitive takes a null T, which selects to in
the modal position. Hence, the structure of (17) would be (18):
(17) For John to leave.
(18) [C [C For] [T [N John] [T [T 0] [M [M to] [V leave]]]]]

We can account for the dependencies between that-complementizers and finiteness, and
for-complementizers and non-finiteness, via the mechanism of selection, if we assume
that heads select for the heads of their complements (as I had argued in Heads and
Projections, in M. Baltin & A. Kroch, eds., Alternative Conceptions of Phrase
Structure (1989), University of Chicago Press). Hence, the following lexical entries
would suffice:
(19) that, C, +[___ [T {Pres, Past} ]]
(20) for, C, +[___ [T 0 ]]
(21) 0, T, +[___ [M to ]]
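The three entries above form a chain of head-to-head selection, and the pattern in (3) follows from walking that chain. The following is a minimal sketch; C_SELECTS_T, T_SELECTS_M, and well_formed are hypothetical names, and the last line of the function encodes an assumption not stated in the entries themselves, namely that a finite T does not combine with to.

```python
from typing import Optional

# (19) and (20): which T heads each complementizer selects.
C_SELECTS_T = {"that": {"Pres", "Past"}, "for": {"0"}}

# (21): the null T selects to in the modal position.
T_SELECTS_M = {"0": {"to"}}


def well_formed(comp: str, tense: str, modal: Optional[str] = None) -> bool:
    """Check C -> T selection, then T -> M selection."""
    if tense not in C_SELECTS_T.get(comp, set()):
        return False
    required_modal = T_SELECTS_M.get(tense)
    if required_modal is not None:
        return modal in required_modal
    # Assumption of this sketch: finite T never takes the infinitive marker.
    return modal != "to"


print(well_formed("that", "Past"))     # (3)(a) That John visited Sally.
print(well_formed("that", "0", "to"))  # (3)(b) *That John to visit Sally.
print(well_formed("for", "Past"))      # (3)(c) *For John visited Sally.
print(well_formed("for", "0", "to"))   # (3)(d) For John to visit Sally.
```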
C. Clauses in NP Positions
It is clear that clauses can appear in subject position. To see this, consider an
alternative way of expressing the previous sentence:
(21) That clauses can appear in subject position is clear.
It is also clear that clauses can appear in object position:
(22) John proved that Bill liked Sally.
It is also clear that clauses in object position can passivize:
(23) That Bill liked Sally was believed by everybody.

Rosenbaum proposed to account for the ability of clauses to appear in subject and
object position, as well as the ability of clauses to passivize, by positing a
phrase-structure rule as in (24):
(24) N → C
However, this violates X-bar theory, in that N is not a projection (ultimately) of
N0. J. Emonds (1976) (A Transformational Approach to English Syntax, Academic
Press) modified Rosenbaum's analysis by proposing that these clauses were actually
complements to a null N0 head, so that we have the phrase-structure rule as in (25):
(25) N → N C
Hence, the structure of, e.g., (21), would be as in (26):
(26) [C [T [N [N 0] [C That clauses can appear in subject position]] [T Pres [V be [A clear]]]]]
The need for a phrase-structure rule like (25) is transparently justified by the fact that a
handful of nouns such as fact and claim take clausal complements overtly:
(26) a. the fact that clauses can appear in subject position.
b. the claim that clauses can appear in subject position.
Positing such a phrase-structure rule will automatically account for the fact that these
clauses can passivize.
D. Extraposition
However, there is one fact about clausal complementation that we have not yet
accounted for, and that is the generalization in (27):

(27) For every sentence in which a clause appears in subject position, there is a
variant of the sentence in which the clause appears at the end of the sentence, and the
expletive it appears in subject position.
For example, we have the following pairs:
(28)a. For Fred to leave would bother me.
b. It would bother me for Fred to leave.
(29) a. That Fred is crazy is obvious.
b. It is obvious that Fred is crazy.
(30) a. That Fred has blood on his hands proves nothing.
b. It proves nothing that Fred has blood on his hands.
The exceptionless nature of this generalization strongly suggests that the grammar of
English should be formulated in such a way as to express it. There have been two main
approaches to capturing this generalization transformationally: extraposition and
intraposition. The extraposition approach moves the C rightward and inserts the
expletive it. The intraposition approach takes the variant in which the C is in
clause-final position as basic, and moves the C leftward into the subject position.
Extraposition was originally proposed by Rosenbaum, and Intraposition was proposed
by J. Emonds (1970) in his MIT Doctoral dissertation, Root, Structure-Preserving, and
Local Transformations.
Extraposition can be formulated as follows:
(31) [N it - C] - X - V
     1 - 2 - 3 - 4 →
     1 - 0 - 3 - 4+2
Hence, the D-structure of (28)(a) would be (32):
(32) [C [T [N [C For [T [N Fred] [T 0 [M to [V leave]]]]]] [T Past [M will [V bother [N me]]]]]]

Extraposition would then adjoin the C to the matrix V, yielding (33):

(33) [C [T [N It] [T Past [M will [V [V bother [N me]] [C For Fred to leave]]]]]]

Intraposition works in reverse: the underlying structure of (28)(a) and (b) would, under
the intraposition analysis, be (34):
(34) [C [T [N It] [T Past [M will [V bother [N me] [C for Fred to leave]]]]]]
Intraposition would be formulated as in (35):
(35) it - X - C
     1 - 2 - 3 →
     3 - 2 - 0
After applying intraposition to (34), the surface structure would be as in (36):

(36) [C [T [N [C For [T [N Fred] [T 0 [M to [V leave]]]]]] [T Past [M will [V bother [N me]]]]]]
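The two structural changes, (31) and (35), can be sketched side by side as operations on lists of terms; each maps one member of the pair in (28) onto the other. The names extraposition, intraposition, and spell_out are hypothetical, would is used as a stand-in for Past + will, and deleted terms are written None.

```python
from typing import List, Optional


def extraposition(it: str, clause: str, x: str, v: str) -> List[Optional[str]]:
    """(31): 1 - 2 - 3 - 4  ->  1 - 0 - 3 - 4+2 (clause adjoined to V)."""
    return [it, None, x, f"{v} {clause}"]


def intraposition(it: str, x: str, clause: str) -> List[Optional[str]]:
    """(35): 1 - 2 - 3  ->  3 - 2 - 0 (clause replaces it in subject position)."""
    return [clause, x, None]


def spell_out(terms: List[Optional[str]]) -> str:
    words = " ".join(t for t in terms if t)
    return words[0].upper() + words[1:] + "."


print(spell_out(extraposition("it", "for Fred to leave", "would", "bother me")))
# It would bother me for Fred to leave.
print(spell_out(intraposition("it", "would bother me", "for Fred to leave")))
# For Fred to leave would bother me.
```

The sketch shows only that the two rules are string-for-string inverses on these examples; the choice between them has to rest on the structural arguments that follow.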

The intraposition account generates the clausal argument within the V, and it is tied
to the independent motivation for positions within the V for clausal arguments. For
example, the intraposition analysis of (28)(b), in which the clausal argument is generated within
the VP, depends upon the phrase-structure rule in (37):
(37) V → V (N) (C)
and the analysis requires, for its plausibility, that there be independent instances of
this pattern in which the subject is something other than the expletive it. We can find
such independent instances of the pattern V N C. For example, we have the verbs
convince, tell, and persuade:
(38) a. John convinced Sally that she should leave.
b. John told Sally that she should leave.
c. John persuaded Sally that she should leave.
However, we have no instances of the pattern in (39), a verb followed by two
clausal arguments:
(39) * [V V C C]
Verbs with sentential subjects and complements exist, however (these are known
as bisentential verbs):
(39) a. That John has blood on his hands proves that he's the murderer.
b. That John has blood on his hands convinces me that he's the murderer.
c. That John has blood on his hands suggests that he's the murderer.
d. That John has blood on his hands indicates that he's the murderer.
e. That John has blood on his hands means that he's the murderer.
If clausal arguments are generated within the VP, as they are under the intraposition
analysis, we would need to generate the configuration in (39), but this configuration
would only be employed for verbs in which one of the arguments ended up in subject
position. We would then have to answer the question of why no verbs existed which
allowed both clausal arguments to remain inside the V, i.e. why there are no verbs of
the form in (40):
(40) * John glorped that Fred has blood on his hands that hes the murderer.
If we adopt the extraposition analysis, which allows for clauses to be generated in
subject position and moved rightward, we do not have this problem. We would
generate the clausal subjects in (39) in subject position, and only generate one clause
inside the V, making (39)(a-e) parallel to (41)(a-e) or (42)(a-e), in which the clausal
subject or object is replaced by a constituent that is clearly an N:
(41) (a) This proves that he's the murderer.
(b) This convinces me that he's the murderer.
(c) This suggests that he's the murderer.
(d) This indicates that he's the murderer.
(e) This means that he's the murderer.
(42) (a)That John has blood on his hands proves nothing.
(b) That John has blood on his hands convinces me of nothing.
(c) That John has blood on his hands suggests his guilt.

(d) That John has blood on his hands indicates his guilt.
(e) That John has blood on his hands means nothing.
I believe that a further argument can be made for allowing clauses to be generated in
subject position, and this argument deals with the possibility of formulating a set of
linking principles, principles that link thematic roles and syntactic positions. Recall
that, in Lecture #8, when we discussed unaccusatives, we noted that the same set of
verbs (i.e. translation equivalents of each other) were unaccusative and unergative, so
that agentive intransitives were unergative, and non-agentive intransitives were
unaccusative. The cross-linguistic predictability of the membership of the two classes
of verbs indicated strongly that Universal Grammar has some linking principles that
require this. A problem with the formulation of such linking principles, however, is
that some psychological predicates seem to exist which are paired in such a way that
the two members of the pair take the same set of arguments, and the same set of
thematic relations of the arguments, but the thematic relations of the arguments of
each verb are realized in the opposite positions from the other verb. The verbs fear and
frighten show this:
(43) a. John fears Sally.
b. Sally frightens John.
Each of these verbs takes an argument that is called an experiencer, the experiencer of
the emotion, as well as what could, for convenience, be called the theme, the object of
the emotion. However, the experiencer is the subject of fear but the object of frighten,
and the theme is the object of fear but the subject of frighten. If these two sentences are
synonymous, and hence have the same array of thematic relations for their arguments,
how could we say that there is a universal set of linking principles that allows us to
predict the syntactic position of an argument from its thematic relation?
Grimshaw (1990) (Argument Structure, MIT Press) provides a solution. She
claims that the synonymy of the pair in (43) is only apparent. In particular, she notes
that there is a grammatical difference between verbs in which the theme is the subject
and the experiencer is the object, and verbs with an experiencer subject and a theme
object. Verbs of the former class can appear in the progressive, while verbs of the latter
class cannot:
(44) a. *John is fearing Sally.
b. Sally is frightening John.
She ties this difference in progressivizability to the claim that object-experiencer
verbs are accomplishment verbs (recall the discussion in terms of Zeno Vendler's
classification in the early lectures), while subject-experiencer verbs are states. She
then posits a lexical representation for accomplishment verbs in which they are
decomposed into two parts: (i) an activity of causing, which results in (ii) a state.
Hence, the lexical representation of the meaning of, e.g., frighten would be cause to fear,
as in (45):
(45) frighten, V, [[CAUSE] [ENTITY]i [STATE [FEAR] [ENTITY]j]]
The linking principle, then, would link the causer to the subject position.
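A minimal sketch of how such a linking principle might be stated over lexical representations like (45) follows. PsychVerb, subject_role, and allows_progressive are hypothetical names. The fallback that links the experiencer to subject position when there is no causer simply restates what the text says about fear; it is not claimed to be Grimshaw's own formulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PsychVerb:
    form: str
    aspect: str            # "state" or "accomplishment"
    causer: Optional[str]  # role of the argument of CAUSE, if the verb has one


FEAR = PsychVerb("fear", aspect="state", causer=None)
FRIGHTEN = PsychVerb("frighten", aspect="accomplishment", causer="theme")


def subject_role(verb: PsychVerb) -> str:
    """Link the causer to subject position; otherwise the experiencer."""
    return verb.causer if verb.causer is not None else "experiencer"


def allows_progressive(verb: PsychVerb) -> bool:
    """States resist the progressive, as in (44)(a)."""
    return verb.aspect != "state"


for verb in (FEAR, FRIGHTEN):
    print(verb.form, subject_role(verb), allows_progressive(verb))
```

On this view the apparent mismatch in (43) disappears: the two verbs do not share a lexical representation, so a single linking principle can place the theme of frighten, as causer, in subject position.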

Getting back to the current concern, which is the underlying position of
apparently clausal subjects, there is an extremely noteworthy fact about bisentential
verbs. Every bisentential verb allows for the expression of an experiencer in
non-subject position (me or to me in the examples below):
(46) a. That John has blood on his hands proves to me that he's the
murderer.
b. That John has blood on his hands convinces me that he's the murderer.
c. That John has blood on his hands suggests to me that he's the
murderer.
d. That John has blood on his hands indicates to me that he's the
murderer.
e. That John has blood on his hands means to me that he's the murderer.
I would then suggest that each bisentential verb has, as at least part of its meaning,
something like cause to believe, and the sentential subject would be the cause of the
experiencer's belief. By the linking principle suggested by Grimshaw, then, the clausal
subject would be linked to the subject position.
In the next lecture, we will see more syntactic evidence for the extraposition
analysis over the intraposition analysis, but the focus of the next lecture will be on
infinitival complementation, and the generation of infinitives with no overt subjects.


Lecture #10- Infinitival Complementation


So far, we have seen evidence that the designated initial symbol in the
grammar is not S, or even T, but C, so that the designated initial symbol in
the grammar is a maximal projection of C (for complementizer). Embedded CPs
can be either finite (introduced by a that-complementizer) or non-finite
(introduced by a for-complementizer).

A. Understood Subjects
We will now look at infinitives that are not introduced by a for-complementizer,
and do not even seem to contain subjects. An example is the following:
(1) To leave would be inconvenient.
It is clear that, in some sense, there is an understood subject, and this can
be brought out if we add a benefactive phrase to the main clause:
(2) To leave would be inconvenient for Fred.
We can understand (2) to mean either that it would be inconvenient for Fred if he
himself were to leave, or it would be inconvenient for Fred if somebody else were
to leave. The fact that we understand a subject, however, does not mean that
the subject is present in the syntactic representation, i.e. the phrase-marker. We
assume that the grammar of natural language is, like the grammars of logical
languages, organized in such a way that the syntactic component generates
representations that are then interpreted by the semantics. Therefore, the fact
that a subject is understood does not mean that it is present syntactically.
Let us then look for some syntactic evidence that infinitives have subjects.
There are a number of theories of grammar that claim that infinitives do not
have syntactic subjects, but rather, subjects that are, as it were, plugged in, or
supplied, by the semantics. The subject is missing, in all of these approaches,
because there is no structural position for it. For example, we could generate
subjectless infinitives as Ms, with the phrase-structure rules that we have used
in (3):
(3) M --> M
    M --> M V
And we could generate them in the same way that we decided to
generate clauses that function as Ns, i.e. as in (4):
(4) N --> N ( { M } )
            ( { C } )
Hence, the D-structure of (1) would be as in (5):

(5) [Tree diagram; terminal elements: to leave (an M generated under N, in subject position), Past, will, be, inconvenient]

Notice that we have to complicate the phrase-structure rule for generating
clausal arguments, by using the curly brackets in (4) to generate either Ms or
Cs. However, the disjunctive phrase-structure rule in (4) fails to capture the fact
that, for individual predicates that take clausal arguments, every predicate that
allows for a full for-infinitive, which is a C, would also have to allow for a
subjectless infinitive, which is an M. To see this, notice that each of the
for-infinitives below is substitutable for a subjectless infinitive:
(6) a. I would prefer for John to leave.
b. I would prefer to leave.
(7) a. I was hoping for John to leave.
b. I was hoping to leave.
(8) a. I was waiting for John to leave.
b. I was waiting to leave.
(9) a. I would hate for John to leave.

b. I would hate to leave.
We could, of course, have subcategorization frames for prefer, hope, wait, and
hate as in the following:
(10) prefer, V, +[N [N 0] { [C [C for] ] }
                          { [M [M to] ] }
In evaluating the claim that subjectless infinitives are simply Ms, one might
note that disjunctive subcategorization frames are needed in any event. A case
in point is the subcategorization frame for become, which takes either an N or
an A.
(11) a. become, V, +[___ { N } ]
                         { A }
     b. He became { a lawyer    }.
                  { quite angry }
However, there is a crucial difference between a disjunctive subcategorization
frame for one verb, and a disjunctive subcategorization frame for every
predicate in the language that takes a given category A, such that every
predicate that takes the category A will also take the category B. For example, we
know that a disjunctive subcategorization frame is the appropriate mechanism
for expressing the combinatory possibilities of the verb become because there are
other environments in which only one of the categories with which become
combines can occur, such as the verb grow, which only subcategorizes for an A,
but not a N:
(12) He grew { * a lawyer    }.
             {   quite angry }
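The contrast between become and grow can be sketched as a small lookup, assuming, purely for illustration, that each subcategorization frame is stored as the set of complement categories a verb allows. The names SUBCAT and licenses are hypothetical and are not part of the formalism used in these lectures.

```python
# Hypothetical encoding of the frames in (11) and (12).
# The category labels N and A follow the lecture; the dictionary layout is an assumption.
SUBCAT = {
    "become": {"N", "A"},  # (11a): become takes either an N or an A
    "grow": {"A"},         # (12): grow takes an A but not an N
}


def licenses(verb: str, complement_category: str) -> bool:
    """Return True if the verb's frame allows a complement of this category."""
    return complement_category in SUBCAT.get(verb, set())


examples = [
    ("become", "N", "He became a lawyer."),
    ("become", "A", "He became quite angry."),
    ("grow", "N", "He grew a lawyer."),
    ("grow", "A", "He grew quite angry."),
]

for verb, category, sentence in examples:
    # Prefix a star when the frame does not license the complement.
    mark = "" if licenses(verb, category) else "* "
    print(f"{mark}{sentence}")
```

A disjunctive frame of this kind is stated once per verb, which is why it is harmless for become but costly if it must be repeated for every predicate that takes a for-infinitive.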
However, the disjunctive subcategorization frames that would be required for
the analysis of subjectless infinitives as Ms would be required for every
predicate in the language that takes a for-infinitive. Moreover, there are
for-infinitives that can occur as adjuncts; they are termed purpose clauses (R.
Faraci (1974), Aspects of the Grammar of Infinitives and For-Phrases,
unpublished doctoral dissertation, MIT):
(13) I bought it for Sally to play with __.
A subjectless infinitive can also occur as a purpose clause:
(14) I bought it to play with __.
Clearly, by our definition of a grammatical category as a class of elements that
are mutually substitutable in a sufficiently wide range of environments,
subjectless infinitives and for-infinitives are members of the same grammatical
class. Since the presence of the complementizer for suggests that the latter is a
C, we would seem to be required to analyze the subjectless infinitive as a C.
However, Cs are expanded by the phrase-structure rules in (15):

(15) a. C --> C
     b. C --> C T
     c. T --> N T
     d. T --> T { M }
                { V }
Notice that the phrase-structure rules in (15) provide a subject
position, and we must then ask why this subject position for the infinitive is not
overtly realized. We must also ask why the complementizer position is not
overtly realized.
In evaluating the claim of the plug-in theory of understood subjects of
infinitives, in which they are not syntactically present but instead supplied in
the semantics, and subjectless infinitives are generated as Ms, a further
complication arises with respect to the substitutability of for-infinitives and
subjectless infinitives. A for-infinitive cannot appear as the complement of a verb
if its subject is understood as identical to the main clause subject. English has a
form that expresses the identity of a noun phrase with another noun phrase in
the sentence, and this form is called the reflexive pronoun (we will be talking
more about reflexive pronouns shortly):
(16) John likes himself.
The identity of John and himself, termed referential identity because both terms
pick out the same individual, is usually expressed by superscripting an index
to the term that is the same as the index that is superscripted to the term with
which it is co-referential, and this device is termed co-indexing. An example is
(17):
(17) John_i likes himself_i.
We cannot use a for-infinitive, however, when the subject of the infinitive is
co-indexed with the main clause subject. We must use the subjectless infinitive:
(18) a. * He would prefer for himself to win.
b. He would prefer to win.
(19) a. * He would hate for himself to lose.
b. He would hate to lose.
(20) a. * He was hoping for himself to win.
b. He was hoping to win.
(21) a. * He was waiting for himself to leave.
b. He was waiting to leave.
If we adopt the plug-in view of understood subjects, and all that it entails, we
would still need a mechanism to prevent the generation of for-infinitives with
subjects that are co-referential with main clause subjects. In other words, given
that we can generate sentences with full for-infinitive complements, as in (22-25),
what prevents the (a) examples in (18-21)?
(22) He would prefer for John to win.
(23) He would hate for John to lose.
(24) He was hoping for John to win.
(25) He was waiting for John to leave.
Interestingly enough, there is a dialect of English, spoken in the Ozarks, in
which the for-complementizer shows up without an expressed subject of the
infinitive, when the understood subject of the infinitive is understood as being
co-referential with the main clause subject:
(26) % He was hoping for to win. (% means that the sentence is acceptable in this dialect.)
We might then account for the difference between Ozark English and Standard
English by positing a rule that obligatorily deletes a reflexive pronoun in the
subject position of an infinitive:
(27) [C for] - [N +refl]
        1    -     2      -->  1 - 0
and, subsequent to rule (27), which we will call Reflexive Deletion, a rule
that obligatorily deletes a for-complementizer next to the infinitive marker to:
(28) for - to
      1  -  2  -->  0 - 2
Rule (28) (For-Deletion) would be obligatory in Standard English.
By positing the rules of Reflexive Deletion and For-Deletion, and making them
obligatory, we can account for the fact that subjectless infinitives can occur
wherever for-infinitives can occur. The D-structure of, e.g., (18)(b) is simply (29):
(29) [Tree diagram; terminal string: He past will prefer [C for [N himself] to win]]

Reflexive Deletion then applies, yielding (30):

(30) [Tree diagram; terminal string: He past will prefer [C for to win]]

Finally, For-Deletion applies, yielding (31):
(31) [Tree diagram; terminal string: He past will prefer [C to win]]
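The derivation in (29)-(31) can be mimicked with a minimal sketch that treats a phrase-marker as a flat list of tagged words. The flattening, the tag names, and the switch apply_for_deletion (turned off to imitate Ozark English, on the assumption that For-Deletion simply fails to apply there) are all assumptions made for illustration; rules (27) and (28) themselves operate on phrase-markers, not on strings.

```python
from typing import List, Tuple

# A terminal is a (word, tag) pair. The tags "C" (complementizer),
# "N+refl" (reflexive noun phrase) and "M" (infinitive marker or modal)
# are simplified stand-ins for the categories mentioned in (27) and (28).
Terminal = Tuple[str, str]


def reflexive_deletion(terminals: List[Terminal]) -> List[Terminal]:
    """Rule (27): delete a reflexive that immediately follows the complementizer for."""
    out: List[Terminal] = []
    for item in terminals:
        if item[1] == "N+refl" and out and out[-1] == ("for", "C"):
            continue  # term 2 is replaced by 0
        out.append(item)
    return out


def for_deletion(terminals: List[Terminal]) -> List[Terminal]:
    """Rule (28): delete the complementizer for when it is immediately followed by to."""
    out: List[Terminal] = []
    for i, item in enumerate(terminals):
        following = terminals[i + 1] if i + 1 < len(terminals) else None
        if item == ("for", "C") and following == ("to", "M"):
            continue  # term 1 is replaced by 0
        out.append(item)
    return out


def derive(terminals: List[Terminal], apply_for_deletion: bool = True) -> str:
    """Apply Reflexive Deletion, then (optionally) For-Deletion, and spell out the result."""
    result = reflexive_deletion(terminals)
    if apply_for_deletion:
        result = for_deletion(result)
    return " ".join(word for word, _ in result)


with_reflexive = [("He", "N"), ("would", "M"), ("prefer", "V"), ("for", "C"),
                  ("himself", "N+refl"), ("to", "M"), ("win", "V")]
with_john = [("He", "N"), ("would", "M"), ("prefer", "V"), ("for", "C"),
             ("John", "N"), ("to", "M"), ("win", "V")]

print(derive(with_reflexive))                            # standard pattern, as in (18b)
print(derive(with_reflexive, apply_for_deletion=False))  # pattern of the type in (26)
print(derive(with_john))                                 # full for-infinitive, as in (22)
```

The ordering matters in this sketch: For-Deletion can only see for next to to after Reflexive Deletion has removed the intervening reflexive.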
B. Verbs of Obligatory Control
The generalization that a subjectless infinitive can appear wherever a
for-infinitive can appear is exceptionless, but the converse is not true: there are
environments in which a subjectless infinitive can appear but a for-infinitive
cannot appear. Examples are the complements of the verbs try and attempt:
(32) a. *He tried for Fred to leave.
b. He tried to leave.
(33) a. * He attempted for Fred to leave.
b. He attempted to leave.
We can account for this by positing a lexical feature for the relevant
predicates which stipulates that the subjects of their complements must be
identical to their own subjects. These verbs are known as verbs of obligatory
control, with control being defined as the phenomenon whereby an element
must be anaphoric to some other element in the phrase-marker.
Exercises:
1. How does the existence of passive infinitives, as in (i), choose between the
plug-in theories of subjectless infinitives and the analysis of subjectless
infinitives as arising through reflexive deletion?
(i) John wants to be visited by werewolves.
2. Under the analysis that posits reflexive deletion, show the derivation of (i).
What would be the ordering of N preposing and reflexive deletion?
C. Subject-to-Subject Raising
Of the predicates that take obligatorily subjectless infinitives, it can be
shown that the process by which the infinitive comes to lack its subject is not
always reflexive deletion. It is also possible for the subject to have moved out of
the infinitive.
To see this, consider verbs such as try and attempt, on the one hand, and verbs
such as seem and appear, on the other:
(34) a. John tried to be happy.
b. * The car tried to be heading toward us.
c. *There tried to be a good reason for that.
d. *Headway tried to have been made.
Compare seem:
(35) a. John seemed to be happy.
b. The car seemed to be heading toward us.
c. There seemed to be a good reason for that.

d. Headway seemed to have been made.
Apparently, the verb try imposes restrictions on its subject (specifically, the
subject must be animate, and hence capable of being an agent), while the verb
seem imposes no such restrictions on its subject. When seem takes an infinitive,
the subject of seem gets no restrictions from seem itself. Any noun phrase can be
the subject of seem provided that it can be the subject of the infinitive predicate.
Hence, the expletive there requires, in simple finite clauses, the verb be for
its appearance:
(36) There is a good reason for that.
(37) * There became a good reason for that.
And, although (35)(c) is acceptable, (38) is not:
(38) * There seemed to become a good reason for that.
We note, further, that the verb seem can also take a finite complement. When
it takes a finite complement, however, its subject must be the expletive it:
(38) It seems that there is a good reason for that.
Furthermore, when we look at idiom chunks, it will be recalled, in our
discussion of passives, that idioms had to be listed as such in the lexicon.
Specifically, the optimal lexical entries for the idioms keep track of, keep tabs on,
and make headway, were given as (10) in Lecture #8, repeated here:
(10) a. track, N, +[keep ___ [P of X]]
b. tabs, N, +[keep ___ [P on X]]
c. headway, N, +[make ___]
Now consider the fact that these idiom chunks can appear as subjects of the
verb seem, when seem takes an infinitival complement. We have already seen this in
(35)(d). Parallel to (35)(d) is (39):
(39) a. Careful track seemed to have been kept of his progress.
b. Careful tabs seemed to have been kept on Monica.
If the infinitive complement of seem were to come to lack its subject by the rule of
reflexive deletion, we would have to generate the antecedent of the reflexive as the
subject of seem. Hence, the derivation of, e.g., (39a) would have to take (40) as the
D-structure:

(40) [Tree diagram; terminal string: Careful track Past seem [C for [N e] to have been kept careful track of his progress]]
We would then need a mechanism to convert the second occurrence of the idiom
chunk careful track to a reflexive, after which it would undergo N-preposing to the
empty subject position of the infinitive, where it would undergo reflexive deletion,
and the for would undergo for-deletion.
However, we would be violating our lexical requirements on the occurrence of
these idiom chunks by generating them as subjects of seem.
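The problem can be made concrete with a minimal sketch that checks, for each D-structure occurrence of an idiom chunk, whether it is the object of the verb named in its lexical entry. The dictionary below paraphrases the entries in (10) of Lecture #8 and ignores the prepositional part of the frames; the function names and the encoding of positions are hypothetical.

```python
from typing import List, Optional, Tuple

# Verb that each idiom-chunk noun must be generated as the object of,
# paraphrasing (10) of Lecture #8 (the [P of X] and [P on X] parts are omitted here).
IDIOM_ENTRIES = {"track": "keep", "tabs": "keep", "headway": "make"}

# A D-structure occurrence: (noun, verb it is the object of, or None if it is a subject).
Occurrence = Tuple[str, Optional[str]]


def licensed(noun: str, selecting_verb: Optional[str]) -> bool:
    """Return True if this occurrence satisfies the noun's lexical entry."""
    required = IDIOM_ENTRIES.get(noun)
    if required is None:
        return True  # ordinary nouns carry no idiom requirement
    return selecting_verb == required


def analysis_ok(occurrences: List[Occurrence]) -> bool:
    """An analysis is acceptable only if every D-structure occurrence is licensed."""
    return all(licensed(noun, verb) for noun, verb in occurrences)


# (39a) under the reflexive-deletion analysis, as in (40): the idiom chunk is
# generated once as the subject of seem and once as the object of keep.
reflexive_deletion_analysis = [("track", None), ("track", "keep")]

# (39a) under a movement analysis: the idiom chunk is generated only as the object of keep.
movement_analysis = [("track", "keep")]

print(analysis_ok(reflexive_deletion_analysis))
print(analysis_ok(movement_analysis))
```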
It would also be desirable to relate the use of seem with the finite complement and
expletive subject to the use with the infinitive. Let us now try to do this.
Suppose we give seem the lexical entry in (41):
(41) seem, V, +[___C]
We must make one stipulation. Given that the overt complementizer for never shows up in this
type of infinitive construction, we actually have no evidence that the infinitive is introduced by for
here. We do have evidence, as we have just seen, that the subject of seem, when seem takes
an infinitive, is, for all intents and purposes, the subject of the infinitive. Furthermore, we have
seen that seem, when it takes a finite complement, lacks a subject in the semantic sense. Let us
then generate seem without a semantic subject in both instances, so that the D-structure of, e.g.
(35)(a), would be (42):
(42) [Tree diagram; terminal string: [N e] Past seem [C [N John] to be happy]]
The symbol e simply means empty, i.e. an unexpanded node in the phrase-marker. We then
apply N-preposing, the same transformation that was employed in the derivation of passive and
unaccusative constructions, to move the subject of the infinitive into the subject position of seem.
Recall that the formulation of N-preposing was given in Lecture #8, (23), repeated here:
(23) N - X - V - N
     1 - 2 - 3 - 4  -->  4 - 2 - 3 - 0
In order to have the phrase-marker in (42) meet the structural description of N-preposing, we
must disregard the intervening complementizer. Let us therefore, for the moment, assume that
null elements are not factored as being present when inspecting phrase-markers for compatibility
with the structural descriptions of transformations.
Therefore, N-preposing will apply, yielding (43):
(43) [Tree diagram; terminal string: [N John] Past seem [C to be happy]]
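The step from (42) to (43) can be sketched as an operation on a flat list of constituents. This is only an illustration under several assumptions that are not in the lecture: the phrase-marker is flattened into (category, word) pairs, an empty node is written as None, and when more than one empty N is available the rule is tried on the rightmost one first (so that the more deeply embedded clause is handled first).

```python
from typing import List, Optional, Tuple

# (category, word); a word of None marks an empty node such as e or a null complementizer.
Constituent = Tuple[str, Optional[str]]


def n_preposing(cons: List[Constituent]) -> List[Constituent]:
    """Sketch of (23): N - X - V - N, 1-2-3-4 --> 4-2-3-0.

    Term 1 is an empty N, term 3 a V to its right, and term 4 the first
    filled N after that V. Null elements between terms 3 and 4 are skipped,
    following the assumption that null elements are not factored.
    """
    empties = [i for i, (cat, word) in enumerate(cons) if cat == "N" and word is None]
    for i in reversed(empties):  # assumption: try the rightmost empty N first
        for v in range(i + 1, len(cons)):
            if cons[v][0] != "V":
                continue
            j = v + 1
            while j < len(cons) and cons[j][1] is None:
                j += 1  # disregard null elements
            if j < len(cons) and cons[j][0] == "N":
                result = list(cons)
                result[i] = cons[j]     # term 4 fills the position of term 1
                result[j] = ("N", None)  # term 4's original position is emptied
                return result
    return list(cons)  # structural description not met


def terminal_string(cons: List[Constituent]) -> str:
    return " ".join(word for _, word in cons if word is not None)


# Flattened version of (42).
d42 = [("N", None), ("T", "Past"), ("V", "seem"), ("C", None),
       ("N", "John"), ("M", "to"), ("V", "be"), ("A", "happy")]
print(terminal_string(n_preposing(d42)))

# Flattened version of the D-structure of "Headway seemed to have been made":
# two applications are needed.
d44 = [("N", None), ("T", "past"), ("V", "seem"), ("C", None), ("N", None),
       ("M", "to"), ("V", "have"), ("V", "be"), ("V", "make+en"), ("N", "headway")]
print(terminal_string(n_preposing(n_preposing(d44))))
```

Nothing in the function mentions passive, unaccusative, or raising; the same operation applies whenever the structural description is met.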
The analysis of the subject position of seem in the infinitive as coming to be occupied
by the employment of N-preposing now makes the derivation of (35)(d), repeated here,
straightforward.
(35)(d) Headway seemed to have been made.

It simply involves two applications of N-preposing to the D-structure in (44):
(44) [Tree diagram; terminal string: [N e] past seem [C 0 [N e] to have be make+en headway]]
There is a significant aspect to this account of passives, unaccusatives, and
subject-to-subject raising. The terms passive, unaccusative, and subject-to-subject
raising do not play a role in the grammar at all; there is a single transformation,
N-preposing, which plays a role in the generation of all three constructions. The
grammar, then, can be said not to pay attention to particular constructions, and
these three constructions have only an expository use.
Furthermore, in our discussion of grammatical relations versus grammatical
categories in Lecture #4, we noted that phrase-markers do not explicitly
represent grammatical relations. We are now in a position to see why that
decision has been made.
Note that the rule of N-preposing does not just move objects; it also moves
subjects of infinitive complements. In a certain sense, transformations are
structure-dependent rather than function-dependent (terms due to Joan Bresnan
(1976), On the Form and Functioning of Transformations, Linguistic Inquiry,
Vol 7). They simply move grammatical categories in the right structural
positions. The claim, and it is a strong one, is that there are no rules that, say,
move subjects, or objects, but only rules that move such elements as Ns, Ps,
etc.
Object Infinitival Complementation
We will now see that the mechanisms that we have motivated for the syntax of infinitives
on the basis of sentences that consist, on the surface, of a contentful subject and an
infinitive in complement position, will work without further ado for sentences that
superficially consist of a subject and, following the verb, an N followed by an
infinitive. Examples are given in (1) and (2):
(1) John { persuaded } Sally to be polite.
         { ordered   }
         { convinced }
(2) John { believed } Sally to be polite.
         { proved   }
         { expected }
Although the strings in (1) and (2) are all identical save the choice of main
verb, when one looks further, one sees a significant difference in the range of
N plus infinitival sequences that are permitted to follow each of the verbs in
(1), as opposed to the class of verbs in (2). Specifically, the N that follows a
verb of the first class (which we will refer to for now as the persuade-class for
ease of exposition) must be animate. In particular, it must be interpreted as the
agent of the following infinitive in some sense:
(3) John { persuaded } * { the rock to be on the table }.
         { ordered   }   { there to be a valid reason for his absence }.
         { convinced }   { Fred to be six feet tall }.

No such restriction exists for the verbs of the second class (called for
expository convenience the believe-class); any N infinitive sequence is
possible, provided that the N can be interpreted as the subject of the infinitive.
Hence, the star is removed for all of the examples in (3) if the verb is of the
believe-class:
(4) John { believed } { the rock to be on the table }.
         { proved   } { there to be a valid reason for his absence }.
         { expected } { Fred to be six feet tall }.
We see, then, that while the N that follows a verb of either class must be
interpreted as the subject of the infinitive that follows, a verb of the
persuade-class imposes thematic restrictions on the N as well, while a verb of the
believe-class does not. Can we deduce anything about the structure of sentences
containing such verbs from these co-occurrence facts?
With respect to verbs of the persuade-class, we can deduce that the post-verbal
N is not syntactically the subject of the following infinitive, but rather is
in the structural position of the object. We can deduce this from the following
constraint on locality of theta-marking:
(5) Principle of Locality of Theta-Marking:
If α theta-marks (i.e. assigns a theta-role to) β, then α and β must be
sisters.
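The sisterhood requirement in (5) can be checked mechanically on a toy tree. In the sketch below a tree is a nested list whose first member is a category label, and two constituents count as sisters when some node has both as immediate daughters. The two verb-phrase fragments are simplified stand-ins for the competing structures for persuade (post-verbal N as object versus post-verbal N as subject of the infinitive); the encoding and the function names are assumptions made for illustration.

```python
from typing import Union

# A leaf is a word; a node is [label, daughter, daughter, ...].
Tree = Union[str, list]


def words(tree: Tree) -> str:
    """Return the terminal string dominated by a node."""
    if isinstance(tree, str):
        return tree
    return " ".join(words(child) for child in tree[1:])


def are_sisters(tree: Tree, a: str, b: str) -> bool:
    """True if some node has one daughter spelling out a and another spelling out b."""
    if isinstance(tree, str):
        return False
    daughters = [words(child) for child in tree[1:]]
    if a != b and a in daughters and b in daughters:
        return True
    return any(are_sisters(child, a, b) for child in tree[1:])


# Post-verbal N as object of persuade, followed by a clausal complement.
object_structure = ["V", ["V", "persuade"], ["N", "Sally"],
                    ["C", "for", ["T", ["N", "herself"], ["M", "to", "be", "polite"]]]]

# Post-verbal N as subject of the infinitive, inside the clausal complement.
subject_structure = ["V", ["V", "persuade"],
                     ["C", ["T", ["N", "Sally"], ["M", "to", "be", "polite"]]]]

print(are_sisters(object_structure, "persuade", "Sally"))
print(are_sisters(subject_structure, "persuade", "Sally"))
```

Only in the first fragment could persuade theta-mark Sally under (5), which is the reasoning used in the text to choose between the two structures.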
We can see evidence of (5) by examining sentences containing clausal
complements that are overtly marked by complementizers. Turning first to verbs
that take complements that are introduced by the complementizer for, we see
that the matrix verb never restricts the content of the infinitive in any way, let
alone restricting the subject position of the infinitive:

(6) John would { prefer } for { the rock to be on the table }.
               { hate   }     { there to be a valid reason for his absence }.
               { love   }     { Fred to be six feet tall }.
               { hope   }
               { wait   }
When we turn our attention to that-complements, we see that that-complements
are similarly unrestricted by the predicates that select them:
(7) John { claimed   } that { the rock was on the table }.
         { knew      }      { there was a valid reason for his absence }.
         { said      }      { Fred was six feet tall }.
         { indicated }
We see, then, that for cases in which we know that an N is not a sister to the
head of the phrase in which the N resides ( since the N is preceded by a
complementizer), the N is never assigned a theta-role by the head. Therefore,
because the N that follows a verb of the persuade-class is assigned a theta-role by
that head, it must be a sister to the verb. In short, it must be the object of the
verb, rather than the subject of the infinitive. Hence, the structure of, e.g., (1)(a)
must be (8) rather than (9):

(8) [Tree diagram; terminal string: John Past persuade [N Sally] [C for [N herself] to be polite]]
(9) [Tree diagram; terminal string: John past persuade [C [N Sally] to be polite]]

We can then use the rules of reflexive deletion and for-deletion to derive the
structure for (1)(a).
For verbs of the believe-class, however, the matrix verb does not assign a
theta-role to the post-verbal N. Note that the Principle of Locality of
Theta-Marking only states a necessary condition for theta-marking; in order to be
theta-marked, the N must be a sister to the element that theta-marks it. We
might ask, however, whether sisterhood is a sufficient condition for theta-marking, in
the sense that if an element is a sister to a lexical head, the head would assign a
theta-role to the element. If we could establish that sisterhood entails
theta-marking, we would then be in a position to establish the structures of sentences
containing believe-type verbs: the post-verbal N would have to be the subject of
the following infinitive, rather than the object of believe. Therefore, the
structure of, e.g., (2)(a) would have to be (10):
(10) [Tree diagram; terminal string: John past believe [C 0 [N Sally] 0 to be polite]]
There is some evidence that sisterhood entails theta-marking, as pointed out by
Chomsky (Lectures on Government and Binding (1981), Foris Publications).
Specifically, expletives appear in subject position, but not in object position.
Therefore, we have no intransitive verbs that take an expletive object, such as
the hypothetical laugh (having the thematic structure of laugh, but taking an
expletive object):
(11) a. * John laughed there.
b. * John laughed it.
We might therefore propose that sisterhood, the environment for
subcategorization, entails theta-marking, requiring the structure in (10).
Proposals have been made in the literature, however, notably by Paul Postal
(On Raising (1974), MIT Press), that, while the post-verbal N may originate as
the subject of the infinitive complement of a believe-type verb, it becomes the
object by a transformation that is called Subject-to-Object Raising, which would
be formulated as follows:
(12) N - [C - N - X ]
     1 -  2 - 3 - 4   -->  3 - 2 - 0 - 4
We can concretize the analysis by positing an empty N position, as in (13):

(13) [Tree diagram; terminal string: John past believe [N e] [C 0 [N Sally] 0 to be polite], with the empty N in object position]
We must ask what the evidence is for subject-to-object raising, which would alter the
structure but not the terminal string (as does restructuring of the helping verbs into T).

The best argument for subject-to-object raising concerns the placement of adverbs that
must modify the main clause, as in (14):
(14) I believe John with all my heart to be guilty.
The adverb obviously refers to the speaker's belief. Now, let us consider a verb that
takes an infinitive complement with the for-complementizer, as in (15):
(15) I would prefer for John to be the winner.
Because the complementizer for is present, we assume that the N that
immediately follows is within the infinitive. Notice, however, that an adverb which
modifies the main clause cannot intervene between the post-verbal N and the
infinitive marker when for is retained:
(16) * I would prefer for John with all my heart to be the winner.
When the for is deleted, however, a main clause adverb can occur there more
naturally.
(17) I would prefer John with all my heart to be the winner.
It would seem, therefore, that an adverb must occur in the clause that it modifies.
Therefore, the N must be in the matrix clause, according to Postal's argument.
One argument for subject-to-object raising that does not go through relies on the
assumption that a reflexive must be in the same clause as its antecedent,
known as a clause-mate condition. Evidence for the clause-mate condition can be seen in
the ungrammaticality of sentences containing reflexives when this condition is not met:
(18) a. * John thinks that nobody likes himself.
b. *John would prefer for himself to win.
Reciprocals in English seem to be subject to the same distributional constraints
as reflexives:
(19) a. * They think that nobody likes each other.
b. * They would prefer for John to see each other.
However, reciprocals are clearly not subject to a clause-mate condition:
(20) They would prefer for each other to win.
We will return to the distribution of reciprocals and reflexives and their antecedents. It
is an extremely important topic in current syntactic theory, and we will account for the
ungrammaticality of such examples as (18) and (19) in a different way.
Another argument for subject-to-object raising is based on the interpretation of
logical words such as every and not (called logical operators). Consider the
interpretation of a sentence such as (21):
(21) Every boy did not read the book.
Many people say that (21) is ambiguous, and can have either the interpretation in
(22) or (23):
(22) Not every boy read the book.
(23) No boy read the book.
The two interpretations are said to correspond to a difference in the scope (roughly,
the logical jurisdiction) of the two logical operators every and not. In the interpretation
corresponding to (22), the negative is said to take wide scope relative to every, and
every (called a universal quantifier) is said to take narrow scope. In (23), the
universal quantifier is said to take wide scope relative to the negation, and the negative
is said to take narrow scope. The assumption is that there is a mapping procedure
between these expressions in natural language and a logical language which provides the
basis of their semantic interpretation. The logical language is called Logical Form.
The scope of negation in Logical Form corresponds to the clause in which it is contained.
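The difference between the two readings of (21) can be seen by evaluating each one against a small model. The sketch below is only an illustration: the boys' names and the sets of readers are invented, and the two functions are one straightforward way of writing the readings paraphrased in (22) and (23).

```python
from typing import Set


def not_over_every(boys: Set[str], readers: Set[str]) -> bool:
    """Reading (22): negation takes wide scope. It is not the case that every boy read the book."""
    return not all(boy in readers for boy in boys)


def every_over_not(boys: Set[str], readers: Set[str]) -> bool:
    """Reading (23): every takes wide scope. For every boy, that boy did not read the book."""
    return all(boy not in readers for boy in boys)


# Hypothetical model: three boys, and different sets of boys who read the book.
boys = {"Al", "Bo", "Cy"}

some_read = {"Al"}           # one boy read the book
none_read: Set[str] = set()  # no boy read the book
all_read = {"Al", "Bo", "Cy"}

for label, readers in [("some read", some_read), ("none read", none_read), ("all read", all_read)]:
    print(label, not_over_every(boys, readers), every_over_not(boys, readers))
```

The situation in which only some boys read the book separates the readings: it verifies (22) but falsifies (23).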

With this in mind, consider the interpretation of a sentence containing a persuade-type
verb, in which the object is a universal quantifier and negation is contained within the
infinitive, as in (24):
(24) I persuaded every boy not to leave.
As predicted by the assumption that the scope of negation corresponds to the clause in
which it resides at surface structure, together with the assumption that the N that
precedes the infinitive complement of a persuade-type verb is outside of the infinitive,
the negative takes narrow scope with respect to the universal quantifier, and so the
interpretation of (24) must be (25):
(25) I persuaded no boy to leave.
Now, crucially, the interpretation of (26) is also unambiguous. The negative takes
narrow scope with respect to the universal quantifier:
(26) I believe every student not to like that class.
That is, (26) can only mean (27), and not (28):
(27) I believe that no student likes that class.
(28) I believe that not every student likes that class.
The prediction, then, would be that a universally quantified N that precedes a
negated infinitive when both follow a believe type verb should always take wide
scope over the negative. However, we also predict that when the N + infinitive
sequence follows the complementizer for, the ambiguity should reappear, and it
seems that it does:
(29) I would prefer for every student not to have to leave.

Lecture #11- Wh-Movement
In discussing transformations thus far, we have concentrated on one
variety of phrasal movement- N-preposing. This operation moves an N into an
argument position, typically the subject position, but, as we saw at the
conclusion of Lecture #10, into the object position at times. We will now
discuss another transformation that moves phrasal elements, but which has
some properties that distinguish it from N-preposing.
To begin with, let us consider the process of question formation in
English. So far, in our discussion of English questions, we have concentrated on
what are called Yes-No Questions, in that they simply admit of yes or no
answers, such as (1):
(1) Did John eat the steak?
However, there is another variety of question, known as a constituent
question, which does not admit of a yes or no answer, but which asks for a
specification of some element in the sentence, such as (2):
(2) What did John eat?
The element that is being questioned, in this case the object of eat,
is expressed as what, and appears at the beginning of the sentence. The normal
position of the object is empty- there is nothing after the verb. The questioned
constituent takes a form that is known as a wh-form, so-called because all of the
words that are used for questioning parts of a sentence in English have the letters w
and h in them, as in (3):
(3) Who did John kill?
(4) How angry did John become?
(5) Where did John put the book?
The wh-form in (3) (the wh-forms in (3-5) are who, how angry, and where) corresponds to an
animate N, as in (6):
(6) John killed Fred.
The wh-form in (4) corresponds to an adjective phrase, as in (7):
(7) John became quite angry.
And the wh-form in (5) corresponds to a prepositional phrase, as in (8):
(8) John put the book on the table.
We therefore have the following informal description of the formation of
English constituent questions. A wh-phrase appears at the beginning of the
sentence (in main clause questions) which corresponds to a particular constituent
type (N, A, or P), and there is a gap that corresponds to that constituent type
elsewhere in the sentence. That the wh-phrase must correspond to a gap can be
seen by the unacceptability of the following, which results from placing a full
constituent of the appropriate type in the position of the gap in (3-5):

(9) * Who did John kill Fred?
(10) * How angry did John become quite sad?
(11) * Where did John put the book on the table?
One can also see the effect of a matching between the wh-phrase and the gap in that
the wh-phrase must correspond to a gap that is appropriate. Hence, the following shows the
effect of a mismatch:
(12) * How angry did John kill?
Hence, there is a dependency between a wh-phrase at the front of a question
and a gap elsewhere in the sentence, such that the wh-phrase must, in a sense, agree
with the gap, in the sense of having the same characteristics as a non-wh-phrase
that could have appeared in the position of the gap. We can account for this
dependency by generating the wh-phrase in the position of the gap, where it
would obey all the co-occurrence restrictions appropriate to elements that are
generated in that position, and then moving it to the front of the sentence (we will
be more specific in a moment). For example, we have discussed the generation
of idioms, and have utilized the mechanism of lexical specification of idioms as
an argument for movement, in connection with N-preposing, as in (13)(a) and
(b):
(13) (a) John made significant headway.
(b) Significant headway was made.
Parallel to the argument for N movement in the passive construction (and
subject-to-subject raising) on the basis of idiom chunks, we find evidence for
movement of questioned elements in constituent questions:
(14) How much headway did John make ___?
By the (by now) familiar argument from idiom chunks, the wh-phrase in
(14) must have been moved to that position.
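The idea that a wh-phrase is generated in the position of the gap, and so must satisfy the verb's co-occurrence restrictions there, can be sketched as a check that restores the wh-phrase to the complement position and compares the result with the verb's frame. The category assignments and frames below are simplified assumptions based on examples (3-12) and on the frame for become given earlier; the names WH_CATEGORY, FRAMES, and question_ok are hypothetical.

```python
from typing import Dict, List

# Category of each wh-phrase, following (3)-(8).
WH_CATEGORY: Dict[str, str] = {"who": "N", "what": "N", "how angry": "A", "where": "P"}

# Simplified complement frames; each verb lists its alternative frames.
FRAMES: Dict[str, List[List[str]]] = {
    "kill": [["N"]],
    "become": [["N"], ["A"]],
    "put": [["N", "P"]],
}


def question_ok(wh_phrase: str, verb: str, overt_complements: List[str]) -> bool:
    """Restore the wh-phrase to the verb's complements and test them against a frame."""
    restored = sorted(list(overt_complements) + [WH_CATEGORY[wh_phrase]])
    return any(restored == sorted(frame) for frame in FRAMES[verb])


checks = [
    ("Who did John kill?", "who", "kill", []),
    ("How angry did John become?", "how angry", "become", []),
    ("Where did John put the book?", "where", "put", ["N"]),
    ("Who did John kill Fred?", "who", "kill", ["N"]),
    ("How angry did John become quite sad?", "how angry", "become", ["A"]),
    ("Where did John put the book on the table?", "where", "put", ["N", "P"]),
    ("How angry did John kill?", "how angry", "kill", []),
]

for sentence, wh, verb, overt in checks:
    mark = "" if question_ok(wh, verb, overt) else "* "
    print(f"{mark}{sentence}")
```

Both kinds of deviance fall out of the same check: a filled gap gives the verb one complement too many, and a mismatched wh-phrase gives it a complement of the wrong category.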
Notice that the wh-phrase that moves can move from a potentially
indefinitely embedded position, so that the wh-phrase in (15) must have
moved from a sentential complement within a sentential complement, i.e. three
clauses down:
(15) How much headway did Joe say that Bill thought that John made ___?

The Landing Site of Moved Wh Phrases


In our discussion of the projection of complementizers, we posited
the phrase-structure rules in (16):
(16) C'' → C'
     C' → C T''
The phrase-structure rule expanding C'' violates the X-bar schema, which
predicts that every X'' has a specifier. We might therefore analyze moved
wh-phrases as moving to the Spec of C, filling this otherwise missing position.
One way of formulating the movement process would be to posit a feature [+wh]
on the X to be moved, and then to formulate the movement as in (17):
(17) X - C - W - [X +wh]
     1 - 2 - 3 - 4   -->
     4 - 2 - 3 - 0
Hence, the D-structure of (2) would be (18):

(18) [Tree diagram garbled in extraction. Recoverable labels, top to bottom:
     N, C, T, N John, Past, V, and the [+wh] N what.]

A Refinement of Term #2 in the Structural Description of Wh-Movement


It seems that (17) is too general in one crucial respect: in the way
that it is formulated, a wh-phrase can move to the Spec of any C. Hence,
consider (19):
(19) John believes that Fred saw who.
Nothing would prevent the wh-phrase from moving to a Spec within the
embedded C, generating either (20) if the complementizer is retained, or (21) if
it is deleted:
(20) *John believes who that Fred saw?
(21) *John believes who Fred saw?
Of course, the only possible wh-movement that could occur to the structure
corresponding to (19) would be (22):
(22) Who does John believe that Fred saw?
(22) is often called a direct question, in that the wh-phrase occurs at the
beginning of the entire sentence. English, like every other language, also has
what are called indirect questions (also called embedded questions), in which the
question is actually an argument that is selected by a predicate. For example,
although (21) is unacceptable, (23) is acceptable, when the verb believe is
replaced by the verb wonder. In fact, wonder requires a wh-phrase to introduce
its complement, as seen by the unacceptability of (24):

(23) John wonders who Fred saw.
(24) *John wonders that Fred saw Sally.
Interestingly enough, wonder also allows a complement introduced by a wh-phrase
that does not correspond to a gap in the clause, namely the word whether:

(25) John wonders whether Fred saw Sally.


Notice that the interpretation of a complement introduced by whether is that of a
yes-no question. Hence, what John is wondering about can be given (26) as its
content:
(26) Did Fred see Sally?
Notice also that just as whether can occur with the phrase or not, as in (27)(a)
and (b), or not can occur in a direct yes- no question, as in (28):
(27) (a) I wonder whether or not John saw Sally.
     (b) I wonder whether John saw Sally or not.
(28) Did John see Sally or not?
There are verbs other than wonder which take such wh-complements: the verbs
inquire, ask, tell, and know, for example:
(29) a. John inquired as to who Fred saw.
     b. John told me who Fred saw.
     c. John asked me who Fred saw.
     d. John knew who Fred saw.
They can all take complements that are introduced by whether as
well:
(30) a. John inquired as to whether Fred saw Sally.
     b. John told me whether Fred saw Sally.
     c. John asked me whether Fred saw Sally.
     d. John knew whether Fred saw Sally.
Notice also that the complements are interpreted as having the content of
questions.
We might account for the selection of such complements as questions by
positing a feature on the Comp, notated as +wh, and say that a +wh
complementizer is an alternative to that, accounting for the impossibility of
that's occurrence in embedded questions. The lexical entry for a predicate that
selects for an indirect question, such as wonder, will then be as in (31):
(31) wonder, V, +[___ [C +wh] ]
Notice that the verb know allows both interrogative complements (question
complements) and that complements, as in (32). Hence, it would have the lexical
entry as in (33):

(32) John knows that Fred saw Sally.

(33) know, V, +[___ { [C +wh]  } ]
                    { [C that] }
We might then reformulate wh-movement to require that the C to whose
Spec the wh-phrase moves must be a +wh C. Hence, we can account for
the ungrammaticality of (20) and (21), because believe selects a CP
headed by a that-complementizer.
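
The lexical entries in (31) and (33) can be read as a small lookup table. The following Python sketch is only an illustration under stated assumptions: the names LEXICON and selects are hypothetical, and the entry for believe is inferred from the remark above that believe selects a complement headed by that.

```python
# Toy lexicon: each verb is paired with the complementizer types it selects.
# "wonder" and "know" follow entries (31) and (33); "believe" is an assumed
# entry based on the text's remark that it selects a that-complement.
LEXICON = {
    "wonder": {"+wh"},
    "know": {"+wh", "that"},
    "believe": {"that"},
}


def selects(verb, comp_head):
    """Return True if `verb` subcategorizes for a complement headed by `comp_head`."""
    return comp_head in LEXICON[verb]


# (23) John wonders who Fred saw.          (+wh complement)
print(selects("wonder", "+wh"))
# (24) *John wonders that Fred saw Sally.  (that-complement)
print(selects("wonder", "that"))
# (21) *John believes who Fred saw?        (+wh complement)
print(selects("believe", "+wh"))
```

On this view, (21) and (24) are excluded by the same mechanism: the verb's entry simply does not list the complementizer type that appears in its complement.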
To account for whether, we might propose that it is a marker of a yes-no
question that is generated in [ Spec, C ] when the question is a yes-no
question. We might propose, then, that whether is deleted in the Spec of a
direct question, accounting for the interpretation of whether in embedded
contexts, but its absence in main clause contexts.
In short, (17) should be reformulated as (34):
(34) X - C     - W - [X +wh]
         [+wh]
     1 - 2     - 3 - 4   -->
     4 - 2     - 3 - 0
Thus, wh-movement to the Spec of a CP that does not contain a +wh
complementizer will be ruled out because the structural description of
wh-movement will not be met.
English, however, like most (but not all) other natural languages, allows
more than one constituent to be questioned. When a multiple question occurs,
such as (35), only one element will undergo wh-movement:
(35) Who gave what to whom?
We can see the reason for this if we consider the derivation of (35). The
D-structure will be (36):
(36) [Tree diagram garbled in extraction. Recoverable labels, top to bottom:
     C, N, C [+wh], T, N who, past, give, what, P to whom.]
Let us assume that the subject wh-phrase moves by wh-movement into the Spec
of the matrix C, yielding (37). This is known as a string-vacuous movement
(like restructuring of have or be into T, discussed in Lecture #5), in that it
changes the structure without changing the terminal string of the phrase-marker.
(37) [Tree diagram garbled in extraction. Recoverable labels, top to bottom:
     C, N Who, C [+wh], T, Past, V give, N what, P to whom.]
In the case of multiple whs, only one can move to [Spec, C] for the simple
reason that there is only one [Spec, C], and when it is occupied by one
wh-phrase, movement of another wh-phrase to that position would cause the first
wh-phrase to be irrecoverably deleted, violating Recoverability of Deletion.
Recall that movement only takes place to empty positions, as we saw in the case
of restructuring and N-preposing. Hence, these transformations are
obligatory, but only to the extent that their application does not violate
recoverability.
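
The two conditions on wh-movement discussed so far, a +wh complementizer as required by (34) and an empty [Spec, C], can be sketched procedurally. The Python below is a minimal illustration, not a formalization from these lectures: the Clause record, the toy word list WH_WORDS, and the choice to move the leftmost wh-word are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

WH_WORDS = {"who", "what", "whom"}  # toy inventory, assumed for illustration
GAP = "__"                          # marks the position vacated by movement


@dataclass
class Clause:
    comp_is_wh: bool        # does C bear [+wh]?
    spec: Optional[str]     # content of [Spec, C]; None when empty
    words: List[str]        # terminal string below C


def wh_move(clause: Clause) -> Optional[Clause]:
    """Move the leftmost wh-word into [Spec, C], leaving a gap behind.

    Returns None when the rule cannot apply: C is not [+wh], [Spec, C] is
    already filled, or there is no wh-word to move.
    """
    if not clause.comp_is_wh or clause.spec is not None:
        return None
    for i, word in enumerate(clause.words):
        if word in WH_WORDS:
            remainder = clause.words[:i] + [GAP] + clause.words[i + 1:]
            return Clause(True, word, remainder)
    return None


# (35) Who gave what to whom?
d_structure = Clause(True, None, ["who", "gave", "what", "to", "whom"])
once = wh_move(d_structure)
print(once.spec, once.words)   # who ['__', 'gave', 'what', 'to', 'whom']
print(wh_move(once))           # None: [Spec, C] is already occupied
```

A second application returns None because the single [Spec, C] is filled, which mirrors the reasoning from recoverability above; a clause whose C is not +wh, such as the complement of believe in (19), fails at the first check.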
C. What the feature +wh selects
A striking discrepancy exists between declarative complements and
interrogative complements. We have seen that, among the set of verbs that select
for declarative complements, some only select finite complements, such as say,
and some only select infinitive complements, such as wait:
(38) a. John said that Sally was crazy.
     b. *John said for Sally to be crazy.
(39) a. *John waited that Sally left.
     b. John waited for Sally to leave.
However, whenever a verb selects an interrogative complement, the
complement can always be either finite or non-finite.
(40) a. I inquired as to whether or not to leave.
b. I inquired as to whether or not I should leave.
(41) a. I asked him whether or not to leave.
b. I asked him whether or not I should leave.
(42) a. He knew what to do.
b. He knew what he should do.
We can account for this by assuming, as we have, that when A selects for
B, A is selecting for the head of B, and the head of B imposes its own
selectional restrictions. Hence, selection is a head-to-head phenomenon. With
this in mind, let us assume that the complementizer that selects for finite T, for
selects for non-finite T, and +wh simply selects for T, and does not care
whether or not T is finite. Hence, it would allow either finite or non-finite
complements. Because selection is simply for the head of a sister, it would be
impossible for a verb that selected a +wh complement to require that the
complement be finite or non-finite, because the verb would be separated from
the complement's Tense by the intervening Complementizer.
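
The head-to-head character of selection can be made concrete with two separate tables, one for what verbs select and one for what complementizers select. This is a hedged sketch: the table names, the function licensed, and the particular verbs listed are illustrative assumptions drawn from examples (38) through (42).

```python
# Step 1: a verb selects only the head of its complement, i.e. a complementizer.
VERB_SELECTS_COMP = {
    "say": {"that"},
    "wait": {"for"},
    "ask": {"+wh"},
    "know": {"+wh", "that"},
}

# Step 2: the complementizer, in turn, selects the head of its own complement, T.
COMP_SELECTS_TENSE = {
    "that": {"finite"},
    "for": {"nonfinite"},
    "+wh": {"finite", "nonfinite"},  # +wh does not care about finiteness
}


def licensed(verb, comp, tense):
    """Check the two local selection steps independently.

    The verb never inspects `tense` directly; only the complementizer does.
    """
    return comp in VERB_SELECTS_COMP[verb] and tense in COMP_SELECTS_TENSE[comp]


print(licensed("say", "that", "finite"))     # (38a)
print(licensed("say", "for", "nonfinite"))   # (38b), starred
print(licensed("ask", "+wh", "finite"))      # (41b)
print(licensed("ask", "+wh", "nonfinite"))   # (41a)
```

Because the verb's table mentions only complementizers, there is no way in this encoding to write an entry for a verb that takes a +wh complement but insists on a finite one, which is the prediction stated above.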

Non-Interrogative Whs

We must note that there are other instances of wh-movement that do not
have an interrogative interpretation. One case in point is the relative clause
construction in English:
(43) I'm looking for a person whom I can trust.
Interestingly enough, relative clauses can also occur as infinitives:
(44) I'm looking for something on which to put this.
Following Luigi Rizzi (Residual Verb Second and the Wh-Criterion, in
Adriana Belletti and Luigi Rizzi, eds., Parameters and Functional Heads:
Essays in Comparative Syntax, Oxford University Press (1996)), we might
give questions the feature +Q in addition to the feature +wh, and relative
clauses the feature +Rel in addition to the feature +wh.


Lecture #12- Relative Clauses and Noun-Complement Constructions


The rule of wh-movement, discussed in Lecture #11, does not operate
simply to form questions. It operates in other constructions as well. There is an
interesting parallel between wh-movement and N-preposing that shows a shift
in the view of transformational grammarians toward the nature of
transformations. At one point, early in the development of transformational
grammar, at the time of Chomsky's Syntactic Structures (1957, Mouton), it was
thought that transformations were extremely specific, and tied to specific
constructions, so that there was, e.g., a passive transformation, a
subject-to-subject raising transformation, etc.
Around the 1970s, there was a shift in thinking about the nature of
transformations, such that transformations were no longer thought to be tied to
specific constructions, but were stated more generally, as operations that
were responsible for the generation of a wide range of constructions. A clear
statement of this could be found in a 1976 paper by Fiengo & Lasnik (Some
Issues In The Theory of Transformations, Linguistic Inquiry, Vol. 7). So, for
example, the rule of N-preposing that we have discussed operates in the
derivation of the passive construction, as well as the unaccusative construction
and in the subject-raising constructions.
Similarly, there is not thought to be a specific transformation of
question-formation, responsible for the generation of constituent questions, but rather a
transformation of wh-movement, that generates constituent questions as well as
other constructions in which wh-movement plays a role. One of these other
constructions is the relative clause construction, exemplified in (1):
(1) The man who I saw.
There are two types of relative clauses in English, known as restrictive
relative clauses and non-restrictive relative clauses. The relative clause in (1) is
known as a restrictive relative clause, and an example of a non-restrictive
relative clause is given in (2):
(2) John, who I like,
In written English, non-restrictive relative clauses are set off by commas,
and in spoken English, by pauses (the intonation with pauses around the
relative clauses is actually known as comma intonation). Semantically, the two
types of relative clauses are quite different. The restrictive relative clause serves
to restrict the reference of the head noun, so that in (1), the speaker is
specifying more closely which man is being referred to. Non-restrictive relative
clauses do not restrict the reference of the head noun, but simply provide, as a
sort of side-comment, a description of some property that the head noun
possesses (they are also known as appositive relative clauses). We will now
focus on the structure of restrictive relative clauses, but we must first distinguish
restrictive relative clauses from another construction in which a clause occurs
within a N that contains a lexical head noun, known as the noun-complement
construction.

The Noun-Complement Construction


Every common noun in English, and indeed all natural languages, can occur
with a restrictive relative clause, but certain nouns can also take a C
complement with somewhat different characteristics. These nouns include the
nouns theory, story, claim, statement, rumor, belief, knowledge, and realization,
among others. For example, the following noun phrases are all well-formed:
(3) The { theory    } that John was the murderer
        { rumor     }
        { knowledge }
        { belief    }
        { claim     }
Notice that the clause within the C in (3) does not contain a gap, and the C is
introduced by that rather than a wh-phrase. Notice that this type of clause
within an N is lexically restricted by the head noun in its occurrence, in that not
all nouns allow this type of clause within the N, as can be seen by the
impossibility of (4):
(4) *The { pencil } that John was the murderer.
         { book   }
         { letter }
Hence, it would seem that the nouns exemplified in (3) subcategorize for the
clause, and, by local subcategorization, the noun and the clause must be sisters.
Hence, the structure of the N must be as in (5):
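(5) [Tree diagram garbled in extraction. Recoverable structure: within the N,
    Det the is followed by the head N theory and its sister C; the C consists
    of that and a T containing N John, Past, V be, and the N the murderer.]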
On the other hand, restrictive relative clauses can always occur within a N
that is headed by a common noun. In this sense, the licensing of relative
clauses, which must occur within Ns, is similar to the licensing of temporals,
which occur within a clause. Every simple sentence allows some kind of
temporal, and the specific type is restricted by the semantic class within which
the particular verb is situated, but the type of temporal that can occur within a
clause is not restricted by the individual verb. Hence, temporal phrases that
denote duration cannot occur within sentences headed by stative verbs:
(6) *John { knows       } French while Sally visited Fred.
          { understands }
However, this is a matter of the semantic class of stative verbs, not an
individual lexical choice. For temporals, a temporal can always occur in a
simple sentence, although the particular temporal is restricted by the semantic
context in which it occurs. Restrictive relative clauses show a similar freedom of
occurrence, suggesting that the phrase-structures of temporals and restrictive
relative clauses should be similar.
Interestingly enough, when a noun-complement occurs with a relative
clause, the order within the N is most naturally noun-complement followed by
relative clause, as in (6):
(6) The theory that John is the murderer that Bill was propounding.
Furthermore, there is no upper bound on the number of restrictive relative
clauses that can modify a N:
(7) The book which John wrote which you wanted to read which was on
the table....
The fact that restrictive relative clauses (i) follow noun-complements in the
N, and (ii) are unbounded in number, suggests that relative clauses should be
adjoined to some projection of N. There are two possibilities: (i) adjunction to
N'; and (ii) adjunction to N''. The two possibilities are shown in (8):
(8) [Tree diagrams garbled in extraction. (a): the relative clause (C) adjoined
    to N', below Det. (b): the relative clause (C) adjoined to N'', above Det.]

In fact, it is possible to choose between (8)(a) and (8)(b) if we analyze numerals
as determiners, as argued by Jean-Roger Vergnaud (1974, French Relative
Clauses, unpublished Doctoral dissertation, MIT). Consider a relative clause
such as (8), under the interpretation in which the men in question are similar
to the women in question:
(8) Five men and three women who were similar.
A predicate such as similar is known as a symmetric
predicate (G. Lakoff & S. Peters (1969), Phrasal Conjunction and Symmetric
Predicates in English, in D. Reibel & S. Schane, eds., Modern Studies in
English, Holt, Rinehart, & Winston). Symmetric predicates require plural or
conjoined subjects, hence it is impossible to say (unless interpreted elliptically),
John is similar. Hence, the conjunction must be interpreted as being
base-generated. With this in mind, the structure of (8) must be (9):

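(9) [Tree diagram garbled in extraction. Recoverable structure: an N formed by
    the coordination of two Ns, each with its own Det (five men and three
    women), with the relative clause C (who Past be similar) adjoined to the
    coordinate N.]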
If relative clauses are adjoined to N', as in (8)(a), there is no source for the second
numeral, which is analyzed as a determiner by hypothesis. Hence, we have direct
evidence for adjunction to N'' for relative clauses.
B. Relative Clauses That Are Not Introduced By a Wh-Phrase
There are restrictive relative clauses that are not introduced by a wh-phrase, an
example of which is given as the title to this section. Notice, however, that all relative
clauses contain a gap. Assuming that the that which introduces these relative clauses is
a complementizer, the subject of the relative clause is missing in the title above. Other
instances of that-relatives which contain a gap in a position other than the subject
position are given in (10):
(10) a. The book that I read
     b. The person that John was speaking to
The process that forms the gap in that-relatives has all of the characteristics of
wh-movement, with the exception that the wh-form does not occur. Interestingly
enough, earlier stages of English, and some Scandinavian languages, such as
Swedish, allow both the wh-phrase and the overt complementizer to occur. Modern
English does not, as shown in (11):
(11) *The book which that he read.
One way of describing the inability of wh-phrases and overt complementizers to
co-occur in Modern English would be to posit a filter on certain S-Structures (this was
proposed by Chomsky & Lasnik (1977), Filters and Control, Linguistic Inquiry, Vol.
8). In other words, there would simply be a constraint on the sequence wh-phrase
followed by an overt complementizer (this output condition is known as the
Doubly Filled Comp Filter). In order to rescue wh-constructions from the Doubly
Filled Comp Filter, either the wh-phrase or the complementizer must delete. English
must allow complementizers to delete in any event, as in (12):
(12) a. I believe that he left.
     b. I believe he left.
We must allow wh-phrases to delete as well, when they occur in [Spec, C].
This does not seem particularly problematic, since the content of the wh-phrase
is really given by the head of the relative clause (i.e., the N to which the
relative clause is adjoined), and hence recoverability would not be violated. On
the other hand, deletion of a question wh-word would violate recoverability,
since it has independent semantic content.
To sum up, then, we would derive a that-relative by deleting the wh-phrase in
[Spec, C], or a wh-relative by deleting the complementizer that. Examples are
given in (13):
(13) a. The book (*wh-phrase) that he read.
     b. The book which (*that) he read.
We also have the option of deleting both the wh-phrase and the
complementizer:
(14) The book he read.
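
The interaction of free deletion with the Doubly Filled Comp Filter can be sketched as generate-and-filter. The following Python is an illustrative assumption about how to encode the idea, not part of the original analysis: the function name surface_forms and the string-based representation are hypothetical.

```python
from itertools import product


def surface_forms(head, wh_phrase, comp, clause):
    """Enumerate outputs of optionally deleting the wh-phrase and/or the
    complementizer, discarding any output that keeps both (the Doubly
    Filled Comp Filter)."""
    forms = []
    for keep_wh, keep_comp in product([True, False], repeat=2):
        if keep_wh and keep_comp:
            continue  # filtered out: wh-phrase followed by overt complementizer
        parts = [head]
        if keep_wh:
            parts.append(wh_phrase)
        if keep_comp:
            parts.append(comp)
        parts.append(clause)
        forms.append(" ".join(parts))
    return forms


for form in surface_forms("the book", "which", "that", "he read"):
    print(form)
```

Of the four logically possible outputs, only the one corresponding to (11) is removed, leaving the wh-relative, the that-relative, and the bare relative in (14).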


Lecture #13- Island Constraints


Most of the research in syntactic theory over the last 30 years or so has
shown exactly how restricted grammars in fact are, compared to what it is
logically possible for them to be. We have seen this so far in the material that we
have covered, but some of the constraints on grammars may have what is known
as a functional basis, in the sense that the constraints may exist not for
logical reasons, but for reasons external to language. One such constraint is
recoverability, which requires that no transformation apply in such a way as
to render material with semantic content irrecoverable. It is clear that
natural language would be much less efficient without the recoverability
constraint. In this section, we will be concerned with other constraints whose
external basis is much less clear, but which are nevertheless well-supported and
pervasive.
J.R. Ross, in his 1967 MIT Doctoral dissertation (Constraints on
Variables In Syntax), noted a paradox in that there are transformations that are
seemingly unbounded, and yet cannot apply out of certain configurations. The
configurations out of which these transformations cannot apply were termed
islands. For example, wh-movement, discussed in the past two lectures,
apparently applies over an in principle unbounded stretch of the phrase-marker,
as in (1):
(1) Who does John think that Fred said that Mary believed that Bill liked?
However, there are certain configurations out of which wh-movement
cannot occur. For example, if a clause is contained within an N that has a
lexical head noun, wh-movement cannot apply. For example, wh- movement
cannot occur out of a noun-complement into a main clause, as in (2), or out of a
relative clause into a main clause, as in (3):
(2) *Who does John believe the claim that Mary likes?
(3) *Who did John visit the man who saw?

So far, we have been looking at only one transformation that applies over an
apparently unbounded distance. Another such transformation is the movement
rule of topicalization, which can be seen to be operative in (4) and (5):
(4) John I really like.
(5) John I can't believe that anybody likes.
It is clear that topicalization is a different transformation from wh-movement. For
one thing, topicalization moves the N that is topicalized to a position in the
phrase-marker that is distinct from [Spec, C]. Wh-phrases never follow a complementizer,
but topicalized phrases do, as pointed out in Baltin (1982) (A Landing Site Theory
of Movement Rules, Linguistic Inquiry, Vol. 13, No. 1):
(6) John said that this book, he really likes.
As I also pointed out there, topicalized elements can follow fronted
wh-phrases, as in (7):
(7) He's a man to whom liberty, we could never grant.
I will adopt the analysis of topicalization in Baltin (1982), in which topicalized
elements adjoin to T. With this in mind, notice that topicalized elements show
the same restriction as the one exemplified by wh-movement in (2) and (3):
(8) *This book I can't believe the claim that anybody likes.
(9) *This book I saw the man who read.
Therefore, Ross argued that the relevant restriction should be stated not as a
condition on particular transformations, but rather as a separate
constraint on all transformations. It was stated as follows:
(10) Complex NP Constraint
     No transformation can move an element out of a C that is
     contained within an N that has a lexical head noun to a position out of that
     N.
We can see how the Complex NP Constraint operates to block, e.g., (2).
The underlying structure of (2) would be (11):

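(11) [Tree diagram garbled in extraction. Recoverable structure: a [+wh] C over
     T, with subject N John, Pres, and V believe; the object of believe is the
     circled N, consisting of Det the, head N claim, and a C headed by that
     which contains Mary Pres like who.]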
The circled N, the one headed by claim, counts as a complex NP by Ross's
definition, and hence extraction out of it is impossible.
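
The Complex NP Constraint in (10) can be checked mechanically by walking from the root of a tree down to the gap and looking for a clause inside a lexically headed nominal. The Python below is a minimal sketch under several assumptions: trees are nested tuples, the labels "NP" and "CP" stand in for the maximal projections of N and C, the moved element is simplified to a bare leaf, and all function names are hypothetical.

```python
def path_to(tree, leaf, ancestors=()):
    """Return the tuple of nodes dominating `leaf`, from the root down, or None."""
    if isinstance(tree, str):
        return ancestors if tree == leaf else None
    for child in tree[1:]:
        found = path_to(child, leaf, ancestors + (tree,))
        if found is not None:
            return found
    return None


def has_lexical_head(np_node):
    """An NP counts as lexically headed if a daughter is ("N", word)."""
    return any(
        isinstance(child, tuple) and child[0] == "N" and isinstance(child[1], str)
        for child in np_node[1:]
    )


def violates_cnpc(tree, moved_word):
    """True if extraction of `moved_word` to the root crosses a CP that sits
    inside an NP with a lexical head noun."""
    path = path_to(tree, moved_word)
    if path is None:
        raise ValueError("moved element not found in tree")
    for i, node in enumerate(path):
        if node[0] == "NP" and has_lexical_head(node):
            if any(lower[0] == "CP" for lower in path[i + 1:]):
                return True
    return False


# Underlying structure of (2): ... believe [the claim [that Mary like who]]
example_2 = ("CP", ("TP", ("NP", ("N", "John")),
             ("VP", ("V", "believe"),
              ("NP", ("Det", "the"), ("N", "claim"),
               ("CP", ("C", "that"),
                ("TP", ("NP", ("N", "Mary")), ("VP", ("V", "like"), "who")))))))

# Underlying structure of a licit case: ... believe [that Fred see who]
licit = ("CP", ("TP", ("NP", ("N", "John")),
         ("VP", ("V", "believe"),
          ("CP", ("C", "that"),
           ("TP", ("NP", ("N", "Fred")), ("VP", ("V", "see"), "who"))))))

print(violates_cnpc(example_2, "who"))
print(violates_cnpc(licit, "who"))
```

The check is stated over the path alone, without reference to which transformation is applying, which is in the spirit of Ross's proposal that the constraint holds of all transformations.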
There are other island constraints, and we shall now go through them.
A. The Coordinate Structure Constraint
Consider a coordination such as (12):
(12) John gave a book to Bill and Mary gave a magazine to Fred.

It is impossible to wh-move an N within just one of the conjuncts, as in
(13):
(13) *I wonder what John gave to Bill and Mary gave a magazine to Fred.
It is possible, on the other hand, to extract from both conjuncts, as in
(14):
(14) I wonder what John gave to Bill and Mary gave to Fred.
Extraction from all of the conjuncts simultaneously is called
Across-The-Board extraction (see Edwin Williams (1978), Across-The-Board Rule
Application, Linguistic Inquiry, Vol. 9, for a clear account of this phenomenon).
It would operate as follows in (14). The underlying structure of (14) would be
(15):

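(15) [Tree diagram garbled in extraction. Recoverable structure: the matrix
     clause (I Pres wonder) takes a complement headed by a [+wh] C, whose T
     complement is a coordination of two Ts joined by and: John Past give what
     to Bill, and Mary Past give what to Fred.]

Both wh-phrases will move into [Spec, C] simultaneously, with
recoverability not being violated because they are identical.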
With this in mind, the Coordinate Structure Constraint is stated in (16):
(16) Coordinate Structure Constraint
     No element can be extracted from just one conjunct of a coordinate
     structure.
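
As stated in (16), the constraint amounts to an all-or-none condition on gaps across conjuncts. The following Python sketch illustrates this reading only; the function name csc_allows and the convention of marking a gap with "__" are assumptions made for the example.

```python
GAP = "__"  # assumed marker for the position vacated by extraction


def csc_allows(conjuncts):
    """Return True if extraction respects (16): either every conjunct
    contains a gap (Across-The-Board extraction) or none does."""
    has_gap = [GAP in conjunct.split() for conjunct in conjuncts]
    return all(has_gap) or not any(has_gap)


# (13) *I wonder what John gave __ to Bill and Mary gave a magazine to Fred.
print(csc_allows(["John gave __ to Bill", "Mary gave a magazine to Fred"]))
# (14) I wonder what John gave __ to Bill and Mary gave __ to Fred.
print(csc_allows(["John gave __ to Bill", "Mary gave __ to Fred"]))
```

This check deliberately ignores whether the extracted elements are identical, which the discussion of (15) ties to recoverability rather than to the constraint itself.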


B. The Subject Condition


The Subject Condition may be formulated as follows:
(17) Subject Condition
     No element may be extracted from within a subject.
To see an example of the Subject Condition, consider the fact that wh-extraction
can at times operate out of Ns in object position, but not in subject position, as
can be seen in the contrast in (18):
(18) a. Who did you see a picture of?
     b. *Who was a picture of seen?
C. The Right Roof Constraint
This constraint captures an asymmetry between leftward movement rules, such
as wh-movement, and rightward movement rules, such as extraposition. As we
have seen, leftward movement rules can move elements leftward out of the clauses
in which they originate. Rightward movement rules can never move elements out
of the clauses in which they originate. To see this, consider extraposition of
sentential complements, discussed earlier:
(19) a. That John has blood on his hands proves nothing.
     b. It proves nothing that John has blood on his hands.
Now, we notice that we can have an underlying sentential subject within a
sentential subject, as in (20):
(20) That it is obvious that John has blood on his hands proves nothing.
In (20), the underlying subject of obvious has been extraposed to the end of the
matrix sentential subject, but it cannot extrapose to the end of the matrix clause:
(21) *That it is obvious proves nothing that John has blood on his hands.
The underlying structure of (20) is (22):

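(22) [Tree diagram garbled in extraction. Recoverable structure: the matrix
     clause C0 (Pres prove nothing) has as its subject the clause C1 (That ...
     Pres be obvious), whose own subject is the clause C2 (That John has blood
     on his hands).]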
The Right Roof Constraint as stated by Ross is as follows:
(23) Right Roof Constraint
     No element can be moved rightward out of a clause in which it
     originates.
We will examine some implications of the Right Roof Constraint. One
implication that we can see is that it enables us to choose between the analysis of
restrictive relative clauses in which they are adjoined to a phrasal projection of N,
and one in which they are not. In the last lecture, we discussed the analysis of
restrictive relative clauses such as Lecture #12's (7), repeated here as (24):
(24) The book which John wrote which you wanted to read which was on
     the table....

Such relative clauses were analyzed as being left-branching, with the structure
given in (25):

(25) [N [N [N [N The book] [C which John wrote]] [C which you wanted to read]]
     [C which was on the table]]
Another alternative, suggested by Stefan Benus, is based on the fact that relative
clauses can extrapose, as in (26):
(26) A man arrived who came from Boston.
Given the existence of a process which moves restrictive relative clauses to the
ends of the clauses that contain them, we can argue that sequences of restrictive
relative clauses within the N are really such that each relative clause is
contained within the relative clause that appears to its left, so that a more
abstract structure for, e.g., (27) would be (28):
(27) Someone who likes Mary who Fred likes arrived.

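(28) [Tree diagram garbled in extraction. Recoverable structure: the matrix
     clause C0 (Someone ... arrived) contains, within its subject N, the
     relative clause C1 (who likes Mary), and C1 in turn contains the relative
     clause C2 (who Fred likes).]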
The structure for (27) would be derived by extraposing C2 to the end of C1.
In this view, sequences of relative clauses are derived by positing structures in
which the later relative clauses are contained within the earlier ones.
However, (27) has (29) as a variant:
(29) Someone who likes Mary arrived who Fred likes.

If (28) were the correct structure for (27), (29) would have to be derived by
extraposing C2 to the end of C0. However, this application of extraposition
would violate the Right Roof Constraint, otherwise well-motivated. Hence, if
we assume the Right Roof Constraint, we must allow the second relative clause
to be dominated by the matrix clause, rather than the first relative clause. In
short, we must assume the possibility of stacked relative clauses, as in (25),
rather than assuming that the only source for sequences of restrictive relative
clauses is one in which the second relative clause is embedded within the first
one.
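
The argument just given compares what the Right Roof Constraint predicts under the two candidate structures. It can be sketched in Python as follows, with the caveat that this is an illustrative encoding only: each analysis is reduced to a hypothetical table recording which clause immediately contains each relative clause, and the function name extraposition_ok is an assumption.

```python
def extraposition_ok(containing_clause, moved, landing_clause):
    """Right Roof Constraint, simplified: a clause may extrapose only to the
    end of the clause that immediately contains it."""
    return containing_clause[moved] == landing_clause


# Embedded analysis, as in (28): C2 originates inside C1, which is inside C0.
embedded = {"C1": "C0", "C2": "C1"}
# Stacked analysis, as in (25): both relative clauses originate in C0.
stacked = {"C1": "C0", "C2": "C0"}

# (29) Someone who likes Mary arrived who Fred likes.
# Deriving (29) requires moving C2 to the end of the matrix clause C0.
print(extraposition_ok(embedded, "C2", "C0"))
print(extraposition_ok(stacked, "C2", "C0"))
```

Only the stacked table licenses the movement needed for (29), which is the reason given above for allowing stacked relative clauses as a possible source.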
