The Psychological Reality of Word Senses
Julia C. Jorgensen¹

Journal of Psycholinguistic Research, Vol. 19, No. 3, 1990
Accepted November 22, 1989 -- Revised March 25, 1990
This paper tests psychologists' frequent assumption that dictionaries are psychologically realistic
models of polysemy in the mental lexicon. Psychologists have not often explored the nature of
polysemy, and lexicographers' methods have not involved scientific sampling of usages or inform-
ants. It is argued, however, that the lexicographic technique of citation sorting is an effective way
of discovering sense differences. Here this technique was used in three tasks involving usage samples
for 24 high- and low-frequency nouns varying widely in degree of polysemy in the dictionary.
Analyses of agreement within and between subjects showed that subjects consistently judged and
substantially agreed upon the major senses of most nouns, but that few nouns in either frequency
group were perceived to have more than three significant senses. Additionally, the possibility that
larger usage samples will bias people to make more sense groupings was found not to be true,
suggesting that the larger number of senses lexicographers create for high-frequency words are not
artifacts of larger usage samples.

Polysemy is a central problem in the study and description of natural language.
A word is said to be polysemous if it harbors more than one sense or meaning.
When examining a dictionary intended for everyday use (such as Webster's New
Collegiate Dictionary, 1975), one quickly discovers that many common words
are characterized by a large number of senses. For example, light is given 15
sense definitions as a noun and 17 as an adjective, while line is given 14 as a
noun. Zipf (1945, 1949) found a strong positive relation between a word's
frequency of usage and the number of senses which characterize it in the dictionary.
So, less common words such as proof or pump may be found to have
seven or eight sense definitions, while even rarer words such as nepotism,
ecumenical, or indemnity may be found to have only one or two.

This research was partially supported by grants from the Spencer and Sloan Foundations to George
A. Miller.
¹ Correspondence should be addressed to the author at Psychology Department, Lehman College,
The City University of New York, Bedford Park Boulevard West, Bronx, New York 10468.
0090-6905/90/0500-0167$06.00/0 © 1990 Plenum Publishing Corporation

Accurately identifying and describing the senses of particular words is
important in several types of endeavor. First, lexicographers hope that diction-
aries will be helpful in teaching word use, and teachers often believe this to be
true (Deese, 1967; Miller & Gildea, 1987). The ultimate test of a pedagogically
good dictionary may be whether the sense distinctions and descriptions it con-
tains for any given word can convey the native's understanding of that word's
meaning to a naive user, such as a child or a second-language learner (Edwards,
1983).
Second, psycholinguists (e.g., Zipf, 1945; Johnson-Laird and Quinn, 1976)
often count or identify word senses as part of experimental procedure, and they
frequently obtain this information from dictionaries. Miller, Fellbaum, Kegl, &
Miller (1988) have remarked, "By and large, psycholinguistic experiments pre-
suppose the validity of the general structures that linguists and lexicographers
have identified and try instead to test hypotheses concerning the way such struc-
tures arise or how they contribute to other cognitive processes" (p. 4).
Third, artificially intelligent programs to process natural language must
identify and describe word senses, and researchers are beginning to use machine-
readable versions of ordinary dictionaries as the basic sources of information
for the large, computerized lexicons of such programs (Byrd et al., 1987).
Therefore, the assumption that dictionaries are accurate descriptions of our lex-
ical knowledge and of the extent of polysemy in the language is apparently
widespread and has important practical consequences.
However, this assumption begs the important psychological questions of
how one can accurately identify the senses of a particular word, what the mental
representation of a sense is actually like, and how a particular sense is chosen
as the intended meaning of a word in disambiguating a sentence, possibly because
these questions have proven to be among the most intractable in understanding
natural language. The major difficulty in saying what a sense is involves
clarifying the distinction between the ambiguity of a word and the diversity or
generality of its use: when are perceived differences in the meaning of a word
in different contexts indicative of "true" ambiguity or sense difference? At least
three levels of perceived meaning difference have been discussed in the literature
on polysemy. These may be called homonymy, ambiguity, and microdistinction
or generality of use.
What has been called homonymy occurs when native speakers can see no
obvious semantic relation between two different uses of a word (even though
those uses may have derived from a common ancestor) (Panman, 1982; Zgusta,
1971). Common examples of homonymy are the relation between ball, a dance,
and ball, a round projectile; port, a wine, and port, a docking area for boats;
crane, a bird, and crane, a piece of heavy machinery. Homonymy is usually
thought of as a clear-cut case of ambiguity, or even as the existence of two
different lexical items (although, as Quine, 1960, says, this last approach just
creates difficulties for the notion of lexical identity).
There is also a level of perceived semantic difference in the uses of a word
which is usually called ambiguity, and in which speakers do perceive a semantic
relation between the different uses (even though that relation may be hard for
them to describe). Examples include: the meanings of head in "head of the
corporation" and "head of the table"; the meanings of way in "a way to solve
the problem" and "that is the way to town"; the meanings of bury in "bury
the man in a nice casket" and "the rake is buried under the leaves"; or, the
meanings of skater in "the skater is in bed asleep" and "the skater on the other
side of the rink".
Both Weinreich (1980) and Quine (1960) have suggested a test for this sort
of ambiguity based on the interpretation of a word in a "neutral" context. Quine
has noted that a seemingly clear (if perhaps not necessary) condition of the
ambiguity of terms is "that from utterance to utterance they can be clearly true
or clearly false of one and the same thing, according as interpretive clues in the
circumstances of utterance point one way or another" (p. 27). Two examples
of terms that meet this condition are:
1a. The fur was light.
(This could be resolved in context by either "And black mink was also her
favorite for warmth" or "She had never liked dark-colored coats.")
1b. He just won a small prize.
(This could be resolved in context by either "And here he is now" or "If only
it could have been bigger.")
In these cases of ambiguity, the word clearly has more than one sense. But
there are words that rarely, if ever, display this kind of ambiguity, yet are, as
Quine says, multiply applicable, and may therefore seem to have more than one
sense. His example of this sort of word is hard, which is multiply applicable
in that it can be applied to chairs as well as to questions. It does, in fact, have
twelve adjectival senses (or applications, in Quine's terms) in Webster's Col-
legiate Dictionary, and many of them fit the pattern of having one or a few
particular referents characteristically tied to them. For example, there is "hard
money," meaning metallic money (or, by extension, good money); "hard worsted,"
meaning firmly twisted worsted; and "hard evidence," meaning definite
evidence.
This level of perceived difference in meaning has been called, in addition
to multiple applicability: microdistinction, diversity or generality of use or application,
or indefiniteness of reference. Examples of microdistinction abound,
and are apparently demonstrated by experiments on encoding specificity or se-
mantic flexibility (e.g., Anderson & Ortony, 1975; Barclay, Bransford, Franks,
McCarrell, & Nitsch, 1974; Reder, Anderson, & Bjork, 1974; Schoen, 1988;
Tulving & Thomson, 1973). For instance, according to Barclay, et al., when
people hear a sentence such as "The man lifted the piano," they are likely to
recall it better when cued by "heavy" than by "musical," whereas "musical"
is a better cue for "The man tuned the piano." Thus, the meaning of piano
seems to vary in its degree of association to "piece of furniture" vs. "musical
instrument," depending on the context. In fact, it would be possible to define
the ultimate degree of microdistinction in meaning as any change in the truth
value of the word in context. Examples cited in Quine (1960) and Weinreich
(1980) seem to offer support for this idea. For instance, Quine notes that the
truth value of a sentence such as "The door is open" changes with any change
in position of the particular door, or each time door is used to refer to a different
door. Weinreich has noted that the activity symbolized by eat differs depending
on what is eaten: apples, peanuts, soup, or spaghetti. He proposes that this
generality of use would not be experienced as true ambiguity were eat to be
used in a neutral context such as "I'd like to ____ something" (in contrast
to the perceived ambiguity of light in "The fur was light"). However, as Wein-
reich notes, this test using neutral context does not actually demonstrate an
absolute distinction between true ambiguity and mere generality or indefiniteness
of reference. In addition, the notion of a neutral context may be impossible to
define.
Another test for the presence of ambiguity, suggested by Kurylowitz (1955)
and described by Weinreich (1980), involves finding a one-word synonym for
one meaning of a particular word which cannot be synonymous for another
meaning of that word. However, not finding such a synonym cannot be taken
as an indication of the absence of ambiguity.
The general problem plaguing semantic studies is that one person's ambi-
guity may be another person's microdistinction or generality of use, and vice
versa; they have proven hard to tell apart. This problem is one which undermines
the well-known effort of Katz and Fodor (1963) to formalize a model of word
senses to explain the disambiguation process. Katz and Fodor use a metalan-
guage of necessary and sufficient selection restrictions to describe each of the
different senses a word can take in context, but they give no principle for
distinguishing ambiguity from microdistinction or generality of use. Without
such a principle, their metalanguage of selection restrictions cannot be parsimoniously
constrained. That is, without a principle by which to say what differences
in usage context make significant differences in a word's meaning, the
Katz and Fodor procedure ultimately entails the description of an unlimited
number of meanings (as many as there are contexts of occurrence or truth
values), so that the number of selection restrictions required may equal the
number of distinctions they are used to describe. Weinreich (1980), MacNamara
(1971), Kelly & Stone (1975), and Jackendoff (1983) give more detailed ac-
counts of this model and its failings.
MacNamara, Kelly and Stone, Jackendoff, and others have suggested that
the Katz-Fodor parsimony problem arises from the very nature of language:
there may be no differences in a word's usage contexts which can be used to
classify those contexts (into senses or ambiguities) in a parsimonious way if
those differences in context must be necessary and sufficient for doing so, as
Katz and Fodor suggest they must. Thus, if word meanings are related to each
other in a fuzzy manner (perhaps, as Wittgenstein, 1958, suggested, in terms
of family resemblances or as Rosch and Mervis, 1975, showed for names of
objects in natural categories), there may be no principled way to distinguish
ambiguity from microdistinctions or generality of use, and, concomitantly, no
principled line between encyclopedic ("world") knowledge and dictionary in-
formation (Churchland, 1979; Edwards, 1983). This is not surprising when one
notices that microdistinctions in word meaning seem to involve inferential processes
drawing on world knowledge.
However, family resemblance models can explain why some uses of a word
seem more like "true ambiguities," which we characterize as sense differences,
than do others, by using the notion of semantic distance or semantic relatedness
of uses (Caramazza, Grober, & Zurif, 1974; Jackendoff, 1983; Rosch & Mervis,
1975). In this vein, Kelly and Stone (1975) suggest that the distinction between
ambiguity and generality is based more on degree of semantic relatedness of
usages than it is on any established principle, and that this accounts for the
tangled, rampant disagreement about the issue. For instance, Quine apparently
believes the word hard is general (consisting of one meaning applied to many
different things) rather than ambiguous, but two proposed meanings of hard
("unyielding, solid, brittle" as in "hard candy" and "difficult, trying" as in
"hard problems") do seem quite distinct to many people. In fact, it is often
possible to construct contexts in which a word like hard, which may at first
seem to possess only "generality," can be ambiguous for a single utterance.
The entire apparatus of punning depends on this possibility. For instance, when
constructing a straight, wooden chair in woodworking class, as other students
are constructing simpler objects, the utterance "The chair is hard" would seem
truly ambiguous. (Bierwisch, 1981, has given similar examples of context con-
struction to reveal multiple meanings.) Thus, two meanings, which are often
judged as semantically distant (such as "hard" in "hard problem" and "hard
candy") may coexist in a lexical entry with meanings which may be judged as
closely related (such as "hard" in "hard money" and "hard candy").
Making a judgment as to whether two related usages of a word constitute
two different senses, or, instead, two applications of a single sense, must then
rest on a judgment as to whether the two usages are sufficiently remote se-
mantically rather than on any secure principle or test. Kelly and Stone believe
that this intuitive criterion is the only one possible, that the line between polysemy
(and thus "true" ambiguity in Quine's sense) and generality is impossible
to draw in any principled way.
PURPOSE OF THE STUDY
Since there is no principled way to distinguish different senses of words,
the only way to identify different word senses will be by collecting consistent
judgments of semantic similarity and difference of use from speakers of the
language (i.e., if two uses of a word are semantically distant enough, they will
be judged to be different meanings). By doing this for a large portion of the
vocabulary and using many judges, one may collect information that will be
relevant to modeling lexical memory.
The major focus of the study reported here was the assessment of the extent
of psychologically real polysemy in the mental lexicon for nouns. This study
assessed patterns of sense distribution for a set of nouns, based on semantic
similarity and distance judgments. Subjects made such judgments by sorting
usage citations for a given word into groups according to similarity of use or
meaning of that word in the citations, much as lexicographers do when compiling
definitions. The outcome of the study provided a way of evaluating the model
of the language presented by the dictionary, particularly the polysemy/frequency
correlation discovered by Zipf. This study focused on whether people do distin-
guish a greater number of senses for highly frequent nouns, as the dictionary
would suggest, and, if so, whether all those senses are equally frequent or
important.
This study also assessed the possible effect of a certain kind of bias on the
results of the citation sorting task described above. It may be that subjects are
very flexible in the number of distinctions they are able to make between dif-
ferent uses of a word (particularly if comprehension of atypical uses involves a
certain amount of extrasentential inference); if this is so, and if subjects become
motivated to make more distinctions when they have more uses to group, we
would find that more frequent words collect more senses (as in the dictionary)
at least in part as a result of this task bias. Since it is probable that lexicographers
have more citations to group for high-frequency words when they write definitions,
a finding of such task bias for our subjects could imply that some senses
which appear in dictionaries are a product of a biasing effect of sample size.
Past studies of polysemy have been limited in one of two ways: Either they
have relied on judgments of only one informant or they have limited judgments
to uses of only one word. Lexicographers, in building dictionaries (and as in
the computerized disambiguation system of Kelly and Stone, 1975), have es-
sentially done sample studies of uses of the entire vocabulary of the language,
but they have relied on one individual's intuitions in most judgments. In addition,
lexicographers have failed to adopt systematic and unbiased procedures for
sampling the language.
Psychologists, on the other hand, have pooled the intuitions of many in-
formants. However, they have generally failed to use systematic samples of
vocabulary. (Weinreich, 1980, and Clark, 1973, comment on this failure; studies
by Caramazza et al., 1974, and Osgood, Suci, & Tannenbaum, 1957, exemplify
this failure.)
The sorting tasks included here sample from a large and representative
corpus of citations from written English, collected by Kucera and Francis (1967),
and multiple informants were used.
An advantage of having multiple judges lay in being able to evaluate in-
terpersonal consistency. In addition, having subjects make judgments twice, for
large samples of citations, allowed some evaluation of intrapersonal consistency.
The stability of judgments in both these realms acts as a measure of confidence
in subjects' intuitions about sense groupings.
METHODOLOGICAL BACKGROUND
The method of sorting has been used by Miller (1969) and others to study
the organization of lexical information in memory. Miller has argued that a
sorting task using individual nouns as items results in clusters of words that
reflect their common conceptual features while discounting their idiosyncratic
features. Sorting sentences according to the meaning of some key word that
appears in them would seem to be based on a similar process. In fact, Miller's
subjects had said that they sorted the words by trying to put them into sentence
contexts and then finding commonalities in the contexts.
Lexicographers use the method of sorting sentence citations as the means
of distinguishing various word senses for which to write definitions. The ex-
periments described here were designed to mimic the method of lexicography;
that is, subjects were given sentences using a particular word and asked to sort
them into groups, according to the similarities that they perceived in the uses
or meanings of the word in those sentences. So, comparing any two sentence
citations that used a given word, the subject was required to ask himself or
herself whether that word had the same meaning in those two sentences. In order
to eliminate possible effects of short-term memory limitations on the number of
groupings subjects would make, subjects were allowed to keep all grouped stacks
of citations before them at all times, as well as notations they desired to make
concerning the senses represented by the groupings. In addition, no time limi-
tation on task completion was imposed.
The three sorting tasks used here are actually variants of the same task. In
Task 1, subjects sorted citations and wrote definitions for 12 high-frequency
nouns of high or low polysemy, using citation samples of varying size. In Task
2, the same subjects sorted these citations again, but dictionary definitions were
available to them for use as guides in sorting. In Task 3, a new group of subjects
sorted citations and wrote definitions for 12 low-frequency nouns. Using com-
bined data from these experiments, assessments of individual consistency, agree-
ment between subjects, agreement between subject and dictionary estimates of
polysemy, and the biasing effects of sample size were carried out.
METHODS
Task 1
Subjects. Seven graduate students and two undergraduates served as sub-
jects. They were chosen because they were native English speakers who had
done well on verbal ability tests (SAT or GRE verbal sections). They were
volunteers who received a small amount of money for participating.
Design and Procedure. Sentences using 12 high-frequency nouns were
drawn from the Kucera and Francis (1967) corpus of one million running words
of written English. For every occurrence of a given noun in the corpus, a single
stimulus sentence was available. All such sentences for each of the 12 nouns
were individually typed on three-by-five cards, with the noun itself underlined
on each card. The total number of citations (equal to the frequency) for each
word is given in Table I. Uses of the words as proper nouns were eliminated
from our sample and from the count given in the table.
Six of the words were chosen because they are highly polysemous (11-21
nonarchaic senses), according to Webster's New Collegiate Dictionary, 1975,
and six were chosen for their comparatively low degree of polysemy (2-4 non-
archaic senses). Table I also gives the number of senses found in Webster's for
each word.
Card packets for subjects to sort were made in three sizes: they contained
20, 100, or 200 citation cards. Cards for each packet were drawn at random
from the entire set of citations for a given word, except that care was taken to
sample equally across all of the 15 genres represented in the corpus (see Kucera
and Francis, 1967, for a description of the genres). Each of the nine subjects
received a packet for each of the 12 words; four packets for each subject contained
20 cards each, four contained 100 cards each, and four contained 200
cards each, so that over the nine subjects each word was represented by each
citation sample size three times. No two subjects got exactly the same set of
cards, although some overlap necessarily existed between sets for the same
word.

Table I. Nouns Used in Tasks 1 and 2

Noun           Frequency (a)   Number of senses (b)
head           421             21
life           682             18
world          688             14
way            893             12
side           370             12
hand           422             11
fact           447             4
group          382             4
night          400             3
development    311             3
something      449             2
war            306             2

(a) Frequency refers to number of occurrences (excluding proper nouns) in 1 million running words
of text (Kucera & Francis, 1967).
(b) Number of senses refers to sense divisions appearing in entries in Webster's New Collegiate
Dictionary (1975), excluding senses labeled archaic or British, and, in the case of side, one esoteric
sense that was a theatrical term.

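The allocation of packet sizes described above can be illustrated with a short sketch. The text does not say how sizes were actually assigned to subject-word pairs; the cyclic rotation below is only one assumed scheme that satisfies the stated constraints (four packets of each size per subject, and each word seen at each size by three subjects). The function name assign_packet_sizes is hypothetical.

```python
from collections import Counter

PACKET_SIZES = (20, 100, 200)
N_SUBJECTS = 9
WORDS = ["head", "life", "world", "way", "side", "hand",
         "fact", "group", "night", "development", "something", "war"]


def assign_packet_sizes(n_subjects=N_SUBJECTS, words=WORDS, sizes=PACKET_SIZES):
    """Return {(subject, word): packet size} using a cyclic rotation.

    Assumption: the rotation (subject index + word index) mod 3 is an
    illustrative choice, not the procedure reported in the paper.
    """
    return {
        (s, w): sizes[(s + i) % len(sizes)]
        for s in range(n_subjects)
        for i, w in enumerate(words)
    }


design = assign_packet_sizes()

# How many packets of each size every subject receives.
per_subject = {s: Counter(design[(s, w)] for w in WORDS) for s in range(N_SUBJECTS)}
# How many subjects see each word at each packet size.
per_word = {w: Counter(design[(s, w)] for s in range(N_SUBJECTS)) for w in WORDS}

print(per_subject[0])
print(per_word["head"])
```

In an actual run of the task, the cards within each packet would still be drawn at random across genres and the packets presented in random order, as the procedure specifies; the sketch covers only the balancing of sizes.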
Each subject was given his or her 12 card packets in a random order. The
following written instructions were given: (1) Sort the sentences into groups
according to similarity in meaning or use of the underlined word. Make as many
(or few) groups as you wish. (2) As you work, attach a written label to each
card pile to help you remember your criteria for grouping those sentences to-
gether. Start labeling at the beginning, and make notations as you need them.
(3) After you finish sorting the cards, go back to your notations (and look at
the cards if you wish) and write a definition for each group. This definition
should be detailed enough to allow another person to write a meaningful sentence
using the word in that particular way/sense, even if the word were unfamiliar
to him. (4) If there are any sentences you cannot understand, set them aside and
ask the experimenter for substitutes. (5) All the key words in this task are nouns.
Some sentences may contain adjectival uses of these words, however, such as
"night sky" or "group therapy." You should not need to characterize these
uses by form (that is, adjective vs. noun) per se, since your task is to sort
according to similarities in meaning. (6) Please don't look up any of these words
in a dictionary or thesaurus until you have finished all experimental tasks.
There was no time limit for the sorting task, and subjects completed parts
of it on different days, as it was very time-consuming (although any one packet
had to be finished at one sitting). Just as subjects were allowed to add to or
change their labels for the groups as needed, they were also allowed to alter
their groupings of the citations at any point in the task. Following the sorting
task, subjects were asked to give written responses to a set of questions about
the sorting strategies they used.
Task 2
Task 2 was used to determine how subjects would change their strategies and
groupings in the same sorting task when given dictionary definitions to use as
guides.
Subjects. The nine subjects from Task 1 also participated in Task 2.
Design and Procedure. The stimuli were the same cards allotted to each
subject in Task 1, organized into the same packets. The order of presentation
was changed, both for the individual cards (which were shuffled) and for the
packets themselves (which were in a new random order).
Each of the 12 packets a subject received was attached to a set of cards
that had individual dictionary definitions (from Webster's New Collegiate Dic-
tionary) for the key word typed on them. These definitions corresponded to
those counted in "Number of senses" in Table I. Thus, the card packets for
head were each accompanied by 21 definition cards, packets for group by 4
definition cards, etc. The definition cards themselves were randomized before
being attached to the citation card packets.
A period of at least 1 week, but less than 2 weeks, elapsed between any
subject's completion of Task 1 and beginning of Task 2.
Instructions were the same as those for Task 1, except that the definition
cards were to be read prior to, and used in, the sorting task. Subjects were told
to try to categorize all the citation cards in a packet according to the definition
cards; for any citations they could not categorize this way, they were told to
make their own categories and definitions. Subjects were advised that they could
use as many or as few of the given definition cards as they chose to, and, again,
that they could make as many or few categories for the citations as they wished.
Subjects were also asked to report any ambiguities they noticed in the dictionary
definitions.
After completing the sorting task, subjects were again asked to reply in
writing to a series of questions about their sorting strategies.
Task 3
The purpose of Task 3 was to obtain a set of groupings and definitions
written by subjects for low-frequency words, which could be compared to those
written for high-frequency words.
Subjects. A new group of nine Princeton University graduate and under-
graduate students, chosen because they had done well on verbal ability tests
(verbal section of SAT or GRE), served as subjects voluntarily, and were paid
a small amount for participating.
Design and Procedure. Sentences using 12 low-frequency nouns were drawn
from Kucera and Francis (1967). These sentences were typed on cards, just as
in Task 1. Table II lists these 12 words, giving frequency and sense counts.
Six of the words were chosen because they are highly polysemous (relative
to other words in this frequency range), having 5 or 6 nonarchaic senses in
Webster's. The other six words have 1 or 2 senses each.
Card packets for subjects to sort contained 20 cards each, chosen randomly
for each subject from the total set of cards. Since these words had only 20 to
26 citations each, subjects in fact saw primarily the same cards. Each of the
nine subjects received a packet for each of the 12 words, and cards within the
packets were always re-randomized for each new subject.
Each subject was given his or her 12 card packets in random order. Instructions
were the same as those for Task 1, that is, subjects were to sort cards into
categories according to the meaning or use of the underlined word, to keep notes
during the process, and to write definitions after sorting.

Table II. Nouns Used in Task 3

Noun           Frequency (a)   Number of senses (b)
storm          24              6
vein           24              6
composition    23              6
devil          20              6
settlement     24              5
prospect       24              5
ritual         24              2
promotion      24              2
disaster       26              1
pond           24              1
cigarette      24              1
exploration    22              1

(a) Frequency refers to number of occurrences (excluding proper nouns) in 1 million running words
of text (Kucera & Francis, 1967).
(b) Number of senses refers to sense divisions appearing in entries in Webster's New Collegiate
Dictionary (1975), excluding senses labeled archaic.

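As an illustration of the frequency-polysemy relation attributed to Zipf, the frequency and dictionary sense counts listed in Tables I and II can be correlated directly. This is a minimal sketch, not an analysis from the paper: the 24 nouns were selected by design to contrast high and low polysemy within each frequency band, so the coefficient describes only this stimulus set. Using log frequency and a Pearson coefficient are assumptions made here for convenience.

```python
import math

# (noun, frequency per million, dictionary senses), copied from Tables I and II.
NOUNS = [
    ("head", 421, 21), ("life", 682, 18), ("world", 688, 14), ("way", 893, 12),
    ("side", 370, 12), ("hand", 422, 11), ("fact", 447, 4), ("group", 382, 4),
    ("night", 400, 3), ("development", 311, 3), ("something", 449, 2), ("war", 306, 2),
    ("storm", 24, 6), ("vein", 24, 6), ("composition", 23, 6), ("devil", 20, 6),
    ("settlement", 24, 5), ("prospect", 24, 5), ("ritual", 24, 2), ("promotion", 24, 2),
    ("disaster", 26, 1), ("pond", 24, 1), ("cigarette", 24, 1), ("exploration", 22, 1),
]


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


log_frequency = [math.log10(freq) for _, freq, _ in NOUNS]
senses = [n_senses for _, _, n_senses in NOUNS]

r = pearson(log_frequency, senses)
print(f"Pearson r between log frequency and dictionary senses: {r:.2f}")
```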
RESULTS
Number of Categories. Table III shows overall results for the three sorting
tasks in terms of the mean number of definition categories subjects made for
each word type and for each task type (that is, with or without dictionary
definitions as guides), as compared to the mean number of dictionary definitions
for each word type.
Except for words characterized by the greatest polysemy (averaging about
14.6 dictionary senses), subjects distinguished around three senses for any given
word in the task without dictionary definitions.
When provided with dictionary definitions to use in sorting, subjects ap-
peared to change their estimates of the number of senses. For the words of
greatest polysemy (dictionary average of 14.6 senses), subjects significantly
increased the number of sense categories (p < .01, matched t-test), although
they did not adopt the even larger number of categories suggested by the dictionary.
For words which have few dictionary senses (an average of three),
subjects modified their estimates to be more in harmony with the number of
categories suggested by the dictionary in a few cases, but not consistently. The
overall difference between the sorting results for the low-polysemy words with
and without dictionary definitions was not significant; the independent estimates
of our subjects and the dictionary both converged on an average of three senses
to begin with.

Table III. Tasks 1, 2, and 3: Mean Numbers of Sorting and Dictionary Sense
Categories with Standard Deviations for High- and Low-Frequency Words Having
Many or Few Senses in the Dictionary (a)

High-frequency words
                 Many dictionary senses (11-21)   Few dictionary senses (2-4)
                 M        SD (b)                  M        SD (b)
Dictionary       14.6     3.98                    3.0      0.89
Task 1           5.59     1.42                    3.25     0.85
Task 2           9.14     1.05                    2.73     0.52

Low-frequency words
                 Many dictionary senses (5-6)     Few dictionary senses (1-2)
                 M        SD (b)                  M        SD (b)
Dictionary       5.66     0.51                    1.30     0.51
Task 3           3.50     0.36                    2.47     0.73

(a) Webster's New Collegiate Dictionary (1975).
(b) Standard deviations from experiments are between words; values for standard deviations between
subjects were very similar to those between words.

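The Dictionary rows of Table III can be recomputed from the sense counts listed in Tables I and II. The sketch below assumes that the reported standard deviations are sample standard deviations (n - 1 denominator) taken across the six words in each cell; that assumption reproduces the printed values to within about 0.01, while the printed means appear to be truncated or rounded slightly differently (for example, the computed mean for low-frequency words with few senses is 1.33 rather than 1.30).

```python
from statistics import mean, stdev

# Dictionary sense counts per word, grouped as in Table III
# (values copied from Tables I and II).
DICTIONARY_SENSES = {
    "high-frequency, many senses": [21, 18, 14, 12, 12, 11],
    "high-frequency, few senses": [4, 4, 3, 3, 2, 2],
    "low-frequency, many senses": [6, 6, 6, 6, 5, 5],
    "low-frequency, few senses": [2, 2, 1, 1, 1, 1],
}

# Mean and sample standard deviation (between words) for each cell.
summary = {
    label: (mean(counts), stdev(counts))
    for label, counts in DICTIONARY_SENSES.items()
}

for label, (m, sd) in summary.items():
    print(f"{label}: M = {m:.2f}, SD = {sd:.2f}")
```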
Consistency of Categories. Since information is available about which in-
dividual citations were grouped together in particular categories by particular
subjects in the sortings, it is possible to examine the consistency in grouping
between the first sorting task (without dictionary definitions) and the second
sorting task (with dictionary definitions).
As a measure of such consistency, the Agreement-Disagreement (A-D)
ratio, devised by George A. Miller, and used and described by Shipstone (1960),
is useful, as it was designed for sorting data. This ratio is a statistic devised to
calculate the amount of agreement between different sortings of the same ma-
terial by taking into account differences in grouping of particular items (in this
study, the sentence citations). The ratio is a kind of correlational technique:
when two groupings are identical, it yields a value of +1.0, and when two
groupings are completely diverse, it yields a value of -1.0. Roughly speaking,
it expresses the observed number of agreements between the groupings relative
to the total number of possible agreements. The appendix at the end of
this article gives details of calculation of the ratio, along with notes regarding
possible bias in the technique.
Table IV presents the Agreement-Disagreement ratios between the group-
ings for individual words by particular subjects in the first sorting task and in
the second sorting task. An analysis of variance showed that consistency values
for individual words were significantly different, with F(11, 88) = 6.98, p <
.0001, as were overall consistency values for the six high-polysemy words
compared to the six low-polysemy words, with F(1, 8) = 36.82, p < .0003.
Duncan's Multiple Range Test (at p < .05) yielded groupings of words
which were significantly different in consistency values, as shown in Table V.
The overall difference in consistency between words of high and low po-
lysemy (as measured by number of senses in dictionary entries or by number of
senses assigned by subjects in the second sorting task) goes against the natural
direction of bias in the Agreement-Disagreement ratio; the ratio is more likely
to be inflated when categories are larger (thus, having fewer categories is infla-
tionary). However, it is possible that some difference in the way citations are
distributed in categories for various words could give some words one unusually
large category along with some extremely small ones, while other words could
have several medium-size categories. Perhaps in that case the words with one
Table IV. Agreement-Disagreement Ratios for Tasks 1 and 2

                                          Subject
Word(a)              1      2      3      4      5      6      7      8      9    Mean
head (21)          .99    .63    .88    .68    .93    .54    .81    .97    .99    .83
life (18)          .30    .00    .69    .52    .31    .58    .26    .49    .30    .38
world (14)         .23    .08    .18    .34    .08    .45    .25    .47    .37    .27
way (12)           .45    .66    .47    .36    .86    .69    .55    .54    .45    .56
side (12)          .69    .12    .64    .42    .26    .36    .33    .66    .62    .45
hand (11)          .65    .82    .95    .87    .70    .76    .49    .88    .87    .78
fact (4)           .27   -.31    .59    .49   1.00   -.05   -.12    .29    .63    .31
group (4)         -.32   -.15   -.04   -.18   -.31    .79    .04    .44   -.29  -.005
night (3)         -.18    .02    .25    .01   -.04    .33    .11    .53    .41    .16
development (3)   -.11    .51   -.18   -.09    .60    .32    .70   -.19    .13    .18
something (2)      .02  -.007    .13    .13    .79   -.19   1.00   -.03    .24    .23
war (2)            .63    .70   -.04    .15    .52    .32    .53    .01    .02    .31
Mean               .30    .25    .37    .31    .47    .41    .41    .42    .39
(a) Number of dictionary senses is given in parentheses.
Table V. Agreement-Disagreement Ratio Difference Groupings Indicating Consistency
Between Sortings

                                                 Number of senses
Grouping    Mean A-D ratio(a)    Word            Dictionary    Task 2
A                .82             head                21          9.44
A                .78             hand                11          7.44
AB               .56             way                 12          8.44
BC               .45             side                12          9.66
BC               .38             life                18         10.44
BC               .31             war                  2          2.11
BC               .31             fact                 4          3.33
BCD              .27             world               14          9.44
CD               .23             something            2          2.22
CD               .18             development          3          2.77
CD               .16             night                3          3.33
D               -.005            group                4          2.66
(a) Mean ratios with different grouping letters are significantly different
(p < .05) by Duncan's Multiple Range Test.
outsize category would have inflated ratios, compared to the other words. However,
this possibility is irrelevant to the set of words we used: As it turns out,
the more consistent words do not have larger major categories than do the less
consistent words. For instance, development, something, and group each have
a major category with a larger proportion of citations in it than do head or hand,
which are much more consistent. So the difference in consistency by degree of
polysemy should not be attributable to bias in the ratio.
The size of the citation sample provided to the sorting subject (which could
be 20, 100, or 200 citation cards) had no effect on the size of the Agreement-
Disagreement ratio.
It seems likely, then, that some dictionary definitions are much more like
subjects' initial intuitions about features of similarity and difference between
the sentences that comprise sense groupings than are others. In addition, al-
though seeing the dictionary definitions for the high-polysemy words seems to
encourage subjects to add new senses, there seems to be a greater basic con-
gruence between initial intuitions and the definitions for these words than there
is between initial intuitions and the definitions for low-polysemy words.
Interpersonal Agreement. Table VI summarizes the extent of subject agreement
about proportional allotment of citations to various senses in Task 2. That
is, for each dictionary sense label, we calculated the proportion of citations each
subject allotted to it, and then we calculated the mean proportion of citations
for each sense (using N = 9, total number of subjects, regardless of possible
empty cells) and the standard deviation. Considering the standard deviation a
measure of interpersonal agreement, we computed the coefficient of variation
(V = s/m) to allow a comparison of interpersonal agreement between senses
with different means. Because different senses may have different numbers of
empty cells (subjects who chose not to use the sense), V is not always a precise
measure of comparison (as empty cells increase, the mean decreases, while the
standard deviation grows). Since N was set equal to the total number of subjects,
V is a measure of agreement about whether or not to choose the sense at all, as
well as a measure of agreement about the proportions of citations which belong
with the sense.
Table VI is based on all groupings which comprised 5% or more of any
subject's sample of citations (since 5% is equivalent to one citation for samples
of 20). A smaller value of V reflects greater agreement about the proportion of
citations which should be put in the sense, keeping in mind the biasing effect
of any empty cells.
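
The computation of V described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the example value are hypothetical, and the sample (n - 1) form of the standard deviation is assumed because the text does not say which form was used. Subjects who did not use a sense are counted as zero proportions, so that N stays at 9.

```python
from statistics import mean, stdev


def coefficient_of_variation(used_proportions, n_subjects=9):
    """Return V = s/m for one sense, counting non-users as zero proportions.

    used_proportions: percentages of citations allotted to the sense by the
    subjects who used it (a hypothetical input format).
    """
    if len(used_proportions) > n_subjects:
        raise ValueError("more proportions than subjects")
    # Empty cells: subjects who never used the sense contribute 0.
    padded = list(used_proportions) + [0.0] * (n_subjects - len(used_proportions))
    m = mean(padded)
    s = stdev(padded)  # assumption: sample standard deviation (n - 1)
    return s / m


# Hypothetical data: one subject put 14.4% of the citations in a minor sense,
# and the other eight subjects did not use that sense at all.
print(round(coefficient_of_variation([14.4]), 2))  # prints 3.0
```

Under these assumptions a sense used by only one of nine subjects yields V = 3 whatever proportion that subject assigned, which is close to the values Table VI reports for senses with eight empty cells and shows how empty cells inflate V.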
Considering the number of dictionary definitions from which subjects had
to choose (especially for the high-polysemy words), as well as the fact that no
two subjects had the same set of citations for a given word (in spite of some
overlap), the extent of interpersonal agreement about which senses are important
seems impressive. However, the extent of agreement about proportional allot-
ment of citations varies considerably for different words. The primary sense of
head is an example of the cases with strong agreement: To obtain a coefficient
Table VI. Agreement in Proportional Distribution of Senses(a)

                                          Sense
Word                   1       2       3       4       5       6       7
head
  percent(b)         69.0     6.8     1.7     1.6     1.3     0.8     0.6
  V(c)               0.06    0.63    2.06    3.01    1.98    3.02    3.03
  empty cells(d)        0       2       7       8       7       8       8
hand
  percent            63.0    21.0     1.8
  V                  0.20    0.30    1.38
  empty cells           0       0       5
way
  percent            44.0    11.0     9.0     5.3     4.8     3.4     1.7
  V                  0.21    0.46    1.14    1.72    1.53    1.31    2.07
  empty cells           0       1       3       6       6       5       7
side
  percent            27.7    11.0    10.8    10.7     9.4     5.8     3.4
  V                  0.56    0.68    0.40    1.11    1.13    0.90    1.98
  empty cells           1       2       0       3       4       3       7
life
  percent            30.4    14.7    10.8    10.6     9.7
  V                  0.50    0.50    0.99    1.34    0.93
  empty cells           1       0       3       5       3
war
  percent            85.4    12.9
  V                  0.09    0.44
  empty cells           0       0
fact
  percent            42.7    37.6    18.3     3.1
  V                  0.37    0.52    0.96    1.99
  empty cells           0       0       1       5
world
  percent            27.8    13.3     9.7     7.0     6.2     4.0
  V                  0.66    0.65    0.86    0.85    0.90    1.02
  empty cells           1       1       2       2       3       4
something
  percent            67.7    20.5
  V                  0.51    1.00
  empty cells           1       1
development
  percent            86.8     7.2     5.8
  V                  0.09    1.14    0.83
  empty cells           0       2       1
night
  percent            55.5    39.0     4.0
  V                  0.31    0.31    0.84
  empty cells           0       0       2
group
  percent            81.2    17.1     0.9     0.2
  V                  0.38    1.84    1.61    3.00
  empty cells           1       1       4       8
(a) Words are listed in descending order of Agreement-Disagreement ratios. Senses
included are those for which at least one subject apportioned at least 5% of a
citation sample (equivalent to one citation for a sample of 20). Head, hand, and
side each had a few very small senses with low agreement, which were omitted for
lack of space.
(b) Percent = mean proportion of citations allotted, when n = 9, total number of
subjects in Task 2.
(c) V = coefficient of variation (s/m).
(d) Empty cells = the number of subjects who did not use the sense at all in
labeling.
of variation equal to .062, one needed a standard deviation of only 4.3 against
a mean of 69. The primary sense of world is an example of weak agreement,
with a standard deviation of 18.59 against a mean of 27.88. If interpersonal
agreement is taken as a criterion for saying that a grouping of citations exem-
plifies a psychologically real sense (which is a matter of degree, just as is
intrapersonal agreement), then it could be argued that none of these words has
much more than three real senses, if having V less than (around) 1 (so that the
standard deviation is less than the mean) is used as the criterial limit.
It is notable that interpersonal agreement does not seem to correlate with
the A-D ratio (or with number of dictionary senses); i.e., there is as
much agreement about major senses and proportions for war, development, and
night as for head, hand, and way.
Patterns of sense distribution also vary considerably among these words.
For instance, hand and head show less agreement on minor senses than
do life and world, but more agreement on the primary sense. It is possible that
the more important senses for life and world (or the ways they relate to dictionary
definitions) are more alike and confusable than those for hand and head.
One interesting practical outcome of having empirically derived information
about patterns of sense distribution is that one can use them to estimate the
probabilities with which a computerized disambiguation program will make er-
rors in processing a given amount of text if it lacks a representation for a
particular sense. Some very infrequent senses might not be worth including in
such a system if their absence generates very low error rates. Kelly and Stone
(1975) describe a similar error prediction procedure.
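
A rough sketch of this kind of estimate follows, using the mean percentages for head from Table VI. It is not Kelly and Stone's procedure. The function name is hypothetical, and the sketch assumes, purely for illustration, that a program errs on every occurrence of a sense it lacks and on no other occurrence.

```python
def expected_errors(sense_percents, missing_senses, n_occurrences):
    """Return (error rate, expected number of errors) for one word.

    Assumes (for illustration only) that the program errs on every occurrence
    whose sense it lacks and on no other occurrence.
    """
    error_rate = sum(sense_percents[s] for s in missing_senses) / 100.0
    return error_rate, error_rate * n_occurrences


# Mean percentages for the seven listed senses of "head" in Table VI.
head = {1: 69.0, 2: 6.8, 3: 1.7, 4: 1.6, 5: 1.3, 6: 0.8, 7: 0.6}

# Suppose the two least frequent listed senses are left out of the system.
rate, errors = expected_errors(head, missing_senses=[6, 7], n_occurrences=1000)
print(f"error rate {rate:.3f}, about {errors:.0f} errors per 1000 occurrences")
```

On these assumptions, omitting the two smallest listed senses of head would cost roughly 14 errors per 1000 occurrences of the word, which is the sort of figure one could weigh against the cost of representing those senses.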
Effect of Sample Size on Number of Categories. In using sorting tasks to
assess the polysemy of particular words, lexicographers may naturally often find
that the number of citations available for low-frequency words is much smaller
than for high-frequency words. If the sorting task just involves judgments of
meaning difference, the number of senses of a word actually represented in a
given citation sample will vary as a function of the size of the sample and the
probability of occurrence of the various senses of the word. However, the sorting
process may be contaminated by biases of various kinds, so that it is not an
accurate measure of the meaning differences people actually perceive in ordinary
comprehension. Since lexicographers do not have equal numbers of citations
for words of differing frequencies, it seems important to ask whether or not
citation sample size alone has an effect on the number of categories people will
make when they are asked to sort. Is the greater polysemy of high-frequency
words in the dictionary at least partly an effect of this kind of bias, so that the
larger the sample, the more categories one feels inclined to make?
Since the high-frequency words in the sorting tasks used here were pre-
sented to subjects in citation samples of three different sizes, the results of these
tasks may offer some insight concerning the possibility of this kind of sample
size bias in lexicographic sorting tasks. Table VII presents the mean number of
senses created for each word in each of the three sample size conditions, both
for Task 2 and for Task 1, sorting with and without dictionary definitions as
guides.
Table VII. Mean Numbers of Sense Groupings by Sample Size

                                   Sample size
                      Task 1(a)                   Task 2(b)
Noun               20     100     200          20     100     200
head                4       7     7.6         6.6    10.6      11
life              4.3       4     5.3         8.6    11.6      11
world             3.3       4     4.3           6      10    12.3
way               3.6       4     7.3         6.6     8.6     9.6
side                4       7     6.3         6.6      10    12.3
hand              5.3       8    10.6         4.6     7.3      10
fact                2     3.3       2           3     3.6       3
group             2.3     4.3       4         1.3       3     3.6
night             1.6       3     3.3         2.6       3       4
development         3       5       5         2.3       3       3
something         3.3       5       4         2.6     2.3     1.6
war               1.3     3.3     2.6         2.3       2       2
(a) Task 1: sorting citations.
(b) Task 2: sorting citations with dictionary definitions as guides.
To evaluate the possibility of a sample size bias, one needs to know what
to expect if such a bias does not exist, i.e., what are the expected outcomes of
our sorting conditions when those outcomes are solely a function of sample size
and the probabilities of occurrence of the senses of each given word? Such a
model of expectations is difficult to find because the true numbers and relative
probabilities of senses are not known. There are not any independent data to
provide information about sense distributions; West (1953) has estimates for some
words, but one cannot tell how they were obtained. Also, it is clear from the
earlier results of these studies that senses are not at all equiprobable, but analytic
ways of deriving a model of expectations, such as the Poisson distribution
(Feller, 1950), are not applicable to nonequiprobable categories of outcome.
In order to get around these problems, one may use differences in the
proportional occurrence of senses in samples of different sizes as a clue to the
presence or absence of a sample size bias in sorting. It is possible to use the
proportions of citations allotted to the various senses for a given word by our
subjects in Task 2 as an estimate of the true probabilities of occurrence of those
senses. The dictionary definition labels let one combine proportional information
about the senses from different subjects, since the subjects are in good agreement
about the use and relative importance of the various labels (even though no two
subjects had the same set of citations), as Table VI shows.
The estimate of sense proportions for any word, then, was based on data
from three subjects from the largest sample size condition (200 citations), by
averaging proportions of citations allotted to each sense category chosen by
subjects for that word. This averaging did not violate any subject's weightings
of senses, except in the case of a few very small senses.
These average sense proportions were taken as estimates of probabilities.
For each word, a Monte Carlo simulation generated events according to these
estimated probability distributions. Results of this procedure are estimates, for
a given word, of the probability of occurrence of a given number of senses in
the samples of 20 and 100 citations.
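
The simulation step can be sketched as follows. This is a minimal illustration rather than the program actually used in the study: the function names, the number of trials, the random seed, and the sense proportions are all hypothetical. Each trial draws a citation sample of the given size from the estimated sense probabilities and records how many distinct senses appear in it.

```python
import random
from collections import Counter
from statistics import median


def simulate_sense_counts(probs, sample_size, n_trials=5000, seed=0):
    """Simulate how many distinct senses appear in random citation samples."""
    rng = random.Random(seed)
    senses = range(len(probs))
    counts = []
    for _ in range(n_trials):
        # Draw one citation sample; each citation instantiates one sense.
        sample = rng.choices(senses, weights=probs, k=sample_size)
        counts.append(len(set(sample)))
    return counts


def summarize(counts):
    """Return (estimated probability of each number of senses, median)."""
    tally = Counter(counts)
    dist = {k: tally[k] / len(counts) for k in sorted(tally)}
    return dist, median(counts)


# Hypothetical sense proportions for one word (not taken from the study).
probs = [0.60, 0.20, 0.10, 0.05, 0.03, 0.02]
for size in (20, 100):
    dist, med = summarize(simulate_sense_counts(probs, size))
    print(size, med, dist)
```

The resulting distribution for each sample size gives the expected values, including the median, against which the number of categories a subject actually made can be compared.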
The numbers of categories that were created by each individual subject in
each condition (20 or 100 citations) for each of the words in Task 2 may be
taken as data to be compared to the distribution of expected values resulting
from the Monte Carlo simulation. Unfortunately, to firmly reject the hypothesis
that sample size creates a bias in sorting, one would need a larger number of
subjects for each word in each condition (here n = 3 in Task 2). However,
one may examine the probabilities of the individual outcomes across all 12 words
to see if there is any clear pattern of deviation above or below the median of
expected values, which would indicate a good possibility of bias. A pattern
below the median (since we are extrapolating from the sample of 200 to the
samples of 20 and 100) would be indicative of the bias to make disproportionately
larger numbers of sense categories as samples grow larger. Comparison
of our data to these medians showed no clear pattern of deviation either above
or below them. Out of 72 outcomes, 32 of them were substantially like the
median, while 18 were somewhat above it and 22 were somewhat below it.
Therefore, a fair tentative conclusion is that there does not seem to be any
systematic effect of sample size per se on the number of categories subjects
make. Thus, the polysemy of often-used words is probably not attributable to
an artifact resulting from the larger number of instances available to the lexi-
cographer.
DISCUSSION AND CONCLUSIONS
Patterns of sense distribution in the sorting tasks and interpersonal agreement
about them suggest that most words, even those of high frequency, have
no more than three major senses. This finding may explain data reported by
Durkin and Manning (1989) that a majority of subjects will name the same one
or two meanings for a common word when asked to "write the first meaning
that comes to mind." It seems fair to say that dictionary entries for some words
(such as hand and head) do inflate the number of sense categories beyond those
normally distinguished by speakers. One difficulty people will have in using the
dictionary is in distinguishing major and minor senses, since most dictionaries
treat all senses as equally important and equiprobable, which is clearly mis-
leading, particularly to someone who doesn't know the word in the first place.
Sorting citation samples seems to be an effective way of bringing to mind
facts about a word's meaning. In the followup questionnaire, subjects reported
that larger citation samples were not harder to sort than smaller ones, so pre-
sumably the memory demands of the tasks did not interfere with categorization.
Definitions based on smaller samples were sometimes reported to be more dif-
ficult to write, perhaps because the smaller samples provided less information
about the word. Another finding of this research is that increasing the size of
the citation sample does not seem to bias people to increase the number of
categories they will make; therefore, we can have some confidence that subjects'
judgments of what senses exist for particular words are stable, and so presumably
based on structural properties of their internal lexicons. In addition, lexicography
may be free of this bias, which is fortunate since equal numbers of citations for
different words will not be available to lexicographers.
The sorting tasks showed that intrapersonal and interpersonal agreement
about a word's senses and about which usages instantiate particular senses may
vary considerably for different words. The reason for this variability is unknown,
but two factors may relate to it. First, there is some suggestion, from responses
on subject questionnaires, that concreteness of a word may increase agreement
and that concrete words are easier to define. Further research would be needed
to substantiate this, but it would be interesting to examine the role of classifying
referents as a sense identification and definition-writing strategy. Second, using
dictionary definitions as guides in the sorting task may bias people to make
some citation groupings they would admit are arbitrary in relation to their true
intuitions about the word's senses. Some dictionary definitions that cause this
kind of bias (and thus a lack of consistency when subjects sort citations the
second time) may still elicit groupings that subjects will agree about
interpersonally. (We did find that definitions that were associated with low
Agreement-Disagreement ratios were not always associated with interpersonal disagreement
about which senses were relevant or proportionally important.) That is, salient
features of the definitions may have little to do with the ways people naturally
distinguish senses, but their saliency may cause most people to apply them in
citation sorting. Thus, saliency of irrelevant features in definitions may vary for
words just as does inclusion of important features.
Lack of interpersonal agreement in allotting citations to the minor senses
of high-frequency words existed when dictionary definitions were used as guides
to sorting. It is possible that this phenomenon relates to the occurrence of idi-
omatic uses which are difficult to classify--in the followup questionnaire sub-
jects often reported that idiomatic uses were the most difficult to classify and
define because they did not seem to belong with the major sense or senses of
the word, but making separate categories for them did not seem justifiable. It
would be interesting to know how many of the minor senses of high-frequency
words, and how much of the variability in our subjects' performance, are due
to idiomatic uses. If idiomatic uses account for much of the polysemy of high-
frequency words, it may be that entries in the internal lexicon for major (non-
idiomatic) senses are much the same in number and scope for high- and low-
frequency words.
APPENDIX
The Agreement-Disagreement Ratio

Shipstone (1960) describes calculation of the ratio as follows:
The Agreement-Disagreement ratio is a statistic devised to measure the amount
of correlation between sortings made by any two persons. The technique of cal-
culating the amount of agreement between different sortings is as follows:
Consider the sortings made by any two subjects X and Y:
X: (1 2 3 4 5) (6 12 16) (9 14) (7 8 10 11 13 15)
Y: (1 2 4 5) (3) (6 10 12 15) (7 8 9 11 13 14) (16).
Step 1. Construct sets from the intersection of these two classifications:

Intersection: (1 2 4 5) (6 12) (7 8 11 13) (9 14) (10 15) (3) (16).
The intersection is constructed in such a way that every set in the data set for X
and Y can be reconstructed by combining sets from the intersection, and yet the
sets in the intersection are as large as possible.
Otherwise said, every distinction among the strings that either X or Y indicated
is preserved in the intersection.
Step 2. Calculate the number of agreements for each set. For example, if a set
contains four strings there are 3 + 2 + 1 = 6 pairs of strings that agree with
one another in the sense that they are placed in the same class and considered to
be equivalent. Below are tabulated the number of pairs of strings that agree as a
function of the size of set.
size = 1, # agreements = 0
size = 5, # agreements = 10
size = x, # agreements = x(x - 1)/2.
Thus, for example, a set of 11 strings would yield 55 pairs that were in agree-
ment. If size is x then the number of agreements is given by the equation A =
x(x - 1)/2. So the second step is to add up the number of agreements in this
sense for X, Y, and the intersection:
        subject X              subject Y              intersection
     size   agreements      size   agreements      size   agreements
       5        10            4         6            4         6
       3         3            1         0            2         1
       2         1            4         6            4         6
       6        15            6        15            2         1
                              1         0            2         1
                                                     1         0
                                                     1         0
     total      29          total      27          total      15
Step 3. Calculate the A-D ratio from these three numbers, 29, 27, 15, by
substituting them into the following equation:
r = 1 - (x + y - 2 * intersection)/60
r = 1 - (29 + 27 - 30)/60 = 1 - 26/60 = 1 - .433 = .57
The denominator in the above formula (60 in this example) represents the
maximum number of agreements possible, x(x - 1)/4.
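
The three steps of the worked example can be reproduced with a short Python sketch. The function names are illustrative and are not Shipstone's. The denominator is taken to be N(N - 1)/4, with N the total number of items sorted, which is an interpretation of the formula that reproduces the value 60 for the 16 items in the example.

```python
def agreements(sorting):
    """Step 2: sum of x(x - 1)/2 over the sets of a sorting."""
    return sum(len(s) * (len(s) - 1) // 2 for s in sorting)


def intersection(sort_x, sort_y):
    """Step 1: all non-empty overlaps between a set of X and a set of Y."""
    return [a & b for a in sort_x for b in sort_y if a & b]


def ad_ratio(sort_x, sort_y):
    """Step 3: Agreement-Disagreement ratio between two sortings."""
    n_items = sum(len(s) for s in sort_x)
    # Assumed reading of the denominator: N(N - 1)/4 for N items in total.
    denominator = n_items * (n_items - 1) / 4
    a_x = agreements(sort_x)
    a_y = agreements(sort_y)
    a_int = agreements(intersection(sort_x, sort_y))
    return 1 - (a_x + a_y - 2 * a_int) / denominator


# The two sortings from the worked example.
X = [{1, 2, 3, 4, 5}, {6, 12, 16}, {9, 14}, {7, 8, 10, 11, 13, 15}]
Y = [{1, 2, 4, 5}, {3}, {6, 10, 12, 15}, {7, 8, 9, 11, 13, 14}, {16}]

print(agreements(X), agreements(Y), agreements(intersection(X, Y)))  # prints 29 27 15
print(round(ad_ratio(X, Y), 3))  # prints 0.567
```

Comparing a sorting with itself gives a ratio of 1.0, and comparing one all-inclusive set with a sorting made up entirely of single items gives -1.0, which matches the range described in the text.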
In this paper, we were concerned with comparing the sorting for a given
word in sorting experiment I with its sorting in experiment II, by individual
subject, rather than with comparing sortings made by different subjects (since
no two subjects had the same set of citations).
Shipstone points out that the ratio is biased in that the measure of agreement
for a set rises with the increase in the number of instances; that is, the A-D
ratio grows somewhat disproportionately as categories get larger.
If each citation added to a set is not compared with every item already in
the set (that is, if the subject simply assumes his or her criteria are transitive
across all items), then the assumption that every item in a set adds a unit of
agreement for its conjunction with every other item in the set may also spuriously
inflate the ratio for large sets.
REFERENCES
Anderson, R. C., & Ortony, A. (1975). On putting apples into bottles--A problem of polysemy.
Cognitive Psychology, 7, 167-180.
Barclay, J. R., Bransford, J. D., Franks, J. J., McCarrell, N. S., & Nitsch, K. (1974). Compre-
hension and semantic flexibility. Journal of Verbal Learning and Verbal Behavior, 13, 471-
481.
Bierwisch, M. (1981). Basic issues in the development of word meaning. In W. Deutsch (Ed.),
The child's construction of language (pp. 341-380). New York: Academic Press.
Byrd, R. J., Calzolari, N., Chodorow, M. S., Klavans, J. L., Neff, M. S., & Rizk, O. A. (1987).
Tools and methods for computational lexicology (RC 12642, No. 56847). Yorktown Heights,
NY: IBM Thomas J. Watson Research Center.
Caramazza, A., Grober, E. H., & Zurif, E. B. (1974). A psycholinguistic investigation of polysemy:
The meanings of LINE. Unpublished manuscript, The Johns Hopkins University, Baltimore.
Churchland, P. M. (1979). Scientific realism and the plasticity of mind. Cambridge: Cambridge
University Press.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in
psychological research. Journal of Verbal Learning and Verbal Behavior, 12, 335-359.
Deese, J. (1967). Meaning and change of meaning. American Psychologist, 22, 641-651.
Durkin, K., & Manning, J. (1989). Polysemy and the subjective lexicon: Semantic relatedness and
the salience of intraword senses. Journal of Psycholinguistic Research, 18, 577-611.
Edwards, D. (1983). Foundationalism in the philosophy of science and in semantics. Unpublished
manuscript, Princeton University, Princeton, NJ.
Feller, W. (1950). An introduction to probability theory and its applications. New York: John
Wiley and Sons.
Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.
Johnson-Laird, P. N., & Quinn, J. G. (1976). To define true meaning. Nature, 264, 635-636.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.
Kelly, E., & Stone, P. (1975). Computer recognition of English word senses. Amsterdam: North
Holland.
Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English.
Providence, RI: Brown University Press.
Kurylowicz, J. (1955). [Notes on word meanings]. Voprosy Jazykoznanija, 3, 73-81.
MacNamara, J. (1971). Parsimony and the lexicon. Language, 47, 359-374.
Miller, G. A. (1969). A psychological method to investigate verbal concepts. Journal of Mathe-
matical Psychology, 6, 169-191.
Miller, G. A., Fellbaum, C., Kegl, J., & Miller, K. (1988). Wordnet: An electronic lexical reference
system based on theories of lexical memory (Report No. 11). Princeton, NJ: Princeton Uni-
versity, Cognitive Science Laboratory.
Miller, G. A., & Gildea, P. M. (1987). How children learn words. Scientific American, 258, 94-
99.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana:
University of Illinois Press.
Panman, O. (1982). Homonymy and polysemy. Lingua, 58, 105-136.
Quine, W. v. O. (1960). Word and object. Cambridge, MA: MIT Press.
Reder, L. M., Anderson, J. R., & Bjork, R. A. (1974). A semantic interpretation of encoding
specificity. Journal of Experimental Psychology, 102, 648-656.
Rosch, E., & Mervis, C. (1975). Family resemblances: Studies in the internal structure of categories.
Cognitive Psychology, 7, 573-605.
Schoen, L. M. (1988). Semantic flexibility and core meaning. Journal of Psycholinguistic Research,
17, 113-123.
Shipstone, E. I. (1960). Some variables affecting pattern conception. Psychological Monographs:
General and Applied, 74, 1-41.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic
memory. Psychological Review, 80, 352-373.
Webster's New Collegiate Dictionary (1975). Springfield, MA: G. & C. Merriam.
Weinreich, U. (1980). On semantics. Philadelphia: University of Pennsylvania Press.
West, M. (1953). A general service list of English words with semantic frequencies and a
supplementary word-list for the writing of popular science and technology. London: Longmans Green.
Wittgenstein, L. (1958). Philosophical investigations. New York: Macmillan.
Zgusta, L. (1971). Manual of lexicography. The Hague: Mouton.
Zipf, G. K. (1945). The meaning-frequency relationship of words. Journal of General Psychology,
33, 251-256.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-
Wesley.
