
The Value of Cognitive Diversity:

The Correlation of Local Aggregates with World Standards¹


James Shilts Boster
University of Connecticut

Abstract

As one increases the size of a pool of informants performing a similarity
judgment task, the mean correlation of the aggregated responses to a world standard is
equal to $r_{xy} / \sqrt{r_{xx} + (1 - r_{xx})/N}$, where $r_{xy}$ is the average individual
informant's correlation with the world standard, $r_{xx}$ is the average correlation among
informants on the similarity judgment task, and $N$ is the number of informants in the pool
of aggregated responses. In six studies spanning three domains (color, verbs of
disintegration, emotions), the $r^2$ of this relationship is above .999. This result can be
interpreted as one of the consequences of the fact that cases of cross-cultural universals
are rooted in individual cognition. Further, the strength of the relationship suggests that
the existence of cross-cultural universals imposes a limit on the magnitude of
intra-cultural variation: The square root of the average correlation among informants on
the similarity judgment task can be no lower than the average individual correlation with
the world standard. At this limit of minimum possible intra-cultural variation, there could
be no aspect of the local cultural pattern that was not subsumed by the universal, because
the correlation of an infinite-sized aggregate with the world standard would be one.
Finally, the result also shows that it is precisely the disagreement among informants that
allows their aggregation to approximate the world standard so closely.
5/6/2004 2
Culture is the information pool that emerges when members of a
community attempt to make sense of the world and each other as they
struggle and collaborate with each other to get what they want (e.g., food,
sex, power, acceptance, etc.). Because individuals construct their
conceptions of the world from their own experiences and for their own
motivations, their understandings vary from one another depending on the
characteristics of the individuals, the nature of the domain learned, and the
social situations in which learning takes place. (modified from Boster,
1986).

Cognitive anthropologists since Roberts (1964) have seen culture as an
information pool. The metaphor is an apt though sloppy one: culture is a fluid with
nothing to constrain it, save a container. The metaphor frees us from the expectation that
cultural boundaries (the limits of the information pools) will coincide with the boundaries
of social groups; some information will spill over from one group to another. Culture is
a collective representation, an aggregation of what individuals have learned, but
individuals in different social groups may come to similar understandings based on
common inferences from experience (Boster, 1987). Indeed, cognitive anthropologists
have documented a number of cases of strong cross-cultural universals² in which
particular domains are understood in fundamentally similar ways by different social
groups. These semantic universals have been most thoroughly explored in the case of
color classification (Berlin and Kay, 1969; Kay and McDaniel, 1978; Kay, Berlin, and
Merrifield, 1991; Kay and Maffi, 1999) and folk biological classification (Bulmer, 1970;
Berlin, Breedlove, and Raven, 1973; Boster, Berlin, and O'Neill, 1986; Boster and
D'Andrade, 1989), although they have been described for other domains as well.³

The common denominator of cases of cross-cultural universals is that the
universals are rooted in the details of individual cognition: Linguistic communities agree
because their individual members discern the same underlying pattern in the
phenomenon. Usually, this is because the phenomenon has some kind of intrinsic
structure independent of the observer (e.g., the biological world) and humans share a
facility for discerning that structure (Boster and D'Andrade, 1989). In other cases, such as
color classification, the source of the structure in experience that leads to cross-cultural
universals is controversial. Some argue that the phenomenon has no intrinsic structure per
se (the photons of light that produce the color spectrum vary continuously in their
energies) and that the universal appearance of structure is a product of the way that the
neurophysiology of perception interacts with the color spectrum. Others assert that the
dimensions of contrast in color lexicons directly reflect dimensions of difference in
reflected light.⁴
Regardless of the source of the structure in experience that leads to agreement, the fact
that cross-cultural universals generally have their origin in individual cognition has a
number of consequences. These consequences have been best documented in the case of
color classification. First, differences among languages are mirrored by variation among
individual speakers of the same language: Speech communities often contain speakers at
adjacent stages in the evolutionary sequence, with younger speakers tending to have more
advanced color lexicons than older speakers (Kay, 1975; Dougherty, 1977). Second,
individuals' conceptions of the internal structure of color categories are often congruent
with the evolution of color lexicons: The bifocality of the Tarahumara GRUE term
siyoname in green and blue anticipates the next stage in color term evolution (Burgess,
Kempton, and MacLaury, 1983). Third, individuals respond to the fundamental color foci
regardless of the degree of elaboration of their color lexicons: Although the Dani have
only two color terms (warm-light and cool-dark), unnamed foci are more salient and more
easily remembered than surrounding nonfocal colors (Heider, 1972). Fourth, aggregates of
individuals agree more than individuals do: The level of cross-cultural variation in the
selection of foci is small in comparison to the level of intra-cultural variation (Berlin and
Kay, 1969; Boster, 1991). Finally, individuals appear to respond to the structure in the
domain similarly to the way that aggregates do: Individuals faced with the task of
successively dividing color categories recapitulate the stages by which languages evolve
(Boster, 1986).
These last two points are both important and general. To rephrase, aggregates
agree more than individuals do, and aggregates agree because individuals do. Each
individual can be seen as fallibly capturing aspects of a universal structure; aggregates
agree more than individuals do because the noise in the individual judgments tends to
cancel while the common signal reinforces. We can make the simplifying assumption that
the informants are more or less equivalent to each other: each contributes their own
piece of the overall signal, dusted with their own bit of individual noise or error.

To illustrate this principle, I asked four informants, Andy, Beth, Carl, and
Dee, to guess the distance from Storrs, CT to New York, Chicago, Miami, and Los
Angeles and they answered as follows:

Distance to (miles)   Andy   Beth   Carl    Dee   Average   Actual
New York               100     60    140    105       101      140
Chicago                300    200   1500    700       675      926
Miami                  700   1400   1400    800      1075     1434
Los Angeles           1000   3600   3080   3000      2670     2976

Each of them individually does quite respectably, making guesses that are correlated an
average of .971 with the actual distances. (They also agree substantially with each other;
their guesses have an average correlation of .926 with each other.) But if we aggregate
the guesses of pairs of informants, we do even better: The pairs are correlated an average
of .987 with the true distances. Aggregating sets of three informants does better still, with
an average correlation of .993, and aggregating all four informants does best of all, with a
correlation of .996. In other words, even in a case in which informants individually
produce guesses that are highly correlated with the actual distances, the aggregation of
their guesses produces an even higher correlation.⁵
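The distance example is small enough to check directly. The Python sketch below uses only the numbers in the table above; it assumes that aggregation is the unweighted mean of the guesses (as in the table's Average column) and uses Pearson correlation throughout. The helper names are illustrative, not taken from any published code. Run as written, its printed values should agree with the .971, .926, .987, .993, and .996 quoted in the text to within about .001.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Guessed and actual distances (miles) from Storrs, CT, copied from the table above.
# Order of destinations: New York, Chicago, Miami, Los Angeles.
GUESSES = {
    "Andy": [100, 300, 700, 1000],
    "Beth": [60, 200, 1400, 3600],
    "Carl": [140, 1500, 1400, 3080],
    "Dee": [105, 700, 800, 3000],
}
ACTUAL = [140, 926, 1434, 2976]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def aggregate(vectors):
    """Element-wise unweighted mean of several guess vectors."""
    return [mean(values) for values in zip(*vectors)]


def mean_r_by_size(guesses, standard):
    """Average correlation with the standard over all aggregates of each size 1..N."""
    names = list(guesses)
    result = {}
    for size in range(1, len(names) + 1):
        rs = [pearson(aggregate([guesses[n] for n in combo]), standard)
              for combo in combinations(names, size)]
        result[size] = mean(rs)
    return result


r_xy = mean(pearson(g, ACTUAL) for g in GUESSES.values())
r_xx = mean(pearson(GUESSES[a], GUESSES[b]) for a, b in combinations(GUESSES, 2))
by_size = mean_r_by_size(GUESSES, ACTUAL)

print(f"mean individual r with actual distances (r_xy): {r_xy:.3f}")
print(f"mean inter-informant r (r_xx): {r_xx:.3f}")
for size, r in by_size.items():
    print(f"aggregates of {size}: mean r = {r:.3f}")
```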


To state this more formally, if individuals are responding to some universally
perceptible structure in a similarity judgment task, then the correlation of an aggregate of
individuals with a world standard should increase as the size of the aggregate increases
according to the following formula:

$$\frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{N}}} \qquad \text{[1]}$$
where $r_{xy}$ is the average individual informant's correlation with the world standard,
$r_{xx}$ is the average correlation among informants on the similarity judgment task, and $N$
is the number of informants in the pool of aggregated responses (Kelley, 1923:200, cited in
Guilford, 1936:422).⁶ This is the general formula for the correlation between a criterion
(in this case, the world standard) and the sum or average of a number of equally weighted
scores (in this case, the aggregated similarity matrices). The world standard is a similarity
matrix that represents how items in the domain are categorized by speakers of a number
of different languages, and it therefore captures the universally perceptible structure. As
we increase the number of informants in the set of aggregated responses, the correlation
of the aggregate to the world standard steadily increases. To start, when we have a single
informant, $N$ is one and the expression becomes

$$\frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{1}}} = \frac{r_{xy}}{\sqrt{r_{xx} + 1 - r_{xx}}} = r_{xy}$$
In other words, the expression becomes simply the correlation of that single informant
with the world standard (as it should). At the other extreme, when we have an infinite
number of aggregated informant responses, the term $(1 - r_{xx})/N$ approaches zero, and the
correlation of the aggregate to the world standard reaches its maximum value:⁷

$$\frac{r_{xy}}{\sqrt{r_{xx}}} \qquad \text{[2]}$$
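Where formula [1] comes from can be seen in a short derivation. For this sketch only, assume that each informant's scores and the criterion are standardized to unit variance, so that covariances equal correlations; the symbols $S$, $x_i$, and $y$ are introduced here for illustration. Let $S = x_1 + \dots + x_N$ be the summed responses of $N$ informants, $y$ the world standard, $r_{xy}$ the average of the $r_{x_i y}$, and $r_{xx}$ the average of the $r_{x_i x_j}$ over pairs $i \ne j$:

$$
\begin{aligned}
\operatorname{cov}(S, y) &= \sum_{i=1}^{N} r_{x_i y} = N\,r_{xy},\\
\operatorname{var}(S) &= \sum_{i=1}^{N} \operatorname{var}(x_i) + \sum_{i \ne j} r_{x_i x_j} = N + N(N-1)\,r_{xx},\\
r_{Sy} &= \frac{N\,r_{xy}}{\sqrt{N + N(N-1)\,r_{xx}}}
        = \frac{r_{xy}}{\sqrt{\dfrac{1 + (N-1)\,r_{xx}}{N}}}
        = \frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{N}}}.
\end{aligned}
$$

Replacing the sum by the average, $S/N$, rescales $S$ without changing its correlation with $y$, which is why the same expression covers averaged similarity matrices.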

If we apply formula [1] to the guessed distances example above, we obtain

$$\frac{.971}{\sqrt{.926 + \dfrac{1 - .926}{4}}}$$

or an estimate of .999, not far from the observed correlation of the
aggregate to the actual distances of .996.
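Formulas [1] and [2] are straightforward to evaluate. This minimal sketch (the function names `aggregate_r` and `limit_r` are hypothetical) checks the .999 estimate above and the single-informant case and, as an extra illustration, computes the limit for the WC row of Table 2 from its rounded constants.

```python
from math import sqrt


def aggregate_r(r_xy: float, r_xx: float, n: float) -> float:
    """Formula [1]: expected correlation of an n-informant aggregate with the standard."""
    if not 0 < r_xx <= 1:
        raise ValueError("r_xx must lie in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return r_xy / sqrt(r_xx + (1 - r_xx) / n)


def limit_r(r_xy: float, r_xx: float) -> float:
    """Formula [2]: the value formula [1] approaches as n grows without bound."""
    return r_xy / sqrt(r_xx)


# Guessed-distance example: r_xy = .971, r_xx = .926, four informants.
print(round(aggregate_r(0.971, 0.926, 4), 3))   # 0.999
# A single informant: formula [1] reduces to r_xy.
print(round(aggregate_r(0.971, 0.926, 1), 3))   # 0.971
# Limit for the color study (WC) in Table 2: r_xy = .36, r_xx = .40.
print(round(limit_r(0.36, 0.40), 3))            # 0.569
```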

To test this relationship, I compare similarity judgments and world standards for a
number of domains: color, the facial expression of emotion, and disintegration events
(e.g., scenes of cutting, breaking, tearing, etcetera). In each domain, a world standard was
derived from how speakers of different languages named or identified a collection of
items from the domain. After all, speakers assign items to labeled categories based on the
features of the items: Items named the same share more features and are judged more
similar than items that are distinguished. But the features chosen as the basis of category
membership may vary from language to language, so the naming responses from a
number of different languages should reflect the universally perceptible structure better
than those from any single language. In comparing the world standard with the similarity
judgments, we are comparing two kinds of similarity structures: one derived from the
linguistic classification of the items by speakers of many languages, the other derived
from a non-linguistic⁸ categorization of the items by speakers of a single language.

Methods

The steps in the process of creating the world standard are shown in the cartoon at the
top of Figure 1 and are as follows. First, the items in the domain are named by speakers
of a number of languages in a naming or identification task. These identification task
responses are then used to produce a mapping matrix. The columns of the mapping
matrix correspond to the items used in the task, the rows correspond to the terms used in
each language to name the items, and the cell entries are the frequencies with which the
item corresponding to the column is given the term corresponding to the row. The
columns of the mapping matrix are then inter-correlated to produce an item-by-item
world standard similarity matrix that captures how similarly the items are named across
the languages sampled.⁹
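A toy version of the world-standard construction may make these steps concrete. The mapping matrix below is invented for illustration (two hypothetical languages, A and B, with four speakers each, naming the four items used in the Figure 1 cartoon); it is not World Color Survey data, and Pearson correlation is assumed as the inter-correlation measure.

```python
from math import sqrt
from statistics import mean

ITEMS = ["red", "yellow", "green", "blue"]

# Hypothetical mapping matrix: one row per term, one column per item; each cell
# counts how many speakers gave that term to that item. Language "A" names all
# four colors separately; language "B" has a single GRUE term for green and blue.
MAPPING = {
    "A:red":    [4, 0, 0, 0],
    "A:yellow": [0, 4, 0, 0],
    "A:green":  [0, 0, 4, 0],
    "A:blue":   [0, 0, 0, 4],
    "B:red":    [4, 0, 0, 0],
    "B:yellow": [0, 4, 0, 0],
    "B:grue":   [0, 0, 4, 4],
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def world_standard(mapping, items):
    """Inter-correlate the item columns of a term-by-item mapping matrix."""
    rows = list(mapping.values())
    columns = {item: [row[j] for row in rows] for j, item in enumerate(items)}
    return {(a, b): pearson(columns[a], columns[b])
            for i, a in enumerate(items) for b in items[i + 1:]}


for pair, r in world_standard(MAPPING, ITEMS).items():
    print(pair, round(r, 2))
```

Because language B lumps green and blue under one term, green and blue come out as the most similar pair (r = .30), while every other pair is negatively correlated (r = -.40).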


Figure 1. [Schematic. Top row: Identification Task, Mapping Matrix, World Standard. Bottom row: Similarity Judgment Task, Similarity Matrix. Example items: red, yellow, green, blue.]



Similarity judgments from individuals were elicited using the successive pile sort
(Boster, 1987, 1994). In this task, informants are presented with a number of items and
asked to sort them into groups of items they think are most similar to one another. After
the initial sort, they are asked to successively merge their groups until all of the items are
merged together. They are then asked to return to their initial groups and to successively
split the groups until all of the items have been split apart. The procedure elicits from each
informant a complete binary tree expressing the ranked similarity of each pair of items on
a scale from 1 to N-1, where N is the number of items. Thus, the similarity matrix is
derived directly from the informants' responses, as shown in the cartoon at the bottom of
Figure 1. The similarity of each pair is equal to the lowest node of the binary tree that
includes both items or, alternatively, the number of steps in the successive splitting of the
completely merged group of items that first separates that pair of items. Items that are
split apart when there are only two groups are the least similar (similarity = 1), while the
pair of items that are the last to be split apart are the most similar (similarity = N-1).
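The scoring rule can be sketched as follows, assuming the informant's merges and splits have already been assembled into one sequence of partitions running from a single group down to singletons. Counting the partitions in which a pair stays together gives the number of the splitting step that first separates the pair, which is the score described above. The function name and the sort shown are hypothetical.

```python
from itertools import combinations


def pile_sort_similarity(partitions):
    """Pairwise similarity scores from a successive pile sort.

    `partitions` runs from the fully merged state (one group) to the fully
    split state (one item per group), each step splitting one group in two.
    A pair's score is the number of partitions in which both items share a
    group: a pair first separated at the two-group stage scores 1, and the
    last pair to be separated scores N - 1.
    """
    items = sorted(partitions[0][0])
    if len(partitions) != len(items) or len(partitions[-1]) != len(items):
        raise ValueError("expected N partitions, from one group to N singletons")
    return {
        (a, b): sum(any(a in g and b in g for g in part) for part in partitions)
        for a, b in combinations(items, 2)
    }


# Hypothetical sort of four color chips.
sort = [
    [{"red", "orange", "green", "blue"}],
    [{"red", "orange"}, {"green", "blue"}],
    [{"red", "orange"}, {"green"}, {"blue"}],
    [{"red"}, {"orange"}, {"green"}, {"blue"}],
]
for pair, score in pile_sort_similarity(sort).items():
    print(pair, score)
```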

In the domain of color, the world standard was created using the results of the
World Color Survey (Kay, Berlin, and Merrifield, 1991; Kay and Maffi, 1999).¹⁰ In this
research, 2616 speakers of 110 different languages were asked to provide the basic color
terms in their own language for each of 330 color chips drawn from the Munsell array.
Similarity judgments of the same 330 color chips were elicited from 10 American
English-speaking students at the University of Connecticut. This study will be referred to
in the tables and figures by the symbol WC. The Munsell array is depicted in Figure 2.

Figure 2. World Color Survey Munsell Color Chips.

In the domain of the facial expression of emotion, the world standard was created
by asking 260 informants speaking five languages (English, Spanish, Italian, Polish, and
Shuar) to identify the emotion expressed in 22 facial gestures posed by a male and a
female actor. The facial expressions were chosen to sample Russell's circumplex emotion
space evenly (Russell, 1980). Similarity judgments of the same 11 male faces and 11
female faces were elicited from 26 American English-speaking students at the University
of Connecticut and 26 Polish-speaking informants from Warsaw and Krakow. In the
balance of this paper, the results of this investigation will be treated as four separate
studies: Americans judging the similarity of male faces, Americans judging the
similarity of female faces, Poles judging the similarity of male faces, and Poles judging
the similarity of female faces. These four studies will be referred to in the tables and
figures by the symbols AM, AF, PM, and PF, respectively. The photographs used are
shown in Figure 3.
Figure 3. Facial Expression Photographs.


In the domain of disintegration events, the world standard was created by asking
91 informants speaking 28 languages to name the action depicted in 61 video clips of an
actor disintegrating (e.g., cutting, breaking, smashing, poking, tearing) a variety of objects
(pottery, yarn, vegetables, sticks, cloth, etc.) (Bohnemeyer, Bowerman, and Brown, 2001;
Majid, van Staden, Boster, and Bowerman, 2004). Similarity judgments of the same 61
video clips were elicited from 15 American English-speaking students at the University
of Connecticut. Informants were first shown each video clip and then given a still
photograph capturing the most salient moment of the disintegration event to be used in
the pile sort. This study will be referred to in the tables and figures by the symbol DE.
The still photographs used are shown in Figure 4.

Figure 4. Disintegration Events Still Photographs


Table 1.

Columns: (a) number of informants (languages) contributing to the world standard; (b) number of stimulus items; (c) number of informants participating in the similarity judgment task, N; (d) number of combinations of informants, $2^N - 1$.

Study                                  (a)          (b)    (c)    (d)
Color (WC)                             2616 (110)   330    10     1023
American male emotion faces (AM)       260 (5)      11     26     67,108,863
American female emotion faces (AF)     260 (5)      11     26     67,108,863
Polish male emotion faces (PM)         260 (5)      11     26     67,108,863
Polish female emotion faces (PF)       260 (5)      11     26     67,108,863
Disintegration events (DE)             91 (28)      61     15     32,767


Results

The next step in the analysis was to produce every possible combination of
informants in the similarity judgment task and to correlate each aggregated similarity
matrix with the world standard. The number of possible combinations of $N$ informants is
$2^N - 1$. For example, for three informants, there are 3 ways of choosing a single informant,
3 ways of choosing a pair of informants, and one way of choosing all three informants:
$3 + 3 + 1 = 2^3 - 1 = 7$. Thus, there are 1023 possible combinations of the 10 informants in
the World Color chip similarity judgment task and more than 67 million possible
combinations of the 26 informants in each facial expression similarity judgment task, as
shown in Table 1.¹¹ The average correlation for each aggregate size from 1 to $N$ was
recorded and compared with the expected value given by formula [1]. As shown in
Table 2, the $r^2$ of the fit between the observed average correlation with the world standard
and the expected value given by formula [1] is above .999 in all cases, ranging from .99903
to .99999. It appears that the formula very accurately captures the way that the correlation
of aggregated responses with a world standard improves with increased size of the
aggregate. This is illustrated in Figure 5, which shows, for the actual data, the increase in
the correlation of the aggregate to the world standard with increasing aggregate size,
fitted to the curve derived from formula [1]. The two constants in formula [1] affect the
shape of the curves differently: Increasing $r_{xy}$ raises the intercept, while increasing $r_{xx}$
decreases the steepness of the slope of the curve.
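The two bookkeeping pieces of this analysis, counting the subsets and scoring the fit, can be sketched briefly. The sketch assumes that "the $r^2$ of the fit" means the squared Pearson correlation between the observed per-size averages and the values from formula [1]; the text does not say which fit statistic was used. It counts the subsets rather than enumerating them, and the function names are illustrative.

```python
from math import comb, sqrt
from statistics import mean


def n_combinations(n: int) -> int:
    """Non-empty subsets of n informants: C(n,1) + ... + C(n,n) = 2**n - 1."""
    return sum(comb(n, k) for k in range(1, n + 1))


def expected_curve(r_xy: float, r_xx: float, n_max: int) -> list[float]:
    """Formula [1] evaluated for aggregate sizes 1..n_max."""
    return [r_xy / sqrt(r_xx + (1 - r_xx) / n) for n in range(1, n_max + 1)]


def r_squared(observed: list[float], expected: list[float]) -> float:
    """Squared Pearson correlation between an observed and an expected curve."""
    mo, me = mean(observed), mean(expected)
    soe = sum((o - mo) * (e - me) for o, e in zip(observed, expected))
    soo = sum((o - mo) ** 2 for o in observed)
    see = sum((e - me) ** 2 for e in expected)
    return soe ** 2 / (soo * see)


# Informant counts from the worked example and from Table 1.
for n in (3, 10, 15, 26):
    print(n, n_combinations(n), n_combinations(n) == 2 ** n - 1)

# Expected curve for the WC study, using the rounded Table 2 constants.
curve = expected_curve(0.36, 0.40, 10)
print([round(r, 3) for r in curve])
```

Given the observed per-size averages from a full enumeration (not reproduced here), `r_squared(observed, curve)` would give the fit statistic under that assumption.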



Table 2.

Columns: (a) average individual r with the world standard, $r_{xy}$; (b) average individual r with other informants, $r_{xx}$; (c) correlation of an aggregate of infinitely many informants with the world standard, $r_{xy}/\sqrt{r_{xx}}$; (d) fit of data to model, $r^2$.

Study                                  (a)    (b)    (c)    (d)
Color (WC)                             .36    .40    .57    0.99999
American male emotion faces (AM)       .60    .43    .92    0.99986
American female emotion faces (AF)     .68    .55    .91    0.99972
Polish male emotion faces (PM)         .59    .38    .97    0.99990
Polish female emotion faces (PF)       .70    .59    .91    0.99995
Disintegration events (DE)             .48    .35    .80    0.99903


Figure 5.¹² [Plot: correlation of the aggregate with the world standard (y-axis, 0.3 to 1.0) against -1/sqrt(N) (x-axis, -1.0 to 0.0) for studies PF, AF, PM, AM, DE, and WC.]

Discussion

Given the close fit of the data to formula [1], it is useful to re-examine it and
unpack its implications for the relationship between intra-cultural variation and
cross-cultural universals.

$$\frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{N}}} \qquad \text{[1]}$$

A dominant theme in recent research on the pattern of intra-cultural variation has
been to emphasize the importance of agreement. The central idea has been that
intra-cultural variation is often patterned in a way that indicates a single cultural system
known to varying degrees by different individuals. Those informants who agree most
with each other can be inferred to best understand the cultural system (Boster, 1985).
This insight was formalized in Romney et al.'s (1986) cultural consensus model. They
labeled a measure of the average agreement among informants (the first factor score of a
minimum-residual factor analysis of the inter-informant agreement matrix) the
informant's cultural competence. An informant's first factor score is roughly equal to the
square root of the informant's average correlation with other informants, while the
average first factor score is roughly equal to the square root of the average
inter-informant correlation, $\sqrt{r_{xx}}$.

There has been a tendency to view this agreement or cultural competence as an
unalloyed good. Thus, D'Andrade (1987) presents evidence that those who more often
give modal responses on a variety of tasks (including some that have no obvious
"correct" answer, like a word-association task) tend to be more reliable, consistent,
normal, educated, intelligent, and experienced than other informants. However, an
examination of the role of $r_{xx}$ in either formula [1] or [2] shows that competence or
agreement may be the enemy of validity: For a given level of $r_{xy}$, the lower the
agreement among the judges of similarity, the higher the correlation of the aggregated
responses to the world standard.¹³
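The claim that lower agreement yields higher validity can be seen by holding $r_{xy}$ fixed in formula [1] and varying only $r_{xx}$. This sketch uses $r_{xy}$ = .48 (the DE value in Table 2) and ten informants; both choices are arbitrary, and the $r_{xx}$ values are kept above $r_{xy}^2$ so that the limiting correlation stays at or below one.

```python
from math import sqrt


def aggregate_r(r_xy: float, r_xx: float, n: int) -> float:
    """Formula [1]: expected correlation of an n-informant aggregate with the standard."""
    return r_xy / sqrt(r_xx + (1 - r_xx) / n)


# Hold r_xy and the number of informants fixed; vary only inter-informant agreement.
r_xy, n = 0.48, 10
for r_xx in (0.30, 0.35, 0.45, 0.60, 0.80, 1.00):
    print(f"r_xx = {r_xx:.2f}  aggregate r = {aggregate_r(r_xy, r_xx, n):.3f}")
```

At $r_{xx}$ = 1 the aggregate is no better than a single informant, which is the perfect-agreement case.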

This principle can be illustrated by the results of the similarity judgments of the color
chips, shown in Figure 6. One of the informants, let's call him John, judged the
similarity of the color chips in a way that corresponded to the world standard a little less
well than the average of the other informants, but also in a way that had the lowest
correlation with other informants. To put this another way, John was just as fallible as
the other informants in capturing aspects of the universal structure (the world standard),
but offered a piece of it that was the most different from that of his fellow informants. As
a consequence, if the other informants had to pick another informant who would best
improve the correspondence of their aggregated responses to the world standard, they
would be best off choosing John. In this case, the most valuable informant for improving
the fit with the world standard is the one who disagrees the most with other informants,
not the one closest to the consensus or the one closest to the world standard. (The
appendix presents a formula for predicting the effect of these individual differences on
the aggregate correlations with the world standard.)
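The logic of picking John can be shown with a deliberately stylized, hypothetical example. Two current informants capture only one component (a) of a standard built from two independent components (a + b). Two candidates correlate equally well with the standard, but one repeats a and the other contributes b. The sketch adds each candidate to the aggregate (an unweighted mean) and keeps the one that most raises the Pearson correlation with the standard; the vectors, names, and greedy rule are illustrative assumptions, not the appendix formula.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def aggregate(vectors):
    """Element-wise unweighted mean of several similarity vectors."""
    return [mean(values) for values in zip(*vectors)]


def best_addition(current, candidates, standard):
    """Return the candidate whose addition most raises the aggregate's r with the standard."""
    scores = {name: pearson(aggregate(current + [vec]), standard)
              for name, vec in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical centered similarity vectors over four item pairs.
a = [1, -1, 0, 0]
b = [0, 0, 1, -1]
standard = [x + y for x, y in zip(a, b)]     # [1, -1, 1, -1]
current = [a, a]                             # two informants who see only pattern a
candidates = {"agreer": a, "dissenter": b}   # each correlates about .71 with the standard

best, scores = best_addition(current, candidates, standard)
print(best, {k: round(v, 3) for k, v in scores.items()})
# dissenter {'agreer': 0.707, 'dissenter': 0.949}
```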


Figure 6.¹⁴ [Scatter plot, World Color (WC): x-axis, agreement with other informants (0.30 to 0.50); y-axis, 0.30 to 0.50; the informant John is labeled.]


However, in practice this effect may be overwhelmed by another phenomenon:
Informants who agree the most with their own aggregate also agree the most with the
world standard. It may be rare to find informants like John who capture a significant
portion of the signal that is maximally independent of that caught by other informants.
This point is illustrated by the results of the American similarity judgments of the female
emotion faces (AF), shown in Figure 7. Here, neither Ann (the informant who agrees the
most with the world standard) nor Zoë (the informant who agrees the least with other
informants) is the most valuable informant for improving the fit with the world standard.
Rather, it is Jane, who, among the informants who have high correlations with the world
standard, has the lowest agreement with other informants. It is the optimal combination
of high agreement with the world standard and low agreement with other informants that
makes an informant the most valuable contributor to the aggregate. Intriguingly, Ann
and Zoë make essentially equivalent contributions to increasing the magnitude of the
correlation of the aggregates they join with the world standard: Ann by contributing a
strong signal highly correlated with the world standard, and Zoë by contributing a signal
that, while weakly correlated with the world standard, is highly independent of that
offered by other informants. Again, aggregates agree more than individuals do because
the noise in the individual judgments tends to cancel and the common signal reinforces,
but it helps if the signals are as strong and as independent of one another as possible.

Figure 7.¹⁵ [Scatter plot, American female (AF): x-axis, agreement with other informants (0.2 to 0.9); y-axis, 0.2 to 0.9; the informants Jane, Ann, and Zoë are labeled.]



This observation prompts us to reassess whether agreement or cultural
competence is an unalloyed good, and to argue that the value of agreement depends on
one's goals as an ethnographer. If one seeks to estimate the judgments of a group with a
minimum number of informants, then agreement is good because one needs fewer
informants to get a highly reliable picture of the group aggregate. However, if one seeks
to discover the maximum possible concordance between an aggregated structure and a
world standard, then high agreement within the group limits what one learns from each
additional informant, and the correspondence between the aggregated similarity
judgments and the world standard will be lower. Again, the high agreement of the
informants diminishes the correlation with the world standard: Their reliability is an
enemy of validity. For this purpose, there is great value in cognitive diversity.
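The "fewer informants" side of this trade-off can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result (not one used in this paper) that gives the reliability of the mean of $N$ informants as $N r_{xx} / (1 + (N - 1) r_{xx})$. The sketch below finds how many informants are needed to reach a reliability of .95 for three $r_{xx}$ values taken from Table 2; the .95 target and the function names are illustrative assumptions.

```python
def spearman_brown(r_xx: float, n: int) -> float:
    """Spearman-Brown reliability of the mean of n informants with average inter-correlation r_xx."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def informants_needed(r_xx: float, target: float) -> int:
    """Smallest n whose Spearman-Brown reliability reaches `target`."""
    if not (0 < r_xx <= 1 and 0 < target < 1):
        raise ValueError("need 0 < r_xx <= 1 and 0 < target < 1")
    n = 1
    while spearman_brown(r_xx, n) < target - 1e-12:  # tolerance guards float rounding
        n += 1
    return n


# r_xx values from Table 2 (DE, AM, PF); the .95 target is arbitrary.
for study, r_xx in (("DE", 0.35), ("AM", 0.43), ("PF", 0.59)):
    print(study, r_xx, informants_needed(r_xx, 0.95))
```

Higher agreement means fewer informants are needed (36 at $r_{xx}$ = .35 versus 14 at $r_{xx}$ = .59); for a fixed $r_{xy}$, formula [1] shows the same agreement lowering the ceiling on validity.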

However, there is a limit on cognitive diversity; otherwise the aggregation of an infinite
number of informants would produce a correlation greater than one with the world
standard. This follows from formula [2]: if $r_{xy}/\sqrt{r_{xx}} \le 1$, then $\sqrt{r_{xx}} \ge r_{xy}$. In other
words, the square root of the average inter-informant correlation on the similarity
judgment task cannot be lower than the average individual correlation with the world
standard: The existence of cross-cultural universals imposes a limit on the possible
magnitude of intra-cultural variation. Furthermore, at the limit, when a decreasing
$\sqrt{r_{xx}}$ approaches $r_{xy}$, there would be no pattern in the local similarity judgments that is
not part of the universal pattern, because the similarity structure based on an infinite
number of informants would have a correlation of 1.0 with the world standard. Only
when the agreement among informants, as assessed by $\sqrt{r_{xx}}$, is substantially greater than
$r_{xy}$ could one argue that a substantial portion of the informants' judgments represents a
local, culturally specific pattern not subsumed by the universal one. This is clearly not the
case with the four studies of the facial expression of emotion. In each study, the expected
correlation of an infinite-sized aggregate of informants to the world standard is above .9.
This leaves very little room for the existence of a culturally unique pattern in the
similarity judgments expressing a peculiarly American or Polish point of view.
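For non-negative $r_{xy}$, the constraint $\sqrt{r_{xx}} \ge r_{xy}$ is equivalent to $r_{xx} - r_{xy}^2 \ge 0$, which is easy to check against Table 2. This sketch uses the table's rounded values, so the margins are approximate; the function name is illustrative.

```python
# (r_xy, r_xx) pairs copied from Table 2; both are rounded to two decimals.
TABLE_2 = {
    "WC": (0.36, 0.40),
    "AM": (0.60, 0.43),
    "AF": (0.68, 0.55),
    "PM": (0.59, 0.38),
    "PF": (0.70, 0.59),
    "DE": (0.48, 0.35),
}


def constraint_margin(r_xy: float, r_xx: float) -> float:
    """r_xx - r_xy**2; for r_xy >= 0 this is >= 0 exactly when r_xy / sqrt(r_xx) <= 1."""
    return r_xx - r_xy ** 2


margins = {study: constraint_margin(*pair) for study, pair in TABLE_2.items()}
for study, margin in margins.items():
    print(f"{study}: r_xx - r_xy^2 = {margin:+.3f}")
print("closest to the limit:", min(margins, key=margins.get))
```

All six studies satisfy the constraint, and PM, whose limiting correlation in Table 2 is the highest (.97), has the smallest margin.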

Conclusion

In sum, as one increases the size of a pool of informants performing a similarity
judgment task, the mean correlation of the aggregated responses to a world standard is
equal to $r_{xy} / \sqrt{r_{xx} + (1 - r_{xx})/N}$, where $r_{xy}$ is the average individual informant's
correlation with the world standard, $r_{xx}$ is the average correlation among informants on
the similarity judgment task, and $N$ is the number of informants in the pool of aggregated
responses. The $r^2$ of this relationship was found to be above .999 in six studies spanning
three domains (color, verbs of disintegration, facial expression of emotions). This result
can be interpreted as one of the consequences of the fact that cases of cross-cultural
universals are rooted in individual cognition: Cross-cultural universals occur when some
structure in experience is available to individuals irrespective of their cultural or linguistic
background. In these cases, the individual informants can be seen as fallibly capturing
aspects of a universal structure. Unless there is perfect agreement among informants,
groups of individuals are always closer to the world standards than are individuals,
because the error or noise in the individual judgments tends to cancel and the common
signal reinforces. Because they more faithfully converge on the common signal, we can
say that the aggregates are smarter or more accurate than the individuals are. But the
formula also shows the diminishing returns of each additional informant: We learn less
and less from each successive informant. Cognitive diversity is finite. Indeed, given the
fact that no correlation can be greater than one, the existence of cross-cultural universals
can be seen as imposing a limit on the magnitude of intra-cultural variation:
$\sqrt{r_{xx}} \ge r_{xy}$. In other words, the square root of the average correlation among
informants can be no lower than the average individual correlation with the world
standard. In the case of minimum possible intra-cultural variation, there could be no
aspect of the local cultural pattern that was not subsumed by the universal, because the
correlation of an infinite-sized aggregate with the world standard would be one. Further,
the result also shows that it is precisely the disagreement among informants that allows
their aggregation to approximate the world standard so closely. In demonstrating the
value of cognitive diversity, this result harkens back to Anthony Wallace's early and
prescient observation (1961) that some degree of cognitive diversity is necessary to the
functioning of society. Here it is shown to be also necessary for the common
phenomenon that groups are more accurate than individuals.
Literature Cited.

Berlin, Brent, Dennis Breedlove, and Peter Raven. 1973. General principles of
classification and nomenclature in folk biology. American Anthropologist 75:214-242.

Berlin, Brent and Paul Kay. 1969. Basic Color Terms. Berkeley, CA: University of
California Press.

Bohnemeyer, Jürgen, Melissa Bowerman, and Penelope Brown. 2001. Cut and break
clips, version 3. In Stephen C. Levinson and Nicholas Enfield (eds.), Field Manual
2001. Language & Cognition Group, Max Planck Institute for Psycholinguistics.

Boster, James S. 1985. 'Requiem for the Omniscient Informant': There's Life in the Old Girl
Yet. In Directions in Cognitive Anthropology. Janet Dougherty (ed.). University of Illinois
Press. pp. 177-197.

Boster, James S. 1986. Can Individuals Recapitulate the Evolutionary Development of
Color Lexicons? Ethnology 25(1):61-74.

Boster, James S. 1987. Agreement between Biological Classification Systems is Not
Dependent on Cultural Transmission. American Anthropologist. 89(4):914-919.

Boster, James S. 1991. The Information Economy Model Applied to Biological Similarity
Judgment. In Socially Shared Cognition. Lauren Resnick, John Levine, and Stephanie
Teasley (eds.). Washington, DC: American Psychological Association. pp. 203-225.

Boster, James S. 1994. The Successive Pile Sort. Cultural Anthropology Methods 6(2):7-8.

Boster, James S., Brent Berlin, & John P. O'Neill. 1986. The Correspondence of Jivaroan to
Scientific Ornithology. American Anthropologist 88(3):569-583.

Boster, James S. and Roy G. D'Andrade. 1989. Natural and Human Sources of
Cross-Cultural Agreement in Ornithological Classification. American Anthropologist
91(1):132-142.

Brown, Roger W. and Eric H. Lenneberg. 1954. A study of language and cognition.
Journal of Abnormal and Social Psychology 49: 454-462.

Bulmer, Ralph. 1970. Which came first, the chicken or the egghead? In J. Pouillon and
P. Maranda (eds.), Échanges et communications. Vol. II. The Hague, Netherlands:
Mouton and Co. pp. 1069-1091.

Burgess, Donald, Willett Kempton, and Robert MacLaury. 1983. Tarahumara Color
Modifiers: Category Structure Presaging Evolutionary Change. American Ethnologist
10(1):133-149.
D'Andrade, Roy. 1987. Modal Responses and Cultural Expertise. American Behavioral
Scientist 31(2):194-202.

DeValois, Russell L., Israel Abramov, and Gerald H. Jacobs. 1966. Analysis of response
patterns of LGN cells. Journal of the Optical Society of America 56(7):966-977.

Dougherty, Janet W.D. 1977. Color categorization in West Futunese: Variability and
change. In Ben Blount and Mary Sanches (eds.), Sociocultural Dimensions of Language
Change. pp. 103-118.

Guilford, J. P. 1936. Psychometric Methods. New York: McGraw-Hill.

Heider, Eleanor Rosch. 1972. Universals in color naming and memory. Journal of
Experimental Psychology 93:1-20.

Jameson, Kimberly and Roy G. D'Andrade. 1997. It's not really red, green, yellow, blue:
An inquiry into perceptual color space. In C. L. Hardin and L. Maffi (eds.), Color
Categories in Thought and Language. Cambridge: Cambridge University Press.

Kay, Paul. 1975. Synchronic variability and diachronic change in basic color terms.
Language in Society 4: 257-70.

Kay, Paul, Brent Berlin, and William R. Merrifield. 1991. Biocultural implications of
systems of color naming. Journal of Linguistic Anthropology 1(1): 12-25.

Kay, Paul, Brent Berlin, Louisa Maffi, and William Merrifield. 1997. Color naming
across languages. In C. L. Hardin and L. Maffi (eds.), Color Categories in Thought and
Language. Cambridge: Cambridge University Press.

Kay, Paul and Louisa Maffi. 1999. Color Appearance and the Emergence and Evolution
of Basic Color Lexicons. American Anthropologist 101:743-760.

Kay, Paul and Chad McDaniel. 1978. The Linguistic Significance of the Meanings of
Basic Color Terms. Language 54:610-64.

Kay, Paul and Willett M. Kempton. 1988. What is the Sapir-Whorf hypothesis?
American Anthropologist 86: 65-79.

Kelley, Truman Lee. 1923. Statistical Method. New York: MacMillan.

Majid, Asifa, Miriam van Staden, James S. Boster, and Melissa Bowerman. 2004.
Categorization of events: A cross-linguistic perspective.

Nunnally, Jum C. and Ira H. Bernstein. 1994. Psychometric Theory (3rd ed.) New York:
McGraw-Hill.

Roberts, John. 1964. The self-management of culture. In Explorations in Cultural
Anthropology: Essays in Honor of George Peter Murdock, W. Goodenough, Ed.
McGraw-Hill, London, UK.

Romney, A. Kimball and Tarow Indow. 2002. A model for the simultaneous analysis of
reflectance spectra and basis factors of Munsell color samples under D65 illumination in
three-dimensional Euclidean space. Proceedings of the National Academy of Sciences
99(17): 11543-11546

Romney, A. Kimball, Susan C. Weller, and William H. Batchelder. 1986. Culture as
consensus: A theory of culture and informant accuracy. American Anthropologist
88(2):313-338.

Russell, James A. 1980. A circumplex model of affect. Journal of Personality and
Social Psychology 39:1161-1178.

Shepard, Roger N. 1992. The Perceptual Organization of Colors: An Adaptation to
Regularities of the Terrestrial World? In Barkow, Jerome H., Leda Cosmides, and John
Tooby (eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture.
Oxford: Oxford University Press. pp. 495-532.

Wallace, Anthony F. C. 1961. Culture and Personality. New York: Random House.
Appendix

The derivation of formula [1] in the context of psychometric theory is as follows. When
the number of items (here, informants) is increased N-fold, the correlation with a criterion
equals the original correlation multiplied by the square root of the ratio of the changed
reliability to the original reliability (Nunnally and Bernstein, 1994: p. 258).

$$r_{(Nx)y} = r_{xy}\sqrt{\frac{r'_{xx}}{r_{xx}}}$$

If one uses the Spearman-Brown prophecy formula, $r'_{xx} = N r_{xx} / [1 + (N-1) r_{xx}]$,
to estimate the changed reliability, the formula becomes

$$r_{xy}\sqrt{\frac{N r_{xx}}{r_{xx}\,[1 + (N-1) r_{xx}]}}$$


Dividing numerator and denominator by $r_{xx}$, it becomes

$$r_{xy}\sqrt{\frac{N}{1 + (N-1) r_{xx}}}$$


Dividing the numerator and denominator within the radical by N, it becomes

$$\frac{r_{xy}}{\sqrt{\dfrac{1 + (N-1) r_{xx}}{N}}}$$


Using the distributive law and rearranging terms, it becomes

$$\frac{r_{xy}}{\sqrt{\dfrac{1}{N} + \dfrac{N r_{xx}}{N} - \dfrac{r_{xx}}{N}}}$$


Finally, simplifying $N r_{xx}/N$ to $r_{xx}$ and collecting the terms divided by N, it becomes

$$\frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{N}}} \qquad [1]$$
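As a numerical check on the derivation, the sketch below computes the aggregate correlation two ways: through the Spearman-Brown step (original correlation times the square root of the ratio of changed to original reliability) and through the closed form [1]. The function names and the input values (r_xy = .45, r_xx = .30) are hypothetical illustrations, not data from the studies; the two routes should agree for every N.

```python
import math


def spearman_brown(r_xx: float, n: int) -> float:
    """Reliability of a pool n times as large (Spearman-Brown prophecy formula)."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def aggregate_r_via_reliability(r_xy: float, r_xx: float, n: int) -> float:
    """First step of the derivation: r_xy * sqrt(changed / original reliability).
    Requires r_xx > 0."""
    return r_xy * math.sqrt(spearman_brown(r_xx, n) / r_xx)


def aggregate_r_formula_1(r_xy: float, r_xx: float, n: int) -> float:
    """Closed form [1]: r_xy / sqrt(r_xx + (1 - r_xx) / n)."""
    return r_xy / math.sqrt(r_xx + (1 - r_xx) / n)


if __name__ == "__main__":
    r_xy, r_xx = 0.45, 0.30  # hypothetical values for illustration only
    for n in (1, 2, 5, 26, 1000):
        a = aggregate_r_via_reliability(r_xy, r_xx, n)
        b = aggregate_r_formula_1(r_xy, r_xx, n)
        print(f"N={n:5d}  via Spearman-Brown={a:.4f}  formula [1]={b:.4f}")
```

As N grows, both columns approach r_xy / sqrt(r_xx), about .82 for these inputs.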

The expected correlation with the world standard of an aggregate including a
specific informant is given by the following formula:

$$\frac{\dfrac{r'_{xy} + (I-1)\dfrac{N r_{xy} - r'_{xy}}{N-1}}{I}}{\sqrt{\dfrac{r'_{xx} + (I-1)\dfrac{N r_{xx} - r'_{xx}}{N-1}}{I} + \dfrac{1 - \dfrac{r'_{xx} + (I-1)\dfrac{N r_{xx} - r'_{xx}}{N-1}}{I}}{I}}} \qquad [3]$$

where $N$ is the total number of informants, $r_{xy}$ is the average correlation of informants
with the world standard, $r_{xx}$ is the average correlation among informants, $r'_{xx}$ is the
average inter-informant correlation of the specific informant, $r'_{xy}$ is the correlation of the
specific informant with the world standard, and $I$ is the size of the aggregate that includes
the specific informant. It can be seen that when $I = 1$, the formula resolves to $r'_{xy}$, while
when $I = N$, it resolves to $r_{xy} / \sqrt{r_{xx} + (1 - r_{xx})/N}$, which is formula [1].
Formula [3] simply adjusts formula [1] to take into consideration the changed magnitudes of
$r_{xy}$ and $r_{xx}$ when we specify one of the informants who is a contributor to the aggregate.
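The sketch below implements formula [3] as the paragraph above describes it: formula [1] evaluated at aggregate size I, with r_xy and r_xx each replaced by a value that gives the specific informant weight 1/I and the mean of the remaining N - 1 informants, (N r - r') / (N - 1), weight (I - 1)/I. That functional form is our reading of [3], and all names and numbers are hypothetical. Running it confirms the two limiting cases stated in the text.

```python
import math


def blend(r_specific: float, r_mean: float, n_total: int, size: int) -> float:
    """Value for an aggregate of `size` that includes one specific informant:
    the informant's own value plus (size - 1) copies of the mean over the
    other n_total - 1 informants, averaged over the aggregate (n_total >= 2)."""
    mean_of_others = (n_total * r_mean - r_specific) / (n_total - 1)
    return (r_specific + (size - 1) * mean_of_others) / size


def aggregate_r_including(r_xy: float, r_xx: float,
                          r_xy_i: float, r_xx_i: float,
                          n_total: int, size: int) -> float:
    """Formula [1] at aggregate size `size`, using the blended r_xy and r_xx."""
    a_xy = blend(r_xy_i, r_xy, n_total, size)
    a_xx = blend(r_xx_i, r_xx, n_total, size)
    return a_xy / math.sqrt(a_xx + (1 - a_xx) / size)


if __name__ == "__main__":
    # Hypothetical pool of 26 informants and one specific informant.
    N, r_xy, r_xx = 26, 0.45, 0.30
    r_xy_i, r_xx_i = 0.50, 0.20
    for size in (1, 2, 13, N):
        print(size, round(aggregate_r_including(r_xy, r_xx, r_xy_i, r_xx_i, N, size), 4))
```

At size 1 the result is the informant's own r'_xy; at size N it equals formula [1] for the whole pool.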

To compute the average correlation with the world standard of all aggregates that include a
specific informant, one sums formula [3] across all aggregate sizes from 1 to N, multiplying
each term by the number of aggregates of that size that include the specific informant, and
divides the sum by the total number of combinations that include that informant, $2^{N-1}$.


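$$\frac{1}{2^{N-1}} \sum_{I=1}^{N} \frac{(N-1)!}{(I-1)!\,\big((N-1)-(I-1)\big)!} \cdot \frac{\dfrac{r'_{xy} + (I-1)\dfrac{N r_{xy} - r'_{xy}}{N-1}}{I}}{\sqrt{\dfrac{r'_{xx} + (I-1)\dfrac{N r_{xx} - r'_{xx}}{N-1}}{I} + \dfrac{1 - \dfrac{r'_{xx} + (I-1)\dfrac{N r_{xx} - r'_{xx}}{N-1}}{I}}{I}}}$$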


In practice, this formula gives less accurate estimates than [3], because the small errors of
estimation in [3] are magnified by the very large number of aggregates of intermediate size.
The term $\frac{(N-1)!}{(I-1)!\,((N-1)-(I-1))!}$ reaches a maximum when the aggregate size is
roughly half the total number of informants. With 26 informants and an aggregate size of 13,
it equals 5,200,300.
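The weights in this average can be checked directly: they are binomial coefficients C(N - 1, I - 1), which sum to 2^(N-1) and peak at intermediate aggregate sizes. The sketch uses Python's `math.comb` (Python 3.8 or later); the function names and the stand-in per-size correlation are hypothetical, and in practice formula [3] would be supplied in its place.

```python
import math
from typing import Callable


def n_aggregates_including(n_total: int, size: int) -> int:
    """(N-1)! / ((I-1)! ((N-1)-(I-1))!): the number of aggregates of the given
    size that contain one particular informant."""
    return math.comb(n_total - 1, size - 1)


def mean_over_aggregates_including(r_of_size: Callable[[int], float],
                                   n_total: int) -> float:
    """Weighted average of r_of_size(I) over all 2**(N-1) aggregates that
    include the informant (sizes 1..N)."""
    total = sum(n_aggregates_including(n_total, i) * r_of_size(i)
                for i in range(1, n_total + 1))
    return total / 2 ** (n_total - 1)


if __name__ == "__main__":
    N = 26
    weights = [n_aggregates_including(N, i) for i in range(1, N + 1)]
    print(sum(weights) == 2 ** (N - 1))   # True
    print(n_aggregates_including(N, 13))  # 5200300
    # Hypothetical per-size correlation; formula [3] would go here in practice.
    print(mean_over_aggregates_including(lambda i: 0.45 / math.sqrt(0.30 + 0.70 / i), N))
```

Sizes 13 and 14 both carry the maximum weight, 5,200,300, which is why errors at intermediate sizes dominate the sum.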
Endnotes

1
I would like to express my sincerest appreciation to Roy D'Andrade and to David A.
Kenny, who were invaluable in the development of these ideas. I am also grateful for
comments by Cornelia Dayton, Douglas Hume, Kristin Kostick, Kateryna Maltseva, John
Shaver, and Asha Srinivasan. A portion of this research was supported by grants from
the National Science Foundation.

2
It could be argued that the term "cross-cultural universal" is a misnomer and should be
replaced with "cross-social universal," because these are cases in which some cultural
content is shared between social groups. It is not the culture that is "cross" but the
societies. However, I will respect common usage and continue to use the term
"cross-cultural universal."

3
The existence of the universals does not mean that there are no cultural differences.
Indeed, in each of the domains there is evidence of strong commonalities based in
attributes of our common humanity, but the phenomena are always understood in local,
culturally specific terms. The task for the ethnographer is not to maintain either of the
absurd positions of radical universalism or radical relativism (that differences, or
similarities, respectively, do not exist) but to tease out the pattern and content of those
similarities and differences.
4
The controversy regarding the source of the universals in color classification can be
summarized as follows. Kay and McDaniel (1978), relying on DeValois et al. (1966),
argued that the unique hue points of red, yellow, green, and blue were given by the
pattern of firing of opponent-process cells in the lateral geniculate nucleus. However,
Jameson and D'Andrade (1997) pointed out that the opponent processes place the unique
hue points in the wrong places: the axes of the system are closer to cherry-teal versus
chartreuse-violet than to red-green versus yellow-blue. Thus, they argue that the
universals in color classification cannot be traced solely to a pattern imposed by the
human neurophysiology of color vision, though they remain agnostic about the precise
source of the structure. At the other extreme, Shepard (1992) argues that the perceptual
organization of color is an adaptation to regularities in the sun's illumination of the
earth, which has three degrees of freedom: 1) light-dark variation (midday vs. shade vs.
moonlight); 2) yellow-blue variation (bluish light where direct sunlight is blocked vs.
yellowish direct sunlight); and 3) red-green variation (the longest wavelengths, in the red,
are the least scattered by the atmosphere). The three types of cones (sensitive to long,
medium, and short wavelengths, respectively) are, according to Shepard, an adaptive
response to these three degrees of freedom. Romney and Indow (2002), studying the
reflectance spectra of Munsell chips, found three axes: one that roughly corresponds to
the mean power of the spectral reflectance (approximate Munsell value), a second that
goes from Munsell red to blue-green, and a third that goes from Munsell green-yellow to
purple. It is intriguing that these are almost precisely the dimensions that Jameson and
D'Andrade (1997) had found for the opponent-process system. Although these are not the
dimensions that Shepard (1992) identifies as the regularities in sunlight, the finding does
provide clear evidence that the human neurophysiology of color vision has evolved to
extract the maximum amount of information from reflected light: the perceptual
organization of color is almost exactly matched to the dimensional structure of reflected
light. This suggests that the structure in experience that leads to the universals in color
classification is in the phenomenon itself after all, even if its dimensions do not match the
Hering model's oppositions of red-green and yellow-blue.
5
Note that the aggregate of all the guesses is highly correlated with the true distances
even though it has a systematic bias, underestimating the distances by about 20%.

6
This is the formula that Kelley (1923) gives for the correlation between a criterion and
the sum or average of a number of equally weighted scores. One can derive the same
formula from the Spearman-Brown prophecy formula by assuming that adding informants
is comparable to adding items to a test, as shown in the appendix. Guilford (1936)
appreciated the analogy of informants to test items, but as far as I have been able to
determine, the analogy has dropped out of more recent treatments. I am grateful to David
A. Kenny and Roy D'Andrade for alerting me to this formula.

7
This is related to the correction for attenuation familiar to psychometricians (Nunnally
and Bernstein, 1994: 257).
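One way to see the connection (our reading, not a derivation given in the text): letting N grow without bound in formula [1] removes the term (1 - r_xx)/N, leaving r_xy divided by the square root of the reliability of the informant judgments, which is the form of the correction for attenuation applied to x alone, with the world standard treated as error-free.

$$\lim_{N \to \infty} \frac{r_{xy}}{\sqrt{r_{xx} + \dfrac{1 - r_{xx}}{N}}} = \frac{r_{xy}}{\sqrt{r_{xx}}}$$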

8
The successive pile sort is non-linguistic in the sense that informants are not explicitly
required to give a linguistic label for the items in the pile sort. This is not to say that
informants cannot be influenced by the categories labeled by the words in their language
as they are sorting the items, only that they are not requested to give the category labels
as part of the task. This follows the distinction introduced by Brown and Lenneberg
(1954) and elaborated by Kay and Kempton (1988) in describing the way to appropriately
test the Sapir-Whorf hypothesis.

9
The results are not dependent on this particular way of producing a world standard;
alternative methods, such as counting the number of informants who called each pair of
items by the same term, gave similar results.
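A minimal sketch of the alternative mentioned here: for each pair of items, count how many informants called both items by the same term. It assumes each informant's naming data is a hypothetical mapping from item to term covering the same items; how such counts were then turned into a world standard is not specified in this note.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def co_naming_counts(namings: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """For every unordered pair of items, count the informants who gave both
    items the same term. Every informant is assumed to have named every item."""
    items = sorted(namings[0])
    counts = {pair: 0 for pair in combinations(items, 2)}
    for informant in namings:
        for a, b in combinations(items, 2):
            if informant[a] == informant[b]:
                counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical naming data for three color chips from two informants.
    data = [
        {"chip1": "red", "chip2": "red", "chip3": "blue"},
        {"chip1": "red", "chip2": "orange", "chip3": "blue"},
    ]
    print(co_naming_counts(data))
```

Higher counts mark pairs that more informants treat as the same kind of thing, so the count matrix can serve as a similarity measure in the same way as the pile-sort data.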

10
The data are available online at http://www.icsi.berkeley.edu/wcs/data.html.

11
The value $2^N - 1$ is related to the sums of the rows of Pascal's triangle, a common
method of calculating the number of combinations of a given number of items.

1 1
1 2 1
1 3 3 1
1 4 6 4 1

The first value in each row is the number of ways of taking zero items, the second value
is the number of ways of taking one item, et cetera. One can see that each row sums to
$2^N$; subtracting 1 for the way of taking no items gives us $2^N - 1$.
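The row sums can be checked by building the rows shown above. This short sketch (function name hypothetical) generates row N by adding adjacent entries of row N - 1, so that the entries of row N are the numbers of ways of taking 0, 1, ..., N items.

```python
from typing import List


def pascal_rows(n_max: int) -> List[List[int]]:
    """Rows N = 1..n_max of Pascal's triangle; entry k of row N is C(N, k)."""
    rows = []
    row = [1]  # row N = 0
    for _ in range(n_max):
        # Each new entry is the sum of the two entries above it.
        row = [a + b for a, b in zip([0] + row, row + [0])]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for n, row in enumerate(pascal_rows(4), start=1):
        # sum(row) is 2**N; dropping the single way of taking no items gives 2**N - 1.
        print(row, sum(row), sum(row) - 1)
```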

12
The horizontal axis is
N
1
so as to produce a scale that increases with increasing N but
compresses high values of N. The vertical axis is the magnitude of the correlation of the
aggregate with the world standard. The points are the observed aggregate correlation
while the line is derived from the relationship predicted by formula [1]. The meanings of
the labels (PM, PF, etc.) are given in the first column of Table 1, and in the description
of the methods of data collection.

13
Roy D'Andrade has pointed out to me (D'Andrade, personal communication) that
Guilford (1936) makes a very similar observation about the reliability and validity of
mental tests. Guilford states (1936:423):

A close study of the preceding formulas, particularly of formula (204),
will show that, for a given validity $r_{xy}$, the smaller the reliability $r_{xx}$ of the
test, the higher will be the upper limit of validity when the test is
lengthened. In other words, an unreliable test gains proportionally more in
validity by lengthening than does a test that is already very reliable.
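Guilford's point follows from formula [1]. In the sketch below, two hypothetical tests (or informant pools) share the same single-unit validity r_xy = .40 but differ in reliability; the less reliable one gains more from lengthening and has the higher ceiling r_xy / sqrt(r_xx). The values are illustrative only and respect the constraint that r_xy cannot exceed sqrt(r_xx).

```python
import math


def lengthened_validity(r_xy: float, r_xx: float, n: int) -> float:
    """Formula [1]: validity after lengthening by a factor of n."""
    return r_xy / math.sqrt(r_xx + (1 - r_xx) / n)


def validity_ceiling(r_xy: float, r_xx: float) -> float:
    """Limit of lengthened_validity as n grows without bound."""
    return r_xy / math.sqrt(r_xx)


if __name__ == "__main__":
    r_xy = 0.40  # hypothetical single-unit validity
    for r_xx in (0.20, 0.80):  # low vs. high reliability
        print(f"r_xx={r_xx:.2f}  n=10: {lengthened_validity(r_xy, r_xx, 10):.3f}"
              f"  ceiling: {validity_ceiling(r_xy, r_xx):.3f}")
```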

14
The horizontal axis is the average correlation of each informant with other informants,
while the vertical axis is the correlation of each informant with the world standard in the
World Color similarity judgment task. The slanting striations indicate the average
correlations of aggregates that include the informants, ranging from a high of .51 at the
upper left to a low of .49 at the lower right. The informant John discussed in the text is
indicated.

15
As in Figure 6, the horizontal axis is the average correlation of each informant with
other informants, while the vertical axis is the correlation of each informant with the
world standard in the American Female (AF) emotion face similarity judgment task.
The slanting striations indicate the average correlations of aggregates that include the
informants, ranging from a high of .89 at the upper left to a low of .875 at the lower right.
The informants Jane, Ann, and Zoë discussed in the text are indicated.