
Journal of Experimental Psychology:
Learning, Memory, and Cognition

1990, Vol. 16, No. 1, 5-16
Copyright 1990 by the American Psychological Association, Inc.
0278-7393/90/$00.75
The Mirror Effect in Recognition Memory:
Data and Theory
Murray Glanzer and John K. Adams
New York University
The mirror effect is a regularity in recognition memory that requires reexamination of current
views of memory. Five experiments that further support and extend the generality of the mirror
effect are reported. The first two experiments vary word frequency. The third and fourth vary
both word frequency and concreteness. The fifth experiment varies word frequency, concreteness,
and the subject's operations on the words. The experiments furnish data on the stability of the
effect, its relation to response times, its extension to multiple mirror effects, and its extension
beyond stimulus variables to operation variables. A theory of the effect and predictions that
derive from the theory are presented.
The mirror effect (Glanzer & Adams, 1985) is a strong regularity in recognition memory.
It is summarized as follows. If there are two classes of stimuli, and one is more accurately
recognized than the other, then the superior class is both more accurately recognized as old
when old and also more accurately recognized as new when new. For example, low-frequency
words are better recognized than high-frequency words. The mirror effect means that the
greater efficiency in recognizing is always twofold. Old low-frequency words are better
recognized as old than are old high-frequency words, and new low-frequency words are better
recognized as new than are new high-frequency words.
In the discussion that follows, recognition performance is viewed as based on subjects'
responses to underlying distributions of some measure for new and old items. These
distributions are not, of course, directly observed. They are deduced from recognition data.
The relation of the underlying distributions to the data obtained from standard recognition
tests (yes/no, confidence rating, forced choice) is given in detail by Glanzer and Adams
(1985) and others (Egan, 1975; Green & Swets, 1966; McNicol, 1972).
Some possible distributions for two classes of stimuli, when one is more accurately
recognized than the other, are shown in Figure 1. Panel 1 represents the distributions that
underlie the mirror effect. The panel shows the distributions for two classes of stimuli, A
and B. Class A is recognized with greater accuracy. This is represented by the relatively
large distance
between the underlying A old (AO) and A new (AN) distributions. Class B is recognized with
less accuracy. This is represented by the relatively small distance between the B old (BO)
and B new (BN) distributions. The mirror effect means that the difference in accuracy of
recognition of A and B determines two more differences in distance. A old is higher on the
decision axis than B old, and A new is lower than B new, as shown in the panel. These
differences will be the focus of the statistical analyses of the experiments reported here.

This research was supported by Grant 1 ROI MH449 from the
National Institute of Mental Health, Grant BNS 84-15904 from the
National Science Foundation, and Contract F49620-86-C-0131 with
the Air Force Office of Scientific Research.
We thank Leslie Sherman and Amy Wolff for assistance in collection
and analysis of data, Jean-Claude Falmagne, Elliot Hirshman,
Geoff Iverson, and Gay Snodgrass for their helpful comments on an
earlier draft of this article. We also thank Douglas L. Hintzman,
William E. Hockley, and two anonymous reviewers for their constructive
comments.
Correspondence concerning the article should be addressed to
Murray Glanzer, Department of Psychology, New York University,
New York, New York 10003.

The mirror effect regularities do not follow from the simple fact that class A stimuli are
handled more accurately than class B stimuli. Given the difference in accuracy, a variety of
patterns of the underlying distributions could hold that violate the mirror effect. Two such
patterns are shown in Panels 2 and 3 of Figure 1.
Each of the relations represented in the panels of Figure 1 implies a particular pattern of
data for each of the standard recognition tests. The relations in Panel 1 imply for
confidence rating data that
R(AN) < R(BN) < R(BO) < R(AO),
where R represents the mean confidence rating on a scale that has very sure new at its low
end and very sure old at its high end.
For yes/no data the implied relations are
FA(AN) < FA(BN) < H(BO) < H(AO),
where FA is false alarm rate, and H is hit rate.
For forced choice data the relations are
P(BO, BN) < P(AO, BN), P(BO, AN) < P(AO, AN),
where P is the proportion of choices of the first argument over the second argument within
the parentheses. The comma between the two middle terms signifies an indeterminate relation
between those terms.
A meta-analysis of 80 recognition experiments supported the existence of the mirror effect
(Glanzer & Adams, 1985) for all recognition paradigms: yes/no, confidence rating, and forced
choice. The meta-analysis, moreover, demonstrated that the effect held for all stimulus
variables that could be surveyed: word frequency, concreteness, meaningfulness, and others.

Figure 1. Three possible orders of underlying distributions when accuracy on stimulus class
A is greater than accuracy on class B. (O = old, N = new.) Panel 1 shows the mirror effect.
[Figure not reproduced. Orders along the decision axis, from low to high: Panel 1: AN, BN,
BO, AO. Panel 2: AN, BN, AO & BO. Panel 3: AN & BN, BO, AO.]

Such a regularity in memory is a challenge to strength theories of recognition memory. This
point was first noted by Brown (1976). According to strength theories, in a recognition test
the subject decides on the basis of the strength of the items. Terms equivalent to strength
are the amount of marking or familiarity of the items. These theories, therefore, label the
decision axis in Figure 1 as strength, amount of marking, or familiarity. Such theories have
problems in accounting for the mirror effect. They contain no inherent mechanism that arrays
the underlying new and old distributions in the mirror order as depicted in Panel 1 of
Figure 1. In this article we will consider a different theory of the effect:
attention/likelihood theory.
Several experiments will now be presented. The first experiment will expand the data base of
the mirror effect. It will also examine the mean confidence ratings for misses. This measure
is of interest for the testing of theories of the effect.
Experiment 1: Word Frequency and
Incidental Learning
In this experiment the subjects first carried out an incidental learning task, lexical
decision. Then they were given a surprise recognition test in which the old words were the
words presented during the lexical decision task. The times to make the lexical decision
responses and the recognition responses were recorded.
Method
Procedure. In the lexical decision task, the subjects viewed words and nonwords on a
monitor. The presentation was paced by the subjects who, for each item, pressed one of two
response keys on a response board labeled yes (for word) and no (for nonword). The yes key
was assigned to the subject's dominant hand. The subjects were told to be quick and
accurate. The lexical decision task was preceded by eight practice items.
The recognition test was carried out on the computer keyboard. Only words were presented
during this test. Onset of the test word started a timing period. When the subjects reached
a decision, they pressed the space bar, which ended the timing period. Then they pressed one
of two keys indicating whether the item was old or new. The keys in this experiment and in
Experiments 3, 4, and 5 were arranged so that old was assigned to the subject's dominant
hand. Finally, they pressed one of four keys labeled unsure, somewhat sure, moderately sure,
or very sure. This three-stage response was used to exclude from the subjects' recognition
response times the additional time to move to the extreme rating scale positions. That time
could produce an artifactual speed-accuracy trade-off.
All stimuli for both the incidental learning task and recognition test were presented
centered, in uppercase letters. Following the subject's response, the screen went blank for
500 ms in the incidental learning task and for 2,000 ms in the recognition test. Then the
next item appeared on the screen. The presentation of the items on the monitor was
controlled by a computer which also recorded responses and response times. The program used
for the computer is described in Adams (1985). Except when noted otherwise, the procedures
and stimulus presentation used here were the same in Experiments 3, 4, and 5.
Materials. The words presented in the lexical decision task consisted of 124 high-frequency
words (mean log Kučera-Francis frequency 4.8) and 124 low-frequency words (mean log
frequency 2.4). The word groups both had a mean length of 5.0 letters. The 248 nonwords were
constructed to be orthographically and phonologically legal. They had a mean length of 5.6
letters. The new words presented in the recognition test (124 high frequency and 124 low
frequency) had the same mean frequency and length as the old words. The main list of lexical
decision items was preceded by 12 initial filler items and followed by 12 final filler items
(each consisting of six words and six nonwords) to eliminate serial position effects.
Nonwords and filler words did not appear on the subsequent recognition test. The word sets
were counterbalanced across subjects so that each of the experimental words was used an
equal number of times as old and new in the recognition test. Again, this counterbalancing
of word sets was used in Experiments 3, 4, and 5.
Subjects. Sixteen undergraduates participated in the experiment to fulfill an introductory
psychology course requirement. All were native speakers of English. This description of the
way subjects were recruited and selected holds also for Experiments 3, 4, and 5.
Results
The subjects were highly accurate on all classes of items in the lexical decision task. The
proportions of correct responses are as follows: high-frequency words (M = .99),
low-frequency words (M = .97), nonwords (M = .95). The effect of item class is statistically
significant, F(2, 30) = 13.43, p < .0001, MSe = 0.022. Analysis of proportions here and in
the rest of this article was carried out on the arcsine transformation of the original
proportions. (In this article, where scores are transformed either by arcsine or logarithm,
the accompanying MSes will be for the transformed scores.)
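
As an illustration of the transformation step, the following minimal sketch applies an arcsine transformation to the three lexical decision accuracies reported above. The article does not state which form of the transformation was used; the form 2 * arcsin(sqrt(p)) and the function name `arcsine_transform` are assumptions made here for illustration only.

```python
import math


def arcsine_transform(p: float) -> float:
    """Return 2 * arcsin(sqrt(p)) for a proportion p in [0, 1].

    Assumption: this is one common variance-stabilizing form; the
    article does not say which variant was actually applied.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p))


# Mean proportions correct reported for the lexical decision task.
lexical_accuracy = {"high": 0.99, "low": 0.97, "nonword": 0.95}
transformed = {name: arcsine_transform(p) for name, p in lexical_accuracy.items()}
```

The transformation is monotonic, so the ordering of the three item classes is unchanged. Only the spacing near the ceiling is stretched.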
The lexical decision response times are, as expected, negatively correlated with the
proportion correct. The geometric means (antilogs of mean logs) were 618, 671, and 840 ms
for high-frequency words, low-frequency words, and nonwords, respectively. Analyses of
response times in this experiment and in Experiment 3 were carried out on the logs of the
response times. The effect of item class is, again, statistically significant,
F(2, 30) = 74.61, p < .0001, MSe = 0.005. In summary, the lexical decision task showed the
usual pattern found in experiments with word frequency as the variable (see Glanzer &
Ehrenreich, 1979).

Table 1
Means for the Four Conditions of Experiment 1 (N = 16)

                 New               Old
Measure      Low     High      High    Low
Rating       3.34    3.76      5.09    5.56
P(yes)       .304    .359      .592    .661
RT a         1,213   1,192     1,170   1,166

Note. P(yes) = proportion of yes responses. RT = response time.
a Antilog of mean log response time (in milliseconds).

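
The response time summaries in this article are geometric means, that is, antilogs of mean log response times. A minimal sketch of that computation follows. The function name `geometric_mean` and the example times are hypothetical and are not data from the experiments.

```python
import math


def geometric_mean(times_ms):
    """Antilog of the mean of the natural logs of positive response times."""
    if not times_ms:
        raise ValueError("need at least one response time")
    logs = [math.log(t) for t in times_ms]
    return math.exp(sum(logs) / len(logs))


# Hypothetical response times (ms) for one subject in one condition.
example_times = [540.0, 610.0, 655.0, 1480.0]
gm = geometric_mean(example_times)
am = sum(example_times) / len(example_times)
```

Because the log compresses long times, the geometric mean is pulled less by a slow outlier such as the last value than the arithmetic mean is.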
Two related sets of recognition measures are of interest with respect to the mirror effect.
One is the confidence ratings for the four stimulus conditions. The other is the proportions
of hits and false alarms. In the case of confidence ratings, the mirror pattern is
R(LN) < R(HN) < R(HO) < R(LO),
where R signifies mean rating. The arguments L and H refer again to low- and high-frequency
words; N and O refer to new and old. The ratings here and in the following experiments are
placed on a single scale, with the highest value, 8, assigned to very sure the item is old,
7 to moderately sure the item is old, 6 to somewhat sure the item is old, 5 to unsure the
item is old, 4 to unsure the item is new, and so on down to 1, assigned to very sure the
item is new.
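
The single 8-point scale described above combines the old/new judgment with the four confidence labels. The sketch below shows one way to express that mapping. The function name `combined_rating` and the string labels used as arguments are illustrative assumptions, not part of the original scoring program.

```python
# Confidence labels from the Method section, ordered from least to most sure.
CONFIDENCE_LEVELS = ("unsure", "somewhat sure", "moderately sure", "very sure")


def combined_rating(judgment: str, confidence: str) -> int:
    """Map an old/new judgment plus a confidence label onto the 1-8 scale.

    8 = very sure old ... 5 = unsure old, 4 = unsure new ... 1 = very sure new.
    """
    level = CONFIDENCE_LEVELS.index(confidence)  # 0 (unsure) .. 3 (very sure)
    if judgment == "old":
        return 5 + level
    if judgment == "new":
        return 4 - level
    raise ValueError("judgment must be 'old' or 'new'")
```

For example, `combined_rating("new", "moderately sure")` gives 2, and `combined_rating("old", "unsure")` gives 5.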
In the case of hits and false alarms the mirror pattern is
FA(LN) < FA(HN) < H(HO) < H(LO).
Table 1 and the following tables are arranged so that the mirror effect is evidenced by a
progression of increasing means going from left to right. The mean confidence ratings
(row 1) and the hits and false alarms, the proportion of yes responses (row 2), both show
the mirror effect.
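
The left-to-right progression can be checked mechanically against the values in Table 1. The following sketch does so. The function name `shows_mirror_order` is a hypothetical helper, and the check concerns only the order of the means, not their statistical reliability.

```python
def shows_mirror_order(ln, hn, ho, lo):
    """True if the four means follow LN < HN < HO < LO (the mirror pattern)."""
    return ln < hn < ho < lo


# Means from Table 1, in the table's column order: LN, HN, HO, LO.
table1 = {
    "rating": (3.34, 3.76, 5.09, 5.56),
    "p_yes": (0.304, 0.359, 0.592, 0.661),
    "rt_ms": (1213, 1192, 1170, 1166),
}
mirror = {measure: shows_mirror_order(*means) for measure, means in table1.items()}
```

Under this check the rating and P(yes) rows satisfy the mirror order, whereas the response time row does not.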
The statistical analysis of the data sets (confidence ratings, hits and false alarms,
response times) for this and the following experiments is carried out by first doing a
preliminary one-way analysis of variance across the experimental conditions, in this case
low-frequency new (LN), high-frequency new (HN), high-frequency old (HO), and low-frequency
old (LO). This analysis is followed by two key comparisons: (a) high-frequency old versus
low-frequency old and (b) high-frequency new versus low-frequency new.
These two comparisons are critical. If the differences are in the right direction and
statistically significant, they support the stability of the mirror effect. One-tailed tests
are used for these planned comparisons.
The overall evaluation (the one-way analysis of variance here of the four experimental
conditions) of the confidence ratings shows F(3, 45) = 120.92, p < .0001, MSe = 0.147. This
overall evaluation gives highly significant effects because it includes the effect of new
versus old items as well as the comparisons of interest. The overall evaluation, which in
all experiments is highly significant, is reported here but not in the following experiments
because it is not of interest. Only the key planned comparisons are presented.
These comparisons for the confidence ratings show both critical differences in the right
direction and both statistically significant: high-frequency old versus low-frequency old,
t(45) = 3.46, p < .005; high-frequency new versus low-frequency new, t(45) = 3.10, p < .005.
The parallel analysis of proportion of the yes responses (hits and false alarms) shows the
overall F(3, 45) = 120.91, p < .0001, MSe = 0.018. Both critical differences are again in
the right direction and statistically significant: low-frequency versus high-frequency hits,
t(45) = 3.14, p < .005; low-frequency versus high-frequency false alarms, t(45) = 2.57,
p < .01. The analysis of the proportion of yes responses is partially redundant with the
analysis of confidence ratings. It therefore will be reported in only abbreviated form in
the following experiments. The mean proportions will be included in the tables to underscore
the regularity of the effect.
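
The planned comparisons on the confidence ratings can be approximately reconstructed from the rounded means in Table 1. The sketch below assumes that each comparison uses the error term of the overall analysis (MSe = 0.147, 45 df) with n = 16 subjects per condition, so that t = (M1 - M2) / sqrt(2 * MSe / n). That formula and the function name `planned_t` are assumptions. The article reports only the resulting t values.

```python
import math


def planned_t(mean_a: float, mean_b: float, mse: float, n: int) -> float:
    """t for a pairwise planned comparison using a pooled ANOVA error term.

    Assumes equal n per condition and the standard error sqrt(2 * MSe / n).
    """
    standard_error = math.sqrt(2.0 * mse / n)
    return (mean_a - mean_b) / standard_error


# Rounded mean ratings from Table 1; MSe and n from the text of Experiment 1.
t_old = planned_t(5.56, 5.09, mse=0.147, n=16)  # low- vs. high-frequency old
t_new = planned_t(3.76, 3.34, mse=0.147, n=16)  # high- vs. low-frequency new
```

Under these assumptions the two values come out close to the reported t(45) = 3.46 and t(45) = 3.10. Small discrepancies are expected because the table means are rounded to two decimals.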
Also shown in Table 1 are the mean response times for each of the four conditions. There are
two possible expectations concerning the pattern of response times. One, and of greater
concern to us, is that there is a speed-accuracy trade-off, with response times for LO > HO
and LN > HN. Such a trade-off would make the mirror effect trivial. Hockley and Murdock
(1987) present evidence (Hockley, 1982) against a trade-off. The possibility of trade-off
is, however, important enough to require full checking. The other possibility is a positive
correlation of speed and accuracy: LO < HO and LN < HN. The Hockley (1982) data show such a
positive correlation. Here, however, neither pattern holds: neither a speed-accuracy
trade-off (negative correlation) nor a speed-accuracy positive correlation.
Analysis of variance of the log response times reveals only the difference between new and
old as statistically significant, t(45) = 2.11, p < .05, MSe = 0.002. Neither of the other
relevant comparisons is large or statistically significant: high-frequency old versus
low-frequency old; high-frequency new versus low-frequency new. There is no evidence here,
therefore, that speed-accuracy correlation plays a role in the mirror effect. It could be
argued that, for the response arrangement used in this experiment, the subjects' initial
response may have preceded their actual decision and that this reduced the correlation of
speed and accuracy. The similarity of the mean response times, all close to 1,200 ms, does
not support such an argument. We will, however, examine this question again in Experiment 3
with a different response arrangement.
The mean ratings for misses have been singled out as important by Brown, Lewis, and Monk
(1977). They note that missed highly memorable old items are rejected with greater
confidence than missed low-memorable old items. This finding has theoretical importance
because it contradicts expectations on the basis of strength theories. The data of this
experiment replicate the finding. The mean confidence rating for low-frequency misses is
2.15 and for high-frequency misses is 2.33. The difference, though small, is in the right
direction and is statistically significant, F(1, 15) = 15.58, p < .002, MSe = 0.016. We
consider this finding and others that show the same relation in the final section.
In summary, the present experiment shows the following: (a) another replication of the
mirror effect for word frequency on both confidence rating and yes/no data; (b) no evidence
of a speed-accuracy correlation; (c) differences in the mean confidence ratings for misses.
The next experiment was designed to examine Points a and c further.
Experiment 2: Word Frequency and Intentional
Learning
This was a replication of Experiment 1, with several changes in procedure. It was carried
out as a group experiment with intentional instead of incidental learning and auditory
instead of visual presentation.
Method
Procedure. A group of subjects heard a single list of words read
at a 1-s rate. They were told that they would be given a recognition
test. The test consisted of a printed list of words, mixed old and new.
Next to each word was a sequence of letters and number s- - Y, N, 1,
2, 3, and 4. The subject indicated old by circling Y, new by circling
N, and degree of confidence by circling the number (1 for unsure, 4
for very sure).
Materials. A shorter list was constructed from the materials used
in Experiment 1. The study list consisted of 50 high-frequency words
(mean log frequency = 5.1) and 50 low-frequency words (mean log
frequency = 2.5) plus 24 initial filler words and 24 final filler words.
The filler words were evenly divided into high- and low-frequency
words. The test list consisted of the 100 study list words plus matched
(same mean log frequency) groups of 50 new high-frequency and 50
new low-frequency words as distractors. The subjects' responses on
the test were self-paced.
Subjects. Thirty-five undergraduates in a memory course partic-
ipated in the experiment as a class exercise.
Results
The main results are shown in Table 2. They parallel closely the results of Experiment 1.
The mirror effect is present in both the mean confidence ratings and proportion of yes
responses.
The tests (MSe = 0.436) of the mean ratings again show the key differences to be
statistically significant: high-frequency old versus low-frequency old, t(102) = 3.59,
p < .0005; high-frequency new versus low-frequency new, t(102) = 2.11, p < .025. The mirror
pattern also holds in the parallel analysis of the hits and false alarms (row 2), both
ps < .05.

Table 2
Means for the Four Conditions of Experiment 2 (N = 35)

                 New               Old
Measure      Low     High      High    Low
Rating       3.22    3.56      5.51    6.08
P(yes)       .228    .281      .613    .704

Note. P(yes) = proportion of yes responses.

The mean ratings for the misses show the same pattern as in Experiment 1. The mean
confidence ratings for misses are lower for low-frequency words (2.56) than for
high-frequency words (2.68). This is based on 34 subjects because 1 subject did not have any
misses. The difference again is slight but statistically significant, F(1, 33) = 7.90,
p < .01, MSe = 0.030.
In summary, the results of Experiment 2 (with auditory presentation, intentional learning,
and group testing) confirm the findings of Experiment 1.
Experiment 3: Multiple Mirror Effects and Partial
Order--Frequency and Concreteness
The purpose of this experiment was to develop a multiple mirror effect by using two
variables (normative frequency and concreteness) in a single set of items. Each of the two
variables alone produces a mirror effect. We combined these two variables factorially in
order to produce a more complex mirror effect involving more ordered terms than the four
ordered terms seen in the preceding experiments. If the two variables are equal in their
effectiveness, then the mean confidence ratings should give the following partial order:
R(LCN) < R(HCN), R(LAN) < R(HAN) <
R(HAO) < R(LAO), R(HCO) < R(LCO),
where C represents concrete and A abstract words (for example, LCN = low frequency,
concrete, new). Parallel to these inequalities for the ratings should be a partial order for
the hits and false alarms:
FA(LCN) < FA(HCN), FA(LAN) < FA(HAN) <
H(HAO) < H(LAO), H(HCO) < H(LCO).
We had actually expected that frequency would be more effective than concreteness. In that
case, where one variable is stronger, a full rather than a partial order is expected. A
fully ordered set of terms will be produced in the next experiment.
Method
Procedure. The procedure was basically the same as that in
Experiment 1, with lexical decision as the incidental learning task.
The sequence of responses required of the subjects in the recognition
test was, however, simplified. The subjects made a single response to
each test word, pressing one of an array of eight keys with the
rightmost key indicating very sure old and the leftmost key indicating
very sure new. The eight keys were in the top row of the keyboard
with labels indicating confidence levels. On the test the subjects saw
a series of 280 words, half old and half new. The stimulus presentations
in both study and test were self-paced. The interstimulus intervals on
the study and the test lists were the same as in Experiment 1 (500
and 2,000 ms, respectively).
Subjects. Sixteen undergraduates participated.
Materials. The composition of the lists differed from that in
Experiment 1. In the lexical decision task 140 words and 140 nonwords
were presented. The 140 words were drawn, 35 from each of
four 70-word sets: low-frequency concrete (LC), high-frequency concrete
(HC), low-frequency abstract (LA), and high-frequency abstract
(HA). The two low-frequency sets both had mean log frequency of
1.5; the two high-frequency sets both had mean log frequency 3.9,
based on the Kučera-Francis (1967) norms. The two concrete sets
both had a mean concreteness rating of 6.8; the two abstract sets both
had a mean rating of 2.6, based on the Paivio, Yuille, and Madigan
(1968) norms. With both concreteness and frequency varied, it was
not possible to match the word lengths across conditions as closely
as in Experiments 1 and 2. The means for the four groups listed
above were 7.2, 5.9, 7.8, and 6.9 letters, respectively. Because we
were concerned that these differences might affect the pattern of
results, we subsequently carried out a special analysis of the data to
determine whether the differences had an effect. They did not. This
analysis will be reported briefly later.
The main list of lexical decision items was preceded by 80 filler
items and followed by 80 filler items (half words and half nonwords)
which did not appear on the recognition test. The recognition test list
consisted of the old words plus the remaining unpresented 140 words,
35 from each of the four word sets.
Results
In the lexical decision task, high-frequency words took less time to respond to
(M = 592 ms) than did low-frequency words (M = 702 ms), F(1, 15) = 65.38, p < .0001,
MSe = 0.007. High-frequency words (M = .98) were responded to more accurately than
low-frequency words (M = .92), F(1, 15) = 37.34, p < .0001, MSe = 0.062. The concrete versus
abstract words in the lexical decision test did not differ significantly in response time
(F < 1). Accuracy was, however, somewhat higher for abstract (M = .96) than concrete words
(M = .94), F(1, 15) = 3.92, p = .07, MSe = 0.038.
The overall recognition test results are presented in Table
3. They are considered first with respect to frequency alone
and concreteness alone. This simplification is justified because
a factorial analysis of variance of the data showed that the
two stimulus variables, frequency and concreteness, do not
interact. After examining the main effects of frequency and
concreteness separately, the results for the combination of the
two variables will be examined.
Summing across concreteness conditions, the mirror effect for frequency is evident again in
both the confidence ratings and the proportions of yes responses (hits and false alarms).
The key tests (MSe = 0.272) of the confidence ratings for old low-frequency (M = 5.79)
versus old high-frequency words (M = 5.31) show t(105) = 3.66, p < .0005; and for new
high-frequency (M = 3.63) versus new low-frequency words (M = 3.17) show t(105) = 3.50,
p < .0005. Parallel tests on the proportion yes data give the same results (both ps < .025).
Summing across frequency conditions, the mirror pattern also appears for concreteness in
both the confidence ratings and the hits and false alarms. Tests of the confidence ratings
show concrete old (M = 5.74) higher than abstract old (M = 5.37), t(105) = 2.88, p < .005,
and concrete new (M = 2.98) lower than abstract new (M = 3.82), t(105) = 6.40, p < .0005.
Concrete hits (M = .679) are higher than abstract hits (M = .654), but the difference is not
statistically significant. Concrete false alarms (M = .177) are lower than abstract false
alarms (M = .320), t(105) = 6.01, p < .0005.
The mean confidence ratings of misses again show the
order noted by Brown et al. (1977). The order holds for both
frequency and concreteness. The low-frequency misses are
rated lower (2.81) than the high-frequency misses (3.03), t(45)
= 2.86, p < .005, MSe = 0.096. The concrete misses are also
rated lower (2.89) than the abstract (2.95), but the difference
is not statistically significant.
Before moving to the consideration of the accuracy scores
for the combined conditions, two issues will be touched on.
One concerns the effect of word length on the pattern of the
results. We noted earlier that the word sets differed in mean
length. Although those lengths did not correspond to the
mirror effects observed, we decided to check on any possible
effects of word length fully. We did this by removing words
from the word sets so that the reduced word sets all had
identical mean lengths while preserving the match of fre-
quency and concreteness. This meant going from four sets of
70 words to four sets of 30 words. We then computed the
mean ratings for the reduced sets and analyzed the pattern
produced. The pattern and overall analysis of variance cor-
responded fully to those obtained for the complete sets of
words. To convey the correspondence, the means for the
reduced set corresponding to the means in the first row of
Table 3 read from left to right as follows: 2.72, 3.32, 3.68,
4.01, 5.19, 5.46, 5.42, and 6.10. The means are only slightly
different from those for the larger set of items. The results of
statistical analysis based on the reduced set also differ only
slightly and in no important way from the full analysis. The
differences in word length, therefore, were not important.
The second issue concerns the response times. There is
evidence of some differences: old are faster than new items,
F(1, 15) = 12.776, p < .003, MSe = 0.013; overall, high-
frequency words are faster than low, F(1, 15) = 20.168,
p < .0005, MSe = 0.004; concrete are faster than abstract,
F(1, 15) = 5.716, p < .05, MSe = 0.013.
Our main concern was, however, the presence of a speed-accuracy trade-off. There is no
evidence of this. There is no relation between the response times and either of the accuracy
measures within either the new conditions or the old in Table 3. The rank order correlation
of speed and accuracy is zero for both the new and old conditions. There is, then, no
evidence in this experiment, or in Experiment 1, for any general relation between the mirror
effect and response times.

Table 3
Means for the Eight Conditions of Experiment 3 (N = 16)

                         New                             Old
Measure      LC      HC      LA      HA      HA      LA      HC      LC
Rating       2.73    3.23    3.61    4.02    5.19    5.54    5.44    6.04
P(yes)       .161    .193    .284    .357    .630    .677    .625    .732
RT a         2,011   1,885   2,051   1,970   1,877   1,927   1,728   1,844

Note. L = low frequency; H = high frequency; C = concrete; A = abstract; P(yes) = proportion
of yes responses; RT = reaction time.
a Antilog of mean log response time (in milliseconds).

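
The zero rank order correlation can be illustrated with the mean ratings and response times in Table 3. The sketch below computes Spearman's rho by hand for the four new and the four old conditions. It treats accuracy as a low rating for new items and a high rating for old items, which is an interpretive assumption. The helper names `ranks` and `spearman_rho` are hypothetical, and the formula used assumes no tied values.

```python
def ranks(values, reverse=False):
    """Rank values from 1 (smallest, or largest if reverse=True); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation from two lists of untied ranks."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Table 3 values. New conditions in order LC, HC, LA, HA.
new_rating, new_rt = [2.73, 3.23, 3.61, 4.02], [2011, 1885, 2051, 1970]
# Old conditions in order HA, LA, HC, LC.
old_rating, old_rt = [5.19, 5.54, 5.44, 6.04], [1877, 1927, 1728, 1844]

# Rank 1 = most accurate (lowest rating if new, highest if old) and fastest.
rho_new = spearman_rho(ranks(new_rating), ranks(new_rt))
rho_old = spearman_rho(ranks(old_rating, reverse=True), ranks(old_rt))
```

With these table values the sum of squared rank differences is 10 in both sets, which gives rho = 0 for the new and for the old conditions.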
We return now to the accuracy measures in Table 3.
Contrary to our expectations, concreteness and frequency
were approximately equal in their effectiveness. This can be
seen by comparing the highest mean rating for frequency,
which for the low-frequency old words (LC plus LA) is 5.79,
and the highest mean rating for concreteness, which for
concrete old words (LC plus HC) is 5.74. With two variables
of equal strength, only partial orders are expected for the
confidence ratings. A partial order is what is obtained:
R(LCN) < R(HCN), R(LAN) < R(HAN) <
R(HAO) < R(LAO), R(HCO) < R(LCO).
A deviation from the partial order is obtained, however, in
the hits and false alarms because H(HCO), .625, is slightly
lower than H(HAO), .630. This we consider to reflect the
relative weakness of the proportion yes data, which contain
less information than do the ratings.
If a partial rather than full order is due to the absence of
strong differences in the effectiveness of the two stimulus
variables, frequency and concreteness, then a number of
changes can be introduced to produce the full order. One
possible change is to select the sets of words used so that either
the differences in frequency or the differences in word con-
creteness would be greater than in the word sets used in this
experiment. We could, for example, select only the very
highest and very lowest frequency words. This, however,
would reduce further an already limited pool of words. The
other way would be to weaken one of the variables, for
example, by adding middle range items in either the high-
concreteness set or the low-frequency set. This could, how-
ever, weaken the effectiveness of the variable sufficiently to
lose the mirror regularity. We decided not to change the word
sets but to introduce an encoding task that would differentially
affect the word sets. We therefore repeated the experiment,
making the concreteness variable stronger by a concreteness
encoding task.
Experiment 4: Multiple Mirror Effects and Full
Order--Frequency and Concreteness Plus
Concreteness Encoding Task
This was a replication of Experiment 3, with a change in
the encoding task. We hoped that a concreteness encoding
task would strengthen the concreteness variable and thus give
a full order of the eight means produced by the combination
of two variables--word frequency and concreteness. The eight
means should display a higher order mirror effect.
The words were the same as those in Experiment 3. Instead
of lexical decision, however, a concreteness encoding task was
given. No nonwords were shown. During the initial list pres-
entation, the subjects carried out, as an incidental learning
task, a concreteness judgment on the words. During the
recognition test the words, both new and old, were each judged
first for concreteness before the recognition judgment was
made. This was done in order to have the encoding operation
affect new as well as old items.
Method
Materials. The study list consisted of the 140 words in four
categories used in Experiment 3 (LC, HC, LA, HA) plus 4 practice
items, 40 initial filler words, and 40 final filler words. The test list
consisted of those 140 words plus 140 matched words.
Procedure. The subject was instructed that items that could be
sensed (seen, heard, touched, tasted, or smelled) were concrete. Dur-
ing the encoding task the subjects pressed a key on a keyboard labeled
"+" if the word on the screen was judged concrete or a key labeled
"-" if it was judged not concrete. During the initial encoding trials,
the subject received feedback on the correctness of the judgment. The
feedback consisted of the word right or wrong appearing on the screen
for 750 ms. During the recognition test, the subject made a concrete-
ness judgment first for each word, but no feedback was given. Im-
mediately after the concreteness judgment, the subject made a con-
fidence judgment on whether the word was old or new, on an eight-
key array as in Experiment 3.
Subjects. Sixteen undergraduates participated.
Results
On the initial encoding task, the subjects were more accu-
rate on low-frequency (M = .96) than high-frequency words
(M = .94), F(1, 15) = 6.66, p < .03, MSe = 0.025, and on
concrete (M = .97) than abstract words (M = .93), F(1, 15)
= 16.36, p < .002, MSe = 0.054. Items encoded incorrectly
on test trials (3.5%) were not included in the scoring of
recognition performance. Examination of the data shows,
however, that even if these items are included, they do not
change the pattern of results.
The results for the recognition test are given in Table 4.
First, the encoding task was successful in making the concrete-
ness variable stronger in the recognition task. The mean rating
for old concrete words here is 6.72 as compared with 6.47 for
old low-frequency words. (The corresponding means in Ex-
periment 3 were 5.74 and 5.79.) We can expect, then, that a
full order of inequalities will be found for these data.
The results will be examined again, first with respect to
frequency alone and concreteness alone. As in Experiment 3,
a factorial analysis of variance of the data showed that fre-
quency and concreteness did not interact.
The means show the mirror effect for both word frequency
alone and concreteness alone, and for both the mean confi-
dence ratings and the proportion of yes responses (hits and
false alarms) in each. The tests (MSe = 0.300) of the ratings
of high-frequency old (M = 6.17) versus low-frequency old
(M = 6.47) give t(105) = 2.25, p < .025; and high-frequency
new (M = 3.29) versus low-frequency new (M = 2.85), t(105)
= 3.21, p < .001. Parallel tests on the proportion yes data
show both comparisons with p < .05.
For concreteness the confidence ratings (MSe = 0.300) of
the concrete old (M = 6.72) versus abstract old (M = 5.92)
give t(105) = 5.87, p < .0005. The ratings for the abstract new
(M = 3.30) versus concrete new (M = 2.85) give t(105) =
3.32, p < .001. Parallel tests on the proportion yes data show
both comparisons with p < .001.
The order of the mean confidence ratings for misses noted
before holds again (MSe = 0.380). Low-frequency misses
(M = 2.48) have lower ratings than do high-frequency misses
(M = 2.77), t(45) = 1.92, p < .05. Concrete misses (M = 2.47)
have lower ratings than do abstract misses (M = 2.78), t(45)
= 2.06, p < .025.

Table 4
Means for the Eight Conditions of Experiment 4 (N = 16)

                     New                             Old
Measure    LC      HC      LA      HA      HA      LA      HC      LC
Rating     2.67    3.02    3.03    3.57    5.85    5.99    6.48    6.96
P(yes)     .148    .200    .201    .300    .747    .758    .813    .882

Note. L = low frequency; H = high frequency; C = concrete; A = abstract; P(yes) = proportion of yes
responses.

Of particular interest here is whether there is an extended
eight-category mirror effect, in full order, now that one of the
experimental variables, concreteness, is stronger than the
other. Table 4 displays the data for all eight combined con-
ditions. Both mean confidence ratings and proportion of yes
responses now show the expected full order:
R(LCN) < R(HCN) < R(LAN) < R(HAN) <
R(HAO) < R(LAO) < R(HCO) < R(LCO)
and
FA(LCN) < FA(HCN) < FA(LAN) < FA(HAN) <
H(HAO) < H(LAO) < H(HCO) < H(LCO).
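
The contrast between the partial order of Experiment 3 and the full order of Experiment 4 can be checked directly from the tabled means. The following sketch is only an illustration: it copies the rows of Tables 3 and 4 in the mirror order LCN, HCN, LAN, HAN, HAO, LAO, HCO, LCO, and the function name `is_full_order` is hypothetical.

```python
def is_full_order(values):
    """True when every value is strictly smaller than the one after it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Means in the order LCN, HCN, LAN, HAN, HAO, LAO, HCO, LCO.
exp3_rating = [2.73, 3.23, 3.61, 4.02, 5.19, 5.54, 5.44, 6.04]  # Table 3
exp3_p_yes = [.161, .193, .284, .357, .630, .677, .625, .732]   # Table 3
exp4_rating = [2.67, 3.02, 3.03, 3.57, 5.85, 5.99, 6.48, 6.96]  # Table 4
exp4_p_yes = [.148, .200, .201, .300, .747, .758, .813, .882]   # Table 4

for label, row in [
    ("Experiment 3 ratings", exp3_rating),
    ("Experiment 3 P(yes)", exp3_p_yes),
    ("Experiment 4 ratings", exp4_rating),
    ("Experiment 4 P(yes)", exp4_p_yes),
]:
    print(label, "full order" if is_full_order(row) else "not a full order")
```

Both Experiment 4 rows pass the strict test. The Experiment 3 rows fail it because the LAO and HCO means are not in the full-order sequence, which is the pattern that text describes as a partial order.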
To examine the strength of the orderings, we carried out an
analysis that parallels the tests carried out in the preceding
experiments. In the earlier tests there were two old and two
new means. Here there are four of each. The mirror effect
will be evidenced by the strength of the linear component in
each set of four means. We therefore evaluated the linear
component of each set of four related recognition measures
in Table 4, for example, the confidence ratings of LCO, HCO,
LAO, and HAO in row 1. When this is done, we find the
following: The linear component for confidence rating means
of the four old conditions gives F(1, 105) = 39.13, p < .0005;
for the four new conditions, F(1, 105) = 19.45, p < .0005; for
hits, F(1, 105) = 19.06, p < .0005; and for false alarms, F(1,
105) = 17.89, p < .0005. The extent of order in the means
can be fully conveyed by evaluating the proportion of variance
accounted for by the mirror ordering in each array of means.
For confidence ratings, the proportion of variance accounted
for by these two linear components, after the effect of old
versus new items is taken out, is .93; it is .85 for the hits and
false alarms.
The results of the experiment strengthen the empirical basis
of the mirror effect. The effect is shown, moreover, to produce
an extended order when two variables that differ in effective-
ness are used. The extended order is an eight-position mirror
effect.
Experiment 5: Frequency, Concreteness,
and Transformation
The purpose of this experiment was to examine multiple
mirror effects with a third, new type of variable added--
transformation of the list words. Kolers (1973, 1974, 1975a,
1975b), Kolers and Ostry (1974), and Graf (1982) have shown
that recognition memory is better for transformed text (for
example, text in which the letters are inverted or reversed)
than for standard text. Of the seven separate experiments
reported, however, only one shows the mirror effect.
The effect of transformation is of importance for establish-
ing the generality of the mirror effect. Almost all of the
demonstrations of the mirror effect are for stimulus variables,
such as word frequency and concreteness. Those variables are
produced by the selection of sets of items. Transformation
falls outside the class of stimulus variables. Transformation
can be applied to any item, and it is, therefore, independent
of any item set. If transformation can be shown to produce
the mirror effect, then a more general statement concerning
the effect may be made: Any variable (not just classes of
stimuli) that affects efficiency of recognition will produce the
mirror effect. If the effect cannot be demonstrated for trans-
formation, then the mirror effect may be limited to stimulus
variables.
There are two reasons why the cited experiments on trans-
formation may not have shown the mirror effect. One is that
the testing procedure in those experiments was complex. In
the Kolers experiments, the test items included not only old
sentences in the same form as originally presented but also
old sentences in a different form (for example, in standard
form when originally presented inverted). The subjects clas-
sified the sentences as old same-form, old different-form, or
new. Hits and false alarm rates that approximate those from
ordinary recognition tests were derived from those classifica-
tions. The complexity of the procedure required of the sub-
jects may have worked against clear demonstration of the
effect. In the Graf (1982) experiment, the subjects viewed
sentences during the study phase but were given word pairs
during the test.
Another reason for the negative results may be floor effects
on the false alarms. The subjects in the Kolers negative cases
showed low false-alarm rates (.02 to .09) in both the standard
and transformed conditions. This means that the possibility
of a clear difference showing is slight. We therefore decided
to examine the effect of transformation in a simpler arrange-
ment and with material that we knew would not show floor
effects.
This experiment was basically the same as Experiment 3
except for the addition of a transformation to half the words
presented. This transformation, reversal of the order of letters
in the word, required a decoding operation by the subject.
Half the words were presented in standard order; half were
presented in reverse order, for example, emoh.
Method
Procedure. The subjects were instructed to pronounce all words
presented on the screen. Those presented in standard order were
simply read. Those in reverse order had to be decoded and then
spoken. The experimenter monitored the performance throughout to
make sure that both tasks were performed correctly. This was done
both in the list presentation and in the test. During the test, the
subject said each word aloud and then responded as in Experiment 4
by pressing one of eight keys on the top row of the keyboard (with
labels ranging from NNNN, NNN, . . . to YYYY) to indicate whether
the word was new or old and the degree of confidence in the judgment.
Materials. The word lists were the same as in Experiments 3 and
4 except that two words were deleted from each of the four basic
word sets (LC, HC, LA, and HA) to give a total of 272. This permitted
the counterbalancing of word lists with the additional transformation
variable. The mean log frequencies and mean concreteness measures
for the basic word sets were the same as in Experiments 3 and 4.
The study list consisted of 136 words in four categories plus 4
practice items, 40 initial filler words, and 40 final filler words. The
test list consisted of the 136 old words, plus 136 new words from the
same four categories. Old words were presented with letters in the
same order as in their initial presentation. For example, if a word was
reversed initially, it was presented reversed during test.
Subjects. Thirty-two undergraduates participated.
Results
The overall means for each variable separately are shown
in Table 5. The mirror effect appears in each row of the table.
Preliminary analysis of the data indicated, however, that
frequency and transformation interacted. The data for those
variables are, therefore, separated out in Table 6, which shows
the transformation conditions at both levels of word fre-
quency. It can be seen that the mirror effect holds for the
transformation at both high and low frequency. The test of
the critical conditions for the means in Table 6 shows all the
differences for the mean ratings (MSe = 0.425) as statistically
significant at the .01 level or better except the difference for
new reversed versus new standard in the low-frequency con-
dition (p < .10). The same pattern holds for the proportion
yes data (MSe = 0.115), in which all key comparisons are
significant at the .025 level or better except, again, for the
new reversed versus the new standard in the low-frequency
condition (p < .20). The comparisons for frequency are all
statistically significant for both confidence ratings and pro-
portion yes at the .0005 level except for a nonsignificant and
slightly reversed effect of low old versus high old in the
reversed condition (p > .20).

Table 5
Means for the Transformation, Frequency, and Concreteness
Conditions of Experiment 5 (N = 32)

                           New                     Old
Measure
Transformation      Reversed   Standard    Standard   Reversed
  Rating              2.74       2.96        5.66       7.25
  P(yes)              .182       .211        .687       .922
Frequency           Low        High        High       Low
  Rating              2.55       3.15        6.30       6.61
  P(yes)              .157       .237        .780       .829
Concreteness        Concrete   Abstract    Abstract   Concrete
  Rating              2.46       3.24        6.35       6.56
  P(yes)              .136       .257        .799       .810

Table 6
Means for the Transformation Condition, High and Low
Frequency Separate

                           New                     Old
                           Transformation
Condition           Reversed   Standard    Standard   Reversed
High frequency
  Rating              3.01       3.29        5.32       7.28
  P(yes)              .215       .259        .630       .929
Low frequency
  Rating              2.47       2.62        5.99       7.23
  P(yes)              .150       .164        .742       .916

The reason for the interaction between frequency and trans-
formation may be that the reversed old condition brings the
performance close to the ceiling in both the ratings (greater
than 7.2 on a scale of 8) and the proportion yes (greater than
.90). The mirror effect holds, however, for the transformation
variable at both levels of frequency. This variable was our
main concern. The deviation in word frequency is not of
major concern because the meta-analysis (Glanzer & Adams,
1985) showed the mirror effect for word frequency in 23 out
of 24 published experiments, and Experiments 1, 2, 3, and 4
above all show it.
The confidence ratings for misses have the same pattern as
before on each of the variables. Low-frequency misses (M =
2.49) are lower than high-frequency misses (M = 2.79), F(1,
31) = 19.80, p < .0001, MSe = 0.070; concrete misses (M =
2.52) are lower than abstract (M = 2.76), F(1, 31) = 9.59,
p < .005, MSe = 0.096; and reversed misses (M = 2.49) are
lower than standard (M = 2.69), F(1, 31) = 2.86, p = .10, MSe =
0.224.
The results of this experiment support further the points
made in the preceding experiments. The main new finding is
that the mirror effect can be produced by variables other than
stimulus variables such as word frequency and word concrete-
ness. It is produced by transformations on a single set of
stimulus words. The transformations induce subjects to carry
out operations on the words that affect the accuracy of rec-
ognition. There is support, therefore, for the more general
statement concerning the mirror effect. Any variable that
affects recognition accuracy, not just stimulus variables, will
produce the effect.
General Discussion
Brown (1976) and Brown et al. (1977) were the first to
argue that the mirror effect required a change in the theoretical
approach to recognition memory. They argued that the sub-
jects took account of more than the strength of the items
being evaluated. They took account also of the memorability
of the items. This more complex basis of decision is incor-
porated in the theory that will be presented next, attention/
likelihood theory.
Attention/likelihood theory is a sampling theory with two
special mechanisms--an attention mechanism and a decision
mechanism. The decision mechanism proposed differentiates
it from current theories of recognition. The key idea concern-
ing the decision mechanism in recognition is that the subjects
evaluate a complex of information related to an item. The
complex includes information about the relation of the given
item to both a model new item and a model old item. This
information is realized in a likelihood ratio (see Assertion 5
below).
The assertions of the theory are the following:
1. Stimuli are sets of features. The number of such features
is N. This will be assumed constant for all stimuli. Because N
refers to features, there is no reason to assume, at this point,
that one stimulus has more or fewer features than another.
2. Some proportion of those features is marked in new
stimuli. This proportion is p(new). The p(new) represents the
noise level. This again, here, will be assumed constant for all
stimuli. There is no reason, at this point, to assume that one
new stimulus enters with greater noise marking than another.
3. Different classes of stimuli or different situations evoke
different amounts of attention by the subject. This is trans-
lated into differences in the number of features, n(i), examined
(sampled) during a trial. The sampling is random.
4. When features are examined, they are marked. The
proportion of features marked is a(i) = n(i)/N. Therefore, the
state of stimuli after they have been experienced is given by
the following equation:
$$p(i, \text{old}) = p(\text{new}) + a(i)\,[1 - p(\text{new})]. \tag{1}$$
Conditions that evoke examination of a larger proportion of
features will result in the marking of a larger proportion of
features. The learning constant a(i) will be larger, and the
learning rate faster.
5. During a recognition test, the subject uses the standard
mechanisms of signal detection theory in making responses.
Specifically, likelihood ratios are computed and decisions are
made on the basis of those likelihood ratios.
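
Equation 1 in Assertion 4 can be made concrete with a small calculation. The sketch below is only an illustration: the function name and the parameter values (N = 1,000 features, p(new) = .10, and samples of 100 and 50 features) are assumptions chosen for the example, not values taken from the experiments.

```python
def proportion_marked_old(p_new, n_sampled, n_features):
    """Equation 1: proportion of marked features after one study trial.

    a(i) = n(i)/N is the proportion of features examined; each examined
    feature that was not already marked becomes marked.
    """
    a = n_sampled / n_features
    return p_new + a * (1 - p_new)


N_FEATURES = 1000   # hypothetical N
P_NEW = 0.10        # hypothetical noise level p(new)

# A condition that draws more attention samples more features and ends
# with a larger proportion of marked features.
p_old_strong = proportion_marked_old(P_NEW, 100, N_FEATURES)
p_old_weak = proportion_marked_old(P_NEW, 50, N_FEATURES)
print(p_old_strong, p_old_weak)
```

With these assumed values the two conditions end at roughly .19 and .145, which stays under the .50 bound the authors place on the ps in their own computations.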
Assertions 2 and 3 set up the underlying distributions for
new items--binomials with the parameters n(i) and p(new)
for a particular condition. Assertion 4 sets up binomial distri-
butions for the old items with parameters n(i) and p(old). The
subject uses information related to those distributions to
generate likelihood ratios and responds on the basis of those
likelihood ratios. This distinguishes this theory from strength
theories in which the subject responds on the basis of strength
or its equivalent: amount of marking, familiarity. The likeli-
hood ratio is a key mechanism in the production of the mirror
effect. For the binomial distributions we consider here, the
log likelihood ratio for a single presented item is the following:
$$\ln L = x \ln\!\left(\frac{p(i, \text{old})}{p(\text{new})}\right) + [n(i) - x] \ln\!\left(\frac{q(i, \text{old})}{q(\text{new})}\right). \tag{2}$$
The n(i) is the number of features the subject observes. The x
is the number of those marked. They are presented by the
stimulus and are available to the subject. The logarithmic
terms reflect the subject's model of the situation.
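
Equation 2 is the logarithm of the ratio of two binomial probabilities for the same observed count x; the binomial coefficient cancels. The sketch below illustrates that identity numerically. It assumes q = 1 - p, as is usual for the binomial, and the function names and parameter values are hypothetical.

```python
import math


def log_likelihood_ratio(x, n, p_old, p_new):
    """Equation 2: ln L for x marked features among n examined."""
    q_old, q_new = 1 - p_old, 1 - p_new
    return x * math.log(p_old / p_new) + (n - x) * math.log(q_old / q_new)


def binomial_pmf(x, n, p):
    """Probability of exactly x marked features among n examined."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p_old, p_new = 100, 0.19, 0.10   # illustrative values only
for x in (5, 15, 25):
    direct = math.log(binomial_pmf(x, n, p_old) / binomial_pmf(x, n, p_new))
    print(x, round(log_likelihood_ratio(x, n, p_old, p_new), 4), round(direct, 4))
```

The two printed columns agree, and ln L rises with x: the more marked features the subject finds, the more the evidence favors "old".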
The process is the following. A test item is presented. The
subject examines a number of features (n(i)) and notes the
number of those that are marked (x). The subject then brings
in two items of information--the proportion of marked fea-
tures an old item of this type is expected to have and the
proportion of marked features a new item is expected to have.
On the basis of this information, likelihood ratios are com-
puted. The likelihood is used in the final decision. For ex-
ample, in a yes/no test if the likelihood ratio is greater than a
preset likelihood criterion, the subject says "yes." The loga-
rithmic terms in Equation 2 are the subjects' model of the
situation. They play the same role as Brown's (1977) memor-
ability evaluation.
The theory permits us to specify key statistics of the process.
It also permits us, by computation, to simulate the regularities
that make up the mirror effect. Two key statistics are the
mean and variance of the log likelihood (ln L) distributions:
$$M \ln L(i, j) = n(i)\,p(i, j) \ln\!\left(\frac{p(i, \text{old})}{p(\text{new})}\right) + n(i)\,q(i, j) \ln\!\left(\frac{q(i, \text{old})}{q(\text{new})}\right) \tag{3}$$
$$\operatorname{Var} \ln L(i, j) = n(i)\,p(i, j)\,q(i, j) \left[\ln\!\left(\frac{p(i, \text{old})\,q(\text{new})}{p(\text{new})\,q(i, \text{old})}\right)\right]^{2}, \tag{4}$$
where i is the experimental condition, such as stimulus set A
or B, and j is the stimulus state, either new or old. The
variance will be used later to test the theory.
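
Equations 3 and 4 follow from Equation 2 because ln L is a linear function of the binomial count x. As a check on the closed forms, the sketch below compares them with a direct enumeration over every possible x. It is an illustration under assumed parameter values, with q = 1 - p and hypothetical function names.

```python
import math


def ln_l_moments(n, p_state, p_old, p_new):
    """Closed-form mean (Equation 3) and variance (Equation 4) of ln L.

    p_state is p(i, j): the true marked proportion of the tested item,
    equal to p_new for a new item and p_old for an old item.
    """
    q_state, q_old, q_new = 1 - p_state, 1 - p_old, 1 - p_new
    mean = (n * p_state * math.log(p_old / p_new)
            + n * q_state * math.log(q_old / q_new))
    var = n * p_state * q_state * math.log(p_old * q_new / (p_new * q_old)) ** 2
    return mean, var


def ln_l_moments_by_enumeration(n, p_state, p_old, p_new):
    """Same moments, summed over the binomial distribution of x."""
    mean = 0.0
    second = 0.0
    for x in range(n + 1):
        weight = math.comb(n, x) * p_state ** x * (1 - p_state) ** (n - x)
        value = (x * math.log(p_old / p_new)
                 + (n - x) * math.log((1 - p_old) / (1 - p_new)))
        mean += weight * value
        second += weight * value * value
    return mean, second - mean * mean


n, p_old, p_new = 100, 0.19, 0.10   # illustrative values only
for label, p_state in (("new", p_new), ("old", p_old)):
    print(label, ln_l_moments(n, p_state, p_old, p_new),
          ln_l_moments_by_enumeration(n, p_state, p_old, p_new))
```

For these values the mean of ln L is negative for new items and positive for old items, and the old-item variance exceeds the new-item variance.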
One possible objection to the theory, as stated earlier, is
that it has the subject hold in mind several different p(old)s.
From one point of view, however, the subject has to have
only some idea of the average p(new), the n(i), and the number
of features. The p(i, old) can then be estimated, or at least
ordered. We can simplify the theory further and assume that
the subject works with a single p(old), for example, the average
p(old) for several stimulus classes, not two or more as implied
above. In that case, p(i, old) and q(i, old)--the logarithmic
terms in the mean and variance--reduce to a single p(old)
and q(old). The terms outside the logarithmic terms are not
affected. They reflect the contribution of the actual stimuli,
not the subject's model of the situation. It can be shown that
the main effect considered so far--the mirror order--still
holds under this simplification. Moreover, the derivations
concerning the variances, which will be tested later, also hold.
We cannot, however, handle the ratings for misses with this
assumption.
Using the theory as presented above, we have carried out
hundreds of computations with a large range of Ns, n(i)s, and
p(new)s, and therefore also for a large range of a(i)s and
p(i, old)s. Our only restriction on the ps has been that they
stay under .50. Our computations show that the theory pro-
duces the mirror pattern for the standard recognition meas-
ures:
1. hits and false alarms: FA(AN) < FA(BN) < H(BO) <
H(AO);
2. mean confidence ratings: R(AN) < R(BN) < R(BO) <
R(AO);
3. two-alternative forced choice: P(BO, BN) < P(AO, BN),
P(BO, AN) < P(AO, AN).
It also produces the order of confidence ratings for misses
found in the data.
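
One such computation can be sketched for the hit and false-alarm ordering. This is not the authors' program: it is a minimal illustration that assumes N = 1,000, p(new) = .10, n = 100 for condition A and n = 50 for condition B, and a yes response whenever ln L exceeds a criterion of 0 (a likelihood ratio of 1). All names and values are hypothetical.

```python
import math


def log_lr(x, n, p_old, p_new):
    """Equation 2: ln L for x marked features among n examined."""
    return (x * math.log(p_old / p_new)
            + (n - x) * math.log((1 - p_old) / (1 - p_new)))


def p_yes(n, p_state, p_old, p_new, criterion=0.0):
    """Probability of a yes response when x ~ Binomial(n, p_state)."""
    total = 0.0
    for x in range(n + 1):
        if log_lr(x, n, p_old, p_new) > criterion:
            total += math.comb(n, x) * p_state ** x * (1 - p_state) ** (n - x)
    return total


N_FEATURES = 1000                     # hypothetical N
P_NEW = 0.10                          # hypothetical p(new)
n_sampled = {"A": 100, "B": 50}       # A draws more attention than B

rates = {}
for cond, n_i in n_sampled.items():
    p_old = P_NEW + (n_i / N_FEATURES) * (1 - P_NEW)      # Equation 1
    rates[cond] = {
        "FA": p_yes(n_i, P_NEW, p_old, P_NEW),            # new items
        "H": p_yes(n_i, p_old, p_old, P_NEW),             # old items
    }

print(rates)
```

With these assumed values the four rates fall in the mirror order FA(AN) < FA(BN) < H(BO) < H(AO). Other parameter choices can be substituted to explore how far the pattern extends.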
Some general tests of the theory are possible. We will not
do conventional fitting of the values for the five experiments.
Although the theory has only four basic parameters--N, two
n(i)s, and p(new)--it cannot be used to fit the data of the
present experiments. For example, the yes/no data of Exper-
iment 1 give only four means with four parameters to be
estimated. Therefore, instead of conventional fitting, we will
apply some general tests of the theory. The tests will be
concerned with the slopes of the receiver operating character-
istic (ROC) for conditions in Experiments 1 through 5,
using Equation 4, the equation for the variance.
The theory permits us to derive some critical information
about the ratio of the variances for pairs of conditions relevant
to ROCs. We will do this for one case, that involving low (L)
and high (H) frequency, first. For that case we consider four
variance ratios: (a) Var ln L(LO)/Var ln L(HN); (b) Var ln
L(LO)/Var ln L(LN); (c) Var ln L(HO)/Var ln L(HN); (d)
Var ln L(HO)/Var ln L(LN). These variance ratios yield
predictions concerning the slopes of the ROCs.
On the basis of Equation 4 we can derive two statements.
Their derivation is given in the Appendix.
1. The four ratios above are listed in order of size, with the
highest ratio first.
2. The first three ratios are all greater than 1.0. The fourth,
for HO and LN, is indeterminate. It may be greater, less than,
or equal to 1.0. Its size relative to the other ratios is, however,
known. This is asserted in the first statement.
The four ratios above are related to four ROCs: (a) low-
frequency hits against high-frequency false alarms (LO/HN);
(b) low-frequency hits against low-frequency false alarms (LO/
LN); (c) high-frequency hits against high-frequency false
alarms (HO/HN); (d) high-frequency hits against low-fre-
quency false alarms (HO/LN). ROCs b and c (standard
ROCs) are the two that would ordinarily be plotted. ROCs a
and d (crossed ROCs) will be considered here in order to test
the theory fully.
There is a known relation (Green & Swets, 1966, p. 62)
between the variances of the signal and noise distributions
and the slope of the normalized (z score) ROC. The ratio of
the signal variance to the noise variance is the inverse of the
slope of the ROC. On the basis of this relation, the four ratios
above imply the following two statements for the ROCs.
1. Because the ratios of variances are listed in order from
highest to lowest, the slopes of the normalized ROCs should
show the inverse order. The slope of the ROC for low-
frequency hits against high-frequency false alarms (LO/HN)
corresponding to the first ratio should be the lowest, and the
slope of the ROC (HO/LN) corresponding to the last ratio
should be the highest.
2. The first three normalized ROCs should all give slopes
less than 1.0.
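
The ordering of the four variance ratios can be illustrated numerically from Equation 4. The sketch below is not the Appendix derivation; it evaluates the ratios for one assumed parameter set (N = 1,000, p(new) = .10, n = 100 for low-frequency words, n = 50 for high-frequency words), with hypothetical names throughout.

```python
import math


def var_ln_l(n, p_state, p_old, p_new):
    """Equation 4: variance of ln L for an item whose true marked
    proportion is p_state, judged against the model values p_old, p_new."""
    log_term = math.log(p_old * (1 - p_new) / (p_new * (1 - p_old)))
    return n * p_state * (1 - p_state) * log_term ** 2


N_FEATURES = 1000          # hypothetical N
P_NEW = 0.10               # hypothetical p(new)
N_LOW, N_HIGH = 100, 50    # low-frequency words assumed to draw more attention

p_old_low = P_NEW + (N_LOW / N_FEATURES) * (1 - P_NEW)     # Equation 1
p_old_high = P_NEW + (N_HIGH / N_FEATURES) * (1 - P_NEW)

var = {
    "LO": var_ln_l(N_LOW, p_old_low, p_old_low, P_NEW),
    "LN": var_ln_l(N_LOW, P_NEW, p_old_low, P_NEW),
    "HO": var_ln_l(N_HIGH, p_old_high, p_old_high, P_NEW),
    "HN": var_ln_l(N_HIGH, P_NEW, p_old_high, P_NEW),
}

# Ratios (a) through (d) in the order given in the text.
ratios = [
    ("LO/HN", var["LO"] / var["HN"]),
    ("LO/LN", var["LO"] / var["LN"]),
    ("HO/HN", var["HO"] / var["HN"]),
    ("HO/LN", var["HO"] / var["LN"]),
]
for name, value in ratios:
    print(name, round(value, 3))
```

For this parameter set the ratios decrease from (a) to (d) and the first three exceed 1.0, so the corresponding normalized ROC slopes would increase from LO/HN to HO/LN, with the first three below 1.0.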
We will now examine the slopes of the four normalized
ROCs obtained for each of the three variables in the five
reported experiments. Because Experiments 3, 4, and 5 con-
tain several variables, they give a total of nine sets of ROCs.
All five experiments varied frequency. Experiments 3, 4, and
5 varied concreteness. Only Experiment 5 had transformation
as a variable.
The slopes for these ROCs are presented in Table 7. The
frequency variable gives the entries in the left part of the table.
The concreteness and transformation variables give the entries
in the right part of the table. The ordering of the ROCs has
been set so that equivalent ROCs appear on the same row.
For example, L is the strong, H the weak frequency condition;
C the strong, A the weak concreteness condition; R the strong,
S the weak transformation condition. Therefore, LO/HN,
CO/AN, and RO/SN are on the same row--the ROCs for
strong condition hits against weak condition false alarms.
1. With respect to the value of the slopes, every one of the
27 (nine sets of three each) predicted to be less than 1.0 is
indeed less than 1.0 (LO/HN, LO/LN, HO/HN; CO/AN,
CO/CN, AO/AN; RO/SN, RO/RN, SO/SN). The probability
of 27 such results occurring by chance, using the binomial
with p = .5, is 7.5 × 10^-9. If only the standard ROCs are
considered--LO/LN and HO/HN and their parallels--then
the probability of 18 such results occurring by chance is 3.81
× 10^-6.
2. With respect to the ordering of the sets of four slopes for
each variable, we find that all but one corresponds fully to
the predicted order. Applying the binomial, the probability of
eight out of nine cases giving the predicted order by chance,
with p = 1/24 and n = 9, is 3.78 × 10^-12. We can, again,
restrict our attention to the standard ROCs--LO/LN and
HO/HN and their parallels. We find then that eight out of
the nine show the predicted order. The probability of this
number or more occurring by chance, using the binomial
with p = .5, is .0195.
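
The chance probabilities quoted with p = .5 are binomial tail probabilities and can be recomputed in a few lines. The sketch below is an illustration with a hypothetical helper, `binomial_tail`; it covers only the three p = .5 cases.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


p_27_of_27 = binomial_tail(27, 27, 0.5)   # all 27 slopes below 1.0
p_18_of_18 = binomial_tail(18, 18, 0.5)   # the 18 standard-ROC slopes only
p_8_of_9 = binomial_tail(9, 8, 0.5)       # eight or more of nine ordered pairs

print(p_27_of_27, p_18_of_18, p_8_of_9)
```

These reproduce the reported values of about 7.5 × 10^-9, 3.81 × 10^-6, and .0195. The same function accepts p = 1/24 for the case in which all four slopes must fall in one particular order.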
Attention/likelihood theory does the following:
1. It handles the known regularities of the mirror effect--
the ordering of hits and false alarms, the ordering of confi-
dence ratings, and the ordering of choices in the two-alterna-
tive forced choice.
2. It handles new regularities--the size and order of ROC
slopes.
Table 7
Slopes of Normalized ROCs for Each Variable in the Five Experiments (1, 2, 3, 4, and 5)

Freq.      1     2     3     4     5     Conc.     3     4     5     Transf.    5
LO/HN     .66   .56   .69   .64   .61    CO/AN    .68   .56   .61    RO/SN     .56
LO/LN     .74   .61   .81   .74   .72    CO/CN    .74   .65   .67    RO/RN     .66
HO/HN     .91   .70   .84   .71   .76    AO/AN    .89   .77   .82    SO/SN     .85
HO/LN    1.03   .75   .98   .82   .91    AO/CN    .96   .89   .90    SO/RN    1.00

Note. Experiments 1, 2, 3, 4, and 5 all varied frequency. Only Experiments 3, 4, and 5 varied
concreteness. Experiment 5 alone varied transformation. Freq. = frequency; Conc. = concreteness;
Transf. = transformation; L = low frequency; H = high frequency; C = concrete; A = abstract; R =
reversed; S = standard; O = old; N = new.
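
The counts used in the two tests of the slopes can be tallied from Table 7. The sketch below is an illustration: it copies each column of the table as a list of four slopes, top row first, and the dictionary labels are hypothetical.

```python
# Each list holds one column of Table 7, top row first: strong old/weak new,
# strong old/strong new, weak old/weak new, weak old/strong new.
slope_sets = {
    "Freq. Exp. 1": [.66, .74, .91, 1.03],
    "Freq. Exp. 2": [.56, .61, .70, .75],
    "Freq. Exp. 3": [.69, .81, .84, .98],
    "Freq. Exp. 4": [.64, .74, .71, .82],
    "Freq. Exp. 5": [.61, .72, .76, .91],
    "Conc. Exp. 3": [.68, .74, .89, .96],
    "Conc. Exp. 4": [.56, .65, .77, .89],
    "Conc. Exp. 5": [.61, .67, .82, .90],
    "Transf. Exp. 5": [.56, .66, .85, 1.00],
}

# Test 1: how many of the first three slopes in each set are below 1.0.
below_one = sum(s < 1.0 for slopes in slope_sets.values() for s in slopes[:3])

# Test 2: which sets have all four slopes in the predicted increasing order.
fully_ordered = [name for name, slopes in slope_sets.items()
                 if all(a < b for a, b in zip(slopes, slopes[1:]))]

# Test 2, standard ROCs only: second-row slope below third-row slope.
standard_ordered = sum(slopes[1] < slopes[2] for slopes in slope_sets.values())

print(below_one, len(fully_ordered), standard_ordered)
```

The tallies agree with the text: 27 slopes below 1.0, eight of nine sets in the full predicted order, and eight of nine standard pairs in order. The one exception in both ordering counts is the frequency variable in Experiment 4.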
Ther e are t hr ee ot her appr oaches t o ei t her t he general
mi r r or effect or speci al cases of it. Two of t hese ( Gl anzer &
Bowles, 1976; Gi l l und & Shiffrin, 1984) were concer ned wi t h
a speci al c a s e - - wo r d f r equency effects. Bot h are st rengt h
t heor y appr oaches.
The first appr oach was based on wor k i n our l abor at or y
(Bowles & Gl anzer , 1983; Gl anzer & Bowles, 1976). One
pr obl em wi t h t he appr oach is t hat i t was specific t o t he case
of wor d frequency. I t coul d not be gener al i zed t o ot her
st i mul us vari abl es and woul d have f ur t her di ffi cul t i es wi t h
vari abl es such as t r ansf or mat i on. Our di ssat i sfact i on wi t h t hat
t heor y l ed t o t he f or mul at i on of at t ent i on/ l i kel i hood t heory.
A second approach is part of the comprehensive memory
theory of Gillund and Shiffrin (1984, p. 46). It also focuses
on the specific case of word frequency. The approach assumes
that the subject rescales the underlying distributions on the
basis of distance from separate criteria and their standard
deviations. The subject then aligns the distributions by placing
the different criteria in a single location. The rescaling and
alignment produce the mirror effect. The specific characteristics
of the process that necessarily produce the effect are not
given.
A third approach that concerns itself with the mirror effect
is that of Hockley and Murdock (1987, p. 355). In that
approach the order of underlying distributions depicted in
Panel 1 of Figure 1 is assumed. The problem, however, is to
explain why those underlying distributions are ordered as they
are.
All three approaches handle the mirror effect as a special
puzzle. We believe, however, that the generality of the mirror
effect requires a new view of the process underlying recognition
memory. Subjects make their recognition decisions by
using a complex of information about each stimulus. In the
theory outlined here, the complex is a likelihood ratio. Whatever
the approach, some equivalent complex and an appropriate
decision mechanism will have to be postulated to
handle this regularity in recognition memory.
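A likelihood-ratio decision variable of this kind can be illustrated numerically. The Python sketch below is a minimal illustration, not the authors' computation. It assumes that the evidence for an item is a binomial count x out of n(i) trials and that ln L is the standard binomial log likelihood ratio, which is the form implied by the variance terms in the Appendix. The parameter values and the function name `mean_log_likelihood` are hypothetical, chosen only to satisfy n(L) > n(H), p(LO) > p(HO) > p(N), and p(old) below .50.

```python
import math


def mean_log_likelihood(n, p_true, p_old, p_new):
    """Expected ln L when x ~ Binomial(n, p_true).

    Assumed form (standard binomial log likelihood ratio):
        ln L(x) = x * ln(p_old*q_new / (q_old*p_new)) + n * ln(q_old / q_new)
    so that E[ln L] replaces x with its mean, n * p_true.
    """
    q_old, q_new = 1.0 - p_old, 1.0 - p_new
    slope = math.log((p_old * q_new) / (q_old * p_new))
    constant = n * math.log(q_old / q_new)
    return n * p_true * slope + constant


# Hypothetical parameters (not taken from the article).
N_L, N_H = 60, 40                 # n(L) > n(H)
P_LO, P_HO, P_N = 0.30, 0.20, 0.10  # p(LO) > p(HO) > p(N)

means = {
    "LN": mean_log_likelihood(N_L, P_N, P_LO, P_N),   # new low-frequency
    "HN": mean_log_likelihood(N_H, P_N, P_HO, P_N),   # new high-frequency
    "HO": mean_log_likelihood(N_H, P_HO, P_HO, P_N),  # old high-frequency
    "LO": mean_log_likelihood(N_L, P_LO, P_LO, P_N),  # old low-frequency
}

for label, value in means.items():
    print(f"E[ln L({label})] = {value:.3f}")
```

Under these assumed values the means fall in the order LN < HN < 0 < HO < LO, which is the mirror ordering: the more effective class lies farther from the neutral point on both the new side and the old side.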
References
Adams, J. K. (1985). Visually presented verbal stimuli by assembly
language on the Apple II computer. Behavior Research Methods,
Instruments & Computers, 17, 489-502.
Bowles, N. L., & Glanzer, M. (1983). An analysis of interference in
memory. Memory & Cognition, 11, 307-315.
Brown, J. (1976). An analysis of recognition and recall and of
problems in their comparison. In J. Brown (Ed.), Recall and recognition
(pp. 1-35). New York: Wiley.
Brown, J., Lewis, V. J., & Monk, A. F. (1977). Memorability, word
frequency and negative recognition. Quarterly Journal of
Experimental Psychology, 29, 461-473.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New
York: Academic Press.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both
recognition and recall. Psychological Review, 91, 1-67.
Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition
memory. Memory & Cognition, 13, 8-20.
Glanzer, M., & Bowles, N. (1976). Analysis of the word-frequency
effect in recognition memory. Journal of Experimental Psychology:
Human Learning and Memory, 2, 21-31.
Glanzer, M., & Ehrenreich, S. L. (1979). Structure and search of the
internal lexicon. Journal of Verbal Learning and Verbal Behavior,
18, 381-398.
Graf, P. (1982). The memorial consequences of generation and
transformation. Journal of Verbal Learning and Verbal Behavior,
21, 539-548.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and
psychophysics. New York: Wiley.
Hockley, W. E. (1982). Retrieval processes in continuous recognition.
Journal of Experimental Psychology: Learning, Memory, and
Cognition, 8, 497-512.
Hockley, W. E., & Murdock, B. B., Jr. (1987). A decision model for
accuracy and response latency in recognition memory.
Psychological Review, 94, 341-358.
Kolers, P. A. (1973). Remembering operations. Memory & Cognition,
1, 347-355.
Kolers, P. A. (1974). Two kinds of recognition. Canadian Journal of
Psychology, 28, 51-61.
Kolers, P. A. (1975a). Addendum to "Remembering operations."
Memory & Cognition, 3, 29-30.
Kolers, P. A. (1975b). Memorial consequences of automatized
encoding. Journal of Experimental Psychology: Human Learning and
Memory, 1, 689-701.
Kolers, P. A., & Ostry, D. (1974). Time course of loss of information
regarding pattern analyzing operations. Journal of Verbal Learning
and Verbal Behavior, 13, 599-612.
Kučera, H., & Francis, W. N. (1967). Computational analysis of
present-day American English. Providence, RI: Brown University Press.
McNicol, D. (1972). A primer of signal detection theory. London:
George Allen & Unwin.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness,
imagery, and meaningfulness values for 925 nouns. Journal of
Experimental Psychology Monograph Supplement, 76(No. 1).
Appendix
The order and values of the variance ratios can be determined by
examining the terms that make up each ratio and taking account of
the relative sizes of corresponding terms. All that is assumed is that
one condition (for example, low frequency) is more effective than the
other (for example, high frequency). For low- and high-frequency
words, we have the following relations: n(L) > n(H); p(LO) > p(HO);
p(LN) = p(HN) = p(N), where L = low, H = high, O = old, N =
new. To simplify the comparisons, let the logarithmic terms in the
variance equation be written as R(H) and R(L), where
\[
R(H) = \left[\ln\left(\frac{p(HO)\cdot q(N)}{q(HO)\cdot p(N)}\right)\right]^{2}
\]
\[
R(L) = \left[\ln\left(\frac{p(LO)\cdot q(N)}{q(LO)\cdot p(N)}\right)\right]^{2}
\]
Because p(LO) > p(HO), then R(L) > R(H).
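The step from p(LO) > p(HO) to R(L) > R(H) can be spelled out. The derivation below is a sketch under two assumptions that are not restated in this Appendix: that ln L is the standard log likelihood ratio for a binomial count x out of n(i), and that p(HO) > p(N), so that both logarithms are positive. The symbol x and the class index i (i = L or H) are introduced here only for illustration.

```latex
% Assumption: x ~ Binomial(n(i), p), with p = p(iO) for old items
% and p = p(N) for new items; q = 1 - p throughout.
\begin{align*}
\ln L(x) &= x \,\ln\frac{p(iO)\, q(N)}{q(iO)\, p(N)} + n(i)\,\ln\frac{q(iO)}{q(N)}, \\
\operatorname{Var}\,\ln L &= n(i)\, p\, q \left[\ln\frac{p(iO)\, q(N)}{q(iO)\, p(N)}\right]^{2}
  = n(i)\, p\, q\, R(i), \\
\ln\frac{p(iO)\, q(N)}{q(iO)\, p(N)} &= \ln\frac{p(iO)}{1 - p(iO)} - \ln\frac{p(N)}{1 - p(N)}.
\end{align*}
```

Because ln[p/(1 - p)] increases with p, the last line is larger for L than for H whenever p(LO) > p(HO). With p(HO) > p(N) both values are positive, so squaring preserves the order and R(L) > R(H).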
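Let us look first at two ratios of variance:
\[
\frac{\operatorname{Var}\,\ln L(LO)}{\operatorname{Var}\,\ln L(HN)}
  = \frac{n(L)\cdot p(LO)\cdot q(LO)\cdot R(L)}{n(H)\cdot p(N)\cdot q(N)\cdot R(H)}
  \tag{A1}
\]
\[
\frac{\operatorname{Var}\,\ln L(LO)}{\operatorname{Var}\,\ln L(LN)}
  = \frac{n(L)\cdot p(LO)\cdot q(LO)\cdot R(L)}{n(L)\cdot p(N)\cdot q(N)\cdot R(L)}
  \tag{A2}
\]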
Because n(L) > n(H) and R(L) > R(H), the first ratio has to be higher
than the second. For the third ratio,
\[
\frac{\operatorname{Var}\,\ln L(HO)}{\operatorname{Var}\,\ln L(HN)}
  = \frac{n(H)\cdot p(HO)\cdot q(HO)\cdot R(H)}{n(H)\cdot p(N)\cdot q(N)\cdot R(H)}
  \tag{A3}
\]
Because p(LO)·q(LO) > p(HO)·q(HO), the second ratio has to be
higher than the third. (The inequality holds when p(old) ≤ .50,
because the product p·q increases with p up to that point. We
assumed this boundary initially and used it in all of our exploratory
computations.) Finally, we look at the fourth ratio:
\[
\frac{\operatorname{Var}\,\ln L(HO)}{\operatorname{Var}\,\ln L(LN)}
  = \frac{n(H)\cdot p(HO)\cdot q(HO)\cdot R(H)}{n(L)\cdot p(N)\cdot q(N)\cdot R(L)}
  \tag{A4}
\]
Because n(L) > n(H) and R(L) > R(H), Ratio 4 has to be less than
Ratio 3. These comparisons give the order of the four ratios.
The examination of the terms composing each ratio shows that the
first three are all greater than 1.0. For example, every term in the
numerator of Ratio 1 (n(L), p(LO)·q(LO), R(L)) is greater than the
corresponding term in the denominator (n(H), p(N)·q(N), R(H)).
Ratio 4 is the only one that is indeterminate in size. Of the
corresponding terms in the numerator and denominator, n(H) < n(L), and
R(H) < R(L), but p(HO)·q(HO) > p(N)·q(N).
If it is assumed that the subject works with only a single p(old) in
structuring the decision process, then R(H) = R(L). The logarithmic
terms do not then affect the relations between the variances. However,
the other parameters do, and the predictions above still hold.
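The ordering argument can be checked numerically. The Python sketch below evaluates Ratios A1 through A4 from the variance expression n·p·q·R used in this Appendix. The parameter values are hypothetical, chosen only to satisfy the stated relations n(L) > n(H), p(LO) > p(HO) > p(N), and p(old) ≤ .50. The function names are illustrative.

```python
import math


def log_term(p_old, p_new):
    """R = [ln(p_old*q_new / (q_old*p_new))]^2, as defined in the Appendix."""
    q_old, q_new = 1.0 - p_old, 1.0 - p_new
    return math.log((p_old * q_new) / (q_old * p_new)) ** 2


def variance(n, p, r):
    """Variance of ln L for one distribution: n * p * q * R."""
    return n * p * (1.0 - p) * r


# Hypothetical parameters (not taken from the article).
N_L, N_H = 60, 40                   # n(L) > n(H)
P_LO, P_HO, P_N = 0.30, 0.20, 0.10  # p(LO) > p(HO) > p(N), all <= .50

R_L = log_term(P_LO, P_N)
R_H = log_term(P_HO, P_N)

var = {
    "LO": variance(N_L, P_LO, R_L),
    "LN": variance(N_L, P_N, R_L),
    "HO": variance(N_H, P_HO, R_H),
    "HN": variance(N_H, P_N, R_H),
}

ratios = {
    "A1": var["LO"] / var["HN"],
    "A2": var["LO"] / var["LN"],
    "A3": var["HO"] / var["HN"],
    "A4": var["HO"] / var["LN"],
}

for name, value in ratios.items():
    print(f"Ratio {name}: {value:.3f}")
```

With these assumed values the ratios come out at roughly 9.70, 2.33, 1.78, and 0.43, so A1 > A2 > A3 > A4 and the first three exceed 1.0. Ratio A4 falls below 1.0 here, but other parameter choices can push it above 1.0, which is the sense in which its size is indeterminate.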
Received February 28, 1989
Revision received May 19, 1989
Accepted May 19, 1989
