1990, Vol. 16, No. 1, 5-16
Copyright 1990 by the American Psychological Association, Inc. 0278-7393/90/$00.75

The Mirror Effect in Recognition Memory: Data and Theory

Murray Glanzer and John K. Adams
New York University

The mirror effect is a regularity in recognition memory that requires reexamination of current views of memory. Five experiments that further support and extend the generality of the mirror effect are reported. The first two experiments vary word frequency. The third and fourth vary both word frequency and concreteness. The fifth experiment varies word frequency, concreteness, and the subject's operations on the words. The experiments furnish data on the stability of the effect, its relation to response times, its extension to multiple mirror effects, and its extension beyond stimulus variables to operation variables. A theory of the effect and predictions that derive from the theory are presented.

The mirror effect (Glanzer & Adams, 1985) is a strong regularity in recognition memory. It is summarized as follows. If there are two classes of stimuli, and one is more accurately recognized than the other, then the superior class is both more accurately recognized as old when old and also more accurately recognized as new when new. For example, low-frequency words are better recognized than high-frequency words. The mirror effect means that the greater efficiency in recognizing is always twofold. Old low-frequency words are better recognized as old than are old high-frequency words, and new low-frequency words are better recognized as new than are new high-frequency words.

In the discussion that follows, recognition performance is viewed as based on subjects' responses to underlying distributions of some measure for new and old items. These distributions are not, of course, directly observed.
They are deduced from recognition data. The relation of the underlying distributions to the data obtained from standard recognition tests (yes/no, confidence rating, forced choice) is given in detail by Glanzer and Adams (1985) and others (Egan, 1975; Green & Swets, 1966; McNicol, 1972).

This research was supported by Grant 1 R01 MH449 from the National Institute of Mental Health, Grant BNS 84-15904 from the National Science Foundation, and Contract F49620-86-C-0131 with the Air Force Office of Scientific Research.
We thank Leslie Sherman and Amy Wolff for assistance in collection and analysis of data, and Jean-Claude Falmagne, Elliot Hirshman, Geoff Iverson, and Gay Snodgrass for their helpful comments on an earlier draft of this article. We also thank Douglas L. Hintzman, William E. Hockley, and two anonymous reviewers for their constructive comments.
Correspondence concerning the article should be addressed to Murray Glanzer, Department of Psychology, New York University, New York, New York 10003.

Some possible distributions for two classes of stimuli, when one is more accurately recognized than the other, are shown in Figure 1. Panel 1 represents the distributions that underlie the mirror effect. The panel shows the distributions for two classes of stimuli, A and B. Class A is recognized with greater accuracy. This is represented by the relatively large distance between the underlying A old (AO) and A new (AN) distributions. Class B is recognized with less accuracy. This is represented by the relatively small distance between the B old (BO) and B new (BN) distributions. The mirror effect means that the difference in accuracy of recognition of A and B determines two more differences in distance.
A old is higher on the decision axis than B old, and A new is lower than B new, as shown in the panel. These differences will be the focus of the statistical analyses of the experiments reported here.

The mirror effect regularities do not follow from the simple fact that class A stimuli are handled more accurately than class B stimuli. Given the difference in accuracy, a variety of patterns of the underlying distributions could hold that violate the mirror effect. Two such patterns are shown in Panels 2 and 3 of Figure 1.

Each of the relations represented in the panels of Figure 1 implies a particular pattern of data for each of the standard recognition tests. The relations in Panel 1 imply for confidence rating data that

    R(AN) < R(BN) < R(BO) < R(AO),

where R represents the mean confidence rating on a scale that has very sure new at its low end and very sure old at its high end. For yes/no data the implied relations are

    FA(AN) < FA(BN) < H(BO) < H(AO),

where FA is false alarm rate, and H is hit rate. For forced choice data the relations are

    P(BO, BN) < P(AO, BN), P(BO, AN) < P(AO, AN),

where P is the proportion of choices of the first argument over the second argument within the parentheses. The comma between the two middle terms signifies an indeterminate relation between those terms.

A meta-analysis of 80 recognition experiments supported the existence of the mirror effect (Glanzer & Adams, 1985) for all recognition paradigms: yes/no, confidence rating, and forced choice. The meta-analysis, moreover, demonstrated that the effect held for all stimulus variables that could be surveyed: word frequency, concreteness, meaningfulness, and others.
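The yes/no form of the mirror pattern can be checked mechanically once the four rates are known. The sketch below is a minimal illustration, not the analysis used in this article: the function name `shows_mirror_order` is hypothetical, the first example uses the P(yes) means reported for Experiment 1 in Table 1 (low-frequency words as class A, high-frequency words as class B), and the second example uses invented values.

```python
def shows_mirror_order(fa_a, fa_b, hit_b, hit_a):
    """Return True if FA(AN) < FA(BN) < H(BO) < H(AO).

    Class A is the more accurately recognized class and class B the less
    accurately recognized class. The function name is hypothetical.
    """
    return fa_a < fa_b < hit_b < hit_a


# P(yes) means from Table 1 (Experiment 1): A = low frequency, B = high frequency.
experiment_1 = {"fa_a": 0.304, "fa_b": 0.359, "hit_b": 0.592, "hit_a": 0.661}
print(shows_mirror_order(**experiment_1))

# Invented values that violate the mirror order: class A has more hits
# but also more false alarms than class B.
print(shows_mirror_order(fa_a=0.40, fa_b=0.30, hit_b=0.60, hit_a=0.80))
```

The check concerns only the order of the means. Whether the two critical differences are statistically reliable is a separate question, addressed by the planned comparisons reported for each experiment.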
Figure 1. Three possible orders of underlying distributions when accuracy on stimulus class A is greater than accuracy on class B. (O = old, N = new.) Panel 1 shows the mirror effect. [Figure not reproduced. The horizontal axis of each panel is the decision axis; in Panel 1 the distributions are ordered AN, BN, BO, AO.]

Such a regularity in memory is a challenge to strength theories of recognition memory. This point was first noted by Brown (1976). According to strength theories, in a recognition test the subject decides on the basis of the strength of the items. Terms equivalent to strength are the amount of marking or familiarity of the items. These theories, therefore, label the decision axis in Figure 1 as strength, amount of marking, or familiarity. Such theories have problems in accounting for the mirror effect. They contain no inherent mechanism that arrays the underlying new and old distributions in the mirror order as depicted in Panel 1 of Figure 1. In this article we will consider a different theory of the effect: attention/likelihood theory.

Several experiments will now be presented. The first experiment will expand the data base of the mirror effect. It will also examine the mean confidence ratings for misses. This measure is of interest for the testing of theories of the effect.

Experiment 1: Word Frequency and Incidental Learning

In this experiment the subjects first carried out an incidental learning task, lexical decision.
Then they were given a surprise recognition test in which the old words were the words presented during the lexical decision task. The times to make the lexical decision responses and the recognition responses were recorded.

Method

Procedure. In the lexical decision task, the subjects viewed words and nonwords on a monitor. The presentation was paced by the subjects who, for each item, pressed one of two response keys on a response board labeled yes (for word) and no (for nonword). The yes key was assigned to the subject's dominant hand. The subjects were told to be quick and accurate. The lexical decision task was preceded by eight practice items.

The recognition test was carried out on the computer keyboard. Only words were presented during this test. Onset of the test word started a timing period. When the subjects reached a decision, they pressed the space bar, which ended the timing period. Then they pressed one of two keys indicating whether the item was old or new. The keys in this experiment and in Experiments 3, 4, and 5 were arranged so that old was assigned to the subject's dominant hand. Finally, they pressed one of four keys labeled unsure, somewhat sure, moderately sure, or very sure. This three-stage response was used to exclude from the subjects' recognition response times the additional time to move to the extreme rating scale positions. That time could produce an artifactual speed-accuracy trade-off.

All stimuli for both the incidental learning task and recognition test were presented centered, in uppercase letters. Following the subject's response, the screen went blank for 500 ms in the incidental learning task and for 2,000 ms in the recognition test.
Then the next item appeared on the screen. The presentation of the items on the monitor was controlled by a computer which also recorded responses and response times. The program used for the computer is described in Adams (1985). Except when noted otherwise, the procedures and stimulus presentation used here were the same in Experiments 3, 4, and 5.

Materials. The words presented in the lexical decision task consisted of 124 high-frequency words (mean log Kučera-Francis frequency 4.8) and 124 low-frequency words (mean log frequency 2.4). The word groups both had a mean length of 5.0 letters. The 248 nonwords were constructed to be orthographically and phonologically legal. They had a mean length of 5.6 letters. The new words presented in the recognition test (124 high frequency and 124 low frequency) had the same mean frequency and length as the old words. The main list of lexical decision items was preceded by 12 initial filler items and followed by 12 final filler items (each consisting of six words and six nonwords) to eliminate serial position effects. Nonwords and filler words did not appear on the subsequent recognition test. The word sets were counterbalanced across subjects so that each of the experimental words was used an equal number of times as old and new in the recognition test. Again, this counterbalancing of word sets was used in Experiments 3, 4, and 5.

Subjects. Sixteen undergraduates participated in the experiment to fulfill an introductory psychology course requirement. All were native speakers of English. This description of the way subjects were recruited and selected holds also for Experiments 3, 4, and 5.

Results

The subjects were highly accurate on all classes of items in the lexical decision task. The proportions of correct responses are as follows: high-frequency words (M = .99), low-frequency words (M = .97), nonwords (M = .95). The effect of item class is statistically significant, F(2, 30) = 13.43, p < .0001, MSe = 0.022. Analysis of proportions here and in the rest of this article was carried out on the arcsine transformation of the original proportions. (In this article, where scores are transformed either by arcsine or logarithm, the accompanying MSes will be for the transformed scores.)

The lexical decision response times are, as expected, negatively correlated with the proportion correct. The geometric means (antilogs of mean logs) were 618, 671, and 840 ms for high-frequency words, low-frequency words, and nonwords, respectively. Analyses of response times in this experiment and in Experiment 3 were carried out on the logs of the response times. The effect of item class is, again, statistically significant, F(2, 30) = 74.61, p < .0001, MSe = 0.005. In summary, the lexical decision task showed the usual pattern found in experiments with word frequency as the variable (see Glanzer & Ehrenreich, 1979).

Table 1
Means for the Four Conditions of Experiment 1 (N = 16)

                    New                 Old
Measure         Low      High       High     Low
Rating          3.34     3.76       5.09     5.56
P(yes)          .304     .359       .592     .661
RT^a            1,213    1,192      1,170    1,166

Note. P(yes) = proportion of yes responses. RT = response time.
^a Antilog of mean log response time (in milliseconds).

Two related sets of recognition measures are of interest with respect to the mirror effect. One is the confidence ratings for the four stimulus conditions. The other is the proportions of hits and false alarms.
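The two measures are linked because both come from the same responses. The sketch below is an illustration under stated assumptions, not the authors' scoring program. It places an old/new response plus a four-level sureness response on the single 8-point scale used in this article (8 = very sure old, 5 = unsure old, 4 = unsure new, 1 = very sure new) and then derives the proportion of yes responses from the ratings. The function names and the sample responses are hypothetical.

```python
SURENESS = {"unsure": 1, "somewhat sure": 2, "moderately sure": 3, "very sure": 4}


def to_rating(judgment, sureness):
    """Map an old/new judgment plus a sureness label to the 1-8 scale."""
    level = SURENESS[sureness]
    if judgment == "old":
        return 4 + level  # 5 (unsure old) up to 8 (very sure old)
    if judgment == "new":
        return 5 - level  # 4 (unsure new) down to 1 (very sure new)
    raise ValueError("judgment must be 'old' or 'new'")


def mean_rating(ratings):
    """Mean confidence rating over a set of responses."""
    return sum(ratings) / len(ratings)


def proportion_yes(ratings):
    """Proportion of responses on the 'old' half of the scale (rating >= 5)."""
    return sum(r >= 5 for r in ratings) / len(ratings)


# Hypothetical responses of one subject to four items of one condition.
responses = [("old", "very sure"), ("old", "unsure"),
             ("new", "unsure"), ("new", "very sure")]
ratings = [to_rating(judgment, sureness) for judgment, sureness in responses]

print(ratings)                  # [8, 5, 4, 1]
print(mean_rating(ratings))     # 4.5
print(proportion_yes(ratings))  # 0.5
```

For old items `proportion_yes` is the hit rate; for new items it is the false alarm rate. Because the yes/no split discards the sureness level, it carries less information than the mean rating.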
In the case of confidence ratings, the mirror pattern is

    R(LN) < R(HN) < R(HO) < R(LO),

where R signifies mean rating. The arguments L and H refer again to low- and high-frequency words; N and O refer to new and old. The ratings here and in the following experiments are placed on a single scale, with the highest value, 8, assigned to very sure the item is old, 7 to moderately sure the item is old, 6 to somewhat sure the item is old, 5 to unsure the item is old, 4 to unsure the item is new, and so on down to 1, assigned to very sure the item is new. In the case of hits and false alarms the mirror pattern is

    FA(LN) < FA(HN) < H(HO) < H(LO).

Table 1 and the following tables are arranged so that the mirror effect is evidenced by a progression of increasing means going from left to right. The mean confidence ratings (row 1) and the hits and false alarms, the proportion of yes responses (row 2), both show the mirror effect.

The statistical analysis of the data sets (confidence ratings, hits and false alarms, response times) for this and the following experiments is carried out by first doing a preliminary one-way analysis of variance across the experimental conditions, in this case low-frequency new (LN), high-frequency new (HN), high-frequency old (HO), and low-frequency old (LO). This analysis is followed by two key comparisons: (a) high-frequency old versus low-frequency old and (b) high-frequency new versus low-frequency new. These two comparisons are critical. If the differences are in the right direction and statistically significant, they support the stability of the mirror effect. One-tailed tests are used for these planned comparisons.

The overall evaluation (the one-way analysis of variance here of the four experimental conditions) of the confidence ratings shows F(3, 45) = 120.92, p < .0001, MSe = 0.147.
This overall evaluation gives highly significant effects because it includes the effect of new versus old items as well as the comparisons of interest. The overall evaluation, which in all experiments is highly significant, is reported here but not in the following experiments because it is not of interest. Only the key planned comparisons are presented. These comparisons for the confidence ratings show both critical differences in the right direction and both statistically significant: high-frequency old versus low-frequency old, t(45) = 3.46, p < .005; high-frequency new versus low-frequency new, t(45) = 3.10, p < .005.

The parallel analysis of proportion of the yes responses (hits and false alarms) shows the overall F(3, 45) = 120.91, p < .0001, MSe = 0.018. Both critical differences are again in the right direction and statistically significant: low-frequency versus high-frequency hits, t(45) = 3.14, p < .005; low-frequency versus high-frequency false alarms, t(45) = 2.57, p < .01. The analysis of the proportion of yes responses is partially redundant with the analysis of confidence ratings. It therefore will be reported in only abbreviated form in the following experiments. The mean proportions will be included in the tables to underscore the regularity of the effect.

Also shown in Table 1 are the mean response times for each of the four conditions. There are two possible expectations concerning the pattern of response times. One, and of greater concern to us, is that there is a speed-accuracy trade-off, with response times for LO > HO and LN > HN. Such a trade-off would make the mirror effect trivial. Hockley and Murdock (1987) present evidence (Hockley, 1982) against a trade-off. The possibility of trade-off is, however, important enough to require full checking. The other possibility is a positive correlation of speed and accuracy: LO < HO and LN < HN.
The Hockley (1982) data show such a positive correlation. Here, however, neither pattern holds: neither a speed-accuracy trade-off (negative correlation) nor a speed-accuracy positive correlation. Analysis of variance of the log response times reveals only the difference between new and old as statistically significant, t(45) = 2.11, p < .05, MSe = 0.002. Neither of the other relevant comparisons is large or statistically significant: high-frequency old versus low-frequency old; high-frequency new versus low-frequency new. There is no evidence here, therefore, that speed-accuracy correlation plays a role in the mirror effect.

It could be argued that, for the response arrangement used in this experiment, the subjects' initial response may have preceded their actual decision and that this reduced the correlation of speed and accuracy. The similarity of the mean response times, all close to 1,200 ms, does not support such an argument. We will, however, examine this question again in Experiment 3 with a different response arrangement.

The mean ratings for misses have been singled out as important by Brown, Lewis, and Monk (1977). They note that missed highly memorable old items are rejected with greater confidence than missed low-memorable old items. This finding has theoretical importance because it contradicts expectations on the basis of strength theories. The data of this experiment replicate the finding. The mean confidence rating for low-frequency misses is 2.15 and for high-frequency misses is 2.33. The difference, though small, is in the right direction and is statistically significant, F(1, 15) = 15.58, p < .002, MSe = 0.016. We consider this finding and others that show the same relation in the final section.
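The planned comparisons on the confidence ratings in Experiment 1 can be approximately reproduced from the condition means in Table 1 and the error term of the overall analysis. The sketch below assumes the standard formula for a pairwise contrast with a pooled error term, t = (M1 - M2) / sqrt(2 MSe / n), with n = 16 subjects and MSe = 0.147. The article does not state that the comparisons were computed exactly this way, and the function name is hypothetical.

```python
import math


def planned_comparison_t(mean_1, mean_2, ms_error, n_subjects):
    """t for the difference between two condition means, using a pooled error term.

    Assumes the standard contrast formula t = (M1 - M2) / sqrt(2 * MSe / n).
    This is an illustrative assumption, not a procedure stated in the article.
    """
    standard_error = math.sqrt(2.0 * ms_error / n_subjects)
    return (mean_1 - mean_2) / standard_error


# Mean confidence ratings from Table 1 (Experiment 1), MSe = 0.147, n = 16.
t_old = planned_comparison_t(5.56, 5.09, ms_error=0.147, n_subjects=16)  # LO vs. HO
t_new = planned_comparison_t(3.76, 3.34, ms_error=0.147, n_subjects=16)  # HN vs. LN

print(round(t_old, 2))
print(round(t_new, 2))
```

With the rounded means from Table 1, both values fall close to the reported t(45) = 3.46 and t(45) = 3.10; small discrepancies are expected because the tabled means are rounded.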
In summary, the present experiment shows the following: (a) another replication of the mirror effect for word frequency on both confidence rating and yes/no data; (b) no evidence of a speed-accuracy correlation; (c) differences in the mean confidence ratings for misses. The next experiment was designed to examine Points a and c further.

Experiment 2: Word Frequency and Intentional Learning

This was a replication of Experiment 1, with several changes in procedure. It was carried out as a group experiment with intentional instead of incidental learning and auditory instead of visual presentation.

Method

Procedure. A group of subjects heard a single list of words read at a 1-s rate. They were told that they would be given a recognition test. The test consisted of a printed list of words, mixed old and new. Next to each word was a sequence of letters and numbers: Y, N, 1, 2, 3, and 4. The subject indicated old by circling Y, new by circling N, and degree of confidence by circling the number (1 for unsure, 4 for very sure).

Materials. A shorter list was constructed from the materials used in Experiment 1. The study list consisted of 50 high-frequency words (mean log frequency = 5.1) and 50 low-frequency words (mean log frequency = 2.5) plus 24 initial filler words and 24 final filler words. The filler words were evenly divided into high- and low-frequency words. The test list consisted of the 100 study list words plus matched (same mean log frequency) groups of 50 new high-frequency and 50 new low-frequency words as distractors. The subjects' responses on the test were self-paced.

Subjects. Thirty-five undergraduates in a memory course participated in the experiment as a class exercise.

Results

The main results are shown in Table 2.
They parallel closely the results of Experiment 1. The mirror effect is present in both the mean confidence ratings and proportion of yes responses. The tests (MSe = 0.436) of the mean ratings again show the key differences to be statistically significant: high-frequency old versus low-frequency old, t(102) = 3.59, p < .0005; high-frequency new versus low-frequency new, t(102) = 2.11, p < .025. The mirror pattern also holds in the parallel analysis of the hits and false alarms (row 2), both ps < .05.

Table 2
Means for the Four Conditions of Experiment 2 (N = 35)

                    New                 Old
Measure         Low      High       High     Low
Rating          3.22     3.56       5.51     6.08
P(yes)          .228     .281       .613     .704

Note. P(yes) = proportion of yes responses.

The mean ratings for the misses show the same pattern as in Experiment 1. The mean confidence ratings for misses are lower for low-frequency words (2.56) than for high-frequency words (2.68). This is based on 34 subjects because 1 subject did not have any misses. The difference again is slight but statistically significant, F(1, 33) = 7.90, p < .01, MSe = 0.030.

In summary, the results of Experiment 2, with auditory presentation, intentional learning, and group testing, confirm the findings of Experiment 1.

Experiment 3: Multiple Mirror Effects and Partial Order--Frequency and Concreteness

The purpose of this experiment was to develop a multiple mirror effect by using two variables, normative frequency and concreteness, in a single set of items. Each of the two variables alone produces a mirror effect.
We combined these two variables factorially in order to produce a more complex mirror effect involving more ordered terms than the four ordered terms seen in the preceding experiments. If the two variables are equal in their effectiveness, then the mean confidence ratings should give the following partial order:

    R(LCN) < R(HCN), R(LAN) < R(HAN) < R(HAO) < R(LAO), R(HCO) < R(LCO),

where C represents concrete and A abstract words (for example, LCN = low frequency, concrete, new). Parallel to these inequalities for the ratings should be a partial order for the hits and false alarms:

    FA(LCN) < FA(HCN), FA(LAN) < FA(HAN) < H(HAO) < H(LAO), H(HCO) < H(LCO).

We had actually expected that frequency would be more effective than concreteness. In that case, where one variable is stronger, a full rather than a partial order is expected. A fully ordered set of terms will be produced in the next experiment.

Method

Procedure. The procedure was basically the same as that in Experiment 1, with lexical decision as the incidental learning task. The sequence of responses required of the subjects in the recognition test was, however, simplified. The subjects made a single response to each test word, pressing one of an array of eight keys with the rightmost key indicating very sure old and the leftmost key indicating very sure new. The eight keys were in the top row of the keyboard with labels indicating confidence levels. On the test the subjects saw a series of 280 words, half old and half new. The stimulus presentations in both study and test were self-paced. The interstimulus intervals on the study and the test lists were the same as in Experiment 1 (500 and 2,000 ms, respectively).

Subjects. Sixteen undergraduates participated.

Materials.
The composition of the lists differed from that in Experiment 1. In the lexical decision task 140 words and 140 nonwords were presented. The 140 words were drawn, 35 from each of four 70-word sets: low-frequency concrete (LC), high-frequency concrete (HC), low-frequency abstract (LA), and high-frequency abstract (HA). The two low-frequency sets both had mean log frequency of 1.5; the two high-frequency sets both had mean log frequency 3.9, based on the Kučera-Francis (1967) norms. The two concrete sets both had a mean concreteness rating of 6.8; the two abstract sets both had a mean rating of 2.6, based on the Paivio, Yuille, and Madigan (1968) norms. With both concreteness and frequency varied, it was not possible to match the word lengths across conditions as closely as in Experiments 1 and 2. The means for the four groups listed above were 7.2, 5.9, 7.8, and 6.9 letters, respectively. Because we were concerned that these differences might affect the pattern of results, we subsequently carried out a special analysis of the data to determine whether the differences had an effect. They did not. This analysis will be reported briefly later. The main list of lexical decision items was preceded by 80 filler items and followed by 80 filler items (half words and half nonwords) which did not appear on the recognition test. The recognition test list consisted of the old words plus the remaining unpresented 140 words, 35 from each of the four word sets.

Results

In the lexical decision task, high-frequency words took less time to respond to (M = 592 ms) than did low-frequency words (M = 702 ms), F(1, 15) = 65.38, p < .0001, MSe = 0.007. High-frequency words (M = .98) were responded to more accurately than low-frequency words (M = .92), F(1, 15) = 37.34, p < .0001, MSe = 0.062. The concrete versus abstract words in the lexical decision test did not differ significantly in response time (F < 1).
Accuracy was, however, somewhat higher for abstract (M = .96) than concrete words (M = .94), F(1, 15) = 3.92, p = .07, MSe = 0.038.

The overall recognition test results are presented in Table 3. They are considered first with respect to frequency alone and concreteness alone. This simplification is justified because a factorial analysis of variance of the data showed that the two stimulus variables, frequency and concreteness, do not interact. After the main effects of frequency and concreteness are examined separately, the results for the combination of the two variables will be examined.

Summing across concreteness conditions, the mirror effect for frequency is evident again in both the confidence ratings and the proportions of yes responses (hits and false alarms). The key tests (MSe = 0.272) of the confidence ratings for old low-frequency (M = 5.79) versus old high-frequency words (M = 5.31) show t(105) = 3.66, p < .0005; and for new high-frequency (M = 3.63) versus new low-frequency words (M = 3.17) show t(105) = 3.50, p < .0005. Parallel tests on the proportion yes data give the same results (both ps < .025).

Summing across frequency conditions, the mirror pattern also appears for concreteness in both the confidence ratings and the hits and false alarms. Tests of the confidence ratings show concrete old (M = 5.74) higher than abstract old (M = 5.37), t(105) = 2.88, p < .005, and concrete new (M = 2.98) lower than abstract new (M = 3.82), t(105) = 6.40, p < .0005. Concrete hits (M = .679) are higher than abstract hits (M = .654), but the difference is not statistically significant. Concrete false alarms (M = .177) are lower than abstract false alarms (M = .320), t(105) = 6.01, p < .0005.

The mean confidence ratings of misses again show the order noted by Brown et al. (1977). The order holds for both frequency and concreteness.
The low-frequency misses are rated lower (2.81) than the high-frequency misses (3.03), t(45) = 2.86, p < .005, MSe = 0.096. The concrete misses are also rated lower (2.89) than the abstract (2.95), but the difference is not statistically significant.

Before moving to the consideration of the accuracy scores for the combined conditions, two issues will be touched on. One concerns the effect of word length on the pattern of the results. We noted earlier that the word sets differed in mean length. Although those lengths did not correspond to the mirror effects observed, we decided to check on any possible effects of word length fully. We did this by removing words from the word sets so that the reduced word sets all had identical mean lengths while preserving the match of frequency and concreteness. This meant going from four sets of 70 words to four sets of 30 words. We then computed the mean ratings for the reduced sets and analyzed the pattern produced. The pattern and overall analysis of variance corresponded fully to those obtained for the complete sets of words. To convey the correspondence, the means for the reduced set corresponding to the means in the first row of Table 3 read from left to right as follows: 2.72, 3.32, 3.68, 4.01, 5.19, 5.46, 5.42, and 6.10. The means are only slightly different from those for the larger set of items. The results of statistical analysis based on the reduced set also differ only slightly and in no important way from the full analysis. The differences in word length, therefore, were not important.

The second issue concerns the response times. There is evidence of some differences: old are faster than new items, F(1, 15) = 12.776, p < .003, MSe = 0.013; overall, high-frequency words are faster than low, F(1, 15) = 20.168, p < .0005, MSe = 0.004; concrete are faster than abstract, F(1, 15) = 5.716, p < .05, MSe = 0.013. Our main concern was, however, the presence of a speed-accuracy trade-off.
There is no evidence of this. There is no relation between the response times and either of the accuracy measures within either the new conditions or the old in Table 3. The rank order correlation of speed and accuracy is zero for both the new and old conditions. There is, then, no evidence in this experiment, or in Experiment 1, for any general relation between the mirror effect and response times.

Table 3
Means for the Eight Conditions of Experiment 3 (N = 16)

                          New                                  Old
Measure       LC       HC       LA       HA        HA       LA       HC       LC
Rating        2.73     3.23     3.61     4.02      5.19     5.54     5.44     6.04
P(yes)        .161     .193     .284     .357      .630     .677     .625     .732
RT^a          2,011    1,885    2,051    1,970     1,877    1,927    1,728    1,844

Note. L = low frequency; H = high frequency; C = concrete; A = abstract; P(yes) = proportion of yes responses; RT = reaction time.
^a Antilog of mean log response time (in milliseconds).

We return now to the accuracy measures in Table 3. Contrary to our expectations, concreteness and frequency were approximately equal in their effectiveness. This can be seen by comparing the highest mean rating for frequency, which for the low-frequency old words (LC plus LA) is 5.79, and the highest mean rating for concreteness, which for concrete old words (LC plus HC) is 5.74. With two variables of equal strength, only partial orders are expected for the confidence ratings. A partial order is what is obtained:

    R(LCN) < R(HCN), R(LAN) < R(HAN) < R(HAO) < R(LAO), R(HCO) < R(LCO).

A deviation from the partial order is obtained, however, in the hits and false alarms because H(HCO), .625, is slightly lower than H(HAO), .630. This we consider to reflect the relative weakness of the proportion yes data, which contain less information than do the ratings.

If a partial rather than full order is due to the absence of strong differences in the effectiveness of the two stimulus variables, frequency and concreteness, then a number of changes can be introduced to produce the full order.
One possible change is to select the sets of words used so that either the differences in frequency or the differences in word concreteness would be greater than in the word sets used in this experiment. We could, for example, select only the very highest and very lowest frequency words. This, however, would reduce further an already limited pool of words. The other way would be to weaken one of the variables, for example, by adding middle range items in either the high-concreteness set or the low-frequency set. This could, however, weaken the effectiveness of the variable sufficiently to lose the mirror regularity. We decided not to change the word sets but to introduce an encoding task that would differentially affect the word sets. We therefore repeated the experiment, making the concreteness variable stronger by a concreteness encoding task.

Experiment 4: Multiple Mirror Effects and Full Order, Frequency and Concreteness Plus Concreteness Encoding Task

This was a replication of Experiment 3, with a change in the encoding task. We hoped that a concreteness encoding task would strengthen the concreteness variable and thus give a full order of the eight means produced by the combination of two variables, word frequency and concreteness. The eight means should display a higher order mirror effect.

The words were the same as those in Experiment 3. Instead of lexical decision, however, a concreteness encoding task was given. No nonwords were shown. During the initial list presentation, the subjects carried out, as an incidental learning task, a concreteness judgment on the words. During the recognition test the words, both new and old, were each judged first for concreteness before the recognition judgment was made. This was done in order to have the encoding operation affect new as well as old items.

Method

Materials.
The study list consisted of the 140 words in four categories used in Experiment 3 (LC, HC, LA, HA) plus 4 practice items, 40 initial filler words, and 40 final filler words. The test list consisted of those 140 words plus 140 matched words.

Procedure. The subject was instructed that items that could be sensed (seen, heard, touched, tasted, or smelled) were concrete. During the encoding task the subjects pressed a key on a keyboard labeled "+" if the word on the screen was judged concrete or a key labeled "-" if it was judged not concrete. During the initial encoding trials, the subject received feedback on the correctness of the judgment. The feedback consisted of the word right or wrong appearing on the screen for 750 ms. During the recognition test, the subject made a concreteness judgment first for each word, but no feedback was given. Immediately after the concreteness judgment, the subject made a confidence judgment on whether the word was old or new, on an eight-key array as in Experiment 3.

Subjects. Sixteen undergraduates participated.

Results

On the initial encoding task, the subjects were more accurate on low-frequency (M = .96) than high-frequency words (M = .94), F(1, 15) = 6.66, p < .03, MSe = 0.025, and on concrete (M = .97) than abstract words (M = .93), F(1, 15) = 16.36, p < .002, MSe = 0.054. Items encoded incorrectly on test trials (3.5%) were not included in the scoring of recognition performance. Examination of the data shows, however, that even if these items are included, they do not change the pattern of results.

The results for the recognition test are given in Table 4. First, the encoding task was successful in making the concreteness variable stronger in the recognition task. The mean rating for old concrete words here is 6.72 as compared with 6.47 for old low-frequency words. (The corresponding means in Experiment 3 were 5.74 and 5.79.)
We can expect, then, that a full order of inequalities will be found for these data.

The results will be examined again, first with respect to frequency alone and concreteness alone. As in Experiment 3, a factorial analysis of variance of the data showed that frequency and concreteness did not interact.

The means show the mirror effect for both word frequency alone and concreteness alone, and for both the mean confidence ratings and the proportion of yes responses (hits and false alarms) in each. The tests (MSe = 0.300) of the ratings of high-frequency old (M = 6.17) versus low-frequency old (M = 6.47) give t(105) = 2.25, p < .025; and high-frequency new (M = 3.29) versus low-frequency new (M = 2.85), t(105) = 3.21, p < .001. Parallel tests on the proportion yes data show both comparisons with p < .05.

For concreteness the confidence ratings (MSe = 0.300) of the concrete old (M = 6.72) versus abstract old (M = 5.92) give t(105) = 5.87, p < .0005. The ratings for the abstract new (M = 3.30) versus concrete new (M = 2.85) give t(105) = 3.32, p < .001. Parallel tests on the proportion yes data show both comparisons with p < .001.

Table 4
Means for the Eight Conditions of Experiment 4 (N = 16)

                        New                              Old
Measure     LC      HC      LA      HA      HA      LA      HC      LC
Rating     2.67    3.02    3.03    3.57    5.85    5.99    6.48    6.96
P(yes)     .148    .200    .201    .300    .747    .758    .813    .882

Note. L = low frequency; H = high frequency; C = concrete; A = abstract; P(yes) = proportion of yes responses.

The order of the mean confidence ratings for misses noted before holds again (MSe = 0.380). Low-frequency misses (M = 2.48) have lower ratings than do high-frequency misses (M = 2.77), t(45) = 1.92, p < .05. Concrete misses (M = 2.47) have lower ratings than do abstract misses (M = 2.78), t(45) = 2.06, p < .025.
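The single-variable means quoted above for Experiment 4 can be recovered, to within rounding, from the eight cell means of Table 4. The sketch below assumes the marginal means are unweighted averages of the two relevant cells; the paper does not spell out that computation, and the helper names `marginal` and `shows_mirror` are hypothetical, introduced only for illustration.

```python
# Confidence-rating cell means from Table 4, keyed by (frequency, concreteness).
NEW = {("L", "C"): 2.67, ("H", "C"): 3.02, ("L", "A"): 3.03, ("H", "A"): 3.57}
OLD = {("L", "C"): 6.96, ("H", "C"): 6.48, ("L", "A"): 5.99, ("H", "A"): 5.85}


def marginal(cells, position, level):
    """Average the cells sharing `level` at `position` (0 = frequency, 1 = concreteness).

    Assumption: an unweighted mean of the two cells.
    """
    values = [v for key, v in cells.items() if key[position] == level]
    return sum(values) / len(values)


def shows_mirror(strong_new, weak_new, weak_old, strong_old):
    """Mirror order: strong new < weak new < weak old < strong old."""
    return strong_new < weak_new < weak_old < strong_old


# Frequency: low frequency is the stronger (better recognized) class.
freq = (marginal(NEW, 0, "L"), marginal(NEW, 0, "H"),
        marginal(OLD, 0, "H"), marginal(OLD, 0, "L"))
# Concreteness: concrete is the stronger class.
conc = (marginal(NEW, 1, "C"), marginal(NEW, 1, "A"),
        marginal(OLD, 1, "A"), marginal(OLD, 1, "C"))

print("frequency   :", [round(m, 3) for m in freq], shows_mirror(*freq))
print("concreteness:", [round(m, 3) for m in conc], shows_mirror(*conc))
```

Under that assumption both four-mean arrays fall in mirror order, and the concrete old mean (6.72) exceeds the low-frequency old mean (about 6.47), which is the sense in which concreteness is the stronger variable here.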
Of particular interest here is whether there is an extended eight-category mirror effect, in full order, now that one of the experimental variables, concreteness, is stronger than the other. Table 4 displays the data for all eight combined conditions. Both mean confidence ratings and proportion of yes responses now show the expected full order:

R(LCN) < R(HCN) < R(LAN) < R(HAN) < R(HAO) < R(LAO) < R(HCO) < R(LCO)

and

FA(LCN) < FA(HCN) < FA(LAN) < FA(HAN) < H(HAO) < H(LAO) < H(HCO) < H(LCO).

To examine the strength of the orderings, we carried out an analysis that parallels the tests carried out in the preceding experiments. In the earlier tests there were two old and two new means. Here there are four of each. The mirror effect will be evidenced by the strength of the linear component in each set of four means. We therefore evaluated the linear component of each set of four related recognition measures in Table 4, for example, the confidence ratings of LCO, HCO, LAO, and HAO in row 1. When this is done, we find the following: The linear component for confidence rating means of the four old conditions gives F(1, 105) = 39.13, p < .0005; for the four new conditions, F(1, 105) = 19.45, p < .0005; for hits, F(1, 105) = 19.06, p < .0005; and for false alarms, F(1, 105) = 17.89, p < .0005. The extent of order in the means can be fully conveyed by evaluating the proportion of variance accounted for by the mirror ordering in each array of means. For confidence ratings, the proportion of variance accounted for by these two linear components, after the effect of old versus new items is taken out, is .93; it is .85 for the hits and false alarms.
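The proportion-of-variance figure for the confidence ratings can be approximated from the rounded means in Table 4. The sketch below is one plausible reading, not necessarily the authors' computation: it assumes the standard linear contrast weights (-3, -1, 1, 3) for four ordered conditions, and it removes the old versus new effect by computing sums of squares within the new set and within the old set separately. The function names are hypothetical.

```python
# Confidence-rating means from Table 4, listed in the predicted mirror order.
NEW_RATINGS = [2.67, 3.02, 3.03, 3.57]   # LCN, HCN, LAN, HAN
OLD_RATINGS = [5.85, 5.99, 6.48, 6.96]   # HAO, LAO, HCO, LCO

# Assumption: standard linear contrast weights for four equally spaced levels.
LINEAR_WEIGHTS = [-3, -1, 1, 3]


def ss_between(means):
    """Sum of squared deviations of the condition means from their own mean."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means)


def ss_linear(means, weights=LINEAR_WEIGHTS):
    """Sum of squares captured by the linear contrast among the means."""
    contrast = sum(w * m for w, m in zip(weights, means))
    return contrast ** 2 / sum(w * w for w in weights)


def proportion_linear(*sets_of_means):
    """Pooled linear SS over pooled within-set SS (old versus new removed)."""
    linear = sum(ss_linear(s) for s in sets_of_means)
    total = sum(ss_between(s) for s in sets_of_means)
    return linear / total


full_order = NEW_RATINGS + OLD_RATINGS
is_full_order = all(a < b for a, b in zip(full_order, full_order[1:]))
prop = proportion_linear(NEW_RATINGS, OLD_RATINGS)
print(is_full_order, round(prop, 2))  # prints: True 0.93
```

Applied to the rating means, this reading reproduces the reported .93. The same function can be applied to the P(yes) row, but because the table means are rounded and the original analysis may have used unrounded data, no value is asserted for that row here.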
The results of the experiment strengthen the empirical basis of the mirror effect. The effect is shown, moreover, to produce an extended order when two variables that differ in effectiveness are used. The extended order is an eight-position mirror effect.

Experiment 5: Frequency, Concreteness, and Transformation

The purpose of this experiment was to examine multiple mirror effects with a third, new type of variable added: transformation of the list words. Kolers (1973, 1974, 1975a, 1975b), Kolers and Ostry (1974), and Graf (1982) have shown that recognition memory is better for transformed text (for example, text in which the letters are inverted or reversed) than for standard text. Of the seven separate experiments reported, however, only one shows the mirror effect.

The effect of transformation is of importance for establishing the generality of the mirror effect. Almost all of the demonstrations of the mirror effect are for stimulus variables, such as word frequency and concreteness. Those variables are produced by the selection of sets of items. Transformation falls outside the class of stimulus variables. Transformation can be applied to any item, and it is, therefore, independent of any item set. If transformation can be shown to produce the mirror effect, then a more general statement concerning the effect may be made: Any variable (not just classes of stimuli) that affects efficiency of recognition will produce the mirror effect. If the effect cannot be demonstrated for transformation, then the mirror effect may be limited to stimulus variables.
There are two reasons why the cited experiments on transformation may not have shown the mirror effect. One is that the testing procedure in those experiments was complex. In the Kolers experiments, the test items included not only old sentences in the same form as originally presented but also old sentences in a different form (for example, in standard form when originally presented inverted). The subjects classified the sentences as old same-form, old different-form, or new. Hits and false alarm rates that approximate those from ordinary recognition tests were derived from those classifications. The complexity of the procedure required of the subjects may have worked against clear demonstration of the effect. In the Graf (1982) experiment, the subjects viewed sentences during the study phase but were given word pairs during the test.

Another reason for the negative results may be floor effects on the false alarms. The subjects in the Kolers negative cases showed low false-alarm rates (.02 to .09) in both the standard and transformed conditions. This means that the possibility of a clear difference showing is slight. We therefore decided to examine the effect of transformation in a simpler arrangement and with material that we knew would not show floor effects.

This experiment was basically the same as Experiment 3 except for the addition of a transformation to half the words presented. This transformation, reversal of the order of letters in the word, required a decoding operation by the subject. Half the words were presented in standard order; half were presented in reverse order, for example, emoh.

Method

Procedure.
The subjects were instructed to pronounce all words presented on the screen. Those presented in standard order were simply read. Those in reverse order had to be decoded and then spoken. The experimenter monitored the performance throughout to make sure that both tasks were performed correctly. This was done both in the list presentation and in the test. During the test, the subject said each word aloud and then responded as in Experiment 4 by pressing one of eight keys on the top row of the keyboard (with labels ranging from NNNN, NNN, . . . to YYYY) to indicate whether the word was new or old and the degree of confidence in the judgment.

Materials. The word lists were the same as in Experiments 3 and 4 except that two words were deleted from each of the four basic word sets (LC, HC, LA, and HA) to give a total of 272. This permitted the counterbalancing of word lists with the additional transformation variable. The mean log frequencies and mean concreteness measures for the basic word sets were the same as in Experiments 3 and 4. The study list consisted of 136 words in four categories plus 4 practice items, 40 initial filler words, and 40 final filler words. The test list consisted of the 136 old words, plus 136 new words from the same four categories. Old words were presented with letters in the same order as in their initial presentation. For example, if a word was reversed initially, it was presented reversed during test.

Subjects. Thirty-two undergraduates participated.

Results

The overall means for each variable separately are shown in Table 5. The mirror effect appears in each row of the table. Preliminary analysis of the data indicated, however, that frequency and transformation interacted. The data for those variables are, therefore, separated out in Table 6, which shows the transformation conditions at both levels of word frequency. It can be seen that the mirror effect holds for the transformation at both high and low frequency.
The test of the critical conditions for the means in Table 6 shows all the differences for the mean ratings (MSe = 0.425) as statistically significant at the .01 level or better except the difference for new reversed versus new standard in the low-frequency condition (p < .10). The same pattern holds for the proportion yes data (MSe = 0.115), in which all key comparisons are significant at the .025 level or better except, again, for the new reversed versus the new standard in the low-frequency condition (p < .20). The comparisons for frequency are all statistically significant for both confidence ratings and proportion yes at the .0005 level except for a nonsignificant and slightly reversed effect of low old versus high old in the reversed condition (p > .20).

Table 5
Means for the Transformation, Frequency, and Concreteness Conditions of Experiment 5 (N = 32)

Measure                   New                     Old
Transformation     Reversed  Standard     Standard  Reversed
  Rating             2.74      2.96         5.66      7.25
  P(yes)             .182      .211         .687      .922
Frequency          Low       High         High      Low
  Rating             2.55      3.15         6.30      6.61
  P(yes)             .157      .237         .780      .829
Concreteness       Concrete  Abstract     Abstract  Concrete
  Rating             2.46      3.24         6.35      6.56
  P(yes)             .136      .257         .799      .810

Note. P(yes) = proportion of yes responses.

Table 6
Means for the Transformation Condition, High and Low Frequency Separate

                          New                     Old
Transformation     Reversed  Standard     Standard  Reversed
condition
High frequency
  Rating             3.01      3.29         5.32      7.28
  P(yes)             .215      .259         .630      .929
Low frequency
  Rating             2.47      2.62         5.99      7.23
  P(yes)             .150      .164         .742      .916

Note. P(yes) = proportion of yes responses.

The reason for the interaction between frequency and transformation may be that the reversed old condition brings the performance close to the ceiling in both the ratings (greater than 7.2 on a scale of 8) and the proportion yes (greater than .90). The mirror effect holds, however, for the transformation variable at both levels of frequency. This variable was our main concern.
The deviation in word frequency is not of major concern because the meta-analysis (Glanzer & Adams, 1985) showed the mirror effect for word frequency in 23 out of 24 published experiments, and Experiments 1, 2, 3, and 4 above all show it.

The confidence ratings for misses have the same pattern as before on each of the variables. Low-frequency misses (M = 2.49) are lower than high-frequency misses (M = 2.79), F(1, 31) = 19.80, p < .0001, MSe = 0.070; concrete misses (M = 2.52) are lower than abstract (M = 2.76), F(1, 31) = 9.59, p < .005, MSe = 0.096; and reversed misses (M = 2.49) are lower than standard (M = 2.69), F(1, 31) = 2.86, p = .10, MSe = 0.224.

The results of this experiment support further the points made in the preceding experiments. The main new finding is that the mirror effect can be produced by variables other than stimulus variables such as word frequency and word concreteness. It is produced by transformations on a single set of stimulus words. The transformations induce subjects to carry out operations on the words that affect the accuracy of recognition. There is support, therefore, for the more general statement concerning the mirror effect. Any variable that affects recognition accuracy, not just stimulus variables, will produce the effect.

General Discussion

Brown (1976) and Brown et al. (1977) were the first to argue that the mirror effect required a change in the theoretical approach to recognition memory. They argued that the subjects took account of more than the strength of the items being evaluated. They took account also of the memorability of the items. This more complex basis of decision is incorporated in the theory that will be presented next, attention/likelihood theory.

Attention/likelihood theory is a sampling theory with two special mechanisms: an attention mechanism and a decision mechanism.
The decision mechanism proposed differentiates it from current theories of recognition. The key idea concerning the decision mechanism in recognition is that the subjects evaluate a complex of information related to an item. The complex includes information about the relation of the given item to both a model new item and a model old item. This information is realized in a likelihood ratio (see Assertion 5 below).

The assertions of the theory are the following:

1. Stimuli are sets of features. The number of such features is N. This will be assumed constant for all stimuli. Because N refers to features, there is no reason to assume, at this point, that one stimulus has more or fewer features than another.

2. Some proportion of those features is marked in new stimuli. This proportion is p(new). The p(new) represents the noise level. This again, here, will be assumed constant for all stimuli. There is no reason, at this point, to assume that one new stimulus enters with greater noise marking than another.

3. Different classes of stimuli or different situations evoke different amounts of attention by the subject. This is translated into differences in the number of features, n(i), examined (sampled) during a trial. The sampling is random.

4. When features are examined, they are marked. The proportion of features marked is a(i) = n(i)/N. Therefore, the state of stimuli after they have been experienced is given by the following equation:

$$p(i, \text{old}) = p(\text{new}) + a(i) \cdot [1 - p(\text{new})]. \tag{1}$$

Conditions that evoke examination of a larger proportion of features will result in the marking of a larger proportion of features. The learning constant a(i) will be larger, and the learning rate faster.

5. During a recognition test, the subject uses the standard mechanisms of signal detection theory in making responses.
Specifically, likelihood ratios are computed and decisions are made on the basis of those likelihood ratios.

Assertions 2 and 3 set up the underlying distributions for new items: binomials with the parameters n(i) and p(new) for a particular condition. Assertion 4 sets up binomial distributions for the old items with parameters n(i) and p(old). The subject uses information related to those distributions to generate likelihood ratios and responds on the basis of those likelihood ratios. This distinguishes this theory from strength theories in which the subject responds on the basis of strength or its equivalent: amount of marking, familiarity. The likelihood ratio is a key mechanism in the production of the mirror effect.

For the binomial distributions we consider here, the log likelihood ratio for a single presented item is the following:

$$\ln L = x \cdot \ln\!\left(\frac{p(i, \text{old})}{p(\text{new})}\right) + [n(i) - x] \cdot \ln\!\left(\frac{q(i, \text{old})}{q(\text{new})}\right). \tag{2}$$

The n(i) is the number of features the subject observes. The x is the number of those marked. They are presented by the stimulus and are available to the subject. The logarithmic terms reflect the subject's model of the situation.

The process is the following. A test item is presented. The subject examines a number of features (n(i)) and notes the number of those that are marked (x). The subject then brings in two items of information: the proportion of marked features an old item of this type is expected to have and the proportion of marked features a new item is expected to have. On the basis of this information, likelihood ratios are computed. The likelihood is used in the final decision. For example, in a yes/no test if the likelihood ratio is greater than a preset likelihood criterion, the subject says "yes." The logarithmic terms in Equation 2 are the subjects' model of the situation.
They play the same role as Brown's (1977) memorability evaluation.

The theory permits us to specify key statistics of the process. It also permits us, by computation, to simulate the regularities that make up the mirror effect. Two key statistics are the mean and variance of the log likelihood (ln L) distributions:

$$M \ln L(i, j) = n(i) \cdot p(i, j) \cdot \ln\!\left(\frac{p(i, \text{old})}{p(\text{new})}\right) + n(i) \cdot q(i, j) \cdot \ln\!\left(\frac{q(i, \text{old})}{q(\text{new})}\right) \tag{3}$$

$$\operatorname{Var} \ln L(i, j) = n(i) \cdot p(i, j) \cdot q(i, j) \cdot \left[\ln\!\left(\frac{p(i, \text{old}) \cdot q(\text{new})}{q(i, \text{old}) \cdot p(\text{new})}\right)\right]^{2}, \tag{4}$$

where i is the experimental condition, such as stimulus set A or B, and j is the stimulus state, either new or old. The variance will be used later to test the theory.

One possible objection to the theory, as stated earlier, is that it has the subject hold in mind several different p(old)s. From one point of view, however, the subject has to have only some idea of the average p(new), the n(i), and the number of features. The p(i, old) can then be estimated, or at least ordered.

We can simplify the theory further and assume that the subject works with a single p(old), for example, the average p(old) for several stimulus classes, not two or more as implied above. In that case, p(i, old) and q(i, old), the logarithmic terms in the mean and variance, reduce to a single p(old) and q(old). The terms outside the logarithmic terms are not affected. They reflect the contribution of the actual stimuli, not the subject's model of the situation. It can be shown that the main effect considered so far, the mirror order, still holds under this simplification. Moreover, the derivations concerning the variances, which will be tested later, also hold. We cannot, however, handle the ratings for misses with this assumption.
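The process described above (Equations 1 through 3, plus the yes/no decision rule) can be sketched numerically. The parameter values below are illustrative assumptions, not values from the paper: N = 1000, p(new) = .10, n(A) = 60, and n(B) = 40. The function names are hypothetical, q is taken to be 1 - p as the binomial setup implies, and the criterion is assumed to be ln L > 0.

```python
from math import comb, log

N_FEATURES = 1000                 # N, constant across stimuli (illustrative value)
P_NEW = 0.10                      # p(new), the noise level (illustrative value)
N_SAMPLED = {"A": 60, "B": 40}    # n(i); class A evokes more attention than class B


def p_old(cls):
    """Equation 1: proportion of marked features after study, with a(i) = n(i)/N."""
    a = N_SAMPLED[cls] / N_FEATURES
    return P_NEW + a * (1 - P_NEW)


def log_lr(cls, x):
    """Equation 2: ln L when x of the n(i) sampled features are marked."""
    n, po = N_SAMPLED[cls], p_old(cls)
    return x * log(po / P_NEW) + (n - x) * log((1 - po) / (1 - P_NEW))


def mean_log_lr(cls, state):
    """Equation 3: expected ln L for class `cls` in state 'new' or 'old'."""
    n, po = N_SAMPLED[cls], p_old(cls)
    p = po if state == "old" else P_NEW
    return n * (p * log(po / P_NEW) + (1 - p) * log((1 - po) / (1 - P_NEW)))


def p_yes(cls, state, criterion=0.0):
    """Exact binomial probability that ln L exceeds the criterion (a 'yes')."""
    n = N_SAMPLED[cls]
    p = p_old(cls) if state == "old" else P_NEW
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1) if log_lr(cls, x) > criterion)


# Conditions listed in the predicted mirror order: AN, BN, BO, AO.
conditions = [("A", "new"), ("B", "new"), ("B", "old"), ("A", "old")]
means = [mean_log_lr(c, s) for c, s in conditions]
rates = [p_yes(c, s) for c, s in conditions]
for (c, s), m, r in zip(conditions, means, rates):
    print(f"{c} {s}: mean ln L = {m:+.3f}, P(yes) = {r:.3f}")
```

With these illustrative values, both the mean log likelihood ratios and the yes rates come out in the mirror order AN < BN < BO < AO. A single parameter set is only an illustration and does not establish that the order holds over the full parameter range.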
Using the theory as presented above, we have carried out hundreds of computations with a large range of Ns, n(i)s, and p(new)s, and therefore also for a large range of a(i)s and p(i, old)s. Our only restriction on the ps has been that they stay under .50. Our computations show that the theory produces the mirror pattern for the standard recognition measures:

1. hits and false alarms: FA(AN) < FA(BN) < H(BO) < H(AO);

2. mean confidence ratings: R(AN) < R(BN) < R(BO) < R(AO);

3. two-alternative forced choice: P(BO, BN) < P(AO, BN), P(BO, AN) < P(AO, AN).

It also produces the order of confidence ratings for misses found in the data.

Some general tests of the theory are possible. We will not do conventional fitting of the values for the five experiments. Although the theory has only four basic parameters (N, two n(i)s, and p(new)), it cannot be used to fit the data of the present experiments. For example, the yes/no data of Experiment 1 give only four means with four parameters to be estimated. Therefore, instead of conventional fitting, we will apply some general tests of the theory. The tests will be concerned with the slopes of the receiver operating characteristic (ROC) for conditions in Experiments 1 through 5, using Equation 4, the equation for the variance.

The theory permits us to derive some critical information about the ratio of the variances for pairs of conditions relevant to ROCs. We will do this for one case, that involving low (L) and high (H) frequency, first. For that case we consider four variance ratios: (a) Var ln L(LO)/Var ln L(HN); (b) Var ln L(LO)/Var ln L(LN); (c) Var ln L(HO)/Var ln L(HN); (d) Var ln L(HO)/Var ln L(LN). These variance ratios yield predictions concerning the slopes of the ROCs.
On the basis of Equation 4 we can derive two statements. Their derivation is given in the Appendix.

1. The four ratios above are listed in order of size, with the highest ratio first.

2. The first three ratios are all greater than 1.0. The fourth, for HO and LN, is indeterminate. It may be greater than, less than, or equal to 1.0. Its size relative to the other ratios is, however, known. This is asserted in the first statement.

The four ratios above are related to four ROCs: (a) low-frequency hits against high-frequency false alarms (LO/HN); (b) low-frequency hits against low-frequency false alarms (LO/LN); (c) high-frequency hits against high-frequency false alarms (HO/HN); (d) high-frequency hits against low-frequency false alarms (HO/LN). The second and third ROCs (b and c, the standard ROCs) are the two that would ordinarily be plotted. The first and fourth (a and d, the crossed ROCs) will be considered here in order to test the theory fully.

There is a known relation (Green & Swets, 1966, p. 62) between the variances of the signal and noise distributions and the slope of the normalized (z score) ROC. The ratio of the signal variance to the noise variance is the inverse of the slope of the ROC. On the basis of this relation, the four ratios above imply the following two statements for the ROCs.

1. Because the ratios of variances are listed in order from highest to lowest, the slopes of the normalized ROCs should show the inverse order. The slope of the ROC for low-frequency hits against high-frequency false alarms (LO/HN) corresponding to the first ratio should be the lowest, and the slope of the ROC (HO/LN) corresponding to the last ratio should be the highest.

2. The first three normalized ROCs should all give slopes less than 1.0.
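The two predictions can be checked numerically from Equation 4 for one illustrative parameter set. The values below (N = 1000, p(N) = .10, n(L) = 60, n(H) = 40) are assumptions chosen only to satisfy n(L) > n(H) and p < .50, and the function names are hypothetical. The slope is taken here as the ratio of the noise standard deviation to the signal standard deviation, the usual reading of the Green and Swets relation. That is an assumption about how the variance-ratio statement above is meant, and it does not affect the predicted ordering.

```python
from math import log, sqrt

P_NEW = 0.10                      # p(N), shared by both classes (illustrative value)
N_FEATURES = 1000                 # N (illustrative value)
N_SAMPLED = {"L": 60, "H": 40}    # n(L) > n(H): low frequency is the stronger class


def p_old(cls):
    """Equation 1 with a(i) = n(i)/N."""
    return P_NEW + (N_SAMPLED[cls] / N_FEATURES) * (1 - P_NEW)


def var_log_lr(cls, state):
    """Equation 4: variance of ln L for class `cls` in state 'O' (old) or 'N' (new)."""
    po = p_old(cls)
    p = po if state == "O" else P_NEW
    log_term = log(po * (1 - P_NEW) / ((1 - po) * P_NEW))
    return N_SAMPLED[cls] * p * (1 - p) * log_term ** 2


# (signal, noise) pairs in the order a, b, c, d used in the text.
PAIRS = [(("L", "O"), ("H", "N")),
         (("L", "O"), ("L", "N")),
         (("H", "O"), ("H", "N")),
         (("H", "O"), ("L", "N"))]

ratios = [var_log_lr(*signal) / var_log_lr(*noise) for signal, noise in PAIRS]
# Assumption: z-ROC slope = sd(noise) / sd(signal) = 1 / sqrt(variance ratio).
slopes = [1 / sqrt(r) for r in ratios]

for label, r, s in zip("abcd", ratios, slopes):
    print(f"({label}) variance ratio = {r:.2f}, predicted slope = {s:.2f}")
```

For this parameter set the ratios decrease from (a) to (d), the first three exceed 1.0, and the corresponding slopes increase, with the first three below 1.0. The fourth ratio happens to fall below 1.0 here; as the text notes, its position relative to 1.0 is not fixed by the theory.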
We will now examine the slopes of the four normalized ROCs obtained for each of the three variables in the five reported experiments. Because Experiments 3, 4, and 5 contain several variables, they give a total of nine sets of ROCs. All five experiments varied frequency. Experiments 3, 4, and 5 varied concreteness. Only Experiment 5 had transformation as a variable. The slopes for these ROCs are presented in Table 7. The frequency variable gives the entries in the left part of the table. The concreteness and transformation variables give the entries in the right part of the table. The ordering of the ROCs has been set so that equivalent ROCs appear on the same row. For example, L is the strong, H the weak frequency condition; C the strong, A the weak concreteness condition; R the strong, S the weak transformation condition. Therefore, LO/HN, CO/AN, and RO/SN are on the same row: the ROCs for strong condition hits against weak condition false alarms.

1. With respect to the value of the slopes, every one of the 27 (nine sets of three each) predicted to be less than 1.0 is indeed less than 1.0 (LO/HN, LO/LN, HO/HN; CO/AN, CO/CN, AO/AN; RO/SN, RO/RN, SO/SN). The probability of 27 such results occurring by chance, using the binomial with p = .5, is $7.5 \times 10^{-9}$. If only the standard ROCs are considered (LO/LN and HO/HN and their parallels), then the probability of 18 such results occurring by chance is $3.81 \times 10^{-6}$.

2. With respect to the ordering of the sets of four slopes for each variable, we find that all but one corresponds fully to the predicted order. Applying the binomial, the probability of eight out of nine cases giving the predicted order by chance, with p = 1/24 and n = 9, is $3.78 \times 10^{-12}$.
We can, again, restrict our attention to the standard ROCs (LO/LN and HO/HN and their parallels). We find then that eight out of the nine show the predicted order. The probability of this number or more occurring by chance, using the binomial with p = .5, is .0195.

Attention/likelihood theory does the following:

1. It handles the known regularities of the mirror effect: the ordering of hits and false alarms, the ordering of confidence ratings, and the ordering of choices in the two-alternative forced choice.

2. It handles new regularities: the size and order of ROC slopes.

Table 7
Slopes of Normalized ROCs for Each Variable in the Five Experiments (1, 2, 3, 4, and 5)

Freq.     1     2     3     4     5     Conc.    3     4     5     Transf.   5
LO/HN    .66   .56   .69   .64   .61    CO/AN   .68   .56   .61    RO/SN    .56
LO/LN    .74   .61   .81   .74   .72    CO/CN   .74   .65   .67    RO/RN    .66
HO/HN    .91   .70   .84   .71   .76    AO/AN   .89   .77   .82    SO/SN    .85
HO/LN   1.03   .75   .98   .82   .91    AO/CN   .96   .89   .90    SO/RN   1.00

Note. Experiments 1, 2, 3, 4, and 5 all varied frequency. Only Experiments 3, 4, and 5 varied concreteness. Experiment 5 alone varied transformation. Freq. = frequency; Conc. = concreteness; Transf. = transformation; L = low frequency; H = high frequency; C = concrete; A = abstract; R = reversed; S = standard; O = old; N = new.

There are three other approaches to either the general mirror effect or special cases of it. Two of these (Glanzer & Bowles, 1976; Gillund & Shiffrin, 1984) were concerned with a special case, word frequency effects. Both are strength theory approaches. The first approach was based on work in our laboratory (Bowles & Glanzer, 1983; Glanzer & Bowles, 1976). One problem with the approach is that it was specific to the case of word frequency.
It could not be generalized to other stimulus variables and would have further difficulties with variables such as transformation. Our dissatisfaction with that theory led to the formulation of attention/likelihood theory.

A second approach is part of the comprehensive memory theory of Gillund and Shiffrin (1984, p. 46). It also focuses on the specific case of word frequency. The approach assumes that the subject rescales the underlying distributions on the basis of distance from separate criteria and their standard deviations. The subject then aligns the distributions by placing the different criteria in a single location. The rescaling and alignment produce the mirror effect. The specific characteristics of the process that necessarily produce the effect are not given.

A third approach that concerns itself with the mirror effect is that of Hockley and Murdock (1987, p. 355). In that approach the order of underlying distributions depicted in Panel 1 of Figure 1 is assumed. The problem, however, is to explain why those underlying distributions are ordered as they are.

All three approaches handle the mirror effect as a special puzzle. We believe, however, that the generality of the mirror effect requires a new view of the process underlying recognition memory. Subjects make their recognition decisions by using a complex of information about each stimulus. In the theory outlined here, the complex is a likelihood ratio. Whatever the approach, some equivalent complex and an appropriate decision mechanism will have to be postulated to handle this regularity in recognition memory.

References

Adams, J. K. (1985).
Visually presented verbal stimuli by assembly language on the Apple II computer. Behavior Research Methods, Instruments & Computers, 17, 489-502. Bowles, N. L., & Glanzer, M. (1983). An analysis of interference in memory. Memory & Cognition, 11, 307-315. Brown, J. (1976). An analysis of recognition and recall and of prob- lems in their comparison. In J. Brown (Ed.), Recall and recognition (pp. 1-35). New York: Wiley. Brown J., Lewis, V. J., & Monk, A. F. (1977). Memorability, word frequency and negative recognition. Quarterly Journal of Experi- mental Psychology, 29, 461-473. Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press. GiUund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67. Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition memory. Memory & Cognition, 13, 8-20. Glanzer, M., & Bowles, N. (1976). Analysis of the word-frequency effect in recognition memory. Journal of Experimental Psychology: Human Learning and Memory, 2, 21-31. Glanzer, M., & Ehrenreich, S. L. (1979). Structure and search of the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 18, 381-398. Graf, P. (1982). The memorial consequences of generation and transformation. Journal of Verbal Learning and Verbal Behavior, 21,539-548. Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley. Hockley, W. E. (1982). Retrieval processes in continuous recognition. Journal of Experimental Psychology: Learning, Memory, and Cog- nition, 8, 497-512. Hockley, W. E., & Murdock, B. B., Jr. (1987). A decision model for accuracy and response latency in recognition memory. Psycholog- ical Review, 94, 341-358. Kolers, P. A. (1973). Remembering operations. Memory & Cognition, 1, 347-355. Kolers, P. A. (1974). Two kinds of recognition. Canadian Journal of Psychology, 28, 51-61. Kolers, P. A. (1975a). 
Addendum to "Remembering operations."" Memory & Cognition, 3, 29-30. Kolers, P. A. (1975b). Memorial consequences of automatized encod- ing. Journal of Experimental Psychology: Human Learning and Memory, 1,689-701. Kolers, P. A., & Ostry, D. (1974). Time course of loss of information regarding pattern analyzing operations. Journal of Verbal Learning and Verbal Behavior, 13, 599-612. Kubera, F., & Francis, W. (1967). Computational analysis of present- day American English. Providence, RI: Brown University Press. McNicol, D. (1972). A primer of signal detection theory. London: George Allen & Unwin. Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph Supplement, 76(No. 1). (Appendix follows on next page) 16 MURRAY GLANZER AND J OHN K. ADAMS Appendix The order and values of t he vari ance ratios can be det er mi ned by exami ni hg t he t erms t hat make up each rat i o and t aki ng account of t he relative sizes of correspondi ng terms. All t hat is assumed is t hat one condi t i on (for example, low frequency) is mor e effective t han t he ot her (for example, hi gh frequency). For low- and high-frequency words, we have t he following relations: n(L) > n(H); p(LO) > p (HO); p(LN) = p(HN) = p(N), where L = low, H = high, O = old, N = new. To simplify t he compari sons, let t he l ogari t hmi c t erms i n t he varai nce equat i on be wri t t en as R(H) and R(L), where R(H) = [l n(P(HO)" q( N) ] ] 2 I_ \ q( HO) . p( ) / J R , L , = L \ q ( o).p( )/j Because p(LO) > p(HO), t hen R(L) > R(H). Let us look first at two ratios of variance: n(L). p(LO). q(LO)-R(L) Var In L( LO) / Var In L(HN) = n(H).p(N).q(N).R(H) (A1) n(L). p(LO), q(LO)-R(L) Var In L( LO) / Var In L(LN) = n(L).p(N).q(N).R(L) (A2) Because n(L) > n(H) and R(L) > R(H), t he first rat i o has t o be hi gher t han t he second. For t he t hi rd ratio, n( H) . p( HO) - q( HO) . 
R(H) Var In L( HO) / Var In L(HN) = (A3) n( H) . p( N) - q( N) . R( H) Because p(LO), q(LO) > p(HO), q(HO), t he second rat i o has t o be hi gher t han t he third. (The i nequal i t y hol ds when p(old) ___.50. We assumed t hi s boundar y initially and used it i n all of our exploratory comput at i ons. ) Finally, we look at t he fourt h ratio: n( H) . p( HO) - q( HO) - R( H) Var In L( HO) / Var In L(LN) = n( L) . p( N) . q( N) . R( L) (A4) Because n(L) > n(H) and R(L) > R(H), Rat i o 4 has t o be less t han Rat i o 3. These compari sons give t he order of t he four ratios. The exami nat i on of t he t erms composi ng each rat i o shows t hat t he first t hree are all greater t han 1.0. For example, every t er m- - n( L) , p(LO), q(LO), R( L) - - i n t he numer at or of Rat i o 1 is greater t han t he correspondi ng t er m i n t he denomi nat or - - n( H) , p(N). q(N), R(H). Rat i o 4 is t he onl y one t hat is i ndet er mi nat e i n size. Of t he corre- spondi ng t erms i n t he numer at or and denomi nat or , n(H) < n(L), and R(H) < R(L), but p(HO), q(HO) > p(N). q(N). If i t is assumed t hat t he subject works with only a single p(old) in st ruct uri ng t he decision process, t hen R(H) = R(L). The l ogari t hmi c t er ms do not t hen affect t he relations bet ween t he variances. However, t he ot her paramet ers do, and t he predi ct i ons above still hold. Re c e i ve d Fe b r u a r y 28, 1989 Re vi s i on r ecei ved Ma y 19, 1989 Ac c e pt e d Ma y 19, 1989
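
The ordering argument in the Appendix can be checked numerically. The following is a minimal sketch, not the authors' own computation. It assumes q = 1 - p and uses the variance form n·p·q·R that appears in Equations A1 to A4. The parameter values (n_l, n_h, p_n, p_lo, p_ho) are hypothetical, chosen only so that they satisfy the Appendix assumptions: n(L) > n(H), p(LO) > p(HO), a common p(N), and p(old) no greater than .50. The function names are illustrative as well.

```python
import math


def log_term(p_old: float, p_new: float) -> float:
    """R = [ln(p(old) * q(new) / (q(old) * p(new)))]^2, assuming q = 1 - p."""
    q_old, q_new = 1.0 - p_old, 1.0 - p_new
    return math.log((p_old * q_new) / (q_old * p_new)) ** 2


def variance(n: float, p: float, r: float) -> float:
    """Var ln L = n * p * q * R, the form used in Equations A1 to A4."""
    return n * p * (1.0 - p) * r


def variance_ratios(n_l, n_h, p_n, p_lo, p_ho):
    """Return Ratios 1 to 4 (old variance over new variance) in Appendix order."""
    r_l = log_term(p_lo, p_n)  # R(L)
    r_h = log_term(p_ho, p_n)  # R(H)
    var_lo = variance(n_l, p_lo, r_l)
    var_ho = variance(n_h, p_ho, r_h)
    var_ln = variance(n_l, p_n, r_l)
    var_hn = variance(n_h, p_n, r_h)
    return (
        var_lo / var_hn,  # Ratio 1 (A1)
        var_lo / var_ln,  # Ratio 2 (A2)
        var_ho / var_hn,  # Ratio 3 (A3)
        var_ho / var_ln,  # Ratio 4 (A4)
    )


# Hypothetical parameter values, for illustration only.
ratios = variance_ratios(n_l=60, n_h=40, p_n=0.10, p_lo=0.20, p_ho=0.15)
print([round(r, 3) for r in ratios])
```

With other values that respect the same inequalities (and p(HO) > p(N)), the first three ratios should stay above 1.0 and in the same order, while Ratio 4 may fall on either side of 1.0, as the Appendix notes.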