A Method For Comparing The Areas Under ROC Curves Derived From The Same Cases

[Reprinted from RADIOLOGY, Vol. 148, No. 3, Pages 839-843, September, 1983.]
Copyright 1983 by the Radiological Society of North America, Incorporated

James A. Hanley, Ph.D.
Barbara J. McNeil, M.D., Ph.D.

A Method of Comparing the Areas under Receiver Operating Characteristic Curves Derived from the Same Cases1
Receiver operating characteristic (ROC) curves are used to describe and compare the performance of diagnostic technology and diagnostic algorithms. This paper refines the statistical comparison of the areas under two ROC curves derived from the same set of patients by taking into account the correlation between the areas that is induced by the paired nature of the data. The correspondence between the area under an ROC curve and the Wilcoxon statistic is used and underlying Gaussian distributions (binormal) are assumed to provide a table that converts the observed correlations in paired ratings of images into a correlation between the two ROC areas. This between-area correlation can be used to reduce the standard error (uncertainty) about the observed difference in areas. This correction for pairing, analogous to that used in the paired t-test, can produce a considerable increase in the statistical sensitivity (power) of the comparison. For studies involving multiple readers, this method provides a measure of a component of the sampling variation that is otherwise difficult to obtain.
Index term: Receiver operating characteristic curve (ROC)
Radiology 148: 839-843, September 1983
1 From the Department of Epidemiology and Health, McGill University, Montreal, Canada (J.A.H.) and the Department of Radiology, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA (B.J.M.). Received June 3, 1981; revision requested July 21, 1981; final revision received and accepted Feb. 15, 1983. Supported in part by the Hartford Foundation and by the National Center for Health Care Technology.

Several questions dealing with comparative benefits for alternative diagnostic algorithms, diagnostic tests, or therapeutic regimens have recently emerged in medicine. For example, how do we know whether one diagnostic algorithm is better than another in sorting patients into diseased and nondiseased groups? Whether the addition of a new test or procedure to an established algorithm improves its performance? Whether it matters who of several available readers interprets a mammogram? Whether one type of hard-copy unit in radiology is better than another? Whether reading a CT scan in conjunction with the patient's history allows a more accurate diagnosis than reading it without the history?

The analyses of such problems have started with construction of receiver operating characteristic (ROC) curves (1-3). Generally these analyses have used as cutoff points either different posterior probabilities on a continuous scale or different thresholds on a discrete rating scale. The latter approach has been particularly popular in radiology.

Major gaps in the understanding of statistical properties of ROC curves have limited their usefulness, especially for questions involving comparisons of curves based on the same sample of subjects or objects. These comparative situations contrast with those involving a single dataset and a single ROC curve. In such cases, the investigator generally only needs to know that a single modality or diagnostic approach has "poor", "moderate", or "good" accuracy, and the location of the ROC curve gives a rough assessment. However, when the comparison of two algorithms or modalities is relevant, more formal statistical criteria are needed in order to judge whether observed differences in accuracy are more likely to be random than real. Thus far these criteria have not been fully developed for ROC curves.

In a recent paper (4) we dealt with one popular accuracy index that can be derived from and used as a summary of the ROC curve. We showed that the relationship of the area under the ROC curve to the Wilcoxon statistic could be used to derive its statistical properties, such as its standard error (SE) and the sample sizes required to measure the area with a prespecified degree of precision (reliability) and to provide a desired level of statistical power (low type II error) in comparative experiments. This paper extends our statistical analysis to another large class of situations, where the two or more ROC curves are generated using the same set of patients. In these situations, it is inappropriate to calculate the standard error of the difference between two areas ($\mathrm{Area}_1$ and $\mathrm{Area}_2$) as

$$\mathrm{SE}(\mathrm{Area}_1 - \mathrm{Area}_2) = \sqrt{\mathrm{SE}^2(\mathrm{Area}_1) + \mathrm{SE}^2(\mathrm{Area}_2)} \qquad (1)$$

since $\mathrm{Area}_1$ and $\mathrm{Area}_2$ are likely to be correlated. This correlation is likely to be positive; if the vagaries of random sampling of cases produce a higher/lower than expected accuracy index for one modality (e.g., if the sample consisted of a larger than usual number of easy/difficult cases), then the accuracy of the second modality will probably also be correspondingly higher/lower than one would expect. In other words, while the two indices may fluctuate independently by amounts $\mathrm{SE}_1$ and $\mathrm{SE}_2$ in separate samples, they will tend to fluctuate in tandem when derived from a single sample.

In this paper we have developed an approach to take account of this correlation. In brief, we indicate that the relevant standard error for such comparisons is not that shown in Equation 1 but rather

$$\mathrm{SE}(\mathrm{Area}_1 - \mathrm{Area}_2) = \sqrt{\mathrm{SE}^2(\mathrm{Area}_1) + \mathrm{SE}^2(\mathrm{Area}_2) - 2r\,\mathrm{SE}(\mathrm{Area}_1)\,\mathrm{SE}(\mathrm{Area}_2)} \qquad (2)$$

where r is a quantity representing the correlation introduced between the two areas by studying the same sample of patients. This paper reviews the calculations for comparing the ROC curves of two modalities and illustrates this new approach using data from a series of experiments involving phantoms.
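The practical difference between Equations 1 and 2 can be seen numerically. The following is a minimal Python sketch and is not part of the original paper: the function name `se_of_difference` is hypothetical, and the inputs (standard errors of 0.030 and 0.026, with r = 0.40) are assumed here purely for illustration.

```python
import math


def se_of_difference(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of the difference between two ROC areas.

    With r = 0 this is Equation 1 (areas treated as independent);
    with r > 0 it is Equation 2 (both areas estimated from the same cases).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


# Illustrative inputs, assumed for this sketch.
se_unpaired = se_of_difference(0.030, 0.026)        # Equation 1
se_paired = se_of_difference(0.030, 0.026, r=0.40)  # Equation 2

print(f"unpaired SE = {se_unpaired:.4f}, paired SE = {se_paired:.4f}")
```

Under these assumed inputs, any positive r makes the paired standard error smaller than the unpaired one, which is the source of the gain in sensitivity described above.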
METHODS

The general approach to assessing whether the difference in the areas under two ROC curves derived from the same set of patients is random or real is to calculate a critical ratio z, defined as

$$z = \frac{A_1 - A_2}{\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2 - 2r\,\mathrm{SE}_1\mathrm{SE}_2}} \qquad (3)$$

where $A_1$ and $\mathrm{SE}_1$ refer to the observed area and estimated standard error of the ROC area associated with modality 1; where $A_2$ and $\mathrm{SE}_2$ refer to corresponding quantities for modality 2; and where r represents the estimated correlation between $A_1$ and $A_2$.2 This quantity z is then referred to tables of the normal distribution and values of z above some cutoff, e.g., z > 1.96, are taken as evidence that the "true" ROC areas are different. The importance of introducing the $2r\,\mathrm{SE}_1\mathrm{SE}_2$ term in the above equation is obvious: failure to subtract out from the sampling variability those fluctuations that the paired design has already eliminated will leave the denominator of Equation 3 too large and z too small, thereby reducing the chance of detecting a difference between two modalities.

Calculating Areas

Areas under ROC curves can be obtained in three ways: (i) by the trapezoidal rule; (ii) as output from the Dorfman and Alf maximum likelihood estimation program (5); or (iii) from the slope and intercept of the original data when plotted on binormal graph paper (3). As indicated in our companion paper (4) the trapezoidal approach systematically underestimates areas. Because the Dorfman and Alf approach is becoming readily accessible to those interested in this area, we will calculate areas using this approach. (For those limited to graphical methods, the area can be derived from the slope and intercept according to the rule Area = Percentage of Gaussian distribution to left of $z_1$, where $z_1 = \mathrm{Intercept}/\sqrt{1 + \mathrm{slope}^2}$.)

Calculating Standard Errors

The standard errors associated with areas can be obtained in three ways: (i) as output directly from the Dorfman and Alf maximum likelihood estimation program; (ii) from the variance of the Wilcoxon statistic as illustrated in detail in Reference 4; or (iii) from an approximation to the Wilcoxon statistic by making an assumption, shown to be conservative (compared with assuming a Gaussian-based ROC curve), that the underlying signal (diseased) and noise (nondiseased) distributions are exponential in type (4). We will use the standard errors estimated from the Dorfman and Alf program.

Calculating the Correlation Coefficient, r, Between Areas

Two intermediate correlation coefficients are required, which are then converted into a correlation between $A_1$ and $A_2$ via a table that we supply below. The first is $r_N$, the correlation coefficient for the ratings given to images from nondiseased patients by the two modalities. The second is $r_A$, the correlation coefficient for the ratings of diseased patients imaged by the two modalities. Each of these can be calculated in traditional ways using either the Pearson product-moment correlation method or the Kendall tau. The former approach is usually used for results derived from an interval scale whereas the latter is more appropriate for results obtained from an ordinal scale. ROC curves in radiology are derived from ordinal scale data and therefore we have used the Kendall tau for calculating $r_N$ and $r_A$. Standard statistical packages (e.g., SPSS, SAS) provide tau; when the number of rating categories is small, however, say four or less, the calculation can also be performed manually.

Once the correlations between the ratings ($r_N$ among the normals, $r_A$ among the abnormals) are obtained, it is necessary to calculate the correlation that they induce between the two areas $A_1$ and $A_2$; for ease of notation we have called this r (without any subscript). This is the coefficient present in Equations 2 and 3. Tabulation of r (TABLE I) is the fundamental contribution of this paper3; therefore, in our subsequent example we will illustrate its use.

2 As we will see later, the SE of an estimated area depends on the magnitude of the underlying "true" area. When calculating z to test the null hypothesis that this underlying area is the same for both modalities, one should equate $\mathrm{SE}_1$ and $\mathrm{SE}_2$, calculating them both from a common estimate of the area. In this case the denominator becomes $\sqrt{2\,\mathrm{SE}^2 - 2r\,\mathrm{SE}^2}$ or $\mathrm{SE}\sqrt{2(1 - r)}$.
3 Mathematical derivation available upon request.

Experimental Data for Illustrative Examples

We studied 112 phantoms that were specially constructed to evaluate the accuracy of two different computer algorithms used in image reconstruction for CT. Fifty-eight of these phantoms were of uniform density and were designated "normal"; the remaining 54 contained an area of reduced density to simulate a lesion and were designated "abnormal". Two images of each phantom were reconstructed using the two different algorithms, which we will refer to as modality 1 and modality 2. A single reader read each image and rated it on a 6-point scale: 1 = Definitely Normal; 2 = Probably Normal; 3 = Possibly Normal; 4 = Possibly Abnormal; 5 = Probably Abnormal; 6 = Definitely Abnormal. From the resulting data, we constructed two ROC curves. The data were submitted to the Dorfman and Alf maximum likelihood program to produce areas under the ROC curves and standard errors.

RESULTS

Our results will be divided into two parts. First, the analysis of the example involving CT phantoms will be illustrated. Then, in order to verify that the z statistic performs correctly, results of several simulations will be summarized.

CT Phantom Example

The basic data are presented in the Appendix, along with the calculations produced from them. The areas under the ROC curves were 89.45% (SE 3.0%) and 93.82% (SE 2.6%). The (Kendall tau) correlations between the paired ratings were $r_N$ = 0.39 (nondiseased patients) and $r_A$ = 0.60 (diseased patients), giving an "average" correlation between the ratings of 0.50. With this average correlation of 0.50 and with an average area of (89.45 + 93.82)/2 = 91.64, TABLE I indicates that the correlation r between areas is approximately 0.40. Equation 3 was then used to calculate the critical ratio z in order to test the null hypothesis that the observed difference between observed areas was merely a result of random sampling. Using the above data,

$$z = \frac{0.9382 - 0.8945}{\sqrt{0.030^2 + 0.026^2 - 2(0.40)(0.030)(0.026)}} = \frac{0.0437}{0.0309} = 1.41$$

As mentioned earlier, one might average the two areas to obtain a common area of 0.9164, and use the formula in Reference 4 to predict each of the standard errors as 0.0281; using $\sqrt{2(0.0281)^2(1 - 0.40)} = 0.0308$ in the denominator of Equation 3 yields an almost identical z value of 1.42.

If we have reason to believe a priori that modality 2 is likely to be better than modality 1 and are only interested in improvements, then a one-tailed test is appropriate. The Gaussian distribution indicates that a value of 1.41 or higher should occur roughly once in every 13 samples (p = 0.079): this evidence suggests that the observed difference may not be random. This contrasts with the weaker inference that would be drawn from a critical ratio of 1.10 (or a p value of 0.136 or 1 in 7) that would have been calculated had the correlation between areas been assumed to be equal to zero. (In other words, had we failed to take into account the increased sensitivity induced by studying the same set of patients with both modalities.) If we had no a priori interest in one particular direction, then a two-tailed test would have been appropriate.

TABLE I: Correlation Coefficients*

[Tabulated values of r. Rows: average correlation between ratings, 0.02 to 0.90. Columns: average area, 0.700 to 0.975.]

* Correlation coefficient r between two ROC areas $A_1$ and $A_2$ as a function of average correlation between ratings (rows) and average area (columns).

General Performance of the Paired Test

A good statistical test should indicate a difference when one is really present, but it should minimize in a predictable way the number of instances in which a difference is said to exist when, in fact, none does exist (high sensitivity and specificity). To determine these characteristics for this new statistical test, we examined its diagnostic performance over a range of simulated situations, using methods analogous to those used by Pollack and Hsieh (6) and Metz and Kronman (7).

In order to calculate the specificity, 400 simulated analyses were performed for each of several combinations of underlying ROC areas and correlations. The tabulated distributions of the test statistic z obtained from these various simulations were for all practical purposes indistinguishable from Gaussian ones and had standard deviations acceptably close to 1. The false-positive rates were low, and close to what one should expect. Specifically, among the 4,800 trials (12 combinations, each run 400 times) the average proportion of z values above 2.0 or below -2.0 (values often taken as indicating a statistically significant difference) was close to the 4.6% (i.e., a specificity of 95.4%) that would be found in a perfect Gaussian distribution.

Evaluation of the sensitivity (power) for this statistic required comparison of two modalities with different ROC areas. For this purpose, sets of 200 experiments were simulated from varying correlation coefficients. The performance was evaluated by tabulating the percentage of paired and unpaired tests indicating a significant difference. Over four combinations of correlations and baseline accuracies, one could project that the paired test would raise a 60% sensitivity (expected from an unpaired analysis) to over 70%.

DISCUSSION

In this investigation we have described a method of comparing the areas under two ROC curves derived from the same sample of patients. Two immediate results are apparent. We have shown that the comparison can be made more sensitive if the investigator takes into account the smaller sampling variability of the difference in areas induced by studying each patient twice. Second, our data can be extrapolated to indicate the statistical economy that emerges from this kind of experimental design and analysis. We discuss these points in turn.

The larger the correlation between areas, the more sensitive the paired z test will be. This observation may explain why a number of studies using
an unpaired z test that assumed the two areas were statistically independent failed to find a significant difference between the modalities.

The degree of correlation expected between ROC areas obtained with different modalities varies considerably depending upon the types of modalities involved. For example, if the two images are obtained from the same machine with two different settings or if a radiologist reads a CT scan with and without extensive clinical history, high correlation can be expected. In this study involving different reconstruction algorithms with CT, the correlation between the paired ratings of abnormal phantoms was 0.60 and between paired ratings of normal phantoms was 0.39. We have observed similar results in a study of ours (8) involving the interpretation of CT studies of the head with and without extensive clinical history. On the other hand, when the only common denominator in the comparison is the patient, the correlations are likely to be weaker. For example, a study by Alderson et al. (9) comparing CT, ultrasound, and nuclear medicine imaging in the diagnosis of liver metastases found considerably lower rating-pair correlations (0.36 in abnormal patients and 0.28 in normal patients). Obviously, in the latter situation the gains from using a paired rather than an unpaired analysis are smaller.

Two other points must be made about correlation coefficients. First, in general we have noted that whatever the modalities under study, the ratings tend to be less correlated in the nondiseased patients than in the diseased patients. This suggests that in diagnostic imaging agreement tends to be greater if there is in fact underlying disease, and less if there is not. Second, if an investigator knew a priori that the correlations between the modalities under study were small, then an experimental design that did not involve pairing could be used, provided that it was no more difficult to separate (diagnose) the patients studied by one modality than it was to diagnose those studied by the other modality.

The statistical economy resulting from this new statistical test is large. Statistical economy relates to the question of how many more patients are required in an unpaired design than in a paired design to achieve the same sensitivity or statistical power. A comparison of Equations 1 and 2 provides an answer to this question. Each of the standard errors is inversely proportional to the square root of the sample size n. Also, the equations can be simplified by assuming that the standard errors of the two areas are equal; in this case, Equation 2 differs from Equation 1 only in the presence of the factor (1 - r). When the sample sizes associated with the two techniques are arranged so that the paired and unpaired tests produce the same z value, then a simple algebraic identity emerges:

$$n_u = n_p/(1 - r)$$

or

$$n_p = (1 - r)\,n_u$$

where $n_u$ and $n_p$ are the numbers of patients per modality in the respective unpaired and paired designs4. For example, if r is anticipated to be roughly 0.3 and an unpaired design called for 100 patients per modality, then a paired design should require only 70 per modality. Thus the total number of images read would be 140 rather than 200. This efficiency is even more important if the limiting factor is the number of available patients with a proved outcome (rather than the number of images a reader can be expected to read), since the total of 140 paired images is obtained from just 70 patients, rather than from 200 patients in the unpaired design. The investigator must weigh very carefully the practical and statistical issues, keeping in mind that if one uses an unpaired design, one must establish (through case matching and/or random allocation) that the method of constructing two independent samples of subjects does not give one modality an inbuilt advantage.

The discussion thus far has centered on a rather restricted design where just one reader read the images generated by the two modalities being compared. The statistical test simply asked the question: if this one reader read an infinite rather than a finite number of images, would his/her accuracy be comparable in both modalities?5 Clearly, a more general question is relevant: how do the modalities compare over many readers? For the sake of completeness, we refer briefly to this problem of multiple readers and readings in each modality. This situation has been discussed extensively by Swets and Pickett (10); our main reasons for mentioning it here are to draw readers' attention to a very extensive treatment of the design and analysis of imaging experiments, and to point out that our method of obtaining r now allows the methods therein to be used with greater sensitivity.

This is best appreciated by reproducing the formula that the authors give (Equation 2, Chapter 3) for the standard error of a difference between the value of an accuracy index (such as the area under an ROC curve) for one modality (averaged over $l$ readers, each reading each image $m$ times) and the value of the same accuracy index (again averaged over readers and readings) for a second modality. The expression involves three sources of variation: $S_c^2$, the variation in the index due to differences in mean difficulty of cases from case sample to case sample; $S_{br}^2$, between-reader variance due to differences in diagnostic capability from reader to reader; and $S_{wr}^2$, within-reader variance due to differences in an individual reader's diagnoses of the same case on repeated occasions. It also involves two correlation coefficients: $r_c$ to denote the correlations introduced by using similar (or even the same) cases with both modalities and $r_{br}$ to denote correlations between the accuracy index obtained by using matched (or possibly the same) readers. With this notation, the formula becomes

$$\mathrm{SE(difference)} = \sqrt{2}\left[S_c^2(1 - r_c) + \frac{S_{br}^2(1 - r_{br})}{l} + \frac{S_{wr}^2}{lm}\right]^{1/2}$$

The authors describe fully in several worked examples how to evaluate each of these terms. They point out, however, that the estimation of the two components $r_c$ and $S_c$ creates problems. First, if $m = 1$, i.e., if each image is read just once, then $S_c^2$ and $S_{wr}^2$ are not separable, and one is forced to overestimate the SE. The second, and more serious, problem is that if $m = 1$ and if one does not have a large number of cases, enough (for example) to split them into a number of subsamples and fit an ROC curve to each, one is unable to estimate $r_c$. In such cases, the authors explain that one has no alternative but to assume $r_c = 0$, thereby giving up any benefits attainable from case matching.

The method we have presented here means that if one uses the area under the ROC curve as an index of accuracy, one is not forced to assume $r_c = 0$. The quantity we have called r, which is obtainable via TABLE I from the area and from the correlations between ratings, is the same quantity $r_{c-wr}$ mentioned in Equation 5, Chapter 4 of Swets and Pickett (10)6.
4 This simple relation allows the user to multiply the sample sizes in TABLE III of our first publication (4) by the appropriate (1 - r) and use them for paired designs.
5 One could also use the z test to compare two specific readers on one modality.
6 If m > 1, one can correct the quantity $r_{c-wr}$ (obtained from TABLE I) for the "attenuation" produced by $S_{wr}$, and estimate the "true" correlation $r_c$ introduced by using similar (or the same) cases.
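The sample-size identity in the Discussion, $n_p = (1 - r)\,n_u$, can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the function names are hypothetical, and rounding up to a whole patient is an assumption made here.

```python
import math


def paired_patients_needed(n_unpaired: int, r: float) -> int:
    """Patients per modality in a paired design: n_p = (1 - r) * n_u.

    Rounded up to a whole patient (an assumption of this sketch); the small
    tolerance guards against floating-point noise pushing the result up by one.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    return math.ceil((1.0 - r) * n_unpaired - 1e-9)


def unpaired_patients_needed(n_paired: int, r: float) -> int:
    """Patients per modality in an unpaired design: n_u = n_p / (1 - r)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    return math.ceil(n_paired / (1.0 - r) - 1e-9)


# The example from the text: r about 0.3, 100 patients per modality unpaired.
n_p = paired_patients_needed(100, 0.3)
print(f"paired design: {n_p} patients, {2 * n_p} images read")
print(f"unpaired design: 100 patients per modality, {2 * 100} images read")
```

With the text's example this gives 70 patients and 140 images for the paired design, against 200 patients and 200 images for the unpaired one.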
The interested reader can consult that reference for full details on how it is used.

In summary, then, we have provided a method for estimating the correlation between the areas under two ROC curves derived from the same sample of patients and have shown how to use this correlation to perform a more sensitive comparison of the areas. Moreover, it provides an item that was previously only guessed at, or underestimated, in studies of multiple readers.

Acknowledgments: We are indebted to Philip Judy and Richard Swensson for providing the CT phantoms and reading results.

References

1. Green DM, Swets JA. Signal detection theory and psychophysics. New York: John Wiley and Sons, 1966.
2. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8:283-298.
3. Swets JA. ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 1979; 14:109-121.
4. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.
5. Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating-method data. J Math Psych 1969; 6:487-496.
6. Pollack I, Hsieh R. Sampling variability of the area under the ROC-curve and of d'e. Psych Bull 1969; 71:161-173.
7. Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psych 1980; 22:218-243.
8. McNeil BJ, Hanley JA, Funkenstein HH, Wallman J. Paired receiver operating characteristic curves and the effect of history on radiographic interpretation: CT of the head as a case study. Radiology 1983; in press.
9. Alderson PO, Adams DF, McNeil BJ, et al. Computed tomography, ultrasound, and scintigraphy of the liver in patients with colon or breast carcinoma: a prospective comparison. Radiology 1983; in press.
10. Swets JA, Pickett RM. Evaluation of diagnostic systems. New York: Academic Press, 1982.

APPENDIX

Ratings given to two images of each of 112 phantoms, together with calculations of z test to see whether one modality had a substantially greater ROC area than the other.

(A) Basic data

[Cross-tabulation of the rating* with modality 1 against the rating* with modality 2, given separately for the normal and the abnormal phantoms, with row and column totals.]

* Ratings: from 1 = Definitely Normal to 6 = Definitely Abnormal.

(B) Correlation between ratings (Kendall tau): modality 1 vs. modality 2

$r_N$ = 0.39
$r_A$ = 0.60
average correlation = 0.50

(C) ROC analysis

Modality 1: Area = 0.8945, Standard Error (Area) = 0.030
Modality 2: Area = 0.9382, Standard Error (Area) = 0.026

(D) Test statistic

Difference in areas = 0.0437
Average area = 0.9164
Correlation between areas = 0.40

$$z = \frac{0.0437}{\sqrt{0.030^2 + 0.026^2 - 2(0.40)(0.030)(0.026)}} = 1.42$$
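The Appendix calculation can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than the authors' program: the function names are hypothetical, the areas and standard errors are those reported for the CT phantom example, and r = 0.40 is the value read from TABLE I in the text.

```python
import math


def critical_ratio(area_a: float, se_a: float,
                   area_b: float, se_b: float, r: float = 0.0) -> float:
    """Critical ratio z of Equation 3 for the difference area_a - area_b."""
    denominator = math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    return (area_a - area_b) / denominator


def upper_tail_p(z: float) -> float:
    """One-tailed p value, P(Z >= z), for a standard Gaussian variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Modality 2 minus modality 1, using the values reported in the Appendix.
z_paired = critical_ratio(0.9382, 0.026, 0.8945, 0.030, r=0.40)
z_unpaired = critical_ratio(0.9382, 0.026, 0.8945, 0.030, r=0.0)

print(f"paired:   z = {z_paired:.2f}, one-tailed p = {upper_tail_p(z_paired):.3f}")
print(f"unpaired: z = {z_unpaired:.2f}, one-tailed p = {upper_tail_p(z_unpaired):.3f}")
```

The paired ratio comes out near 1.42, as in the Appendix; the Results text reports 1.41 because it divides by the rounded denominator 0.0309. Setting r = 0 recovers the unpaired ratio of about 1.10.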
