1, 2, 3, 4,) in order that future rapid screening criteria can be established.
ROC curve (analysis)
Definition:
A graphical representation of the trade-off between the
false negative (FN) and false positive (FP) rates for every
possible cut-off.
An ROC curve represents the relationship of two specific
distributions (under the same monotonic transformation
class).
Let the false positive rate be a variable $t$; then $1 - G(F^{-1}(1-t))$
is the ROC curve (needs to be estimated).
The empirical function $1 - G_m(F_n^{-1}(1-t))$ converges strongly to
$1 - G(F^{-1}(1-t))$.
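The plug-in estimate above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: F is taken to be the marker distribution in the non-diseased group and G the one in the diseased group, larger marker values are assumed to indicate disease, and the empirical quantile is defined as the smallest sample value whose empirical CDF reaches the requested level. The function names and the sample values are hypothetical.

```python
import math

def empirical_cdf(sample, x):
    """Empirical CDF: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def empirical_quantile(sample, p):
    """Smallest sample value whose empirical CDF is >= p (0 < p <= 1)."""
    ordered = sorted(sample)
    k = max(math.ceil(p * len(ordered)), 1)
    return ordered[k - 1]

def empirical_roc(healthy, diseased, t):
    """Plug-in estimate 1 - G_m(F_n^{-1}(1 - t)) for 0 <= t < 1."""
    threshold = empirical_quantile(healthy, 1.0 - t)   # F_n^{-1}(1 - t)
    return 1.0 - empirical_cdf(diseased, threshold)    # 1 - G_m(threshold)

# Hypothetical marker values, for illustration only.
healthy = [1.0, 2.0, 3.0, 4.0]
diseased = [2.5, 3.5, 4.5, 5.5]
for t in (0.0, 0.25, 0.5):
    print(t, empirical_roc(healthy, diseased, t))
```

Evaluating `empirical_roc` over a grid of `t` values traces the empirical ROC curve as a step function.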
Data and Implementation
1-SP (1 - specificity)   1.0   0.7   0.4   0.1
SN (sensitivity)         1.0   1.0   0.6   0.3
Receiver operating characteristic curve
Source:
http://www.anaesthetist.com/mnm/stats/roc/
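As a rough illustration of how the tabulated (1-SP, SN) pairs describe a curve, the sketch below joins them with straight lines and computes the area underneath by the trapezoidal rule. Two things here are assumptions rather than part of the slides: the endpoint (0, 0) is added so the curve spans the whole unit interval, and linear interpolation between points is only one possible convention. The function name is hypothetical.

```python
def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (x, y) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# (1-SP, SN) pairs from the table, plus the assumed endpoint (0, 0).
roc_points = [(1.0, 1.0), (0.7, 1.0), (0.4, 0.6), (0.1, 0.3), (0.0, 0.0)]
print(round(trapezoid_auc(roc_points), 4))
```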
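Relation to the logistic regression model
Mathematical requirement:
$\log\frac{SN}{1-SN} = \log\frac{1-SP}{SP} + \beta$, or, writing $y = SN$ and $x = 1 - SP$,
$\frac{y}{1-y} = \theta\,\frac{x}{1-x}$, where $\theta = \exp(\beta)$ is the odds ratio, so that
$y = \frac{\theta x}{\theta x - x + 1}$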
Homogeneity effect models (class)
(vs. heterogeneity effect)
Area under curve (AUC)
(still using logistic regression)
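Assume $\beta > 0$.
$\mathrm{AUC} = \int_0^1 y\,dx = \int_0^1 \frac{\theta x}{\theta x - x + 1}\,dx = \frac{\theta(\theta - 1 - \log\theta)}{(\theta - 1)^2}$, where $\theta = \exp(\beta)$
$\beta \approx 0 \Rightarrow \mathrm{AUC} \approx 0.5$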
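The closed form can be checked numerically. The sketch below evaluates the AUC as a function of the log odds ratio and compares it with a midpoint-rule integration of the curve y = θx/(θx - x + 1) over x in [0, 1], where θ = exp(β). The function names, the step count, and the small-β cutoff that returns 0.5 (the limiting value) are illustrative choices, not taken from the slides.

```python
import math

def roc_y(x, theta):
    """ROC curve implied by a constant odds ratio theta."""
    return theta * x / (theta * x - x + 1.0)

def auc_closed_form(beta):
    """AUC = theta * (theta - 1 - log(theta)) / (theta - 1)^2, theta = exp(beta)."""
    theta = math.exp(beta)
    if abs(theta - 1.0) < 1e-6:
        return 0.5  # limiting value as beta -> 0
    return theta * (theta - 1.0 - math.log(theta)) / (theta - 1.0) ** 2

def auc_midpoint(beta, steps=10000):
    """Midpoint-rule approximation of the integral of roc_y over [0, 1]."""
    theta = math.exp(beta)
    h = 1.0 / steps
    return h * sum(roc_y((k + 0.5) * h, theta) for k in range(steps))

for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, auc_closed_form(beta), auc_midpoint(beta))
```

The two columns should agree to several decimal places, and the values should rise from 0.5 as β increases.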
AUC vs. $\beta$
The AUC is increasing in beta
The confidence interval (CI) of AUC can be constructed via
the CI of beta (in standard two-sample setting). However,
this is a parametric approach.
Multivariate logistic regression & CI of AUC? Complex!
To combine several markers??
Theoretical thinking and Implementation
Using logistic regression model:
$\log\frac{p(Y=1 \mid X)}{1 - p(Y=1 \mid X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
MLEs: $b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$, where the $b$'s are the MLEs of the $\beta$'s
Define a new variable $T$ as
$T = b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$
Then for the $n$ observations,
$T_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi}$
$(Y_i, T_i)$, $i = 1, 2, \ldots, n$, forms a new two-sample problem with
only one (simple) explanatory variable.
Requirement: At least one variable of $(X_1, X_2, \ldots, X_p)$ is
continuous.
Linear discriminant function obtained from
logistic regression model and its companion
ML estimation.
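The reduction to a single score can be sketched as follows. The coefficients are assumed to have been estimated already (for instance by a statistical package); the values of `b`, `X`, and `Y` below are made up purely for illustration, and the function names are hypothetical.

```python
def linear_score(b, x):
    """T = b_1*x_1 + ... + b_p*x_p for one observation."""
    return sum(bj * xj for bj, xj in zip(b, x))

def to_two_sample(b, X, Y):
    """Compute T_i for every observation and split the scores by outcome Y_i."""
    scores = [linear_score(b, row) for row in X]
    healthy = [t for t, y in zip(scores, Y) if y == 0]
    diseased = [t for t, y in zip(scores, Y) if y == 1]
    return scores, healthy, diseased

# Hypothetical fitted coefficients and data (not from the slides).
b = [0.5, -1.0]
X = [[2.0, 1.0], [4.0, 0.0], [0.0, 1.0]]
Y = [1, 1, 0]
scores, healthy, diseased = to_two_sample(b, X, Y)
print(scores, healthy, diseased)
```

The two lists `healthy` and `diseased` can then be treated like any single-marker two-sample data set when estimating an ROC curve.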
Merit and Drawback
Unified model (Homogeneity model!)
Easy to implement using a standard statistical package (SAS,
SPSS, …).
Variable selection procedures can also be employed (SAS,
SPSS).
However, arbitrary configurations of variables make it
difficult to interpret the mechanism behind the
configuration.
--------------------------------------------------------------------
Comment
1. A case-control design overestimates the AUC of an ROC curve.
2. Disease progression is not synchronous with the markers.
3. A follow-up study is needed (large population, time-consuming,
and costly).
Optimal cut-off point?
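Y = sensitivity, X = 1 - specificity
Loss = X*a + (1-Y)*b,
or more generally, Loss = f(X)*a + g(Y)*b, where
1-Y = false negative rate, with loss b
X = false positive rate, with loss a
For a fixed Loss, Y = (a/b)*X + (1 - Loss/b) is a straight line with
slope a/b; X = 0 gives the intercept Y = 1 - Loss/b.
Minimizing the Loss is therefore equivalent to
maximizing the intercept.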
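A small sketch of this rule, using the four (1-SP, SN) pairs from the data table earlier in the deck: for given losses a and b, the operating point with the smallest linear Loss is also the one whose iso-loss line has the largest intercept. The loss values passed to the functions and the function names are illustrative assumptions.

```python
def loss(x, y, a, b):
    """Loss = X*a + (1-Y)*b, with X = 1 - specificity and Y = sensitivity."""
    return x * a + (1.0 - y) * b

def intercept(x, y, a, b):
    """Intercept of the iso-loss line (slope a/b) through the point (x, y)."""
    return y - (a / b) * x

def best_by_loss(points, a, b):
    """Operating point with the smallest loss."""
    return min(points, key=lambda p: loss(p[0], p[1], a, b))

def best_by_intercept(points, a, b):
    """Operating point whose iso-loss line has the largest intercept."""
    return max(points, key=lambda p: intercept(p[0], p[1], a, b))

# (1-SP, SN) pairs taken from the data table.
points = [(1.0, 1.0), (0.7, 1.0), (0.4, 0.6), (0.1, 0.3)]
print(best_by_loss(points, 1.0, 1.0), best_by_intercept(points, 1.0, 1.0))
print(best_by_loss(points, 3.0, 1.0), best_by_intercept(points, 3.0, 1.0))
```

Raising the loss a attached to false positives relative to b moves the chosen point toward the lower-left corner of the ROC plot, as the second call illustrates.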
Gold Standard?
1. Biopsy as a gold standard (cancer)
2. Compare with the best-ever diagnosis
3. Fix the sensitivity or the specificity, and then compare with
another rapid diagnosis, still under 1 or 2.
Example: Creutzfeldt-Jakob Disease (CJD) and
S100 protein (as a marker of diagnosis/early
detection?) [BMJ (1998) Vol. 316, 21, pp. 577-582]