September 1989: 12A-22A
Stepwise regression procedures are often used to identify a small set of variables that serve as important predictors of clinical outcome and to construct prediction models based on those variables. Several theoretical and practical limitations of this process are discussed and highlighted with a variety of examples from published reports. Wider appreciation of these limitations should encourage the development of more relevant models, and thereby improve the quality of clinical prediction.
(J Am Coll Cardiol 1989;14:12A-22A)
From the Division of Cardiology, Cedars-Sinai Medical Center and the School of Medicine, University of California, Los Angeles, California. This work was supported in part by a Specialized Center of Research (SCOR) grant from the National Institutes of Health (HL-17651), Bethesda, Maryland and is based on a presentation given at the 1987 Regenstrief Conference (Quality and Cost-Conscious Cardiovascular Care: Role of Decision Modeling) sponsored by the Regenstrief Institute for Health Care, Indiana University Medical Center, Indianapolis, Indiana.

Address for reprints: George A. Diamond, MD, Cedars-Sinai Medical Center, Division of Cardiology, 8700 Beverly Boulevard, Los Angeles, California 90048.

The sciences do not try to explain; they hardly even try to interpret; they mainly make models . . . mathematical construct[s] which, with the addition of certain verbal interpretations, [describe] observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.
John von Neumann

I have it on good authority that the world will end at exactly 1:12 PM on Friday, November 13, 2026 (1). This bold prediction derives from impressive empiric evidence gathered over the last 2,000 years (Fig. 1) demonstrating that world population (in billions of people) is a simple function of time (in years measured from the birth of Christ):

$$\text{Population} = \frac{179}{(2026.87 - \text{time})^{0.99}}$$

Since the discovery of this so-called “Doomsday equation” in 1960, population growth has been just slightly ahead of schedule. The equation predicts a population of 5 billion in 1989, but the Population Reference Bureau says that threshold was actually reached in 1987 (2). At this rate, world population is predicted to exceed 25 billion some time in the year 2020, and will go to infinity late in 2026, just before the 99th annual meeting of the American Heart Association (centenary planners, take note).

[Figure 1. World population N (in billions) versus time T (years A.D., 0 to 2000), with the fitted curve N = 179/(2026.87 - T)^0.99; r = 0.994.]

Nobel prize-winning economist Wassily Leontief (3) recently lamented his colleagues’ slavish love affair with the modeling profession, noting that only 31% of reports published in the flagship journal of the discipline between 1972 and 1981 contained any empiric data at all, and that only 1% contained new data generated on the author’s own initiative (Table 1). On the basis of these observations, he predicted that the economists would:

. . . continue to produce scores of mathematical models [by fitting] algebraic functions of all possible shapes to essentially the same sets of data without being able to advance, in any perceptible way, a systematic understanding of the structure and operations of [the real underlying] system.

A recent survey (4) reveals that this practice is beginning to rub off on the medical profession. Only 8% of articles published in the journal Circulation during 1965 contained anything more than descriptive summary statistics, and none used the advanced methods common to prediction models. By 1985, however, 96% of published studies contained some form of statistical analysis, and 58% employed various advanced methods (Table 2). So, before we all cancel our plans for attending that AHA meeting, we would be well advised to take a closer look at the validity of the prediction process itself, a process that relies heavily on the use of statistical regression models.

There are many ways for clinical predictions based on these statistical models to go wrong. We can ask the wrong question, make the wrong assumptions, choose the wrong model, select the wrong variables, make the wrong measurements, compute the wrong parameters, and correlate the wrong outcomes. For these reasons, Wasson et al. (5) recently proposed a number of standards that should be required of clinical prediction models, and concluded that a majority of validation studies failed to meet these standards. They observed that only 42% of published reports describing 33 prediction models contained an adequate description of the model, only 34% described its error rate and only 6% described its impact on patient care. Are such poorly standardized prediction models really relevant to individual patient decisions, or have the clinicians, like the economists, become slaves to fashion?

What Do We Know and How Well Do We Know It?

How reproducible is stepwise regression? Clinically “important” risk factors, the basis of most conventional prediction models, are usually identified by a familiar statistical procedure known as stepwise regression (6-8). This procedure was used, for example, to identify the important independent predictors for the development of 59 cases of acquired immunodeficiency syndrome (AIDS) among 1,835 seropositive homosexual men over a median follow-up interval of 15 months. The following important factors were identified: an increased number of T suppressor lymphocytes, a paradoxically low titer of human immunodeficiency virus (HIV) antibody (indicating pre-existing immunodeficiency), a high titer of cytomegalovirus antibody and a history of sex with someone who subsequently developed AIDS. Kaposi’s sarcoma and opportunistic infections, factors previously considered important, were not so identified in this study, and the authors thereby concluded that “[t]hese variables may be markers rather than determinants of disease progression” (7). But how certain can we be that the predictors identified by stepwise regression are really the important ones and that those not so identified are really not important? How fickle are such models?

To answer this question, we followed up 598 postinfarction patients undergoing stress electrocardiography and radionuclide ventriculography, and developed a stepwise logistic regression model to predict cardiac events (death or nonfatal infarction) over the next year based on a variety of observations derived from these tests (9). Stepwise regression selected only 3 of 10 candidate variables for inclusion in the prediction model (peak exercise left ventricular ejection fraction, exercise duration and the magnitude of exercise-induced ST segment depression). We assessed the reproducibility of this modeling process by repeating it multiple times
Our confidence in these “somewhat reliable” models can be further degraded by baseline differences in the composition of the reference population and in the operative definitions of the candidate clinical variables (13), by a variety of cognitive and methodologic biases (14), and even by

[Figure residue: panels labeled REST EF and EXERCISE EF; x axis: resting ejection fraction threshold (<20, <30, <40, <50).]
[Table fragment]
21    99.9969

The explained variance is the proportion of total least squares variance in the outcome measure (in percent) accounted for by all the variables entering into the regression equation at that step.

Figure 6. Inverse relation between the accuracy and precision of a logistic prediction model based on 646 postinfarction patients. Accuracy (upward bars) is expressed as area under a receiver-operating characteristic curve constructed from the set of logistic outcome probabilities, and precision (downward bars) is expressed as the average variance of the individual probabilities.
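The accuracy measure named in the Figure 6 caption, area under a receiver-operating characteristic curve, can be illustrated with a short sketch. It uses the rank (Mann-Whitney) interpretation of the area discussed by Hanley and McNeil (42): the probability that a randomly chosen patient with an event was assigned a higher predicted probability than a randomly chosen patient without one, with ties counted as one half. The function name `roc_auc` and the sample numbers are hypothetical and are not taken from the study.

```python
def roc_auc(probabilities, outcomes):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    probabilities: predicted event probabilities, one per patient
    outcomes:      1 if the patient had an event, 0 otherwise
    """
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_none in non_events:
            if p_event > p_none:
                concordant += 1.0      # pair ranked correctly
            elif p_event == p_none:
                concordant += 0.5      # tie counts as half
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Made-up predicted probabilities for four patients, two with events.
    print(roc_auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

An area of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to perfect separation of events from non-events. The area says nothing about how uncertain each individual probability is, which is the precision side of the figure.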
patient’s risk on day 1 is 30%. To predict the risk on day 2 we need only select the 30% value on the x axis, move vertically until we intersect the graph, and read off the

[Table fragment]
8     5.6    43.3
9     29.5   71.9
10    96.7   3.2
20A DIAMOND JACC Vol. 14, No. 3
LIMITS OF CLINICAL PREDICTION MODELS AND CLINICAL PREDICTION September 1989: 12A-22A
population doubling time for our species cannot be much less than 9 months, no matter how good the fit of the data (51).

Second, if we use the Doomsday equation to calculate backward in time, we discover that Adam (N = 1) appeared (on the sixth day of creation) some 233 billion years ago. Eve (N = 2) appeared 115 billion years later (that same day!) and bore her first child (N = 3) when she was 39 billion years old. These estimates, being slightly in excess of those advocated by both the physicists and the creationists, serve to remind us that the model’s parameters are not known with certainty. To be specific, there is a 7.8% error in the constant appearing in the numerator, a 0.9% error in the exponential constant appearing in the denominator and 0.3% error in the linear constant in the denominator representing the actual date of doomsday (1). As a result of these seemingly small individual errors, the aggregate error of the outcome prediction becomes huge when we extrapolate even 3% beyond the range of our data (Fig. 10). Based on these errors, then, the lower bound of a 95% confidence interval for the world population overlaps a value of zero in the year 2016, meaning there’s a chance our species will dwindle to extinction a full decade before the Doomsday crush. The future ain’t what it used to be.

References

1. von Foerster H, Mora PM, Amiot LW. Doomsday: Friday, 13 November, A.D. 2026. Science 1960;132:1291-5.
2. Umpleby SA. World population: still ahead of schedule. Science 1987;237:1555-6.
3. Leontief W. Academic economics. Science 1982;217:1067.
4. Ware HJ. Statistical practice and statistical education in cardiology. Circulation 1987;75:307-10.
5. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules. Applications and methodological standards. N Engl J Med 1985;313:793-9.
6. Harrell FE Jr. SUGI Supplemental Library User’s Guide. Version 5 ed. Cary, North Carolina: SAS Institute Inc, 1986:280.
7. Polk BF, Fox R, Brookmeyer R, et al. Predictors of the acquired immunodeficiency syndrome developing in a cohort of seropositive homosexual men. N Engl J Med 1987;316:61-6.
8. Gottlieb SO, Weisfeldt ML, Ouyang P, Mellits ED, Gerstenblith G. Silent ischemia as a marker for early unfavorable outcomes in patients with unstable angina. N Engl J Med 1986;314:1214-9.
9. Ferguson JG, Pollock BH, Work JW, Diamond GA. How does sample size affect the reproducibility of a clinical prediction rule? (abstr). Clin Res 1987;35:344A.
10. Diaconis P, Efron B. Computer-intensive methods in statistics. Sci Am 1983;248:116-30.
11. Ferguson JG, Pollock BH, Work JW, Diamond GA. How reliable are published clinical prediction models? (abstr). Circulation 1987;76(suppl IV):IV-253.
12. Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA. Regression modeling strategies for improved prognostic prediction. Stat Med 1984;3:143-52.
13. Lohr JW, McFarlane JM, Grantham JJ. A clinical index to predict survival in acute renal failure patients requiring dialysis. Am J Kidney Dis 1988;11:254-9.
14. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974;185:1124-31.
15. Diamond GA. Monkey business. Am J Cardiol 1986;57:471-5.
16. Diamond GA, Forrester JS. Metadiagnosis: an epistemologic model of clinical judgment. Am J Med 1983;75:129-37.
17. Ladenheim ML, Kotler TS, Pollock BH, Berman DS, Diamond GA. Incremental prognostic power of clinical history, exercise electrocardiography and myocardial perfusion scintigraphy in suspected coronary artery disease. Am J Cardiol 1987;59:270-7.
18. Ahnve S, Gilpin E, Henning H, Curtis G, Collins D, Ross J. Limitations and advantages of the ejection fraction for defining high risk after acute myocardial infarction. Am J Cardiol 1986;58:872-8.
19. Jones RH, McEwan P, Newman GE, et al. Accuracy of diagnosis of coronary artery disease by radionuclide measurement of left ventricular function during rest and exercise. Circulation 1981;64:586-601.
20. Work JW, Ferguson J, Diamond GA. Incremental power of rest-exercise ejection fraction to predict coronary events after myocardial infarction (abstr). Clin Res 1987;35:336A.
21. Gill JB, Ruddy TD, Newell JB, Finkelstein DM, Strauss HW, Boucher CA. Prognostic importance of thallium uptake by the lungs during exercise in coronary artery disease. N Engl J Med 1987;317:1485-9.
23. Califf RM, Mark DB, Harrell FE Jr, et al. Importance of clinical measures of ischemia in the prognosis of patients with documented coronary artery disease. J Am Coll Cardiol 1988;11:89-93.
24. Specht L, Nordentoft AM, Cold S, Clausen NT, Nissen NI. Tumour burden in early stage Hodgkin’s disease: the single most important prognostic factor for outcome after radiotherapy. Br J Cancer 1987;55:535-9.
25. Work JW, Ferguson J, Diamond GA. Risk stratification after myocardial infarction: the lights are on but nobody’s home (abstr). Clin Res 1987;35:364A.
26. Mayer RP, Stowe RA. Would you believe 99.9969% explained? Industr Eng Chem 1969;61:426.
27. Murphy EA. A Companion to Medical Statistics. Baltimore: Johns Hopkins University Press, 1985:139-40.
28. Work JW, Ferguson J, Diamond GA. Ejection fraction predicts outcome for populations but not for individuals (abstr). Clin Res 1987;35:364A.
29. Hauck WW. A note on confidence bands for the logistic response curve. Am Stat 1983;37:158-60.
30. Feinstein AR. Clinimetrics. New Haven: Yale University Press, 1987:112-3.
31. Shannon CE, Weaver W. A Mathematical Theory of Communication. Urbana: University of Illinois Press, 1949:36&t.
32. Barnoon S, Wolfe H. Measuring the Effectiveness of Medical Decisions. Springfield, Illinois: Charles C. Thomas, 1972:53-75.
33. Ruddy TD, Dighero HR, Newell JB, et al. Quantitative analysis of dipyridamole-thallium images for the detection of coronary artery disease. J Am Coll Cardiol 1987;10:142-9.
34. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926-31.
35. Diamond GA. An improbable criterion of normality (letter). Circulation 1982;66:681.
36. Rozanski A, Diamond GA, Berman DS, Forrester JS, Morris D, Swan HJC. The declining specificity of exercise radionuclide ventriculography. N Engl J Med 1983;309:518-22.
37. Rozanski A, Diamond GA, Forrester JS, Berman DS, Morris D, Swan HJC. Alternative referent standards for cardiac normality: implications for diagnostic testing. Ann Intern Med 1984;101:164-71.
38. Diamond GA, Rozanski A, Forrester JS, et al. A model for assessing the sensitivity and specificity of tests subject to selection bias: application to exercise radionuclide ventriculography for diagnosis of coronary artery disease. J Chron Dis 1986;39:343-55.
39. Mark DB, Hlatky MA, Harrell FE, Lee KL, Califf RM, Pryor DB. Exercise treadmill score for predicting prognosis in coronary artery disease. Ann Intern Med 1987;106:793-800.
40. Hlatky MA, Pryor DB, Harrell FE, Califf RM, Mark DB, Rosati RA. Factors affecting sensitivity and specificity of exercise electrocardiography. Multivariable analysis. Am J Med 1984;77:64-71.
41. Diamond GA. Reverend Bayes’ silent majority. An alternative factor affecting sensitivity and specificity of exercise electrocardiography. Am J Cardiol 1986;57:1175-80.
42. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
43. McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med Decis Making 1984;4:137-50.
44. Diamond GA. ROC steady: a receiver operating characteristic curve that is invariant relative to selection bias. Med Decis Making 1987;7:238-43.
45. Diamond GA, Pollock BH. Computer-assisted diagnosis in noninvasive evaluation of coronary artery disease. J Am Coll Cardiol 1984;3:465-6.
46. May R. Simple mathematical models with very complicated dynamics. Nature 1976;261:459-67.
47. Dennett DC. Brainstorms. Philosophical Essays on Mind and Psychology. Cambridge: MIT Press, 1978:181.
48. Foucault M. This Is Not a Pipe. Los Angeles: University of California Press, 1982.
49. Kerr RA. Whom to blame for the Great Storm? Science 1988;239:1238-9.
50. Health Care Financing Administration. Selected performance information on hospitals providing care to Medicare beneficiaries. Fed Reg 1987;52:30741-5.
53. Deevey ES Jr. Of Doomsday and the lower Mississippi (letter). Science 1987;238:1215.
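The arithmetic behind the Doomsday equation quoted in the text (Population = 179/(2026.87 - time)^0.99, in billions) can be checked with a brief sketch. The function names `population` and `year_reached` are hypothetical, and 1989 is assumed as "the present" for the backward calculation. The final loop shifts only the doomsday date by the 0.3% error mentioned in the text; this is a simplified illustration of parameter sensitivity, not the confidence-interval calculation behind the text's figures.

```python
A = 179.0      # numerator constant (billions of people)
T0 = 2026.87   # doomsday date (years A.D.)
K = 0.99       # exponent in the denominator


def population(year, a=A, t0=T0, k=K):
    """Predicted world population, in billions, for a year before t0."""
    if year >= t0:
        raise ValueError("the model diverges at the doomsday date")
    return a / (t0 - year) ** k


def year_reached(pop_billions, a=A, t0=T0, k=K):
    """Invert the equation: the year at which the population equals pop_billions."""
    return t0 - (a / pop_billions) ** (1.0 / k)


if __name__ == "__main__":
    # Forward predictions discussed in the introduction.
    for year in (1989, 2020):
        print(year, round(population(year), 1), "billion")

    # Backward extrapolation: years before 1989 at which N = 1, 2, 3 people.
    for n in (1, 2, 3):
        years_ago = 1989 - year_reached(n * 1e-9)   # n people = n * 1e-9 billion
        print("N =", n, "about", round(years_ago / 1e9), "billion years ago")

    # Assumption: perturb only the doomsday date by +/- 0.3%.
    for shift in (-0.003, 0.0, 0.003):
        t0 = T0 * (1.0 + shift)
        print("t0 =", round(t0, 1), "-> 2020 population",
              round(population(2020, t0=t0), 1), "billion")
```

Because the denominator shrinks toward zero as the doomsday date approaches, a shift of a few years in that one parameter changes the 2020 prediction by more than an order of magnitude, which is the sense in which small parameter errors become large prediction errors near the edge of the data.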