
BMJ 2012;344:e3186 doi: 10.1136/bmj.e3186 (Published 24 May 2012)

EDITORIALS
Comparing risk prediction models
Should be routine when deriving a new model for the same purpose
Gary S Collins senior medical statistician¹, Karel G M Moons professor of clinical epidemiology²

¹Centre for Statistics in Medicine, Wolfson College Annexe, University of Oxford, Oxford OX2 6UD, UK; ²Julius Centre for Health Sciences and Primary Care, UMC Utrecht, 3508 GA Utrecht, Netherlands

Correspondence to: G S Collins gary.collins@csm.ox.ac.uk
Risk prediction models have great potential to support clinical decision making and are increasingly incorporated into clinical guidelines.1 Many prediction models have been developed for cardiovascular disease (the Framingham risk score, SCORE, QRISK, and the Reynolds risk score, to mention just a few). With so many prediction models for similar outcomes or target populations, clinicians have to decide which model should be used on their patients. To make this decision they need to know, as a minimum, how well the score predicts disease in people outside the populations used to develop the model (what is its external validity?) and which model performs best.2

In a linked research study (doi:10.1136/bmj.e3318), Siontis and colleagues examined the comparative performance of several prespecified cardiovascular risk prediction models for the general population.3 They identified 20 published studies that compared two or more models and they highlighted problems in design, analysis, and reporting. What can be inferred from the findings of this well conducted systematic review?

Firstly, direct comparisons are few. A plea for more direct
comparisons is increasingly heard in the field of therapeutic
intervention and diagnostic research and may be echoed in that
of prediction model validation studies. Many more prediction
models have been developed than have been validated in
independent datasets. Moreover, few models developed for
similar outcomes and target populations are directly validated
and compared.2 The authors of the current study retrieved
various validation studies, but only 20 studies evaluated more
than one model and most of those compared just two models.
Thus, readers still need to judge from indirect comparisons
which of the available models provide the best predictors in
different situations. It would be much more informative if
investigators who have (large) datasets available were to validate
and compare all existing models together. And it would be even
better if they first conducted and reported a systematic review
of existing models before validating them in their dataset. Fair
comparison requires that if an existing model seems to be
miscalibrated for the data at hand, attempts should be made to
adjust or recalibrate the model.4 5 For example, a prediction
model developed in one country or population does not
necessarily provide accurate predictions elsewhere. Ideally, attempts should be made to examine pre-existing prediction models in the new target setting and if necessary recalibrate or further update the model and check its performance before developing yet another model.4
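
The recalibration step lends itself to a brief illustration. The sketch below is purely illustrative and assumes a hypothetical existing logistic risk model with invented coefficients applied to simulated data from a new population; none of it comes from the editorial or the linked study. It compares the model's mean predicted risk with the observed event rate and then re-estimates a calibration intercept and slope while keeping the original predictors and their weights fixed.

```python
# Illustrative recalibration of a hypothetical existing logistic risk model
# in a new target population (all coefficients and data are invented).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Existing" model: logit(risk) = intercept + X @ beta
beta = np.array([0.04, 0.7, 0.5])      # invented weights for age, smoking, diabetes
intercept = -6.0

# Simulated new population whose baseline risk differs from the development setting
n = 5000
X = np.column_stack([
    rng.normal(55, 10, n),             # age
    rng.binomial(1, 0.25, n),          # current smoker
    rng.binomial(1, 0.10, n),          # diabetes
])
true_lp = -5.0 + X @ beta              # higher baseline risk than the model assumes
y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))

# Step 1: apply the existing model unchanged and compare predicted with observed risk
lp_old = intercept + X @ beta
p_old = 1 / (1 + np.exp(-lp_old))
print(f"mean predicted risk {p_old.mean():.3f} v observed event rate {y.mean():.3f}")

# Step 2: logistic recalibration - regress the outcome on the original linear
# predictor; a well calibrated model gives intercept near 0 and slope near 1
recal = LogisticRegression(C=1e6, max_iter=1000).fit(lp_old.reshape(-1, 1), y)
cal_int, cal_slope = recal.intercept_[0], recal.coef_[0, 0]
print(f"calibration intercept {cal_int:.2f}, calibration slope {cal_slope:.2f}")

# Step 3: recalibrated predictions for the new setting
p_new = 1 / (1 + np.exp(-(cal_int + cal_slope * lp_old)))
```

In this toy example the coefficients transfer (slope close to 1) but the baseline risk does not (intercept well away from 0), so updating the intercept alone may suffice; more extensive miscalibration would call for refitting or further updating.4 5
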
Secondly, as Siontis and colleagues concluded, studies that
suggest one model is better than another often have potential
biases and methodological shortcomings. Authors who develop
a new risk prediction model using their data and then compare
it with an existing model often report better performance for
the new model. Prediction models tend to perform better on the
dataset from which they were developed and usually, if not
always, perform better than existing models when validated on
that dataset. This is simply because the model is tuned to the
dataset at hand, which is why a model's performance should be
evaluated in other datasets, preferably by independent
investigators. However, some form of reporting bias must play
a role here,6 because a newly developed prediction model that
performed worse than an existing one would probably not be
submitted or published. Greater emphasis should therefore be
placed on methodologically sound and appropriately detailed
external validation studies, ideally of multiple models at once,
to show which model is most useful.7
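
This optimism is easy to reproduce. The sketch below is purely illustrative (simulated data, arbitrary coefficients, and default scikit-learn settings, none of which come from the editorial): a model fitted to a small development dataset containing many noise predictors shows a noticeably higher C statistic on its own data than on a large independent validation sample.

```python
# Illustrative simulation of optimism: apparent v external discrimination
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def simulate(n, n_noise=18):
    """Two informative predictors plus noise; the outcome follows a logistic model."""
    x_signal = rng.normal(size=(n, 2))
    x_noise = rng.normal(size=(n, n_noise))
    lp = -1.0 + 0.8 * x_signal[:, 0] + 0.6 * x_signal[:, 1]
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return np.hstack([x_signal, x_noise]), y

X_dev, y_dev = simulate(150)       # small development dataset
X_val, y_val = simulate(20000)     # large independent validation dataset

new_model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

auc_apparent = roc_auc_score(y_dev, new_model.predict_proba(X_dev)[:, 1])
auc_external = roc_auc_score(y_val, new_model.predict_proba(X_val)[:, 1])
print(f"apparent C statistic {auc_apparent:.2f}")   # flattering
print(f"external C statistic {auc_external:.2f}")   # closer to the truth
```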

Thirdly, the Framingham risk score may often require recalibrating when used as a comparator. In many of the studies examined by Siontis and colleagues a new model was compared against the Framingham risk score. Although the Framingham risk score, developed in the United States during the 1970s, has stood the test of time, it has been shown to be miscalibrated in several other settings.8 It is not surprising that without recalibration comparisons against it will often favour the new model, especially if the validation dataset covers specific subpopulations that were not covered in the original Framingham study.

Fourthly, Siontis and colleagues' review supports the findings of existing systematic reviews of prediction models.9 The conduct and reporting of prediction models have been criticised as poor, and key details needed to evaluate the model objectively are often omitted. In the absence of reporting guidelines for such studies, Siontis and colleagues have provided suggestions for conducting and reporting comparative studies, which if adhered to will make the task of appraising these studies easier. Guidelines for studies reporting the development and validation of prediction models are being developed.10

Finally, there is a lack of consistency between studies that
compare prediction models because different statistical measures
are used to describe the performance of the models. Statistical
properties such as discrimination and calibration are widely
recommended characteristics to evaluate; yet calibration is rarely
examined. As important as the statistical characteristics of the
model are, they do not ensure its clinical usefulness. There
should therefore be more emphasis on demonstrating net benefit,
for example,11 or, preferably, on conducting a randomised trial
to evaluate the model's ability to change clinicians' decision
making and patient outcomes.7 12
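
For concreteness, the sketch below shows how these three kinds of measure might be computed for one model's predicted risks on a validation dataset. The function, the simulated data, and the 10% decision threshold are illustrative assumptions, not values taken from the editorial.

```python
# Illustrative computation of discrimination, calibration, and net benefit
# for one model's predicted risks on a validation dataset (simulated here).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate(y, p, threshold=0.10):
    """Summarise predicted risks p (strictly between 0 and 1) against binary outcomes y."""
    # Discrimination: probability that a random case receives a higher
    # predicted risk than a random non-case (C statistic / area under the ROC curve)
    c_statistic = roc_auc_score(y, p)

    # Calibration: regress the outcome on the log odds of the predictions;
    # a well calibrated model has intercept near 0 and slope near 1
    log_odds = np.log(p / (1 - p)).reshape(-1, 1)
    fit = LogisticRegression(C=1e6, max_iter=1000).fit(log_odds, y)
    cal_intercept, cal_slope = fit.intercept_[0], fit.coef_[0, 0]

    # Net benefit at the chosen risk threshold (as in decision curve analysis):
    # true positives minus false positives weighted by the threshold odds
    treat = p >= threshold
    n = len(y)
    true_pos = np.sum(treat & (y == 1)) / n
    false_pos = np.sum(treat & (y == 0)) / n
    net_benefit = true_pos - false_pos * threshold / (1 - threshold)

    return c_statistic, cal_intercept, cal_slope, net_benefit

# Example on simulated data (the 10% treatment threshold is an arbitrary choice)
rng = np.random.default_rng(2)
lp = -2.2 + rng.normal(size=3000)
y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
p = 1 / (1 + np.exp(-lp))
print(evaluate(y, p))
```

Reporting all three types of measure for every model compared on the same dataset, rather than a single discrimination statistic, would make such comparisons far easier to interpret.
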
Journal editors and peer reviewers should be more critical of
methodological shortcomings in prediction model studies, and
they should work towards improved reporting, calling for studies
to describe a fair validation and to compare two or preferably
more risk prediction models simultaneously.
Competing interests: All authors have completed the ICMJE uniform
disclosure form at www.icmje.org/coi_disclosure.pdf (available on
request from the corresponding author) and declare: no support from
any organisation for the submitted work; no financial relationships with
any organisations that might have an interest in the submitted work in
the previous three years; no other relationships or activities that could
appear to have influenced the submitted work.


Provenance and peer review: Commissioned; not externally peer reviewed.

1 National Institute for Health and Clinical Excellence. Lipid modification: cardiovascular risk assessment and the modification of blood lipids for the primary and secondary prevention of cardiovascular disease. 2008. CG67. http://guidance.nice.org.uk/CG67.
2 Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. BMJ 2009;338:b605.
3 Siontis GCM, Tzoulaki I, Siontis KC, Ioannidis JPA. Comparisons of established risk prediction models for cardiovascular disease: systematic review. BMJ 2012;344:e3318.
4 Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol 2008;61:1085-94.
5 Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Stat Med 2004;23:2567-86.
6 Rifai N, Altman DG, Bossuyt PM. Reporting bias in diagnostic and prognostic studies: time for action. Clin Chem 2008;54:1101-3.
7 Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ 2009;338:b606.
8 Brindle P, Beswick A, Fahey T, Ebrahim S. Accuracy and impact of risk assessment in the primary prevention of cardiovascular disease: a systematic review. Heart 2006;92:1752-9.
9 Collins GS, Mallett S, Omar O, Yu LM. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med 2011;9:103.
10 Collins GS. Opening up multivariable prediction models: consensus-based guidelines for transparent reporting. BMJ Blogs 2011. http://blogs.bmj.com/bmj/2011/08/03/gary-collinsopening-up-multivariable-prediction-models/.
11 Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-74.
12 Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012;98:691-8.

Cite this as: BMJ 2012;344:e3186


© BMJ Publishing Group Ltd 2012

