
Brief Communication

Reporting on Statistical Methods To Adjust for Confounding: A Cross-Sectional Survey
Marcus Mullner, MD; Hugh Matthews, BSc, MBBS; and Douglas G. Altman, DSc

Background: The use of complex statistical models to adjust for confounding is common in medical research.

Objective: To determine the frequency and adequacy of adjustment for confounding in medical articles.

Design: Cross-sectional survey.

Setting: 34 scientific medical journals with a high impact factor.

Measurements: Frequency of reporting on methods used to adjust for confounding in 537 original research articles published in January 1998.

Results: Of the 537 articles, 169 specified that adjustment for confounding was used. In 1 paper in 10, it was unclear which statistical method was used or for which variables adjustment was made. In 45% of papers, it was not clear how multicategory or continuous variables were treated in the analysis. Inadequate reporting was less frequent if an author was affiliated with a department of statistics, epidemiology, or public health and if articles were published in journals with a high impact factor.

Conclusions: Details of methods used to adjust for confounding are frequently not reported in original research articles.

Ann Intern Med. 2002;136:122-126. www.annals.org
For author affiliations, current addresses, and contributions, see end of text.

Discovering the determinants of disease is often not straightforward, particularly if the disease or the risk factor is rare or not easily recognized (1). In case-control, cohort, and other nonrandomized studies, the groups being compared are likely to vary with respect to several demographic, clinical, and other characteristics. In randomized trials, authors often try to demonstrate that the observed treatment effect is not explained by some difference in baseline characteristics (2). The definition of confounding is that there are alternative explanations for an observed association between a risk factor and a health outcome. This may occur when one or more of these demographic or clinical characteristics are associated with one another and with the outcome of interest (3). Social class, for example, is known to be associated with cardiovascular risk factors, such as smoking, serum cholesterol level, and leisure physical activity, as well as with mortality. This difference in demographic and clinical characteristics may account for about half of the excess coronary and all-cause mortality in blue-collar workers compared with white-collar workers (4).

If confounding cannot be avoided at the design stage of a study, disentangling the web of causation is often difficult, and more or less complex statistical methods are needed. Readers of published scientific articles need to know whether and how the authors appropriately adjusted for confounding (5).

In this cross-sectional study, we sought to determine how frequently adjustment is reported in medical scientific articles and whether reporting is sufficiently detailed.

METHODS

Journal Selection
Scientific medical journals were included if they were published in English, were available in the British Medical Association's library, and had an impact factor (6) that placed them in the highest 20% of journals within their medical specialty. We excluded review journals and journals specializing in statistics, epidemiology, and public health.

Article Selection
Two of the authors independently assessed all January 1998 issues of the selected journals to identify full-length original research articles. Short reports, scientific letters, case reports, review articles, and animal studies were excluded.

Data Collection
We assessed whether adjustment for baseline variables or confounding was performed and whether the paper specified which method was used, the variables for which adjustment was made, and how these variables
122 15 January 2002 Annals of Internal Medicine Volume 136 Number 2 www.annals.org

Figure. Articles assessed for reporting of adjustment for confounding or baseline differences and whether adjustment
was reported in the methods or results section.

were handled in the analysis (that is, whether it was specified how continuous and multicategory variables were entered into the statistical model). Inappropriate reporting was defined as not reporting or insufficiently reporting on one or more of these points. For simplicity, we assumed that all multiple regression analyses were done to adjust for confounders, even though in some cases they were done to identify prognostic variables. Each paper was independently assessed by two of the authors. Agreement between the two authors was good to very good (κ = 0.61 to 0.96). Disagreement was usually caused by oversight rather than differing opinions. In case of disagreement or uncertainty, the third author (a senior statistician) was consulted.

Statistical Analysis
The univariate association between inappropriate reporting and several variables (for example, impact factor [quartiles]; at least one of the authors being affiliated with a department of statistics, epidemiology, or public health; and number of authors [1 or 2, 3 to 5, 6 to 10, or >10]) was assessed by using chi-square tests or chi-square tests for trend. We used multiple logistic regression to investigate simultaneously variables that showed an association with inappropriate reporting. Data were processed by using Excel 97 software (Microsoft Corp., Redmond, Washington) and Stata, release 6 (Stata Corp., College Station, Texas).

RESULTS
Thirty-four journals met our inclusion criteria (Appendix). The median impact factor of these journals in 1996 was 4.26 (interquartile range, 2.64 to 5.74). We identified 537 articles that fulfilled our inclusion criteria (Figure), of which 169 (32%) reported adjustment for confounding or baseline differences.

Reporting of Methods
Of the 169 articles, 152 (90%) appropriately reported methods to adjust for confounding (Figure), 7 reported no method, and 10 mentioned but did not

Table. Association between Inadequate Reporting of Adjustment for Confounders and Study Characteristics in 169 Original Research Articles

Characteristic                                    Studies with Inadequate   Relative Risk       P Value*
                                                  Reporting, n/n (%)        (95% CI)
At least one author affiliated with department
of statistics, epidemiology, or public health                                                   0.001
  No                                              59/105 (56)               1.00 (referent)
  Yes                                             17/64 (27)                0.47 (0.30–0.73)
Number of authors                                                                               0.2
  1–2                                             4/8 (50)                  1.00 (referent)
  3–5                                             28/64 (44)                0.88 (0.42–1.85)
  6–10                                            37/83 (45)                0.89 (0.43–1.86)
  >10                                             7/14 (50)                 1.00 (0.42–2.38)
Impact factor (quartiles)                                                                       0.008
  ≤3.34                                           25/45 (56)                1.00 (referent)
  3.35–4.67                                       22/47 (47)                0.84 (0.56–1.26)
  4.68–9.28                                       21/40 (53)                0.95 (0.64–1.40)
  >9.28                                           8/37 (22)                 0.39 (0.20–0.76)

* Chi-square test for comparing two proportions or test for trend.

adequately specify a method. In these 10 articles, the authors used the phrases "multiple regression" or "multivariable analysis."

Of articles that specified the method, multiple logistic regression analysis was the most frequently used (n = 76), followed by multiple Cox proportional hazards models (n = 43); multiple linear regression (n = 33); and other methods (n = 31), including stratified analysis, partial correlation, direct and indirect standardization, mixed-effect modeling, and age-adjusted z-score. Some papers included more than one method of adjusted analysis, but each paper was counted only once in the following analyses.

Reporting of Variables
Of the 169 articles that reported adjustment for confounding, 154 (91%) clearly specified variables for which adjustment was made, and 93 (55%) clearly stated how variables were handled (Figure). Only 93 articles (55%) met criteria for appropriate reporting: 51 articles had one inadequacy, 17 articles had two inadequacies, and 8 articles had three inadequacies.

Examples of Inappropriate Reporting

Example 1
A randomized, controlled trial investigated whether surfactant administered to preterm infants reduces the incidence of severe complications. The authors provided great detail to demonstrate that treatment and control groups were comparable at baseline. In a table, the authors report "baseline adjusted" incidence of pulmonary complications. However, the reader is not told why adjustment was needed to present a main end point, how this adjustment was performed, or which of the 14 or more baseline variables were included (7).

Example 2
A randomized, controlled trial compared two antibiotics for the treatment of gonorrhea. The authors stated that "After correction for baseline abnormalities there was no significant difference in laboratory abnormalities." However, they did not indicate which baseline abnormalities were meant and how this correction was performed (8).

Example 3
A randomized, controlled trial compared balsalazide with mesalamine in patients with acute ulcerative colitis. The authors stated that "Logistic regression techniques were used to identify prognostic factors significantly associated with remission." The reader is told which variables were finally significantly associated with the outcome but not which variables were entered into the model (9).

Example 4
Breathing patterns and respiratory muscle performance measures during weaning were examined in a

case series of 17 patients receiving prolonged mechanical ventilation. In the methods section, the authors state that "adjusted means were calculated for the variables presenting a group effect," but the reader is not told how or where this calculation was performed (10).

Determinants of Inappropriate Reporting
In 64 of the 169 articles (38%), at least one author was a methodologist (that is, he or she was affiliated with a department of statistics, epidemiology, or public health). Among papers with a methodologist-author, the rate of any inappropriate reporting was about half that of papers without a methodologist-author (Table). The rate of inappropriate reporting tended to decrease as the journal impact factor increased, but this effect was largely due to a lower rate of inappropriate reporting in the journals in the highest quartile of impact factor. The number of authors was not associated with inadequate reporting.

A multiple logistic regression model with any inappropriate reporting as the dependent variable and methodologist-author and impact factor (quartiles) as predictor variables showed that these two effects were largely independent (results not shown). Among the journals in the lower three quartiles of impact factor, 12 of 42 (29%) articles with a methodologist-author and 56 of 90 (62%) without a methodologist-author had inappropriate reporting. In contrast, among the journals in the top quartile of impact factor, the rate of inappropriate reporting was similar among articles with and without a methodologist-author (3 of 15 [20%] articles vs. 5 of 22 [23%] articles, respectively). However, because the number of papers was small and this split by impact factor was not prespecified, P values are not presented.

DISCUSSION
Statistical methods are often misused, and poorly presenting them leaves the reader unable to critically interpret the findings of an original research study (11, 12). Some studies use a selection procedure to reduce the number of variables for which adjustment is needed to those that are statistically significant. In such cases, authors should report all variables considered in addition to those for which adjustment was actually made.

Not reporting whether variables are treated as continuous or as categorical data may make a difference in the interpretation of the results. Transformation to fulfill the assumption of a normal distribution may or may not have been performed (13). Likewise, for categorical variables, the definition and number of categories are needed. Assuming a linear association when it is nonlinear may mask an association (14), having too few and broad categories may lead to considerable residual confounding, and having too many categories may reduce the power of a model (15, 16).

The reasons for these shortcomings may be manifold. It is possible that all the necessary information was present before peer review and was omitted during the publication process. However, errors and omissions are more likely the consequence of a system failure at many levels (17), including that of the authors, reviewers, statisticians, and editors (18).

Although data analysis may be correct despite inappropriate reporting, such reporting leaves readers unable to assess whether the data were processed appropriately. Having a methodologist as an author seems to have a protective effect, which is in accordance with the findings of an earlier study (15). Why articles published in journals with a very high impact factor have a lower rate of inappropriate reporting remains a matter of speculation. It may relate to the fact that these journals more frequently use statistical reviewers than do lower-ranking journals (19).

We suggest that readers, authors, referees, and editors try to assess whether original articles state which statistical method was used to adjust for confounders, for which variables adjustment was performed, and the way in which the variables were handled in the analysis.

APPENDIX
The following journals were included in the study: American Journal of Cardiology, American Journal of Medicine, American Journal of Obstetrics and Gynecology, Anaesthesia, Anesthesiology, Annals of Internal Medicine, Archives of Dermatology, Archives of Internal Medicine, BMJ, Blood, Brain, British Journal of Anaesthesia, British Journal of Cancer, British Journal of Dermatology, British Journal of Obstetrics and Gynaecology, British Journal of Surgery, Circulation, Critical Care Medicine, Gastroenterology, Gut, JAMA, Journal of the American College of Cardiology, Journal of Gerontology, Journal of the National Cancer Institute, Journal of Pediatrics, Journal of Rheumatology, Kidney International, The Lancet, New England Journal of Medicine, Neurology, Pediatrics, Thorax, Thrombosis and Haemostasis, and Transplantation.
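
Worked check of the Table. The relative risks and 95% confidence intervals in the Table can be recomputed from the reported counts. The sketch below is illustrative only: the article does not state how its intervals were calculated, so the log-scale (Wald) interval used here is an assumption, and the function name `relative_risk` is hypothetical.

```python
import math


def relative_risk(events_group, n_group, events_ref, n_ref, z=1.96):
    """Relative risk of a group versus the referent group.

    Returns (rr, lower, upper), where the interval is a Wald interval
    computed on the log scale (an assumed method, not confirmed by the
    article).
    """
    risk_group = events_group / n_group
    risk_ref = events_ref / n_ref
    rr = risk_group / risk_ref
    # Standard error of log(RR) for two independent proportions
    se_log_rr = math.sqrt(
        1 / events_group - 1 / n_group + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts from the Table: inadequate reporting in 17 of 64 articles with a
# methodologist-author versus 59 of 105 articles without (referent).
rr, lower, upper = relative_risk(17, 64, 59, 105)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
# prints: RR = 0.47 (95% CI, 0.30 to 0.73)
```

Under this assumption the result agrees with the Table entry for methodologist-authors. The same call with the counts for the top impact factor quartile against the lowest quartile (8 of 37 vs. 25 of 45) gives 0.39 (0.20 to 0.76), which also agrees with the Table.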

From BMJ, London, and ICRF Medical Statistics Group, Oxford, United Kingdom.

Acknowledgments: The authors thank the BMJ staff, particularly Richard Smith, for providing the environment that enabled this research project.

Requests for Single Reprints: Marcus Mullner, MD, Universitatsklinik fur Notfallmedizin, Allgemeines Krankenhaus Wien, Wahringer Gurtel 18-20/6D, A-1090 Vienna, Austria; e-mail, marcus.muellner@univie.ac.at.

Current Author Addresses: Dr. Mullner: Universitatsklinik fur Notfallmedizin, Allgemeines Krankenhaus Wien, Wahringer Gurtel 18-20/6D, A-1090 Vienna, Austria.
Mr. Matthews: Sandbanks, Graveney, Faversham, Kent ME13 9DJ, United Kingdom.
Dr. Altman: ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Old Road, Headington, Oxford OX3 7LF, United Kingdom.

Author Contributions: Conception and design: M. Mullner, H. Matthews, D.G. Altman.
Analysis and interpretation of the data: M. Mullner, D.G. Altman.
Drafting of the article: M. Mullner, H. Matthews, D.G. Altman.
Critical revision of the article for important intellectual content: M. Mullner, H. Matthews, D.G. Altman.
Final approval of the article: M. Mullner, H. Matthews, D.G. Altman.
Statistical expertise: M. Mullner, D.G. Altman.
Administrative, technical, or logistic support: M. Mullner.
Collection and assembly of data: M. Mullner, H. Matthews.

References
1. Hill AB. The environment and disease: association or causation? Journal of the Royal Society of Medicine. 1965;58:295-300.
2. Altman DG. Adjustment for covariate imbalance. In: Armitage P, Colton T, eds. Encyclopaedia of Biostatistics. New York: Wiley; 1998:1000-5.
3. Hennekens CH, Buring JE, Mayrent SH. Epidemiology in Medicine. Boston: Little, Brown; 1987.
4. Pekkanen J, Tuomilehto J, Uutela A, Vartiainen E, Nissinen A. Social class, health behaviour, and mortality among men and women in eastern Finland. BMJ. 1995;311:589-93. [PMID: 7663252]
5. Uniform requirements for manuscripts submitted to biomedical journals. International Committee of Medical Journal Editors. Ann Intern Med. 1997;126:34-47. [PMID: 8992922]
6. Garfield E. How can impact factors be improved? BMJ. 1996;313:411-3. [PMID: 8761234]
7. Lotze A, Mitchell BR, Bulas DI, Zola EM, Shalwitz RA, Gunkel JH. Multicenter study of surfactant (beractant) use in the treatment of term infants with severe respiratory failure. Survanta in Term Infants Study Group. J Pediatr. 1998;132:40-7. [PMID: 9469998]
8. Jones RB, Schwebke J, Thorpe EM Jr, Dalu ZA, Leone P, Johnson RB. Randomized trial of trovafloxacin and ofloxacin for single-dose therapy of gonorrhea. Trovafloxacin Gonorrhea Study Group. Am J Med. 1998;104:28-32. [PMID: 9528716]
9. Green JR, Lobo AJ, Holdsworth CD, Leicester RJ, Gibson JA, Kerr GD, et al. Balsalazide is more effective and better tolerated than mesalamine in the treatment of acute ulcerative colitis. The Abacus Investigator Group. Gastroenterology. 1998;114:15-22. [PMID: 9428213]
10. Capdevila X, Perrigault PF, Ramonatxo M, Roustan JP, Peray P, d'Athis F, et al. Changes in breathing pattern and respiratory muscle performance parameters during difficult weaning. Crit Care Med. 1998;26:79-87. [PMID: 9428547]
11. Bender R, Grouven U. Logistic regression models used in medical research are poorly presented [Letter]. BMJ. 1996;313:628. [PMID: 8806274]
12. Khan KS, Chien PF, Dwarakanath LS. Logistic regression models in obstetrics and gynecology literature. Obstet Gynecol. 1999;93:1014-20. [PMID: 10362173]
13. Bland JM, Altman DG. Transforming data. BMJ. 1996;312:770. [PMID: 8605469]
14. Katz MH. Multivariable Analysis: A Practical Guide for Clinicians. New York: Cambridge Univ Pr; 1999.
15. Altman DG, De Stavola BL, Love SB, Stepniewska KA. Review of survival analyses published in cancer journals. Br J Cancer. 1995;72:511-8. [PMID: 7640241]
16. Brenner H. A potential pitfall in control of covariates in epidemiologic studies. Epidemiology. 1998;9:68-71. [PMID: 9430271]
17. Reason J. Human error: models and management. BMJ. 2000;320:768-70. [PMID: 10720363]
18. Godlee F, Gale CR, Martyn CN. Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: a randomized controlled trial. JAMA. 1998;280:237-40. [PMID: 9676667]
19. Goodman SN, Altman DG, George SL. Statistical reviewing policies of medical journals: caveat lector? J Gen Intern Med. 1998;13:753-6. [PMID: 9824521]

© 2002 American College of Physicians–American Society of Internal Medicine

