Jeremy Warner, MD, MS (1,2); Amin Zollanvari (3); Peijin Zhang (4); Graham Snyder (5); Gil Alterovitz (5)
(1) Vanderbilt University, Nashville, TN; (2) Beth Israel Deaconess Medical Center, Boston, MA; (3) Texas A&M University, College Station, TX; (4) MIT PRIMES; (5) Harvard Medical School
Abstract
Electronic medical records (EMRs) enable large-scale phenome-based analysis, which can reveal important biologic characteristics of large populations. In pathologic disease states, the character of the phenome is expected to change over time; analysis of such temporal evolution may create new insights into disease processes [Figure 1]. We developed a novel method for disease phenome visualization, over time, in a comprehensive inpatient EMR database of more than 20,000 adult patients [Table 1]. Analysis of the resultant temporal phenome map [Figure 2] led to the recognition of the presence of several serious hospital-acquired complications, including Clostridium difficile infection (HA-CDI) and venous thromboembolism (HA-VTE). Further phenotypic definition of these serious complications [Table 2, Figure 3] allowed for the development of Bayesian classifiers which could predict them in advance, using commonly available demographics, laboratory results, and medication ordering information [Figure 4]. The trained Bayesian network classifiers could predict either complication 24 hours in advance with good performance (AUC > 0.80). Translating these findings into clinical care focused on the detection and prevention of such complications could alleviate considerable morbidity and mortality and also yield significant cost savings, on the order of $2.6-$4.4 billion annually in the United States alone.
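As a reading aid for the AUC figure quoted above: AUC can be interpreted as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it by direct pairwise comparison. The function name `auc` and the toy scores are illustrative assumptions, not data or code from this study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has
    the higher score; ties count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Synthetic predicted risks (hypothetical, for illustration only).
toy_cases = [0.9, 0.8, 0.6, 0.4]
toy_controls = [0.7, 0.5, 0.3, 0.2]
print(auc(toy_cases, toy_controls))  # 13 of 16 pairs favor the case: 0.8125
```

The pairwise form is quadratic in the number of patients, which is fine for a sketch; rank-based formulas give the same value more efficiently on large cohorts.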
Figure 1. Example schema of a learning healthcare system. This example demonstrates the ideal flow of a learning healthcare system environment, which begins with data analysis and visualization. Based on interpretation of this data, potential problems are recognized and hypotheses are generated. These lead to development of interventions to mitigate or improve the identified problems, which are then implemented and evaluated in an iterative fashion.
Figure 3. Outcomes for HA-CDI and HA-VTE cases as compared to randomly selected controls. A) Hospitalization duration is significantly longer for HA-CDI cases; B) 30-day post-discharge mortality is slightly worse for HA-CDI cases compared to controls. C) Hospitalization duration is significantly longer for HA-VTE cases; D) 30-day post-discharge mortality is significantly worse for HA-VTE cases compared to controls.
Table 1. Baseline demographics of the MIMIC-II version 6 dataset and specific subgroups.

| Characteristics | Total Adult Hospitalizations (n=28,061) | HA-CDI Cases (n=362) | HA-CDI Controls (n=362) | HA-VTE Cases (n=580) | HA-VTE Controls (n=580) |
| --- | --- | --- | --- | --- | --- |
| Age, median (IQR), y | 65 (51-77) | 68 (22-99) | 65 (21-95) | 65 (17-102) | 63 (16-103) |
| Men, No. (%) | 15,781 (56) | 199 (55) | 201 (56) | 330 (57) | 332 (57) |
| Race/Ethnicity, No. (%) | | | | | |
| Asian | 586 (2) | 5 (1) | 13 (4) | 19 (3) | 11 (2) |
| Black | 2,362 (8) | 27 (7) | 27 (7) | 57 (10) | 45 (8) |
| Hispanic | 825 (3) | 7 (2) | 10 (3) | 19 (3) | 12 (2) |
| White | 19,704 (70) | 282 (78) | 257 (71) | 422 (73) | 442 (76) |
| Other/unknown | 4,584 (16) | 41 (11) | 55 (15) | 63 (11) | 70 (12) |
| Elixhauser comorbidity index, median (IQR) | 2 (1-4) | 2 (1-3) | 2 (1-3) | 2 (1-4) | 2 (1-3.5) |
| Length of stay, median (IQR), d | 7 (4-14) | 20 (13-33) | 9 (6-14) | 17 (11-27) | 8 (3-13) |
Figure 4. Bayesian network classifiers. A) The HA-CDI classifier comprises 19 laboratory measurements and 20 medications (including the aggregated high-risk antibacterials category). B) The HA-VTE classifier comprises 20 laboratory measurements and 26 medications. ALT: alanine transaminase; BUN: blood urea nitrogen; HGB: hemoglobin; IH: inhaled; MAX: maximum value measured over data collection period; MCHC: mean corpuscular hemoglobin concentration; MIN: minimum value measured over data collection period; MULTI: more than two bioequivalent routes; OU: ocular; PLT: platelets; PTT: partial thromboplastin time; RBC: red blood cell count; RDW: red cell distribution width; SP GRAV: specific gravity; TP: topical; WBC: white blood cell count.
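The MIN and MAX abbreviations in the Figure 4 legend describe per-laboratory summaries over a data collection period. A minimal sketch of how such features might be derived follows. It assumes, by analogy with the medication window described for cases, that the collection period for a case ends 24 hours before diagnosis; the function name `min_max_features`, the tuple layout, and the sample values are hypothetical and do not come from the study.

```python
from datetime import datetime, timedelta


def min_max_features(measurements, diagnosis_time, lead=timedelta(hours=24)):
    """Summarize each lab as MIN and MAX over the collection period.

    measurements: iterable of (lab_name, taken_at, value) tuples.
    Only values taken at or before diagnosis_time - lead are used
    (assumed cutoff, so the features are available 24 hours in advance).
    """
    cutoff = diagnosis_time - lead
    ranges = {}
    for lab, taken_at, value in measurements:
        if taken_at > cutoff:
            continue  # too close to diagnosis; outside the assumed window
        low, high = ranges.get(lab, (value, value))
        ranges[lab] = (min(low, value), max(high, value))

    features = {}
    for lab, (low, high) in ranges.items():
        features[f"{lab} MIN"] = low
        features[f"{lab} MAX"] = high
    return features


# Synthetic example: the last WBC value falls inside the final 24 hours
# before diagnosis and is therefore excluded.
sample = [
    ("WBC", datetime(2020, 1, 1, 8), 9.5),
    ("WBC", datetime(2020, 1, 2, 8), 14.2),
    ("WBC", datetime(2020, 1, 3, 20), 21.0),
    ("PLT", datetime(2020, 1, 1, 8), 250.0),
]
print(min_max_features(sample, diagnosis_time=datetime(2020, 1, 4, 8)))
```

Keeping the cutoff as a parameter makes it easy to apply a different window if the lab collection period differs from the one assumed here.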
Conclusions
Temporal patterns of risk of disease, as defined by ICD-9-CM codes, can be quantified and visualized. Bayesian network classifiers can predict serious hospital-acquired complications with good accuracy. This approach could enable learning healthcare systems.
Table 2. Exposures and outcomes of HA-CDI and HA-VTE cases as compared to matched controls.

| Exposure or Outcome | Cases | Controls | P-Value | Hazard or Odds Ratio (95% CI) |
| --- | --- | --- | --- | --- |
| HA-CDI (cases n=362, controls n=362) | | | | |
| Low-risk antibacterial exposure, No. (%)^a,b | 249 (69) | 168 (46) | <.001 | 2.54 (1.86-3.49)^c |
| High-risk antibacterial exposure, No. (%)^a,b | 240 (66) | 124 (34) | <.001 | 3.77 (2.74-5.20)^c |
| PPI or H2-blocker exposure, No. (%)^a,b | 297 (82) | 265 (73) | .006 | 1.67 (1.16-2.43)^c |
| Length of stay, median (IQR), d | 20 (13-33) | | <.001 | 0.34 (0.30-0.41)^d |
| 30-day post-discharge mortality, No. (%) | 80 (22) | | .05 | 1.48 (1.00-2.17)^d |
| HA-VTE (cases n=580, controls n=580) | | | | |
| Pharmacologic VTE prophylaxis, No. (%)^b,e | 270 (47) | | <.001 | 2.12 (1.65-2.72)^c |
| Length of stay, median (IQR), d | 17 (11-27) | | <.001 | 0.41 (0.36-0.46)^d |
| 30-day post-discharge mortality, No. (%) | 134 (23) | | .04 | 1.35 (1.02-1.77)^d |

a Medication POE data collected up to 24 hours before diagnosis for cases, and for the first 48 hours of healthcare exposure, for controls.
b Aggregate medication categories are defined in Table S6.
c Odds ratio
d Hazard ratio
e Medication POE data collected up to 24 hours before diagnosis for cases, and for the first 24 hours of healthcare exposure, for controls.

Figure 2. Temporal phenome-wide association of ICD-9-CM codes that are more likely (increased risk) in the lengthier hospitalization subgroup, as a function of time. Odds ratios of significant codes are shown, with the upper 95% CI shown as a light blue halo. Each chapter of the ICD-9-CM coding schema is shown in a separate color, with V- and E- codes shown in purple and gray, on the right. Median duration of hospitalization and IQRs, for the entire MIMIC-II database, are shown as horizontal lines.
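To make the odds ratios reported in Table 2 concrete, the sketch below recomputes one from the exposure counts given there (low-risk antibacterial exposure: 249 of 362 HA-CDI cases versus 168 of 362 controls). The interval shown is a Wald interval on the log odds ratio, which is an assumption made for illustration: the poster does not state which interval method was used, so the bounds need not match the reported ones exactly. The function name `odds_ratio_wald` is hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Sample odds ratio with a Wald confidence interval (assumed method)."""
    a = exposed_cases                      # exposed cases
    b = total_cases - exposed_cases        # unexposed cases
    c = exposed_controls                   # exposed controls
    d = total_controls - exposed_controls  # unexposed controls

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the low-risk antibacterial exposure row of Table 2.
estimate, lower, upper = odds_ratio_wald(249, 362, 168, 362)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

The point estimate rounds to 2.54, in line with the value in the table; the Wald bounds come out close to, but not identical with, the reported interval, which suggests a different interval method was used there.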