
International Conference on Complex, Intelligent and Software Intensive Systems

Development of a Vital Sign Data Mining System for Chronic Patient Monitoring

Vincent S. Tseng (1), Lee-Cheng Chen (1), Chao-Hui Lee (1), Jin-Shang Wu (2), Yu-Chia Hsu (3)
(1) Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
(2) Dept. of Family Medicine, National Cheng Kung University Hospital, Taiwan, R.O.C.
(3) Southern Innovation Center, Institute for Information Industry, Taiwan, R.O.C.
Email: tsengsm@mail.ncku.edu.tw

Abstract

In recent years, the global population has been aging steadily. The development of medical care systems for chronic patients has therefore become important and meaningful, as people pay a lot of attention to preventive medicine. A medical care system has to provide timely alerts before a severe chronic illness occurs, such as stroke, diabetes, or heart disease, so that the necessary procedures can be taken quickly to save a life. In this paper, we present a data mining system for chronic patient monitoring, applied to the care of cardiovascular patients. By mining vital signs such as ECG, the system can make predictions with a classification tree and inform doctors to take action if an anomaly is likely to happen. A series of experiments on PAF data showed that our system can stably predict anomalies from patients' ECG data without the coding of medical rules required by other existing approaches.

Keywords: Data mining, Electrocardiogram analysis, Patient monitoring system, Vital sign analysis

1. Introduction

With the development of global civilization and industry, the global population keeps aging. This makes medical care for the elderly more important. If we can keep track of patients' vital signs, such as electrocardiogram (ECG), blood pressure, heart beat, and body temperature, it is very likely that we can apply appropriate care procedures to the patients in time. It is therefore valuable to develop medical care and monitoring techniques.

The main function of a medical monitoring system is to receive the vital signs of patients in order to monitor their status. During monitoring, it can provide alerts and proper medical decisions to paramedics based on the collected data. Figure 1 shows the workflow for patient vital sign monitoring.

[Figure 1 elements: Patient, Bio-monitoring System, Receive Equipment, Signal, Data Setting, Lab Sample, Lab Result, Diagnosis, Doctor, Nurse, Treatment, Life Support Equipment, Pharmacy.]
Figure 1. Workflow for patient vital sign monitoring.

Our goal is to develop a data mining system for analyzing the vital signs of chronic patients. As the application, we target illnesses related to ECG, since ECG has proved effective in clinical medicine for years. We sought doctors' professional opinions, since the signal signatures and diagnoses already had explicit guidelines.

Moreover, we propose a chronic illness prediction mechanism to explore the relation between bio-signals and changes in illness. The model-building procedure can therefore be applied to various chronic illnesses according to the corresponding patient bio-signals. Within one period of an ECG, we locate the P wave, QRS complex, and T wave using medical knowledge. Each waveform recurs with a regular rhythm, as shown in Figure 2.

[Figure 2 labels: P, Q, R, S, T.]
Figure 2. A rhythm map of ECG waveform.

The rest of this paper is organized as follows. We discuss related work in Section 2, and Section 3 gives a detailed description of the proposed vital sign monitoring system. Section 4 describes the experiments and performance evaluation. Finally, Section 5 gives the conclusion and future work.

0-7695-3109-1/08 $25.00 2008 IEEE 649


DOI 10.1109/CISIS.2008.140
2. Related Work

The traditional patient monitoring system was proposed by Puers [8] in 1989. Its structure comprises data receiving, transmission, paramedics, and functional components. The main task of this structure is to process patients' vital signs with equipment and send them to paramedics and an expert system to make diagnoses. Alonso et al. [1] [8] proposed a leading system, shown in Figure 3. Bio-signal time series were transformed by the Fourier transform and indexed by an R* tree. The system could provide a general model and rule set to the treatment unit.

[Figure 3 elements: DataBase; Step 1 to Step 4 (discrete Fourier transform, R* tree, reference model for popular groups, suggestions); RuleBase; Support, Action, Treatment.]
Figure 3. Structure of patient monitoring as proposed by Alonso et al.

There are several related studies on bio-signal analysis, such as ECG analysis and parsing. For instance, Wolff [4] used an unsupervised approach to diagnose via bio-information, and other work detects apnea from ECG [6] [7]. Furthermore, complete hardware equipment [5] has been applied to daily life care. These methods are basically traditional statistics. They achieve certain results, yet on very large data sets their precision and time cost are comparatively poor.

Zong et al. [11] pointed out that Paroxysmal Atrial Fibrillation (PAF) is related to the occurrence of Atrial Premature Contraction (APC), and that the closer an APC is to the onset of PAF, the greater its impact. For this kind of approach, some medical rules have to be set in advance. This might not be feasible in some applications, e.g., the analysis of new diseases. Therefore, in this paper we present a general data mining system for patient monitoring that does not require medical rules to be coded in advance.

3. Patient Monitoring System

3.1 Problem Definition

We assume that a patient's illnesses are often related to some of their bio-signals. By observing those bio-signals and adopting the corresponding medical attributes identified by professional doctors, we are able to build a classification tree for the target illness with data mining methods. Doctors can then take the suggestions of the system into account and diagnose more accurately.

3.2 System Flow

Figure 4 shows the flow of the patient monitoring system proposed in this study. The flow can be separated into several parts. First, the system captures the patient's raw bio-signals. For training data, a diagnosis such as normal or abnormal is also given. We then derive bio-indicators for the illness from the doctors' domain knowledge. This stage improves the reliability and precision of the rules. With these indicators, we transform the raw bio-signals into an attribute table and apply data mining techniques to generate a classification tree. Finally, this tree can be applied to clinical diagnosis. The next subsection describes the realization of the system.

[Figure 4 steps: raw bio-signal (i.e., ECG), with a diagnosis given in the training set; experts identify key indicators (e.g., heart rate, count of APC); extract values from the raw bio-signal; perform data mining on all attributes; generate classification rule; output.]
Figure 4. Flow of patient monitoring system.

3.3 The realization of patient monitoring system

Since raw bio-signal data are usually continuous time series, we identify medical reference points using tools provided by WFDB [12]. We then preprocess the signals into the bio-indicators recommended by doctors so that they are easier to process afterwards.

Because patients' bio-signals differ from one individual to another, we propose a mechanism to quantify and emphasize the attributes recommended by doctors, in order to simplify the construction of the classification model.

Figure 5 shows the idea of the extreme pattern statistics we propose. This method first normalizes the raw bio-signals and then treats each attribute as a distribution. We divide this distribution into 100 parts between the maximum and the minimum and set a threshold parameter, expressed as a percentage. We then extract that percentage of the data at each of the two ends of the distribution and calculate the average and standard deviation of their original values to fully record the characteristics of this attribute. We apply this procedure to every attribute recommended by doctors and record characteristics of the distribution, such as the number of anomalies and the average.

To simplify processing, we set a second parameter, the number of segments, which divides the whole bio-signal into segments so that the time series is easier to handle. Every segment contains the same attributes. Each segment is then given a weight that reflects medical theory, the doctors' recommendations, and the time impact of the attributes. For example, the doctors
recommended attributes $\{a_1, a_2, a_3, \dots, a_k\}$ for an illness and we divided a bio-signal into segments $\{S_1, S_2, S_3, \dots, S_{N_{\mathrm{seg}}}\}$, where $N_{\mathrm{seg}}$ denotes the number of segments. Every segment $S_n$ then contains the attributes $\{a_{n1}, a_{n2}, a_{n3}, \dots, a_{nk}\}$. We assume the weight of $S_n$ is $W_n$ and calculate the extended attributes $a'_j$ as shown below.

$$a'_j = \sum_{i=1}^{N_{\mathrm{seg}}} a_{i,j} \times W_i, \quad 1 \le j \le k$$

By applying the above procedure, we obtain the set of attributes $\{a_{xy} \mid 1 \le x \le N_{\mathrm{seg}},\ 1 \le y \le k\} \cup \{a'_j \mid 1 \le j \le k\}$. These data serve as training data for building the classification model. Table 1 shows the table of numerical attributes after preprocessing.

Table 1. Example data after preprocessing.
            Attribute A   Attribute B   ...   Attribute X   Diagnosis
Case 1      Value A1      Value B1      ...   Value X1      Normal
...
Case m                                                      Normal
Case m+1                                                    Abnormal
...
Case n                                                      Abnormal

To handle different data and illnesses more precisely, we adopt a multi-classifier method to build the classification model. The F-measure is used to evaluate all classifiers in this kernel, as shown below:

$$F\text{-measure} = \frac{2 \times \mathrm{recall} \times \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}} = \frac{2 \times TP}{2 \times TP + FP + FN}$$

The F-measure takes both recall and precision into consideration, which guards against a situation with high recall but many false positives. Based on known data, we choose the classifier with the highest F-measure among the multi-classifiers for the anomaly detection and diagnosis kernel, as shown in Figure 6. With the classification model built in this way, we feed the preprocessed raw bio-signals into the model and decide whether a case is normal or at risk.

[Figure 6 elements: Data Table; Classifier 1, Classifier 2, ..., Classifier N; Model 1, Model 2, ..., Model N; Classification Rule.]
Figure 6. Multi-classifiers evaluation.

[Figure 5 labels: horizontal axis Value, from Minima to Maxima; vertical axis Occurrence; an abnormal part covering the threshold percentage at each end; a normal part covering the remaining 100 minus twice the threshold percent.]
Figure 5. Diagram of extreme pattern statistics.

4. Experimental Evaluation

4.1 Experiment Dataset

The bio-signals used in the following experiments were obtained from PhysioNet [15]. From this source, we chose the PAF Database [13] for our experiment samples, which are ECG signals 30 minutes long.

The dataset includes 100 training instances, 50 of which are from normal patients, and 100 test instances, as shown in Table 2. This study mainly targets patients with PAF. Based on suggestions from a medical expert, we divided each ECG into 3 segments, that is, $N_{\mathrm{seg}} = 3$.

Table 2. PAF dataset.
                                                             Train sets   Test sets
Normal patients                                              50           44
PAF patients, non-immediate risk
  (no PAF in the 45 minutes before or after)                 25           28
PAF patients, PAF 30 minutes later                           25           28

A classification result is shown in Table 3. The main indicators for evaluating the system are precision and recall, supplemented by false positive (FP), defined as follows:

$$\mathrm{precision} = \frac{X + W}{X + Y + Z + W}, \qquad \mathrm{recall} = \frac{W}{Z + W}, \qquad FP = \frac{Y}{Y + W}$$

Note that precision as defined here is the fraction of all cases that are classified correctly, a quantity that is often called accuracy.

Table 3. Diagram of a classification result.
                      Prediction
Class                 Normal   Abnormal
Normal                X        Y
Abnormal              Z        W

4.2 Experimental Evaluation

4.2.1 Extreme Pattern Statistics

We tested different settings of the threshold parameter in extreme pattern statistics for classification. The training sets were used for an inner test. Figure 7 shows the experimental results. When the threshold was 95 or 90, the precision was higher than with the other settings. In addition, the recall at 90 was better than that at 95. Thus, we set the threshold to 90 for the remaining experiments.

4.2.2 Evaluation of Multi-classifiers
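As background for this subsection, the F-measure-based selection described in Section 3.3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `Counts`, `f_measure`, and `pick_best` are hypothetical, and the counts in `example` are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts, treating the abnormal class as positive."""
    tp: int  # abnormal cases predicted abnormal
    fp: int  # normal cases predicted abnormal
    fn: int  # abnormal cases predicted normal


def f_measure(c: Counts) -> float:
    """F-measure = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def pick_best(results: dict[str, Counts]) -> str:
    """Return the name of the classifier with the highest F-measure."""
    return max(results, key=lambda name: f_measure(results[name]))


# Hypothetical counts for illustration only; not taken from the paper.
example = {
    "NaiveBayes": Counts(tp=17, fp=9, fn=8),
    "J48": Counts(tp=8, fp=3, fn=17),
}
best = pick_best(example)
```

Under these assumed counts, the classifier with the larger F-measure is kept for the diagnosis kernel; a classifier that finds few abnormal cases scores low even if it raises few false alarms.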
We used different kinds of classifiers provided by WEKA [3]; the evaluation results are shown in Table 4. NaiveBayes achieved a recall of 68%, the highest in the table. In addition, we chose the J48 classifier for the following experiments, since it generates a rule-based classifier rapidly. Based on the above, we adopted NaiveBayes and J48 to build the multi-classifiers of the system.

Table 4. Multi-classifiers experiment.
                          recall   false positive
NaiveBayesSimple          68%      67%
NaiveBayes                68%      69%
libSVM                    61%      76%
Radial Basis Function     57%      56%
Random-ForestTree         39%      21%
J48                       32%      26%
1-NN Classification       29%      26%
NaiveBayes-Multi          29%      36%
LogitRegression Tree      25%      36%
Classification Tree       21%      19%
Probit-Regression Tree    21%      35%
ADTree                    18%      25%
RandomForest              7%       4%

[Figure 7 axes: horizontal axis, threshold setting (40%, 50%, 60%, 70%, 80%, 90%, 95%); vertical axis, 0% to 100%; series: Precision, Recall.]
Figure 7. Extreme pattern statistics.

4.2.3 Effects of number of Attributes

Following the medical experts' recommendations, we experimented with two sets of attributes, shown below (avg abbreviates average and stdev abbreviates standard deviation):

Set A
  avg, stdev of P-wave length difference
  avg, stdev of P-wave amplitude difference
  avg, stdev of R-wave length difference
  avg, stdev of T-wave length difference
  avg, stdev of T-wave amplitude difference
  count, avg, stdev of time lag between T and P-wave
  count, avg, stdev of time lag between P and Q-wave
  weighted attributes of the above

Set B
  count of P-wave length difference
  count of P-wave amplitude difference
  count of R-wave length difference
  count of T-wave length difference
  count of T-wave amplitude difference
  weighted attributes of the above

Set B enhanced the attributes of Set A and included weighted attributes as well. The result is shown in Table 5. We compared the two attribute sets mainly with the NaiveBayes classifier. Set B had better recall than Set A and reduced false positives as well.

Table 5. Comparison of number of attributes.
         NaiveBayes   NaiveBayes       J48
         recall       false positive   recall
Set A    67%          69%              32%
Set B    75%          55%              28%

4.2.4 Effects of Weighting

In the previous experiments, we introduced a weighting mechanism on 3 segments. Following the referenced studies and the doctors' recommendation, we gave more weight to the segments nearer the end of the data. We tested the following weighting methods and values:

  linear weight <1,2,3>
  square weight <1,2,4>
  square weight <1,3,9>
  square weight <1,5,25>

We used 5-fold validation to test each of the above weight policies repeatedly; the results are shown in Table 6. In terms of recall, linear weight <1,2,3> was lower than square weight <1,2,4> and <1,3,9>, yet square weight <1,3,9> and <1,5,25> were similar to square weight <1,2,4>. In terms of precision, all weighting methods performed similarly.

In short, square weight <1,2,4> was preferred, since we emphasize recall.

Table 6. Comparison of weight experiment.
            precision   recall
<1,2,3>     56.32%      63.14%
<1,2,4>     54.64%      68.97%
<1,3,9>     56.32%      69.00%
<1,5,25>    57.76%      57.42%

4.2.5 Effects of the Length of Bio-signals

So far we have experimented with 30-minute ECG records; in this subsection we discuss different lengths of bio-signals. We extracted the first 5 minutes, the first 15 minutes, and the whole 30 minutes of the ECG data and ran the experiment shown in Table 7. In terms of recall, the 30-minute ECG data performed more stably than any other sample length. However, the sample of 5
minutes ECG could achieve 61.48% precision with a relatively low 46% recall. We infer that normal patterns are more obvious over a short time, whereas abnormal patterns need longer observation. The length of the bio-signals had little impact on precision in this experiment, but longer bio-signals clearly benefit the diagnosis of illnesses.

Table 7. Effects of bio-signals length.
              precision   recall
5 minutes     61.48%      46%
15 minutes    54.52%      59.71%
30 minutes    54.64%      68.97%

4.3 General Experimental Result

The experiments above showed how the parameters and data types were adjusted. For PAF, the complete settings and experimental results are as follows:
1. We took the 30-minute ECG records in the PAF database and set the number of segments to 3, dividing the data into 3 segments.
2. Following the medical experts' recommendation in Section 4.2.3, we extracted each attribute for every segment.
3. We transformed the data using extreme pattern statistics with the threshold set to 90 to obtain an attribute table such as Table 1.
4. We extended the attributes in the attribute table with the weight <1,2,4>.
5. Finally, the multi-classifiers read the preprocessed attribute table to build the PAF classification model. We evaluated the J48 and NaiveBayes classifiers via F-measure and chose the better one for the classification kernel.
6. We tested performance by 5-fold validation.

The experiment showed that this system achieves 54% precision and 68.97% recall. In the PhysioNet 2001 Challenge Event 2 (PAF Prediction) [14], recall ranged between 54% and 79%. The system in this research achieves a stable recall of 68.97%.

5. Conclusion and Future Work

We have presented a data mining system for chronic patient monitoring, applied to the care of cardiovascular patients. This system can aid the diagnosis of general and chronic illnesses. Based on the designed architecture, we completed an analysis and prediction system for chronic illnesses via ECG. The system performs analysis and prediction to assist care and diagnosis in ECG-related clinical medicine. The main characteristics of the system are as follows:

1. A foundation based on light medical knowledge: This system is not a fully independent automatic system. We still require experts to list the features of the bio-signals. This mechanism prevents irrelevant information from affecting the system and ensures the medical basis of the results. However, only light medical knowledge, instead of precise medical rules, is needed in our system.
2. Feature extraction in periodic data: This system adopts segmentation and extreme pattern statistics. Segmentation prevents relatively infrequent patterns from being missed during data mining and emphasizes the signal signature within a period of time. Extreme pattern statistics capture the changes of attributes in each segment and contribute greatly to illness data analysis and mining.
3. Adaptive structure to data signatures: Our structure considers medical factors across various bio-signals. The operation of this system is therefore not tied to a single kind of signal. The embedded multi-classifiers improve flexibility with respect to data requirements.
4. Stable analysis results: Based on experiments with real data, we observed that the system achieves a certain level of precision and recall in illness prediction. On the public PAF test data, the system maintained stable sensitivity, and the result was not far from medically accepted performance. This shows that the system has stable results in bio-signal analysis and illness prediction.

Our research completed the system design and requirements for chronic patient bio-signal data mining techniques and validated their feasibility on real data. So far we have applied the system only to illnesses related to ECG. For future work, we point out some possible directions:

1. The reduction of parameters: Besides the doctors' recommendations, this system still has to set three parameters. We would like to reduce the number of parameters or provide a parameter-free system.
2. The application of bio-signals: This system considers only ECG for now. We hope to combine other time series, such as blood pressure and breathing, to enhance the analysis and its feasibility for more kinds of illness.
3. The choice of multi-classifiers: This system uses a measuring indicator to select only the better classifier. We would like to combine the results of the multi-classifiers to achieve higher performance.

Acknowledgement

This research was supported by the Applied Information Services Development & Integration project of the Institute for Information Industry and sponsored by MOEA, Taiwan, R.O.C. under grant no. IA95H01311D01.

References
[1] F. Alonso, J. P. Caraca-Valente, L. Martinez, C. Montes, Discovering similar patterns for characterizing time series in a medical domain, Proceedings IEEE International Conference on Data Mining, 2001, pp. 577-579.
[2] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian Network Classifiers, Machine Learning, vol. 29, 1997, pp. 131-163.
[3] E. Frank, M. A. Hall, G. Holmes, R. Kirkby, B. Pfahringer, I. H. Witten, Weka - a machine learning workbench for data mining, The Data Mining and Knowledge Discovery Handbook, Springer, 2005, pp. 1305-1314.
[4] J. G. Wolff, Medical diagnosis as pattern recognition in a framework of information compression by multiple alignment, unification and search, Elsevier Decision Support Systems, 2005.
[5] R. Jafari, F. Dabiri, P. Brisk, M. Sarrafzadeh, Reconfigurable Fabric Vest for Fatal Heart Disease Prevention, Journal of Embedded Computing (JEC), 2004.
[6] J. N. McNames, A. M. Fraser, Obstructive Sleep Apnea Classification Based on Spectrogram Patterns in the Electrocardiogram, Proceedings Computers in Cardiology, 2000.
[7] T. Penzel, J. McNames, A. Murray, P. de Chazal, G. Moody, B. Raymond, Systematic comparison of different algorithms for apnoea detection based on electrocardiogram recordings, Med Biol Eng Comput, vol. 40, 2002, pp. 402-407.
[8] B. Puers, W. Sansen, K.U. Leuven, Patient monitoring systems, VLSI and Microelectronic Applications in Intelligent Peripherals and their Interconnection Networks, 1989, pp. 3/152-3/157.
[9] R. Watrous, G. Towell, A Patient Adaptive Neural Network ECG Patient Monitoring Algorithm, Proceedings Computers in Cardiology, 1995.
[10] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Morgan Kaufmann, 2005.
[11] W. Zong, R. Mukkamala, R. G. Mark, A Methodology for Predicting Paroxysmal Atrial Fibrillation Based on ECG Arrhythmia Feature Analysis, Proceedings Computers in Cardiology, 2001.
[12] WFDB Software Package, http://www.physionet.org/physiotools/wfdb.shtml
[13] PAF Prediction Challenge Database, http://www.physionet.org/physiobank/database/afpdb
[14] Computers in Cardiology Challenge 2001 Top Scores, http://www.physionet.org/challenge/2001/top-scores.shtml
[15] PhysioNet, http://www.physionet.org
