Vincent S. Tseng (1), Lee-Cheng Chen (1), Chao-Hui Lee (1), Jin-Shang Wu (2), Yu-Chia Hsu (3)
(1) Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
(2) Dept. of Family Medicine, National Cheng Kung University Hospital, Taiwan, R.O.C.
(3) Southern Innovation Center, Institute for Information Industry, Taiwan, R.O.C.
Email: tsengsm@mail.ncku.edu.tw
There are several related studies on bio-signal analysis, such as ECG analysis and parsing. For instance, Wolff [4] used an unsupervised approach to diagnose via bio-information, and other work detected apnea from ECG [6] [7]. Furthermore, complete hardware equipment [5] has been applied to daily life care. The above methods were basically traditional statistical ones. They could achieve certain results, yet on very large data sets they performed relatively poorly in precision and time cost.

W. Zong et al. [11] pointed out that Paroxysmal Atrial Fibrillation (PAF) is related to the occurrence of Atrial Premature Contraction (APC), and that the closer in time to the PAF episode, the greater the impact of APC. This kind of approach requires some medical rules to be set in advance, which might not be feasible in some applications, e.g., the analysis of new diseases. Therefore, in this paper we presented a general data mining system for patient monitoring that does not require coding medical rules in advance.

3. Patient Monitoring System

3.1 Problem Definition

We assumed that patients' illnesses are often related to certain of their bio-signals. By observing those bio-signals and adopting the corresponding medical attributes identified by professional doctors, we are able to build a classification tree for the target illness with data mining methods. Doctors could then take the suggestions of the system into account and diagnose more accurately.

3.2 System Flow

[Figure 4. Flow of patient monitoring system. Recoverable labels: Generate classification rule, Output.]

3.3 The realization of patient monitoring system

Since raw bio-signal data are usually continuous time series, we identified medical reference points with the tools provided by WFDB [12]. We then preprocessed the signals into the bio-indicators recommended by doctors so that they are easier to read in later steps.

Because patients' bio-signals differ between individuals, we proposed a mechanism to quantify and emphasize the attributes that doctors recommended, in order to simplify the construction of the classification model.

Figure 5 shows the idea of the extreme pattern statistics we proposed. This method first normalized the raw bio-signals and then treated each attribute as a distribution. We divided this distribution into 100 parts between the maximum and minimum and set a threshold parameter. We then extracted the corresponding percentage of data at both ends of the distribution and calculated the average and standard deviation of their original values to fully record the characteristics of the attribute. We applied this procedure to every attribute that doctors recommended and recorded the characteristics of the distribution, such as the number of anomalies and their average.

We set a second parameter to simplify processing: it divides the whole bio-signal into segments so that the time series is easier to handle. Each segment contains the same attributes. A weight is then assigned to each segment after considering its importance in medical theory, the doctors' recommendation and the time impact of the attributes. For example, suppose the doctors recommended attributes $\{a_1, a_2, a_3, \ldots, a_k\}$ for an illness and we divided a bio-signal into segments $\{S_1, S_2, S_3, \ldots\}$. Every segment $S_n$ then contains attributes $\{a_{n1}, a_{n2}, a_{n3}, \ldots, a_{nk}\}$. We assume the weight of $S_n$ is $W_n$, and we calculated the extended attributes $a_j$ as shown below.

we divided ECG into 3 segments, that is, the segment parameter was set to 3.

[Figure residue. Recoverable labels: Data Table, Classifier 1, Classifier 2, Model 1, Model 2, Classification Rule.]
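The extreme pattern statistics and the segmentation step of Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `split_segments`, `extreme_pattern_stats` and `threshold_percent` are hypothetical, min-max normalization is assumed, and a value is treated as "extreme" when its normalized position lies at or above the threshold (upper end) or at or below 100 minus the threshold (lower end). The source text does not pin down these details.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def split_segments(values: Sequence[float], n_segments: int) -> List[List[float]]:
    """Split a series into n_segments contiguous parts of near-equal length."""
    size, extra = divmod(len(values), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments take one additional sample each.
        end = start + size + (1 if i < extra else 0)
        segments.append(list(values[start:end]))
        start = end
    return segments


def _summary(xs: List[float]) -> Dict[str, float]:
    """Count, mean and population standard deviation of the original values."""
    if not xs:
        return {"count": 0, "mean": 0.0, "std": 0.0}
    return {"count": len(xs), "mean": mean(xs), "std": pstdev(xs)}


def extreme_pattern_stats(
    values: Sequence[float], threshold_percent: float
) -> Dict[str, Dict[str, float]]:
    """Summarize the values lying at both ends of an attribute's distribution.

    ASSUMPTION: positions are min-max normalized to [0, 1]; the upper end is
    position >= threshold/100 and the lower end is position <= 1 - threshold/100.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    cut = threshold_percent / 100.0
    upper, lower = [], []
    for v in values:
        # A constant series has no extremes, so place it in the middle.
        pos = (v - lo) / span if span else 0.5
        if pos >= cut:
            upper.append(v)
        if pos <= 1.0 - cut:
            lower.append(v)
    return {"upper": _summary(upper), "lower": _summary(lower)}
```

Under these assumptions, one attribute series would first be split with `split_segments(series, 3)` and each segment passed to `extreme_pattern_stats(segment, 90)`, giving a count, mean and standard deviation for each end of each segment.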
We used different kinds of classifiers provided by WEKA [3]; the evaluation results are shown in Table 4. NaiveBayes achieved the highest recall, 68%. In addition, we chose the J48 classifier for the following experiments since it generates a rule-based classifier rapidly. Based on the above, we adopted NaiveBayes and J48 to build the multi-classifiers of the system.

Table 4. Multi-classifiers experiment.
Classifier               recall   false positive
NaiveBayesSimple         68%      67%
NaiveBayes               68%      69%
libSVM                   61%      76%
Radial Basis Function    57%      56%
Random-ForestTree        39%      21%

[Figure 7. Extreme pattern statistics. Series: Precision, Recall; y-axis from 0% to 100%; x-axis from 40% to 95%.]

4.2.3 Effects of number of Attributes

square weight <1,5,25>

We chose 5-fold validation to repeatedly test the above weight policies; the results are shown in Table 6. In terms of recall, linear weight <1,2,3> was lower than square weights <1,2,4> and <1,3,9>, while square weights <1,3,9> and <1,5,25> were similar to square weight <1,2,4>. In terms of precision, all weight policies performed the same. In short, square weight <1,2,4> was the better choice since we emphasized recall.
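The weight policies compared in this experiment can be generated mechanically. The sketch below is illustrative only: the function names are hypothetical, the "square" policies are read as successive powers of a base (which reproduces <1,2,4>, <1,3,9> and <1,5,25>), and the weighted sum used to combine per-segment attributes is an assumption, since the paper's formula for the extended attributes is not reproduced in this text.

```python
from typing import List, Sequence


def linear_weights(n_segments: int) -> List[int]:
    """Linear policy: 1, 2, 3, ... (gives <1,2,3> for three segments)."""
    return [i + 1 for i in range(n_segments)]


def geometric_weights(n_segments: int, base: int) -> List[int]:
    """Powers of `base`: 1, base, base**2, ... (gives <1,2,4> for base 2)."""
    return [base ** i for i in range(n_segments)]


def extend_attributes(
    segment_attrs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-segment attribute vectors into one vector.

    ASSUMPTION: a weighted sum over segments, a_j = sum_n W_n * a_nj.
    The source may define the extended attributes differently.
    """
    if len(segment_attrs) != len(weights):
        raise ValueError("need exactly one weight per segment")
    n_attrs = len(segment_attrs[0])
    return [
        sum(w * seg[j] for seg, w in zip(segment_attrs, weights))
        for j in range(n_attrs)
    ]
```

With three segments, `geometric_weights(3, 2)` gives the <1,2,4> policy that the experiment favoured, so the last segment contributes four times as much as the first under the assumed weighted sum.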
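The experiments in this section report precision and recall under 5-fold validation. As a minimal sketch of how such figures are obtained, the code below computes precision, recall and a balanced F-measure from predicted labels and builds 5-fold index splits. The function names are hypothetical, the F-measure is assumed to be the balanced (F1) form, and the folds are assigned round-robin without shuffling; the paper itself relies on WEKA [3] for these steps.

```python
from typing import List, Sequence, Tuple


def precision_recall(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (ASSUMPTION: balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Build k (train, test) index pairs; sample i goes to fold i % k."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

A classifier would be trained on each `train` index list and scored on the matching `test` list, with the per-fold precision and recall averaged; two classifiers can then be compared by `f_measure`.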
5 minutes of ECG achieved 61.48% precision but a relatively low 46% recall. We inferred that normal patterns are more evident over a short time, while abnormal patterns need a longer observation. The length of the bio-signals did not have much impact on precision in this experiment, but longer bio-signals clearly benefit the diagnosis of illnesses.

Table 7. Effects of bio-signals length.
Length        precision   recall
5 minutes     61.48%      46%
15 minutes    54.52%      59.71%
30 minutes    54.64%      68.97%

4.3 General Experimental Result

The experiments above showed how the parameters and data types were adjusted. For PAF, the complete settings and experimental results are as follows:
1. We took 30 minutes of ECG from the PAF data and set the segment parameter to 3 to divide the data into 3 segments.
2. Following the medical experts' recommendation in section 4.2.3, we extracted each attribute for every segment.
3. We transformed the data with extreme pattern statistics, setting the threshold parameter to 90, to get an attribute table such as Table 1.
4. We extended the attributes in the attribute table with the weight <1,2,4>.
5. Finally, the multi-classifiers read the preprocessed attribute table to build the PAF classification model. We evaluated the J48 and NaiveBayes classifiers via F-measure and chose the better one as the classification kernel.
6. We tested performance by 5-fold validation.
The experiment showed that this system could achieve 54% precision and 68.97% recall. In the PhysioNet 2001 Challenge Event 2 (PAF Prediction) [14], recall ranged between 54% and 79%. The system of this research achieves a stable recall of 68.97%.

5. Conclusion and Future Work

We have presented a data mining system for chronic patient monitoring, with applications in the care of cardiovascular patients. With this system, we could aid the diagnosis of general and chronic illnesses. Based on the designed architecture, we completed an analysis and prediction system for chronic illnesses via ECG. This system can perform analysis and prediction to assist clinical care and diagnosis related to ECG. The main characteristics of the system are as follows:
1. A foundation based on light medical knowledge: This system was not a fully independent automatic system. We still required experts to list the features of bio-signals. This mechanism keeps irrelevant information from affecting the system and ensures the medical basis of the results. However, only light medical knowledge, instead of precise medical rules, is needed in our system.
2. Feature extraction in periodic data: This system adopted segmentation and extreme pattern statistics. Segmentation keeps relatively infrequent patterns from being missed in data mining and emphasizes the signal signature within each period of time. Extreme pattern statistics capture the changes of attributes in each segment and contribute greatly to illness data analysis and mining.
3. Adaptive structure to data signatures: Our structure considers medical factors across various bio-signals. Therefore, the operation of this system is not limited to a single kind of signal. The embedded multi-classifiers improve flexibility with respect to data requirements.
4. Stable analysis results: Based on experiments with real data, we observed that the system achieves a certain precision and recall in illness prediction. For the public PAF test data, the system maintained stable sensitivity, and the result was not far from the level required for medical approval. This shows that the system had stable results in bio-signal analysis and illness prediction.

Our research completed the system design and requirements for chronic patient bio-signal data mining techniques and validated their feasibility with real data. Currently we have applied the system only to illnesses related to ECG. For future work, we point out some possible directions here:
1. The reduction of parameters: Besides the doctors' recommendation, this system still had to set three parameters. We would like to reduce the number of parameters or provide a parameter-free system.
2. The application of bio-signals: This system takes only ECG into consideration for now. We hope to combine other time series, such as blood pressure and breathing, to enhance the analysis and its feasibility for more kinds of illness.
3. The choice of multi-classifiers: This system adopted a measuring indicator to use only the better classifier. We would like to combine the results of the multi-classifiers to achieve higher performance.

Acknowledgement

This research was supported by the Applied Information Services Development & Integration project of Institute for Information Industry and sponsored by MOEA, Taiwan, R.O.C. under grant no. IA95H01311D01.

References

[1] F. Alonso, J. P. Caraca-Valente, L. Martinez, C. Montes, Discovering similar patterns for characterizing time series in a medical domain, Proceedings IEEE International Conference on Data Mining, 2001, pp. 577-579.
[2] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian Network Classifiers, Machine Learning, vol. 29, 1997, pp. 131-163.
[3] E. Frank, M. A. Hall, G. Holmes, R. Kirkby, B. Pfahringer, I. H. Witten, Weka - a machine learning workbench for data mining, The Data Mining and Knowledge Discovery Handbook, Springer, 2005, pp. 1305-1314.
[4] J. G. Wolff, Medical diagnosis as pattern recognition in a framework of information compression by multiple alignment, unification and search, Elsevier Decision Support Systems, 2005.
[5] R. Jafari, F. Dabiri, P. Brisk, M. Sarrafzadeh, Reconfigurable Fabric Vest for Fatal Heart Disease Prevention, Journal of Embedded Computing (JEC), 2004.
[6] J. N. McNames, A. M. Fraser, Obstructive Sleep Apnea Classification Based on Spectrogram Patterns in the Electrocardiogram, Proceedings Computers in Cardiology, 2000.
[7] T. Penzel, J. McNames, A. Murray, P. de Chazal, G. Moody, B. Raymond, Systematic comparison of different algorithms for apnoea detection based on electrocardiogram recordings, Med Biol Eng Comput, vol. 40, 2002, pp. 402-407.
[8] B. Puers, W. Sansen, K.U. Leuven, Patient monitoring systems, VLSI and Microelectronic Applications in Intelligent Peripherals and their Interconnection Networks, 1989, pp. 3/152-3/157.
[9] R. Watrous, G. Towell, A Patient Adaptive Neural Network ECG Patient Monitoring Algorithm, Proceedings Computers in Cardiology, 1995.
[10] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Morgan Kaufmann, 2005.
[11] W. Zong, R. Mukkamala, R. G. Mark, A Methodology for Predicting Paroxysmal Atrial Fibrillation Based on ECG Arrhythmia Feature Analysis, Proceedings Computers in Cardiology, 2001.
[12] WFDB Software Package, http://www.physionet.org/physiotools/wfdb.shtml
[13] PAF Prediction Challenge Database, http://www.physionet.org/physiobank/database/afpdb
[14] Computers in Cardiology Challenge 2001 Top Scores, http://www.physionet.org/challenge/2001/top-scores.shtml
[15] PhysioNet, http://www.physionet.org