
International Journal of Research in Computer and Communication Technology, Vol 2, Issue 9, September -2013

ISSN(Online) 2278-5841 ISSN (Print) 2320-5156

Classification Of Heart Disease Using Svm And ANN


Deepti Vadicherla (1), Sheetal Sonawane (2)

(1) Department of Computer Engineering, Pune Institute of Computer Technology, University of Pune, Pune, India. deepti.vadicherla@gmail.com
(2) Department of Computer Engineering, Pune Institute of Computer Technology, University of Pune, Pune, India. sssonawane@pict.edu

Abstract: Classification of heart disease can be useful to physicians if it is computerized, for the purposes of fast diagnosis and accurate results. Accurately predicting the existence of heart disease can save a patient's life. The objective of this paper is to analyze the application of AI tools to the classification and prediction of heart disease. The work classifies heart disease using a Support Vector Machine and an Artificial Neural Network, and compares the two methods on the basis of accuracy and training time. The paper presents a medical decision support system for heart disease classification that is rational, objective, accurate and fast. The dataset used is the Cleveland Heart Database taken from the UCI machine learning dataset repository. In the proposed model we classify the data into two classes using the SMO algorithm in the Support Vector Machine and an Artificial Neural Network (ANN).

Keywords: Support Vector Machine, Sequential Minimal Optimization, Optimization problem, Heart disease, Artificial Neural Network.

I. INTRODUCTION

At present, the number of people suffering from heart disease is rising. Accurate diagnosis at an early stage, followed by proper treatment, can result in significant life saving. Data released by the National Heart, Lung, and Blood Institute (NHLBI) show that women in older age groups in particular are at greater risk of heart disease. Heart disease can be controlled effectively if it is diagnosed at an early stage [24], but accurate diagnosis is not easy because of the many complicated factors involved in heart diseases.
For example, many clinical symptoms are associated with human organs other than the heart, and heart diseases often exhibit various syndromes. Due to this complexity, there is a need to automate the process of medical diagnosis, which can help medical practitioners in the diagnostic process [1], [2]. To reduce diagnosis time and improve diagnosis accuracy, it has become pressing to develop reliable and powerful medical decision support systems to support the diagnostic decision process. Because medical diagnosis is fundamentally a complicated process, the approach taken here is to build an intelligent system based on techniques such as the Support Vector Machine and the Artificial Neural Network [4], [5], which have shown great potential for the design and implementation of decision support systems for heart disease. The system uses features extracted from the ECG data of the patients. The experiments have been performed on the Cleveland Heart Database, taken from the UCI machine learning dataset repository, which was donated by Detrano [23]. Results obtained from the support vector machine model are satisfactory. This paper presents a medical decision support system for heart disease classification; in the proposed model we classify the data into two classes using the Support Vector Machine and the Artificial Neural Network [21], [22]. The rest of the paper is organized as follows. Section 2 describes the support vector machine. Section 3 covers the artificial neural network and explains its functioning. Section 4 presents the proposed model of the MDSS and related work. Section 5 reports experiments and results. Section 6 concludes, and Section 7 discusses future work.

II. SUPPORT VECTOR MACHINE

The Support Vector Machine (SVM) is a promising learning method based on the statistical learning theory developed by Vladimir Vapnik. SVMs are used for the classification of both linear and nonlinear data [6], [7]. An SVM performs classification by constructing an optimal separating hyperplane in a higher-dimensional space, with the help of support vectors and margins, which separates the data into two categories (or classes). With an appropriate nonlinear mapping, the original training data is mapped into a higher dimension, in which the data from the two classes can always be separated by a hyperplane [8].

The relevant mathematics is as follows. Let S represent the medical decision support system, which provides classification by two methods, SVM and ANN:

S = {SVM, ANN}

Suppose f is the function for the support vector machine, f: I → O, where I is the domain (set of inputs):

I = {D, E}
D = {X, Y}
X = {x_i | 1 ≤ i ≤ n}
Y = {y_i | 1 ≤ i ≤ n}

i.e. D = {(x_i, y_i)} ⊆ X × Y, E (set of constants) = {C, ε}, and O is the codomain (set of outputs), O = {op_i | 1 ≤ i ≤ n}.

The support vector machine computes a linear classifier of the form

f(x) = W·x + b

where W is the weight vector, x is the input vector and b is the bias. The separating hyperplane is the plane f(x) = 0.


Therefore any point from one class lies above the separating hyperplane and satisfies f(x) > 0; in the same way, any point from the other class lies below it and satisfies f(x) < 0. These relations can be rescaled so that the linearly separable set D meets the inequality

y_i f(x_i) ≥ 1, for all i.

Here the margin is m = 1 / ||w||, so maximizing the margin can be written as the optimization problem

min_{w,b} (1/2) ||w||²  subject to  y_i (w·x_i + b) ≥ 1, for all i.

This optimization problem can be solved through its dual, using Lagrange multipliers α_i ≥ 0:

min_α Ψ(α) = min_α (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} y_i y_j α_i α_j (x_i·x_j) − Σ_{i=1}^{N} α_i.

The output of a non-linear SVM is explicitly computed from the Lagrange multipliers [17], [9]:

u = Σ_{j=1}^{N} y_j α_j K(x_j, x) − b,

where K is a kernel function. We used the Radial Basis Function (RBF) kernel [10], which is given by

K(x_i, x_j) = exp(−γ ||x_i − x_j||²), γ > 0.

The non-linearity alters the quadratic form, but the dual objective function is still quadratic in α:

min_α Ψ(α) = min_α (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} y_i y_j K(x_i, x_j) α_i α_j − Σ_{i=1}^{N} α_i,

subject to 0 ≤ α_i ≤ C for all i, and Σ_{i=1}^{N} y_i α_i = 0.
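The RBF kernel and the resulting decision function can be written directly in NumPy. The following is a minimal illustrative sketch, not taken from the paper; γ = 0.5 and the sample points are arbitrary values chosen for demonstration:

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def svm_output(alphas, ys, xs, x, b, gamma=0.5):
    """Non-linear SVM output: u = sum_j y_j * alpha_j * K(x_j, x) - b."""
    return sum(a * y * rbf_kernel(xj, x, gamma)
               for a, y, xj in zip(alphas, ys, xs)) - b

x1, x2 = [1.0, 0.0], [0.0, 1.0]
print(rbf_kernel(x1, x1))                        # 1.0: kernel of a point with itself
print(rbf_kernel(x1, x2) == rbf_kernel(x2, x1))  # True: the kernel is symmetric
```

The sign of svm_output(...) then gives the predicted class, matching the f(x) > 0 / f(x) < 0 rule above.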

Sequential Minimal Optimization (SMO) [5] is an algorithm for efficiently solving the quadratic programming problem that arises while training a support vector machine. It does so by repeatedly finding two Lagrange multipliers that can be optimized with respect to each other: at every step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these multipliers, and updates the SVM to reflect the new values [15]. The functioning of SMO is given by the algorithm below.
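The joint optimization of one pair of multipliers has a closed form. The sketch below is illustrative only (the function name, constants and example numbers are not from the paper): it clips α₂ to its feasible box and then moves α₁ so that the equality constraint Σ_i y_i α_i = 0 is preserved.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One SMO joint optimization of a pair of Lagrange multipliers.

    E1, E2 are the current prediction errors f(x_i) - y_i; K11, K22, K12
    are kernel values. Returns the updated (a1, a2). This is a sketch of
    the standard analytical update, not the paper's exact code.
    """
    # Feasible bounds for a2 that keep 0 <= alpha <= C while holding
    # y1*a1 + y2*a2 constant.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12        # curvature along the constraint line
    if eta <= 0 or L == H:
        return a1, a2                  # skip degenerate pairs
    a2_new = a2 + y2 * (E1 - E2) / eta  # unconstrained optimum for a2
    a2_new = min(H, max(L, a2_new))     # clip to the feasible box
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keep sum y_i * alpha_i fixed
    return a1_new, a2_new

a1, a2 = smo_pair_update(0.2, 0.4, 1, -1, 0.5, -0.3, 1.0, 1.0, 0.2, 1.0)
# The linear constraint y1*a1 + y2*a2 is unchanged by the update.
print(1 * a1 + -1 * a2)
```

Repeating this update over heuristically chosen pairs, and recomputing the bias after each step, is exactly the loop the algorithm below describes.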


SMO training algorithm:


Step 1: Input C, the kernel, the kernel parameters, epsilon and the tolerance.
Step 2: Initialize α_i = 0 for all i, and b = 0.
Step 3: Let f(x) = b + Σ_j y_j α_j K(x_j, x).
Step 4: Find a Lagrange multiplier α_i that violates the KKT optimality conditions.
Step 5: Choose a second multiplier and optimize the pair. Repeat Steps 4 and 5 until convergence.
Step 6: Update α_1 and α_2 in one step: α_1 is changed to increase f(x_1), and α_2 is changed to decrease f(x_2).
Step 7: Compute the new bias b.

III. ARTIFICIAL NEURAL NETWORK

An artificial neural network (ANN) is a computational model based on the structure and functions of biological neural networks. Information flowing through the network affects the structure of the ANN, because a neural network changes, in a sense, based on its inputs and outputs. ANNs are nonlinear statistical data modeling tools with which complex relationships between inputs and outputs are modeled, or patterns are found. Training a neural network model essentially means selecting one model, from the set of allowed models, that minimizes the cost criterion. Numerous algorithms are available for training neural network models; most can be viewed as straightforward applications of optimization theory and statistical estimation. In the proposed model, the Resilient Backpropagation algorithm is used to train the ANN.

Classification algorithm of the ANN:

Step 1: Initialize the training class buffers and input data buffers; declare a BasicNetwork ANNnetwork in ENCOG.
Step 2: Extract the input data and update: d = input layers, out = output read, hidden layers = 2d − 1.
Step 3: Initialize the trainingSet with InputVal[] and OutputVal[].
Step 4: Use Resilient Propagation for training the neural network.
Step 5:
epoch = 1;
do {
    train the ANNnetwork;
    epoch++;
} while (training error > 0.01 && epoch < 25000);
Step 6: Compute the training accuracy.
Step 7: Use the trained ANNnetwork for testing.

Suppose g is the function for the artificial neural network, g: M → N, where M is the domain (set of inputs):

M = {A, B}
A = {X, Y}, i.e. A = {(x_i, y_i)} ⊆ X × Y
B = {w_ij, Δ_ij(t), E, η, epochs}

where w_ij are the weights and Δ_ij(t) is the individual update-value, which exclusively determines the magnitude of the weight update. According to the learning rule, this update-value evolves, case by case, based on the observed behavior of the partial derivative during two successive weight steps:

Δ_ij(t) = η⁺ · Δ_ij(t−1)   if (∂E/∂w_ij)(t−1) · (∂E/∂w_ij)(t) > 0
Δ_ij(t) = η⁻ · Δ_ij(t−1)   if (∂E/∂w_ij)(t−1) · (∂E/∂w_ij)(t) < 0
Δ_ij(t) = Δ_ij(t−1)        otherwise

where 0 < η⁻ < 1 < η⁺. Whenever the partial derivative of the corresponding weight w_ij changes sign, the last update was too large in magnitude and the algorithm has skipped over a local minimum; the update-value is therefore decreased by the factor η⁻. Otherwise, the update-value is slightly increased. If the derivative is positive, the weight is decreased by its update-value; if the derivative is negative, the update-value is added:

Δw_ij(t) = −Δ_ij(t)   if (∂E/∂w_ij)(t) > 0
Δw_ij(t) = +Δ_ij(t)   if (∂E/∂w_ij)(t) < 0
Δw_ij(t) = 0          otherwise

w_ij(t+1) = w_ij(t) + Δw_ij(t)

There is, however, one exception. If the partial derivative changes sign (that is, the previous step was too large and the minimum was missed), the previous weight-update is reverted: Δw_ij(t) = −Δw_ij(t−1). To avoid a double penalty of the update-value, (∂E/∂w_ij)(t−1) is set to 0 in the update rule above.
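These RPROP update rules can be sketched in a few lines of NumPy. This is an illustration, not the ENCOG implementation the paper uses; η⁺ = 1.2 and η⁻ = 0.5 are the commonly cited defaults, assumed here:

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5,
               delta_max=50.0, delta_min=1e-6):
    """One batch RPROP update; w, grad, prev_grad, delta share a shape.

    Returns updated (w, prev_grad, delta). Only the SIGN of the gradient
    is used, as in the update rules above (RPROP- variant sketch).
    """
    sign_change = grad * prev_grad
    # Grow the step where the gradient sign persisted, shrink where it flipped.
    delta = np.where(sign_change > 0,
                     np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0,
                     np.maximum(delta * eta_minus, delta_min), delta)
    # Double-penalty avoidance: where the sign flipped, zero the stored
    # gradient so the next step is not shrunk again.
    grad = np.where(sign_change < 0, 0.0, grad)
    w = w - np.sign(grad) * delta
    return w, grad, delta

# Minimize E(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([4.0, -3.0])
prev_grad = np.zeros_like(w)
delta = np.full_like(w, 0.1)
for _ in range(200):
    w, prev_grad, delta = rprop_step(w, w.copy(), prev_grad, delta)
print(np.abs(w).max() < 0.5)  # True: the steps shrink as w oscillates around 0
```

Because only gradient signs are used, no learning rate has to be tuned to the error surface, which is the main appeal of RPROP noted in the text.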

The partial derivative of the total error is taken over all training patterns:

(∂E/∂w_ij)(t) = Σ_p (∂E_p/∂w_ij)(t),

which indicates that the weights are updated only after the presentation of all of the training patterns. The resilient back-propagation (RPROP) training algorithm [21] was adopted to train the proposed ANN model, as mentioned previously. After the selection of the network, it was trained using the resilient backpropagation training scheme, and the training parameters were modified several times, as explained above, until optimum performance was achieved. The maximum number of iterations was set to 25000 epochs.

IV. PROPOSED MODEL OF MEDICAL DECISION SUPPORT SYSTEM

1. Related work

Work on medical decision support systems has been carried out on the basis of the performance of different methods, such as SVM [20], artificial neural networks, and Bayesian classification [1], [2]. Neural network algorithms are inherently parallel, which speeds up the computation process, and they have a high tolerance of noisy data. Their major disadvantage is poor interpretability: fully connected networks are difficult to articulate. Various empirical studies comparing the Bayesian classifier with decision tree and neural network classifiers have found that, in theory, Bayesian classifiers have the minimum error rate of all classifiers. In practice this is not always the case, owing to inaccuracies in the assumptions made for their use, such as class-conditional independence, and to the lack of available probability data.

2. Pre-processed data

The experiments are carried out on the heart dataset using Sequential Minimal Optimization in the Support Vector Machine. Heart disease is diagnosed with the help of complex pathological data. The heart disease dataset used in this experiment is the Cleveland Heart Disease database taken from the UCI machine learning dataset repository [23]. This database contains 14 attributes:

1. Age of patient
2. Sex of patient
3. Chest pain type
4. Resting blood pressure
5. Serum cholesterol
6. Fasting blood sugar
7. Resting ECG results
8. Maximum heart rate achieved
9. Exercise-induced angina
10. ST depression induced by exercise relative to rest
11. Slope of the peak exercise ST segment
12. Number of major vessels colored by fluoroscopy
13. Thal
14. Diagnosis of heart disease

3. Flow diagram of MDSS

The purpose of the proposed model is to diagnose heart disease by classifying the heart disease dataset. This classification process is shown in Figure 1.


Figure 1. Flow diagram of the MDSS for heart disease

V. RESULT AND DISCUSSION

The proposed model uses a dataset of 297 patient records. The larger part of the records is used for training and the rest are used for testing. The main difference between the data given as input for training and for testing is that the training input carries the correct diagnosis (the 14th field in the dataset), whereas the testing input purposely does not. The Diagnosis (14th) field refers to the presence or absence of heart disease in the respective patient; it is an integer-valued field, with value 1 (absence of disease) or −1 (presence of disease). At the end of the testing process we can therefore check the result in the output file created after testing, and verify the efficiency of the proposed model in terms of accuracy.

The proposed system provides two methods of classification, the support vector machine and the artificial neural network, and their performance is compared in terms of the time needed for classification and the accuracy of the system. From the analysis below we can say that Sequential Minimal Optimization in the support vector machine is more effective than resilient backpropagation in the artificial neural network. The proposed system was tested with many datasets; the experimental results are shown in the following figures.

1. Multiclass classification using the Support Vector Machine

Table 1 gives the result of multiclass classification with the SVM. Figure 2 shows the performance of the system on the fifth sample from Table 1, which gives 100% accuracy.

Figure 2: Pie chart of multiclass SVM classification

2. Multiclass classification using the Artificial Neural Network

The following analysis was done with input layer = 13, hidden layer = 25 and output layer = 1. In this case, training is done until the error becomes less than 0.010 or 25000 epochs are reached. Table 2 and Figure 3 give the result of the ANN multiclass classification. Figure 3 shows the performance on the fifth sample from Table 2, which gives 65% accuracy.
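The shape of this comparison can be reproduced in outline with scikit-learn. This is an assumption-laden stand-in: the paper used its own SMO implementation and the ENCOG library, whereas here SVC (which trains with an SMO-type solver) and MLPClassifier are substituted, and synthetic data replaces the 297 Cleveland records:

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 297-record, 13-feature Cleveland data.
X, y = make_classification(n_samples=297, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = [("SVM (SMO-type solver)", SVC(kernel="rbf", C=1.0)),
          ("ANN (MLP, 25 hidden units)", MLPClassifier(
              hidden_layer_sizes=(25,), max_iter=2000, random_state=0))]

for name, model in models:
    start = time.perf_counter()
    model.fit(X_train, y_train)              # train
    elapsed = time.perf_counter() - start
    acc = model.score(X_test, y_test)        # held-out accuracy
    print(f"{name}: accuracy={acc:.2f}, train time={elapsed:.3f}s")
```

The printed accuracies and times on synthetic data will not match Tables 1 and 2; the point is only the measurement protocol (fit on a training split, time the fit, score on a held-out split).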


Figure 3: Pie chart of multiclass ANN classification

Table 1: Performance of the multiclass SVM decision support system

No. of samples | Class I | Class II | Class III | Class IV | Class V | Accuracy (%) | Time  | Testing Accuracy (%, 10 samples)
50             | 30      | 7        | 4         | 5        | 4       | 100          | 0.078 | 100
100            | 57      | 19       | 10        | 10       | 4       | 100          | 0.063 | 100
150            | 83      | 27       | 15        | 18       | 7       | 100          | 0.078 | 100
200            | 110     | 37       | 20        | 23       | 10      | 100          | 0.094 | 100
297            | 160     | 54       | 35        | 35       | 13      | 100          | 0.218 | 100

Table 2: Performance of the multiclass ANN decision support system

No. of samples | Class I | Class II | Class III | Class IV | Class V | Training Accuracy (%) | Time  | Testing Accuracy (%)
50             | 30      | 7        | 4         | 5        | 4       | 100                   | 1.43  | 100
100            | 57      | 19       | 10        | 10       | 3       | 95                    | 31.02 | 90
150            | 83      | 25       | 14        | 18       | 5       | 91.2                  | 45.6  | 90
200            | 110     | 36       | 10        | 15       | 4       | 70.4                  | 36.65 | 70
297            | 159     | 40       | 21        | 23       | 4       | 68.2                  | 56.5  | 65

VI. CONCLUSION

The results of the SVM classification algorithm, compared to the ANN classification, are very encouraging. The difference in accuracy is noticeable, and the difference in execution times is even more noteworthy. The enhanced performance of SVM classification is due to the fact that it avoids repetitive searches in order to find the best two points to use at each optimization step. It was found that SMO performs better, with high accuracy, when the data is pre-processed before being given as input. Applied to the task of classifying heart disease, with features based on statistical properties, accuracy is higher in the proposed SVM classification model, which uses SMO.

VII. FUTURE WORK

SMO is a carefully organized algorithm with excellent computational efficiency. However, because of its way of computing and its use of a single threshold value, it can become inefficient. In the future, multiple threshold parameters could be used to improve its performance in terms of speed. For the artificial neural network, window momentum is a standard technique that can be used to speed up convergence and maintain generalization performance; it can give significant speed-ups over a set of applications, with the same or improved accuracy.

REFERENCES

[1] Long Wan, Wenxing Bao, "Research and Application of Animal Disease Intelligent Diagnosis Based on Support Vector Machine", IEEE International Conference on Computational Intelligence and Security, pp. 66-70, 2009.
[2] S. N. Deepa, B. Aruna Devi, "Neural Networks and SMO based Classification for Brain Tumor", IEEE World Congress on Information and Communication Technologies, pp. 1032-1037, 2010.
[3] C. Cortes and V. Vapnik, "Support-vector networks", Machine Learning, vol. 20, 1995.
[4] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design", Neural Computation, vol. 13, pp. 637-649, 2002.
[5] John C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines", Microsoft Research, Technical Report MSR-TR-98-14.
[6] Gong Wei, Wang Shoubin, "Support Vector Machine for Assistant Clinical Diagnosis of Cardiac Disease", IEEE Global Congress on Intelligent Systems, pp. 588-591, 2009.
[7] Ya Gao, Shiliang Sun, "An empirical evaluation of linear and nonlinear kernels for text classification using Support Vector Machines", IEEE Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1502-1505, 2010.
[8] Qinghua Jiang, Guohua Wang, Tianjiao Zhang, Yadong Wang, "Predicting Human microRNA-disease Associations Based on Support Vector Machine", IEEE International Conference on Bioinformatics and Biomedicine, pp. 467-472, 2010.
[9] K. E. Browne, R. J. Burkholder, "Nonlinear Optimization of Radar Images From a Through-Wall Sensing System via the Lagrange Multiplier Method", IEEE Geoscience and Remote Sensing Letters, pp. 803-807, 2012.
[10] Gao Daqi, Zhang Tao, "Support vector machine classifiers using RBF kernels with clustering-based centers and widths", IEEE International Joint Conference on Neural Networks, pp. 2971-2976, 2007.
[11] Olvi L. Mangasarian and Michael E. Thompson, "Massive Data Classification via Unconstrained Support Vector Machines", Journal of Optimization Theory and Applications, vol. 131, pp. 315-325, 2006.
[12] P. S. Bradley and O. L. Mangasarian, "Massive data discrimination via linear support vector machines", Optimization Methods and Software, vol. 13, pp. 1-10, 2000. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-05.ps
[13] E. Osuna, R. Freund, F. Girosi, "Improved Training Algorithm for Support Vector Machines", Proc. IEEE NNSP '97, 1997.

[14] Yiqiang Zhan, Dinggang Shen, "Design efficient support vector machine for fast classification", Pattern Recognition, vol. 38, pp. 157-161, 2004.
[15] Peng Peng, Qian-Li Ma, Lei-Ming Hong, "The Research of the Parallel SMO algorithm for solving SVM", Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Baoding, pp. 1271-1274, 12-15 July 2009.
[16] Chih-Jen Lin, "Asymptotic convergence of an SMO algorithm without any assumptions", IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 248-250, 2002.
[17] Baxter Tyson Smith, "Lagrange Multipliers Tutorial in the Context of Support Vector Machines", Memorial University of Newfoundland, St. John's, Newfoundland, Canada.
[18] Ya-Zhou Liu, Hong-Xun Yao, Wen Gao, De-bin Zhao, "Single sequential minimal optimization: an improved SVMs training algorithm", Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, pp. 4360-4364, 2005.
[19] Tinne Hoff Kjeldsen, "A contextualized historical analysis of the Kuhn-Tucker theorem in nonlinear programming: the impact of World War II", Historia Mathematica, vol. 27, no. 4, pp. 331-361, 2000.
[20] Deepti Vadicherla, Sheetal Sonawane, "Decision support system for heart disease based on sequential minimal optimization in support vector machine", International Journal of Engineering Sciences and Emerging Technologies, vol. 4, no. 2, pp. 19-26, 2013.
[21] Ian H. Witten, Eibe Frank, Data Mining, Elsevier Inc., 2005.
[22] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2/e, Morgan Kaufmann Publishers, Elsevier Inc., 2006.
[23] UCI Machine Learning Repository: Heart Disease Data Set. http://archive.ics.uci.edu/ml/datasets/Heart+Disease
[24] MedlinePlus: Information related to Heart Diseases. http://www.nlm.gov/medlineplus/heartdiseases.html
