
N.DEEPIKA* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 11, Issue No. 2, 253 - 257

Association rule for classification of Heart-attack patients


N.DEEPIKA, M.Tech, Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: nagilladeepika@gmail.com
K.CHANDRA SHEKAR, Sr. Asst. Prof., Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: chandhra2k7@gmail.com
D. SUJATHA, Assoc. Prof. and HOD, Department of CSE, ATRI, Parvathapur, Uppal, Hyderabad, India. e-mail: sujatha.dandu@gmail.com

Keywords- Association rule, Binning, Data Mining, Frequent Patterns, Heart Disease, PCAR (Pruning Classification of association rule), Pre-processing.

I. INTRODUCTION
Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful information about data [1]. Data mining technology provides a user-oriented approach to novel and hidden patterns in the data. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not mined to discover hidden information for effective decision making. As a result, hidden patterns and relationships often go unexploited. Advanced data mining techniques can help remedy this situation. Using medical profiles such as age, sex, blood pressure and blood sugar, such techniques can predict the likelihood of a patient getting heart disease. They also allow significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established.

ISSN: 2230-7818

Abstract
Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful information from data. Data mining technology provides a user-oriented approach to novel and hidden patterns in the data. This paper presents an effective heart attack prediction system using Pruning-Classification Association Rule (PCAR), an efficient approach for mining association rules. A proficient methodology for the generation of association rules from heart disease warehouses for heart attack prediction is presented. Initially, the Pima Indian heart attack data warehouse is pre-processed to make it suitable for the mining process. Once preprocessing is complete, the heart disease data warehouse is binned using a modified equal-width binning interval approach to discretize continuous-valued attributes. The approximate width of the desired interval is chosen based on the opinion of a medical expert and is provided as an input parameter to the model. In this way, numeric attributes are first converted into categorical form. The frequent patterns applicable to heart disease are then mined from the extracted data with the PCAR algorithm. In addition, the patterns vital to heart attack prediction are selected on the basis of the computed significant class labels. The classification technique is trained with the selected class labels for the effective prediction of heart attack. Lastly, we generate association rules, which are useful for identifying general associations in the data. The results obtained illustrate that the designed prediction system is capable of predicting heart attack effectively.

The discovered knowledge can be used by healthcare administrators to improve the quality of service. A wide variety of areas, including marketing, customer relationship management, engineering, medicine, crime analysis, expert prediction, Web mining, and mobile computing, among others, utilize data mining [2]. Numerous fields associated with medical services, such as the prediction of the effectiveness of surgical procedures, medical tests, and medication, and the discovery of relationships among clinical and diagnosis data, also employ data mining methodologies [3]. Therefore, data mining has developed into a vital domain in healthcare [4]. It is possible to predict the efficiency of medical treatments by building data mining applications. Real-life data mining applications are attractive since they provide data miners with a varied set of problems, time and again. Working on heart disease patient databases is one such real-life application. It therefore appears reasonable to try to utilize the knowledge and experience of several specialists collected in databases to assist the diagnosis process [2], [5]. In the recent past, several authors have utilized data mining techniques to present diagnosis approaches for diverse types of heart disease [6, 7, 8, 9, 10, 11].

This paper presents an effective heart attack prediction system using PCAR, an efficient approach for mining association rules. A proficient methodology for the generation of association rules from heart disease warehouses for heart attack prediction is presented. Initially, the Pima Indian heart attack data warehouse is pre-processed to make it suitable for the mining process. Once preprocessing is complete, the heart disease data warehouse is binned using a modified equal-width binning interval approach to discretize continuous-valued attributes. The approximate width of the desired interval is chosen based on the opinion of a medical expert and is provided as an input parameter to the model. In this way, numeric attributes are first converted into categorical form. The frequent patterns applicable to heart disease are then mined from the extracted data with the PCAR algorithm. In addition, the patterns vital to heart attack prediction are selected on the basis of the computed significant class labels. The classification technique is trained with the selected class labels for the effective prediction of heart attack. Many existing algorithms for mining association rules identify frequent item sets by bottom-up combination of smaller frequent item sets or top-down decomposing of larger

@ 2011 http://www.ijaest.iserp.org. All rights Reserved.


infrequent item sets; these methods produce large volumes of candidate item sets. Since the supersets of infrequent items are themselves infrequent item sets, this paper presents a new efficient method, namely Pruning-Classification Association Rule (PCAR). PCAR combines minimum frequency items with minimum frequency item sets. It first deletes infrequent items from the item sets, then classifies the item sets based on their frequency, and finally discovers the frequent item sets. The number of candidate item sets is greatly reduced and item sets need not be combined or decomposed; therefore, operation time and memory requirements can be decreased accordingly. This method has a significant advantage when mining association rules over large volumes of items with small item set frequencies. Experiments show that PCAR outperforms the well-known Apriori algorithm. The efficiency of the designed system in predicting heart attack is illustrated by the acquired results.

The remaining sections of the paper are organized as follows. In Section 2, a brief review of some of the works on heart disease diagnosis is presented. An introduction to heart disease and its effects is given in Section 3. Section 4 presents the extraction of significant patterns: the heart disease data warehouse is pre-processed to make it suitable for the mining process, and the frequent patterns applicable to heart disease are then mined from the extracted data with the PCAR algorithm. The experimental results are described in Section 5. The conclusions are summed up in Section 6.
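The three PCAR steps summarized above (delete infrequent items, classify the reduced item sets by frequency, discover frequent item sets) can be illustrated with a small Python sketch. This is only one possible reading of that description, not the authors' implementation: the function name `pcar_sketch`, the toy transactions, and the choice to obtain supports by enumerating subsets of each distinct reduced item set are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pcar_sketch(transactions, min_support):
    """Hypothetical reading of the PCAR outline; returns {itemset: support}."""
    # Step 1: count single items and delete infrequent items from every item set.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    pruned = [frozenset(t) & frequent_items for t in transactions]

    # Step 2: group identical reduced item sets and record how often each occurs.
    classes = Counter(p for p in pruned if p)

    # Step 3: add each group's count to the support of all of its subsets.
    # Assumption: exhaustive subset enumeration, which is only practical
    # when the reduced item sets are short.
    support = Counter()
    for itemset, count in classes.items():
        for k in range(1, len(itemset) + 1):
            for subset in combinations(sorted(itemset), k):
                support[frozenset(subset)] += count

    return {s: c for s, c in support.items() if c >= min_support}


# Toy transactions using bin labels of the kind that appear later in the paper.
toy = [
    {"Y", "BP-H", "MHR-S"},
    {"Y", "BP-H"},
    {"Y", "MHR-S"},
    {"O", "BP-H"},
]
frequent = pcar_sketch(toy, min_support=2)
```

Because infrequent items are removed before any item set is counted, no candidate containing them is ever generated, which is the effect the text attributes to PCAR.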

II. REVIEW OF RELATED BACKGROUND LITERATURE
Numerous works in the literature on heart disease diagnosis using data mining techniques have motivated our work. Some of these works are discussed below.

This paper presents a novel and more efficient PCAR algorithm, which grew out of an analysis of the Apriori algorithm. Apriori is the best-known algorithm for mining association rules. It uses a breadth-first search strategy to count the support of item sets and a candidate generation function that exploits the downward closure property of support.

Steps to perform the Apriori algorithm:
1. Generate item sets that pass a minimum support threshold.
2. Generate rules that pass a minimum confidence threshold.
3. Apriori uses a bottom-up approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
4. Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.

The Apriori algorithm obtains large frequent item sets through the combination and pruning of small frequent item sets. The principle of the algorithm is as follows: it first calculates the support of every item set in the candidate item set Ck obtained from Lk-1. If the support of an item set is greater than or equal to the minimum support, the candidate k-item set is a frequent k-item set and belongs to Lk. All frequent k-item sets are then combined into a new candidate item set Ck+1, level by level, until the large frequent item sets are found, as shown in Fig. 1.
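The level-wise Apriori procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it counts support with a plain scan over the transactions rather than a hash tree, and the function name `apriori` and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent 1-item sets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-item sets into k-item sets
        # and keep a candidate only if all of its (k-1)-subsets are frequent
        # (downward closure property of support).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting by scanning the data (no hash tree in this sketch).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Toy transactions; the labels are illustrative only.
toy = [
    {"Y", "BP-H", "MHR-S"},
    {"Y", "BP-H"},
    {"Y", "MHR-S"},
    {"O", "BP-H"},
]
frequent = apriori(toy, min_support=2)
```

With a minimum support of 2, the 3-item candidate is pruned before counting because one of its 2-item subsets is infrequent, and the loop stops once no candidate survives.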

The problem of identifying constrained association rules for heart disease prediction was studied by Carlos Ordonez [12]. Three constraints were introduced to decrease the number of patterns. The first requires each attribute to appear on only one side of the rule. The second segregates attributes into uninteresting groups. The last restricts the number of attributes in a rule. Experiments illustrated that the constraints remarkably reduced the number of discovered rules and decreased the running time. Two groups of rules predicted the presence or absence of heart disease in four specific heart arteries. Data mining methods may aid clinicians in predicting the survival of patients and in adapting their practices accordingly.

III. HEART DISEASE

The term heart disease encompasses the diverse diseases that affect the heart. Heart disease kills one person every 34 seconds in the United States. Coronary heart disease, cardiomyopathy and cardiovascular disease are some categories of heart disease. The term cardiovascular disease covers a wide range of conditions that affect the heart and the blood vessels and the manner in which blood is pumped and circulated through the body. Cardiovascular disease (CVD) results in severe illness, disability, and death. Narrowing of the coronary arteries reduces the blood and oxygen supply to the heart and leads to coronary heart disease (CHD). Myocardial infarctions, generally known as heart attacks, and angina pectoris, or chest pain, are encompassed in CHD. A sudden blockage of a coronary artery, generally due to a blood clot, results in a heart attack. Chest pains arise when the blood received by the heart muscles is inadequate. High blood pressure, coronary artery disease, stroke, and rheumatic fever/rheumatic heart disease are the various forms of cardiovascular disease. The World Health Organization has estimated that 12 million deaths occur worldwide every year due to cardiovascular diseases.

Fig-1: Procedure of the Apriori algorithm

IV. PROPOSED METHOD
The extraction of significant patterns from the heart disease data warehouse is presented in this section. The heart disease data warehouse contains the screening clinical data of heart patients. Initially, the data warehouse is preprocessed to make the mining process more efficient. In the first stage of our proposed study, we used preprocessing to handle missing values. We then applied equal interval binning, with approximate values based on medical expert advice, to the Pima Indian heart attack data. Lastly, we applied the PCAR algorithm to generate the rules. We also consider confidence, an important measure. The significant items are calculated for all frequent patterns with the aid of the proposed approach. The frequent patterns with confidence greater than a predefined threshold are chosen. These chosen frequent patterns can be used in the design and development of a heart attack prediction system.

A. Data Set
The Pima Indian heart attack dataset used was obtained from the UCI machine learning repository [13]. Characteristics of the patients, such as the number of episodes of chest pain and age in years, were recorded. Some other important parameters need to be checked every 2 hours: thalach (maximum heart rate achieved), blood pressure (mm Hg), serum cholesterol (mg/dl), and the electrocardiographic result.

B. Data Preprocessing
Cleaning and filtering of the data may be necessary, depending on the data and the data mining algorithm employed, to avoid the creation of deceptive or inappropriate rules or patterns [14]. The actions comprised in the pre-processing of a data set are the removal of duplicate records, normalizing the values used to represent information in the database, accounting for missing data points and removing unneeded data fields. Moreover, it might be essential to combine the data so as to reduce the number of data sets, besides minimizing the memory and processing resources required by the data mining algorithm [15]. Real-world data is not always complete, and in the case of medical data this is always so. We use data preprocessing to remove the inconsistencies associated with the data.

C. Approximate equal binning techniques based on expert advice
After preprocessing, the attributes below are included. Classification of general associations requires categorical data, so the data variables are binned into a small number of categories. We used approximate equal interval binning and also took advice from medical experts. The following table summarizes the cut-off values along with the names of the bins for the variables.

Attribute | Description | Cut-off values | Type
Age | Age of the patient (years) | Age<40; Age<60; Age>60 | Young; Middle; Old age
Sex | Gender | 1=male; 0=female | -
CP | Chest pain type | Value1: typical angina; Value2: atypical angina; Value3: non-anginal pain; Value4: asymptomatic | -
Trest bps | Resting blood pressure (mmHg) | BP<80; BP<90; BP>90 | BP-Normal; BP-Normal-to-High; BP-High
Chol | Serum cholesterol in mg/dl | Chol<200; Chol<400; Chol>400 | Chol-Normal; Chol-High; Chol-Severe
Fbs | Fasting blood sugar >120 mg/dl | 1=true; 0=false | -
Rest ecg | Resting electrocardiographic results | Val=0; Val=1; Val=2 | Ecr-Normal; Ecr-Abnormal; Ecr-Probable/definite
Thalach | Maximum heart rate achieved | (value)<100 | MHR-Normal; MHR-High; MHR-Severe
Exang | Exercise induced angina | 0 or 1 | -
Old peak | ST depression induced by exercise relative to rest | (value)<1.5; (value)<2.5; (value)>2.5 | DE-Normal; DE-Normal-to-High; DE-High
num | Angiographic disease status | (value)<50%; (value)>50% | Valve diameter is low; Valve diameter is high
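The preprocessing and binning steps can be sketched as follows for three of the numeric attributes. The cut-offs and bin names are taken from the table; everything else is an assumption made for illustration: the field names (`age`, `trestbps`, `chol`), dropping records with missing values, treating duplicates as identical value tuples, and assigning a value that falls exactly on a cut-off to the higher bin.

```python
from bisect import bisect_right

# Cut-offs and bin labels from the table; boundary handling is assumed.
BINS = {
    "age": ([40, 60], ["Young", "Middle", "Old"]),
    "trestbps": ([80, 90], ["BP-Normal", "BP-Normal-to-High", "BP-High"]),
    "chol": ([200, 400], ["Chol-Normal", "Chol-High", "Chol-Severe"]),
}


def bin_value(attribute, value):
    """Map a numeric value to the label of the interval it falls into."""
    cuts, labels = BINS[attribute]
    return labels[bisect_right(cuts, value)]


def preprocess(records):
    """Drop incomplete and duplicate records, then discretize the rest."""
    seen = set()
    cleaned = []
    for rec in records:
        # Assumed policy: skip records with a missing value.
        if any(rec.get(a) is None for a in BINS):
            continue
        # Assumed policy: identical value tuples count as duplicate records.
        key = tuple(rec[a] for a in BINS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({a: bin_value(a, rec[a]) for a in BINS})
    return cleaned


# Hypothetical records, not taken from the data set.
raw = [
    {"age": 35, "trestbps": 95, "chol": 180},
    {"age": 35, "trestbps": 95, "chol": 180},    # duplicate
    {"age": 62, "trestbps": None, "chol": 250},  # missing value
    {"age": 45, "trestbps": 85, "chol": 410},
]
binned = preprocess(raw)
```

Keeping the cut-offs in a single dictionary mirrors the idea that the interval widths are an input parameter supplied by the medical expert.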

D. Applying Pruning-Classification Association Rule on Heart Attack Patient Dataset
The Pruning-Classification Association Rule (PCAR) method combines minimum frequency items with minimum frequency item sets. It first deletes infrequent items from the item sets, then classifies the item sets based on their frequency, and finally discovers the frequent item sets. The number of candidate item sets is greatly reduced and item sets need not be combined or decomposed; therefore, operation time and memory requirements can be decreased accordingly. This method has a significant advantage when mining association rules over large volumes of items with small item set frequencies.

PCAR Algorithm
The PCAR algorithm identifies frequent item sets by pruning infrequent items. Its procedure is shown in Fig. 2.

Fig-2: Procedure of the PCAR algorithm

E. Applying Association Rules
Association rules are no different from classification rules, except that they can predict any attribute, not only class labels. They are therefore free to predict combinations of attributes. Different association rules express different regularities that underlie the data set and generally predict different things, so many association rules can be generated even from a small data set. We keep only those rules that apply to a reasonably large number of instances, based on coverage and accuracy criteria. The coverage of an association rule, often called its support, is the number of instances that it predicts correctly. Its accuracy, often called confidence, is the number of instances that it predicts correctly, expressed as a proportion of all instances to which it applies. The user specifies minimum coverage and accuracy values and looks only for rules whose values are at least the specified minimums.

V. EXPERIMENTAL RESULTS
IF conditions THEN conclusion

This kind of rule consists of two parts. The rule antecedent (the IF part) contains one or more conditions about the values of predictor attributes, whereas the rule consequent (the THEN part) contains a prediction about the value of a goal attribute. An accurate prediction of the value of a goal attribute will improve the decision-making process. IF-THEN prediction rules are very popular in data mining; they represent discovered knowledge at a high level of abstraction. In the health care system they can be applied as follows:

(Symptoms) (Previous--disease history) ===> (Cause of---

Example 1: IF-THEN rule induced in the diagnosis of the level of disease status in blood
IF (Age < 40 - Young) and (Blood Pressure > 90 - High) and (Maximum Heart Rate Achieved > 125 - Severe) THEN Diagnosis = Valve diameter narrowing is High >= 50%
[(0;Y) (3;BP-H) (7;MHR-S)] ==> [(10;DN-H)] ....... 92%
[(2;NAP) (2;ASMP) (8;0) (0;O) (2;AA)] ==> [(10;DN-L)] ....... 80%

S.NO | Rules | Confidence
1 | [(0;Y)(8;0)(7;MHR-S)] ==> [(10;DN-H)] | 88%
2 | [(2;NAP)(8;0)(2;AA)] ==> [(10;DN-L)] | 72%
3 | [(2;NAP)(2;ASMP)(7;MHR-S)] ==> [(10;DN-L)] | 63%
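The coverage (support) and accuracy (confidence) measures defined in the paper can be computed with a short Python sketch. Records and rules are written as sets of (attribute index; label) pairs in the paper's notation. The function names, the toy records, and the thresholds are hypothetical and are not the paper's data or results.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent ==> consequent.

    support    = number of records the rule predicts correctly
    confidence = support / number of records the rule applies to
    """
    applies = [r for r in records if antecedent <= r]
    correct = [r for r in applies if consequent <= r]
    support = len(correct)
    confidence = support / len(applies) if applies else 0.0
    return support, confidence


def select_rules(records, rules, min_support, min_confidence):
    """Keep only the rules that meet both user-specified minimums."""
    kept = []
    for antecedent, consequent in rules:
        support, confidence = rule_metrics(records, antecedent, consequent)
        if support >= min_support and confidence >= min_confidence:
            kept.append((antecedent, consequent, support, confidence))
    return kept


# Hypothetical binned records, not taken from the data set.
records = [
    frozenset({(0, "Y"), (3, "BP-H"), (7, "MHR-S"), (10, "DN-H")}),
    frozenset({(0, "Y"), (3, "BP-H"), (7, "MHR-S"), (10, "DN-H")}),
    frozenset({(0, "Y"), (3, "BP-H"), (7, "MHR-S"), (10, "DN-L")}),
    frozenset({(0, "O"), (3, "BP-H"), (7, "MHR-S"), (10, "DN-H")}),
]
rule = (
    frozenset({(0, "Y"), (3, "BP-H"), (7, "MHR-S")}),
    frozenset({(10, "DN-H")}),
)
support, confidence = rule_metrics(records, *rule)
kept = select_rules(records, [rule], min_support=2, min_confidence=0.6)
```

In this toy data the rule applies to three records and is correct for two of them, so it passes a minimum confidence of 0.6 but would be dropped at 0.7.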

VI. CONCLUSION
We studied the problem of constraining and summarizing different data mining algorithms, and focused on using them to predict combinations of several target attributes. In this paper, we have presented an effective heart attack prediction system using PCAR, an efficient approach for mining association rules. A proficient methodology for the generation of association rules from heart disease warehouses for heart attack prediction has been presented. In future work, this system can be further enhanced and expanded. For predicting heart attack, 30 significant attributes are listed. Continuous data can also be used instead of just categorical data. We can also use text mining to mine the vast amount of unstructured data available in healthcare databases. The experimental results have illustrated the efficacy of the designed prediction system in predicting heart attack.

REFERENCES
1. Frawley and Piatetsky-Shapiro, Knowledge Discovery in Databases: An Overview. The AAAI/MIT Press, Menlo Park, C.A, 1996.
2. Hsinchun Chen, Sherrilynne S. Fuller, Carol Friedman, and William Hersh, "Knowledge Management, Data Mining, and Text Mining In Medical Informatics", Chapter 1, eds. Medical Informatics: Knowledge Management And Data Mining In Biomedicine, New York, Springer, pp. 3-34, 2005.
3. Tzung-I Tang, Gang Zheng, Yalou Huang, Guangfu Shu, Pengtao Wang, "A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis", IEMS, Vol. 4, No. 1, pp. 102-108, June 2005.
4. S Stilou, P D Bamidis, N Maglaveras, C Pappas, Mining association rules from clinical databases: an intelligent diagnostic process in healthcare, Stud Health Technol Inform 84: Pt 2. 1399-1403, 2001.
5. Andreeva P., M. Dimitrova and A. Gegov, Information Representation in Cardiological Knowledge Based System, SAER06, pp: 23-25 Sept, 2006.
6. Heon Gyu Lee, Ki Yong Noh, Keun Ho Ryu, Mining Biosignal Data: Coronary Artery Disease Diagnosis using Linear and Nonlinear Features of HRV, LNAI 4819: Emerging Technologies in Knowledge Discovery and Data Mining, pp. 56-66, May 2007.
7. Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol.8 No.8, August 2008.
8. Niti Guru, Anil Dahiya, Navin Rajpal, "Decision Support System for Heart Disease Diagnosis Using Neural Network", Delhi Business Review, Vol. 8, No. 1 (January - June 2007).
9. Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules," Seminar Presentation at University of Tokyo, 2004.
10. Franck Le Duff, Cristian Muntean, Marc Cuggia, Philippe Mabo, "Predicting Survival Causes After Out of Hospital Cardiac Arrest using Data Mining Method", Studies in health technology and informatics, Vol. 107, No. Pt 2, pp. 1256-9, 2004.
11. Boleslaw Szymanski, Long Han, Mark Embrechts, Alexander Ross, Karsten Sternickel, Lijuan Zhu, "Using Efficient Supanova Kernel For Heart Disease Diagnosis", proc. ANNIE 06, intelligent engineering systems through artificial neural networks, vol. 16, pp: 305-310, 2006.
12. Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules," Seminar Presentation at University of Tokyo, 2004.
13. D. Newman, J. S. Hettich, C.L.S. Blake, and C.J. Merz, UCI Repository of machine learning databases, Irvine, CA: University of California, Department of Information and Computer Science, 1998, last accessed: 1/10/2009.
14. Gerhard Münz, Sa Li, and Georg Carle, Traffic anomaly detection using k-means clustering, In Proc. of Leistungs-, Zuverlässigkeits- und Verlässlichkeitsbewertung von Kommunikationsnetzen und Verteilten Systemen, 4. GI/ITG-Workshop MMBnet 2007, Hamburg, Germany, September 2007.
15. Wynne Hsu, Mong-Li Lee, Bing Liu, Tok Wang Ling, Exploration mining in diabetic patients databases: findings and conclusions, KDD 2000: pp: 430-436, 2000.
