
RULE ACQUISITION IN DATA MINING USING SELF ADAPTIVE GENETIC ALGORITHM

BY
GAURAV SETHIA.D
KUMARAN.S
PRABHAKAR.J

UNDER THE GUIDANCE OF
Dr. S. KANMANI

PROBLEM DEFINITION:
This project covers the implementation and functioning of RAGA (Rule Acquisition with a Genetic Algorithm), a genetic-algorithm-based data mining system suitable for both supervised and certain types of unsupervised knowledge extraction from large and possibly noisy databases.

SYSTEM REQUIREMENT SPECIFICATION:
i. Windows OS
ii. Microsoft Office (Access Database)
iii. JVM

SURVEY ON EXISTING SYSTEMS:

LIMITATIONS OF THE EXISTING SYSTEMS:

MOTIVATION TO TAKE UP THE PROBLEM:
Association rule mining is a data mining technique that is widely used in many areas because it draws inferences from possibly large databases that would go unnoticed without data mining. Genetic algorithms, in turn, have been applied in many fields, including biology, biometrics, education, manufacturing information systems, application protocol interface records from computers for intrusion detection, software engineering, virus information from computer data, image databases, finance information, and student information. By altering the representation and the operators, a genetic algorithm can be applied to a new field without compromising efficiency.

Self-adaptive genetic algorithms have been reported to perform better than non-self-adaptive ones. At present, no self-adaptive genetic algorithm appears to be available for data mining. This problem is taken up to improve the efficiency of the whole mining process.

OBJECTIVE OF THE PROBLEM:
The basic objective is to carry out association rule mining using a self-adaptive genetic algorithm. The mutation and crossover rates are to be adapted dynamically so that the algorithm performs better than the existing ones. The results should be obtained in less time and should be more accurate than those of the existing algorithms.

COMPLETE MODULES DESCRIPTION:
The following are the main modules of the project:
i. Data Cleaning
ii. Data Integration
iii. Data Selection
iv. Data Transformation
v. Data Mining
vi. Pattern Evaluation
vii. Knowledge Representation

Data cleaning: also known as data cleansing, this is the phase in which noisy data and irrelevant data are removed from the collection.
Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common source.
Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, this is the phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: this is the crucial step in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: this is the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.
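The pattern evaluation step depends on "given measures" that the description above does not name. As an illustration only, the sketch below assumes support and confidence, two measures commonly used for association rules. The function names and the sample transactions are hypothetical and are not taken from the project.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never fires, so no evidence for it
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market-basket data, used only to exercise the functions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

In a genetic-algorithm setting, a combination of such measures could plausibly serve as the fitness of a candidate rule, but that choice is an assumption here.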

The data mining and pattern evaluation parts of the project are the modules where the Genetic Algorithm comes in. The following are the steps of a Genetic Algorithm:
A. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
B. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
C. [New population] Create a new population by repeating the following steps until the new population is complete:
   i. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).
   ii. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
   iii. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
   iv. [Accepting] Place the new offspring in the new population.
D. [Replace] Use the newly generated population for a further run of the algorithm.
E. [Test] If the end condition is satisfied, stop and return the best solution in the current population.
F. [Loop] Go to step B.
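The steps A to F can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation: chromosomes are bit lists, the fitness is a toy "count the 1-bits" function, selection is roulette-wheel, and the end condition is a fixed number of generations. All names and default parameter values are hypothetical.

```python
import random


def one_max(chromosome):
    """Toy fitness (assumed for illustration): number of 1-bits."""
    return sum(chromosome)


def select(population, fitnesses, rng):
    """Roulette-wheel selection: better fitness gives a bigger chance."""
    # Adding 1 keeps every weight positive even if all fitnesses are zero.
    weights = [f + 1 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rate, rng):
    """Single-point crossover with probability `rate`, else copy the parents."""
    if rng.random() < rate:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rate, rng):
    """Flip each locus independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(fitness, length=20, pop_size=30, generations=50,
           crossover_rate=0.8, mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    # A. [Start] random population of bit-list chromosomes
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # F. [Loop] back to the fitness step
        # B. [Fitness]
        fitnesses = [fitness(c) for c in population]
        # C. [New population]
        new_population = []
        while len(new_population) < pop_size:
            p1 = select(population, fitnesses, rng)   # i. [Selection]
            p2 = select(population, fitnesses, rng)
            c1, c2 = crossover(p1, p2, crossover_rate, rng)  # ii. [Crossover]
            new_population.append(mutate(c1, mutation_rate, rng))  # iii, iv
            new_population.append(mutate(c2, mutation_rate, rng))
        # D. [Replace]
        population = new_population[:pop_size]
    # E. [Test] end condition here is simply the generation limit
    return max(population, key=fitness)


best = run_ga(one_max)
print(best, one_max(best))
```

A fixed seed makes each run repeatable, which helps when comparing the effect of different crossover and mutation rates.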

The steps above constitute the basic genetic algorithm. The project keeps this basic algorithm unchanged except for one modification: the mutation and crossover rates are changed after every iteration so as to achieve the best result.
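The text does not state how the rates are changed from one iteration to the next. The sketch below shows one possible rule, assumed purely for illustration: when the best fitness improves, mutate less and recombine more; when it stalls, do the opposite. The function names, the scaling factor, and the bounds are all hypothetical.

```python
def clamp(value, low, high):
    """Keep `value` inside the closed interval [low, high]."""
    return max(low, min(high, value))


def adapt_rates(mutation_rate, crossover_rate, prev_best, curr_best,
                factor=1.2,
                mutation_bounds=(0.001, 0.25),
                crossover_bounds=(0.5, 0.95)):
    """Return updated (mutation_rate, crossover_rate) for the next iteration.

    Assumed rule, not taken from the project: compare the best fitness of
    the current iteration with that of the previous one.
    """
    if curr_best > prev_best:
        # Progress: exploit by mutating less and recombining more.
        mutation_rate /= factor
        crossover_rate *= factor
    else:
        # Stagnation: explore by mutating more and recombining less.
        mutation_rate *= factor
        crossover_rate /= factor
    return (clamp(mutation_rate, *mutation_bounds),
            clamp(crossover_rate, *crossover_bounds))


# Hypothetical best-fitness history over five iterations.
mutation_rate, crossover_rate = 0.01, 0.8
best_history = [10, 12, 12, 12, 15]
for prev, curr in zip(best_history, best_history[1:]):
    mutation_rate, crossover_rate = adapt_rates(
        mutation_rate, crossover_rate, prev, curr)
    print(round(mutation_rate, 5), round(crossover_rate, 5))
```

The bounds keep both rates in a usable range, so a long run of stalled iterations cannot push the search into purely random mutation.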

COMPLETED MODULES CODE DESCRIPTION:
A sample genetic algorithm is coded. The mutation and crossover rates are changed before each execution of the program, and the effects on the results are analyzed.

PLAN OF ACTION:

REFERENCES:
1. Jing Li, Han Rui Feng, A self-adaptive genetic algorithm based on real code, Capital Normal University, CNU, 2010.
2. Genxiang Zhang, Haishan Chen, Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining, International Conference on Artificial Intelligence and Computational Intelligence, AICI '09, Volume: 4, Page(s): 341-345, 2009.
3. Gonzales, E., Mabu, S., Taboada, K., Shimada, K., Hirasawa, K., Mining Multi-class Datasets using Genetic Relation Algorithm for Rule Reduction, IEEE Congress on Evolutionary Computation, CEC '09, Page(s): 3249-3255, 2009.
4. Hong Guo, Ya Zhou, An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application, 3rd International Conference on Genetic and Evolutionary Computing, WGEC '09, Page(s): 117-120, 2009.
5. Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo, Genetic Algorithm Based on Evolution Strategy and the Application in Data Mining, First International Workshop on Education Technology and Computer Science, ETCS '09, Volume: 1, Page(s): 848-852, 2009.
6. Xian-Jun Shi, Hong Lei, A Genetic Algorithm-Based Approach for Classification Rule Discovery, International Conference on Information Management, Innovation Management and Industrial Engineering, ICIII '08, Volume: 1, Page(s): 175-178, 2008.
