
Han et al. provided a comprehensive survey, from a database perspective, of recently developed data mining techniques [7]. They reviewed several major kinds of data mining methods, including generalization, characterization, classification, clustering, association, evolution, pattern matching, data visualization, and meta-rule guided mining. They also examined techniques for mining knowledge in different kinds of databases, including relational, transaction, object-oriented, spatial, and active databases, as well as global information systems [7]. Clustering is the most commonly used data mining technique for discovering patterns in the underlying data [8].
Sidhu et al. presented how clustering is carried out and surveyed its applications. They also provided a framework for the mixed-attribute clustering problem and showed how customer data can be clustered to identify high-profit, high-value, and low-risk customers [8].
Huang presented an algorithm, called k-modes, that extends the k-means paradigm to categorical domains. He introduced new dissimilarity measures to deal with categorical objects, replaced the means of clusters with modes, and used a frequency-based method to update modes during the clustering process so as to minimize the clustering cost function. Experiments on a very large health insurance data set consisting of half a million records and 34 categorical attributes showed that the algorithm is scalable in terms of both the number of clusters and the number of records [9].
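The two ingredients k-modes substitutes for k-means can be sketched briefly: a simple matching dissimilarity (count of attributes on which two categorical records differ) and a per-attribute mode as the cluster representative. This is a minimal illustration of those two ideas, not Huang's full algorithm; the function names are our own.

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Simple matching dissimilarity for categorical records:
    the number of attributes on which the two records differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def cluster_mode(records):
    """Mode of a cluster of categorical records: for each attribute,
    the most frequent category among the cluster's members."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))
```

For example, `matching_dissimilarity(('red', 'small'), ('red', 'large'))` is 1, and the mode of `[('red', 's'), ('red', 'm'), ('blue', 's')]` is `('red', 's')`; in the full algorithm the modes are updated from these frequency counts as objects move between clusters.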
Berkhin presented a survey concentrating on clustering algorithms from a data mining perspective [10].
In [11] the k-prototypes algorithm was proposed, which is based on the k-means paradigm but removes the numeric-data limitation while preserving its efficiency. In that algorithm, objects are clustered against k prototypes, and a method was developed to dynamically update the k prototypes in order to maximize the intra-cluster similarity of objects. In [12] the efficiency and scalability issues were addressed by proposing a data classification method that integrates attribute-oriented induction, relevance analysis, and the induction of decision trees. Such an integration leads to efficient, high-quality, multiple-level classification of large amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous and noisy data.
Antonie et al. presented some experiments on tumor detection in digital mammography. They investigated the use of different data mining techniques for anomaly detection and classification. Their experiments demonstrated the use and effectiveness of association rule mining in image categorization [13].
Kohavi et al. described the two most commonly used systems for the induction of decision trees for classification: C4.5 and CART. They highlighted the methods and the different decisions made in each system with respect to splitting criteria, pruning, noise handling, and other differentiating features [14]. Ma et al. proposed integrating classification and association rule mining techniques. The integration was done by focusing on mining a special subset of association rules, called class association rules (CARs), and an efficient algorithm was given to build a classifier based on the set of discovered CARs [15]. In [16] two new algorithms were presented for discovering association rules between items in a large database of sales transactions. The algorithms were fundamentally different from the previously known algorithms, and the authors showed how the best features of the two proposed algorithms could be combined into a hybrid algorithm, called AprioriHybrid. Agrawal et al. proposed an efficient algorithm that generates all significant association rules between items in a database; the algorithm incorporates buffer management and novel estimation and pruning techniques [17].
Yavas et al. proposed an algorithm for predicting the next inter-cell movement of a mobile user in a Personal Communication Systems network. The performance of the proposed algorithm was evaluated through simulation against two other prediction methods [18]. Nanopoulos et al. presented a new context for the interpretation of Web prefetching algorithms as Markov predictors. They identified the factors that affect the performance of Web prefetching algorithms and proposed a new algorithm, called WMo, which is based on data mining and was proved to be a generalization of existing ones [19].
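The "Markov predictor" view of Web prefetching can be illustrated with a first-order sketch: transition counts between consecutive page accesses are tallied from past sessions, and the most frequent successor of the current page is prefetched. This is only a toy first-order illustration of the general idea, not the WMo algorithm of [19]; the function names are our own.

```python
from collections import defaultdict, Counter

def build_markov_predictor(sessions):
    """First-order Markov predictor over page-access sessions:
    count transitions page -> next page, then predict (prefetch)
    the most frequent successor of the current page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1

    def predict(page):
        # Most frequent successor, or None if the page was never seen.
        successors = transitions[page]
        return successors.most_common(1)[0][0] if successors else None

    return predict
```

Given sessions `['A','B','C']`, `['A','B','D']`, `['A','B','C']`, the predictor returns `'B'` after `'A'` and `'C'` after `'B'`; higher-order and data-mining-based predictors generalize this by conditioning on longer access subsequences.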
References
[1] Bhise, R. B., S. S. Thorat, and A. K. Supekar. "Importance of data mining in higher education system." IOSR Journal of Humanities and Social Science (IOSR-JHSS) (2013). ISSN 2279-0837.
[2] Padhy, Neelamadhab, Dr. Mishra, and Rasmita Panigrahi. "The survey of data mining applications and feature scope." arXiv preprint arXiv:1211.5723 (2012).
[3] Sumathi, N., R. Geetha, and S. Sathiya Bama. "Spatial data mining techniques trends and its applications." Journal of Computer Applications 1, no. 4 (2008): 28-30.
[4] Shaikh, Yasmin, and Sanjay Tanwani. "Interactive temporal mining of workflow logs." (2013).
[5] http://en.wikipedia.org/wiki/Sequential_Pattern_Mining
[6] http://en.wikipedia.org/wiki/Intention_mining
[7] Han, Jiawei. "Data mining techniques." In ACM SIGMOD Record, vol. 25, no. 2, p. 545. ACM, 1996.
[8] Sidhu, Nimrat Kaur, and Rajneet Kaur. "Clustering in Data Mining."
[9] Huang, Zhexue. "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining." In DMKD, 1997.
[10] Berkhin, Pavel. "A survey of clustering data mining techniques." In Grouping Multidimensional Data, pp. 25-71. Springer Berlin Heidelberg, 2006.
[11] Huang, Zhexue. "Clustering large data sets with mixed numeric and categorical values." In Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 21-34. 1997.
[12] Kamber, Micheline, Lara Winstone, Wan Gong, Shan Cheng, and Jiawei Han. "Generalization and decision tree induction: efficient classification in data mining." In Research Issues in Data Engineering, 1997. Proceedings, Seventh International Workshop on, pp. 111-120. IEEE, 1997.
[13] Antonie, Maria-Luiza, Osmar R. Zaïane, and Alexandru Coman. "Application of Data Mining Techniques for Medical Image Classification." In Proceedings of the Second International Workshop on Multimedia Data Mining, MDM/KDD'2001, August 26th, 2001, San Francisco, CA, USA, pp. 94-101. 2001.
[14] Kohavi, Ronny, and J. Ross Quinlan. "Data mining tasks and methods: Classification: decision-tree discovery." In Handbook of Data Mining and Knowledge Discovery, pp. 267-276. Oxford University Press, Inc., 2002.
[15] Liu, Bing, Wynne Hsu, and Yiming Ma. "Integrating classification and association rule mining." In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD). 1998.
[16] Agrawal, Rakesh, and Ramakrishnan Srikant. "Fast algorithms for mining association rules." In Proc. 20th Int. Conf. Very Large Data Bases (VLDB), vol. 1215, pp. 487-499. 1994.
[17] Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. "Mining association rules between sets of items in large databases." In ACM SIGMOD Record, vol. 22, no. 2, pp. 207-216. ACM, 1993.
[18] Yavaş, Gökhan, Dimitrios Katsaros, Özgür Ulusoy, and Yannis Manolopoulos. "A data mining approach for location prediction in mobile environments." Data & Knowledge Engineering 54, no. 2 (2005): 121-146.
[19] Nanopoulos, Alexandros, Dimitrios Katsaros, and Yannis Manolopoulos. "A data mining algorithm for generalized web prefetching." Knowledge and Data Engineering, IEEE Transactions on 15, no. 5 (2003): 1155-1169.

The Apriori algorithm was first proposed by Agrawal in [20]. Apriori is more efficient during the candidate generation process [19]. It uses a breadth-first search strategy [25] to count the support of itemsets and uses a candidate generation function that exploits the downward closure property of support. Apriori uses pruning techniques to avoid measuring certain itemsets while guaranteeing completeness. The Apriori algorithm is based on the Apriori principle [24], which says that an itemset Y containing an itemset X can never be large if X is not large. Based on this principle, the Apriori algorithm generates a set of candidate large itemsets of length (k+1) from the large k-itemsets (for k >= 1) and eliminates those candidates that contain a subset which is not large. Then, among the remaining candidates, only those with support above the minimum support threshold are taken as large (k+1)-itemsets. Apriori generates itemsets using only the large itemsets found in the previous pass, without considering the transactions.
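The loop described above, joining large k-itemsets into (k+1)-candidates and pruning any candidate with an infrequent subset, can be sketched as a minimal Python implementation. This is an illustrative sketch of plain Apriori, not any of the improved variants discussed later; the function name and data layout are our own.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain Apriori sketch: large k-itemsets seed the (k+1)-candidates;
    downward closure prunes any candidate with an infrequent subset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Large 1-itemsets seed the first pass.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    large = {}  # all large itemsets found, mapped to their support
    k = 1
    while current:
        large.update({s: support(s) for s in current})
        # Candidate generation: join large k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Pruning: drop candidates with any k-subset that is not large.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return large
```

With transactions `[{'a','b'}, {'a','c'}, {'a','b','c'}, {'b'}]` and a minimum support of 0.5, the candidate `{'b','c'}` is eliminated (support 0.25), so `{'a','b','c'}` is pruned before its support is ever counted, which is exactly the saving the Apriori principle buys.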
Clustering with the Apriori algorithm: Zhiyu Zhang [9] used a clustering algorithm to first categorize the students and courses, and then used the Apriori algorithm to extract various hidden information from the large amount of education data. Chandrani Singh, Dr. Arpita Gopal, and Santosh Mishra [10] dealt with the extraction and analysis of faculty performance in the management discipline from student feedback using clustering and association rule mining techniques. First the faculty members are categorized based on student feedback data, and then the Apriori algorithm is applied to uncover the hidden trends in faculty performance and behaviour.
Matrix-based Apriori algorithm: Hong Liu and Yuanyuan Xia [11] used a matrix-based Apriori algorithm to extract and analyse the indicator scores in teaching evaluation data, found the indicator scores with high frequency, and then analyzed the strengths and weaknesses of the teaching in order to provide recommendations for improving teaching quality. This method does not repeatedly scan the database, thus reducing the I/O load.
Improved Apriori algorithm based on Tid sets: Qiang Yang and Yanhong Hu [12] used an improved Apriori algorithm to find the correlation rules between courses, which provided directive information for curriculum design. This algorithm needs to scan the original database only once when generating the candidate itemsets; it computes the support counts of the other candidate itemsets by counting the corresponding Tid sets rather than scanning the database repeatedly, which greatly reduces the access time.
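The Tid-set idea, one scan to record which transaction ids contain each item, after which any candidate's support is the size of an intersection of those id sets, can be sketched as follows. This is a generic illustration of Tid-set support counting, not the specific algorithm of Yang and Hu; the function name is our own.

```python
def tidset_supports(transactions, candidates):
    """Support counting via Tid-set intersection: a single pass builds
    the Tid set of every item; a candidate itemset's support count is
    the size of the intersection of its items' Tid sets, so the
    database is never scanned again."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)

    supports = {}
    for cand in candidates:
        it = iter(cand)
        common = set(tids.get(next(it), set()))
        for item in it:
            common &= tids.get(item, set())
        supports[frozenset(cand)] = len(common)
    return supports
```

For instance, over transactions `[{'a','b'}, {'a','c'}, {'a','b','c'}]` the Tid set of `b` is `{0, 2}` and of `c` is `{1, 2}`, so the support count of `{'b','c'}` is the size of their intersection, 1, obtained without rescanning the data.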
Improved Apriori algorithm based on a clipping technique: Jian Wang, Zhubin Lu, Weihua Wu, and Yuzhou Li [13] used an improved Apriori association rule algorithm to analyze the intrinsic links among various courses, dig out the precedence relationships and associations among the courses students take, reveal teaching regularities and problems from a large amount of data, and provide a strong basis for reasonable course setting. The improved algorithm uses a clipping technique to remove every candidate itemset in Ck that has a (k-1)-subset not belonging to Lk-1.
Improved Apriori algorithm based on logo-list intersection: Lanfang Lou, Qingxian Pan, and Xiuqin Qiu [14] proposed a novel association rule mining approach to improve the Apriori algorithm. The proposed approach uses the intersection operation to generate frequent itemsets. It differs from the existing algorithm in that it scans the database only once and then mines association rules from that single-scan representation. The technique has been implemented in a teaching evaluation system to strengthen the foundation of performance evaluation for teaching staff.
Improved Apriori algorithm based on a modified pruning process and flag bit:
Deng Jiabin, Hu JuanLi, Chi Hehua, and Wu Juebo [15] put forward an intelligent evaluation method based on an improved Apriori algorithm, which can be used to mine association rules at different levels and evaluate teaching quality automatically. The improvement concerns the handling of frequent itemsets: between Lk and Ck an auxiliary set Lk' is introduced, and when an itemset is found not to be frequent it is inserted into Lk' rather than deleted. To distinguish frequent from non-frequent itemsets, a flag bit is introduced into the itemsets: 1 marks a frequent itemset, 0 a non-frequent one. The verification and pruning processes are modified accordingly: when verifying the candidate set Ck, items are selected from Ck for verification, whereas during pruning the items are selected from Lk' as the pruning conditions, and Lk+1' is generated iteratively.
Different techniques have been implemented to address the shortcomings of the classic Apriori algorithm in education data mining. Although these improved algorithms can reduce the number of candidate itemsets or improve mining efficiency through pruning, they still cannot completely eliminate the generation of unnecessary candidate itemsets. Moreover, when mining long patterns from masses of education data, basic association rule mining is not a solution, as it produces a large number of candidate itemsets and consumes a great deal of memory and CPU time. Apart from this, setting an appropriate minimum support threshold is also an issue, as an unsuitable threshold may lead to too many or too few rules.
