International Journal of Advanced Engineering Research and Technology (IJAERT)

Volume 2 Issue 2, May 2014, ISSN No.: 2348 8190


www.ijaert.org
A Retrieval Strategy for Case-Based Reasoning using USIMSCAR for
Hierarchical Case
Daxa k. Patel
Department of Computer Science of Engineering and Technology, PIET, Limda
Gujarat Technology University
Vadodara, India
Abstract: In case-based reasoning (CBR), retrieval is one of the most
important phases, because the overall effectiveness of a CBR system
depends on it. To solve a target problem, useful cases must be
retrieved from the case base. To perform retrieval, CBR systems
typically exploit similarity knowledge; this is called similarity-
based retrieval (SBR). Similarity measures are used in SBR to
approximate the usefulness of cases with respect to the target
problem. In this paper, we propose and develop a retrieval strategy
for hierarchical cases. First, it combines the multilevel
Support-Count and Bit-from (SC-BF) algorithm with the soft-matching
criterion to generate soft-matching class association rules. Second,
it applies the unified knowledge of similarity and soft-matching class
association rules (USIMSCAR) to improve the performance of SBR. The
association rules are generated using association rule mining
techniques.
Keywords: CBR, association knowledge (AK), association rule mining
(ARM), case-based reasoning, multilevel association rules, data mining
I. INTRODUCTION
The fundamental premise of case-based reasoning (CBR) is that
experience in the form of past cases can be leveraged to solve new
problems [1]. An individual experience is called a case, and a
collection of cases is stored in the case base. Typically, each case
is described by a problem description and the corresponding solution
description. Among the four typical phases of CBR (retrieve, reuse,
revise, and retain), retrieval is the key phase, since the success of
a CBR system relies heavily on retrieval performance [2]. Its aim is
to retrieve useful or relevant cases that can be successfully used to
solve a target problem. If the retrieved cases are not useful, the CBR
system may not eventually produce a suitable solution to the problem.
Typically, retrieval is achieved through a specific strategy that
leverages similarity knowledge (SK), called similarity-based retrieval
(SBR) [2]. In SBR, SK is used to estimate the usefulness of stored
cases with respect to the target problem. SK is usually encoded via
similarity measures between the problem and the stored cases. Using
these measures, SBR finds cases ranked by their similarity to the
problem, and their solutions are then used to solve the problem.
However, SBR has two main problems. First, SBR depends too heavily on
domain experts to define SK in practice [3]. No clear methodology or
general approach that supports modelling such measures in an
intelligent way has been developed yet. Thus, defining SK is still
complicated, time-consuming, and hard in practice. Second, the
similarity measure is very often static, so the same definition is
likely to be applied uniformly to all target problems.
In this paper, we propose an association analysis of hierarchical
cases. Association knowledge (AK) represents strongly evident,
interesting relationships between known problem features and solutions
shared by a large number of cases. The aim of retrieval in this paper
is to retrieve a combined set of cases and rules relevant to the
target problem, where relevance is quantified by integrating
similarity knowledge and association knowledge. Association knowledge
is dynamic in that, depending on the characteristics of the target
problem, a different best set of rules can be chosen and leveraged for
the retrieval process. This is the key strength of the unified
knowledge of similarity and soft-matching class association rules
(USIMSCAR). For USIMSCAR to enable retrieval with hierarchical cases,
two issues must be addressed: the first is how to formalize similarity
measures encoding similarity knowledge, and the second is how to
generate soft-matching class association rules (SCARs) encoding
association knowledge. A similarity measure for hierarchical cases has
to be able to adequately compute the similarity between cases at the
same level or at different levels.
The generation of SCARs from hierarchical cases requires a mechanism
that discovers frequent itemsets from cases at different levels. This
issue may be addressed by an algorithm that extends Apriori to mine
multilevel association rules. The multilevel Support-Count and
Bit-from (SC-BF) algorithm was proposed with the aim of finding
frequent itemsets at the topmost level and then progressively
deepening the mining process into their frequent descendants at lower
concept levels. Therefore, by integrating the SC-BF algorithm and the
soft-matching criterion, we generate SCARs from hierarchical cases.
II. RELATED WORK
Similarity-based retrieval has been widely used in different
case-based reasoning applications, such as medical diagnosis [4], [5],
IT service management [6], product recommendation [7], and personnel
rostering decisions [8], to
predict similar cases that hold an appropriate solution for the target
problem. Similarity-based retrieval is typically achieved through
k-nearest neighbour (k-NN) retrieval [2]. The idea of k-NN is to
retrieve the k cases most similar to the target problem. A limitation
of k-NN is that it allows irrelevant attributes to influence the
similarity computation.
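As an illustration of k-NN retrieval, the following minimal Python sketch ranks stored cases by similarity to a target problem and returns the top k. The case base, the attribute names, and the similarity function (one over one plus the mean absolute difference) are assumptions chosen for the example, not taken from the cited systems.

```python
from typing import Dict, List, Tuple

Case = Dict[str, float]  # problem description: attribute -> normalised value


def similarity(query: Case, case: Case) -> float:
    """Illustrative similarity in (0, 1]: 1 / (1 + mean absolute difference)."""
    diffs = [abs(query[a] - case[a]) for a in query]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))


def knn_retrieve(query: Case,
                 case_base: List[Tuple[Case, str]],
                 k: int) -> List[Tuple[Case, str]]:
    """Return the k (problem, solution) pairs most similar to the query."""
    ranked = sorted(case_base,
                    key=lambda pair: similarity(query, pair[0]),
                    reverse=True)
    return ranked[:k]


# Hypothetical case base: (problem description, solution).
case_base = [
    ({"price": 0.2, "safety": 0.9}, "acceptable"),
    ({"price": 0.8, "safety": 0.3}, "unacceptable"),
    ({"price": 0.3, "safety": 0.8}, "acceptable"),
]

top = knn_retrieve({"price": 0.25, "safety": 0.85}, case_base, k=2)
solutions = [solution for _, solution in top]
```

Because every attribute contributes equally to `similarity`, an irrelevant attribute would shift the ranking just as much as a relevant one, which is the limitation noted above.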
Approaches integrating data mining and k-NN have often been applied in
CBR research to improve k-NN through two main schemes. The first
integrates feature selection (FS) or feature weighting (FW) into k-NN.
In this context, FS is used to choose the relevant features of the
cases [5], [8], FW is applied to estimate optimal weights for the
original features of cases [9], [10], or the two are combined to
leverage the advantages of both [4]. The second scheme combines data
clustering with k-NN, where the structure of the clustered cases is
leveraged to guide retrieval toward more relevant cases [11], [12].
Given the case base, a set of clusters is constructed, where each
cluster represents a group of relevant cases. For case retrieval, the
similarity between the target problem and each case is combined with
the relevance of the cluster containing that case.
An improved clustering technique has been applied in a case-based
reasoning decision-making system [13]. During the setup stage of the
case library, the test result sets are clustered by the improved
CURE_KNN algorithm to identify the central points and set up indexes
of these subsets. In the retrieval process, the distance between the
target case and each center is compared to select the subset with the
largest similarity. This approach is particularly effective for the
retrieval and maintenance of a large case library: it ensures
retrieval efficiency and quality, and it overcomes the low efficiency
of nearest-neighbour retrieval when searching large-scale case
databases [13]. Inductive indexing and nearest-neighbour techniques
can also be combined in the retrieval phase: inductive indexing is
used to retrieve the set of matching cases, and nearest-neighbour
retrieval is then used to rank the cases in that set according to
their similarity to the target case.
III. BACKGROUND OF SIMILARITY KNOWLEDGE AND
ASSOCIATION KNOWLEDGE
A. Background of Similarity Knowledge
In the CBR context, similarity knowledge is encoded via a measure that
computes the similarity between the target problem Q and the stored
cases. The higher the similarity between Q and a case C, the more
useful C is for Q. Such measures commonly follow the local-global
principle, which decomposes the similarity measure into local
similarities for the individual attributes of cases and a global
similarity that aggregates these local similarities [14]. A global
similarity function can be arbitrarily complex, but simple functions
such as weighted average aggregation are usually used [14].
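The local-global principle with weighted average aggregation can be sketched as follows. The attribute names, the two local similarity functions, and the weights are illustrative assumptions; the paper does not specify them.

```python
from typing import Dict


def local_numeric(q: float, c: float, value_range: float) -> float:
    """Local similarity for a numeric attribute, scaled by its value range."""
    return 1.0 - abs(q - c) / value_range


def local_symbolic(q: str, c: str) -> float:
    """Local similarity for a symbolic attribute: exact match or nothing."""
    return 1.0 if q == c else 0.0


def global_similarity(local_sims: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted average of the local similarities."""
    total_weight = sum(weights[a] for a in local_sims)
    return sum(weights[a] * s for a, s in local_sims.items()) / total_weight


# Hypothetical target problem Q and stored case C.
query = {"doors": 4, "safety": "high"}
case = {"doors": 2, "safety": "high"}

local_sims = {
    "doors": local_numeric(query["doors"], case["doors"], value_range=4.0),
    "safety": local_symbolic(query["safety"], case["safety"]),
}
weights = {"doors": 1.0, "safety": 3.0}  # assumed expert-defined weights

sim = global_similarity(local_sims, weights)  # (1*0.5 + 3*1.0) / 4 = 0.875
```

The weights are where domain expertise enters: changing them changes which cases rank highest, which is why defining similarity knowledge by hand is costly.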
B. Background of AK
AK aims to represent evidently interesting relationships shared by a
large number of relevant stored cases, using a combination of data
mining (DM) techniques: ARM [15], class ARM [16], and soft-matching
ARM (SARM) [17].
1) ARM: ARM aims to mine certain interesting relationships, called
associations, in a transaction database [15]. It focuses on
discovering sets of frequently co-occurring features shared by a large
number of records in the database. In the CBR context, ARM can be used
to discover interesting relationships from a given case base: a
transaction and an item can be seen as a case and an attribute-value
pair, respectively. Apriori [15] is one of the traditional ARM
algorithms. Interestingness measures are useful for evaluating the
quality of, and ranking, the large number of association rules (ARs)
extracted [18]. The main problem of ARM is to generate the association
rules whose support is not less than a user-defined minimum support
(minsupp) and whose confidence is not less than a user-defined minimum
confidence (minconf).
2) Class ARM: Class association rules (CARs) [16] are a special subset
of ARs whose consequents are restricted to a single target variable.
In the CBR context, a CAR is an AR whose consequent holds an item
formed by the pair of the solution attribute and its value.
3) SARM: Consider a rule X → Y. A limitation of traditional ARM
algorithms (e.g., Apriori [15]) is that the itemsets X and Y are
discovered based on the equality relation. When dealing with items
that are similar to each other, these algorithms may perform poorly.
For example, in a supermarket sales database, Apriori cannot find
rules such as "80% of the customers who buy products similar to milk
(e.g., cheese) and products similar to eggs (e.g., mayonnaise) also
buy bread." To address this issue, the soft-matching criterion was
proposed [17], where the antecedents and consequents of ARs are found
by similarity assessment. Using this criterion, the problem of SARM is
to find all rules of the form X → Y whose soft support and soft
confidence are not less than minsupp and minconf, respectively. Soft
support and soft confidence generalize support and confidence by
allowing items to match as long as their similarity exceeds the
user-specified minimum similarity minsim.
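The difference between exact and soft matching can be made concrete with a small Python sketch built on the supermarket example. It is a simplified reading of the soft-matching criterion, not the definition from [17]: a transaction is assumed to soft-contain an itemset when every item of the itemset has some item in the transaction with similarity at least minsim. The item similarity table and the transactions are hypothetical.

```python
from typing import Callable, FrozenSet, List

Itemset = FrozenSet[str]
Sim = Callable[[str, str], float]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions containing every item (equality relation)."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """support(X u Y) / support(X); 0.0 if X never occurs."""
    sx = support(x, transactions)
    return support(x | y, transactions) / sx if sx else 0.0


def soft_contains(t: Itemset, itemset: Itemset, sim: Sim, minsim: float) -> bool:
    """Assumed criterion: each item has a sufficiently similar item in t."""
    return all(any(sim(i, j) >= minsim for j in t) for i in itemset)


def soft_support(itemset: Itemset, transactions: List[Itemset],
                 sim: Sim, minsim: float) -> float:
    hits = sum(1 for t in transactions if soft_contains(t, itemset, sim, minsim))
    return hits / len(transactions)


def soft_confidence(x: Itemset, y: Itemset, transactions: List[Itemset],
                    sim: Sim, minsim: float) -> float:
    sx = soft_support(x, transactions, sim, minsim)
    return soft_support(x | y, transactions, sim, minsim) / sx if sx else 0.0


# Hypothetical item similarities; unlisted pairs are treated as dissimilar.
SIM_TABLE = {("milk", "cheese"): 0.8, ("eggs", "mayonnaise"): 0.7}


def item_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return SIM_TABLE.get((a, b), SIM_TABLE.get((b, a), 0.0))


transactions = [
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"cheese", "mayonnaise", "bread"}),
    frozenset({"cheese", "eggs"}),
    frozenset({"juice", "bread"}),
]
x, y = frozenset({"milk", "eggs"}), frozenset({"bread"})

exact_supp = support(x | y, transactions)    # only the first transaction matches
exact_conf = confidence(x, y, transactions)
soft_supp = soft_support(x | y, transactions, item_sim, minsim=0.7)
soft_conf = soft_confidence(x, y, transactions, item_sim, minsim=0.7)
```

With exact matching the rule {milk, eggs} → {bread} is supported by one transaction in four; with soft matching the cheese-and-mayonnaise transaction also counts, so the soft support doubles. Setting minsim to 1.0 reduces the soft measures to the exact ones.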
IV. EXISTING SYSTEM
In case-based reasoning, a case represents a problem-solving
experience from the past. A case is structured into two main parts.
The first is the problem part, which contains the description
characterizing the past problem. The second is the solution part,
which contains the description of a suitable solution for the
described problem. To represent cases formally, many CBR systems adopt
the well-known attribute-value pair representation. The existing
system consists of two algorithms: soft-matching class association
rule (SCAR) generation and the unified knowledge of similarity and
soft-matching class association rules (USIMSCAR). In the
SCAR algorithm, soft-matching class association rules are generated
from the frequent itemsets (the soft-matching criterion is described
in Section III). In the second algorithm, similarity knowledge and
association knowledge are combined to improve the performance of
similarity-based retrieval (SBR).
V. AK REPRESENTATION
This section explains the proposed system. The CBR process is the same
as before, but the case base uses a structural, hierarchical
representation. The hierarchical representation represents each case
at multiple levels of abstraction. It is a simple extension of the
attribute-value pair representation that allows the description of
cases with a complex hierarchical structure [23].
To use hierarchical cases with USIMSCAR, two issues must be addressed:
the first is how to formalize similarity measures encoding similarity
knowledge, and the second is how to generate SCARs encoding
association knowledge. The similarity measure for hierarchical cases
has to be able to adequately compute the similarity between cases at
the same level or at different levels. The generation of SCARs from
hierarchical cases requires a mechanism that discovers frequent
itemsets from cases at different levels. We use the multilevel
Support-Count and Bit-from (SC-BF) algorithm, which finds frequent
itemsets at the topmost level and then progressively deepens the
mining process into their frequent descendants at lower concept
levels. By integrating the SC-BF algorithm and the soft-matching
criterion, we generate SCARs from hierarchical cases.
This section presents our approach for extracting and representing
association knowledge using the techniques described in Section III.
The aims of building association knowledge are: 1) to represent
strongly evident associations between known problem features and
solutions in the given case base, and 2) to combine these associations
with SK in the unified knowledge of similarity and soft-matching class
association rules (USIMSCAR).
Multilevel association rules are mined efficiently using concept
hierarchies and the soft-matching criterion. A concept hierarchy
defines a sequence of mappings from a set of low-level concepts to
higher-level concepts [19]. Using the concept hierarchies, frequent
itemsets are first retrieved from the case base at the same level or
at different levels, and the soft-matching criterion is then applied
to generate the soft-matching class association rules. A SCAR is an
implication of the form X → y, where X is a frequent itemset
representing problem features that occur frequently and are discovered
from the case base by the soft-matching criterion, and y is the
solution item. A SCAR X → y thus implies that the target problem Q is
likely to be associated with the solution contained in y if Q's
problem features are sufficiently similar to X.
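How a SCAR X → y might be checked against a target problem Q can be sketched in Python as follows. The `Scar` class, the value similarity table, the attribute names, and the rule that every antecedent item must be matched with similarity at least minsim are assumptions made for illustration; the paper does not give this procedure in detail.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scar:
    """Hypothetical container for a rule X -> y."""
    antecedent: Dict[str, str]  # X: attribute -> value
    solution: str               # y: value of the solution attribute


# Assumed similarities between attribute values; unlisted pairs score 0.
VALUE_SIM = {("high", "vhigh"): 0.8, ("low", "med"): 0.6}


def value_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return VALUE_SIM.get((a, b), VALUE_SIM.get((b, a), 0.0))


def rule_matches(query: Dict[str, str], rule: Scar, minsim: float) -> bool:
    """True if every item of X is sufficiently similar to Q's value."""
    return all(
        attr in query and value_sim(query[attr], value) >= minsim
        for attr, value in rule.antecedent.items()
    )


def suggested_solutions(query: Dict[str, str], rules: List[Scar],
                        minsim: float) -> List[str]:
    """Solutions of all rules whose antecedent soft-matches the query."""
    return [r.solution for r in rules if rule_matches(query, r, minsim)]


rules = [
    Scar({"buying": "high", "safety": "low"}, "unacc"),
    Scar({"safety": "high"}, "acc"),
]
query = {"buying": "vhigh", "safety": "low", "doors": "4"}

matched = suggested_solutions(query, rules, minsim=0.7)
```

Here the first rule applies even though Q has buying = vhigh rather than high, because the two values are assumed to be similar enough; the second rule does not apply because the safety values differ.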
A concept hierarchy is represented as a tree with root D. The
hierarchy information is used to build an encoded transaction table,
which is mined instead of the original transaction table. This is
because a DM query is usually relevant to only a portion of the
transaction database rather than to all of its items. It is beneficial
to first gather the relevant set of data and then work repeatedly on
this task-relevant set [20], [21]. Encoding can be performed while the
task-relevant data are gathered, so no extra encoding pass is needed.
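The idea of mining an encoded transaction table top-down can be sketched as below. This is not the SC-BF algorithm itself, whose details are not given here; it only illustrates progressive deepening for single items. It assumes each item is encoded as a digit string in which the first l characters identify its ancestor at level l, and it uses one minimum count for all levels. The codes and transactions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def frequent_items_by_level(transactions: List[Set[str]],
                            depth: int,
                            min_count: int) -> Dict[int, Set[str]]:
    """Frequent encoded items per level, descending only into frequent parents."""
    frequent: Dict[int, Set[str]] = {}
    for level in range(1, depth + 1):
        counts: Counter = Counter()
        for t in transactions:
            # Generalise each item to this level; the set avoids counting
            # the same concept twice within one transaction.
            prefixes = {code[:level] for code in t if len(code) >= level}
            if level > 1:
                # Progressive deepening: keep only children of frequent parents.
                prefixes = {p for p in prefixes if p[:-1] in frequent[level - 1]}
            counts.update(prefixes)
        frequent[level] = {p for p, n in counts.items() if n >= min_count}
    return frequent


# Hypothetical encoding: "1" dairy ("11" milk, "12" cheese),
# "2" bakery ("21" bread, "22" bagel), "3" drinks ("31" juice).
encoded = [
    {"11", "21"},
    {"12", "21"},
    {"11", "22"},
    {"31"},
]

result = frequent_items_by_level(encoded, depth=2, min_count=2)
```

Because "3" is infrequent at level 1, its descendant "31" is never counted at level 2, which is how the top-down order prunes the search.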
VI. RESULTS AND ANALYSIS
This section compares the existing method and the proposed method in
terms of accuracy and execution time. For the Car Evaluation database,
the total time taken to retrieve frequent itemsets from the
hierarchical cases, generate the soft-matching rules from the frequent
itemsets, and finally generate the association rules is less than that
of the existing method. Accuracy is also improved with hierarchical
cases. Table I shows the accuracy of the existing method and the
proposed method.
TABLE I
Attributes vs. Accuracy (%) for Car Dataset

Attributes   Existing Algorithm   Modified Method
             (USIMSCAR)           (Hierarchical USIMSCAR)
1            15                   20
2            15                   23
3            24                   33
4            24                   30
5            30                   38
6            30                   37
7            60                   65
8            24                   29
The graphical representation for the Car dataset is shown in Fig. 1.
Fig. 1. Accuracy for Car Dataset
The graph clearly shows that the accuracy of the proposed algorithm is
improved. Execution time is also improved in
the proposed method. Table II contains the execution times of the
existing method and the proposed method. The proposed method takes
less time to run for each number of attributes entered by the user.
TABLE II
Attributes vs. Time for Car Dataset

Attributes   Existing Algorithm   Modified Method
             (USIMSCAR)           (Hierarchical USIMSCAR)
1            1248426              1035337
2            1376311              0837051
3            1182619              0661093
4            1275880              0970096
5            1518538              0543652
6            1211486              0627721
7            1218748              0457310
8            0518805              0425308
Fig. 2 shows the graphical representation of the execution time.
Fig. 2. Execution Time for Car Dataset
The results were also analysed for the Camera database and the PC
database. Table III contains the accuracy of the existing method and
the proposed method for the PC dataset.
TABLE III
Attributes vs. Accuracy (%) for PC Dataset

Attributes   Existing Algorithm   Modified Method
             (USIMSCAR)           (Hierarchical USIMSCAR)
1            51                   60
2            32                   38
3            43                   49
4            43                   50
Fig. 3 shows the graphical representation for the PC dataset. Table IV
contains the execution times for the PC dataset. The proposed
algorithm takes less time to execute, so it improves both execution
time and accuracy on the experimental datasets.
Fig. 3. Accuracy for PC Dataset
TABLE IV
Attributes vs. Time for PC Dataset

Attributes   Existing Algorithm   Modified Method
             (USIMSCAR)           (Hierarchical USIMSCAR)
1            1639146              0629975
2            0745293              0395333
3            0694610              0376362
4            0736399              0384672
The graphical representation for the execution time for the PC
Database is shown in Fig. 4.
Fig. 4. Execution Time for PC Dataset
VII. ADVANTAGES OF PROPOSED SYSTEM
The proposed system takes less memory to store the data because the
data are grouped at the branch level, which also reduces the execution
time. It also improves accuracy.
VIII. CONCLUSIONS AND FUTURE WORK
This paper presents case-based reasoning for hierarchically structured
cases. First, frequent itemsets are retrieved from the hierarchical
cases using the SC-BF algorithm, and the soft-matching criterion is
then applied to generate SCARs. Second, association knowledge and
similarity knowledge are combined to improve similarity-based
retrieval. The advantages of using the SC-BF algorithm are that it
takes less memory to store the data, because the data are grouped at
the branch level, and that it reduces the execution time. It also
improves accuracy, and thus the overall performance of case-based
reasoning.
As future work, USIMSCAR could also be extended for
cases with complex structures such as object-oriented and
semantic web-based cases [2], [22]. For USIMSCAR to run
with such cases, two issues must be addressed: 1) how to
define similarity measures for the cases; and 2) how to
formalize AK from the cases.
ACKNOWLEDGMENT
I would like to express my deepest appreciation to Rahul Joshi, who
has guided me, for the support and motivation he has provided. He has
always been willingly present whenever I needed the slightest support
from him. I would not like to miss the chance to thank him for the
time he spared for me from his extremely busy schedule.
REFERENCES
[1] R. Lopez De Mantaras, D. McSherry, D. Bridge, D. Leake, B.
Smyth, S. Craw, B. Faltings, M. L. Maher, M. T. Cox, K.
Forbus, M. Keane, A. Aamodt, and I. Watson, "Retrieval, reuse,
revise and retention in CBR," Knowl. Eng. Rev., vol. 20,
no. 3, pp. 215–240, 2005.
[2] Y. Guo, J. Hu, and Y. Peng, "Research on CBR system based on
data mining," Appl. Soft Comput., vol. 11, no. 8,
pp. 5006–5014, 2011.
[3] H. Ahn and K.-J. Kim, "Global optimization of case-based
reasoning for breast cytology diagnosis," Expert Syst. Appl.,
vol. 36, no. 1, pp. 724–734, 2009.
[4] B. Pandey and R. Mishra, "Case-based reasoning and data
mining integrated method for the diagnosis of some
neuromuscular disease," Int. J. Med. Eng. Informat., vol. 3,
no. 1, pp. 1–15, 2011.
[5] Y.-B. Kang, A. Zaslavsky, S. Krishnaswamy, and C. Bartolini,
"A knowledge-rich similarity measure for improving IT incident
resolution process," in Proc. ACM Symp. Appl. Comput., 2010,
pp. 1781–1788.
[6] F. Lorenzi and F. Ricci, "Case-based recommender systems: A
unifying view," in Intelligent Techniques for Web
Personalization, vol. 3169. Berlin, Germany: Springer, 2005,
pp. 89–113.
[7] A. Aamodt and E. Plaza, "Case-based reasoning: Foundational
issues, methodological variations, and system approaches," AI
Commun., vol. 7, pp. 39–59, Mar. 1994.
[8] G. R. Beddoe and S. Petrovic, "Selecting and weighting features
using a genetic algorithm in a case-based reasoning approach to
personnel rostering," Eur. J. Oper. Res., vol. 175, no. 2,
pp. 649–671, 2006.
[9] K. Bradley and B. Smyth, "Personalized information ordering:
A case study in online recruitment," Knowl.-Based Syst.,
vol. 16, nos. 5–6, pp. 269–275, 2003.
[10] C. M. Vong, P. K. Wong, and W. F. Ip, "Case-based
classification system with clustering for automotive engine
spark ignition diagnosis," in Proc. 9th Int. Conf. Comput. Inf.
Sci., Aug. 2010, pp. 17–22.
[11] F. Azuaje, W. Dubitzky, N. Black, and K. Adamson,
"Discovering relevance knowledge in data: A growing cell
structures approach," IEEE Trans. Syst., Man, Cybern. B,
Cybern., vol. 30, no. 3, pp. 448–460, Jun. 2000.
[12] Z. Y. Zhuang, L. Churilov, F. Burstein, and K. Sikaris,
"Combining data mining and CBR for intelligent decision
support for pathology ordering by general practitioners," Eur. J.
Oper. Res., vol. 195, no. 3, pp. 662–675, 2009.
[13] L. Tong and D. Wu, "Research on optimization of case-based
reasoning system," 3rd International Conference on Control,
Automation and System Engineering, 2013.
[14] A. Stahl, "Learning of knowledge-intensive similarity measures
in case-based reasoning," Ph.D. dissertation, Artificial
Intelligence Knowledge-Based Systems Research Group, Tech.
Univ. Kaiserslautern, Kaiserslautern, Germany, 2003.
[15] R. Agrawal, T. Imielinski, and A. Swami, "Mining association
rules between sets of items in large databases," ACM SIGMOD
Rec., vol. 22, no. 2, pp. 207–216, Jun. 1993.
[16] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and
association rule mining," in Knowledge Discovery and Data
Mining. Berlin, Germany: Springer, 1998, pp. 80–86.
[17] U. Y. Nahm and R. J. Mooney, "Using soft-matching mined rules
to improve information extraction," in Proc. AAAI Workshop
Adaptive Text Extract. Mining, 2004, pp. 27–32.
[18] L. Geng and H. J. Hamilton, "Interestingness measures for data
mining: A survey," ACM Comput. Surv., vol. 38, no. 3, Article
9, Sep. 2006.
[19] H. Ravi Sankar and M. M. Naidu, "An innovative algorithm for
mining multilevel ARs," Proceedings of the 25th IASTED
International Multi-Conference AI and Applications,
February 12–14, 2007, Innsbruck, Austria.
[20] P. Stanišić and S. Tomović, "Apriori multiple algorithm
for mining association rules," Information Technology
and Control, vol. 37, pp. 311–320, 2008.
[21] M. Kaya and R. Alhajj, "Mining multi-cross-level fuzzy
weighted association rules," Second IEEE International
Conference on Intelligent Systems, vol. 1, pp. 225–230, 2004.
[22] V. Nebot and R. Berlanga, "Mining association rules from
semantic web data," in Proc. 23rd Int. Conf. Ind. Eng. Appl.
Appl. Intell. Syst., 2010, pp. 504–513.
[23] Y. B. Kang, S. Krishnaswamy, and A. Zaslavsky, "A retrieval
strategy for case-based reasoning using similarity and
association knowledge," IEEE, March 20, 2013.
