Zhang Xizheng
Department of Computer Science, Hunan Institute of Engineering, Xiangtan, China, 411104
Z_X_Z2000@263.net
thesaurus collections are used to match the requirements. They are represented
as N = {n1, n2, …}, V = {v1, v2, …}, ADJ = {adj1, adj2, …}, and
ADV = {adv1, adv2, …} for noun, verb, adjective, and adverb, respectively.
Semantic rules obtained with the above approach are represented in IF–THEN
format and stored in the semantic rule database to indicate the inference
relationship between requirements and predefined phrases. After extraction,
customer requirements are represented as a set of phrases, which the following
module uses to build the classifiers.

3.2. Association rule mining

Association rule mining aims to discover relationships among a large set of
data items [7][8]. Let η = {i1, i2, …, ix, …, iy, …, im} be a set of items, and
T a set of database transactions, where T = {t1, t2, …, tQ}. Each transaction
tq (1 ≤ q ≤ Q) comprises a set of items and an assigned unique identifier. A
transaction tq is said to contain ix when ix ⊆ tq. Because association rule
learning excels at finding complex relationships among a huge number of semi-
or non-structured items, it is adopted here to acquire the rules. An
association rule is an implication of the form ix ⇒ iy, where ix ⊆ P, iy ⊆ C,
and ix ∩ iy = ∅; ix and iy correspond to requirement phrases and class labels,
respectively. The rule ix ⇒ iy holds in the transaction set T with support s%
and confidence c%, where s% is the probability P(ix ∩ iy) that ix and iy hold
together among all the transactions and c% is the conditional probability
P(iy | ix) that iy is true under the condition of ix. In general form, the
mined association rules are represented as follows:

a1 AND a2 AND … AND aH ⇒ b, with Support = s%; Confidence = c%

where phrase as = pi ⊆ P with 1 ≤ s ≤ H and 1 ≤ i ≤ M; class label b = lj ⊆ L
with 1 ≤ j ≤ N; and any two elements ax and ay of the precedent, where
1 ≤ x, y ≤ H and x ≠ y, satisfy ax ∩ ay = ∅. s% and c% are the support and
confidence levels of this rule and are calculated by Equations (2) and (3),
respectively:

$$ s\% = \frac{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \text{ AND } b)}{\mathrm{count}(T)} \times 100\% \qquad (2) $$

$$ c\% = \frac{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \text{ AND } b)}{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H)} \times 100\% \qquad (3) $$

where count(a1 AND a2 AND … AND aH AND b) is the number of transaction records
in database T containing all items a1, a2, …, aH as well as b, and count(T) is
the total number of data records contained in T. In addition, the set
{a1, a2, …, aH} is a non-empty subset of P, whereas the set {b} is a non-empty
subset of L.

The Apriori algorithm is adopted to acquire the classifiers [9]. After all the
frequent itemsets are discovered, association rules can be generated with the
corresponding support and confidence levels. Thus, classifiers are built using
association rule learning.

3.3. Classification of rules

Before the classifiers generated in subsection 3.2 can be used to classify the
rules, they must be pruned. Pruning is necessary to obtain timely and accurate
responses. Here, the CBA-CB algorithm is applied to produce the best
classifiers out of the whole set of rules. CBA-CB is based on the idea that
only those rules which are more general and hold high support as well as high
confidence levels are necessary for the classification task [9]. The
unnecessary rules should be pruned by database coverage. Two principles are
defined in CBA-CB as follows:

General rules. Given two rules rx ⇒ lj and ry ⇒ lj, where 1 ≤ j ≤ N, the first
rule is more general than the second one if rx ⊆ ry.

Precedence rank. Given two rules rx and ry, rx has a higher precedence than ry,
namely rx ≻ ry, if (1) the confidence of rx (con(rx)) is greater than that of
ry (con(ry)); or (2) con(rx) = con(ry), but the support of rx (sup(rx)) is
greater than that of ry (sup(ry)); or (3) con(rx) = con(ry) and
sup(rx) = sup(ry), but rx is generated earlier than ry.

The rule set R = {r1, r2, …, rM} comprises M rules. Each rule rm is pruned
according to the first principle. After pruning, general rules are selected and
stored in a pruned rule set, Rp. Rank all the rules in Rp in descending order
according to the second principle and then record the ranked rules in a rule
set RD. For each rule rm in RD, the phrases involved in its precedent part
comprise a set, PRrm, where PRrm ⊆ P. Let L(rm) represent the class label
associated with rm, where L(rm) = lj. Set a cover-count of zero for each
transaction record ts, namely L(ts) = 0. For each rule rm in RD, search all the
records in the transaction database. Any record ts that satisfies the condition
PRrm ⊆ Pts is selected. All the selected transaction records comprise a new set
T′, where T′ ⊆ T. Increase the
cover-count by one for all the transaction records in set T′. For any rule rm
in RD, if it satisfies the condition L(rm) = L(ts), where ts ∈ T′, it is put
into the filtered rule set, RF. Then delete the selected rule from set RD and
empty set T′. Finally, delete record ts from the transaction database if it
satisfies the condition L(ts) ≥ LTH, where LTH is a threshold for the
cover-count.

The pruned rules in set RF are the most significant and are finally selected as
classifiers to predict the class labels for new object customers. The object
customers comprise an object database O = {o1, o2, …, oK}. For object customer
k, the requirement record is described as a set of phrases, Pok, where Pok ⊆ P
and Pok ≠ Pts. Select the rules rm from set RF which satisfy the condition
PRrm ⊆ Pok, and put the selected rules in the classifier rule set RC. Group all
the rules in set RC based on their associated class labels. Suppose N groups
are generated, where G = {g1, g2, …, gN}. Each group gn is associated with a
class label, namely L(gn) = lj, 1 ≤ j ≤ N. Therefore, the classification based
on multiple classifiers can be formulated as follows:

$$ P^{o_k} \Rightarrow L(g_n), \quad 1 \le n \le N, \qquad \text{s.t.} \quad \sum_{m} \mathrm{con}(r_m) \ge \psi \;\big|\; \forall r_m,\ r_m \in g_n \qquad (4) $$

where Pok represents the requirement information of customer k; L(gn) is the
class label associated with the nth group gn; con(rm) is the confidence of rule
rm in RF; and ψ is the threshold for the confidence sum. Eq. (4) indicates that,
for the rules selected as classifiers, their associated class labels are
selected as recommended ones only if their confidence sum exceeds a certain
threshold. This enables multiple class labels to be identified based on strong
patterns, and thus suits problems where multiple recommendations are preferred
so that customers can make comparisons among a small set of similar products.

3.4. Performance test

The performance of the proposed recommendation system is tested and validated
via the accuracy measurement [8]. Suppose the test set T comprises S records,
where T = {ts}. For each record in set T, ts = {Pts, C(ts)}, the class labels
assigned by the classification module comprise a set Ĉ(ts). Then the
recommendation accuracy is computed, similarly to [8], as follows:

$$ e = \frac{\sum_{s=1}^{S} h_s}{S}, \qquad \text{s.t.} \quad h_s = \begin{cases} 1, & C(t_s) \in \hat{C}(t_s) \;\big|\; \forall t_s \in T \\ 0, & \text{else} \end{cases} \qquad (5) $$

where e is the recommendation accuracy, namely the percentage of the
transactions in the test set that are correctly classified; hs is a binary
variable that is set to one if transaction ts is correctly classified, and zero
otherwise.

4. Simulation Results

We have designed and implemented the proposed recommendation system in an
Internet programming environment. The system architecture is composed of three
tiers: client, application server, and database server. Five function modules
are implemented according to the system framework of the recommendation system.
In an Internet environment, we apply the ASP language to create dynamic and
interactive personalized pages for the Web site. To allow the rules to be
queried and managed efficiently, this research deploys Web database queries
running on an IIS server. Based on the historical sales records, a transaction
database comprising 2136 transaction records is established, as shown in
Table 1, where customer requirements are described as a set of phrases and the
corresponding class labels indicate the mobile phone that has been purchased.

Table 1. Part of transaction records
Record  Requirement phrases                                              Class label
t1      Stylish, colorful screen, light, black, elegant, high security   Nokia 8800se
t2      More functions, small, light, camera, large buttons              Dopod 586
t3      Larger screen, business function, Bluetooth, picture and video   Samsung SGH-D418
…       …                                                                …
t999    Small, stylish, cheap, video, friendly keypad, camera            VK 550
t1000   Entertainment support, slim, cheap, durable                      NEC N3105

Using the transaction database, the classifiers are identified. After pruning,
the pruned classifiers are used to assign the class labels for future customer
requirements. The best classifiers are generated after pruning with the CBA-CB
algorithm. The system performance has been validated using 500 test records. As
shown in Table 2, the second column indicates the mobile phone product that has
been purchased in each test record, and the third column lists the mobile phone
products that are recommended by the recommendation system using the
classifiers. In this case, 452 test records are correctly recommended,
resulting in a recommendation accuracy of 90.4%.

Table 2. Part of recommendation results
Record  Original class label  Recommended labels                               hs
t1      Nokia 7610            Nokia 3230, Nokia 7610                           1
t2      Samsung SGH-D508      Motorola E 680g, Samsung SGH-D720                0
t3      Nokia N72             Nokia N72, Motorola A 1200,                      1
                              Sony Ericsson W810c, Samsung SGH-D828
…       …                     …                                                …
t998    Nokia 7610            Nokia 7610, Nokia 8910                           1
t999    VK 550                VK 550, NEC N200                                 1
t1000   Motorola V3           Motorola V3, Nokia 6111                          1

After validation, the generated classifiers are stored in the classifier rule
database for further retrieval, update, and query. Online customers are allowed
to search for and buy mobile phone products via the Internet. By connecting to
the database through an ODBC proxy server, data and rules can be easily
retrieved to support the application server in processing the classification
tasks. The most valuable product alternatives are then identified and presented
to the online customers in HTML format. Supported by the associative
classification-based recommendation system, online customers are able to find
the mobile phone products that best match their requirements among the numerous
available mobile phones. This addresses the information overload problem and
thus improves the efficiency and effectiveness of business-to-customer
e-commerce.

5. Conclusions

In this paper, we propose a personalized recommendation system using
association rule mining and classification in e-commerce. Customer requirements
are extracted from text documents and transformed into a set of significant
phrases. A set of association rules is then mined from the database using the
Apriori algorithm. To enable timely and accurate responses, the CBA-CB
algorithm is applied to produce the best rules out of the whole set of rules.
The best classifiers are then generated after the test and validation of those
rules. These classifiers are used to predict the item labels for new customer
requirements and thus assign the corresponding class labels to the customer.
The system analysis and design of the proposed recommendation system, as well
as the results, are also presented.

References

[1] Liu Guo-rong, Zhang Xi-zheng, "Collaborative Filtering Based Recommendation
system for Product bundling", Proceeding of ICMSE'06, Lille, France,
pp. 251-254, 2006.
[2] Chen, H. C., Chen, A. L. P., "A music recommendation system based on music
data grouping and user interests", Proceedings of the ACM/CIKM, Atlanta, USA,
pp. 231-238, 2004.
[3] Francisco, G. S., Rafael, "An integrated approach for developing e-commerce
applications", Expert Systems with Applications, vol. 28, pp. 223-235, 2005.
[4] Han, P., Xie, B., Yang, F., "A scalable P2P recommender system based on
distributed collaborative filtering", Expert Systems with Applications,
vol. 27, pp. 203-210, 2006.
[5] Hull, D. A., "Improving text retrieval for the routing problem using latent
semantic indexing", Proceedings of 17th ACM international conference on
research and development in information retrieval, Dublin, Ireland,
pp. 282-289, 1999.
[6] Liu, B., Hsu, W., Ma, Y., "Integrating classification and association rule
mining", Proceedings of 4th international conference on knowledge discovery and
data mining, New York, USA, pp. 27-31, 1998.
[7] Wang, F.-H., Shaob, H.-M., "Effective personalized recommendation based on
time-framed navigation clustering and association mining", Expert Systems with
Applications, vol. 27, pp. 365-377, 2002.
[8] Yiyang Zhang, Jianxin (Roger) Jiao, "An associative classification-based
recommendation system for personalization in B2C e-commerce applications",
Expert Systems with Applications, vol. 33, pp. 357-367, 2007.
[9] Agrawal, R., Srikant, R., "Fast algorithms for mining association rules in
large databases", Proceedings of 20th international conference on very large
data bases, Santiago de Chile, pp. 487-499, 2003.
[10] Weng, S.-S., Liu, C.-K., "Using text classification and multiple concepts
to answer e-mails", Expert Systems with Applications, vol. 26, pp. 529-543,
2004.
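
As an illustration of the support and confidence measures in Eqs. (2) and (3) and of the precedence rank in Section 3.3, the following minimal Python sketch computes both measures for a few candidate rules and orders them. It is not the implementation used in the paper: the toy transactions, the labels phone_A and phone_B, and the function names support_confidence and rank_by_precedence are all hypothetical, and the candidate rules are listed by hand rather than mined with Apriori.

```python
def support_confidence(antecedent, label, transactions):
    """Return (support %, confidence %) of the rule `antecedent => label`.

    `transactions` is a list of (phrase_set, class_label) pairs.
    """
    total = len(transactions)
    # Records containing every antecedent phrase: count(a1 AND ... AND aH)
    with_antecedent = [lab for phrases, lab in transactions if antecedent <= phrases]
    # Of those, records that also carry the label: count(a1 AND ... AND aH AND b)
    with_both = sum(1 for lab in with_antecedent if lab == label)
    support = 100.0 * with_both / total
    confidence = 100.0 * with_both / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


def rank_by_precedence(rules):
    """Order rules by confidence, then support, both descending.

    sorted() is stable, so rules that tie on both keys keep their input order,
    i.e. the rule generated earlier ranks higher (condition 3).
    """
    return sorted(rules, key=lambda r: (-r["con"], -r["sup"]))


# Toy transaction database (hypothetical data, not taken from Table 1).
transactions = [
    ({"stylish", "light", "black"}, "phone_A"),
    ({"stylish", "light"}, "phone_A"),
    ({"stylish", "camera"}, "phone_B"),
    ({"small", "camera"}, "phone_B"),
]

# Hand-written candidate rules, in their assumed order of generation.
candidates = [
    ({"stylish"}, "phone_A"),
    ({"stylish", "light"}, "phone_A"),
    ({"camera"}, "phone_B"),
]

rules = []
for antecedent, label in candidates:
    sup, con = support_confidence(antecedent, label, transactions)
    rules.append({"if": antecedent, "then": label, "sup": sup, "con": con})

ranked = rank_by_precedence(rules)
for r in ranked:
    print(sorted(r["if"]), "=>", r["then"], f"sup={r['sup']:.1f}% con={r['con']:.1f}%")
```

In this toy data the rule with antecedent {stylish, light} and the rule with antecedent {camera} tie on confidence and support, so their relative order is decided by generation order alone, while the more general rule {stylish} ⇒ phone_A ranks last because its confidence is lower.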
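
The confidence-sum classification of Eq. (4) and the accuracy measure of Eq. (5) can be sketched in the same spirit. The rules, confidence values, threshold psi, and the function names recommend and accuracy below are assumptions made for illustration only; confidences are written as integer percentages, and the final line simply reproduces the arithmetic behind the reported 452 correct recommendations out of 500 test records.

```python
from collections import defaultdict


def recommend(phrases, rules, psi):
    """Return the class labels whose matching rules reach the confidence-sum threshold.

    `rules` is a list of (antecedent_set, class_label, confidence_percent) triples.
    """
    conf_sum = defaultdict(float)
    for antecedent, label, con in rules:
        if antecedent <= phrases:      # rule applies: PRrm is a subset of Pok
            conf_sum[label] += con     # group by class label and sum confidences
    return {label for label, total in conf_sum.items() if total >= psi}


def accuracy(test_records, rules, psi):
    """Fraction of test records whose purchased label is among the recommendations."""
    hits = [1 if label in recommend(phrases, rules, psi) else 0
            for phrases, label in test_records]
    return sum(hits) / len(hits)


# Hypothetical filtered rule set RF and threshold (not taken from the paper).
rules = [
    ({"stylish"}, "phone_A", 60),
    ({"light"}, "phone_A", 50),
    ({"camera"}, "phone_B", 90),
    ({"stylish"}, "phone_B", 30),
]
psi = 100

test_records = [
    ({"stylish", "light"}, "phone_A"),   # phone_A sums to 60 + 50 = 110, a hit
    ({"stylish", "camera"}, "phone_A"),  # only phone_B reaches psi (90 + 30), a miss
]

print(recommend({"stylish", "light"}, rules, psi))
print(accuracy(test_records, rules, psi))
print(f"{100 * 452 / 500:.1f}%")  # accuracy reported in Section 4
```

Because a label is returned whenever its group reaches the threshold, a single customer can receive several recommended labels, which matches the multiple-recommendation behaviour described for Eq. (4).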