
Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing

Building Personalized Recommendation System in E-commerce Using Association Rule-based Mining and Classification

Zhang Xizheng
Department of Computer Science, Hunan Institute of Engineering, Xiangtan, China, 411104
Z_X_Z2000@263.net

0-7695-2909-7/07 $25.00 © 2007 IEEE
DOI 10.1109/SNPD.2007.548

Abstract

Thanks to the convenience of the Internet, people can search for whatever information they need and buy whatever they want on the web. In the age of e-commerce, however, it is difficult to help customers find the most valuable products that match their heterogeneous needs. Traditional approaches to this so-called personalization problem use pre-defined formats to describe customer requirements. This tends to distort the elicited requirement information and thus leads to inaccurate recommendations. In this paper, we propose a personalized recommendation system for e-commerce based on association rule mining and classification. Customer requirements are extracted from text documents and transformed into a set of significant phrases. A set of association rules is then mined from the transformed transaction records using the Apriori algorithm. The CBA-CB algorithm is applied to select the best rules out of the whole rule set. After these rules are tested and validated, the best classifiers are generated. These classifiers predict the item labels for new customer requirements and thus assign the corresponding class labels to the customer. The paper also presents the system analysis and design of the proposed recommendation system and the implementation of a prototype.

1. Introduction

The advent of e-commerce, driven by the rapid development of the Internet and the accompanying adoption of the Web, has greatly increased the chances to create business opportunities and to reach customers more easily. This 24/7 online accessibility has enlarged the range of choices, but customers now face information overload. Considerable effort is required to retrieve information that matches their preferences. What is needed is an automated and sophisticated recommendation system that suggests personalized information in a brief form, without an annoying search process. Diverse recommendation systems have been proposed for different businesses to solve the personalization problem by guiding customers to the products they would like to purchase [1][2].

Mooney presented one of the earliest recommendation systems: a mail filtering system that allows its members to annotate each document in detail to indicate how interesting the reader found it. Building on this work, several automated recommendation systems have been developed. Most of them are based on either collaborative filtering or content-based filtering. However, both paradigms have achieved only limited success. The collaborative filtering (CF) approach recommends products to target customers according to the preferences of their neighbors. However, it is often difficult to estimate the preference similarities between different customers. The content-based filtering (CBF) approach, on the other hand, recommends products to target customers based on their own past preferences. When dealing with new customers, this type of recommendation system cannot recommend any product because no historical preference records are available. Moreover, both approaches require customers to express their requirements in certain system-defined formats, so real customer requirement information may be distorted [3][4].

Given these disadvantages of traditional approaches, a new paradigm is needed that recommends suitable products by accurately capturing individual requirement information. The main difficulties in building personalized recommendation systems lie in two aspects [8]. First, customer requirements expressed in the customers' own natural language are normally qualitative and tend to be imprecise or ambiguous, which makes them difficult to interpret. Second, relational, data-oriented classification methods are commonly applied to structured data to predict the unknown class label of an object product. These methods cannot be used to classify customer requirements that are organized as a set of text-based documents.

In this paper, we propose a personalized recommendation system for e-commerce based on association rule mining and classification. Customer requirements are extracted from text documents and transformed into a set of significant phrases. A set of association rules is then mined from the transformed transaction records using the Apriori algorithm. To enable timely and accurate responses, the CBA-CB algorithm is applied to select the best rules out of the whole rule set. After these rules are tested and validated, the best classifiers are built. These classifiers are used to predict the item labels for new customer requirements and thus assign the corresponding class labels to the customer. The paper also presents the system analysis and design of the proposed recommendation system and the prototype implementation.

2. Recommendation Problem Outline

A recommendation system must help customers find the items they would like to purchase at e-commerce sites. The system receives simple descriptions of an item's appearance, color, price, and specification, and then shows the items that satisfy all of these requirements. Customer requirements can be described as a set of phrases, $P := \{p_1, p_2, \ldots, p_M\}$, and item labels are denoted by $L = \{l_1, l_2, \ldots, l_N\}$. Each element of $L$ represents a specific class of product. In general, suppose there are sales records for $K$ customers, $C = \{c_i \mid i = 1, \ldots, K\}$, and these sales records compose the transaction database $T$. Every transaction record comprises the customer requirement record and the purchased item record, as in Equation (1):

$$t(c_i) := \{P(c_i), L(c_i)\}, \quad \forall i = 1, 2, \ldots, K \qquad (1)$$

For each customer $c_i$ ($i = 1, 2, \ldots, K$) in the transaction database $T$, the requirement record is described as a set of phrases $P(c_i)$, where $P(c_i) \subseteq P$. The corresponding purchased item record is a class label $L(c_i)$ that indicates which item the customer purchased.

Consider a recommendation system with $K'$ new object customers whose class labels are unknown. The object customers compose an object database $O = \{o_j\}_{j=1}^{K'}$. For each customer $o_j$ ($j = 1, 2, \ldots, K'$) in the object database, the requirement record is described as a set of phrases $P_{o_j}$, where $P_{o_j} \subseteq P$ and $P_{o_j} \neq P(c_i)$. The recommendation problem based on customer requirements is thus formalized as

$$P_{o_j} \Rightarrow L(o_j),$$

where the association rule $\Rightarrow$ denotes an inference from the customer requirements $P_{o_j}$ to the class label $L(o_j) \in L$.

3. Recommendation System

Fig. 1 illustrates the framework of the recommendation system. The system comprises four consecutive stages: the requirement extraction module, the association rule generation module, the classification module, and the system performance validation module. First, historical requirement data are selected and transformed into suitable phrase datasets. A data mining procedure then searches for a set of associated, frequently occurring phrase patterns (classifiers). The generated classifiers are pruned so that only those of good quality are kept for recommendations. When new requirement information arrives, the system identifies the corresponding class labels using multiple classifiers. Finally, the performance of the whole system is validated to evaluate how accurately it makes recommendations.

Figure 1. Framework of recommendation system.

3.1. Requirement extraction

Customer requirements are usually expressed in natural language. Such text contains many common words that contribute nothing to information retrieval, and these words should be filtered out. In addition, a group of different words may share the same word stem. To reduce variation in words and broaden searches, such words should be transformed into their canonical forms. For this purpose, a stemming algorithm and a common English stopword list are used to reduce the dimensionality of the text documents and improve the efficiency of classifier extraction [5][6][10]. To extract the requirement information, the system applies semantic analysis using four thesaurus collections.

The four thesaurus collections used to match the requirements are represented as $N = \{n_1, n_2, \ldots\}$, $V = \{v_1, v_2, \ldots\}$, $ADJ = \{adj_1, adj_2, \ldots\}$, and $ADV = \{adv_1, adv_2, \ldots\}$ for nouns, verbs, adjectives, and adverbs, respectively. The semantic rules obtained with this approach are represented as IF–THEN rules and stored in the semantic rule database. They express the inference relationship between requirements and predefined phrases. After extraction, customer requirements are represented as a set of phrases, which the following module uses to build the classifiers.

3.2. Association rule mining

Association rule mining aims to discover relationships among a large set of data items [7][8]. Let $\eta = \{i_1, i_2, \ldots, i_x, \ldots, i_y, \ldots, i_m\}$ be a set of items, and let $T = \{t_1, t_2, \ldots, t_Q\}$ be a set of database transactions. Each transaction $t_q$ ($1 \le q \le Q$) comprises a set of items and a unique identifier. A transaction $t_q$ is said to contain $i_x$ when $i_x \subseteq t_q$. Association rule learning is adopted here to acquire the rules because it excels at finding complex relationships among a huge number of semi-structured or unstructured items. An association rule is an implication of the form $i_x \Rightarrow i_y$, where $i_x \subseteq P$, $i_y \subseteq L$, and $i_x \cap i_y = \emptyset$; $i_x$ and $i_y$ correspond to requirement phrases and class labels, respectively. The rule $i_x \Rightarrow i_y$ holds in the transaction set $T$ with support $s\%$ and confidence $c\%$. Here $s\%$ is the probability $P(i_x \cap i_y)$ that $i_x$ and $i_y$ hold together among all transactions, and $c\%$ is the conditional probability $P(i_y \mid i_x)$ that $i_y$ holds given $i_x$. In general form, the mined association rules are represented as

$$a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \Rightarrow b \quad \text{with Support} = s\%,\ \text{Confidence} = c\%$$

The terms are defined as follows:

- each phrase $a_s = p_i \in P$, with $1 \le s \le H$ and $1 \le i \le M$;
- the class label $b = l_j \in L$, with $1 \le j \le N$;
- any two distinct antecedent elements $a_x$ and $a_y$ ($1 \le x, y \le H$, $x \ne y$) satisfy $a_x \cap a_y = \emptyset$.

The support $s\%$ and confidence $c\%$ of the rule are calculated by Equations (2) and (3), respectively:

$$s\% = \frac{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \text{ AND } b)}{\mathrm{count}(T)} \times 100\% \qquad (2)$$

$$c\% = \frac{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \text{ AND } b)}{\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H)} \times 100\% \qquad (3)$$

In these equations:

- $\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H \text{ AND } b)$ is the number of transaction records in database $T$ that contain all of the items $a_1, a_2, \ldots, a_H$ as well as $b$;
- $\mathrm{count}(a_1 \text{ AND } a_2 \text{ AND } \cdots \text{ AND } a_H)$ is the number of records that contain $a_1, a_2, \ldots, a_H$;
- $\mathrm{count}(T)$ is the total number of records in $T$.

In addition, $\{a_1, a_2, \ldots, a_H\}$ is a non-empty subset of $P$, and $\{b\}$ is a non-empty subset of $L$.

The Apriori algorithm is used to acquire the classifiers [9]. It first discovers all frequent itemsets, that is, those whose support reaches a minimum support threshold. Association rules are then generated from these itemsets together with their support and confidence levels. In this way, the classifiers are built by association rule learning.

3.3. Classification of rules

The classifiers generated in Subsection 3.2 must be pruned before they can be used for classification, because pruning is needed to obtain timely and accurate responses. The CBA-CB algorithm is applied here to select the best classifiers out of the whole rule set. CBA-CB is based on the idea that the classification task needs only rules that are more general and have high support and confidence [9]. Unnecessary rules are pruned by database coverage. CBA-CB defines two principles:

- General rules. Given two rules $r_x \Rightarrow l_j$ and $r_y \Rightarrow l_j$, where $1 \le j \le N$, the first rule is more general than the second if $r_x \subseteq r_y$.
- Precedence rank. Given two rules $r_x$ and $r_y$, $r_x$ has a higher precedence than $r_y$, written $r_x \succ r_y$, if one of the following holds:
  (1) the confidence of $r_x$, $\mathrm{con}(r_x)$, is greater than that of $r_y$, $\mathrm{con}(r_y)$;
  (2) $\mathrm{con}(r_x) = \mathrm{con}(r_y)$, but the support of $r_x$, $\mathrm{sup}(r_x)$, is greater than $\mathrm{sup}(r_y)$;
  (3) $\mathrm{con}(r_x) = \mathrm{con}(r_y)$ and $\mathrm{sup}(r_x) = \mathrm{sup}(r_y)$, but $r_x$ is generated earlier than $r_y$.

The rule set $R = \{r_1, r_2, \ldots, r_M\}$ comprises $M$ rules. Each rule $r_m$ is pruned according to the first principle, and the remaining general rules are stored in a pruned rule set $R_P$. All rules in $R_P$ are then ranked in descending order according to the second principle and recorded in a rule set $R_D$. For each rule $r_m$ in $R_D$, the phrases in its antecedent form a set $PR_{r_m} \subseteq P$, and $L(r_m) = l_j$ denotes the class label associated with $r_m$.

Database coverage then proceeds as follows:

1. Set a cover-count, denoted $\mathrm{cnt}(t_s)$, to zero for each transaction record $t_s$.
2. For each rule $r_m$ in $R_D$, search all records in the transaction database. Select every record $t_s$ that satisfies $PR_{r_m} \subseteq P_{t_s}$; the selected records form a new set $T' \subseteq T$.
3. Increase the cover-count of every record in $T'$ by one.
4. If $L(r_m) = L(t_s)$ for some $t_s \in T'$, put $r_m$ into the filtered rule set $R_F$.
5. Delete the rule from $R_D$ and empty $T'$.
6. Delete record $t_s$ from the transaction database if $\mathrm{cnt}(t_s) \ge L_{TH}$, where $L_{TH}$ is a threshold for the cover-count.

The pruned rules in $R_F$ are the most significant ones, and they are selected as classifiers to predict the class labels of new object customers. The object customers comprise an object database $O = \{o_k\}_{k=1}^{K'}$. For object customer $o_k$, the requirement record is described as a set of phrases $P_{o_k}$, where $P_{o_k} \subseteq P$ and $P_{o_k} \ne P_{t_s}$. The rules $r_m$ in $R_F$ that satisfy $PR_{r_m} \subseteq P_{o_k}$ are selected and placed in the classifier rule set $R_C$. All rules in $R_C$ are then grouped by their associated class labels. Suppose $N$ groups $G = \{g_1, g_2, \ldots, g_N\}$ are generated, and each group $g_n$ is associated with a class label $L(g_n) = l_j$, $1 \le j \le N$. The classification based on multiple classifiers can then be formulated as

$$P_{o_k} \Rightarrow L(g_n), \quad 1 \le n \le N, \qquad \text{s.t.} \quad \sum_{r_m \in g_n} \mathrm{con}(r_m) \ge \psi \qquad (4)$$

where:

- $P_{o_k}$ represents the requirement information of customer $o_k$;
- $L(g_n)$ is the class label associated with the $n$th group $g_n$;
- $\mathrm{con}(r_m)$ is the confidence of rule $r_m$ in $R_F$;
- $\psi$ is the threshold for the confidence sum.

Eq. (4) states that the class label of a group of selected classifiers is recommended only if the confidence sum of that group is at least $\psi$. This allows multiple class labels to be identified from strong patterns. It therefore suits problems where multiple recommendations are preferred, so that customers can compare a small set of similar products.

3.4. Performance test

The performance of the proposed recommendation system is tested and validated by measuring its accuracy [8]. Suppose the test set $T = \{t_s\}_{s=1}^{S}$ comprises $S$ records. Each record $t_s = \{P_{t_s}, C(t_s)\}$ in $T$ has a purchased class label $C(t_s)$, and the class labels that the classification module assigns to it form a set $\hat{C}(t_s)$. Following [8], the recommendation accuracy is computed as

$$e = \frac{1}{S}\sum_{s=1}^{S} h_s, \qquad \text{s.t.} \quad h_s = \begin{cases} 1, & C(t_s) \in \hat{C}(t_s) \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

Here $e$ is the recommendation accuracy, that is, the proportion of test transactions that are correctly classified. $h_s$ is a binary variable set to one if transaction $t_s$ is correctly classified and zero otherwise.

4. Simulation Results

We have designed and implemented the proposed recommendation system in an Internet programming environment. The system has a three-tier architecture: client, application server, and database server. Five function modules are implemented according to the system framework. ASP is used to create dynamic, interactive personalized pages for the Web site. To allow the rules to be queried and managed efficiently, the Web database queries run on an IIS server. A transaction database of 2136 records was built from historical sales records; part of it is shown in Table 1. In each record, the customer requirements are described as a set of phrases, and the class label indicates the mobile phone that was purchased.

Table 1. Part of transaction records

Record | Requirement phrases | Class label
t1 | Stylish, colorful screen, light, black, elegant, high security | Nokia 8800se
t2 | More functions, small, light, camera, large buttons | Dopod 586
t3 | Larger screen, business function, Bluetooth, picture and video | Samsung SGH-D418
… | … | …
t999 | Small, stylish, cheap, video, friendly keypad, camera | VK 550
t1000 | Entertainment support, slim, cheap, durable | NEC N3105

The classifiers are identified from the transaction database and pruned with the CBA-CB algorithm. The best classifiers are then used to assign class labels to future customer requirements. The system performance has been validated using 500 test records.
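The pipeline summarized above can be sketched end to end on toy data. The sketch covers four steps:

- class association rules with the support and confidence of Eqs. (2) and (3);
- pruning by the two CBA-CB principles plus cover-count database coverage;
- confidence-sum classification per Eq. (4);
- the accuracy of Eq. (5).

This is a minimal Python sketch, not the author's implementation. The following are all hypothetical: the records; the labels PhoneA, PhoneB and PhoneC; the thresholds MIN_SUP, MIN_CONF, COVER_THRESHOLD (standing in for $L_{TH}$) and PSI (standing in for $\psi$); and the cap max_len on antecedent length. Rule mining is a simplified level-wise, Apriori-style search over requirement phrases. The coverage loop follows the cover-count description in Section 3.3 rather than the exact CBA-CB procedure of Liu et al. [6].

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset  # requirement phrases a_1 ... a_H
    label: str             # class label b
    support: float         # Eq. (2), as a fraction rather than a percentage
    confidence: float      # Eq. (3), as a fraction rather than a percentage
    order: int             # generation order, the last precedence tie-breaker


# Hypothetical toy data: (requirement phrases, purchased class label).
TRAIN = [
    ({"stylish", "light", "camera"}, "PhoneA"),
    ({"stylish", "light"}, "PhoneA"),
    ({"large screen", "bluetooth"}, "PhoneB"),
    ({"large screen", "bluetooth", "camera"}, "PhoneB"),
    ({"stylish", "camera"}, "PhoneA"),
    ({"cheap", "camera"}, "PhoneC"),
]
TEST = [
    ({"stylish", "slim"}, "PhoneA"),
    ({"bluetooth", "camera"}, "PhoneB"),
    ({"large screen"}, "PhoneB"),
    ({"cheap", "camera"}, "PhoneC"),
]
MIN_SUP, MIN_CONF = 0.3, 0.5  # hypothetical thresholds
COVER_THRESHOLD = 2           # plays the role of L_TH
PSI = 0.9                     # confidence-sum threshold psi of Eq. (4)


def mine_rules(data, min_sup, min_conf, max_len=3):
    """Level-wise (Apriori-style) search for rules 'phrase set => label'."""
    n = len(data)
    def count(items):
        return sum(1 for phrases, _ in data if items <= phrases)
    labels = sorted({label for _, label in data})
    vocab = sorted(set().union(*(phrases for phrases, _ in data)))
    level = [frozenset([p]) for p in vocab if count({p}) / n >= min_sup]
    rules = []
    for k in range(1, max_len + 1):
        for ante in level:
            ante_count = count(ante)
            for label in labels:
                both = sum(1 for p, b in data if ante <= p and b == label)
                sup, conf = both / n, both / ante_count
                if sup >= min_sup and conf >= min_conf:
                    rules.append(Rule(ante, label, sup, conf, len(rules)))
        # Apriori join: (k+1)-sets whose k-subsets are all frequent.
        frequent = set(level)
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = sorted((c for c in joined
                        if all(frozenset(s) in frequent for s in combinations(c, k))
                        and count(c) / n >= min_sup), key=sorted)
    return rules


def cba_prune(rules, data, cover_threshold):
    """General-rule pruning, precedence ranking, then cover-count database coverage."""
    # Principle 1: drop a rule if a strictly more general rule has the same label.
    general = [r for r in rules if not any(
        o.label == r.label and o.antecedent < r.antecedent for o in rules)]
    # Principle 2: higher confidence, then higher support, then earlier generation.
    ranked = sorted(general, key=lambda r: (-r.confidence, -r.support, r.order))
    cover = [0] * len(data)             # cover-count of each training record
    remaining = list(range(len(data)))  # records still in the database
    kept = []
    for rule in ranked:
        covered = [i for i in remaining if rule.antecedent <= data[i][0]]
        for i in covered:
            cover[i] += 1
        if any(data[i][1] == rule.label for i in covered):
            kept.append(rule)  # correctly classifies at least one covered record
        remaining = [i for i in remaining if cover[i] < cover_threshold]
    return kept


def recommend(rules, phrases, psi):
    """Eq. (4): recommend every label whose matching rules' confidences sum to >= psi."""
    sums = {}
    for r in rules:
        if r.antecedent <= phrases:
            sums[r.label] = sums.get(r.label, 0.0) + r.confidence
    return sorted(label for label, total in sums.items() if total >= psi)


def accuracy(rules, test, psi):
    """Eq. (5): share of test records whose true label is among the recommendations."""
    hits = [1 if label in recommend(rules, phrases, psi) else 0
            for phrases, label in test]
    return sum(hits) / len(hits)


classifiers = cba_prune(mine_rules(TRAIN, MIN_SUP, MIN_CONF), TRAIN, COVER_THRESHOLD)
print(recommend(classifiers, {"bluetooth", "camera"}, PSI))  # ['PhoneB']
print(accuracy(classifiers, TEST, PSI))                      # 0.75
```

Taken literally, the first principle discards every rule that has a more general rule with the same label, so only single-phrase rules survive on this toy data. With $\psi = 0.9$, the weak rule camera ⇒ PhoneA (confidence 0.5) cannot trigger a recommendation on its own. That is why {bluetooth, camera} yields only PhoneB.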

As shown in Table 2, the second column gives the mobile phone purchased in each test record. The third column lists the mobile phones that the system recommends using the classifiers. In this case, 452 test records are correctly recommended, giving a recommendation accuracy of 452/500 = 90.4%.

Table 2. Part of recommendation results

Record | Original class label | Recommended labels | h_s
t1 | Nokia 7610 | Nokia 3230, Nokia 7610 | 1
t2 | Samsung SGH-D508 | Motorola E680g, Samsung SGH-D720 | 0
t3 | Nokia N72 | Nokia N72, Motorola A1200, Sony Ericsson W810c, Samsung SGH-D828 | 1
… | … | … | …
t998 | Nokia 7610 | Nokia 7610, Nokia 8910 | 1
t999 | VK 550 | VK 550, NEC N200 | 1
t1000 | Motorola V3 | Motorola V3, Nokia 6111 | 1

After validation, the generated classifiers are stored in the classifier rule database for later retrieval, update, and query. Online customers can search for and buy mobile phone products via the Internet. The application server connects to the database through an ODBC proxy server, so data and rules can easily be retrieved to support the classification tasks. The most valuable product alternatives are then identified and presented to online customers as HTML pages. With this associative classification-based recommendation system, online customers can find, among numerous available mobile phones, the products that best match their requirements. This alleviates the information overload problem and thus improves the efficiency and effectiveness of business-to-customer e-commerce.

5. Conclusions

In this paper, we propose a personalized recommendation system for e-commerce based on association rule mining and classification. Customer requirements are extracted from text documents and transformed into a set of significant phrases. A set of association rules is then mined from the database using the Apriori algorithm. To enable timely and accurate responses, the CBA-CB algorithm is applied to select the best rules out of the whole rule set. After these rules are tested and validated, the best classifiers are generated. These classifiers are used to predict the item labels for new customer requirements and thus assign the corresponding class labels to the customer. The paper also presents the system analysis and design of the proposed recommendation system and the results.

References

[1] Liu Guo-rong, Zhang Xi-zheng, "Collaborative filtering based recommendation system for product bundling", Proceedings of ICMSE'06, Lille, France, pp. 251-254, 2006.
[2] Chen, H. C., Chen, A. L. P., "A music recommendation system based on music data grouping and user interests", Proceedings of ACM CIKM, Atlanta, USA, pp. 231-238, 2004.
[3] Francisco, G. S., Rafael, "An integrated approach for developing e-commerce applications", Expert Systems with Applications, vol. 28, pp. 223-235, 2005.
[4] Han, P., Xie, B., Yang, F., "A scalable P2P recommender system based on distributed collaborative filtering", Expert Systems with Applications, vol. 27, pp. 203-210, 2006.
[5] Hull, D. A., "Improving text retrieval for the routing problem using latent semantic indexing", Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, Ireland, pp. 282-289, 1999.
[6] Liu, B., Hsu, W., Ma, Y., "Integrating classification and association rule mining", Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA, pp. 27-31, 1998.
[7] Wang, F.-H., Shao, H.-M., "Effective personalized recommendation based on time-framed navigation clustering and association mining", Expert Systems with Applications, vol. 27, pp. 365-377, 2002.
[8] Yiyang Zhang, Jianxin (Roger) Jiao, "An associative classification-based recommendation system for personalization in B2C e-commerce applications", Expert Systems with Applications, vol. 33, pp. 357-367, 2007.
[9] Agrawal, R., Srikant, R., "Fast algorithms for mining association rules in large databases", Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, pp. 487-499, 2003.
[10] Weng, S.-S., Liu, C.-K., "Using text classification and multiple concepts to answer e-mails", Expert Systems with Applications, vol. 26, pp. 529-543, 2004.
