
Collaborative filtering recommendation with threshold value of the equipotential plane in implication field


Hoang Tan Nguyen
Department of Information and Communications of Dong Thap
12 Tran Phu Street, Ward 1, Cao Lanh City, Dong Thap, Viet Nam
(+84)913794800
hoangntdt@gmail.com

Hung Huu Huynh
University of Science and Technology
54 Nguyen Luong Bang Street, Lien Chieu District, Da Nang City, Viet Nam
(+84)905444669
hhhung@dut.udn.vn

Hiep Xuan Huynh
Cantho University
3/2 Street, Ninh Kieu District, Can Tho City, Viet Nam
(+84)985796067
hxhiep@ctu.edu.vn

ABSTRACT
Collaborative filtering is one of the most popular and effective techniques in today's recommender systems. However, most collaborative filtering methods use symmetric similarity measures, which assume that the effect and role of the two users in a pair are the same; in practice this may not be true. In addition, these measures only logically demonstrate the existence of a priority relationship between two users rather than the strength of that relationship in practice. In this paper, we propose a new approach to collaborative filtering based on the variation analysis of the implication index. We develop an asymmetric measure that can be used to rank or filter information based on the variation of the implication index with respect to the counter-examples. This measure provides meaningful recommendations with a certain level of implication. Experimental results show that the proposed approach can overcome the drawbacks of traditional recommender systems.

Keywords
Implication index; implication field; collaborative filtering; implication threshold; equipotential plane.

1. INTRODUCTION
Because of the rapid growth of data in today's era of information explosion, recommender systems [1][2] have become an extremely necessary tool, widely used in e-commerce and services such as Amazon, Pandora and Netflix. The objective of a recommender system is to filter useful information from a large amount of data in order to predict how a user would rate an item, and thereby recommend items (products, services, etc.) suitable for that user. Algorithms for recommender systems have attracted the attention of researchers because of their practical applications. Among them, collaborative filtering algorithms [17] are the most widely used. Most of these algorithms rely on symmetric measures to filter information and make recommendations for users. Recently, several solutions have been proposed that use asymmetric similarity in recommender systems, such as asymmetric similarity for collaborative filtering via matrix factorization [3][4][6], and recommendation with asymmetric user influence and global importance value [18], which addresses the asymmetric effects of users in the recommender system.

Another new trend is the use of statistical implication analysis in recommender systems, which addresses the problem of asymmetric user influence and the problem of assessing the occurrence or functional relationship of the interactions between users and data items in practice. An example is the recommender system model based on association rules combined with an implicative measure [15], which overcomes the disadvantage of traditional recommender systems (they only focus on the logic that demonstrates the existence or absence of a priority relationship between the user and the item). In this approach, the authors are particularly interested in the ratio or implicative relationship between the user and the data item in a particular context, in order to make recommendations to the user more effective. Another study applying statistical implication analysis to recommender systems is the user-based collaborative filtering recommender system that uses association rules combined with an implication cohesion measure [16] to calculate the similarity of each pair of users. Recently, in [5], we proposed a recommendation method based on the variance of the implication index in the implication field to address these issues.

In this paper, we also use statistical implication analysis to propose a new approach to collaborative filtering based on the threshold value of the equipotential plane in the implication field [8], to further address the problems of asymmetric user influence and of the implication relationship between users in recommender systems.

The paper is organized in five parts. The first part introduces the context and the issues addressed, and outlines our proposed approach. The second part presents the related background: statistical implication analysis and its extension to the implication field. The third part presents the model of the recommender system based on the variance of the implication index in the implication field. The fourth part presents the experiments with several scenarios, and the last part gives the conclusions.

2. IMPLICATION STATISTICAL FIELD
2.1 Implication statistical analysis
Statistical implication analysis (SIA) theory [11][13][14], proposed by Régis Gras, studies the implication relationships between data variables. The measures used in statistical implicative analysis, the implication index (also known as the Gras implication index) and the implication intensity, are used to detect rules or R-rules (rules of rules) with a strong implicative relationship between the two sides of the rule, or to measure the relationship between two variables (individuals, attributes, ...); these measures are asymmetric. In addition, statistical implication analysis focuses on the analysis of counter-examples. It can be presented as follows.

Let $E$ be a finite set of $n$ individuals described by binary variables, and let $A$ and $B$ be the subsets of $E$ containing the individuals for which the variables $a$ and $b$, respectively, are true. Let $\bar A$ and $\bar B$ be the complements of $A$ and $B$; $n_a = card(A)$ and $n_b = card(B)$ are the cardinalities of $A$ and $B$, $n_{\bar a} = card(\bar A)$ and $n_{\bar b} = card(\bar B)$ are the cardinalities of $\bar A$ and $\bar B$, and $n_{a\bar b} = card(A \cap \bar B)$ is the cardinality of $A \cap \bar B$, the set of individuals satisfying $a = true$ and $b = false$; $n_{a\bar b}$ is also called the number of counter-examples. SIA also randomly and independently selects two subsets $X$ and $Y$
with the same cardinalities as $A$ and $B$, respectively, that is, $card(X) = n_a$ and $card(Y) = n_b$. Let $\bar X$ and $\bar Y$ be the complements of $X$ and $Y$ in $E$, with cardinalities $n_{\bar a} = n - n_a$ and $n_{\bar b} = n - n_b$.

The implication relationship between $A$ and $B$ is modeled in statistical implication analysis as follows (see Figure 1).

Figure 1. Illustration of the components of statistical implicative analysis by Venn diagrams

The implication intensity $\varphi(a,b)$ of the rule $A \rightarrow B$ is defined by [11][13]:
$$\varphi(a,b) = 1 - \Pr\left(Q(a,\bar b) \le q(a,\bar b)\right) = \begin{cases} 1 - \displaystyle\sum_{s=0}^{n_{a\bar b}} \frac{\lambda^s}{s!}e^{-\lambda} \approx \frac{1}{\sqrt{2\pi}}\int_{q(a,\bar b)}^{\infty} e^{-\frac{t^2}{2}}\,dt, & \text{if } n_b < n \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where $\lambda = \frac{n_a n_{\bar b}}{n}$ and $q(a,\bar b)$ is the implication index, defined by:
$$q(a,\bar b) = \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} \qquad (2)$$
Under the approximation conditions (e.g. $\lambda \ge 4$), the random variable $Q(a,\bar b)$, whose observed value is $q(a,\bar b)$, approximately follows the standard normal distribution $N(0,1)$.

The implication rule $X \rightarrow Y$ is admissible at the confidence level $\alpha$ if and only if $\varphi(X,Y) \ge 1 - \alpha$ [11][13].

2.2 Implication index variation
Consider small variations in the neighborhood of the four observed values of the variables $n$, $n_a$, $n_b$, $n_{a\bar b}$. These variables are treated as real numbers, and $q$ as a continuously differentiable function of them, subject to the constraints $0 \le n_a \le n_b$, $n_{a\bar b} \le \inf\{n_a, n_b\}$ and $\sup\{n_a, n_b\} \le n$. The differential of $q$, in the sense of Fréchet, is expressed as follows:
$$dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial n_a}dn_a + \frac{\partial q}{\partial n_b}dn_b + \frac{\partial q}{\partial n_{a\bar b}}dn_{a\bar b} = \operatorname{grad} q \cdot dM \qquad (3)$$
where $M$ is the point with coordinates $(n, n_a, n_b, n_{a\bar b})$ of the scalar field $C$, $dM$ is the vector of the differentials of these variables, and $\operatorname{grad} q$ is the vector of the partial derivatives of $q$ with respect to them.

From (3), the differential of $q$ appears as the scalar product of the gradient of $q$ and the increment $dM$ of the variables of the function $q(n, n_a, n_b, n_{a\bar b})$. $\operatorname{grad} q$ describes the variability of $q$ as a function of its four variables, namely the cardinalities of the sets $E$, $A$, $B$ and $A \cap \bar B$, and points in the direction in which $q$ increases in the four-dimensional space. In practice, the value of this differential lies in estimating the increase (positive or negative) $\Delta q$ of $q$ relative to the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and $\Delta n_{a\bar b}$. So we have:
$$\Delta q = \frac{\partial q}{\partial n}\Delta n + \frac{\partial q}{\partial n_a}\Delta n_a + \frac{\partial q}{\partial n_b}\Delta n_b + \frac{\partial q}{\partial n_{a\bar b}}\Delta n_{a\bar b} + o(\Delta q) \qquad (4)$$
where $o(\Delta q)$ is an infinitely small term.

Now, to further examine the relationship between the implication index and the implication intensity, we differentiate equation (1) with respect to $q$ and obtain:
$$\frac{d\varphi}{dq} = -\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0 \qquad (5)$$
This confirms that the implication intensity increases as $q$ decreases, with a rate of change given by formula (5), which allows a more rigorous study of the variability of $\varphi$.

2.3 Implication Field
2.3.1 Implication statistical field
Consider the implication index $q(a,\bar b)$ in the four-dimensional space $E$, where the point $M$ has as coordinates the parameters $(n, n_a, n_b, n_{a\bar b})$ associated with the binary variables $a$ and $b$. Then $q(a,\bar b)$ is a scalar field, defined by a mapping from $\mathbb{R}^4$ to $\mathbb{R}$. The vector $\operatorname{grad} q$, which contains the partial derivatives of $q$ with respect to $n$, $n_a$, $n_b$, $n_{a\bar b}$, forms a special gradient field called the implication field, because it satisfies Schwarz's criterion for mixed derivatives: for each pair of variables, the mixed partial derivatives are equal [8], for example:
$$\frac{\partial}{\partial n_{a\bar b}}\left(\frac{\partial q}{\partial n_b}\right) = \frac{\partial}{\partial n_b}\left(\frac{\partial q}{\partial n_{a\bar b}}\right) \qquad (6)$$
and similarly for every other pair of the variables $(n, n_a, n_b, n_{a\bar b})$. Hence $q$ is regarded as the potential of the field $\operatorname{grad} q$. The vector $\operatorname{grad} q$ points from lower to higher values of $q$; at each point, it indicates how the implicative density of the space increases and how fast it changes under the influence of one or more parameters.

2.3.2 Implication index equipotential plane
Consider the implication index as a function of four variables, $q(n, n_a, n_b, n_{a\bar b})$. An equipotential line or plane of the field $C$ is a curve in $E$, an ordered four-dimensional space, along which the point $M$ keeps the same value of the potential $q$. The equation of this curve is given in [8]:
$$q(a,\bar b) - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} = 0 \qquad (7)$$
with $q(a,\bar b)$ held at a constant value. On this curve, the scalar product of $\operatorname{grad} q$ and $dM$ is 0 (from (3) and (7), since $dq = 0$ along the curve). This means that the gradient is orthogonal to the tangent, or tangent hyperplane, of the curve, that is, to the equipotential plane.

To illustrate, consider a potential $F$ that depends on only two variables. Figure 2 shows the gradient orthogonal to the different equipotential curves, along each of which $F$ is constant, as $F$ changes from F7 to F10 [8].

[Figure 2: gradient vectors "Grad F in M" and "Grad F in M'" at points M and M' on the equipotential curves F7, F8, F9, F10; axes x and y]
Figure 2. Illustration of a potential that depends on only two variables.
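Equations (1), (2) and (5), together with the first-order expansion (4), can be checked numerically with the minimal Python sketch below. It is an illustrative sketch, not the implicativefield toolkit used in the experiments: all function names are hypothetical, `grad_q` approximates the gradient of (3) by central finite differences in the coordinates $(n, n_a, n_b, n_{a\bar b})$ with $n_{\bar b} = n - n_b$, and the Poisson sum assumes an integer number of counter-examples.

```python
import math


def implication_index(n, n_a, n_b, n_ab_bar):
    """Implication index q(a, b-bar) of eq. (2), with n_bbar = n - n_b."""
    lam = n_a * (n - n_b) / n  # lambda: expected counter-examples under independence
    return (n_ab_bar - lam) / math.sqrt(lam)


def intensity_poisson(n, n_a, n_b, n_ab_bar):
    """Implication intensity of eq. (1) via the Poisson sum; 0 when n_b = n."""
    if n_b >= n:
        return 0.0
    lam = n_a * (n - n_b) / n
    term = math.exp(-lam)  # P(S = 0) for S ~ Poisson(lambda)
    cdf = term
    for s in range(1, n_ab_bar + 1):
        term *= lam / s  # P(S = s) obtained from P(S = s - 1)
        cdf += term
    return 1.0 - cdf


def intensity_normal(q):
    """Normal approximation of eq. (1): N(0,1) density integrated from q to infinity."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def dphi_dq(q):
    """Eq. (5): derivative of the normal-approximated intensity; always negative."""
    return -math.exp(-q * q / 2.0) / math.sqrt(2.0 * math.pi)


def grad_q(n, n_a, n_b, n_ab_bar, h=1e-6):
    """grad q at M = (n, n_a, n_b, n_ab_bar), eq. (3), by central finite differences."""
    point = [n, n_a, n_b, n_ab_bar]
    grad = []
    for i in range(4):
        up, down = point[:], point[:]
        up[i] += h
        down[i] -= h
        grad.append((implication_index(*up) - implication_index(*down)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    M = (100, 20, 60, 3)  # (n, n_a, n_b, n_ab_bar), so lambda = 20 * 40 / 100 = 8
    q = implication_index(*M)
    print(f"q = {q:.4f}")
    print(f"phi Poisson = {intensity_poisson(*M):.4f}, phi normal = {intensity_normal(q):.4f}")
    # Eq. (4): first-order change of q for one extra counter-example, dM = (0, 0, 0, 1)
    dM = (0, 0, 0, 1)
    approx = sum(g * d for g, d in zip(grad_q(*M), dM))
    exact = implication_index(100, 20, 60, 4) - q
    print(f"grad q . dM = {approx:.6f}, exact delta q = {exact:.6f}")
```

For $M = (100, 20, 60, 3)$ we get $\lambda = 8$ and $q = -5/\sqrt{8} \approx -1.77$. Because $q$ is linear in $n_{a\bar b}$, the linear estimate from (4) for one extra counter-example coincides with the exact change, $1/\sqrt{\lambda}$, while the Poisson and normal forms of (1) differ slightly at this small $\lambda$.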
In our case, the potential $q$ forms equipotential planes (shown in two dimensions in Figure 2 for ease of representation). The variation of $q$ is stronger where the equipotential planes are dense and weaker where they are sparse. To obtain a value of $q$ in this case, we fix three variables, for example $n$, $n_a$ and $n_b$, and take values of $q$ compatible with the constraints of the field.

3. RECOMMENDATION BASED ON EQUIPOTENTIAL PLANE IN IMPLICATION FIELD
3.1 Implication statistical rules
Let $D$ be a dataset consisting of $T$ transactions, where each transaction $T_i$ consists of items, i.e. objects that appear in transactions (products, services, ...). The itemset $I$ is a set of $m$ items. The number of occurrences of an itemset in the database is denoted $\delta$. The support of an itemset $X$, denoted $support(X)$, is the percentage of transactions in $D$ that contain $X$. An association rule is a rule of the form $X \rightarrow Y$, where $X, Y \subset I$ are itemsets; $X$ is called the premise and $Y$ the consequence.

Association rules are usually evaluated by two metrics: support ($S$) and confidence ($C$). The support of the rule $X \rightarrow Y$, denoted $sup(X \rightarrow Y)$, is the ratio of the number of transactions containing both $X$ and $Y$ to the total number of transactions:
$$sup(X \rightarrow Y) = \frac{\delta(X \cup Y)}{|T|}$$
The confidence of the rule $X \rightarrow Y$, denoted $conf(X \rightarrow Y)$, is the probability that a transaction containing $X$ also contains $Y$:
$$conf(X \rightarrow Y) = \frac{\delta(X \cup Y)}{\delta(X)} = \frac{sup(X \rightarrow Y)}{sup(X)} = P(Y \mid X)$$

The rule in statistical implication analysis for recommender systems is defined as follows. Let $I$ be the set of $m$ items; $A \subset I$ is the set of items rated by user $u_a$ and $\bar A$ is its complement; $B \subset I$ is the set of items rated by user $u_b$ and $\bar B$ is its complement; $n_a = card(A)$ is the number of items rated by $u_a$ (the number of elements of $A$); $n_b = card(B)$ is the number of items rated by $u_b$ (the number of elements of $B$); and $n_{a\bar b} = card(A \cap \bar B)$ is the number of items rated by $u_a$ but not by $u_b$. In addition to the usual measures above, the specific measure of implicative rules is the implication index, which expresses the degree of implication, something association rules do not express. An implication rule is described by a set of four variables $(n, n_a, n_b, n_{a\bar b})$, called the cardinalities of the implication rule. In other words, the relationship between users $u_a$ and $u_b$ is the relationship between the item set $A$ liked by $u_a$ and the item set $B$ liked by $u_b$, represented by the four elements $(n, n_a, n_b, n_{a\bar b})$.

3.2 Threshold implication variability between equipotential planes
As discussed in the previous section, statistical implication analysis focuses on the analysis of counter-examples. It is difficult to replace an original rule by another rule when only a few counter-examples (unlikelihood) appear; only when the counter-examples become more numerous does the confidence of the rule decrease, and the rule can then be rejected. Conversely, when examples (likelihood) are numerous and counter-examples are rare, the rule becomes stronger and is accepted. For example, consider the acceptable rule "Ferrari cars are red." Even if one or two counter-examples appear (Ferrari cars that are not red), the rule is maintained, and it will be confirmed again as new examples appear. Thus, unlike mathematics, where rules are not allowed to have any exceptions, the rules considered here are still acceptable when the number of counter-examples remains within an "acceptable" threshold, because in these situations the rules are still active and effective. In data analysis, the problem is to define a consensus standard and thereby quantify the confidence threshold of a rule according to user requirements.

In this section, we propose a recommendation method based on the variation of the implication index with respect to the variation of the counter-examples in the implication field. It determines the equipotential planes of the set of implication indexes and, from them, recommends the item (or top-k item list) suitable for the user at a definite implication threshold. The threshold $\theta$ is the tolerance of $q$ within the same equipotential plane, and $\theta$ depends on byFactor, the variable whose variation is considered. To determine $\theta$, it is necessary to consider how the dependent variable $q(a,\bar b)$ varies when an element $x$ is added to (or removed from) the sample data, which can happen in four cases.

More specifically, the partial derivatives with respect to $n$, $n_a$, $n_b$ and $n_{a\bar b}$ are:
$$\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n\,n_a n_{\bar b}}}\left(n_{a\bar b} + \frac{n_a n_{\bar b}}{n}\right) \qquad (8a)$$
$$\frac{\partial q}{\partial n_a} = -\frac{1}{2}\,n_{a\bar b}\left(\frac{n}{n_{\bar b}}\right)^{1/2} n_a^{-3/2} - \frac{1}{2}\sqrt{\frac{n_{\bar b}}{n\,n_a}} \qquad (8b)$$
$$\frac{\partial q}{\partial n_b} = \frac{1}{2}\,n_{a\bar b}\left(\frac{n_a}{n}\right)^{-1/2}(n - n_b)^{-3/2} + \frac{1}{2}\left(\frac{n_a}{n}\right)^{1/2}(n - n_b)^{-1/2} \qquad (8c)$$
$$\frac{\partial q}{\partial n_{a\bar b}} = \frac{1}{\sqrt{\frac{n_a n_{\bar b}}{n}}} = \frac{1}{\sqrt{\frac{n_a(n - n_b)}{n}}} \qquad (8d)$$
where (8a) is taken with $n_{\bar b}$ held constant and (8c) with $n$ held constant ($n_{\bar b} = n - n_b$).

From (8d), $\partial q / \partial n_{a\bar b} > 0$: if $n_{a\bar b}$ increases, the implication index increases, and thus, by (5), the implication intensity decreases.

Let $\lambda_1, \lambda_2$ and $q_1, q_2$ be the values of $\lambda$ and $q$ for the original data sample and for the extended data sample, respectively, as in Table 1A (the value $\pm 1$ is $+1$ when $x$ is added to the dataset and $-1$ when $x$ is removed), and let the variability be $\Delta q = q_2 - q_1$, as in Table 1B.

Table 1A. Variability of $n_a$, $n_{\bar b}$, $n_{a\bar b}$ and $\lambda$
$$\begin{array}{c|cccccccc} & a & b & \Delta n_a & \Delta n_{\bar b} & \Delta n_{a\bar b} & \lambda_1 & \lambda_2 & \Delta\lambda \\ \hline \text{(i)} & 0 & 0 & 0 & \pm 1 & 0 & \frac{n_a n_{\bar b}}{n} & \frac{n_a(n_{\bar b} \pm 1)}{n \pm 1} & > 0 \\ \text{(ii)} & 0 & \pm 1 & 0 & 0 & 0 & \frac{n_a n_{\bar b}}{n} & \frac{n_a n_{\bar b}}{n \pm 1} & < 0 \\ \text{(iii)} & \pm 1 & 0 & \pm 1 & \pm 1 & \pm 1 & \frac{n_a n_{\bar b}}{n} & \frac{(n_a \pm 1)(n_{\bar b} \pm 1)}{n \pm 1} & > 0 \\ \text{(iv)} & \pm 1 & \pm 1 & \pm 1 & 0 & 0 & \frac{n_a n_{\bar b}}{n} & \frac{(n_a \pm 1)n_{\bar b}}{n \pm 1} & > 0 \end{array}$$

Table 1B. Variability of $q$ (case of an added element $x$)
$$\text{(i)}\quad \Delta q = \frac{n_{a\bar b} - \frac{n_a(n_{\bar b}+1)}{n+1}}{\sqrt{\frac{n_a(n_{\bar b}+1)}{n+1}}} - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} \qquad (a)$$
$$\text{(ii)}\quad \Delta q = \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n+1}}{\sqrt{\frac{n_a n_{\bar b}}{n+1}}} - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} \qquad (b)$$
$$\text{(iii)}\quad \Delta q = \frac{(n_{a\bar b}+1) - \frac{(n_a+1)(n_{\bar b}+1)}{n+1}}{\sqrt{\frac{(n_a+1)(n_{\bar b}+1)}{n+1}}} - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} \qquad (c)$$
$$\text{(iv)}\quad \Delta q = \frac{n_{a\bar b} - \frac{(n_a+1)n_{\bar b}}{n+1}}{\sqrt{\frac{(n_a+1)n_{\bar b}}{n+1}}} - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} \qquad (d)$$

To determine the variation threshold $\theta$ of the implication index on the equipotential planes, let $\frac{\partial q}{\partial \xi}$ and $\frac{\Delta q}{\Delta \xi}$ be, respectively, the partial derivative and the increment of $q$ with respect to $\xi$, where $\xi \in \{n, a, b, a\bar b\}$. A variation of $q$ caused by the addition (or removal) of an individual in the dataset can affect a number $k$ of implication rules in the dataset; this leads to the increment $\theta = k\frac{\Delta q}{\Delta \xi}$, that is:
$$\frac{\partial q}{\partial \xi} = k\,\frac{\Delta q}{\Delta \xi} + o(q) \qquad (9)$$
where $o(q)$ is an infinitely small term, and $\frac{\partial q}{\partial \xi}$ and $\frac{\Delta q}{\Delta \xi}$ are given by formulas (8a) to (8d) and by (a) to (d) of Table 1B, respectively. The threshold $\theta$ is defined as $k\frac{\Delta q}{\Delta \xi}$ from (9).

3.3 Recommendation based on variation implication index with threshold value of equipotential plane
To give a formal definition of the recommendation task, we need to introduce some concepts of recommender systems. The set of users in the system is denoted by $U$ and the set of items by $I$. The set of ratings in the system is denoted by $R$, and the set of possible values of a rating by $\mathcal{S}$ (e.g. $\mathcal{S} = [1, 5]$ or $\mathcal{S} = \{like, dislike\}$). We also assume that any user $u \in U$ can give at most one rating to a particular item $i \in I$, written $r_{ui} \in R$. The subset of users who have rated an item $i$ is denoted $U_i$. Likewise, $I_u$ denotes the subset of items rated by a user $u$. Finally, the set of items rated by both users $u$ and $v$, $I_u \cap I_v$, is an important concept in this paper and is denoted $I_{uv}$, i.e. $I_{uv} = I_u \cap I_v$. Similarly, $U_{ij}$ denotes the set of users who have rated both items $i$ and $j$, i.e. $U_{ij} = U_i \cap U_j$.

One of the most important issues for a recommender system is recommending the best item, or the list of best items, to the user. For a particular user $u$, this means searching for the new item $i \in I \setminus I_u$ in which $u$ is most likely to be interested. When ratings are available, this task is often defined as a regression or (multi-class) classification problem whose goal is to learn a function
$$f: U \times I \rightarrow \mathcal{S}$$
that predicts the rating $f(u, i)$ of a user $u$ for a new item $i$. This function is used to recommend to the active user $u_a$ the item $i^*$ with the highest estimated rating [1][2]:
$$i^* = \arg\max_{j \in I \setminus I_{u_a}} f(u_a, j) \qquad (10)$$

Based on the results of the above studies on the implication field, we propose the following recommendation algorithms.

Algorithm 1. IRG (Implication Rules Generator)
Input: set of transactions
Output: implicative rule set and the cardinalities $(n, n_a, n_b, n_{a\bar b})$ of each rule
Step 1: Generate the rule set from the set of transactions using a data mining algorithm (such as Apriori).
Step 2: Calculate the cardinalities of the implication rules, as follows. Count the number of transactions $n$. Generate two binary (True/False) matrices $lhsRules$ and $rhsRules$, whose entry for rule $i$ and item $j$ is true if item $j$ belongs to the left-hand side of rule $i$ (respectively, the right-hand side, for $rhsRules$). Then, for each $rule[i]$:
   $lhsProduct = lhsRules \times (data)^T$
   $n_a[i] = rowSum(lhsProduct[i])$
   $rhsProduct = rhsRules \times (data)^T$
   $n_b[i] = rowSum(rhsProduct[i])$
   $n_{ab}[i]$: computed in the same way as $n_a[i]$ and $n_b[i]$, but on both the left- and right-hand sides
   $n_{a\bar b}[i] = n_a[i] - n_{ab}[i]$
   where $(data)^T$ is the transposed data matrix.
Step 3: Return $(n[i], n_a[i], n_b[i], n_{a\bar b}[i])$.

Algorithm 2. RBEP (Recommendation by Equipotential Plane)
Input: dataset, threshold $\theta$, variation factor byFactor
Output: recommendation: an item or a top-k item list
Step 1: Call IRG(dataset) to generate the rule set and calculate $n$, $n_a$, $n_b$, $n_{a\bar b}$.
Step 2: For each $rule(i)$, calculate the implication index $q(i)$ according to formula (2). Then calculate the partial derivative of $q(i)$ with respect to byFactor according to formulas (8a) to (8d).
Step 3: Determine the set recSet of rules whose $q$ lies on the same equipotential plane with respect to byFactor ($|\Delta q(a,\bar b)| \le \theta$), as defined by equation (10).
Step 4: Return the recommendation: an item or $k$ items from recSet.

The algorithms described above serve as the basis of the recommendation model of a statistics-based recommender system, RSIF (Recommender System based on Implication Field), shown in Figure 3.

[Figure 3 components: Dataset; Data mining algorithms; Association rule set; Implication field; Knowledge; Implication rules set; Algorithms ISA (IRG, RBEP, etc.); RSIF model; Recommendation item; k-recommendation list (top-k list)]
Figure 3. RSIF recommender system model

4. EXPERIMENT
4.1 Dataset
With the system model proposed above, we conducted experiments on the MovieLens dataset collected by the GroupLens research group from the MovieLens website, which contains approximately 100,000 ratings of 1,682 films made by 943 users. Ratings range from 1 to 5, with
films rated from the lowest to the highest. The dataset was preprocessed as follows to make the experiments more accurate:
- Standardization of data: users who rate all their films high (or low), depending on the individual, can introduce bias. We eliminate this effect by normalizing the data so that the average rating of each user is on the same scale.
- Selection of relevant data: to avoid bias and to speed up computation, we ignore films that have been rated only a few times, because their ratings may be biased due to lack of data, as well as users who have rated only a few films, because their ratings may also be biased.
The dataset has been preprocessed to avoid overfitting and to obtain better accuracy. We conducted the experiments in k-fold cross-validation mode.

4.2 Experimental tools
Experiments were conducted with the implicativefield toolkit, developed by our group in the R language, which includes statistical analysis tools for our proposed algorithms.

4.3 Scenario 1: Recommendation based on the variance of the implication index by the counter-example in the implication field
Applying our proposed model to the dataset described above, we generated the rule set with support = 0.4 and confidence = 0.4, obtaining a total of 119 rules. After eliminating meaningless rules (rules with an empty left-hand side) and keeping the rules with implication intensity greater than 0.5, 84 rules remain. With the threshold $\theta = 0.337565$, we obtain the updated values of the implication index $q$ with respect to the variation of one element of $(n, n_a, n_b, n_{a\bar b})$. In this scenario, byFactor $= n_{a\bar b}$, and we collected 25 sets of equipotential planes in the three dimensions $(n, n_a, n_b)$. These hyperplanes have uneven implication index potentials, listed in Table 3. Equipotential plane number 1 consists of the rule set {21, 35, 16, 17}, plane number 2 of the rule set {38, 29, 15}, and so on, up to plane number 27 with the rule set {150, 128, 125, 135, 211}. The rules on each hyperplane have the same implication index value up to an approximation of $\theta$, as shown in Table 2.

Table 2. Intensity of the implication field on the equipotential planes and their implications, byFactor = $n_{a\bar b}$
Eq. plane | Number of rules | q | Eq. plane | Number of rules | q | Eq. plane | Number of rules | q
1 | 4 | -9.0434710 | 10 | 1 | -6.73528 | 19 | 1 | -3.83082
2 | 3 | -8.80657 | 11 | 5 | -5.97182 | 20 | 1 | -3.5129
3 | 1 | -8.69112 | 12 | 5 | -5.70381 | 21 | 2 | -3.13998
4 | 2 | -8.1998 | 13 | 4 | -5.43779 | 22 | 5 | -2.80447
5 | 5 | -7.75697 | 14 | 4 | -5.1421 | 23 | 5 | -2.58721
6 | 4 | -7.45843 | 15 | 4 | -4.8574 | 24 | 2 | -2.36298
7 | 4 | -7.26654 | 16 | 2 | -4.59922 | 25 | 2 | -2.16484
8 | 3 | -6.97816 | 17 | 4 | -4.27338 | 26 | 3 | -1.78044
9 | 1 | -6.75977 | 18 | 2 | -3.9652 | 27 | 5 | -1.3093

Table 3. Implication rules and implication index of equipotential plane no. 1
ID of rule | Description of the rule | Implication index
138 | {Star Wars (1977), Empire Strikes Back, The (1980)} => {Raiders of the Lost Ark (1981)} | -9.149508
90 | {Star Wars (1977), Raiders of the Lost Ark (1981), Return of the Jedi (1983)} => {Empire Strikes Back, The (1980)} | -9.053185
226 | {Star Wars (1977), Empire Strikes Back, The (1980), Return of the Jedi (1983)} => {Raiders of the Lost Ark (1981)} | -9.000696
86 | {Empire Strikes Back, The (1980), Return of the Jedi (1983)} => {Raiders of the Lost Ark (1981)} | -8.970471

Table 4. Error indexes of the ISF, IBCF and UBCF models
Model | RMSE | MSE | MAE
ISF | 0.9434059 | 0.8900147 | 0.7419290
IBCFcosine | 1.2372211 | 1.5307160 | 0.9264473
UBCFcosine | 0.9857491 | 0.9717012 | 0.7785217
IBCFPearson | 1.2204847 | 1.4895830 | 0.9094559
UBCFPearson | 0.9987563 | 0.9975141 | 0.7919161

Regarding the trend of implication variability with byFactor $= n_{a\bar b}$: the counter-example count $n_{a\bar b}$ plays a major role in strengthening or rejecting a rule (as discussed in the theory of implicative statistics in the previous sections). An increase in this factor increases the implication index, which is synonymous with a reduction of the implication intensity; however, this reduction is not significant, so the rules of the same plane are treated equally and their implication remains at the previous level.

The density of the implication field is unevenly distributed: the equipotential planes with high implicative density, whose index values vary slightly more, are the most concentrated, namely planes 5, 11, 12, 22, 23 and 27. The density of the implication field is lowest on planes 3, 9, 10, 19 and 20, as shown in Table 3. This shows the suitability of the rules. When the implication index of a rule varies so much that the rule is no longer accepted at the specified threshold, the rule moves to another equipotential plane whose implication threshold is more appropriate. In this way, users are recommended items with the most appropriate level of implication.

A target user is recommended a movie or a list of movies that he or she would like, based on the movies previously viewed, as shown in Table 3. For example, the movie "Raiders of the Lost Ark (1981)" can be recommended to users who have seen "Empire Strikes Back, The (1980)" and "Return of the Jedi (1983)" (rule No. 86).

4.4 Scenario 2: Comparison with user-based collaborative filtering
To compare the accuracy of the proposed model with user-based collaborative filtering (UBCF) models using the Cosine and Pearson measures, we carried out the experiment of this scenario; the results are shown in Figures 4 and 5. The ISF model performs better than the UBCF model using the Pearson measure but worse than the UBCF model using the Cosine measure; however, ISF has lower error indexes than both UBCF models (Table 4).

Figure 4. ROC curves comparing ISF and the other UBCF models
Figure 5. Precision and recall comparison between ISF and the other UBCF models
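The pipeline behind Algorithms 1 and 2 and Scenario 1 can be sketched end to end in Python on a toy dataset. This is a minimal sketch under explicit assumptions, not the authors' implementation: the transactions, the rule ids and the `SW`/`ESB`/`RotJ`/`Raiders` item labels are hypothetical; cardinalities are counted with set containment instead of the binary-matrix products of Algorithm 1 (a transaction counts for a side of a rule only when it contains every item of that side, which is our reading of `rowSum`); the intensity uses the normal approximation of (1); and the grouping rule in `equipotential_planes` (a new plane opens when a rule's $q$ exceeds the first $q$ of the current plane by more than $\theta$) is a hypothetical stand-in for Step 3, since the text does not fix how planes are delimited.

```python
import math

# Hypothetical toy transactions: each set holds the films one user rated highly.
TRANSACTIONS = [
    {"SW", "ESB", "RotJ", "Raiders"},
    {"SW", "ESB", "RotJ", "Raiders"},
    {"SW", "ESB", "Raiders"},
    {"ESB", "RotJ", "Raiders"},
    {"SW", "ESB", "RotJ"},
    {"Titanic"},
    {"Titanic", "SW"},
    {"Fargo"},
]
# Hypothetical rules (lhs, rhs), as an Apriori run might produce (Algorithm 1, Step 1).
RULES = {
    "r1": ({"ESB", "RotJ"}, {"Raiders"}),
    "r2": ({"SW"}, {"ESB"}),
    "r3": ({"Titanic"}, {"SW"}),
}


def cardinalities(transactions, lhs, rhs):
    """(n, n_a, n_b, n_ab_bar) of the rule lhs -> rhs (Algorithm 1, Step 2)."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if lhs <= t)  # t contains every lhs item
    n_b = sum(1 for t in transactions if rhs <= t)
    n_ab = sum(1 for t in transactions if (lhs | rhs) <= t)
    return n, n_a, n_b, n_a - n_ab


def implication_index(n, n_a, n_b, n_ab_bar):
    """Eq. (2) with n_bbar = n - n_b."""
    lam = n_a * (n - n_b) / n
    return (n_ab_bar - lam) / math.sqrt(lam)


def intensity(q):
    """Normal approximation of the implication intensity, eq. (1)."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def rule_indexes(transactions, rules, min_intensity=0.5):
    """Algorithm 2, Step 2: q of every rule whose intensity exceeds min_intensity."""
    kept = {}
    for rid, (lhs, rhs) in rules.items():
        q = implication_index(*cardinalities(transactions, lhs, rhs))
        if intensity(q) > min_intensity:
            kept[rid] = q
    return kept


def equipotential_planes(q_values, theta):
    """Hypothetical Step 3: sort rules by q; open a new plane when a rule lies
    more than theta above the first rule of the current plane."""
    planes = []
    for rid, q in sorted(q_values.items(), key=lambda kv: kv[1]):
        if planes and q - planes[-1][0][1] <= theta:
            planes[-1].append((rid, q))
        else:
            planes.append([(rid, q)])
    return planes


def recommend(user_items, rules, q_values, k=3):
    """Step 4 sketch: items implied by rules whose premise the user already has,
    ranked by ascending q (the most negative q is the strongest implication)."""
    best = {}
    for rid, q in q_values.items():
        lhs, rhs = rules[rid]
        if lhs <= user_items:
            for item in rhs - user_items:
                best[item] = min(best.get(item, math.inf), q)
    return sorted(best, key=best.get)[:k]


if __name__ == "__main__":
    qs = rule_indexes(TRANSACTIONS, RULES)
    print(equipotential_planes(qs, theta=0.337565))
    print(recommend({"ESB", "RotJ"}, RULES, qs))  # ['Raiders']
```

Rules are ranked by ascending $q$ because a more negative index means fewer counter-examples than expected under independence, i.e. a higher implication intensity. On this toy data the rule `r3` is filtered out (intensity below 0.5); with $\theta = 0.337565$ the two remaining rules fall into one plane, while a tighter $\theta$ separates them.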
4.5 Scenario 3: Comparison with item-based collaborative filtering
In this scenario, ISF is compared with IBCF using the Pearson and Cosine measures in terms of ROC and precision-recall curves. As shown in Figures 6 and 7, the ISF model performs better than both the Cosine and the Pearson IBCF models, and its error indexes in Table 4 are lower than those of IBCF.

Figure 6. ROC curves comparing ISF and the other IBCF models
Figure 7. Precision and recall comparison between ISF and the other IBCF models

5. CONCLUSIONS
The approach based on the variation of the implication index was applied to model the ISF recommender system. The proposed model was tested on the MovieLens 100K dataset with the implicativefield toolkit, both for user recommendation and for evaluation against models using symmetric similarity measures; the results were mostly better than those of the models using common symmetric measures. These contributions are intended to increase the effectiveness of recommendations (improving the accuracy of ranking predictions) and to show trends in the rules.

6. REFERENCES
[1] Gediminas Adomavicius, Alexander Tuzhilin, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, pp. 734-749, 2005.
[2] Gediminas Adomavicius, Alexander Tuzhilin, Context-aware recommender systems, Springer US, pp. 217-253, 2011.
[3] Bin Cao, Qiang Yang, Jian-Tao Sun, Zheng Chen, Learning bidirectional asymmetric similarity for collaborative filtering via matrix factorization, Data Mining and Knowledge Discovery, Vol. 22, Issue 3, pp. 393-418, 2011.
[4] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh, Recommender system based on analysis implicative statistical user preferences over time, IX International Conference A.S.I. Analyse Statistique Implicative - Statistical Implicative Analysis (ASI9), France, 2017 (in Vietnamese) (Accepted).
[5] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh, Recommendation based on the variance of implication index in statistical implication field, Proceedings of the X National Conference on Fundamental and Applied IT Research (FAIR'17), Da Nang, 2017 (in Vietnamese) (Accepted).
[6] Mukund Deshpande, George Karypis, Item-based top-N recommendation algorithms, ACM Transactions on Information Systems 22(1), pp. 143-177, 2004.
[7] Rahul Katarya, Om Prakash Verma, Effective collaborative movie recommender system using asymmetric user similarity and matrix factorization, 2016 IEEE International Conference on Computing, Communication and Automation, DOI: 10.1109/CCAA.2016.7813692, pp. 1-12, 2016.
[8] Régis Gras, Pascale Kuntz, Nicolas Greffard, Notion de champ implicatif en analyse statistique implicative, The 8th International Meeting on Statistical Implicative Analysis, Tunisia, pp. 1-21, 2015 (in French).
[9] Régis Gras, Dominique Lahanier-Reuter, Duality between variables space and subjects space of the statistic implicative analysis (Dualité entre espace des variables et espace des sujets en analyse statistique implicative), The VI International Conference ASI Analyse Statistique Implicative - Implicative Statistical Analysis (ASI6), Caen, France, pp. 1-28, 2012.
[10] Régis Gras, Pascale Kuntz, Discovering R-rules with a directed hierarchy, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 10, Issue 5, Springer-Verlag, pp. 453-460, 2006.
[11] Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo (Eds.), Statistical Implicative Analysis: Theory and Applications, Springer-Verlag Berlin Heidelberg, 2008.
[12] Régis Gras, Pascale Kuntz, H. Briand, Les fondements de l'analyse statistique implicative et quelques prolongements pour la fouille de données, Mathématiques et Sciences Humaines 39, pp. 9-29, 2001.
[13] Régis Gras, Raphaël Couturier, Spécificités de l'Analyse Statistique Implicative (A.S.I.) par rapport à d'autres mesures de qualité de règles d'association, Quaderni di Ricerca in Didattica - GRIM (ISSN on-line 1592-4424), pp. 19-57, 2010.
[14] Dominique Lahanier-Reuter, Didactics of Mathematics and Implicative Statistical Analysis, Statistical Implicative Analysis - Studies in Computational Intelligence, pp. 277-298, 2008.
[15] Nghia Quoc Phan, Ky Minh Nguyen, Hoang Tan Nguyen, Hiep Xuan Huynh, Recommender system based approach combining association rule and implicative statistical measure, Proceedings of the VIII National Conference on Fundamental and Applied IT Research (FAIR'15), Ha Noi, 2015 (in Vietnamese).
[16] Lan Phuong Phan, Trang Uyen Tran, Hung Huu Huynh, Hiep Xuan Huynh, The user-based collaborative filtering recommender system using association rules combined with implication statistical cohesion measure, Proceedings of the IX National Conference on Fundamental and Applied IT Research (FAIR'16), Cần Thơ, 2016 (in Vietnamese).
[17] Francesco Ricci, Lior Rokach, Bracha Shapira, Introduction to Recommender Systems Handbook, Springer-Verlag and Business Media LLC, pp. 1-35, 2011.
[18] Zhi-Lin Zhao, Chang-Dong Wang, Jian-Huang Lai, AUI&GIV: Recommendation with asymmetric user influence and global importance value, Public Library of Science ONE, 2016.
Authors' background
Name | Title | Research field | Personal website
Hoang, Tan Nguyen | Master student | Data mining, Statistical Implicative Analysis | none
Hung, Huu Huynh | PhD candidate | Computer Vision | Scv.udn.vn/hhhung
Hiep, Xuan Huynh | Associate Professor (HDR) | Data Mining, Artificial Intelligence, Statistical Implicative Analysis, Wireless Sensor Network | none
