Local PurTree Spectral Clustering for Massive Customer Transaction Data

Xiaojun Chen, Si Peng, Joshua Zhexue Huang, Feiping Nie, and Yong Ming, Shenzhen University
A new method uses a purchase tree to represent customer transaction data and computes the distance between two trees, learning a data similarity matrix from the local distances and the level weights simultaneously.

The clustering of customer transaction data is one of the most critical tasks in successful modern marketing and customer relationship management. It's used to categorize customers into different groups based on their purchase behaviors, indicating that customers in the same cluster bought goods more similar to each other than to those in other clusters. Early segmentation methods used general variables such as customer demographics, lifestyle, attitude, and psychology because such variables are intuitive and easy to operate.1 But with the rapid increase in the amount of customer behavior data, new studies use product-specific variables such as items purchased.2 These methods often define distances on transaction records, but such distance functions can't effectively handle transaction data because the number of transaction records is often very large. Most distances defined on transaction data are categorical distances, so hierarchical clustering and genetic algorithms are commonly used for clustering.2,3 However, such methods can't handle large-scale transaction data due to their high computational complexity. Advanced clustering algorithms have been proposed in the past, such as spectral clustering,4 subspace clustering,5 multiview clustering,6 and clustering ensemble,7,8 but few methods have been used for transaction data.

Recently, a PurTreeClust clustering algorithm was proposed for large-scale transaction data.9 In this algorithm, a purchase tree is built for each customer from the large-scale transaction data, and a PurTree distance metric is defined to measure the difference between two purchase trees. The PurTreeClust algorithm builds a cover tree for indexing the purchase tree dataset, selecting initial cluster centers through a fast leveled density
estimation method. Finally, the clustering result is produced by assigning each customer to its nearest cluster center. However, as encouraging as this new method is, it's difficult to adjust level weights in the PurTree distance, and there's no optimization method for the clustering result.

In this article, we propose LPS, a local PurTree spectral clustering algorithm, to address these two shortcomings. We use a weighted PurTree distance to measure the difference between two purchase trees. This new method automatically learns the data similarity matrix from the local distances and the level weights simultaneously during the clustering process; an iterative optimization algorithm optimizes our proposed model. We used six real-life transaction datasets, representing nearly 500 million transaction records and 6,000 users, to compare the clustering results of LPS with those of four commonly used clustering methods for transaction data. The experimental results show that LPS outperformed the other algorithms.

Notations and Preliminaries
Before we get into the method and experiments, we first need to set the notations that we'll use throughout the article. Let T be a tree with nodes N(T) and edges E(T) ⊆ N(T) × N(T). A node without children is a leaf; otherwise, it's an internal node. For an edge {v, w} ∈ E(T), node w is a child of v, that is, w ∈ C_w(T)? No—w ∈ C_v(T), the set of v's children. The level of a node is defined as 1 + (the number of edges between the node and the root). N_l(T) represents the nodes in the lth level of T. The height of tree T, denoted by H(T), is the number of edges on the longest downward path between the root and a leaf node.

Let Ψ be a tree used to systematically organize items with multiple levels of categories, with each leaf node representing an item and each internal node representing a category. In most transaction datasets, items are organized with the same number of categories. Here, we assume that all products are organized with the same levels of categories, which means that all leaf nodes in Ψ have equal depth. A purchase tree φ is a subgraph of Ψ, that is, N(φ) ⊆ N(Ψ) and E(φ) ⊆ E(Ψ). Given a node v ∈ N(φ), there must exist a leaf node w ∈ N(φ) such that the path from root(φ) to w also exists in Ψ. For each purchase tree φ, H(φ) = H(Ψ). A purchase tree is used to illustrate the items bought by a customer.

Definition 1. Given a product tree Ψ and a set of n purchase trees Φ where φ_i ⊆ Ψ, the leveled PurTree distance between two purchase trees φ_i and φ_j on the lth level is defined as

d_l(\varphi_i, \varphi_j) = \sum_{v \in N_l(\varphi_i) \cup N_l(\varphi_j)} a_v \, \delta_v(\varphi_i, \varphi_j), \qquad (1)

where δ_v(φ_i, φ_j) is the Jaccard distance of φ_i and φ_j on an internal node v, defined as

\delta_v(\varphi_i, \varphi_j) = 1 - \frac{|C_v(\varphi_i) \cap C_v(\varphi_j)|}{|C_v(\varphi_i) \cup C_v(\varphi_j)|}, \qquad (2)

and a_v is the node weight for node v ∈ N(Ψ), defined as

a_v = \begin{cases} 1 & \text{if } v = \mathrm{root}(\Psi) \\ \frac{a_w}{|C_w(\varphi_i) \cup C_w(\varphi_j)|} & \text{if } v \in C_w(\varphi_i) \cup C_w(\varphi_j) \end{cases} \qquad (3)

Definition 2. Given H(Ψ) level weights w ∈ R^{H(Ψ)×1}, the PurTree distance between two purchase trees φ_i, φ_j ∈ Φ is computed as the weighted sum of the leveled tree distances defined in Equation 1:

d(\varphi_i, \varphi_j, \mathbf{w}) = \sum_{l=1}^{H(\Psi)} w_l \, d_l(\varphi_i, \varphi_j). \qquad (4)

The PurTree distance d(φ_i, φ_j, w) has the following properties:9

- d(φ_i, φ_j, w) ∈ [0, 1];
- if 1 ≤ l ≤ t ≤ H(Ψ), then d_l(φ_i, φ_j) ≤ d_t(φ_i, φ_j); and
- d(φ_i, φ_j, w) is a metric.

Local PurTree Spectral Clustering
Given a product tree Ψ and n purchase trees Φ, we want to cluster Φ into c clusters. Let d_{ij}^l be the leveled tree distance between φ_i and φ_j on the lth level, computed as in Equation 1. Assume that φ_i and φ_j are connected with probability p_ij; a smaller distance d(φ_i, φ_j, w) should be assigned a larger probability p_ij. We want to simultaneously learn w and the connection probability matrix P = [p_ij]_{n×n}, such that the graph constructed from P consists of exactly c connected components. To achieve this goal, we form the following clustering problem:

\min_{P, F, \mathbf{w}} \sum_{i,j=1}^{n} \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l p_{ij} + \gamma \sum_{i,j=1}^{n} p_{ij}^2 + \eta \sum_{l=1}^{H(\Psi)} w_l^2 \qquad (5)

subject to

\forall i,\ \mathbf{p}_i^T \mathbf{1} = 1,\ p_{ij} \in [0,1];\quad \mathbf{w}^T \mathbf{1} = 1,\ w_l \in [0,1];\quad \mathrm{rank}(L_P) = n - c, \qquad (6)

where d_{ij}^l is the abbreviation of d_l(φ_i, φ_j), L_P = D_P − (P^T + P)/2 is the Laplacian matrix, the degree matrix D_P ∈ R^{n×n} is a diagonal matrix whose ith diagonal entry is \sum_{j=1}^{n} (p_{ij} + p_{ji})/2, and γ and η are two regularization parameters. The rank constraint on L_P ensures that the graph represented by P contains exactly c connected components.

The first term in Equation 5 is the pairwise product of the PurTree distance and the connection probability, through which a smaller distance d(φ_i, φ_j, w) will be assigned a larger probability p_ij.
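Definitions 1 and 2 can be made concrete with a small sketch. The following is illustrative only (not the authors' implementation): it assumes each purchase tree is stored as a dictionary mapping an internal node to the set of its children, with node names unique across the tree.

```python
def children(tree, v):
    """Child set of node v in a purchase tree stored as {node: set_of_children}."""
    return tree.get(v, set())

def leveled_purtree_distance(t1, t2, root, height):
    """Leveled PurTree distances d_l (Equation 1) between two purchase trees
    t1, t2 drawn from the same product tree with root `root` and height H(Psi)."""
    d = []
    weights = {root: 1.0}   # node weights a_v (Equation 3): root has weight 1
    level = [root]          # nodes of the current level present in t1 or t2
    for _ in range(height):
        d_l, next_level = 0.0, []
        for v in level:
            c1, c2 = children(t1, v), children(t2, v)
            union = c1 | c2
            if not union:   # v is a leaf in both trees
                continue
            jaccard = 1.0 - len(c1 & c2) / len(union)   # Equation 2
            d_l += weights[v] * jaccard                 # Equation 1
            for w in union: # a child inherits a_v divided by the union size
                weights[w] = weights[v] / len(union)
                next_level.append(w)
        d.append(d_l)
        level = next_level
    return d

def purtree_distance(t1, t2, root, height, w):
    """Weighted PurTree distance (Equation 4): sum_l w_l * d_l."""
    return sum(wl * dl for wl, dl in
               zip(w, leveled_purtree_distance(t1, t2, root, height)))
```

For example, with a product tree of height 2, a customer who bought only item a1 in category A and one who bought a2 in A plus b1 in B disagree mildly at the category level but completely at the item level, so d_1 ≤ d_2, matching the second property above.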
To keep the neighbor graph sparse, each purchase tree is connected only with its k nearest neighbors b_i, and the problem becomes

\min_{P, F, \mathbf{w}} \sum_{i,j=1}^{n} \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l p_{ij} + \gamma \sum_{i,j=1}^{n} p_{ij}^2 + \eta \sum_{l=1}^{H(\Psi)} w_l^2 \qquad (7)

subject to

\forall i,\ \mathbf{p}_i^T \mathbf{1} = 1,\ p_{ij} \in [0,1],\ \forall j \notin b_i,\ p_{ij} = 0;\quad \mathbf{w}^T \mathbf{1} = 1,\ w_l \in [0,1];\quad \mathrm{rank}(L_P) = n - c. \qquad (8)

It's difficult to solve Equation 7 because L_P = D_P − (P^T + P)/2 and D_P both depend on P, and the rank constraint rank(L_P) = n − c is a complex nonlinear constraint. Fortunately, because L_P is positive semidefinite, the rank constraint holds exactly when the sum of the c smallest eigenvalues of L_P is zero, and by Ky Fan's theorem this sum equals \min_{F \in R^{n \times c}, F^T F = I} \mathrm{Tr}(F^T L_P F). The rank constraint can therefore be replaced with a penalty term, yielding

\min_{P, F, \mathbf{w}} \sum_{i,j=1}^{n} \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l p_{ij} + \gamma \sum_{i,j=1}^{n} p_{ij}^2 + \eta \sum_{l=1}^{H(\Psi)} w_l^2 + 2\lambda \, \mathrm{Tr}(F^T L_P F) \qquad (10)

subject to the constraints in Equation 8 without the rank constraint, plus F ∈ R^{n×c}, F^T F = I, where λ is chosen large enough to drive the penalty to zero. Equation 10 is solved by alternately optimizing F, P, and w.

Optimization of F
When P and w are fixed, Equation 10 becomes

\min_{F \in R^{n \times c},\, F^T F = I} \mathrm{Tr}(F^T L_P F). \qquad (12)

The optimal solution F to Equation 12 is formed by the eigenvectors corresponding to the c smallest eigenvalues of L_P.

Optimization of P
When F and w are fixed, Equation 10 becomes

\min_{P} \sum_{i,j=1}^{n} \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l p_{ij} + \gamma \sum_{i,j=1}^{n} p_{ij}^2 + 2\lambda \, \mathrm{Tr}(F^T L_P F) \quad \text{s.t.}\ \forall i,\ \mathbf{p}_i^T \mathbf{1} = 1,\ p_{ij} \in [0,1],\ \forall j \notin b_i,\ p_{ij} = 0. \qquad (13)

Because

\mathrm{Tr}(F^T L_P F) = \frac{1}{2} \sum_{i,j=1}^{n} \| \mathbf{f}^i - \mathbf{f}^j \|_2^2 \, p_{ij}, \qquad (14)

where f^i denotes the ith row of F, the problem decouples over the rows of P. Writing

E_{ij} = \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l + \lambda \| \mathbf{f}^i - \mathbf{f}^j \|_2^2, \qquad (15)

the Lagrangian function of Equation 13 for the ith row is

L(\mathbf{p}_i, X_i, \boldsymbol{\tau}_i) = \sum_{j=1}^{n} E_{ij} p_{ij} + \gamma \sum_{j=1}^{n} p_{ij}^2 + X_i (\mathbf{p}_i^T \mathbf{1} - 1) - \boldsymbol{\tau}_i^T \mathbf{p}_i, \qquad (16)

where X_i and the positive vector τ_i ∈ R_+^{n×1} are Lagrangian multipliers. According to the Karush-Kuhn-Tucker (KKT) condition, we have

\frac{\partial L(\mathbf{p}_i, X_i, \boldsymbol{\tau}_i)}{\partial p_{ij}} = 2\gamma p_{ij} + E_{ij} + X_i - \tau_{ij} = 0; \quad \sum_{j=1}^{n} p_{ij} - 1 = 0; \quad \forall j,\ \tau_{ij} p_{ij} = 0; \quad \forall j \notin b_i,\ p_{ij} = 0. \qquad (17)

It can be verified that the optimal p_ij of Equation 16 is

p_{ij} = \left( -\frac{E_{ij} + X_i}{2\gamma} \right)_+, \qquad (18)

where (x)_+ = max(x, 0) and E_ij is defined in Equation 15. To make p_i consist of at most k positive values, we have to carefully set γ such that p_i computed from Equation 18 only consists of k positive values. The k-nn probabilities p_i can be obtained by solving X_i from the constraint \sum_{j=1}^{n} p_{ij} = 1 in the following equation:

p_{ij} = \begin{cases} \left( -\frac{E_{ij} + X_i}{2\gamma} \right)_+ & \text{if } j \in b_i \\ 0 & \text{if } j \notin b_i \end{cases} \qquad (19)

Optimization of w
When P and F are fixed, Equation 10 becomes

\min_{\mathbf{w}^T \mathbf{1} = 1,\, w_l \in [0,1]} \sum_{l=1}^{H(\Psi)} w_l \sum_{i,j=1}^{n} d_{ij}^l p_{ij} + \eta \sum_{l=1}^{H(\Psi)} w_l^2. \qquad (20)

The Lagrangian function of Equation 20 is

L(\mathbf{w}, X, \boldsymbol{\tau}) = \sum_{l=1}^{H(\Psi)} w_l \sum_{i,j=1}^{n} d_{ij}^l p_{ij} + \eta \sum_{l=1}^{H(\Psi)} w_l^2 + X (\mathbf{w}^T \mathbf{1} - 1) - \boldsymbol{\tau}^T \mathbf{w}, \qquad (21)

where X and the positive vector τ ∈ R_+^{H(Ψ)×1} are Lagrangian multipliers. According to the KKT condition, we have

\frac{\partial L(\mathbf{w}, X, \boldsymbol{\tau})}{\partial w_l} = \sum_{i,j=1}^{n} d_{ij}^l p_{ij} + 2\eta w_l + X - \tau_l = 0; \quad \sum_{l=1}^{H(\Psi)} w_l - 1 = 0; \quad \forall l,\ \tau_l w_l = 0. \qquad (22)

It can be verified that the optimal w_l of Equation 21 is

w_l = \left( -\frac{\sum_{i,j=1}^{n} p_{ij} d_{ij}^l + X}{2\eta} \right)_+, \qquad (23)

where X can be solved from the constraint \sum_{l=1}^{H(\Psi)} w_l = 1.

Theorem 1 states the properties of w.

Theorem 1. If 1 ≤ l ≤ t ≤ H(Ψ), then w_l ≥ w_t, where w is computed according to Equation 23.

Proof. The leveled PurTree distance increases from the top level to the bottom level;9 therefore, if 1 ≤ l ≤ t ≤ H(Ψ), then d_{ij}^l ≤ d_{ij}^t. It can be verified that \sum_{i,j=1}^{n} p_{ij} d_{ij}^l \le \sum_{i,j=1}^{n} p_{ij} d_{ij}^t. According to Equation 23, we have w_l ≥ w_t.

Determine γ
In the new method, a regularization parameter γ is used to adjust the distribution of P. In practice, it's difficult to tune γ because its value could range from zero to infinity. In this subsection, we present an effective method to determine the value of γ. We can verify that the optimal p_ij of Equation 7 is

p_{ij} = \left( \frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{h=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_h}^l - \frac{1}{2\gamma_i} \sum_{l=1}^{H(\Psi)} w_l d_{ij}^l \right)_+, \qquad (24)

where i_1, …, i_{k+1} index b_i, the k + 1 nearest purchase trees of φ_i. We wish p_i to contain at most k positive values, which means

\frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{j=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_j}^l - \frac{1}{2\gamma_i} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_k}^l > 0, \quad \frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{j=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_j}^l - \frac{1}{2\gamma_i} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_{k+1}}^l \le 0. \qquad (25)

Then we have

\frac{k}{2} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_k}^l - \frac{1}{2} \sum_{j=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_j}^l < \gamma_i \le \frac{k}{2} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_{k+1}}^l - \frac{1}{2} \sum_{j=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_j}^l. \qquad (26)

In Equation 7, we wish all k neighbors to be used for clustering, so γ_i can be set to its upper bound

\gamma_i = \frac{k}{2} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_{k+1}}^l - \frac{1}{2} \sum_{j=1}^{k} \sum_{l=1}^{H(\Psi)} w_l d_{i,i_j}^l. \qquad (27)

However, it's impossible to compute this exact upper bound because w will change during the clustering process. We turn to computing an approximate upper bound with equal level weights, and set γ_i as the approximate upper bound:

\gamma_i = \frac{1}{2H(\Psi)} \sum_{l=1}^{H(\Psi)} \left( k \, d_{i,i_{k+1}}^l - \sum_{j=1}^{k} d_{i,i_j}^l \right). \qquad (28)

The final γ can be set as the median of {γ_1, …, γ_n}. Because k is an integer and has explicit meaning, it's easy to tune γ by selecting the proper k.
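The closed-form updates in Equations 19 and 23 can be sketched as follows. This is a simplified NumPy sketch, not the authors' code: it assumes the combined costs E_ij (Equation 15) are already computed, solves X by assuming all k neighbors stay active, and renormalizes after clipping.

```python
import numpy as np

def update_p_row(E_row, neighbors, gamma):
    """One row of P (Equations 18-19): p_ij = (-(E_ij + X_i)/(2*gamma))_+,
    restricted to the k nearest neighbors, with X_i set so the row sums to 1."""
    e = E_row[neighbors]
    # With all k neighbors active, sum_j -(e_j + X)/(2*gamma) = 1 gives X:
    X = -(2.0 * gamma + e.sum()) / len(e)
    p = np.zeros_like(E_row)
    p[neighbors] = np.maximum(-(e + X) / (2.0 * gamma), 0.0)
    return p / p.sum()   # renormalize in case clipping removed mass

def update_w(S, eta):
    """Closed-form level weights (Equations 23 and 29-30).
    S[l] = sum_ij p_ij * d^l_ij is the weighted dispersion of level l."""
    H = len(S)
    X = -(2.0 * eta + S.sum()) / H               # Equation 29
    w = np.maximum(-(S + X) / (2.0 * eta), 0.0)  # Equation 23
    return w / w.sum()
```

Note how update_w reproduces Theorem 1: a level with larger weighted dispersion S[l] receives a smaller weight, so weights decrease toward the bottom levels.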
Determine η
Next, we need to determine the value of η. To utilize information in the data, we want {w_l}_{l=1}^{H(Ψ)} to be positive. According to Equation 23, X can be solved from the constraint \sum_{l=1}^{H(\Psi)} w_l = 1 as

X = -\frac{2\eta + \sum_{l=1}^{H(\Psi)} \sum_{i,j=1}^{n} p_{ij} d_{ij}^l}{H(\Psi)}. \qquad (29)

Substituting X into Equation 23, we have

w_l = \frac{1}{H(\Psi)} + \frac{1}{2\eta} \left( \frac{1}{H(\Psi)} \sum_{l'=1}^{H(\Psi)} \sum_{i,j=1}^{n} p_{ij} d_{ij}^{l'} - \sum_{i,j=1}^{n} p_{ij} d_{ij}^l \right). \qquad (30)

To force {w_l}_{l=1}^{H(Ψ)} to be positive, we must have

\eta > \max_l \frac{H(\Psi)}{2} \left( \sum_{i,j=1}^{n} p_{ij} d_{ij}^l - \frac{1}{H(\Psi)} \sum_{l'=1}^{H(\Psi)} \sum_{i,j=1}^{n} p_{ij} d_{ij}^{l'} \right). \qquad (31)

Because P is unknown, it's impossible to compute the exact lower bound of η. We turn to computing the upper bound of the lower bound of η and set η to this bound:

\eta = \max_l \frac{H(\Psi)}{2} \sum_{i=1}^{n} \max_{j \in b_i} \left( d_{ij}^l - \frac{1}{H(\Psi)} \sum_{l'=1}^{H(\Psi)} d_{ij}^{l'} \right). \qquad (32)

Then η can be tuned by selecting the proper b_i, which is determined by k.

The Optimization Algorithm
The detailed algorithm to solve Equation 7 is summarized in Algorithm 1. Given a set of n purchase trees Φ, we want to cluster Φ into c clusters. With a given number of nearest neighbors k, the k nearest neighbors b_i of each purchase tree φ_i are formed. The two regularization parameters γ and η are computed according to Equations 28 and 32. F, w, and P in Equation 10 are then iteratively solved until they converge. (In practice, λ can be determined during the iteration. For example, we can initialize λ = γ, then in each iteration increase λ if the number of connected components of P is smaller than c and decrease λ if it's greater than c.)

Algorithm 1. LPS (product tree Ψ, a set of n purchase trees Φ, the number of clusters c, the number of nearest neighbors k).

Experimental Results and Analysis
We conducted experiments on six real-life transaction datasets to investigate the effectiveness and scalability of the LPS algorithm. D1 was built from a Kaggle competition (www.kaggle.com/c/acquire-valued-shoppers-challenge/data) and consists of more than 7 million transaction records from 202 customers. D2 was built from four years of a superstore's transactional data (https://community.tableau.com/docs/DOC-1236) and consists of more than 8,000 transaction records from 795 customers. D3, D4, D5, and D6 were built from a big supermarket's data and contain more than 4 million transaction records from 2015. The numbers of customers in these four datasets are 1,338, 1,500, 2,000, and 2,676, respectively. The heights of the purchase trees in the six datasets are 5, 3, 4, 4, 4, and 4, respectively.

Evaluation Methods
Because the PurTree distance depends on the level weights that will be learned, we propose a normalized logarithm of within-cluster dispersion (NLW) for evaluating a clustering result, which is computed as

NLW(c) = \frac{\log \left( \sum_{l=1}^{c} \frac{1}{2|C_l|} \sum_{i,j \in C_l} d(\varphi_i, \varphi_j) \right)}{\log \left( \sum_{i,j=1}^{n} d(\varphi_i, \varphi_j) \right)}, \qquad (33)

where {C_1, …, C_c} is the clustering result of c partitions of the purchase tree dataset Φ = {φ_i}_{i=1}^n. The lower the NLW(c), the better the clustering result.

In a social network, modularity is often used to measure the strength of a network's division into modules. Intuitively, it can also be used to measure the quality of a clustering result produced by spectral clustering. The modularity is computed as10

Q = \frac{1}{m} \sum_{l=1}^{c} \sum_{i,j \in C_l,\, i \ne j} \left( s_{ij} - \frac{s_i s_j}{m} \right), \qquad (34)

where s_ij = 1 − d(φ_i, φ_j), s_i = \sum_{j=1}^{n} s_{ij}, and m = \sum_{i,j} s_{ij}. A positive Q indicates a clustering result better than random assignment. The higher the Q, the better the clustering result.

Experimental Results on LPS
We first investigate the relationship among the number of neighbor trees k, the final number of clusters c, and the level weights w. In this experiment, we selected 96 integers from 5 to 100 for c and 20 integers from 5 to 100 for k to run LPS on D3. We computed the average level weights w for each k and for each c from these results (see Figure 1). From these results, we can see that the weights didn't change too much
Figure 1. The relationship among k, c, and level weights: (a) k versus w and (b) c versus w.
Figure 2. The relationship among k, c, and Q: (a) k versus Q, and (b) c versus Q.
with changes of k and c. These results are consistent with Theorem 1, which states that the level weights decrease with the increase in tree levels.

The maximal modularity Q was also computed for each k and c from LPS's results (see Figure 2). From these results, we can see that Q increased with the increase of k at the start, then became nearly stable after k ≥ 50, indicating that LPS can produce a good clustering result with a small k. From Figure 2b, we can see that there exists a sudden increase of Q when c = 48, after which Q decreases. Because Q reached its largest value when c = 48, we select 48 as the number of clusters to investigate cluster structure.

We select the clustering result with the highest Q to visually investigate the effectiveness of the LPS algorithm's clustering result. We first computed the paired distances between trees in D3 and drew the distance matrix (Figure 3a; each row and column represent a purchase tree in the same order, with darker colors representing lower distances and lighter colors representing higher distances). From Figure 3a, we can't clearly see the cluster structure. To check whether it can be observed from the LPS result, we arranged the tree order such that trees in the same cluster were close to each other. The result is shown in Figure 3b, in which the 48 clusters are placed in order of decreasing cluster size from left to right (on the left, two big clusters each contain more than 400 objects; on the right, 37 small clusters contain fewer than 10 objects). The small clusters can be considered outliers.
Figure 3. Distance matrix on D3: (a) original and (b) 48 clusters recovered by LPS.
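The modularity score of Equation 34 used for this model selection can be sketched as follows. This is an illustrative NumPy sketch, assuming a precomputed symmetric PurTree distance matrix D and an integer label per tree; it is not the authors' code.

```python
import numpy as np

def modularity(D, labels):
    """Modularity Q (Equation 34) of a clustering, computed on the
    similarity matrix s_ij = 1 - d(phi_i, phi_j).  D is an n x n symmetric
    PurTree distance matrix; labels is an n-vector of cluster assignments."""
    S = 1.0 - D
    s = S.sum(axis=1)    # s_i = sum_j s_ij
    m = S.sum()          # m = sum_ij s_ij
    Q = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for i in idx:
            for j in idx:
                if i != j:   # Equation 34 sums over i != j within each cluster
                    Q += S[i, j] - s[i] * s[j] / m
    return Q / m
```

On a toy distance matrix with two perfectly separated blocks, the correct two-way split yields a positive Q, while a split that mixes the blocks yields a negative Q, matching the interpretation given above.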
References

4. F. Nie, X. Wang, and H. Huang, "Clustering and Projected Clustering with Adaptive Neighbors," Proc. 20th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2014, pp. 977-986.
5. X. Chen et al., "A Feature Group Weighting Method for Subspace Clustering of High-Dimensional Data," Pattern Recognition, vol. 45, no. 1, 2012, pp. 434-446.
6. X. Chen et al., "TW-k-means: Automated Two-Level Variable Weighting Clustering Algorithm for Multi-view Data," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 4, 2013, pp. 932-944.
7. Y. Yang and J. Jiang, "Hybrid Sampling-Based Clustering Ensemble with Global and Local Constitutions," IEEE Trans. Neural Networks and Learning Systems, vol. 27, no. 5, 2016, pp. 952-965.
8. Z. Yu et al., "Incremental Semi-supervised Clustering Ensemble for High Dimensional Data Clustering," IEEE Trans. Knowledge and Data Eng., vol. 28, no. 3, 2016, pp. 701-714.
9. X. Chen, J.Z. Huang, and J. Luo, "PurTreeClust: A Purchase Tree Clustering Algorithm for Large-Scale Customer Transaction Data," Proc. 32nd IEEE Int'l Conf. Data Eng., 2016, pp. 661-672.
10. M.E. Newman, "Analysis of Weighted Networks," Physical Rev. E, vol. 70, no. 5, 2004, article no. 056131.

The Authors

Xiaojun Chen is a lecturer in the College of Computer Science and Software at Shenzhen University. His research interests include machine learning, data mining, and pattern recognition. Contact him at xjchen@szu.edu.cn.

Si Peng is a master's student in the College of Computer Science and Software at Shenzhen University. Her research interests focus on clustering. Contact her at 2150230405@email.szu.edu.cn.

Joshua Zhexue Huang is a professor in the College of Computer Science and Software at Shenzhen University. His research interests include machine learning, data mining, and pattern recognition. Contact him at zx.huang@szu.edu.cn.

Feiping Nie is a professor in the College of Computer Science and Software at Shenzhen University. His research interests include machine learning, data mining, pattern recognition, and computer vision. Contact him at feipingnie@gmail.com.

Yong Ming is a master's student in the College of Computer Science and Software at Shenzhen University. His research interests focus on clustering. Contact him at 948325572@qq.com.