Chen 2017

Big Data
Local PurTree
Spectral Clustering
for Massive Customer
Transaction Data
Xiaojun Chen, Si Peng, Joshua Zhexue Huang,
Feiping Nie, and Yong Ming, Shenzhen University
T
A new method uses he clustering of customer transaction data is one of the most critical
a purchase tree to tasks in successful modern marketing and customer relationship man-
represent customer agement. Its used to categorize customers into different groups based on their
transaction data purchase behaviors, indicating that customers in the same cluster bought more
and computes the goods similar to each other than to those in methods cant handle large-scale transaction
other clusters. Early segmentation methods data due to their high computational complex-
distance between used general variables such as customer de- ity. Advanced clustering algorithms have been
mographics, life style, attitude, and psychol- proposed in the past, such as spectral cluster-
two trees, learning ogy because such variables are intuitive and ing,4 subspace clustering,5 multiview cluster-
easy to operate.1 But with the rapid increase ing, 6 and clustering ensemble,7,8 but few
a data similarity in the amount of customer behavior data, methods have been used for transaction data.
new studies use product-specific variables Recently, a PurTreeClust clustering algo-
matrix from the such as items purchased.2 These methods rithm was proposed for large-scale transac-
often define distance in transaction records, tion data.9 In this algorithm, a purchase tree
local distances and but such distance functions cant effectively is built for each customer from the large-scale
handle transaction data because the number transaction data, and a PurTree distance met-
the level weights of transaction records is often very large. ric is defined to measure the difference be-
Most distances defined in transaction data tween two purchase trees. The PurTreeClust
simultaneously. are categorical distances, so hierarchical algorithm builds a cover tree for indexing
clustering and genetic algorithms are com- the purchase tree dataset, selecting initial
monly used for clustering.2,3 However, such cluster centers through a fast leveled density
march/april 2017 1541-1672/17/$33.00 2017 IEEE 37

Published by the IEEE Computer Society
Big Data
estimation method. Finally, the clus- most transaction datasets, items are The PurTree distance d(j i, j j, w)
tering result is produced by assigning organized with the same number of has the following properties:9
each customer to its nearest cluster categories. Here, we assume that all
center. However, as encouraging as this products are organized with the same d(j i, j j, w) [0, 1];
new method is, its difficult to adjust levels of categories, which means that if 1 l t H(F), then dl(j i, j j,
level weights in the PurTree distance, all leaf nodes in Y have equal depth. w) dt(j i, j j, w); and
and theres no optimization method for A purchase tree j is a subgraph of d(j i, j j, w) is a metric.
the clustering result. Y, that is, N(j) N(Y), E(j) E(Y).
In this article, we propose LPS, a lo- Given a node v N (j), there must Local PurTree Spectral
cal PurTree spectral clustering algo- exist a leaf node w N(j) such that Clustering
rithm, to solve these two shortcomings. the path from root(j) to w also exists Given a product tree Y and n pur-
We use a weighted PurTree distance to in Y. For each purchase tree j, H(j) = chase trees j, we want to cluster j
measure the difference between two H(Y). A purchase tree is used to illus- into c clusters. Let dijl be the leveled
purchase trees. This new method au- trate the items bought by a customer. tree distance between j i and j j on the
tomatically learns the data similarity lth level, which is computed as Equa-
matrix from the local distances and Definition 1. Given a product tree Y tion 1. Assume that j i and j j are con-
the level weights simultaneously dur- and a set of n purchase trees j where nected with probability pij. A smaller
ing the clustering process; an itera- j i Y, the leveled PurTree distance distance d(j i, j j, w) will be assigned
tive optimization algorithm optimizes between two purchase trees j i and with a larger probability pij. We tend
our proposed model. We used six real- j j on the lth level is defined as to simultaneously learn w and the con-
life transaction datasets, representing nection probability matrix P = [pij]nn,
nearly 500 million transaction records d l ( i , j ) = av v ( i , j ), such that the graph constructed from
and 6,000 users, to compare the clus- v N l ( i ) N l ( j ) P only consists of c connected compo-
tering results of LPS with those of four (1) nents. To achieve this goal, we form
commonly used clustering methods the following clustering problem:
for transaction data. The experimen- where d v(j i, j j) is the Jaccard dis-
n H ( )
tal results show that LPS outperformed tance of j i and j j on an internal
other algorithms. node v, which is defined as
min
P , F ,

l dil, j pij
i , j =1 l =1
n H ( )
Notations and Preliminaries
Before we get into the method and ex- v ( i , j ) = 1
Cv ( i ) Cv ( j )
, (2)
+ pij2 + l2 (5)
i , j =1 l =1
periments, we first need to set the nota- Cv ( i ) Cv ( j )
tions that well use throughout the ar- subject to
ticle. Let T be a tree with nodes N(T) and a v is the node weight for node
and edges E(T) N(T) N(T). A node v N(F), which is defined as i , pT
i 1 = 1, pij [0,1]

without children is a leaf; otherwise, its T
1 if v = root () 1 = 1, wl [0,1] , (6)
an internal node. For an edge {v, w} rank(L ) = n c
v w .
E(T), node w is the child of v, that is, w C ( ) C ( )
where v Cw ( i ) Cw ( j ) P
w i w j
Cv(T). The level of a node is defined
(3)
by 1 + (the number of edges between where dil, j is the abbreviation of dl(j i,
the node and the root). Nl(T ) represents Definition 2. Given H(F) level (
j j), LP = DP PT + P 2 is the Lapla-)
nodes in the lth level of T. The height of weights w RH(Y)1, the PurTree dis- cian matrix, the degree matrix DP
tree T, denoted by H(T), is the number tance between two purchase trees ji, Rnn is defined as a diagonal matrix
of edges on the longest downward path jj F is computed as the weighted (
where dij = nj =1 pij + p ji 2 , and )
between the root and a leaf node. sum of the leveled tree distances de- and h are two regularization parameters.
Let Y be a tree used to systemati- fined in Equation 1: The first term in Equation 5 is the
cally organize items with multiple lev- pairwise product of the PurTree dis-
els of categories, with each leaf node H ( ) tance and connection probability, in
representing an item and each inter- d ( i , j , ) = l d l ( i , j ). (4) which a smaller distance d(ji, jj, w) will
nal node representing a category. In l =1 be assigned with a larger probability pij.
38 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

The second and third terms are regular- researchers proved that rank(LP) = n c It can be verified that
ization terms of P and w, where and is equivalent to the following problem:4
n
h are two regularization parameters.
pij
2
2Tr(FT LP F) = fi f j ,
Because LP is a nonnegative matrix, it min T
2Tr(F LP F), (9) i , j =1
2
has been proved that the multiplicity c F R nc T

,F F = I
of the eigenvalue 0 of LP is equal to the where f i R c1 is the i-th row of F.
number of connected components in the where is a large enough parameter. Then Equation 13 can be rewritten
graph with the similarity matrix P. The Equation 7 is then equivalent to the as
rank constraint rank(LP) = n c is im- following problem:
n n
posed to LP, such that the sparse graph
constructed from P only consists of c
n H ( ) n H ( ) min [Eij pij + pij2 ]
connected components. The final clus-
min
P , F ,
l dil, j pij + pij2 + l2 P
i =1 j =1
i , j =1 l =1 i , j =1 l =1
tering assignment can be made by mak- st. i , pT
i 1 = 1, pij [0,1],
+ 2Tr(FT LP F)
(10)
ing the objects in the same connected j i , pij = 0, (14)
component belong to the same cluster. subject to
Because a transaction dataset of- where Eij is defined as
ten consists of a large number of us- i , pT
i 1 = 1, pij [0,1]
H ( )
ers, its very useful to explore the lo-
l dijl + fi f j 2 . (15)
2
j i , pij = 0 Eij =
cal connectivity during the clustering T . (11)
i =1
process. Lets assume that P is a sparse 1 = 1, wl [0,1]
T
graph in which for each i, pi contains F F = I , F Rn c Note that Equation 14 is indepen-
at most k positive values for its k-near- dent between different is, so we can
est neighbors. Since w is unknown, its We can apply the alternative opti- solve the following problem individu-
impossible to select the real k-nearest mization approach to solve Equation ally for each pi. The Lagrangian func-
neighbors. To simplify the analysis, 10. Lets examine how to derive the tion of Equation 14 for pi is
we assume that w is with equal level optimal solution to each variable in
weights and select the k-nearest neigh- Equation 10. n n
bors b i for each j i. Then we improve L(pi , X , ) = pij2 + Eij pij
Equation 5 to form the following local Optimization of F j=i j=i
PurTree clustering problem: When P and w are fixed, Equation 10 n
n H ( )
becomes +X

pij 1 pT

i , (16)
j=i
min
P , F ,
l dil, j pij min
F Rn c ,FT F = I
T
Tr(F LP F). (12)
i , j =1 l =1
H ( )
where X and positive vector t R+n1
n
+ pij2 + l2 (7) The optimal solution F to Equation
12 is formed by the c smallest eigen-
are Lagrangian multipliers.
According to the KarushKuhnTucker
i , j =1 l =1
values of L P. (KKT) condition, we have
subject to
Optimization of P
L(pi , X ,
i , pT
i 1 = 1, pij [0,1] When F and w are fixed, Equation 10 p = 2 pij + Eij + X j = 0
ij
j i , pij = 0 becomes
L(pi , X ,
T 1 = 1, wl [0,1]
. (8)
n H ( ) X
= n
j =1pij 1 = 0
rank(LP ) = n c
min
P
l dil, j pij
j , j = 0
i , j =1 l =1
n j i , pij = 0.
Its difficult to solve Equation 7 be-
cause LP = DP PT + P 2 and DP ( ) + pij2 + 2Tr(FT LP F) (17)
i , j =1
both depend on P, and the rank con-
straint rank(LP) = n c is a complex st. i , pT
i 1 = 1, pij [0,1], It can be verified that the optimal
nonlinear constraint. Fortunately, other j i , pij = 0. (13) pij of Equation 16 is
march/april 2017 www.computer.org/intelligent 39

Big Data
k H ( )
n 1
1
pij =

(Eij + X ) , (18) l =
1

2
pij dijl + X , (23) i
2 l dil, ik
dil, ij . (27)

2 + i , j =1 j =1 l =1
+
where Eij is defined in Equation 15. In Equation 7, we wish all k neigh-
To make pi consist of at most k pos- where X can be solved from the con- bors be used for clustering, so can be
itive values, we have to carefully set l straint lH=1() l = 1. set as its upper bound. However, its
such that pi computed from Equation 18 Theorem 1 states the properties of w. impossible to compute the exact upper
only consists of k positive values. The bound i because wi will change dur-
k-nn probabilities pi can be obtained Theorem 1. If 1 l t H(F), then ing the clustering process. We turn to
by solving c in the following equation wi w t, which is computed according compute an approximate upper bound
from the constraint nj =1 pij = 1 : to Equation 23. of with equal level weights of w, and
Proof set i as the approximate upper bound:
1 The PurTree distance is proved to
(Eij + X ) if j i
pij 2 + . (19) be decreasing from the top level to H ( ) k
1
0
if j i the bottom level.9 Therefore, if 1 i =
2H () d l
j =1
i , i , k +1
d l
i ,
.
ij

l t H(F), then dijl dijt . It can be l =1
Optimization of W verified that ni , j =1 pij dijl ni , j =1 pij dijt . (28)
When P and F are fixed, Equation 10 According to Equation 23, we have
becomes wl wt. The final can be set as the median
of {1 , , n}. Because k is an integer
n H ( )
Determine k and has explicit meaning, its easy to
T
min
1=1, [0,1]
l dijl pij In the new method, a regularization tune by selecting the proper k.
l i , j =1 l =1
parameter is used to adjust the dis-
H ( )
tribution of P. In practice, its difficult Determine g
+ l2 . (20)
to tune because its value could be from Next, we need to determine the value of
l =1
zero to infinite. In this subsection, we h. To utilize information in the data, we
H ( )
The Lagrangian function of Equa- present an effective method to deter- want { l }l =1 to be positive. According
tion 20 is mine the value of l. We can verify that to Equation 23, c can be solved from
the optimal pij of Equation 7 is the constraint lH=1() l = 1 as
n H ( ) H ( )
L(w , X , ) = l dijl pij + l2 pij =
2 lH=1() ni, j =1pij dijl . (29)
X= +
nh=1 lH=1() l dihl k lH=1() l dijl .
i , j =1 l =1 l =1
, i H () H ()
H ( ) +
+X
l 1 T

k 2k
Substituting into Equation 23, we
l =1 (24)
(21) have
We wish pi contains at most k posi- l =
where X and positive vector t R+H(Y)1 tive values, which means
are Lagrangian multipliers.
1
+
lH'=(1) ni, j =1pij dijl' H () ni, j =1pij dijl .
H () 2H ()
According to the KKT condition,
1 + kj =1 lH=1() l dil, ij
lH=1() l dil,
k
ik
0 (30)
we have k 2ki
, H ( )
1 k
j =1
H ( ) l
l =1 l d i , ij lH=1() l dil,
k
i , k +1
To force { }l =1 to be positive, we
L( , X , ) + 0
l
= ni, j =1dijl pij + 2 l + X j = 0 k 2ki have
(25)
L( , X , ) H ( ) n
lH=1() l 1 = 0
n
= 1

X
.
where ai bi consists of k + 1 nearest
max H ()
2 l pij dijl pij dijl' .

j , j = 0 i , j =1 l '=1 i , j =1

purchase trees for j i.
(31)
(22) Then we have
k H ( )
Because p is unknown, its impossi-
1
It can be verified that the optimal w l
of Equation 21 is
i
2 l dil, i ,k +1 dil, ij (26)

ble to compute the exact lower bound
of h. We turn to compute the upper
j =1 l =1

Ensure: Clustering result
l
= 1 where d ij is the l-th level purchase tree distance for i and j.
Compute D = {Dl}lH( )
1.
2. For each purchase tree i, form its k-nearest neighbors i, where is given equal
values.
3. Compute i for each purchase tree i according to Equation 28. Select as the median
of {1,, n}.
4. Compute according to Equation 32.
5. Repeat
6. Update F by selecting the first c smallest values from the eigenvector of LP.
7. Update according to Equation 23.
8. Update P according to Equations 19 and 15.
9. until Equation 10 converges.
10. Return the clustering result recovered from P.
Algorithm 1. LPS (product tree Y, a set of n purchase trees j, the number of clusters c, the number of nearest neighbors k).
bound of the lower bound of h and the effectiveness and scalability of the where {C, ..., Cc} is the clustering re-
set h as the bound: LPS algorithm. D1 was built from the sult of c partitions from a purchase
Kaggle competition (www.kaggle.com/c/ tree dataset = { i }ni =1. The lower
n H ( )
1 acquire-valued-shoppers-challenge/data) the N LW(c), the better the cluster-
= max
2 l max H ()dijl
j i
dijl' .
and consists of more than 7 million ing result.
i =1 l '=1
transaction records from 202 cus- In a social network, modularity is
(32) tomers. D2 was built from four years often used to measure the strength
of a superstores transactional data of a networks division into modules.
Then h can be tuned by selecting the (https://community.tableau.com/docs/ Intuitively, it can be used to measure
proper bi, which is determined by k. DOC-1236) and consists of more the quality of a clustering result pro-
than 8,000 transaction records from duced by a spectral clustering. The
The Optimization Algorithm 795 customers. D3, D4, D5, and D6 modularity is computed as10
The detailed algorithm to solve Equa- were built from a big supermarkets
tion 7 is summarized in Algorithm data and contain more than 4 mil- c
si s j
1
1. (In practice, can be determined lion transaction records from 2015. Q=
m sij
m
, (34)
during the iteration. For example, The number of customers in the four l =1 i , j C l , i j
we can initialize = l, then increase datasets are 1,338, 1,500, 2,000, and
if the connected components of 2,676, respectively. The heights of the where sij = 1 d(j i, j j), si = nj =1sij ,
P is smaller than c and decrease purchase trees in the six datasets are and m = ni , j sij . A positive Q indicates
if its greater than c in each itera- 5, 3, 4, 4, 4, and 4, respectively. a clustering result better than random
tion.) Given a set of n purchase trees assignment. The higher the Q, the
j, we want to cluster j into c clus- Evaluation Methods better the clustering result.
ters. With a given number of nearest Because the PurTree distance de-
neighbors k, the k-nearest neighbors pends on the level weights that will Experimental Results on LPS
b i for each purchase tree j is formed. be learned, we propose a normalized We first investigate the relation-
Two regularization parameters l and logarithm of within-cluster disper- ship among the number of neighbor
h are computed according to Equa- sion for evaluating a clustering result, trees k, the final number of clusters
tions 28 and 32. F, w, and P in Equa- which is computed as c, and the level weights w. In this
tion 10 are iteratively solved until experiment, we selected 96 integers
they converge. c from 5 to 100 for c and 20 integers from
1
N L W = log
l =1 2 C l
d ( i , j )

5 to 100 for k to run LPS on D 3. We
i , j C l
, computed the average level weights
Experimental Results n w for each k and for each c from
and Analysis log
i , j =1
d ( i , j )

these results by LPS (see Figure 1).
We conducted experiments on six real- From these results, we can see that
life transaction datasets to investigate (33) the weights didnt change too much

Big Data
1 2 3 4 1 2 3 4
0.5 0.5
0.4 0.4
0.3 0.3
Weights
Weights
0.2 0.2
0.1 0.1
0 0
0 100 200 300 400 500 0 20 40 60 80 100
(a) k (b) C
Figure 1. The relationship among k, c, and level weights: (a) k versus v and (b) c versus v.
0.11 0.15
0.1
0.09
0.1
0.08
Moduality
Moduality
0.07
0.06
0.05
0.05
0.04
0.03 0
0 10 20 30 40 50 60 70 80 90 100 0 20 40 60 80 100
(a) No. of nearest neighbors (b) C
Figure 2. The relationship among k, c, and Q: (a) k versus Q, and (b) c versus Q.
with changes of k and c. These re- Q when c = 48, which decreases af- the lighter color representing higher
sults are consistent with Theorem 1, ter that. Q reached its biggest value distance). From Figure 3a, we cant
which states that the level weights when c = 48, so we select 48 as the see clearly the cluster structure. To
decrease with the increase in tree number of clusters to investigate clus- check whether it can be observed
levels. ter structure. from the LPS result, we arranged the
The maximal modularity Q was We select the clustering result with tree order such that the trees in the
also computed for each k and c from the highest Q to visually investi- same cluster were close to each other.
LPSs results (see Figure 2). From gate the effectiveness of the LPS al- The result is shown in Figure 3b, in
these results, we can see that Q in- gorithms clustering result. We first which 48 clusters are placed in order
creased with the increase of k at the computed the paired distances be- of decreasing cluster size from left
start, then it became nearly stable af- tween trees in D3 and drew the dis- to right (on the left, two big clusters
ter k 50, indicating that LPS can tance matrix (Figure 3a; each row each contain more than 400 objects;
produce a good clustering result with and column represent a purchase tree on the right, 37 small clusters con-
a small k. From Figure 2b, we can see in the same order, with the darker tain fewer than 10 objects). The small
that there exists a sudden increase of color representing lower distance and clusters can be considered as outliers.

(a) (b)
Figure 3. Distance matrix on D3: (a) original and (b) 48 clusters recovered by LPS.
Table 1. Comparison results by five clustering algorithms on six datasets.*

Algorithms D1 D2 D3 D4 D5 D6
CHC 0.09(6.11) 0.06(7.46) 0.04(8.00) 0.03(8.11) 0.03(8.36) 0.03(8.69)
DBSCAN 0.17(6.00) 0.12(7.37) 0.02(7.89) 0.04(8.01) 0.02(8.30) 0.00(8.59)
HAC 0.47(6.02) 0.16(7.37) 0.02(7.90) 0.05(8.01) 0.03(8.30) 0.01(8.59)
PurTreeClust 0.47(6.03) 0.16(7.38) 0.02(7.90) 0.05(8.01) 0.03(8.30) 0.01(8.59)
LPS 0.74(7.50) 0.03(7.51) 0.11(7.36) 0.10(7.36) 0.10(8.62) 0.09(8.70)

* The value before brackets is the maximum Q and the value in brackets is the minimum NLW; the maximum Q and minimum NLW on each dataset are rendered in bold.
Comparison Results NLW on four datasets. Especially References

We used all six datasets to compare on D1, D5, and D 6, LPS significantly 1. R. Kuo, L. Ho, and C.M. Hu, Integra-
the effectiveness of the LPS algorithm outperforms all other algorithms in tion of Self-Organizing Feature Map
with those of four clustering algo- terms of Q and NLW. and k-Means Algorithm for Market
rithms: PurTreeClust,9 concept hier- Segmentation, Computers & Opera-
archy clustering (CHC), hierarchical tions Research, vol. 29, no. 11, 2002,
agglomerative clustering (HAC), and
DBSCAN with Jaccard distance on
individual items. The same 96 inte-
I n future work, well improve the
weighting technique used in dis-
tance by weighting individual tree
pp. 14751493.
2. F.-M. Hsu, L.-P. Lu, and C.-M. Lin,
Segmenting Customers by Transaction
gers from 5 to 100 were selected for c nodes instead of tree levels. It will be Data with Concept Hierarchy, Expert
to run the three algorithms. For each interesting to apply this method in Systems with Applications, vol. 39, no.
clustering algorithm, we computed clustering other objects embedded in 6, 2012, pp. 62216228.
the maximum Q and minimum NLW. a tree. 3. T. Xiong et al., Dhcc: Divisive Hi-
Table 1 shows the results; the value erarchical Clustering of Categorical
before brackets is the maximum Data, Data Mining and Knowledge
Q, and the value in brackets is the Acknowledgments Discovery, vol. 24, no. 1, 2012,
This research was supported by NSFC un-
minimum NLW. From these results, der grant no. 61305059, 61473194, and pp. 103135.
we can see that LPS produced the high- 61472258 and Natural Science Foundation 4. F. Nie, X. Wang, and H. Huang, Clus-
est Q on five datasets and the lowest of SZU (grant no. 201432). tering and Projected Clustering with

Big Data
The Authors
Xiaojun Chen is a lecturer in the College of Computer Science and Software at Shenzhen
Adaptive Neighbors, Proc. 20th ACM University. His research interests include machine learning, data mining, and pattern rec-
SIGKDD Intl Conf. Knowledge Discov- ognition. Contact him at xjchen@szu.edu.cn.
ery and Data Mining, 2014, pp. 977986. Si Peng is a masters student in the College of Computer Science and Software at Shen-
5. X. Chen et al., A Feature Group zhen University. Her research interests focus on clustering. Contact her at 2150230405@
Weighting Method for Subspace Clus- email.szu.edu.cn.
tering of High-Dimensional Data, Pat- Joshua Zhexue Huang is a professor in the College of Computer Science and Software at
tern Recognition, vol. 45, no. 1, 2012, Shenzhen University. His research interests include machine learning, data mining, and
pp. 434446. pattern recognition. Contact him at zx.huang@szu.edu.cn.
6. X. Chen et al., TW-k-means: Automat- Feiping Nie is a professor in the College of Computer Science and Software at Shenzhen
ed Two-level Variable Weighting Clus- University. His research interests include machine learning, data mining, pattern recogni-
tering Algorithm for Multi-view Data, tion, and computer vision. Contact him at feipingnie@gmail.com.
IEEE Trans. Knowledge and Data Eng., Yong Ming is a masters student in the College of Computer Science and Software at Shenzhen
vol. 25, no. 4, 2013, pp. 932944. University. His research interests focus on clustering. Contact him at 948325572@qq.com.
7. Y. Yang and J. Jiang, Hybrid Sampling-
Based Clustering Ensemble with Global
and Local Constitutions, IEEE Trans.
Neural Networks and Learning Sys- IEEE Trans. Knowledge and Data Transaction Data, Proc. 32nd IEEE Intl
tems, vol. 27, no. 5, 2016, pp. 952 965. Eng., vol. 28, no. 3, 2016, pp. 701714. Conf. Data Eng., 2016, pp. 661672.
8. Z. Yu et al., Incremental Semi- 9. X. Chen, J.Z. Huang, and J. Luo, Pur- 10. M.E. Newman, Analysis of Weighted
supervised Clustering Ensemble for TreeClust: A Purchase Tree Clustering Networks, Physical Rev. E, vol. 70,
High Dimensional Data Clustering, Algorithm for Large-Scale Customer no. 5, 2004, article no. 056131.
ADVERTISER INFORMATION
Advertising Personnel Southwest, California:

Mike Hughes
Marian Anderson: Sr. Advertising Coordinator Email: mikehughes@computer.org
Email: manderson@computer.org Phone: +1 805 529 6790
Phone: +1 714 816 2139 | Fax: +1 714 821 4010
Southeast:
Sandy Brown: Sr. Business Development Mgr. Heather Buonadies
Email sbrown@computer.org Email: h.buonadies@computer.org
Phone: +1 714 816 2144 | Fax: +1 714 821 4010 Phone: +1 973 304 4123
Fax: +1 973 585 7071
Advertising Sales Representatives (display)
Advertising Sales Representatives (Classified Line)
Central, Northwest, Far East:
Eric Kincaid Heather Buonadies
Email: e.kincaid@computer.org Email: h.buonadies@computer.org
Phone: +1 214 673 3742 Phone: +1 973 304 4123
Fax: +1 888 886 8599 Fax: +1 973 585 7071
Northeast, Midwest, Europe, Middle East: Advertising Sales Representatives (Jobs Board)
Ann & David Schissler
Email: a.schissler@computer.org, d.schissler@computer.org
Phone: +1 508 394 4026 Heather Buonadies
Fax: +1 508 394 1707 Email: h.buonadies@computer.org
Phone: +1 973 304 4123
Fax: +1 973 585 7071

Chen 2017

Enviado por

Dados do documento

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Chen 2017

Enviado por

Direitos autorais:

Formatos disponíveis

Big Data

march/april 2017 1541-1672/17/$33.00 2017 IEEE 37

38 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

has been proved that the multiplicity c F R nc T

PurTree clustering problem: When P and w are fixed, Equation 10 n

march/april 2017 www.computer.org/intelligent 39

40 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

march/april 2017 www.computer.org/intelligent 41

42 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Table 1. Comparison results by five clustering algorithms on six datasets.*

DBSCAN 0.17(6.00) 0.12(7.37) 0.02(7.89) 0.04(8.01) 0.02(8.30) 0.00(8.59)

HAC 0.47(6.02) 0.16(7.37) 0.02(7.90) 0.05(8.01) 0.03(8.30) 0.01(8.59)

PurTreeClust 0.47(6.03) 0.16(7.38) 0.02(7.90) 0.05(8.01) 0.03(8.30) 0.01(8.59)

LPS 0.74(7.50) 0.03(7.51) 0.11(7.36) 0.10(7.36) 0.10(8.62) 0.09(8.70)

Comparison Results NLW on four datasets. Especially References

march/april 2017 www.computer.org/intelligent 43

Advertising Personnel Southwest, California:

44 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Você também pode gostar