Kyoto University
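$$
\Delta^k(x_i) =
\begin{cases}
0 & \text{if } 0 \le x_i \le 2^{-k},\\
1 & \text{if } 2^{-k} < x_i \le 2 \cdot 2^{-k},\\
\ \vdots\\
2^k - 1 & \text{if } (2^k - 1) \cdot 2^{-k} < x_i \le 1.
\end{cases}
$$
We use the same operator $\Delta^k$ for discretization of a database $X$, i.e., each data point $x$ in $X$ is discretized to $\Delta^k(x)$ in the database $\Delta^k(X)$.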
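A minimal Python sketch of this discretization, assuming every attribute is normalized to [0, 1] and that cells have width 2^-k and are closed on the right. The function names are hypothetical and chosen only for illustration.

```python
import math

def discretize(x, k):
    """Map a value x in [0, 1] to a cell index in {0, ..., 2**k - 1}.

    Assumption: cell m covers (m * 2**-k, (m + 1) * 2**-k], and x = 0
    is assigned to cell 0.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    if x == 0.0:
        return 0
    return math.ceil(x * 2**k) - 1

def discretize_point(point, k):
    """Discretize every attribute of one data point."""
    return tuple(discretize(v, k) for v in point)

def discretize_database(X, k):
    """Discretize every data point of the database X."""
    return [discretize_point(p, k) for p in X]
```

For example, `discretize_point((0.1, 0.9), 2)` places the point in cell `(0, 3)` of a 4 x 4 grid.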
Reachability
Given a distance parameter $l \in \mathbb{N}$, a data point $x$ in $X$ is reachable at level $k$ from a data point $y$ if there exists a chain of points $z_1, z_2, \dots, z_p$ ($p \ge 2$) such that $z_1 = x$, $z_p = y$, and
$$
d_0(\Delta^k(z_i), \Delta^k(z_{i+1})) \le l \quad \text{and} \quad d_\infty(\Delta^k(z_i), \Delta^k(z_{i+1})) \le 1
$$
for all $i \in \{1, 2, \dots, p - 1\}$
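The distance between $x$ and $y$ is defined by
$$
d_0(x, y) = \sum_{i=1}^{d} \delta(x_i, y_i), \quad \text{where } \delta(x_i, y_i) =
\begin{cases}
0 & \text{if } x_i = y_i,\\
1 & \text{if } x_i \ne y_i,
\end{cases}
$$
$$
d_\infty(x, y) = \max_{i \in \{1, \dots, d\}} |x_i - y_i|
$$
in the $L_0$ and $L_\infty$ metrics, respectively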
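The two distances and the chain condition can be sketched as follows. This is an illustration only: it works on points that are already discretized (tuples of integer cell indices), the helper names are hypothetical, and the breadth-first search is just one simple way to test for a chain, not necessarily how the algorithm itself proceeds.

```python
from collections import deque

def d0(a, b):
    """L0 distance: number of coordinates in which a and b differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

def dinf(a, b):
    """L-infinity distance: largest coordinate-wise absolute difference."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

def adjacent(a, b, l):
    """One link of a chain: d0 <= l and dinf <= 1 on discretized points."""
    return d0(a, b) <= l and dinf(a, b) <= 1

def reachable(cells, i, j, l):
    """True if cells[j] is connected to cells[i] by a chain of adjacent points."""
    seen = {i}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return True
        for v in range(len(cells)):
            if v not in seen and adjacent(cells[u], cells[v], l):
                seen.add(v)
                queue.append(v)
    return False
```

With `cells = [(0, 0), (1, 0), (2, 1)]` and `l = 2`, the first and last points are not adjacent (their L-infinity distance is 2), but they are reachable through the middle point.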
Pseudo-Code (1/3)
Input: Database $X$,
  lower bound on number of clusters $K$,
  noise parameter $N$, and
  distance parameter $l$
Output: Partition $\mathcal{C}$
function BOOL($X$, $K$, $N$)
  $k \leftarrow 1$ // $k$ is level of discretization
  repeat
    $\mathcal{C} \leftarrow$ MAKEHIERARCHY($X$, $k$, $N$)
    $k \leftarrow k + 1$
  until $\#\mathcal{C} \ge K$
  output $\mathcal{C}$
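Pseudo-Code (2/3)
function MAKEHIERARCHY($X$, $k$, $N$)
  $\mathcal{C} \leftarrow \{\{x\} \mid x \in X\}$
  $h \leftarrow d$ // $d$ is number of attributes of $X$
  $X_D \leftarrow \Delta^k(X)$ // discretize $X$ at level $k$
  $X \leftarrow S_{\pi_1(X_D)} \circ S_{\pi_2(X_D)} \circ \dots \circ S_{\pi_d(X_D)}(X)$ // $\pi_i(X_D)$ is the $i$-th attribute of $X_D$
  repeat
    $X \leftarrow S_{\pi_h(X_D)}(X)$
    $\mathcal{C} \leftarrow$ AGGL($X$, $\mathcal{C}$, $h$)
    $h \leftarrow h - 1$
  until $h = 0$
  $\mathcal{C} \leftarrow \{C \in \mathcal{C} \mid \#C \ge N\}$
  output $\mathcal{C}$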
Pseudo-Code (3/3)
function AGGL($X$, $\mathcal{C}$, $h$)
  for each object $x$ of $X$
    $y \leftarrow$ successive object of $x$
    if $d_0(\Delta^k(x), \Delta^k(y)) \le l$ and $d_\infty(\Delta^k(x), \Delta^k(y)) \le 1$ then
      delete $C \ni x$ and $D \ni y$ from $\mathcal{C}$, and add $C \cup D$
    end if
  end for
  output $\mathcal{C}$
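The three functions above can be sketched in Python as follows. This is a rough illustration under several assumptions that are not in the slides: attributes are normalized to [0, 1], clusters are tracked with a union-find array instead of explicit set deletion and union, the initial sort applies the last attribute first, and `max_level` is a hypothetical guard that stops the loop if K clusters are never reached. Clusters are returned as lists of point indices.

```python
import math

def cell(point, k):
    # Assumed binning of values in [0, 1] into {0, ..., 2**k - 1}
    return tuple(0 if v == 0 else math.ceil(v * 2**k) - 1 for v in point)

def adjacent(a, b, l):
    d0 = sum(1 for p, q in zip(a, b) if p != q)    # L0: differing coordinates
    dinf = max(abs(p - q) for p, q in zip(a, b))   # L-infinity
    return d0 <= l and dinf <= 1

def find(parent, i):
    # Union-find root lookup with path halving
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def aggl(order, cells, parent, l):
    # Merge the clusters of each object and its successor when they are adjacent
    for i, j in zip(order, order[1:]):
        if adjacent(cells[i], cells[j], l):
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[rj] = ri

def make_hierarchy(X, k, N, l):
    d = len(X[0])
    cells = [cell(p, k) for p in X]
    parent = list(range(len(X)))        # every point starts as its own cluster
    order = list(range(len(X)))
    for h in reversed(range(d)):        # initial sort on every attribute (order assumed)
        order.sort(key=lambda i: cells[i][h])
    for h in reversed(range(d)):        # h = d, ..., 1 in the pseudo-code
        order.sort(key=lambda i: cells[i][h])   # list.sort is stable
        aggl(order, cells, parent, l)
    clusters = {}
    for i in range(len(X)):
        clusters.setdefault(find(parent, i), []).append(i)
    return [c for c in clusters.values() if len(c) >= N]   # drop noise clusters

def bool_clustering(X, K, N, l=1, max_level=32):
    k = 1
    while True:
        partition = make_hierarchy(X, k, N, l)
        if len(partition) >= K or k >= max_level:
            return partition
        k += 1
```

Because `list.sort` is stable, each pass keeps the previous ordering among points that share a key, which is the tie rule the sorting operator requires.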
Level-k partition and Sorting
BOOL constructs clusters through level-k partitions ($k = 1, 2, 3, \dots$)
A partition of a database $X$ is a level-k partition, denoted by $\mathcal{C}^k$, if it satisfies the following condition:
  For all pairs $x$ and $y$, $x$ and $y$ are in the same cluster iff $y$ is reachable at level $k$ from $x$
Sorting of a database $X$ is defined as follows:
  Let $Y$ be a key database s.t. $\#X = \#Y$ and $Y$ has only one of $X$'s attributes
  The expression $S_Y(X)$ is the database $X$ for which data points are sorted in the order indicated by $Y$
  Ties keep the original order of $X$
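The sorting operator can be sketched with Python's built-in stable sort. The function name `sort_by_key` is hypothetical; `X` and `Y` are assumed to be lists of equal length, with `Y[i]` the key of `X[i]`.

```python
def sort_by_key(X, Y):
    """Return X reordered by the single-attribute key database Y.

    sorted() is stable, so points with equal keys keep their
    original relative order in X.
    """
    if len(X) != len(Y):
        raise ValueError("#X must equal #Y")
    order = sorted(range(len(X)), key=lambda i: Y[i])
    return [X[i] for i in order]
```

For instance, `sort_by_key(["a", "b", "c", "d"], [1, 0, 1, 0])` yields `["b", "d", "a", "c"]`: the tie between "b" and "d", and the tie between "a" and "c", are resolved by their original positions.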
Properties of Reachability
If the distance parameter $l = 1$, the condition is exactly the same as
$$
d_1(\Delta^k(z_i), \Delta^k(z_{i+1})) \le 1
$$
  $d_1$ is the Manhattan distance ($L_1$ metric)
The notion of reachability is symmetric
  If a data point $x$ is reachable at level $k$ from $y$, then $y$ is reachable at level $k$ from $x$
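The equivalence for l = 1 holds because discretized coordinates are integers: at most one coordinate may differ, and by at most 1, which is exactly a Manhattan distance of at most 1. A small brute-force check over an integer grid, with hypothetical helper names and an arbitrary grid size, illustrates this:

```python
from itertools import product

def d0(a, b):
    return sum(1 for p, q in zip(a, b) if p != q)

def dinf(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

def d1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def conditions_agree(dim=3, size=4):
    """Compare (d0 <= 1 and dinf <= 1) with (d1 <= 1) on all pairs of grid cells."""
    points = list(product(range(size), repeat=dim))
    return all(
        (d0(a, b) <= 1 and dinf(a, b) <= 1) == (d1(a, b) <= 1)
        for a in points
        for b in points
    )
```

The check is only an illustration on a small grid, not a proof; the argument in the lead-in covers the general case.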
Hierarchy of Clusters
Level-k partitions have a hierarchical structure
  For the level-$k$ and level-$(k+1)$ partitions $\mathcal{C}^k$ and $\mathcal{C}^{k+1}$, the following condition holds:
  For every cluster $C \in \mathcal{C}^k$, there exists a set of clusters $\mathcal{D} \subseteq \mathcal{C}^{k+1}$ such that $\bigcup \mathcal{D} = C$.
  This is why, for two objects $x$ and $y$, if $d_0(\Delta^k(x), \Delta^k(y)) \le l$ and $d_\infty(\Delta^k(x), \Delta^k(y)) \le 1$ for some $k$, then the same holds for all $k'$, with $k' \le k$.
BOOL can be viewed as a divisive hierarchical clustering algorithm
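The inheritance of adjacency from a fine level to a coarser one can be checked numerically. The sketch below assumes uniform binary cells, so that the level-(k-1) index of a coordinate is its level-k index halved (integer division); the function names and grid size are hypothetical.

```python
from itertools import product

def adjacent(a, b, l):
    d0 = sum(1 for p, q in zip(a, b) if p != q)
    dinf = max(abs(p - q) for p, q in zip(a, b))
    return d0 <= l and dinf <= 1

def coarsen(c):
    """Level-k cell indices -> level-(k-1) cell indices (assumes uniform binary cells)."""
    return tuple(m // 2 for m in c)

def adjacency_is_inherited(dim=2, size=8, l=1):
    """Every pair adjacent at the fine level must stay adjacent after coarsening."""
    points = list(product(range(size), repeat=dim))
    return all(
        adjacent(coarsen(a), coarsen(b), l)
        for a in points
        for b in points
        if adjacent(a, b, l)
    )
```

Coordinates that are equal stay equal after halving, and coordinates that differ by 1 end up differing by at most 1, so clusters can only merge as the level decreases.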
Adjusted Rand Index
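Let the result be $\{C_1, \dots, C_K\}$ and the correct partition be $\{D_1, \dots, D_M\}$
Suppose $n_{ij} = \#\{x \in X \mid x \in C_i,\ x \in D_j\}$. Then the adjusted Rand index is
$$
\frac{\sum_{i,j} \binom{n_{ij}}{2} - \left( \sum_i \binom{\#C_i}{2} \sum_j \binom{\#D_j}{2} \right) \Big/ \binom{n}{2}}
{2^{-1} \left( \sum_i \binom{\#C_i}{2} + \sum_j \binom{\#D_j}{2} \right) - \left( \sum_i \binom{\#C_i}{2} \sum_j \binom{\#D_j}{2} \right) \Big/ \binom{n}{2}}
$$
where $n = \#X$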
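A direct transcription of this formula into Python, as a sketch. It assumes both partitions are given as label sequences (one label per data point), which is a representation chosen here for convenience; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(result, truth):
    """Adjusted Rand index of two labelings of the same n data points."""
    if len(result) != len(truth):
        raise ValueError("both labelings must cover the same data points")
    n = len(result)
    n_ij = Counter(zip(result, truth))      # contingency counts n_ij
    sizes_c = Counter(result)               # cluster sizes #C_i
    sizes_d = Counter(truth)                # cluster sizes #D_j
    sum_ij = sum(comb(v, 2) for v in n_ij.values())
    sum_c = sum(comb(v, 2) for v in sizes_c.values())
    sum_d = sum(comb(v, 2) for v in sizes_d.values())
    expected = sum_c * sum_d / comb(n, 2)
    maximum = 0.5 * (sum_c + sum_d)
    # Undefined (division by zero) when maximum == expected,
    # e.g. when both labelings put every point in one cluster.
    return (sum_ij - expected) / (maximum - expected)
```

Identical partitions score 1 regardless of how the labels are named, and values near 0 indicate agreement no better than chance.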
Shape-Based Clustering Algorithms
Many shape-based algorithms have been proposed
  See [Berkhin, 2006; Halkidi et al., 2001; Jain et al., 1999]
Partitional algorithms
  ABACUS [Chaoji et al., 2011], SPARCL [Chaoji et al., 2009]
Mass-based algorithms
  [Ting and Wells, 2010]
Density-based algorithms
  DBSCAN [Ester et al., 1996]
Hierarchical clustering algorithms
  CURE [Guha et al., 1998], CHAMELEON [Karypis et al., 1999]
Grid-based algorithms
  STING [Wang et al., 1997]
References
[Berkhin, 2006] P. Berkhin. A survey of clustering data mining techniques. Grouping Multidimensional Data, pages 25-71, 2006.
[Chaoji et al., 2011] V. Chaoji, G. Li, H. Yildirim, and M. J. Zaki. ABACUS: Mining arbitrary shaped clusters from large datasets based on backbone identification. In Proceedings of SIAM International Conference on Data Mining, pages 295-306, 2011.
[Chaoji et al., 2009] V. Chaoji, M. A. Hasan, S. Salem, and M. J. Zaki. SPARCL: An effective and efficient algorithm for mining arbitrary shape-based clusters. Knowledge and Information Systems, 21(2):201-229, 2009.
[Ester et al., 1996] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of KDD, volume 96, pages 226-231, 1996.
[Garcia-Molina et al., 2008] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database systems: The complete book. Prentice Hall Press, 2008.
[Guha et al., 1998] S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. Information Systems, 26(1):35-58, 1998.
[Halkidi et al., 2001] M. Halkidi, Y. Batistakis, and M. Vazirgiannis. On clustering validation techniques. Journal of Intelligent Information Systems, 17(2):107-145, 2001.
[Hinneburg and Keim, 1998] A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 58-65, 1998.
[Jain et al., 1999] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, 1999.
[Karypis et al., 1999] G. Karypis, H. Eui-Hong, and V. Kumar. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8):68-75, 1999.
[Lin et al., 2003] J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 1-11, 2003.
[Qiu and Joe, 2006] W. Qiu and H. Joe. Generation of random clusters with specified degree of separation. Journal of Classification, 23:315-334, 2006.
[Sheikholeslami et al., 1998] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the 24th International Conference on Very Large Data Bases, pages 428-439, 1998.
[Ting and Wells, 2010] K. M. Ting and J. R. Wells. Multi-dimensional mass estimation and mass-based clustering. In Proceedings of 10th IEEE International Conference on Data Mining, pages 511-520, 2010.
[Wang et al., 1997] W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 186-195, 1997.