
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING
WWW.JOURNALOFCOMPUTING.ORG 85

Spatial Clustering Algorithm using R-tree


Harleen Kaur, Ritu Chauhan and M. Afshar Alam

Abstract — Recently, there has been extensive growth in databases, and proper analysis techniques are required to discover hidden patterns in large databases. Spatial data mining is the application area of data mining which helps to detect hidden patterns in spatial datasets. Spatial data has grown enormously over the last decade, and this growing need has forced researchers to find efficient algorithms for spatial data analysis. Several spatial indexing techniques have been applied in the past for storage and retrieval of spatial data. In this approach, a suitable indexing technique is proposed for three-dimensional spatial objects: the R-tree, which is used as a spatial data structure for indexing spatial data. We have designed a spatial clustering model using R-tree (SCART) for three-dimensional spaces, where the data is organized in the form of a three-dimensional grid and a Hilbert space filling curve is used to find the linear order of the grid points. In this paper, an efficient clustering algorithm combining the hierarchical and grid based approaches is used to find effective and efficient spatial clusters. The structure of the node has been developed to find the relevant clusters with the help of the R-tree data structure. A novel search algorithm is proposed for finding objects, that is, for retrieving the answer to a given query. We have found that the algorithm is suitable for three-dimensional searching in a grid.

Index Terms— Spatial Databases, Three Dimensional Grid, R-tree, Spatial Cluster Algorithm using R-tree (SCART), Space filling curve

————————————————

Harleen Kaur is with the Department of Computer Science, Hamdard University, New Delhi, India.
Ritu Chauhan is a Research Scholar with the Department of Computer Science, Hamdard University, New Delhi.
M. Afshar Alam is with the Department of Computer Science, Hamdard University, New Delhi, India.
————————————————

1 INTRODUCTION

KDD (Knowledge Discovery in Databases) is the process of extracting knowledge from data, whereas data mining is the step in the KDD process that determines patterns from databases. KDD is a multistep process which requires data selection, data cleaning, pre-processing of data, data transformation, data mining and interpretation of the mining results. Data mining can be defined as the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data and describing them in a concise and meaningful way. Data mining draws the interest of researchers in fields such as statistics, artificial intelligence, machine learning, data visualization, classification and pattern recognition [12], [13], [14], [21].

Researchers face the challenge of handling large amounts of spatial data and of finding algorithms that scale to such data effectively and efficiently. Spatial data mining addresses this challenge: it refers to the discovery of knowledge from large spatial databases. Examples of spatial databases include remotely sensed images, data collected from NASA satellites, Geographical Information Systems, data collected from X-ray, and weather and climate data. A spatial database can be regarded as a collection of spatial and non-spatial attributes. Spatial attributes include the location-based attributes, whereas non-spatial attributes are, for example, age, sex and marital status. There exist several spatial data mining techniques, such as classification, visualization, clustering and spatial association. The aim of spatial data mining is to achieve the best knowledge of spatial data and to retrieve the hidden information from spatial databases. The features in spatial data are represented in the form of points, lines and polygons. Spatial databases can be stored in 2-dimensional or N-dimensional spaces. To deal with the varying dimensionality of spatial data, numerous spatial data structures have been proposed in the past [27]. Our work focuses on the R-tree as the spatial data structure used to discover spatial clusters. An R-tree is a spatial data structure which does not require transforming point data and hence is well suited to spatial clustering algorithms. It has attracted researchers over the past decade for its robustness in handling large amounts of multidimensional spatial data, and its efficiency has proved beneficial for several query algorithms. A disadvantage of the R-tree is its poor update performance. To overcome this flaw, the R*-tree [9] was introduced; it works similarly to the R-tree data structure, but its update algorithm makes it more efficient than the R-tree.

The rest of the paper is organized as follows. Section 2 briefly gives background knowledge, explaining trends in spatial data mining and several spatial data structures for
clustering of spatial data. In Section 3, we give several definitions on which the spatial clustering algorithm is based and describe the R-tree as an index structure in three-dimensional spatial grids; the section also explains the spatial cluster algorithm using R-tree (SCART) model and the structure of three-dimensional R-trees. Section 4 explains the role of the space filling curve in the grid data structure and defines the search algorithm in three-dimensional space, along with the challenges encountered. Section 5 concludes the paper, and the last section presents future work.

2 LITERATURE REVIEW

There has been a tremendous increase in spatial data over the last two decades, and researchers are making enormous efforts to detect hidden patterns in spatial databases [6]. The outcome of this research has been used in numerous application areas such as the medical domain, business analysis and weather forecasting. Spatial data mining can be used to discover unknown hidden patterns in spatial databases. Several existing techniques are used in the discovery of spatial patterns, such as spatial clustering, spatial association rules, spatial characterization and spatial trend detection [14].

Spatial clustering plays a major role among spatial data mining techniques. It has been widely used in several application domains such as weather forecasting, remote sensing, satellite image analysis and several other research areas. Spatial clustering can be defined as an unsupervised learning technique because it does not rely on predefined classes. Spatial data clustering is the process of grouping objects according to the similarity of the spatial objects. The objects inside a cluster are highly similar, whereas objects in different clusters are highly dissimilar in their properties [1]. Spatial clustering algorithms are capable of discovering clusters of various shapes. Clustering algorithms can be broadly categorized into partitioning based, hierarchical based, density based and grid based clustering [8], [6]. Recent reviews of clustering techniques can be found in [2], [6], [4], [14].

Spatial clustering algorithms are designed to deal with large spatial datasets in order to find relevant patterns. There have been several advances in spatial clustering algorithms over the past decade. The first spatial clustering technique developed in spatial data mining to discover patterns in spatial datasets was CLARANS, Clustering Large Applications based upon Randomized Search, as in [17], [18], [19]. CLARANS detects patterns based on a randomized search of the data and can easily and efficiently handle large spatial data. There are several extensions of the CLARANS algorithm, such as the Spatial Dominant approach SD-CLARANS and the Non-Spatial Dominant approach NSD-CLARANS, which partition the database on the basis of the K-Medoids partitioning method.

Several density based approaches have been applied to spatial data to discover clusters based on the density of a specific region. Spatial clusters are determined on the basis of dense regions and are usually separated by regions of low density, which can also contain noise. The density based approach has been extensively applied to spatial databases. Density based clusters are generally formed from a set of objects that meets some density requirement. Existing density based clustering algorithms for spatial data include Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [7] and Distribution Based Clustering of Large Spatial Databases (DBCLASD) [25].

Grid based clustering algorithms divide the region into a finite number of cells, and clusters are discovered according to the adjacency of neighbouring cells. There are several grid based clustering algorithms. Statistical Information Grid (STING) [22] is a multi-resolution grid based clustering algorithm; it divides the region into a finite number of cells, and spatial clusters are retrieved using statistical techniques. Clustering in Quest (CLIQUE) is a combination of the density and grid based approaches for finding clusters in high-dimensional spatial data [2]. The WaveCluster algorithm [20] clusters data in a multidimensional grid space, where a wavelet transform determines the dense regions of the space. STING+ [24] is an enhancement of the STING algorithm; in this approach, triggers are used whenever there is a change in the database, and spatial queries are processed with a pyramid-like structure. The main advantage of grid based clustering is its fast processing time, which generally depends on the dataset. Spatial data requires proper indexing for storage, update and querying of data. To deal with these key issues, several spatial data structures have been proposed in the past for large spatial datasets.

Spatial data structures are of significant importance for the storage and retrieval of spatial data in several application domains [15]. Several spatial data structures already exist. The quad tree is used for two-dimensional spaces [26]; it can be defined as a hierarchical spatial data structure which divides the space into four equal-sized nodes. The quad tree has a unique root, and each parent node recursively divides itself into four child nodes. The R-tree, in contrast, is a height-balanced, index based data structure that is widely used in spatial database applications for decision making [3]. There are several variants of the R-tree used for indexing, such as the R*-tree [9], the Packed R-tree, the Hilbert R-tree [5] and the R+-tree. The k-d tree is a spatial data structure which divides the region into two hyperrectangles, where the splitting hyperplane is perpendicular to one axis and passes through the median point [23]. The octree has been used with promising results for indexing spatial data [16]; it is a tree data structure for three-dimensional spaces that subdivides the region into octants until cubes are obtained. In this paper we focus on the R-tree as the spatial data structure for storing spatial data, because the R-tree does not divide the space into several pieces as other spatial data structures, such as the quad tree, do. We have used several statistical techniques for spatial clustering of data in three-dimensional spaces, using the R-tree as the spatial data structure.
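To make the space-partitioning structures reviewed above more concrete, the following is a minimal Python sketch of two of them: a k-d tree built by splitting at the median point along alternating axes, and the octant test that an octree applies at each subdivision. The names `KDNode`, `build_kdtree` and `octant` are illustrative assumptions and do not come from the cited works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class KDNode:
    point: Point                      # median point stored at this node
    axis: int                         # splitting axis: 0 = x, 1 = y, 2 = z
    left: Optional["KDNode"] = None   # points below the median on `axis`
    right: Optional["KDNode"] = None  # points above the median on `axis`


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % 3
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def octant(point: Point, centre: Point) -> int:
    """Index (0..7) of the octant around `centre` that contains `point`.

    Bit i of the result is set when the point is on the upper side of axis i.
    """
    return sum(1 << i for i in range(3) if point[i] >= centre[i])


points = [(2, 3, 1), (5, 4, 7), (9, 6, 2), (4, 7, 9), (8, 1, 5), (7, 2, 6), (1, 8, 3)]
root = build_kdtree(points)
```

The k-d tree always cuts the space into two halves and the octree into eight, whereas the R-tree used in this paper groups objects by bounding boxes instead of cutting the space itself.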
3 SPATIAL CLUSTER ALGORITHM USING R TREE IN THREE DIMENSIONAL GRID STRUCTURE

Growing data volumes and dimensionality have led researchers to adopt new methodologies for spatial data analysis. Several research methodologies exist for two-dimensional data, but real datasets are multidimensional. Our proposed work addresses the issues related to spatial data in three-dimensional spaces. The proposed algorithm is based on simple statistical techniques to determine spatial clusters, using the R-tree as the spatial data structure. The idea comes from hierarchical as well as grid based clustering algorithms. The definitions on which the algorithm is based are stated below.

3.1 Definitions

In three-dimensional spaces each data item is regarded as a point in the spatial grid, where each point in N-dimensional space can be defined as follows:

a) The dataset M in the N-dimensional space is denoted as M = {m1, m2, m3, ..., mn}, with dimensionality N = {n1, n2, n3}.

b) The mean centre of dimension n1 is defined as the sum of its values, (na + nb + ... + nn), divided by the number of elements; the mean centre of each other dimension is found in the same way.

c) The dispersion of each cluster formed around the mean centre is calculated as the standard dispersion in three-dimensional space, $\sigma_{dis}$, and the relative threshold on it is taken as $0.5 \le \sigma_{dis} \le 1$; that is, the value lies between a standard dispersion of 0.5 and +1.

d) The neighbourhood of each grid value is compared, and neighbours are merged if the standard dispersion around the mean centre lies between the required threshold values.

The R-tree is used as the spatial data structure in three-dimensional space to represent spatial data in a grid structure. Each cell in the grid structure corresponds to a spatial attribute value. The values in these cells are then analyzed with the help of R-trees, and searching is performed. The R-tree is a height-balanced tree in which the leaf level corresponds to the actual data present in the spatial grid and the root node summarizes the data in the grid. Each node in the R-tree divides itself into a maximum of three child nodes, except the root, which has a minimum of two child nodes, unless it is a leaf node. A node is split as data points arrive, and it organizes itself on the grid structure. A leaf node contains the spatial object as well as its MBR, the minimum bounding rectangle of the spatial object. A non-leaf node has a list of child nodes and a minimum bounding rectangle. The bounding rectangle in a parent node contains all the rectangles of its child nodes. Fig. 1 represents the R-tree data structure, where the number of nodes is determined at each level. Each node divides itself into three nodes at each level until all the data points are located in the grid. The number of nodes at each level is determined as $3^0, 3^1, 3^2$, and so on up to the required level; this holds when all the nodes contain the maximum number of entries. The root contains 1 node and level two contains 6 nodes. The number of levels depends on the required size of the data. We used three-dimensional grids to store the values of the dataset. M is the maximum number of entries in a node and m is the minimum number of entries. If the number of entries in a node is less than m, the node underflows, whereas if it exceeds M, the node overflows. Underflow occurs during deletion, whereas overflow occurs during insertion.

[Figure 1: R-tree levels. Level 0: root node; Level 1: parent nodes; Level 2: child nodes; Level 3: leaf nodes.]
Fig. 1. Three-way R-tree showing the different levels of distribution

Fig. 2 represents the X, Y and Z planes in three dimensions. The grid size can be N*N*N, where N is an integer, so the number of cells in each grid is $N^3$. The overlapping of MBRs can be represented in three-dimensional grids.

[Figure 2: an MBR enclosed in a three-dimensional grid, with the X, Y and Z planes labelled.]
Fig. 2. Enclosed MBR in three-dimensional grid

3.2 Structure of Spatial Cluster Algorithm using R tree (SCART)

The R-tree over the three-dimensional grid structure is held in main memory. We have partitioned the space of cells in the grid according to one diameter in space. The grid makes it easy to find adjacent points and neighbours. The leaf nodes of the tree point to the spatial objects. The spatial cluster algorithm using R-tree (SCART) merges clusters in a manner similar to agglomerative hierarchical clustering, but with spatial constraints related to the spatial objects. The spatial objects are merged as we proceed bottom-up (the reverse approach), whereas the R-tree is formed in the forward approach, from top to bottom. The parent node contains the information of its child nodes as we move forward from one level to the next. The clusters are formed on the basis of the information processed on the grid structure in the reverse approach. The R-tree structure connects every point on the grid at various levels to form relevant clusters. We recursively keep expanding the tree until the spatial objects are sufficiently placed on the grid structure. Each inner node of the R-tree contains statistical information, such as the mean centre and standard dispersion, of all its child nodes. This helps us retain the relevant clusters and discard the others.

The structure of an R-tree node in the three-dimensional architecture of the spatial grid is defined as

    struct node {
        TypeMBR MBR;                 /* minimum bounding rectangle */
        int     node_number;         /* node number of each node */
        float   xcoordinate;         /* value of x coordinate */
        float   ycoordinate;         /* value of y coordinate */
        float   zcoordinate;         /* value of z coordinate */
        bool    lnode;               /* whether the node is a leaf or a non-leaf */
        float   mean_centre;         /* mean centre of the node */
        float   standard_dispersion; /* measure of dispersion */
        int     listofitems;         /* total number of elements in the node */
        int     parent_number;       /* the parent node */
    };

The statistically relevant information is calculated at each level by the SCART algorithm to determine the corresponding leaf nodes. We determine the mean centre as well as the standard dispersion of each node at each level of the tree. The mean centre is computed from the attribute values of the nodes at the lowest level. The mean centre has the least squares property and is defined for three-dimensional space by (1), (2) and (3), where $X_m$ is the mean centre, $X_i$ are the attribute values and $n$ is the total number of values; the mean centres of the other dimensions are calculated correspondingly.

$$X_m = \frac{\sum_{i=1}^{n} X_i}{n} \tag{1}$$

$$Y_m = \frac{\sum_{i=1}^{n} Y_i}{n} \tag{2}$$

$$Z_m = \frac{\sum_{i=1}^{n} Z_i}{n} \tag{3}$$

The standard dispersion of a node with $n$ values is denoted $\sigma_{dis}$; it measures the absolute dispersion of the objects in space. The measure of standard dispersion is given in (4).

$$\sigma_{dis} = \sqrt{\frac{\sum_{i=1}^{n}\left[(X_i - X_m)^2 + (Y_i - Y_m)^2 + (Z_i - Z_m)^2\right]}{n(n-1)}} \tag{4}$$

The calculated information is propagated up the R-tree from the bottom, and similar values are merged as relevant information. We have found the mean centre and standard dispersion of all node values with the top to bottom approach. We aim to discover similar clusters at the different levels of the tree. Clusters with common characteristics are prepared at each node at the various levels. We start our algorithm after we have calculated the statistically relevant factors at each node, which mainly include the mean centre value, child nodes, parent node, level, standard dispersion and the x, y and z coordinates. Each node consists of a list of values, and the relevant lists are found and merged from one level to the next in bottom-up fashion. The algorithm to calculate the relevant information in the bottom-up approach is as follows:

    Algorithm Find_Clusters
        If (node_level >= 0)
        1. For each node, calculate the statistically relevant factors:
        2.     the mean centre,
        3.     the standard dispersion, the number of child nodes, the x, y, z
               coordinate values and the minimum bounding rectangle
        4. Merge the lists according to the threshold value
        5. Else
        6.     { increase node_level and then repeat Find_Clusters }
        7. End

4 SPACE FILLING CURVE

The grid based approach is very efficient for handling large databases. It divides the space into a finite number of blocks and works on this large number of blocks. To arrange the data on the grid we use space filling curves. Space filling curves are continuous curves which pass through each point of the grid exactly once without crossing themselves. The curves usually have two free ends that may be joined with other paths. We use the Hilbert space filling curve to store the data on the three-dimensional grid structure. It was first described in [10] and is used to express the locality of two-dimensional data in a one-dimensional space. The Hilbert space filling curve imposes a linear ordering on the grid cells, assigning a single value to each cell, so that similar data are placed together in the linear order. The Hilbert space filling curve achieves the best clustering by minimizing the number of clusters. The search algorithm implemented on the three-dimensional R-tree is as follows.

1. The search starts at the root node G, with the cuboid H as the search region; the aim is to find each cuboid overlapped by the search cuboid.
2. Search subtree: if G is not a leaf node, check each entry L in the subtree to see whether it overlaps the cuboid H. If it overlaps, continue the search in the corresponding child node.
3. Search leaf node: if G is a leaf node, check all its entries for overlap with H; the overlapping entries form the result.
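As an illustration of the statistics in (1) to (4), the mean centre and standard dispersion of the points in a node can be sketched in Python as below. The function names are hypothetical, and the dispersion is assumed to take the square-root form with an n(n − 1) denominator, which is one reading of (4) rather than a form the paper confirms.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def mean_centre(points: List[Point3D]) -> Point3D:
    """Per-dimension arithmetic mean (X_m, Y_m, Z_m) of the points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def standard_dispersion(points: List[Point3D]) -> float:
    """Dispersion of the points around their mean centre.

    Assumed form: sqrt(sum of squared deviations / (n * (n - 1))).
    Returns 0.0 for fewer than two points, where the ratio is undefined.
    """
    n = len(points)
    if n < 2:
        return 0.0
    xm, ym, zm = mean_centre(points)
    total = sum(
        (x - xm) ** 2 + (y - ym) ** 2 + (z - zm) ** 2 for x, y, z in points
    )
    return math.sqrt(total / (n * (n - 1)))


cell = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (5.0, 6.0, 7.0)]
centre = mean_centre(cell)
spread = standard_dispersion(cell)
```

For the three points above, the squared deviations sum to 24 and n(n − 1) is 6, so the dispersion is 2, which lies outside the 0.5 to 1 band of Section 3.1.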
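The bottom-up merging of Find_Clusters can be sketched as follows. This is only one possible reading of the algorithm, under two assumptions: each leaf holds the list of points of one grid cell, and two lists are merged when the dispersion of their union falls inside the 0.5 to 1 threshold band of Section 3.1. The `Node` class and both function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]
LOW, HIGH = 0.5, 1.0  # threshold band on the dispersion (assumed from Section 3.1)


def dispersion(points: List[Point3D]) -> float:
    """Dispersion around the mean centre, assuming the n(n - 1) form of (4)."""
    n = len(points)
    if n < 2:
        return 0.0
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    total = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points)
    return math.sqrt(total / (n * (n - 1)))


@dataclass
class Node:
    level: int
    points: List[Point3D] = field(default_factory=list)   # filled for leaves
    children: List["Node"] = field(default_factory=list)  # filled for inner nodes


def find_clusters(node: Node) -> List[List[Point3D]]:
    """Collect clusters for the subtree under `node`, merging bottom-up."""
    if not node.children:  # leaf: its point list is one candidate cluster
        return [list(node.points)] if node.points else []
    clusters: List[List[Point3D]] = []
    for child in node.children:
        for candidate in find_clusters(child):
            for existing in clusters:
                # merge when the combined list stays inside the threshold band
                if LOW <= dispersion(existing + candidate) <= HIGH:
                    existing.extend(candidate)
                    break
            else:
                clusters.append(candidate)
    return clusters


root = Node(level=0, children=[
    Node(level=1, points=[(0, 0, 0), (2, 0, 0)]),
    Node(level=1, points=[(2, 2, 0), (0, 2, 0)]),
    Node(level=1, points=[(50, 50, 50), (52, 50, 50)]),
])
clusters = find_clusters(root)
```

In this example the first two cells are merged because their combined dispersion is about 0.82, while the distant third cell stays a separate cluster.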
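The linear ordering that a Hilbert curve imposes on grid cells can be illustrated with a short Python sketch. For brevity it uses the widely known two-dimensional cell-to-index conversion on an n × n grid (n a power of two), not the three-dimensional curve used in this paper; the function names are illustrative assumptions.

```python
from typing import List, Tuple


def rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate or flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_order(n: int) -> List[Tuple[int, int]]:
    """All cells of the grid, sorted by their position on the curve."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: xy2d(n, c[0], c[1]))


order = hilbert_order(4)
```

Consecutive cells in `order` are always grid neighbours, which is the locality property that lets nearby data be stored close together in the linear order.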
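The three search steps of Section 4 can be sketched in Python as a recursive overlap search over cuboids. The `RNode` class, the helper functions and the object labels are assumptions made for illustration; cuboids are represented as (xmin, ymin, zmin, xmax, ymax, zmax), and cuboids that merely touch are counted as overlapping.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Cuboid = Tuple[float, float, float, float, float, float]


def overlaps(a: Cuboid, b: Cuboid) -> bool:
    """True when the two cuboids intersect on all three axes."""
    return all(a[k] <= b[k + 3] and b[k] <= a[k + 3] for k in range(3))


def enclose(boxes: Sequence[Cuboid]) -> Cuboid:
    """Minimum bounding cuboid that contains every box in `boxes`."""
    lows = tuple(min(b[k] for b in boxes) for k in range(3))
    highs = tuple(max(b[k + 3] for b in boxes) for k in range(3))
    return lows + highs


@dataclass
class RNode:
    mbr: Cuboid
    is_leaf: bool
    objects: List[Tuple[str, Cuboid]] = field(default_factory=list)  # leaf entries
    children: List["RNode"] = field(default_factory=list)            # inner entries


def make_leaf(objects: List[Tuple[str, Cuboid]]) -> RNode:
    return RNode(enclose([box for _, box in objects]), True, objects=objects)


def make_inner(children: List[RNode]) -> RNode:
    return RNode(enclose([c.mbr for c in children]), False, children=children)


def search(g: RNode, h: Cuboid) -> List[str]:
    """Labels of all stored objects whose cuboid overlaps the search cuboid h."""
    found: List[str] = []
    if g.is_leaf:
        # step 3: check every entry of the leaf against h
        found.extend(name for name, box in g.objects if overlaps(box, h))
    else:
        # step 2: descend only into children whose bounding cuboid overlaps h
        for child in g.children:
            if overlaps(child.mbr, h):
                found.extend(search(child, h))
    return found


leaf1 = make_leaf([("a", (0, 0, 0, 1, 1, 1)), ("b", (2, 2, 2, 3, 3, 3))])
leaf2 = make_leaf([("c", (10, 10, 10, 11, 11, 11))])
root = make_inner([leaf1, leaf2])
hits = search(root, (0.5, 0.5, 0.5, 2.5, 2.5, 2.5))
```

Because a parent's bounding cuboid contains those of its children, a subtree whose bounding cuboid misses H can be skipped without inspecting its entries.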
5 CONCLUSIONS

Our approach establishes the R-tree as a suitable index structure for three-dimensional grids. We have demonstrated a search algorithm in a three-dimensional grid with the help of the R-tree as the index data structure for spatial data, and the Hilbert space filling curve is used to store the values in the three-dimensional grid. The spatial clustering algorithm is developed to find clusters of different shapes without any user input. The R-tree data structure is efficient in dealing with spatial data and hence can prove to be a backbone for future spatial data mining techniques.

6 FUTURE WORKS

Future work will focus on other spatial index techniques that can be helpful for determining spatial clusters with the help of spatial data structures and space filling curves for n-dimensional data space.

REFERENCES

[1] Abonyi, J., Feil, B., Cluster analysis for data mining and system identification. Boston, MA: Birkhäuser Basel, 2007.
[2] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, Proceedings of the 1998 ACM-SIGMOD International Conference on Management of Data, Seattle, Washington, pp. 94-105, June 1998.
[3] Guttman, A., R-trees: A Dynamic Index Structure for Spatial Searching, Proc. ACM SIGMOD Int. Conf. on Management of Data, Boston, MA, pp. 47-57, 1984.
[4] Berkhin, P., Survey of clustering data mining techniques. Tech. rep., Accrue Software, San Jose, CA, 2002.
[5] Kamel, I., Faloutsos, C., Hilbert R-tree: An improved R-tree using fractals. In Bocca, J., Jarke, M., and Zaniolo, C., editors, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, pp. 500-509, Morgan Kaufmann Publishers, Los Altos, CA 94022, USA, 1994.
[6] Chauhan, R., Kaur, H., Alam, M. A., Data Clustering Method for discovering clusters in Spatial Cancer Databases, International Journal of Computer Applications, 2010.
[7] Ester, M., Kriegel, H.-P., Sander, J., Xu, X., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, August 1996, AAAI Press, 1996.
[8] Everitt, B. S., Landau, S., Leese, M., Cluster analysis (4th ed.). London: Arnold, 2001.
[9] Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B., The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, pp. 322-331, 1990.
[10] Hilbert, D., Über die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, vol. 38, pp. 459–460, 1891.
[11] Hinneburg, A., Keim, D. A., An Efficient Approach to Clustering in Large Multimedia Databases with Noise, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD98), New York, NY, USA, pp. 58-65, August 1998.
[12] Jajuga, K., Sokolowski, A., Bock, H.-H., Classification, clustering and data analysis. New York: Springer, 2002.
[13] Kaur, H., Wasan, S. K., An Integrated Approach in Medical Decision Making for Eliciting Knowledge, Web-based Applications in Health Care & Biomedicine, Annals of Information Systems (AoIS), U.S.A., Eds: A. Lazakidou, Springer, 2009.
[14] Kaur, H., Chauhan, R., Alam, M. A., An Optimal Categorization of Feature Selection Methods for Knowledge Discovery, Visual Analytics and Interactive Technologies: Data, Text and Web Mining Applications, Eds: Zhang, Segall, and Cao, IGI Publishers Inc., 2010.
[15] Samet, H., The Design and Analysis of Spatial Data Structures. Addison–Wesley, 1990.
[16] Fujimura, K., Toriya, H., Yamaguchi, K., Kunii, T. L., Octree algorithms for solid modeling. In Proc. Intergraphics '83, volume B2-1, pages 1–15, 1983.
[17] Kaufman, L., Rousseeuw, P. J., Finding Groups in Data: an Introduction to Cluster Analysis, John Wiley and Sons, 1990.
[18] MacQueen, J., Some Methods for Classification and Analysis of Multivariate Observations, In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1, pp. 281-297, 1967.
[19] Ng, R. T., Han, J., Efficient and Effective clustering methods for spatial data mining, In Proceedings of 20th International Conference on Very Large Data Bases, Santiago, Chile, pp. 144-155, September 12-15, 1994.
[20] Sheikholeslami, G., Chatterjee, S., Zhang, A., WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases, Proceedings of the 24th Very Large Databases Conference (VLDB98), New York, NY, USA, pp. 428-439, August 24-27, 1998.
[21] Hand, D., Mannila, H., Smyth, P., Principles of Data Mining, Prentice Hall of India Private Limited, India, 2001.
[22] Wang, W., Yang, J., Muntz, R., STING: A Statistical Information Grid Approach to Spatial Data Mining, Proceedings of the 23rd VLDB Conference, pp. 186-195, Athens, Greece, 1997.
[23] Bentley, J. L., Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, Sept. 1975.
[24] Wang, W., Yang, J., Muntz, R., STING+: An Approach to Active Spatial Data Mining, Proceedings of the Fifteenth International Conference on Data Engineering, Sydney, Australia, pp. 116-125, March 23-26, 1999.
[25] Xu, X., Ester, M., Kriegel, H.-P., Sander, J., A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases, Proceedings of 14th International Conference on Data Engineering (ICDE'98), Orlando, FL, pp. 324-331, AAAI Press, 1998.
[26] Orenstein, J. A., Multidimensional tries used for associative searching. Inform. Process. Lett., 13:150–157, 1982.
[27] Samet, H., The Design and Analysis of Spatial Data Structures. Reading, MA: Addison-Wesley, 1989b.

Harleen Kaur gained her Ph.D. in Computer Science from Jamia Millia Islamia University, New Delhi, India on the topic of Applications of Data Mining techniques in Health care Management. She graduated from the University of Delhi, New Delhi. She has previously served as a Lecturer in Computer Science, University of Delhi. Currently, she is an Assistant Professor at the Department of Computer Science, Hamdard University. She has published numerous research articles in refereed international journals and conference proceedings and chapters in an edited book. She is a member of several international bodies. Her main research interests are in the
fields of Data analysis with applications to medical databases, Medical decision making, Fuzzy logic, Information Retrieval, Bayesian networks and visualization.

Ritu Chauhan is currently studying at the Hamdard University at New Delhi, India towards a Ph.D. in Spatial Data Mining, concentrating on clustering data mining under the supervision of Dr. Harleen Kaur. She previously gained an M.Sc. degree in Computer Science at the Hamdard University, New Delhi. She has published numerous research papers in international journals and a chapter in an edited book.

M. Afshar Alam is a Professor in Computer Science and Head, Department of Computer Science, Faculty of Management and Information Technology, at the Hamdard University, New Delhi, India. In 1997-2000, he founded the Department of Computer Science, Hamdard University. He was also founder of the Computer Centre at Hamdard University. He received his Master's degree in Computer Science from the Aligarh Muslim University, Aligarh and Ph.D. from Jamia Millia Islamia University, New Delhi. His research interests include Fuzzy logic, Software engineering and Bioinformatics. He is the author of a book on Software re-engineering and over 50 publications in International/National journals and conferences. He is a member of expert committee AICTE, DST, UGC and Ministry of Human Resource Development (MHRD), New Delhi, India.
