
Group 9: Navneet (P16013), Abhishek Pandey (P16039), Sourabh (P16040), Utthra (P16044), Viral (P16058),
Dhruvkumar (16052)

Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. It is also known as classification analysis or numerical
taxonomy. In marketing it is used to segment the market, understand buyer behavior,
identify product opportunities, select test markets, and reduce data. Most clustering
methods are heuristics based on algorithms, in contrast with analysis of variance,
regression, factor analysis, etc., which are based on statistical reasoning. Statistics
associated with cluster analysis include the agglomeration schedule, cluster centroids, cluster
centers, cluster membership, the dendrogram, the icicle plot, and the similarity/distance
coefficient matrix.
Cluster analysis is conducted in the following steps:
1. Formulate the problem. The most important part is the selection of the variables, as
even one irrelevant variable may distort the whole solution. Variables should be selected
based on past research and the hypotheses being tested.
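2. Set a distance/similarity measure. Similarity is most commonly measured by the Euclidean
distance, i.e. the square root of the sum of squared differences in the values of each
variable. Other distance measures include the city-block or Manhattan distance (the sum of
absolute differences) and the Chebychev distance (the largest absolute difference on any
variable). The solution is influenced by the units of measurement, so variables measured
in very different units are usually standardized first.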
3. Select a clustering procedure. It can be hierarchical or non-hierarchical.
Clustering procedures
    Hierarchical
        Agglomerative
            Linkage methods: single, complete, average
            Variance methods: Ward's method
            Centroid methods
        Divisive
    Non-hierarchical
        Sequential threshold
        Parallel threshold
        Optimizing partitioning
    Others
        Two-step
The disadvantage of non-hierarchical procedures is that the number of clusters must be
prespecified and the selection of cluster centers is arbitrary.
4. Decide on the number of clusters. The number of clusters is chosen using guidelines
such as: the number suggested by theoretical, conceptual, or practical considerations;
the distances at which clusters are combined, read from the dendrogram or other
statistical measures; in non-hierarchical clustering, the ratio of total within-group
variance to between-group variance; and the relative cluster sizes shown by the frequency
count of cluster membership.

5. Interpret and profile the clusters. Clusters are interpreted and profiled by examining
the cluster centroids. It is also helpful to profile the clusters in terms of variables that
were not used for clustering.
6. Assess reliability and validity. This is done by repeating the cluster analysis on the
same data with different distance measures, with different clustering methods, after
splitting the data into halves, and after eliminating some variables, and then comparing
the results.
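
Steps 2 to 5 can be illustrated with a small sketch. The code below is not taken from any particular statistics package: it is a minimal pure-Python illustration, assuming single linkage as the agglomerative method and a made-up data set of five cases on two variables. The function names (euclidean, manhattan, chebychev, single_linkage, clusters_at, centroid) and the data in rows are hypothetical, chosen only for this example.

```python
import math
from itertools import combinations

# Step 2: three distance measures between two cases (tuples of variable values).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Step 3: agglomerative (hierarchical) clustering with single linkage.
def single_linkage(rows, distance=euclidean):
    """Return the agglomeration schedule as (cluster_a, cluster_b, distance) tuples."""
    def gap(pair):
        # Single linkage: distance between the two closest members.
        return min(distance(rows[i], rows[j]) for i in pair[0] for j in pair[1])

    clusters = [(i,) for i in range(len(rows))]  # every case starts alone
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        schedule.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return schedule

# Step 4: replay the schedule until k clusters remain.
def clusters_at(n_cases, schedule, k):
    clusters = {(i,) for i in range(n_cases)}
    for a, b, _ in schedule[: n_cases - k]:
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)

# Step 5: a cluster centroid is the mean of each variable over its members.
def centroid(rows, members):
    return tuple(sum(rows[i][v] for i in members) / len(members)
                 for v in range(len(rows[0])))

# Hypothetical data: five cases measured on two variables.
rows = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.0, 9.0), (1.0, 3.0)]

schedule = single_linkage(rows)
for a, b, d in schedule:
    print(a, b, round(d, 2))

two_clusters = clusters_at(len(rows), schedule, 2)
for members in two_clusters:
    print(members, centroid(rows, members))
```

On this toy data the clusters are combined at distances 0.5, 1.0 and 2.0, and the last merge happens at about 8.6. A jump of that size in the agglomeration schedule is the kind of signal step 4 refers to, and here it points to a two-cluster solution. Swapping in manhattan or chebychev through the distance argument gives a simple version of the reliability check in step 6.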
