COMPARISON OF GRAPHICAL DATA ANALYSIS METHODS

Karl Erich Wolff

Fachhochschule Darmstadt
Forschungsgruppe Begriffsanalyse der Technischen Hochschule Darmstadt
Ernst Schröder Zentrum für Begriffliche Wissensverarbeitung

In: Faulbaum, F. & Bandilla, W. (eds.): SoftStat ´95: Advances in Statistical Software 5. Lucius & Lucius, Stuttgart 1996, 139-151.
Comparison of graphical data analysis methods

Karl Erich Wolff

Summary
Factor Analysis, Principal Component Analysis, simple and multiple Correspondence Analysis, Cluster Analysis, Multidimensional Scaling, Partial Order Scalogram Analysis and Formal Concept Analysis are well-known graphical methods in data analysis. To uncover the main differences between these methods we discuss their interpretation principles and their information preserving abilities. It is shown that Formal Concept Analysis can represent the original data in the plane without loss of information. All the other methods mentioned lose some information on the way from the original data to the graphical output. The critical steps in the data representations of these methods are discussed.

1 Data structures and the Information Representation Problem

Graphical methods in data analysis are designed to visualize structures of the data in the plane
or in 3-space.
What does this mean? What are data? Up to now there is no formal theory of data which could help to clarify the language used among data analysts. The data are usually given as „tabular data“, often obtained as „measurement data“ arising from measurements of objects (or „individuals“) with respect to some attributes (often called „variables“), or as „proximity data“ like dissimilarities or distances between objects or attributes. For a detailed discussion of data types the reader is referred to COOMBS (1964). As a formal definition of tabular data we use here the notion of a many-valued context, which is defined (cf. WILLE 1982, GANTER, STAHL, WILLE 1986) as a quadruple (G,A,W,J) of sets, where J ⊆ G×A×W and ((g,a,v) ∈ J and (g,a,w) ∈ J ⇒ v = w). We read the statement „(g,a,w) ∈ J“ as „at object g attribute a has value w“ and say „w is a value of a“ if there is an object g ∈ G such that (g,a,w) ∈ J. If there is no value w with (g,a,w) ∈ J we say „a has a missing value at g“. Hence many-valued contexts describe, roughly speaking, „data matrices with arbitrary entries and possibly missing values“, but without matrix notation.
To analyse a many-valued context the data analyst has to decide which data representations
and operations are „allowed“ with respect to the given purpose of the analysis. For
measurement data the main decision lies usually in the determination of the „coding-“ or
„scaling-procedure“, which describes for each attribute not only the domain, i.e. the set of all
„possible“ values, but also a structure on the domain, which is used to describe the operational
meaning of the values. This scaling decision should be discussed carefully between the data
analyst and the data expert (who is responsible for the design and the collection of the data).
The structure of the chosen scale may be e.g. metric, ordinal, nominal, geometric, topological,
stochastic, conceptual or a mixture of them. For each structure on the domain of a given
attribute it should be decided what the data preserving domain transformations are. Then the
resulting structure of the data table together with the domain structures for each attribute
represents the formalized meaning of the data. This structure is called here the data structure.
Each data analysis method represents these data structures in further structures and the
graphical data analysis methods finally lead to a representation which shows some image of
the original data structure in a graphical structure, e.g. the plane or the 3-space. But what is
„the plane“ or „the 3-space“? There are many structures on the set R2 (or on R3 ) which may
be used to represent data, e.g. the Euclidean distance, the structure of a real vector space or of
an affine or projective geometry, the closure system of the convex sets, orders of different
types or many other possibilities. This graphical structure should also be described in detail in order to study the data representations from the original data structure into the final graphical structure.
What does it mean to „visualize structures of the data“? Looking at some graphical
representation of the data we hope to understand in terms of the graphical concepts some
structures in the original data. Hence the data representations from the original data into the
final graphical structure should preserve at least some parts of the information given in the
original data.
Therefore the main problem in the graphical representation of multidimensional data and
more generally in all knowledge representation methods is the Information Representation
Problem IRP: Given a class R of representations from a knowledge system K1 into a knowledge system K2: which „information about K1“ is preserved by the representations of R, i.e. which parts of the knowledge system K1 can be reconstructed from an r-image of K1 in K2 and the knowledge that r is a representation of R?
Example: In many graphical methods this problem arises for a class R of projections in the
form: „What does the distance between two points in the plane mean for the two objects
represented by the two points?“.
The Information Representation Problem has been studied e.g. in Measurement Theory (D.H. Krantz, R.D. Luce, P. Suppes, A. Tversky, Foundations of Measurement, Vol. I-III),
where the knowledge systems are described by relational structures. Representations
preserving the „whole“ structure of the first relational structure are defined as „fundamental
measurements“ or „total homomorphisms“. The IRP has been studied also in Formal Concept
Analysis (cf. GANTER, WILLE 1989), where the knowledge systems are formal contexts (i.e.
triples (G,M,I) of sets, where I is a binary relation between the set of objects G and the set of
attributes M, i.e. I ⊆ G×M) and the representations are scale measures, i.e. functions from the
set of objects of the first context into the set of objects of the second context such that the
preimage of every extent of the second context is an extent of the first context. Full scale
measures, i.e. scale measures with the property that every extent of the first context is a
preimage of an extent of the second context, are representations which preserve the whole
conceptual structure of the first context.
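The scale-measure condition can be stated operationally. The sketch below (hypothetical toy contexts; all helper names are chosen here for illustration) computes the extents of two small formal contexts via the usual derivation operators and checks whether a given map between their object sets is a scale measure, i.e. whether the preimage of every extent of the second context is an extent of the first. The enumeration is naive and only meant for very small contexts.

```python
# Sketch: extents of a formal context and the scale-measure condition.
# Toy contexts and the mapping f are illustrative only.

def intent(objects, I, M):
    """All attributes shared by the given objects (derivation G -> M)."""
    return {m for m in M if all((g, m) in I for g in objects)}

def extent(attrs, I, G):
    """All objects having all given attributes (derivation M -> G)."""
    return {g for g in G if all((g, m) in I for m in attrs)}

def all_extents(G, M, I):
    """Extents are exactly the sets extent(intent(X)) for X subset of G; naive enumeration."""
    from itertools import chain, combinations
    subsets = chain.from_iterable(combinations(sorted(G), r) for r in range(len(G) + 1))
    return {frozenset(extent(intent(set(X), I, M), I, G)) for X in subsets}

def is_scale_measure(f, ctx1, ctx2):
    """f: G1 -> G2 is a scale measure iff the preimage of every extent of ctx2
    is an extent of ctx1."""
    G1, M1, I1 = ctx1
    G2, M2, I2 = ctx2
    extents1 = all_extents(G1, M1, I1)
    for E in all_extents(G2, M2, I2):
        preimage = frozenset(g for g in G1 if f[g] in E)
        if preimage not in extents1:
            return False
    return True

if __name__ == "__main__":
    ctx1 = ({"a", "b", "c"}, {"m", "n"},
            {("a", "m"), ("b", "m"), ("b", "n"), ("c", "n")})
    ctx2 = ({"x", "y"}, {"p"}, {("x", "p")})
    f = {"a": "x", "b": "x", "c": "y"}
    print(is_scale_measure(f, ctx1, ctx2))   # True for this toy example
```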

2 Interpretation principles of graphical data analysis methods

Each graphical data analysis method interprets the given data in a way which usually depends
heavily on some principles of knowledge representation. These principles influence the design
of the experiments and the choice of measurements. From the large number of these
principles we select some which are central for the description of graphical data analysis
methods. A very important one is the interpretation principle of distance: The objects under
consideration are viewed as members of a usually not explicitly defined metric space. To
construct such a metric space from the given many-valued context the values of each attribute
are interpreted as elements of a metric space, hence the n-tuple of all measurements for a
given object lies in the direct product of these spaces, in which a further metric has to be
defined. In many methods the real n-dimensional vector space together with the Euclidean
metric is the first representation space which is then projected into the Euclidean plane to
visualize some image of the data structure. These methods use in addition to the interpretation
principle distance the interpretation principle coordinatization which indicates here the use of
the real n-dimensional vector space, bases, coordinates and standard matrix algebra. These
classical tools can be applied successfully in the case of measurements on the absolute scale
level (in the sense of measurement theory), especially for the analysis of contingency tables
which are usually obtained from measurement data by nominal scaling.
It is well-known that some types of data (e.g. ordinal data) cannot be represented satisfactorily
by metric methods, especially not by metric vector space methods. Therefore more general
methods have been developed, using the interpretation principle hierarchy. A simple tool in
this realm is the concept of a chain (where a chain is defined as an ordered set in which any
two elements are comparable with respect to the given order relation). If the domain structure
of each attribute of the given data structure is interpreted as a chain, then these data can be
represented in a direct product of chains. A very general and fruitful interpretation principle is
that of a concept. This has a long philosophical tradition in which the connection between
objects and attributes is investigated. The classical concept of concept and the conceptual hierarchy have been formalized in Formal Concept Analysis, a new method in data analysis.
Now we collect in List 1 some graphical data analysis methods and, for each of these methods, several key-words, in order to give an overview of the main interpretation principles used in the selected methods.

List 1:

Method                                                      Interpretation principles

FA        Factor Analysis                                   distance, coordinatization, absolute scale
PCA       Principal Component Analysis                      distance, coordinatization, absolute scale
CA        Correspondence Analysis                           distance, coordinatization, nominal scaling,
                                                            contingency table, absolute scale
MCA       Multiple Correspondence Analysis                  distance, coordinatization, nominal scaling
PRINCALS  Principal Component Analysis                      distance, coordinatization
          by Alternating Least Squares
CLUS      Cluster Analysis                                  distance, hierarchy
MDS       Multidimensional Scaling                          distance
POSA      Partial Order Scalogram Analysis                  hierarchy, product of chains
FCA       Formal Concept Analysis                           concept, hierarchy

Before going into details we give a graphical representation of the information in this list
using Formal Concept Analysis.

Diagram 1: Interpretation principles of graphical data analysis methods

Diagram 1 contains exactly the information given in List 1. How to read such a line diagram?
Each graphical data analysis method and each key-word of List 1 is represented as a point in
Diagram 1. Each method has exactly those key-words in List 1 whose points are reachable by
an upwards leading path from the point representing the given method (e.g. FCA has the key-
words concept and hierarchy). For further details about Formal Concept Analysis the reader is
referred to WILLE (1984, 1987) and WOLFF (1988, 1994a,b).
Now we use Diagram 1 to understand List 1:
Diagram 1 shows that the key-words distance and hierarchy play a prominent role, since all
methods mentioned are covered by these two key-words. Cluster analysis is the only method
having both key-words. MDS is the only method having only the key-word distance. Why did we not give MDS the key-word coordinatization, although the MDS calculations use matrix algebra? Answer: the formulation of the MDS-problem does not need the structure of the vector space, but only the set Rn together with the Euclidean metric.
Among the methods using coordinatization PRINCALS is the most general one (in the sense that it has no further key-word under coordinatization), while MCA transforms the data using nominal scaling and FA and PCA usually work with data of absolute scale type. Simple Correspondence Analysis (CA) represents contingency tables whose values are of absolute scale type, but these values are usually obtained from data using nominal scaling. Besides the hierarchic cluster analysis methods there are two hierarchical methods which use neither distance nor coordinatization: POSA represents the data in direct products of chains, and FCA uses the concept of concept and the conceptual hierarchy, which is represented by line diagrams in the plane. POSA and FCA do not have the key-word coordinatization in the narrow sense explained above, though both methods use ordinal coordinates in direct products of chains as an important tool.
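As an illustration of how a diagram like Diagram 1 is obtained, the following sketch (a naive enumeration, adequate only for a context of this small size) computes the formal concepts of the method/key-word context given in List 1; the labels are abbreviated as in the list.

```python
# Sketch: formal concepts of the method/key-word context of List 1.
# Naive enumeration over attribute subsets; fine for a context this small.
from itertools import combinations

METHODS = {
    "FA":       {"distance", "coordinatization", "absolute scale"},
    "PCA":      {"distance", "coordinatization", "absolute scale"},
    "CA":       {"distance", "coordinatization", "nominal scaling",
                 "contingency table", "absolute scale"},
    "MCA":      {"distance", "coordinatization", "nominal scaling"},
    "PRINCALS": {"distance", "coordinatization"},
    "CLUS":     {"distance", "hierarchy"},
    "MDS":      {"distance"},
    "POSA":     {"hierarchy", "product of chains"},
    "FCA":      {"concept", "hierarchy"},
}
KEYWORDS = set().union(*METHODS.values())

def extent(attrs):
    """Methods having all the given key-words."""
    return frozenset(m for m, kws in METHODS.items() if attrs <= kws)

def intent(methods):
    """Key-words common to all the given methods."""
    return frozenset(kw for kw in KEYWORDS
                     if all(kw in METHODS[m] for m in methods))

# Every pair (extent(B), intent(extent(B))) for an attribute set B is a formal concept,
# and every concept arises in this way.
concepts = set()
for r in range(len(KEYWORDS) + 1):
    for attrs in combinations(sorted(KEYWORDS), r):
        e = extent(set(attrs))
        concepts.add((e, intent(e)))

for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(e), sorted(i))
```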

After this short overview of the graphical data analysis methods mentioned and their interpretation principles, we now discuss the representation of data in these methods.

3 Data representations in graphical data analysis methods

All graphical data analysis methods represent the original data structure after several steps in a
data structure in the plane or in 3-space. Hence the Information Representation Problem arises
for each of these methods. To formulate these problems in a precise manner it would be
necessary to describe for each method the original and the graphical data structure and a
suitable representation. This can be done here only in an informal way describing the most
important steps and problems.

3.1 Main features of graphical data analysis methods

The following Table 1 gives an overview of the data representations of the data analysis methods selected above. Under the heading data the main meaning of the input data of each
method is mentioned. Under the heading calculation we indicate the main tools or results of
the calculations used in the given method. The different kinds of calculations are related to
different types of methods in Diagram 1, e.g. the methods under the key-word
coordinatization use the n-dimensional real metric vector space and calculate singular values
or eigenvalues and bases of eigenvectors to embed the rows and columns of the given matrix
in a low-dimensional space. Under the heading plane there are five different kinds of planes in
the set R2 : The key-word Euclidean denotes the usual Euclidean two-dimensional vector
space. The y-metric plane is defined as the pair (R2, dy), where dy is the distance between the y-coordinates of two points:
dy((x1, y1), (x2, y2)) := |y1 - y2|.
Just the y-metric plane is necessary to represent dendrograms of hierarchies of clusters. The metric plane is defined as the pair (R2, d), where d is the usual Euclidean metric on R2. Just the metric plane is used for the representation of MDS-solutions. The ordered plane denotes the direct product of the usual real order (R, ≤R) with itself. The y-ordered plane is defined by the low-high quasiorder (R2, ≤y), where the low-high quasiorder ≤y extends the usual real order ≤R from the y-axis to the plane:
(x1, y1) ≤y (x2, y2) :⇔ y1 ≤R y2.
Under the heading diagram we mention the names of the standard diagrams used in the corresponding method. The last two columns give the author who introduced the corresponding method and the year of its introduction.
METHOD    data               calculation         plane      diagram        author     year
FA        correlation        eigenvalues         Euclidean  factor plot    Pearson    1901
PCA       real matrix        singular values     Euclidean  PCA plot       Hotelling  1933
CA        contingency        singular values     Euclidean  map            Benzécri   1973
MCA       data matrix        singular values     Euclidean  map            Benzécri   1973
PRINCALS  data matrix        alt. least squares  Euclidean  solution       Gifi       1985
CLUS      distance           cluster             y-metric   dendrogram     Sneath     1957
MDS       similarity         points, distances   metric     solution       Torgerson  1958
POSA      ordinal structure  order embedding     ordered    Hasse diagram  Shye       1976
FCA       m.v. context       concept lattice     y-ordered  line diagram   Wille      1982

Table 1: Main features of graphical data analysis methods
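The two non-Euclidean planes defined above (the y-metric plane and the low-high quasiorder of the y-ordered plane) can be spelled out in a few lines; the following minimal sketch only restates the definitions, and the function names are chosen here for illustration.

```python
# Sketch: the y-metric and the low-high quasiorder on R^2, as defined in Section 3.1.

def d_y(p, q):
    """y-metric: distance between the y-coordinates of two points."""
    (_, y1), (_, y2) = p, q
    return abs(y1 - y2)

def leq_y(p, q):
    """Low-high quasiorder: p <=_y q  iff  the y-coordinate of p is <= that of q."""
    return p[1] <= q[1]

if __name__ == "__main__":
    print(d_y((0.0, 2.0), (5.0, 3.5)))    # 1.5
    print(leq_y((4.0, 1.0), (0.0, 2.0)))  # True
```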

3.2 Explanation of Table 1:

Now we give a more detailed description of the selected methods and their data
representations:

• Factor Analysis:
Factor Analysis was invented by PEARSON (cf. GIFI (1990), page 102). „According to
Pearson everything in this world varied continuously on a scale; discrete variables are always discretized continuous variables ... . Pearson ... was convinced that a unified conception of
science was possible starting from the concept of correlation, instead of causality. The unified
conception implied the idea that biology and anthropology had the same kind of lawlike
relationships as physics, i.e. functional relationships between measurable variables.“ (cf. GIFI
(1990), p.43). Therefore the set R of real numbers is used as the standard description of the
domain of a measurement and the m-dimensional real vector space as the standard
representation space for m measurements. These data are understood in Factor Analysis as an
information about a sample taken from a large universe. To interpret the data in terms of a
general theory about the universe a model is formulated giving a more or less narrow frame
for the embedding of the data. The central idea in Factor Analysis is the assumption of the
existence of certain „hypothetical variables“, called „factors“ such that the (ternary) relation
between attributes and objects which is given by the data can be „explained“ by the
connection of two relations, namely a relation between attributes and factors and a relation
between factors and objects. Since Factor Analysis represents the measured values of m
attributes on n objects as a real m×n-matrix Y which is usually transformed by row
standardization to a matrix Z, the central model in Factor Analysis is described in matrix
notation by Z = AP, i.e. the „standard score“ zij of the i-th attribute on the j-th object is a linear combination zij = ai1 p1j + ... + ais psj, where aik is called the „factor loading“ of the i-th attribute with respect to the k-th factor and pkj is the „factor score“ of the j-th object with respect to the k-th factor (k = 1,...,s). Hence the main mathematical problem in Factor Analysis is the construction of an integer s, an m×s-matrix A and an s×n-matrix P such that Z =
AP. This problem has many solutions (s,A,P). To construct „meaningful“ solutions the
restriction is used that „the factors are uncorrelated“, i.e. PP´/(n-1) = I, where I is the identity
matrix. This implies for the correlation matrix R := ZZ´/(n-1) of the correlations between the
attributes that R = AA´, the fundamental theorem of Factor Analysis. This equation is used to
construct A from the correlation matrix R. The standard technique calculates the positive
eigenvalues λ1 ≥ λ2 ≥...≥ λr of the correlation matrix R, where r is the rank of R and computes
an orthonormal basis u1,...,ur of eigenvectors such that Rui = λi ui (1≤ i ≤ r). Let U =
(u1,...,ur) be the m×r-matrix with ui as i-th column. It is well-known that R = UDU´, where D
:= diag(λ1 , ..., λr ). There are several strategies to determine a „meaningful“ number s of
factors, e.g. using the scree test (cf. CATTELL (1966)). Often the s factors are chosen as the
eigenvectors u1,...,us belonging to the s largest eigenvalues (counted with their multiplicity),
hence A := (u1,...,us). For other factor construction strategies on the basis of the model of
multiple Factor Analysis, for the problem of communalities, the rotation problem and the
problem of the estimation of the matrix P the reader is referred to the literature (cf. HARMAN (1960), ÜBERLA (1971)).
Why is Factor Analysis mentioned here as a graphical data analysis method? To understand
how the attributes are related to the extracted factors it is useful to represent the attribute
vectors (i.e. the rows of the matrix A) in the coordinate system of the factors. The plot of the
attribute vectors in the plane spanned by two selected main factors (e.g. u1, u2 ) is called the
factor plot. In the same way also the object vectors of the matrix P can be represented in the
coordinate system of the factors. This graphical representation of data is the forerunner of
PCA, CA, MCA, PRINCALS and POSA. Therefore Factor Analysis is mentioned here as the
first graphical data analysis method.
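The eigenvalue route just described can be sketched in a few lines of NumPy. This is only an illustration of the principal-factor extraction and of the factor plot coordinates, not of the full model: communalities, rotation and the estimation of P are omitted, the data are invented, and the loadings are scaled here so that AA´ approximates R.

```python
# Sketch: principal-factor extraction from a correlation matrix and factor-plot coordinates.
# Illustrative only; communalities, rotation and factor scores are not treated.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 100))             # toy data: m = 5 attributes, n = 100 objects

R = np.corrcoef(Y)                        # correlation matrix of the attributes (m x m)
eigvals, U = np.linalg.eigh(R)            # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # sort descending
eigvals, U = eigvals[order], U[:, order]

s = 2                                     # number of factors kept (e.g. via a scree test)
A = U[:, :s] * np.sqrt(eigvals[:s])       # factor loadings, scaled so that A A' approximates R

# Factor plot: each attribute is drawn at the point given by its loadings
# on the two extracted factors.
for i, row in enumerate(A):
    print(f"attribute {i}: ({row[0]:+.2f}, {row[1]:+.2f})")
```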
Main problems: 1. A fundamental problem in the applications of Factor Analysis (and many other methods) lies in the selection of the real n-dimensional space, which is too narrow a knowledge representation space. 2. If the measured values are represented as real numbers, then the
„operational meaning“ of the values of each attribute is usually described by one of the
classical scale types (e.g. absolute scale, ratio scale, interval scale, ordinal scale and nominal
scale). Are the meaningful statements (with respect to the scale types of the attributes)
invariant under the data transformations in Factor Analysis, e.g. rotations and projections? 3.
It is well-known that the linear correlation coefficient between two attributes is a very rough
measure for the distribution of the objects in the plane spanned by the two given attributes.
Which statements about the original data can be reconstructed from the correlation matrix or
even from the factor plots, e.g. from the distances between the points representing the
attributes? 4. The correlation matrix contains only information about pairs of attributes, but
not about triples or quadruples.

• Principal Component Analysis:


Based on the idea of PEARSON to study the eigenvalues and eigenvectors of the correlation matrix, HOTELLING (1933) developed Principal Component Analysis (cf. GIRSHICK (1936), „Principal Components“), which became applicable to real (not necessarily square) matrices (especially data matrices) by the application of the ECKART-YOUNG approximation using the Singular Value Decomposition (SVD). This led to the possibility of
the representation of objects and attributes in the same plane. To get an understandable
interpretation of the relation between the object vectors and the attribute vectors GABRIEL
(1971) introduced the biplots, in which the inner product between an object vector and an attribute vector approximates the corresponding element of the given data matrix. For details
the reader is referred to the literature (cf. JOLIFFE 1986).
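A minimal sketch of the SVD step and of rank-2 biplot coordinates follows; the toy data are invented, and the scaling convention used here (singular values assigned entirely to the attribute markers) is only one of several possible choices.

```python
# Sketch: rank-2 biplot coordinates from the singular value decomposition
# of a column-centred data matrix.  Toy data; one of several scaling conventions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))              # n = 20 objects, m = 4 attributes
Xc = X - X.mean(axis=0)                   # centre the columns

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                     # dimension of the plot
row_points = U[:, :k]                     # object markers
col_points = Vt[:k, :].T * s[:k]          # attribute markers, carrying the singular values

# Biplot reading rule: the inner product of an object marker and an attribute
# marker approximates the corresponding entry of the centred data matrix.
approx = row_points @ col_points.T
print(np.abs(Xc - approx).max())          # error of the rank-2 approximation
```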
Main problems: 1. Which statements about the original data can be obtained from the PCA
plot? 2. What is the meaning of the distances in the PCA plot? 3. Even if the „explained variance“ is high, it may happen that some objects and attributes are represented falsely, i.e.
the reading rule leads to results contradicting the original data (cf. SPANGENBERG, WOLFF
1991b).

• Correspondence Analysis:
Correspondence Analysis was introduced by BENZÉCRI (1973). The standard application of
CA handles the problem of scale type in the original data matrix by nominal scaling (of two
attributes of the original data matrix) leading to a contingency table. The chi-square distances
between the row profiles of the contingency table are represented isometrically in the
Euclidean n-dimensional vector space Rn and the same holds for the column profiles. The CA
maps are obtained by projections onto the Euclidean plane spanned by the eigenvectors of the
two largest eigenvalues.
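One standard computational route to the CA coordinates (the SVD of the standardized residuals of the contingency table) can be compressed into a short sketch; the contingency table is invented, the scaling shown is the usual principal-coordinate scaling, and details such as supplementary points are omitted.

```python
# Sketch: simple correspondence analysis of a small contingency table.
# Toy table; principal coordinates of rows and columns in the first two dimensions.
import numpy as np

N = np.array([[30., 10.,  5.],
              [10., 25., 15.],
              [ 5., 15., 35.]])           # invented contingency table

P = N / N.sum()                           # correspondence matrix
r = P.sum(axis=1)                         # row masses
c = P.sum(axis=0)                         # column masses

# Matrix of standardized residuals, whose SVD yields the CA solution.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S)

k = 2
F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]     # row principal coordinates
G = (Vt[:k, :].T * sv[:k]) / np.sqrt(c)[:, None]  # column principal coordinates

print("row points:\n", F)
print("column points:\n", G)
print("total inertia:", (sv**2).sum())
```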
Main problems: 1. The profile points in Rn contain the main information of the contingency
table by the „reconstitution formula“, but the contingency table is not reconstructable from the
CA map. 2. The distances between row points in the map and also the distances between column points in the map approximate the corresponding chi-square distances, but „there is no direct row-to-column distance interpretation“ (cf. GREENACRE in GREENACRE, BLASIUS 1994, p. 21). 3. What is the conceptual meaning of the chi-square distances?

• Multiple Correspondence Analysis:
In MCA the original data matrix is transformed by nominal scaling into a 0-1-matrix Z
(objects corresponding to rows, attributes to columns) with constant row sums, called the
complete indicator matrix. MCA is the CA of Z or of the Burt matrix ZᵀZ (cf. GREENACRE
in GREENACRE, BLASIUS 1994, ch. 7). Hence the CA problems occur also in MCA.
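The two input matrices of MCA can be constructed directly. The following sketch (invented categorical data, column naming chosen here for illustration) builds the complete indicator matrix Z by nominal scaling and the Burt matrix from it.

```python
# Sketch: complete indicator matrix Z (nominal scaling) and Burt matrix Z'Z.
# Invented categorical data; the columns of Z are the (attribute, value) pairs.
import numpy as np

data = [                                   # objects x attributes, nominal values
    {"colour": "red",  "size": "small"},
    {"colour": "blue", "size": "small"},
    {"colour": "red",  "size": "large"},
]
attributes = ["colour", "size"]

# One indicator column per (attribute, value) pair.
columns = [(a, v) for a in attributes
           for v in sorted({row[a] for row in data})]
Z = np.array([[1 if row[a] == v else 0 for (a, v) in columns] for row in data])

print(columns)   # [('colour', 'blue'), ('colour', 'red'), ('size', 'large'), ('size', 'small')]
print(Z)         # each row sums to the number of attributes (here 2)
print(Z.T @ Z)   # Burt matrix: all pairwise cross-tabulations of the attributes
```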
Main problems: 1. Nominal scaling of an attribute does not represent relevant ordinal structures in the set of values of the scaled attribute. 2. The MCA plot usually represents only
the columns of Z but not the rows. 3. The Burt matrix represents only information about all
pairwise intersections of attributes (viewed as subsets of objects), but not the whole
information about the indicator matrix since there are non-isomorphic indicator matrices with
the same Burt matrix. 4. The graphical representation of data in multidimensional contingency
tables is not well developed. The line diagrams in FCA close this gap, since each
multidimensional contingency table can be viewed as a part of a line diagram.

• PRINCALS:
Principal Component Analysis by Alternating Least Squares as described by GIFI (1985,
1990) is a graphical data analysis method which makes it possible to „handle any mixture of nominal, ordinal, and numerical variables“ (GIFI 1990, p. 157). Instead of applying only linear methods (like PCA), PRINCALS works with rescaling procedures using alternating least squares methods to minimize a certain loss function, in order to represent the rows and/or columns of the original data matrix by points in the real n-dimensional vector space. To
visualize these row and column points they are projected into a suitable Euclidean plane (like
in the methods previously mentioned). Ordinal variables are represented in the plane using
half-planes.
Main problems: 1. It is well-known that not all formal contexts are pictorial (cf. BOKOWSKI,
KOLLEWE 1991 and KOLLEWE 1993), which implies that not all ordinal data can be
represented by half-planes. 2. Missing data for ordinal or numerical variables cannot be
treated (cf. GIFI 1990, p. 177). 3. PRINCALS algorithms may run into local minima (cf. GIFI
1990, p. 178/179).

• Cluster Analysis:
Cluster Analysis has been introduced by SNEATH (1957). For details the reader is referred to
JARDINE, SIBSON (1971), BOCK (1974), HARTIGAN (1975), SOKAL, SNEATH (1973),
SPÄTH (1980).
In contrast to all previously mentioned methods hierarchical Cluster Analysis represents the
data (usually after a transformation into a distance matrix) not in the Euclidean plane but in
the y-metric plane in form of a dendrogram which visualizes a hierarchy of partitions on the
set of objects (or on the set of variables). The y-metric is used to represent the selected
heterogeneity measure for the partitions of the dendrogram.
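For completeness, a minimal sketch of a hierarchical clustering run with SciPy (assuming SciPy is available; the points are invented and single linkage is only one of the usual agglomeration rules): the merge heights returned for the dendrogram are exactly the information carried by the y-metric plane.

```python
# Sketch: hierarchical (single-linkage) clustering and its dendrogram heights.
# Toy points; the y-coordinates of the dendrogram links are the merge heights,
# i.e. exactly the information represented in the y-metric plane.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

points = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9], [5.0, 5.0]])
D = pdist(points)                          # condensed distance matrix

Z = linkage(D, method="single")            # hierarchy of partitions
print(Z)                                   # each row: merged clusters, merge height, cluster size

dendro = dendrogram(Z, no_plot=True)       # dendrogram coordinates without drawing
print(dendro["dcoord"])                    # heights (y-coordinates) of the links
```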
Main problems: 1. If the original data are given as a many-valued context there are many
distances definable, but there is no standard distance matrix to represent the data. 2. The data representation from the many-valued context to a distance matrix usually loses a lot of information. 3. The standard dendrograms representing partitions on the set of objects do not contain the attributes, hence the meaning of the object clusters in terms of the attributes has to
be guessed. 4. To overcome this problem there are several methods to „explain“ the
dendrogram by introducing the original attributes and certain contribution values (cf. JAMBU,
1991, chapter 10.5). These „explained dendrograms“ often represent a relevant „partition
insight“ into the corresponding concept lattice, but this „partition insight“ neglects the
intersections between extents of formal concepts.

• Multidimensional Scaling:
Multidimensional Scaling has been introduced by TORGERSON (1958). The main
interpretation principles in MDS are distance and (dis-)similarity between objects or
attributes. Therefore the original data are transformed into (or given as) a distance or a
dissimilarity matrix D. The MDS-problem consists in the construction of a set of points in a
Euclidean space of low dimension such that the Euclidean distance matrix for these points
equals the given matrix D. Formally: D(i,j) = E(f(i),f(j)), where E denotes the Euclidean
metric and f(i) the point corresponding to row i of D. Since it is clearly impossible to embed a tetrahedron isometrically in the Euclidean plane, the MDS-problem for the plane is not always
solvable. Hence one can expect only approximative solutions of the MDS-problem (for the
plane). The ordinal MDS-problem consists in the construction of a set of points in a Euclidean
space of low dimension such that D(i,j) < D(k,l) ⇔ E(f(i),f(j)) < E(f(k),f(l)). To solve these
problems approximately the following data representations are used:
The distance or dissimilarity matrix D is transformed into a scalar products matrix, then the
eigenvalue decomposition yields an initial set of vectors having the previous scalar products.
This initial set of points is transformed during an iterative calculation using alternating least
squares or gradient algorithms to minimize the stress which describes the deviation of the
actual matrix E from the given matrix D. The iterations are stopped if the improvement of
stress is less than a specified value. To visualize the „MDS-solution“ the final set of points is
projected into the Euclidean plane or 3-space. For details the reader is referred to the literature
(e.g. KRUSKAL, WISH 1978, SCHIFFMAN, REYNOLDS, YOUNG 1981).
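The classical (Torgerson) starting configuration mentioned above, from distance matrix to scalar products to eigenvalue decomposition, fits into a short sketch; the distance matrix is invented and the subsequent stress-minimizing iterations are not shown.

```python
# Sketch: classical (Torgerson) MDS.  Double centering of the squared distances
# and an eigenvalue decomposition give an initial point configuration.
# Toy distance matrix; the iterative stress minimization is omitted.
import numpy as np

D = np.array([[0., 2., 3., 4.],
              [2., 0., 2., 3.],
              [3., 2., 0., 2.],
              [4., 3., 2., 0.]])           # invented symmetric dissimilarity matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
B = -0.5 * J @ (D**2) @ J                  # matrix of scalar products

eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
X = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))   # initial 2D configuration

E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # distances of the solution
print(X)
print(np.abs(E - D).max())                 # deviation from the given distances
```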
Main problems: 1. The problem of generating a meaningful distance matrix (if it is not given
directly) from a many-valued context is the same as in Cluster Analysis. 2. If a many-valued
context is transformed into a distance matrix for the objects, then the MDS solution represents
only the objects, but not the attributes, hence the relation between objects and attributes has to
be guessed like in Cluster Analysis. 3. To interpret the MDS-solutions preference analysis,
property fitting and canonical regression are used which need „information about the stimuli
(and subjects) in addition to the similarity information“ (cf. SCHIFFMAN, REYNOLDS,
YOUNG 1981, chapter 12.1.). 4. The MDS-algorithms may run into local minima.

• Partial Order Scalogram Analysis:


POSA has been introduced by SHYE (1985a,b) as an extension of the unidimensional
Guttman scale (GUTTMAN 1950) to configurations of higher dimensionalities. The POSA-
data consist of a finite many-valued context without missing values such that each attribute
(called test) has a range which is ordered in a chain (like the test scores). A scalogram is a
mapping from a set P (the „population“) into a direct product of chains (with the usual
(partial) product order). The partial order dimensionality of a scalogram is the usual order
dimension of the order generated by the image of the scalogram in the direct product order, i.e.
the smallest number m of chains C1 ,...Cm such that there exists an order-embedding of the
image of the given scalogram into the direct product of C1 ,...Cm . If m = 2, it is easy to
construct a Hasse diagram of the order generated by the image of the scalogram. In general the
m chains are interpreted as the „factors of the contents analyzed, and in this sense scalogram
merits the name of „nonmetric factor analysis“ proposed by Coombs“ (cf. SHYE 1985a).
„POSAC/LCA is a computer program that produces pictorially the best (two-dimensional)
minimal space for empirical data, and, in addition, relates that diagram to external
variables....Lattice space analysis (LSA) aids the investigator in interpreting the scalogram
axes and in testing regional hypotheses (see Smallest Space Analysis)“ (cf. SHYE 1985a).
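The partial order underlying a scalogram is simply the componentwise order on the score profiles. A small sketch (invented test profiles, names chosen here) computes this product order and its covering relation, from which a Hasse diagram could be drawn.

```python
# Sketch: the product (componentwise) order on score profiles of a scalogram
# and its covering relation, the edges of a Hasse diagram.
# Invented profiles; each component is a score on a chain-ordered test.

profiles = {
    "p1": (1, 1), "p2": (2, 1), "p3": (1, 2), "p4": (2, 2), "p5": (3, 2),
}

def leq(x, y):
    """Componentwise order on the direct product of chains."""
    return all(a <= b for a, b in zip(x, y))

# Covering relation: x is covered by y if x < y and nothing lies strictly between.
covers = []
for a, x in profiles.items():
    for b, y in profiles.items():
        if x != y and leq(x, y):
            between = any(z != x and z != y and leq(x, z) and leq(z, y)
                          for z in profiles.values())
            if not between:
                covers.append((a, b))

print(covers)   # edges of the Hasse diagram, drawn upwards in the plane
```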
Main problems: 1. POSA does not possess a general graphical representation for scalograms of
arbitrary dimensionality. 2. The POSAC representation of high-dimensional scalograms in a
direct product of two chains loses a lot of information; this can easily be seen if such data are represented in the corresponding concept lattice, which contains the whole information of the data.

• Formal Concept Analysis:


The graphical data representation in Formal Concept Analysis uses three data transformations:
Scaling: Transformation of a many-valued context into a formal context.
Conceptualization: Calculation of the concept lattice of the formal context.
Visualization: Construction of a line diagram of the concept lattice.
For the details of the following short description the reader is referred to the literature (cf.
GANTER, WILLE 1989, WOLFF 1994a).
The scaling procedure constructs from a given many-valued context (G,A,W,J) a formal
context (G,M,I). The most trivial way of scaling is the nominal scaling: For each many-
valued attribute a of the many-valued context and each value w of a we introduce an
attribute m := (a,w) and define the nominally-derived context (G,M,I), where
M := {(a,w) | a ∈ A, w a value of a} and the relation I is defined by the formula
F1: g I (a,w) :⇔ (g,a,w) ∈ J.
Indeed in most applications other information preserving (or information reducing) scales are
used, e.g. to represent some interesting ordinal structure on the set of values of an attribute. In
practical situations this is usually done in cooperation with the data expert who collected the
data (cf. GANTER,WILLE 1989, WOLFF 1993).
The conceptualization procedure calculates the concept lattice (B(G,M,I), ≤) of the formal
context (G,M,I). The relation I can be reconstructed from the concept lattice using the formula
F2 : g I m ⇔ γ(g) ≤ µ(m) (for all g ∈ G, m ∈ M),
where γ(g) is the smallest concept having g in its extent and µ(m) is the biggest concept
having m in its intent.
The visualization procedure represents a finite concept lattice by a line diagram in the y-
ordered plane, i.e. a labeled Hasse diagram of (B(G,M,I), ≤) in which each concept is
represented by a point such that
F3 : g I m ⇔ there is an upwards leading path from the point of γ(g) to the point
of µ(m)
(cf. WILLE 1987). This shows that finite contexts can be represented in the plane without any
loss of information and this holds also (by F1) for finite many-valued contexts which are
scaled nominally. For other information preserving scaling procedures one has to replace (F1)
by the corresponding scaling rule.
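Formula F1 translates directly into code. The sketch below reuses the style of the toy many-valued context from Section 1 (invented data, names chosen here): it derives the nominally-scaled formal context and shows that the original triples can be read back from it.

```python
# Sketch: nominal scaling (formula F1) of a many-valued context (G, A, W, J)
# into the formal context (G, M, I) with M = {(a, w) | w a value of a}.
# Toy data, illustrative only.

G = {"g1", "g2", "g3"}
J = {
    ("g1", "colour", "red"),  ("g1", "size", "small"),
    ("g2", "colour", "blue"), ("g2", "size", "large"),
    ("g3", "colour", "red"),                       # missing value for "size" at g3
}

M = {(a, w) for (_, a, w) in J}                    # attributes of the derived context
I = {(g, (a, w)) for (g, a, w) in J}               # g I (a, w)  :<=>  (g, a, w) in J

# F1 preserves the information: the triples of J are recovered from I.
J_back = {(g, a, w) for (g, (a, w)) in I}
print(M)
print(J_back == J)                                 # True
```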
Moreover it is possible to represent the data with prescribed granularity, e.g. at first very coarsely and then, for some interesting parts of the data, very finely (cf. GANTER, WILLE 1989,
WOLFF 1993). The main technique to visualize large concept lattices uses the so-called
„nested line diagrams“ where one line diagram is nested into the blown-up points of another
line diagram (cf. WILLE 1984,1989, WOLFF 1994a). A nested line diagram shows the
distribution of the objects in the direct product of the concept lattices of two (or more) scales.
This is the standard graphic in the data management system TOSCANA (cf. VOGT, WILLE
1994, KOLLEWE, SKORSKY, VOGT, WILLE 1994) which enables the user to explore large
concept lattices without the need to draw complicated line diagrams since these are
automatically embedded into nested line diagrams with small „factors“.
Main problems: 1. It is difficult to construct „good“ line diagrams automatically since there are many possibilities to represent a concept lattice by a line diagram. 2. Some experience is necessary to understand the main structures in concept lattices, especially for lattices of high order dimension.

4 Conceptual remarks

It is obvious that POSA is very similar to FCA, e.g. the POSA-diagrams are drawn in the y-
ordered plane as Hasse diagrams (without the labelling of the line diagrams) or as „minimum
space diagrams“ embedded in the direct product of two ordered chains namely the positive x-
and y-axes.
But it is remarkable how order theoretic or conceptual ideas are hidden in the other graphical
data analysis methods: All methods in Diagram 1 which fall under the key-word
coordinatization use not only the real n-dimensional Euclidean space, but also the usual order
of the real numbers to determine maximal eigenvalues or singular values as indicators of
meaningful distributional eccentricity of a cloud of points. The standard projection onto the
plane spanned by the eigenvectors of the two largest eigenvalues is a forerunner of Shye's
POSA-representation which again is closely related to the result that any finite concept lattice
can be order-embedded into the direct product of m chains, where m is the order dimension of
the concept lattice. The drawing strategy for line diagrams to draw at first the two longest
chains (of meet-irreducible attribute concepts) fits into this development. Such line diagrams invite the reader to look first at the graphically well-drawn two longest chains, but the line diagram also shows all the other information of the given context, in contrast to the other data analysis methods, which usually unfold just the „main“ effects. The well-known „horseshoe“
or „Guttman-effect“ in CA-solutions is a good example: This parabolic shaped configuration
of object and attribute points in the CA-solution occurs not only in data with a certain „trend“
as in one-dimensional interordinal scales, but also in one-dimensional ordinal scales and in
many variations of these contexts representing a great variety with respect to their conceptual
structure. (cf. GABLER, WOLFF (to appear), GANTER, WILLE 1989, SPANGENBERG,
WOLFF 1991a)
Finally we mention that the representation of multidimensional data in the Euclidean 3-space
by a so-called „3-dimensional scatterplot“ visualized by 2-dimensional projections of the
dynamically rotated scatterplot gives a much better understanding of sets of points in the
Euclidean 3-space than just one „optimal“ 2-dimensional scatterplot. But the problem of
constructing a conceptually meaningful distance from the original many-valued context
remains unsolved even if the chosen distance is represented isometrically in the Euclidean 3-
space.
Hence the main problem in the graphical representation of multidimensional data and more
generally in all knowledge representation methods is the Information Representation Problem
mentioned above. To summarize the discussion of the information preserving abilities of the
above mentioned methods we give the following short description of the state of the art of
graphical data analysis: Among the graphical data analysis methods mentioned in List 1 only
FCA transforms the original data structure without loss of information into a graphical data
structure in the plane. All other methods in List 1 lose some information of the original data structure.
It is very instructive to compare these methods using the same data, especially to study how
some simple conceptual structures like standard scales are represented in the graphical data
structures of these methods. A comparison between FCA and biplots is published in
SPANGENBERG, WOLFF (1991b). The same data are analysed with FCA in WOLFF, GABLER, BORG (1994b) and with CA in GABLER, RIMMELSPACHER (1994). A comparison of visualizations in CA and FCA will appear in GABLER, WOLFF.

References

Benzécri, J.-P.: L´analyse des données. Tome I. La taxinomie. Tome II. L´analyse des
correspondances. Dunod, Paris, 1973, 1976, 1980.
Bock, H.H.: Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung
und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen, 1974.
Bokowski, J., W. Kollewe: On representing contexts in line arrangements. Order 8, (1991) 393-403.
Cattell, R.B.: The scree test for the number of factors. Multivar. behav. Res. 1, (1966)245-276.
Coombs, C.H.: A Theory of Data. Wiley, New York, 1964.
Gabler, S., B. Rimmelspacher: Korrespondenzanalyse von Arbeitswerten in Ost- und
Westdeutschland. ZUMA-Nachrichten 34, Jg.18, Mai 1994, 83-96.
Gabler, S., K.E. Wolff: Comparison of Visualizations in Correspondence Analysis and Formal
Concept Analysis. To appear in: J. Blasius, M. Greenacre (eds.): Visualization of Categorical Data.
Gabriel, K.R.: The biplot-graphic display of matrices with applications to principal component
analysis. Biometrica 58, (1971) 453-467.
Ganter, B., R. Wille: Conceptual Scaling. In: F. Roberts, (ed.): Applications of combinatorics and
graph theory to the biological and social sciences, Springer Verlag, New York, 1989,139-167.
Ganter, B., J. Stahl, R. Wille: Conceptual measurement and many-valued contexts.In: W. Gaul, M.
Schader (eds.): Classification as a tool of research. North-Holland, Amsterdam 1986, 169-176.
Gifi, A.: PRINCALS. Research Report UG-85-03. Leiden: Department of Data Theory.1985.
Gifi, A.: Nonlinear multivariate analysis. Wiley, Chichester, 1990.
Girshick, M.A.: Principal components. Journal of the American Statistical Association 31, 1936, 519-
528.
Greenacre, M.: Theory and applications of correspondence analysis. Academic Press, London, 1984.
Greenacre, M., J. Blasius: Correspondence Analysis in the Social Sciences. Academic Press, San
Diego, 1994.
Guttman, L.: The basis for scalogram analysis. In: S.A. Stouffer et al. (eds.): Measurement and Prediction. Vol. 4, Princeton University Press, Princeton, New Jersey, 1950.
Harman, H.H.: Modern Factor Analysis. Chicago, 1960.
Hartigan, J.A.: Clustering algorithms. Wiley, New York, 1975.
Hotelling, H.: Analysis of a complex of statistical variables into principal components. Journal of
Educational Psychology 24, 1933, 417-441.
Jambu, M.: Exploratory and multivariate data analysis. Academic Press, San Diego, 1991.
Jardine, N., R. Sibson: The construction of hierarchic and non-hierarchic classification. Computer
Journal 11, 1968, 177-184.
Jardine, N., R. Sibson: Mathematical taxonomy. Wiley, New York, 1971.
Joliffe, I.T.: Principal Component Analysis. Springer, New York, 1986.
Kollewe, W.: Representation of data by pseudoline arrangements. In: O.Opitz, B.Lausen, R.Klar
(eds.): Information and Classification. Springer-Verlag, Heidelberg, 1993, 113-122.
Kollewe, W., M. Skorsky, F. Vogt, R. Wille: TOSCANA - ein Werkzeug zur begrifflichen Analyse und
Erkundung von Daten. In: R. Wille, M. Zickwolff (Hrsg.): Begriffliche Wissensverarbeitung -
Grundfragen und Aufgaben. BI-Wissenschaftsverlag, Mannheim, 1994, 267-288.
Krantz, D.H., R.D. Luce, P. Suppes, A. Tversky: Foundations of Measurement, Vol. I. Academic Press, New York, 1971.
Kruskal, J.B., M. Wish: Multidimensional scaling. Sage Publications, Beverly Hills, California, 1978.
Luce, R.D., D.H. Krantz, P. Suppes, A. Tversky: Foundations of Measurement, Vol. III. Academic Press, San Diego, 1990.
Schiffman, S.S., M.L. Reynolds, F.W. Young: Introduction to multidimensional scaling. Academic
Press, New York, 1981.
Shye, S.: Partial Order Scalogram Analysis. In: The International Encyclopedia of Education,
Oxford, Pergamon Press, 1985a.
Shye, S.: Multiple Scaling. North-Holland, Amsterdam, 1985b.
Sneath, P.H.A.: The application of computers to taxonomy. Journal of General Microbiology 17,
1957, 201-226.
Sokal, R.R., P.H.A. Sneath: Principles of numerical taxonomy. Freeman, San Francisco, 1973.
Späth, H.: Cluster-Analyse-Algorithmen. Oldenbourg, München, 1977.
Späth, H.: Cluster analysis algorithms for data reduction and classification of objects. Halsted/Wiley, New York, 1980.
Spangenberg, N. , K.E. Wolff: Interpretation von Mustern in Kontexten und Begriffsverbänden.
Publication de l´ Institut de Recherche Mathématique Avancée, Strasbourg, Actes 26e Séminaire
Lotharingien de Combinatoire, 1991a, 93-113.
Spangenberg, N., K.E. Wolff: Comparison of Biplot Analysis and Formal Concept Analysis in the case of a Repertory Grid. In: H.H. Bock, P. Ihm (eds.): Classification, Data Analysis, and Knowledge Organization. Springer Verlag, 1991b, 104-112.
Suppes, P., D.H. Krantz, R.D. Luce, A. Tversky: Foundations of Measurement, Vol.II, Academic
Press, San Diego, 1989.
Torgerson, W.S.: Theory and Methods of Scaling. Wiley, New York, 1958.
Überla, K.: Faktorenanalyse. Springer-Verlag, Berlin, 1971.
Vogt, F., R. Wille: TOSCANA - A Graphical Tool for Analyzing and Exploring Data.
TH-Preprint 1670, 1994.
Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival
(ed.): Ordered Sets. Reidel, Dordrecht-Boston 1982, 445-470.
Wille, R.: Liniendiagramme hierarchischer Begriffssysteme. In: H.H. Bock (ed.): Anwendungen der
Klassifikation: Datenanalyse und numerische Klassifikation. INDEKS-Verlag, Frankfurt 1984, 32-51;
English translation: Line diagrams of hierarchical concept systems. International Classification 11
(1984) 77-86.
Wille, R.: Bedeutungen von Begriffsverbänden. In: B. Ganter, R. Wille, K.E. Wolff (Hrsg.) Beiträge
zur Begriffsanalyse. B.I.-Wissenschaftsverlag, Mannheim/Wien/Zürich 1987, 161-211.
Wille, R.: Lattices in data analysis: how to draw them with a computer. In: I. Rival (ed.) Algorithms
and order.Kluwer, Dordrecht-Boston, 1989, 33-58.
Wolff, K.E.: Einführung in die Formale Begriffsanalyse. Publication de l' Institut de Recherche
Mathematique Avancee, Strasbourg, Séminaire Lotharingien de Combinatoire,1988, 85-96.
Wolff, K.E., M. Stellwagen: Conceptual optimization in the production of chips. In: Janssen, J. &
Skiadas, C.H. (eds.) Applied Stochastic Models and Data Analysis, Vol. 2, World Scientific
Publishing Co. Pte. Ltd., 1993,1054-1064.
Wolff, K.E.: A first course in Formal Concept Analysis - How to understand line diagrams. In:
Faulbaum, F. (ed.) SoftStat´93, Advances in Statistical Software 4, 1994a, 429-438.
Wolff, K.E., S. Gabler, I. Borg: Formale Begriffsanalyse von Arbeitswerten in Ost- und
Westdeutschland. ZUMA-Nachrichten 34, Jg.18, Mai 1994b, 69-82.
