Computational Gestalts and Prägnanz


Riccardo Luccio
Dipartimento di Psicologia, Università di Trieste

The two ways


Our phenomenal experience is founded on the objects and events that the
perceptual field presents to us, in everyday life as in the laboratory. Except in
some very rare situations, like the Ganzfeld, the phenomenal field is seldom
homogeneous, and every time the perceptual processes present to us
properties that are “different from what one can expect by bare summation”, we
are confronted with what we can call Gestalts (cf. Köhler, 1920, p. ix).
Usually, when we speak about perception in psychology, we refer to at
least two different concepts: i) the first meaning refers to an immediate
segmentation of the field, which thus appears to awareness as a plurality of
distinct objects, before and independently of the attribution of any meaning;
ii) the second meaning, on the contrary, refers to the identification of such
objects, with their categorization and their recognition.
Note that it is the first concept, specifically visual, that Kanizsa
called primary, while the second, properly cognitive, he called
secondary (Kanizsa, 1979), in his distinction between the “two ways to go
beyond the information given”.
Kanizsa always spoke of these two moments in terms of seeing versus
thinking. Personally, while agreeing with this distinction, I definitely prefer
to avoid identifying the two moments with the terms seeing and
thinking, which have created so many misunderstandings and fruitless debates. In
particular, it is worth remembering the arguments advanced by those who support
maintaining a continuity between perception and thinking, as in
the classic Gestalttheorie (see Arnheim, 1986; see also Kanizsa & Luccio, 1987;
Luccio, 2005). In any case, there are logical as well as experimental reasons
for preserving the distinction. When the aim of the student of
perception is to identify the principles on which the primary
process operates, one must first resist any temptation to advocate a
ratiomorphic explanation of the segmentation of the field. From the logical point
of view, the so-called “Höffding’s step” (Höffding, 1885; Kanizsa & Luccio,
1987) has definitively demonstrated the untenability of any argument that
explains the recognition of an object not yet recognized on the basis of past
experience. This is true not only for classical ratiomorphic assumptions like
Helmholtz’s unbewußter Schluss or Benussi’s assimilation, but also for models
derived from the “classic approach” of computer vision (see below).
In my opinion, the only tendency that one can identify in perception
is towards maximum stability in the given conditions, according to minimum
principles (Koffka, 1935), and not towards maximum regularity, as many
Gestalt theorists have maintained (e.g. Metzger, 1976; Wertheimer, 1923;
contra, see Kanizsa & Luccio, 1986). In this sense, it is interesting to
examine (and these are the experimental reasons mentioned above) mainly the
cases in which there are i) multistable configurations (Kanizsa & Luccio,
1995); ii) sharp and compelling organizational changes, as in Street’s figures;
iii) perceptual results that are stable but amodal, that is, independent of an
actual stimulation (Kanizsa & Gerbino, 1982); or iv) phenomenal results
that are more or less different from the objective conditions of stimulation,
that is, the so-called illusions.
However, the first task is to identify the mechanisms that could underlie
the primary process, that is, the segmentation of the visual field.

Partial gestalts
In the last few decades many attempts have been made to formulate Gestalt
principles more precisely, using mathematical tools that were not available to
the first Gestalt psychologists at the time of the founding of the Berlin
school, from information theory (Attneave, 1954, 1957; Garner, 1962)
to synergetics (Haken & Stadler, 1990; Kelso, 1995), other dynamic non-linear
approaches (van Leeuwen, 2007), and so on. One apparently very
promising attempt has been undertaken, in the last ten years or so, by a group of
French mathematicians mainly interested in computer vision (Morel, Cao,
Almansa, etc.); the leading figure among them today appears to be Agnès
Desolneux (for a comprehensive review, see Desolneux, Moisan, & Morel,
2006). The theory of computational Gestalts that they are building is centered
on three basic principles:
1. Shannon-Nyquist definition of signals and images: Any image or
signal, including noisy signals, is a band-limited function sampled on
a bounded, periodic grid.
2. Wertheimer's contrast invariance principle: Image interpretation does
not depend upon actual values of the stimulus intensities, but only on
their relative values.
3. Helmholtz principle, in fact first stated by D. Lowe (1985): Gestalts are
sets of points whose (geometrically regular) spatial arrangement could
not occur in noise.
This means that, given the discreteness of the visual field (first
principle), and given the prevalence of the relative over the absolute values of
the stimuli (second principle), it is possible to determine a probability value ε
such that all the stimuli whose probability is less than ε tend to group together
(third principle).
The name of Helmholtz may sound a little odd to psychologists in this
context. As a matter of fact, neither in his Handbuch (1867) nor in his other
papers concerning perceptual theory did Helmholtz ever state anything of
the kind. But in general the references that such authors make to psychological
matters are not to be taken too seriously. In our analysis we will concentrate
above all on the Helmholtz principle.
Before examining the theory, however, some introductory remarks are
useful. In this approach, the starting point is the attempt made by the Gestaltists
(above all Wertheimer) to find the basic laws that contribute to the formation of
shapes, on the basis of several common properties. These properties, the partial
gestalts (Desolneux, Moisan, & Morel, 2001), correspond at least in part to the
classical principles stated by Wertheimer (1923); their applications converge in
forming larger groups, also according to some other less classic principles, like
articulation without remainder (Metzger, 1956; Kanizsa, 1979). It should be
stressed that these authors consider Metzger and Kanizsa very
important points of reference in this matter, in some sense more so than the
classical authors of the Berlin school.
According to these authors, Gestalt Theory predicts that the partial
gestalts are recursively organized with respect to the grouping laws. The
algorithms are non-local, since alignments or similarity between some partial
features have to be considered for the totality of the perceptual field.
A good example is the detection of good continuation (Cao,
2004). The steps of the study are the following. First, neither a global
explanation of the form to be detected nor a model of it is given. One must instead
decide whether (in the case of good continuation) a given curve made by assembling
two or more other curves produces a result that is smooth or not. So, given a
curve and a number of other curves with different levels of smoothness, the
participant's task is to make what he or she considers a meaningful assembly,
indicating which curves belong together. We can thus work out the false
alarm rate; in such a detection task the parameters reduce to this rate, which,
under the null hypothesis, is a fair measure of probability. The algorithm is
consequently parameter-free. We must stress that the verb “decide” could be
misleading if one assumes that it implies some sort of ratiomorphic
explanation, with reasoning about smoothness yielding a perceptual result.
Instead, the process has in some sense an automatic outcome. In other words, it
can be considered the output of a sort of “smart mechanism”, in the sense
of Runeson (1977): in this case, as we will soon see, the primary process is the
output of a smart mechanism that is able to assess probabilities and to segment
the perceptual field according to the result of this assessment, without any need
to know anything about the theory of probability.

Differences with the traditional approach


We must stress how greatly this approach differs from
all the traditional approaches in computer image analysis, not to mention
experimental psychology. In most approaches researchers define a priori a set
of structures that one should find in the images to be analysed. Such structures
could be lines with a given curvature, junctions (e.g. T-junctions), textures,
convexities, and so on. The next step is, given a certain image, shape or, in
general, pattern, to try to maximize the values of some parameters associated
with such structures in computing (in computer vision) or in detecting (in
experimental psychology) the pattern. This is done, formally in computer vision
and informally (implicitly) in experimental psychology, by minimizing a function
of the type F(u, u0) + R(u), where u is the model of the image, u0 the given image,
and F and R are respectively fidelity and regularity functions (see Morel and
Solimini, 1994).
This classical approach presents a series of drawbacks. Without going into
depth, the main ones are: the need for normalization constants,
which determine the segmentation of the image; and the fact that, since a
minimum of the above function always exists, any image (random noise too)
can be segmented, irrespective of whether this segmentation is significant
or not. Last but not least, this approach is by its very nature localist, and so
anti-Gestaltist in spirit.
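The first two drawbacks can be made concrete with a minimal sketch in Python. It assumes a one-dimensional signal and quadratic choices for both terms, F(u, u0) = Σ (u_i − u0_i)² and R(u) = lam · Σ (u_{i+1} − u_i)². These choices, the weight `lam` and all function names are illustrative assumptions, not the functionals of any model cited here. The point is only that a minimizer exists for pure noise as well, and that its shape depends on the constant `lam`.

```python
import random


def roughness(u):
    """Regularity term R(u): sum of squared differences between neighbours."""
    return sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))


def energy(u, u0, lam):
    """Fidelity F(u, u0) plus regularity weighted by the constant lam."""
    fidelity = sum((a - b) ** 2 for a, b in zip(u, u0))
    return fidelity + lam * roughness(u)


def minimize(u0, lam):
    """Exact minimizer of the quadratic energy for a signal of length >= 2.

    Setting the gradient to zero gives the tridiagonal linear system
    (I + lam * L) u = u0, with L the 1D Laplacian with free ends; it is
    solved here with the Thomas algorithm.
    """
    n = len(u0)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = u0[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (u0[i] - off * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(50)]  # no structure at all
    for lam in (0.5, 5.0):
        u = minimize(noise, lam)
        print(lam, round(roughness(u), 3), round(energy(u, noise, lam), 3))
```

Even though the input contains no structure, the procedure returns a smooth "model" for every value of `lam`, and a different one for each value. Nothing in the minimization itself says whether the result is significant.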
To illustrate this problem, we shall briefly describe a recent model of this
kind, much discussed in recent years, proposed by Brox, Bruhn, Papenberg, &
Weickert (2004; for a discussion and an extension of the model, see Amiaz,
Lubetzky, & Kiryati, 2007). The aim of Brox and colleagues was to estimate the
optical flow by minimizing an objective function E(u, v) that includes two terms, a
data term and a smoothness term. The data term, Ed(u, v), measures the
correspondence between two frames, on the basis of the constancy of the gray level in
consecutive frames. To reduce the effect of noise, one can, for instance,
use weighted windows around each pixel. This clearly implies the
insufficiency of an approach based only on minimizing Ed. We need some
“regularization”, for instance the so-called piecewise smoothness (see Black &
Anandan, 1996; Brox et al., 2004; Amiaz & Kiryati, 2006).
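In symbols, an objective of this kind can be sketched as follows. This is a generic form written under simplifying assumptions (quadratic penalties, a single weight α, and the symbols I, Ω and Es, none of which appear in the text above). It is not claimed to be the exact functional of Brox et al. (2004).

```latex
E(u, v) = E_d(u, v) + \alpha \, E_s(u, v),
\qquad
E_d(u, v) = \int_{\Omega} \bigl( I(x + u,\, y + v,\, t + 1) - I(x, y, t) \bigr)^{2} \, dx \, dy,
\qquad
E_s(u, v) = \int_{\Omega} \bigl( |\nabla u|^{2} + |\nabla v|^{2} \bigr) \, dx \, dy .
```

Here I(x, y, t) is assumed to denote the gray level at pixel (x, y) in frame t, and (u, v) the displacement field over the image domain Ω. The data term expresses gray-level constancy between consecutive frames. The weight α is a normalization constant of the kind listed among the drawbacks of the classical approach.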

Helmholtz principle
Let us go back to the model discussed here. As we said, the so-called
Helmholtz principle was introduced by Lowe (1985). In very general terms, we
can state the principle in this way: we are able to detect any configuration that
has a very low probability of occurring by chance alone. So, any detected
configuration has a low probability, which implies that every improbable
configuration is perceptually relevant. Lowe stated the principle as follows: “we need
to determine the probability that each relation in the image could have arisen
by accident p(A). Naturally, the smaller that this value is, the more likely the
relation is to have a causal interpretation.” A more formal statement of this
principle was first given by Desolneux, Moisan and Morel (2000): “We say
that an event of type ‘such configuration of points has such property’ is
ε-meaningful if the expectation in a image of the number of occurrences of this
event is less than ε”.
What does ε-meaningfulness mean? It can be restated by assuming that n
objects (parts, regions) are present in an image. Now, if k of them share a common
feature, we must decide whether this is happening by chance or not. To answer this
question, we make the following mental experiment: we assume that the
considered quality has been randomly and uniformly distributed on all objects
O1, . . . , On. Notice that this quality may be spatial (e.g., position, orientation).
Then we (mentally) assume that the observed position of objects in the image
is a random realization of this uniform process, and ask the question: is the
observed repartition probable or not? The Helmholtz principle states that if the
expectation in the image of the observed configuration O1, . . .,Ok is very small,
then the grouping of these objects makes sense, is a Gestalt (see Desolneux,
Moisan and Morel, 2003). The Helmholtz principle can be illustrated by the
psychophysical experiment of Figure 1. On the left, we display roughly 400
segments whose directional accuracy (computed as the width–length ratio) is
about 12 degrees. Assuming that the directions and the positions of the
segments are independent, uniformly distributed, we can compute the
expectation of the number of alignments of four segments or more. (We say
that segments are aligned if they belong to the same line, up to the given
accuracy.) The expectation of such alignments in this case is about 2.5. Thus,
we can expect two or three such alignments of four segments and we found
them by computer. Do you see them? On the right, we performed the same
experiment with about 30 segments, with accuracy (width–length ratio) equal
to 7 degrees. The expectation of a group of four aligned segments is 1/250.
Most observers detect them immediately.

When ε ≤ 1, we talk about meaningful events. If the Helmholtz principle
is true, we perceive events if and only if they are meaningful in the preceding
sense. The alignment on the right side of Figure 1 is meaningful, while the left
side of the figure contains no meaningful alignment of four segments.
As an example of generic computation we can do with this definition, let
us assume that the probability that a given object Oi has the considered quality
is equal to p. Then, under the independence assumption, the probability that at
least k objects out of the observed n have this quality is
$$B(p, n, k) = \sum_{i=k}^{n} \binom{n}{i} \, p^{i} (1 - p)^{n - i},$$
i.e. the tail of the binomial distribution. The independence assumption is not
realistic, but it is an a contrario assumption. In order to get an upper bound on
the number of false alarms, i.e. the expectation of the geometric event
happening by pure chance, we can simply multiply the above probability by the
number of tests we perform on the image. Let us call N_T the number of tests.
Then, in most cases we shall consider in the next subsections, a considered
event will be defined as ε-meaningful if
$$N_T \, B(p, n, k) \le \varepsilon.$$
In the following we call the left-hand member of this inequality the
“number of false alarms” (NFA).
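The definition above can be sketched in a few lines of Python. The function names and the numerical values in the usage part (number of tests, p, n, k) are hypothetical and chosen only for illustration. The formula itself is the binomial tail multiplied by the number of tests, as stated in the text.

```python
from math import comb


def binomial_tail(p, n, k):
    """B(p, n, k): probability that at least k of n independent objects
    have a quality that each object has with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def number_of_false_alarms(num_tests, p, n, k):
    """NFA: number of tests times the binomial tail."""
    return num_tests * binomial_tail(p, n, k)


def is_meaningful(num_tests, p, n, k, eps=1.0):
    """An event is eps-meaningful when its NFA does not exceed eps."""
    return number_of_false_alarms(num_tests, p, n, k) <= eps


if __name__ == "__main__":
    # Hypothetical setting: 10 000 tests, 20 objects, quality probability 1/16.
    for k in (3, 10):
        print(k, number_of_false_alarms(10_000, 1 / 16, 20, k),
              is_meaningful(10_000, 1 / 16, 20, k))
```

With these illustrative values, three objects sharing the quality are easily explained by chance, while ten are not.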
If this expected number is very low, then the group should be
considered as meaningful, since it cannot be due only to chance. This means
that we reject, a contrario, the independence hypothesis. Under an
independence assumption, probabilities are obtained as products of more
elementary probabilities. Therefore, it is often possible to prove (it will be the
case in what follows) that the minimal size of the meaningful group depends
on the logarithm of the allowed number of false alarms. We shall see that,
experimentally, we can take this number equal to 1, since modifying its value
does not much change the results. We have to pay attention to the fact that the
a contrario events we define must not depend in any way on the observation.
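The weak dependence on the allowed number of false alarms can be explored numerically. The sketch below searches for the smallest group size k whose number of false alarms does not exceed a given threshold. The function names and the values used (10 000 tests, 100 objects, probability 1/16) are assumptions made for illustration only.

```python
from math import comb


def binomial_tail(p, n, k):
    """Probability that at least k of n independent objects have the quality."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def minimal_meaningful_size(num_tests, p, n, eps):
    """Smallest k with num_tests * B(p, n, k) <= eps, or None if none exists.

    The tail decreases as k grows, so the first k that passes is the minimum.
    """
    for k in range(n + 1):
        if num_tests * binomial_tail(p, n, k) <= eps:
            return k
    return None


if __name__ == "__main__":
    for eps in (1.0, 1e-1, 1e-2, 1e-3):
        print(eps, minimal_meaningful_size(10_000, 1 / 16, 100, eps))
```

Because the tail falls off roughly exponentially in k, one would expect the printed sizes to grow only slowly as the threshold shrinks by successive factors of ten. This is the logarithmic dependence mentioned in the text.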
When ε ≤ 1, we talk about meaningful events. This seems to contradict
the necessary notion of a parameter-less theory. In fact it does not, since the
ε-dependency of meaningfulness will be low (it will in fact be a log
ε-dependency). The probability that a meaningful event is observed by accident
will be very small. In such a case, our perception is liable to see the event, no
matter whether it is “true” or not. Our term ε-meaningful is related to the
classical p-significance in statistics; as we shall see further on, we must use
expectations in our estimates and not probabilities.
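The role of expectations can be made explicit with a short worked bound. It is a standard argument (linearity of expectation and Markov's inequality), written with the hypothetical notation X_i for the indicator that the i-th of the N_T tests yields a detection under the a contrario model. It is a sketch, not a derivation taken from the works cited.

```latex
\mathbb{E}\Bigl[ \sum_{i=1}^{N_T} X_i \Bigr] = \sum_{i=1}^{N_T} \mathbb{P}(X_i = 1),
\qquad
\mathbb{P}\Bigl( \sum_{i=1}^{N_T} X_i \ge 1 \Bigr) \le \mathbb{E}\Bigl[ \sum_{i=1}^{N_T} X_i \Bigr].
```

The first identity holds whether or not the tests are independent of one another, which is why an expected count is easier to control than the probability of a joint event. The second shows that a small expected number of false alarms also bounds the probability of observing even one.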
The general method we have just outlined can be viewed as a
systematization of Stewart’s “MINPRAN” method [23]. The method was
presented as a new paradigm, but was applied only to the 3D alignment
problem. Now, Stewart actually addressed but did not solve two problems we
have intended to overcome, in order to make the method fully general. One of
the problems raised by Stewart was the generation of the set of samples, which
in Stewart’s method requires at least three user parameters, and the second
one was the severe restriction concerning the independence of samples. We actually
solved both difficulties simultaneously by introducing the number of samples
as an implicit parameter of the method (computed from the image size and
Shannon’s principles) and by replacing in all calculations the “probability of
hallucinating a wrong event” by the “expectation of the number of such
hallucinations”, namely what we call the false alarm rate NFA. The method we
develop here has probably been proposed several times in Computer Vision
(e.g. in the early Lowe work [15]), but, to the best of our knowledge, not
systematically developed.
An important point is raised by Cao (2004). If we assume that the points
are uniformly and independently distributed in the image, then any given
configuration is neither more nor less probable than any other configuration.
This means that summarizing the Helmholtz principle as “any improbable
configuration has to be detected” is erroneous. The answer to this objection is
given by Gestalt Theory itself: the non-generic configurations we are interested
in are those given by the Gestalt laws. Thus, we have to compute the probability
of occurrence of partial gestalts (alignments, good continuations, etc.), and not
the probability of any random configuration. The argument could appear circular,
but in fact this is a nice way to define operationally what Gestalt principles
are.
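The difference between the probability of a single configuration and the probability of an event defined by a grouping law can be illustrated with a small sketch. It assumes, purely for illustration, n segments whose directions are drawn independently and uniformly from a fixed number of direction bins. The function names and the numbers are hypothetical.

```python
from math import comb


def configuration_probability(directions, num_bins):
    """Probability of one fully specified list of directions under the
    uniform, independent model. It depends only on how many segments there
    are, not on whether they look aligned."""
    return (1.0 / num_bins) ** len(directions)


def event_probability(num_bins, n, k):
    """Probability that at least k of n segments fall in one given direction
    bin. This is an event fixed in advance by a grouping law."""
    p = 1.0 / num_bins
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    aligned = [0] * 8                      # all segments share one direction
    scattered = [3, 1, 4, 1, 5, 9, 2, 6]   # an arbitrary irregular pattern
    print(configuration_probability(aligned, 16))
    print(configuration_probability(scattered, 16))
    print(event_probability(16, 8, 6))
```

The two individual configurations receive exactly the same probability, so that number cannot single out the aligned one. What is small, and can be tested, is the probability of the class of configurations that the law "at least k segments share a direction" picks out before looking at the image.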
What does “small probability” mean? In fact, we can only compare the
number of false alarms (that is, the number of detections that are due to chance)
and the number of tested events (which depends on the observation). Therefore,
this number has to be finite. Together with the fact that all the tested events
must be defined independently of the observation, this implies that our
observations have to be quantized, as explained in the next section. According
to the preliminary experimental results that we have obtained in this
perspective, we can thus give a definite formulation of the concept of Prägnanz,
eliminating all the ambiguities that surround it (Kanizsa and Luccio, 1986), and
include in a very precise way oppositeness as a founding principle in
constituting perceptual Gestalts:

References
Almansa, A., Desolneux, A., Vamech, S. (2003). Vanishing point detection
without any a priori information. IEEE Transactions: Pattern Analysis
and Machine Intelligence, 25 (4), 502–507
Alvarez, L., Morales, F. (1997). Affine morphological multiscale analysis of
corners and multiple junctions. International Journal of Computer
Vision, 25 (2), 95–107.
Amiaz, T. & Kiryati, N. (2006). Piecewise-smooth dense optical flow via level
sets, International Journal of Computer Vision, 68, 111–124.
Amiaz, T., Lubetzky, E., & Kiryati, N. (2007). Coarse to over-fine optical
flow estimation. Pattern Recognition, 40, 2496–2503.
Arnheim, R. (1987). Prägnanz and its discontents. Gestalt Theory, 9, 102-107.
Attneave, F. (1954). Some informational aspects of visual perception.
Psychological Review, 61, 183–193.
Attneave, F. (1959). Applications of Information Theory to Psychology, New
York, NY: Holt.
Black, M.J. & Anandan, P. (1996). The robust estimation of multiple motions:
parametric and piecewise-smooth flow fields. Computer Vision Image
Understanding, 63, 75–104.
Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy
optical flow estimation based on a theory for warping, in: Eighth
European Conference on Computer Vision (ECCV04), vol. IV, 25-36.
Canny, J. (1986). A computational approach to edge detection. IEEE
Transactions: Pattern Analysis and Machine Intelligence, 8 (6), 679–698.
Cao, F. (2002). Contrast invariant, parameterless detection of good
continuations, corners and terminators. Technical Report PI-1487,
IRISA.
Caselles, V., Coll, B., Morel, J.M. (1996). A Kanizsa program. In Progress in
Nonlinear Differential Equations and their Applications, Vol. 25, 35–55.
Deriche, R., Giraudon, G. (1993). A computational approach for corner and
vertex detection. Int. J. of Computer Vision, 10, 101–124
Desolneux, A., Moisan, L., Morel, J.M. (2000). Maximal meaningful events
and application to image analysis. Technical Report Preprint 2000-22,
CMLA, ENS-Cachan.
Desolneux, A., Moisan, L., Morel, J.M. (2000). Meaningful alignments. Int. J.
of Computer Vision, 7–23.
Desolneux, A., Moisan, L., Morel, J.M. (2001). Edge detection by Helmholtz
principle. J. Math. Imag. Vision, 14, 271–284.
Desolneux, A., Moisan, L., Morel, J. M. (2001). Partial gestalts.
http://www.cmla.ens-cachan.fr. IEEE Transactions on Pattern Analysis
and Machine Intelligence.
Desolneux, A., Moisan, L., Morel, J. M. (2003). A grouping principle and four
applications. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 25, 508–513.
Desolneux, A., Moisan, L., Morel, J.M (2006). From Gestalt Theory to Image
Analysis: A Probabilistic Approach. (Technical Report)
Garner, W. R. (1962). Uncertainty and Structure as Psychological Concepts,
New York, NY: Wiley.
Gousseau, Y., Morel, J.M. (2001). Are natural images of bounded variation? J. of
Math. Anal., 33 (3), 634–648.
Haken, H. and Stadler, M. (eds) (1990). Synergetics of Cognition, Berlin:
Springer, 2-30.
Harris, C.G., Stephens, M. (1988). A combined corner and edge detector. In
4th Alvey Vision Conference, Manchester, pp. 189–192.
Helmholtz, H., von (1867). Handbuch der Physiologischen Optik, Leipzig:
Voss.
Höffding, H. (1885). Psychologie in Umrissen auf Grundlage der Erfahrung.
Leipzig: Fues's Verlag.
Jacobs, D.W. (1996). Robust and efficient detection of salient convex groups. IEEE
Trans. Pattern Anal. Mach. Intell., 18, 23–37.
Julesz, B. (1983). Textons, the fundamental elements in preattentive vision and
perception of textures. Bell System Technical Journal, 62, 1619–1645.
Kanizsa, G. (1979). Organization of Vision, New York, NY: Praeger.
Kanizsa, G. (1991). Vedere e Pensare. Bologna: Il Mulino.
Kanizsa, G. and Gerbino, W. (1982). Amodal completion: Seeing or Thinking?
In J. Beck (ed.), Organization and Representation in Perception.
Hillsdale, NJ: LEA, pp. 167-190.
Kanizsa, G. and Luccio, R. (1986). Die Doppeldeutigkeiten der Prägnanz.
Gestalt Theory, 8, 99-135.
Kanizsa, G. & Luccio, R. (1987). Formation and categorization of visual
objects: Höffding's never confuted but always forgotten argument. Gestalt
Theory, 9, 111-127.
Kanizsa, G. & Luccio, R. (1995). Multistability as a research tool in
perception. in: P. Kruse & M. Stadler (eds), Ambiguity in Mind and
Nature. Berlin: Springer.
Kelso, J. A. S. (1995). Dynamic Patterns. The Self-Organization of Brain and
Behavior. Cambridge, MA: MIT Press.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären
Zustand. Braunschweig: Vieweg.
Lindeberg, T., Li, M.X. (1997). Segmentation and classification of edges using
minimum description length approximation and complementary junction
cues. Computer Vision and Image Understanding, 67, 88–98
Lisani, J.L., Monasse, P., Rudin, L. (2001). Fast shape extraction and
application. Preprint 16, CMLA, ENS-Cachan.
Lowe, D. (1985). Perceptual Organization and Visual Recognition. London:
Kluwer.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Montanari, U. (1971). On the optimal detection of curves in noisy pictures.
Communications of the ACM, 14, 335–345
Mumford, D., Shah, J. (1989). Optimal approximation by piecewise smooth
functions and associated variational problems. Communication on Pure
and Applied Mathematics, XLII (4)
Nitzberg, M., Shiota, T. (1992). Nonlinear image filtering with edge and corner
enhancement. IEEE Trans. Pattern Anal. Mach. Intell., 14, 826–833
Pao, H., Geiger, D., Rubin, N. (1999). Measuring convexity for figure/ground
separation. In: International Conference of Computer Vision, ICCV 9
Vol. 2, 948–955.
Prytulak L. S. (1974). Good continuation revisited. Journal of Experimental
Psychology, 102, 773-777.
Rausch, E. (1952). Struktur und Metrik figural-optischer Wahrnehmung.
Frankfurt: Kramer.
Runeson, S. (1977). On the possibility of “smart” perceptual mechanisms.
Scandinavian Journal of Psychology, 18, 172-179.
Sojka, E. (2001). A new algorithm for detecting corners in digital images. In:
8th Spring Conference on Computer Graphics,
van Leeuwen, C. (2007). What needs to emerge to make you conscious?
Journal of Consciousness Studies, 14, 115–136.
Wertheimer, M. (1923). Untersuchungen zur Lehre der Gestalt, II.
Psychologische Forschung. 4, 301–350.
Zhu, S.C. (1999). Embedding Gestalt laws in Markov random fields. IEEE
Transactions: Pattern Analysis and Machine Intelligence, 21 (11), 1170–1187.