ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
IMAGE MATHEMATICS AND IMAGE PROCESSING
VOLUME 84

EDITOR-IN-CHIEF
PETER W. HAWKES
Centre National de la Recherche Scientifique
Toulouse, France

ASSOCIATE EDITOR
BENJAMIN KAZAN
Xerox Corporation
Palo Alto Research Center
Palo Alto, California
Advances in
Electronics and
Electron Physics
Image Mathematics and
Image Processing
EDITED BY
PETER W. HAWKES
CEMES/Laboratoire d'Optique Electronique
du Centre National de la Recherche Scientifique
Toulouse, France
VOLUME 84
CONTENTS
CONTRIBUTORS . . . . . . . . . . . . . . . viii
PREFACE . . . . . . . . . . . . . . . ix

CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors' contributions begin.
...
PREFACE
In view of my attempts during the past few years to make image processing
one of the principal themes of these Advances, I am very pleased that this
volume is wholly concerned with image mathematics and image processing.
The subject is in a state of rapid development because, despite its many
successes in domains as far apart as space science, robotics, forensics, and
microscopy, many fundamental problems remain unsolved or imperfectly
understood. Several of these are examined here, together with a practical
application in echographic imagery.
The volume of data in a raw digitized image is so vast that coding is an
important task and vector quantization is known to be attractive in theory.
In practice, the size of the necessary codebook is an obstacle and the opening
chapter by C. F. Barnes and R. L. Frost analyzes the associated difficulties.
The introduction of image algebras (first covered in this series by
C. R. Giardina in Volume 67 and presented in detail by G. X. Ritter in
Volume 80) has generated many original ideas and revealed unexpected
connections between existing processing methods and classical mathematics.
A recent and extremely rich example is the relation between minimax algebra
and mathematical morphology. This has been explored in detail by
J. N. Davidson, author of the second chapter, who gives here a fuller account
of her work than is available elsewhere, in a language that should make it
widely accessible.
Invariance under translation, rotation, and perhaps more general transfor-
mation is an essential property of recognition algorithms but is extremely
difficult to achieve. The Lie group approach lends itself particularly well to
the study of this problem, as is shown in the chapter by M. Ferraro.
The topology of digitized images is not obvious; familiar notions such as
adjacency, interior and exterior, and connectedness need to be defined afresh
and there is so far no general consensus of opinion about the best way of
doing this. There is, however, a full but little-known literature on finite
topological spaces and the importance of this subject in image analysis is the
theme of V. A. Kovalevsky in the fourth chapter.
Estimation of a covariance is necessary in many statistical signal process-
ing problems, in one or more dimensions, but this task is often performed
without a proper knowledge of the relevant algebraic formalism. This
involves Jordan algebras, more familiar in quantum mechanics than in the
RICHARD L. FROST
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
II. Review of Single-Stage Quantizers . . . . . . . . . . . . . . 6
A. Single-Stage Scalar Quantizers . . . . . . . . . . . . . . 6
B. A Design Algorithm for Single-Stage Scalar Quantizers . . . 7
C. Single-Stage Vector Quantizers . . . . . . . . . . . . . . 9
D. A Design Algorithm for Single-Stage Vector Quantizers . . . 10
III. Residual Quantizers . . . . . . . . . . . . . . . . . . . . . 11
A. Definition and Notation . . . . . . . . . . . . . . . . . 11
B. The Optimization Problem . . . . . . . . . . . . . . . . . 12
C. Equivalent Quantizers . . . . . . . . . . . . . . . . . . 13
IV. Scalar Residual Quantizers . . . . . . . . . . . . . . . . . 14
A. Optimum Stagewise Quanta . . . . . . . . . . . . . . . . . 15
B. Optimum Stagewise Partitions . . . . . . . . . . . . . . . 19
C. Tree-Structured Stagewise Partitions . . . . . . . . . . . 23
V. Vector Residual Quantizers . . . . . . . . . . . . . . . . . 26
A. Optimum Stagewise Code Vectors . . . . . . . . . . . . . . 26
B. Optimum Stagewise Partitions . . . . . . . . . . . . . . . 28
C. Tree-Structured Stagewise Partitions . . . . . . . . . . . 28
VI. Reflection Symmetric RQ . . . . . . . . . . . . . . . . . . . 30
A. The Reflection Constraint . . . . . . . . . . . . . . . . 32
B. Optimum Reflected Stagewise Code Vectors . . . . . . . . . 35
VII. Experimental Results . . . . . . . . . . . . . . . . . . . . 37
A. A New Design Algorithm for Residual Quantizers . . . . . . 37
B. Synthetic Sources . . . . . . . . . . . . . . . . . . . . 38
C. Exhaustive Search Residual Quantizers . . . . . . . . . . 39
D. Reflected RQ . . . . . . . . . . . . . . . . . . . . . . . 45
VIII. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 51
Appendix: Tables of Rate-Distortion Data . . . . . . . . . . . . 52
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
* This material is based upon work supported by the National Science Foundation under
Grant No. 8909328 and a Centers of Excellence Grant from the State of Utah.
I. INTRODUCTION
¹ The acronym "VQ" is used as an abbreviation for both vector quantizer and vector quantization.
C. F. BARNES AND R. L. FROST
Quantizers whose costs grow exponentially with n are termed non-instrumentable; those
whose costs grow only algebraically with n are instrumentable (Berger, 1971). Instrumentability can
be achieved by imposing structure on the VQ code book so as to simplify the
search procedure and reduce the memory requirements for code book
storage. Once structure has been imposed, it is the task of the designer to
optimize the VQ subject to the imposed structural constraint (Gabor and
Gyorfi, 1986). Unfortunately, the constrained optimization problem is not
always tractable, and one has no choice but to resort to ad hoc design
procedures. In either case, the imposition of structure on the code book will
increase the distortion relative to ESVQ for a given n and R. In this sense,
structured VQs are suboptimal. For a given level of complexity, however, it
is possible that the distortion of the structured VQ may be less than that of
ESVQ. Although complexity is a much more difficult notion to quantify than
is vector dimensionality, it is clearly more relevant to determining the
practical merits of any given quantizer. A note of caution is in order here. Not
infrequently, a “reduced complexity” quantizer will turn out to be more
complex than an ESVQ for the same level of distortion; it is important to
assess the performance of any proposed VQ structure with care. Complexity
should be evaluated while fixing both distortion and rate, and distortion
should be evaluated while fixing complexity and rate.
Examples of structured VQs proposed by researchers include tree-structured
VQ (Buzo et al., 1980; Baker, 1984; Makhoul et al., 1985) and lattice VQ
(Gibson and Sayood, 1988; Conway and Sloane, 1982; Sayood et al., 1984).
Tree-structured VQs (TSVQs) encode each source vector by a tree search of
the possible code vectors. This type of search reduces computation to instrumentable
levels. Memory requirements remain exponential in nR, and are
actually larger than those of an ESVQ for fixed n and R. A TSVQ has the
advantage of being adaptable to many different source distributions, and
usually suffers only relatively small increases in distortion compared with an
ESVQ (Makhoul et al., 1985). Lattice VQs perform well on uniformly
distributed memoryless sources, and their highly structured algebraic
organization makes them instrumentable in terms of both computation and
memory. However, they do not generally perform well on non-uniformly
distributed sources with memory (Gibson and Sayood, 1988).
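To make the computational contrast between full search and tree search concrete, here is a minimal sketch (ours, not from the chapter; the toy tree is built by splitting a sorted code book by index halves, which only stands in for a properly designed TSVQ). It counts distance computations per encoded vector: N for exhaustive search versus two per level of a depth-log2(N) tree.

```python
import numpy as np

rng = np.random.default_rng(0)

def exhaustive_encode(x, codebook):
    """Full search: one distance computation per code vector."""
    d = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(d)), len(codebook)

def tree_encode(x, node):
    """Descend a binary tree: two distance computations per level."""
    tests = 0
    while isinstance(node, tuple):          # internal node: (centroid0, child0, centroid1, child1)
        c0, left, c1, right = node
        tests += 2
        node = left if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else right
    return node, tests                      # leaf holds a code vector index

def build_tree(codebook, idx=None):
    """Toy tree (illustrative only): split the index range in half and use
    subset means as the node test vectors."""
    if idx is None:
        idx = np.arange(len(codebook))
    if len(idx) == 1:
        return int(idx[0])
    half = len(idx) // 2
    lo, hi = idx[:half], idx[half:]
    return (codebook[lo].mean(axis=0), build_tree(codebook, lo),
            codebook[hi].mean(axis=0), build_tree(codebook, hi))

codebook = np.sort(rng.normal(size=(256, 1)), axis=0)   # N = 256 scalar "vectors"
tree = build_tree(codebook)
x = rng.normal(size=1)
j_full, full_cost = exhaustive_encode(x, codebook)
j_tree, tree_cost = tree_encode(x, tree)
print(full_cost, tree_cost)   # 256 vs 2*log2(256) = 16
```

The tree-searched index need not equal the full-search index, which mirrors the text's point that TSVQ trades a small distortion increase for instrumentable computation.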
Residual quantizers² (RQs) have been proposed to fill the middle ground
between ESVQ and lattice VQ. Earlier literature sometimes refers to RQs as
multiple-stage (Juang and Gray, 1982) or cascaded (Makhoul et al., 1985)
VQs. An RQ is organized as a sequence of ESVQ stages, each stage of which
uses a relatively small code book to encode the residual error of the preceding
² The acronym "RQ" is used as an abbreviation for both residual quantizer and residual quantization.
RESIDUAL VECTOR QUANTIZERS
stage. This organization is appealing because it appears to induce a tree-
structure on both the VQ encoder and decoder, thereby reducing both
computation and memory relative to the ESVQ. Despite their apparent
economies, RQs have not been widely adopted. Earlier researchers (Makhoul
et al., 1985) reported that RQs with more than two stages did not perform
well compared with ESVQs. Nevertheless, the RQ or some variant continues
to be suggested in the literature (Chen and Bovik, 1990; Chan and Gersho,
1991).
Recently, we undertook a careful study of the RQ (Barnes, 1989; Barnes
and Frost, 1990) to understand its structure and limitations and to determine
under what circumstances, if any, the RQ is a viable alternative to the ESVQ
or lattice VQ. Our study has produced two main results. The first is a
derivation of necessary conditions for the joint optimality of all RQ stagewise
code vectors. The second is the understanding that, despite their multistage
organizations, RQs are not in general effectively searched by a tree-structured
encoder. The combination of suboptimal code vector design and incompat-
ible tree-searching seems to account for the poor results reported by earlier
researchers (Makhoul et af., 1985). However, if the RQ alphabet is exhaus-
tively searched, the RQ distortion can be quite close to that of the ESVQ.
Exhaustive search residual quantizers3 (ESRQsj are the complement of
TSVQs in that they perform well and reduce memory cost relative to an
ESVQ, but they do not guarantee reduced computational costs.
In practice, of course, computational costs are very important and often
dominate concerns for memory costs. Accordingly, in this chapter we suggest
a new approach to reduce encoding complexity in RQs and characterize its
effects on distortion. The work described here does not provide a final answer
to the problem of efficient RQ encoding, but it does clarify the structure of
the problem and suggests other possibilities.
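A back-of-envelope accounting (ours; it counts only stored code vectors and distance computations, ignoring overhead) illustrates the tradeoff just described: a binary RQ slashes code book memory, but exhaustively searching its direct-sum alphabet costs as much computation as an ESVQ.

```python
# Rough cost accounting for the quantizer families discussed above
# (memory = stored code vectors, search = distance computations per input).
n, R = 8, 1                      # vector dimension, rate in bits per sample
N = 2 ** (n * R)                 # equivalent code book size

esvq_memory = N                  # ESVQ stores all 2^{nR} code vectors
esvq_search = N                  # ...and compares the input against each one

P, Np = 8, 2                     # binary RQ: 8 stages, 2 code vectors per stage
rq_memory = P * Np               # stagewise code books only
rq_stagewise_search = P * Np     # sequential (tree-like) stagewise search
esrq_search = Np ** P            # exhaustive search of the direct-sum alphabet

print(esvq_memory, rq_memory, rq_stagewise_search, esrq_search)
```

For n = 8 and R = 1 this gives 256 stored vectors for the ESVQ against 16 for the RQ, while the ESRQ search cost remains 256.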
This chapter is organized as follows: Section II reviews the basic principles
of minimum distortion quantization. Both scalar and vector quantizers are
considered. Section III describes the RQ structure and an alternative RQ
representation used in subsequent analysis, called the equivalent single-stage
quantizer. Section IV considers the optimization of scalar RQs and presents
a derivation of necessary conditions for minimum distortion. The problem of
encoding complexity is also considered, and the difficulties associated with
tree-structured encoders for RQ are described and illustrated. Section V
generalizes the results of Section IV to vector residual quantizers. A modified
³ A note on semantics: It may be more accurate to refer to this structure as an ESVQ with
a direct sum code book (Barnes and Frost, 1990). However, the motivating factor for this study
has been the original residual structure and hence, in this paper, we call this a residual quantizer
with an exhaustive search encoder.
II. REVIEW OF SINGLE-STAGE QUANTIZERS
$$D_{\mathrm{MSE}} = \int (x - Q(x))^2 f_X(x)\,dx = \sum_{j=0}^{N-1} \int_{b_j}^{b_{j+1}} (x - y_j)^2 f_X(x)\,dx. \tag{2}$$
Necessary conditions for a minimum value of D_MSE are obtained by dif-
ferentiating Eq. (2) with respect to the b_j, assuming the y_j are fixed, and by
differentiating with respect to the y_j, assuming that the b_j are held fixed, that
is,

$$\frac{\partial D_{\mathrm{MSE}}}{\partial b_j} = \left[(b_j - y_{j-1})^2 - (b_j - y_j)^2\right] f_X(b_j) = 0, \tag{3}$$

$$\frac{\partial D_{\mathrm{MSE}}}{\partial y_j} = -2 \int_{b_j}^{b_{j+1}} (x - y_j) f_X(x)\,dx = 0. \tag{4}$$

The solution of Eq. (3) implies the partition boundaries must be midway
between adjacent quanta,

$$b_j = \frac{y_{j-1} + y_j}{2}, \tag{5}$$

and that of Eq. (4) implies the quanta must be the centroids of their respective
partition cells,

$$y_j = \frac{\int_{b_j}^{b_{j+1}} x f_X(x)\,dx}{\int_{b_j}^{b_{j+1}} f_X(x)\,dx}. \tag{6}$$
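As a quick worked example (added here, not from the original text): for a source uniform on [0, 1], the midpoint and centroid conditions are jointly satisfied by the uniform quantizer, and the resulting distortion is the familiar step-size-squared-over-twelve value.

```latex
% Uniform source on [0,1]: f_X(x) = 1. The centroid of each cell is its
% midpoint, so the midpoint and centroid conditions hold simultaneously for
\[
  b_j = \frac{j}{N}, \qquad y_j = \frac{2j+1}{2N}, \qquad j = 0, \dots, N-1,
\]
% with distortion (N identical cells of width 1/N):
\[
  D_{\mathrm{MSE}} = N \int_{0}^{1/N} \Bigl(x - \tfrac{1}{2N}\Bigr)^{2} dx
                   = \frac{1}{12 N^{2}} .
\]
```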
These two conditions are known as the Lloyd-Max conditions. They are
necessary conditions for the minimization of Eq. (2), but may characterize
any stationary point of the quantizer distortion function. In the special case
where the source probability density function is log-concave, the quantizer
distortion function has a single stationary point, so these conditions are
sufficient to determine the global minimum (Trushkin, 1982).
The derivation above gives conditions for minimum quantizer distortion
assuming a fixed number of quantizer levels, but it does not account for
coding the quantizer output. The most obvious code is to represent each
quantum level by its index j. In binary notation, the index requires a word of
length ⌈log₂(N)⌉ bits, where ⌈x⌉ denotes the smallest integer at least as large
as x. For example, if N = 8 then in binary notation level 0 would be
coded by 000, level 5 by 101, and so on. It is easy to see that if N is not
a power of two, i.e., N ≠ 2^m for any m ∈ Z⁺ (the positive integers), there is some
inefficiency in this coding scheme. However, even when N is a power of
two, the coding efficiency of this straightforward scheme is optimal only if
p_j = Prob(x ∈ S_j) = N⁻¹ for all j; that is, if each output value y_j is
equiprobable. In general, the minimum possible coding rate in bits per
sample is given by the entropy of the quantizer output, defined as
$$H(y) = -\sum_{j=1}^{N} p_j \log_2(p_j). \tag{7}$$
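The quantizer-output entropy of Eq. (7) is easy to evaluate numerically; a minimal sketch (ours) shows that the fixed-length index code is optimal exactly when the output levels are equiprobable, and leaves coding gain on the table otherwise.

```python
import numpy as np

def quantizer_output_entropy(p):
    """Entropy in bits per sample of the quantizer output, Eq. (7)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # terms with p_j = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))

# Equiprobable 8-level output: entropy equals log2(8) = 3 bits, so the
# fixed-length 3-bit index code is already optimal.
H_uniform = quantizer_output_entropy([1/8] * 8)

# Skewed output probabilities: entropy drops below 3 bits, so an entropy
# code can beat the fixed-length code on average.
H_skewed = quantizer_output_entropy([0.5, 0.2, 0.1, 0.05, 0.05, 0.05, 0.03, 0.02])
print(H_uniform, H_skewed)
```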
If the p, are known, an entropy coding scheme can then be used to minimize
the average number of bits per sample required to represent the quantizer
output. Commonly used entropy coding schemes include Huffman coding
(Gallager, 1968), Lempel-Ziv coding (Welch, 1984), and arithmetic coding
(Langdon, 1984). Entropy coding schemes typically use code words of
varying lengths. Variable length codes are very sensitive to corruption by
noise, since changing a bit in the code may cause the decoder to become
confused as to the length of the corrupted code word and all succeeding code
words. Also, variable length codes create the possibility of serious data loss
(buffer overflow) or inefficient channel use (buffer underflow) when the
variable rate code is transmitted on a fixed-rate channel. These problems can
be managed but the gains in coding efficiency are sometimes outweighed by
increases in system complexity. The interested reader can explore these
problems further in Farvardin and Modestino (1984); Farvardin and
Modestino (1986), and in Jayant and Noll (1984). The present discussion
considers only the case of fixed-rate codes.
Closed-form solutions that satisfy both Eqs. (5) and (6) simultaneously are
not usually available. Solutions are obtained iteratively, typically through the
use of an algorithm suggested by Lloyd, which he called Method I. This
algorithm is initialized by some arbitrary placement of the {y_j}. Holding the
{y_j} fixed, the algorithm computes optimal {b_j}, which satisfy Eq. (5). Then
the {b_j} are held fixed, and new {y_j} are computed according to Eq. (6). This
process is repeated many times, alternating between the two optimizations.
Since for each minimization the distortion is non-increasing, and since the
overall distortion is bounded below by zero, the algorithm is guaranteed to
converge monotonically to a solution satisfying both Eqs. (5) and (6).
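The alternation just described can be sketched on an empirical source (a sketch of ours; it uses sample means in place of the density integrals, so it is really the empirical variant of Method I):

```python
import numpy as np

def lloyd_method_i(samples, n_levels, iters=50):
    """Lloyd's Method I on an empirical source: alternate between the
    midpoint condition (5) for boundaries and the centroid condition (6)
    for quanta; distortion is non-increasing at every step."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.linspace(x[0], x[-1], n_levels)    # arbitrary initial quanta
    for _ in range(iters):
        b = (y[:-1] + y[1:]) / 2.0            # Eq. (5): boundaries midway between quanta
        cells = np.digitize(x, b)             # assign each sample to a cell
        for j in range(n_levels):             # Eq. (6): quanta at cell centroids
            members = x[cells == j]
            if members.size:
                y[j] = members.mean()
    d = np.min((x[:, None] - y[None, :]) ** 2, axis=1).mean()
    return y, d

rng = np.random.default_rng(1)
samples = rng.normal(size=20000)
y4, d4 = lloyd_method_i(samples, 4)           # 4-level quantizer for a Gaussian source
```

For a unit Gaussian the converged 4-level distortion should approach the known Lloyd-Max value of roughly 0.1175.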
The basic development of Lloyd and Max just reviewed was generalized by
Linde et al. (1980) to include the vector case. They also explored the use of
distortion measures more general than squared error. Their work is reviewed
in this section.
An n-dimensional single-stage vector quantizer of a random vector⁴ X with
probability distribution function F_X(·) is a direct generalization of the scalar
quantizer described above, and consists of the following: 1) a finite indexed
subset A = {y₀, y₁, …, y_{N−1}} of ℝⁿ, called a code book, where each y_j ∈ A
is a code vector; 2) a partition 𝒮 = {S₀, S₁, …, S_{N−1}} of ℝⁿ, where the
equivalence classes or cells S_j of 𝒮 satisfy ∪_j S_j = ℝⁿ and S_j ∩ S_k = ∅ for
j ≠ k; and 3) a quantizer mapping Q: ℝⁿ → A that defines the relationship
between the code book and partition as Q(x) = y_j if and only if x ∈ S_j.
Specification of the triple (A, Q, 𝒮) determines a vector quantizer.
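The triple (A, Q, 𝒮) can be made concrete in a few lines (our sketch, with squared-error distance, under which the cells S_j are the nearest-neighbor regions of the code vectors):

```python
import numpy as np

def vq_encode(x, codebook):
    """Q(x) = y_j iff x lies in S_j; with squared error, S_j is the
    nearest-neighbor cell of code vector y_j."""
    d = np.sum((codebook - x) ** 2, axis=1)   # distance to every code vector
    return int(np.argmin(d))

def vq_decode(j, codebook):
    return codebook[j]

# code book A with N = 4 code vectors in n = 2 dimensions
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
j = vq_encode(np.array([0.9, 0.1]), A)
x_hat = vq_decode(j, A)
print(j)   # nearest code vector is [1, 0], i.e., index 1
```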
Analogous to Eqs. (5) and (6), necessary conditions for minimum distortion
of single-stage vector quantizers are that the y_j ∈ A be centroids of their
respective partition cells, y_j = E{X | X ∈ S_j}.
⁴ Bold fonts are used for vectors, normal fonts for scalars.
(Figure: block diagram of a quantizer system — source, encoder, channel; only fragments survived digitization.)
III. RESIDUAL QUANTIZERS
In this section, RQs are defined and issues relevant to their optimization are
discussed. A direct application of the Lloyd-Max analysis to the stagewise
code vectors is not straightforward and probably accounts for the delay
between the introduction of RQ by Juang and Gray (1982) and the analysis
by Barnes (1989). The key to optimizing multistage quantizers is a description
of the RQ in terms of an equivalent single-stage quantizer. A careful notation
is required to maintain consistency between the two descriptions. This
chapter introduces the use of the nested labeling notation used by Forney
(1988) to describe coset decompositions of lattices. Once the notation is
established, the optimization of the single-stage quantizer follows directly
from the approach of Lloyd (1957) and Max (1960), reviewed in earlier
sections.
FIGURE 3. Block diagram of a residual quantizer decoder.
where x'+' is both the residual error of the last stage and the total residual
error of all stages.
In practice, each Q^p(·) is realized as a composition of an encoder mapping
E^p(·) and a decoder mapping D^p(·), viz., Q^p(x^p) = D^p(E^p(x^p)). The pth encoder
mapping E^p: ℝⁿ → J^p is defined as E^p(x^p) = j^p if and only if x^p ∈ S^p_{j^p}. For
each source vector, the indexes produced by the sequence of encoder maps
are concatenated to form an index P-tuple j^P = (j¹, j², …, j^P).⁵ Each
P-tuple is called a product code word and is an element of the Cartesian
product of the stagewise index sets, j^P ∈ J¹ × J² × ⋯ × J^P. The decoder
maps D^p: J^p → A^p recover from each stagewise index j^p the corresponding
code vector y^p_{j^p}. The quantized representation x̂¹ of the input source vector x¹
is formed by the sum of the selected stagewise code vectors,

$$\hat{x}^1 = \sum_{p=1}^{P} y^p_{j^p}. \tag{11}$$
Let d(x¹, x̂¹) denote the distortion that results from representing x¹ with x̂¹,
and let D(x¹, x̂¹) = E{d(x¹, x̂¹)} denote the expected distortion, where E(·)
is the expectation operator. A P-stage residual quantizer is said to be optimal

⁵ The concatenated P-tuple index j^P, written in boldface, is not to be confused with the
Pth-stage index j^P.
for the source distribution F_{X¹}(·) if it gives at least a locally minimal value of
the average distortion. An optimal RQ has a set of code books {A^p} and a
set of partitions {𝒮^p} which together minimize

$$D(x^1, \hat{x}^1) = \sum_{p=1}^{P} \int d\,[x^1, Q^p(x^p)]\; dF_{X^1 \cdots X^P}. \tag{12}$$
C. Equivalent Quantizers
Each y^e(j^P) ∈ A^e also represents a path through a tree structure that may be
associated with the residual quantizer. Figure 4 represents the code vector
tree of a scalar, three stage, binary (two code vectors per stage) RQ. The root
node of the tree represents the origin of the coordinate system. The leaf nodes
represent the set of equivalent quanta. The intermediate nodes represent
FIGURE 4. Example of an unentangled three stage, two quanta per stage, scalar residual quantizer.
𝒮^e is simply the collection of all equivalent cells. Similarly, the equivalent
mapping Q^e: ℝⁿ → A^e is defined as Q^e(x¹) = y^e(j^P) if and only if x¹ ∈ S^e(j^P).
The average distortion of the equivalent quantizer is

$$D(x^1, \hat{x}^1) = \int d\,[x^1, Q^e(x^1)]\; dF_{X^1}, \tag{15}$$

and is given in terms of the known source distribution function F_{X¹}(·).
It follows that the expected distortion of the RQ, given by Eq. (12), and that
of its equivalent quantizer, given by Eq. (15), must be equal. However, the
latter expression is much easier to minimize.
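The equivalent code book A^e can be enumerated directly: it is the direct sum of the stagewise code books, one code vector chosen per stage. A small sketch (ours):

```python
import numpy as np
from itertools import product

def equivalent_codebook(codebooks):
    """A^e: every sum of one code vector per stage, i.e., the direct sum
    of the stagewise code books."""
    vecs = [sum(choice) for choice in product(*[list(A) for A in codebooks])]
    return np.array(vecs)

# two binary scalar stages (values are ours) -> 2 * 2 = 4 equivalent quanta
A1 = np.array([[-1.0], [1.0]])
A2 = np.array([[-0.25], [0.25]])
A_e = equivalent_codebook([A1, A2])
print(A_e.ravel())   # 4 equivalent quanta from only 2 + 2 stored vectors
```

With P stages of N vectors each, the RQ stores only PN vectors but spans N^P equivalent quanta, which is exactly the memory economy discussed earlier.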
IV. SCALAR RESIDUAL QUANTIZERS
We now consider the optimization of scalar RQs under the mean square
metric. Let X¹ be a real random variable with probability density function
f_{X¹}(·). The RQ design problem is to choose a set of code books {A^p} and
FIGURE 5. The equivalence class 𝒳¹₀ (indicated by thick lines) and 𝒳¹₁ (indicated by thin lines) in an unentangled tree.
partitions {𝒮^p} that together give at least a locally minimum value of

$$D_{\mathrm{MSE}} = \sum_{\text{all } j^P} \int_{S^e(j^P)} [x^1 - y^e(j^P)]^2 f_{X^1}(x^1)\, dx^1. \tag{18}$$
In addition to assuming a fixed 𝒮^e, assume that all code books except for A^ρ,
with ρ ∈ {1, …, P}, are held fixed. To minimize D_MSE with respect to the
(k^ρ)th code vector in A^ρ, set the partial derivative of Eq. (18) with respect to
y^ρ_{kρ} equal to zero to get

$$\sum_{j^P \in H^\rho_{k^\rho}} \int_{S^e(j^P)} [x^1 - y^e(j^P)]\, f_{X^1}(x^1)\, dx^1 = 0, \tag{19}$$

where H^ρ_{kρ} is the set of all j^P such that the ρth element of j^P = (j¹, …,
j^ρ, …, j^P) is equal to k^ρ, i.e., H^ρ_{kρ} = {j^P : j^ρ = k^ρ}. The set of P-tuple
indexes H^ρ_{kρ} corresponds to all equivalent quanta y^e(j^P) that contain y^ρ_{kρ} in
their construction. Corresponding to H^ρ_{kρ} is a subset of the real line,
𝒳^ρ_{kρ} = ∪_{j^P ∈ H^ρ_{kρ}} S^e(j^P). For example, Fig. 5 has emphasized with heavy lines
the equivalence class 𝒳¹₀, the union of all equivalent cells S^e(j^P) whose
corresponding y^e(j^P) use y¹₀ in their construction.
Substituting Eq. (20) into Eq. (19) and solving for y^ρ_{kρ} gives the desired
result

$$y^\rho_{k^\rho} = \frac{\displaystyle\sum_{j^P \in H^\rho_{k^\rho}} \int_{S^e(j^P)} \Bigl(x^1 - \sum_{\substack{p=1 \\ p \neq \rho}}^{P} y^p_{j^p}\Bigr) f_{X^1}(x^1)\, dx^1}{\displaystyle\sum_{j^P \in H^\rho_{k^\rho}} \int_{S^e(j^P)} f_{X^1}(x^1)\, dx^1}. \tag{21}$$

The sum

$$\sum_{\substack{p=1 \\ p \neq \rho}}^{P} y^p_{j^p} \tag{22}$$

contained in Eq. (21) differs from the construction of the (j^P)th equivalent
quantum y^e(j^P) in that the ρth node of the (j^P)th path through the RQ tree
is removed. Since ρ ∈ {1, 2, …, P}, the removed node is not necessarily at the
end of the path through the RQ tree and hence its removal is not a simple "pruning" of
the tree. Instead, Eq. (22) corresponds to the (j^P)th path in the RQ tree with
the ρth node removed and the two remaining portions of the path "grafted"
back together. If we define for each possible y^e(j^P) the corresponding ρth
grafted branch as

$$g^\rho(j^P) = \sum_{\substack{p=1 \\ p \neq \rho}}^{P} y^p_{j^p},$$

then

$$\xi^\rho = x^1 - g^\rho(j^P). \tag{27}$$
Since ξ^ρ is the residual that results when the corresponding ρth grafted
branch is subtracted from x¹, ξ^ρ is called the ρth graft residual. Because it is
a translation of the realization x¹ of the random variable X¹, ξ^ρ is also a
realization of a random variable Ξ^ρ with associated pdf f_{Ξ^ρ}(·). The optimal
value of y^ρ_{kρ} will now be shown to be a certain conditional mean of Ξ^ρ.
Notice that the map from the x¹ ∈ ℝ to the ξ^ρ ∈ ℝ is many-to-one and into.
That is, for any given ξ^ρ there may be many different values of x¹ (including
no value), each in a different S^e(j^P), that yield the same value of ξ^ρ. To
account properly for the effects of this many-to-one map in the follow-
ing development, define the ρth graft residual cell G^ρ(j^P) = {ξ^ρ : ξ^ρ =
x¹ − g^ρ(j^P), x¹ ∈ S^e(j^P)}. G^ρ(j^P) contains all graft residuals ξ^ρ formed from
the x¹ ∈ S^e(j^P). Associate with each G^ρ(j^P) an indicator function
$$1_{G^\rho(j^P)}(\xi^\rho) = \begin{cases} 1, & \xi^\rho \in G^\rho(j^P), \\ 0, & \text{otherwise}. \end{cases}$$
The form of Eq. (29) can be simplified by equating the common sum in the
numerator and denominator to the pdf of Ξ^ρ, conditioned upon the use of y^ρ_{kρ}.
Proceeding, expand the pdf f_{X¹}[g^ρ(j^P) + ξ^ρ] as a sum of conditional pdfs.
Now, substitute Eq. (31) into the sum common to the numerator and
denominator of Eq. (29) and reduce the resulting double sum to obtain

$$y^\rho_{k^\rho} = \int_{\mathbb{R}} \xi^\rho\, f_{\Xi^\rho}(\xi^\rho \mid x^\rho \in S^\rho_{k^\rho})\, d\xi^\rho, \tag{35}$$

which is valid for 1 ≤ ρ ≤ P and 0 ≤ k^ρ ≤ N^ρ − 1. Thus, assuming fixed 𝒮^ρ and
A^p with p ≠ ρ, each optimal quantum y^ρ_{kρ} is the conditional mean of the ρth
graft residual Ξ^ρ given x^ρ ∈ S^ρ_{kρ}.
The preceding development may be made clearer with the aid of an
illustration. Figure 6 illustrates the relationships between 𝒳²₀, H²₀, and y²₀. On
the left the two branches of the tree representing the stagewise quantum y²₀ are
marked by heavy lines. Also marked in heavy lines are the S^e(j^P) in the
equivalence class 𝒳²₀ associated with y²₀. On the right, the S^e(j^P) of H²₀ have
been translated by the appropriate grafted branches g²(j^P) of the corre-
sponding grafted tree structure (also illustrated). After translation, the
portions of f_{X¹}(·) that have support on these constituent S^e(j^P) are summed
and normalized in accordance with Eq. (34). The optimal
value of y²₀ is the mean
FIGURE 6. The equivalence class 𝒳²₀ before and after translation of the constituent S^e by the appropriate graft residuals.
No partition 𝒮^e can yield a lower average distortion than the partition that
maps each x¹ into the y^e(j^P) ∈ A^e that minimizes d(x¹, ·). Hence, the optimal
partition is defined by

x¹ ∈ S^e(j^P) if and only if d(x¹, y^e(j^P)) ≤ d(x¹, y^e(k^P)) for all k^P, (37)

where any arbitrary rule may be used in the event of a tie. This is the
unsurprising result that the best 𝒮^e for a fixed A^e is a nearest-neighbor or
Voronoi partition (Gersho, 1979).
To answer the second question, we wish to describe a sequence of stagewise
partitions {𝒮¹, 𝒮², …, 𝒮^P} that will realize the optimal equivalent partition
𝒮^e described by Eq. (37). Proceeding, first extend the definition of the
distance metric as

$$d(x, A) = \min_{y \in A} d(x, y), \tag{38}$$

so that d(·, ·) can be used to indicate the distance between a point and a set.
Equation (37) can then be written

x¹ ∈ S^e(j^P) if and only if d(x¹, y¹_{j¹} + y²_{j²} + ⋯ + y^P_{j^P}) ≤ d(x¹, A¹ + A² + ⋯ + A^P). (39)

Assuming that x¹ ∈ S^e(j^P), it is clear that optimal first stage partition cells S¹_{j¹}
must satisfy

x¹ ∈ S¹_{j¹} if and only if d(x¹, y¹_{j¹} + A² + ⋯ + A^P) ≤ d(x¹, A¹ + A² + ⋯ + A^P). (40)

In other words, Eq. (40) requires S¹_{j¹} to be the subset of ℝ that is nearest
FIGURE 7. The translations of the subtrees of A¹ + A² + A³ to form the smaller tree A² + A³.
neighbor with respect to the terminating nodes of the subtree that originates
at y¹_{j¹}.
A similar construction can be repeated to form the optimal equivalence
classes S^p_{j^p} of the remaining stages. For example, when the residual x² is
formed by the difference x² = x¹ − Q¹(x¹), the code vector tree
A¹ + A² + ⋯ + A^P is modified by subtracting the first non-zero component
of each path to yield a smaller subtree A² + A³ + ⋯ + A^P. The formation of
this smaller tree is illustrated by Figs. 7 and 8. In Fig. 7, the difference
x² = x¹ − Q¹(x¹) causes a translation of each of the subtrees corresponding
to nodes in the first nonzero layer of A¹ + A² + A³. In Fig. 8, each of the
translated subtrees of Fig. 7 superimposes to form the smaller tree A² + A³,
where the root node of the smaller tree occurs at the origin of the residual
x². Assuming d(·, ·) is translation invariant in that d(x, y) = d(x − z, y − z)
for x, y, z ∈ ℝ, the procedure used to determine an optimal 𝒮¹ can be
recursively utilized to determine an optimal 𝒮^p for any p ∈ {1, 2, …, P} to
yield the rule

x^p ∈ S^p_{j^p} if and only if d(x^p, y^p_{j^p} + A^{p+1} + ⋯ + A^P) ≤ d(x^p, A^p + A^{p+1} + ⋯ + A^P), (41)

where x^p = x¹ − Σ_{i=1}^{p−1} Q^i(x^i).
In other words, a stagewise partition 𝒮^p is optimal if and only if it
corresponds to a union 𝒳^p_{j^p} of equivalent cells S^e(·) where the S^e(·) ∈ 𝒳^p_{j^p} are
nearest neighbor with respect to the corresponding equivalent quanta. Again,
an illustration will clarify the relationship between optimum S^p_{j^p} and optimum
S^e(j^P). Like Fig. 4, Fig. 9 represents a three-stage scalar residual quantizer.
FIGURE 9. Example of a partially entangled three stage, two quanta per stage, scalar residual quantizer.
However, this tree has crossing branches. Now, compare Figs. 5 and 10. In
both figures, the subset of ℝ indicated by thick lines represents 𝒳¹₀. Because
𝒳¹₀ is also the nearest neighbor set (the union of four nearest neighbor
equivalent quantizer cells) with respect to the subtree that originates at y¹₀, it
is identical to the optimal stagewise partition cell S¹₀. The same is true for the
subset 𝒳¹₁ of ℝ indicated by the thin lines, the quantum y¹₁, and the partition
cell S¹₁.
To answer the third question, observe that the stagewise equivalence class
𝒳¹₀ in Fig. 5 is a connected interval of ℝ and may be distinguished from the
equivalence class 𝒳¹₁ by a single boundary point. In contrast, the equivalence
class 𝒳¹₀ in Fig. 10 is not a single connected interval. It is a union of three
disjoint line segments and requires five boundaries to distinguish it from 𝒳¹₁.
We say that Fig. 5 represents an unentangled tree and Fig. 10 represents an
entangled tree. If the optimal encoding rule Eq. (40) were to be implemented
for the entangled tree by an optimal stagewise encoder, five tests would be
required at the first stage alone to determine whether x¹ were in 𝒳¹₀ or in 𝒳¹₁.
For a completely entangled tree, a single optimal stagewise partition can have
complexity as high as that of 𝒮^e. In general, it seems to be most economical
to implement an optimal 𝒮^e with a single exhaustive search encoder. We call
such RQs exhaustive search residual quantizers (ESRQs).
FIGURE 10. The equivalence class 𝒳¹₀ (indicated by thick lines) and 𝒳¹₁ (indicated by thin lines) in an entangled tree.
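Entanglement can be tested numerically. The sketch below (ours; the stage scale values are toy choices, and a dense grid stands in for the real line) builds the equivalent quanta of a three-stage binary scalar RQ and checks whether the first-stage equivalence class — all points whose nearest equivalent quantum uses y¹₀ — is a single interval.

```python
import numpy as np
from itertools import product

def first_stage_class_is_interval(A1, A2, A3):
    """True iff the first-stage equivalence class X^1_0 (points whose nearest
    equivalent quantum uses y^1_0) is one connected interval of R."""
    paths = list(product(range(2), repeat=3))                  # the 8 tree paths
    quanta = np.array([A1[i] + A2[j] + A3[k] for i, j, k in paths])
    xs = np.linspace(-4, 4, 20001)                             # dense grid standing in for R
    nearest = np.argmin(np.abs(xs[:, None] - quanta[None, :]), axis=1)
    in_class0 = np.array([paths[m][0] == 0 for m in nearest])
    switches = np.count_nonzero(np.diff(in_class0.astype(int)))
    return switches <= 1                                       # one boundary -> unentangled

# coarse-to-fine stage scales: each stage refines the last, no crossing branches
print(first_stage_class_is_interval([-2.0, 2.0], [-1.0, 1.0], [-0.5, 0.5]))
# fine-to-coarse stage scales: branches cross, so the class fragments
print(first_stage_class_is_interval([-0.5, 0.5], [-1.0, 1.0], [-2.0, 2.0]))
```

In the second configuration the equivalent quanta from the two first-stage branches interleave along the line, so 𝒳¹₀ splits into disjoint segments, exactly the entangled situation described above.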
FIGURE 12. SQNR vs. rate performances of various scalar quantizers on the memoryless
Gaussian source.
$$\mathrm{SQNR} = 10 \log_{10} \frac{\int_{-\infty}^{\infty} x^2 f_X(x)\, dx}{D_{\mathrm{MSE}}}\ \mathrm{dB}. \tag{42}$$
FIGURE 13. SQNR vs. rate performances of various scalar quantizers on the memoryless Laplacian source.
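The SQNR of Eq. (42) is straightforward to estimate empirically (our sketch; the 2-bit quanta below are the standard published Lloyd-Max values for the unit Gaussian, quoted approximately):

```python
import numpy as np

def sqnr_db(samples, quanta):
    """Empirical SQNR of Eq. (42): source power over mean squared error, in dB."""
    x = np.asarray(samples, dtype=float)
    q = np.asarray(quanta, dtype=float)
    mse = np.min((x[:, None] - q[None, :]) ** 2, axis=1).mean()
    return 10.0 * np.log10(np.mean(x ** 2) / mse)

rng = np.random.default_rng(2)
x = rng.normal(size=50000)
# approximate 2-bit Lloyd-Max quanta for the unit Gaussian source
sqnr4 = sqnr_db(x, [-1.510, -0.4528, 0.4528, 1.510])
```

For the unit Gaussian this should come out near 9.3 dB, the known 2-bit Lloyd-Max figure.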
V. VECTOR RESIDUAL QUANTIZERS
In this section, the necessary conditions for the optimality of the stagewise
quanta, stagewise partitions, and equivalent partitions developed in Section
IV for scalar RQs are extended to vector RQs. This generalization follows
directly from the work of Linde et al. (1980). However, for vector code books
it will be seen that residual encoding can be inefficient even when the RQ code
vector tree is not entangled. This is demonstrated with some illustrations of
two-dimensional RQ equivalent code books.
The expected distortion may be written as a sum over the equivalent cells,

$$D(x^1, \hat{x}^1) = \sum_{\text{all } j^P} E\{d(x^1, y^e(j^P)) \mid x^1 \in S^e(j^P)\}\, \mathrm{Prob}(x^1 \in S^e(j^P)). \tag{43}$$

Since the distortion measure is position invariant, subtract the ρth grafted
branch g^ρ(j^P) = Σ_{p≠ρ} y^p_{j^p} from both x¹ and y^e(j^P) in Eq. (43) to obtain

$$D(x^1, \hat{x}^1) = \sum_{\text{all } j^P} E\{d(x^1 - g^\rho(j^P),\, y^\rho_{j^\rho}) \mid x^1 \in S^e(j^P)\}\, \mathrm{Prob}(x^1 \in S^e(j^P)). \tag{44}$$
Next, partition the indexes j^P into the familiar subsets H^ρ_{kρ}, for 1 ≤ k^ρ ≤ N^ρ.
Thus, if j^P ∈ H^ρ_{kρ}, then y^ρ_{j^ρ} = y^ρ_{kρ}, and

$$D(x^1, \hat{x}^1) = \sum_{k^\rho \in J^\rho} \sum_{j^P \in H^\rho_{k^\rho}} E\{d(x^1 - g^\rho(j^P),\, y^\rho_{k^\rho}) \mid x^1 \in S^e(j^P)\}\, \mathrm{Prob}(x^1 \in S^e(j^P)). \tag{46}$$

Since S^e(j^P) ∩ S^e(k^P) = ∅ for j^P ≠ k^P,

$$D(x^1, \hat{x}^1) = \sum_{k^\rho \in J^\rho} E\{d(\xi^\rho, y^\rho_{k^\rho}) \mid x^1 \in \mathcal{X}^\rho_{k^\rho}\}\, \mathrm{Prob}(x^1 \in \mathcal{X}^\rho_{k^\rho}). \tag{47}$$
kPe J P kP
By definition, the sets {x¹ : x¹ ∈ 𝒳^ρ_{kρ}} and the sets {x¹ : x^ρ ∈ S^ρ_{kρ}} are identical for
k^ρ = 1, 2, …, N^ρ, where x^ρ = x¹ − Σ_{p=1}^{ρ−1} Q^p(x^p) is the ρth causal residual of
the RQ. If the code vectors y^ρ_{kρ} ∈ A^ρ are allowed to vary while all partitions {𝒮^p} and
all code books {A^p} with p ≠ ρ are held fixed, observe that
$$D(x^1, \hat{x}^1) \geq \sum_{k^\rho \in J^\rho} \inf_{u \in \mathbb{R}^n} E\{d(\xi^\rho, u) \mid x^\rho \in S^\rho_{k^\rho}\}\, \mathrm{Prob}(x^\rho \in S^\rho_{k^\rho}); \tag{50}$$
no other ρth stage code book can yield a smaller average distortion (with the
other RQ stages arbitrarily fixed). In Barnes and Frost (1990), it is proved
that there exist points y^ρ_{kρ} that satisfy Eq. (51) if for all j^P ∈ H^ρ_{kρ} the sets S^e(j^P)
have nonzero measure.
In conclusion, for an RQ to give minimum average distortion the code
vectors y& must satisfy
EEP X P E SfP 1
{ d (TP,Y f P ) I x P E S,q, = min
UP%"
ESP,X P € S f , { d (TP,4 I x p E S,p,1 (52)
for 1 < p < P and 0 < kP < N p . The conditional densityfz,,lx,,Es,, (.) used in
ki'
Eqs. (46-52) is determined from the source density f,, (.) as
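For squared-error distortion, the centroid condition of Eq. (52) reduces to a conditional mean of graft residuals. The following sketch (a hypothetical helper with an invented data layout, not the authors' code) updates one stage's code vectors while all other stages are held fixed:

```python
def graft_residual_centroids(x, assignments, codebooks, p):
    """Update stage p's code vectors to the graft residual centroids
    (squared-error case of the centroid condition); other stages fixed.

    x           : list of training vectors (lists of floats)
    assignments : assignments[q][i] is the stage-q index of vector i
    codebooks   : codebooks[q][k] is the stage-q code vector y^q_k
    """
    n_dim = len(x[0])
    new_cb = []
    for k in range(len(codebooks[p])):
        members = [i for i, j in enumerate(assignments[p]) if j == k]
        if not members:                       # empty cell: keep the old vector
            new_cb.append(list(codebooks[p][k]))
            continue
        centroid = [0.0] * n_dim
        for i in members:
            for d in range(n_dim):
                # graft: contribution of every stage except p
                graft = sum(codebooks[q][assignments[q][i]][d]
                            for q in range(len(codebooks)) if q != p)
                centroid[d] += (x[i][d] - graft) / len(members)
        new_cb.append(centroid)
    return new_cb
```

When the training vectors in each cell already scatter symmetrically about the current code vector, the update leaves the code book unchanged, illustrating the fixed-point character of the condition.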
28 C. F. BARNES AND R. L. FROST
The structure of the optimal equivalent and the optimal stagewise partitions
for a vector RQ follows directly from the results obtained for the scalar case.
For completeness the definitions are repeated here with appropriate
modifications to the notation. The optimal equivalent partition is defined by
    x^1 \in S^e(j^P) \ \text{if and only if} \ d(x^1, y^e(j^P)) \le d(x^1, y^e(k^P)) \ \text{for all } k^P,   (54)

where some arbitrary rule is used in the event of a tie. Likewise, the optimal
stagewise partitions are given by

    x^p \in S^p_{k_p} \ \text{if and only if} \ d(x^p,\, y^p_{k_p} + A^{p+1} + \cdots + A^P) \le d(x^p,\, A^p + A^{p+1} + \cdots + A^P),   (55)

where d(x, A) = \min_{a \in A} d(x, a) and A^p + A^{p+1} + \cdots + A^P denotes the direct
sum of the remaining stage code books.
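The difference between encoding with the full equivalent (direct-sum) code book, as in Eq. (54), and a purely greedy stagewise residual search can be made concrete with a toy scalar example (hypothetical functions, squared-error distortion, not the authors' code):

```python
from itertools import product

def equivalent_encode(x, codebooks):
    """Exhaustive search over the direct-sum (equivalent) code book."""
    return min(product(*[range(len(c)) for c in codebooks]),
               key=lambda idx: (x - sum(codebooks[p][j]
                                        for p, j in enumerate(idx))) ** 2)

def residual_encode(x, codebooks):
    """Greedy stagewise residual search; can be suboptimal when the
    code vector tree is entangled."""
    idx, r = [], x
    for cb in codebooks:
        j = min(range(len(cb)), key=lambda k: (r - cb[k]) ** 2)
        idx.append(j)
        r -= cb[j]
    return tuple(idx)
```

With the second (entangled) code book below, the greedy search commits to the wrong first-stage vector and ends 1.1 from the source, while the exhaustive equivalent search finds a reconstruction only 0.9 away.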
FIGURE 14. Code vector constellation of an ESVQ for the memoryless Gaussian source.
FIGURE 15. Equivalent code vector constellation of an ESRQ with eight binary stages for
the memoryless Gaussian source.
VI. REFLECTION SYMMETRIC RQ
Constraints can be imposed on the stagewise code books so that the
nearest neighbor stagewise equivalence classes are both simply connected and
convex. Perhaps the simplest constraint is to require each stagewise code
vector to be orthogonal to every stagewise code vector at every other stage.
A stagewise orthogonal RQ was recently suggested (Chen and Bovik,
1990) for image quantization. In this quantizer, the first level code vectors are
amplitude vectors chosen to represent the average brightness across the
source vector. This reduces the search for the first level vectors to a scalar
distance computation. The second level vectors can then be forced to have
zero mean, and to represent only deviations from the mean vector. Imagery
appears to lend itself to this RQ structure, and the authors report good results
at very modest complexity.
Other constraints are also possible, limited only by the ingenuity of the
researcher. In this section, we consider a constraint for binary RQs that
forces a reflection symmetry between the stagewise code vectors. Although
optimality conditions are derived for the stagewise code vectors under the
reflection constraint, they cannot always be satisfied and still maintain the
FIGURE 17. First stage code vectors, equivalent code vectors, and equivalent partition
boundaries of a two-stage rRQ.
desired connectivity and convexity of the stagewise partitions. Nevertheless,
this new structure is instrumentable, and permits the construction of very
large equivalent code books with very large vector dimensions at achievable
levels of complexity. For sources with memory, such as natural imagery, it
gives significantly lower distortion than does conventional RQ. As such, it
may be particularly useful for VQ applications at high rates.
Assume there are only two code vectors \{y_0^p, y_1^p\} at each of the RQ stages.
Consider the perpendicular bisecting hyperplane halfway between y_0^p and y_1^p.
To encourage convex optimal stagewise partition cells, we require that each
equivalent code vector on one side of the hyperplane have a “reflection”
equidistant from and on the opposite side of this hyperplane boundary. That
is, if the hyperplane is imagined to be a mirror, then each equivalent code
vector that originates from y_0^p on one side of this mirror must have a reflection
originating from y_1^p on the other side of the mirror. If this condition can be
satisfied, then the simple hyperplane boundary will describe optimal
stagewise partition cells. Figure 17 illustrates a two-dimensional, two-stage
RQ with the desired symmetry.
This reflection symmetry can be described as follows. Given two code
vectors \{y_0^p, y_1^p\} at the pth stage, the point midway between the code vectors
is given by

    m^p = \tfrac{1}{2}(y_0^p + y_1^p).   (56)

The point m^p lies on the nearest-neighbor boundary between the two code
vectors y_0^p and y_1^p. The difference vector n^p = y_1^p - y_0^p is normal to this
nearest-neighbor hyperplane boundary. The unit normal vector is

    \hat{n}^p = n^p / \|n^p\|,   (57)

where \|\cdot\| is the Euclidean norm. The plane through m^p perpendicular to the
normal vector \hat{n}^p is the desired boundary, and is described by

    \hat{n}^p \cdot (m^p - u^p) = 0,   (58)

where u^p is any point in the plane. The smallest distance \delta between any point
x^p and the perpendicular bisecting hyperplane is given by

    \delta = |\hat{n}^p \cdot (m^p - x^p)|.   (59)

Define the reflected vector \tilde{x}^p and, with a slight abuse of notation, the
forward reflection operator \mathscr{R}_{j^p}(\cdot) at the pth stage as

    \tilde{x}^p = \mathscr{R}_{j^p}(x^p) = \begin{cases} x^p & \text{if } j^p = 0 \Leftrightarrow x^p \in S_0^p \\ x^p - 2\delta\hat{n}^p & \text{if } j^p = 1 \Leftrightarrow x^p \in S_1^p. \end{cases}   (60)

By convention, the forward reflection operator \mathscr{R}_{j^p}(\cdot) reflects points in S_1^p to S_0^p.
The partially reconstructed equivalent code vectors are then generated
recursively by

    \tilde{g}^p(j^p, j^{p+1}, \ldots, j^P) = \bar{\mathscr{R}}_{j^p}(\tilde{g}^{p+1}),   (63)

where the inverse reflected residual operator is defined as

    \bar{\mathscr{R}}_{j^p}(\tilde{g}^{p+1}) = \mathscr{R}^{-1}_{j^p}(\tilde{g}^{p+1}) + y^p_{j^p}.   (64)

It follows that the final reconstruction \hat{x}^1 of the source vector x^1 is given by
the resulting composition

    \hat{x}^1 = \tilde{g}^1(j^1, j^2, \ldots, j^P) = \bar{\mathscr{R}}_{j^1}(\bar{\mathscr{R}}_{j^2}(\cdots \bar{\mathscr{R}}_{j^P}(\tilde{g}^{P+1}) \cdots)),   (65)

where \tilde{g}^{P+1} = 0, the zero vector. Clearly, \tilde{g}^1(j^1, j^2, \ldots, j^P) is analogous to
y^e(j^P) defined for conventional RQ.
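A minimal sketch of the forward reflection across the perpendicular bisector of a two-vector stage, as in Eq. (60) (hypothetical helper; assumes Euclidean geometry and the convention that the unit normal points from y_0^p toward y_1^p):

```python
def reflect(x, y0, y1):
    """Forward reflection at a binary rRQ stage (sketch, not the authors'
    code): points on the y1 side of the bisecting hyperplane are mirrored
    to the y0 side; points already on the y0 side pass through unchanged.
    Returns (reflected point, stage index j)."""
    n = [b - a for a, b in zip(y0, y1)]                  # normal vector n^p
    norm = sum(c * c for c in n) ** 0.5
    nh = [c / norm for c in n]                           # unit normal
    m = [(a + b) / 2 for a, b in zip(y0, y1)]            # midpoint m^p
    # signed distance to the hyperplane (positive on the y1 side)
    delta = sum(nh_i * (x_i - m_i) for nh_i, x_i, m_i in zip(nh, x, m))
    if delta <= 0:                                       # already in S_0
        return list(x), 0
    return [x_i - 2 * delta * nh_i for x_i, nh_i in zip(x, nh)], 1
```

Applying the operator twice leaves a point fixed, since the reflected image already lies on the y_0 side of the mirror.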
A visual aid for illustrating the structure of the partially reconstructed code
vectors \tilde{g}^p(j^p, j^{p+1}, \ldots, j^P) given by

    \tilde{g}^p(j^p, \ldots, j^P) = \bar{\mathscr{R}}_{j^p}(\bar{\mathscr{R}}_{j^{p+1}}(\cdots \bar{\mathscr{R}}_{j^P}(0) \cdots))   (66)

can be constructed by folding a sheet of paper over onto itself P times for a
P-stage coder. Punch one hole that passes through each fold in the paper. The
hole represents \tilde{g}^{P+1}. Undo the first fold to form the two partially reconstructed
equivalent code vectors \tilde{g}^P(j^P). The crease represents the stagewise boundary
between S_0^P and S_1^P. Now translate the pattern by y^{P-1} and undo the second
fold to observe the four code vectors \tilde{g}^{P-1}(j^{P-1}, j^P) and the three partially
reconstructed equivalent boundaries. There are in general N^p = 2^{P+1-p} such
partially reconstructed code vectors, constructed according to Eq. (66) from
all possible (P + 1 - p)-tuples (j^p, \ldots, j^P) \in \{J^p \times \cdots \times J^P\}. There are also
2^{P+1-p} - 1 hyperplanes that determine the individual equivalent partition
cell boundaries at the pth stage. Continue to unfold the paper to reconstruct
the entire equivalent code vector constellation and all equivalent boundaries.
Because reflection in two dimensions is equivalent to folding, this represents
the equivalent code book of a two-dimensional reflected RQ (rRQ). This
visual aid suggests that rRQ might also be called “origami” RQ.
Note that rRQ requires somewhat more computation than residual
encoded RQ because of the need to reflect the residual vectors x^p at the
encoder and to unreflect the partial reconstructions \tilde{g}^p(\cdot) at the decoder. It
now remains to derive optimality conditions on the stagewise code vectors.
Since the distortion measure d(x, y) is translation invariant, and since reflec-
tion is distance preserving, we may rewrite the distortion expansion of Eq. (67)
by applying a sequence of forward reflection operators to both x^1 and \tilde{g}^1(j^P)
inside each conditional expectation, leaving each weight Prob(x^1 \in S^e(j^P))
unchanged.
The form of Eq. (71) is identical to that of Eq. (46). It follows directly that
to minimize the expected distortion in quantizing x^1 with an rRQ, the
stagewise code vectors must satisfy

    E\{d(\tilde{x}^p, y^p_{k_p}) \mid \tilde{x}^p \in \tilde{S}^p_{k_p}\} = \min_{u \in \mathbb{R}^n} E\{d(\tilde{x}^p, u) \mid \tilde{x}^p \in \tilde{S}^p_{k_p}\}   (72)

for 1 \le p \le P and 1 \le k_p \le N_p. This result is analogous to Eq. (52), but
differs in that, if for the origami code the reflection boundary at the pth stage
is assumed fixed, there is only one independent code vector y^p to optimize.
Alternatively, the reflection boundary may also be iteratively improved
during the decoder optimization step of the design procedure. That is, instead
of finding one graft centroid for \tilde{g}^p, two graft residual centroids can be
calculated, one each for S_0^p and S_1^p, from the graft residuals

    t^p = x^p - \sum_{\rho = p+1}^{P} y^\rho_{j^\rho} \quad \text{if } x^1 \in S^e(j^P),   (73)

and then the corresponding hyperplane boundary recomputed midway
between the two centroids.
FIGURE 18. Equivalent code vector constellation of an rRQ with eight binary stages for the
memoryless Gaussian source.
The equivalent code vectors are much more spread out than those shown in
Fig. 16, so it is not surprising that the SQNR is more than 4 dB better than
conventional unoptimized RQ even though both use a tree-structured encoder.
In fact, the reflection constraint has cost only 0.6 dB in SQNR when compared
with an optimized ESRQ.
VII. EXPERIMENTAL RESULTS
Both the Lloyd Method I and the LBG algorithm can be interpreted as
iterated design procedures where finding centroids of fixed partition cells is
analogous to optimizing the decoder for a fixed encoder; and finding a new
nearest-neighbor partition with respect to a fixed set of quanta is analogous
to optimizing the encoder for a fixed decoder. When repeated application of
these optimization steps leaves the quanta and partitions unchanged, the
quantizer satisfies a fixed point condition.
The basic philosophy of this design approach can be used to design jointly
optimal residual quantizers. The difference, however, between the Lloyd and
LBG algorithms for single-stage quantizers and a similar algorithm for
multistage residual quantizers is that there must be two interlaced iterative
fixed-point procedures: one for optimization of the encoder/decoder pair,
and another to satisfy the graft residual centroid condition simultaneously
among all RQ stages. In the second iterative procedure, each RQ stage is
optimized while holding the code books of all other stages fixed. The new
code vectors of an optimized stage satisfy the necessary graft residual
centroid conditions with respect to the fixed code books of the other stages.
This procedure is then repeated for a different stage. However, optimizing
the code vectors of a different stage causes the first stage that was
optimized to no longer satisfy the graft residual centroid condition. It is
eventually necessary to return to all stages and repeat the process in “round
robin” fashion. Since the changes made to the code books of each stage can
only decrease or leave unchanged the average distortion of the RQ (assuming
a constant fixed partition), this iterative procedure converges to a fixed point.
After this fixed point has been reached, a new encoder/decoder iteration is
performed (a new partition is selected) and the entire process is repeated until
both fixed-point conditions are simultaneously satisfied. This is the method
used to design the jointly optimal residual quantizers tested in this section.
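The two interlaced fixed-point procedures can be sketched compactly (squared error, scalar training data, exhaustive-search encoder; invented function names and a drastically simplified loop, not the authors' implementation):

```python
def design_rq(samples, codebooks, outer_iters=10, inner_iters=10):
    """Jointly optimize a residual quantizer: an outer encoder/decoder
    iteration wrapped around an inner round-robin graft-centroid pass."""
    from itertools import product

    def encode(x):
        # exhaustive search over the equivalent (direct-sum) code book
        return min(product(*[range(len(c)) for c in codebooks]),
                   key=lambda idx: (x - sum(codebooks[p][j]
                                            for p, j in enumerate(idx))) ** 2)

    for _ in range(outer_iters):
        # encoder step: new nearest-neighbor partition, decoder fixed
        assign = [encode(x) for x in samples]
        # decoder step: round-robin graft residual centroids, partition fixed
        for _ in range(inner_iters):
            for p in range(len(codebooks)):
                for k in range(len(codebooks[p])):
                    cell = [x - sum(codebooks[q][idx[q]]
                                    for q in range(len(codebooks)) if q != p)
                            for x, idx in zip(samples, assign) if idx[p] == k]
                    if cell:
                        codebooks[p][k] = sum(cell) / len(cell)
    return codebooks
```

Because each inner update can only decrease (or leave unchanged) the distortion for the fixed partition, the inner loop converges, mirroring the argument in the text.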
B. Synthetic Sources
The various RQ simulation results reported here have the following charac-
teristics in common: the training sets contained 500,000 vectors, since under
these conditions we found that the simulation results using in-training-set
data varied negligibly from the results obtained using out-of-training-set
data. Since the equivalent code book sizes in these experiments varied from
2 to 256 code vectors, the corresponding training set size on a per equivalent
code vector basis ranges from 250,000 down to about 1,950 training set vectors
per equivalent code vector.
Each of the different RQ designs was tested with the number of stages
varying from two to eight. The code book sizes for the RQs were
divided as equally as possible among the stages. If an equal number of code
vectors could not be allocated to each stage for a given n and R, then the first
few stages were assigned the larger code book sizes. All stopping thresholds
used during the design process for relative changes in distortion were set to
0.0005. The splitting algorithm of Linde, Buzo, and Gray (1980) was used to
seed the initial code books.
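The stage-size allocation rule just described can be sketched as follows (hypothetical helper; assumes the total rate n*R is an integer number of bits and power-of-two stage code books):

```python
def stage_sizes(n, R, P):
    """Split a code book of 2**(n*R) vectors across P stages as evenly as
    possible in bits, giving the first stages the larger share."""
    total_bits = int(n * R)
    base, extra = divmod(total_bits, P)
    return [2 ** (base + (1 if p < extra else 0)) for p in range(P)]
```

For example, 8 bits over three stages splits as 3 + 3 + 2 bits, so the first two stages receive the larger code books.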
Tables of SQNR(n, P, R) at rates of 0.5, 1.0, and 2.0 bits per sample can be
found in the Appendix. The tables are organized in pairs. The first table of
each pair gives the SQNR(n, P, R) performance of a conventional, suboptimal
RQ designed with sequential use of the LBG algorithm as in Juang and Gray
(1982). The second table gives the performance of the same RQ only where
an exhaustive search encoder is used and where the stagewise code vectors
satisfy necessary conditions for joint optimality. In each plot the P = 1 curve
represents unconstrained ESVQ quantizer performance, which serves as
a reference to determine the effect of the multistage residual memory
constraint. The P = 2 RQ is the least constrained and the P = 8 RQ is the
most severely constrained in that each stage has only two code vectors.
It can be argued that the comparison of ESRQs with tree-searched sequen-
tial LBG RQs is unfair since the search procedures are not identical. For this
reason, this comparison between conventional RQ and ESRQ is not overly
emphasized. The main thrust of these experiments is the comparison of
ESRQ with ESVQ. As we shall see, however, an interesting result is that the
performance of sequentially designed LBG RQs and jointly optimal ESRQs
can be nearly identical at low rates for memoryless sources. For sources with
memory this is not true. This study also illustrates some of the undesirable
phenomena that occur at higher rates with sequentially designed RQs.
RESIDUAL VECTOR QUANTIZERS 41
(Figure: SQNR curves for ESVQ and for quantizers with two to eight stages.)
These results suggest that there are no serious entanglement problems for
either the RQ code book design methods or the search procedures, at least
for this memoryless unimodal source.
For the AR(2) source, the SQNR of conventional RQ decreases significantly
as P increases. For ESRQs, however, there is very little variation in SQNR
with increasing P, and there is only a slight loss of performance between the
multistage quantizers and the single-stage quantizer. The performance drop
between the single-stage and multistage quantizers ranges from 0 to about
0.5 dB for the ESRQs. The corresponding drop for the conventionally
designed RQs is as large as 3.0 dB.
These results help quantify the extent to which the SQNR vs. vector
dimension performance of ESRQ is suboptimal to ESVQs for various values
of P and R . The question remains as to whether or not ESRQs give superior
FIGURE 22. Lena compressed with conventional RQ at 0.25 bits per pixel.
This is somewhat surprising, since ESRQ was intended to reduce the memory
costs relative to ESVQ.
A very different result is obtained on the Gauss-Markov source. As shown
in Fig. 20, for the parameters tested, ESRQs required one-fourth to one-
sixteenth the memory of corresponding ESVQs. Equivalently, the ESRQs
give approximately a 0.25 dB to 2.5 dB increase in SQNR over ESVQs at a
given memory expenditure. The savings depend on both R and n. This
demonstrates that extreme care should be taken when evaluating “cost
efficient” compression schemes. In this case, the ESRQ structure proved
more efficient than ESVQ on one source but not on the other.
It is not surprising that a structured VQ is better suited to a structured
source. However, even though ESRQ proved to be more memory-efficient
on the source with memory, this advantage did not carry over to the
memoryless source.
D. Reflected RQ
To limit the required time for code book design of the rRQ code vector
constellation, the rRQ stages evaluated here were jointly optimized only over
sub-blocks of eight stages. That is, the first eight stages were jointly optimized.
Then, while holding these stages fixed, the next eight stages were jointly
optimized, added to the first eight stages, and the process repeated. Experience
with reflected
RQ designs shows that, as the number of encoder stages allowed to change
during the design process is increased, it becomes increasingly likely that
entanglement will occur. This is manifested by nonmonotonic behavior of the
quantizer distortion during the design process. This incremental sub-block
design approach is an ad hoc method of encouraging monotonic convergence
during the design process. One possible improvement to this design approach
might be to use separate encoder and decoder rRQ code books.
FIGURE 27. Lena compressed with rRQ at 1.00 bits per pixel.
TABLE II
PERFORMANCE RESULTS FOR TEST IMAGE LENA
FIGURE 29. Lena compressed with rRQ at 1.77 bits per pixel.
The implementation costs for the rRQ codes are very low: only 128 vectors
need to be stored and only 64 pair-wise nearest-neighbor vector encoding
decisions (plus the computational expense of the reflection operations) are
required for encoding. We believe that rRQ is the only nonlattice vector
quantizer developed to date that is instrumentable in both memory and
computation costs, and yet seems to yield acceptable performance levels.
These results are quite encouraging. The distortion results can be expected
to improve if the code vector size (and number of stages) is allowed to
increase. This would not compromise implementability, since the 64-stage
quantizers designed here do not come close to challenging current state-of-
the-art digital hardware.
VIII. CONCLUSIONS
APPENDIX: TABLES OF RATE-DISTORTION DATA
TABLE III
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 0.5 BIT PER SAMPLE
# of Vector Dimension
Stages 2 4 6 8 10 12 14 16
TABLE IV
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 0.5 BIT PER SAMPLE
# of Vector Dimension
Stages 2 4 6 8 10 12 14 16
TABLE V
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 1.0 BIT PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4 5 6 7 8
TABLE VI
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 1.0 BIT PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4 5 6 7 8
TABLE VII
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 2.0 BITS PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4
TABLE VIII
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 2.0 BITS PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4
TABLE IX
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 0.5 BIT PER SAMPLE
# of Vector Dimension
Stages 2 4 6 8 10 12 14 16
TABLE X
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 0.5 BIT PER SAMPLE
# of Vector Dimension
Stages 2 4 6 8 10 12 14 16
TABLE XI
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 1.0 BIT PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4 5 6 7 8
TABLE XII
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 1.0 BIT PER SAMPLE
# of                              Vector Dimension
Stages    1     2     3     4      5      6      7      8      9     10
1      4.39  7.51  8.41  9.56  10.36  10.92  11.35  11.68  11.98  12.23
2            7.52  8.32  9.45  10.12  10.70  11.13  11.40  11.70  11.86
3                  8.31  9.44  10.14  10.71  11.14  11.31
4                        9.44  10.23  10.63  11.11  11.32
5                              10.15  10.70  11.14  11.35
6                                     10.70  11.11  11.45
7                                            11.05  11.41
8                                                   11.40
(A P-stage quantizer at 1.0 bit per sample requires vector dimension n >= P, so row P begins at n = P.)
TABLE XIII
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 2.0 BITS PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4
TABLE XIV
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 2.0 BITS PER SAMPLE
# of Vector Dimension
Stages 1 2 3 4
REFERENCES
Baker, R. L. (1984). “Vector quantization of digital images.” Ph.D. Thesis, Stanford University, California.
Barnes, C. F. (1989). “Residual Quantizers.” Ph.D. Thesis, Brigham Young University, Utah.
Barnes, C. F., and Frost, R. L. (1990). “Vector quantizers with direct sum code books,” to appear in IEEE Transactions on Information Theory.
Berger, T. (1971). “Rate Distortion Theory.” Prentice-Hall, Englewood Cliffs, New Jersey.
Budge, S. E., Barnes, C. F., Talbot, L. M., Chabries, D. M., and Christiansen, R. W. (1989). “Image coding for data compression using a human visual model,” SPIE/SPSE Symposium on Electronic Imaging: Advanced Devices and Systems, Los Angeles, California.
Buzo, A., Gray Jr., A. H., Gray, R. M., and Markel, J. D. (1980). “Speech coding based upon vector quantization,” IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-28, 562-574.
Chan, W.-Y., and Gersho, A. (1991). “Constrained-storage quantization of multiple vector sources by codebook sharing,” IEEE Transactions on Communications COM-39, 11-13.
Chen, D., and Bovik, A. C. (1990). “Visual pattern image coding,” IEEE Transactions on Communications COM-38, 2137-2146.
Conway, J. H., and Sloane, N. J. A. (1982). “Fast quantizing and decoding algorithms for lattice quantizers and codes,” IEEE Transactions on Information Theory IT-28, 227-232.
Farvardin, N., and Modestino, J. W. (1984). “Optimum quantizer performance for a class of non-Gaussian memoryless sources,” IEEE Transactions on Information Theory IT-30, 485-497.
Farvardin, N., and Modestino, J. W. (1986). “Adaptive buffer-instrumented entropy-coded quantizer performance for memoryless sources,” IEEE Transactions on Information Theory IT-32, 9-22.
Fischer, T. R., and Dicharry, R. M. (1984). “Vector quantizer design for memoryless Gaussian, Gamma, and Laplacian sources,” IEEE Transactions on Communications COM-32, 1065-1069.
Flanagan, J. K., Morrell, D. R., Frost, C. J., and Nelson, B. E. (1989). “Vector quantization codebook generation using simulated annealing,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 1759-1762.
Forney, G. D. (1988). “Coset codes-Part I: Introduction and geometric classification,” IEEE Transactions on Information Theory IT-34, 1123-1151.
Gabor, G., and Gyorfi, Z. (1986). “Recursive Source Coding.” Springer-Verlag, New York.
Gallager, R. G. (1968). “Information Theory and Reliable Communication.” John Wiley and Sons, New York.
Gersho, A. (1979). “Asymptotically optimal block quantization,” IEEE Transactions on Information Theory IT-25, 373-380.
Gibson, J. D., and Sayood, K. (1988). “Lattice Quantization,” in “Advances in Electronics and Electron Physics” (P. Hawkes, ed.) 72, 259-330. Academic Press, New York.
Gray, R. M., Kieffer, J. C., and Linde, Y. (1980). “Locally optimal block quantizer design,” Information and Control 45, 178-198.
Jayant, N. S., and Noll, P. (1984). “Digital Coding of Waveforms: Principles and Applications to Speech and Video.” Prentice-Hall, Englewood Cliffs, New Jersey.
Jelinek, F., and Anderson, J. B. (1971). “Instrumentable tree encoding of information sources,” IEEE Transactions on Information Theory IT-17, 118-119.
Juang, B. H., and Gray, A. H. (1982). “Multiple stage vector quantization for speech coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 1, 597-600.
Langdon, G. G. (1984). “An introduction to arithmetic coding,” IBM Journal of Research and Development 28, 135-149.
Linde, Y., Buzo, A., and Gray, R. M. (1980). “An algorithm for vector quantizer design,” IEEE Transactions on Communications COM-28, 84-95.
Lloyd, S. P. (1957). “Least squares quantization in PCM,” Bell Laboratories Technical Notes; also published in the March 1982 special issue on quantization: IEEE Transactions on Information Theory, Part 1 IT-28, 129-137.
Makhoul, J., Roucos, S., and Gish, H. (1985). “Vector quantization in speech coding,” Proceedings of the IEEE 73(11), 1551-1588.
Marcellin, M. W. (1987). “Trellis coded quantization: an efficient technique for data compression.” Ph.D. Thesis, Texas A&M University, College Station, Texas.
Marcellin, M. W., and Fischer, T. R. (1990). “Trellis code quantization of memoryless and Gauss-Markov sources,” IEEE Transactions on Communications COM-38, 82-93.
Max, J. (1960). “Quantization for minimum distortion,” IRE Transactions on Information Theory IT-6, 7-12.
Pilc, R. (1967). “Coding theorems for discrete source-channel pairs.” Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Pilc, R. (1968). “The transmission distortion of a source as a function of the encoding block length,” Bell Syst. Tech. J. 47, 827-885.
Sabin, M. J., and Gray, R. M. (1986). “Global convergence and empirical consistency of the generalized Lloyd algorithm,” IEEE Transactions on Information Theory IT-32, 148-155.
Sayood, K., Gibson, J. D., and Rost, M. C. (1984). “An algorithm for uniform vector quantizer design,” IEEE Transactions on Information Theory IT-30, 805-814.
Shannon, C. E. (1948). “A mathematical theory of communication,” Bell Syst. Tech. J. 27, 379-423, 623-656.
Shannon, C. E. (1959). “Coding theorems for a discrete source with a fidelity criterion,” in IRE Nat. Conv. Rec., Part 4, 142-163.
Trushkin, A. V. (1982). “Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions,” IEEE Transactions on Information Theory IT-28, 187-198.
Welch, T. A. (1984). “A technique for high-performance data compression,” IEEE Computer Magazine, 8-19.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84
Foundation and Applications of Lattice Transforms in Image Processing
JENNIFER L. DAVIDSON
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A. Lattice Structures in Image Processing . . . . . . . . . . . . . . . . . 62
B. Image Algebra and Its Relation to Image Processing . . . . . . . . . . . 64
II. Theoretical Foundation of Lattice Transforms in Image Processing . . . . . . 66
A. Minimax Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B. Image Algebra. . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
C. The Embedding Isomorphism between Minimax Algebra and Image Algebra 85
D. Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . 86
III. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A. Mapping of Minimax Algebra Properties to Image Algebra . . . . . . . . 90
B. A General Skeletonizing Technique. . . . . . . . . . . . . . . . . . . 115
C. An Image Complexity Measure . . . . . . . . . . . . . . . . . . . . 120
D. The Dual Transportation Problem in Image Algebra . . . . . . . . . . . 124
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
I. INTRODUCTION
Systems of equations of the form

    b_j = \bigvee_{k=1}^{n} (a_k + f_{jk}), \quad j = 1, \ldots, m,

were first investigated in the area of operations research, which has long been
known for its class of problems in optimization. The types of optimization
problems that Cuninghame-Green considered used arithmetic operations
different from the usual multiplication and summation. Some machine-
scheduling and shortest-path problems, for example, could be best charac-
terized by a nonlinear system using additions and maximums. The minimax
algebra is a matrix calculus that uses a special case of a generalized matrix
product (Cohen, 1988), where matrices and vectors assume values from a
lattice. By adding a few more conditions, such as a group operation on the
lattice, and the self-duality of the resulting structure, Cuninghame-Green was
able to develop a solid mathematical foundation in which to pose a wide
variety of operations research questions. It turns out that mathematical
morphology is a special subalgebra of the minimax algebra, the details of
which are presented in Section 11. Much theoretical and applied work has
been done in the area of mathematical morphology. The generalization of
morphology to lattice transforms is intended to extend the knowledge
already gathered in this area, not to supplant it.
A. Minimax Algebra
The minimax algebra is built on a generalized matrix product: given dual
binary operations f and g, the product of an m \times p matrix A and a p \times n
matrix B is

    (f, g)(A, B) = C, \quad c_{ik} = (a_{i1}\, g\, b_{1k})\ f\ (a_{i2}\, g\, b_{2k})\ f \cdots f\ (a_{ip}\, g\, b_{pk}),

for i = 1, \ldots, m, k = 1, \ldots, n, where f and g are viewed as binary operations.
Every element r of the bounded lattice-ordered group R_{\pm\infty} = R \cup \{-\infty, +\infty\}
has an additive conjugate r^* defined by

    r^* = \begin{cases} -r & \text{if } r \in R \\ -\infty & \text{if } r = +\infty \\ +\infty & \text{if } r = -\infty. \end{cases}   (1)

Thus, (r^*)^* = r. This gives the following relation:

    r \wedge u = (r^* \vee u^*)^*

for all r, u in R_{\pm\infty}. If the value set is R_0^\infty = R^+ \cup \{0, \infty\}, then every element
r \in R_0^\infty has a multiplicative conjugate \tilde{r} defined by

    \tilde{r} = \begin{cases} 1/r & \text{if } r \neq 0 \text{ and } r \neq \infty \\ 0 & \text{if } r = \infty \\ \infty & \text{if } r = 0. \end{cases}   (2)

Hence, (\tilde{r})^\sim = r, and

    r \wedge u = (\tilde{r} \vee \tilde{u})^\sim
FOUNDATION AND APPLICATIONS OF LATTICE TRANSFORMS 71
for all r, u in R_0^\infty.
There are two types of operations defined on matrices having values in a
bounded l-group. Specifically, if A = (a_{ij}) and B = (b_{ij}) are two m \times n
matrices having entries in the set R_{\pm\infty}, then the pointwise maximum A \vee B
is defined as

    A \vee B = C, \quad \text{where } c_{ij} = a_{ij} \vee b_{ij}.   (3)

If A is m \times p and B is p \times n, the product of A and B is the matrix C = A \times B,
which has size m \times n and values

    c_{ij} = \bigvee_{k=1}^{p} (a_{ik} + b_{kj}).   (4)

If the value set is R_0^\infty, then the pointwise maximum between two matrices has
the same definition as (3), but the product is defined as

    c_{ij} = \bigvee_{k=1}^{p} (a_{ik} * b_{kj}).   (5)
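The max-plus product of Eq. (4) takes only a few lines to sketch in plain Python, with float('-inf') playing the role of the null element (illustrative code, invented function name):

```python
NEG_INF = float("-inf")

def maxplus_product(A, B):
    """Matrix product with (max, +) replacing (sum, *): c_ij = max_k (a_ik + b_kj).
    Entries take values in R together with -inf, which absorbs under +."""
    m, p, n = len(A), len(B), len(B[0])
    return [[max(A[i][k] + B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]
```

The dual min-plus product can be obtained by conjugating both operands, applying this product, and conjugating the result, mirroring the relation r ∧ u = (r* ∨ u*)*.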
B. Image Algebra
This section provides the basic definitions and notation that will be used for
the image algebra throughout this chapter. We will define only those image
algebra concepts necessary to describe ideas in this document. For a full
discourse on all image algebra operands and operations, we refer the reader
to Ritter et al. (1990).
The image algebra is a heterogeneous algebra, in the sense of Birkhoff and
Lipson (1970), and is capable of describing image manipulations involving
not only single-valued images, but multivalued images, although here we
shall restrict our discussion to single-valued image manipulation. In fact, it
has been formally proven that the set of operations is sufficient for expressing
any image-to-image transformation defined in terms of a finite algorithmic
procedure, and also that the set of operations is sufficient for expressing any
image-to-image transformation for an image that has a finite number of gray
values (Ritter et al., 1987b; Ritter and Wilson, 1987). In addition, since the
lattice properties parallel many of the linear ones, definitions presented will
focus on both the linear and lattice properties of the image algebra.
The six operands of the image algebra are value sets, point sets, the elements
of each of these sets, images, and templates.
Note that a * b is a real number, whereas the operations (6) and (7) result in
an image.
Two other common operations used in image processing are exponentiation
and the characteristic function.
74 JENNIFER L. DAVIDSON
For images a and b over X, exponentiation is defined by

    a^b = \{(x, c(x)) : c(x) = a(x)^{b(x)} \text{ where the power is defined, and } c(x) = 0 \text{ otherwise}, \ x \in X\},

and, for a family of subsets S(x) of the value set, the characteristic function is
defined by

    \chi_S(a) = b = \{(x, b(x)) : b(x) = I_2 \text{ if } a(x) \in S(x), \text{ and } b(x) = I_1 \text{ otherwise}\}.

Note that (F, \circ, \times) = (R, +, *) satisfies the above conditions with I_1 = 0 and
I_2 = 1, as does (F, \circ, \times) = (R_0^\infty, \vee, *) with I_1 = 0 and I_2 = 1.
3. Generalized Templates
Templates and template operations are the most powerful tools of the image
algebra. A template as defined in the image algebra not only unifies but
generalizes the familiar concepts of templates, masks, windows, and
neighborhood functions as used in the image processing community. In
particular, image algebra templates generalize the notion of structuring
elements as used in mathematical morphology (Ritter et al., 1987a).
Let X and Y be point sets and F a value set. An F-valued generalized
template t from Y to X is an element of (F^X)^Y. For each y \in Y, t(y) is an image
on X. Denoting t(y) by t_y for notational convenience, we have

    t_y = \{(x, t_y(x)) : x \in X, \ t_y(x) \in F\}.

The set of all F-valued templates from Y to X is denoted by (F^X)^Y. A template
t can be viewed as a collection of images, \{t_y\}_{y \in Y}. See Fig. 1 for an example
of a variant template from Y to X where X \neq Y. In this example, Y is a 6 \times 10
array, X is a 3 \times 5 array, and the value set F = R. For each j = 1, \ldots, 60,
there is an image t_{y_j} assigned to y_j. For instance, for j = 1, t_{y_1}(x_1) = 1,
If t \in ((R_{\pm\infty})^X)^Y, the conjugate of t is the template t^* \in ((R_{\pm\infty})^Y)^X defined by

    t^*_x(y) = [t_y(x)]^*.

Similarly, using Eq. (2), if t \in ((R_0^\infty)^X)^Y, the multiplicative conjugate of t is the
template \tilde{t} \in ((R_0^\infty)^Y)^X defined by

    \tilde{t}_x(y) = [t_y(x)]^\sim.
The additive maximum of an image a \in (R_{-\infty})^X with a template
t \in ((R_{-\infty})^X)^Y is the image on Y defined by

    a \boxvee t = \{(y, b(y)) : b(y) = \bigvee_{x \in X} (a(x) + t_y(x)), \ y \in Y\},

and the multiplicative maximum a \circledast t of a \in (R_0^\infty)^X with t \in ((R_0^\infty)^X)^Y is
defined analogously, with a(x) * t_y(x) in place of a(x) + t_y(x). Similarly, if
S_{-\infty}(t_y) \neq \emptyset, then

    b(y) = \bigvee_{x \in S_{-\infty}(t_y)} (a(x) + t_y(x)), \quad y \in Y.

If the supports are empty, the maxima default to

    \bigvee_{x \in S_{-\infty}(t_y)} (a(x) + t_y(x)) = -\infty \quad \text{and} \quad \bigvee_{x \in S_0(t_y)} a(x) * t_y(x) = 0.

Note that these values represent the null or identity values of the respective
value sets, just as 0 is the identity value for the value set (R, +, *). We may
therefore restrict our computation of the new pixel value to the support of t.
This becomes particularly important when considering the mapping of
transforms to certain types of parallel architectures.
Because of the duality inherent in the structure R_{\pm\infty} (and R_0^\infty), the
operations \boxvee (and \circledast) induce dual image-template operations, called
additive minimum (and multiplicative minimum). They are defined by

    a \boxwedge t \triangleq (t^* \boxvee a^*)^*

and its multiplicative analogue in terms of the multiplicative conjugates.
Equivalently, we have

    (a \boxwedge t)(y) = \bigwedge_{x \in S_{+\infty}(t_y)} (a(x) + t_y(x)),

with the empty-support convention \bigwedge_{x \in S_{+\infty}(t_y)} (a(x) + t_y(x)) = +\infty.
Addition and multiplication are also defined pointwise between two templates
s and t ∈ (R^X)^Y: s + t by (s + t)_y = s_y + t_y, and s * t by (s * t)_y = s_y * t_y.
Many of the properties that hold for images also hold for G-valued
templates. For example, the above two operations are each commutative and
associative, and each has an identity. Under addition, the identity template
is t = 0, that is, t_y = 0 ∈ R^X for all y in Y. Under multiplication, the identity
template is t = 1, that is, t_y = 1 ∈ R^X for all y in Y.
If G = R_{±∞}, then we can define extended addition, maximum, and
minimum between templates: s + t ≜ r, where

r_y(x) = s_y(x) + t_y(x)   if x ∈ S_{-∞}(t_y) ∩ S_{-∞}(s_y),
r_y(x) = s_y(x)            if x ∈ S_{-∞}(s_y) \ S_{-∞}(t_y),
r_y(x) = t_y(x)            if x ∈ S_{-∞}(t_y) \ S_{-∞}(s_y),
r_y(x) = -∞                otherwise;

s ∨ t by (s ∨ t)_y = s_y ∨ t_y; and

s ∧ t by (s ∧ t)_y = s_y ∧ t_y.   (6)
D . Mathematical Morphology
respectively, where A_b = {a + b : a ∈ A} and B' = {-b : b ∈ B}. This is the
original notation as used in Hadwiger's book (1957). It can easily be shown
that A/B = (A^c · B')^c, where A^c denotes the complement of A in R^n. The two
morphological operations of dilation and erosion are constructed from these
definitions. While there are several slight variations on the actual definitions
of dilation and erosion, we will use Sternberg's, which are

A ⊞ B = ∪_{b ∈ B} A_b

and

A ⊟ B = ∩_{b ∈ B} A_{-b}.

Here, the set A represents the input image and the set B the structuring
element. To avoid anomalies without practical interest, the structuring element B is
assumed to include the origin 0 ∈ R^n, and both A and B are assumed to be
compact. Also, the actual symbols used for dilation and erosion are typically
⊕ and ⊖, respectively. However, to avoid confusion with the image algebra
operation ⊕, we replace ⊕ and ⊖ with ⊞ and ⊟, respectively.
All morphological transformations are combinations of dilations and
erosions, such as the opening of A by B, denoted by

A ∘ B = (A ⊟ B) ⊞ B,

and the closing of A by B, denoted by

A • B = (A ⊞ B) ⊟ B.
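The set formulations of dilation, erosion, opening, and closing can be sketched directly on point sets. This is a minimal illustration under my own encoding (2-D integer points as tuples; the example sets are invented for demonstration):

```python
def dilate(A, B):
    """Dilation of A by structuring element B: the union of translates A_b."""
    return {(a0 + b0, a1 + b1) for (a0, a1) in A for (b0, b1) in B}

def erode(A, B):
    """Erosion: points a such that the translate of B to a lies inside A."""
    return {a for a in A if all((a[0] + b0, a[1] + b1) in A for (b0, b1) in B)}

def opening(A, B):
    """Opening: erosion followed by dilation."""
    return dilate(erode(A, B), B)

def closing(A, B):
    """Closing: dilation followed by erosion."""
    return erode(dilate(A, B), B)

# A 3 x 3 square with one isolated outlier, and a cross-shaped B containing 0.
A = {(i, j) for i in range(3) for j in range(3)} | {(10, 10)}
B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}
print(opening(A, B))   # the isolated outlier disappears
```

Opening by the cross removes the outlier, which cannot contain any translate of B, illustrating why openings act as shape filters.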
However, a more general image transform in mathematical morphology is
the Hit or Miss transform (Serra, 1969). Since an erosion, and hence a
88 JENNIFER L. DAVIDSON
dilation, is a special case of the Hit or Miss transform, this transform is often
viewed as the universal morphological transformation upon which the theory
of mathematical morphology is based. Let B = (D, E) be a pair of structuring
elements. Then the Hit or Miss transform of the set A is given by the
expression

A ⊛ B = {a : D_a ⊂ A, E_a ⊂ A^c}.

For practical applications it is assumed that D ∩ E = ∅. The erosion of A by
D is obtained by simply letting E = ∅, resulting in A ⊛ B = A ⊟ D.
Extensions of these boolean operations have been accomplished through
the concept of an umbra. It has been shown that this somewhat cumbersome
method of developing gray-value morphology is unnecessary and that a
simpler and more intuitive approach can suffice (Davidson, 1989, 1990).
We now discuss the relationship between the morphological algebra and
the image algebra. For the appropriate template t, performing a dilation is
equivalent to calculating a ⊻ t. Also, an erosion can be represented by
a ⊼ t*.

Let A, B be finite subsets of the n-fold Cartesian product of Z, Z^n. Choose
a point set X such that X ⊂ Z^n and satisfies A ⊞ B ⊂ X. Let F_m denote the
value set {-∞, 0, 1, ∞}, and define a function p from the power set of Z^n
to the set of all F_m-valued images on X by

p : 2^{Z^n} → F_m^X,   p(A) = a,   where a(x) = 1 if x ∈ A and a(x) = 0 otherwise.

This maps a morphology image A, represented by a set, to an image algebra
image a, represented by a function. The mapping of a structuring element is
as follows. Let

𝔅 = {B ⊂ Z^n : |B| < ∞ and 0 ∈ B}.

Let 𝔗 denote the set of all F_m-valued invariant templates from X to X such
that y ∈ S_{-∞}(t_y). Now we define a function ζ from 𝔅 to 𝔗 by

ζ(B) = t,   where t_y(x) = 0 if x ∈ B_y (the translate of B to y), and t_y(x) = -∞ otherwise.

Using these two functions, we can map a morphology structuring element to
an image algebra template and vice versa. The image a is said to correspond
to the set A, and the template t is called the template corresponding to the
structuring element B. The correspondence between image algebra and
mathematical morphology is described in the next two theorems.
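The correspondence can be checked numerically. The sketch below follows the mappings p and ζ described above; the dictionary encoding is my own, and a symmetric structuring element is used so that questions of reflection do not arise:

```python
NEG = float("-inf")

def p(A, X):
    """The mapping p: the set A becomes an image that is 1 on A, 0 elsewhere."""
    return {x: (1 if x in A else 0) for x in X}

def zeta(B):
    """The mapping zeta: B becomes an invariant template with value 0 on the
    translate of B to y and -inf off that support."""
    def t(y):
        return {(y[0] + b0, y[1] + b1): 0 for (b0, b1) in B}
    return t

def additive_max(a, t, Y):
    """b(y) = max of a(x) + t_y(x) over the support of t_y."""
    out = {}
    for y in Y:
        vals = [a[x] + w for x, w in t(y).items() if x in a]
        out[y] = max(vals) if vals else NEG
    return out

A = {(1, 1), (1, 2)}
B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}      # symmetric cross
X = [(i, j) for i in range(4) for j in range(4)]
img = additive_max(p(A, X), zeta(B), X)
dilated = {x for x, v in img.items() if v == 1}     # threshold back to a set
set_dilation = {(a0 + b0, a1 + b1) for (a0, a1) in A for (b0, b1) in B}
print(dilated == set_dilation)   # True
```

Thresholding the additive maximum at 1 recovers exactly the set-theoretic dilation, which is the content of the dilation half of the correspondence.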
Theorem 2.5. Let p and ζ be as defined above, let A ⊂ Z^n, and let B ∈ 𝔅 be a
structuring element. Then
111. APPLICATIONS
J_y(x) = 1 if x = y, and J_y(x) = -∞ otherwise.

Next, we state the distributive properties of ⊻ with respect to ∨:

(a ∨ b) ⊻ t = (a ⊻ t) ∨ (b ⊻ t),
(s ∨ t) ⊻ u = (s ⊻ u) ∨ (t ⊻ u),
u ⊻ (s ∨ t) = (u ⊻ s) ∨ (u ⊻ t).

The duals to properties 1 through 6 also hold, because the belt R_{±∞} has
duality:

(7) ((F^X)^Y, ∧) is a semi-lattice and ((F^X)^Y, ∧) is a function space over
(F, ∧, ×');
(8) ((F^X)^Y, ∧, ⊼) is a belt;

etc.
Now let F be a subbelt of R, and F_{±∞} the bounded l-group with group F.
Corresponding to the identity matrix and the null matrix, we have the (one-point)
identity template I ∈ (F_{±∞}^X)^X defined by

I_y(x) = 0 if x = y, and I_y(x) = -∞ otherwise,

and the (constant) null template -∞ ∈ (F_{±∞}^X)^X defined by

w_{x_i}(x_j) = d_{ij} if j W i, and w_{x_i}(x_j) = -∞ otherwise.
There is an obvious relationship between the weighted digraph associated
with the partial order relation W and the template w. For example, suppose
we have 5 tasks or activities, or subroutines of a program, which have the
following relation or partial order:
(1,2)(1,3)(2,4)(2,5)(3,4)(3, 5)(4,5).
Here, activity 1 is the start activity, activity 5 is the end activity, and tasks 2,
3, 4 are intermediate subroutines. Suppose the duration times d_{ij} of the
activities are

d_{21} = 1, d_{31} = 6, d_{42} = 2, d_{43} = 1, d_{52} = 1, d_{53} = 3, d_{54} = 3,

and d_{ii} = 0 for each i = 1, ..., 5. This is consistent with a meaningful
physical interpretation of the definition of duration time for a task.
The corresponding weighted digraph is given in Fig. 2. The nodes represent
the activities, and the duration times are given as numbers on the directed
edges linking the nodes.
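As a concrete sketch, the earliest start times of the five-task network above can be computed by iterating the max-plus recursion to a fixed point. The edge weights below are the d-values given in the text (d21 = 1 is read as an edge of weight 1 from task 1 to task 2); the matrix encoding and the fixed-point loop are my own illustrative choices:

```python
NEG = float("-inf")

# Precedence relation and weights from the text: (i, j, d) means task i must
# precede task j by at least d time units.
edges = [(1, 2, 1), (1, 3, 6), (2, 4, 2), (3, 4, 1),
         (2, 5, 1), (3, 5, 3), (4, 5, 3)]
n = 5

# w[i][j] plays the role of the template w: the weight on the edge j -> i.
w = [[NEG] * (n + 1) for _ in range(n + 1)]
for i, j, d in edges:
    w[j][i] = d

# Earliest start times: e(i) = max_j (w[i][j] + e(j)), iterated to a fixed point.
e = [0.0] * (n + 1)
for _ in range(n):            # n passes suffice for an acyclic network
    for i in range(1, n + 1):
        cand = [w[i][j] + e[j] for j in range(1, n + 1) if w[i][j] != NEG]
        if cand:
            e[i] = max(cand)
print(e[1:])   # [0.0, 1.0, 6.0, 7.0, 10.0]
```

Task 5 cannot start before time 10, the length of the critical path 1 → 3 → 4 → 5.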
relations:

(1,2) (1,3) (2,4) (2,5) (3,4) (3,5) (4,5).

Here, we write (i, j) if task i must precede task j. Suppose the times f_{ij} of the
activities are

f_{12} = 1, f_{13} = 6, f_{24} = 2, f_{34} = 1, f_{25} = 1, f_{35} = 3, f_{45} = 3.
Suppose we would like to find a(x_4), for example, satisfying

a(x_4) = min_{j=1,...,5} {-w_{x_4}(x_j) + a(x_j)}.
The value

-w_{x_4}(x_5) + a(x_5)

is the latest allowable time to start task 5 minus the minimum amount of time
by which activity 4 must precede activity 5, and the time to start task 4 must
be at least as small as this number. Thus, the time to start task 4 must be at
least as small as -1 + a(x_5). The value a(x_4) = min{-w_{x_4}(x_5) + a(x_5)} =
-1 + t. (All other values -w_{x_4}(x_j) + a(x_j) = +∞, as -w_{x_4}(x_j) = +∞ for
j ≠ 5.) Since t is given, this quantity can be explicitly determined. The
remaining equations can be solved similarly.
If we define u ∈ ((R_{±∞})^X)^X by

u_{x_i}(x_j) = -w_{x_j}(x_i) if j W i, and u_{x_i}(x_j) = +∞ otherwise,

then it is obvious that in general we must solve for a the following:

a ⊼ u = a.   (11)

It is clear that the template u in Eq. (11) is the conjugate of the template w
in Eq. (10). That is,

u = w*.

We can say that the templates w and w* define the structure of the network
as we analyze it backward or forward in time, respectively.
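The backward analysis through the conjugate template can also be sketched in code. The edge data are again the network from the text; taking the overall completion time T = 10 from the forward analysis is my own illustrative choice:

```python
INF, NEG = float("inf"), float("-inf")

edges = [(1, 2, 1), (1, 3, 6), (2, 4, 2), (3, 4, 1),
         (2, 5, 1), (3, 5, 3), (4, 5, 3)]
n, T = 5, 10.0                 # T: required start time of the final task

# u = w*: negate and transpose the weights; +inf off the edges.
u = [[INF] * (n + 1) for _ in range(n + 1)]
for i, j, d in edges:
    u[i][j] = -d               # conjugate of the forward template

# Latest start times: l(i) = min_j (u[i][j] + l(j)), anchored at l(5) = T.
l = [T] * (n + 1)
for _ in range(n):
    for i in range(n, 0, -1):
        cand = [u[i][j] + l[j] for j in range(1, n + 1) if u[i][j] != INF]
        if cand:
            l[i] = min(cand)
print(l[1:])   # [0.0, 5.0, 6.0, 7.0, 10.0]
```

Comparing with the earliest start times [0, 1, 6, 7, 10] shows task 2 has slack 4 while tasks 1, 3, 4, 5 are critical.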
e. Alternating tt* Products. This section discusses the concept of an
alternating tt* or t*t product of a template t and its conjugate under the
operations ⊻ or ⊼, respectively. We shall state the results for the sub-bounded
l-groups of R_{±∞} and the operations ⊻ and ⊼.

Theorem 3.6. Let F_{±∞} be a sub-bounded l-group of R_{±∞}, where F denotes the
group of the bounded l-group F_{±∞}, and t ∈ (F_{±∞}^X)^Y. Then we have

t ⊻ (t* ⊼ t) = t ⊼ (t* ⊻ t) = (t ⊼ t*) ⊻ t = (t ⊻ t*) ⊼ t = t.
Similarly,

t* ⊼ (t ⊻ t*) = t* ⊻ (t ⊼ t*) = (t* ⊻ t) ⊼ t* = (t* ⊼ t) ⊻ t* = t*.

Expressions such as

t ⊻ t* ⊼ t,   t ⊻ (t* ⊼ t),   (t* ⊻ ((t ⊻ t*) ⊼ t)) ⊼ (t* ⊻ t)

may be formed in the same way.
An algebraic expression so constructed is called an alternating tt* product.
Suppose an alternating tt* product has an odd number of letters t and/or
t*. Then we say it is of type t if it begins and ends with t, and that it is of type
t* if it begins and ends with t*. If it has an even number of letters, we say that
it is of type

t ⊻ t*   or   t ⊼ t*   or   t* ⊻ t   or   t* ⊼ t,

exactly according to the first two letters with its separating operator, regardless
of how the brackets lie in the entire expression. As examples,

t* ⊻ t is of type t* ⊻ t,
t ⊻ (t* ⊼ t) is of type t,
(t* ⊻ ((t ⊻ t*) ⊼ t)) ⊼ (t* ⊻ t) is of type t* ⊻ t.
Theorem 3.7. Let F_{±∞} be a sub-bounded l-group of R_{±∞}, and t an arbitrary
template. If an alternating tt* product P is of type Q, then

P = Q.
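The collapse of alternating products can be verified numerically on finite matrices. The max-plus and min-plus matrix products below are my own encodings of the additive maximum and additive minimum; the random 4 × 3 matrix is illustrative:

```python
import random

def maxplus(A, B):
    """Additive maximum of matrices: C[i][j] = max_k A[i][k] + B[k][j]."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def minplus(A, B):
    """Additive minimum (the dual product)."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj(A):
    """Conjugate: negate and transpose."""
    return [[-A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

random.seed(1)
t = [[random.randint(-5, 5) for _ in range(3)] for _ in range(4)]

lhs = maxplus(t, minplus(conj(t), t))       # an alternating product of type t
print(lhs == t)   # True
```

For any finite matrix, the product of type t collapses back to t, in line with Theorems 3.6 and 3.7.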
2. Systems of Equations

We now discuss the problem of finding solutions to the problem

given t ∈ ((R_{±∞})^X)^Y and b ∈ (R_{±∞})^Y, find a ∈ (R_{±∞})^X such that a ⊻ t = b.   (12)

Here, |X| = m, |Y| = n.

a. F-asticity and l-solutions. If F_{±∞} is a bounded l-group and x, y ∈ F_{±∞}, we say
that the products x × y and x ×' y are l-undefined if one of x, y is -∞ and
the other is +∞. We say that a template product is l-undefined if its
evaluation requires the formation of an l-undefined product of
elements of the bounded l-group F_{±∞}. Otherwise, we say that a template
product is l-defined or l-exists. Some mathematical models require solutions
that avoid the formation of l-undefined products, since in practical cases
these often correspond to unrelated activities. We state these results for the
bounded l-group R_{±∞}.
Lemma 3.9. Let F_{±∞} be a subbelt of R_{±∞}. Let X and Y be nonempty, finite
arrays, and t ∈ (F_{±∞}^X)^Y. Then the set of all images a ∈ F_{±∞}^X such that a ⊻ t is
l-defined is a sub-semi-lattice of F_{±∞}^X. Hence the set of solutions a of statement
(12) such that a ⊻ t l-exists is either empty or is a sub-semi-lattice of F_{±∞}^X.

Lemma 3.10. Let X, Y, and W be nonempty, finite arrays, and t ∈ (F_{±∞}^X)^Y.
Then the set of templates s ∈ (F_{±∞}^X)^W such that s ⊻ t is l-defined is a sub-semi-lattice
of (F_{±∞}^X)^W.

Any solution a of statement (12) such that a ⊻ t l-exists is called an
l-solution of (12).

Lemma 3.11. Let F_{±∞} be a sub-bounded l-group of R_{±∞}. Then (12) has at
least one solution if and only if a = b ⊼ t* is a solution. In this case,
a = b ⊼ t* is the greatest solution.
Recall from probability theory that a row-stochastic matrix is a non-negative
matrix in which the sum of the elements in each row is equal to 1.
We will make analogous definitions, where the operation + is replaced by the
operation ∨, whose identity element is -∞.

Let P ⊂ F_{±∞}, where F_{±∞} is an arbitrary sub-bounded l-group of R_{±∞}. A
template t ∈ (F_{±∞}^X)^Y is called row-P-astic if ⋁_{j=1}^m t_{y_i}(x_j) ∈ P for all
i = 1, ..., n, and column-P-astic if ⋁_{i=1}^n t_{y_i}(x_j) ∈ P for all j = 1, ..., m.
The template t is called doubly P-astic if t is both row- and column-P-astic.
Note that if t is column-P-astic, then t' is row-P-astic.
Theorem 3.12. Let F_{±∞} be a sub-bounded l-group of R_{±∞} and t ∈ (F_{±∞}^X)^Y,
b ∈ F_{±∞}^Y such that (12) is soluble. Then a = b ⊼ t* l-exists and is an l-solution
of (12) if and only if one of the following cases is satisfied:

(i) t ∈ (F^X)^Y, and b = +∞, the constant image with +∞ everywhere.
(ii) t ∈ (F^X)^Y, and b = -∞.
(iii) t ∈ (F_{±∞}^X)^Y is doubly F-astic, and b ∈ F^Y.

Moreover, every solution of (12) is then an l-solution, and b ⊼ t* is equal to
+∞, -∞, or is finite, respectively according as case (i), (ii), or (iii) holds.
In the following corollary, we state the dual and left-right generalizations
of Theorems 3.11 and 3.12.

Corollary 3.13. Let F_{±∞} be a sub-bounded l-group of R_{±∞}, and let
t ∈ (F_{±∞}^X)^Y, b ∈ F_{±∞}^Y. Then for all combinations of c, d, and δ given in Fig. 3 the
following statement is true:

The image algebra equation c has at least one solution if and only if the
product d is a solution; and the product d is then the δ solution. Furthermore,
if the product d is l-defined, and equation c is l-defined when a = d, then
equation c is l-defined when a is any solution of equation c.

If d is a solution to c in Fig. 3, then d is called a principal solution.
We can also restate the last three theorems as a solubility criterion:
Problem (12) is soluble if and only if (b ⊼ t*) ⊻ t = b; and every solution
is an l-solution if (b ⊼ t*) ⊻ t = b l-exists.

Note that Theorem 3.12 identifies the cases in which (12) has an l-defined
l-solution. All solutions are then l-solutions. The next question to ask is, can
we find all solutions? We now focus on the following problem:

Given that F = R_{±∞} and that (b ⊼ t*) ⊻ t = b l-exists and equals b,
find all solutions of (12).   (13)
c             d          δ
a ⊻ t = b     b ⊼ t*     greatest
a ⊻ t* = b    b ⊼ t      greatest
a ⊼ t = b     b ⊻ t*     least
a ⊼ t* = b    b ⊻ t      least
t ⊻ a = b     t* ⊼ b     greatest
t* ⊻ a = b    t ⊼ b      greatest
t ⊼ a = b     t* ⊻ b     least
t* ⊼ a = b    t ⊻ b      least
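The principal solution and the solubility criterion can be demonstrated on a small system. The matrix orientation below (rows indexed by X, columns by Y) and the example data are my own illustrative choices:

```python
def maxplus_vec(a, T):
    """a followed by the additive maximum: b[j] = max_i a[i] + T[i][j]."""
    return [max(a[i] + T[i][j] for i in range(len(a))) for j in range(len(T[0]))]

def minplus_vec(a, T):
    """The dual (additive minimum) product."""
    return [min(a[i] + T[i][j] for i in range(len(a))) for j in range(len(T[0]))]

def conj(T):
    """Conjugate: negate and transpose."""
    return [[-T[j][i] for j in range(len(T))] for i in range(len(T[0]))]

t = [[0, 2], [1, 0], [3, 1]]       # |X| = 3, |Y| = 2
b = [4, 3]

a = minplus_vec(b, conj(t))        # principal solution: b, then conjugate, min product
print(a, maxplus_vec(a, t) == b)   # [1, 3, 1] True
```

Substituting the principal solution back reproduces b exactly, so the system is soluble and a = [1, 3, 1] is its greatest solution.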
value. That is, suppose there exists y_i ∈ Y such that s_{y_i}(x_j) is not a marked value
for any j. Then there does not exist a ∈ F_{±∞}^X such that a ⊻ t = b.

There now remains the case in which for every i, there is at least one j such
that s_{y_i}(x_j) is a marked value. We transform the question into a boolean
problem, where it can be shown that the following procedure will give a set
of solutions to Eq. (14) (Cuninghame-Green, 1979).

Step 1. For the bounded l-group F_{±∞}, define g ∈ (F_{±∞}^X)^Y by

g_{y_i}(x_j) = 0 if s'_{x_j}(y_i) is marked, and g_{y_i}(x_j) = -∞ otherwise.

Letting f ∈ F_{±∞}^Y, now solve the boolean system

f ⊻ g = 1.

There exists at least one j such that s_{y_i}(x_j) is a marked value, and for each
j = 1, ..., n, there exists an i, 1 ≤ i ≤ m, such that |W_j| = 1.
Define a template t ∈ (F_{±∞}^X)^Y to be strictly doubly 0-astic if it satisfies the
following two conditions:

(i) t_{y_i}(x_j) ≤ 0, i, j = 1, ..., n;
(ii) for each i = 1, ..., n, there exists a unique index j ∈ {1, 2, ..., n}
such that t_{y_i}(x_j) has value 0.

If t ∈ (F_{±∞}^X)^Y, |X| = m, |Y| = n, then we say that t contains a template
s ∈ (F_{±∞}^{W_1})^{W_2} if the matrix Ψ^{-1}(t) contains the matrix Ψ^{-1}(s) of size h × k,
where |W_2| = h, |W_1| = k, and both h, k ≤ min(m, n). We say that a template
t ∈ (F_{±∞}^X)^Y contains an image a ∈ F_{±∞}^X if a = t_y for some y ∈ Y.

Theorem 3.21. Let F_{±∞} be a bounded l-group, let t ∈ (F_{±∞}^X)^Y be doubly F-astic,
and let b ∈ F^Y be finite. Then a necessary and sufficient condition that the
equation a ⊻ t = b shall have exactly one solution is that we can find k finite
elements a_1, ..., a_k such that the template d defined by

d_{y_i}(x_j) = -b(y_i) + t_{y_i}(x_j) + a_j

is doubly 0-astic and that d contains a strictly doubly 0-astic template
s ∈ (F_{±∞}^W)^W, |W| = k.
d. A Linear Programming Criterion. We can show that Problem (12) can
be stated as a linear programming problem for this bounded l-group.

Theorem 3.22. Let t ∈ ((R_{±∞})^X)^Y be doubly F-astic and b ∈ F^Y be finite. Let I be
the set of index pairs (i, j) such that t_{y_i}(x_j) is finite, 1 ≤ i ≤ n, 1 ≤ j ≤ m. Then
a sufficient condition that the equation a ⊻ t = b be soluble is that the
following optimization problem in the variables z_{ij}, (i, j) ∈ I, have a solution
{z_{ij} : (i, j) ∈ I}:

Minimize   Σ_{(i,j) ∈ I} (b(y_i) - t_{y_i}(x_j)) z_{ij}

subject to

Σ_j z_{ij} > 0,   i = 1, ..., n,
z_{ij} ≥ 0,   (i, j) ∈ I.
We now make a definition that will be used in the next section. Let F_{±∞}
be a belt, and let t ∈ (F_{±∞}^X)^Y be arbitrary. The right column space of t is the set
of all b ∈ F_{±∞}^Y for which the equation a ⊻ t = b is soluble for a.
e. Linear Dependence. Linear dependence over a bounded l-group. We
consider the equation a ⊻ t = b in another way. For the images t'_{x_j}, rewrite
a ⊻ t = b as

⋁_{j=1}^m [t'_{x_j} ⊻ a(x_j)] = b,

where a(x_j) denotes the one-point template with target pixel value a(x_j).
In this case, we say that b is a linear combination of {t'_{x_1}, t'_{x_2}, ..., t'_{x_m}}, or that
b ∈ F_{±∞}^Y is (right) linearly dependent on the set {t'_{x_1}, t'_{x_2}, ..., t'_{x_m}}. While in
linear algebra the concept of linear dependence provides a foundation for a
theory of rank and dimension, the situation in the minimax algebra is more
complicated. The notion of strong linear independence is introduced to give
us a similar construct.
Theorem 3.23. Let F_{±∞} be a bounded l-group other than F_0, the bounded
l-group whose group is the trivial group {0}. Let X be a
coordinate set such that |X| ≥ 2, and k ≥ 1 be an arbitrary integer. Then we can
always find k finite images on X, no one of which is linearly dependent on the
others.

If F_{±∞} = F_0, then we can produce a dimensional anomaly.

Theorem 3.24. Suppose F_{±∞} = F_0, and let X be a coordinate set such that
|X| = m ≥ 2. Then we can always find at least (m² - m) images on X, no one
of which is linearly dependent on the others.

Since every bounded l-group contains a copy of F_0, the dimensional
anomaly in Theorem 3.24 extends to any arbitrary bounded l-group.
Let |X| = m, |Y| = n, and t ∈ (F^X)^Y where F is an arbitrary bounded l-group.
We would like to define the rank of t in terms of linear independence, and to
be equal to the number of linearly independent images t'_{x_j} of t. Suppose we
were to define linear independence as the negation of linear dependence, that
is, a set of k images on X, (a_1, ..., a_k), is linearly independent if and only if no
one of the a_i is linearly dependent on any subset of the others. Then applying
Theorem 3.23 for |X| = n and k > n, we could find k finite images that are
linearly independent. If we defined rank as the number of linearly independent
images t_y of t, then every template would have rank k ≥ n, which is
not a useful definition in this context.
Strong linear independence. As for the matrix algebra, we define the concept
of strong linear independence.

Let F_{±∞} be a bounded l-group and let a(1), ..., a(k) ∈ F_{±∞}^X, k ≥ 1. We say
that the set {a(1), ..., a(k)} is strongly linearly independent, or SLI, if there
is at least one finite image b ∈ F^X that has a unique expression of the form

b = ⋁_{p=1}^k (a(j_p) + λ_{j_p}).

If instead b has a unique expression of the form

b = ⋀_{p=1}^k (a(j_p) +' λ_{j_p}),

then we have the concept of right dual SLI. We define in an analogous way
the concept of left dual SLI.
3. Rank of Templates

a. Template Rank over a Bounded l-group. Let F_{±∞} be a bounded l-group
and t ∈ (F_{±∞}^X)^Y be arbitrary. We call the template t (right or) left column
regular if the set of images {t'_x}_{x ∈ X} is (right or) left SLI, respectively. We say
t is right or left row regular if the template t' is right or left column regular,
respectively.

Now suppose that F_{±∞} is a bounded l-group and t ∈ (F_{±∞}^X)^Y. Suppose r is
the maximum number of images t'_x of t that are SLI. In this case we say
that t has column rank equal to r. The row rank of t is the column rank of t'.
For a template t ∈ (F_{±∞}^X)^Y, we say that t has 0-astic rank equal to r ∈ Z_+ if the
following is true for k = r but not for k > r:

Let W be a coordinate set, |W| = k ≤ min(m, n). There exist a ∈ F^X and
b ∈ F^Y, both finite, such that the template s ∈ (F_{±∞}^X)^Y is doubly 0-astic and
s contains a strictly doubly 0-astic template u ∈ (F_{±∞}^W)^W, where

s_{y_i}(x_j) = b(y_i) + t_{y_i}(x_j) + a(x_j),   ∀ i = 1, ..., n and j = 1, ..., m,

for F = R.

Lemma 3.28. Let F_{±∞} be a bounded l-group with group F = R, and suppose
that t ∈ (F_{±∞}^X)^Y has 0-astic rank equal to r. Then t is doubly F-astic and t'
contains a set of at least r images, t'_{x_k}, k = 1, ..., r, which are SLI.

Lemma 3.29. Let F = R, and suppose that t ∈ (F_{±∞}^X)^Y is doubly F-astic and
consists of a set of r images which are SLI. Then t has 0-astic rank equal to at
least r.

Accordingly, we have the following theorem:

Theorem 3.30. Let F = R, and suppose that t ∈ (F_{±∞}^X)^Y is doubly F-astic.
Then the following statements are all equivalent:

(i) t has 0-astic rank equal to r;
(ii) t has right column rank equal to r;
FIGURE 4. A template and its associated graph: (a) a template t; (b) associated graph A(t).

Γ(t) = (I ∨ t)^{n-1} ⊻ t.

Lemma 3.40. (t ∨ I)^{r-1} = I ∨ t ∨ ⋯ ∨ t^{r-1}, t ∈ (F_{±∞}^X)^X.

Theorem 3.41. Let t ∈ (F_{±∞}^X)^X be definite. Then

t^r ≤ Γ(t),   r = 1, 2, ....

Theorem 3.42. Let t ∈ (F_{±∞}^X)^X be definite. Then
soluble, then every finite eigenimage has the same unique corresponding finite
eigenvalue λ. The template t - λ is definite, and all finite eigenimages of t
lie in the eigenspace of t - λ. The non-equivalent fundamental eigenimages
that generate this space have the property that no one of them is linearly
dependent on (any subset of) the others.

The unique scalar in Theorem 3.55, when it exists, is called the principal
eigenvalue of t.

We call a bounded l-group F radicable if for each a ∈ F and integer k ≥ 1,
there exists a unique f ∈ F such that f^k = a.
Some examples of radicable bounded l-groups are R_{±∞}, Q_{±∞}, and R_{≥0}^∞.
However, Z_{±∞} is not radicable. Choosing a = 12 and k = 5, solving for f in
the equation

f^5 = 12

is just solving for f in (using regular arithmetic)

5f = 12,

which, of course, has no integral solution.
Let F be a radicable bounded l-group, and t ∈ (F_{±∞}^X)^X. Let σ = (y_0,
y_1, ..., y_m) be a circuit in A(t). We define the length of σ to be m. For each
circuit σ in A(t), of length l(σ) and having circuit product p(σ), we define a
circuit mean μ(σ) ∈ F by

[μ(σ)]^{l(σ)} = p(σ).

We also define

λ(t) = ⋁ {μ(σ) : σ is a simple circuit in A(t)}.

For the template and associated graph A(t) in Fig. 5 we have the following
computations.
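The maximum circuit mean λ(t) can be computed by brute force on a small template. The 3 × 3 matrix below is my own illustrative example, not the template of Fig. 5:

```python
from itertools import permutations

NEG = float("-inf")

def max_cycle_mean(T):
    """lambda(t): the maximum of p(sigma) / l(sigma) over all simple circuits
    sigma of the graph A(t), by brute-force enumeration (fine for small n)."""
    n = len(T)
    best = NEG
    for length in range(1, n + 1):
        for cyc in permutations(range(n), length):
            edges_ok = all(T[cyc[k]][cyc[(k + 1) % length]] != NEG
                           for k in range(length))
            if edges_ok:
                w = sum(T[cyc[k]][cyc[(k + 1) % length]] for k in range(length))
                best = max(best, w / length)
    return best

T = [[NEG, 2, NEG],
     [NEG, NEG, 3],
     [1, NEG, -2]]
print(max_cycle_mean(T))   # 2.0: the circuit 0 -> 1 -> 2 -> 0 has mean (2+3+1)/3
```

The self-loop at node 2 has mean -2, so the three-node circuit dominates and λ(t) = 2.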
Lemma 3.61. Let a ∈ F^n be finite, and let t ∈ A_n(-∞) satisfy S_{-∞}(t_i) ≠
∅ ∀ i = 1, ..., n. Define t̂ by t̂ = (t*)'. Then both t̂ ×' a and t' × (t̂ ×' a)
are finite, and

t' × (t̂ ×' a) ≤ a.

Proof. Write b = t̂ ×' a and c = t' × b. For each i, let k attain the maximum
c_i = t'_{ik} + b_k, and let p attain the minimum b_k = t̂_{kp} + a_p; these are finite
by our choice of p and the fact that t̂_{kp} ∈ F and a_p ∈ F ∀ i. Thus,

c_i = t'_{ik} + b_k = t'_{ik} + t̂_{kp} + a_p ≤ t'_{ik} + t̂_{ki} + a_i = t_{ki} + (-t_{ki}) + a_i = a_i.

Thus, c_i ≤ a_i, and our lemma is proved.
We now state the Division Algorithm.

Theorem 3.62 (The Division Algorithm). Let a, t satisfy the hypotheses of
Lemma 3.61. Then for q = t̂ ×' a, and r defined by

r_i = a_i    if a_i > [t' × (t̂ ×' a)]_i,
r_i = -∞   if a_i = [t' × (t̂ ×' a)]_i,

we have

a = (t' × q) ∨ r.

Proof. By Lemma 3.61, a ≥ t' × q = t' × (t̂ ×' a), and hence,

a ≥ r.

Thus, [t' × (t̂ ×' a)] ∨ r ≤ a. To show that equality holds, that is, that
[t' × (t̂ ×' a)] ∨ r = a, we examine two cases.

Case 1. a_i > [t' × (t̂ ×' a)]_i.
Here, [t' × (t̂ ×' a)]_i ∨ r_i = [t' × (t̂ ×' a)]_i ∨ a_i = a_i.

Case 2. a_i = [t' × (t̂ ×' a)]_i.
Here, [t' × (t̂ ×' a)]_i ∨ r_i = a_i ∨ r_i = a_i ∨ -∞ = a_i.
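One step of the Division Algorithm can be checked numerically. The small finite matrix and vector below are my own illustrative data; the code follows the construction of q and r in Theorem 3.62:

```python
NEG = float("-inf")

def maxplus_mv(T, v):
    """Max-plus product T x v: result_i = max_k T[i][k] + v[k]."""
    return [max(T[i][k] + v[k] for k in range(len(v))) for i in range(len(T))]

def minplus_mv(T, v):
    """Min-plus product T x' v: result_i = min_k T[i][k] + v[k]."""
    return [min(T[i][k] + v[k] for k in range(len(v))) for i in range(len(T))]

def division(Tp, a):
    """One step of the Division Algorithm: a = (t' x q) v r."""
    # t-hat = (t*)': negate-and-transpose applied to t'.
    That = [[-Tp[j][i] for j in range(len(Tp))] for i in range(len(Tp[0]))]
    q = minplus_mv(That, a)          # q = t-hat x' a
    low = maxplus_mv(Tp, q)          # t' x q, which is <= a by Lemma 3.61
    r = [a[i] if a[i] > low[i] else NEG for i in range(len(a))]
    return q, r, low

Tp = [[0, -1], [2, 0]]               # a finite 2 x 2 matrix playing the role of t'
a = [3, 6]
q, r, low = division(Tp, a)
recon = [max(low[i], r[i]) for i in range(len(a))]
print(recon == a)   # True: the decomposition recovers a
```

Here low = [3, 5] ≤ a = [3, 6], r picks out the coordinate where the bound is strict, and the join reconstructs a exactly, as the theorem asserts.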
Then we have

a = a⁰ = (t' × a¹) ∨ r⁰.   (16)

By Lemma 3.61, a¹ = t̂ ×' a⁰ is finite, and in fact, a^{i+1} = t̂ ×' a^i will be finite
for each i = 1, 2, .... Thus the Division Algorithm applies in particular to a¹:

a¹ = (t' × a²) ∨ r¹,   (17)

and substituting (17) into (16), we get

a = (t' × a¹) ∨ r⁰
  = {t' × [(t' × a²) ∨ r¹]} ∨ r⁰
  = (t' × t' × a²) ∨ (t' × r¹) ∨ r⁰
  = [(t')² × a²] ∨ (t' × r¹) ∨ r⁰,
Then a ⊻ s = a - b, where

s_y(x) = -b(y) if x = y, and s_y(x) = -∞ otherwise.

Using this, we have that

a = (q ⊻ t') ∨ r.

Proof. We need to show that Ψ(r) matches with our definition of the matrix
r in Theorem 3.62. Let b = (a ⊼ t̂) ⊻ t'. Then, using Lemma 3.67,
a - b = a ⊻ s, where

s_y(x) = -b(y) if x = y, and s_y(x) = -∞ otherwise.

Thus, a - b ≥ 0 implies that a ⊻ s ≥ 0. Now,

χ_{≥0}(a ⊻ s) = c ∈ F^X,   where c(x) = 0 if a(x) > b(x), and c(x) = -∞ if a(x) = b(x).
where any template t raised to the zeroth power, t⁰, is the identity template e.
In the boolean case, there exists an integer m such that

a^m ⊻ (t')^m = 0,

so that the expression for a becomes

a = ⋁_{k=0}^m r^k ⊻ (t')^k.

We also have that

a = (q ⊻ t̂) ∧ r.
The volume v(k) of the blanket between the upper and lower surfaces is
calculated for each k by computing

v(k) = Σ_{x ∈ X} (u_k(x) - b_k(x)),

where u_k and b_k are the upper and lower surfaces at scale k, and the surface
area is then estimated by

area(k) = v(k)/2k.
The rate of change of log(area(k)) with respect to log(k) contains
important information about the image. The slope S(k) of area(k) versus k
is computed on a log-log scale for each k by finding the best-fitting straight
line through the three points

(log(k - 1), log(area(k - 1))),   (log(k), log(area(k))),   (log(k + 1), log(area(k + 1))).
The graph of S(k) versus k is called the signature of the image. We can also
calculate a signature for the case where the array X represents the bottom
surface and uk the upper surface. We call this the upper signature. Similarly,
the signature that is calculated using {b,} for the lower surfaces and X for the
upper surfaces is called the lower signature.
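The signature computation can be sketched on a 1-D profile. The unit dilation/erosion recursion below follows Peleg's blanket construction, which the surfaces u_k and b_k in the text are assumed to follow; the example profile is invented:

```python
import math

def signature(f, kmax):
    """Blanket signature of a 1-D gray-level profile f: u_k / b_k are unit
    dilations / erosions of the surface, v(k) the volume between them,
    area(k) = v(k) / (2k), and S(k) the 3-point least-squares log-log slope."""
    n = len(f)
    u, b = list(f), list(f)
    area = {}
    for k in range(1, kmax + 1):
        # Grow the blanket by one unit: raise u, lower b, and spread laterally.
        u = [max(u[i] + 1, max(u[max(i - 1, 0):i + 2])) for i in range(n)]
        b = [min(b[i] - 1, min(b[max(i - 1, 0):i + 2])) for i in range(n)]
        v = sum(ui - bi for ui, bi in zip(u, b))
        area[k] = v / (2 * k)
    S = {}
    for k in range(2, kmax):
        xs = [math.log(k - 1), math.log(k), math.log(k + 1)]
        ys = [math.log(area[k - 1]), math.log(area[k]), math.log(area[k + 1])]
        xbar, ybar = sum(xs) / 3, sum(ys) / 3
        S[k] = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
                / sum((x - xbar) ** 2 for x in xs))
    return S

f = [3, 7, 2, 9, 4, 8, 1, 6, 5, 7, 2, 8]   # a rough 1-D profile
print(signature(f, 6))
```

Large negative slopes at small k indicate rapid loss of fine gray-level detail, which is the "high-frequency variation" reading discussed below.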
This algorithm was run on 12 outdoor images of size 120 × 240, having
255 gray values. For each image, we calculated the upper and lower images
u_i, b_i, i = 1, ..., 50, and the graph of the upper and lower signatures.
As k increases, regions of pixels initially having the greatest gray values
decrease in size in the images b, . However, as k increases, the images uk shrink
regions having lower gray values. In theory, this asymmetry can be used to
advantage. Roughly, the lower signature represents the shape of objects with
high gray values, and the upper signature represents the distribution of
objects throughout the image. The images to which we applied this method
were infrared, so we were mainly interested in the lower signatures.
The magnitude of the curve S(k) is related to the information lost on
objects with details less than k in size. The more gray-level variation at
distance k, the higher the values for S(k). Thus, if at small k, S(k) is large,
then there are “high-frequency” gray-level variations, and if at large k, S(k)
is large, then we have “low-frequency’’ gray-level variations. The curve S(k)
thus gives us information about the rate of change of variations in the
gray-level surface.
After running the program on a dozen images, we have concluded that this
algorithm is too sensitive to the great variance in outdoor scenery. For
example, an image that has a background of trees and no man-made objects,
and an image that has two distinct man-made objects and no trees as
background have similar graphs of the signatures. While the lower signature
represents more of the shape of the hot objects (areas with high gray values)
in the image, in one image we have no hot objects while in the other, there
are two distinct hot objects. As another example, in two other images we have
a man-made object with a road and a field as background, yet the graphs for
the upper signatures of these images are very distinct. The theory suggests
that upper signatures should represent similar targets, but we cannot draw
that conclusion from this data. A controlled scene such as a conveyor belt or
other industrial scene will most likely produce better results than outdoor
scenery.
The initial goal of investigating this type of complexity measure was that
these graphs would give a measure of gray-level variation within an image
and help in choosing a more effective edge operator. If an image has a high
incidence of gray-level variation at small values of k, then it is reasonable to
assume that a more sensitive mask, such as the gradient mask, would give
better results. Otherwise, if an image had small values of S(k) at small values
of k, then computation time could be saved by using a Sobel operator instead
of a computationally intensive edge operator such as the Kirsch. Unfortunately,
the algorithm did not produce data that leads to this conclusion.
Σ_{j=1}^n z_{ij} ≤ p_i,   i = 1, ..., m,

subject to

z_{ij} ≥ 0 for all i, j.
Let x_i be the dual variable associated with the ith constraint in (19), and
y_j the dual variable associated with the jth constraint in (20). Then the dual
transportation problem is given by Murty (1976):

maximize   Σ_{j=1}^n d_j y_j - Σ_{i=1}^m p_i x_i

subject to

-x_i + y_j ≤ c_{ij} for all i, j,
x_i ≥ 0, y_j ≥ 0 for all i, j.

This is equivalent to solving

minimize   Σ_{i=1}^m p_i x_i - Σ_{j=1}^n d_j y_j

subject to

-x_i + y_j ≤ c_{ij} for all i, j,
x_i ≥ 0, y_j ≥ 0 for all i, j.
Then

u_i = min_{j=1,...,n} {c_{ij} + v_j},   (21)

where u = (u_1, ..., u_m) and v = (v_1, ..., v_n) are optimal feasible solutions.
We can rewrite (21) in vector notation as

u = c ×' v,

and u_i, v_j ≤ 0.
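The vector form of (21) is just a min-plus matrix-vector product. A minimal sketch, with an invented cost matrix and invented dual values standing in for an actual optimal solution:

```python
def minplus_mv(C, v):
    """u = c x' v: u_i = min_j (C[i][j] + v[j])."""
    return [min(C[i][j] + v[j] for j in range(len(v))) for i in range(len(C))]

C = [[4, 1, 3],
     [2, 0, 5]]          # unit costs c_ij (illustrative)
v = [-1, -2, 0]          # dual variables v_j (illustrative values)
print(minplus_mv(C, v))  # [-1, -2]
```

The same routine is the additive-minimum product used throughout the chapter, which is what makes the LP dual expressible in the image algebra.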
To formulate this problem in the context of the image algebra, we define
X and Y to be nonempty, finite coordinate sets, 1x1 = m, IYI = n. Define
d ∈ ((R_{±∞})^X)^Y. Then we have the following correspondence between the LP
formulation and the image algebra formulation.
REFERENCES
Backhouse, R. C., and Carré, B. (1975). "Regular algebra applied to path-finding problems,"
J. Inst. Math. Appl. 15, 161-186.
Batcher, K. E. (1980). "Design of a massively parallel processor," IEEE Trans. Computers 29(9),
836-840.
Benzaken, C. (1968). Structures algébriques des cheminements. In "Network and Switching Theory"
(Biorci, ed.), pp. 40-57. Academic Press.
Birkhoff, G. (1940). "Lattice Theory," Vol. 25. AMS, Providence, RI.
Birkhoff, G., and Lipson, J. L. (1970). "Heterogeneous algebras," J. Combinatorial Theory 8,
115-133.
Carré, B. (1971). "An algebra for network routing problems," J. Inst. Math. Appl. 7, 273-294.
Cloud, E., and Holsztynski, W. (1984). "Higher efficiency for parallel processors." In Proc. IEEE
Southcon 84, pp. 416-422. Orlando, FL.
Cohen, J. E. (1988). "Subadditivity, generalized products of random matrices and operations
research," SIAM Review, 69-86.
Crimmins, T. R., and Brown, W. M. (1985). "Image algebra and automatic shape recognition,"
IEEE Trans. Aerospace and Elec. Systems AES-21(1), 60-69.
Cuninghame-Green, R. (1960). Process synchronisation in steelworks - a problem of feasibility.
In "Proc. 2nd Int. Conf. on Oper. Research" (Banbury, ed.), pp. 323-328. London, English
University Press.
Cuninghame-Green, R. (1962). "Describing industrial processes with interference and
approximating their steady-state behaviour," Oper. Research Quart. 13, 95-100.
Cuninghame-Green, R. (1979). "Minimax Algebra: Lecture Notes in Economics and Mathematical
Systems 166." Springer-Verlag, New York.
Davidson, J. L. (1989). Lattice Structures in the Image Algebra and Applications to Image
Processing. PhD thesis, Department of Mathematics, University of Florida, Gainesville, FL.
Davidson, J. L. (1990). "A classification of lattice transformations used in image processing,"
accepted for pub. in Comp. Vis., Graphics, and Image Proc., 1992.
Davidson, J. L. (1991). "Nonlinear matrix decompositions and an application to parallel
processing," accepted for pub. in Journal of Mathematical Imaging and Vision, 1992.
Davidson, J. L., and Ritter, G. X. (1990). Theory of morphological neural networks. In "Proc.
of the 1990 SPIE OE/LASE Optics, Elec.-Optics, and Laser Appl. in Sci. and Eng.,"
Vol. 1215, pp. 378-388. Los Angeles, CA.
Davidson, J. L., and Sun, K. (1991). Template learning in morphological neural nets. In "SPIE
- Proc. Soc. of Photo-Optical Instr. Eng.," Vol. 1568, San Diego, CA.
Duff, M. J. B. (1982). CLIP4. In "Special Computer Architectures for Pattern Processing"
(K. S. Fu, ed.). CRC Press, Boca Raton, FL.
Fountain, T. J., Matthews, K. N., and Duff, M. J. B. (1988). "The CLIP7A image processor,"
IEEE Trans. PAMI 10(3).
Fraleigh, J. B. (1967). "A First Course in Abstract Algebra." Addison-Wesley, Reading, MA.
Gader, P. D. (1986). Image Algebra Techniques for Parallel Computation of Discrete Fourier
Transforms and General Linear Transforms. PhD thesis, University of Florida, Gainesville,
FL.
Gader, P. D. (1988). "Necessary and sufficient conditions for the existence of local matrix
decompositions," SIAM Journal on Matrix Analysis and Applications, 305-313.
Gader, P. D. (1989). "Bidiagonal factorization of Fourier matrices and systolic algorithms for
computing discrete Fourier transforms," IEEE Trans. on ASSP 37(8).
Giffler, B. (1960). "Mathematical solution of production planning and scheduling problems."
Technical Report, IBM.
Hadwiger, H. (1950). "Minkowskische Addition und Subtraktion beliebiger Punktmengen und
die Theoreme von Erhard Schmidt," Mathematische Zeitschrift 53, 210-218.
Hadwiger, H. (1957). "Vorlesungen über Inhalt, Oberfläche und Isoperimetrie." Springer-Verlag,
Berlin.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). "Image analysis using mathematical
morphology," IEEE Trans. PAMI PAMI-9(4), 532-550.
Heijmans, H. J. A. M. (1990). "The algebraic basis of mathematical morphology, I: Dilations
and erosions," Comp. Vis., Graphics, and Image Proc. 50, 245-295.
Hillis, W. D. (1985). "The Connection Machine." MIT Press, Cambridge, MA.
Klein, J. C., and Serra, J. (1972). "The texture analyzer," J. Micros. 95, 349-356.
Maragos, P. (1985). A Unified Theory of Translation-Invariant Systems with Applications to
Morphological Analysis and Coding of Images. PhD thesis, Georgia Inst. Tech., Atlanta, GA.
Maragos, P., and Schafer, R. W. (1987). "Morphological filters, part I: Their set-theoretic
analysis and relations to linear shift-invariant filters," IEEE Trans. Acoustics, Speech, and
Signal Proc. ASSP-35, 1153-1169.
Matheron, G. (1967). "Eléments pour une Théorie des Milieux Poreux." Masson, Paris.
McCubbrey, D. L., and Lougheed, R. M. (1985). "Morphological image analysis using a raster
pipeline processor." In "IEEE Comp. Workshop on Comp. Arch. for Patt. Anal. and Image
Database Mngmt.," pp. 444-452, Miami Beach, FL.
Meyer, F. (1978). "Iterative image transformation for an automatic screening of cervical
smears," Journal of Histochem. and Cytochem. 27(1), 128-135.
Miller, P. E. (1978). An Investigation of Boolean Image Neighborhood Transformations. PhD
thesis, Ohio State University.
Miller, P. E. (1983). "Development of a mathematical structure for image processing." Technical
Report, Optical Division, Perkin-Elmer.
Minkowski, H. (1903). "Volumen und Oberfläche," Mathematische Annalen 57, 447-495.
Minkowski, H. (1911). "Gesammelte Abhandlungen." Teubner Verlag, Leipzig-Berlin.
Murty, K. G. (1976). "Linear and Combinatorial Programming." John Wiley, New York.
Nakagawa, Y., and Rosenfeld, A. (1978). "A note on the use of local min and max operations
in digital picture processing," IEEE Trans. Sys., Man, and Cyber. SMC-8, 632-635.
Parlett, B. N. (1982). "Winograd's Fourier transform via circulants," Linear Algebra Appl. 45,
137-155.
Peleg, S. (1983). "Multiple resolution texture analysis and classification." Technical Report,
Center for Automation Research, University of Maryland, College Park, MD.
Peteanu, V. (1967). "An algebra of the optimal path in networks," Mathematica 9, 335-342.
Ritter, G. X., Davidson, J. L., and Wilson, J. N. (1987). Beyond mathematical morphology. In
“Proc. of SPIE Conf. - Visual Communication and Image Processing 11,” Vol. 845, 260-269.
Cambridge, MA.
Ritter, G. X., and Gader, P. D. (1987). “Image algebra techniques for parallel image pro-
cessing,” Journal of Parallel and Distributed Computing 4(5), 7-44.
Ritter, G. X., Shrader-Frechette, M. A., and Wilson, J. N. (1987b). Image algebra: A rigorous
and translucent way of expressing all image processing operations. In “Proc. of the 1987 SPIE
Tech. Symp. Southeast on Optics, Elec.-Opt., and Sensors,” 116-121. Orlando, FL.
Ritter, G . X . , and Wilson, J. N. (1987). Image algebra: A unified approach to image processing.
In “Proceedings of the SPIE Medical Imaging Conference,” Newport Beach, CA.
Ritter, G . X., Wilson, J. N., and Davidson, J. L. (1990). “Image algebra: An overview,” Comp.
Vis., Graphics. and Image Proc. 49(3), 297-33 I .
Rose, D. J. (1980). “Matrix identities of the fast Fourier transform,” Linear Algebra Appl. 29,
423-443.
Serra, J. (1969). Introduction a la morphologie mathtmatique. Technical Report, Cahiers du
Centre de Morphologie Mathematique, Fontainebleau, France.
Serra, J. (1975). Morphologie pour les fonctions “a peu pris en tout ou rien.” Tehnical Report,
Cahiers du Centre de Morphologie Mathematique, Fontainebleau, France.
Serra, J. (1982). “Image Analysis and Mathematical Morphology.” Academic Press, London.
Serra, J. (1988). “Image Analysis and Mathematical Morphology, Volume 2: Theoretical
Advances.” Academic Press, New York.
Shimbel, A. (1954). Structure in communication nets. In “Proc. Symp. on Information
Networks,” 119-203. Polytechnic Institute of Brooklyn.
Sinha, D., and Giardina, C. R. (1990). “Discrete black and white object recognition via
morphological functions,” IEEE Trans. PAMI PAMI-12(3), 275-293.
Sternberg, S. R. (1980a). Cellular computers and biomedical image processing. In “Lecture
Notes in Medical Informatics, Proc. on Biomedical Images and Computers,” (J. Sklansky,
ed.), Vol. 17, 274-319. Springer-Verlag, Berlin.
Sternberg, S. R. (1980b). Language and architecture for parallel image processing. In “Conf. on
Patt. Rec. in Practice,” Amsterdam.
Sternberg, S. R. (1983). “Biomedical image processing,” Computer 16(1), 22-34.
Sternberg, S. R. (1985). Overview of image algebra and related issues. In “Integrated Technology
for Parallel Image Processing.” Academic Press, London.
130 JENNIFER L. DAVIDSON
Sternberg, S. R. (1986). Grayscale Morphology, Comp. Vis., Graph., and Image Processing 35,
pp. 333-355.
Uhr, L. (1983). Pyramid multi-computer structures, and augmented pyramids. In “Computing
Structures for Image Processing,” (M. J. B. Duff,ed.). Academic Press, London.
Unger, S. H. (1958). “A computer oriented toward spatial problems,” Proc. IRE46, 1744-1750.
Von Neumann, J. (1951). The general logical theory of automata. In “Cerebral Mechanism in
Behavior: The Hixon Symposium.” Wiley and Sons.
Zhuang, X., and Haralick, R. M. (1986). “Morphological structuring element decomposition,”
Comp. Vis.,Graph.. and Image Proc. 35, 370-382.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84

Invariant Pattern Representations and Lie Groups Theory

MARIO FERRARO
I. Introduction 131
II. The LTG/NP Approach to Visual Perception 137
III. Invariant Integral Transforms and Lie Transformation Groups 142
   A. Background 142
   B. Necessary and Sufficient Conditions for the Invariance of Integral Transforms 146
   C. Examples 152
   D. Invariant Functions and Kernels of Integral Transforms 154
IV. Transformations of Integral Transforms 157
   A. Weakly Invariant Representations 157
   B. "Covariance" of Integral Transforms 160
V. Notes on Invariant Representations of 3D Objects 166
VI. Discussion 177
Appendix A 181
Appendix B 188
References 192
I. INTRODUCTION
Suppose the pattern undergoes a rotation with parameter value a; the transformed pattern is

   T_a f(x, y) = const   if (y cos a + x sin a) − k(x cos a − y sin a) = 0,
   T_a f(x, y) = 0        otherwise,

and T_a f(x, y) ≠ T_0 f(x, y) = f(x, y).
Thus, patterns, or pattern features, are generally not invariant; indeed,
invariance is a property of the process of perception and not of the images.
This point is crucial for understanding the problem of invariant coding, and
it is often missed in the literature.
Representations satisfying the condition of weak invariance can be
obtained from f(x, y) in a variety of ways. The average of image intensity is
invariant to rigid motion, and representations based on geometric features of
the pattern, such as critical points (Haralick et al., 1983) or the Gaussian
curvature (Zetzsche and Barth, 1990), enjoy the same property, but none of
these preserves the uniqueness of image representation. A single real-valued
scalar function in the domain (x,y ) cannot define a representation that is
both invariant and unique, and the reason is obvious: if the function changes
under the action of the transformation, invariance is lost, whereas if it does
not change, the transformational state is not encoded. Thus, it is apparent
that invariant representations must be sought by mapping the pattern from
(x, y ) to some suitable space (u, v), that is, by giving the image a new
representation in (u, v) that possesses the desired property of invariance, and
comparison of images can be made in this new space. To draw conclusions
about the original patterns in the spatial domain, such mapping must
preserve the uniqueness of the image.
An encoding of visual information, alternative to the form f(x, y), is given by the complex-valued integral transforms of images. Formally, a general integral transform of f(x, y) is

   g(u, v) = ∫∫ f(x, y) k(u, v; x, y) dx dy, (2)

for some kernel k(u, v; x, y); g(u, v) is, in general, a complex-valued function and can be written as

   g(u, v) = A(u, v) exp[iφ(u, v)], (3)

where A(u, v) = |g(u, v)| and φ(u, v) correspond to the magnitude (amplitude) and phase spectra respectively. The representation of the image in the domain (u, v) is defined by the pair

   {A(u, v), φ(u, v)}, (4)

that is, to any point (u, v) in the domain is associated a vector with
INVARIANT PATTERN REPRESENTATIONS AND LIE GROUPS THEORY 135
components A(u, v) and φ(u, v). Note that in the domain (x, y) the representation was given by a scalar f(x, y) defined for each value of the pair (x, y). In the following, to simplify the notation, we use g(u, v) both to denote the result of the operation of the integral transform, i.e., a map from (u, v) to the complex plane, and as a shorthand for the representation {A(u, v), φ(u, v)}.
In the vector-valued representation (4), the requirements of invariance and uniqueness are determined by the amplitude and phase components respectively. The invariance condition is fulfilled if A(u, v) is constant for all states of the image transformation T_a, whereas uniqueness is preserved if different states are uniquely coded in the phase component of the transform (Ferraro and Caelli, 1988):
Note that the action of T_a is defined in the domain (x, y) and not in (u, v), i.e., T_a acts on the original pattern form f(x, y). With a slight abuse of notation we shall call an integral transform satisfying condition (5) invariant in the strong sense with respect to T_a, since the corresponding representation g(u, v) is strongly invariant. Condition (5) can be extended to two transformation groups T_a, S_b. We require that
For example, the shift theorem (Rosenfeld and Kak, 1982; Papoulis, 1984) demonstrates that the Fourier transform of f(x, y) is invariant in the strong sense with respect to translations along the x and y axes. If
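The shift theorem just invoked is easy to verify numerically; a minimal sketch with NumPy follows (cyclic shifts stand in for translations, and the grid size is an arbitrary choice):

```python
import numpy as np

# Shift theorem: translating f(x, y) multiplies F(u, v) by a pure phase,
# so the amplitude spectrum A(u, v) = |F(u, v)| is unchanged, while the
# phase spectrum encodes the translation.
rng = np.random.default_rng(0)
f = rng.random((32, 32))
f_shifted = np.roll(f, shift=(5, 9), axis=(0, 1))   # cyclic translation

F = np.fft.fft2(f)
Fs = np.fft.fft2(f_shifted)

print(np.allclose(np.abs(F), np.abs(Fs)))   # True: amplitude is invariant
# the phase factor is exactly exp(-2*pi*i*(5u + 9v)/32)
u, v = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
expected = np.exp(-2j * np.pi * (5 * u + 9 * v) / 32)
print(np.allclose(Fs, F * expected))        # True
```

The amplitude is thus the strongly invariant component, and the phase carries the transformational state, exactly as condition (5) requires.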
perceived paths of apparent motion are indeed Lie orbits, provided that the
angular separation between subsequent stimuli is not too large.
The LTG/NP model itself supplies only a general language for contour
perception and not a computational procedure; a formulation is needed that
can predict the direction of the contour as a function of some neural process
and of the geometry of the stimulus. In the general framework of LTG/NP,
Caelli et al. (1978) proposed a model in which vectors tangent to a contour were computed from a sample of N points. To each pair of points P_i = P(x_i, y_i), P_j = P(x_j, y_j) is assigned a measure of association by the function w_a(r_ij) = exp(−a r_ij), where r_ij is the distance between P_i and P_j, and a is a constant. Next, for each point the components u_i, v_i of the tangent vectors are calculated in two steps. First, the averages ū_i, v̄_i are computed with the formulae

   ū_i = Σ_{j=1}^{N} cos 2θ_ij w_a(r_ij), (7a)

   v̄_i = Σ_{j=1}^{N} sin 2θ_ij w_a(r_ij). (7b)
Note that ū, v̄ are the weighted averages of cos 2θ and sin 2θ respectively, and these averages are calculated because they have the property that vectors with the same orientation but opposite sign give the same contribution. (We are interested here in determining only the orientation of the tangent vectors.) The components u_i, v_i of the tangent vectors are obtained by converting the "2θ averages" to "θ averages," that is, by calculating

   (u_i, v_i) = (r_i cos θ_i, r_i sin θ_i)

so that

   (ū_i, v̄_i) = (r_i cos 2θ_i, r_i sin 2θ_i),
and the orientation of the contour at P_i is estimated by the angle θ_i. This method is consistent with the idea that the visual system samples the visual stimulus and that contours of patterns are reconstructed by the action of a vector field; but it must be noted that there is no explicit calculation of the integral curves that are solutions of the equations of the vector field.
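Equations (7a, b) and the 2θ-to-θ conversion can be sketched directly (a toy implementation; the sample point set and the constant a are arbitrary choices, and only the orientation θ_i, not the magnitude r_i, is returned):

```python
import numpy as np

def tangent_orientations(pts, a=1.0):
    """Estimate the contour orientation at each sample point from the
    weighted double-angle averages of Eqs. (7a, b)."""
    pts = np.asarray(pts, dtype=float)
    n = len(pts)
    theta = np.zeros(n)
    for i in range(n):
        d = pts - pts[i]
        r = np.hypot(d[:, 0], d[:, 1])          # r_ij
        ang = np.arctan2(d[:, 1], d[:, 0])      # theta_ij
        w = np.exp(-a * r)                      # w_a(r_ij)
        w[i] = 0.0                              # exclude the point itself
        u_bar = np.sum(np.cos(2 * ang) * w)     # Eq. (7a)
        v_bar = np.sum(np.sin(2 * ang) * w)     # Eq. (7b)
        theta[i] = 0.5 * np.arctan2(v_bar, u_bar)   # back from 2*theta to theta
    return theta

# points sampled on a straight line at 45 degrees
t = np.linspace(0.0, 1.0, 20)
pts = np.stack([t, t], axis=1)
est = tangent_orientations(pts)
print(np.allclose(est, np.pi / 4))   # True: orientation recovered everywhere
```

Note how the double-angle trick does its job here: points behind P_i contribute at angle θ − π, yet cos 2(θ − π) = cos 2θ and sin 2(θ − π) = sin 2θ, so both sides of the contour reinforce the same orientation estimate.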
class. For example, a pattern f(x, y) ≠ f(x² + y²) is not invariant under rotations, but in general we are able to recognize it independently of its orientation.
On a more fundamental level, it must be observed that LTG/NP is a very
abstract approach to vision, and it would require detailed low-level models
of visual stimulus encoding to provide the necessary predictive power.
Because of this abstractness, even the experimental support is too generic to
be convincing. For instance, LTG/NP postulates the existence of an inte-
grative process that connects local position and orientation codes to global
encoding of images, and indeed evidence has been found for such a process
(Caelli and Dodwell, 1982); however, this assumption is common to many
different models (see, e.g., Grossberg, 1976a,b; Borello et al., 1981; Zucker,
1985; Carpenter et al., 1989), and thus these experimental results cannot be
considered a verification of LTG/NP. Crucial postulates of LTG/NP are that
pattern recognition takes place by a process of cancellation and, in particular,
that complex visual stimuli are processed (cancelled) by prolongations of a
small set of basic vector fields. It is clear that to test this hypothesis a model
is required showing how neural cells in the retina and in the visual cortex
implement the operation of cancellation. In conclusion, it may be said that
LTG/NP is a meta-language, useful for conveying concepts of perceptual
invariances rather than a model that can be tested by experiment or computer
simulation; a different and more favorable assessment of LTG/NP has been
formulated by Dodwell (1983).
A. Background
We shall review here some methods that permit invariant recognition under
certain transformations. The method of the cross-correlator, or matched
filter, and its relationship with the Fourier transform will be considered first;
later, integral transforms that are invariant with respect to rotations and
dilations will be presented.
Because of its compatibility with the human visual system and its computational efficiency, the cross-correlator has been the most commonly used form of pattern matching since the early 1970s. Let f(x, y) and g(x, y) be a template and a larger picture (a scene) respectively; we assume that f(x, y) is zero outside a small region A, and we are interested in finding places where g(x, y) matches f(x, y). We can do so by shifting f(x, y) into all possible positions relative to g(x, y) and by computing a measure of the match for each position P(x, y).
   C_fg(α, β) = (1/η(α, β)) ∫∫_A f(x, y) g(x + α, y + β) dx dy,

where η²(α, β) = ∫∫_A g²(x + α, y + β) dx dy.
From the Cauchy–Schwarz inequality it follows that C_fg(α, β) attains its maximum, i.e., [∫∫_A f²(x, y) dx dy]^{1/2}, for displacements (α, β) at which g(x, y) = cf(x, y), that is, at positions where g(x, y) and f(x, y) coincide, or at least are proportional; indeed, the actual value of c is irrelevant and can always be set equal to one by a suitable rescaling of the light intensity. The cross-correlator
thus provides a method of finding a pattern regardless of its location in the
picture, that is, the cross-correlator is invariant under translations, and the
position of the pattern in the scene, its transformational state with respect to
translations, is also encoded. The ability of the cross-correlator to function
invariant to translations is closely related to the translation invariance of the Fourier transform, since C_fg(α, β) can be written as the inverse transform of the product of G(u, v) and F*(u, v), where G(u, v) and F(u, v) are the Fourier transforms of g(x, y) and f(x, y) respectively, and
F*(u, v) is the complex conjugate of F(u, v); in particular, the uniqueness of
the Fourier transform ensures that no false recognitions occur and that
the position of the pattern in the picture is registered. However, the cross-
correlation technique fails if the pattern to be detected is transformed by the
action of some group T_a; for instance, the cross-correlator is very sensitive to orientation and scale changes, as one must expect since the Fourier transform is not invariant under rotations and dilations, and thus it cannot be used for matching patterns with arbitrary orientation and size. A possible solution is to use many templates for f(x, y) at different orientations and sizes, but this solution requires storing a large number of templates, the computation time increases with the number of templates, and the approach lacks elegance and simplicity.
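The FFT route to the cross-correlator described above can be sketched in a few lines (a minimal NumPy illustration; the array sizes and embedding offset are arbitrary choices):

```python
import numpy as np

def cross_correlate(scene, template):
    """Cross-correlation C_fg(a, b) of template f with scene g, computed as
    the inverse Fourier transform of G(u, v) * conj(F(u, v))."""
    H, W = scene.shape
    F = np.fft.fft2(template, s=(H, W))   # zero-pad template to scene size
    G = np.fft.fft2(scene)
    return np.real(np.fft.ifft2(G * np.conj(F)))

# toy scene: the template embedded at a known offset
rng = np.random.default_rng(0)
template = rng.random((8, 8))
scene = np.zeros((64, 64))
scene[20:28, 33:41] = template

c = cross_correlate(scene, template)
peak = np.unravel_index(np.argmax(c), c.shape)
print(peak)   # (20, 33): the translation of the pattern is recovered
```

The peak location illustrates both properties claimed in the text: the match is found regardless of where the pattern sits (invariance under translations), and its coordinates encode the transformational state.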
To attain pattern recognition invariant with respect to rotations, Hsu et al.
144 MARIO FERRARO
(1982) and Hsu and Arsenault (1982) proposed a technique based on work originally done in image reconstruction (Hansen, 1981). Consider an image function f(x, y) in Cartesian coordinates, or f̃(r, θ) in polar coordinates. A circular harmonic expansion of f̃(r, θ) is given by

   f̃(r, θ) = Σ_{m=−∞}^{+∞} f_m(r) exp(imθ), (9)

where

   f̃(r, θ + α) = Σ_{m=−∞}^{+∞} f_m(r) exp(imθ) exp(imα),

   R_m = ∫∫ f̃(r, θ) f_m*(r, θ) dr dθ.
For a given target pattern g(x, y), a corresponding vector is generated,

   C = (|C_1|, |C_2|, . . . , |C_N|),

where

   C_m = ∫∫ g(r, θ) f_m*(r, θ) dr dθ.
Finally, a decision rule was defined by using the vector X = R − C and taking the norm ‖X‖ = (XᵀX)^{1/2}. The test criterion is, in this approach,

   ‖X‖ < T   reference pattern present,
   ‖X‖ > T   reference pattern absent.
The main advantage of this method is that uniqueness of the match is
improved because several harmonic components are used to determine the
vectors R and C, and the experimental results of Wu and Stark are, as
expected, better than when a single component is used. However, this method
is computationally very expensive in that it requires calculating 2N harmonic
components and 2N cross-correlations, and the question may arise whether
it is an improvement over using a conventional matched filter and rotating the
reference pattern. In conclusion, the circular harmonic decomposition
approach, in any of its versions, provides a pattern recognition procedure
that is invariant under rotations and encodes the transformational state. As
noted before, this method does not preserve uniqueness (unless all
components are used), and, contrary to claims in the literature (Yuzan et al.,
1982), it is not shift invariant, since for any pattern in the scene the center
of expansion is in general different and must be known or computed in
advance. Thus, circular harmonics decomposition cannot be used to find
patterns embedded in a larger picture or scene.
More recently (Ferrier, 1987; Caelli and Liu, 1988), a representation has been proposed that satisfies the conditions for strong invariance under rotations and dilations. Such a representation is provided by an integral transform of the original pattern f(x, y), the so-called log-polar circular harmonic transform, or LPCH transform, whose kernel is given by

   k(u, v; x, y) = (x² + y²)^{−1} exp{−i[u ln(x² + y²)^{1/2} + v tan^{−1}(y/x)]}. (10)
and it is evident that the LPCH transform is just the Fourier transform computed in the coordinate system (r, θ). The measure of the match between two patterns f_1(x, y) and f_2(x, y) is given by the normalized cross-correlation
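The mechanism behind this invariance (rotation and dilation become translations in (ln r, θ), where the Fourier amplitude is shift-invariant) can be illustrated numerically; the sketch below uses nearest-neighbour log-polar resampling, and the grid sizes are arbitrary choices:

```python
import numpy as np

def log_polar(img, n_r=64, n_theta=64):
    """Resample img to (ln r, theta) coordinates about its centre
    (nearest-neighbour interpolation, for illustration only)."""
    H, W = img.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    r = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_r))     # ln r axis
    t = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)   # theta axis
    R, T = np.meshgrid(r, t, indexing="ij")
    ys = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, H - 1)
    xs = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, W - 1)
    return img[ys, xs]

rng = np.random.default_rng(0)
img = rng.random((65, 65))
lp = log_polar(img)

# A rotation of the image about the centre shifts lp cyclically along the
# theta axis (here 16/64 of 2*pi, i.e. 90 degrees); by the Fourier shift
# theorem the amplitude spectrum of lp is therefore unchanged, which is
# the invariance exploited by the LPCH transform.
A1 = np.abs(np.fft.fft2(lp))
A2 = np.abs(np.fft.fft2(np.roll(lp, 16, axis=1)))
print(np.allclose(A1, A2))   # True
```

A dilation would likewise appear as a shift along the ln r axis, leaving the same amplitude spectrum, with both parameters recoverable from the phase.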
For the integral transform defined by Eq. (2), we call g(u, v) the response of f(x, y) to k(u, v; x, y), and we define g_a(u, v) as the response of T_a f(x, y) = f(T_a x, T_a y), where

   T_a x = x′(a, x, y),
   T_a y = y′(a, x, y).
Suppose we are given two one-parameter (Lie) transformation groups. The infinitesimal operators in the domain (x, y) have the form

   ℒ_a = a_1(x, y) ∂/∂x + a_2(x, y) ∂/∂y, (12a)
   ℒ_b = b_1(x, y) ∂/∂x + b_2(x, y) ∂/∂y, (12b)

where

   a_1(x, y) = ∂x′/∂a |_{a=0},   a_2(x, y) = ∂y′/∂a |_{a=0},

and analogously for b_1(x, y) and b_2(x, y). The functions a_i(x, y) and b_i(x, y) are the components of the vector field associated with the transformation.
The condition of strong invariance with respect to T_a, S_b requires that the responses to changes under the action of T_a, S_b be expressed as (compare with Eq. (6))

   O[T_a S_b f(x, y)] = exp[i(au + bv)] g_00(u, v) = exp(iau) g_0b(u, v), (13a)

where g_00(u, v) is the response corresponding to the identity transformation, a = b = 0. Analogously,

   O[S_b T_a f(x, y)] = exp[i(au + bv)] g_00(u, v) = exp(ibv) g_a0(u, v), (13b)

and

   O[S_b T_a f(x, y)] = O[T_a S_b f(x, y)] = g_ab(u, v).

Note that Eqs. (13a, b) imply

   |g_00(u, v)| = |g_a0(u, v)| = |g_0b(u, v)| = |g_ab(u, v)|.
It is obvious that an integral transform satisfying conditions (13a and b) exists if it is possible to define a change of coordinates (x, y) → (η, ξ) such that η(x, y) and ξ(x, y) are the canonical coordinates of T_a and S_b, that is, the actions of T_a and S_b are translations along the η and ξ axes respectively (and they are independent of each other). In this case the desired integral transform is given by (Ferraro and Caelli, 1988)

   g(u, v) = ∫∫_{−∞}^{+∞} f̃(η, ξ) exp[−i(uη + vξ)] dη dξ, (14)

where f̃(η, ξ) is the form the function f assumes in the coordinates (η, ξ). The integral transform given by Eq. (14) is the Fourier transform in the coordinate system (η, ξ), denoted by F[f̃(η, ξ)]; hence, it is unique in the sense specified previously if f̃ satisfies conditions (1a, b), and it is strongly invariant for translations along the η and ξ axes, that is, under the action of the transformation groups T_a and S_b.
In the coordinate system (η, ξ), ℒ_a and ℒ_b can be written simply as ∂/∂η, ∂/∂ξ respectively, and the following equations hold:

   ℒ_a η = 1,   ℒ_b η = 0, (15a)
   ℒ_a ξ = 0,   ℒ_b ξ = 1. (15b)
Equations (15a and b) must be satisfied whether ℒ_a, ℒ_b are written in the coordinate system (x, y) - that is, have the expressions (12a and b) - or are simply ℒ_a = ∂/∂η, ℒ_b = ∂/∂ξ. It is easy to show that ∂/∂η, ∂/∂ξ form a basis for all Lie derivatives operating in the two-dimensional space (more formally, they form a basis for the tangent bundle 𝒯 = ∪_P 𝒯_P, where 𝒯_P is the tangent space at a point P ∈ ℝ²) and therefore the change of coordinates (x, y) → (η, ξ) is one-to-one (Schutz, 1980). Thus, we have shown that the existence of canonical coordinates for ℒ_a, ℒ_b is a sufficient condition for the existence of a representation g(u, v) that is invariant in the strong sense, as defined by (13a and b).
We now address the inverse problem: for any g(u, v) such that conditions (13a and b) hold with respect to a pair T_a, S_b of Lie transformation groups, there exist η, ξ that are canonical coordinates of these groups; that is, the condition is necessary. Indeed this is the case, and details of the proof can be found in Ferraro and Caelli (1988); here, only the main points are reported. Since Eqs. (13a and b) must hold for any arbitrary f(x, y), it follows that

   ℒ_a(|k(u, v; x, y)| dx dy) = 0, (16a)
   ℒ_b(|k(u, v; x, y)| dx dy) = 0, (16b)

where |k(u, v; x, y)| is independent of (u, v), that is,

   k(u, v; x, y) = h(x, y) exp(−iγ(u, v; x, y)).

Consequently, Eqs. (16a and b) become

   ℒ_a(h(x, y) dx dy) = 0, (17a)
   ℒ_b(h(x, y) dx dy) = 0, (17b)

and it can be proved, from Eqs. (13a and b) and by using the identities T_a = exp[aℒ_a], S_b = exp[bℒ_b], that

   ℒ_a γ(u, v; x, y) = u, (18a)
   ℒ_b γ(u, v; x, y) = v. (18b)

Equations (18a and b) are satisfied by γ(u, v; x, y) = η(x, y)u + ξ(x, y)v for ℒ_a η(x, y) = 1 and ℒ_b η(x, y) = 0, and for ℒ_a ξ(x, y) = 0 and ℒ_b ξ(x, y) = 1. Then, from what we have shown so far, there exists an invertible transformation from (x, y) to (η, ξ), and the latter are canonical coordinates for ℒ_a, ℒ_b respectively. This proves that the condition is necessary. Further, it follows that Eqs. (17a and b) can be written as

   ℒ_a[h(η, ξ)|J(x, y; η, ξ)| dη dξ] = 0,
   ℒ_b[h(η, ξ)|J(x, y; η, ξ)| dη dξ] = 0,

where J(x, y; η, ξ) is the determinant of the Jacobian matrix of the change of
variables (x, y) → (η, ξ). The term dη dξ is invariant under T_a, S_b, since these transformations are translations along η and ξ respectively; the preceding equations imply that

   ℒ_a[h(η, ξ)|J(x, y; η, ξ)|] = 0,   ℒ_b[h(η, ξ)|J(x, y; η, ξ)|] = 0,

that is, h(η, ξ)|J(x, y; η, ξ)| = c, where c is a constant. We can set, without loss of generality, c = 1; thus, the kernel of a representation g(u, v) satisfying conditions (13a and b) is, in coordinates η, ξ,

   k(u, v; η, ξ) = exp[−i(ηu + ξv)], (19)

that is, just the kernel of the integral transform (14). Finally, note that

   h(η, ξ) = |J(x, y; η, ξ)|^{−1},

or

   h(x, y) = |J(η, ξ; x, y)|,

where J(η, ξ; x, y) is the Jacobian determinant of the change of coordinates (η, ξ) → (x, y). Then the kernel of the integral transform (14) is, when the integral is calculated in the domain (x, y),

   k(u, v; x, y) = |J(η, ξ; x, y)| exp{−i[η(x, y)u + ξ(x, y)v]}. (20)

The existence of a kernel of the form (19) or (20) is then a necessary and sufficient condition for the strong invariance of a representation g(u, v). (Throughout this paper we shall denote the kernel of the integral transform by k(u, v; η, ξ) when the integral is supposed to be calculated in the domain (η, ξ) and by k(u, v; x, y) when the domain of integration is (x, y).) We can summarize these results in the following proposition (compare with Ferraro and Caelli (1988)):

Proposition 1. Given two Lie transformation groups T_a, S_b acting on an image f(x, y), there exists a representation g(u, v) of the image in the domain
their Lie bracket (or commutator), [ℒ_a, ℒ_b], is equal to zero. Then we can state, in conjunction with Proposition 1, the main result of this section.

Proposition 2. A representation g(u, v) of an image, invariant in the strong sense with respect to the Lie transformation groups T_a, S_b, exists if and only if the infinitesimal operators ℒ_a, ℒ_b are linearly independent and [ℒ_a, ℒ_b] = 0.

Formally, Propositions 1 and 2 are equivalent, but it is clear that Proposition 2 provides a simpler method to determine whether a pair of transformation groups T_a, S_b admits a strongly invariant representation g(u, v); commutativity and linear independence of two infinitesimal operators ℒ_a, ℒ_b can be determined by straightforward (although sometimes very tedious) calculations, whereas the existence of canonical coordinates is usually more difficult to prove. For instance, if T_R is a rotation and S_D a dilation, it is easy to verify that ℒ_R, ℒ_D commute and are orthogonal, and hence linearly independent, whereas for translations along the x and y axes, with infinitesimal operators ℒ_x and ℒ_y, we have

   [ℒ_R, ℒ_x] ≠ 0,   [ℒ_D, ℒ_x] ≠ 0,   [ℒ_R, ℒ_y] ≠ 0,   [ℒ_D, ℒ_y] ≠ 0.
It is therefore possible to find kernels k(u, v ; x, y ) such that the response
g(u, v ) is strongly invariant for dilations and rotations, but impossible for
translations and rotations or translations and dilations. It follows that,
contrary to claims in the literature (Casasent and Psaltis, 1976; Yuzan et al.,
1982), no integral transform can exist that is size-shift or orientation-shift
invariant while preserving the uniqueness of the pattern’s transformational
encoding.
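The commutator test of Proposition 2 can be checked mechanically with a computer algebra system. A minimal sketch with SymPy follows; the generator definitions are the standard ones for rotations, dilations, and x-translations, stated here as assumptions rather than taken from the text:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.Function("f")(x, y)

def L_R(g):  # rotation generator: -y d/dx + x d/dy
    return -y * sp.diff(g, x) + x * sp.diff(g, y)

def L_D(g):  # dilation generator: x d/dx + y d/dy
    return x * sp.diff(g, x) + y * sp.diff(g, y)

def L_x(g):  # x-translation generator: d/dx
    return sp.diff(g, x)

def bracket(A, B, g):
    """Lie bracket [A, B] applied to a test function g."""
    return sp.simplify(A(B(g)) - B(A(g)))

print(bracket(L_R, L_D, f))   # 0: rotation and dilation commute
print(bracket(L_R, L_x, f))   # nonzero (equals -df/dy): they do not commute
```

The first bracket vanishing is what licenses the rotation-dilation invariant representation; the second failing to vanish is the obstruction to any rotation-shift invariant transform noted above.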
The representation in (14) entails writing the stimulus pattern in coordinates (η, ξ); computationally, it is desirable to maintain the pattern in the form f(x, y) and to write the kernel in the coordinate system (x, y). The results we have proved so far provide a simple procedure for finding the invariant representation (14), which can be summarized as follows:

1. Solve Eqs. (15a, b), using ℒ_a, ℒ_b expressed in (x, y), to find η(x, y) and ξ(x, y).
2. Compute the Jacobian determinant J(η, ξ; x, y) of the transformation from (η, ξ) to (x, y). (Note that, since the change of variable is one-to-one, |J(η, ξ; x, y)| ≠ 0.)
The result will be Eq. (14) calculated in the integration domain (x, y):

   g(u, v) = ∫∫_{−∞}^{+∞} f(x, y)|J(η, ξ; x, y)| exp{−i[η(x, y)u + ξ(x, y)v]} dx dy, (21)

and

   k(u, v; x, y) = |J(η, ξ; x, y)| exp{−i[η(x, y)u + ξ(x, y)v]}. (22)
It is obvious that g(u, v) does not change whether it is calculated using (14)
or (21).
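The two-step procedure can be carried out symbolically. A minimal sketch for the rotation/dilation pair (assuming the usual generators −y∂/∂x + x∂/∂y and x∂/∂x + y∂/∂y), which verifies the canonical coordinates and recovers the Jacobian weight appearing in the LPCH kernel (10):

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

# Candidate canonical coordinates for rotation (T_R) and dilation (S_D):
eta = sp.atan2(y, x)                   # eta(x, y) = arctan(y/x)
xi = sp.log(sp.sqrt(x**2 + y**2))      # xi(x, y) = ln (x^2 + y^2)^(1/2)

def L_R(g):  # infinitesimal operator of rotations
    return -y * sp.diff(g, x) + x * sp.diff(g, y)

def L_D(g):  # infinitesimal operator of dilations
    return x * sp.diff(g, x) + y * sp.diff(g, y)

# Step 1: check that Eqs. (15a, b) hold
assert sp.simplify(L_R(eta)) == 1 and sp.simplify(L_D(eta)) == 0
assert sp.simplify(L_R(xi)) == 0 and sp.simplify(L_D(xi)) == 1

# Step 2: Jacobian determinant of (eta, xi) with respect to (x, y)
J = sp.Matrix([eta, xi]).jacobian(sp.Matrix([x, y])).det()
print(sp.simplify(J))   # -1/(x**2 + y**2): |J| is the weight in kernel (10)
```

The modulus of the printed determinant, (x² + y²)^{−1}, is exactly the factor multiplying the exponential in the LPCH kernel.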
As concerns the comparison of images transformed by the action of T_a and S_b, it must be observed that any image f(x, y) is uniquely represented in the canonical coordinates domain by f̃(η, ξ), since the change of coordinates from (x, y) to (η, ξ) is one-to-one, and then it is possible to define a generalized cross-correlation
C. Examples

It is trivial to verify that x² + y² = const is a solution of ℒ_R ξ(x, y) = 0, so the canonical coordinate ξ must be of the form ξ(x, y) = ξ(x² + y²) (Ovsiannikov, 1982). Hence,

   x ∂ξ(x² + y²)/∂x + y ∂ξ(x² + y²)/∂y = 1.

Set x² + y² = z; by application of the chain rule, it follows that

   x (∂ξ(z)/∂z) 2x + y (∂ξ(z)/∂z) 2y = 2x² ∂ξ(z)/∂z + 2y² ∂ξ(z)/∂z = 1,

that is,

   ∂ξ(z)/∂z = 1/(2z),

and

   ξ(x, y) = ξ(x² + y²) = ln[(x² + y²)^{1/2}].

In a similar way we can calculate η. A solution of ℒ_D η(x, y) = 0 is y/x = const, then η(x, y) = η(y/x), and by setting z = y/x we obtain

   (y²/x²) ∂η(z)/∂z + ∂η(z)/∂z = 1,

and

   η(x, y) = tan^{−1}(y/x).
where

   y′ = exp(b)y,

that is, a dilation with different parameter values for x and y; the corresponding infinitesimal operators are ℒ_a = x ∂/∂x and ℒ_b = y ∂/∂y. The system (15a and b) then becomes

The first and third equations show that η = η(y) and ξ = ξ(x), and the solutions are, from the second and fourth equations,

   η(y) = ½ ln y²   and   ξ(x) = ½ ln x².

The modulus of the Jacobian determinant is

   |J(η, ξ; x, y)| = |(xy)^{−1}|.
   = exp(iua) ∫∫ f(x′, y′) k(u, v; x′, y′) dx′ dy′,

by virtue of conditions (13a and b); here J(x, y; x′, y′) is the Jacobian determinant of the change of variable (x, y) → (x′, y′). Since the above relations must hold for any f(x, y), it follows that

   exp(iua) k(u, v; x′, y′) = exp(iua) T_a k(u, v; x, y) = |J(x, y; x′, y′)| k(u, v; x, y),

hence

Conversely

   = exp(iau) ∫∫ f(x′, y′) k(u, v; x′, y′) dx′ dy′,
In fact,

   ℒ cos(ηu + ξv) = v sin(ηu + ξv) ℒ_a(ηu + ξv) − u sin(ηu + ξv) ℒ_b(ηu + ξv)
   M[f(x)] = M(u) = ∫_0^∞ f(x) x^{−iu−1} dx.

   M[f(x, y)] = M(u, v) = ∫_0^∞ ∫_0^∞ f(x, y) x^{−iu−1} y^{−iv−1} dx dy.

   M(u) = ∫_{−∞}^{+∞} f(exp t) exp(−iut) dt.

If the original function f(x) is dilated, that is, transformed to f(exp(a)x), the corresponding Mellin transform is

   MF(f(x)) = MF(s) = ∫_0^∞ |F(u)| u^{−is−1} du, (33)

where |F(u)| is the amplitude spectrum of the Fourier transform F(u) of f(x).

   MF(f(x, y)) = MF(s) = ∫∫ |F̃(r, θ)| exp(−isr) dr dθ.
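The substitution x = exp t underlying these formulas turns a dilation of f into a translation along the t axis, which is what makes the Mellin amplitude useful for scale-invariant matching. A minimal numerical illustration (the test pattern is an arbitrary choice):

```python
import numpy as np

# On a logarithmic axis, dilation f(x) -> f(exp(a) x) becomes a translation:
# this is the substitution x = exp(t) used in the Mellin transform above.
t = np.linspace(-4.0, 4.0, 2001)              # t = ln x
f = lambda x: np.exp(-np.log(x) ** 2)         # hypothetical test pattern, x > 0
a = 0.7

g1 = f(np.exp(t))                 # f(x) sampled on the log axis
g2 = f(np.exp(a) * np.exp(t))     # dilated pattern on the same axis

# g2(t) = f(exp(t + a)) = g1(t + a): a pure shift by a on the t axis
shift = int(round(a / (t[1] - t[0])))
print(np.allclose(g2[:-shift], g1[shift:]))   # True
```

Taking a Fourier transform along t (Eq. for M(u) above) then converts this shift into a phase factor, leaving the amplitude |M(u)| unchanged under dilations.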
As mentioned earlier, it is well known (Rosenfeld and Kak, 1982) that
(34)
here, x′ = exp(a)x, y′ = exp(a)y. Equations (35) and (36) show that the amplitude spectrum |MF(s)| of the Fourier–Mellin transform is invariant under translations of f(x, y) and dilations of F(u, v) but, contrary to claims in the literature (Casasent and Psaltis, 1976), is not invariant with respect to dilations of the original image. However, the value of the parameter a is encoded in the phase spectrum, so that the amplitude spectrum |MF(s)| can be suitably rescaled by a known factor exp(2a).
In conclusion, the Fourier–Mellin transform MF(f(x, y)) provides a representation invariant in the weak sense under translations; as concerns dilations, the representation is invariant except for a multiplicative factor exp(−2a), and the transformational state is encoded. However, this representation is not unique, because it is based on an integral transform of |F(u, v)|, and the amplitude spectrum of the Fourier transform does not define uniquely the image f(x, y).
The detection of the target is determined by the Mellin-type cross-correlations of the amplitude spectra of the Fourier transforms of the original images,

   C(a) = ∫_0^∞ |F(r)| |F(r + a)| dr,

where θ is dropped for notational clarity. The cross-correlation has a maximum at s if and only if

   |F(r)| = c|F(r + s)|.
The Fourier–Mellin transform method is shift and size invariant, but it is well known that using the amplitude spectrum alone to represent a pattern will result in erroneous recognition, since the power spectrum does not define uniquely the original pattern. That is, this method achieves invariance by losing uniqueness of the matching process.
In the next section we will show which conditions one-parameter transfor-
mation groups must satisfy to ensure the existence of integral transforms that
enjoy properties similar to those of the Fourier-Mellin transform with
respect to dilations and translations.
g_c(u, v) = \int\int f(\eta', \xi')\, k(\eta, \xi; u, v)\, d\eta\, d\xi
          = \int\int f(\eta', \xi')\, k(\eta', \xi'; u', v')\, |J(\eta, \xi; \eta', \xi')|\, d\eta'\, d\xi'
          = |J(\eta, \xi; \eta', \xi')|\, g(u', v'),   (38)

where J(\eta, \xi; \eta', \xi') is the Jacobian determinant of the change of variables
(\eta, \xi) \to (\eta', \xi') and u' = U_c u, v' = U_c v.
Despite its rather complicated formulation, the covariance property has a
very simple meaning: among the transformations under which g(u, v) is not
invariant in the strong sense, there exist some whose action on f(x, y)
results in a simple transformation of u and v. The following propositions
have been demonstrated by Giulianini et al. (1992), and we shall report here
just a sketch of the proof.
Proposition 4. Let g(u, v) be the response of an integral transform with kernel
\exp[-i(u\eta(x, y) + v\xi(x, y))]: g(u, v) is covariant with respect to a transfor-
mation group N_c if and only if the action of N_c is a linear transformation of the
coordinates \eta, \xi. Furthermore, U_c = (N_c^{-1})^T, i.e., U_c is the transpose of the
inverse of N_c.
Proof. First we prove that the condition is necessary: if \exp[-i(u\eta(x, y) +
v\xi(x, y))] is covariant with respect to N_c we can write

u [ \eta(0,0) + (\partial\eta/\partial\eta')|_{0,0}\,\eta' + (\partial\eta/\partial\xi')|_{0,0}\,\xi' + (1/2)(\partial^2\eta/\partial\eta'^2)|_{0,0}\,\eta'^2 + (\partial^2\eta/\partial\eta'\partial\xi')|_{0,0}\,\eta'\xi' + (1/2)(\partial^2\eta/\partial\xi'^2)|_{0,0}\,\xi'^2 + \cdots ]
+ v [ \xi(0,0) + (\partial\xi/\partial\eta')|_{0,0}\,\eta' + (\partial\xi/\partial\xi')|_{0,0}\,\xi' + (1/2)(\partial^2\xi/\partial\eta'^2)|_{0,0}\,\eta'^2 + (\partial^2\xi/\partial\eta'\partial\xi')|_{0,0}\,\eta'\xi' + (1/2)(\partial^2\xi/\partial\xi'^2)|_{0,0}\,\xi'^2 + \cdots ]
= u'\eta' + v'\xi',   (39)

where the derivatives are computed at (0, 0). Rearranging the left-hand side
of Eq. (39), it is easy to show that the terms containing powers of
\eta' and \xi' of order greater than one must vanish and, furthermore,
\eta(0, 0) = \xi(0, 0) = 0.
Therefore,

\eta = \alpha(c)\,\eta' + \beta(c)\,\xi',

where

\alpha(c) = (\partial\eta/\partial\eta')|_{0,0}, \qquad \beta(c) = (\partial\eta/\partial\xi')|_{0,0}.

1. In the case of rotations,

F_R(u, v) = F(N_c(u, v)).

2. In the case of dilations, N_c is a symmetric matrix, i.e., (N_c^{-1})^T = N_c^{-1}. The
determinant of the Jacobian is 1/c'^2, where c' = \exp c. Then

F_D(u, v) = (1/c'^2)\, F(u/c', v/c').
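The dilation relation for F_D can be checked numerically; the following sketch (test image and sample frequencies are our own choices) approximates the continuous Fourier transform by a Riemann sum and compares the transform of the dilated image f(c'x, c'y) with (1/c'^2)F(u/c', v/c'):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 321)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)

f = lambda X, Y: np.exp(-(X**2 + 2 * Y**2))   # smooth test image
cp = 1.5                                      # c' = exp(c)

def ft(img, u, v):
    # continuous Fourier transform approximated by a Riemann sum
    return np.sum(img * np.exp(-1j * (u * X + v * Y))) * dx * dx

errs = [abs(ft(f(cp * X, cp * Y), u, v) - ft(f(X, Y), u / cp, v / cp) / cp**2)
        for (u, v) in [(0.5, -1.0), (1.2, 0.3)]]
print(max(errs))
```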
164 MARIO FERRARO
Given two commuting Lie transformation groups T_a and S_b and the corre-
sponding invariant representation g(u, v) with kernel \exp[-i(\eta u + \xi v)], it is
possible to establish whether g(u, v) is covariant with respect to a Lie trans-
formation group N_c simply by considering the properties of the commutators
[\mathcal{L}_a, \mathcal{L}_c] and [\mathcal{L}_b, \mathcal{L}_c], that is, without computing the canonical coordinates
\eta, \xi.
Proposition 5. The integral transform g(u, v) is covariant with respect to N_c
if and only if the following conditions hold:

[\mathcal{L}_a, \mathcal{L}_c] = \alpha\mathcal{L}_a + \beta\mathcal{L}_b   (40a)

and

[\mathcal{L}_b, \mathcal{L}_c] = \gamma\mathcal{L}_a + \delta\mathcal{L}_b.   (40b)

Conditions (40a) and (40b) are also sufficient. The infinitesimal operators
\mathcal{L}_a = \partial/\partial\eta and \mathcal{L}_b = \partial/\partial\xi form a basis in R^2, and in general \mathcal{L}_c can be written
as

\mathcal{L}_c = a_1(\eta, \xi)\,\partial/\partial\eta + a_2(\eta, \xi)\,\partial/\partial\xi;

hence,

[\mathcal{L}_a, \mathcal{L}_c] = (\partial a_1/\partial\eta)\,\partial/\partial\eta + (\partial a_2/\partial\eta)\,\partial/\partial\xi,

and

[\mathcal{L}_b, \mathcal{L}_c] = (\partial a_1/\partial\xi)\,\partial/\partial\eta + (\partial a_2/\partial\xi)\,\partial/\partial\xi.
G(s) = \int A(\lambda(u, v))\, \exp(-is\lambda)\, d\lambda,

[\mathcal{L}_x, \mathcal{L}_D] = \mathcal{L}_x, \qquad [\mathcal{L}_y, \mathcal{L}_D] = \mathcal{L}_y,

C(s, t) = \int\int |F(r, \theta)|\, \exp[-i(sr + t\theta)]\, dr\, d\theta,   (43)

where \theta = \tan^{-1}(v/u) and r = \ln(u^2 + v^2)^{1/2}. It is interesting to note that
V. NOTES ON INVARIANT REPRESENTATIONS OF 3D OBJECTS
The term recognition of three-dimensional objects has been used with many,
often contrasting meanings. Some approaches deal only with single pre-
segmented objects, whereas other schemes aim to interpret multiobject
scenes. Some recognition systems require multiple viewpoints, and in others
data are supposed to be available from both sensors and intermediate
processors. (A comprehensive bibliography and a precise definition of the
problem can be found in Besl and Jain (1989).) We shall be concerned here
solely with the problem of invariant coding in three dimensions, i.e., with the
problem of finding surface representations invariant under rigid motion in
R^3. The literature on surface representations in computational vision is vast
(compare with Besl and Jain, 1985, 1986); the scope of our investigation is to
show how differential geometry provides necessary and sufficient conditions
for the solution of three-dimensional invariant coding and to analyze some
examples of differential-geometric surface descriptors.
There are at least three characteristics that make object recognition more
difficult than pattern recognition (Caelli et al., 1992). First, sensory data are
usually in the form of light intensity and must be converted into data about
the shape of the surface. This entails solving the problem of "shape from X,"
that is, inferring the shape of a surface from the information contained in the
surface's image. "Shape from X" is, in itself, a major problem in compu-
tational vision, since it is ill-posed in the sense of Hadamard (Hadamard,
1923; Poggio and Torre, 1984). Over the years, a variety of methods has been
proposed to infer shape from images: shape from stereo (Grimson, 1980),
from motion (Ullman, 1979), from texture (Witkin, 1981; Blake and
Marinos, 1990), from shading (Horn and Brooks, 1986; Bischof and Ferraro,
1989), from focus (Pentland, 1987), and from photometric stereo (Woodham,
1980). The difficulty in solving "shape from X" is certainly related to a
number of factors (e.g., scene illumination and reflectance properties of the
surface) other than the surface's shape, which take part in the process of
formation of depth maps.
An alternative technique for gaining information about surface geometry
uses range finders to produce depth maps, or range images, of the surface. In
range images, the depth value at each pixel encodes information about
surface geometry in terms of the distance between the sensor and the surface
(Besl and Jain, 1985). The interpretation of depth maps is more immediate
than that of intensity images in that factors such as scene illumination and
reflectance properties of the surface do not concur to form the range image;
the information about surface geometry is directly encoded, but, of course,
the process of formation of range images is not related to vision. Whatever
their specific format, sensory data refer to visible parts of surfaces, or
visible surfaces for short, and any surface of a physical object is not completely
visible from an observer in a fixed position (apart from objects made of
transparent material!).
whereas in the implicit representation, points on the surface must satisfy the
equation
F(x, y, z) = 0.   (45)
View-dependent surfaces are Monge patches, graphs of the depth map h,

h: (x, y) \to h(x, y) = z   (46)

(precise definitions of surfaces and related mathematical entities can be found
in Appendix B).
We are not interested here in abstract surfaces but rather in surfaces of
physical objects that are closed, bounded and continuous, and we shall also
assume that surfaces are smooth and regular, that is, that there are no cusps
or sharp edges. The last two assumptions in general are not satisfied by most
physical surfaces, but, if fine or microscopic details are disregarded, they are
at least piecewise smooth, and usually non-regular (singular) points form a
set of zero measure in R2; therefore, our hypotheses are not too restrictive.
Note that the condition of continuity holds for view-independent represen-
tation of surfaces, whereas in Monge patches, occlusions of parts of the
surface result in discontinuities of the depth map h.
INVARIANT PATTERN REPRESENTATIONS AND LIE GROUPS THEORY 169
Any rigid motion in R^3 can be decomposed into six one-parameter transfor-
mation groups, three translations and three rotations. Translations are
defined by the formula

(x', y', z')^T = T(x, y, z)^T = (x + a, y + b, z + c)^T,

and rotations are generated by the matrix operators

A_1 = A_1(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{pmatrix},

A_2 = A_2(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},

A_3 = A_3(\psi) = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix},

where \phi, \theta, \psi are the Euler angles (Korn and Korn, 1968). We shall denote
a generic translation by T(a, b, c), or simply by T, and likewise R(\phi, \theta, \psi), or
R, will denote a rotation obtained by application of the matrices A_i.
The infinitesimal operators are

t_1 = \partial/\partial x, \qquad t_2 = \partial/\partial y, \qquad t_3 = \partial/\partial z,

for translations, and

l_1 = -z\,\partial/\partial y + y\,\partial/\partial z, \qquad l_2 = z\,\partial/\partial x - x\,\partial/\partial z, \qquad l_3 = -y\,\partial/\partial x + x\,\partial/\partial y,

for rotations about the x, y and z axes, respectively. (In the following we shall
use the symbols t_i and l_i for infinitesimal operators of translations and rotations
and shall keep the symbol \mathcal{L} to indicate a generic infinitesimal operator.) The
Lie brackets of t_i, l_j are

[t_i, t_j] = 0,   (47a)
(Crampin and Pirani, 1986). Since the operators l_i and t_i do not commute, the
result of the application of a rigid motion to a vector x depends on the order
in which translations and rotations are performed; however, it is well known
(see, e.g., O'Neill, 1966) that any rigid motion in R^3 is uniquely determined
by a rotation followed by a translation, and thus we denote a generic rigid
motion by TR, and a transformed surface by S' = TR(S). Analogously, the
commutator between l_i and l_j is different from zero unless i = j, and this
shows that the result of a generic rotation R depends on the order of
application of the matrices A_i. There are various sequences A_i, A_j, A_k, where
i, j, k need not be different, that uniquely define R(\phi, \theta, \psi) in the 3D space,
and we can set, without loss of generality, R(\phi, \theta, \psi) = A_1(\phi)A_2(\theta)A_3(\psi)
(Korn and Korn, 1968).
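A minimal numerical sketch of these generators (the explicit entries of A_1 are reconstructed from the infinitesimal operators l_i, and the sample angles are our own choices) confirms that any product of the A_i is orthogonal with unit determinant, and that the A_i do not commute:

```python
import numpy as np

def A1(phi):    # rotation about the x axis
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def A2(theta):  # rotation about the y axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def A3(psi):    # rotation about the z axis
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# R(phi, theta, psi) = A1 A2 A3
R = A1(0.3) @ A2(-0.7) @ A3(1.1)
orthogonal = np.allclose(R @ R.T, np.eye(3))    # rigid: R is orthogonal
commute = np.allclose(A1(0.3) @ A2(-0.7), A2(-0.7) @ A1(0.3))
print(orthogonal, commute)
```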
It is obvious that the representations given by the maps f, F, and h define
a surface uniquely but are not invariant with respect to rigid motions.
We begin the analysis of invariant representations by showing how any
surface can be generated by the action of two commuting, linearly independent
vector fields. (Later, in order to give the formulae a more compact form, we
shall use the notation x_1 = x, x_2 = y, x_3 = z.)
Let S be a surface and let \mathcal{L}_u be a vector field that assigns to each point
x = (x_1, x_2, x_3) \in S a tangent vector v|_x. From an algebraic point of view, a
vector field is the infinitesimal operator of a one-parameter group of transfor-
mations that are smooth and one-to-one: the action of this group, starting
from a point x_0, generates smooth integral curves of \mathcal{L}_u, \alpha(u) = (x_1(u),
x_2(u), x_3(u)), \alpha(0) = x_0, whose tangent vector at a point x coincides with the
value of \mathcal{L}_u at the same point:

(d\alpha/du)|_x = \mathcal{L}_u(x).

Then it follows that, in R^3, \mathcal{L}_u has the form

\mathcal{L}_u = (dx_1/du)\,\partial/\partial x_1 + (dx_2/du)\,\partial/\partial x_2 + (dx_3/du)\,\partial/\partial x_3.
The vector field \mathcal{L}_u completely determines the curve \alpha(u) except for the initial
point x_0, and hence there exists an infinite number of curves \alpha(u), one for
each different initial point x_0. Let \alpha(u, x_0) denote the maximal integral curve
starting from x_0. The curve \alpha(u, x_0) can be calculated by using the exponential
map from the tangent bundle TS to S,

\exp: TS \to S, \qquad \alpha(u, x_0) = \exp(u\mathcal{L}_u)x_0,

where the exponentiation has the usual operational sense (compare with
Appendix A), i.e., \alpha(u, x_0) is computed as a Taylor expansion in powers of
u. Consider a vector field \mathcal{L}_w linearly independent of \mathcal{L}_u and such that
[\mathcal{L}_u, \mathcal{L}_w] = 0, and let \beta(v, x_0) be the maximal integral curve of \mathcal{L}_w starting
Among all vector fields satisfying Eq. (49), there exist pairs of commuting,
linearly independent vector fields that are generators of a parametric
representation of a surface. The Frobenius integrability theorem, when
applied to R^3, states that if there exist three vector fields \mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3 such that

[\mathcal{L}_i, \mathcal{L}_j] = \sum_k c_{ij}^k \mathcal{L}_k,

where the c_{ij}^k are smooth real-valued functions, then the integral curves of the
vector fields mesh to form a family of R^2 surfaces that fill a subset A of R^3
(Spivak, 1979; Schutz, 1980); moreover, each point of A is on one and only
one surface. The condition is also necessary. As an example consider the
vector fields
projection on the image plane (Brady et al., 1985; Beusmans et al., 1987;
Koenderink, 1987; Richards et al., 1987).
Surface representations based on tangent vector fields are useful for
analyzing properties of curves on the surface, permit the exploration of
different types of parameterization, and further are invariant, by definition,
under translations. Unfortunately, these representations do not meet the
conditions of invariance under rigid motion because they are linearly trans-
formed under rotations of the surface. It is a standard result of differential
geometry (O'Neill, 1966) that if a surface S is mapped into S' by a rigid
motion, S' = TR(S), any tangent vector v_p = (v_1, v_2, v_3)^T to S at a point p is
transformed into a tangent vector to S' at q = TR(p), denoted by
w_q = (w_1, w_2, w_3)^T, and

w_q = R(v_p),

that is, tangent vectors are invariant under translations of the surface,
whereas when the surface is rotated, tangent vectors are rotated exactly the
same way.
The answer to the problem of invariance and uniqueness of representations
lies in the fundamental theorem of surface theory: a surface is defined uniquely,
up to a rigid motion, by the coefficients of its first and second fundamental
form
g_{ij}(u, v) = x_i \cdot x_j, \qquad b_{ij}(u, v) = x_{ij} \cdot n, \qquad i, j = 1, 2,   (51a)

where

x_1 = x_u = \partial x/\partial u, \qquad x_2 = x_v = \partial x/\partial v,   (51b)

x_{11} = \partial^2 x/\partial u^2, \qquad x_{22} = \partial^2 x/\partial v^2, \qquad x_{12} = x_{21} = \partial^2 x/\partial u \partial v,   (51c)
and n is the unit normal to the surface (see Appendix B). In other words, two
surfaces with the same coefficients of the first and second fundamental form
can be superposed onto each other by a rigid motion.
It is obvious from Eqs. (51a, b, c) that there are six independent coefficients
of the first and second fundamental forms; they are invariant under rigid
motions of the surface in the sense that g_{ij}(p(u, v)) = g_{ij}(q(u, v)) and b_{ij}(p(u, v)) =
b_{ij}(q(u, v)), where p \in S, q \in S', S' = TR(S), and q = TR(p). In the literature
on differential geometry, the term uniqueness is always understood to mean
"uniqueness within a rigid motion," the reason being that the shape of the
surface is unique even though its position and orientation are not determined.
For the sake of simplicity, we shall heretofore use the same convention even
though it differs from our previous definition of uniqueness.
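The invariance of the six coefficients can be verified numerically. In the following sketch (the sphere patch, the sample point, the finite-difference step, and the particular rigid motion are our own illustrative choices), g_ij = x_i · x_j and b_ij = x_ij · n are computed by central differences before and after an arbitrary rigid motion TR:

```python
import numpy as np

def fundamental_forms(surf, u, v, h=1e-4):
    # first and second derivatives by central finite differences
    xu  = (surf(u + h, v) - surf(u - h, v)) / (2 * h)
    xv  = (surf(u, v + h) - surf(u, v - h)) / (2 * h)
    xuu = (surf(u + h, v) - 2 * surf(u, v) + surf(u - h, v)) / h**2
    xvv = (surf(u, v + h) - 2 * surf(u, v) + surf(u, v - h)) / h**2
    xuv = (surf(u + h, v + h) - surf(u + h, v - h)
           - surf(u - h, v + h) + surf(u - h, v - h)) / (4 * h**2)
    n = np.cross(xu, xv)
    n /= np.linalg.norm(n)                 # unit normal
    g = np.array([[xu @ xu, xu @ xv], [xu @ xv, xv @ xv]])
    b = np.array([[xuu @ n, xuv @ n], [xuv @ n, xvv @ n]])
    return g, b

sphere = lambda u, v: np.array([np.cos(u) * np.cos(v),
                                np.sin(u) * np.cos(v),
                                np.sin(v)])

# an arbitrary rigid motion TR: rotation about the z axis, then a translation
c, s = np.cos(0.9), np.sin(0.9)
Rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
moved = lambda u, v: Rz @ sphere(u, v) + np.array([1.0, -2.0, 0.5])

g1, b1 = fundamental_forms(sphere, 0.4, 0.3)
g2, b2 = fundamental_forms(moved, 0.4, 0.3)
print(np.allclose(g1, g2, atol=1e-5), np.allclose(b1, b2, atol=1e-5))
```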
The set of functions g,(u, v ) ,b,(u, v ) defines a six-dimensional representa-
tion of S,
{g_{ij}(u, v), b_{ij}(u, v)}
that is unique (in the sense of differential geometry) and is invariant, albeit
in the weak sense, as position and orientation of the surface are not encoded;
in turn, this representation depends on the action on S of the five differential
operators of first and second order
(\partial/\partial u, \partial/\partial v, \partial^2/\partial u^2, \partial^2/\partial u \partial v, \partial^2/\partial v^2),   (52)

(compare with Caelli et al., 1992), or, more generally,

\mathcal{L}_u, \mathcal{L}_w, \mathcal{L}_u^2, \mathcal{L}_u\mathcal{L}_w, \mathcal{L}_w^2.   (53)
In other words, surfaces are completely described by tangent vectors,
surface normals, and rate of change of tangent vectors with respect to the
parametric representation.
Although the representation {g_{ij}(u, v), b_{ij}(u, v)} is the answer to the
problem of invariance and uniqueness, it requires the computation of six
functions, and, furthermore, it is difficult to interpret which information
about surface shape is conveyed by each of these functions. It would thus be
advantageous to find a simpler representation that combines the information
of g_{ij} and b_{ij} in a way that makes surface characteristics easier to interpret.
Besl and Jain (1986) proposed the use of two curvature functions, the
Gaussian (K(u, v)) and mean (H(u, v)) curvatures, to characterize surface
shape. They argued (Besl and Jain, 1986) that K(u, v) and H(u, v) capture the
salient properties of surface geometry even though, in general, they cannot
ensure uniqueness. However, for compact and convex surfaces, where
K(u, v) > 0 at every point, there exists a single function, the Gaussian
curvature K(u, v), that uniquely defines (up to a rigid motion) the surface
(Chern, 1957); an example of such surfaces are the ovaloids, that is, closed,
bounded, and convex surfaces. Moreover, it is interesting to note that, under
certain conditions, H can uniquely define a Monge patch. The function H can
be written as

H(x, y) = (1/2)\,\nabla \cdot \left[ \nabla f(x, y)\, [1 + |\nabla f(x, y)|^2]^{-1/2} \right],   (54)
(Besl and Jain, 1986), and Eq. (54) is a second-order elliptic quasi-linear
partial differential equation. If the domain of definition D of the Monge
patch is bounded, H(x, y) is an arbitrary function of the two variables with
continuous first partial derivatives in D, and f_1, f_2 are solutions in D to Eq.
(54) such that f_1(x, y) = f_2(x, y) on the boundary \partial D, then f_1(x, y) = f_2(x, y)
throughout D (Gilbarg and Trudinger, 1977). Thus H(x, y) plus f(x, y) on \partial D
together constitute a representation of Monge patches such that all informa-
tion present in the original depth map is preserved. Under conditions con-
cerning the absolute value of the integral |\int\int H\, dx\, dy| calculated on any domain
A \subset D, it can be proved (Giusti, 1978) that H alone defines uniquely, within
a rigid motion in R^2, the function f and hence the Monge patch. However,
the above results apply to Monge patches only, and thus they have a limited
relevance to our problem.
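As a concrete check (our own worked example, not from the text): for the Monge patch of a sphere of radius R, f(x, y) = (R^2 - x^2 - y^2)^{1/2}, the divergence form H = (1/2) div[grad f (1 + |grad f|^2)^{-1/2}] evaluates to the constant -1/R, the sign being fixed by the orientation convention. The sketch below computes H by nested central differences:

```python
import numpy as np

R, h = 2.0, 1e-4
f = lambda x, y: np.sqrt(R**2 - x**2 - y**2)   # Monge patch of a sphere

def H(x, y):
    # p = grad f / sqrt(1 + |grad f|^2); H = (1/2) div p
    def p(x, y):
        fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
        fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
        w = np.sqrt(1 + fx**2 + fy**2)
        return fx / w, fy / w
    dpx = (p(x + h, y)[0] - p(x - h, y)[0]) / (2 * h)
    dqy = (p(x, y + h)[1] - p(x, y - h)[1]) / (2 * h)
    return 0.5 * (dpx + dqy)

print(H(0.3, -0.4))   # approximately -1/R = -0.5
```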
It is well known from differential geometry that K(u, v) and H(u, v) are
invariant under rigid motion (the Gauss theorema egregium establishes a stronger
invariance property, namely that K(u, v) is invariant under isometries), and
hence an encoding by K(u, v) and H(u, v) provides a representation invariant
(in the weak sense).
One of the advantages of the representation {K(u, v), H(u, v)} is that it
provides a simple way to segment surfaces into parts; every surface point can
be classified according to the signs of K and H (Besl and Jain, 1986).
If K > 0 the point is said to be elliptic, that is, the surface in a neighbor-
hood of x is like an ellipsoid; if K < 0 the point is hyperbolic and S is
locally saddle-shaped; when K = 0 it is locally flat or conical or cylindrical.
If the sign of H is also considered, any point of the surface can be classified
as belonging to one of eight classes. If K = 0 and H < 0 the surface looks
locally like a ridge; if K = 0 and H = 0 it is locally planar; and if K = 0 and
H > 0 it is locally valley-shaped. When K < 0, the sign of H (if H \neq 0) indicates
whether the surface looks more like a valley or a ridge, and K < 0, H = 0
corresponds to the case of a surface that is locally minimal. Finally, if K > 0,
H < 0 the surface is locally ellipsoid-shaped and peaked (i.e., the surface
bulges in the direction of the surface normal), and if K > 0, H > 0 the surface
is locally ellipsoidal and bulges in the direction opposite the surface normal.
Note that if K > 0 it is impossible that H = 0 (see Appendix B). (Of course
it would also be possible to use the signs of the g_{ij} and b_{ij} to classify surface points,
but the resulting classification would be very complicated, as there are
3^6 = 729 classes!)
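The eight-class labelling can be written down directly from the signs of K and H. In the sketch below the class names follow the common HK-sign terminology of Besl and Jain ("peak," "pit," "saddle ridge," and so on); the function name and the zero tolerance are our own choices:

```python
def classify(K, H, eps=1e-10):
    # eight-class labelling by the signs of K and H
    sK = 0 if abs(K) < eps else (1 if K > 0 else -1)
    sH = 0 if abs(H) < eps else (1 if H > 0 else -1)
    if sK == 0:
        return {-1: "ridge", 0: "planar", 1: "valley"}[sH]
    if sK < 0:
        return {-1: "saddle ridge", 0: "minimal", 1: "saddle valley"}[sH]
    return "peak" if sH < 0 else "pit"   # K > 0 excludes H = 0

print(classify(0.0, 0.0), classify(2.0, -1.0), classify(-1.0, 0.0))
```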
It is not clear how the advantages of simplicity of the representation
{K(u, v), H(u, v)} compare with the lack of uniqueness, and this problem
can be solved only by a detailed analysis of the relationship between
{g_{ij}(u, v), b_{ij}(u, v)} and {K(u, v), H(u, v)}. In experiments with simple range
images (Besl and Jain, 1986), depth maps were reconstructed by using K(u, v)
and H(u, v), together with four other surface descriptors, invariant under
rigid motion:

1. The determinant g of the matrix [g_{ij}] of the coefficients of the first
fundamental form; the integral of \sqrt{g} over the domain of definition of a surface
gives the area of the surface.

2. The coordinate angle function \Theta defined as

\Theta = \cos^{-1}[g_{12}(g_{11}g_{22})^{-1/2}].
VI. DISCUSSION
by conditions (40a and b); however, it may be conjectured that there exists,
in some space of dimension at least equal to the number of infinitesimal
operators of the set, a system of coordinates that permits the definition of a
representation that encodes the transformational state in a simple way. We
have found an example of such a space in the problem of invariant representa-
tions of objects, where encoding the transformational state requires a
six-dimensional parameter space.
Integral transforms of images were developed in the framework of artificial
vision, but they have also enormously stimulated the study of visual percep-
tion in biological systems. Since the pioneering work of Campbell and Robson
(1968), many other psychophysical studies (see, e.g., Braddick et al., 1978;
Graham, 1980) have shown that spatial vision in biological systems may
depend on a Fourier decomposition of the stimulus pattern into elementary
stimuli that are the basis functions of the transform, and these results have
been supported by electrophysiological findings (see, for instance, Maffei,
1980; De Valois and De Valois, 1988). Although most studies have been
focused on the investigation of visual system sensitivity to amplitude infor-
mation (i.e., contrast), other experiments have proved that vision also
depends critically on phase information (Brettel et al., 1982; Lawden, 1983).
These results, and some neurophysiological experiments (Pollen and Ronner,
1981, 1982), seem to support the suggestion that the amplitude and phase values
of the local frequency of a stimulus pattern may be represented by a pair of
cortical cells with even and odd symmetries (Robson, 1975).
There is no a priori reason why the basis functions of the Fourier expansion
should be the only ones appropriate to decompose visual stimuli; a log-
polar coordinate system has been used to describe the mapping of retinal
images to the visual cortex (Schwarz, 1980). More recently, it has been
proposed that elementary stimuli based on the kernel of the LPCH transform
can also be used to specify the characteristics of the visual system, and some
results from psychophysical investigations seem to support this idea (Simas
and Dodwell, 1990). Thus, the operations of visual perception, at least in the
early stages of the process, might be characterized by the coding properties
of two sets of independent filters or channels, and these two systems might
encode both shift and size/orientation characteristics of the stimulus pattern.
One might speculate that similar filters exist for any pair of canonical coor-
dinates (compare with Section III.D), possibly not as fixed filters but rather
as the result of some adaptive process that depends on the signal (the stimulus
pattern) and the task of the observer.
Pattern representations considered here are specified by real- or complex-
valued functions defined on some domain (x, y) or (u, v), and they are called
implicit (Caelli et al., 1992), in that they do not encode explicit image features.
A different type of representation exists that encodes images explicitly or
symbolically, and has been called explicit (Caelli et al., 1992). In such
representations, patterns are decomposed into parts labelled according to a set
of basis elements, such as lines, corners, or regions. Basis elements are
assigned a list of attributes, or unary predicates, such as "straight," "acute/
obtuse," or "closed/open," which define individual part characteristics. Parts
are related by binary relationships, e.g., "adjacent to," "left/
right of," or "above/below," which play a specific role in coding patterns
with the required invariance characteristics. The difference between implicit
and explicit representations entails different methods for pattern recognition:
whereas cross-correlation is the standard matching technique associated with
implicit representations, graph matching, heuristic search, and decision trees
are the predominant tools in the matching procedure of explicit represen-
tations. In general, invariant recognition for explicit representations comes
from the development of unary and binary properties of image parts that
have invariant characteristics. For example, part area, perimeter, and
interpart distances are unary features that are invariant under translations
and rotations; tri-part intersection angles are also invariant to dilations.
Thus, the invariance of a representation is determined by the choice of
appropriate features and binary relations, but uniqueness and registration of
the transformation can only be guaranteed if the pattern can be uniquely
reconstructed from the feature list, and the features are indexed according
to the transformational state.
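A minimal sketch of such an invariant binary feature (the part centroids and the particular motion are our own illustrative choices): pairwise centroid distances are unchanged by an arbitrary rotation and translation of the pattern:

```python
import numpy as np

parts = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])   # part centroids

def distances(P):
    # all pairwise interpart distances
    return np.array([np.linalg.norm(P[i] - P[j])
                     for i in range(len(P)) for j in range(i + 1, len(P))])

th = 0.6
Rm = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
moved = parts @ Rm.T + np.array([5.0, -2.0])
print(np.allclose(distances(parts), distances(moved)))
```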
Finally, we have seen that for three-dimensional objects it is more difficult
to find strongly invariant representations that are computationally efficient,
and this depends on two facts: first, there is not, or at least there has not been
found, an alternative way to represent surfaces with properties similar to
integral transforms of images, and further, the transformations of interest do
not commute, unless one considers only the trivial case of translations in R^3.
Differential geometry provides the basic conditions for the invariance and
uniqueness of the representations, but the representation {g_{ij}(u, v), b_{ij}(u, v)} is
not computationally very efficient in that it requires a six-dimensional space
only to encode surface shape, that is, without considering position and
orientation in R^3. Of course, there may exist alternative representations of
lower dimensionality that retain all important information about surface
shape even though they are not unique, and we have seen that the curvature
functions H(u, v) and K(u, v) seem to have some of these characteristics; but
the solution of this problem requires further investigation, both theoretical
and experimental.
APPENDIX A
is a diffeomorphism.
(A more general definition of manifold can be found in Lang (1967).)
The coordinate maps \phi_i: U_i \to R^m endow the manifold with a coordinate
system x = (x_1, \ldots, x_m) and with the topological structure of R^m.
Roughly speaking, a Lie group is an infinite group whose elements can be
parameterized smoothly. Thus, any element g of the group can be denoted by
g(a_1, \ldots, a_r) in terms of the parameters a_1, \ldots, a_r. The parameters of the element
gh, resulting from the group operation, are smooth functions of the
parameters of g and h. The importance of Lie groups resides in the fact
that one can combine both differential calculus and algebra to investigate the
structure of the groups.
Definition A2. An r-parameter Lie group is a group G that also carries the
structure of an r-dimensional smooth manifold such that both the group
operation

m: G \times G \to G, \qquad m(g, h) = gh, \qquad g, h \in G,
2. For all p \in M,

T(e, p) = p.

3. If (g, p) \in G \times M, then (g^{-1}, T(g, p)) \in G \times M and

T(g^{-1}, T(g, p)) = p.
The symbols \partial/\partial x_i can be considered for the moment as a special way to
denote the basis of a tangent vector; later, we shall see that they are indeed
partial differential operators. Consequently, tangent vectors (or, more
exactly, vector fields) can be regarded as differential operators.
Consider the helix

\gamma(s) = (\cos s, \sin s, s)

in R^3, with coordinates (x_1, x_2, x_3). It has tangent vector

v|_\gamma = -\sin s\, \partial/\partial x + \cos s\, \partial/\partial y + \partial/\partial z = -y\, \partial/\partial x + x\, \partial/\partial y + \partial/\partial z

at the point x = (x_1, x_2, x_3) = (\cos s, \sin s, s).
Two curves passing through the same point x have the same tangent vector
if and only if the derivatives at x are equal. It is possible to prove (Olver,
1986) that this property does not depend on the local coordinate system used
in the neighborhood of x. The set of all tangent vectors to all curves passing
through x in M is called the tangent space to M at x and is denoted by TM_x.
The collection of all tangent spaces at all points x in M is called the tangent
bundle of M, denoted by

TM = \bigcup_x TM_x.

If \gamma(s) is a smooth curve on M, then its tangent vectors will vary smoothly
from point to point.
A vector field v on M is a mapping that assigns a tangent vector v|_x to each
point x \in M, and v|_x varies smoothly from point to point. The vector field has
the form

v = \sum_{i=1}^{m} c_i(x)\, \partial/\partial x_i,
Comparing Eqs. (56a and b) with Definition A3 and identifying the group
operation with addition, it is apparent that the flow generated by a vector
field is the same as a group action of the Lie group R on the manifold M, a
Lie one-parameter group of transformations. We shall denote by T(s, x) (or T_s,
to simplify the notation) both the transformation group and the action. The
vector field v is called the infinitesimal operator (infinitesimal generator, Lie
derivative) of the action, and henceforth will be denoted by \mathcal{L}_s, or just \mathcal{L}
when no confusion arises. Lie's first theorem (Guggenheimer, 1963) demon-
strates that any one-parameter transformation group is determined by its
infinitesimal operator. The orbits of T(s, x) are just the integral curves of the
vector field and are given by the formula

x' = T(s, x) = (T_1(s, x), \ldots, T_m(s, x))   (57)
and, defining

\mathcal{L}_s = \sum_{i=1}^{m} a_i(x)\, \partial/\partial x_i.   (59)
The computation of the one-parameter group T(s, x) from the infinitesimal
operator \mathcal{L} is often referred to as exponentiation of \mathcal{L}, and it is customary to
adopt the notation

T(s, x) = \exp[s\mathcal{L}].

Let \mathcal{L} be the infinitesimal operator of T(s, x) and let f: M \to R be a smooth
function. We are interested in studying the changes of f under the action
T(s, x), denoted by T_s f(x) = f(T_s x) = f(\exp(s\mathcal{L})x). By Taylor's theorem,

f(\exp(s\mathcal{L})x) = f(x) + s\mathcal{L}f(x) + (s^2/2)\mathcal{L}^2 f(x) + \cdots + (s^k/k!)\mathcal{L}^k f(x) + O(s^{k+1}),

where

T(s, x)|_{s=0} = x.
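A concrete instance of this expansion (our own worked example, not from the text): for the rotation generator \mathcal{L} = -y\,\partial/\partial x + x\,\partial/\partial y and f(x, y) = x, the powers \mathcal{L}^k f cycle through x, -y, -x, y, so the Taylor series sums to x cos s - y sin s, which is precisely f evaluated at the rotated point:

```python
import math

x0, y0, s = 2.0, -1.5, 0.8
cycle = [x0, -y0, -x0, y0]    # L^k f at (x0, y0) for k mod 4 = 0, 1, 2, 3
series = sum(s**k / math.factorial(k) * cycle[k % 4] for k in range(30))
exact = x0 * math.cos(s) - y0 * math.sin(s)   # f at the rotated point
print(series, exact)
```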
which yields \log y - \log x = \text{const}, that is, y/x = \text{const}, i.e., the orbits
form a star of radial lines.
The most important operation on infinitesimal operators is their com-
mutator or Lie bracket:

[\mathcal{L}_s, \mathcal{L}_t] = \mathcal{L}_s \mathcal{L}_t - \mathcal{L}_t \mathcal{L}_s.

In local coordinates, if

\mathcal{L}_s = \sum_i a_i(x)\, \partial/\partial x_i, \qquad \mathcal{L}_t = \sum_i b_i(x)\, \partial/\partial x_i,

then

[\mathcal{L}_s, \mathcal{L}_t] = \sum_i \left[ \sum_j \left( a_j\, \partial b_i/\partial x_j - b_j\, \partial a_i/\partial x_j \right) \right] \partial/\partial x_i.
3. Jacobi identity:

[\mathcal{L}_r, [\mathcal{L}_s, \mathcal{L}_t]] + [\mathcal{L}_t, [\mathcal{L}_r, \mathcal{L}_s]] + [\mathcal{L}_s, [\mathcal{L}_t, \mathcal{L}_r]] = 0.
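In local coordinates the bracket of \mathcal{L}_a = \sum_i a_i\,\partial/\partial x_i and \mathcal{L}_b = \sum_i b_i\,\partial/\partial x_i has coefficients c_i = \sum_j (a_j\,\partial b_i/\partial x_j - b_j\,\partial a_i/\partial x_j). The sketch below (the field names, the test point, and the finite-difference step are our own) implements this formula and checks that rotations and dilations of the plane commute, whereas the bracket of a translation with the dilation reproduces the translation:

```python
import numpy as np

h = 1e-6

def bracket(a, b, x):
    # coefficients of [L_a, L_b]: Jb(x) a(x) - Ja(x) b(x)
    def jac(f, x):
        J = np.zeros((2, 2))
        for j in range(2):
            e = np.zeros(2)
            e[j] = h
            J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
        return J
    return jac(b, x) @ a(x) - jac(a, x) @ b(x)

rot = lambda p: np.array([-p[1], p[0]])   # L_R = -y d/dx + x d/dy
dil = lambda p: np.array([p[0], p[1]])    # L_D =  x d/dx + y d/dy
tx  = lambda p: np.array([1.0, 0.0])      # t_x = d/dx

p0 = np.array([1.3, -0.4])
print(bracket(rot, dil, p0))   # ~ (0, 0): rotations and dilations commute
print(bracket(tx, dil, p0))    # ~ (1, 0): [t_x, L_D] = t_x
```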
Finally, we define the prolongation of an infinitesimal operator (or vector
field) and restrict ourselves to the case M = R^2 (see Olver, 1986, for a general
treatment). Let G be a one-parameter group acting on R^2, and let the
transformations be given by

x' = T_1(s; x, y), \qquad y' = T_2(s; x, y),

and

\mathcal{L} = a_1(x, y)\, \partial/\partial x + a_2(x, y)\, \partial/\partial y.

The first prolongation of the group G is a group G_1 acting on the variables
x, y and \dot{y} = dy/dx,

x' = T_1(s; x, y), \qquad y' = T_2(s; x, y), \qquad \dot{y}' = T_3(s; x, y, \dot{y}).

The infinitesimal operator of the prolongation is

\mathcal{L}^{(1)} = \mathcal{L} + A_1(x, y, \dot{y})\, \partial/\partial \dot{y},

where
(Hoffman, 1970; Olver, 1986). The group G_2, the second-order prolongation,
can be obtained from G_1 in a similar way, and the process can be repeated to
obtain G_k, the kth prolongation of G. The group G_k is determined by the kth prolon-
gation of the infinitesimal operator

\mathcal{L}^{(k)} = \mathcal{L} + A_1(x, y, \dot{y})\, \partial/\partial \dot{y} + \cdots + A_k(x, y, \dot{y}, \ldots, y^{(k)})\, \partial/\partial y^{(k)},
APPENDIX B
The literature on differential geometry is even larger than that on Lie groups.
These notes are based mainly on Stoker (1963) and Do Carmo (1976).
The most general way to define a surface is given by the parametric
representation

f: (u, v) \to f(u, v) = (x(u, v), y(u, v), z(u, v)),
n = \frac{x_u \times x_v}{|x_u \times x_v|}.
metric of a surface depends only on the surface itself and does not depend on
how the surface is embedded in the three-dimensional space; therefore, the
metric is referred to as an intrinsic property of the surface.
We denote by

x_{uu} = \partial^2 x/\partial u^2, \qquad x_{vv} = \partial^2 x/\partial v^2, \qquad x_{uv} = \partial^2 x/\partial u \partial v,

the second partial derivatives of x with respect to u and v, and, for reasons
that will be apparent later on, we introduce the notation

x_{11} = x_{uu}, \qquad x_{22} = x_{vv}, \qquad x_{12} = x_{21} = x_{uv}.
II = \sum_{i,j=1}^{2} b_{ij}\, du_i\, du_j,
x_{ij} = \sum_{k=1}^{2} \Gamma_{ij}^k\, x_k + b_{ij}\, n,

where i, j range from 1 to 2 (note that there are only three equations, as
x_{12} = x_{21}).
The Christoffel symbols of the second kind \Gamma_{ij}^k depend only upon the coefficients
of the first fundamental form and are expressed by the formula

K = k_1 k_2 = \det[b_{ij}]/g.
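As a numerical illustration of K = det[b_ij]/g (the torus example and the closed-form comparison are our own, not from the text): for a torus with radii R0 and r, the Gaussian curvature is cos v / (r(R0 + r cos v)), which the finite-difference computation below reproduces:

```python
import numpy as np

R0, r, h = 3.0, 1.0, 1e-4
torus = lambda u, v: np.array([(R0 + r * np.cos(v)) * np.cos(u),
                               (R0 + r * np.cos(v)) * np.sin(u),
                               r * np.sin(v)])

def gauss_K(surf, u, v):
    # derivatives of the parametric representation by central differences
    xu  = (surf(u + h, v) - surf(u - h, v)) / (2 * h)
    xv  = (surf(u, v + h) - surf(u, v - h)) / (2 * h)
    xuu = (surf(u + h, v) - 2 * surf(u, v) + surf(u - h, v)) / h**2
    xvv = (surf(u, v + h) - 2 * surf(u, v) + surf(u, v - h)) / h**2
    xuv = (surf(u + h, v + h) - surf(u + h, v - h)
           - surf(u - h, v + h) + surf(u - h, v - h)) / (4 * h**2)
    n = np.cross(xu, xv)
    n /= np.linalg.norm(n)
    det_g = (xu @ xu) * (xv @ xv) - (xu @ xv)**2   # g = det[g_ij]
    det_b = (xuu @ n) * (xvv @ n) - (xuv @ n)**2   # det[b_ij]
    return det_b / det_g

u0, v0 = 0.7, 1.1
print(gauss_K(torus, u0, v0), np.cos(v0) / (r * (R0 + r * np.cos(v0))))
```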
REFERENCES
Aarts, E., and Korst, J. (1989). "Simulated Annealing and Boltzmann Machines." John Wiley
and Sons, New York.
Attneave, F. (1954). Psychological Review 61, 183.
Ballard, D. H., and Brown, C. M. (1982). “Computer Vision.” Prentice-Hall, Englewood Cliffs,
N.J.
Besl, P. J., and Jain, R. C. (1985). ACM Comput. Surveys 17, 75.
Besl, P. J., and Jain, R. C. (1986). Comput. Vision Graphics and Image Process. 33, 33.
Beusmans, J . M. H., Hoffman, D. D., and Bemnet, B. M. (1987). J . Opt. SOC.Am. A4, 1155.
Bischof, W. F., and Ferraro, M. (1989). Computational Intelligence 5, 121.
Blake, A., and Marinos, C. (1990). Artificial Intelligence 45, 323.
Bluman, G. W., and Cole, J. D. (1974). “Similarity Methods for Differential Equations.”
Springer-Verlag, New York.
Bochner, S., and Chandrasekharan, K. (1949). “Fourier Transforms.” Princeton University
Press, Princeton, New Jersey.
Borello, L., Ferraro, M., Penengo, P., and Rossotti, M. L. (1981). Biol. Cybern. 39, 78.
Braddick, O., Campbell, F. W., and Atkinson, J. (1978). In “Handbook of Sensory Physiology”
(R. Held, H. Leibowitz, and H.-L. Teuber, eds.), Vol. 8. Springer-Verlag, New York.
Brady, M., and Yuille, J. (1984). IEEE Transactions on Pattern Analysis and Machine Intelligence
PAMI-6, 288.
Brady, M., Ponce, J., Yuille, A., and Asada, H. (1985). Comput. Vision Graphics Image Process.
32, 1.
Breitmeyer, B. S. (1973). Vision Res. 13, 41.
Brettel, H., Caelli, T. M., Hilz, R., and Rentschler, I. (1982). Human Neurobiol. 1, 61.
Bundesen, C., and Larsen, A. (1975). J. Exp. Psychol. Human Percept. Perform. 3, 214.
Caelli, T. M. (1976). Mathematical Biosciences 30, 191.
Caelli, T. M., and Dodwell, P. C. (1982). Percept. Psychophys. 32, 314.
Caelli, T. M., and Liu, Z-Q. (1988). Pattern Recognition 21, 205.
Caelli, T. M., and Umanski, J. (1976). Vision Res. 16, 1055.
Caelli, T. M., Preston, G. A. N., and Howell, R. (1978). Vision Res. 18, 723.
Caelli, T. M., Ferraro, M., and Barth, E. (1992). In “Neural networks for human and machine
perception” (H. Wechsler, ed.). Academic Press, Boston.
Campbell, F. W., and Robson, J. G. (1968). J. Physiol. (Lond.), 203, 237.
Canny, J. (1986). IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8, 679.
Carpenter, G. A., Grossberg, S., and Mehanian, C. (1989). Neural Networks 2, 169.
Cartan, H. (1971). “Differential Forms.” Kershaw Publishing Company, London.
Casasent, D., and Psaltis, D. (1976). Optical Engineering 15, 258.
Chern, S. (1957). Amer. J. Math. 19, 949.
Corballis, M. C., Zbrodoff, N. J., Shetzer, L. I., and Butler, P. B. (1978). Memory Cognition 6,
98.
Crampin, M., and Pirani, F. A. E. (1986). “Applicable Differential Geometry.” Cambridge
University Press, Cambridge.
De Valois, R. L., and De Valois, K. K. (1988). “Spatial Vision.” Oxford University Press,
Oxford.
Do Carmo, M. P. (1976). “Differential Geometry of Curves and Surfaces.” Prentice Hall,
Englewood Cliffs, New Jersey.
Dodwell, P. C. (1983). Percept. Psychophys. 34, 1.
Eley, M. G. (1982). Memory Cognition 10, 25.
Eriksen, B. A., and Eriksen, C. W. (1974). Percept. Psychophys. 16, 143.
Fan, T-J., Medioni, G., and Nevatia, R. (1989). IEEE Transactions on Pattern Analysis and
Machine Intelligence, 11, 1140.
Faux, I. D., and Pratt, M. J. (1979). “Geometry for Design and Manufacture.” Ellis Horwood,
Chichester, United Kingdom.
Ferraro, M., and Caelli, T. M. (1988). J. Opt. Soc. Amer. A5, 738.
Ferraro, M., and Foster, D. H. (1984). Biol. Cybern. 50, 9.
Ferrier, N. (1987). “Invariance coding in pattern recognition.” MSc. Thesis. University of
Alberta, Edmonton, Alberta, Canada.
Foster, D. H. (1972). Biol. Cybern. 11, 223.
Foster, D. H., and Mason, R. J. (1979). Biol. Cybern. 32, 85.
Gilbarg, D., and Trudinger, N. (1977). “Elliptic Partial Differential Equations.” Springer-
Verlag, New York.
Giulianini, F., Ferraro, M., and Caelli, T. M. (1992). J. Opt. Soc. Amer. A9, 494.
Giusti, E. (1978). Invent. Math. 46, 111.
Gonzalez, R. G., and Wintz, P. (1987). “Digital Image Processing.” Addison-Wesley, Reading,
Massachusetts.
Graham, N. (1980). In “Visual Coding and Adaptability” (C. S. Harris, ed.). Lawrence Erlbaum
Associates, Hillsdale, New Jersey.
Grimson, W. E. L. (1980). AIM 565, Artificial Intelligence Laboratory, Massachusetts Institute
of Technology, Cambridge, Massachusetts.
Grossberg, S. (1976a). Biol. Cybern. 23, 121.
Grossberg, S. (1976b). Biol. Cybern. 23, 187.
Guggenheimer, H. W. (1963). “Differential Geometry.” McGraw-Hill, New York.
Hadamard, J. (1923). “Lectures on the Cauchy Problem in Linear Partial Differential
Equations.” Yale University Press, New Haven, Connecticut.
Hansen, E. W. (1981). Applied Optics 20, 2266.
Haralick, R. M., Watson, L. T., and Laffey, T. J. (1983). International Journal of Robotics
Research 2, 50.
Hoffman, W. C. (1966). Journal of Mathematical Psychology 3, 65; errata (1967). Journal of
Mathematical Psychology 4, 348.
Hoffman, W. C. (1970). Mathematical Biosciences 6, 437.
Hoffman, W. C. (1977). Cahiers de Psychologie 20, 139.
Horn, B. K. P., and Brooks, M. J. (1986). Comput. Vision Graphics and Image Process. 33, 174.
Hsu, Y-N., and Arsenault, H. H. (1982). Applied Optics 21, 4016.
Hsu, Y-N., Arsenault, H. H., and April, G. (1982). Applied Optics 21, 4012.
Hubel, D. H., and Wiesel, T. N. (1962). J. Physiol. 160, 106.
Hubel, D. H., and Wiesel, T. N. (1965). J. Neurophysiol. 28, 229.
Kahn, J. I., and Foster, D. H. (1981). Q. J. Exp. Psychol. 33A, 155.
Koenderink, J. J. (1987). In “Image Understanding” (W. Richards and S. Ullman, eds.). Ablex
Publishing Corporation, Norwood, New Jersey.
Kolers, P. A., Duchnicky, R. L., and Sundstroem, G. (1985). J. Exp. Psychol. Percept. Perform.
11, 726.
Korn, G. A., and Korn, T. M. (1968). “Mathematical Handbook for Scientists and Engineers.”
McGraw-Hill, New York.
Kubovy, M., and Podgorny, P. (1981). Percept. Psychophys. 30, 24.
Lang, S. (1967). “Introduction to Differentiable Manifolds.” John Wiley and Sons, New York.
Lawden, M. C. (1983). Vision Res. 23, 1451.
Lupker, S. J., and Massaro, D. W. (1979). Percept. Psychophys. 25, 60.
Maffei, L. (1980). In “Handbook of Sensory Physiology” (R. Held, H. Leibowitz, and
H.-L. Teuber, eds.), Vol. 8. Springer-Verlag, New York.
Marr, D. C. (1976). Phil. Trans. Roy. Soc. London B207, 483.
Marr, D. C. (1982). “Vision.” Freeman, San Francisco.
Marr, D. C., and Hildreth, E. (1980). Phil. Trans. Roy. Soc. London B207, 187.
Metzler, J., and Shepard, R. N. (1974). In “Theories in Cognitive Psychology” (R. Solso, ed.).
Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Nazir, T. A., and O’Regan, J. K. (1990). Spatial Vision 5, 81.
Olver, P. J. (1986). “Applications of Lie Groups to Differential Equations.” Springer-Verlag,
New York.
O’Neill, B. (1966). “Elementary Differential Geometry.” Academic Press, New York.
Ovsiannikov, L. V. (1982). “Group Analysis of Differential Equations.” Academic Press, New
York.
Papoulis, A. (1984). “Signal Analysis.” McGraw-Hill, New York.
Pentland, A. P. (1987). IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-
9, 523.
Poggio, T., and Torre, V. (1984). AIM 773, Artificial Intelligence Laboratory, Massachusetts
Institute of Technology, Cambridge, Massachusetts.
Pollen, D. A., and Ronner, S. F. (1981). Science N.Y. 212, 1409.
Pollen, D. A., and Ronner, S. F. (1982). Vision Res. 22, 101.
Richards, W. A., Koenderink, J. J., and Hoffman, D. D. (1987). J. Opt. Soc. Am. A4, 1168.
Robson, J. G. (1975). In “Handbook of Perception” (E. C. Carterette and M. P. Friedman, eds.),
Vol. 5, 81. Academic Press, New York.
Rock, I. (1973). “Orientation and Form.” Academic Press, New York.
Rock, I. (1984). “Perception.” Scientific American Library, New York.
Rosenfeld, A., and Kak, A. C. (1982). “Digital Picture Processing,” Second Edition. Academic
Press, New York.
Sagle, A. A., and Walde, R. E. (1973). “Introduction to Lie Groups and Lie Algebras.”
Academic Press, New York.
Schutz, B. (1980). “Geometrical Methods for Mathematical Physics.” Cambridge University
Press.
Schwartz, E. L. (1980). Vision Res. 20, 645.
Sederberg, T. W., and Anderson, S. N. (1985). IEEE Comput. Graphics Appl. 5, 23.
Simas, M. L. de B., and Dodwell, P. C. (1990). Spatial Vision 5, 59.
Spivak, M. (1979). “A Comprehensive Introduction to Differential Geometry.” Publish or
Perish, Berkeley, California.
Stoker, J. J. (1963). “Differential Geometry.” Wiley-Interscience, New York.
Tiller, W. (1983). IEEE Comput. Graphics Appl. 3, 61.
Torre, V., and Poggio, T. A. (1986). IEEE Transactions on Pattern Analysis and Machine
Intelligence PAMI-8, 147.
Ullman, S. (1979). Proc. R. Soc. Lond. B 203, 405.
Wilkinson, F. E., and Dodwell, P. C. (1980). Nature 284, 258.
Witkin, A. P. (1981). Artificial Intell. 17, 17.
Woodham, R. J. (1980). Optical Engineering 19, 139.
Wu, R., and Stark, H. (1984). Applied Optics 23, 838.
Yuzan, Y., Hsu, Y-N., and Arsenault, H. H. (1982). Optica Acta 29, 627.
Zetzsche, C., and Barth, E. (1990). Vision Res. 30, 1111.
Zucker, S. W. (1985). Comput. Vision Graphics and Image Process. 32, 74.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84

Finite Topology and Image Analysis

V. A. KOVALEVSKY
I. Introduction  197
II. Abstract Cell Complexes  201
III. Images on Cell Complexes  208
IV. Resolution of Connectivity Contradictions  212
V. Boundaries in Complexes  216
VI. Simple Image Analysis Problems  220
VII. The Cell List Data Structure  224
VIII. Subgraph and Subcomplex Isomorphism  229
IX. Variability of Prototypes and Use of Decision Trees  238
X. Applications  245
A. Handwritten Characters  245
B. Block Diagrams  247
C. Cartography  250
D. Technical Drawings  254
XI. Conclusions  257
Acknowledgments  258
References  258
I. INTRODUCTION
(Alexandroff and Hopf, 1935), but this knowledge is only weakly represented
in topological textbooks. Therefore, specialists in image analysis were forced
to look for their own solutions of the problem. Rosenfeld (1970) introduced
the adjacency relation among pixels, thus considering the digital plane as a
graph whose vertices are the pixels; an edge of the graph corresponds to each
pair of adjacent pixels. Such a graph is called an adjacency graph. Connected
subsets of the digital plane are then defined by means of paths in the
adjacency graph. A path in a graph G is a sequence Q of vertices of G,
Q = (v₁, v₂, …, vₙ),
such that any two subsequent vertices in Q are connected by an edge of G.
A subset S of the digital plane defines a subgraph SG of G. The subset S is
declared to be connected if for any two pixels a, b of S there exists a path in
SG that contains the vertices corresponding to a and b (“path in SG”
means that it lies completely in SG). A subset S that is not connected may
be considered as a union of disjoint connected subsets such that no two of
them compose a connected set. Such subsets are called components of S.
The concept of adjacency has led to some progress in image analysis: it
became possible to consider connected subsets of a segmented image, bound-
aries, pairs of adjacent subsets and to formulate some image analysis
problems as, e.g., those of subgraph isomorphism (Ullmann, 1976) or of
consistent labelling (Shapiro, 1983). These problems are considered in
Section VIII.
However, attempts to develop a consistent topology for two- and three-
dimensional images by means of graphs have failed due to the so-called
connectivity paradox and some great difficulties in defining the boundaries of
subsets (Pavlidis, 1977). The connectivity paradox consists in the following.
The well-known Jordan theorem states that a simple closed curve in the
Euclidean plane separates the complementary part of the plane into two
components: the interior and the exterior of the curve. The natural substitute
for a simple closed curve in the digital plane is a simple closed path P in the
adjacency graph. “Simple” means that any vertex of P has exactly two
adjacent vertices in P. It is possible to consider at least two different kinds of
adjacency graphs: those of 4- and 8-adjacency. In the 4-adjacency graph, a
vertex has four adjacent vertices (Fig. 1a), and in the 8-adjacency it has eight
(Fig. 1c). As may be seen in Fig. 1a, there exist in a 4-adjacency graph such
simple closed paths that the rest of the graph consists of more than two
components. On the other hand, a simple closed path in an 8-adjacency graph
does not separate the rest of the graph at all: the “interior” of the path always
remains connected with the “exterior.” Only under 6-adjacency is the number
of components always equal to two (Fig. 1b).
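The connectivity paradox is easy to reproduce computationally. The sketch below (the component search and the particular closed paths are this illustration's own, not the chapter's) builds a simple closed 4-path whose complement falls into three 4-components, and a simple closed 8-path (a small diamond) whose complement remains a single 8-component:

```python
from collections import deque

def components(cells, adj):
    """Connected components of a set of grid cells under an adjacency relation."""
    cells = set(cells)
    out = []
    while cells:
        seed = cells.pop()
        comp = {seed}
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in adj(p):
                if q in cells:
                    cells.remove(q)
                    comp.add(q)
                    queue.append(q)
        out.append(comp)
    return out

def n4(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def n8(p):
    x, y = p
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

grid = {(x, y) for x in range(-1, 5) for y in range(-1, 5)}

# A simple closed 4-path surrounding the two diagonal pixels (1,1) and (2,2):
P4 = {(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2),
      (3, 3), (2, 3), (1, 3), (1, 2), (0, 2), (0, 1)}
print(len(components(grid - P4, n4)))   # 3: the exterior plus two isolated interior pixels

# A simple closed 8-path (a diamond); its complement stays 8-connected:
P8 = {(1, 0), (2, 1), (1, 2), (0, 1)}
print(len(components(grid - P8, n8)))   # 1: the "interior" (1,1) reaches the exterior
```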
Attempts to overcome this difficulty have been made by introducing
FINITE TOPOLOGY AND IMAGE ANALYSIS 199
FIGURE 1. Separation by simple closed paths under (a) 4-adjacency, (b) 6-adjacency, and
(c) 8-adjacency.
FIGURE 2. (a) A subset, (b) its inner (I) and outer (O) boundaries under 4-adjacency, and
(c) under 8-adjacency.
200 V. A. KOVALEVSKY
two boundaries: the inner and the outer (compare labels I and O in Fig. 2).
The width becomes equal to one pixel, but there is still no difference between
a boundary and a narrow region: the area of a boundary (being commonly
defined as the number of pixels) is still not equal to zero. In addition, one gets
different boundaries for a given subset S and for its complement. The
boundaries are different for 4- and 8-adjacency (compare Figs. 2b and 2c).
Each of the 4-boundaries (inner and outer) is disconnected under this
adjacency. The 8-boundaries are not simply connected. All these peculiarities
of boundaries are topological paradoxes.
Intuitive attempts to overcome the difficulties were often reported in the
literature. Thus, Rosenfeld and Kak (1976), when considering perimeters of
subsets in digital images, have suggested regarding “cracks” separating pixels
of a subset from those of its complement. Elliott and Srinavasan (1981)
considered boundaries as consisting of “boundary elements,” i.e., short line
segments equivalent to “cracks.” The Apple company also uses a similar
concept when describing its graphics software. Herman and Webster (1983)
define the boundary surface of a three-dimensional region as a set of “faces”:
space elements separating two adjacent voxels (volume elements) from each
other. These ideas may serve as evidence that image processing specialists
feel strongly that a consistent topological concept for digital images must
include space elements of various kinds.
This feeling will be verified in Section 11, where it is shown that the
resolution of these problems consists in considering the digital plane as a
finite topological space in full accordance with topological axioms. It is
shown that the most suitable for practical purposes is the particular case of
finite topological spaces, known as abstract cell complexes. Topologically
consistent definitions of connectivity and boundaries are given there. In
Section 111, images on cell complexes are defined and ways of encoding them
are discussed. Important notions of Cartesian complexes and coordinates are
introduced there. Section IV contains the explanation of why some adjacency
graphs are topologically contradictory. Section V is devoted to boundaries.
A definition of boundaries of subcomplexes is given, and advantages of this
concept as compared to boundaries in adjacency graphs are presented.
Section VI is devoted to the simplest applications of finite topology to image
analysis: tracking and filling of boundaries, and thinning of regions. Section
VII describes a new topologically founded data structure: the cell list, which
represents a segmented image as a cell complex. The cell list is the base for
efficient image analysis. Algorithms for transforming a raster image into a
cell list are also described in Section VII. Sections VIII and IX represent an
advanced concept of image analysis: a generalization of the subgraph isomor-
phism problem based on the notion of a cell list. Both the problem formula-
tions and solutions are discussed. Section X contains applications.
FIGURE 3. The surface of a polyhedron.

II. ABSTRACT CELL COMPLEXES
that elements with lower numbers bound those with higher numbers. The
numbers are called dimensions of the space elements. Thus, vertices that are
not bounded by other elements get the lowest dimension, i.e., 0; the edges get
dimension 1 and the faces dimension 2. Structures of this kind are known as
abstract cell complexes (Steinitz, 1908).
Definition 1: An abstract cell complex C = (E, B, dim) is a set E of abstract
elements provided with an antisymmetric, irreflexive, and transitive binary
relation B ⊂ E × E, called the bounding relation, and with a dimension
function dim: E → I from E into the set I of non-negative integers, such that
dim(e′) < dim(e″) for all pairs (e′, e″) ∈ B.
Elements of E are called abstract cells. It is important to draw the attention
of topologists to the fact that, in contrast to cells of Euclidean complexes,
abstract cells should not be regarded as point sets in a Euclidean space. That
is why abstract cell complexes (ACC’s) and their cells are called abstract.
Neither should an ACC be regarded as a quotient space of a Euclidean space,
as it was proposed by Kong and Rosenfeld (1991). (A quotient space Q of a
space S with a given decomposition of S into disjoint subsets is a space whose
elements correspond to subsets of S , while a subset of Q is open in Q iff the
union of corresponding subsets of S is open in S.) Considering cells as
abstract space elements makes it possible to develop the topology of ACC’s
as a self-contained theory that is independent of the topology of Euclidean
spaces.
If the dimension dim(e′) of a cell e′ is equal to d, then e′ is called a
d-dimensional cell or a d-cell. An ACC is called k-dimensional or a k-complex
if the dimensions of all its cells are less than or equal to k. If (e′, e″) ∈ B, then
e′ is said to bound e″.
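Definition 1 can be made concrete in a few lines of code (a sketch only; the class and cell names are this illustration's, not the chapter's). The bounding relation is stored as its transitive closure, and the condition dim(e′) < dim(e″) is checked for every pair:

```python
# A minimal sketch of an abstract cell complex (E, B, dim): B is made
# transitive by closure, and the dimension must strictly increase along B,
# which also guarantees irreflexivity and antisymmetry.

class ACC:
    def __init__(self, dims, bounds):
        self.dim = dict(dims)              # cell -> dimension
        B = set(bounds)
        changed = True
        while changed:                     # transitive closure of the relation
            changed = False
            for (a, b) in list(B):
                for (c, d) in list(B):
                    if b == c and (a, d) not in B:
                        B.add((a, d))
                        changed = True
        self.B = B
        assert all(a != b for (a, b) in B)                     # irreflexive
        assert all(self.dim[a] < self.dim[b] for (a, b) in B)  # dim(e') < dim(e'')

    def bounds(self, a, b):
        return (a, b) in self.B

# A single square: one 2-cell a1, four 1-cells, four 0-cells.
dims = {"a1": 2, "l1": 1, "l2": 1, "l3": 1, "l4": 1,
        "p1": 0, "p2": 0, "p3": 0, "p4": 0}
pairs = [("l1", "a1"), ("l2", "a1"), ("l3", "a1"), ("l4", "a1"),
         ("p1", "l1"), ("p2", "l1"), ("p2", "l2"), ("p3", "l2"),
         ("p3", "l3"), ("p4", "l3"), ("p4", "l4"), ("p1", "l4")]
cx = ACC(dims, pairs)
print(cx.bounds("p1", "a1"))   # True: p1 bounds l1 and l1 bounds a1
```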
Examples of ACC’s are shown in Fig. 4. In Fig. 4 and in the sequel, the
following graphical notations (similar to that of Fig. 3) are used: 0-cells are
denoted by small circles or squares representing points (which cannot be
drawn), 1-cells are denoted by line segments, 2-cells by interiors of rectangles
or other polygons, and 3-cells by interiors of polyhedrons. The complexes of
Figs. 4a and 4d are one-dimensional: they contain only 0- and 1-cells. The
bounding relation in these examples is defined in a natural way: a 1-cell
represented in the figure by a line segment is bounded by the 0-cells represented
by its end points.
The complexes of Figs. 4a and 4d may be considered as graphs whose
vertices are the 0-cells and whose edges are the 1-cells. Any graph may be
considered in turn as an abstract cell complex if each vertex of the graph is
declared to bound all edges incident with it and if every vertex gets, for
example, the dimension 0 and every edge the dimension 1. The relation
between ACC's and graphs is one more illustration of the idea of abstract
FIGURE 5. Examples of open and non-open subsets.
spaces. We shall need in the sequel the following properties of ACC’s. (We
shall write complex for ACC.)
Definition 3: A subcomplex S = (E′, B′, dim′) of a given abstract complex
C = (E, B, dim) is a complex whose set E′ is a subset of E and whose relation B′
is the intersection of B with E′ × E′. The dimension dim′ is equal to dim for all
cells of E′.
III. IMAGES ON CELL COMPLEXES
y + 2). The 2-cell (c_{x+1}, c_{y+1}) of the product complex consists of the 1-cell
c_{x+1} and the 1-cell c_{y+1}. The coordinates of (c_{x+1}, c_{y+1}) are
(x + 1, y + 1).
Similar spaces that do not regard dimensions of space elements were
considered by Khalimsky (1977) (see also Kong et al., 1991). It is easy to see
that a Cartesian ACC represents a finite analogue of a Cartesian Euclidean
space.
An n-dimensional image (n = 2 or 3) is defined by assigning numbers (gray
values or densities) to the n-dimensional cells of an n-dimensional Cartesian
ACC. There is no need to assign gray values or densities to cells of lower
dimensions. Such an assignment would be unnatural since a gray value may
be physically determined only for a finite area. We interpret 2-cells in a
two-dimensional ACC as elementary areas. Cells of lower dimensions have
area equal to zero. Similarly, a density may be physically determined only for
a finite volume that is represented in a three-dimensional ACC by a 3-cell.
However, when considering the connectivity of subsets (subcomplexes),
the membership in a subset under consideration must be specified for cells of
all dimensions in such a way that each cell of the ACC is declared to belong
either to the subset or to its complement. If more than one subset of a given
ACC is being regarded, then a partition of the ACC in disjoint subsets must
be considered and each cell of the ACC must be assigned to exactly one
subset of the partition. The membership may be determined by assigning
labels to the cells. A label may be considered as the identification number of
a subset. In the simplest case of a binary image there are just two subsets, e.g.,
the black and the white ones. Then both the gray values (densities) and the
membership labels may consist of a single bit. The membership labels of the
n-cells may then be identical with their gray values. This tempts one to
interpret the membership labels of the lower dimensional cells as gray values,
which is not correct because of the previously mentioned reasons. It is better
to distinguish between gray values or densities on the one side and mem-
bership labels on the other.
As soon as membership labels are assigned to all cells (of all dimensions)
of an ACC, the connectivity of its subsets may be consistently specified by
Definitions 7 or 9. It is important to stress that the connectivity is determined
by means of the lower dimensional cells, which serve as “cement” joining
n-dimensional cells. A set consisting of only n-dimensional cells is always
disconnected.
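The role of the lower-dimensional cells as "cement" can be illustrated as follows (a sketch with an even/odd coordinate encoding of a two-dimensional Cartesian ACC that is this illustration's assumption, not the chapter's notation: pixels sit at odd/odd coordinate pairs, and two cells are directly connected iff one bounds the other):

```python
# Lower-dimensional cells as "cement": a set containing only 2-cells is
# disconnected; adding the shared 1-cell makes it connected.
from collections import deque

def bounds(c, e):
    """Does cell c bound cell e in the even/odd encoding?"""
    (i, j), (a, b) = c, e
    return (c != e and abs(a - i) <= 1 and abs(b - j) <= 1
            and (a == i or i % 2 == 0)     # c may step off only along an axis
            and (b == j or j % 2 == 0))    # where its own coordinate is even

def is_connected(S):
    S = set(S)
    seed = next(iter(S))
    seen = {seed}
    queue = deque([seed])
    while queue:
        c = queue.popleft()
        for e in S - seen:
            if bounds(c, e) or bounds(e, c):
                seen.add(e)
                queue.append(e)
    return seen == S

print(is_connected({(1, 1), (3, 1)}))           # False: two bare pixels
print(is_connected({(1, 1), (2, 1), (3, 1)}))   # True: the 1-cell (2,1) joins them
```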
When storing the membership labels of a two-dimensional ACC explicitly,
i.e., in an image memory, four times more memory space is required as
compared with the space required for the 2-cells only (Kovalevsky, 1989a).
This quotient is equal to eight in the three-dimensional case. However, such
a great memory volume is rarely (if ever) needed in practice. There are many
ways to use a priori knowledge about the image under consideration to obtain
FIGURE 10. The south-east membership rule: (a) a connected and (b) a disconnected subset.
some implicit determination of the membership labels of a lower dimensional
cell as a function of the labels and gray levels of the n-cells bounded by it. The
determination may be realized by means of the so-called membership rules.
Such a rule cannot be chosen arbitrarily; it must specify the membership
of every cell of the ACC, and the membership of a cell must be specified
uniquely. E.g., it is not allowed to specify all faces of an n-cell as belonging
to the same subset as the n-cell itself since the same cell may be a face of two
different n-cells belonging to different subsets.
Consider some examples of membership rules. One of the simplest rules for
two-dimensional images assigns every 1-cell to the same subset as the 2-cell
that is incident with it and lies directly below or to the right of it. Every 0-cell
is assigned to the same subset as the incident 2-cell below and to the right of it. It is
easy to see that under this rule, the set of pixels shown in Fig. 10a is connected
and that in Fig. 10b disconnected. Thus, the connectivity is non-isotropic, as
in the case of the 6-adjacency (Fig. 1b).
An isotropic and more practically useful rule follows.
Maximum Value Rule. In an n-dimensional ACC, every cell c of dimension
less than n gets the membership label of the n-cell that has the maximum gray
value (density) among all n-cells bounded by c.
Under this rule, both sets of pixels in Figs. 10a and 10b are connected if the
pixels of the sets have a greater gray value than those of the background. The
membership of the 0-cells in Fig. 10b is changed, thus making the set
connected. It is, of course, possible to formulate a similar Minimum Value
Rule. The connectivity of a binary image is similar in both cases to that
obtained according to the widely used idea of an 8-adjacency for objects and
a 4-adjacency for the background (Rosenfeld and Kak, 1976). An important
advantage of the Maximum (Minimum) Value Rule is the possibility of using
it for multi-valued images. A slightly more complicated and also practically
IV. RESOLUTION OF CONNECTIVITY CONTRADICTIONS
12b and 12c, differing from them only by a rotation by 90° or 180°. Hence,
in any case the complementary complex SON−S consists of two components.
Let p₁ and p₂ be two 0-cells of S that are adjacent in the path S, i.e., they have
a common 1-cell l of S incident with both of them (Fig. 13a). Each of the
components of SON(p₁) has one 2-cell in common with one of the
components of SON(p₂). This is one of the 2-cells belonging to SON(l) (e.g.,
a₂ and a₃ in Fig. 13a). Thus, the union of a component of SON(p₁) with one
of the components of SON(p₂) composes a connected subset of C. When
repeatedly composing such unions for all subsequent 0-cells of S one obtains
two disjoint connected open subsets O′ and O″, separated by S (Fig. 13b).
These are the two components of the set U−S, where U is the union of the SON's
of all 0-cells of S. Notice that the SON of any 1-cell of S contains one 2-cell
belonging to O′ and one belonging to O″.
Now we shall show that any cell of C−S is connected in C−S either with O′

FIGURE 13. (a) Common 2-cells of two SON's, and (b) a path connecting a cell c′ to the
set O′.
FIGURE 14. Relation among (a) an adjacency graph, (b) an abstract cell complex, and (c)
a Cartesian complex.
the object iff either v₁ and v₄ or v₂ and v₃ belong to it; otherwise, both e₅ and
e₆ belong to the background.
To make adjacency graphs consistent for non-binary images, another
version of a membership rule may be suggested: numbers (e.g., gray values)
must be assigned to subsets of vertices, to the corresponding subgraphs, and to
their elements; both e₅ and e₆ must then belong to the subgraph with the
maximum number among those of v₁, v₄, v₂, and v₃.
Similar rules may be introduced in the three-dimensional case. The
following three situations must be distinguished:
a) Two voxels have a common 2-dimensional face. In the adjacency graph
there is a direct edge connecting these two voxels. Its membership is a
function of the memberships of the same two voxels.
b) Four voxels have a common 1-dimensional edge. In the adjacency
graph there are four direct and two “diagonal” edges connecting them. The
membership of these two diagonal edges is a function of the memberships of
the four voxels. This situation exactly corresponds to that of the 8-adjacency
in the two-dimensional case as previously considered.
c) Eight voxels have a common 0-dimensional vertex. In the adjacency
graph there are twelve direct, twelve diagonal (such as those of case b), and
four “double diagonal” edges connecting them. The membership of these
four double diagonal edges is a function of the memberships of the eight
voxels.
In the n-dimensional case, a constellation of 2ᵏ⁻¹ “multi-diagonal” graph
edges corresponds to a group of 2ᵏ n-dimensional space elements having a
common (n−k)-face. The membership of the whole constellation is a function
of the memberships of the 2ᵏ elements of the group.
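These counts can be verified by brute force (an illustration of my own, not the chapter's): the 2ᵏ elements sharing a common (n−k)-face correspond to the corners of a k-dimensional unit cube, and the multi-diagonal edges pair corners that differ in every coordinate.

```python
# Count the "multi-diagonal" pairs among the 2**k corners of a k-cube:
# pairs differing in all k coordinates. The expected count is 2**(k-1),
# matching cases a)-c) above for n = 3 (k = 1, 2, 3 gives 1, 2, 4).
from itertools import product

def multi_diagonal_pairs(k):
    corners = list(product((0, 1), repeat=k))
    pairs = [(a, b) for i, a in enumerate(corners) for b in corners[i + 1:]
             if all(x != y for x, y in zip(a, b))]   # differ in every coordinate
    return len(pairs)

for k in (1, 2, 3):
    print(k, multi_diagonal_pairs(k))   # 1 -> 1, 2 -> 2, 3 -> 4
```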
There are n different kinds of constellations of graph edges, which must be
treated differently to obtain a consistent embedding into a Cartesian
space. This is the price of reconciling adjacency graphs with
Cartesian spaces. The number n of kinds of constellations is exactly the
number of different dimensions of cells minus one (the n-dimensional cells are
excluded). However, it is much simpler and more descriptive to consider
n + 1 different kinds of cells in an ACC than n different constellations of
edges in an adjacency graph. This is one of the obviously important advan-
tages of the ACC's as compared to adjacency graphs.
V. BOUNDARIES IN COMPLEXES
1977). On the other hand, the theory of finite topological spaces, particularly
cell complexes, leads to a simple and consistent definition. The notion of a
boundary for ACC's is similar to that in general topology:
Definition 10: The boundary (frontier) of a subcomplex S ⊂ C relative to C
is the subcomplex Fr(S, C) consisting of all cells c of C such that SON(c)
contains cells both of S and of its complement C−S.
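Definition 10 translates directly into a few lines of code. The sketch below again assumes an even/odd coordinate encoding of a two-dimensional Cartesian complex (this illustration's convention, not the chapter's): SON(c) is c together with every cell bounded by c, and a cell belongs to Fr(S, C) iff its SON meets both S and the complement.

```python
# Sketch of Fr(S, C): a cell is a boundary cell iff its smallest open
# neighbourhood SON(c) intersects both S and C - S.

def son(c, cells):
    i, j = c
    return {(a, b)
            for a in (i - 1, i, i + 1) for b in (j - 1, j, j + 1)
            if (a, b) in cells
            and (a == i or i % 2 == 0)   # c can bound along an axis only where
            and (b == j or j % 2 == 0)}  # its own coordinate is even

def frontier(S, cells):
    return {c for c in cells if son(c, cells) & S and son(c, cells) - S}

# A 2x2-pixel Cartesian complex; S is the closed pixel at combinatorial (1, 1):
cells = {(i, j) for i in range(5) for j in range(5)}
S = {(a, b) for a in (0, 1, 2) for b in (0, 1, 2)}
fr = frontier(S, cells)
print((1, 1) in fr)   # False: no 2-cell ever lies in the boundary
print((2, 1) in fr)   # True: the 1-cell between the pixel and its right neighbour
```

Note that every cell of the computed boundary has at least one even coordinate, i.e., dimension below 2, in agreement with the discussion that follows.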
Figure 15a shows an example of a subcomplex S of a two-dimensional ACC
and Fig. 15b its boundary. The subcomplex S contains six 0-cells (among them
p₆, p₈, and p₁₀), which are marked by small fat squares. It also contains ten
1-cells (among them l₁, l₂, and l₁₂), which are represented by fat lines,
and four 2-cells (among them a₁, a₂, and a₆), shown as big shaded squares.
Figure 15a demonstrates various particular cases that may occur when determining a
boundary. The cell l₁ belongs to the boundary of S because its SON, consist-
ing of the cells a₀, l₁, and a₁, contains both cells of S (l₁ and a₁) and a cell of
the complement C−S (the cell a₀). The cell l₉ also belongs to the boundary
because its SON, consisting of the cells a₂, l₉, and a₆, contains the cells a₂ and
a₆, which are in S, but l₉ itself belongs to the complement. The 0-cell p₁₁ does
not belong to the boundary since its SON, consisting of p₁₁ itself, four 1-cells,
and four 2-cells incident with p₁₁, is completely in the complement of S. The
SON's of most other 0-cells intersect both S and its complement, and these
0-cells thus belong to the boundary.
Now consider the differences between boundaries according to Definition
10 and boundaries in adjacency graphs. First of all, let us notice that a
boundary Fr(S, C) in an n-dimensional complex C contains no n-dimensional
cells, since n is the highest dimension and hence an n-cell bounds no cells of
C. Therefore, the SON of such a cell consists of a single cell, namely the cell
itself. Hence, such a SON cannot contain cells of both S and its complement,
and the cell cannot belong to the boundary. Consequently, the boundary of
S is a subcomplex of the lower dimension n − 1.
Thus, the boundary of a region (a connected open subcomplex) in a
two-dimensional ACC contains no pixels and consists of 0- and 1-cells. It
looks like a closed polygon (or like several polygons, if the region has holes
in it). The boundaries so defined are analogous to those considered by Elliott
and Srinavasan (1981) or to the “(C,D)-borders” (sets of “cracks”) briefly
mentioned by Rosenfeld and Kak (1976, second edition). Similarly, the
boundary of a region in a three-dimensional ACC contains no voxels and
consists of 0-, 1-, and 2-cells. It looks like a closed surface of a polyhedron
(or several surfaces, if the region has holes). A 2-cell of a boundary separates
a voxel of the region from a voxel of its complement. Thus, the 2-cells of the
boundary are the “faces” considered by Herman and Webster (1983). We
may now see that the theory of the ACC’s brings many intuitively introduced
notions together in a consistent and topologically well founded concept.
The next peculiarity of the boundary Fr(S, C) is that it is unique: there is
no need (and no possibility!) to distinguish between the inner and outer
boundaries, defined by Pavlidis (1977), or between the “D-border of C” and
“C-border of D,” defined by Rosenfeld and Kak (1976, second edition). A
boundary according to Definition 10 is the same for a subset and for its
complement, since Definition 10 is symmetric with respect to both subsets.
Remember that this was not the case for adjacency graphs.
The boundary now depends neither on the kind of adjacency (a notion that
is no longer used) nor on the membership rules as defined in Section III. To
prove the last assertion we need one more notion:
Definition 11: A membership rule for an n-dimensional complex is called
local if the membership label of a cell c’ specified by this rule is equal to that
of some n-dimensional cell bounded by c’.
Theorem 1: The boundary of an n-dimensional subcomplex S consisting of
a set of n-cells and of cells of lower dimensions assigned to S by some local
membership rule does not depend on the choice of this local rule.
Proof: Consider a boundary cell c‘ of S. The dimension of c’ is obviously
less than n. Without loss of generality, we may suppose that c‘ is assigned by
the membership rule to S (rather than to its complement). Since the rule is
local, there must be in the SON(c’) an n-dimensional cell belonging to S.
Since c’ belongs to the boundary of S there must also be at least one cell c”
in SON(c’) that does not belong to S. The dimension of c” is higher than that
of c’ since SON(c’) contains, besides c’ itself, only cells bounded by c’. If the
dimension of c” is equal to n, then SON(c’) contains at least one n-dimensional
cell that is not in S.
FINITE TOPOLOGY AND IMAGE ANALYSIS 219
If, however, dim(c”) < n, then the membership of c” is specified by the local
membership rule, and hence there must be an n-dimensional cell c”’ bounded
by c” that does not belong to S. According to the transitivity of the bounding
relation, c”’ is also bounded by c’ and hence belongs to SON(c’). Thus, in any
case there is in SON(c’) an n-dimensional cell not belonging to S. It has been
shown previously that there is also in
SON(c’) an n-dimensional cell belonging to S. The membership of the n-
dimensional cells does not depend on the membership rule. Therefore, c’ will
belong to the boundary of S independently of the membership of the cells of
dimensions less than n and thus independently of the choice of a local rule.
A similar consideration may be repeated for the case when c’ is not in S.
The next important property of a boundary of a subcomplex (satisfying
some commonly fulfilled conditions) is that it has no end points, as was the
case for the 4-boundaries in Fig. 2b. The proof of this assertion demands,
however, additional definitions to specify the conditions just mentioned.
Therefore, we shall not present it here.
The notion of adjacent regions, which was profoundly investigated by
Pavlidis (1977) in connection with boundaries, may successfully be replaced
by that of incident ones.
Definition 12: Two non-intersecting subcomplexes S1 and S2 of a complex C
are called incident with each other if there are two cells e’ E S1 and e” E S2 such
that one of them bounds the other.
It may be easily shown that the boundaries of incident subcomplexes
intersect. The solution of this problem, suggested by Pavlidis (1977) by means
of the so-called extended boundaries, may be considered as an intuitive
prevision of the topological results.
Consider the problem of the area of a boundary in a two-dimensional
complex. As explained earlier, a boundary of any subset S of such a complex
C contains no 2-cells. Therefore, such a boundary is a one-dimensional
subcomplex of C consisting only of 0- and 1-cells, i.e., of line elements and
points. It is natural to assign a non-zero area to the 2-cells (pixels) only. Then
the area of any one-dimensional complex is equal to zero, which is in
accordance with our intuition. Similarly, the boundary of any subset of a
three-dimensional ACC contains no 3-cells, and in an n-dimensional complex
no n-cells. Hence, the volume of a boundary or, in general, its n-dimensional
measure is equal to zero.
We have demonstrated that the theory of the ACC’s, being a consistent
branch of classical, well-proven topology, removes all topological paradoxes
and contradictions from the theory of digitized images. It may be applied
without any change to describe finite topological spaces of any dimension.
Why is the notion of a boundary in an adjacency graph topologically
VI. SIMPLE IMAGE ANALYSIS PROBLEMS
The concept of abstract cell complexes not only makes the theory of digital
image analysis and computer graphics free of contradictions, it also enables
one to develop elegant, simple, and comprehensible algorithms. Consider
first the problem of tracking boundaries of regions in two-dimensional raster
images. Regions may be specified in the usual way, that is, by labelling all
pixels (2-cells) of a region by some label different from those of adjacent
regions. Cells of lower dimensions need not be labelled explicitly; their
membership in a region may be defined in the most practical cases by a
membership rule as explained in Section III. According to the Maximum
Value Rule, every 0- and 1-cell c gets the label of the 2-cell bounded by it that
has the maximum gray value. Under such a labelling, the boundaries cannot
contain isolated 0-cells. Therefore, it is sufficient to test only the 1-cells for
their membership in a boundary. According to Definition 10, a 1-cell c’
belongs to a boundary of a subset S iff the SON(c’) consisting of c’ itself and
of two incident 2-cells (see Fig. 6) intersects both S and its complement. Since
the subsets are defined by labels, c’ belongs to a boundary iff the three cells
do not have the same label. If the membership rule is a local one then the label
of c’ is always equal to one of the two labels of the incident 2-cells. Thus, it
is sufficient to test these two labels.
The tracking algorithm described below is identical with the “crack
following” (Rosenfeld and Kak, 1976, second edition). Our description is
given in terms of cell complexes, which has the advantage that it is topologic-
ally justified and more comprehensible.
The algorithm goes from one 0-cell to the next, step by step, in such a
direction that the region with the chosen label (the object) always remains to
the right-hand side of the direction. These moves go along the 1-cells, which,
in a two-dimensional Cartesian ACC, are either horizontal or vertical. Thus,
there are only four possible directions, as shown in Fig. 16. Having only four
directions rather than eight, as is usual when tracking boundaries in
adjacency graphs, is already a contribution toward simplifying the algorithm.
When arriving at the next 0-cell p, the direction of the last step that led to
p is known. Thus, it is known that the 2-cell lying to the right of this direction
belongs to the object, and that the one to the left belongs to the background
(Fig. 16). In this way, the membership of two pixels of SON(p) is already
known. It is only necessary to test the labels of the remaining two pixels of
SON(p) lying ahead: one to the right and one to the left of the direction of
the last step (R and L in Fig. 16). Consider the case when the object has a
greater gray level than the background, and accept the Maximum Value Rule
to determine the membership of the 0-cells. Then the actual 0-cell p (denoted
by a circle in Fig. 16) always belongs to the object, because p is a boundary
cell and, according to Definition 10, there must be in the SON(p) at least one
object pixel. The pixel having the maximum gray level determines the mem-
bership of p.
The direction of the next step depends upon the labels of R and L in the
following way: if L is in the object then turn left, else if R is in the background
turn right, else retain the old direction. A similar decision rule must be used
in the case when the object has a smaller gray level than the background. This
decision rule is the kernel of the tracking algorithm. The rest consists of some
obvious procedures:
a) calculating the coordinates of the pixels R and L as functions of the
actual coordinates of p and the direction;
b) calculating the new direction when turning to the right or to the left;
and
c) calculating the new coordinates of p after having made the next step in
the new direction.
Procedures a) and c) may be easily realized by means of small arrays of
constants serving as lookup tables for the coordinate increments depending
on the direction. Procedure b) may be realized as increasing or decreasing the
direction value by 1 modulo 4 (if the directions are encoded by numbers
from 0 to 3). The whole procedure, including the definition of the lookup
tables, contains about 20 Pascal instructions. Tracking algorithms that do
not use the concept of cell complexes are much more complicated and less
comprehensible (compare, e.g., Rosenfeld and Kak, 1976, second edition; or
Pavlidis, 1982).
FIGURE 17. Recognizing inner pixels (a, b) in adjacency graphs, and (c) in a cell complex
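The decision rule and the lookup tables just described might be sketched, for instance, as follows; the function name, the `grid[y][x]` pixel-array representation, and the search for a starting corner are our own assumptions (this is not the author's Pascal code), and the object is assumed to carry the label `obj`:

```python
def track_boundary(grid, obj):
    """Crack following: visit the 0-cells (pixel corners) of the boundary so
    that the region labelled `obj` always stays on the right-hand side of the
    direction of motion.  Directions 0..3 = east, south, west, north; the
    y coordinate grows downward and pixel (x, y) is read as grid[y][x]."""
    H, W = len(grid), len(grid[0])

    def label(px, py):                    # everything outside is background
        return grid[py][px] if 0 <= px < W and 0 <= py < H else None

    STEP = [(1, 0), (0, 1), (-1, 0), (0, -1)]     # coordinate increments
    FL = [(0, -1), (0, 0), (-1, 0), (-1, -1)]     # forward-left pixel
    FR = [(0, 0), (-1, 0), (-1, -1), (0, -1)]     # forward-right pixel

    # start at the top-left corner of the topmost-leftmost object pixel,
    # heading east, so that the object lies to the right of the first crack
    sy, sx = min((y, x) for y in range(H) for x in range(W)
                 if grid[y][x] == obj)
    x, y, d = sx, sy, 0
    corners = []
    while True:
        corners.append((x, y))
        # decision rule: if L is in the object then turn left, else if R is
        # in the background then turn right, else retain the old direction
        if label(x + FL[d][0], y + FL[d][1]) == obj:
            d = (d - 1) % 4
        elif label(x + FR[d][0], y + FR[d][1]) != obj:
            d = (d + 1) % 4
        x, y = x + STEP[d][0], y + STEP[d][1]
        if (x, y) == (sx, sy):
            return corners
```

The loop terminates when the walk returns to the start corner; at the topmost-leftmost object pixel that corner is incident with exactly two boundary cracks, so it is visited only once.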
Consider now the problem of filling the interior of a closed curve. The
problem is obviously equivalent to that of deciding if a pixel is inside or
outside the curve: the inner pixels must be filled, the outer must not. The
decision is based on the fact that a ray that starts at a given point and goes
to infinity crosses the given curve an odd number of times if the point is inside
the curve, and an even number of times otherwise. Difficulty arises in
discriminating between crossing and tangency. It may be seen in Figs. 17a
and 17b that when describing curves as sets of pixels, situations may occur
in which it is impossible to decide correctly whether a pixel p is in the interior
of the curve when analyzing only the line containing p: the lines containing
p are identical in Figs. 17a and 17b, while p is inside the curve in Fig. 17a but
outside it in Fig. 17b. Algorithms not based on the concept of cell complexes
are rather complicated since they need to test three adjacent lines to decide
between crossing and tangency (compare, e.g., Pavlidis, 1982).
In the case of a cell complex, the ray is replaced by a horizontal open strip
consisting of alternating 2-cells and vertical 1-cells, all lying in a horizontal
row of the raster containing the pixel p (Fig. 17c). The curve is represented
as a 1-dimensional subcomplex consisting of alternating 0- and 1-cells. There
arises no problem of tangency since a horizontal strip does not contain
horizontal 1-cells. Crossings with the curve are only possible on vertical
1-cells. Therefore, the filling is reduced to scanning the image with the given
curve horizontally, row by row, and counting in each row the encountered
vertical 1-cells of the curve. Counting must start with 0 at the left side of each
row. For each pixel in the row the number of vertical 1-cells counted since
the start of the row must be tested. If the count is odd then the pixel must be
filled, otherwise not. In other words, filling of subsequent pixels in a row must
be started whenever the count becomes odd, and stopped whenever it
becomes even. In the image of Fig. 17c, the count becomes equal to 1 in the
second column. Thus, the pixels in columns 2 through 10 must be filled. In
the 11th column the count becomes 2 and the filling must be stopped. A
similar algorithm, again based on the notion of “cracks,” was described by
Rosenfeld and Kak (1976, second edition).
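A minimal sketch of this even/odd filling rule, under the assumption that the curve is given as the set of its vertical 1-cells, keyed so that the crack (x, y) is the left edge of pixel (x, y) (the representation and names are ours):

```python
def fill_interior(width, height, vertical_cracks):
    """Even/odd filling: a pixel is filled iff the number of the curve's
    vertical 1-cells encountered since the start of its row is odd.
    The crack (x, y) is taken to be the left edge of pixel (x, y)."""
    filled = set()
    for y in range(height):
        inside = False                      # the count starts with 0 (even)
        for x in range(width):
            if (x, y) in vertical_cracks:   # a crossing toggles the parity
                inside = not inside
            if inside:
                filled.add((x, y))
    return filled
```

Since the horizontal strip contains no horizontal 1-cells, no tangency test is needed: every vertical crack of the curve in the row is a genuine crossing.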
The filling procedure is important for computer graphics applications since
it enables fast and precise drawing of complex regions defined by their
boundaries. Calculating a boundary and then filling it requires much less time
than calculating all pixels of the region.
The advantages of cell complexes may be also demonstrated by the
example of a thinning problem, which consists in reducing the number of the
object pixels in a way similar to a “prairie fire,” destroying an object simul-
taneously at all boundary locations until two fire fronts collide. Thus, only
a skeleton line of every connected object must be left, while the connectivity
of all objects must be retained. Difficulty arises when regarding a boundary
as a sequence of pixels and there are two boundary pixels that are adjacent
in the region but not adjacent in the boundary sequence. The problem
consists in deciding which of the two pixels should be deleted, since if both
are deleted then the connectivity of the object may be damaged. The solution
of the problem is simple in the case of sequential algorithms: one of the pixels
under consideration may be chosen by means of some preference rule and
deleted. The decision whether the other pixel may also be deleted is then
made according to the new situation arising after the deletion.
However, the solution of the problem is more difficult in the case of
developing parallel thinning algorithms since there is no further possibility of
deleting one pixel and then investigating the new situation. Theories and
algorithms proposed for parallel thinning are numerous and complicated. In
contrast, the solution based on cell complexes is again very simple. To
present it we need a new notion of an “open boundary,” which is dual to that
of the closed boundary specified by Definition 10:
Definition 13: The open boundary of a subcomplex S of a complex C relative
to C is the subcomplex Ob(S, C) consisting of all cells c of C such that the
closure Cl(c) contains cells both of S and of its complement C-S.
Remember (Section II) that the closure of a cell is a notion dual to the SON:
the closure Cl(c) in a complex C consists of c itself and all cells of C bounding
c.
The thinning algorithm consists in alternately finding the closed and the
open boundaries of the objects. After finding the closed boundary, each cell
c’ contained in it is tested: if the cells of the background that are bounded by
c‘ comprise exactly one connected component then c‘ must be deleted.
Similarly, for each cell c” of the open boundary the set of background cells
bounding c” must be tested: if it consists of exactly one connected component
then c” must be deleted. The process stops if all cells to be deleted have three
incident 1-cells in the background. (In this way deletion of end points is
prevented.) It is easy to see that this algorithm, being simple and elegant, may
be parallelized in such a way that at any step either the closed or the open
boundaries of the objects are defined and the appropriate cells deleted. The
same algorithm may realize a dilation if we simply interchange object and
background: dilation of the object is the same as thinning of the background.
It can be shown that in both cases the connectivity of all object and back-
ground components is preserved.
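Definitions 10 and 13 can be written down almost verbatim. In the sketch below, the dictionary `bounds`, mapping each cell to the (transitively closed) set of cells it bounds, is our own assumed representation of the complex:

```python
def son(c, bounds):
    """Smallest open neighborhood: c plus every cell that c bounds."""
    return {c} | bounds.get(c, set())

def closure(c, bounds):
    """Closure Cl(c): c plus every cell bounding c (the notion dual to the SON)."""
    return {c} | {b for b, bounded in bounds.items() if c in bounded}

def frontier(S, cells, bounds):
    """Closed boundary Fr(S, C) of Definition 10: cells of C whose SON
    contains cells of both S and its complement C - S."""
    return {c for c in cells if son(c, bounds) & S and son(c, bounds) - S}

def open_boundary(S, cells, bounds):
    """Open boundary Ob(S, C) of Definition 13: cells of C whose closure
    contains cells of both S and its complement C - S."""
    return {c for c in cells
            if closure(c, bounds) & S and closure(c, bounds) - S}
```

On the one-dimensional complex consisting of two points bounding an open segment, the closed boundary of the segment is the point pair, while its open boundary is the segment itself, which illustrates the duality of the two notions.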
VII. THE CELL LIST DATA STRUCTURE
2-blocks must also be defined through metric data, e.g., in the form of a
triangulation in which the coordinate triples of some intermediate points are
accompanied by a list of “digital triangles.” Each record in the latter list
consists of three pointers indicating the intermediate points that are the
vertices of the corresponding triangle.
A cell list may be constructed automatically from a given segmented
2-dimensional raster image. The corresponding program finds the boundaries
of regions, tracks them, and resolves them into digital straight line segments
(DSS) (Kovalevsky, 1990). Thus, every boundary is represented as a
polygonal line. This, however, is an exact representation rather than an
approximation since the program encodes the DSS by their end points along
with some additional parameters that specify the exact location of the DSS.
Thus, a precise reconstruction of the original segmented image is possible.
The structure of the cell list may be explained by the example of a small
segmented image, as shown in Fig. 18. The corresponding list is shown in
Table I. The first column in the sublist of branching points contains the
identifiers of the 0-blocks used also in Fig. 18. These data are not stored in
the computer since they correspond to the addresses of the records. The next
two columns contain the coordinates of the points. The following four
columns contain the identifiers of all lines (1-blocks) bounded by the current
point. A line contacts its end point through a 1-cell. A point in a two-dimen-
sional Cartesian ACC is incident with at most four 1-cells lying to the east,
south, west, or north of the point. The identifier of each line is placed into
the column corresponding to one of these directions according to the position
of the incident 1-cell with respect to the current branching point. The lines
TABLE I

BRANCHING POINTS

        coordinates      lines
P1      10  24           -L3   -L1    0   +L2
P2      30  23            0    +L1   +L5  -L2
P3      17  17           -L4   +L3    0   +L6
P4      24  20           -L5    0    +L4  -L6

LINES                                    REGIONS
L1   P1  P2   R1  R2    1   4            R1    0   +L1
L2   P2  P1   R1  R3    5   8            R2  112   +L3
L3   P1  P3   R2  R3    9  11            R3  255   -L5
L4   P3  P4   R2  R4   12  13            R4    0   -L6
L5   P4  P2   R2  R3   14  15
L6   P4  P3   R3  R4   16  20

METRIC DATA

address:     1        2        3        4        5        6        7
coord.:   (10, 24) (11, 26) (30, 25) (30, 23) (30, 23) (29, 10) (10, 11)
address:     8        9       10       11       12       13       14
coord.:   (10, 24) (10, 24) (17, 18) (17, 17) (17, 17) (24, 20) (24, 20)
address:    15       16       17       18       19       20
coord.:   (30, 23) (24, 20) (23, 17) (12, 11) (16, 16) (17, 17)
are considered as directed (Fig. 18). A minus sign of the identifier denotes a
line starting from the point, and a plus sign corresponds to a finishing line.
The first column in the list of lines contains their identifiers. The next two
columns contain identifiers of the start and end points of a line. If the line is
closed, and hence is neither starting nor finishing at a branching point, then
both identifiers are zero. The next two columns contain identifiers of the
regions lying to the right and to the left of the line, respectively. The last
two columns contain identifiers of a starting and finishing coordinate pair,
which are to be found in the metric list. Each pair represents an end point of
a digital straight segment (DSS).
This list is a single sequence of coordinate pairs. The identifier of each pair
is its ordinal number in the sequence. The items in the last two columns of
the list of lines indicate the beginning and end of the subsequence containing
all vertices of the digital polygon representing the current line. For example,
the numbers 5 and 8 in the row L2 denote that the coordinates of the vertices
of the corresponding polygon are to be found in the list of metric data
starting at pair number 5 and finishing at 8. These are (30, 23), (29, 10),
(10, 11), and (10, 24).
When encoding the DSS by the coordinates of the end points, a reconstruc-
tion of the original segmented image is possible only with an accuracy of
about one pixel. This is so because there exists more than one DSS connecting
two given points. All these DSS deviate from each other by no more than one
pixel (see, e.g., Kovalevsky, 1990). If it is necessary to have a precise coding
of the line, certain additional parameters must be stored for each DSS. These
parameters are not shown in Table I to make the presentation simpler.
The first column in the list of regions contains the identifiers of the regions.
The next column contains the labels (gray values) of the regions. The last
column contains the identifier of a line belonging to the boundary of the
region. Starting from this line in the proper direction, one may reconstruct
the complete sequence of lines composing the boundary. The boundary is
directed in such a way that the region is always lying to the right side of the
boundary. The minus sign at the identifiers of some starting lines indicates
that the line should be traversed from the end to the beginning to obtain
the correct direction of the boundary.
Encoding images by cell lists is rather economical: in applications to
cartography (Kovalevsky, 1989b) and to technical drawings (see Section X)
the average compression factor, in comparison to a raster representation, is
in the range of 20 to 100. This means that the cell list for an image of
512 x 512 bytes is only about 3 to 13 Kbyte long.
Another important advantage of the cell list is that the data are region-
related, and therefore different objects of interest in the image are represented
in the list separately. The regions, and thus the objects of interest, are
represented in the list explicitly by their topological relations and coor-
dinates. Consequently, geometrical analysis of the image is reduced to simple
calculations using well-known formulae of analytic geometry. Details about
the procedure of transforming a segmented raster image into a cell list may
be found in (Kovalevsky, 1989a). Recognition of the DSS is described in
(Kovalevsky, 1990).
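The record structure of Table I might be sketched, for instance, as follows; all type and field names are hypothetical, and identifiers are 1-based ordinal numbers as in the text:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BranchPoint:                     # a 0-block (branching point)
    x: int
    y: int
    # signed line identifiers incident from the east, south, west, and north;
    # 0 means "no line", minus = the line starts here, plus = it finishes here
    lines: Tuple[int, int, int, int]

@dataclass
class Line:                            # a 1-block, resolved into DSS
    start: int                         # end-point identifiers (0 if closed)
    end: int
    right: int                         # region to the right of the line
    left: int                          # region to the left of the line
    first_vertex: int                  # first and last DSS vertex in the
    last_vertex: int                   # metric list (1-based ordinals)

@dataclass
class Region:                          # a 2-block
    label: int                         # gray value of the region
    start_line: int                    # signed id of one boundary line

@dataclass
class CellList:
    points: List[BranchPoint]
    lines: List[Line]
    regions: List[Region]
    metric: List[Tuple[int, int]]      # shared sequence of coordinate pairs

    def polygon(self, line_id: int) -> List[Tuple[int, int]]:
        """Vertices of the digital polygon representing a line."""
        ln = self.lines[line_id - 1]
        return self.metric[ln.first_vertex - 1 : ln.last_vertex]
```

With the metric data of Table I, `polygon` applied to a line whose vertex range is 5 to 8 yields exactly the four coordinate pairs quoted for L2 in the text.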
FIGURE 19. (a) a simple scene, (b) its region adjacency graph, and (c) a prototype graph of
a house.
VIII. SUBGRAPH AND SUBCOMPLEX ISOMORPHISM
The cell lists described in the previous section give us a powerful means for
analyzing images, since an image is encoded in the list in such a form that
homogeneous regions, their boundaries, and their topological relations to
each other are represented explicitly. Moreover, geometrical features are
described by means of coordinates rather than by gray-value distributions.
This makes the analysis of size and shape of image parts easy. Cell lists
contain explicit information necessary for analyzing both geometrical
features of image parts and their spatial relations to each other. This is
exactly what is needed to solve complex problems of image understanding.
What one needs, in addition to this information, is a suitable technique for
verifying whether geometrical shapes and topological relations of image parts
correspond to certain predetermined demands characteristic of the image
classes to be analyzed. A well-known means of analyzing topological
relations is the technique of subgraph isomorphism (Ullmann, 1983). It is
based on describing the topological structure of an image by a region
adjacency graph (Strong and Rosenfeld, 1973; Pavlidis, 1977). In such a
graph, regions of a given segmented image are represented by graph vertices.
Every pair of adjacent regions is associated with an edge. The image analysis
problem may then be stated as follows:
Formulation 1: Subgraph Isomorphism
Given is a region adjacency graph (image graph) IG of an image and a
prototype graph PG.
Find an isomorphic mapping M: PG → IG.
This means that a vertex of IG must be assigned to every vertex of PG in such
a way that for any two vertices of PG that are connected by an edge of PG,
the corresponding vertices of IG are also connected by an edge of IG.
Consider an example. Figure 19a shows a simple scene. The corresponding
graph IG is shown in Fig. 19b, and the prototype graph PG for a house in
Fig. 19c. A possible mapping from PG into IG may look like
s → 1, r → 2, w → 3, g → 4;                                    (1)
(s, r) → (1, 2), (s, w) → (1, 3), (r, w) → (2, 3), (w, g) → (3, 4).
The prototype vertex s (sky) is mapped to the vertex 1 of IG, which corres-
ponds to the upper region in the scene of Fig. 19a. The edge (s, r) of PG,
representing the adjacency of sky and roof, is mapped to the edge (1, 2) of
IG, etc.
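Formulation 1 can be sketched as a brute-force backtracking search; the function and variable names are ours, and the small image graph used below merely imitates the house example:

```python
def subgraph_isomorphisms(pg_vertices, pg_edges, ig_vertices, ig_edges):
    """Enumerate all injective mappings M: PG -> IG that carry every edge
    of PG onto an edge of IG (Formulation 1); edges are unordered pairs."""
    pg_e = {frozenset(e) for e in pg_edges}
    ig_e = {frozenset(e) for e in ig_edges}
    order, results = list(pg_vertices), []

    def extend(mapping):
        if len(mapping) == len(order):          # every PG vertex is mapped
            results.append(dict(mapping))
            return
        p = order[len(mapping)]
        for v in ig_vertices:
            if v in mapping.values():           # keep the mapping injective
                continue
            # every PG edge from p to an already-mapped vertex must be
            # carried onto an IG edge
            if all(frozenset((v, mapping[q])) in ig_e
                   for q in mapping if frozenset((p, q)) in pg_e):
                mapping[p] = v
                extend(mapping)
                del mapping[p]

    extend({})
    return results
```

Because the backtracking tries every candidate vertex for every prototype vertex, its worst-case running time is exponential, which illustrates the complexity remark that follows.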
Subgraph isomorphism is a complex mathematical problem known to be
NP-complete in general. This means that, in the worst case, the computation
time grows exponentially with the number of vertices and may become unacceptably
great for large graphs. Although this is not the case for planar graphs, the
difficulty remains relevant for analyzing planar images: as we shall show,
planar graphs are not sufficient to describe all relations that may be
important for image analysis. Hence, the computation time may also become
great for planar images.
Another disadvantage of the method consists in many “false alarms”: e.g.,
it is easy to see that the graph PG in the last example may also be mapped
onto the subgraph of IG representing the tree in Fig. 19b, since this subgraph
is also isomorphic to PG. Thus, the tree will be erroneously recognized as a
house. The reason for such errors is that too little information about the
desired objects is contained in prototype graphs.
Both disadvantages have common causes: if too little information about a
vertex of a graph is available, then the program tries to match it with a large
number of vertices of the other graph. This causes a long computation time
and a large number of “false alarms,” since the probability of encountering
an accidentally isomorphic subgraph becomes high. There is a possibility of
overcoming both disadvantages of the subgraph isomorphism simul-
taneously: the information content of the data describing a vertex in a graph
must be increased by assigning to it some additional features. A vertex in a
graph may be distinguished from other vertices only by its relations to other
vertices. A region of an image, however, may be characterized by many other
features, such as colour, texture, size, shape, etc. Such features may be
assigned to a vertex of an image graph in the form of labels.
The vertices of a prototype graph may be labelled by similar labels and
then a prototype vertex should be matched only onto an image vertex having
the same label. However, to make the recognition procedure more flexible
with respect to the variability of images, it is more advisable to label the
prototype vertices by other symbols, corresponding to classes of possible
feature values. Such symbols may be regarded as semantic labels whose com-
patibility with the features is known. Then a prototype vertex must be
matched only with those vertices of the image graph that have compatible
feature values, i.e., values belonging to a predetermined class of possible
values.
Realization of this idea leads to the following
FIGURE 20. Two scenes leading to recognition errors: (a) unexpected adjacency, and (b)
missing adjacency.
edge more, but this does not prevent the finding of the subgraph isomorphic
to PG. On the contrary: when applying consistent labelling in the same case,
without correcting the adjacency relation, the roof would not be recognized,
since the adjacency of the roof to a green region was not allowed. On the
other hand, if a roof without a wall is present inside a blue region (Fig. 20b),
subgraph isomorphism would reject it, since there is no complete subgraph
isomorphic to PG. Consistent labelling, however, would recognize a roof,
since there are no adjacencies that are not allowed. These considerations
demonstrate the advantages of subgraph isomorphism as compared to con-
sistent labelling.
The method of subgraph isomorphism may be still improved by two
means. Firstly, more features of image parts and relations between the parts
must be introduced. Features must represent size, area, shape, curvature, etc.
Relations need not be those of adjacency: relations important for image
analysis may have the nature of geometric features of pairs of image parts
that are not necessarily mutually adjacent. These may be, e.g., angles between
lines, quotients of sizes, quotients of curvatures, etc.
Additional features and relations may increase the reliability of recog-
nition and reduce computation time. To achieve this, the order in which the
vertices of PG are tested must be chosen properly: the vertex of PG whose
semantic label is compatible with the fewest vertices of IG must be tested first.
Such a vertex does not match most of the vertices of IG, and the correspond-
ing matching variants are rejected from the beginning. There are only a few
vertices of IG that match it, and only in these few cases are further vertices
of PG tested. Thus, the number of tested variants may essentially be reduced.
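This ordering heuristic can be sketched in one line; the function name and the precomputed candidate sets are our own assumptions:

```python
def match_order(pg_vertices, candidates):
    """Order the prototype vertices so that the vertex whose semantic label
    is compatible with the fewest image vertices is tested first;
    candidates[p] is the set of IG vertices compatible with p."""
    return sorted(pg_vertices, key=lambda p: len(candidates[p]))
```

Processing the most constrained vertex first prunes the largest number of matching variants at the top of the search tree.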
Additional relations may also reduce the computation time when at least
some relations are stored in the form of pointers indicating those parts of the
image that are in the desired relation to another part. This is the case, e.g.,
for the bounding relation in a cell list. Imagine that a vertex v of IG is found
whose features match the vertex p of PG. There is another vertex q in PG that
is in a relation R with p. If R is represented by a pointer then there is a pointer
(in the data structure describing the image) indicating a vertex w of IG that
is in the same relation R with v. Thus, the time to scan all vertices of IG to
find those in the relation R with v may be saved. In this way, certain
additional relations that are properly encoded may reduce the computation
time. Other additional relations serve as a means to reject more matching
variants as early as possible, which also reduces the computation time. The
second means of improving the method of subgraph isomorphism consists in
the following. Graphs, as tools for representing topological relations between
image parts, must be replaced by complexes, as was shown in Sections 11-V.
Cell lists (Section 111) not only describe the topological structure of images
completely and consistently; they also contain precise geometrical data about
image parts. These data are represented in the form of coordinates, which
makes possible the application of analytical geometry to image analysis.
Let us improve the subgraph isomorphism method step by step. Consider
first a slightly changed version of Formulation 2. The change consists in
representing the set IE of the edges of the image graph IG as a binary relation
in the set V of the vertices. (Remember that region adjacency graphs were
introduced to represent the adjacency relation of the regions.) The existence
of an edge between the vertices v, w E V will then be expressed as a two-place
predicate PR”(v, w) of the two vertices v and w; this predicate is true if the
vertices are connected by an edge of IG.
For the second step, the features f, the semantic labels s, and the interpreta-
tion relation IN will be replaced by a set of one-place predicates. One such
predicate PR’ must be assigned to every vertex p of the prototype graph PG
instead of the semantic label s. The predicates PR’ are defined on the set of
vertices of IG:
PR’: V → {true, false}.
PR’(v) is true if the formerly used feature f = FM(v) and the semantic label
s correspond to the interpretation relation IN ⊂ F × S.
Then, the mapping M: PG → IG of the prototype graph PG into the
image graph IG must be replaced by a mapping MV from the set P of
vertices of PG into the set V of vertices of IG. The mapping of edges is
then replaced by the requirement that the previously mentioned two-place
predicate PR” of certain vertex pairs of IG be true: if a pair (p, q)
of vertices of PG is connected by an edge of PG then this pair is
marked as related by the predicate PR”. The corresponding pair (MV(p),
MV(q)) of IG must then be tested as to whether PR” of this pair is true. It
must be stressed here that all these changes influence only the form but not
the essence of the problem formulation. This step is necessary to make the
next, essential change comprehensible. The changed problem statement is as
follows:
Formulation 4: Predicate Conditioned Mapping
Given:
a) the set V of image regions;
b) the set P of prototype regions;
c) a one-place predicate PR’ for every region p of P, with the predicate
defined on the set V of image regions;
d) the subset RP of marked prototype region pairs (p, q), p, q E P;
e) a two-place predicate PR” defined on the set of image region pairs
(v, w), v, w E V.
Find:
a mapping MI? P - + V such that the one-place predicates of all images
M V ( p ) of the prototype regions P E P and the two-place predicates of the
pairs ( M V ( p ) , MV(q))of the images of all marked pairs of prototype regions
are true.
It is easy to see that Formulation 4 is equivalent to Formulation 2 if
1) the set RP of marked pairs is equivalent to the former set of edges of
PG;
2) ∀v ∈ V, ∀p ∈ P: PR′_p(v) = ((FM(v), s) ∈ IN with s = SM(p));
and
3) ∀v, w ∈ V: PR″(v, w) = ((v, w) ∈ IE).
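Formulation 4 is, in effect, a constraint-satisfaction problem, and a small backtracking solver illustrates it. The sketch below is ours, not the chapter's: `pr1` maps each prototype region to its one-place predicate, `RP` holds the marked pairs, and `pr2` is the two-place predicate; injectivity of MV is not enforced, since the formulation does not require it.

```python
def find_mapping(P, V, pr1, RP, pr2):
    """Backtracking search for a mapping MV: P -> V such that pr1[p](MV(p))
    holds for every prototype region p, and pr2(MV(p), MV(q)) holds for
    every marked prototype pair (p, q) in RP."""
    P = list(P)

    def extend(mv, i):
        if i == len(P):
            return dict(mv)                     # every region mapped
        p = P[i]
        for v in V:
            if not pr1[p](v):                   # one-place predicate fails
                continue
            mv[p] = v
            # check the two-place predicate on all pairs mapped so far
            if all(pr2(mv[a], mv[b]) for (a, b) in RP
                   if a in mv and b in mv):
                result = extend(mv, i + 1)
                if result is not None:
                    return result
            del mv[p]                           # backtrack
        return None                             # no admissible image for p

    return extend({}, 0)
```

In the intended use, P would hold prototype cells and V the cells of the segmented image; the predicates are the subroutines discussed later in the chapter.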
Formulation 4 may be naturally generalized in a way that is based on the
representation of segmented images by block complexes as specified in
Section VII. In the sequel we shall call the elements of a block complex cells
rather than blocks because they are cells with respect to the block complex.
The generalization consists in the following:
1) We replace the set V of image regions by the set SC of cells of the image
block complex. Thus, the image graph IG is replaced by the image complex
IC = (SC, B, dim).
2) We replace the prototype graph PG by a prototype complex PC = (P,
B′, dim′) containing cells of dimensions 0, 1, and 2, where the 2-cells are the
prototype regions and the other cells compose their boundaries.
3) We replace the one-place predicates PR′ verifying the colors of regions
by other one-place predicates having a cell of IC of any dimension as their
argument. This may be, e.g., a predicate depending on the area of a region,
or on the direction of a line, etc. Such a predicate may be defined as being
true if the area of a region is in a predetermined range, i.e., greater than A_min
and less than A_max. One or many such one-place predicates may be assigned
to a cell p of the prototype complex PC in the same way that we have formerly
assigned a semantic label s to a vertex of the prototype graph. Instead of
verifying the interpretation relation IN, we must now calculate all predicates
assigned to p for the cell c of the image complex IC which p is mapped onto.
236 V. A. KOVALEVSKY
FIGURE 21. A prototype complex representing a house.
Further, we assign to the pair (L₆, R₁) both of the following two-place
predicates, and to the pair (L₆, R₂) the first of them:

PR″₁ = (the first cell bounds the second one);
PR″₂ = (the first cell lies below the second one).

Then a global predicate defining a house may be

GP = (PR′₁(MP(R₁)) ∨ PR′₂(MP(R₁))) ∧ PR′₂(MP(R₂)) ∧ PR′₃(MP(L₆)) ∧
PR″₁(MP(L₆), MP(R₁)) ∧ PR″₂(MP(L₆), MP(R₁)) ∧ PR″₁(MP(L₆), MP(R₂)).
We have represented predicates by their verbal descriptions. In computer
realizations, each predicate is a subroutine that may be applied to certain
records in the cell list representing the image. The subroutines test
geometrical features of cells of different dimensions, topological and/or
geometrical relations of pairs of cells, and return logical values “true” or “false.” These
values are then verified by the global predicate realized as a main subroutine
calling the predicate subroutines. The next section describes the realization of
this concept.
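As a concrete sketch of this realization (the cell records and predicate names below are invented for illustration, not taken from the chapter), each predicate becomes a small Boolean function and the global predicate a conjunction over the mapped cells:

```python
# Toy cell records: each cell is a dict with an id, a vertical extent, and
# the ids of the cells it is bounded by (y grows downward, as in images).
def bounds(c1, c2):
    """Two-place predicate: the first cell bounds the second one."""
    return c1["id"] in c2["boundary"]

def lies_below(c1, c2):
    """Two-place predicate: the first cell lies at or below the second."""
    return c1["ymin"] >= c2["ymax"]

def house_predicate(mp, is_roof, is_wall, is_base):
    """A simplified global predicate in the spirit of GP above: one-place
    predicates on the mapped cells, two-place predicates on mapped pairs.
    mp maps prototype cell names onto image cells."""
    return (is_roof(mp["R1"]) and is_wall(mp["R2"]) and is_base(mp["L6"])
            and bounds(mp["L6"], mp["R1"]) and lies_below(mp["L6"], mp["R1"])
            and bounds(mp["L6"], mp["R2"]))
```

In a full system each of these functions would read the records of the cell list, exactly as the text describes for the predicate subroutines.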
FIGURE 22. Variants of prototype complexes representing a house.
FIGURE 24. Prototypes of four classes of hand-made drawings. (The prototypes are built from a trapezium and parallelograms, distinguished by features such as parallelograms adjacent to each other and the common side of the parallelograms being vertical or horizontal.)
FIGURE 25. A decision tree for recognizing the classes of Fig. 24. (Its vertices test, among others, size, trapezium, one neighbor of comparable size, rectangle, and the position and orientation of the common side; its terminal vertices are the classes HOUSE, SHIP, and SMOOTHING IRON, and rejections.)
A record contains the following fields, which are also written in an arbitrary
order: the name of the vertex, the name of the eldest son,* the name of the
next brother, the name of the predicate (subroutine), some input parameters,
and some output results. An input parameter may be a number or a name.
A number will be transferred to the subroutine and directly used for
calculations, e.g., as a limit for a value being verified by the subroutine. A name as
an input parameter points to a result obtained at some previous stage of the
recognition process. The same name must be used in the record of another
vertex as an output result.
Consider as an example the record describing vertex 2 of the tree of
Fig. 25:
Record = (Name: trapezium; Son: one neighbor; Brother: rejection;
Subroutine: TRAPEZ; Inp1: region; Inp2: 0.05; Inp3: 0.2; Out1: decision;
Out2: great base; Out3: small base; End vertex).
The notation “Son: one neighbor;” denotes that if the first output parameter
“Out1: decision” of the subroutine “TRAPEZ” is equal to 1 (which means
“true”), then the next vertex to be chosen is the vertex with the name “one
neighbor”. The notation “Inp2: 0.05;” means that the value 0.05 must be
transferred to the subroutine “TRAPEZ” as its second input parameter. This
value will be interpreted by the subroutine as the upper limit for the sine of
the angle between two straight segments that are considered candidates for
the bases of a trapezium. The sine must be less than 0.05 for the two segments
to be recognized as the bases of a trapezium. Similarly, the notation “Inp3:
0.2;” means that the sine of the angle between the lateral edges must be
greater than 0.2.
The notation “Out2: great base;” means that the subroutine “TRAPEZ”
returns as its second result the pointer onto the straight segment in the cell
list; this segment was recognized by “TRAPEZ” as the greater base of the
trapezium.
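The sine test applied by “TRAPEZ” is easy to state explicitly. A minimal sketch (our own helper functions, using the 0.05 limit from the record above):

```python
import math

def sine_between(seg_a, seg_b):
    """|sin| of the angle between two straight segments, each given as a
    pair of end points ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = seg_a
    (bx1, by1), (bx2, by2) = seg_b
    ux, uy = ax2 - ax1, ay2 - ay1
    vx, vy = bx2 - bx1, by2 - by1
    cross = ux * vy - uy * vx            # = |u| |v| sin(angle)
    return abs(cross) / (math.hypot(ux, uy) * math.hypot(vx, vy))

def could_be_bases(seg_a, seg_b, limit=0.05):
    """Candidate bases of a trapezium: nearly parallel segments."""
    return sine_between(seg_a, seg_b) < limit
```

The lateral-edge check would use the same helper with the opposite comparison against the 0.2 limit.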
As an example of using pointers as input parameters, consider the descrip-
tion of vertex 6 of Fig. 25:
Record = (Name: common great base; Son: hammer; Brother: rejection;
Subroutine: EQUINT; Inp1: joint; Inp2: great base; Out1: decision; End
vertex).
Here, the notation “Inp2: great base;” means that the second input parameter
of the subroutine “EQUINT” is the pointer onto the straight segment in the
cell list; the segment was previously found by the subroutine “TRAPEZ” as
* According to the commonly used terminology referring to genealogical trees, the son of a
vertex v is the vertex at the end of an edge starting at v. The meaning of father and brother is
obvious.
the greater base of the trapezium. The task of “EQUINT” is to check if two
integers are equal to each other. In the particular case of vertex 6, these
integers are pointers onto the great base and onto another straight segment
“joint” bounding both the trapezium and the rectangle. However, the same
subroutine may be used in other vertices of the tree to compare other
integers.
A special compiler translates the set of descriptions into an array whose
elements are numbers. Some of the numbers are numerical parameter values,
others are addresses in the array. The array is then used by the main
recognition program to control its performance: the program reads the array,
calls the necessary subroutines, verifies their decisions, and tracks the
corresponding path in the decision tree, leading to an ultimate decision. As we have
seen in the preceding examples, the array also serves to transfer data among
the subroutines.
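The record-driven control flow can be sketched in a few lines. The field names follow the example records above; the subroutine behavior and the dictionary representation are our simplification of the compiled numeric array:

```python
def run_tree(records, start, subroutines, shared):
    """Walk the decision tree: at each vertex call its subroutine, store the
    named outputs in 'shared' (the data transfer between subroutines), and
    branch to the son on decision 1, otherwise to the brother."""
    name = start
    while name is not None:
        rec = records[name]
        # A name input points to a previously stored result; a number is
        # passed to the subroutine directly.
        args = [shared.get(a, a) for a in rec["inputs"]]
        results = subroutines[rec["subroutine"]](*args)
        for out_name, value in zip(rec["outputs"], results):
            shared[out_name] = value
        name = rec["son"] if results[0] else rec["brother"]
    return shared
```

A leaf vertex simply has no son, so the walk terminates with the accumulated results (including the recognized class) in `shared`.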
The described recognition program was successfully used for recognizing
both the synthetic images shown in Fig. 26 and the hand-made drawings
shown in Fig. 27. Synthetic images were produced interactively, by means of
an image editor, and then converted to cell lists. Such an image consists of
polygonal regions, each having a gray value different from those of other
regions. The sizes, shapes, and locations of the regions were chosen
FIGURE 27. Hand-made drawings used in experiments.
X. APPLICATIONS
A . Handwritten Characters
The image analysis method described in the previous section was used by the
author in many applications. The earliest of them (Kovalevsky, 1986) is
concerned with the recognition of handwritten characters. At this early stage
of the research the technique was still rather imperfect: the early version of the
cell list contained a list of break points and a list of strokes but no regions.
The prototypes were described by matrices in which two one-place predicates
were specified for every prototype stroke and up to two two-place predicates
for some stroke pairs. No decision trees combining subclasses and classes
were used. Thus, the global predicate of each subclass was verified separately,
which took a relatively large amount of time. The variability of the
prototypes was represented by the presence of non-obligatory strokes. E.g., the
character “three” might have a short horizontal stroke at the break point in
the middle, due to careless writing. The prototype was provided with a
corresponding non-obligatory stroke, which was mapped onto an
appropriate stroke of the image to be recognized if such a stroke existed. Its
absence, however, did not prevent the recognition. On the contrary, every
FIGURE 28. (a) a hand-made block diagram, (b) boundary approximation, and
(c) result of automatic digitization.
was about 2%, and the recognition time about 2 seconds per character on a
rather slow PDP-11-like computer. Even at that early stage it was possible
to define new classes without changing the program: the prototypes were
described by editable text files, which were automatically compiled to
numeric matrices then used by the recognition program.
B. Block Diagrams
straight segments. The next processing stages are more complex. They are
subdivided into three hierarchy levels.
At the first level the break points of the boundaries are found and recorded
in an additional list. Break points are classified into 16 classes according to
the convexity or concavity of the boundary of the black region at the point
and to the orientations of the two boundary segments meeting at the point
(Fig. 29). At the second level, groups of adjacent break points are recognized
as “singular locations.” They are classified into four classes according to the
number of strokes meeting at a location (Fig. 30), and recorded into the list
of singular locations. Coordinates of singular locations are calculated as the
average values of the coordinates of all corresponding break points.
At the third level, the recognition of blocks, nodes, and connection lines is
performed as explained in the following. The prototype of a block is a white
region with exactly four concave corners, at each of which exactly two strokes
meet (Fig. 31a). The rest of the outer boundary may be arbitrary. The
prototype of a node is a singular location that is either a cusp (Fig. 31b) or
a T-shaped crossing with the boundary of a block (Fig. 31c). All other
singular locations are either block corners or intermediate points of
connection lines.
The prototype of a connection line is a pair of singular locations (SL₁, SL₂)
FIGURE 30. Classes of singular locations: (a) cusp, (b) corner, (c) T-shaped
crossing, and (d) cross.
satisfying certain conditions, as specified. Let us call the direction of the line
from SL₁ to SL₂, rounded to a multiple of 90°, the main direction of the pair
(SL₁, SL₂). Thus, a main direction may be only east, south, west, or north.
The conditions are
a) there is in the cell list a boundary line and two adjacent break points
such that one belongs to SL₁ and the other to SL₂;
b) neither SL₁ nor SL₂ is a block corner;
c) if both are nodes then they belong to different blocks;
d) the straight line connecting SL₁ with SL₂ must have an angle with the
horizontal or vertical direction less than a predetermined threshold (e.g.,
30°); and
e) there is no other pair containing SL₁ that satisfies condition a) and has
the same main direction as (SL₁, SL₂), in which pair the distance between the
singular locations is less than the distance between SL₁ and SL₂.
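The notion of a main direction reduces to rounding an angle to a multiple of 90°. A small sketch (our helper, with compass codes matching the text and y growing downward):

```python
import math

def main_direction(sl1, sl2):
    """Direction from SL1 to SL2 rounded to a multiple of 90 degrees,
    returned as 'E', 'S', 'W', or 'N' (y grows downward)."""
    (x1, y1), (x2, y2) = sl1, sl2
    ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0
    return "ESWN"[int(((ang + 45.0) % 360.0) // 90.0)]
```

Condition d) then amounts to requiring that the exact direction deviate from this rounded one by less than the threshold.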
Every boundary component of a binary image is represented in the cell list
as an item of the sublist of lines. Intermediate points of a line that are located
in the metric list compose a closed polygon. This makes the recognition of
blocks and connection lines easy and fast. Thus, during the recognition of
blocks, polygons contained in the cell list are tracked point by point. Each
point is checked as to whether or not it is marked as a break point. If so, then
the properties of the corresponding singular location are tested. As soon as
they do not correspond to the requirements of the block prototype, the
tracking is interrupted and the next polygon is tested. If all properties are
correct, a block is recorded into the list of blocks.
After having recognized all blocks, the program reads the list of singular
locations and looks for T-shaped crossings. Each T-shaped crossing is tested
as to whether it is located at a block boundary. If so, a node is recorded into
the list of nodes accompanied by the identifier of the block to which the node
belongs. The record of the block gets a pointer onto the first of its nodes. All
other nodes of the same block are linked to each other by pointers and thus
may be read later on without searching.
C. Cartography
FIGURE 34. Fragment of a topographical map with a settlement: (a) the negative of the
original image, (b) visualization of the cell list with approximation tolerance of 2 pixels,
(c) recognized street borders, (d) recognized houses, and (e) synthetic image reconstructed from the
list of recognized objects.
tolerance). The vertical and horizontal diameters of the region, as well as the
ratio of the diameters, must be in prescribed limits. The same must be true
for the hole. The boundary components must be convex.
This description is complete: any region satisfying all indicated conditions
is a black ring with some possible small deviation from the ideal shape. The
description is rather complex and it may seem that checking all conditions for
all regions in an image may be rather time consuming. However, due to a
skilled construction of the decision tree, the average recognition time was
made small. The fulfillment of the first three conditions may be directly
extracted from the cell list. Each following condition is checked only for
regions that satisfy the preceding ones. The most time-consuming condition
(that of convexity) is checked last (and hence, rather rarely). Thus, recognizing
all seven trees in the comparatively complex image of Fig. 34a took a
fraction of a second.
A more complex problem is the recognition of houses, which are represented
as small rectangles merged with the strips denoting the street borders. The
program separates houses from street borders in the following way. For each
polygon edge e in the cell list whose length is greater than a threshold, all
other edges close and parallel to e are found. “Close” means that both end
points are on the black side of e, have a distance from e less than the
maximum possible strip width, and there is no third edge between e and the
edge being tested. “Parallel” means that the angle between the edges is less
than a threshold. Then a strip having e as one of its sides and a width equal
to the average distance of all found points is constructed and recorded as the
recognized street border.
All the necessary computations are fast and easily realizable when using
the cell list. However, the necessity of checking all pairs of edges may
nevertheless lead to great computation time. There are two ways to make the
computation faster. For comparatively small image fragments containing a
few thousand edges, it is sufficient to construct a circumscribed rectangle for
the edge e with sides parallel to coordinate axes, and to enlarge it by the
maximum strip width. Then, only those edges both of whose end points are
inside the rectangle need be tested. Such a test is much faster than the
calculation of distances and angles.
If the image fragment is large then it is expedient to prepare an auxiliary
data structure called “pseudoraster” (Kovalevsky, 1989b), which may be
considered as a means for two-dimensional sorting of objects in an image.
The image is (implicitly) partitioned into squares of, e.g., 32 x 32 pixels
composing a rectangular grid. For each square there is a list of pointers in the
pseudoraster indicating all objects (regions, lines, points) in the cell list that
cross the square. By this means it becomes possible to check for a given edge
e only such other edges that lie in the neighborhood of e.
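A minimal pseudoraster can be built as a dictionary of grid squares, each listing the objects whose bounding boxes cross it. The object representation below is our own simplification:

```python
from collections import defaultdict

def build_pseudoraster(objects, square=32):
    """objects: list of (obj_id, xmin, ymin, xmax, ymax) bounding boxes.
    Returns a map from each grid square (gx, gy) to the ids crossing it."""
    grid = defaultdict(list)
    for oid, xmin, ymin, xmax, ymax in objects:
        for gx in range(xmin // square, xmax // square + 1):
            for gy in range(ymin // square, ymax // square + 1):
                grid[(gx, gy)].append(oid)
    return grid

def candidates_near(grid, x, y, square=32):
    """Objects registered in the square containing the point (x, y)."""
    return grid[(x // square, y // square)]
```

To find edges near a given edge e, one would query all squares covered by e's enlarged bounding box rather than a single point; the principle is the same.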
D . Technical Drawings
FIGURE 35. Replacing (a) a magnified low-resolution fragment in the cell list
by (b) a high-resolution fragment.
where there are some arrows of the dimension lines, or where two or more
lines are located close to each other.
Experiments were done by means of a scanner consisting of a moving table,
controlled by the computer, and a fixed CCD-camera with two different
objectives yielding resolutions of 60 and 600 dpi. A fragment of
200 x 200 mm was scanned with low resolution and quantized into four gray
levels such that level 0 corresponded to black regions and thick lines, level 3
to the white background, and levels 1 and 2 to thin lines, or groups of merged
parallel thin lines, or gaps between close lines. Then, a cell list was prepared
for this image and the curvature of the boundaries was measured. All
locations where the curvature was greater than a threshold (of about
(5 mm)⁻¹) were recorded as “suspect.” Each suspect location was surrounded
by a square of 2 x 2 mm. Overlapping squares were joined to clusters. Then
each cluster was scanned with the high resolution, the obtained image was
binarized and converted into a cell list, and this cell list was linked to the low
resolution list. The scale of the latter was adjusted to that of the high
resolution list simply by multiplying all coordinates by 10.
The linking procedure may be explained as follows. Suppose the location
of the boundary of the cluster scanned with high resolution is known exactly
as a region in the coordinate system of low resolution. Consider the
boundaries of the dark strips that represent the drawing lines in both images with
low and high resolution. Regard now the crossing points of these boundaries
with the boundary of the cluster (Fig. 35). The part of the low resolution
image (Fig. 35a) inside the cluster boundary, shown as a thin square in Fig.
35a, must be deleted and replaced by the high resolution image inside the
corresponding square of Fig. 35b. Thus, those edges of the polygons
representing the stroke boundaries in the low resolution list, which cross the
cluster boundary, must be deleted and replaced by the corresponding edges
of the high resolution list. E.g., the edge L of Fig. 35a must be deleted, the
edge I of Fig. 35b must be disjoined from the point r of the high resolution
list and connected to the point p of the low resolution list. Similar changes
must be made with all edges crossing the cluster boundary.
The problem consists in finding the correspondence between two sets of
edges of both lists. The first difficulty arises when there are more high
resolution edges than low resolution ones. This may happen if some lines
close to each other are merged in the low resolution image (compare, e.g., two
horizontal lines in the bottom part of Fig. 35b with the corresponding part
of Fig. 35a). If such merged lines cross the cluster boundary, additional lines
must be inserted into the combined cell list.
The second difficulty arises because the location of the cluster in the low
resolution coordinate system is known only approximately. Consequently,
the boundary of the cluster must be replaced by a strip whose width depends
on the accuracy with which the location of the cluster is known. The mapping
between the two sets of edges that cross the strip must be such that the
directions of edges mapped onto each other coincide approximately and the
sum of squared distances between such edges be minimal.
The computer realization of this solution was successful. The simple
drawings used in the experiments were correctly linked. The same technique
was successfully used for linking many fragments of equal resolution to a
joined cell list. This is necessary if the drawing cannot be scanned as a single
fragment even under low resolution.
Another problem important for transforming scanned drawings into
CAD-files consists in replacing black strips by mathematical lines and in
recognizing types of lines such as dimension lines, symmetry axes, etc. Trans-
forming strips into mathematical lines cannot be realized by the usual
thinning techniques since these techniques are noise-sensitive and yield rather
imprecise results for lines crossing each other at an acute angle. Using cell
lists leads to faster processing and better results, since the medial axes of
strips may simply be calculated by averaging the coordinates of the polygon
vertices representing the boundaries of the strip. Recognition of line types
has also been realized successfully by means of prototypes describing
relations between dimension lines, auxiliary dimension lines, contour lines,
etc. Experiments were successful under high resolution sufficient to reliably
recognize the arrows of the dimension lines. Thus, Fig. 36a shows a fragment
of a drawing and Fig. 36b shows the automatically found mathematical lines
(CAD-lines) represented by dotted, dashed, and solid lines, corresponding to
the recognized line types. The notation is clear from Fig. 36b.
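Averaging the two boundary polylines of a strip is a one-liner once corresponding vertices are aligned; in practice the two sides must first be resampled so that vertices correspond. A sketch under that assumption (our helper, not the chapter's code):

```python
def medial_axis(side_a, side_b):
    """Medial axis of a strip as the vertex-wise average of its two
    boundary polylines (assumes equal length and corresponding order)."""
    assert len(side_a) == len(side_b)
    return [((xa + xb) / 2.0, (ya + yb) / 2.0)
            for (xa, ya), (xb, yb) in zip(side_a, side_b)]
```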
Difficulties arise in the case of insufficient resolution. However, the concept
of using double resolution processing as described here may solve this
FIGURE 36. Recognition of lines and their types: (a) an original drawing and
(b) the recognition results.
problem also, since the higher resolution may always be chosen high enough
to reliably recognize the arrows as well as all other fine details.
XI. CONCLUSIONS
The concept of finite topology presented here has led to a new data structure,
called cell list, for encoding and processing segmented images. The cell lists
make any topological and geometrical analysis of images efficient and simple.
Adjacent regions, regions contained in one another, and lines and points
incident with each other or with some regions may be directly and quickly
found in the list. Distances between points, areas and perimeters of regions,
angles between line segments, etc., may be easily calculated from their
coordinates explicitly stored in the list, by means of usual well-known formulae.
All geometrical transformations, e.g., translation, magnification, reduction,
rotation, etc., may be performed by recalculating the coordinates of the
0-cells and intermediate points according to formulae of analytic geometry.
The results of such transformations are immediately visible: a cell list may be
rapidly converted into a raster image and sent to a display unit.
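Such coordinate recalculation is ordinary analytic geometry; a sketch for the planar case (a helper of ours, not taken from the chapter):

```python
import math

def transform_points(points, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Rotate, magnify, and translate the explicitly stored coordinates of
    the 0-cells and intermediate points of a cell list."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for (x, y) in points]
```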
As we have demonstrated in the last section, the cell list technique was
effectively applied to object recognition and structural image analysis. A
numeric analysis of coordinates of polygon vertices, instead of the commonly
used mask matching, makes the recognition procedure tolerant of geometrical
distortions of the boundaries. Thus, the technique was used to convert
scanned drawings into CAD structures. This was done for both technical
drawings and hand-made block diagrams. Recognition of hand-written
characters was also performed successfully. Recognition techniques of this
kind were implemented in an experimental system for computerized
cartography. There is every reason to hope that this technique will soon be
implemented successfully for analyzing three-dimensional images and
structures.
ACKNOWLEDGMENTS
REFERENCES
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 262
II. Covariance Models . . . . . . . . . . . . . . . . . . . . . . . 264
A. Covariance Models with Simple Symmetry . . . . . . . . . . . . . 265
B. Linear Covariance Models . . . . . . . . . . . . . . . . . . . . 266
C. Covariance Estimators Conforming to the Linear Model . . . . . . . 269
D. The Algebra of Inverse Toeplitz Covariances . . . . . . . . . . . . 271
III. Jordan Algebras . . . . . . . . . . . . . . . . . . . . . . . . 273
A. Generation of a Jordan Algebra . . . . . . . . . . . . . . . . . 274
B. Dimension of a Jordan Algebra . . . . . . . . . . . . . . . . . 275
C. Decomposition of the Covariance Estimator . . . . . . . . . . . . 278
D. Jordan Algebra Homomorphism . . . . . . . . . . . . . . . . . 279
IV. Explicit MLE Solution . . . . . . . . . . . . . . . . . . . . . 281
A. Vector Formulation of the MLE . . . . . . . . . . . . . . . . . 282
B. Necessary and Sufficient Condition . . . . . . . . . . . . . . . . 283
C. Relevance to Class of Toeplitz Matrices . . . . . . . . . . . . . . 285
D. Experimental Results . . . . . . . . . . . . . . . . . . . . . . 287
V. AR Process Parameter Estimation . . . . . . . . . . . . . . . . . 287
A. The Transformation Method . . . . . . . . . . . . . . . . . . . 288
B. Covariance of AR Process Parameter Estimates . . . . . . . . . . . 291
C. Experimental Results . . . . . . . . . . . . . . . . . . . . . . 291
D. The Role of a Jordan Algebra . . . . . . . . . . . . . . . . . . 293
VI. Exact Loglikelihood for AR Process Parameter Estimation . . . . . . 296
A. The Box-Jenkins Likelihood . . . . . . . . . . . . . . . . . . . 297
B. The Forward-Backward Likelihood . . . . . . . . . . . . . . . . 298
C. Maximization of the Forward-Backward Loglikelihood . . . . . . . . 300
D. Experimental Results . . . . . . . . . . . . . . . . . . . . . . 302
VII. Summary and Conclusions . . . . . . . . . . . . . . . . . . . 309
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
Appendix C . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
FOREWORD
Over the past decade, the notion of structured estimation and, in particular,
that of structured covariance estimation, has played a role in many statistical
signal processing problems. Too often, however, the structure of an
underlying problem has been used in an ad hoc manner to obtain estimators having
either low variability for finite sample size or low computational complexity.
Much of what is done, or should be done, has an abstract algebraic formalism
dating back to the laying of the mathematical foundations of quantum
theory. This work attempts to formalize both structured covariance
estimation and autoregressive (AR) process parameter estimation in terms of the
underlying abstract Jordan algebra, an algebra that differs from the usual
noncommutative but associative matrix algebra. Our investigation puts us on
a firm footing from which to attack a wide variety of future problems in
statistical signal processing, rather in the same manner that the introduction
of Lie algebra and Lie groups in control theory and robotics made new ideas
and developments possible.
I. INTRODUCTION
The need to estimate accurately a covariance matrix using finite data sample
size arises in a wide variety of signal processing problems. Intuition seems to
indicate that if the true covariance matrix is known to possess a certain
element pattern or structure, then constraining the structure of the estimator
in a similar manner should result in a covariance matrix estimate having
perhaps lower variability than one that would result by ignoring the
structure. The common argument in favor of a structured estimate is simple
- fewer parameters means that the available data sample will yield a “better”
estimator, since the number of data samples per parameter will be greater
vis-a-vis the same ratio for an unstructured estimator.
Doubts regarding this argument linger, however, principally because so
little is known, in general, concerning the behavior of finite sample size
estimators. The situation is further complicated if covariance matrix estima-
tion is but one step toward the estimation of another set of parameters, a
pertinent example being the estimation of the AR process parameters via the
normal, or Yule-Walker, equations. In this case, it is not at all clear for finite
data sample size if minimum bias, minimum variance AR process parameter
estimates result by using a covariance matrix estimate constrained in
structure to conform to the true covariance matrix.
In this work, we do not profess to solve the exceedingly difficult finite
THE INTERTWINING OF ABSTRACT ALGEBRA 263
sample size structure problem, but do attempt to shed some light thereon
under the assumptions that the underlying stochastic process is weakly
stationary multivariate Gaussian and that the true covariance R has a
positive definite (pd) linear structure, i.e., the true covariance may be
represented as a sum of linearly independent (li) basis matrices, with each
basis matrix multiplied by a true covariance parameter. This is the versatile
model that plays an important role in the work of Anderson (1969, 1970,
1973), in which an iterative algorithm is also proposed for finding the
maximum likelihood estimate (MLE) of R under the constraint of linear
structure.
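A familiar special case of pd linear structure is the symmetric Toeplitz class, whose li basis matrices carry ones on a mirrored pair of diagonals. The sketch below (our code) computes only the least-squares projection of a symmetric sample covariance onto that span, by averaging diagonals; this is simpler than, and distinct from, Anderson's ML iteration:

```python
import numpy as np

def toeplitz_project(R_hat):
    """Orthogonally project a symmetric sample covariance onto the linear
    span of the symmetric Toeplitz basis by averaging each diagonal."""
    p = R_hat.shape[0]
    R = np.zeros((p, p))
    for k in range(p):
        mean_k = np.mean(np.diagonal(R_hat, offset=k))
        R += mean_k * np.eye(p, k=k)
        if k:
            R += mean_k * np.eye(p, k=-k)   # mirrored diagonal (symmetry)
    return R
```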
Considering the form of the Gaussian multivariate distribution (or, indeed,
the more general exponential family of distributions (Lehmann, 1986)), it is
then natural to investigate the nature of the MLE of R when R⁻¹ also has a
linear structure, and to determine what impact this may have on Anderson's
iterative algorithm. A necessary and sufficient condition for both R and R⁻¹
to have a linear structure is that the subspace spanned by the li basis matrices
of R be a Jordan algebra, an algebra that played an important role in the
formalism of quantum mechanics (Jordan et al., 1934). As pointed out by
Seely (1977) and Jensen (1988), a Jordan algebra completely characterizes the
solutions to the MLE problem that are not only complete, sufficient, and
unbiased, but are also explicit and, under these assumptions, obtainable as
a relatively simple quadratic function of the data. The idea of considering an
algebra and its relationship to problems that arise in statistical signal processing
was inspired by the work of James (1957) on experimental designs.
In this chapter, we accomplish a number of tasks: (1) provide an alternate
view of maximization of a relevant likelihood under linear constraints;
(2) derive the manner in which Anderson's iterative approach reduces to an
explicit MLE when both R and R⁻¹ have linear structure; (3) illustrate the
point that highly constrained, low-variance covariance matrix estimates may
not necessarily lead to AR process parameter estimates having equally low
variance for finite sample size; (4) derive a method for obtaining the exact
MLE of AR process parameters; and (5) complement Morgera (1992) in
demonstrating the utility of considering the underlying algebra and its
properties found in many statistical signal processing problems.
Our style is largely expository; new results, however, are presented and
generally highlighted in the form of theorems, lemmas, and propositions. As
study of an algebra is a highly abstract topic in its purest form, certain
concepts not generally found in the engineering literature are presented, but
relegated to an appendix. A rather comprehensive list of references is
included for researchers wishing to delve further into topics treated.
264 SALVATORE D. MORGERA
II. COVARIANCE MODELS
where
$$\hat{R}_s = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T \qquad (2)$$
is the general sample covariance matrix. Taking the logarithm of (1), multiplying through by 2/N, and neglecting the additive constant, the loglikelihood function may be written as
$$f(R \mid x) = -\ln|R| - \operatorname{tr}(R^{-1}\hat{R}_s). \qquad (3)$$
We shall have occasion to use (3), or other expressions derived from (3),
throughout this chapter.
There are two observations regarding (3) that are worth noting. The first,
which the reader may verify, is that f(R | x) ≤ f(R̂_s | x), R ∈ L_s(V), where
f(R̂_s | x) = ln|R̂_s^{-1}| − p. We have, therefore, an upper bound on f(R | x) for
R ∈ L_s(V), which can be written in terms of the entropy of the sample set and
is attained at the stationary point R̂_s, as will be shown. This upper bound
may, in turn, be tightly bounded for 1 ≤ p ≤ N with additional assumptions
by the use of recent results (Jonsson, 1982). The second observation is
of theoretical interest and concerns the behavior of f(R | x) about the point
R̂_s when R is not restricted to being symmetric, i.e., when R ∈ L(V). Instead,
it is convenient to work with R^{-1} ∈ L(V). Construct such a matrix R^{-1} as
R^{-1} = R̂_s^{-1} + aE, where the perturbation E = −E^T about the point R̂_s^{-1} is a
skew-symmetric matrix, and a is a real scalar. Using (3), it is not difficult to
show that
$$f(R \mid x) = \ln|\hat{R}_s^{-1} + aE| - p.$$
THE INTERTWINING OF ABSTRACT ALGEBRA 265
In Karrila and Westerlund (1991), a proof is given for the inequality
|R̂_s^{-1} + aE| > |R̂_s^{-1}| for a ≠ 0; thus, f(R | x) > f(R̂_s | x). The conclusion is,
therefore, that the stationary point R̂_s for R ∈ L_s(V) is, evidently, a saddle
point for R ∈ L(V). This observation provides some feeling for the nature of
the loglikelihood surface.
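Both observations are easy to confirm numerically. The sketch below is our own illustration (all names are ours, not the chapter's): it checks the upper bound f(R | x) ≤ ln|R̂_s^{-1}| − p over symmetric pd matrices and the increase of f under a skew-symmetric perturbation of R̂_s^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)

p, N = 4, 100
X = rng.standard_normal((N, p))
Rs = X.T @ X / N                      # sample covariance, as in eq. (2)

def loglik(R, Rs):
    """Normalized loglikelihood f(R | x) of eq. (3)."""
    sign, logdet = np.linalg.slogdet(R)
    return -logdet - np.trace(np.linalg.solve(R, Rs))

f_at_Rs = loglik(Rs, Rs)
# upper bound over symmetric pd R: f(R | x) <= ln|Rs^{-1}| - p
bound = -np.linalg.slogdet(Rs)[1] - p

# perturb R^{-1} in a skew-symmetric direction E; f should increase
E = rng.standard_normal((p, p))
E = E - E.T                           # skew-symmetric: E = -E^T
a = 0.01
R_pert = np.linalg.inv(np.linalg.inv(Rs) + a * E)
```

The bound is attained exactly at R = R̂_s, while the skew perturbation raises f, consistent with the saddle-point observation.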
The inner product on the space L_s(V) (or L(V)) is called the trace inner
product. Let {e_i: i = 1, 2, . . . , p} be any orthonormal basis for V; we define
the trace inner product for A, B ∈ L_s(V) as tr(AB) = tr(BA) = Σ_{i=1}^{p} (Ae_i, Be_i).
The trace inner product also plays a role in describing the necessary
condition for maximization of f(R | x) over R ∈ L_s(V), i.e., setting the
gradient of (3) equal to zero results in the set of p(p + 1)/2 equations
$$\operatorname{tr}[(R^{-1}\hat{R}_sR^{-1} - R^{-1})E_{ij}] = 0, \quad i, j = 1, 2, \ldots, p;\ j \geq i, \qquad (4a)$$
where E_{ij} = ∂R/∂ρ_{ij} and ρ_{ij} is the ijth element of R. We call a linear
subspace basis set such as {E_{ij}: i, j = 1, 2, . . . , p; j ≥ i} a structure set;
in the previous case, the structure is simple symmetry and the solution to (4a)
for N ≥ p is R̂_s. The problem becomes more interesting when additional
structural constraints are imposed. These generally lead to a loglikelihood
surface that exhibits local maxima, due to passage of the constraint set
through ridges of the loglikelihood surface associated with simple covariance
model symmetry.
It is well known that R̂_s is pd with probability 1 if N ≥ p, is the MLE of
R over all pd members of L_s(V), and constitutes a sufficient, unbiased statistic
for R which is, however, not necessarily a minimal sufficient statistic. Finite
sample size results involving specific functions of R̂_s (Morgera and Cooper,
1977; Morgera, 1981) suggest that N must be at least 5p for R̂_s to be close
to R on the average. For example, in Morgera and Cooper (1977) and
Morgera (1986), the classical two-hypothesis problem, consisting of signal
plus noise and noise only, is treated in the context of adaptive pattern
classification, and a performance criterion, the signal-to-interference ratio
(SIR), is derived and employed to compare the behavior of various covariance
matrix estimators as a function of sample size. The SIR conditioned on an
unbiased, nonsingular covariance matrix estimate R̂ of R is shown to be
approximated by
The optimum, or Wiener, weight vector, w_opt, is given by w_opt = R^{-1}s, where
s is the signal vector. The preceding expression for the SIR provides a useful
means whereby the expected SIR, and consequently, the probability of
classification error under the Gaussian assumption, may be evaluated for
various covariance matrix estimates. Substituting (2) into the preceding
expression and taking the expected value leads to the very simple and
revealing result
$$N \geq \frac{p + 7}{\alpha}.$$
Note that when α = 0.2, i.e., the expected SIR is within 1 dB of the optimum
SIR, the required sample set size is at least N = 5p + 35, which is
approximately 5p for p large. The reader may select a value for p suited to
the application at hand.
the application at hand. Figure 1 illustrates the improvement in expected SIR
with N for p = 12.
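The sample-size rule, as reconstructed above, can be packaged as a small helper; the function name is ours, and the exact form of the bound should be read as an assumption recovered from the surrounding text.

```python
import math

def required_samples(p, alpha):
    """Smallest N satisfying the reconstructed rule N >= (p + 7)/alpha,
    i.e., the sample size needed for the expected SIR to be within a
    fraction alpha of optimum (hypothetical helper, not from the text)."""
    return math.ceil((p + 7) / alpha)
```

For α = 0.2 and p = 12 this gives N = 95 = 5p + 35, in line with the text and Figure 1.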
For statistical signal processing problems in which R must be accurately
estimated or f(R | x) must be maximized, the sample size necessary when R̂_s
is employed is usually not available. We also note that the sample size
required to achieve a specified estimation accuracy depends to a great extent
on the functional form of the estimate, i.e., the manner in which the estimate
depends on the elements of R̂_s. With reference to the previous example, an
accurate estimate of the expected SIR does not necessarily imply an accurate
estimate for the Wiener weight vector. In general, it makes sense to select a
reasonable model for R that possesses fewer than p(p + 1)/2 free
parameters and to require that the estimator, R̂, conform to the model.
One covariance model that is felt to be reasonable is the linear model, for
which R is written as
$$R = \sum_{i=0}^{m-1} \beta_i G_i \qquad (5)$$
FIGURE 1. Expected signal-to-interference ratios (SIR) for unstructured, R̂_s, and Toeplitz
structured, R̂_(T), estimators as a function of sample set size N for p = 12 and b = 3. Optimum SIR
is equal to unity.
over a nonempty open set Ω of R^m. Suppose further that for each β ∈ Ω, the
covariance R is pd. Let L_g be the m-dimensional linear subspace of L_s(V) for
which the set {G_i: i = 0, 1, . . . , m − 1} constitutes a basis; we may then
characterize the family of covariance matrices conforming to (5) as
𝒢 = {R ∈ L_g: β ∈ Ω}.
We also assume that the identity matrix I ∈ 𝒢, although this is not strictly
necessary. Now using (3) and (5), the necessary condition for maximization
of f(R | x) over 𝒢 results in the set of m equations
$$\operatorname{tr}[(R^{-1}\hat{R}_sR^{-1} - R^{-1})G_i] = 0, \quad i = 0, 1, \ldots, m - 1, \qquad (6a)$$
where
$$A = \sum_{l=0}^{n-1} a_l A_l, \qquad A \in L(V). \qquad (6b)$$
This identity is quite interesting and reveals the manner in which the solution
R̂ is related to R̂_s when certain additional structural constraints are imposed.
An iterative procedure of the type previously alluded to could now be
developed. Such an approach would, we feel, be of benefit only if n < m
and/or a general approach could be devised to find the A_l matrices; nevertheless,
the connection to a Lie algebra is indeed fascinating and worthy of
additional investigation.
C. Covariance Estimators Conforming to the Linear Model
$$\hat{R} = \sum_{i=0}^{m'-1} \hat{\beta}_i H_i, \qquad (7)$$
where m < m' < [p(p + 1)]/2, β̂ = [β̂_0 β̂_1 · · · β̂_{m'-1}]^T, and the structure set
{H_i: i = 0, 1, . . . , m' − 1} has (p × p)-dimensional members that are known,
symmetric, and linearly independent. Now, let L_h be the m'-dimensional
linear subspace of L_s(V) for which the set {H_i: i = 0, 1, . . . , m' − 1}
constitutes a basis. Typically, L_g ⊆ L_h, and we leave the apparent issue of
estimating a parameter vector β of dimension m from a statistic β̂ of
dimension m' for later.
When R is symmetric Toeplitz, m = p, unless a reduced band structure is
suspected, and it is commonly assumed that L_h = L_g, i.e., the structure set
{G_i: i = 0, 1, . . . , m − 1} serves as a basis for R̂. In this case, we shall see
in Section IV that the MLE β̂ of β cannot be explicitly written in terms of the
sufficient statistic R̂_s, i.e., as a linear transformation of the elements of R̂_s
depending only on the structure set. For situations of this type, Anderson, as
previously mentioned, has refined (3) and suggested a Newton-Raphson, or
scoring (Rao, 1973), approach to finding the MLE. In practice, this method
can suffer from convergence difficulties and has problems negotiating
multiple maxima exhibited by (3) under structural constraints. In regard to
convergence, considerably improved algorithms are found in Morgera and
Armour (1989) and Mukherjee and Maiti (1988). We further note that any
algorithm should not allow excursion of the iterates outside the set Ω. The
following example illustrates a structure set when the estimation problem is
formulated as previously described.
Example 1. The symmetric Toeplitz structure results in the following
structure set when m = p = 5:
$$G_0 = I, \quad
G_1 = \begin{bmatrix} 0&1&0&0&0\\ 1&0&1&0&0\\ 0&1&0&1&0\\ 0&0&1&0&1\\ 0&0&0&1&0 \end{bmatrix}, \quad
G_2 = \begin{bmatrix} 0&0&1&0&0\\ 0&0&0&1&0\\ 1&0&0&0&1\\ 0&1&0&0&0\\ 0&0&1&0&0 \end{bmatrix},$$
$$G_3 = \begin{bmatrix} 0&0&0&1&0\\ 0&0&0&0&1\\ 0&0&0&0&0\\ 1&0&0&0&0\\ 0&1&0&0&0 \end{bmatrix}, \quad
G_4 = \begin{bmatrix} 0&0&0&0&1\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 1&0&0&0&0 \end{bmatrix}. \qquad (8)$$
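The structure set of Example 1 is easy to generate for arbitrary p. The sketch below is our own code (not from the chapter); it also makes the linear model (5) concrete by assembling a symmetric Toeplitz R from the G_i.

```python
import numpy as np

def toeplitz_structure_set(p):
    """Structure set {G_0, ..., G_{p-1}} of Example 1: G_0 = I and G_i
    has ones on the i-th super- and sub-diagonals."""
    G = [np.eye(p)]
    for i in range(1, p):
        Gi = np.zeros((p, p))
        idx = np.arange(p - i)
        Gi[idx, idx + i] = Gi[idx + i, idx] = 1.0
        G.append(Gi)
    return G

p = 5
G = toeplitz_structure_set(p)
# the linear model (5): a symmetric Toeplitz matrix is sum_i beta_i * G_i
beta = [3.0, 1.5, 0.7, 0.2, 0.1]      # illustrative coefficients
R = sum(b * Gi for b, Gi in zip(beta, G))
```

One can also check directly that the G_i are trace-orthogonal, with tr(G_0²) = p and tr(G_i²) = 2(p − i) for i ≥ 1, as stated later in the text.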
It is instructive to represent each G_i matrix in terms of rank one linear
transformations of V into itself. Recall that, in general, for v, w ∈ V, the linear
transformation u → (w, u)v of V into itself is the dyad vw^T. Each G_i may be
represented in terms of the standard basis {e_i: i = 1, 2, . . . , p} of V, where
e_i has unity in the ith position and zeroes elsewhere. Since the structure of R
falls within the broader class of symmetric centrosymmetric (SCS) matrices,
whose elements satisfy ρ_{ij} = ρ_{ji} = ρ_{p+1-i,p+1-j}, i, j = 1, 2, . . . , p, and whose
eigenvectors (for distinct eigenvalues) are always of symmetric or skew-
symmetric form (Collar, 1962; Morgera, 1982), it is preferable to use the
following orthogonal transformation of the standard basis:
$$\tilde{e}_1 = \frac{1}{\sqrt{2}}(e_1 + e_5), \quad \tilde{e}_2 = \frac{1}{\sqrt{2}}(e_2 + e_4), \quad \tilde{e}_3 = e_3,$$
$$\tilde{e}_4 = \frac{1}{\sqrt{2}}(e_2 - e_4), \quad \tilde{e}_5 = \frac{1}{\sqrt{2}}(e_1 - e_5). \qquad (9)$$
Now, in terms of the basis {ẽ_i: i = 1, 2, . . . , p}, we have
$$G_0 = \sum_{i=1}^{5} \tilde{e}_i\tilde{e}_i^T, \qquad \ldots, \qquad G_4 = \tilde{e}_1\tilde{e}_1^T - \tilde{e}_5\tilde{e}_5^T.$$
Note that, in general, all dyads occur in the symmetric form ẽ_iẽ_j^T + ẽ_jẽ_i^T,
i, j = 1, 2, . . . , p. The trace inner product of the basis matrices is given, in
general, for any dimension, by
$$\operatorname{tr}(G_iG_j) = \begin{cases} p, & i = j = 0; \\ 2(p - i), & i = j \neq 0; \\ 0, & i \neq j. \end{cases} \qquad (10),\ (11)$$
We return to the SIR estimation problem of Section II.A to provide the
reader with an understanding of the practical value of employing a linear,
structured covariance estimator. Assume that the problem appears to
warrant the use of an estimate of Toeplitz structure, which we denote by R̂_(T),
formed as in Morgera and Cooper (1977) and Morgera (1981) using the
structure set of Example 1, and possessing m = p covariance parameters.
Very little is known concerning the finite sample size behavior of such a
structured estimate, or functions thereof, other than what is found in the work
of Morgera and Cooper (1977) and Morgera (1981). From Morgera (1981),
we conclude that
III. JORDAN ALGEBRAS
To generate a Jordan algebra using (15) and the basis elements G_i ∈ L_g, we
start by assuming that the G_i are candidate elements of the basis set for L_h and
form all products G_i² and G_iG_jG_i, i, j = 0, 1, . . . , m − 1; i ≠ j. The resulting
matrices are found to be linearly representable in terms of the elements G_i and
a number of symmetric matrices not in L_g that serve as additional candidate
elements of the basis set for L_h. We then proceed in a recursive fashion, again
forming the above products using all candidates until no additional candidates
are found. The process terminates in a finite number of steps, at which
point we delete all elements that are linearly representable in terms of other
elements and retain those that remain as a basis set for L_h.
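The recursive procedure just described can be sketched directly in terms of the Jordan product A * B = (AB + BA)/2, which generates the same closure as the products G_i² and G_iG_jG_i; the implementation and names below are ours.

```python
import numpy as np
from itertools import combinations_with_replacement

def jordan_closure(generators, tol=1e-9):
    """Recursively close a set of symmetric matrices under the Jordan
    product A*B = (AB + BA)/2, keeping only linearly independent
    elements -- a sketch of the procedure described above."""
    span = [np.asarray(M, dtype=float) for M in generators]
    while True:
        vecs = [M.flatten() for M in span]
        rank = np.linalg.matrix_rank(np.array(vecs), tol=tol)
        added = False
        for A, B in combinations_with_replacement(list(span), 2):
            P = (A @ B + B @ A) / 2.0
            if np.linalg.matrix_rank(np.array(vecs + [P.flatten()]),
                                     tol=tol) > rank:
                span.append(P)
                vecs.append(P.flatten())
                rank += 1
                added = True
        if not added:
            return span

# Toeplitz structure set of Example 1: G_i has ones on the +/- i diagonals
p = 5
G = []
for i in range(p):
    Gi = np.eye(p) if i == 0 else np.zeros((p, p))
    if i > 0:
        idx = np.arange(p - i)
        Gi[idx, idx + i] = Gi[idx + i, idx] = 1.0
    G.append(Gi)

L_h = jordan_closure(G)
dim = np.linalg.matrix_rank(np.array([M.flatten() for M in L_h]))
```

For p = 5 the closure has dimension 9, matching m' = 9 of Example 3 below, and every element is symmetric centrosymmetric.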
Example 3. This example illustrates the extension of the Toeplitz matrix
structure set {G_i: i = 0, 1, . . . , m − 1}, for m = p = 5, of Example 1 to a
structure set {H_i: i = 0, 1, . . . , m' − 1} for the Jordan algebra L_h. Following
the procedure outlined previously, we find that a basis for the subspace L_h is
given by
$$H_0 = \begin{bmatrix} 1&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1 \end{bmatrix}, \quad
H_1 = \begin{bmatrix} 0&0&0&0&0\\ 0&1&0&0&0\\ 0&0&0&0&0\\ 0&0&0&1&0\\ 0&0&0&0&0 \end{bmatrix}, \quad
H_2 = \begin{bmatrix} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{bmatrix},$$
$$H_3 = \begin{bmatrix} 0&0&0&0&1\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 1&0&0&0&0 \end{bmatrix}, \quad
H_4 = \begin{bmatrix} 0&0&0&0&0\\ 0&0&0&1&0\\ 0&0&0&0&0\\ 0&1&0&0&0\\ 0&0&0&0&0 \end{bmatrix}, \quad
H_5 = \begin{bmatrix} 0&0&1&0&0\\ 0&0&0&0&0\\ 1&0&0&0&1\\ 0&0&0&0&0\\ 0&0&1&0&0 \end{bmatrix},$$
$$H_6 = \begin{bmatrix} 0&0&0&0&0\\ 0&0&1&0&0\\ 0&1&0&1&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{bmatrix}, \quad
H_7 = \begin{bmatrix} 0&0&0&1&0\\ 0&0&0&0&1\\ 0&0&0&0&0\\ 1&0&0&0&0\\ 0&1&0&0&0 \end{bmatrix}, \quad
H_8 = \begin{bmatrix} 0&1&0&0&0\\ 1&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&1&0 \end{bmatrix}.$$
We see that m' = dim L_h = 9 and that L_h is the subspace of SCS matrices,
consistent with what was shown in Example 2; in fact, Examples 2 and
3 illustrate that neither the method for finding a basis set nor the set itself is
unique. In terms of the generating basis, we have G_0 = H_0 + H_1 + H_2,
G_1 = H_6 + H_8, G_2 = H_4 + H_5, G_3 = H_7, and G_4 = H_3. The multiplication (*)
table for this Jordan algebra is found in Table 1 of Appendix B. Those
matrices for which H_i * H_i = H_i, or H_i² = H_i, are called the principal idempotents
of the algebra; these are H_0, H_1, and H_2.
where
$$a_{ikl} = (H_i\tilde{e}_l, \tilde{e}_k) = \tilde{e}_k^T H_i \tilde{e}_l. \qquad (17b)$$
Since the ẽ_i, i = 1, 2, . . . , p, form an orthonormal set,
$$H_i = \sum_{k,l=1}^{p} a_{ikl}\,\tilde{e}_k\tilde{e}_l^T.$$
The set of structure constants {a_{ikl}} may be arranged into m' p-dimensional
matrices a_i, where a_{ikl} = a_{ilk} is the (k, l)th entry in the ith matrix, i = 0,
1, . . . , m' − 1. We say that a_i is the matrix of H_i relative to the basis set
{ẽ_i: i = 1, 2, . . . , p} of V (see Appendix A, Definition 1). The correspondence
H_i ↔ a_i is an isomorphism of L_h, the Jordan algebra, onto the algebra L_s(V)
of the symmetric (p × p)-dimensional matrices with entries a_{ikl}. The interested
reader may wish to find the a_i; we provide one as a means of checking. The
isomorphism H_7 ↔ a_7 has structure matrix
$$a_7 = \left[\begin{array}{ccc|cc} 0&1&0&0&0\\ 1&0&0&0&0\\ 0&0&0&0&0\\ \hline 0&0&0&0&-1\\ 0&0&0&-1&0 \end{array}\right],$$
where we have explicitly shown the partition into blocks of size p_1 = ⌈p/2⌉ and
p_2 = ⌊p/2⌋. We note that the a_i reveal a decomposition of V into orthogonal
subspaces as the direct sum V = V_1 ⊕ V_2. In this example, we have V_1 = sp{ẽ_1,
ẽ_2, ẽ_3} and V_2 = sp{ẽ_4, ẽ_5}, where sp denotes linear span.
Now, let us assume that R ∈ 𝒢 and that 𝒢 = 𝒢^{-1}, where R may be written
as R = Σ_{i=0}^{m'-1} ρ_iH_i. Let R̂ of (7) be the solution to (14) and recall that the
algebra decomposes as 𝒜 ≅ 𝒜_1 × 𝒜_2 × · · · × 𝒜_k. The unique decomposition
V = V_1 ⊕ V_2 ⊕ · · · ⊕ V_k into a direct sum of k orthogonal subspaces allows
R to be written as a corresponding direct sum, where each coefficient denotes
an element of the real field ℝ; there are m' = 9 such scalar parameters. Typical
members A_1, A_2 of the ideals 𝒜_1, 𝒜_2, respectively, are shown as follows,
where a, b, c, . . . , i ∈ ℝ:
$$A_1 = \begin{bmatrix} a&c&e&c&a\\ c&b&f&b&c\\ e&f&d&f&e\\ c&b&f&b&c\\ a&c&e&c&a \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} g&i&0&-i&-g\\ i&h&0&-h&-i\\ 0&0&0&0&0\\ -i&-h&0&h&i\\ -g&-i&0&i&g \end{bmatrix}.$$
To find the identities in each ideal, we write the ideal subspace multiplication
tables; these are found in Table 2 of Appendix B. Examining this table reveals
that, with proper normalization, the orthogonal projectors are
$$Q_1 = \tfrac{1}{2}(H_0 + H_3) + \tfrac{1}{2}(H_1 + H_4) + H_2, \qquad (21)$$
$$Q_2 = \tfrac{1}{2}(H_0 - H_3) + \tfrac{1}{2}(H_1 - H_4).$$
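The principal idempotents and the projectors (21) can be verified numerically. In the sketch below (our own code), the H_i of Example 3 are entered by the positions of their unit entries, using 0-based indices.

```python
import numpy as np

# SCS structure set {H_i} of Example 3, entered by the positions of
# their unit entries (0-based); each matrix is symmetrized below.
positions = [
    [(0, 0), (4, 4)],   # H_0
    [(1, 1), (3, 3)],   # H_1
    [(2, 2)],           # H_2
    [(0, 4)],           # H_3
    [(1, 3)],           # H_4
    [(0, 2), (2, 4)],   # H_5
    [(1, 2), (2, 3)],   # H_6
    [(0, 3), (1, 4)],   # H_7
    [(0, 1), (3, 4)],   # H_8
]
H = []
for pos in positions:
    M = np.zeros((5, 5))
    for r, c in pos:
        M[r, c] = M[c, r] = 1.0
    H.append(M)

# orthogonal projectors of eq. (21)
Q1 = 0.5 * (H[0] + H[3]) + 0.5 * (H[1] + H[4]) + H[2]
Q2 = 0.5 * (H[0] - H[3]) + 0.5 * (H[1] - H[4])
```

Q1 and Q2 project onto the orthogonal subspaces V_1 and V_2, of dimensions 3 and 2, and sum to the identity.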
We now present the final property of Jordan algebras and the idea of an
algebra homomorphism, a mapping that preserves algebra structure and a
concept that has its analog in almost every part of mathematics (see
Appendix A, Definition 5). To do this, we formalize the results found in
Example 4. In terms of the algebras involved, the isomorphism discussed in
Example 4 may be thought of as a 1 : 1 Jordan algebra homomorphism,
which we refer to simply as a Jordan algebra isomorphism in the sequel, such
$$t(\hat{R}) = \sum_{i=1}^{k} t_i(\hat{R}_i), \qquad (26)$$
where the t_i(R̂_i), i = 1, . . . , k, are statistically independent, complete
sufficient statistics for the family 𝒢 and have Wishart W(τ_i(R_i), p_i, N)
distributions.
Subsequent simulation results, as they pertain to the class of SCS matrices,
employ the decomposition (24).
IV. EXPLICIT MLE SOLUTION

The goal of this section is to demonstrate the manner in which Anderson's
approach (1969, 1970, 1973) leads to an explicit expression for the MLE R̂
of R in the case that R is a pd member of a Jordan algebra of symmetric linear
transformations.
$$\hat{r} = [\hat{\rho}_{11}\ \hat{\rho}_{22}\ \cdots\ \hat{\rho}_{pp}\ \hat{\rho}_{12}\ \cdots\ \hat{\rho}_{p-1,p}]^T, \qquad (28)$$
$$g_i = [g_{11}^{(i)}\ g_{22}^{(i)}\ \cdots\ g_{pp}^{(i)}\ g_{12}^{(i)}\ \cdots\ g_{p-1,p}^{(i)}]^T, \quad i = 0, 1, \ldots, m - 1.$$
The vector formed in the same manner from the elements of R̂_s is denoted
by r̂_s. Construct the [p(p + 1)]/2-dimensional matrix Φ(R̂) from the elements
φ_{ij,kl} = ρ̂_{ik}ρ̂_{jl} + ρ̂_{il}ρ̂_{jk}, i ≤ j, k ≤ l, where φ_{ij,kl} is the element of Φ(R̂) in the row
position of ρ̂_{ij} in r̂ and in the column position of ρ̂_{kl} in r̂^T. Note that Φ(R) =
N Cov(r̂_s) and that Φ(I) = 2I_p ⊕ I_{p(p-1)/2}.
Using these definitions, each side of (27) may be written as (Anderson,
1973)
$$\operatorname{tr}(\hat{R}^{-1}G_i\hat{R}^{-1}G_j) = 2g_i^T\Phi(\hat{R})^{-1}g_j, \qquad (29)$$
$$\operatorname{tr}(\hat{R}^{-1}G_i\hat{R}^{-1}\hat{R}_s) = 2g_i^T\Phi(\hat{R})^{-1}\hat{r}_s, \quad i, j = 0, 1, \ldots, m - 1.$$
Finally, we define the [p(p + 1)/2 × m]-dimensional matrix G as
G = [g_0 g_1 · · · g_{m-1}]; in terms of G and β̂, we obtain the intermediate
expression
$$\Phi(\hat{R})\Phi(I)^{-1}G = \frac{1}{\sqrt{2}}\,S^T(\hat{R} \otimes \hat{R})\Psi. \qquad (36)$$
Employing the linear model for R̂, we have
$$\hat{R} \otimes \hat{R} = \sum_{i}\sum_{j \geq i} b_{ij}\hat{\beta}_i\hat{\beta}_j[(G_i \otimes G_j) + (G_j \otimes G_i)], \qquad (37)$$
where b_{ij} = 1 − δ_{ij}/2. There are a total of m(m + 1)/2 terms in the summand
of (37). We are now prepared to state the following proposition.
Proposition 2. If R is a pd member of a Jordan algebra of symmetric linear
transformations for which the basis set is {G_0, G_1, . . . , G_{m-1}}, i.e., R ∈ 𝒢,
𝒢 = 𝒢^{-1}, then the m columns of Ψ are spanned by m eigenvectors of R ⊗ R
or, equivalently, m linearly independent linear combinations of the columns of Ψ
are eigenvectors of R ⊗ R. In mathematical form,
$$(R \otimes R)\Psi = \Psi X, \qquad (38)$$
where the m-dimensional (nonsymmetric) matrix X is pd.
The proof of this proposition follows directly from the discussion of Jordan
algebras found in Section III and some straightforward algebra. The proposition
applies equally well to an estimate R̂ that is also an element of the
Jordan algebra; in this case, we write (R̂ ⊗ R̂)Ψ = ΨX̂. The relationship (38)
is key to simplifying (30) and exposing the explicit MLE. In fact, (38) is a
necessary and sufficient condition for obtaining an explicit MLE. We state
the result of applying Proposition 2 to the simplification of (30) as the
following lemma.

Lemma 2. Let R be as defined in Proposition 2. The MLE, β̂, of β is given
by
$$\hat{\beta} = [G^T\Phi(I)^{-1}G]^{-1}G^T\Phi(I)^{-1}\hat{r}_s. \qquad (39)$$
The MLE so obtained is explicit and a complete sufficient statistic for β.

Proof. The estimate R̂ is an element of the Jordan algebra; substitute
(38) into (36) to obtain Φ(R̂)Φ(I)^{-1}G = GX̂. Recall that 𝒢 = 𝒢^{-1}, so
that R̂^{-1} is also an element of the Jordan algebra; thus, from Proposition 2,
Φ(I)Φ(R̂)^{-1}G = GΓ̂, where Γ̂ = X̂^{-1}. Using these expressions, and noting that
the factors X̂ and Γ̂ cancel in the combination [G^TΦ(R̂)^{-1}G]^{-1}G^TΦ(R̂)^{-1},
we obtain [G^TΦ(R̂)^{-1}G]^{-1}G^TΦ(R̂)^{-1} =
[G^TΦ(I)^{-1}G]^{-1}G^TΦ(I)^{-1}, which simplifies (30) to (39) and renders the
solution explicit. The completeness and sufficiency follows from Section III.
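As an illustration of the kind of explicit, linear-in-R̂_s estimate that Lemma 2 licenses, consider the SCS family of Example 3: for centrosymmetric structure, the explicit MLE coincides with the J-symmetrized sample covariance, where J is the exchange matrix. The reduction of (39) to this form is not carried out in the text, so the identification below should be read as our illustrative sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 5, 50
J = np.fliplr(np.eye(p))              # exchange (flip) matrix

# true SCS covariance: symmetrize an arbitrary pd matrix under J
B = rng.standard_normal((p, p))
R = B @ B.T + p * np.eye(p)
R = (R + J @ R @ J) / 2

X = rng.multivariate_normal(np.zeros(p), R, size=N)
Rs = X.T @ X / N                      # sample covariance, eq. (2)

# explicit SCS estimate: average R_s with its centrosymmetric flip
R_scs = (Rs + J @ Rs @ J) / 2

def loglik(Rm):
    """Loglikelihood (3) evaluated against the sample covariance Rs."""
    return -np.linalg.slogdet(Rm)[1] - np.trace(np.linalg.solve(Rm, Rs))
```

The estimate is linear in the elements of R̂_s, lies in the SCS subspace, and attains a loglikelihood at least as large as the true R, as an MLE over the SCS family must.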
We make a remark in the special case that R is a member of a von Neumann
algebra, as defined in Section II.D. If this is so, the solution is explicit if and
only if m is the number of distinct characteristic roots of R. In this case, the
matrix X is of diagonal form.
$$\operatorname{Cov}(\hat{\beta} \mid \beta) = \frac{1}{N}[(H^TH)^{-1}H^T\Phi(R)H(H^TH)^{-1}].$$
The expression for Cov(β̂_(T) | β) when (30) is used is complex, due to the fact
that (30) is not explicit, but can be directly calculated using the fourth-order
moment expression for Gaussian random variables. It is, however, possible
to relate the total variance of the two estimators as in the following lemma.

Lemma 3. For sufficiently large N and some N_0 such that N > N_0,
$$\operatorname{tr}\{\operatorname{Cov}(\hat{\beta}_{(SCS)} \mid \beta)\} \leq \operatorname{tr}\{\operatorname{Cov}(\hat{\beta}_{(T)} \mid \beta)\}.$$
Proof. We first consider the Fisher information matrix, J(β), associated
Note that J^{-1}(β) is the Cramér-Rao bound for the variance of any unbiased
estimator β̂ of β based on a sample set of size N. Under the assumption of
normality, the elements of J(β), [J(β)]_{ij}, i, j = 1, 2, . . . , m, are given by
$$[J(\beta)]_{ij} = \tfrac{1}{2}\operatorname{tr}(R^{-1}G_iR^{-1}G_j) = g_i^T\Phi(R)^{-1}g_j,$$
where we have used Porat and Friedlander (1986) and (29). The definition of
the matrix G allows the simple result J(β) = G^TΦ(R)^{-1}G; thus, J(β)^{-1} =
[G^TΦ(R)^{-1}G]^{-1}. Now, we identify J(β) with the constant matrix J̄(β); thus,
for a sample set of size N, we have
D. Experimental Results

FIGURE 2. Variance of structured covariance matrix estimators for N = 3. Pole parameters
are |p_1| = 0.96, θ_1 = π/4, θ_2 = 2π/3. Curves shown are for the SCS and Toeplitz structured
estimators, together with the Cramér-Rao bound.
V. AR PROCESS PARAMETER ESTIMATION
FIGURE 3. Variance of structured covariance matrix estimators for N = 3. Pole parameters
are |p_1| = 0.96, θ_1 = π/4, θ_2 = π/4 + 3π/50. Curves shown are for the SCS and Toeplitz structured
estimators, together with the Cramér-Rao bound.
where s² is an estimate of the variance, σ², of the zero mean, white Gaussian
term driving the AR process.
Note that (42) does not restrict the coefficient matrix, R, to be Toeplitz, a
structure that is, however, commonly used, principally because AR process
parameter estimates so obtained are asymptotically efficient, even though the
vector of covariance estimates β̂_(T) is not a sufficient statistic for a. This lack
of sufficiency arises for the Toeplitz structure since the solution to (42)
requires R̂^{-1} and 𝒢 ≠ 𝒢^{-1}. We are then faced with a situation in which β̂_(T)
may be the lowest variance finite sample size estimate among the Toeplitz,
SCS, and generalized estimates, but does not, however, result in the lowest
variance estimate of a via (42). In fact, it may be argued that the lowest
variance estimate of a using (42) must be realized through a covariance
matrix estimate consistent with R ∈ 𝒢 and 𝒢 = 𝒢^{-1}, i.e., R must be a
member of a Jordan algebra of symmetric linear mappings. This point is
related to the "old" question of whether one should utilize (R̂)^{-1} or a direct
estimate of R^{-1}; when 𝒢 = 𝒢^{-1}, the two are equivalent, and when R̂ is a MLE
for R, any linear combination, such as R̂^{-1}v, v ∈ V, is a uniformly minimum
variance unbiased estimate (Seely, 1971).
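Equation (42) itself is not reproduced in this excerpt; the covariance-to-AR-parameter transformation it refers to is of Yule-Walker (normal equation) type, which can be sketched generically as follows. The function, names, and sign convention here are ours, not the chapter's exact formulation.

```python
import numpy as np

def yule_walker(r):
    """Map covariance lags r[0..p] to AR coefficients and driving-noise
    variance via the normal equations (a generic Yule-Walker sketch;
    sign convention: x_t + a_1 x_{t-1} + ... + a_p x_{t-p} = w_t)."""
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = -np.linalg.solve(R, r[1:])
    sigma2 = r[0] + a @ r[1:]
    return a, sigma2

# exact lags of the AR(1) process x_t = 0.5 x_{t-1} + w_t, Var(w_t) = 1
a1, s2_1 = yule_walker([4.0 / 3.0, 2.0 / 3.0])
# a generic pd lag sequence of order 2
a2, s2_2 = yule_walker([2.0, 1.0, 0.5])
```

Feeding the routine exact covariance lags recovers the exact AR parameters, which is why the quality of the covariance estimate substituted for r governs the quality of â.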
C. Experimental Results
We now present a finite sample size experiment. This example highlights the
importance of selecting an appropriate covariance matrix estimate structure
when solving for the AR process parameters via the transformation method.

Example 8. Let p = 5 and construct the data sample set as in Example 7.
Consider the fourth order AR model for which a_1 = −2.7607, a_2 = 3.8106,
a_3 = −2.6535, and a_4 = 0.9238. The variance σ² = 1 and is assumed known.
The true psd for this model is shown in Fig. 8. In view of the previous
definitions, this AR model is associated with a relatively peaked, narrowband
spectrum, which poses some difficulty for finite sample size estimation. The
actual pole positions for this model are p_1 = 0.98e^{j2π(0.14)} and
p_2 = 0.98e^{j2π(0.11)}. Figures 6 and 7 show the total variance tr{Cov(â_(·) | a)}
and bias b_{1(·)} = E{â_{1(·)}} − a_1 as a function of sample size, N, where the
subscript (·) indicates the type of structured covariance matrix estimator
(T or SCS) used prior to solution of the normal equations. The results
presented represent an average taken over 100 realizations; no appreciable
differences were detected
FIGURE 7. Bias of structured AR process parameter estimators as a function of sample
size, N.
when 1000 realizations were employed. In this example and the other cases
studied, the bias associated with â_1 was chosen as being representative of that
calculated for the other AR parameters. Figure 6 also shows the exact
Cramér-Rao bound on tr{Cov(â | a)} computed using the approach found
in Giannella (1986). Use of the generalized estimate, R̂_s, produced variances
and biases of the elements of â that were several orders of magnitude higher
than those associated with the structured estimates, rendering unstructured
estimation of AR process parameter estimates virtually useless for small
sample sizes. Figures 8 and 9 verify that the SCS covariance estimator leads
to the best estimate of the AR process psd for N = 3, 5, with the Toeplitz and
SCS estimators producing comparable psd estimates when N = 10.
The preceding example points out that if finite sample size AR process
parameter estimation via the normal equations is the goal, it is important that
the estimate be a member of a Jordan algebra. We conclude this section by
showing why this is so. To begin, we consider the matrix
$$\tilde{I} = \hat{R} * R^{-1} = \tfrac{1}{2}(\hat{R}R^{-1} + R^{-1}\hat{R}); \qquad (45)$$
for the covariance estimators under consideration, we have that lim_{N→∞} Ĩ = I
FIGURES 8 AND 9. True and estimated AR process power spectral densities (dB) as a
function of normalized frequency.
almost surely. Let p = 5 and consider the SCS estimator, R̂_(SCS) = Σ_{i=0}^{8} β̂_{i(SCS)}H_i,
where the structure set is shown in Example 3. Assuming weak stationarity,
R is Toeplitz and we can write R^{-1} = Σ_{i=0}^{8} s_iH_i. One of the important reasons
that R̂_(SCS) leads to lower variance AR process parameter estimates vis-à-vis
R̂_(T) is the increased number of degrees of freedom (⌈p/2⌉) along the diagonal.
We consider, therefore, all products H_i * H_j in computing (45) that lead to
diagonal elements of Ĩ (Table 1 of Appendix B is of help in deducing which
are relevant) and call the resulting matrix D(Ĩ). Some simple algebra shows
that
$$D(\tilde{I}_{(SCS)}) = \hat{R}_{(SCS)} * R^{-1} = \hat{D}_{(SCS)} + \hat{\Delta}_{(SCS)}, \qquad (46)$$
where D̂_(SCS) is linearly expressed in terms of H_0, H_1, and H_2, and Δ̂_(SCS) is
linearly expressed in terms of H_3 and H_4. We can show that E{D̂_(SCS)} = I;
thus, Δ̂_(SCS) may be considered a stochastic error term (of antidiagonal
form) for an estimator that is an element of a Jordan algebra. Let D̂_(SCS) =
Σ_{i=0}^{2} d̂_{i(SCS)}H_i and Δ̂_(SCS) = Σ_{i=3}^{4} d̂_{i(SCS)}H_i; in particular, d̂_{3(SCS)} = β̂_{5(SCS)}s_5 and
d̂_{4(SCS)} = β̂_{6(SCS)}s_6. As a measure of the variance, we compute
$$C_{(SCS)} = E\{D(\tilde{I}_{(SCS)})D^T(\tilde{I}_{(SCS)})\} - E^2\{D(\tilde{I}_{(SCS)})\}$$
$$= [\operatorname{Var}(\hat{d}_{0(SCS)}) + \operatorname{Var}(\hat{d}_{3(SCS)})]H_0 + [\operatorname{Var}(\hat{d}_{1(SCS)}) + \operatorname{Var}(\hat{d}_{4(SCS)})]H_1 + \operatorname{Var}(\hat{d}_{2(SCS)})H_2$$
$$+\ 2[E\{\hat{d}_{0(SCS)}\hat{d}_{3(SCS)}\} - E\{\hat{d}_{0(SCS)}\}E\{\hat{d}_{3(SCS)}\}]H_3 + 2[E\{\hat{d}_{1(SCS)}\hat{d}_{4(SCS)}\} - E\{\hat{d}_{1(SCS)}\}E\{\hat{d}_{4(SCS)}\}]H_4, \qquad (47)$$
which shows the manner in which the error terms contribute to the variance
of the diagonal terms.
Now, consider a Toeplitz estimator. We write R̂_(T) = Σ_{i=0}^{p-1} β̂_{i(T)}G_i, where
the structure set is shown in Example 1. From (45), we have
$$D(\tilde{I}_{(T)}) = \hat{R}_{(T)} * R^{-1} = \hat{D}_{(T)} + \hat{\Delta}_{(T)}, \qquad (48)$$
where D̂_(T) is expressed as D̂_(T) = Σ_{i=0}^{2} d̂_{i(T)}H_i with E{D̂_(T)} = I, and the
stochastic error term is now of the form Δ̂_(T) = Σ_{i=3}^{5} d̂_{i(T)}H_i. In particular,
d̂_{3(T)} = β̂_{2(T)}s_5, d̂_{4(T)} = β̂_{1(T)}s_6, and d̂_{5(T)} = β̂_{1(T)}(s_6 + s_8)/2. As before, we
compute
$$C_{(T)} = E\{D(\tilde{I}_{(T)})D^T(\tilde{I}_{(T)})\} - E^2\{D(\tilde{I}_{(T)})\}$$
$$= [\operatorname{Var}(\hat{d}_{0(T)}) + \operatorname{Var}(\hat{d}_{3(T)}) + \operatorname{Var}(\hat{d}_{5(T)})]H_0 + [\operatorname{Var}(\hat{d}_{1(T)}) + \operatorname{Var}(\hat{d}_{4(T)})]H_1$$
$$+\ [\operatorname{Var}(\hat{d}_{2(T)}) + 2\operatorname{Var}(\hat{d}_{5(T)})]H_2 + 2[E\{\hat{d}_{0(T)}\hat{d}_{3(T)}\} - E\{\hat{d}_{0(T)}\}E\{\hat{d}_{3(T)}\} + \operatorname{Var}(\hat{d}_{5(T)})]H_3$$
$$+\ 2[E\{\hat{d}_{1(T)}\hat{d}_{4(T)}\} - E\{\hat{d}_{1(T)}\}E\{\hat{d}_{4(T)}\}]H_4$$
$$+\ [E\{(\hat{d}_{0(T)} + \hat{d}_{2(T)} + \hat{d}_{3(T)})\hat{d}_{5(T)}\} - E\{\hat{d}_{0(T)} + \hat{d}_{2(T)} + \hat{d}_{3(T)}\}E\{\hat{d}_{5(T)}\}]H_5. \qquad (49)$$
For N sufficiently large, elements on the same super- or sub-diagonal of R̂_(SCS)
all converge to the element on the corresponding super- or sub-diagonal
of R. Comparing (47) and (49) in this case, we obtain
$$C_{(T)} = C_{(SCS)} + \Delta E, \qquad (50a)$$
where
$$\Delta E = \operatorname{Var}(\hat{d}_{5(T)})[H_0 + 2H_2 + 2H_3] + tH_5, \qquad (50b)$$
with t representing the coefficient of H_5 in (49).
The quantity of importance is clearly Var(d̂_{5(T)}) = Var(β̂_{1(T)})(s_6 + s_8)²/4.
Since s_6 + s_8 = (2a_1 + a_1a_2 − a_3a_4)/σ², using the Gohberg-Semencul
decomposition of R^{-1}, the magnitude of this term depends on the variance of the
first lag covariance estimate and the true AR parameters which, in turn,
depend on the true covariances. In general, the norm of ΔE is not
insignificant relative to the norm of C_(SCS), thereby illustrating the importance
of using an SCS estimator as a first step to estimating the AR process
parameters. Furthermore, simulation results indicate that Var(β̂_{1(T)}) is larger
for narrowband spectra than for wideband spectra and, in the former case, if
the pole magnitudes are approximately equal, is maximum when the pole
angles are equal. The interested reader may easily carry this investigation
further in order to obtain a good understanding of the dependencies
involved.
VI. EXACT LOGLIKELIHOOD FOR AR PROCESS PARAMETER ESTIMATION
Changing variables in accord with (51) results in the joint conditional
probability density function (55). The exact loglikelihood function, neglecting
constant terms, then follows directly from (55), viz., (56). We note that R̄_p is
a function of a, and that the partial derivatives of f(a, σ² | X) with respect to
the elements of {a_j: j = 1, 2, . . . , p − 1} are highly nonlinear functions,
thereby making iterative maximization of (56) complicated. A common
approach taken is to neglect the initial conditions x_p and just to maximize the
logarithm of (53); this approach is not suitable for finite sample size. We now
present a solution to the problem of ML AR process parameter estimation
for finite sample size. The approach uses a new form of the loglikelihood
function, which we call the forward-backward loglikelihood function.
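The role played by the initial conditions is easiest to see in a toy case. The sketch below is our own illustration (not the chapter's forward-backward form): it evaluates the exact Gaussian loglikelihood of an AR(1) process two ways, once through the full M × M covariance matrix and once in closed form.

```python
import numpy as np

def ar1_exact_loglik(x, phi, sigma2):
    """Exact Gaussian loglikelihood of an AR(1) process
    x_t = phi * x_{t-1} + w_t, Var(w_t) = sigma2, evaluated through the
    full M x M covariance -- feasible only for small M, which is what
    motivates likelihood forms that avoid building R explicitly."""
    M = len(x)
    # AR(1) autocovariance: r(k) = sigma2/(1 - phi^2) * phi^|k|
    R = sigma2 / (1 - phi**2) * phi ** np.abs(
        np.subtract.outer(np.arange(M), np.arange(M)))
    sign, logdet = np.linalg.slogdet(R)
    return -0.5 * (M * np.log(2 * np.pi) + logdet
                   + x @ np.linalg.solve(R, x))

x = np.array([0.3, -0.1, 0.2, 0.05])
phi, s2 = 0.6, 1.3
direct = ar1_exact_loglik(x, phi, s2)
# closed form: the initial-condition term (1 - phi^2) x_1^2 is exactly
# what a conditional likelihood would discard
resid = np.sum((x[1:] - phi * x[:-1]) ** 2) + (1 - phi**2) * x[0] ** 2
closed = -0.5 * (len(x) * np.log(2 * np.pi * s2)
                 - np.log(1 - phi**2) + resid / s2)
```

The two evaluations agree; dropping the x_1² term gives the conditional likelihood, which is the finite-sample approximation the text warns against.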
Neglecting constant terms, the resulting exact loglikelihood for the usual, or
forward, predictor is (62). Inserting (64) and (65) into (63), taking the
logarithm, and dropping the constant terms yields the exact backward
loglikelihood
$$f^b(a, \sigma^2 \mid X) = -\frac{M}{2}\ln\sigma^2 - \frac{1}{2}\ln|\bar{R}_p| - \frac{1}{2\sigma^2}\,\varepsilon(a), \qquad (66a)$$
300 SALVATORE D. MORGERA
where
$$\varepsilon(a) = x_{M-p}^T\bar{R}_p^{-1}x_{M-p} + \sum_{i=0}^{M-p-1}(x_i^Ta)^2, \qquad (66b)$$
and
$$\bar{L} = \sum_{i=0}^{M-p-1} x_ix_i^T.$$
To obtain the precise form of f^b(a, σ² | X) that is to be maximized, the noise
variance is first eliminated, and then a is mapped into R̄_p using (42). The
partial derivative of f^b(a, σ² | X) with respect to σ² is
Setting this partial derivative equal to zero yields σ² = ε(a)/M; therefore,
given that the MLE â has been found, the MLE of σ² is just σ̂² = ε(â)/M.
Substituting σ² = ε(a)/M into (67) and again dropping constant terms results
in
$$2f^b(a \mid X) = \ln|\bar{R}_p^{-1}| - M\ln\varepsilon(a). \qquad (69)$$
Finally, we substitute a = R̄_p^{-1}h into (68a) and denote the result by Q(R̄_p^{-1});
using this result, we write (69) as
$$f^b(\bar{R}_p^{-1} \mid X) = \ln|\bar{R}_p^{-1}| - M\ln Q(\bar{R}_p^{-1}), \qquad (70)$$
where we have dropped the factor of two on the left-hand side of (69).
It is important to keep in mind that R̄_p = R/σ² and that, by definition,
a_0 = 1. Referring to (42), we see that the only way that the latter condition
can be met is if [R̄_p^{-1}]_{0,0} = 1. Satisfying this requirement and maximizing
f^b(R̄_p^{-1} | X) with respect to R̄_p is complicated; however, if we maximize the
loglikelihood function with respect to S_p ≜ R̄_p^{-1}, the constraint [S_p]_{0,0} = 1
is not difficult to enforce. The maximization problem is, therefore, stated as
follows: obtain a (p × p)-dimensional matrix S_p with [S_p]_{0,0} = 1 that
maximizes
$$f^b(S_p \mid X) = \ln|S_p| - M\ln Q(S_p), \qquad (71a)$$
where
$$Q(S_p) = \tfrac{1}{2}\operatorname{tr}[(x_{M-p}x_{M-p}^T + \bar{x}_{M-p}\bar{x}_{M-p}^T)S_p] + \tfrac{1}{2}h^TS_p(A + B)S_ph. \qquad (71b)$$
In the spirit of Section II, we expand the normalized inverse covariance, S_p,
according to the SCS linear model (72), where the dimension m' is given by
Proposition 1 in terms of p. The first and second order partial derivatives of
f^b(S_p | X) with respect to the expansion coefficients, i = 1, 2, . . . , m' − 1, are
not difficult to obtain; therefore, a Newton-Raphson maximization of the
loglikelihood, similar to that employed in Morgera and Armour (1989), may
be formulated. The derivatives and the update equation for the optimization
are given in Appendix C. The MLE of the AR process parameters is given by
â = Ŝ_ph, where Ŝ_p is understood to be the estimate obtained upon
termination of the iterative process. The corresponding MLE of the noise
variance is as previously described.
It is clear that the individual forward and backward loglikelihoods, given
by (62) and (66), respectively, can be written in terms of S_p; therefore,
utilizing the linear model (72), either loglikelihood can be maximized in the
D . Experimental Results
TABLE I. Comparison of the approximate ML, exact FB-ML, RMLE, Burg, and LS-FB spectrum estimation methods. M = 15 data points, 1000 realizations averaged, Model 1.
far. In this way, the number of realizations for which the algorithm fails is
counted and included in the results.
Example 9: ML Estimates for Model 1. The sample mean and variance of
the AR process parameter estimates for Model 1 are shown in Table I. The
exact FB-ML algorithm failed to converge 148 times out of 1000, but all 1000
realizations are included in the sample mean and sample variance of the
estimates. Each of the psd estimates for Model 1 is plotted in Figure 10. We
see that the first sharp peak is correctly estimated by all the methods, and the
second, lower energy peak is, in general, poorly estimated. Both the LS-FB
and the exact FB-ML psd estimates place the second peak correctly in
frequency, but the ML method more closely models the energy content. The
AML estimate is biased, shifting the second peak up in frequency from its
true position.
Referring to Table I, we note that there are large differences in the variance
of the estimates. Defining the total average variance as (1/p)\sum_{i=1}^{p}\mathrm{var}(\hat{a}_i),
we see that the AML method yields relatively high variance AR parameter
estimates. The exact FB-ML, RMLE, Burg, and the LS-FB psd estimators
all yield comparable total variance. The exact FB-ML estimate has lower
total variance than the suboptimal RMLE method, but slightly higher
variance than the LS-FB AR estimates with which the exact FB-ML
algorithm is initialized. None of the estimators attains the Cramér-Rao (CR)
FIGURE 10. True and estimated power spectral densities for Model 1 and M = 15.
lower bound on variance. The LS-FB AR estimates come the closest, with the
total variance being approximately twice the Cramer-Rao lower bound.
Turning our attention now to the white noise power estimates, \hat{\sigma}^2, we see
that the variance is quite high. The best estimate is derived from the LS-FB
method, but its variance is still an order of magnitude above the
Cramér-Rao lower bound. The poor performance of the estimators in this
respect results, in part, because the noise power estimate is derived from the
AR process parameter estimates.
Example 10: ML Estimates for Model 2. The narrowband psd of Model 2,
posing a more difficult spectrum estimation problem, is estimated with a
higher total variance for all the estimation methods. The sample mean and
variance of the AR process parameter estimates are listed in Table 11. The
exact FB-ML method failed to converge in 256 realizations out of 1000.
Again, all 1000 realizations are included in the sample mean and sample
variance of the estimated parameters.
The mean psd estimates are plotted in Fig. 11. The AML estimate fails to
produce a psd estimate that resolves the two peaks in the spectrum. None of
TABLE II. Comparison of the approximate ML, exact FB-ML, RMLE, Burg, and LS-FB spectrum estimation methods. M = 15 data points, 1000 realizations averaged, Model 2.
the estimators exactly estimate the energy of the two peaks, but the exact
FB-ML mean spectrum estimate does succeed in resolving both peaks. The
LS-FB algorithm only barely resolves the second peak.
Overall, the exact FB-ML, RMLE, Burg, and LS-FB methods yield com-
parable variance in the AR process parameter estimates, well below that
of the AML method. The noise power estimates, again, have high variances,
with the lowest variance estimate resulting from the LS-FB method. As with
the wideband psd of Model 1, none of the methods attains the Cramér-Rao
lower bound.
The mean psd estimate of the exact FB-ML algorithm is compared with
the RMLE, Burg, and LS-FB algorithms in Figure 12. Though all of the
estimates have very low bias, some differences can be seen. The Burg
algorithm positions the second peak slightly up in frequency and the energy
content is too low. The LS-FB estimate used to initialize the exact FB-ML
algorithm does not adequately resolve the second peak. The RMLE and the
exact FB-ML algorithms yield very similar mean psd estimates. The peak
positions are unbiased, but the exact FB-ML estimate more closely models
the spectral energy of the first peak.
One might be concerned that the good quality of the estimates obtained using
the exact FB-ML algorithm is due only to the excellent initialization provided
by the LS-FB covariance estimate. The results show, however, that the FB-ML
algorithm leads to AR process parameter estimates that are quite different
FIGURE 11. True and estimated power spectral densities for Model 2 and M = 15.
from those obtained using the LS-FB method. For each case where the
algorithm converges, a local maximum of the forward-backward log-
likelihood is attained. One might expect lower variance estimates if the
realizations for which an exact ML estimate is not found, i.e., the realizations
for which the algorithm did not converge, are excluded. To see if this is true,
an additional experiment using Model 2 with M = 15 and 1000 realizations
was conducted, only this time all the realizations for which the exact FB-ML
algorithm did not properly converge were excluded from the sample mean
and variance calculations. The results are shown in Table 111. We see that the
total variance of the exact FB-ML algorithm is indeed lower than that
obtained for the LS-FB estimate. The results presented here are conservative
in that these realizations are retained.
The number of realizations for which convergence does not occur
decreases slowly with increasing data record length M. For example, with
Model 2 and M = 50, 116 realizations out of 1000 did not converge. This is
about half the number of convergence failures recorded when M = 15. With
M = 100, 86 realizations out of 1000 failed to converge. Further research is
FIGURE 12. Power spectral density estimates for Model 2 and M = 15.
TABLE III. Comparison of the exact FB-ML and LS-FB spectrum estimation methods. M = 15 data points, 1000 realizations, Model 2; 250 realizations excluded due to convergence failure.
This work has shown that, under the assumption of normality, a necessary
and sufficient condition for a complete, sufficient, and explicit statistic for the
covariance estimation problem is that the linear subspaces associated with
the covariance and the inverse covariance be identical and a Jordan algebra
of symmetric linear mappings. The class of symmetric centrosymmetric (SCS)
matrices forms such an algebra under the proper composition, whereas the
class of symmetric Toeplitz matrices does not. The extension of the smaller
class to the larger has been demonstrated using the composition, or
symmetric product, operation associated with a Jordan algebra. In addition,
the concepts of an ideal structure and the related Jordan algebra isomorphism
have been employed to expose the form of the maximum likelihood estimator
(MLE) for the SCS class. Simulation results for finite, and very small, sample
sizes have shown that, under the assumption of weak stationarity, the highly
constrained Toeplitz covariance estimate has lower variance than the SCS
estimate, but gives rise to higher bias and variance autoregressive (AR)
process parameter estimates obtained via the normal equations. This
phenomenon is explained using an argument in which the Jordan algebra
structure plays a key role.
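The closure claim can be illustrated numerically: under the symmetric (Jordan) product A*B = (AB + BA)/2, SCS matrices stay SCS, while symmetric Toeplitz matrices leave the Toeplitz class. A minimal sketch (the particular 3 x 3 matrices are invented for illustration, not taken from the text):

```python
import numpy as np

J = np.fliplr(np.eye(3))  # exchange (flip) matrix

def is_scs(A, tol=1e-12):
    """Symmetric centrosymmetric: A = A^T and A = J A J."""
    return np.allclose(A, A.T, atol=tol) and np.allclose(A, J @ A @ J, atol=tol)

def is_toeplitz(A, tol=1e-12):
    """Constant along each diagonal."""
    n = A.shape[0]
    return all(abs(A[i, j] - A[i + 1, j + 1]) < tol
               for i in range(n - 1) for j in range(n - 1))

def sym_prod(A, B):
    """Jordan (symmetric) product."""
    return 0.5 * (A @ B + B @ A)

# two symmetric Toeplitz matrices (hence also SCS)
T1 = np.array([[1., 2., 0.], [2., 1., 2.], [0., 2., 1.]])
T2 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
P = sym_prod(T1, T2)
# P remains SCS but is no longer Toeplitz: the SCS class is closed under
# the symmetric product, whereas the symmetric Toeplitz class is not.
```

The same check works for any pair of symmetric Toeplitz matrices that do not commute.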
Cognizant of the relevance of the Jordan algebra structure, a new form of
the loglikelihood function suitable for finite sample size Gaussian AR process
parameter estimation is derived. The resulting functional is called the exact
forward-backward (FB) loglikelihood and is a linear combination of the
forward and backward loglikelihoods. Experimental results are presented
ACKNOWLEDGMENTS
APPENDIX A. ABSTRACT ALGEBRAIC CONCEPTS
is defined by
(\psi \circ \varphi)v_1 = \psi(\varphi v_1), \quad v_1 \in V_1. \qquad (A.2)
The mapping \psi \circ \varphi is again linear and satisfies (A.1). For the most part, we,
in fact, work with the matrix representation of a linear mapping. Consider the
linear mapping \varphi: V_1 \to V_2, where E_1 = \{e_{1i}: i = 1, 2, \ldots, p\} and E_2 =
\{e_{2i}: i = 1, 2, \ldots, q\} are basis sets for V_1 and V_2, respectively. Every vector
\varphi e_{1i} can be expressed as
\varphi e_{1i} = \sum_{j=1}^{q} \alpha_{ij} e_{2j}, \quad i = 1, 2, \ldots, p. \qquad (A.3)
APPENDIX B. JORDAN ALGEBRA MULTIPLICATION TABLES

TABLE 1. Multiplication (*) table for the 9-dimensional Jordan algebra of (5 x 5) SCS matrices.
TABLE 2. Multiplication (*) table for the ideals of the 9-dimensional Jordan algebra of (5 x 5) SCS matrices.
APPENDIX C. A NEWTON-RAPHSON MAXIMIZATION OF THE EXACT FORWARD-BACKWARD LOGLIKELIHOOD FUNCTION
The SCS expansion (72),
S_p = \sum_{n=0}^{m'-1} \tilde{s}_n H_n,
has the elementary derivative
\frac{\partial S_p}{\partial \tilde{s}_m} = H_m.
The first and second order partial derivatives of \bar{f}(S_p \mid \mathcal{X}) are given by
(C.1) and (C.2), for m, n = 1, 2, \ldots, m' - 1.
Letting \hat{s} = [\tilde{s}_1\ \tilde{s}_2\ \cdots\ \tilde{s}_{m'-1}]^T and \hat{t} = [\tilde{t}_1\ \cdots\ \tilde{t}_{m'-1}]^T, a Newton-Raphson
maximization procedure, for example, that found in Morgera and Armour
(1989), may be used to maximize \bar{f}(S_p \mid \mathcal{X}). Here, \hat{t}^{(i+1)} is defined as the
step taken from the estimate \hat{s}^{(i)} at the ith iteration to the estimate \hat{s}^{(i+1)},
given by
\hat{s}^{(i+1)} = \hat{s}^{(i)} + \rho\,\hat{t}^{(i+1)}. \qquad (C.3)
The step size \rho is initially set to \rho = 1, and is reduced as necessary such that
the loglikelihood increases. The update equation is
\sum_{m=1}^{m'-1} h_{nm}^{(i)}\,\tilde{t}_m^{(i+1)} = -g_n^{(i)}, \quad n = 1, 2, \ldots, m'-1, \qquad (C.4)
where g_n^{(i)} is the nth component of the gradient given by (C.1) evaluated
at the ith iteration and h_{mn}^{(i)} is the (m, n)th component of the Hessian matrix
given by (C.2) evaluated at the ith iteration. Iteration is terminated
when the change in \bar{f}(S_p \mid \mathcal{X}) is sufficiently small for several consecutive
iterations.
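The scheme just described can be sketched generically: a Newton step from the Hessian system, a step size reduced until the loglikelihood increases, and termination on several consecutive small changes. The routine below is a minimal illustration with user-supplied callables, not Morgera and Armour's (1989) implementation:

```python
import numpy as np

def newton_maximize(f, grad, hess, s0, tol=1e-8, max_iter=100):
    """Newton-Raphson ascent with step-size reduction (sketch of Appendix C).
    f, grad, hess: callables returning the loglikelihood, its gradient,
    and its Hessian at a parameter vector s."""
    s = np.asarray(s0, dtype=float)
    small = 0
    for _ in range(max_iter):
        t = np.linalg.solve(hess(s), -grad(s))  # solve H t = -g for the step
        rho = 1.0
        while f(s + rho * t) <= f(s) and rho > 1e-6:
            rho *= 0.5                          # reduce until f increases
        s_new = s + rho * t
        if abs(f(s_new) - f(s)) < tol:
            small += 1                          # consecutive small changes
            if small >= 3:
                return s_new
        else:
            small = 0
        s = s_new
    return s
```

For a concave quadratic loglikelihood the very first Newton step lands on the maximizer, after which the termination counter runs out.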
REFERENCES
I. Introduction 317
II. Physics of Ultrasound 318
III. Acoustic Tissue Models 321
IV. Estimation of Acoustic Parameters: Acoustospectrography 323
V. Generation of Tissue Texture 325
VI. Texture Analysis 329
   A. Diffuse Scattering Model 329
   B. Combined Diffuse/Structural Scattering Model 332
   C. Resolved Structure 333
   D. Non-Parametric Texture Analysis 337
VII. Image Processing 338
   A. Detection of Focal Lesions 338
   B. Improvement of Lesion Detection 341
   C. Detection of Diffuse Pathological Conditions 344
Acknowledgments 345
References 345
I. INTRODUCTION
II. PHYSICS OF ULTRASOUND
FIGURE 1. (a) Waveform of transmitted ultrasound pulse; (b) power spectrum corresponding to waveform in (a) (Oosterveld et al., 1985).
FIGURE 2. (a) Continuous wave mode of transmission, cross section of beam obtained by
simulation; (b) pulsed wave mode of same transducer as in (a) (Thijssen, 1987).
FIGURE 3. Acoustic tissue model: constant sound velocity, homogeneous absorption, and isotropic (diffuse) scattering. Accessible acoustic tissue parameters: attenuation and backscattering (Thijssen and Oosterveld, 1990).
It may be emphasized that this equation does not simplify the field conditions
and is not restricted to plane wave propagation, as is often assumed.
III. ACOUSTIC TISSUE MODELS
FIGURE 4. Acoustic tissue model as in Fig. 3, with additional structural scattering. Additional parameter(s) related to structure can be estimated (Thijssen and Oosterveld, 1990).
IV. ESTIMATION OF ACOUSTIC PARAMETERS: ACOUSTOSPECTROGRAPHY
where R equals the focal distance of the employed transducer. The
rf-signal from the calibration material is measured in a water tank, while
carefully selecting a "time window" at the same location within this material
and changing the distance between transducer and the top surface, so that all
the other terms in Eq. (5) are divided out of Eq. (7).
It will be clear that the division of Eq. (5) by Eq. (7) yields
E(f, z) = I^2(f)\,H_d(f, R)\,H_T^2(f, z)\,H_{bs}(f). \qquad (8)
After insertion of Eq. (6) into this equation, and after taking the (natural)
logarithm of it, the first derivative with respect to z yields
\frac{\partial \ln E(f, z)}{\partial z} = -2\beta(f). \qquad (9)
The factor of two is due to the square of H_T; in other words, the distance from
the transducer to the insonified region of interest (ROI) is travelled twice.
The attenuation coefficient of most biological tissues is proportional to the
frequency. One method of estimating the “slope” of the attenuation
coefficient is the “quasi multi-narrow band” method (Cloostermans and
Thijssen, 1983). E(f, z) is estimated by a sliding window technique from a
series of windowed rf-line segments at depths z_i. The discrete Fourier trans-
formation yields estimates of E at a range of discrete frequencies f_i. So, the
324 J. M. THIJSSEN
This method was first applied by the author (Cloostermans and Thijssen,
1983), and has a statistical advantage over the log-spectral difference method
devised some years before (Kuc et al., 1976). Another method of estimating
the attenuation coefficient (slope) can be found in the literature: the centroid
shift method, which is applicable if the transmitted spectrum is a Gaussian
(Kuc et al., 1976; Dines and Kak, 1970; Fink et al., 1983).
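The multi-narrow band idea amounts to a two-stage regression: at each analysis frequency f_i, the slope of ln E versus depth gives -2*beta(f_i); a second fit of beta(f_i) against frequency then gives the attenuation slope. The sketch below synthesizes ideal log-spectra and recovers the slope; all parameter values are invented for illustration:

```python
import numpy as np

# Synthetic log-spectra obeying ln E(f, z) = const - 2*beta1*f*z.
beta1 = 0.5e-6                           # assumed attenuation slope, Np/(m Hz)
freqs = np.linspace(2e6, 8e6, 13)        # analysis frequencies f_i (Hz)
depths = np.linspace(0.01, 0.05, 9)      # window depths z_i (m)
lnE = -2.0 * beta1 * np.outer(depths, freqs)   # shape (n_depths, n_freqs)

# Stage 1: slope of ln E versus depth at each frequency
# (np.polyfit is vectorized over the columns of lnE).
dlnE_dz = np.polyfit(depths, lnE, 1)[0]
beta_f = -dlnE_dz / 2.0                  # beta(f_i); factor 2 for two-way path

# Stage 2: attenuation slope from beta(f) versus frequency.
beta1_hat = np.polyfit(freqs, beta_f, 1)[0]
```

With noiseless synthetic data the fit recovers the assumed slope essentially exactly; real windowed spectra would add diffraction and estimation noise on top.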
The backscattering characteristics of tissue can be estimated by proceeding
with a further reduction of Eq. (8). This is done by a correction for the
attenuation, i.e., by using Eq. (9), H_T is divided out. Then a new measurement
is to be involved: a registration of the echo from a plane reflector placed in
the focus, H_{dp}(R). Taking a (known) particular reflectivity of this reflector
into account and knowing that this reflection is practically frequency-
independent and identical to H_d(f, R), we get
E_r(f) = I^2(f)\,H_{dp}(R) = I^2(f)\,H_d(f, R). \qquad (11)
So, dividing Eq. (8) by Eq. (11) yields
E(f) = H_{bs}(f). \qquad (12)
As was shown (Lizzi et al., 1983; Romijn et al., 1989) both for discrete and
inhomogeneous continuum media, the backscattering spectrum of biological
tissues can be modelled by a straight line in the frequency band transmitted
by diagnostic transducers (Fig. 5). The slope of this line is determined by the
effective size of the scattering structures, provided that the attenuation and
the diffraction have been adequately compensated for. The zero-frequency
intercept depends on both this size and the reflectivity of the structures.
The scattering also contributes to the attenuation of the propagating
ultrasound pulse. The attenuation coefficient as defined in Eq. (6) therefore
consists of a pure absorption part and a part related to scattering:
\beta(f) = \beta_a(f) + \beta_s(f). \qquad (13)
In the range of diagnostic frequencies (2-10 MHz), the scattering has been
estimated to contribute only a small fraction of the attenuation coefficient, from
a few percentage points at the low end to the order of 10% at the higher end
(Campbell and Waag, 1984; Nicholas, 1977). The frequency dependence of
the (back)scattering intensity was found to be a power of the order of 1 to
2, which increases with increasing frequency (Nicholas, 1977). Since the
ECHOGRAPHIC IMAGE PROCESSING 325
FIGURE 5. (Horizontal axis: Frequency (MHz), 0 to 10.)
V. GENERATION OF TISSUE TEXTURE
FIGURE 7. Scheme of linear summation of echoes from scatterers in Fig. 6. Resulting rf-echogram does not display number and level of echoes due to interference: speckle is formed. Dashed curve is envelope (A-mode) (Thijssen, 1987).
transducer produces an electrical signal (rf), which is the algebraic sum of the
instantaneous sound pressures originating from the four backscattered
waves. This operation is called a linear phase-sensitive reception. As is shown
in Fig. 7, the four rf-echoes build an “interference pattern” because the depth
differences of the scatterers are smaller than the axial size of the resolution
volume of the transducer (i.e., the pulse length).
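The phase-sensitive summation can be imitated with a few lines of phasor arithmetic; the pulse parameters, amplitudes, and scatterer delays below are invented for illustration, not taken from the figure:

```python
import numpy as np

fc = 5e6           # assumed carrier frequency (Hz)
sigma_t = 0.15e-6  # assumed Gaussian envelope "standard deviation" (s)
t = np.linspace(0.0, 4e-6, 4000)
delays = np.array([1.0e-6, 1.2e-6, 1.3e-6, 1.6e-6])  # four round-trip delays
amps = np.array([1.0, 0.8, 1.1, 0.9])

# Analytic (complex) echo of each scatterer: Gaussian envelope times carrier.
tau = t - delays[:, None]
echoes = (amps[:, None] * np.exp(-tau**2 / (2 * sigma_t**2))
          * np.exp(2j * np.pi * fc * tau))

rf = echoes.sum(axis=0).real             # phase-sensitive (rf) summation
envelope = np.abs(echoes.sum(axis=0))    # demodulated A-mode line
incoherent = np.abs(echoes).sum(axis=0)  # what a phase-blind sum would give
# Interference: the coherent envelope differs from the incoherent sum, so
# peak count and levels no longer mirror the four scatterers.
```

Because the delay differences are smaller than the pulse length, the envelope exhibits constructive and destructive interference rather than four separate echoes.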
This is in fact the basic principle of the generation of tissue textures! The
dashed line in Fig. 7 is the demodulated, i.e., video, A-mode echogram,
which, in this case, contains three peaks. Neither this number of peaks nor
the amplitudes are simply related to the number (nor to the location) of the
scattering structures. In analogy with the interference phenomena, which
are visible when viewing an image produced by laser light, the texture of an
echogram obtained from a scattering medium is called a “speckle” pattern.
It should be stressed once more that the tissue texture is in general not a true
image of the histological structure but rather an interference pattern that is
mainly determined by the beam characteristics. However, as is discussed
later, some characteristics of the tissue structure may be revealed by the
texture. The next step is the construction of a B-mode echogram from the
single A-mode lines (Fig. 8).
When the number of scattering structures in the resolution cell (i.e.,
effective beam width times pulse length) is large, they will not be resolved in
the rf-signal (Fig. 7). The condition at the transducer corresponds to a
random walk in 2-D space: a summation of a large number of phasors with
a phase that is uniformly distributed between 0 and 2\pi. The rf-signal has, for
the large number limit, a circular Gaussian joint probability distribution
p(a) = (2\pi\sigma^2)^{-1}\exp\{-(a_r^2 + a_i^2)/2\sigma^2\}, \qquad (14)
where a_r and a_i are the real and imaginary parts of the analytic function.
It can be shown (Goodman, 1975; Burckhardt, 1978; Abbott and
Thurstone, 1979; Flax et al., 1981; Wagner et al., 1983) that after demodu-
lation of the rf-signal, which yields the video signal v(t),
p(v) = (v/\sigma^2)\exp\{-v^2/2\sigma^2\}. \qquad (15)
This is the Rayleigh p.d.f., whereas the intensity I, i.e., the square of v, has
an exponential p.d.f.
p(I) = (1/2\sigma^2)\exp\{-I/2\sigma^2\}. \qquad (16)
The condition where these formulas apply is sometimes called the
"Rayleigh" limit of the number density of scatterers within the tissue. In fact,
it is the absolute number of scatterers N within the resolution cell that is the
important factor. The p.d.f. for lower numbers can be derived in integral
form (Jakeman, 1984)
where Jo is the Bessel function of zero order first kind, and b is the scattering
amplitude of individual scatterers. This equation has to be solved numeri-
cally, but the moments of the p.d.f. can be derived in analytical form.
In addition to these first-order gray-level statistics, i.e., the histogram, it is
important to quantify the second-order statistics as well. These can best be
described by the spatial autocorrelation function (ACF), which in 1-D is
given by
A(\Delta x) = E\{v(x + \Delta x)\,v(x)\}, \qquad (18)
where E stands for expectation value.
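An empirical estimate of (18) replaces the expectation by an average over position x; a minimal 1-D sketch (the test signal used below is illustrative):

```python
import numpy as np

def acf_1d(v, max_lag):
    """Empirical spatial autocorrelation A(dx) = E{v(x+dx) v(x)}, Eq. (18),
    with the expectation replaced by an average over position x."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    return np.array([np.mean(v[lag:] * v[:n - lag])
                     for lag in range(max_lag + 1)])
```

For a strictly alternating signal, for instance, the lag-1 value equals minus the lag-0 value, reflecting perfect anticorrelation of neighbouring pixels.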
This ACF represents the spatial relations between image pixels, i.e., the
texture characteristics of the image. The speckle nature of echographic
images is illustrated by the B-mode images in Fig. 8B, which were obtained
by calculation with a simulation software package developed at the author's
laboratory.
The rf-lines were calculated from a 3-D volume with a density of 5000
scatterers per cubic centimetre, and at each depth indicated in Fig. 8B a 5 mm
depth range was selected. The sound beam was displaced 0.2 mm, and a new
rf-line was calculated until a lateral image size (vertical in the figure) of 20 mm
was reached. After completion of the simulations, the rf-line segments were
software-demodulated and depicted in gray-scale. The gray levels of each box
were normalized in order to obtain the most adequate display; this procedure
corresponds to the appropriate setting of the time-gain compensation (TGC)
of equipment. It may be remarked that without this normalization, the image
in the focal zone would have displayed clearly a larger mean gray level than
the surrounding images. The most striking feature of the B-mode images in
Fig. 8B is the continuing increase of the lateral speckle size with increasing
depth, i.e., from left to right. This feature is present in any B-mode picture
and it can be understood now from the foregoing discussion on the inter-
ference at reception by the transducer. When the insonified volume is near the
transducer, the differences in the distance to the transducer of the scatterers
located within the sampling volume are large as compared to the wavelength.
Therefore, the changes of these distances due to the scanning of the beam are
also relatively large, and the lateral size of the speckles is small. This
phenomenon may be looked at as the inverse of the interference phenomena
that occur in the near field at the transmission of the sound beam (Fig. 5).
When moving through the near field toward the focus, the lateral speckle size
reaches a magnitude that does not change much any more beyond it. This
latter phenomenon can be explained by the simultaneous increase of the
lateral extent of the sampling volume (beam width) and the decrease of the
depth differences of the scatterers with respect to the transducer. The effect
of an attenuation of 0.1 Np/(cm MHz) is quite evident in Fig. 8C, again
predominantly in the lateral speckle size.
VI. TEXTURE ANALYSIS
which is also directly derived from the p.d.f. in the case of fully developed
speckle (Eq. 16).
The similar expression for the echoamplitude v is not known to the author;
however, Eq. (19) can be rewritten as
\langle v^2 \rangle / (\langle v^4 \rangle - \langle v^2 \rangle^2)^{1/2} = (1 + \langle b^4 \rangle / nV\langle b^2 \rangle^2)^{-1/2}, \qquad (21)
and after some rearrangement,
\langle v^4 \rangle / \langle v^2 \rangle^2 = 2 + \langle b^4 \rangle / nV\langle b^2 \rangle^2. \qquad (22)
This is the expression of the kurtosis of the p.d.f. of v, which can be
experimentally estimated. The value of \langle b^4 \rangle / \langle b^2 \rangle^2 has been assessed for
biological tissues and is of the order of 3 (Sleefe and Lele, 1988). It can be
concluded then, that from Eq. (22) the number density n can be obtained. The
value of V is assessed from the point spread function (PSF) of the employed
transducer, or the analysis of the second-order statistics of the image
(Thijssen and Oosterveld, 1988).
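Inverting the kurtosis relation for n is then one line. The sketch below assumes the form kurtosis = 2 + b4/(n V b2^2), with 2 the fully developed (Rayleigh) limit, and uses the literature value of about 3 for the scatterer moment ratio; both the functional form and the default values are assumptions of this illustration:

```python
def density_from_kurtosis(kurt_v, b4_over_b22=3.0, V=1.0):
    """Solve <v^4>/<v^2>^2 = 2 + <b^4>/(n V <b^2>^2) for the number density n.
    kurt_v: measured kurtosis of the echo amplitude; b4_over_b22: scatterer
    moment ratio (~3 for tissue); V: resolution-cell volume. A sketch only."""
    excess = kurt_v - 2.0  # 2 is the fully developed speckle limit
    if excess <= 0.0:
        raise ValueError("kurtosis at or below the Rayleigh limit")
    return b4_over_b22 / (V * excess)
```

The estimate becomes ill-conditioned as the kurtosis approaches 2, i.e., as the speckle becomes fully developed and carries no density information.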
The dependence of the SNR_v, i.e., of the echoamplitude, was investigated
by simulations (Thijssen and Oosterveld, 1985; Oosterveld et al., 1985) and
appeared to increase continuously with increasing number density to a limit
set by the mean and standard deviation of the Rayleigh p.d.f. (Eq. 15):
\mu_v = (\pi\sigma^2/2)^{1/2}, \qquad (23)
\sigma_v = \{(4 - \pi)\sigma^2/2\}^{1/2}. \qquad (24)
Hence,
\mathrm{SNR}_v = \{\pi/(4 - \pi)\}^{1/2} = 1.91, \qquad (25)
which is a limit value for large number density, equivalent to Eq. (20). It may
be mentioned that according to the general theory of scattering (Flax et al.,
1981), the mean scattering intensity is proportional to the number density
under all conditions. Therefore,
\mu_v \propto n^{1/2}. \qquad (26)
This relation was confirmed in a simulation study (Thijssen and Oosterveld,
1985, 1986; Oosterveld et al., 1985). It indicates a potential for characterizing
tissues and changes due to pathologic conditions. It should be emphasized
that the relative change of SNR, is of the order of a factor two, when the
number density increases over two decades. The mean \mu_v, however, displays
an increase by a factor of 10 for the same density range.
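The 1.91 limit in (25) is easy to confirm by Monte Carlo: draw a circular Gaussian rf-signal, take its envelope, and form the point SNR (the seed and sample count below are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200_000) + 1j * rng.normal(size=200_000)  # circular Gaussian rf
v = np.abs(a)                 # envelope: Rayleigh distributed
snr_v = v.mean() / v.std()    # approaches (pi/(4 - pi))**0.5 = 1.91
```

With 200 000 samples the estimate agrees with the analytic limit to well within one percent.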
The lateral and the axial size of the speckle cannot be calculated analyti-
cally except in the small zone around the focus, which underlines the
importance of performing realistic 3-D simulation studies. Therefore, this
discussion will be restricted to this focal zone. Analytic formulae for the axial
and lateral dimensions of the speckle for this condition were derived by
Wagner et al. (1983). These authors extended the theory of speckle that was
developed for coherent light (i.e., laser speckle) by Goodman (1975). The size
of the speckles as given by the full-width-at-half-maximum (FWHM) of the
“autocorrelation” function in the axial direction is found to be (in the focus)
\mathrm{FWHM}_{ax} = 0.61/\Delta f \quad (\mu s), \qquad (27)
where \Delta f is the FWHM of the spectrum corresponding to the transmitted
ultrasound pulse (-6 dB width). When assuming a pulse with a Gaussian
envelope with "standard deviation" \sigma_t, it can easily be shown that the
spectrum is also a Gaussian and
\sigma_t\sigma_f = (2\pi)^{-1}. \qquad (28)
FIGURE 9. B-mode images from simulations with increasing volume densities of the scattering structures, from left to right 100 to 3000 cm⁻³ (Oosterveld et al., 1985).
respectively, can be shown to be (Thijssen and Oosterveld, 1988)
\Delta Z = 2.355\,\sigma_z, \qquad (32)
and
\Delta X = 1.02\,\lambda F/D. \qquad (33)
Because in biological tissues the attenuation coefficient is proportional to
frequency to a fair approximation, it can easily be shown that a Gaussian
spectrum corresponding to the ultrasound transmission pulse will remain
Gaussian. This property implies that attenuation induces a downshift of the
central frequency, while the bandwidth is maintained. Therefore, only the
lateral speckle size will increase (Eq. 31) with increasing depth. This increase
enhances the already occurring increase due to the beam diffraction (Fig. 8),
and it is still present in the far field. It may be clear that any texture analysis
of echographic images can be unambiguous only when these effects are in
some way corrected for and the texture has been made homogeneous in
depth.
where
\langle I_d \rangle = ensemble average of the diffuse scattering intensity (= I_d when
stationarity is taken into account);
I_s = structural scattering intensity (= \langle I_s \rangle in case of unresolved structure);
and
I_0 = modified Bessel function of zero order, first kind.
This equation is derived while assuming that the variance of I, is small
compared with that of Id, and that the number density of the diffuse scatter-
ing is large.
It can be shown (Wagner et al., 1986) that the signal-to-noise ratio becomes
\mathrm{SNR}_I = (I_d + I_s)/(I_d^2 + 2I_sI_d)^{1/2}. \qquad (35)
Defining r = I_s/I_d, it follows from this equation that
\mathrm{SNR}_I = (1 + r)/(1 + 2r)^{1/2}. \qquad (36)
Hence, the high number density limit of SNR_I is again a constant (Eq. 20),
which is determined by the intensity ratio r. It may be remarked that the
square of the denominator of Eq. (35) equals the so-called Rician variance:
\sigma_R^2 = I_d^2 + 2I_sI_d. \qquad (37)
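The ratio form (36) is a one-liner; it interpolates between the fully diffuse limit SNR = 1 at r = 0 and unbounded growth as the structural component dominates (a small sketch):

```python
import math

def snr_intensity(r):
    """Eq. (36): intensity SNR for a combined diffuse/structural model,
    with r = Is/Id the structural-to-diffuse intensity ratio."""
    return (1.0 + r) / math.sqrt(1.0 + 2.0 * r)
```

The function is monotonically increasing in r, so a measured intensity SNR above the diffuse limit signals the presence of a structural component.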
C. Resolved Structure
FIGURE 10. B-mode images from simulations with increasing relative scattering strength of structural scattering (cubic matrix, 1.25 mm characteristic distance) embedded in diffusely scattering medium (volume density 5000 cm⁻³). Relative reflectivity of structural scatterers: (a) 25%; (b) 50%; (c) 75% (Thijssen and Oosterveld, 1990).
FIGURE 11. (a) Autocorrelation functions (ACF) of the texture in the images of Fig. 10, axial direction; d = characteristic distance of the cubic matrix. (b) Spectra calculated from the ACFs in (a). Oscillations of the ACFs (i.e., structural scattering) are revealed by a peak (arrow) upon the spectra produced by the diffuse scattering.
tions, a Gaussian spectrum was implied; hence, both the spectral components
due to diffuse and to structural scattering are Gaussian weighted.
The analysis of the texture for this case of resolved structure in addition to
diffuse scattering is based on the autocorrelation function and the corre-
sponding power spectrum of image texture (Wagner et al., 1983; Lowenthal
and Arsenault, 1970; Insana et al., 1986b). The somewhat lengthy expressions
are not reproduced here, and the discussion is restricted to the derivation of
relevant parameters. Writing the total variance of the intensity,
\sigma_I^2 = \langle I^2 \rangle - \langle I \rangle^2 = \sigma_R^2 + \Sigma_s^2, \qquad (38)
where \sigma_R^2 is the Rician variance, as before, and \Sigma_s^2 is the variance due to
the (resolved part of) structural scattering. \Sigma_s^2 can be derived from the overall
second-order statistics. However, when considering a Gaussian-shaped
ultrasound transmission pulse, the power spectrum corresponding to the sum
of the diffuse and the unresolved structural scattering will be a Gaussian as
well. Therefore, a Gaussian is fitted to the minima of the scalloping due to
resolved scattering. The area below this curve then equals the Rician variance
\sigma_R^2, and the integral of the superimposed line spectrum yields the structural
variance \Sigma_s^2. Hence, it can be shown that
\langle I \rangle^2 = \sigma_R^2 + I_s^2. \qquad (39)
\sigma^2 = \sum_i \sum_j (i - m_i)^2\,p(i, j).
It may be remarked that in the first application of the cooccurrence matrix
by Haralick et al. (1973), these four parameters were also employed.
Another method that was applied to clinical echograms is the MAX-MIN
method (Mitchell et al., 1977). In this case the radiofrequency scanlines
underlying the echographic image are processed (Lerski et al., 1979). The
method consists of a gradual smoothing of the echosignals, and for each
grade of smoothing the number of extrema, i.e., maxima and minima, is
estimated. The smoothing algorithm is as follows:
if y_k < x_{k+1} - T/2, then y_{k+1} = x_{k+1} - T/2;
if x_{k+1} - T/2 \le y_k \le x_{k+1} + T/2, then y_{k+1} = y_k; \qquad (47)
if x_{k+1} + T/2 < y_k, then y_{k+1} = x_{k+1} + T/2,
where x_k is the original (rf-) signal value at sample k, y_k is the new signal
value at sample k, and T is the threshold value.
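The smoothing recursion and the extremum count translate directly into code; the middle branch (hold the previous value inside the threshold band) and the test signal are assumptions of this sketch:

```python
def maxmin_smooth(x, T):
    """One pass of the MAX-MIN threshold smoother, Eq. (47): track the
    signal, but only through a dead band of width T (hold inside the band)."""
    y = [x[0]]
    for k in range(len(x) - 1):
        if y[k] < x[k + 1] - T / 2:
            y.append(x[k + 1] - T / 2)
        elif y[k] > x[k + 1] + T / 2:
            y.append(x[k + 1] + T / 2)
        else:
            y.append(y[k])  # inside the band: hold (assumed branch)
    return y

def count_extrema(y):
    """Number of strict local maxima and minima."""
    return sum(1 for k in range(1, len(y) - 1)
               if (y[k] - y[k - 1]) * (y[k + 1] - y[k]) < 0)
```

Raising T prunes small oscillations, so the extremum count falls with increasing threshold; the logarithm of that count versus threshold is the quantity plotted in the MAX-MIN analysis.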
As was shown by Mitchell et al. (1977), a plot of the logarithm of the
number of extrema vs. the threshold level may display a fairly straight line
at the low threshold position. These authors also explained that when pro-
cessing the logarithm of the image intensity, the slope of this line becomes
independent of the amplifier gain of the display system. Since most com-
mercially available echoscanners basically display the log-compressed envelope
of the echodata, the MAX-MIN method could easily be implemented as
described, provided that an adequate compensation for the ultrasound
attenuation is implemented. This is confirmed in recent work by Berger et al.
(1992).
VII. IMAGE PROCESSING
Up to this point, the changes of the image texture due to changes of the volume
density, the structure, and/or the reflectivity of the scattering sites have been
considered for the image as a whole. It has been concluded that the mean
gray-level may change as well as the size of the speckles. However, in many
instances, the clinical question is the detection of the presence of a focal lesion
within an organ (Fig. 12). In terms of the statistical theory of signal detection
(Metz, 1978; Swets and Pickett, 1982; Thijssen, 1988), the problem can be
stated as follows: What is the possibility of observing a difference between
a circular area (containing therefore a particular number of speckles)
suspected of being a lesion and an area of the same size that can be considered
to belong to a “normal” part of the tissue? and When do the mean texture
characteristics (mean gray-level and/or speckle size) of these areas differ to a
certain amount? By also taking into account the transfer of echolevel to gray
level at the image display, a “lesion signal-to-noise ratio” can be defined
(Smith et al., 1983b; Wagner and Brown, 1985), which uniquely describes the
detectability of the lesion. This detectability index is based on the concept of
an ideal observer, i.e., it is assumed that no uncertainty
(noise) is introduced by the detection process itself (North, 1963).
The lesion signal-to-noise ratio $\mathrm{SNR}_L$ is defined as
$$\mathrm{SNR}_L = [\langle s_2 \rangle - \langle s_1 \rangle]\,[\sigma_{s,1}^2 + \sigma_{s,2}^2]^{-1/2}, \qquad (48)$$
where $\langle s_j \rangle$ is the mean over the lesion area in the case of background ($j = 1$),
or of lesion ($j = 2$), and $\sigma_{s,j}^2$ is the variance over the lesion area in the case
of background ($j = 1$), or of lesion ($j = 2$).
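As a numerical sketch of Eq. (48), the following uses the pixel sample statistics of two equal-sized regions directly in place of the area statistics (an approximation: strictly, the statistics of the weighted area averages enter, which is where the autocovariance comes in):

```python
import numpy as np

def lesion_snr(lesion_pixels, background_pixels):
    """Lesion signal-to-noise ratio of Eq. (48), approximated with the
    pixel sample means and variances of the two regions."""
    s1 = np.mean(background_pixels)   # <s_1>: background mean
    s2 = np.mean(lesion_pixels)       # <s_2>: lesion mean
    v1 = np.var(background_pixels)    # sigma^2_{s,1}
    v2 = np.var(lesion_pixels)        # sigma^2_{s,2}
    return (s2 - s1) / np.sqrt(v1 + v2)
```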
The relation of the statistical area (lesion) parameters to the pixel statistics
now has to be derived. The lesion is characterized by a weighting function
a(x,y) which can be uniform (e.g., equal to unity), or any other function (e.g.,
Gaussian; Wagner and Brown, 1985). The numerator in Eq. (48) can be
written
(50)
where $C_s(x - x') = \langle [s(x) - \langle s(x) \rangle][s(x') - \langle s(x') \rangle] \rangle$ is the autocovariance
(ACV). It can be shown that Eq. (50) for uniform weighting reduces to
340 J. M. THIJSSEN
FIGURE 13. Left: single image of hyperechoic “lesion” in a contrast detail phantom. Right:
compound image of same lesion, average of six scans (© 1986 IEEE).
medium. For this reason, the $\mathrm{SNR}_L$ can be used as an estimator of the
second-order statistics, but with a window size that is comparable to those
involved in the mean and median filters described previously. The SNR-filter
was shown to produce visible lesions in the absence of gray-level
contrast (Verhoeven and Thijssen, 1990; Verhoeven et al., 1991).
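The SNR-filter idea can be sketched as a parametric image of the local mean divided by the local standard deviation; the window size and border handling below are illustrative choices, not those of Verhoeven et al.:

```python
import numpy as np

def snr_image(img, w=7):
    """Parametric 'SNR image': local mean / local standard deviation
    computed in a w x w window around each pixel."""
    h, ww = img.shape
    r = w // 2
    out = np.zeros((h, ww))
    for i in range(h):
        for j in range(ww):
            # window clipped at the image borders
            win = img[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            s = win.std()
            out[i, j] = win.mean() / s if s > 0 else 0.0
    return out
```

Because the point SNR of fully developed speckle is approximately constant (about 1.91 for a Rayleigh envelope), local deviations of this map can make a lesion visible even when there is no mean gray-level contrast.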
The non-adaptive filters can also be looked at as producing parametric
images. So far, only parameters derived from the texture statistics have been
considered. It is, however, also feasible to estimate locally acoustic tissue
parameters: attenuation coefficient and backscattering parameters (Coleman
et al., 1985; Insana and Hall, 1990). Furthermore, multiparameter images
were derived by applying a cluster analysis (Mailloux et al., 1985) or a
discriminant analysis (Momenan et al., 1988) to the locally derived
parameters in an effort to obtain a segmentation. It may be concluded, then,
that currently many ideas are circulating in the scientific community, but the
clinical impact remains to be shown. Moreover, the inhomogeneity of the
speckle characteristics of echographic images (i.e., depth-dependence due to
beam diffraction) remains an additional complicating factor in image pro-
cessing, at least for the time being.
C. Detection of Diffuse Pathological Conditions
The detection of diffuse pathological changes is a difficult task for a human
observer when using present-day equipment. The first problem is that the
“normal” condition, i.e., the normal appearance of the tissue texture, has to
be memorized. The second problem is a practical one: to be able to assess
changes of the mean gray-level, the gain and TGC settings of the equipment
should be maintained consistently after repeated calibrations, e.g., with a
stable tissue-mimicking phantom. This procedure is complicated by the
variable attenuation of intervening tissues (e.g., subcutaneous fat layer) of
patients, which should be compensated for by taking an average attenuation
per cm of tissue into account. Some brands of equipment facilitate this
procedure by enabling different TGC ranges and slopes to be set. The third
problem is extensively discussed in this paper; it arises from the dependence
of the speckle pattern on the transducer characteristics and on the depth
range (the “diffraction” effects). This problem is circumvented to a large
extent by the “computed sonography” type of echographic equipment. The
array transducer of this equipment is software-controlled in such a way that,
at a series of depths, a focusing at transmission is obtained with the same
numerical aperture, i.e., the employed part of the array is increased in
proportion to the depth range. This multifocus transmission mode is
combined with either the continuous (dynamic) focusing at reception, or else
the multifocus mode is also employed at reception. The resulting tissue
echograms display a fairly homogeneous texture over a large depth range.
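The constant-numerical-aperture rule described above can be illustrated with a toy calculation; the element pitch, f-number, and element count are assumed values, not taken from the text:

```python
def active_elements(depth_mm, pitch_mm=0.3, f_number=2.0, n_elements=128):
    """Number of array elements excited at a given transmit focal depth
    so that the f-number (depth / aperture) stays constant: the active
    aperture grows in proportion to depth until the array is exhausted."""
    aperture_mm = depth_mm / f_number
    n = int(round(aperture_mm / pitch_mm))
    return max(1, min(n, n_elements))
```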
When scanning a patient, attenuation will again cause a depth dependence,
but it might be avoided to some extent if the synthetic focusing is performed
while anticipating an average attenuation level (e.g., 0.3 dB/(cm·MHz)).
However, the effects caused by the modification of the spectral contents of the
travelling waveform by the tissue cannot easily be corrected for, and the
lateral size of the speckle will therefore still be depth-dependent.
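A sketch of the anticipated-attenuation idea: the round-trip attenuation (in dB) that the synthetic focusing could compensate at each depth, assuming the average attenuation level quoted above:

```python
def round_trip_attenuation_db(depth_cm, freq_mhz, alpha=0.3):
    """Round-trip attenuation in dB accumulated at a given depth, for a
    frequency-proportional attenuation coefficient alpha in dB/(cm x MHz)
    (e.g., the 0.3 dB/(cm x MHz) average mentioned in the text)."""
    return 2.0 * alpha * freq_mhz * depth_cm
```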
A more appropriate means of assessing diffuse changes of the tissue texture
is the employment of a computer to analyze not only the first-order (i.e.,
gray-level histogram) but also the second-order (i.e., speckle characteristics)
statistical properties of the texture. When the radiofrequency signals, rather
than the video echograms, are digitized, a proper correction for both the
beam diffraction and the attenuation effects along the scan lines can be
achieved in the frequency domain prior to (software) image formation. The
assessment of abnormality can then be performed by comparing the acoustic
tissue parameters (attenuation coefficient, backscattering) as well as the
texture features of the image under investigation with a data base of
“normals.” This kind of combined acoustospectrographic and textural
analysis has already produced very convincing results (Insana et al., 1986a;
Garra et al., 1987, 1989; Oosterveld et al., 1991; Oosterveld, 1990; Nicholas
et al., 1986; Schlaps et al., 1987; Raeth et al., 1985; Feleppa et al., 1986;
Thijssen et al., 1991) and should be advocated for future developments in
equipment technology.
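One plausible way to implement the comparison with a data base of “normals” is a Mahalanobis distance on the combined feature vector (acoustic parameters plus texture features); this is an illustrative sketch, not the method of the cited studies:

```python
import numpy as np

def abnormality_score(features, normal_mean, normal_cov):
    """Mahalanobis distance of a tissue feature vector (e.g., attenuation
    coefficient, backscatter level, texture parameters) from the mean of
    a data base of normal tissues."""
    d = np.asarray(features, dtype=float) - np.asarray(normal_mean, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(normal_cov, d)))
```

A large distance flags the tissue as a candidate diffuse abnormality relative to the normals data base.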
ACKNOWLEDGMENTS
This work has been supported by grants from the Netherlands’ Cancer
Foundation and the Technical Branch of the Netherlands’ Organization for
Scientific Research (NWO).
REFERENCES
Abbott, J. G., and Thurstone, F. L. (1979). Acoustic speckle: theory and experimental analysis.
Ultrasonic Imag. 1, 303-324.
Bamber, J. C., and Daft, C. (1986). Adaptive filtering for reduction of speckle in ultrasonic
pulse-echo images. Ultrasonics 24, 41-44.
Berger, G., Giat, P., Laugier, P., and Abouelkaram, S. (1992). Basic aspects of the max-min
measure related to tissue texture. In “Acoustical Imaging” (H. Ermert and H. P. Harjes, eds.),
Vol. 19. Plenum, New York (in press).
Burckhardt, C. B. (1978). Speckle in ultrasound B-mode scans. IEEE Trans. Sonics Ultrasonics
SU-25, 1-6.
Campbell, J. A., and Waag, R. C. (1984). Measurement of calf liver ultrasonic differential and
total scattering cross sections. J. Acoust. Soc. Am. 75, 603-611.
Oosterveld, B. J., Thijssen, J. M., Hartman, P., and Rosenbusch, G. J. E. (1991). Ultrasound
attenuation and texture analysis of diffuse liver disease: Methods and preliminary results.
Phys. Med. Biol. 36, 1039-1064.
Pauly, H., and Schwan, H. P. (1971). Mechanism of absorption of ultrasound in liver tissue.
J. Acoust. Soc. Am. 50, 692-699.
Pitas, I., and Venetsanopoulos, A. N. (1990). “Nonlinear Digital Filters: Principles and
Applications.” Kluwer, Boston.
Pratt, W. K. (1978). “Digital Image Processing.” Wiley, New York.
Raeth, U., Schlaps, D., Limberg, B. et al. (1985). Diagnostic accuracy of computerized B-scan
texture analysis and conventional ultrasonography in diffuse parenchymal and malignant liver
disease. J. Clin. Ultrasound 13, 87-89.
Rice, S. O. (1945). Mathematical analysis of random noise. Bell Syst. Tech. J. 24, 46-156.
Romijn, R. L., Thijssen, J. M., and van Beuningen, G. W. J. (1989). Estimation of scatterer size
from backscattered ultrasound: a simulation study. IEEE Trans. Ultrasonics Ferroel. Freq.
Control UFFC-36, 593-606.
Romijn, R. L., Thijssen, J. M., Oosterveld, B. J., and Verbeek, A. M. (1991). Ultrasonic
differentiation of intraocular melanomas: parameters and estimation methods. Ultrasonic
Imag. 13, 27-55.
Schlaps, D., Zuna, I., Walz, M. et al. (1987). Ultrasonic tissue characterization by texture
analysis: elimination of tissue independent factors. In “Proceedings SPIE Congress”
(L. A. Ferrari, ed.), Proc. Soc. Photo-Opt. Instr. Eng. 768, 128-134.
Sleefe, G. E., and Lele, P. P. (1988). Tissue characterization based on scatterer number density
estimation. IEEE Trans. Ultrasonics Ferroel. Freq. Control UFFC-35, 749-757.
Smith, S. W., and Lopez, H. (1982). A contrast detail analysis of diagnostic ultrasound imaging.
Med. Phys. 9, 4-12.
Smith, S. W., and Wagner, R. F. (1984). Ultrasound speckle size and lesion signal to noise ratio:
verification of theory. Ultrasonic Imag. 6, 174-180.
Smith, S. W., Lopez, H., and Bodine, W. J. (1983a). Frequency independent ultrasound
contrast-detail phantom. J . Ultrasound Med. 2, 75.
Smith, S. W., Wagner, R. F., Sandrik, J. M., and Lopez, H. (1983b). Low contrast detectability
and contrast/detail analysis in medical ultrasound. IEEE Trans. Sonics Ultrasonics SU-30,
164-173.
Stepanishen, P. R. (1971). Transient radiation from pistons in an infinite planar baffle. J.
Acoust. Soc. Am. 49, 1629-1638.
Swets, J. A., and Pickett, R. M. (1982). “Evaluation of Diagnostic Systems.” Academic Press,
New York.
Thijssen, J. M. (1987). Ultrasonic tissue characterization and echographic imaging. Med. Progr.
Technol. 13, 29-46.
Thijssen, J. M. (1988). Focal lesions in medical images: a detection problem. In “Proceedings
NATO-ASI Mathematics and Computer Science in Medical Imaging” (M. A. Viergever and
A. Todd-Pokropek, eds.), pp. 415-440. Springer, Berlin.
Thijssen, J. M., and Oosterveld, B. J. (1985). Texture in B-mode echograms: a simulation study
of the effects of diffraction and of scatterer density on gray scale statistics. In “Acoustical
Imaging” (A. J. Berkhout, J. Ridder and L. Van der Wal, eds.), Vol. 14, pp. 481-486. Plenum,
New York.
Thijssen, J. M., and Oosterveld, B. J. (1986). Speckle and texture in echography: artifact or
information? In “IEEE Ultrasonics Symposium Proceedings” (B. R. McAvoy, ed.) Vol. 2,
pp. 803-810.
Thijssen, J. M., and Oosterveld, B. J. (1988). Performance of echographic equipment and
potentials for tissue characterization. In “Proceedings NATO-ASI Mathematics and
Computer Science in Medical Imaging” (M. A. Viergever and A. Todd-Pokropek, eds.),
pp. 455-468. Springer, Berlin.
Thijssen, J. M., and Oosterveld, B. J. (1990). Texture in tissue echograms: speckle or
information? J. Ultrasound Med. 9, 215-229.
Thijssen, J. M., Oosterveld, B. J., and Wagner, R. F. (1988). Gray level transforms and lesion
detectability in echographic images. Ultrasonic Imag. 10, 171-195.
Thijssen, J. M., Verbeek, A. M., Romijn, R. L. et al. (1991). Echographic differentiation of
histological types of intraocular melanoma. Ultrasound Med. Biol. 17, 127-138.
Trahey, G. E., Smith, S. W., and Von Ramm, O. T. (1986a). Speckle pattern correlation with
lateral translation: Experimental results and implications for spatial compounding. IEEE
Trans. Ultrasonics Ferroel. Freq. Control UFFC-33, 257-264.
Trahey, G. E., Allison, J. W., Smith, S. W., and Von Ramm, O. T. (1986b). A quantitative
approach to speckle reduction via frequency compounding. Ultrasonic Imag. 8, 151-164.
Van Kervel, S. J. H., and Thijssen, J. M. (1983). A calculation scheme for the design of optimal
ultrasonic transducers. Ultrasonics 21, 134-140.
Verhoef, W. A., Cloostermans, M. J. T. M., and Thijssen, J. M. (1984). The impulse response
of a focussed source with an arbitrary axisymmetric velocity distribution. J. Acoust. Soc. Am.
75, 1716-1721.
Verhoef, W. A., Cloostermans, M. J. T. M., and Thijssen, J. M. (1985). Diffraction and
dispersion effects on the estimation of ultrasound attenuation and velocity in biological
tissues. IEEE Trans. Biomed. Engng. BME-32, 521-529.
Verhoeven, J. T. M., and Thijssen, J. M. (1990). Improvement of lesion detection by echographic
image processing: signal-to-noise ratio imaging. Ultrasonic Imag. 12, 130.
Verhoeven, J. T. M., Thijssen, J. M., and Theeuwes, A. G. M. (1991). Improvement of lesion
detection by echographic image processing: signal-to-noise ratio imaging. Ultrasonic Imag. 13,
238-251.
Wagner, R. F., and Brown, D. G. (1985). Unified SNR analysis of medical imaging systems.
Phys. Med. Biol. 30, 489-518.
Wagner, R. F., Smith, S. W., Sandrik, J. M., and Lopez, H. (1983). Statistics of speckle in
ultrasound B-scans. IEEE Trans. Sonics Ultrasonics SU-30, 156-163.
Wagner, R. F., Insana, M. F., and Brown, D. G. (1986). Unified approach to the detection and
classification of speckle texture in diagnostic ultrasound. Opt. Eng. 25, 738-742.
Wagner, R. F., Insana, M. F., and Smith, S. W. (1988). Fundamental correlation lengths of
coherent speckle in medical ultrasonic images. IEEE Trans. Ultrasonics Ferroel. Freq. Control
UFFC-35, 34-44.
Wells, P. N. T., and Halliwell, M. (1981). Speckle in ultrasonic imaging. Ultrasonics 19, 225-229.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84
Index
Birkhoff, 72
Block
  of block complex, 225
  diagrams, 247
B-mode image, 325, 327
Bound
  Cramer-Rao, 286-287, 292, 302, 304-306, 310
  sample size, 266, 271
  trace covariance, 285-286
Boundary, 199-216
  in adjacency graphs, 199, 219-220
  area, 200, 216
  definition, 217
  difficulties with, 199
  tracking, 220
Bounded lattice-ordered group, 67, 69
  radicable, 113
Bragg diffraction, 322
Break point, 244, 248
Burg method, 302-309

C
Cancellation, 138
  invariance and uniqueness, 141
Canonical coordinates, 147-149
  rotations and dilations, 152
  smooth deformations, 154
  translations, 152
Cardinality of set, 73
Cartesian ACC, 209
Cartographical
  data structure, 254
  point objects, 250
Cartography, 250-254
Causal residual, 17
Cell
  abstract, 202
  complex, abstract (ACC), 202-208
  connected, 208
  definition, 202
  k-dimensional, 202
  list, 224-228, 247, 250, 254
Cellular array machines, 64
Characteristic function, 7
  of image, 75
Circular harmonic expansion, 144
Closed subcomplex, 205
Closure, 207
Commutator, 139, 150, 186
Complete lattice, 63
Complex, see Cell, complex
Component, 207
Compounding, spatial, 341
  frequency, 342
Compression, logarithmic, 340
Compression factor, 228
Computed sonography, 344
Conjugacy
  in image algebra
    images, 74
    templates, 78
  in semi-lattice, 70
  self-, 70
Conjugate element in lattice, additive and multiplicative, 70
Connected
  complex, 208
  pairs of cells, 207
Connectedness relation, 207
Connectivity
  of complexes, 207-208
  paradox, 198, 212
  resolution, 212-216
Consistent labelling, 232
Constellation of graph edges, 216
Continuous additive and multiplicative maximum, extension, 81
Contour, 138
  invariance, 138
Contour, preservation, 343
Contraidentity matrix, 276
Contrast detail curve, 340
Coordinates, 209
Covariance matrix estimate
  explicit, 282, 284, 287
  general sample, 264-265, 268, 292-293
  inverse linear, 301
  isomorphic block diagonal, 280
  linear, 263, 269, 271, 282
  normal equations, 262, 289, 291, 297
  orthogonal subspace decomposition, 278
Crack, 200, 203, 218
  following, 220
Cross-correlation, 143-144
  generalized, 151
  Mellin-type, 158
  normalized, 143
Cross-correlator, 142-143
Cuninghame-Green, 67
Curvature measure, 244

D
Decision tree, 238
  text file description, 241
  language, 241
  compiler, 241, 243
Demodulation, 328
Detectability index, 339
Diffraction, 319, 332
  Bragg, 322
  correction, 323
  effect, 344
  Fraunhofer, 319
  term, 323
Diffuse pathological conditions, 344
Digital straight segments (DSS), 225-226
Digitization, automatic
  maps, 250
  technical drawings, 254
Dimension, 202
Dirac matrix, 277
Distribution
  autoregressive process parameters, 297-300
  convergence, 309
  exponential family, 263, 274
  Gauss-Markov, 271
  heavy-tailed, 309
  multivariate Gaussian, 264, 274, 280
  sample set, 264, 297, 299
  univariate Gaussian, 297
  Wishart, 280-281
Division algorithm, 115, see also Skeletonizing technique
Dual transportation problem, image algebra, 124

E
Eigenproblem, image algebra, 111-114
  eigenimage, 113
  eigennode, 112
  equivalent, 112
  eigenspace of template, 112
  eigenvalue, 112
  principal, template, 113
  solutions, 114
Entanglement, 28
Entropy, 8
  sample set, 264
Estimation of volume, 3-D space, 122
Experiment, 245-246, 250-251, 255-256
Exponential map, 170, 185f
Extended boundary, 219

F
Face, 201, 204
  relation, 204
False alarm, 230
Feature, 230-234
Filling, interior, of closed curve, 222
Filter
  adaptive mean, 342
  adaptive non-linear, 343
  adaptive weighted mean, 343
  adaptive weighted median, 343
  mean, 342
  smoothing, 342
  SNR, 344
  window, 343
Finite sample size
  effective, 271
  estimation, 262-263, 265-266, 289, 291, 296
  Fisher information matrix, 285-286
  initial conditions, importance, 298
  Morgera-Cooper coefficient, 271
  performance
    adaptive pattern classification, 265-266, 270-271
    autoregressive parameter estimation, 292-293, 302-309
    covariance estimation, 287
First fundamental form, 189
  coefficients, 173
Fisher information matrix, 285-286
Focus, 331, 344, see also Transducer
Fourier-Mellin transform, 159
Fourier transform, 135
  discrete, 62, 84
  fast, 62
Forward predictor, 299
Fraunhofer diffraction, 319
Fundamental theorem of surface theory, 173, 192

G
Gaussian curvature, 174, 191
Gauss-Weingarten equations, 176, 191
Generalized Lloyd algorithm, 10