
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS

IMAGE MATHEMATICS AND IMAGE PROCESSING

VOLUME 84
EDITOR-IN-CHIEF

PETER W. HAWKES
Centre National de la Recherche Scientifique
Toulouse, France

ASSOCIATE EDITOR

BENJAMIN KAZAN
Xerox Corporation
Palo Alto Research Center
Palo Alto, California
Advances in Electronics and Electron Physics
Image Mathematics and Image Processing

EDITED BY
PETER W. HAWKES
CEMES/Laboratoire d'Optique Electronique
du Centre National de la Recherche Scientifique
Toulouse, France

VOLUME 84

ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
Boston San Diego New York
London Sydney Tokyo Toronto
This book is printed on acid-free paper.

COPYRIGHT © 1992 BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED.
NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR
TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC
OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY
INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT
PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC.


1250 Sixth Avenue, San Diego, CA 92101-4311

United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504
ISSN 0065-2539
ISBN 0-12-014726-2

PRINTED IN THE UNITED STATES OF AMERICA

92 93 94 95 EC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS viii
PREFACE ix

Residual Vector Quantizers with Jointly Optimized Code Books


CHRISTOPHER F. BARNES AND RICHARD L. FROST
I. Introduction 1
II. Review of Single-Stage Quantizers 6
III. Residual Quantizers 11
IV. Scalar Residual Quantizers 14
V. Vector Residual Quantizers 26
VI. Reflection Symmetric RQ 30
VII. Experimental Results 37
VIII. Conclusions 51
Appendix: Tables of Rate-Distortion Data 52
References 58

Foundation and Applications of Lattice Transforms in Image Processing


JENNIFER L. DAVIDSON
I. Introduction 61
II. Theoretical Foundation of Lattice Transforms in Image Processing 66
III. Applications 90
References 127

Invariant Pattern Representations and Lie Groups Theory


MARIO FERRARO
I. Introduction 131
II. The LTG/NP Approach to Visual Perception 137
III. Invariant Integral Transforms and Lie Transformation Groups 142
IV. Transformations of Integral Transforms 157
V. Notes on Invariant Representations of 3D Objects 166
VI. Discussion 177
Appendix A 181
Appendix B 188
References 192

Finite Topology and Image Analysis


V. A. KOVALEVSKY
I. Introduction 197
II. Abstract Cell Complexes 201
III. Images on Cell Complexes 208
IV. Resolution of Connectivity Contradictions 212
V. Boundaries in Complexes 216
VI. Simple Image Analysis Problems 220
VII. The Cell List Data Structure 224
VIII. Subgraph and Subcomplex Isomorphism 229
IX. Variability of Prototypes and Use of Decision Trees 238
X. Applications 245
XI. Conclusions 257
Acknowledgements 258
References 258

The Intertwining of Abstract Algebra and Structured Estimation Theory


SALVATORE D. MORGERA
Foreword 262
I. Introduction 262
II. Covariance Models 264
III. Jordan Algebras 273
IV. Explicit MLE Solution 281
V. AR Processes Parameter Estimation 287
VI. Exact Loglikelihood for AR Process Parameter Estimation 296
VII. Summary and Conclusions 309
Acknowledgments 310
Appendix A 310
Appendix B 312
Appendix C 313
References 314

Echographic Image Processing


J. M. THIJSSEN
I. Introduction 317
II. Physics of Ultrasound 318
III. Acoustic Tissue Models 321
IV. Estimation of Acoustic Parameters 323
V. Generation of Tissue Texture 325
VI. Texture Analysis 329
VII. Image Processing 338
Acknowledgements 345
References 345

INDEX 351
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors’ contributions begin.

CHRISTOPHER F. BARNES (1), Georgia Tech Research Institute, Georgia Institute of Technology, Atlanta, Georgia 30332
JENNIFER L. DAVIDSON (61), Department of Electrical and Computer Engineering, 319 Durham Center, Iowa State University, Ames, Iowa 50011
MARIO FERRARO (131), Dipartimento di Fisica Sperimentale, Università di Torino, via Giuria 1, 10125 Torino, Italy
RICHARD L. FROST (1), Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
V. A. KOVALEVSKY (197), Technische Fachhochschule Berlin, Luxemburger Str. 10, 1000 Berlin 65, Germany
SALVATORE D. MORGERA (261), Department of Electrical Engineering, Canadian Institute for Telecommunications Research, McGill University, Montreal, Quebec, Canada
J. M. THIJSSEN (317), Biophysics Laboratory of the Institute of Ophthalmology, University Hospital, 6500 HB Nijmegen, The Netherlands
PREFACE

In view of my attempts during the past few years to make image processing
one of the principal themes of these Advances, I am very pleased that this
volume is wholly concerned with image mathematics and image processing.
The subject is in a state of rapid development because, despite its many
successes in domains as far apart as space science, robotics, forensics, and
microscopy, many fundamental problems remain unsolved or imperfectly
understood. Several of these are examined here, together with a practical
application in echographic imagery.
The volume of data in a raw digitized image is so vast that coding is an
important task and vector quantization is known to be attractive in theory.
In practice, the size of the necessary codebook is an obstacle and the opening
chapter by C. F. Barnes and R. L. Frost analyzes the associated difficulties.
The introduction of image algebras (first covered in this series by
C. R. Giardina in Volume 67 and presented in detail by G. X. Ritter in
Volume 80) has generated many original ideas and revealed unexpected
connections between existing processing methods and classical mathematics.
A recent and extremely rich example is the relation between minimax algebra
and mathematical morphology. This has been explored in detail by
J. L. Davidson, author of the second chapter, who gives here a fuller account
of her work than is available elsewhere, in a language that should make it
widely accessible.
Invariance under translation, rotation, and perhaps more general transfor-
mation is an essential property of recognition algorithms but is extremely
difficult to achieve. The Lie group approach lends itself particularly well to
the study of this problem, as is shown in the chapter by M. Ferraro.
The topology of digitized images is not obvious; familiar notions such as
adjacency, interior and exterior, and connectedness need to be defined afresh
and there is so far no general consensus of opinion about the best way of
doing this. There is, however, a full but little-known literature on finite
topological spaces and the importance of this subject in image analysis is the
theme of V. A. Kovalevsky in the fourth chapter.
Estimation of a covariance is necessary in many statistical signal process-
ing problems, in one or more dimensions, but this task is often performed
without a proper knowledge of the relevant algebraic formalism. This
involves Jordan algebras, more familiar in quantum mechanics than in the


image processing community, and the intertwining of these algebras and


structured estimation theory is disentangled by S. Morgera in the
penultimate chapter.
The book concludes with an important practical example of image process-
ing, in the field of echographic images. The difficulties and successes of these
techniques are described and illustrated fully by J. M. Thijssen.
Much of the work described in this volume has not hitherto been surveyed
systematically and we believe that these accounts will form an indispensable
complement to the image processing literature. I am most grateful to all the
authors for the trouble they have taken to enable readers who are not
specialists in the topic in question to comprehend it. As usual, I conclude with
a list of forthcoming chapters.

FORTHCOMING ARTICLES

Neural networks and image processing J. B. Abbiss and M. A. Fiddy
Image processing with signal-dependent noise H. H. Arsenault
Parallel detection P. E. Batson
Ion microscopy M. T. Bernius
Magnetic reconnection A. Bratenahl and
P. J. Baum
Sampling theory J. L. Brown
ODE methods J. C. Butcher
The artificial visual system concept J. M. Coggins
Dynamic RAM technology in GaAs J. A. Cooper
Corrected lenses for charged particles R. L. Dalglish
The development of electron microscopy in Italy G. Donelli
The study of dynamic phenomena in solids using field emission M. Drechsler
Amorphous semiconductors W. Fuhs
Median filters N. C. Gallagher and
E. Coyle
Bayesian image analysis S. and D. Geman
Magnetic force microscopy U. Hartmann
Theory of morphological operators H. J. A. M. Heijmans
Kalman filtering and navigation H. J. Hotop
3-D display D. P. Huijsmans and
G. J. Jense
Applications of speech recognition technology H. R. Kirby
Spin-polarized SEM K. Koike

Expert systems for image processing T. Matsuyama


Electronic tools in parapsychology R. L. Morris
Image formation in STEM C. Mory and C. Colliex
Phase-space treatment of photon beams G. Nemes
Z-contrast in materials science S. J. Pennycook
Languages for vector computers R. H. Perrot
Electron scattering and nuclear structure G. A. Peterson
Edge detection M. Petrou
Electrostatic lenses F. H. Read and
I. W. Drummond
Scientific work of Reinhold Rüdenberg H. G. Rudenberg
X-ray microscopy G. Schmahl
Accelerator mass spectroscopy J. P. F. Sellschop
Applications of mathematical morphology J. Serra
Focus-deflection systems and their applications T. Soma
The Suprenum project U. Trottenberg
Knowledge-based vision J. K. Tsotsos
Electron gun optics Y. Uchikawa
Spin-polarized SEM T. R. van Zandt and
R. Browning
Cathode-ray tube projection TV systems L. Vriens,
T. G. Spanjer and
R. Raue
n-beam dynamical calculations K. Watanabe
Thin-film cathodoluminescent phosphors A. M. Wittenberg
Parallel imaging processing methodologies S. Yalamanchili
Diode-controlled liquid-crystal display panels Z. Yaniv
Parasitic aberrations and machining tolerances M. I. Yavor
Group theory in electron optics Yu Li
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84

Residual Vector Quantizers with Jointly Optimized Code Books*
CHRISTOPHER F. BARNES
Georgia Tech Research Institute, Georgia Institute of Technology, Atlanta, Georgia

RICHARD L. FROST
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah

I. Introduction 2
II. Review of Single-Stage Quantizers 6
A. Single-Stage Scalar Quantizers 6
B. A Design Algorithm for Single-Stage Scalar Quantizers 8
C. Single-Stage Vector Quantizers 9
D. A Design Algorithm for Single-Stage Vector Quantizers 10
III. Residual Quantizers 11
A. Definition and Notation 11
B. The Optimization Problem 12
C. Equivalent Quantizers 13
IV. Scalar Residual Quantizers 14
A. Optimum Stagewise Quanta 15
B. Optimum Stagewise Partitions 19
C. Tree-Structured Stagewise Partitions 23
V. Vector Residual Quantizers 26
A. Optimum Stagewise Code Vectors 26
B. Optimum Stagewise Partitions 28
C. Tree-Structured Stagewise Partitions 28
VI. Reflection Symmetric RQ 30
A. The Reflection Constraint 32
B. Optimum Reflected Stagewise Code Vectors 35
VII. Experimental Results 37
A. A New Design Algorithm for Residual Quantizers 37
B. Synthetic Sources 38
C. Exhaustive Search Residual Quantizers 39
D. Reflected RQ 45
VIII. Conclusions 51
Appendix: Tables of Rate-Distortion Data 52
References 58

* This material is based upon work supported by the National Science Foundation under Grant No. 8909328 and a Centers of Excellence Grant from the State of Utah.


I. INTRODUCTION

Many information-bearing waveforms occur naturally in a continuous-time, continuous-amplitude form. Such waveforms are called analog signals. These
signals cannot be operated on by modern digital processing, storage, or
transmission systems without first being converted to digital form. Digital
signals are discrete-time, discrete-amplitude representations that are usually
encoded into a sequence of binary code words. Hence, the process of analog-
to-digital (A/D) conversion consists of three distinct steps: sampling, quan-
tization, and coding. Although this article will mention sampling and coding,
it will address primarily the problem of optimal quantization.
A continuous-time signal is converted to a discrete-time representation by
sampling the analog signal waveform at uniform intervals in time. This
sampling process is invertible if the original analog signal is essentially
band-limited and if the sampling rate satisfies the well-known Nyquist
criterion. Under these circumstances, there is no loss in the fidelity of the
discrete-time representation. The discrete-time continuous-amplitude signal
is converted to a discrete-time discrete-amplitude representation by restrict-
ing the continuous-valued amplitude of each sample to a finite set of possible
reconstruction values. This step is called quantization. In contrast to the
sampling procedure, the quantization process is not invertible, and there is
always some loss of fidelity in the discrete-amplitude representation. Finally,
the digital signal is formed by associating with each possible quantization
value an index or digital code word, usually in binary notation, which is used
for transmission, storage, and processing. If no errors corrupt the code
words, this binary coding procedure is also invertible.
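The three steps can be made concrete with a short sketch. The following Python fragment (all values are illustrative choices, not taken from the text) samples a 1 Hz sine wave, quantizes each sample to one of N = 8 reconstruction levels, and codes each level index as a 3-bit binary word; of the three steps, only quantization discards information.

```python
import numpy as np

fs = 8.0                                # sampling rate (Hz), above the Nyquist rate for a 1 Hz tone
t = np.arange(0.0, 2.0, 1.0 / fs)       # sampling: uniform discrete time instants
x = np.sin(2.0 * np.pi * 1.0 * t)       # discrete-time, continuous-amplitude samples

levels = np.linspace(-1.0, 1.0, 8)      # quantization: N = 8 reconstruction values
idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
xq = levels[idx]                        # discrete-time, discrete-amplitude representation

codewords = [format(i, "03b") for i in idx]  # coding: one 3-bit binary word per sample
```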
The average number of bits per input sample in a binary digital represen-
tation determines the bit rate of the digital signal and is denoted R. Since the
fidelity of the digital signal increases as R increases, it is always possible in
theory to achieve any desired level of fidelity by increasing R. However,
increasing R also increases the computational speed, memory, and
bandwidth requirements of digital information processing systems. It is
therefore of interest to minimize the rate required to achieve a desired fidelity.
Alternatively, given a specified rate, it is of interest to minimize the associated
quantization distortion; the two problems are equivalent.
The study of the theoretical relationship between rate and fidelity is the
province of rate-distortion theory (Shannon, 1959), which is a subbranch of
information theory (Shannon, 1948). In rate-distortion theory, the informa-
tion source is represented by the probability density function (pdf) of its
possible amplitude values. A memoryless source may be completely specified
by a one-dimensional pdf, while a source with memory may require for its

specification a pdf of very high, possibly infinite, dimensionality. Fidelity is


quantified as the expected value of an appropriate measure of quantization
distortion. A central insight of rate-distortion theory is that, for a fixed rate,
the distortion of the digital representation can be decreased if consecutive
samples from the discrete-time continuous-amplitude signal are grouped
together and quantized as a single pattern or vector. Quantizing a vector of
samples as a whole makes it possible to exploit any statistical dependencies
that may exist between samples and, to a lesser extent, geometric properties
of higher dimensional spaces. In contrast, a scalar quantizer, which quantizes
each sample independently of all other samples, cannot exploit such gains.
Another insight of rate-distortion theory is that for every fixed rate R there
exists a lower bound on the distortion D achievable by vector quantization
(VQ).¹ This lower bound is attained asymptotically in the limit as the number
of samples in the vector becomes infinite (Berger, 1971). The set of lower
bounds for all R defines a continuous function of R, the distortion-rate
function D(R). Conversely, there exists a lower bound on the rate R achiev-
able at a fixed distortion D, the rate-distortion function R(D). The function
R(D) is the functional inverse of D(R). A readable discussion of the precise definition of D(R) is available in Gibson and Sayood (1988).

¹ The acronym "VQ" is used as an abbreviation for both vector quantizer and vector quantization.
It is desirable that a VQ have distortion close to D(R). Unfortunately, it
is impractical to accomplish this simply by increasing the vector dimension
n. Minimum distortion code vectors (the possible source vector representations) are located randomly in $\mathbb{R}^n$. Optimal encoding of each source vector requires, in general, a search of all code vectors. In the literature, such VQs are referred to as exhaustive search vector quantizers (ESVQs). More specifically, if the rate of the ESVQ is held fixed at R bits per sample (bps), then the number of code vectors to be searched is $2^{nR}$. The memory required to store the collection of all code vectors in the code book and the computation required to search the code book on a per sample basis are proportional to $2^{nR}$. This in itself would not necessarily be a problem if the distortion $D(n, R)$ of a VQ with vectors of length n converged rapidly to the bound $D(R)$ with increasing n. Unfortunately, convergence is quite slow. For large n, a bound on the best possible rate of convergence has been given (Pilc, 1967; Pilc, 1968) as

$$D(n, R) \geq D(R) + O(n^{-1} \log n). \qquad (1)$$

Empirical results (Fischer and Dicharry, 1984) confirm this slow rate of convergence.
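To see how quickly exhaustive search becomes impractical, the short sketch below (illustrative numbers only) evaluates the $2^{nR}$ code book size for a few vector dimensions at a fixed rate of R = 1 bit per sample.

```python
# Both the memory and the per-vector search effort of an exhaustive-search
# VQ grow as 2**(n*R); the rate here is fixed at R = 1 bit per sample.
R = 1.0
for n in (4, 8, 16, 32):
    print(f"n = {n:2d}: {2 ** int(n * R):>13,} code vectors to store and search")
```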
Quantizers whose memory and computation requirements increase exponentially with n are termed non-instrumentable; those whose costs grow only
algebraically with n are instrumentable (Berger, 1971). Instrumentability can
be achieved by imposing structure on the VQ code book so as to simplify the
search procedure and reduce the memory requirements for code book
storage. Once structure has been imposed, it is the task of the designer to
optimize the VQ subject to the imposed structural constraint (Gabor and
Gyorfi, 1986). Unfortunately, the constrained optimization problem is not
always tractable, and one has no choice but to resort to ad hoc design
procedures. In either case, the imposition of structure on the code book will
increase the distortion relative to ESVQ for a given n and R. In this sense,
structured VQs are suboptimal. For a given level of complexity, however, it
is possible that the distortion of the structured VQ may be less than that of
ESVQ. Although complexity is a much more difficult notion to quantify than
is vector dimensionality, it is clearly more relevant to determining the
practical merits of any given quantizer. A note of caution is in order here. Not
infrequently, a “reduced complexity” quantizer will turn out to be more
complex than an ESVQ for the same level of distortion; it is important to
assess the performance of any proposed VQ structure with care. Complexity
should be evaluated while fixing both distortion and rate, and distortion
should be evaluated while fixing complexity and rate.
Examples of structured VQs proposed by researchers include tree-structured
VQ (Buzo et al., 1980; Baker, 1984; Makhoul et al., 1985) and lattice VQ
(Gibson and Sayood, 1988; Conway and Sloane, 1982; Sayood et al., 1984).
Tree-structured VQs (TSVQs) encode each source vector by a tree search of
the possible code vectors. This type of search reduces computation to instru-
mentable levels. Memory requirements remain exponential in nR, and are
actually larger than those of an ESVQ for fixed n and R. A TSVQ has the
advantage of being adaptable to many different source distributions, and
usually suffers only relatively small increases in distortion compared with an
ESVQ (Makhoul et al., 1985). Lattice VQs perform well on uniformly
distributed memoryless sources, and their highly structured algebraic organ-
ization makes them instrumentable in terms of both computation and
memory. However, they do not generally perform well on non-uniformly
distributed sources with memory (Gibson and Sayood, 1988).
Residual quantizers² (RQs) have been proposed to fill the middle ground
between ESVQ and lattice VQ. Earlier literature sometimes refers to RQs as
multiple-stage (Juang and Gray, 1982) or cascaded (Makhoul et al., 1985)
VQs. An RQ is organized as a sequence of ESVQ stages, each stage of which
uses a relatively small code book to encode the residual error of the preceding

stage. This organization is appealing because it appears to induce a tree-
structure on both the VQ encoder and decoder, thereby reducing both
computation and memory relative to the ESVQ. Despite their apparent
economies, RQs have not been widely adopted. Earlier researchers (Makhoul
et al., 1985) reported that RQs with more than two stages did not perform
well compared with ESVQs. Nevertheless, the RQ or some variant continues
to be suggested in the literature (Chen and Bovik, 1990; Chan and Gersho,
1991).

² The acronym "RQ" is used as an abbreviation for both residual quantizer and residual quantization.
Recently, we undertook a careful study of the RQ (Barnes, 1989; Barnes
and Frost, 1990) to understand its structure and limitations and to determine
under what circumstances, if any, the RQ is a viable alternative to the ESVQ
or lattice VQ. Our study has produced two main results. The first is a
derivation of necessary conditions for the joint optimality of all RQ stagewise
code vectors. The second is the understanding that, despite their multistage
organizations, RQs are not in general effectively searched by a tree-structured
encoder. The combination of suboptimal code vector design and incompat-
ible tree-searching seems to account for the poor results reported by earlier
researchers (Makhoul et al., 1985). However, if the RQ alphabet is exhaus-
tively searched, the RQ distortion can be quite close to that of the ESVQ.
Exhaustive search residual quantizers³ (ESRQs) are the complement of
TSVQs in that they perform well and reduce memory cost relative to an
ESVQ, but they do not guarantee reduced computational costs.
In practice, of course, computational costs are very important and often
dominate concerns for memory costs. Accordingly, in this chapter we suggest
a new approach to reduce encoding complexity in RQs and characterize its
effects on distortion. The work described here does not provide a final answer
to the problem of efficient RQ encoding, but it does clarify the structure of
the problem and suggests other possibilities.
This chapter is organized as follows: Section II reviews the basic principles
of minimum distortion quantization. Both scalar and vector quantizers are
considered. Section III describes the RQ structure and an alternative RQ
representation used in subsequent analysis, called the equivalent single-stage
quantizer. Section IV considers the optimization of scalar RQs and presents
a derivation of necessary conditions for minimum distortion. The problem of
encoding complexity is also considered, and the difficulties associated with
tree-structured encoders for RQ are described and illustrated. Section V
generalizes the results of Section IV to vector residual quantizers. A modified

RQ alphabet based on stagewise reflection symmetry and termed reflected RQ (rRQ) is introduced in Section VI. Finally, Section VII compares the distortion and complexity of the ESVQ, the unoptimized RQ, optimized RQ, and rRQ on a variety of synthetic and natural sources. The chapter concludes with a discussion of possible future work.

³ A note on semantics: It may be more accurate to refer to this structure as an ESVQ with a direct sum code book (Barnes and Frost, 1990). However, the motivating factor for this study has been the original residual structure and hence, in this paper, we call this a residual quantizer with an exhaustive search encoder.

II. REVIEW OF SINGLE-STAGE QUANTIZERS

The theory of optimal single-stage quantizers was addressed first by Lloyd (1957) in an unpublished work, and later in a published paper by Max (1960).
Both researchers considered the problem of establishing optimality
conditions for scalar quantizers with fixed alphabet sizes. Their work is
reviewed here to establish the standard mathematical approach to this
problem and to provide the foundation for the discussion of RQs.

A. Single-Stage Scalar Quantizers

A scalar quantizer can be graphically represented by its input-output relationship. Figure 1 illustrates such a relationship for a quantizer with the number of output levels N = 4.

FIGURE 1. Example of a 4-level single-stage scalar quantizer.

The abscissa represents the amplitude of the input sample x and the ordinate represents the quantized output value Q(x).
The possible output values are denoted $\{y_0, y_1, y_2, y_3\}$. The decision boundaries between the $y_j$ are the points of discontinuity of the quantizer's characteristic function, or input-output curve, and are denoted $\{b_0, b_1, b_2, b_3, b_4\}$, where $b_0 = -\infty$ and $b_4 = \infty$. The quantizer operates on each sample of x independently and assigns it the value $Q(x) = y_j$ if and only if $x \in [b_j, b_{j+1})$. The line segment $S_j = [b_j, b_{j+1})$ is called the jth cell of the induced partition of the real line.
Each continuous-amplitude sample to be quantized is considered to be a realization x of a random variable X, having its amplitude distributed according to a known probability density function $f_X(\cdot)$. The performance of the quantizer is characterized by the expected value of some appropriate measure of distortion. The most widely adopted measure, both for analytical tractability and practical value, is the squared error $d(x, Q(x)) = (x - Q(x))^2$. Thus, the measure of quantizer performance is the quantizer's mean squared error (MSE), defined as

$$D_{\mathrm{MSE}} = \int_{-\infty}^{+\infty} (x - Q(x))^2 f_X(x)\,dx = \sum_{j=0}^{N-1} \int_{b_j}^{b_{j+1}} (x - y_j)^2 f_X(x)\,dx. \qquad (2)$$
Necessary conditions for a minimum value of $D_{\mathrm{MSE}}$ are obtained by differentiating Eq. (2) with respect to the $b_j$ assuming the $y_j$ are fixed, and by differentiating with respect to the $y_j$ assuming that the $b_j$ are held fixed, that is,

$$\frac{\partial D_{\mathrm{MSE}}}{\partial b_j} = 0, \qquad (3)$$

$$\frac{\partial D_{\mathrm{MSE}}}{\partial y_j} = 0. \qquad (4)$$

The solution of Eq. (3) implies the partition boundaries must be midway between adjacent quanta,

$$b_j = \frac{y_{j-1} + y_j}{2}, \qquad (5)$$

and that of Eq. (4) implies the quanta must be the centroids of their respective partition cells,

$$y_j = \frac{\int_{b_j}^{b_{j+1}} x f_X(x)\,dx}{\int_{b_j}^{b_{j+1}} f_X(x)\,dx}. \qquad (6)$$

These two conditions are known as the Lloyd-Max conditions. They are
necessary conditions for the minimization of Eq. (2), but may characterize
any stationary point of the quantizer distortion function. In the special case
where the source probability density function is log-concave, the quantizer
distortion function has a single stationary point, so these conditions are
sufficient to determine the global minimum (Trushkin, 1982).
The derivation above gives conditions for minimum quantizer distortion
assuming a fixed number of quantizer levels, but it does not account for
coding the quantizer output. The most obvious code is to represent each
quantum level by its index j. In binary notation, the index requires a word of length $\lceil \log_2(N) \rceil$ bits, where $\lceil x \rceil$ denotes the smallest integer at least as large as x. For example, if N = 8 then in binary notation level 0 would be coded by 000, level 5 by 101, and so on. It is easy to see that if N is not a power of two, i.e., $N \neq 2^m$, $m \in \mathbb{Z}^+$ (the positive integers), there is some inefficiency in this coding scheme. However, even when N is a power of two, the coding efficiency of this straightforward scheme is optimal only if $p_j = \mathrm{Prob}(x \in S_j) = N^{-1}$ for all j; that is, if each output value $y_j$ is equiprobable. In general, the minimum possible coding rate in bits per sample is given by the entropy of the quantizer output, defined as
$$H(\mathcal{Y}) = -\sum_{j=0}^{N-1} p_j \log_2(p_j). \qquad (7)$$

If the $p_j$ are known, an entropy coding scheme can then be used to minimize
the average number of bits per sample required to represent the quantizer
output. Commonly used entropy coding schemes include Huffman coding
(Gallager, 1968), Lempel-Ziv coding (Welch, 1984), and arithmetic coding
(Langdon, 1984). Entropy coding schemes typically use code words of
varying lengths. Variable length codes are very sensitive to corruption by
noise, since changing a bit in the code may cause the decoder to become
confused as to the length of the corrupted code word and all succeeding code
words. Also, variable length codes create the possibility of serious data loss
(buffer overflow) or inefficient channel use (buffer underflow) when the
variable rate code is transmitted on a fixed-rate channel. These problems can
be managed but the gains in coding efficiency are sometimes outweighed by
increases in system complexity. The interested reader can explore these
problems further in Farvardin and Modestino (1984); Farvardin and
Modestino (1986), and in Jayant and Noll (1984). The present discussion
considers only the case of fixed-rate codes.
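As a small numerical companion to Eq. (7), the sketch below (a minimal illustration with invented probabilities) computes the output entropy of a 4-level quantizer: equiprobable levels need the full 2 bits, while a skewed output distribution could in principle be entropy coded well below 2 bits per sample.

```python
import numpy as np

def output_entropy(p):
    """Entropy in bits per sample of the quantizer output, Eq. (7)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # the term 0 * log2(0) is taken as 0
    return -np.sum(p * np.log2(p))

print(output_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits
print(output_entropy([0.7, 0.1, 0.1, 0.1]))      # about 1.36 bits
```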

B. A Design Algorithm for Single-Stage Scalar Quantizers

Closed form solutions that satisfy both Eqs. (5) and (6) simultaneously are
not usually available. Solutions are obtained iteratively, typically through the
use of an algorithm suggested by Lloyd, which he called Method I. This
algorithm is initialized by some arbitrary placement of the $\{y_j\}$. Holding the $\{y_j\}$ fixed, the algorithm computes optimal $\{b_j\}$, which satisfy Eq. (5). Then the $\{b_j\}$ are held fixed, and new $\{y_j\}$ are computed according to Eq. (6). This process is repeated many times, alternating between the two optimizations. Since, for each minimization, the distortion is non-increasing, and since the overall distortion is bounded below by zero, the algorithm is guaranteed to converge monotonically to a solution satisfying both Eqs. (5) and (6).
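A minimal Python sketch of Method I is given below. It replaces the analytic pdf with a large empirical sample of the source, so the centroid of Eq. (6) becomes a sample mean (anticipating the training-set viewpoint of Section II.D); the function name and the quantile initialization are our own choices, not Lloyd's.

```python
import numpy as np

def lloyd_method1(samples, N, iters=50):
    """Alternate the two Lloyd-Max conditions on an empirical source."""
    y = np.quantile(samples, (np.arange(N) + 0.5) / N)  # arbitrary initial quanta
    for _ in range(iters):
        b = (y[:-1] + y[1:]) / 2.0          # Eq. (5): boundaries midway between quanta
        cells = np.digitize(samples, b)     # assign each sample to a partition cell
        for j in range(N):                  # Eq. (6): quanta become cell centroids
            members = samples[cells == j]
            if members.size:                # an empty cell keeps its old quantum
                y[j] = members.mean()
        y.sort()
    return y

rng = np.random.default_rng(0)
print(lloyd_method1(rng.normal(size=100_000), N=4))  # near the Lloyd-Max Gaussian quanta
```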

C. Single-Stage Vector Quantizers

The basic development of Lloyd and Max just reviewed was generalized by
Linde et al. (1980) to include the vector case. They also explored the use of
distortion measures more general than squared error. Their work is reviewed
in this section.
An n-dimensional single-stage vector quantizer of a random vector⁴ X with probability distribution function $F_X(\cdot)$ is a direct generalization of the scalar quantizer described above, and consists of the following: 1) a finite indexed subset $A = \{y_0, y_1, \ldots, y_{N-1}\}$ of $\mathbb{R}^n$, called a code book, where each $y_j \in A$ is a code vector; 2) a partition $\mathcal{P} = \{S_0, S_1, \ldots, S_{N-1}\}$ of $\mathbb{R}^n$, where the equivalence classes or cells $S_j$ of $\mathcal{P}$ satisfy $\bigcup_j S_j = \mathbb{R}^n$ and $S_j \cap S_k = \emptyset$ for $j \neq k$; and 3) a quantizer mapping $Q: \mathbb{R}^n \to A$ that defines the relationship between the code book and partition as $Q(x) = y_j$ if and only if $x \in S_j$. Specification of the triple $(A, Q, \mathcal{P})$ determines a vector quantizer.
Analogous to Eqs. (5) and (6), necessary conditions for minimum distortion of single-stage vector quantizers are that the $y_j \in A$ be centroids of their respective partition cells,

$$y_j = E\{X \mid X \in S_j\}, \qquad (8)$$

and that the partition cells be described by

$$x \in S_j \text{ if and only if } d(x, y_j) \leq d(x, y_k) \text{ for all } k. \qquad (9)$$


Any arbitrary tie-breaking rule may be used in the event of an equality.
Equation (9) implies that the partition cells have boundaries which are
nearest-neighbor with respect to adjacent code vectors. For obvious reasons,
Eqs. (8) and (9) are called generalized Lloyd-Max conditions.

⁴ Bold fonts are used for vectors, normal fonts for scalars.

D. A Design Algorithm for Single-Stage Vector Quantizers

As with scalar quantizers, the design of an arbitrary VQ is performed iteratively. In principle, Lloyd's Method I can be used without modification. In
practice, however, two very important problems arise. The first problem
concerns the description of the partition boundaries: the calculation of the
cell centroids requires an explicit representation of the cell boundaries in
order to integrate the source pdf over each partition cell. The explicit descrip-
tion of the cell boundaries of an arbitrary partition of $\mathbb{R}^n$ can be extremely
complicated. The second problem is that, for most natural information
sources, such as speech and imagery, analytical source pdf descriptions are
not available.
To circumvent these problems, Linde et al. (1980) proposed the use of a set
of statistically representative data called a training set to replace the use of an
analytic pdf. The training set can be partitioned implicitly by associating each
training set vector with its closest code vector $y_j$. After this partitioning-by-coding step, a new centroid can be computed for every cell $S_j$ by averaging over the set of training set vectors associated with $S_j$, and the process is
repeated until convergence. This algorithm is known as either the Linde-
Buzo-Gray (LBG) algorithm or as the Generalized Lloyd Algorithm (GLA).
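One plausible rendering of the LBG/GLA iteration just described is sketched below in Python (the random initialization from the training set and the fixed iteration count are our simplifications; an empty cell simply keeps its previous code vector):

```python
import numpy as np

def lbg(train, N, iters=30, seed=1):
    """LBG/GLA sketch: partition by coding, then recompute centroids."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), N, replace=False)].copy()
    for _ in range(iters):
        # Implicit partition: each training vector joins its nearest code vector.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Centroid update: average the training vectors associated with each cell.
        for j in range(N):
            members = train[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

train = np.random.default_rng(2).normal(size=(5000, 2))  # synthetic training set
codebook = lbg(train, N=16)
```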
The convergence properties of the LBG algorithm have been studied in
Sabin and Gray (1986), and Gray et al. (1980). In particular, Gray et al.
(1980) showed that if the source is block stationary and ergodic, and if the
training set size is unbounded, the LBG algorithm will produce the same
quantizer design as it would if run on the ‘true’ underlying distribution. They
also showed that these conclusions could be generalized to a broad class of
distortion measures. Their analysis holds for all distortion measures with the
following properties:
0 For any fixed x E W,d ( x , y ) is a convex function of y.
0 For any fixed x, if some sequence y(n) -+ co as n --f 00, then also
d ( x , y ( 4 ) -, a.
0 For any bounded sets B , , B, E Sn,S U ~ ~ . d(x, ~ , y, ) ~ <~ co,
~ ~that is,
d ( . , .) is locally bounded.
As with Lloyd’s Method I, the LBG algorithm can only be guaranteed to
find a stationary point of the VQ quantizer distortion function. Typically, the
variation in distortion produced at different local minima to the same
quantizer design problem is fairly small. As a result, the LBG algorithm has
been widely used in the design of quantizers for the past decade. Recently,
some research has been directed toward the use of alternative design
algorithms, such as simulated annealing, in an attempt to find globally
optimal solutions (Flanagan et al., 1989). This work seems to indicate that

the variation in distortion between designs obtained with the two algorithms is also small, at least at the low rates for which VQs are potentially practical.

FIGURE 2. Block diagram of a residual quantizer encoder.

III. RESIDUAL QUANTIZERS

In this section, RQs are defined and issues relevant to their optimization are discussed. A direct application of the Lloyd-Max analysis to the stagewise code vectors is not straightforward and probably accounts for the delay between the introduction of RQ by Juang and Gray (1982) and the analysis by Barnes (1989). The key to optimizing multistage quantizers is a description of the RQ in terms of an equivalent single-stage quantizer. A careful notation is required to maintain consistency between the two descriptions. This chapter introduces the use of the nested labeling notation used by Forney (1988) to describe coset decompositions of lattices. Once the notation is established, the optimization of the single-stage quantizer follows directly from the approach of Lloyd (1957) and Max (1960), reviewed in earlier sections.

A. Definition and Notation

Block diagrams of a two-stage RQ encoder and decoder are shown in Figs. 2 and 3.

FIGURE 3. Block diagram of a residual quantizer decoder.

Let $X^1$ be an n-dimensional random vector with probability

distribution function $F_{X^1}(\cdot)$. A P-stage residual quantizer of $X^1$ consists of a sequence of P quantizers $\{(A^p, Q^p, \mathcal{P}^p) : 1 \leq p \leq P\}$, ordered such that $(A^1, Q^1, \mathcal{P}^1)$ quantizes the source vector $x^1$ (a particular realization of $X^1$) and $(A^{p+1}, Q^{p+1}, \mathcal{P}^{p+1})$ quantizes the residual vector $x^{p+1} = x^p - Q^p(x^p)$ of the preceding stage $(A^p, Q^p, \mathcal{P}^p)$ for $1 \leq p < P$. Each residual vector $x^p \in \mathbb{R}^n$ is considered to be a realization of a random vector $X^p$ with distribution function $F_{X^p}(\cdot)$ on $\mathbb{R}^n$ for $1 \leq p \leq P + 1$. The collection of stagewise residual vectors $X^1, \ldots, X^P$ has joint distribution $F_{X^1 \cdots X^P}(\cdot)$.

The number of code vectors in $A^p$ and partition cells in $\mathcal{P}^p$ is denoted $N^p$. The code vectors comprising the code book $A^p$ and the cells comprising the partition $\mathcal{P}^p$ are indexed with the subscript $j^p$, where $j^p$ is a member of the pth index set $J^p = \{0, 1, \ldots, N^p - 1\}$. For example, $A^p = \{y^p_0, y^p_1, \ldots, y^p_{N^p-1}\}$ and $\mathcal{P}^p = \{S^p_0, S^p_1, \ldots, S^p_{N^p-1}\}$.

As before, the map $Q^p: \mathbb{R}^n \to A^p$ applied to the pth stage input $x^p$ yields $Q^p(x^p) = y^p_{j^p}$ if and only if $x^p \in S^p_{j^p}$. The vectors $x^p$ and the quantizer maps $Q^p(\cdot)$ are related according to

$$x^1 = \sum_{p=1}^{P} Q^p(x^p) + x^{P+1}, \qquad (10)$$

where $x^{P+1}$ is both the residual error of the last stage and the total residual error of all stages.
error of all stages.
In practice, each $Q^p(\cdot)$ is realized as a composition of an encoder mapping $\mathcal{E}^p(\cdot)$ and a decoder mapping $\mathcal{D}^p(\cdot)$, viz., $Q^p(x^p) = \mathcal{D}^p(\mathcal{E}^p(x^p))$. The pth encoder mapping $\mathcal{E}^p: \mathbb{R}^n \to J^p$ is defined as $\mathcal{E}^p(x^p) = j^p$ if and only if $x^p \in S^p_{j^p}$. For each source vector, the indexes produced by the sequence of encoder maps are concatenated to form an index P-tuple⁵ $\mathbf{j}^P = (j^1, j^2, \ldots, j^P)$. Each P-tuple is called a product code word and is an element of the Cartesian product of the stagewise index sets, $\mathbf{j}^P \in J^1 \times J^2 \times \cdots \times J^P$. The decoder maps $\mathcal{D}^p: J^p \to A^p$ recover from each stagewise index $j^p$ the corresponding code vector $y^p_{j^p}$. The quantized representation $\hat{x}^1$ of the input source vector $x^1$ is formed by the sum of the selected stagewise code vectors,

$$\hat{x}^1 = \sum_{p=1}^{P} y^p_{j^p}. \qquad (11)$$

⁵ The concatenated P-tuple index $\mathbf{j}^P$, written in boldface, is not to be confused with the Pth-stage index $j^P$.
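The stagewise maps translate directly into code. The sketch below (our own minimal Python rendering; each code book is an array of shape (N_p, n)) implements the sequential encoder of Fig. 2 and the summing decoder of Eq. (11); as Section IV.C will show, this greedy stage-by-stage search is in general suboptimal.

```python
import numpy as np

def rq_encode(x, codebooks):
    """Sequential residual encoding: at each stage pick the nearest quantum."""
    residual = x.copy()
    indices = []
    for A in codebooks:                      # stage p encoder
        j = int(((A - residual) ** 2).sum(axis=1).argmin())
        indices.append(j)
        residual = residual - A[j]           # x^(p+1) = x^p - Q^p(x^p), Eq. (10)
    return tuple(indices), residual

def rq_decode(indices, codebooks):
    """Decoder: sum the selected stagewise code vectors, Eq. (11)."""
    return sum(A[j] for j, A in zip(indices, codebooks))
```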

B. The Optimization Problem

Let $d(x^1, \hat{x}^1)$ denote the distortion that results from representing $x^1$ with $\hat{x}^1$, and let $D(X^1, \hat{X}^1) = E\{d(X^1, \hat{X}^1)\}$ denote the expected distortion, where $E(\cdot)$ is the expectation operator. A P-stage residual quantizer is said to be optimal
for the source distribution $F_{X^1}(\cdot)$ if it gives at least a locally minimal value of the average distortion. An optimal RQ has a set of code books $\{A^p\}$ and a set of partitions $\{\mathcal{P}^p\}$ which together minimize

$$D(X^1, \hat{X}^1) = \int d\left[x^1, \sum_{p=1}^{P} Q^p(x^p)\right] dF_{X^1 \cdots X^P}. \qquad (12)$$

This expression is difficult to evaluate since $F_{X^1 \cdots X^P}(\cdot)$ is generally not known. Moreover, Eq. (12) is unsuitable for a Lloyd-Max analysis since this joint distribution function is dependent upon the RQ code books and partitions. This difficulty may be circumvented by adopting the approach described in the next section.

C. Equivalent Quantizers

To simplify the minimization of Eq. (12), the notion of an equivalent single-stage quantizer was introduced by Barnes (1989). The residual and equivalent quantizers are identical in that they produce the same representation of the source output and therefore have the same average distortion. However, the equivalent quantizer has the advantage that the expected value of the RQ distortion can be expressed simply in terms of the known source distribution $F_{X^1}(\cdot)$ instead of the unknown joint distribution $F_{X^1 \cdots X^P}(\cdot)$.

An equivalent quantizer is specified by a triple $(A^e, Q^e, \mathcal{P}^e)$ consisting of an equivalent code book, equivalent mapping, and equivalent partition, respectively. These are now defined. The elements of $A^e$ are the elements of the set of all possible sums of stagewise code vectors, $A^e = A^1 + A^2 + \cdots + A^P$, one code vector from each stage. The set of these equivalent code vectors is the set of all possible RQ output quanta.⁶ Each equivalent code vector $y^e \in A^e$ can be indexed by the P-tuple $\mathbf{j}^P = (j^1, j^2, \ldots, j^P)$ introduced earlier,⁷ and can be written as

$$y^e(\mathbf{j}^P) = \sum_{p=1}^{P} y^p_{j^p}. \qquad (13)$$

Each $y^e(\cdot) \in A^e$ also represents a path through a tree structure that may be associated with the residual quantizer. Figure 4 represents the code vector tree of a scalar, three stage, binary (two code vectors per stage) RQ. The root node of the tree represents the origin of the coordinate system. The leaf nodes represent the set of equivalent quanta.⁸ The intermediate nodes represent

⁶ To avoid pathological cases, we assume each element of $A^e$ is a unique point in $\mathbb{R}^n$.
⁷ Whenever a P-tuple is used as an index to an equivalent component, it is written with functional notation, i.e., $y^e(\mathbf{j}^P)$. Stagewise components are indexed with subscript notation.
⁸ In Fig. 4, the equivalent quanta are written with decimal subscripts that correspond to the binary P-tuple $\mathbf{j}^P$. That is, $y^e(\mathbf{j}^P) = y^e_{j^e}$, where $j^e = \sum_{p=1}^{P} j^p 2^{p-1}$ and $j^p \in \{0, 1\}$.

FIGURE 4. Example of an unentangled three stage, two quanta per stage, scalar residual quantizer.

partial sums of the equivalent quanta. The branches represent stagewise quanta. At the first level, the partial sums and the stagewise quanta are identical. There are $N^e = \prod_{p=1}^{P} N^p$ equivalent code vectors in $A^e$.
The $\mathbf{j}^P$th equivalent cell of the RQ is the subset $S^e(\mathbf{j}^P) \subset \mathbb{R}^n$ such that all $x^1 \in S^e(\mathbf{j}^P)$ are mapped by the residual quantizer into $y^e(\mathbf{j}^P)$, that is,

$$S^e(\mathbf{j}^P) = \left\{ x^1 : \sum_{p=1}^{P} Q^p(x^p) = y^e(\mathbf{j}^P) \right\}. \qquad (14)$$

It is assumed that each $S^e(\mathbf{j}^P)$ has nonzero measure. The equivalent partition

$\mathcal{P}^e$ is simply the collection of all equivalent cells. Similarly, the equivalent mapping $Q^e: \mathbb{R}^n \to A^e$ is defined as $Q^e(x^1) = y^e(\mathbf{j}^P)$ if and only if $x^1 \in S^e(\mathbf{j}^P)$. The average distortion of the equivalent quantizer is

$$D(X^1, \hat{X}^1) = \int d[x^1, Q^e(x^1)]\,dF_{X^1}, \qquad (15)$$

and is given in terms of the known source distribution function $F_{X^1}(\cdot)$.

By construction, the equivalent single-stage quantizer produces the same representation as its corresponding multistage RQ, namely,

$$Q^e(x^1) = \sum_{p=1}^{P} Q^p(x^p). \qquad (16)$$

It follows that the expected distortion of the RQ, given by Eq. (12), and that of its equivalent quantizer, given by Eq. (15), must be equal. However, the latter expression is much easier to minimize.
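Because $A^e$ is a direct sum, it can be enumerated on demand rather than stored. A minimal sketch (the helper name `equivalent_codebook` is our own) that builds $A^e$ from the stagewise code books according to Eq. (13):

```python
import numpy as np
from itertools import product

def equivalent_codebook(codebooks):
    """Map each index P-tuple to its equivalent code vector, Eq. (13)."""
    index_sets = [range(len(A)) for A in codebooks]
    return {jP: sum(A[j] for j, A in zip(jP, codebooks))
            for jP in product(*index_sets)}

# A three-stage binary scalar RQ has 2 * 2 * 2 = 8 equivalent quanta,
# although only 2 + 2 + 2 = 6 stagewise quanta are stored.
A1 = np.array([[-2.0], [2.0]])
A2 = np.array([[-1.0], [1.0]])
A3 = np.array([[-0.5], [0.5]])
print(len(equivalent_codebook([A1, A2, A3])))  # 8
```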

IV. SCALAR RESIDUAL QUANTIZERS

We now consider the optimization of scalar RQs under the mean square metric. Let $X^1$ be a real random variable with probability density function $f_{X^1}(\cdot)$. The RQ design problem is to choose a set of code books $\{A^p\}$ and

FIGURE 5. The equivalence class $\mathcal{X}^1_0$ (indicated by thick lines) and $\mathcal{X}^1_1$ (indicated by thin lines) in an unentangled tree.

partitions $\{\mathcal{P}^p\}$ that together give at least a locally minimum value of

$$D_{\mathrm{MSE}} = \int_{-\infty}^{\infty} [x^1 - Q^e(x^1)]^2 f_{X^1}(x^1)\,dx^1. \qquad (17)$$

A. Optimum Stagewise Quanta

In this section, our strategy is to find $\{A^p\}$ directly by finding the $y^p_{j^p}$ that minimize Eq. (17), assuming that all stagewise partitions $\{\mathcal{P}^p\}$, and hence $\mathcal{P}^e$, are fixed. First, write Eq. (17) as

$$D_{\mathrm{MSE}} = \sum_{\text{all } \mathbf{j}^P} \int_{S^e(\mathbf{j}^P)} [x^1 - y^e(\mathbf{j}^P)]^2 f_{X^1}(x^1)\,dx^1. \qquad (18)$$

In addition to assuming a fixed $\mathcal{P}^e$, assume that all code books except for $A^p$, with $p \in \{1, \ldots, P\}$, are held fixed. To minimize $D_{\mathrm{MSE}}$ with respect to the $(k^p)$th code vector in $A^p$, set the partial derivative of Eq. (18) with respect to $y^p_{k^p}$ equal to zero to get

$$\frac{\partial D_{\mathrm{MSE}}}{\partial y^p_{k^p}} = 0. \qquad (19)$$

The partial derivative in Eq. (19) is

$$\frac{\partial D_{\mathrm{MSE}}}{\partial y^p_{k^p}} = -2 \sum_{\mathbf{j}^P \in H^p_{k^p}} \int_{S^e(\mathbf{j}^P)} [x^1 - y^e(\mathbf{j}^P)]\, f_{X^1}(x^1)\,dx^1, \qquad (20)$$

where $H^p_{k^p}$ is the set of all $\mathbf{j}^P$ such that the pth element of $\mathbf{j}^P = (j^1, \ldots, j^p, \ldots, j^P)$ is equal to $k^p$, i.e., $H^p_{k^p} = \{\mathbf{j}^P : j^p = k^p\}$. The set of P-tuple indexes $H^p_{k^p}$ corresponds to all equivalent quanta $y^e(\mathbf{j}^P)$ that contain $y^p_{k^p}$ in their construction. Corresponding to $H^p_{k^p}$ is a subset of the real line $\mathcal{X}^p_{k^p} = \bigcup_{\mathbf{j}^P \in H^p_{k^p}} S^e(\mathbf{j}^P)$. For example, Fig. 5 has emphasized with heavy lines the equivalence class $\mathcal{X}^1_0$, the union of all equivalent cells $S^e(\mathbf{j}^P)$ whose corresponding $y^e(\mathbf{j}^P)$ use $y^1_0$ in their construction.

Substituting Eq. (20) into Eq. (19) and solving for $y^p_{k^p}$ gives the desired result

$$y^p_{k^p} = \frac{\displaystyle\sum_{\mathbf{j}^P \in H^p_{k^p}} \int_{S^e(\mathbf{j}^P)} \Big[ x^1 - \sum_{\substack{\rho=1 \\ \rho \neq p}}^{P} y^\rho_{j^\rho} \Big] f_{X^1}(x^1)\,dx^1}{\displaystyle\sum_{\mathbf{j}^P \in H^p_{k^p}} \int_{S^e(\mathbf{j}^P)} f_{X^1}(x^1)\,dx^1}. \qquad (21)$$

Although the meaning of Eq. (21) is self-evident, it is possible to give a more satisfying interpretation of this result as the conditional mean of a particular residual random variable. The development is somewhat involved, but it provides a more complete understanding of optimal residual quantization.

Note that the expression

$$\sum_{\substack{\rho=1 \\ \rho \neq p}}^{P} y^\rho_{j^\rho} \qquad (22)$$

contained in Eq. (21) differs from the construction of the $(\mathbf{j}^P)$th equivalent quantum $y^e(\mathbf{j}^P)$ in that the pth node of the $(\mathbf{j}^P)$th path through the RQ tree is removed. Since $p \in \{1, 2, \ldots, P\}$, the removed node is not necessarily at the end of the path through the RQ tree and hence is not a simple "pruning" of the tree. Instead, Eq. (22) corresponds to the $(\mathbf{j}^P)$th path in the RQ tree with the pth node removed and the two remaining portions of the path "grafted" back together. If we define for each possible $y^e(\mathbf{j}^P)$ the corresponding pth grafted branch as

$$g^p(\mathbf{j}^P) = \sum_{\substack{\rho=1 \\ \rho \neq p}}^{P} y^\rho_{j^\rho}, \qquad (23)$$

then Eq. (21) may be written as

$$y^p_{k^p} = \frac{\displaystyle\sum_{\mathbf{j}^P \in H^p_{k^p}} \int_{S^e(\mathbf{j}^P)} [x^1 - g^p(\mathbf{j}^P)]\, f_{X^1}(x^1)\,dx^1}{\displaystyle\sum_{\mathbf{j}^P \in H^p_{k^p}} \int_{S^e(\mathbf{j}^P)} f_{X^1}(x^1)\,dx^1} \qquad (24)$$

$$= \frac{\displaystyle\int_{-\infty}^{\infty} \sum_{\mathbf{j}^P \in H^p_{k^p}} [x^1 - g^p(\mathbf{j}^P)]\, I_{S^e(\mathbf{j}^P)}(x^1)\, f_{X^1}(x^1)\,dx^1}{\displaystyle\int_{-\infty}^{\infty} \sum_{\mathbf{j}^P \in H^p_{k^p}} I_{S^e(\mathbf{j}^P)}(x^1)\, f_{X^1}(x^1)\,dx^1}, \qquad (25)$$

where the indicator function $I_{S^e(\mathbf{j}^P)}$ is defined as

$$I_{S^e(\mathbf{j}^P)}(x^1) = \begin{cases} 1 & \text{if } x^1 \in S^e(\mathbf{j}^P), \\ 0 & \text{otherwise}, \end{cases} \qquad (26)$$

and the order of summation and integration in Eq. (21) has been interchanged. Now define for all $\mathbf{j}^P$ and for all $x^1 \in S^e(\mathbf{j}^P)$ the quantity

$$\xi^p = x^1 - g^p(\mathbf{j}^P). \qquad (27)$$

Since $\xi^p$ is the residual that results when the corresponding pth grafted branch is subtracted from $x^1$, $\xi^p$ is called the pth graft residual. Because it is a translation of the realization $x^1$ of the random variable $X^1$, $\xi^p$ is also a realization of a random variable $\Xi^p$ with associated pdf $f_{\Xi^p}(\cdot)$. The optimal value of $y^p_{k^p}$ will now be shown to be a certain conditional mean of $\Xi^p$.
Notice that the map from the $x^1 \in \mathbb{R}$ to the $\xi^p \in \mathbb{R}$ is many-to-one and into. That is, for any given $\xi^p$ there may be many different values of $x^1$ (including no value), each in a different $S^e(\mathbf{j}^P)$, that yield the same value of $\xi^p$. To account properly for the effects of this many-to-one map in the following development, define the pth graft residual cell $G^p(\mathbf{j}^P) = \{\xi^p : \xi^p = x^1 - g^p(\mathbf{j}^P),\ x^1 \in S^e(\mathbf{j}^P)\}$. $G^p(\mathbf{j}^P)$ contains all graft residuals $\xi^p$ formed from the $x^1 \in S^e(\mathbf{j}^P)$. Associate with each $G^p(\mathbf{j}^P)$ an indicator function

$$I_{G^p(\mathbf{j}^P)}(\xi^p) = \begin{cases} 1 & \text{if } \xi^p \in G^p(\mathbf{j}^P), \\ 0 & \text{otherwise}. \end{cases} \qquad (28)$$

Since $x^1 \in S^e(\mathbf{j}^P) \Leftrightarrow \xi^p \in G^p(\mathbf{j}^P)$, changing the variable of integration in Eq. (25) gives

$$y^p_{k^p} = \frac{\displaystyle\int_{-\infty}^{\infty} \xi^p \sum_{\mathbf{j}^P \in H^p_{k^p}} I_{G^p(\mathbf{j}^P)}(\xi^p)\, f_{X^1}[g^p(\mathbf{j}^P) + \xi^p]\,d\xi^p}{\displaystyle\int_{-\infty}^{\infty} \sum_{\mathbf{j}^P \in H^p_{k^p}} I_{G^p(\mathbf{j}^P)}(\xi^p)\, f_{X^1}[g^p(\mathbf{j}^P) + \xi^p]\,d\xi^p}. \qquad (29)$$
The form of Eq. (29) can be simplified by equating the common sum in the numerator and denominator to the pdf of $\Xi^p$, conditioned upon the use of $y^p_{k^p}$. Proceeding, expand the pdf $f_{X^1}[g^p(\mathbf{j}^P) + \xi^p]$ as a sum of conditional pdfs

$$f_{X^1}[g^p(\mathbf{j}^P) + \xi^p] = \sum_{k^p=0}^{N^p-1} f_{X^1}[g^p(\mathbf{j}^P) + \xi^p \mid x^1 \in \mathcal{X}^p_{k^p}]\,\mathrm{Prob}(x^1 \in \mathcal{X}^p_{k^p}). \qquad (30)$$

This expansion is a simple application of the law of total probability, since for p fixed the $\mathcal{X}^p_{k^p}$ are disjoint and the collection $\{\mathcal{X}^p_0, \mathcal{X}^p_1, \ldots, \mathcal{X}^p_{N^p-1}\}$ forms a partition of $\mathbb{R}$. Now notice that the conditioning over $x^1$ in Eq. (30) can also be expressed in terms of the pth causal residual $x^p = x^1 - \sum_{\rho=1}^{p-1} Q^\rho(x^\rho)$, the residual formed from the encodings of all stages prior to the pth stage. This is true since, by the definitions of the equivalent partition and the pth stage partition, the sets $\{x^1 : x^1 \in \mathcal{X}^p_{k^p}\}$ and $\{x^1 : x^p \in S^p_{k^p}\}$ are equal for

$k^p = 0, 1, \ldots, N^p - 1$. Hence, Eq. (30) can be written as

$$f_{X^1}[g^p(\mathbf{j}^P) + \xi^p] = \sum_{k^p=0}^{N^p-1} f_{X^1}[g^p(\mathbf{j}^P) + \xi^p \mid x^p \in S^p_{k^p}]\,\mathrm{Prob}(x^p \in S^p_{k^p}). \qquad (31)$$

Now, substitute Eq. (31) into the sum common to the numerator and denominator of Eq. (29) and reduce the resulting double sum to obtain

$$\sum_{\mathbf{j}^P \in H^p_{k^p}} I_{G^p(\mathbf{j}^P)}(\xi^p)\, f_{X^1}[g^p(\mathbf{j}^P) + \xi^p] = f_{\Xi^p}(\xi^p \mid x^p \in S^p_{k^p})\,\mathrm{Prob}(x^p \in S^p_{k^p}). \qquad (34)$$

Combining Eqs. (34) and (29) gives

$$y^p_{k^p} = \int_{-\infty}^{\infty} \xi^p f_{\Xi^p}(\xi^p \mid x^p \in S^p_{k^p})\,d\xi^p, \qquad (35)$$

which is valid for $1 \leq p \leq P$ and $0 \leq k^p < N^p$. Thus, assuming fixed $\mathcal{P}^e$ and $A^\rho$ with $\rho \neq p$, each optimal quantum $y^p_{k^p}$ is the conditional mean of the pth graft residual $\Xi^p$ given $x^p \in S^p_{k^p}$.
The preceding development may be made clearer with the aid of an illustration. Figure 6 illustrates the relationships between $\mathcal{X}^2_0$, $H^2_0$, and $y^2_0$. On the left, the two branches of the tree representing the stagewise quantum $y^2_0$ are marked by heavy lines. Also marked in heavy lines are the $S^e(\mathbf{j}^P)$ in the equivalence class $\mathcal{X}^2_0$ associated with $y^2_0$. On the right, the $S^e(\mathbf{j}^P)$ of $H^2_0$ have been translated by the appropriate grafted branches $g^2(\mathbf{j}^P)$ of the corresponding grafted tree structure (also illustrated). After translation, the portions of $f_{X^1}(\cdot)$ that have support on these constituent $S^e(\mathbf{j}^P)$ are summed and normalized in accordance with Eq. (34). The optimal value of $y^2_0$ is the mean of the resulting conditional pdf.

FIGURE 6. The equivalence class $\mathcal{X}^2_0$ before and after translation of the constituent $S^e$ by the appropriate graft residuals.

The form of Eq. (35) has an attractive intuitive interpretation. We know that to minimize the distortion of a Lloyd-Max quantizer, each code vector $y_j$ must be the centroid of its corresponding partition cell $S_j$. If the quantizer output values are unconstrained, this condition can always be satisfied. However, in a P-stage RQ the equivalent code vectors are constrained by the residual quantizer tree structure. Therefore, it is generally not possible to choose each $y^e(\mathbf{j}^P)$ as the centroid of its associated partition cell $S^e(\mathbf{j}^P)$. The best that can be done is to choose each stagewise code vector $y^p_{k^p}$ such that the expected distortion is minimized for all $x^1 \in \mathcal{X}^p_{k^p}$.

Also, note that in conventional RQ the $y^p_{k^p}$ are centroids of sets of residuals formed from the encoding decisions of all prior stages (Juang and Gray, 1982). In contrast, optimal $y^p_{k^p}$ are centroids of residuals formed from the encoding decisions of all prior and subsequent stages. In this sense, each stage is aware of the encoding capabilities of all other RQ stages and adjusts its quanta accordingly. This may minimize the loss of information due to pooling residuals, which was conjectured by Makhoul et al. (1985) to be a major cause of high distortion in RQs.
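In empirical (training-set) form, the conditional mean of Eq. (35) reduces to averaging graft residuals. The sketch below is our own scalar simplification, not the authors' algorithm: exhaustive search over $A^e$ plays the role of the equivalent partition, and the grafted branch is obtained as $g^p(\mathbf{j}^P) = y^e(\mathbf{j}^P) - y^p_{j^p}$.

```python
import numpy as np
from itertools import product

def joint_stage_update(train, codebooks, p):
    """One stage-p code book update in the spirit of Eqs. (21) and (35).

    train: 1-D array of scalar samples; codebooks: list of 1-D arrays.
    """
    tuples = list(product(*[range(len(A)) for A in codebooks]))
    Ae = np.array([sum(A[j] for j, A in zip(jP, codebooks)) for jP in tuples])
    win = np.abs(train[:, None] - Ae[None, :]).argmin(axis=1)  # exhaustive search
    jp = np.array([tuples[w][p] for w in win])                 # pth index of each winner
    graft = Ae[win] - codebooks[p][jp]                         # g^p(j) for each sample
    new_Ap = codebooks[p].copy()
    for k in range(len(new_Ap)):
        mask = jp == k
        if mask.any():
            new_Ap[k] = (train[mask] - graft[mask]).mean()     # conditional mean, Eq. (35)
    return new_Ap
```

Cycling p over the stages and repeating until the distortion stops decreasing gives a joint design loop in the spirit of Method I; the authors' actual design algorithm is presented in Section VII.A.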

B. Optimum Stagewise Partitions

An optimum sequence of partitions $\{\mathcal{P}^1, \mathcal{P}^2, \ldots, \mathcal{P}^P\}$ must give at least a locally minimum value of Eq. (17) with respect to a set of fixed stagewise code books $(A^1, A^2, \ldots, A^P)$ or, equivalently, a fixed $A^e$. However, there is a fundamental asymmetry between the problems of finding optimal stagewise partitions and of finding optimal stagewise quanta just discussed. In the quanta design problem, Eq. (13) defines explicitly the relationship between equivalent quanta and stagewise quanta. This relationship makes it possible to optimize the equivalent quanta by direct optimization of the stagewise quanta. No such explicit relationship exists between the equivalent partition and the stagewise partitions. Therefore, we cannot optimize the equivalent

partition boundaries by direct optimization of the appropriate stagewise partitions. Instead, we are forced to take a more indirect route in our pursuit of optimum partitions.

We consider the following three questions:
1. What unconstrained $\mathcal{P}^e$ minimizes Eq. (17) with respect to a fixed $A^e$?
2. What sequence of stagewise partitions $\{\mathcal{P}^1, \mathcal{P}^2, \ldots, \mathcal{P}^P\}$ realizes an optimal $\mathcal{P}^e$?
3. If such a $\mathcal{P}^e$ is implemented by a sequence of stagewise partitions $\{\mathcal{P}^1, \mathcal{P}^2, \ldots, \mathcal{P}^P\}$, under what conditions, if any, is the complexity of encoding reduced?
The answer to the first question is straightforward, given our understanding of single-stage quantizers. For any equivalent map $Q^e(\cdot)$, the expected distortion satisfies

$$D(X^1, \hat{X}^1) \geq \int \min_{\mathbf{j}^P} d(x^1, y^e(\mathbf{j}^P))\,dF_{X^1}. \qquad (36)$$

No partition $\mathcal{P}^e$ can yield a lower average distortion than the partition that maps each $x^1$ into the $y^e(\mathbf{j}^P) \in A^e$ that minimizes $d(x^1, \cdot)$. Hence, the optimal partition is defined by

$$x^1 \in S^e(\mathbf{j}^P) \text{ if and only if } d(x^1, y^e(\mathbf{j}^P)) \leq d(x^1, y^e(\mathbf{k}^P)) \text{ for all } \mathbf{k}^P, \qquad (37)$$

where any arbitrary rule may be used in the event of a tie. This is the unsurprising result that the best $\mathcal{P}^e$ for a fixed $A^e$ is a nearest-neighbor or Voronoi partition (Gersho, 1979).
To answer the second question, we wish to describe a sequence of stagewise partitions $\{\mathcal{P}^1, \mathcal{P}^2, \ldots, \mathcal{P}^P\}$ that will realize the optimal equivalent partition $\mathcal{P}^e$ described by Eq. (37). Proceeding, first extend the definition of the distance metric as

$$d(x, A) = \min_{y \in A} d(x, y), \qquad (38)$$

so that $d(\cdot, \cdot)$ can be used to indicate the distance between a point and a set. Equation (37) can then be written

$$x^1 \in S^e(\mathbf{j}^P) \text{ if and only if } d(x^1, y^1_{j^1} + y^2_{j^2} + \cdots + y^P_{j^P}) \leq d(x^1, A^1 + A^2 + \cdots + A^P). \qquad (39)$$

Assuming that $x^1 \in S^e(\mathbf{j}^P)$, it is clear that optimal first stage partition cells $S^1_{j^1}$ must satisfy

$$x^1 \in S^1_{j^1} \text{ if and only if } d(x^1, y^1_{j^1} + A^2 + \cdots + A^P) \leq d(x^1, A^1 + A^2 + \cdots + A^P). \qquad (40)$$

In other words, Eq. (40) requires $S^1_{j^1}$ to be the subset of $\mathbb{R}$ that is nearest
neighbor with respect to the terminating nodes of the subtree that originates at $y^1_{j^1}$.

FIGURE 7. The translations of the subtrees of $A^1 + A^2 + A^3$ to form the smaller tree $A^2 + A^3$.
A similar construction can be repeated to form the optimal equivalence classes $S^p_{j^p}$ of the remaining stages. For example, when the residual $x^2$ is formed by the difference $x^2 = x^1 - Q^1(x^1)$, the code vector tree $A^1 + A^2 + \cdots + A^P$ is modified by subtracting the first non-zero component of each path to yield a smaller subtree $A^2 + A^3 + \cdots + A^P$. The formation of this smaller tree is illustrated by Figs. 7 and 8. In Fig. 7, the difference $x^2 = x^1 - Q^1(x^1)$ causes a translation of each of the subtrees corresponding to nodes in the first nonzero layer of $A^1 + A^2 + A^3$. In Fig. 8, each of the translated subtrees of Fig. 7 superimposes to form the smaller tree $A^2 + A^3$, where the root node of the smaller tree occurs at the origin of the residual $x^2$. Assuming $d(\cdot, \cdot)$ is translation invariant in that $d(x, y) = d(x - z, y - z)$ for $x, y, z \in \mathbb{R}$, the procedure used to determine an optimal $\mathcal{P}^1$ can be recursively utilized to determine an optimal $\mathcal{P}^p$ for any $p \in \{1, 2, \ldots, P\}$ to yield the rule

$$x^p \in S^p_{j^p} \text{ if and only if } d(x^p, y^p_{j^p} + A^{p+1} + \cdots + A^P) \leq d(x^p, A^p + A^{p+1} + \cdots + A^P), \qquad (41)$$

where $x^p = x^1 - \sum_{\rho=1}^{p-1} Q^\rho(x^\rho)$.

In other words, a stagewise partition $\mathcal{P}^p$ is optimal if and only if it corresponds to a union $\mathcal{X}^p_{j^p}$ of equivalent cells $S^e(\cdot)$ where the $S^e(\cdot) \in \mathcal{X}^p_{j^p}$ are nearest neighbor with respect to the corresponding equivalent quanta. Again, an illustration will clarify the relationship between optimum $S^p_{j^p}$ and optimum $S^e(\mathbf{j}^P)$. Like Fig. 4, Fig. 9 represents a three-stage scalar residual quantizer.
I

FIGURE 8. The tree A^2 + A^3.



FIGURE 9. Example of a partially entangled three-stage, two quanta per stage, scalar residual quantizer.

However, this tree has crossing branches. Now, compare Figs. 5 and 10. In both figures, the subset of ℝ indicated by thick lines represents 𝒳^1_0. Because 𝒳^1_0 is also the nearest neighbor set (the union of four nearest neighbor equivalent quantizer cells) with respect to the subtree that originates at y^1_0, it is identical to the optimal stagewise partition cell S^1_0. The same is true for the subset 𝒳^1_1 of ℝ indicated by the thin lines, the quantum y^1_1, and the partition cell S^1_1.
To answer the third question, observe that the stagewise equivalence class S^1_0 in Fig. 5 is a connected interval of ℝ and may be distinguished from the equivalence class S^1_1 by a single boundary point. In contrast, the equivalence class S^1_0 in Fig. 10 is not a single connected interval. It is a union of three disjoint line segments and requires five boundaries to distinguish it from S^1_1. We say that Fig. 5 represents an unentangled tree and Fig. 10 represents an entangled tree. If the optimal encoding rule Eq. (40) were to be implemented for the entangled tree by an optimal stagewise encoder, five tests would be required at the first stage alone to determine whether x^1 lay in S^1_0 or in S^1_1. For a completely entangled tree, a single optimal stagewise partition can have complexity as high as that of S^e. In general, it seems to be most economical to implement an optimal S^e with a single exhaustive search encoder. We call such RQs exhaustive search residual quantizers (ESRQs).

FIGURE 10. The equivalence classes 𝒳^1_0 (indicated by thick lines) and 𝒳^1_1 (indicated by thin lines) in an entangled tree.

FIGURE 11. The relationship between the successive partitions of a tree-structured encoder and the optimal equivalent partitions.

C. Tree-Structured Stagewise Partitions

The RQ encoder implied by Fig. 2, which encodes each source sample x^1 sequentially by finding at the pth stage the nearest quantum y^p_{j^p} to the residual x^p, is not an optimal encoder. It is a suboptimal tree-structured encoder and is not guaranteed to find the nearest equivalent quantum even though it finds the nearest stagewise quantum at every stage. Of course, it is not necessary for the encoder to be optimal to be useful. However, the encoder should at least have access to all equivalent quanta as possible encoding decision candidates, and for any source sample should always select a nearby equivalent quantum. Surprisingly, the residual encoder of Fig. 2 cannot do this; in general, a residual encoder may be "blind" to many of the equivalent quanta and will never choose them.
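A minimal sketch may make the contrast concrete. The following Python fragment is our own illustration (not code from the chapter), assuming squared-error distortion: it implements both the greedy residual encoder of Fig. 2 and the exhaustive search over the direct-sum code book, and the greedy path can terminate at an equivalent quantum far from the one the exhaustive search finds.

    import numpy as np
    from itertools import product

    def residual_encode(x, stages):
        """Greedy tree-structured RQ encoder: at each stage pick the
        stagewise code vector nearest the current residual. This is the
        suboptimal search discussed in the text."""
        residual = x.copy()
        indices = []
        for A in stages:                  # A: (N_p, n) stagewise code book
            j = int(np.argmin(np.sum((A - residual) ** 2, axis=1)))
            indices.append(j)
            residual = residual - A[j]
        return indices

    def exhaustive_encode(x, stages):
        """Exhaustive-search RQ encoder over the full direct-sum code book."""
        best, best_d = None, np.inf
        for combo in product(*[range(len(A)) for A in stages]):
            y = sum(A[j] for A, j in zip(stages, combo))
            d = float(np.sum((x - y) ** 2))
            if d < best_d:
                best, best_d = combo, d
        return list(best)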
This can be demonstrated with an illustration. Figure 11 represents the same entangled scalar decoder tree seen earlier. The partition cells induced by a binary tree-structured RQ encoder are denoted as 𝒯^p_{j^p}. The cells 𝒯^1_0 and 𝒯^1_1 are shown near the top of the tree as a heavy and a light horizontal bar, respectively. The optimal stagewise partitions S^1_0 and S^1_1 are marked in the same manner at the base of the tree. Notice that one of the equivalent quanta does not lie in 𝒯^1_0, even though it uses y^1_0 in its construction. Similarly, another does not lie in 𝒯^1_1. Therefore, the tree-structured encoder of Fig. 11 will never use either of these equivalent quanta to represent any source sample in the stagewise partition cells 𝒯^1_0 and 𝒯^1_1, respectively.
The inability of the residual encoder to access all of the equivalent quanta via sequential encoding decisions can be a cause of serious inefficiency in RQ. By design, the residual encoder transmits for every source sample a code word j^e large enough to index every equivalent quantum. By reserving indices for equivalent quanta that can never be used, the residual encoder inflates the operating rate of the RQ above the true or effective rate.

[Plot: SQNR (dB) vs. rate (bits/sample) = number of binary stages; curves for D(R), the nonuniform and uniform Lloyd-Max quantizers, the exhaustively searched scalar RQ, and the JG scalar RQ.]

FIGURE 12. SQNR vs. rate performances of various scalar quantizers on the memoryless Gaussian source.

The importance of this effect can be illustrated quite clearly by examining the distortion behavior of tree-searched RQs as the rate is increased.
For example, scalar RQs were designed for two memoryless sources: a zero-mean unit-variance Gaussian N(0, 1) source and a zero-mean unit-variance Laplacian L(0, 1) source. The distortion vs. rate curves of these RQs are illustrated in Figs. 12 and 13. Each RQ has only two quanta per stage. Therefore the abscissa, which represents the rate R in bits per sample, also represents the number of stages in each RQ. The ordinate represents the distortion D_MSE in terms of the signal-to-quantization noise ratio (SQNR), defined as

    SQNR = 10 log_10 [ ( ∫_{−∞}^{∞} x² f(x) dx ) / D_MSE ]  dB.   (42)
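In practice, Eq. (42) is evaluated empirically by replacing the integral with a sample average. A small Python sketch of that estimate (our own illustration, with variable names of our choosing):

    import numpy as np

    def sqnr_db(source_samples, reconstructed):
        """Empirical SQNR of Eq. (42): signal power over mean squared
        error, in dB. For a zero-mean source the numerator integral
        is the source variance."""
        signal_power = np.mean(source_samples ** 2)
        mse = np.mean((source_samples - reconstructed) ** 2)
        return 10.0 * np.log10(signal_power / mse)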

Each figure compares the SQNR of an optimized, exhaustively searched RQ, a tree-searched (JG) RQ designed as in Juang and Gray (1982), and an

[Plot: SQNR (dB) vs. rate (bits/sample) = number of binary stages.]

FIGURE 13. SQNR vs. rate performances of various scalar quantizers on the memoryless Laplacian source.

optimal nonuniform step-size Lloyd-Max quantizer. Also shown are D(R) and the performance of the optimal uniform step-size Lloyd-Max quantizer. In each case, the SQNR of the exhaustively searched scalar RQ lies between that of the nonuniform Lloyd-Max quantizer and the uniform Lloyd-Max quantizer. Note that these binary RQs are the most memory efficient and hence the most highly constrained of all scalar RQs with the same number of equivalent quanta. The SQNR of RQs with the same N^e but fewer stages, and hence more quanta per stage, would approach more closely the SQNR of the nonuniform Lloyd-Max quantizer. These results seem to suggest that a properly designed and searched RQ will achieve distortion between that of an optimal unconstrained quantizer and a uniform step-size or lattice quantizer.
In contrast, the SQNR of the tree-searched RQ with suboptimum quanta rapidly approaches an asymptote as R increases. Increasing the code book size, and therefore the apparent rate of the RQ, by adding more stages has little effect on distortion. This very clearly demonstrates the inefficiency of the residual encoder (together with suboptimal stagewise quanta) as previously described. In our experience, the inefficiency of residual encoding becomes more severe as the bit rate is increased by adding stages. Even at low rates, some inefficiency is frequently present if the number of stages is greater than two or three. We believe that the inefficiency of the residual encoder, as much as or more than the averaging of residual code vectors, explains the disappointing distortion results reported for RQ by other researchers (Makhoul et al., 1985).

V. VECTOR RESIDUAL QUANTIZERS

In this section, the necessary conditions for the optimality of the stagewise quanta, stagewise partitions, and equivalent partitions developed in Section IV for scalar RQs are extended to vector RQs. This generalization follows directly from the work of Linde et al. (1980). However, for vector code books it will be seen that residual encoding can be inefficient even when the RQ code vector tree is not entangled. This is demonstrated with some illustrations of two-dimensional RQ equivalent code books.

A. Optimum Stagewise Code Vectors

To derive a vector generalization of Eq. (35), assume the partitions {S^1, S^2, ..., S^P} are fixed and seek stagewise code books {A^1, A^2, ..., A^P} that minimize

    D(x^1, x̂^1) = E_{x^1}{d[x^1, Q(x^1)]}.   (43)

Since the distortion measure is position invariant, subtract the pth grafted branch g^p(j^e) = Σ_{ρ=1, ρ≠p}^{P} y^ρ_{j^ρ} from both x^1 and y^e(j^e) in Eq. (43) to obtain

    D(x^1, x̂^1) = Σ_{all j^e} E_{x^1|x^1∈S^e(j^e)}{d(x^1 − g^p(j^e), y^p_{j^p}) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)).   (44)

Next, partition the indexes j^e into the familiar subsets H^p_{k^p} for 1 ≤ k^p ≤ N^p. Thus, if j^e ∈ H^p_{k^p}, then y^p_{j^p} = y^p_{k^p}, and

    D(x^1, x̂^1) = Σ_{k^p∈J^p} Σ_{j^e∈H^p_{k^p}} E_{x^1|x^1∈S^e(j^e)}{d(x^1 − g^p(j^e), y^p_{k^p}) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)).   (45)


For all j^e, define for the x^1 ∈ S^e(j^e) the pth graft residual ε^p = x^1 − g^p(j^e), which is considered to be a realization of a random variable with conditional pdf f_{ε^p}(· | x^1 ∈ S^e(j^e)). Substitute ε^p into Eq. (45) and make the appropriate change of variables to obtain

    D(x^1, x̂^1) = Σ_{k^p∈J^p} Σ_{j^e∈H^p_{k^p}} E_{ε^p|x^1∈S^e(j^e)}{d(ε^p, y^p_{k^p}) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)).   (46)

Since S^e(j^e) ∩ S^e(k^e) = ∅ for j^e ≠ k^e,

    D(x^1, x̂^1) = Σ_{k^p∈J^p} E_{ε^p|x^1∈𝒳^p_{k^p}}{d(ε^p, y^p_{k^p}) | x^1 ∈ 𝒳^p_{k^p}} Prob(x^1 ∈ 𝒳^p_{k^p}).   (47)

By definition, the sets {x^1 : x^1 ∈ 𝒳^p_{k^p}} and the sets {x^1 : x^p ∈ S^p_{k^p}} are identical for k^p = 1, 2, ..., N^p, where x^p = x^1 − Σ_{ρ=1}^{p−1} Q^ρ(x^ρ) is the pth causal residual of the RQ. Hence Eq. (47) gives

    D(x^1, x̂^1) = Σ_{k^p∈J^p} E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, y^p_{k^p}) | x^p ∈ S^p_{k^p}} Prob(x^p ∈ S^p_{k^p}).   (48)

If the code vectors y^p_{k^p} ∈ A^p are allowed to vary while all partitions {S^ρ} and all code books {A^ρ} with ρ ≠ p are held fixed, observe that

    D(x^1, x̂^1) ≥ Σ_{k^p∈J^p} inf_{u∈ℝⁿ} E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, u) | x^p ∈ S^p_{k^p}} Prob(x^p ∈ S^p_{k^p}).   (50)

Clearly, if points y^p_{k^p} exist in ℝⁿ such that

    E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, y^p_{k^p}) | x^p ∈ S^p_{k^p}} = inf_{u∈ℝⁿ} E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, u) | x^p ∈ S^p_{k^p}},   (51)

no other pth stage code book can yield a smaller average distortion (with the other RQ stages arbitrarily fixed). In Barnes and Frost (1990), it is proved that there exist points y^p_{k^p} that satisfy Eq. (51) if for all j^e ∈ H^p_{k^p} the sets S^e(j^e) have nonzero measure.
In conclusion, for an RQ to give minimum average distortion the code vectors y^p_{k^p} must satisfy

    E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, y^p_{k^p}) | x^p ∈ S^p_{k^p}} = min_{u∈ℝⁿ} E_{ε^p|x^p∈S^p_{k^p}}{d(ε^p, u) | x^p ∈ S^p_{k^p}}   (52)
for 1 ≤ p ≤ P and 1 ≤ k^p ≤ N^p. The conditional density f_{ε^p|x^p∈S^p_{k^p}}(·) used in Eqs. (46)-(52) is determined from the source density f_{x^1}(·) as

    f_{ε^p|x^p∈S^p_{k^p}}(ε) = (1 / Prob(x^p ∈ S^p_{k^p})) Σ_{j^e∈H^p_{k^p}} f_{x^1}(ε + g^p(j^e)),  ε + g^p(j^e) ∈ S^e(j^e).   (53)
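For the squared-error measure, the minimizing point u in Eq. (52) is the conditional mean of the graft residual, so the condition reduces to a centroid computation. A sketch of a single stage update under that assumption (illustrative code of ours; indices[t] is assumed to hold the stagewise indices already assigned to training vector t):

    import numpy as np

    def update_stage_codebook(train, indices, stages, p):
        """Replace stage-p code vectors by graft-residual centroids
        (Eq. (52) with squared-error distortion, for which the minimizing
        u is a conditional mean)."""
        Np = stages[p].shape[0]
        new_A = stages[p].copy()
        for k in range(Np):
            members = [t for t in range(len(train)) if indices[t][p] == k]
            if not members:
                continue                       # empty cell: keep old vector
            grafts = [train[t] - sum(stages[q][indices[t][q]]
                                     for q in range(len(stages)) if q != p)
                      for t in members]
            new_A[k] = np.mean(grafts, axis=0)  # graft-residual centroid
        return new_A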

B. Optimum Stagewise Partitions

The structure of the optimal equivalent and the optimal stagewise partitions for a vector RQ follows directly from the results obtained for the scalar case. For completeness the definitions are repeated here with appropriate modifications to the notation. The optimal equivalent partition is defined by

    x^1 ∈ S^e(j^e)  if and only if  d(x^1, y^e(j^e)) ≤ d(x^1, y^e(k^e))  for all k^e,   (54)

where some arbitrary rule is used in the event of a tie. Likewise, the optimal stagewise partitions are given by

    x^p ∈ S^p_{j^p}  if and only if  d(x^p, y^p_{j^p} + A^{p+1} + ⋯ + A^P) ≤ d(x^p, A^p + A^{p+1} + ⋯ + A^P).   (55)

C. Tree-Structured Stagewise Partitions

In Section IV, an RQ was defined to be entangled if its optimal stagewise partitions were not simply connected regions. We saw that, for a scalar RQ, entanglement was sufficient to cause equivalent vectors to be inaccessible by a residual encoder. For a vector RQ, an even weaker condition may produce this inaccessibility. In multidimensional spaces, if the code vector tree structure is such that the optimal stagewise cells S^p_{j^p} are not convex, then a tree-structured encoder may not have access to every equivalent code vector, even though the cells may be simply connected. This is easily illustrated by examining several example code vector constellations of two-dimensional RQ code books. Constellations from an ESVQ and two different RQs are shown in Figs. 14-16. All of these code books were designed using a training set of 40,000 random vectors generated by a memoryless Gaussian source. The plots are all on the same scale, but are unitless, since identical (except for scale) code structures will result as the variance of the source is changed. In each case, the rate is four bits per sample.
Figure 14 illustrates a conventional ESVQ code book. As expected, it produces the highest SQNR of the three, but has no structure or pattern in the code vector locations. Figure 15 illustrates the equivalent code vector constellation of an ESRQ. The RQ consists of eight stagewise code books, each of which contains only two code vectors. To illustrate some of the code book structure, the code vectors are divided to correspond to the elements of H^1_0, marked with an *, and the elements of H^1_1, marked with a +. If this code book is exhaustively searched, the SQNR is only about 1.7 dB less than the performance of the corresponding ESVQ. However, consider searching this code book with a residual encoder. Equation (41) requires S^1_0 to be the subset that is nearest neighbor with respect to the collection of * marks, and S^1_1 to

[Scatter plot of code vectors.]

Rate = 4 bps : SQNR = 21.95 dB

FIGURE 14. Code vector constellation of an ESVQ for the memoryless Gaussian source.

be the subset that is nearest neighbor with respect to the collection of + marks. Note that S^1_0 and S^1_1 are each simply connected. However, since they are not convex regions, the complicated boundary between them cannot be approximated reasonably by a single straight line (two-dimensional binary tree-structured encoders can use only straight line boundaries at each stage). It is not difficult to see that the use of a binary residual encoder on this code vector constellation will cause a serious degradation in SQNR.
Finally, consider the code book constellation of Fig. 16, designed according to the procedure described in Juang and Gray (1982). The spread of the constellation is much smaller than either the ESVQ or optimized ESRQ constellations and is poorly structured, and, of course, the SQNR is much reduced compared to ESRQ.
Rate = 4 bps : SQNR = 20.24 dB

FIGURE 15. Equivalent code vector constellation of an ESRQ with eight binary stages for the memoryless Gaussian source.

VI. REFLECTION SYMMETRIC RQ

It appears that arbitrary RQ code books cannot in general be searched effectively by a conventional residual encoder. It is therefore of interest to determine whether RQ efficiency can be improved without resorting to an exhaustive search encoder. We have considered two approaches to this problem. The first and most obvious alternative is to modify the search. We have studied the use of the M-algorithm (Jelinek and Anderson, 1971), a well known extension of conventional tree searches. The M-algorithm does not restrict itself to the single best path at a given stage, but propagates the M best paths at every stage. In our experiments (Barnes, 1989), this does increase the probability of finding a good encoding, at the expense of increased computational complexity.
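A sketch of the M-algorithm applied to RQ stage trees (our own illustration, assuming the squared-error measure; with M = 1 it reduces to the greedy residual encoder):

    import numpy as np

    def m_algorithm_encode(x, stages, M=4):
        """M-algorithm search over the RQ stage tree: keep the M best
        partial paths at every stage instead of only the single greedy
        path. Each path is (stage_indices, partial_reconstruction)."""
        paths = [([], np.zeros_like(x))]
        for A in stages:
            candidates = []
            for idx, recon in paths:
                for j, y in enumerate(A):
                    candidates.append((idx + [j], recon + y))
            # Rank extended paths by distortion to the source vector so far.
            candidates.sort(key=lambda c: float(np.sum((x - c[1]) ** 2)))
            paths = candidates[:M]
        return paths[0][0]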
The second alternative is to constrain the RQ code book such that the
Rate = 4 bps : SQNR = 15.49 dB

FIGURE 16. Equivalent code vector constellation of a sequentially optimized RQ.

nearest neighbor stagewise equivalence classes are both simply connected and convex. Perhaps the simplest constraint is to require each stagewise code vector to be orthogonal to every stagewise code vector at every other stage. A stagewise orthogonal RQ was recently suggested (Chen and Bovik, 1990) for image quantization. In this quantizer, the first level code vectors are amplitude vectors chosen to represent the average brightness across the source vector. This reduces the search for the first level vectors to a scalar distance computation. The second level vectors can then be forced to have zero mean, and to represent only deviations from the mean vector. Imagery appears to lend itself to this RQ structure, and the authors report good results at very modest complexity.
Other constraints are also possible, limited only by the ingenuity of the researcher. In this section, we consider a constraint for binary RQs that forces a reflection symmetry between the stagewise code vectors. Although optimality conditions are derived for the stagewise code vectors under the reflection constraint, they cannot always be satisfied while still maintaining the

FIGURE 17. First stage code vectors, equivalent code vectors, and equivalent partition boundaries of a two-stage rRQ.
desired connectivity and convexity of the stagewise partitions. Nevertheless, this new structure is instrumentable, and permits the construction of very large equivalent code books with very large vector dimensions at achievable levels of complexity. For sources with memory, such as natural imagery, it gives significantly lower distortion than does conventional RQ. As such, it may be particularly useful for VQ applications at high rates.

A. The Reflection Constraint

Assume there are only two code vectors {y^p_0, y^p_1} at each of the RQ stages. Consider the perpendicular bisecting hyperplane halfway between y^p_0 and y^p_1. To encourage convex optimal stagewise partition cells, we require that each equivalent code vector on one side of the hyperplane have a "reflection" equidistant from and on the opposite side of this hyperplane boundary. That is, if the hyperplane is imagined to be a mirror, then each equivalent code vector that originates from y^p_0 on one side of this mirror must have a reflection originating from y^p_1 on the other side of the mirror. If this condition can be satisfied, then the simple hyperplane boundary will describe optimal stagewise partition cells. Figure 17 illustrates a two-dimensional, two-stage RQ with the desired symmetry.
This reflection symmetry can be described as follows. Given two code vectors {y^p_0, y^p_1} at the pth stage, the point midway between the code vectors is given by

    m^p = (y^p_0 + y^p_1) / 2.   (56)

The point m^p lies on the nearest-neighbor boundary between the two code
vectors y^p_0 and y^p_1. The difference vector n^p = y^p_1 − y^p_0 is normal to this nearest-neighbor hyperplane boundary. The unit normal vector is

    n̂^p = n^p / ‖n^p‖,   (57)

where ‖·‖ is the Euclidean norm. The plane through m^p perpendicular to the normal vector n̂^p is the desired boundary, and is described by

    n̂^p · (m^p − u^p) = 0,   (58)

where u^p is any point in the plane. The smallest distance δ between any point x^p and the perpendicular bisecting hyperplane is given by

    δ = |n̂^p · (m^p − x^p)|.   (59)
Define the reflected vector x̃^p and, with a slight abuse of notation, the forward reflection operator⁹ ℛ^p_{j^p}(·) at the pth stage as

    x̃^p = ℛ^p_{j^p}(x^p) = x^p,             if j^p = 0 ⇔ x^p ∈ S^p_0,
                          = x^p − 2δ n̂^p,   if j^p = 1 ⇔ x^p ∈ S^p_1.   (60)

By convention, the forward reflection operator ℛ^p_{j^p}(·) reflects points in S^p_1 to S^p_0.
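The reflection machinery of Eqs. (56)-(60) amounts to a signed-distance test followed by a mirror step. A Python sketch of the forward operator (ours; the sign convention used to decide the bit j^p is an assumption consistent with Eq. (60)):

    import numpy as np

    def forward_reflect(x, y0, y1):
        """Forward reflection operator of Eq. (60): points on the y1 side
        of the perpendicular bisector of (y0, y1) are mirrored to the y0
        side; points already on the y0 side are left alone.
        Returns (x_tilde, j)."""
        m = 0.5 * (y0 + y1)                           # midpoint, Eq. (56)
        n_hat = (y1 - y0) / np.linalg.norm(y1 - y0)   # unit normal, Eq. (57)
        s = float(np.dot(n_hat, x - m))               # signed distance
        if s <= 0.0:                                  # y0 side: j = 0
            return x, 0
        return x - 2.0 * s * n_hat, 1                 # y1 side: mirror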

Define also an inverse reflection operator ℛ^{−p}_{j^p}(·) such that ℛ^{−p}_{j^p}(ℛ^p_{j^p}(x^p)) = x^p. Note that a single code vector ỹ^p = y^p_0 represents both code vectors {y^p_0, y^p_1}, since y^p_0 = ℛ^p_1(y^p_1). The probability density function on x̃^p ∈ S̃^p = ℛ^p_0(S^p_0) ∪ ℛ^p_1(S^p_1) is f_{x̃^p}(x̃^p) = f_{x^p}(x̃^p) + f_{x^p}(ℛ^{−p}_1(x̃^p)).
Define also the (p + 1)th reflected residual x^{p+1} = x̃^p − ỹ^p, together with the cell S^{p+1} = S̃^p − ỹ^p. The residual x^{p+1} ∈ S^{p+1} has pdf f_{x^{p+1}}(x^{p+1}) = f_{x̃^p}(x^{p+1} + ỹ^p). S^{p+1} is further subdivided with corresponding {y^{p+1}_0, y^{p+1}_1} such that S^{p+1} = S^{p+1}_0 ∪ S^{p+1}_1. Initially, of course, ℝⁿ = S^1_0 ∪ S^1_1.
The reflected residual vector x^{p+1} can be defined recursively as

    x^{p+1} = 𝒢^p_{j^p}(x^p) = ℛ^p_{j^p}(x^p) − ỹ^p,   (61)

where 𝒢^p_{j^p}(·) combines both the reflection and translation operations (an affine transformation). A closed form expression for the total reflected residual x^{P+1} can now be written as the sequence of compositions

    x^{P+1} = 𝒢^P_{j^P}( ⋯ 𝒢^2_{j^2}(𝒢^1_{j^1}(x^1)) ⋯ ) = (∘_{p=1}^{P} 𝒢^p_{j^p})(x^1).   (62)

The quantized source vector is reconstructed from the stagewise code

⁹ The subscript j^p on the operator ℛ^p_{j^p}(·) should not be thought of as indexing one of several operators but rather as an aid for indicating whether the variable x^p was reflected. It could be more properly written as an independent variable, ℛ^p(·, j^p).

vectors by performing in inverse order the appropriate inverse translations and reflections. In accordance with Eq. (61), the partially reconstructed equivalent code vector ŷ^p(j^p, j^{p+1}, ..., j^P) can be written recursively as

    ŷ^p(j^p, j^{p+1}, ..., j^P) = 𝒢^{−p}_{j^p}(ŷ^{p+1}),   (63)

where the inverse reflected residual operator is defined as

    𝒢^{−p}_{j^p}(ŷ^{p+1}) = ℛ^{−p}_{j^p}(ŷ^{p+1} + ỹ^p).   (64)

It follows that the final reconstruction x̂^1 of the source vector x^1 is given by the resulting composition

    x̂^1 = ŷ^1(j^1, j^2, ..., j^P) = 𝒢^{−1}_{j^1}(𝒢^{−2}_{j^2}( ⋯ 𝒢^{−P}_{j^P}(ŷ^{P+1}) ⋯ )),   (65)

where ŷ^{P+1} = 0, the zero vector. Clearly, ŷ^1(j^1, j^2, ..., j^P) is analogous to y^e(j^e) defined for conventional RQ.
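Putting Eqs. (60)-(65) together, an rRQ encoder and decoder can be sketched as follows (our illustration, reusing the forward_reflect sketch above; since a reflection across a hyperplane is its own inverse, the decoder's inverse reflection is the same mirror operation):

    import numpy as np

    def rrq_encode(x, stage_pairs):
        """rRQ encoder per Eqs. (60)-(62): at each stage, reflect the
        residual to the y0 side of the stage hyperplane (recording the
        bit), then subtract the representative vector y_tilde = y0."""
        r = x.copy()
        bits = []
        for (y0, y1) in stage_pairs:
            r, j = forward_reflect(r, y0, y1)   # reflection, Eq. (60)
            bits.append(j)
            r = r - y0                          # translation, Eq. (61)
        return bits

    def rrq_decode(bits, stage_pairs):
        """rRQ decoder per Eqs. (63)-(65): undo translations and
        reflections in inverse order, starting from y^(P+1) = 0."""
        y_hat = np.zeros_like(stage_pairs[0][0])
        for j, (y0, y1) in zip(reversed(bits), reversed(stage_pairs)):
            y_hat = y_hat + y0                  # inverse translation
            if j == 1:                          # inverse reflection
                m = 0.5 * (y0 + y1)
                n_hat = (y1 - y0) / np.linalg.norm(y1 - y0)
                s = float(np.dot(n_hat, y_hat - m))
                y_hat = y_hat - 2.0 * s * n_hat
        return y_hat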
A visual aid for illustrating the structure of the partially reconstructed code vectors ŷ^p(j^p, j^{p+1}, ..., j^P), given by

    ŷ^p(j^p, j^{p+1}, ..., j^P) = 𝒢^{−p}_{j^p}(𝒢^{−(p+1)}_{j^{p+1}}( ⋯ 𝒢^{−P}_{j^P}(ŷ^{P+1}) ⋯ )),   (66)

can be constructed by folding a sheet of paper over onto itself P times for a P-stage coder. Punch one hole that passes through each fold in the paper. The hole represents ŷ^{P+1}. Undo the first fold to form the two partially reconstructed equivalent code vectors ŷ^P(j^P). The crease represents the stagewise boundary between S^P_0 and S^P_1. Now translate the pattern by ỹ^{P−1} and undo the second fold to observe the four code vectors ŷ^{P−1}(j^{P−1}, j^P) and the three partially reconstructed equivalent boundaries. There are in general N^p = 2^{P+1−p} such partially reconstructed code vectors, constructed according to Eq. (66) from all possible (P + 1 − p)-tuples (j^p, ..., j^P) ∈ {J^p × ⋯ × J^P}. There are also 2^{P+1−p} − 1 hyperplanes that determine the individual equivalent partition cell boundaries at the pth stage. Continue to unfold the paper to reconstruct the entire equivalent code vector constellation and all equivalent boundaries. Because reflection in two dimensions is equivalent to folding, this represents the equivalent code book A^e of a two-dimensional reflected RQ (rRQ). This visual aid suggests that rRQ might also be called origami¹⁰ RQ.
Note that rRQ requires somewhat more computation than residual-encoded RQ because of the need to reflect the residual vectors x^p at the encoder and to unreflect the partial reconstructions ŷ^p(·) at the decoder. It now remains to derive optimality conditions on the stagewise code vectors.

¹⁰ The Japanese word origami refers to the art of paper folding.


B. Optimum Reflected Stagewise Code Vectors

An appropriate optimality condition can be derived for rRQ as follows. The expected distortion of an origami coder can be expressed as

    D(x^1, x̂^1) = Σ_{all j^e} E{d(x^1, ŷ^1(j^e)) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)).   (67)

Since the distortion measure d(x, y) is translation invariant, and since reflection is distance preserving, we may rewrite Eq. (67) by applying a sequence of forward reflection operators to both x^1 and ŷ^1(j^e) to obtain

    D(x^1, x̂^1) = Σ_{all j^e} E{d(x^p, ŷ^p(j^e)) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)),   (68)

where ŷ^p(j^e) = ŷ^p(j^p, j^{p+1}, ..., j^P). By applying a single reflection operator to both vectors and using Eq. (63), d(x^p, ŷ^p) can be manipulated into the form

    d(x^p, ŷ^p(j^e)) = d(x̃^p − ŷ^{p+1}(j^{p+1}), ỹ^p).   (69)

Now, define the pth reflected graft residual as

    ε̃^p = x̃^p − ŷ^{p+1}(j^{p+1})   if x^1 ∈ S^e(j^e).   (70)

Combining Eq. (70), Eq. (69), and Eq. (68) gives

    D(x^1, x̂^1) = Σ_{all j^e} E{d(ε̃^p, ỹ^p) | x^1 ∈ S^e(j^e)} Prob(x^1 ∈ S^e(j^e)).   (71)

The form of Eq. (71) is identical to that of Eq. (46). It follows directly that to minimize the expected distortion in quantizing x^1 with an rRQ, the stagewise code vectors must satisfy

    E{d(ε̃^p, ỹ^p) | x̃^p ∈ S̃^p} = min_{u∈ℝⁿ} E{d(ε̃^p, u) | x̃^p ∈ S̃^p}   (72)

for 1 ≤ p ≤ P. This result is analogous to Eq. (52), but differs in that, if the reflection boundary at the pth stage of the origami code is assumed fixed, there is only one independent code vector ỹ^p to optimize. Alternatively, the reflection boundary may also be iteratively improved during the decoder optimization step of the design procedure. That is, instead of finding one graft centroid for ỹ^p, two graft residual centroids can be calculated, one each for S^p_0 and S^p_1, and then the corresponding hyperplane
Voronoi boundary can be modified accordingly. Optimizing the stagewise reflection boundaries usually decreases distortion, but may lead to encoder entanglement. The latter method was used to design the rRQ code books tested in this paper.
The close similarity between the RQ and rRQ code vector optimality criteria is not surprising. Both RQ and rRQ use their stagewise code vectors additively in the construction of their equivalent code vectors. Also, both make use of the partially reconstructed code vectors. For unreflected RQ, this can be demonstrated by rewriting the original graft residual Eq. (27) as

    ε^p = x^p − Σ_{ρ=p+1}^{P} y^ρ_{j^ρ}   if x^1 ∈ S^e(j^e),   (73)

where the summation Σ_{ρ=p+1}^{P} y^ρ_{j^ρ} in Eq. (73) describes partially reconstructed code vectors in conventional RQ. The reflected graft residual ε̃^p in Eq. (70) therefore has the same relationship to its stagewise code vector ỹ^p as the original graft residual ε^p in Eq. (73) has to its stagewise code vector y^p_{j^p}.
It turns out that Eq. (72) cannot always be satisfied at all stages simultaneously without impairing the efficiency of the residual encoder. This can be explained in the following way. A translation of any ỹ^p changes all the equivalent code vectors y^e(j^e) by producing a rigid translation of the entire pattern of partially reconstructed code vectors ŷ^p(j^p, j^{p+1}, ..., j^P). This translation does not affect either the boundaries or the reflection operators ℛ^ρ_{j^ρ}(·) at previous stages ρ < p, so the reflection symmetry induced by previous stages is unchanged. Indeed, each of the y^e(j^e) is translated in such a way as to preserve the reflection symmetry of the previous stages, in very much the same way that the pieces of a kaleidoscope image translate. However, an arbitrary translation of ỹ^p does translate the boundaries, and hence modifies the reflection operators, at all subsequent stages. If ỹ^p is moved far enough, it will move one or more partially reconstructed code vectors ŷ^ρ(j^ρ, j^{ρ+1}, ..., j^P) across the hyperplane boundary established at the ρth stage. If this happens, the rRQ encoder will be unable to access some of the y^e(j^e), and the same problem of inefficient code book use that plagued conventional RQ will affect rRQ. This condition can be checked during the design process by ensuring that all equivalent code vectors are themselves correctly encoded by the rRQ encoder. If they are not, it may be helpful to change the order of stagewise optimization. If this does not correct the problem, it may not be possible to optimize jointly all stages and still preserve consistent labeling. In this case it seems best to optimize only the latter stages, leaving earlier stages fixed.
Figure 18 illustrates the equivalent rRQ code vector constellation corresponding to Figs. 14-16. As before, the code book has eight stages. The reflection symmetry between stages is quite evident. The code vectors are
Rate = 4 bps : SQNR = 19.63 dB

FIGURE 18. Equivalent code vector constellation of an rRQ with eight binary stages for the memoryless Gaussian source.

much more spread out than those shown in Fig. 16, so it is not surprising that the SQNR is more than 4 dB better than conventional unoptimized RQ, even though both use a tree-structured encoder. In fact, the reflection constraint has cost only 0.6 dB in SQNR when compared with an optimized ESRQ.

VII. EXPERIMENTAL RESULTS

A. A New Design Algorithm for Residual Quantizers

Both the Lloyd Method I and the LBG algorithm can be interpreted as iterated design procedures, where finding centroids of fixed partition cells is analogous to optimizing the decoder for a fixed encoder, and finding a new nearest-neighbor partition with respect to a fixed set of quanta is analogous to optimizing the encoder for a fixed decoder. When repeated application of these optimization steps leaves the quanta and partitions unchanged, the quantizer satisfies a fixed point condition.
The basic philosophy of this design approach can be used to design jointly optimal residual quantizers. The difference, however, between the Lloyd and LBG algorithms for single-stage quantizers and a similar algorithm for multistage residual quantizers is that there must be two interlaced iterative fixed-point procedures: one for optimization of the encoder/decoder pair, and another to satisfy the graft residual centroid condition simultaneously among all RQ stages. In the second iterative procedure, each RQ stage is optimized while holding the code books of all other stages fixed. The new code vectors of an optimized stage satisfy the necessary graft residual centroid conditions with respect to the fixed code books of the other stages. This procedure is then repeated for a different stage. However, the process of optimizing the code vectors of a different stage causes the first stage that was optimized no longer to satisfy the graft residual centroid condition. It is eventually necessary to return to all stages and repeat the process in "round robin" fashion. Since the changes made to the code books of each stage can only decrease or leave unchanged the average distortion of the RQ (assuming a constant fixed partition), this iterative procedure converges to a fixed point. After this fixed point has been reached, a new encoder/decoder iteration is performed (a new partition is selected) and the entire process is repeated until both fixed-point conditions are simultaneously satisfied. This is the method used to design the jointly optimal residual quantizers tested in this section.
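A skeleton of the two interlaced fixed-point iterations might look as follows (a sketch of ours, reusing the exhaustive_encode and update_stage_codebook fragments above; the convergence tests are simplified here to fixed loop counts):

    def design_rq(train, stages, n_outer=10, n_inner=5):
        """Sketch of the interlaced design: the outer loop alternates
        encoder and decoder optimization; the inner loop cycles
        round-robin through the stages, re-centering each stage's code
        book against the others via the centroid condition of Eq. (52)."""
        for _ in range(n_outer):
            # Encoder step: repartition the training set by exhaustive
            # nearest-neighbor search over the equivalent code book.
            indices = [exhaustive_encode(x, stages) for x in train]
            # Decoder step: round-robin graft-residual centroid updates
            # until the stagewise code books reach a fixed point.
            for _ in range(n_inner):
                for p in range(len(stages)):
                    stages[p] = update_stage_codebook(train, indices,
                                                      stages, p)
        return stages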

B. Synthetic Sources

Experimental results derived from simulations of vector residual quantizers are presented here. In all cases, the fidelity criterion is the mean squared error normalized by the source variance. The distortion is denoted D(n, P, R) to indicate the dependency on vector size n, the number of stages P, and the rate R. Distortion is represented in terms of the signal-to-quantization noise ratio (SQNR) measured in dB, defined as SQNR(n, P, R) = 10 log_10(σ_x² / D(n, P, R)), where σ_x² is the source variance.
Two synthetic sources are considered: the memoryless Gaussian and a Gauss-Markov source. The zero-mean memoryless Gaussian source has a probability density function given by

    f(x) = (1 / √(2πσ_x²)) exp(−x² / (2σ_x²)),   (74)

where σ_x² is the variance of the source. The general Gauss-Markov (or


TABLE I
DISTORTION-RATE BOUNDS OF INTEREST

Source                SQNR(∞,1,0.5)   SQNR(∞,1,1)   SQNR(∞,1,2)   SQNR(∞,1,3)
Memoryless Gaussian   3.01            6.02          12.04         18.06
Gauss-Markov          NA              14.96         21.64         27.66

Gaussian autoregressive) source is defined as

    x_t = Σ_{l=1}^{L} a_l x_{t−l} + w_t,   (75)

where {a_l : l = 1, ..., L} are the coefficients of a linear, shift-invariant filter and the w_t are realizations of a white, zero-mean Gaussian process. The Gauss-Markov source used in these experiments is the second order model with coefficients {a_1 = 1.515, a_2 = −0.752}, which is also known as the AR(2) source. Some of the known distortion rate bounds (Berger, 1971; Jayant and Noll, 1984; Marcellin, 1987) for these sources are given in Table I.
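Such a source is easy to simulate. The following Python sketch (ours; the unit innovation variance and the burn-in length are assumptions) generates AR(2) samples with the coefficients quoted above:

    import numpy as np

    def ar2_source(n_samples, a=(1.515, -0.752), seed=0):
        """Generate samples of the AR(2) Gauss-Markov source of Eq. (75):
        x_t = a1*x_{t-1} + a2*x_{t-2} + w_t, with white zero-mean
        Gaussian w_t of unit variance."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(n_samples + 200)   # extra samples: burn-in
        x = np.zeros_like(w)
        for t in range(2, len(w)):
            x[t] = a[0] * x[t - 1] + a[1] * x[t - 2] + w[t]
        return x[200:]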

C. Exhaustive Search Residual Quantizers

The various RQ simulation results reported here have the following characteristics in common. The training sets contained 500,000 vectors, since under these conditions we found that the simulation results using in-training set data varied negligibly from the results obtained using out-of-training set data. Since the equivalent code book sizes in these experiments varied from 2 to 256 code vectors, the corresponding training set size on a per equivalent code vector basis ranges from 250,000 down to about 1,950 training set vectors per equivalent code vector.
Each of the different RQ designs was tested with the number of stages varying from two to eight. The code book sizes for the RQs were divided as equally as possible among the stages. If an equal number of code vectors could not be allocated to each stage for a given n and R, then the first few stages were assigned the larger code book sizes. All stopping thresholds used during the design process for relative changes in distortion were set to 0.0005. The splitting algorithm of Linde et al. (1980) was used to seed the initial code books.
Tables of SQNR(n, P, R) at rates of 0.5, 1.0, and 2.0 bits per sample can be found in the Appendix. The tables are organized in pairs. The first table of each pair gives the SQNR(n, P, R) performance of a conventional, suboptimal RQ designed with sequential use of the LBG algorithm as in Juang and Gray (1982). The second table gives the performance of the same RQ only where

[Plot: SQNR (dB) vs. log₂(# of scalar memory locations).]

FIGURE 19. Memory efficiency of ESRQs on the memoryless Gaussian source.

an exhaustive search encoder is used and where the stagewise code vectors satisfy necessary conditions for joint optimality. In each plot the P = 1 curve represents unconstrained ESVQ performance, which serves as a reference to determine the effect of the multistage residual memory constraint. The P = 2 RQ is the least constrained, and the P = 8 RQ is the most severely constrained in that each stage has only two code vectors.
It can be argued that the comparison of ESRQs with tree-searched sequential LBG RQs is unfair, since the search procedures are not identical. For this reason, this comparison between conventional RQ and ESRQ is not overly emphasized. The main thrust of these experiments is the comparison of ESRQ with ESVQ. As we shall see, however, an interesting result is that the performance of sequentially designed LBG RQs and jointly optimal ESRQs can be nearly identical at low rates for memoryless sources. For sources with memory this is not true. This study also illustrates some of the undesirable phenomena that occur at higher rates with sequentially designed RQs.

[Plot: SQNR (dB) vs. log₂(# of scalar memory locations); curves for ESVQ and for ESRQs of 2 through 8 stages.]

FIGURE 20. Memory efficiency of ESRQs on the Gauss-Markov source.

As can be seen from the tables, on the memoryless Gaussian source conventional RQs tend to exhibit erratic SQNR behavior as a function of n for fixed P and R. That is, the performance may actually decrease as the vector dimension is increased. This phenomenon is most pronounced at the larger values of R. Also, most of the conventional RQs have a lower SQNR at rates of 1 and 2 bps than the corresponding scalar Lloyd-Max nonuniform quantizers! Notable exceptions are the binary RQs, which tend to have SQNRs equal to those of nonuniform Lloyd-Max quantizers. In contrast, optimal ESRQs do have a monotonically nondecreasing SQNR with n, and never have an SQNR less than that of Lloyd-Max nonuniform quantizers. However, the SQNR(n, P, R) of both conventional and optimal RQs declines steadily as P is increased on this memoryless source. At the lowest rate of 0.5 bps, the SQNRs of conventional tree-searched RQs and optimal ESRQs are nearly equal. We believe that this result is explained by the sparseness of the code vectors in n-space at this low rate, which minimizes
FIGURE 21. Original 8 bit per pixel image Lena.

entanglement problems for both RQ code book design methods and search procedures, at least for this memoryless unimodal source.
For the AR(2) source, the SQNR of conventional RQ decreases significantly as P increases. For ESRQs, however, there is very little variation in SQNR with increasing P, and there is only a slight loss of performance between the multistage quantizers and the single-stage quantizer. The performance drop between the single-stage and multistage quantizers ranges from 0 to about 0.5 dB for the ESRQs. The corresponding drop for the conventionally designed RQs is as large as 3.0 dB.
These results help quantify the extent to which the SQNR vs. vector dimension performance of ESRQ is suboptimal to that of ESVQ for various values of P and R. The question remains as to whether or not ESRQs give superior
FIGURE 22. Lena compressed with conventional RQ at 0.25 bits per pixel.

performance compared to ESVQs for a fixed memory expenditure. We call this comparison the memory efficiency (relative to ESVQ) of the ESRQ code structures. We measure memory costs by the number of scalar memory locations required to store the code book(s) at either the encoder or decoder. The term "scalar" is used so we may be imprecise about the data type (fixed point, single precision, double precision, etc.) of the code book entries. Figure 19 shows the SQNR vs. log₂ of the number of memory cells required for implementation of different codes designed for the memoryless Gaussian source. Note that ESRQ generally requires more memory to achieve the same SQNR performance as ESVQ. This is a consequence of ESRQ requiring a larger n to achieve the same SQNR performance as ESVQ. It is also
FIGURE 23. Lena compressed with rRQ at 0.25 bits per pixel.

somewhat surprising, since ESRQ was intended to reduce the memory costs relative to ESVQ.
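The memory accounting used here is simple enough to state in code. A sketch (ours; the example numbers correspond to eight binary stages versus a single ESVQ code book, both at one bit per sample with n = 8):

    def memory_cost(stage_sizes, n):
        """Scalar memory locations needed to store the stagewise code
        books: each stage holds N_p vectors of dimension n. For an ESVQ,
        pass a single-element list [2**(n*R)]."""
        return sum(N * n for N in stage_sizes)

    # Eight binary stages vs. one ESVQ code book at rate 1 bps, n = 8.
    print(memory_cost([2] * 8, 8))      # RQ: 8 stages * 2 vectors * 8 = 128
    print(memory_cost([2 ** 8], 8))     # ESVQ: 256 vectors * 8 = 2048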
A very different result is obtained on the Gauss-Markov source. As shown in Fig. 20, for the parameters tested, ESRQs required one-fourth to one-sixteenth the memory of the corresponding ESVQs. Equivalently, the ESRQs give approximately a 0.25 dB to 2.5 dB increase in SQNR over ESVQs at a given memory expenditure. The savings depend on both R and n. This demonstrates that extreme care should be taken when evaluating "cost efficient" compression schemes. In this case, the ESRQ structure proved more efficient than ESVQ on one source but not on the other.
It is not surprising that a structured VQ is better suited to a structured source. However, even though ESRQ proved to be more memory-efficient
FIGURE 24. Lena compressed with conventional RQ at 0.44 bits per pixel.
than ESVQ on the AR(2) source, it is clearly not more computation-efficient. In this sense, ESRQ is the complement of TSVQ: TSVQ is computation-efficient but not memory-efficient. In the next section, the memory- and computation-efficient reflected residual quantizer is tested and evaluated on real imagery data.

D. Reflected RQ

Experiments were conducted comparing the performance of RQs designed with sequential use of the GLA to the performance of reflected RQs. (This comparison is fair, since both RQ codes are tree-searched.) The rRQs each consist of 64 stages with various vector sizes and bit rates. To reduce the
FIGURE 25. Lena compressed with rRQ at 0.44 bits per pixel.

required time for code book design of the rRQ code vector constellation, the rRQ stages evaluated here were jointly optimized only over sub-blocks of eight stages. That is, the first eight stages were jointly optimized. Then, while holding these stages fixed, the next eight stages were jointly optimized, added to the first eight stages, and the process repeated. Experience with reflected RQ designs shows that, as the number of encoder stages allowed to change during the design process is increased, it becomes increasingly likely that entanglement will occur. This is manifested by nonmonotonic behavior of the quantizer distortion during the design process. This incremental sub-block design approach is an ad hoc method of encouraging monotonic convergence during the design process. One possible improvement to this design approach might be to use separate encoder and decoder rRQ code books. The encoder
FIGURE 26. Lena compressed with conventional RQ at 1.00 bits per pixel.

code books should be designed to reduce entanglement, and the corresponding decoder code books might then be jointly optimized over all stages in the rRQ.
Unlike the previous set of experiments, the rRQs were tested on real imagery data. The training set consisted of six 512 x 512 digitized density (Budge et al., 1989) images, where the original data was quantized to 8 bits per pixel. (The training set images included pictures of a baboon, a woman sitting in a living room, a boat in dry dock, a close-up of a woman (taken around 1900), a couple standing in a living room, and a close-up of a bird.) Using this training set, various RQs, consisting of 64 stages, with vector sizes varying between 4 x 4 and 16 x 16, were designed with the two algorithms. Shown in Table II are the peak signal-to-noise ratios (PSNR) resulting from
FIGURE 27. Lena compressed with rRQ at 1.00 bits per pixel.

TABLE II
PERFORMANCE RESULTS FOR TEST IMAGE LENA

Number     Vector    Bit       PSNR (dB)   PSNR (dB)   Relative
of Stages  Size      Rate      LBG RQ      Refl RQ     Improvement
64         16 x 16   0.25 bpp  25.87       28.24       2.37 dB
64         12 x 12   0.44 bpp  27.32       30.05       2.73 dB
64         8 x 8     1.00 bpp  29.91       32.73       2.82 dB
64         6 x 6     1.77 bpp  31.96       35.33       3.37 dB
64         4 x 4     4.00 bpp  36.98       41.07       4.09 dB
FIGURE 28. Lena compressed with conventional RQ at 1.77 bits per pixel.

encoding an out-of-training set image commonly referred to as "Lena." The improvement of reflected RQ over sequential GLA RQ grows from 2.37 dB to 4.09 dB as the bit rate is increased from 0.25 bpp to 4.00 bpp. Shown in Figs. 22-29 are some of the corresponding reconstructed images. Figure 21 is the original 8 bit per pixel image. It can be seen that the increase in SQNR for the rRQ design is evident in the edges of the image, where the blocking effect is less severe than in the sequential LBG designs.
Since it is not feasible to design ESVQs with these vector sizes and rates, it is very difficult to comment on the relative efficiency of this scheme. It does show, however, that optimized rRQ can perform significantly better than conventional RQ, especially at higher rates. We stress that the implemen-
FIGURE 29. Lena compressed with rRQ at 1.77 bits per pixel.

tation costs for the rRQ codes are very low: only 128 vectors need to be stored, and only 64 pair-wise nearest-neighbor vector encoding decisions (plus the computational expense of the reflection operations) are required for encoding. We believe that rRQ is the only nonlattice vector quantizer developed to date that is instrumentable in both memory and computation costs, and yet seems to yield acceptable performance levels.
These results are quite encouraging. The distortion results can be expected to improve if the code vector size (and number of stages) is allowed to increase. This would not compromise implementability, since the 64 stage quantizers designed here do not come close to challenging current state-of-the-art digital hardware.

VIII. CONCLUSIONS

Residual quantizers constitute a class of structured VQ that seems to fall somewhere between exhaustive search VQ and lattice VQ in both complexity and distortion. They can be understood using classical VQ analysis techniques. This was shown by deriving necessary conditions for the optimality of all stagewise code vectors assuming difference distortion measures. Locally optimal ESRQs can be designed by appropriate modification of Lloyd's Method I or the LBG design algorithm. The distortion of ESRQ relative to ESVQ is source, rate, and dimension dependent. The SQNR of ESRQ is often within a few tenths of a dB of that of ESVQ, especially on sources with memory.
Residual quantizer decoder trees cannot always be well coordinated with tree-structured encoders, especially at high rates. At low rates, on the memoryless Gaussian source, sequentially designed RQs perform essentially as well as jointly optimal ESRQs. For general sources and high rate RQ codes, there are various ways the RQ decoder can be constrained such that a tree-structured encoder is effective. One such constraint is the reflection constraint considered here in detail.
The work reviewed here leads us to conclude that highly structured VQ alphabets, such as rRQ, appear to work well at moderate to high rates. However, at low rates, it seems to be too difficult to approach D(R) simply by increasing the dimension n of any VQ, structured or not. Convergence is simply too slow, and the imposition of structure does not seem to reduce costs sufficiently to overcome this burden.
We suggest that future research on low rate structured vector quantization should consider exploiting alphabet expansion as well as vector dimension. Alphabet expansion has proven to be very useful in developing structured trellis-coded scalar quantizers (TCQ) (Marcellin, 1987; Marcellin and Fischer, 1990). VQ alphabets can be used in TCQ, and are required for operation of fixed-rate TCQ coders at rates R < 1 bps. Because multistage alphabets of the type used in RQs can be viewed as a kind of generalized coset code, they are well suited for use in trellis-coded applications. In particular, preliminary work suggests that the coset structure can be exploited to develop efficient algorithms for exhaustive search. We plan to report on this work in a future paper.

APPENDIX: TABLES OF RATE-DISTORTION DATA

TABLE III
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 0.5 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gaussian, 0.5 bps

# of                          Vector Dimension
Stages    2      4      6      8      10     12     14     16
1         1.66   1.89   2.06   2.16   2.22   2.28   2.33   2.39
2                1.66   1.81   1.89   2.02   2.04   2.02   2.11
3                       1.66   1.78   1.85   1.89   1.99   2.00
4                              1.66   1.75   1.81   1.86   1.89
5                                     1.67   1.74   1.79   1.83
6                                            1.66   1.73   1.78
7                                                   1.66   1.72
8                                                          1.67

TABLE IV
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 0.5 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gaussian, 0.5 bps

# of                          Vector Dimension
Stages    2      4      6      8      10     12     14     16
1         1.66   1.89   2.06   2.16   2.22   2.28   2.33   2.39
2                1.66   1.82   1.89   2.03   2.08   2.13   2.17
3                       1.66   1.78   1.84   1.89   1.92   2.05
4                              1.67   1.75   1.81   1.84   1.89
5                                     1.67   1.74   1.79   1.83
6                                            1.66   1.73   1.78
7                                                   1.66   1.72
8                                                          1.66
TABLE V
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 1.0 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gaussian, 1.0 bps

# of                          Vector Dimension
Stages    1      2      3      4      5      6      7      8
1         4.40   4.40   4.47   4.58   4.71   4.80   4.86   4.93
2                4.40   3.79   4.09   4.18   4.34   4.27   4.40
3                       4.40   3.94   4.02   4.02   4.11   4.28
4                              4.40   4.02   3.79   3.94   4.10
5                                     4.40   4.08   3.88   3.92
6                                            4.40   4.13   3.94
7                                                   4.40   4.15
8                                                          4.39

TABLE VI
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 1.0 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gaussian, 1.0 bps

# of                          Vector Dimension
Stages    1      2      3      4      5      6      7      8
1         4.40   4.40   4.47   4.58   4.71   4.80   4.86   4.93
2                4.40   4.45   4.34   4.54   4.60   4.65   4.74
3                       4.40   4.46   4.47   4.50   4.57   4.65
4                              4.40   4.46   4.49   4.51   4.55
5                                     4.40   4.46   4.48   4.52
6                                            4.40   4.47   4.49
7                                                   4.39   4.46
8                                                          4.39
TABLE VII
DISTORTION OF UNOPTIMIZED RQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 2.0 BITS PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gaussian, 2.0 bps

# of            Vector Dimension
Stages    1      2      3      4
1         9.31   9.69   9.93   10.14
2         8.86   8.87   8.89   9.17
3                8.83   8.56   8.49
4                8.86   8.14   8.32
5                       8.07   8.25
6                       8.85   8.22
7                              8.22
8                              8.84

TABLE VIII
DISTORTION OF OPTIMIZED ESRQ ON THE MEMORYLESS GAUSSIAN SOURCE AT 2.0 BITS PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gaussian, 2.0 bps

# of            Vector Dimension
Stages    1      2      3      4
1         9.31   9.69   9.93   10.14
2         9.30   9.42   9.57   9.81
3                9.44   9.67   9.75
4                9.44   9.67   9.68
5                       9.72   9.14
6                       9.51   9.83
7                              9.77
8                              9.90
TABLE IX
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 0.5 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gauss-Markov, 0.5 bps

# of                          Vector Dimension
Stages    2      4      6      8      10     12     14     16
1         3.91   5.04   6.03   6.52   6.91   7.20   7.40   7.59
2                4.56   5.57   5.97   6.47   6.68   6.89   7.01
3                       5.62   5.96   5.97   6.28   6.56   6.72
4                              5.86   6.39   6.04   6.35   6.54
5                                     6.39   6.31   6.22   6.48
6                                            6.41   6.52   6.47
7                                                   6.56   6.53
8                                                          6.54

TABLE X
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 0.5 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gauss-Markov, 0.5 bps

# of                          Vector Dimension
Stages    2      4      6      8      10     12     14     16
1         3.91   5.04   6.03   6.52   6.91   7.20   7.40   7.59
2                5.04   5.95   6.44   6.77   7.05   7.26   7.41
3                       5.95   6.44   6.80   7.08   7.29   7.41
4                              6.43   6.82   7.07   7.27   7.45
5                                     6.79   7.10   7.26   7.44
6                                            7.11   7.28   7.46
7                                                   7.25   7.45
8                                                          7.46
TABLE XI
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 1.0 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gauss-Markov, 1.0 bps

# of                          Vector Dimension
Stages    1      2      3      4      5      6      7      8
1         4.39   7.51   8.41   9.56   10.36  10.92  11.35  11.68
2                7.23   7.92   8.97   9.69   10.10  10.65  10.90
3                       7.64   8.96   9.27   9.61   10.31  10.39
4                              8.23   9.25   9.38   9.63   10.02
5                                     9.26   9.44   9.87   9.74
6                                            9.38   9.97   9.80
7                                                   9.80   10.05
8                                                          10.06

TABLE XII
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 1.0 BIT PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gauss-Markov, 1.0 bps

# of                                 Vector Dimension
Stages    1      2      3      4      5      6      7      8      9      10
1         4.39   7.51   8.41   9.56   10.36  10.92  11.35  11.68  11.98  12.23
2                7.52   8.32   9.45   10.12  10.70  11.13  11.40  11.70  11.86
3                       8.31   9.44   10.14  10.71  11.14  11.31
4                              9.44   10.23  10.63  11.11  11.32
5                                     10.15  10.70  11.14  11.35
6                                            10.70  11.11  11.45
7                                                   11.05  11.41
8                                                          11.40
TABLE XIII
DISTORTION OF UNOPTIMIZED RQ ON THE GAUSS-MARKOV SOURCE AT 2.0 BITS PER SAMPLE

Signal to Quantization Noise Ratios (dB): LBG RQ, Gauss-Markov, 2.0 bps

# of            Vector Dimension
Stages    1      2      3      4
1         9.29   12.86  15.34  16.67
2         8.83   11.98  14.23  15.46
3                11.97  13.80  14.67
4                11.02  13.73  13.92
5                       13.50  13.79
6                       12.61  13.71
7                              13.70
8                              13.63

TABLE XIV
DISTORTION OF OPTIMIZED ESRQ ON THE GAUSS-MARKOV SOURCE AT 2.0 BITS PER SAMPLE

Signal to Quantization Noise Ratios (dB): ES RQ, Gauss-Markov, 2.0 bps

# of            Vector Dimension
Stages    1      2      3      4
1         9.29   12.86  15.34  16.67
2         9.29   12.70  14.86  16.10
3                12.65  14.84  16.04
4                12.67  14.84  16.27
5                       15.01  16.27
6                       15.00  16.25
7                              16.27
8                              16.15

REFERENCES

Baker, R. L. (1984). "Vector quantization of digital images." Ph.D. Thesis, Stanford University, California.
Barnes, C. F. (1989). "Residual quantizers." Ph.D. Thesis, Brigham Young University, Utah.
Barnes, C. F., and Frost, R. L. (1990). "Vector quantizers with direct sum code books," to appear in IEEE Transactions on Information Theory.
Berger, T. (1971). "Rate Distortion Theory." Prentice-Hall, Englewood Cliffs, New Jersey.
Budge, S. E., Barnes, C. F., Talbot, L. M., Chabries, D. M., and Christiansen, R. W. (1989). "Image coding for data compression using a human visual model," SPIE/SPSE Symposium on Electronic Imaging: Advanced Devices and Systems, Los Angeles, California.
Buzo, A., Gray Jr., A. H., Gray, R. M., and Markel, J. D. (1980). "Speech coding based upon vector quantization," IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-28, 562-574.
Chan, W.-Y., and Gersho, A. (1991). "Constrained-storage quantization of multiple vector sources by codebook sharing," IEEE Transactions on Communications COM-39, 11-13.
Chen, D., and Bovik, A. C. (1990). "Visual pattern image coding," IEEE Transactions on Communications COM-38, 2137-2146.
Conway, J. H., and Sloane, N. J. A. (1982). "Fast quantizing and decoding algorithms for lattice quantizers and codes," IEEE Transactions on Information Theory IT-28, 227-232.
Farvardin, N., and Modestino, J. W. (1984). "Optimum quantizer performance for a class of non-Gaussian memoryless sources," IEEE Transactions on Information Theory IT-30, 485-497.
Farvardin, N., and Modestino, J. W. (1986). "Adaptive buffer-instrumented entropy-coded quantizer performance for memoryless sources," IEEE Transactions on Information Theory IT-32, 9-22.
Fischer, T. R., and Dicharry, R. M. (1984). "Vector quantizer design for memoryless Gaussian, Gamma, and Laplacian sources," IEEE Transactions on Communications COM-32, 1065-1069.
Flanagan, J. K., Morrell, D. R., Frost, C. J., and Nelson, B. E. (1989). "Vector quantization codebook generation using simulated annealing," in IEEE International Conference on Acoustics, Speech and Signal Processing, 1759-1762.
Forney, D. G. (1988). "Coset codes-Part I: Introduction and geometric classification," IEEE Transactions on Information Theory IT-34, 1123-1151.
Gabor, G., and Gyorfi, Z. (1986). "Recursive Source Coding." Springer-Verlag, New York.
Gallager, R. G. (1968). "Information Theory and Reliable Communication." John Wiley and Sons, New York.
Gersho, A. (1979). "Asymptotically optimal block quantization," IEEE Transactions on Information Theory IT-25, 373-380.
Gibson, J. D., and Sayood, K. (1988). "Lattice quantization," in "Advances in Electronics and Electron Physics" (P. Hawkes, ed.) 72, 259-330. Academic Press, New York.
Gray, R. M., Kieffer, J. C., and Linde, Y. (1980). "Locally optimal block quantizer design," Information and Control 45, 178-198.
Jayant, N. S., and Noll, P. (1984). "Digital Coding of Waveforms: Principles and Applications to Speech and Video." Prentice-Hall, Englewood Cliffs, New Jersey.
Jelinek, F., and Anderson, J. B. (1971). "Instrumentable tree encoding of information sources," IEEE Transactions on Information Theory IT-17, 118-119.
Juang, B. H., and Gray, A. H. (1982). "Multiple stage vector quantization for speech coding," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 1, 597-600.
Langdon, G. G. (1984). "An introduction to arithmetic coding," IBM Journal of Research and Development 28, 135-149.
Linde, Y., Buzo, A., and Gray, R. M. (1980). "An algorithm for vector quantizer design," IEEE Transactions on Communications COM-28, 84-95.
Lloyd, S. P. (1957). "Least squares quantization in PCM," Bell Laboratories Technical Notes; also published in the March 1982 special issue on quantization: IEEE Transactions on Information Theory, Part 1 IT-28, 129-137.
Makhoul, J., Roucos, S., and Gish, H. (1985). "Vector quantization in speech coding," in Proceedings of the IEEE 73(11), 1551-1588.
Marcellin, M. W. (1987). "Trellis coded quantization: an efficient technique for data compression," Ph.D. Thesis, Texas A&M University, College Station, Texas.
Marcellin, M. W., and Fischer, T. R. (1990). "Trellis code quantization of memoryless and Gauss-Markov sources," IEEE Transactions on Communications COM-38, 82-93.
Max, J. (1960). "Quantization for minimum distortion," IRE Transactions on Information Theory IT-6, 7-12.
Pilc, R. (1967). "Coding theorems for discrete source-channel pairs," Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Pilc, R. (1968). "The transmission distortion of a source as a function of the encoding block length," Bell Syst. Tech. J. 47, 827-885.
Sabin, M. J., and Gray, R. M. (1986). "Global convergence and empirical consistency of the generalized Lloyd algorithm," IEEE Transactions on Information Theory IT-32, 148-155.
Sayood, K., Gibson, J. D., and Rost, M. C. (1984). "An algorithm for uniform vector quantizer design," IEEE Transactions on Information Theory IT-30, 805-814.
Shannon, C. E. (1948). "A mathematical theory of communication," Bell Syst. Tech. J. 27, 379-423, 623-656.
Shannon, C. E. (1959). "Coding theorems for a discrete source with a fidelity criterion," in IRE Nat. Conv. Rec., Part 4, 142-163.
Trushkin, A. V. (1982). "Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions," IEEE Transactions on Information Theory IT-28, 187-198.
Welch, T. A. (1984). "A technique for high-performance data compression," IEEE Computer Magazine, 8-19.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84

Foundation and Applications of Lattice Transforms in Image Processing

JENNIFER L. DAVIDSON

Department of Electrical Engineering and Computer Engineering, Iowa State University, Ames, Iowa

I. Introduction . . . . 61
   A. Lattice Structures in Image Processing . . . . 62
   B. Image Algebra and Its Relation to Image Processing . . . . 64
II. Theoretical Foundation of Lattice Transforms in Image Processing . . . . 66
   A. Minimax Algebra . . . . 66
   B. Image Algebra . . . . 72
   C. The Embedding Isomorphism between Minimax Algebra and Image Algebra . . . . 85
   D. Mathematical Morphology . . . . 86
III. Applications . . . . 90
   A. Mapping of Minimax Algebra Properties to Image Algebra . . . . 90
   B. A General Skeletonizing Technique . . . . 115
   C. An Image Complexity Measure . . . . 120
   D. The Dual Transportation Problem in Image Algebra . . . . 124
References . . . . 127

I. INTRODUCTION

Nonlinear transforms are demanding increasing attention in image processing applications. One class of nonlinear transforms used extensively in image processing grew out of Minkowski's set-theoretic operations (Minkowski, 1903) investigated by Matheron during the mid-1960s (Matheron, 1967). Since then, mathemati-
cal morphology has been developed and refined until the late eighties, when
it was placed into a rigorous mathematical environment called minimax
algebra. Because these transforms are based on the number system of
extended real numbers, which is a lattice, they have been termed lattice
transforms. The purpose of this chapter is to familiarize the image processing
community with the mathematical background necessary for understanding
the full potential of lattice transforms in image processing applications, and
to present several applications. Because this is relatively recent research, much work remains to be done in the field, and the potential for significant new results is high. To illustrate this, we use the following
analogy. Linear transforms are used extensively in the scientific community,
and the theory of linear transforms is extremely well developed. Theoretical
results are constantly being developed to advance linear-related transform applications in image processing. For example, the discrete Fourier
transform (DFT), which can be represented by a matrix, has a fast version
called the fast Fourier transform (FFT). The FFT is heavily used in many
engineering arenas due to its greatly increased speed of computation when
compared with the DFT. The FFT can be represented by a product of
matrices that, when multiplied together, give the DFT. In other words, a
decomposition of the DFT matrix gives the FFT. This powerful result is an
example of how the theory of linear transforms can be used to give
applications-oriented results. Potential for similar impact exists in the area of lattice transforms. However, until the mid-1980s, lattice transforms had seen applications only in the area of operations research. It is hoped that the
material presented here will provide the reader with the incentive to investi-
gate other applications of the basic theory that have not been pursued yet,
primarily in the area of image processing. Already, a new theory of artificial
neural networks, called morphological neural networks, has been developed,
using minimax algebra as the theoretical setting (Davidson and Ritter, 1990;
Davidson and Sun, 1991).
The topics in this chapter are divided into three groups. The first group,
Section I, provides background and history of the mathematical structures
pertinent to lattice transforms, namely mathematical morphology, the
minimax algebra, and the image algebra. Section II lays the theoretical foundation for lattice transformations in image processing and presents detailed discussions on the three algebras and the relationship among them. Section III gives four major applications of the theory to specific problems.
The first, mapping of minimax algebra properties to image algebra, describes
how a series of minimax algebra results can be readily formulated in an image
processing environment, thus providing new tools for solving a certain class
of image processing problems. The second is a general skeletonizing technique, which can be viewed as a division algorithm. Third, an application to image complexity measures is presented. Finally, the dual transportation problem in the context of lattice transforms is stated.

A. Lattice Structures in Image Processing

The algebraic structures of early image processing languages, such as mathematical morphology, had no obvious connection with a lattice
structure. Those algebras were developed to express binary image manipu-
lation. As the extension to gray-value images developed, the notions of
performing maximums and minimums over a set of numbers emerged.
Formal links to lattice structures were not developed until very recently
(Davidson, 1989; Heijmans, 1990; Serra, 1988). We present background in
this area, showing how the lattice properties were inherent in the structures
being investigated.
Mathematical morphology is an algebraic structure used in image pro-
cessing to analyze the shape content in images. Matheron (1967) and Serra
(1969) applied Minkowski's vector addition of sets (1903) to analyze boolean
images at the School of Mines in Paris. A boolean, or binary, image is one
that can assume two values, usually 0 or 1. Later, Sternberg (1986)
extended the notions to include gray-value images, that is, images having
more than two values. The algebraic operations developed by Serra and
Sternberg are equivalent and based on the operations of Minkowski addition
and Minkowski subtraction of sets in R^n, the n-dimensional Euclidean space. Given A ⊂ R^n and B ⊂ R^n, Minkowski addition is defined by

A + B = {a + b : a ∈ A, b ∈ B}

and Minkowski subtraction is defined by

A/B = (A^c + B)^c,

where the superscript c denotes set complementation. It was actually Hadwiger who
defined Minkowski subtraction (1950). The value set underlying the gray-value mathematical morphology structure is the set R_{-∞} = R ∪ {−∞}, the real numbers with −∞ adjoined. Sternberg's functional notation is most often used to express the two morphological operations, as it is simply stated and easy to implement in computer code. The gray-value operations of dilation and erosion, corresponding to Minkowski addition and subtraction, respectively, are

D(x, y) = ∨_{i,j} { A(x − i, y − j) + B(i, j) },
E(x, y) = ∧_{i,j} { A(x − i, y − j) − B(−i, −j) },

where A and B are real-valued functions on R².
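To make the two formulas concrete, here is a minimal NumPy sketch (our own illustration, not notation from the chapter) that computes gray-value dilation and erosion of small arrays, treating points outside the image as −∞ for dilation and +∞ for erosion, the null values of the respective lattices:

```python
import numpy as np

def gray_dilate(a, b):
    """D(x, y) = max over (i, j) of { a(x - i, y - j) + b(i, j) }."""
    a = np.asarray(a, dtype=float)
    m, n = b.shape
    rows, cols = a.shape
    # Outside the image, a is -inf, the null element of (R_-oo, v, +).
    pad = np.pad(a, ((m - 1, m - 1), (n - 1, n - 1)), constant_values=-np.inf)
    out = np.full(a.shape, -np.inf)
    for i in range(m):
        for j in range(n):
            shifted = pad[m - 1 - i:m - 1 - i + rows, n - 1 - j:n - 1 - j + cols]
            out = np.maximum(out, shifted + b[i, j])
    return out

def gray_erode(a, b):
    """E(x, y) = min over (i, j) of { a(x + i, y + j) - b(i, j) },
    which is the erosion formula above after substituting (i, j) -> (-i, -j)."""
    a = np.asarray(a, dtype=float)
    m, n = b.shape
    rows, cols = a.shape
    pad = np.pad(a, ((m - 1, m - 1), (n - 1, n - 1)), constant_values=np.inf)
    out = np.full(a.shape, np.inf)
    for i in range(m):
        for j in range(n):
            shifted = pad[m - 1 + i:m - 1 + i + rows, n - 1 + j:n - 1 + j + cols]
            out = np.minimum(out, shifted - b[i, j])
    return out

a = np.array([[0, 1, 0], [1, 3, 1], [0, 1, 0]])
b = np.zeros((3, 3))            # a flat structuring element
d = gray_dilate(a, b)           # local maximum of a over the 3 x 3 window
e = gray_erode(a, b)            # local minimum of a over the 3 x 3 window
```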


It is well known that R_{±∞} = R ∪ {−∞, +∞} is a complete lattice (Birkhoff, 1940). The lattice structure provides the basis for categorizing certain classes of image processing problems. This is discussed in more detail in Section II.
A lattice transform can be described in the following way. Let a be a vector from R^n_{-∞}, and let t be an m × n matrix with values in R_{-∞}. Then, when t is applied to a as per the following equation, we can view t as transforming a to the m-tuple b:

b_i = ∨_{k=1}^{n} (t_{ik} + a_k).

This is essentially the definition of the minimax matrix product as described in Cuninghame-Green's monograph Minimax Algebra (1979). Applications
were first investigated in the area of operations research, which has long been
known for its class of problems in optimization. The types of optimization
problems that Cuninghame-Green considered used arithmetic operations
different from the usual multiplication and summation. Some machine-
scheduling and shortest-path problems, for example, could be best charac-
terized by a nonlinear system using additions and maximums. The minimax
algebra is a matrix calculus that uses a special case of a generalized matrix
product (Cohen, 1988), where matrices and vectors assume values from a
lattice. By adding a few more conditions, such as a group operation on the
lattice, and the self-duality of the resulting structure, Cuninghame-Green was
able to develop a solid mathematical foundation in which to pose a wide
variety of operations research questions. It turns out that mathematical
morphology is a special subalgebra of the minimax algebra, the details of
which are presented in Section II. Much theoretical and applied work has
been done in the area of mathematical morphology. The generalization of
morphology to lattice transforms is intended to extend the knowledge
already gathered in this area, not to supplant it.
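For readers who think in code, the transform b_i = ∨_k (t_{ik} + a_k) is a one-line computation in NumPy; the helper below is our own illustrative sketch, not notation from the minimax literature:

```python
import numpy as np

def maxplus_matvec(t, a):
    """b_i = max_k ( t[i, k] + a[k] ): the minimax (max-plus) matrix-vector product."""
    return np.max(t + a[np.newaxis, :], axis=1)

t = np.array([[0.0, -np.inf],     # -inf is the null element of (R_-oo, v, +)
              [2.0, 1.0]])
a = np.array([3.0, 5.0])
print(maxplus_matvec(t, a))       # [3. 6.]
```

Replacing max/+ with sum/* in this helper would give the ordinary matrix-vector product, which is exactly the parallel this chapter develops.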

B. Image Algebra and Its Relation to Image Processing

The idea of establishing a unifying theory for concepts and operations encountered in image and signal processing has been pursued since the
advent of computers. It was the 1950s work of von Neumann that inspired
Unger to propose a “cellular array” machine on which to implement, in
parallel, many algorithms for image processing and analysis (von Neumann,
1951; Unger, 1958). Among the machines embodying the original automaton
envisioned by von Neumann are NASA’s massively parallel processor (MPP)
(Batcher, 1980), and the CLIP series of computers developed by Duff and his
colleagues (Duff, 1982; Fountain et al., 1988). More general cellular array computers include pyramids (Uhr, 1983) and the Connection Machine by Thinking Machines Corporation (Hillis, 1985).
Many of the operations that cellular array machines perform can be
expressed by a set of primitives, or simple elementary operations. One
opinion of researchers who design parallel image processing architectures is
that a wide class of image transformations can be represented by a small set
of basic operations that induce these architectures. Matheron and Serra
developed a set of two primitives that formed the basis for the initial develop-
ment of a theoretical formalism capable of expressing a large number of
algorithms for image processing and analysis. Special purpose parallel archi-
tectures were then designed to implement these ideas. Several such systems
are Matheron and Serra’s Texture Analyzer (Klein and Serra, 1982), the
Cytocomputer at the Environmental Research Institute of Michigan (ERIM)
(Sternberg, 1983; McCubbrey and Lougheed, 1985), and Martin Marietta's GAPP (Cloud and Holsztynski, 1984).
The basic mathematical formalism associated with these cellular architectures consists of the concepts of pixel neighborhood arithmetic and mathematical morphology. Mathematical morphology is a mathematical structure used in
image processing to express image processing transformations by the use of
structuring elements, which are related to the shape of the objects to be
analyzed. The origins of mathematical morphology lie in work done by
Minkowski and Hadwiger on geometric measure theory and integral
geometry (Minkowski, 1903, 1911; Hadwiger, 1957). It was Matheron (1967)
and Serra (1982) who used Minkowski’s operation as a basis for describing
morphological image transformations. Mathematical morphology has since
become a very active area in image processing, producing many useful
results. Some recent research papers on morphological image processing are
Crimmins and Brown (1989), Maragos and Schafer (1987), Heijmans (1990), Haralick et al. (1987), and Sinha and Giardina (1990).
It was Serra and Sternberg who first unified morphological concepts into
an algebraic theory specifically focusing on image processing and image
analysis. The first to use the term “image algebra” was, in fact, Sternberg
(1980b, 1985). Recently, a new theory encompassing a large class of linear
and nonlinear systems was put forth by Maragos (1985). However, despite
these profound accomplishments, morphological methods have some well
known limitations. They cannot, with the exception of a few simple cases,
express some fairly common image processing techniques, such as Fourier-
like transformations, feature extraction based on convolution, histogram
equalization transforms, chain-coding, and image rotation. At Perkin-Elmer, Miller demonstrated that a straightforward and uncomplicated target detection algorithm, furnished by the U.S. Government, could not be expressed using a morphologically-based image algebra (1983).
The morphological image algebra is built on the Minkowski addition and
subtraction of sets (Hadwiger, 1957), and the set-theoretic formulation of its
basic operations prohibits mathematical morphology from being used as a
basis for a general purpose algebra-based language for digital image pro-
cessing. Morphological operations ignore the linear domain, transformations
between different domains (spaces of different dimensionalities), and trans-
formations between different value sets, e.g., sets consisting of real, complex,
or vector-valued numbers. The image algebra that was developed at the
University of Florida under United States Air Force (USAF) funding
includes these concepts and also incorporates and extends the morphological
operations. Morphology operations form a subset of a more general class of
operations, or lattice transforms, which, in turn, form a subalgebra of the
image algebra. Henceforth, we refer to the USAF image algebra simply as
"image algebra." A full discussion of the entire image algebra is presented by
Ritter et al. (1990). The main focus of this work is to place lattice transforms
as used in image processing in a mathematically rigorous environment using
image algebra and a matrix-based algebra called minimax algebra, and to
demonstrate its value with several applications.

II. THEORETICAL FOUNDATION OF LATTICE TRANSFORMS IN IMAGE PROCESSING

Underlying most successful engineering projects are the theoretical foundations on which the applications are based. A very successful image pro-
cessing concept is mathematical morphology, whose theory is well developed.
However, it is possible to place mathematical morphology in a more general
setting while gaining a wealth of potentially useful applications. This section
describes the general algebraic environment and discusses the relationship
among the three algebraic structures involved, minimax algebra, image
algebra, and mathematical morphology. One way we will use image algebra
is for ease of application of the minimax algebra results to image processing
problems. The theoretical base of lattice transforms, however, is the minimax
algebra, which we discuss first.

A. Minimax Algebra

Since 1950, several different authors have discovered, apparently independently, a nonlinear algebraic structure, which each author has used to
solve a different type of problem. The operands of this algebra are the real
numbers, with −∞ (or +∞) adjoined, with the two binary operations of
addition and maximum (or minimum). The extension of this structure to
matrices was formalized mathematically by Cuninghame-Green in his book
Minimax Algebra (1979). It is well known that the structure of R with the operations of + and ∨ is a semi-lattice ordered group, and that (R, ∨, ∧, +) is a lattice-ordered group, or an l-group (Birkhoff, 1940). Viewing R_{-∞} = R ∪ {−∞} as a set with the two binary operations of + and ∨, and then investigating the structure of the set of all n × n matrices with values in R_{-∞} leads to an entirely different perspective of a class of nonlinear
operators. These ideas were applied by Shimbel (1954) to communications networks, and to machine-scheduling by Cuninghame-Green (1960, 1962) and Giffler (1960). Others (Peteanu, 1967; Benzaken, 1968; Carré, 1971; Backhouse and Carré, 1975) have discussed their usefulness in appli-
cations to shortest-path problems in graphs. Several examples are given in
Cuninghame-Green (1979), primarily in the field of operations research. Another useful application, to image processing via image algebra, was again independently developed by Ritter (1990).
Minimax algebra is an algebraic structure of matrices and vectors whose
underlying value set is a bounded lattice-ordered group, or bounded l-group. Properties of bounded l-groups induce properties on the set of matrices, and thus the resultant matrix structure is very dependent on the specific attributes of bounded l-groups. This runs parallel to the theoretical foundations for
linear algebra. For example, the properties of the vector space of all m × n
matrices over the field of complex numbers C are intimately related to the
algebraic properties of C. Also, the analysis and continued investigation of
linear transforms as applied to image processing is greatly facilitated by the
theoretical foundations of linear algebra. A well-utilized fact in many areas
of engineering and science is that an arbitrary linear transformation can be
represented by a matrix and vectors manipulated by established theoretical
results. Similarly, the basis for the development of the minimax matrix
algebra lies on the structure inherent in a bounded l-group. The set of extended real numbers R_{±∞} = R ∪ {−∞, +∞} with the lattice operations ∨ and ∧ and the group operation of addition is an example of a bounded l-group.
Both the usual matrix product and the minimax matrix product are special
cases of the generalized matrix product (Cohen, 1988), whose definition is
given below.

Let F denote a set of numbers. Let f and g be functions from F × F into F. For simplicity, assume the binary operation f to be associative. Let F_{mp} denote the set of all m × p matrices with values in F, and let (a_{ij}) = A ∈ F_{mp} and (b_{jk}) = B ∈ F_{pn}. Define f·g to be the function from F_{mp} × F_{pn} into F_{mn} given by

(f·g)(A, B) = C,

where c_{ik} = (a_{i1} g b_{1k}) f (a_{i2} g b_{2k}) f · · · f (a_{ip} g b_{pk}), for i = 1, . . . , m, k = 1, . . . , n, and f and g are viewed as binary operations.

Thus, if f denotes addition and g multiplication, then (f·g)(A, B) is the ordinary matrix product of matrices A and B. The formal matrix calculus based on the two binary operations f = ∨ and g = + defined on the extended real numbers was developed by Cuninghame-Green (1979), who
called it minimax matrix theory. The development of the theory is performed
in the abstract, with an eye toward applications of matrices having values in
the extended real numbers.
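A short sketch of the generalized matrix product follows; it parameterizes the product by the two operations f and g exactly as in the definition, so that (f, g) = (+, *) yields the ordinary product and (f, g) = (∨, +) yields the minimax product. The function name and the reduce-based formulation are our own choices.

```python
import numpy as np
from functools import reduce

def general_matmul(f, g, A, B):
    """(f.g)(A, B) = C, where c_ik = (a_i1 g b_1k) f (a_i2 g b_2k) f ... f (a_ip g b_pk)."""
    m, p = A.shape
    p2, n = B.shape
    assert p == p2, "inner dimensions must agree"
    C = np.empty((m, n))
    for i in range(m):
        for k in range(n):
            C[i, k] = reduce(f, (g(A[i, j], B[j, k]) for j in range(p)))
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

ordinary = general_matmul(lambda x, y: x + y, lambda x, y: x * y, A, B)  # the usual A @ B
minimax  = general_matmul(max, lambda x, y: x + y, A, B)                 # the max-plus product
```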
For the remainder of this work, we assume that the reader is familiar with
basic abstract algebra and lattice theory concepts. Otherwise, two good
sources to consult are Fraleigh (1967) and Birkhoff (1940). We will describe some basic concepts necessary for understanding lattice transforms, some of which can be found in Cuninghame-Green (1979) and Birkhoff (1940).
A semi-lattice ordered semi-group F with semi-lattice operation ∨ and semi-group operation × can be viewed as a set with two binary operations, ∨ and ×. Recalling the notion of an algebraic ring, which is also a set with two binary operations satisfying certain criteria, the previous structure F is called a belt (Cuninghame-Green, 1979). A familiar example of a belt is (F, ∨, ×) = (R, ∨, +), where ∨ is the lattice operation of least upper bound (l.u.b. or maximum), and + is real addition. If in addition F is assumed to have the dual operations ∧ and ×′, and also satisfies x ∨ (y ∧ x) = x ∧ (y ∨ x) = x for all x, y in F, and ×′ = ×, then F is a lattice-ordered semi-group. Here, ∧ is the lattice operation of greatest lower bound (g.l.b. or minimum). If the operation × is distinct from the operation ∨, and the operation × makes F a group, and we still have ×′ = ×, then (F, ∨, ∧, ×, ×′) is a lattice-ordered group, or simply l-group. The structure
F can still be viewed as a set with two binary operations plus their respective
dual operations. This structure parallels the structure (C, +, −, *, ÷), where C is the set of complex numbers, + and − denote complex addition and subtraction, respectively, and * and ÷ denote complex multiplication and division, respectively. When taken in the context of linear transforms, C is viewed as a set with two binary operations plus their inverse operations. While an l-group has notions of identity elements, commutativity of elements under its operations, etc., we will be interested primarily in two specific cases. The main case is (F, ∨, ∧, ×, ×′) = (R, ∨, ∧, +, +), and the other case is (F, ∨, ∧, ×, ×′) = (R^+, ∨, ∧, *, *). Here

R^+ = {r ∈ R : r > 0}.

Note that R and R^+ are isomorphic both as groups and as lattices. Hence, most of our discussions will be limited to R, with the understanding that, with the appropriate substitutions of notation, the results proven for R will also hold for R^+. The isomorphism is given by the function f : R → R^+ defined by f(x) = e^x.
An arbitrary l-group F having two distinct binary operations ∨ and × can be extended in the following way. We adjoin the elements −∞ and +∞ to the set F and denote this new set by F_{±∞}, where −∞ < x < +∞ for all x ∈ F. The operations × and ×′ are defined in the following way. If x and y are elements in F then x × y is already defined. Otherwise,

x × (−∞) = (−∞) × x = −∞   for x ∈ F_{-∞},
x × (+∞) = (+∞) × x = +∞   for x ∈ F_{+∞},
x ×′ (−∞) = (−∞) ×′ x = −∞   for x ∈ F_{-∞},
x ×′ (+∞) = (+∞) ×′ x = +∞   for x ∈ F_{+∞},
(+∞) × (−∞) = (−∞) × (+∞) = −∞,
(+∞) ×′ (−∞) = (−∞) ×′ (+∞) = +∞.

The second to last rule ensures that −∞ acts as a null element in the entire system (R_{±∞}, ∨, +), while the last rule introduces an asymmetry between the operations × and ×′ with respect to the elements +∞ and −∞. A belt has an identity element φ under the operation × if the set F is a group; in this case, φ is the identity element of the group. Also, a belt has a null element θ if for all x in F, x ∨ θ = x and x × θ = θ × x = θ. The system (F_{±∞}, ∨, ∧, ×, ×′) is called a bounded l-group (Cuninghame-Green, 1979). A simple bounded l-group is the 3-element one, denoted by F₃. Here, F₃ = {−∞, 0, ∞}. Of course, the one we will be most interested in is for F = R.
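One practical remark: IEEE floating-point arithmetic returns NaN for ∞ + (−∞), so the two asymmetric rules above must be coded explicitly when F = R and × is +. The following small sketch (the function names are ours) makes the null-element behavior visible:

```python
import math

NEG, POS = -math.inf, math.inf

def add_lower(x, y):
    """x x y on R_{+-oo} with x = +: -oo dominates, so (+oo) + (-oo) = -oo."""
    if NEG in (x, y):
        return NEG
    return x + y              # covers finite values and (+oo) + finite = +oo

def add_upper(x, y):
    """The dual x x' y: +oo dominates, so (+oo) +' (-oo) = +oo."""
    if POS in (x, y):
        return POS
    return x + y

assert add_lower(POS, NEG) == NEG   # -oo acts as the null element of (R_{+-oo}, v, +)
assert add_upper(POS, NEG) == POS   # the asymmetry between x and x'
```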
Let (F, ∨, ×) be a belt and let (T, ∨) be a semi-lattice. Suppose we have a right multiplication of elements of T by elements of F:

x × λ ∈ T for all pairs x, λ, where x ∈ T, λ ∈ F.

We say that (T, ∨) is a right semi-lattice space over (F, ∨, ×), or simply, T is a space over F, if the following holds for all x, y ∈ T and for all λ, μ ∈ F:

(x × λ) × μ = x × (λ × μ),
(x ∨ y) × λ = (x × λ) ∨ (y × λ),
x × (λ ∨ μ) = (x × λ) ∨ (x × μ),
x × φ = x.

These play the role of vector spaces in the minimax theory.
Let (S, ∨), (T, ∨) be given spaces over a belt (F, ∨, ×). A semi-lattice homomorphism g : (S, ∨) → (T, ∨) is called right-linear over F if

g(x × λ) = g(x) × λ for all x ∈ S and all λ ∈ F.

The set of all right-linear homomorphisms from S to T is denoted by Hom_F(S, T).

By defining a left multiplication of elements of T by elements of F, we can define a left space over F. This leads to a two-sided space, which is a triple (L, T, R) such that

L is a belt and T is a left space over L;
R is a belt and T is a right space over R;
for all λ ∈ L, x ∈ T and ρ ∈ R: λ × (x × ρ) = (λ × x) × ρ.
An important class of spaces over F is the class of function spaces. Here, the semi-lattice is (F^U, ∨). Such spaces are naturally two-sided. We are interested only in the case |U| = n ∈ Z^+, the set of positive integers.

When discussing conjugacy in linear operator theory, two approaches are commonly used. One defines the conjugate of a given space S as a special set S* of linear, scalar-valued functions defined on S. The other involves defining an involution taking x ∈ S to x* ∈ S* which satisfies certain axioms. (Recall that a function f is an involution if f(f(x)) = x.) The situation is slightly more complicated in the case of lattice transforms.
Let (S, ∨, ×) and (T, ∧, ×′) be given belts. We say that (T, ∧, ×′) is conjugate to (S, ∨, ×) if there is a function g : S → T such that

1. g is bijective;
2. for all x, y ∈ S, g(x ∨ y) = g(x) ∧ g(y);
3. for all x, y ∈ S, g(x × y) = g(y) ×′ g(x).

In the context of lattice theory, the function g is a dual isomorphism. Note that conjugacy is a symmetric relation. If (S, ∨, ∧) is a semi-lattice with duality satisfying (1) and (2), then S is called self-conjugate. If (S, ∨, ×, ∧, ×′) is a belt with duality, then (S, ∨, ×, ∧, ×′) is self-conjugate if (S, ∨, ×) is conjugate to (S, ∧, ×′).
The operation of addition in C induces an additive inverse, the difference of two complex numbers: p − q = p + (−q). Similarly, the lattice operation ∨ of a bounded l-group allows for the definition of an additive conjugate element as follows. If r ∈ R_{±∞}, then the additive conjugate of r is the unique element r* defined by

r* = −r if r ∈ R,  r* = −∞ if r = +∞,  r* = +∞ if r = −∞.  (1)

Thus, (r*)* = r. This gives the following relation:

r ∧ u = (r* ∨ u*)*

for all r, u in R_{±∞}. If the value set is R_0^∞ = R^+ ∪ {0, ∞}, then every element r ∈ R_0^∞ has a multiplicative conjugate r̄ defined by

r̄ = 1/r if r ≠ 0 and r ≠ +∞,  r̄ = 0 if r = +∞,  r̄ = +∞ if r = 0.  (2)

Hence, the conjugate of r̄ is again r, and

r ∧ u = (r̄ ∨ ū)̄
for all r, u in R_0^∞.
There are two types of operations defined on matrices having values in a bounded l-group. Specifically, if A = (a_{ij}) and B = (b_{ij}) are two m × n matrices having entries in the set R_{±∞}, then the pointwise maximum A ∨ B is defined as

A ∨ B = C, where c_{ij} = a_{ij} ∨ b_{ij}.  (3)

If A is m × p and B is p × n, the product of A and B is the matrix C, C = A × B, which has size m × n and values

c_{ij} = ∨_{k=1}^{p} (a_{ik} + b_{kj}).  (4)

If n = 1 then we have the matrix-vector product

A × B = C, where c_i = ∨_{k=1}^{p} (a_{ik} + b_k).

If the value set is R_0^∞, then the pointwise maximum between two matrices has the same definition as (3), but the product is defined as

c_{ij} = ∨_{k=1}^{p} (a_{ik} * b_{kj}).  (5)

The bounded l-group in this case is (R_0^∞, ∨, ∧, *, *′). In fact, since (R^+, ∨, *) is isomorphic to (R, ∨, +) both as groups and as lattices, the corresponding bounded l-groups are also isomorphic. Hence, without loss of generality, the discussion at hand can be limited to the bounded l-group R_{±∞}, with the understanding that all results proven using R_{±∞} hold for R_0^∞ with the corresponding operations in R_0^∞. We remark that in R_{±∞}, the "bottom" element is −∞, while the "top" element is +∞. In R_0^∞, the bottom element is 0, while the top element is ∞.

Any transformation that has the form of Eq. (4) or (5) is called a lattice transform. As should be clear by now, it is not a coincidence that these definitions have the same character as the familiar operations of matrix-matrix addition and product.
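The isomorphism f(x) = e^x lets one check Eq. (4) against Eq. (5) numerically: exponentiating a max-plus product must give the max-times product of the exponentiated matrices. The sketch below (with ad hoc helper names of our own) does exactly that for small dense matrices:

```python
import numpy as np

def maxplus(A, B):
    """Eq. (4): c_ij = max_k ( a_ik + b_kj )."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def maxtimes(A, B):
    """Eq. (5): c_ij = max_k ( a_ik * b_kj ), for matrices over R_0^oo."""
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[1.0, 0.0], [3.0, 2.0]])

lhs = np.exp(maxplus(A, B))            # image of the max-plus product under x -> e^x
rhs = maxtimes(np.exp(A), np.exp(B))   # max-times product of the images
assert np.allclose(lhs, rhs)           # the two bounded l-groups are isomorphic
```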
The structures underlying the linear (R, +, *) and the nonlinear (R, ∨, +) algebras have distinctly different numeric properties. However, when each is extended to its own matrix algebra, the resulting matrix algebras have a remarkably similar character. For instance, the notions of solutions to systems of equations, linear independence, rank of
matrices, matrix norms, eigenvalues, eigenvectors, eigenspaces, spectral inequalities, matrix inverses, and equivalence of matrices occur in both
algebras. It is exactly these linear algebraic concepts that provide tools to help
solve linear image processing problems, and the minimax algebra will un-
doubtedly prove to be as useful.
Within the minimax algebra, there are notions of matrix associativity,
commutativity, identity matrices, and so forth. These properties will not be
listed here, but will be called upon as needed in the future. A more detailed
and comprehensive presentation of minimax matrix properties can be found
in the book Minimax Algebra (Cuninghame-Green, 1979).
In Section II.C it will be shown that there is a one-to-one correspondence
between the minimax algebra and two subalgebras of the image algebra. With
this relationship established, all results established in the minimax algebra
become available for solving image processing problems related to lattice
transforms. In addition, the mathematical foundation laid by the minimax
algebra enables solutions to these types of problems to be approached in a
rigorous mathematical way.

B. Image Algebra

This section provides the basic definitions and notation that will be used for
the image algebra throughout this chapter. We will define only those image
algebra concepts necessary to describe ideas in this document. For a full
discourse on all image algebra operands and operations, we refer the reader
to Ritter et al. (1990).
The image algebra is a heterogeneous algebra, in the sense of Birkhoff and
Lipson (1970), and is capable of describing image manipulations involving
not only single-valued images, but multivalued images, although here we
shall restrict our discussion to single-valued image manipulation. In fact, it
has been formally proven that the set of operations is sufficient for expressing
any image-to-image transformation defined in terms of a finite algorithmic
procedure, and also that the set of operations is sufficient for expressing any
image-to-image transformation for an image that has a finite number of gray
values (Ritter et al., 1987b; Ritter and Wilson, 1987). In addition, since the
lattice properties parallel many of the linear ones, definitions presented will
focus on both the linear and lattice properties of the image algebra.
The six operands of the image algebra are value sets, point sets, the elements
of each of these sets, images, and templates.

1. Value Sets, Point Sets, and Images


A value set is simply a semi-group, a set with an associative binary operation.
Value sets that interest us are R, the set of real numbers; C, the set of complex numbers; Z, the set of integers; R_{-∞}, R_{+∞}, and R_{±∞}, the sets of real numbers with one or both of −∞ and +∞ adjoined; and R^+ = {r ∈ R : r > 0}, R^+_0 = R^+ ∪ {0}, R^+_∞ = R^+ ∪ {∞}, and R_0^∞ = R^+ ∪ {0, ∞}, the positive real numbers with one or both of 0 and ∞ adjoined. An unspecified value set will be denoted by F. The main importance of value sets is that the elementary operations associated with a value set F induce the basic operations defined for images and templates. Operations on the value set F include the natural operation(s) associated with F.

Point sets are subsets of n-dimensional Euclidean space R^n. The letters X, Y, and W are reserved to denote point sets. Operations on a point set X include the set-theoretic operations associated with subsets of R^n. A useful unary operation is the cardinality of the set X, denoted by |X|. If X is finite, we write |X| < ∞.
The most fundamental of the algebra's operands are images. Let X be a point set and F a value set. Then an F-valued image a on X is the graph of a function a : X → F, and thus has form

a = {(x, a(x)) : x ∈ X, a(x) ∈ F}.

The set of all F-valued images on X is denoted by F^X, following the usual mathematical convention where B^A = {f : f is a function from set A to set B}. An element (x, a(x)) is called a picture element, or pixel, where x is the pixel location and a(x) is the pixel value at location x. A point set corresponds to the set of pixel locations in an image.

2. Operations on and between Images

The algebraic system F induces operations on and between F-valued images. For the value set (R, +, *) and for real-valued images a, b ∈ R^X, we have

a + b ≡ c = {(x, c(x)) : c(x) = a(x) + b(x), x ∈ X}  (6)
a * b ≡ c = {(x, c(x)) : c(x) = a(x) * b(x), x ∈ X}.  (7)

These two pointwise operations result in an image c on X. The binary operation of image-image dot product is defined, for |X| < ∞ and for a, b ∈ R^X, as

a • b = Σ_{x∈X} a(x)b(x).

Note that a • b is a real number, whereas the operations (6) and (7) result in an image.
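Since an image is literally the graph of a function, a finite image can be represented in code as a mapping from points to values. The dictionary-based sketch below (a deliberately minimal representation of ours, not an image algebra implementation) realizes Eqs. (6) and (7) and the dot product:

```python
# The point set X of a 2 x 2 grid, and two real-valued images a, b on X.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
a = {x: 1.0 + x[0] for x in X}
b = {x: 2.0 * x[1] for x in X}

add  = {x: a[x] + b[x] for x in X}       # Eq. (6): pointwise sum, an image on X
mult = {x: a[x] * b[x] for x in X}       # Eq. (7): pointwise product, an image on X
dot  = sum(a[x] * b[x] for x in X)       # a . b: a single real number, not an image
```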
Two other common operations used in image processing are exponentiation and logarithms. If a and b are real-valued images on X, then

a^b = {(x, c(x)) : c(x) = a(x)^{b(x)} if a(x) ≠ 0, and c(x) = 0 otherwise, x ∈ X}.

Similar to the situation for real exponentiation, we say the image a^b is undefined if a(x)^{b(x)} is undefined (not a real number) for some element x in X. The inverse operation, taking a logarithm, is defined by

log_a(b) = {(x, c(x)) : c(x) = log_{a(x)} b(x), x ∈ X}.

Again, as for real logarithms, log_a(b) is defined only when a(x) > 0 and b(x) > 0 for all x in X.

The remaining operations on R^X are described in terms of these basic ones or are induced by corresponding operations on R.
One useful concept is that of a constant image. An image a ∈ F^X is a constant image if a(x) = k ∈ F for all x in X. Several important constant images are the zero image, denoted by 0 and defined by 0 = {(x, 0) : x ∈ X}; the identity image 1, defined by 1 = {(x, 1) : x ∈ X}; the (negative) null image −∞ = {(x, −∞) : x ∈ X}; and the positive null image +∞ = {(x, +∞) : x ∈ X}. The negative null image is referred to as simply the null image, while the positive null image retains the qualifier positive.

If b ∈ R^X and a is a constant image with a(x) = k, then we define

b^k = b^a,
k * b = a * b and k + b = a + b,
log_k(b) = log_a(b).

We also have

a − b = a + (−b) = a + (−1) * b,
a/b = a * b^{−1}.
As discussed in Section II.A, the algebraic structure (R_{-∞}, ∨, +) mimics algebraically many of the properties of (R, +, *). The binary operation of maximum between two images a, b ∈ R^X_{-∞} is defined as

a ∨ b = {(x, c(x)) : c(x) = a(x) ∨ b(x), x ∈ X}.

If a is a constant image with a(x) = k and b ∈ R^X_{-∞}, then we define k ∨ b = a ∨ b.

The additive conjugate defined for the bounded l-group R_{±∞} induces an additive conjugate image a* for an image a ∈ R^X_{±∞}. This is defined by a*(x) = [a(x)]*, where the * refers to the definition as in (1). Thus we have (a*)* = a, and if a ∈ R^X_{-∞}, then a* is a member of the dual space: a* ∈ R^X_{+∞}. Since R_{±∞} is a lattice, for images a, b ∈ R^X_{±∞} we have

a ∧ b = (a* ∨ b*)*.
In addition to these operations on F-valued images, we also allow unary operations on F to induce unary operations on F^X. In particular, if f : F → F is a function, then f induces a function, also denoted by f, such that f : F^X → F^X, defined by

f(a) = b = {(x, b(x)) : b(x) = f(a(x)), x ∈ X}.

The functions

sin(a) = {(x, sin(a(x))) : x ∈ X}

and

|a| = {(x, |a(x)|) : x ∈ X}

are examples of this.
A common unary image processing function is the characteristic function, of which the image algebra allows a generalization. Let (F, ○, ×) be a set with two commutative and associative binary operations ○ and ×. Further, assume that F has a unique element I₁ under ○ and also a unique element I₂ under × satisfying

r ○ I₁ = I₁ ○ r = r for all r ∈ F,

and

r × I₂ = I₂ × r = r for all r ∈ F.

Let 2^X denote the power set of X (the set of all subsets of X). Given a ∈ F^X and S ∈ (2^F)^X, we define the characteristic function of a ∈ F^X with respect to S to be

χ_S(a) = b = {(x, b(x)) : b(x) = I₂ if a(x) ∈ S(x), and b(x) = I₁ otherwise}.

Note that (F, ○, ×) = (R, +, *) satisfies the above conditions with I₁ = 0 and I₂ = 1, as does (F, ○, ×) = (R^+_0, ∨, *) with I₁ = 0 and I₂ = 1. Hence, we have

χ_S(a) = b, where b(x) = 1 if a(x) ∈ S(x) and b(x) = 0 otherwise.
For example, image thresholding on real-valued images can be expressed using the characteristic function. Let S(x) = K for all x in X; that is, K is a constant subset of R for all x in X. Suppose K = {r ∈ R : r ≥ T} for some threshold T ∈ R. Then

χ_S(a) = b, where b(x) = 1 if a(x) ∈ K and b(x) = 0 otherwise.

Thus χ_S(a) marks those pixel locations where a(x) meets or exceeds the threshold value T. To reflect this specific case of the characteristic function, we write χ_{≥T}(a) instead of χ_S(a). We can define χ_{>T}(a), χ_{≤T}(a), χ_{<T}(a), and χ_{=T}(a) in a similar way.
Also, note that (F, ○, ×) = (R_{-∞}, ∨, +) satisfies the above conditions with I₁ = −∞ and I₂ = 0. To denote the use of the characteristic function with the value set R_{-∞} we use the symbol χ_S^{-∞}(a). Hence,

χ_S^{-∞}(a) = b, where b(x) = 0 if a(x) ∈ S(x) and b(x) = −∞ otherwise.

Similarly, and with a similar notation, if (F, ○, ×) = (R_{+∞}, ∧, +), I₁ = +∞ and I₂ = 0, then

χ_S^{+∞}(a) = b, where b(x) = 0 if a(x) ∈ S(x) and b(x) = +∞ otherwise.
Other concepts that apply to functions, such as the extension and the
restriction, also apply to images. However, we omit further discussion on
image properties and refer the reader to Ritter et al. (1990).

3. Generalized Templates
Templates and template operations are the most powerful tools of the image
algebra. A template as defined in the image algebra not only unifies but
generalizes the familiar concepts of templates, masks, windows, and
neighborhood functions as used in the image processing community. In
particular, image algebra templates generalize the notion of structuring
elements as used in mathematical morphology (Ritter et al., 1987a).
Let X and Y be point sets and F a value set. An F-valued generalized template t from Y to X is an element of (F^X)^Y. For each y ∈ Y, t(y) is an image on X. Denoting t(y) by t_y for notational convenience, we have

t_y = {(x, t_y(x)) : x ∈ X, t_y(x) ∈ F}.

The set of all F-valued templates from Y to X is denoted by (F^X)^Y. A template t can be viewed as a collection of images, {t_y}_{y∈Y}. See Fig. 1 for an example of a variant template from Y to X where X ≠ Y. In this example, Y is a 6 × 10 array, X is a 3 × 5 array, and the value set F = R. For each j = 1, . . . , 60, there is an image t_{y_j} assigned to y_j. For instance, for j = 1, t_{y_1}(x_1) = 1,
FIGURE 1. Example of a variant template from Y to X.

t_{y_1}(x_i) = 0, i = 2, . . . , 15; and t_{y_25}(x_8) = 1, t_{y_25}(x_i) = 0, i ≠ 8. This template, when combined with an image a ∈ R^X via the ⊕ operation (discussed in Section II.B.4), will magnify a by a factor of 2, producing an image b on Y that is a magnified copy of a.
If we let F^X = G, then G is a value set in its own right. For F = R, G is a semi-group under each of the operations of function addition, function multiplication, and function maximum. Thus we can view t ∈ (F^X)^Y = G^Y as a G-valued image on Y.

The point y of t_y is called the target point of t. If F ∈ {R, C}, then for t ∈ (F^X)^Y, the set

S(t_y) = {x ∈ X : t_y(x) ≠ 0}

is called the support of t_y. If t ∈ (R^X_{-∞})^Y we define the (negative) infinite support S_{-∞}(t_y) to be the set

S_{-∞}(t_y) = {x ∈ X : t_y(x) ≠ −∞},

while the (positive) infinite support S_{+∞}(t_y) is defined to be the set

S_{+∞}(t_y) = {x ∈ X : t_y(x) ≠ +∞}.
All linear and lattice transforms can be represented by an appropriate
template. The isomorphisms embedding minimax algebra and mathematical
morphology into image algebra present an algorithmic procedure by which
to do this. We discuss this in Section II.C in more detail.

The support of a template plays an important role in theoretical research as well as in applications. For example, most transforms cannot be imple-
mented directly on a parallel architecture. Instead, a particular transform
must be mapped to a specific architecture, that is, the limitations of the
machinery or architecture upon which the transform is to be implemented
must be represented in the mathematical expression of the transform. In the
case of neighborhood array processors, this involves decomposing the
transform into a sequence of factors where each factor is directly imple-
mentable on the architecture. The underlying connection scheme of the
parallel machine is represented by a graph whose nodes are the processing
elements (PEs) and the edges the connections between PEs. Usually there is
a limited number of other spatially close PEs to which each PE is connected,
called the local neighborhood. In a template decomposition, each factor,
corresponding to a template, is allowed to have its support only in the local
neighborhood.
Several important concepts concerning templates are now discussed. If t ∈ (F^X)^Y and, for all triples x, y, z such that y + z ∈ Y and x + z ∈ X, we have t_y(x) = t_{y+z}(x + z), then t is called translation invariant. A template which is not translation invariant is called translation variant, or simply variant. Translation invariant templates have the nice property that they may be represented pictorially in a concise manner. The Sobel template is a well-known example of an invariant template. The cell with the hash marks in the pictorial representation of this template indicates the location of the target point.
Several representations of a template will be important to our discussion. One is the transpose of a template. Let t ∈ (F^X)^Y. Then the transpose of t is the template t′ ∈ (F^Y)^X defined by t′_x(y) = t_y(x). If F = R_{±∞}, then we introduce the notion of a conjugate template. For t ∈ (R^X_{±∞})^Y the additive conjugate of t is the template t* ∈ (R^Y_{±∞})^X defined by

t*_x(y) = [t_y(x)]*.

Similarly, using Eq. (2), if t ∈ ((R_0^∞)^X)^Y the multiplicative conjugate of t is the template t̄ ∈ ((R_0^∞)^Y)^X defined by

t̄_x(y) = [t_y(x)]̄.

4. Operations between Images and Templates


One common use of templates is to describe a transformation of an input
image based on its image values within a subset of the point set X. This notion
can be generalized to include transformations from one set of image coordi-
nates (input domain) to a different set of image coordinates (output domain)
of a more general type of image than those with real or complex numbers for
values. This is done by the concept of a generalized product between an image and a template.

Let X ⊂ R^n be finite, X = {x₁, x₂, . . . , x_n}. Let γ be a commutative and associative binary operation on the value set F. Then the global reduce operation Γ on F^X induced by γ is defined by

Γ(a) = Γ_{x∈X} a(x) = a(x₁) γ a(x₂) γ · · · γ a(x_n),

for a ∈ F^X. Thus, Γ : F^X → F.
Images and templates are combined by combining appropriate binary operations. Let F₁, F₂, and F be three value sets, and suppose ○ : F₁ × F₂ → F and ○′ : F₂ × F₁ → F are binary operations. If γ is a commutative and associative binary operation on F, a ∈ F₁^X, and t ∈ (F₂^X)^Y, then the generalized backward template operation of a with t (induced by γ and ○) is the binary operation ⊛ : F₁^X × (F₂^X)^Y → F^Y defined by

a ⊛ t = {(y, b(y)) : b(y) = Γ_{x∈X} a(x) ○ t_y(x), y ∈ Y}.

If t ∈ (F₂^Y)^X, then the generalized forward template operation of a with t is defined as

t ⊛ a = {(y, b(y)) : b(y) = Γ_{x∈X} t_x(y) ○′ a(x), y ∈ Y}.


Note that the input image a is an F₁-valued image on the point set X, and the output image b is an F-valued image on the point set Y, regardless of
which template operation, forward or backward, is used. Templates can
therefore be used to transform an image on one point set and with values in
one set to an image on a completely different point set whose values may be
entirely different from those of the original image.
Only three special cases of the previous generalized operation have been investigated in detail, one by Gader (1986) and the other two by the author (Davidson, 1989). Future research will certainly discover other useful combinations. These three operations are denoted by ⊕, ⊞, and ⊗. The operation ⊕ is linear, and we refer the interested reader to other references for recent research in this area (Gader, 1986, 1988; Ritter and Gader, 1987; Ritter et al., 1990). The remaining operations, ⊞ and ⊗, are nonlinear.

Let X ⊂ R^n be finite and Y ⊂ R^m. Let a ∈ F^X and t ∈ (F^X)^Y, where F ∈ {R, C}. Then the generalized backward convolution is defined as

a ⊕ t = {(y, b(y)) : b(y) = Σ_{x∈X} a(x) t_y(x), y ∈ Y}.

We also define a generalized forward convolution as

t ⊕ a = {(y, b(y)) : b(y) = Σ_{x∈X} t_x(y) a(x), y ∈ Y},

where, in this case, t ∈ (F^Y)^X.


The main operations of interest to us are the additive maximum and multiplicative maximum operations. These are defined as follows. Let X ⊂ R^n be finite and Y ⊂ R^m. Let a ∈ R^X_{-∞} and t ∈ (R^X_{-∞})^Y. Then the backward additive maximum operation is defined to be

a ⊞ t = {(y, b(y)) : b(y) = ∨_{x∈X} a(x) + t_y(x), y ∈ Y}.

The forward additive maximum is defined as

t ⊞ a = {(y, b(y)) : b(y) = ∨_{x∈X} t_x(y) + a(x), y ∈ Y}.

In this case, t ∈ (R^Y_{-∞})^X.

For finite X ⊂ R^n and Y ⊂ R^m, let a ∈ (R_0^∞)^X and t ∈ ((R_0^∞)^X)^Y. Then the backward multiplicative maximum operation is defined to be

a ⊗ t = {(y, b(y)) : b(y) = ∨_{x∈X} a(x) * t_y(x), y ∈ Y}.

The forward multiplicative maximum is defined as

t ⊗ a = {(y, b(y)) : b(y) = ∨_{x∈X} t_x(y) * a(x), y ∈ Y}.

For the forward multiplicative maximum, t ∈ ((R_0^∞)^Y)^X.
We remark that for computational as well as theoretical purposes, we can restate the previous two convolutions with the new pixel value calculated only over the support of the template t. If S_{-∞}(t_y) ≠ ∅, then ∨_{x∈X} a(x) + t_y(x) = ∨_{x∈S_{-∞}(t_y)} a(x) + t_y(x), and we have

a ⊞ t = {(y, b(y)) : b(y) = ∨_{x∈S_{-∞}(t_y)} a(x) + t_y(x), y ∈ Y}.

Similarly, if S(t_y) ≠ ∅, then ∨_{x∈X} a(x) * t_y(x) = ∨_{x∈S(t_y)} a(x) * t_y(x), and we have

a ⊗ t = {(y, b(y)) : b(y) = ∨_{x∈S(t_y)} a(x) * t_y(x), y ∈ Y}.

If in either case the support is empty, then we define

∨_{x∈S_{-∞}(t_y)} a(x) + t_y(x) = −∞ and ∨_{x∈S(t_y)} a(x) * t_y(x) = 0.

Note that these values represent the null or identity values of the respective value sets, just as 0 is the identity value for the value set (R, +, *). We may therefore restrict our computation of the new pixel value to the support of t.
This becomes particularly important when considering the mapping of transforms to certain types of parallel architectures.
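As an illustration, the sketch below computes the backward additive maximum a ⊞ t for a translation-invariant template directly from the definition, restricting the maximum to the support as just described. Representing the invariant template by a list of (offset, weight) pairs is our own implementation choice.

```python
import numpy as np

def additive_maximum(a, offsets, weights):
    """b(y) = max over the support of ( a(y + offset) + weight ), i.e. the backward
    additive maximum for the invariant template with t_y(x) = w when x = y + offset
    and t_y(x) = -inf elsewhere."""
    rows, cols = a.shape
    b = np.full(a.shape, -np.inf)       # an empty support yields -inf, the null value
    for (di, dj), w in zip(offsets, weights):
        for i in range(rows):
            for j in range(cols):
                x = (i + di, j + dj)
                if 0 <= x[0] < rows and 0 <= x[1] < cols:
                    b[i, j] = max(b[i, j], a[x] + w)
    return b

a = np.array([[0.0, 1.0], [4.0, 2.0]])
offsets = [(0, 0), (0, 1), (1, 0)]      # target point plus right and lower neighbors
weights = [0.0, 0.0, 0.0]               # a flat template: b is a local maximum of a
print(additive_maximum(a, offsets, weights))   # [[4. 2.] [4. 2.]]
```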
Because of the duality inherent in the structure R_{±∞} (and R_0^∞), the operation ⊞ (and ⊗) induces a dual image-template operation, called the additive minimum (and multiplicative minimum). They are defined by

a ⊟ t ≜ (t* ⊞ a*)*

and

a ⊘ t ≜ (t̄ ⊗ ā)̄.

Equivalently, we have

a ⊟ t = {(y, b(y)) : b(y) = ∧_{x∈S_{+∞}(t_y)} a(x) +′ t_y(x), y ∈ Y}.

The equivalent form of the multiplicative minimum and the corresponding forward minimum operations are defined in the natural way. The dual operation +′ is presented in Section II.A. As before, if the support is empty, we define

∧_{x∈S_{+∞}(t_y)} a(x) +′ t_y(x) = +∞.

The forward minimum operations are defined in the usual way.


These definitions assume that the respective support is finite for each y ∈ Y. We may extend the above definitions to real, continuous functions a and t_y on a compact set S_{-∞}(t_y) ⊂ R^n. This is well-defined, as the sum or product of two continuous functions on a compact subset of R^n is continuous and always attains a maximum. Extending the basic properties of the image algebra operations involving ⊞ and ⊗ from the discrete case to the continuous case should present little difficulty, and remains an open problem at this time.

5. Operations between Generalized Templates

There are two types of operations defined between templates: template arithmetic and template convolution. Template arithmetic is basically an extension of image arithmetic, while template convolution generalizes image-template convolution.

Binary operations of addition, multiplication, and maximum are defined between templates. As discussed in Section II.B.3, if we denote the semi-group {R^X, γ} by G, where γ ∈ {+, *}, then t ∈ (R^X)^Y = G^Y can be viewed as a G-valued image on Y. Thus, we can define addition and multiplication
pointwise between two templates s and t ∈ (R^X)^Y as s + t by (s + t)_y = s_y + t_y and s * t by (s * t)_y = s_y * t_y.

Many of the properties that hold for images also hold for G-valued templates. For example, the above two operations are each commutative and associative, and each has an identity. Under addition, the identity template is t = 0, that is, t_y = 0 ∈ R^X for all y in Y. Under multiplication, the identity template is t = 1, that is, t_y = 1 ∈ R^X for all y in Y.
If G = R^X_{-∞}, then we can define extended addition, maximum, and minimum between templates:

s + t ≜ r, where

r_y(x) = s_y(x) + t_y(x)   if x ∈ S_{-∞}(t_y) ∩ S_{-∞}(s_y),
r_y(x) = s_y(x)            if x ∈ S_{-∞}(s_y) \ S_{-∞}(t_y),
r_y(x) = t_y(x)            if x ∈ S_{-∞}(t_y) \ S_{-∞}(s_y),
r_y(x) = −∞                otherwise;

and

s ∨ t by (s ∨ t)_y = s_y ∨ t_y,  s ∧ t by (s ∧ t)_y = s_y ∧ t_y.
Note that in the case where s and t have no values of −∞ or +∞ anywhere, the definition of s + t on the value set R_{±∞} degenerates to the definition of s + t on the value set R. Under the operation of extended addition, the identity template is the same as above, namely t = 0. Under the operations of maximum and minimum, the identity templates are the null template t = −∞, that is, t_y = −∞, the negative null image, for all y in Y, and t = +∞, the positive null template.
A template s ∈ (F^X)^Y is a constant template if and only if s_y(x) = k ∈ F for all x in X and for all y in Y. In this case we denote s by k. Thus, scalar multiplication is simply template multiplication by a constant template k: k * t = s * t. Scalar addition is template addition with a constant template: k + t = s + t. A one-point template is one whose support satisfies |S(s_y)| = 1 for all y in Y.

A function f : F → F induces a function, again denoted by f : (F^X)^Y → (F^X)^Y, where [f(t)]_y = f(t_y) for all y in Y. There is also the concept of template exponentiation. For two templates s, t ∈ (F^X)^Y, we define t^s by (t^s)_y ≜ (t_y)^{s_y} for all y in Y.
Recall the global reduce operation Γ : F^X → F, where F is a commutative semi-group and X is finite. Let t ∈ (F^X)^Y, so that t ∈ G^Y, where G = F^X. If F = R and Γ = Σ, then

Σt = Σ_{y∈Y} t_y.

Hence, Σt is a real-valued image on X; that is, Σ : (R^X)^Y → R^X.
The generalized image-template operation ⊛ generalizes to a generalized template-template product. Let X ⊂ R^n be finite, with X = {x₁, x₂, . . . , x_m}, and let γ be an associative and commutative binary operation on the value set F with global reduce operation Γ on F^X. Let F₁ and F₂ be two additional value sets, and suppose ○ : F₁ × F₂ → F is a binary operation. If t ∈ (F₁^X)^W and s ∈ (F₂^W)^Y, then the generalized template operation of t with s (induced by γ and ○) is the binary operation ⊛ : (F₁^X)^W × (F₂^W)^Y → (F^X)^Y defined by

t ⊛ s = r ∈ (F^X)^Y, where r_y(x) = Γ{t_w(x) ○ s_y(w) : w ∈ W}, y ∈ Y, x ∈ X.

Note that if |X| = 1, then the definition of the generalized template operation of t and s degenerates to the definition of the generalized backward template operation of the image t ∈ F₁^W with the template s ∈ (F₂^W)^Y, and r ∈ F^Y. If |Y| = 1, then the definition of the generalized template operation of t and s degenerates to the definition of the forward template operation of the image s ∈ F₂^W with the template t ∈ (F₁^X)^W, where r ∈ F^X.
The specific cases ⊛ = ⊕, ⊞, or ⊗ thus generalize to operations between templates. If t is a real- or complex-valued template from W to X and s is a real- or complex-valued template from Y to W, then the template r = s ⊕ t from Y to X is defined by determining the image function r_y at each point y ∈ Y:

r_y(x) = Σ_{w∈W} t_w(x) s_y(w) = Σ_{w∈W} s_y(w) t_w(x), where x ∈ X.

As in the case of image-template operations, it is usually not necessary to sum over all of W but only over a certain subset of W. In particular, given y, then for each x ∈ X we define the set

S(x) = {w ∈ W : w ∈ S(s_y) and x ∈ S(t_w)}.

Then, since s_y(w) t_w(x) = 0 if w ∉ S(x), we have

r_y(x) = Σ_{w∈S(x)} s_y(w) t_w(x),

where we define Σ_{w∈S(x)} s_y(w) t_w(x) = 0 whenever S(x) = ∅.
The operation ⊞ between two templates is defined in a similar way. For t ∈ (R^X_{-∞})^W and s ∈ (R^W_{-∞})^Y, define the set

S_{-∞}(x) = {w ∈ W : w ∈ S_{-∞}(s_y) and x ∈ S_{-∞}(t_w)}.

Then, since s_y(w) + t_w(x) = −∞ if w ∉ S_{-∞}(x), we define r = t ⊞ s ∈ (R^X_{-∞})^Y by

r_y(x) = ∨_{w∈S_{-∞}(x)} s_y(w) + t_w(x),

where ∨_{w∈S_{-∞}(x)} s_y(w) + t_w(x) = −∞ whenever S_{-∞}(x) = ∅.

From these definitions it is clear that S(r_y) = {x ∈ X : S(x) ≠ ∅}, and that S_{-∞}(r_y) = {x ∈ X : S_{-∞}(x) ≠ ∅}.

The dual operation of ⊞ for templates is defined as follows. Let t ∈ (R^X_{±∞})^W and s ∈ (R^W_{±∞})^Y. Then t ⊟ s ∈ (R^X_{±∞})^Y is defined by

t ⊟ s = (s* ⊞ t*)*.
Template composition and decomposition are the primary reasons for
introducing operations between generalized templates. Composition and
decomposition of templates provide a tool for algorithm optimization. For
example, decomposition of templates under the operation ⊕ allows for a reduction in the number of multiplications necessary to compute the transformation at a given point. This property is useful for implementation on either parallel or sequential computers. The two-dimensional discrete Fourier transform (DFT), which can be decomposed into two one-dimensional DFTs, is a good example of this. Another use for template decomposition is to map a transform to a particular architecture. For example, the two-dimensional DFT cannot be directly mapped to a mesh-connected architecture because the transformed value depends on the value at every other pixel location in the image. Instead, the DFT must be decomposed into a product of "smaller" transforms, each of which is able to be computed on a mesh architecture. This is often called a local decomposition, since the computation at each point depends only on information available from the four nearest neighbors. For the DFT, finding a decomposition is equivalent to factoring the Fourier matrix (of order n) F_n = (ω_n^{ik}), ω_n = e^{−2πi/n}, i, k = 1, . . . , n. Several methods exist for various factorizations (Parlett, 1982; Rose, 1980; Gader, 1989), and in particular Gader (1989) gives an algorithm for the DFT computation that can be implemented on a mesh or systolic architecture.
Template decomposition techniques under the operation ⊞ are not as richly developed as for the linear case. However, the mathematical morphology operations of dilation and erosion, which can be expressed by the image algebra operations ⊞ and ⊟, respectively, are lattice transforms, and decomposition of structuring elements is an active area of research. Factoring binary templates under certain restrictions such as convexity has been investigated (Zhuang and Haralick, 1986), and a general, gray-valued decomposition has been developed as well (Davidson, 1989, 1991). The material presented in this work provides an environment in which template decomposition can be expressed as matrix decomposition. These tools have already been used and should prove useful in the future for other topics besides decomposition.
C. The Embedding Isomorphism between Minimax Algebra and Image Algebra

All linear transforms can be described by the image algebra operations of addition and ⊕. One very powerful implication of this is that all the tools of linear algebra are directly applicable to solving problems in image processing whenever the image algebra operation ⊕ is involved. We now present an embedding of the minimax algebra into image algebra for the case where the value set is R_{±∞} and the operations are ∨ and ⊞. This allows any minimax algebra result to be applied toward solving image processing problems.
Let X and Y be finite arrays, with |X| = m and |Y| = n. Assume the points of X are labelled lexicographically, X = {x₁, x₂, . . . , x_m}. Assume a similar labelling for Y: Y = {y₁, y₂, . . . , y_n}. Let R^m_{±∞} = {(x₁, x₂, . . . , x_m) : x_i ∈ R_{±∞}}; that is, R^m_{±∞} is the set of row vectors of m-tuples with values in R_{±∞}. Let a ∈ R^X_{±∞}, let M_{mn} denote the set of m × n matrices with values in R_{±∞}, and define ν : R^X_{±∞} → R^m_{±∞} by

ν(a) = (a(x₁), . . . , a(x_m)).

Define Ψ : (R^X_{±∞})^Y → M_{mn} by

Ψ(t) = Q_t = (q_{ij}), where q_{ij} = t_{y_j}(x_i).

Note that the jth column of Q_t is simply (ν(t_{y_j}))′, the prime denoting transpose. Hence, ν takes an image to a row vector and Ψ takes a template to a matrix.

In the following theorems, we assume that |X| = m, |Y| = n, and |W| = p. The following are straightforward to prove:
Theorem 2.1. ν(a ⊞ t) = ν(a) × Ψ(t), for t ∈ (R^X_{±∞})^Y, a ∈ R^X_{±∞}.

Theorem 2.2. ν(a ∨ b) = ν(a) ∨ ν(b), for a, b ∈ R^X_{±∞}.

Theorem 2.3. Ψ(t ⊞ s) = Ψ(t) × Ψ(s), for t ∈ (R^X_{±∞})^W, s ∈ (R^W_{±∞})^Y.

Theorem 2.4. Ψ(t ∨ s) = Ψ(t) ∨ Ψ(s), for t, s ∈ (R^X_{±∞})^Y.
Theorem 2.1 states that calculating a ⊞ t is the same as calculating the
corresponding vector-matrix product. Theorem 2.3 states that calculating
the composition of two templates is the same as calculating the corre-
sponding matrix product. Theorems 2.2 and 2.4 state that performing image-
image maximum or template-template maximum (pointwise) is the same as
performing matrix-matrix maximum.
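Theorem 2.1 can be exercised numerically: flatten the image with ν, assemble Ψ(t) column by column from the images t_{y_j}, and compare the two sides. The tiny sketch below (lexicographic labelling and a hand-built matrix, both our own choices) illustrates the bookkeeping:

```python
import numpy as np

# Point sets X = Y = {x1, x2, x3}; Psi(t) is the matrix Q with q_ij = t_{y_j}(x_i).
m = 3
Q = np.full((m, m), -np.inf)
for j in range(m):
    Q[j, j] = 0.0              # t_{y_j}(x_j) = 0
    if j + 1 < m:
        Q[j + 1, j] = 1.0      # t_{y_j}(x_{j+1}) = 1; all other values are -inf

a = np.array([5.0, 2.0, 7.0])  # nu(a): the image a listed as a row vector

# Left side: a computed with t from the definition of the backward additive maximum.
lhs = np.array([np.max(a + Q[:, j]) for j in range(m)])

# Right side: the minimax vector-matrix product nu(a) x Psi(t).
rhs = np.max(a[:, None] + Q, axis=0)

assert np.allclose(lhs, rhs)   # Theorem 2.1: the two computations agree entry by entry
```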
Just as linear transformations of vector spaces can be characterized entirely in terms of matrices, minimax algebra homomorphisms from R^m_{±∞} to R^n_{±∞} can be characterized entirely in terms of matrices under the minimax product (Cuninghame-Green, 1979). This important result gives necessary and sufficient conditions for a transformation to be classified as a lattice transform. In turn, a lattice transform has a representation as a template. As will be discussed in the next section, a morphology transform, or structuring element, always corresponds to a translation-invariant template. In minimax algebra, these types of transforms are represented by matrices which are block Toeplitz with Toeplitz blocks.

It is easily shown that the mappings ν and Ψ are one-to-one and onto, that the identity template gets mapped to the corresponding identity matrix, and that the dual operations ∧ and ⊟ are also preserved.
Thus, the minimax algebra with the bounded l-group ℝ_{±∞} is embedded into image algebra by the functions ν⁻¹ and Ψ⁻¹. An image algebra transform using either ⊞ or ⊟ can thus be viewed as a matrix transform in the minimax algebra for the respective case of the value set ℝ_{-∞} or ℝ_{+∞}.

This completes the mathematical identification of the three main subalgebras in the image algebra. The linear transforms were classified by Gader (1986), who showed that linear algebra is embedded into image algebra. As a result of each minimax algebra embedding above, the full power of the minimax matrix theory can be applied to solving problems in image processing, as long as the image processing problem can be formulated using the image algebra operations of ∨, ⊞, or ⊟. Since it has been formally proven that the image algebra can represent all image-to-image transforms (Ritter et al., 1987b; Ritter and Wilson, 1987), the embeddings are very useful to have.
While many of the concepts described in this section concerning the algebraic systems F^X and (F^X)^Y for a bounded l-group F are not new, it is important to recognize that the formalism involving the generalization of the operations is new and provides a very powerful algebraic and notational tool.

D. Mathematical Morphology

The early morphological image processing language had no obvious con-


nection with a lattice structure because its original purpose was to express
binary image manipulation. As morphology was extended to include gray-
value image processing techniques, the notions of taking maximums and
minimums of a set of numbers became apparent. While shrinking and
expanding operations utilized a maximum operation as early as 1978
(Nakagawa and Rosenfeld, 1978), it wasn't until the mid-1980s that formal
links were made to lattice structures (Davidson, 1989; Serra, 1982; Heijmans,
1990).
Up until the mid-1960s, the theoretical tools of quantitative microscopy as
applied to image analysis were not based on any cohesive mathematical
foundation. It was Matheron and Serra at the Ecole des Mines de Paris who
first pioneered the theory of mathematical morphology as an attempt to unify
the underlying mathematical concepts being used for image analysis in
microbiology, petrography, and metallography (Serra, 1982; Matheron,
1967). Initially, its main use was to describe boolean image processing in the
plane, but Sternberg (1980) extended the concepts in mathematical mor-
phology to include gray-valued images via the cumbersome notion of an
umbra. While others, including Serra (1975; Meyer, 1978), also extended
morphology to gray-valued images in different manners, Sternberg's
definitions have been used more regularly, and, in fact, are used by Serra in
his book (Serra, 1982).
The basis on which morphological theory lies are the two classical operations of Minkowski addition and Minkowski subtraction from integral geometry (Minkowski, 1903; Hadwiger, 1957). For two sets A ⊂ ℝⁿ and B ⊂ ℝⁿ, Minkowski addition and subtraction are defined as

    A ⊕ B = ∪_{b∈B} A_b   and   A ⊖ B = ∩_{b∈B} A_{-b},

respectively, where A_b = {a + b : a ∈ A} and B̌ = {-b : b ∈ B}. This is the original notation as used in Hadwiger's book (1957). It can be easily shown that A ⊖ B = (Aᶜ ⊕ B̌)ᶜ, where Aᶜ denotes the complement of A in ℝⁿ. The two morphological operations of dilation and erosion are constructed from these definitions. While there are several slight variations on the actual definitions of dilation and erosion, we will use Sternberg's, which are

    A ⊞ B = ∪_{b∈B} A_b

and

    A ⊟ B = ∩_{b∈B} A_{-b} = {x : B_x ⊆ A}.

Here, the set A represents the input image and the set B the structuring element.
To avoid anomalies without practical interest, the structuring element B is assumed to include the origin 0 ∈ ℝⁿ, and both A and B are assumed to be compact. Also, the actual symbols used for dilation and erosion are typically ⊕ and ⊖, respectively. However, to avoid confusion with the image algebra operation ⊕, we replace ⊕ and ⊖ by ⊞ and ⊟, respectively.

All morphological transformations are combinations of dilations and erosions, such as the opening of A by B, denoted by

    A ∘ B = (A ⊟ B) ⊞ B,

and the closing of A by B, denoted by

    A • B = (A ⊞ B) ⊟ B.
However, a more general image transform in mathematical morphology is the Hit or Miss transform (Serra, 1969). Since an erosion, and hence a dilation, is a special case of the Hit or Miss transform, this transform is often viewed as the universal morphological transformation upon which the theory of mathematical morphology is based. Let B = (D, E) be a pair of structuring elements. Then the Hit or Miss transform of the set A is given by the expression

    A ⊛ B = {a : D_a ⊆ A, E_a ⊆ Aᶜ}.

For practical applications it is assumed that D ∩ E = ∅. The erosion of A by D is obtained by simply letting E = ∅, resulting in A ⊛ B = A ⊟ D.
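On discrete boolean images the Hit or Miss transform reduces to two containment tests, which the following sketch implements directly from the definition; the array representation, wrap-around boundary handling, and function names are our own illustrative assumptions.

    import numpy as np

    def fits(A, D):
        """{x : D_x is contained in A}, for a boolean array A and a list of offsets D."""
        out = np.ones_like(A, dtype=bool)
        for (di, dj) in D:
            out &= np.roll(np.roll(A, -di, axis=0), -dj, axis=1)  # out[x] &= A[x + d]
        return out

    def hit_or_miss(A, D, E):
        """A (*) (D, E): D must fit the foreground and E must fit the background."""
        return fits(A, D) & fits(~A, E)

With E empty, hit_or_miss(A, D, []) returns the erosion of A by D, as in the remark above.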
Extensions of these boolean operations to gray values have been accomplished through the concept of an umbra. It has been shown that this somewhat cumbersome method of developing gray-value morphology is unnecessary and that a simpler and more intuitive approach suffices (Davidson, 1989, 1990).

We now discuss the relationship between the morphological algebra and the image algebra. For the appropriate template t, performing a dilation is equivalent to calculating a ⊞ t. Also, an erosion can be represented by a ⊟ t*.
Let A, B be finite subsets of the n-fold Cartesian product of ℤ, ℤⁿ. Choose a point set X such that X ⊂ ℤⁿ and A ⊞ B ⊂ X. Let F₁ denote the value set {-∞, 0, 1, +∞}, and define a function μ from the power set of ℤⁿ to the set of all F₁-valued images on X by

    μ : 2^{ℤⁿ} → F₁^X,   μ(A) = a,   a(x) = 1 if x ∈ A, and 0 otherwise.

This maps a morphology image A, represented by a set, to an image algebra image a, represented by a function. The mapping of a structuring element is as follows. Let

    𝒮 = {B ⊂ ℤⁿ : |B| < ∞ and 0 ∈ B}.

Let 𝒯 denote the set of all F₁-valued invariant templates from X to X such that y ∈ S_{-∞}(t_y). Now we define a function ζ from 𝒮 to 𝒯 by

    ζ : 𝒮 → 𝒯,   ζ(B) = t,   t_y(x) = 0 if x ∈ B_y, and -∞ otherwise.

Using these two functions, we can map a morphology structuring element to an image algebra template and vice versa. The image a is said to correspond to the set A, and the template t is called the template corresponding to the structuring element B. The correspondence between image algebra and mathematical morphology is described in the next two theorems.
Theorem 2.5. Let μ and ζ be as defined above. Let A ⊂ ℤⁿ, and B ∈ 𝒮 a structuring element. Then

    μ(A ⊞ B) = μ(A) ⊞ ζ(B).

Theorem 2.6. Let μ and ζ be as defined above. Let A ⊂ ℤⁿ, and B ∈ 𝒮 a structuring element. Then

    μ(A ⊟ B) = μ(A) ⊟ [ζ(B)]*.
Now, the above two theorems show a correspondence for boolean images; that is, the image and structuring element are subsets of ℝ². If we consider a gray-value image as a subset of ℝ³, then we still have the same relationship as evidenced in these two theorems. To make things easier notationally, we can view the mapping of the template in the following way. The value set is now ℝ_{-∞}, with the image values being mapped in the canonical way and the template values getting assigned by

    t_y(x) = g(y - x) if x ∈ B_y, and -∞ otherwise.

Here, g is the gray-value structuring element, and B its support in the plane.
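A gray-value dilation is then exactly a ⊞ t for this translation-invariant template. A minimal sketch, assuming wrap-around boundaries and representing g as a map from offsets to weights (both our own simplifications):

    import numpy as np

    NEG_INF = -np.inf

    def gray_dilate(a, g):
        """a [+] t with t_y(x) = g(y - x) on the support of g.
        a: 2D array; g: dict mapping offsets (dy, dx) to gray values."""
        out = np.full(a.shape, NEG_INF)
        for (dy, dx), w in g.items():
            shifted = np.roll(np.roll(a, dy, axis=0), dx, axis=1)  # shifted[y] = a[y - (dy, dx)]
            out = np.maximum(out, shifted + w)
        return out

    # Example: a flat 3 x 3 structuring element.
    g = {(dy, dx): 0.0 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}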
The image algebra operation ⊕ can be used to express a boolean dilation and erosion, and also the Hit or Miss transform (Davidson, 1990). In addition, there is a straightforward technique for expressing the Hit or Miss transform that uses no maximum or minimum operations, and instead uses a linear convolution. We refer the interested reader to Davidson (1989 or 1990).
Theorems 2.5 and 2.6 can be used to show that a strict subalgebra of the full image algebra is isomorphic to the traditional mathematical morphology. Since invariant templates with the target pixel in their support are a special type of template, it should be clear that {ℝ^X, (ℝ_{-∞}^X)^X, ∨, ∧, ⊞, ⊟} is a much larger algebra than mathematical morphology. Image algebra templates generalize the concept of a structuring element; templates can vary in size, shape of support, and weight values from point to point. Using the operation ⊞, a more general mapping of an image can be described, representing a far more complex process than a simple dilation. This has already been investigated, for example, in the new area of artificial neural networks called morphological neural networks (Davidson and Ritter, 1990; Davidson and Sun, 1991).

The isomorphism presented here, along with the embedding of the minimax algebra into image algebra presented in Section II.C, can be used to show that any dilation or erosion can be represented by a matrix in the minimax algebra, and allows manipulation of a morphological transform as a matrix. In addition, it is easy to show that a morphological transform will always be represented by a matrix that is block Toeplitz with Toeplitz blocks. This is another way to see that a morphological transform is a special type of lattice transform.

III. APPLICATIONS

In this section we present several applications of the preceding theory. The


first part describes the mapping of a selection of properties given in the text
Minimax Algebra to image algebra. Many of these properties have not been
investigated as to their specific image processing applications. The second
section describes a generalized skeletonizing technique. This can be viewed as
a “division algorithm” on an image, which decomposes an image in a certain
way with respect to a template. The third section discusses an image complexity measure, a global parameter describing how complex an image is with respect to some reference, and presents a method for calculating such a measure using fractals, expressed in image algebra. The
last topic is a formulation in image algebra of a problem in operations
research known as the dual transportation problem. These examples are
designed to show the wide range of situations in which lattice transforms
occur in image processing.

A. Mapping of Minimax Algebra Properties to Image Algebra

Here we describe algebraic properties of the substructures {(F^X)^Y, F^X; ⊞, ∨, ⊟, ∧}, where F is a subbelt of ℝ_{±∞}. During the investigation of the lattice structure of the image algebra, but before the discovery of the link to minimax algebra, many basic properties, such as the associativity of the ⊞ operation, were proven within the context of the image algebra. Many theorems had excessive notational overhead, and often the proofs were laborious. Most of these same properties were found to have been stated and proven in the context of the minimax algebra (Cuninghame-Green, 1979). Using the matrix calculus makes some proofs less tedious, and in some cases makes them less cumbersome notationally. Thus, in order to place the presentation in a more elegant mathematical environment, we will omit proofs that were done in the image algebra notation, and shall make use of the isomorphisms given in the previous section. Most of the theorems presented here are mapped into image algebra notation using the isomorphisms. The results will be stated for the bounded l-group ℝ_{-∞} and operation ⊞, with the understanding that the exact same results hold for ℝ_{+∞} and operation ⊟ with the appropriate substitutions.
1. Basic Definitions and Properties

Unless otherwise stated, we assume that X, Y, and W are finite coordinate sets, with |X| = m, |Y| = n, |W| = k, with the pixel locations lexicographically ordered as in Section II.C. We assume the belt F with duality is a subbelt of ℝ_{±∞}. The templates s and t are F-valued templates on appropriate domains, and a, b are F-valued images. For a subbelt F of ℝ_{±∞}, we have the following basic properties:

(1) ((F^X)^Y, ∨) is a semi-lattice, and ((F^X)^Y, ∨) is a function space over (F, ∨, ×);
(2) {(F^X)^X, ∨, ⊞} is a belt;
(3) (F^X, ∨) is a left space over the belt ((F^X)^X, ∨, ⊞);
(4) (F^X)^Y is a right space over the belt F;
(5) We define scalar multiplication of a template t ∈ (F^X)^Y by a scalar λ ∈ F as multiplication by the one-point template λ ∈ (F_{-∞}^X)^X or λ ∈ (F_{-∞}^Y)^Y, depending on whether the template λ multiplies from the left or from the right, respectively (and adjoining -∞ to F if necessary):

    t ⊞ λ = λ ⊞ t = s ∈ (F_{-∞}^X)^Y, where s_y(x) = t_y(x) + λ.

Here,

    λ_y(x) = λ if x = y, and -∞ otherwise.

(6) The distributive properties of ⊞ with respect to ∨ hold:

    (a ∨ b) ⊞ t = (a ⊞ t) ∨ (b ⊞ t),
    (s ∨ t) ⊞ u = (s ⊞ u) ∨ (t ⊞ u),
    u ⊞ (s ∨ t) = (u ⊞ s) ∨ (u ⊞ t).

The duals to properties 1 through 6 also hold, because the belt ℝ_{±∞} has duality:

(7) ((F^X)^Y, ∧) is a semi-lattice, and ((F^X)^Y, ∧) is a function space over (F, ∧, ×′);
(8) {(F^X)^X, ∧, ⊟} is a belt;
etc.
Now let F be a subbelt of ℝ, and F_{±∞} the bounded l-group with group F. Corresponding to the identity matrix and the null matrix, we have the (one-point) identity template 1 ∈ (F_{-∞}^X)^X, defined by

    1_y(x) = 0 if x = y, and -∞ otherwise,

and the (constant) null template -∞ ∈ (F_{-∞}^X)^Y, defined by

    -∞_y(x) = -∞, for all y ∈ Y, x ∈ X.

Thus, we have

    a ⊞ 1 = a,   t ⊞ 1 = 1 ⊞ t = t,   ∀a ∈ ℝ_{-∞}^X, ∀t ∈ (ℝ_{-∞}^X)^Y.

For -∞ ∈ (F_{-∞}^X)^Y,

    t ∨ -∞ = t,   t ⊞ -∞ = -∞ ⊞ t = -∞,
    a ⊞ -∞ = -∞, the null image,   ∀a ∈ ℝ_{-∞}^X, ∀t ∈ (ℝ_{-∞}^X)^Y.
a. Homomorphisms. We now discuss homomorphisms in the context of the image algebra. Let |X| = m. Since the semi-lattice {F_{-∞}^X, ∨} is isomorphic (via ν) to the semi-lattice {F_{-∞}^{1×m}, ∨}, {F_{-∞}^{1×m}, ∨} is a space. For l ∈ F^X the constant image, we have

    a ∨ l = l ∨ a = b ∈ F_{-∞}^X, where b(x) = a(x) ∨ l,

and for the one-point template λ ∈ (ℝ_{-∞}^X)^X,

    a ⊞ λ = λ ⊞ a = b ∈ F_{-∞}^X, where b(x) = a(x) + λ,

for F = ℝ. Let F = ℝ_{±∞}. Since {F^X, ∨} is a semi-lattice, a semi-lattice homomorphism from F^X to F^Y is a function f : F^X → F^Y satisfying

    f(a ∨ b) = f(a) ∨ f(b).

A right linear homomorphism g : F^X → F^Y is a semi-lattice homomorphism satisfying

    g(a ⊞ λ) = g(a) ⊞ λ.

Thus, the set of all right linear homomorphisms from F^X to F^Y is denoted by

    Hom_R(F^X, F^Y) = {g : F^X → F^Y : g(a ∨ b) = g(a) ∨ g(b), g(a ⊞ λ) = g(a) ⊞ λ}.
b. Classification of Homomorphisms in the Image Algebra. Right linear transformations can be characterized entirely in terms of template transformations, and we give necessary and sufficient conditions for (F^X)^Y to be isomorphic to Hom_R(F^X, F^Y).

Theorem 3.1. Let F be a belt with identity and null element. Then for all non-empty finite coordinate sets X, Y, (F^X)^Y is isomorphic to Hom_R(F^X, F^Y).

Corollary 3.2. Let F be a belt, and let X be a finite coordinate set with |X| > 1. Then a necessary and sufficient condition that (F^X)^Y be isomorphic to Hom_R(F^X, F^Y), for all non-empty finite coordinate sets Y, is that F have an identity element 0 with respect to × and a null element θ with respect to ∨.

We call a template t ∈ (F^X)^Y used with the operation ∨, ⊞, or ⊟ a lattice transform.
c. Inequalities. Some useful inequalities are stated in the next theorem.

Theorem 3.3. Let F be a subbelt of ℝ_{±∞}. Then the following inequalities hold for images and templates with the appropriate domains, having values in F.

    (i)   a ∨ (b ∧ c) ≤ (a ∨ b) ∧ (a ∨ c)      (i′)   s ∨ (t ∧ r) ≤ (s ∨ t) ∧ (s ∨ r)
    (ii)  a ∧ (b ∨ c) ≥ (a ∧ b) ∨ (a ∧ c)      (ii′)  s ∧ (t ∨ r) ≥ (s ∧ t) ∨ (s ∧ r)
    (iii) (a ∧ b) ⊞ t ≤ (a ⊞ t) ∧ (b ⊞ t)      (iii′) t ⊞ (s ∧ r) ≤ (t ⊞ s) ∧ (t ⊞ r)
    (iv)  a ⊞ (t ∧ s) ≤ (a ⊞ t) ∧ (a ⊞ s)      (iv′)  (s ∧ r) ⊞ t ≤ (s ⊞ t) ∧ (r ⊞ t)
    (v)   (a ∨ b) ⊞ t ≥ (a ⊞ t) ∨ (b ⊞ t)      (v′)   t ⊞ (s ∨ r) ≥ (t ⊞ s) ∨ (t ⊞ r)
    (vi)  a ⊞ (t ∨ s) ≥ (a ⊞ t) ∨ (a ⊞ s)      (vi′)  (s ∨ r) ⊞ t ≥ (s ⊞ t) ∨ (r ⊞ t)
Furthermore,

    a ⊟ (s ⊞ r) ≤ (a ⊞ s) ⊟ r   and   a ⊞ (s ⊟ r) ≥ (a ⊟ s) ⊞ r,
    t ⊟ (s ⊞ r) ≤ (t ⊞ s) ⊟ r   and   t ⊞ (s ⊟ r) ≥ (t ⊟ s) ⊞ r.

We remark that the properties above corresponding to the forward multiplication of an image by a template, as defined in Section II.B.4, are also valid; namely, t ⊞ (a ∧ b) ≤ (t ⊞ a) ∧ (t ⊞ b), etc.
d. Conjugacy. The notion of conjugacy as discussed in Section II.A extends to templates as well. Suppose that F and F* are conjugate. Then for t ∈ (F^X)^Y, the conjugate t* ∈ ((F*)^Y)^X is defined by

    t*_x(y) = (t_y(x))*.

The conjugate of t ∈ (ℝ_{-∞}^X)^Y is the additive dual t*, which is defined in Section II.A.

Let P be any set of F-valued templates from Y to X, with F and F* as conjugate systems. Define P* by

    P* = {t* : t ∈ P}.

Here, the star symbol * denotes the dual template for the value set ℝ_{±∞}. Note that P* ⊆ ((F*)^Y)^X. We have the following theorem.

Theorem 3.4. Let (F, ∨) and (F, ∧) be conjugate. Then ((F^X)^Y, ∨, ⊞) and (((F*)^Y)^X, ∧, ⊟) are conjugate as belts, where F is a sub-bounded l-group of ℝ_{±∞}, for any non-empty finite coordinate sets X, Y. In all cases the conjugate of a given template t is the dual template t* as defined previously.
Proposition 3.5. If (F, ∨, ×, ∧, ×′) is a self-conjugate belt, then ((F*)^X)^Y = (F^X)^Y for all non-empty finite coordinate sets X, Y. Also, ((ℝ_{±∞}^X)^X, ∨, ∧, ⊞, ⊟) is a self-conjugate belt.

We now present an application to a scheduling problem, showing the use of the conjugate of a template. In particular, this example provides a physical interpretation of the conjugate of a template.
Suppose we have n tasks, or activities, or subroutines, labelled 1, ..., n. Let a(x_i) denote the starting time of task i, and assume without loss of generality that task 1 is the starting activity, task n is the finishing activity, and that tasks 2 through n - 1 are intermediate activities. Suppose we are given the time of the starting activity, and we wish to know the soonest time at which each subsequent activity can be started. In particular, what is the earliest time that task n can start, or, what is the earliest expected time of completion of the collection of tasks?

The relation of the tasks to one another can be described by a partial order W on the set of tasks {1, ..., n}:

    j W i if and only if task j is to be completed before task i can start.

Let d_{ij} denote the minimum amount of time by which the start of activity j must precede the start of activity i. That is, d_{ij} is the duration time of activity j, or the processing time of task j, which must pass before activity i can start. Define w ∈ (ℝ_{-∞}^X)^X by

    w_{x_i}(x_j) = d_{ij} if j W i, and -∞ otherwise.

There is an obvious relationship between the weighted digraph associated with the partial order relation W and the template w. For example, suppose we have 5 tasks or activities, or subroutines of a program, which have the following relation or partial order:

    (1,2), (1,3), (2,4), (2,5), (3,4), (3,5), (4,5).

Here, activity 1 is the start activity, activity 5 is the end activity, and tasks 2, 3, 4 are intermediate subroutines. Suppose the duration times d_{ij} of the activities are

    d_{21} = 1,  d_{31} = 6,  d_{42} = 2,  d_{43} = 1,  d_{52} = 1,  d_{53} = 3,  d_{54} = 3,

and d_{ii} = 0 for each i = 1, ..., 5. This is consistent with a meaningful physical interpretation of the definition of duration time for a task.

The corresponding weighted digraph is given in Fig. 2. The nodes represent the activities, and the duration times are given as numbers on the directed edges linking the nodes.
FIGURE 2. A scheduling network.
For example, in determining the starting time of task 4, a(x_4), note that a(x_4) must satisfy

    a(x_4) = max {d_{42} + a(x_2), d_{43} + a(x_3), d_{44} + a(x_4)},

or equivalently,

    a(x_4) = max_{1≤j≤5} {w_{x_4}(x_j) + a(x_j)}.

This last equality follows from the fact that w_{x_i}(x_j) = -∞ if j is not related to i. In the general setting, we must solve, for each i = 1, ..., n,

    a(x_i) = max_{1≤j≤n} {w_{x_i}(x_j) + a(x_j)},

or, writing the problem as an image algebra expression, we must solve for a in

    a ⊞ w = a.    (10)

Here, a is an image on X where |X| = n.

An analysis of a network in this manner is called backward recursion analysis.
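The backward recursion is immediate to compute once w is stored as a matrix. A minimal NumPy sketch on the network of Fig. 2; the function boxplus and the fixed-point iteration (valid here because d_{ii} = 0 makes the iterates nondecreasing) are our own framing:

    import numpy as np

    NEG_INF = -np.inf

    def boxplus(a, w):
        """(a [+] w)(x_i) = max_j (w_{x_i}(x_j) + a(x_j))."""
        return np.max(w + a[None, :], axis=1)

    n = 5
    w = np.full((n, n), NEG_INF)
    np.fill_diagonal(w, 0.0)                       # d_ii = 0
    for (i, j, d) in [(2,1,1), (3,1,6), (4,2,2), (4,3,1), (5,2,1), (5,3,3), (5,4,3)]:
        w[i-1, j-1] = d                            # w_{x_i}(x_j) = d_ij

    a = np.array([0.0] + [NEG_INF] * (n - 1))      # task 1 starts at time 0
    for _ in range(n):                             # converges to a solution of a [+] w = a
        a = boxplus(a, w)
    print(a)                                       # earliest start times [0, 1, 6, 7, 10]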
Under forward recursion, suppose we have n tasks with duration times f_{ij}, where f_{ij} is the minimum amount of time by which the start of activity i must precede the start of activity j, if the activities are so related. Otherwise, let f_{ij} have value -∞. Define w ∈ (ℝ_{-∞}^X)^X by

    w_{x_i}(x_j) = f_{ij} if i W j, and -∞ otherwise.

As before, f_{ii} = 0 gives a consistent physical interpretation.

Let τ be the planned completion date of the project, which is given, and define a(x_i) to be the latest allowable starting time for activity i. We wish to determine a(x_1), ..., a(x_{n-1}) such that a(x_n) = τ. Thus, we desire to solve

    a(x_i) = min_{1≤j≤n} {-w_{x_i}(x_j) + a(x_j)}
for i = 1, ..., n. For example, for 5 nodes, suppose we have the following relations:

    (1,2), (1,3), (2,4), (2,5), (3,4), (3,5), (4,5).

Here, we write (i, j) if task i must precede task j. Suppose the times f_{ij} of the activities are

    f_{12} = 1,  f_{13} = 6,  f_{24} = 2,  f_{34} = 1,  f_{25} = 1,  f_{35} = 3,  f_{45} = 3.

Suppose we would like to find a(x_4), for example, satisfying

    a(x_4) = min_{j=1,...,5} {-w_{x_4}(x_j) + a(x_j)}.

The value

    -w_{x_4}(x_5) + a(x_5)

is the latest allowable time to start task 5 minus the minimum amount of time by which activity 4 must precede activity 5, and the time to start task 4 must be at least as small as this number. Thus, the time to start task 4 must be at least as small as -3 + a(x_5). The value a(x_4) = min_j {-w_{x_4}(x_j) + a(x_j)} = -3 + τ. (All other values -w_{x_4}(x_j) + a(x_j) = +∞, as -w_{x_4}(x_j) = +∞ for j ≠ 5.) Since τ is given, this quantity can be explicitly determined. The remaining equations can be solved similarly.
If we define u ∈ (ℝ_{±∞}^X)^X by

    u_{x_i}(x_j) = -w_{x_i}(x_j) if i W j, and +∞ otherwise,

then it is obvious that in general we must solve for a the following:

    a ⊟ u = a.    (11)

It is clear that the template u in Eq. (11) is the conjugate of the template w in Eq. (10). That is,

    u = w*.

We can say that the templates w and w* define the structure of the network as we analyze it backward or forward in time, respectively.
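Continuing the previous sketch, the latest allowable start times follow from the conjugate template; boxminus and the fixed-point iteration are again our own framing:

    def boxminus(a, u):
        """(a [-] u)(x_i) = min_j (u_{x_i}(x_j) + a(x_j))."""
        return np.min(u + a[None, :], axis=1)

    u = np.where(w == NEG_INF, np.inf, -w).T       # u = w*: negate and transpose w
    a = np.array([np.inf] * (n - 1) + [10.0])      # planned completion date tau = 10
    for _ in range(n):                             # converges to a solution of a [-] u = a
        a = boxminus(a, u)
    print(a)                                       # latest start times [0, 5, 6, 7, 10]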
e. Alternating tt* Products. This section discusses the concept of an alternating tt* product of a template t and its conjugate under the operations ⊞ and ⊟. We shall state the results for the sub-bounded l-groups of ℝ_{±∞} and the operations ⊞ and ⊟.

Theorem 3.6. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}, where F denotes the group of the bounded l-group F_{±∞}, and t ∈ (F_{±∞}^X)^Y. Then we have

    t ⊞ (t* ⊟ t) = t ⊟ (t* ⊞ t) = (t ⊞ t*) ⊟ t = (t ⊟ t*) ⊞ t = t.

Similarly,

    t* ⊞ (t ⊟ t*) = t* ⊟ (t ⊞ t*) = (t* ⊞ t) ⊟ t* = (t* ⊟ t) ⊞ t* = t*.
We now define an alternating tt* product. Write a word consisting of the letters t and t*, in an alternating sequence. A single letter t or t* is allowed. If we have k > 1 letters, insert k - 1 symbols ⊞ and ⊟, in an alternating manner. For example, the following sequences are allowed:

    t* ⊞ t
    t ⊞ t* ⊟ t
    t ⊟ t* ⊞ t ⊟ t* ⊞ t

Now, insert brackets in an arbitrary way so that the resulting expression is not ambiguous. For example,

    t* ⊞ t
    t ⊟ (t* ⊞ t)
    (t* ⊟ ((t ⊞ t*) ⊟ t)) ⊞ (t* ⊟ t)

An algebraic expression so constructed is called an alternating tt* product.
An algebraic expression so constructed is called an alternating tt* product.
Suppose an alternating tt* product has an odd number of letters t and/or
t*. Then we say it is of type t if it begins and ends with t and that it is of type
t* if it begins and ends with t*. If it has an even number of letters we say that
it is of type
t m t * or tit* or t * a t or t * m t
exactly according to the first two letters with its separating operator, re-
gardless of how the brackets lie in the entire expression. As an example
t*mt is of type t* mt
tI ( t *p
J t) is of type t

(t*m((tmt*)mt))m(t*mt) is of type t * I t .
Theorem 3.7. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}, and t an arbitrary template in (F_{±∞}^X)^Y. Then every alternating tt* product P is well-defined, and if P is of type Q, then P = Q.

If a product P has more than one letter, then we define P(z) to be the formal product obtained when the last (rightmost) letter, t or t*, is replaced by z, where z is an F-valued template on the appropriate coordinate sets X and Y.

Theorem 3.8. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}, and t, z arbitrary templates over F. If P is an alternating tt* product containing four letters and P is of type Q, then

    P(z) = Q(z).
2. Systems of Equations

We now discuss the problem of finding solutions to the following problem:

    Given t ∈ (ℝ_{±∞}^X)^Y and b ∈ ℝ_{±∞}^Y, find a ∈ ℝ_{±∞}^X such that a ⊞ t = b.    (12)

Here, |X| = m, |Y| = n.

a. F-asticity and l-solutions. If F is a bounded l-group and x, y ∈ F, we say that the products x × y and x ×′ y are l-undefined if one of x, y is -∞ and the other is +∞. We say that a template product is l-undefined if its evaluation requires the formation of an l-undefined product of elements of the bounded l-group F_{±∞}. Otherwise, we say that a template product is l-defined or l-exists. Some mathematical models require solutions that avoid the formation of l-undefined products, since in practical cases these often correspond to unrelated activities. We state these results for the bounded l-group ℝ_{±∞}.
Lemma 3.9. Let F_{±∞} be a subbelt of ℝ_{±∞}. Let X and Y be nonempty, finite arrays, and t ∈ (F_{±∞}^X)^Y. Then the set of all images a ∈ F_{±∞}^X such that a ⊞ t is l-defined is a sub-semi-lattice of F_{±∞}^X. Hence the set of solutions a of statement (12) such that a ⊞ t l-exists is either empty or is a sub-semi-lattice of F_{±∞}^X.

Lemma 3.10. Let X, Y, and W be nonempty, finite arrays, and t ∈ (F_{±∞}^X)^Y. Then the set of templates s ∈ (F_{±∞}^W)^X such that s ⊞ t is l-defined is a sub-semi-lattice of (F_{±∞}^W)^X.

Any solution a of statement (12) such that a ⊞ t l-exists is called an l-solution of (12).

Lemma 3.11. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}. Then (12) has at least one solution if and only if a = b ⊟ t* is a solution. In this case, a = b ⊟ t* is the greatest solution.
Recall from probability theory that a row-stochastic matrix is a non-negative matrix in which the sum of the elements in each row is equal to 1. We will make analogous definitions, where the operation + is replaced by the operation ∨, and the unity element is -∞.

Let P ⊂ F_{±∞}, where F_{±∞} is an arbitrary sub-bounded l-group of ℝ_{±∞}. A template t ∈ (F_{±∞}^X)^Y is called row-P-astic if ∨_{j=1}^m t_{y_i}(x_j) ∈ P for all i = 1, ..., n, and column-P-astic if ∨_{i=1}^n t_{y_i}(x_j) ∈ P for all j = 1, ..., m. The template t is called doubly P-astic if t is both row- and column-P-astic. Note that if t is column-P-astic, then t′ is row-P-astic.
Theorem 3.12. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}, t ∈ (F_{±∞}^X)^Y, and b ∈ F_{±∞}^Y such that (12) is soluble. Then a = b ⊟ t* l-exists and is an l-solution of (12) if and only if one of the following cases is satisfied:

(i) t ∈ (F^X)^Y, and b = +∞, the constant image with +∞ everywhere;
(ii) t ∈ (F^X)^Y, and b = -∞;
(iii) t ∈ (F_{±∞}^X)^Y is doubly F-astic, and b ∈ F^Y.

Moreover, every solution of (12) is then an l-solution, and b ⊟ t* is equal to +∞, -∞, or is finite, respectively according as case (i), (ii), or (iii) holds.
In the following corollary, we state the dual and left-right generalizations of Theorems 3.11 and 3.12.

Corollary 3.13. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}, and let t ∈ (F_{±∞}^X)^Y, b ∈ F_{±∞}^Y. Then for all combinations of c, d, and δ given in Fig. 3 the following statement is true:

The image algebra equation c has at least one solution if and only if the product d is a solution; and the product d is then the δ solution. Furthermore, if the product d is l-defined, and equation c is l-defined when a = d, then equation c is l-defined when a is any solution of equation c.

If d is a solution to c in Fig. 3, then d is called a principal solution.

We can also restate the last three theorems as a solubility criterion: Problem (12) is soluble if and only if (b ⊟ t*) ⊞ t = b; and every solution is an l-solution if (b ⊟ t*) ⊞ t = b l-exists.
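Under the matrix embedding of Section II.C this criterion is immediate to test numerically. A sketch reusing maxplus_matmul from the earlier listing; dual_matmul, the function names, and the finiteness assumptions are ours:

    def dual_matmul(A, B):
        """Min-plus matrix product, the dual of maxplus_matmul."""
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    def principal_solution(Q, b):
        """a = b [-] t*, with Q = Psi(t) of size m x n and b finite of length n;
        conjugation negates and transposes the matrix of the template."""
        return dual_matmul(b[None, :], -Q.T)[0]

    def is_soluble(Q, b):
        """(12) has a solution iff (b [-] t*) [+] t = b; assumes t doubly F-astic,
        so that no l-undefined sums arise."""
        a = principal_solution(Q, b)
        return np.allclose(maxplus_matmul(a[None, :], Q)[0], b)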
Note that Theorem 3.12 identifies the cases in which (12) has an l-defined l-solution. All solutions are then l-solutions. The next question to ask is: can we find all solutions? We now focus on the following problem:

    Given that F = ℝ_{±∞} and that (b ⊟ t*) ⊞ t = b l-exists and equals b, find all solutions of (12).    (13)

    c              d          δ

    a ⊞ t = b      b ⊟ t*     greatest
    a ⊞ t* = b     b ⊟ t      greatest
    a ⊟ t = b      b ⊞ t*     least
    a ⊟ t* = b     b ⊞ t      least
    t ⊞ a = b      t* ⊟ b     greatest
    t* ⊞ a = b     t ⊟ b      greatest
    t ⊟ a = b      t* ⊞ b     least
    t* ⊟ a = b     t ⊞ b      least

FIGURE 3. Solutions to systems of equations.


For cases (i) and (ii) of Theorem 3.12, we note that t is finite. The next proposition gives solutions for these two cases.

Proposition 3.14. Let F_{±∞} be a sub-bounded l-group of ℝ_{±∞}. If b = -∞ (the constant image), then Problem (13) has b as its unique solution. If b = +∞, then Problem (13) has as its solutions exactly those images of F_{±∞}^X which have at least one pixel value equal to +∞.

To determine solutions to case (iii), we need to consider the particular case that F_{±∞} is the 3-element bounded l-group F₃. Here b is finite with all elements having value 0.

Lemma 3.15. Let F₃ be the 3-element bounded l-group. Let t be doubly F-astic and b be finite. Then (12) is soluble, having as principal l-solution a = l, where l(x) = 0 for all x. Hence, no solution to (12) contains +∞ for any pixel value, and all solutions are l-solutions.
b. All Solutions to a ⊞ t = b. We now give some criteria for finding all solutions to Problem (12) for the case where the template t is doubly F-astic and b finite. We discuss the general case where F is the belt ℝ.

If a template t ∈ (F_{±∞}^X)^X has the form

    t_{x_i}(x_i) = a_i,  and  t_{x_i}(x_j) = -∞, j ≠ i,

we write t = diag(a_1, a_2, ..., a_m).

For b ∈ F^Y finite, define the template d ∈ (F_{±∞}^Y)^Y by

    d = diag([b(y_1)]*, [b(y_2)]*, ..., [b(y_n)]*).

Since b is finite, so is d_{y_i}(y_i), and d_{y_i}(y_i) = -b(y_i), ∀i = 1, ..., n. Thus, solving (12) is equivalent to solving

    a ⊞ s = l,    (14)

where s = t ⊞ d ∈ (F_{±∞}^X)^Y and l = 0, the constant image. Note that s_{y_i}(x_j) = t_{y_i}(x_j) - b(y_i). Now, for each image s′_{x_j} ∈ F_{±∞}^Y, let

    W^j = {(x_j, y_i) : s′_{x_j}(y_i) = ∨_{k=1}^n s′_{x_j}(y_k)}.

Note that W^j ⊂ X × Y for every j. The elements s′_{x_j}(y_i) corresponding to (x_j, y_i) ∈ W^j are called marked values. Notice that every image s′_{x_j} will have at least one marked value, as d, t, and s are doubly F-astic. Our next theorem gives conditions under which there is no solution.
Lemma 3.16. Let F_{±∞} be a bounded l-group, t ∈ (F_{±∞}^X)^Y where t is doubly F-astic, and b ∈ F^Y. Define s ∈ (F_{±∞}^X)^Y by

    s = t ⊞ d,

where d is as above. Suppose there exists i such that for no j is s_{y_i}(x_j) a marked value. That is, suppose there exists y_i ∈ Y such that s_{y_i}(x_j) is not a marked value for any j. Then there does not exist a ∈ F_{±∞}^X such that a ⊞ t = b.

There now remains the case in which for every i, there is at least one j such that s_{y_i}(x_j) is a marked value. We transform the question into a boolean problem, where it can be shown that the following procedure will give a set of solutions to Eq. (14) (Cuninghame-Green, 1979).
Step 1. For the bounded l-group F₃, define g ∈ (F₃^X)^Y by

    g_{y_i}(x_j) = 0 if s′_{x_j}(y_i) is marked, and -∞ otherwise.

Letting f ∈ F₃^X, now solve the boolean system

    f ⊞ g = l.    (15)

As in the case for matrices, each solution to Eq. (15) consists of an assignment of one of the values -∞ or 0 to each f(x_j).

Let f = (f(x_1), ..., f(x_m)) be a solution to Eq. (15).

Step 2. For each j = 1, ..., m: if f(x_j) = 0, then set a(x_j) to be the value -[∨_{i=1}^n s′_{x_j}(y_i)]. If f(x_j) = -∞, then a(x_j) is given an arbitrary value such that a(x_j) < -[∨_{i=1}^n s′_{x_j}(y_i)].

For the boolean case, we have the following proposition:

Proposition 3.17. The solutions of Eq. (15) are exactly the assignments of the values 0 or -∞ to the variables f(x_j) such that for every i = 1, ..., n there holds f(x_j) = 0 for at least one j such that s_{y_i}(x_j) is a marked value.

Theorem 3.18. Let F_{±∞} be a bounded l-group. Then the above two-step procedure yields all solutions to Eq. (14) without repetition.
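The bookkeeping in this procedure is simple array work. A sketch of the setup under our earlier matrix conventions (rows of S indexed by Y, columns by X; the function names are ours):

    def marked_values(Q, b):
        """S[i, j] = s_{y_i}(x_j) = t_{y_i}(x_j) - b(y_i); marks are column maxima."""
        S = (Q - b[None, :]).T
        marked = (S == S.max(axis=0)[None, :])
        return S, marked

    def principal_and_solubility(Q, b):
        S, marked = marked_values(Q, b)
        soluble = marked.any(axis=1).all()   # every y_i must see a marked value
        a = -S.max(axis=0)                   # a(x_j) = -[max_i s'_{x_j}(y_i)]
        return soluble, a, marked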
c. Existence and Uniqueness. This section discusses some existence and uniqueness theorems concerning solutions to Problem (12).

Theorem 3.19. Let F_{±∞} be a bounded l-group and let t ∈ (F_{±∞}^X)^Y be doubly F-astic and b ∈ F^Y be finite. Then a necessary and sufficient condition that the equation a ⊞ t = b shall have at least one solution is that for all y_i ∈ Y, there exists at least one j such that, for the template s = t ⊞ d, where d is as defined above, s_{y_i}(x_j) is a marked value.

We remark that the solution a(x_j) = -[∨_i s′_{x_j}(y_i)] gives exactly the principal solution.

This is equivalent to the following theorem:
Theorem 3.20. Let F_{±∞} be a bounded l-group, let t ∈ (F_{±∞}^X)^Y be doubly F-astic, and let b ∈ F^Y be finite. Then a necessary and sufficient condition that the equation a ⊞ t = b shall have exactly one solution is that for all y_i ∈ Y there exists at least one j such that s_{y_i}(x_j) is a marked value, and for each j = 1, ..., m there exists an i, 1 ≤ i ≤ n, such that s_{y_i}(x_j) is the only marked value in the image s_{y_i}.
Define a template t ∈ (F_{±∞}^X)^X to be strictly doubly 0-astic if it satisfies the following two conditions:

(i) t_{y_i}(x_j) ≤ 0, i, j = 1, ..., n;
(ii) for each i = 1, ..., n, there exists a unique index j ∈ {1, 2, ..., n} such that t_{y_i}(x_j) has value 0.

If t ∈ (F_{±∞}^X)^Y, |X| = m, |Y| = n, then we say that t contains a template s ∈ (F_{±∞}^{W_1})^{W_2} if the matrix Ψ(t) contains the matrix Ψ(s) of size h × k as a submatrix, where |W_1| = h, |W_2| = k, and both h, k ≤ min(m, n). We say that a template t ∈ (F_{±∞}^X)^Y contains an image a ∈ F_{±∞}^X if a = t_y for some y ∈ Y.

Theorem 3.21. Let F_{±∞} be a bounded l-group, let t ∈ (F_{±∞}^X)^Y be doubly F-astic, and let b ∈ F^Y be finite. Then a necessary and sufficient condition that the equation a ⊞ t = b shall have exactly one solution is that we can find k finite elements a_1, ..., a_k such that the template d defined by

    d_{y_i}(x_j) = -b(y_i) + t_{y_i}(x_j) + a_j

is doubly 0-astic and that d contains a strictly doubly 0-astic template s ∈ (F_{±∞}^W)^W, |W| = k.
d. A Linear Programming Criterion. We can show that Problem (12) can be stated as a linear programming problem for this bounded l-group.

Theorem 3.22. Let t ∈ (ℝ_{±∞}^X)^Y be doubly F-astic and b ∈ F^Y be finite. Let I be the set of index pairs (i, j) such that t_{y_i}(x_j) is finite, 1 ≤ i ≤ n, 1 ≤ j ≤ m. Then a sufficient condition that the equation a ⊞ t = b be soluble is that some solution {z_{ij} : (i, j) ∈ I} of the following optimization problem in the variables z_{ij}, for (i, j) ∈ I:

    minimize Σ_{(i,j)∈I} (b(y_i) - t_{y_i}(x_j)) z_{ij}

shall also satisfy

    Σ_{j : (i,j)∈I} z_{ij} > 0,  i = 1, ..., n.
We now make a definition that will be used in the next section. Let F_{±∞} be a belt, and let t ∈ (F_{±∞}^X)^Y be arbitrary. The right column space of t is the set of all b ∈ F_{±∞}^Y for which the equation a ⊞ t = b is soluble for a.
e. Linear Dependence. Linear dependence over a bounded l-group. We consider the equation a ⊞ t = b in another way. For the images t′_{x_j}, rewrite a ⊞ t = b as

    ∨_{j=1}^m [t′_{x_j} ⊞ a(x_j)] = b,

where a(x_j) ∈ (F_{±∞}^Y)^Y denotes the one-point template with target pixel value a(x_j). In this case, we say that b is a linear combination of {t′_{x_1}, t′_{x_2}, ..., t′_{x_m}}, or that b ∈ F_{±∞}^Y is (right) linearly dependent on the set {t′_{x_1}, t′_{x_2}, ..., t′_{x_m}}. While in linear algebra the concept of linear dependence provides a foundation for a theory of rank and dimension, the situation in the minimax algebra is more complicated. The notion of strong linear independence is introduced to give us a similar construct.

Theorem 3.23. Let F_{±∞} be a bounded l-group other than F₃. Let X be a coordinate set such that |X| ≥ 2, and let k ≥ 1 be an arbitrary integer. Then we can always find k finite images on X, no one of which is linearly dependent on the others.

If F_{±∞} = F₃, then we can produce a dimensional anomaly.

Theorem 3.24. Suppose F_{±∞} = F₃, and let X be a coordinate set such that |X| = m ≥ 2. Then we can always find at least (m² - m) images on X, no one of which is linearly dependent on the others.

Since every bounded l-group contains a copy of F₃, the dimensional anomaly in Theorem 3.24 extends to any arbitrary bounded l-group.
Let |X| = m, |Y| = n, and t ∈ (F^X)^Y where F is an arbitrary bounded l-group. We would like to define the rank of t in terms of linear independence, and to be equal to the number of linearly independent images t′_x of t. Suppose we were to define linear independence as the negation of linear dependence; that is, a set of k images on X, {a_1, ..., a_k}, is linearly independent if and only if no one of the a_i is linearly dependent on any subset of the others. Then, applying Theorem 3.23 for |X| = n and k > n, we could find k finite images that are linearly independent. If we defined rank as the number of linearly independent images t_y of t, then every template would have rank k ≥ n, which is not a useful definition in this context.
Strong linear independence. As for the matrix algebra, we define the concept of strong linear independence.

Let F_{±∞} be a bounded l-group and let a(1), ..., a(k) ∈ F_{±∞}^X, k ≥ 1. We say that the set {a(1), ..., a(k)} is strongly linearly independent, or SLI, if there is at least one finite image b ∈ F^X that has a unique expression of the form

    b = ∨_{p=1}^h a(j_p) ⊞ λ_{j_p},    (16)

with λ_{j_p} ∈ F, p = 1, ..., h, 1 ≤ j_p ≤ k, p = 1, ..., h, and j_p < j_q if p < q.
If 𝒜 = {a_1, a_2, ..., a_k} is a set of k images where each a_i ∈ F_{±∞}^Y, |Y| = n, then we define the template based on the set 𝒜 in the following way. For the integer k, we find a coordinate set W that has k pixel locations, that is, |W| = k. To this end, choose a positive integer p such that k = pq + r, where r < p (by the division algorithm for integers). Let W denote the set {(i, j) : 0 ≤ i ≤ p - 1, 0 ≤ j ≤ q - 1} ∪ {(-1, j) : 0 ≤ j ≤ r - 1}, which is a subset of ℤ² that is almost rectangular. There is an additional row in the fourth quadrant corresponding to the r left-over pixel locations that do not quite make a full row. Of course, there are other selections that can be made for W. Define the template t based on 𝒜 by t ∈ (F_{±∞}^Y)^W, where

    t_{w_i} = a_i,  i = 1, ..., k.

To clarify notation, we denote the template based on the set 𝒜 = {a_1, a_2, ..., a_k} by t = B(𝒜). Hence, if t ∈ (F^X)^Y, then for 𝒜 = {t′_{x_1}, t′_{x_2}, ..., t′_{x_m}}, we have B(𝒜) = t. If ℬ = {a_1, a_2, ..., a_h} is a set of h F-valued images on X, we denote the right column space of B(ℬ) by ⟨a_1, a_2, ..., a_h⟩. Thus, for t ∈ (F^X)^Y, ⟨t′_{x_1}, t′_{x_2}, ..., t′_{x_m}⟩ is the right column space of t. The set ⟨a_1, a_2, ..., a_h⟩ is also called the space generated by the set {a_1, a_2, ..., a_h}.
Lemma 3.25. Let F_{±∞} be a bounded l-group with group F. Let c_1, ..., c_k, b ∈ F_{±∞}^X, k ≥ 1, be such that b is finite and has a unique expression of the form (16). Then h = k; j_1 = 1, ..., j_h = k; λ_{j_p} ∈ F, p = 1, ..., h; and t is doubly F-astic, where t ∈ (F_{±∞}^X)^Y is the template based on the set 𝒞 = {c_1, ..., c_k}. Here, |Y| = k.

We also have the following corollary:

Corollary 3.26. Let F_{±∞} be a bounded l-group and let c_1, ..., c_n ∈ F_{±∞}^X for an integer n ≥ 1. Then {c_1, ..., c_n} is SLI if and only if there exists a finite image b ∈ F^X such that the equation a ⊞ t = b is uniquely soluble for a, where t ∈ (F_{±∞}^X)^Y is the template based on the set 𝒞 = {c_1, ..., c_n}, t = B(𝒞), |Y| = n.

We now define linear independence. Let F_{±∞} be a given belt. Then linear independence is the negation of linear dependence: c_1, ..., c_n ∈ F_{±∞}^X are linearly independent when no one of them is linearly dependent on the others. How is linear dependence related to strong linear independence?

Theorem 3.27. Let F_{±∞} be a bounded l-group, and c_1, ..., c_k ∈ F_{±∞}^X. For c_1, ..., c_k to be linearly independent it is sufficient, but not necessary, that c_1, ..., c_k be SLI.
We may call the above definition of SLI right SLI. If, in the definition of SLI, we were to multiply by the scalars λ_{j_p} from the left, we would define the concept of left SLI. If formula (16) is replaced by

    b = ∧_{p=1}^h a(j_p) ⊟ λ_{j_p},

then we have the concept of right dual SLI. We define in an analogous way the concept of left dual SLI.

3. Rank of Templates

a. Template Rank over a Bounded l-group. Let F_{±∞} be a bounded l-group and t ∈ (F_{±∞}^X)^Y be arbitrary. We call the template t (right or) left column regular if the set of images {t′_x}_{x∈X} is (right or) left SLI, respectively. We say t is right or left row regular if the template t′ is right or left column regular, respectively.

Now suppose that F_{±∞} is a bounded l-group and t ∈ (F_{±∞}^X)^Y. Suppose r is the maximum number of images t′_x of t that are SLI. In this case we say that t has column rank equal to r. The row rank of t is the column rank of t′. For a template t ∈ (F_{±∞}^X)^Y, we say that t has 0-astic rank equal to r ∈ ℤ⁺ if the following is true for k = r but not for k > r:

Let W be a coordinate set, |W| = k ≤ min(m, n). There exist a ∈ F^X and b ∈ F^Y, both finite, such that the template s ∈ (F_{±∞}^X)^Y defined by

    s_{y_i}(x_j) = b(y_i) + t_{y_i}(x_j) + a(x_j),  ∀i = 1, ..., n and j = 1, ..., m,

is doubly 0-astic and s contains a strictly doubly 0-astic template u ∈ (F_{±∞}^W)^W, for F = ℝ.
Lemma 3.28. Let F_{±∞} be a bounded l-group with group F = ℝ, and suppose that t ∈ (F_{±∞}^X)^Y has 0-astic rank equal to r. Then t is doubly F-astic, and t′ contains a set of at least r images, t′_{x_{j_k}}, k = 1, ..., r, which are SLI.

Lemma 3.29. Let F = ℝ, and suppose that t ∈ (F_{±∞}^X)^Y is doubly F-astic and consists of a set of r images which are SLI. Then t has 0-astic rank equal to at least r.

Accordingly, we have the following theorem:

Theorem 3.30. Let F = ℝ, and suppose that t ∈ (F_{±∞}^X)^Y is doubly F-astic. Then the following statements are all equivalent:

(i) t has 0-astic rank equal to r;
(ii) t has right column rank equal to r;
(iii) t has left row rank equal to r;
(iv) t* has dual right column rank equal to r; and
(v) t* has dual left row rank equal to r.
If t is doubly F-astic, then we can apply Theorem 3.30 and simply use the term rank of t for ranks (i) to (iii), and the term dual rank of t for ranks (iv) and (v). If the bounded l-group F_{±∞} is commutative, as in our case, we have the following:

Corollary 3.31. Let F = ℝ, and let t ∈ (F_{±∞}^X)^Y be doubly F-astic. Then the following statements are all equivalent:

(i) t has left column rank equal to r;
(ii) t has right row rank equal to r;
(iii) t* has dual left column rank equal to r; and
(iv) t* has dual right row rank equal to r.
b. Existence of Rank and Relation to SLI. We now discuss the existence of the rank of a template and the relationship of rank to SLI.

Theorem 3.32. Let F = ℝ, and let t ∈ (F_{±∞}^X)^Y. Then there is an integer r such that t has 0-astic rank r if and only if t is doubly F-astic. In this case r satisfies 1 ≤ r ≤ min(m, n), where m = |X|, n = |Y|.

We now have the tools to show that the previous dimensional anomalies are avoided in the context of strong linear independence.

Theorem 3.33. Let F = ℝ, and X an arbitrary nonempty, finite coordinate set with |X| = m. Then for each integer n, 1 ≤ n ≤ m, we can find n images on X, a_j ∈ F_{±∞}^X, j = 1, ..., n, that are SLI. This is impossible for n > m.
c. Permanents and Inverses. As in linear algebra, if t is a matrix all of whose eigenvalues satisfy |λ| < 1, then the expression

    (e - t)⁻¹ = e + t + t² + ⋯

is valid. We state an analogous case in the image algebra.

For a bounded l-group F_{±∞}, a template t ∈ (F_{±∞}^X)^X is called increasing if

    a ⊞ t ≥ a for all a ∈ F_{±∞}^X,  and  s ⊞ t ≥ s for all s ∈ (F_{±∞}^X)^Y,

where Y is any arbitrary coordinate set.

We have the following lemma:

Lemma 3.34. Let F_{±∞} be a bounded l-group, and let t ∈ (F_{±∞}^X)^X. Then t is increasing if and only if t_x(x) ≥ 0 ∀x ∈ X.

Let t ∈ (ℝ_{±∞}^X)^X be a template, |X| = m. We define the permanent of t to be the scalar Perm(t) ∈ ℝ_{±∞} given by

    Perm(t) = ∨_{σ∈S_m} [t_{x_1}(x_{σ(1)}) + t_{x_2}(x_{σ(2)}) + ⋯ + t_{x_m}(x_{σ(m)})],

where the maximum is taken over all permutations σ in the symmetric group S_m of order m!.

The adjugate template of t ∈ (F_{±∞}^X)^X is the template Adj(t) defined by

    [Adj(t)]_{x_i}(x_j) = Cofactor[t]_{x_j}(x_i),

where Cofactor[t]_{x_j}(x_i) is the permanent of the template s defined by

    s_{x_k}(x_h) = t_{x_k}(x_h),  h = 1, ..., j-1, j+1, ..., m and k = 1, ..., i-1, i+1, ..., m.
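For small templates both quantities can be computed by brute force from the matrix Ψ(t). A sketch; the function names are ours, and the minor indexing follows the classical adjugate analogy suggested by the definition above:

    from itertools import permutations
    import numpy as np

    def perm(T):
        """Max-plus permanent: max over sigma of sum_k T[k, sigma(k)]."""
        n = T.shape[0]
        return max(sum(T[k, s[k]] for k in range(n)) for s in permutations(range(n)))

    def adjugate(T):
        """Entry (i, j): permanent of T with row j and column i deleted."""
        n = T.shape[0]
        A = np.empty_like(T)
        for i in range(n):
            for j in range(n):
                A[i, j] = perm(np.delete(np.delete(T, j, axis=0), i, axis=1))
        return A

Theorem 3.43 below provides a useful cross-check: for a definite, increasing template, the adjugate agrees with the metric template Γ(t).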
d. Graph Theory. We now present some graph theoretic tools that will be used later.

A digraph or directed graph is a pair D = {V, E}, where V is a finite set of vertices {1, ..., n} and E ⊆ V × V. The set E is called the set of edges of D. An edge (i, j) is directed from i to j, and can be represented by a vector with tail at node i and head at node j.

A graph is a pair G = {V, E}, where V is a finite set of vertices {1, ..., n} and E ⊆ V × V such that (i, j) ∈ E if and only if (j, i) ∈ E.

A u-v path in a digraph or graph is a finite sequence of vertices u = y_0, y_1, ..., y_m = v such that (y_j, y_{j+1}) ∈ E for all j = 0, ..., m - 1. A circuit is a path with the property that y_0 = y_m. A simple path y_0, y_1, ..., y_m is a path with distinct vertices except, possibly, for y_0 and y_m. A simple circuit is a circuit that is a simple path.

A weighted digraph is a digraph in which every edge (i, j) is uniquely assigned a value in F_{±∞}. We denote the weight of the edge (i, j) by t(i, j) or t_{ij}. Note that the value t_{ij} is not necessarily equal to the value t_{ji}.

We remark that if G = {V, E} is a graph then, if there exists a u-v path, there also exists a v-u path.

With each path (circuit) σ = y_0, y_1, ..., y_m of a weighted graph G, there is an associated path (circuit) product p(σ), defined by

    p(σ) = t_{y_0,y_1} × t_{y_1,y_2} × ⋯ × t_{y_{m-1},y_m}.

For each template t ∈ (F_{-∞}^X)^X where |X| = n, we can associate a weighted graph A(t) in the following way. The associated graph A(t) is the weighted graph G = (V, E), where V = {1, 2, ..., n}, and whose weights are t_{x_i}(x_j), for the pairs (i, j) such that x_j ∈ S_{-∞}(t_{x_i}). The pair (i, j) is then considered an edge. If t_{x_i}(x_j) = -∞, then we can extend E to all of V × V by stating that (i, j) ∈ E with null weight -∞. An example of a template t and its associated weighted graph A(t) is given in Fig. 4. We have omitted listing the values of -∞ on A(t). Here, |X| = 3.

FIGURE 4. A template and its associated graph. (a) a template t; (b) associated graph A(t).
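In array form the association is nothing more than reading off the finite entries. A tiny sketch (names ours):

    import numpy as np

    def associated_graph(T):
        """A(t): an edge (i, j) carrying weight t_{x_i}(x_j) wherever that weight
        is not the null element -inf."""
        n = T.shape[0]
        return {(i, j): float(T[i, j]) for i in range(n) for j in range(n)
                if T[i, j] != -np.inf}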
For the belt F_{-∞}, the correspondence is one-to-one. We note this in the next theorem.
Lemma 3.35. Let F_{-∞} be a belt, where +∞ ∉ F. Let α : (F_{-∞}^X)^X → 𝒢 = {G : G is a weighted graph with n nodes} be defined by α(t) = A(t). Then α is one-to-one and onto.

Proof. Suppose α(t) = α(s). Let {t_{x_i}(x_j)} be the weights for A(t) and let {s_{x_i}(x_j)} be the weights for A(s). By definition, t_{x_i}(x_j) = s_{x_i}(x_j) for all i, j, and hence t = s.

Now suppose that G = (V, E) is a weighted graph with weights {w_{ij}}. Define t ∈ (F_{-∞}^X)^X by t_{x_i}(x_j) = w_{ij} if (i, j) ∈ E, and t_{x_i}(x_j) = -∞ otherwise. Then α(t) = G.
Let t ∈ (F_{-∞}^X)^X. If for each circuit σ in A(t) we have p(σ) ≤ 0, and there exists at least one circuit σ such that p(σ) = 0, then we call t a definite template.

Lemma 3.36. A template t ∈ (F_{-∞}^X)^X is definite if and only if for all simple circuits σ in A(t), p(σ) ≤ 0, and there exists at least one such simple circuit σ with p(σ) = 0.

Theorem 3.37. Let t ∈ (F_{-∞}^X)^X be either row-0-astic or column-0-astic. Then t is definite.

Theorem 3.38. Let t ∈ (F_{-∞}^X)^X. If t is definite, then so is t^r, for any integer r ≥ 0.
Let t ∈ (F_{-∞}^X)^X where |X| = n. The metric template generated by t is

    Γ(t) = t ∨ t² ∨ ⋯ ∨ tⁿ.

The dual metric template is

    Γ*(t) = t* ∧ (t²)* ∧ ⋯ ∧ (tⁿ)*.

The name metric originates from the application of the minimax algebra to transportation networks. If for the bounded l-group ℝ_{±∞} the value t_{x_i}(x_j) represents the direct distance from node i to node j of a transportation network, with t_{x_i}(x_j) = +∞ if there is no direct route, then (Γ(t))* represents the shortest distance matrix; that is, ((Γ(t))*)_{x_i}(x_j) is the shortest path possible from node i to node j of all possible paths. A description of a transportation problem concerning shortest paths is discussed in Cuninghame-Green (1979).
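The metric template is a short loop over max-plus powers. A sketch reusing maxplus_matmul from Section II.C; the shortest-distance example data and the negation trick for the dual are our own illustration:

    def metric_template(T):
        """Gamma(t) = t v t^2 v ... v t^n under the minimax product."""
        n = T.shape[0]
        power, gamma = T.copy(), T.copy()
        for _ in range(n - 1):
            power = maxplus_matmul(power, T)   # t^{r+1}
            gamma = np.maximum(gamma, power)
        return gamma

    # Direct distances (np.inf = no direct route); the conjugate view turns
    # shortest distances into a metric-template computation on -D.
    D = np.array([[np.inf, 4.0, np.inf],
                  [np.inf, np.inf, 2.0],
                  [1.0, np.inf, np.inf]])
    shortest = -metric_template(-D)            # ((Gamma(t))*)_{x_i}(x_j)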
Theorems 3.39 through 3.41 are used to prove Theorem 3.42.

Lemma 3.39. Let t ∈ (F_{-∞}^X)^X. Then

    Γ(t) = (1 ∨ t)^{n-1} ⊞ t.

Lemma 3.40. (t ∨ 1)^{r-1} = 1 ∨ t ∨ ⋯ ∨ t^{r-1}, for t ∈ (F_{-∞}^X)^X.

Theorem 3.41. Let t ∈ (F_{-∞}^X)^X be definite. Then

    t^r ≤ Γ(t),  r = 1, 2, ....

Theorem 3.42. Let t ∈ (F_{-∞}^X)^X be definite. Then

    [Γ(t)]^r ≤ Γ(t),  r = 1, 2, ...,

and

    Γ(t) = (1 ∨ t)^r ⊞ t,  r = n - 1, n, n + 1, ....
Using the adjugate of a template, we have the following theorem:

Theorem 3.43. Let F_{±∞} be a commutative bounded l-group and t ∈ (F_{±∞}^X)^X be definite and increasing. Then Adj(t) = Γ(t).

Now we define the inverse of a template. For t ∈ (F_{±∞}^X)^X, we define

    Inv(t) = (Perm(t))⁻¹ ⊞ Adj(t),

by direct analogy with elementary linear algebra.

We note that the template Inv(t) is not necessarily invertible in the sense that Inv(t) ⊞ t = 1, for example.

e. Invertibility. In order to define an invertible template, that is, a template t ∈ (F_{±∞}^X)^X that has the property that there exists a unique template s satisfying t ⊞ s = s ⊞ t = 1, we need to introduce the concept of equivalent templates.

Let F_{±∞} be a subbelt of ℝ_{±∞}. A template p ∈ (F_{±∞}^X)^X is said to be invertible if there exists a template q ∈ (F_{±∞}^X)^X such that p ⊞ q = q ⊞ p = 1.
These templates can be described in close detail. Let us define a strictly doubly F-astic template over a bounded l-group F_{±∞} to be an element t of (F_{±∞}^X)^Y satisfying

(i) t_{y_i}(x_j) < +∞, i, j = 1, ..., n; and
(ii) for each index i there exists a unique index j_i ∈ {1, 2, ..., m} such that t_{y_i}(x_{j_i}) is finite.

Theorem 3.44. Let F_{±∞} be a bounded l-group with group F and let p ∈ (F_{±∞}^X)^X be given. Then p is invertible if and only if p is strictly doubly F-astic.

As is usual, if p is invertible, then the template q above is written as p⁻¹.

The intersection of the set of strictly doubly 0-astic templates and the set of strictly doubly F-astic templates we call the permutation templates. It is not difficult to show the next proposition.

Proposition 3.45. Let F_{±∞} be a bounded l-group. Then the set of invertible templates from X to X, where |X| = m, forms a group under the multiplication ⊞, containing 1 as the identity element and having the permutation templates as a subgroup isomorphic to the symmetric group S_m on m letters.

Pre- or post-multiplication of a template t by a permutation template p will permute the images t′_x or the images t_y of t, respectively, and these permutation templates play a role exactly like their counterparts in linear algebra.
f. Equivalence of Templates. Let F_{±∞} be a bounded l-group, and let t, s ∈ (F_{±∞}^X)^Y be given. We say that t and s are equivalent, written t ≡ s, if there exist invertible templates p ∈ (F_{±∞}^X)^X and q ∈ (F_{±∞}^Y)^Y such that p ⊞ t ⊞ q = s.

Now we define elementary templates. An elementary template p ∈ (F_{±∞}^X)^X over a bounded l-group with group F is one of the following:

(i) a permutation template; or
(ii) a diagonal template of the form diag(0, ..., 0, u, 0, ..., 0), where u ∈ F.

Elementary templates correspond to matrices that perform elementary operations on matrices. A permutation template

(i) permutes the images t′_x of t; or
(ii) permutes the images t_y of t,

depending on whether the multiplication is from the left or right, respectively. Diagonal templates of the type listed in (ii) have the effect of multiplying some image t′_x of t by a finite constant u, or multiplying some image t_y of t by a finite constant u, depending on whether the multiplication of t is from the left or right, respectively.

Theorem 3.46. Let F_{±∞} be a bounded l-group, and let X and Y be given coordinate sets, |X| = m, |Y| = n. Then the relation of equivalence is an equivalence relation on (F_{±∞}^X)^Y. If t, s ∈ (F_{±∞}^X)^Y, then t ≡ s if and only if there is a sequence of templates u_0, u_1, ..., u_j such that u_0 = t and u_j = s, and u_p is obtained by an elementary operation on u_{p-1}, p = 1, ..., j.

Permutation and diagonal templates of this form will play an important role in the discussion on local template decompositions, as will the following theorem.

Lemma 3.47. Let F_{±∞} be a bounded l-group with group F and let t ∈ (F_{±∞}^X)^Y be given. If a given image of t′ (or t) is F-astic, then t is equivalent to a template in which that image of t′ (or t) is 0-astic and all other images in t′ (or t) are identical with the corresponding image in t′ (or t). Hence, if t is (row-, column-, or doubly) F-astic, then t is equivalent to a template which is (respectively row-, column-, or doubly) 0-astic.
Equivalence and Rank. The following results show the relation between equivalence and rank.

Proposition 3.48. Let F_{±∞} be a bounded l-group, and let t, s ∈ (F_{±∞}^X)^Y. Then t has 0-astic rank equal to r if and only if the following statement is true for j = r but not for j > r: t is equivalent to a doubly 0-astic template d that contains a strictly doubly 0-astic template u ∈ (F_{±∞}^W)^W, where |W| = j.

Corollary 3.49. Let F_{±∞} be a bounded l-group with group F and let t, s ∈ (F_{±∞}^X)^Y be equivalent. Then if either t or s has a rank, then so does the other, and the ranks are equal.

4. The Eigenproblem in the Image Algebra

Using the isomorphism, we can discuss the eigenproblem in the context of the image algebra. In this section we present the eigenproblem and its solution in image algebra notation.

a. The Statement in Image Algebra. Unless otherwise stated, we assume that F is a subbelt of ℝ, and let F_{±∞}, F_{-∞}, and F_{+∞} have their usual meanings. The coordinate sets X and Y are assumed to be nonempty, finite arrays, with |X| = m and |Y| = n.

Let λ ∈ F_{±∞}, and let λ ∈ (F_{±∞}^X)^X be the one-point template defined in the usual way by

    λ_y(x) = λ if x = y, and -∞ otherwise.

Suppose F is a subbelt of ℝ and t ∈ (F_{±∞}^X)^X. Then the eigenproblem is to find a ∈ F^X and λ ∈ F_{±∞} such that

    a ⊞ t = a ⊞ λ.
If such a and λ exist, then a is called an eigenimage of t, and λ a corresponding eigenvalue. The eigenproblem is called finitely soluble if both a and λ are finite.

Theorem 3.50. Let t ∈ (F^X)^Y. Then there exists s ∈ (F_{±∞}^X)^X such that if b is in the column space of t, then b is an eigenimage of s with corresponding eigenvalue 0. Here, s = t* ⊞ t ∈ (F_{±∞}^X)^X; hence b ⊞ s = b ⊞ 1 = b.

Theorem 3.51. Let t ∈ (F_{±∞}^X)^Y. If the eigenproblem for t is finitely soluble, t must be row-F-astic. In particular, if t is row-0-astic, then the eigenproblem for t is finitely soluble, in which case λ = 0.

Let t ∈ (F_{±∞}^X)^X be definite. We know that A(t) has at least one circuit σ such that p(σ) = 0. An eigennode of A(t) is any node on such a circuit. Two eigennodes are equivalent if they are both on any one such circuit.
Lemma 3.52. Let t ∈ (F_{±∞}^X)^X be definite. Then Γ(t) is definite, and if j is an eigennode of A(t), then

    (Γ(t))_{x_j}(x_j) = 0.

Conversely, if (Γ(t))_{x_j}(x_j) = 0 for some x_j ∈ X, then j is an eigennode of A(t).

Lemma 3.53. Let t ∈ (F_{±∞}^X)^X be definite. If j is an eigennode of A(t), then

    a^j ⊞ t = a^j ⊞ 1 = a^j,

where a^j is the image [Γ(t)]′_{x_j}.

Thus, the images [Γ(t)]′_{x_j} where j is an eigennode give us eigenimages of the template t, with corresponding eigenvalue 0. For a given t, the set of all such images are called the fundamental eigenimages for t. Just as in the case for matrices, two fundamental eigenimages are called equivalent if the nodes j and h are equivalent; otherwise the eigenimages are not equivalent.
Theorem 3.54. Let t ∈ (F_{±∞}^X)^X be definite. If a^j, a^k ∈ F_{±∞}^X are fundamental eigenimages of t corresponding to equivalent eigennodes j and k, respectively, then

    a^j = a^k ⊞ α,

where α ∈ F and α ∈ (F_{±∞}^X)^X is a one-point template.

b. Eigenspaces. If t ∈ (F_{±∞}^X)^X is definite, let {a^{j_1}, ..., a^{j_k}} be a maximal set of non-equivalent fundamental eigenimages of t. The space ⟨a^{j_1}, ..., a^{j_k}⟩ generated by these eigenimages is called the eigenspace of t.
Theorem 3.55. Let t ∈ (F_{±∞}^X)^X be given. If the eigenproblem for t is finitely soluble, then every finite eigenimage has the same unique corresponding finite eigenvalue λ. The template t ⊞ (-λ) is definite, and all finite eigenimages of t lie in the eigenspace of t ⊞ (-λ). The non-equivalent fundamental eigenimages that generate this space have the property that no one of them is linearly dependent on (any subset of) the others.

The unique scalar λ in Theorem 3.55, when it exists, is called the principal eigenvalue of t.

We call a bounded l-group F radicable if for each a ∈ F and integer k ≥ 1, there exists a unique f ∈ F such that f^k = a.

Some examples of radicable bounded l-groups are ℝ_{±∞} and ℚ_{±∞}. However, ℤ_{±∞} is not radicable. Choosing a = 12 and k = 5, solving for f in the equation

    f⁵ = 12

is just solving for f in (using regular arithmetic)

    5f = 12,

which, of course, has no integral solution.
Let F be a radicable bounded l-group, and t ∈ (F±∞)^X. Let
σ = (y₀, y₁, . . . , y_m) be a circuit in A(t). We define the length of σ
to be m. For each circuit σ in A(t), of length l(σ) and having circuit
product p(σ), we define a circuit mean μ(σ) ∈ F by

    [μ(σ)]^{l(σ)} = p(σ).

We also define

    Λ(t) = ∨ {μ(σ) : σ is a simple circuit in A(t)}.
For the template and associated graph A(t) in Fig. 5 we have the following
computations.

FIGURE 6. Computation of the circuit mean.

In this example, Λ(t) = 7.
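Λ(t) can be computed without enumerating circuits, since the maximum
circuit mean equals the largest value of (t^k)_{ii}/k taken over the
max-plus powers k = 1, . . . , n. A minimal Python sketch; the 3 × 3
weight matrix is illustrative only (it is not the template of Fig. 5),
with −∞ marking absent edges:

    import numpy as np

    NEG = -np.inf

    def mp(A, B):
        # max-plus matrix product: (A ⊗ B)_ij = max_k (A_ik + B_kj)
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    # illustrative 3-node weight matrix (not the template of Fig. 5);
    # entry T[i, j] is the weight of the edge from node j to node i
    T = np.array([[4.0, -1.0, NEG],
                  [7.0,  NEG, 5.0],
                  [NEG,  0.0, 2.0]])
    n = T.shape[0]

    best, P = np.max(np.diag(T)), T.copy()   # circuits of length 1
    for k in range(2, n + 1):
        P = mp(P, T)                         # closed walks of length k
        best = max(best, np.max(np.diag(P)) / k)

    print("Lambda(t) =", best)               # 4.0 for this matrix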


c. Solutions to the Eigenproblem. We now present the relation between
Λ(t) and the principal eigenvalue for t.
Theorem 3.56. Let F±∞ be a radicable bounded l-group and let t ∈ (F₋∞)^X
be given. If the eigenproblem for t is finitely soluble then Λ(t) is
finite, and in this case, Λ(t) is the only possible value for the
eigenvalue in any finite solution to the eigenproblem for t. That is,
Λ(t) is the principal eigenvalue of t.
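A minimal numpy sketch, assuming the illustrative matrix above, that
ties Lemmas 3.52 and 3.53 and Theorem 3.56 together: it forms the
definite matrix t − Λ(t), builds Γ as the maximum of the first n
max-plus powers, reads off the eigennodes from the zero diagonal
entries, and checks that the corresponding columns are eigenimages:

    import numpy as np

    NEG = -np.inf

    def mp(A, B):
        # max-plus matrix product
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    T = np.array([[4.0, -1.0, NEG],   # same illustrative matrix as above
                  [7.0,  NEG, 5.0],
                  [NEG,  0.0, 2.0]])
    n = T.shape[0]
    lam = 4.0        # Lambda(t) for this matrix, from the previous sketch

    # definite matrix B = t - Lambda(t), and Gamma(B) = B ∨ B² ∨ ... ∨ Bⁿ
    B = T - lam
    G, P = B.copy(), B.copy()
    for _ in range(n - 1):
        P = mp(P, B)
        G = np.maximum(G, P)

    # eigennodes j have Gamma(B)_jj = 0 (Lemma 3.52); the corresponding
    # columns of Gamma(B) are fundamental eigenimages (Lemma 3.53)
    for j in range(n):
        if G[j, j] == 0:
            a = G[:, j]
            assert np.allclose(mp(T, a[:, None])[:, 0], lam + a)
            print("eigennode", j, "eigenimage", a, "eigenvalue", lam)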
Theorem 3.57. Let F±∞ be a radicable sub-bounded l-group of R±∞ and let
t ∈ (F₋∞)^X be given. Then the eigenproblem for t is finitely soluble if
and only if Λ(t) is finite and the template B(d) is doubly F-astic, where

    d = {[Γ(t − Λ(t))]_{j_1}, [Γ(t − Λ(t))]_{j_2}, . . . , [Γ(t − Λ(t))]_{j_k}}

is a maximal set of non-equivalent fundamental eigenimages for the
definite template t − Λ(t).
The computational task. If |X| is large and t ∈ (F₋∞)^X, then evaluating
the circuit product directly for all simple circuits in t is very time
consuming. We now state a theorem that makes the task more manageable
for the case where the bounded l-group is R±∞.
Theorem 3.58. Let t ∈ (F₋∞)^X be given. If the eigenproblem for t is
finitely soluble, then Λ(t) is the optimal value of λ in the following
linear programming problem in the n + 1 real variables λ, x₁, . . . , x_n:

    minimize λ subject to λ + x_i − x_j ≥ t_{x_j}(x_i),

where the inequality constraint is taken over all pairs i, j for which
t_{x_j}(x_i) is finite.
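A sketch of this linear program using scipy.optimize.linprog, again on
the illustrative matrix used above (the data are not from the text):

    import numpy as np
    from scipy.optimize import linprog

    NEG = -np.inf
    T = np.array([[4.0, -1.0, NEG],   # same illustrative matrix as above
                  [7.0,  NEG, 5.0],
                  [NEG,  0.0, 2.0]])
    n = T.shape[0]

    # variables z = (lam, x_1, ..., x_n); objective: minimize lam
    obj = np.zeros(n + 1)
    obj[0] = 1.0

    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(n):
            if np.isfinite(T[i, j]):
                # lam + x_i - x_j >= T[i,j]  <=>  -lam - x_i + x_j <= -T[i,j]
                row = np.zeros(n + 1)
                row[0] = -1.0
                row[1 + i] += -1.0
                row[1 + j] += 1.0
                A_ub.append(row)
                b_ub.append(-T[i, j])

    res = linprog(obj, A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    print("Lambda(t) =", res.fun)   # 4.0, matching the direct computation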
In Theorem 3.55, we noted the linear independence of the fundamental
eigenimages that generate an eigenspace. We are now able to prove a
stronger result that has applications to R±∞.
Theorem 3.59. Let F±∞ be a radicable bounded l-group other than the
boolean belt, and let t ∈ (F₋∞)^X have a finitely soluble eigenproblem.
Then the fundamental eigenimages of t − Λ(t) corresponding to a maximal
set of non-equivalent eigennodes in A[t − Λ(t)] are SLI.
We now present a result relating Λ(t) and Inv.
Theorem 3.60. Let F±∞ be a bounded l-group and t ∈ (F₋∞)^X be such that
Λ(t) < 0. Then

    Inv(I ∨ t) = I ∨ t ∨ t² ∨ · · · ∨ t^K

for arbitrarily large K. Here, I denotes the identity template of
(F₋∞)^X.
B. A General Skeletonizing Technique

It is possible to describe a general skeletonizing technique using the image


algebra additive maximum operation. This procedure can actually be
viewed as a division algorithm in a noneuclidean domain. For example,
the integers have the property that a division algorithm can be defined
on them: for a, b ∈ Z with b ≠ 0, there exist unique integers q, r such
that a = qb + r where 0 ≤ r < |b|. This is an example of an integral
domain upon which a euclidean valuation is defined (Fraleigh, 1967). In
this section, we present a division algorithm for the minimax algebra
structure and give an application of this result to image processing in
the image algebra notation.
We remark that the boolean case has already been stated by Miller
(1978), and will be discussed in more detail at the end of this section.

1. A Matrix Division Algorithm


Let F₋∞ be a sub-bounded l-group of R₋∞. For notational convenience, we
will write t ∈ M_{n,n}(−∞) when we mean that the matrix t assumes values
only on F ∪ {−∞}. Here, M_{m,n} denotes the set of all m × n matrices
over the structure F. Similarly, we write t ∈ M_{m,n}(+∞) when the
matrix t assumes values only on F ∪ {+∞}. We will show that for a finite
vector a ∈ F^n and a suitable subset of the matrices of M_{n,n}(−∞),
there exist vectors q and r such that

    a = (t′ ⊗ q) ∨ r.
Lemma 3.61. Let a ∈ F^n be finite, and let t ∈ M_{n,n}(−∞) satisfy
S₋∞(t_i) ≠ ∅ for all i = 1, . . . , n. Define t̂ by t̂ = (t*)′. Then both
t̂ ⊗′ a and t′ ⊗ (t̂ ⊗′ a) are finite, and

    t′ ⊗ (t̂ ⊗′ a) ≤ a.

Proof. First we note that t̂_ij = −t_ij, and that S₊∞(t̂_i) = S₋∞(t_i).
Let b = t̂ ⊗′ a and let c = t′ ⊗ b. At location i,
b_i = ∧_{j ∈ S₊∞(t̂_i)} (t̂_ij + a_j). The vector b is finite, since for each
i ∈ {1, 2, . . . , n} there exists at least one j ∈ S₊∞(t̂_i) such that
t̂_ij + a_j is finite (by hypothesis S₋∞(t_i) ≠ ∅ for all i, and a_j ∈ F
for all j). Thus b_i ∈ F for all i. At location i, c_i = ∨_k (t′_ik + b_k),
and by a similar argument c_i ∈ F for all i. Suppose that c_i = t′_ik + b_k
for some k ∈ {1, 2, . . . , n}. Then
b_k = ∧_{l ∈ S₊∞(t̂_k)} (t̂_kl + a_l) = t̂_kp + a_p for some p. Since
k ∈ S₋∞(t′_i), we know that i ∈ S₋∞(t_k). Since S₋∞(t_k) = S₊∞(t̂_k), we
know that i ∈ S₊∞(t̂_k), and, hence,

    t̂_kp + a_p ≤ t̂_ki + a_i ∈ F

by our choice of p and the fact that t̂_ki ∈ F and a_i ∈ F. Thus,

    c_i = t′_ik + b_k = t_ki + t̂_kp + a_p ≤ t_ki + t̂_ki + a_i
        = t_ki + (−t_ki) + a_i = a_i.

Thus, c_i ≤ a_i, and the lemma is proved.
We now state the Division Algorithm.
Theorem 3.62 (The Division Algorithm). Let a, t satisfy the hypothesis
of Lemma 3.61. Then for q = t̂ ⊗′ a, and r defined by

    r_i = a_i    if a_i > [t′ ⊗ (t̂ ⊗′ a)]_i,
    r_i = −∞    if a_i = [t′ ⊗ (t̂ ⊗′ a)]_i,

we have

    a = (t′ ⊗ q) ∨ r.

Proof. By Lemma 3.61, a ≥ t′ ⊗ q = t′ ⊗ (t̂ ⊗′ a), and hence a ≥ r.
Thus, [t′ ⊗ (t̂ ⊗′ a)] ∨ r ≤ a. To show that equality holds, that is,
that [t′ ⊗ (t̂ ⊗′ a)] ∨ r = a, we examine two cases.
Case 1. a_i > [t′ ⊗ (t̂ ⊗′ a)]_i.
Here, [t′ ⊗ (t̂ ⊗′ a)]_i ∨ r_i = [t′ ⊗ (t̂ ⊗′ a)]_i ∨ a_i = a_i.
Case 2. a_i = [t′ ⊗ (t̂ ⊗′ a)]_i.
Here, [t′ ⊗ (t̂ ⊗′ a)]_i ∨ r_i = a_i ∨ r_i = a_i ∨ (−∞) = a_i.

Now suppose we have a = (t′ ⊗ q) ∨ r for a finite, t ∈ M_{n,n}(−∞) and t
satisfying S₋∞(t_i) ≠ ∅ for all i = 1, . . . , n. Define

    a⁰ = a,
    r⁰ = r, and
    a^{i+1} = t̂ ⊗′ a^i.

Then we have

    a = a⁰ = (t′ ⊗ a¹) ∨ r⁰.    (16)

By Lemma 3.61, a¹ = t̂ ⊗′ a⁰ is finite, and in fact a^{i+1} = t̂ ⊗′ a^i
will be finite for each i = 1, 2, . . . . Thus the Division Algorithm
applies in particular to a¹:

    a¹ = (t′ ⊗ a²) ∨ r¹,    (17)

and substituting (17) into (16), we get

    a = (t′ ⊗ a¹) ∨ r⁰
      = {t′ ⊗ [(t′ ⊗ a²) ∨ r¹]} ∨ r⁰
      = (t′ ⊗ t′ ⊗ a²) ∨ (t′ ⊗ r¹) ∨ r⁰
      = [(t′)² ⊗ a²] ∨ (t′ ⊗ r¹) ∨ r⁰,    (18)

where (t′)^k denotes the k-fold product t′ ⊗ · · · ⊗ t′.
Apply the Division Algorithm to a², to get

    a² = (t′ ⊗ a³) ∨ r²,

and substituting this into (18), we get

    a = {(t′)² ⊗ [(t′ ⊗ a³) ∨ r²]} ∨ (t′ ⊗ r¹) ∨ r⁰
      = [(t′)² ⊗ t′ ⊗ a³] ∨ [(t′)² ⊗ r²] ∨ [t′ ⊗ r¹] ∨ r⁰
      = [(t′)³ ⊗ a³] ∨ [(t′)² ⊗ r²] ∨ [t′ ⊗ r¹] ∨ r⁰.

We can continue like this up to any kth iteration:

    a = r⁰ ∨ [t′ ⊗ r¹] ∨ [(t′)² ⊗ r²] ∨ · · · ∨ [(t′)^k ⊗ r^k] ∨ [(t′)^{k+1} ⊗ a^{k+1}]

or, if we let (t′)⁰ denote the identity matrix e, we have

    a = ∨_{i=0}^{k} [(t′)^i ⊗ r^i] ∨ [(t′)^{k+1} ⊗ a^{k+1}].

We now state a result that will be useful in describing the division
algorithm in the image algebra.
Lemma 3.63. Let a, b ∈ F^n be finite vectors. Then we may express the
difference of the vectors a and b, a − b, using the following matrix
transform. Define s ∈ M_{n,n}(−∞) by
s = diag((b₁)*, . . . , (b_n)*) = diag(−b₁, . . . , −b_n), with −b_i denoting
the real arithmetic additive inverse of the real number b_i. Then

    s ⊗ a = c ∈ F^n,

where

    c_i = a_i − b_i,  i = 1, . . . , n.

Proof.

    (s ⊗ a)_i = ∨_{k=1}^{n} (s_ik + a_k) = s_ii + a_i = −b_i + a_i
    for i = 1, . . . , n.
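A few lines of numpy illustrating Lemma 3.63 with arbitrary finite
vectors:

    import numpy as np

    NEG = -np.inf
    a = np.array([3.0, 1.0, 4.0])
    b = np.array([2.0, 5.0, 1.0])

    s = np.full((3, 3), NEG)    # s = diag(-b): off-diagonal entries are -∞
    np.fill_diagonal(s, -b)

    c = np.max(s + a[None, :], axis=1)   # max-plus product s ⊗ a
    assert np.array_equal(c, a - b)      # c_i = a_i - b_i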

We remark that the vector r as defined in Theorem 3.62 can be obtained
in the following way. Fix a ∈ F^n, finite. Define f_a : F^n → F^n by

    f_a(x) = y,  where y_i = a_i if a_i > x_i, and y_i = −∞ otherwise.

Then for x = t′ ⊗ (t̂ ⊗′ a), we have f_a(x) = r.
However, it is easily shown that f_a is not a semi-lattice homomorphism.
For example, choose n = 2, a = (0, 0), d = (1, −1), and e = (−1, 1).
Then

    f_a(d ∨ e) = f_a((1, 1)) = (−∞, −∞),

but

    f_a(d) ∨ f_a(e) = (−∞, 0) ∨ (0, −∞) = (0, 0).

Thus, according to Theorem 3.1, f_a cannot be represented as a matrix
transform. If, however, we go outside of the structure of the minimax
algebra and use image algebra operations in addition to ∨ and ⊞ (or ⊟),
we can express this transform in a succinct way, as will be demonstrated
in the next section.
A dual division algorithm. The duality of the operations of the matrix
algebra enables us to describe a dual division algorithm. We omit the
proofs, since they are the duals of the proofs given in the previous
section.
Lemma 3.64. Let a ∈ F^n be finite, and let t ∈ M_{n,n}(+∞) satisfy
S₊∞(t_i) ≠ ∅ for all i = 1, . . . , n. Define t̂ by t̂ = (t*)′. Then both
t′ ⊗ a and t̂ ⊗′ (t′ ⊗ a) are finite, and

    t̂ ⊗′ (t′ ⊗ a) ≥ a.

Lemma 3.65 (The Dual Division Algorithm). Let a, t satisfy the
hypothesis of Lemma 3.64. Then for q = t′ ⊗ a, and r defined by

    r_i = a_i    if a_i < [t̂ ⊗′ (t′ ⊗ a)]_i,
    r_i = +∞    if a_i = [t̂ ⊗′ (t′ ⊗ a)]_i,

we have

    a = (t̂ ⊗′ q) ∧ r.

2. An Image Algebra Division Algorithm


Using the isomorphism Ψ, we can express these ideas in the image
algebra. Let t̂ = (t*)′ for t ∈ (F₋∞)^X.
Lemma 3.66. Let a ∈ F^X and t ∈ (F₋∞)^X be such that S₋∞(t_x) ≠ ∅ for
all x ∈ X. Then each of a ⊟ t̂ and (a ⊟ t̂) ⊞ t′ is finite, and
a ≥ (a ⊟ t̂) ⊞ t′.
The next lemma is the counterpart to Lemma 3.63.
Lemma 3.67. Let a, b ∈ F^X. Then the image c = a − b may be expressed
using a template in the following way. Define s ∈ (F₋∞)^X by

    s_y(x) = −b(y)   if x = y,
    s_y(x) = −∞      otherwise.

Then a ⊞ s = a − b.
Using the lattice characteristic function, it is sometimes the case that
we can stay within the lattice operations ∨ and ⊞ and the image algebra
operation + when needing to express a characteristic function. An
example of this follows immediately.
Theorem 3.68 (The Division Algorithm). Let a, t satisfy the hypothesis
of Lemma 3.66. Then for q = a ⊟ t̂ and r defined by

    r = a + χ_{>0}[a − ((a ⊟ t̂) ⊞ t′)],

we have that

    a = (q ⊞ t′) ∨ r.
Proof. We need to show that Ψ(r) matches our definition of the vector r
in Theorem 3.62. Let b = (a ⊟ t̂) ⊞ t′. Then, using Lemma 3.67,
a − b = a ⊞ s, where

    s_y(x) = −b(y)   if x = y,
    s_y(x) = −∞      otherwise.

Thus, a − b ≥ 0 implies that a ⊞ s ≥ 0. Now,

    χ_{>0}(a ⊞ s) = c,  where c(x) = 0 if a(x) > b(x),
                        and c(x) = −∞ if a(x) = b(x).

Thus, at location x ∈ X, the image r = a + χ_{>0}[a ⊞ s] has the gray
value

    r(x) = a(x) + c(x) = a(x)     if a(x) > b(x),
    r(x) = a(x) + (−∞) = −∞      if a(x) = b(x).

Under Ψ, this remainder image is the same as the vector r in
Theorem 3.62.

Iterating k times on an image a and a template t satisfying the
hypothesis of Lemma 3.66, we obtain

    a = ∨_{i=0}^{k} [r^i ⊞ (t′)^i] ∨ [a^{k+1} ⊞ (t′)^{k+1}],

where any template t raised to the zeroth power, t⁰, is the identity
template e.
In the boolean case, there exists an integer m such that

    a^m ⊞ (t′)^m = 0,

so that the expression for a becomes

    a = ∨_{k=0}^{m} r^k ⊞ (t′)^k.

One useful application of this result is in data compression. By
encoding the r^i's in run-length code, the image can be represented by
fewer bits of data, and reconstructed exactly once t is known.
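A numpy sketch of the k-step decomposition, stated on vectors via the
isomorphism (the image and template are illustrative); it collects the
remainders r⁰, r¹, . . . and verifies the reconstruction formula:

    import numpy as np

    NEG = -np.inf

    def mp_mat(A, B):
        # max-plus matrix product
        return np.max(A[:, :, None] + B[None, :, :], axis=1)

    def maxplus(M, v):
        return np.max(M + v[None, :], axis=1)

    def divide(a, t):
        # one step of the Division Algorithm: a = (t' ⊗ q) ∨ r
        q = np.min(-t + a[None, :], axis=1)          # q = t̂ ⊗' a
        r = np.where(a > maxplus(t.T, q), a, NEG)    # remainder
        return q, r

    t = np.array([[2.0, NEG, 0.0],
                  [1.0, 3.0, NEG],
                  [NEG, -1.0, 4.0]])
    a = np.array([5.0, 2.0, 9.0])

    steps, rs, ak = 3, [], a
    for _ in range(steps):        # a^{i+1} = t̂ ⊗' a^i, remainders r^i
        ak, r = divide(ak, t)
        rs.append(r)

    # a = r0 ∨ (t' ⊗ r1) ∨ ((t')² ⊗ r2) ∨ ... ∨ ((t')^k ⊗ a^k)
    P = np.full_like(t, NEG)
    np.fill_diagonal(P, 0.0)      # identity template e = (t')⁰
    recon = np.full_like(a, NEG)
    for r in rs:
        recon = np.maximum(recon, maxplus(P, r))
        P = mp_mat(P, t.T)
    recon = np.maximum(recon, maxplus(P, ak))
    assert np.array_equal(recon, a)

In the boolean case the remainders quickly become sparse, which is what
makes run-length coding of the r^i's attractive.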
The dual Division Algorithm can also be stated in the image algebra.
Proposition 3.69. Let a ∈ R^X and t ∈ (R₊∞)^X be such that S₊∞(t_x) ≠ ∅
for all x ∈ X. For t̂ = (t*)′, we have that each of a ⊞ t′ and
(a ⊞ t′) ⊟ t̂ is finite, and a ≤ (a ⊞ t′) ⊟ t̂.
Proposition 3.70 (The Dual Division Algorithm). Let a, t satisfy the
hypothesis of Proposition 3.69. Then for q = a ⊞ t′ and r defined by

    r = a + χ̃[a − ((a ⊞ t′) ⊟ t̂)],

where χ̃ assigns the value 0 where its argument is negative and +∞ where
it is zero, we have that

    a = (q ⊟ t̂) ∧ r.

C . An Image Complexity Measure

This section presents an image complexity measure, a term used in image


processing to describe any method that provides a quantitative measure of
some feature or set of features in an image. Image complexity measures are
used either as a pre-processing step in which the measures help direct the
selection of the next processing step, or in conjunction with other infor-
mation derived from the image to identify objects of interest.
The measure presented here is based on a method discussed by Mandelbrot
for curve length measurements. The original algorithm (Peleg et al., 1983)
was modified and translated into image algebra. The measure itself consists
of a graph that, in theory, gives an indication of the rate of change of
variation in the gray-level surface. The algorithm for computing the measure
is presented, followed by a short discussion of an application to 12 outdoor
forward looking infrared (FLIR) images.
The general approach of the algorithm is to make successive
approximations of the area of a gray-level surface and then plot the
approximations using a log-log scale. The log-log scale is purported to
allow a better visual inspection of the information contained in the
graph.
Consider all points with distance to the gray-level surface of no more than
k . These points form a blanket of thickness 2k, and the suggested surface area
A ( k ) of the gray-level surface is the volume of the blanket divided by 2k. Here
we have A ( k ) increasing as k decreases.
To begin the computation of the surface area for k = 1, 2, . . . , an
upper surface u_k and a lower surface b_k are defined iteratively in the
following manner. Let a be the input image. Let

    u₀ = b₀ = a.

Then define u_k and b_k by

    u_k = u_{k−1} ⊞ t,   b_k = b_{k−1} ⊟ t*,

where t is the invariant template whose value is 1 at its center point,
0 at the four nearest neighbors of the center, and −∞ elsewhere; thus
u_k(x) is the maximum of u_{k−1}(x) + 1 and the values of u_{k−1} at the
four neighbors of x, and dually for b_k.
The volume v(k) of the blanket between the upper and lower surfaces is
calculated for each k by computing

    p₁(k) = u_k ⊕ s,   q₁(k) = b_k ⊕ (−s),

where s is an invariant template implementing the volume estimate
described next, and v₁(k) = Σ[p₁(k) + q₁(k)].

This method of estimating the volume was derived using elementary
calculus. We explain the method for calculating the volume between the
upper surface and the coordinate set X; the volume between the lower
surface and X is found in a similar way. Given four pixel locations in
X, (i, j), (i, j + 1), (i + 1, j), and (i + 1, j + 1), a box was
constructed from the eight points in R³ corresponding to the four gray
values u_k(i, j), u_k(i, j + 1), u_k(i + 1, j), u_k(i + 1, j + 1) and the
four given pixels. Drawing a line from u_k(i, j) to u_k(i + 1, j + 1) and
a line from (i, j) to (i + 1, j + 1), the volume of the triangular
column determined by the six points u_k(i, j), u_k(i + 1, j),
u_k(i + 1, j + 1), (i, j), (i + 1, j), and (i + 1, j + 1) was found using
methods from elementary calculus. Similarly, the volume of the
triangular column determined by the six points u_k(i, j), u_k(i, j + 1),
u_k(i + 1, j + 1), (i, j), (i, j + 1), and (i + 1, j + 1) was determined.
The volumes of the two pieces are added together to give an estimate of
the volume of the box determined by the eight initial points. This is
done over the entire coordinate set X, and all volumes are added
together to give an estimate of the volume between X and the gray-value
surface u_k. The method was expressed using the image algebra operation
⊕ and an invariant template, omitting the boundary effects. Using this
approach, the volume is overestimated, so it is corrected by applying a
variant template w that is effective only on the edge pixels: w_x takes
prescribed correction values when x is a top edge pixel but not the
right corner pixel, when x is the top right corner pixel, and when x is
a right edge pixel but not the top right corner pixel, and w_x = 0
otherwise.
To correct for the extra volume added in on the edge pixels, we
calculate

    p₂(k) = u_k ⊕ (−w),   q₂(k) = b_k ⊕ w,

and let volerr(k) = Σ[p₂(k) + q₂(k)]. The corrected volume v(k) is

    v(k) = v₁(k) + volerr(k).

The approximated surface area is

    area(k) = v(k) / 2k.
The rate of change of log(area(k)) with respect to log(k) contains
important information about the image. The slope S(k) of area(k) versus k
is computed on a log-log scale for each k by finding the best-fitting
straight line through the three points

    (log(k − 1), log(area(k − 1))), (log(k), log(area(k))),
    (log(k + 1), log(area(k + 1))).

The graph of S(k) versus k is called the signature of the image. We can
also calculate a signature for the case where the array X represents the
bottom surface and u_k the upper surface; we call this the upper
signature. Similarly, the signature calculated using {b_k} for the lower
surfaces and X for the upper surfaces is called the lower signature.
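A simplified numpy sketch of the signature computation under stated
assumptions: the blanket volume is taken as the plain sum of u_k − b_k
(ignoring the triangular-column estimate and the boundary-correction
template w), edges are handled by replication, and the function names
are ours, not from the text:

    import numpy as np

    def dilate(img):
        # u(x) -> max(u(x)+1, 4-neighbors of u(x)); edges replicated
        p = np.pad(img, 1, mode="edge")
        return np.maximum.reduce([img + 1,
                                  p[:-2, 1:-1], p[2:, 1:-1],
                                  p[1:-1, :-2], p[1:-1, 2:]])

    def erode(img):
        p = np.pad(img, 1, mode="edge")
        return np.minimum.reduce([img - 1,
                                  p[:-2, 1:-1], p[2:, 1:-1],
                                  p[1:-1, :-2], p[1:-1, 2:]])

    def signature(a, kmax=20):
        u, b, area = a.astype(float), a.astype(float), {}
        for k in range(1, kmax + 2):
            u, b = dilate(u), erode(b)
            area[k] = (u - b).sum() / (2 * k)   # blanket volume / 2k
        S = {}
        for k in range(2, kmax + 1):
            xs = np.log([k - 1, k, k + 1])
            ys = np.log([area[k - 1], area[k], area[k + 1]])
            S[k] = np.polyfit(xs, ys, 1)[0]     # slope of best-fitting line
        return S

    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, size=(120, 240))
    print(signature(a, kmax=10))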
This algorithm was run on 12 outdoor images of size 120 × 240, having
255 gray values. For each image, we calculated the upper and lower
images u_i, b_i, i = 1, . . . , 50, and the graphs of the upper and lower
signatures.
As k increases, regions of pixels initially having the greatest gray
values decrease in size in the images b_k. However, as k increases, the
images u_k shrink regions having lower gray values.
advantage. Roughly, the lower signature represents the shape of objects with
high gray values, and the upper signature represents the distribution of
objects throughout the image. The images to which we applied this method
were infrared, so we were mainly interested in the lower signatures.
The magnitude of the curve S(k) is related to the information lost on
objects with details less than k in size. The more gray-level variation at
distance k, the higher the values for S(k). Thus, if at small k, S(k) is large,
then there are “high-frequency” gray-level variations, and if at large
k, S(k) is large, then we have “low-frequency” gray-level variations.
The curve S(k)
thus gives us information about the rate of change of variations in the
gray-level surface.
After running the program on a dozen images, we have concluded that this
algorithm is too sensitive to the great variance in outdoor scenery. For
example, an image that has a background of trees and no man-made objects,
and an image that has two distinct man-made objects and no trees as
background have similar graphs of the signatures. While the lower signature
represents more of the shape of the hot objects (areas with high gray values)
in the image, in one image we have no hot objects while in the other, there
are two distinct hot objects. As another example, in two other images we have
a man-made object with a road and a field as background, yet the graphs for
the upper signatures of these images are very distinct. The theory suggests
that upper signatures should represent similar targets, but we cannot draw
that conclusion from this data. A controlled scene such as a conveyor belt or

other industrial scene will most likely produce better results than outdoor
scenery.
The initial motivation for investigating this type of complexity measure
was that these graphs would give a measure of gray-level variation
within an image and help in choosing a more effective edge operator. If
an image has a high incidence of gray-level variation at small values of
k, then it is reasonable to assume that a more sensitive mask, such as
the gradient mask, would give better results. Otherwise, if an image has
small values of S(k) at small values of k, then computation time could
be saved by using a Sobel operator instead of a computationally
intensive edge operator such as the Kirsch. Unfortunately, the algorithm
did not produce data that support this conclusion.

D . The Dual Transportation Problem in Image Algebra

This section gives a short description of the transportation problem in
linear programming and provides a translation of the dual transportation
problem into image algebra notation. Thus, it provides an example of the
use of the isomorphism Ψ⁻¹.
Let m producers and n consumers of some commodity be given. Let p_i
denote the production capacity of producer i, d_j the demand of consumer
j, and c_ij the cost of transporting one unit of commodity from producer
i to consumer j. The problem is to determine how much commodity to ship
from each producer to each consumer so that consumer demands are met,
production capacities are not exceeded, and transportation costs are
minimized. This can be formulated as a linear programming (LP) problem,
which we state as follows.
Let z_ij be the number of units of commodity to be shipped from producer
i to consumer j. Then the total transportation cost

    Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij z_ij

is to be minimized. To stay within production capacity, we also must
have

    Σ_{j=1}^{n} z_ij ≤ p_i,   i = 1, . . . , m,

and to satisfy consumer demands we must have

    Σ_{i=1}^{m} z_ij ≥ d_j,   j = 1, . . . , n.
Thus the LP problem is to

    minimize    Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij z_ij
    subject to  −Σ_{j=1}^{n} z_ij ≥ −p_i,   i = 1, . . . , m,    (19)
                Σ_{i=1}^{m} z_ij ≥ d_j,   j = 1, . . . , n, and    (20)
                z_ij ≥ 0 for all i, j.
Let x_i be the dual variable associated with the ith constraint in (19),
and y_j the dual variable associated with the jth constraint in (20).
Then the dual transportation problem is given by Murty (1976):

    maximize    −Σ_{i=1}^{m} p_i x_i + Σ_{j=1}^{n} d_j y_j
    subject to  −x_i + y_j ≤ c_ij for all i, j,
                x_i ≥ 0,  y_j ≥ 0 for all i, j.

This is equivalent to solving

    minimize    Σ_{i=1}^{m} p_i x_i − Σ_{j=1}^{n} d_j y_j
    subject to  −x_i + y_j ≤ c_ij for all i, j,
                x_i ≥ 0,  y_j ≥ 0 for all i, j.

Make a change of variables by letting

    v_j = −y_j and u_i = −x_i, for all i, j.

Then we have the equivalent dual LP problem

    minimize    Σ_{j=1}^{n} d_j v_j − Σ_{i=1}^{m} p_i u_i
    subject to  u_i − v_j ≤ c_ij for all i, j,
                u_i ≤ 0,  v_j ≤ 0 for all i, j.
Using the theory of complementary slackness (Murty, 1976), if we assume
that the production capacities satisfy p_i > 0 for all i, then we are
guaranteed that for each i = 1, . . . , m there exists at least one
j ∈ {1, . . . , n} such that

    u_i − v_j = c_ij,

and, hence,

    u_i = min_{j=1,...,n} {c_ij + v_j},    (21)

where u = (u₁, . . . , u_m) and v = (v₁, . . . , v_n) are optimal feasible
solutions. We can rewrite (21) in vector notation as

    u = c ⊗′ v,

with u_i, v_j ≤ 0.
To formulate this problem in the context of the image algebra, we define
X and Y to be nonempty, finite coordinate sets, |X| = m, |Y| = n. Define
d ∈ R^Y by d(y_j) = d_j, j = 1, . . . , n, and define p ∈ R^X by
p(x_i) = p_i, i = 1, . . . , m. Under Ψ⁻¹, the optimal dual solutions u
and v correspond to images a ∈ R^X and b ∈ R^Y, respectively. Now,
define the template t ∈ (R₊∞^X)^Y by

    t_{y_j}(x_i) = c_ij.

The equation u = c ⊗′ v then translates as a = b ⊟ t. Thus, in image
algebra notation, the dual LP problem is

    minimize    Σ(b · d) − Σ(a · p)
    subject to  a = b ⊟ t,
                a ≤ 0,  b ≤ 0.
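A sketch of the dual transportation problem with scipy.optimize.linprog
on illustrative data (the capacities, demands, and costs are invented);
at the optimum it checks relation (21), u = c ⊗′ v:

    import numpy as np
    from scipy.optimize import linprog

    p = np.array([30.0, 40.0])          # production capacities
    d = np.array([20.0, 25.0, 15.0])    # consumer demands
    c = np.array([[4.0, 6.0, 8.0],      # c[i, j]: unit shipping cost i -> j
                  [5.0, 3.0, 7.0]])
    m, n = c.shape

    # variables z = (u_1..u_m, v_1..v_n); minimize sum(d*v) - sum(p*u)
    obj = np.concatenate([-p, d])
    A_ub, b_ub = [], []
    for i in range(m):
        for j in range(n):
            row = np.zeros(m + n)
            row[i], row[m + j] = 1.0, -1.0     # u_i - v_j <= c_ij
            A_ub.append(row)
            b_ub.append(c[i, j])

    res = linprog(obj, A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(None, 0.0)] * (m + n))
    u, v = res.x[:m], res.x[m:]

    # relation (21) at the optimum: u_i = min_j (c_ij + v_j)
    print(u)
    print(np.min(c + v[None, :], axis=1))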

REFERENCES

Backhouse, R. C., and Carré, B. (1975). “Regular algebra applied to path-finding problems,” J. Inst. Math. Appl. 15, 161-186.
Batcher, K. E. (1980). “Design of a massively parallel processor,” IEEE Trans. Computers 29(9), 836-840.
Benzaken, C. (1968). Structures algébriques des cheminements. In “Network and Switching Theory” (Biorci, ed.), pp. 40-57. Academic Press.
Birkhoff, G. (1940). “Lattice Theory,” Vol. 25. AMS, Providence, RI.
Birkhoff, G., and Lipson, J. L. (1970). “Heterogeneous algebras,” J. Combinatorial Theory 8, 115-133.
Carré, B. (1971). “An algebra for network routing problems,” J. Inst. Math. Appl. 7, 273-294.
Cloud, E., and Holsztynski, W. (1984). “Higher efficiency for parallel processors.” In Proc. IEEE Southcon 84, pp. 416-422. Orlando, FL.
Cohen, J. E. (1988). “Subadditivity, generalized products of random matrices and operations research,” SIAM Review 30, 69-86.
Crimmins, T. R., and Brown, W. M. (1985). “Image algebra and automatic shape recognition,” IEEE Trans. Aerospace and Elec. Systems AES-21(1), 60-69.
Cuninghame-Green, R. (1960). Process synchronisation in steelworks - a problem of feasibility. In “Proc. 2nd Int. Conf. on Oper. Research” (Banbury, ed.), pp. 323-328. English Universities Press, London.
Cuninghame-Green, R. (1962). “Describing industrial processes with interference and approximating their steady-state behaviour,” Oper. Research Quart. 13, 95-100.
Cuninghame-Green, R. (1979). “Minimax Algebra: Lecture Notes in Economics and Mathematical Systems 166.” Springer-Verlag, New York.
Davidson, J. L. (1989). Lattice Structures in the Image Algebra and Applications to Image Processing. PhD thesis, Department of Mathematics, University of Florida, Gainesville, FL.
Davidson, J. L. (1990). “A classification of lattice transformations used in image processing,” accepted for pub. in Comp. Vis., Graphics, and Image Proc., 1992.
Davidson, J. L. (1991). “Nonlinear matrix decompositions and an application to parallel processing,” accepted for pub. in Journal of Mathematical Imaging and Vision, 1992.
Davidson, J. L., and Ritter, G. X. (1990). Theory of morphological neural networks. In “Proc. of the 1990 SPIE OE/LASE Optics, Elec.-Optics, and Laser Appl. in Sci. and Eng.,” Vol. 1215, pp. 378-388. Los Angeles, CA.
Davidson, J. L., and Sun, K. (1991). Template learning in morphological neural nets. In “SPIE - Proc. Soc. of Photo-Optical Instr. Eng.,” Vol. 1568, San Diego, CA.
Duff, M. J. B. (1982). CLIP4. In “Special Computer Architectures for Pattern Processing” (K. S. Fu, ed.). CRC Press, Boca Raton, FL.
Fountain, T. J., Matthews, K. N., and Duff, M. J. B. (1988). “The CLIP7A image processor,” IEEE Trans. PAMI 10(3).
Fraleigh, J. B. (1967). “A First Course in Abstract Algebra.” Addison-Wesley, Reading, MA.
Gader, P. D. (1986). Image Algebra Techniques for Parallel Computation of Discrete Fourier Transforms and General Linear Transforms. PhD thesis, University of Florida, Gainesville, FL.
Gader, P. D. (1988). “Necessary and sufficient conditions for the existence of local matrix decompositions,” SIAM Journal on Matrix Analysis and Applications, 305-313.
Gader, P. D. (1989). “Bidiagonal factorization of Fourier matrices and systolic algorithms for computing discrete Fourier transforms,” IEEE Trans. on ASSP 37(8), 1283-1290.
Giffler, B. (1960). “Mathematical solution of production planning and scheduling problems.” Technical Report, IBM.
Hadwiger, H. (1950). “Minkowskische Addition und Subtraktion beliebiger Punktmengen und die Theoreme von Erhard Schmidt,” Mathematische Zeitschrift 53, 210-218.
Hadwiger, H. (1957). “Vorlesungen über Inhalt, Oberfläche und Isoperimetrie.” Springer-Verlag, Berlin.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). “Image analysis using mathematical morphology,” IEEE Trans. PAMI PAMI-9(4), 532-550.
Heijmans, H. J. A. M. (1990). “The algebraic basis of mathematical morphology, I: Dilations and erosions,” Comp. Vis., Graphics, and Image Proc. 50, 245-295.
Hillis, W. D. (1985). “The Connection Machine.” MIT Press, Cambridge, MA.
Klein, J. C., and Serra, J. (1972). “The texture analyzer,” J. Microsc. 95, 349-356.
Maragos, P. (1985). A Unified Theory of Translation-Invariant Systems with Applications to Morphological Analysis and Coding of Images. PhD thesis, Georgia Inst. Tech., Atlanta, GA.
Maragos, P., and Schafer, R. W. (1987). “Morphological filters, part I: Their set-theoretic analysis and relations to linear shift-invariant filters,” IEEE Trans. Acoustics, Speech, and Signal Proc. ASSP-35, 1153-1169.
Matheron, G. (1967). “Elements pour une Théorie des Milieux Poreux.” Masson, Paris.
McCubbrey, D. L., and Lougheed, R. M. (1985). “Morphological image analysis using a raster pipeline processor.” In “IEEE Comp. Workshop on Comp. Arch. for Patt. Anal. and Image Database Mngmt.,” 444-452, Miami Beach, FL.
Meyer, F. (1978). “Iterative image transformation for an automatic screening of cervical smears,” Journal of Histochem. and Cytochem. 27(1), 128-135.
Miller, P. E. (1978). An Investigation of Boolean Image Neighborhood Transformations. PhD thesis, Ohio State University.
Miller, P. E. (1983). “Development of a mathematical structure for image processing.” Technical Report, Optical Division, Perkin-Elmer.
Minkowski, H. (1903). “Volumen und Oberfläche,” Mathematische Annalen 57, 447-495.
Minkowski, H. (1911). “Gesammelte Abhandlungen.” Teubner Verlag, Leipzig-Berlin.
Murty, K. G. (1976). “Linear and Combinatorial Programming.” John Wiley, New York.
Nakagawa, Y., and Rosenfeld, A. (1978). “A note on the use of local min and max operations in digital picture processing,” IEEE Trans. Sys., Man, and Cyber. SMC-8, 632-635.
Parlett, B. N. (1982). “Winograd’s Fourier transform via circulants,” Linear Algebra Appl. 45, 137-155.
Peleg, S. (1983). “Multiple resolution texture analysis and classification.” Technical Report, Center for Automation Research, University of Maryland, College Park, MD.
Peteanu, V. (1967). “An algebra of the optimal path in networks,” Mathematica 9, 335-342.
Ritter, G. X., Davidson, J. L., and Wilson, J. N. (1987a). Beyond mathematical morphology. In “Proc. of SPIE Conf. - Visual Communication and Image Processing II,” Vol. 845, 260-269. Cambridge, MA.
Ritter, G. X., and Gader, P. D. (1987). “Image algebra techniques for parallel image processing,” Journal of Parallel and Distributed Computing 4(5), 7-44.
Ritter, G. X., Shrader-Frechette, M. A., and Wilson, J. N. (1987b). Image algebra: A rigorous and translucent way of expressing all image processing operations. In “Proc. of the 1987 SPIE Tech. Symp. Southeast on Optics, Elec.-Opt., and Sensors,” 116-121. Orlando, FL.
Ritter, G. X., and Wilson, J. N. (1987). Image algebra: A unified approach to image processing. In “Proceedings of the SPIE Medical Imaging Conference,” Newport Beach, CA.
Ritter, G. X., Wilson, J. N., and Davidson, J. L. (1990). “Image algebra: An overview,” Comp. Vis., Graphics, and Image Proc. 49(3), 297-331.
Rose, D. J. (1980). “Matrix identities of the fast Fourier transform,” Linear Algebra Appl. 29, 423-443.
Serra, J. (1969). Introduction à la morphologie mathématique. Technical Report, Cahiers du Centre de Morphologie Mathématique, Fontainebleau, France.
Serra, J. (1975). Morphologie pour les fonctions “à peu près en tout ou rien.” Technical Report, Cahiers du Centre de Morphologie Mathématique, Fontainebleau, France.
Serra, J. (1982). “Image Analysis and Mathematical Morphology.” Academic Press, London.
Serra, J. (1988). “Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances.” Academic Press, New York.
Shimbel, A. (1954). Structure in communication nets. In “Proc. Symp. on Information Networks,” 119-203. Polytechnic Institute of Brooklyn.
Sinha, D., and Giardina, C. R. (1990). “Discrete black and white object recognition via morphological functions,” IEEE Trans. PAMI PAMI-12(3), 275-293.
Sternberg, S. R. (1980a). Cellular computers and biomedical image processing. In “Lecture Notes in Medical Informatics, Proc. on Biomedical Images and Computers” (J. Sklansky, ed.), Vol. 17, 274-319. Springer-Verlag, Berlin.
Sternberg, S. R. (1980b). Language and architecture for parallel image processing. In “Conf. on Patt. Rec. in Practice,” Amsterdam.
Sternberg, S. R. (1983). “Biomedical image processing,” Computer 16(1), 22-34.
Sternberg, S. R. (1985). Overview of image algebra and related issues. In “Integrated Technology for Parallel Image Processing.” Academic Press, London.

Sternberg, S. R. (1986). “Grayscale morphology,” Comp. Vis., Graph., and Image Processing 35, 333-355.
Uhr, L. (1983). Pyramid multi-computer structures, and augmented pyramids. In “Computing Structures for Image Processing” (M. J. B. Duff, ed.). Academic Press, London.
Unger, S. H. (1958). “A computer oriented toward spatial problems,” Proc. IRE 46, 1744-1750.
Von Neumann, J. (1951). The general logical theory of automata. In “Cerebral Mechanisms in Behavior: The Hixon Symposium.” Wiley and Sons.
Zhuang, X., and Haralick, R. M. (1986). “Morphological structuring element decomposition,” Comp. Vis., Graph., and Image Proc. 35, 370-382.

Invariant Pattern Representations and Lie Groups Theory


MARIO FERRARO
Dipartimento di Fisica Sperimentale, Università di Torino, Torino, Italy

I. Introduction 131
II. The LTG/NP Approach to Visual Perception 137
III. Invariant Integral Transforms and Lie Transformation Groups 142
   A. Background 142
   B. Necessary and Sufficient Conditions for the Invariance of Integral Transforms 146
   C. Examples 152
   D. Invariant Functions and Kernels of Integral Transforms 154
IV. Transformations of Integral Transforms 157
   A. Weakly Invariant Representations 157
   B. “Covariance” of Integral Transforms 160
V. Notes on Invariant Representations of 3D Objects 166
VI. Discussion 177
Appendix A 181
Appendix B 188
References 192

I. INTRODUCTION

The problem of invariance is central to image or object recognition;
indeed, it may be said that, in a fundamental sense, invariant
recognition is precisely what a recognition system must be able to
achieve. Any visual system, biological or artificial, that claims the
ability to recognize images or objects is qualified by its ability to
function invariant to a variety of transformations:
the system must not only be able to recognize an image or an object trans-
formed with respect to some given prototype, but must also retain the
uniqueness of the match and register the transformational state. For
instance, in natural environments, patterns or objects to be detected have
unknown position, orientation, and size relative to some prototypical classes,
and so any recognition system must be able to function in a way that is
invariant to translations, rotations, and dilations. The human visual system
is in general quite able to perform under these conditions, even though many
experimental studies have elucidated the limits of this ability. The literature
on the nature and limits of invariant pattern recognition in the human visual
system is, not surprisingly, very large, but the results are far from conclusive,


as they seem to depend critically on the experimental paradigm and on the


type of pattern used as visual stimulus. For instance, classical chronometric
studies on invariance under rotations (see, e.g., Metzler and Shepard, 1974)
show that the time it takes to compare a pattern, such as a polygon, to its
rotated version increases linearly as a function of the rotation angle, up to
180 degrees, whereas experiments using letters or letter-like forms (Corballis
et al., 1978; Eley, 1982) found that the recognition time is unaffected by the
angular disparity between two patterns. Other experiments, in which displays
of limited duration were used (Rock, 1973; Foster and Mason, 1979; Kahn
and Foster, 1981), showed that the accuracy in discriminating patterns
decreases with the rotation angle up to 90 degrees. Similar results were
reported for size transformations (dilations): Bundesen and Larsen
(1975) found that recognition depends on the extent of the
transformation, whereas Kubovy and Podgorny (1981), using random
polygons, found that there is no penalty for the amplitude of the
transformation. Recently, Nazir and
O’Regan (1990) have demonstrated that even in the case of translations
pattern recognition is not completely invariant. If more complex stimuli are
used, the recognition process seems to fail completely; thus, it is almost
impossible to read cursive handwriting upside down (Rock, 1973, 1984), and
difficulties in recognizing complex stimuli arise also in the case of dilations
(Kolers et al., 1985). This is not surprising in light of the fact that the visual
system has a limited processing capability (Eriksen and Eriksen, 1974;
Lupker and Massaro, 1979; Ferraro and Foster, 1984). In conclusion, the
ease of recognition of a transformed pattern seems to depend on factors such
as the temporal duration of the visual stimulus, the structure of the pattern
or object to be recognized, and its complexity.
As concerns artificial vision, since the early 1970s different techniques have
been used to develop efficient computational procedures to enable pattern
matching that is invariant to rigid motion in two dimensions (planar trans-
lation plus rotation in the frontoparallel (x, y) plane) and dilations, whereas
only recently have attempts been made to achieve object recognition that is
invariant under three-dimensional rigid motion (translations along and
rotations about the x , y , and z axes). In this paper we shall be concerned
mostly with invariant pattern recognition, but the problem of object recog-
nition will also be briefly considered.
The ability of any visual system to perform invariant to a given transfor-
mation is determined by the way the visual input is encoded, or internally
represented, by the system. Thus, “invariant coding” deals precisely with the
problem of finding a representation of the pattern that is invariant under
certain transformations and preserves the uniqueness of the representation.
(The last condition implies that the transformational state of the pattern
relative to some reference state must also be preserved in the representation.)
Both requirements are necessary: the invariance of the encoding ensures
recognition of the pattern even though it is transformed with respect to the
prototype, whereas uniqueness prevents false recognitions. A representation
that is invariant and preserves uniqueness, and hence encodes the transfor-
mational state, is said to be “invariant in the strong sense,” whereas a
representation that is invariant but not unique is called “invariant in the
weak sense” (Caelli et al., 1992), and likewise we shall refer to “strong” or
“weak” invariance with respect to certain transformations of a recognition
process or technique.
The most natural way to represent an image, and indeed the most closely
related to the physical process underlying visual perception, is given by its
definition in the Cartesian domain. An image in the Cartesian (x, y ) domain
is defined by a function f:R2+ R,f(x, y ) = z , where z is the light intensity
at the point P(x, y). The domain of definition off is a subset D c R2,f is
bounded for any (x, y ) ED, and we setf(x, y ) = 0, if (x, y ) $ D. The graph of the
functionfis then a representation of the image in the domain (x, y ) . (More
rigorously, a representation r(x, y ) in ( x , y ) is given by the triple (x, y , f ( x ,y));
we use just f ( x , y ) for sake of simplicity.) Moreover, functions of light
intensity certainly satisfy the following conditions, whose relevance will be
apparent later:

EL,, that is, Ij’“If(&


-m
y)ldxdy < a; (la)

f has a finite number of discontinuities and a finite number of


minima and maxima for any rectangular domain of definition. (1 b)
The representation f(x, y) is certainly unique (by definition) but in
general is not invariant with respect to the most relevant
transformation groups. Let T_a be a one-parameter transformation group
with parameter a (see Appendix A for definitions and notations), where a
specifies the amount of the transformation. The transformed image is
defined by T_a f(x, y) = f(T_a x, T_a y) and, for different values of the
parameter a, the action of T_a produces different transformational
states of the image. A representation then is invariant if it is such
that for two different values of the parameter, say a′ ≠ a″,
T_{a′} f(x, y) = T_{a″} f(x, y) for any pair (x, y); since this is in
general not the case, f(x, y) is not invariant with respect to
one-parameter transformation groups. Consider the group SO(2), that is,
the group of rotations in the (x, y) plane. For two different values
a′, a″ of the parameter, we have in general
T_{a′} f(x, y) ≠ T_{a″} f(x, y). For example, let f(x, y) be the simple
pattern formed by a bright line on a dark background,

    f(x, y) = const   if y − kx = 0,
    f(x, y) = 0       otherwise.

Suppose the pattern undergoes a rotation with parameter value a; the
transformed pattern is

    T_a f(x, y) = const   if (y cos a + x sin a) − k(x cos a − y sin a) = 0,
    T_a f(x, y) = 0       otherwise,

and T_a f(x, y) ≠ T₀ f(x, y) = f(x, y).
Thus, patterns, or pattern features, are generally not invariant;
indeed, invariance is a property of the process of perception and not of
the images. This point is crucial for understanding the problem of
invariant coding, and it is often missed in the literature.
Representations satisfying the condition of weak invariance can be
obtained fromf(x, y ) in a variety of ways. The average of image intensity is
invariant to rigid motion, and representations based on geometric features of
the pattern, such as critical points (Haralick et al., 1983) or the Gaussian
curvature (Zetzsche and Barth, 1990), enjoy the same property, but none of
these preserves the uniqueness of image representation. A single real-valued
scalar function in the domain (x,y ) cannot define a representation that is
both invariant and unique, and the reason is obvious: if the function changes
under the action of the transformation, invariance is lost, whereas if it does
not change, the transformational state is not encoded. Thus, it is apparent
that invariant representations must be sought by mapping the pattern from
(x, y ) to some suitable space (u, v), that is, by giving the image a new
representation in (u, v) that possesses the desired property of invariance, and
comparison of images can be made in this new space. To draw conclusions
about the original patterns in the spatial domain, such mapping must
preserve the uniqueness of the image.
An encoding of visual information, alternative to the form f(x, y), is
given by the complex-valued integral transforms of images. Formally, a
general integral transform of f(x, y) is

    O[f(x, y)] = g(u, v) = ∫∫ f(x, y) k(u, v; x, y) dx dy    (2)

for some kernel k(u, v; x, y); g(u, v) is, in general, a complex-valued
function and can be written as

    g(u, v) = A(u, v) exp[iφ(u, v)],    (3)

where A(u, v) = |g(u, v)| and φ(u, v) correspond to the magnitude
(amplitude) and phase spectra respectively. The representation of the
image in the domain (u, v) is defined by the pair

    {A(u, v), φ(u, v)},    (4)

that is, to any point in the domain (u, v) is associated a vector with

components A(u, v) and 4 ( u , v). Note that in the domain (x, y ) the repre-
sentation was given by a scalarf(x, y ) defined for each value of the pair
(x, y ) . In the following, to simplify the notation we use g(u, v) both to denote
the results of operation of the integral transform, i.e., a map from (u, v) to
the complex plane, and as a shorthand for the representation { A @ , v), $(u, v)}.
In the vector-valued representation (4), the requirements of invariance and
uniqueness are determined by amplitude and phase components respectively.
The invariance condition is fulfilled if A(#, v) is constant for all states of the
image transformation T,, whereas uniqueness is preserved if different states
are uniquely coded in the phase component of the transform (Ferraro and
Caelli, 1988):

Note that the action of T, is defined in the domain (x, y ) and not in (u, v),
i.e., T, acts on the original pattern form f ( x , y ) . With a slight abuse of
notation we shall call an integral transform satisfying condition (5) invariant
in the strong sense with respect to To, since the corresponding representation
g(u, v) is strongly invariant. Condition (5) can be extended to two transfor-
mation groups T,, S, . We require that

For example, the shift theorem (Rosenfeld and Kak, 1982; Papoulis, 1984)
demonstrates that the Fourier transform of f(x, y) is invariant in the
strong sense with respect to translations along the x and y axes. If

    F[f(x, y)] = A(u, v) exp[iφ(u, v)]

is the Fourier transform of f(x, y), then

    F[f(x + a, y + b)] = A(u, v) exp i[φ(u, v) + (ua + vb)].

Thus, in the Fourier transform the amplitude spectrum A(u, v) is
invariant to shifts along the x and y axes, and the translational states
are encoded in the phase spectrum. Conditions (1a,b) ensure that the
Fourier transform is unique in the sense that two functions with the
same Fourier transform are equal almost everywhere (Bochner and
Chandrasekharan, 1949), and then it is trivial to prove that the inverse
of the shift theorem holds: given two patterns f₁(x, y), f₂(x, y), with
Fourier transforms F₁(u, v) and F₂(u, v) respectively, if F₁(u, v) and
F₂(u, v) differ only by an additive term in the phase, then f₁(x, y),
f₂(x, y) are translated versions of each other except, at most, for a
set of zero measure in the domain (x, y).
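A small numpy demonstration of the shift theorem on a discrete image,
where circular shifts stand in for translations; the shift amounts are
arbitrary:

    import numpy as np

    rng = np.random.default_rng(1)
    f = rng.random((64, 64))
    fs = np.roll(f, shift=(5, 9), axis=(0, 1))   # discrete translation

    F, Fs = np.fft.fft2(f), np.fft.fft2(fs)

    # the amplitude spectrum is invariant to the shift ...
    assert np.allclose(np.abs(F), np.abs(Fs))

    # ... and the shift is encoded as a linear term in the phase
    u = np.fft.fftfreq(64)[:, None]
    v = np.fft.fftfreq(64)[None, :]
    assert np.allclose(Fs, F * np.exp(-2j * np.pi * (5 * u + 9 * v)))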
In general, given a pair of transformation groups T_a, S_b in the domain
(x, y), the solution of the problem of invariant coding entails
establishing necessary and sufficient conditions under which there
exists an integral

transform such that the representation g(u, v ) in (u, v ) is strongly invariant


with respect to the given pair and, moreover, a procedure must be found to
specify the analytical form of the integral transform.
In this chapter, we shall present results concerning invariant coding
that were obtained using the theory of Lie transformation groups;
throughout, we assume that all groups considered are one-parameter Lie
transformation groups. This is not too restrictive, since most groups of
interest are one-parameter (Lie) transformation groups. (A short
introductory note on Lie group theory is presented in Appendix A.)
The great power of Lie groups theory lies in the fact that one can replace
the complicated, nonlinear conditions for the invariance of a set or function
under the transformation group by an equivalent linear condition of infini-
tesimal invariance under the corresponding infinitesimal operator of the
group. Thus, in this chapter the main tool of our analysis will be the infini-
tesimal operators of one-parameter transformation groups. First, we shall
consider the application of Lie groups to representations of the formf(x, y ) ;
next, the theory will be used to analyze the invariance properties of integral
transforms of images. One of the main topics of this chapter concerns
establishing necessary and sufficient conditions for the existence of
representations in a transform domain (Eqs. 2 and 3) that are invariant in the
strong sense, under the action of a pair (T,, S,) of Lie transformation groups;
a procedure for finding the kernels of these representations will also be
reviewed. Moreover, we will analyze the relationship between functions
invariant under a Lie transformation group and the kernels of invariant
representations. Besides studying the conditions under which strong
invariance can be attained, we shall define the notion of “covariance”
of integral
transforms under the action of a transformation group; necessary and
sufficient conditions for covariance will be determined, and relationships
between these conditions and the existence of representations invariant in the
weak sense will be examined. Finally, the problem of invariant object recog-
nition, as distinct from pattern recognition, will be analyzed briefly using an
approach based on differential geometry. (Basic notations on differential
geometry are given in Appendix B.) A final point about notation: in the
following, the terms vector field and infinitesimal operator will denote
the same mathematical entity (a justification for this dual terminology
is given in Appendix A); roughly speaking, vector field will be used to
stress geometrical or topological characteristics, whereas infinitesimal
operator will underline the algebraic aspects.
II. THE LTG/NP APPROACH TO VISUAL PERCEPTION

In a series of papers, Hoffman (1966, 1970, 1977) presented a model of visual


space perception and pattern recognition called “Lie Transformation Groups
Approach to Neuropsychology” (LTG/NP) (Hoffman, 1977). The assump-
tions made in Hoffman’s model of the visual system can be stated briefly as
follows.
The visual field is considered a two-dimensional manifold, and the proper-
ties of this manifold M are determined by the structure and arrangement of
the receptive fields of the retinal cells; these receptive fields are not disjoint
(their intersection is not empty) and differ in size. Next, it is assumed that the
visual cortex is formed by a hierarchy of neural cells, or neurons, whose
activity is driven by particular retinal cells, the ganglion cells, and
is affected (in an excitatory or inhibitory fashion) by the activity of
the neighboring cortical cells. The input signals to these cells seem to
be such that orientation,
direction, motion, and other relevant features represent information whose
encoding is differential in form, that is, they encode the rate of change of
specific parameters (Caelli and Umansky, 1976).
The results of the neurophysiological research of Hubel and Wiesel
(1962, 1965), as well as the work of Breitmeyer (1973) on human
perception, indicate
that the vertebrate visual cortex contains neurons that respond to orien-
tations, size, and motion parameters of linear shapes. In particular, the visual
receptive fields of simple and complex cortical units discovered by Hubel and
Wiesel (1962) have vector-like properties insofar as they have a position,
direction and (probably) a magnitude (Dodwell, 1983) associated with them.
The assumption that cortical cells encode rates of change of particular
parameters suggests a mechanism for their function that is different from the
one proposed by Hubel and Wiesel (1965). In the Hubel and Wiesel model,
orientation specificity is determined by the total activity of the
ganglion cells with overlapping receptive fields, assuming a threshold
model, whereas LTG/NP contends that cortical cells respond to
differences in impulses from the retinal ganglion cells.
The fundamental idea of LTG/NP is that the visual cortex assigns a
collection of tangent vectors to the visual manifold M and that we can
consider the neural process of the Hubel and Wiesel cells as the action of
vector fields that connect local tangent vectors to form integral curves, or
orbits, and these orbits, generated by the integrative process, are the visual
contours of images.
Edges or boundaries of images are perhaps the most important part of the
structures that link sensory data with their interpretation (Attneave, 1954;
Marr, 1982); a variety of methods have been proposed to detect and encode

edges efficiently. However, a simple local encoding of edge elements must be


complemented by some integrative process to produce a coherent contour of
the image (Marr, 1976; Marr and Hildreth, 1980; Ballard and Brown, 1982;
Canny, 1986; Torre and Poggio, 1986).
The LTG/NP model supplies an elegant mathematical description of how
the representation of local contour elements takes place and how the process
of integration occurs. Local coding is specified by tangent vectors to edge
elements that approximate the contour at different points with a linear fit,
whereas the vector fields specify global characteristics of the contour.
Vector fields generate orbits via the exponential map, but to do so they
must have the property of holonomy; in other words, local vectors must
be aligned “head to tail” rather than scattered across the visual field
incoherently. We are assured by a theorem due to Frobenius that a
necessary and sufficient condition for the holonomy property to hold is
that locally the vector field must be such that the differential 1-form
ω = dy − p dx, where p is the local direction-field element, is
identically equal to zero along an integral curve (Cartan, 1971; Schutz,
1980). In fact, the curve γ(s) obtained by parameterizing all x and y
satisfying the preceding condition is the integral line of the vector
field (Cartan, 1971).
essential link between local and global coding in the sense that, if the vector
field has the property of holonomy, the local information about the stimulus
is integrated to form a visual contour.
We can describe the process of contour perception as follows: the sensory
input is sampled by the retinal cells, the cortical neural cells extract differen-
tial information assigning a tangent vector to every element of the sample,
and successive tangent vectors line up to give rise to the best linear
approximation of the contour; the result is a polygonal arc approximation to
any image contour within the limits of the visual acuity. This implies that a
contour z = c(x, y) must be invariant under the action of some vector
field or infinitesimal operator ℒ, and that c(x, y) must satisfy the
condition ℒc(x, y) = 0 (see Appendix A).
Hoffman (1966) contended that the process of perception occurs by means
of a process of cancellation; that is, the visual system seeks those
differential operators that will reduce the output of a given orbit
(i.e., a contour) to zero. Simple forms like lines or circles correspond
to orbits of basic vector fields such as translations, dilations,
rotations, or their combinations; these are the basic vector fields, or
infinitesimal operators, of the model. If a given contour c(x, y) is not
annulled by any of the basic vector fields, as is certainly true for
complex patterns, prolongations of increasingly high order of the basic
vector fields are to be considered until, for some prolongation ℒ⁽ᵏ⁾,
ℒ⁽ᵏ⁾c(x, y) = 0 (Hoffman, 1970; see Appendix A for definitions and
notations of prolongations of vector fields). Visual perception is thus
characterized by a hierarchy of increasingly complex forms of
perception, corresponding to higher and higher orders of prolongations,
allowing the system to process and perceive patterns of arbitrary
complexity (Hoffman, 1970).
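The cancellation idea is easy to check symbolically. A minimal sympy
sketch, with the rotation and dilation operators written out explicitly
(the contour functions are illustrative):

    import sympy as sp

    x, y, k, r = sp.symbols("x y k r")

    # basic infinitesimal operators as maps on functions
    L_rot = lambda c: -y * sp.diff(c, x) + x * sp.diff(c, y)   # rotations
    L_dil = lambda c:  x * sp.diff(c, x) + y * sp.diff(c, y)   # dilations

    circle = x**2 + y**2 - r**2
    ray = y - k * x               # a line through the origin

    print(sp.simplify(L_rot(circle)))  # 0: circles are orbits of rotations
    print(sp.simplify(L_dil(ray)))     # y - k*x: vanishes along the contour
    print(sp.simplify(L_rot(ray)))     # -x - k*y: rotations move the line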
The LTG/NP approach aims to explain visual perception in biological
systems and, as such, needs to be corroborated by empirical findings; we shall
discuss here only some of the experimental results supporting LTG/NP; a
detailed critical review can be found elsewhere (Dodwell, 1983).
If contours are processed by a set of basic vector fields and their prolonga-
tions, it follows that the orbits of these basic vector fields should be visual
forms that are salient and easy to discriminate for biological visual systems.
Wilkinson and Dodwell (1980) tested this prediction in experiments with
kittens and showed that orbits of basic vector fields, also called
Lie-patterns, are easier to discriminate than non-Lie-patterns of the
same apparent complexity.
Caelli and Dodwell (1982) studied the relationship between local and
global coding of visual contours, i.e., the relevance of the holonomy
condition, through the properties of vectorgraphs. A vectorgraph is a pattern
made of short line segments that represents the sample of a vector field in R2;
each element has both a defined position and a defined orientation. The
experimental results showed that the ease of discrimination and the
fineness of encoding of local position and orientation were affected by
the global structure of the vectorgraph, and that the effect was
particularly strong for vectorgraphs representing basic vector fields.
Vectorgraph-like displays were used by Caelli (1976) in an experiment
that studied the relevance for visual processing of the commutativity of
two vector fields. He contended that the commutator, or Lie bracket
[ℒ₁, ℒ₂], of two vector fields ℒ₁, ℒ₂ is a measure of the perceptive
“interaction” between the corresponding contours. Thus, for instance,
since the vector fields of translations and rotations ℒ_T and ℒ_R do not
commute, there should be some perceptive interaction between straight
parallel lines and circles (i.e., the orbits of ℒ_T and ℒ_R
respectively) when they occur simultaneously in an image; by contrast,
[ℒ_R, ℒ_D] = 0 for rotations and dilations; correspondingly, circles and
stars of radial lines (i.e., the orbits of ℒ_D) should be perceptually
independent. Then, according to the interaction effect of the
commutator, the task of perceiving straight parallel lines on a
background of circles should be more difficult than that of perceiving
radial stars on the same background. The experimental results confirmed
this prediction and, moreover, showed that discrimination sensitivity
increases when the length of the segments representing elements of the
vector field is increased (Caelli, 1976).
Experiments carried out by Foster (1972), in an investigation of the phi-
motion phenomenon not directly related to LTG/NP, demonstrated that
perceived paths of apparent motion are indeed Lie orbits, provided that the
angular separation between subsequent stimuli is not too large.
The LTG/NP model itself supplies only a general language for contour
perception and not a computational procedure; a formulation is needed that
can predict the direction of the contour as a function of some neural process
and of the geometry of the stimulus. In the general framework of LTG/NP,
Caelli et al. (1978) proposed a model in which vectors tangent to a contour
were computed from a sample of $N$ points. To each pair of points $P_i = P(x_i, y_i)$, $P_j = P(x_j, y_j)$ is assigned a measure of association by the function $w_a(r_{ij}) = \exp(-\alpha r_{ij})$, where $r_{ij}$ is the distance between $P_i$ and $P_j$, and $\alpha$ is a constant. Next, for each point the components $u_i$, $v_i$ of the tangent vectors are calculated in two steps. First, the averages $\bar{u}_i$, $\bar{v}_i$ are computed with the formulae
$$\bar{u}_i = \sum_{j=1}^{N} \cos 2\theta_{ij}\, w_a(r_{ij}), \tag{7a}$$
$$\bar{v}_i = \sum_{j=1}^{N} \sin 2\theta_{ij}\, w_a(r_{ij}). \tag{7b}$$

Note that $\bar{u}$, $\bar{v}$ are the weighted averages of $\cos 2\theta_{ij}$ and $\sin 2\theta_{ij}$ respectively, and that these averages are calculated because they have the property that vectors with the same orientation but opposite sign give the same contribution. (We are interested here in determining only the orientation of the tangent vectors.) The components $u_i$, $v_i$ of the tangent vectors are obtained by converting the “$2\theta$ averages” to “$\theta$ averages,” that is, by calculating
$$(u_i, v_i) = (r_i\cos\theta_i, r_i\sin\theta_i)$$
so that
$$(\bar{u}_i, \bar{v}_i) = (r_i\cos 2\theta_i, r_i\sin 2\theta_i),$$
and the orientation of the contour at $P_i$ is estimated by the angle $\theta_i$. This
method is consistent with the idea that the visual system samples the visual
stimulus and that contours of patterns are reconstructed by the action of a
vector field; but it must be noted that there is not an explicit calculation of
the integral curves that are solutions of the equations
$$\frac{dx}{a_1(x,y)} = \frac{dy}{a_2(x,y)},$$
where $a_1$, $a_2$ are the components of the vector field, and that represent the contour, even though this computation is thought to be performed by some kind of piecewise linear interpolation.
Experiments with random dot displays (Caelli et al., 1978) demonstrated
that, indeed, in the process of extracting local orientation codes from discrete
images, human observers weight elements as an inverse function of the
distance, and that the contour path is the result of a summation of local
vector orientations.
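As a concrete illustration of this two-step estimate, the following sketch (a minimal reading of Eqs. (7a, b); the point sample, the constant $\alpha$, and all names are mine, not code from Caelli et al.) computes the weighted $2\theta$ averages and converts them back to orientations:

```python
import numpy as np

def estimate_orientations(points, alpha=1.0):
    """Estimate local contour orientations at each of N sample points
    using the distance-weighted 2-theta averages of Eqs. (7a, b)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    theta = np.zeros(n)
    for i in range(n):
        d = pts - pts[i]                      # vectors from P_i to all P_j
        r = np.hypot(d[:, 0], d[:, 1])        # distances r_ij
        ang = np.arctan2(d[:, 1], d[:, 0])    # angles theta_ij
        w = np.exp(-alpha * r)                # association weights w_a(r_ij)
        w[i] = 0.0                            # exclude the point itself
        u_bar = np.sum(np.cos(2 * ang) * w)   # Eq. (7a)
        v_bar = np.sum(np.sin(2 * ang) * w)   # Eq. (7b)
        # convert the "2-theta average" back to a "theta average";
        # orientations are defined modulo pi
        theta[i] = 0.5 * np.arctan2(v_bar, u_bar)
    return theta

# points sampled from a circle: the estimates should be roughly tangent
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
print(estimate_orientations(np.c_[np.cos(t), np.sin(t)])[:5])
```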
The LTG/NP model is characterized by many appealing features: it is
based on a rigorous definition of the visual space (the visual manifold), is
simple and mathematically elegant, and establishes a precise relationship
between local and global processing. Unfortunately, these advantages are
offset, in my opinion, by serious drawbacks. First, LTG/NP deals only with
the visual contours and has not, to the best of my knowledge, been extended
to a model that takes the full image into account. We have already noted that
contours are extremely important in visual information processing, but a
representation based on contours of course contains only partial information
about the image and certainly is not unique. Although in principle an
extension of the LTG/NP to perception of complete images is possible, it could be done only at the expense of the simplicity and elegance of the model
and, arguably, this generalization would be computationally very expensive
to implement, both in biological and artificial visual systems. Even as a model
of contour perception, LTG/NP is less efficient than other approaches that
can be found in the literature (see, e.g., Ballard and Brown, 1982) since the
calculation of prolongations of order greater than one is very complicated
(Olver, 1986).
As regards the problem of invariance, it must be observed that perception
by cancellation is neither invariant nor unique. Let us consider the issue of
uniqueness first. If $f$ is a function such that for some infinitesimal operator $\mathscr{L}$, $\mathscr{L}f(x,y) = 0$, any function of the form $g(f)$ is similarly annulled by $\mathscr{L}$, $\mathscr{L}(g(f(x,y))) = 0$ (Ovsiannikov, 1982), showing that the process is not unique. For example, it is straightforward to prove that any circle, with center $(0,0)$ and generic radius $r$, is annulled by the infinitesimal operator of rotations in the plane $\mathscr{L}_R = -y\,\partial/\partial x + x\,\partial/\partial y$, and, moreover, that any other function of the form $g(x^2 + y^2)$ is annulled by $\mathscr{L}_R$. Thus, the infinitesimal
operators seem to act as pattern classifiers rather than as pattern detectors.
On the other hand, considering again the example of the circle, if the center is not located at $(0,0)$ but at an arbitrary point $(x_0, y_0)$, the output of $\mathscr{L}_R$ will be different from zero, since a circle centered at $(x_0, y_0)$ is annulled by the infinitesimal operator $\mathscr{L}_R' = -(y - y_0)\,\partial/\partial x + (x - x_0)\,\partial/\partial y$. One should then postulate the existence of a system of basic infinitesimal operators and their prolongations at every location in the visual field just to ensure weak
invariance under translation; it is clear that such a solution is not compu-
tationally efficient. The main problem, however, is that for any transfor-
mation, the output of $\mathscr{L}f(x,y)$ is zero only for a restricted class of images, and
thus cancellation cannot explain how the process of perception is in general
invariant under certain transformations for all patterns, or at least for a large
class. For example, a pattern $f(x,y) \neq f(x^2 + y^2)$ is not invariant under
rotations, but in general we are able to recognize it independent of its
orientation.
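The non-uniqueness and the location dependence of cancellation are easy to verify symbolically; a minimal check with SymPy (variable and function names are mine):

```python
import sympy as sp

x, y, x0, y0 = sp.symbols('x y x0 y0')

def L_R(f):
    """Infinitesimal operator of rotations about the origin:
    L_R = -y d/dx + x d/dy, applied to f(x, y)."""
    return -y * sp.diff(f, x) + x * sp.diff(f, y)

g = sp.Function('g')
centered = g(x**2 + y**2)              # any function of x^2 + y^2
displaced = (x - x0)**2 + (y - y0)**2  # circle centered at (x0, y0)

print(sp.simplify(L_R(centered)))   # 0: annulled by L_R
print(sp.simplify(L_R(displaced)))  # 2*x0*y - 2*x*y0: nonzero in general
```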
On a more fundamental level, it must be observed that LTG/NP is a very
abstract approach to vision, and it would require detailed low-level models
of visual stimulus encoding to provide the necessary predictive power.
Because of this abstractness, even the experimental support is too generic to
be convincing. For instance, LTG/NP postulates the existence of an inte-
grative process that connects local position and orientation codes to global
encoding of images, and indeed evidence has been found for such a process
(Caelli and Dodwell, 1982); however, this assumption is common to many
different models (see, e.g., Grossberg, 1976a,b; Borello et al., 1981; Zucker,
1985; Carpenter et al., 1989), and thus these experimental results cannot be
considered a verification of LTG/NP. Crucial postulates of LTG/NP are that
pattern recognition takes place by a process of cancellation and, in particular,
that complex visual stimuli are processed (cancelled) by prolongations of a
small set of basic vector fields. It is clear that to test this hypothesis a model
is required showing how neural cells in the retina and in the visual cortex
implement the operation of cancellation. In conclusion, it may be said that
LTG/NP is a meta-language, useful for conveying concepts of perceptual
invariances rather than a model that can be tested by experiment or computer
simulation; a different and more favorable assessment of LTG/NP has been
formulated by Dodwell (1983).

III. INVARIANT INTEGRAL TRANSFORMS AND LIE TRANSFORMATION GROUPS

A. Background

We shall review here some methods that permit invariant recognition under
certain transformations. The method of the cross-correlator, or matched
filter, and its relationship with the Fourier transform will be considered first;
later, integral transforms that are invariant with respect to rotations and
dilations will be presented.
For its compatibility with the human visual system and its computational
efficiency, the cross-correlator has been the most commonly used form of
pattern matching since the early 1970s. Let $f(x,y)$ and $g(x,y)$ be a template and a larger picture (a scene) respectively; we assume that $f(x,y)$ is zero outside a small region $A$, and we are interested in finding places where $g(x,y)$ matches $f(x,y)$. We can do so by shifting $f(x,y)$ into all possible positions relative to $g(x,y)$ and by computing a measure of the match for each position $P(x,y)$.
One such measure is given by the cross-correlation
$$C_{fg}(\alpha,\beta) = \iint_A f(x,y)\,g(x+\alpha, y+\beta)\,dx\,dy;$$
applying the Cauchy-Schwarz inequality we obtain (Rosenfeld and Kak, 1982)
$$C_{fg}(\alpha,\beta) \leq \left[\iint_A f^2(x,y)\,dx\,dy \iint_A g^2(x+\alpha, y+\beta)\,dx\,dy\right]^{1/2}. \tag{8}$$
On the right-hand side of formula (8), $\iint f^2(x,y)\,dx\,dy$ is constant, whereas $\iint g^2(x+\alpha, y+\beta)\,dx\,dy$ depends on $\alpha$ and $\beta$; thus, we cannot use $C_{fg}$ as a measure of the match. Instead, we must use the normalized cross-correlation, defined by
$$NC_{fg}(\alpha,\beta) = \frac{C_{fg}(\alpha,\beta)}{q(\alpha,\beta)},$$
where $q^2(\alpha,\beta) = \iint_A g^2(x+\alpha, y+\beta)\,dx\,dy$.
From the Cauchy-Schwarz inequality it follows that $NC_{fg}(\alpha,\beta)$ takes a maximum, i.e., $[\iint f^2(x,y)\,dx\,dy]^{1/2}$, for displacements $(\alpha,\beta)$ at which $g(x,y) = cf(x,y)$, that is, at positions where $g(x,y)$ and $f(x,y)$ coincide, or at least are
proportional; indeed, the actual value of c is irrelevant and can be always set
equal to one by a suitable rescaling of the light intensity. The cross-correlator
thus provides a method of finding a pattern regardless of its location in the
picture, that is, the cross-correlator is invariant under translations, and the
position of the pattern in the scene, its transformational state with respect to
translations, is also encoded. The ability of the cross-correlator to function invariantly under translations is closely related to the translation invariance of the Fourier transform, since $C_{fg}(\alpha,\beta)$ can be written as the inverse transform of the product of $G(u,v)$ and $F^*(u,v)$, where $G(u,v)$ and $F(u,v)$ are the Fourier transforms of $g(x,y)$ and $f(x,y)$ respectively, and $F^*(u,v)$ is the complex conjugate of $F(u,v)$; in particular, the uniqueness of the Fourier transform ensures that no false recognitions occur and that the position of the pattern in the picture is registered. However, the cross-correlation technique fails if the pattern to be detected is transformed by the action of some group $T_a$; for instance, the cross-correlator is very sensitive to orientation and scale changes, as one must expect since the Fourier transform is not invariant under rotations and dilations, and thus it cannot be used for matching patterns with arbitrary orientation and size. A possible solution is to use many templates for $f(x,y)$ at different orientations and sizes, but this approach requires storing a large number of templates, its computation time increases with the number of templates, and it lacks elegance and simplicity.
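For completeness, a minimal FFT-based sketch of the normalized cross-correlation just described (assuming NumPy; the correlation here is circular, and all names are mine):

```python
import numpy as np

def normalized_cross_correlation(template, scene):
    """NC_fg(alpha, beta): cross-correlation of template f with scene g,
    normalized by the local energy q(alpha, beta) of g under the
    template window (both computed via the convolution theorem)."""
    H, W = scene.shape
    h, w = template.shape
    F = np.fft.rfft2(template, s=(H, W))
    G = np.fft.rfft2(scene, s=(H, W))
    # numerator: sum over x, y of f(x, y) g(x + alpha, y + beta)
    num = np.fft.irfft2(np.conj(F) * G, s=(H, W))
    # q^2(alpha, beta): sum of g^2 over the shifted template support
    box = np.fft.rfft2(np.ones((h, w)), s=(H, W))
    q2 = np.fft.irfft2(np.conj(box) * np.fft.rfft2(scene**2), s=(H, W))
    return num / np.sqrt(np.maximum(q2, 1e-12))
```

Peaks of the returned array close to the template's own norm mark positions where the scene is locally proportional to the template.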
To attain pattern recognition invariant with respect to rotations, Hsu et al.
(1982) and Hsu and Arsenault (1982) proposed a technique based on work originally done in image reconstruction (Hansen, 1981). Consider an image function $f(x,y)$ in Cartesian coordinates or $\tilde{f}(r,\theta)$ in polar coordinates. A circular harmonic expansion of $\tilde{f}(r,\theta)$ is given by
$$\tilde{f}(r,\theta) = \sum_{m=-\infty}^{+\infty} f_m(r)\exp(im\theta), \tag{9}$$
where
$$f_m(r) = \frac{1}{2\pi}\int_0^{2\pi}\tilde{f}(r,\theta)\exp(-im\theta)\,d\theta.$$

The $m$th harmonic component is defined by
$$f_m(r,\theta) = f_m(r)\exp(im\theta).$$
If $\tilde{f}(r,\theta)$ is rotated by an angle $\alpha$,
$$\tilde{f}(r,\theta+\alpha) = \sum_{m=-\infty}^{+\infty} f_m(r)\exp(im\theta)\exp(im\alpha),$$
and the cross-correlation of $\tilde{f}(r,\theta)$ with $f_m(r,\theta)$ yields
$$C_m(\alpha) = A\exp(im\alpha),$$
where $A = \int_0^\infty r|f_m(r)|^2\,dr$, and $A$ is constant for any $\alpha$. Thus, the amplitude of the cross-correlation is invariant under rotations and the transformational state is encoded in the phase. If the scene contains the reference function $\tilde{f}(r,\theta)$, the amplitude of its cross-correlation with the filter $f_m(r,\theta)$ is invariant with respect to rotations, and the angular disparity is given by an additive term in the phase spectrum.
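Numerically, the components $f_m(r)$ are just a Fourier series in $\theta$ on each ring of a polar resampling of the image; a minimal sketch (assuming SciPy for interpolation; grid sizes and names are mine):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def circular_harmonics(img, center, n_r=64, n_theta=256):
    """Sample img on a polar grid about `center` and return f_m(r),
    one row per radius, with columns indexed by the harmonic m."""
    r = np.linspace(0, min(img.shape) / 2 - 1, n_r)
    th = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    R, TH = np.meshgrid(r, th, indexing='ij')
    rows = center[0] + R * np.sin(TH)
    cols = center[1] + R * np.cos(TH)
    polar = map_coordinates(img, [rows, cols], order=1)
    # FFT over theta on each ring: coefficients of exp(i m theta)
    return np.fft.fft(polar, axis=1) / n_theta
```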
However, the uniqueness of the match is not ensured, since this method matches the target pattern with a single component of the reference function. Furthermore, the method is not shift invariant, because the representation depends on the center of expansion, which was the origin in the preceding discussion. Target images can be rotated with respect to any center, which must be determined before the expansion can be made. An iterative procedure has been proposed for finding
such centers, one for each component (Yuzan et al., 1982), but it has many
disadvantages: it requires an analytic expression for A which usually is not
available; the centers may be different for different components; and when
the target pattern is embedded in a large image it is very difficult, in practice
impossible, to find the proper center. Also, since this method matches the
target pattern against a single component of the reference pattern, the
question arises of which component should be used; and the match is in
general not unique. A modified version of this approach was proposed by Wu
and Stark (1984). They used a common center - the geometrical center of the
pattern - for all components, and considered $N$ harmonic components. A signature vector for the reference image is defined by
$$\mathbf{R} = (|R_1|, |R_2|, \ldots, |R_N|),$$
where
$$R_n = \iint \tilde{f}(r,\theta)f_n^*(r,\theta)\,dr\,d\theta.$$
For a given target pattern $g(x,y)$, a vector
$$\mathbf{C} = (|C_1|, |C_2|, \ldots, |C_N|)$$
is generated, where
$$C_n = \iint \tilde{g}(r,\theta)f_n^*(r,\theta)\,dr\,d\theta.$$
Finally, a decision rule was defined by using the vector $\mathbf{X} = \mathbf{R} - \mathbf{C}$ and taking the norm $\|\mathbf{X}\| = (\mathbf{X}^T\mathbf{X})^{1/2}$. The test criterion is, in this approach,
$$\|\mathbf{X}\| < T: \text{reference pattern present},$$
$$\|\mathbf{X}\| > T: \text{reference pattern absent}.$$
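Continuing the sketch above (all names hypothetical, and the radial integration deliberately simplified), the decision rule reduces to a norm test on harmonic-amplitude vectors:

```python
def signature(img, center, harmonics=range(1, 9)):
    """Signature vector (|R_1|, ..., |R_N|) built from the amplitudes
    of the first few circular harmonic components, summed over radius."""
    fm = circular_harmonics(img, center)
    return np.array([np.abs(fm[:, m]).sum() for m in harmonics])

def reference_present(ref, target, center, T):
    """||X|| = ||R - C|| < T  ->  reference pattern present."""
    X = signature(ref, center) - signature(target, center)
    return np.linalg.norm(X) < T
```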
The main advantage of this method is that uniqueness of the match is improved because several harmonic components are used to determine the vectors $\mathbf{R}$ and $\mathbf{C}$, and the experimental results of Wu and Stark are, as expected, better than when a single component is used. However, this method is computationally very expensive in that it requires calculating $2N$ harmonic components and $2N$ cross-correlations, and the question may arise whether it is an improvement over using a conventional matched filter and rotating the reference pattern. In conclusion, the circular harmonic decomposition
approach, in any of its versions, provides a pattern recognition procedure
that is invariant under rotations and encodes the transformational state. As
noted before, this method does not preserve uniqueness (unless all
components are used), and, contrary to claims in the literature (Yuzan et al.,
1982), it is not shift invariant, since for any pattern in the scene the center
of expansion is in general different and must be known or computed in
advance. Thus, circular harmonic decomposition cannot be used to find
patterns embedded in a larger picture or scene.
More recently (Ferrier, 1987; Caelli and Liu, 1988), a representation has
been proposed that satisfies the conditions for strong invariance under
rotations and dilations. Such a representation is provided by an integral transform of the original pattern $f(x,y)$, the so-called log-polar circular harmonic transform, or LPCH transform, whose kernel is given by
$$k(u,v;x,y) = (x^2+y^2)^{-1}\exp\{-i[u\ln(x^2+y^2)^{1/2} + v\tan^{-1}(y/x)]\}. \tag{10}$$
The properties of the LPCH transform can be better understood by writing it in the coordinates $(r,\theta)$, where $r = \ln(x^2+y^2)^{1/2}$ and $\theta = \tan^{-1}(y/x)$ (log-polar coordinates). The formula becomes
$$g(u,v) = \iint \tilde{f}(r,\theta)\exp[-i(ur+v\theta)]\,dr\,d\theta, \tag{11}$$

and it is evident that the LPCH transform is just the Fourier transform
computed in the coordinate system (r,8). The measure of the match between
two patternsf, (x, y ) andf,(x, y ) is given by the normalized cross-correlation

where q2(u,8) = ffh2(r u,8+ + j)drd8 and


C(u,8) =
JJJ ( r , @J;@ + a, 0 + PWd8,

which has a maximum whenx (r, 8) = cX(r + u,8 + /?),where c is a constant.


It is easy to prove that the LPCH transform has the desired properties of
invariance with respect to rotations and dilations, and that the orientation
(rotation) and scale (dilation) states are encoded in the phase component;
moreover, it is unique because it is the Fourier transform of 3(r, 8). The
LPCH transform, however, is not invariant under translations, since it
depends on the origin of the log-polar coordinates system.
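Numerically, the LPCH transform amounts to resampling the image on a log-polar grid and taking a 2D FFT; rotations and dilations about the center then become shifts in $(r,\theta)$, which leave the FFT amplitude unchanged. A minimal sketch (assuming SciPy; grid parameters and names are mine):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def lpch_transform(img, center, n_r=128, n_theta=256, r_min=1.0):
    """Approximate LPCH transform: log-polar resampling about `center`
    followed by a 2D FFT (the Fourier transform in (r, theta))."""
    r_max = min(img.shape) / 2 - 1
    # r = ln(x^2 + y^2)^(1/2): sample radii uniformly in log space
    log_r = np.linspace(np.log(r_min), np.log(r_max), n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    LR, TH = np.meshgrid(log_r, theta, indexing='ij')
    rows = center[0] + np.exp(LR) * np.sin(TH)
    cols = center[1] + np.exp(LR) * np.cos(TH)
    f_logpolar = map_coordinates(img, [rows, cols], order=1)
    return np.fft.fft2(f_logpolar)

# |lpch_transform(img, c)| is (approximately) unchanged when img is
# rotated about c or uniformly rescaled about it.
```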

B. Necessary and Sufficient Conditions for the Invariance of Integral Transforms

For the integral transform defined by Eq. (2), we call $g(u,v)$ the response of $f(x,y)$ to $k(u,v;x,y)$, and we define $g_a(u,v)$ as the response of $T_a f(x,y) = f(T_a x, T_a y)$, where
$$T_a x = x'(a,x,y),$$
$$T_a y = y'(a,x,y).$$
Suppose we are given two one-parameter (Lie) transformation groups. The infinitesimal operators in the domain $(x,y)$ have the form
$$\mathscr{L}_a = a_1(x,y)\frac{\partial}{\partial x} + a_2(x,y)\frac{\partial}{\partial y}, \tag{12a}$$
$$\mathscr{L}_b = b_1(x,y)\frac{\partial}{\partial x} + b_2(x,y)\frac{\partial}{\partial y}, \tag{12b}$$
where
$$a_1(x,y) = \frac{\partial x'(a,x,y)}{\partial a}\bigg|_{a=0}, \qquad a_2(x,y) = \frac{\partial y'(a,x,y)}{\partial a}\bigg|_{a=0},$$
and analogously for $b_1(x,y)$ and $b_2(x,y)$. The functions $a_i(x,y)$ and $b_i(x,y)$ are the components of the vector fields associated with the transformations. The condition of strong invariance with respect to $T_a$, $S_b$ requires that responses to changes under the action of $T_a$, $S_b$ be expressed as (compare with Eq. (6))
$$O[T_a S_b f(x,y)] = \exp[i(au+bv)]g_{00}(u,v) = \exp(iau)g_{0b}(u,v), \tag{13a}$$
where $g_{00}(u,v)$ is the response corresponding to the identity transformation, $a = b = 0$. Analogously,
$$O[S_b T_a f(x,y)] = \exp[i(au+bv)]g_{00}(u,v) = \exp(ibv)g_{a0}(u,v), \tag{13b}$$
and
$$O[S_b T_a f(x,y)] = O[T_a S_b f(x,y)] = g_{ab}(u,v).$$
Note that Eqs. (13a, b) imply
$$|g_{ab}(u,v)| = |g_{a0}(u,v)| = |g_{0b}(u,v)| = |g_{00}(u,v)|.$$
It is obvious that an integral transform satisfying conditions (13a and b) exists if it is possible to define a change of coordinates $(x,y) \to (\eta,\xi)$ such that $\eta(x,y)$ and $\xi(x,y)$ are the canonical coordinates of $T_a$ and $S_b$, that is, the actions of $T_a$ and $S_b$ are translations along the $\eta$ and $\xi$ axes respectively (and they are independent of each other). In this case the desired integral transform is given by (Ferraro and Caelli, 1988)
$$g(u,v) = \iint \tilde{f}(\eta,\xi)\exp[-i(u\eta + v\xi)]\,d\eta\,d\xi, \tag{14}$$
where $\tilde{f}(\eta,\xi)$ is the form the function $f$ assumes in the coordinates $(\eta,\xi)$. The integral transform given by Eq. (14) is the Fourier transform in the coordinate system $(\eta,\xi)$, denoted by $\mathscr{F}[\tilde{f}(\eta,\xi)]$; hence, it is unique in the sense specified previously if $\tilde{f}$ satisfies conditions (1a, b), and it is strongly invariant for translations along the $\eta$ and $\xi$ axes, that is, under the action of the transformation groups $T_a$ and $S_b$.
In the coordinate system $(\eta,\xi)$, $\mathscr{L}_a$ and $\mathscr{L}_b$ can be written simply as $\partial/\partial\eta$, $\partial/\partial\xi$ respectively, and the following equations hold:
$$\mathscr{L}_a\eta = 1, \qquad \mathscr{L}_b\eta = 0, \tag{15a}$$
$$\mathscr{L}_a\xi = 0, \qquad \mathscr{L}_b\xi = 1. \tag{15b}$$
Equations (15a and b) must be satisfied whether $\mathscr{L}_a$, $\mathscr{L}_b$ are written in the coordinate system $(x,y)$ - that is, have the expressions (12a and b) - or are simply $\mathscr{L}_a = \partial/\partial\eta$, $\mathscr{L}_b = \partial/\partial\xi$.
It is easy to show that $\partial/\partial\eta$, $\partial/\partial\xi$ form a basis for all Lie derivatives operating in the two-dimensional space (more formally, they form a basis for the tangent bundle $\mathscr{T} = \bigcup_P \mathscr{T}_P$, where $\mathscr{T}_P$ is the tangent space at a point $P \in \mathbb{R}^2$), and therefore the change of coordinates $(x,y) \to (\eta,\xi)$ is one-to-one (Schutz, 1980). Thus, we have shown that the existence of canonical coordinates for $\mathscr{L}_a$, $\mathscr{L}_b$ is a sufficient condition for the existence of a representation $g(u,v)$ that is invariant in the strong sense, as defined by (13a and b).
We now address the inverse problem: for any $g(u,v)$ such that conditions (13a and b) hold with respect to a pair $T_a$, $S_b$ of Lie transformation groups, there exist $\eta$, $\xi$ that are canonical coordinates of these groups; that is, the condition is necessary. Indeed this is the case, and details of the proof can be found in Ferraro and Caelli (1988); here, only the main points are reported. Since Eqs. (13a and b) must hold for any arbitrary $f(x,y)$, it follows that
$$\mathscr{L}_a(|k(u,v;x,y)|\,dx\,dy) = 0, \tag{16a}$$
$$\mathscr{L}_b(|k(u,v;x,y)|\,dx\,dy) = 0, \tag{16b}$$
where $|k(u,v;x,y)|$ is independent of $(u,v)$, that is,
$$k(u,v;x,y) = h(x,y)\exp(-i\gamma(u,v;x,y)).$$
Consequently, Eqs. (16a and b) become
$$\mathscr{L}_a(h(x,y)\,dx\,dy) = 0, \tag{17a}$$
$$\mathscr{L}_b(h(x,y)\,dx\,dy) = 0, \tag{17b}$$
and it can be proved, from Eqs. (13a and b) and by using the identities $T_a = \exp[a\mathscr{L}_a]$, $S_b = \exp[b\mathscr{L}_b]$, that
$$\mathscr{L}_a\gamma(u,v;x,y) = u, \tag{18a}$$
$$\mathscr{L}_b\gamma(u,v;x,y) = v. \tag{18b}$$
Equations (18a and b) are satisfied by $\gamma(u,v;x,y) = \eta(x,y)u + \xi(x,y)v$ for $\mathscr{L}_a\eta(x,y) = 1$ and $\mathscr{L}_b\eta(x,y) = 0$, and for $\mathscr{L}_a\xi(x,y) = 0$ and $\mathscr{L}_b\xi(x,y) = 1$. Then, from what we have shown so far, there exists an invertible transformation from $(x,y)$ to $(\eta,\xi)$, and the latter are canonical coordinates for $\mathscr{L}_a$, $\mathscr{L}_b$ respectively. This proves that the condition is necessary. Further, it follows that Eqs. (17a and b) can be written as
$$\mathscr{L}_a[h(\eta,\xi)|J(x,y;\eta,\xi)|\,d\eta\,d\xi] = 0,$$
$$\mathscr{L}_b[h(\eta,\xi)|J(x,y;\eta,\xi)|\,d\eta\,d\xi] = 0,$$
where $J(x,y;\eta,\xi)$ is the determinant of the Jacobian matrix of the change of
variables $(x,y) \to (\eta,\xi)$. The term $d\eta\,d\xi$ is invariant under $T_a$, $S_b$, since these transformations are translations along $\eta$ and $\xi$ respectively; the preceding equations then imply that
$$\mathscr{L}_a[h(\eta,\xi)|J(x,y;\eta,\xi)|] = \mathscr{L}_b[h(\eta,\xi)|J(x,y;\eta,\xi)|] = 0,$$
that is, $h(\eta,\xi)|J(x,y;\eta,\xi)| = c$, where $c$ is a constant. We can set, without loss of generality, $c = 1$; thus, the kernel of a representation $g(u,v)$ satisfying conditions (13a and b) is, in coordinates $\eta$, $\xi$,
$$k(u,v;\eta,\xi) = \exp[-i(\eta u + \xi v)], \tag{19}$$
that is, just the kernel of the integral transform (14). Finally, note that
$$h(\eta,\xi) = |J(x,y;\eta,\xi)|^{-1},$$
or
$$h(x,y) = |J(\eta,\xi;x,y)|,$$
where $J(\eta,\xi;x,y)$ is the Jacobian determinant of the change of coordinates $(\eta,\xi) \to (x,y)$. Then the kernel of the integral transform (14) is, when the integral is calculated in the domain $(x,y)$,
$$k(u,v;x,y) = |J(\eta,\xi;x,y)|\exp\{-i[\eta(x,y)u + \xi(x,y)v]\}. \tag{20}$$
The existence of a kernel of the form (19) or (20) is then a necessary and sufficient condition for the strong invariance of a representation $g(u,v)$. (Throughout this paper we shall denote the kernel of the integral transform by $k(u,v;\eta,\xi)$ when the integral is to be calculated in the domain $(\eta,\xi)$ and by $k(u,v;x,y)$ when the domain of integration is $(x,y)$.) We can summarize these results in the following proposition (compare with Ferraro and Caelli (1988)):

Proposition 1. Given two Lie transformation groups $T_a$, $S_b$ acting on an image $f(x,y)$, there exists a representation $g(u,v)$ of the image in the domain $(u,v)$ that is invariant in the strong sense if and only if there is a change of image coordinates $(x,y) \to (\eta,\xi)$ such that $(\eta,\xi)$ are canonical coordinates of $\mathscr{L}_a$, $\mathscr{L}_b$ respectively. The kernel for the representation is then of the form (19) or (20), depending on the domain of integration.
It is known from differential topology (Spivak, 1979; Schutz, 1980) that canonical coordinates exist if and only if $\mathscr{L}_a$, $\mathscr{L}_b$ are linearly independent and their Lie bracket (or commutator) $[\mathscr{L}_a, \mathscr{L}_b]$ is equal to zero. Then we can state, in conjunction with Proposition 1, the main result of this section.

Proposition 2. A representation $g(u,v)$ of an image invariant in the strong sense with respect to the Lie transformation groups $T_a$, $S_b$ exists if and only if the infinitesimal operators $\mathscr{L}_a$, $\mathscr{L}_b$ are linearly independent and $[\mathscr{L}_a, \mathscr{L}_b] = 0$.
Formally, Propositions 1 and 2 are equivalent, but it is clear that Proposition 2 provides a simpler method to determine whether a pair of transformation groups $T_a$, $S_b$ admits a strongly invariant representation $g(u,v)$; commutativity and linear independence of two infinitesimal operators $\mathscr{L}_a$, $\mathscr{L}_b$ can be determined by straightforward (although sometimes very tedious) calculations, whereas the existence of canonical coordinates is usually more difficult to prove. For instance, if $T_R$ is a rotation and $S_D$ a dilation, it is easy to verify that $\mathscr{L}_R$, $\mathscr{L}_D$ commute and are orthogonal, and hence linearly independent, whereas for translations along the $x$ and $y$ axes, with infinitesimal operators $\mathscr{L}_x$ and $\mathscr{L}_y$, we have
$$[\mathscr{L}_R, \mathscr{L}_x] \neq 0, \quad [\mathscr{L}_D, \mathscr{L}_x] \neq 0, \quad [\mathscr{L}_R, \mathscr{L}_y] \neq 0, \quad [\mathscr{L}_D, \mathscr{L}_y] \neq 0.$$
It is therefore possible to find kernels $k(u,v;x,y)$ such that the response $g(u,v)$ is strongly invariant for dilations and rotations, but impossible for translations and rotations or translations and dilations. It follows that, contrary to claims in the literature (Casasent and Psaltis, 1976; Yuzan et al., 1982), no integral transform can exist that is size-shift or orientation-shift invariant while preserving the uniqueness of the pattern's transformational encoding.
The representation in (14) entails writing the stimulus pattern in coordinates $(\eta,\xi)$; computationally, it is desirable to maintain the pattern in the form $f(x,y)$ and to write the kernel in the coordinate system $(x,y)$. The results we have proved so far provide a simple procedure for finding the invariant representation (14), which can be summarized as follows:
1. Solve Eqs. (15a, b), using $\mathscr{L}_a$, $\mathscr{L}_b$ expressed in $(x,y)$, to find $\eta(x,y)$ and $\xi(x,y)$.
2. Compute the Jacobian determinant $J(\eta,\xi;x,y)$ of the transformation from $(\eta,\xi)$ to $(x,y)$. (Note that, since the change of variables is one-to-one, $|J(\eta,\xi;x,y)| \neq 0$.)
The result will be Eq. (14) calculated in the integration domain $(x,y)$:
$$g(u,v) = \iint_{-\infty}^{+\infty} f(x,y)|J(\eta,\xi;x,y)|\exp\{-i[\eta(x,y)u + \xi(x,y)v]\}\,dx\,dy, \tag{21}$$
and
$$k(u,v;x,y) = |J(\eta,\xi;x,y)|\exp\{-i[\eta(x,y)u + \xi(x,y)v]\}. \tag{22}$$
It is obvious that $g(u,v)$ does not change whether it is calculated using (14) or (21).
As concerns the comparison of images transformed by the action of $T_a$ and $S_b$, it must be observed that any image $f(x,y)$ is uniquely represented in the canonical coordinates domain by $\tilde{f}(\eta,\xi)$, since the change of coordinates from $(x,y)$ to $(\eta,\xi)$ is one-to-one, and then it is possible to define a generalized cross-correlation
$$GC(\alpha,\beta) = \iint \tilde{f}_1(\eta,\xi)\tilde{f}_2(\eta+\alpha, \xi+\beta)\,d\eta\,d\xi, \tag{23}$$
which, after normalization, yields a maximum when $\tilde{f}_1(\eta,\xi) = c\tilde{f}_2(\eta+\alpha, \xi+\beta)$, where $c$ is a constant. Furthermore, invariant integral transforms are of the form $\mathscr{F}(\tilde{f}(\eta,\xi))$, that is, they are Fourier transforms computed in the domain $(\eta,\xi)$. From the convolution theorem it follows that $GC(\alpha,\beta)$ is the inverse transform of the product $g_1^*(u,v)g_2(u,v)$, where
$$g_1(u,v) = \mathscr{F}(\tilde{f}_1(\eta,\xi)), \qquad g_2(u,v) = \mathscr{F}(\tilde{f}_2(\eta,\xi)),$$
and the symbol * denotes complex conjugation.
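In practice Eq. (23) is evaluated exactly as the convolution theorem suggests; a minimal sketch (assuming NumPy, with the two patterns already resampled to canonical coordinates, e.g. by the log-polar routine above):

```python
import numpy as np

def generalized_cross_correlation(f1_can, f2_can):
    """GC(alpha, beta) of two patterns given in canonical coordinates
    (eta, xi): the inverse FFT of conj(g1) * g2. The location of the
    peak estimates the transformational state (a, b)."""
    g1 = np.fft.fft2(f1_can)
    g2 = np.fft.fft2(f2_can)
    gc = np.fft.ifft2(np.conj(g1) * g2).real
    return np.unravel_index(np.argmax(gc), gc.shape)
```

With log-polar canonical coordinates, for instance, the two peak indices correspond to the rotation angle and the logarithm of the scale factor.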
We have not considered the case of invariance of integral transforms with respect to a single transformation group, but this is simply a particular case of our treatment. The procedure is the same, the only difference being that the response function depends on a single variable, such as $u$, and the system of equations in Eqs. (15a and b) becomes, for instance, $\mathscr{L}_a\eta = 1$, $\mathscr{L}_a\xi = 0$, and the integration must be performed on just two partial differential equations (Bluman and Cole, 1974). Canonical coordinates will exist if $\mathscr{L}_a$ is not singular, that is, if its components $a_i(x,y)$, $i = 1, 2$, do not both vanish (Spivak, 1979).
On the other hand, the condition of linear independence limits to two the number of one-parameter transformations under which an integral transform of an image can be invariant. Transformations of images operate in $\mathbb{R}^2$, and then the corresponding infinitesimal operators are two-dimensional. Suppose there are three infinitesimal operators $\mathscr{L}_a$, $\mathscr{L}_b$, $\mathscr{L}_c$ acting on $\mathbb{R}^2$, and suppose $[\mathscr{L}_a, \mathscr{L}_b] = [\mathscr{L}_a, \mathscr{L}_c] = [\mathscr{L}_b, \mathscr{L}_c] = 0$. Then one of the infinitesimal operators can be written as a linear combination of the others. By virtue of Proposition 2 it is impossible to find canonical coordinates, and hence invariant representations, for the three operators $\mathscr{L}_a$, $\mathscr{L}_b$, $\mathscr{L}_c$. Of course this becomes possible if one introduces image representations defined on domains of higher dimensionality. In general, representations that have the property of strong invariance with respect to $N$ commuting one-parameter transformation groups must be of the form $g(u_1, \ldots, u_N)$, that is, responses of $N$-dimensional representations $f(x_1, \ldots, x_N)$ in $\mathbb{R}^N$.
C. Examples

As noted in Section I, the Fourier transform satisfies the condition of strong invariance with respect to translations along the $x$ and $y$ axes, and the canonical coordinates are simply $\eta = x$, $\xi = y$. This result can be easily generalized to any translations in two different directions. The infinitesimal operators associated with these transformations are
$$\mathscr{L} = \alpha\frac{\partial}{\partial x} + \beta\frac{\partial}{\partial y}, \qquad \mathscr{L}' = \alpha'\frac{\partial}{\partial x} + \beta'\frac{\partial}{\partial y},$$
where $(\alpha,\beta)$ and $(\alpha',\beta')$ (with $\alpha/\beta \neq \alpha'/\beta'$) determine the directions of the translations. It is straightforward to show, by Eqs. (15a and b), that canonical coordinates for these transformations are
$$\eta = \frac{1}{\alpha\beta' - \alpha'\beta}(\beta' x - \alpha' y)$$
and
$$\xi = \frac{1}{\alpha\beta' - \alpha'\beta}(\alpha y - \beta x).$$

For a rotation $T_R$ and a dilation $T_D$ the infinitesimal operators are
$$\mathscr{L}_R = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}, \qquad \mathscr{L}_D = x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y}.$$
(See Appendix A.) Since $\mathscr{L}_R$ and $\mathscr{L}_D$ commute and are orthogonal (and hence linearly independent), the conditions of Proposition 2 hold. The canonical coordinates for $\mathscr{L}_R$ and $\mathscr{L}_D$ are given by solving the system (compare with Eqs. 15a and b)
$$\mathscr{L}_R\eta = 1, \qquad \mathscr{L}_D\eta = 0,$$
$$\mathscr{L}_R\xi = 0, \qquad \mathscr{L}_D\xi = 1.$$
It is trivial to verify that $x^2 + y^2 = \text{const}$ is a solution of $\mathscr{L}_R\xi(x,y) = 0$, so the canonical coordinate $\xi$ must be of the form $\xi(x,y) = \xi(x^2+y^2)$ (Ovsiannikov, 1982). Hence,
$$x\frac{\partial}{\partial x}\xi(x^2+y^2) + y\frac{\partial}{\partial y}\xi(x^2+y^2) = 1.$$
Set $x^2 + y^2 = z$; by application of the chain rule, it follows that
$$x\frac{\partial\xi(z)}{\partial z}2x + y\frac{\partial\xi(z)}{\partial z}2y = 2x^2\frac{\partial\xi(z)}{\partial z} + 2y^2\frac{\partial\xi(z)}{\partial z} = 1,$$
that is,
$$\frac{\partial\xi(z)}{\partial z} = \frac{1}{2z},$$
and
$$\xi(x,y) = \xi(x^2+y^2) = \ln[(x^2+y^2)^{1/2}].$$
In a similar way we can calculate $\eta$. A solution of $\mathscr{L}_D\eta(x,y) = 0$ is $y/x = \text{const}$; then $\eta(x,y) = \eta(y/x)$, and by setting $z = y/x$ we obtain
$$\frac{y^2}{x^2}\frac{\partial\eta(z)}{\partial z} + \frac{\partial\eta(z)}{\partial z} = 1$$
and
$$\eta(x,y) = \tan^{-1}(y/x).$$

The modulus of the Jacobian determinant of the change of coordinates $(\eta,\xi) \to (x,y)$ is
$$|J(\eta,\xi;x,y)| = \frac{1}{x^2+y^2}.$$
Thus, we obtain from Eq. (20)
$$k(u,v;x,y) = \frac{1}{x^2+y^2}\exp(-i\{\ln[(x^2+y^2)^{1/2}]u + \tan^{-1}(y/x)v\}),$$
the kernel of an integral transform strongly invariant under rotations and dilations (compare with Eq. (11)) (Ferraro and Caelli, 1988).
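The two-step procedure above can be checked mechanically; a minimal SymPy verification for the rotation-dilation pair (all names are mine):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True, positive=True)

L_R = lambda f: -y * sp.diff(f, x) + x * sp.diff(f, y)   # rotations
L_D = lambda f:  x * sp.diff(f, x) + y * sp.diff(f, y)   # dilations

eta = sp.atan2(y, x)                  # canonical coordinate for L_R
xi = sp.log(sp.sqrt(x**2 + y**2))     # canonical coordinate for L_D

# Eqs. (15a, b): L_R eta = 1, L_D eta = 0, L_R xi = 0, L_D xi = 1
print([sp.simplify(e) for e in
       (L_R(eta), L_D(eta), L_R(xi), L_D(xi))])   # -> [1, 0, 0, 1]

# |J(eta, xi; x, y)| = 1/(x^2 + y^2), the factor in Eq. (20)
J = sp.Matrix([[sp.diff(eta, x), sp.diff(eta, y)],
               [sp.diff(xi, x), sp.diff(xi, y)]]).det()
print(sp.simplify(sp.Abs(J)))   # -> 1/(x**2 + y**2)
```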
This method can be extended to transformation groups depending on two parameters in such a way that
$$x' = x'(a,x), \tag{24a}$$
$$y' = y'(b,y), \tag{24b}$$
i.e., $x$ and $y$ are transformed independently. This transformation may correspond to a deformation, i.e., a variation in shape, of the image.
The transformation in Eqs. (24a and b) can be decomposed into two one-parameter transformations $T_a$, $S_b$ defined as
$$T_a x = x'(a,x), \qquad S_b x = x,$$
$$T_a y = y, \qquad S_b y = y'(b,y).$$


The infinitesimal operators $\mathscr{L}_a$, $\mathscr{L}_b$ are
$$\mathscr{L}_a = p(x)\frac{\partial}{\partial x}, \qquad \mathscr{L}_b = q(y)\frac{\partial}{\partial y},$$
where
$$p(x) = \frac{\partial x'(a,x)}{\partial a}\bigg|_{a=0}, \qquad q(y) = \frac{\partial y'(b,y)}{\partial b}\bigg|_{b=0},$$
and it is obvious that $[\mathscr{L}_a, \mathscr{L}_b] = 0$, because $p$ depends only on the variable $x$, and $q$ only on $y$. Further, $\mathscr{L}_a$ and $\mathscr{L}_b$ are linearly independent, since $\mathscr{L}_a$ has a component different from zero only in the direction of the basis vector $\partial/\partial x$, whereas the non-zero component of $\mathscr{L}_b$ is in the direction of the basis vector $\partial/\partial y$. Let us examine the case of the transformation
$$x' = \exp(a)x,$$
$$y' = \exp(b)y,$$
that is, a dilation with different parameter values for $x$ and $y$; the corresponding infinitesimal operators are $\mathscr{L}_a = x\partial/\partial x$ and $\mathscr{L}_b = y\partial/\partial y$. The system (15a and b) then becomes
$$x\frac{\partial\eta}{\partial x} = 1, \qquad y\frac{\partial\eta}{\partial y} = 0,$$
$$x\frac{\partial\xi}{\partial x} = 0, \qquad y\frac{\partial\xi}{\partial y} = 1.$$
The second and third equations show that $\eta = \eta(x)$ and $\xi = \xi(y)$, and the solutions are, from the first and fourth equations,
$$\eta(x) = \tfrac{1}{2}\ln x^2 \qquad \text{and} \qquad \xi(y) = \tfrac{1}{2}\ln y^2.$$
The modulus of the Jacobian determinant is
$$|J(\eta,\xi;x,y)| = |(xy)^{-1}|.$$

D. Invariant Functions and Kernels of Integral Transforms

In this section we intend to investigate the properties of kernels of strongly invariant integral transforms and to clarify the relation between these kernels and the functions that are invariants of the infinitesimal operators, i.e., such that $\mathscr{L}f(x,y) = 0$.
The following proposition presents a novel result and establishes a necessary and sufficient condition a kernel must satisfy for the corresponding response $g(u,v)$ to be invariant, in the strong sense, with respect to a transformation group $T_a$. Define
$$x' = T_a x = x'(x,y,a),$$
$$y' = T_a y = y'(x,y,a),$$
where the inverse of $T_a$ is $T_a^{-1} = T_{-a}$.

Proposition 3. An integral transform $O[f(x,y)] = g(u,v)$ is invariant in the strong sense with respect to a one-parameter transformation group if and only if its kernel $k(u,v;x,y)$ is such that
$$T_a k(u,v;x,y) = \exp(-iua)\frac{1}{|J(x',y';x,y)|}k(u,v;x,y), \tag{25}$$
where $J(x',y';x,y)$ is the Jacobian determinant of the change of variables $(x',y') \to (x,y)$.
Proof. First we prove that the condition is necessary.
$$O[T_a f(x,y)] = \iint f(x',y')k(u,v;x,y)\,dx\,dy$$
$$= \iint f(x',y')k(u,v;T_{-a}x', T_{-a}y')|J(x,y;x',y')|\,dx'\,dy'$$
$$= \exp(iua)\iint f(x',y')k(u,v;x',y')\,dx'\,dy',$$
by virtue of conditions (13a and b); here $J(x,y;x',y')$ is the Jacobian determinant of the change of variables $(x,y) \to (x',y')$. Since the above relations must hold for any $f(x,y)$, it follows that
$$\exp(iua)k(u,v;x',y') = \exp(iua)T_a k(u,v;x,y) = |J(x,y;x',y')|\,k(u,v;x,y),$$
hence
$$T_a k(u,v;x,y) = \exp(-iua)\frac{1}{|J(x',y';x,y)|}\,k(u,v;x,y).$$
Conversely,
$$O[T_a f(x,y)] = \iint f(x',y')k(u,v;x,y)\,dx\,dy = \exp(iau)\iint f(x',y')k(u,v;x',y')\,dx'\,dy',$$
and $O[T_a f(x,y)] = \exp(iau)O[f(x,y)]$, and this proves that the condition is also sufficient. ∎
Let $\eta$ be the canonical coordinate of $T_a$. It is obvious that for kernels of the form $k(u,v;\eta,\xi)$, Eq. (25) becomes
$$T_a k(u,v;\eta,\xi) = \exp(-iau)k(u,v;\eta,\xi),$$
because the transformation $T_a\eta = \eta'$ is a translation, and hence $|J(\eta',\xi';\eta,\xi)| = 1$. Thus, the kernel $k(u,v;\eta,\xi)$ is an eigenfunction of the operator $T_a$ with eigenvalue $\exp(-iau)$. The kernel $k(u,v;\eta,\xi)$ of an integral transform strongly invariant with respect to $T_a$, $S_b$ is then an eigenfunction of $T_a$ and $S_b$ with eigenvalues $\exp(-iau)$ and $\exp(-ibv)$ respectively. By applying the infinitesimal operators $\mathscr{L}_a$ and $\mathscr{L}_b$ to $k(u,v;\eta,\xi)$ it can easily be seen that the kernel is also an eigenfunction of $\mathscr{L}_a$ and $\mathscr{L}_b$, with eigenvalues $-iu$ and $-iv$ respectively. If we decompose the kernel into its real and imaginary parts we obtain (Ferraro and Caelli, 1988)
$$k(u,v;x,y) = \mathrm{Re}[k(u,v;x,y)] + i\,\mathrm{Im}[k(u,v;x,y)],$$
where
$$\mathrm{Re}[k(u,v;x,y)] = |J(\eta,\xi;x,y)|\cos(\eta u + \xi v)$$
and
$$\mathrm{Im}[k(u,v;x,y)] = -|J(\eta,\xi;x,y)|\sin(\eta u + \xi v).$$
Consider, for instance, Re [k(u,v ; x, y ) ] , the results for Im [k(u, v ; x,y ) ]
are analogous, and suppose, first, that v = 0; then Re [k(u,v ; x, y ) ] =
IJ(q,(; x,y)l cos (qu). It is obvious that y b cos (qu) = 0, but in general
y b Re [k(u, v ; x , y ) ] # 0; the condition g b Re [k(u, v ; x , y ) ] = 0 implies that

yblJ(q, 5; x , y ) J= 0, and this holds if and only if S, is area preserving, that


is, leaves dxdy invariate. In fact, we have
yb(dqdt) = y b [ I J ( q ? 5; x,y)Idx&] = O, (26)
and if dxdy is invariant then g b I J ( q , 5; x,y)l = 0. Conversely, suppose
y b l J ( q , t; x,y)l = 0; since Eq. (26) must hold, it follows that dxdy is invariant
under sb.
It is straightforward to verify that invariants of $\mathscr{L}_b$ with $v = 0$ are of the form
$$\frac{1}{|J(\eta,\xi;x,y)|}\,\mathrm{Re}[k(u,0;x,y)] = \cos(\eta u),$$
and invariants for $\mathscr{L}_a$ with $u = 0$ have an analogous form. If $u$ and $v$ are
different from zero, the functions
$$\frac{1}{|J(\eta,\xi;x,y)|}\,\mathrm{Re}[k(u,v;x,y)] = \cos(\eta u + \xi v) = \mathrm{Re}[k(u,v;\eta,\xi)]$$
are not invariants of $\mathscr{L}_a$ or $\mathscr{L}_b$. These functions, however, are invariants of a linear combination of $\mathscr{L}_a$, $\mathscr{L}_b$:
$$\mathscr{L} = -v\mathscr{L}_a + u\mathscr{L}_b.$$
In fact,
$$\mathscr{L}\cos(\eta u + \xi v) = v\sin(\eta u + \xi v)\mathscr{L}_a(\eta u + \xi v) - u\sin(\eta u + \xi v)\mathscr{L}_b(\eta u + \xi v)$$
$$= \sin(\eta u + \xi v)(vu - uv) = 0,$$
and, analogously, $\mathscr{L}\sin(\eta u + \xi v) = 0$. It is interesting to note that the invariant functions $\cos(\eta u + \xi v)$ and $\sin(\eta u + \xi v)$ are the even and odd basis functions of the Fourier decomposition of $\tilde{f}(\eta,\xi)$.
As an example, consider the kernel of the LPCH transform: we have
$$\mathrm{Re}[k(u,v;x,y)] = (x^2+y^2)^{-1}\cos\{\ln[(x^2+y^2)^{1/2}]u + \tan^{-1}(y/x)v\}.$$
The function $\mathrm{Re}[k(u,v;x,y)]$ is an invariant of $\mathscr{L}_R$ but not of $\mathscr{L}_D$, since dilations are not area preserving; however,
$$(x^2+y^2)\,\mathrm{Re}[k(u,v;x,y)] = \cos\{\ln[(x^2+y^2)^{1/2}]u + \tan^{-1}(y/x)v\}$$
is an invariant of $\mathscr{L}_R$ when $v = 0$ and of $\mathscr{L}_D$ when $u = 0$. When $u$, $v$ are different from zero, we have
$$\mathscr{L}_{u,v}\cos\{\ln[(x^2+y^2)^{1/2}]u + \tan^{-1}(y/x)v\} = 0,$$
where
$$\mathscr{L}_{u,v} = -v\mathscr{L}_D + u\mathscr{L}_R.$$
Examples of these invariants for different values of $u$ and $v$ can be found in Ferraro and Caelli (1988).
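A quick symbolic check of this last identity (a sketch assuming SymPy, with $\mathscr{L}_{u,v}$ built exactly as above; names are mine):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v', real=True, positive=True)

L_R = lambda f: -y * sp.diff(f, x) + x * sp.diff(f, y)
L_D = lambda f:  x * sp.diff(f, x) + y * sp.diff(f, y)

phase = sp.log(sp.sqrt(x**2 + y**2)) * u + sp.atan2(y, x) * v
L_uv = lambda f: -v * L_D(f) + u * L_R(f)

print(sp.simplify(L_uv(sp.cos(phase))))   # -> 0
```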

IV. TRANSFORMATIONS OF INTEGRAL TRANSFORMS

A. Weakly Invariant Representations

We mentioned earlier some representations invariant in the weak sense that can be obtained in the domain $(x,y)$ from the original representation $f(x,y)$. Representations with similar characteristics can easily be found in the domain $(u,v)$. For instance, given the response function
$$g(u,v) = A(u,v)\exp[i\phi(u,v)],$$
invariant in the strong sense with respect to some transformation group $T_a$, the amplitude spectrum $A(u,v)$ is a representation invariant in the weak sense, in that the information about the state of the pattern relative to $T_a$ is not encoded (compare with the conditions in Eqs. (13a and b)).
More interestingly, there exist integral transforms that are invariant under a transformation group $T_a$, except, at most, for a constant factor, that encode the corresponding transformational state, and that are weakly invariant with respect to another group $S_b$. An example of such transforms is the Fourier-Mellin transform (Casasent and Psaltis, 1976). The one-dimensional Mellin transform of a function $f$ is defined by (Casasent and Psaltis, 1976)
$$M[f(x)] = M(u) = \int_0^\infty f(x)x^{-iu-1}\,dx. \tag{27}$$
A two-dimensional Mellin transform can be similarly defined:
$$M[f(x,y)] = M(u,v) = \int_0^\infty\int_0^\infty f(x,y)x^{-iu-1}y^{-iv-1}\,dx\,dy.$$
If one replaces $x$ with $\exp\xi$, Eq. (27) becomes
$$M(u) = \int_{-\infty}^{+\infty} f(\exp\xi)\exp(-iu\xi)\,d\xi.$$
If the original function $f(x)$ is dilated, that is, transformed to $f(\exp(a)x)$, the corresponding Mellin transform is
$$M[f(\exp(a)x)] = M_a(u) = \int_{-\infty}^{+\infty} f(\exp\xi')\exp[-iu(\xi'-a)]\,d\xi',$$
where $\xi' = \xi + a$, and
$$M_a(u) = M(u)\exp(iua); \tag{29}$$
the Mellin transform satisfies the conditions of strong invariance under dilations. Note that in the literature, Eq. (29) is replaced by
$$M_a(u) = a^{iu}M(u) \tag{30}$$
(Casasent and Psaltis, 1976; Caelli and Liu, 1988), but, formally, Eq. (30) is incorrect because it assumes for dilations the form $x \to ax$, whereas, to be a one-parameter transformation group, dilations must be of the form $x \to \exp(a)x$ (see Appendix A).
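Equation (29) is easy to check numerically: sample $f$ on an exponential grid $x = e^{\xi}$ and take an FFT over $\xi$; the dilation then shows up only in the phase. A minimal sketch (the test function and all names are mine):

```python
import numpy as np

xi = np.linspace(-8, 8, 4096)          # log-domain samples, x = exp(xi)
f = lambda x: np.exp(-(np.log(x))**2)  # a test function of x > 0

def mellin(fvals):
    """Discrete stand-in for M(u) = integral of f(exp(xi)) exp(-i u xi)."""
    return np.fft.fftshift(np.fft.fft(fvals)) * (xi[1] - xi[0])

a = 0.7
M0 = mellin(f(np.exp(xi)))             # M(u)
Ma = mellin(f(np.exp(a) * np.exp(xi))) # Mellin of the dilated f(exp(a) x)

# Eq. (29): amplitudes agree; the dilation sits in the phase
print(np.allclose(np.abs(M0), np.abs(Ma), atol=1e-6))
```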
A one-dimensional Mellin-type cross-correlation of two functions $f_1$ and $f_2$ can be defined by (Casasent and Psaltis, 1976)
$$C_M(x) = f_1(x) \star f_2(x) = \int_0^\infty f_1(y)f_2(xy)(1/y)\,dy, \tag{31}$$
where $\star$ denotes the Mellin-type correlation. Defining $x = \exp\eta$ and $y = \exp\xi$, the conventional cross-correlation results:
$$C(\eta) = \int_{-\infty}^{+\infty} \tilde{f}_1(\xi)\tilde{f}_2(\eta + \xi)\,d\xi, \tag{32}$$
with $\tilde{f}_1(\xi) = f_1(\exp\xi)$, $\tilde{f}_2(\eta + \xi) = f_2(\exp(\eta + \xi))$. The cross-correlation $C(\eta)$ yields a peak at $\eta$ if and only if $\tilde{f}_1(\xi) = c\tilde{f}_2(\xi + \eta)$ for some constant $c$. In the case of autocorrelation the maximum of the Mellin-type correlation occurs at $\eta = 0$, i.e., $x = 1$.
From Eq. (32) it is apparent that the properties of the usual correlation hold, mutatis mutandis, for the Mellin-type cross-correlation; in particular, the Mellin transform of $f_1(x) \star f_2(x)$ is the product $M(f_1(x))M^*(f_2(x))$.
Casasent and Psaltis (1976) combined the properties of the Fourier and Mellin transforms in a new integral transform called the Fourier-Mellin transform, defined (in one dimension) as
$$MF(f(x)) = MF(s) = \int_0^\infty |F(u)|u^{-is-1}\,du, \tag{33}$$
where $|F(u)|$ is the amplitude spectrum of the Fourier transform $F(u)$ of $f(x)$.
The properties of the two-dimensional Fourier-Mellin transform can be better understood by using log-polar coordinates (Caelli and Liu, 1988). Consider the amplitude spectrum $|F(u,v)|$ of the Fourier transform of a pattern $f(x,y)$, and define $r = \ln(u^2+v^2)^{1/2}$ and $\theta = \tan^{-1}(v/u)$. The resulting function, denoted $|F(r,\theta)|$, is the log-polar representation of the amplitude spectrum. A dilation by $\exp a$ does not affect the $\theta$ coordinate of $|F(r,\theta)|$, and by using the Mellin transform with respect to $r$ we obtain (Caelli and Liu, 1988)
$$MF(f(x,y)) = MF(s) = \iint |F(r,\theta)|\exp(-isr)\,dr\,d\theta. \tag{34}$$
As mentioned earlier, it is well known (Rosenfeld and Kak, 1982) that $|F(u,v)|$ is invariant under translation of the original image, so
$$MF(f(x+x_0, y+y_0)) = MF(f(x,y)). \tag{35}$$
It is also known (Rosenfeld and Kak, 1982) that a dilation $f(x,y) \to f(\exp(a)x, \exp(a)y)$, with parameter value $a$, results in a dilation, with parameter value $-a$, of the Fourier transform, multiplied by a factor $\exp(-2a)$: $F(u,v) \to F_a(u,v) = (a')^{-2}F(u/a', v/a')$, where $a' = \exp(a)$. It follows that
$$MF(f(x',y')) = \exp(-2a)MF(f(x,y))\exp(-isa); \tag{36}$$
here, $x' = \exp(a)x$, $y' = \exp(a)y$. Equations (35) and (36) show that the amplitude spectrum $|MF(s)|$ of the Fourier-Mellin transform is invariant under translations of $f(x,y)$ and dilations of $F(u,v)$ but, contrary to claims in the literature (Casasent and Psaltis, 1976), is not invariant with respect to dilations of the original image. However, the value of the parameter $a$ is encoded in the phase spectrum, so that the amplitude spectrum $|MF(s)|$ can be suitably rescaled by the known factor $\exp(2a)$.
In conclusion, the Fourier-Mellin transform $MF(f(x,y))$ provides a representation invariant in the weak sense under translations; as concerns dilations, the representation is invariant except for a multiplicative factor $\exp(-2a)$, and the transformational state is encoded. However, this representation is not unique, because it is based on an integral transform of $|F(u,v)|$, and the amplitude spectrum of the Fourier transform does not define uniquely the image $f(x,y)$.
The detection of the target is determined by the Mellin-type cross-correlation of the amplitude spectra of the Fourier transforms of the original images,
$$C(\alpha) = \int_0^\infty |F_1(r)||F_2(r+\alpha)|\,dr,$$
where $\theta$ is dropped for notational clarity. The cross-correlation has a maximum at $\alpha$ if and only if
$$|F_1(r)| = c|F_2(r+\alpha)|.$$
The Fourier-Mellin transform method is shift and size invariant, but it is well known that using the amplitude spectrum alone to represent a pattern will result in erroneous recognition, since the power spectrum does not define uniquely the original pattern. That is, this method achieves invariance by losing uniqueness of the matching process.
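A minimal numerical sketch of this pipeline (assuming NumPy and SciPy; all names and parameters are mine): translations drop out in $|F|$, and a dilation becomes a shift along the log-radius axis, recoverable by correlation:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def fourier_mellin_signature(img, n_r=128, n_theta=180):
    """|F| of the image resampled to log-polar coordinates about the
    spectrum's center; translating img leaves this unchanged, and a
    dilation of img shifts it along the log-radius axis."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    c = np.array(F.shape) / 2.0
    r = np.exp(np.linspace(0, np.log(min(F.shape) / 2 - 1), n_r))
    th = np.linspace(0, np.pi, n_theta, endpoint=False)  # |F| is symmetric
    R, TH = np.meshgrid(r, th, indexing='ij')
    rows = c[0] + R * np.sin(TH)
    cols = c[1] + R * np.cos(TH)
    return map_coordinates(F, [rows, cols], order=1)

def log_scale_shift(img1, img2):
    """Peak of the correlation along log-r estimates the dilation
    parameter relating the two images (as a grid offset)."""
    s1 = fourier_mellin_signature(img1).sum(axis=1)  # integrate over theta
    s2 = fourier_mellin_signature(img2).sum(axis=1)
    corr = np.fft.ifft(np.conj(np.fft.fft(s1)) * np.fft.fft(s2)).real
    return np.argmax(corr)
```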
In the next section we will show which conditions one-parameter transfor-
mation groups must satisfy to ensure the existence of integral transforms that
enjoy properties similar to those of the Fourier-Mellin transform with
respect to dilations and translations.

B. “Covariance” of Integral Transforms

The uniqueness of transforms of the type $\mathscr{F}(\tilde{f}(\eta,\xi))$ implies that there is no loss of information in the change of representation $f(x,y) \to g(u,v)$, but that the information contained in the image has been given a different format in which invariant characteristics and transformational properties have been encoded in the amplitude and phase spectra respectively. The original pattern $f(x,y)$ can be completely reconstructed from the representation $g(u,v)$, which
then can be treated as an alternate form of the image. Therefore, it is plausible to ask how $g(u,v)$ changes under transformations of $f(x,y)$ that do not satisfy the conditions of invariance established in Propositions 1 and 2.
Let $k(\eta,\xi;u,v)$ be the kernel of $g(u,v)$; here, $\eta$ and $\xi$ are the canonical coordinates of two commuting one-parameter transformation groups $T_a$ and $S_b$. Consider a transformation group $N_c$ and define
$$\eta'(\eta(x,y),\xi(x,y)) = N_c\eta(x,y) = \eta(N_c x, N_c y),$$
$$\xi'(\eta(x,y),\xi(x,y)) = N_c\xi(x,y) = \xi(N_c x, N_c y).$$
We say that $g(u,v)$ is covariant with respect to $N_c$ if
$$k(N_c^{-1}\eta', N_c^{-1}\xi'; u, v) = k(\eta', \xi'; U_c u, U_c v), \tag{37}$$
where $U_c$ is a one-parameter transformation group (Giulianini et al., 1992). Let $g_c(u,v) = O[N_c\tilde{f}(\eta,\xi)]$ be the integral transform of the pattern $N_c\tilde{f}(\eta,\xi) = \tilde{f}(\eta',\xi')$. If the representation $g_c(u,v)$ is covariant with respect to $N_c$, we have
$$g_c(u,v) = \iint \tilde{f}(\eta',\xi')k(\eta,\xi;u,v)\,d\eta\,d\xi$$
$$= \iint \tilde{f}(\eta',\xi')k(\eta',\xi';u,v)|J(\eta,\xi;\eta',\xi')|\,d\eta'\,d\xi'$$
$$= |J(\eta,\xi;\eta',\xi')|\,g(u',v'), \tag{38}$$
where $J(\eta,\xi;\eta',\xi')$ is the Jacobian determinant of the change of variables $(\eta,\xi) \to (\eta',\xi')$ and $u' = U_c u$, $v' = U_c v$.
Despite its rather complicated formulation, the covariance property has a very simple meaning: among the transformations under which $g(u,v)$ is not invariant in the strong sense there exist some such that their action on $f(x,y)$ results in a simple transformation of $u$ and $v$. The following propositions have been demonstrated by Giulianini et al. (1992), and we shall report here just a sketch of the proof.

Proposition 4. Let $g(u,v)$ be the response of an integral transform with kernel $\exp[-i(u\eta(x,y) + v\xi(x,y))]$: $g(u,v)$ is covariant with respect to a transformation group $N_c$ if and only if the action of $N_c$ is a linear transformation of the coordinates $\eta$, $\xi$. Furthermore, $U_c = (N_c^{-1})^T$, i.e., $U_c$ is the transpose of the inverse of $N_c$.
Proof. First we prove that the condition is necessary: if $\exp[-i(u\eta(x,y) + v\xi(x,y))]$ is covariant with respect to $N_c$ we can write
$$u\eta(\eta',\xi',c) + v\xi(\eta',\xi',c) = \eta'u' + \xi'v'.$$
Consider the Taylor expansions of $\eta(\eta',\xi')$ and $\xi(\eta',\xi')$ around the origin $(0,0)$ (we omit the dependence on $c$ for the sake of simplicity):
$$u\left[\eta(0,0) + \frac{\partial\eta}{\partial\eta'}\bigg|_{0,0}\eta' + \frac{\partial\eta}{\partial\xi'}\bigg|_{0,0}\xi' + \frac{1}{2}\frac{\partial^2\eta}{\partial\eta'^2}\bigg|_{0,0}\eta'^2 + \frac{\partial^2\eta}{\partial\eta'\,\partial\xi'}\bigg|_{0,0}\eta'\xi' + \frac{1}{2}\frac{\partial^2\eta}{\partial\xi'^2}\bigg|_{0,0}\xi'^2 + \cdots\right]$$
$$+\, v\left[\xi(0,0) + \frac{\partial\xi}{\partial\eta'}\bigg|_{0,0}\eta' + \frac{\partial\xi}{\partial\xi'}\bigg|_{0,0}\xi' + \frac{1}{2}\frac{\partial^2\xi}{\partial\eta'^2}\bigg|_{0,0}\eta'^2 + \frac{\partial^2\xi}{\partial\eta'\,\partial\xi'}\bigg|_{0,0}\eta'\xi' + \frac{1}{2}\frac{\partial^2\xi}{\partial\xi'^2}\bigg|_{0,0}\xi'^2 + \cdots\right]$$
$$= u'\eta' + v'\xi', \tag{39}$$

where the derivatives are computed in $(0,0)$. Rearranging the left-hand side of Eq. (39), it is easy to show that the terms containing powers of $\eta'$ and $\xi'$ of order greater than one must vanish, and, furthermore, that $\eta(0,0) = \xi(0,0) = 0$. Therefore,
$$\eta = \alpha(c)\eta' + \beta(c)\xi', \qquad \xi = \gamma(c)\eta' + \delta(c)\xi',$$
and we can write
$$\begin{pmatrix} \eta \\ \xi \end{pmatrix} = \begin{pmatrix} \alpha(c) & \beta(c) \\ \gamma(c) & \delta(c) \end{pmatrix}\begin{pmatrix} \eta' \\ \xi' \end{pmatrix},$$
where
$$\alpha(c) = \frac{\partial\eta}{\partial\eta'}\bigg|_{0,0}, \qquad \beta(c) = \frac{\partial\eta}{\partial\xi'}\bigg|_{0,0}, \qquad \gamma(c) = \frac{\partial\xi}{\partial\eta'}\bigg|_{0,0}, \qquad \delta(c) = \frac{\partial\xi}{\partial\xi'}\bigg|_{0,0}.$$
Then $N_c^{-1}$ is represented by the matrix
$$N_c^{-1} = \begin{pmatrix} \alpha(c) & \beta(c) \\ \gamma(c) & \delta(c) \end{pmatrix},$$
which shows that its action is a linear transformation. Furthermore,
$$\eta' = \alpha'(c)\eta + \beta'(c)\xi,$$
$$\xi' = \gamma'(c)\eta + \delta'(c)\xi,$$
and
$$N_c = \begin{pmatrix} \alpha'(c) & \beta'(c) \\ \gamma'(c) & \delta'(c) \end{pmatrix}.$$
To prove that the condition is sufficient, it is enough to observe that
$$\exp\{-i[u(\alpha(c)\eta' + \beta(c)\xi') + v(\gamma(c)\eta' + \delta(c)\xi')]\}$$
$$= \exp\{-i[\eta'(\alpha(c)u + \gamma(c)v) + \xi'(\beta(c)u + \delta(c)v)]\}$$
$$= \exp[-i(u'\eta' + v'\xi')].$$
$g(u,v)$ is then covariant, and the transformation from $(u,v)$ to $(u',v')$ is given by the action of the linear operator $U_c$:
$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} \alpha(c) & \gamma(c) \\ \beta(c) & \delta(c) \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix} = (N_c^{-1})^T\begin{pmatrix} u \\ v \end{pmatrix}.$$
The integral transform $g_c(u,v)$ is
$$g_c(u,v) = |J(\eta',\xi';\eta,\xi)|\,g(u',v') = |J(\eta',\xi';\eta,\xi)|\,g[U_c u, U_c v] = |J(\eta',\xi';\eta,\xi)|\,g[(N_c^{-1})^T u, (N_c^{-1})^T v].$$
It may be worth noting that the transformations must be linear in the canonical coordinates $\eta$ and $\xi$ but, of course, need not be, and in general will not be, linear in $x$ and $y$.
Examples
1. It is known that the Fourier transform of an image rotated by an angle $\theta$ about the origin is rotated by the same angle in the $(u,v)$ domain (Gonzales and Wintz, 1987). This property of the Fourier transform is a straightforward consequence of Proposition 4:
$$F_R(u,v) = |J(x',y';x,y)|\,g(u',v') = |J(x',y';x,y)|\,F(U_c u, U_c v) = |J(x',y';x,y)|\,F[(N_c^{-1})^T u, (N_c^{-1})^T v].$$
For the group of rotations in the plane $N_c^{-1} = N_c^T$, therefore $U_c = (N_c^{-1})^T = (N_c^T)^T = N_c$. Further, the determinant of the Jacobian is one. Then
$$F_R(u,v) = F(N_c u, N_c v).$$
2. In the case of dilations, $N_c$ is a symmetric matrix, i.e., $(N_c^{-1})^T = N_c^{-1}$. The determinant of the Jacobian is $1/c'^2$, where $c' = \exp c$. Then
$$F_D(u,v) = (1/c'^2)F(u/c', v/c').$$
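The first covariance property is easy to check on sampled images (a sketch assuming NumPy and SciPy; the test pattern and angle are mine, and interpolation keeps the match only approximate):

```python
import numpy as np
from scipy.ndimage import rotate

img = np.zeros((128, 128))
img[40:80, 50:70] = 1.0                       # a simple test pattern

# rotating the image rotates its Fourier amplitude by the same angle
A = np.abs(np.fft.fftshift(np.fft.fft2(img)))
A_rot = np.abs(np.fft.fftshift(np.fft.fft2(
    rotate(img, 30, reshape=False, order=1))))
A_pred = rotate(A, 30, reshape=False, order=1)  # rotate A directly
print(np.corrcoef(A_pred.ravel(), A_rot.ravel())[0, 1])  # close to 1
```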
Given two commuting Lie transformation groups $T_a$ and $S_b$ and the corresponding invariant representation $g(u,v)$ with kernel $\exp[-i(\eta u + \xi v)]$, it is possible to establish whether $g(u,v)$ is covariant with respect to a Lie transformation group $N_c$ simply by considering the properties of the commutators $[\mathscr{L}_a, \mathscr{L}_c]$ and $[\mathscr{L}_b, \mathscr{L}_c]$, that is, without computing the canonical coordinates $\eta$, $\xi$.
Proposition 5. The integral transform $g(u,v)$ is covariant with respect to $N_c$ if and only if the following conditions hold:
$$[\mathscr{L}_a, \mathscr{L}_c] = \alpha\mathscr{L}_a + \beta\mathscr{L}_b, \tag{40a}$$
$$[\mathscr{L}_b, \mathscr{L}_c] = \gamma\mathscr{L}_a + \delta\mathscr{L}_b, \tag{40b}$$
where $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants.

Proof. We note preliminarily that, since the transformation from $(x,y)$ to $(\eta,\xi)$ is one-to-one (Schutz, 1980), the previous conditions must hold both in the $(x,y)$ and in the $(\eta,\xi)$ coordinate systems.
the (x, y ) and (q, g) coordinate system.
First we prove that the condition is necessary. If the integral transform is covariant with respect to $N_c$, it follows from Proposition 4 that $N_c$ is linear in the canonical coordinates $(\eta,\xi)$ of $T_a$ and $S_b$. Then the infinitesimal operator $\mathscr{L}_c$ is
$$\mathscr{L}_c = (\alpha\eta + \gamma\xi)\frac{\partial}{\partial\eta} + (\beta\eta + \delta\xi)\frac{\partial}{\partial\xi},$$
and therefore
$$[\mathscr{L}_a, \mathscr{L}_c] = \alpha\mathscr{L}_a + \beta\mathscr{L}_b, \qquad [\mathscr{L}_b, \mathscr{L}_c] = \gamma\mathscr{L}_a + \delta\mathscr{L}_b.$$
Conditions (40a) and (40b) are also sufficient. The infinitesimal operators $\mathscr{L}_a = \partial/\partial\eta$ and $\mathscr{L}_b = \partial/\partial\xi$ form a basis in $\mathbb{R}^2$, and in general $\mathscr{L}_c$ can be written as
$$\mathscr{L}_c = a(\eta,\xi)\frac{\partial}{\partial\eta} + b(\eta,\xi)\frac{\partial}{\partial\xi},$$
where $a(\eta,\xi)$ and $b(\eta,\xi)$ are smooth functions of $\eta$ and $\xi$. If one calculates the Lie brackets $[\mathscr{L}_a, \mathscr{L}_c]$, $[\mathscr{L}_b, \mathscr{L}_c]$ and takes into account conditions (40a
and b), it follows that
$$\frac{\partial a}{\partial\eta} = \alpha, \qquad \frac{\partial b}{\partial\eta} = \beta, \qquad \frac{\partial a}{\partial\xi} = \gamma, \qquad \frac{\partial b}{\partial\xi} = \delta;$$
hence,
$$a(\eta,\xi) = \alpha\eta + \gamma\xi$$
and
$$b(\eta,\xi) = \beta\eta + \delta\xi,$$
up to additive constants corresponding to translations along the canonical axes; therefore $N_c$ is linear in $(\eta,\xi)$, and the integral transform is covariant with respect to $N_c$.
Suppose that there are two transformations $T_a$, $S_b$, with $[\mathscr{L}_a, \mathscr{L}_b] = 0$, and a transformation $N_c$ such that conditions (40a and b) of Proposition 5 are satisfied. Then there exists an integral transform $g(u,v)$ invariant under $T_a$ and $S_b$ and covariant with respect to $N_c$. Let $A(u,v) = |g(u,v)|$ be the amplitude of $g(u,v)$, and suppose that $A(u,v)$ satisfies conditions (1a and b); then it is possible to define an integral transform $G(s)$ of $A(u,v)$ invariant in the strong sense under the action of $(N_c^{-1})^T$ on $A(u,v)$. To emphasize the dependence on the original pattern we can write $G(s)$ as
$$G(s) = O[A(u,v)],$$
where the integral operator denoted by $O$ must be thought of as acting on the amplitude spectrum of $g(u,v) = \mathscr{F}(\tilde{f}(\eta,\xi))$. The explicit equation for $G(s)$ is
$$G(s) = \int A(\zeta(u,v))\exp(-is\zeta)\,d\zeta, \tag{41}$$
where $\zeta$ is the canonical coordinate of $U_c = (N_c^{-1})^T$.

Let us examine the properties of $G(s)$ under transformations of the original pattern $f(x,y)$. It is obvious that $G(s)$ is invariant in the weak sense under the action of $T_a$, $S_b$ on $f(x,y)$ because, by hypothesis, $A(u,v)$ is invariant. In general, however, $G(s)$ will not be invariant with respect to $N_c$, because we have, from Eqs. (38) and (41),
$$|O[A(N_c\tilde{f}(\eta,\xi))]| = |J(\eta',\xi';\eta,\xi)|\,|G(s)|,$$
and invariance requires $|J(\eta',\xi';\eta,\xi)| = 1$, that is, $N_c$ must be area preserving in the coordinate system $(\eta,\xi)$. Note, however, that since $N_c$ is linear, the
above Jacobian determinant is constant, that is, invariance is ensured except for a constant factor. Finally, the transformational state with respect to $N_c$ is encoded by the phase of $G(s)$. The representation $G(s)$ is not unique in that it depends on the amplitude of $g(u,v)$ and, as remarked earlier, for a given integral transform there are infinitely many patterns with the same amplitude spectrum. Equation (41) can be generalized to a pair $N_c$, $M_d$ of commuting one-parameter transformation groups provided that both $\mathscr{L}_c$ and $\mathscr{L}_d$ have Lie brackets with $\mathscr{L}_a$ and $\mathscr{L}_b$ that satisfy conditions (40a and b). The formula reads
$$G(s,t) = \iint A(\zeta,\lambda)\exp[-i(\zeta s + \lambda t)]\,d\zeta\,d\lambda. \tag{42}$$

It is trivial to prove that for translations, rotations, and dilations the following relations hold:
$$[\mathscr{L}_x, \mathscr{L}_R] = \mathscr{L}_y, \qquad [\mathscr{L}_y, \mathscr{L}_R] = -\mathscr{L}_x,$$
$$[\mathscr{L}_x, \mathscr{L}_D] = \mathscr{L}_x, \qquad [\mathscr{L}_y, \mathscr{L}_D] = \mathscr{L}_y,$$
where $\mathscr{L}_x$, $\mathscr{L}_y$, $\mathscr{L}_R$ and $\mathscr{L}_D$ are the infinitesimal operators of translations, rotations, and dilations respectively. We have already noted that the Fourier
rotations, and dilations respectively. We have already noted that the Fourier
transform is covariant with respect to rotations and dilations (see the
examples in this section). Keeping in mind the transformational properties of
$|F(u,v)|$, it is clear that we can define an LPCH transform
$$C(s,t) = \iint |F(r,\theta)|\exp[-i(sr + t\theta)]\,dr\,d\theta, \tag{43}$$
where $\theta = \tan^{-1}(v/u)$ and $r = \ln(u^2+v^2)^{1/2}$. It is interesting to note that $C(s,t)$ does not depend on the position of the centers of rotation and expansion of $f(x,y)$, since a straightforward but tedious calculation proves that, regardless of their positions in the $(x,y)$ domain, the corresponding centers of rotation and dilation of $|F(u,v)|$ are in the origin of the plane $(u,v)$. It is easy to check that the integral transform defined by Eq. (43) is just the Fourier-Mellin transform written in log-polar coordinates.
In conclusion, replacing the condition of commutativity with conditions (40a and b) results in the existence of a new representation that attains invariance, except for a constant factor, while losing the uniqueness of the representation.

V. NOTES ON INVARIANT REPRESENTATIONS OF 3D OBJECTS

The term three-dimensional object recognition encompasses different and
often contrasting meanings. Some approaches deal only with single pre-segmented objects, whereas other schemes aim to interpret multiobject scenes. Some recognition systems require multiple viewpoints, and in others data are supposed to be available from both sensors and intermediate
processors. (A comprehensive bibliography and a precise definition of the problem can be found in Besl and Jain (1989).) We shall be concerned here solely with the problem of invariant coding in three dimensions, i.e., with the problem of finding surface representations invariant under rigid motion in $\mathbb{R}^3$. The literature on surface representations in computational vision is vast (compare with Besl and Jain, 1985, 1986); the scope of our investigation is to show how differential geometry provides necessary and sufficient conditions for the solution of three-dimensional invariant coding and to analyze some examples of differential-geometric surface descriptors.
There are at least three characteristics that make object recognition more
difficult than pattern recognition (Caelli et al., 1992). First, sensory data are
usually in the form of light intensity and must be converted into data about
the shape of the surface. This entails solving the problem of “shape from X,”
that is, inferring the shape of a surface from the information contained in the
surface’s image. “Shape from X” is, in itself, a major problem in compu-
tational vision, since it is ill-posed in the sense of Hadamard (Hadamard,
1923; Poggio and Torre, 1984). Over the years, a variety of methods has been
proposed to infer shape from images: shape from stereo (Grimson, 1980),
from motion (Ullman, 1979), from texture (Witkin, 1981; Blake and
Marinos, 1990), from shading (Horn and Brooks, 1986; Bischof and Ferraro,
1989), from focus (Pentland, 1987), and from photometric stereo (Woodham,
1980). The difficulty in solving “shape from X” is certainly related to a
number of factors (e.g., scene illumination and reflectance properties of the
surface) other than the surface’s shape, which take part in the process of
formation of depth maps.
An alternative technique for gaining information about surface geometry
uses range finders to produce depth maps, or range images, of the surface. In
range images, the depth value encodes information at each pixel about
surface geometry in terms of the distance between the sensor and the surface
(Besl and Jain, 1985). The interpretation of depth maps is more immediate
than that of intensity images in that factors such as scene illumination and
reflectance properties of the surface do not concur to form the range image;
the information about surface geometry is directly encoded, but, of course,
the process of formation of range images is not related to vision. Whatever their specific format, sensory data refer to visible parts of surfaces, or visible surfaces for short, and no surface of a physical object is completely visible from an observer in a fixed position (apart from objects made of transparent material!).
A second characteristic of object recognition is that, once data about shape have been recovered, these view-dependent data are matched against a view-independent model, and, as will be seen, this requires the comparison of two different types of representation.
Finally, rigid motions of objects involve translations in three directions
and rotations about three axes, a total of six one-parameter transformation
groups acting in $\mathbb{R}^3$, whereas in pattern recognition the group action is restricted to $\mathbb{R}^2$.
We assume, for our purposes, that the depth map is given by solving
“shape from X” or by means of range finders; further, it is supposed that the
observer is in the origin of the coordinate system $(x, y, z)$, and that the line of
view coincides with the z-axis, so that the depth map is of the form z = h(x,y),
where ( x , y ) is the image plane and z is the distance between the observer and
the corresponding point on the surface, assuming an orthogonal projection.
Note that, in general, h is not given in analytic form, even though it can be
approximated, at least locally, with various types of interpolating functions
(Faux and Pratt, 1979; Tiller, 1983; Sedeberg and Anderson, 1985).
View-independent model surfaces are described by parametric or implicit representations well known from differential geometry. In the parametric representation, each point of the surface is defined by a map $f$ from a parameter plane $(u,v)$ to $\mathbb{R}^3$:
$$f: (u,v) \to (x(u,v), y(u,v), z(u,v)), \tag{44}$$
whereas in the implicit representation, points on the surface must satisfy the equation
$$F(x,y,z) = 0. \tag{45}$$
View-dependent surfaces are Monge patches, graphs of the depth map $h$,
$$h: (x,y) \to h(x,y) = z \tag{46}$$
(precise definitions of surfaces and related mathematical entities can be found in Appendix B).
We are not interested here in abstract surfaces but rather in surfaces of
physical objects that are closed, bounded and continuous, and we shall also
assume that surfaces are smooth and regular, that is, that there are no cusps
or sharp edges. The last two assumptions in general are not satisfied by most
physical surfaces, but, if fine or microscopic details are disregarded, they are
at least piecewise smooth, and usually non-regular (singular) points form a
set of zero measure in R2; therefore, our hypotheses are not too restrictive.
Note that the condition of continuity holds for view-independent represen-
tation of surfaces, whereas in Monge patches, occlusions of parts of the
surface result in discontinuities of the depth map h.
Any rigid motion in $R^3$ can be decomposed into six one-parameter transfor-
mation groups, three translations and three rotations. Translations are
defined by the formula
\[ (x', y', z')^T = T(x, y, z)^T = (x + a, y + b, z + c)^T, \]
and rotations are generated by the matrix operators
\[ A_1 = A_1(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{pmatrix}, \quad
A_2 = A_2(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}, \quad
A_3 = A_3(\psi) = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

where $\phi$, $\theta$, $\psi$ are the Euler angles (Korn and Korn, 1968). We shall denote
a generic translation by T(a, b, c), or simply by T, and likewise $R(\phi, \theta, \psi)$, or
R, will denote a rotation obtained by any application of the matrices $A_i$.
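To make the decomposition concrete, the following minimal sketch (ours, not part of the original treatment; it assumes the NumPy library, and the function names are illustrative) builds the matrices $A_i$ and applies a rigid motion TR:

```python
# Illustrative sketch of the rigid motion TR: a rotation R(phi, theta, psi),
# composed from the matrices A1, A2, A3 above, followed by a translation.
import numpy as np

def A1(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def A2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def A3(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def TR(x, phi, theta, psi, a, b, c):
    """Rotation followed by translation: x' = R(phi, theta, psi) x + (a, b, c)."""
    R = A1(phi) @ A2(theta) @ A3(psi)
    return R @ x + np.array([a, b, c])

print(TR(np.array([1.0, 0.0, 0.0]), np.pi / 2, 0.0, 0.0, 1.0, 2.0, 3.0))
# [2. 2. 3.]: the x-axis is fixed by A1, so only the translation acts here
```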
The infinitesimal operators are
\[ t_1 = \partial/\partial x, \quad t_2 = \partial/\partial y, \quad t_3 = \partial/\partial z, \]
for translations, and
\[ l_1 = -z\,\partial/\partial y + y\,\partial/\partial z, \quad l_2 = z\,\partial/\partial x - x\,\partial/\partial z, \quad l_3 = -y\,\partial/\partial x + x\,\partial/\partial y, \]
for rotations about the x, y and z axes, respectively. (In the following we shall
use symbols $t_i$ and $l_i$ for infinitesimal operators of translations and rotations
and shall keep the symbol $\mathcal{L}$ to indicate a generic infinitesimal operator.) The
Lie brackets of $t_i$, $l_i$ are
\[ [t_i, t_j] = 0, \tag{47a} \]
\[ [l_i, l_j] = \varepsilon_{ijk} l_k, \tag{47b} \]
\[ [l_i, t_j] = \varepsilon_{ijk} t_k, \tag{47c} \]
where
\[ \varepsilon_{ijk} = \begin{cases} 1 & \text{if } ijk \text{ is an even permutation of } 123, \\ -1 & \text{if } ijk \text{ is an odd permutation of } 123, \\ 0 & \text{otherwise} \end{cases} \]
(Crampin and Pirani, 1986). Since the operators $l_i$ and $t_i$ do not commute, the
result of the application of a rigid motion to a vector x depends on the order
in which translations and rotations are performed; however, it is well known
(see, e.g., O'Neill, 1966) that any rigid motion in $R^3$ is uniquely determined
by a rotation followed by a translation, and thus we denote a generic rigid
motion by TR, and a transformed surface by S' = TR(S). Analogously, the
commutator between $l_i$ and $t_j$ is different from zero, unless i = j, and this
shows that the result of a generic rotation R depends on the order of
application of the matrices $A_i$. There are various sequences $A_i$, $A_j$, $A_k$, where
i, j, k need not be different, that uniquely define $R(\phi, \theta, \psi)$ in the 3D space,
and we can set, without loss of generality, $R(\phi, \theta, \psi) = A_1(\phi)A_2(\theta)A_3(\psi)$
(Korn and Korn, 1968).
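The non-commutativity just described can be checked directly; the fragment below (again an illustrative sketch assuming NumPy) verifies that the order of application of the matrices $A_i$ matters, and that TR and RT share the same rotational part and differ only by a translation:

```python
# Numerical check that rotations about different axes do not commute,
# and that TR and RT differ only in their translational parts.
import numpy as np

def A1(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def A3(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

phi, psi = 0.3, 0.7
print(np.allclose(A1(phi) @ A3(psi), A3(psi) @ A1(phi)))  # False: order matters

R, t = A1(phi) @ A3(psi), np.array([1.0, 2.0, 3.0])
x = np.array([0.5, -1.0, 2.0])
print(R @ x + t)    # TR(x): rotate, then translate
print(R @ (x + t))  # RT(x) = R x + R t: same rotation, different translation
```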
It is obvious that representations given by the maps f, F, and h define a
surface uniquely but are not invariant with respect to rigid motions.
We begin the analysis of invariant representations by showing how any
surface can be generated by the action of two commuting, linearly independent
vector fields. (Later, in order to give the formulae a more compact form, we
shall use the notation $x_1 = x$, $x_2 = y$, $x_3 = z$.)
Let S be a surface and let $\mathcal{L}_u$ be a vector field that assigns to each point
$x = (x_1, x_2, x_3) \in S$ a tangent vector $v|_x$. From an algebraic point of view, a
vector field is the infinitesimal operator of a one-parameter group of transfor-
mations that are smooth and one-to-one: the action of this group, starting
from a point $x_0$, generates smooth integral curves of $\mathcal{L}_u$,
$\alpha(u) = (x_1(u), x_2(u), x_3(u))$, $\alpha(0) = x_0$, whose tangent vector at a point x coincides with the
value of $\mathcal{L}_u$ at the same point:
\[ \frac{d\alpha}{du}\bigg|_x = \mathcal{L}_u(x). \]
Then it follows that, in $R^3$, $\mathcal{L}_u$ has the form
\[ \mathcal{L}_u = (dx_1/du)\,\partial/\partial x_1 + (dx_2/du)\,\partial/\partial x_2 + (dx_3/du)\,\partial/\partial x_3. \]
The vector field $\mathcal{L}_u$ completely determines the curve $\alpha(u)$ except for the initial
point $x_0$, and hence there exists an infinite number of curves $\alpha(u)$, one for
each different initial point $x_0$. Let $\alpha(u, x_0)$ denote the maximal integral curve
starting from $x_0$. The curve $\alpha(u, x_0)$ can be calculated by using the exponential
map from the tangent bundle TS to S,
\[ \exp: TS \to S, \quad \alpha(u, x_0) = \exp(u\mathcal{L}_u)x_0, \]
where the exponentiation has the usual operational sense (compare with
Appendix A), i.e., $\alpha(u, x_0)$ is computed as a Taylor expansion in powers of
u. Consider a vector field $\mathcal{L}_v$ linearly independent of $\mathcal{L}_u$ and such that
$[\mathcal{L}_u, \mathcal{L}_v] = 0$, and let $\beta(v, x_0)$ be the maximal integral curve of $\mathcal{L}_v$ starting
from $x_0$. A theorem of differential geometry (Schutz, 1980) establishes that
the flows of $\mathcal{L}_u$ and $\mathcal{L}_v$ form S in the sense that each point $x \in S$ belongs both
to a curve $\alpha$ and a curve $\beta$. Then S admits a parameterization with parameters
u, v and can be written as
\[ x(u, v) = \exp(u\mathcal{L}_u) \circ \exp(v\mathcal{L}_v)\,x_0, \]
where $x_0 = (x_1(0, 0), x_2(0, 0), x_3(0, 0))^T$. Since $\mathcal{L}_u$ and $\mathcal{L}_v$ commute, the result
of the composition of the two exponential maps is independent of the order
of application to $x_0$ and is equivalent to the Taylor expansion of a smooth,
vector-valued map
\[ f: (u, v) \to (x_1(u, v), x_2(u, v), x_3(u, v))^T, \]
the usual parametric representation of a surface.
Thus, two commuting, linearly independent vector fields uniquely define a
surface, up to a translation, as the position of the point $x_0$ is not encoded by
the vector fields.
If the surface S is given in parametric form, the application of the method
is trivial: the tangent vectors $x_u = \partial x/\partial u$, $x_v = \partial x/\partial v$ are linearly independent
and obviously $\mathcal{L}_u = \partial/\partial u$, $\mathcal{L}_v = \partial/\partial v$ commute.
Consider the implicit representation of a surface $F(x_1, x_2, x_3) = 0$ and
define $F_i = \partial F/\partial x_i$, i = 1, 2, 3: an infinitesimal operator
\[ \mathcal{L} = \sum_{i=1}^{3} a_i(x_1, x_2, x_3)\,\partial/\partial x_i \]
is a vector field on S if and only if
\[ \mathcal{L}F(x_1, x_2, x_3) = \sum_{i=1}^{3} a_i(x_1, x_2, x_3)F_i = 0. \tag{49} \]
Among all vector fields satisfying Eq. (49), there exist pairs of commuting,
linearly independent vector fields that are generators of a parametric
representation of a surface. The Frobenius integrability theorem, when
applied to $R^3$, states that if there exist three vector fields $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ such that
\[ [\mathcal{L}_i, \mathcal{L}_j] = \sum_{k=1}^{3} c_{ijk}\mathcal{L}_k, \tag{50} \]
where the $c_{ijk}$ are smooth real-valued functions, then the integral curves of the
vector fields mesh to form a family of $R^2$ surfaces that fill a subset A of $R^3$
(Spivak, 1979; Schutz, 1980); moreover, each point of A is on one and only
one surface. The condition is also necessary. As an example consider the
vector fields
\[ \mathcal{L}_1 = F_3\,\partial/\partial x_2 - F_2\,\partial/\partial x_3, \quad \mathcal{L}_2 = F_1\,\partial/\partial x_3 - F_3\,\partial/\partial x_1, \quad \mathcal{L}_3 = F_2\,\partial/\partial x_1 - F_1\,\partial/\partial x_2. \]
Obviously, $\mathcal{L}_iF(x_1, x_2, x_3) = 0$, i = 1, 2, 3, and a straightforward, tedious
calculation proves that $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ satisfy Eq. (50). The Frobenius theorem
(Schutz, 1980) entails the fact that it is possible to find vector fields $V_1$, $V_2$,
linear combinations of $\mathcal{L}_1$, $\mathcal{L}_2$, and $\mathcal{L}_3$, that are linearly independent and
commute with each other and with one of the vector fields $\mathcal{L}_i$, such as $\mathcal{L}_1$.
There exist, then, at least three pairs of generators of independent parametric
representations of the surface.
The form $F(x_1, x_2, x_3) = 0$ is the representation most appropriate for
finding invariances of S under certain transformations. A surface is invariant
under a one-parameter transformation group $T_s$ if and only if the infini-
tesimal operator defines at every point a vector tangent to the surface
(Olver, 1986). For a surface of the form $F(x_1, x_2, x_3) = 0$, the tangency
condition translates into $\mathcal{L}F(x_1, x_2, x_3) = 0$. For example, if for some subset
of the domain of definition of S and some i, $t_iF(x_1, x_2, x_3) = 0$, the surface is,
at least locally, planar, and likewise $l_iF(x_1, x_2, x_3) = 0$, for some i, indicates
that it has rotational symmetry. However, the usefulness of this method is
limited by the fact that it generally depends on the orientation of the surface
in $R^3$. For instance, the identity $l_iF(x_1, x_2, x_3) = 0$ holds if and only if the
surface is invariant with respect to a rotation about the $x_i$-axis, whereas a
rotational symmetry about a generic axis implies that $F(x_1, x_2, x_3)$ is
annihilated by the infinitesimal operator $l_c = \sum_i c_i l_i$. Then the existence of
rotational symmetries can be determined if it is possible to find a linear
combination of the infinitesimal operators $l_i$, with coefficients $c_i$, that
annihilates $F(x_1, x_2, x_3)$. The coefficients $c_i$ can be determined, for example,
with relaxation techniques (Rosenfeld and Kak, 1982; Aarts and Korst,
1989), but these methods may be computationally very expensive.
Finally, in the case of a Monge patch, two independent vector fields are
\[ \mathcal{L}_1 = \partial/\partial x + p(x, y)\,\partial/\partial z, \quad \mathcal{L}_2 = \partial/\partial y + q(x, y)\,\partial/\partial z, \]
where $p(x, y) = \partial h(x, y)/\partial x$ and $q(x, y) = \partial h(x, y)/\partial y$, and it is straightforward
to prove that $\mathcal{L}_1$, $\mathcal{L}_2$ commute.
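That the two flows mesh to form the graph of h can be checked numerically; the sketch below (ours, assuming NumPy and SciPy, with the illustrative depth map $h(x, y) = x^2 + y^2$) integrates the flows in both orders:

```python
# Sketch: flows of L1 = d/dx + p d/dz and L2 = d/dy + q d/dz for the
# illustrative depth map h(x, y) = x**2 + y**2; the two flows commute and
# their integral curves stay on the graph z = h(x, y).
import numpy as np
from scipy.integrate import solve_ivp

h = lambda x, y: x**2 + y**2
p = lambda x, y: 2 * x          # dh/dx
q = lambda x, y: 2 * y          # dh/dy

L1 = lambda s, v: [1.0, 0.0, p(v[0], v[1])]
L2 = lambda s, v: [0.0, 1.0, q(v[0], v[1])]

def flow(field, v0, s):
    """Integrate the field from v0 for parameter length s."""
    return solve_ivp(field, (0.0, s), v0, rtol=1e-10).y[:, -1]

x0 = [0.2, -0.3, h(0.2, -0.3)]              # a point on the surface
a = flow(L2, flow(L1, x0, 0.5), 0.7)        # along L1, then along L2
b = flow(L1, flow(L2, x0, 0.7), 0.5)        # along L2, then along L1
print(np.allclose(a, b))                    # True: the flows commute
print(np.isclose(a[2], h(a[0], a[1])))      # True: still on the graph of h
```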
Thus, vector fields form a class of surface representations from which a
specific parameterization can be chosen, and the choice depends on the type
of surface under consideration. For instance, Brady and Yuille (1984) argued
that lines of curvature may provide a natural parameterization of the surface
and called this parameterization a curvature patch representation, whereas
Brady et al. (1985) showed that asymptotic curves are the most suited to
parameterize ruled surfaces. It is well known in computer vision literature
that in general vector fields, or their integral curves on the surface, are a rich
source of information about surface shape. Of particular interest are the
bounding contours of surfaces, where the normal turns away from the viewer,
which allow the inference of local information about surface shape from their
projection on the image plane (Brady et al., 1985; Beusmans et al., 1987;
Koenderink, 1987; Richards et al., 1987).
Surface representations based on tangent vector fields are useful for
analyzing properties of curves on the surface, permit the exploration of
different types of parameterization, and further are invariant, by definition,
under translations. Unfortunately, these representations do not meet the
conditions of invariance under rigid motion because they are linearly trans-
formed under rotations of the surface. It is a standard result of differential
geometry (O'Neill, 1966) that if a surface S is mapped into S' by a rigid
motion, S' = TR(S), any tangent vector $v_p = (v_1, v_2, v_3)^T$ to S at a point p is
transformed into a tangent vector to S' at q = TR(p), denoted by
$w_q = (w_1, w_2, w_3)^T$, and
\[ w_q = R(v_p), \]
that is, tangent vectors are invariant under translations of the surface,
whereas when the surface is rotated, tangent vectors are rotated exactly the
same way.
The answer to the problem of invariance and uniqueness of representations
lies in the fundamental theorem of surface theory: a surface is defined uniquely,
up to a rigid motion, by the coefficients of its first and second fundamental
forms
\[ g_{ij}(u, v) = x_i \cdot x_j, \quad b_{ij}(u, v) = x_{ij} \cdot n, \quad i, j = 1, 2, \tag{51a} \]
where
\[ x_1 = x_u = \partial x/\partial u, \quad x_2 = x_v = \partial x/\partial v, \tag{51b} \]
\[ x_{11} = \partial^2 x/\partial u^2, \quad x_{22} = \partial^2 x/\partial v^2, \quad x_{12} = x_{21} = \partial^2 x/\partial u\,\partial v, \tag{51c} \]
and n is the unit normal to the surface (see Appendix B). In other words, two
surfaces with the same coefficients of the first and second fundamental form
can be superposed onto each other by a rigid motion.
It is obvious from Eqs. (51a, b, c) that there are six independent coefficients
of the first and second fundamental forms; they are invariant under rigid
motions of the surface in the sense that $g_{ij}(p(u, v)) = g_{ij}(q(u, v))$ and
$b_{ij}(p(u, v)) = b_{ij}(q(u, v))$, where $p \in S$, $q \in S'$, and S' = TR(S), q = TR(p).
In the literature on differential geometry, the term uniqueness is always
understood to mean “uniqueness within a rigid motion,” the reason being
that the shape of the surface is unique even though its position and
orientation are not determined. For the sake of simplicity, we shall henceforth
use the same convention even though it differs from our previous definition
of uniqueness.
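The coefficients of Eq. (51a) and their invariance are easily verified symbolically; the following fragment (an illustrative sketch assuming SymPy, with a cylinder as test surface) computes $g_{ij}$ and $b_{ij}$ before and after a rigid motion:

```python
# Sketch: the coefficients g_ij, b_ij of Eq. (51a) for a cylinder, before and
# after a rigid motion TR; the six functions are unchanged.
import sympy as sp

u, v = sp.symbols('u v', real=True)

def fundamental_forms(X):
    Xu, Xv = X.diff(u), X.diff(v)
    n = Xu.cross(Xv) / Xu.cross(Xv).norm()          # unit normal
    g = sp.Matrix([[Xu.dot(Xu), Xu.dot(Xv)],
                   [Xu.dot(Xv), Xv.dot(Xv)]])
    b = sp.Matrix([[X.diff(u, 2).dot(n), X.diff(u, v).dot(n)],
                   [X.diff(u, v).dot(n), X.diff(v, 2).dot(n)]])
    return sp.simplify(g), sp.simplify(b)

X = sp.Matrix([sp.cos(u), sp.sin(u), v])            # a cylinder of radius 1
phi = sp.pi / 5
R = sp.Matrix([[1, 0, 0],                           # A1(phi), rotation about x
               [0, sp.cos(phi), -sp.sin(phi)],
               [0, sp.sin(phi), sp.cos(phi)]])
Xp = R * X + sp.Matrix([1, 2, 3])                   # S' = TR(S)

print(fundamental_forms(X))
print(fundamental_forms(Xp))                        # identical coefficients
```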
The set of functions $g_{ij}(u, v)$, $b_{ij}(u, v)$ defines a six-dimensional representa-
tion of S,
\[ \{g_{ij}(u, v), b_{ij}(u, v)\} \]
that is unique (in the sense of differential geometry) and is invariant, albeit
in the weak sense, as position and orientation of the surface are not encoded;
in turn, this representation depends on the action on S of the five differential
operators of first and second order
\[ (\partial/\partial u, \partial/\partial v, \partial^2/\partial u^2, \partial^2/\partial u\,\partial v, \partial^2/\partial v^2) \tag{52} \]
(compare with Caelli et al., 1992), or, more generally,
\[ (\mathcal{L}_u, \mathcal{L}_v, \mathcal{L}_u^2, \mathcal{L}_u\mathcal{L}_v, \mathcal{L}_v^2). \tag{53} \]
In other words, surfaces are completely described by tangent vectors,
surface normals, and rate of change of tangent vectors with respect to the
parametric representation.
Although the representation $\{g_{ij}(u, v), b_{ij}(u, v)\}$ is the answer to the
problem of invariance and uniqueness, it requires the computation of six
functions, and, furthermore, it is difficult to interpret which information
about surface shape is conveyed by each of these functions. It would thus be
advantageous to find a simpler representation that combines the information
of $g_{ij}$ and $b_{ij}$ in a way that makes surface characteristics easier to interpret.
Besl and Jain (1986) proposed the use of two curvature functions, the
Gaussian (K(u, v)) and mean (H(u, v)) curvatures, to characterize surface
shape. They argued (Besl and Jain, 1986) that K(u, v) and H(u, v) capture the
salient properties of surface geometry even though, in general, they cannot
ensure uniqueness. However, for compact and convex surfaces, where
K(u, v) > 0 at every point, there exists a single function, the Gaussian
curvature K(u, v), that uniquely defines (up to a rigid motion) the surface
(Chern, 1957); an example of such surfaces are the ovaloids, that is, closed,
bounded, and convex surfaces. Moreover, it is interesting to note that, under
certain conditions, H can uniquely define a Monge patch. The function H can
be written as
\[ H(x, y) = \tfrac{1}{2}\nabla \cdot \left[\nabla f(x, y)\,[1 + |\nabla f(x, y)|^2]^{-1/2}\right] \tag{54} \]
(Besl and Jain, 1986), and Eq. (54) is a second order elliptic quasi-linear
partial differential equation. If the domain of definition D of the Monge
patch is bounded, H(x, y) is an arbitrary function of the two variables with
continuous first partial derivatives in D, and $f_1$, $f_2$ are solutions in D to Eq.
(54) such that $f_1(x, y) = f_2(x, y)$ on the boundary $\partial D$, then $f_1(x, y) = f_2(x, y)$
throughout D (Gilbarg and Trudinger, 1977). Thus H(x, y) plus f(x, y) on $\partial D$
together constitute a representation of Monge patches such that all informa-
tion present in the original depth map is preserved. Under conditions con-
cerning the absolute value of the integral $|\int\int H\,dx\,dy|$ calculated on any domain
$A \subset D$, it can be proved (Giusti, 1978) that H alone defines uniquely, within
a rigid motion in $R^2$, the function f and hence the Monge patch. However,
the above results apply to Monge patches only, and thus they have a limited
relevance to our problem.
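For depth maps sampled on a grid, K and H are commonly estimated from discrete derivatives; the fragment below is a minimal sketch (ours, assuming NumPy) based on the standard Monge-patch expressions for the curvatures of a graph $z = h(x, y)$, consistent with Eq. (54):

```python
# Sketch: discrete K and H from a sampled depth map, using the Monge-patch
# formulas with p = h_x, q = h_y (cf. Eq. (54)); test surface z = x**2 + y**2.
import numpy as np

def curvatures(h, dx):
    p, q = np.gradient(h, dx)          # h_x, h_y
    hxx, hxy = np.gradient(p, dx)
    _, hyy = np.gradient(q, dx)
    w = 1.0 + p**2 + q**2
    K = (hxx * hyy - hxy**2) / w**2
    H = ((1 + q**2) * hxx - 2 * p * q * hxy + (1 + p**2) * hyy) / (2 * w**1.5)
    return K, H

x = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(x, x, indexing='ij')
K, H = curvatures(X**2 + Y**2, x[1] - x[0])
print(K[100, 100], H[100, 100])        # at the origin: K ~ 4, H ~ 2 (elliptic)
```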
It is well known from differential geometry that K(u, v) and H(u, v) are
invariant under rigid motion (Gauss's theorema egregium establishes a stronger
invariance property, namely that K(u, v) is invariant under isometries), and
hence an encoding by K(u, v) and H(u, v) provides a representation that is
invariant (in the weak sense).
One of the advantages of the representation {K(u, v), H(u, v)} is that it
provides a simple way to segment surfaces into parts; every surface point can
be classified according to the signs of K and H (Besl and Jain, 1986).
If K > 0 the point is said to be elliptic, that is, the surface in a neighbor-
hood of x is like an ellipsoid; if K < 0 the point is hyperbolic and S is
locally saddle-shaped; when K = 0 it is locally flat or conical or cylindrical.
If the sign of H is also considered, any point of the surface can be classified
as belonging to one of eight classes. If K = 0 and H < 0 the surface looks
locally like a ridge; if K = 0 and H = 0 it is locally planar; and if K = 0 and
H > 0 it is locally valley-shaped. When K < 0, the sign of H ≠ 0 indicates
whether the surface looks more like a valley or a ridge, and K < 0, H = 0
corresponds to the case of a surface that is locally minimal. Finally, if K > 0,
H < 0 the surface is locally ellipsoid-shaped and peaked (i.e., the surface
bulges in the direction of the surface normal), and if K > 0, H > 0 the surface
is locally ellipsoidal and bulges in the direction opposite the surface normal.
Note that if K > 0 it is impossible that H = 0 (see Appendix B). (Of course
it would also be possible to use the signs of $g_{ij}$ and $b_{ij}$ to classify surface points,
but the resulting classification would be very complicated, as there are
$3^6 = 729$ classes!)
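The eight-way labelling can be stated compactly; the following sketch (ours, assuming NumPy; the class names follow Besl and Jain, 1986) maps sign pairs of K and H to surface types:

```python
# Sketch: the eight surface classes from the sign pairs (sign K, sign H);
# labels follow Besl and Jain (1986), eps sets the numerical "zero" threshold.
import numpy as np

LABELS = {(-1, -1): 'saddle ridge', (-1, 0): 'minimal', (-1, 1): 'saddle valley',
          ( 0, -1): 'ridge',        ( 0, 0): 'flat',    ( 0, 1): 'valley',
          ( 1, -1): 'peak',                             ( 1, 1): 'pit'}

def classify(K, H, eps=1e-6):
    sK = np.where(np.abs(K) < eps, 0, np.sign(K)).astype(int)
    sH = np.where(np.abs(H) < eps, 0, np.sign(H)).astype(int)
    return sK, sH    # (1, 0) cannot occur: K > 0 excludes H = 0 (Appendix B)

K = np.array([4.0, 0.0, -1.0])
H = np.array([2.0, -0.5, 0.0])
sK, sH = classify(K, H)
print([LABELS[(a, b)] for a, b in zip(sK, sH)])  # ['pit', 'ridge', 'minimal']
```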
It is not clear how the advantages of simplicity of the representation
{K(u, v), H(u, v)} compare with the lack of uniqueness, and this problem
can be solved only by a detailed analysis of the relationship between
$\{g_{ij}(u, v), b_{ij}(u, v)\}$ and {K(u, v), H(u, v)}. In experiments with simple range
images (Besl and Jain, 1986), depth maps were reconstructed by using K(u, v)
and H(u, v), together with four other surface descriptors, invariant under
rigid motion:
1. The determinant g of the matrix [g] of the coefficients of the first
fundamental form; the integral of $\sqrt{g}$ over the domain of definition of a surface
gives the area of the surface.
2. The coordinate angle function $\Theta$, defined as
\[ \Theta = \cos^{-1}\left[g_{12}(g_{11}g_{22})^{-1/2}\right], \]
which measures the non-orthogonality of the parameterization.
3. The angles $\phi_{1,2}$ of the principal directions in the (u, v) plane, defined as
\[ \phi_{1,2} = [-B \pm (B^2 - AC)^{1/2}]A^{-1}, \]
where
\[ A = g_{11}b_{12} - g_{12}b_{11}, \quad 2B = g_{11}b_{22} - g_{22}b_{11}, \quad C = g_{12}b_{22} - g_{22}b_{12}. \]
Note that these directions are not orthogonal in the (u, v) plane even though
they are orthogonal in the tangent plane to the surface.
Besl and Jain (1986) conjectured that the set of functions H, K, g, $\Theta$, $\phi_{1,2}$
provides a description of the surface equivalent, as regards uniqueness, to
$\{g_{ij}, b_{ij}\}$. Measures of curvature have been used by Fan et al. (1989) to match
model surfaces with visible surface patches.
To encode position and orientation of the surface, that is, to make our
representation invariant in the strong sense, we need to know the coordinates
of a generic point $x_0 = (a_0, b_0, c_0)^T \in S$ and the components of two linearly
independent tangent vectors at $x_0$, denoted by $v_0$ and $w_0$. If $v_0$ and $w_0$ are
given, tangent vectors at each point can be calculated by solving the
Gauss-Weingarten equations (see Appendix B) that relate tangent vectors
and surface normals to their derivatives via functions of $g_{ij}$ and $b_{ij}$. In turn,
tangent vectors plus the initial value $x_0$ allow a complete reconstruction of the
surface through a simple integration (Stoker, 1963) corresponding to the
exponentiation process described earlier. Thus, to encode the transfor-
mational state of an object's surface we need only orientation and position
at one point $x_0$, because orientation and position at any other point are
uniquely determined via the Gauss-Weingarten equations and integration of
vector fields.
Any pair of linearly independent tangent vectors $v_0$, $w_0$ uniquely defines the
unit normal, and hence surface orientation, at $x_0$. We assume the existence
of a coordinate system placed at a conventional location, and a “reference
state” for the direction of $n_0$, say $(0, 0, 1)^T$. Any rotation in $R^3$ can be written
as $R(\phi, \theta, \psi) = A_1(\phi)A_2(\theta)A_3(\psi)$, and then the direction of $n_0$ is defined by the
triple $\phi_0$, $\theta_0$, $\psi_0$ of Euler angles such that the following equation holds:
\[ n_0 = A_1(\phi_0)A_2(\theta_0)A_3(\psi_0)(0, 0, 1)^T. \]
If S is rotated, the orientation of S' = R(S) is given by
\[ n_0' = A_1(\phi)A_2(\theta)A_3(\psi)n_0 = A_1(\phi_0')A_2(\theta_0')A_3(\psi_0')(0, 0, 1)^T, \]
and $\phi_0'$, $\theta_0'$, $\psi_0'$ encode uniquely the direction of $n_0'$. Since the operators $l_i$ do
not commute, it is not possible to find canonical coordinates, and in general
a rotation of the surface by $\phi$, $\theta$, or $\psi$ does not result in a simple additive
change in the angles $\phi_0$, $\theta_0$, and $\psi_0$.
Translational states are encoded by $x_0$. If the surface undergoes a rotation
and a translation, the motion of $x_0$ due to the rotation can be separated from
the one induced by the translation, since rotations are encoded by the normal
orientation, which is not affected by translations. As noted earlier, any rigid
motion can be expressed by a rotation followed by a translation, and it
follows that $x_0' = TR(x_0)$ is given by
\[ x_0' = R(\phi, \theta, \psi)(x_0) + (a, b, c)^T. \]
Since the angular disparity is encoded by $n_0$ and $n_0'$, the translational state of
S' relative to S can be uniquely determined.
Then we can define the following representation, invariant in the strong
sense,
\[ \{a_0, b_0, c_0; \phi_0, \theta_0, \psi_0; g_{ij}(u, v), b_{ij}(u, v)\}, \tag{55} \]
where the variant (“phase”) component is given by the set of parameters
$a_0$, $b_0$, $c_0$ and $\phi_0$, $\theta_0$, $\psi_0$, and the invariant component corresponds to the
functions $g_{ij}(u, v)$ and $b_{ij}(u, v)$. The possibility of finding the representation
in Eq. (55) depends on the fact that compositions of rotations are again
rotations, and RT differs from TR by a translation; these properties are
expressed in differential form by Eqs. (47b and c). Note that we have assumed
that the coordinates of the point $x_0'$, corresponding to $x_0$ under rigid motion,
are known; however, in general, the problem of finding $x_0'$ is not trivial.

VI. DISCUSSION

The literature on applications of Lie groups theory to pattern recognition is
surprisingly sparse if one considers the analytical power of the theory and
compares this situation with the extensive applications to other areas of
research, such as physics. A possible explanation may be that in pattern
recognition, particularly for machine vision, a great part of the research has
been carried out by considering images as signals rather than as geometric
structures, and the standard methods of signal theory rather than geometrical
or group-theoretical methods have been the main analytical tools. Moreover,
it must be remembered that Lie groups supply a powerful method of
analyzing invariance of mathematical entities such as sets or maps but do not
provide invariant representations per se, and thus their application to image
processing is limited by the fact that in the domain ( x , y ) only a restricted
class of patterns is invariant under any given one-parameter transformation
group.
By contrast, integral transforms of images, possibly the main legacy of
signal theory to pattern recognition, may provide invariant representations
independent of the specific form f(x, y), since amplitude and phase spectra
are considered as components of a vector that encode separately the desired
properties of invariance and uniqueness. However, the application of Lie
group-theoretical methods to integral transforms is less immediate than to
images in the representation f(x, y), since we are interested in changes of
integral transforms due to transformations of the original image.
The method reviewed in Section III.B rests on the simple idea that signal
theory and Lie groups can be brought together if one considers integral
transforms as an alternate form of images, so that their geometric properties,
with respect to a given set of one-parameter transformation groups acting on
patterns in (x, y), can be analyzed with the standard methods of the theory
of one-parameter (Lie) transformation groups.
The necessary and sufficient conditions stated in Propositions 1 and 2
depend, of course, on how the property of strong invariance is defined (see
Eqs. (13a and b)); different sets of conditions may hold for different
definitions of invariance; however, the definition used here has the advantage
of simplicity, since the transformational state is encoded in an additive
fashion in the phase spectrum, and, moreover, invariant representations are
given by Fourier transforms computed in the canonical coordinates domain
$(\eta, \xi)$ and thus enjoy the usual properties of the Fourier transform, and, in
particular, they are unique. Furthermore, canonical coordinates turn the
action of the transformation group into translations along the $\eta$ and $\xi$ axes,
so that comparison of transformed images can be carried out in the $(\eta, \xi)$
plane by means of the usual cross-correlation techniques. Most examples
reported here concerned invariance under translations, rotations, and
dilations, as these are the transformations usually considered in pattern
recognition. It must be stressed, however, that the method is very general in
that it can be applied to any set of linearly independent and commuting one-
parameter transformation groups. For instance, it is possible to define
representations invariant under smooth transformations of shape, and hence
this method could be applied to the detection of patterns obtained from a
prototype by means of a smooth deformation.
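As an illustration of canonical coordinates at work (a sketch of ours, assuming NumPy and SciPy; grid sizes and names are arbitrary), resampling an image on a log-polar grid turns rotations and dilations about the centre into translations along the new axes, which can then be registered by ordinary cross-correlation:

```python
# Sketch: log-polar resampling as canonical coordinates for the
# rotation/dilation groups; eta = log r, xi = polar angle.
import numpy as np
from scipy.ndimage import map_coordinates

def to_log_polar(img, n_r=64, n_t=64):
    cy, cx = (np.asarray(img.shape, dtype=float) - 1) / 2.0
    r_max = min(cx, cy)
    eta = np.linspace(0.0, np.log(r_max), n_r)             # eta = log r
    xi = np.linspace(0.0, 2 * np.pi, n_t, endpoint=False)  # xi = angle
    r = np.exp(eta)[:, None]
    rows = cy + r * np.sin(xi)[None, :]
    cols = cx + r * np.cos(xi)[None, :]
    return map_coordinates(img, [rows, cols], order=1)

img = np.zeros((65, 65))
img[20:30, 28:36] = 1.0
lp = to_log_polar(img)   # rotations/dilations of img about the centre now act
print(lp.shape)          # as circular shifts of lp, recoverable by correlation
```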
In Section IV we saw under which conditions the amplitude spectrum of
a representation strongly invariant with respect to a pair $\mathcal{L}_a$, $\mathcal{L}_b$ is linearly
transformed by an infinitesimal operator $\mathcal{L}_c$ and can be used as an input
function for an integral transform that is weakly invariant with respect to
$\mathcal{L}_a$, $\mathcal{L}_b$, $\mathcal{L}_c$. An open problem concerns the possibility of defining invariant
representations for a set $\{\mathcal{L}_i\}$ of non-commuting infinitesimal operators that
is closed under the Lie bracket operation (i.e., the result of the Lie bracket
between $\mathcal{L}_j, \mathcal{L}_k \in \{\mathcal{L}_i\}$ is again a member of $\{\mathcal{L}_i\}$). Our results show that no
single integral transform can exist that is strongly invariant with respect to
the operators of the set, and that even a weak representation is constrained
by conditions (40a and b); however, it may be conjectured that there exists,
in some space of dimension at least equal to the number of infinitesimal
operators of the set, a system of coordinates that permits the definition of a
representation that encodes the transformational state in a simple way. We
have found an example of such a space in the problem of invariant representa-
tions of objects, where encoding the transformational state requires a
six-dimensional parameter space.
Integral transforms of images were developed in the framework of artificial
vision, but they have also enormously stimulated the study of visual percep-
tion in biological systems. Since the pioneering work by Campbell and Robson
(1968), many other psychophysical studies (see, e.g., Braddick et al., 1978;
Graham, 1980) have shown that spatial vision in biological systems may
depend on a Fourier decomposition of the stimulus pattern in elementary
stimuli that are the basis functions of the transform, and these results have
been supported by electrophysiological findings (see, for instance, Maffei,
1980; De Valois and De Valois, 1988). Although most studies have been
focused on the investigation of visual system sensitivity to amplitude infor-
mation (i.e., contrast), other experiments have proved that vision also
depends critically on phase information (Brettel et al., 1982; Lawden, 1983).
These results, and some neurophysiological experiments (Pollen and Ronner,
1981, 1982) seem to support the suggestion that amplitude and phase values
of local frequency of a stimulus pattern may be represented by a pair of
cortical cells, with even and odd symmetries (Robson, 1975).
There is no a priori reason why basis functions of the Fourier expansion
should be the only ones appropriated to decompose visual stimuli; a log-
polar coordinate system has been used to describe the mapping of retinal
images to the visual cortex (Schwarz, 1980). More recently, it has been
proposed that elementary stimuli based on the kernel of the LPCH transform
can also be used to specify the characteristic of the visual system, and some
results from psychophysical investigations seem to support this idea (Simas
and Dodwell, 1990). Thus, the operations of visual perception, at least in the
early stages of the process, might be characterized by the coding properties
of two sets of independent filters or channels, and these two systems might
encode both shift and size/orientation characteristics of the stimulus pattern.
One might speculate that similar filters exist for any pair of canonical coor-
dinates (compare with section III.D), possibly not as fixed filters but rather
as the result of some adaptive process that depends on the signal (the stimulus
pattern) and the task of the observer.
Pattern representations considered here are specified by real or complex-
valued functions defined on some domain (x, y) or (u, v), and they are called
implicit (Caelli et al., 1992), in that they do not encode explicit image features.
A different type of representation exists that encodes images explicitly or
symbolically, and has been called explicit (Caelli et al., 1992). In such
representations, patterns are decomposed in parts labelled according to a set
of basis elements, such as lines, corners, or regions. Basis elements are
assigned a list of attributes, or unary predicates, such as “straight,” “acute/
obtuse,” or “close/open,” which define individual part characteristics. Parts
are related by binary relationships between parts, e.g., “adjacent to,” “left/
right of,” or “above/below,” which play a specific role in coding patterns
with the required invariance characteristics. The difference between implicit
and explicit representations entails different methods for pattern recognition:
whereas cross-correlation is the standard matching technique associated with
implicit representations, graph matching, heuristic search, and decision trees
are the predominant tools in the matching procedure of explicit represen-
tations. In general, invariant recognition for explicit representations comes
from the development of unary and binary properties of image parts that
have invariant characteristics. For example, part area, perimeter, and
interpart distances are unary features that are invariant under translations
and rotations; tri-part intersection angles are also invariant to dilations.
Thus, the invariance of a representation is determined by the choice of
appropriate features and binary relations, but uniqueness and registration of
transformation can only be guaranteed if the pattern can be uniquely
reconstructed from the features list, and the features are indexed according
to the transformational state.
Finally, we have seen that for three-dimensional objects it is more difficult
to find strongly invariant representations that are computationally efficient,
and this depends on two facts: first, there is not, or at least there has not been
found, an alternative way to represent surfaces with properties similar to
integral transforms of images; and further, the transformations of interest do
not commute, unless one considers only the trivial case of translations in $R^3$.
Differential geometry provides the basic conditions for the invariance and
uniqueness of the representations, but the representation {g,(u, v), b , ( u , v ) } is
not computationally very efficient in that it requires a six-dimensional space
only to encode surface shape, that is, without considering position and
orientation in R3. Of course, there may exist alternative representations of
lower dimensionality that retain all important information about surface
shape even though they are not unique, and we have seen that the curvature
functions H(u, v) and K(u, v) seem to have some of these characteristics; but
the solution of this problem requires further investigation, both theoretical
and experimental.
APPENDIX A

Not surprisingly, the literature on Lie groups is virtually boundless. In this
Appendix we shall not try, of course, to cover all topics concerning Lie
groups theory, but just to provide some basic notions. More of what follows
can be found in Sagle and Walde (1973) and Olver (1986).
Preliminarily, we say that a map $\phi$ is $C^\infty$, or smooth, if it is infinitely
differentiable, and $\phi$ is a diffeomorphism if it is one-to-one and both $\phi$ and
$\phi^{-1}$ (the inverse of $\phi$) are $C^\infty$.
An m-dimensional smooth manifold is a set locally diffeomorphic to a space
$R^m$, that is, it “looks like” $R^m$, at least locally.
Definition A1. An m-dimensional smooth manifold is a set M, together with
a countable collection of subsets $U_i \subset M$, called coordinate charts, and one-to-
one maps $\phi_i: U_i \to V_i$ onto open subsets $V_i$ of $R^m$, called local coordinate maps,
which satisfy the following properties:
1. The coordinate charts cover M:
\[ \bigcup_i U_i = M. \]
2. Each $\phi_i$ is a bijection of $U_i$ onto an open subset $\phi_i(U_i)$ of some space $R^m$,
and for any pair i, j, $\phi_i(U_i \cap U_j)$ is open in $R^m$.
3. For any pair of coordinate charts with non-empty intersection $U_i \cap U_j$, the
map
\[ \phi_j \circ \phi_i^{-1}: \phi_i(U_i \cap U_j) \to \phi_j(U_i \cap U_j) \]
is a diffeomorphism.
(A more general definition of manifold can be found in Lang (1967).)
The coordinate maps $\phi_i: U_i \to V_i$ endow the manifold with a coordinate
system $x = (x_1, \ldots, x_m)$ and with the topological structure of $R^m$.
Roughly speaking, a Lie group is an infinite group whose elements can be
parameterized smoothly. Thus, any element g of the group can be denoted by
$g(a_1, \ldots, a_r)$ in terms of the parameters $a_1, \ldots, a_r$. The parameters of the
element gh, resulting from the group operation, are smooth functions of the
parameters of g and h. The importance of Lie groups resides in the fact
that one can combine both differential calculus and algebra to investigate the
structure of the groups.
Definition A2. An r-parameter Lie group is a group G that also carries the
structure of an r-dimensional smooth manifold such that both the group
operation
\[ m: G \times G \to G, \quad m(g, h) = gh, \quad g, h \in G, \]
and the inversion
\[ i: G \to G, \quad i(g) = g^{-1}, \quad g \in G, \]
are smooth maps between manifolds.
A familiar example of a one-parameter Lie group is G = SO(2), the group
of rotations in the plane,
\[ G = \left\{ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \right\}, \quad 0 \le \theta < 2\pi. \]
Another example of a Lie group is O(n), the group of orthogonal $n \times n$
matrices. In this case, the group is a $\frac{1}{2}n(n-1)$-parameter Lie group since
an orthogonal matrix has $\frac{1}{2}n(n-1)$ independent elements.
In most applications, Lie groups must not be considered as abstract groups
but rather as groups of transformations that act on some manifold M. For
instance, SO(2) is the group of rotations in the plane $M = R^2$ mapping a
vector p to a new vector p', and O(n) is the group of orthogonal linear
transformations on $R^n$. In general, a Lie transformation group on some
manifold M exists if to any $g \in G$ is associated a map from M to itself.
Definition A3. Let M be a smooth manifold. A Lie group of transformations
acting on M is given by a Lie group G and a diffeomorphism $T: G \times M \to M$,
$p' = T(g, p)$, with the following properties:
1. For all $g, h \in G$, $p \in M$,
\[ T(g, T(h, p)) = T(gh, p). \]
2. For all $p \in M$,
\[ T(e, p) = p, \quad \text{where } e \text{ is the identity of } G. \]
3. If $(g, p) \in G \times M$, then $(g^{-1}, T(g, p)) \in G \times M$ and
\[ T(g^{-1}, T(g, p)) = p. \]
The simplest example of a transformation group is probably the group of
translations in $R^n$: let $a \ne 0$ be a fixed vector, for instance the unit vector in
$R^n$, and let G = R. Define
\[ T(g, p) = p + ga, \quad p \in R^n, \ g \in R. \]
The action of a transformation group maps p in p', p' = T(g, p).
The orbit $O_p$ of the action of G through p is defined as the set of all points
in M that can be reached from p. Formally,
1. If $p \in O$, then $p' = T(g, p) \in O$;
2. If $O' \subset O$, then either O' = O, or O' is empty.
The main tool in the theory of Lie transformation groups is the concept of
infinitesimal transformation. To introduce infinitesimal transformations we
need first to define the concept of a vector field on a manifold, and we shall
also discuss tangent vectors on a manifold.
A curve C on a manifold is defined as a map $\gamma: I \to M$, where I is an interval
on the real line R. Let $x = (x_1, \ldots, x_m)$ be the coordinate system in M. It is
clear that a point p is defined by the value taken by x at p. In the following
we shall use x to denote ambiguously both the point and its coordinates.
A curve C in M is given by m smooth functions $\gamma(s) = (\gamma_1(s), \ldots, \gamma_m(s))$ of
the real variable $s \in R$.
We define at each point $x = \gamma(s)$, $x \in C$, a tangent vector, whose components
are given by the derivatives
\[ \dot{\gamma}(s) = d\gamma/ds = (\dot{\gamma}_1(s), \ldots, \dot{\gamma}_m(s)), \]
and whose basis is $(\partial/\partial x_1, \ldots, \partial/\partial x_m)$. Then a tangent vector at the point
x can be written as
\[ v|_x = \dot{\gamma}_1(s)\,\partial/\partial x_1 + \dot{\gamma}_2(s)\,\partial/\partial x_2 + \cdots + \dot{\gamma}_m(s)\,\partial/\partial x_m. \]

The symbols $\partial/\partial x_i$ can be considered for the moment as a special way to
denote the basis of a tangent vector; later, we shall see that they are indeed
partial differential operators. Consequently, tangent vectors (or, more
exactly, vector fields) can be regarded as differential operators.
Consider the helix
\[ \gamma(s) = (\cos s, \sin s, s) \]
in $R^3$, with coordinates $(x_1, x_2, x_3)$. It has tangent vector
\[ v|_x = -\sin s\,\partial/\partial x + \cos s\,\partial/\partial y + \partial/\partial z = -y\,\partial/\partial x + x\,\partial/\partial y + \partial/\partial z \]
at the point $x = (x_1, x_2, x_3) = (\cos s, \sin s, s)$.
Two curves passing through the same point x have the same tangent vector
if and only if the derivatives at x are equal. It is possible to prove (Olver,
1986) that this property does not depend on the local coordinate system used
in the neighborhood of x. The set of all tangent vectors to all curves passing
through x in M is called the tangent space to M at x and is denoted by $TM_x$.
The collection of all tangent spaces at all points x in M is called the tangent
bundle of M, denoted by
\[ TM = \bigcup_x TM_x. \]
If $\gamma(s)$ is a smooth curve on M, then its tangent vectors will vary smoothly
from point to point.
A vector field v on M is a mapping that assigns a tangent vector $v|_x$ to each
point $x \in M$, and $v|_x$ varies smoothly from point to point. The vector field has
the form
\[ v = \sum_{i=1}^{m} \xi_i(x)\,\partial/\partial x_i, \]
where the $\xi_i$ are smooth functions of x.


An integral curve of a vector field v is a smooth curve x = y(s) whose
tangent vectors at any point are given by the corresponding values of the
vector fields
dY/dS = Vl?(J)
for all s. In local coordinates, x = y(s) is the solution of the system of m
differential equation
dx .
2= l i ( X ) .
ds
The theorems of existence and uniqueness of solutions for systems of
ordinary differential equations guarantee that the solution of the system is
unique for any set of initial values y(0) = x, and that there exists a unique
maximal integral curve, i.e., a curve that is not contained in any other integral
curve passing through x,.
Let v be a vector field and let us denote by $\Psi(s, x) = (\Psi_1(s, x), \ldots, \Psi_m(s, x))$
the maximal integral curve passing through x; we call $\Psi$ the flow generated
by v. The flow $\Psi$ satisfies the following properties:
\[ \Psi(t, \Psi(s, x)) = \Psi(s + t, x), \tag{56a} \]
\[ \Psi(0, x) = x. \tag{56b} \]
Comparing Eqs. (56a and b) with Definition A3 and identifying the group
operation with addition, it is apparent that the flow generated by a vector
field is the same as a group action of the Lie group R on the manifold M, a
Lie one-parameter group of transformations. We shall denote by T(s, x) (or $T_s$,
to simplify the notation) both the transformation group and the action. The
vector field v is called the infinitesimal operator (infinitesimal generator, Lie
derivative) of the action, and henceforth will be denoted by $\mathcal{L}_s$, or just $\mathcal{L}$
when no confusion arises. Lie's first theorem (Guggenheimer, 1963) demon-
strates that any one-parameter transformation group is determined by its
infinitesimal operator. The orbits of T(s, x) are just the integral curves of the
vector field and are given by the formula
\[ x' = T(s, x) = (T_1(s, x), \ldots, T_m(s, x)), \tag{57} \]
and, defining
\[ a_i(x) = \frac{\partial T_i(s, x)}{\partial s}\bigg|_{s=0}, \tag{58} \]
the infinitesimal operator is
\[ \mathcal{L}_s = \sum_{i=1}^{m} a_i(x)\,\partial/\partial x_i. \tag{59} \]
The computation of the one-parameter group T(s, x) from the infinitesimal
operator $\mathcal{L}$ is often referred to as exponentiation of $\mathcal{L}$, and it is customary to
adopt the notation
\[ T(s, x) = \exp[s\mathcal{L}]x. \]
Let $\mathcal{L}$ be the infinitesimal operator of T(s, x) and let $f: M \to R$ be a smooth
function. We are interested in studying the changes of f under the action
T(s, x), denoted by $T_s f(x) = f(T_s x) = f(\exp(s\mathcal{L})x)$. By Taylor's theorem,
\[ f(\exp(s\mathcal{L})x) = f(x) + s\mathcal{L}f(x) + \frac{s^2}{2}\mathcal{L}^2f(x) + \cdots + \frac{s^k}{k!}\mathcal{L}^kf(x) + O(s^{k+1}), \]
where
\[ \mathcal{L}f(x) = \sum_{i=1}^{m} a_i(x)\,\partial f/\partial x_i, \]
and $\mathcal{L}^kf(x)$ is defined recursively:
\[ \mathcal{L}^kf(x) = \mathcal{L}[\mathcal{L}^{k-1}f(x)], \quad \mathcal{L}^0f(x) = f(x) \]
(see, for example, Olver, 1986).
Then the exponential map is defined by the expression
\[ \exp[s\mathcal{L}] = \sum_{k=0}^{\infty} \frac{s^k}{k!}\mathcal{L}^k. \]

A function f is invariant under the action T(s, x) if and only if $\mathcal{L}_s f(x) = 0$.
Then, by definition, the orbits of the transformation are invariant with
respect to the action of the group. Moreover, if f is invariant, any smooth
function g(f) is likewise invariant (Ovsiannikov, 1982). Note again that the
infinitesimal operator is a vector field; in other words, $\mathcal{L}$ can be seen
algebraically as an operator and geometrically as a vector field assigning a
tangent vector to each point of the manifold M (see, for example, Spivak,
1979, for a complete discussion). Likewise, we can identify T(s, x) with
$\Psi(s, x)$. We shall now give some examples, restricted to the case $M = R^2$, of
infinitesimal operators of transformation groups.
Example A4. The action of the group SO(2) is given by
\[ T_1(s, x) = x' = x\cos(s) - y\sin(s), \quad T_2(s, x) = y' = x\sin(s) + y\cos(s). \]
From Eq. (58) it follows that
\[ a_1(x) = \frac{\partial T_1(s, x)}{\partial s}\bigg|_{s=0} = -y, \quad a_2(x) = \frac{\partial T_2(s, x)}{\partial s}\bigg|_{s=0} = x. \]
The infinitesimal operator $\mathcal{L}_R$ is defined by (compare with Eq. (59))
\[ \mathcal{L}_R = -y\,\partial/\partial x + x\,\partial/\partial y. \]
The orbits of this group are circles, as can be found by solving the system
\[ \frac{dx}{ds} = -y, \quad \frac{dy}{ds} = x, \]
or, identically, the Pfaffian system
\[ \frac{dx}{-y} = \frac{dy}{x}. \]
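A direct numerical integration (our sketch, assuming NumPy and SciPy) confirms that the orbits are circles:

```python
# Sketch: numerical integration of dx/ds = -y, dy/ds = x; the orbit through
# (1, 0.5) is the circle x**2 + y**2 = 1.25, traversed with period 2*pi.
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda s, w: [-w[1], w[0]], (0.0, 2 * np.pi), [1.0, 0.5],
                rtol=1e-10, dense_output=True)
pts = sol.sol(np.linspace(0.0, 2 * np.pi, 100))
print(np.allclose(pts[0]**2 + pts[1]**2, 1.25))            # radius is conserved
print(np.allclose(sol.y[:, -1], [1.0, 0.5], atol=1e-6))    # the orbit closes
```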
Example A5. The group of dilations in the plane is defined by the transfor-
mations $x' = \exp(s)x$, $y' = \exp(s)y$. It is trivial to show that the infinitesimal
operator of the group is $\mathcal{L}_D = x\,\partial/\partial x + y\,\partial/\partial y$. Note that sometimes in the
literature a dilation is defined by x' = sx, y' = sy. This is not correct because
in this case the conditions (56a and b) are not satisfied. The orbits of this
group are solutions of the system of ordinary differential equations
\[ \frac{dx}{ds} = x, \quad \frac{dy}{ds} = y, \]
or
\[ \frac{dx}{x} = \frac{dy}{y}, \]
which yields $\log(y) - \log(x) = \text{const}$, that is, $y/x - \text{const} = 0$, i.e., the orbits
form a star of radial lines.
The most important operation on infinitesimal operators is their com-
mutator or Lie bracket:
\[ [\mathcal{L}_s, \mathcal{L}_r] = \mathcal{L}_s\mathcal{L}_r - \mathcal{L}_r\mathcal{L}_s. \]
In local coordinates, if
\[ \mathcal{L}_s = \sum_{i=1}^{m} a_i(x)\,\partial/\partial x_i, \quad \mathcal{L}_r = \sum_{i=1}^{m} b_i(x)\,\partial/\partial x_i, \]
then
\[ [\mathcal{L}_s, \mathcal{L}_r] = \sum_{i=1}^{m}\sum_{j=1}^{m}\left(a_j\,\frac{\partial b_i}{\partial x_j} - b_j\,\frac{\partial a_i}{\partial x_j}\right)\partial/\partial x_i. \]
Note that the Lie bracket of two infinitesimal operators is an infinitesimal
operator acting on smooth functions according to the formula
\[ [\mathcal{L}_s, \mathcal{L}_r](f) = \mathcal{L}_s(\mathcal{L}_r(f)) - \mathcal{L}_r(\mathcal{L}_s(f)). \]
A standard result of Lie group theory is that the Lie bracket has the following
properties:
1. Bilinearity
\[ [c\mathcal{L}_s + c'\mathcal{L}_w, \mathcal{L}_r] = c[\mathcal{L}_s, \mathcal{L}_r] + c'[\mathcal{L}_w, \mathcal{L}_r], \]
\[ [\mathcal{L}_s, c\mathcal{L}_r + c'\mathcal{L}_w] = c[\mathcal{L}_s, \mathcal{L}_r] + c'[\mathcal{L}_s, \mathcal{L}_w], \]
where c, c' are constants.
2. Skew-symmetry
\[ [\mathcal{L}_s, \mathcal{L}_r] = -[\mathcal{L}_r, \mathcal{L}_s]. \]
3. Jacobi identity
\[ [\mathcal{L}_s, [\mathcal{L}_r, \mathcal{L}_w]] + [\mathcal{L}_w, [\mathcal{L}_s, \mathcal{L}_r]] + [\mathcal{L}_r, [\mathcal{L}_w, \mathcal{L}_s]] = 0. \]
Finally, we define the prolongation of an infinitesimal operator (or vector
field) and restrict ourselves to the case $M = R^2$ (see Olver, 1986, for a general
treatment). Let G be a one-parameter group acting on $R^2$, and let the
transformations be given by
\[ x' = T_1(s; x, y), \quad y' = T_2(s; x, y), \]
and
\[ \mathcal{L} = a_1(x, y)\,\partial/\partial x + a_2(x, y)\,\partial/\partial y. \]
The first prolongation of the group G is a group $G_1$ acting on the variables
x, y and $\dot{y} = dy/dx$,
\[ x' = T_1(s; x, y), \quad y' = T_2(s; x, y), \quad \dot{y}' = T_3(s; x, y, \dot{y}). \]
The infinitesimal operator of the prolongation is
\[ \mathcal{L}^{(1)} = \mathcal{L} + A_1(x, y, \dot{y})\,\partial/\partial\dot{y}, \]
where
\[ A_1 = \frac{da_2}{dx} - \dot{y}\,\frac{da_1}{dx}, \]
with d/dx denoting the total derivative with respect to x
(Hoffman, 1970; Olver, 1986). The group $G_2$, the second order prolongation,
can be obtained from $G_1$ in a similar way and the process can be repeated to
$G_k$, the kth prolongation of G. The group $G_k$ is determined by the kth prolon-
gation of the infinitesimal operator
\[ \mathcal{L}^{(k)} = \mathcal{L} + A_1(x, y, \dot{y})\,\partial/\partial\dot{y} + \cdots + A_k(x, y, \dot{y}, \ldots, y^{(k)})\,\partial/\partial y^{(k)}, \]
where $y^{(k)} = d^ky/dx^k$ and
\[ A_k = \frac{dA_{k-1}}{dx} - y^{(k)}\,\frac{da_1}{dx} \]
(Hoffman, 1970; Olver, 1986).


The kth prolongation of $\mathcal{L}$, $\mathcal{L}^{(k)}$, must not be confused with $\mathcal{L}^k$, the kth
term in the operational expansion of the exponential map.
Invariants of prolongations of an infinitesimal operator, called differential
invariants or prolongations of invariants, must satisfy the condition
\[ \mathcal{L}^{(k)}F(x, y, \dot{y}, \ldots, y^{(k)}) = 0. \]
Consider the differential invariant $F(x, y, \dot{y})$ of $\mathcal{L}^{(1)}$:
\[ \mathcal{L}^{(1)}F(x, y, \dot{y}) = 0, \]
and the corresponding Pfaffian system
\[ \frac{dx}{a_1} = \frac{dy}{a_2} = \frac{d\dot{y}}{A_1}. \]
One solution of this system is the invariant of $\mathcal{L}$, $u(x, y) = \text{const}$, which is
obtained by solving the first pair of equations; a second independent integral
is
\[ u'(x, y, \dot{y}) = \text{const}. \]
The most general function invariant under the action of $\mathcal{L}^{(1)}$ has the form
\[ F(u, u') = 0. \]
(Note that the above expression is an implicit differential equation in y, since
u' contains $\dot{y}$.)
Differential invariants of higher order can be computed by solving the
Pfaffian system associated with the equation
\[ \mathcal{L}^{(k)}F(x, y, \dot{y}, \ldots, y^{(k)}) = 0 \]
(Hoffman, 1970; Olver, 1986).

APPENDIX B
The literature on differential geometry is even larger than that on Lie groups.
These notes are based mainly on Stoker (1963) and Do Carmo (1976).
The most general way to define a surface is given by the parametric
representation:
\[ f: (u, v) \to f(u, v) = (x(u, v), y(u, v), z(u, v)), \]
where $(u, v) \in R^2$ and f is a vector-valued homeomorphism. That is, any point
$x = (x(u, v), y(u, v), z(u, v))^T$ belonging to the surface S is given coordinates
whose values are determined by f(u, v). If f is $C^n$ the surface is said to be of
class $C^n$; if f is $C^\infty$ the surface is called smooth. Note that a surface is a
two-dimensional manifold.
Although the parametric representation is the main tool for exploration of
surfaces, another type of representation must be mentioned, which is well
known from analytic geometry. This is the implicit representation of form
F(x, y , z) = 0.
For instance, the parametric representation of a sphere is
\[ x = r\cos(u)\cos(v), \quad y = r\cos(u)\sin(v), \quad z = r\sin(u), \]
where coordinates u and v are the latitude and longitude angles respectively,
and the implicit representation is
\[ x^2 + y^2 + z^2 = r^2. \]
A Monge patch representation is a particular case of the parametric
representation and is defined by the formulae
\[ x = u, \quad y = v, \quad z = h(x, y). \]
We define the partial derivatives of x with respect to u, v respectively by
$x_u = \partial x/\partial u$, $x_v = \partial x/\partial v$, and we say that a point on the surface is regular
if $x_u \times x_v \ne 0$; a point for which $x_u \times x_v = 0$ is called singular.
The tangent vectors to the surface at a point $x_0$ are the partial derivatives
$x_u$, $x_v$ calculated at $x_0$. The directions of these tangent vectors define a tangent
plane to S at $x_0$; the unit normal to the surface is defined as the normalized
vector product of $x_u$ and $x_v$:
\[ n = \frac{x_u \times x_v}{|x_u \times x_v|}. \]

All information about surface shape is contained in the coefficients of the first
and second fundamental forms. The first fundamental form is defined by the
following quadratic form:
\[ I = \sum_{i,j=1}^{2} g_{ij}\,du_i\,du_j, \]
where $u_1 = u$, $u_2 = v$. The coefficients $g_{ij}$ are defined to be
\[ g_{11} = x_u \cdot x_u, \quad g_{22} = x_v \cdot x_v, \quad g_{12} = g_{21} = x_u \cdot x_v; \]
they define the metric of the surface and are the elements of a symmetric
matrix [g] called the metric tensor, and g denotes the determinant of [g]. The
metric of a surface depends only on the surface itself and does not depend on
how the surface is embedded in the three-dimensional space; therefore, the
metric is referred to as an intrinsic property of the surface.
We denote by
\[ x_{uu} = \partial^2 x/\partial u^2, \quad x_{vv} = \partial^2 x/\partial v^2, \quad x_{uv} = \partial^2 x/\partial u\,\partial v \]
the second partial derivatives of x with respect to u and v, and, for reasons
that will be apparent later on, we introduce the notation
\[ x_{11} = x_{uu}, \quad x_{22} = x_{vv}, \quad x_{12} = x_{21} = x_{uv}. \]
The second fundamental form is defined by
\[ II = \sum_{i,j=1}^{2} b_{ij}\,du_i\,du_j, \]
where the [b] matrix elements are
\[ b_{ij} = x_{ij} \cdot n, \]
and the determinant of [b] is denoted by b.
The second fundamental form calculated at a point $x_0 \in S$ estimates how
the surface in a neighborhood of $x_0$ deviates from the tangent plane at $x_0$;
therefore, it is dependent upon how the surface is embedded in the space $R^3$.
The vectors $x_{ij}$ lie in $R^3$, and hence can be written as a linear combination
of the tangent vectors $x_1 = x_u$, $x_2 = x_v$, and n. The linear dependence of $x_{ij}$
on $x_i$ and n gives rise to the following vector differential equations, which
correspond to nine scalar differential equations, called the Gauss equations
(Stoker, 1963):
\[ x_{ij} = \sum_{k=1}^{2} \Gamma_{ij}^k x_k + b_{ij}n, \]
where i, j range from 1 to 2 (note that there are only three equations as
$x_{12} = x_{21}$).
The Christoffel symbols of second kind $\Gamma_{ij}^k$ depend only upon the coefficients
of the first fundamental form and are expressed by the formula
\[ \Gamma_{ij}^k = \frac{1}{2}\sum_{m=1}^{2} g^{km}\left(\frac{\partial g_{im}}{\partial u_j} + \frac{\partial g_{jm}}{\partial u_i} - \frac{\partial g_{ij}}{\partial u_m}\right), \]
where we have again used the notation $u_1 = u$, $u_2 = v$. $g^{km}$ are the elements
of $[g]^{-1}$, the matrix inverse to [g].
Consider now the vectors $n_1 = \partial n/\partial u$ and $n_2 = \partial n/\partial v$ at a point $x_0$. They
belong to the tangent plane to the surface at $x_0$ and are written as (Stoker,
1963)
\[ n_j = \sum_{i=1}^{2} \beta_j^i x_i, \quad j = 1, 2. \]
These two equations, corresponding to six scalar differential equations, are
called the Weingarten equations. The system formed by them with the Gauss
equations is called the Gauss-Weingarten system of equations. The equations'
coefficients $\beta_j^i$ depend on both the first and second fundamental matrices,
\[ [\beta] = [b][g]^{-1}, \]
or
\[ \beta_j^i = \sum_{k=1}^{2} b_{jk}g^{ki}. \]
The matrix $[\beta]$ is referred to as the shape operator, or Weingarten operator,
which maps tangent vectors to other tangent vectors in the tangent plane
associated with each point. One can consider the matrix $[\beta]$ as an operator
that determines the surface shape by relating the intrinsic geometry of the
surface to the geometry of $R^3$.
The eigenvalues $k_1$, $k_2$ of $[\beta]$, calculated at a given point $x_0$, are called the
principal curvatures at $x_0$. The directions of the corresponding eigenvectors
$v(k_1)$, $v(k_2)$ are called the principal directions. From the usual properties of the
eigenvectors it follows that if $k_1 \ne k_2$ then $v(k_1)$ and $v(k_2)$ are orthogonal.
Connected curves C such that for all $x \in C$ the tangent line of C is a principal
direction at x are called lines of curvature. The normal curvature $k_l$ of the
surface in any direction l is defined by the Euler formula
\[ k_l = k_1\cos^2(\theta) + k_2\sin^2(\theta), \]
where $\theta$ is the angle between l and the eigenvector $v(k_1)$. Directions for which
the normal curvature is zero are called asymptotic directions; the correspond-
ing connected curves are called asymptotic curves.
The two most important measures of surface curvature are
\[ K = k_1k_2 = \det[\beta] = \frac{b}{g}, \quad H = \frac{k_1 + k_2}{2} = \frac{1}{2}\,\mathrm{tr}[\beta]. \]
K is called the Gaussian curvature and H the mean curvature.
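These formulas can be verified for the sphere parameterized earlier; the following sketch (ours, assuming SymPy; the normalization $|x_u \times x_v| = r^2\cos u$ holds for $-\pi/2 < u < \pi/2$) recovers $K = 1/r^2$ and $H = 1/r$:

```python
# Sketch: shape operator, K and H for the sphere x(u, v) given earlier.
import sympy as sp

u, v, r = sp.symbols('u v r', positive=True)
X = r * sp.Matrix([sp.cos(u) * sp.cos(v), sp.cos(u) * sp.sin(v), sp.sin(u)])

Xu, Xv = X.diff(u), X.diff(v)
n = sp.simplify(Xu.cross(Xv) / (r**2 * sp.cos(u)))  # |Xu x Xv| = r**2 cos u
g = sp.simplify(sp.Matrix([[Xu.dot(Xu), Xu.dot(Xv)],
                           [Xu.dot(Xv), Xv.dot(Xv)]]))
b = sp.simplify(sp.Matrix([[X.diff(u, 2).dot(n), X.diff(u, v).dot(n)],
                           [X.diff(u, v).dot(n), X.diff(v, 2).dot(n)]]))
beta = sp.simplify(b * g.inv())       # the shape operator [beta] = [b][g]**-1
print(sp.simplify(beta.det()))        # Gaussian curvature K = r**(-2)
print(sp.simplify(beta.trace() / 2))  # mean curvature H = 1/r
```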


Let us go back to the Gauss-Weingarten equations and consider them as
differential equations in the unknowns $x_1$, $x_2$, and n. It
can be proved (Stoker, 1963) that if $g_{ij}$ and $b_{ij}$ satisfy certain compatibility
conditions there exist functions $x_1$, $x_2$, n that satisfy the differential equations
of Gauss-Weingarten. In turn, the trihedron $x_1$, $x_2$, n defines a surface
uniquely up to translations. The compatibility conditions are the Gauss
equation (63) and the two Mainardi-Codazzi equations (64) and (65), in which
summation over the indices k, m is understood.
Equation (63) is just another way to state the result of the theorema
egregium due to Gauss, which establishes that the Gaussian curvature
depends only on the coefficients of I, and hence K is invariant under
isometries.
These results are summarized in the so-called Fundamental Theorem of
Surface Theory: two quadratic differential forms $I = \sum g_{ij}\,du_i\,du_j$ and
$II = \sum b_{ij}\,du_i\,du_j$, with coefficients of class $C^2$, defined in some domain of the
plane (u, v) are, in some neighborhood of a given point of this domain, first
and second fundamental forms of a surface if and only if
1. $g = \det[g] > 0$; and
2. the coefficients $g_{ij}$ and $b_{ij}$ satisfy the compatibility conditions expressed
by Eqs. (63), (64), and (65).
The surface is determined by the two fundamental forms completely up to
position in space, i.e., two different surfaces with coinciding first and second
fundamental forms can be superimposed onto each other by a rigid motion
in space. In other words, the coefficients $g_{ij}$, $b_{ij}$ define a surface uniquely, up
to rigid motions.

REFERENCES

Aarts, E., and Korst, J. (1989). “Simulated annealing and Boltzmann Machines.” John Wiley
and Sons, New York.
Attneave F. (1954). Psychological Review 61, 183.
Ballard, D. H., and Brown, C. M. (1982). “Computer Vision.” Prentice-Hall, Englewood Cliffs,
N.J.
Besl, P. J., and Jain, R. C. (1985). ACM Comput. Surveys 17, 75.
Besl, P. J., and Jain, R. C. (1986). Comput. Vision Graphics and Image Process. 33, 33.
Beusmans, J. M. H., Hoffman, D. D., and Bennett, B. M. (1987). J. Opt. Soc. Am. A4, 1155.
Bischof, W. F., and Ferraro, M. (1989). Computational Intelligence 5, 121.
Blake, A., and Marinos, C. (1990). ArtiJicial Intelligence 45, 323.
Bluman, G. W., and Cole, J. D. (1974). “Similarity Methods for Differential Equations.”
Springer-Verlag, New York.
Bochner, S., and Chandrasekharan, K. (1949). “Fourier Transforms.” Princeton University
Press, Princeton, New Jersey.
Borello, L., Ferraro, M., Penengo, P., and Rossotti, M. L. (1981). Biol. Cybern. 39, 78.
Braddick, O., Campbell, F. W., and Atkinson, J. (1978). In “Handbook of Sensory Physiology”
(R. Held, H. Leibowitz, and H.-L. Teuber, eds.), Vol. 8. Springer-Verlag, New York.
Brady, M., and Yuille, J. (1984). IEEE Transactions on Pattern Analysis and Machine Intelligence
PAMI-6, 288.
Brady, M., Ponce, J., Yuille, A., and Asada, H. (1985). Comput. Vision Graphics Image Process.
32, 1.
Breitmeyer, B. S. (1973). Vision Res. 13, 41.
Brettel, H., Caelli, T. M., Hilz, R., and Rentschler, I. (1982). Human Neurobiol. 1, 61.
Bundesen, C., and Larsen, A. (1975). J. Exp. Psychol. Human Percept. Perform. 3, 214.
Caelli, T. M. (1976). Mathematical Biosciences 30, 191.
Caelli, T. M., and Dodwell, P. C. (1982). Percept. Psychophys. 32, 314.
Caelli, T. M., and Liu, Z-Q. (1988). Pattern Recognition 21, 205.
Caelli, T. M., and Umanski, J. (1976). Vision Res. 16, 1055.
Caelli, T. M., Preston, G. A. N., and Howell, R. (1978). Vision Res. 18, 723.
Caelli, T. M., Ferraro, M., and Barth, E. (1992). In “Neural networks for human and machine
perception” (H. Wechsler, ed.). Academic Press, Boston.
Campbell, F. W., and Robson, J. G. (1968). J. Physiol. (Lond.), 203, 237.
Canny, J. (1986). IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8, 679.
Carpenter, G. A., Grossberg, S., and Mehanian, C. (1989). Neural Networks 2, 169.
Cartan, H. (1971). “Differential Forms.” Kershaw Publishing Company, London.
Casasent, D., and Psaltis, D. (1976). Optical Engineering 15, 258.
Chern, S. (1957). Amer. J. Math. 79, 949.
Corballis, M. C., Zbrodoff, N. J., Shetzer, L. I., and Butler, P. B. (1978). Memory Cognition 6,
98.
Crampin, M., and Pirani, F. A. E. (1986). “Applicable Differential Geometry.” Cambridge
University Press, Cambridge.
De Valois, R. L., and De Valois, K. K. (1988). “Spatial Vision.” Oxford University Press,
Oxford.
Do Carmo, M. P. (1976). “Differential Geometry of Curves and Surfaces.” Prentice Hall,
Englewood Cliffs, New Jersey.
Dodwell, P. C. (1983). Percept. Psychophys. 34, 1.
Eley, M. G. (1982). Memory Cognition 10, 25.
Eriksen, B. A., and Eriksen, C. W. (1974). Percept. Psychophys. 16, 143.
Fan, T-J., Medioni, G., and Nevatia, R. (1989). IEEE Transactions on Pattern Analysis and
Machine Intelligence, 11, 1140.
Faux, I. D., and Pratt, M. J. (1979). “Geometry for Design and Manufacture.” Ellis Horwood,
Chichester, United Kingdom.
Ferraro, M., and Caelli, T. M. (1988). J. Opt. Soc. Amer. A5, 738.
Ferraro, M., and Foster, D. H. (1984). Biol. Cybern. 50, 9.
Ferrier, N. (1987). “Invariance coding in pattern recognition.” MSc. Thesis. University of
Alberta, Edmonton, Alberta, Canada.
Foster, D. H. (1972). Biol. Cybern. 11, 223.
Foster, D. H., and Mason, R. J. (1979). Biol. Cybern. 32, 85.
Gilbarg, D., and Trudinger, N. (1977). “Elliptic Partial Differential Equations.” Springer-
Verlag, New York.
Giulianini, F., Ferraro, M., and Caelli, T. M. (1992). J . Opt. Soc. Amer. A9, 494.
Giusti, E. (1978). Invent. Math. 46, 111.
Gonzalez, R. G., and Wintz, P. (1987). “Digital Image Processing.” Addison-Wesley, Reading,
Massachusetts.
Graham, N. (1980). In “Visual Coding and Adaptability” (C. S. Harris, ed.). Lawrence Erlbaum
Associates, Hillsdale, New Jersey.
Grimson, W. E. L. (1980). AIM 565, Artificial Intelligence Laboratory, Massachusetts Institute
of Technology, Cambridge, Massachusetts.
Grossberg, S. (1976a). Biol. Cybern. 23, 121.
Grossberg, S. (1976b). Biol. Cybern. 23, 187.
Guggenheimer, H. W. (1963). “Differential Geometry.” McGraw-Hill, New York.
Hadamard, J. (1923). “Lectures on the Chauchy Problem in Linear Partial Differential
Equations.” Yale University Press, New Haven, Connecticut.
Hansen, E. W. (1981). Applied Optics 20,2266.
Hardlick, R. M., Watson, L. T., and Laffey, T. J. (1983). International Journal of Robotic
Research 2, 50.
Hoffman, W. C. (1966). Journal of Mathematical Psychology 3, 65; errata (1967). Journal of
Mathematical Psychology 4, 348.
Hoffman, W. C. (1970). Mathematical Biosciences 6, 437.
Hoffman, W. C. (1977). Cahiers de Psychologie 20, 139.
Horn, 9 . K. P., and Brooks, M. J. (1986).Comput. Vision Graphics and Image Process. 33, 174.
Hsu, Y-N., and Arsenault, H. H. (1982). Applied Optics 21,4016.
Hsu, Y-N., Arsenault, H. H., and April G . (1982). Applied Optics 21,4012.
Hubel, D. M., and Wiesel, T. N. (1962). J . Physiol. 160, 106.
Hubel, D. M., and Wiesel, T. N. (1965). J . Neurophysiol. 28,229.
Kahn, J. I., and Foster, D. H. (1981). Q. J . Exp. Psychol. 33A, 155.
Koenderink, J. J. (1987). In “Image Understanding” (W. Richards and S. Ulman, eds.). Ablex
Publishing Corporation, Norwood, New Jersey.
Kolers, P. A., Duchnicky, R. L., and Sundstroem, G. (1985). J. Exp. Psychol. Percept. Perform.
11, 726.
Korn, G . A., and Korn, T. M. (1968). “Mathematical Handbook for Scientist and Engineers.”
McGrdw-Hill, New York.
Kubovy, M., and Podgorny, (1981). Percept. Psychophys. 30. 24.
Lang, S. (1967). “Introduction to Differentiable Manifolds.” John Wiley and Sons, New York.
Lawden, M. C. (1983). Vision Res. 23, 1451.
Lupker, S. J., and Massaro, D. W. (1979). Percept. Psychophys. 25, 60.
Maffei, L. (1980). In “Handbook of Sensory Physiology” (R. Held, H. Leibowitz, and H. L.
Tuber, eds.). Vol. 8. Springer-Verlag, New York.
Marr, D. C. (1976). Phil. Trans. Roy. Soc. London B207, 483.
Marr, D. C. (1982). “Vision.” Freeman, San Francisco.
Marr, D. C., and Hildreth, E. (1980). Phil. Trans. Roy. Soc. London B-207,187.
Metzler, J., and Shepard, R. N. (1974). In “Theories in Cognitive Psychology.” (R. Solso, ed.).
Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Nazir, T. A., and ORegan, J. K. (1990). Spatial Vision 5, 81.
Olver, P. J. (1986). “Application of Lie Groups to Differential Equations.” Springer-Verlag,
New York.
O’Neill, B. (1966). “Elementary Differential Geometry.” Academic Press, New York.
INVARIANT PATTERN REPRESENTATIONS A N D LIE GROUPS THEORY 195
Ovisiannikov, L. V. ( 1982). “Group Analysis of Differential Equations.” Academic Press, New
York.
Papoulis, A. ( I 984). “Signal Analysis.” McGraw-Hill, New York.
Pentland. A. P. (1987). IEEE Transaciions on Paifern Analysis and Machine Infelligence PAMI-
9, 523.
Poggio, T., and Torre, V. (1984). AIM 773, Artificial Intelligence Laboratory, Massachusetts
Institute of Technology, Cambridge, Massachusetts.
Pollen, D. A,, and Ronner, S. F. (1981). Science N . Y . 212, 1409.
Pollen, D. A,, and Ronner, S. F. (1982). Vi.rion. Res. 22, 101.
Richards, W. A,, Koenderink, J . J., and Hoffman, D. D. (1987). J . Opi. Soc. Am. A4, 1168.
Robson. J. G . (1975). In “Handbook of Perception” (E.C. Carterette and M . P. Friedman, eds.),
Vol. 5, 81. Academic Press, New York.
Rock, I. (1973). “Orientation and Form.” Academic Press, New York.
Rock, I. (1984). “Perception.” Scientific American Library, New York.
Rosenfeld, A,. and Kak, A. C. (1982). “Digital Picture Processing,” Second Edition. Academic
Press, New York.
Sagle, A. A,, and Walde, E. W. (1973). “Introduction to Lie Groups and Lie Algebras.”
Academic Press, New York.
Schutz. B. (1980). “Geometrical Methods for Mathematical Physics.” Cambridge University
Press.
Schwartz, E. L. (1980). Vision Res. 20, 645.
Sederberg, T. W., and Anderson, S. N. (1985). IEEE Compui.Graphics Appl. 5, 23.
Simas, M. L. de B., and Dodwell, P. C. (1990). Spaiial Vision 5, 59.
Spivak, M. (1979). “A Comprehensive Introduction to Differential Geometry.” Publish or
Perish, Berkeley, California.
Stoker, J. J. (1963). “Differential Geometry.” Wiley-Interscience, New York.
Tiller, W. (1983). IEEE Compui. Graphics Appl. 3, 61.
Torre. V., and Poggio, T. A. (1986). IEEE Transactions on Paitern Analysis and Machine
Inielligence PAMI-8, 147.
Ullman, S. (1979). Proc. R. Soc. Lond. B. 203, 405.
Wilkinson, F. E., and Dodwell, P. C. (1980). Naiure 284, 258.
Witkin, A. P. (1981). Artificial Infell. 17, 17.
Woodham, R. J. (1980). Opiical Engineering 19. 139.
Wu, R., and Stark, H. (1984). Applied Optics 23, 838.
Yuzan, Y., Hsu. Y-N., and Arsenault, H. H. (1982). Opiica Acfa. 29, 627.
Zetzsche, C., and Barth, E. (1990). Vision Res. 30, I 1 I I .
Zucker, S. W. (1985). Compuf. Vision Graphics and Image Process. 32, 74.
This Page Intentionally Left Blank
ADVANCES I N ELECTRONICSA N D ELECTRON PHYSICS,
VOL. a4

Finite Topology and Image Analysis


V. A. KOVALEVSKY
Technische Fachhochschule Berlin, Berlin, Germany

I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
II. Abstract Cell Complexes . . . . . . . . . . . . . . . . . . . . . . . 201
III. Images on Cell Complexes . . . . . . . . . . . . . . . . . . . . . . 208
IV. Resolution of Connectivity Contradictions . . . . . . . . . . . . . . 212
V. Boundaries in Complexes . . . . . . . . . . . . . . . . . . . . . . . 216
VI. Simple Image Analysis Problems . . . . . . . . . . . . . . . . . . . 220
VII. The Cell List Data Structure . . . . . . . . . . . . . . . . . . . . 224
VIII. Subgraph and Subcomplex Isomorphism . . . . . . . . . . . . . . . . 229
IX. Variability of Prototypes and Use of Decision Trees . . . . . . . . . 238
X. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
   A. Handwritten Characters . . . . . . . . . . . . . . . . . . . . . . 245
   B. Block Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . 247
   C. Cartography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
   D. Technical Drawings . . . . . . . . . . . . . . . . . . . . . . . . 254
XI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

I. INTRODUCTION

Problems of image analysis and image understanding are closely related to
geometrical notions, such as shape or size, and also to topological
notions such as boundaries and connectivity of subsets. It is thus necessary
to look for ways of implementing basic topological concepts in the processing
of digitized images.
The overwhelming majority of topological literature is concerned with
infinite sets. Digitized images, however, are defined on finite sets, e.g., on
arrays of picture elements, called pixels. It may seem that finite sets are simply
a particular case of infinite sets and hence the topological knowledge may
simply be transferred into the world of digital images. This is, however, not
always true. Consider the idea that for any neighborhood N of a point p there
is another neighborhood of p that is a proper subset of N. This idea, natural
for infinite sets, cannot be applied to finite sets. Hence,
certain topological ideas must be different for finite and infinite sets.
Finite topological spaces have been well-known for at least 50 years

(Alexandroff and Hopf, 1935), but this knowledge is only weakly represented
in topological textbooks. Therefore, specialists in image analysis were forced
to look for their own solutions of the problem. Rosenfeld (1970) introduced
the adjacency relation among pixels, thus considering the digital plane as a
graph whose vertices are the pixels; an edge of the graph corresponds to each
pair of adjacent pixels. Such a graph is called an adjacency graph. Connected
subsets of the digital plane are then defined by means of paths in the
adjacency graph. A path in a graph G is such a sequence Q of vertices of G,
Q = (v1, v2, . . . , vn),
such that any two vertices that are subsequent in Q are connected by an edge of G.
A subset S of the digital plane defines a subgraph SG of G. The subset S is
declared to be connected if for any two pixels a, b of S there exists a path in
SG that contains the vertices corresponding to a and b (a “path in SG”
means a path that lies completely in SG). A subset S that is not connected may
be considered as a union of disjoint connected subsets such that no two of
them compose a connected set. Such subsets are called components of S.
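
To make this path-based notion of connectivity concrete, here is a minimal Python sketch of component labelling on a binary image; the function name label_components and the encoding of the image as a set of pixel coordinates are illustrative assumptions, not constructs from the text.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def label_components(pixels, offsets):
    """Label the components of a set of (row, col) pixels.

    `offsets` selects the adjacency (N4 or N8); a breadth-first search
    collects every pixel reachable through adjacent pixels of the set."""
    labels, next_label = {}, 0
    for start in pixels:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in offsets:
                q = (r + dr, c + dc)
                if q in pixels and q not in labels:
                    labels[q] = next_label
                    queue.append(q)
        next_label += 1
    return labels, next_label

# A diagonal pair is one component under 8-adjacency, two under 4-adjacency:
S = {(0, 0), (1, 1)}
assert label_components(S, N8)[1] == 1
assert label_components(S, N4)[1] == 2
```
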
The concept of adjacency has led to some progress in image analysis: it
became possible to consider connected subsets of a segmented image, bound-
aries, pairs of adjacent subsets and to formulate some image analysis
problems as, e.g., those of subgraph isomorphism (Ullmann, 1976) or of
consistent labelling (Shapiro, 1983). These problems are considered in
Section VIII.
However, attempts to develop a consistent topology for two- and three-
dimensional images by means of graphs have failed due to the so-called
connectivity paradox and serious difficulties in defining the boundaries of
subsets (Pavlidis, 1977). The connectivity paradox is the following.
The well-known Jordan theorem states that a simple closed curve in the
Euclidean plane separates the complementary part of the plane into two
components: the interior and the exterior of the curve. The natural substitute
for a simple closed curve in the digital plane is a simple closed path P in the
adjacency graph. “Simple” means that any vertex of P has exactly two
adjacent vertices in P. It is possible to consider at least two different kinds of
adjacency graphs: those of 4- and 8-adjacency. In the 4-adjacency graph, a
vertex has four adjacent vertices (Fig. 1a), and in the 8-adjacency graph it has eight
(Fig. 1c). As may be seen in Fig. 1a, a 4-adjacency graph contains such
simple closed paths that the rest of the graph consists of more than two
components. On the other hand, a simple closed path in an 8-adjacency graph
does not separate the rest of the graph at all: the “interior” of the path always
remains connected with the “exterior.” Only under 6-adjacency is the number
of components always equal to two (Fig. 1b).
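
The paradox can be checked numerically with the label_components sketch above (a hypothetical helper, not from the chapter): a diagonal ring of pixels is one connected "curve" under 8-adjacency, yet it fails to separate its interior from its exterior, while under 4-adjacency the same point set falls apart entirely.

```python
# A diagonal "closed curve" (each pixel has exactly two 8-neighbors in it):
ring = {(0, 2), (1, 1), (1, 3), (2, 0), (2, 4), (3, 1), (3, 3), (4, 2)}
window = {(r, c) for r in range(5) for c in range(5)}
background = window - ring

assert label_components(ring, N8)[1] == 1        # one connected "curve" ...
assert label_components(background, N8)[1] == 1  # ... that separates nothing
assert label_components(ring, N4)[1] == 8        # under 4-adjacency it falls apart
```
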
FIGURE 1. Separation by simple closed paths under (a) 4-adjacency, (b) 6-adjacency, and (c) 8-adjacency.

Attempts to overcome this difficulty have been made by introducing an
8-adjacency for “objects” and a 4-adjacency for the “background” in binary
images (Rosenfeld and Kak, 1976). However, this suggestion cannot be
considered successful, since it does not lead to a valid concept of a
topological structure of the digital space: the space structure must not depend
upon the definition of variable subsets of the space. Besides, such a “double
adjacency” does not solve the problem at all in the important case of
non-binary images.
Another difficulty is associated with the notion of the boundary of a subset.
According to the classical topological definition, the boundary of a subset S
of a space R consists of all space elements e such that any open neighborhood
of e contains elements both of S and of its complement R-S. When applying
this definition to an adjacency graph, the neighborhood of a vertex v is
defined as the union of v with all vertices adjacent to it. This leads to
boundaries that look like strips with a width of two elements (Fig. 2), which
contradicts our intuitive idea of a boundary as a thin curve whose area is
equal to zero. To make the boundary thinner, it is possible to consider its
intersection with the given subset S or with its complement. Then one obtains

two boundaries: the inner and the outer (compare labels I and O in Fig. 2).
The width becomes equal to one pixel, but there is still no difference between
a boundary and a narrow region: the area of a boundary (commonly
defined as the number of pixels) is still not equal to zero. In addition, one gets
different boundaries for a given subset S and for its complement. The
boundaries are different for 4- and 8-adjacency (compare Fig. 2b and 2c).
Each of the 4-boundaries (inner and outer) is disconnected under this
adjacency. The 8-boundaries are not simply connected. All these peculiarities
of boundaries are topological paradoxes.

FIGURE 2. (a) A subset, (b) its inner (I) and outer (O) boundaries under 4-adjacency, and (c) under 8-adjacency.
Intuitive attempts to overcome the difficulties were often reported in the
literature. Thus, Rosenfeld and Kak (1976), when considering perimeters of
subsets in digital images, have suggested regarding “cracks” separating pixels
of a subset from those of its complement. Elliott and Srinavasan (1981)
considered boundaries as consisting of “boundary elements,” i.e., short line
segments equivalent to “cracks.” The Apple company also uses a similar
concept when describing its graphics software. Herman and Webster (1983)
define the boundary surface of a three-dimensional region as a set of “faces”:
space elements separating two adjacent voxels (volume elements) from each
other. These ideas may serve as evidence that image processing specialists
feel strongly that a consistent topological concept for digital images must
include space elements of various kinds.
This feeling will be verified in Section II, where it is shown that the
resolution of these problems consists in considering the digital plane as a
finite topological space in full accordance with topological axioms. It is
shown that the most suitable for practical purposes is the particular case of
finite topological spaces, known as abstract cell complexes. Topologically
consistent definitions of connectivity and boundaries are given there. In
Section III, images on cell complexes are defined and ways of encoding them
are discussed. Important notions of Cartesian complexes and coordinates are
introduced there. Section IV contains the explanation of why some adjacency
graphs are topologically contradictory. Section V is devoted to boundaries.
A definition of boundaries of subcomplexes is given, and advantages of this
concept as compared to boundaries in adjacency graphs are presented.
Section VI is devoted to the simplest applications of finite topology to image
analysis: tracking and filling of boundaries, and thinning of regions. Section
VII describes a new topologically founded data structure: the cell list, which
represents a segmented image as a cell complex. The cell list is the base for
efficient image analysis. Algorithms for transforming a raster image into a
cell list are also described in Section VII. Sections VIII and IX represent an
advanced concept of image analysis: a generalization of the subgraph isomor-
phism problem based on the notion of a cell list. Both the problem formula-
tions and solutions are discussed. Section X contains applications.
II. ABSTRACT CELL COMPLEXES

FIGURE 3. The surface of a polyhedron.

A topological space is defined as a set of space elements or points, while some
of its subsets are declared to be open. Open subsets must satisfy a set of
axioms: any union of open subsets is open; an intersection of a finite number
of open subsets is open; the empty subset and the whole set are open.
Alexandroff and Hopf (1935) have shown that in a finite space (i.e., a space
whose number of elements is finite) there is always an order relation among
its elements. If the space element e’ belongs to all open subsets containing
another element e” then e” does not belong to all open subsets containing e’.
This relation is called specialization order. There are different ways to
interpret this relation. A survey of the ways may be found in Kong and
Rosenfeld (1991) or in Kong et al. (1991). We present here the way that seems
to be the most descriptive and comprehensive.
As an example of a two-dimensional finite topological space, consider the
surface of a polyhedron (Fig. 3). It consists of three kinds of space elements:
faces, edges, and vertices. An edge l bounds two faces, f' and f'', and is
bounded by two vertices, v' and v''. These two vertices are also said to bound
the faces f' and f''. Let us declare as open any subset S of faces, edges, and
vertices such that for every element e of S all elements of the surface that are
bounded by e are also in S. According to this declaration, a face is an open
subset. An edge l with the two faces f' and f'' bounded by it also composes
an open subset. So does a vertex united with all edges and faces bounded by
it. It is easy to see that the open subsets so defined satisfy the axioms.
Hence, a topological space is defined. It is finite if the surface of the
polyhedron has a finite number of elements.
The elements of such a space are not topologically equivalent: a face f'
bounded by the edge l belongs to all open subsets containing l, but l does not
belong to the set {f'}, which is an open subset containing f'. One can see that
the specialization order corresponds to the bounding relation.
Further, it is possible to assign numbers to the space elements in such a way
that elements with lower numbers bound those with higher numbers. The
numbers are called dimensions of the space elements. Thus, vertices that are
not bounded by other elements get the lowest dimension, i.e., 0; the edges get
dimension 1 and the faces dimension 2. Structures of this kind are known as
abstract cell complexes (Steinitz, 1908).
Definition 1: An abstract cell complex C = (E, B, dim) is a set E of abstract
elements provided with an antisymmetric, irreflexive, and transitive binary
relation B ⊂ E × E, called the bounding relation, and with a dimension
function dim: E → I from E into the set I of non-negative integers such that
dim(e') < dim(e'') for all pairs (e', e'') ∈ B.
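
Definition 1 translates almost verbatim into a data structure. The following sketch (the class name Complex and the dictionary storage are assumptions for illustration, not constructs from the text) stores the bounding relation in transitively closed form and verifies the dimension condition.

```python
class Complex:
    """An abstract cell complex (E, B, dim).

    `bounds[e]` is the set of cells that e bounds; the relation is stored
    in full (transitively closed) form.  `dim[e]` is the dimension of e."""

    def __init__(self, dim, bounds):
        self.dim = dict(dim)
        self.bounds = {e: set(bounds.get(e, ())) for e in self.dim}
        self._check()

    def _check(self):
        for e, bounded in self.bounds.items():
            for f in bounded:
                # Definition 1: a bounding cell has strictly lower dimension,
                # which also makes B irreflexive and antisymmetric.
                assert self.dim[e] < self.dim[f]

# A single square 2-cell "a" with its four 1-cells and four 0-cells:
dim = {"p1": 0, "p2": 0, "p3": 0, "p4": 0,
       "l1": 1, "l2": 1, "l3": 1, "l4": 1, "a": 2}
bounds = {"p1": {"l1", "l4", "a"}, "p2": {"l1", "l2", "a"},
          "p3": {"l2", "l3", "a"}, "p4": {"l3", "l4", "a"},
          "l1": {"a"}, "l2": {"a"}, "l3": {"a"}, "l4": {"a"}}
C = Complex(dim, bounds)
```
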
Elements of E are called abstract cells. It is important to draw the attention
of topologists to the fact that, in contrast to cells of Euclidean complexes,
abstract cells should not be regarded as point sets in a Euclidean space. That
is why abstract cell complexes (ACC’s) and their cells are called abstract.
Neither should an ACC be regarded as a quotient space of a Euclidean space,
as it was proposed by Kong and Rosenfeld (1991). (A quotient space Q of a
space S with a given decomposition of S into disjoint subsets is a space whose
elements correspond to subsets of S , while a subset of Q is open in Q iff the
union of corresponding subsets of S is open in S.) Considering cells as
abstract space elements makes it possible to develop the topology of ACC’s
as a self-contained theory that is independent of the topology of Euclidean
spaces.
If the dimension dim(e’) of a cell e‘ is equal to d then e‘ is called a
d-dimensional cell or a d-cell. An ACC is called k-dimensional or a k-complex
if the dimensions of all its cells are less than or equal to k. If (e', e'') ∈ B then
e' is said to bound e''.
Examples of ACC’s are shown in Fig. 4. In Fig. 4 and in the sequel, the
following graphical notations (similar to that of Fig. 3) are used: 0-cells are
denoted by small circles or squares representing points (which cannot be
drawn), 1-cells are denoted by line segments, 2-cells by interiors of rectangles
or other polygons, and 3-cells by interiors of polyhedrons. The complexes of
Figs. 4a and 4d are one-dimensional: they contain only 0- and 1-cells. The
bounding relation in these examples is defined in a natural way: a 1-cell
represented in the figure by a line segment is bounded by the 0-cells represented
by its end points.
The complexes of Figs. 4a and 4d may be considered as graphs whose
vertices are the 0-cells and whose edges are the 1-cells. Any graph may be
considered in turn as an abstract cell complex if each vertex of the graph is
declared to bound all edges incident with it and if every vertex gets, for
example, the dimension 0 and every edge the dimension 1. The relation
between ACC's and graphs is one more illustration of the idea of abstract
complexes: there is no reason for considering both graphs and ACC's as
subsets of a Euclidean space. Rather, ACC's should be considered as a
generalization of graphs to higher dimensions.

FIGURE 4. Examples of complexes: (a, d) 1-dimensional, (b, e) 2-dimensional, and (c) a section through a 3-dimensional one.
Figures 4b and 4e show two-dimensional ACC’s. The bounding relation is
similar to that of Fig. 3: squares are bounded by surrounding line segments
and by their end points, and line segments are bounded by their end points.
Notice that squares contain no line segments and no points.
In the case of Fig. 4e, only the surface of the cube is a two-dimensional
ACC. When considering the interior of the cube as one more cell, a dimension
equal to 3 may be assigned to it. The cube with its interior represents a
three-dimensional complex. All cells of the surface bound the 3-cell of the
interior. Figure 4c shows another three-dimensional complex, which is cut to
show its interior.
The notion of a pixel, which is widely used in computer graphics and image
processing, should be identified with that of a 2-cell rather than with a point,
since a pixel is thought of as a carrier of a gray value that can be physically
measured only if the pixel has a non-zero area. On the other hand, we are
used to thinking of a point as an entity with a zero area.
The 1-cells of a two-dimensional ACC correspond to the “boundary
elements” considered by Elliott and Srinavasan (1981) and to the “cracks”
mentioned by Rosenfeld and Kak (1976). The voxels considered in 3-D image
processing obviously correspond to the 3-cells. The “faces” separating two
adjacent voxels from each other, considered by Herman and Webster
(1983), correspond to 2-cells in a three-dimensional ACC.

FIGURE 5. Examples of open and non-open subsets.
An abstract cell may generally be considered as a face of another such cell
of higher dimension in exactly the same way as in a polyhedron surface
(Fig. 3): if a d-dimensional cell e' bounds the cell e'' then e' is called a
d-dimensional face of e''. Any cell e may be considered as a (non-proper) face
of itself. Thus, in addition to the relation B, one may consider another binary
relation in E, namely the face relation F: the pair of cells (e', e'') is in F if either
(e', e'') ∈ B or e' = e''. The relation F is obviously antisymmetric, transitive,
and reflexive. Hence, it is an order in E. Consequently, B and F are related
to each other in the same way as the well-known relations “<” and “≤”.
Therefore, it is usual to write e' < e'' instead of (e', e'') ∈ B.


The topological structure of an ACC is defined as follows.
Definition 2: A subset S of E is called open in C if for every element e' of S,
all elements of C that are bounded by e' are also in S.
Consider as an example the complex C in Fig. 5. There are two subsets
drawn in it as hatched areas, heavy lines, and fat points. The subset S' = {e1,
e2, e3} is open: there are no elements in C bounded by e2 nor by e3; the only
elements of C bounded by e1 are e2 and e3. Thus, all elements bounded by the
elements of S' are in S'. The subset S'' = {e4, e5} is not open since e5 bounds
e6, which is not in S''.
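
Definition 2 then becomes a one-line membership test over this hypothetical representation (reusing the Complex class sketched above):

```python
def is_open(S, C):
    """Definition 2: S is open iff every cell of S drags along all cells
    that it bounds."""
    return all(C.bounds[e] <= S for e in S)

# A tiny complex in which e1 bounds e2 and e3, and e5 bounds e6:
D = Complex({"e1": 0, "e2": 1, "e3": 1, "e4": 2, "e5": 1, "e6": 2},
            {"e1": {"e2", "e3"}, "e5": {"e6"}})
assert is_open({"e1", "e2", "e3"}, D)    # S'  of the example
assert not is_open({"e4", "e5"}, D)      # S'' of the example
```
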
It was shown in Kovalevsky (1989a) that for any finite topological space
there exists an ACC with an equivalent topological structure. A particular
feature of ACC’s is, however, the presence of the dimension function. Due to
this property, ACC’s are attractive for applications: dimensions make the
concept descriptive and comprehensible for non-topologists. It is possible to
make drawings of ACC’s to demonstrate topological evidence (e.g., Figs. 4
and 5), a possibility lost, unfortunately, during the modern phase of topo-
logical development. ACC's, invented many years ago (Steinitz, 1908), have
recently been discussed more and more because of their attractive features (see,
e.g., Herman, 1990, and Kong and Rosenfeld, 1991). Therefore, we shall
restrict ourselves to considering ACC’s as representatives of finite topological
spaces. We shall need in the sequel the following properties of ACC’s. (We
shall write complex for ACC.)
Definition 3: A subcomplex S = (E’, B‘, dim‘) of a given abstract complex
C = ( E , B, dim) is a complex whose set E’ is a subset of E and the relation B’
is the intersection of B with E' × E'. The dimension dim' is equal to dim for all
cells of E’.

This definition is important because it makes clear that to define a subcomplex
S of C = (E, B, dim) it suffices to define the corresponding subset E' of
the elements of E. Thus, it is possible to speak of a subcomplex E' ⊂ E while
understanding the subcomplex S = (E', B', dim'). All subcomplexes of C may
be regarded as subsets of C and thus it is possible to use the common
formulae of set theory to define intersections, unions, and complements of
subcomplexes of the same complex C.
Consider as an example the subcomplex C' defined by the subset S' = {e1,
e2, e3} of the complex C shown in Fig. 5. Its set of cells is S', and its bounding
relation B' may be defined as the subset of cell pairs that are in the bounding
relation of the containing complex C, while both cells of the pair are in S'.
This set is B' = {(e1, e2), (e1, e3)}. The remaining pairs of cells of S' do not
belong to B' since they were not in the bounding relation B of the containing
complex C.
Definition 2 defines the notion of open subsets while simultaneously
defining the open subcomplexes of a given complex. According to the axioms
of topology, any intersection of a finite number of open subsets is open. In
a finite space there is only a finite number of subsets. Therefore, in a finite
ACC, which is a finite space, the intersection of all open subcomplexes
containing a given cell c is again an open subcomplex. It is called the smallest
open neighborhood of c in the given complex C , and will be denoted by
SON(c). Notice that there is no such notion for a connected Hausdorff space.
It is easy to see that SON(c) consists of the cell c itself and of all cells of C
bounded by c. Figure 6 shows some examples of the SON's of cells of
different dimensions in different complexes. The first column shows the
SON's in a one-dimensional complex C1; the SON of a 0-cell e0 bounding two
1-cells consists of e0 itself and of both 1-cells bounded by it. The SON of a
1-cell e1 is e1 itself since there are no cells in C1 bounded by it. Two other
columns show the SON's of different cells in a two-dimensional complex C2
and in a three-dimensional complex C3. Notice that the SON of a cell of the
highest dimension is always the cell itself.
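
With the bounding relation stored in transitively closed form, SON(c) is immediate; a sketch reusing the hypothetical Complex class above:

```python
def son(c, C):
    """Smallest open neighborhood: the cell itself plus everything it bounds."""
    return {c} | C.bounds[c]

# In the square complex C above, the SON of a 0-cell contains the 0-cell,
# two 1-cells, and the 2-cell; the SON of the 2-cell is the 2-cell alone.
assert son("p1", C) == {"p1", "l1", "l4", "a"}
assert son("a", C) == {"a"}
```
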
There is a notion dual to that of “open”:
Definition 4: A subcomplex S of E is called closed in C if for every element
e' of S, all elements of C that bound e' are also in S.
FIGURE 6. The smallest open neighborhoods (SON's) of k-dimensional cells ek in d-dimensional complexes Cd, d = 1, 2, 3.

FIGURE 7. Examples of closed and non-closed subsets.

Consider the complex C of Fig. 7. Its subcomplexes S1 = {p5, l5, p6},
S2 = {p1, p2, p3, p4, l1, l2, l3, l4, a1}, and S3 = {p7} are closed in C since there
is no cell in C that bounds a cell of Si (i = 1, 2, 3) and is not in Si. On the
other hand, the subcomplex S4 = {p8, p10, l6, l8, l9, a2} is not closed since the
cell p9 bounds a2 ∈ S4, but p9 is not in S4. The subcomplex S5 = {p12, l10} is
also not closed since the cell p13 bounds l10 but p13 is not in S5.
It is easy to see that the complement of an open subcomplex is closed and
vice versa. Notice, however, that there exist subcomplexes that are neither
open nor closed (e.g., S4 and S5 of Fig. 7). The notions of both “open” and
“closed” are relative: a subcomplex S' of another subcomplex S ⊂ C may be
open in S but not open in C, or S' may be closed in S but not closed in C.
E.g., the subcomplex S5 = {p12, l10} of Fig. 7, which is not closed in C (as
explained previously), is closed in S = S4 ∪ S5 = {p8, p10, l6, l8, l9, a2, p12, l10},
since there is no cell in S that bounds a cell of S5 and is not in S5. Thus, any
subcomplex is both open and closed in itself.
The smallest closed subcomplex of C that contains S is called the closure
of S in C. It is denoted by Cl(S, C), or simply by Cl(S) if it is obvious which
containing complex C is meant. The closure of a cell c is a notion dual to that
of SON(c): Cl(c) consists of c itself and of all cells of C bounding c. E.g., the
closure of the cell a1 in Fig. 7 is the subcomplex S2 as previously defined, the
closure of l5 is the subcomplex S1, and the closure of p7 is the subcomplex
S3 = {p7}.
FIGURE 8. Illustration of the connectedness relation.

Now consider the notion of connectivity. Let us first introduce the notion
of incident cells:
Definition 5: Two cells e' and e'' of a complex C are called incident with each
other in C iff either e' = e'', or e' bounds e'', or e'' bounds e'.
Regard the cells in Fig. 8: p1 is incident with l1, with l2, and with a1. Also,
l1 is incident with p1.
Consider now the transitive closure of this incidence relation. The new
relation will be declared the connectedness relation. As with any transitive
closure, it must be defined recursively:
Definition 6: Two cells e' and e'' of a complex C are called connected to each
other in C ifeither e' is incident with e", or there exists in C a cell c which is
connected to both e' and e".
Thus, p1 is connected to p5 in the complex C of Fig. 8 since both of them
are incident to a1. Further, p1 is connected to l10, since p1 is connected to l10
through a1, etc. However, l2 is not connected to l4 since they are not incident
with each other, nor is there a third cell in C that is connected to both of them.
It may easily be shown that the connectedness relation according to
Definition 6 is an equivalence relation (reflexive, symmetric, and transitive).
Thus, it defines a partition of a complex C into equivalence classes, called the
components of C. Thus, the complex C in Fig. 8 consists of two components,
one of which is the subcomplex S = {p3, p6, p7, l7, l9, l11, l14, l15, a3, a6} and
the other its complement C-S.
Definition 7: A complex C consisting of a single component is called
connected.
Thus, the complex C of Fig. 8 is not connected since it consists of two
components.
It is easy to see that Definitions 6 and 7 are directly applicable to subsets
of a complex C: any subset is, according to Definition 3, a subcomplex of C
and is again an ACC. E.g., the previously mentioned subcomplex S is
connected, as is its complement C-S. It is, however, important to
stress that all intermediate cells c mentioned in Definition 6 must belong to
the subset under consideration. Therefore, it is reasonable to regard an
equivalent definition of connected complexes:
Definition 8: A sequence of cells of a complex C beginning with c' and
finishing with c'' is called a path in C from c' to c'' if every two cells that are
adjacent in the sequence are also incident with each other.
Definition 9: A complex C is called path-connected if for any two cells c' and
c'' of C there exists a path in C from c' to c''.
Thus, one of the possible paths from p3 to l11 in the subcomplex S of Fig. 8
is the sequence (p3, a3, l9, a6, l11).
Kong et al. (1991) showed that Definitions 7 and 9 are equivalent. It is
important to notice that a component S of a complex C is simultaneously
open and closed in C. According to Definition 6 and the definition of a
component, there are no elements in C-S that are incident to elements of S .
Thus, all elements bounded by elements of S are in S; hence, S is open.
Similarly, all elements that bound some elements of S are in S; hence,
S is closed. This may serve as an explanation for the definition of connected
sets that is common in general topology: a set S is called connected if it
cannot be represented as a union of two non-intersecting subsets that are
open (closed) in S. This definition, obviously equivalent to both Definitions
7 and 9, presents great difficulties to a non-topologist who forgets that the
notion of an open (closed) subset is relative and that any subset is open
(closed) in itself.
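
Definitions 6 through 9 suggest a direct algorithm: derive the incidence relation from the bounding relation and search the resulting graph. A sketch, again over the hypothetical Complex representation introduced earlier:

```python
def components(C):
    """Partition the cells of C into connectedness classes (Definitions 6-9)
    by a graph search over the incidence relation."""
    incident = {e: set() for e in C.dim}
    for e, bounded in C.bounds.items():
        for f in bounded:          # e bounds f, hence e and f are incident
            incident[e].add(f)
            incident[f].add(e)
    parts, seen = [], set()
    for start in C.dim:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            e = stack.pop()
            if e not in part:
                part.add(e)
                stack.extend(incident[e] - part)
        seen |= part
        parts.append(part)
    return parts

# The closed square complex C from the earlier sketch is connected:
assert len(components(C)) == 1
```
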

III. IMAGES ON CELL COMPLEXES

A two-dimensional image is usually considered as an assignment of gray
values or colors to a set of pixels. In image analysis it is necessary to use
topological notions such as connectivity, boundary, inclusion, etc. Therefore,
it is necessary to replace the set of pixels by a two-dimensional finite topologi-
cal space, e.g., by an abstract cell complex (ACC). It is also necessary to apply
geometrical methods for analyzing measures and shapes of image parts. For
this purpose it is important to have coordinates in images. A natural way of
introducing coordinates in the ACC's consists in constructing ACC's with
some special simple structure, explained in the following.
Consider a one-dimensional connected ACC in which any 0-cell, except
two of them, has exactly two incident 1-cells. Such an ACC looks like a
polygonal line. It is possible to assign subsequent integer numbers (in
addition to dimensions) to the cells in such a way that a cell with the number
x is incident with cells having the numbers x - 1 and x + 1. These numbers
are considered as coordinates in a one-dimensional space.
Complexes of greater dimensions are defined as Cartesian products of such
one-dimensional complexes. A product complex is called a Cartesian ACC.
The set of cells of an n-dimensional Cartesian ACC is the Cartesian product
of n sets of cells of one-dimensional ACC's Ci. Thus, a cell of the n-dimensional
Cartesian complex is an n-tuple (a1, a2, . . . , an) of cells ai of the
corresponding one-dimensional complexes, ai ∈ Ci. The bounding relation of
the n-dimensional complex Cn is defined as follows: the n-tuple (a1, a2, . . . , an)
bounds another distinct n-tuple (b1, b2, . . . , bn) iff for all i = 1, 2, . . . , n the
cell ai is incident with bi in Ci and dim(ai) ≤ dim(bi) in Ci. The dimension of
a product cell is defined as the sum of dimensions of the factor cells in their
one-dimensional spaces.
Coordinates of a product cell are defined by the vector whose components
are the coordinates of the factor cells in their one-dimensional spaces. A
fragment of a two-dimensional Cartesian ACC is shown in Fig. 9.

FIGURE 9. Composition of a two-dimensional Cartesian complex.

Thus, the 1-cell (cx+1, cy+2) of the two-dimensional product complex is a pair consisting
of the 1-cell cx+1 of the one-dimensional complex Cx and the 0-cell cy+2 of the
one-dimensional complex Cy. The coordinates of (cx+1, cy+2) are (x + 1, y + 2).
The 2-cell (cx+1, cy+1) of the product complex consists of the 1-cell
cx+1 and the 1-cell cy+1. The coordinates of (cx+1, cy+1) are (x + 1, y + 1).
Similar spaces that do not regard dimensions of space elements were
considered by Khalimsky (1977) (see also Kong et al., 1991). It is easy to see
that a Cartesian ACC represents a finite analogue of a Cartesian Euclidean
space.
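
A convenient convention, assumed in the following sketch but not prescribed by the text, lets even coordinates number the 0-cells and odd coordinates the 1-cells of every one-dimensional factor (as in Khalimsky-style coordinates); dimension and bounding then reduce to parity arithmetic:

```python
def dim_cell(cell):
    """Dimension = number of odd coordinates, assuming even coordinates
    denote the 0-cells of the one-dimensional factors."""
    return sum(c % 2 for c in cell)

def bounds_cell(a, b):
    """Does cell a bound cell b?  On every axis the factor cells must either
    coincide or the (even) 0-cell of a must bound the adjacent 1-cell of b;
    the tuples must be distinct."""
    if a == b:
        return False
    return all(ai == bi or (ai % 2 == 0 and abs(ai - bi) == 1)
               for ai, bi in zip(a, b))

# The pixel (2-cell) at (1, 1) is bounded by its four "crack" 1-cells and
# its four corner 0-cells:
assert dim_cell((1, 1)) == 2
assert bounds_cell((0, 1), (1, 1))      # a 1-cell of the left side
assert bounds_cell((0, 0), (1, 1))      # a corner 0-cell
assert not bounds_cell((1, 1), (0, 1))  # bounding is one-directional
```
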
An n-dimensional image (n = 2 or 3) is defined by assigning numbers (gray
values or densities) to the n-dimensional cells of an n-dimensional Cartesian
ACC. There is no need to assign gray values or densities to cells of lower
dimensions. Such an assignment would be unnatural since a gray value may
be physically determined only for a finite area. We interpret 2-cells in a
two-dimensional ACC as elementary areas. Cells of lower dimensions have
area equal to zero. Similarly, a density may be physically determined only for
a finite volume that is represented in a three-dimensional ACC by a 3-cell.
However, when considering the connectivity of subsets (subcomplexes),
the membership in a subset under consideration must be specified for cells of
all dimensions in such a way that each cell of the ACC is declared to belong
either to the subset or to its complement. If more than one subset of a given
ACC is being regarded, then a partition of the ACC in disjoint subsets must
be considered and each cell of the ACC must be assigned to exactly one
subset of the partition. The membership may be determined by assigning
labels to the cells. A label may be considered as the identification number of
a subset. In the simplest case of a binary image there are just two subsets, e.g.,
the black and the white ones. Then both the gray values (densities) and the
membership labels may consist of a single bit. The membership labels of the
n-cells may then be identical with their gray values. This tempts one to
interpret the membership labels of the lower dimensional cells as gray values,
which is not correct for the previously mentioned reasons. It is better
to distinguish between gray values or densities on the one side and mem-
bership labels on the other.
As soon as membership labels are assigned to all cells (of all dimensions)
of an ACC, the connectivity of its subsets may be consistently specified by
Definitions 7 or 9. It is important to stress that the connectivity is determined
by means of the lower dimensional cells, which serve as “cement” joining
n-dimensional cells. A set consisting of only n-dimensional cells is always
disconnected.
When storing the membership labels of a two-dimensional ACC explicitly,
i.e., in an image memory, four times more memory space is required as
compared with the space required for the 2-cells only (Kovalevsky, 1989a).
This quotient is equal to eight in the three-dimensional case. However, such
a great memory volume is rarely (if ever) needed in practice. There are many
ways to use a priori knowledge about the image under consideration to obtain
some implicit determination of the membership labels of a lower dimensional
cell as a function of the labels and gray levels of the n-cells bounded by it. The
determination may be realized by means of the so-called membership rules.
Such a rule cannot be chosen arbitrarily; it must specify the membership
of every cell of the ACC, and the membership of a cell must be specified
uniquely. E.g., it is not allowed to specify all faces of an n-cell as belonging
to the same subset as the n-cell itself since the same cell may be a face of two
different n-cells belonging to different subsets.
Consider some examples of membership rules. One of the simplest rules for
two-dimensional images assigns every 1-cell to the same subset as the 2-cell
that is incident with it and lies directly below or to the right of it. Every 0-cell
is assigned together with the incident 2-cell below and to the right of it. It is
easy to see that under this rule, the set of pixels shown in Fig. 10a is connected
and that in Fig. 10b is disconnected. Thus, the connectivity is non-isotropic, as
in the case of the 6-adjacency (Fig. 1b).

FIGURE 10. The south-east membership rule: (a) a connected and (b) a disconnected subset.
An isotropic and more practically useful rule follows.
Maximum Value Rule. In an n-dimensional ACC, every cell c of dimension
less than n gets the membership label of the n-cell that has the maximum gray
value (density) among all n-cells bounded by c.
Under this rule, both sets of pixels in Figs. 10a and 10b are connected if the
pixels of the sets have a greater gray value than those of the background. The
membership of the 0-cells in Fig. 10b is changed, thus making the set
connected. It is, of course, possible to formulate a similar Minimum Value
Rule. The connectivity of a binary image is similar in both cases to that
obtained according to the widely used idea of an 8-adjacency for objects and
a 4-adjacency for the background (Rosenfeld and Kak, 1976). An important
advantage of the Maximum (Minimum) Value Rule is the possibility of using
it for multi-valued images. A slightly more complicated and also practically
useful rule may be found in Kovalevsky (1989a). Situations in which an
explicit specification of the membership labels may be useful are also
discussed there.
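
A sketch of the Maximum Value Rule for a two-dimensional Cartesian ACC in the parity coordinates assumed earlier; the helper name max_value_label and the treatment of out-of-image pixels as background of value 0 are assumptions:

```python
def max_value_label(cell, gray):
    """Maximum Value Rule: a cell of dimension < 2 gets the label (here the
    gray value itself) of the brightest pixel (2-cell) it bounds.

    `gray` maps odd-odd pixel coordinates to gray values; pixels outside
    the image are treated as background of value 0 (an assumption)."""
    x, y = cell
    # Every even coordinate is shifted to the two neighboring odd ones,
    # enumerating exactly the pixels bounded by the cell:
    xs = (x - 1, x + 1) if x % 2 == 0 else (x,)
    ys = (y - 1, y + 1) if y % 2 == 0 else (y,)
    return max(gray.get((px, py), 0) for px in xs for py in ys)

# Two diagonal pixels of value 9 (the situation of Fig. 10b): the corner
# 0-cell (2, 2) between them inherits their label, making the pair connected.
gray = {(1, 1): 9, (3, 3): 9}
assert max_value_label((2, 2), gray) == 9
```
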

IV. RESOLUTION OF CONNECTIVITY CONTRADICTIONS

When defining connectivity according to Definitions 7 or 9, contradictions
disappear as soon as one defines the subset under consideration by specifying
the membership in it for cells of all dimensions rather than for pixels only.
Consider the simplest contradictory situation in Fig. 11, which is well known
due to Pavlidis (1977, 1982). When applying the concept of 8-adjacency to it,
“a path from A to B can intersect a path from C to D, even though both of
them lie entirely in disjoint sets.” This contradiction is resolved in cell
complexes if the 0-cell X is known to belong to one of the complementary
subsets: either to the one containing A and B or to the one containing C and D.
Then only one of these subsets is connected, namely the one containing the 0-cell X.

FIGURE 11. The connectivity paradox.
Similarly, it may be shown that the Jordan theorem holds for ACC’s. It is
natural to think of a Jordan curve in an ACC as a path according to
Definition 8 that is simple and closed, i.e., every cell of the path is incident
with exactly two other cells of it (Fig. 12a).

FIGURE 12. (a) A Jordan curve, and (b, c) SON's of its 0-cells.
Jordan Theorem for ACC’s: If a one-dimensional subcomplex S of a two-
dimensional Cartesian complex C is a simple closed path in C then the
complement C-S consists of two components.
Proof: Consider the SON’s of all 0-cells of S. The SON’s shown in Fig. 12b
and 12c obviously have the property that the subset SON-S consisting of cells
of a SON which are not in S has two components corresponding to cells lying
on both sides of S. Since C is a Cartesian ACC consisting of rows and
columns, any SON of a 0-cell of S must have one of the shapes shown in
Figs. 12b and 12c, differing from them only by a rotation by 90° or 180°. Hence,
in any case the complementary complex SON-S consists of two components.
Let p1 and p2 be two 0-cells of S that are adjacent in the path S, i.e., they have
a common 1-cell l of S incident with both of them (Fig. 13a). Each of the
components of SON(p1) has one 2-cell in common with one of the
components of SON(p2). This is one of the 2-cells belonging to SON(l) (e.g.,
a2 and a3 in Fig. 13a). Thus, the union of a component of SON(p1) with one
of the components of SON(p2) composes a connected subset of C. When
repeatedly composing such unions for all subsequent 0-cells of S one obtains
two disjoint connected open subsets O' and O'', separated by S (Fig. 13b).
These are the two components of the set U-S, where U is the union of the SON's
of all 0-cells of S. Notice that the SON of any 1-cell of S contains one 2-cell
belonging to O' and one belonging to O''.
FIGURE 13. (a) Common 2-cells of two SON's, and (b) a path connecting a cell c' to the set O'.

Now we shall show that any cell of C-S is connected in C-S either with O'
or with O''. Take any cell c' and construct an L-shaped path P connecting it


with an arbitrary 2-cell a’ of 0’.The path must be composed of alternating
2- and 1-cells, as shown in Fig. 13b. Denote by a” the 2-cell that is closest to
c’ in the path P and is incident to some I-cell of S , thus belonging either to
0‘ or to 0”. The part of P from c’ to a” does not cross S. Otherwise, there
is in P a 2-cell different from a“ that is closer to c‘ than a” and that contradicts
the definition of a”. If the whole path P does not cross S then a” coincides
with a’ and thus c’ is connected in C-S to 0‘.Otherwise, c’ is connected in
C-S to a”, which belongs either to 0’or to 0“.Therefore, c’ is connected in
C-S either to 0’ or to 0”,which proves the theorem.
It is also possible to prove this theorem for a more general case when C is
an arbitrary n-dimensional manifold and S is an (n - 1)-dimensional
manifold in it satisfying certain conditions (called null-homologous in
topology); however, this is a topic for another publication.
Thus, we see that the connectivity paradox is resolved in ACC’s. Neverthe-
less, the question remains: why are adjacency graphs not in accordance with
the topology of two-dimensional spaces, and under which conditions may such
an accordance be established?
It will be demonstrated now that for any adjacency graph G there exists an
isomorphic topological space such that any connected subgraph of G corres-
ponds to a connected subset of the space and vice versa. The difficulty is that
it is impossible for certain graphs to imbed this space into a Cartesian space.
However, such imbedding is desirable, since we believe we live in a Cartesian
space and digital images must be considered as mappings of such a space into
a finite space that logically must also be Cartesian.
Without loss of generality it is possible to consider ACC’s as the desired
isomorphic topological spaces since, as we know, for any finite space there
is an isomorphic ACC (Kovalevsky, 1989a). The ACC corresponding to a
given “two-dimensional” adjacency graph is constructed by assigning a 2-cell
to each vertex and a 1- or 0-cell to each edge. A 1- or 0-cell is declared to
bound a 2-cell if the corresponding edge is incident with the corresponding
vertex. It is easy to show that under such a mapping any connected subgraph
corresponds to a connected subcomplex and vice versa. However, to define
a subgraph of a given graph G when starting from a subset V of vertices of
G , a rule must be specified that determines which edges of G belong to the
subgraph. Usually, the following rule is implied: an edge e‘ belongs to the
subgraph induced by the subset V of vertices iff both vertices incident with
e‘ are in V . Consider an 8-adjacency graph and its subgraph, shown in Fig.
14a. The isomorphic abstract cell complex is shown in Fig. 14b, while
corresponding elements are denoted by identical symbols. The bounding
relation is shown by arrows: an arrow is directed from a bounding cell to that
being bounded. It is easy to see that under such a correspondence between


a graph and an ACC, every pair of a vertex with an incident edge corresponds
to a pair of incident cells in the ACC and vice versa. Hence, there is a
one-to-one correspondence between paths in the graph and paths in the
ACC, the latter being defined according to Definition 8. Consequently, there
is also a one-to-one correspondence between connected subgraphs and
connected subcomplexes.

FIGURE 14. Relation among (a) an adjacency graph, (b) an abstract cell complex, and (c) a Cartesian complex.
However, it is impossible to imbed this complex into a two-dimensional
Cartesian complex. Thus, Fig. 14c shows a Cartesian complex almost isomor-
phic to that of Fig. 14b; corresponding cells are denoted by the same symbols.
However, to make the isomorphism complete, both cells e5 and e6 should be
mapped onto the 0-cell X. These cells, like the corresponding edges of the
graph in Fig. 14a, may sometimes belong to different subsets, e.g., e5 may
belong to a subset S while e6 belongs to its complement G-S. The cell X,
however, may belong only to one subcomplex. A compromise is only possible
if both edges e5 and e6 of Fig. 14a are always declared to belong to the same
subgraph. This is possible when introducing a corresponding membership
rule. E.g., if we consider a subgraph as an “object” and its complement as a
“background,” then such a rule may look like the following: a “direct” edge,
such as e1 or e2, belongs to the object if both vertices that are incident with
it belong to it; both coupled “diagonal” edges, such as e5 and e6, belong to
the object iff either v1 and v4 or v2 and v3 belong to it; otherwise, both e5 and
e6 belong to the background.
To make adjacency graphs consistent for non-binary images, another
version of a membership rule may be suggested: numbers must be assigned
to subsets of vertices (e.g., gray values), to corresponding subgraphs, and
their elements; both e5 and e6 must then belong to the subgraph with the
maximum number among those of v1, v4, v2, and v3.
Similar rules may be introduced in the three-dimensional case. The
following three situations must be distinguished:
a) Two voxels have a common 2-dimensional face. In the adjacency graph
there is a direct edge connecting these two voxels. Its membership is a
function of the memberships of the same two voxels.
b) Four voxels have a common 1-dimensional edge. In the adjacency
graph there are four direct and two “diagonal” edges connecting them. The
membership of these two diagonal edges is a function of the memberships of
the four voxels. This situation exactly corresponds to that of the 8-adjacency
in the two-dimensional case as previously considered.
c) Eight voxels have a common 0-dimensional vertex. In the adjacency
graph there are twelve direct, twelve diagonal (such as those of case b), and
four “double diagonal” edges connecting them. The membership of these
four double diagonal edges is a function of the memberships of the eight
voxels.
In the n-dimensional case, a constellation of 2^(k−1) “multi-diagonal” graph
edges corresponds to a group of 2^k n-dimensional space elements having a
common (n−k)-face. The membership of the whole constellation is a function
of the memberships of the 2^k elements of the group.
There are n different kinds of constellations of graph edges, which should
be considered differently to obtain a consistent imbedding into a Cartesian
space. This is the price to pay for reconciling adjacency graphs with
Cartesian spaces. The number n of kinds of constellations is exactly the
number of different dimensions of cells minus one (the n-dimensional cells are
excluded). However, it is much simpler and more descriptive to consider
n + 1 different kinds of cells in an ACC than n different constellations of
edges in an adjacency graph. This is one of the obviously important advan-
tages of the ACC’s as compared to adjacency graphs.

V. BOUNDARIES IN COMPLEXES

One of the most important topological notions is that of the boundary. As
seen in Section I, defining boundaries on the basis of adjacency relations or
adjacency graphs is associated with certain difficulties (see, e.g., Pavlidis,
1977). On the other hand, the theory of finite topological spaces, particularly
cell complexes, leads to a simple and consistent definition. The notion of a
boundary is for ACC’s similar to that in general topology:
Definition 10: The boundary (frontier) of a subcomplex S ⊂ C relative to C
is the subcomplex Fr(S, C) consisting of all cells c of C such that SON(c)
contains cells both of S and of its complement C-S.
FIGURE 15. (a) A subcomplex and (b) its boundary.

Figure 15a shows an example of a subcomplex S of a two-dimensional ACC
and Fig. 15b its boundary. The subcomplex S contains the 0-cells p1, p2, p4,
p6, p8, and p10, which are marked by small fat squares. It also contains the
1-cells l1, l2, l4, l6, l8, l10, l12, l13, l14, and l15, which are represented by fat lines,
and the 2-cells a1, a2, a4, and a6, shown as big shaded squares. Figure 15a
demonstrates various particular cases that may occur when determining a
boundary. The cell l2 belongs to the boundary of S because its SON, consisting
of the cells a0, l2, and a1, contains both cells of S (l2 and a1) and a cell of
the complement C-S (the cell a0). The cell l9 also belongs to the boundary
because its SON, consisting of the cells a2, l9, and a6, contains the cells a2 and
a6, which are in S, but l9 itself belongs to the complement. The 0-cell p11 does
not belong to the boundary since its SON, consisting of p11 itself, four 1-cells,
and four 2-cells incident with p11, is completely in the complement of S. The
SON's of most other 0-cells intersect both S and its complement and thus
belong to the boundary.
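
Definition 10 combines with the son sketch above into a boundary extractor (again a hypothetical helper, not the chapter's algorithm):

```python
def frontier(S, C):
    """Fr(S, C): all cells whose smallest open neighborhood meets both S
    and the complement C-S (Definition 10)."""
    return {c for c in C.dim
            if (son(c, C) & S) and (son(c, C) - S)}

# The boundary of the open pixel S = {"a"} in the square complex C is the
# surrounding polygon of four 1-cells and four 0-cells; no 2-cell is in it.
assert frontier({"a"}, C) == set(C.dim) - {"a"}
```
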
Now consider the differences between boundaries according to Definition
10 and boundaries in adjacency graphs. First of all, let us notice that a
boundary Fr(S, C) in an n-dimensional complex C contains no n-dimensional
cells since n is the highest dimension, and hence an n-cell bounds no cells of
C. Therefore, the SON of such a cell consists of a single cell, which is the cell
itself. Hence, such a SON cannot contain cells of both S and its complement
and the cell cannot belong to the boundary. Consequently, the boundary of
S is a subcomplex of a lower dimension equal to n - 1.
Thus, the boundary of a region (a connected open subcomplex) in a
two-dimensional ACC contains no pixels and consists of 0- and 1-cells. It
looks like a closed polygon (or like several polygons, if the region has holes
in it). The boundaries so defined are analogous to those considered by Elliott
and Srinavasan (1981) or to the “(C,D)-borders” (sets of “cracks”) briefly
mentioned by Rosenfeld and Kak (1976, second edition). Similarly, the
boundary of a region in a three-dimensional ACC contains no voxels and
consists of 0-, 1-, and 2-cells. It looks like a closed surface of a polyhedron
(or several surfaces, if the region has holes). A 2-cell of a boundary separates
a voxel of the region from a voxel of its complement. Thus, the 2-cells of the
boundary are the “faces” considered by Herman and Webster (1983). We
may now see that the theory of the ACC’s brings many intuitively introduced
notions together in a consistent and topologically well founded concept.
The next peculiarity of the boundary Fr(S, C) is that it is unique: there is
no need (and no possibility!) to distinguish between the inner and outer
boundaries, defined by Pavlidis (1977), or between the “D-border of C” and
“C-border of D,” defined by Rosenfeld and Kak (1976, second edition). A
boundary according to Definition 10 is the same for a subset and for its
complement, since Definition 10 is symmetric with respect to both subsets.
Remember that this was not the case for adjacency graphs.
The boundary now depends neither on the kind of adjacency (which notion
is no longer used) nor on the membership rules as defined in Section 111. To
prove the last assertion we need one more notion:
Definition 11: A membership rule for an n-dimensional complex is called
local if the membership label of a cell c’ specified by this rule is equal to that
of some n-dimensional cell bounded by c’.
Theorem 1: The boundary of an n-dimensional subcomplex S consisting of
a set of n-cells and of cells of lower dimensions assigned to S by some local
membership rule does not depend on the choice of this local rule.
Proof: Consider a boundary cell c‘ of S. The dimension of c’ is obviously
less than n. Without loss of generality, we may suppose that c‘ is assigned by
the membership rule to S (rather than to its complement). Since the rule is
local, there must be in the SON(c’) an n-dimensional cell belonging to S.
Since c’ belongs to the boundary of S there must also be at least one cell c”
in SON(c’)that does not belong to S. The dimension of c” is higher than that
of c’ since SON(c’) contains, besides c’ itself, only cells bounded by c’. If the
dimension of c'' is equal to n then SON(c') contains at least one n-dimensional
cell that is not in S. If, however, dim(c'') < n then the membership of
c'' is specified by the local membership rule and hence there must be an
n-dimensional cell c''' bounded by c'' that does not belong to S. According to
the transitivity of the bounding relation, c''' is also bounded by c' and hence
belongs to SON(c'). Thus, in any case there is in SON(c') an n-dimensional
cell not belonging to S . It has been shown previously that there is also in
SON(c’) an n-dimensional cell belonging to S. The membership of the n-
dimensional cells does not depend on the membership rule. Therefore, c’ will
belong to the boundary of S independently of the membership of the cells of
dimensions less than n and thus independently of the choice of a local rule.
A similar consideration may be repeated for the case when c’ is not in S .
The next important property of a boundary of a subcomplex (satisfying
some commonly fulfilled conditions) is that it has no end points, as was the
case for the 4-boundaries in Fig. 2b. The proof of this assertion demands,
however, additional definitions to specify the just-mentioned conditions.
Therefore, we shall not give it here.
The notion of adjacent regions, which was profoundly investigated by
Pavlidis (1977) in connection with boundaries, may successfully be replaced
by that of incident ones.
Definition 12: Two non-intersecting subcomplexes S1 and S2 of a complex C
are called incident with each other if there are two cells e' ∈ S1 and e'' ∈ S2 such
that one of them bounds the other.
It may be easily shown that the boundaries of incident subcomplexes
intersect. The solution of this problem, suggested by Pavlidis (1977) by means
of the so-called extended boundaries, may be considered as an intuitive
anticipation of the topological results.
Consider the problem of the area of a boundary in a two-dimensional
complex. As explained earlier, a boundary of any subset S of such a complex
C contains no 2-cells. Therefore, such a boundary is a one-dimensional
subcomplex of C consisting only of 0- and 1-cells, i.e., of line elements and
points. It is natural to assign a non-zero area to the 2-cells (pixels) only. Then
the area of any one-dimensional complex is equal to zero, which is in
accordance with our intuition. Similarly, the boundary of any subset of a
three-dimensional ACC contains no 3-cells and in an n-dimensional complex
no n-cells. Hence, the volume of a boundary or, in general, its n-dimensional
measure is equal to zero.
We have demonstrated that the theory of the ACC’s, being a consistent
branch of classical, well-proven topology, removes all topological paradoxes
and contradictions from the theory of digitized images. It may be applied
without any change to describe finite topological spaces of any dimension.
Why is the notion of a boundary in an adjacency graph topologically
incorrect? Alexandroff (1937) has shown that a connected finite topological
space in which any two space elements have different SONs is a T₀-space.
This means that for any two space elements there exists an open subset that
contains exactly one of these elements. If e″ is an element of SON(e′) then
certainly e′ does not belong to SON(e″). Thus, SON(e″) must be a proper
subset of SON(e′) and must consist of fewer elements than SON(e′). This is
the decisive property of finite spaces: there are space elements with neighborhoods
of different cardinality. On the contrary, the main idea of the
adjacency graphs is the uniformity of the elements. However, it is impossible
under this assumption to define boundaries that are in accordance with
general topological concepts and with our intuition.
We shall show in the next sections that the notion of boundaries according
to Definition 10 leads to simple and efficient algorithms useful for applications.

VI. SIMPLE IMAGE ANALYSIS PROBLEMS

The concept of abstract cell complexes not only makes the theory of digital
image analysis and computer graphics free of contradictions, it also enables
one to develop elegant, simple, and comprehensible algorithms. Consider
first the problem of tracking boundaries of regions in two-dimensional raster
images. Regions may be specified in the usual way, that is, by labelling all
pixels (2-cells) of a region by some label different from those of adjacent
regions. Cells of lower dimensions need not be labelled explicitly; their
membership in a region may be defined in the most practical cases by a
membership rule as explained in Section III. According to the Maximum
Value Rule, every 0- and 1-cell c gets the label of the 2-cell bounded by it that
has the maximum gray value. Under such a labelling, the boundaries cannot
contain isolated 0-cells. Therefore, it is sufficient to test only the 1-cells for
their membership in a boundary. According to Definition 10, a 1-cell c’
belongs to a boundary of a subset S iff the SON(c’) consisting of c’ itself and
of two incident 2-cells (see Fig. 6) intersects both S and its complement. Since
the subsets are defined by labels, c’ belongs to a boundary iff the three cells
do not have the same label. If the membership rule is a local one then the label
of c’ is always equal to one of the two labels of the incident 2-cells. Thus, it
is sufficient to test these two labels.
The tracking algorithm described below is identical with the “crack
following” (Rosenfeld and Kak, 1976, second edition). Our description is
given in terms of cell complexes, which has the advantage that it is topologic-
ally justified and more comprehensible.
The algorithm goes from one 0-cell to the next, step by step, in such a
direction that the region with the chosen label (the object) always remains to

FIGURE 16. Explanation of the boundary tracking algorithm.

the right-hand side of the direction. These moves go along the 1-cells, which,
in a two-dimensional Cartesian ACC, are either horizontal or vertical. Thus,
there are only four possible directions, as shown in Fig. 16. Having only four
directions rather than eight, as is usual when tracking boundaries in
adjacency graphs, is already a contribution toward simplifying the algorithm.
When arriving at the next 0-cell p , the direction of the last step that led to
p is known. Thus, it is known that the 2-cell lying to the right of this direction
belongs to the object, and that the one to the left belongs to the background
(Fig. 16). In this way, the membership of two pixels of SON(p) is already
known. It is only necessary to test the labels of the remaining two pixels of
SON(p) lying ahead: one to the right and one to the left of the direction of
the last step (R and L in Fig. 16). Consider the case when the object has a
greater gray level than the background, and accept the Maximum Value Rule
to determine the membership of the 0-cells. Then the actual 0-cell p (denoted
by a circle in Fig. 16) always belongs to the object, because p is a boundary
cell and, according to Definition 10, there must be in the SON(p) at least one
object pixel. The pixel having the maximum gray level determines the
membership of p.
The direction of the next step depends upon the labels of R and L in the
following way: if L is in the object then turn left, else if R is in the background
turn right, else retain the old direction. A similar decision rule must be used
in the case when the object has a smaller gray level than the background. This
decision rule is the kernel of the tracking algorithm. The rest consists of some
obvious procedures:
a) calculating the coordinates of the pixels R and L as functions of the
actual coordinates of p and the direction;
b) calculating the new direction when turning to the right or to the left;
and
c) calculating the new coordinates of p after having made the next step in
the new direction.
Procedures a) and c) may be easily realized by means of small arrays of
constants serving as lookup tables for the coordinate increments depending
on the direction. Procedure b) may be realized as increasing or decreasing the
direction value by 1 modulo 4 (if the directions are encoded by numbers
from 0 to 3). The whole procedure, including the definition of the lookup
tables, contains about 20 Pascal instructions. Tracking algorithms that do

FIGURE 17. Recognizing inner pixels (a, b) in adjacency graphs, and (c) in a cell complex.

not use the concept of cell complexes are much more complicated and less
comprehensible (compare, e.g., Rosenfeld and Kak, 1976, second edition; or
Pavlidis, 1982).
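
To make the kernel of the algorithm concrete, the following sketch shows the complete tracking loop in Python (the array layout, the helper names, and the language itself are our assumptions for illustration; the text above mentions a realization of about 20 Pascal instructions). The image is a two-dimensional array of region labels of the 2-cells, a 0-cell is addressed by the integer coordinates of a raster corner, and the directions are encoded by the numbers 0 to 3 (east, south, west, north), so that turning right or left means adding or subtracting 1 modulo 4:

# Lookup tables indexed by the direction 0..3 = east, south, west, north.
STEP  = [(1, 0), (0, 1), (-1, 0), (0, -1)]    # coordinate increment of a move
L_OFF = [(0, -1), (0, 0), (-1, 0), (-1, -1)]  # pixel lying ahead-left of the move
R_OFF = [(0, 0), (-1, 0), (-1, -1), (0, -1)]  # pixel lying ahead-right of the move

def track_boundary(labels, obj, start, start_dir):
    """Follow the boundary of the region labelled `obj`, keeping the object
    on the right-hand side; `start` is a 0-cell (corner coordinates) and
    `start_dir` the direction of the step that led to it."""
    height, width = len(labels), len(labels[0])

    def is_obj(px, py):                       # outside the raster = background
        return 0 <= px < width and 0 <= py < height and labels[py][px] == obj

    (x, y), d = start, start_dir
    path = [(x, y)]
    while True:
        lx, ly = x + L_OFF[d][0], y + L_OFF[d][1]
        rx, ry = x + R_OFF[d][0], y + R_OFF[d][1]
        if is_obj(lx, ly):                    # L in the object: turn left
            d = (d - 1) % 4
        elif not is_obj(rx, ry):              # R in the background: turn right
            d = (d + 1) % 4
        x, y = x + STEP[d][0], y + STEP[d][1] # move along the next 1-cell
        if (x, y) == start and d == start_dir:
            return path                       # the boundary is closed
        path.append((x, y))

Note that the three lookup tables realize the procedures a), b), and c) above, and that the if/elif cascade is precisely the decision rule stated for an object with the greater gray level.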
Consider now the problem of filling the interior of a closed curve. The
problem is obviously equivalent to that of deciding if a pixel is inside or
outside the curve: the inner pixels must be filled, the outer must not. The
decision is based on the fact that a ray that starts at a given point and goes
to infinity crosses the given curve an odd number of times if the point is inside
the curve, and an even number of times otherwise. Difficulty arises in
discriminating between crossing and tangency. It may be seen in Figs. 17a
and 17b that when describing curves as sets of pixels, situations may occur
in which it is impossible to decide correctly whether a pixel p is in the interior
of the curve when analyzing only the line containing p : the lines containing
p are identical in Figs. 17a and 17b, while p is inside the curve in Fig. 17a but
outside it in Fig. 17b. Algorithms not based on the concept of cell complexes
are rather complicated since they need to test three adjacent lines to decide
between crossing and tangency (compare, e.g., Pavlidis, 1982).
In the case of a cell complex, the ray is replaced by a horizontal open strip
consisting of alternating 2-cells and vertical 1-cells, all lying in a horizontal
row of the raster containing the pixel p (Fig. 17c). The curve is represented
as a 1-dimensional subcomplex consisting of alternating 0- and 1-cells. There
arises no problem of tangency since a horizontal strip does not contain
horizontal 1-cells. Crossings with the curve are only possible on vertical
1-cells. Therefore, the filling is reduced to scanning the image with the given
curve horizontally, row by row, and counting in each row the encountered
vertical 1-cells of the curve. Counting must start with 0 at the left side of each
row. For each pixel in the row the number of vertical 1-cells counted since
the start of the row must be tested. If the count is odd then the pixel must be
filled, otherwise not. In other words, filling of subsequent pixels in a row must
be started whenever the count becomes odd, and stopped whenever it
becomes even. In the image of Fig. 17c, the count becomes equal to 1 in the
second column. Thus, the pixels in columns 2 through 10 must be filled. In
the 11th column the count becomes 2 and the filling must be stopped. A
similar algorithm, again based on the notion of “cracks,” was described by
Rosenfeld and Kak (1976, second edition).
The filling procedure is important for computer graphics applications since
it enables fast and precise drawing of complex regions defined by their
boundaries. Calculating a boundary and then filling it requires much less time
than calculating all pixels of the region.
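
A sketch of this row-by-row filling in Python may look as follows (the encoding of the curve as a set of vertical 1-cells, each addressed by the pixel on its right-hand side, is our assumption):

def fill(width, height, vertical_cracks):
    """Mark as filled every pixel whose count of curve crossings, counted
    since the start of its row, is odd; `vertical_cracks` contains a pair
    (x, y) if the vertical 1-cell on the left side of pixel (x, y) belongs
    to the given closed curve."""
    filled = [[False] * width for _ in range(height)]
    for y in range(height):
        count = 0                          # crossings since the row start
        for x in range(width):
            if (x, y) in vertical_cracks:
                count += 1                 # the horizontal strip crosses the curve
            filled[y][x] = (count % 2 == 1)
    return filled

Since the open strip contains no horizontal 1-cells, crossings can occur only on the vertical 1-cells stored in the set, which is exactly the simplification gained from the cell complex.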
The advantages of cell complexes may be also demonstrated by the
example of a thinning problem, which consists in reducing the number of the
object pixels in a way similar to a “prairie fire,” destroying an object simul-
taneously at all boundary locations until two fire fronts collide. Thus, only
a skeleton line of every connected object must be left, while the connectivity
of all objects must be retained. Difficulty arises when regarding a boundary
as a sequence of pixels and there are two boundary pixels that are adjacent
in the region but not adjacent in the boundary sequence. The problem
consists in deciding which of the two pixels should be deleted, since if both
are deleted then the connectivity of the object may be damaged. The solution
of the problem is simple in the case of sequential algorithms: one of the pixels
under consideration may be chosen by means of some preference rule and
deleted. Then the decision about the possibility to delete the other pixel may
be taken according to the new situation occurring after the deletion.
However, the solution of the problem is more difficult in the case of
developing parallel thinning algorithms since there is no further possibility of
deleting one pixel and then investigating the new situation. Theories and
algorithms proposed for parallel thinning are numerous and complicated. On
the contrary, the solution based on cell complexes is again very simple. To
present it we need a new notion of an “open boundary,” which is dual to that
of the closed boundary specified by Definition 10:
Definition 13: The open boundary of a subcomplex S of a complex C relative
to C is the subcomplex Ob(S, C) consisting of all cells c of C such that the
closure Cl(c) contains cells both of S and of its complement C-S.
Remember (Section II) that the closure of a cell is a notion dual to the SON:
the closure Cl(c) in a complex C consists of c itself and all cells of C bounding
c.
The thinning algorithm consists in alternately finding the closed and the
open boundaries of the objects. After finding the closed boundary, each cell
c’ contained in it is tested: if the cells of the background that are bounded by
c‘ comprise exactly one connected component then c‘ must be deleted.
Similarly, for each cell c″ of the open boundary, the set of background cells
bounding c″ must be tested: if it consists of exactly one connected component
then c″ must be deleted. The process stops if all cells to be deleted have three
incident 1-cells in the background. (In this way deletion of end points is
prevented.) It is easy to see that this algorithm, being simple and elegant, may
be parallelized in such a way that at any step either the closed or the open
boundaries of the objects are defined and the appropriate cells deleted. The
same algorithm may realize a dilation if we simply interchange object and
background: dilation of the object is the same as thinning of the background.
It can be shown that in both cases the connectivity of all object and back-
ground components is preserved.
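
Both boundary operators are easy to realize when the cells of a two-dimensional Cartesian complex are represented by combinatorial coordinates, a cell being a pair (i, j) whose dimension equals the number of its odd coordinates. The following sketch (this coordinate representation and the Python realization are our assumptions, not part of the theory above) computes the closed boundary of Definition 10 and the open boundary of Definition 13 for a subcomplex given as a set S of such pairs:

from itertools import product

def son(c):
    """Smallest open neighborhood: the cell c together with all cells
    bounded by c (even coordinates open up to both odd neighbors)."""
    i, j = c
    di = [0] if i % 2 else [-1, 0, 1]
    dj = [0] if j % 2 else [-1, 0, 1]
    return {(i + a, j + b) for a, b in product(di, dj)}

def closure(c):
    """The dual notion: the cell c together with all cells bounding c."""
    i, j = c
    di = [-1, 0, 1] if i % 2 else [0]
    dj = [-1, 0, 1] if j % 2 else [0]
    return {(i + a, j + b) for a, b in product(di, dj)}

def closed_boundary(S, cells):
    # Definition 10: SON(c) intersects both S and its complement
    return {c for c in cells if son(c) & S and son(c) - S}

def open_boundary(S, cells):
    # Definition 13: Cl(c) contains cells of both S and its complement
    return {c for c in cells if closure(c) & S and closure(c) - S}

One thinning step then deletes those cells of the closed (respectively open) boundary whose background neighbors form exactly one connected component, as described above; the connectivity test itself is omitted from the sketch.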

VII. THE CELL LIST DATA STRUCTURE

The theory of ACC’s provides a means for an efficient coding of images. In
many applications the aim of image analysis consists of testing geometrical
features of certain connected subsets considered as objects of interest. Also,
topological relations of objects such as adjacency, inclusion, etc., must be
analyzed. We now present a data structure called the cell list, which is well
suited for such analysis.
We regard an n-dimensional image (n = 2 or 3) as a mapping of the set of
all cells of highest dimension (n-cells or open cells) of an n-complex into a set
of integers representing the gray values (n = 2) or densities (n = 3). (The set
of open cells was considered in Kong et al., 1991, as the “open screen”.) From
here on we shall use the term “gray level” also to denote the density in the
case of n = 3.
After having applied a segmentation procedure to the image, one obtains
another image subdivided into subsets (segments). Each open cell of the
segmented image receives a label that is the identification number of the
segment. The labels are additional to the gray values, which are stored in the
original image. In Section III we called these labels the membership labels
since they specify the membership of the cells in the subsets determined as the
result of the segmentation. We suggest considering the segmentation
procedure as the assignment of membership labels to cells of the complex
carrying the image, including the cells of lower dimensions. In this way,
subsets of open cells are replaced by subcomplexes of the carrying complex.
The subcomplexes need not be connected. Their subdivision into
components is the next task, known as component labelling, which may only
be performed if the membership of all cells is already specified.
The membership of the open cells must be always specified explicitly, in the
form of labels (numbers) stored in an array in the same way as the gray
values. However, the membership of cells of lower dimensions may be
specified implicitly by a membership rule, as discussed in Section III. After
the segmentation, a two-dimensional image becomes subdivided into
comparatively large subcomplexes, each having a constant membership label.
Consider the components of these subcomplexes, their interiors, and their
boundaries. The interior of a component is a region, as commonly defined in
topology (a connected open subset). A region is a two-dimensional subcom-
plex. Let us assign to every region an element of an abstract set and call the
element a 2-dimensional block. The boundary of a region is a 1-complex,
which may have branching points at locations where three or more regions
meet. Let us assign to each branching point a 0-dimensional block. Each part
of a boundary separated by branching points (excluding these points) gets a
1-dimensional block assigned to it. Such a boundary part will be called a line.
If the boundary of a region has no branching points then every connected
component of the boundary should be considered as a line and get a
1-dimensional block assigned to it.
Let us now introduce a bounding relation between the blocks: 1-blocks
corresponding to lines may be bounded by 0-blocks, 2-blocks corresponding
to regions are bounded by 1- and 0-blocks. Thus, the collection of blocks with
the bounding relation and dimensions may be regarded as a new abstract cell
complex (ACC) whose cells are the blocks. This structure is known as a block
complex (Rinow, 1975). As with any ACC, it is a topological space. Its
topology is known to be the so-called quotient topology of the original ACC
with respect to the mapping, which maps each cell c of the original ACC onto
the block corresponding to the subcomplex to which c belongs. This mapping
is unique since the subcomplexes corresponding to blocks are by definition
disjoint: regions are open and thus have no common cells with the lines that
are parts of the boundaries; the lines do not contain the branching points. It
may easily be shown that every open subset (Definition 2) of the block
complex is a mapping of some open subset of the original ACC. Thus, all
necessary conditions for a quotient topology are fulfilled.
The structure of a block complex may be described in a computer by a list
of blocks with pointers which represent the bounding relation while indicat-
ing, for each block, which other blocks are bounding it or are bounded by it.
This topological part of the structure may be regarded as a generalization of
the well-known region adjacency graph (Strong and Rosenfeld, 1973; Pavlidis,
1977). The topological part is augmented by metric data consisting of coor-
dinates of the 0-blocks and intermediate points of the 1-blocks that define the
exact location of the lines. The coordinate pairs of intermediate points
compose a sequence. Every two points that are adjacent in the sequence must
be connected by a digital straight segment. All these data compose the
so-called cell list (Kovalevsky, 1989a). The above description may serve as a
definition of a 2-dimensional cell list.
A 3-dimensional cell list has a similar topological part but the metrical part
is more complicated. In addition to the 0- and 1-blocks, the location of the

FIGURE 18. A segmented image.

2-blocks must also be defined through metric data, e.g., in the form of a
triangulation in which the coordinate triples of some intermediate points are
accompanied by a list of “digital triangles.” Each record in the latter list
consists of three pointers indicating the intermediate points that are the
vertices of the corresponding triangle.
A cell list may be constructed automatically from a given segmented
2-dimensional raster image. The corresponding program finds the boundaries
of regions, tracks them, and resolves them into digital straight line segments
(DSS) (Kovalevsky, 1990). Thus, every boundary is represented as a
polygonal line. This, however, is an exact representation rather than an
approximation since the program encodes the DSS by their end points along
with some additional parameters that specify the exact location of the DSS.
Thus, a precise reconstruction of the original segmented image is possible.
The structure of the cell list may be explained by the example of a small
segmented image, as shown in Fig. 18. The corresponding list is shown in
Table I. The first column in the sublist of branching points contains the
identifiers of the 0-blocks used also in Fig. 18. These data are not stored in
the computer since they correspond to the addresses of the records. The next
two columns contain the coordinates of the points. The following four
columns contain the identifiers of all lines (1-blocks) bounded by the current
point. A line contacts its end point through a 1-cell. A point in a two-dimen-
sional Cartesian ACC is incident with at most four 1-cells lying to the east,
south, west, or north of the point. The identifier of each line is placed into
the column corresponding to one of these directions according to the position
of the incident 1-cell with respect to the current branching point. The lines
TABLE I

BRANCHING POINTS

              coordinates                   lines
No.       X       Y       east     south    west     north
P1        10      24      -L3      -L1      0        +L2
P2        30      23      0        +L1      +L5      -L2
P3        17      17      -L4      +L3      0        +L6
P4        24      20      -L5      0        +L4      -L6

LINES

          points           regions          metric
No.       begin    end     right    left    beg.     end
L1        P1       P2      R1       R2      1        4
L2        P2       P1      R1       R3      5        8
L3        P1       P3      R2       R3      9        11
L4        P3       P4      R2       R4      12       13
L5        P4       P2      R2       R3      14       15
L6        P4       P3      R3       R4      16       20

REGIONS

No.       label    boundary start line
R1        0        +L1
R2        112      +L3
R3        255      -L5
R4        0        -L6

METRIC DATA

address:   1        2        3        4        5        6        7
coord.:   (10,24)  (11,26)  (30,25)  (30,23)  (30,23)  (29,10)  (10,11)
address:   8        9        10       11       12       13       14
coord.:   (10,24)  (10,24)  (17,18)  (17,17)  (17,17)  (24,20)  (24,20)
address:   15       16       17       18       19       20
coord.:   (30,23)  (24,20)  (23,17)  (12,11)  (16,16)  (17,17)

are considered as directed (Fig. 18). A minus sign of the identifier denotes a
line starting from the point, and a plus sign corresponds to a finishing line.
The first column in the list of lines contains their identifiers. The next two
columns contain identifiers of the start and end points of a line. If the line is
closed, and hence is neither starting nor finishing at a branching point, then
both identifiers are zero. The next two columns contain identifiers of the
regions lying to the right and to the left of the line, respectively. The last
two columns contain identifiers of a starting and finishing coordinate pair,
which are to be found in the metric list. Each pair represents an end point of
a digital straight segment (DSS).
This list is a single sequence of coordinate pairs. The identifier of each pair
is its ordinal number in the sequence. The items in the last two columns of
the list of lines indicate the beginning and end of the subsequence containing
all vertices of the digital polygon representing the current line. For example,
the numbers 5 and 8 in the row L2 denote that the coordinates of the vertices
of the corresponding polygon are to be found in the list of metric data
starting at pair number 5 and finishing at 8. These are

(30, 23), (29, 10), (10, 11), (10, 24).

When encoding the DSS by the coordinates of the end points, a reconstruc-
tion of the original segmented image is possible only with an accuracy of
about one pixel. This is so because there exists more than one DSS connecting
two given points. All these DSS deviate from each other by no more than one
pixel (see, e.g., Kovalevsky, 1990). If it is necessary to have a precise coding
of the line, certain additional parameters must be stored for each DSS. These
parameters are not shown in Table I to make the presentation simpler.
The first column in the list of regions contains the identifiers of the regions.
The next column contains the labels (gray values) of the regions. The last
column contains the identifier of a line belonging to the boundary of the
region. Starting from this line in the proper direction, one may reconstruct
the complete sequence of lines composing the boundary. The boundary is
directed in such a way that the region is always lying to the right side of the
boundary. The minus sign at the identifiers of some starting lines indicates
that the line should be traversed from the end to the beginning to obtain
the correct direction of the boundary.
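
As an illustration, the record structure of Table I may be sketched in Python as follows (the field names follow the table; the concrete types and the indexing conventions are our assumptions):

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class BranchPoint:                 # a 0-block
    x: int
    y: int
    # signed line identifiers in the east, south, west, north slots;
    # minus = the line starts here, plus = it finishes here, 0 = no line
    lines: Tuple[int, int, int, int] = (0, 0, 0, 0)

@dataclass
class Line:                        # a 1-block
    begin: Optional[int]           # branching points; None for a closed line
    end: Optional[int]
    right: int                     # region to the right of the directed line
    left: int                      # region to the left
    metric_beg: int                # first and last coordinate pair of the
    metric_end: int                # DSS vertices in the metric-data sequence

@dataclass
class Region:                      # a 2-block
    label: int                     # gray value of the region
    start_line: int                # signed identifier of one boundary line

@dataclass
class CellList:
    points: List[BranchPoint] = field(default_factory=list)
    lines: List[Line] = field(default_factory=list)
    regions: List[Region] = field(default_factory=list)
    metric: List[Tuple[int, int]] = field(default_factory=list)

    def vertices_of(self, ln: Line) -> List[Tuple[int, int]]:
        """Vertices of the digital polygon of a line; for L2 of Table I
        these are the coordinate pairs number 5 through 8."""
        return self.metric[ln.metric_beg - 1 : ln.metric_end]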
Encoding images by cell lists is rather economical: in applications to
cartography (Kovalevsky, 1989b) and to technical drawings (see Section X)
the average compression factor, in comparison to a raster representation, is
in the range of 20 to 100. This means that the cell list for an image of
512 x 512 bytes is only about 3 to 13 Kbyte long.
Another important advantage of the cell list is that the data are region-
related, and therefore different objects of interest in the image are represented
in the list separately. The regions, and thus the objects of interest, are
represented in the list explicitly by their topological relations and coor-
dinates. Consequently, geometrical analysis of the image is reduced to simple
calculations using well-known formulae of analytic geometry. Details about
the procedure of transforming a segmented raster image into a cell list may
be found in (Kovalevsky, 1989a). Recognition of the DSS is described in
(Kovalevsky, 1990).

FIGURE 19. (a) a simple scene, (b) its region adjacency graph, and (c) a prototype graph of a house.


VIII. SUBGRAPH AND SUBCOMPLEX ISOMORPHISM

The cell lists described in the previous section give us a powerful means for
analyzing images, since an image is encoded in the list in such a form that
homogeneous regions, their boundaries, and their topological relations to
each other are represented explicitly. Moreover, geometrical features are
described by means of coordinates rather than by gray-value distributions.
This makes the analysis of size and shape of image parts easy. Cell lists
contain explicit information necessary for analyzing both geometrical
features of image parts and their spatial relations to each other. This is
exactly what is needed to solve complex problems of image understanding.
What one needs, in addition to this information, is a suitable technique for
verifying whether geometrical shapes and topological relations of image parts
correspond to certain predetermined demands characteristic of the image
classes to be analyzed. A well-known means of analyzing topological
relations is the technique of subgraph isomorphism (Ullmann, 1983). It is
based on describing the topological structure of an image by a region
adjacency graph (Strong and Rosenfeld, 1973; Pavlidis, 1977). In such a
graph, regions of a given segmented image are represented by graph vertices.
Every pair of adjacent regions is associated with an edge. The statement of
the image analysis problem may then sound as follows:
Formulation 1: Subgraph Isomorphism
Given is a region adjacency graph (image graph) IG of an image and a
prototype graph PG.
Find an isomorphic mapping M: PG → IG.
This means that a vertex of IG must be assigned to every vertex of PG in such
a way that for any two vertices of PG that are connected by an edge of PG,
the corresponding vertices of IG are also connected by an edge of IG.
Consider an example. Figure 19a shows a simple scene. The corresponding
graph IG is shown in Fig. 19b, and the prototype graph PG for a house in
Fig. 19c. A possible mapping from PG into IG may look like

s → 1, r → 2, w → 3, g → 4;
(s, r) → (1, 2), (s, w) → (1, 3), (r, w) → (2, 3), (w, g) → (3, 4).     (1)

The prototype vertex s (sky) is mapped to the vertex 1 of IG, which corresponds
to the upper region in the scene of Fig. 19a. The edge (s, r) of PG,
representing the adjacency of sky and roof, is mapped to the edge (1, 2) of
IG, etc.
Subgraph isomorphism is a complex mathematical problem known to be
in general NP-complete. This means that the computation time grows ex-
ponentially with the number of the vertices and may become unacceptably
great for large graphs. Although this is not the case for planar graphs, the
difficulty remains relevant for analyzing planar images: as we shall show,
planar graphs are not sufficient to describe all relations that may be
important for image analysis. Hence, the computation time may also become
great for planar images.
Another disadvantage of the method consists in many “false alarms”: e.g.,
it is easy to see that the graph PG in the last example may also be mapped
onto the subgraph of IG representing the tree in Fig. 19b, since this subgraph
is also isomorphic to PG. Thus, the tree will be erroneously recognized as a
house. The reason for such errors is that too little information about the
desired objects is contained in prototype graphs.
Both disadvantages have common causes: if too little information about a
vertex of a graph is available, then the program tries to match it with a large
number of vertices of the other graph. This causes a long computation time
and a large number of “false alarms,” since the probability of encountering
an occasionally isomorphic subgraph becomes high. There is a possibility of
overcoming both disadvantages of the subgraph isomorphism simul-
taneously: the information content of the data describing a vertex in a graph
must be increased by assigning to it some additional features. A vertex in a
graph may be distinguished from other vertices only by its relations to other
vertices. A region of an image, however, may be characterized by many other
features, such as colour, texture, size, shape, etc. Such features may be
assigned to a vertex of an image graph in the form of labels.
The vertices of a prototype graph may be labelled by similar labels and
then a prototype vertex should be matched only onto an image vertex having
the same label. However, to make the recognition procedure more flexible
with respect to the variability of images, it is more advisable to label the
prototype vertices by other symbols, corresponding to classes of possible
feature values. Such symbols may be regarded as semantic labels whose
compatibility with the features is known. Then a prototype vertex must be
matched only with those vertices of the image graph that have compatible
feature values, i.e., values belonging to a predetermined class of possible
values.
Realization of this idea leads to the following

Formulation 2: Isomorphism of Labelled Subgraphs

Given:
a) an image graph IG = (V, IE) with V being its set of vertices and IE that
of edges (vertex pairs);
b) a set F of symbols f called features;
c) a mapping FM: V → F assigning a feature to each vertex of IG;
d) a prototype graph PG = (P, PE) with P being its set of vertices and PE
that of edges;
e) a set S of symbols s called semantic labels;
f) a mapping SM: P → S assigning a semantic label to each vertex p of PG;
g) an interpretation relation IN ⊂ F × S, which is a set of ordered pairs
(f, s) denoting that a feature f may be interpreted as a semantic label s.
Find:
a mapping M: PG → IG such that for all p ∈ P
the pair (FM(M(p)), SM(p)) ∈ IN.
The last condition means that if a vertex p of PG is mapped onto v = M(p),
v ∈ V, then the feature f = FM(v) = FM(M(p)) must be allowed by IN to be
interpreted as the semantic label s = SM(p) of the corresponding vertex of
the prototype graph PG.
Consider an example. Let the features f be colors and let the mapping FM
assign to the sky (Fig. 19b) the color blue, to the roof red, to the wall white,
to the crown of the tree and to the grass green, and to the trunk of the tree
black. Let the mapping SM assign to the vertices of PG the semantic labels
as shown in Fig. 19c. Let the interpretation relation IN be

(blue, sky), (red, roof), (white, wall), (green, grass).

The mapping in Eq. (1) may then still be considered as consistent, but the
mapping of PG onto the subgraph of the tree is no longer consistent: the
colors of the crown and the trunk are not in agreement with the interpretation
relation IN.
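
A naive backtracking search realizing Formulation 2 may be sketched as follows (this simple realization is ours; efficient algorithms such as Ullmann's refine the same search by pruning):

def labelled_subgraph_isomorphism(P, PE, SM, V, IE, FM, IN):
    """Find a mapping M: P -> V such that prototype edges map onto image
    edges and every assignment respects the interpretation relation IN."""
    def consistent(p, v, M):
        if (FM[v], SM[p]) not in IN:          # the feature must allow the label
            return False
        for q, w in M.items():                # PG edges must map onto IG edges
            if ((p, q) in PE or (q, p) in PE) and \
               (v, w) not in IE and (w, v) not in IE:
                return False
        return True

    def extend(M, remaining):
        if not remaining:
            return M                          # every prototype vertex is mapped
        p = remaining[0]
        for v in V:
            if v not in M.values() and consistent(p, v, M):
                M[p] = v
                result = extend(M, remaining[1:])
                if result is not None:
                    return result
                del M[p]                      # backtrack
        return None

    return extend({}, list(P))

For the house example one would call this with P = {s, r, w, g}, PE = {(s, r), (s, w), (r, w), (w, g)}, the colors as FM, and the relation IN listed above; the subgraph of the tree is then rejected because of the colors of its crown and trunk.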
When considering practical tasks, the problem just formulated must be
solved many times for every object class, e.g., to detect all houses in a scene
it is necessary to find all consistent mappings of PG into IG.
Rather similar to this problem is that of consistent labelling (see, e.g.,
Shapiro, 1983), defined by
Formulation 3: Consistent Labelling

Given:
a) an image graph IG = (V, IE);
b) a set F of features f;
c) a mapping FM: V → F assigning a feature to each vertex of IG;
d) a set S of semantic labels s;
e) an interpretation relation IN ⊂ F × S;
f) an adjacency relation A ⊂ S × S, which is a set of pairs (s′, s″) of
semantic labels allowed to be assigned to adjacent vertices of IG.
Find:
a mapping M′: V → S such that
for all v ∈ V, (FM(v), M′(v)) ∈ IN,
and
for all (v, w) ∈ IE, (M′(v), M′(w)) ∈ A.
This means that the semantic labels assigned to vertices of IG must satisfy two
conditions:
a) a vertex v ∈ V may obtain only a semantic label that is allowed by the
interpretation relation IN for the existing feature f = FM(v);
b) two vertices v and w that are connected by an edge in IG may obtain
only such semantic labels that compose a pair allowed according to the
adjacency relation A.
Consider an example. Let us retain the graph IG of Fig. 19b with the colors
as features, with the semantic labels for parts and for the surrounding of a
house, and with the interpretation relation IN. Let us additionally define the
following adjacency relation:

(sky, roof), (sky, wall), (roof, wall), (wall, grass), (sky, grass).     (2)

A consistent labelling of IG is then

1 → sky; 2 → roof; 3 → wall; 4 → grass; 5 → unknown; 6 → grass.

The image part corresponding to the tree cannot be interpreted correctly
since no corresponding relations were specified.
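
The two conditions of Formulation 3 are easily checked for a candidate labelling; the following sketch verifies them (the treatment of the label "unknown", which simply escapes both tests, is our assumption, and a complete solver would search the space of labellings, e.g., by backtracking or relaxation):

def is_consistent(M_prime, V, IE, FM, IN, A, unknown="unknown"):
    for v in V:                                   # condition a)
        if M_prime[v] != unknown and (FM[v], M_prime[v]) not in IN:
            return False
    for v, w in IE:                               # condition b)
        if unknown in (M_prime[v], M_prime[w]):
            continue
        if (M_prime[v], M_prime[w]) not in A and \
           (M_prime[w], M_prime[v]) not in A:
            return False
    return True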
The most important difference between labelled subgraph isomorphism
and consistent labelling consists in the following. Subgraph isomorphism
looks for a subset of edges and vertices of IG that compose a subgraph
isomorphic to the prototype graph PG. On the other hand, consistent
labelling tests all edges of IG, whether or not their incident vertices may be
consistently labelled. Therefore, subgraph isomorphism accepts images
having too many adjacencies but does not accept those having too few.
Consistent labelling acts conversely. E.g., if the roof of a house touches the
crown of a tree, as in Fig. 20a, and this situation is not taken into account
in the prototype graph, then subgraph isomorphism should still recognize the
house: the vertex of IG corresponding to the roof would have one incident

FIGURE 20. Two scenes leading to recognition errors: (a) unexpected adjacency, and (b) missing adjacency.

edge more, but this does not prevent the finding of the subgraph isomorphic
to PG. On the contrary: when applying consistent labelling in the same case,
without correcting the adjacency relation, the roof would not be recognized,
since the adjacency of the roof to a green region was not allowed. On the
other hand, if a roof without a wall is present inside a blue region (Fig. 20b),
subgraph isomorphism would reject it, since there is no complete subgraph
isomorphic to PG. Consistent labelling, however, would recognize a roof,
since there are no adjacencies that are not allowed. These considerations
demonstrate the advantages of subgraph isomorphism as compared to con-
sistent labelling.
The method of subgraph isomorphism may still be improved by two
means. Firstly, more features of image parts and relations between the parts
must be introduced. Features must represent size, area, shape, curvature, etc.
Relations need not be those of adjacency: relations important for image
analysis may have the nature of geometric features of pairs of image parts
that are not necessarily mutually adjacent. These may be, e.g., angles between
lines, quotients of sizes, quotients of curvatures, etc.
Additional features and relations may increase the reliability of recog-
nition and reduce computation time. To achieve this, the order in which the
vertices of PG are tested must be chosen properly: the vertex of PG whose
semantic label is compatible with the fewest vertices of IG must be tested first.
Such a vertex does not match most of the vertices of IG, and the corresponding
matching variants are rejected from the beginning. There are only a few
vertices of IG that match it, and only in these few cases are further vertices
of PG tested. Thus, the number of tested variants may essentially be reduced.
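
This ordering heuristic amounts to sorting the prototype vertices by their number of compatible image vertices, e.g. (in the notation of Formulation 2; the realization is our sketch):

def ordered_prototype_vertices(P, SM, V, FM, IN):
    """Prototype vertices sorted so that the vertex compatible with the
    fewest image vertices is tested first."""
    def compatible_count(p):
        return sum(1 for v in V if (FM[v], SM[p]) in IN)
    return sorted(P, key=compatible_count)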
Additional relations may also reduce the computation time when at least
some relations are stored in the form of pointers indicating those parts of the
image that are in the desired relation to another part. This is the case, e.g.,
for the bounding relation in a cell list. Imagine that a vertex v of IG is found
whose features match the vertex p of PG. There is another vertex q in PG that
is in a relation R with p. If R is represented by a pointer then there is a pointer
(in the data structure describing the image) indicating a vertex w of IG that
is in the same relation R with v. Thus, the time to scan all vertices of IG to
find those in the relation R with v may be saved. In this way, certain
additional relations that are properly encoded may reduce the computation
time. Other additional relations serve as a means to reject more matching
variants as early as possible, which also reduces the computation time. The
second means of improving the method of subgraph isomorphism consists in
the following. Graphs, as tools for representing topological relations between
image parts, must be replaced by complexes, as was shown in Sections II-V.
Cell lists (Section VII) not only describe the topological structure of images
completely and consistently; they also contain precise geometrical data about
image parts. These data are represented in the form of coordinates, which
makes possible the application of analytical geometry to image analysis.
Let us improve the subgraph isomorphism method step by step. Consider
first a slightly changed version of Formulation 2. The change consists in
representing the set IE of the edges of the image graph IG as a binary relation
in the set V of the vertices. (Remember that region adjacency graphs were
introduced to represent the adjacency relation of the regions.) The existence
of an edge between the vertices v, w ∈ V will then be expressed as a two-place
predicate PR″(v, w) of two vertices v and w; this predicate is true if the
vertices are connected by an edge of IG.
For the second step, the features f, the semantic labels s, and the interpretation
relation IN will be replaced by a set of one-place predicates. One such
predicate PR′ must be assigned to every vertex p of the prototype graph PG
instead of the semantic label s. The predicates PR′ are defined on the set of
vertices of IG:

PR′: V → {true, false}.

PR′(v) is true if the formerly used feature f = FM(v) and the semantic label
s correspond to the interpretation relation IN ⊂ F × S.
Then, the mapping M: PG → IG of the prototype graph PG into the
image graph IG must be replaced by a mapping MV from the set P of
vertices of PG into the set V of vertices of IG. The mapping of edges is
then replaced by the requirement that the previously mentioned two-place
predicate PR″ of certain vertex pairs of IG be true; particularly, if a pair (p, q)
of vertices of PG is connected by an edge of PG then this pair becomes
labeled as related to the predicate PR″. The corresponding pair (MV(p),
MV(q)) of IG must then be tested as to whether PR″ of this pair is true. It
must be stressed here that all these changes influence only the form but not
the essence of the problem formulation. This step is necessary to make the
next, essential change comprehensible. The changed problem statement is as
follows:
Formulation 4: Predicate-Conditioned Mapping
Given:
a) the set V of image regions;
b) the set P of prototype regions;
c) a one-place predicate PR′ₚ for every region p of P, with the predicate
defined on the set V of image regions;
d) the subset RP of marked prototype region pairs (p, q), p, q ∈ P;
e) a two-place predicate PR″ defined on the set of image region pairs
(v, w), v, w ∈ V.
Find:
a mapping MV: P → V such that the one-place predicates of all images
MV(p) of the prototype regions p ∈ P and the two-place predicates of the
pairs (MV(p), MV(q)) of the images of all marked pairs of prototype regions
are true.
It is easy to see that Formulation 4 is equivalent to Formulation 2 if
1) the set RP of marked pairs is equivalent to the former set of edges of
PG;
2) ∀v ∈ V, ∀p ∈ P: PR′ₚ(v) = ((FM(v), s) ∈ IN) with s = SM(p);
and
3) ∀v, w ∈ V: PR″(v, w) = ((v, w) ∈ IE).
Formulation 4 may be naturally generalized in a way that is based on the
representation of segmented images by block complexes as specified in
Section VII. We shall call the elements of a block complex in the sequel cells
rather than blocks because they are cells with respect to the block complex.
The generalization consists in the following:
1) We replace the set V of image regions by the set SC of cells of the image
block complex. Thus, the image graph IG is replaced by the image complex
IC = (E, B, dim).
2) We replace the prototype graph PG by a prototype complex PC = (P,
B′, dim′) containing cells of dimensions 0, 1, and 2, while the 2-cells are the
prototype regions and the other cells compose their boundaries.
3) We replace the one-place predicates PR′ verifying the colors of regions
by other one-place predicates having a cell of IC of any dimension as their
argument. This may be, e.g., a predicate depending on the area of a region,
or on the direction of a line, etc. Such a predicate may be defined as being
true if the area of a region is in a predetermined range, i.e., greater than Amin
and less than Amax. One or many such one-place predicates may be assigned
to a cell p of the prototype complex PC in the same way that we have formerly
assigned a semantic label s to a vertex of the prototype graph. Instead of
verifying the interpretation relation IN, we must now calculate all predicates
assigned to p for the cell c of the image complex IC onto which p is mapped.
Suppose, e.g., two predicates are assigned to a 2-dimensional cell p″ of PC,
one predicate demanding that the area lie in a range and the other that the
horizontal size be greater than the vertical size. If p″ is mapped onto a region
r of the image then the area of r must be calculated and compared with the
limits of the range. Also, the sizes of r must be specified and compared with
one another.
4) We replace the single two-place predicate describing the adjacency of
regions in the image by many different two-place predicates. One of them is
the direct descendant of the adjacency predicate of Formulation 4. It
represents the bounding relation of the cells. E.g., this predicate, when
applied to a point and a region, is true if the point bounds the region, i.e., it
belongs to the boundary of the region.
Other two-place predicates are any predicates depending on some features
of two cells of IC. This may be, e.g., a predicate that is true if the ratio of the
areas of two regions is in a range, or a predicate that is true if the angle
between two lines is in a range. This may also be a predicate depending on
two cells of different dimensions, e.g., it is true if a point lies below a region,
etc.
It is often necessary for the image analysis to define two-place predicates
for pairs of non-incident cells and even for cells that are far away from each
other. The corresponding binary relations may not be representable by
planar graphs. This is important for computation time, since finding isomor-
phisms of non-planar graphs demands much more time than finding those of
planar graphs. Certain two-place predicates are assigned to some pairs of
cells of PC. If a predicate PR″ is assigned to a pair (p, q), p, q ∈ PC, p is
mapped onto a cell a of IC, and q is mapped onto a cell b of IC, then the
predicate PR″ must be calculated for the pair (a, b).
5) We replace the requirement that all specified predicates be true, i.e., the
conjunction of all predicates be true, by the requirement that a predetermined
logical function of all predicates be true. The function may contain conjunc-
tions as well as disjunctions and negations. We shall call this function the
global predicate associated with the prototype complex.
According to these changes, the new problem statement is as follows:
Formulation 5: Predicate-Conditioned Mapping of Subcomplexes
Given:
a) a cell list describing the image to be analyzed as a block complex IC
with all metric data sufficient to restore the image;
b) a prototype complex PC;
c) sets of one- and two-place predicates;
d) an assignment of the one-place predicates to the cells and of the
two-place predicates to some pairs of cells of PC;
e) a logical function representing the global predicate.
Find:
a mapping MP: PC → IC from the prototype complex PC into the image
complex IC such that the global predicate is true.
This short formulation may be explained in the following way. The mapping
MP: PC → IC defines a subcomplex S of IC that must satisfy the global
predicate. This means that for every cell p of PC to which a one-place
predicate is assigned, this predicate must be calculated for the image cell
MP(p) of IC, and the result, which is a logical value, true or false, must be
substituted into the global predicate as one of its arguments. Similarly, for
every pair of cells (p, q) of PC to which a two-place predicate is assigned,
this predicate must be calculated for the pair of corresponding cells (MP(p),
MP(q)) of IC and the result substituted into the global predicate. The
mapping MP must be such that the global predicate is true. Note that the
bounding relation of PC is included into the two-place predicates: for some
pairs of PC, the predicate assigned to them requires that the images of these
cells in IC be incident; for some other pairs the corresponding two-place
predicate requires some additional (e.g., geometrical) relation of the image
cells. There may also be pairs for which the assigned two-place predicate
requires only an additional relation without requiring incidence.
Consider a simple example. Figure 21 shows a two-dimensional complex
representing a house. It consists of two 2-cells R1 and R2, six 1-cells L1,
L2, ..., L6, and five 0-cells P1, P2, ..., P5. Their bounding relations are
obviously defined by Fig. 21, and we shall not consider them in our global
predicate, to make it simpler. We assign to the cell R1 two one-place
predicates and to each of the cells R2 and L6 one one-place predicate:

PR′1 = (the cell is a triangle), assigned to R1;
PR′2 = (the cell is a trapezium), assigned to R1;
PR′3 = (the cell is a rectangle), assigned to R2;
PR′4 = (the cell is a horizontal line), assigned to L6.

FIGURE 21. A prototype complex representing a house.
Further, we assign to the pair (L6, R1) both of the following two-place
predicates and to the pair (L6, R2) the first of them:

PR″1 = (the first cell bounds the second one);
PR″2 = (the first cell lies below the second one).

Then a global predicate defining a house may be

GP = (PR′1(MP(R1)) ∨ PR′2(MP(R1))) ∧ PR′3(MP(R2)) ∧ PR′4(MP(L6))
     ∧ PR″1(MP(L6), MP(R1)) ∧ PR″2(MP(L6), MP(R1)) ∧ PR″1(MP(L6), MP(R2)).
We have represented predicates by their verbal descriptions. In computer
realizations, each predicate is a subroutine that may be applied to certain
records in the cell list representing the image. The subroutines test geometri-
cal features of cells of different dimensions, topological and/or geometrical
relations of pairs of cells, and return logical values “true” or “false.” These
values are then verified by the global predicate realized as a main subroutine
calling the predicate subroutines. The next section describes the realization of
this concept.
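
A sketch of the global predicate GP of the house example as such a main subroutine may look as follows (the elementary predicates are passed in as callables, since their realization, a triangle test, a "lies below" test, etc., depends on the metric data of the cell list; all names are our assumptions):

def global_predicate_house(MP, is_triangle, is_trapezium, is_rectangle,
                           is_horizontal_line, bounds, lies_below):
    """GP of the house prototype of Fig. 21 for a candidate mapping MP."""
    R1, R2, L6 = MP["R1"], MP["R2"], MP["L6"]
    return ((is_triangle(R1) or is_trapezium(R1))   # PR'1 v PR'2 on R1
            and is_rectangle(R2)                    # PR'3 on R2
            and is_horizontal_line(L6)              # PR'4 on L6
            and bounds(L6, R1)                      # PR''1 on (L6, R1)
            and lies_below(L6, R1)                  # PR''2 on (L6, R1)
            and bounds(L6, R2))                     # PR''1 on (L6, R2)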

IX. VARIABILITY OF PROTOTYPES AND USE OF DECISION TREES

Consider now the computer implementation of the described image analysis
procedures. It is based on the calculation of global predicates described in the
previous section. In a computer realization, a global predicate corresponds to
a program containing subroutines that calculate the elementary predicates,
and a control part, which specifies which elementary predicate must be
calculated next. The latter decision depends on the logical value returned by
the previous predicate.
Such a recognition program may be represented as a directed tree structure
whose vertices correspond to the elementary predicates, and the directed
edges lead from a predicate just verified to those to be verified next. Each of
the edges starting at the same vertex representing the predicate P R corres-
pond to a value returned by PR. There are two such values: true and false.
Hence, the tree is a binary tree. We shall call such structures decision trees and
programs implementing decision trees recognition programs. (It is, of course,
possible to generalize the concept while considering subroutines that return
more than two values and compose non-binary trees, but such a generaliza-
tion is of no principal importance since any non-binary decision may be
replaced by a set of binary decisions.)

FIGURE 22. Variants of prototype complexes representing a house.


The analysis of an image by a recognition program consists of the
following stages. The image must be scanned and segmented into quasi-
homogeneous regions. A cell list must be produced for the segmented image.
The list must then be read by the recognition program, and for each cell
(region) in the list the global predicate must be verified. The program starts
with the verification of the elementary predicate corresponding to the root of
the tree and tracks a path in the tree until an ultimate decision is reached. If
the decision is “false” then the next cell of the list must be taken. If the
decision is true then a subcomplex representing an object of the class to be
recognized has been found. The subcomplex may be marked in the cell list or
be recorded in a new list of recognized objects.
It is often the case that one object class must be described by a prototype
having a variable structure. E.g., the class “house” may be represented by the
complexes shown in Fig. 22. The “brute force” way of describing such a class
consists in producing as many global predicates - and hence as many decision
trees - as the number of different structures. This way may become rather
expensive.
A more efficient way is to construct a single decision tree in which all
global predicates of different structures are combined in such a way that
each individual structure corresponds to a path from the root of the tree
to one of its leaves. E.g., the tree for all the structures of Fig. 22 is shown in
Fig. 23.
Similarly, a decision tree may be constructed to describe many different
object classes having certain substructures in common. E.g., Fig. 24 shows
four prototype complexes, and Fig. 25 shows the decision tree describing all
of them.
It is important to notice that the recognition by means of decision trees is
performed by recognizing first the membership in a more general class
including some particular classes. Recognition of the most general, inclusive
classes is performed at the stages near the root of the tree. Recognition of
subclasses of a recognized class is realized at stages further from the root.
Thus, in the example of Fig. 25, the most general class is “an object contain-
ing a trapezium.” Objects of all four classes shown in Fig. 24 belong to this
general class. If this general class has been recognized, then at the next stage

FIGURE 23. A decision tree for recognizing the houses of Fig. 22.


the subclass “an object containing a rectangle adjacent to a trapezium” must
be recognized, etc.
We have developed a flexible means for producing new trees without
making changes in the recognition program. It has the nature of a quasi-


FIGURE 24. Prototypes of four classes of hand-made drawings.

FIGURE 25. A decision tree for recognizing the classes of Fig. 24.


natural formal language and a corresponding compiler. A decision tree may
be described in this language by a text file produced by any commonly used
text editor program.
The description of a tree consists of records, each of which describes a
vertex of the tree. The order of the records in the file is of no importance since
the tree structure is defined by pointers represented by names in the text file.
A record contains the following fields, which are also written in an arbitrary
order: the name of the vertex, the name of the eldest son,* the name of the
next brother, the name of the predicate (subroutine), some input parameters,
and some output results. An input parameter may be a number or a name.
A number will be transferred to the subroutine and directly used for calcula-
tions, e.g., as a limit for a value being verified by the subroutine. A name as
an input parameter points to a result obtained at some previous stage of the
recognition process. The same name must be used in the record of another
vertex as an output result.
Consider as an example the record describing vertex 2 of the tree of
Fig. 25:

Record = (Name: trapezium; Son: one neighbor; Brother: rejection;
Subroutine: TRAPEZ; Inp1: region; Inp2: 0.05; Inp3: 0.2; Out1: decision;
Out2: great base; Out3: small base; End vertex).
The notation “Son: one neighbor;” denotes that if the first output parameter
“Out1: decision” of the subroutine “TRAPEZ” is equal to 1 (which means
“true”) then the next vertex to be chosen is the vertex with the name “one
neighbor”. The notation “Inp2: 0.05;” means that the value 0.05 must be
transferred to the subroutine “TRAPEZ” as its second input parameter. This
value will be interpreted by the subroutine as the upper limit for the sine of
the angle between two straight segments that are considered candidates for
the bases of a trapezium. The sine must be less than 0.05 for the two segments
to be recognized as the bases of a trapezium. Similarly, the notation “Inp3:
0.2;” means that the sine of the angle between the lateral edges must be
greater than 0.2.
The notation “Out2: great base;” means that the subroutine “TRAPEZ”
returns as its second result the pointer onto the straight segment in the cell
list; this segment was recognized by “TRAPEZ” as the greater base of the
trapezium.
As an example of using pointers as input parameters, consider the
description of vertex 6 of Fig. 25:

Record = (Name: common great base; Son: hammer; Brother: rejection;
Subroutine: EQUINT; Inp1: joint; Inp2: great base; Out1: decision; End
vertex).
Here, the notation “Inp2: great base;” means that the second input parameter
of the subroutine “EQUINT” is the pointer onto the straight segment in the
cell list; the segment was previously found by the subroutine “TRAPEZ” as
* According to the commonly used terminology referring to genealogical trees, the son of a
vertex v is the vertex at the end of an edge starting at v. The meaning of father and brother is
obvious.

FIGURE 26. Synthetic images used in experiments.

the greater base of the trapezium. The task of “EQUINT” is to check if two
integers are equal to each other. In the particular case of vertex 6, these
integers are pointers onto the great base and onto another straight segment
“joint” bounding both the trapezium and the rectangle. However, the same
subroutine may be used in other vertices of the tree to compare other
integers.
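
A minimal sketch of an interpreter for such record descriptions may look as follows (the Python representation of the records and the conventions for passing named results are our assumptions; in the actual system the records are compiled into an array of numbers, as described below):

def run_tree(records, root, subroutines, cell):
    """Track a path through the decision tree, starting at the record named
    `root`, until a name is reached that is not a record (the decision)."""
    results = {"region": cell}           # the cell under test, available by name
    name = root
    while name in records:
        rec = records[name]
        sub = subroutines[rec["subroutine"]]        # e.g. TRAPEZ or EQUINT
        # numbers are passed directly; names refer to results stored earlier
        args = [results.get(a, a) if isinstance(a, str) else a
                for a in rec["inputs"]]
        decision, *outputs = sub(*args)             # first output: the decision
        for key, value in zip(rec["outputs"], outputs):
            results[key] = value                    # e.g. results["great base"]
        name = rec["son"] if decision else rec["brother"]
    return name                          # e.g. "HOUSE", "SHIP", or "rejection"

# The record of vertex 2 of Fig. 25 would then read:
records = {
    "trapezium": {
        "subroutine": "TRAPEZ", "son": "one neighbor", "brother": "rejection",
        "inputs": ["region", 0.05, 0.2],
        "outputs": ["great base", "small base"],
    },
    # ... the remaining vertices of the tree
}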
A special compiler translates the set of descriptions into an array whose
elements are numbers. Some of the numbers are numerical parameter values,
others are addresses in the array. The array is then used by the main
recognition program to control its performance: the program reads the array,
calls the necessary subroutines, verifies their decisions, and tracks the corres-
ponding path in the decision tree, leading to an ultimate decision. As we have
seen in the preceding examples, the array also serves to transfer data among
the subroutines.
The described recognition program was successfully used for recognizing
both the synthetic images shown in Fig. 26 and the hand-made drawings
shown in Fig. 27. Synthetic images were produced interactively, by means of
an image editor, and then converted to cell lists. Such an image consists of
polygonal regions, each having a gray value different from those of other
regions. The sizes, shapes, and locations of the regions were chosen

FIGURE 27. Hand-made drawings used in experiments.

arbitrarily, restricted only by the human imagination, regarding a “house,”
“ship,” “hammer,” and “smoothing iron.” Experiments have demonstrated
good agreement between the decisions of the program and those of humans:
in cases of non-extraordinary shapes and proportions, the decisions of the
program were always correct.
In the case of hand-made drawings on paper, the preprocessing was much
more complex: the image was scanned by a CCD-camera, binarized and
transformed into a cell list with an approximation tolerance (Kovalevsky,
1989a) approximately equal to the line thickness. This, however, turned out
not to be sufficient for representing each side of a trapezium or rectangle in
a hand-made drawing by a single straight segment. To overcome the dif-
ficulty, a measure of line curvature was introduced to detect “break points”
of straight lines. The curvature measure depends on the approximation
tolerance. It is defined in the solution of the following problem:
Given: a polygonal line and a number called approximation tolerance.
Find: a sequence of circular arcs such that

a) the tangent is continuous;
b) the sum of absolute values of curvatures is minimal;
c) the deviation from the given polygonal line is at any location less than
the given approximation tolerance.
Every vertex of the given polygonal line gets a curvature measure equal to the
curvature of the closest circular arc. If the curvature measure of a vertex is
greater than a predetermined threshold, it is regarded as a break point. Thus,
it becomes possible to reliably detect polygon corners in hand-made
drawings.
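The arc-fitting formulation above is an optimization problem; purely as an
illustration of the thresholding step, the sketch below substitutes a common
discrete stand-in for the arc-based measure, namely the curvature of the
circle through three consecutive polygon vertices. The function names and
the threshold convention are assumptions, not the method of the text.

    import math

    def three_point_curvature(p, q, r):
        """Curvature of the circumscribed circle of three consecutive vertices:
        kappa = 4 * area(p, q, r) / (|pq| |qr| |pr|)."""
        (x1, y1), (x2, y2), (x3, y3) = p, q, r
        twice_area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
        a, b, c = math.dist(p, q), math.dist(q, r), math.dist(p, r)
        if a * b * c == 0.0:
            return 0.0
        return 2.0 * twice_area / (a * b * c)

    def break_points(polygon, threshold):
        """Indices of vertices whose curvature measure exceeds the threshold."""
        n = len(polygon)
        return [i for i in range(n)
                if three_point_curvature(polygon[i - 1], polygon[i],
                                         polygon[(i + 1) % n]) > threshold]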
Every object is represented in an original drawing by a connected black
region since the drawing consists of black strips several pixels wide rather
than of mathematical lines (compare with Fig. 27). The boundary of the
region consists of one outer and one or more inner components. In the case
of the object classes of Fig. 24, there are two inner boundary components.
They are simultaneously boundaries of white regions included into the black
region. The program checks in the cell list for such white regions. Having
found one, it finds break points in its boundary. These points and the lines
connecting them are used for recognizing and analyzing trapeziums and
rectangles.
Experiments were made with approximately twenty hand-made drawings.
The results of recognition were always correct. Rejections (“unknown
object”) occurred in the cases of careless drawings containing gaps in the
lines. The necessary improvements of the recognition procedure were made
later on in the cartographical application as described in the next section.

X. APPLICATIONS

A. Handwritten Characters

The image analysis method described in the previous section was used by the
author in many applications. The earliest of them (Kovalevsky, 1986) is
concerned with the recognition of handwritten characters. At this early stage
of the research the technique was still rather imperfect: the early version of the
cell list contained a list of break points and a list of strokes but no regions.
The prototypes were described by matrices in which two one-place predicates
were specified for every prototype stroke and up to two two-place predicates
for some stroke pairs. No decision trees combining subclasses and classes
were used. Thus, the global predicate of each subclass was verified separately,
which took a relatively large amount of time. The variability of the pro-
totypes was represented by the presence of non-obligatory strokes. E.g., the
character “three” might have a short horizontal stroke at the break point in
the middle, through careless writing. The prototype was provided with a
corresponding non-obligatory stroke, which was mapped onto an
appropriate stroke of the image to be recognized if such a stroke existed. Its
absence, however, did not prevent the recognition. On the contrary, every
obligatory stroke must have a corresponding stroke in the image part to be
recognized as isomorphic to the prototype.

FIGURE 28. (a) a hand-made block diagram, (b) boundary approximation, and (c) result of
automatic digitization.

The use of non-obligatory strokes
is equivalent to having many prototypes for an image class, prototypes that
differ from each other by the presence or absence of non-obligatory strokes.
More details about this early realization may be found in Kovalevsky (1986).
Experimental results were rather good: handwritten characters of up to 50
subclasses (all possible versions of handwritten digits), with variable size and
proportions of the parts, were always correctly recognized. The rejection rate

FIGURE 28. continued.

was about 2%, and the recognition time about 2 seconds per character on a
rather slow PDP-11-like computer. Even at that early stage it was possible
to define new classes without changing the program: the prototypes were
described by editable text files, which were automatically compiled to
numeric matrices then used by the recognition program.

B. Block Diagrams

The next application is for automatic digitization of hand-made drawings of
block diagrams. An example of such a drawing is shown in Fig. 28a. The
result of automatic digitization must appear as a list of blocks, a list of
connected nodes, and a list of connection lines. The nodes lying on block
boundaries must be provided with indications of block assignments. Also,
coordinates of block corners, nodes, and break points of connection lines are
desirable for further displays.
The initial stages of processing are, in this case, the same as for recognizing
the patterns of Fig. 24: a drawing is scanned by a CCD camera, binarized,
and transformed into a cell list with a predetermined approximation
tolerance. Results of the approximation are shown in Fig. 28b: this image is
reconstructed from the cell list which was produced with a rather coarse
approximation tolerance of 4 pixels. It may be seen that slightly curved
boundaries of the hand-made “lines” are represented by as few straight
segments as possible.

FIGURE 29. Classes of break points.

The next processing stages are more complex. They are
subdivided into three hierarchy levels.
At the first level the break points of the boundaries are found and recorded
in an additional list. Break points are classified into 16 classes according to
the convexity or concavity of the boundary of the black region at the point
and to the orientations of the two boundary segments meeting at the point
(Fig. 29). At the second level, groups of adjacent break points are recognized
as “singular locations.” They are classified into four classes according to the
number of strokes meeting at a location (Fig. 30), and recorded into the list
of singular locations. Coordinates of singular locations are calculated as the
average values of the coordinates of all corresponding break points.
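A compact way to picture the two lower levels is sketched below: a
break-point class is indexed by the convexity of the black region at the point
together with a quantized segment orientation, and adjacent break points are
merged into singular locations whose coordinates are averages. The 2 x 8
indexing and the grouping radius are plausible readings of Figs. 29 and 30,
not details given in the text.

    import math

    def octant(dx, dy):
        """Quantize a direction into one of 8 sectors of 45 degrees."""
        return int(((math.atan2(dy, dx) + 2 * math.pi) % (2 * math.pi))
                   // (math.pi / 4))

    def break_point_class(prev_pt, pt, next_pt):
        (x0, y0), (x1, y1), (x2, y2) = prev_pt, pt, next_pt
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        convex = cross > 0                           # convexity at the point
        return (convex, octant(x2 - x1, y2 - y1))    # 2 x 8 = 16 classes

    def singular_locations(break_pts, radius):
        """Greedily group nearby break points; coordinates are averaged."""
        groups = []
        for p in break_pts:
            for g in groups:
                if math.dist(p, g[0]) < radius:
                    g.append(p)
                    break
            else:
                groups.append([p])
        return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                for g in groups]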
At the third level, the recognition of blocks, nodes, and connection lines is
performed as explained in the following. The prototype of a block is a white
region with exactly four concave corners while exactly two strokes meet at
each corner (Fig. 31a). The rest of the outer boundary may be arbitrary. The
prototype of a node is a singular location that is either a cusp (Fig. 31b) or
a T-shaped crossing with the boundary of a block (Fig. 31c). All other
singular locations are either block corners or intermediate points of connec-
tion lines.
The prototype of a connection line is a pair of singular locations (SL1, SL2)

FIGURE 30. Classes of singular locations: (a) cusp, (b) corner, (c) T-shaped crossing, and
(d) cross.

FIGURE 31. Prototypes of (a) a block and (b, c) a node.

satisfying certain conditions, as specified below. Let us call the direction of the line
from SL1 to SL2, rounded to a multiple of 90°, the main direction of the pair
(SL1, SL2). Thus, a main direction may be only east, south, west, or north.
The conditions are
a) there is in the cell list a boundary line, and two adjacent break points
such that one belongs to SL1 and the other to SL2;
b) neither SL1 nor SL2 is a block corner;
c) if both are nodes then they belong to different blocks;
d) the straight line connecting SL1 with SL2 must have an angle with the
horizontal or vertical direction less than a predetermined threshold (e.g.,
30°); and
e) there is no other pair containing SL1 that satisfies condition a) and has
the same main direction as (SL1, SL2), in which pair the distance between the
singular locations is less than the distance between SL1 and SL2.
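A sketch of the geometric part of these conditions follows: the main
direction of a pair rounded to a multiple of 90°, and the angular test of
condition d). Image coordinates with the y axis pointing up are assumed;
only the 30° threshold comes from the text.

    import math

    DIRECTIONS = ("east", "north", "west", "south")   # multiples of 90 degrees

    def main_direction(sl1, sl2):
        angle = math.atan2(sl2[1] - sl1[1], sl2[0] - sl1[0])
        return DIRECTIONS[int(round(angle / (math.pi / 2))) % 4]

    def nearly_axis_aligned(sl1, sl2, threshold_deg=30.0):
        """Condition d): the connecting line deviates from the horizontal or
        vertical direction by less than the threshold."""
        angle = abs(math.degrees(math.atan2(sl2[1] - sl1[1], sl2[0] - sl1[0])))
        deviation = min(angle % 90.0, 90.0 - angle % 90.0)
        return deviation < threshold_deg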
Every boundary component of a binary image is represented in the cell list
as an item of the sublist of lines. Intermediate points of a line that are located
in the metric list compose a closed polygon. This makes the recognition of
blocks and connection lines easy and fast. Thus, during the recognition of
blocks, polygons contained in the cell list are tracked point by point. Each
point is checked as to whether or not it is marked as a break point. If so, then
the properties of the corresponding singular location are tested. As soon as
they do not correspond to the requirements of the block prototype, the
tracking is interrupted and the next polygon is tested. If all properties are
correct, a block is recorded into the list of blocks.
After having recognized all blocks, the program reads the list of singular
locations and looks for T-shaped crossings. Each T-shaped crossing is tested
as to whether it is located at a block boundary. If so, a node is recorded into
the list of nodes accompanied by the identifier of the block to which the node
belongs. The record of the block gets a pointer onto the first of its nodes. All
other nodes of the same block are linked to each other by pointers and thus
may be read later on without searching.

Connection lines are represented by pointers linking singular locations
(SL) to each other. Any SL has four pointers assigned to it, one for every
main direction: east, south, west, and north. Thus, an SL may be connected
to up to four other SL's. If there are fewer than four connections, then some
pointers are equal to zero. When recognizing connection lines, the polygons
in the cell list are tracked again. Singular locations are detected by means of
corresponding marks already made during the processing in the first two
hierarchy levels. Two singular locations following each other along a
boundary are tested for whether they meet the preceding conditions. If the
test holds, then each of the two SL's gets a pointer onto the other one.
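One possible record for a singular location with its four direction pointers,
reusing main_direction from the preceding sketch, is given below; the field
names are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class SL:
        x: float
        y: float
        links: dict = field(default_factory=lambda: dict(
            east=None, south=None, west=None, north=None))  # zero pointers

    OPPOSITE = dict(east="west", west="east", north="south", south="north")

    def link(sl1, sl2):
        """After two SLs pass the tests, each gets a pointer onto the other."""
        d = main_direction((sl1.x, sl1.y), (sl2.x, sl2.y))
        sl1.links[d] = sl2
        sl2.links[OPPOSITE[d]] = sl1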
The results of the recognition contained in the lists of blocks, nodes, and
singular locations with connection pointers may be displayed while represent-
ing blocks, nodes, and connection lines by different colors. To make the
display more accurate, an additional program was developed that makes all
lines either exactly horizontal or exactly vertical. The results of recognizing
the drawing of Fig. 28a are shown in Fig. 28c.

C. Cartography

Cartography, as with many other branches of production, now increasingly
uses computer aided technologies. Therefore, it is important to digitize and
to input into computers the existing cartographical documents on paper or
film. Digitization by hand is slow, expensive, and introduces many errors
whose correction is also slow and expensive. The best solution is automatic
digitization, accompanied by as little human interaction as possible.
However, this problem is so complex that almost no positive results were
known until recently. The use of the methodology described in the preceding
sections has already led to some success.
The aim of the experiment (Kovalevsky, 1989b) was recognition of
cartographical point objects in fragments of topographical maps on film. The
fragments were scanned with a high resolution scanner, binarized, and
recorded on magnetic tape. Then cell lists of these images were produced.
Twenty-five object classes of different complexity were chosen and their
prototypes described by means of text files, as explained in the previous
section. Some examples of the objects to be recognized are shown in Fig. 32.
It was possible to choose the parameters of the recognition subroutines in
such a way that the error rate was minimized at the expense of the rejection
rate. There was a total of about 100 objects in the map fragments used in the
experiment. An example of such a fragment is shown in Fig. 33. All objects
were recognized correctly. The rejection rate was about 2%. The recognition
speed was about 20 objects per second with the SM 1420 computer (similar
to the PDP-11).

FIGURE 32. Prototypes of cartographical objects to be recognized.

Another experiment was intended to make a complete analysis of map
fragments containing settlements as shown in the negative image of Fig. 34a.
In this case, the objects to be recognized are street borders, houses, trees
(represented as rings), and churches (represented as crosses).
Trees and churches being isolated objects, they can be recognized by the
previously described technique. E.g., the prototype of a tree is described as
a black region having one hole. (Fig. 34a shows a negative image.) The
number of the vertices in the polygon representing the outer boundary must
be at least six (this value depends on the scale and the approximation

FIGURE 33. An example of a test map.

FIGURE 34. Fragment of a topographical map with a settlement: (a) the negative of the
original image, (b) visualization of the cell list with approximation tolerance of 2 pixels,
(c) recognized street borders, (d) recognized houses, and (e) synthetic image reconstructed from
the list of recognized objects.
tolerance). The vertical and horizontal diameters of the region, as well as the
ratio of the diameters, must be in prescribed limits. The same must be true
for the hole. The boundary components must be convex.
This description is complete: any region satisfying all indicated conditions
is a black ring with some possible small deviation from the ideal shape. The
description is rather complex and it may seem that checking all conditions for
all regions in an image may be rather time consuming. However, due to a
skillful construction of the decision tree, the average recognition time was
made small. The fulfillment of the first three conditions may be directly
extracted from the cell list. Each following condition is checked only for
regions that satisfy the preceding ones. The most time consuming condition
(that of convexity) is checked last (and hence, rather rarely). Thus, recogniz-
ing all seven trees in the comparatively complex image of Fig. 34a took a
fraction of a second.
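The cost-ordered checking can be expressed as a short-circuit conjunction,
as in the following sketch; the region attributes and parameter bounds are
placeholders standing in for what the cell list actually provides.

    def is_convex(poly):
        """All cross products along the polygon have one sign."""
        signs = set()
        n = len(poly)
        for i in range(n):
            (x0, y0), (x1, y1), (x2, y2) = poly[i - 1], poly[i], poly[(i + 1) % n]
            c = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
            if c:
                signs.add(c > 0)
        return len(signs) <= 1

    def is_tree(r, prm):
        conditions = (
            lambda: r.num_holes == 1,                 # read directly from the cell list
            lambda: len(r.outer_polygon) >= 6,
            lambda: prm.dmin <= r.width <= prm.dmax,
            lambda: prm.dmin <= r.height <= prm.dmax,
            lambda: prm.rmin <= r.width / r.height <= prm.rmax,
            lambda: is_convex(r.outer_polygon),       # most expensive: checked last
        )
        return all(c() for c in conditions)           # short-circuits on failure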
A more complex problem is the recognition of houses, which are represented
as small rectangles merged with the strips denoting the street borders. The
program separates houses from street borders in the following way. For each
polygon edge e in the cell list whose length is greater than a threshold, all
other edges close and parallel to e are found. “Close” means that both end
points are on the black side of e, have a distance from e less than the
maximum possible strip width, and there is no third edge between e and the
edge being tested. “Parallel” means that the angle between the edges is less
than a threshold. Then a strip having e as one of its sides and a width equal
to the average distance of all found points is constructed and recorded as the
recognized street border.
All the necessary computations are fast and easily realizable when using
the cell list. However, the necessity of checking all pairs of edges may
nevertheless lead to great computation time. There are two ways to make the
computation faster. For comparatively small image fragments containing a
few thousand edges, it is sufficient to construct a circumscribed rectangle for
the edge e with sides parallel to coordinate axes, and to enlarge it by the
maximum strip width. Then, only those edges both of whose end points lie
inside the rectangle need be tested. Such a test is much faster than the
calculation of distances and angles.
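A sketch combining the geometric tests with the rectangle prefilter might
look as follows; the “black side” test of the text is omitted for brevity, and
all thresholds are assumed parameters.

    import math

    def bbox(edge, margin):
        (x1, y1), (x2, y2) = edge
        return (min(x1, x2) - margin, min(y1, y2) - margin,
                max(x1, x2) + margin, max(y1, y2) + margin)

    def inside(pt, box):
        return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]

    def parallel(e, f, max_angle_deg):
        a1 = math.atan2(e[1][1] - e[0][1], e[1][0] - e[0][0])
        a2 = math.atan2(f[1][1] - f[0][1], f[1][0] - f[0][0])
        d = abs(a1 - a2) % math.pi
        return min(d, math.pi - d) < math.radians(max_angle_deg)

    def candidate_partners(e, edges, max_strip_width, max_angle_deg):
        box = bbox(e, max_strip_width)                 # enlarged rectangle
        return [f for f in edges if f is not e
                and inside(f[0], box) and inside(f[1], box)  # cheap test first
                and parallel(e, f, max_angle_deg)]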
If the image fragment is large then it is expedient to prepare an auxiliary
data structure called “pseudoraster” (Kovalevsky, 1989b), which may be
considered as a means for two-dimensional sorting of objects in an image.
The image is (implicitly) partitioned into squares of, e.g., 32 x 32 pixels
composing a rectangular grid. For each square there is a list of pointers in the
pseudoraster indicating all objects (regions, lines, points) in the cell list that
cross the square. By this means it becomes possible to check for a given edge
e only such other edges that lie in the neighborhood of e.
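A pseudoraster in this sense is essentially a grid of buckets; the sketch
below approximates “crossing the square” by the object's bounding box,
which is an assumption made for brevity.

    from collections import defaultdict

    SQUARE = 32                                   # implicit partition, in pixels

    def build_pseudoraster(objects, bbox_of):
        """grid[(gx, gy)] lists the objects whose boxes touch that square."""
        grid = defaultdict(list)
        for obj in objects:
            xmin, ymin, xmax, ymax = bbox_of(obj)
            for gx in range(int(xmin) // SQUARE, int(xmax) // SQUARE + 1):
                for gy in range(int(ymin) // SQUARE, int(ymax) // SQUARE + 1):
                    grid[(gx, gy)].append(obj)
        return grid

    def neighborhood(grid, box):
        """Objects lying in the squares overlapped by 'box'
        (duplicates possible when an object spans several squares)."""
        xmin, ymin, xmax, ymax = box
        found = []
        for gx in range(int(xmin) // SQUARE, int(xmax) // SQUARE + 1):
            for gy in range(int(ymin) // SQUARE, int(ymax) // SQUARE + 1):
                found.extend(grid[(gx, gy)])
        return found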

The recognition results are recorded in compressed form in a list of
recognized cartographical objects. The content of the list may later be
converted into an idealized map in which the imperfection of particular
objects due to inexact drawing and to digitization errors is compensated for
by displaying for each cartographical object a standard symbol. Such an
idealized map produced fully automatically from the image of Fig. 34a is
shown in Fig. 34e. These investigations have served as a starting point for the
suggestion of a new cartographical data structure (Kovalevsky, 1989b) that
may be used efficiently in computer aided cartography.

D. Technical Drawings

As in the case of cartography, many companies using computer aided design
(CAD) technologies are interested in digitizing and inputting into computers
technical drawings that exist on paper. As already mentioned, digitization by
hand is slow, expensive, and unreliable. Automatic or semi-automatic
digitization, with as little human interaction as possible, however, can by no
means be considered as a solved problem. Consider ways of applying the cell
list methodology to this field. The aim in this case is automatic or semi-
automatic production of CAD-files from drawings existing on paper. Dif-
ficulties arise immediately during the scanning: the drawings are often as
large as one meter (40 inches) and contain lines only 0.1 mm (0.004 inch)
thick. Scanning such lines without loss of information requires a resolution
of 20 dots per millimetre or 500 dpi. It is difficult to construct a scanner that
can scan such a document as a whole with the necessary resolution. However,
even having overcome these difficulties, one arrives at a difficult situation: the
scanned data, even if just binarized during the scanning, require a storage
space of about 300 Mbyte, while the overwhelming part of these data
represent nothing but white background. Processing such a huge amount of
data in the usual form of a two-dimensional array is difficult because there
is usually not enough main memory. Working with disk files is too slow.
Economical encoding of the data, e.g., by the run length code, makes the
processing rather complex and awkward because connected regions and lines
in the drawing are cut into many small pieces.
The methodology of cell lists suggests an efficient solution of the problem.
It is possible to scan a large drawing fragment by fragment, produce a cell list
of each fragment, and then link the fragment lists to each other, thus
composing a total list. It is even possible to scan a large fragment or the whole
drawing with some low resolution of about 2 dots per millimeter (50 dpi),
produce a cell list, recognize locations where higher resolution may be
required, and then scan these locations for the second time with some high
resolution. Such locations are, e.g., those where characters are to be read, or

a b

FIGURE 35. Replacing (a) a magnified low resolution fragment in the cell list by (b) a high
resolution fragment.

where there are some arrows of the dimension lines, or where two or more
lines are located close to each other.
Experiments were done by means of a scanner consisting of a moving table,
controlled by the computer, and a fixed CCD-camera with two different
objectives yielding resolutions of 60 and 600 dpi. A fragment of
200 x 200 mm was scanned with low resolution and quantized into four gray
levels such that level 0 corresponded to black regions and thick lines, level 3
to the white background, and levels 1 and 2 to thin lines, or groups of merged
parallel thin lines, or gaps between close lines. Then, a cell list was prepared
for this image and the curvature of the boundaries was measured. All
locations where the curvature was greater than a threshold (of about
(5 mm)^{-1}) were recorded as “suspect.” Each suspect location was surrounded
by a square of 2 x 2 mm. Overlapping squares were joined to clusters. Then
each cluster was scanned with the high resolution, the obtained image was
binarized and converted into a cell list, and this cell list was linked to the low
resolution list. The scale of the latter was adjusted to that of the high
resolution list simply by multiplying all coordinates by 10.
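The joining of overlapping suspect squares into clusters can be sketched as
repeated rectangle merging; the representation of a cluster as its enclosing
rectangle is an assumption.

    def merge_squares(squares):
        """squares: (xmin, ymin, xmax, ymax) tuples; transitively overlapping
        squares are merged into enclosing rectangles (the clusters)."""
        clusters = [tuple(s) for s in squares]
        changed = True
        while changed:
            changed = False
            out = []
            for s in clusters:
                for i, c in enumerate(out):
                    overlap = not (s[2] < c[0] or c[2] < s[0] or
                                   s[3] < c[1] or c[3] < s[1])
                    if overlap:
                        out[i] = (min(s[0], c[0]), min(s[1], c[1]),
                                  max(s[2], c[2]), max(s[3], c[3]))
                        changed = True
                        break
                else:
                    out.append(s)
            clusters = out
        return clusters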
The linking procedure may be explained as follows. Suppose the location
of the boundary of the cluster scanned with high resolution is known exactly
as a region in the coordinate system of low resolution. Consider the boun-
daries of the dark strips that represent the drawing lines in both images with
low and high resolution. Regard now the crossing points of these boundaries
with the boundary of the cluster (Fig. 35). The part of the low resolution
image (Fig. 35a) inside the cluster boundary, shown as a thin square in Fig.
35a, must be deleted and replaced by the high resolution image inside the
corresponding square of Fig. 35b. Thus, those edges of the polygons
representing the stroke boundaries in the low resolution list, which cross the
cluster boundary, must be deleted and replaced by the corresponding edges

of the high resolution list. E.g., the edge L of Fig. 35a must be deleted, the
edge I of Fig. 35b must be disjoined from the point r of the high resolution
list and connected to the point p of the low resolution list. Similar changes
must be made with all edges crossing the cluster boundary.
The problem consists in finding the correspondence between two sets of
edges of both lists. The first difficulty arises when there are more high
resolution edges than low resolution ones. This may happen if some lines
close to each other are merged in the low resolution image (compare, e.g., two
horizontal lines in the bottom part of Fig. 35b with the corresponding part
of Fig. 35a). If such merged lines cross the cluster boundary, additional lines
must be inserted into the combined cell list.
The second difficulty arises because the location of the cluster in the low
resolution coordinate system is known only approximately. Consequently,
the boundary of the cluster must be replaced by a strip whose width depends
on the accuracy with which the location of the cluster is known. The mapping
between the two sets of edges that cross the strip must be such that the
directions of edges mapped onto each other coincide approximately and the
sum of squared distances between such edges be minimal.
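For the handful of edges crossing one cluster boundary, the mapping can be
found by exhaustive search, as sketched below; the representation of an edge
by a crossing point and a direction angle is an assumed simplification.

    import itertools, math

    def match_edges(low, high, max_angle=0.3):
        """low, high: lists of (point, angle) pairs at the strip. Return the
        assignment of low to high edges with approximately coinciding
        directions that minimizes the sum of squared point distances."""
        best, best_cost = None, math.inf
        for perm in itertools.permutations(range(len(high)), len(low)):
            if any(abs(low[i][1] - high[j][1]) > max_angle
                   for i, j in enumerate(perm)):
                continue                       # directions must nearly coincide
            cost = sum(math.dist(low[i][0], high[j][0]) ** 2
                       for i, j in enumerate(perm))
            if cost < best_cost:
                best, best_cost = perm, cost
        return best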
The computer realization of this solution was successful. The simple
drawings used in the experiments were correctly linked. The same technique
was successfully used for linking many fragments of equal resolution to a
joined cell list. This is necessary if the drawing cannot be scanned as a single
fragment even under low resolution.
Another problem important for transforming scanned drawings into
CAD-files consists in replacing black strips by mathematical lines and in
recognizing types of lines such as dimension lines, symmetry axes, etc. Trans-
forming strips into mathematical lines cannot be realized by the usual
thinning techniques since these techniques are noise-sensitive and yield rather
imprecise results for lines crossing each other at an acute angle. Using cell
lists leads to faster processing and better results since the medial axes of strips
may simply be calculated by averaging the coordinates of the polygon
vertices representing the boundaries of the strip. Recognition of line types
has also been realized successfully by means of prototypes describing
relations between dimension lines, auxiliary dimension lines, contour lines
etc. Experiments were successful under high resolution sufficient to reliably
recognize the arrows of the dimension lines. Thus, Fig. 36a shows a fragment
of a drawing and Fig. 36b shows the automatically found mathematical lines
(CAD-lines) represented by dotted, dashed, and solid lines, corresponding to
the recognized line types. The notation is clear from Fig. 36b.
Difficulties arise in the case of insufficient resolution. However, the concept
of using double resolution processing as described here may solve this

FIGURE 36. Recognition of lines and their types: (a) an original drawing and (b) the
recognition results.

problem also, since the higher resolution may always be chosen high enough
to reliably recognize the arrows as well as all other fine details.

XI. CONCLUSIONS

The concept of finite topology presented here has led to a new data structure,
called cell list, for encoding and processing segmented images. The cell lists
make any topological and geometrical analysis of images efficient and simple.
Adjacent regions, regions contained in one another, and lines and points
incident with each other or with some regions may be directly and quickly
found in the list. Distances between points, areas and perimeters of regions,
angles between line segments, etc., may be easily calculated from their coor-
dinates explicitly stored in the list, by means of usual well-known formulae.
All geometrical transformations, e.g., translation, magnification, reduction,
rotation, etc., may be performed by recalculating the coordinates of the
0-cells and intermediate points according to formulae of analytic geometry.

The results of such transformations are immediately visible: a cell list may be
rapidly converted into a raster image and sent to a display unit.
As we have demonstrated in the last section, the cell list technique was
effectively applied to object recognition and structural image analysis. A
numeric analysis of coordinates of polygon vertices, instead of the commonly
used mask matching, makes the recognition procedure tolerant of geometri-
cal distortions of the boundaries. Thus, the technique was used to convert
scanned drawings into CAD structures. This was done for both technical
drawings and hand-made block diagrams. Recognition of hand-written
characters was also performed successfully. Recognition techniques of this
kind were implemented in an experimental system for computerized
cartography. There is every reason to hope that this technique will soon be
implemented successfully for analyzing three-dimensional images and struc-
tures.

ACKNOWLEDGMENTS

The author wishes to express his deepest appreciation to Dr. R. Kopperman,
Dr. Yung Kong, Dr. O. Tretiak, and Dr. G. T. Herman for discussions and
valuable suggestions.

REFERENCES

Alexandroff, P. (1937). Diskrete topologische Raeume, Matematicheskij Sbornik 2(44), Moscow,
501-519.
Alexandroff, P., and Hopf, H. (1935). “Topologie 1.” Berlin.
Elliott, H., and Srinivasan, L. (1981). An application of dynamic programming to sequential
boundary estimation, Computer Graphics and Image Processing 17(4), 294-314.
Herman, G. T. (1990). On topology as applied to image analysis, Computer Vision, Graphics and
Image Processing 52, 409-415.
Herman, G. T., and Webster, D. (1983). A topological proof of a surface tracking algorithm,
Computer Vision, Graphics and Image Processing 23, 162-171.
Khalimsky, E. (1977). “Ordered Topological Spaces” (in Russian). Naukova Dumka, Kiev.
Kong, T. Y., and Rosenfeld, A. (1991). Digital topology: a comparison of the graph-based and
topological approaches. In “Topology and Category Theory in Computer Science” (G. M.
Reed, A. W. Roscoe, and R. F. Wachter, eds.), pp. 273-289, Oxford University Press, Oxford,
U.K.
Kong, T. Y., Kopperman, R., and Meyer, P. R. (1991). A topological approach to digital
topology, American Mathematical Monthly 98, 901-911.
Kovalevsky, V. A. (1986). Structural image analysis, Proceedings of the 8th International Confer-
ence on Pattern Recognition, Paris, October 27-31, 1986. IEEE Press, 358-368.
Kovalevsky, V. A. (1989a). Finite topology as applied to image analysis, Computer Vision,
Graphics and Image Processing 46, 141-161.
Kovalevsky, V. A. (1989b). Zellenkomplexe in der Kartografie, Bild und Ton (9, 10) Germany,
278-280, 312-314.
Kovalevsky, V. A. (1990). New definition and fast recognition of digital straight segments and
arcs, Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City,
June 17-21. IEEE Press, Vol. II, 31-34.
Pavlidis, T. (1977). “Structural Pattern Recognition.” Springer-Verlag, New York.
Pavlidis, T. (1982). “Algorithms for Graphics and Image Processing.” Computer Science Press,
Rockville, Maryland.
Rinow, W. (1975). “Lehrbuch der Topologie.” Deutscher Verlag der Wissenschaften, Berlin.
Rosenfeld, A. (1970). Connectivity in digital pictures, Journal of the ACM 17, 146-160.
Rosenfeld, A., and Kak, A. C. (1976). “Digital Picture Processing.” Academic Press (Second
Edition, 1982), New York.
Shapiro, Linda G. (1983). Computer Vision systems: Past, Present, and Future. In “Pictorial
Data Analysis” (R. M. Haralick, ed.), pp. 199-237. Springer-Verlag, New York.
Steinitz, E. (1908). Beitraege zur Analysis, Sitzungsbericht Berliner Mathematischer Gesellschaft
7, 29-49.
Strong, J. P., and Rosenfeld, A. (1973). Region adjacency graphs, Communications of the
American Computer Machinery 4, 237-246.
Ullmann, J. R. (1976). An algorithm for subgraph isomorphism, Journal of the Association for
Computing Machinery 23, 31-42.
Ullman, J. R. (1983). Relational Matching. In “Pictorial Data Analysis” (R. M. Haralick, ed.),
pp. 147-170. Springer-Verlag, New York.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 84

The Intertwining of Abstract Algebra and Structured
Estimation Theory†

SALVATORE D. MORGERA

Department of Electrical Engineering, Canadian Institute for Telecommunications Research,
McGill University, Montréal, Québec, Canada

Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 262
II. Covariance Models . . . . . . . . . . . . . . . . . . . . . 264
A. Covariance Models with Simple Symmetry . . . . . . . . . . . 265
B. Linear Covariance Models . . . . . . . . . . . . . . . . . 266
C. Covariance Estimators Conforming to the Linear Model . . . . . . 269
D. The Algebra of Inverse Toeplitz Covariances . . . . . . . . . . 271
III. Jordan Algebras . . . . . . . . . . . . . . . . . . . . . . 273
A. Generation of a Jordan Algebra . . . . . . . . . . . . . . . 274
B. Dimension of a Jordan Algebra . . . . . . . . . . . . . . . 275
C. Decomposition of the Covariance Estimator . . . . . . . . . . 278
D. Jordan Algebra Homomorphism . . . . . . . . . . . . . . . 279
IV. Explicit MLE Solution . . . . . . . . . . . . . . . . . . . 281
A. Vector Formulation of the MLE . . . . . . . . . . . . . . . 282
B. Necessary and Sufficient Condition . . . . . . . . . . . . . . 283
C. Relevance to Class of Toeplitz Matrices . . . . . . . . . . . . 285
D. Experimental Results . . . . . . . . . . . . . . . . . . . 287
V. AR Process Parameter Estimation . . . . . . . . . . . . . . . 287
A. The Transformation Method . . . . . . . . . . . . . . . . 288
B. Covariance of AR Process Parameter Estimates . . . . . . . . . 291
C. Experimental Results . . . . . . . . . . . . . . . . . . . 291
D. The Role of a Jordan Algebra . . . . . . . . . . . . . . . . 293
VI. Exact Loglikelihood for AR Process Parameter Estimation . . . . . . 296
A. The Box-Jenkins Likelihood . . . . . . . . . . . . . . . . 297
B. The Forward-Backward Likelihood . . . . . . . . . . . . . . 298
C. Maximization of the Forward-Backward Loglikelihood . . . . . . 300
D. Experimental Results . . . . . . . . . . . . . . . . . . . 302
VII. Summary and Conclusions . . . . . . . . . . . . . . . . . . 309
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 310
Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . 310
Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . 312
Appendix C . . . . . . . . . . . . . . . . . . . . . . . . . 313
References . . . . . . . . . . . . . . . . . . . . . . . . . 314

† Research supported by Natural Sciences and Engineering Research Council (NSERC)
Grant A0912 and Quebec Fonds pour la formation de chercheurs et l'aide à la recherche (FCAR)
Grant EQ 4112.


FOREWORD

Over the past decade, the notion of structured estimation and, in particular,
that of structured covariance estimation, has played a role in many statistical
signal processing problems. Too often, however, the structure of an underly-
ing problem has been used in an ad hoc manner to obtain estimators having
either low variability for finite sample size or low computational complexity.
Much of what is done, or should be done, has an abstract algebraic formalism
dating back to the laying of the mathematical foundations of quantum
theory. This work attempts to formalize both structured covariance esti-
mation and autoregressive (AR) process parameter estimation in terms of the
underlying abstract Jordan algebra, an algebra that differs from the usual
noncommutative but associative matrix algebra. Our investigation puts us on
a firm footing from which to attack a wide variety of future problems in
statistical signal processing, rather in the same manner that the introduction
of Lie algebra and Lie groups in control theory and robotics made new ideas
and developments possible.

I. INTRODUCTION

The need to estimate accurately a covariance matrix using finite data sample
size arises in a wide variety of signal processing problems. Intuition seems to
indicate that if the true covariance matrix is known to possess a certain
element pattern or structure, then constraining the structure of the estimator
in a similar manner should result in a covariance matrix estimate having
perhaps lower variability than one that would result by ignoring the
structure. The common argument in favor of a structured estimate is simple
- fewer parameters means that the available data sample will yield a “better”
estimator, since the number of data samples per parameter will be greater
vis-a-vis the same ratio for an unstructured estimator.
Doubts regarding this argument linger, however, principally because so
little is known, in general, concerning the behavior of finite sample size
estimators. The situation is further complicated if covariance matrix estima-
tion is but one step toward the estimation of another set of parameters, a
pertinent example being the estimation of the AR process parameters via the
normal, or Yule-Walker, equations. In this case, it is not at all clear for finite
data sample size if minimum bias, minimum variance AR process parameter
estimates result by using a covariance matrix estimate constrained in
structure to conform to the true covariance matrix.
In this work, we do not profess to solve the exceedingly difficult finite
sample size structure problem, but do attempt to shed some light thereon
under the assumptions that the underlying stochastic process is weakly
stationary multivariate Gaussian and that the true covariance R has a
positive definite (pd) linear structure, i.e., the true covariance may be
represented as a sum of linearly independent (li) basis matrices with each
basis matrix multiplied by a true covariance parameter. This is the versatile
model that plays an important role in the work of Anderson (1969, 1970,
1973), in which an iterative algorithm is also proposed for finding the
maximum likelihood estimate (MLE) of R under the constraint of linear
structure.
Considering the form of the Gaussian multivariate distribution (or, indeed,
the more general exponential family of distributions (Lehmann, 1986)), it is
then natural to investigate the nature of the MLE of R when R^{-1} also has a
linear structure, and to determine what impact this may have on Anderson's
iterative algorithm. A necessary and sufficient condition for both R and R^{-1}
to have a linear structure is that the subspace spanned by the li basis matrices
of R be a Jordan algebra, an algebra that played an important role in the
formalism of quantum mechanics (Jordan et al., 1934). As pointed out by
Seely (1977) and Jensen (1988), a Jordan algebra completely characterizes the
solutions to the MLE problem that are not only complete, sufficient, and
unbiased, but are also explicit and, under these assumptions, obtainable as
a relatively simple quadratic function of the data. The idea of considering an
algebra and its relationship to problems that arise in statistical signal process-
ing was inspired by the work of James (1957) on experimental designs.
In this chapter, we accomplish a number of tasks: (1) provide an alternate
view of maximization of a relevant likelihood under linear constraints;
(2) derive the manner in which Anderson's iterative approach reduces to an
explicit MLE when both R and R^{-1} have linear structure; (3) illustrate the
point that highly constrained, low-variance covariance matrix estimates may
not necessarily lead to AR process parameter estimates having equally low
variance for finite sample size; (4) derive a method for obtaining the exact
MLE of AR process parameters; and (5) complement Morgera (1992) in
demonstrating the utility of considering the underlying algebra and its
properties found in many statistical signal processing problems.
Our style is largely expository; however, new results are presented and
generally highlighted in the form of theorems, lemmas, and propositions. As
study of an algebra is a highly abstract topic in its purest form, certain
concepts not generally found in the engineering literature are presented, but
relegated to an appendix. A rather comprehensive list of references is
included for researchers wishing to delve further into topics treated.

II. COVARIANCE MODELS

Let V represent a p-dimensional vector space over R furnished with an inner
product denoted by (v, w), v, w ∈ V, and let x be a zero mean multivariate
Gaussian distributed random vector taking on values in V. The vector spaces
of (not necessarily symmetric) linear transformations and of symmetric linear
transformations of V into itself are denoted by L(V) and L_s(V), respectively
(see Appendix A, Definition 1). The covariance of x, R = E{xx^T}, is a
member of L_s(V) and is seen to satisfy E{(v, x)(w, x)} = (Rv, w), all v, w ∈ V.
When R is pd and N statistically independent (si) observations of x, x = {x_i:
i = 1, 2, . . . , N}, are available, the probability density function of the sample
set is

$$ p(\mathbf{x} \mid R) = (2\pi)^{-Np/2}\,|R|^{-N/2} \exp\!\left[-\frac{N}{2}\,\mathrm{tr}\!\left(R^{-1}\hat{R}_G\right)\right], \tag{1} $$

where

$$ \hat{R}_G = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T \tag{2} $$

is the general sample covariance matrix. Taking the logarithm of (1), multi-
plying through by 2/N, and neglecting the additive constant, the log-
likelihood function may be written as

$$ f(R \mid \mathbf{x}) = -\ln|R| - \mathrm{tr}\!\left(R^{-1}\hat{R}_G\right). \tag{3} $$

We shall have occasion to use (3), or other expressions derived from (3),
throughout this chapter.
There are two observations regarding (3) that are worth noting. The first,
which the reader may verify, is that f(R | x) ≤ f(R̂_G | x), R ∈ L_s(V), where
f(R̂_G | x) = ln |R̂_G^{-1}| − p. We have, therefore, an upper bound on f(R | x) for
R ∈ L_s(V), which can be written in terms of the entropy of the sample set and
is attained at the stationary point R̂_G, as will be shown. This upper bound
may, in turn, be tightly bounded for 1 < p < N with additional assump-
tions by the use of recent results (Jonsson, 1982). The second observation is
of theoretical interest and concerns the behavior of f(R | x) about the point
R̂_G when R is not restricted to being symmetric, i.e., when R ∈ L(V). Instead,
it is convenient to work with R^{-1} ∈ L(V). Construct such a matrix R^{-1} as
R^{-1} = R̂_G^{-1} + uE, where the perturbation E = −E^T about the point R̂_G^{-1} is a
skew-symmetric matrix, and u is a real scalar. Using (3), it is not difficult to
show that

$$ f(R \mid \mathbf{x}) = \ln\left|\hat{R}_G^{-1} + uE\right| - p. $$
In Karrila and Westerlund (1991), a proof is given for the inequality
|R̂_G^{-1} + uE| > |R̂_G^{-1}| for u ≠ 0; thus, f(R | x) > f(R̂_G | x). The conclusion is,
therefore, that the stationary point R̂_G for R ∈ L_s(V) is, evidently, a saddle
point for R ∈ L(V). This observation provides some feeling for the nature of
the loglikelihood surface.
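A small numerical check of these two observations, not part of the original
text, can be carried out as follows; it evaluates (3) at the sample covariance
and at a skew-symmetric perturbation of its inverse.

    import numpy as np

    rng = np.random.default_rng(0)
    p, N = 4, 100
    X = rng.standard_normal((N, p))
    Rg = (X.T @ X) / N                       # sample covariance, Eq. (2)

    def f(R_inv):                            # Eq. (3) written in terms of R^{-1}
        return np.log(np.linalg.det(R_inv)) - np.trace(R_inv @ Rg)

    E = np.triu(rng.standard_normal((p, p)), 1)
    E = E - E.T                              # skew-symmetric perturbation
    print(f(np.linalg.inv(Rg)))              # equals ln|Rg^{-1}| - p
    print(f(np.linalg.inv(Rg) + 0.1 * E))    # strictly larger, for u != 0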

A. Covariance Models with Simple Symmetry

The inner product on the space L_s(V) (or L(V)) is called the trace inner
product. Let {e_i: i = 1, 2, . . . , p} be any orthonormal basis for V; we define
the trace inner product for A, B ∈ L_s(V) as tr(AB) = tr(BA) = Σ_{i=1}^{p} (Ae_i, Be_i).
The trace inner product also plays a role in describing the necessary
condition for maximization of f(R | x) over R ∈ L_s(V), i.e., setting the
gradient of (3) equal to zero results in the set of p(p + 1)/2 equations,

$$ \mathrm{tr}\!\left[(R^{-1}\hat{R}_G R^{-1} - R^{-1})E_{ij}\right] = 0, \quad i, j = 1, 2, \ldots, p; \; j \geq i, \tag{4a} $$

where

$$ E_{ij} = \partial R / \partial \rho_{ij} \tag{4b} $$

and ρ_{ij} is the ijth element of R. We call a linear subspace basis set such as {E_{ij}:
i, j = 1, 2, . . . , p; j ≥ i} a structure set; in the previous case, the structure
is simple symmetry and the solution to (4a) for N ≥ p is R̂_G. The problem
becomes more interesting when additional structural constraints are
imposed. These generally lead to a loglikelihood surface that exhibits local
maxima, due to passage of the constraint set through ridges of the log-
likelihood surface associated with simple covariance model symmetry.
It is well known that R̂_G is pd with probability 1 if N ≥ p, is the MLE of
R over all pd members of L_s(V), and constitutes a sufficient, unbiased statistic
for R which is, however, not necessarily a minimal sufficient statistic. Finite
sample size results involving specific functions of R̂_G (Morgera and Cooper,
1977; Morgera, 1981) suggest that, for sufficiently large N, N must be at least
5p for R̂_G to be close to R on the average. For example, in Morgera and
Cooper (1977) and Morgera (1986), the classical two-hypothesis problem,
consisting of signal plus noise and noise only, is treated in the context of
adaptive pattern classification, and a performance criterion, the signal-to-
interference ratio (SIR), is derived and employed to compare the behavior of
various covariance matrix estimators as a function of sample size. The SIR
conditioned on an unbiased, nonsingular covariance matrix estimate R̂ of R
is shown to be approximated by

$$ \mathrm{SIR} \mid \hat{R} \approx \frac{(\mathbf{s}^T \hat{R}^{-1} \mathbf{s})^2}{\mathbf{s}^T \hat{R}^{-1} R\, \hat{R}^{-1} \mathbf{s}}. $$

The optimum, or Wiener, weight vector, w_opt, is given by w_opt = R^{-1}s, where
s is the signal vector. The preceding expression for the SIR provides a useful
means whereby the expected SIR, and consequently, the probability of
classification error under the Gaussian assumption, may be evaluated for
various covariance matrix estimates. Substituting (2) into the preceding
expression and taking the expected value leads to the very simple and
revealing result

$$ E[\mathrm{SIR} \mid \hat{R}_G] \approx \mathbf{w}_{\mathrm{opt}}^T R\, \mathbf{w}_{\mathrm{opt}} \left(1 - \frac{p+7}{N}\right), $$

irrespective of R and s. This result implies that if we desire
E[SIR | R̂_G] ≥ (1 − a) w_opt^T R w_opt, where w_opt^T R w_opt is the optimum SIR, then
we must have

$$ N \geq \frac{p + 7}{a}. $$

Note that when a = 0.2, i.e., the expected SIR is within 1 dB of the optimum
SIR, the required sample set size is at least N = 5p + 35, which is
approximately 5p for p large. The reader may select a value for p suited to
the application at hand. Figure 1 illustrates the improvement in expected SIR
with N for p = 12.
For statistical signal processing problems in which R must be accurately
estimated or f(R | x) must be maximized, the sample size necessary when R̂_G
is employed is usually not available. We also note that the sample size
required to achieve a specified estimation accuracy depends to a great extent
on the functional form of the estimate, i.e., the manner in which the estimate
depends on the elements of R,. With reference to the previous example, an
accurate estimate of the expected SIR does not necessarily imply an accurate
estimate for the Wiener weight vector. In general, it makes sense to select a
reasonable model for R that possesses fewer than [ p ( p + 1)]/2 free
parameters and to require that the estimator, R̂, conform to the model.

B. Linear Covariance Models

One covariance model that is felt to be reasonable is the linear model, for
which R is written as

$$ R = \sum_{i=0}^{m-1} \rho_i G_i, \tag{5} $$

where the structure set {G_i: i = 0, 1, . . . , m − 1} has (p × p)-dimensional
members, which are known, symmetric, and li. Assume that m < [p(p + 1)]/2
and let ρ = [ρ_0 ρ_1 · · · ρ_{m−1}]^T be the vector of unknown parameters ranging
FIGURE 1. Expected signal-to-interference (SIR) ratios for unstructured, R̂_G, and Toeplitz
structured, R̂_T, estimators as a function of sample set size N for p = 12 and b = 3. Optimum SIR
is equal to unity.

over a nonempty open set Ω of R^m. Suppose further that for each ρ ∈ Ω, the
covariance R is pd. Let L_g be the m-dimensional linear subspace of L_s(V), for
which the set {G_i: i = 0, 1, . . . , m − 1} constitutes a basis; we may then
characterize the family of covariance matrices conforming to (5) as
𝒫 = {R ∈ L_g: ρ ∈ Ω}.
We also assume that the identity matrix I ∈ 𝒫, although this is not strictly
necessary. Now using (3) and (5), the necessary condition for maximization
of f(R | x) over 𝒫 results in the set of m equations

$$ \mathrm{tr}\!\left[(R^{-1}\hat{R}_G R^{-1} - R^{-1})G_i\right] = 0, \quad i = 0, 1, \ldots, m - 1, \tag{6a} $$

where

$$ G_i = \partial R / \partial \rho_i. \tag{6b} $$
It is worthwhile mentioning that most iterative procedures that exist for
finding a solution R to (6) stem from the work of Anderson (1969, 1970, 1973)
and, therefore, require the computation of R^{-1} at each iteration and exclu-
sively utilize the basis of L_g. In principle, however, it is possible to view the
iterative process of finding a solution to (6) as one that effectively searches in

the orthogonal complement of L_g and, as it turns out, does not require
computation of R^{-1} at each iteration. To understand this, we require the
following.
Lemma 1. Let the covariance R ∈ 𝒫 conform to a pd (p × p)-dimensional
linear model with m parameters. Maximization of f(R | x) over 𝒫 results in the
system

$$ \mathrm{tr}(\Delta G_i) = 0, \quad i = 0, 1, \ldots, m - 1, $$

where the solution Δ has the form

$$ \Delta = \sum_{i=0}^{n-1} a_i \Delta_i, \quad \Delta \in L_s(V), $$

with n = [p(p + 1)]/2 − m independent parameters.

Proof. Assume N ≥ p and N finite; then Δ̃ = R^{-1}R̂_G R^{-1} − R^{-1} has
[p(p + 1)]/2 distinct elements with probability 1. The system (6) is
tr(Δ̃G_i) = 0, i = 0, 1, . . . , m − 1; this homogeneous system imposes m li
constraints on the elements of Δ̃. It follows that the solution, Δ, may be
expressed linearly as Δ = Σ_{i=0}^{n-1} a_i Δ_i using n = [p(p + 1)]/2 − m independent
parameters, a_i, i = 0, 1, . . . , n − 1, which are elements of Δ̃, with the
remaining m elements of Δ̃ linearly dependent on these. The matrices Δ_j,
j = 0, 1, . . . , n − 1, appearing in the solution are symmetric but not
necessarily li. They are partly characterized by virtue of the fact that, for the
a_j arbitrary, tr(ΔG_i) = 0 implies that tr(Δ_j G_i) = 0, which, together with
I ∈ 𝒫, implies that tr(Δ_j) = 0, i = 0, 1, . . . , m − 1; j = 0, 1, . . . , n − 1. Such
matrices of trace zero are elements of the special linear Lie algebra, sl(V)
(Samelson, 1969).
Using the form for Δ given in Lemma 1, we may express R as

$$ R = \hat{R}_G - R \Delta R. $$

This identity is quite interesting and reveals the manner in which the solution
R is related to R̂_G when certain additional structural constraints are imposed.
An iterative procedure of the type previously alluded to could now be
developed. Such an approach would, we feel, be of benefit only if n < m
and/or a general approach could be devised to find the Δ_j matrices; neverthe-
less, the connection to a Lie algebra is indeed fascinating and worthy of
additional investigation.
C. Covariance Estimators Conforming to the Linear Model

An estimator for R may have the following form:
$$ \hat{R} = \sum_{i=0}^{m'-1} \hat{\rho}_i H_i, \tag{7} $$

where m < m′ < [p(p + 1)]/2, ρ̂ = [ρ̂_0 ρ̂_1 · · · ρ̂_{m′−1}]^T, and the structure set
{H_i: i = 0, 1, . . . , m′ − 1} has (p × p)-dimensional members that are known,
symmetric, and li. Now, let L_h be the m′-dimensional linear subspace of L_s(V)
for which the set {H_i: i = 0, 1, . . . , m′ − 1} constitutes a basis. Typically,
L_g ⊆ L_h, and we leave the apparent issue of estimating a parameter vector ρ
of dimension m from a statistic ρ̂ of dimension m′ for later.
When R is symmetric Toeplitz, m = p, unless a reduced band structure is
suspected, and it is commonly assumed that L_h = L_g, i.e., the structure set
{G_i: i = 0, 1, . . . , m − 1} serves as a basis for R̂. In this case, we shall see
in Section IV that the MLE ρ̂ of ρ cannot be explicitly written in terms of the
sufficient statistic R̂_G, i.e., as a linear transformation of the elements of R̂_G
depending only on the structure set. For situations of this type, Anderson, as
previously mentioned, has refined (3) and suggested a Newton-Raphson, or
scoring (Rao, 1973), approach to finding the MLE. In practice, this method
can suffer from convergence difficulties and has problems negotiating
multiple maxima exhibited by (3) under structural constraints. In regard to
convergence, considerably improved algorithms are found in Morgera and
Armour (1989) and Mukherjee and Maiti (1988). We further note that any
algorithm should not allow excursion of the iterates outside the set Ω. The
following example illustrates a structure set when the estimation problem is
formulated as previously described.
Example 1. The symmetric Toeplitz structure results in the following
structure set when m = p = 5:

$$ G_0 = I, \quad
G_1 = \begin{bmatrix} 0&1&0&0&0\\ 1&0&1&0&0\\ 0&1&0&1&0\\ 0&0&1&0&1\\ 0&0&0&1&0 \end{bmatrix}, \quad
G_2 = \begin{bmatrix} 0&0&1&0&0\\ 0&0&0&1&0\\ 1&0&0&0&1\\ 0&1&0&0&0\\ 0&0&1&0&0 \end{bmatrix}, $$

$$ G_3 = \begin{bmatrix} 0&0&0&1&0\\ 0&0&0&0&1\\ 0&0&0&0&0\\ 1&0&0&0&0\\ 0&1&0&0&0 \end{bmatrix}, \quad
G_4 = \begin{bmatrix} 0&0&0&0&1\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 1&0&0&0&0 \end{bmatrix}. \tag{8} $$
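For general p, this structure set is easy to generate and to check
numerically; the following sketch assembles R = Σ ρ_i G_i and verifies that it
is symmetric Toeplitz (the parameter values are arbitrary).

    import numpy as np

    def toeplitz_structure_set(p):
        """G_0 = I; G_i carries units on the i-th sub- and superdiagonals."""
        return [np.eye(p)] + [np.eye(p, k=i) + np.eye(p, k=-i)
                              for i in range(1, p)]

    p = 5
    G = toeplitz_structure_set(p)
    rho = np.arange(1.0, p + 1)              # arbitrary parameter vector
    R = sum(r * Gi for r, Gi in zip(rho, G))
    assert np.allclose(R, R.T)               # symmetric
    assert all(np.allclose(np.diag(R, k), rho[abs(k)])
               for k in range(-p + 1, p))    # constant along every diagonal
    assert np.trace(G[1] @ G[1]) == 2 * (p - 1)   # consistent with (11) below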
It is instructive to represent each G_i matrix in terms of rank one linear
transformations of V into itself. Recall that, in general, for v, w ∈ V, the linear
transformation u → (w, u)v of V into itself is the dyad vw^T. Each G_i may be
represented in terms of the standard basis, {e_i: i = 1, 2, . . . , p} of V, where
e_i has unity in the ith position and zeroes elsewhere. Since the structure of R
falls within the broader class of symmetric centrosymmetric (SCS) matrices
whose elements satisfy ρ_{ij} = ρ_{ji} = ρ_{p+1−i,p+1−j}, i, j = 1, 2, . . . , p, and whose
eigenvectors (for distinct eigenvalues) are always of symmetric or skew-
symmetric form (Collar, 1962; Morgera, 1982), it is preferable to use the
following orthogonal transformation of the standard basis:

$$ \tilde{e}_1 = \frac{1}{\sqrt{2}}(e_1 + e_5), \quad \tilde{e}_2 = \frac{1}{\sqrt{2}}(e_2 + e_4), \quad \tilde{e}_3 = e_3, $$
$$ \tilde{e}_4 = \frac{1}{\sqrt{2}}(e_2 - e_4), \quad \tilde{e}_5 = \frac{1}{\sqrt{2}}(e_1 - e_5). \tag{9} $$

Now, in terms of the basis {ẽ_i: i = 1, 2, . . . , p}, we have

$$ G_0 = \sum_{i=1}^{5} \tilde{e}_i \tilde{e}_i^T, $$
$$ G_1 = (\tilde{e}_1\tilde{e}_2^T + \tilde{e}_2\tilde{e}_1^T) + \sqrt{2}\,(\tilde{e}_2\tilde{e}_3^T + \tilde{e}_3\tilde{e}_2^T) + (\tilde{e}_4\tilde{e}_5^T + \tilde{e}_5\tilde{e}_4^T), $$
$$ G_2 = \sqrt{2}\,(\tilde{e}_1\tilde{e}_3^T + \tilde{e}_3\tilde{e}_1^T) + \tilde{e}_2\tilde{e}_2^T - \tilde{e}_4\tilde{e}_4^T, \tag{10} $$
$$ G_3 = (\tilde{e}_1\tilde{e}_2^T + \tilde{e}_2\tilde{e}_1^T) - (\tilde{e}_4\tilde{e}_5^T + \tilde{e}_5\tilde{e}_4^T), $$
$$ G_4 = \tilde{e}_1\tilde{e}_1^T - \tilde{e}_5\tilde{e}_5^T. $$

Note that, in general, all dyads occur in the symmetric form ẽ_iẽ_j^T + ẽ_jẽ_i^T;
i, j = 1, 2, . . . , p. The trace inner product of the basis matrices is given, in
general, for any dimension, by

$$ \mathrm{tr}(G_iG_j) = \begin{cases} p, & i = j = 0; \\ 2(p - i), & i = j \neq 0; \\ 0, & i \neq j. \end{cases} \tag{11} $$

We return to the SIR estimation problem of Section II.A to provide the
reader with an understanding of the practical value of employing a linear,
structured covariance estimator. Assume that the problem appears to
warrant the use of an estimate of Toeplitz structure, which we denote by R̂_T,
formed as in Morgera and Cooper (1977) and Morgera (1981) using the
structure set of Example 1, and possessing m = p covariance parameters.
Very little is known concerning the finite sample size behavior of such a
structured estimate or functions thereof other than what is found in the work
of Morgera and Cooper (1977; 1981). From Morgera (1981), we conclude
that

$$ E[\mathrm{SIR} \mid \hat{R}_T] \approx \mathbf{w}_{\mathrm{opt}}^T R\, \mathbf{w}_{\mathrm{opt}} \left(1 - \frac{p+7}{N'}\right), $$

where N′ = βN is the effective sample size (Morgera-Cooper coefficient) with
coefficient β given by

In this expression, the quantity b is the average correlation distance within
the p-dimensional data vectors; for example, we have b = 1 if the data is
first-order Gauss-Markov, and ψ(·) is the psi function, the logarithmic
derivative of the gamma function. For p = 12, and assuming that b = 3, we
have β = 3.82; thus, the expected SIR is within 1 dB of the optimum SIR
when the sample set size N is at least N = 1.3p + 9.2, or between p and 2p for
p large. Figure 1 also shows the expected SIR, given R̂_T, as a function of
sample size, N.
The expressions provided here seem to comprise a useful tool for evaluat-
ing performance improvement of linear model-structured estimators over
those possessing simple symmetry only. The notion of an effective sample size
increase also assists in understanding why, for a given and small sample set
size, structured covariance matrix estimates may be positive definite, whereas
the general sample covariance matrix may be singular.

D. The Algebra of Inverse Toeplitz Covariances

We conclude this section with a number of observations concerning Toeplitz
matrices. First, to set terminology, L(V) is known as an associative algebra
of linear transformations (see Appendix A, Definition 2). Consider a
symmetric covariance matrix R ∈ 𝒫 with structure set {G_i: i = 0, 1, . . . ,
m − 1}. There are many examples in the literature, Pukhal'sky (1981) being
representative, for which it is assumed that products of the form G_iG_j ∈ L_g.
Since the G_i are symmetric, this is equivalent to stating that the G_i commute,

and, therefore, that there exists an orthogonal matrix 0, independent of the


elements of p, such that OROT is diagonal. In this case, L, is a symmetric
linear ring, or a von Neumann real algebra. It is clear from (8) or (lo),
however, that products of the form G i G j ,i, j = 1,2, . . . ,p - 1; i Zj,do not
lead to symmetric matrices representable (even for i = j , in which case the
products are symmetric) in terms of a linear combination of the Gi , i = 1,
2, . . . , p - 1. The subspace L, of Toeplitz matrices is, therefore, not closed
under multiplication and is not itself a subalgebra of L,( V ) ,or an algebra of
symmetric linear transformations. As a result, it is possible to prove that the
family 9does not admit a complete sufficient (or explicit) statistic under the
assumption of normality (Seely, 1971, 1977).
There are two things that must be done to generate a symmetric algebra
of smallest dimension for Toeplitz matrices: (1) extend the subspace L_g, and
(2) define a symmetric product that ensures closure on the extended subspace.
With respect to item (1), and without introducing further notation, we let
S ∈ L_s(V) and assume that a linear subspace L_h of L_s(V) exists, which permits
us to define the family of inverse covariances,

$$ \mathcal{P}^{-1} = \{S \in L_h: S^{-1} \in \mathcal{P}, \; \mathbf{s} \in \Omega'\}, \tag{12} $$

where s = [s_0 s_1 · · · s_{m′−1}]^T ∈ Ω′ ⊂ R^{m′} implies that S is pd. A typical member
of 𝒫^{-1} conforms to the linear model

$$ S = \sum_{i=0}^{m'-1} s_i H_i. \tag{13} $$

If L_g is the subspace of symmetric Toeplitz matrices, then L_h is the subspace
of SCS matrices. In general, certain special conditions must be met for L_g and
L_h both to exist and to constitute a symmetric algebra; these are related to
item (2) and are discussed in the next section. Also, with a slight abuse of
notation, we have dim L_h = dim L_g + dim(L_h/L_g), where L_h/L_g is the quotient
space of L_h by L_g and dim(·) denotes the dimension of the indicated
subspace. An example is provided; the purpose of this example is to illustrate
one way in which a basis set for L_h may be characterized.
Example 2. Let L, and Lh be the linear subspaces of symmetric Toeplitz
matrices and SCS matrices, respectively, where dim L, = p = 5 . We wish
to extend the free structure set for L, of Example 1 to a basis for Lh. We
find that a suitable basis for Lh is given by the set {Go, GI, . . . , G,, A , , A , ,
THE INTERTWINING OF ABSTRACT ALGEBRA 273
A , , A , } , where
-
0 0 0 0 0 ' 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
A, = 0 0 0 0 0 , A,= 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0- 0 0 0 0 0
-
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 1 0 0
A, = 0 0 0 0 0 9 A, = 0 1 0 1 0
0 1 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0-
We see that A , and A z split Go,A , splits GZ, and A, splits GI. A basis for L,/L,
is the set { A i + R: i = 1, 2, . . . , 4; R E L g ) of cosets. The dimension of the
subspace of the (5 x 5)-SCS matrices is dim Lh = 9. Note that the subspace
L, is not closed under normal matrix multiplication.

111. JORDANALGEBRAS

In this section, we characterize the algebra for which a complete sufficient


statistic exists for a covariance R conforming to a linear model. Given that
S EY ' , setting the gradient of the loglikelihood (3) equal to zero results in
tr(SR) = tr(s&), VSELh, (14)
or, equivalently, all SE Lh must be orthogonal to the difference (R - AG).A
complete characterization of the solution to (14) when both R and S conform
to linear models seems to have first appeared in Seely (1971) and then to have
been expanded upon and clarified in Jensen (1988) and earlier, unpublished
work by Jensen. We present the relevant result as the following theorem.
Theorem 1 (Seely, 1971; Jensen, 1988). A solution RE^ to ( 1 4 ) exists and
is unique fi and only if dim L, = dim Lhand Lh is a Jordan algebra of symmetric
linear transformations (see Appendix A , DeJnition 3) with a commutative or
symmetric product denoted by * and dejined as
A *B A +(AB + BA), VA, B EL h . (15)
The proof of Theorem 1 may be found in the indicated references. Note
274 SALVATORE D. MORGERA

that + ( A B + BA) = ) ( A + B)’ - +A2 - + B 2 , so it is clear that if A , BEL,,


then A * B E Lh. Conversely, let Lh be a subspace closed under the com-
position * defined by (15). Then, if A , B E Lh,we have that A’ = A * A E Lh and
ABA = 2(B * A ) * A - B * A* E Lh . The subspace Lh is, therefore, closed under
the product * and is a Jordan subalgebra of L , ( V ) .
The uniqueness of a solution to (14) has also previously been argued from
the point of view that members of Y - ’ form a closed convex set (Quang A.,
1984), which follows immediately when Lh is a Jordan algebra. The notion of
a Jordan algebra is surely a more fundamental one and characterizes
solutions to problems of this type not only when normality is assumed, but
also when the underlying distribution is a member of the broader exponential
family of distributions (Barndorff-Nielsen, 1978; Lehmann, 1986). The
implication of this is that a complete sufficient statistic exists only for
covariances having a Jordan algebra structure. The Toeplitz matrices do not
possess this structure; however, an important question involves the gener-
ation of the smallest class of matrices that includes the class of Toeplitz
matrices and does have a Jordan algebra structure.

A . Generation of a Jordan Algebra

To generate a Jordan algebra using (15) and the basis elements Gi€ L g , we
start by assuming that the Gi are candidate elements of the basis set for Lhand
form all products Gf and G,GjGi,i, j = 0 , 1, . . . , m - I; i # j . The resulting
matrices are found to be linearly representable in terms of the elements Gi and
a number of symmetric matrices not in L, that serve as additional candidate
elements of the basis set for Lh. We then proceed in a recursive fashion, again
forming the above products using all candidates until no additional can-
didates are found. The process terminates in a finite number of steps, at which
point we delete all elements that are linearly representable in terms of other
elements and retain those that remain as a basis set for L,.
Example 3. This example illustrates the extension of the Toeplitz matrix
structure set { G i : i = 0, 1, . . . , m - I}, for m = p = 5 , of Example 1 to a
structure set { H i :i = 0, I , . . . ,m‘ - I } for the Jordan algebra Lh. Following
the procedure outlined previously, we find that a basis for the subspace Lh is
THE INTERTWINING OF ABSTRACT ALGEBRA 275
given by
- - - - - -
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
H o = O O O O O , H l = O O O O O , H 2 = 0 0 1 0 0 ,
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
-0 0 0 0 1- -0 0 0 0 0 - -0 0 0 0 0-
- - - - - -
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
H 3 = 0 0 0 0 0 , H 4 = 0 0 0 0 0 , H 5 = 1 O O O 1 ,
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
-1 0 0 0 0- -0 0 0 0 0 - -0 0 1 0 0-
- - - - - -
0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 1 1 0 0 0 0
H 6 = 0 1 0 1 0 , H 7 = 0 0 0 0 0 , H , = O O O O O .
0 0 1 0 0 1 0 0 0 0 0 0 0 0 1

-0 0 0 0 0 - -0 1 0 0 0- -0 0 0 1 0-

We see that m' = dimL, = 9 and that L, is the subspace of SCS matrices,
consistent with what was shown in Example 2, and, in fact, Examples 2 and
3 illustrate that neither the method for finding a basis set nor the set itself is
unique. In terms of the generating basis, we have Go = H , H , H 2 , + +
+ +
GI = H6 H , , G2 = H4 H , , G, = H , , and G4 = H , . The multiplication (*)
table for this Jordan algebra is found in Table 1 of Appendix B. Those
matrices for which H , * Hi= H i , or H,' = H , , are called the principal idem-
potents of the algebra; these are H,, H I , and H 2 .

B. Dimension of a Jordan Algebra

Generally, finding the dimension of a Jordan algebra and an underlying basis


for the subspace can be a difficult task. In the case of the SCS matrices, we
can, however, immediately make the following proposition.
Proposition 1. The lowest dimension Jordan subalgebra of L,( V ) containing
Lg, i.e., with basis spanning the class of ( p x p)-dimensional symmetric
216 SALVATORE D. MORGERA

Toeplitz matrices, where p > 3, is the Jordan algebra Lhof ( p x p)-dimensional


+ +
SCS matrices having algebra dimension m' = [ p I( p I 1)]/2 [ p 2 ( p 2 1)]/2 +
with p I = rp/21,pz = Lp/2J, where rxl (LxJ denotes the closest integer value 3
(<) x.
The proposition is proved by noting that the product * generates elements
of Lh which are not Toeplitz but linearly representable in terms of the
elements of a spanning set { H I :i = 0, 1, . . . , m' - l}, where each HIsatisfies
HI= PH,P, with P a conformable contraidentity matrix having unity along
the secondary diagonal and zeroes elsewhere (Collar, 1962; Morgera, 1982).
Since P 2 = I, each element of Lhis isomorphic to a (Cartesian) product of two
Jordan algebras, one associated with a symmetric subspace and the other a
skew symmetric subspace. The expression for dim Lh results simply by adding
the dimensions of these two orthogonal subspaces. Now, let f be a special
and formally real Jordan algebra (see Appendix A, Definition 3). In abstract
algebraic terms, we have, in general, that f z yl x f 2 x * x y k ,where z
denotes isomorphism, the symbol x denotes Cartesian product, and the $I
are called simple Jordan algebras (Jordan et al., 1934; Jensen, 1988), of which
there are three types that may be characterized by degree (see Appendix A,
Definition 3). If 2 is the Jordan algebra of ( p x p)-dimensional SCS
matrices, wherep > 3, then we have, specifically, that f z Ap, (R) x A p 2 ( R ) ,
where A p , ( R )denotes the algebra of ( p , x p , ) symmetric matrices over R,
i = I , 2. Noting that the dimension of A p , ( R )is [ p , ( p ,+ l)]/2, i = 1, 2,
establishes the result.
A few words on the simple Jordan algebras are in order; the reader should
refer to Appendix A. The Jordan algebra R of degree 1 characterizes a
covariance R of the form R = pZ, where p > 0 is the only parameter. The
+
Jordan algebra R x W of degree 1 dim W characterizes a covariance con-
forming to the linear model ( 5 ) if and only if
G,G,+G,Gl=(2/p)tr(G,G,)Z, i , j = l , 2, . . . , m - 1 ,
where dim W = m - 1 and we assume, without restriction, that the GI are
trace zero. Although this algebra is useful in the reduction of higher degree
algebras, it does not appear that linear covariance models with structure set
elements possessing the preceding condition occur naturally in statistical
signal processing problems. Basis matrices that do satisfy the condition,
however, arise in problems in quantum and wave mechanics. For p = 4, an
example of a matrix characterized by a simple Jordan algebra R x W of
THE INTERTWINING OF ABSTRACT ALGEBRA 277
degree 4 has the form

where a, 6, c, d E R.The matrix R may be linearly expanded as R = [(a + b)/


+ + +
211 [ ( a - b)/2]E2 cEl dE,, where the trace zero, symmetric structure
set elements, E l , i = 1, 2, 5 , are known, to within a purely imaginary scalar,
as the Dirac or Eddington matrices (Jeffreys and Swirles, 1956). It is not
difficult to show that the previous Jordan algebra condition on the structure
set elements is satisfied, since El E, + E,El = 2dIJZ,i, j = 1, 2, 5 ; thus, given
that R is pd, the matrix R - ' may also be linearly expanded using the same
structure set. The interested reader will find details on the distribution of
covariance estimates for Jordan algebras of the type R x Win Jensen (1988);
also, since covariance models of this type are parameterized by Clifford
algebras, results found in Hile and Lounesto (1990) will be of assistance. The
last type of simple real Jordan algebra is A p , ( R )and is said to be of degree
p I ,p I 2 3; in the context of the Toeplitz and SCS matrix classes, this is the
relevant Jordan algebra. In certain special cases, simple Jordan algebras of
degree 2 can be used to describe the substructure of higher degree problems.
For example, since A2(R)E R x W, where dim W = 2, we can write the
Jordan algebra of (p x p)-dimensional SCS matrices as f E (R x W ) x
(R x W) and f z A3(R)x (R x W), for p = 4 and p = 5, respectively.
At this point, it is important to see the manner in which the basis, or SCS
structure set, { H , : i = 0 , 1, . . . , m' - l } relates to a basis for V. We provide
an example.
Example 4. Consider the basis for V , { E l : i = 1, 2, . . . , p}, where p = 5 ,
presented in Example 1. We have
5 5
H, = a,k/ZkET, i = 0, 1, . . . , m' - 1, ( 17a)
k=l /=I

where
alk/ = ( H IZ/ 9 E,) = El H IE/. ( 17b)
Since the E,, i = 1, 2, . . . , p , form an orthonormal set,
5

The set of structure constants {aik/} may be arranged into m' p-dimensional
278 SALVATORE D. MORGERA

matrices u i , where aikl= uilk is the ( k , I)th entry in the ith matrix, i = 0,
1, . . . , in' - 1. We say that ai is the matrix of Hi relative to the basis set
{Ci:i = 1,2, , . . ,p} of V(see Appendix A, Definition 1). The correspondence
Hi ++aj is an isomorphism of L,,, the Jordan algebra, onto the algebra L,( V )
of the symmetric ( p x p)-dimensional matrices with entries ajk,.The interested
reader may wish to find the a ; ; we provide one as a means of checking. The
isomorphism H, 2 u, has structure matrix
- 0 1 0 ; 0 0-
a

1 0 0 :
I
0 0
I

0 0 0 : 0 0
I
_ _ _ _ r _ _ _ _ _ _ _ - - - - - - - -
I

0 0 0 : 0 - 1
I

-0 0 0 : - I 0-
where we have explicitly shown the partition into blocks of sizep, = rp/21 and
pz = Lp/2J. We note that the a; reveal a decomposition of V into orthogonal
subspaces as the direct sum V = V, 0 6 .In this example, we have V, = sp{C,,
E2, E , } and V, = sp{&,, E5}, where sp denotes linear span.

C. Decomposition of the Covariance Estimator

Now, let us assume that R e 9 and that Y = 9-', where R may be written
as R = ZyLil p,Hi. Let R of (7) be the solution to (14) and recall that
$ z A ~ $ ~ x . . . x $ k . T h e u n i q u e d e c o m p o s i t i o n V =V , 0 V 2 0 . . . 0 v k
into a direct sum of k orthogonal subspaces allows R to be written as

where the Q, E Lh satisfy Zf=I Q, = I; Q, Q, = 6,)Q,; and the range of Q, is F,


i.e., = Q,( V ) , i, j = 1, 2, . . . , k . In other words, if R is an element of a
Jordan algebra, the complete sufficient statistic R is the orthogonal projection
of R, onto L,, the scalar product of matrices being the trace inner product.
The expression (19) is actually an expansion of R in terms of the ideal
,
sfructure of Lh, i.e., R = X:= R , , where R, E X ,the ith (two-sided) ideal of
.a,
L,, and Y, = 0, i, j = 1,2, . . . ,k ; i # j (see Appendix A, Definition 4). The
projector Q, is the identity element in $, i = 1, 2, . . . , k . In this light,
the covariance estimation problem is seen to result in the explicit form (19),
and is reduced to finding the ideals and ideal identities of the algebra.
Example 5 describes how this is done for the class of SCS matrices.
Example 5. The reader should refer to Appendix B. Consider the Jordan
THE INTERTWINING OF ABSTRACT ALGEBRA 279
algebra of ( p x p)-dimensional SCS matrices for which f z A p , ( R )x
&ZP2(R).The algebra is seen to possess two ideals, 9, and 9,, where A , A , = 0,
A , E 3,.It is not difficultto surmise from Appendix B that these ideals
A , E 9,,
may be expressed for p = 5 as
9,= { A ,E Lh: A , = + H 3 )+ R ( H , + H4)
R(Ho
+ R(H8 + + RH, + R H , + RH,}
H7) (20)
9 2 = (A2 E L,: A2 = R(H0 - Hj) + R(H1 - + R(H8 - H , ) } ,
H4)

where R denotes an element of the real field; there are m' = 9 such scalar
, .a,, respectively, are
parameters. Typical members A , , A , of the ideals 9,
shown as follows, where a, 6 , c, . . . , i E R :
-
a c e c a g i 0 -i -g
c b f b c i h 0 -h -i
A, = e f d f e , A,= 0 0 0 0 0
c b f b c -i -h 0 h i
a c e c a -g -i 0 i g
To find thc dentities in each ideal, we write the ideal subspace multiplication
tables; these are found in Table 2 of Appendix B. Examining this table reveals
that, with proper normalization, the orthogonal projectors are
Ql = +(Ho + Hj) + +(Hi + H4) + H2
(21)
Q2 = +(Ho - H3) + +(HI - H4).

As expected, the columns of Q, span the 3-dimensional symmetric subspace


V, of V and the columns of Q, span the 2-dimensional skew symmetric
subspace of V , with V = 0 V,. We have 9,2 &,(R) and 9,2 A , ( R ) ;
furthermore, it is possible to show that 3, z R x W,, where W, is a real
vector space of dimension 2.

D. Jordan Algebra Homomorphism

We now present the final property of Jordan algebras and the idea of an
algebra homomorphism, a mapping that preserves algebra structure and a
concept that has its analog in almost every part of mathematics (see
Appendix A, Definition 5). To do this, we formalize the results found in
Example 4. In terms of the algebras involved, the isomorphism discussed in
Example 4 may be thought of as a 1 : 1 Jordan algebra homomorphism,
which we refer to simply as a Jordan algebra isomorphism in the sequel, such
280 SALVATORE D. MORGERA

that t : f + L , ( V ) , where f = j , x f 2 x x f k is the product of simple


Jordan algebras. We have t ( f ) = L,,, ~ ( 1 =) I, and t ( a * b) = &[t(a)t(b)+
t(b)t(a)],Va, b E f (Jensen, 1988). It is now possible to decompose further the
estimation problem using the Jordan algebra isomorphism and the ideal
structure. The statement and proof of the decomposition is due to Jensen
(1988); however, we have rephrased it in terms of the ideal structure, mind-
ful that every ideal can be made the kernel of a homomorphism, as in
Theorem 2.
Theorem 2 (Jensen, 1988). Let the Jordan algebra f and the 1 : I linear
mapping t be as previously dejined. The ( p x p)-dimensional covariance matrix
estimate l? then admits the decomposition
r(R) = tl(Al), RI€$; i = 1, 2, . . . , k , (22)
in which the t-image of the estimate I? is placed in block diagonal form. The t I
are also 1 :1 linear mappings and the ith block t,(R,) E L,(y ) , where tl(& is
( p , x p,)-dimensional.The decomposition is uniquely determined by f and T
and Z!'= I pi = p .
In effect, the result of (22) follows by applying a 1 : 1 linear mapping to (19),
which is equivalent to the application of 1 : 1 linear mappings to the R, E 8,
i = 1, 2, . . . , k . Define yl, = Q,x,, i = 1, 2, . . . , k , where x = { x , : j = 1,
2, . . . , N } is the sample set defined in Section 11. It is clear that the yl, are
zero mean multivariate normal random vectors that are mutually statistically
independent by virtue of the assumptions on x and the fact that QlQ, = 6, Q, ,
i = 1, 2, . . . , k . Since Rl = (l/N) Ey=,yl,yc, the R, are statistically indepen-
dent and Wishart distributed (Wishart, 1928). As such properties are
invariant under the 1 : 1 linear mappings t I , the t,(l?,) are statistically
independent blocks, each of which is Wishart distributed. We note that,
in general, neither 8 nor t ( d ) are, however, Wishart distributed. The
important practical consequence of Theorem 2 is that the ( p x p)-dimensional
covariance matrix estimation problem decomposes into k smaller ( p i x p i ) -
dimensional problems with Xf=I pi = p , when the underlying structure is
described by a Jordan algebra.
Example 6. In this example, we show the correspondence between the 1 : 1
linear mapping t of Theorem 2 and the ideal structure of Example 5. Define
y,, = Q , x , , i = 1 , 2 ; j = 1 , 2 , . . . ,N,wherex,= [x,, x2, . . . x,,]~. From(l9),
we have one expression, R, = (1/ N )Zy=I y,, y i , and, from (20), since r?, E 4,
we have another expression,
Rl = 60.I(HO + 4 ) + a, ,(HI + H4) + 62.I(H* + H7)
+ 6 3 . 1 H 2 + H S + 65 I
64.1 H6 (23)
R* = PO.*(HO - 4) + 6,.2(Hl - H4) + 6 2 . 2 W B - H7).
THE INTERTWINING
THE INTERTWINING OF
OF ABSTRACT
ABSTRACT ALGEBRA
ALGEBRA 2811
28
The form for the estimates jf,, equating the
j,,,is obtained by equating the two
two expressions
expressions
example, we have
for each of R , and R,; for example, have fro.,
j0., X,”=I (xll
(1/4N)X,”=I
= (1/4N)
= +
(x,, xx ~~, ,) )~~, ,
Po,* = (x,,-
= (1/4N)X,”=, (xII - x,,)~. using the
x , , ) ~ .Now using the basis
basis for
for V,
V, {Zf:
{Z,: ii = 1,
= 1,
2, . . . , p}, of Example 4, and, in particular, thethe correspondences
correspondences H Hi-a,,
i~a,,
places (23) in the form

t(8)= 2

, I

in accord with Theorem 2. Consider the isometries q i : v- &,, where Vp, is


the vector space of dimension pi, i = 1, 2, and
-- --
1 0 0 0 1
0 1 0 1 0
1
‘pj=(p2=- 0 0 4 0 0 . (25)
Jz 0 1 0 -1 0
-1 0 0 0 - 1 -
Note that an isometry is a linear mapping that preserves inner product
values, i.e., (y,,, y,,) = ( z i j ,z i j ) .Define zij = (pi yij and shorten zijto Z,, a vector
ofdimensionp,,i= 1 , 2 ; j = 1,2, . . . , N,wherep, = 3 , p 2 = 2 . W e t h e n h a v e

(26)

-
where the N ti(&) are statistically independent, complete sufficient
statistics for the family 9 and have Wishart W ( z i ( R i )pi,
, N ) distributions.
Subsequent simulation results, as they pertain to the class of SCS matrices,
employ the decomposition (24).

EXPLICIT
IV. EXPLICIT SOLUTION
MLE SOLUTION

demonstrate the
The goal of this section is to demonstrate the manner inin which
which Anderson’s
Anderson’s
explicit expression
approach (1969, 1970, 1973) leads to an explicit expression for
for the
the MLE
MLE I?I?
apdmember of a Jordan
of R in the case that R is apdmember Jordan algebra
algebra of
of symmetric
symmetric linear
linear
282 SALVATORE D. MORGERA

transformations. We have chosen to use Anderson's approach as the starting


point, due to its generality and relative popularity; see, for example, Armour
and Morgera (1991), Gueguen (1987), and Burg et al. (1982).
We simply assume that R E 9; thus, R has the form given by (9,with the
estimate I? having the form R;:=:X FiGi. From the discussion of Section
TI, the MLE R is seen to satisfy the set of equations
m- I
1 f i j t r ( R - ' G i k l G j )= t r ( k l G i k l & ) ,
j =O
i = 0, 1, . . . , m - 1. (27)

The MLE R is completely characterized by the MLE fi = [$,, fiI . . . fimPllT


of p.

A . Vector Formulation of the MLE

Now, let 3 and gi be the [ p ( p + 1)]/2-dimensional column vectors formed in


a specific and consistent manner from the upper triangular portions of R and
G,, respectively, i.e.,
. . . P p p PI2 . . . ljp-l.plT
A 1

3 = [fill $22 3

(28)
g, = [s',l/ g';l . . . g;; g',lj . . . gp-l,p]T ;
(1) i = O , 1,. . . , m - 1.
The vector formed in the same manner from the elements of R, is denoted
+
by i,. Construct the [ p ( p 1)]/2-dimensional matrix @(I?) from the elements
4 # j , k / = fitkfij, +
$,$, , i < j , k < 1, where +lJ,kl is the element of @ ( R )in the row
position of filJ in P and in the column position of fik, in PT. Note that @(R) =
NCov (i,) and that @(I) = 2 4 0 Zp(,, -
Using these definitions, each side of (27) may be written as (Anderson,
1973)
tr ( kI G, R - I GJ) = 2g:@(I?) ~ I gJ,
(29)
tr(klGl&-'&) = 2g:@(R)-'PG; i, j = 0, I , . . , , m - 1.
Finally, we define the [ p ( p + 1)/2 x m]-dimensional matrix G as G =
fi, we obtain
I , I

[goigl/ * .*igm-l]. Substituting (29) into (27) and solving for


fi = [G'@(R)-IGI- I G '@(I?) I i, . (30)
The MLE fi of (30) has a form that resembles the one found in generalized
least squares estimation theory and is not, in general, explicit (Browne, 1977).
Note that the matrix GT@(&lG appearing in (30) is one form for the
Hessian matrix, i.e., the matrix of second partial derivatives of the log
likelihood function, f (R I x). Our goal is to describe precisely the necessary
and sufficient conditions for obtaining an explicit MLE of the form fi = A?,,
where the matrix A does not depend on the elements of 8.
THE INTERTWINING OF ABSTRACT ALGEBRA 283
To proceed, we define two operators in terms of a general ( p x p)-dimen-
sional matrix A E L (V ) . The first is the vec operator, which transforms the
matrix A into ap*-dimensional column vector, i.e., vec(A) = [A?,/A?,;.. . ;A?,]',
where A.i denotes the ith column of A (Magnus and Neudecker, 1979). The
second operator is the [p' x p ( p + 1)/2]-dimensional matrix T, which
transforms a vector a of upper triangular elements of A arranged as in (28)
into vec(A), i.e., Ta = vec(A).
With respect to the second operator, r is seen to be an extension of a
similar operator described in Magnus (1988). The system Ta = vec(A) is
clearly consistent. Solving this system for a, we have
a= (rTr)-I
Pvec (A),
assuming that T'T is pd. In fact, it follows from the discussion in Magnus
(1988) that T'T = Zp(p+,),z is idempotent; thus, a = TTvec(A)is the unique
solution.

B. Necessary and SufJicient Condition

Now, using both of the preceding operators, we may express i as


i = r7vec (2). (31)
In addition, we may write @(A)as
@(I?) = + K ) ( R0 R)r,
rT(zp2
where the ( p 2 x p2)-dimensional orthogonal matrix K is given by
P
K = 1 <e,@Z,@eT)
I = ,
(33)

and 0 denotes the Kronecker product. The matrix K is known as the


commutalion matrix (Magnus and Neudecker, 1979). The expression (32) for
@(A) may be written in a more symmetric form as
@(A)= S'(R 0 R ) S , (344
where
1
S = -(lpz + K)T (34b)
4
and S T S = @(Z). Define the ( p z x m)-dimensional matrix 9 as
I ,

9 [vec(G,)i vec(G,)i . . . / vec(G,-,)]. (35)


It is not difficult to show that S ( S T S ) - ' G = (I/$)g; this leads to the
284 SALVATORE D. MORGERA

intermediate expression,
1
@(R)@(Z)-'G= - S'(R@ R ) Y . (36)
Jz
Employing the linear model for fi, we have
Rod= c 1b i j ~ i ~ j [ ( G i O G , ) + ( G , O G , ) l ,
i j
(37)
j 2 i

where b, = 1 - 6,/2. There are a total of [m(m + 1)/2] terms in the summand
of (37). We are now prepared to state the following proposition.
Proposition 2. If R is a pd member of a Jordan algebra of symmetric linear
transformations for which the basis set is { G o , GI,. . . , G , , - ' } , i.e., RELY,
L? = Y - ' ,then the m columns of Q are spanned by m eigenvectors of R @ R
or, equivalently, m li linear combinations of the columns of 9 are eigenvectors
of R @ R. In mathematical form,
( R @ R)Y = Y X , (38)
where the m-dimensional (nonsymmetric)matrix X is pd.
The proof of this proposition follows directly from the discussion of Jordan
algebras found in Section I11 and some straightforward algebra. The pro-
position applies equally well to an estimate 8 that is also an element of the
Jordan algebra; in this case, we write (I? @ 2)s = 92.The relationship (38)
is key to simplifying (30) and exposing the explicit MLE. In fact, (38) is a
necessary and sufficient condition for obtaining an explicit MLE. We state
the result of applying Proposition 2 to the simplification of (30) as the
following lemma.
Lemma 2. Let R be as defined in Proposition 2. The MLE, fi, of p is given
bY
fi = [G'~(Z)-'G]-'G'@(Z)-'P,. (39)
The MLE so obtained is explicit and a complete suficient statistic for p.
Proof. The estimate R is an element of the Jordan algebra; substitute
(38) into (36) to obtain @(d)@(Z)-'G = G 2 . Recall that 2 ' = LY-', so
that R - ' is also an element of the Jordan algebra; thus, from Proposition 2,
@(Z)@(&'G = Gf,where P = 2-l. Using these expressions and noting that
2 [ G T @ ( r ) - ' G ] -P'
' = [GT@(r)-'G]-',we obtain [G'@(R)-'G]-'G'@(ff)-' =
[G'~(Z)-'G]-'G'@(Z)-', which simplifies (30) to (39) and renders the
solution explicit. The completeness and sufficiency follows from Section 111.
We make a remark in the special case that R is a member of a von Neumann
algebra, as defined in Section 1I.D. If this is so, the solution is explicit if and
THE INTERTWINING OF ABSTRACT ALGEBRA 285
only if m is the number of distinct characteristic roots of R. In this case, the
matrix X is of diagonal form.

C . Relevance to Class of Toeplitz Matrices

Consider the class of ( p x p)-dimensional Toeplitz matrices. Proposition 2


does not apply to this class; thus, the solution fi to (30) must be iteratively
obtained. The class of ( p x p)-dimensional SCS matrices do form a Jordan
algebra; therefore, the solution simplifies to (39) and no iteration is required.
This was observed in Szatrowski (1980), but no concrete connection to
Jordan algebras was provided. To distinguish the latter estimate from the
former, we write the MLE in the SCS case as,
fiCSCS)
= [HT@(Z)-'HI - I HT@(Z)- If,, (40)
where H is [ p ( p - 1)/2 x m']-dimensional and formed from the basis set
{Ha,H I , . . . , H , , , - , } for the SCS matrices (see Example 3) in the same
manner in which G was formed for the Toeplitz matrices in this section. The
MLE (30), applied to Toeplitz covariance estimation, will be denoted &).
We note that if the underlying stochastic process is weakly stationary, the
Toeplitz linear model would appear to impose the proper constraints for
estimation purposes. If the SCS linear model is used under these conditions,
however, a larger number of li basis matrices must be employed, thereby
violating the conditions for completeness and even minimality of the
sufficient statistic. In some cases, this may nevertheless be a practical alter-
native, as the SCS estimator requires no multiplications and only additions
to form. Assuming that the true covariance matrix R is Toeplitz, but the
estimate used is given by (40),it is not difficult to show, using (40) and
@(R)= N Cov &), that
1
cov (fi(SCS, I PI = [ H T @ ( R ) -HI-
' I

1
= - [ ( H W - 'H T @ ( R ) H ( H T H ) - ' ] .
N
The expression for Cov ( f i ( T ) I p ) when (30) is used is complex, due to the fact
that (30) is not explicit but can be directly calculated using the fourth-order
moment expression for Gaussian random variables. It is, however, possible
to relate the total variance of the two estimators as in the following lemma.
Lemma 3. For su8ciently large N and some No such that N > N o ,
tr {COV ones, I P I > tr {COV (B(T,I P I > .
froo$ We first consider the Fisher information matrix, J(p), associated
286 SALVATORE D. MORGERA

with the m-dimensional estimate under the condition of statistical


independence described in Section 11. Following the approach taken in Porat
and Friedlander (1986), we find that, for p fixed and for sufficiently large N,
JN(p), the Fisher information matrix for a sample set of size N, may be
written as
J d P ) = JN,(P) + (N - No)J(P), N > No,
for some No and where J ( p ) is a constant matrix. To a first order approxi-
mation, we then have that

Note that J i l ( p )is the Cramer-Rao bound for the variance of any unbiased
estimator p of p based on a sample set of size N. Under the assumption of
normality, the elements of J ( p ) , [J(p)],, i, j = 1, 2, . . . , m, are given by
[J(p)],, = 3 tr(R-'GiR-'G,] = gr@(R)-'gj,
where we have used Porat and Friedlander (1986) and (29). The definition of
the matrix G allows the simple result J(p) = G'@(R)-'G; thus, J(p)-' =
[G'@(R)-l GI-'.Now, we identify J ( p ) with the constant matrix J ( p ) ; thus,
for a sample set of size N, we have

We take for the estimate 8 the estimate f i ( j - ) of (30), which is an unbiased


estimate of p. Asymptotically, COV(ji,,)Ip) = (l/N)[G'@(R)-'G]-', as
described in Anderson (1970). We then see that for sufficiently large N > No,
the Toeplitz covariance estimate achieves the Cramer-Rao lower bound.
Another unbiased estimator of R, e.g., &scs) of (41), will, therefore, have a
total variance which is larger than, or equal to, that of p(',.
Lemma 3 leads us to conjecture that a minimum variance estimator of a
covariance matrix should utilize the same structure set as the true covariance
matrix itself. In the case of a true Toeplitz covariance, this is the structure set
{Ci:i = 0,1, . . . ,m - l}, where m = p . In essence, the SCS structure set {Hi :
i = 0, 1, . . . , m' - I}, where m' 2 m, overspec$es the estimation problem
and does not concentrate the data usage in the subspace L,. In this regard,
Example 2 is of value, as it provides a way of viewing the SCS structure set
as consisting of subsets in Lgand in the quotient space L , / L g .Asymptotically,
the elements of &scs, associated with the structure subset in L,,/Lgmust go to
zero, while the remaining elements approach those of p. Since the data usage
THE INTERTWINING OF ABSTRACT ALGEBRA 287
is split between the two subsets, the latter estimates will have a higher
variance for a given sample set size N .

D . Experimental Results

As the above arguments are largely based on asymptotics, the question of


how the estimators perform for finite sample size still remains. Example 7,
which follows, presents the results of an experiment for which the total
variance of f i ( T j is compared to that of &--, for finite sample size.
Example 7. Let p = 5. Consider a fourth order stationary Gaussian
autoregressive (AR) process of unit variance, that generates a sequence x,,
j = 0, 1, . . . . We then partition this sequence into N contiguous length p
subsequences, which we assemble as the vectors x, , i = 1,2, . . . ,N . Making
the crude assumption that these vectors are statistically independent, we form
R , as in (2), or, equivalently, i, , and then use (40) and (30) to find fi(scs,and
f i ( T j , respectively. Finding ficsCs,only involves an averaging over the elements
of i,, whereas, finding requires an iterative process. To find &), we use
Algorithm B, reported in Morgera and Armour (1989). In both cases,
tr {Cov (a,. ) I p ) } , is estimated for N = 3 and averaged over 100 realizations;

no appreciable difference in the results was observed when averaging was


carried out over 1000 realizations. Also, no appreciable difference was
observed when the data sample set was constructed in the manner described
previously, but only every other vector used in the simulation in order to
conform more closely to the assumption of statistical independence. The
variables in this experiment are the parameters of the AR model, viz., the two
complex conjugate pole pairs ( p ,,pp), where p , = I p , I e'", i = 1,2. In viewing
the results, we may use the spectral characterizations found in Kay (1988),
i.e., roughly speaking, a peaked AR power spectral density (psd) occurs for
lp,l 2 0.96, and a narrowband psd for 10, - B21 2 4 5 . The results are
presented in Figs. 2-5; for example, in Fig. 2, the pole magnitude 1 p , I and
pole angles 8, ,8, were fixed, and the pole magnitude I p 21 allowed to vary. The
results range over a variety of peaked and nonpeaked wideband and narrow-
band situations. In all cases, the Cramer-Rao lower bound on
tr {Cov I p ) } was also computed as described in Giannella (1986) and
found to be several orders of magnitude lower than the simulation results for
the lowest variance estimator, &.,. This is to be expected, since it is known
that the Cramer-Rao bound is not necessarily tight for finite sample size.
Other experiments supporting the results of this example may be found in
Morgera and Armour (1989) and Armour (1 990).
288 SALVATORE D. MORGERA

13 , 1
12 -
11 -
10 -

9-

2 8-
-
-
<a
b 7-
iilscs!

9
Y 6-
b
5 --
4- 4T,

3-
2-

1- Cramhr-Rao bound

0
d
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ~ 1 1 ~

IPJ
FIGURE2. Variance of structured covariance matrix estimators for N = 3. Pole parameters
are IpII = 0.96, 0, = n/4,0, = 2x13.

V. AR PROCESS
PARAMETER
ESTIMATION

The problem of AR and autoregressive-moving average (ARMA) process


parameter estimation has been extensively studied, and the available
techniques may be broadly classified into two categories: (1) direct methods,
particularly those based on maximum likelihood, e.g., Akaike (1973) and
Gueguen ( 1 987); and (2) transformation methods, particularly those based on
solution of the Yule-Walker equations, e.g., Graupe et al. (1975) and
Morgera and Armour (1989). It is worthwhile mentioning that Anderson’s
approach to problems of this type (see, e.g., Anderson, 1975, 1977) falls in the
category of direct methods for which a slightly modified likelihood function
is used, coupled with an iterative procedure based on the method of scoring.
We confine our discussion in this section to the second category and consider
how structural constraints imposed in estimating the data covariance matrix
affect the accuracy of the AR process parameter estimates.
THE INTERTWINING OF ABSTRACT ALGEBRA 289

12-
11- &
10 -

9-
Rlsjlsl
8-

1-

6-

5-
%,

1 - Cram&-Rao bound
0
d
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

IP*l
FIGURE3. Variance of structured covariance matrix estimators for N = 3. Pole parameters
are IpII = 0.96, 0, = n/4,O2 = 4 4 + 3n/50.

A . The Transformation Method

The procedure is first to form an estimate, R, of the true covariance matrix


R E 9,and then to transform this estimate into an estimate of the AR process
parameter vector a = [a,, a, a, * . . using the normal equations, i.e.,

where s2is an estimate of the variance, a,, of the zero mean, white Gaussian
term driving the AR process.
Note that (42) does not restrict the coefficient matrix, R , to be Toeplitz, a
structure that is, however, commonly used, principally because AR process
parameter estimates so obtained are asymptotically efficient, even though the
vector of covariance estimates &) is not a sufficient statistic for a. This lack
of sufficiency arises for the Toeplitz structure since the solution to (42)
290 SALVATORE D. MORGERA

FIGURE4. Variance of structured covariance matrix estimators for N = 3. Pole parameters


are (p,I = Jp21= 0.96, 0, = 4 4 .

requires 8-l and Y # 9 - l . We are then faced with a situation in which Ij(r)
may be the lowest variance finite sample size estimate among the Toeplitz,
SCS, and generalized estimates, but does not, however, result in the lowest
variance estimate of a via (42). In fact, it may be argued that the lowest
variance estimate of a using (42) must be realized through a covariance
matrix estimate consistent with R E Y and Y = 2-’, i.e., R must be a
member of a Jordan algebra of symmetric linear mappings. ThiLpoint is
related to the “old” question of whether one should utilize R - ’ or R - I ; when
Y = Y - ’ ,the two are equivalent, and when R is a MLE for R,any linear
combination, such as R - ‘ v , V E V , is a uniformly minimum variance unbiased
estimate (Seely, 1971).

B. Covariance of A R Process Parameter Estimates

It is possible to express Cov(P1p) in terms of Cov(Ij1 p), discussed in


Section IV. Let a, = 1 and let the unknown AR parameter vector be given by
0 = [a2a, . . . up- The normal equations (42) represent a nonlinear
transformation, which we denote by 6 = g(Ij), where R G and~ I j is the
estimate of the unknown covariance parameter vector p. Assume the
THE INTERTWINING OF ABSTRACT ALGEBRA 29 1

2-

1 - Cramer-Rao bound
J
O l l , l l l , l , l l l l l l l l l I I

FIGURE 5. Variance of structured covariance matrix estimators for N = 3. Pole parameters


are 0, = n/4,0, = n/4 + 371150.

appropriate regularity conditions for g( ) and assume that fi is a consistent


estimator of p, i.e., g(p) = 0 , for all admissible 0 that ensure stationarity of
the AR process. A Taylor series expansion for is then of the form

where d ( * ) is a stochastic remainder term of the indicated order. In addition,


V denotes the matrix differential operator, and it is understood that
V&'(P) = V,g'(fi)l+, = K'(P), (43b)
a matrix of partial derivatives whose dimension is that of by that of 6.The
Jacobian of the transformation is K ( O ) ,expressed in terms of 0 .To U ( N - l ) ,
the bias, b = E ( 0 ) - 0, is given by the q = 2 term in the summation in
(43a); thus, to the same order, we obtain
+
c o v (n I p ) = K ( 0 ) c o v (fi I p ) K T ( 0 ) U ( N - l ) .
(44)
Explicit expressions for K ( O ) may be found, e.g., for the Toeplitz and SCS
structures, using the respective linear covariance models. Computation of
(44) and comparison with simulation results show that for a wide range of
( p - I)th order AR process types, as examined in Example 7, the result of
292 SALVATORE D. MORGERA

FIGURE6. yariance of structured AR process parameter estimators as a function of sample


size, N .
(44) is reasonably accurate for approximately N > 3p. This makes (44) a
useful tool in conjunction with simulation for studying different covariance
estimator structures and their impact on AR process parameter estimation.

C . Experimental Results

We now present a finite sample size experiment. This example highlights the
importance of selecting an appropriate covariance matrix estimate structure
when solving for the AR process parameters via the transformation method.
Example 8. Let p = 5 and construct the data sample set as in Example 7.
Consider the fourth order AR model for which a, = - 2.7607, a, = 3.8106,
a3 = -2.6535, and a, = .9238. The variance c2 = 1 and is assumed known.
The true psd for this model is shown in Fig. 8. In view of the previous
definitions, this AR model is associated with a relatively peaked, narrowband
spectrum, which poses some difficulty for finite sample size estimation. The
actual pole positions for this model are p , = 0.98 ej2n(o.14)
and p , = 0.98 ej2"'.'').
Figures 6 and 7 show the total variance tr {Cov(i(.)lp)}and bias b , , . ,=
E{ii,(.)}- a, as a function of sample size, N , where the subscript ( )
indicates the type of structured covariance matrix estimator (T or SCS) used
prior to solution of the normal equations. The results presented represent an
average taken over 100 realizations; no appreciable differences were detected
THE INTERTWINING OF ABSTRACT ALGEBRA 293
0.2

0.18

0.16

0.14
i
u) 0.12
Al
P
0.1

0.08

0.06

0.04

0.02
3 4 5 6 7 8 9 10

N
FIGURE7. Bias of structured AR process parameter estimators as a function of sample
size, N .

when 1000 realizations were employed. In this example and the other cases
studied, the bias associated with ri, was chosen as being representative of that
calculated for the other AR parameters. Figure 6 also shows the exact
Cramer-Rao bound on tr { Cov (ii I p ) } computed using the approach found
in Giannella (1986). Use of the generalized estimate, ff,, produced variances
and biases of the elements of 6 that were several orders of magnitude higher
than those associated with the structured estimates, rendering unstructured
estimation of AR process parameter estimates virtually useless for small
sample sizes. Figures 8 and 9 verify that the SCS covariance estimator leads
to the best estimate of the AR process psd for N = 35, with the Toeplitz and
SCS estimators producing comparable psd estimates when N = 10.

D . The Role of a Jordan Algebra

The preceding example points out that if finite sample size AR process
parameter estimation via the normal equations is the goal, it is important that
the estimate be a member of a Jordan algebra. We conclude this section by
showing why this is so. To begin, we consider the matrix
- 1

I=RR*R-'=~~RR-~+R-'R); (45)
for the covariance estimators under consideration, we have that limN-.m f = I
294 SALVATORE D. MORGERA

50 1 I 1 I 1 1 I I 1

-
-
-
-
U
v)
U
0- -
-10 -

-20 -

-30 1 1 I I I I I 1 1

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

norrnalised frequency

FIGURE8. True and estimated power spectral densities for N = 3.

50 I I I I I I I 1 I

-
-
-
-
‘0
L 0-
-

-10 - -
-20 - -
-30 I I I I 1 I I I I
THE INTERTWINING OF ABSTRACT ALGEBRA 295
almost surely. Let p = 5 and consider the SCS estimator, &s, = X:=, &.cs,H,,
where the structure set is shown in Example 3. Assuming weak stationarity,
R is Toeplitz and we can write R - ' = Xf,, s,H, . One of the important reasons
that I?,,, leads to lower variance AR process parameter estimates vis-a-vis
I?,,, is the increased number of degrees of freedom (rp/2l) along the diagonal.
We consider, therefore, all products H, * H, in computing (45) that lead to
diagonal elements of f (Table 1 of Appendix B is of help in deducing which
are relevant) and call the resulting matrix D ( i ) . Some simple algebra shows
that
D(4SCS)) = &CS) * R - = &CS) + &J 9 (46)
where fi,,,,, is linearly expressed in terms of H,, H I , and H,, and is
linearly expressed in terms of H , and H4. We can show that E{&cs)} = I ;
thus, &cs, may be considered a stochastic error term (of antidiagonal
form) for an estimator that is an element of a Jordan algebra. Let =
Zf=, &cs)H, and = XtC3 4(scs,H,; in particular, C,(,, = ij5(scs)s5and
t4(scs,= & ( s C s ) s 6 . As a measure of the variance, we compute

CWS) = E{~(i,SCS,>~'(~(,CS))}
- E2{D@(SCS))1
= w a r ( d S C S ) 1 + Var (~3(SCS)IHO+ w a r (~I(SCS))
+ Var (@qSCS))IHi
+ P a r (Ciz(SCS))1~Z+ ~ ~ ~ { ~ o ( s c s )1~-3 (Ws3c( Ss C) S )
+ 2[E~~l(scs)~qscs)1 - W4(SCS)}Iff49 (47)
which shows the manner in which the error terms contribute to the variance
of the diagonal terms.
Now, consider a Toeplitz estimator. We write &) = Zp=, &)G,, where
the structure set is shown in Example 1. From (45), we have
D ( f ( , ) )= I?,,, * R-'= a,,) &), + (48)
where d,,, is expressed as l$T) = Xf=, &,H, with E{&} = I , and the
stochastic error term is now of the form = X:=, C ? , ( ~ , HIn
, , particular,
4(T)= fi2(TpSI &) = d l ( T ) s 6 , and 4(T) =h(T)(% +
sJ2. As before, we
compute
C(T) E{D(f(T))DT(f(T)N- EZ{D(f(T))}
= rvar@o(T)) + Var(@,(T))+ Var (exT))IHo
+ w a r (&)I + Var (64(T))IHI
+ w a r ( 4 T ) ) + 2 Var (@5(T))IHZ
~ [ E { J O ( T ) ~-~ E( T{ ;) ~ ( T +
) } Var(65(~))IH3
2[E{&)&) - E{;~(T)
296 SALVATORE D. MORGERA

+ [E{(Jo(T)i-
J2(r)+ P ~ ( T ) ) & ) } - E I P ~ ( T ) ) E { ; s ( T-) )~E{PxT,}IH~.
(49)
For N sufficientlylarge, elements on the same super- or sub-diagonal of
all converge to the element on the corresponding super- or sub-diagonal
of R . Comparing (47) and (49) in this case, we obtain
C(T) = C,,v + AE, (504
where
AE = Var(Ps(,))[H0 + 2H2 + 2H3]+ t H 5 , (50b)
with t representing the coefficient of H5 in (49).
The quality of importance is clearly Var ( P ~ ( ~= ) ) Var (&))(Sg + s,)'/4.
+
Since s6 s8 = (2a, + a, a, - a3a4)/a2, using the Gohberg-Semencul decom-
position of R - ' , the magnitude of this term depends on the variance of the
first lag covariance estimate and the true AR parameters which, in turn,
depend on the true covariances. In general, the norm of AE is not
thereby illustrating the importance
insignificant relative to the norm of Ccscs),
of using an SCS estimator as a first step to estimating the AR process
parameters. Furthermore, simulation results indicate that Var (P,,,,) is larger
for narrowband spectra than for wideband spectra and in the former case, if
the pole magnitudes are approximately equal, is maximum when the pole
angles are equal. The interested reader may easily carry this investigation
further in order to obtain a good understanding of the dependencies
involved.

VI. EXACTLOGLIKELIHOOD
FOR AR PROCESS PARAMETER ESTIMATION

In the previous section, we investigated the AR process parameter estimation


problem from a rather general point of view and found that if a transfor-
mation method using the normal equations is employed, it is important that
the covariance matrix estimate be a member of a Jordan algebra. In this
section, we look in more detail at this problem and derive a new method for
finding the exact MLE of the AR process parameters.
The approach taken uses the exact forward-backward loglikelhood
function of the data sequence. The loglikelihood function is then iteratively
optimized with respect to the inverse covariance. The inverse covariance is
constrained to have a SCS structure, thereby permitting the structure of a
Jordan algebra to be brought to bear in the AR process parameter estimation
problem. The method differs from that found in Anderson (1975, 1977) and
the RMLE approach of Kay (1983, 1988) in that no modification is made to
THE INTERTWINING OF ABSTRACT ALGEBRA 297
the loglikelihood, as in the former references, and no approximate order-
recursive procedure is used, as in the latter references. We feel that the
method presented is well suited for AR process parameter estimation for
finite sample size when the data sequence length is of the same order as the
AR process order.
In this section, we work directly with the scalar data sequence, which we
denote by X = { x j :j = 0, 1, . . . , M - I}, and we assume that this data
sequence is generated by a ( p - 1)th order stationary Gaussian AR process.
Note that we do not employ here the statistically independent multivariate
setup used in other sections and, for example, in Morgera and Armour (1989)
and Burg et al. (1982). We use subscripts to designate data subsequences, e.g.,
Xk,, +
= { x j :j = k, k 1, . . . , I; 12 k } and to show explicitly the dimension
of the coefficient matrix in the normal equations, i.e., the normal equations
(42) will be written as R,i = -&'h, where h = [l 0 ... OIT.

A . The Box-Jenkins Likelihood

We begin with a review of the exact loglikelihood function derived by Box


and Jenkins (1970) for the ( p - 1)th order AR process with difference
equation
xj + a , x j - , + . . . + ap-Ixj-p+l= uj, (51)
where { u, } is a statistically independent Gaussian distributed sequence
having variance 0 2 . The joint probability density function of 42 A { u j :
j = p - 1 , p , . . . , M - l} is

Changing variables in accord with (51) results in the joint conditional prob-
ability density function

where xI = [x, x, I . . . xI - p + l ] T . The joint probability density function of


the vector of initial conditions, x, = [x,, x I . . . x,-,]', is

where Rp-l = E[x,xf and A RP_,/a2 is the [ ( p - 1) x ( p - I)]-dimen-


sional covariance matrix normalized by 0 2 .Using (53), (54), and Bayes' law,
298 SALVATORE D. MORGERA

the joint probability density function of the entire data record is

(55)
The exact loglikelihood function, neglecting constant terms, then follows
directly from (55), viz.,

(56)
We note that R,,-l is a function of a, and that the partial derivatives of
f(a, a2I S)with respect to the elements of { a j : j= 1,2, . . . , p - l} are highly
nonlinear functions, thereby making iterative maximization of (56) com-
plicated. A common approach taken is to neglect the initial conditions x, and
just to maximize the logarithm of (53); this approach is not suitable for finite
sample size. We now present a solution to the problem of ML AR process
parameter estimation for finite sample size. The approach uses a new
form of the loglikelihood function, which we call the forward-backward
loglikelihood function.

B. The Forward-Backward Likelihood

The exact loglikelihood (56) is not immediately useful, since it is dependent


on both the [(p - 1) x ( p - I)]-dimensional matrix R,-, and the AR
parameter vector a. In order to obtain a maximum likelihood estimate of the
( p x p)-dimensional covariance matrix R,,, it is necessary to rederive the
expression for f(a, a2I X).This is achieved by converting the p - 1 initial
conditions, to the p initial conditions, So,,,-
I , as specified by xp- I , and
by substituting $h for a. To this end, we write the joint probability density
function of S using Bayes’ law as
P ( 9 ) = P<%,,.M-I I ~ o . , , - I ) * P(~0.p-I ) ? (57)
and rewrite the ( p - 1)th order AR difference equation as
x, = - a , x , - , - ... - a p - l x J - / I f ! + (58)
The key point to realize is that the value of xj depends only on the p - 1
previous data points, SJ- - p + I , and on uJ. This implies that we can write
P(S,,,M-lI ~ O , , - I ~ =P(X,,,M-l I ~ L - I )
THE INTERTWINING OF ABSTRACT ALGEBRA 299
The joint probability density function of the initial conditions, I, may be
found using (54), i.e.,

where kp= E[x, xf/a2 is the normalized ( p x p)-dimensional covariance


matrix. Now, substituting (59) and (60) into (57) results in

where

Neglecting constant terms, the resulting exact loglikelihood for the usual, or
forward predictor, is

It is also possible to derive the exact loglikelihood for the order p - 1


backward predictor, d, having the difference equation
x,+ dix,+l + . . . + d p - l ~ , + p =- I u,
and where d = a* (Burg, 1975). In the backward prediction case, the starting
conditions are given by XM-p,M- I . The joint probability density function of
9'is then
-
P(X) = P(%o.M-p-l I x M - p . M - I ) P(.FM-p,M- 119 (63)
which is the backward equivalent of (57). We have that
P(%o,M-p-l lxM-p,M-l) = P(%O.M-,-I I%M-p.M-2)

where x,' = [x, x,,, ... x , + ~ - ~ ] The


' . joint probability density function of
the final conditions is

Inserting (64) and (65) into (63), taking the logarithm, and dropping the
constant terms yields the exact backward loglikelihood

f"(a, a21%) =
M
+
- -In a2 - In ll?,,[ - - @(a),
2
I -
2a2
(66a)
300 SALVATORE D. MORGERA

where
M-p- I
@(a) = X $ - ~ R ; I X ~ - ~ + 1 (xlTa)'. (66b)
i =O

The approach we take is to combine linearly the forward and backward


loglikelihoods previously found (Armour and Morgera, 199 I). We present
this new result in the following theorem.
Theorem 3. Consider the A R process parameter estimation problem for a
( p - 1)th order, stationary, Gaussian A R process. Define the exact forward-
backward loglikelihood function to be the linear combination of the exact
loglikelihoods, (62) and (66),previously found. The exact forward-backward
loglikelihood function is then given by
1
f"(a, a2I X)= - MIn a*- In - -i &a), (674
a
where
&a> = 5[&W @(a)]. + (67b)
With some algebraic manipulation, we can write &a) as
&a) = 5 tr[(xp-,xpT_I + X ~ - ~ X & , ) ~ ? ; ]+] &a'(C + 6)a, (68a)
where
M- I
c=1 i =p
XiX'

and

Lj = c
M-p-1

i50
xlxl'.

Expressions (67) and (68) define the exact forward-backward loglikelihood


function, which is seen to depend on the normalized ( p x p)-dimensional
covariance matrix kpand on a*.A Newton-Raphson maximization of (67)
with respect to the inverse covariance is now made possible by eliminating the
AR parameter vector a using the normal equations, (42).

C. Maximization of the Forward-Backward Loglikelihood

To obtain the precise form off "(a, a' I X)that is to be maximized, the noise
variance is first eliminated, and then a is mapped into Rp using (42). The
partial derivative off"(a, a*I X)with respect to a' is
THE INTERTWINING OF ABSTRACT ALGEBRA 301
Setting this partial derivative equal to zero yields a2= &a)/M; therefore,
given that the MLE e(B) has been found, the MLE of a2is just e2= &(B)/M.
Substituting o2 = e ( a ) / Minto (67)and again dropping constant terms results
in
2f"(a 1%) = In 1R;' I - MIn &a). (69)
Finally, we substitute a = R;'h into (68a) and denote the result by Q(R;');
using this result, we write (69) as
f"(&'l~) = In&'[ - MlnQ(R;'), (70)
where we have dropped the factor of two on the left hand side of (69).
It is important to keep in mind I?, = R/a2 and that, by definition, a, = 1.
Referring to (42), we see that the only way that the latter condition can be
met is if [R; 'lo,, = 1. Satisfying this requirement and maximizingf"(8; ' I 3)
with respect to l?, is complicated; however, if we maximize the loglikelihood
function with respect to 3, G &', the constraint of [R;'],,, = 1 is not
difficult to enforce. The maximization problem is, therefore, stated as
follows: obtain a ( p x p)-dimensional matrix 3, with [3,],,, = 1 that
maximizes
~ " (I33), = In lSpl - Mln Q(3,), (7 1 4
where
+x~-,x~-,)~,]
&Sp) = 3 tr[(~,-,x,T_~
+ )h'S,(d + B)S,h. (71b)
In the spirit of Section 11, we expand the normalized inverse covariance, g,
according to the SCS linear model

where the dimension m' is given by Proposition 1 in terms ofp. The first and
second order partial derivatives off"(S,I 3)with respect to &,i = 1, 2, . . . ,
m' - 1, are not difficult to obtain; therefore, a Newton-Raphson maximiz-
ation of the loglikelihood, similar to that employed in Morgera and Armour
(1989) may be formulated. The derivatives and the update equation for the
optimization are given in Appendix C . The MLE of the AR process
parameters is given by B = Sph, where 3, is understood to be the estimate
obtained upon termination of the iterative process. The corresponding MLE
of the noise variance is as previously described.
It is clear that the individual forward and backward loglikelihoods, given
by (62) and (66), respectively, can be written in terms of Sp; therefore,
utilizing the linear model (72), either loglikelhood can be maximized in the
302 SALVATORE D. MORGERA

manner previously described. In the course of our experimentation, this was


done; however, it was found that consistently poor convergence resulted
when either the forward or backward loglikelihood functions were taken
individually. In contrast, optimization of the combined forward-backward
loglikelihood function was found to be much more stable, and, as a result,
the Newton-Raphson procedure was successful in converging to a local
maximum for the majority of realizations. More detailed experimental results
are presented in the following section.

D . Experimental Results

The exact forward-backward ML (FB-ML) AR process parameter estimation


method is compared with the approximate ML (AML) method of Morgera
and Armour (1989) and Burg et al. (1982), the Burg method (Burg, 1975), the
RMLE method of Kay (1983, 1988), and the least-squares forward-back-
ward (LS-FB) prediction error method of Ulrych and Clayton (l976), Nuttal
(1976), and Marple (1980). We point out that the AML method is only
approximate in the present context, as it utilizes a loglikelihood based on
statistically independent data vectors, and that the LS-FB method uses a
least-squares forward-backward prediction error criterion to estimate the
AR process parameters.
We let p = 5; two fourth-order AR models, Model 1 and Model 2, are used
in the comparison with a data record length of M = 15. In the terminology
of Section TV, Model 1 is characterized by a wideband psd with one sharp
peak and one low energy peak, while Model 2 is characterized by a narrow-
band psd with two closely-spaced sharp peaks. Comparison of the five
estimation methods is made in terms of the sample mean and variance of the
AR parameter estimates as calculated over 1000 realizations for each
method. The Cram6r-Rao lower bound on the variance of the AR parameter
estimates is calculated as before for each model and the given data record
length. Each experiment is performed as follows. The Gaussian white noise
sequence { u , } is first generated for all 1000 realizations and corrected by the
empirical mean and variance to ensure a mean of zero and a variance of
0 2 = 1.
The AML AR process parameter estimate is obtained using Algorithm B
in Morgera and Armour (1989), with the constraint that the successive
covariance matrix iterates satisfy R̂^(i) ∈ L_T, where L_T is the subspace of
symmetric Toeplitz matrices. As mentioned earlier, the resulting covariance
matrix estimate is considered to be an approximate MLE, since this method
assumes that the data record consists of, in the case when M = 15, N = 3
statistically independent p = 5-dimensional data vectors. The initialization
used for the Algorithm B iterative procedure is the Toeplitz structured biased correlation
matrix estimate given by

where x_i = [x_i x_{i−1} · · · x_{i−p+1}]^T and x_i = 0 for i < 0 and i ≥ M. The LS-FB
method solves the normal equations (42) using an SCS-structured unbiased
covariance matrix estimate defined as

where P is a conformable contraidentity matrix. The LS-FB method assumes,
therefore, that R ∈ L_S, where L_S is the subspace of SCS matrices. We
note that Marple has derived a fast algorithm for solving the normal
equations with this covariance matrix structure that avoids direct compu-
tation of R̂_cov (Marple, 1980). The exact FB-ML method is initialized using
the normalized inverse of R̂_cov; thus, we use S_p^(0) = R̂_cov^{−1}/[R̂_cov^{−1}]_{0,0} as the initial
inverse covariance estimate. This is a logical initialization, since the exact
FB-ML covariance estimator implicitly assumes an SCS structured covariance matrix and, as can be argued using the results of Section IV, the best
such SCS structured covariance estimate is R̂_cov. By way of comparison, using
the normalized inverse of the Toeplitz structured covariance R̂_cor as initialization for the exact FB-ML algorithm consistently placed iterates in regions
where the Hessian was non-negative definite, thereby causing the algorithm
to fail.
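One plausible Python sketch of this forward-backward covariance estimate and of the normalized-inverse initialization is given below; the data-matrix construction and the persymmetric averaging are standard least-squares forward-backward choices and are assumptions, not details taken from the original experiments:

import numpy as np

def fb_covariance(x, p):
    # Forward-backward sample covariance: average the forward estimate
    # with its exchange-conjugate J R^T J to enforce SCS structure.
    M = len(x)
    A = np.array([x[i:i + p + 1][::-1] for i in range(M - p)])
    J = np.eye(p + 1)[::-1]                 # contraidentity (exchange) matrix
    Rf = A.T @ A / (M - p)
    return 0.5 * (Rf + J @ Rf.T @ J)

def initial_inverse(R):
    # Normalized inverse used to start the exact FB-ML iteration,
    # so that the [0, 0] entry equals one.
    S = np.linalg.inv(R)
    return S / S[0, 0]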
The exact FB-ML estimates are also compared with the popular Burg
(Burg, 1975) and RMLE (Kay, 1983, 1988) algorithms for AR process
parameter estimation. Both of these algorithms are order-recursive methods
based on Levinson’s algorithm. Using a least-squares forward and backward
prediction error criterion, the Burg algorithm forms the reflection coefficient
estimate k̂_{i+1} assuming {â_1, â_2, . . . , â_i} are fixed, where i + 1 is the current
AR process order. The approach taken by the RMLE algorithm is the same,
with the only significant difference being that a ML criterion is employed.
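For reference, the Burg order recursion just described can be sketched compactly as follows (the textbook form for real data, not the authors' implementation):

import numpy as np

def burg(x, order):
    # Order-recursive Burg estimate: at each order, the reflection
    # coefficient minimizes the summed forward/backward prediction error
    # with the lower-order coefficients held fixed.
    f, b = x[1:].astype(float), x[:-1].astype(float)
    a = np.array([1.0])
    for _ in range(order):
        k = -2.0 * np.dot(b, f) / (np.dot(f, f) + np.dot(b, b))
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        f, b = f[1:] + k * b[1:], b[:-1] + k * f[:-1]
    return a          # [1, a_1, ..., a_order]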
In the results to follow, it will be important to compare the exact FB-ML
AR parameter estimates with the R̂_cov-derived estimates to see how well the
algorithm improves the initial estimate which, in this case, is already very
good. It is also noted that the exact FB-ML algorithm is an iterative
procedure that does not always converge to a maximum. The algorithm is
designed to reduce the step size p to zero if a step not increasing the log-
likelihood is taken. When this occurs, techniques such as annealing could be
used to find a direction where the loglikelihood increases. In our implemen-
tation, we chose to stop iterating at this point and use the result found thus
far. In this way, the number of realizations for which the algorithm fails is
counted and included in the results.

TABLE I
COMPARISON OF THE APPROX. ML, EXACT FB-ML, RMLE, BURG AND LS-FB SPECTRUM
ESTIMATION METHODS. M = 15 DATA POINTS, 1000 REALIZATIONS AVERAGED, MODEL 1

                     σ²        a₁         a₂        a₃         a₄      ¼Σᵢ var(âᵢ)
True Parameters    1.0      -0.75730   0.97811  -0.62659   0.89350
Mean - AML         0.43192  -0.68042   0.83700  -0.51864   0.85146
Mean - FB-ML       0.58846  -0.75702   0.94520  -0.58271   0.84764
Mean - RMLE        1.22402  -0.82817   0.96800  -0.55118   0.76181
Mean - Burg        1.19752  -0.82629   0.97153  -0.55791   0.76867
Mean - LS-FB       0.78102  -0.76096   0.93977  -0.57039   0.83219
Var - AML          0.33263   0.17014   0.33084   0.25434   0.09852   0.21346
Var - FB-ML        2.10817   0.03308   0.06019   0.05556   0.02679   0.04391
Var - RMLE         0.46908   0.05776   0.07598   0.06303   0.03274   0.05738
Var - Burg         0.41023   0.05156   0.07144   0.06013   0.03244   0.05390
Var - LS-FB        0.1302    0.02856   0.05538   0.04937   0.02018   0.03837
CR Bound           0.04205   0.01613   0.03173   0.02399   0.00747   0.01983

Example 9: ML Estimates for Model 1. The sample mean and variance of
the AR process parameter estimates for Model 1 are shown in Table I. The
exact FB-ML algorithm failed to converge 148 times out of 1000, but all 1000
realizations are included in the sample mean and sample variance of the
estimates. Each of the psd estimates for Model 1 is plotted in Figure 10. We
see that the first sharp peak is correctly estimated by all the methods, and the
second, lower energy peak is, in general, poorly estimated. Both the LS-FB
and the exact FB-ML psd estimates place the second peak correctly in
frequency, but the ML method more closely models the energy content. The
AML estimate is biased, shifting the second peak up in frequency from its
true position.
Referring to Table I, we note that there are large differences in the variance
of the estimates. Defining the total average variance as the sum ¼ Σ_{i=1}^{4} var(âᵢ),
we see that the AML method yields relatively high variance AR parameter
estimates. The exact FB-ML, RMLE, Burg, and the LS-FB psd estimators
all yield comparable total variance. The exact FB-ML estimate has lower
total variance than the suboptimal RMLE method, but slightly higher
variance than the LS-FB AR estimates with which the exact FB-ML
algorithm is initialized. None of the estimators attains the Cramér-Rao (CR)
lower bound on variance. The LS-FB AR estimates come the closest, with the
total variance being approximately twice the Cramér-Rao lower bound.

FIGURE 10. True and estimated power spectral densities for Model 1 and M = 15.
Turning our attention now to the white noise power estimates, σ̂², we see
that the variance is quite high. The best estimate is derived from the LS-FB
method, but it still has a variance that is an order of magnitude from the
Cramér-Rao lower bound. The poor performance of the estimators in this
respect results, in part, because the noise power estimate is derived from the
AR process parameter estimates.
Example 10: ML Estimates for Model 2. The narrowband psd of Model 2,
posing a more difficult spectrum estimation problem, is estimated with a
higher total variance for all the estimation methods. The sample mean and
variance of the AR process parameter estimates are listed in Table 11. The
exact FB-ML method failed to converge in 256 realizations out of 1000.
Again, all 1000 realizations are included in the sample mean and sample
variance of the estimated parameters.
The mean psd estimates are plotted in Fig. 11. The AML estimate fails to
produce a psd estimate that resolves the two peaks in the spectrum. None of
the estimators exactly estimates the energy of the two peaks, but the exact
FB-ML mean spectrum estimate does succeed in resolving both peaks. The
LS-FB algorithm only barely resolves the second peak.

TABLE II
COMPARISON OF THE APPROX. ML, EXACT FB-ML, RMLE, BURG AND LS-FB SPECTRUM
ESTIMATION METHODS. M = 15 DATA POINTS, 1000 REALIZATIONS AVERAGED, MODEL 2

                     σ²        a₁         a₂        a₃         a₄      ¼Σᵢ var(âᵢ)
True Parameters    1.0      -2.76070   3.81060  -2.65350   0.92380
Mean - AML         0.04556  -2.48699   3.13423  -1.98158   0.63944
Mean - FB-ML       0.49804  -2.66491   3.58007  -2.42728   0.83216
Mean - RMLE        1.23016  -2.68518   3.60916  -2.45270   0.83559
Mean - Burg        1.48711  -2.61184   3.48236  -2.35251   0.81294
Mean - LS-FB       0.75251  -2.64105   3.51312  -2.35664   0.79931
Var - AML         12.00923   1.16625   3.02472   2.60065   0.40265   1.79857
Var - FB-ML        0.84744   0.04814   0.22854   0.22259   0.04237   0.13541
Var - RMLE         0.64435   0.04903   0.14576   0.12056   0.02015   0.08388
Var - Burg         4.87627   0.06670   0.20163   0.15695   0.02148   0.11169
Var - LS-FB        0.11716   0.04227   0.19623   0.18978   0.03548   0.14523
CR Bound           0.042585  0.00778   0.02452   0.02064   0.00353   0.01412

Overall, the exact FB-ML, RMLE, Burg, and LS-FB methods yield comparable total variance in the AR process parameter estimates, well below that
of the AML method. The noise power estimates, again, have high variances,
with the lowest variance estimate resulting from the LS-FB method. As with
the wideband psd of Model 1, none of the methods attains the Cramér-Rao
lower bound.
The mean psd estimate of the exact FB-ML algorithm is compared with
the RMLE, Burg, and LS-FB algorithms in Figure 12. Though all of the
estimates have very low bias, some differences can be seen. The Burg
algorithm positions the second peak slightly up in frequency and the energy
content is too low. The LS-FB estimate used to initialize the exact FB-ML
algorithm does not adequately resolve the second peak. The RMLE and the
exact FB-ML algorithms yield very similar mean psd estimates. The peak
positions are unbiased, but the exact FB-ML estimate more closely models
the spectral energy of the first peak.
There should be concern that the good quality of estimates obtained using
the exact FB-ML algorithm is due only to the excellent initialization provided
by the covariance estimate R̂_cov. The results show, however, that the FB-ML
algorithm leads to AR process parameter estimates that are quite different
FIGURE 11. True and estimated power spectral densities for Model 2 and M = 15.

from those obtained using the LS-FB method. For each case where the
algorithm converges, a local maximum of the forward-backward log-
likelihood is attained. One might expect lower variance estimates if the
realizations for which an exact ML estimate is not found, i.e., the realizations
for which the algorithm did not converge, are excluded. To see if this is true,
an additional experiment using Model 2 with M = 15 and 1000 realizations
was conducted, only this time all the realizations for which the exact FB-ML
algorithm did not properly converge were excluded from the sample mean
and variance calculations. The results are shown in Table III. We see that the
total variance of the exact FB-ML algorithm is indeed lower than that
obtained for the LS-FB estimate. The results presented here are conservative
in that these realizations are retained.
The number of realizations for which convergence does not occur
decreases slowly with increasing data record length M. For example, with
Model 2 and M = 50, 116 realizations out of 1000 did not converge. This is
about half the number of convergence failures recorded when M = 15. With
M = 100, 86 realizations out of 1000 failed to converge. Further research is
FIGURE 12. Power spectral density estimates for Model 2 and M = 15.

TABLE III
COMPARISON OF THE EXACT FB-ML AND LS-FB SPECTRUM ESTIMATION METHODS.
M = 15 DATA POINTS, 1000 REALIZATIONS, MODEL 2.
250 REALIZATIONS EXCLUDED DUE TO CONVERGENCE FAILURE

                     σ²        a₁         a₂        a₃         a₄      ¼Σᵢ var(âᵢ)
True Parameters    1.0      -2.76070   3.81060  -2.65350   0.92380
Mean - FB-ML       0.75348  -2.64225   3.51467  -2.35724   0.79844
Mean - LS-FB       0.75496  -2.63728   3.50103  -2.34297   0.79189
Var - FB-ML        0.11962   0.04533   0.20171   0.19082   0.03483   0.11817
Var - LS-FB        0.11925   0.04554   0.20769   0.19905   0.03687   0.12229
CR Bound           0.04259   0.00778   0.02452   0.02064   0.00353   0.01412
required in order to develop an improved maximization algorithm that
ensures convergence in all cases. Some insight into convergence of the AR
process parameter estimates with increasing M under more general assump-
tions on the sequence {u_j} than those made here may be found in Davis and
Resnick (1985, 1986). In these references, the limiting distributions of the
normalized elements of a Toeplitz structured biased correlation matrix are
derived. Using these results and the transformation method, it is not difficult
to show that M^{1/2} F(M)(â − a) converges in distribution to a stochastic
vector e, where F(M) is a slowly varying function of M. The interesting
observation is that each element of e is a ratio of random variables; this
implies that the distribution of a typical element is heavy-tailed. The experimental results seem to confirm this observation. We note that while â does
appear to converge quickly to a for modest sample size M, further refinement
as M increases is slow, quite possibly due to the heavy-tailed nature of
the element distributions.

VII. SUMMARY AND CONCLUSIONS

This work has shown that, under the assumption of normality, a necessary
and sufficient condition for a complete sufficient and explicit statistic for the
covariance estimation problem is that the linear subspaces associated with
the covariance and the inverse covariance be identical and a Jordan algebra
of symmetric linear mappings. The class of symmetric centrosymmetric (SCS)
matrices forms such an algebra under the proper composition, whereas the
class of symmetric Toeplitz matrices does not. The extension of the smaller
class to the larger has been demonstrated using the composition - or
symmetric product - operation associated with a Jordan algebra. In addition,
the concepts of an ideal structure and related Jordan algebra isomorphism
has been employed to expose the form of the maximum likelihood estimator
(MLE) for the SCS class. Simulation results for finite, and very small, sample
sizes have shown that, under the assumption of weak stationarity, the highly
constrained Toeplitz covariance estimate has lower variance than the SCS
estimate, but gives rise to higher bias and variance autoregressive (AR)
process parameter estimates obtained via the normal equations. This
phenomenon is explained using an argument in which the Jordan algebra
structure plays a key role.
Cognizant of the relevance of the Jordan algebra structure, a new form of
the loglikelihood function suitable for finite sample size Gaussian AR process
parameter estimation is derived. The resulting functional is called the exact
forward-backward (FB) loglikelihood and is a linear combination of the
forward and backward loglikelihoods. Experimental results are presented
using FB maximum likelihood estimates of the SCS-structured inverse data
covariance matrix to estimate the AR process parameters via the normal
equations. The results obtained are then compared with a number of other
approaches for several AR models and finite, and very small, sample size. The
new approach offers good results, as do certain other existing methods, but
none come sufficiently close to the Cramér-Rao lower bound on estimator
variance. This, we feel, is an indication of both the enormous richness and
challenge awaiting researchers in the increasingly multidisciplinary area of
statistical signal processing.

ACKNOWLEDGMENTS

Stimulating discussions with Professor John Taylor, of Mathematics and
Statistics, McGill University, assisted in placing certain technical details in
their proper perspective. The help of Mr. Bernard Armour, of Atlantis
Scientific, Ottawa, Ontario, in obtaining some of the simulation results in
Sections V and VI is gratefully acknowledged. The facilities and excellent
research environment offered by the Information Networks and Systems
Laboratory at McGill University were key elements in making this work
possible.

APPENDIX A
ABSTRACT ALGEBRAIC CONCEPTS

Definition 1. Linear Mappings

The general terminology is linear mapping, as opposed to linear transformation. Let V₁ and V₂ be vector spaces over R, and let φ: V₁ → V₂ be a set
mapping. Then φ is called a linear mapping if

φ(v₁ + w₁) = φv₁ + φw₁, and (A.1a)
φ(λv₁) = λφv₁, v₁, w₁ ∈ V₁, λ ∈ R. (A.1b)

A linear mapping is, therefore, a mapping that preserves linear combinations.
A linear mapping φ: V → V is called a linear transformation of V, and we
denote the space of all linear (symmetric) transformations of V as L(V)
(L_s(V)). We shall have occasion to consider the composition of linear
mappings. Let φ: V₁ → V₂ and ψ: V₂ → V₃ be two linear mappings. The
composition of φ and ψ, the mapping ψ ∘ φ: V₁ → V₃, or, simply, ψφ: V₁ → V₃,
is defined by

(ψ ∘ φ)v₁ = ψ(φv₁), v₁ ∈ V₁. (A.2)

The mapping ψ ∘ φ is again linear and satisfies (A.1). For the most part, we,
in fact, work with the matrix representation of a linear mapping. Consider the
linear mapping φ: V₁ → V₂, where E₁ = {e_{1i}: i = 1, 2, . . . , p} and E₂ =
{e_{2i}: i = 1, 2, . . . , q} are basis sets for V₁ and V₂, respectively. Every vector
φe_{1i} can be expressed as

φe_{1i} = Σ_{j=1}^{q} α_{ij} e_{2j}, i = 1, 2, . . . , p. (A.3)

The mapping φ uniquely determines a (p × q)-dimensional matrix α(φ; E₁, E₂)
with ijth element α_{ij}; conversely, every such (p × q)-dimensional matrix
uniquely determines a linear mapping φ: V₁ → V₂.
Definition 2. Algebra

An abstract algebra 𝒜 over R is a vector space together with a mapping
𝒜 × 𝒜 → 𝒜, written as the product ab, a, b ∈ 𝒜, which satisfies

(λa₁ + μa₂)b = λ(a₁b) + μ(a₂b), and (A.4a)
a(λb₁ + μb₂) = λ(ab₁) + μ(ab₂), λ, μ ∈ R. (A.4b)

An algebra is called associative if

a(bc) = (ab)c, a, b, c ∈ 𝒜. (A.5)

If 𝒜 has an identity, 1, such that a1 = 1a = a, a ∈ 𝒜, it is unique. Consider
the space L(V) of all linear transformations of a vector space V. Define the
product of two transformations, φ and ψ, by the composition ψ ∘ φ = ψφ.
Clearly, the mapping (φ, ψ) → ψφ satisfies (A.4); thus, L(V) is an algebra,
which we call the algebra of linear transformations. It follows easily that this
algebra is associative. A subalgebra, 𝒜₁, of an algebra 𝒜 is a linear subspace
that is closed under multiplication in 𝒜, i.e., if a and b are any elements of
𝒜₁, then ab ∈ 𝒜₁.
Definition 3. Jordan Algebra

An abstract Jordan algebra 𝒥 over a field R is a nonassociative algebra
satisfying the identities

ab = ba and (A.6a)
(a²b)a = a²(ba) (A.6b)

for all a, b ∈ 𝒥. The simplest examples of Jordan algebras arise from associative algebras, 𝒜. Let 𝒜 be an associative algebra over R (see Definition 2).
In terms of the associative multiplication of elements in 𝒜, define a new
multiplication, or composition ∗, as

a ∗ b = ½(ab + ba). (A.7)

If we retain the vector space structure of 𝒜 and replace the associative
multiplication ab by the new multiplication a ∗ b, we obtain a Jordan algebra,
typically denoted by 𝒜⁺. If a Jordan algebra 𝒥 is isomorphic to a subalgebra
of an algebra 𝒜⁺, where 𝒜 is associative, then 𝒥 is called a special Jordan
algebra. A Jordan algebra is called formally real if a ∗ a + b ∗ b = 0 implies
that a = 0 and b = 0. Finally, every Jordan algebra has a unique identity
element, which we denote by 1. We know that L(V) is an associative algebra.
Since a ∗ b is symmetric when a and b are symmetric, L_s(V) is a Jordan
subalgebra of L(V)⁺. In fact, L_s(V) is a special Jordan algebra that is
formally real. Any Jordan subalgebra of L_s(V) inherits these properties.
Jordan algebras are extensively treated in Braun and Koecher (1966) and
Jacobson (1968).
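These properties are easy to check numerically. The following is a minimal sketch in Python (the random symmetric matrices and the identities being verified are illustrative additions, not part of the original text):

import numpy as np

def jordan(a, b):
    # Symmetric (Jordan) product of Eq. (A.7): a*b = (ab + ba)/2.
    return 0.5 * (a @ b + b @ a)

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5)); a = a + a.T   # random symmetric matrices
b = rng.standard_normal((5, 5)); b = b + b.T

ab = jordan(a, b)
assert np.allclose(ab, ab.T)          # symmetric matrices are closed under *
assert np.allclose(ab, jordan(b, a))  # commutativity, cf. Eq. (A.6a)
a2 = jordan(a, a)                     # for this product, a*a is the usual a^2
assert np.allclose(jordan(jordan(a2, b), a),
                   jordan(a2, jordan(b, a)))   # Jordan identity, Eq. (A.6b)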
Definition 4. Ideal

A right (left) ideal in an algebra 𝒜 is a subspace ℐ such that for every a ∈ ℐ
and every b ∈ 𝒜, ab ∈ ℐ (ba ∈ ℐ). A subspace that is both a right and a left
ideal is called a two-sided ideal. Every right (left) ideal is a subalgebra of 𝒜.
Definition 5. Algebra Homomorphism

In place of the term linear mapping for a mapping between vector spaces, we
use the term homomorphism to describe a linear mapping between algebras.
Let 𝒜 and ℬ be two algebras over R. A linear mapping φ: 𝒜 → ℬ is called
an algebra homomorphism if φ preserves products, i.e., φ(ab) = φa·φb,
a, b ∈ 𝒜. A Jordan algebra homomorphism preserves the symmetric product
∗ defined in Section III. The 1:1 Jordan algebra homomorphism used in
Section III is bijective, i.e., 1:1 and onto, and may equally well be called an
isomorphism.

APPENDIX B
JORDAN ALGEBRA MULTIPLICATION TABLES

TABLE 1
MULTIPLICATION (∗) TABLE FOR THE 9-DIMENSIONAL JORDAN ALGEBRA
OF (5 × 5) SCS MATRICES
TABLE 2
MULTIPLICATION (∗) TABLE FOR THE JORDAN ALGEBRA IDEALS OF
(5 × 5) SCS MATRICES
APPENDIX C
A NEWTON-RAPHSON MAXIMIZATION OF THE EXACT
FORWARD-BACKWARD LOGLIKELIHOOD FUNCTION

Let S_p be an inverse covariance matrix conforming to the linear model

S_p = Σ_{n=0}^{m′−1} S̃_n H_n,

where {H_n: n = 0, 1, . . . , m′ − 1} is the SCS structure set defined in
Example 3. To satisfy the condition [S_p]_{0,0} = 1, we set S̃_0 = 1, noting that H_0
is the matrix having [H_0]_{0,0} = 1. The maximization to be carried out is with
respect to {S̃_1, S̃_2, . . . , S̃_{m′−1}}. In obtaining the partial derivatives of
f̃(S_p | X̃), the following results are useful:

∂S_p/∂S̃_n = H_n,
The first and second order partial derivatives of f̃(S_p | X̃) are

and

and

for m, n = 1, 2, . . . , m′ − 1.
Letting s̃ = [S̃_1 S̃_2 · · · S̃_{m′−1}]^T and t̃ = [t̃_1 t̃_2 · · · t̃_{m′−1}]^T, a Newton-Raphson
maximization procedure, for example, that found in Morgera and Armour
(1989), may be used to maximize f̃(S_p | X̃). Here, t̃^{(i+1)} is defined as the
step taken from the estimate s̃^{(i)} at the ith iteration to the estimate s̃^{(i+1)},
given by

s̃^{(i+1)} = s̃^{(i)} + p t̃^{(i+1)}. (C.3)
The step size p is initially set to p = 1, and is reduced as necessary such that
the loglikelihood increases. The update equation is

Σ_{m=1}^{m′−1} h̃_{nm}^{(i)} t̃_m^{(i+1)} = −g̃_n^{(i)}, n = 1, 2, . . . , m′ − 1, (C.4)

where g̃_n^{(i)} is the nth component of the gradient given by (C.1) evaluated
at the ith iteration and h̃_{mn}^{(i)} is the (m, n)th component of the Hessian matrix
given by (C.2) evaluated at the ith iteration. Iteration is terminated
when the change in f̃(S_p | X̃) is sufficiently small for several consecutive
iterations.
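The overall procedure can be sketched generically in Python as follows, assuming user-supplied callables f, grad, and hess that evaluate the loglikelihood, the gradient (C.1), and the Hessian (C.2) at a parameter vector; the halving schedule is one possible reduction rule, not necessarily the one used in the original implementation:

import numpy as np

def newton_maximize(f, grad, hess, s0, tol=1e-8, max_iter=100):
    # Newton-Raphson maximization with the step-size reduction rule of
    # Appendix C: the Newton step of Eq. (C.4) is scaled by p, and p is
    # halved until the loglikelihood actually increases.
    s = s0.copy()
    for _ in range(max_iter):
        t = np.linalg.solve(hess(s), -grad(s))    # Newton direction
        p, f0 = 1.0, f(s)
        while p > 1e-12 and f(s + p * t) <= f0:
            p *= 0.5                              # reduce the step size
        if p <= 1e-12:
            break            # no increasing step found; stop iterating
        s = s + p * t
        if f(s) - f0 < tol:
            break            # change in loglikelihood sufficiently small
    return s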
REFERENCES

Akaike, H. (1973). Biometrika 60, 255-265.
Anderson, T. W. (1969). In "Proc. Second Intern. Symp. Multivariate Anal." (P. R. Krishnaiah, ed.), pp. 55-66. Academic Press, New York.
Anderson, T. W. (1970). In "Essays in Probability and Statistics" (R. C. Bose and S. N. Roy, eds.), pp. 1-24. Univ. of North Carolina Press, Chapel Hill.
Anderson, T. W. (1973). Ann. Statist. 1, 135-141.
Anderson, T. W. (1975). Ann. Statist. 3, 1283-1304.
Anderson, T. W. (1977). Ann. Statist. 5, 842-865.
Armour, B. (1990). "Structured Covariance Autoregressive Parameter Estimation." M.Eng. Thesis, E.E. Dept., McGill Univ., Montreal.
Armour, B., and Morgera, S. D. (1991). IEEE Trans. Sig. Proc. 39, 1985-1993.
Barndorff-Nielsen, O. (1978). "Information and Exponential Families in Statistical Theory." Wiley, New York.
Box, G. E. P., and Jenkins, G. M. (1970). "Time Series Analysis - Forecasting and Control." Holden Day, San Francisco.
Braun, H., and Koecher, M. (1966). "Jordan-Algebren." Springer, Berlin.
Browne, M. W. (1977). Brit. Journ. Math. and Statist. Psychol. 30, 113-124.
Burg, J. P. (1975). "Maximum Entropy Spectral Analysis." Ph.D. Thesis, E.E. Dept., Stanford Univ., California.
Burg, J. P., Luenberger, D. G., and Wenger, D. L. (1982). Proc. IEEE 70, 963-974.
Collar, A. R. (1962). Quart. Journ. Mech. and Appl. Math. XV (3), 265-281.
Davis, R. A., and Resnick, S. (1985). Stochastic Process. Appl. 20, 257-279.
Davis, R. A., and Resnick, S. (1986). Ann. Statist. 14, 533-558.
Giannella, F. (1986). IEEE Trans. Acoust., Speech, Sig. Proc. ASSP-34, 994-995.
Graupe, D., Krause, D. A., and Moore, J. B. (1975). IEEE Trans. Automat. Contr. AC-20, 104-107.
Gueguen, C. (1987). In "Signal Processing" (J. L. Lacoume, T. S. Durrani, and R. Stora, eds.), pp. 707-779. North-Holland, Amsterdam.
Hile, G. N., and Lounesto, P. (1990). Linear Algebra Appl. 128, 51-63.
Jacobson, N. (1968). "Structure and Representations of Jordan Algebras." Amer. Math. Soc., Providence, RI.
James, A. N. (1957). Ann. Math. Statist. 28, 993-1002.
Jeffreys, H., and Swirles, B. (1956). "Methods of Mathematical Physics," 3rd ed. Cambridge Univ. Press, Cambridge.
Jensen, S. T. (1988). Ann. Statist. 16, 302-322.
Johnson, D. (1982). J. Multivariate Anal. 12, 1-38.
Jordan, P., Neumann, J. v., and Wigner, E. (1934). Ann. Math. 35, 29-64.
Karrila, S., and Westerlund, T. (1991). Automatica 27, 425-426.
Kay, S. M. (1983). IEEE Trans. Acoust., Speech, Sig. Proc. ASSP-31, 56-65.
Kay, S. M. (1988). "Modern Spectral Estimation: Theory and Application." Prentice-Hall, New Jersey.
Lehmann, E. L. (1986). "Testing Statistical Hypotheses," 2nd ed. Wiley, New York.
Magnus, J. R., and Neudecker, H. (1979). Ann. Statist. 7, 381-394.
Magnus, J. R. (1988). "Linear Structures." Monograph No. 42. Oxford, New York.
Marple, L. (1980). IEEE Trans. Acoust., Speech, Sig. Proc. ASSP-28, 441-454.
Morgera, S. D., and Cooper, D. B. (1977). IEEE Trans. Inform. Theory IT-23, 728-741.
Morgera, S. D. (1981). IEEE Trans. Inform. Theory IT-27, 607-622.
Morgera, S. D. (1982). Signal Processing 4, 425-443.
Morgera, S. D. (1986). Pattern Recognition Letters 4, 1-7.
Morgera, S. D., and Armour, B. (1989). Proc. IEEE 1989 Intern. Conf. on Acoust., Speech, Sig. Proc., Glasgow, 2202-2205.
Morgera, S. D. (1992). IEEE Trans. Inform. Theory IT-38, 1053-1065.
Mukherjee, B. N., and Maiti, S. S. (1988). Comput. Statist. Quart. 2, 105-128.
Nuttal, A. N. (1976). Tech. Rep. 5303, Naval Underwater Systems Center, New London, Conn.
Porat, B., and Friedlander, B. (1986). IEEE Trans. Acoust., Speech, Sig. Proc. ASSP-34, 118-130.
Pukhal'sky, E. A. (1981). Theory of Prob. Appl. XXVI, 564-572.
Quang, A. N. (1984). IEEE Trans. Acoust., Speech, Sig. Proc. ASSP-32, 1249-1251.
Rao, C. R. (1973). "Linear Statistical Inference and Its Applications," 2nd ed. Wiley, New York.
Samelson, H. (1969). "Notes on Lie Algebras." Van Nostrand Reinhold, New York.
Seely, J. (1971). Ann. Math. Statist. 42, 710-721.
Seely, J. (1977). Sankhya 39, ser. A, pt. 2, 170-185.
Szatrowski, T. H. (1980). Ann. Statist. 8, 802-810.
Ulrych, T. J., and Clayton, R. W. (1976). Phys. Earth and Plan. Int. 12, 188-200.
Wishart, J. (1928). Biometrika 20A, 32-52.

Echographic Image Processing


J. M. THIJSSEN
Biophysics Laboratory of the Institute of Ophthalmology
University Hospital, Nijmegen, The Netherlands

I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 317
II. Physics of Ultrasound . . . . . . . . . . . . . . . . . . . . . 318
III. Acoustic Tissue Models . . . . . . . . . . . . . . . . . . . . 321
IV. Estimation of Acoustic Parameters: Acoustospectrography . . . . . 323
V. Generation of Tissue Texture . . . . . . . . . . . . . . . . . . 325
VI. Texture Analysis . . . . . . . . . . . . . . . . . . . . . . . 329
    A. Diffuse Scattering Model . . . . . . . . . . . . . . . . . . 329
    B. Combined Diffuse/Structural Scattering Model . . . . . . . . . 332
    C. Resolved Structure . . . . . . . . . . . . . . . . . . . . . 333
    D. Non-Parametric Texture Analysis . . . . . . . . . . . . . . . 337
VII. Image Processing . . . . . . . . . . . . . . . . . . . . . . . 338
    A. Detection of Focal Lesions . . . . . . . . . . . . . . . . . 338
    B. Improvement of Lesion Detection . . . . . . . . . . . . . . . 341
    C. Detection of Diffuse Pathological Conditions . . . . . . . . . 344
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 345
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

I. INTRODUCTION

The introduction of the concept of gray-scale echography (Kossoff, 1974)
and the development of scan converters for echographic imaging enabled the
display of parenchymal tissues in addition to the outlining of the gross
anatomy of organs. The characteristics of the tissue texture are generally
expressed in such qualitative terms as hyper- or hypoechoic, coarse or fine
granular, and regular or irregular. These characteristics apply to changes of
the tissue texture due to diffuse diseases of organs (e.g., cirrhosis) or caused
by focal lesions. In both cases, the changes are expressed relative to some
standard display of the texture, either the “normal” echogram of the healthy
organ, or the display of the healthy tissue surrounding a lesion. A major
problem in echographic diagnosis, which is almost completely neglected, is
the dependence of the tissue texture on the performance characteristics of the
equipment (Jaffe and Harris, 1980; Wells and Halliwell, 1981). It is neverthe-
less quite evident that the texture is depth-dependent, which means
dependent on the characteristics of the sound field produced by a transducer,
frequency-dependent, and dependent on the processing by the electronic
system, i.e., on the brand and the type of equipment. These observations
help explain why the diagnostic potential of gray-scale echography has
trailed behind the substantial technological improvement of the equipment
in the 1980s.
The scientific progress made during the same period of time has been
substantial as well. The impact of this progress on the clinical work is,
however, not generally recognized. This may be due partly to the veil induced
by the mathematics of the physical descriptions involved and partly caused
by the lack of transfer of scientific knowledge to instrumental innovations. By
using a simplified concept of the biological tissue, which is generally called an
acoustic tissue model, the interaction mechanisms of ultrasound with tissue
can be formulated analytically. The attenuating and backscattering charac-
teristics may then be assessed quantitatively. Moreover, the involvement of
the equipment performance and of the attenuation in the acquired echosig-
nals can be compensated for. The echographic images resulting from these
processing steps are both analyzed and processed. The image analysis yields
statistical texture descriptors that have been shown to be relevant for
improving echographic diagnosis. Finally, the method of processing of B-
mode (i.e., 2-dimensional) echograms is either to modify the original,
preprocessed, images for enhancing the visual diagnosis, or else is used to
generate parametric images. This paper reviews the concepts, methods, and
results of this fascinating joint field of physics and medicine; the cited
literature, however, is only a very limited and necessarily subjective selection.

II. PHYSICS OF ULTRASOUND

Ultrasound is a mechanical wave phenomenon, with a frequency above the
audible range, i.e., larger than 20 kHz. In gases and most liquids, only
longitudinal waves are propagated, as is the case in soft biological tissues.
The ultrasound is produced by a so-called transducer containing a
piezoelectric layer that generally consists of a sintered ceramic material, e.g.,
lead-zirconate-titanate (PZT). The transducer is used in the pulse-echo mode,
which implies a very short acoustic pulse production during the transmission
mode, and the transducer acting as a microphone. The latter mode is main-
tained over a period of time that is sufficiently long to register the echoes
from the deepest structures in the body that are being examined. This
transmission/reception cycle is repeated continuously, and for the imaging
the direction of the sound beam is scanned in a single plane.
FIGURE 1. (a) Waveform of transmitted ultrasound pulse; (b) power spectrum corresponding to waveform in (a) (Oosterveld et al., 1985).

The temporal characteristics of the transmitted ultrasound pulse depend
on the electronics producing the excitation pulse and on the dimensions and
electro-acoustic properties of the materials used for the construction. Apart
from the ceramic plate, a backing medium (for reducing the acoustic pulse
length) and one or two matching layers constitute a transducer. Adequate
models were developed to describe the transmitting and receiving characteris-
tics of the transducer (Mason, 1948; Krimholtz et al., 1970; Van Kervel and
Thijssen, 1983), which are based on the delay line concept. An example of a
pulse waveform and the corresponding spectrum is shown in Fig. 1.
The sound field produced by a flat, or spherically curved, transducer in a
homogeneous medium with negligible attenuation can be calculated analytic-
ally for continuous wave (CW-) mode, i.e., the “monochromatic” case
(Harris, 1981). This term from optics is purposely used to indicate the
equivalence of the ultrasound beam with a laser beam. The Fraunhofer
diffraction theory is, for this reason, applicable to an ultrasound transducer,
when the limiting aperture of a pinhole is replaced by the edges of the
transducer. In the case of pulsed acoustic transmission (i.e., broad-band), an
elegant numerical calculation scheme was devised (Stephanishen, 1971),
which has been extrapolated to a generalized clamped condition of a circular
transducer (Verhoef et al., 1984). Examples of the sound field generated by
the same, focused, transducer in CW- and pulsed-mode are shown in Fig. 2.
As one can see, the pulsed mode yields a relatively homogeneous cross section
due to the fact that interference phenomena are almost completely excluded
by the short duration of the transmitted pulse.
The phenomena described so far can be formalized as follows: the excitation
pulse generates an acoustic transmission pulse i(t). This pulse propagates as
a spatially distributed package of acoustic energy with time; it shows depth-
dependent spectral and spatial properties. This diffraction may be considered
as a depth-dependent, low-pass filter with impulse response hd(t, z ) (Fink and

FIGURE 2. (a) Continuous wave mode of transmission, cross section of beam obtained by
simulation; (b) pulsed wave mode of same transducer as in (a) (Thijssen, 1987).

Cardoso, 1984; Verhoef et al., 1985). Therefore, the pressure amplitude at
depth z is given by the convolution product:
p(z, t) = i(t) ∗ h_d(t, z). (1)

To simplify the discussion further on, it is convenient to write this equation
after a Fourier transformation:

p(f, z) = I(f)H_d(f, z). (2)

When including the tissue transfer function H_T(f, z), which represents the
overall attenuation characteristics, Eq. (2) becomes

p(f, z) = I(f)H_d(f, z)H_T(f, z). (3)

The low-pass character of the attenuation largely suppresses the finite
amplitude effects of the propagation (non-linear behavior) in biological
tissues.
When considering a particular "slab" of tissue at depth z, which actually
is an isochronous volume at a distance z = ct from the transducer, the
echosignal is assumed to be due to backscattering, H_bs(f), so

p(f, z) = I(f)H_d(f, z)H_T(f, z)H_bs(f). (4)
Since both the tissue transfer function and the “directivity function” of the
transducer, acting as an acoustical antenna, are identical to those in trans-
FIGURE 3. Acoustic tissue model: constant sound velocity, homogeneous absorption, and
isotropic (diffuse) scattering. Accessible acoustic tissue parameters: attenuation and backscattering (Thijssen and Oosterveld, 1990).

mission, reciprocity can be applied to Eq. (4). The spectrum E(f, z) corresponding to the received rf-echo signal then can be written

E(f, z) = I(f)H_d²(f, z)H_T²(f, z)H_bs(f). (5)
It may be emphasized that this equation does not simplify the field conditions
and is not restricted to plane wave propagation, as is often assumed.
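As an illustration, the multiplicative structure of Eqs. (2)-(5) can be written directly on a discrete frequency axis; the transfer functions in the Python sketch below are toy assumptions chosen only to make the factorization concrete, not models from the text:

import numpy as np

f = np.linspace(0.5, 10.0, 512)                  # frequency axis, MHz
I_f = np.exp(-0.5 * ((f - 3.5) / 1.0) ** 2)      # transmitted pulse spectrum
H_bs = f ** 1.5                                  # power-law backscatter term

def H_T(z, slope=0.1):                           # tissue transfer, cf. Eq. (6)
    return np.exp(-slope * f * z)

def H_d(z, R=6.0):                               # crude low-pass diffraction
    return np.exp(-0.02 * np.abs(z - R) * f)

def E(z):                                        # received spectrum, Eq. (5)
    return I_f * H_d(z) ** 2 * H_T(z) ** 2 * H_bs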

III. ACOUSTIC TISSUE MODELS

Characteristic parameters of tissues that are revealed by the interaction of the
ultrasound with the tissue through which it propagates are: the propagation
speed, the attenuation, and the backscattering. An acoustic tissue model is
shown in Fig. 3. It is assumed that the sound speed is constant, the absorption
is homogeneous, and the scattering is isotropic by a random arrangement of
small structures in 3-D space. These assumptions do not apply to all
parenchymal tissues (e.g., skeletal muscle). The absorption of ultrasound is
mainly caused by relaxation phenomena of biological macromolecules (Pauli
and Schwann, 1971; Jongen et al., 1986), which transfer mechanical energy
into heat. Another source of attenuation is the scattering, i.e., omnidirectional
reflections, by small inhomogeneities in the tissue (e.g., microvasculature, cell
conglomerates). The overall attenuation is therefore the result of absorption
and scattering, which both depend on the frequency in such a way that the
higher the frequency, the stronger the attenuation will be. It should be
mentioned that the scattering constitutes only a small fraction of the overall
attenuation coefficient in the low megahertz range of transducer frequencies
(Nicholas, 1982). As is indicated in Fig. 3, the attenuation (coefficient) and
FIGURE 4. Acoustic tissue model as in Fig. 3, with additionally structural scattering.
Additional parameter(s) related to structure can be estimated (Thijssen and Oosterveld, 1990).

the backscattering are accessible for a quantitative estimation. Both
characteristics become evident in echographic images as well.
The acoustic tissue model in Fig. 3 has to be extended to incorporate the
histological structure of some tissues. A first indication was found by
Nicholas and Hill (1975), who, by analogy to x-ray diffraction by crystals,
introduced the term “Bragg-diffraction” to describe the scattering charac-
teristics of liver tissue in an in-vitro experiment. The crystal-like structure of
liver tissue is constituted by the triads of Kiernan, which are more or less
regularly positioned in a hexagonal grid. Therefore, in addition to the
“diffuse” scattering due to randomly distributed inhomogeneities, “struc-
tural” scattering also has to be incorporated in the tissue model (Fig. 4).
Wagner et al. (1986) introduced the concept of structural scattering in the
computer analysis of the texture of tissue echograms.
It may be mentioned that scattering by definition implies that the struc-
tures are much smaller than the “sampling volume” of the echographic
equipment, i.e., the product of spatial pulse length and beam cross section.
In the limit when the dimension of the scattering sites is much smaller than
the wavelength, Rayleigh scattering (Morse and Ingard, 1986) occurs. The
latter is characterized by a scattering intensity that is proportional to the
fourth power of the frequency, and it explains the increased contribution of
scattering to the attenuation at high frequencies. A further assumption
implicitly made in the tissue model is that the diffusely scattering structures
are separated by distances which are also small with respect to the sampling
volume. This implies that the distances are below the resolution limit of the
equipment, and, therefore, the tissue echogram cannot be a true picture of the
histological structures but rather an artifactual representation, as is discussed
further on.

IV. ESTIMATION OF ACOUSTIC PARAMETERS: ACOUSTOSPECTROGRAPHY

The attenuation can be modelled by an exponential, and is equivalent to the
tissue transfer function defined as

H_T(f, z) = exp{−β(f)z}. (6)
To be able to extract the attenuation coefficient, it can be seen from Eq. (5)
that the other depth-dependent factor, i.e., the diffraction term, has to be
removed. This can be done by careful measurement, at a large number of
depths, of this diffraction, and then normalization of the spectra by that
measured in the focus. By this procedure (Romijn et al., 1991; Oosterveld
et al., 1989), the influence of the scattering medium employed for these
measurements is removed and a “plane wave” condition is created. This
diffraction correction can be written as

C(f, z) = H_d²(f, z)/H_d²(f, R), (7)
where R equals the focal distance of the employed transducer. Since the
rf-signal from the calibration material is measured in a water tank, while
carefully selecting a “time window” at the same location within this material
and changing the distance between transducer and the top surface, all the
other terms in Eq. ( 5 ) are divided out of Eq. (7).
It will be clear that the division of Eq. (5) by Eq. (7) yields

E(f, z) = I(f)H_d²(f, R)H_T²(f, z)H_bs(f). (8)
After insertion of Eq. (6) into this equation, and after taking the (natural)
logarithm of it, the first derivative to z yields

β(f) = −½ ∂ ln E(f, z)/∂z. (9)

The factor of two is due to the square of H_T; in other words, the distance from
transducer to the insonified region of interest (ROI) is travelled twice.
The attenuation coefficient of most biological tissues is proportional to the
frequency. One method of estimating the “slope” of the attenuation
coefficient is the “quasi multi-narrow band” method (Cloostermans and
Thijssen, 1983). E ( f , z ) is estimated by a sliding window technique from a
series of windowed rf-line segments at depths z_i. The discrete Fourier
transformation yields estimates of E at a range of discrete frequencies f_i. So, the

attenuation coefficient can be estimated at each of these frequencies by
applying Eq. (9), and then a linear regression of the attenuation vs. frequency
is calculated. The slope of this straight line then becomes

This method was first applied by the author (Cloostermans and Thijssen,
1983), and has a statistical advantage over the log-spectral difference method
devised some years before (Kuc et al., 1976). Another method of estimating
the attenuation coefficient (slope) can be found in the literature: the centroid
shift method, which is applicable if the transmitted spectrum is a Gaussian
(Kuc et al., 1976; Dines and Kak, 1970; Fink et al., 1983).
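A minimal sketch of the quasi multi-narrow-band estimator, assuming the log power spectra have already been computed on a depth-frequency grid and diffraction-corrected; the least-squares depth derivative implements Eq. (9) and the final regression gives the slope of the attenuation coefficient:

import numpy as np

def attenuation_slope(logE, z, f):
    # logE[i, j]: log power spectrum at depth z[i] and frequency f[j].
    beta = np.empty(logE.shape[1])
    for j in range(logE.shape[1]):
        # Eq. (9): beta(f_j) = -0.5 d/dz ln E(f_j, z), via a linear fit in z
        beta[j] = -0.5 * np.polyfit(z, logE[:, j], 1)[0]
    # slope of the regression of beta against frequency
    return np.polyfit(f, beta, 1)[0]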
The backscattering characteristics of tissue can be estimated by proceeding
with a further reduction of Eq. (8). This is done by a correction of the
attenuation, i.e., by using Eq. (9), HT is divided out. Then a new measurement
is to be involved: a registration of the echo from a plane reflector placed in
the focus, H_dp(R). Taking a (known) particular reflectivity of this reflector
into account and knowing that this reflection is practically frequency-independent and identical to H_d(f, R), we get

E_p(f) = I(f)H_dp(R) = I(f)H_d²(f, R). (11)

So, dividing Eq. (8) by Eq. (11) yields

E(f) = H_bs(f). (12)
As was shown (Lizzi et al., 1983; Romijn et al., 1989) both for discrete and
inhomogeneous continuum media, the backscattering spectrum of biological
tissues can be modelled by a straight line in the frequency band transmitted
by diagnostic transducers (Fig. 5). The slope of this line is determined by the
effective size of the scattering structures, providing that the attenuation and
the diffraction have been adequately compensated for. The zero-frequency
intercept depends on both this size and the reflectivity of the structures.
The scattering also contributes to the attenuation of the propagating
ultrasound pulse. The attenuation coefficient as defined in Eq. 6 therefore
consists of a pure absorption part and a part related to scattering:
β(f) = β_a(f) + β_s(f). (13)
In the range of diagnostic frequencies (2-10 MHz), the scattering has been
estimated to contribute in a small fraction to the attenuation coefficient, from
a few percentage points at the low end to the order of 10% at the higher end
(Campbell and Waag, 1984; Nicholas, 1977). The frequency dependence of
the (back)scattering intensity was found to be a power of the order of 1 to
2, which increases with increasing frequency (Nicholas, 1977). Since the
FIGURE 5. (Back)scattered power as a function of frequency for various sizes of scattering
structures, Gaussian scatterer model.

absorption coefficient is almost proportional to the frequency, it becomes
clear that the relative contribution of scattering to the attenuation increases
at the higher frequencies.

V. GENERATION OF TISSUE TEXTURE

The generation of B-mode images from scattering media is first discussed by
using the simple model of Fig. 3. To simplify the discussion even further, the
configuration in Fig. 6 is considered: a transducer insonating a homogeneous
medium which contains four point-like scatterers. These scatterers yield
spherical waves that arrive at the transducer at slightly different times after
the transmission of the ultrasound pulse. Although in reality the rf-echoes
may be more complicated (in the near field), the four echoes are depicted as
replicas of the transmission pulse in Fig. 7 for reasons of simplicity. The

FIGURE 6. Scattering by small structures yields spherical wavefronts (Thijssen, 1987).
FIGURE 7. Scheme of linear summation of echoes from scatterers in Fig. 6. Resulting
rf-echogram does not display number and level of echoes due to interference: speckle is formed.
Dashed curve is envelope (A-mode) (Thijssen, 1987).

transducer produces an electrical signal (rf), which is the algebraic sum of the
instantaneous sound pressures originating from the four backscattered
waves. This operation is called a linear phase-sensitive reception. As is shown
in Fig. 7, the four rf-echoes build an “interference pattern” because the depth
differences of the scatterers are smaller than the axial size of the resolution
volume of the transducer (i.e., the pulse length).
This is in fact the basic principle of the generation of tissue textures! The
dashed line in Fig. 7 is the demodulated, i.e., video, A-mode echogram,
which, in this case, contains three peaks. Neither this number of peaks nor
the amplitudes are simply related to the number (nor to the location) of the
scattering structures. In an analogy with the interference phenomena, which
are visible when viewing an image produced by laser light, the texture of an
echogram obtained from a scattering medium is called a “speckle” pattern.
It should be stressed once more that the tissue texture is in general not a true
image of the histological structure but rather an interference pattern that is
mainly determined by the beam characteristics. However, as is discussed
later, some characteristics of the tissue structure may be revealed by the
texture. The next step is the construction of a B-mode echogram from the
single A-mode lines (Fig. 8).
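Demodulation to the video (A-mode) signal can be sketched with the analytic-signal envelope (assuming scipy is available); the pulse shape and the four echo arrival times below are illustrative, echoing the four-scatterer example of Figs. 6 and 7:

import numpy as np
from scipy.signal import hilbert

t = np.linspace(0.0, 4.0, 2000)                  # time axis, microseconds

def echo(t0):                                    # replica of the transmit pulse
    return np.exp(-((t - t0) / 0.15) ** 2) * np.cos(2 * np.pi * 5.0 * (t - t0))

rf = sum(echo(t0) for t0 in (1.00, 1.08, 1.95, 2.02))   # phase-sensitive sum
video = np.abs(hilbert(rf))   # envelope: fewer peaks than scatterers (Fig. 7)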
When the number of scattering structures in the resolution cell (i.e.,
effective beam width times pulse length) is large, they will not be resolved in
the rf-signal (Fig. 7). The condition at the transducer corresponds to a
random walk in 2-D space: a summation of a large number of phasors with
a phase that is uniformly distributed between 0 and 2π. The rf-signal has, for
the large number limit, a circular Gaussian joint probability distribution
FIGURE 8. (a) Generation of B-mode echograms from scattering medium. Rectangles
indicate regions selected for simulations; (b) B-mode images at increasing depth, cf. (a), no
attenuation; (c) same as (b), with attenuation of 0.1 Np/cm MHz (Thijssen and Oosterveld,
1990).
function (p.d.f.) (Goodman, 1975; Wagner et al., 1983):

p(a) = (2πσ²)^{−1} exp{−(a_r² + a_i²)/2σ²}, (14)

where a_r and a_i are the real and imaginary parts of the analytic function.
It can be shown (Goodman, 1975; Burckhardt, 1978; Abbott and
Thurstone, 1979; Flax et al., 1981; Wagner et al., 1983) that after demodulation of the rf-signal, which yields the video signal v(t),

p(v) = (v/σ²) exp{−v²/2σ²}. (15)

This is the Rayleigh p.d.f., whereas the intensity I, i.e., the square of v, has
an exponential p.d.f.:

p(I) = (1/2σ²) exp{−I/2σ²}. (16)
The condition where these formulas apply is sometimes called the
"Rayleigh" limit of the number density of scatterers within the tissue. In fact,
it is the absolute number of scatterers N within the resolution cell that is the
important factor. The p.d.f. for lower numbers can be derived in integral
form (Jakeman, 1984)

where J₀ is the Bessel function of zero order, first kind, and b is the scattering
amplitude of individual scatterers. This equation has to be solved numeri-
cally, but the moments of the p.d.f. can be derived in analytical form.
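The random-walk limits are easy to verify numerically; the sketch below sums unit phasors with uniformly distributed phases and checks the limiting amplitude and intensity signal-to-noise ratios quoted in Eqs. (20) and (25) further on:

import numpy as np

rng = np.random.default_rng(2)
N, trials = 200, 20_000
phases = rng.uniform(0.0, 2.0 * np.pi, size=(trials, N))
v = np.abs(np.exp(1j * phases).sum(axis=1))   # amplitude of the phasor sum
I = v ** 2                                    # intensity
print(v.mean() / v.std())    # approaches 1.91 for large N
print(I.mean() / I.std())    # approaches 1.0 for large N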
In addition to these first-order gray-level statistics, i.e., the histogram, it is
important to quantify the second-order statistics as well. These can best be
described by the spatial autocorrelation function (ACF), which in 1-D is
given by
A(Δx) = E{v(x + Δx)v(x)}, (18)
where E stands for expectation value.
This ACF represents the spatial relations between image pixels, i.e., the
texture characteristics of the image. The speckle nature of echographic
images is illustrated by the B-mode images in Fig. 8B, which were obtained
by calculation with a simulation software package developed at the author's
laboratory.
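One way the lateral speckle size could be measured from such simulated images is via the ACF of Eq. (18); the row averaging and half-maximum search in this Python sketch are illustrative choices, not the laboratory package itself:

import numpy as np

def lateral_speckle_fwhm(image, dx):
    # Average lateral autocorrelation of a demodulated image and its
    # full width at half maximum, with dx the lateral pixel pitch.
    rows = image - image.mean(axis=1, keepdims=True)
    acf = np.zeros(image.shape[1])
    for r in rows:
        acf += np.correlate(r, r, mode="full")[len(r) - 1:]
    acf /= acf[0]
    below = np.nonzero(acf < 0.5)[0]
    return 2 * dx * below[0] if below.size else np.inf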
The rf-lines were calculated from a 3-D volume with a density of 5000
scatterers per cubic centimetre, and at each depth indicated in Fig. 8B a 5 mm
depth range was selected. The sound beam was displaced 0.2 mm, and a new
rf-line was calculated until a lateral image size (vertical in the figure) of 20 mm
was reached. After completion of the simulations, the rf-line segments were
software-demodulated and depicted in gray-scale. The gray levels of each box
were normalized in order to obtain the most adequate display; this procedure
corresponds to the appropriate setting of the time-gain compensation (TGC)
of equipment. It may be remarked that without this normalization, the image
in the focal zone would have displayed clearly a larger mean gray level than
the surrounding images. The most striking feature of the B-mode images in
Fig. 8B is the continuing increase of the lateral speckle size with increasing
depth, i.e., from left to right. This feature is present in any B-mode picture
and it can be understood now from the foregoing discussion on the inter-
ference at reception by the transducer. When the insonified volume is near the
transducer, the differences in the distance to the transducer of the scatterers
located within the sampling volume are large as compared to the wavelength.
Therefore, the changes of these distances due to the scanning of the beam are
also relatively large, and the lateral size of the speckles is small. This
phenomenon may be looked at as the inverse of the interference phenomena
that occur in the near field at the transmission of the sound beam (Fig. 5).
When moving through the near field toward the focus, the lateral speckle size
reaches a magnitude that does not change much any more beyond it. This
latter phenomenon can be explained by the simultaneous increase of the
lateral extent of the sampling volume (beam width) and the decrease of the
depth differences of the scatterers with respect to the transducer. The effect
of an attenuation of 0.1 Np/cm MHz is quite evident in Fig. 8c, again
predominantly in the lateral speckle size.

VI. TEXTURE ANALYSIS

A. Diffuse Scattering Model

The first-order statistics of the texture of echographic images is given by the
gray-level histogram. This histogram is specified in Eqs. (17) and (15) for the
case of a low number density and of fully developed speckle, respectively.
The most suitable measures of a histogram are the mean, denoted by μ, and
the standard deviation, denoted by σ. Starting with Eq. (17), it can be shown
(Jakeman, 1984) that the signal-to-noise ratio (SNR_I) for the intensity
becomes

SNR_I = μ_I/σ_I = (1 + ⟨b⁴⟩/nV⟨b²⟩²)^{−1/2}, (19)

where ⟨b⁴⟩/⟨b²⟩² is the kurtosis of the scattering strength, V is the volume of
the resolution cell of the employed pulsed transducer (N = nV), and n is the
number density. As can be noticed from Eq. (19), the limit value for large n
becomes

lim_{n→∞} SNR_I = 1, (20)

which is also directly derived from the p.d.f. in the case of fully developed
speckle (Eq. 16).
The similar expression for the echoamplitude v is not known to the author;
however, Eq. (19) can be rewritten as

⟨v²⟩/(⟨v⁴⟩ − ⟨v²⟩²)^{1/2} = (1 + ⟨b⁴⟩/nV⟨b²⟩²)^{−1/2}, (21)

and after some rearrangement,

⟨v⁴⟩/⟨v²⟩² = 1 + ⟨b⁴⟩/nV⟨b²⟩². (22)

This is the expression of the kurtosis of the p.d.f. of v, which can be
experimentally estimated. The value of ⟨b⁴⟩/⟨b²⟩² has been assessed for
biological tissues and is of the order of 3 (Sleefe and Lele, 1988). It can be
concluded then, that from Eq. (22) the number density n can be obtained. The
value of V is assessed from the point spread function (PSF) of the employed
transducer, or the analysis of the second-order statistics of the image
(Thijssen and Oosterveld, 1988).
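A sketch of this density estimate from Eq. (22), assuming the resolution-cell volume V is known and adopting the tissue kurtosis value of about 3 quoted above:

import numpy as np

def number_density(v, V, kurt_b=3.0):
    # Solve <v^4>/<v^2>^2 = 1 + <b^4>/(n V <b^2>^2) for n, Eq. (22).
    k_v = np.mean(v ** 4) / np.mean(v ** 2) ** 2
    return kurt_b / (V * (k_v - 1.0))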
The dependence of the SNR_v, i.e., of the echoamplitude, was investigated
by simulations (Thijssen and Oosterveld, 1985; Oosterveld et al., 1985) and
appeared to increase continuously with increasing number density to a limit
set by the mean and standard deviation of the Rayleigh p.d.f. (Eq. 15):

μ_v = (πσ²/2)^{1/2}, (23)
σ_v = {(4 − π)σ²/2}^{1/2}. (24)

Hence,

SNR_v = {π/(4 − π)}^{1/2} = 1.91, (25)

which is a limit value for large number density, equivalent to Eq. (20). It may
be mentioned that according to the general theory of scattering (Flax et al.,
1981), the mean scattering intensity is proportional to the number density
under all conditions. Therefore,

μ_v ∝ n^{1/2}. (26)

This relation was confirmed in a simulation study (Thijssen and Oosterveld,
1985, 1986; Oosterveld et al., 1985). It indicates a potential for characterizing
tissues and changes due to pathologic conditions. It should be emphasized
that the relative change of SNR_v is of the order of a factor of two, when the
number density increases over two decades. The mean μ_v, however, displays
an increase by a factor of 10 for the same density range.
The lateral and the axial size of the speckle cannot be calculated analyti-
cally except in the small zone around the focus, which underlines the
importance of performing realistic 3-D simulation studies. Therefore, this
discussion will be restricted to this focal zone. Analytic formulae for the axial
and lateral dimensions of the speckle for this condition were derived by
Wagner et al. (1983). These authors extended the theory of speckle that was
developed for coherent light (i.e., laser speckle) by Goodman (1975). The size
of the speckles as given by the full-width-at-half-maximum (FWHM) of the
"autocorrelation" function in the axial direction is found to be (in the focus)

FWHM_ax = 0.61/Δf (μs), (27)

where Δf is the FWHM of the spectrum corresponding to the transmitted
ultrasound pulse (−6 dB width). When assuming a pulse with a Gaussian
envelope with "standard deviation" σ_t, it can easily be shown that the
spectrum is also a Gaussian and

σ_t σ_f = (2π)^{−1}. (28)

Rewriting Eq. (27) for this Gaussian case,

FWHM_ax = 0.26/σ_f = 1.63σ_t (μs). (29)

By using the speed of sound in biological tissues (1500 m/s), the latter formula
can be rewritten in the spatial domain:

FWHM_ax = 1.63 σ_z/0.75 = 2.17σ_z (mm). (30)

The factor 0.75 is derived from half of the speed in mm/μs, because of the
two-way travelling of the echowaveform, and σ_z stands for the standard
deviation of the ultrasound pulse in the axial (z-) direction.
The lateral width of the ACF was also derived in Wagner et al. (1983) when
assuming independence of the axial and lateral directions. This assumption
holds approximately in the focal zone, because the plane wave condition
exactly applies in the geometrical focus. The lateral case yielded

FWHM_lat = 0.80λ_cF/D′ ≈ 0.86λ_cF/D, (31)

where λ_c is the wavelength at the central frequency of the transmitted
spectrum, F is the focal length, and D′ = D/1.08 is the effective and D the
geometrical diameter of the (circular) transducer.
Equations (27)-(31) indicate that the speckle dimensions are completely
and exclusively dependent on the characteristics of the employed transducer
(Thijssen and Oosterveld, 1985, 1986; Oosterveld et al., 1985; Foster et al.,
1983; Smith and Wagner, 1984). However, as is evidenced from simulations
(Thijssen and Oosterveld, 1985; Oosterveld et al., 1985), when the number
density is relatively low the speckle dimensions are larger, as can be seen in
Fig. 9. Under these conditions, the texture statistics are thus indicative for
the number density. The half-width (-6dB) sizes of the PSF (i.e., the
resolution cell) of the employed transducer in the axial and lateral directions,

FIGURE 9. B-mode images from simulations with increasing volume densities of the scattering structures, from left to right 100 to 3000 cm⁻³ (Oosterveld et al., 1985).
respectively, can be shown to be (Thijssen and Oosterveld, 1988)

Δz = 2.355σ_z, (32)

and

Δx = 1.02λ_cF/D. (33)
Because in biological tissues the attenuation coefficient is proportional to
frequency to a fair approximation, it can easily be shown that a Gaussian
spectrum corresponding to the ultrasound transmission pulse will remain
Gaussian. This property implies that attenuation induces a downshift of the
central frequency, while the bandwidth is maintained. Therefore, only the
lateral speckle size will increase (Eq. 31) with increasing depth. This increase
enhances the already occurring increase due to the beam diffraction (Fig. 8),
and it is still present in the far field. It may be clear that any texture analysis
of echographic images can be unambiguous only when these effects are in
some way corrected for and the texture has been made homogeneous in
depth.
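For illustration (a sketch added here, not part of the original text), the focal-zone speckle and resolution-cell sizes of Eqs. (30)-(33) can be evaluated numerically; the Python function below assumes the pulse and transducer parameters σ_z, λ_c, F, and D are given, and the example values are hypothetical.

def speckle_and_psf_dimensions(sigma_z_mm, lam_c_mm, F_mm, D_mm):
    # Axial speckle size (FWHM of the autocorrelation function), Eq. (30).
    fwhm_ax = 2.17 * sigma_z_mm
    # Lateral speckle size, Eq. (31), with effective diameter D' = D/1.08.
    fwhm_lat = 0.80 * lam_c_mm * F_mm / (D_mm / 1.08)
    # Axial and lateral -6 dB sizes of the resolution cell, Eqs. (32)-(33).
    delta_z = 2.355 * sigma_z_mm
    delta_x = 1.02 * lam_c_mm * F_mm / D_mm
    return fwhm_ax, fwhm_lat, delta_z, delta_x

# Hypothetical 3.5 MHz transducer: lambda_c = 1.5/3.5 mm, sigma_z = 0.3 mm,
# focal length 80 mm, aperture diameter 13 mm.
print(speckle_and_psf_dimensions(0.3, 1.5 / 3.5, 80.0, 13.0))

Note that with σ_z = 0.3 mm the axial resolution cell Δz comes out at about 0.7 mm, of the order of the axial PSF quoted for the 3.5 MHz transducer in the simulations discussed below.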

B. Combined Diffuse/Structural Scattering Model

In this model, it is assumed that in addition to the randomly positioned
scatterers, yielding diffuse scattering, a second class of scatterers is present,
which is regularly spaced (Fig. 4). This long-range order may be isotropic
(e.g., liver) or anisotropic (e.g., muscle). The structural scattering in liver is
related to the lobular structure with a characteristic dimension of the order
of 1 mm. These lobules are surrounded by the portal triads in a hexagonal
grid. The triads are collagen-rich structures, which therefore are relatively
strong scatterers. In the following, only isotropic structure is considered.
Since the liver is generally examined with frequencies in the range of
2-5 MHz, it follows from Eqs. (32) and (33) that the triads may be resolved
in the axial but not in the lateral direction. For this reason, both of these
conditions have to be considered.
First, we consider the sub-resolution structure. This structure yields a
coherent scattering component, which contributes like a specular reflector
(Wagner et al., 1986; Goodman, 1975). The joint p.d.f. of the rf-signal
(Eq. 14), therefore, is shifted along the real axis over a distance equal to the
amplitude of this coherent scattering component. The p.d.f. of the overall
scattering intensity (Eq. 16) now becomes (Rice, 1945)
p(I) = ⟨I_d⟩⁻¹ exp[−(I + I_s)/⟨I_d⟩] I₀(2√(I I_s)/⟨I_d⟩), (34)
where
⟨I_d⟩ = ensemble average of the diffuse scattering intensity (= I_d when taking stationarity into account);
I_s = structural scattering intensity (= ⟨I_s⟩ in case of unresolved structure);
and
I₀ = modified Bessel function of zero order, first kind.
This equation is derived while assuming that the variance of I, is small
compared with that of Id, and that the number density of the diffuse scatter-
ing is large.
It can be shown (Wagner et al., 1986) that the signal-to-noise ratio becomes
SNR_I = (I_d + I_s)/(I_d² + 2 I_s I_d)^{1/2}. (35)
Defining r = I_s/I_d, it follows from this equation that
SNR_I = (1 + r)/(1 + 2r)^{1/2}. (36)
Hence, the high number density limit of SNR_I is again a constant (Eq. 20),
which is determined by the intensity ratio r. It may be remarked that the
square of the denominator of Eq. (35) equals the so-called Rician variance:
σ_R² = I_d² + 2 I_s I_d. (37)
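As a small numerical illustration (added, not from the original), Eq. (36) can be evaluated for a few intensity ratios; at r = 0 the diffuse-only limit is recovered.

def snr_intensity(r):
    # Eq. (36): point SNR of the intensity for diffuse plus unresolved
    # structural scattering, with r = Is/Id.
    return (1.0 + r) / (1.0 + 2.0 * r) ** 0.5

for r in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"r = {r:4.2f}   SNR = {snr_intensity(r):5.3f}")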

C. Resolved Structure

FIGURE 10. B-mode images from simulations with increasing relative scattering strength of structural scattering (cubic matrix, 1.25 mm characteristic distance) embedded in a diffusely scattering medium (volume density 5000 cm⁻³). Relative reflectivity of structural scatterers: (a) 25%; (b) 50%; (c) 75% (Thijssen and Oosterveld, 1990).

The influence of resolved structure on the texture of echographic images was
studied by simulations (Jacobs and Thijssen, 1991). In addition to a diffuse
scattering with a number density of 5000 cm⁻³, a cubic matrix of scatterers
was simulated with a characteristic dimension d = 1.25 mm. The axial PSF of
the 3.5 MHz transducer was 0.7 mm. One of the variable parameters was the
relative scattering strength of the structure. Examples of the resulting images
are shown in Fig. 10, where the relative strength is 25, 50, and 75 percent,
respectively. As can be seen, a stratification of the texture occurs at the two
highest scattering strengths. This stratification is present in the axial direction
only, because the structure is not resolved in the lateral direction. The matrix
was aligned with the axis of the scanning ultrasound beam.
The axial ACF’s of the images in Fig. 10 are shown in the left column of
Fig. 11. Increasing the structural scattering strength results in an increasing
occurrence of equidistant peaks along the tail of the ACF. The distance
between these peaks corresponds to the characteristic dimension, d, of the
cubic matrix. The right column of Fig. 11 displays the spectra derived from
the ACF’s. Again the structure is revealed, in this case as a scalloping on top
of the spectrum due to the diffuse scattering. The magnitude of the peaks is
now weighted by the spectrum of the transmitted sound pulse.

FIGURE 11. (a) Autocorrelation functions (ACF) of the texture in the images of Fig. 10, axial direction; d = characteristic distance of the cubic matrix. (b) Spectra calculated from the ACF’s in (a). Oscillations of the ACF’s (i.e., structural scattering) are revealed by a peak (arrow) upon the spectra produced by the diffuse scattering.

In the simulations, a Gaussian spectrum was implied; hence, both the spectral components
due to diffuse and to structural scattering are Gaussian weighted.
The analysis of the texture for this case of resolved structure in addition to
diffuse scattering is based on the autocorrelation function and the corre-
sponding power spectrum of image texture (Wagner et al., 1983; Lowenthal
and Arsenault, 1970; Insana et al., 1986b). The somewhat lengthy expressions
are not reproduced here, and the discussion is restricted to the derivation of
relevant parameters. Writing the total variance of the intensity,
σ_I² = ⟨I²⟩ − ⟨I⟩² = σ_R² + Σ_s², (38)
where σ_R² is the Rician variance, as before, and Σ_s² is the variance due to
the (resolved part of the) structural scattering. Σ_s² can be derived from the
overall second-order statistics. However, when considering a Gaussian-shaped
ultrasound transmission pulse, the power spectrum corresponding to the sum
of the diffuse and the unresolved structural scattering will be a Gaussian as
well. Therefore, a Gaussian is fitted to the minima of the scalloping due to
resolved scattering. The area below this curve then equals the Rician variance
σ_R², and the integral of the superimposed line spectrum yields the structural
variance Σ_s². It can be shown that
⟨I⟩² = σ_R² + I_s². (39)
Hence,
I_s² = ⟨I⟩² − σ_R², (40)


which yields I_s, and then I_d can be derived from
I_d = ⟨I⟩ − I_s. (41)
From these parameters the following relative quantities were derived:
r = I_s/I_d and u = Σ_s/I_d.
Finally, the average structural distance, d (scatterer spacing), was derived
from the peaks in the power spectrum (Insana et al., 1986). The average
spatial frequency was estimated from those spectral peaks exceeding the fitted
Gaussian plus two times the standard deviation. The range of spatial
frequencies considered in the averaging has to be limited. The high and the
low limits are set by the width of the PSF of the employed transducer and by
the size of known anatomical structures (i.e., of the order of a few millimetres),
respectively.
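The parameter recovery described above can be condensed into a short sketch (an illustration under the stated model, not the published implementation); ⟨I⟩, σ_I², and σ_R² are assumed to have been estimated beforehand, σ_R² from the Gaussian fitted to the minima of the scalloped power spectrum.

def structural_parameters(mean_I, var_I, sigma_R2):
    # Eq. (38): total variance = Rician variance + structural variance.
    Sigma_s2 = var_I - sigma_R2
    # Eqs. (39)-(40): <I>^2 = sigma_R^2 + Is^2, hence Is^2 = <I>^2 - sigma_R^2.
    Is = max(mean_I ** 2 - sigma_R2, 0.0) ** 0.5
    # Eq. (41): Id = <I> - Is.
    Id = mean_I - Is
    # Relative quantities r and u derived from these parameters.
    return {"Is": Is, "Id": Id, "r": Is / Id, "u": Sigma_s2 ** 0.5 / Id}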
This approach of texture analysis was applied to the problem of detecting
and differentiating diffuse liver diseases (Insana et al., 1986; Garra et al.,
1987). These authors used the attenuation coefficient slope as a fourth
parameter in a linear discriminant analysis. A similar, but more general,
approach was followed by Oosterveld et al. (1991; Oosterveld, 1990). They
started with a broader set of parameters, which also included the overall first-
and second-order statistics of the texture, and the backscattering parameters.
Then a parameter selection was made for each of the classes in a retrospective
classification of known diseases. It appeared that the structural parameters
were rarely among the optimal set of parameters. This leads to the conclusion
that either the structural characteristics of liver tissue are not regular enough,
or the resolution of the transducers generally employed for liver diagnosis
(3.5 MHz central frequency) is not sufficiently high for a proper analysis of
the structure.

D. Non-Parametric Texture Analysis

Several other methods of analysis of the second-order statistics of image
texture have been applied to echograms. A common aspect of these methods
is the absence of a specific model for the biological tissue structure. The first
method is a two-dimensional histogram describing the occurrence of gray-
level combinations in pairs of pixels, which are spatially separated by a
certain distance. Although some authors (Nicholas et al., 1986; Schlaps et al.,
1987; Raeth et al., 1985) considered the image matrix as being symmetrical,
the large difference in the axial and lateral speckle dimensions necessitates
taking two different pixel separations into account and leaving out the
diagonal. Moreover, since the lateral speckle size is very much depth-
dependent in the near field (in front of the focus), the cooccurrence matrix
may be ambiguous if the ROI is not positioned carefully and reproducibly.
Defining the cooccurrence matrix as a 2-D histogram yields
Σ_i Σ_j p(i, j) = 1, (42)
where i, j are discrete gray levels (0, ..., N).
Among the many parameters that can be derived from this matrix are a few
that have been proven to be relevant to the analysis of clinical echograms:
Contrast = Σ_i Σ_j (i − j)² p(i, j) (43)
Energy = Σ_i Σ_j p²(i, j) (= angular second moment) (44)
Entropy = −Σ_i Σ_j p(i, j) log₂{p(i, j)} (45)
Correlation = [Σ_i Σ_j i·j·p(i, j) − m_x m_y]/[s_x s_y], (46)
where
m_x = Σ_i i Σ_j p(i, j),
m_y = Σ_j j Σ_i p(i, j),
s_x² = Σ_i Σ_j (i − m_x)² p(i, j).
It may be remarked that in the first application of the cooccurrence matrix
by Haralick et al. (1973), these four parameters were also employed.
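For illustration (a sketch, not code from the cited papers), the four parameters of Eqs. (43)-(46) can be computed from a normalized cooccurrence matrix as follows; as argued above, separate axial and lateral pixel separations should be used when accumulating p(i, j).

import numpy as np

def cooccurrence_features(p):
    # p(i, j): normalized cooccurrence matrix; its entries sum to one, Eq. (42).
    n = p.shape[0]
    i, j = np.indices((n, n))
    contrast = np.sum((i - j) ** 2 * p)                      # Eq. (43)
    energy = np.sum(p ** 2)                                  # Eq. (44)
    nz = p > 0                                               # avoid log2(0)
    entropy = -np.sum(p[nz] * np.log2(p[nz]))                # Eq. (45)
    mx, my = np.sum(i * p), np.sum(j * p)
    sx = np.sqrt(np.sum((i - mx) ** 2 * p))
    sy = np.sqrt(np.sum((j - my) ** 2 * p))
    correlation = (np.sum(i * j * p) - mx * my) / (sx * sy)  # Eq. (46)
    return contrast, energy, entropy, correlation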
Another method that was applied to clinical echograms is the MAX-MIN
method (Mitchell et al., 1977). In this case the radiofrequency scanlines
underlying the echographic image are processed (Lerski et al., 1979). The
method consists of a gradual smoothing of the echosignals, and for each
grade of smoothing the number of extrema, i.e., maxima and minima, is
estimated. The smoothing algorithm is as follows:
if y_k < x_{k+1} − T/2, then y_{k+1} = x_{k+1} − T/2;
if x_{k+1} − T/2 ≤ y_k ≤ x_{k+1} + T/2, then y_{k+1} = y_k; (47)
if x_{k+1} + T/2 < y_k, then y_{k+1} = x_{k+1} + T/2,
where x_k is the original (rf-) signal value at sample k, y_k is the new signal value
at sample k, and T is the threshold value.
As was shown by Mitchell et al. (1977), a plot of the logarithm of the
number of extrema vs. the threshold level may display a fairly straight line
at the low threshold position. These authors also explained that when pro-
cessing the logarithm of the image intensity, the slope of this line becomes
independent of the amplifier gain of the display system. Since most com-
mercially available echoscanners basically display the log-compressed envelope
of the echodata, the MAX-MIN method could easily be implemented as
described, provided that an adequate compensation for the ultrasound
attenuation is implemented. This is confirmed in recent work by Berger et al.
(1992).
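A minimal sketch of the smoothing rule of Eq. (47) and of the extrema count follows (an illustration, not the implementation of the cited authors):

def maxmin_smooth(x, T):
    # Eq. (47): the output follows the input only when the input leaves a
    # dead band of width T centred on the current output value.
    y = [x[0]]
    for k in range(len(x) - 1):
        if y[k] < x[k + 1] - T / 2:
            y.append(x[k + 1] - T / 2)
        elif y[k] > x[k + 1] + T / 2:
            y.append(x[k + 1] + T / 2)
        else:
            y.append(y[k])
    return y

def count_extrema(y):
    # Number of local maxima and minima of the smoothed signal.
    return sum(1 for k in range(1, len(y) - 1)
               if (y[k] - y[k - 1]) * (y[k + 1] - y[k]) < 0)

Repeating the smoothing for a range of thresholds T and plotting the logarithm of the extrema count against T then yields the straight-line slope used as the texture parameter.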

VII. IMAGE PROCESSING

A. Detection of Focal Lesions

Up to this point, the changes of the image texture due to changes of the volume
density, the structure, and/or the reflectivity of the scattering sites have been
considered for the image as a whole. It has been concluded that the mean
gray-level may change, as well as the size of the speckles. However, in many
instances, the clinical question is the detection of the presence of a focal lesion
within an organ (Fig. 12). In terms of the statistical theory of signal detection
(Metz, 1978; Swets and Pickett, 1982; Thijssen, 1988), the problem can be
stated as follows: What is the probability of observing a difference between
a circular area (containing therefore a particular number of speckles)
suspected of being a lesion and an area of the same size that can be considered
to belong to a “normal” part of the tissue? and When do the mean texture
characteristics (mean gray-level and/or speckle size) of these areas differ by a
certain amount?

FIGURE 12. Schematic drawing of the concept of detection of a circular lesion in a homogeneous tissue texture. Mean intensity level V, lesion incremental intensity ΔV, lesion diameter D (Thijssen and Oosterveld, 1990).

By also taking into account the transfer of echolevel to gray
level at the image display, a “lesion signal-to-noise-ratio” can be defined
(Smith et al., 1983b; Wagner and Brown, 1985), which uniquely describes the
detectability of the lesion. This detectability index applies to the concept of
an ideal observer, i.e., a concept where it is assumed that no uncertainty
(noise) is introduced by the detection process itself (North, 1963).
The lesion signal-to-noise ratio SNR_L is defined as
SNR_L = [⟨s₂⟩ − ⟨s₁⟩]·[σ_{1,L}² + σ_{2,L}²]^{−1/2}, (48)
where ⟨s_j⟩ is the mean over the lesion area in the case of background (j = 1)
or of lesion (j = 2), and σ_{j,L}² is the variance over the lesion area in the case
of background (j = 1) or of lesion (j = 2).
The relation of the statistical area (lesion) parameters to the pixel statistics
now has to be derived. The lesion is characterized by a weighting function
a(x, y), which can be uniform (e.g., equal to unity) or any other function (e.g.,
Gaussian; Wagner and Brown, 1985). The numerator in Eq. (48) can be
written
⟨s₂⟩ − ⟨s₁⟩ = A_L⁻¹ ∫∫_{A_L} dx dy [g(v₂) − g(v₁)] a(x, y), (49)
where A_L is the lesion area, x, y are the Cartesian coordinates in the image,
and g(v) is some function of the signal (envelope) v. The lesion variance
follows from
σ_L² = A_L⁻² ∫∫_{A_L} dx dy ∫∫_{A_L} dx′ dy′ a(x, y) C_s(x − x′, y − y′) a(x′, y′), (50)
where C_s(x − x′) = ⟨[s(x) − ⟨s(x)⟩][s(x′) − ⟨s(x′)⟩]⟩ = the autocovariance
(ACV). It can be shown that Eq. (50) for uniform weighting reduces to
σ_{j,L}² = A_c A_L⁻¹ C_s(0, 0), (51)
where A_c = area under the normalized ACV (Smith et al., 1983b).
Since the first two terms in this equation describe the inverse of the number
of speckles M within the lesion area,
M = A_L/A_c, (52)
and
C_s(0, 0) = σ_{p,j}², (53)
where σ_{p,j}² = pixel variance in either of the conditions, Eq. (51) can be
rewritten:
σ_{j,L}² = M⁻¹ σ_{p,j}². (54)
Thus, the lesion signal-to-noise ratio (Eq. 48) becomes
SNR_L = C_L M^{1/2} SNR_p, (55)
where C_L = [⟨g(v₂)⟩ − ⟨g(v₁)⟩]/[⟨g(v₂)⟩ + ⟨g(v₁)⟩] (= lesion contrast), and
SNR_p = [⟨g(v₂)⟩ + ⟨g(v₁)⟩]/[σ_{p,1}² + σ_{p,2}²]^{1/2} (= pixel SNR).
The occurrence of the term M^{1/2} in Eq. (55) indicates that the SNR_L is
dependent on the total number of independent “signal” samples within the
lesion area, i.e., the number of speckles. This result is analogous to that
obtained in photon images (e.g., x-ray), where this number stands for the
photon count over the lesion area (Wagner and Brown, 1985).
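For uniform weighting, the chain of Eqs. (48)-(55) reduces to a one-line computation; the sketch below (an added illustration, not taken from the cited papers) assumes the processed pixel means g1 and g2, the pixel variances, the lesion area A_L, and the ACV area A_c are known.

def lesion_snr(g1, g2, var_p1, var_p2, A_L, A_c):
    # Eq. (52): M = number of speckles within the lesion area.
    M = A_L / A_c
    # Lesion contrast C_L and pixel SNR, as defined below Eq. (55).
    C_L = (g2 - g1) / (g2 + g1)
    snr_p = (g2 + g1) / (var_p1 + var_p2) ** 0.5
    # Eq. (55): SNR_L = C_L * sqrt(M) * SNR_p.
    return C_L * M ** 0.5 * snr_p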
The SNRL can also be applied to the assessment of the imaging quality
of echographic equipment. Smith et al. (1983a; Smith and Lopez, 1982)
described a tissue mimicking phantom containing cones of different reflectiv-
ity levels. Scanning of the cones yields tissue-like echograms containing discs
of a particular size with various mean gray levels. These images can then be
used to estimate the visual detectability (ROC-analysis); and by repeating the
scanning for different cross-sections of the cones, a contrast-detail curve can
be measured for the scanner by means of psychophysical experiments (Smith
and Lopez, 1982). Otherwise, it is possible to assess the SNR_L after storage of the
echograms in a computer, as a function both of the contrast level and of the
size of the lesion.
Thijssen et al. (1988) employed the lesion SNR to investigate the influences
of the pre- and postprocessing and of the display characteristics of the
TV-monitor on the lesion detection. They derived analytic expressions for the
various conditions, which were verified by results obtained from simulated
and experimental B-mode images. They concluded that the logarithmic
compression (and proper TGC setting) prior to the digitization, as is
generally implemented in echographic equipment, combined with the gamma
of the TV-monitor, which is of the order of 2, yields the optimum lesion SNR
for a relatively large contrast of the lesion. This condition corresponds with
the “linear” look-up table, i.e., the post-processing curve of the equipment.
In the low-contrast case, the choice of the postprocessing curve does not
significantly influence the lesion detectability.

B. Improvement of Lesion Detection

The (spatial) compounding of B-mode images by using a static scanner was
known to produce not only better outlining of the anatomy but to improve
the lesion detection simultaneously. Burckhardt (1978) theoretically derived
an expression for the signal-to-noise ratio when the peak-detect mode of the
scan converter was used. In general terms, it can be stated that the lesion
SNR improves proportionally to the square root of the number of indepen-
dent images that are superimposed. Burckhardt showed that the peak detect
mode is somewhat inferior to this theoretical maximum.
Because of the replacement of static scanners by real-time equipment, the
improvement of lesion detectability by compounding may seem obsolete.
However, the introduction of computer-controlled systems with linear array
transducers has opened new horizons. For instance, it is possible to use a
sub-array of the linear array as a phased-array, thus producing sector scans.
The sub-array is then moved along the linear array and a series of partially
overlapping sector images is obtained. This way of scanning was considered
by Trahey et al. (1986a). These authors showed that by linear superposition
the SNR indeed was improved. Due to the limited rotation of the angle of
view when shifting the sector scanning along the array, the images are still
correlated and, therefore, the improvement achieved corresponds roughly to
the square root of an effective number equal to two-thirds of the number of
sector scans. This improvement is still impressive and is without doubt
clinically relevant, as can be seen from Fig. 13.
The basic idea of compounding is implemented in some modern equipment
by what is often called the “integration” mode of operation. This implies that
a weighted moving average of 5 to 7 images is calculated by the on-board
computer, and the frame rate of the system is slowed down accordingly.
When the transducer is angled, shifted, or rotated by hand, the speckle
pattern of successive images will partially decorrelate, and the average image
then displays a reduced speckle, which implies, as before, an enhanced
signal-to-noise ratio. The limitation of this technique is the theoretical
separation between two images of the order of half the beam width that is
needed to obtain full decorrelation (Burckhardt, 1978; Wagner et al., 1988).
Therefore, relatively large displacements of the scan plane are needed, which
implies that small focal lesions easily disappear from the field of view.
FIGURE 13. Left: single image of a hyperechoic “lesion” in a contrast-detail phantom. Right: compound image of the same lesion, average of six scans (© 1986 IEEE).

A basically different approach is the subdivision of the bandwidth of the
transmitted ultrasound pulse into a number of partially overlapping sub-bands.
The received rf-echolines are multiband filtered, and the filter outputs
are demodulated separately. This was termed “frequency compounding”
(Magnin et al., 1982; Gehlbach and Sommer, 1987). The video images thus
obtained for each sub-band are then superimposed. This technique indeed
reduces the speckle appearance of the images, i.e., the pixel SNR of the gray
levels increases. However, when the degradation of the spatial resolution due
to the narrow bandwidth of each sub-image is taken into account, it appears
that the lesion SNR has decreased instead of increased (Trahey et al., 1986b)!
This result is also influenced by the correlation of the subimages due to the
partial overlapping of the successive frequency bands.
It is shown in the previous section on lesion detection that the visibility of
a lesion in a homogeneous background is limited by the speckle nature of
echographic images. One possible method of enhancing this detectability, as
an aid to the human observer, would be to reduce the speckle. Various kinds
of smoothing filters to achieve this improvement have been described in the
literature. The simplest approach is a mean filter, where the average gray level
of a rectangular sliding window is assigned to the central pixel. This strategy
will, of course, improve the pixel signal-to-noise ratio, SNR_p, but at the same
time it will reduce the contrast and the sharpness of the image (Smith et al.,
1983b). For these reasons, the overall effect on the SNR will be negligible for
small lesions. The sharpness, i.e., the contours of large specularly reflecting
structures, can be preserved, however, by an adaptive mean filter (Pratt, 1978;
Pitas and Venetsanopoulos, 1990). This filter can be described by the
equation
v′ = ⟨v⟩_w + k(v − ⟨v⟩_w), (56)
where
v′ = new gray level of the central pixel of a window,
⟨v⟩_w = mean gray level of the window,
k = (similarity) factor, and
v = original gray level of the central pixel.
This approach was applied by Bamber and Daft (1986), who used the
variability of the gray-level statistics of a window, relative to that of a
reference (image), to set the adaptive factor k:
k = (g·p − ⟨p⟩_r)/p, (57)
where
p = (σ_v/⟨v⟩) of the window,
g = a factor determining the aggressiveness of the filter, and
⟨p⟩_r = (σ_v/⟨v⟩)_r of a reference image.
The effect of the similarity factor is that, in a window in which a contour
(edge) is present, the original pixel information is preserved. Although the
authors showed some examples of processed clinical echograms, the impact
on tumor detection still has to be proven. Also, adaptive non-linear filters
were investigated for the purpose of improving lesion detectability. Loupas
et al. (1989) used an adaptive weighted median filter (see Pitas and
Venetsanopoulos, 1990), and more recently, Kotropoulos and Pitas (1992)
showed that the maximum likelihood estimator of a lesion is an L₂ mean filter,
i.e., the mean of the squared gray levels.
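A minimal sketch of the adaptive mean filter of Eqs. (56)-(57) follows (an added illustration; the window size, the reference value p_ref, and the clipping of k to [0, 1] are assumptions, not prescriptions of Bamber and Daft):

import numpy as np

def adaptive_mean_filter(img, window=7, g=1.0, p_ref=0.5):
    h = window // 2
    padded = np.pad(img.astype(float), h, mode="reflect")
    out = np.empty(img.shape, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            w = padded[r:r + window, c:c + window]
            m = w.mean()
            p = w.std() / m if m > 0 else 0.0          # p = sigma_v/<v> of the window
            k = (g * p - p_ref) / p if p > 0 else 0.0  # Eq. (57)
            k = min(max(k, 0.0), 1.0)                  # assumed clipping to [0, 1]
            out[r, c] = m + k * (padded[r + h, c + h] - m)  # Eq. (56)
    return out

In a homogeneous speckle region p stays close to the reference value, k is small, and the window mean is substituted (smoothing); across a contour p is large, k approaches g, and the original pixel value is retained.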
The adaptive filters described thus far are based on the assumption that a
lesion can be characterized as a region where the mean gray level differs from
that of the surrounding tissue. However, it is known from the literature that
tumors of appreciable size can be isoechoic. This means that the first-order
gray-level statistics should not be considered exclusively. A problem arises
when using higher order statistics: The amount of data, i.e., the window size,
needed to estimate higher-order parameters is larger than for those of the
first-order statistics. An attractive solution to this problem has been worked
out by Verhoeven and Thijssen (1990; Verhoeven et al., 1991). From the
simulation study by Oosterveld et al. (1985), it is evident that both the pixel
SNR and the width of the autocorrelation function display a similar change
with changing number density of the scatterers in a diffusely scattering

medium. For this reason, the SNR_p can be used as an estimator of the
second-order statistics, but with a window size that is comparable to those
involved in the mean and median filters described previously. The SNR-filter
could be shown to produce visible lesions in the case of absence of gray-level
contrast (Verhoeven and Thijssen, 1990; Verhoeven et al., 1991).
The non-adaptive filters can also be looked at as producing parametric
images. So far, only parameters derived from the texture statistics have been
considered. It is, however, also feasible to estimate locally acoustic tissue
parameters: attenuation coefficient and backscattering parameters (Coleman
et al., 1985; Insana and Hall, 1990). Furthermore, multiparameter images
were derived by applying a cluster analysis (Mailloux et al., 1985) or a
discriminant analysis (Momenan et al., 1988) to the locally derived
parameters in an effort to obtain a segmentation. It may be concluded, then,
that currently many ideas are circulating in the scientific community, but the
clinical impact remains to be shown. Moreover, the inhomogeneity of the
speckle characteristics of echographic images (i.e., depth-dependence due to
beam diffraction) remains an additional complicating factor in image pro-
cessing, at least for the time being.
C. Detection of Diffuse Pathological Conditions
The detection of diffuse pathological changes is a difficult task for a human
observer when using present-day equipment. The first problem is that the
“normal” condition, i.e., the normal appearance of the tissue texture, has to
be memorized. The second problem is a practical one: to be able to assess
changes of the mean gray-level, the gain and TGC settings of the equipment
should be consistently maintained after repeated calibrations, e.g., with a
stable tissue-mimicking phantom. This procedure is complicated by the
variable attenuation of intervening tissues (e.g., subcutaneous fat layer) of
patients, which should be compensated for by taking an average attenuation
per cm of tissue into account. Some brands of equipment facilitate this
procedure by enabling different TGC ranges and slopes to be set. The third
problem is extensively discussed in this paper; it arises from the dependence
of the speckle pattern on the transducer characteristics and on the depth
range (the “diffraction” effects). This problem is circumvented to a large
extent by the “computed sonography” type of echographic equipment. The
array transducer of this equipment is software-controlled in such a way that,
at a series of depths, a focusing at transmission is obtained with the same
numerical aperture, i.e., the employed part of the array is increased in
proportion to the depth range. This multifocus transmission mode is
combined with either the continuous (dynamic) focusing at reception, or else
the multifocus mode is also employed at reception. The resulting tissue
echograms display a fairly homogeneous texture over a large depth range.
When scanning a patient, attenuation will again cause a depth dependence,
but it might be avoided to some extent if the synthetic focusing is performed
while anticipating an average attenuation level (e.g., 0.3 dB/cm MHz).
However, the effects caused by the modification of the spectral contents of the
travelling waveform by the tissue cannot easily be corrected for, and the
lateral size of the speckle will therefore still be depth-dependent.
A more appropriate means of assessing diffuse changes of the tissue texture
is the employment of a computer to analyze not only the first-order (i.e.,
gray-level histogram) but also the second-order (i.e., speckle characteristics)
statistical properties of the texture. When the radiofrequency signals, rather
than the video echograms, are digitized, a proper correction for both the
beam diffraction and the attenuation effects along the scan lines can be
achieved in the frequency domain prior to (software) image formation. The
assessment of abnormality can then be performed by comparing the acoustic
tissue parameters (attenuation coefficient, backscattering) as well as the
texture features of the image under investigation with a data base of
“normals.” This kind of combined acoustospectrographic and textural
analysis has already produced very convincing results (Insana et al., 1986a;
Garra et al., 1987, 1989; Oosterveld et al., 1991; Oosterveld, 1990; Nicholas
et al., 1986; Schlaps et al., 1987; Raeth et al., 1985; Feleppa et al., 1986;
Thijssen et al., 1991) and should be advocated for future developments in
equipment technology.

ACKNOWLEDGMENTS

This work has been supported by grants from the Netherlands’ Cancer
Foundation and the Technical Branch of the Netherlands’ Organization for
Scientific Research (NWO).

REFERENCES

Abbott, J. G., and Thurstone, F. L. (1979). Acoustic speckle: theory and experimental analysis. Ultrasonic Imag. 1, 303-324.
Bamber, J. C., and Daft, C. (1986). Adaptive filtering for reduction of speckle in ultrasonic pulse-echo images. Ultrasonics 24, 41-44.
Berger, G., Giat, P., Laugier, P., and Abouelkaram, S. (1992). Basic aspects of the max-min
measure related to tissue texture. In “Acoustical Imaging” (H. Ermert and H. P. Harjes, eds.),
Vol. 19. Plenum, New York (in press).
Burckhardt, C. B. (1978). Speckle in ultrasound B-mode scans. IEEE Trans. Sonics Ultrasonics SU-25, 1-6.
Campbell, J. A., and Waag, R. C. (1984). Measurement of calf liver ultrasonic differential and
total scattering cross sections. J . Acoust. Soc. Am. 75, 603-611.

Cloostermans, M. J. T. M., and Thijssen, J. M. (1983). A beam corrected estimation of the frequency dependent attenuation of biological tissues from backscattered ultrasound. Ultrasonic Imag. 5, 136-147.
Coleman, D. J., Lizzi, F. L., Silverman, R. H., Helson, L., Torpey, J. H., and Rondeau, M. J. (1985). A model for acoustic characterization of intraocular tumours. Invest. Ophthal. Vis. Sci. 26, 545-550.
Dines, K. A., and Kak, A. C. (1979). Ultrasonic attenuation tomography of soft tissues. Ultrasonic Imag. 1, 16-33.
Feleppa, E. J., Lizzi, F. L., Coleman, D. J., and Yaremko, M. M. (1986). Diagnostic spectrum analysis in ophthalmology: a physical perspective. Ultrasound Med. Biol. 12, 623-631.
Fink, M., and Cardoso, J. F. (1984). Diffraction effects in pulse-echo measurement. IEEE Trans. Sonics Ultrasonics SU-31, 313-329.
Fink, M., Hottier, F., and Cardoso, J. F. (1983). Ultrasonic signal processing for in vivo attenuation measurement: short time Fourier analysis. Ultrasonic Imag. 5, 117-135.
Flax, S. W., Glover, G. H., and Pelc, N. J. (1981). Textural variations in B-mode ultrasonography: a stochastic model. Ultrasonic Imag. 3, 235-257.
Foster, D. E., Arditi, M., Foster, F. S., Patterson, M. S., and Hunt, J. W. (1983). Computer
simulations of speckle in B-scan images. Ultrasonic Imag. 5, 308-330.
Garra, B. S., Shawker, T. H., Insana, M. F., and Wagner, R. F. (1987). In-vivo attenuation measurement methods and clinical relevance. In “Ultrasonic Tissue Characterization and Echographic Imaging” (J. M. Thijssen and G. Berger, eds.), Vol. 6, pp. 87-100. Office for Official Publ. EC, Luxembourg.
Garra, B. S., Insana, M. F., Shawker, T. H., Wagner, R. F., and Bradford, M. (1989). Quantitative ultrasonic detection and classification of diffuse liver disease: Comparison with human observer performance. Invest. Radiol. 24, 196-203.
Gehlbach, S. M., and Sommer, F. G. (1987). Frequency diversity speckle processing. Ultrason. Imag. 9, 92-105.
Goodman, J. W. (1975). Statistical properties of laser speckle patterns. In “Laser Speckle and Related Phenomena” (J. C. Dainty, ed.), pp. 9-75. Springer, Berlin.
Haralick, R. M., Shanmugam, K., and Dinstein, I. (1973). Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3, 610-621.
Harris, G. R. (1981). Review of transient field theory for a baffled planar piston. J. Acoust. Soc.
Am. 70, 10-20.
Insana, M. F., and Hall, T. J. (1990). Characterization of microstructure of random media using ultrasound. Phys. Med. Biol. 35, 1373-1386.
Insana, M. F., Wagner, R. F., Garra, B. S., and Shawker, T. H. (1986a). A statistical approach to an expert diagnostic ultrasonic system. In “Application of Optical Instrumentation in Medicine XIV” (R. H. Schneider and S. J. Dwyer, eds.). Proc. Soc. Photo-Opt. Instr. Eng. 626, 24-29.
Insana, M. F., Wagner, R.F., Garra, B. S., Brown, D. G., and Shawker, T. H. (1986b). Analysis
of ultrasound image texture via generalized Rician statistics. Opt. Eng. 25, 743-748.
Jacobs, E. M. G., and Thijssen, J. M. (1991). A simulation study of echographic imaging of structurally scattering media. Ultrasonic Imag. 13, 316-333.
Jaffe, C. C., and Harris, D. J. (1980). Sonographic tissue texture: influence of transducer
focussing pattern. Am. J. Roentgenol. 135, 343-347.
Jakeman, E. (1984). Speckle statistics with a small number of scatterers. Opt. Eng. 23, 453-461.
Jongen, H., Thijssen, J. M., Van den Aarssen, M., and Verhoef, W. A. (1986). A general model
for the absorption of ultrasound by biological tissues and experimental verifications. J.
Acoust. Soc. Am. 79, 535-540.
Kotropoulos, C., and Pitas, I. (1992). Nonlinear filtering of speckle noise in ultrasound B-mode images. Ultrasonic Imag., in press.
Kossoff, G. (1974). Display techniques in ultrasound pulse echo investigations. J. Clin.
Ultrasound 2, 61-72.
Krimholtz, R., Leedom, D. A., and Matthaei, G. L. (1970). New equivalent circuits for elementary piezoelectric transducers. Electronics Letters 6, 398-399.
Kuc, R., Schwartz, M., and von Micksky, G. L. (1976). Parametric estimation of the acoustic attenuation coefficient slope for soft tissues. In “IEEE Ultrasonics Symposium Proceedings,” IEEE Cat. No. 76 CH1120-5SU, pp. 44-47.
Lerski, R. A., Barnett, E., Morley, P., Mills, P. R., Watkinson, G., and MacSween, R. N. M. (1979). Computer analysis of ultrasonic signals in diffuse liver disease. Ultrasound Med. Biol. 5, 341-350.
Lizzi, F. L., Greenebaum, E. J., Feleppa, E. J., and Elbaum, M. (1983). Theoretical framework
for spectrum analysis in ultrasonic tissue characterization. J . Acoust. Soc. Am. 73, 1366-1373.
Loupas, T., McDicken, W. N., and Allan, P. L. (1989). An adaptive weighted median filter for speckle suppression in medical ultrasonic images. IEEE Trans. Circ. Syst. CAS-36, 129-135.
Lowenthal, S., and Arsenault, H. (1970). Image formation for coherent diffuse objects: statistical
properties. J. Opt. Soc. Am. 60, 1478-1483.
Magnin, P. A., Von Ramm, O. T., and Thurstone, F. L. (1982). Frequency compounding for speckle contrast reduction in phased array images. Ultrason. Imag. 4, 267-281.
Mailloux, G. E., Bertrand, M., and Stampfler, R. (1985). Local histogram information content
of ultrasound B-mode echographic texture. Ultrasound Med. Biol. 11, 743-750.
Mason, W. (1948). “Electromechanical Transducers and Wave Filters.” Van Nostrand, New
York.
Metz, C. E. (1978). Basic principles of ROC analysis. Sem. Nucl. Med. 8, 283-298.
Mitchell, O. R., Myers, C. R., and Boyne, W. (1977). A max-min method for image texture analysis. IEEE Trans. Comput. C-26, 408-414.
Momenan, R., Insana, M. F., Wagner, R. F., Garra, B. S., and Brown, D. G. (1988). Application of cluster analysis and unsupervised learning to multivariate tissue characterization. J. Clin. Eng. 13, 455-461.
Morse, P. M., and Ingard, K. U. (1968). “Theoretical Acoustics.” McGraw-Hill, New York.
Nicholas, D. (1977). An introduction to the theory of acoustic scattering by biological tissues.
In “Recent Advances in Ultrasound in Biomedicine” (D. N. White, ed.), Vol. 1, pp. 1-28. Research Studies Press, Forest Grove, Oregon 97116.
Nicholas, D. (1982). Evaluation of backscattering coefficients for excised human tissues: results,
interpretation and associate measurements. Ultrasound Med. Biol. 8, 17-28.
Nicholas, D., and Hill, C. R. (1975). Acoustic Bragg diffraction from human tissues. Nature 257,
305-307.
Nicholas, D., Nassiri, D. K., Garbutt, P., and Hill, C. R. (1986). Tissue characterization from
ultrasound B-scan data. Ultrasound Med. Biol. 12, 135-143.
North, D. O. (1963). The modification of noise by certain non-linear devices. Proc. IEEE 51,
10-16.
Oosterveld, B. J. (1990). “On the Quantitative Analysis of Ultrasound Signals with Applications
to Diffuse Liver Disease.” Ph.D. Thesis, Nijmegen University, The Netherlands.
Oosterveld, B. J., Thijssen, J. M., and Verhoef, W. A. (1985). Texture of B-mode echograms: 3-D
simulations and experiments of the effects of diffraction and scatterer density. Ultrasonic Imag.
7, 142-160.
Oosterveld, B. J., Thijssen, J. M., Hartman, P., and Rosenbusch, G . J. E. (1989). Ultrasound
attenuation and B-mode texture analysis of diffuse liver disease. In “Ultrasonic Tissue
Characterization and Echographic Imaging” (J. M. Thijssen, ed.) Vol. 7, pp. 43-54. Publ.
Office EC, Luxembourg.

Oosterveld, B. J., Thijssen, J. M., Hartman, P., and Rosenbusch, G. J. E. (1991). Ultrasound
attenuation and texture analysis of diffuse liver disease: Methods and preliminary results.
Phys. Med. Biol. 36, 1039-1064.
Pauly, H., and Schwan, H. P. (1971). Mechanism of absorption of ultrasound in liver tissue. J. Acoust. Soc. Am. 50, 692-699.
Pitas, I., and Venetsanopoulos, A. N. (1990). “Nonlinear Digital Filters: Principles and Applications.” Kluwer, Boston.
Pratt, W. K. (1978). “Digital Image Processing.” Wiley, New York.
Raeth, U., Schlaps, D., Limberg, B., et al. (1985). Diagnostic accuracy of computerized B-scan texture analysis and conventional ultrasonography in diffuse parenchymal and malignant liver disease. J. Clin. Ultrasound 13, 87-99.
Rice, S. O. (1945). Mathematical analysis of random noise. Bell Syst. Tech. J. XXIV, 46-156.
Romijn, R. L., Thijssen, J. M., and van Beuningen, G. W. J. (1989). Estimation of scatterer size
from backscattered ultrasound: a simulation study. IEEE Trans. Ultrasonics Ferroel. Freq.
Control UFFC-36, 593-606.
Romijn, R. L., Thijssen, J. M., Oosterveld, B. J., and Verbeek, A. M. (1991). Ultrasonic
differentiation of intraocular melanomas: parameters and estimation methods. Ultrasonic
Imag. 13, 27-55.
Schlaps, D., Zuna, I., Walz, M., et al. (1987). Ultrasonic tissue characterization by texture analysis: elimination of tissue independent factors. In “Proceedings SPIE Congress” (L. A. Ferrari, ed.), Proc. Soc. Photo-Opt. Instr. Eng. 768, 128-134.
Sleefe, G. E., and Lele, P. P. (1988). Tissue characterization based on scatterer number density
estimation. IEEE Trans. Ultrasonics Ferroel. Freq. Control UFFC-35, 749-757.
Smith, S. W., and Lopez, H. (1982). A contrast detail analysis of diagnostic ultrasound imaging.
Med. Phys. 9, 4-12.
Smith, S. W., and Wagner, R. F. (1984). Ultrasound speckle size and lesion signal to noise ratio:
verification of theory. Ultrasonic Imag. 6, 174-180.
Smith, S. W., Lopez, H., and Bodine, W. J. (1983a). Frequency independent ultrasound contrast-detail phantom. J. Ultrasound Med. 2, 75.
Smith, S. W., Wagner, R. F., Sandrik, J. M., and Lopez, H. (1983b). Low contrast detectability and contrast/detail analysis in medical ultrasound. IEEE Trans. Sonics Ultrasonics SU-30, 164-173.
Stepanishen, P. R. (1971). Transient radiation from pistons in an infinite planar baffle. J. Acoust. Soc. Am. 49, 1629-1638.
Swets, J. A., and Pickett, R. M. (1982). “Evaluation of Diagnostic Systems.” Academic Press,
New York.
Thijssen, J. M. (1987). Ultrasonic tissue characterization and echographic imaging. Med. Progr.
Technol. 13, 29-46.
Thijssen, J. M. (1988). Focal lesions in medical images: a detection problem. In “Proceedings
NATO-ASI Mathematics and Computer Science in Medical Imaging” (M. A. Viergever and A. Todd-Pokropek, eds.), pp. 415-440. Springer, Berlin.
Thijssen, J. M., and Oosterveld, B. J. (1985). Texture in B-mode echograms: a simulation study
of the effects of diffraction and of scatterer density on gray scale statistics. In “Acoustical
Imaging” (A. J . Berkhout, J. Ridder and L. Van der Wal, eds.), Vol. 14, pp. 481-486. Plenum,
New York.
Thijssen, J. M., and Oosterveld, B. J. (1986). Speckle and texture in echography: artifact or
information? In “IEEE Ultrasonics Symposium Proceedings” (B. R. McAvoy, ed.) Vol. 2,
pp. 803-810.
Thijssen, J. M., and Oosterveld, B. J. (1988). Performance of echographic equipment and potentials for tissue characterization. In “Proceedings NATO-ASI Mathematics and Computer Science in Medical Imaging” (M. A. Viergever and A. Todd-Pokropek, eds.), pp. 455-468. Springer, Berlin.
Thijssen, J. M., and Oosterveld, B. J. (1990). Texture in tissue echograms: speckle or informa-
tion? J . Ultrasound Med. 9, 215-229.
Thijssen, J. M., Oosterveld, B. J., and Wagner, R. F. (1988). Gray level transforms and lesion
detectability in echographic images. Ultrasonic Imag. 10, 171-195.
Thijssen, J. M., Verbeek, A. M., Romijn, R. L., et al. (1991). Echographic differentiation of histological types of intraocular melanoma. Ultrasound Med. Biol. 17, 127-138.
Trahey, G. E., Smith, S. W., and Von Ramm, O. T. (1986a). Speckle pattern correlation with
lateral translation: Experimental results and implications for spatial compounding. IEEE
Trans. Ultrasonics Ferroel. Freq. Control UFFC-33, 257-264.
Trahey, G. E., Allison, J. W., Smith, S. W., and Von Ramm, O. T. (1986b). A quantitative approach to speckle reduction via frequency compounding. Ultrason. Imag. 8, 151-164.
Van Kervel, S. J. H., and Thijssen, J. M. (1983). A calculation scheme for the design of optimal
ultrasonic transducers. Ultrasonics 21, 134-140.
Verhoef, W. A., Cloostermans, M. J. T. M., and Thijssen, J. M. (1984). The impulse response of a focussed source with an arbitrary axisymmetric velocity distribution. J. Acoust. Soc. Am. 75, 1716-1721.
Verhoef, W. A., Cloostermans, M. J. T. M., and Thijssen, J. M. (1985). Diffraction and
dispersion effects on the estimation of ultrasound attenuation and velocity in biological
tissues. IEEE Trans. Biomed. Engng. BME-32, 521-529.
Verhoeven, J. T. M., and Thijssen, J. M. (1990). Improvement of lesion detection by echographic image processing: signal-to-noise ratio imaging. Ultrasonic Imag. 12, 130.
Verhoeven, J. T. M., Thijssen, J. M., and Theeuwes, A. G. M. (1991). Improvement of lesion
detection by echographic image processing: signal-to-noise ratio imaging. Ultrasonic Imag. 13,
238-251.
Wagner, R. F., and Brown, D. G. (1985). Unified SNR analysis of medical imaging systems.
Phys. Med. Biol. 30, 489-518.
Wagner, R. F., Smith, S. W., Sandrik, J. M., and Lopez, H. (1983). Statistics of speckle in
ultrasound B-scans. IEEE Trans. Sonics Ultrasonics SU-30, 156-163.
Wagner, R. F., Insana, M. F., and Brown, D. G. (1986). Unified approach to the detection and
classification of speckle texture in diagnostic ultrasound. Opt. Eng. 25, 738-742.
Wagner, R. F., Insana, M. F., and Smith, S. W. (1988). Fundamental correlation lengths of
coherent speckle in medical ultrasonic images. IEEE Trans. Ultrasonics Ferroel. Freq. Control
UFFC-35, 34-44.
Wells, P. N. T., and Halliwell, M. (1981). Speckle in ultrasonic imaging. Ultrasonics 19, 225-229.

Index

A Attenuation coefficient, estimation


centroid shift method, 324
Absorption, relaxation phenomena, 321
log-spectral difference method, 324
Acoustic tissue models, 321
quasi multi-narrow band method, 323
Acoustic (tissue) parameters, 323, 344
Autocorrelation function
Acoustospectrography, 323, 345
axial, 331, 334
Adaptive pattern classification, 265-266,
lateral, 331, 334
270-271
spatial, 328
Adjacency
Auto covariance function, 339
graph, 198-200
Autoregressive process
4- and 8-adjacency, 198
experimental results, 302-309
relation, 198, 232
final conditions, 299
Algebra
initial conditions, 297-299
associative, 271, 311
parameter estimation
Clifford, 277
Box-Jenkins, 297-298
definition, 311 covariance, 291
Jordan, see Jordan algebra
direct method, 288
Lie, 262, 268
exact likelihood, 296-297
von Neumann real, 272, 284 finite sample size results, 292-293
A-mode, 326
forward-backward, 298-302
Applications of lattice transforms relation to Jordan algebra, 293,
communications networks, 66 295-296
dual transportation problem, 124 transformation method, 288-289, 291
generalized skeletonizing technique, 115 power spectral density, 287, 292, 302
image complexity measure, 120 used in example, 287
minimax algebra properties, 90
operations research, 67
scheduling problem, 94 B
Approximate maximum likelihood (AML) Backscattering, spectrum, 320-321, 324
method, 302-309 intercept, 324
Approximation, 244, 247 slope, 324
AR process, see Autoregressive process Backward predictor, 299
Attenuation, 320-321, 323, 329, 332 Belt, 68
effect, 329 identity element in, 69
scattering, 321, 324 null element in, 69

Birkhoff, 72 Commutator, 139, 150, 186
Block Complete lattice, 63
of block complex, 225 Complex, see Cell, complex
diagrams, 247 Component, 207
B-mode image, 325, 327 Compounding, spatial, 341
Bound frequency, 342
Cramer-Rao, 286-287, 292, 302, 304-306, Compression, logarithmic, 340
310 Compression factor, 228
sample size, 266, 271 Computed sonography, 344
trace covariance, 285-286 Conjugacy
Boundary, 199-216 in image algebra
in adjacency graphs, 199, 219-220 images, 74
area, 200, 216 templates, 78
definition, 217 in semi-lattice, 70
difficulties with, 199 self-, 70
tracking, 220 Conjugate element in lattice, additive and
Bounded lattice-ordered group, 67, 69 multiplicative, 70
radicable, 113 Connected
Bragg diffraction, 322 complex, 208
Break point, 244, 248 pairs of cells, 207
Burg method, 302-309 Connectedness relation, 207
Connectivity
C
of complexes, 207-208
Cancellation, 138 paradox, 198, 212
invariance and uniqueness, 141 resolution, 212-216
Canonical coordinates, 147-149 Consistent labelling, 232
rotations and dilations, 152 Constellation of graph edges, 216
smooth deformations, 154 Continuous additive and multiplicative
translations, 152 maximum, extension, 81
Cardinality of set, 73 Contour, 138
Cartesian ACC, 209 invariance, I38
Cartographical Contour, preservation, 343
data structure, 254 Contraidentity matrix, 276
point objects, 250 Contrast detail curve, 340
Cartography, 250-254 Coordinates, 209
Causal residual, 17 Covariance matrix estimate
Cell explicit, 282, 284, 287
abstract, 202 general sample, 264-265, 268, 292-293
complex, abstract (ACC), 202-208 inverse linear, 301
connected, 208 isomorphic block diagonal, 280
definition, 202 linear, 263, 269, 271, 282
k-dimensional, 202 normal equations, 262, 289, 291, 297
&dimensional, 202 orthogonal subspace decomposition, 278
list, 224-228, 247, 250, 254 Crack, 200, 203, 218
Cellular array machines, 64 following, 220
Characteristic function, 7 Cross-correlation, 143-144
of image, 75 generalized, 151
Circular harmonic expansion, 144 Mellin-type, 158
Closed subcomplex, 205 normalized, 143
Closure, 207 Cross-correlator, 142-143
Cuninghame-Green, 67 Estimation of volume, 3-D space, 122
Curvature measure, 244 Experiment, 245-246, 250-251, 255-256
Exponential map, 170, 185f
D Extended boundary, 219
Decision tree, 238
text file description, 241 F
language, 241 Face, 201, 204
compiler, 241, 243 relation, 204
Demodulation, 328 False alarm, 230
Detectability index, 339 Feature, 230-234
Diffraction, 319, 332 Filling, interior, of closed curve, 222
Bragg, 322 Filter
correction, 323 adaptive mean, 342
effect, 344 adaptive non-linear, 343
Fraunhofer, 319 adaptive weighted mean, 343
term, 323 adaptive weighted median, 343
Diffuse pathological conditions, 344 mean, 342
Digital straight segments (DSS),225-226 smoothing, 342
Digitization, automatic SNR, 344
maps, 250 window, 343
technical drawings, 254 Finite sample size
Dimension, 202 effective, 271
Dirac matrix, 277 estimation, 262-263, 265-266, 289, 291,
Distribution 296
autoregressive process parameters, 297-300 Fisher information matrix, 285-286
convergence, 309 initial conditions, importance, 298
exponential family, 263, 274 Morgera-Cooper coefficient, 271
Gauss-Markov, 271 performance
heavy-tailed, 309 adaptive pattern classification, 265-266,
multivariate Gaussian, 264, 274, 380 270-271
sample set, 264, 297, 299 autoregressive parameter estimation,
univariate Gaussian, 297 292-293, 302-309
Wishart, 280-281 covariance estimation, 287
Division algorithm, 115, see also First fundamental form, 189
Skeletonizing technique coefficients, 173
Dual transportation problem, image algebra, Fisher information matrix, 285-286
124 Focus, 33 1 , 344, see also Transducer
Fourier-Mellin transform, I59
E Fourier transform, 135
Eigenproblem, image algebra, 111-114 discrete, 62, 84
eigenimage, 113 fast, 62
eigennode, 112 Forward predictor, 299
equivalent, 112 Fraunhofer diffraction, 319
eigenspace of template, 112 Fundamental theorem of surface theory,
eigenvalue, 112 173, 192
principal, template, 113
solutions, 114 G
Entanglement, 28 Gaussian curvature, 174, 191
Entropy, 8 Gauss-Weingarten equations, 176, 191
sample set, 264 Generalized Lloyd algorithm, 10

Generalized matrix product, 67 Image, 208


Global predicate, 236-239, 245 additive conjugate, 74
Global reduce operation, 79 binary operations between, 73-74
Gohberg-Semencul decomposition, 296 characteristic function, 75
Grafted branch, 16 complex, 235
Graft residual, 17 on a complex, 208-212
centroid, 16, 18 complexity measure, 120
Graph theory, 107 constant, 74
circuit, length, weighted graph, 113 correspondence with mathematical
correspondence between graph and morphology set, 88
template, 107 definition, 73
weighted, associated with template, 107 graph, 229
Gray level statistics, 328, 337 induced unary operation, 75
cooccurrence matrix, 338-339 n-dimensional, 210
first order, 329-330 operations between template and, 79
histogram, 328-329 parametric, 344
kurtosis, 329-330 processing, 338
mean, 329-330 Image algebra, 72
probability, density function correspondence with
circular, 327 mathematical morphology, 88
exponential, 328 minimax algebra, 85
Gaussian, 327 first to use term, 65
joint, 327, 333 image processing, 64
Rayleigh, 328, 330 minimax algebra properties mapped to, 90
Rice, 333 origin, 65
Rician variance, 333, 336 Image thresholding, using characteristic
second order, 328, 337 function, 75
signal-to-noise ratio, 329, 333 Incidence relation, 207
standard deviation, 329-330 Incident
structural variance, 336 cells, 207
Greatest lower bound, 68 subcomplexes, 219
Infinitesimal operator, 136, 138, 146, 184f
H dilations, 152
rotations, 152
Hadwiger, 63, 87 smooth deformations, 154
Hand-made drawings, 240, 244, 247 Instrumentable, 4
Handwritten characters, 245-247 Integral transform, 134
Hessian matrix, 282, 303 condition for invariance, 135
Heterogeneous algebra, 72
condition for uniqueness, 135
Homomorphism covariant, 161, 164
definition, 312 invariant in the strong sense, 146-151, 155
Jordan algebra, 279-280 kernel, 154
Homomorphism, semi-lattice, right linear, with respect to dilations and rotations,
69 153
Interference, 319, 326, 329
I Interpretation relation, 231
Intervening tissue, 344
Ideal Invariance, 131, I34
definition, 312 strong, 133
role in orthogonal decomposition, 278 condition of, 147, 149
Ideal observer, 339
Invariant coding, 132, 136 focal, 338
three dimensions, 167 SNR, 339-340, 342
Invariant functions, 155, 157 Lie algebra, 262, 268
Invariant recognition, 131, 142 Lie bracket, 139, 150, 186ff
human visual system, 131 Lie groups, 136, 181-188
Isometry, 281 Linear mapping
Isomorphism definition, 310-311
embedding minimax algebra into image isometry, 281
algebra, 85 Linking cell lists, 254
labelled subgraphs, 231 Liver
subgraphs, 229 diffuse disease, 336
Iterative method lobular structure, 333
Algorithm B, 287, 302 sub-resolution structure, 333
annealing, 303 triads of Kiernan, portal, 322, 333
Levinson, 303 Local decomposition, 84
Newton-Raphson, 288, 300-302, 313-314 Local neighborhood, 78
Loglikelihood
J Box-Jenkins, 297-298
Jacobian, 291 constrained, 267, 282, 301
Jordan algebra exact forward-backward, 296, 298-300
definition, 311 maximization, 300-302, 313-314
dimension, 275-276 modified, 288, 296
generation, 274 perturbed, 264
homomorphism, 279-280, 312 surface, 265
multiplication tables, 312-313 unconstrained, 264-265
simple, 276-277, 280 LPCH transform, 145
special, 312 invariance, 145
symmetric linear mapping, 273 LTG/NP, 137-142
symmetric product, 272-273, 311
Jordan theorem, 198, 212 M
proof, 212 Manifold, 137, 181
Map, automatic digitization, 250-254
K Mapping
Khalimsky space, 210 predicate-conditioned
sets of regions, 235
L subcomplexes, 236
template to computer architectures, 78
Label Mathematical morphology, 63
membership, 210, 224 dilation and erosion, 63, 87
segmentation, 224 history, 86
semantic, 230 hit or miss transform, 87
Lattice, relationship to complex numbers, 68 limitations, 65
Lattice-ordered group, 66 opening and closing, 87
Lattice transforms, definition, 71 transform, as block toeplitz matrix with
Least-squares toeplitz blocks, 89
forward-backward method, 302-309 Matrix operations, see also Minimax matrix
generalized, 282 theory or minimax algebra
Least upper bound, 68 pointwise maximum and product, 71
Lesion Maximum likelihood estimate
detection, 338, 341 exact, autoregressive parameters, 298-300

explicit, 281-282, 284 Morgera-Cooper coefficient, 27 1


linear covariance model, 282 Morphological neural networks, 62, 89
member of Jordan algebra, 284, 301
Maximum value rule, 211, 220
Mean curvature, 174, 191
Medial axes, 256 Non-obligatory strokes, 245
Mellin transform, 158 Normal, to surface, 173, 189
Membership Normal equations, 289, 291, 297
cells in subset, 210
label, 210, 224 0
rules, 21 I
Open
local, 218
boundary, 223
Metric data, 225, 227
screen, 224
Minimax algebra, 66
subcomplex, 205
equivalent linear programming criterion,
subset, 201, 204
102
properties, 90
properties mapped to image algebra, 90 P
/-defined and /-undefined products, 98 Parametric image, 344
alternating t-t star products, 96 Path
conjugacy, 93 in complex, definition, 208
homomorphisms, classification of right in graph, 198
linear, 92 Path-connected complex, 208
linear dependence, 103 Pixel, 197, 203
linear independence, 104 definition, 73
scalar multiplication, 91 location and value, 73
rank SNR, 342
column, row 0-astic, 105 Point sets, definition, 73
dual, 106 Polygon corner, 245
existence and relation to SLI, 106 Predicate, 234, 245
similarities to linear algebra, 71 global, 236-239, 245
strongly linear independent, 103 Predicate-conditioned mapping
systems of equations, solutions, 98 subcomplexes, 236
boolean equations, 100 subgraphs, 235
existence and uniqueness, 101 Principal idempotents, 275
templates Prototype
adjugate, 107 complex, 235
based on set, 104 graph, 229
definite, 108 variability, 238, 245
elementary, 110 Pseudoraster, 253
equivalent, 110
identity, 92
increasing, 106 Q
inverse, 109 Quantization, 2
invertible, 109 Quantizer
metric, 109 equivalent, 13
permanent, 106 exhaustive search, 3
Minimax matrix theory, 67 lattice, 4
Minkowski operations, addition and residual, 4, 11
subtraction, 63, 87 exhaustive search, 5, 22
MLE, see Maximum likelihood estimate reflected, 30
scalar, 14 Scattering
vector, 26 anisotropic, 333
single-stage Bragg diffraction, 322
scalar, 6 combined, model, 332
vector, 9 diffuse, model, 329, 332, 334
tree structured, 4 isotropic, 332
Quantum and wave mechanics, 262-263, 276 Rayleigh, 330, 332
Quotient resolved structure, 333, 336
space, 202, 225, 272-273, 286 characteristic dimension (scatterer
topology, 225 spacing), 334
structural, model, 332
R subresolution structure, 333
unresolved structure, 336
Radiofrequency echogram, 325, 338 Scheduling problem, 94
Rate-distortion theory, 2 SCS matrix, see Symmetric centrosymmetric
Reception, linear phase sensitive, 326 matrix
Recognition Second fundamental form, 190
program, 238 coefficients, 173
of types of lines, 256 Segmentation, 224
Recursive maximum likelihood estimate Semantic label, 230
(RMLE), 302-309 Semi-lattice ordered group, 66
Region, 225 Semi-lattice ordered semi-group, 68
Region adjacency graph, 225, 229 Shortest path problem, 64
Representation, 132 Signal detection, statistical theory, 338
explicit, 180 Signal-to-interference ratio (SIR), 265-271
images Signal-to-noise ratio (SNR), 329, 333
domain (u, v), 134 Singular location, 248
domain (x, y), 133 Skeletonizing technique, 115
implicit, patterns, 179 application to data compression, 120
invariant, 133 image algebra notation, 119
in strong sense, 133, 149-150 matrix notation, 115
objects, 177 Smallest open neighborhood (SON), 205
in weak sense. 133, 157-158 Smoothing filter, 342-343
objects, 174 Sound field, 319
surfaces Space
Monge patch, 168, 189 for matrices, right semi-lattice, two-sided,
parametric, 168, 188 and function, 69
uniqueness, 132- I33 for templates, 9 1
Resolution cell, 326 Specialization order, 201
Rigid motion in Rq, 169-170 Speckle, 326, 328
Rotation, in R9, 169, 176 attenuation effect, 329
autocorrelation function, 331
S full-width-at-half maximum, 331
fully developed, 330
Sampling, 2 reduction, 342
volume, 322 size, axial, lateral, 329, 331, 334, 338
Scanning, 329 Specular reflector, 333
large drawing, 254 Speed, propagation, 321
Scatterer Statistic
number density, 328-330 complete sufficient, 272-274, 284, 289
spacing, 336 explicit, 278, 281-282

minimal, 265, 285 backward and forward additive


Structure constant, 277-278 maximum, 80
Structured centrosymmetric matrix backward and forward linear
estimation performance convolution, 79
autoregressive parameters, 292-293, backward and forward multiplicative
302-309 maximum, 80
covariance, 287 continuous domain, 81
ideal structure, 278-279 generalized backward and forward, 79
isomorphic block diagonal form, 280-28 1 multiplicative additive and minimum,
isomorphism of simple algebras, 276-278 81
Jordan subalgebra dimension, 275-276 operations between templates, 81
relation to Toeplitz matrix, 270 convolution type
role in autoregressive parameter additive maximum, 83
estimation, 290-291, 293-296, 301 dual to additive maximum, 84
structure set, 272-275 generalized backward, 83
trace covariance bound, 285-286 linear convolution, 83
Structure set pointwise, 82
commutative, 271-272 row/column/doubly-P-astic, 98
for Dirac matrix, 277 strictly doubly F-astic, I10
extension support, infinite negative and positive, 77
quotient space, 272-273 target point, 77
recursive, 274-275 translation invariant and variant, 78
free, 272 transpose, 78
inverse covariance, 272, 3 13 Texture
for minimum variance estimation, 286 analysis, 329
symmetric centrosymmetric matrix, generation, 325
272-275 non-parametric analysis, 337, 345
Toeplitz covariance matrix, 269-270 cooccurrence matrix, 337
Structuring element, 65, 87-88 MAX-MIN method, 338
Subcomplex power spectrum, 336
definition, 205 Time-gain-compensation, 329, 340, 344
mapping, 236 Tissue mimicking phantom, 340, 344
Subgraph isomorphism, 229-235 Toeplitz matrix
biased correlation estimate, 303
T estimation performance
autoregressive parameters, 292-293,
Tangent vector, 137, 171, 173, 189 302-309
Technical drawings, 254-257 covariance, 287
Template SIR, 270-271
additive and multiplicative conjugates, 78 inverse, 271-272
constant, 82 lowest Jordan subalgebra dimension,
correspondence with structuring element, 275-276
85 maximum likelihood estimate, 269, 282,
decomposition, 78, 84 285
definition, 76 relation to symmetric centrosymmetric
example, 77 matrix, 270
induced functions, 82 role in autoregressive parameter
null, negative and positive, 82 estimation, 289-291, 293-296
one-point, 82 structure set, 269-270
operations between image and template, trace covariance bound, 285-286
78 Topographical maps, 250-254
Topological space, axioms, 201 Translation, in R9, 169, 177
TO-space, 220 Tree
Total residual error, 12 entangled, 22
Trace inner product, 265, 270 unentangled, 22
Tracking, boundary, 220 True covariance matrix model
Transducer free parameters, 266
aperture, 344 inverse linear, 271-272
array, 344 linear, 266-268, 273, 282
backing medium, 319 nonsymmetric, 264
continuous wave, 319 orthogonal complement identity, 268
directivity function, 320 simple symmetry, 265-266
dynamic focus, 344
geometrical, 331 V
linear array, 341
multifocus mode, 344 Variability of prototypes, 238, 245
phased array, 341 Vector field, 136, 138, 170, 184f
piezoelectric layer, 318 holonomy, property, 138
pulsed mode, 319 prolongations, 138, 141, 187
pulse-echo mode, 318 Video signal, 328
pulse waveform, envelope, 319, 331 Voxels, 203
synthetic focus, 345
Transfer function, tissue, 320, 323 W
Transformation group, 133, 146, 184f Wiener weight vector, 266
