PRENTICE HALL SIGNAL PROCESSING SERIES

KAY Modern Spectral Estimation
KINO Acoustic Waves: Devices, Imaging, and Analog Signal Processing
LEA, ED. Trends in Speech Recognition
LIM Two-Dimensional Signal and Image Processing
LIM, ED. Speech Enhancement
LIM AND OPPENHEIM, EDS. Advanced Topics in Signal Processing
MARPLE Digital Spectral Analysis with Applications
MCCLELLAN AND RADER Number Theory in Digital Signal Processing
MENDEL Lessons in Digital Estimation Theory
OPPENHEIM, ED. Applications of Digital Signal Processing
OPPENHEIM, WILLSKY, WITH YOUNG Signals and Systems
OPPENHEIM AND SCHAFER Digital Signal Processing
OPPENHEIM AND SCHAFER Discrete-Time Signal Processing
QUACKENBUSH ET AL. Objective Measures of Speech Quality
RABINER AND GOLD Theory and Applications of Digital Signal Processing
RABINER AND SCHAFER Digital Processing of Speech Signals
ROBINSON AND TREITEL Geophysical Signal Analysis
STEARNS AND DAVID Signal Processing Algorithms
TRIBOLET Seismic Applications of Homomorphic Signal Processing
WIDROW AND STEARNS Adaptive Signal Processing

JAE S. LIM
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
TO KYUHO AND TAEHO

Editorial/production supervision: Raeia Maes
Cover design: Ben Santora
Manufacturing buyer: Mary Ann Gloriande
ISBN 0-13-935322-4
PREFACE xi
INTRODUCTION xiii

1 SIGNALS, SYSTEMS, AND THE FOURIER TRANSFORM 1
1.0 Introduction, 1
1.1 Signals, 2
1.2 Systems, 12
1.3 The Fourier Transform, 22
1.4 Additional Properties of the Fourier Transform, 31
1.5 Digital Processing of Analog Signals, 45
References, 49
Problems, 50

2 THE Z-TRANSFORM 65
2.0 Introduction, 65
2.1 The z-Transform, 65
2.2 Linear Constant Coefficient Difference Equations, 78
2.3 Stability, 102
References, 124
Problems, 126

4.3 Filter Design by the Window Method and the Frequency Sampling Method, 202
4.4 Filter Design by the Frequency Transformation Method, 218
4.6 Implementation of FIR Filters, 245

5.1 The Design Problem, 265
5.4 Stabilization of an Unstable Filter, 304
5.5 Frequency Domain Design, 309
5.6 Implementation, 315
5.7 Comparison of FIR and IIR Filters, 330

7.0 Introduction, 410
7.1 Light, 413
7.2 The Human Visual System, 423
7.4 Image Processing Systems, 437

8.1 Contrast and Dynamic Range Modification, 453
8.5 False Color and Pseudocolor, 511
References, 512
Problems, 515
cessing, an important one-dimensional signal processing application, speech is typically sampled at a 10-kHz rate and we have 10,000 data points to process in a second. However, in video processing, where processing an image frame is an important two-dimensional signal processing application, we may have 30 frames per second, with each frame consisting of 500 x 500 pixels (picture elements). In this case, we would have 7.5 million data points to process per second, which is orders of magnitude greater than the case of speech processing. Due to this difference in data rate requirements, the computational efficiency of a signal processing algorithm plays a much more important role in two-dimensional signal processing, and advances in hardware technology will have a much greater impact on two-dimensional signal processing applications.

Another major difference comes from the fact that the mathematics used for one-dimensional signal processing is often simpler than that used for two-dimensional signal processing. For example, many one-dimensional systems are described by differential equations, while many two-dimensional systems are described by partial differential equations. It is generally much easier to solve differential equations than partial differential equations. Another example is the absence of the fundamental theorem of algebra for two-dimensional polynomials. For one-dimensional polynomials, the fundamental theorem of algebra states that any one-dimensional polynomial can be factored as a product of lower-order polynomials. This difference has a major impact on many results in signal processing. For example, an important structure for realizing a one-dimensional digital filter is the cascade structure. In the cascade structure, the z-transform of the digital filter's impulse response is factored as a product of lower-order polynomials and the realizations of these lower-order factors are cascaded. The z-transform of a two-dimensional digital filter's impulse response cannot, in general, be factored as a product of lower-order polynomials, and the cascade structure therefore is not a general structure for a two-dimensional digital filter realization. Another consequence of the nonfactorability of a two-dimensional polynomial is the difficulty associated with issues related to system stability. In a one-dimensional system, the pole locations can be determined easily, and an unstable system can be stabilized without affecting the magnitude response by simple manipulation of pole locations. In a two-dimensional system, because poles are surfaces rather than points and there is no fundamental theorem of algebra, it is extremely difficult to determine the pole locations. As a result, checking the stability of a two-dimensional system and stabilizing an unstable two-dimensional system without affecting the magnitude response are extremely difficult.

As we have seen, there is considerable similarity and at the same time considerable difference between one-dimensional and two-dimensional signal processing. We will study the results in two-dimensional signal processing that are simple extensions of one-dimensional signal processing. Our discussion will rely heavily on the reader's knowledge of one-dimensional signal processing theories. We will also study, with much greater emphasis, the results in two-dimensional signal processing that are significantly different from those in one-dimensional signal processing. We will study what the differences are, where they come from, and what impacts they have on two-dimensional signal processing applications. Since we will study the similarities and differences of one-dimensional and two-dimensional signal processing and since one-dimensional signal processing is a special case of two-dimensional signal processing, this book will help us understand not only two-dimensional signal processing theories but also one-dimensional signal processing theories at a much deeper level.

An important application of two-dimensional signal processing theories is image processing. Image processing is closely tied to human vision, which is one of the most important means by which humans perceive the outside world. As a result, image processing has a large number of existing and potential applications and will play an increasingly important role in our everyday life.

Digital image processing can be classified broadly into four areas: image enhancement, restoration, coding, and understanding. In image enhancement, images either are processed for human viewers, as in television, or preprocessed to aid machine performance, as in object identification by machine. In image restoration, an image has been degraded in some manner and the objective is to reduce or eliminate the effect of degradation. Typical degradations that occur in practice include image blurring, additive random noise, quantization noise, multiplicative noise, and geometric distortion. The objective in image coding is to represent an image with as few bits as possible, preserving a certain level of image quality and intelligibility acceptable for a given application. Image coding can be used in reducing the bandwidth of a communication channel when an image is transmitted and in reducing the amount of required storage when an image needs to be retrieved at a future time. We study image enhancement, restoration, and coding in the latter part of the book.

The objective of image understanding is to symbolically represent the contents of an image. Applications of image understanding include computer vision and robotics. Image understanding differs from the other three areas in one major respect. In image enhancement, restoration, and coding, both the input and the output are images, and signal processing has been the backbone of many successful systems in these areas. In image understanding, the input is an image, but the output is a symbolic representation of the contents of the image. Successful development of systems in this area involves not only signal processing but also other disciplines such as artificial intelligence. In a typical image understanding system, signal processing is used for such lower-level processing tasks as reduction of degradation and extraction of edges or other image features, and artificial intelligence is used for such higher-level processing tasks as symbol manipulation and knowledge base management. We treat some of the lower-level processing techniques useful in image understanding as part of our general discussion of image enhancement, restoration, and coding. A complete treatment of image understanding is outside the scope of this book.

Two-dimensional signal processing and image processing cover a large number of topics and areas, and a selection of topics was necessary due to space limitation. In addition, there are a variety of ways to present the material. The main objective of this book is to provide fundamentals of two-dimensional signal processing and
TWO-DIMENSIONAL SIGNAL AND IMAGE PROCESSING

1 Signals, Systems, and the Fourier Transform

1.0 INTRODUCTION
Most signals can be classified into three broad groups. One group, which consists
of analog or continuous-space signals, is continuous in both space* and amplitude.
In practice, a majority of signals falls into this group. Examples of analog signals
include image, seismic, radar, and speech signals. Signals in the second group,
discrete-space signals, are discrete in space and continuous in amplitude. A com-
mon way to generate discrete-space signals is by sampling analog signals. Signals
in the third group, digital or discrete signals, are discrete in both space and am-
plitude. One way in which digital signals are created is by amplitude quantization
of discrete-space signals. Discrete-space signals and digital signals are also referred
to as sequences.
Digital systems and computers use only digital signals, which are discrete in
both space and amplitude. The development of signal processing concepts based
on digital signals, however, requires a detailed treatment of amplitude quantization,
which is extremely difficult and tedious. Many useful insights would be lost in
such a treatment because of its mathematical complexity. For this reason, most
digital signal processing concepts have been developed based on discrete-space
signals. Experience shows that theories based on discrete-space signals are often
applicable to digital signals.
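The sample-then-quantize pipeline described above can be illustrated with a minimal sketch; the signal, sampling period, and level count here are invented for the example:

```python
import math

def sample(f, n_points, T):
    """Sample an analog signal f(t) to obtain a discrete-space sequence."""
    return [f(n * T) for n in range(n_points)]

def quantize(x, levels, lo=-1.0, hi=1.0):
    """Uniformly quantize amplitudes to a fixed number of levels,
    producing a digital signal (discrete in both space and amplitude)."""
    step = (hi - lo) / (levels - 1)
    return [lo + step * round((v - lo) / step) for v in x]

analog = lambda t: math.sin(2 * math.pi * 50 * t)       # hypothetical 50-Hz tone
discrete_space = sample(analog, n_points=8, T=1 / 400)  # discrete in space only
digital = quantize(discrete_space, levels=4)            # discrete in amplitude too

print(digital)  # every amplitude now lies on one of the 4 quantization levels
```

The intermediate `discrete_space` list is the kind of signal most of the theory in this chapter is developed for; `digital` is what a computer ultimately stores.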
A system maps an input signal to an output signal. A major element in
studying signal processing is the analysis, design, and implementation of a system
that transforms an input signal to a more desirable output signal for a given ap-
plication. When developing theoretical results about systems, we often impose
*Although we refer to "space," an analog signal can instead have a variable in time,
as in the case of speech processing.
the constraints of linearity and shift invariance. Although these constraints are very restrictive, the theoretical results thus obtained apply in practice at least approximately to many systems. We will discuss signals and systems in Sections 1.1 and 1.2, respectively.

The Fourier transform representation of signals and systems plays a central role in both one-dimensional (1-D) and two-dimensional (2-D) signal processing. In Sections 1.3 and 1.4, the Fourier transform representation including some aspects that are specific to image processing applications is discussed. In Section 1.5, we discuss digital processing of analog signals. Many of the theoretical results, such as the 2-D sampling theorem summarized in that section, can be derived from the Fourier transform results.

Many of the theoretical results discussed in this chapter can be viewed as straightforward extensions of the one-dimensional case. Some, however, are unique to two-dimensional signal processing. Very naturally, we will place considerably more emphasis on these. We will now begin our journey with the discussion of signals.

1.1 SIGNALS

The signals we consider are discrete-space signals. A 2-D discrete-space signal (sequence) will be denoted by a function whose two arguments are integers. For example, x(n1, n2) represents a sequence which is defined for all integer values of n1 and n2. Note that x(n1, n2) for a noninteger n1 or n2 is not zero, but is undefined. The notation x(n1, n2) may refer either to the discrete-space function x or to the value of the function x at a specific (n1, n2). The distinction between these two will be evident from the context.

An example of a 2-D sequence x(n1, n2) is sketched in Figure 1.1. In the figure, the height at (n1, n2) represents the amplitude at (n1, n2). It is often tedious to sketch a 2-D sequence in the three-dimensional (3-D) perspective plot as shown in Figure 1.1. An alternate way to sketch the 2-D sequence in Figure 1.1 is shown in Figure 1.2. In this figure, open circles represent amplitudes of 0 and filled-in circles represent nonzero amplitudes, with the values in parentheses representing the amplitudes. For example, x(3, 0) is 0 and x(1, 1) is 2.

Many sequences we use have amplitudes of 0 or 1 for large regions of (n1, n2). In such instances, the open circles and parentheses will be eliminated for convenience. If there is neither an open circle nor a filled-in circle at a particular (n1, n2), then the sequence has zero amplitude at that point. If there is a filled-in circle with no amplitude specification at a particular (n1, n2), then the sequence has an amplitude of 1 at that point. Figure 1.3 shows the result when this additional simplification is made to the sequence in Figure 1.2.

Figure 1.1 2-D sequence x(n1, n2).

Figure 1.2 Alternate way to sketch the 2-D sequence in Figure 1.1. Open circles represent amplitudes of zero, and filled-in circles represent nonzero amplitudes, with values in parentheses representing the amplitude.

Signals, Systems, and the Fourier Transform Chap. 1 Sec. 1.1 Signals 3

1.1.1 Examples of Sequences

Certain sequences and classes of sequences play a particularly important role in 2-D signal processing. These are impulses, step sequences, exponential sequences, separable sequences, and periodic sequences.

Impulses. The impulse or unit sample sequence, denoted by δ(n1, n2), is defined as

δ(n1, n2) = 1 for n1 = n2 = 0, and 0 otherwise.

The sequence δ(n1, n2), sketched in Figure 1.4, plays a role similar to the impulse δ(n) in 1-D signal processing.

Figure 1.4 Impulse δ(n1, n2).
Any sequence x(n1, n2) can be represented as a linear combination of shifted impulses as follows:

x(n1, n2) = ... + x(-1, -1)δ(n1 + 1, n2 + 1) + x(0, -1)δ(n1, n2 + 1) + ...
          = Σ_{k1=-∞}^{∞} Σ_{k2=-∞}^{∞} x(k1, k2)δ(n1 - k1, n2 - k2)   (1.2)

The representation of x(n1, n2) by (1.2) is very useful in system analysis.

Line impulses constitute a class of impulses which do not have any counterparts in 1-D. An example of a line impulse is the 2-D sequence δT(n1), which is sketched in Figure 1.5 and is defined as

x(n1, n2) = δT(n1) = 1 for n1 = 0, and 0 otherwise.

Other examples include δT(n2) and δT(n1 - n2), which are defined similarly to δT(n1). The subscript T in δT(n1) indicates that δT(n1) is a 2-D sequence. This notation is used to avoid confusion in cases where the 2-D sequence is a function of only one variable. For example, without the subscript T, δT(n1) might be confused with the 1-D impulse δ(n1).

Figure 1.5 Line impulse δT(n1).

Step sequences. The unit step sequence, denoted by u(n1, n2), is defined as

u(n1, n2) = 1 for n1 ≥ 0 and n2 ≥ 0, and 0 otherwise.
The sequence u(n1, n2), which is sketched in Figure 1.6, is related to δ(n1, n2) as

u(n1, n2) = Σ_{k1=-∞}^{n1} Σ_{k2=-∞}^{n2} δ(k1, k2)

Line step sequences are defined similarly to line impulses; for example, uT(n1) is the 2-D sequence that is 1 for n1 ≥ 0 and 0 otherwise. Other examples include uT(n2) and uT(n1 - n2), which are defined similarly to uT(n1).

Figure 1.6 Unit step sequence u(n1, n2).

Separable sequences. A sequence x(n1, n2) is called a separable sequence if it can be expressed as x(n1, n2) = f(n1)g(n2). The impulse δ(n1, n2) is a separable sequence, since δ(n1, n2) = δ(n1)δ(n2), where δ(n1) and δ(n2) are 1-D impulses. The unit step sequence u(n1, n2) is also a separable sequence since u(n1, n2) can be expressed as

u(n1, n2) = u(n1)u(n2)

where u(n1) and u(n2) are 1-D unit step sequences. Another example of a separable sequence is a^n1 b^n2 + b^(n1+n2), which can be written as (a^n1 + b^n1) b^n2.

Separable sequences form a very special class of 2-D sequences. A typical 2-D sequence is not a separable sequence. As an illustration, consider a sequence x(n1, n2) which is zero outside 0 ≤ n1 ≤ N1 - 1 and 0 ≤ n2 ≤ N2 - 1. A general sequence x(n1, n2) of this type has N1N2 degrees of freedom. If x(n1, n2) is a separable sequence, x(n1, n2) is completely specified by some f(n1) which is zero outside 0 ≤ n1 ≤ N1 - 1 and some g(n2) which is zero outside 0 ≤ n2 ≤ N2 - 1, and consequently has only N1 + N2 - 1 degrees of freedom.

Despite the fact that separable sequences constitute a very special class of 2-D sequences, they play an important role in 2-D signal processing. In those cases where the results that apply to 1-D sequences do not extend to general 2-D sequences in a straightforward manner, they often do for separable 2-D sequences.
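Separability can be checked directly: a finite-extent separable sequence, stored as a matrix, is an outer product f(n1)g(n2) and therefore has every 2 x 2 minor equal to zero. A minimal sketch with invented values:

```python
# A separable 2-D sequence is an outer product x(n1, n2) = f(n1) * g(n2).
f = [1, 2, 3]        # hypothetical 1-D sequence, nonzero for 0 <= n1 <= 2
g = [4, 5]           # hypothetical 1-D sequence, nonzero for 0 <= n2 <= 1

x = [[fv * gv for gv in g] for fv in f]   # the separable 2-D sequence

def is_separable(x):
    """Check that every 2 x 2 minor vanishes, i.e. the matrix has rank <= 1."""
    rows, cols = len(x), len(x[0])
    return all(
        x[i][j] * x[k][l] == x[i][l] * x[k][j]
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols)
    )

print(is_separable(x))                 # True
print(is_separable([[1, 0], [0, 1]]))  # False: not expressible as f(n1)g(n2)
```

The 3 x 2 sequence `x` above is pinned down by the 3 + 2 - 1 = 4 independent numbers in `f` and `g` (one overall scale factor can be traded between them), matching the degrees-of-freedom count in the text.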
In addition, the separability of the sequence can be exploited in order to reduce computation in various contexts, such as digital filtering and computation of the discrete Fourier transform. This will be discussed further in later sections.

Periodic sequences. A sequence x(n1, n2) is said to be periodic with a period of N1 x N2 if x(n1, n2) satisfies the following condition:

x(n1, n2) = x(n1 + N1, n2) = x(n1, n2 + N2)  for all (n1, n2)   (1.10)

where N1 and N2 are positive integers. For example, cos(πn1 + (π/2)n2) is a periodic sequence with a period of 2 x 4, since cos(πn1 + (π/2)n2) = cos(π(n1 + 2) + (π/2)n2) = cos(πn1 + (π/2)(n2 + 4)) for all (n1, n2). The sequence cos(n1 + n2) is not periodic, however, since cos(n1 + n2) cannot be expressed as cos((n1 + N1) + n2) = cos(n1 + (n2 + N2)) for all (n1, n2) for any nonzero integers N1 and N2. A periodic sequence is often denoted by adding a "~" (tilde), for example, x̃(n1, n2), to distinguish it from an aperiodic sequence.

Equation (1.10) is not the most general representation of a 2-D periodic sequence. As an illustration, consider the sequence x(n1, n2) shown in Figure 1.8. Even though x(n1, n2) can be considered a periodic sequence with a period of 3 x 2, it cannot be represented as such a sequence by using (1.10). Specifically, x(n1, n2) ≠ x(n1 + 3, n2) for all (n1, n2). It is possible to generalize (1.10) to incorporate cases such as that in Figure 1.8. However, in this text we will use (1.10) to define a periodic sequence, since it is sufficient for our purposes, and sequences such as that in Figure 1.8 can be represented by (1.10) by increasing N1 and/or N2. For example, the sequence in Figure 1.8 is periodic with a period of 6 x 2 using (1.10).

1.1.2 Digital Images

Many examples of sequences used in this book are digital images. A digital image, which can be denoted by x(n1, n2), is typically obtained by sampling an analog image, for instance, an image on film. The amplitude of a digital image is often quantized to 256 levels (which can be represented by eight bits). Each level is commonly denoted by an integer, with 0 corresponding to the darkest level and 255 to the brightest. Each point (n1, n2) is called a pixel or pel (picture element). A digital image x(n1, n2) of 512 x 512 pixels with each pixel represented by eight bits is shown in Figure 1.9. As we reduce the number of amplitude quantization levels, the signal-dependent quantization noise begins to appear as false contours. This is shown in Figure 1.10, where the image in Figure 1.9 is displayed with 64 levels (six bits), 16 levels (four bits), 4 levels (two bits), and 2 levels (one bit) of amplitude quantization. As we reduce the number of pixels in a digital image, the spatial resolution is decreased and the details in the image begin to disappear. This is shown in Figure 1.11, where the image in Figure 1.9 is displayed at a spatial resolution of 256 x 256 pixels, 128 x 128 pixels, 64 x 64 pixels, and 32 x 32 pixels. A digital image of 512 x 512 pixels has a spatial resolution similar to that seen in a television frame. To have a spatial resolution similar to that of an image on 35-mm film, we need a spatial resolution of 1024 x 1024 pixels in the digital image.
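The 2 x 4 period claimed for cos(πn1 + (π/2)n2) can be verified numerically over a finite grid; a small sketch (the grid extent is an arbitrary choice for the check):

```python
import math

def x(n1, n2):
    return math.cos(math.pi * n1 + (math.pi / 2) * n2)

def is_periodic(seq, N1, N2, extent=8):
    """Test x(n1, n2) = x(n1 + N1, n2) = x(n1, n2 + N2) on a finite grid."""
    return all(
        math.isclose(seq(n1, n2), seq(n1 + N1, n2), abs_tol=1e-12)
        and math.isclose(seq(n1, n2), seq(n1, n2 + N2), abs_tol=1e-12)
        for n1 in range(-extent, extent)
        for n2 in range(-extent, extent)
    )

print(is_periodic(x, 2, 4))   # True: period 2 x 4, as in the text
print(is_periodic(x, 1, 4))   # False: 1 is not a period in the n1 direction
```

Such a finite check is only a sanity test, of course; the argument in the text, based on the 2π-periodicity of the cosine, holds for all (n1, n2).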
Figure 1.8 Periodic sequence with a period of 6 x 2.

Figure 1.9 Digital image of 512 x 512 pixels quantized at 8 bits/pixel.

Figure 1.10 Image in Figure 1.9 with amplitude quantization at (a) 6 bits/pixel, (b) 4 bits/pixel, (c) 2 bits/pixel, and (d) 1 bit/pixel.

Figure 1.11 Image in Figure 1.9 with spatial resolution of (a) 256 x 256 pixels, (b) 128 x 128 pixels, (c) 64 x 64 pixels, and (d) 32 x 32 pixels.
Note that the impulse response h(n1, n2), which plays such an important role for an LSI system, loses its significance for a nonlinear or shift-variant system. Note also that an LSI system can be completely characterized by the system response to one of many other input sequences. The choice of δ(n1, n2) as the input in characterizing an LSI system is the simplest, both conceptually and in practice.
1.2.2 Convolution

The convolution operator in (1.18) has a number of properties that are straightforward extensions of 1-D results. Some of the more important are listed below.

Commutativity:
x(n1, n2) * y(n1, n2) = y(n1, n2) * x(n1, n2)   (1.19)

Associativity:
(x(n1, n2) * y(n1, n2)) * z(n1, n2) = x(n1, n2) * (y(n1, n2) * z(n1, n2))   (1.20)

Distributivity:
x(n1, n2) * (y(n1, n2) + z(n1, n2)) = x(n1, n2) * y(n1, n2) + x(n1, n2) * z(n1, n2)   (1.21)

Convolution with a shifted impulse:
x(n1, n2) * δ(n1 - m1, n2 - m2) = x(n1 - m1, n2 - m2)   (1.22)

The commutativity property states that the output of an LSI system is not affected when the input and the impulse response interchange roles. The associativity property states that a cascade of two LSI systems with impulse responses h1(n1, n2) and h2(n1, n2) has the same input-output relationship as one LSI system with impulse response h1(n1, n2) * h2(n1, n2). The distributivity property states that a parallel combination of two LSI systems with impulse responses h1(n1, n2) and h2(n1, n2) has the same input-output relationship as one LSI system with impulse response given by h1(n1, n2) + h2(n1, n2). In a special case of (1.22), when m1 = m2 = 0, we see that the impulse response of an identity system is δ(n1, n2).

The convolution of two sequences x(n1, n2) and h(n1, n2) can be obtained by explicitly evaluating (1.18). It is often simpler and more instructive, however, to evaluate (1.18) graphically. Specifically, the convolution sum in (1.18) can be interpreted as multiplying two sequences x(k1, k2) and h(n1 - k1, n2 - k2), which are functions of the variables k1 and k2, and summing the product over all integer values of k1 and k2. The output, which is a function of n1 and n2, is the result of convolving x(n1, n2) and h(n1, n2). To illustrate, consider the two sequences x(n1, n2) and h(n1, n2), shown in Figures 1.13(a) and (b). From x(n1, n2) and h(n1, n2), x(k1, k2) and h(n1 - k1, n2 - k2) as functions of k1 and k2 can be obtained, as shown in Figures 1.13(c)-(f). Note that g(k1 - n1, k2 - n2) is g(k1, k2) shifted in the positive k1 and k2 directions by n1 and n2 points, respectively. Figures 1.13(d)-(f) show how to obtain h(n1 - k1, n2 - k2) as a function of k1 and k2 from h(n1, n2) in three steps. It is useful to remember how to obtain h(n1 - k1, n2 - k2) directly from h(n1, n2). One simple way is to first change the variables n1 and n2 to k1 and k2, flip the sequence with respect to the origin, and then shift the result in the positive k1 and k2 directions by n1 and n2 points, respectively. Once x(k1, k2) and h(n1 - k1, n2 - k2) are obtained, they can be multiplied and summed over k1 and k2 to produce the output at each different value of (n1, n2). The result is shown in Figure 1.13(g).

Figure 1.13 Example of convolving two sequences.
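The graphical procedure computes the same result as a direct evaluation of the convolution sum; a minimal sketch in plain Python, with small invented input values:

```python
def conv2d(x, h):
    """Direct evaluation of y(n1, n2) = sum_k1 sum_k2 x(k1, k2) h(n1-k1, n2-k2)
    for finite-extent sequences stored as nested lists (support starting at 0)."""
    Nx, Mx = len(x), len(x[0])
    Nh, Mh = len(h), len(h[0])
    y = [[0] * (Mx + Mh - 1) for _ in range(Nx + Nh - 1)]
    for n1 in range(Nx + Nh - 1):
        for n2 in range(Mx + Mh - 1):
            for k1 in range(Nx):
                for k2 in range(Mx):
                    if 0 <= n1 - k1 < Nh and 0 <= n2 - k2 < Mh:
                        y[n1][n2] += x[k1][k2] * h[n1 - k1][n2 - k2]
    return y

x = [[1, 2], [3, 4]]       # hypothetical 2 x 2 input
h = [[1, 1]]               # hypothetical 1 x 2 impulse response
print(conv2d(x, h))        # [[1, 3, 2], [3, 7, 4]]
```

Note that the output extent is (Nx + Nh - 1) x (Mx + Mh - 1), consistent with the region-of-support discussion that follows.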
An LSI system is said to be separable if its impulse response h(n1, n2) is a separable sequence. For a separable system, it is possible to reduce the number of arithmetic operations required to compute the convolution sum. For large amounts of data, as typically found in images, the computational reduction can be considerable. To illustrate this, consider an input sequence x(n1, n2) of N x N points and an impulse response h(n1, n2) of M x M points:

x(n1, n2) = 0 outside 0 ≤ n1 ≤ N - 1, 0 ≤ n2 ≤ N - 1
h(n1, n2) = 0 outside 0 ≤ n1 ≤ M - 1, 0 ≤ n2 ≤ M - 1

where N >> M in typical cases. The regions of (n1, n2) where x(n1, n2) and h(n1, n2) can have nonzero amplitudes are shown in Figures 1.14(a) and (b). The output of the system, y(n1, n2), can be expressed as

y(n1, n2) = Σ_{k1=-∞}^{∞} Σ_{k2=-∞}^{∞} x(k1, k2)h(n1 - k1, n2 - k2)   (1.24)

The region of (n1, n2) where y(n1, n2) has nonzero amplitude is shown in Figure 1.14(c). If (1.24) is used directly to compute y(n1, n2), approximately (N + M - 1)²M² arithmetic operations (one arithmetic operation = one multiplication and one addition) are required, since the number of nonzero output points is (N + M - 1)² and computing each output point requires approximately M² arithmetic operations. If h(n1, n2) is a separable sequence, it can be expressed as

h(n1, n2) = h1(n1)h2(n2)   (1.25)

with h1(n1) = 0 outside 0 ≤ n1 ≤ M - 1 and h2(n2) = 0 outside 0 ≤ n2 ≤ M - 1. Combining (1.24) and (1.25), we have

y(n1, n2) = Σ_{k1=-∞}^{∞} h1(n1 - k1) Σ_{k2=-∞}^{∞} x(k1, k2)h2(n2 - k2)   (1.26)

For a fixed k1, Σ_{k2=-∞}^{∞} x(k1, k2)h2(n2 - k2) in (1.26) is a 1-D convolution of x(k1, n2) and h2(n2). For example, using the notation

f(k1, n2) = Σ_{k2=-∞}^{∞} x(k1, k2)h2(n2 - k2),   (1.27)

f(0, n2) is the result of 1-D convolution of x(0, n2) with h2(n2), as shown in Figure 1.15. Since there are N different values of k1 for which x(k1, k2) is nonzero, computing f(k1, n2) requires N 1-D convolutions and therefore requires approximately NM(N + M - 1) arithmetic operations. Once f(k1, n2) is computed, y(n1, n2) can be computed from (1.26) and (1.27) by

y(n1, n2) = Σ_{k1=-∞}^{∞} f(k1, n2)h1(n1 - k1)   (1.28)

From (1.28), for a fixed n2, y(n1, n2) is a 1-D convolution of h1(n1) and f(n1, n2). For example, y(n1, 1) is the result of a 1-D convolution of f(n1, 1) and h1(n1), as shown in Figure 1.15, where f(n1, n2) is obtained from f(k1, n2) by a simple change of variables.

Figure 1.14 Regions of (n1, n2) where x(n1, n2), h(n1, n2), and y(n1, n2) = x(n1, n2) * h(n1, n2) can have nonzero amplitude.

Figure 1.15 Convolution of x(n1, n2) with a separable sequence h(n1, n2).
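The row-column decomposition of (1.26)-(1.28) can be sketched as follows; the helper `conv1d` and the sample values are invented for the illustration:

```python
def conv1d(a, b):
    """Direct 1-D convolution of two finite-length lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def conv2d_separable(x, h1, h2):
    """Convolve x with the separable h(n1, n2) = h1(n1) h2(n2):
    first each row of x with h2 (eq. 1.27), then each column with h1 (eq. 1.28)."""
    f = [conv1d(row, h2) for row in x]        # N 1-D convolutions along n2
    cols = list(zip(*f))                      # then work column by column
    y_cols = [conv1d(list(c), h1) for c in cols]
    return [list(r) for r in zip(*y_cols)]

x = [[1, 2], [3, 4]]          # hypothetical 2 x 2 input
h1, h2 = [1, -1], [1, 1]      # so h(n1, n2) = h1(n1) h2(n2)
print(conv2d_separable(x, h1, h2))   # [[1, 3, 2], [2, 4, 2], [-3, -7, -4]]
```

Per the operation counts in the text, the two 1-D passes cost roughly NM(N + M - 1) operations each, versus (N + M - 1)²M² for the direct 2-D sum, which is where the saving for large images comes from.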
The notion that a wedge support sequence can always be transformed to a first-quadrant support sequence by a simple linear mapping of variables without affecting its stability is very useful in studying the stability of a 2-D system. As we will discuss in Chapter 2, our primary concern in testing the stability of a 2-D system will be limited to a class of systems known as recursively computable systems. To test the stability of a recursively computable system, we need to test the stability of a wedge support sequence h'(n1, n2). To accomplish this, we will transform h'(n1, n2) to a first-quadrant support sequence h''(n1, n2) by an appropriate linear mapping of variables and then check the stability of h''(n1, n2). This approach exploits the fact that it is much easier to develop stability theorems for first-quadrant support sequences than for wedge support sequences. This will be discussed further in Section 2.3.
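A linear mapping of variables of this kind is easy to illustrate on index sets; the particular wedge and mapping below are invented for the example and are not the specific ones used in Chapter 2:

```python
# Hypothetical wedge support: points with n1 >= 0 and -n1 <= n2 <= n1.
wedge = {(n1, n2) for n1 in range(4) for n2 in range(-n1, n1 + 1)}

# An assumed linear mapping of variables: m1 = n1 + n2, m2 = n1 - n2.
# It is one-to-one on integer points, so amplitudes carry over unchanged:
# h''(m1, m2) = h'(n1, n2).
mapped = {(n1 + n2, n1 - n2) for (n1, n2) in wedge}

print(all(m1 >= 0 and m2 >= 0 for (m1, m2) in mapped))  # True: first-quadrant support
```

Because the mapping merely relabels the sample positions one-to-one, the set of amplitude values (and hence absolute summability, i.e. stability) is unchanged, which is the fact the text relies on.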
X(ω1, ω2) = Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} x(n1, n2) e^{-jω1n1} e^{-jω2n2}   (1.31a)

x(n1, n2) = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} X(ω1, ω2) e^{jω1n1} e^{jω2n2} dω1 dω2   (1.31b)

Equation (1.31a) shows how the amplitude X(ω1, ω2) associated with the exponential e^{jω1n1}e^{jω2n2} can be determined from x(n1, n2). The function X(ω1, ω2) is called the discrete-space Fourier transform, or Fourier transform for short, of x(n1, n2). Equation (1.31b) shows how complex exponentials X(ω1, ω2)e^{jω1n1}e^{jω2n2} are specifically combined to form x(n1, n2). The sequence x(n1, n2) is called the inverse discrete-space Fourier transform or inverse Fourier transform of X(ω1, ω2). The consistency of (1.31a) and (1.31b) can be easily shown by combining them.

From (1.31), it can be seen that X(ω1, ω2) is in general complex, even though x(n1, n2) may be real. It is often convenient to express X(ω1, ω2) in terms of its magnitude |X(ω1, ω2)| and phase θx(ω1, ω2) or in terms of its real part XR(ω1, ω2) and imaginary part XI(ω1, ω2) as

X(ω1, ω2) = |X(ω1, ω2)| e^{jθx(ω1, ω2)} = XR(ω1, ω2) + jXI(ω1, ω2)

From (1.31), it can also be seen that X(ω1, ω2) is a function of continuous variables ω1 and ω2, although x(n1, n2) is a function of discrete variables n1 and n2. When the Fourier transform of x(n1, n2) converges uniformly, X(ω1, ω2) is an analytic function and is infinitely differentiable with respect to ω1 and ω2.

A sequence x(n1, n2) is said to be an eigenfunction of a system T if T[x(n1, n2)] = kx(n1, n2) for some scalar k. Suppose we use a complex exponential e^{jω1n1}e^{jω2n2} as an input x(n1, n2) to an LSI system with impulse response h(n1, n2). The output of the system y(n1, n2) can be obtained as

y(n1, n2) = Σ_{k1=-∞}^{∞} Σ_{k2=-∞}^{∞} h(k1, k2) e^{jω1(n1-k1)} e^{jω2(n2-k2)} = e^{jω1n1} e^{jω2n2} H(ω1, ω2)   (1.34)
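For a finite-extent sequence, the forward transform (1.31a) can be evaluated numerically at any (ω1, ω2); a small sketch with invented sample values, assuming the support starts at (0, 0):

```python
import cmath
import math

def dsft(x, w1, w2):
    """Evaluate X(w1, w2) = sum_n1 sum_n2 x(n1, n2) e^{-j w1 n1} e^{-j w2 n2}
    for a finite-extent sequence stored as a nested list."""
    return sum(
        x[n1][n2] * cmath.exp(-1j * (w1 * n1 + w2 * n2))
        for n1 in range(len(x))
        for n2 in range(len(x[0]))
    )

x = [[1, 2], [3, 4]]             # hypothetical finite-extent sequence
print(dsft(x, 0.0, 0.0))         # X(0, 0) = sum of all samples = (10+0j)
print(abs(dsft(x, math.pi, math.pi)))  # |X(pi, pi)| = |1 - 2 - 3 + 4|, near zero
```

The first value illustrates the DC-value theorem (Property 7 in Table 1.1): X(0, 0) is simply the sum of all the sample amplitudes.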
From (1.34), e^{jω1n1}e^{jω2n2} is an eigenfunction of any LSI system for which H(ω1, ω2) is well defined, and H(ω1, ω2) is the Fourier transform of h(n1, n2). The function H(ω1, ω2) is called the frequency response of the LSI system. The fact that e^{jω1n1}e^{jω2n2} is an eigenfunction of an LSI system and that H(ω1, ω2) is the scaling factor by which e^{jω1n1}e^{jω2n2} is multiplied when it is an input to the LSI system simplifies system analysis for a sinusoidal input. For example, the output of an LSI system with frequency response H(ω1, ω2) when the input is cos(ω1'n1 + ω2'n2) can be obtained by writing the cosine as (1/2)e^{jω1'n1}e^{jω2'n2} + (1/2)e^{-jω1'n1}e^{-jω2'n2} and using linearity and the eigenfunction property:

y(n1, n2) = (1/2)H(ω1', ω2')e^{j(ω1'n1 + ω2'n2)} + (1/2)H(-ω1', -ω2')e^{-j(ω1'n1 + ω2'n2)}

1.3.2 Properties

We can derive a number of useful properties from the Fourier transform pair in (1.31). Some of the more important properties, often useful in practice, are listed in Table 1.1. Most are essentially straightforward extensions of 1-D Fourier transform properties. The only exception is Property 4, which applies to separable sequences. If a 2-D sequence x(n1, n2) can be written as x1(n1)x2(n2), then its Fourier transform, X(ω1, ω2), is given by X1(ω1)X2(ω2), where X1(ω1) and X2(ω2) represent the 1-D Fourier transforms of x1(n1) and x2(n2), respectively. This property follows directly from the Fourier transform pair of (1.31). Note that this property is quite different from Property 3, the multiplication property. In the multiplication property, both x(n1, n2) and y(n1, n2) are 2-D sequences. In Property 4, x1(n1) and x2(n2) are 1-D sequences, and their product x1(n1)x2(n2) forms a 2-D sequence.

TABLE 1.1 PROPERTIES OF THE FOURIER TRANSFORM

x(n1, n2) ↔ X(ω1, ω2), y(n1, n2) ↔ Y(ω1, ω2)

Property 1. Linearity
ax(n1, n2) + by(n1, n2) ↔ aX(ω1, ω2) + bY(ω1, ω2)

Property 2. Convolution
x(n1, n2) * y(n1, n2) ↔ X(ω1, ω2)Y(ω1, ω2)

Property 3. Multiplication
x(n1, n2)y(n1, n2) ↔ X(ω1, ω2) ⊛ Y(ω1, ω2) (periodic convolution)

Property 4. Separable Sequence
x1(n1)x2(n2) ↔ X1(ω1)X2(ω2)

Property 5. Shift of a Sequence and a Fourier Transform
(a) x(n1 - m1, n2 - m2) ↔ X(ω1, ω2)e^{-jω1m1}e^{-jω2m2}
(b) e^{jν1n1}e^{jν2n2}x(n1, n2) ↔ X(ω1 - ν1, ω2 - ν2)

Property 6. Differentiation
(a) -jn1x(n1, n2) ↔ ∂X(ω1, ω2)/∂ω1
(b) -jn2x(n1, n2) ↔ ∂X(ω1, ω2)/∂ω2

Property 7. Initial Value and DC Value Theorem
(a) x(0, 0) = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} X(ω1, ω2) dω1 dω2
(b) X(0, 0) = Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} x(n1, n2)

Property 8. Parseval's Theorem
Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} |x(n1, n2)|² = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} |X(ω1, ω2)|² dω1 dω2
Property 9. Symmetry Properties (for real x(n1, n2))
XR(ω1, ω2), |X(ω1, ω2)|: even (symmetric with respect to the origin)
XI(ω1, ω2), θx(ω1, ω2): odd (antisymmetric with respect to the origin)
(f) x(n1, n2): real and even ↔ X(ω1, ω2): real and even
(g) x(n1, n2): real and odd ↔ X(ω1, ω2): pure imaginary and odd

Property 10. Uniform Convergence
For a stable x(n1, n2), the Fourier transform of x(n1, n2) uniformly converges.

1.3.3 Examples

Example 1
We wish to determine H(ω1, ω2) for the sequence h(n1, n2) shown in Figure 1.19(a). From (1.31), H(ω1, ω2) can be computed directly. The function H(ω1, ω2) for this example is real, and its magnitude is sketched in Figure 1.19(b). If H(ω1, ω2) in Figure 1.19(b) is the frequency response of an LSI system, the system corresponds to a lowpass filter. The function |H(ω1, ω2)| shows smaller values in frequency regions away from the origin. A lowpass filter applied to an image blurs the image. The function H(ω1, ω2) is 1 at ω1 = ω2 = 0, and therefore the average intensity of an image is not affected by the filter.

Figure 1.19 (a) 2-D sequence h(n1, n2); (b) Fourier transform magnitude |H(ω1, ω2)| of h(n1, n2) in (a).

Figure 1.20 (a) Image of 256 x 256 pixels; (b) image processed by filtering the image in (a) with a lowpass filter whose impulse response is given by h(n1, n2) in Figure 1.19(a).

Example 2
We wish to determine H(ω1, ω2) for the sequence h(n1, n2) shown in Figure 1.21(a). We can use (1.31) to determine H(ω1, ω2), as in Example 1. Alternatively, we can use Property 4 in Table 1.1. The sequence h(n1, n2) can be expressed as h1(n1)h2(n2), where one possible choice of h1(n1) and h2(n2) is shown in Figure 1.21(b). Computing the 1-D Fourier transforms H1(ω1) and H2(ω2) and using Property 4 in Table 1.1, we have H(ω1, ω2) = H1(ω1)H2(ω2). The function H(ω1, ω2) is again real, and its magnitude is sketched in Figure 1.21(c). A system whose frequency response is given by the H(ω1, ω2) above is a highpass filter. The function |H(ω1, ω2)| has smaller values in frequency regions near the origin. A highpass filter applied to an image tends to accentuate image details or local contrast, and the processed image appears sharper. Figure 1.22(a) shows an original image of 256 x 256 pixels and Figure 1.22(b) shows the highpass filtered image using h(n1, n2) in this example. When an image is processed, for instance by highpass filtering, the pixel intensities may no longer be integers between 0 and 255. They may be negative, noninteger, or above 255. In such instances, we typically add a bias and then scale and quantize the processed image so that all the pixel intensities
remain bright and a dark image will remain dark after processing with the filter.
are integers between 0 and 255. It is common practice to choose the bias and scaling
Figure 1.20(a) shows an image of 256 x 256 pixels. Figure 1.20(b) shows the image
factors such that the minimum intensity is mapped to 0 and the maximum intensity
obtained by processing the image in Figure 1.20(a) with a lowpass filter whose impulse is mapped to 255.
response is given by h(n,, n,) in this example.
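Property 4 can be checked numerically. The following sketch (in Python with NumPy; the sequences, sizes, and frequency point are arbitrary choices of ours) evaluates X(ω₁, ω₂) by direct summation of (1.31) for a separable sequence and compares it with the product of the two 1-D transforms.

```python
import numpy as np

def dtft1(x, w):
    # 1-D Fourier transform X(w) = sum_n x(n) e^{-j w n}
    n = np.arange(len(x))
    return np.sum(x * np.exp(-1j * w * n))

def dtft2(x, w1, w2):
    # 2-D Fourier transform of (1.31), by direct summation
    n1 = np.arange(x.shape[0])[:, None]
    n2 = np.arange(x.shape[1])[None, :]
    return np.sum(x * np.exp(-1j * (w1 * n1 + w2 * n2)))

rng = np.random.default_rng(0)
x1 = rng.standard_normal(5)            # 1-D sequence x1(n1)
x2 = rng.standard_normal(7)            # 1-D sequence x2(n2)
x = np.outer(x1, x2)                   # separable sequence x(n1, n2) = x1(n1) x2(n2)

w1, w2 = 0.3, -1.1                     # an arbitrary frequency point
lhs = dtft2(x, w1, w2)                 # X(w1, w2)
rhs = dtft1(x1, w1) * dtft1(x2, w2)    # X1(w1) X2(w2)
print(np.allclose(lhs, rhs))           # True: Property 4
```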
Example 3
We wish to determine h(n₁, n₂) for the Fourier transform H(ω₁, ω₂) shown in Figure 1.23. Since H(ω₁, ω₂) is always periodic with a period of 2π along each of the two variables ω₁ and ω₂, H(ω₁, ω₂) is shown only for |ω₁| ≤ π and |ω₂| ≤ π. The function H(ω₁, ω₂) can be expressed as H₁(ω₁)H₂(ω₂), where one possible choice of H₁(ω₁) and H₂(ω₂) is also shown in Figure 1.23. When the H(ω₁, ω₂) above is the frequency response of a 2-D LSI system, the system is called a separable ideal lowpass filter. Computing the 1-D inverse Fourier transforms of H₁(ω₁) and H₂(ω₂) and using Property 4 in Table 1.1, we obtain

    h(n₁, n₂) = h₁(n₁)h₂(n₂) = (sin an₁)/(πn₁) · (sin bn₂)/(πn₂)
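The separable ideal lowpass impulse response can be evaluated numerically; sin(an₁)/(πn₁) takes the limiting value a/π at n₁ = 0. A brief sketch (the cutoffs a and b are arbitrary example values of ours; note that np.sinc(u) denotes sin(πu)/(πu)):

```python
import numpy as np

# Separable ideal lowpass impulse response h(n1, n2) = h1(n1) h2(n2) with
# h1(n1) = sin(a n1)/(pi n1) and h2(n2) = sin(b n2)/(pi n2).
a, b = 0.4 * np.pi, 0.6 * np.pi
n = np.arange(-20, 21)                      # a finite window of sample indices

# np.sinc(u) = sin(pi u)/(pi u), so sin(a n)/(pi n) = (a/pi) * sinc(a n / pi);
# at n = 0 this takes the limiting value a/pi.
h1 = (a / np.pi) * np.sinc(a * n / np.pi)
h2 = (b / np.pi) * np.sinc(b * n / np.pi)
h = np.outer(h1, h2)                        # h(n1, n2) = h1(n1) h2(n2)

print(h[20, 20])                            # value at the origin: a*b/pi^2 = 0.24
```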
Example 4
We wish to determine h(n₁, n₂) for the Fourier transform H(ω₁, ω₂) shown in Figure 1.24. The function H(ω₁, ω₂) is given by 1 for ω₁² + ω₂² ≤ ω_c² and 0 otherwise in the region |ω₁| ≤ π, |ω₂| ≤ π. Carrying out the inverse Fourier transform gives

    h(n₁, n₂) = (ω_c / 2π) · J₁(ω_c √(n₁² + n₂²)) / √(n₁² + n₂²)    (1.36)

where J₁(·) represents the Bessel function of the first kind and the first order and can be expanded in series form as

    J₁(x) = Σ_{k=0}^{∞} [(−1)^k / (k!(k + 1)!)] (x/2)^{2k+1}

This example shows that 2-D Fourier transform or inverse Fourier transform operations can become much more algebraically complex than 1-D Fourier transform or inverse Fourier transform operations, despite the fact that the 2-D Fourier transform pair and many 2-D Fourier transform properties are straightforward extensions of 1-D results. From (1.36), we observe that the impulse response of a 2-D circularly symmetric ideal lowpass filter is also circularly symmetric; that is, it is a function of n₁² + n₂². This is a special case of a more general result. Specifically, if H(ω₁, ω₂) is a function of ω₁² + ω₂² in the region √(ω₁² + ω₂²) ≤ a and is a constant outside the region, then the corresponding h(n₁, n₂) is a function of n₁² + n₂². Note, however, that circular symmetry of h(n₁, n₂) does not imply circular symmetry of H(ω₁, ω₂). The function J₁(x)/x is sketched in Figure 1.25. The sequence h(n₁, n₂) in (1.36) is sketched in Figure 1.26 for the case ω_c = 0.4π.

Figure 1.23 Separable Fourier transform H(ω₁, ω₂) and one possible choice of H₁(ω₁) and H₂(ω₂) such that H(ω₁, ω₂) = H₁(ω₁)H₂(ω₂). The function H(ω₁, ω₂) is 1 in the shaded region and 0 in the unshaded region.

Figure 1.25 Sketch of J₁(x)/x, where J₁(x) is the Bessel function of the first kind and first order.
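Equation (1.36) is easy to evaluate numerically from the series expansion of J₁. The sketch below (a small illustration of ours; the truncation to 30 series terms is an arbitrary choice) reproduces the value h(0, 0) = ω_c²/4π ≈ 0.126 quoted for Figure 1.26 and confirms the circular symmetry of h(n₁, n₂):

```python
import math

def J1(x, terms=30):
    # Series expansion of the Bessel function of the first kind, first order:
    #   J1(x) = sum_{k>=0} (-1)^k / (k! (k+1)!) (x/2)^{2k+1}
    return sum((-1)**k / (math.factorial(k) * math.factorial(k + 1))
               * (x / 2)**(2 * k + 1) for k in range(terms))

def h(n1, n2, wc=0.4 * math.pi):
    # Eq. (1.36): h(n1, n2) = (wc / 2 pi) J1(wc r) / r, with r = sqrt(n1^2 + n2^2);
    # at the origin the limiting value is wc^2 / (4 pi), since J1(x)/x -> 1/2.
    r = math.hypot(n1, n2)
    if r == 0.0:
        return wc * wc / (4 * math.pi)
    return wc / (2 * math.pi) * J1(wc * r) / r

print(round(h(0, 0), 3))   # 0.126, the value quoted for Figure 1.26
print(h(3, 4) == h(5, 0))  # circularly symmetric: both points have r = 5
```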
The impulse responses h(n₁, n₂) obtained from the separable and circularly symmetric ideal lowpass filters in Examples 3 and 4 above are not absolutely summable, and their Fourier transforms do not converge uniformly to the H(ω₁, ω₂) used to obtain h(n₁, n₂). This is evident from the observation that the two H(ω₁, ω₂) contain discontinuities and are not analytic functions. Nevertheless, we will regard them as valid Fourier transform pairs, since they play an important role in digital filtering and the Fourier transforms of the two h(n₁, n₂) converge to H(ω₁, ω₂) in the mean square sense.

Figure 1.26 Impulse response of a circularly symmetric ideal lowpass filter with ω_c = 0.4π in Equation (1.36). The value at the origin, h(0, 0), is 0.126.

1.4 ADDITIONAL PROPERTIES OF THE FOURIER TRANSFORM

1.4.1 Signal Synthesis and Reconstruction from Phase or Magnitude
    x_m(n₁, n₂) = F⁻¹[|X(ω₁, ω₂)|e^{j0}]    (1.40)

Figure 1.27 Example of phase-only and magnitude-only synthesis. (a) Original image of 128 × 128 pixels; (b) result of phase-only synthesis; (c) result of magnitude-only synthesis.

An experiment which more dramatically illustrates the observation that phase-only signal synthesis captures more of the signal intelligibility than magnitude-only synthesis can be performed as follows. Consider two images x(n₁, n₂) and y(n₁, n₂). From these two images, we synthesize two other images f(n₁, n₂) and g(n₁, n₂) by

    f(n₁, n₂) = F⁻¹[|Y(ω₁, ω₂)|e^{jθ_x(ω₁, ω₂)}]    (1.41)
    g(n₁, n₂) = F⁻¹[|X(ω₁, ω₂)|e^{jθ_y(ω₁, ω₂)}]    (1.42)

In this experiment, f(n₁, n₂) captures the intelligibility of x(n₁, n₂), while g(n₁, n₂) captures the intelligibility of y(n₁, n₂). An example is shown in Figure 1.28. Figures 1.28(a) and (b) show the two images x(n₁, n₂) and y(n₁, n₂), and Figures 1.28(c) and (d) show the two images f(n₁, n₂) and g(n₁, n₂).
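With the DFT standing in for the Fourier transform, the synthesis of (1.41) and (1.42) takes only a few lines. In this sketch of ours, small random arrays stand in for the images x(n₁, n₂) and y(n₁, n₂):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((8, 8))    # stand-in for the image x(n1, n2)
y = rng.random((8, 8))    # stand-in for the image y(n1, n2)

X, Y = np.fft.fft2(x), np.fft.fft2(y)

# Eq. (1.41)-(1.42) with the DFT in place of the Fourier transform:
# f takes its phase from x and its magnitude from y; g is the reverse.
f = np.real(np.fft.ifft2(np.abs(Y) * np.exp(1j * np.angle(X))))
g = np.real(np.fft.ifft2(np.abs(X) * np.exp(1j * np.angle(Y))))

# f carries the Fourier transform magnitude of y (and the phase of x):
print(np.allclose(np.abs(np.fft.fft2(f)), np.abs(Y)))
print(np.allclose(np.abs(np.fft.fft2(g)), np.abs(X)))
```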
The high intelligibility of phase-only synthesis raises the possibility of exactly reconstructing a signal x(n₁, n₂) from its Fourier transform phase θ_x(ω₁, ω₂). This is known as the magnitude-retrieval problem. In fact, it has been shown [Hayes] that a sequence x(n₁, n₂) is uniquely specified within a scale factor if x(n₁, n₂) is real and has finite extent, and if its Fourier transform cannot be factored as a product of lower-order polynomials in e^{jω₁} and e^{jω₂}. Typical images x(n₁, n₂) are real and have finite regions of support. In addition, the fundamental theorem of algebra does not apply to 2-D polynomials, and their Fourier transforms cannot generally be factored as products of lower-order polynomials in e^{jω₁} and e^{jω₂}. Typical images, then, are uniquely specified within a scale factor by the Fourier transform phase alone.

Two approaches to reconstructing a sequence from its Fourier transform phase alone have been considered. The first approach leads to a closed-form solution and the second to an iterative procedure. In the first approach, tan θ_x(ω₁, ω₂) is expressed as

    tan θ_x(ω₁, ω₂) = X_I(ω₁, ω₂) / X_R(ω₁, ω₂)
                    = −[Σ_{(n₁,n₂)∈R_x} x(n₁, n₂) sin(ω₁n₁ + ω₂n₂)] / [Σ_{(n₁,n₂)∈R_x} x(n₁, n₂) cos(ω₁n₁ + ω₂n₂)]    (1.43)
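The two sums in (1.43) can be formed directly from a finite-extent sequence. The sketch below (a small random sequence and an arbitrary frequency point, both our own choices) checks them against tan θ_x computed from the Fourier transform itself:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 5))        # real finite-extent sequence; support R_x
n1 = np.arange(x.shape[0])[:, None]
n2 = np.arange(x.shape[1])[None, :]
w1, w2 = 0.7, 1.9                      # an arbitrary frequency point

# The two sums of (1.43) over the support R_x:
num = -np.sum(x * np.sin(w1 * n1 + w2 * n2))   # X_I(w1, w2)
den = np.sum(x * np.cos(w1 * n1 + w2 * n2))    # X_R(w1, w2)

# Compare with tan(theta_x) obtained from the Fourier transform itself:
X = np.sum(x * np.exp(-1j * (w1 * n1 + w2 * n2)))
print(np.isclose(num / den, np.tan(np.angle(X))))
```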
Figure 1.30 Iterative procedure for reconstructing x(n₁, n₂) from its Fourier transform phase. (Flowchart: an initial guess of x(n₁, n₂) is updated by alternately imposing y(n₁, n₂) = 0 outside the region of support of x(n₁, n₂) in the space domain and the given phase on Y(ω₁, ω₂) in the frequency domain.)
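The procedure of Figure 1.30 amounts to alternating between imposing the known phase in the frequency domain and the known region of support (and realness) in the space domain. A minimal sketch, with the DFT standing in for the Fourier transform and with an arbitrary test signal, support size, and iteration count of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 16, 4                              # DFT grid size; (square) region of support
x = np.zeros((N, N))
x[:M, :M] = rng.standard_normal((M, M))   # the "unknown" real finite-support signal
theta = np.angle(np.fft.fft2(x))          # its known Fourier transform phase

def corr(a, b):
    # normalized correlation, used only to monitor progress
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

y = np.zeros((N, N))
y[:M, :M] = 1.0                           # initial guess inside the known support
c0 = corr(y, x)

for _ in range(300):
    Y = np.abs(np.fft.fft2(y)) * np.exp(1j * theta)   # impose the known phase
    y = np.real(np.fft.ifft2(Y))                      # impose realness
    y[M:, :] = 0.0                                    # impose the region of support
    y[:, M:] = 0.0

c1 = corr(y, x)
print(c1 > c0)
```

Consistent with the discussion in the text, the correlation between the iterate and the true signal improves over the iterations; exact recovery (within a scale factor) generally requires many iterations.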
A signal can also be reconstructed from its Fourier transform magnitude |X(ω₁, ω₂)| by an iterative procedure similar to that in Figure 1.30, which was developed for the phase-only reconstruction. The only modification required is to replace the Fourier transform magnitude with the given |X(ω₁, ω₂)| rather than to replace the Fourier transform phase with the given θ_x(ω₁, ω₂) when the frequency domain constraints are imposed. The algorithm has been observed to converge to the desired solution when the initial estimate used is quite accurate or the signal x(n₁, n₂) has a special characteristic such as a triangular region of support. The magnitude-only reconstruction problem specifies x(n₁, n₂) within a sign factor, translation, and rotation by 180°, and, therefore, more than one solution is possible. Imposing an initial estimate sufficiently close to a possible solution or imposing additional constraints such as a triangular region of support appears to prevent the iterative procedure from wandering from one possible solution to another. In general, however, the algorithm does not converge to the desired solution. Figure 1.32 shows an example of signal reconstruction from the magnitude using a closed-form algorithm [Izraelevitz and Lim]. Figures 1.32(a) and (b) show the original and the reconstruction, respectively. Developing a practical procedure that can be used to reconstruct x(n₁, n₂) from |X(ω₁, ω₂)| remains a problem for further research.

In addition to the phase-only and magnitude-only signal synthesis and reconstruction problems discussed above, a variety of results on the synthesis and reconstruction of a signal from other partial Fourier transform information, for instance one bit of Fourier transform phase or signed Fourier transform magnitude, have been reported [Oppenheim et al. (1983)].

Figure 1.32 Example of magnitude-only reconstruction by a closed-form algorithm. (a) Original image of 24 × 24 pixels; (b) result of magnitude-only reconstruction of the image in (a) using a closed-form algorithm. After [Izraelevitz and Lim].

The Fourier transforms of typical images have been observed to have most of their energy concentrated in a small region in the frequency domain, near the origin and along the ω₁ and ω₂ axes. One reason for the energy concentration near the origin is that images typically have large regions where the intensities change slowly.
Furthermore, sharp discontinuities such as edges contribute to low-frequency as well as high-frequency components. The energy concentration along the ω₁ and ω₂ axes is in part due to a rectangular window used to obtain a finite-extent image. The rectangular window creates artificial sharp discontinuities at the four boundaries. Discontinuities at the top and bottom of the image contribute energy along the ω₂ axis, and discontinuities at the two sides contribute energy along the ω₁ axis. Figure 1.33 illustrates this property. Figure 1.33(a) shows an original image of 512 × 512 pixels, and Figure 1.33(b) shows |X(ω₁, ω₂)|^{1/4} of the image in Figure 1.33(a). The operation (·)^{1/4} has the effect of compressing large amplitudes while expanding small amplitudes, and therefore shows |X(ω₁, ω₂)| more clearly for higher-frequency regions. In this particular example, energy concentration along approximately diagonal directions is also visible. This is because of the many sharp discontinuities in the image along approximately diagonal directions. This example shows that most of the energy is concentrated in a small region in the frequency plane.

Since most of the signal energy is concentrated in a small frequency region, an image can be reconstructed without significant loss of quality and intelligibility from a small fraction of the transform coefficients. Figure 1.34 shows images that were obtained by inverse Fourier transforming the Fourier transform of the image in Figure 1.33(a) after setting most of the Fourier transform coefficients to zero.
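The energy concentration just described can be illustrated with the DFT. In the sketch below, a smooth synthetic array of our own construction stands in for an image; keeping only the 5% largest-magnitude coefficients retains nearly all of the energy (the array and the 5% threshold are arbitrary choices):

```python
import numpy as np

# A smooth synthetic array stands in for an image: slowly varying content
# plus a small amount of noise.
n = np.arange(64)
t1, t2 = np.meshgrid(n, n, indexing="ij")
img = np.sin(2 * np.pi * t1 / 64) + np.cos(2 * np.pi * t2 / 32)
img = img + 0.05 * np.random.default_rng(4).standard_normal((64, 64))

X = np.fft.fft2(img)
mag = np.abs(X)

# Keep only the 5% largest-magnitude Fourier coefficients; zero the rest.
thresh = np.quantile(mag, 0.95)
X_kept = np.where(mag >= thresh, X, 0)
rec = np.real(np.fft.ifft2(X_kept))    # image reconstructed from 5% of coefficients

kept_energy = np.sum(np.abs(X_kept) ** 2) / np.sum(mag ** 2)
print(kept_energy > 0.95)              # a few coefficients carry most of the energy
```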
Figure 1.33 Example of the Fourier transform magnitude of an image. (a) Original image x(n₁, n₂) of 512 × 512 pixels; (b) |X(ω₁, ω₂)|^{1/4}, scaled such that the smallest value maps to the darkest level and the largest value maps to the brightest level. The operation (·)^{1/4} has the effect of compressing large amplitudes while expanding small amplitudes, and therefore shows |X(ω₁, ω₂)| more clearly for higher-frequency regions.

Figure 1.34 Illustration of energy concentration in the Fourier transform domain for a typical image. (a) Image obtained by preserving 12.4% of the Fourier transform coefficients of the image in Figure 1.33(a). All other coefficients are set to 0. (b) Same as (a) with 10% of the Fourier transform coefficients preserved; (c) same as (a) with 4.8% of the Fourier transform coefficients preserved.

The percentages of the Fourier transform coefficients that have been preserved in
Figures 1.34(a), (b), and (c) are 12.4%, 10%, and 4.8%, respectively. The frequency region that was preserved in each of the three cases has the shape (shaded region) shown in Figure 1.35.

The notion that an image with good quality and intelligibility can be reconstructed from a small fraction of transform coefficients for some transforms, for instance the Fourier transform, is the basis of a class of image coding systems known collectively as transform coding techniques. One objective of image coding is to represent an image with as few bits as possible while preserving a certain level of image quality and intelligibility. Reduction of transmission channel or storage requirements is a typical application of image coding. In transform coding, the transform coefficients of an image rather than its intensities are coded. Since only a small fraction of the transform coefficients need to be coded in typical applications, the bit rate required in transform coding is often significantly lower than in image coding techniques that attempt to code image intensities. The topic of image coding is discussed in Chapter 10.

1.4.3 The Projection-Slice Theorem

Another property of the Fourier transform is the projection-slice theorem, which is the mathematical basis of computed tomography (CT). Computed tomography has a number of applications, including the medical application of reconstructing cross sections of a human body from x-ray images. The impact of computed tomography on medicine requires no elaboration.

Consider a 2-D analog function f_c(t₁, t₂), where t₁ and t₂ are continuous variables. The subscript c denotes that the signal is a function of a continuous variable or variables. The analog Fourier transform F_c(Ω₁, Ω₂) is related to f_c(t₁, t₂) by

    F_c(Ω₁, Ω₂) = ∫∫ f_c(t₁, t₂)e^{−jΩ₁t₁}e^{−jΩ₂t₂} dt₁ dt₂    (1.45a)
    f_c(t₁, t₂) = (1/(2π)²) ∫∫ F_c(Ω₁, Ω₂)e^{jΩ₁t₁}e^{jΩ₂t₂} dΩ₁ dΩ₂    (1.45b)

Let us integrate f_c(t₁, t₂) along the parallel rays shown in Figure 1.36. The angle that the rays make with the t₂-axis is denoted by θ. The result of the integration at a given θ is a 1-D function, and we denote it by p_θ(t). In this figure, p_θ(0) is the result of integrating f_c(t₁, t₂) along the ray passing through the origin. The function p_θ(t), which is called the projection of f_c(t₁, t₂) at angle θ or the Radon transform of f_c(t₁, t₂), can be expressed in terms of f_c(t₁, t₂) by

    p_θ(t) = ∫∫ f_c(t₁, t₂) δ(t − t₁ cos θ − t₂ sin θ) dt₁ dt₂    (1.46)

Equation (1.46) arises naturally from the analysis of an x-ray image. Consider a 2-D object (a slice of a 3-D object, for example) through which we radiate a monoenergetic x-ray beam, as shown in Figure 1.36. On the basis of the Lambert-Beer law, which describes the attenuation of the x-ray beam as it passes through an object, and of a model of a typical film used to record the output x-ray beam, the image recorded on film can be modeled by p_θ(t) in (1.46), where f_c(t₁, t₂) is the attenuation coefficient of the 2-D object as a function of the two spatial variables t₁ and t₂. The function f_c(t₁, t₂) depends on the material that composes the 2-D object at the spatial position (t₁, t₂). To the extent that the attenuation coefficients of different types of material such as human tissue and bone differ, f_c(t₁, t₂) can be used to determine the types of material. Reconstructing f_c(t₁, t₂) from the recorded p_θ(t) is, therefore, of considerable interest.

Consider the 1-D analog Fourier transform of p_θ(t) with respect to the variable t and denote it by P_θ(Ω), so that

    P_θ(Ω) = ∫ p_θ(t)e^{−jΩt} dt.    (1.47)

It can be shown (see Problem 1.33) that there is a simple relationship between P_θ(Ω) and F_c(Ω₁, Ω₂), given by

    P_θ(Ω) = F_c(Ω₁, Ω₂)|_{Ω₁ = Ω cos θ, Ω₂ = Ω sin θ} = F_c(Ω cos θ, Ω sin θ).    (1.48)

Figure 1.35 Shape of the frequency region where Fourier transform coefficients are preserved in obtaining the images in Figure 1.34.

Figure 1.36 Projection of f_c(t₁, t₂) at angle θ.
Figure 1.37 Projection-slice theorem. P_θ(Ω) is the 2-D Fourier transform F_c(Ω₁, Ω₂) evaluated along the dotted line.

Figure 1.38 Values of (t₁, t₂) for which f_c(t₁, t₂) is affected by q_θ′(t′) in the filtered back-projection reconstruction method. They can be described by t′ = t₁ cos θ′ + t₂ sin θ′.
Expressed graphically, (1.48) states that the 1-D Fourier transform of the projection p_θ(t) is F_c(Ω₁, Ω₂) evaluated along the slice that passes through the origin and makes an angle of θ with the Ω₁ axis, as shown in Figure 1.37. The relationship in (1.48) is called the projection-slice theorem.

The projection-slice theorem of (1.48) can be used in developing methods to reconstruct the 2-D function f_c(t₁, t₂) from its projections p_θ(t). One method is to compute the inverse Fourier transform of F_c(Ω₁, Ω₂) obtained from p_θ(t). Specifically, if we compute the 1-D Fourier transform of p_θ(t) with respect to t for all 0 ≤ θ < π, we will have complete information on F_c(Ω₁, Ω₂). In practice, of course, p_θ(t) cannot be measured for all possible angles 0 ≤ θ < π, so F_c(Ω₁, Ω₂) must be estimated by interpolating known slices of F_c(Ω₁, Ω₂).

Another reconstruction method, known as the filtered back-projection method, is more popular in practice and can be derived from (1.45b) and (1.48). It can be shown [Kak] that

    f_c(t₁, t₂) = ∫₀^π q_θ(t₁ cos θ + t₂ sin θ) dθ    (1.49)

where q_θ(t) is related to p_θ(t) by

    q_θ(t) = p_θ(t) * h(t).    (1.50)

The function h(t), which can be viewed as the impulse response of a filter, is given by

    h(t) = (1/2π) ∫_{−Ω_c}^{Ω_c} (|Ω|/2π) e^{jΩt} dΩ    (1.51)

where Ω_c is the frequency above which the energy in any projection p_θ(t) can be assumed to be zero. From (1.49) and (1.50), we can see that one method of reconstructing f_c(t₁, t₂) from p_θ(t) is to first compute q_θ(t) by filtering (convolving) p_θ(t) with h(t) and then to determine f_c(t₁, t₂) from q_θ(t) by using (1.49). The process of determining f_c(t₁, t₂) from q_θ(t) using (1.49) can be viewed as a back-projection. Consider a particular θ and t, say θ′ and t′. From (1.49), the values of (t₁, t₂) for which f_c(t₁, t₂) is affected by q_θ′(t′) are given by t′ = t₁ cos θ′ + t₂ sin θ′. These values are shown by the straight line in Figure 1.38. Furthermore, the contribution that q_θ′(t′) makes to f_c(t₁, t₂) is equal at all points along this line. In essence, q_θ′(t′) is back-projected in the (t₁, t₂) domain. This back-projection takes place for all values of t′ and is integrated over all values of θ′. Since q_θ(t) is a filtered version of p_θ(t), this technique is called the filtered back-projection method. In practice, p_θ(t) is not available for all values of θ. As a result, q_θ(t) must be interpolated from the known slices of q_θ(t).

In addition to the interpolation involved in both the direct Fourier transform method and the filtered back-projection method, a number of practical issues arise in reconstructing f_c(t₁, t₂) from p_θ(t). For example, the Fourier transform, inverse Fourier transform, filtering, and integration require a discretization of the problem, which raises a variety of important issues, including sampling and aliasing. In practice, the measured function p_θ(t) may be only an approximate projection of f_c(t₁, t₂). In addition, the measured data may not have been obtained from parallel-beam projection, but instead from fan-beam projection, in which case a different set of equations governs. More details on these and other theoretical and practical issues can be found in [Scudder; Kak]. We will close this section with an example in which a cross section of a human head was reconstructed from its x-ray projections. Figure 1.39 shows the reconstruction by the filtered back-projection method.

1.5 DIGITAL PROCESSING OF ANALOG SIGNALS

Most signals that occur in practice are analog. In this section, we discuss digital processing of analog signals. Since the issues that arise in digital processing of analog signals are essentially the same in both the 1-D and 2-D cases, we will briefly summarize the 2-D results.

Consider an analog 2-D signal x_c(t₁, t₂). We will denote its analog Fourier transform by X_c(Ω₁, Ω₂).
Figure 1.39 Cross section of a human
head reconstructed from its projections
by the filtered back-projection method.
Courtesy of Tamas Sandor.
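The filtering step of (1.50)-(1.51) can be sketched with the DFT: the projection's spectrum is multiplied by a band-limited ramp |Ω| (the projection below is synthetic, and the discrete ramp is only an approximation of (1.51) on a finite grid). One visible consequence is that the ramp removes the projection's DC component before back-projection:

```python
import numpy as np

N = 128
t = np.arange(N) - N // 2
p = np.maximum(0.0, 20.0 - np.abs(t))           # a synthetic projection p_theta(t)

omega = 2 * np.pi * np.fft.fftfreq(N)           # DFT frequencies in [-pi, pi)
ramp = np.abs(omega)                            # discrete band-limited ramp, cf. (1.51)
q = np.real(np.fft.ifft(np.fft.fft(p) * ramp))  # q_theta(t) = p_theta(t) * h(t)

print(np.isclose(q.sum(), 0.0))                 # the ramp removes the DC component
```

In a full reconstruction, q_θ(t) would then be back-projected along t = t₁ cos θ + t₂ sin θ and integrated over θ as in (1.49).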
When 1/T₁ > Ω₁/π and 1/T₂ > Ω₂/π, x_c(t₁, t₂) can be recovered from x(n₁, n₂). Otherwise, x_c(t₁, t₂) cannot be exactly recovered from x(n₁, n₂) without additional information on x_c(t₁, t₂). This is the 2-D sampling theorem, and it is a straightforward extension of the 1-D result.

An ideal digital-to-analog (D/A) converter recovers x_c(t₁, t₂) from x(n₁, n₂) when the sampling frequencies 1/T₁ and 1/T₂ are high enough to satisfy the requirements of the sampling theorem. The reconstructed signal y_c(t₁, t₂) is given by

    y_c(t₁, t₂) = Σ_{n₁=−∞}^{∞} Σ_{n₂=−∞}^{∞} x(n₁, n₂) · [sin (π/T₁)(t₁ − n₁T₁)] / [(π/T₁)(t₁ − n₁T₁)] · [sin (π/T₂)(t₂ − n₂T₂)] / [(π/T₂)(t₂ − n₂T₂)]    (1.54)

The function y_c(t₁, t₂) is identical to x_c(t₁, t₂) when the sampling frequencies used in the ideal A/D converter are sufficiently high. Otherwise, y_c(t₁, t₂) is an aliased version of x_c(t₁, t₂). Equation (1.54) is a straightforward extension of the 1-D result.
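The interpolation kernel of (1.54) is separable, so each axis can be treated with the 1-D kernel sin((π/T)(t − nT))/((π/T)(t − nT)). A 1-D sketch (the samples and the period T below are arbitrary choices of ours; np.sinc(u) denotes sin(πu)/(πu)) verifies that the ideal D/A interpolation passes through the samples:

```python
import numpy as np

T = 0.5                                  # sampling period (arbitrary choice)
x = np.array([1.0, -2.0, 0.5, 3.0])      # samples x(n)

def dac(samples, T, t):
    # Ideal D/A interpolation per axis:
    #   y(t) = sum_n x(n) sin((pi/T)(t - nT)) / ((pi/T)(t - nT)),
    # where the kernel equals np.sinc((t - nT)/T).
    n = np.arange(len(samples))
    return np.sum(samples * np.sinc((t - n * T) / T))

recon = [dac(x, T, k * T) for k in range(len(x))]
print(np.allclose(recon, x))             # reconstruction matches at sample instants
```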
An analog signal can often be processed by digital processing techniques using the A/D and D/A converters discussed above. The digital processing of analog signals can, in general, be represented by the system in Figure 1.41. The analog lowpass filter limits the bandwidth of the analog signal to reduce the effect of aliasing. In digital image processing, the analog prefiltering operation is often performed by a lens and the scanning aperture used in converting an optical image to an electrical signal. The importance of the antialiasing filter is illustrated in Figure 1.42. Figure 1.42(a) shows an image of 128 × 128 pixels with little aliasing due to an effective antialiasing filter used. Figure 1.42(b) shows an image of 128 × 128 pixels with noticeable aliasing.

Figure 1.41 Digital processing of analog signals: prefilter (lowpass), A/D converter, digital processing, D/A converter.

The A/D converter of (1.52) is based on sampling on the Cartesian grid. The analog signal can also be sampled on a different type of grid. Sampling on a hexagonal grid is discussed in Problem 1.35.
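The need for the analog prefilter can be seen from a 1-D aliasing example: sampled below the Nyquist rate, a high-frequency cosine is indistinguishable from a low-frequency one (the frequencies below are arbitrary choices of ours):

```python
import numpy as np

# Aliasing at the A/D converter: a 9 Hz cosine sampled at 10 Hz produces
# exactly the same samples as a 1 Hz cosine. An analog lowpass prefilter
# (cutoff below 5 Hz here) would remove the 9 Hz component before sampling,
# which is why it precedes the A/D converter in Figure 1.41.
fs = 10.0
n = np.arange(10)
samples_9hz = np.cos(2 * np.pi * 9.0 * n / fs)
samples_1hz = np.cos(2 * np.pi * 1.0 * n / fs)
print(np.allclose(samples_9hz, samples_1hz))   # True: 9 Hz masquerades as 1 Hz
```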
REFERENCES

In this text, we have assumed that the reader is familiar with the fundamentals of 1-D digital signal processing. For a comprehensive treatment of 1-D digital signal processing concepts, see [Oppenheim and Schafer (1975); Rabiner and Gold; Lim and Oppenheim; Oppenheim and Schafer (1989)].

For different viewpoints or a more detailed treatment of some topics in 2-D digital signal processing, see [Huang; Huang; Dudgeon and Mersereau]. For collections of selected papers on 2-D digital signal processing, see [Mitra and Ekstrom; IEEE].

For a more detailed treatment of the Fourier transform theory, see [Papoulis]. For processing data obtained from sampling on any regular periodic lattice, including the rectangular lattice and the hexagonal lattice, see [Mersereau; Mersereau and Speake].
Y. M. Bruck and L. G. Sodin, On the ambiguity of the image reconstruction problem, Opt. Commun., September 1979, pp. 304-308.

D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1983.

M. H. Hayes, The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform, IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. ASSP-30, April 1982, pp. 140-154.

T. S. Huang, ed., Two-Dimensional Digital Signal Processing I, in "Topics in Applied Physics," Vol. 42. Berlin: Springer-Verlag, 1981.

T. S. Huang, ed., Two-Dimensional Digital Signal Processing II, in "Topics in Applied Physics," Vol. 43. Berlin: Springer-Verlag, 1981.

Figure 1.42 (a) Image of 128 × 128 pixels with little aliasing due to an effective antialiasing filter; (b) image of 128 × 128 pixels with noticeable aliasing.

IEEE ASSP Society's MDSP Committee, ed., Selected Papers in Multidimensional Digital Signal Processing. New York: IEEE Press, 1986.

D. Izraelevitz and J. S. Lim, A new direct algorithm for image reconstruction from Fourier transform magnitude, IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. ASSP-35, April 1987, pp. 511-519.

A. C. Kak, "Image Reconstruction from Projections," in Digital Image Processing Techniques, edited by M. Ekstrom. Orlando, FL: Academic Press, 1984, Chapter 4.

R. G. Lane, W. R. Fright, and R. H. T. Bates, Direct phase retrieval, IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. ASSP-35, April 1987, pp. 520-525.

J. S. Lim and A. V. Oppenheim, eds., Advanced Topics in Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1988.

R. M. Mersereau, "The processing of hexagonally sampled two-dimensional signals," Proc. IEEE, Vol. 67, May 1979, pp. 930-949.

R. M. Mersereau and T. C. Speake, The processing of periodically sampled multidimensional signals, IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. ASSP-31, February 1983,
pp. 188-194.

S. K. Mitra and M. P. Ekstrom, eds., Two-Dimensional Digital Signal Processing. Stroudsburg, PA: Dowden, Hutchinson and Ross, 1978.

A. V. Oppenheim, J. S. Lim, and S. R. Curtis, Signal synthesis and reconstruction from partial Fourier domain information, J. Opt. Soc. Amer., Vol. 73, November 1983, pp. 1413-1420.

A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.

A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.

A. Papoulis, The Fourier Integral and Its Applications. New York: McGraw-Hill, 1962.

L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.

G. N. Ramachandran and R. Srinivasan, Fourier Methods in Crystallography. New York: Wiley-Interscience, 1978.

W. O. Saxton, Computer Techniques for Image Processing in Electron Microscopy. New York: Academic Press, 1970.

H. J. Scudder, Introduction to computer aided tomography, Proc. IEEE, Vol. 66, June 1978, pp. 628-637.

V. T. Tom, T. F. Quatieri, M. H. Hayes, and J. H. McClellan, Convergence of iterative nonexpansive signal reconstruction algorithms, IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. ASSP-29, October 1981, pp. 1052-1058.

PROBLEMS

Figure P1.2

Express x(n₁, n₂) as a linear combination of δ(n₁, n₂) and its shifts.

1.3. We have defined a sequence x(n₁, n₂) to be periodic with a period of N₁ × N₂ if

    x(n₁, n₂) = x(n₁ + N₁, n₂) = x(n₁, n₂ + N₂) for all (n₁, n₂).    (1)

More generally defined, the condition is

    x(n₁, n₂) = x(n₁ + N₁₁, n₂ + N₁₂) = x(n₁ + N₂₁, n₂ + N₂₂) for all (n₁, n₂)    (2)

with the number of points in a period given by |N₁₁N₂₂ − N₁₂N₂₁|.

(a) Show that the condition in (2) reduces to the condition in (1) with a proper choice of N₁₁, N₁₂, N₂₁, and N₂₂.
(b) Consider the periodic sequence x(n₁, n₂), which was shown in Figure 1.8. If we use (1), the minimum choices of N₁ and N₂ are 6 and 2, respectively, and the number of points in one period is 12. If we use (2), N₁₁, N₁₂, N₂₁, and N₂₂ can be chosen such that |N₁₁N₂₂ − N₁₂N₂₁| = 6. Determine one such set of N₁₁, N₁₂, N₂₁, and N₂₂.
(c) Show that any sequence that satisfies the condition in (2) will also satisfy the condition in (1) as long as N₁ and N₂ are chosen appropriately. This result shows that (1) can be used in representing any periodic sequence that can be represented by (2), although the number of points in one period may be much larger when (1) rather than (2) is used.

1.4. For each of the following systems, determine whether or not the system is (1) linear, (2) shift invariant, and (3) stable.
Figure P1.6
For each of the following three cases, determine the class of inputs for which we can determine the output of the system in terms of s(n₁, n₂). For each input in the class, express the output in terms of s(n₁, n₂).
(a) T is linear, but not shift invariant.
(b) T is shift invariant, but not linear.
(c) T is linear and shift invariant.

1.7. Compute x(n₁, n₂) * h(n₁, n₂) for each of the following two problems.

Figure P1.7 2-D sequences x(n₁, n₂) and h(n₁, n₂) for Problem 1.7; in (a), h(n₁, n₂) = u(n₁, n₂).

1.9. Consider a 2-D LSI system whose impulse response h(n₁, n₂) has first-quadrant support. When the input x(n₁, n₂) to the system is given by

    x(n₁, n₂) = 2n₁n₂u(n₁, n₂),

some portion of the output y(n₁, n₂) has been observed. Suppose the observed portion of y(n₁, n₂) is as shown in the following figure.

Figure P1.9 Observed portion of y(n₁, n₂); samples outside the indicated boundary are not observed.

Express h(n₁, n₂) in terms of h₁(n₁, n₂), h₂(n₁, n₂), h₃(n₁, n₂), h₄(n₁, n₂), and h₅(n₁, n₂).

where l₁, l₂, l₃, and l₄ are integers. Consider the following wedge support sequence x(n₁, n₂).
1.12. Let h(n₁, n₂, n₃) be a separable, finite-extent, 3-D sequence of M × M × M points which can be expressed as h(n₁, n₂, n₃) = h₁(n₁)h₂(n₂)h₃(n₃).
(a) The Fourier transform of h(n₁, n₂, n₃), H(ω₁, ω₂, ω₃), is defined by

    H(ω₁, ω₂, ω₃) = Σ_{n₁} Σ_{n₂} Σ_{n₃} h(n₁, n₂, n₃)e^{−jω₁n₁}e^{−jω₂n₂}e^{−jω₃n₃}.

Show that H(ω₁, ω₂, ω₃) is a separable function that can be expressed in the form of A(ω₁)B(ω₂)C(ω₃).
(b) We wish to filter an input sequence x(n₁, n₂, n₃) of N × N × N points using an LSI system with impulse response h(n₁, n₂, n₃) as given above. Develop a computationally efficient way to compute the output y(n₁, n₂, n₃).
(c) How does your method compare to direct evaluation of the convolution sum for each output point when N = 512 and M = 10?

Figure P1.17
The shaded region in the figure is the region of (n₁, n₂) for which x(n₁, n₂) is nonzero. Determine one specific linear mapping of variables that maps x(n₁, n₂) to a first-quadrant support sequence y(n₁, n₂) without affecting its stability.

1.13. An LSI system can be specified by its impulse response h(n₁, n₂). An LSI system can also be specified by its unit step response s(n₁, n₂), the response of the system when the input is the unit step sequence u(n₁, n₂).
(a) Express y(n₁, n₂), the output of an LSI system, in terms of the input x(n₁, n₂) and the unit step response s(n₁, n₂).
(b) In determining the output y(n₁, n₂) of an LSI system, which of the two methods requires less computation: your result in (a), or convolving x(n₁, n₂) with h(n₁, n₂)?

1.18. Determine if each of the following sequences is an eigenfunction of a general LSI system.
(a) x(n₁, n₂) = e^{jω₂n₂}
(b) x(n₁, n₂) = a^{n₁}b^{n₂}
(c) x(n₁, n₂) = e^{jω₁n₁}e^{jω₂n₂}
(d) x(n₁, n₂) = e^{jω₁n₁}e^{jω₂n₂}u(n₁, n₂)
1.14. Show that an LSI system is stable in the B I B 0 (bounded-input-bounded-output) sense
if and only if the impulse response of the system h(n,, n,) is absolutely summable, 1.19. Determine the Fourier transform of each of the following sequences.
that is, (a) 6(n,, nz)
(b) a2n1+n2u(nl, n,), la1 < 1
i: i
"I= -I n?= - "
lh(n , , n 2 )/ < x. (c) anlbn?6,(4n, - nz)uT(n,),la1 < 1, Ibl < 1
(d) n,(i)"' (!)"'u(n,. t12)
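For Problem 1.18, part (c) can be checked numerically: convolving x(n1, n2) = e^{jω1 n1} e^{jω2 n2} with any finite-extent impulse response returns the same sequence scaled by the frequency response H(ω1, ω2). A small sketch (the taps and frequencies below are arbitrary illustrative values):

```python
import cmath

# x(n1, n2) = exp(j w1 n1) exp(j w2 n2) is an eigenfunction of any LSI
# system: y(n1, n2) = H(w1, w2) * x(n1, n2).
w1, w2 = 0.7, -1.3
h = {(0, 0): 1.0, (1, 0): -0.5, (0, 1): 0.25, (1, 1): 0.1}  # arbitrary taps

x = lambda n1, n2: cmath.exp(1j * (w1 * n1 + w2 * n2))

def y(n1, n2):
    # convolution sum over the finite-extent h
    return sum(hk * x(n1 - k1, n2 - k2) for (k1, k2), hk in h.items())

# frequency response of the same h
H = sum(hk * cmath.exp(-1j * (w1 * k1 + w2 * k2)) for (k1, k2), hk in h.items())

for n1, n2 in [(0, 0), (3, -2), (-1, 5)]:
    assert abs(y(n1, n2) - H * x(n1, n2)) < 1e-12
```

The same check fails for sequence (d), whose u(n1, n2) factor destroys the shift structure the eigenfunction property relies on.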
Figure P1.25

1.26. It is well known that circular symmetry of X(ω1, ω2) implies circular symmetry of x(n1, n2). However, circular symmetry of x(n1, n2) does not imply circular symmetry of X(ω1, ω2). To show the latter, determine a circularly symmetric sequence x(n1, n2) that is a function of n1² + n2² with the property that X(ω1, ω2) cannot be expressed as a function of ω1² + ω2² for ω1² + ω2² ≤ π².
1.27. Evaluate the following expression:

…

where J1(·) is the Bessel function of the first kind and first order.
1.28. Let f(x, y) denote a 2-D analog function that is circularly symmetric and can therefore be expressed as

f(x, y) = 1, … ; 0, otherwise,

where J1(x) is the Bessel function of the first kind and first order.
1.29. Cosine transforms are used in many signal processing applications. Let x(n1, n2) be a real, finite-extent sequence which is zero outside 0 ≤ n1 ≤ N1 - 1, 0 ≤ n2 ≤ N2 - 1. One of the possible definitions of the cosine transform Cx(ω1, ω2) is

Cx(ω1, ω2) = Σ_{n1=0}^{N1-1} Σ_{n2=0}^{N2-1} …

(a) Express Cx(ω1, ω2) in terms of X(ω1, ω2), the Fourier transform of x(n1, n2).
(b) Derive the inverse cosine transform relationship; that is, express x(n1, n2) in terms of Cx(ω1, ω2).
1.30. In reconstructing an image from its Fourier transform phase, we have used an iterative algorithm, shown in Figure 1.30. The method of imposing constraints separately in each domain in an iterative manner in order to obtain a solution that satisfies all the required constraints is useful in a variety of applications. One such application is the band-limited extrapolation of a signal. As an example of a band-limited extrapolation problem, consider x(n1, n2), which has been measured only for 0 ≤ n1 ≤ N - 1, 0 ≤ n2 ≤ N - 1. From prior information, however, we know that x(n1, n2) is band-limited and that its Fourier transform X(ω1, ω2) satisfies X(ω1, ω2) = 0 for frequencies beyond a cutoff ωc. Develop an iterative algorithm that may be used for determining x(n1, n2) for all (n1, n2). You do not have to show that your algorithm converges to a desired solution. However, using N = 1, x(0, 0) = 1, and ωc = π/2, carry out a few iterations of your algorithm and illustrate that it behaves reasonably for at least this particular case.
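A 1-D sketch of such an iterative algorithm (alternating the band-limit constraint in the frequency domain with the known-sample constraint in the signal domain, in the spirit of Figure 1.30; the signal, passband, and sizes below are illustrative, not the book's):

```python
import math
import cmath

# Iterative band-limited extrapolation: alternately band-limit the estimate
# and re-impose the measured samples.
def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)) / N for n in range(N)]

N = 16
true = [1 + math.cos(2 * math.pi * n / N) for n in range(N)]  # band-limited
known = range(4)              # samples measured only for 0 <= n <= 3
passband = {0, 1, N - 1}      # DFT bins where X may be nonzero

est = [true[n] if n in known else 0.0 for n in range(N)]
err = lambda e: sum(abs(e[n] - true[n]) ** 2 for n in range(N))
e0 = err(est)

for _ in range(20):
    X = dft(est)
    X = [X[k] if k in passband else 0 for k in range(N)]  # band-limit
    est = [v.real for v in idft(X)]
    for n in known:                                       # re-impose data
        est[n] = true[n]

e1 = err(est)
assert e1 < e0   # the estimate moves toward the true band-limited signal
```

Both constraint sets are convex and contain the true signal, so each pass cannot increase the error to it; that is the property the assertion spot-checks.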
1.31. … 0, otherwise.
The fraction of the frequency components retained is π(π/10)²/(4π²), or approximately 1%. By evaluating this quantity, discuss the amount of distortion in the signal caused by discarding 99% of the frequency components.
1.32. For a typical image, most of the energy has been observed to be concentrated in the
low-frequency regions. Give an example of an image for which this observation may
not be valid.
1.33. In this problem, we derive the projection-slice theorem, which is the basis for computed tomography. Let f(t1, t2) denote an analog 2-D signal with Fourier transform F(Ω1, Ω2). (See Figure P1.33.)
(a) We integrate f(t1, t2) along the t2 variable and denote the result by p0(t1).
(b) We integrate f(t1, t2) along the t1 variable and denote the result by pπ/2(t2). Express Pπ/2(Ω) in terms of F(Ω1, Ω2), where Pπ/2(Ω) is the 1-D Fourier transform of pπ/2(t2) given by

Pπ/2(Ω) = ∫_{t2=-∞}^{∞} pπ/2(t2) e^{-jΩt2} dt2.

(c) Show the following rotation property of the 2-D analog Fourier transform: when f(t1, t2) is rotated by an angle θ with respect to the origin in the (t1, t2) plane, its Fourier transform F(Ω1, Ω2) rotates by the same angle in the same direction with respect to the origin in the (Ω1, Ω2) plane.
(d) Suppose we integrate f(t1, t2) along the u variable, where the u-variable axis is shown in Figure P1.33(a). Let the result of integration be denoted by pθ(t). The function pθ(t) is called the projection of f(t1, t2) at angle θ. Using the results of (a) and (c) or the results of (b) and (c), discuss how Pθ(Ω) can be simply related to F(Ω1, Ω2), where Pθ(Ω) is the 1-D Fourier transform of pθ(t). The relationship between Pθ(Ω) and F(Ω1, Ω2) is the projection-slice theorem.

Figure P1.33
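The θ = 0 case of the projection-slice theorem can be checked on a small discrete example, with DFTs standing in for the analog transforms (the 4 x 4 image values are arbitrary):

```python
import cmath

# p(t1) = integral of f over t2  <->  P(W) = F(W, 0): the 1-D transform of
# the projection is a slice of the 2-D transform.  Discrete stand-in.
f = [[1, 2, 0, 1],
     [0, 3, 1, 2],
     [2, 1, 1, 0],
     [1, 0, 2, 3]]            # arbitrary 4 x 4 "image"
N = len(f)

proj = [sum(row) for row in f]           # integrate along n2

# 1-D DFT of the projection
P = [sum(proj[n1] * cmath.exp(-2j * cmath.pi * k1 * n1 / N)
         for n1 in range(N)) for k1 in range(N)]

# slice F(k1, 0) of the 2-D DFT
F_slice = [sum(f[n1][n2] * cmath.exp(-2j * cmath.pi * k1 * n1 / N)
               for n1 in range(N) for n2 in range(N)) for k1 in range(N)]

assert all(abs(P[k] - F_slice[k]) < 1e-9 for k in range(N))
```

Setting k2 = 0 in the 2-D DFT collapses the n2 sum into the projection, which is exactly the slice relationship part (d) asks about.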
1.34. Suppose the spectra of sc(t1, t2) and wc(t1, t2) are nonzero only over the shaded regions shown in Figure P1.34. We wish to filter the additive noise wc(t1, t2) by digital filtering, using the following system:

Figure P1.34

Assuming that it is possible to have any desired H(ω1, ω2), determine the maximum T1 and T2 for which yc(t1, t2) can be made to equal sc(t1, t2).
1.35. In Section 1.5, we discussed the results for the ideal A/D and D/A converters when the analog signal is sampled on a rectangular grid. In this problem, we derive the corresponding results when the analog signal is sampled on a hexagonal grid. Let xc(t1, t2) and Xc(Ω1, Ω2) denote an analog signal and its analog Fourier transform. Let x(n1, n2) and X(ω1, ω2) denote a sequence and its Fourier transform. An ideal A/D converter converts xc(t1, t2) to x(n1, n2) by

x(n1, n2) = xc(t1, t2)|_{t1 = n1T1, t2 = n2T2}, if both n1 and n2 are even, or both n1 and n2 are odd;
x(n1, n2) = 0, otherwise.   (1)

Figure P1.35a

The function p(t1, t2) is a periodic train of impulses given by

p(t1, t2) = Σ_{(n1, n2): n1, n2 both even or both odd} δ(t1 - n1T1, t2 - n2T2),

where δ(t1, t2) is a Dirac delta function. The system G converts an analog signal xp(t1, t2) to a sequence x(n1, n2) by measuring the area under each impulse and using it as the amplitude of the sequence x(n1, n2).
(a) Sketch an example of xc(t1, t2), xp(t1, t2), and x(n1, n2). Note, from (1), that x(n1, n2) is zero for even n1 and odd n2, or odd n1 and even n2.
(b) Determine P(Ω1, Ω2). Note that the Fourier transform of

sin[(π/T1)(t1 - n1T1)] · sin[(π/T2)(t2 - n2T2)] …

is given by …
(c) Express Xp(Ω1, Ω2) in terms of Xc(Ω1, Ω2).
(d) Express X(ω1, ω2) in terms of Xp(Ω1, Ω2).
We now derive the results for the ideal D/A converter for a hexagonal sampling grid. When xc(t1, t2) is sampled at a sufficiently high rate, the ideal D/A converter recovers xc(t1, t2) exactly from x(n1, n2). It is convenient to represent the ideal D/A converter by the system shown in Figure P1.35b.

Figure P1.35b

The Fourier transform discussed in Chapter 1 converges uniformly only for stable sequences. As a result, many important classes of sequences, for instance the unit step sequence u(n1, n2), cannot be represented by the Fourier transform. In this chapter, we study the z-transform signal representation and related topics. The z-transform converges for a much wider class of signals than does the Fourier transform. In addition, the z-transform representation of signals and systems is very useful in studying such topics as difference equations and system stability. In Section 2.1, we discuss the z-transform and its properties. Section 2.2 deals with the linear constant coefficient difference equation, which is important in studying digital filters. Section 2.3 treats topics related to system stability. In our discussion of the z-transform, the linear constant coefficient difference equation, and system stability, we will find fundamental differences between one-dimensional and two-dimensional results.

2.1 THE Z-TRANSFORM

The z-transform X(z1, z2) of a sequence x(n1, n2) is defined as

X(z1, z2) = Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} x(n1, n2) z1^{-n1} z2^{-n2}.   (2.1)

Example 1
For the sequence x(n1, n2) = a^{n1} b^{n2} u(n1, n2), using (2.1),

X(z1, z2) = Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} a^{n1} b^{n2} u(n1, n2) z1^{-n1} z2^{-n2} = Σ_{n1=0}^{∞} (a z1^{-1})^{n1} · Σ_{n2=0}^{∞} (b z2^{-1})^{n2}.
Carrying out the two geometric sums in Example 1 gives

X(z1, z2) = [1/(1 - a z1^{-1})] · [1/(1 - b z2^{-1})], ROC: |z1| > |a| and |z2| > |b|.

The ROC is sketched in Figure 2.2(b).

Figure 2.2 (a) Sequence x(n1, n2) = a^{n1} b^{n2} u(n1, n2) and (b) ROC of its z-transform.

The ROC of a 2-D z-transform is a region in the 4-D (z1, z2) space and cannot be sketched directly. Fortunately, however, the ROC depends only on |z| for 1-D signals and on |z1| and |z2| for 2-D signals. Therefore, an alternative way to sketch the ROC for 1-D signals would be to use the |z| axis. The ROC in Figure 2.1(a) sketched using the |z| axis is shown in Figure 2.1(b). In this sketch, each point on the |z| axis corresponds to a 1-D contour in the z plane. For 2-D signals, we can use the (|z1|, |z2|) plane to sketch the ROC; an example is shown in Figure 2.1(c). In this sketch, the point at (|z1| = 1, |z2| = 1) corresponds to the 2-D unit surface, and each point in the (|z1|, |z2|) plane corresponds to a 2-D subspace in the 4-D (z1, z2) space. The ROC plays an important role in the z-transform representation of a sequence, as we shall see shortly.

66 The z-Transform Chap. 2

For 1-D signals, poles of X(z) are points in the z plane. For 2-D signals, poles of X(z1, z2) are 2-D surfaces in the 4-D (z1, z2) space. In Example 1, for instance, the poles of X(z1, z2) can be represented as (z1 = a, any z2) and (any z1, z2 = b). Each of the two pole representations is a 2-D surface in the 4-D (z1, z2) space.

For the 1-D case, the ROC is bounded by poles. For the 2-D case, the ROC is bounded by pole surfaces. To illustrate this, consider the pole surfaces in Example 1. Taking the magnitude of z1 and z2 that corresponds to the pole surfaces, we have |z1| = |a| and |z2| = |b|. These are the two solid lines that bound the ROC, as shown in Figure 2.2(b). Note that each of the two solid lines in Figure 2.2(b) is a 3-D space, since each point in the (|z1|, |z2|) plane is a 2-D space. The pole surfaces, therefore, lie in the 2-D subspaces within the 3-D spaces corresponding to the two solid lines in Figure 2.2(b). To illustrate this point more clearly, consider the 3-D space [the vertical solid line in Figure 2.2(b)] corresponding to |z1| = |a|. This 3-D space can be represented as (z1 = |a|e^{jω1}, any z2), as shown in Figure 2.3. The pole surface corresponding to (z1 = a, any z2) is sketched as the shaded region in Figure 2.3.

Example 2
We wish to determine the z-transform and its ROC for the following sequence:

x(n1, n2) = -a^{n1} b^{n2} u(-n1 - 1, n2).

The sequence is sketched in Figure 2.4(a). Using the z-transform definition in (2.1), and after a little algebra, we have

X(z1, z2) = [1/(1 - a z1^{-1})] · [1/(1 - b z2^{-1})], ROC: |z1| < |a| and |z2| > |b|.

Examples 1 and 2 show the importance of the ROC in the z-transform representation of a sequence. Even though the two sequences in Examples 1 and 2 are very different, their z-transforms are exactly the same. Given only the z-transform, therefore, the sequence cannot be uniquely determined. Uniquely determining the sequence requires not only the z-transform but also its ROC.

Example 3
We wish to determine the z-transform and its ROC for the following sequence:

x(n1, n2) = a^{n1} u(n1, n2) u_T(n1 - n2).

The sequence is sketched in Figure 2.5(a). Using the z-transform definition in (2.1), we have

X(z1, z2) = Σ_{n1=0}^{∞} Σ_{n2=0}^{n1} a^{n1} z1^{-n1} z2^{-n2} = …

The ROC is sketched in Figure 2.5(b). The two pole surfaces lie in the 2-D subspaces within the two solid lines (3-D spaces) in Figure 2.5(b).

Figure 2.5 (a) Sequence x(n1, n2) = a^{n1} u(n1, n2) u_T(n1 - n2) and (b) ROC of its z-transform.

Example 4
We wish to determine the z-transform and its ROC for the following sequence:

x(n1, n2) = [(n1 + n2)!/(n1! n2!)] a^{n1} b^{n2} u(n1, n2).

Using the z-transform definition in (2.1), we have

X(z1, z2) = Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} [(n1 + n2)!/(n1! n2!)] (a z1^{-1})^{n1} (b z2^{-1})^{n2}.

Letting m = n1 + n2, X(z1, z2) can be expressed as

X(z1, z2) = Σ_{m=0}^{∞} Σ_{n1=0}^{m} [m!/(n1! (m - n1)!)] (a z1^{-1})^{n1} (b z2^{-1})^{m-n1}.

Noting that the inner sum is the binomial expansion of (a z1^{-1} + b z2^{-1})^m, we have

X(z1, z2) = Σ_{m=0}^{∞} (a z1^{-1} + b z2^{-1})^m = 1/(1 - a z1^{-1} - b z2^{-1}).

The pole surface is given by (any z1, z2 = b z1/(z1 - a)) and is the shaded (dotted) region in Figure 2.6(b). The pole surface is a 2-D surface in the 4-D (z1, z2) space, and it is visible in this example as a 2-D space in the (|z1|, |z2|) plane. To determine the ROC, we note that the ROC depends only on (|z1|, |z2|). For (|z1'|, |z2'|) to be in the ROC, (z1 = |z1'|e^{jω1}, z2 = |z2'|e^{jω2}) has to satisfy |a z1^{-1} + b z2^{-1}| < 1 for all (ω1, ω2). After some algebra, the ROC is given by

ROC: |a| |z1|^{-1} + |b| |z2|^{-1} < 1.

The ROC is the shaded (lined) region in Figure 2.6(b). The ROC can also be obtained using (2.4). Note that some values of (z1, z2) which satisfy |a z1^{-1} + b z2^{-1}| < 1 are not in the ROC. For example, (z1 = a, z2 = -b) satisfies |a z1^{-1} + b z2^{-1}| < 1, but it is not in the ROC. This is because (z1 = |a|e^{jω1}, z2 = |b|e^{jω2}) does not satisfy |a z1^{-1} + b z2^{-1}| < 1 for all (ω1, ω2).

Example 5
We wish to determine the z-transform and its ROC for the following sequence:

…

The sequence is sketched in Figure 2.7. Using the z-transform definition in (2.1), we have

…

The above expression does not converge for any (z1, z2), and therefore the z-transform does not exist.
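The role of the ROC in Examples 1 and 2 can be illustrated numerically: partial sums of the two defining series converge to the same closed form, but each only at points inside its own ROC (a = b = 0.5 and the evaluation points below are illustrative):

```python
# Partial sums of the z-transform series for the sequences of Examples 1
# and 2.  Both converge to X(z1, z2) = 1/((1 - a/z1)(1 - b/z2)), but over
# different ROCs.
a = b = 0.5
closed = lambda z1, z2: 1.0 / ((1 - a / z1) * (1 - b / z2))
K = 200   # truncation length for the partial sums

def ex1(z1, z2):   # x = a^n1 b^n2 u(n1, n2); needs |z1| > |a|, |z2| > |b|
    return sum((a / z1) ** n1 * (b / z2) ** n2
               for n1 in range(K) for n2 in range(K))

def ex2(z1, z2):   # x = -a^n1 b^n2 u(-n1 - 1, n2); needs |z1| < |a|, |z2| > |b|
    return -sum((z1 / a) ** m * (b / z2) ** n2
                for m in range(1, K) for n2 in range(K))

print(round(ex1(0.8, 0.8), 6), round(closed(0.8, 0.8), 6))  # agree: |z1| > |a|
print(round(ex2(0.3, 0.8), 6), round(closed(0.3, 0.8), 6))  # agree: |z1| < |a|
```

Swapping the evaluation points between the two series makes the corresponding partial sums diverge, which is exactly the statement that the algebraic expression alone does not determine the sequence.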
There is a condition that the ROC must satisfy for a sequence with a valid z-transform to have first-quadrant support. The condition is that for any (z1', z2') in the ROC, all (z1, z2) in the shaded region in Figure 2.8 are also in the ROC. The sketch in Figure 2.8 is called a constraint map, since it shows the constraints that the ROC of any first-quadrant support sequence has to satisfy.

This property can be demonstrated straightforwardly. To see that a first-quadrant support sequence implies the condition in the constraint map, note that the z-transform of any first-quadrant support sequence x(n1, n2) can be expressed as

X(z1, z2) = Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} x(n1, n2) z1^{-n1} z2^{-n2}.

Consider any (z1', z2') in the ROC. From the definition of the ROC, then,

Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} |x(n1, n2)| |z1'|^{-n1} |z2'|^{-n2} < ∞.

Now consider any (z1, z2) such that |z1| ≥ |z1'| and |z2| ≥ |z2'|. We have

Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} |x(n1, n2)| |z1|^{-n1} |z2|^{-n2} ≤ Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} |x(n1, n2)| |z1'|^{-n1} |z2'|^{-n2} < ∞.   (2.7)

Equation (2.7) shows that if (z1', z2') is in the ROC, then all (z1, z2) such that |z1| ≥ |z1'| and |z2| ≥ |z2'| will also be in the ROC; this is the condition in the constraint map of Figure 2.8. To demonstrate that the condition in the constraint map implies that the sequence will be a first-quadrant support sequence, we will show that if the sequence is not a first-quadrant support sequence, then the condition in the constraint map cannot be true. We can do this because "B implies A" is logically equivalent to "not A implies not B." If a sequence is not a first-quadrant support sequence, then the z-transform of such a sequence can be expressed as

X(z1, z2) = Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} x(n1, n2) z1^{-n1} z2^{-n2} + additional term(s)   (2.8)

where the additional term(s) are of the form z1^{m1} z2^{m2}, where m1 or m2 is a positive integer. Because of the additional term(s), there is a pole surface at |z1| = ∞, |z2| = ∞, or both. Hence, it is not possible for the ROC to include all (z1, z2) such that |z1| > |z1'| and |z2| > |z2'|, where (z1', z2') is in the ROC.

Examples of ROCs that satisfy the condition in the constraint map of Figure 2.8 were shown in Figures 2.2(b), 2.5(b), and 2.6(b). Three sequences whose z-transforms have the three ROCs are x(n1, n2) = a^{n1} b^{n2} u(n1, n2); x(n1, n2) = a^{n1} u(n1, n2) u_T(n1 - n2); and x(n1, n2) = [(n1 + n2)!/(n1! n2!)] a^{n1} b^{n2} u(n1, n2), as discussed above.

Figure 2.9 Constraint maps of ROCs of various special support sequences: (b) second-quadrant support sequence; (c) third-quadrant support sequence; (d) fourth-quadrant support sequence.

The properties of the z-transform can be derived from the z-transform definition in (2.1). Some of the more important properties are listed in Table 2.2. All the properties except Property 3 and Property 7 can be viewed as straightforward extensions of the 1-D case. Property 3 applies to separable sequences, and Property 7 can be used in determining the z-transform of a first-quadrant support sequence obtained by linear mapping of the variables of a wedge support sequence.
Table 2.2 Properties of the z-Transform

x(n1, n2) ↔ X(z1, z2), ROC: Rx;  y(n1, n2) ↔ Y(z1, z2), ROC: Ry.

Property 1. Linearity
a x(n1, n2) + b y(n1, n2) ↔ a X(z1, z2) + b Y(z1, z2), ROC: at least Rx ∩ Ry

Property 2. Convolution
x(n1, n2) * y(n1, n2) ↔ X(z1, z2) Y(z1, z2), ROC: at least Rx ∩ Ry

Property 3. Separable Sequence
x1(n1) x2(n2) ↔ X1(z1) X2(z2), ROC: |z1| in ROC of X1(z1) and |z2| in ROC of X2(z2)

Property 4. Shift of a Sequence
x(n1 - m1, n2 - m2) ↔ X(z1, z2) z1^{-m1} z2^{-m2}, ROC: Rx with possible exception of |z1| = 0, ∞ and |z2| = 0, ∞

Property 5. Differentiation
(a) -n1 x(n1, n2) ↔ [∂X(z1, z2)/∂z1] z1, ROC: Rx
(b) -n2 x(n1, n2) ↔ [∂X(z1, z2)/∂z2] z2, ROC: Rx

Property 6. Symmetry Properties
(a) x*(n1, n2) ↔ X*(z1*, z2*), ROC: Rx
(b) x(-n1, -n2) ↔ X(z1^{-1}, z2^{-1}), ROC: (|z1^{-1}|, |z2^{-1}|) in Rx

Property 7. Linear Mapping of Variables
x(n1, n2) = y(m1, m2)|_{m1 = l1 n1 + l2 n2, m2 = l3 n1 + l4 n2} ↔ Y(z1, z2) = X(z1^{l1} z2^{l3}, z1^{l2} z2^{l4}), ROC: (|z1^{l1} z2^{l3}|, |z1^{l2} z2^{l4}|) in Rx
Note: For those points of y(m1, m2) that do not correspond to any x(n1, n2), y(m1, m2) is taken to be zero.

Property 8. Useful Relations
(a) For a first-quadrant support sequence, x(0, 0) = lim_{z1→∞} lim_{z2→∞} X(z1, z2)
(b) For a third-quadrant support sequence, x(0, 0) = lim_{z1→0} lim_{z2→0} X(z1, z2)

The z-transform definition in (2.1) can be used in determining the z-transform and ROC of a 2-D sequence. As in the 1-D case, using (2.1) and Cauchy's integral theorem, we can determine the inverse z-transform relation that expresses x(n1, n2) as a function of X(z1, z2) and its ROC. The inverse z-transform relation, which is a straightforward extension of the 1-D result, is given by

x(n1, n2) = [1/(2πj)²] ∮_{C1} ∮_{C2} X(z1, z2) z1^{n1-1} z2^{n2-1} dz2 dz1   (2.9)

where C1 and C2 are both in the ROC of X(z1, z2), C1 is a closed contour that encircles the origin in the z1 plane in a counterclockwise direction for any fixed z2 on C2, and C2 is a closed contour that encircles the origin in the z2 plane in a counterclockwise direction for any fixed z1 on C1.

Although the conditions that the contours C1 and C2 in (2.9) must satisfy appear complicated, there exists a simple way to determine the contours C1 and C2 when the ROC is given. Suppose (|z1'|, |z2'|) lies in the ROC. One set of the contours C1 and C2 that satisfies the conditions in (2.9) is

C1: z1 = |z1'| e^{jω1}, ω1: 0 to 2π
C2: z2 = |z2'| e^{jω2}, ω2: 0 to 2π.

If the sequence is stable, so that the ROC includes the unit surface, one possible set of contours is

C1: z1 = e^{jω1}, ω1: 0 to 2π
C2: z2 = e^{jω2}, ω2: 0 to 2π.

If this set of contours is chosen in (2.9), then (2.9) reduces to the inverse Fourier transform equation of (1.31b). This again shows that the Fourier transform representation is a special case of the z-transform representation.

From (2.9), the result of the contour integration differs depending on the contours. Therefore, a sequence x(n1, n2) is not uniquely specified by X(z1, z2) alone. Since the contours C1 and C2 can be determined from the ROC of X(z1, z2), however, a sequence x(n1, n2) is uniquely specified by both X(z1, z2) and its ROC.

In theory, (2.9) can be used directly in determining x(n1, n2) from X(z1, z2) and its ROC. In practice, however, it is extremely tedious to perform the contour integration even for simple problems. The curious reader may attempt (see Problem 2.13) to determine a stable sequence by performing the contour integration in (2.9) when X(z1, z2) is given by

…

The sequence x(n1, n2) in this problem is given by

x(n1, n2) = [n1!/((n1 - n2)! n2!)] (1/2)^{n1} (1/2)^{n2}, for n1 ≥ 0, n2 ≥ 0, n1 ≥ n2;
x(n1, n2) = 0, otherwise.   (2.14)
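For a stable sequence, choosing the unit-surface contours reduces (2.9) to an inverse Fourier transform, which can be approximated numerically by sampling the unit surface on a grid. A sketch (the rational X(z1, z2) and the grid size N are illustrative):

```python
import cmath

# Sample X(z1, z2) = 1/((1 - a z1^-1)(1 - b z2^-1)) on the unit surface and
# inverse-DFT to recover x(n1, n2) = a^n1 b^n2 u(n1, n2) (up to aliasing,
# which is tiny here because 0.5^16 is negligible).
N = 16
a = b = 0.5

def X(z1, z2):
    return 1.0 / ((1 - a / z1) * (1 - b / z2))

def x_hat(n1, n2):
    s = 0.0 + 0.0j
    for k1 in range(N):
        for k2 in range(N):
            z1 = cmath.exp(2j * cmath.pi * k1 / N)   # contour C1
            z2 = cmath.exp(2j * cmath.pi * k2 / N)   # contour C2
            s += X(z1, z2) * z1 ** n1 * z2 ** n2
    return (s / N ** 2).real

print(round(x_hat(0, 0), 4), round(x_hat(1, 2), 4))   # 1.0 0.125
```

This is the discrete counterpart of evaluating (2.9) on the unit surface; it works only because this X is stable, so the unit surface lies in the ROC.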
For 1-D signals, approaches that have been useful in performing the inverse z-transform operation without explicit evaluation of a contour integral are series expansion, partial fraction expansion, and the inverse z-transform using the z-transform properties. The partial fraction expansion method, combined with the inverse z-transform using properties, is a general method that can always be used in performing the inverse z-transform operation for any 1-D rational z-transform. In the 1-D partial fraction expansion method, the z-transform X(z) is first expressed as a sum of simpler z-transforms by factoring the denominator polynomial as a product of lower-order polynomials as follows:

X(z) = N(z)/D(z) = N(z)/[D1(z) D2(z) ⋯ DK(z)] = X1(z) + X2(z) + ⋯ + XK(z).   (2.15)

The inverse z-transform is then performed for each of the simpler z-transforms, and the results are combined to obtain x(n).

For 2-D signals, unfortunately, the partial fraction expansion method is not a general procedure. In the 1-D case, the factorization of any 1-D polynomial D(z) as a product of lower-order polynomials is guaranteed by the fundamental theorem of algebra. A 2-D polynomial, however, cannot in general be factored as a product of lower-order polynomials. Therefore, it is not generally possible to use a procedure analogous to (2.15). Partly because of the difficulty involved in the partial fraction expansion method, no known practical method exists for performing the inverse z-transform of a general 2-D rational z-transform. This limits to some extent the usefulness of the 2-D z-transform as compared to the 1-D z-transform. For example, a procedure involving an inverse z-transform operation may not be useful in 2-D signal processing.

2.2 LINEAR CONSTANT COEFFICIENT DIFFERENCE EQUATIONS

A difference equation alone does not specify a system, since there are many solutions of y(n1, n2) in (2.16) for a given x(n1, n2). For example, if y1(n1, n2) is a solution to

y(n1, n2) = (1/2) y(n1 - 1, n2 + 1) + (1/2) y(n1 + 1, n2 - 1) + x(n1, n2),

then so is y1(n1, n2) + f(n1 + n2) for any function f. To uniquely specify a solution, a set of boundary conditions is needed. Since these boundary conditions must provide sufficient information for us to determine specific functional forms and the constants associated with the functions, they typically consist of an infinite number of values in the output y(n1, n2). In this regard, the 2-D case differs fundamentally from the 1-D case. In the 1-D case, a difference equation specifies a solution within arbitrary constants. For example, an Nth-order 1-D difference equation specifies a solution within N constants, so N initial conditions (N values in the output y(n)) are generally sufficient to uniquely specify a solution. In the 2-D case, we typically need a set of boundary conditions that comprise an infinite number of points in the output y(n1, n2).

2.2.2 Difference Equations with Boundary Conditions

The problem of solving a difference equation with boundary conditions can be stated as follows: given x(n1, n2), and y(n1, n2) for (n1, n2) ∈ R_BC, find the solution to

Σ_{(k1, k2) ∈ Ra} a(k1, k2) y(n1 - k1, n2 - k2) = Σ_{(k1, k2) ∈ Rb} b(k1, k2) x(n1 - k1, n2 - k2),   (2.17)
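The nonuniqueness claim above, that y1(n1, n2) + f(n1 + n2) solves y(n1, n2) = (1/2)y(n1 - 1, n2 + 1) + (1/2)y(n1 + 1, n2 - 1) + x(n1, n2) whenever y1 does, can be verified numerically: adding f(n1 + n2) leaves the equation's residual unchanged, because all three y terms share the same index sum n1 + n2 (the test values below are random):

```python
import random

# (n1-1)+(n2+1) = (n1+1)+(n2-1) = n1+n2, so the f(n1+n2) contributions
# cancel: f(s) - (1/2)f(s) - (1/2)f(s) = 0.
random.seed(0)
f = {s: random.random() for s in range(-20, 21)}
y = {(n1, n2): random.random() for n1 in range(-5, 6) for n2 in range(-5, 6)}

def residual(yy, n1, n2):
    # left-hand side minus the two homogeneous right-hand-side terms
    return (yy[(n1, n2)] - 0.5 * yy[(n1 - 1, n2 + 1)]
            - 0.5 * yy[(n1 + 1, n2 - 1)])

y2 = {(n1, n2): v + f[n1 + n2] for (n1, n2), v in y.items()}

for n1 in range(-4, 5):
    for n2 in range(-4, 5):
        assert abs(residual(y, n1, n2) - residual(y2, n1, n2)) < 1e-12
```

Since the residual equals x(n1, n2) for a solution, identical residuals mean y and y2 solve the equation for exactly the same input.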
In the 1-D case, a difference equation with initial conditions can be solved by finding the homogeneous and particular solutions; imposing the initial conditions, which determines the unknown constants in the homogeneous solution, requires solving a set of linear equations. This approach can be used in solving (2.18), and typically leads to a closed-form expression for y(n).

The above approach cannot be used for solving a 2-D difference equation with boundary conditions. First, there is no general procedure for obtaining the homogeneous solution. The homogeneous solution consists of unknown functions, and the specific functional form of k a^n used in the 1-D case cannot be used in the 2-D case. Second, a particular solution cannot generally be obtained by either inspection or the z-transform method, since there is no practical procedure for performing the inverse z-transform for the 2-D case. Furthermore, determining the unknown functions in the homogeneous solution by imposing the boundary conditions (an infinite number of known values of y(n1, n2)) is not a simple linear problem.

Another approach to solving (2.18) is to compute y(n) recursively. This is the obvious way to determine y(n) by using a computer, and it can be used in solving the 2-D difference equation with boundary conditions in (2.17). To illustrate this approach, consider the following 2-D difference equation with boundary conditions:

y(n1, n2) = (1/2) y(n1 - 1, n2 + 1) + (1/2) y(n1 + 1, n2 - 1) + x(n1, n2)
Boundary conditions: y(n1, n2) = 1/4, for n1 = -1 or n2 = -1.   (2.19)

Figure 2.10 Output y(n1, n2) for n1 ≥ 0 and n2 ≥ 0 of the system specified by (2.19) with x(n1, n2) = δ(n1, n2). (The boundary conditions used are marked.)

For a 2-D difference equation, whether a given set of boundary conditions leads to a unique solution is not that obvious. Suppose we have chosen the boundary conditions such that the difference equation has a unique solution. The difference equation with boundary conditions can then be considered a system. In both the 1-D and the 2-D cases, the system is in general neither linear nor shift invariant. Consider the example in (2.19). The output y(n1, n2) for n1 ≥ 0 and n2 ≥ 0 which results when the input x(n1, n2) in (2.19) is doubled, so that x(n1, n2) = 2δ(n1, n2), is shown in Figure 2.11. Comparing Figure 2.11 with Figure 2.10 shows that doubling the input does not double the output, so the system is not linear; Figure 2.12, the output when the input is shifted to x(n1, n2) = δ(n1 - 1, n2 - 1), likewise shows that the system is not shift invariant.

Figure 2.11 Output y(n1, n2) for n1 ≥ 0 and n2 ≥ 0 of the system specified by (2.19) when x(n1, n2) = 2δ(n1, n2).

Figure 2.12 Output y(n1, n2) for n1 ≥ 0 and n2 ≥ 0 of the system specified by (2.19) with x(n1, n2) = δ(n1 - 1, n2 - 1).

A difference equation with boundary conditions can, however, be made into an LSI system by three steps: interpreting the difference equation as a specific computational procedure (Step 1), determining the region of support of the output (Step 2), and choosing the boundary conditions (Step 3). The boundary conditions must be chosen such that they will shift as the input is shifted, and they will not overlap with the result of convolving the input with the impulse response of the system. Step 3 ensures that the boundary conditions we impose are zero boundary conditions. When we choose the boundary conditions following these three steps, we shall find that the resulting difference equation with boundary conditions will become an LSI system. We will now discuss each of the three steps in detail.
Step 1. In this step, we interpret a difference equation as a specific computational procedure. The best way to explain this step is with an example. Consider the following difference equation:

y(n1, n2) + 2y(n1 - 1, n2) + 3y(n1, n2 - 1) + 4y(n1 - 1, n2 - 1) = x(n1, n2).   (2.21)

Since there are four terms of the form y(n1 - k1, n2 - k2), we obtain four equations by leaving only one of the four terms on the left-hand side of the equation:

y(n1, n2) = -2y(n1 - 1, n2) - 3y(n1, n2 - 1) - 4y(n1 - 1, n2 - 1) + x(n1, n2)   (2.22a)
y(n1 - 1, n2) = -(1/2)y(n1, n2) - (3/2)y(n1, n2 - 1) - 2y(n1 - 1, n2 - 1) + (1/2)x(n1, n2)   (2.22b)
y(n1, n2 - 1) = -(1/3)y(n1, n2) - (2/3)y(n1 - 1, n2) - (4/3)y(n1 - 1, n2 - 1) + (1/3)x(n1, n2)   (2.22c)
y(n1 - 1, n2 - 1) = -(1/4)y(n1, n2) - (1/2)y(n1 - 1, n2) - (3/4)y(n1, n2 - 1) + (1/4)x(n1, n2)   (2.22d)

By a simple change of variables, (2.22) can be rewritten so that the left-hand side of each equation has the form y(n1, n2), as follows:

y(n1, n2) = -2y(n1 - 1, n2) - 3y(n1, n2 - 1) - 4y(n1 - 1, n2 - 1) + x(n1, n2)   (2.23a)
y(n1, n2) = -(1/2)y(n1 + 1, n2) - (3/2)y(n1 + 1, n2 - 1) - 2y(n1, n2 - 1) + (1/2)x(n1 + 1, n2)   (2.23b)
y(n1, n2) = -(1/3)y(n1, n2 + 1) - (2/3)y(n1 - 1, n2 + 1) - (4/3)y(n1 - 1, n2) + (1/3)x(n1, n2 + 1)   (2.23c)
y(n1, n2) = -(1/4)y(n1 + 1, n2 + 1) - (1/2)y(n1, n2 + 1) - (3/4)y(n1 + 1, n2) + (1/4)x(n1 + 1, n2 + 1)   (2.23d)

Even though all four equations in (2.23) are derived from the same difference equation, by proper interpretation they will correspond to four different specific computational procedures and therefore four different systems. In the interpretation we use, the left-hand side y(n1, n2) is always computed from the right-hand side expression for all (n1, n2). When this interpretation is strictly followed, each of the four equations in (2.23) will correspond to a different computational procedure. This will be clarified when we discuss Step 2.

It is often convenient to represent a specific computational procedure pictorially. Suppose we wish to pictorially represent the computational procedure corresponding to (2.23a). To do so, we consider computing y(0, 0). Since y(n1, n2) on the left-hand side is always computed from the right-hand side, y(0, 0) in this case is computed by

y(0, 0) ← -2y(-1, 0) - 3y(0, -1) - 4y(-1, -1) + x(0, 0).   (2.24)

We have used an arrow (←) to emphasize that y(0, 0) is always computed from the right-hand side. Equation (2.24) is represented in Figure 2.13. The value y(0, 0) that is computed is denoted by "×" in Figure 2.13(a). The values y(-1, 0), y(0, -1), and y(-1, -1) that are used in obtaining y(0, 0) are marked by a filled-in dot (•), with the proper coefficient attached to the corresponding point. The value x(0, 0) used in obtaining y(0, 0) is marked by a filled-in dot in Figure 2.13(b). To compute y(0, 0), therefore, we look at y(n1, n2) and x(n1, n2) at the points marked by filled-in dots, multiply each of these values by the corresponding scaling factor indicated, and sum all the terms. This is illustrated in Figure 2.13(c). Figure 2.13(a) is called the output mask and Figure 2.13(b) is called the input mask, since they are masks that are applied to the output and input to compute y(0, 0).

Figure 2.13 (a) Output mask and (b) input mask for the computational procedure in (2.24), and (c) graphical sketch of how y(0, 0) is computed.

In the illustration, the output and input masks are sketched for the case when y(0, 0) is computed. They are also very useful in visualizing how other points of y(n1, n2) are computed. Suppose we wish to compute y(3, 2) using the computational procedure in Figure 2.13. The points used in determining y(3, 2) are marked by filled-in dots in Figure 2.14 with the proper scaling factors attached. Figure 2.14 is simply a shifted version of Figure 2.13.

Figure 2.14 (a) Output mask and (b) input mask in Figure 2.13, shifted to illustrate (c) computation of y(3, 2).

As the above discussion suggests, a difference equation can lead to a number of different computational procedures. Which procedure is chosen from these possibilities depends on the context of the given problem. This point will be discussed after we discuss Steps 2 and 3.

Step 2. In this step, we determine Ry, the region of support of the system's output y(n1, n2). To determine Ry, we first determine Rh, the region of support of the system's impulse response. To see how Rh is determined, consider the following computational procedure:

y(n1, n2) ← -2y(n1 - 1, n2) - 3y(n1, n2 - 1) - 4y(n1 - 1, n2 - 1) + x(n1, n2).   (2.25)

Rh is the region of (n1, n2) for which y(n1, n2) is influenced by the impulse* δ(n1, n2) when we set x(n1, n2) = δ(n1, n2). Consider y(0, 0). From (2.25) or Figure 2.13, y(0, 0) is influenced by y(-1, 0), y(0, -1), y(-1, -1), and δ(0, 0). Clearly y(0, 0) is influenced by the impulse δ(n1, n2). Let us now consider y(1, 0), y(0, 1), and y(1, 1). From (2.25) or Figure 2.13,

y(1, 0) ← -2y(0, 0) - 3y(1, -1) - 4y(0, -1) + δ(1, 0)
y(0, 1) ← -2y(-1, 1) - 3y(0, 0) - 4y(-1, 0) + δ(0, 1)   (2.26)
y(1, 1) ← -2y(0, 1) - 3y(1, 0) - 4y(0, 0) + δ(1, 1).

Since δ(n1, n2) has already influenced y(0, 0), and y(0, 0) in turn influences y(1, 0), y(0, 1), and y(1, 1) from (2.26), y(1, 0), y(0, 1), and y(1, 1) will be influenced by the impulse δ(n1, n2). Now consider y(-1, 0). From (2.25) or Figure 2.13, y(-1, 0) is obtained from

y(-1, 0) ← -2y(-2, 0) - 3y(-1, -1) - 4y(-2, -1) + δ(-1, 0).   (2.27)

This is shown in Figure 2.15. The terms that influence y(-1, 0) in (2.27) are obtained from

y(-2, 0) ← -2y(-3, 0) - 3y(-2, -1) - 4y(-3, -1) + δ(-2, 0)

and similar expressions for y(-1, -1) and y(-2, -1).

*If the right-hand side of (2.25) contains an additional term x(n1 - 1, n2), then setting x(n1, n2) = δ(n1, n2) will cause two impulses, δ(n1, n2) and δ(n1 - 1, n2), to be present in (2.25). In this case, Rh is the region of (n1, n2) for which y(n1, n2) is influenced by either δ(n1, n2) and/or δ(n1 - 1, n2).
Figure 2.15 Output points that determine (a) y(-1, 0), (b) y(-2, 0), (c) y(-1, -1), and (d) y(-2, -1).

Step 3. In this step, we choose boundary conditions such that y(n1, n2) = 0 for all (n1, n2) ∈ R_BC. This is a necessary condition for the system to be linear. In the example considered in Step 2 above, the boundary conditions are chosen as

y(n1, n2) = 0 for n1 ≤ -1 or n2 ≤ -1.   (2.29)

Once the boundary conditions have been determined, we can compute the output y(n1, n2) from the specific computational procedure, the input, and the boundary conditions. For the specific computational procedure of (2.25) and the input x(n1, n2) in Figure 2.18(a), the output y(n1, n2) is shown in Figure 2.19.
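Steps 1-3 for the computational procedure of (2.23a)/(2.25) with zero boundary conditions can be sketched directly in code; the recursion below computes the resulting system's impulse response on a small grid (the direct recursion is illustrative, not an efficient implementation):

```python
# Computational procedure (2.23a)/(2.25):
#   y(n1, n2) <- -2 y(n1-1, n2) - 3 y(n1, n2-1) - 4 y(n1-1, n2-1) + x(n1, n2)
# with zero boundary conditions y(n1, n2) = 0 for n1 < 0 or n2 < 0.
def y(n1, n2, x):
    if n1 < 0 or n2 < 0:          # zero boundary conditions (Step 3)
        return 0.0
    return (-2 * y(n1 - 1, n2, x) - 3 * y(n1, n2 - 1, x)
            - 4 * y(n1 - 1, n2 - 1, x) + x(n1, n2))

delta = lambda n1, n2: 1.0 if (n1, n2) == (0, 0) else 0.0
N = 5                              # small illustrative grid
h = [[y(n1, n2, delta) for n2 in range(N)] for n1 in range(N)]
print(h[0][0], h[1][0], h[0][1], h[1][1])   # 1.0 -2.0 -3.0 8.0
```

With zero boundary conditions, doubling the input doubles the output and shifting the input shifts it, and the impulse response has first-quadrant support, consistent with procedure (2.23a) defining a first-quadrant support LSI system.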
Figure 2.16 Output points that influence y(-1, 0).

In the example above, we have assumed a specific computational procedure. As we mentioned before, many computational procedures are possible for a given difference equation. Which one is chosen depends on the problem context. The following example will show how this choice is made. Consider an infinite impulse response (IIR) filter whose system function H(z1, z2) is given by

H(z1, z2) = 1/(1 + 2z1^{-1} + 3z2^{-1} + 4z1^{-1} z2^{-1}).   (2.30)

Suppose the IIR filter is designed so that the impulse response of the designed system is as close as possible in some sense to an ideal impulse response that is a first-quadrant support sequence. Then we know that the filter is at least approximately a first-quadrant support system.* We can use that information to choose a specific computational procedure.

From (2.30), we can determine the corresponding difference equation by setting H(z1, z2) to Y(z1, z2)/X(z1, z2), cross-multiplying the resulting equation, and then performing the inverse z-transformation. The resulting difference equation is given by

y(n1, n2) + 2y(n1 - 1, n2) + 3y(n1, n2 - 1) + 4y(n1 - 1, n2 - 1) = x(n1, n2),   (2.31)

which is identical to (2.21). From (2.31), we can derive four possible computational procedures, as shown in (2.23). We have already analyzed the first computational procedure, given by (2.23a), in detail and have found that the corresponding h(n1, n2) is a first-quadrant support sequence. Therefore, this computational procedure is consistent with the assumed information in this problem. If we analyze the three other computational procedures in detail, we can easily show that none corresponds to a first-quadrant support system. For example, consider the second computational procedure, given by (2.23b). The output and input masks for this computational procedure are shown in Figures 2.20(a) and (b). The region of support Rh is the region of (n1, n2) for which y(n1, n2) is influenced by the impulse δ(n1, n2) when we set x(n1, n2) = δ(n1, n2). The region Rh in this case is shown in Figure 2.20(c). This computational procedure is clearly inconsistent with the assumed information in the problem. In a similar manner, we can show that the two remaining computational procedures, given by (2.23c) and (2.23d), are also inconsistent with the assumption that the system is at least approximately a first-quadrant support system.

*In practice, we may even know the specific output and input masks from the design procedure. In this case, choosing a specific computational procedure follows directly from the known output and input masks. This is discussed further in Chapter 5.

Figure 2.18 (a) x(n1, n2); (b) Ry; (c) R_BC. Ry and R_BC are the region of support of y(n1, n2) and the boundary conditions, respectively, for the computational procedure specified by (2.25).

Figure 2.19 Output of the LSI system corresponding to (2.25) when the input used is shown in Figure 2.18(a).

We have considered an example in which we began with H(z1, z2), x(n1, n2), and the approximate region of support of h(n1, n2), and determined the output y(n1, n2). This is how an IIR filter can be implemented. A summary of the steps involved is shown in Figure 2.21. From the system function H(z1, z2), we can obtain the difference equation. From the difference equation and the approximate region of support of h(n1, n2), we can determine a specific computational procedure. From the specific computational procedure and the region of support of x(n1, n2), we can determine the boundary conditions. From the specific computational procedure, the boundary conditions, and x(n1, n2), we can determine the output y(n1, n2).

The procedure discussed above can be viewed as a generalization of the initial rest condition. We can illustrate this with a specific example. Consider a 1-D difference equation given by
Figure 2.21 Summary of the steps involved in implementing an IIR filter:
from the system function, determine the difference equation; from the difference
equation and the approximate region of support of h(n1, n2), determine a computational
procedure; from the computational procedure and the region of support of x(n1, n2),
determine the boundary conditions; from the computational procedure, the boundary
conditions, and the input, compute the output.
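The steps summarized in Figure 2.21 end in a recursion. The sketch below runs such a recursion for an impulse input with zero boundary conditions; the coefficients are purely illustrative and are not those of any numbered equation in the text:

```python
import numpy as np

# Illustrative first-quadrant computational procedure (made-up coefficients):
#   y(n1, n2) = x(n1, n2) + 0.5*y(n1 - 1, n2) + 0.25*y(n1, n2 - 1)
# With x = delta and zero boundary conditions, the recursion yields h(n1, n2).
N = 8
h = np.zeros((N, N))
for n1 in range(N):
    for n2 in range(N):
        v = 1.0 if (n1 == 0 and n2 == 0) else 0.0   # impulse input
        if n1 > 0:
            v += 0.5 * h[n1 - 1, n2]
        if n2 > 0:
            v += 0.25 * h[n1, n2 - 1]
        h[n1, n2] = v

# All computed samples lie in the first quadrant, so this computational
# procedure corresponds to a first-quadrant support h(n1, n2).
print(h[0, 0], h[1, 0], h[0, 1])  # 1.0 0.5 0.25
```

Running the recursion for a delta input is exactly the "set x(n1, n2) = δ(n1, n2) and observe which outputs are influenced" test used in the text to find Rh.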
Figure 2.20 (a) Output mask, (b) input mask, and (c) region of support of the
impulse response for the computational procedure specified by (2.23b).

We assume that the input x(n) to the system is given by (2.33). The initial rest
condition in this example, from its definition, is given by (2.34). Equation (2.34)
can also be derived by using the procedure discussed in this section. From the
difference equation in (2.32), we choose the specific computational procedure given
by (2.35). To determine Rh, the region of support of h(n), we set x(n) = δ(n) and
determine the values of n for which y(n) is influenced by δ(n). From (2.35), Rh is
given by n ≥ 0. From Rx in (2.33) and Rh, Ry is given by n ≥ 0. The initial conditions,

The difference equation with boundary conditions plays a particularly important
role in digital signal processing, since it is the only practical way to realize an IIR
digital filter. As we discussed in previous sections, the difference equation can
be used as a recursive procedure in computing the output. We define a system
to be recursively computable when there exists a path we can follow in computing
every output point recursively, one point at a time. The example in Section 2.2.3
corresponds to a recursively computable system, and all our discussions in that
section were based on the implicit assumption that we were dealing with recursively
computable systems.

From the definition of a recursively computable system, it can easily be shown
that not all computational procedures resulting from difference equations are
recursively computable. For example, consider a computational procedure whose
output mask is shown in Figure 2.22. From the output mask, it is clear that
computing y(0, 0) requires y(1, 0) and computing y(1, 0) requires y(0, 0). Therefore,
we cannot compute y(0, 0) and y(1, 0) one at a time recursively, so the
example in Figure 2.22 is not a recursively computable system. It can be shown
that if a system is recursively computable, the output mask has wedge support.
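The circular dependence just described can be detected mechanically: build the dependency graph that an output mask implies on a small grid and search it for a cycle. The sketch below does this; the mask offsets are illustrative, not taken from a specific equation in the text:

```python
# A computational procedure is recursively computable only if the output
# dependencies contain no cycle.  Offset (di, dj) below means that
# y(n1, n2) uses y(n1 - di, n2 - dj).
def has_cycle(offsets, size=3):
    deps = {(i, j): [] for i in range(size) for j in range(size)}
    for (i, j) in deps:
        for (di, dj) in offsets:
            p = (i - di, j - dj)
            if p in deps:
                deps[(i, j)].append(p)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(deps, WHITE)
    def dfs(u):
        color[u] = GRAY
        for v in deps[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and dfs(u) for u in list(deps))

print(has_cycle([(1, 0), (-1, 0)]))  # True: y(0,0) and y(1,0) need each other
print(has_cycle([(1, 0), (0, 1)]))   # False: a first-quadrant (wedge) mask
```

The first mask references outputs on both sides of the current point, just as in the Figure 2.22 example, and the depth-first search finds the y(0, 0) ↔ y(1, 0) cycle.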
Examples of wedge support output masks are shown in Figure 2.23. The output
mask in Figure 2.23(a) is said to have quadrant support. The output masks in
Figures 2.23(c) and (d) are said to have nonsymmetric half plane support. Examples
of output masks that do not have wedge support are shown in Figure 2.24. The
output masks in Figures 2.24(a) and (b) are said to have half plane support. For
a finite-extent input x(n1, n2), the wedge support output mask is not only necessary
but also sufficient for the system to be recursively computable.
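For a finite output mask, wedge support can be checked directly: the mask points (other than the origin) must fit in a sector of angle less than 180 degrees, which happens exactly when some direction makes a strictly positive inner product with every mask point. A brute-force sketch over sampled directions:

```python
import numpy as np

# A finite mask has wedge support if some direction d satisfies d . p > 0
# for every mask point p besides the origin (a sector of angle < 180 deg).
def wedge_support(points, n_dirs=3600):
    pts = [p for p in points if p != (0, 0)]
    for t in np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False):
        d = (np.cos(t), np.sin(t))
        if all(d[0] * p[0] + d[1] * p[1] > 1e-12 for p in pts):
            return True
    return False

print(wedge_support([(1, 0), (0, 1), (1, 1)]))  # True: quadrant mask
print(wedge_support([(1, 0), (-1, 0)]))         # False: not a wedge
```

The second mask contains two opposite directions, so no half plane, and hence no wedge, can contain it.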
Figure 2.22 Example of an output mask corresponding to a system that is not
recursively computable.

For a recursively computable system, there may be many different paths we
can follow in computing all the output points needed. For example, consider a
computational procedure whose output and input masks are shown in Figure 2.25.
For the input x(n1, n2) shown in Figure 2.26(a), the boundary conditions we need
are shown in Figure 2.26(b). We can compute y(n1, n2) in many different orders,
using the graph shown in Figure 2.27. The figure shows which output points are
needed to compute a given output point recursively. For example, y(2, 0) and
y(1, 1) must be computed before we can compute y(2, 1). Specific orders that
can be derived from Figure 2.27 include

Figure 2.23 Examples of wedge support output masks. For a finite-extent input,
they correspond to recursively computable systems.

Figure 2.24 Examples of output masks of systems that are not recursively
computable.
Figure 2.27 Illustration of the output points that are required to compute a
given output point recursively, for the example considered in Figures 2.25 and
2.26. This graph can be used in determining an order in which all the output
points needed are computed recursively.
Figure 2.25 (a) Output mask and (b) input mask of a specific computational
procedure.

Figure 2.26 (a) Input x(n1, n2) and (b) resulting boundary conditions for the system
whose output and input masks are shown in Figure 2.25.

These three different orders are illustrated in Figure 2.28. Although there may
be many different orders in which y(n1, n2) can be computed, the result does not
depend on the specific order in which the output is computed.

2.2.5 Example

In this section, we will present one additional example to illustrate the steps in
Figure 2.21. Suppose we have designed an IIR digital filter whose system function
H(z1, z2) is given by (2.37). The IIR filter was designed such that the impulse
response of the designed system will be as close as possible in some sense to an
ideal impulse response that is a first-quadrant support sequence. Therefore, we
know that the filter is a first-quadrant support system. We wish to determine the
output of the filter when the input x(n1, n2) is given by (2.38). Since the only
practical way to implement an IIR filter is by using a difference equation, we first
convert (2.37) to a difference equation by

Figure 2.28 Possible ordering for computing the output of the system in Figure
2.25 with the input in Figure 2.26(a).
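The order-independence claim can be checked numerically: run the same first-quadrant recursion in two valid orders and compare. The coefficients below are illustrative, not those of any numbered equation:

```python
import numpy as np

# Illustrative first-quadrant recursion:
#   y(n1, n2) = x(n1, n2) + 0.5*y(n1 - 1, n2) + 0.25*y(n1, n2 - 1).
# Any computation order that respects the dependencies gives the same output.
def run(x, order):
    y = np.zeros_like(x)
    for (n1, n2) in order:
        y[n1, n2] = x[n1, n2]
        if n1 > 0:
            y[n1, n2] += 0.5 * y[n1 - 1, n2]
        if n2 > 0:
            y[n1, n2] += 0.25 * y[n1, n2 - 1]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
pts = [(i, j) for i in range(6) for j in range(6)]
row_major = sorted(pts)                                    # row by row
diagonal = sorted(pts, key=lambda p: (p[0] + p[1], p[0]))  # by anti-diagonals
assert np.allclose(run(x, row_major), run(x, diagonal))
print("both orders give the same output")
```

Both orders compute y(n1 - 1, n2) and y(n1, n2 - 1) before y(n1, n2), which is all the recursion requires.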
= x(n1, n2) + 2x(n1 - 1, n2). (2.39)

Since the IIR filter is an LSI system, we choose the proper set of boundary
conditions such that the difference equation will become an LSI system. There are
four specific computational procedures that correspond to the difference equation
in (2.39). One computational procedure is

y(n1, n2) = -(1/2)y(n1 - 1, n2) + (1/4)y(n1, n2 - 1) + (1/4)y(n1, n2 - 2)
+ x(n1, n2) + 2x(n1 - 1, n2). (2.40)
The second computational procedure can be obtained by

The output and input masks corresponding to each of these four computational
procedures are shown in Figure 2.29. From the output mask corresponding to
(2.43), we recognize that (2.43) is not a recursively computable system, and we
therefore eliminate (2.43) from further consideration. The region of support of
h(n1, n2) for each of the three remaining computational procedures is shown in
Figure 2.30. Since we know that the filter is a first-quadrant support system, we
choose the computational procedure given by (2.40). To determine the boundary
conditions for the computational procedure chosen, we determine Ry, the region
of support for the output sequence y(n1, n2), from Rx in (2.38) and Rh in Figure
2.30(a). The region Ry is shown in Figure 2.31(a). The boundary conditions that
we use are shown in Figure 2.31(b). With the boundary conditions shown in Figure

Figure 2.29 Output and input masks corresponding to the computational
procedures of (a) (2.40), (b) (2.41), (c) (2.42), and (d) (2.43).
Figure 2.30 Region of support of h(n1, n2) for systems specified by (a) (2.40),
(b) (2.41), and (c) (2.42).
2.31(b), the output y(n1, n2) can be computed recursively from (2.38) and (2.40).
The result is shown in Figure 2.32. We can verify that the computational procedure
is indeed an LSI system. If we double the input, the output also doubles. If we
shift the input, then the output shifts by a corresponding amount.

Figure 2.32 Output of the system corresponding to (2.40) when the input
x(n1, n2) is given by (2.38).
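Both checks can be carried out numerically. The sketch below uses a stand-in first-quadrant recursion with zero boundary conditions (illustrative coefficients, not those of (2.40)) and verifies the scaling and shifting properties:

```python
import numpy as np

def filt(x):
    # Stand-in first-quadrant recursion with zero boundary conditions:
    #   y(n1, n2) = x(n1, n2) + 0.5*y(n1 - 1, n2) + 0.25*y(n1, n2 - 1)
    y = np.zeros_like(x)
    for n1 in range(x.shape[0]):
        for n2 in range(x.shape[1]):
            y[n1, n2] = x[n1, n2]
            if n1 > 0:
                y[n1, n2] += 0.5 * y[n1 - 1, n2]
            if n2 > 0:
                y[n1, n2] += 0.25 * y[n1, n2 - 1]
    return y

x = np.zeros((8, 8))
x[0:3, 0:3] = np.arange(9.0).reshape(3, 3)   # small-support input

# Doubling the input doubles the output (homogeneity):
assert np.allclose(filt(2 * x), 2 * filt(x))

# Shifting the input shifts the output (shift invariance); the last row of
# x is zero, so a one-sample shift stays inside the grid:
x_shift = np.zeros_like(x)
x_shift[1:, :] = x[:-1, :]
y_shift = np.zeros_like(x)
y_shift[1:, :] = filt(x)[:-1, :]
assert np.allclose(filt(x_shift), y_shift)
print("scaling and shift checks pass")
```

With zero boundary conditions and a first-quadrant recursion, the shifted input produces exactly the shifted output, which is the behavior claimed above.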
Figure 2.31 (a) Ry and (b) RBC for the computational procedure given by (2.40)
and x(n1, n2) in (2.38).

where a(k1, k2) and b(k1, k2) are finite-extent sequences with nonzero amplitudes
in Ra and Rb, respectively. By z-transforming both sides of (2.44) and then solving
for H(z1, z2), we have
The denominator and numerator in (2.45) are the z-transforms of a(n1, n2) and
b(n1, n2), and therefore can be expressed as A(z1, z2) and B(z1, z2), respectively.
The system function H(z1, z2) can then be expressed as

H(z1, z2) = B(z1, z2)/A(z1, z2) for (|z1|, |z2|) in the ROC. (2.46)

The functions A(z1, z2) and B(z1, z2) are finite-order polynomials* in z1 and
z2. The function H(z1, z2), which is the ratio of two finite-order polynomials, is
called a rational z-transform. It is clear from (2.46) that the system function derived
from the difference equation of (2.44) is a rational z-transform. If A(z1, z2) cannot
be written as a product of two finite-order polynomials, it is said to be irreducible.
If there is no common factor (except a constant or linear phase term) that divides
both A(z1, z2) and B(z1, z2), A(z1, z2) and B(z1, z2) are said to be co-prime. A
rational H(z1, z2) is said to be irreducible if A(z1, z2) and B(z1, z2) are co-prime.
An irreducible H(z1, z2) does not imply, however, either an irreducible A(z1, z2)
or an irreducible B(z1, z2).

*The polynomial in z1 and z2 used here is a linear combination of 1, z1, z2, z1z2,
z1^-1, z2^-1, z1^-1 z2^-1, and so on. "Finite order" means that there is a finite number
of such terms.

2.3 STABILITY

our main interest is testing the stability of digital filters, we will consider the problem
of testing the system stability given H(z1, z2) and the region of support of h(n1, n2).

When H(z1, z2) is a system function of a digital filter, there are restrictions
imposed on H(z1, z2) and the region of support of h(n1, n2). One restriction is
that H(z1, z2) must be a rational z-transform, which can be expressed as

H(z1, z2) = B(z1, z2)/A(z1, z2), (2.48)

where A(z1, z2) and B(z1, z2) are finite-order polynomials in z1 and z2. We will
assume that A(z1, z2) and B(z1, z2) are co-prime, so H(z1, z2) is irreducible.*
Another restriction is that the system must be recursively computable. With these
two restrictions, the system can be realized by a difference equation with boundary
conditions and the output can be computed recursively, one value at a time.

In the 1-D case, when the system function H(z) is expressed as B(z)/A(z),
where there are no common factors between A(z) and B(z), B(z) does not affect
the system stability. In the 2-D case, however, the presence of B(z1, z2) in (2.48)
can stabilize an otherwise unstable system, even when there are no common factors
between A(z1, z2) and B(z1, z2). In this case, pole surfaces (A(z1, z2) = 0) and
zero surfaces (B(z1, z2) = 0) intersect at the unit surface and the specific values

*A finite shift of h(n1, n2) accounts for an arbitrary linear phase term that may have
been included as part of A(z1, z2).
H(z1, z2) = 1/A(z1, z2)

(Figure: a wedge support system converted, by a linear mapping of variables, to a
first-quadrant support system.)

A(z1, z2) = Σ(n1 = 0 to M) Σ(n2 = 0 to M) a(n1, n2) z1^-n1 z2^-n2, with a(0, 0) = 1. (2.49)
The first-quadrant support system with the above system function H(z1, z2) has
been shown [Goodman] to be stable, even though

In the following sections, we discuss several stability theorems and how they
can be used in testing the system stability.

Theorem 1.

Stability ⟺ A(z1, z2) ≠ 0 for any |z1| ≥ 1, |z2| ≥ 1. (2.53)
Let us first consider Condition (a). To satisfy Condition (a), we need to ensure
that A(z1, z2) is not zero for any (z1, z2) such that |z1| = 1 and |z2| ≥ 1. This
requires a 3-D search and is shown in the figure by the solid vertical line. To
ensure Condition (b), we need to ensure that A(z1, z2) is not zero for any (z1, z2)
such that |z1| ≥ 1 and |z2| = 1. This also requires a 3-D search and is shown in
the figure by the solid horizontal line. From the search point of view, then, the

Consider a fixed value of ω1, say ω1'. Then A(e^jω1', z2) is a 1-D polynomial in the
variable z2, and solving for all z2 such that A(e^jω1', z2) = 0 is equivalent to a 1-D
stability test with complex coefficients. If we vary ω1 continuously from 0 to 2π,
and we perform the 1-D stability test for each ω1, then we will find all possible
values of (ω1, z2) such that A(e^jω1, z2) = 0. In practice, we cannot change ω1
continuously, and must consider discrete values of ω1. We can obtain a table like
Table 2.3 by performing many 1-D stability tests. If we choose a sufficiently small
Δ, we can essentially determine all possible values of (ω1, z2) such that
A(e^jω1, z2) = 0. By checking if all the values of |z2| in Table 2.3 are less than 1,
we can satisfy Condition (a) without a 3-D search.
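A numerical sketch of this procedure: for each sampled ω1, form the 1-D polynomial in z2 and find its roots. The polynomial below is a separable, stable example chosen for illustration; it is not an equation from the text:

```python
import numpy as np

# A(z1, z2) = (1 - 0.5 z1^-1)(1 - 0.5 z2^-1); a[n1, n2] multiplies z1^-n1 z2^-n2.
a = np.array([[1.0, -0.5],
              [-0.5, 0.25]])

max_mag = 0.0
for w1 in np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False):
    # Coefficients of the 1-D polynomial in z2^-1 for this fixed w1:
    c = a[0, :] + a[1, :] * np.exp(-1j * w1)
    roots = np.roots(c)       # here: the single root in z2 of c[0]*z2 + c[1] = 0
    if roots.size:
        max_mag = max(max_mag, np.abs(roots).max())

# All root magnitudes below 1 means Condition (a) holds at the sampled w1.
print(max_mag < 1.0)  # True for this stable example
```

For this separable A the z2 root is 0.5 at every ω1, so the table of root magnitudes stays well inside the unit circle.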
Figure 2.35 Derivation of Theorem 2 from Theorem 1. A(z1, z2) ≠ 0 in the
shaded region in (a) and on the solid lines in (b). (a) Condition in Theorem 1;
(b) two conditions in Theorem 2.

There are various ways to improve the efficiency of the 1-D stability tests in
(2.57) for different values of ω1. As we change ω1 in (2.57) in a continuous manner,
the roots for z2 also change in a continuous manner. Therefore, once the roots
of A(e^jω1, z2) = 0 are known, they can be used as initial estimates for finding the
roots of A(e^j(ω1+Δ), z2) = 0 for any small Δ. Furthermore, if any of the roots has
magnitude greater than 1, the system is unstable and no additional tests are
necessary. In addition, if we sample ω1 at equally spaced points, then a fast Fourier
transform (FFT) algorithm can be used to compute the coefficients of the 1-D
polynomial in (2.57). Specifically, from (2.49) and (2.57),

where

For a sufficiently small M relative to N, computing c(k, n2) in (2.59b) directly is
the most efficient approach. Otherwise, using an FFT algorithm may be more
efficient. From (2.59b), c(k, n2) for a fixed n2 is the 1-D discrete Fourier transform
(DFT) of one row of a(n1, n2). By computing c(k, n2) using a 1-D FFT algorithm
for each n2 and storing the results, all the coefficients of the 1-D polynomial in
(2.57) for different values of ω1 will have been obtained.

Figure 2.38 Root Map 1 for each of the system functions in (2.60a) and (2.60b).
Root Map 1 is a sketch of the roots of A(e^jω1, z2), as we vary ω1 from 0 to 2π.
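The DFT relationship just described can be verified directly. In the sketch below, the coefficients c(k, n2) are computed both by the defining sum and with a zero-padded FFT along n1; the array contents are arbitrary illustrative values:

```python
import numpy as np

# c(k, n2): DFT over n1 of a(n1, n2), as described for (2.59b), two ways.
rng = np.random.default_rng(1)
a = rng.standard_normal((4, 3))   # a(n1, n2), illustrative values
N = 16                            # number of equally spaced w1 samples
direct = np.array([[sum(a[n1, n2] * np.exp(-2j * np.pi * k * n1 / N)
                        for n1 in range(a.shape[0]))
                    for n2 in range(a.shape[1])]
                   for k in range(N)])
via_fft = np.fft.fft(a, n=N, axis=0)   # zero-padded FFT along n1
assert np.allclose(direct, via_fft)
print("FFT and direct computation agree")
```

One FFT per n2 therefore supplies the 1-D polynomial coefficients for every sampled ω1 at once, which is the savings claimed above.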
Step 1. Solve for all (z1, z2) such that A(z1, z2) = 0 with |z2| = 1. This is
equivalent to solving for all (z1, ω2) such that A(z1, e^jω2) = 0.
Step 2. Check if all |z1| obtained in Step 1 are less than 1.

If we plot the values of z1 such that A(z1, e^jω2) = 0 in the z1 plane as we vary ω2
continuously from 0 to 2π, then the values of z1 will vary continuously, and the
resulting sketch will form another root map. We will refer to this as Root Map
2. Condition (b) in Theorem 2 is equivalent to having all the contours in Root
Map 2 lie inside the unit circle in the z1 plane. Two examples of Root Map 2
corresponding to (2.60a) and (2.60b) are shown in Figure 2.39. All the contours

Theorem 3.

Stability ⟺ (a) A(z1, z2) ≠ 0 for |z1| = 1, |z2| ≥ 1 (2.61)
and (b) A(z1, z2) ≠ 0 for |z1| ≥ 1, z2 = 1.
Condition (a) in this theorem is identical to Condition (a) in Theorem 2, and this
3-D search problem can be solved by solving many 1-D stability tests. To satisfy
Condition (b), we must ensure that A(z1, z2) is not zero for any (z1, z2) such that
|z1| ≥ 1 and z2 = 1. This corresponds to a 2-D search problem, where the space
to be searched is shown by the dashed line in the figure above. The dashed line
emphasizes that this is a 2-D search problem, in which the search is performed in
the 2-D subspace (|z1| ≥ 1, z2 = 1) of the 3-D space (|z1| ≥ 1, |z2| = 1).

The 2-D search problem corresponding to Condition (b) can be substituted
for by one 1-D stability test. To satisfy Condition (b), we must make sure that
A(z1, z2) ≠ 0 in the 2-D space corresponding to (|z1| ≥ 1, z2 = 1). The following
two steps comprise one approach to satisfying this condition.

Figure 2.40 Derivation of Theorem 3 from Theorem 2. A(z1, z2) ≠ 0 on the solid
lines in (a) and on the solid and dashed lines in (b). The dashed line emphasizes
that it corresponds to a 2-D space. (a) Conditions in Theorem 2; (b) conditions in
Theorem 3.

Stability ⟺ (a) A(z1, z2) ≠ 0 for |z1| ≥ 1, |z2| = 1
and (b) A(z1, z2) ≠ 0 for z1 = 1, |z2| ≥ 1.
Figure 2.43 (a) Root Map 1 and (b) Root Map 2 used in deriving Theorem 4
from Theorem 2. Condition (a) in Theorem 4 implies that each contour in the root
map is completely inside or outside the respective unit circle.

From the theorems discussed in Section 2.3.2, we can develop many different
methods useful for testing the stability of 2-D systems in practical applications.
One such method is shown in Figure 2.44. Test 1 in the figure checks Condition
(b) of Theorem 4. Test 2 checks Condition (c) of Theorem 4. Test 3 checks
Condition (a) of Theorem 4. Test 3 can be replaced by testing whether all the
contours in one of the two root maps are inside the unit circle. This checks
Condition (a) of Theorem 3. If the system passes all three tests, it is stable.
Otherwise, it is unstable.

Figure 2.42 Derivation of Theorem 4 from Theorem 2. A(z1, z2) ≠ 0 on the solid
lines in (a), and on the filled-in dot and dashed lines in (b). (a) Conditions in
Theorem 2; (b) conditions in Theorem 4.

To illustrate how the procedure in Figure 2.44 may be used in testing the
stability of a 2-D system, we consider three examples.
Example 2

Test 1: the root magnitude satisfies |z2| < 1, so Test 1 is passed.

Test 2: A(1, z2) = 0 yields a root with |z2| < 1, so Test 2 is passed.

Using Property 7 of the z-transform properties from Table 2.2, we have

From this relationship and the given H(z1, z2),
Since the coefficients b0, c0, . . . are obtained recursively one at a time, the procedure
can terminate early when the system is unstable. This method is computationally
very efficient, requiring approximately M^2 multiplications and M^2 additions.
Another advantage, common to all algebraic methods, is that the stability is exactly
determined in a finite number of steps, if infinite precision for the arithmetic is
assumed. Other methods, such as explicit root determination, lack this property.
This method has been reported to be computationally efficient and reliable for M
up to 20 or so, which covers most infinite impulse response (IIR) filters considered
in practice [O'Connor and Huang]. One disadvantage of this method is that it
does not tell us just how stable a filter is. The method can be used in determining
the number of roots inside the unit circle, but cannot be used in explicitly
determining the root locations.

Another approach to solving (2.63) is to exploit the Argument principle [Marden].
Consider the net change in the argument of A(z) in (2.63) as we follow the
unit circle contour given by z = e^jω from ω = 0 to ω = 2π in a counterclockwise
direction. Denoting the net argument change by Δθ_A(ω: 0, 2π), the Argument
principle states

where Nz is the number of zeros inside the unit circle. When all roots are inside
the unit circle so that Nz = M,

Δθ_A(ω: 0, 2π) = 0. (2.69)

From (2.69), one approach to testing the stability is to check if the net phase
change is zero. The net phase change can be determined by unwrapping the phase
[Oppenheim and Schafer]. The result of this procedure is a continuous phase,
which is well defined except when any root is on the unit circle. A typical phase
unwrapping procedure computes the principal values of the phase and/or the phase
derivatives at many frequencies ω using an FFT algorithm. Starting from ω = 0,
the continuity assumption of the unwrapped phase is used to find a continuous
phase function, which results in the unwrapped phase. If the unwrapped phase is

Figure 2.46 Root Map 1 for A'(z1, z2).

The stability test procedures we discussed in Section 2.3 are based on the notion
that a 2-D stability test can be performed by a set of many 1-D stability tests.
Although we can be more certain about the stability of a 2-D system by increasing
the number of 1-D stability tests, stability is not absolutely guaranteed with a finite
number of 1-D stability tests, even with infinite precision in arithmetic.

There does exist a class of methods, known as algebraic tests, which can
guarantee the stability of a system in a finite number of steps if infinite precision
in arithmetic is assumed. All existing methods of this type test the conditions in
Theorem 3 from Section 2.3.2 by modifying 1-D algebraic tests. In a 1-D algebraic
stability test, such as the modified Marden-Jury test discussed in Section 2.3.4, a
sequence of real numbers that contain the stability information is derived directly
from the 1-D polynomial coefficients. In a 2-D algebraic test, A(z1, z2) is viewed
as a 1-D polynomial with respect to one variable, say z1, and the complex
coefficients of the polynomial are themselves 1-D real polynomials with respect to the
other variable z2. Applying 1-D algebraic tests to this case results in a sequence
of real polynomials that contain the stability information. This sequence of real
polynomials must be checked as a function of z2 to determine the system stability.
Although algorithms of this class can guarantee stability in a finite number of steps,
they are very difficult to program and require a large number of computations and
infinite precision in arithmetic to guarantee stability. Therefore, they do not
appear to be very useful in practice.

Stability tests of another class are based on some properties of the complex
cepstrum. These are called cepstral methods. The 2-D complex cepstrum is a
straightforward extension of the 1-D complex cepstrum. One property of the
complex cepstrum is that a recursively computable system is stable if and only if
the sequence a(n1, n2) corresponding to the denominator polynomial A(z1, z2) and
its complex cepstrum â(n1, n2) have the same wedge-shape support. By computing
â(n1, n2) and checking its region of support, we can in principle test the stability.
Unfortunately, â(n1, n2) is typically an infinite-extent sequence despite the fact that
a(n1, n2) is a finite-extent sequence, and computer computation of â(n1, n2) leads
to an aliased version of the true â(n1, n2). In addition, computation of â(n1, n2)
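A sketch of the cepstral idea, including the aliasing caveat just mentioned: the complex cepstrum of a small first-quadrant a(n1, n2) (illustrative coefficients) is computed on an N x N FFT grid, and its energy is checked to lie in the first-quadrant, low-index corner:

```python
import numpy as np

# Illustrative first-quadrant denominator a(n1, n2); A(w1, w2) stays away
# from zero here, so the principal-value log is continuous.
a = np.zeros((2, 2))
a[0, 0], a[1, 0], a[0, 1] = 1.0, -0.5, -0.25
N = 64                                    # larger N reduces cepstral aliasing
A = np.fft.fft2(a, s=(N, N))
a_hat = np.real(np.fft.ifft2(np.log(A)))  # aliased complex cepstrum of a
# For this stable first-quadrant a(n1, n2), the cepstral energy should
# concentrate in the first-quadrant (low-index) corner of the grid.
q1 = np.abs(a_hat[:N // 2, :N // 2]).sum()
total = np.abs(a_hat).sum()
print(q1 / total > 0.99)
```

Only an aliased cepstrum is obtained this way; the check is meaningful here only because the true cepstrum decays quickly relative to the grid size.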
2.2. Let X(z1, z2) and Rx denote the z-transform and its ROC for x(n1, n2). For each of
the following sequences, determine the z-transform and ROC in terms of X(z1, z2)
and Rx.
(a) x(n1 + 5, n2 - 6)
(b) a^n1 x(n1, n2)
(c) n1 n2 x(n1, n2)
(d) x(n1, n2) * x(-n1, -n2)
2.3. Let X(z1, z2) and Rx denote the z-transform and its ROC for x(n1, n2). Suppose we
form a new sequence y(n1, n2) by the following linear mapping of variables:
y(n1, n2) = x(m1, m2) with m1 = -n1, m2 = -n2 + n1.
Determine Y(z1, z2) and Ry, the z-transform and its ROC for y(n1, n2).

2.4. Consider a sequence x(n1, n2) given by
x(n1, n2) = (1/2)^n1 (1/3)^n2 u(n1, n2).
The ROC of X(z1, z2), the z-transform of x(n1, n2), is sketched below:

(a) What constraints does the ROC of X(z1, z2) have to satisfy?
(b) Consider a sequence y(n1, n2) given by
y(n1, n2) = u(n1) δ(n2) + a^n2 δ(n1 - n2) u(-n1, -n2).
Determine Y(z1, z2) and its ROC. Is your answer consistent with your result in (a)?
2.8. Consider a separable system with system function H(z1, z2) = H1(z1)H2(z2). How is
the stability of the 2-D system H(z1, z2) related to the stability of the two 1-D systems
H1(z1) and H2(z2)?
2.9. Consider a sequence x(n1, n2) given by
x(n1, n2) = a^n1 b^n2 u(n1, n2).
(a) Determine the range of values of a and b for which x(n1, n2) is stable by imposing
the condition of absolute summability.
(b) Answer (a) by determining the ROC of X(z1, z2) and using a property of the
ROC.

Figure P2.4
2.10. Determine all sequences x(n1, n2) with the same z-transform, which is given by

Consider the 2-D surface represented by |z1| = 1/2 and |z2| = 1/3, shown as a filled-in
dot in the above figure. Determine and sketch the intersection of this 2-D surface
with the pole surfaces of X(z1, z2).

2.11. Consider a stable sequence x(n1, n2) whose z-transform X(z1, z2) is given by

X(z1, z2) = 1 / (1 - 2 z1^-1 z2^-1 - ...).

Determine x(-8, 1), the sequence x(n1, n2) evaluated at n1 = -8 and n2 = 1. Explain
your answer.

2.5. Consider a sequence x(n1, n2) given by

Determine and sketch the intersection of the pole surfaces of X(z1, z2) with the unit
surface.
(a) H(z1, z2) = z2^-1 / (1 - (1/2) z1^-1 z2^-1)

We wish to recover x(n1, n2) from y(n1, n2) with an LSI system with impulse response
f(n1, n2), as shown in the following figure. Determine f(n1, n2). Does f(n1, n2)
depend on the ROC of H(z1, z2)?

2.13. In this problem, we determine a stable sequence whose z-transform X(z1, z2) is
given by

X(z1, z2) = 1 / (1 - (1/2) z1^-1 - (1/4) z1^-1 z2^-1).
(a) Using the inverse z-transform formula, we can express x(n1, n2) as

For a fixed z2, the expression inside the parentheses can be viewed as the 1-D
inverse z-transform of G(z1). Determine g(n1), the 1-D inverse z-transform of
G(z1). The sequence g(n1) is a function of z2.
(b) Using the result of (a), we obtain

The right-hand side expression of this equation can be viewed as the 1-D inverse
z-transform of F(z2). Using the series expansion formula

x(n1, n2) = (1/2)^(n1+n2) n1! / ((n1 - n2)! n2!) for n1 ≥ 0, n2 ≥ 0, n1 ≥ n2,
and x(n1, n2) = 0 otherwise.
(c) Sketch x(n1, n2) in (b).

2.14. When the input x(n1, n2) to an LSI system is the unit step sequence u(n1, n2), the
output y(n1, n2) is

y(n1, n2) = (1/2)^(n1-1) u(n1 + 1, n2).

(a) Find the system function H(z1, z2) and sketch its pole surfaces.
(b) Determine the impulse response h(n1, n2).
(c) Is the system stable?

2.15. Consider an LSI system whose input and output are denoted by x(n1, n2) and

2.16. In general, a system corresponding to a 1-D difference equation with initial conditions
is neither linear nor shift invariant. As discussed in the text, one approach to forcing
the difference equation with initial conditions to result in an LSI system is to impose
the initial rest condition (IRC). Another approach is to impose the final rest condition
(FRC). In this problem, we show that both the IRC and the FRC can be viewed as
special cases of the following approach to obtaining the initial conditions.

Step 1. Interpret the difference equation as a specific computational procedure.
Step 2. Determine the initial conditions as follows.
(a) Determine Rh, the region of support of h(n).
(b) Determine Ry, the region of support of y(n).
(c) RIC is all n ∉ Ry.
Step 3. Initial conditions are given by y(n) = 0 for n ∈ RIC.

Consider the following difference equation:

(a) Show that the initial conditions obtained from the above three-step approach are
identical to the IRC. The IRC states that the output y(n) is zero for n < n0
whenever the input x(n) is zero for n < n0.
(b) If you choose a different computational procedure obtained from the same
difference equation, then the initial conditions obtained from the above three-step
approach are identical to the FRC. The FRC states that the output y(n) is zero
for n > n0 + m0 for some fixed finite m0 whenever the input x(n) is zero for
n > n0. Determine the computational procedure and show that the resulting
initial conditions are identical to the FRC.
(c) Is it possible to derive one more computational procedure from the same difference
equation? Is this computational procedure recursively computable?
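The closed form stated in Problem 2.13 can be cross-checked against the recursion implied by its X(z1, z2). This is a numerical sketch, assuming the denominator 1 - (1/2)z1^-1 - (1/4)z1^-1 z2^-1, which is consistent with the closed form shown in part (b):

```python
from math import comb

# Recursion implied by X(z1, z2) = 1/(1 - (1/2)z1^-1 - (1/4)z1^-1 z2^-1):
#   x(n1, n2) = d(n1, n2) + (1/2)x(n1 - 1, n2) + (1/4)x(n1 - 1, n2 - 1)
N = 10
x = [[0.0] * N for _ in range(N)]
for n1 in range(N):
    for n2 in range(N):
        v = 1.0 if n1 == n2 == 0 else 0.0
        if n1 > 0:
            v += 0.5 * x[n1 - 1][n2]
        if n1 > 0 and n2 > 0:
            v += 0.25 * x[n1 - 1][n2 - 1]
        x[n1][n2] = v

# Closed form from part (b): (1/2)^(n1+n2) n1!/((n1-n2)! n2!) on 0 <= n2 <= n1
for n1 in range(N):
    for n2 in range(N):
        closed = 0.5 ** (n1 + n2) * comb(n1, n2) if n2 <= n1 else 0.0
        assert abs(x[n1][n2] - closed) < 1e-12
print("closed form matches the recursion")
```

The agreement follows from Pascal's rule: C(n1, n2) = C(n1 - 1, n2) + C(n1 - 1, n2 - 1), which is exactly what the two recursion terms contribute.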
The IIR filter is an LSI system. Suppose the filter was designed by attempting to
approximate a desired second-quadrant support sequence hd(n1, n2) with h(n1, n2).
We wish to filter the following 2 x 2-point input x(n1, n2).
(a) Determine the output mask and input mask for the system.
(b) Determine the impulse response of the above LSI system. Your answer should
be a closed-form expression. (Hint: Consider doing an inverse z-transform.)

Figure P2.17

The constraint map shows that for any (z1', z2') in the ROC, all (z1, z2) such that
|z1| ≥ |z1'| and |z2| ≤ |z2'| are also in the ROC.

2.17. Consider the following difference equation.
2.18. Consider the following difference equation:

A digital filter of this type is called a nonsymmetric half plane filter. Show that a
filter whose output and input masks are shown in the following figure is a
nonsymmetric half plane filter.

Figure P2.19

Figure P2.22b
2.23. Consider a 2-D LSI system with its system function denoted by H(z1, z2). Consider
the following four pieces of information.

(f) H(z1, z2) = 1 / (1 - 0.8 z1^-1 - 0.7 z2^-1 + 0.56 z1^-1 z2^-1)

Figure P2.23a

(I4) The input to the system, x(n1, n2), is a 3 x 3-point sequence given by

2.26. Consider a first-quadrant support system with the system function H(z1, z2) given by

H(z1, z2) = 1 / (1 - a z1^-1 - b z2^-1).

Determine the conditions on a and b for which the system is stable.
2.27. Consider an LSI system whose output mask is sketched below.
(a) Suppose that only the information in (11). (I2), and (14) is available. How many
different systems would satisfy the constraints imposed by the available infor-
mation?
(b) Suppose all the above information in ( I l ) , (I2), (I3), and (14) is available. Sketch
the output and input masks and determine y(0, O), the response of the system
evaluated at n, = n, = 0.
2.24. When the input x(n,, n,) is a finite-extent sequence, a necessary and sufficient condition
I Figure P2.27
for a computational procedure to be recursively computable is that the output mask Determine whether or not the system is stable.
where A(z1, z2) is a finite-order polynomial in the two variables. It is known that a
necessary and sufficient condition for the stability of the system is given by

A(z1, z2) ≠ 0 for (a) |z1| = 1, |z2| ≥ 1
and (b) |z1| ≤ 1, |z2| = 1.

(a) Show that the following set of conditions is equivalent to the above set of
conditions.
i. A(z1, z2) ≠ 0 for |z1| = |z2| = 1
ii. A(z1, -j) ≠ 0 for |z1| ≤ 1
iii. A(j, z2) ≠ 0 for |z2| ≥ 1
(b) Suppose that for every ω2 we find all values of z1 such that A(z1, 3e^jω2) = 0.
Suppose we sketch all the above values of z1 in the z1 plane. If the system is
stable, where do the values of z1 lie in the z1 plane?
2.30. We have a one-quadrant support system whose system function H(z1, z2) is given by ... where a is a real constant. Determine the range of a for which the system is stable.

2.31. Consider a first-quadrant support LSI system whose system function is denoted by H(z1, z2). ... Show the above theorem. You may use any of the stability theorems discussed in Section 2.3.2.
2.32. Theorem 2 from Section 2.3.2 suggests a procedure that involves performing many
1-D stability tests twice, while Theorem 3 from the same section suggests a procedure
that involves performing many 1-D stability tests once. It would appear, therefore,
that Theorem 3 reduces the problem complexity by half. This is not the case. Suppose we fix the total number of 1-D stability tests. We can use all of them to test
Condition (a) of Theorem 3. Alternatively, we can use half of them to test Condition
(a) of Theorem 2 and the remaining half to test Condition (b) of Theorem 2. Explain
why one approach is not necessarily better than the other by relating each of the two
approaches to Condition (a) of Theorem 4.
Chap. 2 Problems
The z-Transform Chap. 2
3.0 INTRODUCTION

In many signal processing applications, such as image processing, we deal with finite-extent sequences. For such sequences, the Fourier transform and z-transform uniformly converge and are well defined. The Fourier transform and z-transform representations X(ω1, ω2) and X(z1, z2), however, are functions of continuous variables (ω1, ω2) and (z1, z2). For finite-extent sequences, which can be represented by a finite number of values, the Fourier transform and z-transform are not efficient frequency domain representations. The discrete Fourier transform represents a finite-extent sequence in the frequency domain with a finite number of values.

In this chapter, we study the discrete Fourier transform and algorithms to compute it efficiently. We also study the discrete cosine transform, which is closely related to the discrete Fourier transform. In Section 3.1, we discuss the discrete Fourier series representation of periodic sequences. The discrete Fourier series representation can be used in deriving the discrete Fourier transform representation. In Section 3.2, the discrete Fourier transform representation of finite-extent sequences is discussed. In Section 3.3, the discrete cosine transform representation of finite-extent sequences is discussed. In Section 3.4, we discuss algorithms that can be used in computing the discrete Fourier transform efficiently.

3.1 THE DISCRETE FOURIER SERIES

The discrete Fourier series (DFS) is a frequency domain representation of a periodic sequence. We will begin with a discussion of the DFS, since it naturally leads to the discrete Fourier transform (DFT), the main topic of this chapter.

A sequence x̃(n1, n2) is said to be periodic with a period of N1 × N2 when x̃(n1, n2) = x̃(n1 + N1, n2) = x̃(n1, n2 + N2) for all (n1, n2). Since x̃(n1, n2) r1^{−n1} r2^{−n2} is not absolutely summable for any r1 and r2, neither the Fourier transform nor the z-transform uniformly converges for a periodic sequence.

As in the 1-D case, a periodic sequence x̃(n1, n2) with a period of N1 × N2 can be obtained by appropriately combining complex exponentials of the form X̃(k1, k2) e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2}. The exponential sequence X̃(k1, k2) e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2} for 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1 represents all complex exponential sequences that are periodic with a period of N1 × N2. The sequence X̃(k1, k2), which represents the amplitude associated with the complex exponential, can be obtained from x̃(n1, n2). The relationship between x̃(n1, n2) and X̃(k1, k2) is given by

X̃(k1, k2) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x̃(n1, n2) e^{−j(2π/N1)k1n1} e^{−j(2π/N2)k2n2}   (3.1a)

x̃(n1, n2) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} X̃(k1, k2) e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2}.   (3.1b)

Equation (3.1a) shows how the amplitude X̃(k1, k2) associated with the exponential e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2} can be determined from x̃(n1, n2). The sequence X̃(k1, k2) is called the DFS coefficients of x̃(n1, n2). Equation (3.1b) shows how the complex exponentials X̃(k1, k2) e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2} are specifically combined to form x̃(n1, n2). The sequence x̃(n1, n2) is called the inverse DFS of X̃(k1, k2). By combining (3.1a) and (3.1b), they can easily be shown to be consistent with each other.

From (3.1a) and (3.1b), x̃(n1, n2) is represented by X̃(k1, k2) for 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1. The sequence X̃(k1, k2) can, therefore, be defined arbitrarily for (k1, k2) outside 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1. For convenience and by convention, X̃(k1, k2) is defined to be periodic with a period of N1 × N2, with one period of X̃(k1, k2) given by (3.1a) for 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1. Also by convention, (3.1b) has a scaling factor of 1/(N1N2). This scaling factor can be distributed to (3.1a) and (3.1b) equally as 1/√(N1N2) if desired. Since x̃(n1, n2) and X̃(k1, k2) are periodic with a period of N1 × N2 in the variables (n1, n2) and (k1, k2), respectively, and since e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2} is periodic with a period of N1 × N2 in both sets of variables (n1, n2) and (k1, k2), the limits of summation in (3.1a) and (3.1b) can be over any one period.

We can derive a number of useful properties from the DFS pair of (3.1a) and (3.1b). Some of the more important properties are listed in Table 3.1. These are essentially the same as in the 1-D case, except Property 4, which applies to separable 2-D sequences. Property 2 states that when two sequences x̃(n1, n2) and ỹ(n1, n2) are periodically convolved, their DFS coefficients X̃(k1, k2) and Ỹ(k1, k2) multiply. The periodic convolution, denoted by a circled asterisk (⊛), is very similar in form to the linear convolution. The difference lies in the limits of the summation. Specifically, x̃(l1, l2) ỹ(n1 − l1, n2 − l2) is summed over only one period (0 ≤ l1 ≤ N1 − 1, 0 ≤ l2 ≤ N2 − 1) in the periodic convolution x̃(n1, n2) ⊛ ỹ(n1, n2), while x(l1, l2) y(n1 − l1, n2 − l2) is summed over all values of l1 and
l2 in the linear convolution x(n1, n2) * y(n1, n2). An example of x̃(n1, n2) ⊛ ỹ(n1, n2) is shown in Figure 3.1. Figures 3.1(a) and (b) show one period of x̃(n1, n2) and ỹ(n1, n2), each periodic with a period of 3 × 2. Figures 3.1(c) and (d) show one period of x̃(l1, l2) and ỹ(n1 − l1, n2 − l2)|_{n1=n2=0}.

Figure 3.1 Example of periodic convolution. Only one period is shown. (a) Periodic sequence x̃(n1, n2) with a period of 3 × 2; (b) periodic sequence ỹ(n1, n2) with a period of 3 × 2; (c) x̃(l1, l2); (d) ỹ(n1 − l1, n2 − l2)|_{n1=n2=0}; (e) x̃(n1, n2) ⊛ ỹ(n1, n2) computed with an assumed period of 3 × 2.

TABLE 3.1 PROPERTIES OF THE DISCRETE FOURIER SERIES

x̃(n1, n2), ỹ(n1, n2): periodic with a period of N1 × N2
x̃(n1, n2) ↔ X̃(k1, k2), ỹ(n1, n2) ↔ Ỹ(k1, k2)
N1 × N2-point DFS and IDFS are assumed.

Property 1. Linearity
a x̃(n1, n2) + b ỹ(n1, n2) ↔ a X̃(k1, k2) + b Ỹ(k1, k2)

Property 2. Periodic Convolution
x̃(n1, n2) ⊛ ỹ(n1, n2) ↔ X̃(k1, k2) Ỹ(k1, k2)

Property 3. Multiplication
x̃(n1, n2) ỹ(n1, n2) ↔ (1/(N1N2)) X̃(k1, k2) ⊛ Ỹ(k1, k2)

Property 7. Initial Value and DC Value Theorem
(a) x̃(0, 0) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} X̃(k1, k2)
(b) X̃(0, 0) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x̃(n1, n2)

Property 8. Symmetry Properties
(a) x̃*(n1, n2) ↔ X̃*(−k1, −k2)
(b) real x̃(n1, n2):
    X̃(k1, k2) = X̃*(−k1, −k2)
    X̃_R(k1, k2) = X̃_R(−k1, −k2)
    X̃_I(k1, k2) = −X̃_I(−k1, −k2)
    |X̃(k1, k2)| = |X̃(−k1, −k2)|
    θ_X̃(k1, k2) = −θ_X̃(−k1, −k2)
The result of x̃(n1, n2) ⊛ ỹ(n1, n2) at n1 = n2 = 0 can be obtained by multiplying x̃(l1, l2) and ỹ(n1 − l1, n2 − l2)|_{n1=n2=0} and then summing over one period. One period of x̃(n1, n2) ⊛ ỹ(n1, n2) is shown in Figure 3.1(e). In the figure, x̃(n1, n2) ⊛ ỹ(n1, n2) is computed explicitly by using the periodic convolution sum. An alternative way to compute x̃(n1, n2) ⊛ ỹ(n1, n2) is by performing the inverse DFS operation of X̃(k1, k2) Ỹ(k1, k2).

In the frequency domain, the DFT X(k1, k2) is formed from the DFS coefficients as X(k1, k2) = X̃(k1, k2) R_{N1×N2}(k1, k2), where R_{N1×N2}(k1, k2) is defined by (3.2b). This operation is also invertible, since X̃(k1, k2) can be obtained from X(k1, k2) by periodic replication with a period of N1 × N2. From the above, the sequence x(n1, n2) is related to X(k1, k2) by a chain of invertible operations.
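The alternative route just described (multiply the DFS coefficients, then take the inverse DFS) can be checked numerically. The sketch below is an illustration, not from the text; it assumes numpy, whose `fft2` over one period evaluates exactly the double sum in (3.1a):

```python
import numpy as np

N1, N2 = 3, 2                      # one period of each sequence
rng = np.random.default_rng(0)
x = rng.standard_normal((N1, N2))  # one period of x~(n1, n2)
y = rng.standard_normal((N1, N2))  # one period of y~(n1, n2)

# DFS coefficients (3.1a): fft2 over one period computes the same double sum.
X = np.fft.fft2(x)
Y = np.fft.fft2(y)

# Periodic convolution, summed over one period (0 <= l1 <= N1-1, 0 <= l2 <= N2-1).
conv = np.zeros((N1, N2))
for n1 in range(N1):
    for n2 in range(N2):
        for l1 in range(N1):
            for l2 in range(N2):
                conv[n1, n2] += x[l1, l2] * y[(n1 - l1) % N1, (n2 - l2) % N2]

# Property 2: the DFS coefficients of the periodic convolution are X~ Y~.
assert np.allclose(np.fft.fft2(conv), X * Y)
```

Equivalently, one period of the periodic convolution is the inverse DFS of the product X̃(k1, k2) Ỹ(k1, k2), which is how the computation is usually done in practice.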
3.2 THE DISCRETE FOURIER TRANSFORM

3.2.1 The Discrete Fourier Transform Pair

From the periodic sequence x̃(n1, n2), we can form a finite-extent sequence x(n1, n2) by preserving one period of x̃(n1, n2) but setting all other values to zero. Specifically, we have

x(n1, n2) = x̃(n1, n2) R_{N1×N2}(n1, n2).   (3.2a)

An example of x̃(n1, n2) and x(n1, n2) when N1 = 3 and N2 = 2 is shown in Figure 3.2. Clearly, the operation given by (3.2) is invertible, in that x̃(n1, n2) can be determined from x(n1, n2). From the above, the sequence x(n1, n2) is related to X(k1, k2) by

x(n1, n2) ←→ x̃(n1, n2) ←→ X̃(k1, k2) ←→ X(k1, k2)   (3.6)

where "←→" denotes an invertible operation. The relationship between x(n1, n2) and X(k1, k2) can be easily obtained from (3.6) and the DFS pair in (3.1). The DFT pair is given by

X(k1, k2) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x(n1, n2) e^{−j(2π/N1)k1n1} e^{−j(2π/N2)k2n2}, 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1; 0, otherwise.   (3.7)

x(n1, n2) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} X(k1, k2) e^{j(2π/N1)k1n1} e^{j(2π/N2)k2n2}, 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1; 0, otherwise.

TABLE 3.2 PROPERTIES OF THE DISCRETE FOURIER TRANSFORM

x(n1, n2), y(n1, n2) = 0 outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1
x(n1, n2) ↔ X(k1, k2), y(n1, n2) ↔ Y(k1, k2)
N1 × N2-point DFT and IDFT are assumed.

Property 1. Linearity
a x(n1, n2) + b y(n1, n2) ↔ a X(k1, k2) + b Y(k1, k2)

Property 2. Circular Convolution
x(n1, n2) ⊛ y(n1, n2) ↔ X(k1, k2) Y(k1, k2)

Property 9. Symmetry Properties
(a) x*(n1, n2) ↔ X̃*(−k1, −k2) R_{N1×N2}(k1, k2) = X*((−k1)_{N1}, (−k2)_{N2})
(b) real x(n1, n2):
    X(k1, k2) = X̃*(−k1, −k2) R_{N1×N2}(k1, k2)
    X_R(k1, k2) = X̃_R(−k1, −k2) R_{N1×N2}(k1, k2)
    X_I(k1, k2) = −X̃_I(−k1, −k2) R_{N1×N2}(k1, k2)
    |X(k1, k2)| = |X̃(−k1, −k2)| R_{N1×N2}(k1, k2)
Property 4. Multiplication
x(n1, n2) y(n1, n2) ↔ (1/(N1N2)) X(k1, k2) ⊛ Y(k1, k2)

Property 7. Initial Value and DC Value Theorem
(a) x(0, 0) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} X(k1, k2)

Property 8. Parseval's Theorem

Some of the more important properties are listed in Table 3.2. Most of these properties are straightforward extensions of 1-D results.

To illustrate how DFT properties can be obtained from (3.6) and the DFS properties discussed in Section 3.1, we will derive the circular convolution property. Consider the periodic convolution property in Table 3.1:

x̃(n1, n2) ⊛ ỹ(n1, n2) ↔ X̃(k1, k2) Ỹ(k1, k2).   (3.11)

We define x(n1, n2) ⊛ y(n1, n2), the circular convolution of x(n1, n2) and y(n1, n2), by

x(n1, n2) ⊛ y(n1, n2) = [x̃(n1, n2) ⊛ ỹ(n1, n2)] R_{N1×N2}(n1, n2).   (3.12)

From (3.11) and (3.12),

x(n1, n2) ⊛ y(n1, n2) ↔ X(k1, k2) Y(k1, k2),   (3.13)

which is the desired result.

Under some conditions, the result of circular convolution is identical to the result of linear convolution. Suppose f(n1, n2) is zero outside 0 ≤ n1 ≤ N1′ − 1, 0 ≤ n2 ≤ N2′ − 1 and g(n1, n2) is zero outside 0 ≤ n1 ≤ N1″ − 1, 0 ≤ n2 ≤ N2″ − 1. Then Property 3 states that f(n1, n2) ⊛ g(n1, n2) = f(n1, n2) * g(n1, n2) if f(n1, n2) ⊛ g(n1, n2) is obtained with an assumed periodicity of N1 × N2 such that N1 ≥ N1′ + N1″ − 1 and N2 ≥ N2′ + N2″ − 1.
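This equivalence can be checked numerically. The sketch below (an illustration, with sizes chosen arbitrarily; numpy assumed) computes the circular convolution through the DFT with the assumed period (N1′ + N1″ − 1) × (N2′ + N2″ − 1) and compares it with direct linear convolution:

```python
import numpy as np

f = np.arange(6.0).reshape(2, 3)     # N1' x N2' = 2 x 3
g = np.ones((3, 2))                  # N1'' x N2'' = 3 x 2
N1, N2 = 2 + 3 - 1, 3 + 2 - 1        # assumed period 4 x 4, large enough

# Circular convolution via the DFT (Eq. 3.13): zero-pad both sequences
# to the assumed period N1 x N2 before transforming.
F = np.fft.fft2(f, (N1, N2))
G = np.fft.fft2(g, (N1, N2))
circ = np.real(np.fft.ifft2(F * G))

# Direct linear convolution for comparison: add shifted, scaled copies of g.
lin = np.zeros((N1, N2))
for n1 in range(2):
    for n2 in range(3):
        lin[n1:n1 + 3, n2:n2 + 2] += f[n1, n2] * g

assert np.allclose(circ, lin)
```

If the assumed period were smaller than N1′ + N1″ − 1 in either dimension, the two results would differ because of aliasing (wrap-around) in the circular convolution.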
where L1L2 is the number of segments in the image. Convolving x(n1, n2) with h(n1, n2) and using the distributive property of convolution, we obtain

x(n1, n2) * h(n1, n2) = Σ_i Σ_j x_ij(n1, n2) * h(n1, n2).

Since x_ij(n1, n2) is a sequence whose region of support is much smaller than that of x(n1, n2), x_ij(n1, n2) * h(n1, n2) can be computed by performing the inverse DFT operation of X_ij(k1, k2) H(k1, k2) with a much smaller size DFT and IDFT. Since the DFT and IDFT used are much smaller in size, this computation requires much less memory. The overlap-add method is so named because the results of x_ij(n1, n2) * h(n1, n2) are overlapped and added, as shown in Figure 3.5.

Figure 3.5 Overlap-add method.

In the overlap-save method, we again consider a small segment of x(n1, n2). Let h(n1, n2) be an M1 × M2-point sequence, zero outside 0 ≤ n1 ≤ M1 − 1, 0 ≤ n2 ≤ M2 − 1. Consider a segment of x(n1, n2). The segment is zero outside 0 ≤ n1 ≤ N1′ − 1, 0 ≤ n2 ≤ N2′ − 1. We will denote the segment by x′(n1, n2). We choose N1′ and N2′ sufficiently large relative to M1 and M2, but sufficiently small relative to the image size:

M1 << N1′ << N1   (3.16a)
M2 << N2′ << N2.   (3.16b)

The circular convolution of x′(n1, n2) with h(n1, n2) agrees with the linear convolution for M1 − 1 ≤ n1 ≤ N1′ − 1, M2 − 1 ≤ n2 ≤ N2′ − 1, where the circular convolution is based on the assumed periodicity of N1′ × N2′. The linear convolution can be performed by using (3.17), as is illustrated in Figure 3.6. We first segment the output y(n1, n2) into small nonoverlapping segments y_ij(n1, n2) of N1″ × N2″ points so that y(n1, n2) = Σ_i Σ_j y_ij(n1, n2).
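The overlap-add procedure can be sketched in code. The helper name and segment sizes below are illustrative assumptions, not from the text: each nonoverlapping input block is convolved with h(n1, n2) through a small DFT, and the partial results are overlapped and added.

```python
import numpy as np

def overlap_add_2d(x, h, B1=8, B2=8):
    """Linear convolution x * h by the overlap-add method (sketch)."""
    M1, M2 = h.shape
    out = np.zeros((x.shape[0] + M1 - 1, x.shape[1] + M2 - 1))
    F1, F2 = B1 + M1 - 1, B2 + M2 - 1          # DFT size per segment
    H = np.fft.fft2(h, (F1, F2))
    for i in range(0, x.shape[0], B1):
        for j in range(0, x.shape[1], B2):
            seg = x[i:i + B1, j:j + B2]
            y = np.real(np.fft.ifft2(np.fft.fft2(seg, (F1, F2)) * H))
            y = y[:seg.shape[0] + M1 - 1, :seg.shape[1] + M2 - 1]
            out[i:i + y.shape[0], j:j + y.shape[1]] += y   # overlap and add
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((20, 20))
h = rng.standard_normal((4, 4))
# Reference: one large DFT of size (20+4-1) x (20+4-1).
full = np.real(np.fft.ifft2(np.fft.fft2(x, (23, 23)) * np.fft.fft2(h, (23, 23))))
assert np.allclose(overlap_add_2d(x, h), full)
```

The memory advantage described above comes from the DFT size per segment, (B1 + M1 − 1) × (B2 + M2 − 1), being much smaller than the full image size.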
3.3 THE DISCRETE COSINE TRANSFORM

The discrete cosine transform (DCT) is extensively used. The DCT is the most widely used transform in a class of image coding systems known as transform coders; this is discussed further in Chapter 10. In this section, we discuss the DCT, which is closely related to the DFT.

We derive the DCT pair by relating the N-point sequence x(n) to a new 2N-point sequence y(n), which is then related to its 2N-point DFT Y(k), which in turn is related to the N-point DCT Cx(k):

x(n) → y(n) → Y(k) → Cx(k)   (3.19)
(N-point)   (2N-point)   (2N-point DFT)   (N-point)

The sequence x(n) is related to y(n) by

y(n) = x(n) + x(2N − 1 − n)   (3.20)
     = x(n),            0 ≤ n ≤ N − 1
       x(2N − 1 − n),   N ≤ n ≤ 2N − 1.

An example of x(n) and y(n) when N = 4 is shown in Figure 3.7. The sequence y(n) is symmetric with respect to the half-sample point at n = N − 1/2. When we form a periodic sequence x̃(n) by repeating x(n) every N points, x̃(n) has artificial discontinuities, since the beginning and end parts of x(n) are joined in the repetition process. When we form a periodic sequence ỹ(n) by repeating y(n) every 2N points, however, ỹ(n) no longer contains the artificial discontinuities. This is shown in Figure 3.8 for the x(n) and y(n) shown in Figure 3.7. As we discuss in Chapter 10, eliminating the artificial boundary discontinuities contributes to the energy compaction property that is exploited in transform image coding.

The 2N-point DFT Y(k) is related to y(n) by

Y(k) = Σ_{n=0}^{2N−1} y(n) W_{2N}^{nk}, 0 ≤ k ≤ 2N − 1.   (3.21a)

The N-point DCT of x(n), Cx(k), is obtained from Y(k) by

Cx(k) = W_{2N}^{k/2} Y(k), 0 ≤ k ≤ N − 1; 0, otherwise.   (3.24)

With a change of variables and after some algebra, (3.22) can be expressed as

Cx(k) = Σ_{n=0}^{N−1} 2x(n) cos((π/2N) k(2n + 1)), 0 ≤ k ≤ N − 1; 0, otherwise.   (3.25)

Equation (3.25) is the definition of the DCT of x(n). From (3.25), Cx(k) is an N-point sequence, and therefore N values of x(n) are represented by N values of Cx(k). If x(n) is real, Cx(k) is real. If x(n) is complex, so is Cx(k).

To derive the inverse DCT relation, we relate Cx(k) to Y(k), Y(k) to y(n), and then y(n) to x(n). We first consider determining Y(k) from Cx(k). Although Y(k) is a 2N-point sequence and Cx(k) is an N-point sequence, redundancy in Y(k) due to the symmetry of y(n) allows us to determine Y(k) from Cx(k). Specifically, from (3.23),

Y(k) = W_{2N}^{−k} Y(2N − k), 0 ≤ k ≤ 2N − 1,

which forces Y(N) = 0. From (3.24) and (3.26),

Y(k) = W_{2N}^{−k/2} Cx(k),          0 ≤ k ≤ N − 1
       0,                            k = N   (3.27)
       −W_{2N}^{−k/2} Cx(2N − k),    N + 1 ≤ k ≤ 2N − 1.

Figure 3.7 Example of (a) x(n) and (b) y(n) = x(n) + x(2N − 1 − n). The sequence y(n) is used in the intermediate step in defining the discrete cosine transform of x(n).

Figure 3.8 Periodic sequences x̃(n) and ỹ(n) obtained from x(n) and y(n) in Figure 3.7.
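The three-step route (form y(n), take the 2N-point DFT, multiply by W_{2N}^{k/2}) can be checked against the direct sum in (3.25), and the inverse relation can be checked as well. A sketch, assuming numpy (variable names are illustrative):

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
x = rng.standard_normal(N)

# Step 1: y(n) = x(n) + x(2N - 1 - n), the symmetric extension (3.20).
y = np.concatenate([x, x[::-1]])

# Step 2: 2N-point DFT of y(n).
Y = np.fft.fft(y)

# Step 3: Cx(k) = W_2N^(k/2) Y(k), with W_2N = exp(-j 2 pi / 2N).
k = np.arange(N)
Cx = np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y[:N])

# Direct evaluation of (3.25): Cx(k) = sum_n 2 x(n) cos(pi k (2n+1) / 2N).
n = np.arange(N)
direct = np.array([np.sum(2 * x * np.cos(np.pi * kk * (2 * n + 1) / (2 * N)))
                   for kk in k])
assert np.allclose(Cx, direct)

# Inverse relation: x(n) = (1/N) sum_k w(k) Cx(k) cos(pi k (2n+1) / 2N),
# with w(0) = 1/2 and w(k) = 1 otherwise.
w = np.ones(N); w[0] = 0.5
xrec = np.array([np.sum(w * Cx * np.cos(np.pi * k * (2 * nn + 1) / (2 * N))) / N
                 for nn in n])
assert np.allclose(xrec, x)
```

Note that the product W_{2N}^{k/2} Y(k) is real for real x(n), as (3.25) requires.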
The inverse relation obtained in this way, (3.30), can also be expressed as

x(n) = (1/N) Σ_{k=0}^{N−1} w(k) Cx(k) cos((π/2N) k(2n + 1)), 0 ≤ n ≤ N − 1; 0, otherwise   (3.31a)

where

w(k) = 1/2, k = 0; 1, 1 ≤ k ≤ N − 1.   (3.31b)

Equation (3.31) is the inverse DCT relation. From (3.25) and (3.31),

Discrete Cosine Transform Pair

Cx(k) = Σ_{n=0}^{N−1} 2x(n) cos((π/2N) k(2n + 1)), 0 ≤ k ≤ N − 1; 0, otherwise   (3.32a)

x(n) = (1/N) Σ_{k=0}^{N−1} w(k) Cx(k) cos((π/2N) k(2n + 1)), 0 ≤ n ≤ N − 1; 0, otherwise.   (3.32b)

From the derivation of the DCT pair, the DCT and inverse DCT can each be computed in three steps. In computing the DCT and inverse DCT, Steps 1 and 3 are computationally quite simple. Most of the computations are in Step 2, where a 2N-point DFT is computed for the DCT and a 2N-point inverse DFT is computed for the inverse DCT. The DFT and inverse DFT can be computed by using fast Fourier transform (FFT) algorithms. In addition, because y(n) has symmetry, the 2N-point DFT and inverse DFT can be computed (see Problem 3.20) by computing the N-point DFT and the N-point inverse DFT of an N-point sequence. Therefore, the computation involved in using the DCT is essentially the same as that involved in using the DFT.

In the derivation of the DCT pair, we have used an intermediate sequence y(n) that has symmetry and whose length is even. The DCT we derived is thus called an even symmetrical DCT. It is also possible to derive the odd symmetrical DCT pair in the same manner. In the odd symmetrical DCT, the intermediate sequence y(n) used has symmetry, but its length is odd. For the sequence x(n) shown in Figure 3.9(a), the sequence y(n) used is shown in Figure 3.9(b). The length of y(n) is 2N − 1, and ỹ(n), obtained by repeating y(n) every 2N − 1 points, has no artificial discontinuities. The detailed derivation of the odd symmetrical DCT is considered in Problem 3.22. The even symmetrical DCT is more commonly used, since the odd symmetrical DCT involves computing an odd-length DFT, which is not very convenient when one is using FFT algorithms.
Computation of Discrete Cosine Transform

Step 1. Form the 2N-point sequence y(n) from x(n) by (3.20).
Step 2. Compute Y(k), the 2N-point DFT of y(n).
Step 3. Obtain Cx(k) from Y(k) by (3.24).

For the inverse DCT, the corresponding three steps are to obtain Y(k) from Cx(k) by (3.27), to compute y(n) from Y(k) by the 2N-point inverse DFT, and to recover x(n) = y(n) for 0 ≤ n ≤ N − 1.

The 1-D DCT discussed in Section 3.3.1 can be extended straightforwardly to two dimensions. Let x(n1, n2) denote a 2-D sequence of N1 × N2 points that is zero outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1. We can derive the 2-D DCT pair by relating x(n1, n2) to a new 2N1 × 2N2-point sequence y(n1, n2), which is then related to its 2N1 × 2N2-point DFT Y(k1, k2). We then relate Y(k1, k2) to Cx(k1, k2), the N1 × N2-point DCT. Specifically,

x(n1, n2) → y(n1, n2) → Y(k1, k2) → Cx(k1, k2)   (3.33)
(N1 × N2-point)   (2N1 × 2N2-point)   (2N1 × 2N2-point DFT)   (N1 × N2-point)

The sequence x(n1, n2) is related to y(n1, n2) by

y(n1, n2) = x(n1, n2) + x(2N1 − 1 − n1, n2) + x(n1, 2N2 − 1 − n2) + x(2N1 − 1 − n1, 2N2 − 1 − n2).   (3.34)

An example of x(n1, n2) and y(n1, n2) when N1 = 3, N2 = 4 is shown in Figures 3.10(a) and (b), respectively. A periodic sequence x̃(n1, n2) with a period of N1 × N2 obtained by repeating x(n1, n2) is shown in Figure 3.11(a). A periodic sequence ỹ(n1, n2) with a period of 2N1 × 2N2 obtained by repeating y(n1, n2) is shown in Figure 3.11(b). The artificial discontinuities present in x̃(n1, n2) are not present in ỹ(n1, n2).

The sequence y(n1, n2) is related to Y(k1, k2) by the 2N1 × 2N2-point DFT relation (3.35). The N1 × N2-point DCT of x(n1, n2), Cx(k1, k2), is obtained from Y(k1, k2) by

Cx(k1, k2) = W_{2N1}^{k1/2} W_{2N2}^{k2/2} Y(k1, k2), 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1; 0, otherwise.   (3.36)

From (3.34), (3.35), and (3.36), and after some algebra,

Cx(k1, k2) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} 4x(n1, n2) cos((π/2N1) k1(2n1 + 1)) cos((π/2N2) k2(2n2 + 1)), 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1; 0, otherwise.   (3.37)

Equation (3.37) is the definition of the 2-D DCT.

The inverse DCT can be derived by relating Cx(k1, k2) to Y(k1, k2), exploiting the redundancy in Y(k1, k2) due to the symmetry of y(n1, n2), relating Y(k1, k2) to y(n1, n2) through the inverse DFT relationship, and then relating y(n1, n2) to x(n1, n2). The result is

x(n1, n2) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} w1(k1) w2(k2) Cx(k1, k2) cos((π/2N1) k1(2n1 + 1)) cos((π/2N2) k2(2n2 + 1)), 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1; 0, otherwise   (3.38)

where w1(k1) = 1/2 for k1 = 0 and 1 for 1 ≤ k1 ≤ N1 − 1, and w2(k2) is defined similarly.

Figure 3.10 Example of (a) x(n1, n2) and (b) y(n1, n2) = x(n1, n2) + x(2N1 − 1 − n1, n2) + x(n1, 2N2 − 1 − n2) + x(2N1 − 1 − n1, 2N2 − 1 − n2). The sequence y(n1, n2) is used in the intermediate step in defining the discrete cosine transform of x(n1, n2).

Figure 3.11 Periodic sequences x̃(n1, n2) and ỹ(n1, n2) obtained from x(n1, n2) and y(n1, n2) in Figure 3.10.
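The 2-D pair in (3.37) and (3.38) can be exercised numerically. The sketch below (numpy assumed; names are illustrative) writes the cosine kernels as matrices so that the separable structure of the 2-D DCT is explicit:

```python
import numpy as np

N1, N2 = 3, 4
rng = np.random.default_rng(3)
x = rng.standard_normal((N1, N2))

n1, n2 = np.arange(N1), np.arange(N2)
k1, k2 = np.arange(N1), np.arange(N2)
# Cosine kernel matrices: C1[k1, n1] = cos(pi k1 (2 n1 + 1) / (2 N1)), etc.
C1 = np.cos(np.pi * np.outer(k1, 2 * n1 + 1) / (2 * N1))
C2 = np.cos(np.pi * np.outer(k2, 2 * n2 + 1) / (2 * N2))

# Forward 2-D DCT (3.37): separable, so apply the 1-D kernels along each axis.
Cx = 4 * C1 @ x @ C2.T

# Inverse 2-D DCT (3.38) with w(0) = 1/2, w(k) = 1 otherwise.
w1 = np.ones(N1); w1[0] = 0.5
w2 = np.ones(N2); w2[0] = 0.5
xrec = (C1.T @ (np.outer(w1, w2) * Cx) @ C2) / (N1 * N2)
assert np.allclose(xrec, x)
```

Because the kernel is separable, the 2-D DCT can be computed as N2 one-dimensional DCTs along the columns followed by N1 one-dimensional DCTs along the rows, mirroring the row-column decomposition used for the DFT.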
sequence. The DCT and inverse DCT can be computed by the same three-step procedure as in 1-D, using 2N1 × 2N2-point transforms. The computations required for the DCT are essentially the same as those required for the DFT. The DCT discussed above is the 2-D even symmetrical discrete cosine transform. The 2-D odd symmetrical discrete cosine transform can also be derived.

From (3.37) and (3.38),

Two-Dimensional Discrete Cosine Transform Pair

Cx(k1, k2) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} 4x(n1, n2) cos((π/2N1) k1(2n1 + 1)) cos((π/2N2) k2(2n2 + 1)), 0 ≤ k1 ≤ N1 − 1, 0 ≤ k2 ≤ N2 − 1; 0, otherwise   (3.39a)

x(n1, n2) = (1/(N1N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} w1(k1) w2(k2) Cx(k1, k2) cos((π/2N1) k1(2n1 + 1)) cos((π/2N2) k2(2n2 + 1)), 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1; 0, otherwise.   (3.39b)

3.3.3 Properties of the Discrete Cosine Transform

We can derive many useful DCT properties from the DCT pair in (3.39). However, the relationship in (3.33) and the DFT properties discussed in Section 3.2 are often more convenient and easier to use in deriving DCT properties. Some of the more important properties are listed in Table 3.3.

To illustrate how DCT properties can be obtained from (3.33) and the DFT properties, we will derive Property 3, which is analogous to Parseval's theorem for the DFT. From Parseval's theorem in Table 3.2,

Σ_{n1=0}^{2N1−1} Σ_{n2=0}^{2N2−1} |y(n1, n2)|² = (1/(4N1N2)) Σ_{k1=0}^{2N1−1} Σ_{k2=0}^{2N2−1} |Y(k1, k2)|².   (3.40)

From (3.33) and (3.34), ...

TABLE 3.3 PROPERTIES OF THE DISCRETE COSINE TRANSFORM

x(n1, n2), y(n1, n2) = 0 outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1
x(n1, n2) ↔ Cx(k1, k2), y(n1, n2) ↔ Cy(k1, k2)

Property 4. Symmetry Properties
(a) x*(n1, n2) ↔ Cx*(k1, k2)
(b) real x(n1, n2) ↔ real Cx(k1, k2)
Consider a sequence x(n) that is zero for n < 0. To derive the cosine transform relation, we relate x(n) to a new sequence r(n), which is then related to its Fourier transform R(ω). We can then relate R(ω) to the cosine transform Cx(ω). Specifically,

R(ω) = e^{jω/2} Σ_{n=0}^{∞} 2x(n) cos(ω(n + 1/2))

and the cosine transform is

Cx(ω) = Σ_{n=0}^{∞} 2x(n) cos(ω(n + 1/2)).   (3.49)

Equation (3.49) is the definition of the cosine transform of x(n). Note that Cx(ω) is an even function of ω. Noting this, from (3.45) and (3.52) we have

x(n) = (1/2π) ∫_{−π}^{π} Cx(ω) cos(ω(n + 1/2)) dω, n ≥ 0; 0, otherwise.   (3.53)

Equation (3.53) is the inverse cosine transform relation. Equations (3.49) and (3.53) form the cosine transform pair. For a 2-D sequence x(n1, n2) that is zero outside n1 ≥ 0, n2 ≥ 0, the corresponding definition is

Cx(ω1, ω2) = Σ_{n1=0}^{∞} Σ_{n2=0}^{∞} 4x(n1, n2) cos(ω1(n1 + 1/2)) cos(ω2(n2 + 1/2)).   (3.58)

The discrete cosine transform is closely related to samples of Cx(ω1, ω2) at equally spaced points on the Cartesian grid.

Some properties of the cosine transform:

x(n1, n2), y(n1, n2) = 0 outside n1 ≥ 0, n2 ≥ 0
x(n1, n2) ↔ Cx(ω1, ω2), y(n1, n2) ↔ Cy(ω1, ω2)

Property 1. Linearity
a x(n1, n2) + b y(n1, n2) ↔ a Cx(ω1, ω2) + b Cy(ω1, ω2)

Property 2. Separable Sequence
x(n1, n2) = x1(n1) x2(n2) ↔ Cx(ω1, ω2) = Cx1(ω1) Cx2(ω2)

Property 3. Energy Relationship

3.4 THE FAST FOURIER TRANSFORM

The 2-D DFT can be computed by row-column decomposition: we first compute f(k1, n2) from x(n1, n2) and then X(k1, k2) from f(k1, n2). Consider a fixed n2, say n2 = 0. Then x(n1, n2)|_{n2=0} represents a row of x(n1, n2), and f(k1, n2)|_{n2=0} is the 1-D N1-point DFT of x(n1, n2)|_{n2=0} with respect to the variable n1. Therefore, f(k1, 0) can be computed from x(n1, n2) by computing one 1-D N1-point DFT. Since there are N2 different values of n2 in f(k1, n2) that are of interest to us, f(k1, n2) can be computed from x(n1, n2) by computing N2 1-D N1-point DFTs. This is illustrated in Figure 3.14.
Property 4. Symmetry Properties
(a) Cx(ω1, ω2) = Cx(−ω1, ω2) = Cx(ω1, −ω2) = Cx(−ω1, −ω2)
(b) x*(n1, n2) ↔ Cx*(ω1, ω2)

Once f(k1, n2) is computed, from (3.63) we can compute X(k1, k2) from f(k1, n2) by computing, for each of the N1 values of k1, the 1-D N2-point DFT of f(k1, n2) with respect to the variable n2.

Figure 3.15 Computation of X(k1, k2) from f(k1, n2) by computing N1 1-D N2-point DFTs.

The arithmetic savings of row-column decomposition are summarized below for N1 = N2 = 512:

                                      Number of multiplications   Number of additions
                                      (N1 = N2 = 512)             (N1 = N2 = 512)
Direct computation                    N1²N2²  (100%)              N1²N2²  (100%)
Row-column decomposition with
direct 1-D DFT computation            N1N2(N1 + N2)  (0.4%)       N1N2(N1 + N2)  (0.4%)
Row-column decomposition with
1-D FFT algorithm                     (N1N2/2) log2(N1N2)         N1N2 log2(N1N2)
                                      (0.003%)                    (0.007%)
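The two passes of row-column decomposition can be written directly with 1-D FFTs. A sketch, assuming numpy:

```python
import numpy as np

N1, N2 = 4, 8
rng = np.random.default_rng(4)
x = rng.standard_normal((N1, N2)) + 1j * rng.standard_normal((N1, N2))

# Pass 1: N2 one-dimensional N1-point DFTs, one per column index n2,
# producing f(k1, n2).
f = np.fft.fft(x, axis=0)

# Pass 2: N1 one-dimensional N2-point DFTs, one per row index k1,
# producing X(k1, k2).
X = np.fft.fft(f, axis=1)

# The row-column result agrees with the 2-D DFT.
assert np.allclose(X, np.fft.fft2(x))
```

This is exactly how library 2-D FFT routines are commonly implemented, which is one reason row-column decomposition is preferred in practice.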
Figure 3.21 Second stage of Eklundh's method. (a) Input to the second stage, showing exchanges; (b) output of the second stage.

Figure 3.19 Transformation of one N × N-point x(n1, n2) transposition problem to four N/2 × N/2-point sequence transposition problems.
straightforward, we will only summarize the main results. Keeping track of the data indices, however, is more complicated in 2-D, so careful attention should be paid to them.

Consider an N1 × N2-point sequence x(n1, n2) that is complex and is zero outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1. We will assume that N1 = N2 = N = 2^M for convenience and consider the extension of the 1-D decimation-in-time algorithm. Rewriting X(k1, k2) in terms of the four N/2 × N/2-point DFTs at the four nodes in the figure, denoted by a, b, c, and d, we can express X(k1, k2), X(k1 + N/2, k2), X(k1, k2 + N/2), and X(k1 + N/2, k2 + N/2) by

X(k1, k2) = A + C.   (3.73a)
From Figure 3.23, we can also determine the number of I/O operations required when data are stored on slow memory media such as disk. We will assume, as in Section 3.4.2, that each row is stored as one block on the disk, and that 2N words of fast memory, which can store two rows of data, are available. In the vector radix FFT, we first need to perform bit reversal of the input. From Figure 3.23, two rows of the bit-reversed data come from two rows of the input. We can read these two rows of x(n1, n2) into fast memory, bit-reverse the data within the fast memory with respect to the n1 dimension, and then write the data back to the disk. Since each row is read and written once, the total number of I/O operations for the complete bit reversal is 2N. Now consider one stage of the FFT algorithm, shown in the enclosed box in Figure 3.23. Again, two complete rows of the output come from two complete rows of the input. Therefore, the two rows of input can be read and stored in the fast memory. Within the fast memory, the necessary linear combinations are performed, and the results are written back to the disk. Since each row is read from the disk once and written to the disk once, the total number of I/O operations involved in this stage is 2N. There are log2 N stages, so we have a total of 2N + 2N log2 N I/O operations, where the 2N comes from the bit-reversal operation. This is approximately the same number of I/O operations required in the row-column decomposition approach.

The complete vector radix FFT algorithm for the N1 = N2 = 4 case is shown in Figure 3.25 (decimation-in-space 2 × 2 vector radix FFT; for clarity, many values are omitted). This is obtained by carrying out one additional stage in Figure 3.23, where each N/2 × N/2-point DFT computation is transformed to four N/4 × N/4-point DFT computations. The N/4 × N/4-point DFT in this case is a 1 × 1-point DFT, which is an identity. The blocks of input in Figure 3.25 are arranged in the following order: first, bottom left [g_ee(n1, n2)]; next, bottom right [g_oe(n1, n2)]; then top left [g_eo(n1, n2)]; and last, top right [g_oo(n1, n2)]. The output blocks are arranged in the same manner. The four elements in each block are also arranged in this order. From Figure 3.25, it is clear that data indexing using the 1-D flowgraph representation is complicated. It is often easier to use the 2-D representation, for instance the one in Figure 3.23, in studying a vector radix FFT algorithm.

Vector radix FFT algorithms are roughly equivalent in many respects to the row-column decomposition approach discussed in Section 3.4.1, and do not offer any significant advantages. The number of arithmetic operations required, the number of I/O operations required when data are stored on disk, and the in-place computation aspect are roughly the same. Since the row-column decomposition method is quite simple to implement by using 1-D FFT algorithms, it is used more often than the vector radix FFT algorithm.

3.4.4 Fast Algorithms for One-Dimensional Discrete Fourier Transform Computation

As we discussed in Section 3.4.1, efficient computation of a 2-D DFT is closely related to efficient computation of a 1-D DFT. In this section, we discuss two fast 1-D DFT computation algorithms that are significantly different from the Cooley-Tukey FFT algorithm.

Shortly after Cooley and Tukey published the FFT algorithm, there was a flurry of activity: identifying earlier related work,* extending the idea, and applying

*It has recently been discovered that Gauss used the Cooley-Tukey FFT algorithm in the early nineteenth century [Heideman et al.].
the result to practical problems. By the early 1970s, most of the interesting theoretical developments were thought to have been completed. In 1976, however, Winograd proposed an algorithm that significantly reduces the number of multiplications compared to the Cooley-Tukey approach. For some values of N, the number of multiplications required to compute an N-point DFT is proportional to N, not N log2 N. This led to another cycle of identifying earlier related work, extending the idea, and applying the result to practical algorithms. Now that the dust has settled somewhat, two algorithms appear to be worth considering. They are the prime factor algorithm (PFA), which was originally developed by [Good] and later developed further by [Kolba and Parks] and [Burrus and Eschenbacher], and the Winograd Fourier transform algorithm (WFTA). In this section, we briefly summarize the basic ideas behind these two algorithms and their practical significance. A detailed treatment of this topic requires a background in number theory, and the interested reader is referred to [McClellan and Rader].

Both the PFA and the WFTA are based on three key ideas. The first is the recognition that an N-point DFT computation for the case when N is a prime number or a power of a prime number can be expressed as a circular convolution. The result for the case when N is prime was originally described by [Rader], and was primarily a curiosity when the discovery was published in 1968. To illustrate this result, let us consider an example for the case N = 5. The 1-D DFT of a 5-point sequence x(n) is given by

X(k) = Σ_{n=0}^{4} x(n) W5^{nk}, 0 ≤ k ≤ 4   (3.74)

where W5 = e^{−j(2π/5)}. We first define X̄(k) by

X̄(k) = Σ_{n=1}^{4} x(n) W5^{nk}.   (3.75)

From (3.74) and (3.75), X(k) can be obtained from X̄(k) by

X(k) = x(0) + X̄(k).   (3.76)

We will now illustrate that X̄(k) can be expressed as the result of a 4-point circular convolution (N − 1 being equal to four). We define three new sequences u(n), w(n), and v(n). The sequence u(n) is obtained from x(n) by

u(0) = x(1)   (3.77a)
u(1) = x(3)   (3.77b)
u(2) = x(4)   (3.77c)
u(3) = x(2).   (3.77d)

The sequence w(n) is obtained from W5 by

w(0) = W5
w(1) = W5²
w(2) = W5⁴
w(3) = W5³.

The sequence v(n) is obtained from X̄(k) by

v(0) = X̄(1)   (3.79a)
v(1) = X̄(2)   (3.79b)
v(2) = X̄(4)   (3.79c)
v(3) = X̄(3).   (3.79d)

From (3.77), (3.78), and (3.79), it can be shown (see Problem 3.33) after some algebra that

v(n) = u(n) ⊛ w(n)   (3.80)

with an assumed periodicity of N − 1 for the circular convolution. This method can be used in computing a DFT by performing a circular convolution. Specifically, from x(n) and W5, u(n) and w(n) are obtained. From (3.80), v(n) is computed by a 4-point circular convolution of u(n) and w(n). Using (3.79), we obtain X̄(k) from v(n). Using (3.76), we obtain X(k) from X̄(k). This is illustrated in Figure 3.26. It is clear that a 5-point DFT can be computed from the result of a 4-point circular convolution. Since an N-point DFT for the case when N is a prime or a power of a prime can always be expressed in terms of an (N − 1)-point circular convolution, an efficient method for circular convolution potentially leads to an efficient method for DFT computation.

The second key idea, due to Winograd, is the result that the number of multiplications (assuming that only a certain type of multiplications is counted) required for a circular convolution has a theoretical minimum, and that this minimum can be achieved. As an example, Figure 3.27 shows the computation of a 3-point circular convolution of x(n) and h(n). In the figure, the coefficients associated with h(n) are assumed to have been precomputed.
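The N = 5 construction in (3.74) through (3.80) can be verified numerically. The sketch below assumes numpy and computes the 4-point circular convolution through the 1-D DFT (equivalent to evaluating the periodic sum directly); some of the index assignments above were only partially legible, so treat this as an illustration of Rader's idea rather than a transcription of the book's figure:

```python
import numpy as np

N = 5
rng = np.random.default_rng(5)
x = rng.standard_normal(N)
W5 = np.exp(-2j * np.pi / N)

# Reference DFT (3.74) and Xbar(k) = X(k) - x(0), per (3.75)-(3.76).
n = np.arange(N)
X = np.array([np.sum(x * W5 ** (n * k)) for k in range(N)])
Xbar = X - x[0]

u = np.array([x[1], x[3], x[4], x[2]])          # (3.77)
w = np.array([W5, W5 ** 2, W5 ** 4, W5 ** 3])   # reordered powers of W5

# 4-point circular convolution (3.80), computed via the 1-D DFT.
v = np.fft.ifft(np.fft.fft(u) * np.fft.fft(w))

# (3.79): v(n) matches Xbar at indices 1, 2, 4, 3.
assert np.allclose(v, [Xbar[1], Xbar[2], Xbar[4], Xbar[3]])
```

The reorderings are modular powers of a generator of the nonzero residues mod 5, which is what converts the DFT sum over n = 1, ..., 4 into a circular convolution.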
When implemented by using the scheme in Figure 3.27, a 3-point circular convolution requires four multiplications. Winograd's findings indicate that this is the minimum possible for this case.

Figure 3.27 Computation of 3-point circular convolution of two 3-point sequences x(n) and h(n) with four multiplications, the theoretically minimum number. After [McClellan and Rader].

These two key ideas provide an approach to computing the DFT when its size N is a prime or a power of a prime. The result of applying these ideas to the problem of computing a 3-point DFT is given by [McClellan and Rader]:

X(2) = c1 − m1.   (3.81i)

The above procedure requires one multiplication, one shift (multiplication by 1/2), and six additions. Efficient methods of computing the DFT based on the two key ideas have been developed for many different DFT sizes, including 2, 3, 2², 5, 7, 2³, 3², 11, 13, 2⁴, 17, 19, 5², and 2⁵. Fewer multiplications are required for these cases than for a Cooley-Tukey FFT algorithm for a DFT computation of similar size.

The third key idea is that when N can be factored as N = P1 · P2 · ... · PM, where the set {Pi} is mutually prime, the 1-D DFT can be expressed as a multidimensional (m-D) DFT of size P1 × P2 × ... × PM. A set {Pi}, 1 ≤ i ≤ M, is called mutually prime if the only common divisor of all the Pi is 1. This particular mapping of a 1-D problem to an m-D problem was first proposed by [Good]. The method is based on the Chinese remainder theorem, which requires that all the factors be mutually prime. The requirement that the factors be mutually prime eliminates the important case of N = 2^M. As an example, consider N = P1 · P2 = 2 · 3. Clearly 2 and 3 are mutually prime. From x(n), we define a new 2 × 3-point sequence w(n1, n2) by

w(0, 0) = x(0)   (3.83a)
...

From X(k), we define a new 2 × 3-point sequence W(k1, k2) by

W(0, 0) = X(0)   (3.84a)
...

From (3.83), (3.84), and the 2-D DFT definition of (3.7), it can be shown that W(k1, k2) is the 2 × 3-point DFT of w(n1, n2). Thus, from x(n), a 2-D sequence w(n1, n2) is obtained by using (3.83); the 2-D DFT W(k1, k2) is next computed from w(n1, n2); and the 1-D DFT X(k) is then obtained from W(k1, k2) by using (3.84). This is illustrated in Figure 3.28.

Figure 3.28 1-D DFT computation by m-D DFT computation.
As ?! i~creases,however, :he mcthod becomes quite complicated, a i d the number
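The prime-length construction above (a 5-point DFT from a 4-point circular convolution) can be sketched numerically. The code below is a generic illustration of Rader's index mapping rather than a reproduction of the book's Figure 3.26 signal flow; the function name `rader_dft_5` and the direct evaluation of the length-4 circular convolution are choices made for clarity, not the book's.

```python
import numpy as np

def rader_dft_5(x):
    # A 5-point DFT via a 4-point circular convolution (Rader's mapping).
    # g = 2 is a primitive root mod 5: 2^0..2^3 = 1, 2, 4, 3 (mod 5).
    p, g = 5, 2
    perm = [pow(g, m, p) for m in range(p - 1)]     # g^m mod 5:  [1, 2, 4, 3]
    iperm = [pow(g, -m, p) for m in range(p - 1)]   # g^{-m} mod 5
    u = np.array([x[q] for q in perm], dtype=complex)  # permuted input
    w = np.exp(-2j * np.pi * np.array(iperm) / p)      # permuted twiddles
    # 4-point circular convolution (done directly here; by FFT in practice)
    v = np.array([sum(u[m] * w[(r - m) % 4] for m in range(4))
                  for r in range(4)])
    X = np.empty(p, dtype=complex)
    X[0] = x.sum()
    for r, k in enumerate(iperm):
        X[k] = x[0] + v[r]          # X(g^{-r}) = x(0) + v(r)
    return X

x = np.arange(5, dtype=float)
assert np.allclose(rader_dft_5(x), np.fft.fft(x))
```

The permutations turn the nontrivial DFT outputs into a circular convolution of the permuted input with permuted twiddle factors, which is exactly where an efficient circular convolution algorithm can be substituted.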
Sec. 3.4 The Fast Fourier Transform
180 The Discrete Fourier Transform Chap. 3
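Good's mapping of a 1-D DFT to an m-D DFT can also be checked numerically. The sketch below uses one common variant of the index maps (the specific maps of (3.83) and (3.84) are not reproduced in this excerpt), and `good_mapping_dft` is an illustrative name, not a library function.

```python
import numpy as np

def good_mapping_dft(x, P1, P2):
    # 1-D N-point DFT (N = P1*P2, gcd(P1, P2) = 1) computed as a
    # P1 x P2-point 2-D DFT with no twiddle factors (Good's mapping).
    N = P1 * P2
    # Input map (one common variant): n = (P2*n1 + P1*n2) mod N
    w = np.empty((P1, P2), dtype=complex)
    for n1 in range(P1):
        for n2 in range(P2):
            w[n1, n2] = x[(P2 * n1 + P1 * n2) % N]
    W = np.fft.fft2(w)      # 2-D DFT (row-column decomposition internally)
    # Output map via the Chinese remainder theorem: k1 = k mod P1, k2 = k mod P2
    return np.array([W[k % P1, k % P2] for k in range(N)])

x = np.random.rand(6)
assert np.allclose(good_mapping_dft(x, 2, 3), np.fft.fft(x))
```

Because the input and output maps absorb all the index arithmetic, the 2-D transform factors exactly into short DFTs of sizes P1 and P2, which is what the prime factor algorithm exploits.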
The PFA differs from the WFTA in the way the m-D DFT is computed. In the PFA, the m-D DFT is computed by using the row-column decomposition, involving many 1-D DFT computations of size P1, P2, ..., PM, which are typically much shorter than N. In essence, we are computing a 1-D DFT by computing many shorter 1-D DFTs. The resulting shorter 1-D DFTs are computed by efficient circular convolution algorithms. In the WFTA, the special structure of short transforms is used to nest the multiplications. This decreases the number of multiplications, but increases the number of additions.

When we compare the various methods of computing the DFT, a number of issues must be considered, such as the number of computations, memory size, and regularity of structure. The WFTA has a clear advantage over the Cooley-Tukey FFT and the PFA in the number of multiplications required. However, it has a number of disadvantages. It requires significantly more additions than the other two methods, and the algorithm is quite complex and requires a large program. In addition, in-place computation does not appear to be possible, since the number of intermediate variables created in this method is much greater than the original data size. The number of coefficients that must be stored is also larger, and the number of data transfers is greater. As a result, the WFTA appears to be useful only in a rather restricted environment where multiplications are much slower than additions and the cost of memory is very small.

The major advantages of the PFA over the Cooley-Tukey FFT are the significant reduction in the required number of multiplications without a significant increase in the number of additions, the absence of twiddle factors (the W_N term), the lack of need for bit-reversal data indexing, and the small number of stored constants. Because of these advantages, PFA benchmark programs run very quickly, and the PFA is a good choice for a floating-point software implementation in micro-computers or minicomputers. However, it is not as regular in structure as the Cooley-Tukey FFT. Each stage uses a different module or butterfly, and the structure of these modules is such that many additions are carried out separately from multiplications. As a result, the PFA cannot efficiently utilize many existing array processors and vector-oriented machines that are efficient for inner-product-type (multiply and add) operations. In addition, the basic concepts of the PFA are considerably more difficult to understand. Consequently, writing a program takes more effort. In contrast, the Cooley-Tukey FFT is very simple to understand. It has a clean, highly regular structure. The same butterfly or set of arithmetic instructions is used in each stage, and the structure is well suited to general-purpose computers, array processors with parallel or pipelined architecture, and VLSI machines.

REFERENCES

For books on the fast Fourier transform and related topics, see [Brigham; McClellan and Rader; Elliot and Rao; Blahut; Burrus and Parks].

For readings on the discrete cosine transform, see [Ahmed et al.; Chen et al.; Narasimha and Peterson; Makhoul; Clarke].

For methods in which we compute a 2-D DFT by computing many 1-D DFTs, see [Nussbaumer (1977); Nussbaumer and Quandalle; Nussbaumer (1981); Auslander et al.]. For a fast method for matrix transposition, see [Eklundh]. For vector radix FFT algorithms, see [Harris et al.; Rivard].

For a tutorial presentation of number theory and its application to fast Fourier transform algorithm development, see [McClellan and Rader]. For readings on the Winograd Fourier transform algorithm, see [Winograd (1976, 1978); Silverman; Nawab and McClellan; Morris]. For prime factor algorithms, see [Kolba and Parks; Burrus and Eschenbacher; Johnson and Burrus].

For FFT algorithms for real-valued data, see [Vetterli and Nussbaumer; Sorensen et al.]. For in-place FFT algorithms, see [Pitas and Strintzis]. For FFT algorithms defined for arbitrary periodic sampling lattices such as hexagonal sampling, see [Mersereau and Speake; Guessoum and Mersereau].

N. Ahmed, T. Natarajan, and K. R. Rao, Discrete cosine transform, IEEE Trans. Comput., Vol. C-23, January 1974, pp. 90-93.

L. Auslander, E. Feig, and S. Winograd, New algorithms for the multidimensional discrete Fourier transform, IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. ASSP-31, April 1983, pp. 388-403.

R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley, 1985.

E. O. Brigham, The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, 1974.

C. S. Burrus, Index mappings for multidimensional formulation of the DFT and convolution, IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. ASSP-25, June 1977, pp. 239-242.

C. S. Burrus and P. W. Eschenbacher, An in-place, in-order prime factor FFT algorithm, IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. ASSP-29, August 1981, pp. 806-817.

C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms. New York: Wiley, 1985.

W. H. Chen, C. H. Smith, and S. C. Fralick, A fast computational algorithm for the discrete cosine transform, IEEE Trans. Commun., Vol. COM-25, September 1977, pp. 1004-1009.

R. J. Clarke, Transform Coding of Images. London: Academic Press, 1985.

J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comput., Vol. 19, April 1965, pp. 297-301.

J. O. Eklundh, A fast computer method for matrix transposing, IEEE Trans. Comput., Vol. C-21, July 1972, pp. 801-803.

D. F. Elliot and K. R. Rao, Fast Transforms: Algorithms, Analyses, Applications. New York: Academic Press, 1982.

I. J. Good, The relationship between two fast Fourier transforms, IEEE Trans. on Computers, Vol. C-20, March 1971, pp. 310-317.

A. Guessoum and R. M. Mersereau, Fast algorithms for the multidimensional discrete Fourier transform, IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. ASSP-34, August 1986, pp. 937-943.

D. B. Harris, J. H. McClellan, D. S. K. Chan, and H. W. Schuessler, Vector radix fast Fourier transform, Proc. Int. Conf. Acoust., Speech and Sig. Proc., April 1977, pp. 548-551.

M. T. Heideman, D. H. Johnson, and C. S. Burrus, Gauss and the history of the fast Fourier transform, IEEE ASSP Magazine, Vol. 1, October 1984, pp. 14-21.
Figure P3.9b

... in terms of X(k1, k2). Taking the complex conjugate of (1), we have (2). From the definition of the DFT, (3). From (2) and (3),

Y*(k1, k2) = X(-k1, -k2).   (4)

From (4),

Y(k1, k2) = DFT[x*(n1, n2)] = X*(-k1, -k2).   (5)

(a) The result obtained in (5) is not correct. Which step in the above derivation is wrong?
(b) By relating the DFT to the DFS representation and using the properties of the DFS, express the DFT of x*(n1, n2) in terms of X(k1, k2).

3.14. Determine a sequence x(n1, n2) that satisfies all of the following conditions:

Condition 1. X(ω1, ω2), the Fourier transform of x(n1, n2), is of the form
X(ω1, ω2) = A + B cos(ω1 + ω2) + C cos(ω1 - ω2),
where A, B, and C are all real numbers.

Condition 2. Let Y(k1, k2) denote the 4 × 4-point DFT of x(n1 - 2, n2 - 2). It is known that Y(0, 0) = 13.

Condition 3. Let w(n1, n2) denote the following sequence: ...

3.15. ...
(a) Determine Cx(k1, k2), the 2 × 2-point DCT of x(n1, n2).
(b) For the sequence x(n1, n2), verify the energy relationship given by Equation (3.43) with N1 = N2 = 2.

3.16. Let x(n1, n2) be zero outside 0 ≤ n1 ≤ N1 - 1, 0 ≤ n2 ≤ N2 - 1. The sequence x(n1, n2) is also a separable sequence, so that x(n1, n2) = x1(n1)x2(n2). Show that Cx(k1, k2) = Cx1(k1)Cx2(k2), where Cx(k1, k2) is the 2-D N1 × N2-point DCT of x(n1, n2), Cx1(k1) is the 1-D N1-point DCT of x1(n1), and Cx2(k2) is the 1-D N2-point DCT of x2(n2).

3.17. Let x(n1, n2) be an N1 × N2-point sequence and Cx(k1, k2) be an N1 × N2-point DCT of x(n1, n2).
(a) Show that the N1 × N2-point DCT of x*(n1, n2) is Cx*(k1, k2).
(b) Show that if x(n1, n2) is real, Cx(k1, k2) will also be real.

3.18. Let x(n1, n2) denote a first-quadrant support sequence, and let Cx(ω1, ω2) denote the cosine transform of x(n1, n2). Show each of the following symmetry properties.
(a) Cx(ω1, ω2) = Cx(-ω1, ω2) = Cx(ω1, -ω2) = Cx(-ω1, -ω2).
(b) The cosine transform of x*(n1, n2) is given by Cx*(ω1, ω2).
(c) Cx(ω1, ω2) of a real sequence x(n1, n2) is real.

3.19. Let x(n1, n2) denote a first-quadrant support sequence, and let Cx(ω1, ω2) denote the cosine transform of x(n1, n2). Show that ...

3.20. Let x(n) denote an N-point sequence that is zero outside 0 ≤ n ≤ N - 1. In computing the N-point DCT of x(n), the major computations involved are those in the 2N-point DFT computation of y(n), which is related to x(n) by

y(n) = x(n) + x(2N - 1 - n).

In this problem, we show that the 2N-point DFT Y(k) can be computed by computing one N-point DFT.
(a) By sketching an example of y(n), v(n), and w(n), illustrate that
w(n) = v(N - 1 - n),  0 ≤ n ≤ N - 1.
(b) From the definition of Y(k), ... Substituting y(n) in the above expression with v(n) and w(n), show that ...
(c) Combining the results of (a) and (b), show that ...
(d) We define V(k), the N-point DFT of v(n), by ...

3.21. [2-D counterpart of Problem 3.20.] The sequence y(n1, n2) is given by

y(n1, n2) = x(n1, n2) + x(2N1 - 1 - n1, n2) + x(n1, 2N2 - 1 - n2) + x(2N1 - 1 - n1, 2N2 - 1 - n2).

To show that the 2N1 × 2N2-point DFT Y(k1, k2) can be computed by computing one N1 × N2-point DFT, we first divide y(n1, n2) into four N1 × N2-point sequences v(n1, n2), w1(n1, n2), w2(n1, n2), and w3(n1, n2), where ... w3(n1, n2) = y(2n1 + 1, 2n2 + 1).
(a) By following steps analogous to those taken in Problem 3.20, show that Y(k1, k2) for 0 ≤ k1 ≤ N1 - 1, 0 ≤ k2 ≤ N2 - 1 can be expressed in terms of V(k1, k2), the N1 × N2-point DFT of v(n1, n2). This problem shows that the computations involved in computing the N1 × N2-point DCT of an N1 × N2-point sequence are similar to those involved in computing the N1 × N2-point DFT of the N1 × N2-point sequence.
(b) The major computations involved in computing the inverse DCT of Cx(k1, k2) are those in the 2N1 × 2N2-point inverse DFT of Y(k1, k2). To show that the 2N1 × 2N2-point inverse DFT of Y(k1, k2) can be computed by computing one N1 × N2-point inverse DFT, express V(k1, k2) in terms of Y(k1, k2). From v(n1, n2), x(n1, n2) can be obtained directly.

3.22. Let x(n) denote an N-point 1-D sequence that is zero outside 0 ≤ n ≤ N - 1. In defining the DCT of x(n), we first related x(n) to y(n) by ...
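The symmetry claimed in Problem 3.20(a) can be checked numerically. The even/odd decimations v(n) = y(2n) and w(n) = y(2n + 1) assumed below mirror the definition of w3 in Problem 3.21; the problem's own definitions of v(n) and w(n) are not reproduced in this excerpt.

```python
import numpy as np

# Check of Problem 3.20(a): with y(n) = x(n) + x(2N-1-n) for 0 <= n <= 2N-1,
# the decimations v(n) = y(2n) and w(n) = y(2n+1) satisfy w(n) = v(N-1-n),
# so Y(k) can be built from the single N-point DFT of v(n).
N = 8
x = np.random.rand(N)
y = np.array([x[n] if n < N else x[2 * N - 1 - n] for n in range(2 * N)])
v, w = y[0::2], y[1::2]
assert np.allclose(w, v[::-1])     # w(n) = v(N - 1 - n)
```

The identity holds because y(n) is symmetric about n = N - 1/2, so the odd-indexed samples retrace the even-indexed ones in reverse order.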
3.33. Using the results in Section 3.4.4, verify that the 5-point DFT X(k) can be computed by a circular convolution. (Figure P3.33)

4.0 INTRODUCTION

Three steps are generally followed in using digital filters. In the first step, we specify the characteristics required of the filter. Filter specification depends, of course, on the intended application. For example, if we wish to restore a signal
that has been degraded by background noise, the filter characteristics required
depend on the spectral characteristics of the signal and the background noise. The
second step is filter design. In this step, we determine h(n1, n2), the impulse response of the filter, or its system function H(z1, z2), that will meet the design specification. The third step is filter implementation, in which we realize a discrete system with the given h(n1, n2) or H(z1, z2).
These three steps are interrelated. It does not make much sense, for example,
to specify a filter that cannot be designed. Nor is it worthwhile to design a filter
that cannot be implemented. However, for convenience, we will discuss the three
steps separately. From time to time, though, we will point out their interrela-
tionships.
For practical reasons, we will restrict ourselves to a certain class of digital filters. One restriction is that h(n1, n2) must be real. In practice, we will deal with real data. To ensure that the processed signal will be real, we will require h(n1, n2) to be real. Another restriction is the stability of h(n1, n2); that is,

Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} |h(n1, n2)| < ∞.

In practice, an unbounded output can cause many difficulties, for example, system overload. We will restrict our discussion, then, to the class of digital filters whose impulse response h(n1, n2) is real and stable.

Digital filters can be classified into two groups. In the first group, h(n1, n2) is a finite-extent sequence, so the filters in this group are called finite impulse response (FIR) filters. In the second group, h(n1, n2) is of infinite extent, so the filters in this group are called infinite impulse response (IIR) filters. We concentrate on FIR filters in this chapter.

4.1 ZERO-PHASE FILTERS
A digital filter h(n1, n2) is said to have zero phase when its frequency response H(ω1, ω2) is a real function, so that

H(ω1, ω2) = H*(ω1, ω2).   (4.1)

Strictly speaking, not all filters with real frequency responses are necessarily zero-phase filters, since H(ω1, ω2) can be negative. In practice, the frequency regions for which H(ω1, ω2) is negative typically correspond to the stopband regions, and a phase of 180° in the stopband regions has little significance.

From the symmetry properties of the Fourier transform, (4.1) is equivalent in the spatial domain to the following:

h(n1, n2) = h*(-n1, -n2).   (4.2)

Since we consider only real h(n1, n2), (4.2) reduces to

h(n1, n2) = h(-n1, -n2).   (4.3)

Equation (4.3) states that the impulse response of a zero-phase filter is symmetric with respect to the origin.
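The equivalence of (4.1) and (4.3) is easy to verify numerically: enforcing the symmetry of (4.3) on an arbitrary impulse response produces a real frequency response. The 5 × 5 size and the random sequence below are illustrative choices, not examples from the text.

```python
import numpy as np

# Enforce h(n1, n2) = h(-n1, -n2), per (4.3), and check that the frequency
# response is real, per (4.1). h is 5 x 5 with its origin at sample (2, 2).
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 5))
h = q + q[::-1, ::-1]                    # symmetric about the center sample
# Move the origin to index (0, 0) before the DFT, then inspect H(w1, w2)
H = np.fft.fft2(np.roll(h, (-2, -2), axis=(0, 1)))
assert np.allclose(H.imag, 0, atol=1e-12)   # real => zero phase
```

Dropping the symmetrization step makes the assertion fail, which is the content of (4.1)-(4.3) in miniature.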
One characteristic of a zero-phase filter is its tendency to preserve the shape of the signal component in the passband region of the filter. In applications such as speech processing, the zero-phase (or linear phase) characteristic of a filter is not very critical. The human auditory system responds to short-time spectral magnitude characteristics, so the shape of a speech waveform can sometimes change drastically without the human listener's being able to distinguish it from the original. In image processing, the linear phase characteristic appears to be more important. Our visual world consists of lines, scratches, etc. A nonlinear phase distorts the proper registration of different frequency components that make up the lines and

Figure 4.1 Illustration of the importance of zero-phase filters in image processing. (a) Original image of 512 × 512 pixels; (b) processed image by a zero-phase lowpass filter; (c) processed image by a nonzero-phase lowpass filter.
Sec. 4.1 Zero-phase Filters
196 Finite Impulse Response Filters Chap. 4
to the image in Figure 4.1(a). The magnitude responses of the two lowpass filters used in Figures 4.1(b) and (c) are approximately the same. The zero-phase characteristic is quite useful in applications such as image processing, and zero phase is very easy to achieve with FIR filters, due to (4.3). In addition, design and implementation are often simplified if we require zero phase. For these reasons, we will restrict our discussion of FIR filters to zero-phase filters.

Consider a zero-phase impulse response h(n1, n2). From (4.3) and the Fourier transform definition of (1.31), the frequency response H(ω1, ω2) can be expressed as

H(ω1, ω2) = h(0, 0) + Σ_{(n1,n2)∈R_h} h(n1, n2) e^{-jω1n1} e^{-jω2n2} + Σ_{(n1,n2)∈R_h'} h(n1, n2) e^{-jω1n1} e^{-jω2n2}   (4.4)

where the region of support of h(n1, n2) consists of three mutually exclusive regions: the point (0, 0), R_h, and R_h'. The region R_h' is R_h flipped with respect to the origin. Combining the two sums on the right-hand side of (4.4) and noting that h(n1, n2) = h(-n1, -n2), we can express (4.4) as

H(ω1, ω2) = h(0, 0) + Σ_{(n1,n2)∈R_h} (h(n1, n2) e^{-jω1n1} e^{-jω2n2} + h(-n1, -n2) e^{jω1n1} e^{jω2n2})
          = h(0, 0) + Σ_{(n1,n2)∈R_h} 2h(n1, n2) cos(ω1n1 + ω2n2).   (4.5)

From (4.5), H(ω1, ω2) for a zero-phase FIR filter can always be expressed as a linear combination of cosine terms of the form cos(ω1n1 + ω2n2).

The zero-phase filter h(n1, n2) is symmetric with respect to the origin, so approximately half of the points in h(n1, n2) are independent. As a result, the filter is said to have twofold symmetry. In some applications, such as the design of circularly symmetric filters, it may be useful to impose additional symmetries on h(n1, n2). One such constraint is a fourfold symmetry given by

h(n1, n2) = h(-n1, n2) = h(n1, -n2).   (4.6)
In the Fourier transform domain, (4.6) is equivalent to

H(ω1, ω2) = H(-ω1, ω2) = H(ω1, -ω2).   (4.7)

Another such symmetry constraint is an eightfold symmetry given by

h(n1, n2) = h(-n1, n2) = h(n1, -n2) = h(n2, n1).   (4.8)

In the Fourier transform domain, (4.8) is equivalent to

H(ω1, ω2) = H(-ω1, ω2) = H(ω1, -ω2) = H(ω2, ω1).   (4.9)

The independent points of h(n1, n2) in the twofold, fourfold, and eightfold symmetries are shown in Figure 4.2 for the case when the region of support of h(n1, n2) is 11 × 11 points with square shape. For a given region of support of h(n1, n2), imposing symmetry constraints reduces the number of independent parameters that must be estimated in the design and reduces the number of arithmetic operations required in the implementation.

Figure 4.2 Independent points of an 11 × 11-point h(n1, n2) with (a) twofold, (b) fourfold, and (c) eightfold symmetry.

4.2 FILTER SPECIFICATION
Like 1-D digital filters, 2-D digital filters are generally specified in the frequency domain. Since H(ω1, ω2) = H(ω1 + 2π, ω2) = H(ω1, ω2 + 2π) for all (ω1, ω2), H(ω1, ω2) for -π ≤ ω1 ≤ π, -π ≤ ω2 ≤ π completely specifies H(ω1, ω2). In addition, since h(n1, n2) is assumed real, H(ω1, ω2) = H*(-ω1, -ω2). Specifying H(ω1, ω2) for -π ≤ ω1 ≤ π, 0 ≤ ω2 ≤ π therefore completely specifies H(ω1, ω2) for all (ω1, ω2).

A filter is said to have a circularly symmetric frequency response H(ω1, ω2) if H(ω1, ω2) is a function of ω1² + ω2² for √(ω1² + ω2²) ≤ π and is constant elsewhere within -π ≤ ω1 ≤ π, -π ≤ ω2 ≤ π. A filter is said to have a circularly symmetric impulse response h(n1, n2) if h(n1, n2) is a function of n1² + n2². As we discussed in Section 1.3.3, circular symmetry of H(ω1, ω2) implies circular symmetry of h(n1, n2). Circular symmetry of h(n1, n2) does not, however, imply circular symmetry of H(ω1, ω2). Frequency responses of circularly symmetric ideal lowpass, highpass, bandpass, and bandstop filters are shown in Figures 4.3(a), (b), (c), and (d), respectively. The shaded regions in the figures have amplitude 1, and the unshaded regions have amplitude 0. Their corresponding impulse responses are given by (4.10), where h_lp(n1, n2), h_hp(n1, n2), h_bp(n1, n2), and h_bs(n1, n2) represent the lowpass, highpass, bandpass, and bandstop filters, respectively, and J1(x) is the Bessel function of the first kind and the first order. Equation (4.10) follows directly from (1.36). In a sense, a circularly symmetric filter does not give preferential treatment to any particular direction in the frequency domain. When we refer to such filters as ideal lowpass filters, circular symmetry is generally assumed.
Since H(ω1, ω2) is in general a complex function of (ω1, ω2), we must specify both the magnitude and the phase of H(ω1, ω2). For FIR filters, we require zero phase, so we need only specify the magnitude response. The method most commonly used for magnitude specification is called the tolerance scheme. To illustrate this scheme, let us consider the specification of a lowpass filter. Ideally, a lowpass filter has only a passband region and a stopband region. In practice, a sharp transition between the two regions cannot be achieved; the passband region corresponds to (ω1, ω2) ∈ Rp and the stopband region corresponds to (ω1, ω2) ∈ Rs, as shown in Figure 4.4. The frequency region Rt between the passband and the

Figure 4.3 Frequency responses of circularly symmetric ideal filters. (a) Lowpass filter; (b) highpass filter; (c) bandpass filter; (d) bandstop filter.

Figure 4.4 Example of a 2-D lowpass filter specification using a tolerance scheme (passband Rp, transition band Rt, stopband Rs).
stopband regions is called the transition band. Ideally, the magnitude response |H(ω1, ω2)| is unity in the passband region and zero in the stopband region. In practice, we require 1 - δp < |H(ω1, ω2)| < 1 + δp for (ω1, ω2) ∈ Rp and |H(ω1, ω2)| < δs for (ω1, ω2) ∈ Rs. The variables δp and δs are called passband tolerance and stopband tolerance, respectively. A lowpass filter, then, is completely specified by δp, δs, Rp, and Rs. Other filters can also be specified in an analogous manner. The choice of the filter specification parameters depends on the intended application.

4.3 FILTER DESIGN BY THE WINDOW METHOD AND THE FREQUENCY SAMPLING METHOD

The problem of designing a filter is basically one of determining h(n1, n2) or H(z1, z2) that meets the design specification. The four standard approaches to designing FIR filters are the window method, the frequency sampling method, the frequency transformation method, and optimal filter design. The window and the frequency sampling methods are, in many respects, straightforward extensions of 1-D results. They are the topics of this section. Our discussion of these two methods is relatively brief. Other design methods are discussed in later sections.

4.3.1 The Window Method

In the window method, the desired frequency response Hd(ω1, ω2) is assumed to be known. By inverse Fourier transforming Hd(ω1, ω2), we can determine the desired impulse response of the filter, hd(n1, n2). In general, hd(n1, n2) is an infinite-extent sequence. In the window method, an FIR filter is obtained by multiplying hd(n1, n2) with a window w(n1, n2); that is,

h(n1, n2) = hd(n1, n2)w(n1, n2).   (4.11)

If hd(n1, n2) and w(n1, n2) are both symmetric with respect to the origin, hd(n1, n2)w(n1, n2) will be also, so the resulting filter will be zero phase. From (4.11) and the Fourier transform properties,

H(ω1, ω2) = (1/4π²) ∫_{-π}^{π} ∫_{-π}^{π} Hd(θ1, θ2) W(ω1 - θ1, ω2 - θ2) dθ1 dθ2.   (4.12)

From (4.12), the effect of the window in the frequency domain is to smooth Hd(ω1, ω2). We like to have the mainlobe width of W(ω1, ω2) small so that the transition width of H(ω1, ω2) is small. We also want to have small sidelobe amplitude to ensure that the ripples in the passband and stopband regions will have small amplitudes.

A 2-D window used in filter design is typically obtained from a 1-D window. We will discuss two specific methods. The first method is to obtain a 2-D window w(n1, n2) by

w(n1, n2) = w1(n1)w2(n2) = wc(t1, t2)|_{t1=n1, t2=n2}   (4.13a)

where

wc(t1, t2) = wa(t1)wb(t2).   (4.13b)

The functions wa(t1) and wb(t2) in (4.13b) are 1-D analog windows. We can think of w1(n1) and w2(n2) as 1-D window sequences, or we can think of them as samples of 1-D analog window functions. The resulting sequence w(n1, n2) is separable, and its Fourier transform W(ω1, ω2) is very simply related to W1(ω1) and W2(ω2) by

W(ω1, ω2) = W1(ω1)W2(ω2)   (4.14)

where W1(ω1) and W2(ω2) are 1-D Fourier transforms of w1(n1) and w2(n2), respectively. Because of (4.14), it is relatively straightforward to relate the sidelobe behavior and mainlobe behavior of w(n1, n2) to those of w1(n1) and w2(n2). If the frequency response of the desired filter is a separable function, one simple approach to designing a 2-D filter h(n1, n2) is to design two 1-D filters h1(n1) and h2(n2) and multiply them. If each of the two 1-D filters is designed using the window method, the approach is equivalent to using a separable 2-D window in (4.13) in the 2-D filter design.

The second popular method [Huang] is to obtain a 2-D window w(n1, n2) by

w(n1, n2) = wc(t1, t2)|_{t1=n1, t2=n2}   (4.15a)

where

wc(t1, t2) = wa(t)|_{t=√(t1²+t2²)}.   (4.15b)

The function wa(t) in (4.15b) is a 1-D analog window. In this method, a 2-D analog window wc(t1, t2) is obtained by rotating a 1-D analog window wa(t). Note that Wc(Ω1, Ω2), the 2-D analog Fourier transform of wc(t1, t2), is a circularly symmetric function. It is not, however, a rotated version of Wa(Ω), the 1-D analog Fourier transform of wa(t). Specifically, Wc(Ω1, Ω2) is related to wa(t) by

Wc(Ω1, Ω2) = G(ρ)|_{ρ=√(Ω1²+Ω2²)},  G(ρ) = 2π ∫_0^∞ wa(t) J0(ρt) t dt   (4.16)

where J0(·) is the Bessel function of the first kind, zeroth order. The function G(ρ) in (4.16) is the Hankel transform (see Problem 1.28 in Chapter 1) of wa(t). To obtain a 2-D window w(n1, n2), the rotated 2-D analog window wc(t1, t2) is sampled. The resulting sequence w(n1, n2) is a circularly symmetric window. From (4.15a) and (1.53),

W(ω1, ω2) = Σ_{r1=-∞}^{∞} Σ_{r2=-∞}^{∞} Wc(ω1 - 2πr1, ω2 - 2πr2).   (4.17)

Due to the aliasing effect in (4.17), W(ω1, ω2) is no longer circularly symmetric. As we discussed in Chapter 1 (Section 1.3.3), circular symmetry of w(n1, n2) does
not guarantee circular symmetry of its Fourier transform W(ω1, ω2). The function W(ω1, ω2) can deviate from circular symmetry considerably for (ω1, ω2) away from the origin. Near the origin, however, the aliasing effect is less, and W(ω1, ω2) tends to be close to circularly symmetric. If the desired filter has a circularly symmetric frequency response, the circularly symmetric window of (4.15) tends to have better performance for a fixed window size than the separable window of (4.14).

Figure 4.5 shows the Fourier transform of a separable window and a circularly symmetric window obtained from the analog rectangular window with τ = 8. Figure 4.5(a) corresponds to the (2τ - 1) × (2τ - 1)-point separable window, and Figure 4.5(b) corresponds to the circularly symmetric window. The regions of support of the separable window (225 nonzero points) and the circularly symmetric window (193 nonzero points) used in Figure 4.5 are shown in Figures 4.6(a) and (b), respectively.
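The nonzero-point counts quoted above (225 and 193) follow directly from the two support definitions and can be reproduced in a few lines; the grid construction below is an illustration, not code from the text.

```python
import numpy as np

# Count the nonzero points of the two tau = 8 windows compared in Figure 4.6:
# the separable rectangular window fills the full (2*tau-1) x (2*tau-1) square,
# while the rotated (circularly symmetric) rectangular window keeps only the
# samples with sqrt(n1^2 + n2^2) < tau, per (4.15) and (4.18).
tau = 8
n = np.arange(-(tau - 1), tau)
n1, n2 = np.meshgrid(n, n)
separable = np.ones_like(n1)                      # 15 x 15 square support
circular = (np.hypot(n1, n2) < tau).astype(int)   # rotated-window support
assert separable.sum() == 225 and circular.sum() == 193
```

The 32 samples dropped by the circular support are the corner points of the square whose radius reaches or exceeds τ.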
Examples of 1-D analog windows that can be used in obtaining a 2-D window are the rectangular, the Hamming, and the Kaiser windows. Their functional forms are given below.

The Rectangular Window

wa(t) = 1 for |t| < τ; 0 otherwise.   (4.18)

The Hamming Window

wa(t) = 0.54 + 0.46 cos(πt/τ) for |t| < τ; 0 otherwise.   (4.19)

The Kaiser Window

wa(t) = I0(α√(1 - (t/τ)²)) / I0(α) for |t| < τ; 0 otherwise,   (4.20)

where I0 is the modified Bessel function of the first kind, order zero, and α is a parameter. Note that wa(t) is zero at |t| = τ. If τ is an integer, therefore, the 1-D window w(n) obtained by w(n) = wa(t)|_{t=n} will be a (2τ - 1)-point sequence. The rectangular window has a very good mainlobe behavior (small mainlobe width), but a poor sidelobe behavior (large sidelobe peak). It is very simple to use and is optimal, based on the minimum mean square error criterion. Specifically, h(n1, n2) designed by using the rectangular window minimizes

(1/4π²) ∫_{-π}^{π} ∫_{-π}^{π} |Hd(ω1, ω2) - H(ω1, ω2)|² dω1 dω2

for a fixed region of support of h(n1, n2). The Hamming window does not have as good a mainlobe behavior, but does have a good sidelobe behavior. It is also very simple to use. The Kaiser window is a family of windows. The Kaiser window's mainlobe and sidelobe behavior can be controlled by the parameter α. Improving the mainlobe behavior will generally result in a poorer sidelobe behavior, and vice versa. The Kaiser window involves a Bessel function; evaluating the function is somewhat more complicated than using the rectangular or Hamming windows.

Figure 4.5 Fourier transform of (a) separable window and (b) circularly symmetric window obtained from the analog rectangular window with τ = 8.
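A minimal window-method design in the spirit of (4.11) and (4.13) can be sketched as follows. This sketch assumes a separable Hamming window (4.19) applied to a separable ideal lowpass desired response, whereas the book's examples use the Kaiser window and a circularly symmetric desired response; the grid size 256 and cutoff are illustrative choices.

```python
import numpy as np

# Window-method sketch (separable case): truncate the ideal separable lowpass
# impulse response with a separable Hamming window, per (4.11) and (4.13).
tau, cutoff = 8, 0.4 * np.pi
n = np.arange(-(tau - 1), tau)                  # n = -7..7
hd1 = cutoff / np.pi * np.sinc(cutoff * n / np.pi)   # ideal 1-D lowpass hd(n)
w1 = 0.54 + 0.46 * np.cos(np.pi * n / tau)           # sampled Hamming window
h1 = hd1 * w1
h = np.outer(h1, h1)          # separable 2-D zero-phase filter, 15 x 15
# Frequency response on a 256 x 256 grid; compensate the index shift so the
# zero-phase property (real H) is visible.
phase = np.exp(2j * np.pi * np.fft.fftfreq(256) * (tau - 1))
H = np.fft.fft2(h, (256, 256)) * phase[:, None] * phase[None, :]
assert np.max(np.abs(H.imag)) < 1e-10         # zero phase, per (4.1)
```

Replacing `w1` with samples of the Kaiser window (4.20), or building `h` from a rotated window as in (4.15), reproduces the alternatives compared in Figures 4.8 and 4.9.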
Figure 4.6 Support region of w(n1, n2) for τ = 8. (a) Separable window; (b) circularly symmetric window.

As is expected from (4.14), the Fourier transform of a separable window has large amplitudes near the origin and along the ω1 and ω2 axes. The Fourier transform of a circularly symmetric window is approximately circularly symmetric, with a very good approximation near the origin in the (ω1, ω2) plane. Figure 4.7 shows the Fourier transforms of circularly symmetric windows obtained from the Hamming window in (4.19) and the Kaiser window in (4.20). Figure 4.7(a) corresponds to the Hamming window, and Figures 4.7(b) and (c) correspond to the Kaiser window with α = 1 and 3, respectively. In all cases, the value of τ is 8.

The shape of the window affects both the mainlobe and sidelobe behavior. The size of the window, however, primarily affects the mainlobe behavior. Since the sidelobe behavior (and therefore passband and stopband tolerance) is affected by only the shape of the window, the window shape is chosen first on the basis of the passband and stopband tolerance requirements. The window size is then determined on the basis of the transition requirements. Two examples of digital filters designed by the window method are shown in Figures 4.8 and 4.9. Figure 4.8 shows the result of a lowpass filter design using the Kaiser window with τ = 8 and α = 2. The desired impulse response hd(n1, n2) used is the circularly symmetric ideal lowpass filter of (4.10a) with cutoff frequency Ωc = 0.4π. Figure 4.8(a) is based on the separable window design, and Figure 4.8(b) is based on the circularly symmetric window design. In each case, both the perspective plot and the contour plot are shown. Note that the number of nonzero values of h(n1, n2) is not the same in the two cases (see Figure 4.6). It is 225 for Figure 4.8(a) and 193 for Figure 4.8(b). Figure 4.9 is the same as Figure 4.8 except that the desired impulse response used is the circularly symmetric ideal bandpass filter of (4.10c) with cutoff frequencies Ω1 = 0.3π and Ω2 = 0.7π. In Figure 4.9(a), the deviation from the desired frequency response in the stopband is larger along the ω1 and ω2 axes than along the diagonal directions (ω1 = ω2, ω1 = -ω2). This is typically the case for a lowpass or bandpass filter design using a separable window. As was illustrated in Figure 4.5(a), the Fourier transform of a separable window has large amplitudes near the origin and along the ω1 and ω2 axes. The convolution operation in (4.12) of the Fourier transform of a separable window

Figure 4.7 Fourier transform of circularly symmetric windows with τ = 8. (a) Hamming window; (b) Kaiser window with α = 1; (c) Kaiser window with α = 3.
-r I
-77
lr
(a) (b)
Figure 4.8 Frequency responses of lowpass filters designed by the window method. Figure 4.8 (continued)
The desired impulse response was obtained by using (4.10a) with R = 0 . 4 ~The .
I-D Kaiser window was used. The support regions of the windows are those shown
in Figure 4.6. Both perspective and contour plots are shown. (a) Separable window
design; (b) rotated circularly symmetric window design.
Sec. 4.3 Filter Design by the Window Method
Finite Impulse Response Filters Chap. 4
Figure 4.9 Frequency responses of bandpass filters designed by the window method. The desired impulse response was obtained by using (4.10c) with Ω1 = 0.3π and Ω2 = 0.7π. The 1-D Kaiser window was used. The support regions of the windows are those shown in Figure 4.6. Both perspective and contour plots are shown. (a) Separable window design; (b) rotated circularly symmetric window design.
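The separable-window design just illustrated can be sketched in a few lines of Python. This is a hypothetical script, not from the text; it uses a Hamming window in place of the Kaiser window of Figures 4.8 and 4.9, and `wc` stands for the cutoff Ωc.

```python
import numpy as np
from scipy.special import j1  # first-order Bessel function

def ideal_circular_lowpass(N, wc):
    """Impulse response of the circularly symmetric ideal lowpass
    filter (cutoff wc), truncated to an N x N grid, N odd."""
    n = np.arange(N) - (N - 1) // 2
    n1, n2 = np.meshgrid(n, n, indexing="ij")
    r = np.hypot(n1, n2)
    h = np.full((N, N), wc**2 / (4 * np.pi))      # value at r = 0
    mask = r > 0
    h[mask] = wc * j1(wc * r[mask]) / (2 * np.pi * r[mask])
    return h

# separable-window design: multiply by an outer product of 1-D windows
N, wc = 15, 0.4 * np.pi
w1d = np.hamming(N)
h = ideal_circular_lowpass(N, wc) * np.outer(w1d, w1d)
# frequency response on a dense grid, for inspection
H = np.abs(np.fft.fftshift(np.fft.fft2(h, (256, 256))))
```

The circularly symmetric window design differs only in the window step: the 2-D window is obtained by sampling a rotated 1-D window rather than forming an outer product.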
The linear phase term in (4.25) and the resulting shift of the sequence in (4.27) are due to the fact that the DFT is defined only for a first-quadrant support sequence, which cannot be zero phase. The inverse DFT in (4.26) can be computed by using an FFT algorithm for highly composite numbers N1 and N2. From the definition of the Fourier transform and the DFT, the frequency response of the designed filter H(ω1, ω2) can be shown to be related to H1(k1, k2) and therefore Hd(ω1, ω2) by

H(ω1, ω2) = [e^{jω1(N1-1)/2} e^{jω2(N2-1)/2} (1 - e^{-jω1N1})(1 - e^{-jω2N2}) / (N1N2)] Σ_{k1=0}^{N1-1} Σ_{k2=0}^{N2-1} H1(k1, k2) / [(1 - e^{j2πk1/N1} e^{-jω1})(1 - e^{j2πk2/N2} e^{-jω2})].   (4.28)

When Hd(ω1, ω2) is sampled by using (4.25) strictly to design piecewise constant filters such as lowpass filters, H1(k1, k2) changes sharply from 1 to 0 near transition regions. As a result, both the stopband and passband behaviors of the filter H(ω1, ω2) in (4.28) are rather poor. They can be improved considerably if some transition samples are taken in the frequency region where Hd(ω1, ω2) has a sharp transition. Determining the values of the transition samples optimally is not simple and requires linear programming methods. However, a reasonable choice, such as linear interpolation of samples, will work well. Two examples of digital filters designed by the frequency sampling method are shown in Figures 4.10 and 4.11. Figure 4.10 shows the result of a lowpass filter design with N1 = 15 and N2 = 15. The region of the transition samples used is shown in Figure 4.10(a), and the transition samples used are samples of a circularly symmetric linear interpolation between 0 and 1. The perspective plot and the contour plot of the frequency response of the resulting filter are shown in Figures 4.10(b) and (c), respectively. Figure 4.11 is the same as Figure 4.10, except that a bandpass filter is designed. The values of the transition samples are again obtained from samples of a circularly symmetric linear interpolation between 0 and 1.

Like the window method, the frequency sampling method is quite general. When Hd(ω1, ω2) has a nonzero-phase component, we can still use (4.25), (4.26), and (4.27). The method can also be used to design a filter with linear phase resulting from a half-sample delay. The modification made to Hd(ω1, ω2) to incorporate the half-sample delay is identical to that in the window design case, and the modified Hd(ω1, ω2) can be used in (4.25) and (4.26). The half-sample delay can be incorporated in the right-hand side expression of (4.27) by subtracting 1/2 from n1, n2, or both n1 and n2. In this way, a filter with an even number of nonzero values and with essentially the zero-phase characteristic (which preserves the shape of the waveform in the passband region) can be designed. The frequency sampling method can also be used when the desired frequency samples are obtained on a non-Cartesian grid. The basic objective behind the frequency sampling method is to ensure that H(ω1, ω2) is identical to some given Hd(ω1, ω2) at a set of frequencies (ω1^i, ω2^i); that is,

H(ω1^i, ω2^i) = Hd(ω1^i, ω2^i).   (4.29)

At a fixed frequency (ω1^i, ω2^i), (4.29) is a linear equation for the unknown coefficients h(n1, n2). By considering many different frequencies (ω1^i, ω2^i), not necessarily on the Cartesian grid, h(n1, n2) can be determined by solving a set of linear equations. If we impose a zero-phase constraint, from (4.4),

Hd(ω1^i, ω2^i) = Σ_{(n1,n2)} h(n1, n2) cos(ω1^i n1 + ω2^i n2).   (4.30)

The set of linear equations in (4.30) has coefficients that are all real, and the resulting filter is zero phase. For M independent values in h(n1, n2), M linear equations in (4.30) almost always have a unique solution for h(n1, n2) in practice.

As with the window method, the filter designed by the frequency sampling method is not optimal. That is, there exists in general a filter that meets the same design specification and whose region of support is smaller than the filter designed by the frequency sampling method. In addition, because of a lack of control over the frequency domain parameters, we may have to design several filters to meet a given specification. Despite those disadvantages, the frequency sampling method is sometimes used in practice because of its conceptual and computational simplicity. Determining specific values and the region of the transition samples is cumbersome compared to using the window method, but an inverse transform of Hd(ω1, ω2) is not needed in the frequency sampling method. The performance of the two methods appears comparable as measured by the size of the region of support for the filter designed to meet a given design specification.

Figure 4.10 Example of a 15 x 15-point lowpass filter designed by the frequency sampling method. (a) Passband (filled-in dots), transition band (marked by "x"), and stopband (open dots) samples used in the design; (b) perspective plot of the filter frequency response; (c) contour plot of the filter frequency response.
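A frequency sampling design along the lines of (4.25)-(4.27) can be sketched as follows. This is an illustrative script rather than the book's exact equations: the grid and phase conventions are the simplest ones consistent with a first-quadrant DFT, and the transition samples come from the same circularly symmetric linear interpolation used in Figure 4.10.

```python
import numpy as np

def freq_sampling_lowpass(N=15, wp=0.4 * np.pi, ws=0.6 * np.pi):
    """15 x 15 lowpass by frequency sampling: sample a circularly
    symmetric Hd with linearly interpolated transition samples,
    attach the linear phase of (4.25), and inverse-DFT as in (4.26)."""
    k = np.arange(N)
    w = 2 * np.pi * k / N
    w = np.where(w > np.pi, w - 2 * np.pi, w)   # map to [-pi, pi)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    r = np.hypot(w1, w2)
    # 1 in passband, linear in transition band, 0 in stopband
    Hd = np.clip((ws - r) / (ws - wp), 0.0, 1.0)
    # linear-phase term: impulse response gets first-quadrant support
    phase = np.exp(-1j * (w1 + w2) * (N - 1) / 2)
    return np.real(np.fft.ifft2(Hd * phase))

h = freq_sampling_lowpass()
```

Shifting `h` back by (N-1)/2 in each dimension, as in (4.27), recovers the zero-phase filter.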
Filter design by frequency transformation does not have a 1-D counterpart. This
method leads to a practical means of designing a 2-D FIR filter with good per-
formance. Under a restricted set of conditions, the method has been shown to
be optimal. For these reasons, we discuss this method in some detail.
H(ω1, ω2) = H(ω)|ω = G(ω1, ω2)   (4.31)

where H(ω) is a 1-D digital filter frequency response and H(ω1, ω2) is the frequency response of the resulting 2-D digital filter. Suppose H(ω) is a bandpass filter, as shown in Figure 4.12. Consider one particular frequency ω = ω1'. Suppose the function ω1' = G(ω1, ω2) represents a contour in the (ω1, ω2) plane, shown in Figure 4.12. Then, according to (4.31), H(ω1, ω2) evaluated on the contour equals H(ω1'). If we now consider other frequencies ω2', ω3', . . . , and if their corresponding contours are as shown in Figure 4.12, then the resulting 2-D filter will be a bandpass filter. Furthermore, the amplitude characteristics of H(ω) will be preserved in H(ω1, ω2). For example, if H(ω) has equiripple characteristics in the passband region, so will H(ω1, ω2).

Figure 4.12 Illustration of the principle behind the design of a 2-D filter from a 1-D filter by frequency transformation.
Two important issues must be considered in this method. One is whether or not the resulting 2-D filter is a zero-phase FIR filter. The second is whether or not a transformation function G(ω1, ω2) exists such that there will be a nice mapping between ω and ω = G(ω1, ω2), as in Figure 4.12. Both of these issues are resolved by using a 1-D zero-phase filter and the appropriate transformation function. We will refer to this method as the frequency transformation method.

For a 1-D zero-phase filter with h(n) = h(-n) and length 2N + 1, the frequency response can be written as

H(ω) = h(0) + 2 Σ_{n=1}^{N} h(n) cos ωn = Σ_{n=0}^{N} a(n) cos ωn = Σ_{n=0}^{N} b(n) (cos ω)^n.   (4.32)

In (4.32), the sequence b(n) is not the same as a(n), but can be obtained (see Problem 4.16) simply from h(n). The 2-D frequency response H(ω1, ω2) is obtained by

H(ω1, ω2) = H(ω)|cos ω = T(ω1, ω2) = Σ_{n=0}^{N} b(n) (T(ω1, ω2))^n   (4.33)

where

T(ω1, ω2) = Σ_{(n1,n2)∈Rt} t(n1, n2) e^{-jω1n1} e^{-jω2n2} = Σ_{(n1,n2)∈Rc} c(n1, n2) cos ω1n1 cos ω2n2   (4.34)

and Rt and Rc represent the regions of support of t(n1, n2) and c(n1, n2), respectively. The sequence c(n1, n2) is simply related to t(n1, n2) and can be easily obtained from t(n1, n2). An example of T(ω1, ω2) often used in practice is given by

T(ω1, ω2) = -1/2 + (1/2) cos ω1 + (1/2) cos ω2 + (1/4) cos(ω1 + ω2) + (1/4) cos(ω1 - ω2)   (4.35)
          = -1/2 + (1/2) cos ω1 + (1/2) cos ω2 + (1/2) cos ω1 cos ω2.
Sec. 4.4 Filter Design by the Frequency Transformation Method 219
218 Finite Impulse Response Filters Chap. 4
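The evaluation of (4.33) can be sketched numerically. The function below is a hypothetical helper, not from the text; it uses the identity cos nω = Cn(cos ω) (Chebyshev polynomials), which is equivalent to the b(n) power-series form of (4.32)-(4.33) but avoids computing b(n) explicitly.

```python
import numpy as np

def mcclellan_response(h1d, W=257):
    """2-D frequency response from a 1-D zero-phase filter h1d of
    length 2N+1 (h1d[N] is the n = 0 tap) via the McClellan
    transformation (4.35), using the Chebyshev recurrence for
    cos(n*w) with cos(w) replaced by T(w1, w2)."""
    N = (len(h1d) - 1) // 2
    a = np.empty(N + 1)
    a[0] = h1d[N]
    a[1:] = 2 * h1d[N + 1:]            # cosine-series coefficients a(n)
    w = np.linspace(-np.pi, np.pi, W)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    T = -0.5 + 0.5*np.cos(w1) + 0.5*np.cos(w2) + 0.5*np.cos(w1)*np.cos(w2)
    H = a[0] * np.ones_like(T)
    if N >= 1:
        Cprev, Ccur = np.ones_like(T), T   # C_0 = 1, C_1 = T
        H += a[1] * Ccur
        for n in range(2, N + 1):
            Cprev, Ccur = Ccur, 2*T*Ccur - Cprev   # C_n = 2T C_{n-1} - C_{n-2}
            H += a[n] * Ccur
    return H

# toy 3-tap zero-phase lowpass: H(w) = 0.5 + 0.5 cos(w)
H = mcclellan_response(np.array([0.25, 0.5, 0.25]))
```

Because T(0, 0) = 1 and T(π, π) = -1 for the McClellan transformation, the 2-D response inherits H(0) at the origin and H(π) at the corner of the frequency plane.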
The sequences t(n1, n2) and c(n1, n2) that correspond to T(ω1, ω2) in (4.35) are shown in Figure 4.13. Equation (4.35), originally proposed by [McClellan], is known as the McClellan transformation.

Figure 4.13 Transformation sequence t(n1, n2) and the corresponding sequence c(n1, n2) in (4.34) often used in the transformation method.

From (4.33) and (4.34), H(ω1, ω2) is always real, so the resulting 2-D filter has zero phase. In addition, it is an FIR filter. For example, when t(n1, n2) has a region of support of size (2M1 + 1) x (2M2 + 1) and the length of h(n) is 2N + 1, the resulting 2-D filter h(n1, n2) will be a finite-extent sequence of size (2M1N + 1) x (2M2N + 1). If N = 10 and M1 = M2 = 1, t(n1, n2) will be a 3 x 3-point sequence and h(n1, n2) will be a 21 x 21-point sequence. In this example, the 2-D filter obtained has a large region of support, even though h(n) is short and t(n1, n2) has a small region of support. This is typically the case. Since h(n1, n2) is completely specified by h(n) and t(n1, n2), there is considerable structure in h(n1, n2). As discussed in Section 4.6, this structure can be exploited in reducing the number of arithmetic operations when the filter is implemented.

By choosing T(ω1, ω2) in (4.34) properly, we can obtain many different sets of contours that can be used for 2-D filter design. For the transformation function T(ω1, ω2) in (4.35), the set of contours obtained by cos ω = T(ω1, ω2) for ω = 0, π/10, 2π/10, . . . , π is shown in Figure 4.14. This can be used to design many different 2-D FIR filters. From a 1-D lowpass filter of 21 points in length whose H(ω) is shown in Figure 4.15(a), we can obtain a 2-D filter whose frequency response is shown in Figure 4.15(b). Note that the amplitude characteristics of H(ω) are preserved in H(ω1, ω2). For example, the equiripple characteristics of H(ω) in the passband and stopband regions are preserved in H(ω1, ω2). If we begin with a 1-D highpass or bandpass filter, the resulting 2-D filter based on (4.35) would be a highpass or bandpass filter. Due to the approximate circular symmetry of the contours corresponding to cos ω = T(ω1, ω2) with T(ω1, ω2) in (4.35), the H(ω1, ω2) obtained has approximately circular symmetry, particularly in the low-frequency regions.

The basic idea behind the transformation method discussed in this section can be used to develop specific methods of designing 2-D zero-phase FIR filters. The next two sections treat two specific design methods.

In Step 1, we choose t(n1, n2) from the results that are already available in the open literature [McClellan; Mersereau et al.]. For a circularly symmetric filter design, for example, the McClellan transformation of (4.35) may be used. In Step 2, we translate the filter specification of H(ω1, ω2) to the specification of H(ω), using the transformation function selected in Step 1. This translation should be made such that when h(n) is designed meeting the 1-D filter specification, the resulting h(n1, n2) will meet the 2-D filter specification. In Step 3, we design the 1-D filter h(n) that meets the specification determined in Step 2. The optimal 1-D filter design technique developed by [Parks and McClellan] can be used here. In Step 4, we determine h(n1, n2) from t(n1, n2) and h(n) using (4.32) and (4.33). To illustrate these four steps, let us consider a specific design example.
Figure 4.15 Example of a lowpass filter designed by the transformation method. (a) Frequency response of the 21-point 1-D filter used in the design; (b) frequency response of the 21 x 21-point 2-D filter designed.

Design Example

We wish to design a 2-D lowpass filter that meets the following circularly symmetric filter specification:

δ1 (passband tolerance) = 0.05, δ2 (stopband tolerance) = 0.025
ωp (passband cutoff frequency) = 0.4π
ωs (stopband cutoff frequency) = 0.5π.   (4.36)

corresponding to cos ω = T(ω1, ω2) are not exactly circularly symmetric. This deviation from circular symmetry is more clearly visible in high-frequency regions.

Step 3. Using the Parks-McClellan algorithm, we design a 1-D Type I filter (h(n) = h(-n)) that meets the design specification in (4.38). The frequency response of the 1-D filter designed is shown in Figure 4.16(b). The length of the filter, 2N + 1, in this case is 31.

Step 4. From t(n1, n2) and h(n) obtained above, a 2-D filter h(n1, n2) is obtained by using (4.32) and (4.33). The frequency response H(ω1, ω2) is shown as a perspective plot in Figure 4.16(c) and as a contour plot in Figure 4.16(d). This filter is guaranteed to meet the specification given by (4.36). The 2-D filter h(n1, n2) obtained is a 31 x 31-point sequence.

Figure 4.17 shows another example of a filter designed by using the method discussed in this section. Figures 4.17(a) and (b) show the perspective plot and the contour plot, respectively. The filter is a bandpass filter designed on the basis of a filter specification with circular symmetry. The four cutoff frequencies Ω1, Ω2, Ω3, Ω4 used are 0.3π, 0.4π, 0.6π, and 0.7π, respectively, with Ω1 and Ω4 representing the stopband frequencies and Ω2 and Ω3 representing the passband frequencies. The passband and stopband tolerances δ1 and δ2 used are 0.054 and 0.027. The size of the filter is 41 x 41.

In this method, we have simply selected a transformation function from existing ones. This, of course, can be quite restrictive. If there is not a good match between the contours obtained from the transformation function and the passband and stopband region contours in the 2-D filter specification, then the resulting filter may significantly exceed the given specification. As a result, the region of support
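Step 3's 1-D equiripple design is available in scipy as `remez`, an implementation of the Parks-McClellan algorithm. The translated specification (4.38) is not reproduced above, so the sketch below is hypothetical and simply reuses the band edges ωp = 0.5π, ωs = 0.6π that appear later in this chapter, with a 31-tap Type I filter:

```python
import numpy as np
from scipy.signal import remez, freqz

# 31-tap equiripple (Type I) lowpass.  With fs = 1, band edges
# 0.25 and 0.3 cycles/sample correspond to 0.5*pi and 0.6*pi rad;
# equal weights give k = delta1/delta2 = 1.
h = remez(31, [0.0, 0.25, 0.3, 0.5], [1.0, 0.0], weight=[1.0, 1.0], fs=1.0)

w, H = freqz(h, worN=4096)           # w in rad/sample, 0..pi
Hmag = np.abs(H)
delta_s = Hmag[w >= 0.6 * np.pi].max()                 # stopband ripple
delta_p = np.abs(Hmag[w <= 0.5 * np.pi] - 1.0).max()   # passband ripple
```

With equal weights, the passband and stopband ripples come out (essentially) equal, the equiripple behavior the alternation theorem predicts.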
Figure 4.16 Illustration of a lowpass filter design using Design Method 1. (a) Passband, transition band, and stopband regions in the filter specification with circular symmetry. Note that boundaries of different regions in the filter specification do not coincide with contours given by cos ω = T(ω1, ω2). This illustrates that contours given by cos ω = T(ω1, ω2) are not circularly symmetric. (b) Frequency response of the 31-point 1-D filter used in the design; (c) perspective plot of the frequency response of the 2-D filter designed; (d) contour plot of the frequency response of the 2-D filter designed.
Figure 4.17 Frequency response of a bandpass filter designed by using Design Method 1. (a) Perspective plot; (b) contour plot.

In this method, we first design the transformation sequence t(n1, n2). Once t(n1, n2) is designed, the remaining steps are identical to those of Design Method 1. The four steps involved in Method 2 can be stated as

Figure 4.18 Illustration of Design Method 2. (a) 2-D lowpass filter specification with elliptical passband and stopband contours. (b) 1-D lowpass filter specification. The 1-D filter that meets this specification is used in designing the 2-D filter specified in (a).

where Rp includes the passband region, where Rs includes the stopband region, and the region of support of t(n1, n2) is as small as possible.   (4.41e)

Equations (4.41a) and (4.41b) are the constraints to map ω = ωp to the passband contour Cp and ω = ωs to the stopband contour Cs in (4.39). If this can be done, we can reduce the amount by which the designed filter exceeds the specification. Equations (4.41c) and (4.41d) are the constraints to map the 1-D filter's passband and stopband regions to the 2-D filter's passband and stopband regions. The constraint in (4.41e) is intended to minimize the region of support of the resulting 2-D filter h(n1, n2). Since the conditions in (4.41a) and (4.41b) apply to an infinite number of points on the Cp and Cs contours, they cannot in general be satisfied exactly with a finite number of terms in t(n1, n2). In addition, attempting to satisfy all the conditions in (4.41) simultaneously becomes a highly nonlinear problem.

One suboptimal method to solve (4.41), and thus to complete Step 1 of Design Method 2, is given by

Step 1.1. Assume a shape and size for the region of support of t(n1, n2).

Step 1.2. Impose a structure on t(n1, n2).

Step 1.3. Assume an ωp and ωs, and design t(n1, n2) so that (4.41a) and (4.41b) are approximately satisfied:

cos ωp ≈ T(ω1, ω2), (ω1, ω2) ∈ Cp
cos ωs ≈ T(ω1, ω2), (ω1, ω2) ∈ Cs.

Step 1.4. Design a new t'(n1, n2) from t(n1, n2) in Step 1.3 to ensure that (4.41c) and (4.41d) are satisfied.

With this constraint, there are five independent parameters in t(n1, n2). We then consider the filter specification. Since the specification in our example has fourfold symmetry, it is reasonable to impose the same fourfold symmetry on t(n1, n2). The result is given by

Imposing the constraints of (4.43) and (4.44) on t(n1, n2), we can express T(ω1, ω2) in the form of

T(ω1, ω2) = A + B cos ω1 + C cos ω2 + D cos ω1 cos ω2   (4.45)

for some real constants A, B, C, and D. The McClellan transformation of (4.35) is a special case of (4.45). Since a 1-D lowpass filter is transformed to a 2-D lowpass filter, we may impose the constraint that ω = 0 be mapped to ω1 = ω2 = 0, and therefore

cos ω|ω=0 = T(ω1, ω2)|ω1=0, ω2=0.   (4.46)

From (4.45) and (4.46),

A + B + C + D = 1.   (4.47)

From (4.45) and (4.47), designing t(n1, n2) is equivalent to determining three independent parameters. It is possible to impose further constraints, such as cos ω|ω=π = T(ω1, ω2)|ω1=π, ω2=π. However, imposing too many constraints reduces the number of degrees of freedom that may be necessary to satisfy other constraints, for instance, the passband and stopband constraints. Increasing the region of support size of t(n1, n2) is possible, but it will increase the region of support size of the resulting 2-D filter h(n1, n2).
Step 1.3. In this step, we impose the passband and stopband contour constraints. We first choose initial values of ωp and ωs. These frequencies will be modified later, and the performance of the resulting filter is not very sensitive to the specific initial choices of ωp and ωs. Since there are an infinite number of (ω1, ω2) on the contours Cp and Cs, it is not in general possible to satisfy the passband and stopband constraints of (4.41a) and (4.41b) exactly with a finite number of parameters in t(n1, n2). Therefore, we define an error criterion such as

Error = ∫∫_{(ω1,ω2)∈Cp} (cos ωp - T(ω1, ω2))^2 dω1 dω2 + ∫∫_{(ω1,ω2)∈Cs} (cos ωs - T(ω1, ω2))^2 dω1 dω2.   (4.48)

Since T(ω1, ω2) is a linear combination of the unknown parameters t(n1, n2) and the Error in (4.48) is a quadratic form of the unknown parameters, minimization of the error expression with respect to t(n1, n2) is a linear problem. In our particular design problem, we minimize the Error in (4.48) with respect to A, B, C, and D under the constraint of (4.47). The solution to this problem with the choice of ωp = 0.5π and ωs = 0.6π is given by

T(ω1, ω2) = -0.47 + 0.67 cos ω1 + 0.47 cos ω2 + 0.33 cos ω1 cos ω2.   (4.49)
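Because the Error in (4.48) is quadratic in (A, B, C, D) and (4.47) is linear, the minimization reduces to a constrained least-squares fit. The sketch below is hypothetical: it samples the contours at discrete points and, for lack of the book's elliptical contour equations, uses circular contours of radii `rp`, `rs`, so it does not reproduce (4.49) numerically.

```python
import numpy as np

def fit_transformation(wp, ws, rp, rs, K=360):
    """Least-squares fit of T = A + B cos w1 + C cos w2 + D cos w1 cos w2
    to the contour constraints (4.41a)-(4.41b), with the constraint
    A + B + C + D = 1 of (4.47) eliminated by substitution."""
    theta = np.linspace(0.0, 2 * np.pi, K, endpoint=False)
    rows, targets = [], []
    for r, w in ((rp, wp), (rs, ws)):
        x = np.cos(r * np.cos(theta))     # cos(w1) on the contour
        y = np.cos(r * np.sin(theta))     # cos(w2) on the contour
        # with A = 1 - B - C - D:  T - 1 = B(x-1) + C(y-1) + D(xy-1)
        rows.append(np.column_stack([x - 1, y - 1, x * y - 1]))
        targets.append(np.full(K, np.cos(w) - 1.0))
    M, t = np.vstack(rows), np.concatenate(targets)
    B, C, D = np.linalg.lstsq(M, t, rcond=None)[0]
    return 1.0 - B - C - D, B, C, D

A, B, C, D = fit_transformation(0.5*np.pi, 0.6*np.pi, 0.5*np.pi, 0.6*np.pi)
```

For circular contours the fit is symmetric in ω1 and ω2, so B = C; the elliptical contours of Figure 4.18 break that symmetry, which is why (4.49) has B ≠ C.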
Step 1.4. The transformation function in (4.49) ensures that T(ω1, ω2) will have fourfold symmetry, ω = 0 will map exactly to (ω1, ω2) = (0, 0), and ω = ωp and ω = ωs will map approximately to (ω1, ω2) ∈ Cp and (ω1, ω2) ∈ Cs. It does not ensure, however, that T(ω1, ω2) will be between -1 and +1 for all (ω1, ω2). Since the frequency transformation used is cos ω = T(ω1, ω2), and -1 ≤ cos ω ≤ 1 for all real ω, T(ω1, ω2) less than -1 or greater than +1 leads to an unacceptable result. To understand this problem more clearly, we consider (4.32) rewritten as

H'(x) = H(ω)|cos ω = x = Σ_{n=0}^{N} b(n) (cos ω)^n |cos ω = x = Σ_{n=0}^{N} b(n) x^n.   (4.50)

When a 1-D filter H(ω) is designed, b(n) is estimated such that H(ω) approximates the desired frequency response over 0 ≤ ω ≤ π. This is equivalent to estimating b(n) such that H'(x) approximates some desired function over the interval -1 ≤ x ≤ 1. Examples of H(ω) and H'(x) are shown in Figure 4.19. Figure 4.19(a) shows the 1-D lowpass filter designed by the Parks-McClellan algorithm for δ1 = 0.08, δ2 = 0.08, ωp = 0.5π, ωs = 0.6π. The corresponding function H'(x) is shown in Figure 4.19(b). It is clear that H'(x) is a well-controlled function for -1 ≤ x ≤ 1 but does not behave in a controlled manner outside that range. In 2-D filter design by the transformation method, H(ω1, ω2) is obtained by

Figure 4.19 Example of (a) H(ω) and (b) H'(x). H(ω) is the frequency response of a 1-D optimal filter. H'(x) is obtained from H(ω) by cos ω = x. Note that the x-axis in H'(x) is reversed in direction.
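The relation between the cosine-series coefficients a(n) of (4.32) and the power-series coefficients b(n) of (4.50) is just the Chebyshev identity cos nω = Cn(cos ω), so numpy's polynomial module can perform the conversion. A small illustrative check (toy coefficients, not from the text):

```python
import numpy as np
from numpy.polynomial import chebyshev

# cosine-series coefficients a(n) of a toy zero-phase H(w) = sum a(n) cos(n w)
a = np.array([0.3, 0.5, 0.2])
b = chebyshev.cheb2poly(a)      # b(n) of H'(x) = sum b(n) x**n, as in (4.50)

w = np.linspace(0, np.pi, 101)
H = sum(a[n] * np.cos(n * w) for n in range(len(a)))
Hp = np.polyval(b[::-1], np.cos(w))   # evaluate H'(cos w)
```

Plotting `np.polyval(b[::-1], x)` for x slightly outside [-1, 1] reproduces the uncontrolled behavior seen in Figure 4.19(b).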
where Tmax and Tmin are the maximum and minimum values of T(ω1, ω2), respectively, over -π ≤ ω1 ≤ π, -π ≤ ω2 ≤ π. In essence, the constants k1 and k2 are chosen such that the minimum and maximum of T(ω1, ω2) become -1 and +1 in T'(ω1, ω2), respectively, and all in-between values are mapped monotonically between -1 and +1. Equation (4.52) ensures that T'(ω1, ω2) will satisfy the condition that

-1 ≤ T'(ω1, ω2) ≤ 1 for all (ω1, ω2).   (4.53)

From (4.52a),

t'(n1, n2) = k1 t(n1, n2) + k2 δ(n1, n2).   (4.54)

The sequence t'(n1, n2) in (4.54) is the transformation sequence designed. If t(n1, n2) has fourfold symmetry, t'(n1, n2) will also have fourfold symmetry. When (4.52) is applied to (4.49), the resulting T'(ω1, ω2) for our specific design problem is

T'(ω1, ω2) = -0.28 + 0.58 cos ω1 + 0.41 cos ω2 + 0.29 cos ω1 cos ω2.   (4.55)

This completes the design of the transformation sequence.
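The rescaling of (4.52) is easy to carry out numerically: evaluate T on a grid, find its extremes, and map them to ±1. A sketch (hypothetical helper name; the grid search stands in for an analytic Tmin/Tmax):

```python
import numpy as np

def rescale_transformation(A, B, C, D, W=513):
    """Apply (4.52): rescale T = A + B cos w1 + C cos w2 + D cos w1 cos w2
    so that its range over [-pi, pi]^2 becomes [-1, +1]."""
    w = np.linspace(-np.pi, np.pi, W)      # includes -pi, 0, and +pi
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    T = A + B*np.cos(w1) + C*np.cos(w2) + D*np.cos(w1)*np.cos(w2)
    Tmin, Tmax = T.min(), T.max()
    k1 = 2.0 / (Tmax - Tmin)
    k2 = -(Tmax + Tmin) / (Tmax - Tmin)
    # T' = k1*T + k2, i.e., t'(n1,n2) = k1*t(n1,n2) + k2*delta(n1,n2)
    return k1*A + k2, k1*B, k1*C, k1*D

A, B, C, D = rescale_transformation(-0.47, 0.67, 0.47, 0.33)
# applied to (4.49), this lands near the book's (4.55):
# roughly (-0.28, 0.58, 0.41, 0.29)
```

For (4.49), Tmax = 1 at (0, 0) and Tmin = -1.28 at (π, π), so k1 ≈ 0.88 and k2 ≈ 0.12.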
We will now consider the effect of the modification in (4.52) on the constraints that we previously imposed on t(n1, n2). We first consider the general nature of the modification. For the transformation function T(ω1, ω2), the mapping used is

cos ω = T(ω1, ω2).   (4.56)

For the new transformation function T'(ω1, ω2), we have

cos ω = T'(ω1, ω2) = k1 T(ω1, ω2) + k2.   (4.57)

Figure 4.20 (a) Contours given by cos ω = T(ω1, ω2) with T(ω1, ω2) in (4.49). Regions of (ω1, ω2) where 1 < |T(ω1, ω2)| are cross-hatched. (b) Frequency response of the filter designed by the transformation method using the 1-D filter in Figure 4.19(a) and the transformation function T(ω1, ω2) in (4.49). Note that the filter has serious problems in the frequency regions cross-hatched in (a).
From (4.60), it is clear that the Cp and Cs contours will map approximately to new frequencies ωp' and ωs'. Therefore, the 1-D filter specification we assumed in Step 1.1 for the purpose of designing t'(n1, n2) must be modified. The result is consistent with the observation that the modification does not affect the shape of the contours, but affects the 1-D frequency associated with each contour. The modification of the 1-D filter specification is automatically done in Step 2, where the 2-D filter specification is translated to the 1-D filter specification based on the t'(n1, n2) designed in Step 1. Step 2, together with a small number of degrees of freedom for T(ω1, ω2), will ensure that the passband regions of the 1-D filter specification will map to at least the passband regions of the 2-D filter specification. This satisfies the condition in (4.41c). Similarly, (4.41d), which applies to the stopband region, is also satisfied. From (4.49) and (4.55),

T(0, 0) = T'(0, 0) = Tmax = 1 = cos 0.   (4.61)

Therefore, the constraint in (4.46) that ω = 0 must map to ω1 = ω2 = 0 is also satisfied in this example. This type of constraint, however, may not be satisfied in general after the original transformation function is modified. Even though (4.46) was a reasonable constraint to impose, it was not necessary.

Steps 2, 3, and 4 were discussed in detail in Section 4.4.2. This completes Design Method 2. Figure 4.21(a) shows the contours given by cos ω = T'(ω1, ω2) with T'(ω1, ω2) in (4.55). Figures 4.21(b) and (c) show the perspective plot and the contour plot of the 2-D filter H(ω1, ω2) designed by following Steps 2, 3, and 4 with T'(ω1, ω2) in (4.55). The size of the filter's region of support is 27 x 27. One additional example of a filter designed by Design Method 2 is shown in Figure 4.22. Figure 4.22 shows the design of another lowpass filter. In designing the filter, the passband and stopband contours in the specification had square shape, and the passband and stopband frequencies used were 0.5π and 0.75π, respectively.

Figure 4.21 (a) Contours given by cos ω = T'(ω1, ω2) with T'(ω1, ω2) in (4.55); (b) frequency response of the 2-D filter designed by Design Method 2, perspective plot; (c) contour plot.
The frequency response of a 1-D zero-phase filter of length 2N + 1 can be written as

H(ω) = Σ_{n=0}^{N} a(n) cos ωn = Σ_{i=0}^{N} ai φi(ω)   (4.62)

where a(n) is simply related to the impulse response h(n), ai = a(i), and φi(ω) = cos ωi. From (4.62), H(ω) can be viewed as a linear combination of N + 1 known basis functions φi(ω). The problem of designing an optimal filter can be stated as

Given ωp, ωs, k = δ1/δ2, and N, determine ai so that δ1 is minimized.   (4.63)

It can be shown that a solution to the problem in (4.63) can be used (see Problem 4.17) to solve other formulations of the optimal design problem, such as the problem of determining ai that minimizes N.

The problem stated above can be shown to be a special case of a weighted Chebyshev approximation problem that can be stated as

The function W(ω) in (4.64) is a positive weighting function, and D(ω) is the desired function that we wish to approximate with H(ω). The error criterion used in (4.64) is the minimization of the maximum error between the desired frequency response and the resulting filter frequency response, and (4.64) is sometimes referred to as a min-max problem as well as a weighted Chebyshev approximation problem. By choosing K, the interval of approximation, such that it consists of the passband and stopband regions, the lowpass design problem of (4.63) is obtained as a special case of (4.64).

applied to the lowpass filter design problem of (4.63), the alternation frequencies of the resulting lowpass filter will include ωp, ωs, and all local extrema (minimum or maximum) of H(ω), with the possible exception of ω = 0 or π. An example of an optimal filter H(ω) is shown in Figure 4.23 for the case N = 9, ωp = 0.5π, ωs = 0.6π, and k = δ1/δ2 = 1. The value δ1 in this case is 0.08. The function H(ω) in Figure 4.23 shows equiripple behavior in the passband and stopband regions. This is typical of optimal filters, and optimal filters are thus sometimes called equiripple filters.

Figure 4.23 Frequency response of a 19-point 1-D optimal lowpass filter.

The Remez multiple exchange algorithm exploits the necessary and sufficient condition given by the alternation theorem to solve the weighted Chebyshev approximation problem. This exchange algorithm was first used to solve the optimal filter design problem by [Parks and McClellan], so it is referred to as the Parks-McClellan algorithm. The optimal filter design method based on the Remez exchange algorithm is an iterative procedure in which the filter is improved in each iteration. Each iteration consists of two steps. One step is determining candidate coefficients ai from candidate alternation frequencies. In this step, we attempt to impose the condition that the magnitude of the error function E(ω) must reach its maximum at the alternation frequencies. The step involves solving N + 2 linear equations for δ1 and the N + 1 coefficients ai, of the form

From (4.5), the frequency response of a 2-D zero-phase filter can be expressed as

where N + 1 is the number of parameters in a(n1, n2), ai is one parameter in a(n1, n2) for a particular (n1', n2'), and φi(ω1, ω2) = cos(ω1n1' + ω2n2'). If there is additional symmetry to h(n1, n2), for instance, fourfold or eightfold symmetry, H(ω1, ω2) can still be expressed in a form similar to (4.67). From (4.67), H(ω1, ω2) can be viewed as a linear combination of N + 1 known basis functions φi(ω1, ω2). The problem of designing an optimal filter can be stated as

Given Ωp, Ωs, k = δ1/δ2, and N, determine ai so that δ1 is minimized.   (4.68)

As in the 1-D case, it can be shown that a solution to (4.68) can be used to solve other formulations of the optimal filter design problem, such as the problem of determining ai that minimizes N. The problem stated in (4.68) can be shown to be a special case of a more general problem, which can be stated as
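Because H(ω1, ω2) in (4.67) is linear in the unknown coefficients, the grid version of the min-max problem (4.68) can be posed directly as a linear program: minimize δ subject to |H - D| ≤ δ at each grid frequency. The sketch below is illustrative only (a brute-force LP, not the exchange algorithm the text describes); the fourfold-symmetric cosine-product basis and all parameter values are assumptions matching the 5 x 5 case of Figure 4.24(a).

```python
import numpy as np
from scipy.optimize import linprog

def minmax_lowpass_2d(M=2, wp=0.4 * np.pi, ws=0.6 * np.pi, G=24):
    """Min-max design of a (2M+1)x(2M+1) zero-phase filter with
    fourfold symmetry, basis cos(n1*w1)cos(n2*w2), on a coarse grid."""
    w = np.linspace(0, np.pi, G)               # one quadrant by symmetry
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    r = np.hypot(w1, w2).ravel()
    keep = (r <= wp) | (r >= ws)               # passband + stopband only
    w1, w2 = w1.ravel()[keep], w2.ravel()[keep]
    D = (r[keep] <= wp).astype(float)          # desired response (k = 1)
    Phi = np.column_stack([np.cos(n1 * w1) * np.cos(n2 * w2)
                           for n1 in range(M + 1) for n2 in range(M + 1)])
    P = Phi.shape[1]
    # variables [a_0..a_{P-1}, delta]; constraints +-(Phi a - D) <= delta
    c = np.zeros(P + 1); c[-1] = 1.0
    A_ub = np.vstack([np.hstack([Phi, -np.ones((len(D), 1))]),
                      np.hstack([-Phi, -np.ones((len(D), 1))])])
    b_ub = np.concatenate([D, -D])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * P + [(0, None)])
    return res.x[:P], res.x[-1]

a, delta = minmax_lowpass_2d()
```

For this 5 x 5 case, Figure 4.24(a) reports δ1 = 0.2670 for the continuous problem; the coarse-grid LP should land in the same neighborhood.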
Figure 4.24 Frequency responses of 2-D optimal lowpass filters. In all cases, In the previous sections, we discussed the problem of specifying and designing an
circularly symmetric filter specification with up = 0.4n, w, = 0.6n, and k = FIR filter. Once the filter is designed, the remaining task is implementation. In
6,/6, = 1 was used. After [Harris and Mersereau]. (a) 5 x 5-point filter with
6, = 0.2670; ( b ) 7 x 7-point filter with 6, = 0.1269; (c) 9 X 9-point filter with this section, we discuss the implementation of an FIR filter.
6, = 0.1141; (d) 11 x 11-point filter with 6, = 0.0569.
4.6.1 Implementation of General FIR Filters
4.5.3 Optimal Filter Design by Frequency Transformation
The filter implementation problem is to realize a discrete system with h(n,, n,),
the impulse response of the designed filter. The simplest method of implementing
A filter designed by the transformation method discussed in Section 4.4 is not in
an FIR filter is to use the convolution sum. Let x(n,, n,) and y(n,, n,) denote the
general an optimal filter. To see this qualitatively, note that the filter h(nl, n2)
input and output of the filter. Then y(n,, n,) is related to x(n,, n,) by
designed by using the transformation method has a great deal of structure built
into it. As part of this structure, the contours given by cos w = T(wl, w2) are
constrained by the region of support size of t(n,, n,) and the functional form of
T(w,, w2). Since the boundaries of passband and stopband regions in the filter
specification do not generally coincide with contours given by cos w = T(wl, w,,), where R, is the region of support of h(n,, n,). From (4.71), the number of
the filter designed by the transformation method has to exceed the filter specifi- arithmetic operations required for each output point is about N multiplications and
cation. By eliminating the structure imposed on h(n,, n2) by the transformation N additions, where N is the number of nonzero coefficients of h(n,, n,). As in
method, we can in general design a filter that exceeds the given filter specification by less than what is possible with the transformation method.

In a very restricted set of cases, however, it is possible to show that an optimal filter can be designed by using the transformation method discussed in Section 4.4. Suppose the transformation sequence t(n1, n2) used in the transformation method is first-order (3 x 3 size) with fourfold symmetry along the n1 and n2 axes, and the contours (such as the passband and stopband contours in the lowpass filter design) in the filter specification exactly match the contours given by cos ω =

Figure 4.25 Frequency response of an optimal lowpass filter with elliptic passband and stopband contours, designed by the transformation method. In a certain restricted set of cases, filters designed by the transformation method are optimal. (a) Perspective plot; (b) contour plot.

Sec. 4.6 Implementation of FIR Filters
Finite Impulse Response Filters Chap. 4

the 1-D case, the realization can be improved by exploiting the symmetry of h(n1, n2). Since h(n1, n2) = h(-n1, -n2), by rewriting (4.71) and combining the two terms that have the same value for h(k1, k2), the number of multiplications can be reduced by about fifty percent without affecting the number of additions.

Any FIR filter can also be implemented by using an FFT algorithm. As shown in Section 3.2.3, the overlap-add method and the overlap-save method can be used to perform the filtering operation. In some cases, this method reduces the number of arithmetic operations significantly, relative to realization by direct convolution.

Using the convolution sum or an FFT algorithm to implement an FIR filter is very general, in that it applies to any FIR filter independent of how it is designed. If there is some structure in h(n1, n2), however, it can be used to reduce the number of arithmetic operations required. For example, the number of multiplications required in the direct convolution case can be reduced by about fifty percent by exploiting the symmetry of h(n1, n2). An implementation that exploits a special structure of h(n1, n2) is generally applicable to only that class of h(n1, n2). The next two sections treat the implementation of FIR filters whose impulse responses have special structures in addition to the structure resulting from h(n1, n2) = h(-n1, -n2).
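The fifty-percent saving from the symmetry h(n1, n2) = h(-n1, -n2) is easy to see in code: the taps at (k1, k2) and (-k1, -k2) are equal, so each pair costs one multiplication applied to the sum of two input samples. The sketch below (plain Python with NumPy; the function name and sizes are hypothetical) is a direct-convolution realization written this way; the number of additions is unchanged, as the text notes.

```python
import numpy as np

def filter_symmetric(x, h):
    """Direct-form filtering of x with a zero-phase FIR filter h of size
    (2M+1) x (2M+1) satisfying h(k1, k2) = h(-k1, -k2).  Each symmetric
    tap pair is applied as h(k1, k2) * (x(n-k) + x(n+k)): one multiply
    per pair instead of two."""
    M = h.shape[0] // 2
    xp = np.pad(x, M)                     # zero-padded input borders
    y = np.zeros(x.shape)
    N1, N2 = x.shape
    for n1 in range(N1):
        for n2 in range(N2):
            c1, c2 = n1 + M, n2 + M       # center of the window in xp
            acc = h[M, M] * xp[c1, c2]    # the unpaired center tap
            for k1 in range(-M, M + 1):
                for k2 in range(-M, M + 1):
                    if (k1, k2) > (0, 0):    # one member of each +/- pair
                        acc += h[M + k1, M + k2] * (
                            xp[c1 - k1, c2 - k2] + xp[c1 + k1, c2 + k2])
            y[n1, n2] = acc
    return y
```

For a (2M+1) x (2M+1) filter this uses ((2M+1)² - 1)/2 + 1 multiplications per output point rather than (2M+1)².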
The most straightforward implementation results when (4.72b) is used directly. From (4.72b), H(ω1, ω2) can be viewed as a parallel combination of b(0), b(1)T(ω1, ω2), b(2)T²(ω1, ω2), . . . , and b(N)T^N(ω1, ω2). Since T^i(ω1, ω2) can be obtained from cascading T^(i-1)(ω1, ω2) with T(ω1, ω2), a straightforward implementation of (4.72b) results in the structure shown in Figure 4.26. Since T(ω1, ω2) corresponds to a finite-extent sequence of (2M1 + 1) x (2M2 + 1) points, the number of arithmetic operations per output point is about (2M1 + 1)(2M2 + 1)N multiplications and (2M1 + 1)(2M2 + 1)N additions, which is proportional to N. For large values of N, this represents considerable computational savings. When N = 20 and M1 = M2 = 1, direct convolution with only symmetry exploitation requires approximately 850 multiplications and 1700 additions per output point, while the realization in Figure 4.26 involves about 180 multiplications and 180 additions. If the symmetry of t(n1, n2) is also exploited, then the number of multiplications can be further reduced by approximately half.

From Figure 4.26, keeping two intermediate results would be sufficient in the implementation. For example, once f1(n1, n2) and g1(n1, n2) are computed, x(n1, n2) is not needed. Once f2(n1, n2) and g2(n1, n2) are computed, f1(n1, n2) and g1(n1, n2) can be eliminated. At any given time, therefore, we need to store only two intermediate results, and the amount of storage required in this implementation will be approximately twice as much as in the direct convolution case.

It has been observed that the implementation in Figure 4.26 tends to be somewhat sensitive to finite precision arithmetic. This characteristic can be improved by using (4.72a) and expressing cos ωn in terms of cos ω(n - 1) and cos ω(n - 2). See Problem 4.22 for further discussion.

4.6.3 Cascade Implementation

A 2-D polynomial cannot, in general, be factored as a product of lower-order polynomials, and H(z1, z2) cannot generally be written in the form of

H(z1, z2) = A ∏_k H_k(z1, z2).   (4.74)

To use a cascade form in the realization of a 2-D FIR filter, therefore, the form in (4.74) should be used explicitly in the design step.

If H(z1, z2) can be expressed in the form given by (4.74), that information can be exploited in reducing the number of arithmetic operations. To see this, suppose H(z1, z2) can be expressed as

H(z1, z2) = H1(z1, z2)H2(z1, z2).   (4.75)

Suppose further that H1(z1, z2) corresponds to an M x M-point FIR filter and that H2(z1, z2) corresponds to an N x N-point FIR filter. The resulting H(z1, z2) will correspond to an FIR filter of size (M + N - 1) x (M + N - 1). If H(z1, z2) is directly implemented by using convolution without exploiting the symmetry of h(n1, n2), computation of each output sample will require approximately (M + N - 1)² multiplications and (M + N - 1)² additions. If H(z1, z2) is implemented by exploiting the structure in (4.75), but without exploiting the symmetry of h1(n1, n2) or h2(n1, n2), computation of each output sample will require approximately M² + N² multiplications and M² + N² additions. The M² mul-
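The trade-off between (M + N - 1)² and M² + N² multiplications can be checked numerically. The sketch below (NumPy; all sizes, names, and the random data are hypothetical) builds an overall filter h = h1 * h2, filters an input both ways, and confirms the two realizations agree while printing the per-output-sample multiplication counts for each.

```python
import numpy as np

def conv2_full(a, b):
    """Plain 2-D linear convolution (full output), written out directly."""
    (P1, P2), (Q1, Q2) = a.shape, b.shape
    out = np.zeros((P1 + Q1 - 1, P2 + Q2 - 1))
    for i in range(P1):
        for j in range(P2):
            out[i:i + Q1, j:j + Q2] += a[i, j] * b
    return out

M, N = 5, 7
rng = np.random.default_rng(1)
h1 = rng.standard_normal((M, M))              # M x M-point filter H1
h2 = rng.standard_normal((N, N))              # N x N-point filter H2
x = rng.standard_normal((8, 8))

h = conv2_full(h1, h2)                        # overall (M+N-1) x (M+N-1) filter
direct = conv2_full(x, h)                     # implement H in one convolution
cascade = conv2_full(conv2_full(x, h1), h2)   # implement H1, then H2

print(np.allclose(direct, cascade))           # True: same filter either way
print((M + N - 1) ** 2, M * M + N * N)        # 121 vs 74 multiplies per output
```

The equality holds because convolution is associative; only the operation count (and the rounding behavior) differs between the two realizations.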
[Figure 4.27: block diagram in which x(n1, n2) is passed through a lowpass filter with cutoff frequency Ω2 and a highpass filter with cutoff frequency Ω1, with Ω1 < Ω2, to produce y(n1, n2).]
longer a linear combination of the parameters that need to be determined in the design. For example, suppose H1(z1, z2) and H2(z1, z2) have forms given by

H1(z1, z2) = a + bz1^-1 + bz1 + cz2^-1 + cz2   (4.77)

and H2(z1, z2) = d + ez1^-1 + ez1. The function H(z1, z2) is given by

H(z1, z2) = ad + 2be + (ae + bd)z1^-1 + (ae + bd)z1 + cdz2^-1 + cdz2 + bez1^-2 + bez1^2 + cez1^-1 z2 + cez1 z2^-1 + cez1^-1 z2^-1 + cez1 z2.   (4.78)

Clearly, the coefficients in H(z1, z2) that need to be determined in the design step are nonlinear functions of a, b, c, d, and e.

In some cases, it is possible to design a complicated filter by cascading simpler filters. For example, by cascading a highpass filter with a lowpass filter, it is possible to design a bandpass filter. This is shown in Figure 4.27. If we have efficient ways to design a lowpass filter and a highpass filter with impulse response of a certain size, then a bandpass filter with a larger region of support for the impulse response may be obtained by cascading the two simpler filters. In this case, the frequency responses of the filters designed are no longer nonlinear functions of the parameters determined in the design step.

Figure 4.27 Realization of a bandpass filter by cascading a highpass filter with a lowpass filter.

REFERENCES

For a review of 1-D FIR filter design methods, see [Rabiner; Rabiner et al.]. For the effect of symmetry on 2-D FIR filter design, see [Aly and Fahmy; Rajan et al.]. For the importance of phase in digital filters applied to image processing problems, see [Huang et al.].

For a survey of 1-D windows, see [Harris (1978)]. For 2-D window design methods, see [Huang; Kato and Furukawa; Yu and Mitra]. For a discussion on the initial choice of window parameters in the window design method, see [Speake and Mersereau].

For readings on the frequency transformation method for FIR filter design, see [McClellan; Mersereau et al.; Mersereau]. For design of circularly symmetric lowpass filters using transformations, see [Hazra and Reddy]. For implementation of filters designed by the transformation method, see [Mecklenbrauker and Mersereau; McClellan and Chan].

For books on functional approximation theory, see [Cheney; Rice]. For the Parks-McClellan algorithm for 1-D optimal filter design, see [Parks and McClellan; McClellan et al.]. For one of the earlier attempts to design 2-D optimal FIR filters based on a computationally very expensive linear programming approach, see [Hu and Rabiner]. For the Remez exchange type approach to the 2-D optimal FIR filter design problem, see [Kamp and Thiran; Harris and Mersereau]. For a design based on the lp norm, which can be used to design an optimal filter approximately, see [Lodge and Fahmy]. For an optimal filter design approach based on general functional optimization methods, see [Charalambous]. For an approach to the design of FIR filters with a special structure such as the cascade of simple filters, see [Abramatic and Faugeras].

J. F. Abramatic and O. D. Faugeras, Sequential convolution techniques for image filtering, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-30, February 1982, pp. 1-10.

S. A. H. Aly and M. M. Fahmy, Symmetry in two-dimensional rectangularly sampled digital filters, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-29, August 1981, pp. 794-805.

C. Charalambous, The performance of an algorithm for minimax design of two-dimensional linear phase FIR digital filters, IEEE Trans. Circuits Syst., Vol. CAS-32, October 1985, pp. 1016-1028.

E. W. Cheney, Introduction to Approximation Theory. New York: McGraw-Hill, 1966.

D. B. Harris, Iterative procedures for optimal Chebyshev design of FIR digital filters, S.M. thesis, MIT EECS Dept., February 1976.

D. B. Harris and R. M. Mersereau, A comparison of algorithms for minimax design of two-dimensional linear phase FIR digital filters, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-25, December 1977, pp. 492-500.

F. J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE, Vol. 66, January 1978, pp. 51-83.

S. N. Hazra and M. S. Reddy, Design of circularly symmetric low-pass two-dimensional FIR digital filters using transformation, IEEE Trans. Circuits Syst., Vol. CAS-33, October 1986, pp. 1022-1026.

J. V. Hu and L. R. Rabiner, Design techniques for two-dimensional digital filters, IEEE Trans. Audio Electroacoust., Vol. AU-20, October 1972, pp. 249-257.

T. S. Huang, Two-dimensional windows, IEEE Trans. Audio Electroacoust., Vol. AU-20, March 1972, pp. 88-90.

T. S. Huang, J. W. Burnett, and A. G. Deczky, The importance of phase in image processing filters, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-23, December 1975, pp. 529-542.

Y. Kamp and J. P. Thiran, Chebyshev approximation for two-dimensional nonrecursive digital filters, IEEE Trans. Circuits Syst., Vol. CAS-22, March 1975, pp. 208-218.

H. Kato and T. Furukawa, Two-dimensional type-preserving circular windows, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-29, August 1981, pp. 926-928.
We wish to design an FIR filter whose impulse response h(n1, n2) is zero outside -5 ≤ n1 ≤ 5 and -5 ≤ n2 ≤ 5 by minimizing

Error = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} |Hd(ω1, ω2) - H(ω1, ω2)|² dω1 dω2.

Determine h(n1, n2) and show that it minimizes the above error expression.

Figure P4.9

Determine h(n1, n2), the impulse response of the 2-D filter.

4.7. In the frequency sampling method, we ensure that H(ω1, ω2), the 2-D filter designed, will be identical to some given Hd(ω1, ω2) at a set of frequencies. The frequencies in the chosen set do not have to be on the Cartesian grid, and the zero-phase filter h(n1, n2) can be determined by solving a set of linear equations of the form

For M independent values in h(n1, n2), M linear equations almost always have a unique solution for h(n1, n2). However, it is possible to select a set of M distinct frequencies

4.10. Suppose we design a 2-D zero-phase lowpass filter H(ω1, ω2) from a 1-D zero-phase lowpass filter H(ω) by

where H(ω1, ω2) is the frequency response of the resulting digital filter. We wish the resulting filter H(ω1, ω2) to be symmetric with respect to the ω1 axis, the ω2 axis, the ω2 = ω1 axis, and the ω2 = -ω1 axis.
(a) Determine one set of A, B, and C that could be used. Explain your answer. You may assume that a reasonable 1-D lowpass filter H(ω) is available to you.
(b) Suppose h(n), the impulse response of the 1-D filter used, is nonzero for -2 ≤ n ≤ 2 and zero outside -2 ≤ n ≤ 2. Determine the number of nonzero coefficients in h(n1, n2).

Chap. 4 Problems
4.11. We wish to design a 2-D FIR filter from a 1-D FIR filter by the transformation method. Suppose the transformation sequence t(n1, n2) is zero except for the five points shown in the following figure.

Figure P4.11

(a) What are the minimum constraints on the coefficients a, b, c, d, and e for the resulting 2-D filter to be zero phase [i.e., for the filter frequency response H(ω1, ω2) to be real]? Assume the 1-D filter used in the transformation method is zero phase.
(b) Suppose a = b = e = 0 and c = d = 1/2. Suppose also that the 1-D zero-phase filter used is a lowpass filter with a passband region given by -π/2 ≤ ω ≤ π/2. Sketch the complete passband region (or regions) of the resulting 2-D filter. Label the axes in your sketch.

4.12. Suppose we design a 2-D FIR filter H(ω1, ω2) from a 1-D FIR filter H(ω) by

The filter designed by using the above transformation is typically not circularly symmetric. One clever student suggested the following approach:

Is this a good approach? If so, explain why. If not, discuss the problems of this approach.

4.14. Consider a 3 x 3-point transformation sequence t(n1, n2). We design a 2-D zero-phase filter H(ω1, ω2) from a 1-D zero-phase filter H(ω) by

Figure P4.14: Hd(ω1, ω2) = 1 in the shaded region, 0 in the unshaded region.

The 2-D filter H(ω1, ω2) has to satisfy the following specification:

0.95 ≤ H(ω1, ω2) ≤ 1.05,  (ω1, ω2) ∈ passband region (see figure)
-0.02 ≤ H(ω1, ω2) ≤ 0.02,  (ω1, ω2) ∈ stopband region (see figure)

Determine the filter specification that the 1-D filter H(ω) has to satisfy so that the resulting 2-D filter H(ω1, ω2) will be guaranteed to meet the above filter specification.
P1[cos ω] = cos ω
P2[cos ω] = 2 cos ω P1[cos ω] - P0[cos ω]
4.16. Consider a 1-D zero-phase FIR filter h(n) with length 2N + 1. The frequency response H(ω) can be expressed as

H(ω) = Σ_{n=0}^{N} a(n) cos ωn   (2)

where a(0) = h(0) and a(n) = 2h(n) for n ≥ 1. In this problem, we show that H(ω) can also be expressed in the form of

H(ω) = Σ_{n=0}^{N} b(n) (cos ω)^n   (3)

where b(n) can be simply related to a(n) or h(n). Note that

cos (A + B) = cos A cos B - sin A sin B

and sin (A + B) = sin A cos B + cos A sin B.

(a) Show that cos 2ω = 2 cos ω cos ω - 1.
(b) Show that cos 3ω = 2 cos ω cos 2ω - cos ω.
(c) More generally, show that for n ≥ 2, cos ωn = 2 cos ω cos ω(n - 1) - cos ω(n - 2).

Determine a(n) in (2) and b(n) in (3).

Figure P4.16

4.17. We wish to design a 2-D zero-phase lowpass filter with design specification parameters δp (passband tolerance), δs (stopband tolerance), Rp (passband region), and Rs (stopband region). We assume that the impulse response of the filter designed has a region of support of (2N + 1) x (2N + 1) points. Suppose we have developed an algorithm that solves the following problem:

Given Rp, Rs, k = δp/δs, and N, determine h(n1, n2) so that δp is minimized.

We'll refer to the algorithm as Algorithm A.
(a) Using Algorithm A, develop a method that solves the following problem:

Given Rp, Rs, δp, and δs, determine h(n1, n2) so that N is minimized.
(b) Suppose Rs is given by

If V(ωi) for 0 ≤ i ≤ N are independent vectors for any choice of distinctly different ωi ∈ K, where K is some known region, then the functions φi(ω) are said to form a Chebyshev set or satisfy the Haar condition. Show that φi(ω) = cos(ω1 n1^i + ω2 n2^i) for integers n1^i and n2^i do not satisfy the Haar condition for K given by 0 ≤ ω1 ≤ π, 0 ≤ ω2 ≤ π.

4.20. In some geophysical applications, it is useful to design a fan filter whose ideal frequency response Hd(ω1, ω2) is shown in the following figure.

Figure P4.20: Hd(ω1, ω2) = 1 in the shaded region, 0 in the unshaded region.

One approach suggested is to design a 2-D zero-phase fan filter H(ω1, ω2) from a 1-D zero-phase filter H(ω) by

In Section 4.6.2, we developed an implementation method based directly on (2). An alternate implementation method can be developed by using (1) and expressing cos ωn in terms of cos ω(n - 1) and cos ω(n - 2). Specifically, let cos ωn be denoted by Pn[cos ω]. From (1),

From Problem 4.16, Pn[·] is the nth-order Chebyshev polynomial and is given by

(a) Show that Pn[T(ω1, ω2)] for n ≥ 2 can be obtained from P_{n-1}[T(ω1, ω2)] and P_{n-2}[T(ω1, ω2)] by

Pn[T(ω1, ω2)] = 2T(ω1, ω2) P_{n-1}[T(ω1, ω2)] - P_{n-2}[T(ω1, ω2)].

(b) From (3) and the result in (a), show that H(ω1, ω2) for N = 6 can be implemented by Figure P4.22(b).
(c) Compare the computational requirements (arithmetic operations) using the structure in (b) with direct convolution of h(n1, n2) with the input.
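The recursion behind Problems 4.16 and 4.22 is cos ωn = 2 cos ω cos ω(n - 1) - cos ω(n - 2), with Pn[cos ω] = cos ωn the nth-order Chebyshev polynomial. It can be spot-checked numerically; the sketch below (NumPy; the test frequency and coefficient values are arbitrary stand-ins) also uses the Chebyshev-to-power-series conversion to obtain b(n) from a(n) as in Problem 4.16.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

w = 0.73                                  # an arbitrary test frequency
c = [np.cos(n * w) for n in range(8)]
for n in range(2, 8):                     # cos(nw) = 2 cos(w) cos((n-1)w) - cos((n-2)w)
    assert np.isclose(c[n], 2 * np.cos(w) * c[n - 1] - c[n - 2])

# Since cos(nw) = T_n(cos w), rewriting H(w) = sum_n a(n) cos(nw) as
# H(w) = sum_n b(n) (cos w)^n is a Chebyshev-to-power basis change:
a = np.array([1.0, 0.5, 0.25, 0.125])     # hypothetical a(n), n = 0..3
b = C.cheb2poly(a)                        # the corresponding b(n)
H_a = sum(a[n] * np.cos(n * w) for n in range(4))
H_b = sum(b[n] * np.cos(w) ** n for n in range(4))
print(np.isclose(H_a, H_b))               # True
```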
4.23. Let H(z1, z2, z3) represent a 3-D FIR filter. Suppose H(z1, z2, z3) can be expressed as

H(z1, z2, z3) = H1(z1, z2, z3)H2(z1, z2, z3)

where H1(z1, z2, z3) and H2(z1, z2, z3) each has a region of support of N x N x N points.
(a) Compare the number of arithmetic operations required for convolving the input with h(n1, n2, n3) to the number required for convolving the input with h1(n1, n2, n3) first and then convolving the result with h2(n1, n2, n3).
(b) Considering the number of degrees of freedom, argue that H(z1, z2, z3) in general cannot be expressed as H1(z1, z2, z3)H2(z1, z2, z3). A cascade structure, therefore, is not a general structure that can be used in the implementation of a general FIR filter H(z1, z2, z3).
(c) Repeat (a) when H(z1, z2, z3) can be expressed as
In Section 5.5, the frequency domain design approach is discussed. In Section 5.6, we discuss the problem of realizing IIR filters. In Section 5.7, we discuss the advantages and disadvantages of FIR and IIR filters.
without allowing any multiplication of terms (for instance, by a constant or z1^k1 z2^k2), and solve for H(z1, z2) = Y(z1, z2)/X(z1, z2). This will result in (5.2). If we follow the convention stated above for interpreting H(z1, z2), then there will be a unique relationship between the system function in (5.2) and the computational procedure in (5.3). Just as we earlier obtained a specific computational procedure from a difference equation by adopting a notational convention, we have here obtained a specific computational procedure from H(z1, z2) by adopting a notational convention. Unless otherwise specified, the convention of expressing H(z1, z2) in the form of (5.2) and equating it to the computational procedure in (5.3) will be followed throughout this chapter.

The computational procedure in (5.3) can be represented by the output and input masks shown in Figure 5.1 for a specific choice of Ra and Rb. The system function in (5.2) can be represented by the two sequences a(n1, n2) and b(n1, n2) shown in Figure 5.2. From Figures 5.1 and 5.2, the output and input masks are very simply related to a(n1, n2) and b(n1, n2). The regions of support of the output and input masks, for example, are reflections of the regions of support of a(n1, n2) and b(n1, n2) with respect to the origin. Since the output and input masks are uniquely related to a(n1, n2) and b(n1, n2), they can be used interchangeably. The sequences a(n1, n2) and b(n1, n2) will be referred to as the filter coefficients of an IIR filter.

Figure 5.2 Sequences (a) a(n1, n2) and (b) b(n1, n2) corresponding to the computational procedure with output and input masks shown in Figure 5.1.

The first step in the design of an IIR filter is usually an initial determination of Ra and Rb, the regions of support of a(n1, n2) and b(n1, n2). The regions of support of a(n1, n2) and b(n1, n2) are determined by several considerations. One is that the resulting system must be recursively computable; this requires a(n1, n2) to be a wedge support sequence. Another consideration is the approximate region of support that we wish the resulting filter h(n1, n2) to have. If we determine the filter coefficients by attempting to approximate some desired impulse response hd(n1, n2) in the spatial domain, we will want to choose Ra and Rb such that h(n1, n2) will have at least approximately the same region of support as hd(n1, n2). If Ra and Rb have the same wedge support, then it can be shown (Problem 5.3) that h(n1, n2) also has exactly the same wedge support. If we wish to have a first-quadrant support h(n1, n2), therefore, a reasonable choice would be first-quadrant support a(n1, n2) and b(n1, n2). Another consideration relates to the filter specification parameters. In lowpass filter design, for example, a small δp, δs, and transition region will generally require a larger number of filter coefficients. It is often difficult to determine the number of filter coefficients required to meet a given filter specification for a particular design algorithm, and an iterative procedure may become necessary.
Figure 5.1 (a) Output mask and (b) input mask corresponding to the computational procedure in (5.3) for a specific choice of Ra and Rb.

Sec. 5.1 The Design Problem
Infinite Impulse Response Filters Chap. 5

One major difference between IIR and FIR filters is on issues related to stability. An FIR filter is always stable as long as h(n1, n2) is bounded (finite) for all (n1, n2), so stability is never an issue. With an IIR filter, however, ensuring stability is a major task. One approach to designing a stable IIR filter is to impose a special structure on H(z1, z2) such that testing the stability and stabilizing an unstable filter become relatively easy tasks. Such an approach, however, tends to impose a severe constraint on the design algorithm or to highly restrict the class of filters that can be designed. For example, if H(z1, z2) has a separable denominator polynomial of the form A1(z1)A2(z2), testing the stability and stabilizing an unstable H(z1, z2) without affecting the magnitude response is a 1-D problem, and is consequently quite straightforward. However, the class of filters that can be designed with a separable denominator polynomial without a significant increase in the number of coefficients in the numerator polynomial of H(z1, z2) is restricted. In addition, the filter coefficients a(n1, n2) are nonlinear functions of the unknown parameters to be determined, and this imposes a severe restriction on some design approaches. An alternative is to design a filter without considering the stability
issue, and then test the stability of the resulting filter and attempt to stabilize it if it proves unstable. Although testing stability and stabilizing an unstable filter are not easy problems, this approach does not impose severe constraints on design algorithms and does not unduly restrict the class of filters that can be designed. For these reasons, we will stress this approach to designing stable IIR filters in this chapter.

In the 1-D case, there are two standard approaches to designing IIR filters. One is to design the filter from an analog system function, and the other is to design directly in the discrete domain. The first approach is typically much simpler and more useful than the second. Using an elliptic analog filter's system function and the bilinear transformation, for example, we can design optimal IIR lowpass, highpass, bandpass, and bandstop filters by following a few simple steps. Unfortunately, this approach is not useful in the 2-D case. In the 1-D case, this approach exploits the availability of many simple methods to design 1-D analog filters. Simple methods do not exist for the design of 2-D analog filters.

The second approach, designing an IIR filter directly in the discrete domain, can be classified into two categories. The first is the spatial domain design approach, where filters are designed by using an error criterion in the spatial domain. The second is the frequency domain design approach, where filters are designed by using an error criterion in the frequency domain. The magnitude response of an IIR filter is often specified by using the tolerance scheme discussed in Section 4.2, and the performances of different design algorithms are often compared on the basis of the tolerance scheme. Therefore, the weighted Chebyshev error criterion, also known as the min-max error criterion, is a natural choice for designing IIR filters. An error criterion of this type, however, leads to a highly nonlinear problem. As a result, IIR filters are often designed in the spatial or frequency domain based on some reasonable error criterion that leads to simpler design algorithms. This is analogous to FIR filter design by the window, frequency sampling, and transformation methods, which are not based on the weighted Chebyshev error criterion. In Section 5.2, we discuss spatial domain IIR filter design. In Section 5.5, frequency domain IIR filter design is discussed.

5.2 SPATIAL DOMAIN DESIGN

In spatial domain IIR filter design, some desired or ideal spatial domain response of a system to a known input is assumed given. The filter coefficients are estimated such that the response of the designed filter to the known input is as close as possible in some sense to the desired response. This does not, of course, minimize the Chebyshev error norm, but it is a natural approach if we wish the filter to preserve some desirable spatial domain properties. The input often used in IIR filter design is δ(n1, n2), and the desired impulse response that is assumed given is denoted by hd(n1, n2).

Spatial domain design can be viewed as a system identification problem. The system identification problem occurs in a number of different contexts, so it has received considerable attention in the literature. Suppose we have an unknown system that we wish to model with a rational system function H(z1, z2). Suppose further that we use δ(n1, n2) as an input to the system and observe hd(n1, n2), the output of the unknown system. One approach to estimating the system model parameters [filter coefficients a(n1, n2) and b(n1, n2) in our case] is to require the impulse response of the designed system to be as close as possible in some sense to hd(n1, n2).

In the system identification literature, modeling a system by H(z1, z2) in (5.2) is called auto-regressive moving-average (ARMA) modeling. When all b(k1, k2) in (5.2) are zero except b(0, 0), so that the numerator is a constant, it is called auto-regressive (AR) or all-pole modeling. When all the values of a(k1, k2) in (5.2) are zero except a(0, 0), so that the denominator is a constant, it is called moving-average (MA) or all-zero modeling. Designing an IIR filter can be viewed as ARMA modeling, and designing an FIR filter can be viewed as MA modeling.

Among many possibilities, one error criterion often used in the filter design is

Error = Σ_{(n1,n2)∈Re} e²(n1, n2)   (5.4a)

where e(n1, n2) = hd(n1, n2) - h(n1, n2)   (5.4b)

and Re is the region of support of the error sequence. Ideally, Re is all values of (n1, n2). In practice, however, it extends only over a finite region of (n1, n2), where hd(n1, n2) has significant energy. The mean square error in (5.4) is chosen because it is used widely in a number of system identification problems and some variation of it serves as the basis for a number of simple methods developed to estimate system parameters.

Minimizing the Error in (5.4) with respect to a(n1, n2) and b(n1, n2) is a nonlinear problem. To illustrate this, let us consider a simple example where H(z1, z2) is given by

where a1, a2, and b are the filter coefficients to be estimated. The computational procedure corresponding to (5.5) is given by

When x(n1, n2) = δ(n1, n2), y(n1, n2) is h(n1, n2), which is given by
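Equations (5.5)-(5.7) are not legible in this excerpt, so the sketch below assumes a simple stand-in of the kind the example suggests: H(z1, z2) = b / (1 + a1 z1^-1 + a2 z2^-1), a hypothetical first-quadrant filter with three coefficients. Running its computational procedure with x(n1, n2) = δ(n1, n2) generates h(n1, n2) recursively:

```python
import numpy as np

# Hypothetical coefficients for the assumed model
# H(z1, z2) = b / (1 + a1 z1^-1 + a2 z2^-1).
a1, a2, b = 0.5, -0.3, 2.0

def impulse_response(K):
    """Run the computational procedure
    y(n1, n2) = -a1 y(n1-1, n2) - a2 y(n1, n2-1) + b x(n1, n2)
    with x = delta, over a K x K first-quadrant region."""
    h = np.zeros((K, K))
    for n1 in range(K):
        for n2 in range(K):
            y = b if (n1, n2) == (0, 0) else 0.0
            if n1 > 0:
                y -= a1 * h[n1 - 1, n2]
            if n2 > 0:
                y -= a2 * h[n1, n2 - 1]
            h[n1, n2] = y
    return h

h = impulse_response(4)
print(h[0, 0], h[1, 0], h[0, 1])   # b, -a1*b, -a2*b
```

The first few samples come out as h(0, 0) = b, h(1, 0) = -a1·b, and h(0, 1) = -a2·b, which is what makes the mean square error a nonlinear function of (a1, a2, b).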
From (5.4) and (5.7), the error is given by

Error = (hd(0, 0) - b)² + (hd(1, 0) + a1b)² + (hd(0, 1) + a2b)²   (5.8)

Clearly, minimizing the Error in (5.8) with respect to the filter coefficients a1, a2, and b is a nonlinear problem.

One approach to minimizing the Error in (5.4) is to use well-known standard descent or hill-searching algorithms, which apply to a wide class of system identification problems, or to use somewhat ad hoc iterative procedures that are specific to the IIR filter design problem. An alternate approach is to slightly modify the Error in (5.4), so that the resulting algorithm leads to closed-form solutions that require solving only sets of linear equations. We will begin with these suboptimal, but much simpler, algorithms.

5.2.1 Linear Closed-Form Algorithms

The algorithms we discuss in this section lead to closed-form solutions that require solving only sets of linear equations. They are considerably simpler computationally than the iterative procedures discussed in the next two sections. However, none of them minimizes the Error in (5.4).

The methods discussed in this section are based on the observation that a reasonable modification of the error expression in (5.4) transforms a nonlinear problem to a linear one. Consider a computational procedure given by

We will assume that there are p unknown values of a(n1, n2) and q + 1 unknown values of b(n1, n2), and thus a total of N = p + q + 1 filter coefficients to be determined. Replacing* x(n1, n2) with δ(n1, n2) and y(n1, n2) with h(n1, n2) in (5.9) and noting that Σ_{(k1,k2)∈Rb} b(k1, k2) δ(n1 - k1, n2 - k2) is b(n1, n2), we have

*The general philosophy behind the methods we develop does not restrict us to this choice of x(n1, n2).

If we replace h(n1, n2) in both sides of (5.10) with the given hd(n1, n2), then the equality in (5.10) will not hold:

Since we wish to approximate hd(n1, n2) as well as we can with h(n1, n2), it is reasonable to define an error sequence eM(n1, n2) as the difference between the left-hand and right-hand side expressions of (5.11):

It is clear that eM(n1, n2) in (5.12) is not the same as e(n1, n2) in (5.4b). The subscript M in eM(n1, n2) is used to emphasize that eM(n1, n2) is a modification of e(n1, n2). However, it is linear in the unknown filter coefficients a(n1, n2) and b(n1, n2).

There are many different interpretations of (5.12). One useful interpretation can be obtained by rewriting (5.12). Expressing the right-hand side of (5.12) using the convolution operation, we find that (5.12) leads to

This equation also shows that eM(n1, n2) is linear in a(n1, n2) and b(n1, n2). Since H(z1, z2) = B(z1, z2)/A(z1, z2) and therefore b(n1, n2) = h(n1, n2) * a(n1, n2), (5.13) can be rewritten as

eM(n1, n2) = a(n1, n2) * hd(n1, n2) - a(n1, n2) * h(n1, n2).   (5.14)

From (5.14), it is clear that eM(n1, n2) in (5.12) is not defined in the domain of hd(n1, n2) and h(n1, n2). Instead, it is defined in a new domain where both hd(n1, n2) and h(n1, n2) are prefiltered with a(n1, n2). For this reason, methods based on (5.12) or equivalently (5.14) are called indirect signal modeling methods. Figure 5.3 shows the difference between e(n1, n2) in (5.4b) and eM(n1, n2) in (5.12). Figure 5.3(a) shows e(n1, n2), which is defined as the difference between hd(n1, n2) and h(n1, n2), corresponding to (5.4b). Figure 5.3(b) shows eM(n1, n2) defined in the domain where both hd(n1, n2) and h(n1, n2) are prefiltered, corresponding to (5.14). A simplification of Figure 5.3(b) is shown in Figure 5.3(c), which corresponds to (5.13).

The observation that eM(n1, n2) defined in (5.12) or equivalently in (5.14) is linear in the filter coefficients a(n1, n2) and b(n1, n2) can be used in a variety of ways to develop closed-form algorithms for estimating a(n1, n2) and b(n1, n2). We will now discuss a few representative methods.

Padé matching. In Padé matching [Padé], more often known as "Padé approximation," eM(n1, n2) in (5.12) is set to zero for a finite region of (n1, n2) that includes N = p + q + 1 points. This results in

eM(n1, n2) = 0,   (n1, n2) ∈ R_Padé   (5.15)

where R_Padé consists of N = p + q + 1 points. Since eM(n1, n2) is linear in a(n1, n2) and b(n1, n2), (5.15) is a set of N linear equations in N unknowns. The region R_Padé includes Rb and is usually chosen such that hd(n1, n2) has large amplitudes for (n1, n2) ∈ R_Padé. If hd(n1, n2) has wedge support, R_Padé is typically chosen near the origin. If R_Padé is properly chosen, h(n1, n2) will match hd(n1, n2) exactly for N points of (n1, n2). The set of N linear equations for N unknown
Figure 5.3 Mean square and modified mean square errors. (a) e(n₁, n₂) used in the mean square error criterion; (b) e_M(n₁, n₂) used in the modified mean square error criterion; (c) simplification of (b).

parameters in (5.15) typically has a unique solution. If there are multiple solutions, the number of unknown parameters N can be reduced or R_Pade can be expanded. If there are no solutions, one approach is to choose a different region R_Pade.

One problem with Pade matching is that the resulting filter may not be stable. This is typically the case with other spatial domain design techniques, which we will discuss; but the problem is more severe with Pade matching. As we will discuss later, other methods attempt to reduce e_M(n₁, n₂) for all regions of (n₁, n₂). As long as h_d(n₁, n₂) is stable, it is likely that h(n₁, n₂) will be stable. Another, much more serious, problem with Pade matching is that e_M(n₁, n₂) is minimized only over N = p + q + 1 points of (n₁, n₂), an area that is typically much smaller than the effective extent of h_d(n₁, n₂). If we wish to reduce e_M(n₁, n₂) over a larger region of (n₁, n₂), then we have to increase the model order of the filter. The only exception is when the desired h_d(n₁, n₂) can be exactly represented by a low-order rational model; in practice, this is not likely to be the case. If we wish to reduce e_M(n₁, n₂) over all (n₁, n₂), a reasonable error criterion is

Error = Σ_{n₁=−∞}^{∞} Σ_{n₂=−∞}^{∞} e²_M(n₁, n₂) = E₁ + E₂   (5.18a)

where

E₁ = Σ_{(n₁,n₂)∈R_b} e²_M(n₁, n₂)   (5.18b)

and

E₂ = Σ_{(n₁,n₂)∉R_b} e²_M(n₁, n₂).   (5.18c)

The expression E₁ in (5.18b) consists of q + 1 terms, and E₂ in (5.18c) consists of a large number of terms. Consider E₂ first. From (5.12) and (5.18c), E₂ can be expressed as

E₂ = Σ_{(n₁,n₂)∉R_b} (a(n₁, n₂) ∗ h_d(n₁, n₂) − b(n₁, n₂))².   (5.19)

For (n₁, n₂) ∉ R_b, b(n₁, n₂) = 0, and therefore (5.19) simplifies to

E₂ = Σ_{(n₁,n₂)∉R_b} (a(n₁, n₂) ∗ h_d(n₁, n₂))².   (5.20)

We observe that E₂ does not depend on b(n₁, n₂). Now consider E₁. Since E₁ consists of q + 1 terms which are quadratic in a(n₁, n₂) and b(n₁, n₂) and there are q + 1 terms of b(n₁, n₂), we can choose b(n₁, n₂) such that E₁ is zero for any set of a(n₁, n₂). This means that minimizing the Error in (5.18a) with respect to a(n₁, n₂) is equivalent to minimizing E₂ with respect to a(n₁, n₂). This observation
272 Infinite Impulse Response Filters Chap. 5
can also be verified by explicitly writing down p + q + 1 linear equations given by (5.17).

Minimizing E₂ in (5.18) with respect to a(n₁, n₂) results in p linear equations for p unknowns, given by (5.21a), where

r(k₁, k₂; l₁, l₂) = Σ_{(n₁,n₂)∉R_b} h_d(n₁ − k₁, n₂ − k₂) h_d(n₁ − l₁, n₂ − l₂).   (5.21b)

We can see from (5.23) that obtaining b(n₁, n₂) does not require solving q + 1 simultaneous linear equations, and the coefficients of b(n₁, n₂) can be solved for one at a time. Noting that a(0, 0) = 1, we can also express (5.23) as (5.24). Equation (5.24) can also be obtained from (5.13) and (5.22). The method discussed above can be viewed as a straightforward application of a system identification method developed by [Prony] to the problem of designing a 2-D IIR filter. Since this method was first applied to the 2-D IIR filter design problem by [Shanks et al.], it is sometimes referred to as Shanks's method.

The advantages of Prony's method over Pade matching are clear. Like Pade matching, Prony's method will estimate the model parameters exactly if h_d(n₁, n₂) can be represented exactly by a low-order rational model. In addition, the error e_M(n₁, n₂) is reduced over a much larger region of (n₁, n₂). Although there is no guarantee that Prony's method will result in a stable filter, it is more likely than Pade matching to do so. Qualitatively, an unstable filter is likely to result in an h(n₁, n₂) with large amplitude, and e(n₁, n₂) in (5.4) or e_M(n₁, n₂) in (5.12) will tend to be large for some region of (n₁, n₂) for a stable h_d(n₁, n₂). Since Prony's method attempts to reduce the total square error, the resulting filter is likely to be stable.

There are a number of variations to the method described above. For example, both a(n₁, n₂) and b(n₁, n₂) have been estimated by minimizing the Error in (5.16). Because of the specific choice of e_M(n₁, n₂), however, b(n₁, n₂) was determined essentially from only q + 1 terms of e_M(n₁, n₂). One approach to improving the estimate of b(n₁, n₂) is to use the original error expression in (5.4), which is repeated:

Error = Σ_{(n₁,n₂)∈R_e} e²(n₁, n₂) = Σ_{(n₁,n₂)∈R_e} (h_d(n₁, n₂) − h(n₁, n₂))².

In either case, the resulting equations for b(n₁, n₂) are given by (5.26). Since h_d(n₁, n₂) and v(n₁, n₂) are known in (5.26), the Error in (5.26) is a quadratic form of the unknown parameters b(n₁, n₂) and therefore can be estimated by solving a set of linear equations. Differentiating the Error in (5.26) with respect to b(l₁, l₂) for (l₁, l₂) ∈ R_b and setting it to zero yields (5.27). Equation (5.27) is a set of q + 1 linear equations for q + 1 unknowns of b(n₁, n₂).

Figure 5.4 Error signal e(n₁, n₂) used in estimating b(n₁, n₂) in the modified Prony's method.

One major advantage of all the methods discussed in this section is that they lead to closed-form solutions that require solving sets of linear equations. This has been accomplished by modifying the original error function in (5.4). Although such methods as Prony's produce reasonable results in terms of the Error in (5.4),
none of these methods minimizes it. Some examples of filters designed by the methods discussed in this section will be shown in Section 5.2.5 after we discuss other spatial domain design techniques. In the next two sections, we will consider iterative algorithms that attempt to reduce the Error in (5.4) better than do the closed-form methods discussed in this section.

5.2.2 Iterative Algorithms

The methods discussed in the previous section suggest an ad hoc iterative procedure that may improve algorithm performance in reducing the Error in (5.4). Suppose we estimate a(n₁, n₂) by solving (5.21). We can then estimate b(n₁, n₂) by solving (5.27). If we assume we know b(n₁, n₂), we will be able to reestimate a(n₁, n₂) by minimizing the Error in (5.18). When we estimate both a(n₁, n₂) and b(n₁, n₂), E₁ in (5.18b) is used in estimating b(n₁, n₂) and the equations in (5.21) are derived by minimizing only E₂ in (5.18c). Since b(n₁, n₂) is now assumed known, both E₁ and E₂ can be used in deriving a new set of equations for a(n₁, n₂). With a new estimate of a(n₁, n₂), we can reestimate b(n₁, n₂) by solving (5.27), and the iterative procedure continues.

Another iterative procedure that has somewhat better properties than the one above is an extension of a 1-D system identification method [Steiglitz and McBride]. From (5.14), e(n₁, n₂) = h_d(n₁, n₂) − h(n₁, n₂) is related to e_M(n₁, n₂) by (5.30), where v(n₁, n₂) is the inverse of a(n₁, n₂). From (5.30), if v(n₁, n₂) is somehow given, then e(n₁, n₂) is linear in both a(n₁, n₂) and b(n₁, n₂), so minimization of Σ_{n₁} Σ_{n₂} e²(n₁, n₂) with respect to a(n₁, n₂) and b(n₁, n₂) is a linear problem. Of course, if v(n₁, n₂) is given, then a(n₁, n₂) will automatically be given by (5.29b). However, the above discussion suggests an iterative procedure, where we begin with some initial estimate of a(n₁, n₂), obtained using a method such as Prony's, obtain v(n₁, n₂) from a(n₁, n₂), and then minimize Σ_{n₁} Σ_{n₂} e²(n₁, n₂) with respect to both a(n₁, n₂) and b(n₁, n₂) by solving a set of linear equations. We now have a new estimate of a(n₁, n₂), and the process continues. Since v(n₁, n₂) is obtained from the previous estimate of a(n₁, n₂) and it "prefilters" e_M(n₁, n₂) to obtain e(n₁, n₂), this procedure is called the iterative prefiltering method. The method is sketched in Figure 5.5.

In neither of the two iterative procedures discussed in this section is the algorithm guaranteed to converge. In addition, the conditions under which either algorithm converges are not known. Both methods require computation of v(n₁, n₂), the inverse of a(n₁, n₂), in each iteration. If 1/A(z₁, z₂) is unstable at any point, the iterative procedure cannot proceed any further. Despite these problems, which have been encountered in practice, the iterative prefiltering method has been successfully used in the design of some IIR filters [Shaw and Mersereau]. The algorithm is usually terminated before convergence. The filter designed, when the method is successful, appears to be "good" within a few iterations, and the error based on e(n₁, n₂) in (5.30) decreases very slowly after a few iterations. If the iterative prefiltering method does converge, then the converging solution can

Figure 5.5 Iterative prefiltering method: each pass produces a new estimate of a(n₁, n₂) and b(n₁, n₂) from the desired a(n₁, n₂), b(n₁, n₂).
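To make the closed-form design concrete, here is a minimal numerical sketch of Prony's method in 1-D, where the same structure (linear equations for a, then the b coefficients obtained one at a time) is easiest to see. It assumes NumPy, and the model coefficients a_true, b_true below are hypothetical:

```python
import numpy as np

# Desired impulse response generated from a known rational model
# H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2), a(0) = 1 by convention.
a_true = np.array([1.0, -1.1, 0.3])
b_true = np.array([2.0, 0.5])
N = 50
h = np.zeros(N)
for n in range(N):
    acc = b_true[n] if n < len(b_true) else 0.0
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * h[n - k]
    h[n] = acc

p, q = 2, 1  # denominator order p, numerator order q

# Prony step: for n > q, the modified error e_M(n) = (a * h_d)(n) - b(n)
# reduces to (a * h_d)(n) = 0, giving linear equations for a(1), ..., a(p).
rows, rhs = [], []
for n in range(q + 1, N):
    rows.append([h[n - k] for k in range(1, p + 1)])
    rhs.append(-h[n])
a_est = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
a_est = np.concatenate(([1.0], a_est))

# Shanks-style step: b(n) = (a * h_d)(n) for 0 <= n <= q, one coefficient at a time.
b_est = np.array([sum(a_est[k] * h[n - k] for k in range(p + 1) if n - k >= 0)
                  for n in range(q + 1)])
print(a_est, b_est)  # recovers a_true and b_true, since h_d is exactly rational
```

Because h_d here is exactly rational of the assumed order (p, q), the recovery is exact, mirroring the statement above that Prony's method estimates the model parameters exactly in that case. The iterative prefiltering method wraps steps of this kind in a loop, refitting after prefiltering with v, the inverse of a.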
From (5.51), the DFP matrix R' can be viewed as the inverse of a finite difference approximation to the second-order derivative. Equation (5.51) is the approximate result from the Taylor series expansion. More generally, R' in the DFP method can be shown to approximate the inverse Hessian matrix H_f⁻¹(θ*) by the Taylor series expansion, and requires calculation of only ∇f(θ). Since no second-order derivatives are calculated, the DFP method requires far less computation in each iteration than the NR method.

The DFP method starts out like the steepest descent method, which is known to be very effective in reducing f(θ) in the earlier stage. The method then adapts itself to become more like an NR method, which is known to have the desirable behavior of quadratic convergence when the iterative solution is sufficiently close to the true solution. Unlike the NR method, the DFP method has the property that R' can be forced to remain positive definite, which is desirable for algorithm convergence. The method has proven very successful in a wide variety of parameter and state estimation problems.

The methods discussed in this section require first-order or both first- and second-order derivative information of the form ∂f(θ)/∂θᵢ and ∂²f(θ)/∂θᵢ∂θⱼ. Since the error to be minimized in the spatial domain IIR filter design is of the form Σ_{n₁} Σ_{n₂} (h_d(n₁, n₂) − h(n₁, n₂))², we need to compute ∂h(n₁, n₂)/∂θᵢ and ∂²h(n₁, n₂)/∂θᵢ∂θⱼ, where the parameters θᵢ are the filter coefficients a(n₁, n₂) and b(n₁, n₂). Exploiting the form of the computational procedure in (5.10) that governs h(n₁, n₂), we can develop [Cadzow] relatively simple procedures for determining ∂h(n₁, n₂)/∂θᵢ and ∂²h(n₁, n₂)/∂θᵢ∂θⱼ.

These methods have been applied [Cadzow; Shaw and Mersereau] to the design of IIR filters by minimizing Σ_{n₁} Σ_{n₂} (h_d(n₁, n₂) − h(n₁, n₂))². The steepest descent method has not been very successful when used alone, due to its slow convergence.

5.2.4 Zero-Phase Filter Design

In some applications, zero-phase design may be necessary. As we discussed in Chapter 4, it is simple to design zero-phase FIR filters. It is impossible, however, for a single recursively computable IIR filter to have zero phase. To have zero phase, h(n₁, n₂) must be equal to h(−n₁, −n₂). An IIR filter requires an infinite-extent h(n₁, n₂). Recursive computability requires the output mask to have wedge support. These requirements cannot all be satisfied at the same time. It is possible, however, to achieve zero phase by using more than one IIR filter. A method particularly well suited to spatial domain design is to divide h_d(n₁, n₂) into different regions, design an IIR filter to approximate h_d(n₁, n₂) in each region, and then combine the filters by using a parallel structure.

Suppose we have a desired h_d(n₁, n₂). Since zero phase is desired, we assume that

h_d(n₁, n₂) = h_d(−n₁, −n₂).   (5.52)

We can divide h_d(n₁, n₂) into an even number of regions: two, four, six, eight, or more. Suppose we divide h_d(n₁, n₂) into four regions by

h_d^I(n₁, n₂) = h_d(n₁, n₂) w(n₁, n₂)   (5.53a)

h_d^II(n₁, n₂) = h_d(n₁, n₂) w(−n₁, n₂)   (5.53b)

h_d^III(n₁, n₂) = h_d(n₁, n₂) w(−n₁, −n₂)   (5.53c)

and

h_d^IV(n₁, n₂) = h_d(n₁, n₂) w(n₁, −n₂).   (5.53d)
The window sequence w(n₁, n₂) is shown in Figure 5.7. It is chosen such that there will be maximum symmetry and the values in all four windows will add up to 1. From (5.53) and (5.54), it is clear that

h_d(n₁, n₂) = h_d^I(n₁, n₂) + h_d^II(n₁, n₂) + h_d^III(n₁, n₂) + h_d^IV(n₁, n₂).   (5.55)

Since h_d^I(n₁, n₂), h_d^II(n₁, n₂), h_d^III(n₁, n₂), and h_d^IV(n₁, n₂) are quadrant support sequences, they can be implemented by means of recursively computable systems. Suppose we use one of the spatial IIR filter design techniques discussed earlier to design H^I(z₁, z₂) that approximates h_d^I(n₁, n₂). Similarly, suppose we have designed H^II(z₁, z₂) that approximates h_d^II(n₁, n₂). From (5.52) and (5.53),

h_d^III(n₁, n₂) = h_d^I(−n₁, −n₂).   (5.56)

Therefore, H^III(z₁, z₂) that approximates h_d^III(n₁, n₂) can be obtained from H^I(z₁, z₂) by

H^III(z₁, z₂) = H^I(z₁⁻¹, z₂⁻¹).   (5.57)

Similarly, H^IV(z₁, z₂) can be obtained from H^II(z₁, z₂) by

H^IV(z₁, z₂) = H^II(z₁⁻¹, z₂⁻¹).   (5.58)

Since H^I(z₁, z₂), H^II(z₁, z₂), H^III(z₁, z₂), and H^IV(z₁, z₂) approximate h_d^I(n₁, n₂), h_d^II(n₁, n₂), h_d^III(n₁, n₂), and h_d^IV(n₁, n₂), respectively, from (5.55), h_d(n₁, n₂) will be approximated by H(z₁, z₂) given by

H(z₁, z₂) = H^I(z₁, z₂) + H^II(z₁, z₂) + H^III(z₁, z₂) + H^IV(z₁, z₂).   (5.59)

Each of the four filters approximates a one-quadrant support sequence and is recursively computable. In addition, H(z₁, z₂) has zero phase, since H(z₁, z₂) = H(z₁⁻¹, z₂⁻¹). The system in (5.59) can be implemented by using a parallel structure, as shown in Figure 5.8. The input is filtered by each of the four recursively computable systems, and the results are combined to produce the output. If h_d(n₁, n₂) has fourfold symmetry,

h_d^II(n₁, n₂) = h_d^I(−n₁, n₂)   (5.60)

and therefore H^II(z₁, z₂) can be determined from H^I(z₁, z₂) by

H^II(z₁, z₂) = H^I(z₁⁻¹, z₂).   (5.61)

In this case, H(z₁, z₂) in (5.59) is given by

H(z₁, z₂) = H^I(z₁, z₂) + H^I(z₁⁻¹, z₂) + H^I(z₁⁻¹, z₂⁻¹) + H^I(z₁, z₂⁻¹).   (5.62)

From (5.62), only one filter needs to be designed in this case.
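The parallel zero-phase combination in (5.62) is easy to check numerically. The sketch below (assuming NumPy) uses a small hypothetical first-quadrant array hI as a stand-in for a designed quadrant filter's truncated impulse response, and verifies that the combined frequency response is purely real, i.e., zero phase:

```python
import numpy as np

# Hypothetical first-quadrant impulse response (stand-in for a designed filter).
hI = np.array([[0.5, 0.2, 0.1],
               [0.2, 0.1, 0.05],
               [0.1, 0.05, 0.02]])
N = 16
H = np.zeros((N, N), dtype=complex)
# (5.62): H = H^I(z1,z2) + H^I(z1^-1,z2) + H^I(z1^-1,z2^-1) + H^I(z1,z2^-1).
# Each z -> z^-1 substitution places hI(n1,n2) at the sign-flipped index.
for f1, f2 in [(1, 1), (-1, 1), (-1, -1), (1, -1)]:
    g = np.zeros((N, N))
    for n1 in range(hI.shape[0]):
        for n2 in range(hI.shape[1]):
            g[(f1 * n1) % N, (f2 * n2) % N] += hI[n1, n2]
    H += np.fft.fft2(g)

# Zero phase: the combined frequency response has no imaginary part.
print(np.max(np.abs(H.imag)))
```

The vanishing imaginary part is the frequency-domain statement of H(z₁, z₂) = H(z₁⁻¹, z₂⁻¹). Note that the overlap of the four quadrants along the axes is part of the sum in (5.62); in the actual design, the window w(n₁, n₂) of (5.53) is chosen so that the four pieces of h_d add up correctly.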
5.2.5 Examples
Figure 5.10 Zero-phase bandpass filter designed by the modified Prony's method.
(a) Perspective plot; (b) contour plot.
The desired impulse response used is the circularly symmetric ideal bandpass filter with cutoff frequencies of 0.3π and 0.7π, to which a circularly symmetric Kaiser window with a radius of 16 points was applied. The zero-phase filter was again designed by parallel combination of four one-quadrant filters. The first-quadrant filter with 5 × 5-point a(n₁, n₂) and 5 × 5-point b(n₁, n₂) was first designed, and the overall system function was then obtained from the first-quadrant filter designed.
signed. The method typically requires a large number (on the order of 100) of
iterations and is very expensive computationally. However, the method can be
used with any error criterion. For a given error criterion, the method performs
better than other methods we discussed in this section.
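As a toy illustration of descent-based design with the squared error Σ(h_d − h)², the sketch below (assuming NumPy; all coefficient and step-size values are hypothetical) fits a first-order 1-D IIR model by steepest descent with finite-difference gradients. It is deliberately naive; the DFP method discussed earlier exists precisely because plain steepest descent converges slowly:

```python
import numpy as np

def impulse_response(theta, N=40):
    # First-order IIR model: h(n) = b0*delta(n) - a1*h(n-1).
    a1, b0 = theta
    h = np.zeros(N)
    for n in range(N):
        h[n] = (b0 if n == 0 else 0.0) - (a1 * h[n - 1] if n >= 1 else 0.0)
    return h

h_d = impulse_response(np.array([-0.7, 1.5]))   # desired response from a known model

def error(theta):
    d = h_d - impulse_response(theta)
    return float(np.dot(d, d))

theta = np.array([0.0, 1.0])                    # initial estimate
eps, step = 1e-6, 0.01
for _ in range(5000):
    # Central finite-difference gradient of the squared error.
    g = np.array([(error(theta + eps * e) - error(theta - eps * e)) / (2 * eps)
                  for e in np.eye(2)])
    theta = theta - step * g
print(np.round(theta, 3))                       # should approach (-0.7, 1.5)
```

The many iterations needed even for this two-parameter problem illustrate why quasi-Newton methods such as DFP, which build up curvature information, are preferred.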
In this section, we discussed spatial domain design methods. As we discussed, simple design methods that require solving a set of linear equations are available. In addition, spatial domain design is a natural approach if we wish the filter to preserve some desirable spatial domain properties such as the shape of the impulse response. However,
with spatial domain design methods, we do not have much control over frequency
domain design parameters. In addition, they are not optimal in the Chebyshev
error sense. An alternative to the spatial domain design is the frequency domain
design. We will first discuss the complex cepstrum representation of signals, which
is useful in our discussion of the frequency domain design.
straightforward extension of the 1-D case. However, the problems to which they have been applied differ markedly. The 1-D complex cepstrum is primarily used for deconvolution problems; the 2-D complex cepstrum, for problems related to IIR filter design. We will first summarize results on the 1-D complex cepstrum representation and then discuss how they can be applied to solving problems related to 1-D IIR filter design. The complex cepstrum is not very useful in 1-D filter design applications, due to the existence of simpler methods, but the general ideas can be extended to the corresponding 2-D problems.

The complex cepstrum of a 1-D sequence x(n) is defined by the system shown in Figure 5.15. The complex cepstrum is denoted by x̂(n), and its z-transform X̂(z) is related to X(z) by

X̂(z) = log X(z).   (5.63)

Figure 5.15 Definition of the complex cepstrum of a sequence using the z-transform: x(n) → X(z) → log → X̂(z) → x̂(n).

The term complex is used because X(ω) is in general complex and the logarithmic operation used in (5.64) is a complex logarithm. The term cepstrum comes from reversing the letters in the first syllable of "spectrum." The complex cepstrum x̂(n) is the inverse Fourier transform of a spectrum, but in a sense it is not in the same domain as x(n), because X̂(ω) is obtained from nonlinear modification of X(ω).

In (5.65) and (5.66), X̂_R(ω) is very well defined. However, X̂_I(ω) = θ_x(ω) is a multivalued function and can be expressed as

X̂_I(ω) = θ_x(ω) = θ_p(ω) + 2πK(ω)   (5.67)

where θ_p(ω) is the principal value of the phase of X(ω) and K(ω) is any integer function. To uniquely define θ_x(ω), it is expressed in terms of its derivative θ'_x(ω) = dθ_x(ω)/dω in (5.68).

A system that recovers x(n) from x̂(n) is shown in Figure 5.18. The exponential operation is very well defined and has no ambiguity.

The complex cepstrum x̂(n) defined above exists only for a certain restricted class of sequences. For example, x̂(n) cannot be defined for x(n) = 0. Of the many possibilities, we will restrict ourselves, for both theoretical and practical reasons, to a class of sequences x(n) for which x̂(n) defined in the above manner is real and stable. A sequence x(n) which has a real and stable x̂(n) is said to have a valid complex cepstrum. Although x̂(n) is called a complex cepstrum, a valid x̂(n) is always real.

For any valid x̂(n), there exists a unique corresponding sequence x(n) which is real and stable and whose z-transform X(z) has no poles or zeros on the unit circle. To see this, note that X̂(ω) is a well-defined Fourier transform for a stable x̂(n).
Sec. 5.3 The Complex Cepstrum Representation of Signals 293
Figure 5.18 System for recovering x(n) from its complex cepstrum x̂(n).

From (5.71), it is clear that |X(ω)| cannot be zero or infinite, and therefore X(z) has no poles or zeros on the unit circle. In addition, |X(ω)| is even and θ_x(ω) is odd. Therefore, x(n) is real and stable. Furthermore, since X̂_I(ω) is the imaginary part of X̂(ω), X̂_I(ω) is odd, analytic, and periodic with a period of 2π. Since X̂_I(ω) = θ_x(ω), θ_x(ω) is also odd, analytic, and periodic with a period of 2π.

The condition that x(n) must be real and stable, with no zeros or poles on the unit circle, is necessary but not sufficient for x(n) to have a valid x̂(n). For example, x(n) = δ(n) − 2δ(n − 1) satisfies all the above conditions, but does not have a valid x̂(n). If a sequence satisfies all the necessary conditions above and has a z-transform of the form

X(z) = A ∏_k (1 − a_k z⁻¹) ∏_k (1 − b_k z) / [∏_k (1 − c_k z⁻¹) ∏_k (1 − d_k z)]   (5.72)

where A is positive and |a_k|, |b_k|, |c_k|, |d_k| < 1, then x(n) has a valid x̂(n). This can be seen by considering each individual term in (5.72) and showing that the unwrapped phase of each individual term is odd, analytic, and periodic with a period of 2π.

The condition in (5.72) is highly restrictive, and a typical x(n) does not have the form in (5.72). However, any real and stable x(n) with a rational z-transform with no poles or zeros on the unit circle can be very simply modified such that the modified sequence x'(n) will have a valid complex cepstrum. The modification involves multiplying x(n) by −1 and/or shifting x(n) by K points for some fixed integer K. This modification is shown in Figure 5.19. One way to determine if −1 should be multiplied to x(n) and what value of K should be used in Figure 5.19 is

Multiply by −1, if X(ω)|_{ω=0} < 0   (5.73a)

and to choose the shift K as in (5.73b), where θ_x(ω) is defined in (5.68). The integer K is also given, from the Argument Principle, by

K = number of zeros of X(z) inside the unit circle − number of poles of X(z) inside the unit circle.   (5.74)

Figure 5.17 Example of an unwrapped phase function. (a) Principal value of the phase function of X(ω); (b) unwrapped phase function of X(ω) in (a).

Figure 5.19 Modification of x(n) such that the modified sequence has a valid complex cepstrum: x(n) is multiplied by −1 if necessary and delayed by K points to produce y(n).

In some applications, the modification made to x(n) can be very easily taken into account. If the modification cannot be taken into account in a simple manner, we are constrained to a highly restrictive set of sequences if we wish to use the complex cepstrum representation.

5.3.2 Properties of the One-Dimensional Complex Cepstrum

A number of useful properties can be derived from the definition of the complex cepstrum. They are listed in Table 5.1.

TABLE 5.1 PROPERTIES OF THE ONE-DIMENSIONAL COMPLEX CEPSTRUM

Property 1. Suppose x(n) and w(n) have valid complex cepstra x̂(n) and ŵ(n). We then have
    y(n) = x(n) ∗ w(n)
    Y(ω) = X(ω)W(ω)
    ŷ(n) = x̂(n) + ŵ(n)
    Ŷ(ω) = X̂(ω) + Ŵ(ω)

Property 2. Suppose x(n) has a valid complex cepstrum x̂(n). The sequence v(n) whose complex cepstrum is v̂(n) = −x̂(n) is the stable inverse of x(n).
    x̂(n): causal ⇒ x(n), v(n): causal.

Property 3. Suppose x(n) is real and stable, and has a rational z-transform with no poles or zeros on the unit circle. The sequence x(n) may or may not have a valid complex cepstrum. Suppose we define r(n) by
    r(n) = x(n) ∗ x(−n).
Then r(n) is even and has a valid complex cepstrum r̂(n) which is also even.

Property 4. The complex cepstrum x̂(n) is typically an infinite-extent sequence, even though x(n) may be a finite-extent sequence.

All the properties in Table 5.1 can be shown from the properties of the Fourier transform and from the definition of the complex cepstrum. Property 1 states that if two sequences are convolved, then their complex cepstra add. This is the basis behind its application to deconvolution problems. Alternatively, if the complex cepstra add, then the corresponding spectra multiply. This is the basis behind its application to spectral factorization problems.

Property 2 defines the stable inverse v(n) for a sequence x(n) that has a valid complex cepstrum. The stable inverse v(n) is related to x(n) by (5.75). It is important to note that v(n) in (5.75) is guaranteed to be stable, but its region of support can be different from that of x(n). From Property 2, a causal x̂(n) implies a causal x(n) and v(n). A sequence is called a minimum phase sequence if x̂(n) is causal. If a minimum phase sequence x(n) has a rational z-transform, all its poles and zeros will be inside the unit circle. Since v(n) is guaranteed to be stable, v(n) is a causal and stable sequence for a minimum phase sequence x(n). This property, together with Properties 3 and 4, can be used in stabilizing an unstable filter, as will be discussed in the next section.

Property 3 states that even if x(n) does not have a valid complex cepstrum, r(n) = x(n) ∗ x(−n) can have a valid complex cepstrum. In this case, both r(n) and r̂(n) are even sequences.

Property 4 states that x̂(n) is typically an infinite-extent sequence, even though x(n) may be a finite-extent sequence. Suppose x(n) = δ(n) − ½δ(n − 1). Its complex cepstrum x̂(n) is given by

x̂(n) = −Σ_{k=1}^{∞} ((1/2)^k/k) δ(n − k).

This property shows that computing the complex cepstrum can be a problem in practice. If we replace the Fourier transform and inverse Fourier transform operations in Figure 5.18 by the N-point DFT and inverse DFT operations, then x̂_c(n), the computed complex cepstrum, will be given by an aliased version of x̂(n); as the aliasing becomes more severe, the computed complex cepstrum is degraded further.

5.3.3 Applications of the One-Dimensional Complex Cepstrum

The 1-D complex cepstrum representation is most often used in solving deconvolution problems. Suppose two sequences x(n) and w(n) are combined by convolution:

y(n) = x(n) ∗ w(n)   (5.78)
and we wish to separate x(n) from w(n). In general, y(n) does not have a valid complex cepstrum. In typical applications, however, recovering ±x(n − K) for some integer K is sufficient. In such a case, assuming that x(n) and w(n) are real and stable and have rational z-transforms with no poles or zeros on the unit circle, we can modify y(n), using (5.73), and compute its complex cepstrum. Modifying y(n) is equivalent to modifying x(n) and w(n) individually and then combining them by using (5.78). Since the delay and multiplication by −1 are assumed to be unimportant, let us suppose that x(n) and w(n) and therefore y(n) in (5.78) all have valid complex cepstra. Then from Property 1 in Table 5.1,

ŷ(n) = x̂(n) + ŵ(n).

Suppose x̂(n) is separable from ŵ(n) by a linear operation on ŷ(n). For example, if x̂(n) consists of low-frequency components while ŵ(n) consists of high-frequency components, then lowpass filtering ŷ(n) will result in x̂(n). Then x̂(n) can be recovered by linearly operating on ŷ(n). From x̂(n), x(n) can be recovered by using (5.70). This is the basis of the homomorphic system for convolution. One application of the above idea is the development of a homomorphic vocoder. On a short-time basis, human voiced speech may be approximately modeled by (5.78), where x(n) is a vocal tract impulse response and w(n) is a train of impulses with equal spacing. By exploiting the result that x̂(n) decays very fast while ŵ(n) is another train of pulses with the same spacing as w(n), low-time gating of ŷ(n) can be used to approximately separate x̂(n) from ŵ(n). This is the rationale for separating the vocal tract impulse response from the pitch information in the development of a homomorphic vocoder.

Another application of the 1-D complex cepstrum, one which is not very useful in 1-D signal processing but is significant in 2-D signal processing, is the stabilization of an unstable filter. Consider a system function H(z) = B(z)/A(z), where A(z) and B(z) do not have a common factor. Since B(z) does not affect system stability, we will assume that B(z) = 1. Suppose H(z) = 1/A(z) has been designed so that it is a causal system. Then A(z) is of the form given in (5.81). We first assume that A(z) has no zeros on the unit circle. If A(z) has zeros on the unit circle, the problem is impossible to solve. It is also very important that H_s(z) remain a causal system. If we are willing to sacrifice causality, then the stabilization problem could be solved by simply choosing a different region of convergence for H(z) that includes the unit circle.

One simple approach to solving this stabilization problem is to compute the roots of A(z) = 0, reflect each root outside the unit circle to a root inside the unit circle at a conjugate reciprocal location, and scale the result appropriately. For example, suppose the causal system function H(z) is given by

H(z) = 1 / [(1 − ½z⁻¹)(1 − 2z⁻¹)].   (5.82)

Clearly, the system is unstable due to a pole at z = 2. We replace the pole at z = 2 with a pole at 1/2* = 1/2. The resulting system function H₁(z) is given by

H₁(z) = k / [(1 − ½z⁻¹)(1 − ½z⁻¹)].   (5.83)

The scaling factor k can be computed by requiring H(ω)|_{ω=0} to have the same amplitude as H₁(ω)|_{ω=0}. This leads to one choice of k = ½, and (5.83) becomes

H_s(z) = ½ / (1 − ½z⁻¹)².   (5.84)

It is easy to verify that H_s(z) is a causal and stable system with |H_s(ω)| = |H(ω)|.

An alternate approach is to use the complex cepstrum representation. This is shown in Figure 5.20. Consider the sequence r(n) given by

r(n) = a(n) ∗ a(−n).   (5.85)

If a(n) is real and stable, and has a rational z-transform with no poles or zeros on the unit circle, then r(n) has a valid complex cepstrum r̂(n), according to Property 3 in Table 5.1. From (5.85),

R(ω) = |A(ω)|².   (5.86)

We can therefore compute R(ω) directly from a(n), as shown in Figure 5.20.

Figure 5.20 System for stabilizing an unstable system.

Since R(ω) is always real and positive, θ_r(ω) is zero, and computation of r̂(n) does not
" (1/2)k
i(n) = 2 log 2 6(n) - k=l k 6(n + k) - k2
- =l -
1 2k k ( n - k ) (5.93d)
- - - - -
A a a A
-' n
Figure 5.21 Window sequence w(n)
used in Figure 5.20.
d,(n) = i(n)w(n) = log 2 6(n) -
*
mk6(n - k)
k=l k
require a phase-unwrapping operation. We compute a causal ri,(n) from i(n) by A,(w) = log 2 + log (1 - le-1")
A,(w) = 2(1 - le-jw)
where
In (5.93b), we have expressed R(w) such that each of its factors corresponds to a
The sequence w(n) is sketched in Figure 5.21. With this choice of w(n),
I sequence that has a valid complex cepstrum. In this way, we can use Property 1
in Table 5.1 to obtain ~ ( w in
) (5.93~)and i(n) in (5.93d). The result in (5.93h)
is identical to the result in (5.84). Using the complex cepstrum is, of course,
The complex cepstrum &(n) is clearly real and stable, so the corresponding sequence considerably more complicated than the approach that led to (5.84). However,
as(n) can be computed from hS(n). The stabilized system is the approach of flipping the poles does not extend readily to the 2-D case, while
the approach based on the complex cepstrum extends to the 2-D case in a straight-
forward manner.
where Hs(z) is the stable inverse of A,(z). To see that this is a solution of the 5.3.4 The Two-Dimensional Complex Cepstrum
stabilization problem, note that is(n) is causal and therefore, from Property 2 in
Table 5.1, its stable inverse Hs(z) is also causal and stable. All that must be shown, The results developed in the previous three sections for the 1-D complex cepstrum
then, is that lHs(w)l = IH(w)(. From (5.87) and (5.89), and noting from Property representation can be extended to the 2-D case in a straightforward manner. In
3 in Table 5.1 that i(n) is even, we obtain this section, we summarize the results of Sections 5.3.1 and 5.3.2 extended to the
2-D case.
The complex cepstrum of a 2-D sequence x(nl, n,) is defined by the system
= f(n)[w(n) + w( - n)] (5.91) shown in Figure 5.22. The complex cepstrum denoted by .?(nl, n,) is related to
x(n1, n2) by
1
and causal inverse and the fact that R(w) = IAS(w)l2,the stabilization problem is
solved.
T o illustrate this approach with an example, let us consider the causal but
unstable system given by (5.82). The functions A(w), R(w), ~ ( w ) i(n), , i,(n),
A&), As(w), and Hs(z) are given by Figure 5.22 Definition of the complex cepstrum of a two-dimensional sequence.
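The entire procedure of Figure 5.20 can be carried out with FFTs. The sketch below (assuming NumPy) reproduces the example above: starting from the unstable A(z) = (1 − ½z⁻¹)(1 − 2z⁻¹) of (5.82), it recovers the stabilized A_s corresponding to 2(1 − ½z⁻¹)² in (5.93g):

```python
import numpy as np

a = np.array([1.0, -2.5, 1.0])   # (1 - 0.5 z^-1)(1 - 2 z^-1), unstable pole at z = 2
N = 4096                         # large DFT stands in for the Fourier transform
A = np.fft.fft(a, N)
R = np.abs(A) ** 2               # R(w) = |A(w)|^2 is real and positive: no unwrapping
r_hat = np.fft.ifft(np.log(R)).real   # even cepstrum r^(n), as in (5.93d)

# Window w(n) of (5.89)/Figure 5.21: 1/2 at n = 0, 1 for n > 0, 0 for n < 0.
w = np.zeros(N)
w[0] = 0.5
w[1:N // 2] = 1.0
a_s_hat = r_hat * w              # causal cepstrum a^_s(n), as in (5.88)

a_s = np.fft.ifft(np.exp(np.fft.fft(a_s_hat))).real
print(np.round(a_s[:4], 4))      # ~ [2, -2, 0.5, 0], i.e., 2(1 - 0.5 z^-1)^2
```

The residual aliasing from using DFTs in place of the Fourier transform is negligible here because r̂(n) decays like (1/2)ⁿ/n; the recovered coefficients match (5.93g), and the magnitude |A_s(ω)| equals |A(ω)| as required by the stabilization problem.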
Figure 5.23 Integration path of the phase derivative used in defining the unwrapped phase in (5.96).

Figure 5.25 Modification of x(n₁, n₂) such that the modified sequence has a valid complex cepstrum.

Property 2. Suppose x(n₁, n₂) has a valid complex cepstrum x̂(n₁, n₂). We define a new sequence v(n₁, n₂), the stable inverse of x(n₁, n₂). The sequence v(n₁, n₂) also has a valid complex cepstrum v̂(n₁, n₂) = −x̂(n₁, n₂).

Property 3. Suppose we define r(n₁, n₂) = x(n₁, n₂) ∗ x(−n₁, −n₂). Then r(n₁, n₂) is even and has a valid complex cepstrum r̂(n₁, n₂) that is also even.

Property 4. The complex cepstrum x̂(n₁, n₂) is typically an infinite-extent sequence, even though x(n₁, n₂) may be a finite-extent sequence.

As in the 1-D case, if two sequences are combined by convolution, then their complex cepstra x̂(n₁, n₂) and ŵ(n₁, n₂) add. If x̂(n₁, n₂) is linearly separable from ŵ(n₁, n₂), linear operation on the complex cepstrum can lead to separation of x(n₁, n₂) from w(n₁, n₂). Unfortunately, problems where x̂(n₁, n₂) is linearly separable, even approximately, from ŵ(n₁, n₂) are not common in practice. An important application of the 2-D complex cepstrum representation is the stabilization of an unstable IIR filter. This is discussed in the next section.

Suppose H(z₁, z₂) = 1/A(z₁, z₂), where a(k₁, k₂) with a(0, 0) = 1 is a wedge support sequence. We wish to find an H_s(z₁, z₂) that is recursively computable, with h_s(n₁, n₂) having the same or similar region of support as h(n₁, n₂), and at the same time stable, with |H_s(ω₁, ω₂)| given by

|H_s(ω₁, ω₂)| = |H(ω₁, ω₂)|.   (5.101)

In the 1-D case, A_s(z) can be obtained by first forming R(z), next factoring R(z), and then collecting the terms whose poles are inside the unit circle. Since R(z) = R(z⁻¹), each pole inside the unit circle has a corresponding pole at a conjugate reciprocal location outside the unit circle. This procedure for determining A_s(z) ensures that H_s(z) = 1/A_s(z) will have all its poles inside the unit circle and A_s(z) will satisfy (5.103). From (5.102) and (5.103), (5.104) follows. From (5.104), it is clear that the method can be used only when A(z) can be factored as a product of other polynomials. An extension of (5.104) to the 2-D problem is given in (5.105).

One approach to approximately solving the stabilization problem is to allow a_s(n₁, n₂) to become an infinite-extent sequence, solve the stabilization problem, and then truncate a_s(n₁, n₂), hoping that the truncation will not affect the solution too much. Once we allow a_s(n₁, n₂) to be an infinite-extent sequence, (5.105) can be satisfied. A 2-D finite-order polynomial cannot in general be factored as a product of other finite-order polynomials, but it can always be factored as a product of many different infinite-order polynomials. An approach that solves the stabilization problem with an infinite-extent a_s(n₁, n₂) is a straightforward extension of the 1-D approach discussed in Section 5.3.3.

Figure 5.27 Window sequence w(n₁, n₂) used in Figure 5.26.
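That extension replaces the 1-D FFTs with 2-D FFTs and the window of Figure 5.21 with a half-plane cepstral window like that of Figure 5.27. A minimal sketch assuming NumPy, with a hypothetical 2 × 2 first-quadrant a(n₁, n₂); it checks the magnitude condition |A_s(ω₁, ω₂)|² = R(ω₁, ω₂) and the half-plane support of the result:

```python
import numpy as np

# Hypothetical first-quadrant denominator a(n1, n2), a(0, 0) = 1.
a = np.array([[1.0, -0.4],
              [-0.4, 0.1]])
N = 256
A = np.fft.fft2(a, (N, N))
R = np.abs(A) ** 2                        # R(w1, w2) > 0: unwrapped phase is zero
r_hat = np.fft.ifft2(np.log(R)).real      # even 2-D cepstrum r^(n1, n2)

# Half-plane window (cf. Figure 5.27): support n2 > 0 (all n1), plus n1 > 0 on
# the n2 = 0 line, with 1/2 at the origin, so that w(n) + w(-n) = 1.
w = np.zeros((N, N))
w[:, 1:N // 2] = 1.0
w[1:N // 2, 0] = 1.0
w[0, 0] = 0.5
a_s_hat = r_hat * w                       # half-plane support cepstrum

a_s = np.fft.ifft2(np.exp(np.fft.fft2(a_s_hat))).real
S = np.abs(np.fft.fft2(a_s)) ** 2
print(float(np.max(np.abs(S - R))))       # ~ 0: magnitude condition satisfied
```

The resulting a_s(n₁, n₂) is (up to negligible wrap-around) supported on the half plane, so 1/A_s(z₁, z₂) is recursively computable, and |A_s(ω₁, ω₂)|² = R(ω₁, ω₂) holds as in (5.111). Truncating a_s(n₁, n₂) to a finite extent, as discussed next, is what one would do in practice.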
The 2-D stabilization system based on the complex cepstrum representation is shown in Figure 5.26. We consider r(n₁, n₂) given by

r(n₁, n₂) = a(n₁, n₂) ∗ a(−n₁, −n₂).   (5.106)

Even though a(n₁, n₂) may not have a valid complex cepstrum, r(n₁, n₂) has a valid r̂(n₁, n₂), according to Property 3 in Table 5.2. Since R(ω₁, ω₂) is always positive, the unwrapped phase is zero, and no phase-unwrapping operation is necessary. We compute a wedge support sequence â_s(n₁, n₂) by

â_s(n₁, n₂) = r̂(n₁, n₂) w(n₁, n₂)   (5.107)

where the window sequence w(n₁, n₂) is chosen such that w(n₁, n₂) has a region of support that includes the region of support of a(n₁, n₂) and satisfies (5.108). The window w(n₁, n₂) that can be used when a(n₁, n₂) corresponds to a nonsymmetric half-plane filter whose region of support lies in the first and second quadrants, except the line corresponding to n₁ < 0, n₂ = 0, is shown in Figure 5.27. The complex cepstrum â_s(n₁, n₂) is clearly real and stable, and the corresponding sequence a_s(n₁, n₂) can be computed from â_s(n₁, n₂). The stabilized system H_s(z₁, z₂) is

H_s(z₁, z₂) = 1/A_s(z₁, z₂)   (5.109)

where H_s(z₁, z₂) is the stable inverse of A_s(z₁, z₂). From Property 2 in Table 5.2, H_s(z₁, z₂) is stable, and h_s(n₁, n₂) has a wedge-shaped region of support that includes the region of support of a(n₁, n₂). In addition, from (5.107) and (5.108),

r̂(n₁, n₂) = â_s(n₁, n₂) + â_s(−n₁, −n₂).   (5.110)

From (5.110),

R(ω₁, ω₂) = |A_s(ω₁, ω₂)|².   (5.111)

This shows that (5.109) solves the stabilization problem. In essence, R(z₁, z₂) in (5.105) is factored by dividing r̂(n₁, n₂) into two pieces. We choose, among many ways of dividing r̂(n₁, n₂) into two pieces, the way that will factor R(z₁, z₂) such that the resulting factor A_s(z₁, z₂), when used in (5.109), will solve the stabilization problem.

The sequence a_s(n₁, n₂) obtained in this manner is typically an infinite-extent sequence. One approach to obtaining a finite-extent sequence from a_s(n₁, n₂) is to window a_s(n₁, n₂). This is, in a sense, similar to the window method for FIR filter design. In FIR filter design, the windowed sequence constitutes the coefficients of the numerator polynomial, while a_s(n₁, n₂) denotes the coefficients of

Figure 5.26 System for stabilizing an unstable system: a(n₁, n₂) → R(ω₁, ω₂) → log → r̂(n₁, n₂) → multiply by w(n₁, n₂) → â_s(n₁, n₂) → exp → a_s(n₁, n₂).
Sec. 5.4 Stabilization of an Unstable Filter 307
Infinite Impulse Response Filters Chap. 5
the denominator polynomial. Windowing a_s(n1, n2) may affect the stability of H_s(z1, z2), and the magnitude constraint in (5.101) is no longer exactly satisfied. In addition, spatial aliasing results when the Fourier transform and inverse Fourier transform operations are replaced by DFTs and inverse DFTs. In typical cases, a_s(n1, n2) decays rapidly, and the windowing operation tends to preserve stability without significantly affecting the magnitude. The choice of sufficiently large DFT and inverse DFT sizes, combined with the rapid decay of both a_s(n1, n2) and r̂(n1, n2), reduces the spatial aliasing problem. In essence, solving the 2-D stabilization problem exactly is not theoretically possible, and some constraint must be relaxed. The system in Figure 5.26 provides one approach to approximately solving the 2-D stabilization problem.

5.4.2 Planar Least Squares Inverse Method

In the complex cepstrum method, we allowed A_s(z1, z2) to be an infinite-order polynomial, solved the stabilization problem exactly, and then truncated the resulting A_s(z1, z2). In the method discussed in this section, we require A_s(z1, z2) to be a finite-order polynomial but allow |A_s(ω1, ω2)| to be only approximately equal to |A(ω1, ω2)|.

Let a(n1, n2) be an N1 × N2-point first-quadrant support sequence. Let i(n1, n2) denote an inverse of a(n1, n2), so that

a(n1, n2) * i(n1, n2) = δ(n1, n2).    (5.112)

The Error in (5.114) can also be expressed in the frequency domain as

Error = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} |1 - A(ω1, ω2)C(ω1, ω2)|² dω1 dω2.    (5.115)

From (5.114), the error expression is a quadratic form of the unknown parameters c(n1, n2), and therefore its minimization with respect to c(n1, n2) is a linear problem. The solution c(n1, n2) to the above minimization problem is called the least squares inverse of a(n1, n2). The least squares inverse of a 2-D sequence is often referred to as the planar least squares inverse (PLSI). Since c(n1, n2) is an approximate inverse of a(n1, n2), we expect that

C(ω1, ω2) ≈ 1/A(ω1, ω2).    (5.116)

As we increase M1 and M2 [the size of c(n1, n2)], C(ω1, ω2) will approximate 1/A(ω1, ω2) better, but determining c(n1, n2) will require more computations. Shanks conjectured [Shanks et al.] that a first-quadrant support system with system function 1/C(z1, z2), where C(z1, z2) is the z-transform of the PLSI c(n1, n2) of any first-quadrant support sequence a(n1, n2), is stable. This is known as Shanks's conjecture. Suppose Shanks's conjecture is true. We can determine an N1 × N2-point first-quadrant support sequence d(n1, n2) by computing the PLSI of c(n1, n2). Note that the region of support size of d(n1, n2) is chosen to be the same as that of a(n1, n2). The sequence d(n1, n2) is called a double PLSI of a(n1, n2). Clearly, the first-quadrant support system with system function 1/D(z1, z2) would be stable. In addition,

D(ω1, ω2) ≈ 1/C(ω1, ω2).    (5.117)

From (5.116) and (5.117),

D(ω1, ω2) ≈ A(ω1, ω2).    (5.118)

An approximate solution to the stabilization problem would then be given by (5.119).

5.5 FREQUENCY DOMAIN DESIGN

5.5.1 Design Approaches

In frequency domain IIR filter design, some desired or ideal frequency domain response to a known input is assumed given. The filter coefficients are estimated such that the response of the designed filter in the frequency domain will be as
Sec. 5.5 Frequency Domain Design 309
close as possible in some sense to the desired response. As with spatial domain IIR filter design, an input often used is δ(n1, n2), and the desired frequency response that is assumed given is Hd(ω1, ω2), which either includes both the magnitude and phase responses or only includes the magnitude response |Hd(ω1, ω2)|.

If we define a filter to be optimal when it minimizes the weighted Chebyshev error norm, an obvious error criterion will be the minimization of the maximum weighted error over the domain of approximation. This error criterion, however, makes the minimization problem highly nonlinear. Even though the standard descent algorithms discussed in Section 5.2.3 can in theory be used, computing the first- and second-order partial derivative information typically required in such algorithms is quite involved. As a result, this approach has been considered only for the design of very low-order filters.

Another frequency domain error criterion that has been considered is

Error = (1/(2π)²) ∫_{ω1=-π}^{π} ∫_{ω2=-π}^{π} W(ω1, ω2) |E(ω1, ω2)|² dω1 dω2    (5.120a)

where

E(ω1, ω2) = Hd(ω1, ω2) - H(ω1, ω2)    (5.120b)

and where W(ω1, ω2) is a positive weighting function that can take into account the relative importance of different frequency components. The error criterion in (5.120) can be very simply related to a spatial domain error criterion. Using Parseval's theorem, we find that (5.120) is equivalent to

Error = Σ_{n1=-∞}^{∞} Σ_{n2=-∞}^{∞} (w(n1, n2) * e(n1, n2))²    (5.121a)

where

e(n1, n2) = hd(n1, n2) - h(n1, n2).    (5.121b)

When W(ω1, ω2) = 1 so that equal weight is given to all frequency components, w(n1, n2) = δ(n1, n2), and the Error in (5.120) reduces to (5.122). This is precisely the same error criterion that we considered in Section 5.2, and all the methods discussed in Section 5.2 apply to the minimization of the Error in (5.120) with W(ω1, ω2) = 1. Furthermore, the spatial domain design techniques discussed in Section 5.2 can easily be extended to incorporate w(n1, n2) in (5.121). Since w(n1, n2) is a known sequence, the presence of w(n1, n2) does not affect the linearity or nonlinearity of the methods we discussed. In other words, a problem that requires solving a set of linear equations when w(n1, n2) = δ(n1, n2) remains a problem of solving a set of linear equations for a general w(n1, n2).

Although (5.120) and (5.121) are equivalent error criteria, there are some differences between the two in practice. In spatial domain design based on (5.121), a specific Rh (the region of support of h(n1, n2)) is assumed, and h(n1, n2) is compared to hd(n1, n2). Therefore the h(n1, n2) designed will always have Rh as its region of support, will approximate hd(n1, n2) for (n1, n2) ∈ Rh, and will tend to be stable for a stable hd(n1, n2). In frequency domain design based on (5.120), H(ω1, ω2) is obtained by evaluating H(z1, z2) on the unit surface:

H(ω1, ω2) = H(z1, z2) evaluated at z1 = e^{jω1}, z2 = e^{jω2}    (5.123a)

where

H(z1, z2) = [ Σ_{(k1,k2) ∈ Rb} b(k1, k2) z1^{-k1} z2^{-k2} ] / [ 1 + Σ_{(k1,k2) ∈ Ra, (k1,k2) ≠ (0,0)} a(k1, k2) z1^{-k1} z2^{-k2} ].    (5.123b)

The region of support of h(n1, n2) obtained by minimizing (5.120) with H(ω1, ω2) in (5.123a) now depends on the specific choice of the filter coefficients a(n1, n2) and b(n1, n2). If Rhd, the region of support of hd(n1, n2), is approximately the same as Rh, then the a(n1, n2), b(n1, n2), and h(n1, n2) obtained from the spatial and frequency domain designs are likely to be the same. If Rhd is significantly different from the assumed Rh, the a(n1, n2) and b(n1, n2) obtained can be different. In this case, the region of support of the filter designed may no longer be the same as the assumed Rh. We can require the filter in (5.123b) to have the same region of support as the assumed Rh, but we will then be affecting the stability of the filter. Therefore, if we require the resulting filter to have the assumed region of support Rh to ensure that the filter is recursively computable, the filter obtained from frequency domain design is likely to be unstable more often than the filter obtained from spatial domain design. In essence, we have assumed that the filter designed has a particular region of support Rh in spatial domain design, and we have assumed that the filter designed is stable in frequency domain design. These two assumptions are not always the same and can lead to two different filters. An example that illustrates this point can be found in Problem 5.22.

In addition to the difference between (5.120) and (5.121) discussed above, there is another difference in practice. In the spatial domain, hd(n1, n2) is typically truncated after a certain point, and h(n1, n2) approximates a truncated version of hd(n1, n2). In the frequency domain, Hd(ω1, ω2) and H(ω1, ω2) are typically evaluated on a Cartesian grid, and samples of H(ω1, ω2) approximate samples of Hd(ω1, ω2). When this is interpreted in the spatial domain, an aliased version of h(n1, n2) approximates an aliased version of hd(n1, n2). When a sufficiently large region of hd(n1, n2) is used in the spatial domain and a sufficiently large number of samples are used in the frequency domain, the difference should be minor. Experience has shown that this is the case. This difference in practice between (5.120) and (5.121) suggests a way of checking how significant truncating the tail of hd(n1, n2) is in spatial domain design. Once h(n1, n2) is estimated, the error can be computed by both (5.120) and (5.121). Since the tail parts of hd(n1, n2) and h(n1, n2) affect (5.120) and (5.121) differently, a small difference in the two computed errors is a reasonable indication that truncation of hd(n1, n2) is not very significant.

Another error criterion that has been considered is
5.6 IMPLEMENTATION
on the order in which the output values are computed. Suppose we have an N × N-point input sequence x(n1, n2) that is zero outside 0 ≤ n1 ≤ N - 1, 0 ≤ n2 ≤ N - 1, and we wish to compute the output y(n1, n2) in the same region, that is, 0 ≤ n1 ≤ N - 1, 0 ≤ n2 ≤ N - 1. The output and input masks for (5.132) are shown in Figure 5.32. As was shown in Section 2.2.3, the boundary conditions can be computed from the output and input masks and x(n1, n2). The region of support of x(n1, n2) and the boundary conditions are shown in Figure 5.33.

Figure 5.33 The output y(n1, n2) in the cross-hatched region has to be computed to compute y(n1, n2) for 0 ≤ n1 ≤ N - 1 and 0 ≤ n2 ≤ N - 1.

We will consider two of the many different orders in which the output can be computed: row by row and column by column.

N memory units for row-by-row computation
1 memory unit for column-by-column computation    (5.134)

The problem of counting the number of memory units corresponding to z1^{-1} or z2^{-1} is further complicated by the fact that the region of support of the boundary conditions depends on the output and input masks. Consider a computational procedure given by (5.135).

Figure 5.34 Required storage elements corresponding to z2^{-1} in (a) row-by-row computation and (b) column-by-column computation.

The standard methods of implementing 1-D IIR filters are the direct, cascade, and parallel forms. These structures can be used for the realization of any causal IIR filter with a rational z-transform. In the 2-D case, only the direct form can be used to implement any recursively computable IIR filter with a rational z-transform. As we will discuss in Section 5.6.3, the cascade and parallel forms can be used to realize only a small subclass of recursively computable IIR filters.

To illustrate the direct form implementation, we will use a specific example. Consider a system function H(z1, z2) given by

H(z1, z2) = (6 + 7z1^{-1} + 8z2^{-1}) / (1 - 2z1^{-1} - 3z2^{-1} - 4z1^{-1}z2^{-1} - 5z1^{-2}z2^{-1}).    (5.136)

The filter coefficients in (5.136) were arbitrarily chosen to be simple numbers.
Sec. 5.6 Implementation 319
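The point that the scan order affects only the storage requirement, not the output, can be checked with a small sketch. The recursion and its coefficients below are illustrative choices of my own, not taken from the text; both scan orders satisfy the dependencies of a first-quadrant output mask:

```python
import numpy as np

def first_quadrant_filter(x, order="row"):
    """y(n1, n2) = x(n1, n2) + 0.5 y(n1-1, n2) + 0.25 y(n1, n2-1),
    computed with zero boundary conditions in the chosen scan order."""
    N1, N2 = x.shape
    y = np.zeros((N1, N2))
    if order == "row":
        scan = [(n1, n2) for n2 in range(N2) for n1 in range(N1)]
    else:  # column by column
        scan = [(n1, n2) for n1 in range(N1) for n2 in range(N2)]
    for n1, n2 in scan:
        y[n1, n2] = (x[n1, n2]
                     + (0.5 * y[n1 - 1, n2] if n1 > 0 else 0.0)
                     + (0.25 * y[n1, n2 - 1] if n2 > 0 else 0.0))
    return y

# Impulse input: both scan orders produce the identical impulse response.
x = np.zeros((8, 8)); x[0, 0] = 1.0
y_row = first_quadrant_filter(x, "row")
y_col = first_quadrant_filter(x, "column")
```

Only the amount of past output that must be held in memory while scanning differs between the two orders, which is exactly the asymmetry counted in (5.134).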
Figure 5.37 Signal flowgraphs corresponding to the computational procedure in (5.137).

Figure 5.38 Signal flowgraphs obtained from the signal flowgraphs in Figure 5.37 by eliminating common delay elements.

The computational procedure corresponding to (5.136) is given by

y(n1, n2) ← 2y(n1 - 1, n2) + 3y(n1, n2 - 1) + 4y(n1 - 1, n2 - 1) + 5y(n1 - 2, n2 - 1)
            + 6x(n1, n2) + 7x(n1 - 1, n2) + 8x(n1, n2 - 1).    (5.137)

A structure which can be obtained directly from (5.137) is shown in Figure 5.37(a). If we reverse the locations of the z1^{-1} and z2^{-1} elements in Figure 5.37(a) and make corresponding changes, we will also obtain the structure in Figure 5.37(b). Since delaying the signals first and then adding the results is equivalent to adding the signals first and then delaying the result, we can eliminate one z1^{-1} element in Figure 5.37(a) and one z2^{-1} element in Figure 5.37(b). The resulting structures are shown in Figure 5.38. Note that the signal flowgraph representation makes it quite easy to see which delay elements are redundant and thus can be eliminated. The advantage of the structures in Figure 5.38 over those in Figure 5.37 is the reduction in the number of memory units required. The structure in Figure 5.38(a) requires 3 + N memory units for row-by-row computation and 3N + 1 units for column-by-column computation. The structure in Figure 5.38(b) requires 2 + 2N memory units for both row-by-row and column-by-column computations. From the perspective of memory requirements, then, the structure in Figure 5.38(a) is preferable for row-by-row computation, while the structure in Figure 5.38(b) is preferable for column-by-column computation.

The signal flowgraphs in Figure 5.38 were obtained from those in Figure 5.37 by eliminating either redundant z1^{-1} elements or redundant z2^{-1} elements. It is natural to ask if we can eliminate both redundant z1^{-1} elements and redundant z2^{-1} elements. A structure which requires the theoretically minimum number of z1^{-1} and z2^{-1} elements is called a minimal realization. For a first-quadrant support M1 × M2-point sequence a(n1, n2) with a(M1 - 1, M2 - 1) ≠ 0 and an N1 × N2-point sequence b(n1, n2) with b(N1 - 1, N2 - 1) ≠ 0, it is not possible to obtain a structure which requires fewer than max[M1 - 1, N1 - 1] z1^{-1} elements and max[M2 - 1, N2 - 1] z2^{-1} elements, where max[·,·] is the larger of the two arguments. The total number of delay elements cannot therefore be less than max[M1 - 1, N1 - 1] + max[M2 - 1, N2 - 1]. Methods [Fornasini and Marchesini; Kung et al.; Chan; Fornasini] have been developed to obtain minimal realizations. Unfortunately, however, branch transmittances in minimal realizations are obtained by solving nonlinear equations and are complex-valued. The advantages due to a smaller number of delay elements in minimal realizations often disappear due to the need to perform complex arithmetic and store complex-valued elements.
Figure 5.41 Signal flowgraph in Figure 5.38(a), redrawn with each node marked with a specific signal value (left, middle, and right regions).

v2(n1, n2) = v4(n1 - 1, n2).    (5.139c)

For this particular structure, this observation allows us to compute all the signal values in the leftmost and rightmost regions shown in the signal flowgraph except y(n1, n2). We can then compute v2(n1, n2), since all signals entering v2(n1, n2) come either through the delay element or through the leftmost and rightmost portions. The above approach can also be used for other structures to determine the order in which the intermediate signals can be recursively computed.

The structures discussed in this section are called direct form, since they were obtained directly from the computational procedure. Direct form structures can be used to implement any recursively computable IIR filter. They can also be used to implement subsystems in an implementation with a special structure. This will be discussed in a later section.

Figure 5.42 1-D signal flowgraph corresponding to the 1-D computational procedure in (5.144).
The two variables s1(n) and s2(n) are state variables, and the two equations in (5.145) are state equations. We can also express from the signal flowgraph the output y(n) as a linear combination of s1(n), s2(n), and x(n), as in (5.146). Equation (5.146) is called the observation equation. Equations (5.145) and (5.146) are a state-space representation of the computational procedure in (5.144).

A general state-space representation for any 1-D causal computational procedure with one input and one output is in the form of (5.147), where s(n) is a vector of state variables. In the state-space representation, we relate the input to the state variables and relate the state variables to the output. The states summarize the past history of the system. The state and observation equations in (5.147) can be used in generating future state and output values.

As in 1-D, a 2-D computational procedure with first-quadrant support a(n1, n2) and b(n1, n2) can be expressed in a state-space form. Among the several variations [Attasi; Roesser; Fornasini and Marchesini] proposed, the form proposed by [Roesser] is the most general, and we will consider this model. As an example of a 2-D state-space representation, consider the computational procedure in (5.137). A signal flowgraph corresponding to (5.137) was previously shown in Figure 5.41. Denoting the outputs of the three z1^{-1} elements in the signal flowgraph as s1^h(n1, n2), s2^h(n1, n2), and s3^h(n1, n2), we obtain the state equations in (5.148). Denoting the output of the z2^{-1} element in the signal flowgraph as s1^v(n1, n2), we have (5.149). The variable s1^v(n1, n2), which has the superscript v, propagates in the vertical direction and is called a vertical state variable. Both the horizontal and vertical state variables are state variables in the state-space representation, and the four equations in (5.150) are state equations. We can express from the signal flowgraph the output y(n1, n2) as a linear combination of the four state variables and x(n1, n2), as in (5.151). Equation (5.151) is the observation equation. Equations (5.150) and (5.151) are a state-space representation of the signal flowgraph in Figure 5.41. The equations in (5.150) and (5.151) can be used to compute the output as an alternative to the equations in (5.143).

A general state-space representation for any 2-D computational procedure with first-quadrant support a(n1, n2) and b(n1, n2), one input, and one output is in the form of (5.152), where s^h(n1, n2) is a vector of horizontal state variables that propagate in the horizontal direction and s^v(n1, n2) is a vector of vertical state variables that propagate in the vertical direction. By taking the z-transform of (5.152), we can compute the system function H(z1, z2) = Y(z1, z2)/X(z1, z2), given by (5.153).
If an FIR filter is implemented by exploiting the computational efficiency of FFT algorithms, however, the IIR filter's advantage is likely to disappear. IIR filters are useful, therefore, mainly in those applications where low-cost implementation is very important. Because of their overwhelming general advantage over IIR filters, FIR filters are much more common in practice. However, the design and implementation of 2-D FIR and IIR filters are active areas of research. The impact that this research will have on the practical application of 2-D FIR and IIR filters remains to be seen.

Significant differences between the 1-D and 2-D cases also exist in the design and implementation of digital filters. In 1-D, there are practical methods to design optimal FIR and IIR filters. In 2-D, practical methods that can be reliably used to design optimal FIR or IIR filters have not yet been developed. In 1-D, checking the stability of an IIR filter and stabilizing an unstable filter without affecting the magnitude response are quite simple. In contrast, checking the stability and stabilizing an unstable filter without significantly affecting the magnitude response is a big task in 2-D. In 1-D, the cascade and parallel forms are general implementation methods for rational system functions. In 2-D, the cascade and parallel forms cannot generally be used for the realization of rational system functions. Design and implementation are more complex for 2-D than for 1-D digital filters.

REFERENCES

For a tutorial overview of 2-D digital filtering, see [Mersereau and Dudgeon]. For constraints imposed by symmetry on filter design and implementation, see [Aly and Fahmy (1981); Pitas and Venetsanopoulos]. For design methods based on cascade implementation, see [Maria and Fahmy; Costa and Venetsanopoulos; Chang and Aggarwal (1977); Goodman; Iijima et al.]. For design methods for separable denominator filters, see [Twogood and Mitra; Abramatic et al.; Lashgari et al.; Hinamoto]. For readings on design methods based on both the magnitude and phase error criteria, see [Aly and Fahmy (1978); Woods et al.; Shimizu and Hirata; Hinamoto and Maekawa].

J. F. Abramatic, F. Germain, and E. Rosencher, Design of two-dimensional separable denominator recursive filters, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-27, October 1979, pp. 445-453.

S. A. H. Aly and M. M. Fahmy, Design of two-dimensional recursive digital filters with specified magnitude and group delay characteristics, IEEE Trans. Circuits and Systems, Vol. CAS-25, November 1978, pp. 908-916.

S. A. H. Aly and M. M. Fahmy, Spatial-domain design of two-dimensional recursive digital filters, IEEE Trans. Circuits and Systems, Vol. CAS-27, October 1980, pp. 892-901.

S. A. H. Aly and M. M. Fahmy, Symmetry exploitation in the design and implementation of recursive 2-D rectangularly sampled digital filters, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-29, October 1981, pp. 973-982.

A. Attasi, Systèmes linéaires homogènes à deux indices, Rapport Laboria, No. 31, September 1973.

J. B. Bednar, Spatial recursive filter design via rational Chebyshev approximation, IEEE Trans. Circuits and Systems, Vol. CAS-22, June 1975, pp. 572-574.

Chap. 5 References 331
PROBLEMS
(a) If we use the notational convention discussed in Section 5.1, H(z1, z2) above will correspond to one particular computational procedure. Determine the computational procedure.
(b) Sketch the output and input masks of the computational procedure in (a).
(c) If we do not assume the notational convention discussed in Section 5.1, more than one computational procedure will have the above system function H(z1, z2). Determine all computational procedures that have the above system function.
5.2. Consider a system function H(z1, z2), given by
Figure P5.4
Chap. 5 Problems
a(n1, n2) and b(n1, n2) that could be used.
5.5. Consider a system function H(z1, z2) = B(z1, z2)/A(z1, z2), where a(n1, n2) is an M × M-point first-quadrant support sequence and b(n1, n2) is an N × N-point first-quadrant support sequence. We wish to design a zero-phase filter with a circularly symmetric filter specification by a parallel combination of H(z1, z2), H(z1^{-1}, z2), H(z1, z2^{-1}), and H(z1^{-1}, z2^{-1}). For a fixed M and N, imposing symmetry on H(z1, z2) reduces the number of independent parameters in H(z1, z2) and therefore may reduce the complexity of the filter design problem.
(a) What type of symmetry is reasonable to impose on H(z1, z2)?
(b) To achieve the symmetry in (a), what type of symmetry is reasonable to impose on A(z1, z2) and B(z1, z2)?
(c) For the symmetry imposed on A(z1, z2) and B(z1, z2) in (b), how many independent parameters do we have in H(z1, z2)?
5.6. Suppose H(z1, z2), the system function of an IIR filter, is given by
Assuming that the desired impulse response hd(n1, n2) is a first-quadrant support sequence, use the Pade matching method to estimate a, b, c, and d. Express your answer in terms of hd(0, 0), hd(0, 1), hd(1, 0), and hd(1, 1).
5.7. Consider a system function H(z1, z2), given by
(a) Determine h(n1, n2), the impulse response of the system.
(b) Assume that h(n1, n2) in (a) is the desired impulse response hd(n1, n2). Suppose we estimate the parameters a, b, c, and d in the system function F(z1, z2) given by
using the Pade matching method. Show that a = -1/2, b = -1/4, c = 1, and d = 1.
(c) If we estimate the parameters a, b, c, and d in (b) using Prony's method, do we get the same results as in (b)?
5.8. We have measured hd(n1, n2), the impulse response of an unknown first-quadrant support LSI system, as shown in the following figure.
LSI System
We wish to model the unknown system by a rational system function, as shown in the following figure.
Is the resulting set of equations linear or nonlinear? If the equations are linear, explicitly determine a, b, and c. If the equations are nonlinear, explain why.
5.9. Suppose we are given a desired zero-phase infinite impulse response hd(n1, n2). We wish to design a zero-phase IIR filter whose impulse response is as close as possible in some sense to hd(n1, n2). One approach, which we discussed in Section 5.2.4, is to divide hd(n1, n2) into four quadrants, design four one-quadrant IIR filters, and then implement them in an appropriate way. We will refer to this as Approach 1. Another approach, which we will call Approach 2, is to divide hd(n1, n2) into two segments, design two IIR filters, and then implement them in an appropriate way.
(a) To design the IIR filter only once in Approach 2, how should we segment hd(n1, n2)? Specify the window to be used in the segmentation.
(b) After one filter is designed in (a), how should that filter be used for the second filter?
(c) If we wish to design an IIR filter whose impulse response approximates a desired impulse response that is nonzero everywhere, which approach would you recommend? Explain why one of the two approaches is inappropriate.
5.10. Suppose we wish to design a 2-D IIR filter whose impulse response h(n1, n2) approximates the desired impulse response hd(n1, n2). The sequence hd(n1, n2) is circularly symmetric; that is, hd(n1, n2) = f(n1² + n2²). One such example is
The 2-D IIR filter h(n1, n2) should have the following three characteristics:
(1) The sequence h(n1, n2) should have a maximum amount of symmetry and should be symmetric with respect to as many lines as possible that pass through the origin. One such line is the n2 axis, in which case h(n1, n2) = h(-n1, n2).
(2) The sequence h(n1, n2) should be designed as a combination of recursively computable IIR subfilters hi(n1, n2). Only one subfilter should be designed. Let H1(z1, z2) denote the system function of this subfilter. All others should be derivable from H1(z1, z2) by inspection, exploiting only symmetry considerations.
(3) The desired impulse response that H1(z1, z2) approximates should have as small a region of support as possible. This simplifies the design of H1(z1, z2).
Develop an approach that will achieve the above objectives. Show that your approach will have the desired characteristics. Clearly state the system functions of all the subfilters in terms of H1(z1, z2), the system function of the subfilter you designed.
5.11. Consider a sequence x(n1, n2) which has a valid complex cepstrum. We can express X(ω1, ω2), the Fourier transform of x(n1, n2), in terms of its real and imaginary parts as
X(ω1, ω2) = XR(ω1, ω2) + jXI(ω1, ω2).
We can also express X̂(ω1, ω2), the Fourier transform of x̂(n1, n2), in terms of its real
Show that r(n1, n2) has a valid complex cepstrum, and determine r̂(n1, n2), the complex cepstrum of r(n1, n2).
5.15. Let θ̃x(ω) represent the principal value of the phase of X(ω), the Fourier transform of x(n), and let θx(ω) represent the odd and continuous phase function of X(ω). Suppose θx(ω) has been sampled at ω = (2π/N)k and that θ̃x(k) = θx(ω)|ω=(2π/N)k is shown below.
Determine the system function of a system that is causal and stable without changing |H(ω)|, using each of the following methods:
(a) By reflecting each root outside the unit circle to a root inside the unit circle at a conjugate reciprocal location.
(b) By using the complex cepstrum.
5.18. Let x(n1, n2) denote a real and stable sequence that has a rational z-transform X(z1, z2) with no pole or zero surfaces crossing the unit surface. Suppose we wish to factor X(z1, z2) as a product of four z-transforms as follows:
The function Xi(z1, z2) represents the z-transform of a sequence that has the ith-quadrant support region. For example, X2(z1, z2) represents the z-transform of a second-quadrant support sequence. Develop a method to perform the above spectral factorization. Note that Xi(z1, z2) may not be a rational z-transform.
5.19. Let a(n1, n2) denote a real, finite-extent, and stable sequence with its Fourier transform denoted by A(ω1, ω2). The sequence a(n1, n2), in addition, satisfies the following properties:
(1) A(ω1, ω2) ≠ 0 at any (ω1, ω2).
(2) a(n1, n2) = a(-n1, n2) = a(n1, -n2) = a(-n1, -n2).
(3) â(n1, n2), the complex cepstrum of a(n1, n2), is well defined without any modification of a(n1, n2).
Suppose we obtain a sequence b(n1, n2) by the system in Figure P5.19. Note that l(n1, n2) is a third-quadrant support sequence.
Figure P5.19
(a) Is c(n1, n2) a well-defined sequence; that is, is C(ω1, ω2) a valid Fourier transform? Explain your answer.
(b) Can you recover a(n1, n2) from b(n1, n2)? If so, express a(n1, n2) in terms of b(n1, n2). If not, explain why a(n1, n2) cannot be recovered from b(n1, n2).
5.20. Suppose an image x(n1, n2) is degraded by blurring in such a way that the blurred image y(n1, n2) can be represented by
y(n1, n2) = x(n1, n2) * b(n1, n2)
where b(n1, n2) represents the impulse response of the blurring system. Using the complex cepstrum, develop a method to reduce the effect of b(n1, n2) by linear filtering in the complex cepstrum domain. This type of system is referred to as a homomorphic system for convolution. Determine the conditions under which the effect of b(n1, n2) can be completely eliminated by linear filtering in the complex cepstrum domain.
5.21. Consider a 2 × 2-point first-quadrant support sequence a(n1, n2) sketched in the following figure.
Figure P5.21
We estimate the filter coefficients a, b, and c by minimizing

Error = Σ_{n=-∞}^{∞} (hd(n) - h(n))².

(a) Show that a = -1/2, b = 0, c = 1, and that h(n) is given by h(n) = (1/2)^n u(n).
(b) Is the system in (a) stable?
(c) In a frequency domain design method, we assume that
We estimate a, b, and c by minimizing

Error = (1/(2π)) ∫_{ω=-π}^{π} |Hd(ω) - H(ω)|² dω.
by row. You may assume that the input x(n,, n,) is zero outside 0 5 n, 5 N - 1,
0 s n, 5 N - 1 where N >> 1, and that we wish to compute y(n,, n,) for 0 5 n, 5
Show that a = - 912, b = 2, c = - 7, and h(n) obtained by inverse Fourier N-1,0sn2sN-1.
transforming H(w) is given by
Note that the filter is stable, but it does not have the same region of support as
h(n) in (a), and it is not even recursively computable.
(d) Suppose we use the a, b, and c obtained in (c) but use the computational procedure
in (3). Determine h(n) and show that the system is unstable.
(e) Suppose hd(n) = (f)"u(n). Show that the impulse responses obtained from the
spatial domain design method in (a) and the frequency domain design method in
(c) are the same.
Even though Parseval's theorem states that the two error criteria in (4) and (6) are
the same, they can lead to different results in practice, depending on how they are
used.
Figure P5.27
(a) Write a series of difference equations that will realize the above signal flowgraph.
(b) Determine the system function of the above signal flowgraph.
5.23. One error criterion that penalizes an unstable filter is given by (5.127). If we let the parameter a in (5.127) approach ∞, is the filter designed guaranteed to be stable? Is minimizing the Error in (5.127) while letting a approach ∞ a good way to design an IIR filter?
5.24. In one design method by spectral transformation, a 2-D IIR filter is designed from a 1-D IIR filter. Let H(z) represent a 1-D causal and stable IIR filter. The filter is a lowpass filter with cutoff frequency of π/2. We design two 2-D filters H1(z1, z2) and H2(z1, z2) by
(a) What is the approximate magnitude response of H1(z1, z2)?
(b) Is H1(z1, z2) stable and recursively computable?
(c) What is the approximate magnitude response of H2(z1, z2)?
(d) Is H2(z1, z2) stable and recursively computable?
(e) We design another 2-D filter HT(z1, z2) by
What is the approximate magnitude response of HT(z1, z2)?
5.28. Consider a first-quadrant support IIR filter whose system function is given by
The filter may or may not be stable, depending on the coefficients a, b, and c.
(a) If a = 2, b = 4, and c = 8, the filter is unstable. Determine H'(z1, z2) which is stable and is a first-quadrant support system, and which has the same magnitude response as H(z1, z2). Note that the denominator polynomial is factorable for this choice of a, b, and c.
(b) We wish to implement H(z1, z2), even though it may be unstable. Sketch a signal flowgraph of H(z1, z2) which minimizes (within a few storage elements of the minimum possible) the number of storage elements needed when the input is available column by column. Find the total number of storage elements needed when the input is zero outside 0 ≤ n1 ≤ N − 1, 0 ≤ n2 ≤ N − 1 and we wish to compute the output for 0 ≤ n1 ≤ N − 1, 0 ≤ n2 ≤ N − 1.
5.25. Let H(z1, z2) denote a system function given by
(a) Sketch a signal flowgraph that realizes the above system function using a direct form.
(b) Sketch a signal flowgraph that realizes the above system function using a cascade form.
5.26. Consider the following computational procedure:
5.29. Consider the following signal flowgraph, which implements an IIR filter:
Figure P5.29
Draw a signal flowgraph that requires the smallest (within a few storage elements of the minimum possible) number of storage units when the input is available row
To implement the system, we consider two approaches. In the first approach (Approach 1), we use a computational procedure that relates x(n1, n2) to y(n1, n2) directly. In the second approach (Approach 2), we implement 1/A1(z1) first with a computational procedure that relates x(n1, n2) to r(n1, n2), and then implement 1/A2(z2) with a computational procedure that relates r(n1, n2) to y(n1, n2).
(a) How many multiplications and additions do we need to evaluate one output point using Approach 1?
(b) Determine the computational procedure that relates x(n1, n2) to r(n1, n2).
(c) Determine the computational procedure that relates r(n1, n2) to y(n1, n2).
(d) How many multiplications and additions do we need to evaluate one output point using Approach 2?

Figure P5.30a

(a) Determine the computational procedure.
(b) Determine the system function.
(c) Suppose the input x(n1, n2) is zero outside 0 ≤ n1 ≤ 511, 0 ≤ n2 ≤ 511. We wish to compute the output y(n1, n2) for 0 ≤ n1 ≤ 511, 0 ≤ n2 ≤ 511. Is it possible to compute y(n1, n2) in only the desired region without computing any other output points? If not, what is the region for which the output is not desired but has to be computed?
(d) If the output is computed column by column, how many storage elements are required to realize z1^{-1} and z2^{-1} elements, respectively? Assume that the input is available in any order desired.
(e) Can the output be computed row by row?
(f) Answer (d) if the output is computed in the direction shown below.

5.34. One approach to designing a zero-phase IIR filter is by cascading more than one IIR filter. Let H(z1, z2) denote a first-quadrant support IIR filter. We now cascade H(z1, z2), H(z1^{-1}, z2), H(z1, z2^{-1}), and H(z1^{-1}, z2^{-1}), as shown below.

Figure P5.34
(a) Show that F(z1, z2) given by

F(z1, z2) = H(z1, z2) H(z1^{-1}, z2) H(z1, z2^{-1}) H(z1^{-1}, z2^{-1})

is a zero-phase filter.
(b) Let the input x(n1, n2) denote an N × N-point finite-extent sequence that is zero outside 0 ≤ n1 ≤ N − 1, 0 ≤ n2 ≤ N − 1. What is the region of support of y1(n1, n2), the output of the system H(z1, z2)?
(c) If we do not truncate the infinite-extent sequence y1(n1, n2), is it possible to implement H(z1^{-1}, z2) with a recursively computable system?
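For a real h(n1, n2), the zero-phase property of F(z1, z2) can be checked numerically on the unit bicircle, where H(z1^{-1}, z2) becomes H(e^{−jω1}, e^{jω2}); a sketch with a hypothetical 3 × 3 FIR response standing in for the IIR filter:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((3, 3))      # hypothetical real first-quadrant impulse response

def Hf(w1, w2):
    """Frequency response H(w1, w2) = sum over (n1, n2) of h(n1, n2) e^{-j(w1 n1 + w2 n2)}."""
    n = np.arange(3)
    return np.einsum("ij,i,j->", h, np.exp(-1j * w1 * n), np.exp(-1j * w2 * n))

# F on the unit bicircle: H(z1, z2) H(z1^-1, z2) H(z1, z2^-1) H(z1^-1, z2^-1).
for w1, w2 in [(0.3, 1.1), (-2.0, 0.7), (1.9, -2.4)]:
    F = Hf(w1, w2) * Hf(-w1, w2) * Hf(w1, -w2) * Hf(-w1, -w2)
    # Zero phase: F is real and nonnegative at every frequency.
    assert abs(F.imag) < 1e-10 and F.real >= 0.0
```

The check works because, for real h, Hf(−w1, −w2) is the conjugate of Hf(w1, w2), so the four-factor product collapses to a product of two squared magnitudes.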
6.0 INTRODUCTION

Estimating the power spectrum associated with a random process is desirable in many applications. For example, Wiener filtering is one approach to solving image restoration problems in which a signal is degraded by additive random noise. The frequency response of a noncausal Wiener filter requires knowledge of the spectral contents of the signal and background noise. In practice, they often are not known and must be estimated from segments of data. As another example, spectral contents of signals received from an array of sensors contain information on the signal source, such as the direction of low-flying aircraft in the case of an acoustic sensor array application.

This chapter treats the 2-D spectral estimation problem. At a conceptual level, 2-D spectral estimation methods are straightforward extensions of 1-D methods, or can be derived straightforwardly from them. However, some 2-D spectral estimation methods differ considerably from the corresponding 1-D methods in such details as computational complexity and properties. We examine both similarities and differences, with more emphasis on topics where there are significant differences between the 1-D and 2-D cases.

In Section 6.1, we briefly summarize some fundamental results related to discrete-space random processes. Readers familiar with the material may wish to skim this section, but should give some attention to the notation used. In Section 6.2, we discuss various issues related to spectral estimation and different approaches to spectral estimation. In Section 6.3, the performance of several different spectral estimation methods is compared. Section 6.4 contains some additional comments on the spectral estimation problem. In Section 6.5, we illustrate the application of a spectral estimation technique to data from an array of acoustic microphones.

Parts of this chapter appeared in "Multi-dimensional Spectral Estimation," by Jae S. Lim, Chapter 6 of Advances in Computer Vision and Image Processing: Image Enhancement and Restoration, edited by Thomas S. Huang. Copyright © 1986 by JAI Press. Reprinted by permission of the publisher.

6.1 RANDOM PROCESSES

This section summarizes the fundamentals of random processes, which are useful in this chapter for studying spectral estimation and in later chapters for studying image processing. Since the reader is assumed to have some familiarity with random processes, many results are simply stated. Results with which some readers may not be familiar are derived. Despite the derivations, this section is intended to serve primarily for review and establishment of notation. For a detailed and complete presentation of topics related to random processes, the reader is referred to [Papoulis; Van Trees].

6.1.1 Random Variables

A real random variable x is a variable that takes on real values at random, for instance, from the outcome of flipping a coin. It is completely characterized by its probability density function px(x0). The subscript x in px(x0) denotes the random variable x, and x0 is a dummy variable that denotes a specific value of x. The probability that x will lie between a and b is given by

Prob [a ≤ x ≤ b] = ∫_a^b px(x0) dx0.   (6.1)

Since an event that is certain to occur is assumed to have a probability of 1,

∫_{−∞}^{∞} px(x0) dx0 = 1.   (6.2)

The expectation of a function of a random variable x, E[f(x)], is defined by

E[f(x)] = ∫_{−∞}^{∞} f(x0) px(x0) dx0.   (6.3)

The expectation defined above is a linear operator and satisfies

E[f(x) + g(x)] = E[f(x)] + E[g(x)]   (6.4)

and

E[c f(x)] = c E[f(x)]   (6.5)

where c is any scalar constant.
The nth moment of a random variable x, E[x^n], is defined by

E[x^n] = ∫_{−∞}^{∞} x0^n px(x0) dx0.   (6.6)

The first moment of x is called the mean or average of x. From (6.6),

mx = E[x] = ∫_{−∞}^{∞} x0 px(x0) dx0.   (6.7)

The variance of x, Var [x], is defined by

Var[x] = E[(x − E[x])²] = E[x² − 2xE[x] + E²[x]] = E[x²] − E²[x].   (6.8)
Sec. 6.1 Random Processes 347
The standard deviation of x, s.d. [x], is defined by

s.d. [x] = (Var [x])^{1/2}.   (6.9)

Two random variables, x and y, are completely characterized by their joint probability density function p_{x,y}(x0, y0). They are said to be statistically independent if they satisfy

p_{x,y}(x0, y0) = px(x0) py(y0)  for all (x0, y0).   (6.10)

The expectation of a function of two random variables, E[f(x, y)], is defined by

E[f(x, y)] = ∫∫ f(x0, y0) p_{x,y}(x0, y0) dx0 dy0.   (6.11)

Two random variables x and y are said to be linearly independent if

E[xy] = E[x]E[y].   (6.12)

Statistical independence implies linear independence, but linear independence does not imply statistical independence.
The probability density function of a random variable x given (conditioned on) another random variable y is denoted by p_{x|y}(x0|y0) and is defined by

p_{x|y}(x0|y0) = p_{x,y}(x0, y0)/py(y0).   (6.13)

If x and y are statistically independent, knowing y does not tell us anything about x, and p_{x|y}(x0|y0) reduces to px(x0). The expectation of a function of x conditioned on y, E[f(x)|y], is defined by

E[f(x)|y] = ∫ f(x0) p_{x|y}(x0|y0) dx0.   (6.14)

A complex random variable w is defined by

w = x + jy   (6.15)

where x and y are the two random variables defined above. The expectation of a function of w, E[f(w)], is defined by

E[f(w)] = E[f(x + jy)]   (6.16)
        = ∫∫ f(x0 + jy0) p_{x,y}(x0, y0) dx0 dy0.

The mean of w is defined by

E[w] = E[x + jy] = E[x] + jE[y].   (6.17)

The variance of w is defined by

Var[w] = E[(w − E[w])(w − E[w])*]
       = E[ww*] − E[w]E*[w]   (6.18)
       = E[x² + y²] − (E²[x] + E²[y])
       = Var [x] + Var [y].

Note that the variance of w is real and nonnegative, even though w may be complex.
Many random variables x, y, z, . . . are completely characterized by their joint probability density function p_{x,y,z,...}(x0, y0, z0, . . .). The definitions of the expectation operation and linear and statistical independence in this case are similar to the definitions in the two random variable case.

6.1.2 Random Processes

A collection of an infinite number of random variables is called a random process. If the random variables are real, the collection is called a real random process. If the random variables are complex, the collection is called a complex random process. Unless stated otherwise, all the results in this section will be stated for complex random processes. The results for a real random process are special cases of the results for a complex random process.
Let us denote an infinite number of complex random variables by x(n1, n2), where x(n1, n2) for a particular (n1, n2) is a complex random variable. The random process x(n1, n2) is completely characterized by the joint probability density function of all the random variables. If we obtain one sample, or realization, of the random process x(n1, n2), the result will be a 2-D sequence. We will refer to this 2-D sequence as a random signal, and will denote it also by x(n1, n2). Whether x(n1, n2) refers to a random process or one realization of a random process will usually be clear from the context. Otherwise, what is meant will be specifically stated. The collection of all possible realizations is called the ensemble of the random process x(n1, n2).
The auto-correlation function, or for short, the correlation function of the random process x(n1, n2), Rx(n1, n2; k1, k2), is defined by

Rx(n1, n2; k1, k2) = E[x(n1, n2) x*(k1, k2)].   (6.19)

The correlation is the expectation of the product of two complex random variables, x(n1, n2) and x*(k1, k2). The auto-covariance function, or the covariance function, for short, of x(n1, n2), γx(n1, n2; k1, k2), is defined by

γx(n1, n2; k1, k2) = E[(x(n1, n2) − E[x(n1, n2)])(x*(k1, k2) − E*[x(k1, k2)])].   (6.20)

A random process x(n1, n2) is called a zero-mean process if

E[x(n1, n2)] = 0  for all (n1, n2).   (6.21)

For a zero-mean random process,

γx(n1, n2; k1, k2) = Rx(n1, n2; k1, k2).   (6.22)

A random process x(n1, n2) with nonzero mean can always be transformed to a zero-mean random process by subtracting E[x(n1, n2)] from x(n1, n2). Unless specified otherwise, we will assume that x(n1, n2) is a zero-mean process and (6.22) is valid.
A random process x(n1, n2) is said to be stationary or homogeneous in the
From (6.27), it is clear that the correlation sequence has complex conjugate symmetry.
A stationary random process x(n1, n2) is said to be ergodic if the time (or space) average equals the ensemble average. Suppose we wish to estimate mx = E[x(n1, n2)] from realizations, or samples, of a stationary x(n1, n2). Since mx represents an ensemble average, we need an ensemble (an entire collection of all possible outcomes) of x(n1, n2) for any particular (n1, n2). If the random process is ergodic, then mx can be computed from one realization of x(n1, n2) by

mx = E[x(n1, n2)] = lim_{N→∞} (1/(2N + 1)²) Σ_{n1=−N}^{N} Σ_{n2=−N}^{N} x(n1, n2).   (6.28)

Similarly, for an ergodic process,

Rx(n1, n2; k1, k2) = { σx²(n1, n2),  n1 = k1, n2 = k2
                    { 0,            otherwise.

For a stationary white noise process, then,

Rx(n1, n2) = E[x(k1, k2) x*(k1 − n1, k2 − n2)]   (6.36)
           = σx² δ(n1, n2).

From (6.30) and (6.36), the power spectrum of a stationary white noise process is given by

Px(ω1, ω2) = σx²  for all (ω1, ω2).   (6.37)

The power spectrum is constant for all frequencies; hence the term "white."
For a real random process x(n1, n2), (6.19), (6.26), (6.27), and (6.29) reduce to
Since Px(ω1, ω2) is periodic with a period of 2π in both the variables ω1 and ω2, from (6.42) Px(ω1, ω2) for a real random process is completely specified by Px(ω1, ω2) for −π ≤ ω1 ≤ π, 0 ≤ ω2 ≤ π. For a real random process, therefore, Px(ω1, ω2) is often displayed only for −π ≤ ω1 ≤ π, 0 ≤ ω2 ≤ π.
Two complex random processes x(n1, n2) and y(n1, n2) are completely characterized by the joint probability density function of the random variables in x(n1, n2) and y(n1, n2). The cross-correlation function of x(n1, n2) and y(n1, n2), Rxy(n1, n2; k1, k2), is defined by

Rxy(n1, n2; k1, k2) = E[x(n1, n2) y*(k1, k2)].   (6.43)

The cross-covariance function of x(n1, n2) and y(n1, n2), γxy(n1, n2; k1, k2), is defined by

γxy(n1, n2; k1, k2) = E[(x(n1, n2) − E[x(n1, n2)])(y*(k1, k2) − E*[y(k1, k2)])].   (6.44)

From (6.43) and (6.44), for zero-mean processes x(n1, n2) and y(n1, n2),

Rxy(n1, n2; k1, k2) = γxy(n1, n2; k1, k2).   (6.45)

For stationary processes x(n1, n2) and y(n1, n2),

Rxy(n1, n2) = E[x(k1, k2) y*(k1 − n1, k2 − n2)]  independent of (k1, k2).   (6.46)

For ergodic processes x(n1, n2) and y(n1, n2),

Rxy(n1, n2) = lim_{N→∞} (1/(2N + 1)²) Σ_{k1=−N}^{N} Σ_{k2=−N}^{N} x(k1, k2) y*(k1 − n1, k2 − n2).   (6.47)

The cross-power spectrum of two jointly stationary processes x(n1, n2) and y(n1, n2), Pxy(ω1, ω2), is defined by

Consider a stationary complex random process x(n1, n2) with mean of mx and correlation of Rx(n1, n2). Suppose we obtain a new random process y(n1, n2) by passing x(n1, n2) through an LSI system with the impulse response h(n1, n2), as shown in Figure 6.1. Clearly, y(n1, n2) is related to x(n1, n2) by

Note that x(n1, n2) is a random process, while h(n1, n2) is a deterministic signal. In practice, h(n1, n2) would typically be a real signal, but we will derive here a more general result that applies to a complex h(n1, n2).
We wish to determine E[y(n1, n2)], Ry(n1, n2; k1, k2), Rxy(n1, n2; k1, k2), and Ryx(n1, n2; k1, k2). We shall find that y(n1, n2) is a stationary random process. To determine E[y(n1, n2)], the expectation operator is applied to (6.49), recognizing that h(n1, n2) is a deterministic signal:

E[y(n1, n2)] = mx H(0, 0)  for all (n1, n2).

For a zero-mean x(n1, n2), y(n1, n2) is also zero mean.
To obtain Ry(n1, n2; k1, k2) from (6.49),

From (6.51), Ry(n1, n2; k1, k2) is a function of n1 − k1 and n2 − k2, and so, denoting Ry(n1, n2; 0, 0) by Ry(n1, n2), we can rewrite (6.51) as
352 Spectral Estimation Chap. 6
Similarly,

Rxy(n1, n2) = Rx(n1, n2) * h*(−n1, −n2).   (6.53)

Suppose we have a signal s(n1, n2) and a noise w(n1, n2) which are samples of zero-mean stationary random processes s(n1, n2) and w(n1, n2). The noisy observation x(n1, n2) is given by

x(n1, n2) = s(n1, n2) + w(n1, n2).   (6.58)

We wish to determine s(n1, n2) from x(n1, n2) using a linear estimator given by

From (6.59), (6.60b), and (6.61),

E[s(n1, n2) x*(m1, m2)] = E[ŝ(n1, n2) x*(m1, m2)].

The filter H(ω1, ω2) in (6.65) is called the noncausal Wiener filter.
Suppose s(n1, n2) is uncorrelated with w(n1, n2). Then we have

E[s(n1, n2) w*(m1, m2)] = E[s(n1, n2)] E[w*(m1, m2)].   (6.66)

From (6.66) and noting that s(n1, n2) and w(n1, n2) are zero-mean processes, we obtain

Rsx(n1, n2) = Rs(n1, n2)   (6.67a)
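For uncorrelated zero-mean signal and noise, these relations lead to the familiar noncausal Wiener frequency response H = Ps/(Ps + Pw); a minimal NumPy sketch on a DFT grid, with hypothetical white power spectra (cf. (6.37)):

```python
import numpy as np

def wiener_filter(x, Ps, Pw):
    """Apply a noncausal Wiener filter on a DFT grid.

    x  : noisy 2-D observation s + w (real array)
    Ps : signal power spectrum sampled on the np.fft.fft2 grid
    Pw : noise power spectrum on the same grid
    For uncorrelated zero-mean signal and noise, H = Ps / (Ps + Pw).
    """
    H = Ps / (Ps + Pw)                      # noncausal Wiener frequency response
    return np.real(np.fft.ifft2(H * np.fft.fft2(x)))

# Hypothetical example: white unit-variance signal, white noise of variance 0.25,
# so both power spectra are constant, as in (6.37).
rng = np.random.default_rng(0)
s = rng.standard_normal((64, 64))
w = 0.5 * rng.standard_normal((64, 64))
Ps = np.ones((64, 64))
Pw = 0.25 * np.ones((64, 64))
s_hat = wiener_filter(s + w, Ps, Pw)
```

With these flat spectra the filter reduces to a constant gain of 0.8, and the mean squared error of s_hat is smaller than that of the raw noisy observation.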
the product of the variance and resolution of the spectral estimate is about the
same to a first-order approximation.
Sec. 6.2 Spectral Estimation Methods 363
The function φx(n1, ω2) can be determined by applying 1-D Fourier transform operations to Rx(n1, n2) along the n2 dimension, and Px(ω1, ω2) can be determined by applying 1-D Fourier transform operations to φx(n1, ω2) along the n1 dimension. Although Px(ω1, ω2) is real and nonnegative, the intermediate function φx(n1, ω2) is in general complex. If φx(n1, ω2) is forced to be real and nonnegative in (6.91), the expression on the right-hand side of (6.90) will not be equal to Px(ω1, ω2).
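On a sampled grid, the two 1-D transform passes described above can be checked directly; a small NumPy sketch in which a hypothetical 5 × 5 array stands in for Rx(n1, n2):

```python
import numpy as np

# Hypothetical 5x5 correlation array standing in for Rx(n1, n2).
rng = np.random.default_rng(1)
Rx = rng.standard_normal((5, 5))

# Step 1: 1-D transforms along the n2 dimension give the intermediate
# function phi_x(n1, w2), which is in general complex.
phi = np.fft.fft(Rx, axis=1)

# Step 2: 1-D transforms of phi_x along the n1 dimension give Px(w1, w2).
Px = np.fft.fft(phi, axis=0)

# The two-step result agrees with a direct 2-D transform.
assert np.allclose(Px, np.fft.fft2(Rx))
```

The separability of the 2-D Fourier kernel is what permits this dimension-by-dimension evaluation.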
For dimension-dependent processing, it is not necessary to apply a 1-D spec-
tral estimation method to each dimension. For 3-D data, for example, a 1-D
spectral estimation method may be applied along one dimension and a 2-D spectral
estimation method along the remaining two dimensions. In a planar array envi-
ronment, where the data are 3-D (one time dimension and two spatial dimensions),
for example, first a conventional 1-D spectral estimation method can be applied
along the time dimension, and then a 2-D high-resolution spectral estimation tech-
nique, such as the maximum likelihood method (MLM), can be applied along the
two spatial dimensions.
The maximum likelihood method (MLM) was originally developed [Capon] for
nonuniformly sampled data. "Maximum likelihood" is a misnomer, in that this
method is not an ML estimator as discussed in Section 6.1. The computation and
performance aspects of this method are essentially the same in both 1-D and 2-D
except for the amount of data involved.
Consider a specific frequency (ω1', ω2') at which we wish to estimate the power spectrum Px(ω1, ω2). To develop the MLM, we first obtain y(n1, n2) by linearly combining (filtering) the random signal x(n1, n2) by
Figure 6.4 Conventional spectral estimates by correlation windowing. Parameters used for data generation: M (number of sinusoids) = 2; a1 (amplitude of the first sinusoid) = 1; a2 = 1; (ω11/2π, ω12/2π) = (−0.2, 0.2); (ω21/2π, ω22/2π) = (0.3, 0.3); φi (phase of the ith sinusoid) = 0; σ² (noise power) = 1; SNR = 3 dB; correlation size (region C), 9 × 9; CINC (increments in decibels between contours) = 2 dB. (a) Estimated correlation from data size of 6 × 6 was used; (b) exact correlation was used.

subject to

B(ω1', ω2') = Σ_{(n1,n2)∈B} b(n1, n2) e^{−jω1'n1} e^{−jω2'n2} = 1.   (6.94)

The average power E[|y(n1, n2)|²] with this choice of b(n1, n2) is then considered the spectral estimate P̂x(ω1', ω2'). In essence, the MLM estimates Px(ω1, ω2) by
and the resulting spectral estimate P̂x(ω1', ω2') is given by

The elements in R in (6.101) are the correlation points of the random process x. Since R does not depend on (ω1, ω2), R⁻¹ in (6.101) needs to be computed only once. Furthermore, P̂x(ω1, ω2) in (6.101) can be computed [Musicus] by using FFT algorithms in some cases by exploiting special structures in the matrix R.
In (6.92), we wish to choose as large a region B as possible, since we can minimize E[|y(n1, n2)|²] better with a larger B. A larger B, however, corresponds to a larger matrix R. Since all the elements in R are assumed known in (6.101),

where MLM(ω : N) represents the MLM spectral estimate, and MEM(ω : p) represents the MEM spectral estimate based on Rx(n) for −p ≤ n ≤ p. By rewriting (6.102), we can express MEM(ω : N) in terms of MLM(ω : N) and MLM(ω : N − 1) as

Equation (6.103) is the basis for the algorithm. Now suppose that Rx(n1, n2), the correlation of a 2-D signal, is either known or estimated for (n1, n2) ∈ C. A straightforward extension of (6.103) to 2-D signals
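In 1-D, the quadratic-form structure behind (6.101) is the Capon estimate P(ω) = 1/(eᴴR⁻¹e), with e a vector of complex exponentials; a sketch, using a hypothetical exact correlation matrix for one complex sinusoid in white noise (the specific numbers are illustrative):

```python
import numpy as np

def mlm_spectrum(R, omegas):
    """1-D MLM (Capon) spectral estimate: P(w) = 1 / (e^H R^{-1} e).

    R      : (N, N) correlation matrix of the process
    omegas : frequencies (radians/sample) at which to estimate the spectrum
    R^{-1} is computed only once, since it does not depend on frequency.
    """
    N = R.shape[0]
    Rinv = np.linalg.inv(R)
    n = np.arange(N)
    P = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        e = np.exp(1j * w * n)                  # complex-exponential (steering) vector
        P[i] = 1.0 / np.real(e.conj() @ Rinv @ e)
    return P

# Hypothetical exact correlation: one complex sinusoid at w0 plus white noise,
# R[k, l] = exp(j*w0*(k - l)) + sigma^2 * delta(k - l).
N, w0 = 8, 2 * np.pi * 0.3
n = np.arange(N)
R = np.exp(1j * w0 * (n[:, None] - n[None, :])) + 0.1 * np.eye(N)
omegas = 2 * np.pi * np.linspace(-0.5, 0.5, 101, endpoint=False)
P = mlm_spectrum(R, omegas)   # peaks near w0
```

The 2-D version has the same structure, with e replaced by a 2-D complex-exponential vector indexed over the region B.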
most often used is autoregressive (AR), since estimating the AR model parameters
is a simple linear problem for many cases of interest.
In AR signal modeling, the random process x is considered to be the response of an AR model excited by white noise w(n1, n2) with variance σ², as shown in Figure 6.7. From x(n1, n2), a(n1, n2) for (n1, n2) ∈ A and σ² are estimated. Since the input random process has a constant power spectrum of amplitude σ², from (6.55) the spectral estimate P̂x(ω1, ω2) is given by

If we choose (l1, l2) such that x*(n1 − l1, n2 − l2) represents a previously computed point relative to x(n1, n2), x*(n1 − l1, n2 − l2) is uncorrelated with w(n1, n2). Since w(n1, n2) is assumed to be white noise with zero mean, for such values of (l1, l2), (6.107) reduces to

Equation (6.108) is a linear set of equations for a(n1, n2). Once a(n1, n2) is determined, σ² can be obtained from (6.107) with l1 = l2 = 0. When l1 = l2 = 0, E[w(n1, n2)x*(n1 − l1, n2 − l2)] in (6.107) becomes σ². Equation (6.108) is based on the recursive computability assumption, which limits the shape of Region A. Computing a(n1, n2) for a nonrecursively computable system is a nonlinear problem. Partly for this reason, (6.107) is sometimes used for any shape of Region A.
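In 1-D, the normal equation reduces to the familiar Yule-Walker equations; a hedged sketch (the sign convention x(n) = Σ a(k)x(n − k) + w(n) and the AR(1) correlation values are assumptions of this illustration):

```python
import numpy as np

def ar_spectrum(r, p, omegas):
    """AR (Yule-Walker) spectral estimate from correlation lags r[0..p].

    Model assumed: x(n) = sum_k a(k) x(n - k) + w(n), Var[w] = sigma2,
    so P(w) = sigma2 / |1 - sum_k a(k) e^{-jwk}|^2.
    """
    # Toeplitz normal equations R a = [r(1), ..., r(p)].
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:p + 1])
    sigma2 = r[0] - a @ r[1:p + 1]      # zero-lag relation for the noise variance
    k = np.arange(1, p + 1)
    P = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        A = 1.0 - np.sum(a * np.exp(-1j * w * k))
        P[i] = sigma2 / abs(A) ** 2
    return P

# Hypothetical AR(1) process x(n) = 0.9 x(n-1) + w(n) with Var[w] = 1:
# exact correlation r(m) = 0.9^|m| / (1 - 0.81).
r = 0.9 ** np.arange(4) / (1 - 0.81)
P = ar_spectrum(r, 1, np.array([0.0, np.pi]))   # lowpass: large at w = 0
```

Because solving for the coefficients is a linear problem, this route is attractive whenever the region A keeps the model recursively computable.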
Figure 6.6 MLM and IMLM spectral estimates based on exact correlation points. Parameters used for correlation generation: M = 2; a1 = 1; a2 = 1; (ω11/2π, ω12/2π) = (0.2, 0.2); (ω21/2π, ω22/2π) = (0.3, 0.3); σ² = 0.25; SNR = 9 dB; correlation size, 5 × 5; CINC = 2 dB. (a) MLM; (b) IMLM.

Equation (6.108) is called the normal equation and can be very easily related to (5.21), which is used in designing an IIR filter, as discussed in Section 5.2.1. However, there are two major differences between the filter design problem and AR modeling for spectral estimation. The filter design problem is a deterministic problem in the sense that the input to the system is a fixed known sequence such
where P̂x1(ω1, ω2) and P̂x2(ω1, ω2) are spectral estimates obtained from two different shapes of the region A. This is illustrated in the following example. Figure 6.11 shows the spectral estimate obtained by using (6.109), with P̂x1(ω1, ω2) and P̂x2(ω1, ω2) being the spectral estimates in Figures 6.10(a) and 6.10(b), respectively. The spectral distortions evident in Figures 6.10(a) and 6.10(b) have been significantly reduced in Figure 6.11.
When spectral distortion is reduced by such methods, spectral estimation based on AR modeling appears to have resolution properties better than the MLM

Figure 6.10 Spectral estimates by AR signal modeling based on exact correlation points. Parameters used for correlation generation: M = 2; a1 = 1; a2 = 1; (ω11/2π, ω12/2π) = (−0.1, 0.22); (ω21/2π, ω22/2π) = (0.1, 0.28); σ² = 1; SNR = 3 dB; correlation size, 5 × 5; CINC = 2 dB. (a) AR model coefficients have the region of support shape shown in Figure 6.9(b); (b) same as (a) with the shape shown in Figure 6.9(c).

Figure 6.11 Spectral estimate obtained by combining the spectral estimates in Figures 6.10(a) and (b), using (6.109).
Figure 6.12 MLM and IMLM spectral estimates. Parameters used are the same as those in Figure 6.10: (a) MLM; (b) IMLM.

6.2.6 The Maximum Entropy Method

Because of its high resolution characteristics, the maximum entropy method (MEM) for spectral estimation has been studied extensively. For 1-D signals with no missing correlation points, the MEM is equivalent to spectral estimation based on
is maximized and

Rx(n1, n2) = F⁻¹[P̂x(ω1, ω2)]  for (n1, n2) ∈ C.   (6.113)

Rewriting P̂x(ω1, ω2) in terms of the given Rx(n1, n2) for (n1, n2) ∈ C and R̂x(n1, n2) for (n1, n2) ∉ C, we have

To maximize H in (6.112) with respect to R̂x(n1, n2) in (6.114),

∂H/∂R̂x(n1, n2) = (1/(2π)²) ∫_{ω1=−π}^{π} ∫_{ω2=−π}^{π} (e^{−jω1n1} e^{−jω2n2} / P̂x(ω1, ω2)) dω1 dω2 = 0  for (n1, n2) ∉ C.   (6.115)

From (6.115), and noting that P̂x(ω1, ω2) = P̂x*(ω1, ω2) from (6.31), we obtain

(1/(2π)²) ∫_{ω1=−π}^{π} ∫_{ω2=−π}^{π} (e^{jω1n1} e^{jω2n2} / P̂x(ω1, ω2)) dω1 dω2 = 0  for (n1, n2) ∉ C.   (6.116)

The expression in (6.116) is the inverse Fourier transform of 1/P̂x(ω1, ω2), and therefore from (6.116),

A(n1, n2) = F⁻¹[1/P̂x(ω1, ω2)] = 0  for (n1, n2) ∉ C.   (6.117)

and

Rx(n1, n2) = F⁻¹[P̂x(ω1, ω2)]  for (n1, n2) ∈ C.   (6.120)

The MEM problem above generally has a unique solution if the given Rx(n1, n2) for (n1, n2) ∈ C is extendable, that is, is a part of some positive definite correlation function (meaning that its Fourier transform is positive for all (ω1, ω2)). In general, it is difficult to determine if the given segment of the correlation is part of some positive definite correlation sequence, although this is often the case in practice. In the following discussion, we will assume that the given Rx(n1, n2) for (n1, n2) ∈ C is extendable.
Even though the MEM problem statement of (6.119) and (6.120) applies, with appropriate dimensionality changes, to signals of any dimensionality, the solutions depend strongly on dimensionality. For 1-D signals with no missing correlation points, the spectral estimate obtained from AR signal modeling is in the form of (6.119) and satisfies the correlation matching property of (6.120), so it is the same as the spectral estimate given by the MEM. This is not the case for 2-D signals. In the 2-D case, the spectral estimate obtained from AR signal modeling is in the form of (6.119), but does not satisfy the correlation matching property of (6.120). As discussed in Section 6.2.4, this is because the number of independent correlation points needed in solving the 2-D normal equation of (6.108) is greater than the number of AR model parameters, and because the spectral estimate is completely determined by the AR model parameters. As a result, the spectral estimate obtained from AR signal modeling does not have enough degrees of freedom to satisfy (6.120). Because of this difficulty, no closed-form solution for the 2-D MEM problem has yet been found.
Many attempts have been made to solve the 2-D MEM problem. In all cases, the resulting algorithms are iterative ones which attempt to improve the spectral estimate in each iteration. Burg (1975) has proposed an iterative solution which requires the inversion of a matrix in each iteration, with the matrix dimension being on the order of the number of the given correlation points. No experimental results using this technique have yet been reported. Wernecke and D'Addario have proposed a scheme in which the entropy is numerically maximized. The

*The entropy expression in (6.112) is valid for a stationary Gaussian random process [Burg (1975)].
Only certain aspects of the spectral estimation techniques discussed in the previous
section have been quantitatively compared [Malik and Lim]. In this section, we
discuss the limited results available in the literature.
The major impetus behind the development of high resolution spectral esti-
mation techniques is the improvement in resolution they offer over conventional
techniques. Experimental studies of two aspects of resolution have been reported.
The first is the resolvability of two sinusoids in the presence of noise. The other
is the accuracy of the frequency estimation of a sinusoid when the sinusoid is well
resolved. In both cases, complex data with exact correlation values [Equation (6.84)] were used to separate the issue of resolution from the issue of correlation estimation from data. The correlation windowing method, the MLM, and the MEM were compared. For correlation windowing, the 2-D separable triangular window was used.

Figure 6.15 Spectral estimates do not depend on absolute peak locations for complex signals. Parameters used for these MEM spectral estimates: M = 2; a1 = 1; a2 = 1; σ² = 2; SNR = 0 dB; correlation size, 5 × 5; CINC = 1.915 dB; (○) 0-dB (maximum) points. (a) (ω11/2π, ω12/2π) = (0.1, −0.23); (ω21/2π, ω22/2π) = (0.325, −0.23); (b) (ω11/2π, ω12/2π) = (−0.4, 0.4); (ω21/2π, ω22/2π) = (−0.175, 0.4).
To compare different spectral estimation algorithms in their ability to resolve two sinusoids, a quantitative measure of resolvability was developed based on empirical observations. It has been observed that for a given size and shape of C (the region for which the exact correlation sequence is assumed to be known) and a given SNR, the spectral estimates for the correlation windowing method, the MLM, and the MEM do not depend on the absolute location of the peaks in the 2-D frequency plane. That is, the shape and size of the estimated spectral peaks remain the same regardless of the complex sinusoids' locations, if the same relative distance and orientation of the peaks are maintained. Figure 6.15 illustrates this phenomenon for MEM spectral estimates. In these cases, the frequency separation between the peaks is held constant, and the orientation of the peaks is kept horizontal. The results clearly show the invariance of the MEM spectral estimates under these conditions. Other examples support this conclusion, and similar results have been observed for the MLM and the correlation windowing method. In addition, in all three methods, a larger separation in the frequencies of two sinusoids has been observed to always produce "more resolved" spectral estimates at a given SNR, size and shape of Region C, and orientation of the peaks. Based on the above, a reasonable quantitative measure of the resolvability would be the minimum frequency separation, denoted by d, above which the two sinusoids are resolved and below which they are not. For this measure d, smaller values imply higher resolution, while larger values imply lower resolution.
Many cases have been studied in an effort to determine the minimum separation distance d required to resolve two peaks in the sense that the spectral estimates will display two distinct peaks. One peak's location is held constant while the second peak's location is varied over a range such that the peaks are not initially resolved, but become resolved as the distance between them is increased. Figure 6.16 shows the results obtained by the MEM as the separation between the peaks is increased. Initially, the two peaks are not resolved, and the spectral estimate consists of a single spectral peak located approximately at the midpoint of the line joining the true peak locations. As the distance between the peaks increases, the spectral estimate shows a distortion or stretching in the direction of the peaks, and eventually the two peaks are resolved. Figure 6.17 summarizes the resolution performance of the three techniques. It is clear that in the 2-D case, as in the 1-D case, the MEM provides higher resolution than the other two methods. Note that the resolution performance of the correlation windowing method is determined only by the size of the correlation available for analysis, and is independent of the SNR as far as the resolution d is concerned. This is because changing the noise level adds a constant independent of frequency to the spectral estimate when exact correlation values are used. The minimum distance for the peaks to be resolved in the MEM and MLM estimates decreases with increasing SNR, with the MEM consistently outperforming the MLM.
The measure adopted for the resolution performance evaluation is fairly arbitrary, and is used only to study the relative performance of the different techniques under the same set of conditions. The minimum resolution distance between two peaks also depends on their orientation in the 2-D frequency plane and
Sec. 6.3 Performance Comparison 385
on the shape of C. Thus, the resolution measure is only an indicator of the relative
performance, and should not be considered meaningful as an absolute measure.
Another set of experiments has been directed toward studying the accuracy
of the peak location resulting when the number of sinusoids present is accurately
estimated. The quantitative measure of the error in the location of the spectral
peak (LOSP) is defined as
Error (LOSP) = (1/M) Σ_{i=1}^{M} [ (ω̂_{i1} − ω_{i1})² + (ω̂_{i2} − ω_{i2})² ]

where the number of sinusoids is M, ω̂_{i1} and ω̂_{i2} represent the estimated frequency location of the ith peak, and ω_{i1} and ω_{i2} represent the true peak location. Table 6.1 shows the LOSP error of some representative one sinusoid and two sinusoid cases. For the one sinusoid case with 3 × 3 correlation values, the correlation windowing method, MLM, and MEM estimates all show LOSP errors very close to zero. For two sinusoids with 5 × 5 correlation values, all methods show some finite LOSP error. Although the MEM estimates exhibit much sharper peaks than do the other two, the table shows that the MEM gives the worst LOSP estimate for the two sinusoid cases. The correlation windowing method and the MLM give LOSP estimates of approximately the same magnitude.

Figure 6.16 Change in MEM spectral estimates as the separation between two peaks is increased. Parameters used for the MEM spectral estimates: M = 2; α₁ = 1; α₂ = 1; σ_w² = 6.32; SNR = −5 dB; correlation size, 3 × 3; (○) 0-dB (maximum) point in the spectral estimate; CINC = 1 dB. (a) (ω₁₁/2π, ω₁₂/2π) = (0.1, 0.1); (ω₂₁/2π, ω₂₂/2π) = (0.3, 0.1). (b) (ω₁₁/2π, ω₁₂/2π) = (0.1, 0.1); (ω₂₁/2π, ω₂₂/2π) = (0.34, 0.1). (c) (ω₁₁/2π, ω₁₂/2π) = (0.1, 0.1); (ω₂₁/2π, ω₂₂/2π) = (0.38, 0.1).
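The LOSP error computation can be sketched directly (Python; the function name and the mean-squared form of the average are our rendering of the definition above, and the peak values are illustrative):

```python
import numpy as np

def losp_error(est_peaks, true_peaks):
    """Mean squared distance between M estimated and M true 2-D peak locations."""
    est = np.asarray(est_peaks, dtype=float)    # shape (M, 2): (w_i1, w_i2) per peak
    true = np.asarray(true_peaks, dtype=float)
    return float(np.mean(np.sum((est - true) ** 2, axis=1)))

# Illustrative two-sinusoid case: second peak misplaced by 0.04 in the first coordinate.
print(losp_error([[0.1, 0.1], [0.34, 0.1]], [[0.1, 0.1], [0.3, 0.1]]))
```

A perfect estimate gives zero error; the metric grows with the squared displacement of each peak.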
When many correlation points are estimated from the data, those correlation points with large lags
would be estimated from a small amount of data and consequently might not be
reliable. Using unreliable correlation points can sometimes do more harm than
good to the resulting spectral estimate. Determining the optimum size and shape
of the correlation points to be estimated depends on many factors, including the
type of data, the specific spectral estimation technique used, and the method by
which the correlation points are estimated from available data. Despite some
studies [Burg et al.] on this problem, few concrete results are available.
In summary, a spectral estimate can be significantly affected by the way in
which the correlation points are estimated from available data. Estimating the
correlation points to optimize the performance of a given spectral estimation method
remains an area in need of further research.
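The sensitivity discussed above arises already at the level of the simplest estimators. Below is a sketch of 2-D sample correlation estimation from a finite data segment (Python; the function name and the biased/unbiased pair are a generic illustration, not the text's equations (6.124) and (6.125)):

```python
import numpy as np

def sample_corr(x, n1, n2, biased=True):
    """Sample correlation estimate R(n1, n2) from a finite 2-D data segment x.

    biased=True divides by the full segment size (smaller variance, biased);
    biased=False divides by the number of overlapping samples (unbiased, but
    noisy at large lags, where only a few products are averaged)."""
    N1, N2 = x.shape
    assert 0 <= n1 < N1 and 0 <= n2 < N2        # nonnegative lags, for brevity
    prods = x[n1:, n2:] * x[: N1 - n1, : N2 - n2]
    norm = N1 * N2 if biased else prods.size
    return prods.sum() / norm

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
# At the largest lag only a single product pair exists, so the estimate is unreliable.
print(sample_corr(x, 7, 7, biased=False))
```

At lag (7, 7) the unbiased estimate is just one product of two samples, which illustrates why correlation points with large lags estimated from little data can do more harm than good.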
Figure 6.18 Spectral estimates are affected by how correlation points are estimated from data. MLM spectral estimates. Parameters used for data generation: M = 2; α₁ = 1; α₂ = 1; (ω₁₁/2π, ω₁₂/2π) = (0.1, 0.1); (ω₂₁/2π, ω₂₂/2π) = (0.4, 0.4); φ₁ = 0, φ₂ = 1; SNR = 3 dB; data size, 6 × 6; correlation size, 5 × 5; CINC = 0.5 dB. (a) Correlation points estimated by using (6.124); (b) correlation points estimated by using (6.125).

Determining which method is best for any given application is an important issue in using spectral estimation techniques. Since the various methods were developed on the basis of different assumptions, and since only limited comparisons of the methods' performance are available, choosing the best method for a given application problem is a difficult task. We offer here some general guidelines for choosing a spectral estimation method.

If enough data are available so that resolution is not a problem, conventional methods should be considered. When higher resolution is required, the MLM, IMLM, or MEM should be considered. The MEM has significantly better resolution characteristics than the MLM and IMLM, but its computational requirements are orders of magnitude greater than those of the MLM and IMLM. The MEM's computational requirements make it impractical for most real time applications.
In this section, we present a typical application of a multidimensional spectral estimation technique. We will consider the problem of determining the direction of low-flying aircraft by processing acoustic signals on a sensor array having a planar spatial distribution. The two directional parameters of interest are the azimuthal (or bearing) angle θ, defined in the interval 0 ≤ θ < 2π, and the elevation angle φ, defined in the interval 0 ≤ φ ≤ π/2. These two parameters are illustrated in Figure 6.20. To illustrate how a spectral estimation technique can be used in determining the directions of aircraft, we first consider the case of a single aircraft as an acoustic source. The acoustic source generates a space-time wavefield s(t, x, y, z), where t represents time and (x, y, z) represent the Cartesian spatial coordinates. If the array is in the far field of the acoustic source, the signal in the region of the array can be approximated as a plane wave. For any plane wave, the wavefield s(t, x, y, z) is constant along the plane perpendicular to the direction of wave propagation, and therefore s(t, x, y, z) can be expressed as

s(t, x, y, z) = s(t − (x cos θ cos φ)/c − (y sin θ cos φ)/c − (z sin φ)/c)   (6.128)

where s(t) is the acoustic signal received at a particular spatial location (for example, x = y = z = 0) and c is the speed of wave propagation. Sound travels in air at a speed of approximately 340 m/sec. For a planar array parallel to the ground surface, we can assume z = 0 without any loss of generality. Equation (6.128) then becomes

s(t, x, y) = s(t − (x cos θ cos φ)/c − (y sin θ cos φ)/c).   (6.129)

Denoting the 3-D analog Fourier transform of s(t, x, y) by S(Ω_t, Ω_x, Ω_y), we have

S(Ω_t, Ω_x, Ω_y) = (2π)² S(Ω_t) δ(Ω_x + Ω_t (cos θ cos φ)/c, Ω_y + Ω_t (sin θ cos φ)/c)   (6.130)

where S(Ω_t) is the 1-D analog Fourier transform of s(t), Ω_t is the temporal frequency, and Ω_x and Ω_y are the two spatial frequencies. Now consider a particular temporal frequency Ω′_t. From (6.130), as shown in Figure 6.21, the spectrum S(Ω′_t, Ω_x, Ω_y) should have a strong spectral peak at the following values of (Ω_x, Ω_y):

Ω_x = −Ω′_t (cos θ cos φ)/c   (6.131)
Ω_y = −Ω′_t (sin θ cos φ)/c   (6.132)

By looking for a spectral peak in the function S(Ω′_t, Ω_x, Ω_y), then, we can determine the two direction parameters θ and φ using (6.131) and (6.132). For SNR considerations, Ω′_t is chosen such that S(Ω′_t, Ω_x, Ω_y) is large.
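Equations (6.131) and (6.132) invert in closed form: θ = atan2(−Ω_y, −Ω_x) and cos φ = (c/Ω′_t)·sqrt(Ω_x² + Ω_y²). A small round-trip sketch (Python; c = 340 m/sec as in the text, while the chosen temporal frequency and direction are illustrative):

```python
import numpy as np

c = 340.0                      # speed of sound in air, m/sec
w_t = 2 * np.pi * 60.0         # chosen temporal frequency Omega'_t, rad/sec (illustrative)

def peak_location(theta, phi):
    """Spatial-frequency peak (6.131)-(6.132) for direction (theta, phi)."""
    wx = -w_t * np.cos(theta) * np.cos(phi) / c
    wy = -w_t * np.sin(theta) * np.cos(phi) / c
    return wx, wy

def direction_from_peak(wx, wy):
    """Invert (6.131)-(6.132): bearing from the peak's quadrant, elevation from its radius."""
    theta = np.arctan2(-wy, -wx) % (2 * np.pi)
    phi = np.arccos(np.clip(c * np.hypot(wx, wy) / w_t, 0.0, 1.0))
    return theta, phi

theta0, phi0 = np.deg2rad(110.0), np.deg2rad(25.0)
theta1, phi1 = direction_from_peak(*peak_location(theta0, phi0))
print(np.rad2deg(theta1), np.rad2deg(phi1))
```

In practice the peak coordinates would come from a spectral estimate rather than from the forward formulas, but the inversion step is the same.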
Figure 6.22 S(Ω′_t, Ω_x, Ω_y) as a function of Ω_x and Ω_y at a particular temporal frequency Ω′_t for two sources.

If there are N sources at different azimuths and elevations, s(t, x, y) can be expressed as

s(t, x, y) = Σ_{i=1}^{N} s_i(t − (x cos θ_i cos φ_i)/c − (y sin θ_i cos φ_i)/c).   (6.133)

At a particular temporal frequency Ω′_t, the spectrum S(Ω′_t, Ω_x, Ω_y) should have N strong spectral peaks. An example of S(Ω′_t, Ω_x, Ω_y) for N = 2 is shown in Figure 6.22. For each of these spectral peaks, the directional parameters can be determined by using (6.131) and (6.132). It is clear from the above discussion that the number of sources and the direction parameters for each source can be determined through spectral analysis of s(t, x, y).

In the above discussion, determining the number of sources and their direction parameters was a straightforward task that involved evaluating the Fourier transform of s(t, x, y). This is because s(t, x, y) was assumed to be available for all (t, x, y). In practice, s(t, x, y) is sampled in both the temporal and spatial dimensions and only a finite number of samples are available. Along the spatial dimensions, the number of array sensors limits the number of spatial samples. Along the temporal dimension, an aircraft can be assumed to be coming from a constant direction only for a limited amount of time, and this limits the number of samples. Sampling s(t, x, y) repeats S(Ω_t, Ω_x, Ω_y) periodically in the frequency domain, and sampling periods should be chosen to avoid aliasing effects. The limitation on the number of samples can be viewed as signal windowing, which smoothes the spectrum in the frequency domain. As a result, the spectral peaks in S(Ω′_t, Ω_x, Ω_y) are no longer impulses but have finite widths. This will clearly make it more difficult to resolve two aircraft coming from similar directions, particularly when the number of samples within the window is small. Another potential problem area is noise. The wavefront s(t, x, y) may be degraded by noise, further reducing resolution. Simply computing the discrete Fourier transform of samples of s(t, x, y)

Sec. 6.5 Application Example 395

Each of the nine sensors was used to record a time series. The wavefront s(t, x, y) was sampled along the temporal dimension with a sampling rate of 2048 Hz, and the total number of data points in each channel was 4096. Since the temporal spectral contents of acoustic sounds from a helicopter typically consist of frequency contents below 200 Hz, this sampling rate was more than sufficient to avoid aliasing. The time series obtained were three-dimensional, with one temporal and two spatial dimensions. Since much more data were available in the temporal than in the spatial dimensions, dimension-dependent processing was used. A Fourier-transform-based conventional technique was first used along the time dimension. Along the two spatial dimensions, the MLM was used, because it is relatively simple computationally, while offering higher resolution than Fourier-transform-based conventional techniques.

Taking the time series at each of the microphones, we first determine the temporal frequency at which there was a strong spectral peak. This is accomplished by looking at a single channel and performing periodogram averaging. Once the temporal frequency is chosen, the data from each of the nine channels is divided into 512-point sections. The nine channels are then correlated spatially at the chosen temporal frequency. This is illustrated in Figure 6.23. Note that the result of the 1-D DFT along the time dimension in each of the channels is complex-valued, and we do not perform the magnitude squared operation. This is because the phase information must be preserved in the intermediate stages of dimension-dependent processing, as discussed in Section 6.2.2.

Figure 6.23 Estimation of spatial correlation at a particular temporal frequency from nine array sensors, with 512 data points in each channel.

This process is repeated for eight consecutive 512-point sections, and the results are averaged to obtain the spatial correlation values:

R̂_s(x, y) = (1/8) Σ_{i=1}^{8} R̂_s^{(i)}(x, y)   (6.134)

where R̂_s^{(i)}(x, y) denotes the estimate obtained from the ith 512-point section. The spatial correlation estimate R̂_s(x, y) is used along the two spatial dimensions by the MLM. This is shown in Figure 6.24. The spectral estimate obtained as a function of θ and φ is shown in Figure 6.25. Conversion from the spatial frequencies to θ and φ is made by using (6.131) and (6.132). The spectral estimate clearly shows one strong spectral peak representing the presence of an acoustic source. The location of the spectral peak is well within the tolerance of the experiment in this example.

We have discussed one particular way to process the data from a sensor array. There are many variations of this method, in addition to methods that are significantly different from this one. A method that is useful for determining the azimuth angle θ alone is discussed in Problem 6.24.

Figure 6.24 Estimation of spatial correlation at a particular temporal frequency from nine array sensors with 4096 data points in each channel. The estimate is obtained by segmenting the data into eight blocks of 512 data points each and by averaging the result of applying the system in Figure 6.23 to each of the eight data blocks.

REFERENCES

For books on probability, random processes, and estimation, see [Papoulis; Van Trees].
Figure 6.25 MLM spectral estimate for real data gathered from nine microphones (3 × 3-size grid with 1-m spacing). The spectral peak shows the estimated direction parameters of a flying helicopter.

For reprint books of journal articles on spectral estimation, see [Childers; Kesler]. For a special journal issue on the topic, see [Haykin and Cadzow]. For books on conventional spectral estimation techniques, see [Blackman and Tukey; Jenkins and Watts]. For articles on conventional methods, see [Welch; Nuttal and Carter]. For a tutorial article on 1-D high resolution spectral estimation techniques, see [Kay and Marple]. For a review article on multidimensional spectral estimation, see [McClellan].

For readings on the MLM for spectral estimation, see [Capon et al.; Capon]. For the IMLM, see [Burg (1972); Lim and Dowla]. For a computationally efficient method to compute the MLM spectral estimate, see [Musicus].

For the relationship between the maximum entropy method and autoregressive signal modeling in 1-D, see [Ulrych and Bishop]. For a method to reduce spectral distortion caused by 2-D autoregressive (AR) modeling, see [Jackson and Chien]. For a method to improve the performance of the AR signal modeling method in the frequency estimation of multiple sinusoids, see [Tufts and Kumaresan]. For spectral estimation based on autoregressive moving average (ARMA) signal modeling, see [Cadzow and Ogino]. For spectral estimation based on a Markov model, see [Chellappa et al.].

For a reading on the original development of the MEM for spectral estimation, see [Burg (1975)]. For a reading on extendability of a given correlation sequence, see [Dickinson]. For the development of the MEM algorithms, see [Woods; Wernecke and D'Addario; Lim and Malik; Lang and McClellan (1982)].

Many methods for spectral estimation have been proposed in addition to the standard spectral estimation methods discussed in this chapter. The data adaptive spectral estimation (DASE) method, which is a generalization of the MLM discussed in Section 6.2.3, is discussed in [Davis and Regier]. Pisarenko's method, which is based on a parametric model of the spectrum consisting of impulses and a noise spectrum with a known shape, is discussed in [Pisarenko; Lang and McClellan (1983)]. This method requires eigenanalysis of the correlation. A description of multiple signal classification (MUSIC), which is related to Pisarenko's method, can be found in [Schmidt].

For readings on performance comparison of different spectral estimation methods, see [Lacoss (1971); Cox; Malik and Lim]. For the effect of correlation estimation on spectral estimation, see [Burg, et al.].

For applications of spectral estimation to sonar and radar signal processing, see [Oppenheim]. For a tutorial article on array processing, which is one of the main applications of 2-D spectral estimation, see [Dudgeon]. For estimation of the bearing angle and elevation angle for array sensors through dimension-dependent spectral estimation, see [Lacoss, et al.]. For methods to estimate the bearing angle through spectral estimation, see [Johnson; Nawab, et al.].

Chap. 6 References 397

R. B. Blackman and J. W. Tukey, The Measurement of Power Spectra. New York: Dover, 1958.
J. P. Burg, Maximum entropy spectral analysis, Ph.D. Thesis. Stanford, CA: Stanford University, 1975.
J. P. Burg, The relationship between maximum entropy spectra and maximum likelihood spectra, Geophysics, Vol. 37, 1972, p. 375.
J. P. Burg, D. G. Luenberger, and D. L. Wenger, Estimation of structured covariance matrices, Proc. IEEE, Vol. 70, September 1982, pp. 963-974.
J. A. Cadzow and K. Ogino, Two-dimensional signal estimation, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-29, June 1981, pp. 396-401.
J. Capon, High-resolution frequency-wavenumber spectrum analysis, Proc. IEEE, Vol. 57, August 1969, pp. 1408-1419.
J. Capon, R. J. Greenfield, and R. J. Kolken, Multi-dimensional maximum likelihood processing of a large aperture seismic array, Proc. IEEE, Vol. 55, February 1967, pp. 192-211.
R. Chellappa, Y. H. Hu, and S. Y. Kung, On two-dimensional Markov spectral estimation, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-31, August 1983, pp. 836-841.
D. G. Childers, ed., Modern Spectral Analysis. New York: IEEE Press, 1978.
H. Cox, Resolving power and sensitivity to mismatch of optimum array processors, J. Acoust. Soc. Am., Vol. 54, 1973, pp. 771-785.
R. E. Davis and L. A. Regier, Methods for estimating directional wave spectra from multi-element arrays, J. Marine Res., Vol. 35, 1977, pp. 453-477.
B. W. Dickinson, Two-dimensional Markov spectrum estimates need not exist, IEEE Trans. Inf. Theory, Vol. IT-26, January 1980, pp. 120-121.
F. U. Dowla, Bearing estimation of wideband signals by multidimensional spectral estimation, Ph.D. Thesis. Cambridge, MA: M.I.T., 1984.
D. E. Dudgeon, Fundamentals of digital array processing, Proc. IEEE, Vol. 65, June 1977, pp. 898-904.
S. Haykin and J. A. Cadzow, eds., Special issue on Spectral Estimation, Proc. IEEE, September 1982.
L. B. Jackson and H. C. Chien, Frequency and bearing estimation by two-dimensional linear prediction, Proc. 1979 Int. Conf. Acoust. Speech Sig. Proc., April 1979, pp. 665-668.
G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications. San Francisco: Holden-Day, 1968.
D. H. Johnson, The application of spectral estimation methods to bearing estimation problems, Proc. IEEE, Vol. 70, September 1982, pp. 1018-1028.
S. M. Kay and S. L. Marple, Jr., Spectrum analysis: a modern perspective, Proc. IEEE, Vol. 69, November 1981, pp. 1380-1419.
S. B. Kesler, ed., Modern Spectrum Analysis, II. New York: IEEE Press, 1986.
R. T. Lacoss, Data-adaptive spectral analysis methods, Geophysics, Vol. 36, August 1971, pp. 661-675.
R. T. Lacoss, et al., Distributed sensor networks, Semiannual Tech. Rep., M.I.T. Lincoln Lab, May 1980.
S. W. Lang and J. H. McClellan, Multi-dimensional MEM spectral estimation, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-30, December 1982, pp. 880-887.
S. W. Lang and J. H. McClellan, Spectral estimation for sensor arrays, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-31, April 1983, pp. 349-358.
J. S. Lim and F. U. Dowla, A new algorithm for high-resolution two-dimensional spectral estimation, Proc. IEEE, Vol. 71, February 1983, pp. 284-285.
J. S. Lim and N. A. Malik, A new algorithm for two-dimensional maximum entropy power spectrum estimation, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-29, June 1981, pp. 401-413.
N. A. Malik and J. S. Lim, Properties of two-dimensional maximum entropy power spectrum estimates, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-30, October 1982, pp. 788-798.
J. H. McClellan, Multi-dimensional spectral estimation, Proc. IEEE, Vol. 70, September 1982, pp. 1029-1039.
B. R. Musicus, Fast MLM power spectrum estimation from uniformly spaced correlations, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-33, October 1985, pp. 1333-1335.
S. H. Nawab, F. U. Dowla, and R. T. Lacoss, Direction determination of wideband signals, IEEE Trans. Acoust. Speech, Sig. Proc., Vol. ASSP-33, October 1985, pp. 1114-1122.
S. H. Nuttal and G. C. Carter, Spectral estimation using combined time and lag weighting, Proc. IEEE, Vol. 70, September 1982, pp. 1115-1125.
A. V. Oppenheim, ed., Applications of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1978, pp. 331-428.
A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1965.
V. F. Pisarenko, On the estimation of spectra by means of nonlinear functions of the covariance matrix, Geophys. J. Roy. Astron. Soc., Vol. 28, 1972, pp. 511-531.
R. O. Schmidt, A signal subspace approach to emitter location and spectral estimation, Ph.D. Thesis. Stanford, CA: Stanford University, August 1981.
D. W. Tufts and R. Kumaresan, Estimation of frequencies of multiple sinusoids: making linear prediction perform like maximum likelihood, Proc. IEEE, Vol. 70, September 1982, pp. 975-989.
T. J. Ulrych and T. N. Bishop, Maximum entropy spectral analysis and autoregressive decomposition, Rev. Geophys. Space Phys., Vol. 13, February 1975, pp. 183-200.
PROBLEMS
6.1. Let x and y be uncorrelated real random variables. Define a new random variable w by w = x + y.
(a) Show that E[w] = E[x] + E[y].
(b) Show that Var [w] = Var [x] + Var [y], where Var [w] represents the variance of w and Var [x] and Var [y] are similarly defined. You may use the result of (a).

6.2. Let x denote a real random variable whose probability density function p_x(x0) is given by

Figure P6.3

(a) Determine E[w].
(b) Determine Var [w].

Figure P6.4

(a) Are x and y statistically independent?
(b) Are x and y linearly independent?
(c) Are the results in (a) and (b) consistent with the fact that statistical independence implies linear independence, but linear independence does not imply statistical independence?

Figure P6.5

(a) Determine p_{x|y}(x0|y0).
(b) Determine p_y(y0).
(c) From the results of (a) and (b), show that

6.6. Let x denote an N × 1 column vector that consists of N real random variables. The N random variables are said to be jointly Gaussian if their joint probability density function p_x(x0) is given by

Chap. 6 Problems 401

6.11. Let f(n1, n2) denote a deterministic signal. We have two observations of f(n1, n2), r1(n1, n2) and r2(n1, n2), given by

r1(n1, n2) = f(n1, n2) + w1(n1, n2)

Figure P6.12
Suppose the response of the system to a real, stationary, zero-mean, and white noise process with variance of 1 is denoted by x(n1, n2), as shown below.

Figure P6.10 (zero-mean white noise w(n1, n2) as the input to the system)

(a) Determine P_x(ω1, ω2), the power spectrum of x(n1, n2).
(b) Determine R_x(n1, n2), the correlation sequence of x(n1, n2).
(c) Determine E[x(n1, n2)].
(d) Determine E[x²(n1, n2)].
For what values of σ² is R_x(n1, n2) a valid correlation function? Assume a is real.
6.13. Consider a linear shift-invariant system whose impulse response is real and is given by h(n1, n2). Suppose the responses of the system to the two inputs x(n1, n2) and v(n1, n2) are y(n1, n2) and z(n1, n2), as shown in the figure below. The inputs x(n1, n2) and v(n1, n2) in the figure represent real stationary zero-mean random processes with autocorrelation functions R_x(n1, n2) and R_v(n1, n2), power spectra P_x(ω1, ω2) and P_v(ω1, ω2), cross-correlation function R_xv(n1, n2), and cross-power spectrum P_xv(ω1, ω2).
(a) Given R_x(n1, n2), R_v(n1, n2), R_xv(n1, n2), P_x(ω1, ω2), P_v(ω1, ω2), and P_xv(ω1, ω2), determine P_yz(ω1, ω2), the cross-power spectrum of y(n1, n2) and z(n1, n2).
(b) Is the cross-power spectrum P_yz(ω1, ω2) always nonnegative? That is, is P_yz(ω1, ω2) ≥ 0 for all (ω1, ω2)? Justify your answer.

6.14. Let s(n1, n2) and w(n1, n2) denote two stationary zero-mean random processes. The power spectrum P_s(ω1, ω2) is specified over a passband region and is 0 otherwise, and P_w(ω1, ω2) = 1 for all (ω1, ω2). The observation process is given by r(n1, n2) = s(n1, n2) + w(n1, n2). Determine the frequency response of the noncausal Wiener filter that can be used to estimate s(n1, n2) from r(n1, n2). Assume that s(n1, n2) and w(n1, n2) are uncorrelated with each other.

6.15. Let r denote an observation variable that can be modeled by r = s + w, where s is the variable we wish to estimate and w is a Gaussian random variable with mean of 0 and variance of σ_w².
(a) Assuming that s is nonrandom, determine ŝ_ML, the maximum likelihood estimate of s.
(b) Suppose s is a random variable which has a Gaussian probability density function with mean of m_s and variance of σ_s². Determine ŝ_MAP, the maximum a posteriori estimate of s. Discuss how ŝ_MAP is affected by r, m_s, σ_s², and σ_w². Does your answer make sense?
(c) Suppose s is again assumed to be a random variable which has a Gaussian probability density function with mean of m_s and variance of σ_s². Determine ŝ_MMSE, the minimum mean square error estimate of s. You do not have to evaluate your expression explicitly. What is ŝ_MMSE when r = m_s?

6.16. Signals such as sinusoids buried in noise are not samples of stationary random processes. Consider x(n1, n2) given by

x(n1, n2) = Σ_{i=1}^{M} a_i e^{j(ω_{i1} n1 + ω_{i2} n2 + φ_i)} + w(n1, n2)

where the sinusoidal components are deterministic and w(n1, n2) is zero-mean white noise with variance of σ_w². Clearly, E[x(k1, k2) x*(k1 − n1, k2 − n2)] depends not only on n1 and n2 but also on k1 and k2, so x(n1, n2) is not a stationary random process. One way to eliminate the dependence on k1 and k2 is to define θ_x(n1, n2) by

θ_x(n1, n2) = lim_{N→∞} 1/(2N + 1)² Σ_{k1=−N}^{N} Σ_{k2=−N}^{N} x(k1, k2) x*(k1 − n1, k2 − n2)

and assume that θ_x(n1, n2) is the correlation function. Show that θ_x(n1, n2) for the above x(n1, n2) is given by

θ_x(n1, n2) = Σ_{i=1}^{M} a_i² e^{j(ω_{i1} n1 + ω_{i2} n2)} + σ_w² δ(n1, n2).

6.17. Let x(n1, n2) be a sample of a 2-D stationary random process. The sequence x(n1, n2) is given for (n1, n2) ∈ D. In spectral estimation based on a periodogram, x(n1, n2) is assumed to be zero for (n1, n2) ∉ D, and the spectral estimate P̂_x(ω1, ω2) is obtained by

P̂_x(ω1, ω2) = (1/N) | Σ_{(n1, n2) ∈ D} x(n1, n2) e^{−j(ω1 n1 + ω2 n2)} |²

where N represents the number of points in the region D. Show that the periodogram P̂_x(ω1, ω2) is the Fourier transform of R̂_x(n1, n2) given by

R̂_x(n1, n2) = (1/N) Σ_{k1} Σ_{k2} x(k1, k2) x(k1 − n1, k2 − n2)

with x(n1, n2) taken as zero outside D.

6.18. Let x(n1, n2) be a sample of a 2-D stationary random process. The sequence x(n1, n2) is given for 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1. In spectral estimation based on a periodogram, x(n1, n2) is assumed to be zero outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1 and the spectral estimate P̂_x(ω1, ω2) is obtained by

P̂_x(ω1, ω2) = (1/(N1 N2)) | Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x(n1, n2) e^{−j(ω1 n1 + ω2 n2)} |².

The periodogram P̂_x(ω1, ω2) is often computed by using FFT algorithms. Suppose we compute the periodogram by computing an M1 × M2-point DFT. If M1 > N1 or M2 > N2, we can pad x(n1, n2) with enough zeros before we compute its DFT. Does the choice of M1 and M2 affect the resolution characteristic of the periodogram?

6.19. In this problem, we show that a periodogram can be viewed as filtering the data with a bank of bandpass filters that are identical except for their passband center frequencies. Let x(n1, n2) be a sample of a 2-D stationary random process. The sequence x(n1, n2) is assumed to be given for 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1. In computing the periodogram, x(n1, n2) is assumed to be zero outside 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1. This is equivalent to windowing x(n1, n2) with a rectangular window w(n1, n2) given by

w(n1, n2) = 1 for 0 ≤ n1 ≤ N1 − 1, 0 ≤ n2 ≤ N2 − 1, and 0 otherwise.

Let x_w(n1, n2) denote the windowed signal so that x_w(n1, n2) = x(n1, n2) w(n1, n2). The periodogram P̂_x(ω1, ω2) is given by

P̂_x(ω1, ω2) = (1/(N1 N2)) |X_w(ω1, ω2)|²

where X_w(ω1, ω2) is the Fourier transform of x_w(n1, n2).
(a) Show that P̂_x(ω1, ω2) can be expressed in terms of the output of an LSI filter whose impulse response is h(n1, n2) = w(−n1, −n2).
(b) From the result in (a), show that the periodogram P̂_x(ω1, ω2) can be viewed as the result of filtering the data with a bank of bandpass filters that are identical except for the passband center frequencies. Sketch the shape of one bandpass filter.
The result in (b) shows that the characteristics of the bandpass filter do not change for different frequencies. This contrasts sharply with the maximum likelihood method, in which a different bandpass filter is designed at each frequency.

6.20. Let x(n1, n2) denote a stationary zero-mean random process. The correlation function R_x(n1, n2) is given only for −1 ≤ n1 ≤ 1, −1 ≤ n2 ≤ 1, as shown in the following figure.

Figure P6.20

Determine P̂_x(ω1, ω2), the MLM spectral estimate of x(n1, n2).

6.21. Let x(n1, n2) denote a stationary random process whose correlation function R_x(n1, n2) is assumed to be known only for −N ≤ n1 ≤ N, −N ≤ n2 ≤ N. To estimate the power spectrum, we model x(n1, n2) as the response of an auto-regressive model excited by white noise with variance of σ², as shown in the following figure.

Figure P6.21a

(b) Suppose the region A is chosen as shown in Figure P6.21b. Determine and sketch the region of support of the correlation points that are used in estimating the model parameters a(n1, n2) with the smallest possible value of N.

Figure P6.21b

(c) Suppose we conjecture that the resolution of the spectral estimate is better along

6.22. The correlation function R_x(n1, n2) of a stationary random process x(n1, n2) is assumed known only for −1 ≤ n1 ≤ 1 and −1 ≤ n2 ≤ 1. Determine one possible maximum entropy spectral estimate. You may choose any nonzero R_x(n1, n2) within the above constraints on R_x(n1, n2).
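Problem 6.20 asks for the MLM spectral estimate built from correlation values on a small lag support. The MLM formula P_MLM(ω1, ω2) = 1/(eᴴ R⁻¹ e), with e the steering vector over the support points, can be sketched numerically (Python; the correlation model, the 2 × 2 support, and all numbers below are our own illustration, not the values in the problem's figure):

```python
import numpy as np

# Hypothetical known correlation: one real sinusoid in white noise.
W1, W2 = 2 * np.pi * 0.2, 2 * np.pi * 0.1      # illustrative sinusoid frequency

def corr(n1, n2):
    r = np.cos(W1 * n1 + W2 * n2)              # sinusoid contribution
    if (n1, n2) == (0, 0):
        r += 0.5                               # white-noise variance at zero lag
    return r

pts = [(i, j) for i in range(2) for j in range(2)]   # 2 x 2 support uses lags |n| <= 1
R = np.array([[corr(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts])

def mlm(w1, w2):
    """MLM estimate P(w1, w2) = 1 / (e^H R^{-1} e) with steering vector e."""
    e = np.array([np.exp(1j * (w1 * p[0] + w2 * p[1])) for p in pts])
    return 1.0 / np.real(e.conj() @ np.linalg.solve(R, e))

# The estimate peaks near the sinusoid frequency and is small far from it.
print(mlm(W1, W2), mlm(W1 + np.pi, W2 + np.pi))
```

For a pure white-noise correlation matrix R = σ²I with K support points, the same formula reduces to the flat estimate σ²/K.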
Figure P6.24

6.23. Consider a 3-D analog signal s(t, x, y) given by

s(t, x, y) = s(t − (x cos θ cos φ)/c − (y sin θ cos φ)/c).

Show that the 3-D analog Fourier transform S(Ω_t, Ω_x, Ω_y) of s(t, x, y) is given by

S(Ω_t, Ω_x, Ω_y) = (2π)² S(Ω_t) δ(Ω_x + Ω_t (cos θ cos φ)/c, Ω_y + Ω_t (sin θ cos φ)/c)

where S(Ω_t) is the 1-D analog Fourier transform of s(t), Ω_t is the temporal frequency, and (Ω_x, Ω_y) are two spatial frequencies corresponding to the spatial variables (x, y).

6.24. In this problem, we discuss a method developed for estimating the bearing angle from the data for a planar array. For a planar array, a space-time wavefield s(t, x, y) based on the plane wave assumption can be expressed as

s(t, x, y) = s(t − (x cos θ cos φ)/c − (y sin θ cos φ)/c)

where s(t) is the wavefield received at x = y = 0, θ is the bearing angle, and φ is the elevation angle, as shown in Figure 6.20. The Fourier transform of s(t, x, y) is given by

S(Ω_t, Ω_x, Ω_y) = (2π)² S(Ω_t) δ(Ω_x + Ω_t (cos θ cos φ)/c, Ω_y + Ω_t (sin θ cos φ)/c)

where S(Ω_t) is the 1-D analog Fourier transform of s(t), Ω_t is the temporal frequency, and (Ω_x, Ω_y) are the two spatial frequencies. Suppose we estimate P_s(Ω_t, Ω_x, Ω_y), the power spectrum of s(t, x, y), by

P_s(Ω_t, Ω_x, Ω_y) = k |S(Ω_t, Ω_x, Ω_y)|²

where k is a normalization constant. As sketched in Figure P6.24a, P_s(Ω_t, Ω_x, Ω_y)|_{Ω_t = Ω′_t} has a strong spectral peak at

Ω_x = −Ω′_t (cos θ cos φ)/c
Ω_y = −Ω′_t (sin θ cos φ)/c.

(a) Suppose we consider a different temporal frequency Ω″_t. Sketch P_s(Ω″_t, Ω_x, Ω_y) and determine the (Ω_x, Ω_y) at which the spectral peak occurs.
(b) Suppose |S(Ω_t)|² is given by Figure P6.24b. Sketch P_ZD(Ω_x, Ω_y) = ∫_{Ω_t = −∞}^{∞} P_s(Ω_t, Ω_x, Ω_y) dΩ_t. The subscript ZD in P_ZD(Ω_x, Ω_y) refers to "zero delay," for reasons to be discussed shortly.
(c) Develop a method of estimating the bearing angle θ from P_ZD(Ω_x, Ω_y).
(d) Explain why the elevation angle φ cannot be estimated from P_ZD(Ω_x, Ω_y).
(e) Let R_s(t, x, y) denote the correlation function of s(t, x, y), defined by R_s(t, x, y) = F⁻¹[P_s(Ω_t, Ω_x, Ω_y)]. We define R_ZD(x, y) by R_ZD(x, y) = R_s(t, x, y)|_{t=0}. What is the relationship between R_ZD(x, y) and P_ZD(Ω_x, Ω_y)? Since R_ZD(x, y) does not require correlation along the temporal dimension, it is said to be a zero delay correlation function.
(f) Since R_ZD(x, y) does not require correlation along the temporal dimension, estimating R_ZD(x, y) or P_ZD(Ω_x, Ω_y) is considerably simpler computationally than estimating R_s(t, x, y) or P_s(Ω_t, Ω_x, Ω_y). Discuss the circumstances under which it is better to estimate the bearing angle from P_ZD(Ω_x, Ω_y) than from P_s(Ω′_t, Ω_x, Ω_y) at some particular temporal frequency Ω′_t.
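The geometry behind Problem 6.24, parts (c) and (d), can be sketched numerically: for every temporal frequency, the spectral peak of a plane wave lies on the same ray through the origin of the (Ω_x, Ω_y) plane, whose direction depends on θ only, while φ only scales the radial extent (Python; c = 340 m/sec as in the text, other values illustrative):

```python
import numpy as np

c = 340.0    # speed of sound in air, m/sec

def zd_peak_ray(theta, phi, omegas):
    """Peak locations (6.131)-(6.132) traced over many temporal frequencies.

    Under the zero-delay integration of Problem 6.24, all of these points are
    accumulated into P_ZD, so they trace one ray through the origin."""
    wx = -omegas * np.cos(theta) * np.cos(phi) / c
    wy = -omegas * np.sin(theta) * np.cos(phi) / c
    return wx, wy

def bearing_from_ray(wx, wy):
    # The ray's direction gives the bearing; its length (which carries phi) is discarded.
    return float((np.arctan2(-wy, -wx) % (2 * np.pi)).mean())

omegas = 2 * np.pi * np.linspace(20.0, 200.0, 10)
theta = np.deg2rad(70.0)
b1 = bearing_from_ray(*zd_peak_ray(theta, np.deg2rad(10.0), omegas))
b2 = bearing_from_ray(*zd_peak_ray(theta, np.deg2rad(50.0), omegas))
print(np.rad2deg(b1), np.rad2deg(b2))
```

The two recovered bearings agree even though the elevations differ, which is exactly why θ can be estimated from P_ZD while φ cannot.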
7
Image Processing Basics

7.0 INTRODUCTION

In the previous six chapters, we studied the fundamentals of 2-D digital signal processing. In the next four chapters, we will study digital image processing, an important application of 2-D digital signal processing theories.

Digital image processing has many practical applications. One of the earliest applications was processing images from the Ranger 7 mission at the Jet Propulsion Laboratory in the early 1960s. The imaging system mounted on the spacecraft had a number of constraints imposed on it, such as size and weight, and the images received had such degradations as blurring, geometric distortions, and background noise. These images were successfully processed by digital computers, and since then images from space missions have been routinely processed by digital computers. The striking pictures of the moon and the planet Mars we see in magazines have all been processed by digital computers.

The image processing application which probably has had the greatest impact on our lives is in the field of medicine. Computed tomography, which has its basis in the projection-slice theorem discussed in Section 1.4.3, is used routinely in many clinical situations, for example, in detecting and identifying brain tumors. Other medical applications of digital image processing include enhancement of x-ray images and identification of blood vessel boundaries from angiograms.

Another application, much closer to home for the average person, is the improvement of television images.* The image that we view on a television monitor has flickering, limited resolution, a ghost image, background noise, and motion crawling due to line interlace. Digital televisions are not far from realization, and digital image processing will have a major impact on improving the image quality of existing television systems and on developing such new television systems as high-definition television.

*It may be argued that improving the quality of television programs is a much more urgent problem. We point out, though, that improving image quality probably will not make mediocre programs any worse.

One major problem of such video communications as video conferencing and video telephone has been the enormous bandwidth required. A straightforward coding of broadcast-quality video requires on the order of 100 million bits per second. By sacrificing quality and using digital image coding schemes, systems that transmit intelligible images at bit rates lower than 100 thousand bits per second have become commercially available.

Robots are expected to play an increasingly important role in industries and homes. They will perform jobs that are very tedious or dangerous and jobs that require speed and accuracy beyond human ability. As robots become more sophisticated, computer vision will play an increasingly important role. Robots will be asked not only to detect and identify industrial parts, but also to "understand" what they "see" and take appropriate actions. Digital image processing will have a major impact on computer vision.

In addition to these well-established application areas of digital image processing, there are a number of less obvious ones. Law enforcement agents often take pictures in uncooperative environments, and the resulting images are often degraded. For example, snapshots of moving cars' license plates are often blurred; reducing the blurring is essential in identifying the car. Another potential application is the study of whale migration. When people study the migratory behavior of lions, tigers, and other land animals, they capture the animals and tag them on a convenient tail or ear. When the animals are recaptured at another location, the tags serve as evidence of migratory behavior. Whales, however, are quite difficult to capture and tag. Fortunately, whales like to show their tails, which have features that can be used to distinguish them. To identify a whale, a snapshot of its tail, taken from shipboard, is compared with a reference collection of photographs of thousands of different whales' tails. Successive sightings and identifications of an individual whale allow its migration to be tracked. Comparing photographs, though, is extremely tedious, and digital image processing may prove useful in automating the task.

The potential applications of digital image processing are limitless. In addition to the applications discussed above, they include home electronics, astronomy, biology, physics, agriculture, geography, defense, anthropology, and many other fields. Vision and hearing are the two most important means by which humans perceive the outside world, so it is not surprising that digital image processing has potential applications not only in science and engineering, but also in any human endeavor.

Digital image processing can be classified broadly into four areas, depending on the nature of the task. These are image enhancement, restoration, coding, and understanding. In image enhancement, images either are processed for human viewers, as in television, or are preprocessed to aid machine performance, as in object identification by machine. Image enhancement is discussed in Chapter 8. In image restoration, an image has been degraded in some manner, such as blurring, and the objective is to reduce or eliminate the effect of the degradation. Image restoration is closely related to image enhancement. When an image is degraded, reducing the image degradation often results in enhancement. There are, however, some important differences between restoration and enhancement. In image restoration, an ideal image has been degraded and the objective is to make the processed image resemble the original as much as possible. In image enhancement, the objective is to make the processed image look better in some sense than the unprocessed image. To illustrate this difference, note that an original, undegraded image cannot be further restored, but can be enhanced by increasing sharpness. Image restoration is discussed in Chapter 9. In image coding, one objective is to represent an image with as few bits as possible, preserving a level of image quality and intelligibility acceptable for a given application such as video conferencing. Image coding is related to image enhancement and restoration. If we can enhance the visual appearance of the reconstructed image, or if we can reduce degradation from such sources as the quantization noise of an image coding algorithm, then we can reduce the number of bits required to represent an image at a given level of image quality and intelligibility. Image coding is discussed in Chapter 10.

In image understanding, the objective is to symbolically represent the contents of an image. Applications of image understanding include computer vision, robotics, and target identification. Image understanding differs from the other three areas in one major respect. In image enhancement, restoration, and coding, both the input and the output are images, and signal processing has been the backbone of many successful systems in these areas. In image understanding, the input is an image, but the output is typically some symbolic representation of the contents of the input image. Successful development of systems in this area involves both signal processing and artificial intelligence concepts. In a typical image understanding system, signal processing is used for such lower-level processing tasks as reduction of degradation and extraction of edges or other image features, and artificial intelligence is used for such higher-level processing tasks as symbol manipulation and knowledge base management. We treat some of the lower-level processing techniques useful in image understanding as part of our general discussion of image enhancement, restoration, and coding. A more complete treatment of image understanding is beyond the scope of this book.

The theoretical results we studied in the first six chapters are generally based on a set of assumptions. In practice, these assumptions rarely are satisfied exactly. Some results may not be useful in image processing; others may have to be modified. We need to know the basics of image processing if we are to understand the theories' applicability and limitations and to modify them when necessary to adapt to real-world problems. Moreover, the first six chapters have focused on general theories that apply not only to image processing, but also to other 2-D signal processing problems such as geophysical data processing. Some important theoretical results specific to image processing have not yet been discussed. Some basic knowledge of image processing is needed to understand these theories. In this chapter, we present the basics of image processing. These basics will lay a foundation for later chapters' discussion of image enhancement, restoration, and coding. In Section 7.1, we discuss basics of the images we process. In Sections 7.2 and 7.3, we discuss the basics of the human visual system. In Section 7.4, we discuss the basics of a typical image processing environment.

7.1 LIGHT

7.1.1 Light as an Electromagnetic Wave

Everything that we view is seen with light. There are two types of light sources. One type, called a primary light source, emits its own light. Examples of primary light sources include the sun, lamps, and candles. The other type, called a secondary light source, only reflects or diffuses the light emitted by another source. Examples of secondary light sources include the moon, clouds, and apples.

Light is part of a vast, continuous spectrum of electromagnetic radiation. An electromagnetic wave carries energy, and the energy distribution of the wave passing through a spatial plane can be represented by c(x, y, t, λ), where x and y are two spatial variables, t is the time variable, and λ is the wavelength. The function c(x, y, t, λ) is called radiant flux per (area × wavelength) or irradiance per wavelength. The wavelength λ is related to the frequency f by

    λ = c/f    (7.1)

where c is the speed* of an electromagnetic wave, approximately 3 × 10^8 m/sec in vacuum and air. Although the function c(x, y, t, λ) can be expressed in terms of the frequency f, it is more convenient to use the wavelength λ. The unit associated with c(x, y, t, λ) is energy per (area × time × wavelength) and is joules/(m^3·sec) in the MKS (meter, kilogram, second) system. If we integrate c(x, y, t, λ) with respect to λ, we obtain irradiance, which has the unit of joules/(m^2·sec) or watts/m^2. Radiation from the sun that passes through a spatial plane perpendicular to the rays has 1350 watts/m^2 of irradiance in the absence of atmospheric absorption. If we integrate c(x, y, t, λ) with respect to all four variables x, y, t, and λ, we obtain the total energy (in joules) of the electromagnetic wave that passes through the spatial plane.

*The variable c is used both as the speed and as the energy distribution function of an electromagnetic wave. Which is meant will be apparent from the context.

Light is distinguished from other electromagnetic waves, for instance radio transmission waves, by the fact that the eye is sensitive to it. Suppose we consider a fixed spatial point (x′, y′) and a fixed time t′. The function c(x, y, t, λ) can then be viewed as a function of λ only. We can express it as c(x′, y′, t′, λ), or c(λ) for convenience. An example of c(λ) for the radiation from the sun is shown in Figure 7.1. The eye is sensitive to electromagnetic waves over an extremely narrow range of λ, approximately from 350 nm to 750 nm (1 nm = 10^-9 meter). Figure 7.2 shows different types of electromagnetic waves as a function of the wavelength λ. Electromagnetic radiation with large λ, from a few centimeters to several thousand meters, can be generated by electrical circuits. Such radiation is used
for radio transmission and radar. Radiation with λ just above the visible range is called infrared; radiation with λ just below the visible range is called ultraviolet. Both infrared and ultraviolet radiation are emitted by typical light sources such as the sun. Radiation with λ far below the visible range includes X rays, γ rays, and cosmic rays; for cosmic rays, λ is less than 10^-5 nm, or 10^-14 m.

Figure 7.1 Spectral contents of the sun's radiation, above the earth's atmosphere (solid line) and on the ground at noon in Washington (dotted line). After [Hardy].

Figure 7.2 Different types of electromagnetic waves as a function of the wavelength λ.

7.1.2 Brightness, Hue, and Saturation

Human perception of light with c(λ) is generally described in terms of brightness, hue, and saturation. Brightness refers to how bright the light is. Hue refers to the color, such as red, orange, or purple. Saturation, sometimes called chroma, refers to how vivid or dull the color is. Brightness, hue, and saturation are perceptual terms, and they depend on a number of factors, including the detailed shape of c(λ), the past history of the observer's exposure to visual stimuli, and the specific environment in which the light is viewed. Nevertheless, it is possible to relate them very approximately to specific features of c(λ).

To relate the human perception of brightness to c(λ), it is useful to define photometric quantities. The quantities associated with c(λ), such as radiant flux, irradiance, and watts/m^2, are called radiometric units. These physical quantities can be defined independently of a specific observer. The contributions that c(λ_1) and c(λ_2) make to human perception of brightness are in general quite different for λ_1 ≠ λ_2, even though c(λ_1) may be the same as c(λ_2). For example, an electromagnetic wave with c(λ) is invisible to a human observer as long as c(λ) is zero in the visible range of λ, no matter how large c(λ) may be outside the visible range. Even within the visible range, the brightness depends on λ. For this reason, a simple integral of c(λ) over the variable λ does not relate well to the perception of brightness. The quantities which take the human observer's characteristics into account, and thus relate to brightness better than the integral of c(λ), are called photometric quantities.

The basic photometric quantity is luminance, adopted in 1948 by the C.I.E. (Commission Internationale de l'Eclairage), an international body concerned with standards for light and color. Consider a light with c(λ) that is zero everywhere except at λ = λ_r, where λ_r denotes a fixed reference wavelength. A light that consists of only one spectral component (one wavelength) is called a monochromatic light. Suppose we ask a human observer to compare the brightness of a monochromatic light with c(λ_r) with that of another monochromatic light with c′(λ_t), where λ_t is a test wavelength. Suppose further that the observer says that c(λ_r) matches c′(λ_t) in brightness. The equal-brightness points c(λ_r) and c′(λ_t) can be obtained by such experiments as showing two patches of light with a fixed c(λ_r) and a variable c′(λ_t) and asking the observer to decrease or increase the amplitude of c′(λ_t) until they match in brightness. The ratio c(λ_r)/c′(λ_t), where c(λ_r) and c′(λ_t) match in brightness, is called the relative luminous efficiency of a monochromatic light with λ_t relative to λ_r, and is approximately independent of the amplitude of c(λ_r) under normal viewing conditions. The wavelength λ_r used is 555 nm (yellow-green light), at which a typical observer has maximum brightness sensitivity. For this choice of λ_r, the relative luminous efficiency c(λ_r)/c′(λ_t) is always less than or equal to 1, since c(λ_r) is not greater than c′(λ_t); that is, it takes less energy at λ_r to produce the same brightness. The relative luminous efficiency as a function of λ is called the relative luminous efficiency function and is denoted by v(λ). Two monochromatic lights with c_1(λ_1) and c_2(λ_2) appear equally bright to an observer when*

    c_1(λ_1)v(λ_1) = c_2(λ_2)v(λ_2).    (7.2)

*Our discussions in this section are brief, and some reasonable assumptions are made. For example, (7.2) is based on the transitivity law, which states that if A and B are equally bright, and B and C are equally bright, then A and C are equally bright. This transitivity law has been approximately verified experimentally.

The relative luminous efficiency function v(λ) depends on the observer. Even with a single observer, a slightly different v(λ) is obtained when measured at different times. To eliminate this variation, the C.I.E. standard observer was defined in 1929, based on experimental results obtained from a number of different observers. The resulting function v(λ) is called the C.I.E. relative luminous efficiency function and is shown in Figure 7.3. The C.I.E. function is roughly bell-shaped with a maximum of 1 at λ = 555 nm.

Figure 7.3 C.I.E. relative luminous efficiency function. After [Hardy].

The basic unit of luminance is the lumen (lm). The luminance per area l of a light with c(λ) can be defined by

    l = k ∫ c(λ)v(λ) dλ.    (7.3)

In (7.3), the quantity l is in units of lumens/m^2, k is 685 lumens/watt, c(λ) is in units of watts/m^3, v(λ) is unitless, and λ has a unit of m. A monochromatic light with an irradiance of 1 watt/m^2 produces 685 lumens/m^2 when v(λ) = 1. This occurs when λ = 555 nm. At other wavelengths, v(λ) < 1, so the irradiance of the monochromatic light must be greater than 1 watt/m^2 to generate a luminance per area of 685 lumens/m^2. Many other photometric units, such as the footcandle (lumen/ft^2) and the phot (lumen/cm^2), can be defined in terms of the lumen.

It is important to note that luminance, or luminance per area, does not measure human perception of brightness. For example, a light with 2 lumens/m^2 does not appear twice as bright to a human observer as a light with 1 lumen/m^2. It is also possible to create an environment in which a light with a smaller luminance per area looks brighter than a light with a larger luminance per area. However, luminance per area is related more directly than an integral of c(λ) to human perception of brightness. Furthermore, in typical viewing conditions (light neither too weak nor excessive), a light with larger luminance per area is perceived to be brighter than a light with smaller luminance per area.

Hue is defined as that attribute of color which allows us to distinguish red from blue. In some cases, the hue of a color can be related to simple features of c(λ). Light with approximately constant c(λ) in the visible range appears white or colorless. Under normal viewing conditions, a monochromatic light appears colored, and its color depends on λ. When an observer is shown a succession of monochromatic lights side by side, the color changes smoothly from one hue to another. Light can be split into a succession of monochromatic lights by a prism, as shown in Figure 7.4. This was first done in 1666 by Newton. Newton divided the color spectrum in the visible range into seven broad categories: red, orange, yellow, green, blue, indigo, and violet, in order of longer to shorter λ. These are known as the seven colors of the rainbow. Newton originally began with only red, yellow, green, blue, and violet. He later added orange and indigo to bring the number to seven (in keeping with the tradition of dividing the week into seven days, the musical notes into seven, and so on).

Figure 7.4 White light split into a succession of monochromatic lights by a prism.

When a light is not monochromatic but its c(λ) is narrowband, in the sense that most of its energy is concentrated in λ′ - Δλ < λ < λ′ + Δλ for small Δλ, the perceived hue roughly corresponds to that of a monochromatic light with λ = λ′. The color will appear less pure, however, than a monochromatic light of a similar hue. When c(λ) is some arbitrary function, it is difficult to relate the hue to simple features of c(λ). By proper choice of c(λ), it is possible to produce hues that do not correspond to any monochromatic light. By mixing red and blue lights, for example, it is possible to produce purple light.
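The luminance definition (7.3) can be sketched numerically. The Gaussian used below is only a rough bell-shaped stand-in for the C.I.E. relative luminous efficiency curve of Figure 7.3 (maximum 1 at 555 nm), not the tabulated standard, and the spectral density is taken per nm rather than per m for convenience:

```python
import math

K = 685.0  # lumens per watt, the constant k in Eq. (7.3)

def v_approx(lam_nm, peak=555.0, width=50.0):
    """Rough bell-shaped stand-in for the C.I.E. relative luminous
    efficiency function (assumption; not the tabulated standard)."""
    return math.exp(-((lam_nm - peak) / width) ** 2)

def luminance_per_area(c_spec, lam_lo=350.0, lam_hi=750.0, n=4000):
    """Trapezoidal approximation of l = k * integral c(lam) v(lam) dlam.
    c_spec is spectral irradiance in watts/m^2 per nm; result in lumens/m^2."""
    h = (lam_hi - lam_lo) / n
    f = lambda lam: c_spec(lam) * v_approx(lam)
    total = 0.5 * (f(lam_lo) + f(lam_hi))
    total += sum(f(lam_lo + i * h) for i in range(1, n))
    return K * total * h
```

Consistent with the text, a narrow band of power near 555 nm yields far more lumens than the same power near 700 nm, where v(λ) is small.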
Saturation refers to the purity or vividness of a color. A monochromatic light has very pure spectral content and looks very vivid and pure; it is said to be highly saturated. As the spectral content of c(λ) widens, the color is perceived as less vivid and pure, and the color is said to be less saturated. Color saturation is related very approximately to the effective width of c(λ).

7.1.3 Additive and Subtractive Color Systems

When two lights with c_1(λ) and c_2(λ) are combined, the resulting light has c(λ) given by

    c(λ) = c_1(λ) + c_2(λ).    (7.4)

Since the lights add in (7.4), this is called an additive color system. By adding light sources with different wavelengths, many different colors can be generated. For example, the lighted screen of a color television tube is covered with small, glowing phosphor dots arranged in groups of three. Each of these groups contains one red, one green, and one blue dot. These three colors are used because by proper combination they can produce a wider range of colors than any other combination of three colors; they are the primary colors of the additive color system. The colors of monochromatic lights change gradually with wavelength, and it is difficult to pinpoint the specific wavelengths corresponding to red (R), green (G), and blue (B). The C.I.E. has chosen λ = 700 nm for red, λ = 546.1 nm for green, and λ = 435.8 nm for blue.

The three primary colors of the additive color system are shown in Figure 7.5. In the additive color system, a mixture of equal amounts of blue and green produces cyan. A mixture of equal amounts of red and blue produces magenta, and a mixture of equal amounts of red and green produces yellow. The three colors yellow (Y), cyan (C), and magenta (M) are called the secondary colors of the additive color system. When roughly equal amounts of all three colors R, G, and B are combined, the result is white. When roughly equal amounts of the R, G, and B components are used in a color TV monitor, therefore, the result is a black-and-white image. By combining different amounts of the R, G, and B components, other colors can be obtained. A mixture of a red light and a weak green light with no blue light, for example, produces a brown light.

Nature often generates color by filtering out, or subtracting, some wavelengths and reflecting others. This process of wavelength subtraction is accomplished by molecules called pigments, which absorb particular parts of the spectrum. For example, when sunlight, which consists of many different wavelengths, hits a red apple, the billions of pigment molecules on the surface of the apple absorb all the wavelengths except those corresponding to red. As a result, the reflected light has a c(λ) which is perceived as red. The pigments subtract out certain wavelengths, and a mixture of two different types of pigments will result in a reflected light whose wavelengths are further reduced. This is called a subtractive color system. When two inks of different colors are combined to produce another color on paper, the subtractive color system applies.

The three primary colors of the subtractive color system are yellow (Y), cyan (C), and magenta (M), which are the secondary colors of the additive color system. The three colors are shown in Figure 7.6. By mixing the proper amounts of these colors (pigments), a wide range of colors can be generated. A mixture of yellow and cyan produces green. A mixture of yellow and magenta produces red. A mixture of cyan and magenta produces blue. Thus the three colors red, green, and blue, the primary colors of the additive color system, are the secondary colors of the subtractive color system. When all three primary colors Y, C, and M are combined, the result is black; the pigments absorb all the visible wavelengths.

It is important to note that the subtractive color system is fundamentally different from the additive color system. In the additive color system, as we add colors (lights) with different wavelengths, the resulting light consists of more wavelengths. We begin with black, corresponding to no light. As we then go from
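The additive and subtractive mixing rules of this section can be sketched with colors as (R, G, B) triples in [0, 1]. Modeling subtractive mixing as pigments multiplying the fraction of each band they reflect is an idealization assumed here, not an equation from the text:

```python
# Additive mixing sums the lights (clipped to full intensity); subtractive
# mixing is modeled, as an assumption, by per-band reflectance products.

def mix_additive(c1, c2):
    return tuple(min(1.0, a + b) for a, b in zip(c1, c2))

def mix_subtractive(c1, c2):
    return tuple(a * b for a, b in zip(c1, c2))

RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
YELLOW = mix_additive(RED, GREEN)     # a secondary color of the additive system
CYAN = mix_additive(GREEN, BLUE)
# Subtractively, yellow and cyan pigments together pass only green:
print(mix_subtractive(YELLOW, CYAN))  # (0.0, 1.0, 0.0)
```

This reproduces the text's statement that a mixture of yellow and cyan pigments produces green.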
where s_BW(λ) is the spectral characteristic of the sensor used and k is some scaling constant. Since brightness perception is the primary concern with a black-and-white image, s_BW(λ) is typically chosen to resemble the relative luminous efficiency function discussed in Section 7.1.2. The value I is often referred to as the luminance, intensity, or gray level of a black-and-white image. Since I in (7.5) represents power per unit area, it is always nonnegative and finite; that is,

    0 ≤ I < ∞.    (7.6)

The Y component is called the luminance component, since it roughly reflects the luminance l in (7.3). It is primarily responsible for the perception of the brightness of a color image, and can be used as a black-and-white image. The I and Q components are called chrominance components, and they are primarily responsible for the perception of the hue and saturation of a color image. The f_Y(x, y), f_I(x, y), and f_Q(x, y) components corresponding to the color image in Figure 7.8 are shown as three monochrome images in Figures 7.9(a), (b), and (c), respectively. Since f_I(x, y) and f_Q(x, y) can be negative, a bias has been added to them for display. The mid-gray intensity in Figures 7.9(b) and (c) represents the zero amplitude of f_I(x, y) and f_Q(x, y). One advantage of the YIQ tristimulus set relative to the RGB set is that we can process the Y component only; the processed image will then tend to differ from the unprocessed image only in its appearance of brightness. Another advantage is that most high-frequency components of a color image are primarily in the Y component. Therefore, significant spatial lowpass filtering of the I and Q components does not significantly affect the color image. This feature can be exploited in the coding of a digital color image or in the analog transmission of a color television signal.

When the objective of image processing goes beyond accurately reproducing the "original" scene as seen by human viewers, we are not limited to the range of wavelengths visible to humans. Detecting objects that generate heat, for example, is much easier with an image obtained using a sensor that responds to infrared light than with a regular color image. Infrared images can be obtained in a manner similar to (7.7) by simply changing the spectral characteristics of the sensor used.
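The RGB-to-YIQ decomposition described above can be sketched as follows. The coefficients are the commonly quoted NTSC values; the book's own equation is not reproduced in this excerpt, so treat them as an assumption:

```python
# Sketch of the NTSC luminance-chrominance conversion (coefficients are
# the commonly quoted NTSC values, an assumption here).

def rgb_to_yiq(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance component
    i = 0.596 * r - 0.274 * g - 0.322 * b   # chrominance component
    q = 0.211 * r - 0.523 * g + 0.312 * b   # chrominance component
    return y, i, q

# A gray input (R = G = B) carries no chrominance: I and Q vanish,
# consistent with Y alone serving as a black-and-white image.
print(rgb_to_yiq(1.0, 1.0, 1.0))
```

Note that the rows for I and Q each sum to zero, which is exactly why equal R, G, and B produce a purely luminance signal.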
...TV monitor (in the NTSC* color system), the corresponding luminance-chrominance values Y, I, and Q are related to R, G, and B by a fixed linear transformation.

*NTSC stands for National Television Systems Committee.

Figure 7.9 Y, I, and Q components of the color image in Figure 7.8(d). (a) Y component; (b) I component; (c) Q component.

7.2 THE HUMAN VISUAL SYSTEM

The human visual system is one of the most complex in existence. Our visual system allows us to organize and understand the many complex elements of our environment. For nearly all animals, vision is just an instrument of survival. For humans, vision is not only an aid to survival, but also an instrument of thought and a means to a richer life.

The visual system consists of the eye, which transforms light to neural signals, and the related parts of the brain that process the neural signals and extract necessary information. The eye, the beginning of the visual system, is approximately spherical, with a diameter of around 2 cm. From a functional point of view, the eye is a device that gathers light and focuses it on its rear surface.

A horizontal cross section of an eye is shown in Figure 7.10. At the very front of the eye, facing the outside world, is the cornea, a tough, transparent membrane. The main function of the cornea is to refract (bend) light. Because
of its rounded shape, it acts like the convex lens of a camera. It accounts for nearly two-thirds of the total amount of light bending needed for proper focusing.

Behind the cornea is the aqueous humor, a clear, freely flowing liquid. Through the cornea and the aqueous humor, we can see the iris. By changing the size of the pupil, a small round hole in the center of the iris, the iris controls the amount of light entering the eye. Pupil diameter ranges between 1.5 mm and 8 mm, with smaller diameters corresponding to exposure to brighter light. The color of the iris determines the color of the eye. When we say that a person has blue eyes, we mean blue irises. Iris color, which has caught the attention of so many lovers and poets, is not functionally significant to the eye.

Behind the iris is the lens. The lens consists of many transparent fibers encased in a transparent elastic membrane about the size and shape of a small bean. The lens grows throughout a person's lifetime. Thus, the lens of an eighty-year-old man is more than fifty percent larger than that of a twenty-year-old. As with an onion, cells in the oldest layer remain in the center, and cells in newer layers grow further from the center. The lens has a bi-convex shape and a refractive index of 1.4, which is higher than that of any other part of the eye through which light passes. However, the lens is surrounded by media that have refractive indices close to its own. For this reason, much less light bending takes place at the lens than at the cornea. The cornea has a refractive index of 1.38, but faces the air, which has a refractive index of 1. The main function of the lens is to accurately focus the incoming light on a screen at the back of the eye called the retina. For a system with a fixed lens and a fixed distance between the lens and the screen, it is possible to focus objects at only one particular distance. If faraway objects are in sharp focus, for example, close objects will be focused behind the screen. To be able to focus close objects at one time and distant objects at another, a camera changes the distance between the fixed lens and the screen. This is what the eyes of many fish do. In the case of the human eye, the shape of the lens, rather than the distance between the lens and the screen, is changed. This process of changing shape to meet the needs of both near and far vision is called accommodation. This adjustability of shape is the most important feature of the lens. Accommodation takes place almost instantly and is controlled by the ciliary body, a group of muscles surrounding the lens.

Figure 7.10 Horizontal cross section of a right human eye, seen from above.

Behind the lens is the vitreous humor, a transparent jelly-like substance. It is optically matched so that light which has been sharply focused by the lens keeps the same course. The vitreous humor fills the entire space between the lens and the retina and occupies about two-thirds of the eye's volume. One of its functions is to support the shape of the eye.

Behind the vitreous humor is the retina, which covers about 65% of the inside of the eyeball. This is the screen on which the entering light is focused and on which light-receptive cells convert light to neural signals. All of the other eye parts we have discussed so far serve the function of placing a sharp image on this receptor surface. The fact that an image is formed on the retina, so that the eye is simply an image-catching device, was not known until the early seventeenth century. Even though the ancient Greeks knew the structure of the eye accurately and performed delicate surgery on it, they theorized that light-like rays emanate from the eye, touch an object, and make it visible. After all, things appear "out there." In 1625, Scheiner demonstrated that light enters the eye and that vision stems from the light that enters the eye. By exposing the retina of an animal and looking through it from behind, he was able to see miniature reproductions of the objects in front of the eyeball.

There are two types of light-receptive cells in the retina. They are called cones and rods because of their shapes. The cones, which number about 7 million, are less sensitive to light than rods and are primarily for day (photopic) vision. They are also responsible for seeing color. The three types of cones are most sensitive to red, green, and blue light, respectively. This is the qualitative physiological basis for representing a color image with red, green, and blue monochrome images. The rods, which number about 120 million, are more sensitive to light than cones and are primarily for night (scotopic) vision. Since the cones responsible for color vision do not respond to dim light, we do not see color in very dim light.

Figure 7.11 Distribution of rods (dotted line) and cones (solid line) on the retina. After [Pirenne].

Rods and cones are distributed throughout the retina. However, their distribution is highly uneven. The distribution of the rods and cones in the retina is shown in Figure 7.11. Directly behind the middle point of the pupil is a small
depressed dimple on the retina called the fovea. There are no rods in this small
region, and most of the cones are concentrated here. Therefore, this is the region
for the most accurate vision in bright light. When we look straight ahead at an
object, the object is focused on the fovea. Since the fovea is very small, we
constantly move our attention from one region to another when studying a larger
region in detail. The rods, which function best in night vision, are concentrated
away from the fovea. Since there are no rods in the fovea, an object focused in
the fovea is not visible in dim light. To see objects at night, therefore, we look
at them slightly sideways.
There are many thin layers in the retina. Even though cones and rods are
light-receptive cells, so that it would be reasonable for them to be located closer
to the vitreous humor, they are located farther away from it. Therefore, light has
to pass through other layers of the retina, such as nerve fibers, to reach the cones
and rods. This is shown in Figure 7.12. It is not clear why nature chose to do it
this way, but the arrangement works. In the fovea, at least, the nerves are pushed aside so that the cones are directly exposed to light. Due to this particular arrangement, the optic nerve fibers have to pass through the light-receptive cell layers on the way to the brain.

Figure 7.13 Path that neural signals travel from the retina to the visual cortex (optic nerves, optic chiasm, optic tracts, lateral geniculate bodies, optic radiation).

Instead of crossing the light-receptive cell layers throughout the retina, they bundle up at one small region the size of a pinhead in the
retina, known as the blind spot. Since there are no light receptive cells in this
region, we cannot see light focused on the blind spot.
When light hits cones and rods, a complex electrochemical reaction takes
place, and light is converted to neural impulses, which are transmitted to the brain
through the optic nerve fibers. There are about 130 million light-receptive cells
(cones and rods), but only about 1 million nerve fibers. This means that one nerve
fiber, on the average, serves more than 100 light-receptive cells. The nerve fibers
are not shared equally. Some cones in the fovea are served by one nerve fiber
each, increasing the visual acuity in this region. The rods, however, always share
nerve fibers. This is one reason why visual acuity at night is not as good as it is
during the day even though there are many more rods than cones.
After the optic nerve bundles leave the two eyes, the two bundles meet at
an intersection called the optic chiasm. This is shown in Figure 7.13. Each of
the two bundles is divided into two branches. Two branches, one from each of
the two bundles, join together to form a new bundle. The remaining two branches
form another bundle. This crossing of the nerve fibers from two eyes is partly
responsible for our stereoscopic vision, which mixes the images from each eye to
allow the visual field to be perceived as a 3-D space. These two new bundles go
to the left and right lateral geniculate bodies, respectively. The original fibers end
here and new fibers continue to the visual cortex, where the neural signals are
processed and vision takes place. The visual cortex is a small part of the cortex, a mass of gray matter forming two hemispheres in the back of the brain. Little is known about how the visual neural signals are processed in the visual cortex.

Figure 7.12 Layers in the retina. Note that light has to travel through several layers before it reaches the light-sensitive cells.
7.2.2 Model for Peripheral Level of Visual System

The human visual system discussed in Section 7.2.1 can be viewed as a cascade of two systems, as shown in Figure 7.14. The first system, which represents the peripheral level of the visual system, converts light to a neural signal. The second system, which represents the central level of the visual system, processes the neural signal to extract information.

Figure 7.14 [block diagram: Light → Peripheral level → Neural signal → Central level]

Unlike central level processing, about which little is known, peripheral level processing is fairly well understood, and many attempts have been made to model it. One very simple model [Stockham] for a monochrome image that is consistent with some well-known visual phenomena is shown in Figure 7.15. In this model, the monochrome image intensity I(x, y) is modified by a nonlinearity, such as a logarithmic operation, that compresses the high-level intensities but expands the low-level intensities. The result is then filtered by a linear shift-invariant (LSI) system with spatial frequency response H(Ωx, Ωy). The nonlinearity is motivated by the results of some psychophysical experiments that will be discussed in the next section. The LSI system H(Ωx, Ωy), which is bandpass in character, is motivated by the finite size of the pupil, the resolution limit imposed by a finite number of light-receptive cells, and the lateral inhibition process. The finite size of the pupil and the resolution limit due to a finite number of light-receptive cells are responsible for the lowpass part of the bandpass nature of H(Ωx, Ωy). The lateral inhibition process stems from the observation that one neural fiber responds to many cones and rods. The response of the neural fiber is some combination of the signals from the cones and rods. While some cones and rods contribute positively, others contribute negatively (inhibition). This lateral inhibition process is the rationale for the highpass part of the bandpass character of H(Ωx, Ωy). Even though the model in Figure 7.15 is very simple and applies only to the peripheral level of the human visual system, it is consistent with some of the visual phenomena that are discussed in the next section.

Figure 7.15 Simple model of the peripheral level of the human visual system: Light → Nonlinearity → H(Ωx, Ωy) → Neural signal.

One way to exploit a model such as the one in Figure 7.15 is to process an image in a domain closer to where vision takes place. This can be useful in some applications. In image coding, for example, the information that is in the image but is discarded by the visual system does not need to be coded. By processing an image in a domain closer to where vision takes place, more emphasis can be placed on what is important to the visual system. This is one reason why some image processing operations are performed in the log intensity domain rather than the intensity domain.

7.3 VISUAL PHENOMENA

7.3.1 Intensity Sensitivity

One way to quantify our ability to resolve two visual stimuli which are the same except for their intensities or luminances is by measuring the just-noticeable difference (j.n.d.). The j.n.d. can be defined and measured in a variety of ways. One way is through a psychophysical experiment called intensity discrimination. Suppose we present the visual stimulus in Figure 7.16 to an observer. The inside region is a monochrome image of uniform intensity I_in, which is randomly chosen to be either I or I + ΔI. The outside region is a monochrome image of intensity I_out, which is chosen to be I + ΔI when I_in = I and I when I_in = I + ΔI. The observer is asked to make a forced choice as to which of the two intensities I_in and I_out is brighter. When ΔI is very large, the observer will give a correct answer most of the time, correct in the sense that the region with I + ΔI is chosen. When ΔI is very small, the observer will give a correct answer about half of the time. As we move away from a very large ΔI to a very small ΔI, the percentage of the observer's correct responses decreases continuously, and we can define the ΔI at which the observer gives correct responses 75% of the time as the j.n.d. at the intensity I.

Figure 7.16 Two stimuli used in an intensity discrimination experiment. Each trial consists of showing one of the two stimuli to an observer and asking the observer to make a forced choice of which between I_in and I_out appears brighter. The stimulus used in a trial is chosen randomly from the two stimuli. Results of this experiment can be used to measure the just-noticeable difference ΔI as a function of I.
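The peripheral-level model just described can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the book's specification: it uses log(1 + I) as the compressive nonlinearity and a difference-of-Gaussians kernel as a stand-in for the bandpass H(Ωx, Ωy); the kernel's narrow center supplies the lowpass behavior, and its subtractive surround mimics lateral inhibition, the highpass behavior.

```python
import numpy as np

def dog_kernel(sigma_c=1.0, sigma_s=3.0, radius=8):
    """Difference-of-Gaussians kernel: a narrow excitatory center minus a
    wide inhibitory surround.  The net kernel sums to zero, so the filter
    is bandpass, qualitatively like H(Ωx, Ωy) in Figure 7.15."""
    x = np.arange(-radius, radius + 1)
    r2 = x[:, None] ** 2 + x[None, :] ** 2
    center = np.exp(-r2 / (2.0 * sigma_c ** 2))
    surround = np.exp(-r2 / (2.0 * sigma_s ** 2))
    return center / center.sum() - surround / surround.sum()

def peripheral_response(intensity, kernel):
    """Log nonlinearity followed by LSI filtering (direct zero-padded
    'same' convolution; slow but dependency-free)."""
    f = np.log1p(intensity)            # compresses high, expands low intensities
    r = kernel.shape[0] // 2
    flipped = kernel[::-1, ::-1]       # convolution rather than correlation
    padded = np.pad(f, r)
    out = np.zeros_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.sum(padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * flipped)
    return out
```

Because the kernel integrates to zero, a uniform field produces essentially no response, while an edge produces a strong one, consistent with the bandpass character and the insensitivity to absolute intensity described above.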
Figure 7.17 Plot of ΔI/I as a function of I (the horizontal axis is log I). The incremental intensity ΔI is the just-noticeable difference. Over a wide range of I, ΔI/I is approximately constant. This relationship is called Weber's law.
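A small numerical consequence of Weber's law: if ΔI/I is a constant k, each just-noticeable step multiplies the intensity by (1 + k), so the number of distinguishable steps between two intensities depends only on their ratio. The Weber fraction below is an assumed illustrative value (k = 0.01, i.e., ΔI = 1 at I = 100), not a figure taken from the text.

```python
import math

def jnd_steps(i_low, i_high, weber_fraction=0.01):
    """Number of just-noticeable steps between two intensities when
    Weber's law holds exactly: each step takes I to I*(1 + k), so the
    count is the log-ratio of the intensities in base (1 + k)."""
    return math.log(i_high / i_low) / math.log(1.0 + weber_fraction)

print(round(jnd_steps(10, 1000)))   # about 463 steps between I = 10 and I = 1000
```

The count grows with the logarithm of the intensity ratio, which is why Figure 7.17 is naturally plotted against log I: equal distances on a log-intensity axis contain equal numbers of just-noticeable steps.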
7.3.2 Adaptation
Figure 7.19 Two stimuli used in studying the effect of adaptation on intensity sensitivity. Each trial consists of showing one of the two stimuli to an observer and asking the observer to make a forced choice of which between I_R or I_L appears brighter. The stimulus used in a trial is chosen randomly from the two stimuli. Results of this experiment can be used to measure the just-noticeable difference ΔI as a function of I and I_o.

Figure 10.59 Another example of color image coding using the scene adaptive coder. Courtesy of Wen-Hsiung Chen. (a) Original image of 512 × 512 pixels; (b) reconstructed color image at 0.4 bit/pixel.

Figure 7.20 Plot of ΔI/I as a function of I and I_o. When I_o equals I, ΔI/I is the same as that in Figure 7.17 (dotted line in this figure). When I_o is different from I, ΔI/I increases relative to the case I_o = I. This means that the observer's sensitivity to intensity decreases.
Figure 7.20. When I_o is equal to I, the result is the same as that in Figure 7.17. When I_o is different from I, however, the j.n.d. ΔI increases relative to the case I_o = I, indicating that the observer's sensitivity to intensity decreases. The result
shows that sensitivity to intensity is highest near the level that the observer is
adapted to. This is another way in which the visual system responds to a wide
range of intensities at different times.
Consider an image whose intensity is constant along the vertical dimension but
increases in a staircase manner along the horizontal dimension, as shown in Figure
7.21(a). The intensities along the horizontal direction are sketched in
Figure 7.21(b). Even though the intensity within each rectangular region is con-
stant, each region looks brighter towards the left and darker towards the right.
This is known as the Mach band effect. This phenomenon is consistent with the
presence of spatial filtering in the peripheral-level model of the visual system in
Figure 7.15. When a filter is applied to a signal with sharp discontinuities, an overshoot and undershoot occur. This is partly responsible for uneven brightness perception within the region of uniform intensity. This suggests that precise preservation of the edge shape is not necessary in image processing.

Figure 7.21 Illustration of Mach band effect.
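The overshoot/undershoot mechanism can be reproduced with a one-dimensional numerical sketch. The three-tap center-surround kernel below is an assumed stand-in for the visual system's spatial filter (the text does not specify one); it is normalized to pass uniform regions unchanged, so the over- and undershoot appear only at the steps.

```python
def filter_staircase(signal, kernel=(-0.25, 1.5, -0.25)):
    """Apply a symmetric center-surround kernel to a 1-D luminance
    profile.  The kernel sums to 1, so uniform regions pass unchanged;
    its negative flanks produce over- and undershoot at each step."""
    a, b, c = kernel
    padded = [signal[0]] + list(signal) + [signal[-1]]   # replicate the edges
    return [a * padded[i] + b * padded[i + 1] + c * padded[i + 2]
            for i in range(len(signal))]

staircase = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
out = filter_staircase(staircase)
# Around the 1 -> 2 step, out[3] dips to 0.75 and out[4] rises to 2.25,
# while samples away from the steps keep their original values.
```

Each constant region thus comes out darker on the side facing its brighter neighbor and brighter on the side facing its darker neighbor, mirroring the Mach band percept in Figure 7.21.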
The presence in the visual system of a spatial bandpass filter can be seen by looking at the image in Figure 7.22. The image I(x, y) in Figure 7.22 is given by

I(x, y) = I_0(y) cos (ω(x)x) + constant    (7.11)

where the constant is chosen such that I(x, y) is positive for all (x, y). As we move in the horizontal direction from left to right, the spatial frequency ω(x) increases. As we move in the vertical direction from top to bottom, the amplitude I_0(y) increases. If the spatial frequency response were constant across the frequency range, sensitivity to intensity would be constant along the horizontal direction. In Figure 7.22, we are more sensitive to the contrast in mid-frequency regions than in low- and high-frequency regions, indicating the bandpass character of the visual system. A spatial filter frequency response H(Ωx, Ωy), which is more accurately measured by assuming the model in Figure 7.15 is correct, is shown in Figure 7.23. The horizontal axis is the spatial frequency/angle of vision. The perceived spatial frequency of an image changes as a function of the distance between the eye and the image. As the distance increases, the perceived spatial frequency increases. To take this effect into account, the spatial frequency/angle of vision (spatial frequency relative to the spatial domain in the retina) is often used in determining H(Ωx, Ωy). The frequency response H(Ωx, Ωy) is maximum at a spatial frequency in the range of approximately 5-10 cycles/degree and decreases as the spatial frequency increases or decreases from 5-10 cycles/degree.

Figure 7.22 Modulated sinewave grating that illustrates the bandpass character of the peripheral level of the human visual system.

Figure 7.23 Frequency response H(Ωx, Ωy) in the model in Figure 7.15. The horizontal axis is spatial frequency in cycles/degree (ticks at 0.18, 0.80, 1.8, 6.0, 18, 60). After [Davidson].

7.3.4 Spatial Masking

When random noise of a uniform level is added to an image, it is much more visible in a uniform background region than in a region with high contrast. This effect is much more pronounced than the effect of the brightness level on noise visibility discussed in Section 7.3.1. Consider the image in Figure 7.18, which illustrates the effect of overall brightness level on noise visibility. In the figure, the noise is much less visible in the edge regions than in the uniform background regions. In addition, the noise in the dark edge regions is less visible than the noise in the bright uniform background regions. One way to explain this is to consider the local signal-to-noise ratio (SNR). If we consider the local SNR to be the ratio of the signal variance to the noise variance in a local region, at the same level of noise, the SNR is higher in the high-contrast region than in the uniform background. Another related view is spatial masking. In a high-contrast region, the signal level is high and tends to mask the noise more.

The spatial masking effect can be exploited in image processing. For example, attempting to reduce background noise by spatial filtering typically involves some level of image blurring. In high-contrast regions, where the effect of blurring due to spatial filtering is more likely to be pronounced, the noise is not as visible, so little spatial filtering may be needed.

It is well known that a sharper image generally looks even more pleasant to a human viewer than an original image. This is often exploited in improving the appearance of an image for a human viewer. It is also a common experience that an unnatural aspect catches a viewer's attention. A positive aspect of this phenomenon is that it can be exploited in such applications as the production of television commercials. A negative aspect is that it sometimes makes it more difficult to develop a successful algorithm using computer processing techniques. For example, some image processing algorithms are capable of reducing a large amount of background noise. In the process, however, they introduce noise that has an artificial tone to it. Even when the amount of the artificial noise introduced is much less than the amount by which the background noise is reduced, the artificial noise may catch a viewer's attention more, and a viewer may prefer the unprocessed image over the processed image.

The visual phenomena discussed in the previous sections can be explained simply, at least at a qualitative level; however, many other visual phenomena cannot be explained simply, in part due to our lack of knowledge. For example, a visual phenomenon that involves a fair amount of central level processing cannot be explained in a satisfactory manner. Figure 7.24 shows a sketch consisting of just a small number of lines. How we can associate this image with Einstein is not clear. The example does demonstrate, however, that simple outlines representing the gross features of an object are important for its identification. This can be exploited in such applications as object identification in computer vision and the development of a very low bit-rate video telephone system for the deaf.

The visual phenomena discussed above relate to the perception of light that shines continuously. When light shines intermittently, our perception depends a great deal on its frequency. Consider a light that flashes on for a brief duration N times per second. When N is small, the light flashes are perceived to be separate.
As we increase N, an unsteady flicker that is quite unpleasant to the human viewer occurs. As we increase N further, the flicker becomes less noticeable, and eventually the observer can no longer detect that the light intensity is changing as a function of time. The frequency at which the observer begins perceiving light flashes as continuous light is called the critical flicker frequency or fusion frequency. The fusion frequency increases as the size and overall intensity of the flickering object increase. The fusion frequency can be as low as a few cycles/sec for a very dim, small light and may exceed 100 cycles/sec for a very bright, larger light. When a flicker is perceived, visual acuity is at its worst.

Figure 7.24 Sketch of Einstein's face with a small number of lines.

Intermittent light is common in everyday vision. Fluorescent lights do not shine continuously, as they appear to, but flicker at a sufficiently high rate (over 100 times/sec) that fusion is reached in typical viewing conditions. Avoiding the perception of flicker is an important consideration in deciding the rate at which a CRT (cathode ray tube) display monitor is refreshed. As is discussed further in Section 7.4, CRT display monitors are illuminated only for a short period of time. For an image to be displayed continuously without the perception of flicker, the monitor has to be refreshed at a sufficiently high rate. Typically, a CRT display monitor is refreshed 60 times per second. With 2:1 interlace, which is discussed further in Section 7.4, this corresponds to 30 frames/sec. The current NTSC (National Television Systems Committee)* television system employs 30 frames/sec with 2:1 interlace. For motion pictures, 24 frames per second are shown, with each frame shown twice. The effective flicker rate is therefore 48 frames/sec. In addition, the typical viewing condition in a cinema is very dark, decreasing the fusion frequency to below 40 cycles/sec. For this reason, flickering is not visible in a motion picture, even though the screen is dark for approximately half of the time.

*The NTSC is a group formed in the United States in 1940 to formulate standards for monochrome television broadcast service. It was reconvened in 1950 to study color television systems and recommended a standard system for the United States. The NTSC system is used in North America and Japan. Both the SECAM (Sequential Couleur a Memoire) system used in France, the USSR, and Eastern Europe, and the PAL (Phase Alternating Line) system used in South America, Africa, and Western Europe except France employ 25 frames/sec with 2:1 interlace.

Even though each frame of a motion picture or television broadcast is actually still, and only a finite number of frames are shown in a second, the objects in the scene appear to be moving in a continuous manner. This effect, known as motion rendition, is closely related to the phi phenomenon [Kinchla et al.]. Consider two pulsating light sources separated by approximately 1 degree of an observer's viewing angle. When the lights shine for one msec each with a separation of 10 msec, the light is perceived to move continuously from one source to the other. When the time difference between the two lights is on the order of 1 msec, they appear to flash simultaneously. When the time difference is more than 1 second, they are perceived as two separate flashes. This is known as the phi phenomenon.

In general, frame rates that are sufficiently high to avoid flicker are adequate for motion rendition. The fact that an object appears to move in a continuous manner in a motion picture or television broadcast does not necessarily imply that the sampling rate along the temporal dimension is above the Nyquist rate. For objects with sufficiently rapid motion, sampling the temporal dimension 24 times/sec or 30 times/sec is far lower than the Nyquist rate, and temporal aliasing occurs. Temporal aliasing does not always cause motion discontinuity. In a movie, we sometimes see a wheel that moves continuously, but backwards. In this case, motion rendition is present, but significant temporal aliasing has occurred. Our current knowledge of the flicker effect, motion rendition, the temporal aliasing effect, and their interrelations is far from complete. A comprehensive understanding of this topic would be useful in a number of applications, such as bit rate reduction by frame elimination in a sequence of image frames.

7.4 IMAGE PROCESSING SYSTEMS

7.4.1 Overview of an Image Processing System

A typical image processing system that involves digital signal processing is shown in Figure 7.25. The input image source I(x, y) is generally an object or a natural scene, but it may be an image produced by another system, such as a filter, a cathode ray tube (CRT) display monitor, or a video cassette recorder (VCR). The digitizer converts the input source to an electrical signal whose amplitude represents the image intensity and digitizes the electrical signal using an analog-to-digital (A/D) converter.

The sequence f(n1, n2) that results from the digitizer is then processed by a digital image processing algorithm. The algorithm may be implemented on a general-purpose computer, a microprocessor, or special-purpose hardware. The specific algorithm used depends on the objective, which may involve image en-
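The backwards-rotating wheel mentioned above is ordinary aliasing along the temporal dimension, and the apparent rotation rate follows from folding the true rate into the band that the frame rate can represent. The sketch below assumes a wheel with a single distinguishable mark; a wheel with n identical spokes aliases at n times its rotation rate.

```python
import math

def apparent_rate(true_rate, frame_rate):
    """Fold a true rotation rate (rev/sec) into the band
    [-frame_rate/2, frame_rate/2) that sampling at the given frame rate
    can represent; a negative result is motion perceived as backwards."""
    r = math.fmod(true_rate, frame_rate)
    if r >= frame_rate / 2:
        r -= frame_rate
    elif r < -frame_rate / 2:
        r += frame_rate
    return r

print(apparent_rate(23.0, 24.0))   # -1.0: filmed at 24 frames/sec, the wheel seems to spin backwards
print(apparent_rate(5.0, 24.0))    # 5.0: slow rotation is rendered faithfully
```

Motion can therefore look perfectly smooth while being badly aliased: the folded rate is still a steady rotation, just not the true one.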
Figure P7.2

Under identical normal viewing conditions, which light will appear brighter to a human viewer?
7.3. Consider a monochromatic light with a wavelength of 555 nm which has an irradiance of ...

Figure P7.5

(a) Suppose we mix the two lights. Let c(λ) denote the energy density of the combined light. Sketch c(λ).
(b) Suppose that c1(λ) denotes the energy density of light reflected by a particular ink illuminated by white light. Suppose further that c2(λ) denotes the energy density of light reflected by a second ink illuminated by the same white light. We now mix the two inks in equal amounts and illuminate the mixture with the same white light. Sketch c(λ), the light reflected by the mixture of the two inks.
7.6. A color image can be viewed as three monochrome images, which we will denote by f_R(n1, n2), f_G(n1, n2), and f_B(n1, n2), representing the red, green, and blue components, respectively. Alternatively, we can transform the f_R(n1, n2), f_G(n1, n2), and f_B(n1, n2)

Chap. 7 Problems 447
In the above system, each of the three components f_R(n1, n2), f_G(n1, n2), and f_B(n1, n2) is filtered by a linear shift-invariant system with impulse response h(n1, n2). Now consider the system shown in the following figure.

[Figure: f(n1, n2) → Transformation → three parallel filters h(n1, n2) → Inverse Transformation]

In the figure, h(n1, n2) is the same as in the previous system. How are f'_R(n1, n2), f'_G(n1, n2), and f'_B(n1, n2) related to f_R(n1, n2), f_G(n1, n2), and f_B(n1, n2)?
7.7. Consider a light represented in terms of its three components, red (R), green (G), and blue (B). Suppose the intensities R, G, and B are in the following ranges: ... Note that R, G, and B are nonnegative. Another way of representing the same light is in terms of the luminance component Y and the two chrominance components I and Q.
(a) The possible ranges of Y, I, and Q are ... Determine Y_MIN, Y_MAX, Q_MIN, and Q_MAX.
(b) Does any set of Y, I, and Q in the ranges obtained in (a) correspond to a valid light, valid in the sense that it corresponds to a set of nonnegative R, G, and B?
7.8. Figure 7.22 illustrates that the peripheral level of the human visual system is approximately bandpass in character. Design an experiment that can be used in more ...
... touch. Discuss why evolution may have favored such a mechanism rather than a system whereby the just-noticeable intensity difference ΔI is constant independent of I.
7.10. Suppose Weber's law holds strictly. The just-noticeable intensity difference ΔI when I = 100 has been measured to be 1. How many just-noticeable differences are there between I = 10 and I = 1000?
7.11. A physical unit of spatial frequency is cycles/cm. A unit of spatial frequency more directly related to our perception of spatial frequency is cycles/viewing angle. Suppose we have a sinusoidal intensity grating at a horizontal frequency of 10 cycles/cm. In answering the following questions, assume that the image is focused on the retina, the sinusoidal grating is directly displayed before an eye, and the eye looks at the image straight on.
(a) When the image is viewed at a distance of 50 cm away from the eye, what is the horizontal spatial frequency in units of cycles/degree in viewing angle?
(b) Answer (a) when the image is viewed at a distance of 100 cm from the eye.
(c) From the results of (a) and (b), and the spatial frequency response of a human eye, discuss why the details of a distant scene are less visible than those of nearby objects.
7.12. The lens of a human eye is convex. As a result, the image formed on the retina is upside down, as shown in the following figure. Nevertheless, the object is perceived to be right side up. Discuss possible explanations for this phenomenon.

Figure P7.12 [Lens of an eye]

7.13. In the region on the retina known as the blind spot, there are no light-sensitive cells, so light focused on this region is not visible. The blind spot often goes unnoticed in our everyday experience. Design an experiment that can be used to perceive the blind spot.
7.14. In a typical image digitization system, a small aperture A(x, y) searches an image following a certain pattern called a raster. Let A(x, y) be given by

A(x, y) = 1 for √(x² + y²) < r, and 0 otherwise.

The light intensity integrated over the small aperture is measured and is considered the image intensity at the spatial point.
(a) Discuss how this process can be viewed as convolving the input intensity I(x, y) with the aperture A(x, y) and then sampling the result of the convolution.
(b) Suppose we have an image of 5 cm × 5 cm, and we wish to sample it with uniform sampling on a Cartesian grid at 512 × 512 points. What is a reasonable size of r for the aperture? What is the effect of choosing a very small r? a very large r?
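The cycles/cm to cycles/degree conversion that Problem 7.11 asks for can be sketched directly. Assuming a flat grating viewed straight on, one degree of viewing angle spans about 2 d tan(0.5°) centimeters of the image at distance d, so the perceived spatial frequency grows linearly with viewing distance.

```python
import math

def cycles_per_degree(cycles_per_cm, distance_cm):
    """Convert a physical grating frequency to cycles per degree of
    viewing angle: one degree subtends about 2*d*tan(0.5 deg) cm of a
    flat image viewed straight on at distance d."""
    cm_per_degree = 2.0 * distance_cm * math.tan(math.radians(0.5))
    return cycles_per_cm * cm_per_degree

print(round(cycles_per_degree(10, 50), 2))    # about 8.73 cycles/degree at 50 cm
print(round(cycles_per_degree(10, 100), 2))   # about 17.45 cycles/degree at 100 cm
```

Doubling the distance doubles the frequency in cycles/degree, pushing fine detail past the 5-10 cycles/degree region where the visual system's response peaks, which bears on part (c).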
7.15. In television broadcasting, a horizontal line time is the time interval during which the scanner traces one horizontal line and retraces to the beginning of the next horizontal line.
(a) What is the approximate horizontal line time in an NTSC television system? Assume that the time it takes the scanner to retrace from the end of one field to the beginning of the next is negligible.
(b) Approximately 16% of the horizontal line time is used to allow the retrace from the end of one horizontal line to the beginning of the next horizontal line. The picture intensity information is blanked out during this period, which is called a horizontal blanking period. What is the approximate time used as horizontal blanking periods during the period the scanner scans one complete field in an NTSC television system?
7.16. In NTSC television broadcasting, the frame rate used is 30 frames/sec. Each frame consists of 525 horizontal lines which are divided into two fields, odd and even. The field rate used is, therefore, 60 fields/sec. The spacing in time between any two consecutive fields is the same. The 2:1 interlace is used to give a vertical resolution of 525 lines/frame at the rate of 30 frames/sec, but with a flicker frequency of 60 cycles/sec to reduce the perception of flicker. The 2:1 interlace, however, causes some artifacts. By considering a solid circle moving in the horizontal and vertical directions, discuss what distortion in the circle may be visible. Assume that all the lines in a given field are displayed at the same time.

Image Enhancement

8.0 INTRODUCTION

Image enhancement is the processing of images to improve their appearance to human viewers or to enhance other image processing systems' performance. Methods and objectives vary with the application. When images are enhanced for human viewers, as in television, the objective may be to improve perceptual aspects: image quality, intelligibility, or visual appearance. In other applications, such as object identification by machine, an image may be preprocessed to aid machine performance. Because the objective of image enhancement is dependent on the application context, and the criteria for enhancement are often subjective or too
complex to be easily converted to useful objective measures, image enhancement
algorithms tend to be simple, qualitative, and ad hoc. In addition, in any given
application, an image enhancement algorithm that performs well for one class of
images may not perform as well for other classes.
Image enhancement is closely related to image restoration, which will be
discussed in Chapter 9. When an image is degraded, restoration of the original
image often results in enhancement. There are, however, some important differ-
ences between restoration and enhancement. In image restoration, an ideal image
has been degraded, and the objective is to make the processed image resemble the
original image as much as possible. In image enhancement, the objective is to
make the processed image better in some sense than the unprocessed image. In
this case, the ideal image depends on the problem context and often is not well
defined. To illustrate this difference, note that an original, undegraded image
cannot be further restored but can be enhanced by increasing sharpness through
highpass filtering.
Image enhancement is desirable in a number of contexts. In one important
class of problems, an image is enhanced by modifying its contrast and/or dynamic
Some part of this chapter has been adapted from "Image Enhancement" by Jae S. Lim in Digital Image Processing Techniques, edited by Michael P. Ekstrom. Copyright © 1984 by Academic Press, Inc. Reprinted by permission of the publisher.
Figure 8.1 Example of gray scale modification. (a) Image of 4 × 4 pixels, with each pixel represented by 3 bits; (b) gray scale transformation function; (c) result of modifying the image in (a) using the gray scale transformation function in (b).

Figure 8.2 Histogram of the 4 × 4-pixel image (a) in Figure 8.1(a); (b) in Figure 8.1(c).

important image features that help determine which particular gray scale transformation is desirable. In Figure 8.2(a), the image's intensities are clustered in a small region, and the available dynamic range is not very well utilized. In such a case, a transformation of the type shown in Figure 8.1(b) would increase the overall dynamic range, and the resulting image would appear to have greater contrast. This is evidenced by Figure 8.2(b), which is the histogram of the processed image shown in Figure 8.1(c).

Because computing the histogram of an image and modifying its gray scale for a given gray scale transformation requires little computation, the desirable gray scale transformation can be determined by an experienced human operator in real time. On the basis of the initial histogram computation, the operator chooses a gray scale transformation to produce a processed image. By looking at the processed image and its histogram, the operator can choose another gray scale transformation, obtaining a new processed image. These steps can be repeated until the output image satisfies the operator.

In such circumstances as when there are too many images for individual attention by a human operator, the gray scale transformation must be chosen automatically. A method known as histogram modification is useful in this case. In this method, the gray scale transformation that produces a desired histogram is chosen for each individual image. The desired histogram of the output image, denoted by pd(g), that is useful for typical images has a maximum around the middle of the dynamic range and decreases slowly as the intensity increases or decreases. For a given image, we wish to determine the transformation function so that the resulting output image has a histogram similar to pd(g). This problem can be phrased as a problem in elementary probability theory. Specifically, the histograms p(f) and pd(g) can be viewed as scaled probability density functions of the random variables f and g, respectively. For example, p(3)/16 in Figure 8.2(a) is the probability that a randomly chosen pixel in the 4 × 4-pixel image in Figure 8.1(a) will have an intensity level of 3. We wish to find a transformation g = T[f], with the constraint that T[f] must be a monotonically nondecreasing function of f, such that p(g) is equal to or close to pd(g). One approach to solving this probability problem is to obtain the probability distribution functions P(f) and Pd(g) by integrating the probability density functions p(f) and pd(g) and then choosing the transformation function such that P(f) will be equal to or close to Pd(g) at g = T[f]. Imposing the constraint that T[f] must be a monotonically nondecreasing function ensures that a pixel with a higher intensity than another pixel will not become a pixel with a lower intensity in the output image. Applying this approach to the histogram modification problem, which involves
discrete intensity levels, we obtain the transformation

T[f] = Pd^(-1)(P(f)).          (8.1a)

The transformation function (8.1a)
is shown in Figure 8.4(a), and the histogram of the image obtained by using this transformation function is shown in Figure 8.4(b). If the desired histogram pd(g) remains the same for different input images, Pd(g) needs to be computed only once from pd(g).
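The cumulative-histogram matching rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the text: the function names, the tie-breaking choice of the closest level, and the flat desired histogram in the usage example are all assumptions.

```python
from collections import Counter

def histogram(pixels, levels):
    # p(f): number of pixels at each intensity level
    h = Counter(pixels)
    return [h.get(f, 0) for f in range(levels)]

def cumulative(hist):
    # P(f): running sum of the histogram
    out, s = [], 0
    for v in hist:
        s += v
        out.append(s)
    return out

def match_histogram(pixels, desired_hist, levels):
    # Choose T[f] so that Pd(T[f]) is as close as possible to P(f).
    # Because Pd is nondecreasing, the chosen T is monotonically nondecreasing.
    P = cumulative(histogram(pixels, levels))
    total = len(pixels)
    Pd = cumulative(desired_hist)
    d_total = sum(desired_hist)
    T = []
    for f in range(levels):
        target = P[f] / total
        g = min(range(levels), key=lambda k: abs(Pd[k] / d_total - target))
        T.append(g)
    return [T[f] for f in pixels]
```

With a flat desired histogram this reduces to histogram equalization: a 3-bit image whose pixels occupy only levels 0 and 1 is spread across the full dynamic range.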
In the example we considered above, note that the histogram of the processed image is not the same as the given desired histogram. This is in general the case when f and g are discrete variables and we require that all the pixels with the same input intensity be mapped to the same output intensity. Note also that the desired cumulative histogram Pd(g) is close to a straight line.

Figure 8.3 Histograms and cumulative histograms. (a) Histogram of an 8 × 8-pixel image; (b) desired histogram; (c) cumulative histogram derived from (a); (d) cumulative histogram derived from (b).
Sec. 8.1 Contrast and Dynamic Range Modification
In the special case of the histogram modification known as histogram equalization, the desired histogram is assumed constant. In this case, the desired cumulative histogram would be exactly a straight line. Images processed by histogram equalization typically have more contrast than unprocessed images, but they tend to appear somewhat unnatural.

Even though gray scale modification is conceptually and computationally simple, it can often provide significant improvement in image quality or intelligibility to the human observer, and is therefore used routinely in many image processing applications. This is illustrated in the following two examples. Figure 8.5(a) shows an original image of 512 × 512 pixels, with each pixel represented by eight bits. Figure 8.5(b) shows the histogram of the image in Figure 8.5(a). The histogram clearly shows that a large number of the image's pixels are concentrated in the lower intensity levels of the dynamic range, suggesting that the image will appear very dark, with a loss of contrast in the dark regions. By increasing the contrast in the dark regions, the details can be made more visible. This can be accomplished by using the transformation function shown in Figure 8.5(c). The processed image using the function in Figure 8.5(c) is shown in Figure 8.5(d), and the histogram of the processed image is shown in Figure 8.5(e). Another example is shown in Figure 8.6. The unprocessed image is shown in Figure 8.6(a), and the image processed by gray scale modification is shown in Figure 8.6(b).

The histogram modification method discussed above can also be applied to color images. To improve the image contrast with only a relatively small effect on the hue or saturation, we can transform the RGB images fR(n1, n2), fG(n1, n2), and fB(n1, n2) to the YIQ images fY(n1, n2), fI(n1, n2), and fQ(n1, n2) by using the transformation in (7.8). Gray scale modification can be applied to only the Y image fY(n1, n2), and the result can be combined with the unprocessed fI(n1, n2) and fQ(n1, n2). Again using the transformation in (7.8), the processed RGB images gR(n1, n2), gG(n1, n2), and gB(n1, n2) can be obtained. Figure 8.7(a) (see color insert) shows an original color image of 512 × 512 pixels, and Figure 8.7(b) shows the image processed by the gray scale transformation discussed above.

One simple way to emphasize the high-frequency components of an image is unsharp masking, in which the processed image is obtained as

g(n1, n2) = af(n1, n2) − bfL(n1, n2)          (8.2)

where f(n1, n2) is the original image, fL(n1, n2) is the lowpass filtered or unsharp image, a and b are positive scalars with a > b, and g(n1, n2) is the processed image. Rewriting f(n1, n2) as the sum of the lowpass filtered image fL(n1, n2) and the highpass filtered image fH(n1, n2), we can write (8.2) as

g(n1, n2) = (a − b)fL(n1, n2) + afH(n1, n2).          (8.3)

From (8.3), it is clear that high-frequency components are emphasized over low-frequency components.
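A minimal 1-D sketch of the unsharp masking relation g = a·f − b·fL with a > b can make the low-frequency/high-frequency split concrete. The moving-average lowpass filter, the edge replication, and the values a = 1.5 and b = 0.5 are illustrative assumptions, not choices from the text.

```python
def lowpass(x, size=3):
    # simple moving-average lowpass filter; borders handled by replication
    n = len(x)
    half = size // 2
    return [sum(x[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / size
            for i in range(n)]

def unsharp_mask(f, a=1.5, b=0.5):
    # g = a*f - b*fL, which equals (a - b)*fL + a*fH with fH = f - fL
    fL = lowpass(f)
    return [a * fi - b * fLi for fi, fLi in zip(f, fL)]
```

Note that when a − b = 1, flat regions pass through unchanged while local detail is amplified by a.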
Figure 8.5 Example of gray scale modification. (a) Original image of 512 × 512 pixels; (b) histogram of the image in (a); (c) transformation function used in the gray scale modification; (d) processed image; (e) histogram of the processed image in (d).

Figure 8.6 Example of gray scale modification. (a) Original image of 512 × 512 pixels; (b) processed image.
In some applications, it is desirable to modify the local contrast and local luminance mean as the local characteristics of an image vary. In such applications, it is reasonable to process an image adaptively.

Figure 8.10 Homomorphic system for image enhancement. (a) Homomorphic system for contrast enhancement and dynamic range modification; (b) simplification of the system in (a).

Figure 8.11 Example of homomorphic processing for image enhancement. (a) Original image of 256 × 256 pixels; (b) processed image by homomorphic system for multiplication. After [Oppenheim et al.].
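The homomorphic system of Figure 8.10 takes the log of the image, separates the result into lowpass (dynamic-range-carrying) and highpass (contrast-carrying) components, weights them differently, and exponentiates. The sketch below follows that structure in 1-D; the moving-average filter and the weights alpha = 0.7 (dynamic range compression) and beta = 1.4 (contrast increase) are illustrative assumptions.

```python
import math

def lowpass(x, size=3):
    # moving-average lowpass filter with border replication
    n = len(x)
    half = size // 2
    return [sum(x[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / size
            for i in range(n)]

def homomorphic(f, alpha=0.7, beta=1.4):
    # log -> split into lowpass and highpass parts -> weight -> exp
    logf = [math.log(v) for v in f]          # pixel values assumed > 0
    lo = lowpass(logf)
    hi = [lf - l for lf, l in zip(logf, lo)]
    return [math.exp(alpha * l + beta * h) for l, h in zip(lo, hi)]
```

Because the weighting happens in the log domain, alpha < 1 compresses the overall dynamic range multiplicatively while beta > 1 boosts local contrast.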
Figure 8.12 System for the modification of local contrast and local luminance
mean as a function of luminance mean.
One application in which adaptive modification of the local contrast and local
luminance mean is used is enhancement of an image taken from an airplane through
varying amounts of cloud cover. According to one simple model of image deg-
radation due to cloud cover, regions of an image covered by cloud have increased
local luminance mean due to direct reflection of the sunlight by cloud cover and
decreased local contrast due to attenuation of the signal from the ground when it
passes through the cloud cover. One approach to enhancing the image, then, is
to increase the local contrast and decrease the local luminance mean whenever
cloud cover is detected. One way to detect the presence of cloud cover is by
measuring the local luminance mean. When the local luminance mean is high, it
is likely that cloud cover is present.
One system developed to reduce the effect of the cloud cover is shown in Figure 8.12. This system modifies the local contrast and the local luminance mean. In the figure, f(n1, n2) denotes the unprocessed image. The sequence fL(n1, n2), which denotes the local luminance mean of f(n1, n2), is obtained by lowpass filtering f(n1, n2). The sequence fH(n1, n2), which denotes the local contrast, is obtained by subtracting fL(n1, n2) from f(n1, n2). The local contrast is modified by multiplying fH(n1, n2) by k(fL), a scalar that is a function of fL(n1, n2). The modified contrast is denoted by fH'(n1, n2). If k(fL) is greater than one, the local contrast is increased, while k(fL) less than one represents a local contrast decrease. The local luminance mean is modified by a point nonlinearity, and the modified local luminance mean is denoted by fL'(n1, n2). The modified local contrast and local luminance mean are then combined to obtain the processed image, p(n1, n2). To increase the local contrast and decrease the local luminance mean when the local luminance mean is high, we choose a larger k(fL) for a larger fL, and we choose the nonlinearity taking into account the local luminance mean change and the contrast increase. Figure 8.13 shows the result of applying the system in Figure 8.12 to enhance an image taken from an airplane through varying amounts of cloud cover. Figure 8.13(a) shows the original image of 256 × 256 pixels. Figure 8.13(b) shows the processed image. The function k(fL) and the nonlinearity used are shown in Figures 8.13(c) and 8.13(d). The lowpass filtering operation was performed by using an FIR filter whose impulse response is an 8 × 8-point rectangular window.

Figure 8.13 Example of image enhancement by adaptive filtering. (a) Image of 256 × 256 pixels taken from an airplane through varying amounts of cloud cover; (b) result of processing the image in (a) with the system in Figure 8.12; (c) function k(fL) used in the processing; (d) nonlinearity used in the processing. After [Peli and Lim].
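The structure of the Figure 8.12 system can be sketched in 1-D as follows. The moving-average estimate of the local mean and the particular choices of k(fL) (increasing in fL) and of the point nonlinearity (attenuating the mean) are hypothetical stand-ins for the curves in Figures 8.13(c) and (d).

```python
def lowpass(x, size=3):
    # moving-average estimate of the local luminance mean fL
    n = len(x)
    half = size // 2
    return [sum(x[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / size
            for i in range(n)]

def adaptive_enhance(f, k=lambda m: 1.0 + 0.01 * m, nonlin=lambda m: 0.8 * m):
    # p = fL' + k(fL) * fH : modified mean plus rescaled local contrast
    fL = lowpass(f)
    fH = [x - m for x, m in zip(f, fL)]
    return [nonlin(m) + k(m) * h for m, h in zip(fL, fH)]
```

With these choices, bright (cloud-covered) regions get a lowered mean and an amplified local contrast, as the text prescribes.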
The system in Figure 8.12 can be viewed as a special case of a two-channel
process. In the two-channel process, the image to be processed is divided into
two components, the local luminance mean and the local contrast. The two components are modified separately, and then the results are combined. In the system in Figure 8.12, the local luminance mean is modified by a nonlinearity, and the local contrast is modified by the multiplication factor k(fL). As Chapters 9 and 10 show, a two-channel process is also useful in image restoration and coding.

The notion of adapting an image enhancement system to changing local characteristics is generally a very useful idea that can be applied in a number of different contexts. For example, gray scale transformation and highpass filtering, discussed earlier, can be modified so that they adapt to some varying local characteristics. Even though an adaptive system often requires considerably more computations than a nonadaptive system, the adaptive system's performance is generally considerably better. It is worthwhile to explore adaptive systems in solving an image enhancement problem that requires high performance. Adaptive processing of images is also very useful in image restoration and coding, and is discussed further in Chapters 9 and 10.

466 Image Enhancement Chap. 8

Examples of impulse responses of lowpass filters typically used for image enhancement are shown in Figure 8.14. To illustrate the performance of lowpass filtering for image enhancement, two examples are considered. Figure 8.15(a) shows an original noise-free image of 256 × 256 pixels, and Figure 8.15(b) shows an image degraded by wideband Gaussian random noise at an SNR of 15 dB. The SNR is defined as 10 log10 (image variance/noise variance). Figure 8.15(c) shows the result of lowpass filtering the degraded image. The lowpass filter used is shown in Figure 8.14(c). In Figure 8.15, lowpass filtering clearly reduces the additive noise, but at the same time it blurs the image. Blurring is a primary limitation of lowpass filtering. Figure 8.16(a) shows an original image of 256 × 256 pixels with 8 bits/pixel. Figure 8.16(b) shows the image coded by a PCM system with Roberts's pseudonoise technique at 2 bits/pixel. Roberts's pseudonoise technique is discussed in Chapter 10. Figure 8.16(c) shows the result of highpass filtering before coding and lowpass filtering after coding. The highpass and lowpass filters used in this example are those in Figure 8.8(c) and Figure 8.14(c), respectively. This increases the SNR in the high-frequency components at the expense of a small SNR decrease in the low-frequency components.
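The SNR definition used here, and the variance-reducing effect of a rectangular lowpass filter on wideband noise, can be checked directly. This is a 1-D illustration with assumed helper names; averaging N roughly independent noise samples divides the noise variance by about N, which is exactly the blurring-versus-smoothing trade the text describes.

```python
import math
import random
import statistics

def snr_db(image_var, noise_var):
    # SNR = 10 log10 (image variance / noise variance), as defined in the text
    return 10 * math.log10(image_var / noise_var)

def smooth(x, size=5):
    # moving-average (rectangular impulse response) lowpass filter
    n = len(x)
    half = size // 2
    return [sum(x[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / size
            for i in range(n)]
```

For example, an image variance of 100 against a noise variance of 10 corresponds to 10 dB, and smoothing a white-noise sequence with a 5-point window shrinks its variance to roughly one fifth.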
Figure 8.18 Illustration of a median filter's capability to remove impulsive values. (a) One-
dimensional sequence with two consecutive samples significantly different from surrounding
samples; (b) result of lowpass filtering the sequence in (a) with a 5-point rectangular impulse
response; (c) result of applying a 5-point median filter to the sequence in (a).
2-D median filter. For example, median filtering a 1-D unit step sequence u(n) preserves the step discontinuity and does not affect the signal u(n) at all. Suppose we filter a 2-D step sequence u(n1, n2) with a 2-D N × N-point median filter. Figure 8.20(a) shows u(n1, n2), and Figure 8.20(b) shows the result of filtering u(n1, n2) with a 2-D 5 × 5-point median filter. From Figure 8.20(b), the intensity discontinuities which can be viewed as 1-D steps (for large n1 at n2 = 0 and large n2 at n1 = 0) are not affected. However, the discontinuities which are truly 2-D steps (n1 = n2 = 0) are seriously distorted. One method that tends to preserve 2-D step discontinuities well is to filter a 2-D signal along the horizontal direction with a 1-D median filter and then filter the result along the vertical direction with another 1-D median filter. This method is called separable median filtering, and is often used in 2-D median filtering applications. When a separable median filter is applied to u(n1, n2), the signal u(n1, n2) is not affected.
Figure 8.17 Illustration of median filter's tendency to preserve step discontinuities. (a) One-dimensional step sequence degraded by random noise; (b) result of lowpass filtering the sequence in (a) with a 5-point rectangular impulse response; (c) result of applying a 5-point median filter to the sequence in (a).

Sec. 8.2 Noise Smoothing 473

A median filter is a nonlinear system, and therefore many theoretical results on linear systems are not applicable. For example, the result of separable median filtering depends on the order in which the 1-D horizontal and vertical median filters are applied. Despite this difficulty, some theoretical results have been developed [Gallagher and Wise; Nodes and Gallagher; Arce and McLoughlin] on median filtering. One result states that repeated application of a 1-D median filter to a 1-D sequence eventually leads to a signal called a root signal, which is invariant under further applications of the 1-D median filter.
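A separable median filter as described above can be sketched compactly: filter each row with a 1-D median filter, then each column of the result. The edge-replication boundary handling is an assumption of this sketch.

```python
import statistics

def median1d(x, size=3):
    # 1-D median filter with border replication
    n = len(x)
    half = size // 2
    return [statistics.median(x[min(max(j, 0), n - 1)]
                              for j in range(i - half, i + half + 1))
            for i in range(n)]

def separable_median(img, size=3):
    # horizontal pass over rows, then vertical pass over columns
    rows = [median1d(r, size) for r in img]
    cols = [median1d(list(c), size) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

The tests below exercise the two properties emphasized in the text: an isolated impulse is removed, while a step discontinuity passes through unchanged.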
Figure 8.19 Results of applying a median filter to the sequence in Figure 8.18(a) as a function of window size. This illustrates that removal of impulsive values by a median filter depends on the window size. (a) Window size = 3; (b) window size = 5; (c) window size = 7.

Figure 8.20 Illustration that a 2-D N × N-point median filter distorts 2-D step discontinuities. (a) Unit step sequence u(n1, n2); (b) result of filtering u(n1, n2) with a 5 × 5-point median filter.

Figure 8.21 Example of wideband random noise reduction by median filtering. (a) Original image of 512 × 512 pixels; (b) image degraded by wideband Gaussian random noise at SNR of 7 dB; (c) processed image by a separable median filter with window size of 3 for both the horizontal and vertical 1-D median filters.

Two examples are given to illustrate the performance of a median filter. In the first, the original image of 512 × 512 pixels shown in Figure 8.21(a) is degraded
by wideband Gaussian random noise at an SNR of 7 dB. The degraded image is
shown in Figure 8.21(b). Figure 8.21(c) shows the image processed by a separable
median filter with a window size of 3 for both the horizontal and vertical 1-D
median filters. Although the very sharp edges are not blurred, median filtering
blurs the image significantly. In the second example, the original image from
Figure 8.21(a) is degraded by salt-and-pepper noise. The degraded image is shown
in Figure 8.22(a) and the image processed by the same separable median filter used
in Figure 8.21 is shown in Figure 8.22(b). This example shows that median filtering
is quite effective in removing salt-and-pepper noise.
Consider an analog* function f(x) which represents a typical 1-D edge, as shown in Figure 8.24(a). In typical problems, it is reasonable to consider the value x0 in the figure an edge point. One way to determine x0 is to compute the first derivative f'(x) or the second derivative f''(x). Figures 8.24(b) and (c) show f'(x) and f''(x). From the figure, the value x0 can be determined by looking for the local extremum (maximum or minimum) of f'(x) or by looking for a zero crossing of f''(x), where f''(x) changes its sign. In this section, we discuss methods that exploit the characteristics of f'(x). In the next section, we discuss methods that exploit the characteristics of f''(x).
In addition to determining the possible edge point x0, f'(x) can also be used in estimating the strength and direction of the edge. If |f'(x)| is very large, f(x) is changing very rapidly, and a rapid change in intensity is indicated. If f'(x) is positive, f(x) is increasing. Based on the above observations, one approach to detecting edges is to use the system shown in Figure 8.25. In the system, |f'(x)| is first computed from f(x). If |f'(x)| is greater than some threshold, it is a candidate to be an edge. If all values of x such that |f'(x)| is greater than a certain threshold are detected to be edges, an edge will appear as a line rather than a point. To avoid this problem, we further require |f'(x)| to have a local maximum at the edge points. It may be desirable to determine whether f(x) is increasing or decreasing at x = x0. The necessary information is contained in f'(x) at x = x0. The choice of the threshold depends on the application. As the threshold increases, only the values of x where f(x) changes rapidly will be registered as candidate edges. Since it is difficult to choose the threshold optimally, some trial and error is usually involved. It is also possible to choose the threshold adaptively. The system in Figure 8.25 is based on the particular type of edge shown in Figure 8.24(a), but it is generally applicable to detecting various other types of edges.

Figure 8.24 (a) f(x), (b) f'(x), and (c) f''(x) for a typical 1-D edge.

Figure 8.25 System for 1-D edge detection.

The generalization of f'(x) to a 2-D function f(x, y) is the gradient ∇f(x, y) given by

∇f(x, y) = (∂f(x, y)/∂x) ix + (∂f(x, y)/∂y) iy

where ix is the unit vector in the x-direction and iy is the unit vector in the y-direction. A generalization of the edge detection system in Figure 8.25 based on ∇f(x, y) is shown in Figure 8.26. The magnitude of ∇f(x, y) is first computed and is then compared with a threshold to determine candidate edge points. If all values of (x, y) such that |∇f(x, y)| is greater than a certain threshold are detected to be edges, the edges will appear as strips rather than lines. The process of determining an edge line from a strip of candidate edge points is called edge thinning. In one simple edge thinning algorithm, the edge points are selected by checking if |∇f(x, y)| is a local maximum in at least one direction. The property that |∇f(x, y)| achieves its local maximum in at least one direction is usually checked along a few specified directions. In most cases, it is sufficient to check for local maxima in only the horizontal and vertical directions. If |∇f(x, y)| is a local maximum along any one of the specified directions at a potential edge point, the potential edge point is considered to be an edge point. One difficulty with this simple edge thinning algorithm is that it creates a number of minor false edge lines in the vicinity of strong edge lines. One simple method to remove most of these minor false edge lines is to impose the following additional constraints:

*Sometimes, it is more convenient to develop results in the analog domain. In such instances, we will begin the development of results in the analog domain and then discretize the results at some later point in the development.

Sec. 8.3 Edge Detection
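The 1-D system of Figure 8.25 (threshold on |f'(x)| plus a local-maximum check) can be sketched directly. The central-difference derivative estimate, with the scaling factor omitted as the text allows, is an assumption of this sketch.

```python
def detect_edges_1d(f, threshold):
    # central-difference estimate of f'(x); the 1/2T scaling is omitted,
    # since the threshold can be adjusted to absorb it
    n = len(f)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (f[i + 1] - f[i - 1]) / 2.0
    mag = [abs(v) for v in d]
    edges = []
    for i in range(1, n - 1):
        # candidate if above threshold AND |f'| attains a local maximum there
        if mag[i] > threshold and mag[i] >= mag[i - 1] and mag[i] >= mag[i + 1]:
            edges.append(i)
    return edges
```

Without the local-maximum requirement, every sample on the ramp of a wide edge would be reported; with it, only the center of the transition survives.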
Examples of "improved" estimates of ∂f(x, y)/∂x are given by (8.7). Since the computed derivatives are compared later with a threshold and the threshold can be adjusted, the scaling factors 1/T and 1/2T can be omitted. Typically, the expressions in (8.7) are averaged over several samples to improve the reliability and continuity of the estimate of ∂f(x, y)/∂x.

Figure 8.26 System for edge detection based on the gradient ∇f(x, y).

Figure 8.27 Impulse responses of filters that can be used for directional edge detection. (a) Vertical edge detection; (b) horizontal edge detection; (c) and (d) diagonal edge detection.
The objective of an edge detection algorithm is to locate the regions where the intensity is changing rapidly. In the case of a 1-D function f(x), searching for regions of rapidly changing intensity corresponds to searching for regions where f'(x) is large. For gradient-based methods, f'(x) is considered large when its magnitude |f'(x)| is greater than a threshold. Another possible way is to assume that f'(x) is large whenever it reaches a local extremum, that is, whenever the second derivative f''(x) has a zero crossing. This is illustrated in Figure 8.24. Declaring zero-crossing points as edges results in a large number of points being declared to be edge points. Since there is no check on the magnitude of f'(x), any small ripple in f(x) is enough to generate an edge point. Due to this sensitivity to noise, the application of a noise reduction system prior to edge detection is very desirable in processing images with background noise.

Figure 8.28 Approximation of (a) ∂f(x, y)/∂x with f(n1, n2) * hx(n1, n2); (b) ∂f(x, y)/∂y with f(n1, n2) * hy(n1, n2). Sobel's edge detection method is based on comparison of sqrt[(f(n1, n2) * hx(n1, n2))² + (f(n1, n2) * hy(n1, n2))²] with a threshold.

Figure 8.31 Result of applying (a) Sobel edge detector and (b) Roberts's edge detector to the image in Figure 8.30(a).
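Sobel's method, as summarized in the Figure 8.28 caption, thresholds the gradient magnitude formed from two directional derivative filters. The sketch below uses the standard 3 × 3 Sobel kernels and a "valid" convolution region; both are common conventions rather than details taken from this text.

```python
def convolve_valid(img, kernel):
    # 2-D correlation over the "valid" region, for a 3x3 kernel
    out = []
    for i in range(len(img) - 2):
        row = []
        for j in range(len(img[0]) - 2):
            s = sum(img[i + a][j + b] * kernel[a][b]
                    for a in range(3) for b in range(3))
            row.append(s)
        out.append(row)
    return out

def sobel_edges(img, threshold):
    hx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative
    hy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative
    gx = convolve_valid(img, hx)
    gy = convolve_valid(img, hy)
    # threshold the gradient magnitude sqrt(gx^2 + gy^2)
    return [[(x * x + y * y) ** 0.5 > threshold for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]
```

A vertical intensity step triggers the horizontal-derivative filter and produces edge responses, while a flat image produces none.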
Figure 8.30 Edge maps obtained by directional edge detectors. (a) Image of 512 × 512 pixels; (b) result of applying a vertical edge detector; (c) result of applying a horizontal edge detector.

A generalization of ∂²f(x)/∂x² to a 2-D function f(x, y) for the purpose of edge detection (see Problem 8.19) is the Laplacian ∇²f(x, y) given by

∇²f(x, y) = ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y².          (8.11)

For a 2-D sequence f(n1, n2), the partial second derivatives ∂²f(x, y)/∂x² and ∂²f(x, y)/∂y² can be replaced by some form of second-order differences. Second-order differences can be represented by convolution of f(n1, n2) with the impulse response of a filter h(n1, n2). Examples of h(n1, n2) that may be used are shown in Figure 8.32. To illustrate that f(n1, n2) * h(n1, n2) may be viewed as a discrete approximation of ∇²f(x, y), let us consider h(n1, n2) in Figure 8.32(a). Suppose we approximate ∂f(x, y)/∂x by the forward difference

∂f(x, y)/∂x ≈ f(n1 + 1, n2) − f(n1, n2).          (8.12)

We again omitted the scaling factor, since it does not affect zero-crossing points. Since the forward difference is used in (8.12), we can use the backward difference in approximating ∂²f(x, y)/∂x²:
∂²f(x, y)/∂x² ≈ f(n1 + 1, n2) − 2f(n1, n2) + f(n1 − 1, n2).          (8.14)

From (8.11) and (8.14), and approximating ∂²f(x, y)/∂y² in a similar manner, we obtain

∇²f(n1, n2) = f(n1 + 1, n2) + f(n1 − 1, n2) + f(n1, n2 + 1) + f(n1, n2 − 1) − 4f(n1, n2).          (8.15)

The resulting ∇²f(n1, n2) is f(n1, n2) * h(n1, n2) with h(n1, n2) in Figure 8.32(a). Depending on how the second-order derivatives are approximated, it is possible to derive many other impulse responses h(n1, n2), including those shown in Figures 8.32(b) and (c).

Figure 8.32 Examples of h(n1, n2) that may be used in approximating ∇²f(x, y) with f(n1, n2) * h(n1, n2).

Figure 8.33 shows an example where edges were detected by looking for zero-crossing points of ∇²f(n1, n2). Figure 8.33(a) shows an image of 512 × 512 pixels. Figure 8.33(b) shows the zero-crossing points of ∇²f(n1, n2), obtained from (8.15) and using the image in Figure 8.33(a) as f(n1, n2). Since zero-crossing contours are boundaries between regions, they tend to be continuous lines. As a result, edge thinning necessary in gradient-based methods is not needed in Laplacian-based methods. In addition, algorithms that force continuity in edge contours are not as useful in Laplacian-based methods as in gradient-based methods.

Figure 8.33 Edge map obtained by a Laplacian-based edge detector. (a) Image of 512 × 512 pixels; (b) result of convolving the image in (a) with h(n1, n2) in Figure 8.32(a) and then finding zero-crossing points.

The local variance is given by

σf²(n1, n2) = (1/(2M + 1)²) Σ (from k1 = n1 − M to n1 + M) Σ (from k2 = n2 − M to n2 + M) [f(k1, k2) − mf(n1, n2)]²          (8.16a)

where mf(n1, n2) denotes the local mean of f(n1, n2), with M typically chosen around 2. Since σf²(n1, n2) is compared with a threshold, the scaling factor 1/(2M + 1)² in (8.16a) can be eliminated. In addition, the local variance σf² needs to be computed only for (n1, n2) which are zero-crossing points of ∇²f(n1, n2). Figure 8.35 shows the result of applying the system in Figure 8.34 to the image in Figure 8.33(a). Comparison of Figures 8.33(b) and 8.35 shows considerable reduction in the "false" edge contours.

The system in Figure 8.34 can be interpreted as a gradient-based method.
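The Laplacian-based detector of Figure 8.34, which keeps a zero crossing of ∇²f only where the local variance exceeds a threshold, can be sketched as follows. The restriction to horizontal sign changes and the M = 1 window are simplifying assumptions of this sketch.

```python
def laplacian(img):
    # f * h with h from (8.15): the 4-neighbor second difference
    n1, n2 = len(img), len(img[0])
    out = [[0] * n2 for _ in range(n1)]
    for i in range(1, n1 - 1):
        for j in range(1, n2 - 1):
            out[i][j] = (img[i + 1][j] + img[i - 1][j] + img[i][j + 1]
                         + img[i][j - 1] - 4 * img[i][j])
    return out

def local_variance(img, i, j, M=1):
    # sample variance over a (2M+1) x (2M+1) window centered at (i, j)
    vals = [img[a][b] for a in range(i - M, i + M + 1)
            for b in range(j - M, j + M + 1)]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def edge_points(img, var_threshold, M=1):
    lap = laplacian(img)
    n1, n2 = len(img), len(img[0])
    edges = []
    for i in range(M, n1 - M):
        for j in range(M, n2 - M - 1):
            # horizontal zero crossing of the Laplacian, kept only where
            # the local variance is above the chosen threshold
            if lap[i][j] * lap[i][j + 1] < 0 and \
                    local_variance(img, i, j, M) > var_threshold:
                edges.append((i, j))
    return edges
```

On a vertical intensity step the Laplacian changes sign across the step, and the variance test admits those points while rejecting zero crossings in flat, noisy regions.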
The local variance σf²(n1, n2) is closely related to the gradient magnitude. Comparing σf²(n1, n2) with a threshold is similar to comparing the gradient magnitude with a threshold. Requiring that ∇²f(n1, n2) crosses zero at an edge can be interpreted as edge thinning. With this interpretation, we can implement the system in Figure 8.34 by computing σf²(n1, n2) first and then by detecting the zero-crossing points of ∇²f(n1, n2) only at those points where σf²(n1, n2) is above the chosen threshold.

Figure 8.34 Laplacian-based edge detection system that does not produce many false edge contours.

8.3.3 Edge Detection by Marr and Hildreth's Method

In the previous two sections, we discussed edge detection algorithms that produce one edge map from an input image. Marr and Hildreth [Marr and Hildreth; Marr] observed that significant intensity changes occur at different scales (resolution) in an image. For example, blurry shadow regions and sharply focused fine-detail regions may be present in the same image. "Optimal" detection of significant intensity changes, therefore, generally requires the use of operators that respond to several different scales. Marr and Hildreth suggested that the original image be band-limited at several different cutoff frequencies and that an edge detection algorithm be applied to each of the images. The resulting edge maps have edges corresponding to different scales.

Marr and Hildreth argue that edge maps of different scales contain important information about physically significant parameters. The visual world is made of elements such as contours, scratches, and shadows, which are highly localized at their own scale. This localization is also reflected in such physically important changes as reflectance change and illumination change. If the same edge is present in a set of edge maps of different scale, it represents the presence of an image intensity change due to a single physical phenomenon. If an edge is present in only one edge map, one reason may be that two independent physical phenomena are operating to produce intensity changes in the same region of the image.

To bandlimit an image at different cutoff frequencies, the impulse response h(x, y) and frequency response H(Ωx, Ωy) of the lowpass filter proposed [Marr and Hildreth; Canny] is Gaussian-shaped and is given by (8.17), where σ determines the cutoff frequency, with larger σ corresponding to lower cutoff frequency. The choice of Gaussian shape is motivated by the fact that it is smooth and localized in both the spatial and frequency domains. A smooth h(x, y) is less likely to introduce any changes that are not present in the original shape. A more localized h(x, y) is less likely to shift the location of edges.

From the smoothed images, edges can be detected by using the edge detection algorithms discussed in the previous two sections. Depending on which method is used, the lowpass filtering operation in (8.17) and the partial derivative operation used for edge detection may be combined. For example, noting that ∇²[·] and convolution * are linear, we obtain

∇²[h(x, y) * f(x, y)] = [∇²h(x, y)] * f(x, y).          (8.18)

For the Gaussian function h(x, y) in (8.17), ∇²h(x, y) and its Fourier transform are given by (8.19).
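The scale behavior that drives Marr and Hildreth's method can be illustrated with a sampled Gaussian smoother: larger σ gives a lower cutoff frequency and spreads detail over a wider support. This 1-D sketch uses a truncated, renormalized Gaussian, which is a common practical approximation rather than the exact filter of (8.17).

```python
import math

def gaussian_kernel(sigma, radius=None):
    # sampled Gaussian, truncated at about 3*sigma and normalized to unit sum
    r = radius if radius is not None else max(1, int(3 * sigma))
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-r, r + 1)]
    s = sum(w)
    return [v / s for v in w]

def smooth_1d(x, sigma):
    # convolve with the Gaussian kernel; borders handled by replication
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(x)
    return [sum(k[r + m] * x[min(max(i + m, 0), n - 1)] for m in range(-r, r + 1))
            for i in range(n)]
```

Smoothing the same impulse at two scales shows the effect directly: the larger-σ output is lower and broader, so a subsequent Laplacian zero-crossing detector responds only to coarser structure.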
The edge detection algorithms discussed above are general methods, in that they
are developed independent of an application context. An alternative approach is
to develop an edge detection algorithm specific to a particular application problem.
If we know the shape of an edge, for example, this information can be incorporated
in the development of an edge detection algorithm. To illustrate how an edge
detection algorithm specific to an application problem may be developed, we con-
sider the problem of detecting boundaries of coronary arteries from an angiogram
[Abrams].
The coronary arteries are the blood vessels which encircle the heart and supply
blood to the heart muscle. Narrowing of the coronary arteries prevents adequate
blood supply from reaching the heart, causing pain and damage to the heart muscle.
Such damage is called coronary disease. To determine the severity of coronary
disease, a coronary angiogram is used. An angiogram is an X-ray picture of arteries
taken after a contrast agent, typically iodine, has been injected into the vessels.
The contrast agent is injected directly into the arteries through a catheter in order
to achieve high concentrations. An example of a coronary angiogram is shown in
Figure 8.38. Different observers making conventional visual evaluations of an
angiogram will give widely varying evaluations of the severity of the disease.
The most commonly used measure of an obstruction is percentage of stenosis,
which is defined as the maximum percentage of arterial narrowing within a specified
length of the vessel. One approach to estimating the percentage of stenosis begins
with determining the vessel boundaries from an angiogram. We will be concerned
with the problem of detecting the vessel boundaries.
One reasonable model of an angiogram f(n1, n2) is given by

    f(n1, n2) = v(n1, n2) * g(n1, n2) + p(n1, n2) + w(n1, n2)        (8.20)

where v(n1, n2) denotes the vessel, p(n1, n2) denotes the background, g(n1, n2) denotes blurring, w(n1, n2) denotes the background noise, and * denotes convolution.

Figure 8.36 Sketch of (a) ∇²h(x, y) and (b) −F[∇²h(x, y)] in Equation (8.19) for σ² = 1.

Sec. 8.3 Edge Detection
490 Image Enhancement Chap. 8

Figure 8.38 Coronary angiogram.

The vessel function v(n1, n2) is derived from a generalized cone model of a 3-D vessel which is continuous and has elliptical cross sections. The elliptical shape is chosen because of the small number of parameters involved in its characterization and because of some empirical evidence that it leads to a good estimate of percentage of stenosis. The 1-D cross section of v(n1, n2), which consists of one blood vessel, is totally specified by three parameters, two representing the blood vessel boundaries and one related to the X-ray attenuation coefficient of iodine. The continuity of the vessel is guaranteed by fitting a cubic spline function to the vessel boundaries. The background p(n1, n2) is modeled by a 2-D low-order polynomial. Low-order polynomials are very smooth functions, and their choice is motivated by the observation that objects in the background, such as tissue and bone, are much bigger than the blood vessels. The blurring function g(n1, n2) is modeled by a known 2-D Gaussian function that takes into account the blurring introduced at various stages of the imaging process. The noise w(n1, n2) is random background noise and is assumed to be white. The parameters in the model of f(n1, n2) are the vessel parameters, the polynomial coefficients of p(n1, n2), and the noise variance.

The vessels, tissues, bones, and the radiographic imaging process are much more complicated than suggested by the simple model presented above. Nevertheless, the model has been empirically observed to lead to good estimates of the vessel boundaries and corresponding percentage of stenosis. The model parameters may be estimated by a variety of different procedures. One possibility is the maximum likelihood (ML) parameter estimation method discussed in Section 6.1.5. In the ML method, the unknown parameters, denoted by θ, are estimated by maximizing the probability density function p(f0(n1, n2) | θ0), where f(n1, n2) is the angiogram observation and θ represents all the unknown parameters to be estimated. The ML method applied to vessel boundary detection is a nonlinear problem, but has been solved approximately [Pappas and Lim]. Figures 8.39 and 8.40 illustrate the results of applying the ML parameter estimation method to the detection of the blood vessels using the 1-D version of the 2-D model in (8.20). In the 1-D version, f(n1, n2) in (8.20) is considered a 1-D sequence with variable n1 for each n2. Computations simplify considerably when the 1-D model is used. Figure 8.39(a) shows the original angiogram of 80 × 80 pixels, and Figure 8.39(b) shows the detected vessel boundaries superimposed on the original image. Figure 8.40 is another example. Developing an edge detection algorithm specific to an application problem is considerably more complicated than applying the general edge detection algorithms discussed in previous sections. However, it has the potential of leading to much more accurate edge detection.

Figure 8.37 Edge maps obtained from lowpass filtered image. Blurred image with (a) σ² = 4; (b) σ² = 16; (c) σ² = 36. Result of applying Laplacian-based algorithm to the blurred image: (d) σ² = 4; (e) σ² = 16; (f) σ² = 36.
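The 1-D scan-line version of the model in (8.20) is easy to simulate. The sketch below is illustrative only; the elliptical vessel profile, the polynomial background coefficients, and all parameter values are hypothetical stand-ins, not those of [Pappas and Lim]:

```python
import numpy as np

def vessel_profile(n, left, right, alpha):
    """Idealized 1-D vessel cross section: an elliptical density profile
    between the two boundary parameters `left` and `right`, scaled by an
    attenuation parameter `alpha` (all hypothetical stand-ins)."""
    v = np.zeros_like(n, dtype=float)
    inside = (n > left) & (n < right)
    c, r = 0.5 * (left + right), 0.5 * (right - left)
    v[inside] = alpha * np.sqrt(1.0 - ((n[inside] - c) / r) ** 2)
    return v

def scan_line(n, left, right, alpha, poly_coeffs, sigma_blur, sigma_noise, rng):
    """One scan line of f = v * g + p + w: the vessel convolved with a
    Gaussian blur, plus a low-order polynomial background and white noise."""
    v = vessel_profile(n, left, right, alpha)
    g = np.exp(-0.5 * (np.arange(-5, 6) / sigma_blur) ** 2)
    g /= g.sum()                      # normalized Gaussian blur kernel
    p = np.polyval(poly_coeffs, n)    # smooth polynomial background
    w = rng.normal(0.0, sigma_noise, n.shape)
    return np.convolve(v, g, mode="same") + p + w

rng = np.random.default_rng(0)
n = np.arange(80.0)
f = scan_line(n, left=30, right=42, alpha=1.0,
              poly_coeffs=[0.002, 0.1], sigma_blur=1.5,
              sigma_noise=0.05, rng=rng)
```

Fitting the vessel, background, and noise parameters of such a synthetic scan line is exactly the 1-D estimation problem the ML method addresses.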
If fc(x, y) is bandlimited and the sampling frequencies 1/T1 and 1/T2 are higher than the Nyquist rate, from Section 1.5 fc(x, y) can be reconstructed from f(n1, n2) with an ideal D/A converter by

    fc(x, y) = Σ_{n1} Σ_{n2} f(n1, n2) h(x − n1T1, y − n2T2)        (8.22)

where h(x, y) is the impulse response of an ideal separable analog lowpass filter given by

    h(x, y) = [sin((π/T1)x) / ((π/T1)x)] · [sin((π/T2)y) / ((π/T2)y)].        (8.23)
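As a numerical illustration of (8.22) and (8.23), the sketch below assumes T1 = T2 = 1 and truncates the infinite sum to a finite window; both choices are arbitrary assumptions for the example, not part of the equations themselves:

```python
import numpy as np

def sinc_interpolate(f, x, y, half_width=8):
    """Approximate f_c(x, y) via (8.22) with T1 = T2 = 1, truncating the
    ideal lowpass impulse response (8.23) to a finite window (assumption:
    samples outside the window contribute negligibly)."""
    n1 = np.arange(max(0, int(x) - half_width),
                   min(f.shape[0], int(x) + half_width + 1))
    n2 = np.arange(max(0, int(y) - half_width),
                   min(f.shape[1], int(y) + half_width + 1))
    # np.sinc(t) = sin(pi t)/(pi t), exactly the separable kernel in (8.23)
    hx = np.sinc(x - n1)
    hy = np.sinc(y - n2)
    return hx @ f[np.ix_(n1, n2)] @ hy

# At a sample point the truncated sum reproduces the sample (up to
# rounding), since sinc(k) = 0 for every nonzero integer k.
f = np.outer(np.arange(16.0), np.ones(16))
```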
There are several difficulties in using (8.22) and (8.23) for image interpolation. The analog image fc(x, y), even with an antialiasing filter, is not truly bandlimited, so aliasing occurs when fc(x, y) is sampled. In addition, h(x, y) in (8.23) is an infinite-extent function, so evaluation of fc(x, y) using (8.22) cannot be carried out in practice. To approximate the interpolation by (8.22) and (8.23), one approach is to use a lowpass filter h(x, y) that is spatially limited. For a spatially limited h(x, y) that simply replicates the value of the nearest pixel, the method is called zero-order interpolation. In zero-order interpolation, fc(x, y) is chosen as f(n1, n2) at the pixel closest to (x, y). Other examples of h(x, y) which are more commonly used are functions of smoother shape, such as the spatially limited Gaussian function or the windowed ideal lowpass filter.

Another simple method widely used in practice is bilinear interpolation. In this method, fc(x, y) is evaluated by a linear combination of f(n1, n2) at the four closest pixels. Suppose we wish to evaluate fc(x, y) for n1T1 ≤ x ≤ (n1 + 1)T1 and n2T2 ≤ y ≤ (n2 + 1)T2, as shown in Figure 8.41. The interpolated fc(x, y) in the bilinear interpolation method is

    fc(x, y) = (1 − Δx)(1 − Δy) f(n1, n2) + (1 − Δx)Δy f(n1, n2 + 1)
             + Δx(1 − Δy) f(n1 + 1, n2) + ΔxΔy f(n1 + 1, n2 + 1)        (8.25a)

where Δx = (x − n1T1)/T1        (8.25b)

and Δy = (y − n2T2)/T2.        (8.25c)

Figure 8.41 Region where fc(x, y) is interpolated from the four neighboring pixels fc(n1T1, n2T2), fc((n1 + 1)T1, n2T2), fc(n1T1, (n2 + 1)T2), and fc((n1 + 1)T1, (n2 + 1)T2).

Another method is polynomial interpolation. Consider a local spatial region, say 3 × 3 or 5 × 5 pixels, over which fc(x, y) is approximated by a polynomial. The interpolated image fc(x, y) is

    fc(x, y) = Σ_{i=1}^{N} Si φi(x, y)        (8.26)

where φi(x, y) is a term in a polynomial. An example of φi(x, y) when N = 6 is

    φ1 = 1, φ2 = x, φ3 = y, φ4 = x², φ5 = xy, φ6 = y².        (8.27)

The coefficients Si are obtained by minimizing

    Error = Σ_{(x,y)∈ψ} [f(x, y) − fc(x, y)]²        (8.28)

where ψ denotes the pixels over which f(x, y) is approximated. Solving (8.28) is a simple linear problem, since the φi(x, y) are fixed. Advantages of polynomial interpolation include the smoothness of fc(x, y) and the simplicity of evaluating ∂fc(x, y)/∂x and ∂fc(x, y)/∂y, partial derivatives which are used in such applications as edge detection and motion estimation. In addition, by fitting a polynomial with fewer coefficients than the number of pixels in the region ψ in (8.28), some noise smoothing can be accomplished. Noise smoothing is particularly useful in applications where the partial derivatives ∂fc(x, y)/∂x and ∂fc(x, y)/∂y are used.

Spatial interpolation schemes can also be developed using motion estimation algorithms discussed in the next section. One example, where an image frame that consists of two image fields is constructed from a single image field, is discussed in Section 8.4.4.

Figure 8.42 shows an example of image interpolation. Figure 8.42(a) shows an image of 256 × 256 pixels interpolated by zero-order hold from an original image of 64 × 64 pixels. Figure 8.42(b) shows an image of 256 × 256 pixels obtained by bilinear interpolation of the same original image of 64 × 64 pixels.

Figure 8.42 Example of spatial interpolation. (a) Image of 256 × 256 pixels interpolated by zero-order hold from an original image of 64 × 64 pixels; (b) image of 256 × 256 pixels obtained by bilinear interpolation of the same original image used in (a).

Sec. 8.4 Image Interpolation and Motion Estimation 497
496 Image Enhancement Chap. 8

8.4.2 Motion Estimation

New image frames can be created from existing ones through temporal interpolation. Unlike spatial interpolation, temporal interpolation requires a large amount
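The bilinear rule (8.25) takes only a few lines of code. The sketch below assumes T1 = T2 = 1 and unit-spaced samples, a hypothetical setup chosen for illustration:

```python
import numpy as np

def bilinear(f, x, y):
    """Bilinear interpolation of (8.25) with T1 = T2 = 1: f_c(x, y) from
    the four samples surrounding (x, y)."""
    n1, n2 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - n1, y - n2          # Δx and Δy of (8.25b), (8.25c)
    return ((1 - dx) * (1 - dy) * f[n1, n2]
            + (1 - dx) * dy * f[n1, n2 + 1]
            + dx * (1 - dy) * f[n1 + 1, n2]
            + dx * dy * f[n1 + 1, n2 + 1])       # (8.25a)

def zero_order(f, x, y):
    """Zero-order interpolation, for comparison: take the nearest sample."""
    return f[int(round(x)), int(round(y))]

f = np.array([[0.0, 1.0], [2.0, 3.0]])
val = bilinear(f, 0.5, 0.5)   # center of the four samples
```

At the center of the four samples the bilinear result is simply their average, while zero-order interpolation jumps to whichever sample is nearest.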
of storage. Therefore a new frame is usually created from two adjacent frames,
one in the past and one in the future relative to the frame being created.
The simplest method, often used in practice, is the zero-order hold method,
in which a new frame is created by repeating the existing frame which is closest in
time. In transforming a 24 frames/sec motion picture to a 60 fields/sec NTSC
signal, three consecutive fields are created from a single motion picture frame, and
the next two consecutive fields are created from the next motion picture frame.
This process is repeated for the entire motion picture. This is known as the 3:2
pull-down method. For most scenes without large global motion, the results are
quite good. If there is large global motion, however, zero-order hold temporal
interpolation can cause noticeable jerkiness of motion. One method to improve
the performance of temporal interpolation is through motion compensation.
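The 3:2 pull-down pattern is simple to state in code. In this sketch the function name and the convention that even-numbered frames receive three fields are arbitrary choices; the text specifies only the alternating three-field/two-field repetition:

```python
def pulldown_3_2(num_frames):
    """Map 24 frames/sec film to 60 fields/sec video by 3:2 pull-down:
    alternately repeat a frame for three fields, then for two fields."""
    fields = []
    for k in range(num_frames):
        fields.extend([k] * (3 if k % 2 == 0 else 2))
    return fields

fields = pulldown_3_2(24)   # one second of film -> one second of fields
```

One second of film (24 frames) maps to exactly 12 × 3 + 12 × 2 = 60 fields.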
A motion picture or television broadcast is a sequence of still frames that are displayed in rapid succession. The frame rate necessary to achieve proper motion rendition is usually high enough to ensure a great deal of temporal redundancy among adjacent frames. Much of the variation in intensity from one frame to the next is due to object motion. The process of determining the movement of objects within a sequence of image frames is known as motion estimation. Processing images accounting for the presence of motion is called motion-compensated image processing.

Motion-compensated image processing has a variety of applications. One application is image interpolation. By estimating motion parameters, we can create a new frame between two adjacent existing frames. The application of motion-compensated processing to image interpolation is discussed in the next section. Another application is image restoration. If we can estimate the motion parameters and identify regions in different frames where image intensities are expected to be the same or similar, temporal filtering can be performed in those regions. Application to image restoration is discussed in Chapter 9. Motion-compensated image processing can also be applied to image coding. By predicting the intensity of the current frame from the previous frames, we can limit our coding to the difference in intensities between the current frame and the predicted current frame. In addition, we may be able to discard some frames and reconstruct the discarded frames through interpolation from the coded frames. Application to image coding is discussed in Chapter 10.

The motion estimation problem we consider here is the translational motion of objects. Let f(x, y, t−1) and f(x, y, t0) denote the image intensity at times t−1 and t0, respectively. We will refer to f(x, y, t−1) and f(x, y, t0) as the past and current frame. We assume that

    f(x, y, t0) = f(x − dx, y − dy, t−1)        (8.29)

where dx and dy are the horizontal and vertical displacement between t−1 and t0. An example of f(x, y, t−1) and f(x, y, t0) which satisfy (8.29) is shown in Figure 8.43. If we assume uniform motion between t−1 and t0,

    f(x, y, t) = f(x − vx(t − t−1), y − vy(t − t−1), t−1),   t−1 ≤ t ≤ t0        (8.30)

where vx and vy are the uniform horizontal and vertical velocities.

Figure 8.43 Image translated with displacement of (dx, dy). (a) f(x, y, t−1); (b) f(x, y, t0).

A direct consequence of (8.30) is a differential equation which relates vx and vy to ∂f(x, y, t)/∂x, ∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t, which is valid in the spatio-temporal region over which uniform translational motion is assumed. To derive the relationship, let f(x, y, t−1) be denoted by s(x, y):

    s(x, y) = f(x, y, t−1).        (8.31)

From (8.30) and (8.31),

    f(x, y, t) = s(α(x, y, t), β(x, y, t))        (8.32a)

where α(x, y, t) = x − vx(t − t−1)        (8.32b)

and β(x, y, t) = y − vy(t − t−1).        (8.32c)

From (8.32), assuming ∂f(x, y, t)/∂x, ∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t exist, we obtain

    ∂f/∂x = ∂s/∂α,   ∂f/∂y = ∂s/∂β,   ∂f/∂t = −vx ∂s/∂α − vy ∂s/∂β.        (8.33)

From (8.33),

    vx ∂f(x, y, t)/∂x + vy ∂f(x, y, t)/∂y + ∂f(x, y, t)/∂t = 0.        (8.34)

Equation (8.34) is called a spatio-temporal constraint equation and can be generalized to incorporate other types of motion, such as zooming [Martinez].

The assumption of simple translation that led to (8.29) and the additional assumption of translation with uniform velocity that led to (8.34) are highly restrictive. For example, they do not allow for object rotation, camera zoom, regions uncovered by translational object motion, or multiple objects moving with different vx and vy. However, by assuming uniform translational motion only locally and estimating the two motion parameters (dx, dy) or (vx, vy) at each pixel or at each small subimage, (8.29) and (8.34) are valid for background regions that are not affected by object motion and for regions occupied by objects which do indeed translate with a uniform velocity. Such regions occupy a significant portion of a typical sequence of image frames. In addition, if we identify the regions where motion estimates are not accurate, we can suppress motion-compensated processing in those regions. In image interpolation, for example, we can assume that vx and vy are zero.

Motion estimation methods can be classified broadly into two groups, that is, region matching methods and spatio-temporal constraint methods. Region matching methods are based on (8.29), and spatio-temporal constraint methods are based on (8.34). We first discuss region matching methods.

Region matching methods. Region matching methods involve considering a small region in an image frame and searching for the displacement which produces the "best match" among possible regions in an adjacent frame. In region matching methods, the displacement vector (dx, dy) is estimated by minimizing

    Error = ∫∫_{(x,y)∈R} C[f(x, y, t0), f(x − dx, y − dy, t−1)] dx dy        (8.35)

where R is the local spatial region used to estimate (dx, dy) and C[·, ·] is a metric that indicates the amount of dissimilarity between the two arguments. The integrals in (8.35) can be replaced by summation if f(x, y, t) is sampled at the spatial variables (x, y). If we estimate (dx, dy) at time t0, the region R is the local spatial region that surrounds the particular spatial position at which (dx, dy) is estimated. The size of R is dictated by several considerations. If it is chosen too large, the assumption that (dx, dy) is approximately constant over the region R may not be valid, and evaluation of the error expression requires more computations. If it is chosen too small, the estimates may become very sensitive to noise. One reasonable choice based on these considerations is a 5 × 5-pixel region. There are many possible choices for the dissimilarity function C[·, ·]. Two commonly used choices are the squared difference and absolute difference between the two arguments. With these choices of C[·, ·], (8.35) reduces to

    Error = ∫∫_{(x,y)∈R} [f(x, y, t0) − f(x − dx, y − dy, t−1)]² dx dy        (8.36)

and

    Error = ∫∫_{(x,y)∈R} |f(x, y, t0) − f(x − dx, y − dy, t−1)| dx dy.        (8.37)

The function f(x, y, t0) − f(x − dx, y − dy, t−1) is called the displaced frame difference. In typical applications of motion-compensated processing, the system performance is not very sensitive to the specific choice of the dissimilarity function. To the extent that (8.29) is valid, the error expression in (8.36) or (8.37) is zero at the correct (dx, dy).

Minimizing the Error in (8.36) or (8.37) is a nonlinear problem. Attempts to solve this nonlinear problem have produced many variations, which can be grouped into block matching and recursive methods. We discuss block matching methods first.

One straightforward approach to solve the above minimization problem is to evaluate the Error for every possible (dx, dy) within some reasonable range and choose the (dx, dy) at which the Error is minimum. In this approach, a block of pixel intensities at time t0 is matched directly to a block at time t−1. This is the basis of block matching methods. Since the error expression has to be evaluated at many values of (dx, dy), this method of estimating (dx, dy) is computationally very expensive, and many methods have been developed to reduce computations. In one simplification, we assume that (dx, dy) is constant over a block, say of 7 × 7 pixels. Under this assumption, we divide the image into many blocks and we estimate (dx, dy) for each block. Even though we generally choose the block size to be the same as the size of R in (8.35), it is not necessary to do so. In another simplification, we can limit the search space to integer values of (dx, dy). In addition to reducing the search space from continuous variables (dx, dy) to discrete variables, limiting the search space to integer values allows us to determine f(n1 − dx, n2 − dy, t−1), necessary in the evaluation of the error expression, without interpolation. However, the estimates of (dx, dy) are restricted to discrete values.

We can reduce the computational requirements in block matching methods further by using a more efficient search procedure than a brute-force search. One such method is called a three-step search method, illustrated in Figure 8.44. In the first step of this method, the error expression is evaluated at nine values of (dx, dy) which are marked by "1" and filled circles. Among these nine values, we choose the (dx, dy) with the smallest Error. Suppose the smallest Error is at (dx = 3, dy = −3). In the second step, we evaluate the error expression at eight additional values of (dx, dy) which are marked by "2" and filled squares. We now choose (dx, dy) from nine values [eight new values and (dx = 3, dy = −3)]. This procedure is repeated one more time. At the end of the third step, we have an estimate of (dx, dy). This procedure can be easily generalized to more than three steps to increase the range of possible (dx, dy). Another search method is to estimate dx first by searching (dx, 0). Once dx is estimated, say d̂x, dy is estimated by searching (d̂x, dy). If we wish to improve the estimate further, we can reestimate dx by searching (dx, d̂y), where d̂y is the estimate of dy obtained in the previous step. At each step in this procedure, we estimate only one parameter, which is considerably simpler than estimating two parameters jointly. These heuristic methods reduce the number of computations by reducing the number of values of (dx, dy) at which the error expression is evaluated. However, the Error at the estimated (dx, dy) may not be the global minimum.

Figure 8.44 Illustration of three-step search method.

In block matching methods, we estimate (dx, dy) by explicitly evaluating the Error at some specified set of (dx, dy). As an alternative, we can use descent algorithms, such as the steepest descent, Newton-Raphson, and Davidon-Fletcher-Powell methods discussed in Section 5.2.3, to solve the nonlinear problem of minimizing the Error with respect to (dx, dy). In this class of algorithms, a recursive (iterative) procedure is used to improve the estimate in each iteration. For this reason, they are called recursive methods.

Let (d̂x(k), d̂y(k)) denote the estimate of (dx, dy) after the kth iteration. In recursive methods, the estimate of (dx, dy) after the (k + 1)th iteration, (d̂x(k + 1), d̂y(k + 1)), is obtained by

    d̂x(k + 1) = d̂x(k) + ux(k)        (8.38a)

    d̂y(k + 1) = d̂y(k) + uy(k)        (8.38b)

where ux(k) and uy(k) are the update or correction terms. The update terms vary with the specific descent algorithm used. For the steepest-descent method, for example,

    ux(k) = −C ∂Error(dx, dy)/∂dx evaluated at (d̂x(k), d̂y(k))        (8.39a)

    uy(k) = −C ∂Error(dx, dy)/∂dy evaluated at (d̂x(k), d̂y(k))        (8.39b)

where C is a step size that can be adjusted and Error(dx, dy) is the Error in (8.35) as a function of dx and dy for a given R. Recursive methods typically involve partial derivatives and tend to be sensitive to the presence of noise or fine details in an image. Smoothing the image before motion estimation often improves the performance of recursive methods.

In recursive methods, (dx, dy) is not limited to integer values and can be estimated within subpixel accuracy. Update terms typically include evaluation of partial derivatives of Error(dx, dy), which involves evaluation of f(x, y, t−1) and its partial derivatives at an arbitrary spatial point. In practice, f(x, y, t−1) is known only at x = n1T1 and y = n2T2. To evaluate the necessary quantities at an arbitrary spatial point (x, y), we can use the spatial interpolation techniques discussed in Section 8.4.1.

In recursive methods, (dx, dy) is typically estimated at each pixel. In using the recursion relationship in (8.38), (d̂x(0), d̂y(0)) is typically obtained from the estimate at the adjacent pixel in the same horizontal scan line, in the adjacent scan line, or in the adjacent frame. These methods are called pel (picture element) recursive estimation with horizontal, vertical, and temporal recursion, respectively. Given (d̂x(k), d̂y(k)), we can use the recursion relation in (8.38) only once for a pixel and then move on to the next pixel. Alternatively, we can use the recursion relation more than once for a more accurate estimate of (dx, dy) before we move on to the next pixel.

Although we classified region matching methods into block matching methods and recursive methods to be consistent with the literature, the boundary between the two classes of methods is fuzzy. By choosing a finer grid at which the error expression is evaluated, we can also estimate (dx, dy) within subpixel accuracy with block matching methods. In addition, the three-step search procedure discussed as a block matching method can be viewed as a recursive method in which the estimate is improved iteratively. A major disadvantage of region matching methods is in the amount of computation required. Even though only two parameters, dx and dy, must be estimated, solving the nonlinear problem at each pixel or at each small subimage can be computationally very expensive.

Spatio-temporal constraint methods. Algorithms of this class are based on the spatio-temporal constraint equation (8.34), which can be viewed as a linear equation for two unknown parameters vx and vy under the assumption that ∂f(x, y, t)/∂x, ∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t are given. By evaluating ∂f(x, y, t)/∂x,
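The three-step search idea can be sketched in a few lines. In this sketch the image, the block size and position, and the step schedule (probing the 3 × 3 neighborhood of the current best estimate and halving the step) are illustrative assumptions; the book's figure fixes a specific nine-point pattern that this code only approximates:

```python
import numpy as np

def block_error(cur, ref, top, left, size, dx, dy):
    """Squared-difference error of (8.36), summed over one block: the
    block at time t0 against the block displaced by (dx, dy) at t-1."""
    cur_blk = cur[top:top + size, left:left + size]
    ref_blk = ref[top - dx:top - dx + size, left - dy:left - dy + size]
    return float(np.sum((cur_blk - ref_blk) ** 2))

def three_step_search(cur, ref, top, left, size=8, step=4):
    """At each stage evaluate the error at the 3 x 3 pattern of offsets
    around the current best estimate, keep the minimum, halve the step."""
    best = (0, 0)
    while step >= 1:
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best = min(candidates,
                   key=lambda d: block_error(cur, ref, top, left, size, *d))
        step //= 2
    return best

# A smooth test image (Gaussian blob) and a copy shifted down 3, right 2.
i = np.arange(32)
X, Y = np.meshgrid(i, i, indexing="ij")
ref = np.exp(-((X - 16.0) ** 2 + (Y - 16.0) ** 2) / 50.0)
cur = np.roll(np.roll(ref, 3, axis=0), 2, axis=1)
d = three_step_search(cur, ref, top=12, left=12)
```

With steps 4, 2, 1 the search covers displacements up to ±7 while evaluating the error at only about 27 candidates instead of the 15 × 15 = 225 a brute-force search would require.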
∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t at many points (xi, yi, ti), 1 ≤ i ≤ N, at which vx and vy are assumed constant, we can obtain an overdetermined set of linear equations:

    vx ∂f(xi, yi, ti)/∂x + vy ∂f(xi, yi, ti)/∂y + ∂f(xi, yi, ti)/∂t = 0,   1 ≤ i ≤ N.        (8.40)

The velocity estimates can be obtained by minimizing

    Error = Σ_{i=1}^{N} [vx ∂f(xi, yi, ti)/∂x + vy ∂f(xi, yi, ti)/∂y + ∂f(xi, yi, ti)/∂t]².        (8.41)

Since the error expression in (8.41) is a quadratic form of the unknown parameters vx and vy, the solution requires solving two linear equations. More generally, suppose (8.34) is valid in a local spatio-temporal region denoted by ψ. To estimate vx and vy, we minimize

    Error = ∫∫∫_{(x,y,t)∈ψ} [vx ∂f(x, y, t)/∂x + vy ∂f(x, y, t)/∂y + ∂f(x, y, t)/∂t]² dx dy dt.        (8.42)

The integrals in (8.42) may be replaced by summations. One such example is (8.41). Differentiating the Error in (8.42) with respect to vx and vy and setting the results to zero leads to

    Wv = γ        (8.43a)

where, writing fx, fy, and ft for the three partial derivatives of f(x, y, t),

    W = ∫∫∫_{(x,y,t)∈ψ} [fx², fxfy; fxfy, fy²] dx dy dt,   v = [vx, vy]ᵀ,
    γ = −∫∫∫_{(x,y,t)∈ψ} [fxft, fyft]ᵀ dx dy dt.        (8.43b)

The two linear equations in (8.43) may have multiple solutions. Suppose f(x, y, t) is constant in the spatio-temporal region ψ. Then fx, fy, and ft are all zero, and all the elements in W and γ in (8.43) are zero. Therefore, any (vx, vy) will satisfy (8.43a). Any velocity in a uniform intensity region will not affect f(x, y, t), so the true velocity cannot be estimated from f(x, y, t). Suppose f(x, y, t) is a perfect step edge. The velocity along the direction parallel to the step edge will not affect f(x, y, t) and therefore cannot be estimated. These problems have been studied, and a solution has been developed [Martinez]. Let λ1 and λ2 denote the eigenvalues of W, with λ1 ≥ λ2, and let Φ1 and Φ2 denote the corresponding orthonormal eigenvectors. A reasonable solution to (8.43) is given by (see Problem 8.24)

    Case 1.  v = 0,   λ1, λ2 < threshold        (8.44a)

    Case 2.  v = (Φ1ᵀγ/λ1)Φ1,   λ1 ≥ threshold, λ2 < threshold        (8.44b)

    Case 3.  v = W⁻¹γ,   otherwise.        (8.44c)

Case 1 includes the uniform intensity region, where the velocity is set to zero. Case 2 includes the perfect step edge region, and the velocity estimate in (8.44b) is along the direction perpendicular to the step edge.

Solving the linear equations in (8.43) requires evaluation of ∂f(x, y, t)/∂x, ∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t at arbitrary spatio-temporal positions. This can be accomplished by extending the spatial polynomial interpolation method to 3-D, which has the advantage over other approaches in computational simplicity and robustness to noise. In the 3-D polynomial interpolation, the interpolated f(x, y, t) is

    f(x, y, t) = Σ_{i=1}^{N} Si φi(x, y, t).        (8.45)

One reasonable choice of the region ψ typically contains 50 pixels: 5 for n1, 5 for n2, and 2 for t. Minimizing the error expression in (8.47) with respect to Si requires solving a set of linear equations. Note that the partial derivatives ∂f(x, y, t)/∂x, ∂f(x, y, t)/∂y, and ∂f(x, y, t)/∂t used in (8.43) can be precomputed in terms of Si.

The motion estimation algorithms discussed above require determination of the spatio-temporal regions denoted by ψ over which the uniform translational motion can be assumed. Since a local spatial region in a frame is on the order of 5 × 5 pixels in size, determining a reasonable ψ requires an initial displacement estimate within a few pixels of the true displacement. In practice, it is not un-
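Solving Wv = γ with an eigenvalue test of the kind in (8.44) can be sketched as follows. The threshold value and the synthetic derivative data are illustrative assumptions, not values from the text:

```python
import numpy as np

def estimate_velocity(fx, fy, ft, threshold=1e-6):
    """Least-squares velocity from sampled partial derivatives over a
    region: build W and gamma as in (8.43), then apply eigenvalue cases
    in the spirit of (8.44). The threshold is an arbitrary choice."""
    W = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    gamma = -np.array([np.sum(fx * ft), np.sum(fy * ft)])
    lam, phi = np.linalg.eigh(W)       # ascending eigenvalues, columns = Φ
    if lam[1] < threshold:             # Case 1: uniform intensity region
        return np.zeros(2)
    if lam[0] < threshold:             # Case 2: step-edge-like region
        p1 = phi[:, 1]                 # dominant eigenvector of W
        return (p1 @ gamma / lam[1]) * p1
    return np.linalg.solve(W, gamma)   # Case 3: full solution

# Synthetic check: a translating ramp f(x, y, t) = x - 2t has fx = 1,
# fy = 0, ft = -2 everywhere, so the constraint gives vx = 2 while vy is
# undetermined (a rank-deficient, step-edge-like W handled by Case 2).
fx = np.ones(25)
fy = np.zeros(25)
ft = -2.0 * np.ones(25)
v = estimate_velocity(fx, fy, ft)
```

Because the Case 2 estimate projects γ onto the dominant eigenvector, the result is insensitive to the arbitrary sign `eigh` assigns to that eigenvector.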
Once the horizontal displacement (or velocity) is estimated, it can be used in spatial interpolation in a manner analogous to the temporal interpolation discussed in Section 8.4.3. Figure 8.48 illustrates the performance of a spatial interpolation algorithm based on (8.50). Figure 8.48(a) shows a frame of 256 × 256 pixels obtained by repeating each horizontal line of a 256 × 128-pixel image. Figure 8.48(b) shows the frame obtained by bilinear spatial interpolation. Figure 8.48(c) shows the frame obtained by estimating the horizontal displacement based on (8.50) and then using the estimate for spatial interpolation. Spatial continuity of lines and contours is preserved better in the image in Figure 8.48(c) than in the other two images in Figures 8.48(a) and (b).

8.5 FALSE COLOR AND PSEUDOCOLOR
It is well known that the human visual system is quite sensitive to color. The
number of distinguishable intensities, for example, is much smaller than the number
of distinguishable colors and intensities. In addition, color images are generally
much more pleasant to view than black-and-white images. The aesthetic aspect
of color can be used for image enhancement. In some applications, such as tele-
vision commercials, false color can be used to emphasize a particular object in an
image. For example, a red banana in a surrounding of other fruits of natural color
will receive more of a viewer's attention. In other applications, data that do not
represent an image in the conventional sense can be represented by a color image.
In this case, the color used is called pseudocolor. As an example, a speech spec-
trogram showing speech energy as a function of time and frequency can be rep-
resented by a color image, with silence, voiced segments, and unvoiced segments
distinguished by different colors and energy represented by color brightness.
The use of color in image enhancement is limited only by artistic imagination, and there are no simple guidelines or rules to follow. In this section, therefore, we will concentrate on three examples that illustrate the type of image enhancement that can be achieved by using color. In the first example, we transform a monochrome (black and white) image to a color image by using a very simple rule. To
obtain a color image from a monochrome image, the monochrome image is first
filtered by a lowpass filter, a bandpass filter, and a highpass filter. The lowpass
filtered image is considered to be the blue component of the resulting color image.
The bandpass filtered image is considered the green component, and the highpass
filtered image is considered the red component. The three components-red,
green, and blue-are combined to form a color image. Figure 8.49(a) (see color
insert) shows an original monochrome image of 512 X 512 pixels. Figure 8.49(b)
shows the color image obtained by using this procedure. The color is pleasant,
but this arbitrary procedure does not generate a natural-looking color image.
Changing classic black-and-white movies such as Casablanca or It's A Wonderful
Life to color movies requires much more sophisticated processing and a great deal
of human intervention.
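The lowpass-to-blue, bandpass-to-green, highpass-to-red rule of the first example can be sketched as below. The crude frequency split via radial Fourier-domain masks and the particular cutoff radii are illustrative assumptions; the text does not specify the filters:

```python
import numpy as np

def false_color(mono):
    """Simple false-color rule: lowpass component -> blue, bandpass ->
    green, highpass -> red, combined into one RGB image."""
    F = np.fft.fftshift(np.fft.fft2(mono))
    h, w = mono.shape
    y, x = np.ogrid[:h, :w]
    r = np.hypot(y - h / 2, x - w / 2) / (min(h, w) / 2)  # radial frequency

    def band(lo, hi):
        # Keep only frequencies with lo <= r < hi, then invert.
        mask = (r >= lo) & (r < hi)
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

    blue = band(0.0, 0.15)    # lowpass component
    green = band(0.15, 0.5)   # bandpass component
    red = band(0.5, 2.0)      # highpass component
    return np.stack([red, green, blue], axis=-1)

mono = np.random.default_rng(2).random((64, 64))
rgb = false_color(mono)
```

Since the three radial masks partition the frequency plane, the three color components sum back to the original monochrome image.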
In the second example, we consider the display of a 2-D spectral estimate on
a CRT. The 2-D spectral estimate, represented by P,(w,, w,) in dB, is typically
displayed by using a contour plot. A n example of a 2-D maximum likelihood
spectral estimate for the data of two sinusoids in white noise is shown in Figure
8.50(a). The maximum corresponds to 0 dB and the contours are in increments
(c)
of 0.5 dB downward from the maximum point. As we discussed in Chapter 6, in
Figure 8.48 Creation of a frame from a field by spatial interpolation. (a) Image of 256 X such applications as detection of low-flying aircraft by an array of microphone
256 pixels obtained from an image of 256 x 128 pixels by zero-order hold interpolation; sensors, we wish to determine the number of sinusoids present and their frequen-
(b) same as (a) obtained by bilinear interpolation; (c) same as (a) obtained by application
of a motion estimation algorithm. Sec. 8.5 False Color and Pseudocolor 511
H. L. Abrams, ed., Coronary Artiography. Boston: Little, Brown and Company, 1983.
J. K. Aggarwal and N. Nandhakumar, On the computation of motion from sequences of
images-a review, Proc. IEEE, Vol. 76, August 1988, pp. 917-935.
G. R. Arce, N. C. Gallagher, and T. A. Nodes, Median filters: Theory for one or two
dimensional filters, Advances in Computer Vision and Image Processing, T. S. Huang,
ed., Greenwich, CT:JAI Press, 1986.
G. R. Arce and M. P. McLoughlin, Theoretical analysis of the maxlmedian filter, IEEE
Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-35, January 1987, pp. 60-69.
Figure 8.50 Display of spectral estimate using pseudocolor. (a) 2-D maximum likelihood E. Ataman, V. K. Aatre, and K. M. Wong, A fast method for real-time median filtering,
spectral estimate represented by a contour plot; (b) spectral estimate in (a) represented by IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-28, August 1980, pp. 415-421.
a color image (see color insert).
M. Bierling and R. Thoma, Motion compensating field interpolation using a hierarchically
structured displacement estimator, Signal Processing, Vol. 11, December 1986, pp. 387-
cies. An alternative way of representing the spectral estimate is to use pseudocolor. 404.
Figure 8.50(b) (see color insert) gives an example, where different amplitudes of J. Canny, A computational approach to edge detection, IEEE Trans. on Patt. Ana. Mach.
P,(o,, o,) have been mapped to different colors. Comparing the two figures shows Intell., Vol. PAMI-8, November 1986, pp. 679-698.
that the two peaks and their locations in the spectral estimate stand out more clearly in Figure 8.50(b).

The third example is the display of range information using color [Sullivan et al.]. In such applications as infrared radar imaging systems, range information and image intensity are available. Figure 8.51(a) (see color insert) shows an intensity image of several buildings located two to four kilometers away from the radar; the range information has been discarded. Figure 8.51(b) shows an image that uses color to display range information. The range value determines the hue, and the intensity determines the brightness level of the chosen hue. The most striking aspect of this technique is demonstrated by the observation that a horizontal line seen at close range (actually a telephone wire) is visible in Figure 8.51(b), but is completely obscured in Figure 8.51(a).

REFERENCES

For readings on gray scale modification, see [Hall et al. (1971); Troy et al.; Gonzales and Fittes; Woods and Gonzales]. For image enhancement by lowpass filtering and highpass filtering, see [O'Handley and Green; Hall and Awtrey]. For unsharp masking, see [Schreiber (1970)]. For readings on homomorphic processing for image enhancement, see [Oppenheim et al.; Schreiber (1978)]. See [Peli and Lim] for adaptive filtering for image enhancement.

R. E. Crochiere and L. R. Rabiner, Interpolation and decimation of digital signals - a tutorial review, Proc. IEEE, Vol. 69, March 1981, pp. 300-331.
L. S. Davis, A survey of edge detection techniques, Computer Graphics and Image Processing, Vol. 4, 1975, pp. 248-270.
L. S. Davis and A. Mitiche, Edge detection in textures, Computer Graphics and Image Processing, Vol. 12, 1980, pp. 25-39.
E. Dubois, The sampling and reconstruction of time-varying imagery with application in video systems, Proc. IEEE, Vol. 73, April 1985, pp. 502-522.
R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
N. C. Gallagher, Jr. and G. L. Wise, A theoretical analysis of the properties of median filters, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-29, December 1981, pp. 1136-1141.
C. Gazley, J. E. Reiber, and R. H. Stratton, Computer works a new trick in seeing pseudocolor processing, Aeronaut. Astronaut., Vol. 4, 1967, p. 76.
R. C. Gonzales and B. A. Fittes, Gray-level transformation for interactive image enhancement, Mech. Mach. Theory, Vol. 12, 1977, pp. 111-122.
J. E. Hall and J. D. Awtrey, Real-time image enhancement using 3 x 3 pixel neighborhood operator functions, Opt. Eng., Vol. 19, May/June 1980, pp. 421-424.

512 Image Enhancement Chap. 8
Chap. 8 References 513
E. L. Hall, R. P. Kruger, S. J. Dwyer, III, D. L. Hall, R. W. McLaren, and G. S. Lodwick, A survey of preprocessing and feature extraction techniques for radiographic images, IEEE Trans. Computer, Vol. C-20, 1971, pp. 1032-1044.
T. S. Huang and R. Y. Tsai, Image Sequence Analysis. Berlin: Springer-Verlag, 1981, Chapter 1.
T. S. Huang, G. J. Yang, and G. Y. Tang, A fast two-dimensional median filtering algorithm, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-27, February 1979, pp. 13-18.
E. R. Kreins and L. J. Allison, Color enhancement of Nimbus high resolution infrared radiometer data, Appl. Opt., Vol. 9, 1970, p. 681.
J. S. Lim, ed., Speech Enhancement. Englewood Cliffs, NJ: Prentice-Hall, 1983.
D. Marr, Vision, A Computational Investigation into the Human Representation of Visual Information. New York: W. H. Freeman and Company, 1982.
D. Marr and E. Hildreth, Theory of edge detection, Proc. R. Soc. London, Vol. B207, 1980, pp. 187-217.
D. M. Martinez, Model-based motion estimation and its application to restoration and interpolation of motion pictures, Ph.D. Thesis, M.I.T., Dept. of Elec. Eng. and Comp. Sci., 1986.
H. G. Musmann, P. Pirsch, and H. Grallert, Advances in picture coding, Proc. IEEE, Vol. 73, April 1985, pp. 523-548.
A. N. Netravali and J. D. Robbins, Motion-compensated coding: some new results, The Bell System Tech. J., Vol. 59, November 1980, pp. 1735-1745.
T. A. Nodes and N. C. Gallagher, Jr., Two-dimensional root structures and convergence properties of the separable median filter, IEEE Trans. Acoust., Speech and Sig. Proc., December 1983, pp. 1350-1365.
D. A. O'Handley and W. B. Green, Recent developments in digital image processing at the image processing laboratory at the Jet Propulsion Laboratory, Proc. IEEE, Vol. 60, July 1972, pp. 821-828.
A. V. Oppenheim, R. W. Schafer, and T. G. Stockham, Jr., Nonlinear filtering of multiplied and convolved signals, Proc. IEEE, Vol. 56, 1968, pp. 1264-1291.
T. N. Pappas and J. S. Lim, A new method for estimation of coronary artery dimensions in angiograms, IEEE Trans. Acoust., Speech and Sig. Proc., Vol. ASSP-36, September 1988, pp. 1501-1513.
R. Paquin and E. Dubois, A spatio-temporal gradient method for estimating the displacement field in time-varying imagery, Computer Vision, Graphics, and Image Processing, Vol. 21, 1983, pp. 205-221.
T. Peli and J. S. Lim, Adaptive filtering for image enhancement, J. Opt. Eng., Vol. 21, January/February 1982, pp. 108-112.
T. Peli and D. Malah, A study of edge detection algorithms, Computer Graphics and Image Processing, Vol. 20, 1982, pp. 1-20.
L. G. Roberts, Machine perception of three-dimensional solids, in Optical and Electro-Optical Information Processing, J. T. Tippett et al., eds., Cambridge, MA: MIT Press, 1965, pp. 159-197.
W. F. Schreiber, Image processing for quality improvement, Proc. IEEE, Vol. 66, December 1978, pp. 1640-1651.
W. F. Schreiber, Wirephoto quality improvement by unsharp masking, J. Pattern Recognition, Vol. 2, 1970, pp. 117-121.
G. B. Shaw, Local and regional edge detectors: some comparisons, Computer Graphics and Image Processing, Vol. 9, 1979, pp. 135-149.
J. J. Sheppard, Jr., Pseudocolor as a means of image enhancement, Am. J. Opthalmol. Arch. Am. Acad. Optom., Vol. 46, 1969, pp. 735-754.
R. Srinivasan and K. R. Rao, Predictive coding based on efficient motion estimation, IEEE Trans. on Comm., Vol. COM-33, August 1985, pp. 888-896.
D. R. Sullivan, R. C. Harney, and J. S. Martin, Real-time quasi-three-dimensional display of infrared radar images, SPIE, Vol. 180, 1979, pp. 56-64.
E. G. Troy, E. S. Deutsch, and A. Rosenfeld, Gray-level manipulation experiments for texture analysis, IEEE Trans. Sys. Man. Cybernet., Vol. SMC-3, 1973, pp. 91-98.
R. E. Woods and R. C. Gonzales, Real-time digital image enhancement, Proc. IEEE, Vol. 69, May 1981, pp. 643-654.

PROBLEMS

8.1. Let f(n1, n2) denote an image of 256 x 256 pixels. The histogram of f(n1, n2) is sketched below. What can we say about f(n1, n2)? Approximately sketch a transformation function which is likely to improve the contrast of the image when it is used to modify the gray scale of the image.

8.2. Suppose we have an image of 8 x 8 pixels as shown below.

Figure P8.2

Chap. 8 Problems
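The histogram equalization asked for in Problems 8.1 and 8.2 can be prototyped in a few lines. The sketch below is illustrative only: it uses NumPy, a hypothetical 8 x 8 test image, and the usual rounded-cumulative-distribution mapping, which is one standard way (not necessarily the text's) of making the output histogram as flat as possible.

```python
import numpy as np

def equalize(image, levels=8):
    """Histogram equalization: map each gray level through the rounded
    cumulative distribution so the output histogram over 0..levels-1 is
    as close to constant as possible (the objective of Problem 8.2)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / image.size             # fraction of pixels <= each level
    mapping = np.round(cdf * (levels - 1)).astype(int)
    return mapping[image]

# Hypothetical low-contrast 8 x 8 image that uses only levels 0..3.
img = np.arange(64).reshape(8, 8) // 16
out = equalize(img, levels=8)                      # spreads the levels toward 0..7
```

Because the cumulative distribution is nondecreasing, the mapping is monotonic, so the relative ordering of gray levels is preserved while the levels are spread over the full range.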
We wish to modify the gray scale of this image such that the histogram of the processed image is as close as possible to being constant in the range between 0 and 7. This is known as histogram equalization.
(a) Determine a transformation function that will achieve the above objective.
(b) Determine the processed image based on the transformation function you obtained in (a).

8.3. Modifying a histogram so that the output image has a histogram which has a maximum around the middle of the dynamic range and decreases slowly as the intensity increases or decreases does not always result in an output image more useful than the original unprocessed image. Discuss one such example.

8.4. In this problem, we consider an elementary probability problem closely related to the histogram modification problem. Let f denote a random variable with probability density function pf(f0). We define g as g = T[f]. The function T[.] is a deterministic monotonically increasing function. The variable g is a random variable with probability density function pg(g0).
(a) Let pf(f0) be a uniform probability density function given by

Suppose T[f] is given by

g = T[f] = e^f.

Determine pg(g0).
(b) More generally, develop a method to determine pg(g0) given pf(f0) and T[.].
(c) Let pf(f0) be the same uniform probability density function as in (a). Suppose pg(g0) is given by

where u(g0) is the unit step function. Determine T[.].
(d) More generally, develop a method to determine T[.], given pf(f0) and pg(g0).
(e) Discuss how the solution to (d) can be used as a basis for developing a histogram modification method.

8.5. Consider a color image represented by fR(n1, n2), fG(n1, n2), and fB(n1, n2), the red, green, and blue components. We modify the gray scale of each of the three components using histogram modification.
(a) Suppose the desired histogram used in modifying the gray scale is the same for each of the three components. Does the modification affect the hue and saturation of the color image?
(b) Suppose we filter fR(n1, n2) with a system whose impulse response is h(n1, n2) and then modify the gray scale of the filtered signal, as shown in the following figure. Suppose we change the order of the filtering and gray scale modification, as shown below. If the gray scale is modified by histogram modification by using the same desired histograms in both systems, are the results the same; that is, f'R(n1, n2) = f''R(n1, n2)?

8.6. Let Vin denote the input voltage to a display monitor and Iout denote the output intensity actually displayed on the display monitor. Ideally, we wish to have Iout proportional to Vin. In practice, however, the relationship between Vin and Iout is nonlinear and is called gamma. Suppose the nonlinear relationship is given approximately by

(a) What effect does the nonlinear relation have on a black-and-white image?
(b) What effect does the nonlinear relation have on a color image?
(c) One way to compensate for the gamma effect is to process Vin prior to its input to the display monitor. Discuss a method of processing Vin so that the gamma effect can be taken into account. In practice, such processing is not necessary, since the correction can be incorporated in the camera design.

8.7. Determine and sketch the frequency response of each of the filters in Figure 8.8. In sketching the frequency response H(w1, w2), you need to sketch only H(w1, w2)|w1=0.

8.8. In homomorphic processing for contrast enhancement, an image f(n1, n2) is logarithmically transformed, highpass filtered, and then exponentiated, as shown in the following figure. We assume in this problem that the highpass filter used is a linear shift-invariant system.
(a) In practice, the scale of f(n1, n2) is arbitrary. For example, f(n1, n2) may be scaled such that 0 corresponds to the darkest level and 1 corresponds to the brightest. Alternatively, f(n1, n2) may be scaled such that 0 corresponds to the darkest level and 255 corresponds to the brightest. What is the effect on the processed image p(n1, n2) of the choice of scale for f(n1, n2)?
(b) The logarithmic operation has an undesirable behavior for the amplitude of f(n1, n2) close to zero. One system proposed to eliminate this undesirable behavior is shown below. In the figure, a is a positive real constant. What is a reasonable choice of the constant a in order for this to approximate the homomorphic system?
(c) In the system in (b), what effect does the choice of scale for f(n1, n2) have on the processed image p1(n1, n2)?
(d) One sometimes-cited advantage of the homomorphic system is the property that the processed image p(n1, n2) always has nonnegative amplitude. Does the system in (b) have this property?

8.9. The system in Figure 8.12 modifies the local contrast as a function of the local luminance mean and modifies the local luminance mean through some nonlinearity.

Figure P8.11
Figure P8.18
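The log -> highpass -> exp chain of Problem 8.8 can be prototyped directly. In the sketch below the "highpass filter" is simply subtraction of the global log-mean, and `gain` and `eps` are illustrative choices, not values from the text; a real implementation would use a genuine linear shift-invariant highpass filter.

```python
import numpy as np

def homomorphic(f, gain=2.0, eps=1e-3):
    """Sketch of homomorphic contrast enhancement: take logs, boost the
    highpass part of the log image, and exponentiate.  eps keeps the log
    away from its singularity at zero amplitude, the issue raised in
    Problem 8.8(b)."""
    g = np.log(np.maximum(f, eps))         # logarithmic transformation
    low = g.mean()                         # crude stand-in for the lowpass component
    return np.exp(low + gain * (g - low))  # amplify detail, then exponentiate
```

Because the output is an exponential, it is strictly positive, which is the nonnegativity property cited in part (d); and as long as `eps` does not come into play, scaling the input by a constant only scales the output by the same constant, which bears on the scale question of part (a).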
(a) Show that when the Sobel edge detector is used, the result of discrete approximation of |grad f(x, y)| is given by Figure P8.17b. We will denote this result by |grad f(n1, n2)|. At pixel locations at the image boundary, we cannot compute |grad f(n1, n2)| based on the Sobel edge detector.
(b) Suppose we choose a threshold of 100. Determine the candidate edge points.
(c) From your result in (b), edge thinning may be necessary to avoid wide strips of edges. Suppose we decide that any point among the candidate edge points is a

This is consistent with the generalization of |f'(x)| to |grad f(x, y)| in gradient-based methods for edge detection.
(b) For (x, y) on the u-axis, show that

This is consistent with the generalization of f''(x) to the Laplacian of f(x, y) in Laplacian-based methods for edge detection.
(c) The result in (b) is not valid for 2-D edges with sharp corners. Discuss how this could affect the performance of Laplacian-based edge detection methods. You may want to look at some edge maps obtained by Laplacian-based edge detection methods.

8.20. Let f(x, y) denote a 2-D function that is sampled on a Cartesian grid. The samples of f(x, y) are shown in the following figure.

Figure P8.20

The values in parentheses in the figure represent the amplitudes of f(x, y) evaluated at the spatial points corresponding to the filled-in dots. Suppose we wish to estimate f(x, y) by using bilinear interpolation.
(a) Determine f(x, y) for 0 <= x <= 2, 0 <= y <= 1.
(b) Are df(x, y)/dx and df(x, y)/dy well defined for all 0 <= x <= 2 and 0 <= y <= 1?

8.21. In image interpolation, f(x, y) is approximated by

(a) Determine f(x, y, t).
(b) For the f(x, y, t) obtained in (a), show that

8.23. The motion estimation methods we discussed in Section 8.4.2 are based on the assumption of translational motion given by (8.29):

f(x, y, t0) = f(x - dx, y - dy, t0 - 1).

If the overall illumination varies over time, a better model is given by

f(x, y, t0) = a f(x - dx, y - dy, t0 - 1)

where a is some unknown constant. Discuss how we would develop region matching methods based on this new signal model.

8.24. In motion estimation methods based on the spatio-temporal constraint equation, we solve two linear equations for the velocity vector (vx, vy). The two equations can be expressed as

Wv = y

where v = [vx, vy]^T and where W and y are given by (8.43). Let A1 and A2 denote the eigenvalues of W, and let a1 and a2 denote the corresponding orthonormal eigenvectors.
(a) Suppose f(x, y, t) is constant. Show that A1 = A2 = 0, and that a reasonable estimate of v is 0.
(b) Consider f(x, y, t) which does not depend on x and which results from uniform translational motion. Let A1 >= A2. Show that A2 = 0, and that a reasonable estimate of v is v = [a1^T y] a1 / A1.
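Problem 8.20's bilinear interpolation can be checked numerically. The grid values below are hypothetical (the actual samples are in Figure P8.20, which is not reproduced here); the interpolator itself is the standard bilinear form, a weighted average of the four corner samples of each unit cell.

```python
import numpy as np

def bilinear(samples, x, y):
    """Bilinear interpolation of f(x, y) from unit-spaced samples;
    samples[n2][n1] holds f(n1, n2).  The top edge of the grid is
    clamped so corner points remain evaluable."""
    n1 = min(int(np.floor(x)), len(samples[0]) - 2)
    n2 = min(int(np.floor(y)), len(samples) - 2)
    dx, dy = x - n1, y - n2
    f00, f10 = samples[n2][n1], samples[n2][n1 + 1]
    f01, f11 = samples[n2 + 1][n1], samples[n2 + 1][n1 + 1]
    return ((1 - dx) * (1 - dy) * f00 + dx * (1 - dy) * f10
            + (1 - dx) * dy * f01 + dx * dy * f11)

grid = [[0.0, 4.0], [2.0, 6.0]]   # hypothetical sample values on a 2 x 2 grid
```

Inside each cell the result is linear along any horizontal or vertical line, but only piecewise linear overall, so the partial derivatives asked about in part (b) can fail to exist across cell boundaries.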
Image Restoration

9.0 INTRODUCTION

In image restoration, an image has been degraded in some manner and the objective is to reduce or eliminate the degradation. Simple and heuristic enhancement algorithms for reducing degradation were discussed in Chapter 8. In this chapter, we study image restoration algorithms. Restoration algorithms are typically more mathematical and complex than enhancement algorithms. In addition, they are designed to exploit detailed characteristics of the signal and degradation.

A typical environment for an image restoration system is shown in Figure 9.1. If the digitizer and display were ideal, the output image intensity f'(x, y) would be identical to the input f(x, y) without any restoration. In practice, many different types of degradation may be present in the digitizer and display. With an image restoration system we attempt to deal with the degradation, so that the output f'(x, y) will be as close as possible to the input f(x, y).

For the purposes of studying image restoration, we will assume that all the degradation occurs before the image restoration system is employed, as shown in Figure 9.2. This will allow us to consider the image restoration problem entirely in the discrete-space domain (dotted line in Figure 9.2). We can consider f(n1, n2) as the original digital image, g(n1, n2) as the degraded digital image, and p(n1, n2) as the processed digital image. The objective of image restoration is to make the processed image p(n1, n2) be as close as possible to f(n1, n2). It is not always reasonable to assume that all the degradation occurs before the restoration system is employed. One such example is additive random noise degradation in a display. In this case, it is more reasonable to process an image in anticipation of future degradation. However, many different types of degradation, such as blurring in the digitizer or display, can be modeled as occurring before the restoration system is applied. In this chapter, we assume that the original image f(n1, n2) is degraded and an image restoration system attempts to restore f(n1, n2) from the degraded image g(n1, n2), as shown in Figure 9.2.

Figure 9.2 Image restoration based on the assumption that all the degradation occurs before image restoration. This allows us to consider the image restoration problem in the discrete-space domain.

The development of an image restoration system depends on the type of degradation. Algorithms that attempt to reduce additive random noise are different from those that attempt to reduce blurring. The types of degradation we consider in this chapter are additive random noise, blurring, and signal-dependent noise such as multiplicative noise. These types of degradation are chosen because they often occur in practice and they have been extensively discussed in the literature. In addition to providing specific restoration systems for the degradation treated in this chapter, the general approaches used in the development of these systems apply to the reduction of other types of degradation.

Examples illustrating the performance of various algorithms are given throughout the chapter. These examples are included only for illustrative purposes and should not be used for comparing the performance of different algorithms. The performance of an image processing algorithm depends on many factors, such as the objective of the processing and the type of image used. One or two examples do not adequately demonstrate the performance of an algorithm.

In Section 9.1, we discuss how to obtain information on image degradation. Accurate knowledge about the degradation is essential in developing successful restoration algorithms. In Section 9.2, we discuss the problem of restoring an image degraded by additive random noise. Section 9.3 treats the problem of restoring an image degraded by blurring. Section 9.4 treats the problem of restoring an image degraded by both blurring and additive random noise, and more generally the problem of reducing an image degraded by more than one type of degradation. In Section 9.5, we develop restoration algorithms for reducing signal-dependent noise. In Section 9.6, we discuss temporal domain processing for image restoration. In Section 9.7, we describe how an image restoration problem can be phrased by using matrix notation and how tools of linear algebra may be brought to the solution of many image restoration problems.

9.1 DEGRADATION ESTIMATION

Since image restoration algorithms are designed to exploit characteristics of a signal and its degradation, accurate knowledge of the degradation is essential to developing a successful image restoration algorithm. There are two approaches to obtaining information about degradation. One approach is to gather information from the degraded image itself. If we can identify regions in the image where the image intensity is approximately uniform, for instance the sky, it may be possible to estimate the power spectrum or the probability density function of the random background noise from the intensity fluctuations in the uniform background regions. As another example, if an image is blurred and we can identify a region in the degraded image where the original undegraded signal is known, we may be able to estimate the blurring function b(n1, n2). Let us denote the known original undegraded signal in a particular region of the image as f(n1, n2) and the degraded image in the same region as g(n1, n2). Then g(n1, n2) is approximately related to f(n1, n2) by convolution with the blurring function b(n1, n2).

Sec. 9.1 Degradation Estimation 525

blur, we can express the degraded image g(x, y) as*

g(x, y) = (1/T) Int[0,T] f(x - x0(t), y - y0(t)) dt    (9.2)

where x0(t) and y0(t) represent the horizontal and vertical translations of f(x, y) at time t relative to the imaging system, and T is the duration of exposure. In the Fourier transform domain, (9.2) can be expressed as

G(Wx, Wy) = (1/T) Int[0,T] IntInt f(x - x0(t), y - y0(t)) e^{-jWx x} e^{-jWy y} dx dy dt    (9.3)

where G(Wx, Wy) is the Fourier transform of g(x, y). Simplifying (9.3), we obtain

G(Wx, Wy) = F(Wx, Wy) B(Wx, Wy)    (9.4a)

where

B(Wx, Wy) = (1/T) Int[0,T] e^{-jWx x0(t)} e^{-jWy y0(t)} dt.    (9.4b)

From (9.4), it is clear that the planar motion blur can be viewed as convolution of f(x, y) with b(x, y) whose Fourier transform B(Wx, Wy) is given by (9.4b). The function b(x, y) is sometimes referred to as the blurring function, since b(x, y) typically is of lowpass character and blurs the image. It is also referred to as the point spread function, since it spreads an impulse. When there is no motion and thus x0(t) = 0 and y0(t) = 0, B(Wx, Wy) is 1 and g(x, y) is f(x, y). If there is linear motion along the x direction so that x0(t) = kt and y0(t) = 0, B(Wx, Wy) in (9.4) reduces to

B(Wx, Wy) = (sin(Wx kT/2) / (Wx kT/2)) e^{-jWx kT/2}.

A discrete image g(n1, n2) may be approximately modeled by

g(n1, n2) = f(n1, n2) * b(n1, n2).

*It is simpler to derive the effect of image degradation due to motion blur in the analog domain and then discretize the result than to derive it in the digital domain.

The model of an image degraded by additive random noise is given by

g(n1, n2) = f(n1, n2) + v(n1, n2)

where v(n1, n2) represents the signal-independent additive random noise. Examples of additive random noise degradation include electronic circuit noise, and in some cases amplitude quantization noise. In this section, we discuss a number of algorithms that have been proposed for reducing additive random noise in images.

One of the first methods developed to reduce additive random noise in images is based on Wiener filtering, which was discussed in Section 6.1.4. If we assume that f(n1, n2) and v(n1, n2) are samples of zero-mean stationary random processes that are linearly independent of each other and that their power spectra Pf(w1, w2) and Pv(w1, w2) are known, the optimal linear minimum mean square error estimate of f(n1, n2) is obtained by filtering g(n1, n2) with a Wiener filter whose frequency response H(w1, w2) is given by

H(w1, w2) = Pf(w1, w2) / (Pf(w1, w2) + Pv(w1, w2)).    (9.8)

If we impose the additional constraint that f(n1, n2) and v(n1, n2) are samples of Gaussian random processes, then the Wiener filter in (9.8) is the optimal minimum mean square error estimator, even among nonlinear estimators.
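The Wiener filter of eq. (9.8) is simple to apply in the DFT domain. The sketch below assumes white noise (flat Pv equal to the noise variance) and takes the signal power spectrum Pf as given; in practice Pf must itself be estimated, which is the harder part of the problem.

```python
import numpy as np

def wiener_denoise(g, Pf, noise_var):
    """Noncausal Wiener filtering of g = f + v using
    H(w1, w2) = Pf / (Pf + Pv) with Pv = noise_var (white noise).
    Pf is a signal power spectrum sampled on the DFT grid of g."""
    G = np.fft.fft2(g)
    H = Pf / (Pf + noise_var)            # frequency response of eq. (9.8)
    return np.real(np.fft.ifft2(H * G))
```

The limiting behavior is as expected: when Pf dominates Pv the filter passes the data through essentially unchanged, and when Pv dominates it suppresses everything.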
for some constant 0 < p < 1. The parameter p is estimated from the degraded image g(n1, n2).

where Var[.] is the variance. Using the variance ensures that the NMSE will not be affected by adding a bias to p(n1, n2). The measure NMSE[f(n1, n2), g(n1, n2)] is similarly defined. The SNR improvement due to processing is defined by

SNR improvement = 10 log10 ( NMSE[f(n1, n2), g(n1, n2)] / NMSE[f(n1, n2), p(n1, n2)] ) dB.    (9.13)

A human observing two images affected by the same type of degradation will generally judge the one with the smaller NMSE to be closer to the original. A very small NMSE generally can be taken to mean that the image is very close to the original. It is important to note, however, that the NMSE is just one of many possible objective measures and can be misleading. When images with different types of degradation are compared, the one with the smallest NMSE will not necessarily seem closest to the original. As a result, the NMSE and SNR improvements are stated for reference only and should not be used in literally comparing the performance of one algorithm with another.

Figure 9.3 Noncausal Wiener filter for linear minimum mean square error estimation of f(n1, n2) from g(n1, n2) = f(n1, n2) + v(n1, n2).

Sec. 9.2 Reduction of Additive Random Noise 529
Image Restoration Chap. 9
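The NMSE and the SNR improvement of eq. (9.13) are straightforward to compute. The sketch below uses the variance-based NMSE, Var[f - p] / Var[f], implied by the text's remark that a constant bias in p(n1, n2) must not change the measure.

```python
import numpy as np

def nmse(f, p):
    """Normalized mean square error, Var[f - p] / Var[f]."""
    return np.var(f - p) / np.var(f)

def snr_improvement_db(f, g, p):
    """SNR improvement = 10 log10(NMSE[f, g] / NMSE[f, p]) dB, eq. (9.13)."""
    return 10.0 * np.log10(nmse(f, g) / nmse(f, p))
```

For example, if processing halves the error amplitude everywhere, the NMSE drops by a factor of four and the SNR improvement is 10 log10 4, about 6.02 dB.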
Figure 9.5 illustrates the performance of a Wiener filter for image restoration. Figure 9.5(a) shows an original image of 512 x 512 pixels, and Figure 9.5(b) shows the image degraded by zero-mean white Gaussian noise at an SNR of 7 dB. The SNR was defined in Chapter 8 as

SNR = 10 log10 ( Var[f(n1, n2)] / Var[v(n1, n2)] ) dB.

Figure 9.5 (a) Original image of 512 x 512 pixels; (b) degraded image at SNR of 7 dB, with NMSE of 19.7%; (c) processed image by Wiener filtering, with NMSE of 3.6% and SNR improvement of 7.4 dB.

The Wiener filter discussed in Section 9.2.1 was derived by minimizing the mean square error between the original and processed signals. The mean square error is not, however, the criterion used by a human observer in judging how close a processed image is to the original. Since the objective criterion consistent with human judgment is not known, many ad hoc variations have been proposed. One variation is power spectrum filtering. In this method, the filter used has the frequency response H(w1, w2) given by

H(w1, w2) = [ Pf(w1, w2) / (Pf(w1, w2) + Pv(w1, w2)) ]^(1/2).    (9.15)

The function H(w1, w2) in (9.15) is the square root of the frequency response of the Wiener filter. If f(n1, n2) and v(n1, n2) are samples of stationary random processes linearly independent of each other, the output of the filter will have the same power spectrum as the original signal power spectrum Pf(w1, w2). The method is thus known as power spectrum filtering. To show this,

Several variations of Wiener filtering that have been proposed for image restoration can be expressed by the following H(w1, w2):

H(w1, w2) = [ Pf(w1, w2) / (Pf(w1, w2) + a Pv(w1, w2)) ]^b

where a and b are some constants. When a = 1 and b = 1, H(w1, w2) reduces to the Wiener filter in (9.8).

One reason why the Wiener filter and its variations blur the image significantly is that a fixed filter is used throughout the entire image. The Wiener filter was developed under the assumption that the characteristics of the signal and noise do not change over different regions of the image. This has resulted in a space-invariant filter. In a typical image, image characteristics differ considerably from one region to another. For example, walls and skies have approximately uniform background intensities, whereas buildings and trees have large, detailed variations in intensity. Degradations may also vary from one region to another. It is reasonable, then, to adapt the processing to the changing characteristics of the image and degradation. The idea of adapting the processing to the local characteristics

*Some parts of this section were previously published in J. S. Lim, "Image Enhancement," in Digital Image Processing Techniques (ed. M. P. Ekstrom). New York: Academic Press, 1984, Chapter 1.
This condition guarantees that simple addition of the unprocessed subimages will result in the original image. The second condition requires wij(n1, n2) to be a smooth function that falls close to zero near the window boundaries. This tends to reduce possible discontinuities or degradation that may appear at the subimage boundaries in the processed image. One of the many ways to find a smooth 2-D window function that satisfies

Figure 9.7 Example of two-dimensional separable triangular window: wij(n1, n2) = wi(n1) wj(n2).
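The two window conditions can be verified numerically for the separable triangular window of Figure 9.7. The sketch below checks, under assumed parameters (odd window length, 50% overlap), that shifted 1-D triangular windows sum to one at every position; because the 2-D window is separable, wij(n1, n2) = wi(n1) wj(n2), the products then also sum to one, so the subimages add back to the original image.

```python
import numpy as np

def triangular_coverage(total_len, half):
    """Sum of 1-D triangular windows of length 2*half + 1, spaced `half`
    apart (50% overlap), evaluated over positions 0..total_len-1."""
    tri = 1.0 - np.abs(np.arange(2 * half + 1) - half) / half  # peak 1, zero ends
    cover = np.zeros(total_len)
    for start in range(-half, total_len, half):
        for k, w in enumerate(tri):
            n = start + k
            if 0 <= n < total_len:
                cover[n] += w
    return cover
```

A coverage array identically equal to one is exactly the first condition in the text: adding the unprocessed subimages reproduces the original image.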
Figure 9.8 General adaptive image processing system.

Figure 9.9 Typical adaptive image restoration system for additive noise reduction.
as the lowpass filter cutoff frequency. Without a specific application context, only general statements can be made. In general, the more the available knowledge is used, the higher the resulting performance will be. If the available information is inaccurate, however, the system's performance may be degraded. In general, more sophisticated rules for adaptation are associated with subimage-by-subimage processing, while simple rules are associated with pixel-by-pixel processing for computational reasons.

When an adaptive image processing method is applied to the problem of restoring an image degraded by additive random noise, it is possible to reduce background noise without significant image blurring. In the next four sections, we discuss a few representative adaptive image restoration systems chosen from among the many proposed in the literature.

9.2.4 The Adaptive Wiener Filter

Most adaptive restoration algorithms for reducing additive noise in an image can be represented by the system in Figure 9.9. From the degraded image and prior knowledge, some measure of the local details of the noise-free image is determined. One such measure is the local variance. A space-variant* filter h(n1, n2) which is a function of the local image details and of additional prior knowledge is then determined.

The space-variant filter is then applied to the degraded image in the local region from which the space-variant filter was designed. When the noise is wideband, the space-variant h(n1, n2) is lowpass in character. In low-detail image regions such as uniform intensity regions, where noise is more visible than in high-detail regions, a large amount (low cutoff frequency) of lowpass filtering is performed to reduce as much noise as possible. Since little signal variation is present in low-detail regions, even a large amount of lowpass filtering does not significantly affect the signal component. In high-detail image regions such as edges, where a large signal component is present, only a small amount of lowpass filtering is performed so as not to distort (blur) the signal component. This does not reduce much noise, but the same noise is less visible in the high-detail than in the low-detail regions.

A number of different algorithms can be developed, depending on which specific measure is used to represent local image details, how the space-variant h(n1, n2) is determined as a function of the local image details, and what prior knowledge is available. One of the many possibilities is to adaptively design and implement the Wiener filter discussed in Section 9.2.1. As Figure 9.3 shows, the Wiener filter requires knowledge of the signal mean mf, noise mean mv, signal power spectrum Pf(w1, w2), and noise power spectrum Pv(w1, w2). Instead of assuming a fixed mf, mv, Pf(w1, w2), and Pv(w1, w2) for the entire image, they can be estimated locally. This approach will result in a space-variant Wiener filter. Even within this approach, many variations are possible, depending on how mf, mv, Pf(w1, w2), and Pv(w1, w2) are estimated locally and how the resulting space-variant Wiener filter is implemented. We will develop one specific algorithm to illustrate this approach.

We first assume that the additive noise v(n1, n2) is zero mean and white with variance of sigma_v^2. Its power spectrum Pv(w1, w2) is then given by

Pv(w1, w2) = sigma_v^2.

Consider a small local region in which the signal f(n1, n2) is assumed stationary. Within the local region, the signal f(n1, n2) is modeled by

f(n1, n2) = mf + sigma_f w(n1, n2)    (9.22)

where mf and sigma_f are the local mean and standard deviation of f(n1, n2), and w(n1, n2) is zero-mean white noise with unit variance.* There is some empirical evidence that (9.22) is a reasonable model for a typical image [Trussell and Hunt; Kuan et al. (1985)].

In (9.22), the signal f(n1, n2) is modeled by a sum of a space-variant local

*For a space-variant filter, the filter coefficients change as a function of (n1, n2). For notational simplicity, we denote the filter coefficients by h(n1, n2). It should be noted, however, that h(n1, n2) changes as we process different parts of an image.

*The notation w(n1, n2) is used to represent both a window function and white noise. Which is meant will be clear from the context.
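The local model of eq. (9.22) leads to the well-known space-variant Wiener estimator f_hat = mf + (sigma_f^2 / (sigma_f^2 + sigma_v^2)) (g - mf), with mf and sigma_f^2 estimated in a small window of the degraded image. The sketch below is one simple realization; the window size and the clipping of the variance estimate at zero are illustrative choices, not the exact algorithm developed in the text.

```python
import numpy as np

def adaptive_wiener(g, noise_var, half=1):
    """Pixelwise space-variant Wiener filtering: estimate the local mean
    and local signal variance from a (2*half+1)^2 neighborhood of the
    degraded image and shrink each pixel toward the local mean."""
    rows, cols = g.shape
    out = np.empty((rows, cols))
    for n2 in range(rows):
        for n1 in range(cols):
            win = g[max(0, n2 - half):n2 + half + 1,
                    max(0, n1 - half):n1 + half + 1]
            mf = win.mean()
            sf2 = max(win.var() - noise_var, 0.0)   # local signal variance estimate
            out[n2, n1] = mf + sf2 / (sf2 + noise_var) * (g[n2, n1] - mf)
    return out
```

In flat regions the variance estimate collapses to zero and the output is the local mean (strong smoothing); near edges the local variance is large and the pixel passes through nearly unchanged, which is exactly the adaptive behavior described above.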
where f(n1, n2) is the noise-free image and (2L + 1) x (2L + 1) is the size of the local region used in measuring the masking level M at (n1, n2). In (9.31), M(n1, n2) increases as the horizontal and vertical slopes of f(n1, n2) increase, and the contributions that the horizontal and vertical slopes make on M(n1, n2) decrease exponentially as the Euclidean distance between (n1, n2) and the point at which the slopes are measured increases. In (9.30), the equal noise visibility is assumed to hold as we scale the two noise variances by the same scaling factor. This assumption is probably valid only over a small range of the scaling factor.

In addition to the assumptions made in (9.30), which may be only approximately valid, there are practical difficulties in measuring V(M) by using (9.30). In a typical image, the number of pixels with a given M may be small, particularly for a large M. In such cases, it will be difficult to measure V(M) by using the noise visibility matching experiment. Despite these difficulties, V(M) has been approximately measured based on (9.30) and (9.31) by means of the noise visibility matching experiment. The result is shown in Figure 9.12. As expected, V(M) decreases as M increases for a wide range of M.

The noise visibility function can be used in a number of ways to develop image restoration algorithms. We will develop one restoration algorithm, which can be viewed as a special case of the adaptive system shown in Figure 9.9. In this algorithm, the space-variant filter h(n1, n2) has a Gaussian shape given by

h(n1, n2) = k e^{-(n1^2 + n2^2)/(2 sigma^2)} w(n1, n2)    (9.32)

where k and sigma^2 are determined adaptively and w(n1, n2) is a rectangular window that limits the region of support size of h(n1, n2). To determine k and sigma^2, one constraint imposed is

Sum over (n1, n2) of h(n1, n2) = 1.    (9.33)

Another constraint requires the noise in the processed image to be equally visible throughout the entire image. To impose this constraint, note from elementary stochastic processes theory (see Chapter 6) that when the degrading noise v(n1, n2) is white with variance of sigma_v^2, the noise in the processed image is colored with variance of sigma_p^2 where

sigma_p^2 = sigma_v^2 Sum over (n1, n2) of h^2(n1, n2).    (9.34)

If we choose h(n1, n2) in each local region such that it satisfies

sigma_p^2 V(M) = constant,    (9.35)

the level of noise that remains in the processed image will be equally visible to the extent that V(M) accurately reflects the definition in (9.30) and V(M) is approximately the same for the white and colored noise. The constant in (9.35) is chosen such that some balance between noise reduction and signal distortion (blurring) is reached. If the constant chosen is too large, little background noise will be reduced. If the constant chosen is too small, noise will be reduced, but significant signal distortion will occur. At each pixel, the space-variant h(n1, n2) can be determined from (9.32), (9.33), (9.34), and (9.35). Since the filter parameters k and sigma^2 depend only on M in this algorithm, k and sigma^2 can be precomputed and stored in a table as a function of M. To restore an image, M(n1, n2) of the noise-free image f(n1, n2) is estimated from the degraded image, and k(n1, n2) and sigma^2(n1, n2) can be determined from the precomputed table. At each pixel (n1, n2), the space-variant h(n1, n2) can be determined from (9.32) by using the k and sigma^2 that have been determined.

The algorithm developed above is based on the concept that the noise in the processed image is equally visible throughout the image, independent of local image detail. The amount of blurring that will result in the processed image, however, is not explicitly controlled. Fortunately, in the high-detail regions, where it is desirable to have as little signal blurring as possible, M is large, V(M) is small, the noise level sigma_p^2 that remains in the processed image is relatively large, and relatively

Figure 9.11 Performance illustration of an adaptive Wiener filtering method. The degraded image used is the image in Figure 9.5(b). (a) Processed image by adaptive filtering, with NMSE of 3.8% and SNR improvement of 7.1 dB; (b) processed image by space-invariant Wiener filter, with NMSE of 3.6% and SNR improvement of 7.4 dB.
little blurring occurs.
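The table-lookup structure of this algorithm can be sketched as follows. This is an illustrative sketch only: the window size, the hypothetical table entries mapping M to σ, and the unit-gain normalization standing in for the exact forms of (9.32)-(9.35) are all assumptions, since those equations and V(M) are not reproduced here.

```python
import numpy as np

def gaussian_kernel(sigma, half=3):
    # Windowed Gaussian shape h(n1, n2) ~ exp(-(n1^2 + n2^2) / (2 sigma^2)),
    # normalized to unit gain (an assumption standing in for the scale k of (9.32)).
    n = np.arange(-half, half + 1)
    n1, n2 = np.meshgrid(n, n, indexing="ij")
    h = np.exp(-(n1 ** 2 + n2 ** 2) / (2.0 * sigma ** 2))
    return h / h.sum()

def adaptive_gaussian_filter(g, sigma_of_m, masking, half=3):
    # Space-variant filtering: at each pixel, the filter width is read from a
    # precomputed table (sigma_of_m) indexed by the local masking value M.
    kernels = {m: gaussian_kernel(s, half) for m, s in sigma_of_m.items()}
    levels = np.array(sorted(sigma_of_m))
    pad = np.pad(g, half, mode="edge")
    out = np.empty_like(g, dtype=float)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            # Quantize the measured M to the nearest table entry.
            m = levels[np.abs(levels - masking[i, j]).argmin()]
            patch = pad[i:i + 2 * half + 1, j:j + 2 * half + 1]
            out[i, j] = (patch * kernels[m]).sum()
    return out
```

Because the kernels depend only on M, they are built once per table entry rather than once per pixel, which is the computational point of the table-lookup design.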
Figure 9.13 illustrates the performance of this algorithm. Figure 9.13(a) shows an original image of 512 x 512 pixels. Figure 9.13(b) shows the image degraded by white Gaussian noise at an SNR of 7 dB. Figure 9.13(c) shows the processed image, with an SNR improvement of 7.7 dB. The processed image was obtained by adapting the filter at each pixel and determining the masking function M(n1, n2) from the original noise-free image.

Despite the various assumptions and approximations made in this algorithm's development, significant noise reduction is achieved with little noticeable signal blurring. In practice, the original noise-free image is not available for use in estimating M(n1, n2). If M(n1, n2) is obtained from the noisy image, the performance of the algorithm deteriorates. Figure 9.13(d) shows the image processed by the algorithm when M(n1, n2) is obtained from the degraded image. The image has an SNR improvement of 4.5 dB. This algorithm is just one example of exploiting the noise visibility function V(M). There are many other possible definitions of V(M) and many other ways to exploit V(M) in the development of image restoration algorithms.

Figure 9.12 Noise visibility function V(M) versus the masking M (masking units 0-255) [Anderson and Netravali].
The window is chosen such that the subimage gw(n1, n2) can be assumed to be approximately stationary. With Gw(ω1, ω2), Fw(ω1, ω2), and Vw(ω1, ω2) denoting the Fourier transforms of gw(n1, n2), fw(n1, n2), and vw(n1, n2), respectively, from (9.37),

|Gw(ω1, ω2)|² = |Fw(ω1, ω2)|² + |Vw(ω1, ω2)|² + Fw(ω1, ω2)Vw*(ω1, ω2) + Fw*(ω1, ω2)Vw(ω1, ω2).   (9.38)

The functions Vw*(ω1, ω2) and Fw*(ω1, ω2) are complex conjugates of Vw(ω1, ω2) and Fw(ω1, ω2). Rewriting (9.38), we obtain

|Fw(ω1, ω2)|² = |Gw(ω1, ω2)|² − |Vw(ω1, ω2)|² − Fw(ω1, ω2)Vw*(ω1, ω2) − Fw*(ω1, ω2)Vw(ω1, ω2).   (9.39)

In the spectral subtraction method, |Fw(ω1, ω2)| is estimated based on (9.39). From the degraded subimage gw(n1, n2), |Gw(ω1, ω2)|² can be obtained directly. The terms |Vw(ω1, ω2)|², Fw(ω1, ω2)Vw*(ω1, ω2), and Fw*(ω1, ω2)Vw(ω1, ω2) cannot be obtained exactly, and are approximated by E[|Vw(ω1, ω2)|²], E[Fw(ω1, ω2)Vw*(ω1, ω2)], and E[Fw*(ω1, ω2)Vw(ω1, ω2)]. For vw(n1, n2) which is zero mean and uncorrelated with f(n1, n2), E[Fw(ω1, ω2)Vw*(ω1, ω2)] and E[Fw*(ω1, ω2)Vw(ω1, ω2)] are zero, and an estimate of |Fw(ω1, ω2)|² is suggested from (9.39) as

|F̂w(ω1, ω2)|² = |Gw(ω1, ω2)|² − E[|Vw(ω1, ω2)|²].
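The estimate above can be sketched for a single windowed subimage as follows. Two practical choices made here are not prescribed by (9.39): negative power estimates are rectified to zero, and the phase of Gw is reused, since only the magnitude of Fw is estimated.

```python
import numpy as np

def spectral_subtract(g_w, noise_power):
    # |F_hat|^2 = max(|G_w|^2 - E[|V_w|^2], 0); the degraded phase is reused.
    G = np.fft.fft2(g_w)
    mag2 = np.maximum(np.abs(G) ** 2 - noise_power, 0.0)  # subtract, then rectify
    F_hat = np.sqrt(mag2) * np.exp(1j * np.angle(G))
    return np.fft.ifft2(F_hat).real
```

For white noise of variance σv² on an N1 x N2 window, E[|Vw(ω1, ω2)|²] evaluated at the DFT frequencies is N1·N2·σv² under NumPy's unnormalized DFT convention, so `noise_power` would be that constant.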
In the previous three sections, we have discussed adaptive image restoration algorithms that adapt to the local characteristics of an image. Within the local region, the image is typically assumed to be a sample of a stationary random process. One problem with this assumption arises in edge regions, where the signal cannot be adequately modeled even locally as a sample of a stationary random process. A filter based on this assumption will preserve edges by leaving a significant amount of noise near them. Although noise is less visible near edge regions than near uniform intensity regions, removing more noise near edges can be beneficial.

One approach to reducing more noise near edges without additional edge blurring is to model the image more accurately (by considering an edge as a deterministic component, for example) and to develop an image restoration algorithm based on the new model. However, modeling an image accurately is a difficult task, and an image restoration algorithm based on a detailed and accurate image model is likely to be quite complex. Another approach is to first detect edges using the edge detection algorithms discussed in Section 8.3 and then to use the detected edges in designing and implementing an adaptive filter. Edges, for example, can be used in defining boundaries of local image regions over which the image is assumed stationary. When a space-variant filter h(n1, n2) is designed, the region of support of h(n1, n2) can be chosen such that h(n1, n2) does not cover pixels that lie in more than one region, where the regions are bounded by edges. This approach, however, requires explicit determination of edges, and detection of edges in the presence of noise is not a simple task.

Figure 9.14 Performance illustration of the short-space spectral subtraction method. (a) Original image of 256 x 256 pixels; (b) image degraded by white noise, with SNR of 10 dB; (c) processed image by short-space spectral subtraction.
Suppose f(n1, n2) and b(n1, n2) are finite-extent sequences with nonfactorable* z-transforms F(z1, z2) and B(z1, z2). Then we can recover f(n1, n2) within translation and a scale factor from g(n1, n2) = f(n1, n2) * b(n1, n2), using a polynomial factorization algorithm. Specifically, G(z1, z2), the z-transform of g(n1, n2), is given by F(z1, z2)B(z1, z2). Since we assume that f(n1, n2) and b(n1, n2) are finite-extent sequences, G(z1, z2) is a finite-order 2-D polynomial in z1⁻¹ and z2⁻¹. In addition, we assume that F(z1, z2) and B(z1, z2) are nonfactorable, and therefore the only nontrivial factors of G(z1, z2) are F(z1, z2) and B(z1, z2). Polynomial factorization algorithms that determine nontrivial factors of G(z1, z2) are available [Izraelevitz and Lim] and may be used in determining F(z1, z2) or f(n1, n2) within translation and a scale factor. Unfortunately, this approach to solving the blind deconvolution problem has serious practical difficulties. Polynomial factorization algorithms developed to date are very expensive computationally. In addition, the algorithms are extremely sensitive to any deviation from the assumption that G(z1, z2) = F(z1, z2)B(z1, z2), or g(n1, n2) = f(n1, n2) * b(n1, n2). In practice, the convolutional model g(n1, n2) = f(n1, n2) * b(n1, n2) is not exact, due to the presence of some background noise or due to approximations made in the modeling process.

One practical blind deconvolution algorithm is based on the assumption that |B(ω1, ω2)| is a smooth function. This assumption is approximately valid in some applications. When an image is blurred by a thin circular lens, the modulation transfer function† |H(Ωx, Ωy)| is a fairly smooth circularly symmetric lowpass filter, shown in Figure 9.19. When a long-exposure image is blurred by atmospheric turbulence, the blurring function b(x, y) and its Fourier transform B(Ωx, Ωy) are approximately Gaussian-shaped [Goodman (1968)]. When an image is blurred by
By summing both sides of (9.67) over all subimages and rewriting the expression, we obtain (9.68). Equation (9.68) is the basis for estimating |B(ω1, ω2)|. The numerator term Σi Σj |Gij(ω1, ω2)| is obtained from g(n1, n2). The term Σi Σj |Fij(ω1, ω2)| can be estimated from the empirical observation that

Σi Σj |Fij(ω1, ω2)| ≈ Σi Σj |F'ij(ω1, ω2)|   (9.69)

where F'ij(ω1, ω2) is obtained from an undegraded image that is similar in content to f(n1, n2). From (9.68) and (9.69), |B(ω1, ω2)| is estimated by the ratio Σi Σj |Gij(ω1, ω2)| / Σi Σj |F'ij(ω1, ω2)|.
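Under the assumption of nonoverlapping square subimages, the estimate of (9.68)-(9.69) can be sketched as follows; the block size is a hypothetical choice, and any smoothing of the resulting estimate (consistent with the smooth-|B| assumption) is left out.

```python
import numpy as np

def estimate_blur_magnitude(degraded, prototype, block=32):
    # (9.68)-(9.69): sum |G_ij| over nonoverlapping blocks of the degraded
    # image, divided by sum |F'_ij| over blocks of a similar undegraded image.
    def summed_mag(img):
        h, w = img.shape
        acc = np.zeros((block, block))
        for i in range(0, h - block + 1, block):
            for j in range(0, w - block + 1, block):
                acc += np.abs(np.fft.fft2(img[i:i + block, j:j + block]))
        return acc

    return summed_mag(degraded) / np.maximum(summed_mag(prototype), 1e-12)
```

The guard in the denominator only avoids division by zero; it does not repair frequencies at which the prototype image has essentially no energy.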
more than one type of degradation is present in the degraded image. In addition, it is an optimal approach in some cases. For example, suppose f(n1, n2) and v(n1, n2) are samples of zero-mean stationary random processes that are linearly independent of each other. In addition, suppose b(n1, n2) is known. Then the optimal linear estimator that minimizes E[(f(n1, n2) − f̂(n1, n2))²] is an LSI system with a frequency response H(ω1, ω2) given by

H(ω1, ω2) = B*(ω1, ω2)Pf(ω1, ω2) / (|B(ω1, ω2)|² Pf(ω1, ω2) + Pv(ω1, ω2)).

Figure 9.23 Reduction of blurring and additive random noise by cascade of a noise reduction system and a deblurring system.

Figure 9.25 (a) Original image of 512 x 512 pixels; (b) image degraded by blurring and additive random noise with SNR of 25 dB; (c) image processed by an adaptive Wiener filter for random noise reduction and an inverse filter for reduction of blurring; (d) image processed by an inverse filter alone.

Sec. 9.4 Reduction of Blurring and Additive Random Noise 561
quantization noise may not be visible by itself but can be amplified by subsequent processing. For example, suppose we process an image degraded by blurring and additive random noise with a noise reduction system followed by a deblurring system. If the output of the noise reduction system is quantized to 8 bits/pixel, the quantization noise will not be visible at this stage. However, the subsequent deblurring system, which is typically a highpass filter, can amplify the quantization noise, and the noise may become clearly visible in the final result. Since the effect on the final result due to quantization in the intermediate results often is not straightforward to analyze, it is worthwhile to keep intermediate results with high accuracy.

In this section, we have discussed the problem of restoring an image degraded by two specific types of degradation. The idea of reducing one type of degradation at a time may be applicable to other types of degradation. Specifically, when an image is degraded by degradation 1, followed by degradation 2, and followed by degradation 3, one approach to consider is to reduce degradation 3 first, degradation 2 next, and finally degradation 1. Once the overall system consisting of subsystems is developed, it may be made more computationally efficient by rearranging the subsystems. Such an approach, although not always optimal, often simplifies the restoration problem and is in some cases an optimal approach in that it leads to the same solution as the approach that treats all the degradations simultaneously.

9.5 REDUCTION OF SIGNAL-DEPENDENT NOISE

Any degraded image g(n1, n2) can be expressed as

g(n1, n2) = D[f(n1, n2)] = f(n1, n2) + d(n1, n2)   (9.74a)
where d(n1, n2) = g(n1, n2) − f(n1, n2)   (9.74b)

and D[·] is a degradation operator applied to f(n1, n2). If d(n1, n2) is not a function of the signal f(n1, n2), d(n1, n2) is called additive signal-independent noise. If d(n1, n2) is a function of f(n1, n2), d(n1, n2) is called additive signal-dependent noise, often referred to as just signal-dependent noise. Examples of signal-dependent noise are speckle noise, film grain noise, and quantization noise. One approach to reducing signal-dependent noise is to transform g(n1, n2) into a domain where the noise becomes additive signal-independent noise and then to reduce the signal-independent noise. Another approach is to reduce it directly in the signal domain. These approaches are discussed in the following two sections.

where T1[·] is an operator that may be different from T[·] and v(n1, n2) is an additive signal-independent noise. One approach to restoring f(n1, n2) from g(n1, n2) is first to estimate T1[f(n1, n2)] by reducing the additive signal-independent noise v(n1, n2) and then to estimate f(n1, n2) from the estimated T1[f(n1, n2)]. This approach exploits the fact that reducing additive signal-independent noise is generally much simpler than reducing signal-dependent noise, and a number of algorithms have already been developed to reduce additive signal-independent noise.

To illustrate this approach, let us consider the problem of reducing multiplicative noise. An example of multiplicative noise is the speckle effect [Dainty], which is commonly observed in images generated with highly coherent laser light, such as infrared radar images. The degraded image due to multiplicative noise, g(n1, n2), can be expressed as

g(n1, n2) = f(n1, n2)v(n1, n2)   (9.76)

where v(n1, n2) is random noise that is not a function of f(n1, n2). Since g(n1, n2) and f(n1, n2) represent image intensities and are therefore nonnegative, v(n1, n2) is also nonnegative. By applying the logarithmic operation to (9.76), we obtain

log g(n1, n2) = log f(n1, n2) + log v(n1, n2).   (9.77)

If we denote log g(n1, n2) by g1(n1, n2) and denote log f(n1, n2) and log v(n1, n2) similarly, (9.77) becomes

g1(n1, n2) = f1(n1, n2) + v1(n1, n2).   (9.78)

The multiplicative noise v(n1, n2) has now been transformed to additive noise v1(n1, n2), and image restoration algorithms developed for reducing additive signal-independent noise may be applied to reduce v1(n1, n2). The resulting image is exponentiated to compensate for the logarithmic operation. The overall system is shown in Figure 9.26.

Figure 9.27 illustrates the performance of this image restoration algorithm in reducing multiplicative noise. The noise v(n1, n2) is white noise generated by using the probability density function in (9.79), where u(v0) is the unit step function, a1 and a2 are some constants, and k is a scale factor that ensures that the integral of the probability density function is 1. Figure 9.27(a) shows an original image of 512 x 512 pixels. Figure 9.27(b) shows the image degraded by multiplicative noise v(n1, n2) obtained from (9.79) with a1 = 1 and a2 = 0.1. Figure 9.27(c) shows the processed image, using the system in Figure 9.26. The processed image has an SNR improvement of 5.4 dB. The restoration
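The overall system of Figure 9.26 can be sketched as follows; the local averaging used here is only a minimal stand-in for the additive-noise reduction algorithms of Section 9.2, and the small offset guarding the logarithm is a numerical assumption.

```python
import numpy as np

def reduce_multiplicative_noise(g, denoise):
    # Homomorphic processing: log -> additive-noise reduction -> exp,
    # i.e., work on g1 = f1 + v1 of (9.78) instead of g = f * v of (9.76).
    eps = 1e-12                    # intensities are nonnegative; avoid log(0)
    g1 = np.log(g + eps)
    f1_hat = denoise(g1)
    return np.exp(f1_hat) - eps

def local_mean_denoise(x, half=1):
    # A minimal additive-noise reducer: averaging over a (2*half+1)^2 window.
    pad = np.pad(x, half, mode="edge")
    out = np.zeros_like(x, dtype=float)
    k = 2 * half + 1
    for di in range(k):
        for dj in range(k):
            out += pad[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out / k ** 2
```

Any algorithm for additive signal-independent noise can be passed as `denoise`; the log/exp pair is what converts the multiplicative problem into the additive one.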
Subtracting the additive component log B(ω1, ω2) from log G(ω1, ω2) and exponentiating the result are equivalent to inverse filtering. Another example of the transformation of a signal-dependent noise to an additive signal-independent noise for its reduction is the decorrelation of quantization noise in image coding, which is discussed in Chapter 10.
A major advantage of the approach discussed in the previous section is its simplicity. The approach, however, is based on the assumption that a domain can be found in which the signal-dependent noise becomes an additive signal-independent component. Such a domain may not exist for some types of signal-dependent noise. Even when such a domain can be found, the image restoration problem will be solved in the new domain. This may cause some degradation in performance. To see this, suppose an algorithm has been developed to reduce the signal-independent noise v(n1, n2) in [...] by attempting to minimize [...]. If the same algorithm is used to reduce the signal-independent noise v(n1, n2) in [...]
where mf = E[f(n1, n2)], mg = E[g(n1, n2)], Pfg(ω1, ω2) is the cross power spectrum of f(n1, n2) and the degraded signal g(n1, n2), and Pg(ω1, ω2) is the power spectrum of g(n1, n2). This is the Wiener filter solution discussed in Section 6.1.4. When d(n1, n2) is signal independent, (9.89) simplifies to the Wiener filter discussed in Section 9.2. Since the signal f(n1, n2) and the signal-dependent noise d(n1, n2) cannot be assumed stationary in an image restoration problem, the filter in (9.88) and (9.89) may be implemented locally in an adaptive manner to the extent that Pfg(ω1, ω2) and Pg(ω1, ω2) can be estimated locally.

In another example where the solution to the linear minimum mean square error signal estimation problem is greatly simplified, the signal f(n1, n2) is modeled as

f(n1, n2) = mf(n1, n2) + σf(n1, n2)w(n1, n2)   (9.90)

where mf(n1, n2) is E[f(n1, n2)], σf(n1, n2) is the standard deviation of f(n1, n2), and w(n1, n2) is zero-mean white noise with unit variance. This is the same model used in the image restoration algorithm developed in Section 9.2.4. For a certain class of signal-dependent noise which includes multiplicative noise and Poisson noise, the signal model (9.90) leads to a particularly simple algorithm. To illustrate this, we again consider the problem of reducing multiplicative noise.

Consider a degraded image g(n1, n2) given by

g(n1, n2) = f(n1, n2)v(n1, n2)   (9.91)

where v(n1, n2) is a stationary white noise with mean of mv and variance of σv². From (9.90) and (9.91), the estimate takes the linear form f̂ = a·g + b with

a = (E[fg] − E[f]E[g]) / σg²   (9.97a)
b = E[f] − E[g]a   (9.97b)

where σg² is the variance of g. From (9.93), (9.94), and (9.97), and after some algebra (see Problem 9.18), (9.99) follows. Since (9.99) can be used for estimating f(n1, n2) at every pixel, (9.100) follows, in which the noise statistics σv² and mv are assumed known. To the extent that the signal model is valid, and mg(n1, n2) and σg²(n1, n2) can be estimated from the local neighborhood at each pixel from g(n1, n2), f(n1, n2) may be estimated from (9.100). This approach can be used to develop algorithms for reducing other types of signal-dependent noise, including Poisson noise [Kuan et al. (1985)].
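A sketch of the resulting pixelwise estimator, using the linear form f̂ = a·g + b with a and b as in (9.97) and local sample moments of g in place of mg(n1, n2) and σg²(n1, n2). The moment relations used here, mf = mg/mv, E[f²] = E[g²]/E[v²], and Cov(f, g) = mv·σf², follow from (9.91) with v independent of f; the window size is a hypothetical choice.

```python
import numpy as np

def lmmse_multiplicative(g, m_v, var_v, half=2):
    # Pixelwise linear MMSE estimate f_hat = a*g + b for g = f*v.
    k = 2 * half + 1
    pad = np.pad(g, half, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (k, k))
    m_g = win.mean(axis=(-1, -2))            # local mean of g
    var_g = win.var(axis=(-1, -2))           # local variance of g
    m_f = m_g / m_v                          # m_f = m_g / m_v
    Ef2 = (var_g + m_g ** 2) / (var_v + m_v ** 2)   # E[f^2] = E[g^2] / E[v^2]
    var_f = np.maximum(Ef2 - m_f ** 2, 0.0)
    a = m_v * var_f / np.maximum(var_g, 1e-12)      # Cov(f, g) = m_v * var_f
    b = m_f - a * m_g                        # (9.97b)
    return a * g + b
```

In flat regions var_f goes to zero, so the estimate falls back to the local mean; in detailed regions a approaches one and the observation is passed through, which is the adaptive behavior the text describes.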
where vi(n1, n2) is zero-mean stationary white Gaussian noise with variance of σv², and vi(n1, n2) is independent from vj(n1, n2) for i ≠ j. If we assume that f(n1, n2) is nonrandom, the maximum likelihood (ML) estimate of f(n1, n2) that maximizes p(g1(n1, n2), . . . , gN(n1, n2) | f(n1, n2)) is given by (see Problem 9.19)

f̂(n1, n2) = (1/N) Σ (i = 1 to N) gi(n1, n2).   (9.104)
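The ML estimate (9.104) is simply a per-pixel average of the N frames. A recursive variant that stores only one frame (the form considered later in Problem 9.21) can be written alongside it:

```python
import numpy as np

def frame_average(frames):
    # (9.104): per-pixel mean of the N observed frames (static scene,
    # additive white Gaussian noise).
    return np.mean(frames, axis=0)

def running_average(prev, g, a):
    # One-frame-storage variant p_i = a * p_{i-1} + (1 - a) * g_i,
    # trading exact averaging for a single stored frame.
    return a * prev + (1.0 - a) * g
```

The recursive form weights recent frames more heavily, which is why it degrades more gracefully when the static-scene assumption holds only over a few frames.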
Figure 9.30 Performance illustration of frame averaging. (a) Image in Figure 9.29(a) degraded by multiplicative noise, with NMSE of 28.8%; (b) processed image by frame averaging (8 frames), with NMSE of 4.7% and SNR improvement of 7.9 dB; (c) image processed by applying a spatial filter to the image in (b). The processed image has NMSE of 4.3% and SNR improvement of 8.3 dB.

Figure 9.31 Motion-compensated image restoration (past, current, and future frames).

with a proper choice of g, B, f, and v. For example, suppose that f(n1, n2) and b(n1, n2) are 2 x 2-point sequences, zero outside 0 ≤ n1 ≤ 1, 0 ≤ n2 ≤ 1, and that g(n1, n2) and v(n1, n2) are 3 x 3-point sequences, zero outside 0 ≤ n1 ≤ 2, 0 ≤ n2 ≤ 2. Then one set of f, B, v, and g is [...]

Figure 9.32 Performance illustration of motion-compensated image restoration. (a) Degraded current frame; (b) processed image by motion-compensated frame averaging; (c) processed image by frame averaging without motion compensation. The blurring in this image shows the amount of motion present in the sequence of three image frames used.

Three image frames (past, current, and future) were used in the processing. The motion estimation algorithm used is the spatio-temporal constraint method with polynomial signal interpolation discussed in Section 8.4.2. After the motion parameters are estimated, the spatial interpolation required for temporal filtering is performed by using the truncated ideal interpolation filter. Figure 9.32(c) shows the result of frame averaging without motion compensation.

Sec. 9.7 Additional Comments 575
Chap. 9 References
9.1. Consider an image restoration problem in which we wish to develop a restoration system to compensate for blurring in an image display system. One model for this restoration problem is shown in the following figure.

[Figure P9.1a: f(n1, n2) → Restoration → p(n1, n2) → display with blurring b(n1, n2)]

In the figure, f(n1, n2) is the original image, b(n1, n2) is the blurring function, and p(n1, n2) is the processed image. The restoration system in this problem is placed prior to the image degradation, since the display system displays the processed image. The restoration problem we discussed in this chapter is based on a model in which the degradation occurs before the restoration system is applied, as shown in the following figure.

[Figure P9.1b: f(n1, n2) → Degradation → Restoration → p(n1, n2)]

Discuss how the model in Figure P9.1b can be used in solving the restoration problem in Figure P9.1a.

9.2. Let g(n1, n2) denote the response of an LSI system when the input f(n1, n2) is a line impulse δ(n1). From g(n1, n2), what can we say about the impulse response h(n1, n2) and the frequency response H(ω1, ω2) of the LSI system?

9.3. In developing image restoration systems, it is necessary to determine the characteristics of image degradation. In this problem, we consider the degradation due to speckle noise that occurs in infrared radar images. Let g(n1, n2) denote the original noise-free image f(n1, n2) degraded by speckle noise. Suppose the degraded image is sampled coarsely enough so that the degradation at any pixel can be assumed to be independent from all other pixels. According to a model derived theoretically, show that pv(v0), the probability density function from which v(n1, n2) is obtained, is given by [...]

9.4. In Section 9.1, we derived a model for image restoration when an analog image f(x, y) is blurred by a planar motion of an imaging system during the exposure time. Assume that the exposure time is 0 ≤ t ≤ T.
(a) Suppose the planar motion during the exposure time has a constant horizontal velocity vx. Determine b(x, y), the point spread function of the blurring. Does your answer make sense?
(b) Suppose the planar motion during the exposure time has a horizontal velocity vx(t) given by [...]. Determine the point spread function b(x, y) and its Fourier transform B(Ωx, Ωy).

9.5. At the peripheral level of the human visual system, the image intensity appears to be subjected to some form of nonlinearity, such as a logarithmic operation. This is evidenced in part by the approximate validity of Weber's law. Weber's law states that the just-noticeable difference in image density (logarithm of image intensity) is independent of the density. Since the image density domain is, then, more central to the human visual system than is the intensity domain, it appears that processing an image in the density domain may be more effective than processing an image in the intensity domain. Although the above reasoning sounds logical and has been used in the development of image restoration and enhancement systems, in many instances processing an image in the intensity domain rather than in the density domain makes more sense. Give one such example.

9.6. Let g(n1, n2) denote a degraded image that can be expressed as [...] where f(n1, n2) is a signal and v(n1, n2) is a background noise. In Wiener filtering, we assume that Pf(ω1, ω2) is given. One method of estimating Pf(ω1, ω2) is to model the correlation function Rf(n1, n2) by

Rf(n1, n2) = ρ1^|n1| ρ2^|n2|

and then obtain Pf(ω1, ω2) by [...]. Assuming v(n1, n2) is a zero-mean white noise with variance of σv² independent of f(n1, n2), develop a method of estimating ρ1 and ρ2 from g(n1, n2).

Chap. 9 Problems
9.7. Many different measures may be used in quantifying the degree of closeness between an original image f(n1, n2) and a processed image p(n1, n2). In this problem we compare two measures. One is the normalized mean square error (NMSE), given by [...]

[Figure P9.10: design of h(n1, n2) from g(n1, n2) and the local signal variance σf²]

[...] where σv² is the variance of the additive noise present in the degraded image and a controls the relative amount of noise reduction and signal blurring. We assume that both σv² and a are known. Develop an efficient method to compute k and σ.

[...] where f(n1, n2) is an original image, b(n1, n2) is a known point spread function, and v(n1, n2) is background noise. Assuming that f(n1, n2) and v(n1, n2) are samples of zero-mean stationary random processes that are linearly independent of each other, show that the optimal linear estimator that minimizes E[(f(n1, n2) − f̂(n1, n2))²] is a linear shift-invariant system with its frequency response H(ω1, ω2) given by [...] where Pf(ω1, ω2) and Pv(ω1, ω2) denote the power spectrum of f(n1, n2) and v(n1, n2), respectively. You may wish to refer to the derivation of the noncausal Wiener filter in Section 6.1.4, where b(n1, n2) is assumed to be δ(n1, n2).

9.13. In Section 9.3.2, we discussed two specific methods of solving the blind deconvolution problem. There are many variations to the methods we developed. In this problem, we consider one of them. Let g(n1, n2) denote the degraded image, which can be expressed as [...] where f(n1, n2) is an original image and b(n1, n2) is the blurring point spread function. We assume that the effective size of b(n1, n2) is much smaller than f(n1, n2). Under this assumption, [...]

9.16. Let f(n1, n2) denote an original signal. Let g(n1, n2) denote f(n1, n2) degraded by two different additive random noises v(n1, n2) and w(n1, n2), as shown in the following figure.
(b) Another approach is to estimate x(n1, n2) = f(n1, n2) + v(n1, n2) from g(n1, n2) and then estimate f(n1, n2) from the estimated x(n1, n2), as shown in the following figure.

Figure P9.16b

[...] where vi(n1, n2) is zero-mean stationary white Gaussian noise with variance of σi² and vi(n1, n2) is independent from vj(n1, n2) for i ≠ j. We assume that f(n1, n2) is nonrandom and wish to estimate f(n1, n2) using the maximum likelihood (ML) method.
(a) Explain why the ML estimate of f(n1, n2) at a particular pixel (n1, n2) is affected by gi(n1, n2) at only that pixel.
(b) From (a), we can consider one pixel at a time, and therefore we can consider the problem of estimating one nonrandom variable f from N observations gi given by [...]
E[fgl - E [ flE[gl 9.21. Consider a sequence of degraded images g,(n,, n2)that can be expressed as
where a =
mi
and b = E [ f ] - E[g]a.
where i is the time variable, v,(n,, nZ)is zero-mean stationary white Gaussian noise
(a) Show that a = m,,u~/u:and b = m,. The result is Equation (9.98). with variance of mi, and v,(n,, n,) is independent from v l ( n , , n2) for i # j. We
(b) Using the result in (a), show that assume that g,(n,, n2) has been observed for a long time. To reduce the noise
v,(n,, n,), we consider performing frame averaging by
m,
and b = -. where p,(n,, n,) is the processed image at time index i.
m"
The result is Equation (9.99). Chap. 9 Problems
10 log10 (σv² / Var[p(n1, n2) − f(n1, n2)]) dB

where Var[·] is the variance.
(b) The processing method used above requires storage of N frames. One method of reducing the storage requirement is to obtain the processed image pi(n1, n2) by

pi(n1, n2) = a pi−1(n1, n2) + (1 − a) gi(n1, n2)

where a is a real constant between 0 and 1. This method requires the storage of only one frame pi−1(n1, n2) in addition to the current observed frame gi(n1, n2). Determine the value of a required to achieve an SNR improvement equal to that in (a).
(c) The two methods considered in this problem are based on the assumption that f(n1, n2) does not change from frame to frame. Suppose this assumption is valid only over some small finite number of frames. Which of the two methods will perform better in this environment?

Image Coding

10.0 INTRODUCTION

A major objective of image coding is to represent an image with as few bits as possible while preserving the level of quality and intelligibility required for the given application. Image coding has two major application areas. One is the reduction of channel bandwidth required for image transmission systems. Examples of this application include digital television, video conferencing, and facsimile. The other application is reduction of storage requirements. Examples of this application include reduction in the storage of image data from space programs and of video data in digital VCRs.
The levels of image quality and intelligibility required vary widely, depending on the application. In such applications as storage of image data from space programs and images of objects with historical value that no longer exist, regeneration of the original digital data may be very expensive or even impossible. In such applications, we may want to preserve all the information in the original digital data for possible future use. Image coding techniques which do not destroy any information and which allow exact reconstruction of the original digital data are said to be information-preserving. In applications such as digital television, it is not necessary for the coder to be information-preserving. In such applications, high quality is very important, but some information in the original data may be destroyed, so long as the decoded video on the TV monitor is acceptable to human viewers. In applications such as remotely piloted vehicles (RPVs), image intelligibility is essential, but we may be able to sacrifice a significant degree of quality. The more quality and intelligibility we can sacrifice, the lower will be the required bit rate.

Image coding is related to image enhancement and restoration. If we can enhance the visual appearance of the reconstructed image or if we can reduce the degradation that results from the image coding algorithm (quantization noise being an example), we may be able to reduce the number of bits required to represent an image at a given level of quality and intelligibility, or conversely to hold the number of bits steady while improving the image quality and intelligibility.
[...] cannot adequately demonstrate the performance of an algorithm.

Figure 10.1 Typical environment for image coding. Transmitter: image source, image coder, and channel encoder; channel; receiver: channel decoder, image decoder, and reconstructed image.

10.1 QUANTIZATION

Let f denote a continuous scalar quantity that may represent a pixel intensity, transform coefficient, or image model parameter. To represent f with a finite number of bits, only a finite number of reconstruction or quantization levels can be used. We will assume that a total of L levels are used to represent f. The process of assigning a specific f to one of L levels is called amplitude quantization, or quantization for short. If each scalar is quantized independently, the procedure

Sec. 10.1 Quantization 591
Image Coding Chap. 10
is called scalar quantization. If two or more scalars are quantized jointly, the procedure is called vector quantization or block quantization. Vector quantization is discussed in Section 10.1.2.

Let f̂ denote an f that has been quantized. We can express f̂ as

f̂ = f + eQ.

The quantization error eQ is also called quantization noise. The quantity eQ² can be viewed as a special case of a distortion measure d(f, f̂), which is a measure of distance or dissimilarity between f and f̂. Other examples of d(f, f̂) include |f − f̂| and ||f|^p − |f̂|^p|. The reconstruction and decision levels are often determined by minimizing some error criterion based on d(f, f̂), such as the average distortion D given by

D = E[d(f, f̂)] = ∫_{-∞}^{∞} d(f0, f̂) pf(f0) df0.   (10.4)

The most straightforward method of quantization is uniform quantization, in which the reconstruction and decision levels are uniformly spaced. Specifically, for a uniform quantizer,

di − di−1 = Δ,   1 ≤ i ≤ L   (10.5a)

where Δ is the step size equal to the spacing between two consecutive reconstruction levels or two consecutive decision levels. An example of a uniform quantizer when L = 4 and f is assumed to be between 0 and 1 is shown in Figure 10.3.

Although uniform quantization is quite straightforward and appears to be a natural approach, it may not be optimal. Suppose f is much more likely to be in one particular region than in others. It is reasonable to assign more reconstruction levels to that region. Consider the example in Figure 10.3. If f rarely falls between d0 and d1, the reconstruction level r1 is rarely used. Rearranging the reconstruction levels r1, r2, r3, and r4 so that they all lie between d1 and d4 makes more sense. Quantization in which reconstruction and decision levels do not have even spacing is called nonuniform quantization.

The optimum determination of ri and di depends on the error criterion used. One frequently used criterion is the minimum mean square error (MMSE) criterion. Suppose we assume that f is a random variable with a probability density function pf(f0). Using the MMSE criterion, we determine rk and dk by minimizing the average distortion D of (10.4) with d(f, f̂) = (f − f̂)².
The quantization noise eQ is typically signal dependent. For example, the
quantization noise eQ for the uniform quantizer in Figure 10.3 is sketched in Figure
10.4. From Figure 10.4, eQ is a function o f f and therefore is signal dependent.
It is possible to decorrelate the quantization noise eQ for the uniform quantizer by
a method known as dithering or Roberts's pseudonoise technique. As will be
discussed in Section 10.3, decorrelation of quantization noise can be useful in
improving the performance of an image coding system. It changes the character-
istics of the degradation in the coded image. In addition, the decorrelated quan-
tization noise may be reduced by the image restoration algorithms discussed in Figure 10.4 Illustration of signal dependence of quantization noise.
Chapter 9.
Sec. 10.1 Quantization
593
592 Image Coding Chap. 10
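To make the uniform quantizer of (10.5) concrete, here is a minimal Python sketch (not from the text): it assumes the Figure 10.3 setup, with f in [0, 1], L = 4, and midpoint reconstruction levels.

```python
def uniform_quantize(f, L=4, lo=0.0, hi=1.0):
    """Uniform quantizer: decision levels at lo + i*delta, reconstruction
    levels at the interval midpoints (step size delta, as in (10.5))."""
    delta = (hi - lo) / L
    i = min(int((f - lo) / delta), L - 1)    # index of the decision interval
    return lo + (i + 0.5) * delta            # midpoint reconstruction level

# With L = 4 on [0, 1]: reconstruction levels 1/8, 3/8, 5/8, 7/8.
```

The quantization error of this quantizer never exceeds Δ/2 in magnitude, which is the behavior sketched in Figure 10.4.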
TABLE 10.1 PLACEMENT OF RECONSTRUCTION AND DECISION LEVELS FOR THE LLOYD-MAX QUANTIZER. THE UNIFORM PDF IS ASSUMED UNIFORM BETWEEN −1 AND 1. THE GAUSSIAN PDF IS ASSUMED TO HAVE MEAN OF 0 AND VARIANCE OF 1. FOR THE LAPLACIAN PDF, pf(f0) = (√2/(2σ)) e^{−√2|f0|/σ} with σ = 1.

Bits | Uniform: ri, di | Gaussian: ri, di | Laplacian: ri, di

Noting that f̂ is one of the L reconstruction levels obtained by (10.1), we can write (10.6) as

D = Σ_{k=1}^{L} ∫_{d_{k−1}}^{d_k} (rk − f0)² pf(f0) df0.   (10.7)

To minimize D,

∂D/∂rk = 0,  1 ≤ k ≤ L
∂D/∂dk = 0,  1 ≤ k ≤ L − 1   (10.8)

with d0 = −∞ and dL = ∞. Solving (10.8) leads to

rk = ∫_{d_{k−1}}^{d_k} f0 pf(f0) df0 / ∫_{d_{k−1}}^{d_k} pf(f0) df0,  1 ≤ k ≤ L   (10.9a)
dk = (rk + rk+1)/2,  1 ≤ k ≤ L − 1   (10.9b)
d0 = −∞   (10.9c)
dL = ∞.   (10.9d)

The first set of equations in (10.9) states that a reconstruction level rk is the centroid of pf(f0) over the interval d_{k−1} < f0 ≤ dk. The remaining set of equations states that each decision level dk, except d0 and dL, is the middle point between the two reconstruction levels rk and rk+1. Equation (10.9) is a necessary set of conditions for the optimal solution. For a certain class of probability density functions, including uniform, Gaussian, and Laplacian densities, (10.9) is also sufficient.

Solving (10.9) is a nonlinear problem. The nonlinear problem has been solved for some specific probability density functions. The solutions when pf(f0) is uniform, Gaussian, and Laplacian are tabulated in Table 10.1. A quantizer based on the MMSE criterion is often referred to as a Lloyd-Max quantizer [Lloyd; Max]. From Table 10.1, the uniform quantizer is the optimal MMSE quantizer when pf(f0) is a uniform probability density function. For other densities, the optimal solution is a nonuniform quantizer. For example, the optimal reconstruction and decision levels for the Gaussian pf(f0) with variance of 1 when L = 4 are shown in Figure 10.5.

It is useful to evaluate the performance improvement that the optimal MMSE quantizer gives over the simpler uniform quantizer. As an example, consider a Gaussian pf(f0) with mean of 0 and variance of 1. The average distortion D in (10.6) as a function of the number of reconstruction levels L is shown in Figure 10.6 for the optimal MMSE quantizer (solid line). The average distortion D as a function of L is also shown in Figure 10.6 for the uniform quantizer* (dotted line), in which the reconstruction levels ri are chosen to be symmetric with respect to

*The definition of uniform quantization by (10.5) has been extended in this case to account for a Gaussian random variable f whose value can range between −∞ and ∞.
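The necessary conditions (10.9) suggest the classical iterative design procedure: alternate the centroid condition (10.9a) and the midpoint condition (10.9b) on a discretized density. The sketch below is an assumed implementation, not the book's; for the uniform pdf on [−1, 1] with L = 4 it settles on uniformly spaced levels, consistent with the claim that the uniform quantizer is MMSE-optimal for a uniform density.

```python
def lloyd_max(pdf, grid, L, iters=50):
    """Alternate the centroid condition (10.9a) and the midpoint condition
    (10.9b) on a density discretized over grid."""
    lo, hi = grid[0], grid[-1]
    r = [lo + (hi - lo) * (i + 0.5) / L for i in range(L)]   # initial levels
    w = [pdf(x) for x in grid]
    for _ in range(iters):
        d = ([float('-inf')]
             + [(r[k] + r[k + 1]) / 2 for k in range(L - 1)]  # (10.9b)
             + [float('inf')])
        for k in range(L):
            num = den = 0.0
            for x, px in zip(grid, w):
                if d[k] < x <= d[k + 1]:
                    num += x * px
                    den += px
            if den > 0:
                r[k] = num / den                              # centroid (10.9a)
    return r

grid = [-1 + 2 * (i + 0.5) / 2000 for i in range(2000)]
levels = lloyd_max(lambda x: 0.5, grid, L=4)   # uniform pdf on [-1, 1]
```

Replacing the lambda with a sampled Gaussian or Laplacian density gives the nonuniform level placements tabulated in Table 10.1.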
on the assumption that pf(f0) is Gaussian. Similar analyses can, of course, be performed for other probability density functions. The more the density function deviates from being uniform, the higher will be the gain of nonuniform quantization over uniform quantization.

The notion that the uniform quantizer is the optimal MMSE quantizer when pf(f0) is uniform suggests another approach. Specifically, we can map f to g by a nonlinearity in such a way that pg(g0) is uniform, quantize g with a uniform quantizer, and then perform the inverse nonlinearity. This method is illustrated in Figure 10.7. The nonlinearity is called companding. From elementary probability theory, one choice of the nonlinearity or compander C[·] that results in a uniform pg(g0) is given by (see Problem 10.4)

g = C[f] = ∫_{−∞}^{f} pf(f0) df0.

[Figure 10.7 Nonuniform quantization by companding: a nonuniform quantizer is equivalent to a nonlinearity (compressor), a uniform quantizer, and the inverse nonlinearity (expander).]
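One way to realize a compander in practice is to use an empirical CDF estimated from training samples as C[·]. The sketch below is an assumption, not the book's construction: it compresses f to g = C[f], uniformly quantizes g on [0, 1], and expands with the approximate inverse CDF.

```python
import bisect
import random
random.seed(0)

def make_compander(samples):
    """Empirical-CDF compander: g = C[f] is approximately uniform on [0, 1]
    when f is drawn from the same distribution as the training samples."""
    s = sorted(samples)
    n = len(s)
    def compress(f):
        return bisect.bisect_right(s, f) / n        # empirical CDF
    def expand(g):
        return s[min(int(g * n), n - 1)]            # approximate inverse CDF
    return compress, expand

train = [random.gauss(0, 1) for _ in range(20000)]
compress, expand = make_compander(train)

def compand_quantize(f, L=8):
    """Nonuniform quantization of f via companding plus a uniform quantizer."""
    g = compress(f)
    ghat = (min(int(g * L), L - 1) + 0.5) / L       # uniform quantizer on [0, 1]
    return expand(ghat)
```

The net effect is a nonuniform quantizer whose levels are dense where pf(f0) is large, which is exactly what Figure 10.7 depicts.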
[Figure 10.6 Comparison of average distortion D = E[(f̂ − f)²] as a function of L, the number of reconstruction levels, for a uniform quantizer (dotted line) and the Lloyd-Max quantizer (solid line). The vertical axis is 10 log10 D.]

From (10.12) and (10.13), the total number of reconstruction levels L is

L = 2^{B1} · 2^{B2} · · · 2^{BN} = 2^B.

From (10.16), the number of reconstruction levels for fi is proportional to σi, the standard deviation of fi. Although (10.15) is an approximate solution obtained under some specific assumptions, it is useful as a reference in other bit allocation problems. We note that Bi in (10.15) can be negative and is not in general an integer. In scalar quantization, Bi is a nonnegative integer. This constraint has to be imposed in solving a bit allocation problem in practice.

10.1.2 Vector Quantization

In the previous section, we discussed scalar quantization of a scalar source and a vector source. An alternate approach to coding a vector source is to divide the scalars into blocks, view each block as a unit, and then jointly quantize the scalars in the unit. This is called vector quantization (VQ) or block quantization.

Let f = [f1, f2, . . . , fN]^T denote an N-dimensional vector that consists of N real-valued, continuous-amplitude scalars fi. In vector quantization, f is mapped to another N-dimensional vector r = [r1, r2, . . . , rN]^T. Unlike f, whose elements have continuous amplitudes, the vector r is chosen from L possible reconstruction or quantization levels. Let f̂ denote an f that has been quantized. We can express f̂ as

f̂ = ri,  f ∈ Ci   (10.17)

where ri is the ith reconstruction level and Ci is the cell of all f assigned to ri. The average distortion in (10.20) is the mean square error (MSE) and is a generalization of (10.7).

A major advantage of vector quantization is its performance improvement over scalar quantization of a vector source. Vector quantization can lower the average distortion D with the number of reconstruction levels held constant, or can reduce the required number of reconstruction levels when D is held constant. There are various ways in which vector quantization can improve its performance over scalar quantization. The most significant is to exploit the statistical dependence among the scalars in the block.

To illustrate that vector quantization can exploit statistical dependence, let us consider two examples. In the first example, we consider the exploitation of linear dependence (correlation). Consider two random variables f1 and f2 with a joint probability density function (pdf) pf1f2(f1′, f2′) shown in Figure 10.9(a). The pdf is uniform with amplitude of 1/(2a²) in the shaded region and zero in the unshaded region. The two marginal pdfs pf1(f1′) and pf2(f2′) are also shown in the figure. Since E[f1f2] ≠ E[f1]E[f2], f1 and f2 are correlated, or linearly dependent. Suppose we quantize f1 and f2 separately, using scalar quantization and the MMSE criterion.
[Figure 10.8 Example of vector quantization. The number of scalars in the vector is 2, and the number of reconstruction levels is 9.]
Since each of the two scalars has a uniform pdf, the optimal scalar quantizer is a uniform quantizer. If we allow two reconstruction levels for each scalar, the optimal reconstruction levels for each scalar are a/2 and −a/2. The resulting four (2 × 2) reconstruction levels in this case are the four filled-in dots shown in Figure 10.9(b). Clearly, two of the four reconstruction levels are wasted. With vector quantization, we can use only the reconstruction levels shown in Figure 10.9(c). This example shows that vector quantization can reduce the number of reconstruction levels without sacrificing the MSE. We can eliminate the linear dependence between f1 and f2 in this example by rotating the pdf clockwise by 45°. The result of this invertible linear coordinate transformation is shown in Figure 10.10. In the new coordinate system, g1 and g2 are uncorrelated, since E[g1g2] = E[g1]E[g2]. In this new coordinate system, it is possible to place the two reconstruction levels at the filled-in dots shown in the figure by scalar quantization of the two scalars, and the advantage of vector quantization disappears. Eliminating the linear dependence reduces the advantage of vector quantization in this example. This is consistent with the notion that vector quantization can exploit linear dependence of the scalars in the vector.

[Figure 10.9 Illustration that vector quantization can exploit linear dependence of scalars in the vector. (a) Probability density function pf1f2(f1′, f2′); (b) reconstruction levels (filled-in dots) in scalar quantization; (c) reconstruction levels (filled-in dots) in vector quantization.]
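The Figure 10.9 example can be checked numerically. The sketch below assumes the shaded region consists of the two diagonal squares (both scalars in [−a, 0] or both in [0, a]) with a = 1; per-scalar 2-level quantization at ±a/2 and 2-level vector quantization at (±a/2, ±a/2) then yield the same MSE, with half as many reconstruction levels.

```python
import random
random.seed(1)
a = 1.0

def draw():
    """Sample from the assumed Fig. 10.9(a) pdf: uniform over the two
    diagonal squares (both scalars in [-a, 0] or both in [0, a])."""
    s = random.choice([-1, 1])
    return (s * a * random.random(), s * a * random.random())

samples = [draw() for _ in range(50000)]

def sq(f):
    """Per-scalar 2-level MMSE quantizer: levels at -a/2 and a/2."""
    return a / 2 if f >= 0 else -a / 2

vq_levels = [(-a / 2, -a / 2), (a / 2, a / 2)]   # only 2 levels instead of 4

def vq(p):
    return min(vq_levels, key=lambda r: (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2)

mse_scalar = sum((f1 - sq(f1)) ** 2 + (f2 - sq(f2)) ** 2
                 for f1, f2 in samples) / len(samples)
mse_vector = sum((p[0] - vq(p)[0]) ** 2 + (p[1] - vq(p)[1]) ** 2
                 for p in samples) / len(samples)
```

Because every sample has both coordinates of the same sign, the vector quantizer always lands on the same levels the scalar quantizer would have produced, so the distortions coincide while two of the four scalar-product levels go unused.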
Vector quantization can also exploit nonlinear dependence. This is illustrated in the second example. Consider two random variables f1 and f2 whose joint pdf
[Figure 10.14 Codebook design by the K-means algorithm for vector quantization: classify the M training vectors into L clusters by quantization; estimate ri by computing the centroid of the vectors within each cluster; repeat until convergence; the designed codebook is the list of ri.]

minimum, we can repeat the algorithm with different initial estimates of ri and choose the set that results in the smallest average distortion D.

The most computationally expensive part of the K-means algorithm is the quantization of training vectors in each iteration. For each of the M training vectors, the distortion measure must be evaluated L times (once for each reconstruction level), so ML evaluations of the distortion measure are necessary in each iteration. If we assume that there are N scalars in the vector, R bits are used per scalar, and a uniform-length codeword is assigned to each reconstruction level, then L is related to N and R by

L = 2^B = 2^{NR}   (10.24)

where B is the total number of bits used per vector. If we further assume that the distortion measure used is the squared error (f − ri)^T(f − ri), one evaluation of the distortion measure requires N arithmetic operations (N multiplications and N additions). The number of arithmetic operations required in the training vector quantization step of each iteration is given by

Number of arithmetic operations = NML = NM · 2^{NR}.   (10.25)

From (10.25), the computational cost grows exponentially with N (the number of scalars in a vector) and R (the number of bits per scalar). When N = 10, R = 2, and M = 10L = 10 · 2^{NR} = 10 · 2^{20}, the number of arithmetic operations given by (10.25) is 100 trillion per iteration.*

Each training vector and each reconstruction level consists of N scalars and thus requires N memory units. Therefore,

Total number of memory units required = (M + L)N = (M + 2^{NR})N.   (10.26)

Since M is typically much greater than L, memory requirements are dominated by the storage of training vectors. When N = 10, R = 2, and M = 10L = 10 · 2^{20}, the number of memory units given by (10.26) is on the order of 100 million. Since the number in (10.26) again grows exponentially with N and R, both computational and storage requirements dictate that vector quantization will mainly be useful for a small number of scalars in the vector and a small number of bits per scalar.

So far we have discussed the computational and storage requirements of designing a codebook. Once the codebook is designed, it must be stored at both the transmitter and the receiver. Since the training vectors are no longer needed once the codebook is designed, only the reconstruction levels need to be stored. The amount of storage required is still very large. In this case,

Number of memory units required in a codebook = NL = N · 2^{NR}.   (10.27)

When N = 10 and R = 2, the number in (10.27) is on the order of 10 million.

For each vector f to be quantized, the distortion measure d(f, ri) has to be computed for each of the L reconstruction levels at the transmitter. Therefore, for each vector,

Number of arithmetic operations = NL = N · 2^{NR}.   (10.28)

When N = 10 and R = 2, the number in (10.28) is on the order of 10 million. From (10.27) and (10.28), both the number of memory units required in the codebook and the number of arithmetic operations required in quantizing one vector f grow exponentially with N (scalars per vector) and R (bits per scalar). Note that the above arithmetic operations are required at the transmitter. Fortunately, the receiver requires only simple table look-up operations.

*The choice of NR = 20 was made to illustrate how fast the number can grow when it depends exponentially on NR. In typical applications of vector quantization, the NR chosen is significantly less than 20.

Tree codebook and binary search. The major computations involved in codebook design by the K-means algorithm are in the quantization of the training vectors. Quantization of vectors is also required when the codebook is used at the transmitter. When a codebook is designed by the K-means algorithm, quantization of a vector during both the design and the data transmission necessitates evaluating a distortion measure between the vector and each of the L reconstruction levels. This is called a full search and is responsible for the exponential dependence of the number of computations on the number of scalars in the vector and the number of bits per scalar. Various methods have been developed to eliminate this exponential dependence [Makhoul et al.]. They achieve a reduction in computations by modifying the codebook, by sacrificing performance in achieved average distortion, and/or by increasing storage requirements. We will describe one such method, which results in what is termed a tree codebook.
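The K-means design loop of Figure 10.14 and the full-search quantizer can be sketched as follows (a minimal illustration with squared-error distortion and small N and L; not the book's code):

```python
import random
random.seed(2)

def quantize(v, codebook):
    """Full search: evaluate the squared-error distortion against all L levels."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))

def kmeans_codebook(training, L, iters=20):
    """Figure 10.14: classify the training vectors, then re-estimate each
    reconstruction level as the centroid of its cluster; repeat."""
    codebook = random.sample(training, L)
    N = len(training[0])
    for _ in range(iters):
        clusters = [[] for _ in range(L)]
        for v in training:                          # classification step
            clusters[quantize(v, codebook)].append(v)
        for i, c in enumerate(clusters):            # centroid step
            if c:
                codebook[i] = tuple(sum(v[k] for v in c) / len(c) for k in range(N))
    return codebook

training = [(random.random(), random.random()) for _ in range(500)]
cb = kmeans_codebook(training, L=4)
```

As the text suggests, several random initializations can be tried, keeping the codebook that yields the smallest average distortion D.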
The basic idea behind the tree codebook is to divide the N-dimensional space of f into two regions using the K-means algorithm with K = 2, then divide each of these two regions into two more regions, again using the K-means algorithm, and continue the process. Specifically, assuming that L can be expressed as a power of 2, we first design a codebook with two reconstruction levels r1 and r2 using the K-means algorithm. We then classify all the training vectors into two clusters, one cluster corresponding to r1 and the other to r2. Each of the two clusters is treated independently, and a codebook with two reconstruction levels is designed for each cluster. This process is repeated until we have a total of L reconstruction levels at the last stage. This is shown in Figure 10.15 for the case where L = 8. By this process, the tree codebook is designed.

We first consider the computational and storage requirements in designing the codebook. We again assume that one evaluation of the distortion measure requires N arithmetic operations. Since there are log2 L stages and the distortion measure is evaluated only twice for each of the M training vectors for each stage and for each iteration of the K-means algorithm,

Total number of arithmetic operations/iteration = 2NM log2 L.   (10.29)

When this number is compared with the corresponding number in (10.25) for the full search case, the reduction in the number of computations is seen to be by a factor of L/(2 log2 L), or 2^{NR}/(2NR). When N = 10 and R = 2, the reduction in computation is by a factor of 26,000. The storage required in the design of the tree codebook is slightly more than in the case of the K-means algorithm, since the storage requirement is dominated in both cases by the need to store all the training vectors.

We will now consider the computation involved in quantizing a vector f using the tree codebook. In the first stage, we compute the distortion measure between f and the two reconstruction levels r1 and r2 in Figure 10.15. Suppose d(f, r2) is smaller than d(f, r1), so we choose r2. In the second stage, we compute the distortion measure between f and the two reconstruction levels that branch from r2 in Figure 10.15, and choose the one that results in the smaller distortion measure. In the third stage, we compare f with the two reconstruction levels that branch from the level chosen in the second stage. This procedure is continued until we reach the last stage; the reconstruction level chosen at the last stage is the reconstruction level of f. If L = 8, the level chosen in the third stage in the above example is the reconstruction level of f. In this procedure, we simply follow the tree and perform a search between two reconstruction levels at each node of the tree. Since the search is made between two levels at a time, it is said to be a binary search. Since log2 L stages are involved and the distortion measure is evaluated twice at each stage, the number of arithmetic operations required to quantize f using the tree codebook is given by

Number of arithmetic operations = 2N log2 L = 2N²R.   (10.30)

From (10.30), the computational cost does not increase exponentially with N and R. When the number in (10.30) is compared with the corresponding number in (10.28) for the full search case, the computational cost is reduced by a factor of 2^{NR}/(2NR). When N = 10 and R = 2, the reduction is by a factor of 26,000.

The reduction in the number of computations has a cost. The codebook used at the transmitter must store all the intermediate reconstruction levels as well as the final reconstruction levels, because the intermediate levels are used in the search. The codebook size is, therefore, increased by a factor of two over the codebook designed by the K-means algorithm. In addition, the tree codebook's performance in terms of the average distortion achieved is slightly reduced in typical applications as compared with the codebook designed by the K-means algorithm. In many cases, however, the enormous computational advantage more than compensates for the twofold increase in storage requirements and the slight decrease in performance.

The binary search discussed above is a special case of a more general class of methods known as tree-searched vector quantization. It is possible, for example, to divide each node into more than two branches. In addition, we can terminate a particular node earlier in the design of the codebook when only a very small number of training vectors are assigned to the node.

In this section, we have discussed vector quantization. The advantage of vector quantization over scalar quantization lies in its potential to improve performance. The amount of performance improvement possible depends on various factors, for instance, the degree of statistical dependence among the scalars in the vector. This performance improvement, however, comes at a price in computation and storage requirements. Whether or not the performance improvement justifies the additional cost depends on the application. Vector quantization is likely to be useful in low bit rate applications, where any performance improvement is important and where, due to the low bit rate, the additional cost of vector quantization is not too high. Vector quantization can also be useful in such applications as broadcasting, where the number of receivers is much larger than the number of transmitters and the high cost of a transmitter can be tolerated. The receiver in vector quantization has to store the codebook, but it requires only simple table look-up operations to reconstruct a quantized vector.
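A tree codebook and its binary search can be sketched as below (an assumed implementation: recursive 2-means splits with squared-error distortion). Note that `tree_quantize` evaluates the distortion only twice per stage, as counted in (10.30), instead of once per reconstruction level as in a full search.

```python
import random
random.seed(3)

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def two_means(vectors, iters=15):
    """One node split: K-means with K = 2 on the vectors in this cluster."""
    c = random.sample(vectors, 2)
    groups = (vectors, [])
    for _ in range(iters):
        groups = ([], [])
        for v in vectors:
            groups[0 if sqdist(v, c[0]) <= sqdist(v, c[1]) else 1].append(v)
        c = [tuple(sum(v[k] for v in g) / len(g) for k in range(len(vectors[0])))
             if g else c[i] for i, g in enumerate(groups)]
    return c, groups

def build_tree(vectors, depth):
    """Recursive design: split each cluster in two; depth = log2(L) stages."""
    if depth == 0 or len(vectors) < 2:
        n = len(vectors[0])
        return {'level': tuple(sum(v[k] for v in vectors) / len(vectors)
                               for k in range(n))}
    (c0, c1), (g0, g1) = two_means(vectors)
    return {'split': (c0, c1),
            'kids': (build_tree(g0 or vectors, depth - 1),
                     build_tree(g1 or vectors, depth - 1))}

def tree_quantize(v, node):
    """Binary search: only two distortion evaluations per stage."""
    while 'split' in node:
        c0, c1 = node['split']
        node = node['kids'][0 if sqdist(v, c0) <= sqdist(v, c1) else 1]
    return node['level']

training = [(random.random(), random.random()) for _ in range(400)]
tree = build_tree(training, depth=3)          # L = 2**3 = 8 leaf levels
```

Storing the intermediate split levels alongside the leaves is what roughly doubles the codebook size relative to a flat K-means codebook, as noted above.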
10.2.1 Uniform-Length Codeword Assignment

In Section 10.1, we discussed the problem of quantizing a scalar or a vector source. As the result of quantization, we obtain a specific reconstruction level. To transmit to the receiver which of the L possible reconstruction levels has been selected, we need to assign a specific codeword (a string of 0s and 1s) to each of the L reconstruction levels. Upon receiving the codeword, the receiver can identify the reconstruction level by looking up the appropriate entry in the codebook. For the receiver to be able to uniquely identify the reconstruction level, each reconstruction level must be assigned a different codeword. In addition, since more than one reconstruction level may be transmitted in sequence, the codewords have to be designed so that they can be identified when received sequentially. A code having these characteristics is said to be uniquely decodable. When L = 4, assigning 00 to r1, 01 to r2, 10 to r3, and 11 to r4 results in a uniquely decodable code. A code constructed by assigning 0 to r1, 1 to r2, 10 to r3, and 11 to r4 is not uniquely decodable. When 100 is received, for example, it could be taken to represent either r3r1 or r2r1r1.

It is convenient to think of the result of quantizing a scalar or a vector as a message that has L different possibilities ai, 1 ≤ i ≤ L, with each possibility corresponding to a reconstruction level. The simplest method of selecting codewords is to use codewords of uniform length. In this method, each possibility of the message is coded by a codeword that has the same length as all the other possibilities of the message. An example of uniform-length codeword selection for L = 8 is shown in Table 10.2. The length of each codeword in this example is log2 L = log2 8 = 3 bits. We will refer to the number of bits required to code a message as the bit rate. The bit rate in our example is 3 bits/message. If we code more than one message, the average bit rate is defined as the total number

TABLE 10.2 AN EXAMPLE OF UNIFORM-LENGTH CODEWORD SELECTION FOR A MESSAGE WITH EIGHT POSSIBILITIES.

Message    Codeword
a1         000
a2         001
a3         010
a4         011
a5         100
a6         101
a7         110
a8         111

10.2.2 Entropy and Variable-Length Codeword Assignment

Uniform-length codeword assignment, although simple, is not in general optimal in terms of the required average bit rate. Suppose some message possibilities are more likely to be sent than others. Then, by assigning shorter codewords to the more probable message possibilities and longer codewords to the less probable message possibilities, we may be able to reduce the average bit rate.

Codewords whose lengths are different for different message possibilities are called variable-length codewords. When the codewords are designed based on the statistical occurrence of different message possibilities, the design method is called statistical coding. To discuss the problem of designing codewords such that the average bit rate is minimized, we define the entropy H by

H = −Σ_{i=1}^{L} Pi log2 Pi   (10.31)

where Pi is the probability that the message will be ai. Since Σ_{i=1}^{L} Pi = 1, it can be shown that

0 ≤ H ≤ log2 L.   (10.32)

The entropy H can be interpreted as the average amount of information that a message contains. Suppose L = 2. If P1 = 1 and P2 = 0, the entropy H is 0, the minimum possible for L = 2. In this case, the message is a1 with probability of 1; that is, the message contains no new information. At the other extreme, suppose P1 = P2 = 1/2. The entropy H is 1, the maximum possible for L = 2. In this case, the two message possibilities a1 and a2 are equally likely, and receiving the message clearly adds new information.

From information theory, the entropy H in (10.31) is the theoretical minimum average bit rate required in coding a message. This result, although it does not specify a method to design codewords, is very useful. Suppose the average bit rate using the codewords we have designed is the same as the entropy. We then know that these codewords are optimal, and we do not have to search any further. For example, suppose L can be expressed as a power of 2 and each message possibility ai is equally probable, so that Pi = 1/L for 1 ≤ i ≤ L. From (10.31), the entropy H for this case is log2 L. Since uniform-length codeword assignment results in an average bit rate of log2 L bits/message, we can conclude that it is an optimal method of designing codewords in this case. The entropy also provides a standard against which the performance of a codeword design method can be measured. If the average bit rate achieved by a codeword design method is close to the entropy, for example, the method is very efficient.

If we code each message separately, it is not in general possible to design codewords that result in an average bit rate given by the entropy. For example, suppose L = 2 with unequal probabilities P1 and P2. Even though the entropy H of this message
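The entropy of (10.31) and the unique-decodability example are easy to verify in code. The sketch below (not from the text) computes H and counts the ways a received bit string can be segmented into codewords; more than one segmentation means the code is not uniquely decodable.

```python
from math import log2

def entropy(P):
    """H = -sum(Pi * log2(Pi)), with 0*log2(0) taken as 0   (10.31)."""
    return -sum(p * log2(p) for p in P if p > 0)

def parses(bits, codewords):
    """Number of ways a bit string can be segmented into codewords.
    More than one way means the code is not uniquely decodable."""
    if not bits:
        return 1
    return sum(parses(bits[len(c):], codewords)
               for c in codewords if bits.startswith(c))

# The text's example: {0, 1, 10, 11} is ambiguous on "100" (r3 r1 or r2 r1 r1),
# while the uniform-length code {00, 01, 10, 11} is uniquely decodable.
```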
[Figure 10.19 Decorrelation of quantization noise by Roberts's pseudonoise technique: pseudonoise is added to f(n1, n2) before the uniform quantizer and subtracted after it.]

subsequently reducing it by an image restoration algorithm can be applied to any system that has a uniform quantizer as a component. For example, consider the PCM system with a nonuniform quantizer shown in Figure 10.23(a). Roberts's pseudonoise can be added prior to the uniform quantizer and subtracted after it. This is shown in Figure 10.23(b). The signal ĝ(n1, n2) can be viewed as g(n1, n2) degraded by additive random noise independent of g(n1, n2). If noise reduction is desired, a noise reduction system can be applied to ĝ(n1, n2), as shown in Figure 10.23(b).

Sec. 10.3 Waveform Coding 621
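Roberts's pseudonoise technique can be sketched in one dimension as follows (an assumed illustration: a 2-bit uniform quantizer on [0, 1), with dither uniform over one step and known to both transmitter and receiver). Plain quantization of a fixed input always makes the same error; with dither the error varies from use to use and averages near zero.

```python
import random
random.seed(4)
DELTA = 0.25                      # step size of a 2-bit uniform quantizer on [0, 1)

def uq(x):
    """Uniform quantizer: midpoint reconstruction levels 1/8, 3/8, 5/8, 7/8."""
    i = min(max(int(x / DELTA), 0), 3)
    return (i + 0.5) * DELTA

def dithered(f):
    """Roberts's pseudonoise: add dither before quantizing, subtract it after."""
    d = random.uniform(-DELTA / 2, DELTA / 2)   # pseudonoise known at both ends
    return uq(f + d) - d

plain_err = uq(0.3) - 0.3                 # always the same error for this input
errs = [dithered(0.3) - 0.3 for _ in range(4000)]
```

This signal independence of the error is what lets the degradation be treated as additive random noise and attacked with the restoration algorithms of Chapter 9.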
In the PCM system, the image intensity is coded by scalar quantization, and the correlation among pixel intensities is not exploited. One way to exploit some of the correlation, while still using scalar quantization, is delta modulation (DM). In the DM system, the difference between two consecutive pixel intensities is coded by a one-bit (two reconstruction levels) quantizer. Although the dynamic range of the difference signal is doubled as a result of differencing, the variance of the difference signal is significantly reduced, due to the strong correlation typically present in the intensities of two pixels that are spatially close.

[Figure 10.22 Example of quantization noise reduction in PCM image coding. (a) Original image of 512 × 512 pixels; (b) PCM-coded image at 2 bits/pixel; (c) PCM-coded image at 2 bits/pixel by Roberts's pseudonoise technique; (d) PCM-coded image at 2 bits/pixel with quantization noise reduction.]

In discussing DM, it is useful to assume that the pixels in an image have been arranged in some sequential manner so that f(n1, n2) can be expressed as a 1-D signal f(n). If f(n) is obtained by reading one row of f(n1, n2) and then reading the next row, it will preserve some of the spatial correlation present in f(n1, n2). A DM system is shown in Figure 10.24. In the figure, f̂(n) represents f(n) reconstructed by DM. To code f(n), the most recently reconstructed value f̂(n − 1) is subtracted from f(n). The difference signal e(n) = f(n) − f̂(n − 1) is quantized
The equations that govern the DM system in Figure 10.24 are

e(n) = f(n) − f̂(n − 1)   (10.35a)
ê(n) = Δ/2 if e(n) ≥ 0, and −Δ/2 otherwise   (10.35b)
f̂(n) = f̂(n − 1) + ê(n)   (10.35c)

where Δ is the step size of the one-bit quantizer. From (10.35a) and (10.35c), the quantization noise eQ(n) = ê(n) − e(n) is given by

eQ(n) = f̂(n) − f(n).   (10.36)

[Figure 10.23 PCM with a nonuniform quantizer realized by companding: (a) nonlinearity, uniform quantizer, and inverse nonlinearity; (b) the same system with Roberts's pseudonoise added before and subtracted after the uniform quantizer, followed by noise reduction applied to ĝ(n1, n2).]

[Figure 10.24 Delta modulation system.]

[Figure 10.25 Illustration that quantization noise accumulates in delta modulation when f(n − 1) is used in predicting f(n) instead of f̂(n − 1). The solid staircase f1(n) is the reconstruction when f̂(n − 1) is used; the dotted staircase f2(n) is the reconstruction when f(n − 1) is used.]

[Figure: slope overload and granular noise, the two characteristic degradations of delta modulation.]
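The DM loop of (10.35) can be simulated directly. The sketch below is not from the text; the step size and the test signal are assumptions chosen so that the signal's slope stays below Δ/2 per sample (avoiding slope overload), leaving only granular noise.

```python
import math

def delta_modulate(f, delta, f0=0.0):
    """Simulate (10.35): e(n) = f(n) - f̂(n-1); ê(n) = ±delta/2;
    f̂(n) = f̂(n-1) + ê(n)."""
    fhat = f0
    bits, recon = [], []
    for x in f:
        e = x - fhat                                 # (10.35a)
        ehat = delta / 2 if e >= 0 else -delta / 2   # (10.35b)
        fhat = fhat + ehat                           # (10.35c)
        bits.append(1 if ehat > 0 else 0)
        recon.append(fhat)
    return bits, recon

signal = [0.5 + 0.3 * math.sin(0.1 * n) for n in range(200)]
bits, recon = delta_modulate(signal, delta=0.1, f0=0.5)
```

Raising the signal's slope above Δ/2 per sample would make the staircase lag behind the input, which is the slope-overload distortion named in the figure above.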
where Q[e(n1, n2)] denotes the quantization of e(n1, n2) by a PCM system. From (10.37a) and (10.37c), the quantization noise eQ(n1, n2) is given by

eQ(n1, n2) = f̂(n1, n2) − f(n1, n2).   (10.38)

The DPCM system in (10.37) can also be viewed as a generalization of PCM. Specifically, DPCM reduces to PCM when the prediction f̃(n1, n2) is set to zero.

In DPCM, f(n1, n2) is predicted by linearly combining the previously reconstructed values:

f̃(n1, n2) = Σ_{(k1,k2)} a(k1, k2) f̂(n1 − k1, n2 − k2)

[Figure 10.29 Differential pulse code modulation system. The transmitter predicts f(n1, n2) from previously reconstructed intensities; the prediction error is quantized and transmitted; the receiver duplicates the prediction loop.]

where f(n1, n2) is assumed to be a stationary random process with the correlation function Rf(n1, n2). The linear equations in (10.42) are the same as those used in the estimation of the autoregressive model parameters discussed in Chapters 5 and 6.

Figure 10.30 illustrates the performance of a DPCM system at 3 bits/pixel. The original image used is the image in Figure 10.22(a). The PCM system used in Figure 10.30 employs a nonuniform quantizer. The prediction coefficients a(k1, k2) used to generate the example are

[Figure 10.30 Example of a differential pulse code modulation (DPCM)-coded image at 3 bits/pixel. The original image used is the image in Figure 10.22(a); NMSE = 2.2%, SNR = 16.6 dB.]
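A 1-D DPCM loop can be sketched as below (an assumed illustration with a single prediction coefficient a and a small nonuniform quantizer for the prediction error; not the book's 2-D system). The key point is that the transmitter predicts from previously *reconstructed* values, which the receiver can duplicate exactly, so quantization errors do not accumulate.

```python
import math

def dpcm_1d(f, a=0.95, levels=(-0.2, -0.05, 0.05, 0.2)):
    """1-D DPCM loop: predict from the previously reconstructed value,
    quantize the prediction error, and update the reconstruction."""
    fhat = 0.0
    recon = []
    for x in f:
        pred = a * fhat                                   # linear prediction
        e = x - pred                                      # prediction error
        ehat = min(levels, key=lambda r: (e - r) ** 2)    # quantize e
        fhat = pred + ehat                                # receiver repeats this
        recon.append(fhat)
    return recon

sig = [math.sin(0.05 * n) for n in range(300)]
rec = dpcm_1d(sig)
```

Setting a = 0 turns `pred` into zero and reduces the loop to plain PCM of the signal itself, mirroring the remark above that DPCM generalizes PCM.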
where

h(n1, n2) = h(n1)h(n2)   (10.44a)
h(n) = a for n = 0; 1/4 for n = ±1; 1/4 − a/2 for n = ±2; and 0 otherwise.   (10.44b)

The constant a in (10.44b) is a free parameter and is typically chosen between 0.3 and 0.6. The sequence h(n) is sketched in Figure 10.34 for a = 0.3, 0.4, 0.5, and 0.6. When a = 0.4, h(n) has an approximately Gaussian shape, and thus the name "Gaussian pyramid." The choice of h(n1, n2) in (10.44) ensures that h(n1, n2) is zero phase and that the filter passes the DC component unaffected (H(0, 0) = 1, Σn1 Σn2 h(n1, n2) = 1). In addition, the separability of h(n1, n2) reduces the computations involved in the filtering operation. The image f0L(n1, n2) obtained from f0(n1, n2) * h(n1, n2) is then subsampled by a factor of 4, that is, a factor of 2 along n1 and a factor of 2 along n2. The subsampled image f1(n1, n2) is given by

f1(n1, n2) = f0L(2n1, 2n2) for 0 ≤ n1 ≤ 2^{M−1}, 0 ≤ n2 ≤ 2^{M−1}, and 0 otherwise.   (10.45)

[Figure 10.34 Impulse response h(n) as a function of the free parameter a. The 2-D lowpass filter h(n1, n2) used in the Gaussian pyramid image representation is obtained from h(n) by h(n1, n2) = h(n1)h(n2).]

The size of f1(n1, n2) is (2^{M−1} + 1) × (2^{M−1} + 1) pixels, approximately one quarter of the size of f0(n1, n2). From (10.45), f0L(n1, n2) has to be computed for only even values of n1 and n2 to obtain f1(n1, n2). Higher-level images are generated by repeatedly applying the lowpass filtering and subsampling operations. A one-dimensional graphical representation of this process is shown in Figure 10.35. An example of the Gaussian pyramid representation for an image of 513 × 513 pixels is shown in Figure 10.36.

[Figure 10.33 Process of generating the (i + 1)th-level image f_{i+1}(n1, n2) from the ith-level image fi(n1, n2) in the Gaussian pyramid image representation: lowpass filtering by h(n1, n2) followed by subsampling.]

[Figure 10.35 One-dimensional graphical representation of the Gaussian pyramid generation.]
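One stage of the Gaussian pyramid generation can be sketched in 1-D. The sketch below is an assumed implementation: it uses the five-tap kernel of (10.44b) with a = 0.4, replicates border samples (a boundary choice the text does not specify), and takes an input of length 2^M + 1.

```python
def pyramid_level(f, a=0.4):
    """One stage of (10.44)-(10.45) in 1-D: filter with the five-tap kernel
    h = [1/4 - a/2, 1/4, a, 1/4, 1/4 - a/2], then keep even-indexed samples."""
    h = [0.25 - a / 2, 0.25, a, 0.25, 0.25 - a / 2]   # sums to 1, zero phase
    n = len(f)
    fl = []
    for i in range(n):
        acc = 0.0
        for k in range(-2, 3):
            j = min(max(i + k, 0), n - 1)             # replicate border samples
            acc += h[k + 2] * f[j]
        fl.append(acc)
    return fl[::2]                                     # subsample by 2

f0 = [float(n) for n in range(17)]                     # 2**4 + 1 samples
f1 = pyramid_level(f0)                                 # 2**3 + 1 = 9 samples
```

Because the kernel sums to one, a constant image passes through unchanged (H(0) = 1), matching the DC-preservation property noted above.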
The Gaussian pyramid representation can be used in developing an approach to image coding. To code the original image f0(n1, n2), we code f1(n1, n2) and the difference between f0(n1, n2) and a prediction of f0(n1, n2) from f1(n1, n2). Suppose we predict f0(n1, n2) by interpolating f1(n1, n2). Denoting the interpolated image by f1'(n1, n2), we find that the error signal e0(n1, n2) coded is

e0(n1, n2) = f0(n1, n2) - I[f1(n1, n2)]   (10.46)

where I[·] is the spatial interpolation operation. The interpolation process expands the support size of f1(n1, n2), and the support size of f1'(n1, n2) is the same as that of f0(n1, n2). One advantage of coding f1(n1, n2) and e0(n1, n2) rather than f0(n1, n2) is that the coder used can be adapted to the characteristics of f1(n1, n2) and e0(n1, n2). If we do not quantize f1(n1, n2) and e0(n1, n2), from (10.46) f0(n1, n2) can be recovered exactly by

f0(n1, n2) = I[f1(n1, n2)] + e0(n1, n2).   (10.47)

In image coding, f1(n1, n2) and e0(n1, n2) are quantized, and the reconstructed image f̂0(n1, n2) is obtained from (10.47) by

f̂0(n1, n2) = I[f̂1(n1, n2)] + ê0(n1, n2)   (10.48)

where f̂1(n1, n2) and ê0(n1, n2) are quantized versions of f1(n1, n2) and e0(n1, n2). If we stop here, the structure of the coding method is identical to the two-channel coder we discussed in the previous section. The image f1(n1, n2) can be viewed as the subsampled lows component fLS(n1, n2), and e0(n1, n2) can be viewed as the highs component fHS(n1, n2), in the system in Figure 10.31.

The idea that an image can be decomposed into two components with very different characteristics can also be applied to coding f1(n1, n2), the first-level image in the Gaussian pyramid. Instead of coding f1(n1, n2), we can code f2(n1, n2) and e1(n1, n2) given by

e1(n1, n2) = f1(n1, n2) - I[f2(n1, n2)].   (10.49)

This procedure, of course, can be repeated. Instead of coding f_i(n1, n2), we can code f_{i+1}(n1, n2) and e_i(n1, n2) given by

e_i(n1, n2) = f_i(n1, n2) - I[f_{i+1}(n1, n2)].   (10.50)

If we do not quantize f_{i+1}(n1, n2) and e_i(n1, n2), from (10.50) f_i(n1, n2) can be recovered exactly from f_{i+1}(n1, n2) and e_i(n1, n2) by

f_i(n1, n2) = I[f_{i+1}(n1, n2)] + e_i(n1, n2).   (10.51)

We can repeat the above process until we reach the top level of the pyramid. This is shown in Figure 10.37. Instead of coding f0(n1, n2), we code e_i(n1, n2) for 0 ≤ i ≤ K - 1 and fK(n1, n2). An example of e_i(n1, n2) for 0 ≤ i ≤ K - 1 and fK(n1, n2) for an original image f0(n1, n2) of 513 × 513 pixels for K = 4 is shown in Figure 10.38. If e_i(n1, n2) for 0 ≤ i ≤ K - 1 and fK(n1, n2) are not quantized, then f0(n1, n2) can be reconstructed exactly from them by recursively solving (10.51) for i = K - 1, K - 2, . . . , 0. Note that (10.51) is valid independent of the specific choice of the interpolation operation I[·]. Equation (10.51) can be used for reconstructing f0(n1, n2) from quantized e_i(n1, n2) for 0 ≤ i ≤ K - 1 and f̂K(n1, n2).

Figure 10.36 Example of the Gaussian pyramid representation for an image of 513 × 513 pixels with K = 4.

The images fK(n1, n2) and e_i(n1, n2) for 0 ≤ i ≤ K - 1 form a pyramid called the Laplacian pyramid, where e_i(n1, n2) is the ith-level image of the pyramid and fK(n1, n2) is the top-level image of the pyramid. From (10.50),

e0(n1, n2) = f0(n1, n2) - I[f1(n1, n2)].   (10.52)

From Figure 10.33, f1(n1, n2) is the result of subsampling f0(n1, n2) * h(n1, n2). Approximating the interpolation operation I[·] as the reversal of the subsampling operation,

e0(n1, n2) ≈ f0(n1, n2) - f0(n1, n2) * h(n1, n2).   (10.53)

Since h(n1, n2) has a lowpass character, e0(n1, n2) has a highpass character. Now consider e1(n1, n2), the first-level image of the Laplacian pyramid. Following steps similar to those that led to (10.53) and making a few additional approximations, we obtain

I[e1(n1, n2)] ≈ f0(n1, n2) * h1(n1, n2)   (10.54)

where

h1(n1, n2) = h(n1, n2) - h(n1, n2) * h(n1, n2).   (10.55)

From (10.54), the result of interpolating e1(n1, n2) such that its size is the same as f0(n1, n2) is approximately the same as the result of filtering f0(n1, n2) with h1(n1, n2).
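Equations (10.50) and (10.51) translate directly into code. In the sketch below the pyramid is built by plain subsampling and the interpolator I[·] is a simple nearest-neighbor expansion — both illustrative stand-ins, since (10.51) guarantees exact reconstruction for any choice of I[·] as long as nothing is quantized.

```python
import numpy as np

def interp(f, shape):
    # I[.]: nearest-neighbor expansion back to `shape` (any interpolator works)
    g = np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)
    return g[:shape[0], :shape[1]]

def laplacian_encode(levels):
    # e_i = f_i - I[f_{i+1}]   (10.50)
    errors = [levels[i] - interp(levels[i + 1], levels[i].shape)
              for i in range(len(levels) - 1)]
    return errors, levels[-1]          # e_0..e_{K-1} and the top level f_K

def laplacian_decode(errors, top):
    # f_i = I[f_{i+1}] + e_i   (10.51), solved for i = K-1, ..., 0
    f = top
    for i in range(len(errors) - 1, -1, -1):
        f = interp(f, errors[i].shape) + errors[i]
    return f

rng = np.random.default_rng(0)
f0 = rng.random((9, 9))
levels = [f0]                          # a stand-in Gaussian pyramid with K = 2
for _ in range(2):
    levels.append(levels[-1][::2, ::2])

errors, top = laplacian_encode(levels)
rec = laplacian_decode(errors, top)
print(np.allclose(rec, f0))            # True: exact without quantization
```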
Since h(n1, n2) is a lowpass filter, h1(n1, n2) in (10.55) is a bandpass filter. If we continue the above analysis further, we can argue that the result of repetitive interpolation of e_i(n1, n2) for 1 ≤ i ≤ K - 1 is approximately the same as the result of filtering f0(n1, n2) with a different bandpass filter. As we increase i from 1 to K - 1, the frequency response of the bandpass filter has a successively smaller effective bandwidth with successively lower passband frequencies. If h(n1, n2) has an approximately Gaussian shape, then so will h(n1, n2) * h(n1, n2). For a Gaussian-shaped h(n1, n2), from (10.55) the bandpass filter has an impulse response that is the difference of two Gaussian functions. The difference of two Gaussians can be modeled [Marr] approximately by the Laplacian of a Gaussian, hence the name "Laplacian pyramid."

From the above discussion, the pyramid coding method we discussed can be viewed as an example of subband image coding. As we have stated briefly, in subband image coding, an image is divided into different frequency bands and each band is coded with its own coder. In the pyramid coding method we discussed, the bandpass filtering operation is performed implicitly and the bandpass filters are obtained heuristically. In a typical subband image coder, the bandpass filters are designed more theoretically [Vetterli; Woods and O'Neil].

Figure 10.37 Laplacian pyramid generation. The base image f0(n1, n2) can be reconstructed from e_i(n1, n2) for 0 ≤ i ≤ K - 1 and fK(n1, n2).

Figure 10.38 Example of the Laplacian pyramid image representation with K = 4. The original image used is the 513 × 513-pixel image f0(n1, n2) in Figure 10.36. Shown are e_i(n1, n2) for 0 ≤ i ≤ 3 and f4(n1, n2).

Figure 10.39 illustrates the performance of an image coding system in which fK(n1, n2) and e_i(n1, n2) for 0 ≤ i ≤ K - 1 are coded with coders adapted to the signal characteristics. Qualitatively, higher-level images have more variance and are assigned more bits/pixel. Fortunately, however, they are smaller in size. Figure 10.39 shows an image coded at a bit rate of less than 1 bit/pixel. The original image used is the 513 × 513-pixel image f0(n1, n2) in Figure 10.36. The low bit rate was possible in this example by entropy coding and by exploiting the observation that most pixels of the 513 × 513-pixel image e0(n1, n2) are quantized to zero.

One major advantage of the pyramid-based coding method we discussed above is its suitability for progressive data transmission. By first sending the top-level image fK(n1, n2) and interpolating it at the receiver, we have a very blurred image. We then transmit e_{K-1}(n1, n2) to reconstruct f_{K-1}(n1, n2), which has a higher spatial resolution than fK(n1, n2). As we repeat the process, the reconstructed image at the receiver will have successively higher spatial resolution. In some applications, it may be possible to stop the transmission before we fully
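The bandpass claim for h1 = h - h * h can be checked numerically in one dimension. The lowpass kernel below is the illustrative Burt-style h(n) with a = 0.4 (unit gain at dc), not necessarily the filter used in the figures; its h1 then has zero response at both ω = 0 and ω = π and a single peak at mid frequencies.

```python
import numpy as np

h = np.array([0.05, 0.25, 0.40, 0.25, 0.05])   # illustrative lowpass h(n)
h1 = np.pad(h, 2) - np.convolve(h, h)          # h1 = h - h * h   (10.55)

w = np.linspace(0, np.pi, 512)                 # frequency grid on [0, pi]
n = np.arange(len(h1)) - 4                     # 9-tap kernel centered at n = 0
H1 = np.abs(np.exp(-1j * np.outer(w, n)) @ h1)

print(H1[0], H1[-1])                           # ~0 at dc and at pi: bandpass
print(w[H1.argmax()])                          # peak at a mid frequency
```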
10.4.1 Transforms
and are selected from a zone of triangular shape shown in Figure 10.43(a). From Figure 10.45, it is clear that the reconstructed image appears more blurry as we retain a smaller number of coefficients. It is also clear that an image reconstructed from only a small fraction of the transform coefficients looks quite good, illustrating the energy compaction property.

Another type of degradation results from quantization of the retained transform coefficients. The degradation in this case typically appears as graininess in the image. Figure 10.46 shows the result of coarse quantization of transform coefficients. This example is obtained by using a 2-bit uniform quantizer for each retained coefficient to reconstruct the image in Figure 10.45(b).

A third type of degradation arises from subimage-by-subimage coding. Since each subimage is coded independently, the pixels at the subimage boundaries may have artificial intensity discontinuities. This is known as the blocking effect, and it is more pronounced as the bit rate decreases. An image with a visible blocking effect is shown in Figure 10.47. A DCT with zonal coding, a subimage of 16 × 16 pixels, and a bit rate of 0.15 bit/pixel were used to generate the image in Figure 10.47.
When the bit rate is sufficiently low, the blocking effect, which results from independent coding of each subimage, becomes highly visible. Reconstructed images exhibiting blocking effects can be very unpleasant visually, and blocking effects that are clearly visible often become the dominant degradation.

Two general approaches to reducing the blocking effect have been considered. In one approach, the blocking effect is dealt with at the source. An example of this approach is the overlap method, which modifies the image segmentation process. A typical segmentation procedure divides an image into mutually exclusive regions. In the overlap method, the subimages are obtained with a slight overlap around the perimeter of each subimage. The pixels at the perimeter are coded in two or more regions. In reconstructing the image, a pixel that is coded more than once can be assigned an intensity that is the average of the coded values. Thus, abrupt boundary discontinuities caused by coding are reduced because the reconstructed subimages are woven together. An example of the overlap method is shown in Figure 10.49. In the figure, a 5 × 5-pixel image is divided into four 3
× 3-pixel subimages by using a one-pixel overlap scheme. The shaded area indicates pixels that are coded more than once. The overlap method reduces blocking effects well. However, some pixels are coded more than once, and this increases the number of pixels coded. The increase is about 13% when an image of 256 × 256 pixels is divided into 16 × 16-pixel subimages with a one-pixel overlap. This increase shows why an overlap of two or more pixels is not very useful. It also shows a difference between image coding and other image processing applications, such as image restoration, in dealing with blocking effects. As was discussed in Section 9.2.3, a blocking effect can occur in any subimage-by-subimage processing environment. In image restoration, the cost of overlapping subimages is primarily an increase in the number of computations. An overlap of 50% of the subimage size is common in subimage-by-subimage restoration. In image coding, however, the cost of overlapping subimages is an increase in the number of computations and, more seriously, a potential increase in the required bit rate. An overlap of more than one pixel is thus seldom considered in DCT image coding.

Figure 10.47 DCT-coded image with visible blocking effect.

and (b) show the results of DCT image coding at 1 bit/pixel and 4 bit/pixel, respectively. The original image is the 512 × 512-pixel image shown in Figure 10.45(a). In both examples, the subimage size used is 16 × 16 pixels, and adaptive zonal coding with the zone shape shown in Figure 10.43(b) and the zone size adapted to the local image characteristics has been used.
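The roughly 13% figure quoted above can be reproduced by counting coded pixels: 16 × 16 subimages advanced with a one-pixel overlap have a stride of 15, so a 256 × 256 image needs 17 blocks per dimension. This is a sketch of the bookkeeping only, not of any particular coder.

```python
size, block, overlap = 256, 16, 1
stride = block - overlap                        # 15: each block reuses one row/column
blocks_per_dim = (size - block) // stride + 1   # blocks start at 0, 15, ..., 240
coded = blocks_per_dim ** 2 * block ** 2        # pixels coded, counting repeats
increase = coded / size ** 2 - 1

print(blocks_per_dim, round(100 * increase, 1))  # 17 12.9

# the 5 x 5 example of Figure 10.49: four 3 x 3 subimages with one-pixel overlap
print((5 - 3) // (3 - 1) + 1)                    # 2 blocks per dimension
```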
low bit rates, while requiring simpler implementation than true 2-D transform coders.

In a hybrid coder, an image f(n1, n2) is transformed by a 1-D transform, such as a 1-D DCT, along each row (or column). The result Tf(k1, n2) is then coded by a waveform coder, such as DPCM, along each column (or row). This is illustrated in Figure 10.51. The 1-D transform decorrelates each row of data well. The remaining correlation is reduced further by DPCM. Due to the transform, the correlation in the data is reduced more than it would have been by waveform coding alone. Since a 1-D transform is used, the implementation issues such as

Figure 10.51 Hybrid transform/waveform coder. The transmitter applies a 1-D transform along each row and DPCM with codeword assignment along each column; the receiver reverses the process to reconstruct f̂(n1, n2).

Sec. 10.4 Transform Image Coding
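A minimal sketch of the hybrid structure of Figure 10.51: an orthonormal DCT-II along each row, then closed-loop DPCM with a previous-row predictor and a uniform quantizer down each column. The quantizer step, the trivial predictor, and the 8 × 8 size are illustrative choices, not the text's.

```python
import numpy as np

def dct_matrix(N):
    # orthonormal DCT-II; rows are the basis vectors
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] *= 1 / np.sqrt(N)
    C[1:] *= np.sqrt(2 / N)
    return C

def hybrid_code(f, step=0.05):
    """1-D DCT along each row, DPCM along each column, inverse DCT at the
    receiver; returns the reconstructed image."""
    C = dct_matrix(f.shape[1])
    T = f @ C.T                                  # Tf(n1, k2): transform each row
    That = np.zeros_like(T)
    for n1 in range(T.shape[0]):                 # DPCM down the columns
        pred = That[n1 - 1] if n1 > 0 else np.zeros(T.shape[1])
        e_hat = step * np.round((T[n1] - pred) / step)   # uniform quantizer
        That[n1] = pred + e_hat                  # closed loop: track the decoder
    return That @ C                              # receiver: inverse row DCT

rng = np.random.default_rng(1)
f = np.cumsum(rng.random((8, 8)), axis=0) / 8    # columns are highly correlated
g = hybrid_code(f)
print(np.max(np.abs(g - f)))                     # small reconstruction error
```

Because the DPCM loop quantizes the error against the decoder's own reconstruction, the coefficient error never exceeds half a step, and the orthonormal transform keeps the pixel-domain error of the same order.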
zone can be adapted. In addition, the number of bits allocated to each subimage and each coefficient within a subimage can be adapted. For example, we may wish to code fewer coefficients and allocate fewer bits in uniform background regions than in edge regions.

A coding method may be adapted continuously based on some measure such as the variance of pixel intensities within a subimage. If the local measure can be obtained from previously coded subimages, it does not have to be transmitted. An alternative to continuous adaptation is to classify all subimages into a small number of groups and design a specific coding method for each group. An advantage of this approach is the small number of bits necessary to code the group that a given subimage belongs to. If the number of bits/frame is fixed in adaptive coding, a control mechanism that allocates the appropriate number of bits to each subimage is required. For a variable bit rate coder, a buffer must be maintained to accommodate the variations in local bit rates.

Adaptive coding significantly improves the performance of a transform coder while adding to its complexity. Transform coding methods are used extensively in low bit rate applications where reducing the bit rate is important, even if it involves additional complexity in the coder. Many transform coders used in practice are adaptive.

In transform coding, the pixel intensities in a block are transformed jointly. In this respect, transform coding is similar to vector quantization of a waveform. The transform coefficients, however, are typically quantized by scalar quantization. It is possible to use vector quantization to quantize transform coefficients. However, the additional complexity associated with vector quantization may not justify the performance improvement that vector quantization can offer. Transform coefficients are not correlated much with each other, while correlation among scalars is a major element exploited by vector quantization.

10.5 IMAGE MODEL CODING

In image model coding, an image or some portion of an image is modeled and the model parameters are used for image synthesis. At the transmitter, the model parameters are estimated from analyzing the image, and at the receiver the image is synthesized from the estimated and quantized model parameters. An image model coder is shown in Figure 10.52. It can be viewed as an analysis/synthesis system. Image model coders have the potential to synthesize intelligible images at a bit rate substantially lower than that necessary for waveform or transform coders. However, they are still at the research stage, and much needs to be done before their use becomes feasible in very low bit rate applications, such as video telephones for the deaf. Developing a simple model capable of synthesizing intelligible images is not an easy problem. In addition, estimating model parameters and synthesizing an image from them are likely to be very expensive computationally.

Figure 10.52 Image model coder. At the transmitter, f(n1, n2) is analyzed and the model parameters are encoded; at the receiver, the quantized model parameters are decoded and f̂(n1, n2) is synthesized.

Image model coding exploits the notion that synthesizing intelligible images does not require an accurate reproduction of image intensities. For example, background regions of an image, such as grass, sky, and wall, may not be essential to image intelligibility, and we may be able to replace them with similar backgrounds that can be synthesized with a simple model. As another example, a cartoonist can draw a very intelligible image with a small number of simple lines. In image model coding, we attempt to retain the features of an image essential to its intelligibility and grossly approximate nonessential features, using simple image models. This approach contrasts sharply with waveform and transform coding, in which we attempt to accurately reconstruct the image intensity f(n1, n2). In waveform and transform coding, the difference between f(n1, n2) and the reconstructed f̂(n1, n2) comes from the quantization of the parameters, and f(n1, n2) can be reconstructed exactly if the parameters used are not quantized. In image model coding, the difference between f(n1, n2) and the synthesized f̂(n1, n2) is due both to quantization of the model parameters and to modeling error. It is generally not possible to exactly reconstruct f(n1, n2) from the model parameters, even if we do not quantize them. The number of parameters involved is likely to be much smaller in image model coding than in waveform and transform coding, indicating image model coding's potential for application to the development of very low bit rate systems.

A typical image consists of various regions with different characteristics. It is convenient to model different classes of regions with different models. Regions such as grass, water, sky, and wall have order or repetitive structure analogous to the texture of cloth. We will refer to such regions as texture regions [Brodatz]. There are two basic approaches to texture modeling. One is to use some basic elementary pattern and repeat it according to some deterministic or probabilistic rule. Another approach is to model a texture region as a random field with some given statistics. Studies indicate that when two textures have similar second-order statistics, they will usually appear to human viewers to have similar textures. Many models can be used to model texture as a random field with some given second-order statistics [Pratt et al.]. Figure 10.53 shows an example of a texture region's synthesis by a random process model. The figure shows an original image of 512 × 512 pixels with a region of 96 × 128 pixels replaced by texture synthesized by using a first-order Markov model [Cross and Jain; Cohen and Cooper] with 3 unknown model parameters. Even though more than 10,000 pixel intensities have been synthesized by using only 3 parameters, the synthesized texture blends well with the rest of the image. To use an approach of this type, some method of
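In the spirit of the random-field approach above, a causal first-order autoregressive (Gauss-Markov) model with three parameters — two prediction coefficients and a noise standard deviation — can synthesize a texture field. The model form and parameter values here are illustrative; they are not the specific model of [Cross and Jain] or [Cohen and Cooper].

```python
import numpy as np

def synthesize_texture(shape, a1=0.6, a2=0.35, sigma=1.0, seed=0):
    """Causal AR texture: f(n1, n2) = a1 f(n1-1, n2) + a2 f(n1, n2-1) + w,
    with w white Gaussian noise -- three parameters in all."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, shape)
    f = np.zeros(shape)
    for n1 in range(shape[0]):
        for n2 in range(shape[1]):
            up = f[n1 - 1, n2] if n1 > 0 else 0.0
            left = f[n1, n2 - 1] if n2 > 0 else 0.0
            f[n1, n2] = a1 * up + a2 * left + w[n1, n2]
    return f

t = synthesize_texture((96, 128))        # the region size used in Figure 10.53
# neighboring pixels come out strongly correlated, as a texture model intends
c = np.corrcoef(t[:, :-1].ravel(), t[:, 1:].ravel())[0, 1]
print(round(c, 2))
```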
10.6 INTERFRAME IMAGE CODING, COLOR IMAGE CODING, AND CHANNEL ERROR EFFECTS

high correlation between neighboring frames can be exploited in image coding. Exploiting the temporal correlation as well as spatial correlation to code a sequence of images is called interframe coding.

Interframe coding requires storage of the frames used in the coding process. If we use N previous frames in coding the current frame, then these previous N frames have to be stored. In addition, there may be some inherent delay if all N frames are needed at the same time in coding the current frame, as in transform coding. This limits the number of frames that can be used in interframe coding. In most cases, only one or two additional frames are used in coding the current frame.

One approach to interframe coding is to simply extend 2-D intraframe coding methods to 3-D interframe coding. In DPCM waveform coding, for example, we can predict the current pixel intensity being coded as a linear combination of previously coded neighborhood pixel intensities in both the current and the previous frames. Transform coding methods can also be extended in a straightforward manner. A strong temporal domain correlation will manifest itself as an energy concentration in the low temporal-frequency regions, and transform coefficients in the high temporal-frequency regions may be discarded without seriously distorting the image frame intensities. Some studies [Pratt] indicate that in typical scenes, the bit rate can be reduced by a factor of five without sacrificing intelligibility or quality if a 3-D DCT with a subimage size of 16 × 16 × 16 pixels is used instead of a 2-D DCT with a subimage size of 16 × 16 pixels. In practice, the storage requirements and delay involved make using sixteen frames difficult in most cases.

Hybrid transform/waveform coding can also be extended to interframe coding. We can compute a 2-D transform for each frame and then apply a waveform coder such as DPCM along the temporal dimension, as shown in Figure 10.55. The sequence f(n1, n2, n3) in the figure represents the intensities of image frames, with n1 and n2 denoting two spatial variables and n3 denoting the time variable. At each n3, a 2-D transform is computed with respect to variables n1 and n2, and the result is denoted by Tf(k1, k2, n3). At each k1 and k2, a waveform coder is used to quantize Tf(k1, k2, n3). The result is T̂f(k1, k2, n3). The process is reversed at the decoder. The 2-D transform used is typically the DCT, and the waveform coder used is typically DPCM.

Figure 10.55 Interframe hybrid coder: f(n1, n2, n3) → 2-D transform for each n3 → Tf(k1, k2, n3) → waveform coding along n3 at each (k1, k2) → T̂f(k1, k2, n3).

Hybrid coders have several advantages over transform coders in interframe coding. When we are restricted to using only a small number of frames, which is typically the case in practice, the advantage of transform coding along the temporal dimension over waveform coding in terms of correlation reduction and energy compaction diminishes significantly. In hybrid coding, 2-D transforms are first computed. Since many coefficients are discarded at this point, waveform coding is applied to only a fraction of transform coefficients. In transform coding, however, all the transform coefficients have to be computed first. In addition, transform coding imposes an inherent delay, because computing any one transform coefficient requires all the frames involved. In hybrid coding, the current frame is predicted from one or two previously coded frames and significant delay is not necessary.

Interframe hybrid coding can be viewed as an example of dimension-dependent processing, which was discussed in Section 6.2.2. In dimension-dependent processing, a different processing method is applied to each different dimension, chosen on the basis of the dimension's particular characteristics. Many data points are available along the two spatial dimensions, and transform coding, which exploits the correlation among the data points, is used. Along the temporal dimension, only a few data points are typically available, and waveform coding is used.

The frame replenishment method, which is related to DPCM, codes the difference between the current frame and the previously coded frame. Let f(n1, n2, n3) represent the current frame and f(n1, n2, n3 - 1) represent the previously coded frame. In the simplest form of the frame replenishment method, f(n1, n2, n3) is predicted as f(n1, n2, n3 - 1), and e(n1, n2, n3) = f(n1, n2, n3) - f(n1, n2, n3 - 1) is quantized. Since |e(n1, n2, n3)| is typically very small except in the small regions where there is motion, only e(n1, n2, n3) with magnitude above a certain threshold is coded along with its spatial location. At the decoder, the quantized e(n1, n2, n3) is combined with f(n1, n2, n3 - 1) to reconstruct the current frame f(n1, n2, n3). Since the number of pixels at which e(n1, n2, n3) is retained depends on the local frames involved, a buffer must be established to smooth out the higher-than-average data rates in frames with large motion and the lower-than-average data rates in frames with little motion. An example of the frame replenishment method applied to bi-level images is shown in Figure 10.56. Figure 10.56(a) shows a sequence of sixteen original image frames. The size of each frame is 128 × 128 pixels. Each of the sixteen frames is mapped to a bi-level image by using the method described in Section 10.5, and the resulting bi-level images are coded by using the frame replenishment method. The result at a bit rate of 0.08 bit/pixel is shown in Figure 10.56(b). When a frame rate of 15 frames/sec is used, the resulting bit rate is approximately 20 kbit/sec. We note that it is very difficult to

Sec. 10.6 Interframe Image Coding 661
Figure 10.56 Example of the frame replenishment method applied to bi-level image coding. (a) 16 original frames. The size of each frame is 128 × 128 pixels. The sequence of the frames is from left to right (first row), right to left (second row), left to right (third row), and right to left (fourth row). (b) Reconstructed frames at 0.08 bit/pixel by application of the frame replenishment method to the bi-level images obtained from the gray-level images in (a). The method used to generate bi-level images is the same as that used in Figure 10.54.

visualize from Figure 10.56 how the frames will appear when displayed in a sequence. For example, a random fluctuation of intensity from one frame to the next in the same local region may not be annoying when the frames are viewed one frame at a time. However, when the frames are displayed as a video sequence, the random fluctuations may appear as an annoying flicker in the local region. In addition, degradations such as the dirty window effect and crawling, which are due to correlation between frames, may not be that apparent when one frame at a time is viewed.

One way to improve the performance of this frame replenishment method is to predict the current frame f(n1, n2, n3) by using motion estimation algorithms. Specifically, we can form e(n1, n2, n3) by e(n1, n2, n3) = f(n1, n2, n3) - f(n1 - d1, n2 - d2, n3 - 1), where d1 and d2 are the horizontal and vertical displacements, which are functions of pixel location. To the extent that the intensity change between the current frame and the previously coded frame is due to motion that can be modeled as translational motion at least at the local level, and that the
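The motion-compensated prediction above can be sketched with full-search block matching, one common way to estimate (d1, d2) that the text leaves open; the sum-of-absolute-differences criterion, block size, and search range are illustrative choices.

```python
import numpy as np

def estimate_displacement(block, prev, top, left, search=3):
    """Find (d1, d2) minimizing the sum of absolute differences between the
    current block and the displaced block in the previous frame."""
    b1, b2 = block.shape
    best, best_d = np.inf, (0, 0)
    for d1 in range(-search, search + 1):
        for d2 in range(-search, search + 1):
            r, c = top - d1, left - d2
            if 0 <= r and 0 <= c and r + b1 <= prev.shape[0] and c + b2 <= prev.shape[1]:
                sad = np.abs(block - prev[r:r + b1, c:c + b2]).sum()
                if sad < best:
                    best, best_d = sad, (d1, d2)
    return best_d

rng = np.random.default_rng(3)
prev = rng.random((16, 16))
cur = np.roll(np.roll(prev, 1, axis=0), 2, axis=1)   # frame shifted by (1, 2)

d = estimate_displacement(cur[8:12, 8:12], prev, 8, 8)
print(d)                                             # recovers (1, 2)

# e(n1, n2, n3) = f(n1, n2, n3) - f(n1 - d1, n2 - d2, n3 - 1) for this block
e = cur[8:12, 8:12] - prev[8 - d[0]:12 - d[0], 8 - d[1]:12 - d[1]]
print(np.abs(e).max())                               # 0: nothing need be sent
```

With a correct displacement estimate the motion-compensated difference is zero for purely translational motion, so no pixel in the block exceeds the replenishment threshold.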
10.2. Let f be a random variable with a probability density function pf(f0). We wish to quantize f. One way to choose the reconstruction levels rk, 1 ≤ k ≤ L, and the decision boundaries dk, 0 ≤ k ≤ L, is to minimize the average distortion D given by

D = E[(f - f̂)²]

where f̂ is the result of quantization. Show that a necessary set of conditions that rk and dk have to satisfy is given by Equation (10.9).
10.3. Let f be a random variable with a Gaussian probability density function with mean of 3 and standard deviation of 5. We wish to quantize f to four reconstruction levels by minimizing E[(f - f̂)²], where f̂ is the result of quantization. Determine the reconstruction levels and decision boundaries.
10.4. Consider a random variable f with a probability density function pf(f0). We wish to map f to g by a nonlinearity such that pg(g0), the probability density function of g, is uniform.

Figure P10.7

10.5. Let f denote a random variable with a probability density function pf(f0) given by

pf(f0) = e^(-f0),  f0 ≥ 0
       = 0,  otherwise.
10.6. We wish to quantize, with a total of 7 bits, two scalars f1 and f2, which are Gaussian random variables. The variances of f1 and f2 are 1 and 4, respectively. Determine how many (integer number) bits we should allocate to each of f1 and f2, under

Figure P10.8
Chap. 10 Problems
Determine the cells Ci, 1 ≤ i ≤ 3. Note that if f is in Ci, it is quantized to ri.

10.9. Let f = (f1, f2)^T denote a vector to be quantized by using vector quantization. Suppose the four training vectors used for the codebook design are represented by the filled-in dots in the following figure.

Figure P10.9

Design the codebook using the K-means algorithm with K = 2 (two reconstruction levels) with the average distortion D given by

D = E[(f̂ - f)^T (f̂ - f)]

where f̂ is the result of quantizing f.

10.10. Consider a message with L different possibilities. The probability of each possibility is denoted by Pi, 1 ≤ i ≤ L. Let H denote the entropy.
(a) Show that 0 ≤ H ≤ log2 L.
(b) For L = 4, determine one possible set of Pi such that H = 0.
(c) Answer (b) with H = log2 L.
(d) Answer (b) with H = log2 L.

10.11. Suppose we wish to code the intensity of a pixel. Let f denote the pixel intensity. The five reconstruction levels for f are denoted by ri, 1 ≤ i ≤ 5. The probability that f will be quantized to ri is Pi, and it is given in the following table.
(a) Suppose we wish to minimize the average bit rate in coding f. What is the average number of bits required? Design a set of codewords that will achieve this average bit rate. Assume scalar quantization.
(b) Determine the entropy H.
(c) Suppose we have many pixel intensities that have the same statistics as f. Discuss a method that can be used in coding the pixel intensities with the required average bit rate being as close as possible to the entropy.
(d) What codebook size is needed if the method you developed in (c) is used?

10.12. Suppose we have an image intensity f, which can be viewed as a random variable whose probability density function pf(f0) is shown in the following figure.

Figure P10.12

Given the number of reconstruction levels, the reconstruction levels and decision boundaries are chosen by minimizing

Error = E[(f - f̂)²]

where E[·] denotes expectation and f̂ denotes the reconstructed intensity. We wish to minimize the Error defined above and at the same time keep the average number of bits required to code f below 2.45 bits. Determine the number of reconstruction levels, the specific values of the reconstruction levels and decision boundaries, and the codewords assigned to each reconstruction level.

10.13. Consider an image intensity f which can be modeled as a sample obtained from the probability density function sketched below:

Figure P10.13

Suppose four reconstruction levels are assigned to quantize the intensity f. The reconstruction levels are obtained by using a uniform quantizer.
(a) Determine the codeword to be assigned to each of the four reconstruction levels such that the average number of bits in coding f is minimized. Specify what the reconstruction level is for each codeword.
(b) For your codeword assignment in (a), determine the average number of bits required to represent f.
If more than one pixel is coded, we will code each one separately.
(a) What is the minimum average bit rate that could have been achieved if the true
probability Pi had been known?
(b) What is the actual average bit rate achieved? Assume that the codewords were designed to minimize the average bit rate under the assumption that Pi was accurate.
10.15. Let f(n1, n2) denote a zero-mean stationary random process with a correlation function Rf(n1, n2) given by
(a) In a PCM system, f(n1, n2) is quantized directly. Determine σf², the variance of f(n1, n2).
(b) In a DPCM system, we quantize the prediction error e1(n1, n2) given by

e1(n1, n2) = f(n1, n2) - a f̂(n1 - 1, n2) - b f̂(n1, n2 - 1) - c f̂(n1 - 1, n2 - 1)

where a, b, and c are constants, and f̂(n1 - 1, n2), f̂(n1, n2 - 1), and f̂(n1 - 1, n2 - 1) are the quantized intensities of previously coded pixels. Determine a reasonable choice of a, b, and c. With your choice of a, b, and c, determine E[e1²(n1, n2)]. Clearly state any assumptions you make.
(d) Compare σf², E[e1²(n1, n2)], and E[e2²(n1, n2)], obtained from (a), (b), and (c). Based on this comparison, which of the following three expressions would you code if you wished to minimize the average bit rate at a given level of distortion: f(n1, n2), e1(n1, n2), or e2(n1, n2)?
(e) State significant advantages of quantizing f(n1, n2) over quantizing e1(n1, n2) and
10.16. Consider the following two different implementations of a two-channe! image coder.
In Figure P10.16, f(n,, n,) is the image, f,(n,, n,) is the lows component, and
where f(n1, n2) is an image, a(n1, n2; k1, k2) are basis functions, and Tf(k1, k2) are
transform coefficients. If a(n1, n2; k1, k2) is separable, so that a(n1, n2; k1, k2) =
a1(n1; k1) a2(n2; k2) and Tf(k1, k2) can be expressed as

Tf(k1, k2) = Σn1 Σn2 f(n1, n2) a1(n1; k1) a2(n2; k2),

the required number of computations can be reduced significantly.
(a) Determine the number of arithmetic operations required in computing Tf(k1, k2)
for the case when a(n1, n2; k1, k2) is not separable.
(b) Determine the number of arithmetic operations required in computing Tf(k1, k2)
for the case when a(n1, n2; k1, k2) is separable and the row-column decomposition
method is used in computing Tf(k1, k2).
10.20. Let f(n1, n2) denote an N1 x N2-point sequence, and let Cf(k1, k2) denote the discrete
cosine transform (DCT) of f(n1, n2). Show that discarding some coefficients
Cf(k1, k2) with small amplitude does not significantly affect Σn1 Σn2 (f(n1, n2)
- f̂(n1, n2))², where f̂(n1, n2) is the image reconstructed from Cf(k1, k2) with small
amplitude coefficients set to zero.
10.21. Let f(n1, n2) denote a stationary random process with mean of mf and correlation
function Rf(n1, n2) given by

Rf(n1, n2) = p^(|n1|+|n2|) + mf², where 0 < p < 1.

Let fw(n1, n2) denote the result of applying a 2 x 2-point rectangular window to
f(n1, n2) so that fw(n1, n2) = f(n1, n2) for 0 ≤ n1 ≤ 1, 0 ≤ n2 ≤ 1, and
fw(n1, n2) = 0 otherwise.
Let Fw(k1, k2) denote the 2 x 2-point DFT of fw(n1, n2).

Figure P10.22
For a 2 bits/pixel DCT coder, determine a good bit allocation map that can be used
in quantizing the DCT coefficients.
10.23. In a subband image coder, we divide an image into many bands (typically 16 bands)
by using bandpass filters and then code each band with a coder specifically adapted
to it. Explain how we can interpret the discrete Fourier transform coder as a subband
coder with a large number of bands.
10.24. The notion that we can discard (set to zero) variables with low amplitude without
creating a large error between the original and reconstructed value of the variable
is applicable not only to transform coefficients but also to image intensities. For
example, if the image intensity is small (close to zero), setting it to zero does not
create a large error between the original and reconstructed intensities. Zonal coding
and threshold coding are two methods of discarding variables with low amplitudes.
(a) Why is zonal coding useful for transform coding but not for waveform coding?
(b) Why is threshold coding useful for transform coding but not for waveform coding?
10.25. In some applications, detailed knowledge of an image may be available and may be
exploited in developing a very low bit rate image coding system. Consider a video
telephone application. In one system that has been proposed, it is assumed that the
primary changes that occur in the images will be in the eye and mouth regions.
Suppose we have stored at both the transmitter and the receiver the overall face and
many possible shapes of the left eye, the right eye, and the mouth of the video
telephone user. At the transmitter, an image frame is analyzed, and the stored eye
shape and mouth shape closest to those in the current frame are identified. The
identification numbers are transmitted. At the receiver, the stored images of the
eyes and mouth are used to create the current frame.
(a) Suppose 100 different images are stored for each of the right eye, the left eye,
and the mouth. What is the bit rate/sec of this system? Assume a frame rate
of 30 frames/sec.
(b) What type of performance would you expect from such a system?
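The arithmetic for part (a) can be set up as follows. This is one way to do the computation, assuming fixed-length codewords (ceil(log2 100) = 7 bits per identification number) and three identification numbers (right eye, left eye, mouth) sent per frame; it is a sketch, not the book's solution.

```python
import math

shapes_per_feature = 100          # stored shapes for each of right eye, left eye, mouth
bits_per_id = math.ceil(math.log2(shapes_per_feature))  # 7 bits with fixed-length codewords
features_per_frame = 3            # right eye, left eye, mouth
frame_rate = 30                   # frames/sec

bit_rate = bits_per_id * features_per_frame * frame_rate
print(bit_rate)  # -> 630 bits/sec
```

The striking point is the scale: a few hundred bits/sec, versus the tens of kilobits/sec or more of the waveform and transform coders discussed in this chapter.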
10.26. Let an N1 x N2 x N3-point sequence f(n1, n2, n3) denote a sequence of frames,
where n1 and n2 represent the spatial variables and n3 represents the time variable.
Let F(k1, k2, k3) denote the N1 x N2 x N3-point discrete Fourier transform (DFT)
of f(n1, n2, n3).
(a) Suppose f(n1, n2, n3) does not depend on n3. What are the characteristics of
F(k1, k2, k3)?
(b) Suppose f(n1, n2, n3) = f(n1 - n3, n2 - n3, 0). What are the characteristics of
F(k1, k2, k3)?
(c) Discuss how the results in (a) and (b) may be used in the 3-D transform coding
of f(n1, n2, n3).
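The structure asked about in (a) and (b) can be checked numerically. The sketch below is an illustration rather than the solution: it uses N1 = N2 = N3 = 8 and, for (b), interprets the uniform translation circularly so that the DFT shift property applies exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
g = rng.standard_normal((N, N))   # one frame

# (a) f(n1, n2, n3) does not depend on n3: every frame equals g.
f_a = np.repeat(g[:, :, None], N, axis=2)
F_a = np.fft.fftn(f_a)
print(np.abs(F_a[:, :, 1:]).max())  # ~0: all energy lies in the k3 = 0 plane

# (b) f(n1, n2, n3) = f(n1 - n3, n2 - n3, 0), taken as a circular shift per frame.
f_b = np.stack([np.roll(g, (n3, n3), axis=(0, 1)) for n3 in range(N)], axis=2)
F_b = np.fft.fftn(f_b)
k1, k2, k3 = np.meshgrid(*(np.arange(N),) * 3, indexing="ij")
off_plane = (k1 + k2 + k3) % N != 0
print(np.abs(F_b[off_plane]).max())  # ~0: energy lies on the plane (k1 + k2 + k3) mod N = 0
```

In both cases the nonzero DFT coefficients occupy a single plane of the N³-point cube, which is what makes 3-D transform coding attractive for temporally redundant frame sequences.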
10.27. Let f(n1, n2, n3) for 0 ≤ n3 ≤ 4 denote five consecutive image frames in a motion
picture. The variables (n1, n2) are spatial variables. Due to the high temporal
correlation, we have decided to transmit only f(n1, n2, 0), f(n1, n2, 4), and the
displacement vectors dx(n1, n2) and dy(n1, n2) that represent the translational motion
in the five frames. We assume that the motion present can be modeled approximately
by uniform velocity translation in a small local spatio-temporal region. At the
receiver, the frames f(n1, n2, 1), f(n1, n2, 2), and f(n1, n2, 3) are created by inter-
polation. Since dx(n1, n2) and dy(n1, n2) do not typically vary much spatially, the
hope is that coding dx(n1, n2) and dy(n1, n2) will be easier than coding the three
frames.
(a) One method of determining dx(n1, n2) and dy(n1, n2) is to use only f(n1, n2, 0)
and f(n1, n2, 4). An alternate method is to use all five frames. Which method
would lead to a better reconstruction of the three frames f(n1, n2, 1), f(n1, n2, 2),
and f(n1, n2, 3)?
(b) Determine one reasonable error criterion that could be used to estimate dx(n1, n2)
and dy(n1, n2) from all five frames. Assume that we use a region-matching
method for motion estimation.
10.28. Let f(n1, n2) denote an image intensity. When f(n1, n2) is transmitted over a com-
munication channel, the channel may introduce some errors. We assume that the
effect of channel error is a bit reversal from 0 to 1 or from 1 to 0 with probability
P.
(a) Suppose we code f(n1, n2) with a PCM system with 8 bits/pixel. When P =
10^-, determine the expected percentage of pixels affected by channel errors.
(b) Suppose we code f(n1, n2) with a PCM system with 4 bits/pixel. When P =
10^-, determine the expected percentage of pixels affected by channel errors.
(c) Based on the results of (a) and (b), for a given coding method and a given P,
does the channel error affect more pixels in a high bit rate system or in a low
bit rate system?
(d) Suppose we code f(n1, n2) with a DCT image coder with an average bit rate of
1 bit/pixel. When P = 10^-, determine the expected percentage of pixels af-
fected by the channel error. Assume that a subimage size of 8 x 8 pixels is
used.

Index

AR (auto-regressive) signal modeling, 369, 371-73, 375, 377
Accommodation, 424-25
Adaptive image processing, 533-36
Additive color systems, 418-19
Additive random noise, reduction of:
  adaptive image processing, 533-36
  adaptive restoration based on noise visibility function, 540-43
  adaptive Wiener filter, 536-39
  edge-sensitive adaptive image restoration, 546-49
  image degradation and, 559-62
  short-space spectral subtraction, 545-46
  variations of Wiener filtering, 531-33
  Wiener filtering, 527-33
Additive signal-dependent noise, 562
Additive signal-independent noise, 562-63, 565
Algebraic methods, stability testing and, 121
Algebraic tests, 123
Alternation frequencies, 239
Alternation theorem, 239
Analog signals:
  digital processing of, 45-49
  examples of, 1
Argument principle, 122
ARMA (auto-regressive moving-average) modeling, 269
Associativity property, 14
Auto-correlation function, 349
Auto-covariance function, 349
Auto-regressive (AR) signal modeling, 369, 371-73, 375, 377
Auto-regressive moving-average (ARMA) modeling, 269
Bessel function, 203, 204
Bilinear interpolation, 496
Binary search, tree codebook and, 609-11
Bit allocation, 648-49
Blind deconvolution, algorithms for, 553, 555-59
Block-by-block processing, 534
Blocking effect, 534, 651, 653-54
Block matching methods, 501
Block quantization, 592
Edge-sensitive adaptive image restoration, 546-49
Edge thinning, 478
Eklundh's method, 169-72
Electromagnetic wave, light as an, 413-14
Energy compaction property, 642
Energy relationship:
  discrete cosine transform and, 158
  discrete-space cosine transform and, 162
Entropy coding method, 613-16
Equiripple filters, 240
Even symmetrical discrete cosine transform, 149
Exponential sequences, 6
  discrete Fourier series and, 137
Eye, human, 423-28
False color, 511-12
Fast Fourier transform (FFT):
  minicomputer implementation of row-column decomposition, 166-72
  row-column decomposition, 163-66
  transform image coding and, 645
  vector radix, 172-77
FFT. See Fast Fourier transform
Filtered back-projection method, 44-45
Finite impulse response (FIR) filters:
  compared with infinite impulse response filters, 264, 267, 330
  design by frequency sampling method, 213-15
  design by frequency transformation method, 215-35
  design by window method, 202-13
  implementation of general, 245-47
  implementation of, by cascade, 248-50
  implementation of, by frequency transformation, 247-48
  optimal filter design, 238-45
  specification, 199-202
  zero-phase filters, 196-99
Flickering, 436-37
Fourier transform:
  discrete-space, 22
  examples, 24, 26-31
  filtered back-projection method, 44-45
  inverse, 22
  pair, 22-24
  projection-slice theorem, 42-45
  properties, 24, 25
  signal synthesis and reconstruction from phase or magnitude, 31-39
  of typical images, 39-42
  See also Discrete Fourier transform; Fast Fourier transform
Frequency domain design:
  design approaches, 309-13
  zero-phase design, 313-15
Frequency sampling method of design for finite impulse response filters, 213-17
Frequency transformation method of design for finite impulse response filters:
  basic idea of, 218-20
  design method one, 220-27
  design method two, 227-37
Frequency transformation method of design for optimal filters, 244-45
Full search, 609
Fusion frequency, 436
Gaussian pyramid representation, 634, 636-40
Global states, 327
Granular noise, 625
Gray level of a black-and-white image, 420
Gray scale modification, 453-59
Haar condition, 239, 241
Haar transforms, 646-47
Hadamard transforms, 646-47
Hamming window, 204, 206
Hankel transform, 203
Highpass filtering, 459, 462-63
Histogram equalization, 458
Histogram modification, 455
Homomorphic processing, 463-65
Horizontal state variables, 326
Huang's theorem, 113-15
Hue, light and, 414, 417
Huffman coding, 614-16
Hybrid transform coding, 654-55
Ideal lowpass filter, 29
IIR. See Infinite impulse response filters
Image coding:
  channel error effects, 665-67
  codeword assignment, 612-17
  color, 664-65
  description of, 412, 589-91
  image model coding, 590, 656-59
  interframe, 660-64
  quantization, 591-611
  transform, 642-56
  waveform coding, 617-42
Image enhancement:
  description of, 411, 412, 451-53
  edge detection, 476-93
  false color and pseudocolor, 511-12
  gray scale modification, 453-59
  highpass filtering and unsharp masking, 459, 462-63
  homomorphic processing, 463-65
  image interpolation, 495-97
  modification of local contrast and local luminance mean, 465-68
  motion estimation, 497-507
  noise smoothing, 468-76
Image interpolation, 495-97
Image model coding, 590, 656-59
Image processing:
  adaptive, 533-36
  applications, 410-11
  categories of, 411-12
  human visual system and, 423-29
  light and, 413-23
  visual phenomena and, 429-37
Image processing systems:
  digitizer, 438-41
  display, 442-43
  overview of, 437-38
Image restoration:
  degradation estimation, 525-27
  description of, 411-12, 524-25
  reduction of additive random noise, 527-49, 559-62
  reduction of image blurring, 549-62
  reduction of signal-dependent noise, 562-68
  temporal filtering for, 568-75
Image understanding, description of, 412
Impulses, 3-5
Indirect signal modeling methods, 271
Infinite impulse response (IIR) filters, 195
  compared with finite impulse response filters, 264, 267, 330
  design problems, 265-68
  frequency domain design, 309-15
  implementation, 315-30
  one-dimensional complex cepstrum, 292-301
  spatial domain design, 268-91
Infinite impulse response (IIR) filters (cont.)
  stabilization of unstable filters, 304-9
  two-dimensional complex cepstrum, 301-4
Information-preserving, 589
Infrared radiation, 414
Initial value and DC value theorem:
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  Fourier transform and, 25
Input mask, 86
Intensity discrimination, 429-31
Intensity of a black-and-white image, 420
Interframe image coding, 660-64
Interpolation, image, 495-97
Inverse discrete cosine transform, computation of, 156-57
Inverse discrete-space Fourier transform, 22
Inverse filtering, 549, 552-53
Inverse z-transform, 76-78
Irradiance, 413
Iterative algorithms, 276-78
Iterative prefiltering method, 277
Kaiser window, 204, 206
Karhunen-Loève (KL) transform, 644-45
K-means algorithm, 607-9
Lag, 440
Lambert-Beer law, 43
Laplacian pyramid representation, 639-40
LBG algorithm, 607-9
Light:
  additive and subtractive color systems, 418-20
  brightness, hue, and saturation, 414-18
  as an electromagnetic wave, 413-14
  representation of monochrome and color images, 420-23
  sources, 413
Linear closed-form algorithms, 270-76
Linear constant coefficient difference equations:
  comparison of one- and two-dimensional, 79
  as linear shift-invariant systems, 83-93
  recursive computability, 93-101
  system functions for, 101-2
  uses for, 78
  with boundary conditions, 79-83
Linearity:
  discrete cosine transform and, 158
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  discrete-space cosine transform and, 162
  Fourier transform and, 25
  relation between circular convolution and, 142
  z-transform and, 76
Linear mapping of variables, z-transform and, 76
Linear shift-invariant (LSI) systems:
  convolution properties and, 14
  exponential sequences and, 6, 23-24
  frequency response of, 24
  input-output relation, 13-14
  linear constant coefficient difference equations as, 83-93
  quadrant support sequence and, 20-21
  separable sequences and, 16-20
  special support systems and, 20-22
  stability of, 20
  wedge support sequence and, 21-22
Linear systems, 12-14
Lloyd-Max quantizer, 594
Local contrast and local luminance mean, modification of, 465-68
Local states, 327
Lowpass filter, specification of a, 201-2
Lowpass filtering, noise smoothing and, 468-69
LSI. See Linear shift-invariant systems
Luminance, 415-17, 420
Luminance-chrominance, 421-23
Luminescence, 442
McClellan transformation, 220
Mach band effect and spatial frequency response, 432-34
Magnitude-retrieval problem, 34
MAP (maximum a posteriori) estimation, 356, 357, 358
Marden-Jury test, 121-22
Maximum a posteriori (MAP) estimation, 356, 357, 358
Maximum entropy method (MEM), 377-81
Maximum likelihood (ML) estimation, 356, 357
Maximum likelihood method (MLM) estimation, 365-67, 369
Median filtering, 469-76
MEM (maximum entropy method), 377-81
Minimal realization, 321
Minimum mean square error (MMSE) estimation, 356, 357, 358
Min-max problem, 236, 268
ML (maximum likelihood) estimation, 356, 357
MLM (maximum likelihood method) estimation, 365-67, 369
MMSE (minimum mean square error) estimation, 356, 357, 358
Monochromatic light, 415
Motion-compensated image processing, 498
Motion-compensated image restoration, 570, 573-75
Motion-compensated temporal interpolation, 507-9
Motion estimation:
  description of, 497-500
  region matching methods, 500-503
  spatial interpolation and, 509-11
  spatio-temporal constraint methods, 503-7
Motion rendition, 437
Multiplication:
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  Fourier transform and, 25
National Television Systems Committee (NTSC), 436-37, 441
Newton-Raphson (NR) method, 280-81
Noise smoothing:
  lowpass filtering, 468-69
  median filtering, 469-76
  out-range pixel smoothing, 476
Noise visibility function, 540-43
Noncausal Wiener filter, 354-56
Nondirectional edge detector, 480
Nonessential singularities of the second kind, 103
NR (Newton-Raphson) method, 280-81
NTSC (National Television Systems Committee), 436-37, 441
Observation equation, 326
One-dimensional:
  comparison of one- and two-dimensional linear constant coefficient difference equations, 79
  comparison of one- and two-dimensional optimal filter design, 240-43
  complex cepstrum, 292-301
  discrete cosine transform, 149-53
  discrete Fourier transform and fast Fourier algorithms, 177-82
  discrete-space cosine transform, 158-60
  optimal filter design, 238-40
  stability tests, 119-23
Optimal filter design:
  comparison of one- and two-dimensional, 240-43
  by frequency transformation, 244-45
  summary of one-dimensional, 238-40
Output mask, 86, 95
Out-range pixel smoothing, 476
Overlap-add and overlap-save methods, 145-48
Pade approximation, 271-73
Parallel form implementation, 329-30
Parks-McClellan algorithm, 240
Parseval's theorem:
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  Fourier transform and, 25
Passband region, 201
Passband tolerance, 202
PCM. See Pulse code modulation
Periodic convolution, discrete Fourier series and, 138
Periodic sequences, 8-9
  discrete Fourier series and, 136-40
Periodogram, 362
PFA (prime factor algorithm), 178-82
Phase, unwrapping the, 122
Phase-retrieval problem, 37
Phi phenomenon, 437
Photo-conductive sensors, 440
Photometric quantities, 415
Pixel-by-pixel processing, 534
Planar least squares inverse method, 308-9
Point spread function, 527
Polynomial interpolation, 496-97
Polynomial transform, 166
Power spectra/spectral densities, 354
Power spectrum:
  filtering, 531
  of a stationary random process, 350-51
Primary light source, 413
Prime factor algorithm (PFA), 178-82
Principle of superposition, 12
Projection-slice theorem, 42-45
Prony's method, 273-75
Pseudocolor, 511-12
Pulse code modulation (PCM):
  description of basic, 618-19
  with nonuniform quantization, 619-20
  Roberts's pseudonoise technique, 620-21
Pyramid coding, 632-34, 636-40
Pyramid processing, 506
Quadrant support sequence, 20-21
Quantization:
  adaptive coding and vector, 640-42, 655-56
  block, 592
  joint optimization of codeword assignment and, 616-17
  Lloyd-Max, 594
  noise, 592
  pulse code modulation with nonuniform, 619-20
  scalar, 591-98
  vector, 598-611
Radiant flux, 413
Radiometric units, 414
Radon transform, 43
Random processes, 349-52
Random signals as inputs to linear systems, 352-54
Random variables, 347-49
Raster, 438
Real random process, 349
Reconstruction codebook, 606
Rectangular window, 204
Recursively computable systems, 22
Recursive methods, 502-3
Region matching methods, 500-503
Region of convergence:
  definition of, 66
  properties of, 72-74
Relative luminous efficiency, 415, 416
Roberts's edge detection method, 483
Roberts's pseudonoise technique, 592, 620-21
ROC. See Region of convergence
Root map, 111-13
Root signal, 475
Row-column decomposition, 163-66
Saturation, light and, 414, 418
Scalar quantization, 591-98
Screen persistence, 442
Secondary light source, 413
Separable ideal lowpass filter, 29
Separable median filtering, 473
Separable sequences, 6-8
  convolution and, 16-20
  discrete cosine transform and, 158
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  discrete-space cosine transform and, 162
  Fourier transform and, 25
  z-transform and, 76
Sequences. See under Discrete-space signals; Separable sequences; Shift of a sequence
Shanks's conjecture, 309
Shanks's theorem, 106-13
Shift-invariant (SI) systems, 12-14
Shift of a sequence:
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  Fourier transform and, 25
  z-transform and, 76
Short-space spectral subtraction, 545-46
SI (shift-invariant) systems, 12-14
Signal-dependent noise, reduction of:
  in the signal domain, 565-68
  transformation to additive signal-independent noise, 562-63, 565
Signals, types of, 1
Slope overload distortion, 625-27
Sobel's edge detection method, 482
Solid-state sensors, 440-41
Source coder, 590
Space invariance of a system, 12-14
Spatial domain design, 268
  auto-regressive moving-average modeling, 269
  descent algorithms, 278-83
  examples of, 285-91
  iterative algorithms, 276-78
Spatial domain design (cont.)
  linear closed-form algorithms, 270-76
  zero-phase filter design, 283-85
Spatial frequency response, mach band effect and, 432-34
Spatial interpolation, 495-97
  motion estimation methods and, 509-11
Spatial masking, 434-35
Spatial resolution, 440, 441
Spatio-temporal constraint equation, 499
Spatio-temporal constraint methods, 503-7
Special support systems, 20-22
Spectral estimation, random processes and, 347-58
Spectral estimation methods:
  application of, 391-97
  based on autoregressive signal modeling, 369, 371-73, 375, 377
  conventional methods, 361-63
  data or correlation extension, 377
  dimension-dependent processing, 363, 365
  estimating correlation, 388-89, 391
  maximum entropy method, 377-81
  maximum likelihood method, 365-67, 369
  modern or high resolution, 363
  performance comparisons of, 384-88
Spectral subtraction, short-space, 545-46
Stability:
  algorithms for testing, 117-19
  one-dimensional tests, 119-24
  problem of testing, 102-5
  theorems, 105-17
Stabilization of unstable filters, 304
  by the complex cepstrum method, 306-8
  by the planar least squares inverse method, 308-9
Stable systems, 20
State-space representation, 325-28
Stationary random process, 349-50
Statistical coding, 613
Statistical parameter estimation, 356-58
Steepest descent method, 279-80
  with accelerated convergence method, 281
Stopband region, 201
Stopband tolerance, 202
Subband signal coding, 632, 639
Subimage-by-subimage coding, 647
Subimage-by-subimage processing, 534
Subtractive color systems, 419-20
Symmetry properties:
  discrete cosine transform and, 158
  discrete Fourier series and, 138
  discrete Fourier transform and, 142
  discrete-space cosine transform and, 162
  Fourier transform and, 25
  z-transform and, 76
Systems:
  convolution, 14-20
  linear, 12-14
  purpose of, 1-2
  shift-invariant, 12-14
  special support, 20-22
  stable, 20
  See also Discrete-space signals, two-dimensional; Linear shift-invariant systems
Television images, improving, 410-11
Temporal filtering for image restoration:
  frame averaging, 568-70
  motion-compensated, 570, 573-75
Threshold coding, 647-48
Tolerance scheme, 201
Transform image coding:
  adaptive coding and vector quantization, 655-56
  description of, 42, 590, 642
  hybrid, 654-55
  implementation considerations and examples, 647-52
  properties of, 642-44
  reduction of blocking effect, 653-54
  type of coders, 644-47
Transition band, 202
Tree codebook and binary search, 609-11
Tristimulus values, 420-21
Tube sensors, 440
Two-channel coders, 466, 468, 538, 630, 632
Two-dimensional:
  comparison of one- and two-dimensional linear constant coefficient difference equations, 79
  comparison of one- and two-dimensional optimal filter design, 240-43
  complex cepstrum, 301-4
  discrete cosine transform, 154-57
  discrete-space cosine transform, 160-62
Ultraviolet radiation, 414
Uniform convergence, Fourier transform and, 25
Uniform-length codeword assignment, 612-13
Uniquely decodable, 612
Unit sample sequence, 3-5
Unit step sequence, 5-6
Unit surface, 66
Unsharp masking, 459, 462-63
Useful relations, z-transform and, 76
Variable-length codeword assignment, 613-16
Vector quantization, 598-611
  adaptive coding and, 640-42, 655-56
Vector radix fast Fourier transforms, 172-77
Vertical state variables, 327
Video communications and conferencing, 411
Vidicon, 438-40
Visual system, human:
  adaptation, 431-32
  the eye, 423-28
  intensity discrimination, 429-31
  mach band effect and spatial frequency response, 432-34
  model for peripheral level of, 428-29
  other visual phenomena, 435-37
  spatial masking, 434-35
Waveform coding:
  advantages of, 617
  delta modulation, 622, 624-27
  description of, 590
  differential pulse code modulation, 627-30
  pulse code modulation, 618-21
  pyramid coding, 632-34, 636-40
  subband coding, 632, 639
  two-channel coders, 630, 632
Waveform coding (cont.)
  vector quantization and adaptive, 640-42
Weber's law, 430
Wedge support output mask, 95
Wedge support sequence, 21-22
Weighted Chebyshev approximation problem, 236, 239, 268
White noise process, 351
Wiener filtering:
  adaptive, 536-39
  noncausal, 354-56
  reducing additive random noise and, 527-31
  variations of, 531-33
Window method of design for finite impulse response filters, 202-13
Winograd Fourier transform algorithm (WFTA), 178-82
Zero-mean random process, 349
Zero-order interpolation, 496
Zero-phase filter design:
  frequency domain design, 313-15
  spatial domain design, 283-85
Zero-phase filters, 196-99
Zonal coding, 647-48
z-transform:
  definition of, 65-66
  examples of, 67-72
  inverse, 76-78
  linear constant coefficient difference equations and, 78-102
  properties of, 74, 76
  rational, 102
  stability and, 102-24