The dispersion penalty arises because dispersion causes the upper and lower modulation sidebands of the optical carrier, which carry the electrical signal, to slip in relative phase. At
certain frequencies, their beats with the
optical carrier interfere destructively,
creating nulls in the frequency
response of the system. For practical
systems the first null is at tens of GHz,
which is sufficient for handling most
electrical signals of interest. Although
it may seem that the dispersion penalty
places a fundamental limit on the
impulse response (or the bandwidth) of
the time-stretch system, it can be
eliminated. The dispersion penalty
vanishes with single-sideband
modulation. Alternatively, one can use the modulator's secondary (inverse) output port to eliminate the dispersion
penalty, in much the same way as two antennas can eliminate spatial nulls in wireless communication (hence the two
antennas on top of a WiFi access point). This configuration is termed phase-diversity. For illustration, two calculated
complementary transfer functions from a typical phase-diverse time-stretch configuration are plotted in Fig. 4.
[5]
Combining the complementary outputs using a maximal ratio combining (MRC) algorithm results in a transfer
function with a flat response in the frequency domain. Thus, the impulse response (bandwidth) of a time-stretch
system is limited only by the bandwidth of the electro-optic modulator, which is about 120 GHz, a value that is
adequate for capturing most electrical waveforms of interest.
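For illustration only, here is a minimal numerical sketch of such combining, assuming idealized complementary transfer functions (a cosine/sine pair standing in for the measured curves of Fig. 4); the MRC rule weights each branch by its conjugated response and normalizes by the total branch power:

```python
import numpy as np

# Hypothetical complementary transfer functions of the two modulator ports;
# their nulls never coincide, so the combined response has no nulls.
f = np.linspace(0, 100e9, 2001)          # RF frequency axis (Hz)
phi = 2 * np.pi * f / 50e9               # assumed dispersion-induced phase
H1 = np.cos(phi)                         # port 1: nulls where cos = 0
H2 = np.sin(phi)                         # port 2 (inverse): complementary nulls

Y1, Y2 = H1 * 1.0, H2 * 1.0              # measured spectra of a flat (unit) input

# Maximal ratio combining: conjugate-weight each branch, normalize by power.
Y_mrc = (np.conj(H1) * Y1 + np.conj(H2) * Y2) / (np.abs(H1)**2 + np.abs(H2)**2)

print(np.allclose(Y_mrc, 1.0))           # True: flat combined frequency response
```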
Extremely large stretch factors can be obtained using long lengths of fiber, but at the cost of larger loss, a problem that has been overcome by employing Raman amplification within the dispersive fiber itself, leading to the world's fastest real-time digitizer,
[6]
as shown in Fig. 3. PTS has also been used to capture very-high-frequency signals with world-record resolution over a 10-GHz bandwidth.
[7]
Comparison with time lens imaging
Another technique, temporal imaging using a time lens, can also be used to slow down (mostly optical) signals in
time. The time-lens concept relies on the mathematical equivalence between spatial diffraction and temporal
dispersion, the so-called space-time duality.
[8]
A lens held at a fixed distance from an object produces a magnified
visible image. The lens imparts a quadratic phase shift to the spatial frequency components of the optical waves; in
conjunction with the free space propagation (object to lens, lens to eye), this generates a magnified image. Owing to
the mathematical equivalence between paraxial diffraction and temporal dispersion, an optical waveform can be
temporally imaged by a three-step process of dispersing it in time, subjecting it to a phase shift that is quadratic in
time (the time lens itself), and dispersing it again. Theoretically, a focused aberration-free image is obtained under a
specific condition when the two dispersive elements and the phase shift satisfy the temporal equivalent of the classic
lens equation. Alternatively, the time lens can be used without the second dispersive element to transfer the
waveform's temporal profile to the spectral domain, analogous to the property that an ordinary lens produces the
spatial Fourier transform of an object at its focal points.
[9]
In contrast to the time-lens approach, PTS is not based on the space-time duality; there is no lens equation that needs to be satisfied to obtain an error-free slowed-down version of the input waveform. The time-stretch technique also offers continuous-time acquisition, a feature needed for mainstream oscilloscope applications.
Another important difference between the two techniques is that the time lens requires the input signal to be subjected to a large amount of dispersion before further processing. For electrical waveforms, no electronic devices offer the required combination of (1) a high dispersion-to-loss ratio, (2) uniform dispersion, and (3) broad bandwidth. This makes the time lens unsuitable for slowing down wideband electrical waveforms. In contrast, PTS has no such requirement; it was developed specifically to slow down electrical waveforms and to enable high-speed digitizers.
Application to imaging and spectroscopy
In addition to wideband A/D conversion, photonic time-stretch (PTS) is also an enabling technology for
high-throughput real-time instrumentation such as imaging
[10]
and spectroscopy.
[11][12]
The world's fastest optical
imaging method called serial time-encoded amplified microscopy (STEAM) makes use of the PTS technology to
acquire images using a single-pixel photodetector and a commercial ADC. Wavelength-time spectroscopy, which also relies on the photonic time-stretch technique, permits real-time single-shot measurements of rapidly evolving or fluctuating spectra.
References
[1] A. S. Bhushan, F. Coppinger, and B. Jalali, "Time-stretched analogue-to-digital conversion," Electronics Letters, vol. 34, no. 9, pp. 839-841, April 1998. (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=682797)
[2] A. Fard, S. Gupta, and B. Jalali, "Photonic time-stretch digitizer and its extension to real-time spectroscopy and imaging," Laser & Photonics Reviews, vol. 7, no. 2, pp. 207-263, March 2013. (http://onlinelibrary.wiley.com/doi/10.1002/lpor.201200015/abstract)
[3] Y. Han and B. Jalali, "Photonic Time-Stretched Analog-to-Digital Converter: Fundamental Concepts and Practical Considerations," Journal of Lightwave Technology, vol. 21, no. 12, pp. 3085-3103, Dec. 2003. (http://www.opticsinfobase.org/abstract.cfm?&uri=JLT-21-12-3085)
[4] J. Capmany and D. Novak, "Microwave photonics combines two worlds," Nature Photonics 1, 319-330 (2007). (http://www.nature.com/nphoton/journal/v1/n6/abs/nphoton.2007.89.html)
[5] Y. Han, O. Boyraz, and B. Jalali, "Ultrawide-Band Photonic Time-Stretch A/D Converter Employing Phase Diversity," IEEE Transactions on Microwave Theory and Techniques, vol. 53, no. 4, April 2005. (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1420773)
[6] J. Chou, O. Boyraz, D. Solli, and B. Jalali, "Femtosecond real-time single-shot digitizer," Applied Physics Letters 91, 161105 (2007). (http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=APPLAB000091000016161105000001&idtype=cvips&gifs=yes)
[7] S. Gupta and B. Jalali, "Time-warp correction and calibration in photonic time-stretch analog-to-digital converter," Optics Letters 33, 2674-2676 (2008). (http://www.opticsinfobase.org/abstract.cfm?uri=ol-33-22-2674)
[8] B. H. Kolner and M. Nazarathy, "Temporal imaging with a time lens," Optics Letters 14, 630-632 (1989). (http://www.opticsinfobase.org/ol/abstract.cfm?URI=ol-14-12-630)
[9] J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill (1968).
[10] K. Goda, K. K. Tsia, and B. Jalali, "Serial time-encoded amplified imaging for real-time observation of fast dynamic phenomena," Nature 458, 1145-1149, 2009. (http://www.nature.com/nature/journal/v458/n7242/full/nature07980.html)
[11] D. R. Solli, J. Chou, and B. Jalali, "Amplified wavelength-time transformation for real-time spectroscopy," Nature Photonics 2, 48-51, 2008. (http://www.nature.com/nphoton/journal/v2/n1/full/nphoton.2007.253.html)
[12] J. Chou, D. Solli, and B. Jalali, "Real-time spectroscopy with subgigahertz resolution using amplified dispersive Fourier transformation," Applied Physics Letters 92, 111102, 2008. (http://apl.aip.org/resource/1/applab/v92/i11/p111102_s1)
Other resources
G. C. Valley, "Photonic analog-to-digital converters," Opt. Express, vol. 15, no. 5, pp. 1955-1982, March 2007. (http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-15-5-1955)
Photonic Bandwidth Compression for Instantaneous Wideband A/D Conversion (PHOBIAC) project. (http://www.darpa.mil/MTO/Programs/phobiac/index.html)
Short-time Fourier transform for time-frequency analysis of ultrawideband signals. (http://www.researchgate.net/publication/3091384_Time-stretched_short-time_Fourier_transform/)
Fourier Transforms, Discrete and Fast
Discrete Fourier transform
[Figure: Relationship between the (continuous) Fourier transform and the discrete Fourier transform. Left column: a continuous function (top) and its Fourier transform (bottom). Center-left column: periodic summation of the original function (top); its Fourier transform (bottom) is zero except at discrete points, and the inverse transform is a sum of sinusoids called a Fourier series. Center-right column: the original function is discretized (multiplied by a Dirac comb) (top); its Fourier transform (bottom) is a periodic summation (DTFT) of the original transform. Right column: the DFT (bottom) computes discrete samples of the continuous DTFT; the inverse DFT (top) is a periodic summation of the original samples. The FFT algorithm computes one cycle of the DFT, and its inverse is one cycle of the DFT inverse.]
In mathematics, the discrete Fourier
transform (DFT) converts a finite list
of equally spaced samples of a
function into the list of coefficients of
a finite combination of complex
sinusoids, ordered by their frequencies,
that has those same sample values. It
can be said to convert the sampled
function from its original domain
(often time or position along a line) to
the frequency domain.
The input samples are complex
numbers (in practice, usually real
numbers), and the output coefficients
are complex as well. The frequencies
of the output sinusoids are integer multiples of a fundamental frequency, whose corresponding period is the length of
the sampling interval. The combination of sinusoids obtained through the DFT is therefore periodic with that same
period. The DFT differs from the discrete-time Fourier transform (DTFT) in that its input and output sequences are
both finite; it is therefore said to be the Fourier analysis of finite-domain (or periodic) discrete-time functions.
[Figure: Illustration of using Dirac comb functions and the convolution theorem to model the effects of sampling and/or periodic summation. At lower left is a DTFT, the spectral result of sampling s(t) at intervals of T. The spectral sequences at (a) upper right and (b) lower right are respectively computed from (a) one cycle of the periodic summation of s(t) and (b) one cycle of the periodic summation of the s(nT) sequence. The respective formulas are (a) the Fourier series integral and (b) the DFT summation. The similarities to the original transform, S(f), and the relative computational ease are often the motivation for computing a DFT sequence.]
The DFT is the most important discrete
transform, used to perform Fourier
analysis in many practical applications.
In digital signal processing, the
function is any quantity or signal that
varies over time, such as the pressure
of a sound wave, a radio signal, or
daily temperature readings, sampled
over a finite time interval (often
defined by a window function). In
image processing, the samples can be
the values of pixels along a row or
column of a raster image. The DFT is
also used to efficiently solve partial
differential equations, and to perform
other operations such as convolutions
or multiplying large integers.
Since it deals with a finite amount of
data, it can be implemented in
computers by numerical algorithms or
even dedicated hardware. These
implementations usually employ
efficient fast Fourier transform (FFT)
algorithms;
[1]
so much so that the
terms "FFT" and "DFT" are often used
interchangeably. Prior to its current usage, the "FFT" initialism may have also been used for the ambiguous term
finite Fourier transform.
Definition
The sequence of N complex numbers $x_0, x_1, \ldots, x_{N-1}$ is transformed into an N-periodic sequence of complex numbers:

$$X_k \;\stackrel{\text{def}}{=}\; \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k \in \mathbb{Z} \text{ (integers)} \qquad \text{(Eq.1)}$$

[2]
Each $X_k$ is a complex number that encodes both amplitude and phase of a sinusoidal component of the function $x_n$. The sinusoid's frequency is k/N cycles per sample. Its amplitude and phase are:

$$|X_k|/N = \sqrt{\operatorname{Re}(X_k)^2 + \operatorname{Im}(X_k)^2}\,/\,N, \qquad \arg(X_k) = \operatorname{atan2}\big(\operatorname{Im}(X_k),\, \operatorname{Re}(X_k)\big)$$

where atan2 is the two-argument form of the arctan function. Due to periodicity (see Periodicity), the customary domain of k actually computed is [0, N−1]. That is always the case when the DFT is implemented via the fast Fourier transform algorithm. But other common domains are [−N/2, N/2−1] (N even) and [−(N−1)/2, (N−1)/2] (N odd), as when the left and right halves of an FFT output sequence are swapped.
The transform is sometimes denoted by the symbol $\mathcal{F}$, as in $\mathbf{X} = \mathcal{F}\{\mathbf{x}\}$ or $\mathcal{F}(\mathbf{x})$ or $\mathcal{F}\mathbf{x}$. [3]
Eq.1 can be interpreted or derived in various ways, for example:
- It completely describes the discrete-time Fourier transform (DTFT) of an N-periodic sequence, which comprises only discrete frequency components. (Discrete-time Fourier transform#Periodic data)
- It can also provide uniformly spaced samples of the continuous DTFT of a finite-length sequence. (Sampling the DTFT)
- It is the cross-correlation of the input sequence, $x_n$, and a complex sinusoid at frequency k/N. Thus it acts like a matched filter for that frequency.
- It is the discrete analog of the formula for the coefficients of a Fourier series:

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{2\pi i k n / N}, \qquad n \in \mathbb{Z} \qquad \text{(Eq.2)}$$

which is also N-periodic. In the n domain this is the inverse transform of Eq.1.
The normalization factor multiplying the DFT and IDFT (here 1 and 1/N) and the signs of the exponents are merely conventions, and differ in some treatments. The only requirements of these conventions are that the DFT and IDFT have opposite-sign exponents and that the product of their normalization factors be 1/N. A normalization of $1/\sqrt{N}$ for both the DFT and IDFT, for instance, makes the transforms unitary.
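As a concrete check of Eq.1 and Eq.2, the following sketch evaluates both sums directly and compares them against NumPy's FFT routines, which use the same 1 and 1/N convention:

```python
import numpy as np

def dft(x):
    """Direct evaluation of Eq.1: X_k = sum_n x_n exp(-2*pi*i*k*n/N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """Direct evaluation of Eq.2: x_n = (1/N) sum_k X_k exp(+2*pi*i*k*n/N)."""
    N = len(X)
    k = np.arange(N)
    n = k.reshape((N, 1))
    return np.exp(2j * np.pi * n * k / N) @ X / N

x = np.random.randn(8) + 1j * np.random.randn(8)
X = dft(x)
print(np.allclose(X, np.fft.fft(x)))    # True: matches the library DFT
print(np.allclose(idft(X), x))          # True: Eq.2 inverts Eq.1
```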
In the following discussion the terms "sequence" and "vector" will be considered interchangeable.
Properties
Completeness
The discrete Fourier transform is an invertible, linear transformation

$$\mathcal{F} : \mathbb{C}^N \to \mathbb{C}^N$$

with $\mathbb{C}$ denoting the set of complex numbers. In other words, for any N > 0, an N-dimensional complex vector has a DFT and an IDFT which are in turn N-dimensional complex vectors.
Orthogonality
The vectors $u_k = \left[\, e^{2\pi i k n / N} \,\right]_{n=0}^{N-1}$ form an orthogonal basis over the set of N-dimensional complex vectors:

$$u_k^{\mathsf T} u_{k'}^{*} = \sum_{n=0}^{N-1} e^{2\pi i k n / N} \, e^{-2\pi i k' n / N} = N\, \delta_{kk'}$$

where $\delta_{kk'}$ is the Kronecker delta. (In the last step, the summation is trivial if $k = k'$, where it is $1 + 1 + \cdots = N$, and otherwise is a geometric series that can be explicitly summed to obtain zero.) This orthogonality condition can be used to derive the formula for the IDFT from the definition of the DFT, and is equivalent to the unitarity property below.
The Plancherel theorem and Parseval's theorem
If $X_k$ and $Y_k$ are the DFTs of $x_n$ and $y_n$ respectively, then the Plancherel theorem states:

$$\sum_{n=0}^{N-1} x_n y_n^{*} = \frac{1}{N} \sum_{k=0}^{N-1} X_k Y_k^{*}$$

where the star denotes complex conjugation. Parseval's theorem is a special case of the Plancherel theorem and states:

$$\sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2$$
These theorems are also equivalent to the unitary condition below.
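A quick numerical check of both identities (using NumPy's unnormalized-forward convention, so each right-hand side carries the 1/N factor):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
y = rng.standard_normal(16) + 1j * rng.standard_normal(16)
X, Y = np.fft.fft(x), np.fft.fft(y)
N = len(x)

# Plancherel: sum x y* == (1/N) sum X Y*
print(np.allclose(np.sum(x * np.conj(y)), np.sum(X * np.conj(Y)) / N))
# Parseval: sum |x|^2 == (1/N) sum |X|^2
print(np.allclose(np.sum(np.abs(x)**2), np.sum(np.abs(X)**2) / N))
```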
Periodicity
The periodicity can be shown directly from the definition:

$$X_{k+N} = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i (k+N) n / N} = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N} \underbrace{e^{-2\pi i n}}_{1} = X_k.$$

Similarly, it can be shown that the IDFT formula leads to a periodic extension.
Shift theorem
Multiplying $x_n$ by a linear phase $e^{\frac{2\pi i}{N} n m}$ for some integer m corresponds to a circular shift of the output $X_k$: $X_k$ is replaced by $X_{k-m}$, where the subscript is interpreted modulo N (i.e., periodically). Similarly, a circular shift of the input $x_n$ corresponds to multiplying the output $X_k$ by a linear phase. Mathematically, if $\{x_n\}$ represents the vector x, then

if $\mathcal{F}(\{x_n\})_k = X_k$
then $\mathcal{F}\big(\{ x_n \cdot e^{\frac{2\pi i}{N} n m} \}\big)_k = X_{k-m}$
and $\mathcal{F}\big(\{ x_{n-m} \}\big)_k = X_k \cdot e^{-\frac{2\pi i}{N} k m}$
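Both statements of the shift theorem can be verified numerically; np.roll implements the circular (modulo-N) shift:

```python
import numpy as np

N, m = 8, 3
n = np.arange(N)
x = np.random.randn(N) + 1j * np.random.randn(N)
X = np.fft.fft(x)

# Circular input shift <-> linear phase on the output.
print(np.allclose(np.fft.fft(np.roll(x, m)), X * np.exp(-2j * np.pi * n * m / N)))

# Linear phase on the input <-> circular output shift.
print(np.allclose(np.fft.fft(x * np.exp(2j * np.pi * n * m / N)), np.roll(X, m)))
```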
Circular convolution theorem and cross-correlation theorem
The convolution theorem for the discrete-time Fourier transform indicates that a convolution of two infinite
sequences can be obtained as the inverse transform of the product of the individual transforms. An important
simplification occurs when the sequences are of finite length, N. In terms of the DFT and inverse DFT, it can be
written as follows:

$$\mathcal{F}^{-1}\{\mathbf{X} \cdot \mathbf{Y}\}_n = \sum_{l=0}^{N-1} x_l \cdot (y_N)_{n-l}$$

which is the convolution of the $\mathbf{x}$ sequence with a $\mathbf{y}$ sequence extended by periodic summation:

$$(y_N)_n \;\stackrel{\text{def}}{=}\; \sum_{p=-\infty}^{\infty} y_{n-pN} = y_{n \bmod N}.$$

Similarly, the cross-correlation of $\mathbf{x}$ and $\mathbf{y_N}$ is given by:

$$\mathcal{F}^{-1}\{\mathbf{X}^{*} \cdot \mathbf{Y}\}_n = \sum_{l=0}^{N-1} x_l^{*} \cdot (y_N)_{n+l}.$$

When either sequence contains a string of zeros of length L, L+1 of the circular convolution outputs are equivalent to values of the linear convolution $x * y$. Methods have also been developed to use this property as part of an efficient process that constructs $x * y$ with an $\mathbf{x}$ or $\mathbf{y}$ sequence potentially much longer than the practical transform size (N). Two
such methods are called overlap-save and overlap-add.
[4]
The efficiency results from the fact that a direct evaluation
of either summation (above) requires $O(N^2)$ operations for an output sequence of length N. An indirect method,
using transforms, can take advantage of the efficiency of the fast Fourier transform (FFT) to achieve much better
performance. Furthermore, convolutions can be used to efficiently compute DFTs via Rader's FFT algorithm and
Bluestein's FFT algorithm.
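A short sketch comparing the direct circular-convolution sum with the DFT route, and showing that zero-padding to length at least len(x) + len(y) − 1 recovers the ordinary (linear) convolution:

```python
import numpy as np

def circular_convolve_direct(x, y):
    """Direct evaluation of the circular convolution sum."""
    N = len(x)
    return np.array([sum(x[l] * y[(n - l) % N] for l in range(N))
                     for n in range(N)])

x = np.array([1.0, 2.0, 3.0, 0.0])
y = np.array([0.5, -1.0, 0.0, 2.0])

# Circular convolution via the DFT convolution theorem.
via_dft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real
print(np.allclose(circular_convolve_direct(x, y), via_dft))   # True

# Zero-padding to length len(x)+len(y)-1 recovers the linear convolution.
M = len(x) + len(y) - 1
padded = np.fft.ifft(np.fft.fft(x, M) * np.fft.fft(y, M)).real
print(np.allclose(padded, np.convolve(x, y)))                  # True
```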
Convolution theorem duality
It can also be shown that:

$$\mathcal{F}\{\mathbf{x} \cdot \mathbf{y}\}_k \;=\; \frac{1}{N}\,(\mathbf{X} \circledast \mathbf{Y})_k$$

which is the circular convolution of $\mathbf{X}$ and $\mathbf{Y}$.
Trigonometric interpolation polynomial
The trigonometric interpolation polynomial

$$p(t) = \frac{1}{N} \left[ X_0 + X_1 e^{2\pi i t} + \cdots + X_{N/2-1} e^{2\pi i (N/2-1) t} + X_{N/2} \cos(N\pi t) + X_{N/2+1} e^{-2\pi i (N/2-1) t} + \cdots + X_{N-1} e^{-2\pi i t} \right] \quad \text{for N even,}$$

$$p(t) = \frac{1}{N} \left[ X_0 + X_1 e^{2\pi i t} + \cdots + X_{(N-1)/2} e^{2\pi i \frac{N-1}{2} t} + X_{(N+1)/2} e^{-2\pi i \frac{N-1}{2} t} + \cdots + X_{N-1} e^{-2\pi i t} \right] \quad \text{for N odd,}$$

where the coefficients $X_k$ are given by the DFT of $x_n$ above, satisfies the interpolation property $p(n/N) = x_n$ for $n = 0, \ldots, N-1$.
For even N, notice that the Nyquist component is handled specially.
This interpolation is not unique: aliasing implies that one could add N to any of the complex-sinusoid frequencies (e.g. changing $e^{-2\pi i t}$ to $e^{2\pi i (N-1)t}$) without changing the interpolation property, but giving different values in between the $x_n$ points. The choice above, however, is typical because it has two useful properties. First, it consists of sinusoids whose frequencies have the smallest possible magnitudes: the interpolation is bandlimited. Second, if the $x_n$ are real numbers, then $p(t)$ is real as well.
In contrast, the most obvious trigonometric interpolation polynomial is the one in which the frequencies range from 0 to N−1 (instead of roughly −N/2 to +N/2 as above), similar to the inverse DFT formula. This interpolation does not minimize the slope, and is not generally real-valued for real $x_n$; its use is a common mistake.
The unitary DFT
Another way of looking at the DFT is to note that in the above discussion, the DFT can be expressed as a Vandermonde matrix:

$$\mathbf{F} = \begin{bmatrix} \omega_N^{0 \cdot 0} & \omega_N^{0 \cdot 1} & \cdots & \omega_N^{0 \cdot (N-1)} \\ \omega_N^{1 \cdot 0} & \omega_N^{1 \cdot 1} & \cdots & \omega_N^{1 \cdot (N-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_N^{(N-1) \cdot 0} & \omega_N^{(N-1) \cdot 1} & \cdots & \omega_N^{(N-1) \cdot (N-1)} \end{bmatrix}$$

where $\omega_N = e^{-2\pi i / N}$ is a primitive Nth root of unity. The inverse transform is then given by the inverse of the above matrix:

$$\mathbf{F}^{-1} = \frac{1}{N} \mathbf{F}^{*}$$
With unitary normalization constants $1/\sqrt{N}$, the DFT becomes a unitary transformation, defined by a unitary matrix:

$$\mathbf{U} = \frac{1}{\sqrt{N}} \mathbf{F}, \qquad \mathbf{U}^{-1} = \mathbf{U}^{*}, \qquad |\det(\mathbf{U})| = 1$$

where det() is the determinant function. The determinant is the product of the eigenvalues, which are always $\pm 1$ or $\pm i$ as described below. In a real vector space, a unitary transformation can be thought of as simply a rigid rotation of the coordinate system, and all of the properties of a rigid rotation can be found in the unitary DFT.

The orthogonality of the DFT is now expressed as an orthonormality condition (which arises in many areas of mathematics as described in root of unity):

$$\sum_{m=0}^{N-1} U_{km} U_{mn}^{*} = \delta_{kn}$$

If $\mathbf{X}$ is defined as the unitary DFT of the vector $\mathbf{x}$, then

$$X_k = \sum_{n=0}^{N-1} U_{kn} x_n$$

and the Plancherel theorem is expressed as:

$$\sum_{n=0}^{N-1} x_n y_n^{*} = \sum_{k=0}^{N-1} X_k Y_k^{*}$$

If we view the DFT as just a coordinate transformation which simply specifies the components of a vector in a new coordinate system, then the above is just the statement that the dot product of two vectors is preserved under a unitary DFT transformation. For the special case $\mathbf{x} = \mathbf{y}$, this implies that the length of a vector is preserved as well; this is just Parseval's theorem:

$$\sum_{n=0}^{N-1} |x_n|^2 = \sum_{k=0}^{N-1} |X_k|^2$$

A consequence of the circular convolution theorem is that the DFT matrix F diagonalizes any circulant matrix.
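This diagonalization property is easy to confirm numerically: the eigenvalues of a circulant matrix are the DFT of its first column.

```python
import numpy as np

# Build a circulant matrix C with first column c: C[i, j] = c[(i - j) mod N].
N = 6
c = np.random.randn(N)
C = np.array([np.roll(c, k) for k in range(N)]).T

F = np.fft.fft(np.eye(N))     # DFT matrix, obtained by transforming identity columns
eig = np.fft.fft(c)           # eigenvalues of C are the DFT of its first column

# Check C = F^{-1} diag(eig) F, i.e. F diagonalizes C.
print(np.allclose(np.linalg.inv(F) @ np.diag(eig) @ F, C))   # True
```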
Expressing the inverse DFT in terms of the DFT
A useful property of the DFT is that the inverse DFT can be easily expressed in terms of the (forward) DFT, via
several well-known "tricks". (For example, in computations, it is often convenient to only implement a fast Fourier
transform corresponding to one transform direction and then to get the other transform direction from the first.)
First, we can compute the inverse DFT by reversing the inputs (Duhamel et al., 1988):

$$\mathcal{F}^{-1}(\{x_n\}) = \frac{1}{N} \mathcal{F}(\{x_{N-n}\})$$

(As usual, the subscripts are interpreted modulo N; thus, for $n = 0$, we have $x_{N-0} = x_0$.)

Second, one can also conjugate the inputs and outputs:

$$\mathcal{F}^{-1}(\mathbf{x}) = \frac{1}{N} \big( \mathcal{F}(\mathbf{x}^{*}) \big)^{*}$$

Third, a variant of this conjugation trick, which is sometimes preferable because it requires no modification of the data values, involves swapping real and imaginary parts (which can be done on a computer simply by modifying pointers). Define swap($x_n$) as $x_n$ with its real and imaginary parts swapped; that is, if $x_n = a + bi$ then swap($x_n$) is $b + ai$. Equivalently, swap($x_n$) equals $i x_n^{*}$. Then

$$\mathcal{F}^{-1}(\mathbf{x}) = \frac{1}{N} \operatorname{swap}\big( \mathcal{F}( \operatorname{swap}(\mathbf{x}) ) \big)$$
That is, the inverse transform is the same as the forward transform with the real and imaginary parts swapped for
both input and output, up to a normalization (Duhamel et al., 1988).
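All three tricks can be checked in a few lines against a library inverse transform:

```python
import numpy as np

x = np.random.randn(8) + 1j * np.random.randn(8)
N = len(x)
ref = np.fft.ifft(x)

# 1. Reverse the inputs (indices mod N, so element 0 stays put).
rev = np.concatenate(([x[0]], x[1:][::-1]))
print(np.allclose(np.fft.fft(rev) / N, ref))

# 2. Conjugate inputs and outputs.
print(np.allclose(np.conj(np.fft.fft(np.conj(x))) / N, ref))

# 3. Swap real and imaginary parts on the way in and out.
swap = lambda z: z.imag + 1j * z.real          # swap(z) = i * conj(z)
print(np.allclose(swap(np.fft.fft(swap(x))) / N, ref))
```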
The conjugation trick can also be used to define a new transform, closely related to the DFT, that is involutary; that is, which is its own inverse. In particular, $T(\mathbf{x}) = \mathcal{F}(\mathbf{x}^{*})/\sqrt{N}$ is clearly its own inverse: $T(T(\mathbf{x})) = \mathbf{x}$. A closely related involutary transformation (by a factor of $(1+i)/\sqrt{2}$) is $H(\mathbf{x}) = \mathcal{F}\big((1+i)\,\mathbf{x}^{*}\big)/\sqrt{2N}$, since the $(1+i)$ factors in $H(H(\mathbf{x}))$ cancel the 2. For real inputs $\mathbf{x}$, the real part of $H(\mathbf{x})$ is none other than the discrete Hartley transform, which is also involutary.
Eigenvalues and eigenvectors
The eigenvalues of the DFT matrix are simple and well-known, whereas the eigenvectors are complicated, not
unique, and are the subject of ongoing research.
Consider the unitary form $\mathbf{U}$ defined above for the DFT of length N, where

$$\mathbf{U}_{m,n} = \frac{1}{\sqrt{N}}\, \omega_N^{(m-1)(n-1)} = \frac{1}{\sqrt{N}}\, e^{-\frac{2\pi i}{N}(m-1)(n-1)}.$$

This matrix satisfies the matrix polynomial equation:

$$\mathbf{U}^4 = \mathbf{I}.$$

This can be seen from the inverse properties above: operating twice gives the original data in reverse order, so operating four times gives back the original data and is thus the identity matrix. This means that the eigenvalues $\lambda$ satisfy the equation:

$$\lambda^4 = 1.$$

Therefore, the eigenvalues of $\mathbf{U}$ are the fourth roots of unity: $\lambda$ is +1, −1, +i, or −i.
Since there are only four distinct eigenvalues for this matrix, they have some multiplicity. The multiplicity
gives the number of linearly independent eigenvectors corresponding to each eigenvalue. (Note that there are N
independent eigenvectors; a unitary matrix is never defective.)
The problem of their multiplicity was solved by McClellan and Parks (1972), although it was later shown to have
been equivalent to a problem solved by Gauss (Dickinson and Steiglitz, 1982). The multiplicity depends on the value
of N modulo 4, and is given by the following table:
Multiplicities of the eigenvalues λ of the unitary DFT matrix U as a function of the transform size N (in terms of an integer m):

size N      λ = +1    λ = −1    λ = −i    λ = +i
4m          m + 1     m         m         m − 1
4m + 1      m + 1     m         m         m
4m + 2      m + 1     m + 1     m         m
4m + 3      m + 1     m + 1     m + 1     m
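The table can be confirmed numerically by counting eigenvalues of the unitary DFT matrix (here for m = 2, i.e. N = 8, 9, 10, 11):

```python
import numpy as np

def dft_eigenvalue_multiplicities(N):
    """Count eigenvalues of the unitary DFT matrix near each fourth root of unity."""
    U = np.fft.fft(np.eye(N)) / np.sqrt(N)
    eig = np.linalg.eigvals(U)
    return {name: int(np.sum(np.abs(eig - lam) < 1e-6))
            for lam, name in ((1, "+1"), (-1, "-1"), (-1j, "-i"), (1j, "+i"))}

for N in (8, 9, 10, 11):   # N = 4m, 4m+1, 4m+2, 4m+3 with m = 2
    print(N, dft_eigenvalue_multiplicities(N))
# e.g. N = 8 prints {'+1': 3, '-1': 2, '-i': 2, '+i': 1}, matching the table row 4m.
```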
Otherwise stated, the characteristic polynomial of $\mathbf{U}$ is:

$$\det(\lambda \mathbf{I} - \mathbf{U}) = (\lambda - 1)^{\left\lfloor \frac{N+4}{4} \right\rfloor} (\lambda + 1)^{\left\lfloor \frac{N+2}{4} \right\rfloor} (\lambda + i)^{\left\lfloor \frac{N+1}{4} \right\rfloor} (\lambda - i)^{\left\lfloor \frac{N-1}{4} \right\rfloor}.$$
No simple analytical formula for general eigenvectors is known. Moreover, the eigenvectors are not unique because
any linear combination of eigenvectors for the same eigenvalue is also an eigenvector for that eigenvalue. Various
researchers have proposed different choices of eigenvectors, selected to satisfy useful properties like orthogonality
and to have "simple" forms (e.g., McClellan and Parks, 1972; Dickinson and Steiglitz, 1982; Grünbaum, 1982;
Atakishiyev and Wolf, 1997; Candan et al., 2000; Hanna et al., 2004; Gurevich and Hadani, 2008).
A straightforward approach is to discretize an eigenfunction of the continuous Fourier transform, of which the most
famous is the Gaussian function. Since periodic summation of the function means discretizing its frequency
spectrum and discretization means periodic summation of the spectrum, the discretized and periodically summed
Gaussian function yields an eigenvector of the discrete transform:

$$F(m) = \sum_{k \in \mathbb{Z}} \exp\!\left( -\frac{\pi (m + N k)^2}{N} \right).$$
A closed form expression for the series is not known, but it converges rapidly.
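A direct numerical check, truncating the rapidly converging series after a modest number of terms, confirms that this vector is an eigenvector of the unitary DFT (the eigenvalue is +1 in this check; the construction is a sketch, not a closed form):

```python
import numpy as np

N = 16
m = np.arange(N)
ks = np.arange(-20, 21)                       # truncation: the series converges fast
F_vec = np.exp(-np.pi * (m[:, None] + N * ks[None, :])**2 / N).sum(axis=1)

U = np.fft.fft(np.eye(N)) / np.sqrt(N)        # unitary DFT matrix
print(np.allclose(U @ F_vec, F_vec))          # True: eigenvector with eigenvalue 1
```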
Two other simple closed-form analytical eigenvectors for special DFT period N were found (Kong, 2008):
For DFT period N = 2L + 1 = 4K +1, where K is an integer, the following is an eigenvector of DFT:
For DFT period N = 2L = 4K, where K is an integer, the following is an eigenvector of DFT:
The choice of eigenvectors of the DFT matrix has become important in recent years in order to define a discrete
analogue of the fractional Fourier transform: the DFT matrix can be taken to fractional powers by exponentiating
the eigenvalues (e.g., Rubio and Santhanam, 2005). For the continuous Fourier transform, the natural orthogonal
eigenfunctions are the Hermite functions, so various discrete analogues of these have been employed as the
eigenvectors of the DFT, such as the Kravchuk polynomials (Atakishiyev and Wolf, 1997). The "best" choice of
eigenvectors to define a fractional discrete Fourier transform remains an open question, however.
Uncertainty principle
If the random variable $X_k$ is constrained by:

$$\sum_{n=0}^{N-1} |X_n|^2 = 1,$$

then $P_n = |X_n|^2$ may be considered to represent a discrete probability mass function of n, with an associated probability mass function constructed from the transformed variable:

$$Q_m = N\,|x_m|^2.$$

For the case of continuous functions $P(x)$ and $Q(k)$, the Heisenberg uncertainty principle states that:

$$D_0(X)\,D_0(x) \ge \frac{1}{16\pi^2}$$

where $D_0(X)$ and $D_0(x)$ are the variances of $|X|^2$ and $|x|^2$ respectively, with the equality attained in the case
of a suitably normalized Gaussian distribution. Although the variances may be analogously defined for the DFT, an
analogous uncertainty principle is not useful, because the uncertainty will not be shift-invariant. Nevertheless, a
meaningful uncertainty principle has been introduced by Massar and Spindel.
However, the Hirschman uncertainty will have a useful analog for the case of the DFT. The Hirschman uncertainty
principle is expressed in terms of the Shannon entropy of the two probability functions. In the discrete case, the
Shannon entropies are defined as:

$$H(X) = -\sum_{n=0}^{N-1} P_n \ln P_n$$

and

$$H(x) = -\sum_{m=0}^{N-1} Q_m \ln Q_m,$$
and the Hirschman uncertainty principle becomes:

$$H(X) + H(x) \ge \ln(N).$$
The equality is obtained for $P_n$ equal to translations and modulations of a suitably normalized Kronecker comb of
period A where A is any exact integer divisor of N. The probability mass function will then be proportional to a
suitably translated Kronecker comb of period B=N/A.
The real-input DFT
If $x_n$ are real numbers, as they often are in practical applications, then the DFT obeys the symmetry:

$$X_{N-k} = X_k^{*}$$

where $X^{*}$ denotes complex conjugation.

It follows that $X_0$ and $X_{N/2}$ are real-valued, and the remainder of the DFT is completely specified by just N/2 − 1 complex numbers.
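NumPy's rfft exposes exactly this redundancy for real input, returning only the N/2 + 1 nonredundant outputs:

```python
import numpy as np

x = np.random.randn(8)                             # real input, N = 8
X = np.fft.fft(x)

print(np.allclose(X[1:], np.conj(X[1:][::-1])))    # X_{N-k} = conj(X_k)
print(X[0].imag, X[4].imag)                        # both ~0: X_0 and X_{N/2} are real
print(np.allclose(np.fft.rfft(x), X[:5]))          # rfft keeps the nonredundant half
```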
Generalized DFT (shifted and non-linear phase)
It is possible to shift the transform sampling in time and/or frequency domain by some real shifts a and b, respectively. This is sometimes known as a generalized DFT (or GDFT), also called the shifted DFT or offset DFT, and has analogous properties to the ordinary DFT:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N}(k+b)(n+a)}, \qquad k = 0, \ldots, N-1.$$

Most often, shifts of 1/2 (half a sample) are used. While the ordinary DFT corresponds to a periodic signal in both time and frequency domains, $a = 1/2$ produces a signal that is anti-periodic in the frequency domain ($X_{k+N} = -X_k$) and vice-versa for $b = 1/2$. Thus, the specific case of $a = b = 1/2$ is known as an odd-time odd-frequency discrete Fourier transform (or O² DFT). Such shifted transforms are most often used for symmetric
data, to represent different boundary symmetries, and for real-symmetric data they correspond to different forms of
the discrete cosine and sine transforms.
Another interesting choice is $a = b = -(N-1)/2$, which is called the centered DFT (or CDFT). The centered DFT has the useful property that, when N is a multiple of four, all four of its eigenvalues (see above) have equal multiplicities (Rubio and Santhanam, 2005). [5]
The term GDFT is also used for the non-linear phase extensions of the DFT. Hence, the GDFT method provides a generalization for constant-amplitude orthogonal block transforms, including linear and non-linear phase types.
GDFT is a framework to improve time and frequency domain properties of the traditional DFT, e.g.
auto/cross-correlations, by the addition of the properly designed phase shaping function (non-linear, in general) to
the original linear phase functions (Akansu and Agirman-Tosun, 2010).
[6]
The discrete Fourier transform can be viewed as a special case of the z-transform, evaluated on the unit circle in the
complex plane; more general z-transforms correspond to complex shifts a and b above.
Multidimensional DFT
The ordinary DFT transforms a one-dimensional sequence or array that is a function of exactly one discrete
variable n. The multidimensional DFT of a multidimensional array $x_{n_1, n_2, \ldots, n_d}$ that is a function of d discrete variables $n_\ell = 0, 1, \ldots, N_\ell - 1$ for $\ell$ in $1, 2, \ldots, d$ is defined by:

$$X_{k_1, k_2, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \left( \omega_{N_1}^{k_1 n_1} \sum_{n_2=0}^{N_2-1} \left( \omega_{N_2}^{k_2 n_2} \cdots \sum_{n_d=0}^{N_d-1} \omega_{N_d}^{k_d n_d} \, x_{n_1, n_2, \ldots, n_d} \right) \right)$$

where $\omega_{N_\ell} = e^{-2\pi i / N_\ell}$ as above and the d output indices run from $k_\ell = 0, 1, \ldots, N_\ell - 1$. This is more compactly expressed in vector notation, where we define $\mathbf{n} = (n_1, n_2, \ldots, n_d)$ and $\mathbf{k} = (k_1, k_2, \ldots, k_d)$ as d-dimensional vectors of indices from 0 to $\mathbf{N} - 1$, which we define as $\mathbf{N} - 1 = (N_1 - 1, N_2 - 1, \ldots, N_d - 1)$:

$$X_{\mathbf{k}} = \sum_{\mathbf{n} = \mathbf{0}}^{\mathbf{N}-1} e^{-2\pi i \, \mathbf{k} \cdot (\mathbf{n} / \mathbf{N})} \, x_{\mathbf{n}}$$

where the division $\mathbf{n} / \mathbf{N}$ is defined as $\mathbf{n} / \mathbf{N} = (n_1/N_1, \ldots, n_d/N_d)$ to be performed element-wise, and the sum denotes the set of nested summations above.
The inverse of the multi-dimensional DFT is, analogous to the one-dimensional case, given by:

$$x_{\mathbf{n}} = \frac{1}{\prod_{\ell=1}^{d} N_\ell} \sum_{\mathbf{k} = \mathbf{0}}^{\mathbf{N}-1} e^{2\pi i \, \mathbf{n} \cdot (\mathbf{k} / \mathbf{N})} \, X_{\mathbf{k}}.$$
As the one-dimensional DFT expresses the input as a superposition of sinusoids, the multidimensional DFT
expresses the input as a superposition of plane waves, or multidimensional sinusoids. The direction of oscillation in space is $\mathbf{k} / \mathbf{N}$. The amplitudes are $X_{\mathbf{k}}$. This decomposition is of great importance for everything from digital
image processing (two-dimensional) to solving partial differential equations. The solution is broken up into plane
waves.
The multidimensional DFT can be computed by the composition of a sequence of one-dimensional DFTs along each
dimension. In the two-dimensional case $x_{n_1, n_2}$, the $N_1$ independent DFTs of the rows (i.e., along $n_2$) are computed first to form a new array $y_{n_1, k_2}$. Then the $N_2$ independent DFTs of y along the columns (along $n_1$) are computed to form the final result $X_{k_1, k_2}$. Alternatively the columns can be computed first and then the rows. The
order is immaterial because the nested summations above commute.
An algorithm to compute a one-dimensional DFT is thus sufficient to efficiently compute a multidimensional DFT.
This approach is known as the row-column algorithm. There are also intrinsically multidimensional FFT algorithms.
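The row-column algorithm in two dimensions amounts to applying a 1-D FFT along each axis in turn; either order gives the full 2-D transform:

```python
import numpy as np

x = np.random.randn(4, 6) + 1j * np.random.randn(4, 6)

rows_first = np.fft.fft(np.fft.fft(x, axis=1), axis=0)   # rows, then columns
cols_first = np.fft.fft(np.fft.fft(x, axis=0), axis=1)   # columns, then rows

print(np.allclose(rows_first, np.fft.fft2(x)))   # True
print(np.allclose(cols_first, np.fft.fft2(x)))   # True: the order is immaterial
```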
The real-input multidimensional DFT
For input data $x_{n_1, n_2, \ldots, n_d}$ consisting of real numbers, the DFT outputs have a conjugate symmetry similar to the one-dimensional case above:

$$X_{k_1, k_2, \ldots, k_d} = X_{N_1 - k_1, N_2 - k_2, \ldots, N_d - k_d}^{*}$$

where the star again denotes complex conjugation and the $\ell$-th subscript is again interpreted modulo $N_\ell$ (for $\ell = 1, 2, \ldots, d$).
Applications
The DFT has seen wide usage across a large number of fields; we only sketch a few examples below (see also the
references at the end). All applications of the DFT depend crucially on the availability of a fast algorithm to compute
discrete Fourier transforms and their inverses, a fast Fourier transform.
Spectral analysis
When the DFT is used for spectral analysis, the sequence usually represents a finite set of uniformly spaced
time-samples of some signal , where t represents time. The conversion from continuous time to samples
(discrete-time) changes the underlying Fourier transform of x(t) into a discrete-time Fourier transform (DTFT),
which generally entails a type of distortion called aliasing. Choice of an appropriate sample-rate (see Nyquist rate) is
the key to minimizing that distortion. Similarly, the conversion from a very long (or infinite) sequence to a
manageable size entails a type of distortion called leakage, which is manifested as a loss of detail (aka resolution) in
the DTFT. Choice of an appropriate sub-sequence length is the primary key to minimizing that effect. When the
available data (and time to process it) is more than the amount needed to attain the desired frequency resolution, a
standard technique is to perform multiple DFTs, for example to create a spectrogram. If the desired result is a power
spectrum and noise or randomness is present in the data, averaging the magnitude components of the multiple DFTs
Discrete Fourier transform
152
is a useful procedure to reduce the variance of the spectrum (also called a periodogram in this context); two
examples of such techniques are the Welch method and the Bartlett method; the general subject of estimating the
power spectrum of a noisy signal is called spectral estimation.
A final source of distortion (or perhaps illusion) is the DFT itself, because it is just a discrete sampling of the DTFT,
which is a function of a continuous frequency domain. That can be mitigated by increasing the resolution of the
DFT. That procedure is illustrated at Sampling the DTFT.
The procedure is sometimes referred to as zero-padding, which is a particular implementation used in conjunction
with the fast Fourier transform (FFT) algorithm. The inefficiency of performing multiplications and additions
with zero-valued "samples" is more than offset by the inherent efficiency of the FFT.
As already noted, leakage imposes a limit on the inherent resolution of the DTFT. So there is a practical limit to
the benefit that can be obtained from a fine-grained DFT.
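A small sketch of zero-padding: the padded DFT samples the same underlying DTFT on a finer grid, and the original DFT bins reappear as a subset of the padded ones:

```python
import numpy as np

N, pad = 32, 256
n = np.arange(N)
x = np.sin(2 * np.pi * 5.3 * n / N)            # a tone that falls between DFT bins

coarse = np.abs(np.fft.fft(x))                 # N samples of the DTFT magnitude
fine = np.abs(np.fft.fft(x, pad))              # 256 samples of the same DTFT

# The coarse spectrum is a subset of the fine one (every pad//N-th sample).
print(np.allclose(coarse, fine[::pad // N]))   # True
```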
Filter bank
See FFT filter banks and Sampling the DTFT.
Data compression
The field of digital signal processing relies heavily on operations in the frequency domain (i.e. on the Fourier
transform). For example, several lossy image and sound compression methods employ the discrete Fourier
transform: the signal is cut into short segments, each is transformed, and then the Fourier coefficients of high
frequencies, which are assumed to be unnoticeable, are discarded. The decompressor computes the inverse transform
based on this reduced number of Fourier coefficients. (Compression applications often use a specialized form of the
DFT, the discrete cosine transform or sometimes the modified discrete cosine transform.) Some relatively recent
compression algorithms, however, use wavelet transforms, which give a more uniform compromise between time
and frequency domain than obtained by chopping data into segments and transforming each segment. In the case of
JPEG2000, this avoids the spurious image features that appear when images are highly compressed with the original
JPEG.
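As a toy illustration of this segment-transform-discard pipeline (a sketch, not any specific codec; the block content, the use of rfft, and the cutoff of 8 retained coefficients are all arbitrary choices):

```python
import numpy as np

# One block of a mostly low-frequency signal with a little noise.
block = np.sin(np.linspace(0, 3, 64)) + 0.05 * np.random.randn(64)
X = np.fft.rfft(block)

keep = 8                              # retain only the lowest-frequency coefficients
X_compressed = np.zeros_like(X)
X_compressed[:keep] = X[:keep]

# The "decompressor" inverts the reduced set of coefficients.
reconstructed = np.fft.irfft(X_compressed, n=len(block))
print(np.max(np.abs(reconstructed - block)))   # modest: most energy is low-frequency
```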
Partial differential equations
Discrete Fourier transforms are often used to solve partial differential equations, where again the DFT is used as an
approximation for the Fourier series (which is recovered in the limit of infinite N). The advantage of this approach is
that it expands the signal in complex exponentials $e^{inx}$, which are eigenfunctions of differentiation: $\frac{d}{dx} e^{inx} = in\, e^{inx}$. Thus, in the Fourier representation, differentiation is simple: we just multiply by $in$. (Note, however, that the
choice of n is not unique due to aliasing; for the method to be convergent, a choice similar to that in the
trigonometric interpolation section above should be used.) A linear differential equation with constant coefficients is
transformed into an easily solvable algebraic equation. One then uses the inverse DFT to transform the result back
into the ordinary spatial representation. Such an approach is called a spectral method.
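A minimal spectral-differentiation sketch, using the frequency ordering returned by np.fft.fftfreq so that the bandlimited choice of n discussed above is respected:

```python
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N             # periodic grid on [0, 2*pi)
u = np.sin(3 * x)                            # test function

n = np.fft.fftfreq(N, d=1.0 / N)             # frequencies 0, 1, ..., -2, -1
du = np.fft.ifft(1j * n * np.fft.fft(u)).real

print(np.max(np.abs(du - 3 * np.cos(3 * x))))   # ~1e-13: matches d/dx sin(3x)
```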
Polynomial multiplication
Suppose we wish to compute the polynomial product c(x) = a(x) b(x). The ordinary product expression for the
coefficients of c involves a linear (acyclic) convolution, where indices do not "wrap around." This can be rewritten
as a cyclic convolution by taking the coefficient vectors for a(x) and b(x) with constant term first, then appending
zeros so that the resultant coefficient vectors a and b have dimension d > deg(a(x)) + deg(b(x)). Then,

$$\mathbf{c} = \mathbf{a} * \mathbf{b}$$

where c is the vector of coefficients for c(x), and the (cyclic) convolution operator is defined so

$$c_n = \sum_{m=0}^{d-1} a_m \, b_{(n-m) \bmod d}, \qquad n = 0, 1, \ldots, d-1.$$

But convolution becomes multiplication under the DFT:

$$\mathcal{F}(\mathbf{c}) = \mathcal{F}(\mathbf{a}) \cdot \mathcal{F}(\mathbf{b}).$$

Here the vector product is taken elementwise. Thus the coefficients of the product polynomial c(x) are just the terms 0, ..., deg(a(x)) + deg(b(x)) of the coefficient vector

$$\mathbf{c} = \mathcal{F}^{-1}\big( \mathcal{F}(\mathbf{a}) \cdot \mathcal{F}(\mathbf{b}) \big).$$
With a fast Fourier transform, the resulting algorithm takes O(N log N) arithmetic operations. Due to its simplicity and speed, the Cooley-Tukey FFT algorithm, which is limited to composite sizes, is often chosen for the transform
operation. In this case, d should be chosen as the smallest integer greater than the sum of the input polynomial
degrees that is factorizable into small prime factors (e.g. 2, 3, and 5, depending upon the FFT implementation).
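A short sketch of the whole procedure; here the transform size d is taken as exactly deg(a(x)) + deg(b(x)) + 1 rather than rounded up to a smooth composite size:

```python
import numpy as np

def poly_multiply(a, b):
    """Multiply polynomials given as coefficient lists, constant term first."""
    d = len(a) + len(b) - 1                   # smallest valid transform size
    C = np.fft.fft(a, d) * np.fft.fft(b, d)   # elementwise product of the DFTs
    return np.fft.ifft(C).real                # coefficients of c(x) = a(x) b(x)

a = [1, 2, 3]          # 1 + 2x + 3x^2
b = [4, 5]             # 4 + 5x
print(np.round(poly_multiply(a, b)))   # [ 4. 13. 22. 15.] = 4 + 13x + 22x^2 + 15x^3
```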
Multiplication of large integers
The fastest known algorithms for the multiplication of very large integers use the polynomial multiplication method
outlined above. Integers can be treated as the value of a polynomial evaluated specifically at the number base, with
the coefficients of the polynomial corresponding to the digits in that base. After polynomial multiplication, a
relatively low-complexity carry-propagation step completes the multiplication.
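A toy base-10 version of this idea, reusing the poly_multiply sketch above (real implementations use much larger bases and a careful error analysis to guarantee correct rounding of the floating-point coefficients):

```python
def int_multiply(x, y):
    """Multiply nonnegative integers via FFT polynomial multiplication in base 10."""
    a = [int(d) for d in str(x)][::-1]        # least-significant digit first
    b = [int(d) for d in str(y)][::-1]
    coeffs = [int(round(c)) for c in poly_multiply(a, b)]
    digits, carry = [], 0
    for c in coeffs:                          # low-complexity carry propagation
        carry, digit = divmod(c + carry, 10)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, 10)
        digits.append(digit)
    return int("".join(map(str, digits[::-1])))

print(int_multiply(12345, 6789) == 12345 * 6789)   # True
```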
Convolution
When data is convolved with a function with wide support, such as for downsampling by a large sampling ratio,
because of the Convolution theorem and the FFT algorithm, it may be faster to transform it, multiply pointwise by
the transform of the filter and then reverse transform it. Alternatively, a good filter is obtained by simply truncating
the transformed data and re-transforming the shortened data set.
Some discrete Fourier transform pairs
Some DFT pairs ($x_n \leftrightarrow X_k$), with notes:

$x_n \, e^{\frac{2\pi i}{N} n m} \;\leftrightarrow\; X_{k-m}$   (Shift theorem)

$x_n \in \mathbb{R} \;\leftrightarrow\; X_k = X_{N-k}^{*}$   (Real DFT)

$x_n = a^n \;\leftrightarrow\; X_k = \dfrac{1 - a^N}{1 - a\, e^{-\frac{2\pi i}{N} k}}$   (from the geometric progression formula)

$x_n = \dbinom{N-1}{n} \;\leftrightarrow\; X_k = \left(1 + e^{-\frac{2\pi i}{N} k}\right)^{N-1}$   (from the binomial theorem)

$x_n$ a rectangular window function of W points centered on n = 0, where W is an odd integer $\;\leftrightarrow\;$ $X_k = \dfrac{\sin\!\left(\frac{\pi W k}{N}\right)}{\sin\!\left(\frac{\pi k}{N}\right)}$, a sinc-like function (specifically, a Dirichlet kernel)

Discretization and periodic summation of the scaled Gaussian functions for $c > 0$ (the transform is again a periodized Gaussian, with parameter $1/c$). Since either $c$ or $1/c$ is larger than one and thus warrants fast convergence of one of the two series, for large $c$ you may choose to compute the frequency spectrum and convert to the time domain using the discrete Fourier transform.
Generalizations
Representation theory
For more details on this topic, see Representation theory of finite groups Discrete Fourier transform.
The DFT can be interpreted as the complex-valued representation theory of the finite cyclic group. In other words, a sequence of n complex numbers can be thought of as an element of n-dimensional complex space $\mathbb{C}^n$ or equivalently a function f from the finite cyclic group of order n to the complex numbers, $\mathbb{Z}_n \to \mathbb{C}$. So f is a class function on the finite cyclic group, and thus can be expressed as a linear combination of the irreducible characters of this group, which are the roots of unity.
From this point of view, one may generalize the DFT to representation theory generally, or more narrowly to the
representation theory of finite groups.
More narrowly still, one may generalize the DFT by either changing the target (taking values in a field other than the
complex numbers), or the domain (a group other than a finite cyclic group), as detailed in the sequel.
Other fields
Main articles: Discrete Fourier transform (general) and Number-theoretic transform
Many of the properties of the DFT depend only on the fact that $e^{-2\pi i / N}$ is a primitive root of unity, sometimes denoted $\omega_N$ or $W_N$ (so that $\omega_N^N = 1$). Such properties include the completeness, orthogonality,
Plancherel/Parseval, periodicity, shift, convolution, and unitarity properties above, as well as many FFT algorithms.
For this reason, the discrete Fourier transform can be defined by using roots of unity in fields other than the complex
numbers, and such generalizations are commonly called number-theoretic transforms (NTTs) in the case of finite
fields. For more information, see number-theoretic transform and discrete Fourier transform (general).
Other finite groups
Main article: Fourier transform on finite groups
The standard DFT acts on a sequence $x_0, x_1, \ldots, x_{N-1}$ of complex numbers, which can be viewed as a function $\{0, 1, \ldots, N-1\} \to \mathbb{C}$. The multidimensional DFT acts on multidimensional sequences, which can be viewed as functions

$$\{0, 1, \ldots, N_1 - 1\} \times \cdots \times \{0, 1, \ldots, N_d - 1\} \to \mathbb{C}.$$

This suggests the generalization to Fourier transforms on arbitrary finite groups, which act on functions $G \to \mathbb{C}$
where G is a finite group. In this framework, the standard DFT is seen as the Fourier transform on a cyclic group,
while the multidimensional DFT is a Fourier transform on a direct sum of cyclic groups.
Alternatives
Main article: Discrete wavelet transform
For more details on this topic, see Discrete wavelet transform Comparison with Fourier transform.
There are various alternatives to the DFT for various applications, prominent among which are wavelets. The analog
of the DFT is the discrete wavelet transform (DWT). From the point of view of time-frequency analysis, a key
limitation of the Fourier transform is that it does not include location information, only frequency information, and
thus has difficulty in representing transients. As wavelets have location as well as frequency, they are better able to
represent location, at the expense of greater difficulty representing frequency. For details, see comparison of the
discrete wavelet transform with the discrete Fourier transform.
Notes
[1] Cooley et al., 1969
[2] In this context, it is common to define $\omega_N$ to be the Nth primitive root of unity, $\omega_N = e^{-2\pi i / N}$, to obtain the following form:

$$X_k = \sum_{n=0}^{N-1} x_n \, \omega_N^{k n}.$$
[3] As a linear transformation on a finite-dimensional vector space, the DFT expression can also be written in terms of a DFT matrix; when scaled appropriately it becomes a unitary matrix and the $X_k$ can thus be viewed as coefficients of x in an orthonormal basis.
[4] T. G. Stockham, Jr., "High-speed convolution and correlation" (http://dl.acm.org/citation.cfm?id=1464209), in 1966 Proc. AFIPS Spring Joint Computing Conf. Reprinted in Digital Signal Processing, L. R. Rabiner and C. M. Rader, editors, New York: IEEE Press, 1972.
[5] Santhanam, Balu; Santhanam, Thalanayar S., "Discrete Gauss-Hermite functions and eigenvectors of the centered discrete Fourier transform" (http://thamakau.usc.edu/Proceedings/ICASSP 2007/pdfs/0301385.pdf), Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007, SPTM-P12.4), vol. III, pp. 1385-1388.
[6] Akansu, Ali N.; Agirman-Tosun, Handan, "Generalized Discrete Fourier Transform With Nonlinear Phase" (http://web.njit.edu/~akansu/PAPERS/AkansuIEEE-TSP2010.pdf), IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4547-4556, Sept. 2010.
References
Brigham, E. Oran (1988). The Fast Fourier Transform and Its Applications. Englewood Cliffs, N.J.: Prentice Hall. ISBN 0-13-307505-2.
Oppenheim, Alan V.; Schafer, R. W.; and Buck, J. R. (1999). Discrete-Time Signal Processing. Upper Saddle River, N.J.: Prentice Hall. ISBN 0-13-754920-2.
Smith, Steven W. (1999). "Chapter 8: The Discrete Fourier Transform" (http://www.dspguide.com/ch8/1.htm). The Scientist and Engineer's Guide to Digital Signal Processing (Second ed.). San Diego, Calif.: California Technical Publishing. ISBN 0-9660176-3-3.
Cormen, Thomas H.; Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (2001). "Chapter 30: Polynomials and the FFT". Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill. pp. 822-848. ISBN 0-262-03293-7. esp. section 30.2: The DFT and FFT, pp. 830-838.
P. Duhamel, B. Piron, and J. M. Etcheto (1988). "On computing the inverse DFT". IEEE Trans. Acoust., Speech and Sig. Processing 36 (2): 285-286. doi:10.1109/29.1519 (http://dx.doi.org/10.1109/29.1519).
J. H. McClellan and T. W. Parks (1972). "Eigenvalues and eigenvectors of the discrete Fourier transformation". IEEE Trans. Audio Electroacoust. 20 (1): 66-74. doi:10.1109/TAU.1972.1162342 (http://dx.doi.org/10.1109/TAU.1972.1162342).
Bradley W. Dickinson and Kenneth Steiglitz (1982). "Eigenvectors and functions of the discrete Fourier transform". IEEE Trans. Acoust., Speech and Sig. Processing 30 (1): 25-31. doi:10.1109/TASSP.1982.1163843 (http://dx.doi.org/10.1109/TASSP.1982.1163843). (Note that this paper has an apparent typo in its table of the eigenvalue multiplicities: the +i/−i columns are interchanged. The correct table can be found in McClellan and Parks, 1972, and is easily confirmed numerically.)
F. A. Grünbaum (1982). "The eigenvectors of the discrete Fourier transform". J. Math. Anal. Appl. 88 (2): 355-363. doi:10.1016/0022-247X(82)90199-8 (http://dx.doi.org/10.1016/0022-247X(82)90199-8).
Natig M. Atakishiyev and Kurt Bernardo Wolf (1997). "Fractional Fourier-Kravchuk transform". J. Opt. Soc. Am. A 14 (7): 1467-1477. Bibcode: 1997JOSAA..14.1467A (http://adsabs.harvard.edu/abs/1997JOSAA..14.1467A). doi:10.1364/JOSAA.14.001467 (http://dx.doi.org/10.1364/JOSAA.14.001467).
C. Candan, M. A. Kutay and H. M. Ozaktas (2000). "The discrete fractional Fourier transform". IEEE Trans. on Signal Processing 48 (5): 1329-1337. Bibcode: 2000ITSP...48.1329C (http://adsabs.harvard.edu/abs/2000ITSP...48.1329C). doi:10.1109/78.839980 (http://dx.doi.org/10.1109/78.839980).
Magdy Tawfik Hanna, Nabila Philip Attalla Seif, and Waleed Abd El Maguid Ahmed (2004). "Hermite-Gaussian-like eigenvectors of the discrete Fourier transform matrix based on the singular-value decomposition of its orthogonal projection matrices". IEEE Trans. Circ. Syst. I 51 (11): 2245-2254. doi:10.1109/TCSI.2004.836850 (http://dx.doi.org/10.1109/TCSI.2004.836850).
Shamgar Gurevich and Ronny Hadani (2009). "On the diagonalization of the discrete Fourier transform". Applied and Computational Harmonic Analysis 27 (1): 87-99. arXiv:0808.3281 (http://arxiv.org/abs/0808.3281). doi:10.1016/j.acha.2008.11.003 (http://dx.doi.org/10.1016/j.acha.2008.11.003).
Shamgar Gurevich, Ronny Hadani, and Nir Sochen (2008). "The finite harmonic oscillator and its applications to sequences, communication and radar". IEEE Transactions on Information Theory 54 (9): 4239-4253. arXiv:0808.1495 (http://arxiv.org/abs/0808.1495). doi:10.1109/TIT.2008.926440 (http://dx.doi.org/10.1109/TIT.2008.926440).
Juan G. Vargas-Rubio and Balu Santhanam (2005). "On the multiangle centered discrete fractional Fourier transform". IEEE Sig. Proc. Lett. 12 (4): 273-276. Bibcode: 2005ISPL...12..273V (http://adsabs.harvard.edu/abs/2005ISPL...12..273V). doi:10.1109/LSP.2005.843762 (http://dx.doi.org/10.1109/LSP.2005.843762).
J. Cooley, P. Lewis, and P. Welch (1969). "The finite Fourier transform". IEEE Trans. Audio Electroacoustics 17 (2): 77-85. doi:10.1109/TAU.1969.1162036 (http://dx.doi.org/10.1109/TAU.1969.1162036).
F. N. Kong (2008). "Analytic Expressions of Two Discrete Hermite-Gaussian Signals". IEEE Trans. Circuits and Systems II: Express Briefs 55 (1): 56-60. doi:10.1109/TCSII.2007.909865 (http://dx.doi.org/10.1109/TCSII.2007.909865).
External links
Matlab tutorial on the Discrete Fourier Transformation (http://www.nbtwiki.net/doku.php?id=tutorial:the_discrete_fourier_transformation_dft)
Interactive flash tutorial on the DFT (http://www.fourier-series.com/fourierseries2/DFT_tutorial.html)
Mathematics of the Discrete Fourier Transform by Julius O. Smith III (http://ccrma.stanford.edu/~jos/mdft/mdft.html)
Fast implementation of the DFT, coded in C and under the General Public License (GPL) (http://www.fftw.org)
The DFT "à Pied": Mastering the Fourier Transform in One Day (http://www.dspdimension.com/admin/dft-a-pied/)
Explained: The Discrete Fourier Transform (http://web.mit.edu/newsoffice/2009/explained-fourier.html)
Wavetable Cooker (http://noisemakessound.com/blofeld-wavetable-cooker/), a GPL application with graphical interface written in C, implementing DFT/IDFT to generate a wavetable set
Fast Fourier transform
"FFT" redirects here. For other uses, see FFT (disambiguation).
[Figure: Frequency and time domain for the same signal.]
A fast Fourier transform (FFT) is an algorithm to compute the
discrete Fourier transform (DFT) and its inverse. Fourier analysis
converts time (or space) to frequency and vice versa; an FFT rapidly
computes such transformations by factorizing the DFT matrix into a
product of sparse (mostly zero) factors.
[1]
As a result, fast Fourier
transforms are widely used for many applications in engineering,
science, and mathematics. The basic ideas were popularized in 1965,
but some FFTs had been previously known as early as 1805. Fast
Fourier transforms have been described as "the most important
numerical algorithm[s] of our lifetime".
Overview
There are many different FFT algorithms involving a wide range of mathematics, from simple complex-number
arithmetic to group theory and number theory; this article gives an overview of the available techniques and some of
their general properties, while the specific algorithms are described in subsidiary articles linked below.
The DFT is obtained by decomposing a sequence of values into components of different frequencies. This operation
is useful in many fields (see discrete Fourier transform for properties and applications of the transform) but
computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same
result more quickly: computing the DFT of N points in the naive way, using the definition, takes O(N²) arithmetical operations, while an FFT can compute the same DFT in only O(N log N) operations. The difference in speed can be
enormous, especially for long data sets where N may be in the thousands or millions. In practice, the computation
time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N
/ log(N). This huge improvement made the calculation of the DFT practical; FFTs are of great importance to a wide
variety of applications, from digital signal processing and solving partial differential equations to algorithms for
quick multiplication of large integers.
The best-known FFT algorithms depend upon the factorization of N, but there are FFTs with O(N log N) complexity for all N, even for prime N. Many FFT algorithms depend only on the fact that $e^{-2\pi i / N}$ is an N-th primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms.
Since the inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a 1/N factor, any FFT
algorithm can easily be adapted for it.
Definition and speed
An FFT computes the DFT and produces exactly the same result as evaluating the DFT definition directly; the most
important difference is that an FFT is much faster. (In the presence of round-off error, many FFT algorithms are also
much more accurate than evaluating the DFT definition directly, as discussed below.)
Let $x_0, \ldots, x_{N-1}$ be complex numbers. The DFT is defined by the formula

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, \ldots, N-1.$$

Evaluating this definition directly requires O(N²) operations: there are N outputs $X_k$, and each output requires a sum of N terms. An FFT is any method to compute the same results in O(N log N) operations. More precisely, all known FFT algorithms require Θ(N log N) operations (technically, O only denotes an upper bound), although there is no
known proof that a lower complexity is impossible (Johnson and Frigo, 2007).
To illustrate the savings of an FFT, consider the count of complex multiplications and additions. Evaluating the DFT's sums directly involves N² complex multiplications and N(N−1) complex additions [of which O(N) operations can be saved by eliminating trivial operations such as multiplications by 1]. The well-known radix-2 Cooley-Tukey algorithm, for N a power of 2, can compute the same result with only (N/2) log₂(N) complex multiplications (again, ignoring simplifications of multiplications by 1 and similar) and N log₂(N) complex additions. In practice, actual performance on modern computers is usually dominated by factors other than the speed of arithmetic operations and the analysis is a complicated subject (see, e.g., Frigo & Johnson, 2005), but the overall improvement from O(N²) to O(N log N) remains.
Algorithms
Cooley-Tukey algorithm
Main article: Cooley-Tukey FFT algorithm
By far the most commonly used FFT is the Cooley-Tukey algorithm. This is a divide and conquer algorithm that recursively breaks down a DFT of any composite size $N = N_1 N_2$ into many smaller DFTs of sizes $N_1$ and $N_2$, along with O(N) multiplications by complex roots of unity traditionally called twiddle factors (after Gentleman and Sande, 1966).
This method (and the general idea of an FFT) was popularized by a publication of J. W. Cooley and J. W. Tukey in
1965, but it was later discovered (Heideman, Johnson, & Burrus, 1984) that those two authors had independently
re-invented an algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times
in limited forms).
The best-known use of the Cooley-Tukey algorithm is to divide the transform into two pieces of size N/2 at each
step, and is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to
both Gauss and Cooley/Tukey). These are called the radix-2 and mixed-radix cases, respectively (and other variants
such as the split-radix FFT have their own names as well). Although the basic idea is recursive, most traditional
implementations rearrange the algorithm to avoid explicit recursion. Also, because the Cooley-Tukey algorithm
breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT, such as
those described below.
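The recursive structure described above is compact enough to state directly; the following is a minimal radix-2 sketch for power-of-two N, written for clarity rather than speed:

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT (requires len(x) to be a power of 2)."""
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])               # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])                # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.random.randn(16) + 1j * np.random.randn(16)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # True
```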
Other FFT algorithms
Main articles: Prime-factor FFT algorithm, Bruun's FFT algorithm, Rader's FFT algorithm and Bluestein's FFT
algorithm
There are other FFT algorithms distinct from Cooley-Tukey.
Cornelius Lanczos did pioneering work on the FFS and FFT with G. C. Danielson (1940).
For $N = N_1 N_2$ with coprime $N_1$ and $N_2$, one can use the Prime-Factor (Good-Thomas) algorithm (PFA), based on the Chinese Remainder Theorem, to factorize the DFT similarly to Cooley-Tukey but without the twiddle factors. The Rader-Brenner algorithm (1976) is a Cooley-Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability; it was later superseded by the split-radix variant of Cooley-Tukey (which achieves the same multiplication count but with fewer additions and without sacrificing accuracy). Algorithms that recursively factorize the DFT into smaller operations other than DFTs
include the Bruun and QFT algorithms. (The Rader-Brenner and QFT algorithms were proposed for power-of-two
sizes, but it is possible that they could be adapted to general composite n. Bruun's algorithm applies to arbitrary even
composite sizes.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the
polynomial $z^N - 1$, here into real-coefficient polynomials of the form $z^M - 1$ and $z^{2M} + a z^M + 1$.
Another polynomial viewpoint is exploited by the Winograd algorithm, which factorizes $z^N - 1$ into cyclotomic polynomials; these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications, so
Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small
factors. Indeed, Winograd showed that the DFT can be computed with only O(N) irrational multiplications, leading
to a proven achievable lower bound on the number of multiplications for power-of-two sizes; unfortunately, this
comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware
multipliers. In particular, Winograd also makes use of the PFA as well as an algorithm by Rader for FFTs of prime
sizes.
Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime N, expresses a DFT of prime size N as a cyclic convolution of (composite) size N−1, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley-Tukey FFTs, for example), via the identity

$$nk = -\frac{(k-n)^2}{2} + \frac{n^2}{2} + \frac{k^2}{2}.$$
FFT algorithms specialized for real and/or symmetric data
In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

$$X_{N-k} = X_k^{*},$$

and efficient FFT algorithms have been designed for this situation (see e.g. Sorensen, 1987). One approach consists
of taking an ordinary algorithm (e.g. Cooley-Tukey) and removing the redundant parts of the computation, saving
roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as
a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real
data), followed by O(N) post-processing operations.
It was once believed that real-input DFTs could be more efficiently computed by means of the discrete Hartley
transform (DHT), but it was subsequently argued that a specialized real-input DFT algorithm (FFT) can typically be
found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs.
Bruun's algorithm (above) is another method that was initially proposed to take advantage of real inputs, but it has
not proved popular.
There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can
gain another factor of (roughly) two in time and memory and the DFT becomes the discrete cosine/sine transform(s)
(DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via
FFTs of real data combined with O(N) pre/post processing.
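A hedged sketch of one such FFT-based DCT route (a Makhoul-style even/odd reordering; the helper name and the unnormalized DCT-II convention are assumptions for illustration, not taken from the source):

import numpy as np

def dct2_via_fft(x):
    # Unnormalized DCT-II via one length-N complex FFT: reorder the input as
    # (evens, then reversed odds), transform, and rotate by a quarter-sample phase.
    N = len(x)
    v = np.concatenate([x[0::2], x[1::2][::-1]])
    V = np.fft.fft(v)
    k = np.arange(N)
    return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)

# Check against the direct O(N^2) definition X_k = 2 sum_n x_n cos(pi k (2n+1) / (2N)).
x = np.random.rand(8)
n, k = np.arange(8), np.arange(8)[:, None]
direct = 2.0 * (np.cos(np.pi * k * (2 * n + 1) / 16) * x).sum(axis=1)
assert np.allclose(dct2_via_fft(x), direct)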
Computational issues
Bounds on complexity and operation counts
An unsolved problem in computer science: What is the lower bound on the complexity of fast Fourier transform algorithms? Can they be faster than O(N log N)?
A fundamental question of longstanding theoretical interest is to prove lower bounds on the complexity and exact operation counts of fast Fourier transforms, and many open problems remain. It is not even rigorously proved whether DFTs truly require Ω(N log N) (i.e., order N log N or greater) operations, even for the simple case of power-of-two sizes, although no algorithms with lower complexity are known. In particular, the count of arithmetic operations is usually the focus of such questions, although actual performance on modern-day computers is determined by many other factors such as cache or CPU pipeline optimization.
Following pioneering work by Winograd (1978), a tight Ω(N) lower bound is known for the number of real multiplications required by an FFT. It can be shown that only 4N − 2(log₂N)² − 2log₂N − 4 irrational real multiplications are required to compute a DFT of power-of-two length N = 2^m. Moreover, explicit algorithms that achieve this count are known (Heideman & Burrus, 1986; Duhamel, 1990). Unfortunately, these algorithms require too many additions to be practical, at least on modern computers with hardware multipliers.
A tight lower bound is not known on the number of required additions, although lower bounds have been proved under some restrictive assumptions on the algorithms. In 1973, Morgenstern proved an Ω(N log N) lower bound on the addition count for algorithms where the multiplicative constants have bounded magnitudes (which is true for most but not all FFT algorithms). Pan (1986) proved an Ω(N log N) lower bound assuming a bound on a measure of the FFT algorithm's "asynchronicity", but the generality of this assumption is unclear. For the case of power-of-two N, Papadimitriou (1979) argued that the number N log₂N of complex-number additions achieved by Cooley–Tukey algorithms is optimal under certain assumptions on the graph of the algorithm (his assumptions imply, among other things, that no additive identities in the roots of unity are exploited). (This argument would imply that at least 2N log₂N real additions are required, although this is not a tight bound because extra additions are required as part of complex-number multiplications.) Thus far, no published FFT algorithm has achieved fewer than N log₂N complex-number additions (or their equivalent) for power-of-two N.
A third problem is to minimize the total number of real multiplications and additions, sometimes called the "arithmetic complexity" (although in this context it is the exact count and not the asymptotic complexity that is being considered). Again, no tight lower bound has been proven. Since 1968, however, the lowest published count for power-of-two N was long achieved by the split-radix FFT algorithm, which requires 4N log₂N − 6N + 8 real multiplications and additions for N > 1. This was recently reduced to roughly (34/9)N log₂N (Johnson and Frigo, 2007; Lundy and Van Buskirk, 2007). A slightly larger count (but still better than split radix for N ≥ 256) was shown to be provably optimal for N ≤ 512 under additional restrictions on the possible algorithms (split-radix-like flowgraphs with unit-modulus multiplicative factors), by reduction to a satisfiability modulo theories problem solvable by brute force (Haynal & Haynal, 2011).
Most of the attempts to lower or prove the complexity of FFT algorithms have focused on the ordinary complex-data
case, because it is the simplest. However, complex-data FFTs are so closely related to algorithms for related
problems such as real-data FFTs, discrete cosine transforms, discrete Hartley transforms, and so on, that any
improvement in one of these would immediately lead to improvements in the others (Duhamel & Vetterli, 1990).
Accuracy and approximations
All of the FFT algorithms discussed above compute the DFT exactly (in exact arithmetic, i.e. neglecting
floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately,
with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the
approximation error for increased speed or other properties. For example, an approximate FFT algorithm by
Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fast
multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). The Edelman algorithm works equally well for sparse and non-sparse data, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data. Conversely, if the data are sparse, that is, if only K out of N Fourier coefficients are nonzero, then the complexity can be reduced to O(K log(N) log(N/K)), and this has been demonstrated to lead to practical speedups compared to an ordinary FFT for N/K > 32 in a large-N example (N = 2^22) using a probabilistic approximate algorithm (which estimates the largest K coefficients to several decimal places).[2]
Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties as a consequence of the pairwise summation structure of the algorithms. The upper bound on the relative error for the Cooley–Tukey algorithm is O(ε log N), compared to O(ε N^(3/2)) for the naïve DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative precision. In fact, the root mean square (rms) errors are much better than these upper bounds, being only O(ε √(log N)) for Cooley–Tukey and O(ε √N) for the naïve DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate trigonometric recurrence formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.
In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as O(√N) for the Cooley–Tukey algorithm (Welch, 1969). Moreover, even achieving this accuracy requires careful attention to scaling in order to minimize the loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley–Tukey.
To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in O(N log N) time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995).
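A simplified sketch in Python of such randomized checks (inspired by, but not reproducing, Ergün's full procedure; the tolerances and trial counts are arbitrary choices):

import numpy as np

def quick_fft_checks(fft, N, trials=20, tol=1e-9):
    # Randomized linearity / time-shift / impulse-response checks on an FFT
    # implementation (a simplified sketch in the spirit of Ergün's procedure).
    rng = np.random.default_rng(0)
    shift = np.exp(-2j * np.pi * np.arange(N) / N)
    for _ in range(trials):
        x, y = rng.standard_normal(N), rng.standard_normal(N)
        a, b = rng.standard_normal(2)
        # Linearity: FFT(a*x + b*y) = a*FFT(x) + b*FFT(y)
        assert np.allclose(fft(a * x + b * y), a * fft(x) + b * fft(y), atol=tol)
        # Time shift: delaying x by one sample multiplies bin k by exp(-2*pi*i*k/N)
        assert np.allclose(fft(np.roll(x, 1)), shift * fft(x), atol=tol)
    # Impulse response: the DFT of a unit impulse at n = 0 is the all-ones vector
    e0 = np.zeros(N)
    e0[0] = 1.0
    assert np.allclose(fft(e0), np.ones(N), atol=tol)
    return True

quick_fft_checks(np.fft.fft, 64)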
Multidimensional FFTs
As defined in the multidimensional DFT article, the multidimensional DFT transforms an array x_n with a d-dimensional vector of indices n = (n_1, n_2, ..., n_d) by a set of d nested summations (over n_j = 0 ... N_j − 1 for each j), where the division n/N, defined as n/N = (n_1/N_1, ..., n_d/N_d), is performed element-wise. Equivalently, it is the composition of a sequence of d sets of one-dimensional DFTs, performed along one dimension at a time (in any order).
This compositional viewpoint immediately provides the simplest and most common multidimensional DFT algorithm, known as the row-column algorithm (after the two-dimensional case, below). That is, one simply performs a sequence of d one-dimensional FFTs (by any of the above algorithms): first you transform along the n_1 dimension, then along the n_2 dimension, and so on (actually, any ordering will work). This method is easily shown to have the usual O(N log N) complexity, where N = N_1 N_2 ... N_d is the total number of data points transformed. In particular, there are N/N_1 transforms of size N_1, etcetera, so the complexity of the sequence of FFTs is:

(N/N_1) O(N_1 log N_1) + ... + (N/N_d) O(N_d log N_d) = O(N log N).
In two dimensions, the x_k can be viewed as an N_1 × N_2 matrix, and this algorithm corresponds to first performing the FFT of all the rows (resp. columns), grouping the resulting transformed rows (resp. columns) together as another N_1 × N_2 matrix, and then performing the FFT on each of the columns (resp. rows) of this second matrix, and similarly grouping the results into the final result matrix.
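In numpy this row-column identity can be demonstrated in a few lines (np.fft.fft2 is used only as the reference answer; the array shape is an arbitrary example):

import numpy as np

# Row-column algorithm in two dimensions: 1d FFTs along the rows, then 1d
# FFTs along the columns of the result, reproduce the full 2d DFT.
x = np.random.rand(4, 8) + 1j * np.random.rand(4, 8)
rows_done = np.fft.fft(x, axis=1)          # transform each row
both_done = np.fft.fft(rows_done, axis=0)  # then each column of the result
assert np.allclose(both_done, np.fft.fft2(x))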
In more than two dimensions, it is often advantageous for cache locality to group the dimensions recursively. For example, a three-dimensional FFT might first perform two-dimensional FFTs of each planar "slice" for each fixed n_1, and then perform the one-dimensional FFTs along the n_1 direction. More generally, an asymptotically optimal cache-oblivious algorithm consists of recursively dividing the dimensions into two groups (n_1, ..., n_{d/2}) and (n_{d/2+1}, ..., n_d) that are transformed recursively (rounding if d is not even) (see Frigo and Johnson, 2005). Still, this remains a straightforward variation of the row-column algorithm that ultimately requires only a one-dimensional FFT algorithm as the base case, and still has O(N log N) complexity. Yet another variation is to perform matrix transpositions in between transforming subsequent dimensions, so that the transforms operate on contiguous data; this is especially important for out-of-core and distributed memory situations where accessing non-contiguous data is extremely time-consuming.
There are other multidimensional FFT algorithms that are distinct from the row-column algorithm, although all of them have O(N log N) complexity. Perhaps the simplest non-row-column FFT is the vector-radix FFT algorithm, which is a generalization of the ordinary Cooley–Tukey algorithm where one divides the transform dimensions by a vector r = (r_1, r_2, ..., r_d) of radices at each step. (This may also have cache benefits.) The simplest case of vector-radix is where all of the radices are equal (e.g. vector-radix-2 divides all of the dimensions by two), but this is not necessary. Vector radix with only a single non-unit radix at a time, i.e. r = (1, ..., 1, r, 1, ..., 1), is essentially a row-column algorithm. Other, more complicated, methods include polynomial transform algorithms due to Nussbaumer (1977), which view the transform in terms of convolutions and polynomial products. See Duhamel and Vetterli (1990) for more information and references.
Other generalizations
An O(N^(5/2) log N) generalization to spherical harmonics on the sphere S^2 with N^2 nodes was described by Mohlenkamp (1999), along with an algorithm conjectured (but not proven) to have O(N^2 log^2 N) complexity; Mohlenkamp also provides an implementation in the libftsh library.[3] A spherical-harmonic algorithm with O(N^2 log N) complexity is described by Rokhlin and Tygert (2006).
The Fast Folding Algorithm is analogous to the FFT, except that it operates on a series of binned waveforms rather
than a series of real or complex scalar values. Rotation (which in the FFT is multiplication by a complex phasor) is a
circular shift of the component waveform.
Various groups have also published "FFT" algorithms for non-equispaced data, as reviewed in Potts et al. (2001).
Such algorithms do not strictly compute the DFT (which is only defined for equispaced data), but rather some
approximation thereof (a non-uniform discrete Fourier transform, or NDFT, which itself is often computed only
approximately). More generally there are various other methods of spectral estimation.
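For illustration, a direct O(NM) evaluation of one common nonuniform-DFT convention is sketched below; this is the quantity that fast NDFT/NFFT algorithms approximate in roughly O(N log N) time (the convention and all parameter choices here are illustrative assumptions, not from the source):

import numpy as np

# Direct evaluation of a nonuniform DFT at arbitrary sample times t_n and
# integer frequencies k, by explicit matrix multiplication.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 1, 64))           # nonequispaced sample locations in [0, 1)
x = rng.standard_normal(64)
k = np.arange(-32, 32)
F = np.exp(-2j * np.pi * k[:, None] * t)     # NDFT matrix: frequencies by samples
X = F @ x                                    # nonuniform DFT of the samples
print(X[:4])                                 # a few of the computed coefficients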
References
[1] Charles Van Loan, Computational Frameworks for the Fast Fourier Transform (SIAM, 1992).
[2] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price, "Simple and Practical Algorithm for Sparse Fourier Transform" (http://www.mit.edu/~ecprice/papers/sparse-fft-soda.pdf) (PDF), ACM-SIAM Symposium On Discrete Algorithms (SODA), Kyoto, January 2012. See also the sFFT Web Page (http://groups.csail.mit.edu/netmit/sFFT/).
[3] http://www.math.ohiou.edu/~mjm/research/libftsh.html
Brenner, N.; Rader, C. (1976). "A New Principle for Fast Fourier Transformation". IEEE Acoustics, Speech & Signal Processing 24 (3): 264–266. doi:10.1109/TASSP.1976.1162805.
Brigham, E. O. (2002). The Fast Fourier Transform. New York: Prentice-Hall.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, 2001. Introduction to Algorithms, 2nd. ed. MIT Press and McGraw-Hill. ISBN 0-262-03293-7. Especially chapter 30, "Polynomials and the FFT."
Duhamel, Pierre (1990). "Algorithms meeting the lower bounds on the multiplicative complexity of length-2^n DFTs and their connection with practical algorithms". IEEE Trans. Acoust. Speech. Sig. Proc. 38 (9): 1504–1511. doi:10.1109/29.60070.
P. Duhamel and M. Vetterli, 1990, Fast Fourier transforms: a tutorial review and a state of the art (http://dx.doi.org/10.1016/0165-1684(90)90158-U), Signal Processing 19: 259–299.
A. Edelman, P. McCorquodale, and S. Toledo, 1999, The Future Fast Fourier Transform? (http://dx.doi.org/10.1137/S1064827597316266), SIAM J. Sci. Computing 20: 1094–1114.
D. F. Elliott & K. R. Rao, 1982, Fast transforms: Algorithms, analyses, applications. New York: Academic Press.
Funda Ergün, 1995, Testing multivariate linear functions: Overcoming the generator bottleneck (http://dx.doi.org/10.1145/225058.225167), Proc. 27th ACM Symposium on the Theory of Computing: 407–416.
M. Frigo and S. G. Johnson, 2005, "The Design and Implementation of FFTW3" (http://fftw.org/fftw-paper-ieee.pdf), Proceedings of the IEEE 93: 216–231.
Carl Friedrich Gauss, 1866. "Theoria interpolationis methodo nova tractata" (http://lseet.univ-tln.fr/~iaroslav/Gauss_Theoria_interpolationis_methodo_nova_tractata.php), Werke band 3, 265–327. Göttingen: Königliche Gesellschaft der Wissenschaften.
W. M. Gentleman and G. Sande, 1966, "Fast Fourier transforms – for fun and profit," Proc. AFIPS 29: 563–578. doi:10.1145/1464291.1464352.
H. Guo and C. S. Burrus, 1996, Fast approximate Fourier transform via wavelets transform (http://dx.doi.org/10.1117/12.255236), Proc. SPIE Intl. Soc. Opt. Eng. 2825: 250–259.
H. Guo, G. A. Sitton, C. S. Burrus, 1994, The Quick Discrete Fourier Transform (http://dx.doi.org/10.1109/ICASSP.1994.389994), Proc. IEEE Conf. Acoust. Speech and Sig. Processing (ICASSP) 3: 445–448.
Steve Haynal and Heidi Haynal, "Generating and Searching Families of FFT Algorithms" (http://jsat.ewi.tudelft.nl/content/volume7/JSAT7_13_Haynal.pdf), Journal on Satisfiability, Boolean Modeling and Computation vol. 7, pp. 145–187 (2011).
Heideman, M. T.; Johnson, D. H.; Burrus, C. S. (1984). "Gauss and the history of the fast Fourier transform". IEEE ASSP Magazine 1 (4): 14–21. doi:10.1109/MASSP.1984.1162257.
Heideman, Michael T.; Burrus, C. Sidney (1986). "On the number of multiplications necessary to compute a length-2^n DFT". IEEE Trans. Acoust. Speech. Sig. Proc. 34 (1): 91–95. doi:10.1109/TASSP.1986.1164785.
S. G. Johnson and M. Frigo, 2007. "A modified split-radix FFT with fewer arithmetic operations" (http://www.fftw.org/newsplit.pdf), IEEE Trans. Signal Processing 55 (1): 111–119.
T. Lundy and J. Van Buskirk, 2007. "A new matrix approach to real FFTs and convolutions of length 2^k," Computing 80 (1): 23–45.
Kent, Ray D. and Read, Charles (2002). Acoustic Analysis of Speech. ISBN 0-7693-0112-6. Cites Strang, G. (1994, May–June). Wavelets. American Scientist, 82, 250–255.
Morgenstern, Jacques (1973). "Note on a lower bound of the linear complexity of the fast Fourier transform". J. ACM 20 (2): 305–306. doi:10.1145/321752.321761.
Mohlenkamp, M. J. (1999). "A fast transform for spherical harmonics" (http://www.math.ohiou.edu/~mjm/research/MOHLEN1999P.pdf). J. Fourier Anal. Appl. 5 (2-3): 159–184. doi:10.1007/BF01261607.
Nussbaumer, H. J. (1977). "Digital filtering using polynomial transforms". Electronics Lett. 13 (13): 386–387. doi:10.1049/el:19770280.
V. Pan, 1986, The trade-off between the additive complexity and the asynchronicity of linear and bilinear algorithms (http://dx.doi.org/10.1016/0020-0190(86)90035-9), Information Proc. Lett. 22: 11–14.
Christos H. Papadimitriou, 1979, Optimality of the fast Fourier transform (http://dx.doi.org/10.1145/322108.322118), J. ACM 26: 95–102.
D. Potts, G. Steidl, and M. Tasche, 2001. "Fast Fourier transforms for nonequispaced data: A tutorial" (http://www.tu-chemnitz.de/~potts/paper/ndft.pdf), in: J. J. Benedetto and P. Ferreira (Eds.), Modern Sampling Theory: Mathematics and Applications (Birkhäuser).
Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007), "Chapter 12. Fast Fourier Transform" (http://apps.nrbook.com/empanel/index.html#pg=600), Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8.
Rokhlin, Vladimir; Tygert, Mark (2006). "Fast algorithms for spherical harmonic expansions". SIAM J. Sci. Computing 27 (6): 1903–1928. doi:10.1137/050623073.
James C. Schatzman, 1996, Accuracy of the discrete Fourier transform and the fast Fourier transform (http://portal.acm.org/citation.cfm?id=240432), SIAM J. Sci. Comput. 17: 1150–1166.
Shentov, O. V.; Mitra, S. K.; Heute, U.; Hossen, A. N. (1995). "Subband DFT. I. Definition, interpretations and extensions". Signal Processing 41 (3): 261–277. doi:10.1016/0165-1684(94)00103-7.
Sorensen, H. V.; Jones, D. L.; Heideman, M. T.; Burrus, C. S. (1987). "Real-valued fast Fourier transform algorithms". IEEE Trans. Acoust. Speech Sig. Processing 35 (35): 849–863. doi:10.1109/TASSP.1987.1165220. See also Sorensen, H.; Jones, D.; Heideman, M.; Burrus, C. (1987). "Corrections to 'Real-valued fast Fourier transform algorithms'". IEEE Transactions on Acoustics, Speech, and Signal Processing 35 (9): 1353. doi:10.1109/TASSP.1987.1165284.
Welch, Peter D. (1969). "A fixed-point fast Fourier transform error analysis". IEEE Trans. Audio Electroacoustics 17 (2): 151–157. doi:10.1109/TAU.1969.1162035.
Winograd, S. (1978). "On computing the discrete Fourier transform". Math. Computation 32 (141): 175–199. doi:10.1090/S0025-5718-1978-0468306-4. JSTOR 2006266.
External links
Fast Fourier Algorithm (http://www.cs.pitt.edu/~kirk/cs1501/animations/FFT.html)
Fast Fourier Transforms (http://cnx.org/content/col10550/), Connexions online book edited by C. Sidney Burrus, with chapters by C. Sidney Burrus, Ivan Selesnick, Markus Pueschel, Matteo Frigo, and Steven G. Johnson (2008).
Links to FFT code and information online (http://www.fftw.org/links.html)
National Taiwan University FFT (http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html)
FFT programming in C++ – Cooley–Tukey algorithm (http://www.librow.com/articles/article-10)
Online documentation, links, book, and code (http://www.jjj.de/fxt/)
Using FFT to construct aggregate probability distributions (http://www.vosesoftware.com/ModelRiskHelp/index.htm#Aggregate_distributions/Aggregate_modeling_-_Fast_Fourier_Transform_FFT_method.htm)
Sri Welaratna, "Thirty years of FFT analyzers" (http://www.dataphysics.com/30_Years_of_FFT_Analyzers_by_Sri_Welaratna.pdf), Sound and Vibration (January 1997, 30th anniversary issue). A historical review of hardware FFT devices.
FFT Basics and Case Study Using Multi-Instrument (http://www.virtins.com/doc/D1002/FFT_Basics_and_Case_Study_using_Multi-Instrument_D1002.pdf)
FFT Textbook notes, PPTs, Videos (http://numericalmethods.eng.usf.edu/topics/fft.html) at Holistic Numerical Methods Institute.
ALGLIB FFT Code (http://www.alglib.net/fasttransforms/fft.php) – GPL-licensed multilanguage (VBA, C++, Pascal, etc.) numerical analysis and data processing library.
MIT's sFFT (http://groups.csail.mit.edu/netmit/sFFT/) – MIT Sparse FFT algorithm and implementation.
VB6 FFT (http://www.borgdesign.ro/fft.zip) – VB6 optimized library implementation with source code.
Cooley-Tukey FFT algorithm
The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size N = N_1 N_2 in terms of smaller DFTs of sizes N_1 and N_2, recursively, in order to reduce the computation time to O(N log N) for highly composite N (smooth numbers). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.
Because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT. For example, Rader's or Bluestein's algorithm can be used to handle large prime factors that cannot be decomposed by Cooley–Tukey, or the prime-factor algorithm can be exploited for greater efficiency in separating out relatively prime factors.
See also the fast Fourier transform for information on other FFT algorithms, specializations for real and/or
symmetric data, and accuracy in the face of finite floating-point precision.
History
This algorithm, including its recursive application, was invented around 1805 by Carl Friedrich Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno, but his work was not widely recognized (being published only posthumously and in neo-Latin).[1][2] Gauss did not analyze the asymptotic computational time, however. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after James Cooley of IBM and John Tukey of Princeton published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer.
Tukey reportedly came up with the idea during a meeting of a US presidential advisory committee discussing ways to detect nuclear-weapon tests in the Soviet Union.[3] Another participant at that meeting, Richard Garwin of IBM, recognized the potential of the method and put Tukey in touch with Cooley, who implemented it for a different (and less-classified) problem: analyzing 3d crystallographic data (see also: multidimensional FFTs). Cooley and Tukey subsequently published their joint paper, and wide adoption quickly followed.
The fact that Gauss had described the same algorithm (albeit without analyzing its asymptotic cost) was not realized until several years after Cooley and Tukey's 1965 paper. Their paper cited as inspiration only work by I. J. Good on what is now called the prime-factor FFT algorithm (PFA); although Good's algorithm was initially mistakenly thought to be equivalent to the Cooley–Tukey algorithm, it was quickly realized that PFA is a quite different algorithm (only working for sizes that have relatively prime factors and relying on the Chinese remainder theorem, unlike the support for any composite size in Cooley–Tukey).[4]
The radix-2 DIT case
A radix-2 decimation-in-time (DIT) FFT is the simplest and most common form of the Cooley–Tukey algorithm, although highly optimized Cooley–Tukey implementations typically use other forms of the algorithm as described below. Radix-2 DIT divides a DFT of size N into two interleaved DFTs (hence the name "radix-2") of size N/2 with each recursive stage.
The discrete Fourier transform (DFT) is defined by the formula:

X_k = Σ_{n=0}^{N−1} x_n e^{−2πi nk/N},

where k is an integer ranging from 0 to N − 1.

Radix-2 DIT first computes the DFTs of the even-indexed inputs (x_0, x_2, ..., x_{N−2}) and of the odd-indexed inputs (x_1, x_3, ..., x_{N−1}), and then combines those two results to produce the DFT of the whole sequence. This idea can then be performed recursively to reduce the overall runtime to O(N log N). This simplified form assumes that N is a power of two; since the number of sample points N can usually be chosen freely by the application, this is often not an important restriction.

The radix-2 DIT algorithm rearranges the DFT of the function x_n into two parts: a sum over the even-numbered indices n = 2m and a sum over the odd-numbered indices n = 2m + 1:

X_k = Σ_{m=0}^{N/2−1} x_{2m} e^{−2πi (2m)k/N} + Σ_{m=0}^{N/2−1} x_{2m+1} e^{−2πi (2m+1)k/N}.

One can factor a common multiplier e^{−2πi k/N} out of the second sum, as shown in the equation above. It is then clear that the two sums are the DFT of the even-indexed part x_{2m} and the DFT of the odd-indexed part x_{2m+1} of the function x_n. Denote the DFT of the even-indexed inputs by E_k and the DFT of the odd-indexed inputs by O_k and we obtain:

X_k = E_k + e^{−2πi k/N} O_k.

Thanks to the periodicity of the DFT, we know that E_{k+N/2} = E_k and O_{k+N/2} = O_k. Therefore, we can rewrite the above equation as

X_{k+N/2} = E_k + e^{−2πi (k+N/2)/N} O_k.

We also know that the twiddle factor obeys the following relation:

e^{−2πi (k+N/2)/N} = −e^{−2πi k/N}.

This allows us to cut the number of "twiddle factor" calculations in half also. For 0 ≤ k < N/2, we have

X_k = E_k + e^{−2πi k/N} O_k,
X_{k+N/2} = E_k − e^{−2πi k/N} O_k.

This result, expressing the DFT of length N recursively in terms of two DFTs of size N/2, is the core of the radix-2 DIT fast Fourier transform. The algorithm gains its speed by re-using the results of intermediate computations to compute multiple DFT outputs. Note that final outputs are obtained by a +/− combination of E_k and e^{−2πi k/N} O_k, which is simply a size-2 DFT (sometimes called a butterfly in this context); when this is generalized to larger radices below, the size-2 DFT is replaced by a larger DFT (which itself can be evaluated with an FFT).
Data flow diagram for N=8: a decimation-in-time radix-2 FFT breaks a length-N
DFT into two length-N/2 DFTs followed by a combining stage consisting of many
size-2 DFTs called "butterfly" operations (so-called because of the shape of the
data-flow diagrams).
This process is an example of the general technique of divide and conquer algorithms; in many traditional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in breadth-first fashion.
The above re-expression of a size-N DFT as two size-N/2 DFTs is sometimes called the Danielson–Lanczos lemma, since the identity was noted by those two authors in 1942[5] (influenced by Runge's 1903 work). They applied their lemma in a "backwards" recursive fashion, repeatedly doubling the DFT size until the transform spectrum converged (although they apparently didn't realize the linearithmic [i.e., order N log N] asymptotic complexity they had achieved). The Danielson–Lanczos work predated widespread availability of computers and required hand calculation (possibly with mechanical aids such as adding machines); they reported a computation time of 140 minutes for a size-64 DFT operating on real inputs to 3–5 significant digits. Cooley and Tukey's 1965 paper reported a running time of 0.02 minutes for a size-2048 complex DFT on an IBM 7094 (probably in 36-bit single precision, ~8 digits). Rescaling the time by the number of operations, this corresponds roughly to a speedup factor of around 800,000. (To put the time for the hand calculation in perspective, 140 minutes for size 64 corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications.)
Pseudocode
In pseudocode, the above procedure could be written:

X_{0,...,N−1} ← ditfft2(x, N, s):                 DFT of (x_0, x_s, x_{2s}, ..., x_{(N−1)s}):
    if N = 1 then
        X_0 ← x_0                                 trivial size-1 DFT base case
    else
        X_{0,...,N/2−1} ← ditfft2(x, N/2, 2s)     DFT of (x_0, x_{2s}, x_{4s}, ...)
        X_{N/2,...,N−1} ← ditfft2(x+s, N/2, 2s)   DFT of (x_s, x_{s+2s}, x_{s+4s}, ...)
        for k = 0 to N/2−1                        combine DFTs of two halves into full DFT:
            t ← X_k
            X_k ← t + exp(−2πi k/N) X_{k+N/2}
            X_{k+N/2} ← t − exp(−2πi k/N) X_{k+N/2}
        end for
    end if

Here, ditfft2(x, N, 1) computes X = DFT(x) out-of-place by a radix-2 DIT FFT, where N is an integer power of 2 and s = 1 is the stride of the input array x. x + s denotes the array starting with x_s.
(The results are in the correct order in X and no further bit-reversal permutation is required; the often-mentioned
necessity of a separate bit-reversal stage only arises for certain in-place algorithms, as described below.)
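For readers who prefer a runnable version, the following is a direct Python transcription of the pseudocode above, checked against numpy's FFT (the use of numpy and the test size are illustrative choices, not part of the original):

import numpy as np

def ditfft2(x, N, s):
    # Radix-2 decimation-in-time FFT of x[0], x[s], x[2s], ..., x[(N-1)s],
    # out-of-place, for N a power of two (mirrors the pseudocode above).
    if N == 1:
        return [x[0]]                              # trivial size-1 DFT base case
    E = ditfft2(x, N // 2, 2 * s)                  # DFT of x[0], x[2s], x[4s], ...
    O = ditfft2(x[s:], N // 2, 2 * s)              # DFT of x[s], x[3s], x[5s], ...
    X = [0] * N
    for k in range(N // 2):
        t = np.exp(-2j * np.pi * k / N) * O[k]     # twiddle factor times odd-half DFT
        X[k] = E[k] + t                            # combine the two half-size DFTs
        X[k + N // 2] = E[k] - t
    return X

x = np.random.rand(8) + 1j * np.random.rand(8)
assert np.allclose(ditfft2(x, 8, 1), np.fft.fft(x))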
High-performance FFT implementations make many modifications to the implementation of such an algorithm compared to this simple pseudocode. For example, one can use a larger base case than N = 1 to amortize the overhead of recursion, the twiddle factors can be precomputed, and larger radices are often used for cache reasons; these and other optimizations together can improve the performance by an order of magnitude or more. (In many textbook implementations the depth-first recursion is eliminated entirely in favor of a nonrecursive breadth-first approach, although depth-first recursion has been argued to have better memory locality.) Several of these ideas are described in further detail below.
General factorizations
The basic step of the Cooley–Tukey FFT for general factorizations can be viewed as re-interpreting a 1d DFT as something like a 2d DFT. The 1d input array of length N = N_1 N_2 is reinterpreted as a 2d N_1 × N_2 matrix stored in column-major order. One performs smaller 1d DFTs along the N_2 direction (the non-contiguous direction), then multiplies by phase factors (twiddle factors), and finally performs 1d DFTs along the N_1 direction. The transposition step can be performed in the middle, as shown here, or at the beginning or end. This is done recursively for the smaller transforms.
More generally, Cooley–Tukey algorithms recursively re-express a DFT of a composite size N = N_1 N_2 as:[6]
1. Perform N_1 DFTs of size N_2.
2. Multiply by complex roots of unity called twiddle factors.
3. Perform N_2 DFTs of size N_1.
Typically, either N_1 or N_2 is a small factor (not necessarily prime), called the radix (which can differ between stages of the recursion). If N_1 is the radix, it is called a decimation in time (DIT) algorithm, whereas if N_2 is the radix, it is decimation in frequency (DIF, also called the Sande–Tukey algorithm). The version presented above was a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/− combination (butterfly) of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a butterfly, so-called because of the shape of the dataflow diagram for the radix-2 case.)
There are many other variations on the Cooley–Tukey algorithm. Mixed-radix implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(N^2) algorithm for the prime base cases of the recursion (it is also possible to employ an N log N algorithm for the prime base cases, such as Rader's or Bluestein's algorithm). Split radix merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve what was long the lowest known arithmetic operation count for power-of-two sizes, although recent variations achieve an even lower count.[7][8] (On present-day computers, performance is determined more by cache and CPU pipeline considerations than by strict operation counts; well-optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size.) Another way of looking at the Cooley–Tukey algorithm is that it re-expresses a size-N one-dimensional DFT as an N_1 by N_2 two-dimensional DFT (plus twiddles), where the output matrix is transposed. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices. If, instead of using a small radix, one employs a radix of roughly √N and explicit input/output matrix transpositions, it is called a four-step algorithm (or six-step, depending on the number of transpositions), initially proposed to improve memory locality,[9][10] e.g. for cache optimization or out-of-core operation, and was later shown to be an optimal cache-oblivious algorithm.[11]
The general Cooley–Tukey factorization rewrites the indices k and n as k = N_2 k_1 + k_2 and n = N_1 n_2 + n_1, respectively, where the indices k_a and n_a run from 0 .. N_a − 1 (for a of 1 or 2). That is, it re-indexes the input (n) and output (k) as N_1 by N_2 two-dimensional arrays in column-major and row-major order, respectively; the difference between these indexings is a transposition, as mentioned above. When this re-indexing is substituted into the DFT formula for nk, the N n_2 k_1 cross term vanishes (its exponential is unity), and the remaining terms give

X_{N_2 k_1 + k_2} = Σ_{n_1=0}^{N_1−1} ( Σ_{n_2=0}^{N_2−1} x_{N_1 n_2 + n_1} e^{−2πi n_2 k_2 / N_2} ) [ e^{−2πi n_1 k_2 / N} ] e^{−2πi n_1 k_1 / N_1},

where each inner sum is a DFT of size N_2, each outer sum is a DFT of size N_1, and the [...] bracketed term is the twiddle factor.
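This factorization can be demonstrated concretely with a reshape-based sketch in numpy (the function name, the choice N = 3 × 5, and the use of np.fft.fft for the sub-transforms are illustrative assumptions):

import numpy as np

def ct_fft(x, N1, N2):
    # One Cooley-Tukey step for N = N1*N2, following the three steps above:
    # size-N2 DFTs, twiddle factors, size-N1 DFTs, then the output transposition.
    N = N1 * N2
    A = x.reshape(N2, N1)                    # A[n2, n1] = x[N1*n2 + n1]
    step1 = np.fft.fft(A, axis=0)            # inner sums: size-N2 DFTs over n2 -> k2
    k2 = np.arange(N2)[:, None]
    n1 = np.arange(N1)
    step2 = step1 * np.exp(-2j * np.pi * k2 * n1 / N)   # twiddle factors
    step3 = np.fft.fft(step2, axis=1)        # outer sums: size-N1 DFTs over n1 -> k1
    return step3.T.reshape(N)                # output index k = N2*k1 + k2 (transpose)

x = np.random.rand(15) + 1j * np.random.rand(15)
assert np.allclose(ct_fft(x, 3, 5), np.fft.fft(x))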
An arbitrary radix r (as well as mixed radices) can be employed, as was shown by both Cooley and Tukey as well as Gauss (who gave examples of radix-3 and radix-6 steps). Cooley and Tukey originally assumed that the radix butterfly required O(r^2) work and hence reckoned the complexity for a radix r to be O(r^2 · N/r · log_r N) = O(N log₂(N) · r/log₂r); from calculation of values of r/log₂r for integer values of r from 2 to 12 the optimal radix is found to be 3 (the closest integer to e, which minimizes r/log₂r).[12] This analysis was erroneous, however: the radix-butterfly is also a DFT and can be performed via an FFT algorithm in O(r log r) operations, hence the radix r actually cancels in the complexity O(r log(r) · N/r · log_r N), and the optimal r is determined by more complicated considerations. In practice, quite large r (32 or 64) are important in order to effectively exploit e.g. the large number of processor registers on modern processors, and even an unbounded radix r = √N also achieves O(N log N) complexity and has theoretical and practical advantages for large N as mentioned above.
Data reordering, bit reversal, and in-place algorithms
Although the abstract Cooley–Tukey factorization of the DFT, above, applies in some form to all implementations of the algorithm, much greater diversity exists in the techniques for ordering and accessing the data at each stage of the FFT. Of special interest is the problem of devising an in-place algorithm that overwrites its input with its output data using only O(1) auxiliary storage.
The most well-known reordering technique involves explicit bit reversal for in-place radix-2 algorithms. Bit reversal is the permutation where the data at an index n, written in binary with digits b_4 b_3 b_2 b_1 b_0 (e.g. 5 digits for N = 32 inputs), is transferred to the index with reversed digits b_0 b_1 b_2 b_3 b_4. Consider the last stage of a radix-2 DIT algorithm like the one presented above, where the output is written in-place over the input: when E_k and O_k are combined with a size-2 DFT, those two values are overwritten by the outputs. However, the two output values should go in the first and second halves of the output array, corresponding to the most significant bit b_4 (for N = 32); whereas the two inputs E_k and O_k are interleaved in the even and odd elements, corresponding to the least significant bit b_0. Thus, in order to get the output in the correct place, these two bits must be swapped. If you include all of the recursive stages of a radix-2 DIT algorithm, all the bits must be swapped and thus one must pre-process the input (or post-process the output) with a bit reversal to get in-order output. (If each size-N/2 subtransform is to operate on contiguous data, the DIT input is pre-processed by bit-reversal.) Correspondingly, if you perform all of the steps in reverse order, you obtain a radix-2 DIF algorithm with bit reversal in post-processing (or pre-processing, respectively). Alternatively, some applications (such as convolution) work equally well on bit-reversed data, so one can perform forward transforms, processing, and then inverse transforms all without bit reversal to produce final results in the natural order.
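A small Python sketch of the bit-reversal permutation itself (the function name is illustrative; real implementations use faster, incremental methods):

import numpy as np

def bit_reverse_permutation(N):
    # Indices of 0..N-1 in bit-reversed order for a power-of-two N; applying
    # this permutation to the input of an in-place radix-2 DIT FFT yields
    # natural-order output.
    bits = N.bit_length() - 1
    perm = np.empty(N, dtype=int)
    for n in range(N):
        rev = 0
        for b in range(bits):
            rev = (rev << 1) | ((n >> b) & 1)   # reverse the bit pattern of n
        perm[n] = rev
    return perm

print(bit_reverse_permutation(8))   # [0 4 2 6 1 5 3 7]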
Many FFT users, however, prefer natural-order outputs, and a separate, explicit bit-reversal stage can have a non-negligible impact on the computation time, even though bit reversal can be done in O(N) time and has been the subject of much research. Also, while the permutation is a bit reversal in the radix-2 case, it is more generally an arbitrary (mixed-base) digit reversal for the mixed-radix case, and the permutation algorithms become more complicated to implement. Moreover, it is desirable on many hardware architectures to re-order intermediate stages of the FFT algorithm so that they operate on consecutive (or at least more localized) data elements. To these ends, a number of alternative implementation schemes have been devised for the Cooley–Tukey algorithm that do not require separate bit reversal and/or involve additional permutations at intermediate stages.
The problem is greatly simplified if it is out-of-place: the output array is distinct from the input array or, equivalently, an equal-size auxiliary array is available. The Stockham auto-sort algorithm[13] performs every stage of the FFT out-of-place, typically writing back and forth between two arrays, transposing one "digit" of the indices with each stage, and has been especially popular on SIMD architectures. Even greater potential SIMD advantages (more consecutive accesses) have been proposed for the Pease algorithm, which also reorders out-of-place with each stage, but this method requires separate bit/digit reversal and O(N log N) storage. One can also directly apply the Cooley–Tukey factorization definition with explicit (depth-first) recursion and small radices, which produces natural-order out-of-place output with no separate permutation step (as in the pseudocode above) and can be argued to have cache-oblivious locality benefits on systems with hierarchical memory.[14]
A typical strategy for in-place algorithms without auxiliary storage and without separate digit-reversal passes
involves small matrix transpositions (which swap individual pairs of digits) at intermediate stages, which can be
combined with the radix butterflies to reduce the number of passes over the data.
References
[1] Gauss, Carl Friedrich, "Theoria interpolationis methodo nova tractata" (http://lseet.univ-tln.fr/~iaroslav/Gauss_Theoria_interpolationis_methodo_nova_tractata.php), Werke, Band 3, 265–327 (Königliche Gesellschaft der Wissenschaften, Göttingen, 1866)
[2] Heideman, M. T., D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform" (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1162257), IEEE ASSP Magazine, 1, (4), 14–21 (1984)
[3] Rockmore, Daniel N., Comput. Sci. Eng. 2 (1), 60 (2000). The FFT: an algorithm the whole family can use (http://www.cs.dartmouth.edu/~rockmore/cse-fft.pdf). Special issue on "top ten algorithms of the century" (http://amath.colorado.edu/resources/archive/topten.pdf)
[4] James W. Cooley, Peter A. W. Lewis, and Peter W. Welch, "Historical notes on the fast Fourier transform," Proc. IEEE, vol. 55 (no. 10), p. 1675–1677 (1967).
[5] Danielson, G. C., and C. Lanczos, "Some improvements in practical Fourier analysis and their application to X-ray scattering from liquids," J. Franklin Inst. 233, 365–380 and 435–452 (1942).
[6] Duhamel, P., and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing 19, 259–299 (1990)
[7] Lundy, T., and J. Van Buskirk, "A new matrix approach to real FFTs and convolutions of length 2^k," Computing 80, 23–45 (2007).
[8] Johnson, S. G., and M. Frigo, "A modified split-radix FFT with fewer arithmetic operations" (http://www.fftw.org/newsplit.pdf), IEEE Trans. Signal Processing 55 (1), 111–119 (2007).
[9] Gentleman W. M., and G. Sande, "Fast Fourier transforms – for fun and profit," Proc. AFIPS 29, 563–578 (1966).
[10] Bailey, David H., "FFTs in external or hierarchical memory," J. Supercomputing 4 (1), 23–35 (1990)
[11] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS 99), p. 285–297. 1999. Extended abstract at IEEE (http://ieeexplore.ieee.org/iel5/6604/17631/00814600.pdf?arnumber=814600), at Citeseer (http://citeseer.ist.psu.edu/307799.html).
[12] Cooley, J. W., P. Lewis and P. Welch, "The Fast Fourier Transform and its Applications", IEEE Trans on Education 12, 1, 28–34 (1969)
[13] Originally attributed to Stockham in W. T. Cochran et al., What is the fast Fourier transform? (http://dx.doi.org/10.1109/PROC.1967.5957), Proc. IEEE vol. 55, 1664–1674 (1967).
[14] A free (GPL) C library for computing discrete Fourier transforms in one or more dimensions, of arbitrary size, using the Cooley–Tukey algorithm
External links
a simple, pedagogical radix-2 Cooley–Tukey FFT algorithm in C++ (http://www.librow.com/articles/article-10)
KISSFFT (http://sourceforge.net/projects/kissfft/): a simple mixed-radix Cooley–Tukey implementation in C (open source)
Butterfly diagram
This article is about butterfly diagrams in FFT algorithms; for the sunspot diagrams of the same name, see
Solar cycle.
Data flow diagram connecting the inputs x (left) to the outputs y that depend on them (right) for a "butterfly" step of a radix-2 Cooley–Tukey FFT. This diagram resembles a butterfly (as in the Morpho butterfly shown for comparison), hence the name.
In the context of fast Fourier transform algorithms, a butterfly is a portion of the computation that combines the results of smaller discrete Fourier transforms (DFTs) into a larger DFT, or vice versa (breaking a larger DFT up into subtransforms). The name "butterfly" comes from the shape of the data-flow diagram in the radix-2 case, as described below.[1] The same structure can also be found in the Viterbi algorithm, used for finding the most likely sequence of hidden states.
Most commonly, the term "butterfly" appears in the context of the Cooley–Tukey FFT algorithm, which recursively breaks down a DFT of composite size n = rm into r smaller transforms of size m where r is the "radix" of the transform. These smaller DFTs are then combined via size-r butterflies, which themselves are DFTs of size r (performed m times on corresponding outputs of the sub-transforms) pre-multiplied by roots of unity (known as twiddle factors). (This is the "decimation in time" case; one can also perform the steps in reverse, known as "decimation in frequency", where the butterflies come first and are post-multiplied by twiddle factors. See also the Cooley–Tukey FFT article.)
Radix-2 butterfly diagram
In the case of the radix-2 Cooley–Tukey algorithm, the butterfly is simply a DFT of size 2 that takes two inputs (x_0, x_1) (corresponding outputs of the two sub-transforms) and gives two outputs (y_0, y_1) by the formula (not including twiddle factors):

y_0 = x_0 + x_1
y_1 = x_0 − x_1

If one draws the data-flow diagram for this pair of operations, the (x_0, x_1) to (y_0, y_1) lines cross and resemble the wings of a butterfly, hence the name (see also the illustration at right).
A decimation-in-time radix-2 FFT breaks a length-N DFT into two length-N/2
DFTs followed by a combining stage consisting of many butterfly operations.
More specifically, a decimation-in-time FFT algorithm on n = 2^p inputs with respect to a primitive n-th root of unity ω = e^{−2πi/n} relies on O(n log n) butterflies of the form:

y_0 = x_0 + ω^k x_1
y_1 = x_0 − ω^k x_1

where k is an integer depending on the part of the transform being computed. Whereas the corresponding inverse transform can mathematically be performed by replacing ω with ω^{−1} (and possibly multiplying by an overall scale factor, depending on the normalization convention), one may also directly invert the butterflies:

x_0 = (y_0 + y_1)/2
x_1 = ω^{−k} (y_0 − y_1)/2

corresponding to a decimation-in-frequency FFT algorithm.
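These forward and inverse butterflies can be checked numerically with a few lines of Python (the function names and sample values are illustrative):

import numpy as np

# Forward and inverse radix-2 butterflies as given above, with twiddle
# factor w = omega^k = exp(-2*pi*i*k/n); the values below are arbitrary.
def butterfly(x0, x1, w):
    return x0 + w * x1, x0 - w * x1

def inverse_butterfly(y0, y1, w):
    return (y0 + y1) / 2, (y0 - y1) / (2 * w)   # x1 = omega^{-k} (y0 - y1) / 2

n, k = 8, 3
w = np.exp(-2j * np.pi * k / n)
x0, x1 = 0.7 + 0.2j, -1.1 + 0.5j
y0, y1 = butterfly(x0, x1, w)
assert np.allclose(inverse_butterfly(y0, y1, w), (x0, x1))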
Other uses
The butterfly can also be used to improve the randomness of large arrays of partially random numbers, by bringing
every 32 or 64 bit word into causal contact with every other word through a desired hashing algorithm, so that a
change in any one bit has the possibility of changing all the bits in the large array.
References
[1] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck, Discrete-Time Signal Processing, 2nd edition (Upper Saddle River, NJ: Prentice
Hall, 1989)
External links
explanation of the FFT and butterfly diagrams (http:/ / www. relisoft. com/ Science/ Physics/ fft. html).
butterfly diagrams of various FFT implementations (Radix-2, Radix-4, Split-Radix) (http:/ / www. cmlab. csie.
ntu. edu. tw/ cml/ dsp/ training/ coding/ transform/ fft. html).
Codec
This article is about encoding and decoding a digital data stream. For other uses, see Codec (disambiguation).
Further information: List of codecs and Video codecs
A codec is a device or computer program capable of encoding or decoding a digital data stream or signal. The word codec is a portmanteau of "coder-decoder" or, less commonly, "compressor-decompressor". A codec (the program) should not be confused with a coding or compression format or standard: a format is a document (the standard), a way of storing data, while a codec is a program (an implementation) which can read or write such files. In practice, however, "codec" is sometimes used loosely to refer to formats.
A codec encodes a data stream or signal for transmission, storage or encryption, or decodes it for playback or
editing. Codecs are used in videoconferencing, streaming media and video editing applications. A video camera's
analog-to-digital converter (ADC) converts its analog signals into digital signals, which are then passed through a
video compressor for digital transmission or storage. A receiving device then runs the signal through a video
decompressor, then a digital-to-analog converter (DAC) for analog display. The term codec is also used as a generic
name for a videoconferencing unit.
Related concepts
An endec (encoder/decoder) is a similar yet different concept mainly used for hardware. In the mid 20th century, a
"codec" was hardware that coded analog signals into pulse-code modulation (PCM) and decoded them back. Late in
the century the name came to be applied to a class of software for converting among digital signal formats, and
including compander functions.
A modem is a contraction of modulator/demodulator (although they were referred to as "datasets" by telcos) and
converts digital data from computers to analog for phone line transmission. On the receiving end the analog is
converted back to digital. Codecs do the opposite (convert audio analog to digital and then computer digital sound
back to audio).
An audio codec converts analog audio signals into digital signals for transmission or storage. A receiving device then
converts the digital signals back to analog using an audio decompressor, for playback. An example of this is the
codecs used in the sound cards of personal computers. A video codec accomplishes the same task for video signals.
Compression quality
Lossy codecs: Many of the more popular codecs in the software world are lossy, meaning that they reduce quality
by some amount in order to achieve compression. Often, this type of compression is virtually indistinguishable
from the original uncompressed sound or images, depending on the codec and the settings used. Smaller data sets
ease the strain on relatively expensive storage sub-systems such as non-volatile memory and hard disk, as well as
write-once-read-many formats such as CD-ROM, DVD and Blu-ray Disc. Lower data rates also reduce cost and
improve performance when the data is transmitted.
Lossless codecs: There are also many lossless codecs which are typically used for archiving data in a compressed
form while retaining all of the information present in the original stream. If preserving the original quality of the
stream is more important than eliminating the correspondingly larger data sizes, lossless codecs are preferred.
This is especially true if the data is to undergo further processing (for example editing) in which case the repeated
application of processing (encoding and decoding) on lossy codecs will degrade the quality of the resulting data
such that it is no longer identifiable (visually, audibly or both). Using more than one codec or encoding scheme
successively can also degrade quality significantly. The decreasing cost of storage capacity and network
bandwidth has a tendency to reduce the need for lossy codecs for some media.
Media codecs
Two principal techniques are used in codecs, pulse-code modulation and delta modulation. Codecs are often
designed to emphasize certain aspects of the media to be encoded. For example, a digital video (using a DV codec)
of a sports event needs to encode motion well but not necessarily exact colors, while a video of an art exhibit needs
to encode color and surface texture well.
Audio codecs for cell phones need to have very low latency between source encoding and playback. In contrast,
audio codecs for recording or broadcast can use high-latency audio compression techniques to achieve higher fidelity
at a lower bit-rate.
There are thousands of audio and video codecs, ranging in cost from free to hundreds of dollars or more. This variety
of codecs can create compatibility and obsolescence issues. The impact is lessened for older formats, for which free
or nearly-free codecs have existed for a long time. The older formats are often ill-suited to modern applications,
however, such as playback in small portable devices. For example, raw uncompressed PCM audio (44.1kHz, 16 bit
stereo, as represented on an audio CD or in a .wav or .aiff file) has long been a standard across multiple platforms,
but its transmission over networks is slow and expensive compared with more modern compressed formats, such as
MP3.
Many multimedia data streams contain both audio and video, and often some metadata that permit synchronization
of audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but
for the multimedia data streams to be useful in stored or transmitted form, they must be encapsulated together in a
container format.
Lower bitrate codecs allow more users, but they also have more distortion. Beyond the initial increase in distortion,
lower bit rate codecs also achieve their lower bit rates by using more complex algorithms that make certain
assumptions, such as those about the media and the packet loss rate. Other codecs may not make those same
assumptions. When a user with a low bitrate codec talks to a user with another codec, additional distortion is
introduced by each transcoding.
AVI is sometimes erroneously described as a codec, but AVI is actually a container format, while a codec is a
software or hardware tool that encodes or decodes audio or video into or from some audio or video format. Audio
and video encoded with many codecs might be put into an AVI container, although AVI is not an ISO standard.
There are also other well-known container formats, such as Ogg, ASF, QuickTime, RealMedia, Matroska, and DivX
Media Format. Some container formats which are ISO standards are MPEG transport stream, MPEG program
stream, MP4 and ISO base media file format.
FFTW
FFTW
Developer(s): Matteo Frigo and Steven G. Johnson
Initial release: 24 March 1997
Stable release: 3.3.4 / 16 March 2014
Written in: C, OCaml
Type: Numerical software
License: GPL, commercial
Website: www.fftw.org [1]
The Fastest Fourier Transform in the West (FFTW) is a software library for computing discrete Fourier transforms (DFTs) developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology.
FFTW is known as the fastest free software implementation of the fast Fourier transform (FFT) algorithm (upheld by regular benchmarks[2]). It can compute transforms of real and complex-valued arrays of arbitrary size and dimension in O(n log n) time.
It does this by supporting a variety of algorithms and choosing the one (a particular decomposition of the transform into smaller transforms) it estimates or measures to be preferable in the particular circumstances. It works best on arrays of sizes with small prime factors, with powers of two being optimal and large primes being worst case (but still O(n log n)). To decompose transforms of composite sizes into smaller transforms, it chooses among several variants of the Cooley–Tukey FFT algorithm (corresponding to different factorizations and/or different memory-access patterns), while for prime sizes it uses either Rader's or Bluestein's FFT algorithm. Once the transform has been broken up into subtransforms of sufficiently small sizes, FFTW uses hard-coded unrolled FFTs for these small sizes that were produced (at compile time, not at run time) by code generation; these routines use a variety of algorithms including Cooley–Tukey variants, Rader's algorithm, and prime-factor FFT algorithms.
For a sufficiently large number of repeated transforms it is advantageous to measure the performance of some or all
of the supported algorithms on the given array size and platform. These measurements, which the authors refer to as
"wisdom", can be stored in a file or string for later use.
FFTW has a "guru interface" that intends "to expose as much as possible of the flexibility in the underlying FFTW
architecture". This allows, among other things, multi-dimensional transforms and multiple transforms in a single call
(e.g., where the data is interleaved in memory).
FFTW has limited support for out-of-order transforms (using the MPI version). The data reordering incurs an
overhead, which for in-place transforms of arbitrary size and dimension is non-trivial to avoid. It is undocumented
for which transforms this overhead is significant.
FFTW is licensed under the GNU General Public License. It is also licensed commercially by MIT and is used in the commercial MATLAB[3] matrix package for calculating FFTs. FFTW is written in the C language, but Fortran and Ada interfaces exist, as well as interfaces for a few other languages. While the library itself is C, the code is actually generated from a program called 'genfft', which is written in OCaml.[4]
In 1999, FFTW won the J. H. Wilkinson Prize for Numerical Software.
References
[1] http://www.fftw.org/
[2] Homepage, second paragraph (http://www.fftw.org/), and benchmarks page (http://www.fftw.org/benchfft/)
[3] Faster Finite Fourier Transforms: MATLAB 6 incorporates FFTW (http://www.mathworks.com/company/newsletters/articles/faster-finite-fourier-transforms-matlab.html)
[4] "FFTW FAQ" (http://www.fftw.org/faq/section2.html#languages)
External links
Official website (http://www.fftw.org/)
Wavelets
Wavelet
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to
zero. It can typically be visualized as a "brief oscillation" like one might see recorded by a seismograph or heart
monitor. Generally, wavelets are purposefully crafted to have specific properties that make them useful for signal
processing. Wavelets can be combined, using a "reverse, shift, multiply and integrate" technique called convolution,
with portions of a known signal to extract information from the unknown signal.
Seismic wavelet
For example, a wavelet could be created to have a frequency of Middle
C and a short duration of roughly a 32nd note. If this wavelet was to be
convolved with a signal created from the recording of a song, then the
resulting signal would be useful for determining when the Middle C
note was being played in the song. Mathematically, the wavelet will
correlate with the signal if the unknown signal contains information of
similar frequency. This concept of correlation is at the core of many
practical applications of wavelet theory.
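To make the "reverse, shift, multiply and integrate" idea concrete, here is a minimal sketch in Java (an illustration added to this text, not part of the original article; the sampling rate, note duration and all names are assumptions chosen for the example). It builds a short windowed sine at Middle C, taken as 261.63 Hz, and slides it along a sampled signal; large response magnitudes indicate where the note occurs.

public class WaveletCorrelation {
    // Build a brief oscillation: a sine at freqHz under a Hann window,
    // lasting durationSec (roughly a 32nd note at a moderate tempo).
    static double[] makeWavelet(double freqHz, double durationSec, double sampleRate) {
        int n = (int) (durationSec * sampleRate);
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / Math.max(1, n - 1)));
            w[i] = hann * Math.sin(2 * Math.PI * freqHz * i / sampleRate);
        }
        return w;
    }

    // Reverse, shift, multiply and integrate: sliding the wavelet along
    // the signal and summing the products (a cross-correlation).
    static double[] correlate(double[] signal, double[] wavelet) {
        double[] out = new double[signal.length - wavelet.length + 1];
        for (int shift = 0; shift < out.length; shift++) {
            double sum = 0;
            for (int i = 0; i < wavelet.length; i++) {
                sum += signal[shift + i] * wavelet[i];
            }
            out[shift] = sum; // large |out[shift]| => Middle C near this offset
        }
        return out;
    }
}

Running correlate over a recording of a song yields a response whose peaks mark where energy near Middle C appears, which is the correlation idea described above.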
As a mathematical tool, wavelets can be used to extract information
from many different kinds of data, including but certainly not limited
to audio signals and images. Sets of wavelets are generally needed to
analyze data fully. A set of "complementary" wavelets will decompose
data without gaps or overlap so that the decomposition process is
mathematically reversible. Thus, sets of complementary wavelets are
useful in wavelet based compression/decompression algorithms where
it is desirable to recover the original information with minimal loss.
In formal terms, this representation is a wavelet series: the representation of a square-integrable function with respect to either a complete, orthonormal set of basis functions, or an overcomplete set (a frame) of the Hilbert space of square-integrable functions.
Name
The word wavelet has been used for decades in digital signal processing and exploration geophysics. The equivalent
French word ondelette meaning "small wave" was used by Morlet and Grossmann in the early 1980s.
Wavelet theory
Wavelet theory is applicable to several subjects. All wavelet transforms may be considered forms of time-frequency
representation for continuous-time (analog) signals and so are related to harmonic analysis. Almost all practically
useful discrete wavelet transforms use discrete-time filter banks; in wavelet nomenclature, these filter banks are specified by the wavelet and scaling coefficients. These filter banks may contain either finite impulse response (FIR) or infinite impulse response (IIR) filters. The wavelets forming a continuous wavelet transform (CWT) are subject to the uncertainty principle of Fourier analysis and, correspondingly, of sampling theory: given a signal with some event in it, one cannot assign simultaneously an exact time and an exact frequency (or scale) to that event. The product of the uncertainties of time and of frequency (scale) has a lower bound. Thus, in the scaleogram of a continuous wavelet transform of
this signal, such an event marks an entire region in the time-scale plane, instead of just one point. Also, discrete
wavelet bases may be considered in the context of other forms of the uncertainty principle.
Wavelet transforms are broadly divided into three classes: continuous, discrete and multiresolution-based.
Continuous wavelet transforms (continuous shift and scale parameters)
In continuous wavelet transforms, a given signal of finite energy is projected on a continuous family of frequency bands (or similar subspaces of the $L^p$ function space $L^2(\mathbb{R})$). For instance, the signal may be represented on every frequency band of the form $[f, 2f]$ for all positive frequencies $f > 0$. Then, the original signal can be reconstructed by a suitable integration over all the resulting frequency components.
The frequency bands or subspaces (sub-bands) are scaled versions of a subspace at scale 1. This subspace in turn is in most situations generated by the shifts of one generating function $\psi \in L^2(\mathbb{R})$, the mother wavelet. For the example of the scale-one frequency band $[1, 2]$ this function is
$$\psi(t) = 2\,\operatorname{sinc}(2t) - \operatorname{sinc}(t) = \frac{\sin(2\pi t) - \sin(\pi t)}{\pi t}$$
with the (normalized) sinc function. That, Meyer's, and two other examples of mother wavelets are:
[Figures: Meyer, Morlet and Mexican hat mother wavelets]
The subspace of scale $a$ or frequency band $[1/a,\, 2/a]$ is generated by the functions (sometimes called child wavelets)
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),$$
where $a$ is positive and defines the scale and $b$ is any real number and defines the shift. The pair $(a, b)$ defines a point in the right half-plane $\mathbb{R}_+ \times \mathbb{R}$.
The projection of a function $x$ onto the subspace of scale $a$ then has the form
$$x_a(t) = \int_{\mathbb{R}} WT_\psi\{x\}(a,b)\,\psi_{a,b}(t)\,db$$
with wavelet coefficients
$$WT_\psi\{x\}(a,b) = \langle x,\, \psi_{a,b} \rangle = \int_{\mathbb{R}} x(t)\,\overline{\psi_{a,b}(t)}\,dt.$$
See a list of some Continuous wavelets.
For the analysis of the signal x, one can assemble the wavelet coefficients into a scaleogram of the signal.
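Numerically, the coefficients above can be approximated by discretizing the integral. The following sketch (an added illustration; the Mexican hat mother wavelet and all names are assumptions, chosen because that wavelet is real-valued) evaluates $WT_\psi\{x\}(a,b)$ by a Riemann sum on a grid of scales and shifts, producing exactly the array one would plot as a scaleogram.

public class Scaleogram {
    // Mexican hat (Ricker) mother wavelet, a common real-valued choice.
    static double psi(double t) {
        return (1 - t * t) * Math.exp(-t * t / 2);
    }

    // Riemann-sum approximation of
    // WT{x}(a, b) = (1 / sqrt(a)) * integral of x(t) * psi((t - b) / a) dt,
    // for samples x[i] taken at t = i * dt.
    static double coefficient(double[] x, double dt, double a, double b) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * psi((i * dt - b) / a);
        }
        return sum * dt / Math.sqrt(a);
    }

    // One row per scale a, one column per shift b: the scaleogram grid.
    static double[][] scaleogram(double[] x, double dt, double[] scales, double[] shifts) {
        double[][] wt = new double[scales.length][shifts.length];
        for (int i = 0; i < scales.length; i++)
            for (int j = 0; j < shifts.length; j++)
                wt[i][j] = coefficient(x, dt, scales[i], shifts[j]);
        return wt;
    }
}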
Discrete wavelet transforms (discrete shift and scale parameters)
It is computationally impossible to analyze a signal using all wavelet coefficients, so one may wonder if it is
sufficient to pick a discrete subset of the upper halfplane to be able to reconstruct a signal from the corresponding
wavelet coefficients. One such system is the affine system for some real parameters $a > 1$, $b > 0$. The corresponding discrete subset of the half-plane consists of all the points $(a^m,\, n\,a^m b)$ with $m, n \in \mathbb{Z}$. The corresponding child wavelets are now given as
$$\psi_{m,n}(t) = a^{-m/2}\,\psi\!\left(a^{-m} t - n b\right).$$
A sufficient condition for the reconstruction of any signal $x$ of finite energy by the formula
$$x(t) = \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \langle x,\, \psi_{m,n} \rangle\, \psi_{m,n}(t)$$
is that the functions $\{\psi_{m,n} : m, n \in \mathbb{Z}\}$ form a tight frame of $L^2(\mathbb{R})$.
Multiresolution based discrete wavelet transforms
[Figure: D4 wavelet]
In any discretised wavelet transform, there are only a finite number of
wavelet coefficients for each bounded rectangular region in the upper
halfplane. Still, each coefficient requires the evaluation of an integral.
In special situations this numerical complexity can be avoided if the
scaled and shifted wavelets form a multiresolution analysis. This means that there has to exist an auxiliary function, the father wavelet $\phi \in L^2(\mathbb{R})$, and that $a$ is an integer. A typical choice is $a = 2$ and $b = 1$.
The most famous pair of father and mother wavelets is the Daubechies
4-tap wavelet. Note that not every orthonormal discrete wavelet basis
can be associated to a multiresolution analysis; for example, the Journé wavelet admits no multiresolution analysis.
From the mother and father wavelets one constructs the subspaces
$$V_m = \operatorname{span}(\phi_{m,n} : n \in \mathbb{Z}), \quad \text{where } \phi_{m,n}(t) = 2^{-m/2}\,\phi(2^{-m} t - n),$$
$$W_m = \operatorname{span}(\psi_{m,n} : n \in \mathbb{Z}), \quad \text{where } \psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m} t - n).$$
The mother wavelet keeps the time domain properties, while the father wavelet keeps the frequency domain properties.
From these it is required that the sequence
$$\{0\} \subset \cdots \subset V_1 \subset V_0 \subset V_{-1} \subset \cdots \subset L^2(\mathbb{R})$$
forms a multiresolution analysis of $L^2(\mathbb{R})$, and that the subspaces $\ldots, W_1, W_0, W_{-1}, \ldots$ are the orthogonal "differences" of the above sequence, that is, $W_m$ is the orthogonal complement of $V_m$ inside the subspace $V_{m-1}$,
$$V_m \oplus W_m = V_{m-1}.$$
In analogy to the sampling theorem one may conclude that the space $V_m$ with sampling distance $2^m$ more or less covers the frequency baseband from $0$ to $2^{-m-1}$. As orthogonal complement, $W_m$ roughly covers the band $[2^{-m-1},\, 2^{-m}]$.
From those inclusions and orthogonality relations, especially $V_m \oplus W_m = V_{m-1}$, follows the existence of sequences $h = \{h_n\}_{n \in \mathbb{Z}}$ and $g = \{g_n\}_{n \in \mathbb{Z}}$ that satisfy the identities
$$h_n = \langle \phi_{0,0},\, \phi_{-1,n} \rangle \quad \text{so that} \quad \phi(t) = \sqrt{2}\, \sum_{n \in \mathbb{Z}} h_n\, \phi(2t - n),$$
and
$$g_n = \langle \psi_{0,0},\, \phi_{-1,n} \rangle \quad \text{so that} \quad \psi(t) = \sqrt{2}\, \sum_{n \in \mathbb{Z}} g_n\, \phi(2t - n).$$
The second identity of the first pair is a refinement equation for the father wavelet $\phi$. Both pairs of identities form the basis for the algorithm of the fast wavelet transform.
From the multiresolution analysis derives the orthogonal decomposition of the space $L^2(\mathbb{R})$ as
$$L^2(\mathbb{R}) = V_{j_0} \oplus W_{j_0} \oplus W_{j_0 - 1} \oplus W_{j_0 - 2} \oplus \cdots$$
For any signal or function $S \in L^2(\mathbb{R})$ this gives a representation in basis functions of the corresponding subspaces as
$$S = \sum_{k} c_{j_0,k}\, \phi_{j_0,k} + \sum_{j \le j_0} \sum_{k} d_{j,k}\, \psi_{j,k},$$
where the coefficients are
$$c_{j_0,k} = \langle S,\, \phi_{j_0,k} \rangle \quad \text{and} \quad d_{j,k} = \langle S,\, \psi_{j,k} \rangle.$$
Mother wavelet
For practical applications, and for efficiency reasons, one prefers continuously differentiable functions with compact support as mother (prototype) wavelet (functions). However, to satisfy analytical requirements (in the continuous WT) and in general for theoretical reasons, one chooses the wavelet functions from a subspace of the space $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$. This is the space of measurable functions that are absolutely and square integrable:
$$\int_{-\infty}^{\infty} |\psi(t)|\,dt < \infty \qquad \text{and} \qquad \int_{-\infty}^{\infty} |\psi(t)|^2\,dt < \infty.$$
Being in this space ensures that one can formulate the conditions of zero mean and square norm one:
$$\int_{-\infty}^{\infty} \psi(t)\,dt = 0$$
is the condition for zero mean, and
$$\int_{-\infty}^{\infty} |\psi(t)|^2\,dt = 1$$
is the condition for square norm one.
For $\psi$ to be a wavelet for the continuous wavelet transform (see there for the exact statement), the mother wavelet must satisfy an admissibility criterion (loosely speaking, a kind of half-differentiability) in order to get a stably invertible transform.
For the discrete wavelet transform, one needs at least the condition that the wavelet series is a representation of the identity in the space $L^2(\mathbb{R})$. Most constructions of discrete WT make use of the multiresolution analysis, which defines the wavelet by a scaling function. This scaling function itself is a solution to a functional equation.
In most situations it is useful to restrict $\psi$ to be a continuous function with a higher number $M$ of vanishing moments, i.e. for all integers $m < M$,
$$\int_{-\infty}^{\infty} t^m\,\psi(t)\,dt = 0.$$
The mother wavelet is scaled (or dilated) by a factor of $a$ and translated (or shifted) by a factor of $b$ to give (under Morlet's original formulation)
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right).$$
For the continuous WT, the pair $(a,b)$ varies over the full half-plane $\mathbb{R}_+ \times \mathbb{R}$; for the discrete WT this pair varies over a discrete subset of it, which is also called the affine group.
These functions are often incorrectly referred to as the basis functions of the (continuous) transform. In fact, as in the
continuous Fourier transform, there is no basis in the continuous wavelet transform. Time-frequency interpretation
uses a subtly different formulation (after Delprat).
Restriction
(1) when $a_1 = a$ and $b_1 = b$,
(2) $\psi$ has a finite time interval
Comparisons with Fourier transform (continuous-time)
The wavelet transform is often compared with the Fourier transform, in which signals are represented as a sum of sinusoids. In fact, the Fourier transform can be viewed as a special case of the continuous wavelet transform with the choice of the mother wavelet $\psi(t) = e^{-2\pi i t}$. The main difference in general is that wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. The Short-time Fourier
transform (STFT) is similar to the wavelet transform, in that it is also time and frequency localized, but there are
issues with the frequency/time resolution trade-off.
In particular, assuming a rectangular window region, one may think of the STFT as a transform with a slightly different kernel
$$\psi(t) = g(t - u)\,e^{-2\pi i t},$$
where $g(t - u)$ can often be written as $\operatorname{rect}\!\left(\frac{t-u}{\Delta_t}\right)$, where $\Delta_t$ and $u$ respectively denote the length and temporal offset of the windowing function. Using Parseval's theorem, one may define the wavelet's energy as
$$E = \int_{-\infty}^{\infty} |\psi(t)|^2\,dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\hat{\psi}(\omega)|^2\,d\omega.$$
From this, the square of the temporal support of the window offset by time $u$ is given by
$$\sigma_u^2 = \frac{1}{E} \int (t-u)^2\, |\psi(t)|^2\,dt,$$
and the square of the spectral support of the window acting on a frequency $\xi$ by
$$\hat{\sigma}_\xi^2 = \frac{1}{2\pi E} \int (\omega - \xi)^2\, |\hat{\psi}(\omega)|^2\,d\omega.$$
As stated by the Heisenberg uncertainty principle, the product of the temporal and spectral supports satisfies
$$\sigma_u^2\,\hat{\sigma}_\xi^2 \ge \frac{1}{4}$$
for any given time-frequency atom, or resolution cell. The STFT windows restrict the resolution cells to spectral and temporal supports determined by $\Delta_t$. Multiplication with a rectangular window in the time domain corresponds to convolution with a $\operatorname{sinc}(\Delta_t \omega)$ function in the frequency domain, resulting in spurious ringing artifacts for short/localized temporal windows. With the continuous-time Fourier transform, $\Delta_t \to \infty$, and this convolution is with a delta function in Fourier space, resulting in the true Fourier transform of the signal $x(t)$. The window function may be some other apodizing filter, such as a Gaussian. The choice of windowing function will affect the approximation error relative to the true Fourier transform.
A given resolution cell's time-bandwidth product may not be exceeded with the STFT. All STFT basis elements maintain a uniform spectral and temporal support for all temporal shifts or offsets, thereby attaining an equal resolution in time for lower and higher frequencies. The resolution is purely determined by the sampling width.
In contrast, the wavelet transform's multiresolution properties enable large temporal supports for lower frequencies while maintaining short temporal widths for higher frequencies by the scaling properties of the wavelet transform. This property extends conventional time-frequency analysis into time-scale analysis.[1]
[Figure: STFT time-frequency atoms (left) and DWT time-scale atoms (right). The time-frequency atoms are four different basis functions used for the STFT (i.e. four separate Fourier transforms required). The time-scale atoms of the DWT achieve small temporal widths for high frequencies and good temporal widths for low frequencies with a single transform basis set.]
The discrete wavelet transform is less computationally complex, taking $O(N)$ time as compared to $O(N \log N)$ for the fast Fourier transform. This computational advantage is not inherent to the transform, but reflects the choice of a logarithmic division of frequency, in contrast to the equally spaced frequency divisions of the FFT (fast Fourier transform), which uses the same basis functions as the DFT (discrete Fourier transform).[2] It is also important to note that this complexity only applies when the filter size has no relation to the signal size. A wavelet without compact support such as the Shannon wavelet would require $O(N^2)$. (For instance, a logarithmic Fourier transform also exists with $O(N)$ complexity, but the original signal must be sampled logarithmically in time, which is only useful for certain types of signals.[3])
Definition of a wavelet
There are a number of ways of defining a wavelet (or a wavelet family).
Scaling filter
An orthogonal wavelet is entirely defined by the scaling filter: a low-pass finite impulse response (FIR) filter of length $2N$ and sum 1. In biorthogonal wavelets, separate decomposition and reconstruction filters are defined.
For analysis with orthogonal wavelets the high pass filter is calculated as the quadrature mirror filter of the low pass,
and reconstruction filters are the time reverse of the decomposition filters.
Daubechies and Symlet wavelets can be defined by the scaling filter.
Scaling function
Wavelets are defined by the wavelet function $\psi(t)$ (i.e. the mother wavelet) and scaling function $\phi(t)$ (also called father wavelet) in the time domain.
The wavelet function is in effect a band-pass filter and scaling it for each level halves its bandwidth. This creates the
problem that in order to cover the entire spectrum, an infinite number of levels would be required. The scaling
function filters the lowest level of the transform and ensures all the spectrum is covered. See [4] for a detailed
explanation.
For a wavelet with compact support, $\phi(t)$ can be considered finite in length and is equivalent to the scaling filter $g$. Meyer wavelets can be defined by scaling functions.
Wavelet function
The wavelet only has a time domain representation as the wavelet function $\psi(t)$.
For instance, Mexican hat wavelets can be defined by a wavelet function. See a list of a few Continuous wavelets.
History
The development of wavelets can be linked to several separate trains of thought, starting with Haar's work in the
early 20th century. Later work by Dennis Gabor yielded Gabor atoms (1946), which are constructed similarly to
wavelets, and applied to similar purposes. Notable contributions to wavelet theory can be attributed to Zweig's discovery of the continuous wavelet transform in 1975 (originally called the cochlear transform and discovered while studying the reaction of the ear to sound),[5] Pierre Goupillaud, Grossmann and Morlet's formulation of what is now known as the CWT (1982), Jan-Olov Strömberg's early work on discrete wavelets (1983), Daubechies' orthogonal
wavelets with compact support (1988), Mallat's multiresolution framework (1989), Akansu's Binomial QMF (1990),
Nathalie Delprat's time-frequency interpretation of the CWT (1991), Newland's harmonic wavelet transform (1993)
and many others since.
Timeline
First wavelet (Haar wavelet) by Alfréd Haar (1909)
Since the 1970s: George Zweig, Jean Morlet, Alex Grossmann
Since the 1980s: Yves Meyer, Stéphane Mallat, Ingrid Daubechies, Ronald Coifman, Ali Akansu, Victor Wickerhauser
Wavelet transforms
A wavelet is a mathematical function used to divide a given function or continuous-time signal into different scale
components. Usually one can assign a frequency range to each scale component. Each scale component can then be
studied with a resolution that matches its scale. A wavelet transform is the representation of a function by wavelets.
The wavelets are scaled and translated copies (known as "daughter wavelets") of a finite-length or fast-decaying
oscillating waveform (known as the "mother wavelet"). Wavelet transforms have advantages over traditional Fourier
transforms for representing functions that have discontinuities and sharp peaks, and for accurately deconstructing
and reconstructing finite, non-periodic and/or non-stationary signals.
Wavelet transforms are classified into discrete wavelet transforms (DWTs) and continuous wavelet transforms
(CWTs). Note that both DWT and CWT are continuous-time (analog) transforms. They can be used to represent
continuous-time (analog) signals. CWTs operate over every possible scale and translation whereas DWTs use a
specific subset of scale and translation values or representation grid.
There are a large number of wavelet transforms, each suitable for different applications. For a full list see the list of wavelet-related transforms; the common ones are listed below:
Continuous wavelet transform (CWT)
Discrete wavelet transform (DWT)
Fast wavelet transform (FWT)
Lifting scheme & Generalized Lifting Scheme
Wavelet packet decomposition (WPD)
Stationary wavelet transform (SWT)
Fractional Fourier transform (FRFT)
Fractional wavelet transform (FRWT)
Generalized transforms
There are a number of generalized transforms of which the wavelet transform is a special case. For example, Joseph
Segman introduced scale into the Heisenberg group, giving rise to a continuous transform space that is a function of
time, scale, and frequency. The CWT is a two-dimensional slice through the resulting 3D time-scale-frequency volume.
Another example of a generalized transform is the chirplet transform, in which the CWT is also a two-dimensional slice through the chirplet transform.
An important application area for generalized transforms involves systems in which high frequency resolution is
crucial. For example, darkfield electron optical transforms intermediate between direct and reciprocal space have
been widely used in the harmonic analysis of atom clustering, i.e. in the study of crystals and crystal defects.[6] Now
that transmission electron microscopes are capable of providing digital images with picometer-scale information on
atomic periodicity in nanostructures of all sorts, the range of pattern recognition[7] and strain[8]/metrology[9] applications for intermediate transforms with high frequency resolution (like brushlets[10] and ridgelets[11]) is growing rapidly.
Fractional wavelet transform (FRWT) is a generalization of the classical wavelet transform in the fractional Fourier
transform domains. This transform is capable of providing the time- and fractional-domain information simultaneously and representing signals in the time-fractional-frequency plane.[12]
Applications of Wavelet Transform
Generally, an approximation to DWT is used for data compression if a signal is already sampled, and the CWT for signal analysis.[13] Thus, DWT approximation is commonly used in engineering and computer science, and the CWT in scientific research.
Like some other transforms, wavelet transforms can be used to transform data, then encode the transformed data,
resulting in effective compression. For example, JPEG 2000 is an image compression standard that uses biorthogonal
wavelets. This means that although the frame is overcomplete, it is a tight frame (see types of Frame of a vector
space), and the same frame functions (except for conjugation in the case of complex wavelets) are used for both
analysis and synthesis, i.e., in both the forward and inverse transform. For details see wavelet compression.
A related use is for smoothing/denoising data based on wavelet coefficient thresholding, also called wavelet shrinkage. By adaptively thresholding the wavelet coefficients that correspond to undesired frequency components, smoothing and/or denoising operations can be performed.
Wavelet transforms are also starting to be used for communication applications. Wavelet OFDM is the basic
modulation scheme used in HD-PLC (a power line communications technology developed by Panasonic), and in one
of the optional modes included in the IEEE 1901 standard. Wavelet OFDM can achieve deeper notches than traditional FFT OFDM, and wavelet OFDM does not require a guard interval (which usually represents significant overhead in FFT OFDM systems).[14]
As a representation of a signal
Often, signals can be represented well as a sum of sinusoids. However, consider a non-continuous signal with an
abrupt discontinuity; this signal can still be represented as a sum of sinusoids, but requires an infinite number, which
is an observation known as Gibbs phenomenon. This, then, requires an infinite number of Fourier coefficients, which
is not practical for many applications, such as compression. Wavelets are more useful for describing these signals
with discontinuities because of their time-localized behavior (both Fourier and wavelet transforms are
frequency-localized, but wavelets have an additional time-localization property). Because of this, many types of
signals in practice may be non-sparse in the Fourier domain, but very sparse in the wavelet domain. This is
particularly useful in signal reconstruction, especially in the recently popular field of compressed sensing. (Note that
the Short-time Fourier transform (STFT) is also localized in time and frequency, but there are often problems with
the frequency-time resolution trade-off. Wavelets are better signal representations because of multiresolution
analysis.)
This motivates why wavelet transforms are now being adopted for a vast number of applications, often replacing the
conventional Fourier Transform. Many areas of physics have seen this paradigm shift, including molecular
dynamics, ab initio calculations, astrophysics, density-matrix localisation, seismology, optics, turbulence and
quantum mechanics. This change has also occurred in image processing, EEG, EMG,[15] ECG analyses, brain rhythms, DNA analysis, protein analysis, climatology, human sexual response analysis,[16] general signal processing, speech recognition, acoustics, vibration signals,[17] computer graphics, multifractal analysis, and sparse coding. In computer vision and image processing, the notion of scale space representation and Gaussian derivative operators is
regarded as a canonical multi-scale representation.
Wavelet Denoising
Suppose we measure a noisy signal $x = s + v$. Assume $s$ has a sparse representation in a certain wavelet basis, and $v \sim \mathcal{N}(0, \sigma^2 I)$.
So $y = W^{T} x = W^{T} s + W^{T} v = p + z$.
Most elements in $p$ are 0 or close to 0, and $z \sim \mathcal{N}(0, \sigma^2 I)$.
Since $W$ is orthogonal, the estimation problem amounts to recovery of a signal in i.i.d. Gaussian noise. As $p$ is sparse, one method is to apply a Gaussian mixture model for $p$.
Assume a prior $p \sim a\,\mathcal{N}(0, \sigma_1^2) + (1-a)\,\mathcal{N}(0, \sigma_2^2)$, where $\sigma_1^2$ is the variance of "significant" coefficients and $\sigma_2^2$ is the variance of "insignificant" coefficients.
Then $\tilde{p} = E(p \mid y) = \tau(y)\,y$, where $\tau(y)$ is called the shrinkage factor, which depends on the prior variances $\sigma_1^2$ and $\sigma_2^2$. The effect of the shrinkage factor is that small coefficients are set early to 0, and large coefficients are unaltered.
Small coefficients are mostly noise, and large coefficients contain the actual signal.
Finally, apply the inverse wavelet transform to obtain $\tilde{s} = W\tilde{p}$.
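As a concrete illustration of the thresholding step, the sketch below (added here; it uses simple hard thresholding of the detail coefficients rather than the mixture-model shrinkage factor described above, and all names are assumptions) performs one level of the orthonormal Haar transform, zeroes the small detail coefficients, and inverts the transform.

public class HaarShrinkage {
    static final double S = Math.sqrt(2);

    // One level of the orthonormal Haar transform: scaled averages in the
    // first half, scaled differences in the second half (length must be even).
    static double[] forward(double[] x) {
        int h = x.length / 2;
        double[] y = new double[x.length];
        for (int i = 0; i < h; i++) {
            y[i] = (x[2 * i] + x[2 * i + 1]) / S;     // approximation
            y[h + i] = (x[2 * i] - x[2 * i + 1]) / S; // detail
        }
        return y;
    }

    static double[] inverse(double[] y) {
        int h = y.length / 2;
        double[] x = new double[y.length];
        for (int i = 0; i < h; i++) {
            x[2 * i] = (y[i] + y[h + i]) / S;
            x[2 * i + 1] = (y[i] - y[h + i]) / S;
        }
        return x;
    }

    // Small detail coefficients are treated as noise and set to 0;
    // large coefficients, which carry the actual signal, are kept.
    static double[] denoise(double[] noisy, double threshold) {
        double[] y = forward(noisy);
        for (int i = y.length / 2; i < y.length; i++) {
            if (Math.abs(y[i]) < threshold) y[i] = 0;
        }
        return inverse(y);
    }
}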
List of wavelets
Discrete wavelets
Beylkin (18)
BNC wavelets
Coiflet (6, 12, 18, 24, 30)
Cohen-Daubechies-Feauveau wavelet (Sometimes referred to as CDF N/P or Daubechies biorthogonal wavelets)
Daubechies wavelet (2, 4, 6, 8, 10, 12, 14, 16, 18, 20, etc.)
Binomial-QMF (Also referred to as Daubechies wavelet)
Haar wavelet
Mathieu wavelet
Legendre wavelet
Villasenor wavelet
Symlet[18]
Continuous wavelets
Real-valued
Beta wavelet
Hermitian wavelet
Hermitian hat wavelet
Meyer wavelet
Mexican hat wavelet
Shannon wavelet
Complex-valued
Complex Mexican hat wavelet
fbsp wavelet
Morlet wavelet
Shannon wavelet
Modified Morlet wavelet
Notes
[1] Mallat, Stéphane. A Wavelet Tour of Signal Processing. 1998, pp. 250-252.
[2] The Scientist and Engineer's Guide to Digital Signal Processing, by Steven W. Smith, Ph.D., chapter 8, equation 8-1: http://www.dspguide.com/ch8/4.htm
[3] http://homepages.dias.ie/~ajones/publications/28.pdf
[4] http://www.polyvalens.com/blog/?page_id=15#7.+The+scaling+function+%5B7%5D
[5] Zweig, George: biography on Scienceworld.wolfram.com (http://scienceworld.wolfram.com/biography/Zweig.html)
[6] P. Hirsch, A. Howie, R. Nicholson, D. W. Pashley and M. J. Whelan (1965/1977) Electron Microscopy of Thin Crystals (Butterworths, London / Krieger, Malabar FLA) ISBN 0-88275-376-2
[7] P. Fraundorf, J. Wang, E. Mandell and M. Rose (2006) Digital darkfield tableaus, Microscopy and Microanalysis 12:S2, 1010-1011 (cf. arXiv:cond-mat/0403017 (http://arxiv.org/abs/cond-mat/0403017))
[8] M. J. Hÿtch, E. Snoeck and R. Kilaas (1998) Quantitative measurement of displacement and strain fields from HRTEM micrographs, Ultramicroscopy 74:131-146.
[9] Martin Rose (2006) Spacing measurements of lattice fringes in HRTEM images using digital darkfield decomposition (M.S. Thesis in Physics, U. Missouri-St. Louis)
[10] F. G. Meyer and R. R. Coifman (1997) Applied and Computational Harmonic Analysis 4:147.
[11] A. G. Flesia, H. Hel-Or, A. Averbuch, E. J. Candes, R. R. Coifman and D. L. Donoho (2001) Digital implementation of ridgelet packets (Academic Press, New York).
[12] J. Shi, N.-T. Zhang, and X.-P. Liu, "A novel fractional wavelet transform and its applications," Sci. China Inf. Sci., vol. 55, no. 6, pp. 1270-1279, June 2012. URL: http://www.springerlink.com/content/q01np2848m388647/
[13] A.N. Akansu, W.A. Serdijn and I.W. Selesnick, Emerging applications of wavelets: A review (http://web.njit.edu/~akansu/PAPERS/ANA-IWS-WAS-ELSEVIER PHYSCOM 2010.pdf), Physical Communication, Elsevier, vol. 3, issue 1, pp. 1-18, March 2010.
[14] An overview of the P1901 PHY/MAC proposal.
[15] J. Rafiee et al., Feature extraction of forearm EMG signals for prosthetics, Expert Systems with Applications 38 (2011) 4058-67.
[16] J. Rafiee et al., Female sexual responses using signal processing techniques, The Journal of Sexual Medicine 6 (2009) 3086-96. (pdf) (http://rafiee.us/files/JSM_2009.pdf)
[17] J. Rafiee and Peter W. Tse, Use of autocorrelation in wavelet coefficients for fault diagnosis, Mechanical Systems and Signal Processing 23 (2009) 1554-72.
[18] Matlab Toolbox URL: http://matlab.izmiran.ru/help/toolbox/wavelet/ch06_a32.html
References
Paul S. Addison, The Illustrated Wavelet Transform Handbook, Institute of Physics, 2002, ISBN 0-7503-0692-0
Ali Akansu and Richard Haddad, Multiresolution Signal Decomposition: Transforms, Subbands, Wavelets,
Academic Press, 1992, ISBN 0-12-047140-X
B. Boashash, editor, "Time-Frequency Signal Analysis and Processing: A Comprehensive Reference", Elsevier Science, Oxford, 2003, ISBN 0-08-044335-4.
Tony F. Chan and Jackie (Jianhong) Shen, Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, Society of Applied Mathematics, ISBN 0-89871-589-X (2005)
Ingrid Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, 1992, ISBN 0-89871-274-2
Ramazan Gençay, Faruk Selçuk and Brandon Whitcher, An Introduction to Wavelets and Other Filtering Methods in Finance and Economics, Academic Press, 2001, ISBN 0-12-279670-5
Haar A., Zur Theorie der orthogonalen Funktionensysteme, Mathematische Annalen, 69, pp. 331-371, 1910.
Barbara Burke Hubbard, "The World According to Wavelets: The Story of a Mathematical Technique in the
Making", AK Peters Ltd, 1998, ISBN 1-56881-072-5, ISBN 978-1-56881-072-0
Gerald Kaiser, A Friendly Guide to Wavelets, Birkhäuser, 1994, ISBN 0-8176-3711-7
Stéphane Mallat, "A Wavelet Tour of Signal Processing", 2nd Edition, Academic Press, 1999, ISBN 0-12-466606-X
Donald B. Percival and Andrew T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University
Press, 2000, ISBN 0-521-68508-7
Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007), "Section 13.10. Wavelet Transforms" (http://apps.nrbook.com/empanel/index.html#pg=699), Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8
P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, 1993, ISBN 0-13-605718-7
Mladen Victor Wickerhauser, Adapted Wavelet Analysis From Theory to Software, A K Peters Ltd, 1994, ISBN
1-56881-041-5
Martin Vetterli and Jelena Kovačević, "Wavelets and Subband Coding", Prentice Hall, 1995, ISBN 0-13-097080-8
External links
Hazewinkel, Michiel, ed. (2001), "Wavelet analysis" (http://www.encyclopediaofmath.org/index.php?title=p/w097160), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
OpenSource Wavelet C# Code (http://www.waveletstudio.net/)
JWave: open source Java implementation of several orthogonal and non-orthogonal wavelets (https://code.google.com/p/jwave/)
Wavelet Analysis in Mathematica (http://reference.wolfram.com/mathematica/guide/Wavelets.html) (a very comprehensive set of wavelet analysis tools)
1st NJIT Symposium on Wavelets (April 30, 1990) (First Wavelets Conference in USA) (http://web.njit.edu/~ali/s1.htm)
Binomial-QMF Daubechies Wavelets (http://web.njit.edu/~ali/NJITSYMP1990/AkansuNJIT1STWAVELETSSYMPAPRIL301990.pdf)
Wavelets (http://www-math.mit.edu/~gs/papers/amsci.pdf) by Gilbert Strang, American Scientist 82 (1994) 250-255. (A very short and excellent introduction)
Wavelet Digest (http://www.wavelet.org)
NASA Signal Processor featuring Wavelet methods (http://www.grc.nasa.gov/WWW/OptInstr/NDE_Wave_Image_ProcessorLab.html): description of NASA signal and image processing software and link to download
Course on Wavelets given at UC Santa Barbara, 2004 (http://wavelets.ens.fr/ENSEIGNEMENT/COURS/UCSB/index.html)
The Wavelet Tutorial by Polikar (http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html) (easy to understand when you have some background with Fourier transforms!)
OpenSource Wavelet C++ Code (http://herbert.the-little-red-haired-girl.org/en/software/wavelet/)
Wavelets for Kids (PDF file) (http://www.isye.gatech.edu/~brani/wp/kidsA.pdf) (introductory, for very smart kids!)
Link collection about wavelets (http://www.cosy.sbg.ac.at/~uhl/wav.html)
Gerald Kaiser's acoustic and electromagnetic wavelets (http://wavelets.com/pages/center.html)
A really friendly guide to wavelets (http://perso.wanadoo.fr/polyvalens/clemens/wavelets/wavelets.html)
Wavelet-based image annotation and retrieval (http://www.alipr.com)
Very basic explanation of Wavelets and how FFT relates to it (http://www.relisoft.com/Science/Physics/sampling.html)
A Practical Guide to Wavelet Analysis (http://paos.colorado.edu/research/wavelets/) is very helpful, and the wavelet software in FORTRAN, IDL and MATLAB is freely available online. Note that the biased wavelet power spectrum needs to be rectified (http://ocgweb.marine.usf.edu/~liu/wavelet.html).
WITS: Where Is The Starlet? (http://www.laurent-duval.eu/siva-wits-where-is-the-starlet.html) A dictionary of tens of wavelets and wavelet-related terms ending in -let, from activelets to x-lets through bandlets, contourlets, curvelets, noiselets, wedgelets.
Python Wavelet Transforms Package (http://www.pybytes.com/pywavelets/): open source code for computing 1D and 2D discrete wavelet transforms, stationary wavelet transforms and wavelet packet transforms.
Wavelet Library (http://pages.cs.wisc.edu/~kline/wvlib): GNU/GPL library for n-dimensional discrete wavelet/framelet transforms.
The Fractional Spline Wavelet Transform (http://bigwww.epfl.ch/publications/blu0001.pdf) describes a fractional wavelet transform based on fractional b-splines.
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity (http://dx.doi.org/10.1016/j.sigpro.2011.04.025) provides a tutorial on two-dimensional oriented wavelets and related geometric multiscale transforms.
HD-PLC Alliance (http://www.hd-plc.org/)
Signal Denoising using Wavelets (http://tx.technion.ac.il/~rc/SignalDenoisingUsingWavelets_RamiCohen.pdf)
A Concise Introduction to Wavelets (http://www.docstoc.com/docs/160022503/A-Concise-Introduction-to-Wavelets) by René Puchinger.
Discrete wavelet transform
[Figure: An example of the 2D discrete wavelet transform that is used in JPEG2000. The original image is high-pass filtered, yielding the three large images, each describing local changes in brightness (details) in the original image. It is then low-pass filtered and downscaled, yielding an approximation image; this image is high-pass filtered to produce the three smaller detail images, and low-pass filtered to produce the final approximation image in the upper-left.]
In numerical analysis and functional
analysis, a discrete wavelet transform
(DWT) is any wavelet transform for which
the wavelets are discretely sampled. As with
other wavelet transforms, a key advantage it
has over Fourier transforms is temporal
resolution: it captures both frequency and
location information (location in time).
Examples
Haar wavelets
Main article: Haar wavelet
The first DWT was invented by the Hungarian mathematician Alfréd Haar. For an input represented by a list of $2^n$ numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum. This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in $2^n - 1$ differences and one final sum.
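For example, for the input (4, 6, 10, 12) the first pass stores the sums (10, 22) and the differences (-2, -2); pairing the sums once more gives the final sum 32 and the difference -12, so the full transform is (32, -12, -2, -2): one final sum and $2^2 - 1 = 3$ differences.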
Daubechies wavelets
Main article: Daubechies wavelet
The most commonly used set of discrete wavelet transforms was formulated by the Belgian mathematician Ingrid
Daubechies in 1988. This formulation is based on the use of recurrence relations to generate progressively finer
discrete samplings of an implicit mother wavelet function; each resolution is twice that of the previous scale. In her
seminal paper, Daubechies derives a family of wavelets, the first of which is the Haar wavelet. Interest in this field
has exploded since then, and many variations of Daubechies' original wavelets were developed.[1]
The Dual-Tree Complex Wavelet Transform (ℂWT)
The dual-tree complex wavelet transform (ℂWT) is a relatively recent enhancement to the discrete wavelet transform (DWT), with important additional properties: it is nearly shift invariant and directionally selective in two and higher dimensions. It achieves this with a redundancy factor of only $2^d$ for $d$-dimensional signals, substantially lower than the undecimated DWT. The multidimensional (M-D) dual-tree ℂWT is nonseparable but is based on a computationally efficient, separable filter bank (FB).[2]
Others
Other forms of discrete wavelet transform include the non- or undecimated wavelet transform (where downsampling is omitted) and the Newland transform (where an orthonormal basis of wavelets is formed from appropriately constructed top-hat filters in frequency space). Wavelet packet transforms are also related to the discrete wavelet transform. The complex wavelet transform is another form.
Properties
The Haar DWT illustrates the desirable properties of wavelets in general. First, it can be performed in $O(n)$ operations; second, it captures not only a notion of the frequency content of the input, by examining it at different scales, but also temporal content, i.e. the times at which these frequencies occur. Combined, these two properties make the fast wavelet transform (FWT) an alternative to the conventional fast Fourier transform (FFT).
Time Issues
Due to the rate-change operators in the filter bank, the discrete WT is not time-invariant but actually very sensitive to the alignment of the signal in time. To address the time-varying problem of wavelet transforms, Mallat and Zhong proposed a new algorithm for wavelet representation of a signal, which is invariant to time shifts.[3] According to this algorithm, which is called a TI-DWT, only the scale parameter is sampled along the dyadic sequence $2^j$ ($j \in \mathbb{Z}$) and the wavelet transform is calculated for each point in time.[4][5]
Applications
The discrete wavelet transform has a huge number of applications in science, engineering, mathematics and computer science. Most notably, it is used for signal coding, to represent a discrete signal in a more redundant form, often as a preconditioning for data compression. Practical applications can also be found in signal processing of accelerations for gait analysis,[6] in digital communications and many others.[7][8][9]
It is shown that the discrete wavelet transform (discrete in scale and shift, and continuous in time) is successfully implemented as an analog filter bank in biomedical signal processing for design of low-power pacemakers and also in ultra-wideband (UWB) wireless communications.[10]
Comparison with Fourier transform
See also: Discrete Fourier transform
To illustrate the differences and similarities between the discrete wavelet transform and the discrete Fourier transform, consider the DWT and DFT of the following sequence: (1, 0, 0, 0), a unit impulse.
The DFT has orthogonal basis (DFT matrix):
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix},$$
while the DWT with Haar wavelets for length-4 data has orthogonal basis in the rows of
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}.$$
(To simplify notation, whole numbers are used, so the bases are orthogonal but not orthonormal.)
Preliminary observations include:
Wavelets have location: the (1, 1, -1, -1) wavelet corresponds to "left side" versus "right side", while the last two wavelets have support on the left side or the right side, and one is a translation of the other.
Sinusoidal waves do not have location (they spread across the whole space) but do have phase: the second and third waves are translations of each other, corresponding to being 90° out of phase, like cosine and sine, of which these are discrete versions.
Decomposing the sequence with respect to these bases yields
$$(1,0,0,0) = \tfrac{1}{4}(1,1,1,1) + \tfrac{1}{4}(1,1,-1,-1) + \tfrac{1}{2}(1,-1,0,0)$$
for the Haar DWT, and
$$(1,0,0,0) = \tfrac{1}{4}(1,1,1,1) + \tfrac{1}{4}(1,-i,-1,i) + \tfrac{1}{4}(1,-1,1,-1) + \tfrac{1}{4}(1,i,-1,-i)$$
for the DFT.
The DWT demonstrates the localization: the (1, 1, 1, 1) term gives the average signal value, the (1, 1, -1, -1) term places the signal in the left side of the domain, the (1, -1, 0, 0) term places it at the left side of the left side, and truncating at any stage yields a downsampled version of the signal:
$$\left(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right), \qquad \left(\tfrac{1}{2},\tfrac{1}{2},0,0\right), \qquad (1,0,0,0).$$
[Figure: The sinc function, showing the time domain artifacts (undershoot and ringing) of truncating a Fourier series.]
The DFT, by contrast, expresses the sequence by the interference of waves of various frequencies; thus truncating the series yields a low-pass filtered version of the series (combining the conjugate frequency terms so that the partial sums stay real):
$$\left(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right), \qquad \left(\tfrac{3}{4},\tfrac{1}{4},-\tfrac{1}{4},\tfrac{1}{4}\right), \qquad (1,0,0,0).$$
Notably, the middle approximation (2-term) differs. From the frequency domain perspective, this is a better approximation, but from the time domain perspective it has drawbacks: it exhibits undershoot (one of the values is negative, though the original series is non-negative everywhere) and ringing, where the right side is non-zero, unlike in the wavelet transform. On the other hand, the Fourier approximation correctly shows a peak, and all points are within 1/4 of their correct value, though all points have error. The wavelet approximation, by contrast, places a peak on the left half, but has no peak at the first point, and while it is exactly correct for half the values (reflecting location), it has an error of 1/2 for the other values.
This illustrates the kinds of trade-offs between these transforms, and how in some respects the DWT provides
preferable behavior, particularly for the modeling of transients.
Definition
One level of the transform
The DWT of a signal $x$ is calculated by passing it through a series of filters. First the samples are passed through a low-pass filter with impulse response $g$, resulting in a convolution of the two:
$$y[n] = (x * g)[n] = \sum_{k=-\infty}^{\infty} x[k]\,g[n-k].$$
The signal is also decomposed simultaneously using a high-pass filter $h$. The outputs give the detail coefficients (from the high-pass filter) and approximation coefficients (from the low-pass). It is important that the two filters are related to each other; they are known as a quadrature mirror filter.
However, since half the frequencies of the signal have now been removed, half the samples can be discarded according to Nyquist's rule. The filter outputs are then subsampled by 2 (Mallat's and the common notation is the opposite, $g$ denoting the high-pass and $h$ the low-pass filter):
$$y_{\mathrm{low}}[n] = \sum_{k=-\infty}^{\infty} x[k]\,g[2n-k], \qquad y_{\mathrm{high}}[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[2n-k].$$
This decomposition has halved the time resolution since only half of each filter output characterises the signal.
However, each output has half the frequency band of the input so the frequency resolution has been doubled.
[Figure: Block diagram of filter analysis]
With the subsampling operator $\downarrow$,
$$(y \downarrow k)[n] = y[kn],$$
the above summation can be written more concisely:
$$y_{\mathrm{low}} = (x * g) \downarrow 2, \qquad y_{\mathrm{high}} = (x * h) \downarrow 2.$$
However, computing a complete convolution with subsequent downsampling would waste computation time. The lifting scheme is an optimization where these two computations are interleaved.
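A minimal sketch of one analysis level in Java (an added illustration; the simple boundary truncation and the orthonormal Haar filter pair, with $g$ low-pass and $h$ high-pass as in the notation above, are assumptions made for the example):

public class OneLevelDWT {
    // Convolve x with filter f and keep every second sample:
    // y[n] = sum over k of x[k] * f[2n - k], i.e. (x * f) downsampled by 2.
    static double[] convolveDownsample(double[] x, double[] f) {
        double[] y = new double[x.length / 2];
        for (int n = 0; n < y.length; n++) {
            double sum = 0;
            for (int k = 0; k < x.length; k++) {
                int idx = 2 * n - k;
                if (idx >= 0 && idx < f.length) sum += x[k] * f[idx]; // truncate at borders
            }
            y[n] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(2);
        double[] g = {1 / s, 1 / s};    // low-pass (approximation) filter
        double[] h = {1 / s, -1 / s};   // high-pass (detail) filter
        double[] x = {4, 6, 10, 12, 14, 14, 16, 18};
        double[] approx = convolveDownsample(x, g); // y_low  = (x * g) downsampled by 2
        double[] detail = convolveDownsample(x, h); // y_high = (x * h) downsampled by 2
    }
}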
Cascading and Filter banks
This decomposition is repeated to further increase the frequency resolution, with the approximation coefficients decomposed with high- and low-pass filters and then down-sampled. This is represented as a binary tree with nodes representing a sub-space with a different time-frequency localisation. The tree is known as a filter bank.
[Figure: A 3-level filter bank]
At each level in the above diagram the signal is decomposed into low and high frequencies. Due to the decomposition process the input signal must be a multiple of $2^n$, where $n$ is the number of levels.
For example, for a signal with 32 samples, frequency range 0 to $f_n$ and 3 levels of decomposition, 4 output scales are produced:
Level | Frequencies | Samples
3 | 0 to $f_n/8$ | 4
3 | $f_n/8$ to $f_n/4$ | 4
2 | $f_n/4$ to $f_n/2$ | 8
1 | $f_n/2$ to $f_n$ | 16
[Figure: Frequency domain representation of the DWT]
Relationship to the Mother Wavelet
The filter-bank implementation of wavelets can be interpreted as computing the wavelet coefficients of a discrete set of child wavelets for a given mother wavelet $\psi(t)$. In the case of the discrete wavelet transform, the mother wavelet is shifted and scaled by powers of two,
$$\psi_{j,k}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - k\,2^j}{2^j}\right),$$
where $j$ is the scale parameter and $k$ is the shift parameter, both of which are integers.
Recall that the wavelet coefficient $\gamma$ of a signal $x(t)$ is the projection of $x(t)$ onto a wavelet, and let $x(t)$ be a signal of length $2^N$. In the case of a child wavelet in the discrete family above,
$$\gamma_{jk} = \int_{-\infty}^{\infty} x(t)\, \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - k\,2^j}{2^j}\right) dt.$$
Now fix $j$ at a particular scale, so that $\gamma_{jk}$ is a function of $k$ only. In light of the above equation, $\gamma_{jk}$ can be viewed as a convolution of $x(t)$ with a dilated, reflected, and normalized version of the mother wavelet, $h(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{-t}{2^j}\right)$, sampled at the points $1, 2^j, 2 \cdot 2^j, \ldots, 2^N$. But this is precisely what the detail coefficients give at level $j$ of the discrete wavelet transform. Therefore, for an appropriate choice of $h[n]$ and $g[n]$, the detail coefficients of the filter bank correspond exactly to a wavelet coefficient of a discrete set of child wavelets for a given mother wavelet $\psi(t)$.
As an example, consider the discrete Haar wavelet, whose mother wavelet is $\psi = [1, -1]$. Then the dilated, reflected, and normalized version of this wavelet is $h[n] = \frac{1}{\sqrt{2}}\,[-1, 1]$, which is, indeed, the highpass decomposition filter for the discrete Haar wavelet transform.
Time Complexity
The filter-bank implementation of the discrete wavelet transform takes only $O(N)$ time in certain cases, as compared to $O(N \log N)$ for the fast Fourier transform.
Note that if $g[n]$ and $h[n]$ are both of constant length (i.e. their length is independent of $N$), then $x * g$ and $x * h$ each take $O(N)$ time. The wavelet filter bank does each of these two $O(N)$ convolutions, then splits the signal into two branches of size $N/2$. But it only recursively splits the upper branch (the one convolved with $g$), as contrasted with the FFT, which recursively splits both the upper branch and the lower branch. This leads to the following recurrence relation
$$T(N) = cN + T\!\left(\frac{N}{2}\right),$$
which leads to an $O(N)$ time for the entire operation, as can be shown by a geometric series expansion of the above relation.
As an example, the discrete Haar wavelet transform is linear, since in that case $h[n]$ and $g[n]$ are of constant length 2.
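Unrolling the recurrence above makes the geometric series explicit:
$$T(N) = cN + c\frac{N}{2} + c\frac{N}{4} + \cdots + T(1) \le cN \sum_{i=0}^{\infty} 2^{-i} + T(1) = 2cN + T(1) = O(N).$$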
Other transforms
See also: Adam7 algorithm
The Adam7 algorithm, used for interlacing in the Portable Network Graphics (PNG) format, is a multiscale model of
the data which is similar to a DWT with Haar wavelets.
Unlike the DWT, it has a specific scale: it starts from an 8×8 block, and it downsamples the image, rather than decimating (low-pass filtering, then downsampling). It thus offers worse frequency behavior, showing artifacts (pixelation) at the early stages, in return for simpler implementation.
Code example
In its simplest form, the DWT is remarkably easy to compute.
The Haar wavelet in Java:
public static int[] discreteHaarWaveletTransform(int[] input) {
    // This function assumes that input.length = 2^n, n > 1
    int[] output = new int[input.length];
    for (int length = input.length >> 1; ; length >>= 1) {
        // length = input.length / 2^n, with n increasing up to log2(input.length)
        for (int i = 0; i < length; ++i) {
            int sum = input[i * 2] + input[i * 2 + 1];
            int difference = input[i * 2] - input[i * 2 + 1];
            output[i] = sum;
            output[length + i] = difference;
        }
        if (length == 1) {
            return output;
        }
        // Swap arrays to do next iteration
        System.arraycopy(output, 0, input, 0, length << 1);
    }
}
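For instance (an illustrative call, not part of the original article), applying the function to the unit impulse used in the comparison section above gives

int[] coeffs = discreteHaarWaveletTransform(new int[]{1, 0, 0, 0});
// coeffs == {1, 1, 1, 0}: the overall sum, the coarse difference,
// and the two finest-scale differences; note that the routine also
// overwrites its input array while iterating.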
Complete Java code for a 1-D and 2-D DWT using Haar, Daubechies, Coiflet, and Legendre wavelets is available from the open source project JWave.[11] Furthermore, a fast lifting implementation of the discrete biorthogonal CDF 9/7 wavelet transform in C, used in the JPEG 2000 image compression standard, can be found here[12] (archived 5 March 2012).
Example of Above Code
[Figure: An example of computing the discrete Haar wavelet coefficients for a sound signal of someone saying "I Love Wavelets." The original waveform is shown in blue in the upper left, and the wavelet coefficients are shown in black in the upper right. Along the bottom are shown three zoomed-in regions of the wavelet coefficients for different ranges.]
This figure shows an example of applying
the above code to compute the Haar wavelet
coefficients on a sound waveform. This
example highlights two key properties of the
wavelet transform:
Natural signals often have some degree of smoothness, which makes them sparse in the wavelet domain. There are far fewer significant components in the wavelet domain in this example than there are in the time domain, and most of the significant components are towards the coarser coefficients on the left. Hence, natural signals are compressible in the wavelet domain.
The wavelet transform is a multiresolution, bandpass representation of a signal. This can be seen directly from the filter-bank definition of the discrete wavelet transform given in this article. For a signal of length $2^N$, the coefficients in a given range represent a version of the original signal which is in a corresponding pass-band. This is why zooming in on these ranges of the wavelet coefficients looks so similar in structure to the original signal. Ranges which are closer to the left (larger $j$ in the above notation) are coarser representations of the signal, while ranges to the right represent finer details.
Notes
[1] Akansu, Ali N.; Haddad, Richard A. (1992), Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets, Boston, MA: Academic Press, ISBN 978-0-12-047141-6
[2] Selesnick, I.W.; Baraniuk, R.G.; Kingsbury, N.C. (2005), The dual-tree complex wavelet transform
[3] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. San Diego, CA: Academic, 1999.
[4] S. G. Mallat and S. Zhong, Characterization of signals from multiscale edges, IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 7, pp. 710-732, Jul. 1992.
[5] Ince, Kiranyaz, Gabbouj (2009), A generic and robust system for automated patient-specific classification of ECG signals
[6] "Novel method for stride length estimation with body area network accelerometers" (http://www.youtube.com/watch?v=DTpEVQSEBBk), IEEE BioWireless 2011, pp. 79-82
[7] A.N. Akansu and M.J.T. Smith, Subband and Wavelet Transforms: Design and Applications (http://www.amazon.com/Subband-Wavelet-Transforms-Applications-International/dp/0792396456/ref=sr_1_1?s=books&ie=UTF8&qid=1325018106&sr=1-1), Kluwer Academic Publishers, 1995.
[8] A.N. Akansu and M.J. Medley, Wavelet, Subband and Block Transforms in Communications and Multimedia (http://www.amazon.com/Transforms-Communications-Multimedia-International-Engineering/dp/1441950869/ref=sr_1_fkmr0_3?s=books&ie=UTF8&qid=1325018358&sr=1-3-fkmr0), Kluwer Academic Publishers, 1999.
[9] A.N. Akansu, P. Duhamel, X. Lin and M. de Courville, Orthogonal Transmultiplexers in Communication: A Review (http://web.njit.edu/~akansu/PAPERS/AKANSU-ORTHOGONAL-MUX-1998.pdf), IEEE Trans. on Signal Processing, Special Issue on Theory and Applications of Filter Banks and Wavelets, Vol. 46, No. 4, pp. 979-995, April 1998.
[10] A.N. Akansu, W.A. Serdijn, and I.W. Selesnick, Wavelet Transforms in Signal Processing: A Review of Emerging Applications (http://web.njit.edu/~akansu/PAPERS/ANA-IWS-WAS-ELSEVIER PHYSCOM 2010.pdf), Physical Communication, Elsevier, vol. 3, issue 1, pp. 1-18, March 2010.
[11] http://code.google.com/p/jwave/
[12] http://web.archive.org/web/20120305164605/http://www.embl.de/~gpau/misc/dwt97.c
Fast wavelet transform
The Fast Wavelet Transform is a mathematical algorithm designed to turn a waveform or signal in the time domain
into a sequence of coefficients based on an orthogonal basis of small finite waves, or wavelets. The transform can be
easily extended to multidimensional signals, such as images, where the time domain is replaced with the space
domain.
It has as theoretical foundation the device of a finitely generated, orthogonal multiresolution analysis (MRA). In the terms given there, one selects a sampling scale $J$ with sampling rate of $2^J$ per unit interval, and projects the given signal $f$ onto the space $V_J$; in theory by computing the scalar products
$$s^{(J)}_n = 2^J \langle f(t),\, \phi(2^J t - n) \rangle,$$
where $\phi$ is the scaling function of the chosen wavelet transform; in practice by any suitable sampling procedure under the condition that the signal is highly oversampled, so
$$P_J[f](x) = \sum_{n \in \mathbb{Z}} s^{(J)}_n\, \phi(2^J x - n)$$
is the orthogonal projection or at least some good approximation of the original signal in $V_J$.
The MRA is characterised by its scaling sequence
$$a = (a_{-N}, \ldots, a_N)$$
or, as Z-transform,
$$a(z) = \sum_{n=-N}^{N} a_n z^{-n},$$
and its wavelet sequence
$$b = (b_{-N}, \ldots, b_N)$$
or
$$b(z) = \sum_{n=-N}^{N} b_n z^{-n}$$
(some coefficients might be zero). Those allow one to compute the wavelet coefficients $d^{(k)}_n$, at least on some range $k = M, \ldots, J-1$, without having to approximate the integrals in the corresponding scalar products. Instead, one can directly, with the help of convolution and decimation operators, compute those coefficients from the first approximation $s^{(J)}$.
Forward DWT
One computes recursively, starting with the coefficient sequence $s^{(J)}$ and counting down from $k = J-1$ to some $M < J$, a single application of a wavelet filter bank, with filters $g = a^*$, $h = b^*$:
$$s^{(k)}_n = \frac{1}{2} \sum_{m \in \mathbb{Z}} a^*_{m-2n}\, s^{(k+1)}_m \qquad \text{or} \qquad s^{(k)} = \tfrac{1}{2}\,\big(s^{(k+1)} * g\big) \downarrow 2,$$
and
$$d^{(k)}_n = \frac{1}{2} \sum_{m \in \mathbb{Z}} b^*_{m-2n}\, s^{(k+1)}_m \qquad \text{or} \qquad d^{(k)} = \tfrac{1}{2}\,\big(s^{(k+1)} * h\big) \downarrow 2,$$
for $k = J-1, J-2, \ldots, M$ and all $n \in \mathbb{Z}$. In the Z-transform notation:
$$s^{(k)}(z) = \tfrac{1}{2}\,(\downarrow 2)\big(a^*(z)\, s^{(k+1)}(z)\big), \qquad d^{(k)}(z) = \tfrac{1}{2}\,(\downarrow 2)\big(b^*(z)\, s^{(k+1)}(z)\big).$$
[Figure: Recursive application of the filter bank]
The downsampling operator $(\downarrow 2)$ reduces an infinite sequence, given by its Z-transform, which is simply a Laurent series, to the sequence of the coefficients with even indices,
$$(\downarrow 2)\!\left(\sum_k c_k\, z^{-k}\right) = \sum_k c_{2k}\, z^{-k}.$$
The starred Laurent polynomial $a^*(z)$ denotes the adjoint filter; it has time-reversed adjoint coefficients,
$$a^*(z) = \sum_{n=-N}^{N} \overline{a_{-n}}\, z^{-n}.$$
(The adjoint of a real number is the number itself, of a complex number its conjugate, of a real matrix the transposed matrix, of a complex matrix its Hermitian adjoint.)
Multiplication is polynomial multiplication, which is equivalent to the convolution of the coefficient sequences.
It follows that
$$P_k[f](x) = \sum_{n \in \mathbb{Z}} s^{(k)}_n\, \phi(2^k x - n)$$
is the orthogonal projection of the original signal $f$, or at least of the first approximation $P_J[f](x)$, onto the subspace $V_k$, that is, with sampling rate of $2^k$ per unit interval. The difference to the first approximation is given by
$$P_J[f](x) = P_M[f](x) + D_M[f](x) + \cdots + D_{J-1}[f](x),$$
where the difference or detail signals are computed from the detail coefficients as
$$D_k[f](x) = \sum_{n \in \mathbb{Z}} d^{(k)}_n\, \psi(2^k x - n),$$
with $\psi$ denoting the mother wavelet of the wavelet transform.
Inverse DWT
Given the coefficient sequence $s^{(M)}$ for some $M < J$ and all the difference sequences $d^{(k)}$, $k = M, \ldots, J-1$, one computes recursively
$$s^{(k+1)}_n = \sum_{m \in \mathbb{Z}} a_{n-2m}\, s^{(k)}_m + \sum_{m \in \mathbb{Z}} b_{n-2m}\, d^{(k)}_m \qquad \text{or} \qquad s^{(k+1)} = a * \big(s^{(k)} \uparrow 2\big) + b * \big(d^{(k)} \uparrow 2\big)$$
for $k = M, M+1, \ldots, J-1$ and all $n \in \mathbb{Z}$. In the Z-transform notation:
$$s^{(k+1)}(z) = a(z)\,(\uparrow 2)\big(s^{(k)}(z)\big) + b(z)\,(\uparrow 2)\big(d^{(k)}(z)\big).$$
The upsampling operator $(\uparrow 2)$ creates zero-filled holes inside a given sequence: every second element of the resulting sequence is an element of the given sequence, and every other second element is zero,
$$(\uparrow 2)\!\left(\sum_n c_n\, z^{-n}\right) = \sum_n c_n\, z^{-2n}.$$
This linear operator is, in the Hilbert space $\ell^2(\mathbb{Z})$, the adjoint to the downsampling operator $(\downarrow 2)$.
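For the Haar case the recursions above reduce to a particularly simple form. The sketch below (an added illustration, assuming the unnormalized Haar sequences $a = (1, 1)$ and $b = (1, -1)$, which are real, so the analysis filters are just their time reversals) implements one forward step with the $1/2$ normalization and the corresponding inverse step, which reconstructs the input exactly.

public class HaarFWT {
    // One forward step: s_k[n] = (s[2n] + s[2n+1]) / 2 and
    //                   d_k[n] = (s[2n] - s[2n+1]) / 2,
    // the 1/2-normalized filter-bank step for a = (1, 1), b = (1, -1).
    static double[][] analyze(double[] s) {
        int h = s.length / 2;
        double[] approx = new double[h], detail = new double[h];
        for (int n = 0; n < h; n++) {
            approx[n] = (s[2 * n] + s[2 * n + 1]) / 2;
            detail[n] = (s[2 * n] - s[2 * n + 1]) / 2;
        }
        return new double[][]{approx, detail};
    }

    // One inverse step: upsample both sequences and filter with a and b;
    // for Haar this reduces to s[2n] = approx[n] + detail[n] and
    // s[2n+1] = approx[n] - detail[n], giving exact reconstruction.
    static double[] synthesize(double[] approx, double[] detail) {
        double[] s = new double[2 * approx.length];
        for (int n = 0; n < approx.length; n++) {
            s[2 * n] = approx[n] + detail[n];
            s[2 * n + 1] = approx[n] - detail[n];
        }
        return s;
    }
}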
References
A.N. Akansu Multiplierless Suboptimal PR-QMF Design Proc. SPIE 1818, Visual Communications and Image
Processing, p. 723, November, 1992
A.N. Akansu Multiplierless 2-band Perfect Reconstruction Quadrature Mirror Filter (PR-QMF) Banks US Patent
5,420,891, 1995
A.N. Akansu Multiplierless PR Quadrature Mirror Filters for Subband Image Coding IEEE Trans. Image
Processing, p. 1359, September 1996
M.J. Mohlenkamp, M.C. Pereyra Wavelets, Their Friends, and What They Can Do for You (2008 EMS) p. 38
B.B. Hubbard The World According to Wavelets: The Story of a Mathematical Technique in the Making (1998
Peters) p. 184
S.G. Mallat A Wavelet Tour of Signal Processing (1999 Academic Press) p. 255
A. Teolis, Computational Signal Processing with Wavelets (1998 Birkhäuser) p. 116
Y. Nievergelt Wavelets Made Easy (1999 Springer) p. 95
Further reading
G. Beylkin, R. Coifman, V. Rokhlin, "Fast wavelet transforms and numerical algorithms", Comm. Pure Appl. Math., 44 (1991) pp. 141-183
Haar wavelet
[Figure: The Haar wavelet]
In mathematics, the Haar wavelet is a sequence of rescaled
"square-shaped" functions which together form a wavelet family or
basis. Wavelet analysis is similar to Fourier analysis in that it allows a
target function over an interval to be represented in terms of an
orthonormal function basis. The Haar sequence is now recognised as
the first known wavelet basis and extensively used as a teaching
example.
The Haar sequence was proposed in 1909 by Alfréd Haar.[1] Haar used these functions to give an example of an orthonormal system for the space of square-integrable functions on the unit interval [0, 1]. The study of wavelets, and even the term "wavelet", did not come until much later. As a special case of the Daubechies wavelet, the Haar wavelet is also known as D2.
The Haar wavelet is also the simplest possible wavelet. The technical disadvantage of the Haar wavelet is that it is
not continuous, and therefore not differentiable. This property can, however, be an advantage for the analysis of
signals with sudden transitions, such as monitoring of tool failure in machines.
The Haar wavelet's mother wavelet function $\psi(t)$ can be described as
$$\psi(t) = \begin{cases} 1 & 0 \le t < \tfrac{1}{2}, \\ -1 & \tfrac{1}{2} \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$
Its scaling function $\phi(t)$ can be described as
$$\phi(t) = \begin{cases} 1 & 0 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$
Haar functions and Haar system
For every pair $n, k$ of integers in $\mathbb{Z}$, the Haar function $\psi_{n,k}$ is defined on the real line $\mathbb{R}$ by the formula
$$\psi_{n,k}(t) = 2^{n/2}\, \psi(2^n t - k), \quad t \in \mathbb{R}.$$
This function is supported on the right-open interval $I_{n,k} = [\,k\,2^{-n},\, (k+1)\,2^{-n})$, i.e., it vanishes outside that interval. It has integral 0 and norm 1 in the Hilbert space $L^2(\mathbb{R})$,
$$\int_{\mathbb{R}} \psi_{n,k}(t)\,dt = 0, \qquad \|\psi_{n,k}\|_{L^2(\mathbb{R})}^2 = \int_{\mathbb{R}} \psi_{n,k}(t)^2\,dt = 1.$$
The Haar functions are pairwise orthogonal,
$$\int_{\mathbb{R}} \psi_{n_1,k_1}(t)\,\psi_{n_2,k_2}(t)\,dt = \delta_{n_1 n_2}\,\delta_{k_1 k_2},$$
where $\delta_{ij}$ represents the Kronecker delta. Here is the reason for orthogonality: when the two supporting intervals $I_{n_1,k_1}$ and $I_{n_2,k_2}$ are not equal, then they are either disjoint, or else the smaller of the two supports, say $I_{n_1,k_1}$, is contained in the lower or in the upper half of the other interval, on which the function $\psi_{n_2,k_2}$ remains constant. It follows in this case that the product of these two Haar functions is a multiple of the first Haar function, hence the product has integral 0.
The Haar system on the real line is the set of functions
$$\{\psi_{n,k}(t) \,:\, n, k \in \mathbb{Z}\}.$$
It is complete in $L^2(\mathbb{R})$: the Haar system on the line is an orthonormal basis in $L^2(\mathbb{R})$.
Haar wavelet properties
The Haar wavelet has several notable properties:
1. Any continuous real function with compact support can be approximated uniformly by linear combinations of $\phi(t), \phi(2t), \phi(4t), \ldots, \phi(2^n t), \ldots$ and their shifted functions. This extends to those function spaces where any function therein can be approximated by continuous functions.
2. Any continuous real function on [0, 1] can be approximated uniformly on [0, 1] by linear combinations of the constant function 1, $\phi(t), \phi(2t), \ldots, \phi(2^n t), \ldots$ and their shifted functions.[2]
3. Orthogonality in the form
$$\int_{-\infty}^{\infty} 2^{(n+n_1)/2}\, \psi(2^n t - k)\, \psi(2^{n_1} t - k_1)\, dt = \delta_{n n_1}\, \delta_{k k_1}.$$
Here $\delta_{ij}$ represents the Kronecker delta. The dual function of $\psi(t)$ is $\psi(t)$ itself.
4. Wavelet/scaling functions with different scale $n$ have a functional relationship: since
$$\phi(t) = \phi(2t) + \phi(2t - 1), \qquad \psi(t) = \phi(2t) - \phi(2t - 1),$$
it follows that coefficients of scale $n$ can be calculated by coefficients of scale $n + 1$:
If $\chi_w(k, n) = 2^{n/2} \int_{-\infty}^{\infty} x(t)\, \phi(2^n t - k)\, dt$
and $X_w(k, n) = 2^{n/2} \int_{-\infty}^{\infty} x(t)\, \psi(2^n t - k)\, dt$
then
$$\chi_w(k, n) = 2^{-1/2}\big(\chi_w(2k, n+1) + \chi_w(2k+1, n+1)\big),$$
$$X_w(k, n) = 2^{-1/2}\big(\chi_w(2k, n+1) - \chi_w(2k+1, n+1)\big).$$
Haar system on the unit interval and related systems
In this section, the discussion is restricted to the unit interval [0, 1] and to the Haar functions that are supported on [0, 1]. The system of functions considered by Haar in 1910,[3] called the Haar system on [0, 1] in this article, consists of the subset of Haar wavelets defined as
$$\{t \mapsto \psi_{n,k}(t) \,:\, n \in \mathbb{N} \cup \{0\},\ 0 \le k < 2^n\},$$
with the addition of the constant function 1 on [0, 1].
In Hilbert space terms, this Haar system on [0, 1] is a complete orthonormal system, i.e., an orthonormal basis, for the space $L^2([0, 1])$ of square integrable functions on the unit interval.
The Haar system on [0, 1], with the constant function 1 as first element, followed by the Haar functions ordered according to the lexicographic ordering of couples $(n, k)$, is further a monotone Schauder basis for the space $L^p([0, 1])$ when $1 \le p < \infty$.[4] This basis is unconditional when $1 < p < \infty$.[5]
There is a related Rademacher system, consisting of sums of Haar functions:
r_n(t) = 2^{-n/2} \sum_{k=0}^{2^n - 1} \psi_{n,k}(t), \quad t \in [0,1],\ n \ge 0.
Notice that |r_n(t)| = 1 on [0,1). This is an orthonormal system but it is not complete. In the language of probability theory, the Rademacher sequence is an instance of a sequence of independent Bernoulli random variables with mean 0. The Khintchine inequality expresses the fact that in all the spaces L^p([0,1]), 1 ≤ p < ∞, the Rademacher sequence is equivalent to the unit vector basis in ℓ^2.[6] In particular, the closed linear span of the Rademacher sequence in L^p([0,1]), 1 ≤ p < ∞, is isomorphic to ℓ^2.
The Faber–Schauder system
The Faber–Schauder system[7][8] is the family of continuous functions on [0,1] consisting of the constant function 1, and of multiples of indefinite integrals of the functions in the Haar system on [0,1], chosen to have norm 1 in the maximum norm. This system begins with s_0 = 1, then s_1(t) = t is the indefinite integral vanishing at 0 of the function 1, first element of the Haar system on [0,1]. Next, for every integer n ≥ 0, functions s_{n,k} are defined by the formula
s_{n,k}(t) = 2^{1 + n/2} \int_0^t \psi_{n,k}(u)\, du, \quad t \in [0,1],\ 0 \le k < 2^n.
These functions s_{n,k} are continuous, piecewise linear, and supported by the interval I_{n,k} that also supports ψ_{n,k}. The function s_{n,k} is equal to 1 at the midpoint x_{n,k} of the interval I_{n,k}, and linear on both halves of that interval. It takes values between 0 and 1 everywhere.
The Faber–Schauder system is a Schauder basis for the space C([0,1]) of continuous functions on [0,1]. For every f in C([0,1]), the partial sum f_n of the series expansion of f in the Faber–Schauder system is the continuous piecewise linear function that agrees with f at the 2^n + 1 points k 2^{-n}, where 0 ≤ k ≤ 2^n. Next, the formula
f_{n+1} = f_n + \sum_{k=0}^{2^n - 1} \bigl( f(x_{n,k}) - f_n(x_{n,k}) \bigr)\, s_{n,k}
gives a way to compute the expansion of f step by step. Since f is uniformly continuous, the sequence {f_n} converges uniformly to f. It follows that the Faber–Schauder series expansion of f converges in C([0,1]), and the sum of this series is equal to f.
The Franklin system
The Franklin system is obtained from the Faber–Schauder system by the Gram–Schmidt orthonormalization procedure.[9][10] Since the Franklin system has the same linear span as that of the Faber–Schauder system, this span is dense in C([0,1]), hence in L^2([0,1]). The Franklin system is therefore an orthonormal basis for L^2([0,1]), consisting of continuous piecewise linear functions. P. Franklin proved in 1928 that this system is a Schauder basis for C([0,1]).[11] The Franklin system is also an unconditional basis for the space L^p([0,1]) when 1 < p < ∞.[12] The Franklin system provides a Schauder basis in the disk algebra A(D). This was proved in 1974 by Bočkarev, after the existence of a basis for the disk algebra had remained open for more than forty years.[13]
[13]
Bočkarev's construction of a Schauder basis in A(D) goes as follows: let f be a complex-valued Lipschitz function on [0, π]; then f is the sum of a cosine series with absolutely summable coefficients. Let T(f) be the element of A(D) defined by the complex power series with the same coefficients:
f(x) = \sum_{n \ge 0} a_n \cos(n x) \ \longrightarrow \ T(f)(z) = \sum_{n \ge 0} a_n z^n, \quad |z| \le 1.
Bočkarev's basis for A(D) is formed by the images under T of the functions in the Franklin system on [0, π]. Bočkarev's equivalent description for the mapping T starts by extending f to an even Lipschitz function g_1 on [−π, π], identified with a Lipschitz function on the unit circle T. Next, let g_2 be the conjugate function of g_1, and define T(f) to be the function in A(D) whose value on the boundary T of D is equal to g_1 + i g_2.
When dealing with 1-periodic continuous functions, or rather with continuous functions f on [0,1] such that f(0) = f(1), one removes the function s_1(t) = t from the Faber–Schauder system, in order to obtain the periodic Faber–Schauder system. The periodic Franklin system is obtained by orthonormalization from the periodic Faber–Schauder system.[14] One can prove Bočkarev's result on A(D) by proving that the periodic Franklin system on [0, 2π] is a basis for a Banach space A_r isomorphic to A(D). The space A_r consists of complex continuous functions on the unit circle T whose conjugate function is also continuous.
Haar matrix
The 2×2 Haar matrix that is associated with the Haar wavelet is
H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
Using the discrete wavelet transform, one can transform any sequence (a_0, a_1, \ldots, a_{2n}, a_{2n+1}) of even length into a sequence of two-component vectors ((a_0, a_1), \ldots, (a_{2n}, a_{2n+1})). If one right-multiplies each vector with the matrix H_2, one gets the result ((s_0, d_0), \ldots, (s_n, d_n)) of one stage of the fast Haar-wavelet transform. Usually one separates the sequences s and d and continues with transforming the sequence s. Sequence s is often referred to as the averages part, whereas d is known as the details part.
If one has a sequence of length a multiple of four, one can build blocks of 4 elements and transform them in a similar manner with the 4×4 Haar matrix
H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix},
which combines two stages of the fast Haar-wavelet transform.
Compare with a Walsh matrix, which is a non-localized 1/−1 matrix.
Generally, the 2N×2N Haar matrix can be derived by the following equation:
H_{2N} = \begin{bmatrix} H_N \otimes [1, 1] \\ I_N \otimes [1, -1] \end{bmatrix}
where I_N is the N×N identity matrix and ⊗ is the Kronecker product.
The Kronecker product A ⊗ B, where A is an m×n matrix and B is a p×q matrix, is the mp×nq block matrix
A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{bmatrix}.
An un-normalized 8-point Haar matrix is shown below:
H_8 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix}.
Note that the above matrix is an un-normalized Haar matrix; the Haar matrix required by the Haar transform should be normalized.
From the definition of the Haar matrix H, one can observe that, unlike the Fourier transform, H has only real elements (i.e., 1, −1 or 0) and is non-symmetric.
Take the 8-point Haar matrix H_8 as an example. The first row of H_8 measures the average value, and the second row measures a low-frequency component of the input vector. The next two rows are sensitive to the first and second half of the input vector respectively, which corresponds to moderate-frequency components. The remaining four rows are sensitive to the four quarters of the input vector, which corresponds to high-frequency components.
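The Kronecker-product recursion above is straightforward to implement. The following Python sketch (assuming NumPy; the helper name is an illustrative choice) builds the un-normalized Haar matrix and reproduces the worked 4-point example given in the Haar transform section below:

    import numpy as np

    def haar_matrix(n):
        # Un-normalized n x n Haar matrix (n a power of 2), built from H_2
        # by the recursion H_2N = [H_N kron (1,1); I_N kron (1,-1)].
        assert n >= 2 and (n & (n - 1)) == 0, "n must be a power of 2"
        h = np.array([[1.0, 1.0], [1.0, -1.0]])
        while h.shape[0] < n:
            m = h.shape[0]
            h = np.vstack([np.kron(h, [1.0, 1.0]),            # coarser averages
                           np.kron(np.eye(m), [1.0, -1.0])])  # local differences
        return h

    H = haar_matrix(4)
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)  # normalize rows -> orthogonal matrix
    y = Hn @ np.array([1.0, 2.0, 3.0, 4.0])            # forward: [5, -2, -1/sqrt(2), -1/sqrt(2)]
    x = Hn.T @ y                                       # inverse transform recovers the input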
Haar transform
The Haar transform is the simplest of the wavelet transforms. This transform cross-multiplies a function against the Haar wavelet with various shifts and stretches, like the Fourier transform cross-multiplies a function against a sine wave with two phases and many stretches.[15]
Introduction
The Haar transform is one of the oldest transform functions, proposed in 1910 by the Hungarian mathematician Alfréd Haar. It is found effective in applications such as signal and image compression in electrical and computer engineering as it provides a simple and computationally efficient approach for analysing the local aspects of a signal.
The Haar transform is derived from the Haar matrix. An example of a 4×4 Haar transformation matrix is shown below:
H_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{bmatrix}.
The Haar transform can be thought of as a sampling process in which rows of the transformation matrix act as
samples of finer and finer resolution.
Compare with the Walsh transform, which is also 1/−1, but is non-localized.
Property
The Haar transform has the following properties:
1. No need for multiplications. It requires only additions, and there are many elements with zero value in the Haar matrix, so the computation time is short. It is faster than the Walsh transform, whose matrix is composed of +1 and −1.
2. Input and output length are the same. However, the length should be a power of 2, i.e. N = 2^k, k ∈ N.
3. It can be used to analyse the localized features of signals. Due to the orthogonal property of the Haar function, the frequency components of the input signal can be analyzed.
Haar transform and Inverse Haar transform
The Haar transform y_n of an n-input function x_n is
y_n = H_n x_n.
The Haar transform matrix is real and orthogonal. Thus, the inverse Haar transform can be derived by the following equations:
H = H^*, \quad H^{-1} = H^T, \quad \text{i.e. } H H^T = I,
where I is the identity matrix. For example, when n = 4, H_4 H_4^T = I_4. Thus, the inverse Haar transform is
x_n = H^T y_n.
Example
The Haar transform coefficients of an n = 4-point signal x_4 = [1, 2, 3, 4]^T can be found as
y_4 = H_4 x_4 = [5, -2, -1/\sqrt{2}, -1/\sqrt{2}]^T.
The input signal can then be reconstructed by the inverse Haar transform:
x_4 = H_4^T y_4 = [1, 2, 3, 4]^T.
Application
Modern cameras are capable of producing images with resolutions in the range of tens of megapixels. These images
need to be compressed before storage and transfer. The Haar transform can be used for image compression. The
basic idea is to transfer the image into a matrix in which each element of the matrix represents a pixel in the image. For example, a 256×256 matrix is saved for a 256×256 image. JPEG image compression involves cutting the original image into 8×8 sub-images. Each sub-image is an 8×8 matrix.
The 2-D Haar transform is required. The equation of the 2-D Haar transform is B = H A H^T, where A is an n×n matrix (the sub-image) and H is the n-point Haar transform matrix. The inverse 2-D Haar transform is A = H^T B H.
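A minimal sketch of this 2-D transform on a single 8×8 block, reusing the haar_matrix helper from the earlier sketch (an assumption of this illustration):

    import numpy as np

    H = haar_matrix(8)                                # from the earlier sketch
    H = H / np.linalg.norm(H, axis=1, keepdims=True)  # orthonormal rows
    A = np.arange(64, dtype=float).reshape(8, 8)      # stand-in for an 8x8 image block
    B = H @ A @ H.T                                   # 2-D Haar transform of the block
    A_rec = H.T @ B @ H                               # inverse transform recovers the block
    assert np.allclose(A, A_rec)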
Notes
[1] see p. 361 in .
[2] As opposed to the preceding statement, this fact is not obvious: see p. 363 in .
[3] p. 361 in
[4] see p. 3 in J. Lindenstrauss, L. Tzafriri, (1977), "Classical Banach Spaces I, Sequence Spaces", Ergebnisse der Mathematik und ihrer Grenzgebiete 92, Berlin: Springer-Verlag, ISBN 3-540-08072-4.
[5] The result is due to R. E. Paley, A remarkable series of orthogonal functions (I), Proc. London Math. Soc. 34 (1931) pp. 241–264. See also p. 155 in J. Lindenstrauss, L. Tzafriri, (1979), "Classical Banach spaces II, Function spaces". Ergebnisse der Mathematik und ihrer Grenzgebiete 97, Berlin: Springer-Verlag, ISBN 3-540-08888-1.
[6] see for example p. 66 in J. Lindenstrauss, L. Tzafriri, (1977), "Classical Banach Spaces I, Sequence Spaces", Ergebnisse der Mathematik und ihrer Grenzgebiete 92, Berlin: Springer-Verlag, ISBN 3-540-08072-4.
[7] Faber, Georg (1910), "Über die Orthogonalfunktionen des Herrn Haar", Deutsche Math.-Ver (in German) 19: 104–112. ISSN 0012-0456; http://www-gdz.sub.uni-goettingen.de/cgi-bin/digbib.cgi?PPN37721857X ; http://resolver.sub.uni-goettingen.de/purl?GDZPPN002122553
[8] Schauder, Juliusz (1928), "Eine Eigenschaft des Haarschen Orthogonalsystems", Mathematische Zeitschrift 28: 317–320.
[9] see Z. Ciesielski, Properties of the orthonormal Franklin system. Studia Math. 23 (1963), 141–157.
[10] Franklin system. B.I. Golubov (originator), Encyclopedia of Mathematics. URL: http://www.encyclopediaofmath.org/index.php?title=Franklin_system&oldid=16655
[11] Philip Franklin, A set of continuous orthogonal functions, Math. Ann. 100 (1928), 522–529.
[12] S. V. Bočkarev, Existence of a basis in the space of functions analytic in the disc, and some properties of Franklin's system. Mat. Sb. 95 (1974), 3–18 (Russian). Translated in Math. USSR-Sb. 24 (1974), 1–16.
[13] The question appears p. 238, §3 in Banach's book, . The disk algebra A(D) appears as Example 10, p. 12 in Banach's book.
[14] See p. 161, III.D.20 and p. 192, III.E.17 in
[15] The Haar Transform (http://sepwww.stanford.edu/public/docs/sep75/ray2/paper_html/node4.html)
References
Haar, Alfréd (1910), "Zur Theorie der orthogonalen Funktionensysteme", Mathematische Annalen 69 (3): 331–371, doi: 10.1007/BF01456326 (http://dx.doi.org/10.1007/BF01456326)
Charles K. Chui, An Introduction to Wavelets, (1992), Academic Press, San Diego, ISBN 0-585-47090-1
English translation of Haar's seminal article: https://www.uni-hohenheim.de/~gzim/Publications/haar.pdf
External links
Hazewinkel, Michiel, ed. (2001), "Haar system" (http://www.encyclopediaofmath.org/index.php?title=p/h046070), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
Free Haar wavelet filtering implementation and interactive demo (http://www.tomgibara.com/computer-vision/haar-wavelet)
Free Haar wavelet denoising and lossy signal compression (http://packages.debian.org/wzip)
Filtering
Digital filter
A general finite impulse response filter with n stages, each with an independent delay, d_i, and amplification gain, a_i.
In signal processing, a digital filter is a system that
performs mathematical operations on a sampled,
discrete-time signal to reduce or enhance certain
aspects of that signal. This is in contrast to the other
major type of electronic filter, the analog filter, which
is an electronic circuit operating on continuous-time
analog signals.
A digital filter system usually consists of an analog-to-digital converter (ADC) to sample the input signal, followed by a microprocessor and some peripheral components such as memory to store data and filter coefficients, and finally a digital-to-analog converter to complete the output stage. Program instructions
(software) running on the microprocessor implement
the digital filter by performing the necessary
mathematical operations on the numbers received from the ADC. In some high performance applications, an FPGA
or ASIC is used instead of a general purpose microprocessor, or a specialized DSP with specific paralleled
architecture for expediting operations such as filtering.
Digital filters may be more expensive than an equivalent analog filter due to their increased complexity, but they
make practical many designs that are impractical or impossible as analog filters. When used in the context of
real-time analog systems, digital filters sometimes have problematic latency (the difference in time between the input
and the response) due to the associated analog-to-digital and digital-to-analog conversions and anti-aliasing filters, or
due to other delays in their implementation.
Digital filters are commonplace and an essential element of everyday electronics such as radios, cellphones, and AV
receivers.
Characterization
A digital filter is characterized by its transfer function, or equivalently, its difference equation. Mathematical
analysis of the transfer function can describe how it will respond to any input. As such, designing a filter consists of
developing specifications appropriate to the problem (for example, a second-order low pass filter with a specific
cut-off frequency), and then producing a transfer function which meets the specifications.
The transfer function for a linear, time-invariant, digital filter can be expressed as a transfer function in the Z-domain; if it is causal, then it has the form:
H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_N z^{-N}}{1 + a_1 z^{-1} + \cdots + a_M z^{-M}}
where the order of the filter is the greater of N or M. See Z-transform's LCCD equation for further discussion of this transfer function.
This is the form for a recursive filter, with both the inputs (numerator) and outputs (denominator), which typically leads to infinite impulse response (IIR) behaviour; but if the denominator is made equal to unity, i.e. no feedback, then this becomes a finite impulse response (FIR) filter.
Analysis techniques
A variety of mathematical techniques may be employed to analyze the behaviour of a given digital filter. Many of
these analysis techniques may also be employed in designs, and often form the basis of a filter specification.
Typically, one characterizes filters by calculating how they will respond to a simple input such as an impulse. One
can then extend this information to compute the filter's response to more complex signals.
Impulse response
The impulse response, often denoted h[k] or h_k, is a measurement of how a filter will respond to the Kronecker delta function. For example, given a difference equation, one would set x[0] = 1 and x[n] = 0 for n ≠ 0 and evaluate. The impulse response is a characterization of the filter's behaviour. Digital filters are typically considered in two categories: infinite impulse response (IIR) and finite impulse response (FIR). In the case of linear time-invariant FIR filters, the impulse response is exactly equal to the sequence of filter coefficients:
y[n] = \sum_{k=0}^{N} b_k\, x[n-k] \quad \Rightarrow \quad h[n] = b_n, \ \ 0 \le n \le N.
IIR filters on the other hand are recursive, with the output depending on both current and previous inputs as well as previous outputs. The general form of an IIR filter is thus:
y[n] = \frac{1}{a_0} \left( \sum_{k=0}^{N} b_k\, x[n-k] - \sum_{k=1}^{M} a_k\, y[n-k] \right).
Plotting the impulse response will reveal how a filter will respond to a sudden, momentary disturbance.
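For instance, a short Python sketch of this measurement (assuming SciPy; the coefficients are arbitrary illustrative values): feeding a Kronecker delta through the filter returns its impulse response, which for an FIR filter is simply the coefficient sequence:

    import numpy as np
    from scipy import signal

    b = [0.5, 0.3, 0.2]              # feed-forward coefficients (illustrative)
    a = [1.0]                        # no feedback, so the filter is FIR
    delta = np.zeros(8)
    delta[0] = 1.0                   # Kronecker delta input
    h = signal.lfilter(b, a, delta)  # impulse response
    print(h)                         # [0.5 0.3 0.2 0. 0. 0. 0. 0.]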
Difference equation
In discrete-time systems, the digital filter is often implemented by converting the transfer function to a linear
constant-coefficient difference equation (LCCD) via the Z-transform. The discrete frequency-domain transfer
function is written as the ratio of two polynomials. For example:
H(z) = \frac{(z+1)^2}{(z - \tfrac{1}{2})(z + \tfrac{3}{4})}
This is expanded:
H(z) = \frac{z^2 + 2z + 1}{z^2 + \tfrac{1}{4} z - \tfrac{3}{8}}
and to make the corresponding filter causal, the numerator and denominator are divided by the highest order of z:
H(z) = \frac{1 + 2z^{-1} + z^{-2}}{1 + \tfrac{1}{4} z^{-1} - \tfrac{3}{8} z^{-2}} = \frac{Y(z)}{X(z)}.
The coefficients of the denominator, a_k, are the 'feed-backward' coefficients and the coefficients of the numerator are the 'feed-forward' coefficients, b_k. The resultant linear difference equation is:
y[n] = -\sum_{k=1}^{M} a_k\, y[n-k] + \sum_{k=0}^{N} b_k\, x[n-k]
or, for the example above:
Y(z) \left(1 + \tfrac{1}{4} z^{-1} - \tfrac{3}{8} z^{-2}\right) = X(z) \left(1 + 2 z^{-1} + z^{-2}\right)
rearranging terms:
Y(z) = -\left(\tfrac{1}{4} z^{-1} - \tfrac{3}{8} z^{-2}\right) Y(z) + \left(1 + 2 z^{-1} + z^{-2}\right) X(z)
then by taking the inverse z-transform and solving for y[n]:
y[n] = -\tfrac{1}{4}\, y[n-1] + \tfrac{3}{8}\, y[n-2] + x[n] + 2 x[n-1] + x[n-2]
This equation shows how to compute the next output sample, y[n], in terms of the past outputs y[n−1], y[n−2], the present input x[n], and the past inputs x[n−1], x[n−2]. Applying the filter to an input in this form is equivalent to a Direct Form I or II realization, depending on the exact order of evaluation.
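A direct Python rendering of this recurrence (a sketch, using the example coefficients derived above):

    def df1_example(x):
        # y[n] = -(1/4) y[n-1] + (3/8) y[n-2] + x[n] + 2 x[n-1] + x[n-2]
        b = (1.0, 2.0, 1.0)      # feed-forward coefficients
        a = (1.0, 0.25, -0.375)  # feed-backward coefficients (a0 = 1)
        y = []
        for n in range(len(x)):
            xv = lambda k: x[n - k] if n - k >= 0 else 0.0
            yv = lambda k: y[n - k] if n - k >= 0 else 0.0
            acc = b[0] * xv(0) + b[1] * xv(1) + b[2] * xv(2)
            acc -= a[1] * yv(1) + a[2] * yv(2)
            y.append(acc / a[0])
        return y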
Filter design
Main article: Filter design
The design of digital filters is a deceptively complex topic.[1] Although filters are easily understood and calculated, the practical challenges of their design and implementation are significant and are the subject of much advanced research.
There are two categories of digital filter: the recursive filter and the nonrecursive filter. These are often referred to as infinite impulse response (IIR) filters and finite impulse response (FIR) filters, respectively.[2]
Filter realization
After a filter is designed, it must be realized by developing a signal flow diagram that describes the filter in terms of
operations on sample sequences.
A given transfer function may be realized in many ways. Consider how a simple expression such as ax + bx + cx could be evaluated; one could also compute the equivalent x(a + b + c). In the same way, all realizations may
be seen as "factorizations" of the same transfer function, but different realizations will have different numerical
properties. Specifically, some realizations are more efficient in terms of the number of operations or storage
elements required for their implementation, and others provide advantages such as improved numerical stability and
reduced round-off error. Some structures are better for fixed-point arithmetic and others may be better for
floating-point arithmetic.
Direct Form I
A straightforward approach for IIR filter realization is Direct Form I, where the difference equation is evaluated directly. This form is practical for small filters, but may be inefficient and impractical (numerically unstable) for complex designs.[3] In general, this form requires 2N delay elements (for both input and output signals) for a filter of order N.
Direct Form II
The alternate Direct Form II only needs N delay units, where N is the order of the filter, potentially half as many as Direct Form I. This structure is obtained by reversing the order of the numerator and denominator sections of Direct Form I, since they are in fact two linear systems, and the commutativity property applies. Then, one will notice that there are two columns of delays (z^{-1}) that tap off the center net, and these can be combined since they are redundant, yielding the implementation as shown below.
The disadvantage is that Direct Form II increases the possibility of arithmetic overflow for filters of high Q or resonance.[4] It has been shown that as Q increases, the round-off noise of both direct form topologies increases without bounds.[5] This is because, conceptually, the signal is first passed through an all-pole filter (which normally boosts gain at the resonant frequencies) before the result of that is saturated, then passed through an all-zero filter (which often attenuates much of what the all-pole half amplifies).
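As an illustrative sketch, a single second-order Direct Form II section needs only two shared state variables, and the signal passes through the all-pole half before the all-zero half:

    def biquad_df2(x, b, a):
        # Direct Form II biquad: b = (b0, b1, b2), a = (1, a1, a2).
        w1 = w2 = 0.0                                    # the two (shared) delay elements
        y = []
        for xn in x:
            w0 = xn - a[1] * w1 - a[2] * w2              # all-pole half first
            y.append(b[0] * w0 + b[1] * w1 + b[2] * w2)  # then all-zero half
            w2, w1 = w1, w0                              # shift the delay line
        return y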
Cascaded second-order sections
A common strategy is to realize a higher-order (greater than 2) digital filter as a cascaded series of second-order "biquadratic" (or "biquad") sections[6] (see digital biquad filter). The advantage of this strategy is that the coefficient range is limited. Cascading Direct Form II sections results in N delay elements for filters of order N. Cascading Direct Form I sections results in N + 2 delay elements, since the delay elements of the input of any section (except the first section) are redundant with the delay elements of the output of the preceding section.
Other forms
Other forms include:
Direct Form I and II transpose
Series/cascade lower (typically second) order subsections
Parallel lower (typically second) order subsections
Continued fraction expansion
Lattice and ladder
One-, two- and three-multiply lattice forms
Three- and four-multiply normalized ladder forms
ARMA structures
State-space structures: optimal (in the minimum noise sense), block-optimal and section-optimal, and input balanced with Givens rotation
Coupled forms: Gold–Rader (normal), State Variable (Chamberlin), Kingsbury, Modified State Variable, Zölzer, Modified Zölzer
Wave Digital Filters (WDF)
Agarwal–Burrus (1AB and 2AB)
Harris–Brooking
ND-TDL
Multifeedback
Analog-inspired forms such as Sallen–Key and state variable filters
Systolic arrays
Comparison of analog and digital filters
Digital filters are not subject to the component non-linearities that greatly complicate the design of analog filters.
Analog filters consist of imperfect electronic components, whose values are specified to a limit tolerance (e.g.
resistor values often have a tolerance of 5%) and which may also change with temperature and drift with time. As
the order of an analog filter increases, and thus its component count, the effect of variable component errors is
greatly magnified. In digital filters, the coefficient values are stored in computer memory, making them far more
stable and predictable.[7]
Because the coefficients of digital filters are definite, they can be used to achieve much more complex and selective designs: specifically, with digital filters one can achieve a lower passband ripple, faster transition, and higher stopband attenuation than is practical with analog filters. Even if the design could be achieved using analog filters, the engineering cost of designing an equivalent digital filter would likely be much lower. Furthermore, one can readily modify the coefficients of a digital filter to make an adaptive filter or a user-controllable parametric filter. While these techniques are possible in an analog filter, they are again considerably more difficult.
Digital filters can be used in the design of finite impulse response filters. Analog filters do not have the same
capability, because finite impulse response filters require delay elements.
Digital filters rely less on analog circuitry, potentially allowing for a better signal-to-noise ratio. A digital filter will
introduce noise to a signal during analog low pass filtering, analog to digital conversion, digital to analog conversion
and may introduce digital noise due to quantization. With analog filters, every component is a source of thermal
noise (such as Johnson noise), so as the filter complexity grows, so does the noise.
However, digital filters do introduce a higher fundamental latency to the system. In an analog filter, latency is often
negligible; strictly speaking it is the time for an electrical signal to propagate through the filter circuit. In digital
systems, latency is introduced by delay elements in the digital signal path, and by analog-to-digital and
digital-to-analog converters that enable the system to process analog signals.
In very simple cases, it is more cost effective to use an analog filter. Introducing a digital filter requires considerable
overhead circuitry, as previously discussed, including two low pass analog filters.
Types of digital filters
Many digital filters are based on the fast Fourier transform, a mathematical algorithm that quickly extracts the
frequency spectrum of a signal, allowing the spectrum to be manipulated (such as to create band-pass filters) before
converting the modified spectrum back into a time-series signal.
Another form of a digital filter is that of a state-space model. A well used state-space filter is the Kalman filter
published by Rudolf Kalman in 1960.
Traditional linear filters are usually based on attenuation. Alternatively, nonlinear filters can be designed, including energy transfer filters,[8] which allow the user to move energy in a designed way, so that unwanted noise or effects can be moved to new frequency bands, either lower or higher in frequency, spread over a range of frequencies, split, or focused. Energy transfer filters complement traditional filter designs and introduce many more degrees of freedom in filter design. Digital energy transfer filters are relatively easy to design and to implement, and exploit nonlinear dynamics.
References
General
A. Antoniou, Digital Filters: Analysis, Design, and Applications, New York, NY: McGraw-Hill, 1993.
J. O. Smith III, Introduction to Digital Filters with Audio Applications [9], Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, September 2007 Edition.
S.K. Mitra, Digital Signal Processing: A Computer-Based Approach, New York, NY: McGraw-Hill, 1998.
A.V. Oppenheim and R.W. Schafer, Discrete-Time Signal Processing, Upper Saddle River, NJ: Prentice-Hall, 1999.
J.F. Kaiser, Nonrecursive Digital Filter Design Using the I0-sinh Window Function, Proc. 1974 IEEE Int. Symp. Circuit Theory, pp. 20–23, 1974.
S.W.A. Bergen and A. Antoniou, Design of Nonrecursive Digital Filters Using the Ultraspherical Window Function, EURASIP Journal on Applied Signal Processing, vol. 2005, no. 12, pp. 1910–1922, 2005.
T.W. Parks and J.H. McClellan, Chebyshev Approximation for Nonrecursive Digital Filters with Linear Phase [10], IEEE Trans. Circuit Theory, vol. CT-19, pp. 189–194, Mar. 1972.
L. R. Rabiner, J.H. McClellan, and T.W. Parks, FIR Digital Filter Design Techniques Using Weighted Chebyshev Approximation [11], Proc. IEEE, vol. 63, pp. 595–610, Apr. 1975.
A.G. Deczky, Synthesis of Recursive Digital Filters Using the Minimum p-Error Criterion [12], IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 257–263, Oct. 1972.
Cited
[1] M. E. Valdez, Digital Filters (http://home.mchsi.com/~mikevald/Digfilt.html), 2001.
[2] A. Antoniou, chapter 1
[3] J. O. Smith III, Direct Form I (http://ccrma-www.stanford.edu/~jos/filters/Direct_Form_I.html)
[4] J. O. Smith III, Direct Form II (http://ccrma-www.stanford.edu/~jos/filters/Direct_Form_II.html)
[5] L. B. Jackson, "On the Interaction of Roundoff Noise and Dynamic Range in Digital Filters," Bell Sys. Tech. J., vol. 49 (1970 Feb.), reprinted in Digital Signal Process, L. R. Rabiner and C. M. Rader, Eds. (IEEE Press, New York, 1972).
[6] J. O. Smith III, Series Second Order Sections (http://ccrma-www.stanford.edu/~jos/filters/Series_Second_Order_Sections.html)
[7] http://www.dspguide.com/ch21/1.htm
[8] Billings S.A. "Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains". Wiley, 2013
[9] http://ccrma-www.stanford.edu/~jos/filters/filters.html
[10] http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1083419
[11] http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1451724
[12] http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1162392
External links
WinFilter (http://www.winfilter.20m.com/) Free filter design software
DISPRO (http://www.digitalfilterdesign.com/) Free filter design software
Java demonstration of digital filters (http://www.falstad.com/dfilter/)
IIR Explorer educational software (http://www.terdina.net/iir/iir_explorer.html)
Introduction to Filtering (http://math.fullerton.edu/mathews/c2003/ZTransformFilterMod.html)
Introduction to Digital Filters (http://ccrma.stanford.edu/~jos/filters/filters.html)
Publicly available, very comprehensive lecture notes on Digital Linear Filtering (see bottom of the page) (http://www.cs.tut.fi/~ts/)
Finite impulse response
In signal processing, a finite impulse response (FIR) filter is a filter whose impulse response (or response to any
finite length input) is of finite duration, because it settles to zero in finite time. This is in contrast to infinite impulse
response (IIR) filters, which may have internal feedback and may continue to respond indefinitely (usually
decaying).
The impulse response (that is, the output in response to a Kronecker delta input) of an Nth-order discrete-time FIR
filter lasts exactly N+1 samples (from first nonzero element through last nonzero element) before it then settles to
zero.
FIR filters can be discrete-time or continuous-time, and digital or analog.
Definition
A direct form discrete-time FIR filter of order N. The top part is an N-stage delay line with N + 1 taps. Each unit delay is a z^{-1} operator in Z-transform notation.
A lattice form discrete-time FIR filter of order N. Each unit delay is a z^{-1} operator in Z-transform notation.
For a causal discrete-time FIR filter of order N, each value of the output sequence is a weighted sum of the most recent input values:
y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N] = \sum_{i=0}^{N} b_i\, x[n-i]
where:
x[n] is the input signal,
y[n] is the output signal,
N is the filter order; an Nth-order filter has N + 1 terms on the right-hand side,
b_i is the value of the impulse response at the i'th instant, for 0 ≤ i ≤ N, of an Nth-order FIR filter. If the filter is a direct form FIR filter, then b_i is also a coefficient of the filter.
This computation is also known as discrete convolution.
The x[n−i] in these terms are commonly referred to as taps, based on the structure of a tapped delay line that in many implementations or block diagrams provides the delayed inputs to the multiplication operations. One may speak of a 5th-order/6-tap filter, for instance.
The impulse response of the filter as defined is nonzero over a finite duration. Including zeros, the impulse response is the infinite sequence:
h[n] = \begin{cases} b_n & 0 \le n \le N \\ 0 & \text{otherwise.} \end{cases}
If an FIR filter is non-causal, the range of nonzero values in its impulse response can start before n = 0, with the defining formula appropriately generalized.
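A minimal Python sketch of this convolution sum (the coefficients are illustrative):

    import numpy as np

    def fir_filter(b, x):
        # Causal FIR filter: y[n] = sum_i b[i] * x[n - i].
        y = np.zeros(len(x))
        for n in range(len(x)):
            for i, bi in enumerate(b):
                if n - i >= 0:
                    y[n] += bi * x[n - i]
        return y

    # Equivalently, truncated discrete convolution: np.convolve(x, b)[:len(x)]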
Properties
An FIR filter has a number of useful properties which sometimes make it preferable to an infinite impulse response (IIR) filter. FIR filters:
Require no feedback. This means that any rounding errors are not compounded by summed iterations. The same relative error occurs in each calculation. This also makes implementation simpler.
Are inherently stable, since the output is a sum of a finite number of finite multiples of the input values, so it can be no greater than \sum_i |b_i| times the largest value appearing in the input.
Can easily be designed to be linear phase by making the coefficient sequence symmetric. This property is sometimes desired for phase-sensitive applications, for example data communications, crossover filters, and mastering.
The main disadvantage of FIR filters is that considerably more computation power in a general purpose processor is
required compared to an IIR filter with similar sharpness or selectivity, especially when low frequency (relative to
the sample rate) cutoffs are needed. However, many digital signal processors provide specialized hardware features to make FIR filters approximately as efficient as IIR for many applications.
Frequency response
The filter's effect on the x[n] sequence is described in the frequency domain by the convolution theorem:
\mathcal{F}\{x * h\} = \mathcal{F}\{x\} \cdot \mathcal{F}\{h\} \quad \text{and} \quad y = x * h = \mathcal{F}^{-1}\bigl\{ \mathcal{F}\{x\} \cdot \mathcal{F}\{h\} \bigr\}
where operators \mathcal{F} and \mathcal{F}^{-1} respectively denote the discrete-time Fourier transform (DTFT) and its inverse. Therefore, the complex-valued, multiplicative function \mathcal{F}\{h\} is the filter's frequency response. It is defined by a Fourier series:
H_{2\pi}(\omega) = \sum_{n=-\infty}^{\infty} h[n]\, e^{-i \omega n}
where the added subscript denotes 2π-periodicity. Here ω represents frequency in normalized units (radians/sample). The substitution ω = 2πf, favored by many filter design programs, changes the units of frequency to cycles/sample and the periodicity to 1.[1] When the x[n] sequence has a known sampling rate of f_s samples/second, the substitution ω = 2πf/f_s changes the units of frequency to cycles/second (hertz) and the periodicity to f_s. The value ω = π corresponds to a frequency of f_s/2 Hz = 1/2 cycles/sample, which is the Nyquist frequency.
Transfer function
The frequency response can also be written as H_{2\pi}(\omega) = \hat H(e^{i\omega}), where the function \hat H is the Z-transform of the impulse response:
\hat H(z) = \sum_{n=-\infty}^{\infty} h[n]\, z^{-n}.
z is a complex variable, and \hat H(z) is a surface. One cycle of the periodic frequency response can be found in the region defined by |z| = 1, which is the unit circle of the z-plane. Filter transfer functions are often used to verify the stability of IIR designs. As we have already noted, FIR designs are inherently stable.
Filter design
An FIR filter is designed by finding the coefficients and filter order that meet certain specifications, which can be in the time domain (e.g. a matched filter) and/or the frequency domain (most common). Matched filters perform a cross-correlation between the input signal and a known pulse shape. The FIR convolution is a cross-correlation between the input signal and a time-reversed copy of the impulse response. Therefore, the matched filter's impulse response is "designed" by sampling the known pulse shape and using those samples in reverse order as the coefficients of the filter.[2]
When a particular frequency response is desired, several different design methods are common:
1. Window design method
2. Frequency sampling method
3. Weighted least squares design
4. Parks-McClellan method (also known as the equiripple, optimal, or minimax method). The Remez exchange algorithm is commonly used to find an optimal equiripple set of coefficients. Here the user specifies a desired frequency response, a weighting function for errors from this response, and a filter order N. The algorithm then finds the set of N + 1 coefficients that minimize the maximum deviation from the ideal. Intuitively, this finds the filter that is as close as you can get to the desired response given that you can use only N + 1 coefficients. This method is particularly easy in practice, since at least one text[3] includes a program that takes the desired filter and N, and returns the optimum coefficients.
5. Equiripple FIR filters can also be designed using FFT algorithms.[4] The algorithm is iterative in nature: one computes the DFT of an initial filter design using the FFT algorithm (if no initial estimate is available, h[n] = δ[n] can be used), corrects the frequency response according to the desired specifications in the Fourier domain, and computes the inverse FFT. In the time domain, only N of the coefficients are retained (the other coefficients are forced to zero); the FFT is then computed once again, the frequency response corrected according to the specifications, and so on.
Software packages like MATLAB, GNU Octave, Scilab, and SciPy provide convenient ways to apply these different methods.
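For instance, with SciPy (a sketch; the sampling rate, cutoff, band edges, and tap count are arbitrary illustrative choices), a window-method design and an equiripple design look like:

    from scipy import signal

    fs = 8000.0      # sampling rate, Hz
    cutoff = 1000.0  # desired low-pass cutoff, Hz

    # 1. Window design method (Hamming window by default):
    b_window = signal.firwin(numtaps=51, cutoff=cutoff, fs=fs)

    # 4. Parks-McClellan / Remez exchange (equiripple):
    b_remez = signal.remez(numtaps=51,
                           bands=[0, cutoff, cutoff + 500, fs / 2],
                           desired=[1, 0],  # passband gain 1, stopband gain 0
                           fs=fs)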
Window design method
In the window design method, one first designs an ideal IIR filter and then truncates the infinite impulse response by
multiplying it with a finite length window function. The result is a finite impulse response filter whose frequency
response is modified from that of the IIR filter. Multiplying the infinite impulse by the window function in the time
domain results in the frequency response of the IIR being convolved with the Fourier transform (or DTFT) of the
window function. If the window's main lobe is narrow, the composite frequency response remains close to that of the
ideal IIR filter.
The ideal response is usually rectangular, and the corresponding IIR is a sinc function. The result of the frequency
domain convolution is that the edges of the rectangle are tapered, and ripples appear in the passband and stopband.
Working backward, one can specify the slope (or width) of the tapered region (transition band) and the height of the
ripples, and thereby derive the frequency domain parameters of an appropriate window function. Continuing
backward to an impulse response can be done by iterating a filter design program to find the minimum filter order.
Another method is to restrict the solution set to the parametric family of Kaiser windows, which provides closed
form relationships between the time-domain and frequency domain parameters. In general, that method will not
achieve the minimum possible filter order, but it is particularly convenient for automated applications that require
dynamic, on-the-fly, filter design.
The window design method is also advantageous for creating efficient half-band filters, because the corresponding
sinc function is zero at every other sample point (except the center one). The product with the window function does
not alter the zeros, so almost half of the coefficients of the final impulse response are zero. An appropriate
implementation of the FIR calculations can exploit that property to double the filter's efficiency.
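A sketch of the method itself, an ideal low-pass sinc truncated by a Hamming window (all parameter values are illustrative):

    import numpy as np

    N = 50                                          # filter order -> N + 1 taps
    fc = 0.125                                      # cutoff in cycles/sample
    n = np.arange(N + 1)
    ideal = 2 * fc * np.sinc(2 * fc * (n - N / 2))  # shifted ideal low-pass impulse response
    h = ideal * np.hamming(N + 1)                   # truncate and taper with the window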
Moving average example
Fig. (a) Block diagram of a simple FIR filter (2nd-order/3-tap filter in this case, implementing a moving average)
Finite impulse response
215
Fig. (b) Pole-Zero Diagram
Fig. (c) Magnitude and phase responses
Finite impulse response
216
Fig. (d) Amplitude and phase responses
A moving average filter is a very simple FIR filter. It is sometimes called a boxcar filter, especially when followed by decimation. The filter coefficients, b_i, are found via the following equation:
b_i = \frac{1}{N+1}, \quad i = 0, 1, \ldots, N.
To provide a more specific example, we select the filter order N = 2. The impulse response of the resulting filter is:
h[n] = \tfrac{1}{3} \delta[n] + \tfrac{1}{3} \delta[n-1] + \tfrac{1}{3} \delta[n-2].
Fig. (a) on the right shows the block diagram of a 2nd-order moving-average filter discussed below. The transfer function is:
H(z) = \tfrac{1}{3} \left( 1 + z^{-1} + z^{-2} \right) = \frac{z^2 + z + 1}{3 z^2}.
Fig. (b) on the right shows the corresponding pole-zero diagram. Zero frequency (DC) corresponds to (1, 0), positive frequencies advancing counterclockwise around the circle to the Nyquist frequency at (−1, 0). Two poles are located at the origin, and two zeros are located at z_1 = -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2} i and z_2 = -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2} i.
The frequency response, in terms of normalized frequency ω, is:
H(e^{i\omega}) = \tfrac{1}{3} \left( 1 + e^{-i\omega} + e^{-2 i \omega} \right).
Fig. (c) on the right shows the magnitude and phase components of H(e^{iω}). But plots like these can also be generated by doing a discrete Fourier transform (DFT) of the impulse response.[5] And because of symmetry, filter design or viewing software often displays only the [0, π] region. The magnitude plot indicates that the moving-average filter passes low frequencies with a gain near 1 and attenuates high frequencies, and is thus a crude low-pass filter. The phase plot is linear except for discontinuities at the two frequencies where the magnitude goes to zero. The size of the discontinuities is π, indicating a sign reversal. They do not affect the property of linear phase. That fact is illustrated in Fig. (d).
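A short sketch that evaluates this frequency response numerically (NumPy assumed):

    import numpy as np

    h = np.array([1/3, 1/3, 1/3])        # 3-tap moving average
    w = np.linspace(0, np.pi, 512)       # normalized frequency, [0, pi]
    H = h[0] + h[1] * np.exp(-1j * w) + h[2] * np.exp(-2j * w)
    mag, phase = np.abs(H), np.angle(H)  # |H| reaches 0 at w = 2*pi/3, the unit-circle zero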
Notes
[1] A notable exception is MATLAB, which prefers units of half-cycles/sample = cycles/2-samples, because the Nyquist frequency in those units is 1, a convenient choice for plotting software that displays the interval from 0 to the Nyquist frequency.
[2] Oppenheim, Alan V., Willsky, Alan S., and Young, Ian T., 1983: Signals and Systems, p. 256 (Englewood Cliffs, New Jersey: Prentice-Hall, Inc.) ISBN 0-13-809731-3
[3] Rabiner, Lawrence R., and Gold, Bernard, 1975: Theory and Application of Digital Signal Processing (Englewood Cliffs, New Jersey: Prentice-Hall, Inc.) ISBN 0-13-914101-4
[4] A. E. Cetin, O.N. Gerek, Y. Yardimci, "Equiripple FIR filter design by the FFT algorithm," IEEE Signal Processing Magazine, pp. 60-64, March 1997.
[5] See Sampling the DTFT.
External links
Notes on the Optimal Design of FIR Filters (http://cnx.org/content/col10553/latest/) Connexions online book by John Treichler (2008).
FIR FAQ (http://dspguru.com/dsp/faqs/fir) provided by dspguru.com.
BruteFIR; Software for applying long FIR filters to multi-channel digital audio, either offline or in realtime (http://www.ludd.luth.se/~torger/brutefir.html)
Freeverb3 Reverb Impulse Response Processor (http://www.nongnu.org/freeverb3/)
Worked examples and explanation for designing FIR filters using windowing (http://www.labbookpages.co.uk/audio/firWindowing.html). Includes code examples.
A JAVA applet with different FIR filters (http://www.falstad.com/dfilter/); the filters are applied to sound and the results can be heard immediately. The source code is also available.
Matlab code (http://signal.ee.bilkent.edu.tr/my_filter.m); Matlab code for "Equiripple FIR filter design by the FFT algorithm" by A. Enis Cetin, O. N. Gerek and Y. Yardimci, IEEE Signal Processing Magazine, 1997.
Infinite impulse response
Infinite impulse response (IIR) is a property applying to many linear time-invariant systems. Common examples of
linear time-invariant systems are most electronic and digital filters. Systems with this property are known as IIR
systems or IIR filters, and are distinguished by having an impulse response which does not become exactly zero past
a certain point, but continues indefinitely. This is in contrast to a finite impulse response in which the impulse
response h(t) does become exactly zero at times t > T for some finite T, thus being of finite duration.
In practice, the impulse response even of IIR systems usually approaches zero and can be neglected past a certain
point. However the physical systems which give rise to IIR or FIR (finite impulse response) responses are dissimilar,
and therein lies the importance of the distinction. For instance, analog electronic filters composed of resistors,
capacitors, and/or inductors (and perhaps linear amplifiers) are generally IIR filters. On the other hand, discrete-time
filters (usually digital filters) based on a tapped delay line employing no feedback are necessarily FIR filters. The
capacitors (or inductors) in the analog filter have a "memory" and their internal state never completely relaxes
following an impulse. But in the latter case, after an impulse has reached the end of the tapped delay line, the system
has no further memory of that impulse and has returned to its initial state; its impulse response beyond that point is
exactly zero.
Implementation and design
Although almost all analog electronic filters are IIR, digital filters may be either IIR or FIR. The presence of
feedback in the topology of a discrete-time filter (such as the block diagram shown below) generally creates an IIR
response. The z domain transfer function of an IIR filter contains a non-trivial denominator, describing those
feedback terms. The transfer function of an FIR filter, on the other hand, has only a numerator as expressed in the
general form derived below. All of the coefficients (feedback terms) are zero and the filter has no finite poles.
The transfer functions pertaining to IIR analog electronic filters have been extensively studied and optimized for
their amplitude and phase characteristics. These continuous-time filter functions are described in the Laplace
domain. Desired solutions can be transferred to the case of discrete-time filters whose transfer functions are
expressed in the z domain, through the use of certain mathematical techniques such as the bilinear transform,
impulse invariance, or pole–zero matching method. Thus digital IIR filters can be based on well-known solutions for
analog filters such as the Chebyshev filter, Butterworth filter, and Elliptic filter, inheriting the characteristics of those
solutions.
Transfer function derivation
Digital filters are often described and implemented in terms of the difference equation that defines how the output signal is related to the input signal:
y[n] = \frac{1}{a_0} \left( b_0 x[n] + b_1 x[n-1] + \cdots + b_P x[n-P] - a_1 y[n-1] - a_2 y[n-2] - \cdots - a_Q y[n-Q] \right)
where:
P is the feedforward filter order
b_i are the feedforward filter coefficients
Q is the feedback filter order
a_i are the feedback filter coefficients
x[n] is the input signal
y[n] is the output signal.
A more condensed form of the difference equation is:
y[n] = \frac{1}{a_0} \left( \sum_{i=0}^{P} b_i\, x[n-i] - \sum_{j=1}^{Q} a_j\, y[n-j] \right)
which, when rearranged, becomes:
\sum_{j=0}^{Q} a_j\, y[n-j] = \sum_{i=0}^{P} b_i\, x[n-i].
To find the transfer function of the filter, we first take the Z-transform of each side of the above equation, where we use the time-shift property to obtain:
Y(z) \sum_{j=0}^{Q} a_j z^{-j} = X(z) \sum_{i=0}^{P} b_i z^{-i}.
We define the transfer function to be:
H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{P} b_i z^{-i}}{\sum_{j=0}^{Q} a_j z^{-j}}.
Considering that in most IIR filter designs the coefficient a_0 is 1, the IIR filter transfer function takes the more traditional form:
H(z) = \frac{\sum_{i=0}^{P} b_i z^{-i}}{1 + \sum_{j=1}^{Q} a_j z^{-j}}.
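A minimal Python sketch of this difference equation (the coefficient values in the usage line are illustrative):

    def iir_filter(b, a, x):
        # y[n] = (1/a[0]) * (sum_i b[i] x[n-i] - sum_j a[j] y[n-j]), j >= 1
        y = []
        for n in range(len(x)):
            acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
            y.append(acc / a[0])
        return y

    y = iir_filter(b=[0.2], a=[1.0, -0.8], x=[1.0] + [0.0] * 9)  # simple one-pole example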
Description of block diagram
Simple IIR filter block diagram
A typical block diagram of an IIR filter looks like the following. The z^{-1} block is a unit delay. The coefficients and number of feedback/feedforward paths are implementation-dependent.
Stability
The transfer function allows us to judge whether or not a system is bounded-input, bounded-output (BIBO) stable. To be specific, the BIBO stability criterion requires that the ROC of the system includes the unit circle. For example, for a causal system, all poles of the transfer function have to have an absolute value smaller than one. In other words, all poles must be located within the unit circle in the z-plane.
The poles are defined as the values of z which make the denominator of H(z) equal to 0:
0 = \sum_{j=0}^{Q} a_j z^{-j}.
Clearly, if a_j \ne 0 for some j > 0, then the poles are not all located at the origin of the z-plane. This is in contrast to the FIR filter, where all poles are located at the origin, and which is therefore always stable.
IIR filters are sometimes preferred over FIR filters because an IIR filter can achieve a much sharper transition-region roll-off than an FIR filter of the same order.
Example
Let the transfer function H(z) of a discrete-time filter be given by:
H(z) = \frac{B(z)}{A(z)} = \frac{1}{1 - a z^{-1}},
governed by the parameter a, a real number with 0 < |a| < 1. H(z) is stable and causal with a pole at z = a. The time-domain impulse response can be shown to be given by:
h[n] = a^{n} u[n]
where u[n] is the unit step function. It can be seen that h[n] is non-zero for all n ≥ 0, thus an impulse response which continues infinitely.
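A quick numeric check of this infinitely decaying response (a = 0.9 is an arbitrary illustrative choice):

    a = 0.9
    x = [1.0] + [0.0] * 9  # Kronecker delta input
    y, state = [], 0.0
    for xn in x:           # implements y[n] = x[n] + a * y[n-1]
        state = xn + a * state
        y.append(state)
    print(y)               # 1, 0.9, 0.81, ... = a**n; never exactly zero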
Advantages and disadvantages
The main advantage digital IIR filters have over FIR filters is their efficiency in implementation, in order to meet a specification in terms of passband, stopband, ripple, and/or roll-off. Such a set of specifications can be accomplished with a lower order (Q in the above formulae) IIR filter than would be required for an FIR filter meeting the same requirements. If implemented in a signal processor, this implies correspondingly fewer calculations per time step; the computational savings is often of a rather large factor.
On the other hand, FIR filters can be easier to design, for instance, to match a particular frequency response
requirement. This is particularly true when the requirement is not one of the usual cases (high-pass, low-pass, notch,
etc.) which have been studied and optimized for analog filters. Also FIR filters can be easily made to be linear phase
(constant group delay vs frequency), a property that is not easily met using IIR filters and then only as an
approximation (for instance with the Bessel filter). Another issue regarding digital IIR filters is the potential for limit
cycle behavior when idle, due to the feedback system in conjunction with quantization.
External links
The fifth module of the BORES Signal Processing DSP course - Introduction to DSP (http://www.bores.com/courses/intro/iir/index.htm)
IIR Digital Filter Design applet in Java (http://www-users.cs.york.ac.uk/~fisher/mkfilter/)
IIR Digital Filter design tool (http://www-users.cs.york.ac.uk/~fisher/mkfilter/) - produces coefficients, graphs, poles, zeros, and C code
Almafa.org Online IIR Design Tool (http://almafa.org/?sidebar=docs/iir.html) - does not require Java
Nyquist ISI criterion
Raised cosine response meets the Nyquist ISI criterion. Consecutive raised-cosine
impulses demonstrate the zero ISI property between transmitted symbols at the sampling
instants. At t=0 the middle pulse is at its maximum and the sum of other impulses is zero.
In communications, the Nyquist ISI
criterion describes the conditions
which, when satisfied by a
communication channel (including
responses of transmit and receive
filters), result in no intersymbol
interference or ISI. It provides a
method for constructing band-limited
functions to overcome the effects of
intersymbol interference.
When consecutive symbols are
transmitted over a channel by a linear
modulation (such as ASK, QAM, etc.), the impulse response (or equivalently the frequency response) of the channel
causes a transmitted symbol to be spread in the time domain. This causes intersymbol interference because the
previously transmitted symbols affect the currently received symbol, thus reducing tolerance for noise. The Nyquist
theorem relates this time-domain condition to an equivalent frequency-domain condition.
The Nyquist criterion is closely related to the Nyquist-Shannon sampling theorem, with only a differing point of
view.
Nyquist criterion
If we denote the channel impulse response as h(t), then the condition for an ISI-free response can be expressed as:
h(n T_s) = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases}
for all integers n, where T_s is the symbol period. The Nyquist theorem says that this is equivalent to:
\frac{1}{T_s} \sum_{k=-\infty}^{\infty} H\!\left(f - \frac{k}{T_s}\right) = 1 \quad \forall f,
where H(f) is the Fourier transform of h(t). This is the Nyquist ISI criterion.
This criterion can be intuitively understood in the following way: frequency-shifted replicas of H(f) must add up to a constant value.
In practice this criterion is applied to baseband filtering by regarding the symbol sequence as weighted impulses (Dirac delta functions). When the baseband filters in the communication system satisfy the Nyquist criterion, symbols can be transmitted over a channel with flat response within a limited frequency band, without ISI. Examples of such baseband filters are the raised-cosine filter, or the sinc filter as the ideal case.
Derivation
To derive the criterion, we first express the received signal in terms of the transmitted symbols and the channel response. Let the function h(t) be the channel impulse response, x[n] the symbols to be sent, with a symbol period of T_s; the received signal y(t) will be in the form (where noise has been ignored for simplicity):
y(t) = \sum_{n=-\infty}^{\infty} x[n] \cdot h(t - n T_s).
Sampling this signal at intervals of T_s, we can express y(t) as a discrete-time equation:
y[k] = y(k T_s) = \sum_{n=-\infty}^{\infty} x[n] \cdot h[(k - n) T_s].
If we write the h[0] term of the sum separately, we can express this as:
y[k] = x[k] \cdot h[0] + \sum_{n \ne k} x[n] \cdot h[(k - n) T_s],
and from this we can conclude that if a response h[n] satisfies
h[n T_s] = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0, \end{cases}
only one transmitted symbol has an effect on the received y[k] at sampling instants, thus removing any ISI. This is the time-domain condition for an ISI-free channel. Now we find a frequency-domain equivalent for it. We start by expressing this condition in continuous time:
h(n T_s) = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases}
for all integers n. We multiply such an h(t) by a sum of Dirac delta functions (impulses) separated by intervals T_s. This is equivalent to sampling the response as above, but using a continuous-time expression. The right side of the condition can then be expressed as one impulse in the origin:
h(t) \cdot \sum_{k=-\infty}^{\infty} \delta(t - k T_s) = \delta(t).
Fourier transforming both members of this relationship we obtain:
H(f) * \frac{1}{T_s} \sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T_s}\right) = 1
and
\frac{1}{T_s} \sum_{k=-\infty}^{\infty} H\!\left(f - \frac{k}{T_s}\right) = 1.
This is the Nyquist ISI criterion and, if a channel response satisfies it, then there is no ISI between the different samples.
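A quick numeric check of the time-domain condition for the ideal sinc pulse (NumPy assumed; the symbol period is an arbitrary illustrative value):

    import numpy as np

    Ts = 1.0               # symbol period
    n = np.arange(-5, 6)   # integer multiples of the symbol period
    t = n * Ts             # sampling instants
    h = np.sinc(t / Ts)    # ideal pulse h(t) = sinc(t / Ts)
    print(h)               # 1 at n = 0, and 0 at every other sampling instant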
References
John G. Proakis, "Digital Communications, 3rd Edition", McGraw-Hill Book Co., 1995. ISBN 0-07-113814-5
Behzad Razavi, "RF Microelectronics", Prentice-Hall, Inc., 1998. ISBN 0-13-887571-5
Pulse shaping
In electronics and telecommunications, pulse shaping is the process of changing the waveform of transmitted pulses.
Its purpose is to make the transmitted signal better suited to its purpose or the communication channel, typically by
limiting the effective bandwidth of the transmission. By filtering the transmitted pulses this way, the intersymbol
interference caused by the channel can be kept in control. In RF communication, pulse shaping is essential for
making the signal fit in its frequency band.
Typically pulse shaping occurs after line coding and before modulation.
Need for pulse shaping
Transmitting a signal at high modulation rate through a band-limited channel can create intersymbol interference. As
the modulation rate increases, the signal's bandwidth increases. When the signal's bandwidth becomes larger than the
channel bandwidth, the channel starts to introduce distortion to the signal. This distortion usually manifests itself as
intersymbol interference.
The signal's spectrum is determined by the pulse shaping filter used by the transmitter. Usually the transmitted symbols are represented as a time sequence of Dirac delta pulses. This theoretical signal is then filtered with the pulse
shaping filter, producing the transmitted signal. The spectrum of the transmission is thus determined by the filter.
In many base band communication systems the pulse shaping filter is implicitly a boxcar filter. Its Fourier transform
is of the form sin(x)/x, and has significant signal power at frequencies higher than symbol rate. This is not a big
problem when optical fibre or even twisted pair cable is used as the communication channel. However, in RF
communications this would waste bandwidth, and only tightly specified frequency bands are used for single
transmissions. In other words, the channel for the signal is band-limited. Therefore better filters have been
developed, which attempt to minimise the bandwidth needed for a certain symbol rate.
An example in other areas of electronics is the generation of pulses where the rise time needs to be short; one way to do this is to start with a slower-rising pulse, and decrease the rise time, for example with a step recovery diode circuit.
Pulse shaping filters
A typical NRZ coded signal is implicitly filtered
with a boxcar filter.
Not every filter can be used as a pulse shaping filter. The filter itself must not introduce intersymbol interference; it needs to satisfy certain criteria. The Nyquist ISI criterion is a commonly used criterion for evaluation, because it relates the frequency spectrum of the transmitter signal to intersymbol interference.
Examples of pulse shaping filters that are commonly found in
communication systems are:
The trivial boxcar filter
Sinc shaped filter
Raised-cosine filter
Gaussian filter
Sender side pulse shaping is often combined with a receiver side matched filter to achieve optimum tolerance for
noise in the system. In this case the pulse shaping is equally distributed between the sender and receiver filters. The
filters' amplitude responses are thus pointwise square roots of the system filters.
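As a sketch of the operations described in this section (all values are illustrative; the Hamming taps merely stand in for a real pulse-shaping filter such as a raised cosine), symbols are placed on a Dirac-style impulse train and then filtered:

    import numpy as np

    symbols = np.array([1.0, -1.0, 1.0, 1.0, -1.0])  # data symbols
    sps = 8                                          # samples per symbol
    impulses = np.zeros(len(symbols) * sps)
    impulses[::sps] = symbols                        # weighted impulse train
    g = np.hamming(4 * sps)                          # stand-in pulse-shaping taps
    g /= g.sum()                                     # unit DC gain
    tx = np.convolve(impulses, g)                    # shaped transmit waveform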
Other approaches that eliminate complex pulse shaping filters have been invented. In OFDM, the carriers are
modulated so slowly that each carrier is virtually unaffected by the bandwidth limitation of the channel.
Boxcar filter
The boxcar filter results in infinitely wide bandwidth for the signal. Thus its usefulness is limited, but it is used
widely in wired baseband communications, where the channel has some extra bandwidth and the distortion created
by the channel can be tolerated.
Sinc filter
Main article: sinc filter
Amplitude response of raised-cosine filter with various roll-off factors
Theoretically the best pulse shaping
filter would be the sinc filter, but it
cannot be implemented precisely. It is
a non-causal filter with relatively
slowly decaying tails. It is also
problematic from a synchronisation
point of view as any phase error results
in steeply increasing intersymbol
interference.
Raised-cosine filter
Main article: raised-cosine filter
Raised-cosine filters are practical to implement and they are in wide use. They have a configurable excess
bandwidth, so communication systems can choose a trade-off between a simpler filter and spectral efficiency.
Gaussian filter
Main article: Gaussian filter
This gives an output pulse shaped like a Gaussian function.
References
John G. Proakis, "Digital Communications, 3rd Edition" Chapter 9, McGraw-Hill Book Co., 1995. ISBN
0-07-113814-5
National Instruments Signal Generator Tutorial, Pulse Shaping to Improve Spectral Efficiency (http://zone.ni.com/devzone/cda/ph/p/id/200)
National Instruments Measurement Fundamentals Tutorial, Pulse-Shape Filtering in Communications Systems (http://zone.ni.com/devzone/cda/tut/p/id/3876)
Raised-cosine filter
The raised-cosine filter is a filter frequently used for pulse-shaping in digital modulation due to its ability to minimise intersymbol interference (ISI). Its name stems from the fact that the non-zero portion of the frequency spectrum of its simplest form (β = 1) is a cosine function, 'raised' up to sit above the f (horizontal) axis.
Mathematical description
Frequency response of raised-cosine filter with various roll-off factors
Impulse response of raised-cosine filter with various roll-off factors
The raised-cosine filter is an implementation of a low-pass Nyquist filter, i.e., one that has the property of vestigial symmetry. This means that its spectrum exhibits odd symmetry about 1/(2T), where T is the symbol period of the communications system.
Its frequency-domain description is a piecewise function, given by:
H(f) = \begin{cases} T, & |f| \le \frac{1-\beta}{2T} \\ \frac{T}{2}\left[1 + \cos\left(\frac{\pi T}{\beta}\left[|f| - \frac{1-\beta}{2T}\right]\right)\right], & \frac{1-\beta}{2T} < |f| \le \frac{1+\beta}{2T} \\ 0, & \text{otherwise} \end{cases}
and characterised by two values: β, the roll-off factor, and T, the reciprocal of the symbol rate.
The impulse response of such a filter[1] is given by:
h(t) = \operatorname{sinc}\!\left(\frac{t}{T}\right) \frac{\cos\left(\frac{\pi \beta t}{T}\right)}{1 - \frac{4 \beta^2 t^2}{T^2}},
in terms of the normalised sinc function.
Roll-off factor
The roll-off factor, $\beta$, is a measure of the excess bandwidth of the filter, i.e. the bandwidth occupied beyond the Nyquist bandwidth of $\frac{1}{2T}$. If we denote the excess bandwidth as $\Delta f$, then:
$$\beta = \frac{\Delta f}{\left(\frac{1}{2T}\right)} = \frac{\Delta f}{R_S/2} = 2T\,\Delta f$$
where $R_S = \frac{1}{T}$ is the symbol-rate.
The graph shows the amplitude response as $\beta$ is varied between 0 and 1, and the corresponding effect on the impulse response. As can be seen, the time-domain ripple level increases as $\beta$ decreases. This shows that the excess bandwidth of the filter can be reduced, but only at the expense of an elongated impulse response.
As $\beta$ approaches 0, the roll-off zone becomes infinitesimally narrow, hence:
$$\lim_{\beta \to 0} H(f) = \operatorname{rect}(fT)$$
where $\operatorname{rect}(\cdot)$ is the rectangular function, so the impulse response approaches $\operatorname{sinc}\!\left(\frac{t}{T}\right)$. Hence, it converges to an ideal or brick-wall filter in this case.
When $\beta = 1$, the non-zero portion of the spectrum is a pure raised cosine, leading to the simplification:
$$H(f)\big|_{\beta=1} = \begin{cases} \frac{T}{2}\left[1 + \cos(\pi f T)\right], & |f| \leq \frac{1}{T} \\ 0, & \text{otherwise} \end{cases} \qquad h(t)\big|_{\beta=1} = \operatorname{sinc}\!\left(\frac{t}{T}\right)\frac{\cos\!\left(\frac{\pi t}{T}\right)}{1 - \frac{4t^2}{T^2}}.$$
Bandwidth
The bandwidth of a raised-cosine filter is most commonly defined as the width of the non-zero portion of its spectrum, i.e.:
$$BW = \frac{1}{2} R_S (1 + \beta), \qquad (0 < \beta < 1).$$
Auto-correlation function
The auto-correlation function of the raised-cosine pulse has a closed form; it is useful for analyzing the effect of various sampling offsets on the received signal.
Application
Figure: Consecutive raised-cosine impulses, demonstrating the zero-ISI property.
When used to filter a symbol stream, a Nyquist filter has the property of eliminating ISI, as its impulse response is zero at all $t = nT$ (where $n$ is an integer), except $n = 0$. Therefore, if the transmitted waveform is correctly sampled at the receiver, the original symbol values can be recovered completely.
However, in many practical communications systems, a matched filter is used in the receiver, due to the effects of white noise. For zero ISI, it is the net response of the transmit and receive filters that must equal $H(f)$:
$$H_R(f)\,H_T(f) = H(f)$$
and therefore:
$$|H_R(f)| = |H_T(f)| = \sqrt{|H(f)|}.$$
These filters are called root-raised-cosine filters.
References
[1] Michael Zoltowski, Equations for the Raised Cosine and Square-Root Raised Cosine Shapes (http://www.commsys.isy.liu.se/TSKS04/lectures/3/MichaelZoltowski_SquareRootRaisedCosine.pdf)
Glover, I.; Grant, P. (2004). Digital Communications (2nd ed.). Pearson Education Ltd. ISBN 0-13-089399-4.
Proakis, J. (1995). Digital Communications (3rd ed.). McGraw-Hill Inc. ISBN 0-07-113814-5.
Tavares, L.M.; Tavares, G.N. (1998). Comments on "Performance of Asynchronous Band-Limited DS/SSMA Systems". IEICE Trans. Commun., Vol. E81-B, No. 9.
External links
Technical article entitled "The care and feeding of digital, pulse-shaping filters" (http://www.nonstopsystems.com/radio/article-raised-cosine.pdf), originally published in RF Design, written by Ken Gentile.
Root-raised-cosine filter
In signal processing, a root-raised-cosine filter (RRC), sometimes known as a square-root-raised-cosine filter (SRRC), is frequently used as the transmit and receive filter in a digital communication system to perform matched filtering. This helps in minimizing intersymbol interference (ISI). The combined response of two such filters is that of the raised-cosine filter. It obtains its name from the fact that its frequency response, $H_{rrc}(f)$, is the square root of the frequency response of the raised-cosine filter, $H_{rc}(f)$:
$$|H_{rrc}(f)|^2 = |H_{rc}(f)|$$
or:
$$|H_{rrc}(f)| = \sqrt{|H_{rc}(f)|}.$$
Why it is required
To have minimum ISI (intersymbol interference), the overall response of the transmit filter, the channel response and the receive filter has to satisfy the Nyquist ISI criterion. The raised-cosine filter is the most popular filter response satisfying this criterion. Half of this filtering is done on the transmit side and half on the receive side. On the receive side, the channel response, if it can be accurately estimated, can also be taken into account, so that the overall response is a raised-cosine filter.
Mathematical description
Figure: The impulse response of a root-raised-cosine filter for three values of $\beta$: 1.0 (blue), 0.5 (red) and 0 (green).
The RRC filter is characterised by two values: $\beta$, the roll-off factor, and $T_s$, the reciprocal of the symbol-rate.
The impulse response of such a filter can be given as:
$$h(t) = \frac{1}{\sqrt{T_s}} \cdot \frac{\sin\!\left[\pi \frac{t}{T_s}(1-\beta)\right] + 4\beta\frac{t}{T_s}\cos\!\left[\pi\frac{t}{T_s}(1+\beta)\right]}{\pi \frac{t}{T_s}\left[1-\left(4\beta\frac{t}{T_s}\right)^{2}\right]},$$
though there are other forms as well.
Unlike the raised-cosine filter, the impulse response is not zero at the intervals of $T_s$. However, the combined transmit and receive filters form a raised-cosine filter which does have zeros at the intervals of $T_s$. Only in the case of $\beta = 0$ does the root-raised-cosine itself have zeros at multiples of $T_s$. A numerical check of this combined zero-ISI behaviour is sketched below.
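The following Python sketch, based on the impulse-response form above, numerically verifies that convolving the RRC pulse with itself yields a combined response with nulls at the nonzero symbol instants. The helper name rrc_impulse and the tolerance values are illustrative assumptions.

```python
import numpy as np

def rrc_impulse(t, Ts=1.0, beta=0.5):
    # Root-raised-cosine impulse response in the form given above.
    # The removable singularities at t = 0 and t = +/- Ts/(4*beta)
    # are filled in with their analytical limits.
    x = np.asarray(t, dtype=float) / Ts
    h = np.zeros_like(x)
    at_zero = np.abs(x) <= 1e-10
    at_sing = np.abs(np.abs(4 * beta * x) - 1) <= 1e-10
    reg = ~(at_zero | at_sing)
    xr = x[reg]
    num = np.sin(np.pi * xr * (1 - beta)) + 4 * beta * xr * np.cos(np.pi * xr * (1 + beta))
    den = np.pi * xr * (1 - (4 * beta * xr) ** 2)
    h[reg] = num / den / np.sqrt(Ts)
    h[at_zero] = (1 - beta + 4 * beta / np.pi) / np.sqrt(Ts)
    h[at_sing] = (beta / np.sqrt(2 * Ts)) * (
        (1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
        + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
    return h

dt = 0.01
t = np.arange(-8.0, 8.0 + dt, dt)
h = rrc_impulse(t)                      # one RRC pulse (transmit side)
rc = np.convolve(h, h) * dt             # combined transmit + receive response
t_rc = np.arange(len(rc)) * dt - 16.0   # time axis of the convolution
for n in (1, 2, 3):                     # nulls at the nonzero symbol instants
    assert abs(rc[np.argmin(np.abs(t_rc - n))]) < 1e-2
```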
References
S. Daumont, R. Basel, Y. Lout, "Root-Raised Cosine filter influences on PAPR distribution of single carrier
signals", ISCCSP 2008, Malta, 12-14 March 2008.
Proakis, J. (1995). Digital Communications (3rd ed.). McGraw-Hill Inc. ISBN 0-07-113814-5.
Adaptive filter
An adaptive filter is a system with a linear filter that has a transfer function controlled by variable parameters and a
means to adjust those parameters according to an optimization algorithm. Because of the complexity of the
optimization algorithms, most adaptive filters are digital filters. Adaptive filters are required for some applications
because some parameters of the desired processing operation (for instance, the locations of reflective surfaces in a
reverberant space) are not known in advance or are changing. The closed loop adaptive filter uses feedback in the
form of an error signal to refine its transfer function.
Generally speaking, the closed loop adaptive process involves the use of a cost function, which is a criterion for optimum performance of the filter, to feed an algorithm that determines how to modify the filter transfer function to minimize the cost on the next iteration. The most common cost function is the mean square of the error signal.
As the power of digital signal processors has increased, adaptive filters have become much more common and are
now routinely used in devices such as mobile phones and other communication devices, camcorders and digital
cameras, and medical monitoring equipment.
Example application
The recording of a heartbeat (an ECG) may be corrupted by noise from the AC mains. The exact frequency of the power and its harmonics may vary from moment to moment.
One way to remove the noise is to filter the signal with a notch filter at the mains frequency and its vicinity, but this could excessively degrade the quality of the ECG, since the heartbeat would also likely have frequency components in the rejected range.
To circumvent this potential loss of information, an adaptive filter could be used. The adaptive filter would take input both from the patient and from the mains, and would thus be able to track the actual frequency of the noise as it fluctuates and subtract the noise from the recording. Such an adaptive technique generally allows for a filter with a smaller rejection range, which in this case means that the output signal is more accurate for medical purposes.
Block diagram
The idea behind a closed loop adaptive filter is that a variable filter is adjusted until the error (the difference between
the filter output and the desired signal) is minimized. The Least Mean Squares (LMS) filter and the Recursive Least
Squares (RLS) filter are types of adaptive filter.
Figure: Adaptive filter. k = sample number, x = reference input, X = set of recent values of x, d = desired input, W = set of filter coefficients, ε = error output, f = filter impulse response, * = convolution, Σ = summation; upper box = linear filter, lower box = adaptation algorithm.
Figure: Adaptive filter, compact representation. k = sample number, x = reference input, d = desired input, ε = error output, f = filter impulse response, Σ = summation; box = linear filter and adaptation algorithm.
There are two input signals to the adaptive filter: $d_k$ and $x_k$, which are sometimes called the primary input and the reference input respectively. [1] The primary input $d_k$ includes the desired signal plus undesired interference, and the reference input $x_k$ includes the signals that are correlated to some of the undesired interference in $d_k$. k represents the discrete sample number.
The filter is controlled by a set of L+1 coefficients or weights.
$W_k = \left[w_{0,k}, w_{1,k}, \ldots, w_{L,k}\right]$ represents the set or vector of weights, which control the filter at sample time k, where $w_{l,k}$ refers to the $l$'th weight at $k$'th time.
$\Delta W_k$ represents the change in the weights that occurs as a result of adjustments computed at sample time k. These changes will be applied after sample time k and before they are used at sample time k+1.
The output is usually $\epsilon_k$, but it could be $y_k$ or it could even be the filter coefficients. [2] (Widrow)
The input signals are defined as follows:
$$d_k = g_k + u_k + v_k$$
$$x_k = g'_k + u'_k + v'_k$$
where:
g = the desired signal,
g' = a signal that is correlated with the desired signal g,
u = an undesired signal that is added to g, but not correlated with g or g',
u' = a signal that is correlated with the undesired signal u, but not correlated with g or g',
v = an undesired signal (typically random noise) not correlated with g, g', u, u' or v',
v' = an undesired signal (typically random noise) not correlated with g, g', u, u' or v.
The output signals are defined as follows:
$$y_k = \hat{g}_k + \hat{u}_k + \hat{v}_k$$
$$\epsilon_k = d_k - y_k$$
where:
$\hat{g}_k$ = the output of the filter if the input was only $g'_k$,
$\hat{u}_k$ = the output of the filter if the input was only $u'_k$,
$\hat{v}_k$ = the output of the filter if the input was only $v'_k$.
Tapped delay line FIR filter
If the variable filter has a tapped delay line Finite Impulse Response (FIR) structure, then the impulse response is equal to the filter coefficients. The output of the filter is given by
$$y_k = \sum_{l=0}^{L} w_{l,k}\, x_{k-l}$$
where $w_{l,k}$ refers to the $l$'th weight at $k$'th time.
Ideal case
In the ideal case $v_k = 0$, $v'_k = 0$ and $g'_k = 0$. All the undesired signals in $d_k$ are represented by $u_k$, and $x_k$ consists entirely of a signal correlated with the undesired signal in $d_k$.
The output of the variable filter in the ideal case is
$$y_k = \hat{u}_k.$$
The error signal or cost function is the difference between $d_k$ and $y_k$:
$$\epsilon_k = d_k - y_k = g_k + u_k - \hat{u}_k.$$
The desired signal $g_k$ passes through without being changed.
The error signal $\epsilon_k$ is minimized in the mean square sense when $(u_k - \hat{u}_k)$ is minimized. In other words, $\hat{u}_k$ is the best mean square estimate of $u_k$. In the ideal case, $u_k = \hat{u}_k$ and $\epsilon_k = g_k$: all that is left after the subtraction is the unchanged desired signal, with all undesired signals removed.
Signal components in the reference input
In some situations, the reference input $x_k$ includes components of the desired signal. This means $g'_k \neq 0$. Perfect cancelation of the undesired interference is not possible in this case, but improvement of the signal-to-interference ratio is possible. The output will be
$$\epsilon_k = g_k - \hat{g}_k + u_k - \hat{u}_k.$$
The desired signal will be modified (usually decreased).
The output signal-to-interference ratio has a simple formula referred to as power inversion:
$$\mathrm{SIR}_{out}(z) = \frac{1}{\mathrm{SIR}_{ref}(z)}$$
where
$\mathrm{SIR}_{out}(z)$ = output signal-to-interference ratio,
$\mathrm{SIR}_{ref}(z)$ = reference signal-to-interference ratio,
$z$ = frequency in the z-domain.
This formula means that the output signal-to-interference ratio at a particular frequency is the reciprocal of the reference signal-to-interference ratio. [3]
Example: A fast food restaurant has a drive-up window. Before getting to the window, customers place their order by speaking into a microphone. The microphone also picks up noise from the engine and the environment. This microphone provides the primary signal. The signal power from the customer's voice and the noise power from the engine are equal. It is difficult for the employees in the restaurant to understand the customer. To reduce the amount of interference in the primary microphone, a second microphone is located where it is intended to pick up sounds from the engine. It also picks up the customer's voice. This microphone is the source of the reference signal. In this case, the engine noise is 50 times more powerful than the customer's voice. Once the canceler has converged, the primary signal-to-interference ratio will be improved from 1:1 to 50:1.
Adaptive linear combiner
Figure: Adaptive linear combiner showing the combiner and the adaptation process. k = sample number, n = input variable index, x = reference inputs, d = desired input, W = set of filter coefficients, ε = error output, Σ = summation; upper box = linear combiner, lower box = adaptation algorithm.
Figure: Adaptive linear combiner, compact representation. k = sample number, n = input variable index, x = reference inputs, d = desired input, ε = error output, Σ = summation.
The adaptive linear combiner (ALC) resembles the adaptive tapped delay line FIR filter except that there is no assumed relationship between the X values. If the X values were from the outputs of a tapped delay line, then the combination of tapped delay line and ALC would comprise an adaptive filter. However, the X values could be the values of an array of pixels. Or they could be the outputs of multiple tapped delay lines. The ALC finds use as an adaptive beam former for arrays of hydrophones or antennas. Its output is
$$y_k = \sum_{l=0}^{L} w_{l,k}\, x_{l,k}$$
where $w_{l,k}$ refers to the $l$'th weight at $k$'th time.
LMS algorithm
Main article: Least mean squares filter
If the variable filter has a tapped delay line FIR structure, then the LMS update algorithm is especially simple. Typically, after each sample, the coefficients of the FIR filter are adjusted as follows: [4] (Widrow)
$$w_{l,k+1} = w_{l,k} + 2\mu\,\epsilon_k\, x_{k-l} \qquad \text{for } l = 0, \ldots, L.$$
$\mu$ is called the convergence factor.
The LMS algorithm does not require that the X values have any particular relationship; therefore it can be used to adapt a linear combiner as well as an FIR filter. In this case the update formula is written as:
$$w_{l,k+1} = w_{l,k} + 2\mu\,\epsilon_k\, x_{l,k}.$$
The effect of the LMS algorithm is, at each time k, to make a small change in each weight. The direction of the change is such that it would decrease the error if it had been applied at time k. The magnitude of the change in each weight depends on $\mu$, the associated X value and the error at time k. The weights making the largest contribution to the output, $y_k$, are changed the most. If the error is zero, then there should be no change in the weights. If the associated value of X is zero, then changing the weight makes no difference, so it is not changed.
Convergence
$\mu$ controls how fast and how well the algorithm converges to the optimum filter coefficients. If $\mu$ is too large, the algorithm will not converge. If $\mu$ is too small, the algorithm converges slowly and may not be able to track changing conditions. If $\mu$ is large but not too large to prevent convergence, the algorithm reaches steady state rapidly but continuously overshoots the optimum weight vector. Sometimes, $\mu$ is made large at first for rapid convergence and then decreased to minimize overshoot.
Widrow and Stearns state in 1985 that they have no knowledge of a proof that the LMS algorithm will converge in all cases. [5]
However, under certain assumptions about stationarity and independence it can be shown that the algorithm will converge if
$$0 < \mu < \frac{1}{S}$$
where $S = \sum_{l=0}^{L} \sigma_l^2$ is the sum of all input power, and $\sigma_l$ is the RMS value of the $l$'th input.
In the case of the tapped delay line filter, each input has the same RMS value because they are simply the same values delayed. In this case the total power is
$$S = (L+1)\,\sigma_x^2$$
where $\sigma_x$ is the RMS value of $x_k$, the input stream. [6]
This leads to a normalized LMS algorithm:
$$w_{l,k+1} = w_{l,k} + \frac{2\bar{\mu}}{(L+1)\,\sigma_x^2}\,\epsilon_k\, x_{k-l}$$
in which case the convergence criteria becomes: $0 < \bar{\mu} < 1$.
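As a concrete illustration, the following is a minimal Python sketch of an LMS noise canceller in the spirit of the ECG example above. All signal names and numerical values (the frequencies, mu, the filter order, and so on) are illustrative assumptions, not from the original text.

```python
import numpy as np

fs = 1000.0                                  # sample rate, Hz
n = np.arange(20000)
g = np.sin(2 * np.pi * 1.2 * n / fs)         # desired slow waveform
u = 0.8 * np.sin(2 * np.pi * 50.0 * n / fs)  # mains-like interference
d = g + u                                    # primary input d_k
x = np.sin(2 * np.pi * 50.0 * n / fs + 0.7)  # reference input x_k (correlated with u)

L = 16                        # filter order: L+1 weights
w = np.zeros(L + 1)           # weight vector W_k
mu = 0.01                     # convergence factor, well inside the stable range
X = np.zeros(L + 1)           # the L+1 most recent reference samples
eps = np.zeros_like(d)
for k in range(len(d)):
    X = np.roll(X, 1)
    X[0] = x[k]
    y = w @ X                 # filter output: estimate of the interference
    eps[k] = d[k] - y         # error output: the cleaned signal
    w += 2 * mu * eps[k] * X  # LMS weight update (Widrow form)

# After convergence the error output approximates the desired signal g.
resid = eps[-2000:] - g[-2000:]
print("residual RMS:", np.sqrt(np.mean(resid ** 2)))
```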
Applications of adaptive filters
Noise cancellation
Signal prediction
Adaptive feedback cancellation
Echo cancellation
Filter implementations
Least mean squares filter
Recursive least squares filter
Multidelay block frequency domain adaptive filter
Notes
[1] Widrow, p. 304
[2] Widrow, p. 212
[3] Widrow, p. 313
[4] Widrow, p. 100
[5] Widrow, p. 103
[6] Widrow, p. 103
References
Hayes, Monson H. (1996). Statistical Digital Signal Processing and Modeling. Wiley. ISBN 0-471-59431-8.
Haykin, Simon (2002). Adaptive Filter Theory. Prentice Hall. ISBN 0-13-048434-2.
Widrow, Bernard; Stearns, Samuel D. (1985). Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall. ISBN 0-13-004029-0.
Kalman filter
Figure: The Kalman filter keeps track of the estimated state of the system and the variance or uncertainty of the estimate. The estimate is updated using a state transition model and measurements. $\hat{x}_{k|k-1}$ denotes the estimate of the system's state at time step k before the k-th measurement $y_k$ has been taken into account; $P_{k|k-1}$ is the corresponding uncertainty.
The Kalman filter, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone. More formally, the Kalman filter operates recursively on streams of noisy input data to produce a statistically optimal estimate of the underlying system state. The filter is named for Rudolf (Rudy) E. Kálmán, one of the primary developers of its theory.
The Kalman filter has numerous applications in technology. A common application is for guidance, navigation and
control of vehicles, particularly aircraft and spacecraft. Furthermore, the Kalman filter is a widely applied concept in
time series analysis used in fields such as signal processing and econometrics.
The algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current
state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with
some amount of error, including random noise) is observed, these estimates are updated using a weighted average,
with more weight being given to estimates with higher certainty. Because of the algorithm's recursive nature, it can
run in real time using only the present input measurements and the previously calculated state and its uncertainty
matrix; no additional past information is required.
It is a common misconception that the Kalman filter assumes that all error terms and measurements are Gaussian distributed. Kalman's original paper derived the filter using orthogonal projection theory to show that the covariance is minimized, and this result does not require the assumption that the errors are Gaussian. He then showed that
the filter yields the exact conditional probability estimate in the special case that all errors are Gaussian-distributed.
Extensions and generalizations to the method have also been developed, such as the extended Kalman filter and the
unscented Kalman filter which work on nonlinear systems. The underlying model is a Bayesian model similar to a
hidden Markov model but where the state space of the latent variables is continuous and where all latent and
observed variables have Gaussian distributions.
Naming and historical development
The filter is named after Hungarian émigré Rudolf E. Kálmán, although Thorvald Nicolai Thiele [1][2] and Peter Swerling developed a similar algorithm earlier. Richard S. Bucy of the University of Southern California contributed to the theory, leading to it often being called the Kalman–Bucy filter. Stanley F. Schmidt is generally credited with developing the first implementation of a Kalman filter. It was during a visit by Kalman to the NASA Ames Research Center that he saw the applicability of his ideas to the problem of trajectory estimation for the Apollo program, leading to its incorporation in the Apollo navigation computer. The Kalman filter was first described and partially developed in technical papers by Swerling (1958), Kalman (1960) and Kalman and Bucy (1961).
Kalman filters have been vital in the implementation of the navigation systems of U.S. Navy nuclear ballistic missile
submarines, and in the guidance and navigation systems of cruise missiles such as the U.S. Navy's Tomahawk
missile and the U.S. Air Force's Air Launched Cruise Missile. It is also used in the guidance and navigation systems
of the NASA Space Shuttle and the attitude control and navigation systems of the International Space Station.
This digital filter is sometimes called the Stratonovich–Kalman–Bucy filter because it is a special case of a more general, non-linear filter developed somewhat earlier by the Soviet mathematician Ruslan L. Stratonovich. [3][4][5][6] In fact, some of the special case linear filter's equations appeared in papers by Stratonovich that were published before summer 1960, when Kalman met with Stratonovich during a conference in Moscow.
Overview of the calculation
The Kalman filter uses a system's dynamics model (e.g., physical laws of motion), known control inputs to that
system, and multiple sequential measurements (such as from sensors) to form an estimate of the system's varying
quantities (its state) that is better than the estimate obtained by using any one measurement alone. As such, it is a
common sensor fusion and data fusion algorithm.
All measurements and calculations based on models are estimates to some degree. Noisy sensor data, approximations
in the equations that describe how a system changes, and external factors that are not accounted for introduce some
uncertainty about the inferred values for a system's state. The Kalman filter averages a prediction of a system's state
with a new measurement using a weighted average. The purpose of the weights is that values with better (i.e.,
smaller) estimated uncertainty are "trusted" more. The weights are calculated from the covariance, a measure of the
estimated uncertainty of the prediction of the system's state. The result of the weighted average is a new state
estimate that lies between the predicted and measured state, and has a better estimated uncertainty than either alone.
This process is repeated every time step, with the new estimate and its covariance informing the prediction used in
the following iteration. This means that the Kalman filter works recursively and requires only the last "best guess",
rather than the entire history, of a system's state to calculate a new state.
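In the scalar case this weighted average has a simple closed form (a standard special case; here $P$ is the prediction variance and $R$ the measurement variance):
$$K = \frac{P}{P + R}, \qquad \hat{x}_{new} = \hat{x}_{pred} + K\,(z - \hat{x}_{pred}), \qquad P_{new} = (1 - K)\,P.$$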
Because the certainty of the measurements is often difficult to measure precisely, it is common to discuss the filter's
behavior in terms of gain. The Kalman gain is a function of the relative certainty of the measurements and current
state estimate, and can be "tuned" to achieve particular performance. With a high gain, the filter places more weight
on the measurements, and thus follows them more closely. With a low gain, the filter follows the model predictions
more closely, smoothing out noise but decreasing the responsiveness. At the extremes, a gain of one causes the filter
to ignore the state estimate entirely, while a gain of zero causes the measurements to be ignored.
When performing the actual calculations for the filter (as discussed below), the state estimate and covariances are
coded into matrices to handle the multiple dimensions involved in a single set of calculations. This allows for
representation of linear relationships between different state variables (such as position, velocity, and acceleration) in
any of the transition models or covariances.
Example application
As an example application, consider the problem of determining the precise location of a truck. The truck can be
equipped with a GPS unit that provides an estimate of the position within a few meters. The GPS estimate is likely to
be noisy; readings 'jump around' rapidly, though always remaining within a few meters of the real position. In
addition, since the truck is expected to follow the laws of physics, its position can also be estimated by integrating its
velocity over time, determined by keeping track of wheel revolutions and the angle of the steering wheel. This is a
technique known as dead reckoning. Typically, dead reckoning will provide a very smooth estimate of the truck's
position, but it will drift over time as small errors accumulate.
In this example, the Kalman filter can be thought of as operating in two distinct phases: predict and update. In the
prediction phase, the truck's old position will be modified according to the physical laws of motion (the dynamic or
"state transition" model) plus any changes produced by the accelerator pedal and steering wheel. Not only will a new
position estimate be calculated, but a new covariance will be calculated as well. Perhaps the covariance is
proportional to the speed of the truck because we are more uncertain about the accuracy of the dead reckoning
estimate at high speeds but very certain about the position when moving slowly. Next, in the update phase, a
measurement of the truck's position is taken from the GPS unit. Along with this measurement comes some amount of
uncertainty, and its covariance relative to that of the prediction from the previous phase determines how much the
new measurement will affect the updated prediction. Ideally, if the dead reckoning estimates tend to drift away from
the real position, the GPS measurement should pull the position estimate back towards the real position but not
disturb it to the point of becoming rapidly changing and noisy.
Technical description and context
The Kalman filter is an efficient recursive filter that estimates the internal state of a linear dynamic system from a
series of noisy measurements. It is used in a wide range of engineering and econometric applications from radar and
computer vision to estimation of structural macroeconomic models, and is an important topic in control theory and
control systems engineering. Together with the linear-quadratic regulator (LQR), the Kalman filter solves the
linear-quadratic-Gaussian control problem (LQG). The Kalman filter, the linear-quadratic regulator and the
linear-quadratic-Gaussian controller are solutions to what arguably are the most fundamental problems in control
theory.
In most applications, the internal state is much larger (more degrees of freedom) than the few "observable"
parameters which are measured. However, by combining a series of measurements, the Kalman filter can estimate
the entire internal state.
In DempsterShafer theory, each state equation or observation is considered a special case of a linear belief function
and the Kalman filter is a special case of combining linear belief functions on a join-tree or Markov tree. Additional
approaches include Belief Filters which use Bayes or evidential updates to the state equations.
A wide variety of Kalman filters have now been developed, from Kalman's original formulation, now called the
"simple" Kalman filter, the KalmanBucy filter, Schmidt's "extended" filter, the information filter, and a variety of
"square-root" filters that were developed by Bierman, Thornton and many others. Perhaps the most commonly used
type of very simple Kalman filter is the phase-locked loop, which is now ubiquitous in radios, especially frequency
modulation (FM) radios, television sets, satellite communications receivers, outer space communications systems,
and nearly any other electronic communications equipment.
Underlying dynamic system model
The Kalman filters are based on linear dynamic systems discretized in the time domain. They are modelled on a
Markov chain built on linear operators perturbed by errors that may include Gaussian noise. The state of the system
is represented as a vector of real numbers. At each discrete time increment, a linear operator is applied to the state to
generate the new state, with some noise mixed in, and optionally some information from the controls on the system if
they are known. Then, another linear operator mixed with more noise generates the observed outputs from the true
("hidden") state. The Kalman filter may be regarded as analogous to the hidden Markov model, with the key
difference that the hidden state variables take values in a continuous space (as opposed to a discrete state space as in
the hidden Markov model). There is a strong duality between the equations of the Kalman Filter and those of the
hidden Markov model. A review of this and other models is given in Roweis and Ghahramani (1999)
[7]
and
Hamilton (1994), Chapter 13.
[8]
In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter. This means specifying the following matrices: $F_k$, the state-transition model; $H_k$, the observation model; $Q_k$, the covariance of the process noise; $R_k$, the covariance of the observation noise; and sometimes $B_k$, the control-input model, for each time-step k, as described below.
Figure: Model underlying the Kalman filter. Squares represent matrices. Ellipses represent multivariate normal distributions (with the mean and covariance matrix enclosed). Unenclosed values are vectors. In the simple case, the various matrices are constant with time, and thus the subscripts are dropped, but the Kalman filter allows any of them to change each time step.
The Kalman filter model assumes the true state at time k is evolved from the state at (k−1) according to
$$x_k = F_k x_{k-1} + B_k u_k + w_k$$
where
$F_k$ is the state transition model which is applied to the previous state $x_{k-1}$;
$B_k$ is the control-input model which is applied to the control vector $u_k$;
$w_k$ is the process noise which is assumed to be drawn from a zero mean multivariate normal distribution with covariance $Q_k$: $w_k \sim \mathcal{N}(0, Q_k)$.
At time k an observation (or measurement) $z_k$ of the true state $x_k$ is made according to
$$z_k = H_k x_k + v_k$$
where $H_k$ is the observation model which maps the true state space into the observed space and $v_k$ is the observation noise which is assumed to be zero mean Gaussian white noise with covariance $R_k$: $v_k \sim \mathcal{N}(0, R_k)$.
The initial state, and the noise vectors at each step $\{x_0, w_1, \ldots, w_k, v_1, \ldots, v_k\}$, are all assumed to be mutually independent.
Many real dynamical systems do not exactly fit this model. In fact, unmodelled dynamics can seriously degrade the filter performance, even when the filter was supposed to work with unknown stochastic signals as inputs. The reason is that the effect of unmodelled dynamics depends on the input, and can therefore drive the estimation algorithm to instability (it diverges). Independent white noise signals, on the other hand, will not make the algorithm diverge. The problem of separating measurement noise from unmodelled dynamics is difficult and is treated in control theory under the framework of robust control.
Details
The Kalman filter is a recursive estimator. This means that only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current state. In contrast to batch estimation techniques, no history of observations and/or estimates is required. In what follows, the notation $\hat{x}_{n|m}$ represents the estimate of $x$ at time n given observations up to, and including, time m ≤ n.
The state of the filter is represented by two variables:
$\hat{x}_{k|k}$, the a posteriori state estimate at time k given observations up to and including time k;
$P_{k|k}$, the a posteriori error covariance matrix (a measure of the estimated accuracy of the state estimate).
The Kalman filter can be written as a single equation; however, it is most often conceptualized as two distinct phases: "Predict" and "Update". The predict phase uses the state estimate from the previous timestep to produce an estimate
of the state at the current timestep. This predicted state estimate is also known as the a priori state estimate because,
although it is an estimate of the state at the current timestep, it does not include observation information from the
current timestep. In the update phase, the current a priori prediction is combined with current observation
information to refine the state estimate. This improved estimate is termed the a posteriori state estimate.
Typically, the two phases alternate, with the prediction advancing the state until the next scheduled observation, and
the update incorporating the observation. However, this is not necessary; if an observation is unavailable for some
reason, the update may be skipped and multiple prediction steps performed. Likewise, if multiple independent
observations are available at the same time, multiple update steps may be performed (typically with different observation matrices $H_k$).
Predict
Predicted (a priori) state estimate: $\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k$
Predicted (a priori) estimate covariance: $P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$
Update
Innovation or measurement residual: $\tilde{y}_k = z_k - H_k \hat{x}_{k|k-1}$
Innovation (or residual) covariance: $S_k = H_k P_{k|k-1} H_k^T + R_k$
Optimal Kalman gain: $K_k = P_{k|k-1} H_k^T S_k^{-1}$
Updated (a posteriori) state estimate: $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k$
Updated (a posteriori) estimate covariance: $P_{k|k} = (I - K_k H_k) P_{k|k-1}$
The formulas for the updated estimate and covariance above are only valid for the optimal Kalman gain. Usage of other gain values requires a more complex formula, found in the derivations section.
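To make the recursion concrete, here is a minimal Python sketch of the predict and update equations above for a truck-like system with a two-dimensional state (position and velocity) and noisy position measurements, in the spirit of the example application earlier. All matrices and numerical values are illustrative assumptions, not from the original text.

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])      # state-transition model
H = np.array([[1.0, 0.0]])                 # observation model: position only
Q = 0.01 * np.array([[dt**4 / 4, dt**3 / 2],
                     [dt**3 / 2, dt**2]])  # process-noise covariance
R = np.array([[4.0]])                      # observation-noise covariance

x = np.array([[0.0], [1.0]])               # initial state estimate [position, velocity]
P = np.eye(2)                              # initial estimate covariance

rng = np.random.default_rng(1)
truth = np.array([[0.0], [1.0]])
for k in range(50):
    truth = F @ truth                             # simulate the true system
    z = H @ truth + rng.normal(0.0, 2.0, (1, 1))  # noisy GPS-like measurement

    # Predict
    x = F @ x                          # a priori state estimate (no control input)
    P = F @ P @ F.T + Q                # a priori estimate covariance

    # Update
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # optimal Kalman gain
    x = x + K @ y                      # a posteriori state estimate
    P = (np.eye(2) - K @ H) @ P        # a posteriori estimate covariance

print("position estimate:", float(x[0, 0]), "truth:", float(truth[0, 0]))
```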
Invariants
If the model is accurate, and the values for $\hat{x}_{0|0}$ and $P_{0|0}$ accurately reflect the distribution of the initial state values, then the following invariants are preserved (all estimates have a mean error of zero):
$$E[x_k - \hat{x}_{k|k}] = E[x_k - \hat{x}_{k|k-1}] = 0, \qquad E[\tilde{y}_k] = 0$$
where $E[\xi]$ is the expected value of $\xi$, and covariance matrices accurately reflect the covariance of estimates