IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 56, NO. 5, OCTOBER 2007

Discrete Wavelet Transform Signal Analyzer


Pedro Henrique Cox and Aparecido Augusto de Carvalho

Abstract: This paper addresses the problem of processing biological data, such as cardiac beats in the audio and ultrasonic range, and of calculating wavelet coefficients in real time, with the processor clock running at the frequencies of present application-specific integrated circuits and field-programmable gate arrays. The parallel filter architecture for the discrete wavelet transform (DWT) has been improved, calculating the wavelet coefficients in real time with hardware reduced by up to 60%. The new architecture, which also processes the inverse DWT, is implemented with the Radix-2 or the Booth-Wallace constant multipliers. One integrated circuit signal analyzer in the ultrasonic range, including series memory register banks, is presented.
Index Terms: Asynchronous logic circuits, digital filters, digital signal processors, last in last out memory, logic design, sequential machines, signal analysis and synthesis.

I. INTRODUCTION

THE DISCRETE wavelet transform (DWT) algorithm [1], [2], [5] provides an efficient multiresolution subband coding representation in the time-scale plane. In each step, the signal is high-pass and low-pass filtered (Fig. 1). An algorithm for the calculation of the 1-D DWT is proposed in [1]. In this algorithm, the DWT coefficients of one level are calculated from the DWT coefficients of the previous level. The input data sequence $l^0$ has $N_0 = p\,2^J$ samples, where $p$ is an integer, and $J$ is the number of levels of the transform. Each decomposition level $j$, $1 \leq j \leq J$, can be seen as the further decomposition of the sequence $l^{j-1}$, which has $N_{j-1}$ samples, into two subbands $l^j$ and $h^j$, both with $N_j = N_{j-1}/2$ samples.
Such a decomposition is produced by two convolutions, followed by a decimation by two.
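As a minimal sketch of one decomposition step (not the authors' hardware), the following Python functions apply the two convolutions and the decimation by two using the indexing of (1). The names `dwt_level` and `dwt`, the zero padding for indices outside the sequence, and the two-tap averaging and differencing filters in the usage lines are illustrative assumptions; the paper uses eight-tap filters.

```python
def dwt_level(prev, a, c):
    """One DWT level: low[n] = sum_i a[i]*prev[2n-i], high[n] = sum_i c[i]*prev[2n-i]."""
    def sample(k):
        # Assumption: samples outside the sequence are treated as zero.
        return prev[k] if 0 <= k < len(prev) else 0.0

    half = len(prev) // 2  # decimation by two halves the number of samples
    low = [sum(a[i] * sample(2 * n - i) for i in range(len(a))) for n in range(half)]
    high = [sum(c[i] * sample(2 * n - i) for i in range(len(c))) for n in range(half)]
    return low, high


def dwt(signal, a, c, levels):
    """J-level DWT: returns (final low-pass band, [high-pass band of each level])."""
    low, details = list(signal), []
    for _ in range(levels):
        low, high = dwt_level(low, a, c)
        details.append(high)
    return low, details


# Hypothetical two-tap filters, chosen only to keep the arithmetic easy to follow.
a = [0.5, 0.5]
c = [0.5, -0.5]
low, details = dwt([2.0, 4.0, 6.0, 8.0], a, c, levels=2)
```

Each level feeds only its low-pass output to the next level, which is why the high-pass bands are collected per level while a single low-pass band remains at the end.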
In (1), $a_i$ and $c_i$ denote the coefficients of the low-pass $L_j$ and high-pass $H_j$ (Fig. 1) $M$-tap filters, and $l_n^j = 0$ for $n < 0$ and $n \geq J$

\begin{align}
l_n^j &= \sum_{i=0}^{M-1} a_i\, l_{2n-i}^{j-1}, \qquad 0 \leq n \leq N_j - 1 \tag{1a} \\
h_n^j &= \sum_{i=0}^{M-1} c_i\, l_{2n-i}^{j-1}, \qquad 0 \leq n \leq N_j - 1. \tag{1b}
\end{align}

Fig. 1. Signal analysis and synthesis for a J-level DWT/IDWT.

For computing the DWT coefficients of the input discrete-time data, it is assumed that the input data represent the DWT coefficients of a higher resolution level. Coefficients of subsequent levels are obtained from (1). Hence, the DWT extracts information from the signal at different scales. The first level of wavelet decomposition extracts the high-frequency components of the signal, while the second and all subsequent wavelet decompositions extract, progressively, lower frequency components. A few levels are enough to have a good approximation of the signal with discrete wavelet coefficients. The four-level 1-D DWT low-pass wavelet coefficients for an eighth-order filter are presented in (2). Numerical equations for the high-pass direct and 1-D inverse discrete wavelet transform (IDWT) filters are obtained from (1b) and from (3a) and (3b), which follow later.

\begin{align}
l_1(0) &= a_0 l_0(0) + a_1 l_0(-1) + a_2 l_0(-2) + a_3 l_0(-3) + a_4 l_0(-4) + a_5 l_0(-5) + a_6 l_0(-6) + a_7 l_0(-7) \tag{2a} \\
l_1(2) &= a_0 l_0(2) + a_1 l_0(1) + a_2 l_0(0) + a_3 l_0(-1) + a_4 l_0(-2) + a_5 l_0(-3) + a_6 l_0(-4) + a_7 l_0(-5) \tag{2b} \\
l_1(4) &= a_0 l_0(4) + a_1 l_0(3) + a_2 l_0(2) + a_3 l_0(1) + a_4 l_0(0) + a_5 l_0(-1) + a_6 l_0(-2) + a_7 l_0(-3) \tag{2c} \\
l_1(6) &= a_0 l_0(6) + a_1 l_0(5) + a_2 l_0(4) + a_3 l_0(3) + a_4 l_0(2) + a_5 l_0(1) + a_6 l_0(0) + a_7 l_0(-1) \tag{2d} \\
l_1(8) &= a_0 l_0(8) + a_1 l_0(7) + a_2 l_0(6) + a_3 l_0(5) + a_4 l_0(4) + a_5 l_0(3) + a_6 l_0(2) + a_7 l_0(1) \tag{2e} \\
l_1(10) &= a_0 l_0(10) + a_1 l_0(9) + a_2 l_0(8) + a_3 l_0(7) + a_4 l_0(6) + a_5 l_0(5) + a_6 l_0(4) + a_7 l_0(3) \tag{2f} \\
l_1(12) &= a_0 l_0(12) + a_1 l_0(11) + a_2 l_0(10) + a_3 l_0(9) + a_4 l_0(8) + a_5 l_0(7) + a_6 l_0(6) + a_7 l_0(5) \tag{2g} \\
l_1(14) &= a_0 l_0(14) + a_1 l_0(13) + a_2 l_0(12) + a_3 l_0(11) + a_4 l_0(10) + a_5 l_0(9) + a_6 l_0(8) + a_7 l_0(7) \tag{2h} \\
l_2(0) &= a_0 l_1(0) + a_1 l_1(-2) + a_2 l_1(-4) + a_3 l_1(-6) + a_4 l_1(-8) + a_5 l_1(-10) + a_6 l_1(-12) + a_7 l_1(-14) \tag{2i} \\
l_2(4) &= a_0 l_1(4) + a_1 l_1(2) + a_2 l_1(0) + a_3 l_1(-2) + a_4 l_1(-4) + a_5 l_1(-6) + a_6 l_1(-8) + a_7 l_1(-10) \tag{2j} \\
l_2(8) &= a_0 l_1(8) + a_1 l_1(6) + a_2 l_1(4) + a_3 l_1(2) + a_4 l_1(0) + a_5 l_1(-2) + a_6 l_1(-4) + a_7 l_1(-6) \tag{2k} \\
l_2(12) &= a_0 l_1(12) + a_1 l_1(10) + a_2 l_1(8) + a_3 l_1(6) + a_4 l_1(4) + a_5 l_1(2) + a_6 l_1(0) + a_7 l_1(-2) \tag{2l} \\
l_3(0) &= a_0 l_2(0) + a_1 l_2(-4) + a_2 l_2(-8) + a_3 l_2(-12) + a_4 l_2(-16) + a_5 l_2(-20) + a_6 l_2(-24) + a_7 l_2(-28) \tag{2m} \\
l_3(8) &= a_0 l_2(8) + a_1 l_2(4) + a_2 l_2(0) + a_3 l_2(-4) + a_4 l_2(-8) + a_5 l_2(-12) + a_6 l_2(-16) + a_7 l_2(-20) \tag{2n} \\
l_4(0) &= a_0 l_3(0) + a_1 l_3(-8) + a_2 l_3(-16) + a_3 l_3(-24) + a_4 l_3(-32) + a_5 l_3(-40) + a_6 l_3(-48) + a_7 l_3(-56). \tag{2o}
\end{align}

Manuscript received February 15, 2006; revised August 31, 2006.
P. H. Cox is with DEL CCET UFMS, Departamento de Engenharia Elétrica, Universidade Federal de Mato Grosso do Sul, Cidade Universitária, 79070-900 Campo Grande, Brazil (e-mail: phcox@del.ufms.br).
A. A. de Carvalho is with the DEE FEIS UNESP, Departamento de Engenharia Elétrica, Universidade do Estado de São Paulo, Faculdade de Engenharia de Ilha Solteira, 15385-000 Ilha Solteira, Brazil (e-mail: aac@dee.feis.unesp.br).
Digital Object Identifier 10.1109/TIM.2007.894797
0018-9456/$25.00 © 2007 IEEE

COX AND DE CARVALHO: DISCRETE WAVELET TRANSFORM SIGNAL ANALYZER

Fig. 2. Dyadic wavelet bands for J = 4 and sampling frequency $f_s$.

To reconstruct the analyzed signal with (3) below, the 1-D DWT coefficients are upsampled and inverse filtered with transfer functions $\tilde{L}_j$ and $\tilde{H}_j$ (Fig. 1) and coefficient sets $a$ and $c$

\begin{align}
l_n^j &= \sum_{i=0}^{M/2-1} a_{2i}\, l_{2n-i}^{j+1} + \sum_{i=0}^{M/2-1} c_{2i}\, h_{2n-i}^{j+1}, \qquad 1 \leq n \leq N_j \tag{3a} \\
l_{n+1}^j &= \sum_{i=0}^{M/2-1} a_{2i+1}\, l_{2n-i}^{j+1} + \sum_{i=0}^{M/2-1} c_{2i+1}\, h_{2n-i}^{j+1}, \qquad 1 \leq n \leq N_j. \tag{3b}
\end{align}

This analysis of the signal is done with power-of-two, or dyadic, bands. For computing the DWT coefficients of the input discrete-time data, it is considered that the input data represent the DWT coefficients of a higher resolution level. Coefficients of subsequent levels are obtained from (1). Hence, the DWT extracts information from the signal at different scales. The first level of wavelet decomposition extracts the high-frequency components of the signal, while the second and all subsequent wavelet decompositions extract, progressively, lower frequency components. A few levels are enough to have a good approximation of the signal with discrete wavelet coefficients. A schematic for a four-level DWT and the dyadic wavelet spectrum are shown in Fig. 2 (low-pass wavelet level $l_4$, high-pass wavelet levels $h_4$, $h_3$, $h_2$, and $h_1$, and sampling frequency $f_s$). The DWT has a very wide array of applications, such as biomedicine [3], signal processing, ultrasonic analysis [4], speech compression, numerical analysis, and statistics.
An original common architecture for the DWT and the IDWT is designed, and a logic circuit for data compression is presented. This circuit controls compression rates in compression schemes that use the Percent of Root-mean-square Difference (PRD) control index.
A signal analyzer is designed with one DWT module, one IDWT module, and delay memory. It was simulated with three and four levels to evaluate precision for several word lengths of data samples and filter coefficients.
To process high-frequency signals, combinational multipliers are required. The Booth multiplier implements a parallel-serial multiplier with additional control logic but requires fewer operations. A combinational version is designed. The proposed multiplier is faster, needs less area, and is simple to implement. With this high-efficiency processing element, the signal analyzer frequency range is extended to one-eighth of the processor clock frequency.

Fig. 3. Asynchronous folded parallel filter architecture with four levels.

Fig. 4. Asynchronous control logic.

Fig. 5. High-pass and low-pass FIRs with eight coefficients.

One four-level asynchronous folded parallel filter architecture (AFPFA) with the Radix-2 multiplier and eight filter coefficients has been implemented in very high speed integrated circuit hardware description language (VHDL).
This paper presents an original hardware signal analyzer (1-D coder and decoder) for CMOS and FPGA integrated circuits. The PRD quality criterion, used here to evaluate the precision of the DWT and IDWT processing modules, is one of the most widely adopted in data compression algorithms today [12]. The wavelet coder/decoder general architecture presented in [11] has a low-frequency response band and shows some graphs of simulation results. Selected wavelet band synthesizers such as the one in [9] employ a digital signal processor board connected to a VXI standard interface to process signals in the power line frequency range with a three-level DWT. The most important achievement of this paper is the perfect synthesis architecture for the DWT algorithm with any number of levels. Synthesized data are obtained with a precision that depends on the word length of the filter coefficients and input data. This paper offers a solution for implementing data compression algorithms with integrated circuits. Equations for perfect synthesis are implemented in the DWT and IDWT algorithms of the signal analyzer. The precision of the signal analyzer was evaluated by quantizing the data and filter coefficients for $n$-bit word processing ($n = 4 + 4i$, $1 \leq i \leq 7$). With synchronous input and constant processing elements, real-time analysis and synthesis is assured for signal sampling in the megahertz range.
II. ASYNCHRONOUS FOLDED PARALLEL FILTER ARCHITECTURE
The parallel filter architecture is optimal with respect to both area and computing time [5]. For each $N$ data samples, $N$ wavelet coefficients are output. It has a simple register allocation scheme and two filters, with high processor efficiency. The proposed architecture has only one filter to calculate both low-pass and high-pass wavelet coefficients in each algorithm step. Real-time transform is achieved with two clocks: the data sampling clock and the processor clock. The ratio between the two clocks is a real number. The new design (Fig. 3) employs an asynchronous control circuit (Fig. 4) rather than the classical approach presented in [6]. With asynchronous control logic (ACL), the maximum sampling frequency is $f_p/2m$, where $f_p$ is the processor clock frequency, and $m$ is the number of processor cycles in each step.
Wavelet coefficients are obtained by multiplying $M$ samples by $M$ coefficients in an $M$-tap finite impulse response (FIR) digital filter. The example in Fig. 5 with $M = 8$ illustrates the operation of an FIR filter. In this multiply-and-add unit, eight $n$-bit data samples or wavelet coefficients are multiplied by the $n$-bit filter coefficients $c_i$, $0 \leq i \leq 7$, to give an $n$-bit high-pass wavelet coefficient, and by the $n$-bit coefficients $a_i$, $0 \leq i \leq 7$, to give an $n$-bit low-pass wavelet coefficient. For each data sample, two wavelet coefficients are calculated, and the result is output to a bidirectional bus with the recursive pyramid algorithm (RPA) [7].
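The multiply-and-add unit can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' VHDL: operands are taken as unsigned integers, the number of taps is a power of two, and the two details borrowed from the paper's later description of the 8-bit implementation are that only the most significant half of each product is kept and that each adder stage halves its sum. The function name `multiply_and_add` is hypothetical.

```python
def multiply_and_add(samples, coeffs, bits=8):
    """Fixed-point M-tap multiply-and-add: truncate products, then average pairwise."""
    # Each n-bit by n-bit product has 2n bits; keep only the n most significant bits.
    level = [(x * k) >> bits for x, k in zip(samples, coeffs)]
    adders_per_stage = []
    while len(level) > 1:
        adders_per_stage.append(len(level) // 2)
        # One adder per pair; the right shift divides each sum by two.
        level = [(level[i] + level[i + 1]) >> 1 for i in range(0, len(level), 2)]
    return level[0], adders_per_stage


result, stages = multiply_and_add([16, 32, 48, 64, 80, 96, 112, 128], [128] * 8)
```

With eight taps the tree has three stages, so the three halvings together divide the sum of products by eight while every intermediate value stays within the operand word length.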
A. Recursive Pyramid Algorithm
The RPA is a running algorithm for computing the DWT. It can be implemented very efficiently on an FPGA, without any buffering. The RPA outputs high-pass wavelet coefficients and stores or outputs low-pass wavelet coefficients at previously defined clock cycles. The goals of the RPA are:
1) real-time DWT performance;
2) input data at a uniform rate;
3) minimization of storage.

TABLE I
WAVELET COEFFICIENTS REGISTER ALLOCATION AND ROUTING ON DWT

Fig. 6. Dyadic sampling grid for the DWT.

The RPA allows a DWT computation in real time, with $M \log N - M$ cells of storage, where $M$ is the number of filter coefficients and $N = 2^J$, where $J$ is the number of levels. It consists of rearranging the order of the $N$ outputs such that each output is scheduled at the earliest instance at which it can be scheduled. The earliest instance is decided based on a strict precedence relation, i.e., if the earliest instance of the $i$th octave clashes with that of the $(i + 1)$th octave, then the $i$th octave is scheduled. A simple way of obtaining this output schedule is to consider the sampling grid for the DWT output, which is shown in Fig. 6. Now, push (up or down) all the horizontal lines of samples until they form a single line. The order of the outputs obtained in this manner gives the output schedule. The basic idea behind the RPA is to linearize the pyramid schedule without increasing the dependencies between octaves.
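One way to picture the linearized schedule is the short Python sketch below. It assumes the slot assignment in which octave $j$ is computed on the cycles whose index, written in binary, ends in exactly $j - 1$ one bits; this reproduces the slots $2k$, $4k + 1$, $8k + 3$, and $16k + 7$ that the paper lists in its Timing subsection. The function name `rpa_schedule` is hypothetical.

```python
def rpa_schedule(num_cycles, levels):
    """Octave computed at each input cycle (1-based), or None for an idle slot."""
    schedule = []
    for t in range(num_cycles):
        octave = 1
        # Octave j owns the cycles whose binary form ends in exactly j-1 one bits,
        # which interleaves the octaves so that no two of them share a cycle.
        while t & 1:
            octave += 1
            t >>= 1
        schedule.append(octave if octave <= levels else None)
    return schedule


first_period = rpa_schedule(16, levels=4)
```

For four levels, one slot in every sixteen stays idle, because it would belong to a fifth octave that the transform does not compute.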
B. Control Logic Units
AFPFA performs the DWT at the input data frequency. In FPGA implementations, the running clock for the circuitry may be much higher than the input data sampling frequency. Data are input to the communication bus with the data sampling clock, and the multipliers process data with the processor clock. The processor clock must be $2n + 4$ times faster than the data clock for the Radix-2 multiplier, where $n$ is the number of bits in the operation, and six times faster for the Booth-Wallace constant multiplier. There are three simple control logics in AFPFA, instead of one that depends directly on the number of taps and transform levels [8]: the asynchronous control logic, to input data samples and filter coefficients; the processor control logic, for the sequential multiplier; and the transform control logic (TCL), for the algorithm.
The asynchronous control logic (Fig. 4) synchronizes the data sampling clock and the processor clock in each step of the DWT algorithm. After reset, filter coefficients are loaded to the set of Radix-2 multipliers, and the processor is ready. For each data sample input to AFPFA, the ACL detects the first valid positive transition on the processor clock, and the processor control logic performs one multiply-and-add operation.
The TCL performs two basic operations in each DWT algorithm step. The schedule with $M = 8$ and four levels is shown in Table I. The operations are:
1) To select input data or a set of low-pass wavelet coefficients from the register banks for the multiply-and-add unit, addressing the register bank multiplexer (RBM).
2) To output a detail (high-pass) wavelet coefficient and store or output an approximation (low-pass) wavelet coefficient with the RPA.
C. Filter
The filter unit is an $M$-tap nonrecursive FIR digital filter, whose transfer functions are shown in (4) and (5), where $c_i$ and $a_i$, $0 \leq i \leq 7$, are the coefficients for the high-pass and low-pass bands, respectively. Computation of a DWT coefficient is done with one multiply-and-add operation for each data sampling clock cycle. In each DWT level, high-pass and low-pass wavelet coefficients are calculated. The performance of this filter is enhanced with the Booth-Wallace constant multiplier.

\begin{align}
C(z) &= c_0 + c_1 z^{-1} + c_2 z^{-2} + c_3 z^{-3} + c_4 z^{-4} + c_5 z^{-5} + c_6 z^{-6} + c_7 z^{-7} \tag{4} \\
A(z) &= a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3} + a_4 z^{-4} + a_5 z^{-5} + a_6 z^{-6} + a_7 z^{-7}. \tag{5}
\end{align}

D. Storage and Multiplexing


The input register bank (IRB) and the coefficient register banks (CRB$_1$, ..., CRB$_{J-1}$) store the input data samples and the calculated wavelet coefficients for each level, respectively.
The registers are sets of positive-edge-triggered D-type flip-flops. At the output, the RBM selects one of these four register banks to input data to the high-pass and low-pass filters (Fig. 3).
In our example, $M = 8$ and $J = 4$, which means that eight consecutive input samples are required to output each coefficient at the first level. The IRB has eight data registers connected serially. Data are output to the filters whenever an even data sample arrives.
Results from low-pass filtering are processed further when calculating level 2 wavelet coefficients. It is necessary to store eight coefficients from the previous level to calculate one coefficient at the present level. These results are stored in CRBs 1, 2, and 3 for levels 2, 3, and 4, respectively.
When designing with FPGAs, a data bus implemented with a high-impedance state for several tied output registers is not available, and data sources must be multiplexed. To calculate one coefficient at a level $j$, $1 \leq j \leq 4$, IRB, CRB$_1$, CRB$_2$, or CRB$_3$ is selected. Data are loaded to the filters at positive data clock transitions.
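The behaviour of the serially connected input registers can be sketched in Python with a fixed-length queue. This is only a software analogy for the flip-flop bank: the function name `stream_first_level`, the use of `collections.deque`, and the two-tap filters in the usage lines are assumptions made for illustration, whereas the paper's IRB holds eight samples.

```python
from collections import deque


def stream_first_level(samples, a, c):
    """Feed samples through a serial register bank; filter on every even sample."""
    taps = len(a)
    irb = deque([0.0] * taps, maxlen=taps)  # irb[0] holds the newest sample
    outputs = []
    for t, x in enumerate(samples):
        irb.appendleft(x)  # shift: the oldest sample falls off the far end
        if t % 2 == 0:     # data go to the filters on even-numbered samples
            low = sum(ai * xi for ai, xi in zip(a, irb))
            high = sum(ci * xi for ci, xi in zip(c, irb))
            outputs.append((low, high))
    return outputs


# Hypothetical two-tap filters, used only to keep the example short.
pairs = stream_first_level([2.0, 4.0, 6.0, 8.0], [0.5, 0.5], [0.5, -0.5])
```

The same pattern would apply to each coefficient register bank, with the low-pass outputs of one level shifted into the bank that feeds the next level.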
E. Timing
The first octave computations are scheduled at every even data sampling clock cycle, $2k$. Second octave computations are executed in clock cycles $4k + 1$. Third octave computations are done at clock cycles $8k + 3$, and the final, fourth octave computations are done at clock cycles $16k + 7$. The delay to present the first results is the period, in data sampling clock cycles, needed to fill up CRB$_3$ for the first time, in addition to the number of periods needed to compute the next fourth-level wavelet coefficient. The first results were output after 71 data sampling clock cycles.

TABLE II
FILTER COEFFICIENTS SETS

Fig. 7. Common architecture for DWT and IDWT.
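The 71-cycle figure can be checked by counting schedule slots, as in the Python sketch below. It assumes that octave $j$ is computed at cycles $2^j k + 2^{j-1} - 1$ (the four cases given in the Timing subsection), that every third-octave slot from $k = 0$ onward stores one coefficient in the last register bank, and that at least two levels are used. The function name `first_result_cycle` is hypothetical.

```python
def first_result_cycle(taps=8, levels=4):
    """Data-clock cycle at which the first last-level coefficient can be computed."""
    # Octave j is computed at cycles 2**j * k + 2**(j-1) - 1, k = 0, 1, 2, ...
    prev_period, prev_offset = 2 ** (levels - 1), 2 ** (levels - 2) - 1
    last_period, last_offset = 2 ** levels, 2 ** (levels - 1) - 1
    # The bank feeding the last octave is full after `taps` results of the previous octave.
    bank_full = prev_offset + (taps - 1) * prev_period
    # First slot of the last octave at or after that cycle (ceiling division).
    k = max(0, -(-(bank_full - last_offset) // last_period))
    return last_offset + k * last_period


latency = first_result_cycle(taps=8, levels=4)
```

For eight taps and four levels, the last bank is full at cycle 59 and the next fourth-octave slot is cycle 71, which agrees with the delay reported in the text.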

III. COMMON ARCHITECTURE FOR ANALYSIS AND SYNTHESIS
To analyze a signal with the DWT consists of calculating two coefficients: the outputs of a high-pass and a low-pass finite impulse response (FIR) filter. To synthesize a signal consists of inverse filtering the two output signals from the low-pass and high-pass FIRs, with two sets of odd and even inverse filter coefficients for each set of input samples. Slight modifications to AFPFA are necessary to implement the synthesis module: each CRB is split into a set of two, and the synchronization accesses the same set of coefficients twice to calculate subsequent reconstructed data. For the inverse operation, the inverse RPA (IRPA) is used. The IRPA is structurally similar to the RPA. In each step, both wavelet coefficient sequences are upsampled by inserting zeros. The inverse filtering and add operations are done with two sets of even and odd coefficients. The register sets IRB, CRB$_1$, CRB$_2$, and CRB$_3$ for the DWT are split in two for the IDWT, and multiplexers B$_0$, B$_1$, B$_2$, and B$_3$ are inserted (Fig. 7) to form the common architecture for DWT and IDWT. When $D = 1$, the outputs of BL$_i$ are connected to the inputs of BH$_i$, $i = 0, 1, \ldots, 3$, and eight data samples or low-pass wavelet coefficients from sets IRB, CRB$_1$, CRB$_2$, or CRB$_3$ in Fig. 3 are selected on the RBM multiplexer for high-pass or low-pass filtering. When $D = 0$, four high-pass wavelet coefficients from BH$_i$ and four low-pass reconstructed wavelet coefficients from BL$_i$, $i = 0, 1, \ldots, 3$, are selected on the RBM and multiplied by the even and odd coefficient sets to synthesize wavelet coefficients and data. Table II presents the filter coefficient set selection with control lines $D$ and $F$. The control line $D$ defines the architecture mode: direct (DWT) or IDWT. The control line $F$ chooses high-pass or low-pass filters in DWT mode and even or odd inverse filtering coefficient sets in IDWT mode.
In IDWT mode, the common architecture performs two basic operations in each step. The schedule with $M = 8$ and four levels is shown in Table III. The operations are:
1) To select input high-pass wavelet coefficients and low-pass reconstructed wavelet coefficients from the register banks for the multiply-and-add unit, addressing the RBM.
2) To store a reconstructed low-pass wavelet coefficient and output reconstructed data samples with the IRPA.
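The split into even and odd coefficient sets follows from a general identity: after zeros are inserted between coefficients, every other filter tap multiplies a zero, so each output sample needs only half of the taps. The Python sketch below checks that identity numerically. It is not the authors' circuit; the function names and the integer test values are hypothetical, and samples outside the sequence are assumed to be zero.

```python
def _at(seq, k):
    """Sample k of seq, with zero assumed outside the sequence."""
    return seq[k] if 0 <= k < len(seq) else 0


def upsample_and_filter(seq, g):
    """Insert a zero after every sample, then apply the full filter g."""
    up = []
    for v in seq:
        up.extend([v, 0])
    return [sum(g[i] * _at(up, m - i) for i in range(len(g))) for m in range(len(up))]


def even_odd_filter(seq, g):
    """Same result without upsampling: even taps give sample 2n, odd taps give 2n+1."""
    even, odd = g[0::2], g[1::2]
    out = []
    for n in range(len(seq)):
        out.append(sum(even[i] * _at(seq, n - i) for i in range(len(even))))
        out.append(sum(odd[i] * _at(seq, n - i) for i in range(len(odd))))
    return out


direct = upsample_and_filter([1, 2, 3], [1, 2, 3, 4])
split = even_odd_filter([1, 2, 3], [1, 2, 3, 4])
```

Both routes give the same output, but the second uses half as many multiplications per output sample, which is consistent with the text selecting four high-pass and four low-pass coefficients per step in IDWT mode.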

IV. SIGNAL ANALYZER
The signal analyzer consists of one DWT module, additional memory to delay the detail wavelet coefficients, and one IDWT module (Fig. 8). With the Radix-2 multiplier, a programmable analytic wavelet is provided, with real-time performance on biological signals and audio. With the Booth-Wallace constant multiplier, very fast processing times improve the performance, and real time is achieved in the audio and ultrasonic range. Additional memory is provided by two or three sets of series registers for the three- or four-level wavelet transform, respectively. To measure the performance with a simple difference index, only one set of memory registers to delay the input data stream is included. This parameter is calculated by subtracting the reconstructed data from the delayed input data and is an indication of quality in compression algorithms.
A. Wavelet Coefficient Selection
Several algorithms for ECG compression set minimum values for the wavelet coefficients at each level. When the last calculated wavelet coefficient is lower than the respective level limit (Fig. 9), it is set to zero to improve the bit compression rate. The efficiency of the coefficient selection is measured by the difference index in the signal analyzer (Fig. 8). The calculation of the PRD index (6), where $f(i)$ is a data sample and $\tilde{f}(i)$ is a reconstructed data sample, is accomplished with the inclusion of two adders, two multipliers, and one divider.

$$\mathrm{PRD} = \left[ \frac{\sum_{i=1}^{n} \left( f(i) - \tilde{f}(i) \right)^2}{\sum_{i=1}^{n} f^2(i)} \right]^{1/2} \times 100\%. \tag{6}$$
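The coefficient selection rule and the PRD index (6) can be sketched in a few lines of Python. The function names are hypothetical, and comparing the magnitude of a coefficient with the level limit is an assumption made here for signed coefficients; the text only says that a coefficient lower than the limit is set to zero.

```python
import math


def select_coefficient(value, level_limit):
    """Zero a coefficient whose magnitude is below its level limit (assumed rule)."""
    return value if abs(value) >= level_limit else 0


def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length sequences."""
    num = sum((f - g) ** 2 for f, g in zip(original, reconstructed))
    den = sum(f ** 2 for f in original)
    return 100.0 * math.sqrt(num / den)


kept = [select_coefficient(v, 5) for v in [3, -7, 5, -2]]
index = prd([3.0, 4.0], [3.0, 3.0])
```

A perfect reconstruction gives a PRD of 0%, and a reconstruction that is identically zero gives 100%, so lower values indicate better quality.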

TABLE III
WAVELET COEFFICIENTS REGISTER ALLOCATION AND ROUTING ON IDWT

Fig. 8. Signal analyzer memory allocation. Four levels.

Fig. 9. Coefficient selection.

B. Booth-Wallace Constant Multiplier
The Booth logic multiplier makes use of control logic to reduce the number of add operations in the serial-parallel multiplier. The Wallace tree multiplier is a combinational adder; there is one adder for each bit in the multiplicand. The Booth-Wallace tree multiplier is a multiplier that requires complex control logic and a series of adders/subtracters. The Booth-Wallace constant multiplier is an improved version of the Booth-Wallace tree multiplier, which is proposed for audio and ultrasonic analysis and synthesis in AFPFA. Orthogonal quadrature mirror filters such as Daubechies and Symlets are implemented in hardware. The data input to the filter is multiplexed, and the signs of some coefficients are changed between low-pass and high-pass filtering, or between even and odd reconstruction.

V. EXPERIMENTAL RESULTS
FPGA prototyping tools reduce development time to a minimum. Reconfigurable processors are viable platforms for a broad range of specialized applications such as DWT algorithms. Other DWT algorithms have been implemented using CMOS technology [2], [5] or a DSP-based architecture [9]. The AFPFA has been implemented in VHDL. Numerical equations define the low-pass and high-pass eighth-order filter operations. The Radix-2 multiplier was implemented first, the eight-tap filter was developed next, and then one four-level DWT algorithm was implemented.
A. Radix-2 Multiplier and Filter
Filters are programmed after reset, or at any time, with M0 = 1 (Fig. 3). To load one data sample, M0 = 0 is set with the data on CB, and FCLK is pulsed. In each operation, one wavelet coefficient is calculated with cp and cpnsr. The processor clock cp shows one new SR bit at the positive transition. At the negative transition, one new partial sum AND + PR2 is stored in PR2. The processor cycle is shown in Fig. 10.

Fig. 10. Timing for 8-bit Radix-2 multiplier.

Multiplying two 8-bit operands gives a 16-bit result, with most and least significant bits. When calculating the DWT, wavelet coefficients are stored to be processed as operands in subsequent filtering, so only the eight MSBs of the result are stored. To calculate one wavelet coefficient, each multiply result is added to seven others, and the sum is divided by eight (Fig. 5). This division is easily accomplished with four, two, and one adders for stages 1, 2, and 3, respectively: in each stage, each sum is divided by two. To multiply numbers with 12 or 16 bits, the increase in FPGA area is minimal. Area is O(4n), which is proportional to the number of bits n for data and wavelet coefficients.
B. DWT Algorithm
Control lines sel1 and sel0 select one of the four register banks for filter input data in the multiplexer RBM. Control lines FCLK2, FCLK3, and FCLK4 store wavelet coefficients at the end of each processor cycle (Fig. 11). Table IV presents the control lines for the output registers. The external control line read data is accessed to send data samples to IRB or wavelet coefficients to CB (Fig. 3).
In this paper, we have presented the VHDL implementation of a DWT architecture for real-time processing with minimum area. FPGAs such as the ACEX EP1K50 have the density and speed to implement complex algorithms directly in hardware. The 8-bit four-level VHDL AFPFA requires about 1630 logic cells (56%). The clock has been set to 30 MHz. The implementation defines bit-to-bit control lines, data buses, and high-level digital system design. The DWT with different analytic wavelets is performed during operation. It was first developed for real-time analysis and compression of biological signals such as ECG. Due to its outstanding performance, the AFPFA processes audio and ultrasonic signals up to 450 kHz. To extend the frequency response to the megahertz range, the Radix-2 multiplier is replaced by the Booth-Wallace constant multiplier, which is an improved version of the Booth-Wallace tree multiplier.
C. Signal Analyzer Evaluation

Numerical equations have been written for synthesizing wavelet coefficient streams and data. The signal analyzer, which calculates two sets of coefficients, has been simulated with three and four levels to evaluate the precision of the synthesized signal. In the first set of steps, the DWT is obtained by calculating the coefficients from the input data. In the second set of steps, the IDWT is calculated to reconstruct the original data from the DWT coefficients.
The performance is evaluated with the x lead from three leads of ECG data. The PRD index, which is usually adopted to measure compression quality instead of the signal-to-noise ratio [2], is calculated. Table V presents the PRD index of the synthesized data for 8-, 12-, 16-, 20-, 24-, 28-, and 32-bit input data and filter coefficients and fixed-point processing elements. Two architectures are evaluated: one with three levels and the other with four. The precision of the DWT module alone, with fixed-point processing elements, is measured by synthesizing the DWT wavelet coefficients with a floating-point ALU in the IDWT module and then calculating the difference between the original and the reconstructed data. Three- and four-level IDWT precisions calculated in this manner, in a 32-bit microcomputer, are $1.58 \times 10^{-10}$% and $2.39 \times 10^{-10}$%, respectively. Each measured PRD index is the mean value for 20 ECGs.
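To make the word-length dependence concrete, the Python sketch below lists the word lengths $n = 4 + 4i$ used in the evaluation and quantizes a value to an $n$-bit grid. It is a simplified illustration, not the procedure behind Table V: the signed fixed-point format with values in $[-1, 1)$, round-to-nearest, and the function names are all assumptions.

```python
def word_lengths():
    """Word lengths n = 4 + 4i for i = 1..7, i.e., 8 to 32 bits."""
    return [4 + 4 * i for i in range(1, 8)]


def quantize(value, bits):
    """Round a value in [-1, 1) to a signed fixed-point grid with `bits` bits (assumed format)."""
    scale = 2 ** (bits - 1)
    # Clamp to the representable integer codes of a two's-complement word.
    code = max(-scale, min(scale - 1, round(value * scale)))
    return code / scale


errors = {n: abs(quantize(0.3, n) - 0.3) for n in word_lengths()}
```

Under these assumptions the rounding error of a single value is bounded by $2^{-n}$, so each additional four bits tightens the bound by a factor of 16.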
VI. CONCLUSION
In the last five years, nothing has been published about common architectures or wavelet signal analyzers. This paper presents, in detail, an original common architecture for the DWT or IDWT and the first wavelet signal analyzer.
In this paper, the IRPA is implemented, including the output reconstructed data in the calculations with one processor. The folded common architecture presented in [10], scheduled with the IRPA, requires twice the number of filters and buffers to calculate the IDWT. The VXI signal analyzer presented in [9] performs only power line analysis with a DSP. The FPGA coder/decoder implementation in [11] only presents simulations for known DWT and IDWT algorithms.
The asynchronous architecture proposed in this paper has a flexible design. The number of levels of the DWT/IDWT is changed without affecting the algorithm state chart. The control logic is the same; only the size and the number of memory buffers are changed. The asynchronous feature improves the processing speed on biological signals and audio. On an FPGA with Radix-2 processing elements, the analytic wavelet may be software configured. With this processor, the hardware required for a complete signal analyzer is minimized. The implementation with Radix-2 multipliers reduces the hardware by up to 60% relative to the implementation with Wallace multipliers.
Depending on the signal frequency response, the required circuit area is reduced. For example, for classifying ECG data in a signal analyzer, only one arithmetic unit implemented with Radix-2 processing elements is required, instead of the four needed when implementing a signal analyzer with the same number of levels with Booth-Wallace constant processing elements in the ultrasonic range. The PRD precision for this four-level integrated circuit signal analyzer is 0.043% on 16-bit input data and filter coefficients.

Fig. 11. DWT algorithm timing for filter coefficients write and first states 010.

TABLE IV
FOUR-LEVEL ALGORITHM CONTROL LOGIC FOR WAVELET COEFFICIENTS STORAGE ON DWT

TABLE V
SIGNAL ANALYZER PRECISION

REFERENCES
[1] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674-693, Jul. 1989.
[2] A. Grzeszczak, K. M. Mrinal, and P. Sethuraman, "VLSI implementation of discrete wavelet transform," IEEE Trans. VLSI Syst., vol. 4, no. 4, pp. 421-433, Dec. 1996.
[3] Z. Lu, D. Y. Kim, and A. William, "Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm," IEEE Trans. Biomed. Eng., vol. 47, no. 7, pp. 849-856, Jul. 2000.
[4] B. A. Rajoub, "An efficient coding algorithm for the compression of ECG signal using the wavelet transform," IEEE Trans. Biomed. Eng., vol. 49, no. 4, pp. 355-362, Apr. 2002.
[5] C. Chakrabarti and M. Vishwanath, "Efficient realizations of the discrete and continuous wavelet transforms: From single chip implementations to mappings on SIMD array computers," IEEE Trans. Signal Process., vol. 43, no. 3, pp. 759-771, Mar. 1995.
[6] K. K. Parhi, "Synthesis of control circuits in folded pipelined DSP architectures," IEEE J. Solid-State Circuits, vol. 27, no. 1, pp. 29-43, Jan. 1992.
[7] M. Vishwanath, "The recursive pyramid algorithm for the discrete wavelet transform," IEEE Trans. Signal Process., vol. 42, no. 3, pp. 673-676, Mar. 1994.
[8] E. Huluta, E. M. Petriu, S. R. Das, and A. H. Al-Dhaer, "Discrete wavelet transform architecture using fast processing elements," in Proc. IEEE Inst. Meas. Technol. Conf., May 2002, pp. 1537-1542.
[9] L. Angrisani, P. Daponte, M. D'Apuzzo, and A. Pietrosanto, "A VXI signal analyzer based on the wavelet transform," in Proc. IEEE Inst. Meas. Technol. Conf., May 1997, pp. 440-445.
[10] M. Vishwanath and R. M. Owens, "A common architecture for the DWT and IDWT," in Proc. IEEE ASAP, 1996, pp. 193-198.
[11] N. Elghamery and S. E.-D. Habib, "An efficient FPGA implementation of a wavelet coder/decoder," in Proc. 12th Int. Conf. Microelectron., Tehran, Iran, Oct. 31-Nov. 2, 2000, pp. 269-272.
[12] S.-G. Miaou and C.-L. Lin, "A quality-on-demand algorithm for wavelet-based compression of electrocardiogram signals," IEEE Trans. Biomed. Eng., vol. 49, no. 3, pp. 233-239, Mar. 2002.

Pedro Henrique Cox was born in Campo Grande, Mato Grosso do Sul, Brazil. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1980 and 1982, respectively, and the Diplôme d'Études Approfondies en Génie Biomédical from Université Paris XII, Val de Marne, France, in 1983. In 2004, he received the Ph.D. degree in electrical engineering from São Paulo State University, São Paulo, Brazil.
Currently, he is teaching at the Mato Grosso do Sul Federal University, Campo Grande. His research is currently on electronic instrumentation and digital systems.

Aparecido Augusto de Carvalho was born in Bebedouro-SP, Brazil. He received the B.Sc. degree in electrical engineering from the University of São Paulo, São Paulo, Brazil, in 1976, the M.Sc. degree in biomedical engineering from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1979, and the Ph.D. degree in applied physics from the University of São Paulo in 1987.
From 1993 to 1994, he was an honorary fellow with the Department of Computer and Electrical Engineering, University of Wisconsin-Madison, granted by FAPESP (Brazil). He is currently a Professor with the São Paulo State University (UNESP), Campus of Ilha Solteira, Brazil. His research interests include sensors and electronic instrumentation.
