IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 56, NO. 5, OCTOBER 2007
Abstract: This paper addresses the problem of processing biological data, such as cardiac beats in the audio and ultrasonic
range, and of calculating wavelet coefficients in real time, with
the processor clock running at the frequency of present application-specific integrated circuits and field-programmable gate
arrays. The parallel filter architecture for the discrete wavelet transform (DWT) has been improved, calculating the wavelet coefficients in real time with hardware reduced by up to 60%. The new
architecture, which also processes the inverse DWT, is implemented
with the Radix-2 or the Booth-Wallace constant multipliers. One
integrated circuit signal analyzer in the ultrasonic range, including
series memory register banks, is presented.
Index Terms: Asynchronous logic circuits, digital filters, digital
signal processors, last in last out memory, logic design, sequential
machines, signal analysis and synthesis.
I. INTRODUCTION
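Fig. 1.

\[ l^{j}_{n} = \sum_{i=0}^{M-1} a_i \, l^{j-1}_{2n-i}, \qquad 0 \le n \le N_j - 1 \tag{1a} \]

\[ h^{j}_{n} = \sum_{i=0}^{M-1} c_i \, l^{j-1}_{2n-i}, \qquad 0 \le n \le N_j - 1. \tag{1b} \]

... + a6 l0(2) + a7 l0(3)
[Equations (2a)-(2c): only the trailing terms above are recoverable.]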
For computing the DWT coefficients of the input discrete-time data, it is assumed that the input data represent the
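... + a6 l0(0) + a7 l0(1)
... + a6 l1(0) + a7 l1(2)
... + a6 l1(4) + a7 l1(6)
... + a6 l1(8) + a7 l1(10)
... + a6 l1(12) + a7 l1(14)
[Equations (2d)-(2o): only the trailing terms above are recoverable.]

Fig. 2.

\[ l^{j}_{n} = \sum_{i=0}^{M/2-1} a_{2i} \, l^{j+1}_{2n-i} + \sum_{i=0}^{M/2-1} c_{2i} \, h^{j+1}_{2n-i}, \qquad 1 \le n \le N_j \tag{3a} \]

\[ l^{j}_{n+1} = \sum_{i=0}^{M/2-1} a_{2i+1} \, l^{j+1}_{2n-i} + \sum_{i=0}^{M/2-1} c_{2i+1} \, h^{j+1}_{2n-i}, \qquad 1 \le n \le N_j. \tag{3b} \]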
Fig. 4.
Fig. 5.
A four-level asynchronous folded parallel filter architecture (AFPFA) with the Radix-2 multiplier and eight filter coefficients has been implemented in very-high-speed integrated
circuit hardware description language (VHDL).
This paper presents an original hardware signal analyzer
(1-D coder and decoder) for CMOS and FPGA integrated
circuits. The percent root-mean-square difference (PRD) quality criterion, used here to evaluate the precision
of the DWT and IDWT processing modules, is one of the most
widely adopted criteria for data compression algorithms [12].
The general wavelet coder/decoder architecture presented in [11]
has a low-frequency response band and reports its results as simulation plots. Selected wavelet band
synthesizers such as in [9] employ a digital signal processor
board connected to a VXI standard interface to process power
line frequency range signals with a three-level DWT. The most
important achievement of this paper is the perfect synthesis
architecture for the DWT algorithm with any number of levels.
The precision of the synthesized data depends on
the word length of the filter coefficients and input data. This
paper offers a solution for implementing data compression algorithms
with integrated circuits. Equations for perfect synthesis are
implemented in the DWT and IDWT algorithms on the signal
analyzer. The precision of the signal analyzer was
evaluated by quantizing the data and filter coefficients for n-bit
word processing (n = 4 + 4i, 1 ≤ i ≤ 7). With synchronous
input and constant processing elements, real-time analysis and
synthesis are assured for signal sampling in the megahertz range.
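The word lengths used in the precision evaluation follow directly from n = 4 + 4i. The short Python sketch below lists them and shows one possible way to quantize a value to n bits. The quantizer itself (the hypothetical `quantize`, signed fixed point with rounding and saturation, values assumed to lie in [-1, 1)) is an illustrative assumption, since the quantization scheme is not described in this section.

```python
def word_lengths():
    """Word lengths n = 4 + 4i for 1 <= i <= 7, as stated in the text."""
    return [4 + 4 * i for i in range(1, 8)]


def quantize(value, n_bits):
    """Round `value` to a signed fixed-point grid with n_bits bits.

    Assumption (not from the paper): one sign bit, n_bits - 1 fractional
    bits, round-to-nearest, saturation at the ends of the range.
    """
    scale = 1 << (n_bits - 1)                  # 2**(n_bits - 1) steps per unit
    code = round(value * scale)                # nearest integer code
    code = max(-scale, min(scale - 1, code))   # saturate to the signed range
    return code / scale


if __name__ == "__main__":
    for n in word_lengths():
        print(n, quantize(0.3, n))
```

Under these assumptions, the quantization error shrinks by roughly a factor of 16 for each 4-bit step in word length.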
II. ASYNCHRONOUS FOLDED PARALLEL FILTER ARCHITECTURE
The parallel filter architecture is optimal with respect to both
area and computing time [5]. For each N data samples, N
wavelet coefficients are output. It is an architecture that has
a simple register allocation scheme and two filters, with high
processor efficiency. The proposed architecture has only one
filter to calculate both low-pass and high-pass wavelet coefficients in each algorithm step. Real-time transform is achieved
with two clocks: the data sampling clock and the processor
clock. The ratio between the two clocks is a real number.
The new design (Fig. 3) employs an asynchronous control
TABLE I
WAVELET COEFFICIENTS REGISTER ALLOCATION AND ROUTING ON DWT
Fig. 6.
octave clashes with that of the (i + 1)th octave, then the ith octave is scheduled. A simple way of obtaining this output schedule is to consider the sampling grid for the DWT output, which
is shown in Fig. 6. Now, push (up or down) all the horizontal
lines of samples until they form a single line. The order of the
outputs obtained in this manner gives us the output schedule.
The basic idea behind the RPA is to linearize the pyramid
schedule without increasing the dependencies between octaves.
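The collapsing of the sampling grid into a single line can be sketched in a few lines of Python. This is a minimal illustration, not the paper's control logic: it assumes the usual dyadic grid in which the outputs of octave j sit at time slots 2^(j-1)(2k+1), k = 0, 1, 2, ..., so that no two octaves share a slot. The function name `rpa_schedule` is hypothetical.

```python
def rpa_schedule(num_slots, levels):
    """Return the octave scheduled in each time slot 1..num_slots.

    Assumed grid: octave j occupies slots 2**(j-1) * (2k + 1).
    Slots that belong to an octave deeper than `levels` are idle (None).
    """
    schedule = []
    for t in range(1, num_slots + 1):
        lowest_set_bit = t & -t                 # isolates 2**(trailing zeros)
        octave = lowest_set_bit.bit_length()    # trailing zeros + 1
        schedule.append(octave if octave <= levels else None)
    return schedule


if __name__ == "__main__":
    print(rpa_schedule(16, 4))
```

With four levels, octave 1 takes every second slot, octave 2 every fourth, and so on, leaving one idle slot in every sixteen.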
B. Control Logic Units
AFPFA performs the DWT at the input data frequency. In FPGA
implementations, the running clock for the circuitry may be much
higher than the input data sampling frequency. Data are input
to the communication bus with the data sampling clock, and the
multiplier processes data with the processor clock. The processor
clock must be 2n + 4 times faster than the data clock for
the Radix-2 multiplier, where n is the number of bits in the
operation, and six times faster for the Booth-Wallace constant
multiplier. AFPFA uses three simple control logic units
instead of a single one that depends directly on the number of taps
and transform levels [8]: the asynchronous control logic (ACL), which
inputs data samples and filter coefficients; the processor control
logic, for the sequential multiplier; and the transform control
logic (TCL), for the algorithm.
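The clock requirement stated above can be turned into a quick feasibility check. The Python sketch below computes the required processor-to-data clock ratio and the highest sampling rate a given processor clock would support. The function names and the string labels for the two multipliers are hypothetical, and the 72-MHz clock in the usage lines is only an illustrative value.

```python
def processor_clock_ratio(multiplier, n_bits=None):
    """Processor clock cycles needed per data sample.

    "radix2"        -> 2n + 4, with n the operand word length in bits
    "booth_wallace" -> 6, independent of n
    """
    if multiplier == "radix2":
        if n_bits is None or n_bits <= 0:
            raise ValueError("radix2 needs a positive word length n_bits")
        return 2 * n_bits + 4
    if multiplier == "booth_wallace":
        return 6
    raise ValueError(f"unknown multiplier: {multiplier!r}")


def max_sampling_rate_hz(processor_clock_hz, multiplier, n_bits=None):
    """Highest data sampling rate sustainable at the given processor clock."""
    return processor_clock_hz / processor_clock_ratio(multiplier, n_bits)


if __name__ == "__main__":
    print(processor_clock_ratio("radix2", 16))          # 2*16 + 4 cycles
    print(max_sampling_rate_hz(72e6, "radix2", 16))
    print(max_sampling_rate_hz(72e6, "booth_wallace"))
```

For the same processor clock, the constant ratio of the Booth-Wallace multiplier allows a sampling rate that no longer falls as the word length grows.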
The asynchronous control logic (Fig. 4) synchronizes the
data sampling clock and processor clock in each step of
the DWT algorithm. After reset, filter coefficients are loaded
to the set of Radix-2 multipliers, and the processor is ready.
For each data sample input to AFPFA, the ACL detects the first
valid positive transition on the processor clock, and the processor control logic performs one multiply-and-add operation.
The TCL performs two basic operations in each DWT algorithm step. The schedule with M = 8 and four levels is shown
in Table I. The operations are:
1) To select input data or a set of low-pass wavelet coefficients from register banks to the multiply-and-add unit,
addressing the register bank multiplexer (RBM).
2) To output a detail or high-pass wavelet coefficient and
store or output an approximation or low-pass wavelet
coefficient with RPA.
C. Filter
The filter unit is an M-tap nonrecursive FIR digital filter,
where the transfer function is shown in (4) and (5), and where c_i
and a_i, 0 ≤ i ≤ 7, are the coefficients for the high-pass and low-pass bands, respectively. Computation of a DWT coefficient is
done with one multiply-and-add operation for each data sampling clock cycle. In each DWT level, high-pass and low-pass

\[ H(z) = c_0 + c_1 z^{-1} + c_2 z^{-2} + c_3 z^{-3} + c_4 z^{-4} + c_5 z^{-5} + c_6 z^{-6} + c_7 z^{-7} \tag{4} \]

\[ L(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3} + a_4 z^{-4} + a_5 z^{-5} + a_6 z^{-6} + a_7 z^{-7}. \tag{5} \]
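As a software illustration of the filter unit, the following Python sketch computes one DWT level with a single multiply-and-add loop that produces both the low-pass and the high-pass output for every second input sample. It is a behavioral sketch only: the function name `dwt_level` is hypothetical, and samples before the start of the input are assumed to be zero, which the text does not specify.

```python
def dwt_level(samples, a, c):
    """One DWT level: M-tap FIR filtering followed by decimation by two.

    a: low-pass coefficients, c: high-pass coefficients (same length M).
    Output n uses inputs at positions 2n - i for i = 0..M-1.
    Assumption: positions before the first sample read as zero.
    """
    if len(a) != len(c):
        raise ValueError("a and c must have the same number of taps")
    low, high = [], []
    for n in range(len(samples) // 2):
        acc_low = 0.0
        acc_high = 0.0
        for i in range(len(a)):
            k = 2 * n - i
            x = samples[k] if k >= 0 else 0.0   # zero initial history
            acc_low += a[i] * x                  # multiply-and-add, low-pass
            acc_high += c[i] * x                 # multiply-and-add, high-pass
        low.append(acc_low)
        high.append(acc_high)
    return low, high


if __name__ == "__main__":
    # Two-tap averaging/differencing pair, used only to keep the example short.
    low, high = dwt_level([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], [0.5, -0.5])
    print(low, high)
```

Feeding the returned low-pass list back into `dwt_level` gives the next level, which mirrors how the low-pass coefficients are routed back from the register banks.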
TABLE II
FILTER COEFFICIENTS SETS
\[ \mathrm{PRD} = \left[ \frac{\sum_{i=1}^{n} \left( f(i) - \hat{f}(i) \right)^{2}}{\sum_{i=1}^{n} f(i)^{2}} \right]^{1/2} \times 100\%. \tag{6} \]
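The PRD in (6) is straightforward to compute in software. The Python sketch below assumes that f is the original signal and that the second sequence is the signal reconstructed after DWT and IDWT processing; the function name `prd` and the error raised for an all-zero original are illustrative choices.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length signals."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    error_energy = sum((f - g) ** 2 for f, g in zip(original, reconstructed))
    signal_energy = sum(f ** 2 for f in original)
    if signal_energy == 0:
        raise ValueError("PRD is undefined for an all-zero original signal")
    return math.sqrt(error_energy / signal_energy) * 100.0


if __name__ == "__main__":
    print(prd([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # identical signals
    print(prd([3.0, 4.0], [0.0, 0.0]))             # reconstruction lost everything
```

A PRD of 0% means perfect synthesis, and a reconstruction that is identically zero gives 100%.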
TABLE III
WAVELET COEFFICIENTS REGISTER ALLOCATION AND ROUTING ON IDWT
V. EXPERIMENTAL RESULTS
FPGA prototyping tools reduce development time to a minimum. Reconfigurable processors are viable platforms for a
broad range of specialized applications such as DWT algorithms. Other DWT algorithms have been implemented using
CMOS technology [2], [5] or a DSP-based architecture [9]. The
AFPFA has been implemented in VHDL. Numerical equations
define the low-pass and high-pass eighth-order filter operations.
The Radix-2 multiplier was implemented first, the eight-tap filter was developed next, and then a four-level DWT
algorithm was implemented.
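For readers unfamiliar with sequential Radix-2 multiplication, the following Python sketch shows the textbook shift-and-add scheme, which examines one multiplier bit per step. It is not the paper's VHDL design: the unsigned operands, the function name `radix2_multiply`, and the step structure are assumptions made for illustration.

```python
def radix2_multiply(multiplicand, multiplier, n_bits):
    """Unsigned shift-and-add multiplication, one multiplier bit per step.

    Textbook radix-2 scheme; a hardware version would perform each loop
    iteration in its own clock cycle(s).
    """
    limit = 1 << n_bits
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must fit in n_bits unsigned bits")
    product = 0
    for bit in range(n_bits):
        if (multiplier >> bit) & 1:          # examine one bit of the multiplier
            product += multiplicand << bit   # add the shifted multiplicand
    return product


if __name__ == "__main__":
    print(radix2_multiply(13, 11, 4))
```

Because the loop runs once per bit, the processing time grows linearly with the word length, which is consistent with a clock ratio that depends on n.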
A. Radix-2 Multiplier and Filter
Filters are programmed after reset or at any time with M0 = 1
(Fig. 3). To load one data sample M0 = 0, data on CB, FCLK is
Fig. 11. DWT algorithm timing for filter coefficients write and first states 010.
TABLE IV
FOUR-LEVEL ALGORITHM CONTROL LOGIC FOR
WAVELET COEFFICIENTS STORAGE ON DWT
[6]
[7]
[8]
[9]
[10]
TABLE V
SIGNAL ANALYZER PRECISION
[11]
[12]
REFERENCES
[1] S. G. Mallat, "A theory for multiresolution signal decomposition: The
wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11,
no. 7, pp. 674-693, Jul. 1989.
[2] A. Grzeszczak, K. M. Mrinal, and P. Sethuraman, "VLSI implementation
of discrete wavelet transform," IEEE Trans. VLSI Syst., vol. 4, no. 4,
pp. 421-433, Dec. 1996.
[3] Z. Lu, D. Y. Kim, and A. William, "Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm," IEEE Trans.
Biomed. Eng., vol. 47, no. 7, pp. 849-856, Jul. 2000.
[4] B. A. Rajoub, "An efficient coding algorithm for the compression of ECG
signal using the wavelet transform," IEEE Trans. Biomed. Eng., vol. 49,
no. 4, pp. 355-362, Apr. 2002.
[5] C. Chakrabarti and M. Vishwanath, "Efficient realizations of the discrete
and continuous wavelet transforms: From single chip implementations