Compdsp Allie IIRFilter

Signal to Noise and Numeric Range Issues for Direct Form I & II IIR Filters
on Modern Analog Devices and TI Digital Signal Processors
Presented at the 2004 comp.dsp conference

Mark Allie Consulting LLC.
Mark Allie comp.dsp 2004 1

Main Topics
• Filter configurations Direct Form I and II and their transposes are investigated.
• This presentation summarizes the signal to noise and numeric range issues of Direct Form IIR filtering.
• Design criteria combined with sensible DSP choices are presented.

Direct Form I, II and Transposed Forms: Filter Assumptions
• Unity gain 2nd order filter sections (biquads) are used.
• The transfer function and difference equation of the biquad are given in equations 1 and 2.
• The recursive coefficients are scaled to yield a0 = 1.
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{n=0}^{N} b_n z^{-n}}{1 + \sum_{m=1}^{M} a_m z^{-m}} \qquad (1)$$

$$y(k) = \sum_{n=0}^{N} b_n\, x(k-n) - \sum_{m=1}^{M} a_m\, y(k-m) \qquad (2)$$
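• The filters are implemented with all terms being added: the minus sign in equation 2 is absorbed into the stored recursive coefficients, which hold -a1 and -a2.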
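Below is a minimal Python sketch of equation 2 under this all-terms-added convention. It assumes double precision arithmetic and coefficient lists in Matlab order with a[0] = 1. The function name `direct_form_filter` is hypothetical. The recursive coefficients are negated once, up front, so the inner loops contain only multiply-adds.

```python
from typing import List, Sequence


def direct_form_filter(b: Sequence[float], a: Sequence[float],
                       x: Sequence[float]) -> List[float]:
    """Evaluate equation 2 with every product term added.

    b, a: Matlab-order coefficients, with a[0] == 1 assumed.
    The stored recursive coefficients are c[m] = -a[m], so
    y(k) = sum_n b[n] * x(k - n) + sum_m c[m] * y(k - m).
    """
    if a[0] != 1.0:
        raise ValueError("scale the coefficients so that a[0] == 1")
    c = [-am for am in a[1:]]                 # negated recursive coefficients
    y: List[float] = []
    for k in range(len(x)):
        acc = 0.0
        for n, bn in enumerate(b):            # direct (feed-forward) products
            if k >= n:
                acc += bn * x[k - n]
        for m, cm in enumerate(c, start=1):   # recursive products, also added
            if k >= m:
                acc += cm * y[k - m]
        y.append(acc)
    return y
```

For example, `direct_form_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0])` stores c1 = 0.5 and returns the decaying impulse response [1.0, 0.5, 0.25].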

Filter Forms
[Figure: block diagrams of the four biquad forms (Direct Form I, Direct Form II, Direct Form I Transposed and Direct Form II Transposed), each built from the coefficients b0, b1, b2, -a1, -a2 and unit delays Z-1; Direct Form II shares one delay line holding w(n-1) and w(n-2).]
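The four structures compute the same transfer function when arithmetic is exact. They differ only in where sums are formed and what is stored. The floating point Python sketch below uses hypothetical function names and the coefficients from the Matlab design example. The recursive coefficients are already negated (c1 = -a1, c2 = -a2), so every product is added.

```python
B = (0.00094469184384, 0.00188938368768, 0.00094469184384)
C = (1.91119706742607, -0.91497583480143)    # c1 = -a1, c2 = -a2


def df1(b, c, x):
    """Direct Form I: separate delay lines for x(n) and y(n)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 + c[0] * y1 + c[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


def df2(b, c, x):
    """Direct Form II: recursive sum first, one shared delay line w(n)."""
    w1 = w2 = 0.0
    out = []
    for xn in x:
        wn = xn + c[0] * w1 + c[1] * w2
        out.append(b[0] * wn + b[1] * w1 + b[2] * w2)
        w2, w1 = w1, wn
    return out


def df1t(b, c, x):
    """Direct Form I Transposed: transposed recursive part, then transposed b part."""
    p1 = p2 = 0.0                     # states of the recursive (-a) section
    q1 = q2 = 0.0                     # states of the b section
    out = []
    for xn in x:
        vn = xn + p1                  # v(n) is x(n) filtered by 1/A(z)
        p1, p2 = c[0] * vn + p2, c[1] * vn
        out.append(b[0] * vn + q1)
        q1, q2 = b[1] * vn + q2, b[2] * vn
    return out


def df2t(b, c, x):
    """Direct Form II Transposed: two state registers s1, s2."""
    s1 = s2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + s1
        s1, s2 = b[1] * xn + c[0] * yn + s2, b[2] * xn + c[1] * yn
        out.append(yn)
    return out


x = [((7 * k) % 11 - 5) / 5.0 for k in range(400)]   # arbitrary input in [-1, 1]
ref = df1(B, C, x)
```

In double precision the four outputs agree to within rounding error. The differences between the forms appear only once intermediate results are truncated to a finite word length.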

Matlab Design Example
• Because all products are summed, the Matlab generated recursive coefficients a1 and a2 must be negated for use in the filter.
• For example, a 2nd order Butterworth low pass filter with a cutoff frequency of 0.01 Fs has the coefficients:
• b0 = 0.00094469184384
• b1 = 0.00188938368768
• b2 = 0.00094469184384
• a0 = 1.00000000000000
• a1 = -1.91119706742607 use -a1 or 1.91119706742607
• a2 = 0.91497583480143 use -a2 or -0.91497583480143
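The listed values can be reproduced with the standard bilinear transform design of a 2nd order Butterworth low pass, using frequency prewarping. The Python sketch below illustrates that assumption and is not Matlab's own code. `butter2_lowpass` is a hypothetical name. The sketch also forms the negated recursive coefficients that the filter actually uses.

```python
import math


def butter2_lowpass(fc_over_fs: float):
    """2nd order Butterworth low pass via the bilinear transform.

    fc_over_fs: cutoff frequency as a fraction of the sample rate Fs.
    Returns (b, a) in Matlab order with a[0] == 1.
    """
    k = math.tan(math.pi * fc_over_fs)    # prewarped analog cutoff
    r2 = math.sqrt(2.0)                   # 1/Q of a Butterworth pole pair
    d = 1.0 + r2 * k + k * k
    b0 = k * k / d
    b = (b0, 2.0 * b0, b0)
    a = (1.0, 2.0 * (k * k - 1.0) / d, (1.0 - r2 * k + k * k) / d)
    return b, a


b, a = butter2_lowpass(0.01)
neg_a = (-a[1], -a[2])    # the values stored when all terms are added
```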
Topics of Interest
• Overflow in ideal and real filters.

• Noise issues.
• Noise Compensation issues.

Two Types of Overflow
• The final result overflows.

• Intermediate results overflow.
Overflow of the Result
• Unity gain filters can have outputs that are greater than ±1 even when the input is bounded to ±1.
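One way to see this is through the L1 norm of the impulse response. For an input bounded by ±1, the largest possible output magnitude is sum |h(n)|. That maximum is reached when x(k) = sign(h(n0 - k)). The Python sketch below computes the bound for the Butterworth example and drives the filter with that worst-case ±1 sequence. Function names are hypothetical.

```python
# Coefficients copied from the Matlab design example (Matlab order, a[0] = 1).
B = (0.00094469184384, 0.00188938368768, 0.00094469184384)
A = (1.0, -1.91119706742607, 0.91497583480143)
L = 3000  # assumed long enough for h(n) to decay (pole radius is about 0.957)


def biquad(b, a, x):
    """Double precision Direct Form I biquad with all terms added."""
    c1, c2 = -a[1], -a[2]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 + c1 * y1 + c2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


h = biquad(B, A, [1.0] + [0.0] * (L - 1))   # impulse response
l1 = sum(abs(v) for v in h)                 # largest |y| for any |x| <= 1

# Bounded worst-case input: x(k) = sign(h(L-1-k)) makes y(L-1) = sum |h(n)|.
x_worst = [1.0 if h[L - 1 - k] >= 0.0 else -1.0 for k in range(L)]
y_worst = biquad(B, A, x_worst)
```

The bound comes out a little above 1 for this filter. The impulse response has negative lobes, so a full-scale input can push the output past full scale.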

[Figure: Ideal Frequency Response]

[Figure: Ideal Time Response]

Internal Overflow with Fixed Point Math
• Modulo math is your friend.

• Modulo (wraparound) math is the default when the processor is not in saturation mode.
• Jackson’s Rule: Any number of additions and/or subtractions may occur. Intermediate results and operands may fall into any modulo. As long as the final result is made to fall into the first modulo by design, it will be representable in two’s complement at the chosen wordlength and will be a valid result.
• Once overflow detection is applied to the final result, the saturated value will reflect the correct sign.
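Below is a small Python sketch of Jackson's rule. It assumes a 16-bit word and Q15 operands; the values are hypothetical and chosen only for illustration. The intermediate sum overflows. Modulo arithmetic still returns the exact final result, but saturating every partial sum does not.

```python
def wrap16(v: int) -> int:
    """Two's complement wraparound to 16 bits (modulo math)."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def sat16(v: int) -> int:
    """Saturate to the 16-bit two's complement range."""
    return max(-0x8000, min(0x7FFF, v))


def accumulate(terms, op):
    """Add terms one at a time, applying `op` after every addition."""
    acc = 0
    for t in terms:
        acc = op(acc + t)
    return acc


# Illustrative Q15 values (hypothetical): 0.75 + 0.75 - 0.9 = 0.6.
# The partial sum 1.5 overflows; the final result fits in Q15.
terms = [24576, 24576, -29491]
exact = sum(terms)                       # 19661
modulo = accumulate(terms, wrap16)       # wraps, then wraps back: 19661
saturated = accumulate(terms, sat16)     # clips at 32767, ends at 3276
```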

Noise Issues
• Noise is generated by truncating high resolution results.
• The noise generated by truncation is correlated with the signal and often resembles the spectrum of the signal being filtered.
• The generated noise is most harmful when it recirculates through the recursive coefficients.
• Dither can be used to force correlated truncation noise to be random (white).
• If the truncation process is deterministic, then correction can be applied to the filter.
• Fixed point truncation is deterministic.
• Floating point truncation is not deterministic.

Fixed Point Truncation
In a fixed point processor the accumulator has at least twice as many bits as the native data word.
Any time the accumulated result needs to be stored, it must be truncated.
It is assumed that data needs to be stored for multiplying, delaying and sending to a converter.
Truncation is performed after the lsb of the highest N-bit word: the hi word is kept and the lo word is discarded.
[Figure: native data word of N bits; 2N-bit accumulator made of an N-bit hi word and an N-bit lo word; truncation keeps hi and nulls lo.]
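Below is a Python sketch of this truncation. It assumes N = 16 and uses an unbounded Python integer to stand in for the 2N-bit accumulator. It shows two properties that matter later. First, the error is a fixed function of the accumulator value, so it is deterministic. Second, the error is never negative, so plain truncation has a mean bias of about half an lsb.

```python
N = 16  # assumed native word length; the accumulator holds 2N bits


def truncate_to_hi(acc: int, n: int = N) -> int:
    """Keep the hi word of an accumulator by discarding the lo n bits.

    Python's >> on a negative int is an arithmetic shift, i.e. floor(acc / 2**n),
    which matches dropping the lo word of a two's complement register.
    """
    return acc >> n


def truncation_error_lsb(acc: int, n: int = N) -> float:
    """Discarded part acc - hi * 2**n, in units of one hi-word lsb."""
    return (acc - (truncate_to_hi(acc, n) << n)) / (1 << n)


# Sweep four full periods of the lo word: the error depends only on acc,
# never leaves [0, 1) lsb, and averages about 1/2 lsb.
errors = [truncation_error_lsb(acc) for acc in range(-(1 << 17), 1 << 17)]
mean_error = sum(errors) / len(errors)
```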

Floating Point Truncation
• The data word in a floating point processor is composed of an exponent and a mantissa.
• The number of bits used to represent a mantissa is not directly reduced.
• The number of bits used to represent one mantissa is usually indirectly reduced when two numbers are added.
• This occurs because the exponents of floating point numbers must be equal before they can be added, so the mantissa of the smaller number is shifted right.
• There is no indirect truncation when the exponents are already equal.
• The number of bits lost depends on the ratio of the 2 numbers being added.

Floating Point Truncation
• For example, when adding 1 to 1/256 the exponents differ by 8, so 8 bits of the smaller mantissa are truncated by the shift that matches the exponents.
• The truncation associated with floating point addition is not deterministic in the case of Direct Form IIR filters.
• The number of bits truncated can be different for each addition.
• The number of bits truncated depends on the coefficients and input signal statistics.
• Truncation errors cannot be corrected.
• They usually must be accepted!
• If the recursive products can be kept larger than the direct products, then the recursive data path will not be truncated very much.
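Below is a Python sketch of the exponent alignment. It uses IEEE double precision (53-bit significand) as a stand-in for a DSP's floating point format; the function names are hypothetical. `alignment_shift` is the exponent difference, i.e. how far the smaller mantissa is shifted. `addition_loss` returns exactly what the addition dropped. It is Dekker's Fast2Sum error term, which is valid when |big| >= |small|. An exact power of two such as 1/256 loses nothing, because the bits shifted out are zero. A value of similar size with a full mantissa does lose bits.

```python
import math


def alignment_shift(a: float, b: float) -> int:
    """Exponent difference of two addends: how many bit positions the
    smaller mantissa is shifted right before the add (guard bits ignored)."""
    return abs(math.frexp(a)[1] - math.frexp(b)[1])


def addition_loss(big: float, small: float) -> float:
    """Part of `small` that does not survive big + small.

    Dekker's Fast2Sum: with round-to-nearest and |big| >= |small|,
    s + loss equals big + small exactly in real arithmetic.
    """
    s = big + small
    return small - (s - big)


shift = alignment_shift(1.0, math.pi / 256)   # 7 bit positions
loss = addition_loss(1.0, math.pi / 256)      # nonzero: low mantissa bits dropped
```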

Noise Compensation
• Fixed point compensation is accomplished through an error feedback signal.
• What is the error signal? e(n) = y(n) - yt(n), where yt(n) is the truncated value of y(n).
• Floating point compensation cannot be performed.
• So how bad is this problem?

How is the error term generated?
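y(n), yt(n) and e(n) are nominally signed quantities.
y(n) is 2N bits: an N-bit hi word and an N-bit lo word.
yt(n) is N bits: the hi word of y(n), with the lo word discarded.
e(n) = y(n) - yt(n), so in principle e(n) has a sign.
Is it correct to use the bit pattern in the lower half of the register as e(n)?
The pattern is correct for e(n), but what is the sign of e(n)?
For two's complement math, truncation always rounds toward minus infinity, so e(n) is never negative: its upper bits are all zero (0000... followed by the N-bit lo word).
Use unsigned multiplies for e(n).
[Figure: 2N-bit register split by the quantizer Q; the hi N bits form yt(n) and the lo N bits form e(n).]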
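Below is a Python sketch of the sign question. It assumes N = 16 and uses Python integers for the 2N-bit register. The lo-word bit pattern gives the correct value of e(n) only when it is read as unsigned. Read as a signed N-bit word, it becomes negative whenever its top bit is set, and y(n) = yt(n) + e(n) no longer holds. This is why e(n) should go through an unsigned multiply. The helper names are hypothetical.

```python
N = 16                      # assumed native word length
MASK = (1 << N) - 1


def split(y: int):
    """Split a 2N-bit value into yt(n) (signed hi word) and the lo bit pattern."""
    yt = y >> N             # truncation: arithmetic shift, i.e. floor
    lo = y & MASK           # raw lo-word bits, always in 0 .. 2**N - 1
    return yt, lo


def as_signed(bits: int) -> int:
    """Reinterpret an N-bit pattern as a signed two's complement word."""
    return bits - (1 << N) if bits & (1 << (N - 1)) else bits


y = -0x4000                 # a negative 2N-bit result
yt, lo = split(y)
e_unsigned = lo             # correct: e(n) >= 0 and y == yt * 2**N + e
e_signed = as_signed(lo)    # wrong: the same bits misread as a negative number
```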

Direct Form Implementation Analysis
• Overload considerations.
1. Input scaling.
2. Modulo math.
• Noise generation considerations.
1. Truncation noise.
2. Terms affected.

Direct Form I
[Figure: Direct Form I with one quantizer. The N-bit input x(n) and its delays are multiplied by b0, b1, b2. The N-bit outputs yt(n-1) and yt(n-2) are multiplied by -a1 and -a2. All 2N-bit products are added into y(n). The quantizer Q keeps the hi N bits as yt(n) and discards the lo N bits, e(n).]

$$Y_t(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z) - \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}\, E(z)$$

Fixed point:
• Modulo math useful. One summer before the saturator.
• No input scaling required to prevent internal overflow.
• One truncation feedback error term.
• Truncation noise affects recursive terms.
• Truncation noise negligible for N large enough.

Floating point:
• No input scaling required.
• No overflow concerns.
• Truncation error is coefficient dependent and not correctable.
• Truncation noise affects recursive terms.
• Number of bits in mantissa critical.
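Below is a Python sketch of fixed point Direct Form I with a single quantizer. It assumes N = 16 (Q15 data) and Q14 coefficients quantized from the Matlab example, so that -a1 ≈ 1.91 fits. The names are hypothetical. The output is compared with a double precision reference that uses the same quantized coefficients, so the two differ only by the truncation error. That error is shaped by 1/A(z), as in the transfer function above. Its worst-case gain, the L1 norm of the 1/A(z) impulse response, therefore bounds the difference.

```python
N, CF = 16, 14          # assumed: Q15 data words, Q14 coefficients (range +/-2)


def q(v: float, frac: int) -> int:
    """Round a real value to a fixed point integer with `frac` fraction bits."""
    return int(round(v * (1 << frac)))


def wrap(v: int, bits: int) -> int:
    """Two's complement modulo arithmetic at `bits` bits."""
    half = 1 << (bits - 1)
    return ((v + half) & ((1 << bits) - 1)) - half


def dfi_fixed(x, b, c):
    """Fixed point Direct Form I with one quantizer after the single summer.

    x: Q15 samples; b = (b0, b1, b2) and c = (-a1, -a2) as Q14 integers.
    Q15 x Q14 products are Q29, summed in a 2N-bit modulo accumulator.
    yt(n) keeps the Q15 bits of the sum (like taking the hi word after a
    2-bit product shift); e(n) is the discarded part, 0 <= e < 2**14.
    """
    x1 = x2 = y1 = y2 = 0
    yt_out, e_out = [], []
    for xn in x:
        acc = wrap(b[0] * xn + b[1] * x1 + b[2] * x2 + c[0] * y1 + c[1] * y2, 2 * N)
        yt = acc >> CF                       # truncate (floor) to Q15
        e_out.append(acc - (yt << CF))       # always non-negative
        yt = wrap(yt, N)                     # stored N-bit word
        x2, x1, y2, y1 = x1, xn, y1, yt
        yt_out.append(yt)
    return yt_out, e_out


def dfi_float(x, b, c):
    """Double precision Direct Form I reference (all terms added)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 + c[0] * y1 + c[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


bq = [q(v, CF) for v in (0.00094469184384, 0.00188938368768, 0.00094469184384)]
cq = [q(v, CF) for v in (1.91119706742607, -0.91497583480143)]
x = [q(0.25, N - 1)] * 3000                   # DC input of 0.25
yt, e = dfi_fixed(x, bq, cq)

# Reference with the same quantized coefficients, so only truncation differs.
yref = dfi_float([v / 2 ** (N - 1) for v in x],
                 [v / 2 ** CF for v in bq], [v / 2 ** CF for v in cq])
err_lsb = [abs(a - r * 2 ** (N - 1)) for a, r in zip(yt, yref)]

# Truncation noise is shaped by 1/A(z); its worst-case gain is the L1 norm.
g = dfi_float([1.0] + [0.0] * 2999, [1.0, 0.0, 0.0], [v / 2 ** CF for v in cq])
noise_gain = sum(abs(v) for v in g)
```

For this low cutoff the noise gain is several hundred. A truncation error of less than one lsb at the summer can therefore move the output by up to that many output lsbs, mostly at low frequencies.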

Floating Point Truncation Analysis
[Figure: the five products xn*b0, xn-1*b1, xn-2*b2, yn-2*a2 and yn-1*a1 feeding one summer that produces yn.]
With 5 products feeding the summer, 4 additions are performed per sample period (Ts).
For each addition, take the ratio of the absolute values of the 2 terms being summed.*
Find the maximum ratio over the 4 pairs of addends for each Ts.
The base-2 logarithm of this ratio indicates the number of truncated bits for each Ts.
* The choice of addend pairs can be optimized.
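Below is a Python sketch of this procedure in double precision. It assumes the products are accumulated one at a time in the order listed on the slide, so the two addends at each step are the running sum and the next product. That reading of "2 terms being summed" is an assumption. `bits_truncated_per_sample` is a hypothetical name.

```python
import math


def bits_truncated_per_sample(b, a, x):
    """Estimate bits lost in each sample's four additions (DFI, floating point).

    Assumed summation order: x(n)*b0, x(n-1)*b1, x(n-2)*b2, then the recursive
    products y(n-2)*(-a2) and y(n-1)*(-a1). For each addition, log2 of the
    ratio of the two addends' magnitudes estimates the bits of the smaller one
    shifted out. Returns the maximum over the four additions per sample.
    """
    c1, c2 = -a[1], -a[2]
    x1 = x2 = y1 = y2 = 0.0
    result = []
    for xn in x:
        products = [b[0] * xn, b[1] * x1, b[2] * x2, c2 * y2, c1 * y1]
        acc = products[0]
        worst = 0.0
        for p in products[1:]:
            big, small = max(abs(acc), abs(p)), min(abs(acc), abs(p))
            if small > 0.0:                  # adding zero truncates nothing
                worst = max(worst, math.log2(big / small))
            acc += p
        result.append(worst)
        x2, x1, y2, y1 = x1, xn, y1, acc
    return result


B = (0.00094469184384, 0.00188938368768, 0.00094469184384)
A = (1.0, -1.91119706742607, 0.91497583480143)
bits = bits_truncated_per_sample(B, A, [1.0] * 500)   # unit step input
```

For the step input, the worst addition is the small direct sum meeting the large y(n-2) product. Adding the two recursive products first is one way the addend choice could be optimized.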

Floating Point Truncation Analysis Example
[Figure: example plots, two slides]

Direct Form II
[Figure: Direct Form II with two quantizers. The recursive sum of x(n), -a1 w(n-1) and -a2 w(n-2) is formed at 2N bits and truncated by Q to the N-bit state w(n), discarding the lo word e1(n). The products b0 w(n), b1 w(n-1) and b2 w(n-2) are summed at 2N bits and truncated by Q to yt(n), discarding the lo word e2(n).]

Fixed point:
• Modulo math not useful: there are 2 sums separated by a saturator.
• Input scaling required to prevent internal overflow.
• One truncation feedback error term.
• Truncation noise affects all terms.
• Truncation noise negligible for N large enough.

Floating point:
• No input scaling required.
• No overflow concerns.
• Truncation error is coefficient dependent and not correctable.
• Truncation noise affects all terms.
• Number of bits in mantissa critical.
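The statement "input scaling required" can be quantified. In DFII the internal node w(n) is x(n) filtered by 1/A(z) alone. For |x(n)| <= 1, the worst-case |w(n)| is therefore the L1 norm of that impulse response. Below is a floating point Python sketch for the Matlab example; `impulse_response` is a hypothetical helper, and the worst-case (L1) scaling rule is an assumption.

```python
import math


def impulse_response(num, den, length):
    """Impulse response of num(z)/den(z), den[0] == 1, in double precision."""
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for m in range(1, len(den)):
            if n >= m:
                acc -= den[m] * h[n - m]
        h.append(acc)
    return h


# Denominator from the Matlab design example.
A = (1.0, -1.91119706742607, 0.91497583480143)

# In DFII the state w(n) is x(n) filtered by 1/A(z) alone.
g = impulse_response([1.0], A, 3000)
peak_gain = sum(abs(v) for v in g)      # assumed L1 rule: worst-case |w| for |x| <= 1
headroom_bits = math.ceil(math.log2(peak_gain))
```

For this filter the result is 9 bits. The input to a DFII section would need to be scaled down by about 2^9 to rule out overflow of w(n) for every bounded input. The L1 criterion is the most conservative scaling rule; it is used here because it is the one that guarantees no overflow.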

Direct Form I Transposed
[Figure: Direct Form I Transposed. The recursive section (-a1, -a2) comes first and the b0, b1, b2 section second, each with its own delay registers. Each 2N-bit result is truncated by a quantizer Q to its N-bit hi word before it is stored or passed on, giving five truncation error terms e1(n) to e5(n).]

Fixed point:
• Modulo math not useful: there are 2 sums separated by a saturator.
• Input scaling required to prevent internal overflow.
• Multiple truncation feedback error terms.
• Truncation noise affects all terms.
• Truncation noise negligible for N large enough.

Floating point:
• No input scaling required.
• No overflow concerns.
• Truncation error is coefficient dependent and not correctable.
• Truncation noise affects all terms.
• Number of bits in mantissa critical.

Direct Form II Transposed
[Figure: Direct Form II Transposed. x(n) feeds the products b0, b1, b2 and the output feeds -a1 and -a2. The two 2N-bit state registers are truncated by quantizers (error terms e1(n) and e2(n)), and y(n) is truncated by Q to yt(n) (error term e3(n)).]

Fixed point:
• Modulo math useful. One summer before the saturator.
• No input scaling required to prevent internal overflow.
• Multiple truncation feedback error terms.*
• Truncation noise affects recursive terms.
• Truncation noise negligible for N large enough.
* Only one if Q1 and Q2 are removed (Dattorro).

Floating point:
• No input scaling required.
• No overflow concerns.
• Truncation error is coefficient dependent and not correctable.
• Truncation noise affects recursive terms.
• Number of bits in mantissa critical.
Tabulated Results
Fixed Point: DFI DFII DFIT DFIIT | Floating Point: DFI DFII DFIT DFIIT
Input scaling + + + + + +
Modulo math + +
Truncation noise + + + - - - -
Performs well with 32 bit mantissa + + + + + +
Error compensation + + + - - - -
Legend: + Good Attribute, + Qualified Good Attribute, - Poor Attribute

Error Compensation
• Fixed point only.
• Direct Forms with 1 error term.
• Attempt to correct noise by feeding back the error signal.
• Evaluate DFI based on Dattorro and Wilson.

Error Compensation
Direct Form I with 2nd order error feedback
[Figure: the Direct Form I structure with the discarded lo word e(n) passed through two unit delays Z-1 and fed back into the 2N-bit summer through the coefficients k1 and k2.]

Trivial Noise Shaping with the Error Term
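yt(n) = y(n) - e(n).

Without error feedback:
$$Y_t(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z) - \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}\, E(z)$$

With error feedback through k1 and k2:
$$Y_t(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z) - \frac{1 - k_1 z^{-1} - k_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, E(z)$$

[Figure: recursive part of Direct Form I with e(n) delayed and fed back through k1 and k2.]

Trivial k values. Each row places the zeros of 1 - k1 z^-1 - k2 z^-2 on the unit circle at the angle θ, where the noise is nulled.
k1    k2    Region θ     Zeros
+2    -1    0            Twice
-2    -1    π            Twice
 0    +1    0 and π      Once each
+1    -1    π/3          Twice
-1    -1    2π/3         Twice
+1     0    0            Once
-1     0    π            Once

For k1 = 1 and k2 = 0 the noise is attenuated at dc and low frequencies.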
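Below is a Python sketch that recomputes the trivial-k table. It takes the noise shaping numerator to be 1 - k1 z^-1 - k2 z^-2 and lists the angles of its zeros that lie on the unit circle. It also evaluates the shaping magnitude at a given frequency. Both function names are hypothetical.

```python
import cmath


def unit_circle_zero_angles(k1: float, k2: float, tol: float = 1e-9):
    """Angles |θ| of the zeros of 1 - k1 z^-1 - k2 z^-2 that lie on |z| = 1.

    Multiplying by z^2 gives z^2 - k1 z - k2 = 0. A conjugate pair or a
    double zero appears twice; a zero at the origin (k2 = 0) is skipped.
    """
    disc = cmath.sqrt(k1 * k1 + 4.0 * k2)
    zeros = [(k1 + disc) / 2.0, (k1 - disc) / 2.0]
    return sorted(abs(cmath.phase(z)) for z in zeros if abs(abs(z) - 1.0) < tol)


def shaping_gain(k1: float, k2: float, w: float) -> float:
    """|1 - k1 e^{-jw} - k2 e^{-2jw}|: noise shaping magnitude at w rad/sample."""
    z1 = cmath.exp(-1j * w)
    return abs(1.0 - k1 * z1 - k2 * z1 * z1)
```

For k1 = 1 and k2 = 0, `shaping_gain` is 0 at dc and 2 at π (half the sample rate). This matches the low-frequency attenuation noted above, at the cost of more noise near π.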

Optimal Noise Shaping with the Error Term
yt(n) = y(n) - e(n).

$$Y_t(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z) - \frac{1 - k_1 z^{-1} - k_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, E(z)$$

Optimal k:
k1 = -a1
k2 = -a2

$$Y_t(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z) - E(z)$$

With this choice the noise shaping numerator cancels the denominator, so the truncation error reaches the output without passing through the recursive poles.

[Figure: recursive part of Direct Form I with error feedback coefficients k1 = -a1 and k2 = -a2.]
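Below is a Python sketch that checks the optimal case numerically. It assumes Q15 data and Q14 coefficients quantized from the Matlab example; the names are hypothetical. With k1 = -a1 and k2 = -a2, the fixed point output should differ from a double precision reference with the same coefficients by -e(n) only. That means the difference stays within one output lsb.

```python
import math

N, CF = 16, 14          # assumed: Q15 data words, Q14 coefficients


def q(v: float, frac: int) -> int:
    """Round a real value to a fixed point integer with `frac` fraction bits."""
    return int(round(v * (1 << frac)))


def dfi_error_feedback(x, b, c, k):
    """Fixed point Direct Form I with 2nd order error feedback.

    x: Q15 samples; b = (b0, b1, b2), c = (-a1, -a2), k = (k1, k2) in Q14.
    e(n) is the lo part dropped when the Q29 sum is truncated to Q15; it is
    non-negative and is multiplied as an unsigned quantity. The Q43 feedback
    products k*e are shifted back to Q29, a second truncation far below one
    output lsb. Overflow handling is omitted for brevity.
    """
    x1 = x2 = y1 = y2 = e1 = e2 = 0
    out = []
    for xn in x:
        acc = b[0] * xn + b[1] * x1 + b[2] * x2 + c[0] * y1 + c[1] * y2
        acc += (k[0] * e1 + k[1] * e2) >> CF   # error feedback term
        yt = acc >> CF                         # truncate to Q15
        e = acc - (yt << CF)                   # 0 <= e < 2**14
        x2, x1, y2, y1, e2, e1 = x1, xn, y1, yt, e1, e
        out.append(yt)
    return out


def dfi_float(x, b, c):
    """Double precision Direct Form I reference (all terms added)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 + c[0] * y1 + c[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


bq = [q(v, CF) for v in (0.00094469184384, 0.00188938368768, 0.00094469184384)]
cq = [q(v, CF) for v in (1.91119706742607, -0.91497583480143)]
x = [q(0.5 * math.sin(2 * math.pi * 0.003 * n), N - 1) for n in range(3000)]

yt = dfi_error_feedback(x, bq, cq, k=cq)     # optimal k: k1 = -a1, k2 = -a2
yref = dfi_float([v / 2 ** (N - 1) for v in x],
                 [v / 2 ** CF for v in bq], [v / 2 ** CF for v in cq])
diff_lsb = [a - r * 2 ** (N - 1) for a, r in zip(yt, yref)]
```

Passing `k=(0, 0)` turns the feedback off. The difference is then shaped by 1/A(z) and can be as large as that filter's noise gain, which is several hundred lsbs for this low cutoff.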

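2N Word Equivalent to 2nd Order Error Compensated Filter
Wilson:
If k1 = -a1 and k2 = -a2, the filter is equivalent to a biquad implemented with a 2N bit recursive data path and no error feedback compensation.
[Figure: recursive section whose stored values are 2N bits; multiplying them by -a1 and -a2 gives 3N-bit products; Q produces the output yt(n) and discards the lo word e(n).]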

Is This Good News?
• Yes, DFI and DFIIT both work.
• A processor with 32 bit integer native word size capabilities doesn’t need truncation error compensation.
• Some modern floating point processors can do this inexpensively and fast.
• Some modern fixed point processors can do this inexpensively but more slowly.

Floating Point?
• All DF IIR filters can work.
• The analysis and addend evaluation are complicated.
• Floating point processors with extended precision results may work well.
• Analog Devices has processors with 32 bit mantissa capabilities.
• Fixed point processing at 32 bits is a sure bet.

References
• J. Dattorro, “The Implementation of Recursive Digital Filters for High Fidelity Audio,” J. Audio Eng. Soc., vol. 36, pp. 851-878 (1988 Nov.).
• R. Wilson, “Filter Topologies,” J. Audio Eng. Soc., vol. 41, pp. 667-678 (1993 Sept.).
• S. P. Lipshitz, R. A. Wannamaker and J. Vanderkooy, “Quantization and Dither: A Theoretical Study,” J. Audio Eng. Soc., vol. 40, pp. 355-375 (1992 May).
