
ELEN 5301 Adv. DSP and Modeling, Summer II 2008
Lecture 07: Adaptive filtering
Instructor: Dr. Gleb V. Tcheslavski
Contact: gleb@ee.lamar.edu
Office Hours: Room 2030
Class web site: http://www.ee.lamar.edu/gleb/adsp/Index.htm
Material comes from Hayes.
Introduction

So far, we have been concerned with the processing (including optimum filtering) of wss signals. However, most natural signals are nonstationary, which makes the techniques considered so far inappropriate.

One way around this limitation would be to process nonstationary signals in blocks, over short time intervals for which the process may be assumed to be approximately stationary (quasi-stationary). However, the efficiency of this approach is limited for several reasons:

1. For rapidly varying processes, the quasi-stationarity interval may be too short to provide the desired resolution in parameter estimation.
2. It is not easy to accommodate step changes within the analysis intervals.
3. This solution imposes an incorrect (i.e., piecewise stationary) model on the nonstationary data.

Let us therefore start from scratch.
Problem statement

Reconsider the Wiener filtering problem within the context of nonstationary processes. Let $w_n$ be the unit pulse response of the FIR Wiener filter producing the MMS estimate of the desired process $d_n$:

$$\hat{d}_n = \sum_{k=0}^{p} w_k\, x_{n-k}$$   (7.3.1)

If $d_n$ and $x_n$ are nonstationary, the filter coefficients minimizing the error

$$\xi_n = E\left\{|e_n|^2\right\} = E\left\{\left|d_n - \hat{d}_n\right|^2\right\}$$   (7.3.2)

will depend on $n$ and the filter will be time-varying, i.e.,

$$\hat{d}_n = \sum_{k=0}^{p} w_{n,k}\, x_{n-k}$$   (7.3.3)

where $w_{n,k}$ is the value of the k-th coefficient at time $n$.

The last equation can be rewritten using vector notation:

$$\hat{d}_n = \mathbf{w}_n^T \mathbf{x}_n$$   (7.4.1)

where

$$\mathbf{w}_n = \left[w_{n,0},\, w_{n,1},\, \ldots,\, w_{n,p}\right]^T$$   (7.4.2)

is the vector of filter coefficients at time $n$, and

$$\mathbf{x}_n = \left[x_n,\, x_{n-1},\, \ldots,\, x_{n-p}\right]^T.$$   (7.4.3)

The design of a time-varying (adaptive) filter is much more difficult than the design of a traditional (time-invariant) Wiener filter since it is necessary to find a set of optimum coefficients $w_{n,k}$ for $k = 0, 1, \ldots, p$ and for each value of $n$. However, the problem may be simplified considerably if we do not require that $\mathbf{w}_n$ minimize the MS error at each time $n$ and consider, instead, a coefficient update equation of the form

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \Delta\mathbf{w}_n$$   (7.5.1)

where $\Delta\mathbf{w}_n$ is a correction that is applied to the filter coefficients $\mathbf{w}_n$ at time $n$ to form a new set of coefficients $\mathbf{w}_{n+1}$ at time $n+1$. The design of an adaptive filter involves defining how this correction should be formed.

This approach may be preferred even in the stationary case. For instance, if the order $p$ is large, it might be difficult to solve the Wiener-Hopf equations directly. Also, if $\mathbf{R}_x$ is ill-conditioned (nearly singular), the solution to the Wiener-Hopf equations will be numerically sensitive to round-off errors and finite precision effects. Finally, since the autocorrelation and cross-correlation are unknown, we need to estimate them; since the process may be changing, these estimates need to be updated continuously.

The key concept of an adaptive filter is the set of rules defining how the correction $\Delta\mathbf{w}_n$ is formed. Regardless of the particular algorithm, the adaptive filter should have the following properties:

1. In a stationary environment, the filter should produce a sequence of corrections $\Delta\mathbf{w}_n$ such that $\mathbf{w}_n$ converges to the solution of the Wiener-Hopf equations:

$$\lim_{n \to \infty} \mathbf{w}_n = \mathbf{R}_x^{-1}\mathbf{r}_{dx}$$   (7.6.1)

2. It should not be necessary to know the signal statistics $r_x(k)$ and $r_{dx}(k)$ in order to compute $\Delta\mathbf{w}_n$. The estimation of these statistics should be a part of the adaptive filter.

3. In nonstationary environments, the filter should be able to adapt to the changing signal statistics and track the solution as it evolves in time.

It is also important that the error signal $e_n$ be available to the filter, since this error signal allows the filter to measure its performance and determine how to modify the filter coefficients (adapt).
Obtaining the desired signal $d_n$ is straightforward in some applications, which makes the evaluation of $e_n$ easy.

For instance, in the case of system identification, a plant produces an output $d_n$ in response to a known input $x_n$. The goal is to develop a model for the plant that replicates $d_n$ as closely as possible. An additive noise source $v_n$ represents observation noise.

In other applications, obtaining the desired signal $d_n$ is not as straightforward. For example, in the problem of interference (noise) cancellation, $d_n$ is observed in the presence of an interfering signal $v_n$:

$$x_n = d_n + v_n$$   (7.8.1)
Since $d_n$ is unknown, the error sequence cannot be generated directly. However, in some situations it is possible to generate a sequence that may be used by the adaptive filter to estimate $d_n$.

Suppose that $d_n$ and $v_n$ are real-valued, uncorrelated, zero-mean processes. Additionally, suppose that $d_n$ is a narrowband process and that $v_n$ is a broadband process with an autocorrelation sequence that is approximately zero for lags $|k| \ge k_0$. The MS error may be expressed as

$$E\{e_n^2\} = E\left\{\left[d_n + v_n - y_n\right]^2\right\} = E\{v_n^2\} + E\left\{\left[d_n - y_n\right]^2\right\} + 2E\left\{v_n\left[d_n - y_n\right]\right\}.$$   (7.9.1)

Since $v_n$ and $d_n$ are uncorrelated,

$$E\{v_n d_n\} = 0$$   (7.9.2)

and the last term in (7.9.1) becomes

$$E\left\{v_n\left[d_n - y_n\right]\right\} = -E\{v_n y_n\}.$$   (7.9.3)

Additionally, since the input to the filter is $x_{n-k_0}$, the output will be

$$y_n = \sum_{k=0}^{p} w_{n,k}\, x_{n-k_0-k} = \sum_{k=0}^{p} w_{n,k}\left[d_{n-k_0-k} + v_{n-k_0-k}\right].$$   (7.10.1)

Therefore,

$$E\{v_n y_n\} = \sum_{k=0}^{p} w_{n,k}\left[E\{v_n d_{n-k_0-k}\} + E\{v_n v_{n-k_0-k}\}\right].$$   (7.10.2)

Finally, since $v_n$ is uncorrelated with $d_{n-k_0-k}$ as well as with $v_{n-k_0-k}$, each term vanishes and

$$E\{v_n y_n\} = 0,$$   (7.10.3)

so the MS error becomes

$$E\{e_n^2\} = E\{v_n^2\} + E\left\{\left[d_n - y_n\right]^2\right\}.$$   (7.10.4)

Minimizing the filter MS error is therefore equivalent to minimizing the MS error between the desired signal $d_n$ and the filter output $y_n$. Hence, the output of the adaptive filter in this case is the MMS estimate of $d_n$.

FIR adaptive filters

FIR adaptive filters are quite popular for the following reasons:

1. Stability can be easily controlled by ensuring that the filter coefficients are bounded.
2. There are simple and efficient algorithms for adjusting the filter coefficients.
3. These algorithms are well understood in terms of their convergence and stability.
4. FIR adaptive filters usually perform well enough to satisfy the design criteria.

An FIR adaptive filter estimates a desired signal $d_n$ from a related signal $x_n$ as

$$\hat{d}_n = \sum_{k=0}^{p} w_{n,k}\, x_{n-k} = \mathbf{w}_n^T \mathbf{x}_n.$$   (7.11.1)
Assuming that $x_n$ and $d_n$ are nonstationary random processes, the goal is to find the coefficient vector $\mathbf{w}_n$ at time $n$ that minimizes the MS error

$$\xi_n = E\left\{|e_n|^2\right\}$$   (7.12.1)

where

$$e_n = d_n - \hat{d}_n = d_n - \mathbf{w}_n^T\mathbf{x}_n.$$   (7.12.2)

As in the Wiener filter case, the coefficients minimizing the MS error can be found by setting the derivative of $\xi_n$ with respect to $w_{n,k}^*$ equal to zero for $k = 0, 1, \ldots, p$. Therefore,

$$E\left\{e_n x_{n-k}^*\right\} = 0, \qquad k = 0, 1, \ldots, p$$   (7.12.3)

or

$$E\left\{\left[d_n - \sum_{l=0}^{p} w_{n,l}\, x_{n-l}\right] x_{n-k}^*\right\} = 0, \qquad k = 0, 1, \ldots, p$$   (7.12.4)

which, after rearranging terms, becomes

$$\sum_{l=0}^{p} w_{n,l}\, E\left\{x_{n-l} x_{n-k}^*\right\} = E\left\{d_n x_{n-k}^*\right\}, \qquad k = 0, 1, \ldots, p;$$   (7.13.1)

a set of $p+1$ linear equations in the $p+1$ unknowns $w_{n,l}$, the solution to which depends on $n$. Expressed in vector form:

$$\mathbf{R}_x(n)\,\mathbf{w}_n = \mathbf{r}_{dx}(n)$$   (7.13.2)

where

$$\mathbf{R}_x(n) = E\left\{\mathbf{x}_n^*\mathbf{x}_n^T\right\} = \begin{bmatrix} E\{x_n^* x_n\} & E\{x_n^* x_{n-1}\} & \cdots & E\{x_n^* x_{n-p}\} \\ E\{x_{n-1}^* x_n\} & E\{x_{n-1}^* x_{n-1}\} & \cdots & E\{x_{n-1}^* x_{n-p}\} \\ \vdots & \vdots & & \vdots \\ E\{x_{n-p}^* x_n\} & E\{x_{n-p}^* x_{n-1}\} & \cdots & E\{x_{n-p}^* x_{n-p}\} \end{bmatrix}$$   (7.13.3)

is the $(p+1)\times(p+1)$ Hermitian matrix of autocorrelations, and

$$\mathbf{r}_{dx}(n) = E\left\{d_n\mathbf{x}_n^*\right\} = \left[E\{d_n x_n^*\},\; E\{d_n x_{n-1}^*\},\; \ldots,\; E\{d_n x_{n-p}^*\}\right]^T$$   (7.14.1)

is the vector of cross-correlations between $d_n$ and $x_n$.

We observe that in the case of wss processes, (7.13.2) reduces to the Wiener-Hopf equations, and the solution $\mathbf{w}_n$ becomes time-independent. Instead of solving (7.13.2) for each value of $n$, which would be impractical in most real-time applications, we consider alternative approaches next.
FIR adaptive filters: the steepest descent adaptive filter

The vector $\mathbf{w}$ minimizing the quadratic error function can be found by setting its derivative with respect to the filter coefficients to zero. An alternative approach is to search for the solution using the iterative method of steepest descent.

Let $\mathbf{w}_n$ be an estimate of the vector that minimizes the MS error $\xi_n$ at time $n$. At time $n+1$, a new estimate is formed by adding a correction to $\mathbf{w}_n$ that is designed to bring $\mathbf{w}_n$ closer to the desired solution. The correction involves taking a step of size $\mu$ in the direction of maximum descent down the quadratic error surface.

For instance, consider a quadratic function of two real-valued coefficients $w_0$ and $w_1$, such as

$$\xi_n = 6 - 6w_0 - 4w_1 + 6w_0^2 + 6w_1^2 + 6w_0 w_1.$$   (7.15.1)

The contours of constant error, when projected onto the $w_0$-$w_1$ plane, form a set of concentric ellipses. The direction of steepest descent at any point in the plane is the direction that a marble would take if it were placed inside this quadratic bowl.

Mathematically, this direction is given by the gradient: the vector of partial derivatives of $\xi_n$ with respect to the coefficients $w_k$. For the quadratic function in (7.15.1), the gradient vector is

$$\nabla \xi_n = \begin{bmatrix} \dfrac{\partial \xi_n}{\partial w_0} \\[6pt] \dfrac{\partial \xi_n}{\partial w_1} \end{bmatrix} = \begin{bmatrix} 12 w_0 + 6 w_1 - 6 \\ 12 w_1 + 6 w_0 - 4 \end{bmatrix}.$$   (7.16.1)

We observe that the gradient is orthogonal to the line that is tangent to the contour of constant error at $\mathbf{w}$. However, since the gradient vector points in the direction of steepest ascent, the direction of steepest descent is the negative gradient direction.

Therefore, the update equation for $\mathbf{w}_n$ is

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \mu \nabla \xi_n.$$   (7.17.1)

The step size $\mu$ affects the rate at which the weight vector moves down the quadratic surface and must be a positive number. For very small values of $\mu$, the correction to $\mathbf{w}_n$ is small and the movement down the quadratic surface is slow; as $\mu$ increases, the rate of descent increases. However, an upper limit exists on how large the step size can be: for values of $\mu$ exceeding this limit, the trajectory of $\mathbf{w}_n$ becomes unstable and unbounded.

The steepest descent algorithm may be summarized as follows:

1. Initialize the steepest descent algorithm with an initial estimate $\mathbf{w}_0$ of the optimum weight vector $\mathbf{w}$.
2. Evaluate the gradient of $\xi_n$ at the current estimate $\mathbf{w}_n$.
3. Update the estimate at time $n$ by adding a correction in the negative gradient direction:

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \mu \nabla \xi_n$$   (7.18.1)

4. Go back to step 2 and repeat the process.
Assuming that $\mathbf{w}$ is complex, the gradient is the derivative of $E\{|e_n|^2\}$ with respect to $\mathbf{w}^*$. Then, with

$$\xi_n = E\left\{|e_n|^2\right\} = E\left\{e_n e_n^*\right\}$$   (7.19.1)

and

$$\nabla e_n^* = -\mathbf{x}_n^*$$   (7.19.2)

it follows that

$$\nabla \xi_n = -E\left\{e_n \mathbf{x}_n^*\right\}.$$   (7.19.3)

Therefore, with a step size $\mu$, the steepest descent algorithm becomes

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\, E\left\{e_n \mathbf{x}_n^*\right\}.$$   (7.19.4)

If $x_n$ and $d_n$ are jointly wss, then

$$E\left\{e_n \mathbf{x}_n^*\right\} = E\left\{d_n \mathbf{x}_n^*\right\} - E\left\{\mathbf{x}_n^*\mathbf{x}_n^T\right\}\mathbf{w}_n = \mathbf{r}_{dx} - \mathbf{R}_x \mathbf{w}_n$$   (7.20.1)

and the steepest descent algorithm becomes

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\left(\mathbf{r}_{dx} - \mathbf{R}_x \mathbf{w}_n\right).$$   (7.20.2)

We notice that if $\mathbf{w}_n$ is the solution to the Wiener-Hopf equations,

$$\mathbf{w}_n = \mathbf{R}_x^{-1}\mathbf{r}_{dx},$$   (7.20.3)

the correction term is zero and $\mathbf{w}_{n+1} = \mathbf{w}_n$ for all $n$, yielding a stationary solution.
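As a concrete illustration of the update (7.20.2), the following minimal NumPy sketch iterates the steepest descent recursion for a known autocorrelation matrix and cross-correlation vector; the 2x2 values below are hypothetical and used only for illustration:

```python
import numpy as np

def steepest_descent(Rx, rdx, mu, n_iter=100, w0=None):
    """Iterate w_{n+1} = w_n + mu * (rdx - Rx @ w_n), eq. (7.20.2)."""
    w = np.zeros(len(rdx)) if w0 is None else np.asarray(w0, dtype=float)
    trajectory = [w.copy()]
    for _ in range(n_iter):
        w = w + mu * (rdx - Rx @ w)          # step in the negative gradient direction
        trajectory.append(w.copy())
    return w, np.array(trajectory)

# Hypothetical example: the iterates approach the Wiener solution Rx^{-1} rdx
Rx = np.array([[2.0, 1.0], [1.0, 2.0]])
rdx = np.array([1.0, 0.5])
lam_max = np.linalg.eigvalsh(Rx).max()
w_sd, _ = steepest_descent(Rx, rdx, mu=1.0 / lam_max, n_iter=200)
print(w_sd, np.linalg.solve(Rx, rdx))        # the two should agree closely
```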
The following property defines what is required for $\mathbf{w}_n$ to converge to $\mathbf{w}$.

Property 1: For jointly wss processes $d_n$ and $x_n$, the steepest descent adaptive filter converges to the solution of the Wiener-Hopf equations,

$$\lim_{n \to \infty} \mathbf{w}_n = \mathbf{R}_x^{-1}\mathbf{r}_{dx},$$   (7.21.1)

if the step size satisfies the condition

$$0 < \mu < \frac{2}{\lambda_{\max}}$$   (7.21.2)

where $\lambda_{\max}$ is the maximum eigenvalue of the autocorrelation matrix $\mathbf{R}_x$.

To establish Property 1, we rewrite (7.19.4) as follows:

$$\mathbf{w}_{n+1} = \left(\mathbf{I} - \mu\mathbf{R}_x\right)\mathbf{w}_n + \mu\,\mathbf{r}_{dx}.$$   (7.22.1)

Subtracting $\mathbf{w}$ from both sides and using

$$\mathbf{r}_{dx} = \mathbf{R}_x\mathbf{w},$$   (7.22.2)

we have

$$\mathbf{w}_{n+1} - \mathbf{w} = \left(\mathbf{I} - \mu\mathbf{R}_x\right)\mathbf{w}_n + \mu\mathbf{R}_x\mathbf{w} - \mathbf{w} = \left(\mathbf{I} - \mu\mathbf{R}_x\right)\left(\mathbf{w}_n - \mathbf{w}\right).$$   (7.22.3)

Denoting the weight error vector by

$$\mathbf{c}_n = \mathbf{w}_n - \mathbf{w},$$   (7.22.4)

(7.22.3) becomes

$$\mathbf{c}_{n+1} = \left(\mathbf{I} - \mu\mathbf{R}_x\right)\mathbf{c}_n.$$   (7.22.5)

We note that, unless $\mathbf{R}_x$ is a diagonal matrix, there will be cross-coupling between the coefficients of the weight error vector. However, these coefficients can be decoupled using the following approach.

The autocorrelation matrix may be factored using the spectral theorem as

$$\mathbf{R}_x = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H$$   (7.23.1)

where $\boldsymbol{\Lambda}$ is a diagonal matrix of eigenvalues of $\mathbf{R}_x$ and $\mathbf{V}$ is a matrix whose columns are the eigenvectors of $\mathbf{R}_x$. Since $\mathbf{R}_x$ is Hermitian and nonnegative definite, the eigenvalues are real and non-negative, $\lambda_k \ge 0$, and the eigenvectors may be chosen to be orthonormal: $\mathbf{V}\mathbf{V}^H = \mathbf{I}$, i.e., $\mathbf{V}$ is unitary. Then

$$\mathbf{c}_{n+1} = \left(\mathbf{I} - \mu\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H\right)\mathbf{c}_n.$$   (7.23.2)

Using the unitary property of $\mathbf{V}$, we derive

$$\mathbf{c}_{n+1} = \left(\mathbf{V}\mathbf{V}^H - \mu\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H\right)\mathbf{c}_n = \mathbf{V}\left(\mathbf{I} - \mu\boldsymbol{\Lambda}\right)\mathbf{V}^H\mathbf{c}_n.$$   (7.23.3)

Multiplying both sides of (7.23.3) by $\mathbf{V}^H$ gives

$$\mathbf{V}^H\mathbf{c}_{n+1} = \left(\mathbf{I} - \mu\boldsymbol{\Lambda}\right)\mathbf{V}^H\mathbf{c}_n.$$   (7.24.1)

Defining next

$$\mathbf{u}_n = \mathbf{V}^H\mathbf{c}_n,$$   (7.24.2)

(7.24.1) becomes

$$\mathbf{u}_{n+1} = \left(\mathbf{I} - \mu\boldsymbol{\Lambda}\right)\mathbf{u}_n.$$   (7.24.3)

The transformation (7.24.2) represents a rotation of the coordinate system for the weight error vector $\mathbf{c}_n$, with the new axes aligned with the eigenvectors $\mathbf{v}_k$ of the autocorrelation matrix. With the initial weight error vector $\mathbf{u}_0$, it follows that

$$\mathbf{u}_n = \left(\mathbf{I} - \mu\boldsymbol{\Lambda}\right)^n\mathbf{u}_0.$$   (7.24.4)

Since $(\mathbf{I} - \mu\boldsymbol{\Lambda})$ is a diagonal matrix, the k-th component of $\mathbf{u}_n$ may be expressed as

$$u_{n,k} = \left(1 - \mu\lambda_k\right)^n u_{0,k}.$$   (7.25.1)

In order for $\mathbf{w}_n$ to converge to $\mathbf{w}$, it is necessary that the weight error vector $\mathbf{c}_n$ converge to zero and, therefore, that $\mathbf{u}_n = \mathbf{V}^H\mathbf{c}_n$ converge to zero. This will occur iff

$$\left|1 - \mu\lambda_k\right| < 1, \qquad k = 0, 1, \ldots, p$$   (7.25.2)

which imposes the following restriction on the step size $\mu$:

$$0 < \mu < \frac{2}{\lambda_{\max}}$$   (7.25.3)

as was to be shown.
We may next derive an expression for the evolution of the weight vector $\mathbf{w}_n$. With

$$\mathbf{w}_n = \mathbf{w} + \mathbf{c}_n = \mathbf{w} + \mathbf{V}\mathbf{u}_n = \mathbf{w} + \left[\mathbf{v}_0,\; \mathbf{v}_1,\; \ldots,\; \mathbf{v}_p\right]\begin{bmatrix} u_{n,0} \\ u_{n,1} \\ \vdots \\ u_{n,p} \end{bmatrix}$$   (7.26.1)

and using (7.25.1), we have

$$\mathbf{w}_n = \mathbf{w} + \sum_{k=0}^{p} u_{n,k}\,\mathbf{v}_k = \mathbf{w} + \sum_{k=0}^{p} \left(1 - \mu\lambda_k\right)^n u_{0,k}\,\mathbf{v}_k.$$   (7.26.2)

Since $\mathbf{w}_n - \mathbf{w}$ is a linear combination of the eigenvectors $\mathbf{v}_k$, referred to as the modes of the filter, $\mathbf{w}_n$ will converge no faster than the slowest decaying mode. With each mode decaying as $(1-\mu\lambda_k)^n$, we may define the time constant $\tau_k$ to be the time required for the k-th mode to reach $1/e$ of its initial value:

$$\left(1 - \mu\lambda_k\right)^{\tau_k} = \frac{1}{e}.$$   (7.26.3)

Taking logarithms of (7.26.3), we have

$$\tau_k = \frac{-1}{\ln\left(1 - \mu\lambda_k\right)}.$$   (7.27.1)

If $\mu$ is small enough that $\mu\lambda_k \ll 1$, the time constant may be approximated as

$$\tau_k \approx \frac{1}{\mu\lambda_k}.$$   (7.27.2)

Defining the overall time constant $\tau$ as the time needed for the slowest decaying mode to converge to $1/e$ of its initial value, we have

$$\tau = \max_k\left\{\tau_k\right\} \approx \frac{1}{\mu\lambda_{\min}}.$$   (7.27.3)

Since Property 1 places an upper bound on the step size for convergence, the step size can be expressed as

$$\mu = \frac{2\alpha}{\lambda_{\max}}$$   (7.28.1)

where $\alpha$ is a normalized step size with $0 < \alpha < 1$. Therefore, the time constant is

$$\tau \approx \frac{1}{\mu\lambda_{\min}} = \frac{\lambda_{\max}}{2\alpha\lambda_{\min}} = \frac{\chi}{2\alpha}$$   (7.28.2)

where

$$\chi = \frac{\lambda_{\max}}{\lambda_{\min}}$$   (7.28.3)

is the condition number of the autocorrelation matrix.
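As a quick numerical illustration, using the eigenvalues $\lambda_1 = 9.7924$ and $\lambda_2 = 1.7073$ that appear later in the linear-prediction example, and a hypothetical normalized step size $\alpha = 0.1$, the time constant follows directly from (7.28.2):

$$\chi = \frac{9.7924}{1.7073} \approx 5.74, \qquad \tau \approx \frac{\chi}{2\alpha} = \frac{5.74}{0.2} \approx 29 \text{ iterations}.$$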
Therefore, the rate of convergence is determined by the eigenvalue spread.

(Figure: error contours for two-dimensional adaptive filters. Left panel: $\lambda_1 = \lambda_2 = 1$, i.e., $\chi = 1$; since the contours are circles, the direction of steepest descent points toward the minimum of the quadratic function. Right panel: $\lambda_1 = 3$, $\lambda_2 = 1$, i.e., $\chi = 3$; the direction of steepest descent does not, in general, point toward the minimum of the quadratic function.)

Another useful measure of performance is the behavior of the MS error as a function of $n$. For jointly wss processes, the MMS error is

$$\xi_{\min} = r_d(0) - \mathbf{r}_{dx}^H\mathbf{w}.$$   (7.30.1)

For an arbitrary weight vector $\mathbf{w}_n$, the MS error is

$$\xi_n = E\left\{|e_n|^2\right\} = E\left\{\left|d_n - \mathbf{w}_n^T\mathbf{x}_n\right|^2\right\} = r_d(0) - \mathbf{r}_{dx}^H\mathbf{w}_n - \mathbf{w}_n^H\mathbf{r}_{dx} + \mathbf{w}_n^H\mathbf{R}_x\mathbf{w}_n.$$   (7.30.2)

Expressing $\mathbf{w}_n$ in terms of the weight error vector $\mathbf{c}_n$, the MS error becomes

$$\xi_n = r_d(0) - \mathbf{r}_{dx}^H\left(\mathbf{w} + \mathbf{c}_n\right) - \left(\mathbf{w} + \mathbf{c}_n\right)^H\mathbf{r}_{dx} + \left(\mathbf{w} + \mathbf{c}_n\right)^H\mathbf{R}_x\left(\mathbf{w} + \mathbf{c}_n\right).$$   (7.30.3)

Expanding the products and using

$$\mathbf{R}_x\mathbf{w} = \mathbf{r}_{dx},$$   (7.30.4)

we derive

$$\xi_n = r_d(0) - \mathbf{r}_{dx}^H\mathbf{w} + \mathbf{c}_n^H\mathbf{R}_x\mathbf{c}_n.$$   (7.31.1)

The first two terms are equal to the minimum error. Therefore, the error at time $n$ is

$$\xi_n = \xi_{\min} + \mathbf{c}_n^H\mathbf{R}_x\mathbf{c}_n.$$   (7.31.2)

With the decomposition

$$\mathbf{R}_x = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H$$   (7.31.3)

and using the definition of $\mathbf{u}_n$ given previously, we have

$$\xi_n = \xi_{\min} + \mathbf{u}_n^H\boldsymbol{\Lambda}\mathbf{u}_n.$$   (7.31.4)

Expanding the quadratic form and using the expression for $u_{n,k}$ yields

$$\xi_n = \xi_{\min} + \sum_{k=0}^{p}\lambda_k\left|u_{n,k}\right|^2 = \xi_{\min} + \sum_{k=0}^{p}\lambda_k\left(1 - \mu\lambda_k\right)^{2n}\left|u_{0,k}\right|^2.$$   (7.31.5)
If the step size satisfies the condition for convergence (7.21.2), then $\xi_n$ decays exponentially to $\xi_{\min}$. A plot of $\xi_n$ versus $n$ is referred to as the learning curve and indicates how rapidly the adaptive filter learns the solution to the Wiener-Hopf equations.

Although for stationary processes the steepest descent adaptive filter converges to the solution of the Wiener-Hopf equations when $\mu < 2/\lambda_{\max}$, this algorithm is primarily of theoretical interest and finds little use in practice. The reason is that computing the gradient vector requires knowledge of $E\{e_n\mathbf{x}_n^*\}$. For the majority of applications the exact statistics are unknown and must be estimated from the data, for example as

$$\hat{E}\left\{e_n\mathbf{x}_n^*\right\} = \frac{1}{L}\sum_{l=0}^{L-1} e_{n-l}\,\mathbf{x}_{n-l}^*.$$   (7.32.1)

FIR adaptive filters: LMS algorithm

Incorporating the correlation estimate into the steepest descent method yields

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \frac{\mu}{L}\sum_{l=0}^{L-1} e_{n-l}\,\mathbf{x}_{n-l}^*.$$   (7.33.1)

In the special case of a one-point sample mean ($L = 1$),

$$\hat{E}\left\{e_n\mathbf{x}_n^*\right\} = e_n\mathbf{x}_n^*$$   (7.33.2)

and the filter update equation becomes

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\, e_n\mathbf{x}_n^*,$$   (7.33.3)

which is known as the LMS algorithm.

The simplicity of the algorithm comes from the fact that the update of the k-th filter coefficient requires only one multiplication and one addition:

$$w_{n+1,k} = w_{n,k} + \mu\, e_n x_{n-k}^*.$$   (7.34.1)

An LMS adaptive filter with $p+1$ coefficients requires $p+1$ multiplications and $p+1$ additions to update the filter coefficients. One addition is needed to form the error $e_n$ and one multiplication is required to form the product $\mu e_n$. Finally, $p+1$ multiplications and $p$ additions are needed to calculate the output $y_n$. Therefore, a total of $2p+3$ multiplications and $2p+2$ additions per output sample are required.

A summary sketch of the LMS algorithm is given below.
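The following minimal NumPy sketch implements the coefficient update (7.33.3) for a real-valued FIR LMS filter (for complex data, the conjugate of the input vector would be used, as in the equations above); the function name and interface are illustrative, not from the original slides:

```python
import numpy as np

def lms(x, d, p, mu):
    """FIR LMS adaptive filter: w_{n+1} = w_n + mu * e_n * x_n  (real-valued data).

    x  : input signal
    d  : desired signal
    p  : filter order (p+1 coefficients)
    mu : step size
    Returns the filter output y, the error e, and the coefficient history W.
    """
    N = len(x)
    w = np.zeros(p + 1)
    y = np.zeros(N)
    e = np.zeros(N)
    W = np.zeros((N, p + 1))
    for n in range(p, N):
        xn = x[n::-1][:p + 1]        # data vector [x[n], x[n-1], ..., x[n-p]]
        y[n] = w @ xn                # filter output
        e[n] = d[n] - y[n]           # error
        w = w + mu * e[n] * xn       # LMS coefficient update, eq. (7.33.3)
        W[n] = w
    return y, e, W
```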
Convergence of the LMS algorithm

In estimating the statistic $E\{e_n\mathbf{x}_n^*\}$ by a one-point sample mean, the LMS algorithm replaces the gradient in the steepest descent algorithm by the estimated gradient

$$\hat{\nabla}\xi_n = -e_n\mathbf{x}_n^*.$$   (7.35.1)

In this situation, the correction to the filter coefficients is generally not aligned with the direction of steepest descent. However, since the gradient estimate is unbiased,

$$E\left\{\hat{\nabla}\xi_n\right\} = -E\left\{e_n\mathbf{x}_n^*\right\} = \nabla\xi_n,$$   (7.35.2)

the applied correction is, on average, in the direction of steepest descent.

(Figure: weight trajectories. Left panel: with $\mathbf{w}_0 = \mathbf{0}$, the weights move toward the Wiener-Hopf solution. Right panel: with $\mathbf{w}_0$ equal to the Wiener-Hopf solution, where the gradient is zero, the weights wander randomly about it.)

Assume that $x_n$ and $d_n$ are jointly wss and that the coefficients converge in the mean to

$$\lim_{n\to\infty} E\left\{\mathbf{w}_n\right\} = \mathbf{w} = \mathbf{R}_x^{-1}\mathbf{r}_{dx}.$$   (7.36.1)

The LMS coefficient update equation can be rewritten as

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\left[d_n - \mathbf{w}_n^T\mathbf{x}_n\right]\mathbf{x}_n^*.$$   (7.36.2)

Taking the expected value, we have

$$E\left\{\mathbf{w}_{n+1}\right\} = E\left\{\mathbf{w}_n\right\} + \mu E\left\{d_n\mathbf{x}_n^*\right\} - \mu E\left\{\mathbf{x}_n^*\mathbf{x}_n^T\mathbf{w}_n\right\}.$$   (7.36.3)

The last term in (7.36.3) is not easy to evaluate. However, it may be simplified by making the independence assumption:

Independence assumption: the data vector $\mathbf{x}_n$ and the LMS weight vector $\mathbf{w}_n$ are statistically independent.

Since $\mathbf{w}_n$ depends on the previous data vectors $\mathbf{x}_{n-1}, \mathbf{x}_{n-2}, \ldots$, this assumption is only approximately true.

With this assumption, (7.36.3) becomes

$$E\left\{\mathbf{w}_{n+1}\right\} = E\left\{\mathbf{w}_n\right\} + \mu E\left\{d_n\mathbf{x}_n^*\right\} - \mu E\left\{\mathbf{x}_n^*\mathbf{x}_n^T\right\}E\left\{\mathbf{w}_n\right\} = \left(\mathbf{I} - \mu\mathbf{R}_x\right)E\left\{\mathbf{w}_n\right\} + \mu\,\mathbf{r}_{dx}.$$   (7.37.1)

We observe that (7.37.1) has the same form as for the steepest descent algorithm. Therefore, the analysis is the same. In particular, it follows that

$$E\left\{\mathbf{u}_n\right\} = \left(\mathbf{I} - \mu\boldsymbol{\Lambda}\right)^n\mathbf{u}_0$$   (7.37.2)

where

$$\mathbf{u}_n = \mathbf{V}^H\left(\mathbf{w}_n - \mathbf{w}\right).$$   (7.37.3)

Since $\mathbf{w}_n$ converges in the mean to $\mathbf{w}$ if $E\{\mathbf{u}_n\}$ converges to zero, the following property holds.

Property 2: For jointly wss processes, the LMS algorithm converges in the mean if the independence assumption is satisfied and

$$0 < \mu < \frac{2}{\lambda_{\max}}.$$   (7.37.4)

(7.37.4) places a bound on the step size for convergence in the mean. However, this bound is not very practical since:
1) the upper bound is too large to ensure stability of the LMS algorithm;
2) since $\mathbf{R}_x$ is usually unknown, its eigenvalues cannot be computed.

The last difficulty can be relaxed by observing that

$$\lambda_{\max} \le \sum_{k=0}^{p}\lambda_k = \mathrm{tr}\left(\mathbf{R}_x\right)$$   (7.38.1)

where the trace operator is defined as

$$\mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^{n} a_{ii}.$$   (7.38.2)

If $x_n$ is wss, $\mathbf{R}_x$ is Toeplitz and the trace can be computed as

$$\mathrm{tr}\left(\mathbf{R}_x\right) = (p+1)\,r_x(0) = (p+1)\,E\left\{|x_n|^2\right\}.$$   (7.38.3)
Although we have simply replaced one unknown with another, $E\{|x_n|^2\}$ is easier to estimate since it represents the power in $x_n$. For example, it could be estimated using an average such as

$$\hat{E}\left\{|x_n|^2\right\} = \frac{1}{N}\sum_{k=0}^{N-1}\left|x_{n-k}\right|^2.$$   (7.39.1)

The bound (7.37.4) may then be replaced with the more conservative bound

$$0 < \mu < \frac{2}{(p+1)\,E\left\{|x_n|^2\right\}}.$$   (7.39.2)

Ex: Adaptive linear prediction by the LMS algorithm

Let $x_n$ be a second-order AR process generated according to

$$x_n = 1.2728\,x_{n-1} - 0.81\,x_{n-2} + v_n$$   (7.40.1)

where $v_n$ is unit variance white noise. The optimum causal linear predictor would be

$$\hat{x}_n = 1.2728\,x_{n-1} - 0.81\,x_{n-2}.$$   (7.40.2)

However, to determine this predictor (i.e., to find its coefficients), the exact autocorrelation sequence of $x_n$ needs to be known. Therefore, we consider an adaptive linear predictor of the form

$$\hat{x}_n = w_{n,1}\,x_{n-1} + w_{n,2}\,x_{n-2}.$$   (7.40.3)

The predictor coefficients are updated as follows:

$$w_{n+1,k} = w_{n,k} + \mu\, e_n x_{n-k}, \qquad k = 1, 2.$$   (7.40.4)
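A short simulation of this example might look as follows; the step size $\mu = 0.02$ is one of the values used in the slides, while the signal length and random seed are arbitrary choices:

```python
import numpy as np

# Generate the AR(2) process of eq. (7.40.1)
rng = np.random.default_rng(0)
N = 1000
v = rng.standard_normal(N)                  # unit-variance white noise
x = np.zeros(N)
for n in range(2, N):
    x[n] = 1.2728 * x[n - 1] - 0.81 * x[n - 2] + v[n]

# Two-coefficient LMS predictor: predict x[n] from x[n-1] and x[n-2]
mu = 0.02
w = np.zeros(2)
W = np.zeros((N, 2))
e = np.zeros(N)
for n in range(2, N):
    xn = np.array([x[n - 1], x[n - 2]])     # delayed inputs
    e[n] = x[n] - w @ xn                    # prediction error
    w = w + mu * e[n] * xn                  # coefficient update, eq. (7.40.4)
    W[n] = w

print(W[-1])   # should fluctuate around the optimum values [1.2728, -0.81]
```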
If the step size is sufficiently small, the coefficients $w_{n,1}$ and $w_{n,2}$ will converge in the mean to their optimum values, $w_1 = 1.2728$ and $w_2 = -0.81$. In general, the prediction error is

$$e_n = x_n - \hat{x}_n = 1.2728\,x_{n-1} - 0.81\,x_{n-2} + v_n - w_{n,1}\,x_{n-1} - w_{n,2}\,x_{n-2}.$$   (7.41.1)

Therefore, when the coefficients converge to their optimum values, the error becomes $e_n = v_n$, and the MMS error is

$$\xi_{\min} = \sigma_v^2 = 1.$$   (7.41.2)

An interesting property of an adaptive linear predictor is that its MS error does not converge all the way down to this MMS value. (Figure: block diagram showing how the predictions are formed from the delayed inputs.)

Assuming that the weight vector is initialized at zero, $\mathbf{w}_0 = \mathbf{0}$, the coefficient trajectories for $\mu = 0.02$ and $\mu = 0.004$ are shown in the figure, along with the true values. We observe that for the smaller step size the weights take longer to converge, but the trajectories are smoother: there is a basic trade-off between the speed of convergence and the stability (variance) of the final solution.

The corresponding squared errors show the same behavior: the squared error is also less peaked for the smaller step size.

Convergence of the LMS algorithm

As the weight vector begins to converge in the mean, the coefficients start to fluctuate about their optimum values. These fluctuations are due to the noisy gradient vectors used to form the corrections to $\mathbf{w}_n$. As a result, the variance of the weight vector does not go to zero, and the MS error is larger than its MMS value by the amount of the excess MS error.

As $\mathbf{w}_n$ oscillates about $\mathbf{w} = \mathbf{R}_x^{-1}\mathbf{r}_{dx}$, the corresponding MS error $\xi_n$ exceeds the MMS error on average. The error at time $n$ is

$$e_n = d_n - \mathbf{w}_n^T\mathbf{x}_n = d_n - \mathbf{w}^T\mathbf{x}_n - \mathbf{c}_n^T\mathbf{x}_n = e_{\min,n} - \mathbf{c}_n^T\mathbf{x}_n$$   (7.44.1)

where $e_{\min,n}$ is the error that would occur if the optimum filter coefficients were used:

$$e_{\min,n} = d_n - \mathbf{w}^T\mathbf{x}_n.$$   (7.44.2)

Assuming that the filter is in the steady state with $E\{\mathbf{c}_n\} = \mathbf{0}$, the MS error is

$$\xi_n = E\left\{|e_n|^2\right\} = \xi_{\min} + \xi_{ex,n}$$   (7.44.3)
where $\xi_{\min}$ is the MMS error,

$$\xi_{\min} = E\left\{\left|e_{\min,n}\right|^2\right\},$$   (7.45.1)

and $\xi_{ex,n}$ is the excess MS error, which depends on the statistics of $x_n$, $\mathbf{c}_n$, and $d_n$. Although $\xi_{ex,n}$ is not easy to evaluate, the following property may be established using the independence assumption.

Property 3: The MS error $\xi_n$ converges to a steady-state value of

$$\xi(\infty) = \xi_{\min} + \xi_{ex}(\infty) = \xi_{\min}\left[1 - \sum_{k=0}^{p}\frac{\mu\lambda_k}{2 - \mu\lambda_k}\right]^{-1}$$   (7.45.3)

and the LMS algorithm converges in the mean-square iff the step size satisfies

$$0 < \mu < \frac{2}{\lambda_{\max}}$$   (7.45.2)

and

$$\sum_{k=0}^{p}\frac{\mu\lambda_k}{2 - \mu\lambda_k} < 1.$$   (7.45.4)

From Property 3 we find that

$$\xi_{ex}(\infty) = \xi_{\min}\,\frac{\displaystyle\sum_{k=0}^{p}\frac{\mu\lambda_k}{2 - \mu\lambda_k}}{1 - \displaystyle\sum_{k=0}^{p}\frac{\mu\lambda_k}{2 - \mu\lambda_k}}.$$   (7.46.1)

Moreover, if $\mu \ll 2/\lambda_{\max}$, as is typically the case, then $\mu\lambda_k \ll 2$ and (7.45.4) may be simplified to

$$\frac{\mu}{2}\sum_{k=0}^{p}\lambda_k < 1$$   (7.46.2)

or

$$\mu < \frac{2}{\mathrm{tr}\left(\mathbf{R}_x\right)}.$$   (7.46.3)

When $\mu \ll 2/\lambda_{\max}$ it also follows that

$$\xi(\infty) \approx \frac{\xi_{\min}}{1 - 0.5\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right)}$$   (7.46.4)

and the excess MS error is approximately

$$\xi_{ex}(\infty) \approx \xi_{\min}\,\frac{0.5\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right)}{1 - 0.5\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right)} \approx \frac{1}{2}\,\mu\,\xi_{\min}\,\mathrm{tr}\left(\mathbf{R}_x\right).$$   (7.47.1)

Therefore, for small $\mu$, the excess MS error is proportional to the step size $\mu$.

Adaptive filters may also be described in terms of their misadjustment, which is a normalized MS error defined as follows.

Definition: The misadjustment $\mathcal{M}$ is the ratio of the steady-state excess MS error to the minimum MS error:

$$\mathcal{M} = \frac{\xi_{ex}(\infty)}{\xi_{\min}}.$$   (7.47.2)

If the step size is small, $\mu \ll 2/\lambda_{\max}$, then the misadjustment is approximately

$$\mathcal{M} \approx \frac{0.5\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right)}{1 - 0.5\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right)} \approx \frac{1}{2}\,\mu\,\mathrm{tr}\left(\mathbf{R}_x\right).$$   (7.47.3)
Ex: LMS misadjustment

We now look at the learning curves for the adaptive linear predictor considered in the previous example. Since the learning curve is a plot of $\xi_n = E\{|e_n|^2\}$ versus $n$, we may approximate it by averaging plots of $|e_n|^2$ obtained by repeatedly running the adaptive predictor $K$ times. Denoting the squared error at time $n$ on the k-th trial by $|e_{k,n}|^2$, we have

$$\hat{\xi}_n = \frac{1}{K}\sum_{k=1}^{K}\left|e_{k,n}\right|^2 \approx E\left\{|e_n|^2\right\}.$$   (7.48.1)

With $K = 200$, an initial weight vector of zero, and step sizes of $\mu = 0.02$ (solid line) and $\mu = 0.004$ (dashed line), the estimated learning curves are shown in the figure. When the step size decreases, the convergence of the adaptive filter to its steady state is slower, but the average steady-state squared error is smaller.

We may estimate the steady-state MS error by averaging over $n$ after the LMS algorithm has reached steady state. For example, with

$$\hat{\xi}_\infty = \frac{1}{100}\sum_{n=901}^{1000}\left|e_n\right|^2$$   (7.49.1)

we find

$$\hat{\xi}_\infty = \begin{cases} 1.1942, & \mu = 0.02 \\ 1.0155, & \mu = 0.004 \end{cases}$$   (7.49.2)

We may compare these results to the theoretical steady-state MS error. With $\xi_{\min} = 1$ and eigenvalues $\lambda_1 = 9.7924$ and $\lambda_2 = 1.7073$, it follows that

$$\xi(\infty) = \begin{cases} 1.1441, & \mu = 0.02 \\ 1.0240, & \mu = 0.004 \end{cases}$$   (7.49.3)

which are fairly close to the estimated values.
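As a check of the first of these theoretical values, substituting $\mu = 0.02$, $\lambda_1 = 9.7924$, and $\lambda_2 = 1.7073$ into (7.45.3) gives

$$\sum_{k}\frac{\mu\lambda_k}{2-\mu\lambda_k} = \frac{0.1958}{1.8042} + \frac{0.0341}{1.9659} \approx 0.1086 + 0.0174 = 0.1260, \qquad \xi(\infty) = \frac{1}{1 - 0.1260} \approx 1.144.$$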
Normalized LMS (NLMS)

One of the difficulties in the design of LMS adaptive filters is the selection of the step size $\mu$. For stationary processes, the LMS algorithm converges in the mean if $0 < \mu < 2/\lambda_{\max}$, and converges in the mean-square if $0 < \mu < 2/\mathrm{tr}(\mathbf{R}_x)$. However, since $\mathbf{R}_x$ is generally unknown, either $\lambda_{\max}$ or $\mathrm{tr}(\mathbf{R}_x)$ must be estimated.

One way is to use the fact that, for stationary processes,

$$\mathrm{tr}\left(\mathbf{R}_x\right) = (p+1)\,E\left\{|x_n|^2\right\}.$$   (7.50.1)

Therefore, the condition for mean-square convergence may be replaced by

$$0 < \mu < \frac{2}{(p+1)\,E\left\{|x_n|^2\right\}}$$   (7.50.2)

where $E\{|x_n|^2\}$ is the power in the process $x_n$, which may be estimated as

$$\hat{E}\left\{|x_n|^2\right\} = \frac{1}{p+1}\sum_{k=0}^{p}\left|x_{n-k}\right|^2.$$   (7.50.3)

This leads to the following bound on the step size for mean-square convergence:

$$0 < \mu < \frac{2}{\mathbf{x}_n^H\mathbf{x}_n}.$$   (7.51.1)

A convenient way to incorporate this bound into the LMS adaptive filter is to use a time-varying step size of the form

$$\mu_n = \frac{\beta}{\mathbf{x}_n^H\mathbf{x}_n} = \frac{\beta}{\left\|\mathbf{x}_n\right\|^2}$$   (7.51.2)

where $\beta$ is a normalized step size with $0 < \beta < 2$. Replacing $\mu$ in the LMS weight vector update equation with $\mu_n$ leads to the normalized LMS (NLMS) algorithm:

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \beta\,\frac{\mathbf{x}_n^*}{\left\|\mathbf{x}_n\right\|^2}\,e_n.$$   (7.51.3)

We note that the effect of the normalization by $\|\mathbf{x}_n\|^2$ is to change the magnitude, but not the direction, of the estimated gradient vector. Therefore, with an appropriate set of statistical assumptions, it may be shown that the NLMS algorithm converges in the mean-square if $0 < \beta < 2$.

In the LMS algorithm, the correction applied to $\mathbf{w}_n$ is proportional to the input vector $\mathbf{x}_n$. Therefore, when $\mathbf{x}_n$ is large, the LMS algorithm suffers from gradient noise amplification. In the NLMS algorithm, the normalization diminishes this problem. On the other hand, the NLMS algorithm has a similar problem when $\|\mathbf{x}_n\|$ becomes too small. An alternative, therefore, is to use the following modification of the NLMS algorithm:

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \beta\,\frac{\mathbf{x}_n^*}{\epsilon + \left\|\mathbf{x}_n\right\|^2}\,e_n$$   (7.52.1)

where $\epsilon$ is a small positive number.

Lastly, the normalization term can be computed recursively:

$$\left\|\mathbf{x}_{n+1}\right\|^2 = \left\|\mathbf{x}_n\right\|^2 + \left|x_{n+1}\right|^2 - \left|x_{n-p}\right|^2.$$   (7.52.2)
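A minimal sketch of the modified NLMS update (7.52.1) for real-valued data is given below; the function name and default values are illustrative:

```python
import numpy as np

def nlms(x, d, p, beta, eps=1e-4):
    """Normalized LMS filter: w_{n+1} = w_n + beta * e_n * x_n / (eps + ||x_n||^2)."""
    N = len(x)
    w = np.zeros(p + 1)
    y = np.zeros(N)
    e = np.zeros(N)
    for n in range(p, N):
        xn = x[n::-1][:p + 1]                       # data vector [x[n], ..., x[n-p]]
        y[n] = w @ xn
        e[n] = d[n] - y[n]
        w = w + beta * e[n] * xn / (eps + xn @ xn)  # normalized update, eq. (7.52.1)
    return y, e, w
```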
Normalized LMS (NLMS): Example

Adaptive linear prediction using NLMS. The process to be predicted is the AR(2) process generated by the difference equation

$$x_n = 1.2728\,x_{n-1} - 0.81\,x_{n-2} + v_n$$   (7.53.1)

where $v_n$ is unit variance white noise. With a two-coefficient NLMS adaptive predictor, we have

$$\hat{x}_n = w_{n,1}\,x_{n-1} + w_{n,2}\,x_{n-2}$$   (7.53.2)

where the predictor coefficients are updated according to

$$w_{n+1,k} = w_{n,k} + \beta\,\frac{x_{n-k}}{\epsilon + x_{n-1}^2 + x_{n-2}^2}\,e_n, \qquad k = 1, 2.$$   (7.53.3)

Using normalized step sizes of $\beta = 0.05$ and $\beta = 0.01$ with $\epsilon = 0.0001$ and $w_{0,1} = w_{0,2} = 0$, the predictor coefficient trajectories are shown in the figure. We observe that the trajectories are similar to the ones produced by the LMS algorithm. The difference is that the NLMS algorithm does not need an estimate of $\lambda_{\max}$ to select a step size.
NLMS: noise cancellation

The problem of noise cancellation is to estimate a process $d_n$ from a noise-corrupted observation

$$x_n = d_n + v_{1,n}.$$   (7.55.1)

It is impossible to separate $d_n$ and $v_{1,n}$ without some information about these processes. However, given a reference signal $v_{2,n}$ that is correlated with $v_{1,n}$, this reference can be used to estimate the noise $v_{1,n}$, and this estimate may be subtracted from $x_n$ to form an estimate of $d_n$:

$$\hat{d}_n = x_n - \hat{v}_{1,n}.$$   (7.55.2)

For example, if $d_n$, $v_{1,n}$, and $v_{2,n}$ are jointly wss processes, and if the autocorrelation $r_{v_2}(k)$ and the cross-correlation $r_{v_1 v_2}(k)$ are known, a Wiener filter may be designed to find the MMS estimate of $v_{1,n}$.

In practice, however, a stationarity assumption is generally not appropriate, and the statistics of $v_{1,n}$ and $v_{2,n}$ are generally unknown. Therefore, as an alternative to the Wiener filter, we consider an adaptive noise canceller. If the reference signal $v_{2,n}$ is uncorrelated with $d_n$, then minimizing the MS error $E\{|e_n|^2\}$ is equivalent to minimizing

$$E\left\{\left|v_{1,n} - \hat{v}_{1,n}\right|^2\right\}.$$

In other words, the output of the adaptive filter is the MMS estimate of $v_{1,n}$, since there is no information about the desired signal $d_n$ in the reference $v_{2,n}$. Therefore, $e_n$ is the MMS estimate of $d_n$.

As a specific example, we consider a sinusoid

$$d_n = \sin\left(n\omega_0 + \phi\right)$$   (7.57.1)

with $\omega_0 = 0.05\pi$, contaminated by noise $v_{1,n}$ generated as

$$v_{1,n} = 0.8\,v_{1,n-1} + g_n$$   (7.57.2)

where $g_n$ is zero-mean, unit variance white noise uncorrelated with $d_n$. The noisy signal is

$$x_n = d_n + v_{1,n}$$   (7.58.1)

and the reference signal is

$$v_{2,n} = -0.6\,v_{2,n-1} + g_n.$$   (7.58.2)

The estimate of $d_n$ produced with a 6th-order NLMS adaptive noise canceller with step size $\beta = 0.25$ is shown in the figure. We observe that after about 100 iterations the adaptive filter is producing a fairly accurate estimate of $d_n$, and after about 200 iterations the filter appears to have settled down into its steady-state behavior. However, the estimate is not as good as the one produced with a 6th-order Wiener filter.
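A compact simulation of this noise-cancellation setup is sketched below, with the adaptive filter driven by the reference $v_{2,n}$ and the noisy observation $x_n$ used as the desired signal, so that the error output estimates $d_n$; the signal length, seed, and zero phase are arbitrary choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
g = rng.standard_normal(N)                   # common white-noise source

# Signals of eqs. (7.57.1)-(7.58.2); the phase is taken as zero for simplicity
d = np.sin(0.05 * np.pi * np.arange(N))      # desired sinusoid
v1 = np.zeros(N); v2 = np.zeros(N)
for n in range(1, N):
    v1[n] = 0.8 * v1[n - 1] + g[n]           # noise corrupting the observation
    v2[n] = -0.6 * v2[n - 1] + g[n]          # correlated reference noise
x = d + v1                                   # noisy observation, eq. (7.58.1)

# 6th-order NLMS canceller: the filter estimates v1 from the reference v2,
# and the error e_n = x_n - v1_hat_n is the estimate of d_n
p, beta, eps = 6, 0.25, 1e-4
w = np.zeros(p + 1)
d_hat = np.zeros(N)
for n in range(p, N):
    v2n = v2[n::-1][:p + 1]                  # reference data vector [v2[n], ..., v2[n-p]]
    v1_hat = w @ v2n                         # estimate of the corrupting noise
    e = x[n] - v1_hat                        # canceller output = estimate of d[n]
    w = w + beta * e * v2n / (eps + v2n @ v2n)
    d_hat[n] = e
```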
One of the advantages of this adaptive noise canceller over a Wiener filter is that it may be used when the processes are nonstationary. For instance, the same sinusoid may be corrupted with nonstationary noise. Suppose that $v_{1,n}$ and $v_{2,n}$ are generated according to (7.57.2) and (7.58.2) with $g_n$ being nonstationary white noise with a variance increasing linearly from $\sigma_g^2(0) = 0.25$ to $\sigma_g^2(1000) = 6.25$. The figures show the noisy signal $x_n$, the reference signal, and the estimate of $d_n$ produced with a 12th-order NLMS adaptive noise canceller with step size $\beta = 0.25$.

We observe that the performance of the noise canceller is not significantly affected by the nonstationarity of the noise.

The availability of a reference signal $v_{2,n}$ is very important for successful operation of the adaptive noise canceller. Unfortunately, in many applications a reference signal is not available, and an alternative approach must be considered. In some cases, however, it is possible to derive a reference signal by delaying the process $x_n = d_n + v_{1,n}$. For example, suppose that $d_n$ is a narrowband process and $v_{1,n}$ is a broadband process with

$$E\left\{v_{1,n}\,v_{1,n-k}\right\} \approx 0, \qquad |k| > k_0.$$   (7.62.1)

If $d_n$ and $v_{1,n}$ are uncorrelated, then

$$E\left\{v_{1,n}\,x_{n-k}\right\} = E\left\{v_{1,n}\,d_{n-k}\right\} + E\left\{v_{1,n}\,v_{1,n-k}\right\} \approx 0, \qquad |k| > k_0.$$   (7.62.2)

Therefore, if $n_0 > k_0$, the delayed process $x_{n-n_0}$ will be uncorrelated with the noise $v_{1,n}$ and still correlated with $d_n$ (since $d_n$ is a narrowband process whose correlation extends over long lags).
Thus, $x_{n-n_0}$ may be used as a reference signal to estimate $d_n$.

As an example, we consider the same sinusoid with $\omega_0 = 0.05\pi$, contaminated by noise $v_{1,n}$ generated according to (7.57.2), where $g_n$ is white noise with variance $\sigma_g^2 = 0.25$. The figures show the noisy signal $x_n$ and the estimate of $d_n$ produced with a 12th-order NLMS adaptive noise canceller with step size $\beta = 0.25$ and a reference signal obtained by delaying $x_n$ by $n_0 = 25$ samples.
Other LMS-based algorithms: the leaky LMS

In addition to the NLMS algorithm, a number of other modifications of the LMS algorithm exist. Each of these modifications attempts to improve one or more properties of the LMS algorithm.

When the input process to an adaptive filter has an autocorrelation matrix with zero eigenvalues, the LMS adaptive filter has one or more modes that are undriven and undamped. For example, if $\lambda_k = 0$, then

$$E\left\{u_{n,k}\right\} = u_{0,k},$$   (7.65.1)

which does not decay to zero with $n$. Since it is possible for these undamped modes to become unstable, it is important to stabilize the LMS adaptive filter by forcing these modes to zero. One way to accomplish this is to introduce a leakage coefficient $\gamma$ into the LMS algorithm as follows:

$$\mathbf{w}_{n+1} = \left(1 - \mu\gamma\right)\mathbf{w}_n + \mu\, e_n\mathbf{x}_n^*$$   (7.65.2)

where $0 < \gamma \ll 1$.

The effect of this leakage coefficient is to force the filter coefficients to zero if either the error $e_n$ or the input $x_n$ becomes zero, and to force any undamped modes of the system to zero.

Substituting the expression for the error $e_n$ into (7.65.2), we obtain

$$\mathbf{w}_{n+1} = \left[\left(1 - \mu\gamma\right)\mathbf{I} - \mu\,\mathbf{x}_n^*\mathbf{x}_n^T\right]\mathbf{w}_n + \mu\, d_n\mathbf{x}_n^*.$$   (7.66.1)

Taking the expected value and using the independence assumption yields

$$E\left\{\mathbf{w}_{n+1}\right\} = \left[\mathbf{I} - \mu\left(\mathbf{R}_x + \gamma\mathbf{I}\right)\right]E\left\{\mathbf{w}_n\right\} + \mu\,\mathbf{r}_{dx}.$$   (7.66.2)

Comparing (7.66.2) to the corresponding expression for the LMS algorithm, we observe that the autocorrelation matrix $\mathbf{R}_x$ is replaced with $\mathbf{R}_x + \gamma\mathbf{I}$. Therefore, the coefficient leakage term effectively adds white noise to $x_n$ by adding $\gamma$ to the main diagonal of the autocorrelation matrix. Since the eigenvalues of $\mathbf{R}_x + \gamma\mathbf{I}$ are $\lambda_k + \gamma$ and since $\lambda_k \ge 0$, none of the modes of the leaky LMS algorithm will be undamped.

In addition, the constraint on the step size for convergence in the mean becomes

$$0 < \mu < \frac{2}{\lambda_{\max} + \gamma}.$$   (7.67.1)

The drawback of the leaky LMS algorithm is that, for stationary processes, the steady-state solution will be biased. Specifically, note that if $\mathbf{w}_n$ converges in the mean, then

$$\lim_{n\to\infty} E\left\{\mathbf{w}_n\right\} = \left(\mathbf{R}_x + \gamma\mathbf{I}\right)^{-1}\mathbf{r}_{dx}.$$   (7.67.2)

Therefore, the leakage coefficient introduces a bias into the steady-state solution.

Alternatively, the leaky LMS algorithm may be derived by using the LMS gradient descent approach to minimize the cost

$$\xi_n = \left|e_n\right|^2 + \gamma\left\|\mathbf{w}_n\right\|^2.$$   (7.68.1)

Specifically, since the estimated gradient of $\xi_n$ is

$$\hat{\nabla}\xi_n = -e_n\mathbf{x}_n^* + \gamma\,\mathbf{w}_n,$$   (7.68.2)

the gradient descent algorithm becomes

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \mu\hat{\nabla}\xi_n = \left(1 - \mu\gamma\right)\mathbf{w}_n + \mu\, e_n\mathbf{x}_n^*,$$   (7.68.3)

which is the same as (7.65.2).
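For illustration, the leaky update (7.65.2) differs from the plain LMS update only by a scaling of the previous weight vector; a minimal real-valued sketch (names are illustrative):

```python
import numpy as np

def leaky_lms_step(w, xn, dn, mu, gamma):
    """One leaky LMS update: w <- (1 - mu*gamma) * w + mu * e * x_n, eq. (7.65.2)."""
    e = dn - w @ xn                              # error with the current weights
    w_new = (1.0 - mu * gamma) * w + mu * e * xn
    return w_new, e
```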
Other LMS-based algorithms: LMS with reduced complexity

In certain applications, such as high-speed digital communications, it may be necessary to simplify the LMS algorithm further.

One possible simplification is the block LMS algorithm, which is identical to the LMS algorithm except that the filter coefficients are updated only once for each block of $L$ samples. In other words, the filter coefficients are constant over each block of $L$ samples, and the output $y_n$ and the error $e_n$ for each value of $n$ within the block are calculated using the filter coefficients for that block. At the end of each block, the coefficients are updated using an average of the $L$ gradient estimates over the block. The update equation for the filter coefficients in the k-th block is

$$\mathbf{w}_{(k+1)L} = \mathbf{w}_{kL} + \frac{\mu}{L}\sum_{l=0}^{L-1} e_{kL+l}\,\mathbf{x}_{kL+l}^*$$   (7.69.1)

where the output for the k-th block is

$$y_{kL+l} = \mathbf{w}_{kL}^T\mathbf{x}_{kL+l}, \qquad l = 0, 1, \ldots, L-1.$$   (7.69.2)

Since $y_n$ over each block of $L$ values is the convolution of the weight vector $\mathbf{w}_{kL}$ with a block of input samples, the efficiency of the block LMS algorithm comes from using an FFT to perform the convolution.
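A straightforward (non-FFT) sketch of the block update (7.69.1) for real-valued data is shown below; a practical implementation would perform the block convolution with an FFT, which is omitted here for clarity:

```python
import numpy as np

def block_lms(x, d, p, mu, L):
    """Block LMS: coefficients are held fixed over each block of L samples
    and updated once per block with the averaged gradient, eq. (7.69.1)."""
    N = len(x)
    w = np.zeros(p + 1)
    y = np.zeros(N)
    e = np.zeros(N)
    for start in range(p, N - L + 1, L):
        grad = np.zeros(p + 1)
        for n in range(start, start + L):
            xn = x[n::-1][:p + 1]          # data vector [x[n], ..., x[n-p]]
            y[n] = w @ xn                  # output computed with the block's weights
            e[n] = d[n] - y[n]
            grad += e[n] * xn
        w = w + (mu / L) * grad            # one update per block
    return y, e, w
```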
Another simplification is the sign LMS family of algorithms, where the coefficient update equation is modified by applying the sign operator to the error $e_n$, to the data $x_n$, or to both. For example, assuming that $x_n$ and $d_n$ are real-valued processes, the sign-error algorithm is

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\,\mathrm{sgn}\left(e_n\right)\mathbf{x}_n$$   (7.70.1)

where

$$\mathrm{sgn}\left(e_n\right) = \begin{cases} 1, & e_n > 0 \\ 0, & e_n = 0 \\ -1, & e_n < 0 \end{cases}$$   (7.70.2)

The sign-error algorithm may be viewed as the result of applying a two-level quantizer to the error. The simplification comes when the step size is selected to be a power of two:

$$\mu = 2^{-l}.$$   (7.70.3)

In this case, the coefficient update equation may be implemented using $p+1$ data shifts instead of $p+1$ multiplications. Since replacing $e_n$ with the sign of the error changes only the magnitude of the correction used to update $\mathbf{w}_n$ and does not affect its direction, the sign-error algorithm is equivalent to an LMS algorithm with a step size that is inversely proportional to the magnitude of the error.

Instead of using the sign of the error, the sign of the data may be used to simplify the LMS algorithm as follows:

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\, e_n\,\mathrm{sgn}\left(\mathbf{x}_n\right).$$   (7.71.1)

This algorithm is referred to as the sign-data algorithm. Unlike the sign-error algorithm, the sign-data algorithm modifies the direction of the update vector. As a result, this algorithm is generally less robust than the sign-error algorithm.

The k-th coefficient of the sign of the data vector may be expressed as

$$\mathrm{sgn}\left(x_{n-k}\right) = \frac{x_{n-k}}{\left|x_{n-k}\right|}.$$   (7.71.2)

Therefore, the sign-data algorithm individually normalizes each coefficient of the weight vector update. Thus, the sign-data algorithm may be written as

$$w_{n+1,k} = w_{n,k} + \frac{\mu}{\left|x_{n-k}\right|}\, e_n x_{n-k},$$   (7.72.1)

which is an LMS algorithm with a different (time-varying) step size for each coefficient in the weight vector.

Finally, the sign-sign algorithm applies the quantization to both the error and the data, and has the coefficient update equation

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\,\mathrm{sgn}\left(e_n\right)\mathrm{sgn}\left(\mathbf{x}_n\right).$$   (7.72.2)

The coefficients are updated by either adding or subtracting the constant $\mu$. For stability, a leakage term is often introduced into the sign-sign algorithm as follows:

$$\mathbf{w}_{n+1} = \left(1 - \mu\gamma\right)\mathbf{w}_n + \mu\,\mathrm{sgn}\left(e_n\right)\mathrm{sgn}\left(\mathbf{x}_n\right).$$   (7.72.3)

The sign-sign algorithm converges more slowly than the LMS algorithm and has a larger excess MS error. Nevertheless, the simplicity of the update equation has made it popular.
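The three sign variants differ only in where the sign operator is applied; a compact sketch of one update step for each (real-valued data, illustrative names):

```python
import numpy as np

def sign_error_step(w, xn, dn, mu):
    e = dn - w @ xn
    return w + mu * np.sign(e) * xn, e          # eq. (7.70.1)

def sign_data_step(w, xn, dn, mu):
    e = dn - w @ xn
    return w + mu * e * np.sign(xn), e          # eq. (7.71.1)

def sign_sign_step(w, xn, dn, mu, gamma=0.0):
    e = dn - w @ xn
    # eq. (7.72.2); with gamma > 0 this becomes the leaky form, eq. (7.72.3)
    return (1 - mu * gamma) * w + mu * np.sign(e) * np.sign(xn), e
```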
Other LMS-based algorithms: variable step-size algorithms

Selecting the step size in the LMS algorithm involves a trade-off between the rate of convergence, the amount of excess MS error, and the ability of the filter to track signals as their statistics change. Ideally, the step size should be large when the coefficients are far from the optimum solution, so that the filter moves toward it rapidly. When the filter starts converging to the steady-state solution, the step size should be decreased to reduce the excess MS error. Therefore, a variable step size would be beneficial.

Assuming that $x_n$ and $d_n$ are real-valued processes, the variable step (VS) algorithm uses a coefficient update equation of the form

$$w_{n+1,k} = w_{n,k} + \mu_{n,k}\, e_n x_{n-k}$$   (7.73.1)

where the step sizes $\mu_{n,k}$ are adjusted independently for each coefficient.

With an estimated gradient given by

$$\frac{\partial\, e_n^2}{\partial w_{n,k}} = -2\, e_n x_{n-k},$$   (7.74.1)

the rules for adjusting $\mu_{n,k}$ are based on the observation that if the sign of $e_n x_{n-k}$ is changing frequently, then the coefficient $w_{n,k}$ should be close to its optimum value. Therefore, $\mu_{n,k}$ is decreased by a constant $c_1$ if $m_1$ successive sign changes are observed in $e_n x_{n-k}$, whereas $\mu_{n,k}$ is increased by a constant $c_2$ if $e_n x_{n-k}$ has the same sign for $m_2$ successive updates. Additionally, hard limits are placed on the step sizes,

$$\mu_{\min} < \mu_{n,k} < \mu_{\max},$$   (7.74.2)

to ensure that the VS algorithm converges in the mean. With only a modest increase in computation, the VS algorithm may result in a considerable improvement in the convergence rate.
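A possible bookkeeping scheme for these rules is sketched below; whether the step sizes are scaled multiplicatively or adjusted additively by $c_1$ and $c_2$ is not specified in the slides, so the multiplicative choice here (with $c_1 < 1 < c_2$) is an assumption for illustration:

```python
import numpy as np

def vs_lms(x, d, p, mu0, c1=0.5, c2=2.0, m1=3, m2=3, mu_min=1e-4, mu_max=0.1):
    """Variable step-size LMS sketch: per-coefficient step sizes mu[k] are
    decreased after m1 successive sign changes of e_n*x_{n-k} and increased
    after m2 successive updates with the same sign (multiplicative choice assumed)."""
    N = len(x)
    w = np.zeros(p + 1)
    mu = np.full(p + 1, mu0)
    last_sign = np.zeros(p + 1)
    changes = np.zeros(p + 1, dtype=int)        # consecutive sign changes
    same = np.zeros(p + 1, dtype=int)           # consecutive equal signs
    e = np.zeros(N)
    for n in range(p, N):
        xn = x[n::-1][:p + 1]
        e[n] = d[n] - w @ xn
        grad = e[n] * xn
        s = np.sign(grad)
        changed = (s != last_sign) & (last_sign != 0)
        changes = np.where(changed, changes + 1, 0)
        same = np.where(~changed, same + 1, 0)
        mu = np.where(changes >= m1, mu * c1, mu)   # near the optimum: slow down
        mu = np.where(same >= m2, mu * c2, mu)      # far from the optimum: speed up
        changes = np.where(changes >= m1, 0, changes)
        same = np.where(same >= m2, 0, same)
        mu = np.clip(mu, mu_min, mu_max)            # hard limits, eq. (7.74.2)
        w = w + mu * grad                           # update, eq. (7.73.1)
        last_sign = s
    return w, e
```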
Adaptive recursive filters

So far, we have been concerned with FIR (non-recursive) adaptive filters for producing the MMS estimate of $d_n$. Next, we consider the design of IIR (recursive) adaptive filters of the form

$$y_n = \sum_{k=1}^{p} a_{n,k}\, y_{n-k} + \sum_{k=0}^{q} b_{n,k}\, x_{n-k}$$   (7.75.1)

where $a_{n,k}$ and $b_{n,k}$ are the coefficients of the adaptive filter at time $n$. Recursive adaptive filters potentially have an advantage over non-recursive filters in providing better performance for a given filter order. However, the convergence time and the numerical sensitivity of an IIR adaptive filter are affected by its potential instability.

The gradient vector can be formed recursively by filtering $x_{n-k}^*$ and $y_{n-k}^*$ with the time-invariant recursive filter $1/A_n^*(z^*)$.

A simplified IIR LMS adaptive filter assumes that the step size is small enough that an estimate of the gradient vector may be computed recursively. These estimates are used to update the filter coefficients. We observe that $p+q+1$ recursive filters operating in parallel are required to produce the approximations to the gradient components associated with the coefficients $a_{n,k}$ and $b_{n,k}$.

(Summary table of the simplified IIR LMS algorithm.)

The implementation of $p+q+1$ time-invariant filters in parallel represents a significant computational load and requires a significant amount of storage. However, assuming that the step size is small, the IIR LMS adaptive filter may be further simplified. Specifically, a gradient estimate may be formed by delaying $y_n^*$ and filtering it with the time-invariant filter; similar processing may be applied to $x_n^*$ to form the other gradient estimate. If $\mu$ is small enough that the coefficients $a_{n,k}$ do not vary significantly over intervals of length $p$, the filter $1/A_n^*(z^*)$ may be assumed to be time-invariant, and the cascade of the delay and the time-invariant filter may be interchanged.

Therefore, the gradients may be estimated by simply generating the filtered signals $y_n^f$ and $x_n^f$ and then delaying these signals. This method, called the filtered signal approach, requires only two recursive filters to estimate the gradient vector. Compared to the $p+q+1$ filters needed for the simplified IIR LMS algorithm, this method is much more efficient and, for a sufficiently small step size, performs similarly to the IIR LMS algorithm. A sketch of this approach is given below.

(Summary table of the filtered signal approach to adaptive recursive filtering.)
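The filtered signal approach can be sketched roughly as follows for real-valued data; this is an illustrative reading of the approach under the small-step-size assumption, not a verbatim transcription of the summary table, and it omits the pole-stability monitoring a practical implementation would need:

```python
import numpy as np

def iir_lms_filtered_signal(x, d, p, q, mu):
    """Rough sketch of the filtered-signal IIR LMS adaptive filter (real data).

    y_n = sum_k a_k y_{n-k} + sum_k b_k x_{n-k}   (eq. 7.75.1)
    x^f and y^f are x and y filtered by the all-pole filter 1/A_n(z);
    the delayed filtered signals drive the coefficient updates.
    """
    N = len(x)
    a = np.zeros(p + 1)                  # a[0] unused; feedback coefficients a[1..p]
    b = np.zeros(q + 1)                  # feedforward coefficients b[0..q]
    y = np.zeros(N)
    xf = np.zeros(N)                     # x filtered by 1/A_n(z)
    yf = np.zeros(N)                     # y filtered by 1/A_n(z)
    for n in range(max(p, q) + 1, N):
        y[n] = a[1:] @ y[n - 1:n - p - 1:-1] + b @ x[n:n - q - 1:-1]
        e = d[n] - y[n]
        xf[n] = x[n] + a[1:] @ xf[n - 1:n - p - 1:-1]
        yf[n] = y[n] + a[1:] @ yf[n - 1:n - p - 1:-1]
        a[1:] += mu * e * yf[n - 1:n - p - 1:-1]   # update recursive coefficients
        b     += mu * e * xf[n:n - q - 1:-1]       # update feedforward coefficients
    return y, a, b
```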
Recursive least squares (RLS)

In each of the adaptive filtering methods discussed so far, we considered gradient descent algorithms minimizing the MS error

$$\xi_n = E\left\{|e_n|^2\right\}.$$   (7.81.1)

The problem with these methods is that they all require knowledge of the autocorrelation of the input process, $E\{x_n x_{n-k}^*\}$, and the cross-correlation between the input and the desired process, $E\{d_n x_{n-k}^*\}$. When this statistical information is unknown, we estimated these statistics from the data.

Although this approach may be adequate in some applications, in others the gradient estimate may not provide a sufficiently rapid convergence or a sufficiently small excess MS error. An alternative, therefore, is to consider error measures that do not include expectations and may be computed directly from the data.

For example, the least squares (LS) error

$$\mathcal{E}_n = \sum_{i=0}^{n}\left|e_i\right|^2$$   (7.82.1)

does not require statistical information about $x_n$ and $d_n$ and may be evaluated directly from $x_n$ and $d_n$.

Note that minimizing the MS error produces the same set of filter coefficients for all sequences having the same statistics: the filter coefficients do not depend on the incoming data. Minimizing the LS error, which depends explicitly on the specific values of $x_n$ and $d_n$, will produce different filters for different signals even if the signals have the same statistics. In other words, different realizations of $x_n$ and $d_n$ will lead to different filters.

Exponentially weighted RLS

Let us reconsider the design of an FIR adaptive Wiener filter and find the coefficients

$$\mathbf{w}_n = \left[w_{n,0},\, w_{n,1},\, \ldots,\, w_{n,p}\right]^T$$   (7.83.1)

that minimize, at time $n$, the weighted LS error

$$\mathcal{E}_n = \sum_{i=0}^{n}\lambda^{n-i}\left|e_i\right|^2$$   (7.83.2)

where $0 < \lambda \le 1$ is an exponential weighting (forgetting) factor and

$$e_i = d_i - y_i = d_i - \mathbf{w}_n^T\mathbf{x}_i.$$   (7.83.3)

Note that $e_i$ is the difference between the desired signal $d_i$ and the filtered output at time $i$ using the latest set of filter coefficients $\mathbf{w}_n$. Thus, in minimizing $\mathcal{E}_n$ it is assumed that the weights $\mathbf{w}_n$ are constant over the entire observation interval $[0, n]$.

To find the coefficients minimizing the LS error, we set the derivative of the error with respect to $w_{n,k}^*$ to zero for $k = 0, 1, \ldots, p$:

$$\frac{\partial\mathcal{E}_n}{\partial w_{n,k}^*} = \sum_{i=0}^{n}\lambda^{n-i}\, e_i\,\frac{\partial e_i^*}{\partial w_{n,k}^*} = -\sum_{i=0}^{n}\lambda^{n-i}\, e_i\, x_{i-k}^* = 0.$$   (7.84.1)

Incorporating (7.83.3), the last equation becomes

$$\sum_{i=0}^{n}\lambda^{n-i}\left[d_i - \sum_{l=0}^{p} w_{n,l}\, x_{i-l}\right]x_{i-k}^* = 0$$   (7.84.2)

or

$$\sum_{l=0}^{p} w_{n,l}\sum_{i=0}^{n}\lambda^{n-i}\, x_{i-l}\, x_{i-k}^* = \sum_{i=0}^{n}\lambda^{n-i}\, d_i\, x_{i-k}^*, \qquad k = 0, 1, \ldots, p,$$   (7.84.3)

which, in matrix form, becomes

$$\mathbf{R}_x(n)\,\mathbf{w}_n = \mathbf{r}_{dx}(n)$$   (7.84.4)

where $\mathbf{R}_x(n)$ is a $(p+1)\times(p+1)$ exponentially weighted deterministic autocorrelation matrix for $x_n$,

$$\mathbf{R}_x(n) = \sum_{i=0}^{n}\lambda^{n-i}\,\mathbf{x}_i^*\mathbf{x}_i^T,$$   (7.85.1)

with $\mathbf{x}_i$ the data vector

$$\mathbf{x}_i = \left[x_i,\, x_{i-1},\, \ldots,\, x_{i-p}\right]^T,$$   (7.85.2)

and where $\mathbf{r}_{dx}(n)$ is the deterministic cross-correlation between $d_n$ and $x_n$,

$$\mathbf{r}_{dx}(n) = \sum_{i=0}^{n}\lambda^{n-i}\, d_i\,\mathbf{x}_i^*.$$   (7.85.3)

Equations (7.84.4) are called the deterministic normal equations.
For the set of optimum coefficients, the error will be

$$\mathcal{E}_n = \sum_{i=0}^{n}\lambda^{n-i}\, e_i\, e_i^* = \sum_{i=0}^{n}\lambda^{n-i}\, e_i\left[d_i - \sum_{l=0}^{p} w_{n,l}\, x_{i-l}\right]^* = \sum_{i=0}^{n}\lambda^{n-i}\, e_i\, d_i^* - \sum_{l=0}^{p} w_{n,l}^*\sum_{i=0}^{n}\lambda^{n-i}\, e_i\, x_{i-l}^*.$$   (7.86.1)

If the $w_{n,l}$ are the coefficients minimizing the squared error, the second term in (7.86.1) is zero by (7.84.1), and the minimum error is

$$\left\{\mathcal{E}_n\right\}_{\min} = \sum_{i=0}^{n}\lambda^{n-i}\, d_i^*\left[d_i - \sum_{l=0}^{p} w_{n,l}\, x_{i-l}\right] = \sum_{i=0}^{n}\lambda^{n-i}\left|d_i\right|^2 - \sum_{l=0}^{p} w_{n,l}\sum_{i=0}^{n}\lambda^{n-i}\, d_i^*\, x_{i-l}.$$   (7.86.2)

Alternatively, in vector form, the minimum error is

$$\left\{\mathcal{E}_n\right\}_{\min} = \left\|\mathbf{d}_n\right\|_\lambda^2 - \mathbf{r}_{dx}^H(n)\,\mathbf{w}_n$$   (7.87.1)

where $\|\mathbf{d}_n\|_\lambda^2$ is the weighted norm of the vector $\mathbf{d}_n = [d_n, d_{n-1}, \ldots, d_0]^T$.

Since both $\mathbf{R}_x(n)$ and $\mathbf{r}_{dx}(n)$ depend on $n$, instead of solving the deterministic normal equations directly for each value of $n$, we derive a recursive solution of the form

$$\mathbf{w}_n = \mathbf{w}_{n-1} + \Delta\mathbf{w}_{n-1}$$   (7.87.2)

where $\Delta\mathbf{w}_{n-1}$ is a correction that is applied to the solution at time $n-1$. Observe that

$$\mathbf{w}_n = \mathbf{R}_x^{-1}(n)\,\mathbf{r}_{dx}(n).$$   (7.87.3)

It follows from (7.85.3) that the cross-correlation may be updated recursively as

$$\mathbf{r}_{dx}(n) = \lambda\,\mathbf{r}_{dx}(n-1) + d_n\mathbf{x}_n^*.$$   (7.87.4)
Similarly, the autocorrelation matrix may also be updated recursively as

$$\mathbf{R}_x(n) = \lambda\,\mathbf{R}_x(n-1) + \mathbf{x}_n^*\mathbf{x}_n^T.$$   (7.88.1)

However, since we are interested in the inverse of $\mathbf{R}_x(n)$, we use Woodbury's identity (the matrix inversion lemma) to obtain

$$\mathbf{R}_x^{-1}(n) = \lambda^{-1}\left[\mathbf{R}_x^{-1}(n-1) - \frac{\lambda^{-1}\,\mathbf{R}_x^{-1}(n-1)\,\mathbf{x}_n^*\mathbf{x}_n^T\,\mathbf{R}_x^{-1}(n-1)}{1 + \lambda^{-1}\,\mathbf{x}_n^T\,\mathbf{R}_x^{-1}(n-1)\,\mathbf{x}_n^*}\right].$$   (7.88.2)

Simplifying the notation, we denote the inverse of the autocorrelation matrix at time $n$ by

$$\mathbf{P}(n) = \mathbf{R}_x^{-1}(n)$$   (7.88.3)

and define the gain vector as

$$\mathbf{g}_n = \frac{\lambda^{-1}\,\mathbf{P}(n-1)\,\mathbf{x}_n^*}{1 + \lambda^{-1}\,\mathbf{x}_n^T\,\mathbf{P}(n-1)\,\mathbf{x}_n^*}.$$   (7.88.4)

We may then rewrite the inverse as

$$\mathbf{P}(n) = \lambda^{-1}\left[\mathbf{P}(n-1) - \mathbf{g}_n\,\mathbf{x}_n^T\,\mathbf{P}(n-1)\right].$$   (7.89.1)

The gain vector can be rewritten (derivations are omitted) as

$$\mathbf{g}_n = \mathbf{P}(n)\,\mathbf{x}_n^*.$$   (7.89.2)

Therefore, the gain vector is the solution to the linear equations

$$\mathbf{R}_x(n)\,\mathbf{g}_n = \mathbf{x}_n^*,$$   (7.89.3)

which are the same as the deterministic normal equations (7.84.4) for $\mathbf{w}_n$, except that the cross-correlation vector has been replaced with the data vector.

To complete the recursion, we need to derive the update equation for the coefficient vector $\mathbf{w}_n$.
With

$$\mathbf{w}_n = \mathbf{P}(n)\,\mathbf{r}_{dx}(n),$$   (7.90.1)

it follows from (7.87.4) that

$$\mathbf{w}_n = \lambda\,\mathbf{P}(n)\,\mathbf{r}_{dx}(n-1) + d_n\,\mathbf{P}(n)\,\mathbf{x}_n^*.$$   (7.90.2)

Incorporating the update (7.89.1) for $\mathbf{P}(n)$ and using $\mathbf{P}(n)\mathbf{x}_n^* = \mathbf{g}_n$, we have

$$\mathbf{w}_n = \left[\mathbf{P}(n-1) - \mathbf{g}_n\,\mathbf{x}_n^T\,\mathbf{P}(n-1)\right]\mathbf{r}_{dx}(n-1) + d_n\,\mathbf{g}_n.$$   (7.90.3)

Finally, since

$$\mathbf{P}(n-1)\,\mathbf{r}_{dx}(n-1) = \mathbf{w}_{n-1},$$   (7.90.4)

it follows that

$$\mathbf{w}_n = \mathbf{w}_{n-1} + \mathbf{g}_n\left[d_n - \mathbf{w}_{n-1}^T\mathbf{x}_n\right],$$   (7.90.5)

which may be written as

$$\mathbf{w}_n = \mathbf{w}_{n-1} + \alpha_n\,\mathbf{g}_n.$$   (7.90.6)

Here,

$$\alpha_n = d_n - \mathbf{w}_{n-1}^T\mathbf{x}_n$$   (7.91.1)

is the difference between $d_n$ and the estimate formed by applying the previous filter coefficients to the new data vector. This sequence, called the a priori error, is the error that would occur if the filter coefficients were not updated. The a posteriori error, on the other hand, is the error occurring after the weight vector is updated:

$$e_n = d_n - \mathbf{w}_n^T\mathbf{x}_n.$$   (7.91.2)

When the a priori error is small, the current set of filter coefficients is close to optimal (in the least squares sense), and only a small correction needs to be applied to the coefficients.

Finally, we define the product

$$\mathbf{z}_n = \mathbf{P}(n-1)\,\mathbf{x}_n^*,$$   (7.91.3)

which represents the filtered information vector used in the calculation of both $\mathbf{g}_n$ and $\mathbf{P}(n)$. (The RLS algorithm summary table appeared here in the original slides; a sketch is given after the initialization discussion below.)

The special case of $\lambda = 1$ is referred to as the growing window RLS algorithm.

Initial conditions for the weight vector and for the inverse autocorrelation matrix are required to initialize the algorithm. There are two approaches to this problem.
Summer II 2008
93
ELEN 5301 Adv. DSP and Modeling
Exponentially weighted RLS:
initialization
The first approach is to build up the autocorrelation matrix recursively as in (7.88.1) until it is of full rank (which typically requires p+1 input vectors), and then compute the inverse directly as

$$\mathbf{P}(0) = \left[\sum_{i=-p}^{0} \lambda^{-i}\,\mathbf{x}_i^*\mathbf{x}_i^T\right]^{-1} \qquad (7.93.1)$$

Evaluating the cross-correlation vector in the same manner,

$$\mathbf{r}_{dx}(0) = \sum_{i=-p}^{0} \lambda^{-i}\,d_i\,\mathbf{x}_i^* \qquad (7.93.2)$$

we may then initialize w_0 by setting

$$\mathbf{w}_0 = \mathbf{P}(0)\,\mathbf{r}_{dx}(0) \qquad (7.93.3)$$

This approach starts from the vector that minimizes the weighted LS error. Its disadvantage is that it requires on the order of (p+1)^3 operations and introduces a delay of p+1 samples before any updates are performed.
Another approach (soft-constrained initialization) is to initialize the autocorrelation matrix as

$$\mathbf{R}_x(0) = \delta\,\mathbf{I} \qquad (7.94.1)$$

where δ is a small positive constant. Therefore,

$$\mathbf{P}(0) = \delta^{-1}\,\mathbf{I} \qquad (7.94.2)$$

The weight vector is initialized to zero:

$$\mathbf{w}_0 = \mathbf{0} \qquad (7.94.3)$$

The disadvantage of this approach is that it introduces a bias into the LS solution. However, with an exponential weighting factor λ < 1 this bias goes to zero as n increases.
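A minimal sketch of the soft-constrained initialization, reusing the rls_update sketch above; the values of p, δ, and λ are arbitrary illustrative choices:

```python
import numpy as np

p, delta, lam = 2, 0.01, 0.99
w = np.zeros(p + 1)                 # w_0 = 0, Eq. (7.94.3)
P = np.eye(p + 1) / delta           # P(0) = I / delta, Eq. (7.94.2)

# then, for each new data vector / desired sample pair (x_vec, d):
#     w, P, alpha, e = rls_update(w, P, x_vec, d, lam)
```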
Unlike the LMS algorithm, which requires on the order of p multiplications and additions per update, the RLS algorithm requires on the order of p^2 operations.

The evaluation of z_n requires (p+1)^2 multiplications, computing the gain vector g_n requires 2(p+1) multiplications, the a priori error takes another p+1 multiplications, and the update of the inverse autocorrelation matrix P(n) requires 2(p+1)^2 multiplications. Therefore, a total of 3(p+1)^2 + 3(p+1) multiplications and a similar number of additions is required.

However, this increase in complexity of the RLS algorithm results in better performance. The RLS algorithm generally converges faster than the LMS and is less sensitive to eigenvalue disparities for stationary processes.

On the other hand, without exponential weighting (λ = 1), RLS does not perform very well in tracking nonstationary processes.

Finally, although the exponential weighting improves the tracking ability of RLS, it is not clear how to choose λ.
Ex: Linear prediction using RLS
Let x_n be a second-order AR process generated according to

$$x_n = 1.2728\,x_{n-1} - 0.81\,x_{n-2} + v_n \qquad (7.96.1)$$

where v_n is unit-variance white noise. We will consider a second-order RLS predictor of the form

$$\hat{x}_n = w_{n,1}\,x_{n-1} + w_{n,2}\,x_{n-2} \qquad (7.96.2)$$
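One possible way to simulate this example, reusing the rls_update sketch from earlier; the number of samples and the value of δ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, delta = 500, 1.0, 0.001        # lam = 1: growing window RLS

v = rng.standard_normal(N)             # unit-variance white noise
x = np.zeros(N)
for n in range(2, N):                  # AR(2) process, Eq. (7.96.1)
    x[n] = 1.2728 * x[n - 1] - 0.81 * x[n - 2] + v[n]

w = np.zeros(2)                        # predictor coefficients [w_{n,1}, w_{n,2}]
P = np.eye(2) / delta                  # soft-constrained initialization
W = np.zeros((N, 2))                   # history of the coefficients
for n in range(2, N):
    x_vec = np.array([x[n - 1], x[n - 2]])
    w, P, alpha, e = rls_update(w, P, x_vec, x[n], lam)
    W[n] = w
# W[n] should approach [1.2728, -0.81] as n grows
```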
[Figure: predictor coefficients obtained with the growing window RLS algorithm (λ = 1). Note the much faster convergence than with the LMS.]
[Figure: predictor coefficients obtained with the exponentially weighted RLS with λ = 0.95. Note the increased fluctuations of the weights.]

If the signal properties do not change over time, the weights w_n of the growing window RLS become more stable as n increases.
Let us modify the problem to simulate a nonstationary case:

$$x_n = a_{n,1}\,x_{n-1} - 0.81\,x_{n-2} + v_n \qquad (7.98.1)$$

where v_n is still unit-variance white noise and a_{n,1} is a time-varying coefficient:

$$a_{n,1} = \begin{cases} 1.2728, & 0 \le n < 100 \\ 0, & 100 \le n \le 200 \end{cases} \qquad (7.98.2)$$

This corresponds to a filter with a pair of poles at radius r = 0.9 and angle π/4 when a_{n,1} = 1.2728, and at radius 0.9 and angle π/2 when a_{n,1} = 0.
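The nonstationary process can be generated with a small change to the previous sketch; the 201-sample length follows the interval in (7.98.2):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 201
v = rng.standard_normal(N)
a1 = np.where(np.arange(N) < 100, 1.2728, 0.0)   # time-varying a_{n,1}, Eq. (7.98.2)

x = np.zeros(N)
for n in range(2, N):                            # nonstationary AR(2), Eq. (7.98.1)
    x[n] = a1[n] * x[n - 1] - 0.81 * x[n - 2] + v[n]
```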
[Figure: nonstationary second-order AR process: predictor coefficients of the growing window RLS (λ = 1).]

[Figure: nonstationary second-order AR process: predictor coefficients of the exponentially weighted RLS with λ = 0.9.]
[Figure: nonstationary second-order AR process: predictor coefficients of the LMS algorithm with step size μ = 0.02.]

The growing window RLS cannot track the time variations effectively due to its infinite window length (infinite memory). On the other hand, decreasing the weighting factor to λ = 0.9 allows the RLS algorithm to track the changes more adequately. Finally, the performance of the LMS algorithm in tracking the nonstationary changes is similar to that of the exponentially weighted RLS; however, its initial convergence is somewhat slower.
Sliding window RLS (WRLS)
The RLS algorithm minimizes the exponentially weighted least-squares error ε_n. With the growing window RLS, each of the squared errors |e_i|^2 from i = 0 to i = n is equally weighted, whereas with the exponentially weighted RLS the squared errors |e_i|^2 become less important for values of i that are small compared to n. In both cases, however, the RLS algorithm has infinite memory in the sense that all the data from n = 0 affect the values of the coefficients w_n. In certain applications (for instance, for nonstationary processes) this may be undesirable.

An alternative is to minimize the sum of the squares of e_i over a finite window:

$$\varepsilon_{L,n} = \sum_{i=n-L}^{n} |e_i|^2 \qquad (7.101.1)$$

The finite window RLS algorithm tracks nonstationary processes more easily and is able to forget any data outliers after a finite number of iterations.
The filter coefficients can be found by solving the equations

$$\mathbf{R}_x(n)\,\mathbf{w}_n = \mathbf{r}_{dx}(n) \qquad (7.102.1)$$

where

$$\mathbf{R}_x(n) = \sum_{i=n-L}^{n} \mathbf{x}_i^*\mathbf{x}_i^T \qquad (7.102.2)$$

and

$$\mathbf{r}_{dx}(n) = \sum_{i=n-L}^{n} d_i\,\mathbf{x}_i^* \qquad (7.102.3)$$

Equation (7.102.1) can be solved recursively with a computational complexity on the order of p^2 operations.
The sliding window RLS algorithm consists of the following two steps.

1. Given the solution w_{n-1} to (7.102.1) at time n-1 and the new data value x_n, find the weight vector w̃_n that minimizes the error

$$\tilde{\varepsilon}_{L+1,n} = \sum_{i=n-L-1}^{n} |e_i|^2 \qquad (7.103.1)$$

2. The weight vector w_n that minimizes ε_{L,n} is then determined by discarding the oldest data point x_{n-L-1}.

We observe that we may use the growing window RLS algorithm in the first step as follows:

$$\tilde{\mathbf{g}}_n = \frac{\mathbf{P}(n-1)\,\mathbf{x}_n^*}{1 + \mathbf{x}_n^T\,\mathbf{P}(n-1)\,\mathbf{x}_n^*} \qquad (7.103.2)$$
$$\tilde{\mathbf{w}}_n = \mathbf{w}_{n-1} + \tilde{\mathbf{g}}_n\left[d_n - \mathbf{w}_{n-1}^T\,\mathbf{x}_n\right] \qquad (7.104.1)$$

$$\tilde{\mathbf{P}}(n) = \mathbf{P}(n-1) - \tilde{\mathbf{g}}_n\,\mathbf{x}_n^T\,\mathbf{P}(n-1) \qquad (7.104.2)$$

We note that P̃(n) is the inverse of the matrix

$$\tilde{\mathbf{R}}_x(n) = \sum_{k=n-L-1}^{n} \mathbf{x}_k^*\mathbf{x}_k^T \qquad (7.104.3)$$

which is based on L+2 data values, and that w̃_n is the solution to

$$\tilde{\mathbf{R}}_x(n)\,\tilde{\mathbf{w}}_n = \tilde{\mathbf{r}}_{dx}(n) \qquad (7.104.4)$$

where

$$\tilde{\mathbf{r}}_{dx}(n) = \sum_{k=n-L-1}^{n} d_k\,\mathbf{x}_k^* \qquad (7.104.5)$$
In the second step of the recursion, the oldest data point x_{n-L-1} is discarded to restore the (L+1)-point window. Therefore, we begin with the matrix update

$$\mathbf{R}_x(n) = \tilde{\mathbf{R}}_x(n) - \mathbf{x}_{n-L-1}^*\mathbf{x}_{n-L-1}^T \qquad (7.105.1)$$

and

$$\mathbf{r}_{dx}(n) = \tilde{\mathbf{r}}_{dx}(n) - d_{n-L-1}\,\mathbf{x}_{n-L-1}^* \qquad (7.105.2)$$

Finally, with the matrix inversion lemma and following the steps used to derive the RLS algorithm, we obtain the update equations

$$\mathbf{g}_n = \frac{\tilde{\mathbf{P}}(n)\,\mathbf{x}_{n-L-1}^*}{1 - \mathbf{x}_{n-L-1}^T\,\tilde{\mathbf{P}}(n)\,\mathbf{x}_{n-L-1}^*} \qquad (7.105.3)$$

$$\mathbf{w}_n = \tilde{\mathbf{w}}_n - \mathbf{g}_n\left[d_{n-L-1} - \tilde{\mathbf{w}}_n^T\,\mathbf{x}_{n-L-1}\right] \qquad (7.105.4)$$
$$\mathbf{P}(n) = \tilde{\mathbf{P}}(n) + \mathbf{g}_n\,\mathbf{x}_{n-L-1}^T\,\tilde{\mathbf{P}}(n) \qquad (7.106.1)$$

The WRLS algorithm is implemented by equations (7.103.2), (7.104.1), (7.104.2), (7.105.3), (7.105.4), and (7.106.1).

Compared to the exponentially weighted RLS, the sliding window RLS requires about twice the number of multiplications and additions. It also requires that p+L values of x_n be stored, which may be a problem for long windows.

Note that we may also introduce a weighting (forgetting) factor λ by replacing P(n-1) with λ^{-1}P(n-1) in (7.103.2) and (7.104.2).
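A minimal sketch of one sliding window RLS iteration, combining the add step (7.103.2), (7.104.1), (7.104.2) with the discard step (7.105.3), (7.105.4), (7.106.1); real-valued data is assumed, and the function name is an illustrative choice:

```python
import numpy as np

def swrls_update(w, P, x_new, d_new, x_old, d_old):
    """One sliding window RLS iteration (real-valued data).

    w, P         : solution and inverse autocorrelation matrix for the previous window
    x_new, d_new : data vector x_n and desired sample d_n entering the window
    x_old, d_old : data vector x_{n-L-1} and sample d_{n-L-1} leaving the window
    """
    # Step 1: add the new point (growing window update over L+2 points)
    g1 = P @ x_new / (1.0 + x_new @ P @ x_new)        # Eq. (7.103.2)
    w_t = w + g1 * (d_new - w @ x_new)                # Eq. (7.104.1)
    P_t = P - np.outer(g1, x_new @ P)                 # Eq. (7.104.2)

    # Step 2: discard the oldest point (downdate back to L+1 points)
    g2 = P_t @ x_old / (1.0 - x_old @ P_t @ x_old)    # Eq. (7.105.3)
    w = w_t - g2 * (d_old - w_t @ x_old)              # Eq. (7.105.4)
    P = P_t + np.outer(g2, x_old @ P_t)               # Eq. (7.106.1)
    return w, P
```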