2.1.2 Steepest Descent
The graph of ξ(w) = E[d_k²] + wᵀRw − 2pᵀw is a paraboloid.
Steps:
1. Initialize the weight values w(t₀).
2. Determine the steepest-descent direction:
   −∇ξ(w(t)) = −dξ(w(t))/dw(t) = 2(p − Rw(t)).
   Let Δw(t) = −∇ξ(w(t)).
3. Estimate the gradient from the current sample:
   ∇̂ξ(w(t)) = −2(d_k − wᵀ(t)x_k)x_k = −2ε_k(t)x_k.
4. w(t + 1) = w(t) + 2με_k(t)x_k.
5. Repeat 1~4 with the next input vector
No calculation of p and R is required.
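The steps above can be sketched in NumPy. The function name `lms`, the step size, and the sample data below are illustrative assumptions, not from the original:

```python
import numpy as np

def lms(X, d, mu=0.01, epochs=100):
    """LMS steepest-descent training.

    Each update uses the instantaneous gradient estimate
    -2*eps_k*x_k in place of the true gradient 2(Rw - p),
    so R and p never need to be computed.
    """
    w = np.zeros(X.shape[1])                # step 1: initialize weights
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):          # step 5: next input vector
            eps_k = d_k - w @ x_k           # step 3: instantaneous error
            w = w + 2 * mu * eps_k * x_k    # step 4: weight update
    return w
```

On a noiseless linear system the learned weights converge toward the true ones, since only single samples (never R or p) enter each update.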
Drawback: time-consuming.
Improvement: the mini-batch training method.
○ Practical Considerations:
(a) Number of training vectors, (b) Stopping criteria,
(c) Initial weights, (d) Step size
2.1.4 Conjugate Gradient Descent
-- Drawback: can only minimize quadratic functions,
   e.g., f(w) = (1/2)wᵀAw − bᵀw + c.
-- Advantage: guaranteed to find the optimum solution in
   at most n iterations, where n is the size of matrix A.
A-Conjugate Vectors:
Let A (n×n) be a square, symmetric, positive-definite matrix.
If the set S = {s(0), s(1), …, s(n−1)} satisfies
   sᵀ(i)As(j) = 0, ∀ i ≠ j,
the vectors are said to be A-conjugate.
* If A = I (identity matrix), conjugacy = orthogonality.
The set S forms a basis for the space Rⁿ.
The solution w* in Rⁿ can be written as
   w* = Σ_{i=0}^{n−1} a_i s(i).
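A small numerical check of this expansion, assuming (as is standard but not stated here) that multiplying w* = Σ a_i s(i) by sᵀ(j)A gives a_j = sᵀ(j)b / sᵀ(j)As(j); the concrete A, b, and the use of eigenvectors as the conjugate set are illustrative choices:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive-definite
b = np.array([1.0, 2.0])

# Eigenvectors of a symmetric matrix are orthogonal, hence A-conjugate:
# s_i^T A s_j = lambda_j * s_i^T s_j = 0 for i != j.
_, S = np.linalg.eigh(A)
s = [S[:, i] for i in range(2)]
assert abs(s[0] @ A @ s[1]) < 1e-10      # A-conjugacy holds

# Expansion coefficients a_j = s^T(j) b / (s^T(j) A s(j))
w = sum((si @ b) / (si @ A @ si) * si for si in s)
```

The reconstructed w matches the direct solution of Aw = b, confirming that the A-conjugate set spans Rⁿ.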
• The conjugate-direction method for minimizing f(w) is defined by
   w(i + 1) = w(i) + η(i)s(i),  i = 0, 1, …, n−1,
  where w(0) is an arbitrary starting vector and η(i) is determined by
   min_η f(w(i) + ηs(i)).
How to determine s(i ) ?
Define r (i ) = b - Aw(i ) , which is in the steepest
descent direction of f ( w) (Q - �
w
f ( w) = 2( b - Aw)).
Let s(i ) = r (i ) + a (i ) s(i - 1), i = 1,2, �
�, n - 1 - (A)
�
Multiplying by sᵀ(i−1)A,
   sᵀ(i−1)As(i) = sᵀ(i−1)A(r(i) + α(i)s(i−1)).
A-conjugacy requires sᵀ(i)As(j) = 0, ∀ i ≠ j, so
   0 = sᵀ(i−1)Ar(i) + α(i)sᵀ(i−1)As(i−1),
   α(i) = − sᵀ(i−1)Ar(i) / sᵀ(i−1)As(i−1).  --- (B)
s(1), s(2), …, s(n−1) generated by Eqs. (A) and (B) are A-conjugate.
• It is desirable that evaluating α(i) does not require knowledge of A.
Polak-Ribière formula:   α(i) = rᵀ(i)(r(i) − r(i−1)) / (rᵀ(i−1)r(i−1))
Fletcher-Reeves formula: α(i) = rᵀ(i)r(i) / (rᵀ(i−1)r(i−1))
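A minimal sketch of the method using Eq. (A) with the Fletcher-Reeves formula; the function name and the exact line-search step η(i) = rᵀ(i)s(i) / (sᵀ(i)As(i)) for the quadratic f are standard but assumed here:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-12):
    """Minimize f(w) = 0.5*w^T A w - b^T w for symmetric
    positive-definite A; converges in at most n iterations."""
    n = len(b)
    w = np.zeros(n)
    r = b - A @ w                        # steepest-descent direction
    s = r.copy()                         # first direction s(0) = r(0)
    for _ in range(n):
        if r @ r < tol:
            break
        eta = (r @ s) / (s @ A @ s)      # exact line search along s(i)
        w = w + eta * s                  # w(i+1) = w(i) + eta(i) s(i)
        r_new = b - A @ w
        alpha = (r_new @ r_new) / (r @ r)    # Fletcher-Reeves formula
        s = r_new + alpha * s            # Eq. (A): next direction
        r = r_new
    return w
```

Note that A enters only through matrix-vector products, and α(i) is evaluated from residuals alone, as desired.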
* The conjugate-direction method for minimizing ξ(w) = E[d_k²] + wᵀRw − 2pᵀw.
[Figure: conjugate gradient converges in at most n steps, where n is the size of the system matrix (here n = 2).]
2.3. Applications
2.3.1. Echo Cancellation in Telephone Circuits
2.3.4. Adaptive Beam-Forming Antenna Arrays
Antenna: a spatial array of sensors that are directional
in their reception characteristics.
The adaptive filter learns to steer the antennas so that
they can respond to incoming signals regardless of
direction, while reducing responses to unwanted noise
signals arriving from other directions.
2.4 Madaline: Many Adalines
○ XOR function ?
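A single Adaline cannot realize XOR (it is not linearly separable), but a Madaline of two Adalines feeding a third can. The weights below are hypothetical hand-picked values for illustration; in practice MRII would learn them:

```python
import numpy as np

def adaline(w, b, x):
    """One Adaline: linear combiner followed by a hard limiter (0/1)."""
    return 1 if np.dot(w, x) + b > 0 else 0

def madaline_xor(x1, x2):
    x = np.array([x1, x2])
    h1 = adaline(np.array([ 1.0,  1.0]), -0.5, x)  # fires unless both inputs are 0
    h2 = adaline(np.array([-1.0, -1.0]),  1.5, x)  # fires unless both inputs are 1
    # AND of the two hidden Adalines yields XOR
    return adaline(np.array([1.0, 1.0]), -1.5, np.array([h1, h2]))
```

The hidden layer carves the input space with two lines; their conjunction isolates the two XOR-positive corners.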
2.4.2. Madaline Rule II (MRII)
○ Training algorithm – a trial-and-error procedure with
a minimum-disturbance principle (nodes that can
affect the output error while incurring the least
change in their weights take precedence in the
learning process)
○ Procedure –
1. Input a training pattern
2. Count #incorrect values in the output layer
3. For all units on the output layer
3.1. Select the first previously unselected error
node whose analog output is closest to zero
(∵ this node can reverse its bipolar output
with the least change in its weights)
3.2. Change the weights on the selected unit s.t.
the bipolar output of the unit changes
3.3. Input the same training pattern
3.4. If the #errors is reduced, accept the weight
change; otherwise restore the original weights
4. Repeat Step 3 for all layers except the input layer
5. For all pairs of units on the output layer
5.1. Select the previously unselected pair of units
whose outputs are closest to zero
5.2. Apply a weight correction to both units, in
order to change their bipolar outputs
5.3. Input the same training pattern
5.4. If the #errors is reduced, accept the correction;
otherwise restore the original weights.
6. Repeat step 5 for all layers except the input layer.
※ Steps 5 and 6 can be repeated with triplets,
quadruplets, or longer combinations of units
until satisfactory results are obtained.
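The single-unit part of the procedure (Steps 1-4) can be sketched as follows. This is a simplified illustration under assumed details: random initial weights, a fixed majority-vote output unit (so only hidden Adalines are trained), and a simple delta-style nudge as the trial weight correction:

```python
import numpy as np

def bipolar(a):
    """Hard-limiting quantizer producing +1/-1."""
    return np.where(np.asarray(a) >= 0, 1, -1)

class Madaline:
    """One hidden layer of Adalines feeding a fixed majority-vote output."""
    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(scale=0.5, size=(n_hidden, n_in + 1))  # +1 bias

    def analog(self, x):
        return self.W @ np.append(x, 1.0)       # analog sums of hidden units

    def forward(self, x):
        return bipolar(np.sum(bipolar(self.analog(x))))  # majority vote

def count_errors(net, X, D):
    return sum(net.forward(x) != d for x, d in zip(X, D))

def mrii_single_unit_pass(net, X, D, alpha=0.3):
    """One MRII pass with single-unit trial corrections (Steps 1-4)."""
    for x, d in zip(X, D):
        if net.forward(x) == d:
            continue                                 # pattern already correct
        order = np.argsort(np.abs(net.analog(x)))    # minimum-disturbance order
        for j in order:
            before = count_errors(net, X, D)
            saved = net.W[j].copy()
            target = -bipolar(net.analog(x))[j]      # try to flip unit j
            net.W[j] += alpha * target * np.append(x, 1.0)
            if count_errors(net, X, D) >= before:    # trial failed: restore
                net.W[j] = saved
    return net
```

Because a trial change is kept only when the error count strictly drops, the number of output errors can never increase across a pass.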
2.4.3. A Madaline for Translation–Invariant
Pattern Recognition
○ Relationships among the weight matrices of Adalines
○ Extension -- Multiple slabs with different key weight
matrices for discriminating more than two classes of
patterns