
Momentum

Newton’s Method
$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k + \frac{1}{2} \Delta x_k^T A_k \Delta x_k$$

Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:

$$g_k + A_k \Delta x_k = 0$$

$$\Delta x_k = -A_k^{-1} g_k$$

$$x_{k+1} = x_k - A_k^{-1} g_k$$
Example
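$$F(x) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1$$

$$x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \dfrac{\partial F(x)}{\partial x_1} \\ \dfrac{\partial F(x)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$$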

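$$x_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1.5 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$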
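The arithmetic of this step can be checked with a short script. This is a minimal sketch in plain Python; the helper names `solve2` and `grad` are illustrative and do not come from the slides.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A x = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def grad(x):
    """Gradient of F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1."""
    return [2 * x[0] + 2 * x[1] + 1, 2 * x[0] + 4 * x[1]]


A = [[2.0, 2.0], [2.0, 4.0]]   # constant Hessian of the quadratic F
x0 = [0.5, 0.5]

g0 = grad(x0)                  # gradient at the initial guess
dx = solve2(A, g0)             # solves A dx = g0, i.e. dx = A^{-1} g0
x1 = [x0[0] - dx[0], x0[1] - dx[1]]

print("g0 =", g0)
print("x1 =", x1)
print("gradient at x1 =", grad(x1))
```

Because F is quadratic, the second-order approximation is exact, so the gradient at x1 vanishes and a single Newton step reaches the stationary point.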
[Figure: plot for the example; both axes range from -2 to 2]
Non-Quadratic Example
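$$F(x) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3$$

Stationary Points:

$$x^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}, \qquad x^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}, \qquad x^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}$$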

[Figure: three panels labeled (1.5, 0), (0.75, 0.75) and (-1.5, 0); both axes of each panel range from -2 to 2]
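To see how Newton's method behaves on this non-quadratic function, the sketch below iterates the update from three starting points. It assumes that the three panel labels (1.5, 0), (0.75, 0.75) and (-1.5, 0) are initial guesses; the gradient and Hessian are derived by hand from F(x), and the helper names and the fixed count of 20 iterations are illustrative choices.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A x = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def grad(x):
    """Gradient of F(x) = (x2 - x1)^4 + 8 x1 x2 - x1 + x2 + 3."""
    d3 = (x[1] - x[0]) ** 3
    return [-4 * d3 + 8 * x[1] - 1, 4 * d3 + 8 * x[0] + 1]


def hess(x):
    """Hessian of the same F; it depends on x, unlike the quadratic case."""
    d2 = 12 * (x[1] - x[0]) ** 2
    return [[d2, 8 - d2], [8 - d2, d2]]


def newton(x, steps=20):
    """Apply x <- x - A^{-1} g for a fixed number of steps (assumed count)."""
    for _ in range(steps):
        dx = solve2(hess(x), grad(x))
        x = [x[0] - dx[0], x[1] - dx[1]]
    return x


starts = [[1.5, 0.0], [0.75, 0.75], [-1.5, 0.0]]
results = [newton(s) for s in starts]
for s, r in zip(starts, results):
    print(s, "->", [round(c, 4) for c in r])
```

Under these assumptions the three runs end near (0.55, -0.55), (-0.13, 0.13) and (-0.42, 0.42) respectively. The middle run lands on a stationary point where the Hessian has eigenvalues of both signs (a saddle), a reminder that Newton's method locates stationary points, not necessarily minima.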
Newton’s Method
$$x_{k+1} = x_k - A_k^{-1} g_k$$

$$A_k \equiv \nabla^2 F(x)\big|_{x = x_k}, \qquad g_k \equiv \nabla F(x)\big|_{x = x_k}$$

If the performance index is a sum-of-squares function:

$$F(x) = \sum_{i=1}^{N} v_i^2(x) = v^T(x)\, v(x)$$

then the jth element of the gradient is

$$[\nabla F(x)]_j = \frac{\partial F(x)}{\partial x_j} = 2 \sum_{i=1}^{N} v_i(x) \frac{\partial v_i(x)}{\partial x_j}$$

Example: $F(x) = x_1^2 x_2^2 + 4x_2^2$, with $v_1 = x_1 x_2$ and $v_2 = 2x_2$.

For j = 1:

$$2(x_1 x_2 \cdot x_2 + 2x_2 \cdot 0) = 2x_1 x_2^2$$

For j = 2:

$$2(x_1 x_2 \cdot x_1 + 2x_2 \cdot 2) = 2x_1^2 x_2 + 8x_2$$
Matrix Form
The gradient can be written in matrix form:

$$\nabla F(x) = 2J^T(x)\, v(x)$$

where J is the Jacobian matrix:

$$J(x) = \begin{bmatrix}
\dfrac{\partial v_1(x)}{\partial x_1} & \dfrac{\partial v_1(x)}{\partial x_2} & \cdots & \dfrac{\partial v_1(x)}{\partial x_n} \\
\dfrac{\partial v_2(x)}{\partial x_1} & \dfrac{\partial v_2(x)}{\partial x_2} & \cdots & \dfrac{\partial v_2(x)}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial v_N(x)}{\partial x_1} & \dfrac{\partial v_N(x)}{\partial x_2} & \cdots & \dfrac{\partial v_N(x)}{\partial x_n}
\end{bmatrix}$$

Example: $F(x) = x_1^2 x_2^2 + 4x_2^2$

$$J(x) = \begin{bmatrix} x_2 & x_1 \\ 0 & 2 \end{bmatrix}, \qquad v = \begin{bmatrix} x_1 x_2 \\ 2x_2 \end{bmatrix}, \qquad J^T(x) = \begin{bmatrix} x_2 & 0 \\ x_1 & 2 \end{bmatrix}$$
Hessian

$$[\nabla^2 F(x)]_{k,j} = \frac{\partial^2 F(x)}{\partial x_k \partial x_j} = 2 \sum_{i=1}^{N} \left\{ \frac{\partial v_i(x)}{\partial x_k} \frac{\partial v_i(x)}{\partial x_j} + v_i(x) \frac{\partial^2 v_i(x)}{\partial x_k \partial x_j} \right\}$$

$$\nabla^2 F(x) = 2J^T(x) J(x) + 2S(x)$$

$$S(x) = \sum_{i=1}^{N} v_i(x) \nabla^2 v_i(x)$$

Example: $F(x) = x_1^2 x_2^2 + 4x_2^2$, with $v_1 = x_1 x_2$ and $v_2 = 2x_2$.

$$H = \begin{bmatrix} 2x_2^2 & 4x_1 x_2 \\ 4x_1 x_2 & 2x_1^2 + 8 \end{bmatrix}$$
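The decomposition into 2 J^T J + 2 S can be verified on the example F(x) = x1^2 x2^2 + 4 x2^2. The sketch below builds both terms and compares the sum with the second derivatives of F computed by hand (2 x2^2 and 2 x1^2 + 8 on the diagonal, 4 x1 x2 off the diagonal). Function names and the test point are illustrative.

```python
def v(x):
    """Residual vector: v1 = x1 x2, v2 = 2 x2."""
    return [x[0] * x[1], 2 * x[1]]


def jacobian(x):
    """Jacobian of v."""
    return [[x[1], x[0]],
            [0.0, 2.0]]


def hessian_from_parts(x):
    """Hessian assembled as 2 J^T J + 2 S, with S = sum_i v_i * Hessian(v_i)."""
    J, vx = jacobian(x), v(x)
    # Second-derivative matrices of each v_i: only v1 = x1 x2 has curvature.
    d2v = [[[0.0, 1.0], [1.0, 0.0]],
           [[0.0, 0.0], [0.0, 0.0]]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(2):
        for c in range(2):
            jtj = sum(J[i][r] * J[i][c] for i in range(2))
            s = sum(vx[i] * d2v[i][r][c] for i in range(2))
            H[r][c] = 2 * jtj + 2 * s
    return H


def hessian_direct(x):
    """Second derivatives of F(x) = x1^2 x2^2 + 4 x2^2, computed by hand."""
    return [[2 * x[1] ** 2, 4 * x[0] * x[1]],
            [4 * x[0] * x[1], 2 * x[0] ** 2 + 8]]


x = [1.5, -2.0]
print("2 J^T J + 2 S =", hessian_from_parts(x))
print("direct        =", hessian_direct(x))
```

The off-diagonal entry splits evenly here: 2 x1 x2 comes from 2 J^T J and another 2 x1 x2 from 2 S, so dropping S changes the Hessian wherever v1 is nonzero.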
Gauss-Newton Method
Approximate the Hessian matrix as:

$$\nabla^2 F(x) \approx 2J^T(x) J(x)$$

Newton's method becomes:

$$x_{k+1} = x_k - [2J^T(x_k) J(x_k)]^{-1}\, 2J^T(x_k)\, v(x_k)$$

$$= x_k - [J^T(x_k) J(x_k)]^{-1} J^T(x_k)\, v(x_k)$$
Levenberg-Marquardt
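Gauss-Newton approximates the Hessian by:

$$H = J^T J$$

This matrix may be singular, but can be made invertible as follows:

$$G = H + \mu I$$

If the eigenvalues and eigenvectors of H are:

$$\{\lambda_1, \lambda_2, \ldots, \lambda_n\}, \qquad \{z_1, z_2, \ldots, z_n\}$$

then G has the same eigenvectors, with eigenvalues $\lambda_i + \mu$:

$$G z_i = [H + \mu I] z_i = H z_i + \mu z_i = \lambda_i z_i + \mu z_i = (\lambda_i + \mu) z_i$$

Since $H = J^T J$ is positive semidefinite ($\lambda_i \ge 0$), any $\mu > 0$ makes every eigenvalue of G positive, so G is invertible. The Levenberg-Marquardt update is:

$$x_{k+1} = x_k - [J^T(x_k) J(x_k) + \mu_k I]^{-1} J^T(x_k)\, v(x_k)$$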
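The eigenvalue shift can be illustrated on a small singular case. The sketch below takes H = J^T J for v1 = x1 x2, v2 = 2 x2 at the point (1, 0), which gives [[0, 0], [0, 5]], and compares its eigenvalues with those of G = H + μI. The value μ = 0.01 and the helper `eig_sym2` are illustrative assumptions.

```python
import math


def eig_sym2(M):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first (closed form)."""
    mean = (M[0][0] + M[1][1]) / 2
    radius = math.sqrt(((M[0][0] - M[1][1]) / 2) ** 2 + M[0][1] ** 2)
    return [mean - radius, mean + radius]


H = [[0.0, 0.0],
     [0.0, 5.0]]       # J^T J at x = (1, 0): one eigenvalue is zero
mu = 0.01              # assumed damping value, chosen only for illustration

G = [[H[0][0] + mu, H[0][1]],
     [H[1][0], H[1][1] + mu]]

eig_H = eig_sym2(H)
eig_G = eig_sym2(G)
print("eigenvalues of H:", eig_H)
print("eigenvalues of G:", eig_G)
```

Each eigenvalue of G is the matching eigenvalue of H plus μ, so the zero eigenvalue that made H singular becomes μ.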
Adjustment of μ_k
As $\mu_k \to 0$, LM becomes Gauss-Newton.

$$x_{k+1} = x_k - [J^T(x_k) J(x_k)]^{-1} J^T(x_k)\, v(x_k)$$

As $\mu_k \to \infty$, LM becomes Steepest Descent with small learning rate.

$$x_{k+1} \approx x_k - \frac{1}{\mu_k} J^T(x_k)\, v(x_k) = x_k - \frac{1}{2\mu_k} \nabla F(x)\big|_{x = x_k}$$

Therefore, begin with a small $\mu_k$ to use Gauss-Newton and speed convergence. If a step does not yield a smaller F(x), then repeat the step with an increased $\mu_k$ until F(x) is decreased. F(x) must decrease eventually, since we will be taking a very small step in the steepest descent direction.
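The adjustment rule can be sketched as a loop. This is a minimal illustration, not a reference implementation: the initial μ = 0.01, the factor of 10 used to raise or lower μ, the stopping tolerance and the iteration cap are all assumed values, and the test problem is the sum-of-squares example with v1 = x1 x2, v2 = 2 x2.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A x = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def v(x):
    """Residual vector: v1 = x1 x2, v2 = 2 x2."""
    return [x[0] * x[1], 2 * x[1]]


def jacobian(x):
    """Jacobian of v."""
    return [[x[1], x[0]],
            [0.0, 2.0]]


def F(x):
    """Sum-of-squares performance index."""
    return sum(vi ** 2 for vi in v(x))


def levenberg_marquardt(x, mu=0.01, factor=10.0, tol=1e-10, max_iter=100):
    """LM iteration; mu, factor, tol and max_iter are assumed values."""
    f = F(x)
    for _ in range(max_iter):
        J, vx = jacobian(x), v(x)
        Jtv = [sum(J[i][r] * vx[i] for i in range(2)) for r in range(2)]
        if max(abs(g) for g in Jtv) < tol:
            break  # gradient 2 J^T v is essentially zero
        G = [[sum(J[i][r] * J[i][c] for i in range(2)) + (mu if r == c else 0.0)
              for c in range(2)] for r in range(2)]
        dx = solve2(G, Jtv)
        x_try = [x[0] - dx[0], x[1] - dx[1]]
        f_try = F(x_try)
        if f_try < f:
            x, f = x_try, f_try   # accept the step, move toward Gauss-Newton
            mu /= factor
        else:
            mu *= factor          # reject the step, retry closer to steepest descent
    return x, f, mu


x_final, f_final, mu_final = levenberg_marquardt([1.0, 1.0])
print("x =", x_final, " F =", f_final, " mu =", mu_final)
```

A rejected step leaves x unchanged and only raises μ, which matches the "repeat the step with an increased μ_k" rule. The iteration cap guards against looping forever once F can no longer decrease in floating point.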
