Sargur Srihari
Machine Learning Srihari
• Gradient of E

$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\phi(\mathbf{x}_n) = \Phi^T(\mathbf{y} - \mathbf{t})$$

where $\mathbf{y} = (y_1, \ldots, y_N)^T$, $\mathbf{t} = (t_1, \ldots, t_N)^T$, and $\Phi$ is the $N \times M$ design matrix

$$\Phi = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & & & \vdots \\ \vdots & & & \\ \phi_0(\mathbf{x}_N) & \cdots & & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}$$

whose $n$th row is $\phi(\mathbf{x}_n)^T = \left(\phi_0(\mathbf{x}_n)\;\; \phi_1(\mathbf{x}_n)\;\; \ldots\;\; \phi_{M-1}(\mathbf{x}_n)\right)$.

• Hessian of E

$$H = \nabla\nabla E(\mathbf{w}) = \sum_{n=1}^{N} y_n(1 - y_n)\,\phi(\mathbf{x}_n)\phi^T(\mathbf{x}_n) = \Phi^T R \Phi$$

$R$ is an $N \times N$ diagonal matrix with elements $R_{nn} = y_n(1 - y_n)$, where $y_n = \sigma(\mathbf{w}^T\phi(\mathbf{x}_n))$.

The Hessian is not constant: it depends on $\mathbf{w}$ through $R$.

Since $H$ is positive-definite (i.e., $\mathbf{u}^T H \mathbf{u} > 0$ for arbitrary $\mathbf{u} \neq 0$), the error function is a convex function of $\mathbf{w}$ and so has a unique minimum.
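The gradient and Hessian above can be sketched directly in NumPy. This is a minimal illustration, not the slides' own code; the toy design matrix, targets, and function names are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gradient(w, Phi, t):
    # ∇E(w) = Φ^T (y − t)
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t)

def hessian(w, Phi):
    # H = Φ^T R Φ, with R = diag(y_n (1 − y_n))
    y = sigmoid(Phi @ w)
    R = np.diag(y * (1.0 - y))
    return Phi.T @ R @ Phi

# Illustrative toy data
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))            # N = 20 points, M = 3 basis functions
t = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)

H = hessian(w, Phi)
# Positive-definite Hessian => convex error with a unique minimum
print(np.linalg.eigvalsh(H).min() > 0)
```

For a full-rank design matrix the eigenvalues of $\Phi^T R \Phi$ are strictly positive, which is the positive-definiteness claim on the slide.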
For multiclass logistic (softmax) regression:

$$y_{nk} = y_k(\phi(\mathbf{x}_n)) = \frac{\exp(\mathbf{w}_k^T\phi(\mathbf{x}_n))}{\sum_j \exp(\mathbf{w}_j^T\phi(\mathbf{x}_n))}$$

• Gradient of E

$$\nabla_{\mathbf{w}_j} E(\mathbf{w}_1, \ldots, \mathbf{w}_K) = \sum_{n=1}^{N} (y_{nj} - t_{nj})\,\phi(\mathbf{x}_n)$$

• Hessian of E

$$\nabla_{\mathbf{w}_k}\nabla_{\mathbf{w}_j} E(\mathbf{w}_1, \ldots, \mathbf{w}_K) = \sum_{n=1}^{N} y_{nk}(I_{kj} - y_{nj})\,\phi(\mathbf{x}_n)\phi^T(\mathbf{x}_n)$$

with the design matrix $\Phi$ and $\phi(\mathbf{x}_n)^T = \left(\phi_0(\mathbf{x}_n)\;\; \phi_1(\mathbf{x}_n)\;\; \ldots\;\; \phi_{M-1}(\mathbf{x}_n)\right)$ defined as before.
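The softmax gradient and the $(k, j)$ Hessian blocks can be sketched as follows. A minimal illustration assuming one-hot targets `T`, weight rows `W[k]`, and illustrative function names:

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(A)
    return e / e.sum(axis=1, keepdims=True)

def grad_block(W, Phi, T, j):
    # ∇_{w_j} E = Σ_n (y_nj − t_nj) φ(x_n)
    Y = softmax(Phi @ W.T)
    return Phi.T @ (Y[:, j] - T[:, j])

def hess_block(W, Phi, k, j):
    # ∇_{w_k}∇_{w_j} E = Σ_n y_nk (I_kj − y_nj) φ(x_n) φ(x_n)^T
    Y = softmax(Phi @ W.T)
    I_kj = 1.0 if k == j else 0.0
    coeff = Y[:, k] * (I_kj - Y[:, j])
    return (Phi * coeff[:, None]).T @ Phi

# Illustrative toy data: N points, M basis functions, K classes
rng = np.random.default_rng(1)
N, M, K = 15, 3, 4
Phi = rng.normal(size=(N, M))
T = np.eye(K)[rng.integers(0, K, size=N)]  # one-hot targets
W = rng.normal(size=(K, M))
```

A quick way to sanity-check the formulas is to compare `grad_block` against a central-difference approximation of the cross-entropy error.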
Diagonal Approximation
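In the diagonal approximation the Hessian is replaced by its diagonal elements, so the inverse needed for a Newton-like step becomes trivial. A minimal sketch reusing the logistic-regression Hessian $H = \Phi^T R \Phi$; the function names and the diagonal Newton update are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hessian_diag(w, Phi):
    # Diagonal of H = Φ^T R Φ: H_ii = Σ_n R_nn φ_i(x_n)^2
    y = sigmoid(Phi @ w)
    r = y * (1.0 - y)
    return (r[:, None] * Phi**2).sum(axis=0)

def newton_step_diag(w, Phi, t):
    # Approximate Newton step using only diag(H): inversion is O(W)
    y = sigmoid(Phi @ w)
    g = Phi.T @ (y - t)
    return w - g / hessian_diag(w, Phi)

# Illustrative toy data
rng = np.random.default_rng(4)
Phi = rng.normal(size=(20, 3))
t = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)
d = hessian_diag(w, Phi)
```

The payoff is cost: storing and inverting `diag(H)` is $O(W)$ rather than $O(W^2)$ storage and $O(W^3)$ inversion for the full Hessian, at the price of discarding all off-diagonal curvature.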
Outer Product Approximation

$$H \simeq \sum_{n=1}^{N} \mathbf{b}_n \mathbf{b}_n^T$$

• where $\mathbf{b}_n = \nabla y_n = \nabla a_n$
• Elements can be found in $O(W^2)$ steps
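The outer-product form $\sum_n \mathbf{b}_n \mathbf{b}_n^T$ is straightforward to compute once the per-pattern gradients $\mathbf{b}_n$ are stacked as rows of a matrix. A hedged sketch; for a linear model $a_n = \mathbf{w}^T\phi(\mathbf{x}_n)$ (an assumption used here to make $\mathbf{b}_n = \phi(\mathbf{x}_n)$ concrete) the approximation is exact for a sum-of-squares error:

```python
import numpy as np

def outer_product_hessian(B):
    # B is N x W, rows are b_n = ∇a_n; B^T B = Σ_n b_n b_n^T,
    # built in O(W^2) steps per data point.
    return B.T @ B

# Illustrative data: for a linear model, b_n is just the basis vector φ(x_n)
rng = np.random.default_rng(2)
B = rng.normal(size=(10, 4))
H = outer_product_hessian(B)
```

Each term $\mathbf{b}_n\mathbf{b}_n^T$ is rank one, so the approximation is symmetric and positive semi-definite by construction.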
Inverse Hessian
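With the outer-product form, the inverse Hessian can be built up one data point at a time using the Sherman–Morrison identity: adding $\mathbf{b}\mathbf{b}^T$ to $H$ updates $H^{-1}$ to $H^{-1} - \dfrac{H^{-1}\mathbf{b}\,\mathbf{b}^T H^{-1}}{1 + \mathbf{b}^T H^{-1}\mathbf{b}}$. A hedged sketch; the $\alpha I$ initialisation and the function name are assumptions:

```python
import numpy as np

def sequential_inverse(B, alpha=1e-3):
    # Sequentially invert αI + Σ_n b_n b_n^T, one rank-one update at a time.
    W = B.shape[1]
    H_inv = np.eye(W) / alpha        # inverse of the initial αI
    for b in B:                      # absorb each b_n b_n^T via Sherman–Morrison
        Hb = H_inv @ b
        H_inv -= np.outer(Hb, Hb) / (1.0 + b @ Hb)
    return H_inv

# Illustrative data
rng = np.random.default_rng(3)
B = rng.normal(size=(12, 4))
H_inv = sequential_inverse(B)
```

Each rank-one update costs $O(W^2)$, so the full inverse is assembled in $O(N W^2)$ without ever calling a matrix-inversion routine.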
Finite Differences
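Finite differences give a Hessian estimate without any second-derivative code: applying central differences to an exact gradient routine yields $H_{ij} \approx \dfrac{\partial E/\partial w_i(\mathbf{w} + \epsilon \mathbf{e}_j) - \partial E/\partial w_i(\mathbf{w} - \epsilon \mathbf{e}_j)}{2\epsilon}$. A minimal sketch; the quadratic test function and all names are illustrative:

```python
import numpy as np

def finite_diff_hessian(grad_E, w, eps=1e-5):
    # Central differences of the gradient: 2W gradient evaluations total.
    W = len(w)
    H = np.zeros((W, W))
    for j in range(W):
        e = np.zeros(W); e[j] = eps
        H[:, j] = (grad_E(w + e) - grad_E(w - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)           # symmetrise away residual asymmetry

# Sanity check on a quadratic E = ½ w^T A w, whose gradient is A w
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
H = finite_diff_hessian(lambda w: A @ w, np.array([0.3, -0.7]))
```

For the quadratic the exact Hessian is $A$ itself, so the estimate should match to round-off; on a real network the same routine is a useful correctness check on hand-coded second derivatives.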