

6. Backpropagation training algorithm: Consider a three-layer neural
network with the input layer, the hidden layer, and the output layer
shown in Figure 13.6. There are $n$ inputs, $m$ outputs, and $l$ neurons
in the hidden layer.

Inputs: $x_1, x_2, \ldots, x_n$
Input to the hidden layer: $v_j$ for $j = 1, 2, \ldots, l$
Outputs: $y_1, y_2, \ldots, y_m$
Output from the hidden layer: $z_j$ for $j = 1, 2, \ldots, l$
Connection weights to the hidden layer: $w^h_{ji}$ for $j = 1, 2, \ldots, l$ and $i = 1, 2, \ldots, n$
Connection weights to the output layer: $w^o_{kj}$ for $j = 1, 2, \ldots, l$ and $k = 1, 2, \ldots, m$
Activation functions: $f^h_j$ for $j = 1, 2, \ldots, l$ and $f^o_s$ for $s = 1, 2, \ldots, m$

$$
v_j = \sum_{i=1}^{n} w^h_{ji}\, x_i, \qquad
z_j = f^h_j(v_j), \qquad
y_s = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, z_j\Bigr)
$$

$$
y_s = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, z_j\Bigr)
    = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, f^h_j(v_j)\Bigr)
    = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, f^h_j\Bigl(\sum_{i=1}^{n} w^h_{ji}\, x_i\Bigr)\Bigr)
    = F_s(x_1, \ldots, x_n)
$$
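
As a concrete illustration, this forward pass can be written as a short sketch. It assumes sigmoid activations for every neuron and stores the hidden-layer weights $w^h_{ji}$ in an $l \times n$ matrix W_h and the output-layer weights $w^o_{sj}$ in an $m \times l$ matrix W_o (the function and variable names are illustrative, not part of the notes):

```python
import numpy as np

def sigmoid(v):
    # Standard sigmoid activation f(v) = 1 / (1 + e^{-v})
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W_h, W_o):
    """Forward pass of the three-layer network.

    x   : input vector, shape (n,)
    W_h : hidden-layer weights w^h_{ji}, shape (l, n)
    W_o : output-layer weights w^o_{sj}, shape (m, l)
    Returns (v, z, y) so the intermediate values can be reused later.
    """
    v = W_h @ x           # v_j = sum_i w^h_{ji} x_i
    z = sigmoid(v)        # z_j = f^h_j(v_j)
    y = sigmoid(W_o @ z)  # y_s = f^o_s(sum_j w^o_{sj} z_j)
    return v, z, y
```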

First consider a single training data point $(x^d, y^d)$, where $x^d \in \Re^n$
and $y^d \in \Re^m$. We need to find the weights $w^h_{ji}$ for $j = 1, 2, \ldots, l$
and $i = 1, 2, \ldots, n$ and $w^o_{kj}$ for $j = 1, 2, \ldots, l$ and $k = 1, 2, \ldots, m$
such that the following objective function is minimized:

$$
\text{Minimize } E(w) = \frac{1}{2} \sum_{s=1}^{m} (y^d_s - y_s)^2
$$

where $y_s$, whose expression is given earlier, is a function of the input
data $x^d$ and of the unknown weights to be optimized. To solve this
unconstrained optimization problem, we may use a gradient
method with a fixed step size. An iterative procedure is needed
with a proper stopping criterion. We also need a starting point, that is,
initial guesses of the weights of the neural network.
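
For a single training pair, the objective is easy to evaluate once the forward pass has been run. The sketch below reuses the forward() helper from the earlier sketch; both names are illustrative choices, not part of the notes:

```python
import numpy as np

def error(y_d, y):
    # E(w) = 1/2 * sum_s (y^d_s - y_s)^2 for one training pair
    return 0.5 * np.sum((y_d - y) ** 2)

# Example usage with the forward() sketch defined earlier:
#   v, z, y = forward(x_d, W_h, W_o)
#   E = error(y_d, y)
```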

Defining

$$
\delta_s = (y^d_s - y_s)\, f^{o\,\prime}_s\Bigl(\sum_{q=1}^{l} w^o_{sq}\, z_q\Bigr), \qquad s = 1, 2, \ldots, m,
$$

we can express the gradient $\nabla E(w)$ (with respect to $w^h_{ji}$ and $w^o_{sj}$) as
follows:

$$
\frac{\partial E(w)}{\partial w^h_{ji}} = -\Bigl(\sum_{p=1}^{m} \delta_p\, w^o_{pj}\Bigr) f^{h\,\prime}_j(v_j)\, x^d_i
$$

$$
\frac{\partial E(w)}{\partial w^o_{sj}} = -\delta_s\, z_j
$$
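
A minimal sketch of these gradient formulas, again assuming sigmoid activations everywhere and the W_h / W_o matrix layout used in the earlier sketches (it relies on the sigmoid() and forward() helpers defined above):

```python
import numpy as np

def sigmoid_prime(v):
    # Derivative of the sigmoid: f'(v) = f(v) * (1 - f(v))
    s = sigmoid(v)
    return s * (1.0 - s)

def gradients(x_d, y_d, W_h, W_o):
    """Compute dE/dW_h and dE/dW_o for one training pair (x_d, y_d)."""
    v, z, y = forward(x_d, W_h, W_o)
    u = W_o @ z                           # net input to the output layer
    delta = (y_d - y) * sigmoid_prime(u)  # delta_s, s = 1, ..., m
    grad_W_o = -np.outer(delta, z)        # dE/dw^o_{sj} = -delta_s z_j
    # dE/dw^h_{ji} = -(sum_p delta_p w^o_{pj}) f^h'(v_j) x^d_i
    grad_W_h = -np.outer((W_o.T @ delta) * sigmoid_prime(v), x_d)
    return grad_W_h, grad_W_o
```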

The fixed-step-size gradient method uses the following iterative
equation:

$$
w^{(k+1)} = w^{(k)} - \eta\, \nabla E(w^{(k)}), \qquad k = 0, 1, 2, \ldots
$$

where $\eta$ is called the learning rate. Explicitly, we have

$$
w^{h(k+1)}_{ji} = w^{h(k)}_{ji} + \eta \Bigl(\sum_{p=1}^{m} \delta^{(k)}_p\, w^{o(k)}_{pj}\Bigr) f^{h\,\prime}_j\bigl(v^{(k)}_j\bigr)\, x^d_i
$$

$$
w^{o(k+1)}_{sj} = w^{o(k)}_{sj} + \eta\, \delta^{(k)}_s\, z^{(k)}_j
$$

The update equation for the weights $w^o_{sj}$ of the output layer is
illustrated in Figure 13.7. The update equation for the weights $w^h_{ji}$
of the hidden layer is illustrated in Figure 13.8.
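
Putting the pieces together, the fixed-step-size iteration might look like the sketch below, built on the gradients() helper above. The stopping criterion (a small gradient magnitude), the iteration cap, and the default learning rate are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def train(x_d, y_d, W_h, W_o, eta=0.1, tol=1e-6, max_iter=1000):
    """Fixed-step-size gradient method: w^{(k+1)} = w^{(k)} - eta * grad E(w^{(k)})."""
    for k in range(max_iter):
        grad_W_h, grad_W_o = gradients(x_d, y_d, W_h, W_o)
        if max(np.abs(grad_W_h).max(), np.abs(grad_W_o).max()) < tol:
            break                     # stop when the gradient is nearly zero
        # Subtracting eta * gradient is the same as the "+ eta (...)" form above,
        # since the gradient expressions already carry the minus sign.
        W_h = W_h - eta * grad_W_h
        W_o = W_o - eta * grad_W_o
    return W_h, W_o
```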

This algorithm is called the backpropagation algorithm because the
output errors $\delta_1, \delta_2, \ldots, \delta_m$ are propagated back from the output
layer to the other layers and are used to update the weights in these
layers.

7. The standard sigmoid activation function has the following form:

$$
f(v) = \frac{1}{1 + e^{-v}}.
$$

An extended sigmoid function has the following form:

$$
g(v) = \frac{\beta}{1 + e^{-(v-\theta)}},
$$
where β is called the scale parameter and θ is called the shift
parameter. If such an activation function is needed in a neural
network, we would also like to be able to adjust the values of these
two parameters. However, it turns out that these parameters can be
incorporated into the structure of the neural network by treating
them as additional weights to be adjusted. The scale parameter β
can be incorporated into the weight on the output of the neuron.
The shift parameter θ can be treated as the weight to a constant
input (+1) to the neuron. This constant input is called the bias term.
Refer to Figure 13.11 for a graphical explanation. The weights on
the constant inputs are often denoted by the vector b.
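
The following sketch checks this equivalence numerically: the shift θ becomes a weight of −θ on a constant +1 (bias) input, and the scale β is absorbed into the weight applied to the neuron's output. The particular numbers are arbitrary and only serve the check:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def extended_sigmoid(v, beta, theta):
    # g(v) = beta / (1 + e^{-(v - theta)})
    return beta / (1.0 + np.exp(-(v - theta)))

v, beta, theta = 0.8, 2.0, 0.5
bias_weight = -theta                 # weight on the constant +1 (bias) input
net = v + bias_weight * 1.0          # augmented net input to the neuron
# Same value from a standard-sigmoid neuron with a bias input and an output scale:
assert np.isclose(extended_sigmoid(v, beta, theta), beta * sigmoid(net))
```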

Example II.19: Consider a neural network with 2 inputs, 2 hidden
neurons, and 1 output neuron. The activation function for all neurons is
given by $f(v) = 1/(1 + e^{-v})$. The starting point is
$(w^{h}_{11}(0), w^{h}_{12}(0), w^{h}_{21}(0), w^{h}_{22}(0), w^{o}_{11}(0), w^{o}_{12}(0)) = (0.1, 0.3, 0.3, 0.4, 0.4, 0.6)$.
The learning rate is $\eta = 10$. Consider a single training input-output pair with $x = (0.2, 0.6)^T$ and
$y = 0.7$. See Figure 13.9. The results of 21 iterations of the
backpropagation algorithm are given in the attached table.
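
Since the table itself is not reproduced here, the sketch below only sets up the example as stated and runs 21 iterations, reusing the gradients() and forward() helpers from the earlier sketches. The placement of the starting weights in the matrices is my reading of the indices, so the printed values should be checked against the attached table:

```python
import numpy as np

W_h = np.array([[0.1, 0.3],
                [0.3, 0.4]])   # [[w^h_11, w^h_12], [w^h_21, w^h_22]]
W_o = np.array([[0.4, 0.6]])   # [[w^o_11, w^o_12]]
x_d = np.array([0.2, 0.6])
y_d = np.array([0.7])
eta = 10.0                     # learning rate from the example

for k in range(21):            # 21 iterations, as in the attached table
    grad_W_h, grad_W_o = gradients(x_d, y_d, W_h, W_o)
    W_h = W_h - eta * grad_W_h
    W_o = W_o - eta * grad_W_o

_, _, y = forward(x_d, W_h, W_o)
print(W_h, W_o, y)             # final weights and network output
```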
