

6. Backpropagation training algorithm: Consider a three-layer neural
network with the input layer, the hidden layer, and the output layer
shown in Figure 13.6. There are $n$ inputs, $m$ outputs, and $l$ neurons
in the hidden layer.

Inputs: $x_1, x_2, \ldots, x_n$
Input to the hidden layer: $v_j$ for $j = 1, 2, \ldots, l$
Outputs: $y_1, y_2, \ldots, y_m$
Output from the hidden layer: $z_j$ for $j = 1, 2, \ldots, l$
Connection weights to the hidden layer: $w^h_{ji}$ for $j = 1, 2, \ldots, l$ and $i = 1, 2, \ldots, n$
Connection weights to the output layer: $w^o_{kj}$ for $j = 1, 2, \ldots, l$ and $k = 1, 2, \ldots, m$
Activation functions: $f^h_j$ for $j = 1, 2, \ldots, l$ and $f^o_s$ for $s = 1, 2, \ldots, m$

$$
v_j = \sum_{i=1}^{n} w^h_{ji}\, x_i, \qquad
z_j = f^h_j(v_j), \qquad
y_s = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, z_j\Bigr)
$$

$$
y_s = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, z_j\Bigr)
    = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, f^h_j(v_j)\Bigr)
    = f^o_s\Bigl(\sum_{j=1}^{l} w^o_{sj}\, f^h_j\Bigl(\sum_{i=1}^{n} w^h_{ji}\, x_i\Bigr)\Bigr)
    = F_s(x_1, \ldots, x_n)
$$
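
As a concrete illustration, this forward pass can be written as a short sketch. It assumes sigmoid activations for every neuron and stores the hidden-layer weights $w^h_{ji}$ in an $l \times n$ matrix W_h and the output-layer weights $w^o_{sj}$ in an $m \times l$ matrix W_o (the function and variable names are illustrative, not part of the notes):

```python
import numpy as np

def sigmoid(v):
    # Standard sigmoid activation f(v) = 1 / (1 + e^{-v})
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W_h, W_o):
    """Forward pass of the three-layer network.

    x   : input vector, shape (n,)
    W_h : hidden-layer weights w^h_{ji}, shape (l, n)
    W_o : output-layer weights w^o_{sj}, shape (m, l)
    Returns (v, z, y) so the intermediate values can be reused later.
    """
    v = W_h @ x           # v_j = sum_i w^h_{ji} x_i
    z = sigmoid(v)        # z_j = f^h_j(v_j)
    y = sigmoid(W_o @ z)  # y_s = f^o_s(sum_j w^o_{sj} z_j)
    return v, z, y
```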

First consider a single training data point $(x^d, y^d)$, where $x^d \in \Re^n$
and $y^d \in \Re^m$. We need to find the weights $w^h_{ji}$ for $j = 1, 2, \ldots, l$
and $i = 1, 2, \ldots, n$ and $w^o_{kj}$ for $j = 1, 2, \ldots, l$ and $k = 1, 2, \ldots, m$
such that the following objective function is minimized:

$$
\text{Minimize } E(w) = \frac{1}{2} \sum_{s=1}^{m} (y^d_s - y_s)^2
$$

where $y_s$, whose expression is given earlier, is a function of the input
data $x^d$ and of the unknown weights to be optimized. To solve this
unconstrained optimization problem, we may use a gradient
method with a fixed step size. An iterative procedure is needed
with a proper stopping criterion. We also need a starting point, that is,
initial guesses of the weights of the neural network.
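
For a single training pair, the objective is easy to evaluate once the forward pass has been run. The sketch below reuses the forward() helper from the earlier sketch; both names are illustrative choices, not part of the notes:

```python
import numpy as np

def error(y_d, y):
    # E(w) = 1/2 * sum_s (y^d_s - y_s)^2 for one training pair
    return 0.5 * np.sum((y_d - y) ** 2)

# Example usage with the forward() sketch defined earlier:
#   v, z, y = forward(x_d, W_h, W_o)
#   E = error(y_d, y)
```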

Defining

$$
\delta_s = (y^d_s - y_s)\, f^{o\,\prime}_s\Bigl(\sum_{q=1}^{l} w^o_{sq}\, z_q\Bigr), \qquad s = 1, 2, \ldots, m,
$$

we can express the gradient $\nabla E(w)$ (with respect to $w^h_{ji}$ and $w^o_{sj}$) as
follows:

$$
\frac{\partial E(w)}{\partial w^h_{ji}} = -\Bigl(\sum_{p=1}^{m} \delta_p\, w^o_{pj}\Bigr) f^{h\,\prime}_j(v_j)\, x^d_i
$$

$$
\frac{\partial E(w)}{\partial w^o_{sj}} = -\delta_s\, z_j
$$
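
A minimal sketch of these gradient formulas, again assuming sigmoid activations everywhere and the W_h / W_o matrix layout used in the earlier sketches (it relies on the sigmoid() and forward() helpers defined above):

```python
import numpy as np

def sigmoid_prime(v):
    # Derivative of the sigmoid: f'(v) = f(v) * (1 - f(v))
    s = sigmoid(v)
    return s * (1.0 - s)

def gradients(x_d, y_d, W_h, W_o):
    """Compute dE/dW_h and dE/dW_o for one training pair (x_d, y_d)."""
    v, z, y = forward(x_d, W_h, W_o)
    u = W_o @ z                           # net input to the output layer
    delta = (y_d - y) * sigmoid_prime(u)  # delta_s, s = 1, ..., m
    grad_W_o = -np.outer(delta, z)        # dE/dw^o_{sj} = -delta_s z_j
    # dE/dw^h_{ji} = -(sum_p delta_p w^o_{pj}) f^h'(v_j) x^d_i
    grad_W_h = -np.outer((W_o.T @ delta) * sigmoid_prime(v), x_d)
    return grad_W_h, grad_W_o
```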

The fixed-step-size gradient method uses the following iterative
equation:

$$
w^{(k+1)} = w^{(k)} - \eta\, \nabla E(w^{(k)}), \qquad k = 0, 1, 2, \ldots
$$

where $\eta$ is called the learning rate. Explicitly, we have

$$
w^{h(k+1)}_{ji} = w^{h(k)}_{ji} + \eta \Bigl(\sum_{p=1}^{m} \delta^{(k)}_p\, w^{o(k)}_{pj}\Bigr) f^{h\,\prime}_j\bigl(v^{(k)}_j\bigr)\, x^d_i
$$

$$
w^{o(k+1)}_{sj} = w^{o(k)}_{sj} + \eta\, \delta^{(k)}_s\, z^{(k)}_j
$$

The update equation for the weights $w^o_{sj}$ of the output layer is
illustrated in Figure 13.7. The update equation for the weights $w^h_{ji}$
of the hidden layer is illustrated in Figure 13.8.
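
Putting the pieces together, the fixed-step-size iteration might look like the sketch below, built on the gradients() helper above. The stopping criterion (a small gradient magnitude), the iteration cap, and the default learning rate are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def train(x_d, y_d, W_h, W_o, eta=0.1, tol=1e-6, max_iter=1000):
    """Fixed-step-size gradient method: w^{(k+1)} = w^{(k)} - eta * grad E(w^{(k)})."""
    for k in range(max_iter):
        grad_W_h, grad_W_o = gradients(x_d, y_d, W_h, W_o)
        if max(np.abs(grad_W_h).max(), np.abs(grad_W_o).max()) < tol:
            break                     # stop when the gradient is nearly zero
        # Subtracting eta * gradient is the same as the "+ eta (...)" form above,
        # since the gradient expressions already carry the minus sign.
        W_h = W_h - eta * grad_W_h
        W_o = W_o - eta * grad_W_o
    return W_h, W_o
```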

This algorithm is called the backpropagation algorithm because the
output errors $\delta_1, \delta_2, \ldots, \delta_m$ are propagated back from the output
layer to the other layers and are used to update the weights in these
layers.

7. The standard sigmoid activation function has the following form:

$$
f(v) = \frac{1}{1 + e^{-v}}.
$$

An extended sigmoid function has the following form:

$$
g(v) = \frac{\beta}{1 + e^{-(v-\theta)}},
$$
where β is called the scale parameter and θ is called the shift
parameter. If such an activation function is needed in a neural
network, we would also like to be able to adjust the values of these
two parameters. However, it turns out that these parameters can be
incorporated into the structure of the neural network by treating
them as additional weights to be adjusted. The scale parameter β
can be incorporated into the weight on the output of the neuron.
The shift parameter θ can be treated as the weight to a constant
input (+1) to the neuron. This constant input is called the bias term.
Refer to Figure 13.11 for a graphical explanation. The weights on
the constant inputs are often denoted by the vector b.
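
The following sketch checks this equivalence numerically: the shift θ becomes a weight of −θ on a constant +1 (bias) input, and the scale β is absorbed into the weight applied to the neuron's output. The particular numbers are arbitrary and only serve the check:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def extended_sigmoid(v, beta, theta):
    # g(v) = beta / (1 + e^{-(v - theta)})
    return beta / (1.0 + np.exp(-(v - theta)))

v, beta, theta = 0.8, 2.0, 0.5
bias_weight = -theta                 # weight on the constant +1 (bias) input
net = v + bias_weight * 1.0          # augmented net input to the neuron
# Same value from a standard-sigmoid neuron with a bias input and an output scale:
assert np.isclose(extended_sigmoid(v, beta, theta), beta * sigmoid(net))
```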

Example II.19: Consider a neural network with 2 inputs, 2 hidden
neurons, and 1 output neuron. The activation function for all neurons is
given by $f(v) = 1/(1 + e^{-v})$. The starting point is
$(w^{h}_{11}(0), w^{h}_{12}(0), w^{h}_{21}(0), w^{h}_{22}(0), w^{o}_{11}(0), w^{o}_{12}(0)) = (0.1, 0.3, 0.3, 0.4, 0.4, 0.6)$.
The learning rate is $\eta = 10$. Consider a single training input-output pair with $x = (0.2, 0.6)^T$ and
$y = 0.7$. See Figure 13.9. The results of 21 iterations of the
backpropagation algorithm are given in the attached table.
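
Since the table itself is not reproduced here, the sketch below only sets up the example as stated and runs 21 iterations, reusing the gradients() and forward() helpers from the earlier sketches. The placement of the starting weights in the matrices is my reading of the indices, so the printed values should be checked against the attached table:

```python
import numpy as np

W_h = np.array([[0.1, 0.3],
                [0.3, 0.4]])   # [[w^h_11, w^h_12], [w^h_21, w^h_22]]
W_o = np.array([[0.4, 0.6]])   # [[w^o_11, w^o_12]]
x_d = np.array([0.2, 0.6])
y_d = np.array([0.7])
eta = 10.0                     # learning rate from the example

for k in range(21):            # 21 iterations, as in the attached table
    grad_W_h, grad_W_o = gradients(x_d, y_d, W_h, W_o)
    W_h = W_h - eta * grad_W_h
    W_o = W_o - eta * grad_W_o

_, _, y = forward(x_d, W_h, W_o)
print(W_h, W_o, y)             # final weights and network output
```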
