
# Multilayer Perceptron

## Function Signals, Error Signals, and the Credit Assignment Problem

When you arrange multiple neurons in multiple layers, you get a multilayer perceptron (MLP). How is it trained? With the back propagation algorithm. Two training regimes are distinguished: batch learning and online learning.
Batch learning:
- All (or multiple) examples are presented together in one instance (epoch) of training.
- The cost function being optimized is the *average* error energy.
- Synaptic weights are adjusted on an epoch-to-epoch basis.

With e_j(n) the error signal of the jth output neuron, the cost, taken over all samples, is a function of all free parameters:

$$\mathcal{E}_{\text{av}} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} \sum_{j \in C} e_j^2(n)$$

## Pros and Cons

- Accurate estimation of the gradient vector (the derivative of the cost function with respect to the weight vector), increasing assurance of convergence to a local minimum
- Fast learning process
- High storage / memory requirements
Online learning:
- Training is done on an example-by-example basis.
- The cost function being optimized is the *instantaneous* error energy.
- Synaptic weights are adjusted on a sample-to-sample basis.

In every epoch, the instantaneous error is measured sample by sample until the final value for that epoch is obtained.
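One epoch of online learning can be sketched as follows; the toy single linear neuron, the data, and the name `online_epoch` are my own illustrations, not from the slides.

```python
import numpy as np

def online_epoch(w, X, d, eta=0.05):
    # Weights are adjusted after EVERY sample, minimizing the
    # instantaneous error energy E(n) = 0.5 * e(n)^2.
    for x, dn in zip(X, d):
        e = dn - x @ w             # instantaneous error e(n)
        w = w + eta * e * x        # delta rule: -eta * dE(n)/dw
    return w

# Toy problem: fit the mapping y = 3*x sample by sample.
X = np.array([[1.0], [2.0], [3.0]])
d = np.array([3.0, 6.0, 9.0])
w = np.zeros(1)
for _ in range(100):
    w = online_epoch(w, X, d)
```

Unlike the batch version, each pass touches one sample at a time, which is what lets online learning track small changes and jump out of shallow local minima.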

The learning curve is a plot of the final error value versus the epoch number.

- Usually, the same training examples, BUT randomly shuffled, are used in each epoch.
- For study, a large enough set of initial conditions is chosen at random, each yielding a realization of the learning curve; the average of these realizations is considered.

Online learning allows the algorithm to jump over and avoid local minima, allows tracking of small changes in the data, and is also simple and elegant!
## Back Propagation

Online training for multilayer perceptrons.

Induced local field / total activation:

$$v_j(n) = \sum_i w_{ji}(n)\, y_i(n)$$

Activation function:

$$y_j(n) = \varphi_j\big(v_j(n)\big)$$

Cost function:

$$\mathcal{E}\big(\boldsymbol{\omega}(n)\big) = \frac{1}{2}\,\big\|\boldsymbol{Y}(n) - \boldsymbol{D}(n)\big\|^2 = \frac{1}{2} \sum_{j \in C} e_j^2(n), \qquad e_j(n) = d_j(n) - y_j(n)$$

Update (similar to LMS):

$$\Delta \boldsymbol{\omega}(n) = -\eta\, \frac{\partial \mathcal{E}(n)}{\partial \boldsymbol{\omega}(n)}$$
Sensitivity factor (how much the error changes with a weight), expanded by the chain rule:

$$\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = \frac{\partial \mathcal{E}(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$

We have

$$\frac{\partial \mathcal{E}(n)}{\partial e_j(n)} = e_j(n), \qquad \frac{\partial e_j(n)}{\partial y_j(n)} = -1,$$

$$\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'\big(v_j(n)\big), \qquad \frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n).$$

So we have

$$\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = -\,e_j(n)\,\varphi_j'\big(v_j(n)\big)\,y_i(n)$$

Error correction:

$$\Delta w_{ji}(n) = \eta\, e_j(n)\,\varphi_j'\big(v_j(n)\big)\,y_i(n)$$
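The chain-rule gradient can be checked against central finite differences; a minimal sketch assuming a single logistic neuron with unit slope (all input values and weights here are made up for illustration):

```python
import numpy as np

def phi(v):                          # logistic activation, slope a = 1
    return 1.0 / (1.0 + np.exp(-v))

y_in = np.array([0.5, -1.2, 2.0])    # input signals y_i to neuron j
w = np.array([0.1, 0.4, -0.3])       # weights w_ji
d = 0.8                              # desired response d_j

# Analytic gradient from the chain rule: dE/dw_ji = -e_j * phi'(v_j) * y_i,
# using phi'(v) = y * (1 - y) for the logistic function.
v = w @ y_in
y = phi(v)
e = d - y
grad_analytic = -e * y * (1.0 - y) * y_in

# Numerical check on E(w) = 0.5 * e^2 by central finite differences.
def E(w):
    err = d - phi(w @ y_in)
    return 0.5 * err ** 2

eps = 1e-6
grad_numeric = np.array([
    (E(w + eps * np.eye(3)[i]) - E(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
```

The two gradients agree to within finite-difference error, confirming the sensitivity-factor expansion above.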
The error term and the activation derivative together define the local gradient:

$$\delta_j(n) = e_j(n)\,\varphi_j'\big(v_j(n)\big), \qquad \Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$$

## The local (neuron-level) error gradient is the key. How to get it in a multilayer network?

- Output node j: easy, the desired response is supplied!
- Hidden node j: no desired response is directly available.

For a hidden neuron:
Sequentially going backwards, we track the error signals from all the neurons fed by the hidden neuron in question.

## That's why it's back propagation
Differentiating the error energy with respect to the hidden neuron's output y_j(n):

$$\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} = \sum_k e_k(n)\,\frac{\partial e_k(n)}{\partial y_j(n)} = \sum_k e_k(n)\,\frac{\partial e_k(n)}{\partial v_k(n)}\,\frac{\partial v_k(n)}{\partial y_j(n)}$$

We have

$$e_k(n) = d_k(n) - \varphi_k\big(v_k(n)\big) \;\Rightarrow\; \frac{\partial e_k(n)}{\partial v_k(n)} = -\varphi_k'\big(v_k(n)\big), \qquad \frac{\partial v_k(n)}{\partial y_j(n)} = w_{kj}(n)$$

so that, with $\delta_k(n) = e_k(n)\,\varphi_k'\big(v_k(n)\big)$,

$$\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} = -\sum_k \delta_k(n)\, w_{kj}(n)$$
Backward propagation of error: the local gradient of hidden neuron j is obtained from the local gradients δ_k(n) of the neurons it feeds:

$$\delta_j(n) = \varphi_j'\big(v_j(n)\big) \sum_k \delta_k(n)\, w_{kj}(n)$$

Correction (delta rule), hidden or output:

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$$

For every input pattern / vector:
- Forward pass: the function signal propagates from the input, through the hidden layers, to the output.
- Backward pass: the error / local gradient flows from the output layer towards the input via the hidden layers.
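The forward and backward passes can be sketched end to end for a tiny network trained online; the 2-3-1 architecture, the AND mapping used as data, and the learning rate are all illustrative choices of mine, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(v):                        # logistic activation
    return 1.0 / (1.0 + np.exp(-v))

W1 = rng.normal(0.0, 1.0, (3, 3))  # hidden weights, last column = bias
W2 = rng.normal(0.0, 1.0, (1, 4))  # output weights, last column = bias
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0.0, 0.0, 0.0, 1.0])  # AND mapping as toy targets
eta = 0.5

for epoch in range(5000):
    for x, d in zip(X, D):
        # Forward pass: function signals flow input -> hidden -> output.
        x1 = np.append(x, 1.0)
        y1 = phi(W1 @ x1)                 # hidden activations
        x2 = np.append(y1, 1.0)
        y2 = phi(W2 @ x2)[0]              # network output

        # Backward pass: local gradients flow output -> hidden.
        e = d - y2
        delta2 = e * y2 * (1.0 - y2)                      # output neuron
        delta1 = y1 * (1.0 - y1) * (W2[0, :3] * delta2)   # hidden neurons

        # Delta rule: w += eta * delta * input signal.
        W2 += eta * delta2 * x2
        W1 += eta * np.outer(delta1, x1)

out = [phi(W2 @ np.append(phi(W1 @ np.append(x, 1.0)), 1.0))[0] for x in X]
```

After training, the outputs round to the desired responses, showing both passes working together on every input pattern.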
## Error Functions

By considering the desired responses as samples from a conditional distribution and invoking the log-likelihood, the cost to be minimized becomes

$$-E_T\big[\log p_{D \mid W,X}(d \mid w, x)\big]$$

This is nothing but the cross entropy between $d$ and $w^{\mathsf{T}}x$.
*discuss later
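Under the assumption of binary targets and a Bernoulli likelihood, the negative log-likelihood above reduces exactly to the cross-entropy cost; a small sketch (the example predictions are made up):

```python
import numpy as np

def cross_entropy(d, y):
    # Negative log-likelihood -E[log p(d | w, x)] of a Bernoulli model
    # with targets d in {0, 1} and predicted probabilities y in (0, 1).
    return -np.mean(d * np.log(y) + (1 - d) * np.log(1 - y))

# A confident correct prediction costs little; a confident wrong one
# costs a lot.
d = np.array([1.0, 0.0])
good = cross_entropy(d, np.array([0.9, 0.1]))
bad = cross_entropy(d, np.array([0.1, 0.9]))
```

This asymmetry between `good` and `bad` is what drives the network output towards the conditional probability of the target.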
The activation function needs to be continuous (differentiable).

## Derivatives of sigmoid activation functions

Logistic function:

$$\varphi_j\big(v_j(n)\big) = \frac{1}{1 + \exp\big(-a\, v_j(n)\big)}, \qquad \varphi_j'\big(v_j(n)\big) = a\, y_j(n)\,\big[1 - y_j(n)\big]$$

The derivative maximizes at y_j = 0.5, so most weight change occurs at midrange signal values.
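The identity φ'(v) = a·y(1 − y), and the fact that it peaks where y = 0.5 (i.e. at v = 0), can be confirmed numerically; the slope a = 2 is an arbitrary choice for illustration:

```python
import numpy as np

a = 2.0                                  # logistic slope parameter

def phi(v):
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-4, 4, 81)
y = phi(v)

# phi'(v) expressed purely in terms of the output y...
deriv_identity = a * y * (1 - y)

# ...matches a central finite-difference estimate of the derivative.
eps = 1e-6
deriv_numeric = (phi(v + eps) - phi(v - eps)) / (2 * eps)

# The derivative peaks at v = 0, where y = 0.5: midrange signal values
# receive the largest weight changes.
peak = v[np.argmax(deriv_identity)]
```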
Output neuron:

$$\delta_j(n) = a\, e_j(n)\, y_j(n)\,\big[1 - y_j(n)\big]$$

Hidden neuron:

$$\delta_j(n) = a\, y_j(n)\,\big[1 - y_j(n)\big] \sum_k \delta_k(n)\, w_{kj}(n)$$

Hyperbolic tangent:

$$\varphi_j\big(v_j(n)\big) = a \tanh\!\big(b\, v_j(n)\big), \qquad \varphi_j'\big(v_j(n)\big) = \frac{b}{a}\,\big[a - y_j(n)\big]\big[a + y_j(n)\big]$$
Output neuron:

$$\delta_j(n) = \frac{b}{a}\, e_j(n)\,\big[a - y_j(n)\big]\big[a + y_j(n)\big]$$

Hidden neuron:

$$\delta_j(n) = \frac{b}{a}\,\big[a - y_j(n)\big]\big[a + y_j(n)\big] \sum_k \delta_k(n)\, w_{kj}(n)$$

The gradient again maximizes at midrange signal values (for the hyperbolic tangent, at y_j = 0), so most weight change occurs there.

## What about a sign / threshold function, or a purely linear activation?

A sign / threshold function is not differentiable and yields an oscillatory, unstable network; a purely linear activation removes the nonlinearity the hidden layers need.

## Momentum: smoother trajectory in weight space, slower descent / learning towards the optimum

The generalized delta rule adds a momentum term to the weight update:

$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$$

- The same sign of the two sum terms means accelerating descent.
- Different signs of the two sum terms mean a stabilizing (decaying) effect.
Unrolling from the start time t = 0 to the current time n shows that Δw_ji(n) is the sum of an exponentially weighted time series, which converges when |α| < 1:

$$\Delta w_{ji}(n) = \eta \sum_{t=0}^{n} \alpha^{\,n-t}\, \delta_j(t)\, y_i(t)$$

Connection dependent*
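The recursive momentum update and its unrolled exponentially weighted sum can be checked against each other; the gradient sequence below is a made-up toy example:

```python
import numpy as np

eta, alpha = 0.1, 0.9            # learning rate and momentum, |alpha| < 1
g = np.array([1.0, 1.0, -0.5, 0.2, 0.2])   # toy values of delta_j(t)*y_i(t)

# Recursive form: dw(n) = alpha * dw(n-1) + eta * delta(n) * y(n)
dw = 0.0
history = []
for gn in g:
    dw = alpha * dw + eta * gn
    history.append(dw)

# Closed form: dw(n) = eta * sum_t alpha^(n-t) * delta(t) * y(t)
n = len(g) - 1
closed = eta * sum(alpha ** (n - t) * g[t] for t in range(n + 1))

# Same-sign consecutive terms accumulate (accelerating descent); the sign
# flip at t = 2 partially cancels the accumulated update (stabilizing).
```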
There is no analytical proof of convergence for back propagation.

One way: