
Multilayer Perceptron

1
Classifying Nonlinearly Separable Patterns

Example: XOR Problem


Unit hypercube classification

A single perceptron draws a line, so it cannot separate the XOR classes!

A solution is given in the book.
[Figure: a two-layer network with inputs x1 and x2 solving XOR.]
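A minimal sketch (Python/NumPy) of such a two-layer solution with threshold units; the AND/OR weights below are illustrative hand-picked values, not necessarily the exact ones in the book:

    import numpy as np

    def step(v):
        """Heaviside threshold activation."""
        return (v >= 0).astype(float)

    # Hidden layer: neuron 1 computes AND, neuron 2 computes OR.
    W_hidden = np.array([[1.0, 1.0],      # hidden neuron 1 (AND)
                         [1.0, 1.0]])     # hidden neuron 2 (OR)
    b_hidden = np.array([-1.5, -0.5])     # AND fires only for (1,1); OR for any 1

    # Output neuron computes OR AND (NOT AND), i.e. XOR.
    w_out = np.array([-2.0, 1.0])
    b_out = -0.5

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    h = step(X @ W_hidden.T + b_hidden)   # hidden activations
    y = step(h @ w_out + b_out)           # network outputs: 0, 1, 1, 0

    for x, out in zip(X, y):
        print(x, "->", out)               # the XOR truth table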

2
Tricks of the Trade: Heuristics for making BackProp perform*

Stochastic versus batch update

• Stochastic update is equally effective but faster than batch update when
the training samples are highly redundant! (See the sketch below.)

Maximizing Information Content (in training samples)

Use an example:
• that results in the largest training error
• that is radically different from all those previously used

Randomize the order of the (same) training samples from epoch to epoch

* For ‘shallow’ nets
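A minimal sketch of these heuristics on a toy least-mean-squares problem (the data, learning rate, and epoch count below are placeholders, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy problem: noisy linear targets (placeholder data).
    X = rng.normal(size=(100, 3))
    d = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=100)

    w = np.zeros(3)
    eta = 0.01                            # learning rate

    for epoch in range(20):
        order = rng.permutation(len(X))   # randomize the order every epoch
        for i in order:                   # stochastic: update after each sample
            e = d[i] - X[i] @ w           # training error on this sample
            w += eta * e * X[i]           # gradient step on this sample alone
        # A batch update would instead sum the gradient over all samples and
        # apply it once per epoch -- wasteful when the samples are redundant.
        # An emphasizing scheme would present the largest-error samples more
        # often, maximizing the information content of each update.

    print(w)                              # close to [0.5, -1.0, 2.0]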


3
Activation function choice (problem specific)
The choice is problem specific, but in general (for NNs):
• an odd (antisymmetric) function
• a sigmoid, e.g. the scaled hyperbolic tangent φ(v) = a·tanh(b·v)
• with a = 1.7159 and b = 2/3: φ(±1) = ±1 and the derivative at the
origin is close to 1 (φ'(0) = a·b ≈ 1.14)

[Figure: the scaled tanh, showing its almost linear region around the origin.]
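A small sketch of this activation in Python; the printed values simply verify the properties stated above:

    import numpy as np

    a, b = 1.7159, 2.0 / 3.0

    def phi(v):
        """Scaled hyperbolic tangent: phi(v) = a * tanh(b * v)."""
        return a * np.tanh(b * v)

    def phi_prime(v):
        """Derivative: a * b * (1 - tanh(b * v)**2)."""
        return a * b * (1.0 - np.tanh(b * v) ** 2)

    print(phi(1.0))        # ~1.0: phi(+-1) = +-1
    print(phi_prime(0.0))  # ~1.14 (= a*b): slope near unity at the origin
    print(phi(np.inf))     # 1.7159: the limiting values are +-a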

4
Target Values

• Target values (desired responses) should be chosen well within the
range of the sigmoid activation function (say ±1).

Otherwise the hidden neurons are driven into saturation, making the
learning process ill-behaved or very slow.

For the scaled hyperbolic tangent (a = 1.7159):

• the limiting values are ±a = ±1.7159
• the targets of ±1 are offset inward from those limits by ε = 0.7159

• A better way is to manage the input to the activation function!
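A short numeric sketch of why this matters: inverting the activation gives the local field a neuron would need to hit a given target (the target values below are illustrative):

    import numpy as np

    a, b = 1.7159, 2.0 / 3.0

    def required_input(d):
        """Local field v such that a * tanh(b * v) == d."""
        return np.arctanh(d / a) / b

    print(required_input(1.0))   # v = 1.0: target well inside the range
    print(required_input(1.7))   # v ~= 4.0: deep in the saturated tail
    # required_input(1.7159) is infinite -- targets at the limiting value
    # drive the weights toward infinity and stall learning.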

5
Normalizing / Scaling inputs (Problem Specific)

• Quicker tuning of the weights.
• Using PCA will speed up learning, as redundancy is removed.
• Making the different weights learn at the same speed (balanced).

[Figure: the input preprocessing pipeline; annotations read "destroys structure".]
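A minimal sketch of a standard three-step preprocessing pipeline consistent with these points, assuming mean removal, PCA decorrelation, and covariance equalization (the synthetic data and mixing matrix are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    mix = np.array([[2.0, 0.5, 0.0],
                    [0.0, 1.0, 0.3],
                    [0.0, 0.0, 0.2]])
    X = rng.normal(size=(500, 3)) @ mix       # correlated, unequal-scale inputs

    # 1. Mean removal: each input variable averages to zero.
    Xc = X - X.mean(axis=0)

    # 2. Decorrelation via PCA: rotate onto the covariance eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Xd = Xc @ eigvecs                         # redundancy (correlation) removed

    # 3. Covariance equalization: scale each component to unit variance so
    #    the corresponding weights learn at roughly the same speed.
    Xw = Xd / np.sqrt(eigvals)

    print(np.cov(Xw, rowvar=False).round(3))  # ~identity matrix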
6
Initialization:
A good choice of synaptic weights and thresholds

Synaptic weights should be neither too large (the neurons saturate and
learning slows down) nor too small (the network operates near a saddle
point of the error surface, where the derivative changes slowly).

Choose the standard deviation of the induced local field to lie at the
transition between the linear and saturation regions of the activation
function; for the tanh with the given $a$ and $b$, this means $\sigma_v = 1$.

Assumptions: the inputs have zero mean and are uncorrelated with unit
variance, $E[y_i] = 0$ and $E[y_i y_k] = \delta_{ik}$ ($1$ when $i = k$, else $0$);
the weights are drawn from a uniform distribution with zero mean,
$E[w_{ji}] = 0$, and variance $\sigma_w^2$.

Then the induced local field $v_j = \sum_{i=1}^{m} w_{ji} y_i$ has variance
$\sigma_v^2 = m\,\sigma_w^2$, so setting $\sigma_v = 1$ gives $\sigma_w = m^{-1/2}$,
where $m$ is the number of synaptic connections (fan-in) of the neuron.
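A sketch of this rule, assuming the scaled tanh above: draw the weights from a zero-mean uniform distribution with standard deviation $m^{-1/2}$ (the fan-in of 64 is arbitrary) and check that the induced local field has roughly unit standard deviation:

    import numpy as np

    rng = np.random.default_rng(0)

    m = 64                               # fan-in: synaptic connections per neuron
    sigma_w = m ** -0.5                  # target weight std: sigma_w = m^(-1/2)
    half = sigma_w * np.sqrt(3.0)        # uniform on [-h, h] has std h / sqrt(3)
    w = rng.uniform(-half, half, size=m)

    # Inputs assumed zero mean, unit variance, uncorrelated.
    Y = rng.normal(size=(10000, m))
    v = Y @ w                            # induced local fields

    print(w.std())                       # ~= sigma_w
    print(v.std())                       # ~= 1: at the edge of the linear region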
