COMP4302/COMP5322, Lecture 8
NEURAL NETWORKS
RBF networks consist of 2 layers:
- Non-linear first layer (Gaussian transfer functions typically used)
- Linear output layer
- Fully connected network: all neurons of the previous layer are connected with all of the next layer
Vacations example
- How much you enjoy vacations as a function of their duration

  Ex.   Duration (days)   Rating (1-10)
  1     4                 5
  2     7                 9
  3     9                 2
  4     12                6

Interpolation Task
- Given: a set of input-output samples, e.g. the table above
- Goal: construct a function y that produces the correct output for every sample input
How to construct y?
- y is a weighted sum of other functions $f_i$:

$$y(x_1, \dots, x_n) = \sum_{i=1}^{k} w_i f_i(x_1, \dots, x_n)$$

- each $f_i$ depends on the distance between the current input vector $x$ and the i-th sample input vector from the database $c_i$: $f_i(x) = \varphi_i(\|x - c_i\|)$
- i.e. each sample from the database (training data) is a reference point (center) established by the i-th input-output pair
- $f_i$ gets its maximum value when $x$ is close to $c_i$
- each $f_i$ depends on only one center => it specializes in handling the influence of the i-th sample on future predictions
Thus:

$$y(x) = \sum_{i=1}^{k} w_i \, e^{-\|x - c_i\|^2 / (2\sigma^2)}, \qquad \varphi_i(\|x - c_i\|) = e^{-\|x - c_i\|^2 / (2\sigma^2)}$$

The output layer adds the weighted outputs of the hidden layer.
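As a concrete illustration, here is a minimal sketch of this forward pass in Python/NumPy; the function name and the width parameter sigma are illustrative assumptions, not from the lecture.

```python
import numpy as np

def rbf_forward(x, centers, weights, sigma=1.0):
    """Evaluate y(x) = sum_i w_i * exp(-||x - c_i||^2 / (2 sigma^2))."""
    # Squared Euclidean distance from x to every center c_i
    sq_dists = np.sum((centers - x) ** 2, axis=1)
    # Gaussian activation of each hidden neuron
    hidden = np.exp(-sq_dists / (2 * sigma ** 2))
    # The linear output layer adds the weighted hidden outputs
    return weights @ hidden
```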
It is still possible (and easy) to compute a set of weights such that the correct output is produced for each input-output example:
$$y(x_1) = w_1 e^{-\|x_1 - x_1\|^2 / (2\sigma^2)} + w_2 e^{-\|x_1 - x_2\|^2 / (2\sigma^2)} + w_3 e^{-\|x_1 - x_3\|^2 / (2\sigma^2)} + w_4 e^{-\|x_1 - x_4\|^2 / (2\sigma^2)}$$
$$y(x_2) = w_1 e^{-\|x_2 - x_1\|^2 / (2\sigma^2)} + w_2 e^{-\|x_2 - x_2\|^2 / (2\sigma^2)} + w_3 e^{-\|x_2 - x_3\|^2 / (2\sigma^2)} + w_4 e^{-\|x_2 - x_4\|^2 / (2\sigma^2)}$$
$$y(x_3) = w_1 e^{-\|x_3 - x_1\|^2 / (2\sigma^2)} + w_2 e^{-\|x_3 - x_2\|^2 / (2\sigma^2)} + w_3 e^{-\|x_3 - x_3\|^2 / (2\sigma^2)} + w_4 e^{-\|x_3 - x_4\|^2 / (2\sigma^2)}$$
$$y(x_4) = w_1 e^{-\|x_4 - x_1\|^2 / (2\sigma^2)} + w_2 e^{-\|x_4 - x_2\|^2 / (2\sigma^2)} + w_3 e^{-\|x_4 - x_3\|^2 / (2\sigma^2)} + w_4 e^{-\|x_4 - x_4\|^2 / (2\sigma^2)}$$

Setting each $y(x_i)$ equal to the corresponding target gives four linear equations in the four weights $w_1, \dots, w_4$.
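A minimal sketch of solving this system for the vacation data with NumPy; the width value is an assumption (the table below reports the solutions for three different widths).

```python
import numpy as np

# Durations (inputs) and ratings (targets) from the vacations example
x = np.array([4.0, 7.0, 9.0, 12.0])
t = np.array([5.0, 9.0, 2.0, 6.0])

sigma2 = 1.0  # sigma^2; assumed value, try e.g. 1, 4, 16

# Interpolation matrix: Phi[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma2))

# Solve Phi @ w = t so the network reproduces every sample exactly
w = np.linalg.solve(Phi, t)
print(w)
```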
Weights obtained for three different widths:

  Width   w1       w2       w3        w4
  1       4.90     8.84     0.73      5.99
  4       0.87     13.93    -9.20     8.37
  16      -76.50   236.49   -237.77   87.55
To build the interpolation network:
- For each given sample (training set example), create a neuron in the first layer centered on the sample input vector
- Create a system of linear equations (one equation per sample)
- Choose a width for the Gaussians and solve the system for the weights $w_i$
As the number of neurons in the first layer is lower than the number of samples, a set of weights such that the network produces the correct output for all samples does not exist. However, a set of weights that results in a reasonable approximation for all samples can be found. One way is to use gradient descent to minimize the error (as in ADALINE):
$$\Delta w_i = \eta \sum_k e(k)\, p_i(k) = \eta \sum_k (t_k - y_k)\, p_i(k) = \eta \sum_k (t_k - y_k)\, e^{-\|x_k - c_i\|^2 / (2\sigma^2)}$$

$$w_i(k+1) = w_i(k) + \eta\, (t_k - y_k)\, e^{-\|x_k - c_i\|^2 / (2\sigma^2)}$$
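A minimal sketch of this LMS weight update on the vacation data, assuming a learning rate, width, and epoch count that the slides do not specify; the initial weights and centers are taken from the table below.

```python
import numpy as np

x = np.array([4.0, 7.0, 9.0, 12.0])   # durations
t = np.array([5.0, 9.0, 2.0, 6.0])    # ratings

c = np.array([7.0, 12.0])             # two fixed centers (from the table below)
w = np.array([8.75, 5.61])            # initial weights (from the table below)
eta, sigma2 = 0.01, 4.0               # assumed learning rate and width

for epoch in range(1000):
    for xk, tk in zip(x, t):
        p = np.exp(-(xk - c) ** 2 / (2 * sigma2))  # hidden activations p_i(k)
        yk = w @ p                                  # network output
        w += eta * (tk - yk) * p                    # LMS update of each weight
print(w)
```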
Training the weights only (the two centers stay fixed):

           w1     w2     c1   c2
  Initial  8.75   5.61   7    12
  Final    7.33   4.47   7    12
If there are many samples, the number of neurons in the first layer of the interpolation nets will be high.
Solution: approximate the unknown function with a smaller number of neurons that are somehow representative.
For each sample:
- Compute the distance between the sample input and each of the centers
- Compute the Gaussian function of each distance
- Multiply each Gaussian function by the corresponding weight of the neuron
- Equate the sample output with the sum of the weighted Gaussian functions of distance
The centers can also be adjusted by gradient descent:

$$\Delta c_{ij} = \eta \sum_k w_i\, (t_k - y_k)\, e^{-\|x_k - c_i\|^2 / (2\sigma^2)}\, (x_{kj} - c_{ij})$$
Training both the weights and the centers:

           w1     w2     c1   c2
  Initial  8.75   5.61   7    12
  Final    9.13   8.06   6    13.72

Adjustment of both weights and centers provides a better approximation than adjustment of either of them alone.
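A minimal sketch combining the two update rules (weights and centers) on the same data; the learning rate, width, and epoch count are assumed values.

```python
import numpy as np

x = np.array([4.0, 7.0, 9.0, 12.0])
t = np.array([5.0, 9.0, 2.0, 6.0])

c = np.array([7.0, 12.0])    # initial centers
w = np.array([8.75, 5.61])   # initial weights
eta, sigma2 = 0.01, 4.0      # assumed learning rate and width

for epoch in range(1000):
    for xk, tk in zip(x, t):
        p = np.exp(-(xk - c) ** 2 / (2 * sigma2))
        err = tk - w @ p
        dw = eta * err * p                 # weight update rule
        dc = eta * err * w * p * (xk - c)  # center update rule
        w += dw
        c += dc
print(w, c)
```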
RBF networks for approximation with 3 different widths have been created
Try demorb1 (fit a function with an RBF NN), demorb3 (width too small) and demorb4 (width too large)!
Classification example (the XOR problem):

  Input    Output
  (0, 0)   0
  (1, 1)   0
  (0, 1)   1
  (1, 0)   1

Define a pair of Gaussian functions:

$$c_1 = (1, 1), \quad \varphi_1(x) = e^{-\|x - c_1\|^2}$$
$$c_2 = (0, 0), \quad \varphi_2(x) = e^{-\|x - c_2\|^2}$$
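The point of this transformation is that the two classes become linearly separable in the space of the Gaussian activations; a short sketch, using the table above:

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

c1, c2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

# Hidden-layer activations: one Gaussian per center
phi1 = np.exp(-np.sum((X - c1) ** 2, axis=1))
phi2 = np.exp(-np.sum((X - c2) ** 2, axis=1))

for (x1, x2), p1, p2, target in zip(X, phi1, phi2, y):
    print(f"({x1:.0f},{x2:.0f}) -> phi = ({p1:.3f}, {p2:.3f})  target = {target}")
# (0,0) and (1,1) map to (0.135, 1) and (1, 0.135), while (0,1) and (1,0)
# both map to (0.368, 0.368): a single straight line now separates the classes.
```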
Typically there is one output neuron for each class, i.e. binary encoding of the targets.

[Figure: "Correct classifications" - the training points in the (Input 1, Input 2) plane, each labeled <0.5 or >0.5 according to the desired output]
[Figure: "ANN classifications" - the network's outputs in the (Input 1, Input 2) plane, with misclassified points marked]
6. Initialize the weights between the first and output layer to small random values
7. Train the weights between the first and output layer using the LMS algorithm
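A minimal sketch of steps 6 and 7 on the XOR example above; the bias term, learning rate, and epoch count are assumptions not given in the slides.

```python
import numpy as np

# XOR data and the two Gaussian hidden neurons defined earlier
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
t = np.array([0.0, 0.0, 1.0, 1.0])
centers = np.array([[1.0, 1.0], [0.0, 0.0]])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)      # step 6: small random weights (+ bias)
eta = 0.3                              # assumed learning rate

for epoch in range(500):               # step 7: LMS on the hidden activations
    for xk, tk in zip(X, t):
        phi = np.exp(-np.sum((xk - centers) ** 2, axis=1))
        p = np.append(phi, 1.0)        # append a constant bias input
        w += eta * (tk - w @ p) * p

for xk, tk in zip(X, t):
    p = np.append(np.exp(-np.sum((xk - centers) ** 2, axis=1)), 1.0)
    print(xk, "->", round(w @ p, 2), " target:", tk)
```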
An RBF network has a simple architecture (2 layers: nonlinear and linear) with a single hidden layer; an MLP may have one or more hidden layers.
MLPs may have a complex pattern of connectivity (not fully connected), while an RBF network is fully connected.
The hidden layer of an RBF net is non-linear and the output layer is linear. The hidden and output layers of an MLP used as a classifier are usually nonlinear; when MLPs are used for nonlinear regression, a linear output layer is preferred. Thus, MLPs may have a variety of activation functions.
Representation formed (in the space of hidden neuron activations with respect to the input space):
An MLP is said to form a distributed representation, since for a given input vector many hidden neurons will typically contribute to the output value.
RBF nets form a local representation, as for a given input vector only a few hidden neurons will have significant activation and contribute to the output.
Training: in an MLP, the parameters are usually determined at the same time, as part of a single global training strategy involving supervised learning.
An RBF net is typically trained in 2 stages, with the RBFs being determined first by an unsupervised technique using the given data alone, and the weights between the hidden and output layer then being found by fast linear supervised methods.
Number of neurons
- MLPs typically have fewer neurons than RBF NNs, because MLP neurons respond to large regions of the input space.
- RBF nets may require many RBF neurons to cover the input space (a larger number of input vectors and larger ranges of the input vectors mean more RBF neurons are needed). However, they are capable of fast learning and have reduced sensitivity to the order of presentation of training data.
One training strategy starts with a single RBF neuron and grows the network incrementally until the error falls below the goal.
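A minimal sketch of such an incremental strategy, assuming a greedy rule (add a neuron centered on the worst-fit sample, then refit the output weights); the function name, width, and error goal are illustrative.

```python
import numpy as np

def grow_rbf(x, t, sigma2=4.0, goal=0.1, max_neurons=10):
    """Add one RBF neuron at a time until the mean squared error < goal."""
    centers = [x[0]]  # start with a single neuron
    while True:
        C = np.array(centers)
        # Hidden activations of every sample for the current centers
        Phi = np.exp(-(x[:, None] - C[None, :]) ** 2 / (2 * sigma2))
        # Refit the linear output weights by least squares
        w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
        err = t - Phi @ w
        if np.mean(err ** 2) < goal or len(centers) >= max_neurons:
            return C, w
        centers.append(x[np.argmax(np.abs(err))])  # neuron at worst-fit sample

# Vacation data again: with enough neurons this reduces to interpolation
x = np.array([4.0, 7.0, 9.0, 12.0])
t = np.array([5.0, 9.0, 2.0, 6.0])
C, w = grow_rbf(x, t)
print(len(C), "neurons, centers:", C)
```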