Artificial Neural Networks
CSL465/603 - Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in
Outline
Perceptron
Stochastic Gradient Descent
Multi-layer perceptron
Backpropagation algorithm
Variants of backpropagation networks
Perceptron
Developed by Frank Rosenblatt, 1950-60
The initial version was a piece of hardware
Figure 4.8 Illustration of the Mark 1 perceptron hardware. The photograph on the left shows how the inputs
were obtained using a simple camera system in which an input scene, in this case a printed character, was
illuminated by powerful lights, and an image focussed onto a 20 × 20 array of cadmium sulphide photocells,
giving a primitive 400 pixel image. The perceptron also had a patch board, shown in the middle photograph,
which allowed different configurations of input features to be tried. Often these were wired up at random to
demonstrate the ability of the perceptron to learn without the need for precise wiring, in contrast to a modern
digital computer. The photograph on the right shows one of the racks of adaptive weights. Each weight was
implemented using a rotary variable resistor, also called a potentiometer, driven by an electric motor thereby
allowing the value of the weight to be adjusted automatically by the learning algorithm.
Perceptron
Input vector x (a column vector)
Weight vector w (a column vector)
[Figure: a perceptron unit with inputs x_0 = 1, x_1, ..., x_n and weights w_0, w_1, ..., w_n feeding a thresholded output o]
$$o = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$
Output value: $o(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w}^\top\mathbf{x} > 0 \\ -1 & \text{if } \mathbf{w}^\top\mathbf{x} \le 0 \end{cases}$
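A minimal sketch of this thresholding rule in Python with NumPy; folding the bias in as x0 = 1 follows the slide's convention, and the example weights are illustrative:

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold the weighted sum: +1 if sum_i w_i x_i > 0, else -1.
    Assumes x already includes the bias input x0 = 1."""
    return 1 if np.dot(w, x) > 0 else -1

# Illustrative weights for a 2-input unit: w0 (bias), w1, w2
w = np.array([-0.5, 1.0, 1.0])
print(perceptron_output(w, np.array([1.0, 1.0, 0.0])))  # -0.5 + 1.0 > 0 -> +1
```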
Representational Power of Perceptron (1)
The perceptron forms a hyperplane decision surface
[Figure: (a) a linearly separable dataset split by a line; (b) a dataset that no line can separate]
Decision boundary: $\mathbf{w}^\top\mathbf{x} = 0$
Datasets that can be separated by a hyperplane are linearly separable.
Representational Power of Perceptron (2)
A single perceptron can represent many Boolean functions, e.g., AND, OR, NAND, and NOR (but not XOR, as shown later)
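For instance, hand-picked weights realize AND and OR over inputs in {0, 1}; a sketch (the specific weight values are illustrative):

```python
import numpy as np

def predict(w, x):
    """Boolean perceptron: prepend the bias x0 = 1, threshold at 0."""
    return 1 if np.dot(w, np.r_[1.0, x]) > 0 else 0

w_and = np.array([-1.5, 1.0, 1.0])  # fires only when both inputs are 1
w_or = np.array([-0.5, 1.0, 1.0])   # fires when at least one input is 1

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(w_and, np.array(x, dtype=float)),
          predict(w_or, np.array(x, dtype=float)))
```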
Perceptron Criterion
$$y(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w}^\top\mathbf{x} > 0 \\ -1 & \text{if } \mathbf{w}^\top\mathbf{x} \le 0 \end{cases}$$
Using the target coding scheme $t_n \in \{+1, -1\}$, it follows that all data points should satisfy
$$\mathbf{w}^\top\mathbf{x}_n\, t_n > 0, \quad n = 1, \dots, N$$
A possible error function could be
$$E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^\top\mathbf{x}_n\, t_n$$
where $\mathcal{M}$ is the set of misclassified points.
Perceptron Convergence
Perceptron convergence theorem: if there exists an exact solution (i.e., if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps.
The weights are updated using the misclassified points:
$$\mathbf{w}^{\text{new}} = \mathbf{w}^{\text{old}} + \eta \sum_{n \in \mathcal{M}} t_n\, \mathbf{x}_n$$
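A sketch of this batch update, assuming targets t_n in {+1, -1} and inputs carrying a bias column (function and variable names are mine):

```python
import numpy as np

def perceptron_train(X, t, eta=1.0, max_epochs=100):
    """Batch perceptron learning.
    X: (N, d+1) inputs including a bias column; t: (N,) targets in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mis = t * (X @ w) <= 0          # the misclassified set M
        if not mis.any():               # exact solution reached
            return w
        w = w + eta * (t[mis, None] * X[mis]).sum(axis=0)
    return w
```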
Stochastic Approximation
Practical difficulties with gradient descent:
Convergence to a local minimum can be slow
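The stochastic alternative updates the weights immediately after each misclassified sample instead of once per pass; a sketch under the same assumptions as above:

```python
import numpy as np

def perceptron_sgd_epoch(w, X, t, eta=1.0):
    """One stochastic pass: update w on each misclassified sample."""
    for x_n, t_n in zip(X, t):
        if t_n * np.dot(w, x_n) <= 0:   # x_n is currently misclassified
            w = w + eta * t_n * x_n     # single-sample update
    return w
```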
Limitation of Perceptron
Can represent only linear functions
Example: the XOR function. A perceptron with weights $w_0, w_1, w_2$ would need to satisfy:
$w_0 \le 0$ (input $(0,0)$, output 0)
$w_2 + w_0 > 0$ (input $(0,1)$, output 1)
$w_1 + w_0 > 0$ (input $(1,0)$, output 1)
$w_1 + w_2 + w_0 \le 0$ (input $(1,1)$, output 0)
Adding the two middle inequalities gives $w_1 + w_2 + 2w_0 > 0$, while the first and last give $w_1 + w_2 + 2w_0 \le 0$: a contradiction, so no such weights exist.
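A quick numerical illustration (random search over weight vectors; illustrative, not a proof) that no linear threshold unit reproduces XOR:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias, x1, x2
t = np.array([0, 1, 1, 0])                                   # XOR targets

# Try many random weight vectors; none reproduces the XOR labelling.
found = any(np.array_equal((X @ w > 0).astype(int), t)
            for w in rng.uniform(-5.0, 5.0, size=(100_000, 3)))
print("linear separator found:", found)  # False
```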
Multilayer Perceptron
Architecture of the multilayer network
Nodes at the hidden layer
Algorithm to learn the weights of the connections between nodes
[Figure 4.5: decision regions of a multilayer feedforward network trained to recognize vowel sounds in "h_d" words (head, hid, who'd, hood, ...), plotted against the spectral parameters F1 and F2]
Sigmoid Unit
[Figure: a sigmoid unit with inputs x_0 = 1, x_1, ..., x_n, weights w_0, w_1, ..., w_n, and output o]
$$net = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(net) = \frac{1}{1 + e^{-net}}$$
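A direct transcription of the sigmoid unit in Python (NumPy assumed):

```python
import numpy as np

def sigmoid(net):
    """Logistic activation; its derivative is sigmoid(net) * (1 - sigmoid(net)),
    which backpropagation exploits later."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_unit(w, x):
    """o = sigma(sum_i w_i x_i), with x including the bias input x0 = 1."""
    return sigmoid(np.dot(w, x))
```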
[Figure: a two-layer feedforward network mapping inputs x through hidden units z_j to outputs y_k]
Input: $\mathbf{x}$
Connection weights between input and hidden layer: $w_{ji}$
Output of the $j^{\text{th}}$ hidden layer node: $z_j = \sigma(\mathbf{w}_j^\top \mathbf{x})$
Connection weights between hidden and output layer: $v_{kj}$
Output of the $k^{\text{th}}$ output layer node: $y_k = \sigma(\mathbf{v}_k^\top \mathbf{z})$
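A sketch of the full forward pass under these definitions, folding in the bias terms x0 = 1 and z0 = 1 (matrix shapes and names are my assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(W, V, x):
    """Forward pass through a two-layer network.
    W: (M, d+1) input-to-hidden weights w_ji (bias column first)
    V: (K, M+1) hidden-to-output weights v_kj
    x: (d,) input vector."""
    z = sigmoid(W @ np.r_[1.0, x])   # hidden outputs z_j = sigma(w_j . x)
    y = sigmoid(V @ np.r_[1.0, z])   # network outputs y_k = sigma(v_k . z)
    return z, y
```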
Error function (squared error):
$$E = \frac{1}{2}\sum_{k=1}^{K}(t_k - y_k)^2$$
Weight update for the output layer:
$$\Delta v_{kj} = \eta\,(t_k - y_k)\,y_k(1 - y_k)\,z_j$$
Weight update for the hidden layer:
$$\Delta w_{ji} = \eta \sum_{k=1}^{K}(t_k - y_k)\,y_k(1 - y_k)\,v_{kj}\,z_j(1 - z_j)\,x_i$$
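These updates can be computed from the quantities returned by the forward pass above; a sketch (bias handling and names are my assumptions):

```python
import numpy as np

def backprop_step_sq(W, V, x, z, y, t, eta=0.1):
    """One squared-error gradient step, given a forward pass (z, y)."""
    delta_k = (t - y) * y * (1 - y)                  # output-layer term
    delta_j = z * (1 - z) * (V[:, 1:].T @ delta_k)   # hidden-layer term
    V += eta * np.outer(delta_k, np.r_[1.0, z])      # Delta v_kj
    W += eta * np.outer(delta_j, np.r_[1.0, x])      # Delta w_ji
    return W, V
```

V[:, 1:] skips the bias weights v_k0, which have no hidden-unit counterpart in the backpropagated sum.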
Termination criteria
All training samples are correctly classified
Error between two consecutive epochs does not change significantly
Have a limit on the number of epochs
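A sketch of combining the three criteria into a single stopping check (thresholds are illustrative):

```python
def should_stop(epoch, err, prev_err, n_misclassified,
                max_epochs=1000, tol=1e-6):
    """True if any of the three termination criteria is met."""
    return (n_misclassified == 0            # all samples correct
            or abs(prev_err - err) < tol    # error change is insignificant
            or epoch >= max_epochs)         # epoch budget exhausted
```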
Error function (cross-entropy):
$$E = -\sum_{k=1}^{K} t_k \log y_k$$
Weight update equations:
$$\Delta v_{kj} = \eta\,(t_k - y_k)\,z_j$$
$$\Delta w_{ji} = \eta \sum_{k=1}^{K}(t_k - y_k)\,v_{kj}\,z_j(1 - z_j)\,x_i$$
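A sketch of one epoch of per-sample training with these cross-entropy updates; note the output delta is simply t_k - y_k here (shapes and names as in the earlier sketches):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_epoch_xent(W, V, X, T, eta=0.1):
    """One pass of per-sample cross-entropy updates over the dataset."""
    for x, t in zip(X, T):
        z = sigmoid(W @ np.r_[1.0, x])                 # forward: hidden layer
        y = sigmoid(V @ np.r_[1.0, z])                 # forward: output layer
        delta_k = t - y                                # no y(1 - y) factor here
        delta_j = z * (1 - z) * (V[:, 1:].T @ delta_k)
        V += eta * np.outer(delta_k, np.r_[1.0, z])    # Delta v_kj
        W += eta * np.outer(delta_j, np.r_[1.0, x])    # Delta w_ji
    return W, V
```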
Momentum
$$\Delta \mathbf{w}(\tau) = -\eta\,\nabla E(\mathbf{w}) + \alpha\,\Delta \mathbf{w}(\tau - 1)$$
where $\alpha$ is the momentum coefficient.
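A sketch of a single momentum step (the learning rate and momentum coefficient values are illustrative):

```python
def momentum_step(w, grad, dw_prev, eta=0.1, alpha=0.9):
    """dw(tau) = -eta * grad E + alpha * dw(tau - 1); returns (w, dw)."""
    dw = -eta * grad + alpha * dw_prev
    return w + dw, dw
```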
Outputs
[Figure: evolution of the network outputs over 2500 training iterations]
Overfitting (1)
Overtraining could result in overfitting!
[Figure: two plots of training set error and test set error versus training iterations]
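The usual remedy suggested by such curves is early stopping on a held-out set; a hedged sketch where `step` and `error` are caller-supplied callables (both names are mine):

```python
def train_with_early_stopping(step, error, max_epochs=20000, patience=10):
    """step(): run one training epoch; error(): held-out (test/validation) error.
    Stop once the held-out error has not improved for `patience` epochs."""
    best = float("inf")
    epochs_since_best = 0
    for _ in range(max_epochs):
        step()
        err = error()
        if err < best:
            best, epochs_since_best = err, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best
```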
Overfitting (2)
A network with $d + 1$ inputs ($d$ features plus a bias), $K$ outputs, and $M + 1$ hidden layer units has a total of $M(d + 1) + K(M + 1)$ parameters to be learned!
$M$ determines the complexity of the model:
Large values of $M$: complex functions that are prone to overfitting
Small values of $M$: simpler functions that underfit the data
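As a quick check of the formula:

```python
def n_params(d, M, K):
    """M*(d+1) input-to-hidden weights plus K*(M+1) hidden-to-output weights."""
    return M * (d + 1) + K * (M + 1)

print(n_params(d=4, M=5, K=3))  # 5*5 + 3*6 = 43
```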
Regularization by weight decay: penalize large weights in the error function:
$$\tilde{E} = E + \frac{\lambda}{2}\|\mathbf{w}\|^2$$
Grouping of connections (weight sharing):
Force a set of connections to have the same weight
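The corresponding gradient step simply adds a shrinkage term; a sketch (the lambda value is illustrative):

```python
def weight_decay_step(w, grad_E, eta=0.1, lam=1e-3):
    """Gradient of E + (lam/2)||w||^2 adds a lam * w shrinkage term."""
    return w - eta * (grad_E + lam * w)
```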
[Figure: a convolutional network: input image → convolutional layer → sub-sampling layer]
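A minimal sketch of these two layers with NumPy, showing the weight sharing: one small kernel is reused at every image position:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (valid positions only).
    Weight sharing: every output position reuses the same kernel weights."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def subsample(fmap, s=2):
    """Sub-sampling layer: average over non-overlapping s x s blocks."""
    H = (fmap.shape[0] // s) * s
    W = (fmap.shape[1] // s) * s
    return fmap[:H, :W].reshape(H // s, s, W // s, s).mean(axis=(1, 3))
```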
Recurrent Networks
Neural network model applied to time-series data
Outputs of the network at time t are also inputs to other units at time t+1
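A minimal sketch of this feedback loop (the tanh nonlinearity and the dimensions are my assumptions):

```python
import numpy as np

def rnn_step(W_in, W_rec, x_t, h_prev):
    """The state at time t sees the input x_t and the previous state,
    fed back through the recurrent weights W_rec."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev)

# Unroll over a hypothetical 10-step, 3-feature time series
rng = np.random.default_rng(0)
W_in, W_rec = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
h = np.zeros(4)
for x_t in rng.normal(size=(10, 3)):
    h = rnn_step(W_in, W_rec, x_t, h)
```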
Summary
Perceptron
Linearly separable data
Perceptron criterion
Multilayer perceptrons
Non-linear activation functions
Hidden layer units
Backpropagation algorithm for training