
Artificial Neural Networks

Module 2

Simple Neural
Networks

UTM
1

Module 2 Contents

2.0 Simple Neural Networks

 2.1 Historical Perspectives
 2.2 Some Issues on ANN
     Biases and Thresholds
     Linear Separability
 2.3 Simple Neural Networks
     The McCulloch-Pitts Neuron
     The Perceptrons
 2.4 Summary of Module 2

Module 2 Objectives
 From this module, the student should be able to understand
neural networks from a historical point of view.
 Understanding the limitations of primitive neural networks can
provide new insights into newer neural network paradigms.
 This module will allow the student to understand several
primitive neural networks, such as the McCulloch-Pitts neuron,
Hebb neural networks, and the Perceptrons, and how they are used
to solve simple pattern classification problems.
 To study some of the important early developments in neural
networks.
 To be able to understand how they have been used in pattern
classification.
 To see their limitations and the need for better models and
learning algorithms. 3

ANN ~ What is it?


• An information processing system, developed based on the
biological brain.
• Consists of simple processing units called neurons (nodes).
• All the neurons are assumed to be interconnected.
• Neurons are activated by a certain (activation) function.
• For each interconnection, there is an associated weight
(adjustable gains).
• Signals are propagated from input to output by multiplying
the activation signals by the connection weights.
• A learning rule is needed to adapt the weights to solve a
particular problem.
• Training patterns are needed in order to train the ANN.
4

2.1 Historical Perspectives of ANN

[Figure: a biological neuron alongside its artificial counterpart.
Inputs x1, x2, ..., xN are multiplied by the weights W11, W12, ...,
W1N, summed, and passed through an activation function f() to
produce the output O.]

    sum = W11·x1 + W12·x2 + ... + W1N·xN,    O = f(sum)


2.1 Historical Perspectives of ANN

Some Important Names in ANN Research:

 McCulloch and Pitts … 1943 (1st Neuron Model)

 Donald Hebb …. 1949 (1st Learning Rule)

 Marvin Minsky …. 1951 (1st Neural Machine)

 Rosenblatt …. 1958 (Perceptrons)


6

Why do we need to study
from a historical
perspective?
 It is important to study ANNs from a historical perspective
so that we can recognize the pitfalls of primitive ANNs and
the measures researchers took to overcome those limitations.

 Through this knowledge, we can discover new ideas and
insights into better neural models, which can help us solve
more demanding applications.

2.1.1 The Early Years

 As mentioned, the first artificial neuron was formulated by
McCulloch and Pitts in 1943.

 However, this neuron does not have any learning algorithm,
which means the weights are not adapted.

 It dealt with variable inputs that were multiplied by fixed
synaptic weights, with the products summed.

 If the sum exceeded the neuron’s threshold, the neuron fired;
if not, it remained off.

 Perhaps the first learning algorithm was proposed by Donald
Hebb, a psychologist at McGill University, Canada, in 1949, in
his book entitled “The Organization of Behavior”.
8

 Hebb’s learning rule (referred to as “Hebbian learning”) has a
number of limitations (see 2.3.3).

 Another notable researcher was Marvin Minsky, who developed
the first neural machine in 1951 in New York.

 The 1950s and 1960s are referred to as “The First Golden Age
of Neural Networks”; many researchers proposed neural network
techniques during this period (see the tables that follow).

 In 1958, Frank Rosenblatt from Cornell Aeronautical Laboratory
put together a learning machine, called the perceptron, that
was destined to be the forerunner of the ANNs of the 1980s and
1990s.

 It combined the McCulloch-Pitts and Hebb models, and it did so
in a functioning piece of hardware.

 Rosenblatt’s machine was complicated and bulky, using motor-
driven potentiometers to represent variable synapses. 9

 To a certain extent, Rosenblatt showed that perceptrons were
able to solve many classification problems.

 He demonstrated the capabilities of the perceptron and even cut
a bunch of arbitrarily selected wires in a densely interconnected
multinode perceptron, and it would still work, though with some
degradation.

 However, it was later discovered that Rosenblatt’s perceptrons
were only able to solve linearly separable problems.

 In 1969, Minsky and Papert of MIT published an in-depth
analysis of the perceptrons, demonstrating that the classes of
inputs a perceptron could distinguish were very limited.

 This stalled research in ANNs, and from the 1970s the field
entered its “Quiet Years”.

10

The Perceptrons

The Mark I perceptron patchboard. The connection patterns were
typically “random”, so as to illustrate the ability of the perceptron
to learn the desired pattern without the need for precise wiring (in
contrast to the precise wiring required in a programmed computer).

11

Mark I Perceptron

Charles Wightman holding a sub-rack of 8 motor/potentiometer


pairs. Each motor/potentiometer pair functioned as a single
adaptive weight value. The perceptron learning law was
implemented in analog circuits that would control the motor of
each potentiometer (the resistance of which functioned to
implement one weight).
12

Mark I Perceptron

The Mark I Perceptron image input system being adjusted by Charles
Wightman, Mark I Perceptron project engineer. A printed character
was mounted on the board and illuminated with four floodlights. The
image of the character was focused on a 20x20 array of CdS
photoconductors, which provided 400 pixel values for use as
inputs to the neural network (which then attempted to classify the
figure into one of M classes).
13

Research in ANNs - 1940s ~ 1960s (The Early Years)

 1943: McCulloch-Pitts - 1st artificial neuron.


 1949: D. Hebb - 1st learning algorithm
 1951: Minsky - 1st neural machine
 1956: Rochester, Holland, et al. - Tests on a cell assembly
theory of the action of the brain
 1958: Von Neumann - attempted to model the brain
 1958: Rosenblatt - the perceptrons
 1960: Widrow and Hoff - LMS algorithm/Adaline
- Demonstrated the 1st application of neural
networks in echo cancellation on telephone lines.
 1969: Minsky and Papert: In-depth analysis of the
perceptrons. Published a book on the limitations
of the perceptrons.
14

Research in ANNs - 1970s ~ 1980s (The Quiet Years)

 1972: Kohonen - work on associative memory NNs

 1972: James Anderson - developed the “Brain-State-in-a-


Box” neural network

 1975: Fukushima at NHK Laboratories in Tokyo developed
a series of specialized NNs for character recognition
called the cognitron; it failed, however, to recognize
position- or rotation-distorted characters (a limitation
later addressed by the neocognitron).

 1976: Grossberg, Director of the Center for Adaptive Systems
at Boston University, with 146 publications on ANNs
(up to 1989) - very mathematical and biological in approach.

 1982: Kohonen developed the self-organizing maps and


demonstrated several practical applications

15

Research in ANNs - 1980s ~ 1990s (The Golden Years)


 1984: Ackley, Hinton and Sejnowski developed the
Boltzmann machine, a nondeterministic NN in which
weights or activations are changed on the basis of a
probability density function.

 1985: Grossberg and Carpenter developed the ART
family of NNs based on adaptive resonance theory.

 1985: Hopfield (a Nobel Prize winner in physics),
together with D. Tank at AT&T, proposed a number
of ANNs based on fixed weights and adaptive
activations, called the Hopfield network.

 1986: Rumelhart and the PDP group (at the University of
California, San Diego) developed the backpropagation
algorithm, which has been demonstrated successfully in
many applications. This algorithm is perhaps the most
widely used ANN paradigm and started a
revolution in the field of ANNs.
 2000: ??? 16

2.2 General Information on ANN
 2.2.1 Biases and Thresholds
 2.2.2 Linear Separability

17

2.2.1 Biases and Thresholds


 A bias acts exactly like a weight.

 It is treated as a connection whose input activation is
always 1.

 It is then adapted in the same way a weight is adapted,
according to the learning rule of the ANN.

 Its use is to increase signal levels in the ANN so as to
improve convergence.

 Some ANNs do not use any bias signals.


18

• A neural network input S with a bias signal can be
written as follows:

    S = b + Σi xi·wi

[Figure: a single neuron with a bias input of 1 (weight b) and
inputs x1, x2 (weights W1, W2); the weighted sum S is passed
through the activation to give the output y.]

19
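A minimal Python sketch of this equation is given below; the function name and the input, weight, and bias values are illustrative only, not taken from the slides:

```python
# Minimal sketch of the biased net input S = b + sum_i(x_i * w_i).
# The numbers below are illustrative only.

def net_input(x, w, b):
    """Return the biased net input S."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

x = [1.0, 0.0]          # example inputs x1, x2
w = [0.5, -0.3]         # example weights W1, W2
b = 0.2                 # bias weight (its input activation is always 1)

S = net_input(x, w, b)
print(S)                # -> 0.7
```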

 A threshold (θ) is a value that is used to decide
whether a neuron in the ANN fires or not.
 It is quite similar to a bias, but it is not adapted.
 An example of a binary threshold (step) function:

    f(s) = 1 if s ≥ θ
           0 if s < θ

 Then the equation of the separating line becomes:

    b + x1·w1 + x2·w2 = θ
20

2.2.2 Linear Separability
 Weights and biases in an ANN determine the boundary regions
that separate the ANN output classes.

 Example: the AND function. [Figure: the input space, with the
single positive (+) point on one side of the separating line and
the negative (-) points on the other side.]

21
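To make the separating line b + x1·w1 + x2·w2 = θ concrete, the sketch below checks on which side of the line each AND input falls; the particular weights, bias, and threshold are assumptions chosen for illustration (any line separating (1,1) from the other points would do):

```python
# Illustrative check of linear separability for the AND function.
# Assumed values: w1 = w2 = 1, b = 0, theta = 1.5.

w1, w2, b, theta = 1.0, 1.0, 0.0, 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        s = b + x1 * w1 + x2 * w2
        side = "+" if s >= theta else "-"   # which side of the line s = theta
        print((x1, x2), side)
# Only (1, 1) falls on the '+' side, so AND is linearly separable.
```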

ANN Performance
 The performance of an ANN is described by
a figure of merit, which expresses the
number of correctly recalled patterns when input
patterns are applied; the inputs may be complete,
partially complete, or even noisy.

 A 100% performance in recalled patterns


means that for every trained input stimulus
signal, the ANN always produces the desired
output pattern.
22
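A rough sketch of how such a figure of merit could be computed; the function name and the recall_fn argument are our own illustrative assumptions, not part of the module:

```python
# Illustrative figure of merit: the percentage of test patterns for which
# the network recalls the desired output. recall_fn is any function that
# maps an input pattern to the network's output (assumed, not from slides).

def figure_of_merit(recall_fn, patterns):
    """patterns is a list of (input, desired_output) pairs."""
    correct = sum(1 for x, target in patterns if recall_fn(x) == target)
    return 100.0 * correct / len(patterns)

# A 100% figure of merit means every trained input stimulus
# produces the desired output pattern.
```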

ANN PERFORMANCE

[Example of output patterns coded in binary:
 000 = MOUSE, 001 = RABBIT, 010 = COW.]

23

Important ANN Parameters [1]

When designing an ANN, one should be concerned


with the following:

 Network topology
 Number of layers in the network
 Number of neurons per layer
 Number of calculations per iteration
 Speed to recall a pattern
 Network performance
 Network plasticity (i.e. the number of neurons that can fail
and the degree of functionality the ANN retains)
 Network capacity (the maximum number of patterns that the
ANN can recall)
24

Important ANN Parameters [2]
 Degree of adaptability of the ANN (to what extent the
ANN is able to adapt itself after training)

 Bias terms (occasionally set a priori to some fixed value,


such as +1)

 Threshold terms (occasionally set to some a priori fixed


value, such as 0 or 1).

 Boundaries of the weights (initial values of the weights)

 Choice of the nonlinearity function (activation functions)

 Network noise immunity (the degree of noise corruption


on patterns that could still produce desired outputs)

 Steady-state or Final values of the synaptic weights 25

2.3 Simple Neural Networks

2.3.1 Introduction
 Also called primitive neural networks.
 Mainly used as pattern classifiers.
 Usually single-layer in architecture.
 Used in the 1940s-1960s for simple applications such as
membership in a single class (i.e. either “in” or “out”).

[Figure: an input pattern is fed to the network, which outputs a
YES/NO classification.]

26

 Several examples of these neural networks
 McCulloch-Pitts neuron ~ 1st artificial neuron

 Hebb net ~ 1st implementation of learning in


neural nets
 The Perceptron
 ADALINE and MADALINE

 Some examples of applications of these nets are


 Detection of heart abnormalities with ECG data as
inputs in 1963 (Specht, Widrow). It had 46 input
measurements, and the output was either normal
or abnormal.
 Echo cancellation in telephone lines (Widrow).
 Minsky and Papert analyzed the ability of perceptrons to
classify patterns as “connected” or “not connected”. 27

2.3.2 The McCulloch-Pitts Neuron

 Uses only binary activation signals.
 Connected by directed, weighted paths.
 A connection path is:
 excitatory if the weight is positive
 inhibitory if the weight is negative

 All excitatory connections into a particular neuron have the
same weight.
 The neuron fires when its total input is greater than or equal
to its threshold.
 It takes 1 time step for a signal to pass over 1 connection link.
 No learning ~ weights are assigned rather than adapted.
 The neural network is implemented in hardware such as relays
and resistors.

28

• Architecture of the McCulloch-Pitts neuron:

    f(s) = 1 if s ≥ θ
           0 if s < θ

[Figure: inputs x1, x2, ..., xN, with weights W11, W12, ..., W1N,
feed a summing unit followed by the threshold activation f().]

    s = W11·x1 + W12·x2 + ... + W1N·xN

    O = f( Σ(i=1..N) W1i·xi )
29
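A short Python sketch of a McCulloch-Pitts unit, following the definition above (the function name is ours, for illustration):

```python
# A McCulloch-Pitts unit: binary inputs, fixed (assigned, not learned)
# weights, and a hard threshold theta.

def mcculloch_pitts(x, w, theta):
    """Return 1 if sum_i(w_i * x_i) >= theta, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= theta else 0
```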

Example to solve an AND function

    x1  x2  y
     1   1  1
     1   0  0
     0   1  0
     0   0  0

The threshold for y is θ = 2; both weights are W = 1.

[Figure: x1 and x2 each connect to the output unit with weight 1;
the output unit applies the threshold θ = 2.]

    y = f( Σ Wi·xi )

30

Example to solve an OR function

    x1  x2  y
     1   1  1
     1   0  1
     0   1  1
     0   0  0

The threshold for y is θ = 2; both weights are W = 2.

[Figure: x1 and x2 each connect to the output unit with weight 2;
the output unit applies the threshold θ = 2.]

    y = f( Σ Wi·xi )

31
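Using the McCulloch-Pitts sketch given earlier, the AND and OR truth tables above can be reproduced with the weights and thresholds from the two slides:

```python
# AND uses weights (1, 1) with threshold 2; OR uses weights (2, 2) with
# threshold 2, as on the slides above.
for x1 in (1, 0):
    for x2 in (1, 0):
        y_and = mcculloch_pitts((x1, x2), (1, 1), theta=2)
        y_or = mcculloch_pitts((x1, x2), (2, 2), theta=2)
        print(x1, x2, "AND:", y_and, "OR:", y_or)
```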

Example to solve an XNOR function

    x1  x2  y
     1   1  1
     1   0  0
     0   1  0
     0   0  1

[Figure: a two-layer network. x1 connects to hidden unit z1 with
weight 2 and to z2 with weight -1; x2 connects to z1 with weight -1
and to z2 with weight 2; z1 and z2 each connect to the output y
with weight 2.]

(Note: the unit thresholds are not shown. With a threshold of 2 at
every unit, z1 = x1 AND NOT x2 and z2 = x2 AND NOT x1, and the
output as drawn computes z1 OR z2, which is XOR; the XNOR of the
truth table above is the complement of that output.)

32
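The layered computation can be checked with the same unit; a threshold of 2 at every unit is an assumption, since the slide does not give the thresholds:

```python
# Two-layer network from the slide, assuming threshold 2 at every unit.
# As drawn, the output is z1 OR z2 (XOR); its complement gives the
# XNOR truth table shown above.
for x1 in (1, 0):
    for x2 in (1, 0):
        z1 = mcculloch_pitts((x1, x2), (2, -1), theta=2)   # x1 AND NOT x2
        z2 = mcculloch_pitts((x1, x2), (-1, 2), theta=2)   # x2 AND NOT x1
        y_xor = mcculloch_pitts((z1, z2), (2, 2), theta=2)
        print(x1, x2, "XOR:", y_xor, "XNOR:", 1 - y_xor)
```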

2.3.4 The Perceptrons
 Developed by Frank Rosenblatt (1958).

 Its learning rule is superior to the Hebb learning rule.

 Rosenblatt proved that the weights will converge whenever the
problem can be represented (the perceptron convergence theorem).

 However, the perceptron does not work for problems that are not
linearly separable, as proven by Minsky and Papert (1969).

 The activation function used is the bipolar activation function
with an arbitrary, but fixed, threshold.

 Weights are adjusted by the perceptron learning rule:

    wi(new) = wi(old) + α·t·xi

33

The Perceptron Algorithm

Step 0. Set up the NN model (which follows the problem to be solved).
        Initialize weights and bias.
        (For simplicity, set weights and bias to zero or randomize.)
        Set learning rate α (0 < α ≤ 1) and threshold θ (0 < θ < 1).
        (For simplicity, α can be set to 1.)

Step 1. While stopping condition is false, do Steps 2-6.

    Step 2. For each training pair u:t, do Steps 3-5.

        Step 3. Set activations of input units:

                xi = ui

        Step 4. Compute response of output unit:

                s = b + Σi xi·wi

                y =  1  if s ≥ θ
                     0  if -θ ≤ s < θ     (θ is a threshold value
                    -1  if s < -θ          assigned between 0 and 1)
34

The Perceptron Algorithm

        Step 5. Update weights and bias if an error occurred for this pattern:

                If y ≠ t:
                    wi(new) = wi(old) + α·t·xi
                    b(new)  = b(old) + α·t
                else:
                    wi(new) = wi(old)
                    b(new)  = b(old)

    Step 6. Test stopping condition:
            If no weights changed in Step 2, stop; else, continue.

35
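A compact Python sketch of Steps 0-6; the function names and calling convention are ours (an illustration of the algorithm, not code from the module):

```python
# Illustrative sketch of the perceptron algorithm (Steps 0-6).
# Bipolar targets t in {-1, +1}; three-level activation with threshold theta.

def activation(s, theta):
    """Three-level perceptron activation."""
    if s >= theta:
        return 1
    if s < -theta:
        return -1
    return 0

def train_perceptron(patterns, w, b, alpha, theta, max_epochs=100):
    """patterns: list of (x, t) pairs; w, b: initial weights and bias.
    Returns the weights and bias after training stops."""
    for _ in range(max_epochs):                            # Step 1
        changed = False
        for x, t in patterns:                              # Steps 2-3
            s = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4
            y = activation(s, theta)
            if y != t:                                     # Step 5
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                                    # Step 6
            break
    return w, b
```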

Exercise 2.2a

    x1  x2   t
     1   1   1
     1  -1  -1
    -1   1  -1
    -1  -1  -1

[Figure: a perceptron with inputs x1 (weight W1) and x2 (weight W2)
plus a bias input of 1 (weight b), feeding a summing unit and the
activation that produces y.]

• Using the Perceptron network, solve the AND problem above (using
bipolar activations) and show the results of the weights adaptation for
each pattern over 1 epoch.
• Choose all initial weights to be -1, α = 0.5, and θ = 0.2.
• Fill up the following table.

    Iter#  x1  x2  s  y  t  w1  w2  b

36
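One way to cross-check a hand calculation, using the training sketch given after the algorithm (the bias is assumed to start at -1 along with the weights; printing every row of the table would need an extra print inside the training loop):

```python
# Exercise 2.2a: bipolar AND. Initial weights -1 (bias assumed to start
# at -1 as well), alpha = 0.5, theta = 0.2.
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_perceptron(and_data, w=[-1.0, -1.0], b=-1.0,
                        alpha=0.5, theta=0.2)
print(w, b)
```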

Exercise 2.2b

    x1  x2  x3   t
    -1  -1  -1  -1
    -1  -1   1   1
    -1   1  -1   1
    -1   1   1   1
     1  -1  -1   1
     1  -1   1   1
     1   1  -1   1
     1   1   1   1

[Figure: a perceptron with inputs x1, x2, x3 (weights W1, W2, W3)
plus a bias input of 1 (weight b), feeding the activation f that
produces y.]

• Using the Perceptron network (3-input), solve the OR problem above
(using bipolar activations) and show the results of the weights
adaptation for each pattern over 1 epoch.
• Choose all initial weights to be -0.5 (the bias starts at 1, as in
the worked solution below), α = 0.25, and θ = 0.1.
• Fill up the following table.
37

Iter. #1 (-1 -1 -1, t=-1):  s = 1 + (-0.5*-1) + (-0.5*-1) + (-0.5*-1)
                            s = 1 + 0.5 + 0.5 + 0.5 = 2.5
                            y = 1   (2.5 > 0.1)
                            t = -1

y ≠ t, hence adapt weights & bias:

w1(new) = w1(old) + α·t·x1
        = -0.5 + (0.25*-1*-1)
        = -0.25
w2(new) = w2(old) + α·t·x2
        = -0.5 + (0.25*-1*-1)
        = -0.25
w3(new) = w3(old) + α·t·x3
        = -0.5 + (0.25*-1*-1)
        = -0.25
b(new)  = b(old) + α·t
        = 1 + (0.25*-1)
        = 0.75                                              38

Iter. #2 (-1 -1 1 t=1) : s = 0.75 + (-0.25*-1) + (-0.25*-1) + (-0.25*1)
s = 0.75 + 0.25 + 0.25 - 0.25 = 1
y = 1 (1>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75

Iter. #3 (-1 1 -1 t=1) : s = 0.75 + (-0.25*-1) + (-0.25*1) + (-0.25*-1)


s = 0.75 + 0.25 + 0.25 - 0.25 = 1
y = 1 (1>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75
39

Iter. #4 (-1 1 1 t=1) : s = 0.75 + (-0.25*-1) + (-0.25*1) + (-0.25*1)


s = 0.75 + 0.25 - 0.25 - 0.25 = 0.5
y = 1 (0.5>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75

Iter. #5 (1 -1 -1 t=1) : s = 0.75 + (-0.25*1) + (-0.25*-1) + (-0.25*-1)


s = 0.75 - 0.25 + 0.25 + 0.25 = 1
y = 1 (1>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75
40

Iter. #6 (1 -1 1 t=1) : s = 0.75 + (-0.25*1) + (-0.25*-1) + (-0.25*1)
s = 0.75 - 0.25 + 0.25 - 0.25 = 0.5
y = 1 (0.5>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75

Iter. #7 (1 1 -1 t=1) : s = 0.75 + (-0.25*1) + (-0.25*1) + (-0.25*-1)


s = 0.75 - 0.25 - 0.25 + 0.25 = 0.5
y = 1 (0.5>0.1)
t=1

y = t, hence No Adaptation
w1(new) = w1(old) = -0.25
w2(new) = w2(old) = -0.25
w3(new) = w3(old) = -0.25
b(new) = b(old) = 0.75
41

Iter. #8 (1 1 1, t=1):  s = 0.75 + (-0.25*1) + (-0.25*1) + (-0.25*1)
                        s = 0.75 - 0.25 - 0.25 - 0.25 = 0
                        y = 0   (-0.1 < 0 < 0.1)
                        t = 1

y ≠ t, hence adapt weights & bias:

w1(new) = w1(old) + α·t·x1
        = -0.25 + (0.25*1*1)
        = 0
w2(new) = w2(old) + α·t·x2
        = -0.25 + (0.25*1*1)
        = 0
w3(new) = w3(old) + α·t·x3
        = -0.25 + (0.25*1*1)
        = 0
b(new)  = b(old) + α·t
        = 0.75 + (0.25*1)
        = 1
42

Epoch #1

x1 x2 x3 s y t w1 w2 w3 b

- - - -0.5 -0.5 -0.5 1


-1 -1 -1 2.5 1 -1 -0.25 -0.25 -0.25 0.75
-1 -1 1 1 1 1 -0.25 -0.25 -0.25 0.75
-1 1 -1 1 1 1 -0.25 -0.25 -0.25 0.75
-1 1 1 0.5 1 1 -0.25 -0.25 -0.25 0.75
1 -1 -1 1 1 1 -0.25 -0.25 -0.25 0.75
1 -1 1 0.5 1 1 -0.25 -0.25 -0.25 0.75
1 1 -1 0.5 1 1 -0.25 -0.25 -0.25 0.75
1 1 1 0 0 1 0 0 0 1

43

Epoch #2

x1 x2 x3 s y t w1 w2 w3 b

- - - 0 0 0 1
-1 -1 -1 1 1 -1 0.25 0.25 0.25 0.75
-1 -1 1 0.5 1 1 0.25 0.25 0.25 0.75
-1 1 -1 0.5 1 1 0.25 0.25 0.25 0.75
-1 1 1 1 1 1 0.25 0.25 0.25 0.75
1 -1 -1 0.5 1 1 0.25 0.25 0.25 0.75
1 -1 1 1 1 1 0.25 0.25 0.25 0.75
1 1 -1 1 1 1 0.25 0.25 0.25 0.75
1 1 1 1.5 1 1 0.25 0.25 0.25 0.75

44

Epoch #3

x1 x2 x3 s y t w1 w2 w3 b

- - - 0.25 0.25 0.25 0.75


-1 -1 -1 0 0 -1 0.5 0.5 0.5 0.5
-1 -1 1 0 0 1 0.25 0.25 0.75 0.75
-1 1 -1 0 0 1 0 0.5 0.5 1
-1 1 1 2 1 1 0 0.5 0.5 1
1 -1 -1 0 0 1 0.25 0.25 0.25 1.25
1 -1 1 1.5 1 1 0.25 0.25 0.25 1.25
1 1 -1 1.5 1 1 0.25 0.25 0.25 1.25
1 1 1 2 1 1 0.25 0.25 0.25 1.25

45

Epoch #4

x1 x2 x3 s y t w1 w2 w3 b

- - - 0.25 0.25 0.25 1.25


-1 -1 -1 0.5 1 -1 0.5 0.5 0.5 1
-1 -1 1 0.5 1 1 0.5 0.5 0.5 1
-1 1 -1 0.5 1 1 0.5 0.5 0.5 1
-1 1 1 1.5 1 1 0.5 0.5 0.5 1
1 -1 -1 0.5 1 1 0.5 0.5 0.5 1
1 -1 1 1.5 1 1 0.5 0.5 0.5 1
1 1 -1 1.5 1 1 0.5 0.5 0.5 1
1 1 1 2.5 1 1 0.5 0.5 0.5 1

46

Epoch #5

x1 x2 x3 s y t w1 w2 w3 b

- - - 0.5 0.5 0.5 1


-1 -1 -1 -0.5 -1 -1 0.5 0.5 0.5 1
-1 -1 1 0.5 1 1 0.5 0.5 0.5 1
-1 1 -1 0.5 1 1 0.5 0.5 0.5 1
-1 1 1 1.5 1 1 0.5 0.5 0.5 1
1 -1 -1 0.5 1 1 0.5 0.5 0.5 1
1 -1 1 1.5 1 1 0.5 0.5 0.5 1
1 1 -1 1.5 1 1 0.5 0.5 0.5 1
1 1 1 2.5 1 1 0.5 0.5 0.5 1

47
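The epoch tables above can be reproduced with a short trace loop that follows the same update rule; this is our own illustration (the initial bias of 1 is taken from the first row of the Epoch #1 table):

```python
# Trace loop reproducing the epoch tables for Exercise 2.2b (3-input
# bipolar OR). Initial weights -0.5, bias 1, alpha = 0.25, theta = 0.1.
or_data = [((-1, -1, -1), -1), ((-1, -1, 1), 1), ((-1, 1, -1), 1),
           ((-1, 1, 1), 1), ((1, -1, -1), 1), ((1, -1, 1), 1),
           ((1, 1, -1), 1), ((1, 1, 1), 1)]

w, b, alpha, theta = [-0.5, -0.5, -0.5], 1.0, 0.25, 0.1
for epoch in range(1, 6):
    print("Epoch #%d" % epoch)
    for x, t in or_data:
        s = b + sum(xi * wi for xi, wi in zip(x, w))
        y = 1 if s >= theta else (-1 if s < -theta else 0)
        if y != t:                                   # adapt on error
            w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
            b += alpha * t
        print(x, s, y, t, w, b)                      # one row of the table
```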

2.4 Summary of Module 2

 In this module we have discussed ANNs from a historical
perspective.

 The ANN paradigms mentioned can be found in more detail in
the literature.

 We have also studied several primitive ANN algorithms.

 It can be observed that these primitive ANNs have limitations in
solving practical problems.

 However, as their architectures and algorithms are simple, they
provide a good basis for understanding practical ANN
paradigms such as multilayer neural networks and the
backpropagation (BP) algorithm.

48

