
Artificial Neural Networks

Edward Gatt
What are Neural Networks?
• Models of the brain and nervous system
• Highly parallel
– Process information much more like the brain than a
serial computer
• Learning
– Very simple principles
– Very complex behaviours

• Applications
– As powerful problem solvers
– As biological models
ANNs – The basics
• ANNs incorporate the two fundamental
components of biological neural nets:

1. Neurones (nodes)
2. Synapses (weights)
• Neurone vs. Node
• Structure of a node:

Squashing function limits node output:


• Synapse vs. weight
Feed-forward nets

Information flow is unidirectional


Data is presented to Input layer
Passed on to Hidden Layer
Passed on to Output layer

Information is distributed

Information processing is parallel

Internal representation (interpretation) of data


• Feeding data through the net:

(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5

Squashing: 1 / (1 + e^0.5) = 0.3775
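The worked example above can be sketched in a few lines (a minimal sketch; the function names `squash` and `node_output` are assumptions, not from the original):

```python
import math

def squash(net):
    """Logistic squashing function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def node_output(inputs, weights):
    """Weighted sum of the inputs followed by the squashing function."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return squash(net)

# The worked example: inputs (1, 0.5), weights (0.25, -1.5)
y = node_output([1.0, 0.5], [0.25, -1.5])
print(round(y, 4))  # net = -0.5, squashed output = 0.3775
```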
Supervised Vs. Unsupervised
• Networks can be ‘supervised’
– Need to be trained ahead of time with lots of data
• Unsupervised networks adapt to the input
– Applications in clustering and dimensionality
reduction
– Learning may be very slow
What can a Neural Net do?
• Compute a known function
• Approximate an unknown function
• Pattern Recognition
• Signal Processing

• Learn to do any of the above


Basic Concepts
A Neural Network generally maps a set of inputs to a set of outputs.

The number of inputs/outputs is variable.

The Network itself is composed of an arbitrary number of nodes with an arbitrary topology.

[Figure: Input 0, Input 1, …, Input n → Neural Network → Output 0, Output 1, …, Output m]
Basic Concepts
Definition of a node:

• A node is an element which performs the function

y = fH(∑(wi·xi) + Wb)

[Figure: inputs Input 0 … Input n enter via weights W0 … Wn; together with the bias weight Wb they are summed and passed through fH(x) to produce the node's output]
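The node definition above can be turned into a small class (a sketch; the class name `Node` and the default sigmoid choice for fH are assumptions):

```python
import math

class Node:
    """A single network node computing y = fH(sum(wi * xi) + Wb)."""
    def __init__(self, weights, bias):
        self.weights = weights   # one weight wi per input xi
        self.bias = bias         # Wb, the bias weight

    def output(self, inputs, f_h=lambda net: 1.0 / (1.0 + math.exp(-net))):
        net = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return f_h(net)

n = Node([0.25, -1.5], 0.0)
print(round(n.output([1.0, 0.5]), 4))  # 0.3775, matching the earlier worked example
```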
Simple Perceptron
• Binary logic application
• fH(x) = u(x) [linear threshold]
• Wi = random(-1, 1)
• Y = u(W0·X0 + W1·X1 + Wb)
• Now how do we train it?

[Figure: Input 0 and Input 1 enter via weights W0 and W1; together with the bias weight Wb they are summed and passed through fH(x) to produce the output]
Basic Training
• Perceptron learning rule
ΔWi = η * (D – Y) * Xi

• η = Learning Rate
• D = Desired Output

• Adjust weights based on how well the
current weights match an objective
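The learning rule above is one line per weight (a minimal sketch; the function name `update_weights` is an assumption):

```python
def update_weights(weights, inputs, desired, actual, eta):
    """Perceptron learning rule: each weight moves by dWi = eta * (D - Y) * Xi."""
    return [w + eta * (desired - actual) * x for w, x in zip(weights, inputs)]

# If the output is too low (D=1, Y=0), each weight with an active input grows
w = update_weights([0.2, -0.4], [1.0, 1.0], desired=1, actual=0, eta=0.1)
# When D == Y, (D - Y) = 0 and no weight changes
```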
Logic Training
• Expose the network to the logical OR
operation

• Update the weights after each epoch

• As the output approaches the desired
output for all cases, ΔWi will approach 0

X0 | X1 | D
---+----+---
 0 |  0 | 0
 0 |  1 | 1
 1 |  0 | 1
 1 |  1 | 1
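Training the perceptron on OR can be sketched as follows (here weights start at zero for reproducibility instead of random(-1, 1), and updates are applied after each pattern, a common per-pattern variant of the rule):

```python
def u(x):
    """Linear threshold: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def train_or(eta=0.1, epochs=10):
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR
    w = [0.0, 0.0]   # deterministic start for reproducibility
    wb = 0.0
    for _ in range(epochs):
        for (x0, x1), d in data:
            y = u(w[0] * x0 + w[1] * x1 + wb)
            w[0] += eta * (d - y) * x0   # dWi = eta * (D - Y) * Xi
            w[1] += eta * (d - y) * x1
            wb += eta * (d - y)          # bias input is fixed at 1
    return w, wb

w, wb = train_or()
for (x0, x1), d in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]:
    assert u(w[0] * x0 + w[1] * x1 + wb) == d  # all four OR cases learned
```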
Training the Network - Learning
• Backpropagation
– Requires training set (input / output pairs)
– Starts with small random weights
– Error is used to adjust weights (supervised learning)
– Gradient descent on the error landscape
The Backpropagation Network
•The backpropagation network (BPN) is the most popular
type of ANN for applications such as classification or
function approximation.
•Like other networks using supervised learning, the BPN is
not biologically plausible.
•The structure of the network is identical to the one we
discussed before:
• Three (sometimes more) layers of neurons,
• Only feedforward processing:
input layer → hidden layer → output layer,
• Sigmoid activation functions
Typical Activation Functions
• F(x) = 1 / (1 + e^(−k·∑(wi·xi)))
• Shown for
k = 0.5, 1 and 10

• Using a nonlinear
function which
approximates a linear
threshold allows a
network to approximate
nonlinear functions
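The effect of the steepness parameter k can be seen directly (a sketch; the function name `sigmoid_k` is an assumption):

```python
import math

def sigmoid_k(net, k=1.0):
    """F(net) = 1 / (1 + e^(-k*net)); larger k approaches the hard threshold u(net)."""
    return 1.0 / (1.0 + math.exp(-k * net))

for k in (0.5, 1.0, 10.0):
    print(k, round(sigmoid_k(1.0, k), 4))
# k = 0.5 gives a gentle slope (0.6225 at net=1); k = 10 is nearly a step (0.9999...)
```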
Alternative Activation functions
• Radial Basis Functions
– Square
– Triangle
– Gaussian!

• (μ, σ) can be varied at each hidden node to
guide training

[Figure: inputs Input 0, Input 1, …, Input n feed a hidden layer of fRBF(x) nodes, whose outputs feed fH(x) nodes]
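A Gaussian RBF with per-node (μ, σ) might look like this (a minimal one-dimensional sketch; the name `rbf_gaussian` is an assumption):

```python
import math

def rbf_gaussian(x, mu, sigma):
    """Gaussian radial basis function: peaks at x = mu, width set by sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(rbf_gaussian(0.0, 0.0, 1.0))          # 1.0 at the centre mu
print(round(rbf_gaussian(1.0, 0.0, 1.0), 4))  # falls off with distance from mu
```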


The Backpropagation Network
•BPN units and activation functions:

[Figure: input vector x → input units I1, I2, …, II → hidden units H1, H2, H3, …, HJ with activation f(net^h) → output units O1, …, OK with activation f(net^o) → output vector y]
Supervised Learning in the BPN
•Before the learning process starts, all weights (synapses)
in the network are initialized with pseudorandom
numbers.
•We also have to provide a set of training patterns
(exemplars). They can be described as a set of ordered
vector pairs {(x1, y1), (x2, y2), …, (xP, yP)}.
•Then we can start the backpropagation learning
algorithm.
•This algorithm iteratively minimizes the network’s error
by finding the gradient of the error surface in weight-
space and adjusting the weights in the opposite direction
(gradient-descent technique).
Supervised Learning in the BPN
•Gradient-descent example: finding the absolute
minimum of a one-dimensional error function f(x):

[Figure: curve of f(x) with the slope f′(x0) at x0; one step moves to x1 = x0 − η·f′(x0)]

Repeat this iteratively until, for some xi, f′(xi) is sufficiently close to 0.
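The one-dimensional iteration above can be sketched directly (the example function f(x) = (x − 3)², with minimum at x = 3, is an assumed illustration, not from the original):

```python
def gradient_descent(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    """Iterate x_{i+1} = x_i - eta * f'(x_i) until the slope is near zero."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# f(x) = (x - 3)^2 has f'(x) = 2*(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```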
Supervised Learning in the BPN
• In the BPN, learning is performed as follows:
1. Randomly select a vector pair (xp, yp) from the training
set and call it (x, y).
2. Use x as input to the BPN and successively compute
the outputs of all neurons in the network (bottom-up)
until you get the network output o.
3. Compute the error δ^o_pk for the pattern p across all K
output layer units by using the formula:

δ^o_pk = (y_k − o_k) · f′(net^o_k)
Supervised Learning in the BPN
4. Compute the error δ^h_pj for all J hidden layer units by
using the formula:

δ^h_pj = f′(net^h_j) · ∑_{k=1..K} δ^o_pk · w_kj

5. Update the connection-weight values to the hidden layer by using the following
equation:

w_ji(t + 1) = w_ji(t) + η · δ^h_pj · x_i
Supervised Learning in the BPN
6. Update the connection-weight values to the output layer by using the following
equation:

w_kj(t + 1) = w_kj(t) + η · δ^o_pk · f(net^h_j)

•Repeat steps 1 to 6 for all vector pairs in the training set;


this is called a training epoch.
•Run as many epochs as required to reduce the network
error E to fall below a threshold ε:
E = ∑_{p=1..P} ∑_{k=1..K} (δ^o_pk)²
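Steps 1 to 6 can be sketched as a small NumPy loop (an assumed illustration: a 2-3-1 network on the XOR task, with biases handled by appending a constant 1 to each layer's input; none of these choices come from the original):

```python
import numpy as np

def f(net):
    """Sigmoid activation used for both layers."""
    return 1.0 / (1.0 + np.exp(-net))

# Assumed training set: XOR input/output pairs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)              # pseudorandom initial weights
W_h = rng.uniform(-1, 1, (3, 3))            # (2 inputs + bias) -> 3 hidden units
W_o = rng.uniform(-1, 1, (4, 1))            # (3 hidden + bias) -> 1 output unit
eta = 0.5

def forward(x):
    h = f(np.append(x, 1.0) @ W_h)          # append 1.0 as the bias input
    o = f(np.append(h, 1.0) @ W_o)
    return h, o

def total_error():
    return sum(float(np.sum((y - forward(x)[1]) ** 2)) for x, y in zip(X, Y))

e_start = total_error()
for _ in range(2000):                                       # training epochs
    for x, y in zip(X, Y):                                  # step 1: pick a pattern
        h, o = forward(x)                                   # step 2: forward pass
        delta_o = (y - o) * o * (1 - o)                     # step 3: output error
        delta_h = h * (1 - h) * (W_o[:-1] @ delta_o)        # step 4: hidden error
        W_o += eta * np.outer(np.append(h, 1.0), delta_o)   # step 6: output weights
        W_h += eta * np.outer(np.append(x, 1.0), delta_h)   # step 5: hidden weights
e_end = total_error()
assert e_end < e_start   # gradient descent has reduced the network error E
```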
Supervised Learning in the BPN
The only thing that we need to know before we can start our network is the
derivative of our sigmoid function, for example, f′(net_k) for the output neurons:

f(net_k) = 1 / (1 + e^(−net_k))

f′(net_k) = ∂f(net_k)/∂net_k = o_k · (1 − o_k)
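The identity f′(net) = o·(1 − o) is easy to check against a numerical derivative (a sketch; the function names are assumptions):

```python
import math

def f(net):
    """Sigmoid: f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def f_prime(net):
    """Analytic derivative: f'(net) = f(net) * (1 - f(net)) = o * (1 - o)."""
    o = f(net)
    return o * (1 - o)

# Compare against a central-difference numerical derivative at a few points
for net in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (f(net + h) - f(net - h)) / (2 * h)
    assert abs(numeric - f_prime(net)) < 1e-6
```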
Supervised Learning in the BPN
Now our BPN is ready to go!
If we choose the type and number of neurons in our network appropriately, after
training the network should show the following behavior:
• If we input any of the training vectors, the network
should yield the expected output vector (with some
margin of error).
• If we input a vector that the network has never
“seen” before, it should be able to generalize and
yield a plausible output vector based on its
knowledge about similar input vectors.
Self-Organizing Maps (Kohonen Maps)
In the BPN, we used supervised learning.
This is not biologically plausible: In a biological system, there is no external
“teacher” who manipulates the network’s weights from outside the network.
Biologically more adequate: unsupervised learning.
We will study Self-Organizing Maps (SOMs) as an example of unsupervised learning
(Kohonen, 1980).
Self-Organizing Maps (Kohonen Maps)
Such topology-conserving mapping can be achieved by SOMs:
• Two layers: input layer and output (map) layer
• Input and output layers are completely connected.
• Output neurons are interconnected within a defined
neighborhood.
• A topology (neighborhood relation) is defined on
the output layer.
Self-Organizing Maps (Kohonen Maps)
A neighborhood function φ(i, k) indicates how closely neurons i and k in the output
layer are connected to each other.
Usually, a Gaussian function on the distance between the two neurons in the layer
is used:

[Figure: Gaussian neighborhood function of the distance between the position of i and the position of k in the output layer]
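Such a Gaussian neighborhood function might be implemented as (a sketch; the name `phi` follows the text, the signature is an assumption):

```python
import math

def phi(pos_i, pos_k, sigma):
    """Gaussian neighborhood: near 1 when i and k are close on the map, -> 0 far apart."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_k))
    return math.exp(-d2 / (2 * sigma ** 2))

print(phi((0, 0), (0, 0), 1.0))   # 1.0 — the winner itself
print(phi((0, 0), (3, 4), 1.0))   # distance 5 on the map: essentially 0
```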
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:

(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k:
    ||wk – x|| = mini ||wi – x|| (Euclidean distance)
(4) Update the weight vectors of all neurons i in the
    neighborhood of neuron k: wi := wi + η·φ(i, k)·(x – wi)
    (wi is shifted towards x)
(5) If convergence criterion met, STOP.
    Otherwise, narrow the neighborhood function φ and the learning
    parameter η and go to (2).
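Steps (1) to (5) can be sketched as a small loop (an assumed illustration: a 1-D map of 4 neurons on 1-D data, with η and σ narrowed by a fixed decay factor; all names and constants are assumptions):

```python
import math
import random

def som_train(data, n_out, positions, eta=0.5, sigma=1.0, steps=200, seed=0):
    """Minimal SOM loop: winner search plus neighborhood-weighted updates."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[rng.uniform(0, 1) for _ in range(dim)] for _ in range(n_out)]   # step (1)
    for _ in range(steps):
        x = rng.choice(data)                                             # step (2)
        k = min(range(n_out),                                            # step (3)
                key=lambda i: sum((w[i][d] - x[d]) ** 2 for d in range(dim)))
        for i in range(n_out):                                           # step (4)
            d2 = sum((positions[i][j] - positions[k][j]) ** 2
                     for j in range(len(positions[0])))
            h = math.exp(-d2 / (2 * sigma ** 2))   # Gaussian neighborhood phi(i, k)
            for d in range(dim):
                w[i][d] += eta * h * (x[d] - w[i][d])   # wi shifted towards x
        eta *= 0.99                                # step (5): narrow eta...
        sigma *= 0.99                              # ...and the neighborhood
    return w

# 1-D map of 4 neurons exposed to data clustered near 0 and 1
data = [(0.0,), (0.1,), (0.9,), (1.0,)]
w = som_train(data, 4, positions=[(0,), (1,), (2,), (3,)])
```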
