
Artificial Neural Networks

Edward Gatt
What are Neural Networks?
• Models of the brain and nervous system
• Highly parallel
– Process information much more like the brain than a
serial computer
• Learning
– Very simple principles
– Very complex behaviours

• Applications
– As powerful problem solvers
– As biological models
ANNs – The basics
• ANNs incorporate the two fundamental
components of biological neural nets:

1. Neurones (nodes)
2. Synapses (weights)
• Neurone vs. Node
• Structure of a node:

Squashing function limits node output:


• Synapse vs. weight
Feed-forward nets

Information flow is unidirectional


Data is presented to Input layer
Passed on to Hidden Layer
Passed on to Output layer

Information is distributed

Information processing is parallel

Internal representation (interpretation) of data


• Feeding data through the net:

(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5

Squashing: 1 / (1 + e^0.5) = 0.3775
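The worked example above can be sketched in a few lines (a minimal sketch; the function names `squash` and `node_output` are assumptions, not from the original):

```python
import math

def squash(net):
    """Logistic squashing function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def node_output(inputs, weights):
    """Weighted sum of the inputs followed by the squashing function."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return squash(net)

# The worked example: inputs (1, 0.5), weights (0.25, -1.5)
y = node_output([1.0, 0.5], [0.25, -1.5])
print(round(y, 4))  # net = -0.5, squashed output = 0.3775
```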
Supervised Vs. Unsupervised
• Networks can be ‘supervised’
– Need to be trained ahead of time with lots of data
• Unsupervised networks adapt to the input
– Applications in clustering and dimensionality
reduction
– Learning may be very slow
What can a Neural Net do?
• Compute a known function
• Approximate an unknown function
• Pattern Recognition
• Signal Processing

• Learn to do any of the above


Basic Concepts
A Neural Network generally maps a set of inputs to a set of outputs.

The number of inputs/outputs is variable.

The Network itself is composed of an arbitrary number of nodes with an arbitrary topology.

[Figure: Input 0, Input 1, …, Input n → Neural Network → Output 0, Output 1, …, Output m]
Basic Concepts
Definition of a node:

• A node is an element which performs the function

y = fH(∑(wi·xi) + Wb)

[Figure: inputs Input 0 … Input n enter via weights W0 … Wn; together with the bias weight Wb they are summed and passed through fH(x) to produce the node's output]
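The node definition above can be turned into a small class (a sketch; the class name `Node` and the default sigmoid choice for fH are assumptions):

```python
import math

class Node:
    """A single network node computing y = fH(sum(wi * xi) + Wb)."""
    def __init__(self, weights, bias):
        self.weights = weights   # one weight wi per input xi
        self.bias = bias         # Wb, the bias weight

    def output(self, inputs, f_h=lambda net: 1.0 / (1.0 + math.exp(-net))):
        net = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return f_h(net)

n = Node([0.25, -1.5], 0.0)
print(round(n.output([1.0, 0.5]), 4))  # 0.3775, matching the earlier worked example
```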
Simple Perceptron
• Binary logic application
• fH(x) = u(x) [linear threshold]
• Wi = random(-1, 1)
• Y = u(W0·X0 + W1·X1 + Wb)
• Now how do we train it?

[Figure: Input 0 and Input 1 enter via weights W0 and W1; together with the bias weight Wb they are summed and passed through fH(x) to produce the output]
Basic Training
• Perceptron learning rule
ΔWi = η * (D – Y) * Xi

• η = Learning Rate
• D = Desired Output

• Adjust weights based on how well the
current weights match an objective
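The learning rule above is one line per weight (a minimal sketch; the function name `update_weights` is an assumption):

```python
def update_weights(weights, inputs, desired, actual, eta):
    """Perceptron learning rule: each weight moves by dWi = eta * (D - Y) * Xi."""
    return [w + eta * (desired - actual) * x for w, x in zip(weights, inputs)]

# If the output is too low (D=1, Y=0), each weight with an active input grows
w = update_weights([0.2, -0.4], [1.0, 1.0], desired=1, actual=0, eta=0.1)
# When D == Y, (D - Y) = 0 and no weight changes
```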
Logic Training
• Expose the network to the logical OR
operation

• Update the weights after each epoch

• As the output approaches the desired
output for all cases, ΔWi will approach 0

X0 | X1 | D
---+----+---
 0 |  0 | 0
 0 |  1 | 1
 1 |  0 | 1
 1 |  1 | 1
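Training the perceptron on OR can be sketched as follows (here weights start at zero for reproducibility instead of random(-1, 1), and updates are applied after each pattern, a common per-pattern variant of the rule):

```python
def u(x):
    """Linear threshold: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def train_or(eta=0.1, epochs=10):
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR
    w = [0.0, 0.0]   # deterministic start for reproducibility
    wb = 0.0
    for _ in range(epochs):
        for (x0, x1), d in data:
            y = u(w[0] * x0 + w[1] * x1 + wb)
            w[0] += eta * (d - y) * x0   # dWi = eta * (D - Y) * Xi
            w[1] += eta * (d - y) * x1
            wb += eta * (d - y)          # bias input is fixed at 1
    return w, wb

w, wb = train_or()
for (x0, x1), d in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]:
    assert u(w[0] * x0 + w[1] * x1 + wb) == d  # all four OR cases learned
```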
Training the Network - Learning
• Backpropagation
– Requires training set (input / output pairs)
– Starts with small random weights
– Error is used to adjust weights (supervised learning)
– Gradient descent on the error landscape
The Backpropagation Network
•The backpropagation network (BPN) is the most popular
type of ANN for applications such as classification or
function approximation.
•Like other networks using supervised learning, the BPN is
not biologically plausible.
•The structure of the network is identical to the one we
discussed before:
• Three (sometimes more) layers of neurons,
• Only feedforward processing:
input layer → hidden layer → output layer,
• Sigmoid activation functions
Typical Activation Functions
• F(x) = 1 / (1 + e^(−k·∑(wi·xi)))
• Shown for
k = 0.5, 1 and 10

• Using a nonlinear
function which
approximates a linear
threshold allows a
network to approximate
nonlinear functions
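The effect of the steepness parameter k can be seen directly (a sketch; the function name `sigmoid_k` is an assumption):

```python
import math

def sigmoid_k(net, k=1.0):
    """F(net) = 1 / (1 + e^(-k*net)); larger k approaches the hard threshold u(net)."""
    return 1.0 / (1.0 + math.exp(-k * net))

for k in (0.5, 1.0, 10.0):
    print(k, round(sigmoid_k(1.0, k), 4))
# k = 0.5 gives a gentle slope (0.6225 at net=1); k = 10 is nearly a step (0.9999...)
```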
Alternative Activation functions
• Radial Basis Functions
– Square
– Triangle
– Gaussian!

• (μ, σ) can be varied at each hidden node to
guide training

[Figure: inputs Input 0, Input 1, …, Input n feed a hidden layer of fRBF(x) nodes, whose outputs feed fH(x) nodes]
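A Gaussian RBF with per-node (μ, σ) might look like this (a minimal one-dimensional sketch; the name `rbf_gaussian` is an assumption):

```python
import math

def rbf_gaussian(x, mu, sigma):
    """Gaussian radial basis function: peaks at x = mu, width set by sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(rbf_gaussian(0.0, 0.0, 1.0))          # 1.0 at the centre mu
print(round(rbf_gaussian(1.0, 0.0, 1.0), 4))  # falls off with distance from mu
```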


The Backpropagation Network
•BPN units and activation functions:

[Figure: input vector x → input units I1, I2, …, II → hidden units H1, H2, H3, …, HJ with activation f(net^h) → output units O1, …, OK with activation f(net^o) → output vector y]
Supervised Learning in the BPN
•Before the learning process starts, all weights (synapses)
in the network are initialized with pseudorandom
numbers.
•We also have to provide a set of training patterns
(exemplars). They can be described as a set of ordered
vector pairs {(x1, y1), (x2, y2), …, (xP, yP)}.
•Then we can start the backpropagation learning
algorithm.
•This algorithm iteratively minimizes the network’s error
by finding the gradient of the error surface in weight-
space and adjusting the weights in the opposite direction
(gradient-descent technique).
Supervised Learning in the BPN
•Gradient-descent example: finding the absolute
minimum of a one-dimensional error function f(x):

[Figure: curve of f(x) with the slope f′(x0) at x0; one step moves to x1 = x0 − η·f′(x0)]

Repeat this iteratively until, for some xi, f′(xi) is sufficiently close to 0.
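The one-dimensional iteration above can be sketched directly (the example function f(x) = (x − 3)², with minimum at x = 3, is an assumed illustration, not from the original):

```python
def gradient_descent(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    """Iterate x_{i+1} = x_i - eta * f'(x_i) until the slope is near zero."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# f(x) = (x - 3)^2 has f'(x) = 2*(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```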
Supervised Learning in the BPN
• In the BPN, learning is performed as follows:
1. Randomly select a vector pair (xp, yp) from the training
set and call it (x, y).
2. Use x as input to the BPN and successively compute
the outputs of all neurons in the network (bottom-up)
until you get the network output o.
3. Compute the error δ^o_pk for the pattern p across all K
output layer units by using the formula:

δ^o_pk = (y_k − o_k) · f′(net^o_k)
Supervised Learning in the BPN
4. Compute the error δ^h_pj for all J hidden layer units by
using the formula:

δ^h_pj = f′(net^h_j) · ∑_{k=1..K} δ^o_pk · w_kj

5. Update the connection-weight values to the hidden layer by using the following
equation:

w_ji(t + 1) = w_ji(t) + η · δ^h_pj · x_i
Supervised Learning in the BPN
6. Update the connection-weight values to the output layer by using the following
equation:

w_kj(t + 1) = w_kj(t) + η · δ^o_pk · f(net^h_j)

•Repeat steps 1 to 6 for all vector pairs in the training set;


this is called a training epoch.
•Run as many epochs as required to reduce the network
error E to fall below a threshold ε:
E = ∑_{p=1..P} ∑_{k=1..K} (δ^o_pk)²
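Steps 1 to 6 can be sketched as a small NumPy loop (an assumed illustration: a 2-3-1 network on the XOR task, with biases handled by appending a constant 1 to each layer's input; none of these choices come from the original):

```python
import numpy as np

def f(net):
    """Sigmoid activation used for both layers."""
    return 1.0 / (1.0 + np.exp(-net))

# Assumed training set: XOR input/output pairs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)              # pseudorandom initial weights
W_h = rng.uniform(-1, 1, (3, 3))            # (2 inputs + bias) -> 3 hidden units
W_o = rng.uniform(-1, 1, (4, 1))            # (3 hidden + bias) -> 1 output unit
eta = 0.5

def forward(x):
    h = f(np.append(x, 1.0) @ W_h)          # append 1.0 as the bias input
    o = f(np.append(h, 1.0) @ W_o)
    return h, o

def total_error():
    return sum(float(np.sum((y - forward(x)[1]) ** 2)) for x, y in zip(X, Y))

e_start = total_error()
for _ in range(2000):                                       # training epochs
    for x, y in zip(X, Y):                                  # step 1: pick a pattern
        h, o = forward(x)                                   # step 2: forward pass
        delta_o = (y - o) * o * (1 - o)                     # step 3: output error
        delta_h = h * (1 - h) * (W_o[:-1] @ delta_o)        # step 4: hidden error
        W_o += eta * np.outer(np.append(h, 1.0), delta_o)   # step 6: output weights
        W_h += eta * np.outer(np.append(x, 1.0), delta_h)   # step 5: hidden weights
e_end = total_error()
assert e_end < e_start   # gradient descent has reduced the network error E
```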
Supervised Learning in the BPN
The only thing that we need to know before we can start our network is the
derivative of our sigmoid function, for example, f′(net_k) for the output neurons:

f(net_k) = 1 / (1 + e^(−net_k))

f′(net_k) = ∂f(net_k)/∂net_k = o_k · (1 − o_k)
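The identity f′(net) = o·(1 − o) is easy to check against a numerical derivative (a sketch; the function names are assumptions):

```python
import math

def f(net):
    """Sigmoid: f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def f_prime(net):
    """Analytic derivative: f'(net) = f(net) * (1 - f(net)) = o * (1 - o)."""
    o = f(net)
    return o * (1 - o)

# Compare against a central-difference numerical derivative at a few points
for net in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (f(net + h) - f(net - h)) / (2 * h)
    assert abs(numeric - f_prime(net)) < 1e-6
```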
Supervised Learning in the BPN
Now our BPN is ready to go!
If we choose the type and number of neurons in our network appropriately, after
training the network should show the following behavior:
• If we input any of the training vectors, the network
should yield the expected output vector (with some
margin of error).
• If we input a vector that the network has never
“seen” before, it should be able to generalize and
yield a plausible output vector based on its
knowledge about similar input vectors.
Self-Organizing Maps (Kohonen Maps)
In the BPN, we used supervised learning.
This is not biologically plausible: In a biological system, there is no external
“teacher” who manipulates the network’s weights from outside the network.
Biologically more adequate: unsupervised learning.
We will study Self-Organizing Maps (SOMs) as an example of unsupervised learning
(Kohonen, 1980).
Self-Organizing Maps (Kohonen Maps)
Such topology-conserving mapping can be achieved by SOMs:
• Two layers: input layer and output (map) layer
• Input and output layers are completely connected.
• Output neurons are interconnected within a defined
neighborhood.
• A topology (neighborhood relation) is defined on
the output layer.
Self-Organizing Maps (Kohonen Maps)
A neighborhood function φ(i, k) indicates how closely neurons i and k in the output
layer are connected to each other.
Usually, a Gaussian function on the distance between the two neurons in the layer
is used:

[Figure: Gaussian neighborhood function of the distance between the position of i and the position of k in the output layer]
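Such a Gaussian neighborhood function might be implemented as (a sketch; the name `phi` follows the text, the signature is an assumption):

```python
import math

def phi(pos_i, pos_k, sigma):
    """Gaussian neighborhood: near 1 when i and k are close on the map, -> 0 far apart."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_k))
    return math.exp(-d2 / (2 * sigma ** 2))

print(phi((0, 0), (0, 0), 1.0))   # 1.0 — the winner itself
print(phi((0, 0), (3, 4), 1.0))   # distance 5 on the map: essentially 0
```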
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:

(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k:
    ||wk – x|| = mini ||wi – x|| (Euclidean distance)
(4) Update the weight vectors of all neurons i in the
    neighborhood of neuron k: wi := wi + η·φ(i, k)·(x – wi)
    (wi is shifted towards x)
(5) If convergence criterion met, STOP.
    Otherwise, narrow the neighborhood function φ and the learning
    parameter η and go to (2).
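Steps (1) to (5) can be sketched as a small loop (an assumed illustration: a 1-D map of 4 neurons on 1-D data, with η and σ narrowed by a fixed decay factor; all names and constants are assumptions):

```python
import math
import random

def som_train(data, n_out, positions, eta=0.5, sigma=1.0, steps=200, seed=0):
    """Minimal SOM loop: winner search plus neighborhood-weighted updates."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[rng.uniform(0, 1) for _ in range(dim)] for _ in range(n_out)]   # step (1)
    for _ in range(steps):
        x = rng.choice(data)                                             # step (2)
        k = min(range(n_out),                                            # step (3)
                key=lambda i: sum((w[i][d] - x[d]) ** 2 for d in range(dim)))
        for i in range(n_out):                                           # step (4)
            d2 = sum((positions[i][j] - positions[k][j]) ** 2
                     for j in range(len(positions[0])))
            h = math.exp(-d2 / (2 * sigma ** 2))   # Gaussian neighborhood phi(i, k)
            for d in range(dim):
                w[i][d] += eta * h * (x[d] - w[i][d])   # wi shifted towards x
        eta *= 0.99                                # step (5): narrow eta...
        sigma *= 0.99                              # ...and the neighborhood
    return w

# 1-D map of 4 neurons exposed to data clustered near 0 and 1
data = [(0.0,), (0.1,), (0.9,), (1.0,)]
w = som_train(data, 4, positions=[(0,), (1,), (2,), (3,)])
```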
