
UNIT-IV

HOPFIELD NETWORKS

AS PER JNTU YOUR SYLLABUS IS:

• Introduction Hopfield Networks


• The Hopfield Network Model
• Hopfield Network Algorithm
• Boltzmann's Machine
• Applications of Hopfield N/Ws
• Associative Memories (AM)
• Bidirectional Associative Memories

Kinds of Neural Networks
 The perceptron: Single unit, limited in what it can learn.
 Multi-layer feed forward network: Can learn many different
patterns. In particular, this is applied to supervised learning problems
where we have a set of data which has been put into classes.
 Competitive learning and Kohonen self-organizing maps:
Designed for unsupervised learning, that is, extracting patterns from
datasets without pre-classification.
 Content-addressable (also called associative) networks, such as Hopfield
networks, provide a kind of memory which can retrieve wholes from
parts.
 Recurrent networks allow neural networks to learn sequences of
actions in time.
Overall structure of Neural
Networks

Competitive Learning Networks


Main Principle: Winner-takes-all (WTA) neurons

 Among all competing nodes, only one will win and all others will lose
 We mainly deal with single winner WTA, but multiple winners WTA are possible
(and useful in some applications)
 Easiest way to realize WTA: have an external, central arbitrator (a program) to
decide the winner by comparing the current outputs of the competitors (break
the tie arbitrarily)

 This is biologically unsound (no such external arbitrator exists in the biological
nervous system).

The Jordan Network in Detail


• The plan units represent the current task.
• The state units allow the network to store information about the current
state of the network, so that the network can act differently depending upon
the state that it is in.
• These networks can be trained by a variant of backpropagation, where the
target pattern is a sequence of desired behaviour. The network will learn to
carry out a pattern of behaviour in time.

Main Types of Architectures in NN

Architectures (1) Feed Forward Networks

Some important points:

 The neurons are arranged in separate layers


 There is no connection between the neurons in the same layer
 The neurons in one layer receive inputs from the previous layer
 The neurons in one layer deliver their outputs to the next layer
 The connections are unidirectional
 (Hierarchical)

Structure of the network:

Architectures (2) Recurrent Networks


 Some connections are present from a layer to the previous layers

Structure of the network:

Architectures (3) Associative
networks
 There is no hierarchical arrangement
 The connections can be bidirectional

Hopfield Networks
Introduction H.N.:
 The Hopfield N/W was developed by John Hopfield in 1982 and had a major impact
on the field of NN. The N/W was very sophisticated and based on a coherent
theoretical picture.
 In 1982, J.J. Hopfield brought together several earlier ideas concerning these
networks and presented a complete mathematical analysis (based on Ising spin
models). This is why the network is generally referred to as the
Hopfield network.
 There are two main approaches for solving combinatorial optimization
problems using ANNs: Hopfield Networks & Kohonen’s Self-Organizing Feature
Maps.
 While the latter are mainly used on Euclidean problems, Hopfield networks
have been widely applied to different classes of combinatorial optimization
problems.
MODEL of H.N.:
 The Hopfield model is used as an auto-associative memory to store and recall
a set of bitmap images. This is associative recall of images: given an incomplete or
corrupted version of a stored image, the network can recall the original.
 A Hopfield net is a form of recurrent artificial neural network invented by
John Hopfield. Hopfield nets serve as content-addressable memory systems
with binary threshold units. They are guaranteed to converge to a local
minimum, but convergence to one of the stored patterns is not guaranteed.
 The Hopfield network demonstrates how the mathematical simplification of a
neuron can allow the analysis of the behaviour of large scale NNs.
 Hopfield provided the important link between local interactions & global
behaviour.
 NNs are complex and often contain nonlinear components and hence the
behavior of a NN is difficult to analyze.
 Hopfield applied ideas from an important and developing area of mathematics called
nonlinear systems theory.
 Almost all the networks discussed in the previous units were non-recurrent,
i.e. there is no feedback from the outputs of the network to their inputs.

 Non recurrent networks have a repertoire of behavior that is limited compared
to their recurrent counterparts.
 As recurrent networks have feedback paths from their outputs to their inputs,
the response of such networks is dynamic, i.e. after a new input is applied,
the output is calculated and fed back to modify the input.
 The output is again recalculated and this process is repeated again and again.
 If the network is stable, successive iterations produce smaller and smaller
output changes until eventually the output becomes constant.
 For many networks the process never ends; such networks are said to be
unstable. Unstable networks possess interesting properties and are usually
studied as examples of chaotic systems.
 Stability problems stymied early researchers. No one was able to predict
which networks would be stable and which would keep changing.
 Moreover, the problem appeared so difficult that many researchers were
pessimistic about finding a solution.
 But fortunately a powerful theorem that defines a subset of recurrent
networks whose outputs eventually reach a stable state was devised by
Cohen & Grossberg.
 This opened the door to further research. Many scientists today are exploiting
the complicated behaviour and capabilities of these systems.

ENERGY OF NEURAL NETWORKS


 A Hopfield net (Hopfield 1982) is a net of such units subject to the
asynchronous rule for updating one neuron at a time:
 Pick a unit i at random.
If Σj wij sj ≥ θi, turn it on; otherwise turn it off.
 Moreover, a Hopfield net assumes symmetric weights: wij = wji.
Hopfield defined the energy
 E = − ½ Σij si sj wij + Σi si θi
 If we pick unit i and the firing rule does not change its si, it will not change E.
si: 0 to 1 transition:
 If si initially equals 0, and Σj wij sj ≥ θi,
 then si goes from 0 to 1 with all other sj constant, and the “energy gap”, i.e. the
change in E, is given by
 ∆E = − ½ Σj (wij sj + wji sj) + θi
= − (Σj wij sj − θi)   (by symmetry)
≤ 0.
si: 1 to 0 transition:
 If si initially equals 1, and Σj wij sj < θi,
 then si goes from 1 to 0 with all other sj constant.
 The “energy gap”, or change in E, is given, for symmetric wij, by:
 ∆E = Σj wij sj − θi < 0. On every update we have ∆E ≤ 0.
Minimizing Energy
 On every update we have ∆E ≤ 0.
 Hence the dynamics of the net tend to move E toward a minimum.
 We stress that there may be several such states – they are local minima; global
minimization is not guaranteed.
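The update rule and energy function above can be checked numerically. The following sketch (the random weights, thresholds, and helper names are illustrative, not from the notes) verifies that every asynchronous update leaves E unchanged or lower:

```python
import numpy as np

def energy(W, s, theta):
    """E = -1/2 * sum_ij s_i s_j w_ij + sum_i s_i theta_i (with w_ii = 0)."""
    return -0.5 * s @ W @ s + s @ theta

def async_update(W, s, theta, i):
    """Firing rule for unit i: on if its weighted input reaches the threshold."""
    s = s.copy()
    s[i] = 1.0 if W[i] @ s >= theta[i] else 0.0
    return s

rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, n))
W = (A + A.T) / 2            # symmetric weights: w_ij = w_ji ...
np.fill_diagonal(W, 0.0)     # ... with no self-connections
theta = rng.normal(size=n)
s = rng.integers(0, 2, size=n).astype(float)

e_start = energy(W, s, theta)
for _ in range(100):         # pick units at random, one at a time
    e_before = energy(W, s, theta)
    s = async_update(W, s, theta, rng.integers(n))
    assert energy(W, s, theta) <= e_before + 1e-12   # energy gap is always <= 0
print(e_start, energy(W, s, theta))
```

Because the energy never increases and there are finitely many states, the updates must eventually settle into a (local) minimum.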

Hopfield model network architecture:

➢ N processing units (binary)
➢ Fully connected: N(N−1) connections
➢ Single layer (no hidden layer)
➢ Recurrent (feedback) network: no self-feedback loops
 8
• Learning process
• Let ξ1, ξ2, ξ3, ⋅⋅⋅, ξM denote a known set of N-dimensional memories. The
weight matrix is

W = (1/N) ( Σµ=1..M ξµ ξµᵀ − M I )
 Inputting and updating


 Let ξprobe denote an unknown N-dimensional input vector.

– Update asynchronously (i.e., randomly and one at a time)
according to the rule xj(n+1) = sgn( Σi wji xi(n) ).

 Convergence and Outputting

 Repeat updating until the state vector remains unchanged.

 Let xfixed denote the fixed point (stable state). The output is

y = xfixed
• Associated memories

E = − ½ Σj Σi≠j wji xj xi

∆Ej = Ej(n+1) − Ej(n) = − ∆xj Σi≠j wji xi

– Memory vectors ξ1, ξ2, ξ3, ⋅⋅⋅, ξM are states that
correspond to minima of E.

– Any input vector converges to the stored memory vector that is
most similar or most accessible to the input.

– N = 3 example

– Let (1,−1,1), (−1,1,−1) denote the stored memories. (M = 2)

W = (1/3) ×
[  0  −2   2 ]
[ −2   0  −2 ]
[  2  −2   0 ]

Limitations of Hopfield model:
1) The stored memories are not always stable.

➢ The signal-to-noise ratio is roughly N/M, which deteriorates for large M.

➢ The quality of memory recall breaks down at about M = 0.14N.

2) There may be stable states that were not the stored memories.
(Spurious states)

3) A stable state may not be the state that is most similar to the
input state.

On a scale-free neural network:


 Network architecture: the BA scale-free network
 A small core of m nodes. (fully connected)
 N (≫m) nodes are added.
➢ Total N + m processing units.
➢ Total Nm connections. (for 1≪m≪N)

(Figure: the fully connected core, m = 7.)

Hopfield pattern recognition
 Stored P different patterns: ξiµ ( µ = 1, 2, ⋅⋅⋅, P )
 Input pattern: a 10% reversal of ξi1 (overlap = 0.8)
 Output pattern: Si
 The quality of recognition is measured by the overlap:
 Ψ = (1/N) Σi Si ξi1

Discrete Hopfield NN:
 Input vectors values are in {-1,1} (or {0,1}).
 The number of neurons is equal to the input dimension.
 Every neuron has a link from every other neuron (recurrent architecture)
except itself (no self-feedback).
 The neuron state at time n is its output value.
 The network state at time n is the vector of neurons states.
 The activation function used to update a neuron state is the sign
function, except that if the input to the activation function is 0 the new
output (state) of the neuron is kept equal to the old one.
 Weights are symmetric: Wij = Wji

How do we compute the weights?

 N: input dimension.

 M: number of patterns (called fundamental memories) used to
compute the weights.

 fµ,i: the i-th component of the µ-th fundamental memory.

 xi(n): state of neuron i at time n.

Training Hopfield NN
1. Storage. Let f1, f2, … , fM denote a known set of N-dimensional
fundamental memories. The weights of the network are:

wji = (1/N) Σµ=1..M fµ,j fµ,i   for j ≠ i
wji = 0                        for j = i

where wji is the weight from neuron i to neuron j. The elements of
the vectors fµ are in {-1,+1}. Once they are computed, the synaptic
weights are kept fixed.

2. Initialisation. Let xprobe denote an input vector (probe)
presented to the network. The algorithm is initialised by setting:

xj(0) = xprobe,j   j = 1, ... , N

where xj(0) is the state of neuron j at time n = 0, and xprobe,j is the
jth element of the probe vector xprobe.

3. Iteration Until Convergence. Update the elements of the
network state x(n) asynchronously (i.e. randomly and one at a
time) according to the rule

xj(n+1) = sign( Σi=1..N wji xi(n) )   j = 1, 2, ... , N

Repeat the iteration until the state x remains unchanged.

4. Outputting. Let xfixed denote the fixed point or stable state,
that is such that x(n+1)=x(n), computed at the end of step 3.
The resulting output y of the network is:
y =x fixed
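The four steps above (storage, initialisation, asynchronous iteration, outputting) can be sketched in Python; the helper names and the tiny example patterns are illustrative, not from the notes:

```python
import numpy as np

def train(memories):
    """Storage: w_ji = (1/N) * sum_mu f_mu,j * f_mu,i, with zero diagonal."""
    F = np.asarray(memories, dtype=float)
    N = F.shape[1]
    W = F.T @ F / N
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, probe, rng, max_sweeps=100):
    """Initialise with the probe, then update asynchronously until stable."""
    x = np.asarray(probe, dtype=float).copy()
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for j in rng.permutation(n):        # random order, one unit at a time
            new = np.sign(W[j] @ x)
            if new != 0 and new != x[j]:    # sign input 0: keep the old state
                x[j] = new
                changed = True
        if not changed:                     # fixed point: x(n+1) == x(n)
            return x
    return x

rng = np.random.default_rng(1)
mem = np.array([[1, 1, 1, -1, -1, -1, 1, 1],
                [-1, -1, 1, 1, -1, 1, -1, 1]])
W = train(mem)
probe = mem[0].copy()
probe[0] = -probe[0]             # corrupt one bit of the first memory
print(recall(W, probe, rng))     # recalls the stored pattern
```

With only two well-separated memories in eight dimensions, the corrupted probe falls back into the basin of the first stored pattern.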

Recurrent Networks Introduction:


 Elman and Hopfield networks: these model how neocortical principal
cells are believed to function.
 Elman networks are two-layer back-propagation networks with
recurrence, and can learn temporal patterns. Whether they’re sufficiently
general to model how the brain does the same thing is a research
question.
 Hopfield networks give you autoassociative memory

RECURRENT NETWORKS & BINARY SYSTEMS:
 This network consists of two layers.
 Although this network differs somewhat from the format found in the work of
Hopfield and others, it is still functionally equivalent to that model.

 Layer 0, as in the networks discussed in previous units, serves no
computational function. It simply distributes the network outputs back
to the inputs.
 On the other hand, each layer-1 neuron computes the weighted sum of
its inputs, producing a NET signal that is then operated on by the non-
linear function F to yield the OUT signal.
 The function F was a simple threshold in the earlier works of Hopfield.
 The output of such a neuron is one if the weighted sum of the outputs of
the other neurons is greater than a threshold Tj, and zero otherwise:
NETj = Σi≠j Wij OUTi + INj
OUTj = 1 if NETj > Tj
OUTj = 0 if NETj < Tj
OUTj remains unchanged if NETj = Tj.

 The “State of a Network” is defined as the set of the current values of


OUT signals from all the neurons.
 The state of each neuron changed at discrete random times in the
original Hopfield network.
 But in later work the neuron states could change simultaneously.
 As the output of a “binary neuron” can only be zero or one, the
current state of the network forms a binary number, each bit of which
represents the OUT signal from a neuron. It should be noted that
intermediate levels between 0 & 1 are not allowed.

STABILITY
 The weights between layers in this network may be considered to form a
matrix W.
 Cohen & Grossberg have proved that such recurrent networks are stable if
the matrix is “symmetric” with zeros on its main diagonal.
 A symmetric matrix has the property Wij = Wji for i not equal to j
and Wii = 0 for all i.
 The stability of such a network can be proved by the following mathematical
technique.
 Suppose that a function can be found which always decreases each time
the network changes state.
 Eventually this function must reach a minimum and stop, thereby
ensuring that the network is stable.
Here the energy function is
E = − ½ Σi≠j Wij OUTi OUTj − Σj Ij OUTj + Σj Tj OUTj
where:
E is the artificial network energy,
Wij is the weight from the output of neuron i to the input of neuron j,
OUTj is the output of neuron j,
Ij is the external input to neuron j,
Tj is the threshold of neuron j.
 The change in energy δE, due to a change in the state of neuron j, is given
by:
δE = − [ Σi≠j (Wij OUTi) + Ij − Tj ] δOUTj
= − [NETj − Tj] δOUTj
where δOUTj is the change in the output of
neuron j.
 Case 1: the NET value of neuron j is greater than the threshold
value. This makes the term in brackets positive, and by the update
equation
NETj = Σi≠j Wij OUTi + INj,
the output of neuron j must change in the positive direction or else
remain constant. This implies that δOUTj can only be positive or zero,
so δE must be negative or zero. Therefore the network energy must either
decrease or stay constant.
 Case 2: NET is less than the threshold value. Then δOUTj can be
only negative or zero. Hence the energy is again restricted to either
decrease or stay constant.
 Case 3: if NET equals the threshold, then δOUTj is zero and the energy
remains unchanged.

 This means that any change in the state of a neuron will either reduce
the energy or maintain its current value.
 Since the energy shows a continuous downward trend, eventually it
must reach a minimum and stop at that point. (By definition such networks
are said to be stable.)
 The network (weight) symmetry criterion is sufficient, but not necessary,
to define a stable system.
 There are many stable systems which do not satisfy this criterion, e.g. all
feed-forward networks.
 A minute deviation from symmetry can produce continuous oscillations;
in practice, however, approximate symmetry is usually adequate to produce stable
systems.

Bidirectional Recurrent Neural
Networks
 One of the methods used to try to overcome these limitations consists of
using bidirectional recurrent neural networks (BRNNs).
 An RNN is a neural network that allows “backward” connections. This
means that it can re-process its own output.
 Our brain, as you can surely guess, is a recurrent network.
 Sadly, the issues of RNN training and architecture are too complicated
for this lecture. An intuitive explanation will be presented, however.
The Network structure is:

 The input, vector It, encodes the external input at time t. In the
simplest case, it encodes one amino acid, using orthogonal encoding.

 The output prediction has the functional form

Ot = η(Ft , Bt , It)

and depends on the forward (upstream) context Ft, the backward


(downstream context) Bt, and the input It at time t.

The Past and the Future

 Ft and Bt store information about the “past” and the “future” of the
sequence. They make the whole difference, because now we can utilize
global information.
 The functions satisfy the recurrent bidirectional equations:
Ft = φ(Ft-1, It)
Bt = β(Bt+1, It)
where φ() and β() are learnable nonlinear state transition
functions, implemented by two NNs (left and right subnetworks in the
picture).
 Intuitively, we can think of Ft and Bt as “wheels” that can be rolled
along the protein.
 To predict the class at position t, we roll the wheels in opposite
directions from the N- and C-terminus up to position t and then combine
what is read on the wheels with It to calculate the proper output using η.
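The "two wheels" picture can be sketched numerically. Everything in this sketch — the tanh transition functions, the softmax output η, all dimensions, and the random weights standing in for learned parameters — is an illustrative assumption, not the architecture from any specific paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_state = 6, 4, 5          # sequence length and assumed sizes

# Transition/output parameters (random here; these would be learned).
Wf, Uf = rng.normal(size=(d_state, d_state)), rng.normal(size=(d_state, d_in))
Wb, Ub = rng.normal(size=(d_state, d_state)), rng.normal(size=(d_state, d_in))
V = rng.normal(size=(3, 2 * d_state + d_in))   # 3 output classes

I = rng.normal(size=(T, d_in))      # input sequence I_1 .. I_T

# Forward wheel: F_t = phi(F_{t-1}, I_t), rolled from the left end.
F = np.zeros((T + 1, d_state))
for t in range(1, T + 1):
    F[t] = np.tanh(Wf @ F[t - 1] + Uf @ I[t - 1])

# Backward wheel: B_t = beta(B_{t+1}, I_t), rolled from the right end.
B = np.zeros((T + 2, d_state))
for t in range(T, 0, -1):
    B[t] = np.tanh(Wb @ B[t + 1] + Ub @ I[t - 1])

# Output at position t combines past, future and present: O_t = eta(F_t, B_t, I_t).
def predict(t):
    z = V @ np.concatenate([F[t], B[t], I[t - 1]])
    e = np.exp(z - z.max())
    return e / e.sum()              # softmax over the classes

print(predict(3))
```

The point of the sketch is the data flow: F is filled left-to-right, B right-to-left, and each prediction sees both, which is exactly what gives the model access to global context.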

How a Boltzmann Machine models
data
 It is not a causal generative model (like a sigmoid belief net) in which we
first generate the hidden states and then generate the visible states
given the hidden ones.
 To generate a sample from the model, we just keep stochastically
updating the binary states of all the units

 After a while, the probability of observing any particular vector on


the visible units will have reached its equilibrium value.

Restricted Boltzmann
Machines
 We restrict the connectivity to make learning easier.

 Only one layer of hidden units. No connections between hidden units.

 In an RBM, the hidden units really are conditionally independent given


the visible states. It only takes one step to reach conditional equilibrium
distribution when the visible units are clamped.

 So we can quickly get an unbiased sample from the posterior
distribution over the hidden units given a data vector.

Weights → Energies → Probabilities


 Each possible joint configuration of the visible and hidden units has an
energy
 The energy is determined by the weights and biases.
 The energy of a joint configuration of the visible and hidden units
determines its probability.
 The probability of a configuration over the visible units is found by
summing the probabilities of all the joint configurations that contain it.
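The last point — that a visible configuration's probability is the sum of the joint probabilities of all configurations containing it — can be checked by brute-force enumeration on a tiny RBM. The sizes are arbitrary, and the energy form E(v,h) = −aᵀv − bᵀh − vᵀWh is the standard RBM energy with visible biases a and hidden biases b:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 3, 2
W = rng.normal(scale=0.5, size=(n_v, n_h))
a = rng.normal(scale=0.1, size=n_v)    # visible biases
b = rng.normal(scale=0.1, size=n_h)    # hidden biases

def E(v, h):
    """Energy of a joint configuration of visible and hidden units."""
    return -(a @ v + b @ h + v @ W @ h)

configs = [(np.array(v), np.array(h))
           for v in itertools.product([0, 1], repeat=n_v)
           for h in itertools.product([0, 1], repeat=n_h)]
Z = sum(np.exp(-E(v, h)) for v, h in configs)   # partition function

def p_joint(v, h):
    return np.exp(-E(v, h)) / Z

def p_visible(v):
    """Sum the joint probabilities of all configurations containing v."""
    return sum(p_joint(v, np.array(h))
               for h in itertools.product([0, 1], repeat=n_h))

total = sum(p_visible(np.array(v)) for v in itertools.product([0, 1], repeat=n_v))
print(total)
```

Enumeration is only feasible for toy sizes, which is precisely why sampling methods are needed for real RBMs.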

Boltzmann Machine
 The Boltzmann machine is a stochastic version of the Hopfield model.
Used for optimization problems such as the classic traveling salesman
problem
 Those are only a few of the more common network structures.
Advanced users can build networks designed for a particular problem in
many software packages readily available on the market today.
 The Boltzmann machine is similar in function and operation to the
Hopfield network with the addition of using a simulated annealing
technique when determining the original pattern.
 The Boltzmann machine incorporates the concept of simulated
annealing to search the pattern layer's state space for a global
minimum. Because of this, the machine will gravitate to an improved
set of values over time as data iterates through the system.
 Ackley, Hinton, and Sejnowski developed the Boltzmann learning
rule in 1985. Like the Hopfield network, the Boltzmann machine has an
associated state space energy based upon the connection weights in the
pattern layer.
 The processes of learning a training set full of patterns involves the
minimization of this state space energy. Because of this, the machine
will gravitate to an improved set of values for the connection weights
while data iterates through the system.
 The Boltzmann machine requires a simulated annealing schedule, which
is added to the learning process of the network. Just as in physical
annealing, the temperature starts at a high value and decreases over time.
 The increased temperature adds an increased noise factor into each
processing element in the pattern layer. Typically, the final temperature
is zero. If the network fails to settle properly, adding more iterations at
lower temperatures may help it reach an optimal solution.
 A Boltzmann machine learning at high temperature behaves much like a
random model and at low temperatures it behaves like a deterministic model.
Because of the random component in annealed learning, a processing element
can sometimes assume a new state value that increases rather than
decreases the overall energy of the system. This mimics physical annealing
and is helpful in escaping local minima and moving toward a global minimum.
 As with the Hopfield network, once a set of patterns are learned, a partial
pattern can be presented to the network and it will complete the missing
information. The limitation on the number of classes, being less than fifteen
percent of the total processing elements in the pattern layer, still applies.

Different types of Boltzmann


machine
 Higher-order Boltzmann machines
 The stochastic dynamics and the learning rule can accommodate more
complicated energy functions (Sejnowski, 1986). For example, the
quadratic energy function can be replaced by an energy function whose
typical term is si sj sk wijk. The total input to a unit that is used in the
update rule must then be replaced by a sum over pairs of other units,
and the only change in the learning rule is that the pairwise statistics
si sj are replaced by the corresponding triple statistics si sj sk.
 Conditional Boltzmann machines

 Boltzmann machines model the distribution of the data vectors, but there is a
simple extension for modeling conditional distributions (Ackley et al., 1985).
The only difference between the visible and the hidden units is that, when
sampling the data-driven statistics, the visible units are clamped and the hidden
units are not. If subsets of the visible units are also clamped when
sampling the model’s statistics, this subset acts as "input" units and the
remaining visible units act as "output" units. The same learning rule applies,
but now it maximizes the log probabilities of the observed output vectors
conditional on the input vectors.

 Mean field Boltzmann machines

 Instead of using units that have stochastic binary states, it is possible to
use "mean field" units that have deterministic, real-valued states
between 0 and 1, as in an analog Hopfield net. The logistic equation
is used to compute an "ideal" value for a unit’s
state given the current states of the other units, and the actual value is moved
towards the ideal value by some fraction of the difference. If this fraction is
small, all the units can be updated in parallel. The same learning rules can be
used by simply replacing the stochastic, binary values by the deterministic
real values (Peterson and Anderson, 1987), but the learning algorithm is hard
to justify and mean field nets have problems modeling multi-modal
distributions.

 Non-binary units
 The binary stochastic units used in Boltzmann machines can be generalized to
"softmax" units that have more than 2 discrete values, Gaussian units whose
output is simply their total input plus Gaussian noise, binomial units, Poisson
units, and any other type of unit that falls in the exponential family (Welling
et al., 2005). This family is characterized by the fact that the adjustable
parameters have linear effects on the log probabilities. The general form of
the gradient required for learning is simply the change in the sufficient
statistics caused by clamping data on the visible units.

Boltzmann Machine Model:


 Boltzmann Machine neural net was introduced by Hinton and Sejnowski
in 1983.
 Used for solving constrained optimization problems.
 Typical Boltzmann Machine:
 Weights are fixed to represent the constraints of the problem and
the function to be optimized.
 The net seeks the solution by changing the activations of the units
(0 or 1) based on a probability distribution and the effect that the
change would have on the energy function or consensus function
for the net.
 The objective of the neural net is to maximize the consensus function:

C = Σi Σj≤i wij xi xj

where: wij – weight of the connection, xi, xj – the states of the Xi and
Xj units.

If units are connected: wij ≠ 0. The bidirectional nature of the
connections means: wij = wji.

 The sum runs over all units of the net.

Simulated annealing
 Simulated annealing is a general method for making likely the escape
from local minima by allowing jumps to higher energy states.

 The analogy here is with the process of annealing used by a craftsman


in forging a sword from an alloy.
 He heats the metal, then slowly cools it as he hammers the blade into
shape.
 If he cools the blade too quickly the metal will form patches of different
composition;
 If the metal is cooled slowly while it is shaped, the constituent metals
will form a uniform alloy.
 The net finds the maximum (or at least a local maximum) of the consensus by
letting each unit attempt to change its state (from 1 to 0 or vice versa). The
change in consensus if unit Xi were to change its state is:

∆C(i) = [1 − 2xi] ( wii + Σj≠i wij xj )

where:

xi – the current state of unit Xi

[1 – 2xi] – +1, if Xi is ‘off’; -1, if Xi is ‘on’

 Unit Xi does not necessarily change its state, so the probability of the net
accepting a change in state for Xi is:

A(i, T) = 1 / ( 1 + exp( −∆C(i) / T ) )

where:

T (temperature) – a control parameter that is reduced as the net searches for a
maximal consensus.

This process of gradually reducing the temperature is called simulated
annealing. It is used to reduce the probability of the net becoming
trapped in a local optimum which is not the global optimum.
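The procedure above can be sketched end to end. The consensus form C = Σi Σj≤i wij xi xj, the flip gain ∆C(i), and the acceptance probability follow the formulas in this section; the random weights and the cooling rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.normal(size=(n, n))
W = (A + A.T) / 2                     # symmetric: w_ij = w_ji
x = rng.integers(0, 2, size=n)        # unit states in {0, 1}

def consensus(W, x):
    """C = sum_{i<j} w_ij x_i x_j + sum_i w_ii x_i (x_i^2 = x_i for binary x)."""
    return 0.5 * (x @ W @ x) + 0.5 * np.diag(W) @ x

def delta_C(W, x, i):
    """Change in consensus if unit X_i were to flip its state."""
    net = W[i] @ x - W[i, i] * x[i]   # sum over j != i of w_ij x_j
    return (1 - 2 * x[i]) * (W[i, i] + net)

T = 10.0
for _ in range(2000):
    i = rng.integers(n)
    dC = delta_C(W, x, i)
    # Accept with probability 1 / (1 + exp(-dC/T)): uphill moves are likely,
    # downhill moves remain possible while T is high.
    if rng.random() < 1.0 / (1.0 + np.exp(-dC / T)):
        x[i] = 1 - x[i]
    T = max(0.01, T * 0.995)          # annealing schedule: cool gradually
print(consensus(W, x))
```

Early on, high T lets the net accept consensus-lowering flips and escape local optima; as T falls, the dynamics become nearly deterministic.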

Architecture of Boltzmann Machine

Boltzmann Machine Learning Rule
 The Learning Rule was proposed by Ackley, Hinton and Sejnowski in 1985.

 Extends Hopfield model with learning:

 Each neuron fires with bipolar values.

 All connections are symmetric.

 In activation passing, the next neuron whose state we wish to update is


selected randomly.

 There is no self-feedback (connections from a neuron to itself).

 Operation during training is probabilistic and based on correlations.

 Deterministic operation once weights determined.

Simple Boltzmann Machine


 Boltzmann Machine with hidden and visible neurons. The network is fully
connected with symmetric connections

Boltzmann Machine Structure
 The Boltzmann Machine is a Hopfield network, in which

 The neurons are divided into two subsets:

 Visible, which is further divided into:

 Input and Output

 Hidden

This allows a much richer representation of the input data.

 The neurons are stochastic: at any time there is a probability attached
to whether a neuron fires, whereas the Hopfield net is based on
deterministic principles.

May use either supervised or unsupervised learning

Boltzmann Machine Operation


 We will concentrate on the unsupervised learning methods.

 There are three phases in operation of the network:

 The clamped phase in which the input and output of visible


neurons are held fixed, while the hidden neurons are allowed to
vary.

 The free running phase in which only the inputs are held fixed and
other neurons are allowed to vary.

 The learning phase.

 These phases iterate until learning has created a Boltzmann Machine
which can be said to have learned the input patterns, and which will converge
to a learned pattern when a noisy or incomplete pattern is presented.

Generalized Networks-Clamped
Phase
 Generally the initial weights of the net are randomly set to values in a small
range e.g. -0.5 to +0.5.

 Then an input pattern is presented to the net and clamped to the visible
neurons.

 Now perform simulated annealing on the net: choose a hidden neuron at
random and flip its state from sj to −sj with probability:

P( sj → −sj ) = 1 / ( 1 + exp(−∆E / T) )

where the energy of the net is

E = − ½ Σj Σi≠j wji sj si

with wji the weight between neurons i and j, and sj, si the states of the j-th
and i-th neurons.

 The minima of the “energy” function correspond to stable configurations of
the network.

 The activation passing can continue until the net reaches a low-energy state.

 Because of the stochastic nature of the Boltzmann Machine, we cannot specify a
single state which will be the attractor of the system.

 But the net will reach a state of thermal equilibrium in which individual
neurons change state, and the probability of any single state can be
calculated.

 For a system in any state α with associated energy Eα at temperature
T, the probability is:

P(α) = ( exp( -Eα / T ) ) / ( ∑β exp(-Eβ /T) )
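For a toy net, this equilibrium distribution can be computed exactly by enumerating all states (the random weights and sizes here are illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, T = 4, 1.0
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

def E(s):
    """Energy of state s (s_i in {-1,+1}): E = -1/2 sum_{i!=j} w_ji s_j s_i."""
    return -0.5 * s @ W @ s

states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
Z = sum(np.exp(-E(s) / T) for s in states)       # normalizer over all states

def P(s):
    # P(alpha) = exp(-E_alpha / T) / sum_beta exp(-E_beta / T)
    return np.exp(-E(s) / T) / Z

probs = [P(s) for s in states]
print(sum(probs))                                 # probabilities sum to 1
best = min(states, key=E)
assert abs(P(best) - max(probs)) < 1e-15          # lowest energy: most probable
```

This also makes the role of T visible: as T shrinks, the distribution concentrates on the lowest-energy states, which is exactly what the annealing schedule exploits.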

 The updating can be presented as a local operation, where |vj| is the
absolute value of the jth neuron’s activation.

 As the temperature is gradually dropped, the net goes to as low an energy
state as it can at each temperature. The correlations
between the firing of pairs of neurons at the final temperature are then:

ρ+ij = ‹sj si›+

where ‘+’ indicates that the correlation is calculated when the visible
neurons are in a clamped state.

Free Running Phase:


 Here we need to repeat the same calculations, but do not clamp the
visible neurons.

 After presentation of the input patterns all neurons can update their
states and the annealing schedule is performed (as before).

 And again, the correlations between the firing of pairs of neurons at the
final temperature:

ρ-ij = ‹sj si›-
where ‘-’ indicates that the correlation is calculated when the visible
neurons are not in a clamped state.

Learning Phase
 Here we use the Boltzmann Machine’s learning rule to update the
weights:

∆wij = η ( ρ+ij − ρ-ij )

where η is a learning rate.

This means: whether a weight is changed depends on the
difference between the correlations in clamped vs. free mode.

 By applying this learning rule the pattern completion property of the


Boltzmann Machine is established.

Applications
 The weighted matching problem:
 A set of N points with a known “distance” between each pair.

 Link the points together in pairs so as to minimize the total length


of the links.

 The Traveling Salesman Problem.

 Graph bipartitioning:

 A set of points which will be split into two disjoint sets with as low
an associated cost as possible.

Conclusion of Boltzmann machine


 Learning in Boltzmann Machine is accomplished by using a Simulated
Annealing technique which has stochastic nature.

 Boltzmann Machine:

 Global search: finding the global minimum/maximum of potential


energy.

 The concept specifies that the neural net is first operated at a
high temperature, which is gradually lowered until the net is
trapped in an equilibrium configuration around a single minimum of
the energy function.

 Simulated Annealing:

 Local search: finding relative minimum/maximum of potential


energy.

 Force search out of local regions by accepting suboptimal state


transitions with decreasing probability.

Associative Memories
 Motivation

 Human ability to retrieve information from applied associated


stimuli

 Ex. Recalling one’s relationship with another after not seeing


them for several years despite the other’s physical changes
(aging, facial hair, etc.)

 Enhances human awareness and deduction skills and


efficiently organizes vast amounts of information

 Why not replicate this ability with computers?

 Ability would be a crucial addition to the Artificial Intelligence


Community in developing rational, goal oriented, problem
solving agents

 One realization of Associative Memories is Content-
Addressable Memory (CAM)

Capacity versus Robustness Challenge for


Associative Memories

 In early memory models, capacity was limited to the length of the
memory and allowed for negligible input distortion (old CAMs).

 Ex. Linear Associative Memory

 Recent years have increased the memory’s robustness, but sacrificed


capacity

 J. J. Hopfield’s proposed Hopfield Network

 Capacity: 2n , where n is the memory length

 Current research offers a solution which maximizes memory capacity


while still allowing for input distortion

 Morphological Neural Model

 Capacity: essentially limitless (2^n in the binary case)

 Allows for Input Distortion

 One Step Convergence

Morphological Memories
 Formulated using Mathematical Morphology Techniques
 Image Dilation
 Image Erosion
 Training Constructs Two Memories: M and W
 M used for recalling dilated patterns
 W used for recalling eroded patterns
 M and W are not sufficient…Why?
 General distorted patterns are both dilated and eroded
 solution: hybrid approach
 Incorporate a kernel matrix, Z, into M and W
 General distorted pattern recall is now possible!
 Input → MZ → WZ → Output
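As a sketch of how one of these memories can work, here is the auto-associative min-memory W built with min/max (morphological) matrix products, following the Ritter–Sussner formulation; the function names and the tiny 4-bit patterns are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def build_min_memory(X):
    """w_ij = min over stored patterns k of (x_i^k - x_j^k).

    X holds one pattern per column. For undistorted stored patterns
    the max-plus recall below is perfect in a single step
    (one-step convergence)."""
    return np.min(X[:, None, :] - X[None, :, :], axis=2)

def recall(W, x):
    # Morphological (max-plus) product: y_i = max_j (w_ij + x_j).
    return np.max(W + x[None, :], axis=1)

# Three illustrative 4-bit patterns, one per column.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
W = build_min_memory(X)
out = recall(W, X[:, 0])   # recovers the first stored pattern exactly
```

Since w_ii = 0 and every other term in the max is bounded by x_i, each stored pattern is a fixed point of the recall, which is why clean patterns are always recalled perfectly.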

Improving Limitations
 Experiment
 Construct a binary morphological auto-associative memory to recall bitmap images of capital alphabetic letters
 Use the Hopfield Model as a baseline
 Construct letters using the Microsoft Sans Serif font (block letters) and the Math5 font (cursive letters)
 Attempt recall 5 times for each pattern at each image distortion level: 0%, 2%, 4%, 8%, 10%, 15%, 20%, and 25%
 Use different memory sizes: 5 images, 10, 26, and 52
 Use the Average Recall Rate per memory size as a performance measure, where a recall is correct if and only if it is perfect

Results
 Morphological Model and Hopfield Model:
 Both degraded in performance as memory size increased
 Both recalled letters in the Microsoft Sans Serif font better than in the Math5 font
 Morphological Model:
 Always perfect recall with 0% image distortion
 Performance smoothly degraded as memory size and distortion
increased
 Hopfield Model:
 Never correctly recalled images when memory contained more than 5
images

HOPFIELD NETS AND OPTIMIZATION: Traveling
salesman problem:
 To design Hopfield nets to solve optimization problems: given a
problem, choose weights for the network so that E is a measure of the
overall constraint violation. A famous example is the traveling salesman
problem.
 Hopfield and Tank 1986 have constructed VLSI chips for such networks
which do indeed settle incredibly quickly to a local minimum of E.
 Unfortunately, there is no guarantee that this minimum is an optimal
solution to the traveling salesman problem.
 Experience shows it will be "a pretty good approximation," but
conventional algorithms exist which yield better performance.

The traveling salesman problem (TSP)
 There are n cities, with a road of length dij connecting city i to city j.
 The salesman wishes to find a way to visit the cities that is optimal
in two ways: each city is visited only once, and the total route is as
short as possible.

 This is an NP-complete problem: the only known algorithms (so far) to solve it exactly have exponential complexity.

Exponential Complexity
 Why is exponential complexity a problem?

 It means that the number of operations necessary to compute the exact solution of the problem grows exponentially with the size of the problem (here, the number of cities).

   exp(1) = 2.72
   exp(10) = 2.20 × 10^4
   exp(100) = 2.69 × 10^43
   exp(500) = 1.40 × 10^217
   exp(250,000) = 10^108,573

 (Most powerful computer ≈ 10^12 operations/second)
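For the symmetric TSP the brute-force search space itself grows factorially: fixing the starting city and halving for travel direction leaves (n − 1)!/2 distinct tours. A small helper (ours, not from the slides) makes the growth concrete:

```python
from math import factorial

def tour_count(n):
    """Number of distinct tours in a symmetric TSP with n >= 3 cities:
    fix the starting city and halve for direction -> (n - 1)! / 2."""
    return factorial(n - 1) // 2

print(tour_count(5))    # 12
print(tour_count(10))   # 181440
print(tour_count(20))   # 60822550204416000 -- hopeless for brute force
```

At 10^12 tour evaluations per second, even the 20-city case would take roughly 17 hours by exhaustive search, and each added city multiplies the work by about n.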

Solution representation
 Construct a Hopfield network with N² nodes.

 Semantics: nia = 1 if and only if town i is on place a in the tour.

 The tour length is then

   L = (1/2) ∑i,j,a dij nia (nj,a+1 + nj,a−1)

Energy function
 Energy function that enforces the constraints:

   ∑a nia = 1, ∀i        ∑i nia = 1, ∀a

 H = (1/2) ∑i,j,a dij nia (nj,a+1 + nj,a−1) + (γ/2) [ ∑a (1 − ∑i nia)² + ∑i (1 − ∑a nia)² ]

Connection weights
 Nodes within each row are connected with weight −γ
 Nodes within each column are connected with weight −γ
 Each node is connected to the nodes in the columns to its left and right with weight −dij
 Continuous activation
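The energy H described here, with its distance term and γ row/column penalties, can be evaluated directly for any candidate state. A NumPy sketch (the function name and the unit-square example are our assumptions, not from the slides):

```python
import numpy as np

def tsp_energy(n_state, d, gamma=2.0):
    """Energy of a candidate state in the Hopfield TSP encoding.

    n_state[i, a] = 1 if city i occupies tour position a.
    First term: distances between cities in neighbouring positions;
    second: gamma/2 penalty when a row or column does not sum to 1."""
    left = np.roll(n_state, 1, axis=1)    # left[j, a]  = n[j, a-1]
    right = np.roll(n_state, -1, axis=1)  # right[j, a] = n[j, a+1]
    length = 0.5 * np.einsum('ij,ia,ja->', d, n_state, left + right)
    penalty = 0.5 * gamma * (((1 - n_state.sum(axis=0)) ** 2).sum()
                             + ((1 - n_state.sum(axis=1)) ** 2).sum())
    return length + penalty

# Four cities on the corners of a unit square, visited in order 0-1-2-3.
pts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
valid = tsp_energy(np.eye(4), d)    # a valid tour: no penalty term
```

For a valid permutation matrix the penalty vanishes and the energy reduces to the tour length, which is exactly what the network's relaxation tries to minimise.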

38
H =
1
2
∑d
i , j ,a

+
γ 

ij

2



 a 
1 − ∑
i
nia



2

+ ∑
i



1 − ∑
a
nia 
2
Com
nia ( n j , a +1 + n j , a −1 )

 
 


p
TSP Network Connections

• Semantics: an N×N grid of units encodes the tour — the rows index the cities (A, B, C, ...) and the columns index the places 1–10 in the tour. (Figure showing the grid omitted.)
• This turns the TSP (Travelling Salesman Problem) into a combinatorial optimization over network states.
The TSP problem

   xij = 1 if arc i-j is in the tour, 0 otherwise

   Minimize   ∑i=1..n ∑j=1..n cij xij

   subject to ∑i=1..n xij = bj = 1   (j = 1, ..., n)
              ∑j=1..n xij = ai = 1   (i = 1, ..., n)
              X = (xij) ∈ S
              xij = 0 or 1           (i, j = 1, ..., n)

   where cij is the cost of moving, or the distance, from node i to node j.

 Hopfield and Tank (1985) showed how this problem can be solved by a
recurrent network.

 The first step is to map the problem onto the network so that solutions
correspond to states of the network.

 The problem for N cities are coded into an N by N network. The next
step is to construct an energy function that can eventually be rewritten
in the form of

   E = −(1/2) ∑i,j wij xi xj

and has minima associated with states that are valid solutions.
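That quadratic form is easy to evaluate; a small NumPy sketch (names are ours) showing that a pattern stored by the outer-product rule sits in a deeper energy minimum than a one-bit-corrupted copy of it:

```python
import numpy as np

def hopfield_energy(W, x):
    """E = -(1/2) * sum over i,j of w_ij * x_i * x_j."""
    return -0.5 * x @ W @ x

# Store one bipolar pattern with the outer-product rule (zero diagonal);
# the stored pattern sits in a deeper energy 'dip' than a corrupted copy.
p = np.array([1, -1, 1, 1, -1, -1])
W = np.outer(p, p)
np.fill_diagonal(W, 0)
noisy = p.copy()
noisy[0] = -noisy[0]
```

Relaxation follows the downhill slope of this E, which is why a state started near a stored pattern slides into it.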

 Let nodes be indexed according to their row and column so that yxi is
the output of the node for city x in tour position i and consider the sum

   ∑x ∑i ∑j≠i yxi yxj
 Each term is the product of a pair of single city outputs with different
tour positions. This term will tend to encourage rows to contain at most
a single “on” unit. Similar terms may be constructed to encourage single
units being “on” in columns, the existence of exactly ten units “on” in
the net and to foster a shortest tour.

The TSP energy function

   E = (A/2) ∑x ∑i ∑j≠i yxi yxj + (B/2) ∑i ∑x ∑y≠x yxi yyi + (C/2) (∑x ∑i yxi − n)²

Hopfield network as associative memory
 A primary application of the Hopfield network is as an associative memory. In this
case, the weights of the connections between the neurons have to be set so that the
states of the system corresponding to the patterns which are to be stored in the
network are stable.

 These states can be seen as 'dips' in energy space. When the network is cued
with a noisy or incomplete test pattern, it will restore the incorrect or missing
data by iterating to a stable state which is in some sense 'near' to the cued
pattern.

 We consider now NN models for unsupervised learning problems, called auto-association problems. Association is the task of mapping patterns to patterns.

 In an associative memory, the stimulus of an incomplete or corrupted pattern leads to the response of a stored pattern that corresponds in some manner to the input pattern.

 The NN model most commonly used for (auto-)association problems is the Hopfield network.

Two main types of associations
• For two patterns s and t:
– hetero-association (s != t): relating two different patterns
– auto-association (s = t): relating parts of a pattern with other
parts


Associative Memory
 Human memory operates in an associative manner; that is a portion of a
recollection can produce a larger related memory.
 A recurrent network forms an associative memory. Like human memory, a
portion of the desired data is supplied and the full data “memory” is returned.
 To make an associative memory using a recurrent network, the weights must
be selected to produce energy minima at desired vertices of the unit
hypercube.
 Hopfield (1984) has developed an associative memory in which the outputs
are continuous, ranging from +1 to -1, corresponding to the binary values 0
and 1, respectively. The memories are encoded as binary vectors and stored
in the weights according to the formula that follows:

   Wij = ∑d=1..m (OUTi,d OUTj,d)

 where m = the number of desired memories (output vectors),
       d = the index of a desired memory (output vector), and
       OUTi,d = the ith component of desired output vector d.
 This expression may be clarified by noting that the weight array W can be
found by calculating the outer product of each desired vector with itself (if the
desired vector has n components, this operation forms an n-by-n matrix) and
summing all of the matrices thus formed:

   W = ∑i DiT Di, where Di is the ith desired row vector.
 Once the weights are determined, the network may be used to produce the
desired output vector, even given an input vector that may be partially
incorrect or incomplete. To do so, the outputs of the network are first forced to
the values of this input vector.
 The input vector is removed and the network is allowed to "relax" toward the
closest deep minimum.
 Note that the network follows the local slope of the energy function, and it
may become trapped in a local minimum and not find the best solution in a
global sense.
 The Hopfield network implements a so-called associative (also called content
addressable) memory.
 A collection of patterns called fundamental memories is stored in the NN by
means of weights.
 Each neuron represents an attribute (dimension) of the input.
 The weight of the link between two neurons measures the correlation between
the two corresponding attributes over the fundamental memories. If the
weight is high then the corresponding attributes are often equal in the
fundamental memories.
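The storage and relaxation procedure described above can be sketched in a few lines of NumPy (the function names and the two 8-component memories are illustrative assumptions; synchronous sign updates stand in for the continuous dynamics):

```python
import numpy as np

def store(patterns):
    """Outer-product (Hebbian) rule: W = sum over memories d of
    outer(out_d, out_d), with the diagonal zeroed so no unit
    feeds back onto itself."""
    W = sum(np.outer(p, p) for p in patterns)
    np.fill_diagonal(W, 0)
    return W

def recall(W, x, steps=10):
    """Relax with synchronous sign updates until the state is stable."""
    for _ in range(steps):
        nxt = np.where(W @ x >= 0, 1, -1)
        if np.array_equal(nxt, x):
            break
        x = nxt
    return x

# Two bipolar (+1/-1) fundamental memories of length 8.
mem = np.array([[ 1, -1,  1, -1,  1, -1,  1, -1],
                [ 1,  1,  1,  1, -1, -1, -1, -1]])
W = store(mem)
noisy = mem[0].copy()
noisy[0] = -noisy[0]          # corrupt one component
out = recall(W, noisy)        # relaxes back to mem[0]
```

The two memories here are orthogonal, so the cross-talk term vanishes and a single corrupted bit is repaired in one update; with correlated or too many memories, the spurious local minima discussed above start to appear.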

Levels of difficulty in associative learning
 Stimuli with overlapping representations of varying size
 Concurrent dynamics of spiking neurons and spike-time-dependent synapses.
 Ease “plasticity-rigidity“ problem with coupled short- and long-term synaptic
dynamics
 Spiking network is source of necessary stochasticity

Theories of memory: associative memory model
 We are often able to recall memories from small partial triggers.
 Essential for this type of memory is the ability to form associations.
 Synaptic plasticity is the necessary ingredient behind associative
abilities in the brain.
 If circuits in the nervous system act as associative memories, they must be
able to dynamically interleave storage of new patterns with recall of old
patterns.
 The cellular and circuit-level mechanisms that may provide this functionality
have been postulated by Paulsen and Moser with particular reference to areas
CA3 and CA1 of the mammalian hippocampus.

Bi-directional Associative
Memory
 This network model was developed by Bart Kosko and again generalizes the
Hopfield model. A set of paired patterns is learned, with the patterns
represented as bipolar vectors. As with the Hopfield network, when a noisy version of one
pattern is presented, the closest pattern associated with it is determined.

 It has as many inputs as output processing nodes. The two hidden layers
are made up of two separate associated memories and represent the
size of two input vectors.

 The two lengths need not be the same, although this example shows
identical input vector lengths of four each. The middle layers are fully
connected to each other.

 The input and output layers are, for implementation purposes, the means
to enter and retrieve information from the network. Kosko's original work
targeted the bi-directional associative memory layers for optical
processing, which would not need formal input and output structures.

 The middle layers are designed to store associated pairs of vectors.


When a noisy pattern vector is impressed upon the input, the middle
layers oscillate back and forth until a stable equilibrium state is reached.

 This state, providing the network is not over trained, corresponds to the
closest learned association and will generate the original training
pattern on the output.

 Like the Hopfield network, the bi-directional associative memory


network is susceptible to incorrectly finding a trained pattern when
complements of the training set are used as the unknown input vector.

BAM NETWORK STRUCTURE

 Here an input vector A is applied to the weight network W and produces a vector of neuron outputs B.
 Now vector B is applied to the transpose of the weight network, WT, which produces a new set of outputs for vector A. This process is repeated until the network arrives at a stable point, at which neither A nor B changes.
 It can be seen that neurons in layers 1 and 2 operate as in other network models: they produce the weighted sum of their inputs and apply it to the activation function F.
Symbolically,
   B = F(AW)
where B is the vector of outputs from layer 2,
      A is the vector of outputs from layer 1,
      W is the weight matrix between layers 1 and 2, and
      F is the activation function.
Similarly, the new value of A is given by A = F(B WT), where WT is the transpose of matrix W.
 The activation function used here is the familiar sigmoidal (logistic) function, given by
   OUTi = 1 / (1 + exp(−λ NETi))
where OUTi is the output of neuron i,
      NETi is the weighted sum of the inputs to neuron i, and
      λ is a constant that determines the slope of the curve.
 For the simplest version of the BAM, the constant λ is made large, thereby
producing an activation function that approaches a simple threshold. Let us
now assume that the threshold function is being used, that there is memory
within each neuron in layers 1 and 2, and that their outputs change
simultaneously with each "tick" of a master clock, remaining constant
between ticks. Hence the neurons obey the following rules.
OUTi (n+1) = 1 if NETi (n)> 0 (OUTi is 1 if NETi is positive)
OUTi (n+1) = 0 if NETi (n)< 0 (OUTi is 0 if NETi is negative)
OUTi (n+1) = OUTi (n) if NETi (n)=0 (OUTi is unchanged if NETi =0)
Where OUTi (n+1) is the value of the output after a single tick and
OUTi (n) is the value of the output at time n.
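The A → B → A bounce under the threshold rules above can be sketched as follows (NumPy; the names and the tiny bipolar training pairs are our assumptions, and the weights use the sum-of-outer-products construction the notes describe for BAM weight matrices):

```python
import numpy as np

def sgn(net, prev):
    """BAM threshold rule: +1 if net > 0, -1 if net < 0,
    unchanged (prev) if net == 0."""
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

def bam_recall(W, a, max_ticks=20):
    """Bounce A -> B -> A ... through W and its transpose until
    neither vector changes (a stable resonance)."""
    b = sgn(a @ W, np.ones(W.shape[1], dtype=int))
    for _ in range(max_ticks):
        a_new = sgn(b @ W.T, a)
        b_new = sgn(a_new @ W, b)
        if np.array_equal(a_new, a) and np.array_equal(b_new, b):
            break
        a, b = a_new, b_new
    return a, b

# Weights as the sum of outer products of two bipolar training pairs.
A = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])
B = np.array([[ 1,  1, -1],
              [-1,  1,  1]])
W = sum(np.outer(x, y) for x, y in zip(A, B))
a_out, b_out = bam_recall(W, A[0])   # recalls the pair (A[0], B[0])
```

Feeding in a one-bit-corrupted copy of A[0] still settles on the pair (A[0], B[0]) after a couple of bounces, illustrating the error-correcting passes described next.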

Characteristics of BAM
 BAMs have the capability to generalize. For example, consider that an incomplete or
partially incorrect vector is applied at A. Then the network tends to produce the
closest memory at B, which in turn tends to correct the errors in A. Though it might
take several passes, the network converges to the nearest stored memory.
 Feedback systems rarely stabilize; they are prone to oscillation, wandering
from state to state and never reaching stability. But Kosko has proven that all BAMs are
unconditionally stable for any weight network.
 This is a very important characteristic, and it arises from the transpose
relationship used between the two weight networks. It also ensures that any set
of associations may be learned without the risk of instability.
 Also, there is a close relationship between the BAM and the Hopfield network: if the
weight matrix W is made square and symmetrical, then W = WT.

 Like the Hopfield network, the BAM has restrictions on the maximum number of
associations it can accurately recall. If this limit is exceeded, the network may produce
incorrect outputs.
 In general, the maximum number of stored associations cannot exceed the number
of neurons in the smaller layer. But by choosing an appropriate threshold for each
neuron, the number of stable states can be made anything from 1 to 2^n, where n is
the number of neurons in the smaller layer.

TYPES OF BAM
 Though BAMs have many problems, they still remain a subject of research
because of their simplicity and the property that they can be implemented
using large integrated circuits (either analog or digital):
 Continuous BAM
 Adaptive BAM
 Competitive BAM

Continuous BAM
 The neurons in layers 1 and 2 considered so far are synchronous: all the
neurons contain memory, and all of them change state simultaneously
upon the occurrence of a pulse from a central clock. In an asynchronous
system, by contrast, any neuron is free to change state at any time, whenever its
input indicates that it should do so.
 The BAM's simple threshold has been used as the neuron's activation function,
thereby producing a discontinuity in the neuron's transfer function. Both
synchronous operation and discontinuous functions are biologically
implausible and quite unnecessary.
 Continuous, asynchronous BAMs overcome both of these limitations and function
in much the same way as the discrete version. It might appear that such BAMs
would suffer from instability, but fortunately this is not the case: continuous
BAMs are stable.
 Continuous BAMs use the sigmoid function with values of λ near 1, thereby producing neurons
that respond smoothly and continuously, much like their biological prototypes. The continuous BAM
lends itself to analog implementations constructed of resistors and amplifiers; very large scale
integration (VLSI) of such networks appears feasible and economically attractive.

Adaptive BAM
 All the versions of BAM discussed so far had their weight matrix calculated as
the sum of the outer products of the input –vector pairs. This calculation is
useful in that it demonstrates the functions that a BAM can represent. But this
is certainly not the way that weights are determined in the brain.

 The adaptive BAM adjusts its weights during operation: application of the
training vector set causes the network to adapt. Slowly, the short-term memory
converts into long-term memory, modifying the network as a function of its
experience.
 The network is trained by applying vectors to layer A and associated vectors
to layer B. Either of the vectors can be a noisy version of the ideal; within
limits, the network learns the idealized vectors free of noise.
 As the continuous BAM is proved to be stable regardless of the weights, slow
changes in the weights do not upset that stability.

Competitive BAM
 Some sort of competition between neurons is observed in many biological
neural systems. For example, in the neurons that process signals from the
retina, lateral inhibition tends to increase the output of the most highly
activated neuron at the expense of its neighbors.
 This rich-get-richer system increases contrast by raising the activation level of
the neurons connected to bright areas of the retina, while reducing the
outputs of those "viewing" the darker areas.
 Competition in a BAM is implemented by interconnecting the neurons within each
layer by means of additional weights. These form another weight matrix, with
positive weights on the main diagonal and negative weights at the other positions.
 From the Cohen-Grossberg theorem we can infer that such a system is unconditionally stable if the
weight arrays are symmetrical; in practice, though, the networks are stable even without symmetry.
