
Biological Neurons

The brain is principally composed of about 10 billion neurons, each connected to about 10,000 other neurons. In a micrograph of brain tissue, the blobs are neuronal cell bodies (somata), and the lines are the input and output channels (dendrites and axons) which connect them.
Each neuron receives electrochemical inputs from other neurons at its dendrites. If the sum of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an electrochemical signal along its axon and passes this signal to the other neurons whose dendrites are attached at any of its axon terminals. These attached neurons may then fire.
It is important to note that a neuron fires only if the total signal received at the cell body exceeds a certain level. The neuron either fires or it doesn't; there are no different grades of firing.
So, our entire brain is composed of these
interconnected electro-chemical transmitting
neurons. From a very large number of extremely
simple processing units (each performing a
weighted sum of its inputs, and then firing a binary
signal if the total input exceeds a certain level) the
brain manages to perform extremely complex
tasks.
This is the model on which artificial neural networks are based. Thus far, artificial neural networks haven't come close to modeling the complexity of the brain, but they have proven to be good at problems which are easy for a human but difficult for a traditional computer, such as image recognition and prediction based on past knowledge.

History: The 1940's to the 1970's


In 1943, neurophysiologist Warren McCulloch and
mathematician Walter Pitts wrote a paper on how
neurons might work. In order to describe how
neurons in the brain might work, they modeled a
simple neural network using electrical circuits.
In 1949, Donald Hebb wrote The Organization of
Behavior, a work which pointed out the fact that
neural pathways are strengthened each time they
are used, a concept fundamentally essential to the
ways in which humans learn. If two nerves fire at
the same time, he argued, the connection between
them is enhanced.
As computers became more advanced in the 1950's, it was finally possible to simulate a hypothetical neural network. The first step towards this was made by Nathaniel Rochester of the IBM research laboratories. Unfortunately for him, the first attempt to do so failed.
In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models called "ADALINE" and "MADALINE." In a typical display of Stanford's love for acronyms, the names come from their use of Multiple ADAptive LINear Elements. ADALINE was developed to recognize binary patterns, so that while reading streaming bits from a phone line it could predict the next bit. MADALINE was the first neural network applied to a real-world problem, using an adaptive filter that eliminates echoes on phone lines. While the system is as ancient as air traffic control systems, like them, it is still in commercial use.
In 1962, Widrow and Hoff developed a learning procedure that examines the value on the line before the weight adjusts it (i.e. 0 or 1), changing each weight according to the rule:

Weight Change = (Pre-Weight Line Value) * (Error / Number of Inputs)

It is based on the idea that while one active perceptron may have a large error, one can adjust the weight values to distribute it across the network, or at least to adjacent perceptrons. Applying this rule still leaves an error if the line before a weight is 0, although this will eventually correct itself. If the error is conserved, so that all of it is distributed across all of the weights, then the error is eliminated.
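The steps of this rule can be sketched in Python (the function name and the list-based representation are mine; the update itself is exactly the rule stated above):

```python
def widrow_hoff_update(weights, inputs, target):
    """One step of the Widrow-Hoff rule:
    weight change = (pre-weight line value) * (error / number of inputs)."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    n = len(inputs)
    # Each weight moves in proportion to the value on its input line,
    # so the error is distributed across the active lines.
    return [w + x * (error / n) for w, x in zip(weights, inputs)]
```

Note that a weight whose input line is 0 receives no change, which is why, as the text observes, an error can remain after one application of the rule when some lines are inactive.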
Despite the later success of the neural network,
traditional von Neumann architecture took over the
computing scene, and neural research was left
behind. Ironically, John von Neumann himself
suggested the imitation of neural functions by
using telegraph relays or vacuum tubes.
In the same time period, a paper was written that
suggested there could not be an extension from
the single layered neural network to a multiple
layered neural network. In addition, many people in the field were using a learning function that was fundamentally flawed because it was not differentiable across the entire line. As a result, research and funding declined drastically.
This was coupled with the fact that the early
successes of some neural networks led to an
exaggeration of the potential of neural networks,
especially considering the practical technology at
the time. Promises went unfulfilled, and at times
greater philosophical questions led to fear. Writers
pondered the effect that the so-called "thinking
machines" would have on humans, ideas which are
still around today.
The idea of a computer which programs itself is
very appealing. If Microsoft's Windows 2000 could
reprogram itself, it might be able to repair the
thousands of bugs that the programming staff
made. Such ideas were appealing but very difficult
to implement. In addition, von Neumann architecture was gaining in popularity. There were a few advances in the field, but for the most part research was sparse.
In 1972, Kohonen and Anderson independently developed a similar network, which we will discuss later. They both used matrix mathematics to describe their ideas but did not realize that what they were doing was creating an array of analog ADALINE circuits. The neurons are supposed to activate a set of outputs instead of just one.
The first multilayered network, an unsupervised one, was developed in 1975.
History: The 1980's to the present
In 1982, interest in the field was renewed. John Hopfield of Caltech presented a paper to the National Academy of Sciences. His approach was to create more useful machines by using bidirectional lines; previously, the connections between neurons were only one-way.
That same year, Reilly and Cooper used a "Hybrid
network" with multiple layers, each layer using a
different problem-solving strategy.
Also in 1982, there was a joint US-Japan conference
on Cooperative/Competitive Neural Networks.
Japan announced a new Fifth Generation effort on
neural networks, and US papers generated worry
that the US could be left behind in the field. (Fifth
generation computing involves artificial
intelligence. First generation used switches and
wires, second generation used the transister, third
state used solid-state technology like integrated
circuits and higher level programming languages,
and the fourth generation is code generators.) As a
result, there was more funding and thus more
research in the field.

In 1986, with multiple-layered neural networks in the news, the problem was how to extend the Widrow-Hoff rule to multiple layers. Three independent groups of researchers, one of which included David Rumelhart, a former member of Stanford's psychology department, came up with similar ideas, now called backpropagation networks because they distribute pattern-recognition errors throughout the network. Hybrid networks used just two layers; these backpropagation networks use many. The result is that backpropagation networks are "slow learners," possibly needing thousands of iterations to learn.
Now, neural networks are used in several
applications, some of which we will describe later
in our presentation. The fundamental idea behind
the nature of neural networks is that if it works in
nature, it must be able to work in computers. The
future of neural networks, though, lies in the
development of hardware. Much like advanced chess-playing machines such as Deep Blue, fast, efficient neural networks depend on hardware designed for their eventual use.
Research that concentrates on developing neural
networks is relatively slow. Due to the limitations of
processors, neural networks take weeks to learn.
Some companies are trying to create what is called
a "silicon compiler" to generate a specific type of
integrated circuit that is optimized for the application of neural networks. Digital, analog, and optical chips are the different types of chips being developed. One might immediately discount analog signals as a thing of the past; however, neurons in the brain actually behave more like analog signals than digital ones. While digital signals have two distinct states (1 or 0, on or off), analog signals vary continuously between minimum and maximum values. It may be a while, though, before optical chips can be used in commercial applications.
Conventional computing versus artificial
neural networks
There are fundamental differences between
conventional computing and the use of neural
networks. In order to best illustrate these
differences one must examine two different types
of learning, the top-down approach and
the bottom-up approach. Then we'll look at what it
means to learn and finally compare conventional
computing with artificial neural networks.
Some specific details of neural networks:
Although the range of problems that can be solved using a single perceptron is limited, by arranging many perceptrons in various configurations and applying training mechanisms, one can actually perform tasks that are hard to implement using conventional von Neumann machines.

We are going to describe four different uses of neural networks that are of great significance:

1. Classification. In a mathematical sense, this involves dividing an n-dimensional space into various regions; given a point in the space, one should tell which region it belongs to. This idea is used in many real-world applications, for instance in various pattern-recognition programs. Each pattern is transformed into a multi-dimensional point and classified into a certain group, each of which represents a known pattern. Type of network used: feedforward networks.

2. Prediction. A neural network can be trained to produce the outputs that are expected given a particular input. If we have a network that models a known sequence of values well, one can use it to predict future results. An obvious example is stock market prediction. Type of network used: feedforward networks.

3. Clustering. Sometimes we have to analyze data so complicated that there is no obvious way to classify them into different categories. Neural networks can be used to identify special features of these data and classify them into different categories without prior knowledge of the data. This technique is useful in data mining for both commercial and scientific uses. Types of network used: simple competitive networks, Adaptive Resonance Theory (ART) networks, Kohonen Self-Organizing Maps (SOM).

4. Association. A neural network can be trained to "remember" a number of patterns, so that when a distorted version of a particular pattern is presented, the network associates it with the closest one in its memory and returns the original version of that particular pattern. This is useful for restoring noisy data. Type of network used: Hopfield networks.

The above is just a general picture of what neural networks can do in real life. There are many creative uses of neural networks that arise from these general applications. One example is image compression using association networks; another is solving the Travelling Salesman Problem using clustering networks.

The perceptron
The perceptron is a mathematical model of a
biological neuron. While in actual neurons the
dendrite receives electrical signals from the axons
of other neurons, in the perceptron these electrical
signals are represented as numerical values. At the
synapses between the dendrite and axons,
electrical signals are modulated in various
amounts. This is also modeled in the perceptron by
multiplying each input value by a value called the
weight. An actual neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold. We model this phenomenon in a perceptron by calculating the weighted sum of the inputs to represent the total strength of the input signals, and applying a step function to the sum to determine its output. As in biological neural networks, this output is fed to other perceptrons.

(Fig. 1) A biological neuron

(Fig. 2) An artificial neuron (perceptron)
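The model just described can be sketched in a few lines of Python (the function name and the threshold-as-argument convention are mine):

```python
def perceptron(inputs, weights, threshold):
    """Multiply each input by its weight, sum, and apply a step function:
    output 1 only if the weighted sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0
```

The returned 0 or 1 plays the role of the axon's output signal, which can in turn serve as an input to other perceptrons.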


There are a number of terms commonly used for describing neural networks. They are listed in the table below:

The input vector: all the input values of each perceptron are collectively called the input vector of that perceptron.

The weight vector: similarly, all the weight values of each perceptron are collectively called the weight vector of that perceptron.
What can a perceptron do?
As mentioned above, a perceptron calculates the
weighted sum of the input values. For simplicity, let us assume that there are two input values, x and y, for a certain perceptron P. Let the weights for x and y be A and B respectively; then the weighted sum can be represented as: A x + B y.
Since the perceptron outputs a non-zero value only when the weighted sum exceeds a certain threshold C, one can write down the output of this perceptron as follows:

Output of P = 1 if A x + B y > C
Output of P = 0 if A x + B y <= C
Recall that A x + B y > C and A x + B y < C are the two regions on the xy plane separated by the line A x + B y = C. If we consider the input (x, y) as a point on the plane, then the perceptron actually tells us which region of the plane this point belongs to. Such regions, since they are separated by a single line, are called linearly separable regions.
This result is useful because it turns out that some logic functions, such as the boolean AND, OR and NOT operators, are linearly separable, i.e. they can be performed using a single perceptron. We can illustrate (for the 2D case) why they are linearly separable by plotting each of them on a graph:

(Fig. 3) Graphs showing linearly separable logic functions
In the above graphs, the two axes are the inputs
which can take the value of either 0 or 1, and the
numbers on the graph are the expected output for
a particular input. Using an appropriate weight
vector for each case, a single perceptron can
perform all of these functions.
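For example, one set of weight vectors and thresholds that realizes these gates is sketched below (the particular values are mine; any weights defining the same separating lines would work):

```python
def perceptron(inputs, weights, threshold):
    # Step function on the weighted sum, as defined above.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def AND(x, y):
    return perceptron([x, y], [1, 1], 1.5)   # fires only when x + y > 1.5

def OR(x, y):
    return perceptron([x, y], [1, 1], 0.5)   # fires when x + y > 0.5

def NOT(x):
    return perceptron([x], [-1], -0.5)       # fires when -x > -0.5, i.e. x = 0
```

Each threshold simply places the separating line between the points that should output 1 and those that should output 0.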
However, not all logic operators are linearly
separable. For instance, the XOR operator is not
linearly separable and cannot be achieved by a
single perceptron. Yet this problem could be
overcome by using more than one perceptron
arranged in feed-forward networks.

(Fig. 4) Since it is impossible to draw a line dividing the regions containing 1 from those containing 0, the XOR function is not linearly separable.
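The two-layer workaround mentioned above can be sketched as follows: a hidden layer computes OR and AND of the inputs, and an output perceptron fires when OR is on but AND is off (the specific weights are mine; many other choices work):

```python
def step(inputs, weights, threshold):
    # Single perceptron: weighted sum followed by a step function.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def XOR(x, y):
    h_or  = step([x, y], [1, 1], 0.5)   # hidden unit 1: OR(x, y)
    h_and = step([x, y], [1, 1], 1.5)   # hidden unit 2: AND(x, y)
    # Output fires when OR is on but AND is off: "x or y, but not both".
    return step([h_or, h_and], [1, -1], 0.5)
```

The hidden layer maps the four input points into a space where a single line can separate the 1s from the 0s, which is exactly what a single perceptron cannot do on the original inputs.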
