
Introduction to Neural Networks


Before we start
Neural networks are an information-processing technology inspired by studies of the brain and the nervous system.

The Brain's Capabilities

- Its performance tends to degrade gracefully under partial damage.
- It can learn (reorganize itself) from experience.
- It performs massively parallel computations extremely efficiently.
- It supports our intelligence and self-awareness.

What Is A Neural Network?

"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."

An ANN is a network of many very simple processors ("units"), each possibly having a small amount of local memory. The units are connected by unidirectional communication channels ("connections"), which carry numeric (as opposed to symbolic) data. The units operate only on their local data and on the inputs they receive via the connections.


1943 --- McCulloch and Pitts (start of the modern era of neural networks): logical calculus of neural networks. A network consisting of a sufficient number of neurons (using a simple model) with properly set synaptic connections can compute any computable function.

1949 --- Hebb's book "The Organization of Behavior". An explicit statement of a physiological learning rule for synaptic modification was presented for the first time. Hebb proposed that the connectivity of the brain is continually changing as an organism learns differing functional tasks, and that neural assemblies are created by such changes. Hebb's work was immensely influential among psychologists.

1958 --- Rosenblatt introduced the perceptron, a novel method of supervised learning.

Historical Contd

Perceptron convergence theorem.

Least mean-square (LMS) algorithm.

1969 --- Minsky and Papert showed that there are fundamental limits on what single-layer perceptrons can compute. They speculated that the limits could not be overcome for the multilayer version.

1982 --- Hopfield's networks. Hopfield showed how to use an "Ising spin glass" type of model to store information in dynamically stable networks.


His work paved the way for physicists to enter neural modeling, thereby transforming the field of neural networks.


1982 --- Kohonen's self-organizing maps (SOM). Kohonen's self-organizing maps are capable of reproducing important aspects of the structure of biological neural nets: data representation using topographic maps (which are common in nervous systems). SOM also has a wide range of applications. SOM shows how the output layer can pick up the correlational structure of the inputs in the form of the spatial arrangement of units.

1985 --- Ackley, Hinton, and Sejnowski developed the Boltzmann machine, which was the first successful realization of a multilayer neural network.

1986 --- Rumelhart, Hinton, and Williams developed the back-propagation algorithm, the most popular learning algorithm for the training of multilayer perceptrons. It has been the workhorse for many neural network applications.

Why Neural Nets?


- Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
- Self-organisation: an ANN can create its own organisation or representation of the information it receives during learning time.
- Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
- Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Before we start..
Differences between the brain and a computer

Property               | Brain                 | Computer
Processing elements    |                       |
Element size           |                       |
Energy use             | 30 W                  |
Processing speed       | 100 Hz                |
Style of computation   | parallel, distributed | serial, centralized
Fault tolerant         |                       | no
Learns                 |                       | a little
Intelligent, conscious |                       | not (yet)

Some Similarities

Neuron Vs


Relationships between biological & artificial networks

Biological network | Artificial network
i. Soma | Node
ii. Dendrites | Input
iii. Axon | Output
iv. Synapse | Weight
v. Slow speed | Fast speed
vi. Many neurons (10^9) | Few neurons (a dozen to hundreds of thousands)

Summary of selected biophysical mechanisms and the neural operations they could implement

Biophysical mechanism | Neural operation
Action potential initiation | Analog OR/AND, 1-bit A/D converter
Repetitive spiking activity | Current-to-frequency transducer
Action potential conduction | Impulse transmission
Chemically mediated synaptic transduction | Sigmoid threshold or nonreciprocal 2-port negative resistance
Electrically mediated synaptic transduction | Reciprocal 1-port resistance
Distributed excitatory synapses in dendritic tree | Linear addition
Excitatory and inhibitory synapses of dendritic spine | Local AND-NOT presynaptic inhibition
Long distance action of neurotransmitter | Modulating and routing transmission of signals


Neural Network Fundamentals

Components and Structures
- Composed of processing elements organized in different ways to form the network's structure.

Processing Elements
- Artificial neurons = processing elements (PEs).
- Each PE receives inputs, processes them, and delivers a single output (refer to diagram).
- An input can be raw data or the output of other processing elements.

Neural Network Fundamentals Contd

The Network

- Composed of a collection of neurons grouped in layers (input, intermediate, output).
- Can be organized in several different ways, with the neurons connected in different ways.
- After the structure is determined, information can be processed.

Network Structure

Network Information Processing

Neural Network Fundamentals Contd


Inputs
- Each input corresponds to a single attribute.
- Input can be text, pictures, or voice.
- Preprocessing is needed to convert this data to meaningful inputs.

Outputs
- The output contains the solution to a problem.
- Post-processing is required.

Weights
- Express the relative strength (mathematical value) of the input data.
- Crucial in that they store learned patterns of information.




Neural Network Fundamentals Contd

Summation Function
- Computes the weighted sum of all the input elements entering each processing element.
- Multiplies each input value by its weight and totals the values for a weighted sum Y.
- The formula for n inputs is
$$Y = \sum_{i=1}^{n} X_i W_i$$
and for the jth neuron of several processing elements it is
$$Y_j = \sum_{i=1}^{n} X_i W_{ij}$$

The summation function computes the internal stimulation, or activation level, of the neuron. Depending on this level, the neuron may or may not produce an output.
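As a minimal sketch of the summation function described above, the following Python computes the weighted sum for one processing element and for a layer of several. The function names, the input values, and the weight values are illustrative assumptions, not part of the original slides.

```python
def weighted_sum(inputs, weights):
    """Return Y, the sum of each input multiplied by its weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def layer_sums(inputs, weight_matrix):
    """Return [Y_j] where weight_matrix[i][j] links input i to neuron j."""
    n_neurons = len(weight_matrix[0])
    return [
        sum(x * row[j] for x, row in zip(inputs, weight_matrix))
        for j in range(n_neurons)
    ]


# Illustrative values only.
x = [3.0, 1.0, 2.0]
w = [0.2, 0.4, 0.1]
y = weighted_sum(x, w)          # 3*0.2 + 1*0.4 + 2*0.1

# Two neurons sharing the same three inputs.
w_matrix = [[0.2, 1.0],
            [0.4, 0.0],
            [0.1, -1.0]]
y_all = layer_sums(x, w_matrix)
```

The first column of `w_matrix` repeats the weights in `w`, so the first entry of `y_all` matches `y`.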

Neural Network Fundamentals Contd

Transformation (Transfer) Function

The transformation function produces the output after the summation function has been computed (if necessary). A popular and useful nonlinear transfer function is the sigmoid function:

$$Y_T = \frac{1}{1 + e^{-Y}}$$

- Y_T is the transformed (normalized) value of Y.
- The transformation modifies the output level to be within reasonable values (0-1).
- It is performed before the output reaches the next level.
- Without transformation, the value becomes very large, especially when there are several layers of neurons.
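The point about values growing across layers can be illustrated with a small sketch. It assumes a toy chain of identical layers in which every neuron receives `fan_in` equal inputs with the same weight; the function names and numbers are hypothetical and chosen only to make the growth visible.

```python
import math


def sigmoid(y):
    """Sigmoid transfer function: maps any Y into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-y))


def propagate(value, weight, fan_in, layers, transform):
    """Pass one value through several identical layers.

    Each layer forms the weighted sum of fan_in equal inputs and,
    if transform is True, squashes it with the sigmoid.
    """
    for _ in range(layers):
        y = fan_in * weight * value      # weighted sum for this layer
        value = sigmoid(y) if transform else y
    return value


raw = propagate(1.0, 2.0, 3, 5, transform=False)       # grows by 6x per layer
squashed = propagate(1.0, 2.0, 3, 5, transform=True)   # stays between 0 and 1
```

Without the transformation the value is multiplied by 6 at every layer, while the sigmoid keeps each layer's output inside the 0-1 range.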

What is an artificial neuron ?

Definition: a nonlinear, parameterized function with restricted output range

$$y = f\left(w_0 + \sum_{i=1}^{n-1} w_i x_i\right)$$





Activation functions
[Figure: plots of activation functions]

Logistic: $y = \frac{1}{1 + \exp(-x)}$

Hyperbolic tangent: $y = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
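As a quick check of the logistic and hyperbolic tangent formulas, the sketch below evaluates both at a few sample points and compares them with Python's built-in `math.tanh`. It also checks the standard identity tanh(x) = 2 * logistic(2x) - 1, which shows that the two curves have the same shape but different output ranges (0 to 1 versus -1 to 1). The function names and sample points are illustrative choices.

```python
import math


def logistic(x):
    """Logistic function: 1 / (1 + exp(-x)), output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_exp(x):
    """Hyperbolic tangent written out with exponentials, output between -1 and 1."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
table = [(x, logistic(x), tanh_from_exp(x)) for x in samples]

# Largest gap between the written-out formula and the library function.
max_gap = max(abs(tanh_from_exp(x) - math.tanh(x)) for x in samples)

# Largest gap for the identity tanh(x) = 2 * logistic(2x) - 1.
max_identity_gap = max(
    abs(tanh_from_exp(x) - (2.0 * logistic(2.0 * x) - 1.0)) for x in samples
)
```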

Learning Algorithm

There are many learning algorithms, classified as supervised learning or unsupervised learning.

- Supervised learning uses a set of inputs for which the appropriate (desired) outputs are known.
- In unsupervised learning, only input stimuli are shown to the network. The network is self-organizing.

2 Main Types of ANN

Supervised, e.g.:
- Adaline
- Perceptron
- MLP
- RBF
- Fuzzy ARTMAP
- etc.

Unsupervised, e.g.:
- Competitive learning networks
- SOM
- ART families
- neocognitron
- etc.

Supervised Network

Teacher error + ANN

Unsupervised ANN
Teacher error


How does an ANN learn

- Neurons are connected by links, and each link has a numerical weight.
- Weights are the basic means of long-term memory in ANNs. They express the strength of each input.
- An ANN learns through repeated adjustments of these weights.

Input layer

Middle layer

Output Layer

Learning Process of ANN

An ANN learns from experience: learning algorithms recognize patterns of activity.

The learning process involves 3 tasks, repeated until the desired output is achieved:

1. Compute outputs.
2. Compare outputs with desired targets.
3. Adjust the weights and repeat the process.
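The compute, compare, adjust cycle can be sketched with a single perceptron learning the logical AND function. This is only one possible instance of the loop: the step threshold, the update rule (the classic perceptron rule), the learning rate of 0.5, and all names are assumptions made for illustration.

```python
def step(y):
    """Threshold transfer function: fire (1) when the weighted sum is >= 0."""
    return 1 if y >= 0 else 0


def train_perceptron(samples, alpha=0.5, max_epochs=100):
    """Repeat compute / compare / adjust until every target is matched."""
    weights = [0.0, 0.0]
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            # 1. Compute the output.
            y = bias + sum(w * x for w, x in zip(weights, inputs))
            output = step(y)
            # 2. Compare the output with the desired target.
            delta = target - output
            # 3. Adjust the weights when the output is wrong.
            if delta != 0:
                errors += 1
                weights = [w + alpha * delta * x for w, x in zip(weights, inputs)]
                bias += alpha * delta
        if errors == 0:          # desired output achieved for every sample
            return weights, bias, epoch
    return weights, bias, max_epochs


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, epochs_used = train_perceptron(AND_SAMPLES)
predictions = [
    step(bias + sum(w * x for w, x in zip(weights, inputs)))
    for inputs, _ in AND_SAMPLES
]
```

AND is linearly separable, so the loop ends once an epoch passes with no wrong outputs. A problem that is not linearly separable would simply run until `max_epochs`.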


NN Application Development

The process is similar to the structured design methodologies of traditional computer-based information systems. There are 9 steps (Turban and Aronson, 2001):

1. Collect data.
2. Separate the data into training and test sets.
3. Define a network structure.
4. Select a training algorithm.
5. Set parameters and values, and initialize weights.
6. Transform data to network inputs.
7. Start training, and determine and revise weights.
8. Stop and test.
9. Implementation: use the network with new cases.
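Steps 2 and 6 above are easy to get subtly wrong, so here is a minimal sketch of them. It assumes a hypothetical data set of (annual income, age) records, a 25% test split, and min-max scaling as the transformation; none of these choices come from the original slides.

```python
import random


def split_records(records, test_fraction=0.25, seed=0):
    """Step 2: shuffle, then hold out a fraction of the records for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_ranges(rows):
    """Record the (min, max) of every attribute, using training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def to_network_inputs(row, ranges):
    """Step 6: rescale each attribute with the stored training ranges."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, (lo, hi) in zip(row, ranges)
    ]


# Hypothetical records: (annual income, age).
records = [(32000, 25), (54000, 41), (41000, 33), (78000, 52),
           (29000, 22), (61000, 47), (45000, 38), (39000, 29)]

train_rows, test_rows = split_records(records)
ranges = fit_ranges(train_rows)
train_inputs = [to_network_inputs(r, ranges) for r in train_rows]
test_inputs = [to_network_inputs(r, ranges) for r in test_rows]
```

The ranges are fitted on the training rows only, so test rows can fall slightly outside 0-1. That is intentional: the test set stands in for new cases the network has not seen.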

What Applications Should Neural Networks Be Used For?

Neural networks are suited to capturing associations or discovering regularities within a set of patterns, where:

- the volume, number of variables, or diversity of the data is very great;
- the relationships between variables are vaguely understood; or
- the relationships are difficult to describe adequately with conventional approaches.

Mathematic Relate

Neural Network Architecture

Feedforward Flow
- Algorithms: Backpropagation, Madaline III
- Neuron output: fed forward to the subsequent layer
- Problems solved: static pattern recognition, classification, and generalization problems (e.g., quality control, loan evaluation)

Recurrent Structure
- Algorithms: TrueTime Algorithm
- Neuron output: fed back as neuron input
- Problems solved: dynamic, time-dependent problems (e.g., sales forecasting, process analysis, sequence recognition, and sequence generation)
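The difference between the two structures can be sketched in a few lines. In the feedforward case the output depends only on the current input; in the recurrent case the neuron's previous output is fed back, so the same input can give different outputs at different times. The tanh transfer function, the weights, and the function names are illustrative assumptions.

```python
import math


def neuron(inputs, weights):
    """One neuron: weighted sum followed by a tanh transfer function."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)))


def feedforward(x, hidden_weights, output_weights):
    """Outputs flow forward only: input -> hidden layer -> output."""
    hidden = [neuron(x, w) for w in hidden_weights]
    return neuron(hidden, output_weights)


def recurrent(sequence, w_in, w_back):
    """A single neuron whose previous output is fed back as an input."""
    state = 0.0
    outputs = []
    for x in sequence:
        state = math.tanh(w_in * x + w_back * state)
        outputs.append(state)
    return outputs


# Illustrative weights only.
hidden_w = [[0.5, -0.3], [0.8, 0.2]]
output_w = [1.0, -1.0]

first = feedforward([1.0, 0.5], hidden_w, output_w)
second = feedforward([1.0, 0.5], hidden_w, output_w)   # same input, same output

history = recurrent([1.0, 1.0, 1.0], w_in=0.5, w_back=0.8)  # same input each step
```

Here `first` and `second` are identical, while the entries of `history` differ even though every input is 1.0, which is why recurrent structures suit time-dependent problems.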

Topologies of ANN

Fully-connected feed-forward

Partially recurrent network

Fully recurrent network


Advantages

- Parallel processing
- Distributed representations
- Online (i.e., incremental) algorithm
- Simple computations
- Robust with respect to noisy data
- Robust with respect to node failure
- Empirically shown to work well for many problem domains


Disadvantages

- Slow training
- Poor interpretability
- Network topology layouts are ad hoc
- Hard to debug, because distributed representations preclude content checking
- May converge to a local, not global, minimum of error
- Not known how to model higher-level cognitive mechanisms
- May be hard to describe a problem in terms of features with numerical values

Limitation of ANN

- Lack explanation capability
- Do not produce an explicit model
- Do not perform well on tasks that people do not perform well on
- Require extensive training and testing data

Applications of NN

Because neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs, including:

- sales forecasting
- industrial process control
- customer research
- data validation
- risk management
- target marketing

Examples of Applications

- NETtalk (Sejnowski and Rosenberg, 1987): maps character strings into phonemes for learning speech from text.
- Neurogammon (Tesauro and Sejnowski, 1989): a backgammon learning program.
- Speech recognition (Waibel, 1989): converts sound to text.
- Character recognition (Le Cun et al., 1989)
- Face recognition (Mitchell)
- ALVINN (Pomerleau, 1988)

Other Issues

How to set alpha, the learning rate parameter?
Use a tuning set or cross-validation to train using several candidate values for alpha, and then select the value that gives the lowest error.

How to estimate the error?
Use cross-validation (or some other evaluation method) multiple times with different random initial weights. Report the average error rate.

How many hidden layers, and how many hidden units per layer?
Usually just one hidden layer is used (i.e., a 2-layer network). How many units should it contain? Too few => can't learn. Too many => poor generalization. Determine this experimentally, using a tuning set or cross-validation to select the number that minimizes error.
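The tuning-set procedure for alpha can be sketched as follows. To keep the example short it trains a single logistic neuron rather than a full network, uses a synthetic one-dimensional data set, and averages the tuning error over three random initializations. The data, the candidate values, and all names are assumptions made for illustration.

```python
import math
import random


def sigmoid(y):
    """Numerically stable logistic function."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def train(data, alpha, epochs=50, seed=0):
    """Train one logistic neuron by gradient descent on cross-entropy error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)  # random initial weights
    for _ in range(epochs):
        for x, target in data:
            out = sigmoid(w * x + b)
            w += alpha * (target - out) * x
            b += alpha * (target - out)
    return w, b


def error_rate(model, data):
    """Fraction of examples classified incorrectly."""
    w, b = model
    wrong = 0
    for x, target in data:
        predicted = 1 if sigmoid(w * x + b) >= 0.5 else 0
        wrong += predicted != target
    return wrong / len(data)


def make_data(n, rng):
    """Synthetic examples: class 1 when x > 0.2 (illustrative rule)."""
    return [(x, 1 if x > 0.2 else 0)
            for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


rng = random.Random(42)
train_set = make_data(60, rng)
tuning_set = make_data(30, rng)

candidates = [0.001, 0.01, 0.1, 1.0]
seeds = [0, 1, 2]
results = {}
for alpha in candidates:
    errors = [error_rate(train(train_set, alpha, seed=s), tuning_set) for s in seeds]
    results[alpha] = sum(errors) / len(errors)   # average over initial weights

best_alpha = min(results, key=results.get)
```

The same loop works for choosing the number of hidden units: replace the list of candidate alphas with candidate unit counts and keep everything else unchanged.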

Other Issues (cont..)

How many examples in the training set?
Under what circumstances can I be assured that a net that is trained to classify 1 - e/2 of the training set correctly will also classify 1 - e of the testing set correctly? Clearly, the larger the training set, the better the generalization, but the longer the training time required. To obtain 1 - e correct classification on the testing set, the training set should be of size approximately n/e, where n is the number of weights in the network and e is a fraction between 0 and 1. For example, if e = 0.1 and n = 80, then a network trained on a training set of size 800 until 95% correct classification is achieved on the training set should produce 90% correct classification on the testing set.

Other Issues (cont..)

When to Stop?

Too much training "overfits" the data, and hence the error rate will go up on the testing set. It is therefore not usually advantageous to continue training until the MSE is minimized. Instead, train the network until the error rate on a tuning set starts to increase.
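The stopping rule can be sketched as a small wrapper around any training loop. The helper below is a hypothetical design, not a library API: it takes one callable that trains for an epoch and returns the current network state, and another that measures tuning-set error. The `patience` parameter (how many non-improving epochs to tolerate) is an added assumption to avoid stopping on a single noisy epoch. The demonstration uses a made-up U-shaped error curve in place of a real network.

```python
def train_with_early_stopping(train_one_epoch, tuning_error,
                              max_epochs=1000, patience=3):
    """Stop once tuning error has not improved for `patience` epochs.

    Returns the best state seen, the epoch it occurred at, and its error.
    """
    best_error = float("inf")
    best_epoch = 0
    best_state = None
    for epoch in range(1, max_epochs + 1):
        state = train_one_epoch()
        error = tuning_error(state)
        if error < best_error:
            best_error, best_epoch, best_state = error, epoch, state
        elif epoch - best_epoch >= patience:
            break                      # tuning error has started to increase
    return best_state, best_epoch, best_error


# Stand-in for a real network: the "state" is just the epoch count, and the
# tuning error is a made-up curve that falls until epoch 10 and then rises.
counter = {"epochs": 0}


def fake_train_one_epoch():
    counter["epochs"] += 1
    return counter["epochs"]


def fake_tuning_error(state):
    return (state - 10) ** 2 / 100 + 0.1


best_state, best_epoch, best_error = train_with_early_stopping(
    fake_train_one_epoch, fake_tuning_error
)
epochs_run = counter["epochs"]
```

Returning the best state rather than the last one matters: by the time the rise is detected, the most recent weights are already past the point of lowest tuning error.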