Data Mining Techniques - Artificial Neural Networks

Artificial Neural Networks


• A class of powerful, general-purpose tools readily applied to:
– Prediction
– Classification
– Clustering
• Biological Neural Net (human brain) is the most powerful – we
can generalize from experience
• Computers are best at following pre-determined instructions
• Computerized Neural Nets attempt to bridge the gap
– Predicting time-series in financial world
– Diagnosing medical conditions
– Identifying clusters of valuable customers
– Fraud detection


Neural Networks
• When applied in well-defined domains, their ability
to generalize and learn from data “mimics” a
human’s ability to learn from experience.
• Very useful in data mining, where the hope is better results
• Drawback – training a neural network produces internal
weights distributed throughout the network, making it
difficult to understand why a solution is valid

Neural Network History


• 1930s through 1970s
• 1980s:
– Backpropagation – a better way of training a neural net
– Computing power became available
– Researchers became more comfortable with neural nets
– Relevant operational data became more accessible
– Useful applications (expert systems) emerged


Retail - Loan Prospector

• A Neural Network (Expert System) is like a black box that knows how
to process inputs to create a useful output.
• The calculations are quite complex and difficult to understand

Neural Net Limitations


• Neural Nets are good for prediction and estimation
when:
– Inputs are well understood
– Output is well understood
– Experience is available for examples to use to “train” the
neural net application (expert system)
• Neural Nets are only as good as the training set used
to generate them. The resulting model is static and must
be updated with more recent examples and retraining
to stay relevant


Feed-Forward Neural Net Examples

• One-way flow through the network from the inputs to the
outputs

The Unit of a Neural Network


• The unit of a neural
network is modeled on
the biological neuron
• The unit combines its
inputs into a single
value, which it then
transforms to produce
the output; together
these are called the
activation function


Loan Appraiser - revisited

• Illustrates that a neural network (feed-forward in this case)
is filled with seemingly meaningless weights
• The appraised value of this property is $176,228 (not a bad
deal for San Diego!)

Neural Network Training


• Training is the process of setting the best weights on the
edges connecting all the units in the network
• The goal is to use the training set to calculate weights where
the output of the network is as close to the desired output as
possible for as many of the examples in the training set as
possible
• Backpropagation has been used since the 1980s to adjust the
weights (other methods are now available):
– Calculates the error by taking the difference between the calculated
result and the actual result
– The error is fed back through the network and the weights are
adjusted to minimize the error


How a Neural Network Works

• The output of a neuron is a function of the weighted sum
of the inputs plus a bias: output = f(Σ wᵢxᵢ + b)
• The function of the entire neural network is simply the
computation of the outputs of all the neurons
► An entirely deterministic calculation
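
A minimal Python sketch of one such unit, assuming a logistic activation; the input, weight, and bias values below are made up for illustration:

```python
# One unit: weighted sum of inputs plus bias, passed through an activation.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Combine the inputs into a single value (weighted sum plus bias)...
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...then transform it; together these form the activation function.
    return sigmoid(total)

print(neuron_output([0.5, 0.9], [0.3, -0.2], bias=0.1))  # deterministic result
```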

Activation Function

● Applied to the weighted sum of the inputs of a neuron to
produce the output
● The majority of NNs use sigmoid functions
► Smooth, continuous, and monotonically increasing
(derivative is always positive)
► Bounded range – but never reaches max or min
■ Consider "ON" to be slightly less than the max and "OFF"
to be slightly greater than the min


Activation Functions

● The most common sigmoid function used is the logistic
function
► f(x) = 1 / (1 + e^(-x))
► The calculation of derivatives is important for neural
networks, and the logistic function has a very nice
derivative
■ f'(x) = f(x)(1 - f(x))
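
A small sketch of the logistic function and its derivative as given above; the sample x values are arbitrary:

```python
# Logistic function f(x) = 1/(1 + e^-x) and its derivative f'(x) = f(x)(1 - f(x)).
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    fx = logistic(x)
    return fx * (1.0 - fx)

# The derivative is always positive, so the function is monotonically increasing.
for x in (-4.0, 0.0, 4.0):
    print(f"f({x}) = {logistic(x):.4f}, f'({x}) = {logistic_derivative(x):.4f}")
```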

Where do weights come from?

• The weights in a neural network are the most
important factor in determining its function.
• Training is the act of presenting the network
with some sample data and modifying the
weights to better approximate the desired
function.


There are two main types of training

► Supervised Training
■ Supplies the neural network with inputs and
the desired outputs
■ Response of the network to the inputs is
measured
■ The weights are modified to reduce the
difference between the actual and desired
outputs

2nd Type of Training

• Unsupervised Training
– Only supplies inputs
– The neural network adjusts its own weights so that
similar inputs cause similar outputs
– The network identifies the patterns and differences in the
inputs without any external assistance
• Epoch
– One iteration through the process of providing the network
with an input and updating the network's weights
– Typically many epochs are required to train the neural
network


Perceptron

● It was the first neural network with the ability to learn
● Made up of only input neurons and output neurons
● Input neurons typically have two states: ON and OFF
● Output neurons use a simple threshold activation function
● In basic form, can only solve linearly separable problems
► Limited applications

How does a Perceptron learn?

• Uses supervised training
• If the output is not correct, the weights are
adjusted according to the formula:
w_i ← w_i + rate · (target − output) · x_i

Multilayer feed-forward networks

● Most common neural network
● An extension of the perceptron
► Multiple layers
■ The addition of one or more "hidden" layers between the
input and output layers
► Activation function is not simply a threshold
■ Usually a sigmoid function
► A general function approximator
■ Not limited to linear problems
● Information flows in one direction
► The outputs of one layer act as inputs to the next layer

Example
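
The original slide showed the example as a figure. As a stand-in, here is a minimal sketch of a forward pass through one hidden layer; the layer sizes, weights, and biases are made up:

```python
# Forward pass: outputs of one layer act as inputs to the next.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of `weights` holds one neuron's incoming weights.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

inputs = [0.8, 0.2]
hidden = layer(inputs, [[0.5, -0.4], [0.3, 0.9]], [0.1, -0.1])  # hidden layer
outputs = layer(hidden, [[1.2, -0.7]], [0.05])                   # output layer
print(outputs)
```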


Backpropagation

● Most common method of obtaining the many
weights in the network
● A form of supervised training
● The basic backpropagation algorithm is based on
minimizing the error of the network using the
derivatives of the error function
► Simple
► Slow
► Prone to local minima issues

Backpropagation

● Most common measure of error is the mean
square error: E = (target − output)²
● Partial derivatives of the error with respect to the
weights, ∂E/∂w, drive the weight updates

Backpropagation

● Calculation of the derivatives flows backwards through
the network, hence the name backpropagation
● These derivatives point in the direction of the maximum
increase of the error function
● A small step (learning rate) in the opposite direction will
result in the maximum decrease of the (local) error
function: w_new = w_old − η · ∂E/∂w_old

● The learning rate is important
► Too small
■ Convergence extremely slow
► Too large
■ May not converge
● Momentum
► Tends to aid convergence
► Applies smoothed averaging to the change in weights:
Δw_new = −η · ∂E/∂w + μ · Δw_old

● Training is essentially minimizing the mean square
error function
► Key problem is avoiding local minima
► Traditional techniques for avoiding local minima:
■ Simulated annealing
– Perturb the weights in progressively smaller amounts
■ Genetic algorithms
– Use the weights as chromosomes
– Apply natural selection, mating, and mutations to
these chromosomes

Counterpropagation (CP) Networks

● Another multilayer feedforward network
● Up to 100 times faster than backpropagation
● Not as general as backpropagation
● Made up of three layers:
► Input
► Kohonen
► Grossberg (Output)


How do they work?

● Kohonen Layer:
► Neurons in the Kohonen layer sum all of the weighted
inputs received
► The neuron with the largest sum outputs a 1 and the
other neurons output 0
● Grossberg Layer:
► Each Grossberg neuron merely outputs the weight of the
connection between itself and the one active Kohonen
neuron

Why 2 different types of layers?

● More accurate representation of biological neural
networks
● Each layer has its own distinct purpose:
► Kohonen layer separates inputs into separate classes
■ Inputs in the same class will turn on the same Kohonen
neuron
► Grossberg layer adjusts weights to obtain acceptable
outputs for each class


Training a CP Network

● Training the Kohonen layer
► Uses unsupervised training
► Input vectors are often normalized
► The one active Kohonen neuron updates its weights
according to the formula: w_new = w_old + α (x − w_old)
► The weights of the connections are being modified to more
closely match the values of the inputs
■ At the end of training, the weights will approximate the
average value of the inputs in that class
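
A minimal sketch of one Kohonen training step under the rule above (the standard competitive rule, reconstructed from the slide's description); the weight vectors and learning rate α are made up:

```python
# Winner-take-all update: the neuron with the largest weighted sum
# moves its weights toward the (normalized) input vector.
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def train_step(weights, x, alpha=0.3):
    x = normalize(x)  # input vectors are often normalized
    sums = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    k = max(range(len(sums)), key=sums.__getitem__)  # the one active neuron
    # Only the winner updates, moving toward the input values.
    weights[k] = [w + alpha * (xi - w) for w, xi in zip(weights[k], x)]
    return k

weights = [normalize([1.0, 0.1]), normalize([0.1, 1.0])]
print(train_step(weights, [0.9, 0.2]))  # neuron 0 wins and adapts
```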

Training a CP Network

● Training the Grossberg layer
► Uses supervised training
► Weight update algorithm is similar to that used in
backpropagation


Hidden Layers and Neurons

● For most problems, one hidden layer is sufficient
● Two hidden layers are required when the function is
discontinuous
● The number of neurons is very important:
► Too few
■ Underfit the data – the NN can't learn the details
► Too many
■ Overfit the data – the NN learns the insignificant details
► Start small and increase the number until satisfactory
results are obtained

Overfitting


How is the training set chosen?

● Overfitting can also occur if a "good" training set is
not chosen
● What constitutes a "good" training set?
► Samples must represent the general population
► Samples must contain members of each class
► Samples in each class must contain a wide range of
variations or noise effects

Size of the training set

● The size of the training set is related to the
number of hidden neurons
► E.g. 10 inputs, 5 hidden neurons, 2 outputs:
11(5) + 6(2) = 67 weights (variables)
► If only 10 training samples are used to determine these
weights, the network will end up being overfit
■ Any solution found will be specific to the 10 training
samples
■ Analogous to having 10 equations with 67 unknowns – you
can come up with a specific solution, but you can't find the
general solution with the given information
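
The count generalizes; a tiny sketch of the formula behind 67 (one weight per input plus a bias for each hidden neuron, one weight per hidden neuron plus a bias for each output):

```python
# Total number of weights in a one-hidden-layer network with biases.
def num_weights(n_inputs, n_hidden, n_outputs):
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

print(num_weights(10, 5, 2))  # 11*5 + 6*2 = 67
```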


Training & Verification

● The set of all known samples is broken into two
orthogonal (independent) sets:
► Training set
■ A group of samples used to train the neural network
► Testing set
■ A group of samples used to test the performance of the
neural network
■ Used to estimate the error rate

Verification

● Provides an unbiased test of the quality of the network
● Common error is to "test" the neural network using
the same samples that were used to train the
neural network
► The network was optimized on these samples, and will
obviously perform well on them
► Doesn't give any indication as to how well the network
will be able to classify inputs that weren't in the training
set


Verification

● Various metrics can be used to grade the performance of
the neural network based upon the results of the testing
set
► Mean square error, SNR, etc.
● Resampling is an alternative method of estimating the
error rate of the neural network
► Basic idea is to iterate the training and testing
procedures multiple times
► Two main techniques are used:
■ Cross-Validation
■ Bootstrapping
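
A minimal sketch of k-fold cross-validation, one such resampling technique; `train` and `error_rate` are hypothetical stand-ins for whatever training and scoring routines are in use:

```python
# Hold out a different fold each iteration and average the error estimates.
def cross_validate(samples, k, train, error_rate):
    fold_size = len(samples) // k
    errors = []
    for i in range(k):
        test = samples[i * fold_size:(i + 1) * fold_size]
        training = samples[:i * fold_size] + samples[(i + 1) * fold_size:]
        model = train(training)                  # train on k-1 folds
        errors.append(error_rate(model, test))   # test on the held-out fold
    return sum(errors) / k                       # averaged error estimate
```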

Applications of Neural Networks


• Prediction – weather, stocks, disease

• Classification – financial risk assessment, image processing

• Data association – Text Recognition (OCR)

• Data conceptualization – Customer purchasing habits

• Filtering – Normalizing telephone signals (static)


Overview
• Advantages
– Adapt to unknown situations
– Robustness: fault tolerance due to network redundancy
– Autonomous learning and generalization

• Disadvantages
– Not exact
– Large complexity of the network structure

Conclusions

● The toy examples confirmed the basic operation of
neural networks and also demonstrated their ability to
learn the desired function and generalize when needed
● The ability of neural networks to learn and generalize,
in addition to their wide range of applicability, makes
them very powerful tools


Artificial Intelligence for Data Mining

• Neural networks are useful for data mining and decision-
support applications.
• People are good at generalizing from experience.
• Computers excel at following explicit instructions over and
over.
• Neural networks bridge this gap by modeling, on a computer,
the neural behavior of human brains.


Neural Network Characteristics

• Neural networks are useful for pattern recognition or
data classification, through a learning process.
• Neural networks simulate biological systems, where
learning involves adjustments to the synaptic
connections between neurons.


Anatomy of a Neural Network

• Neural Networks map a set of input nodes to a set of
output nodes
• The number of inputs/outputs is variable
• The network itself is composed of an arbitrary number
of nodes with an arbitrary topology

[Figure: Input 0, Input 1, ..., Input n → Neural Network → Output 0, Output 1, ..., Output m]


Biological Background

• A neuron: a many-inputs / one-output unit
• Output can be excited or not excited
• Incoming signals from other neurons determine if the
neuron shall excite ("fire")
• Output subject to attenuation in the synapses, which
are junction parts of the neuron



Learning Performance

• Supervised
– Need to be trained ahead of time with lots of data
• Unsupervised networks adapt to the input
– Applications in clustering and reducing dimensionality
– Learning may be very slow
– No help from the outside
– No training data, no information available on the desired output
– Learning by doing
– Used to pick out structure in the input:
■ Clustering
■ Compression


Genetic Algorithms

• Optimization search algorithms.
• Create an initial feasible solution and iteratively
create new "better" solutions.
• Based on natural evolution and survival of the fittest.
• Must represent a solution as an individual.
• Individual: a string I = I1, I2, …, In where each Ij is
drawn from a given alphabet A.
• Each character Ij is called a gene.
• Population: a set of individuals.


Genetic Algorithms

• A Genetic Algorithm (GA) is a computational
model consisting of five parts (sketched below):
– A starting set of individuals, P.
– Crossover: a technique to combine two parents
to create offspring.
– Mutation: randomly change an individual.
– Fitness: determine the best individuals.
– An algorithm which iteratively applies the crossover and
mutation techniques to P, using the fitness function to
determine the best individuals in P to keep.
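
A minimal sketch of this five-part model in Python; the bit-string alphabet, the "count the 1s" fitness function, and all parameters are illustrative choices, not from the slides:

```python
# Tiny GA: population of bit strings, crossover, mutation, fitness-based survival.
import random

ALPHABET = "01"
LENGTH, POP_SIZE, GENERATIONS = 12, 20, 50

def fitness(ind):                     # determine the best individuals
    return ind.count("1")

def crossover(a, b):                  # combine two parents into an offspring
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(ind, p=0.05):              # randomly change genes
    return "".join(random.choice(ALPHABET) if random.random() < p else g
                   for g in ind)

# Starting set of individuals, P.
population = ["".join(random.choice(ALPHABET) for _ in range(LENGTH))
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]          # survival of the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(max(population, key=fitness))
```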

Genetic Algorithm


GA Advantages/Disadvantages
• Advantages
– Easily parallelized
• Disadvantages
– Difficult to understand and explain to end users.
– Abstraction of the problem and method to
represent individuals is quite difficult.
– Determining fitness function is difficult.
– Determining how to perform crossover and
mutation is difficult.
