Data Mining Techniques - Artificial Neural Networks

Artificial Neural Networks


• A class of powerful, general-purpose tools readily applied to:
– Prediction
– Classification
– Clustering
• Biological Neural Net (human brain) is the most powerful – we
can generalize from experience
• Computers are best at following pre-determined instructions
• Computerized Neural Nets attempt to bridge the gap
– Predicting time-series in financial world
– Diagnosing medical conditions
– Identifying clusters of valuable customers
– Fraud detection


Neural Networks
• When applied in well-defined domains, their ability
to generalize and learn from data “mimics” a
human’s ability to learn from experience.
• Very useful in data mining, where the hope is better results
• Drawback – training a neural network produces internal
weights distributed throughout the network, making it
difficult to understand why a solution is valid

Neural Network History


• 1930s through 1970s
• 1980s:
– Backpropagation – a better way of training a neural net
– Computing power became available
– Researchers became more comfortable with neural nets
– Relevant operational data became more accessible
– Useful applications (expert systems) emerged


Retail - Loan Prospector

• A Neural Network (Expert System) is like a black box that knows how
to process inputs to create a useful output.
• The calculations are quite complex and difficult to understand

Neural Net Limitations


• Neural Nets are good for prediction and estimation
when:
– Inputs are well understood
– Output is well understood
– Experience is available for examples to use to “train” the
neural net application (expert system)
• Neural Nets are only as good as the training set used
to generate them. The resulting model is static and must
be updated with more recent examples and retraining
to stay relevant


Feed-Forward Neural Net Examples

• One-way flow through the network from the inputs to the
outputs

The Unit of a Neural Network


• The unit of a neural
network is modeled on
the biological neuron
• The unit combines its
inputs into a single
value, which it then
transforms to produce
the output; together
these are called the
activation function


Loan Appraiser - revisited

• Illustrates that a neural network (feed-forward in this case)
is filled with seemingly meaningless weights
• The appraised value of this property is $176,228 (not a bad
deal for San Diego!)

Neural Network Training


• Training is the process of setting the best weights on the
edges connecting all the units in the network
• The goal is to use the training set to calculate weights where
the output of the network is as close to the desired output as
possible for as many of the examples in the training set as
possible
• Backpropagation has been used since the 1980s to adjust the
weights (other methods are now available):
– Calculates the error by taking the difference between the calculated
result and the actual result
– The error is fed back through the network and the weights are
adjusted to minimize the error


How a Neural Network Works

• The output of a neuron is a function of the weighted sum
of the inputs plus a bias: output = f(Σ wᵢxᵢ + b)
• The function of the entire neural network is simply the
computation of the outputs of all the neurons
► An entirely deterministic calculation
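
A minimal Python sketch of one such unit, assuming a logistic activation; the input, weight, and bias values below are made up for illustration:

```python
# One unit: weighted sum of inputs plus bias, passed through an activation.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Combine the inputs into a single value (weighted sum plus bias)...
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...then transform it; together these form the activation function.
    return sigmoid(total)

print(neuron_output([0.5, 0.9], [0.3, -0.2], bias=0.1))  # deterministic result
```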

Activation Function

● Applied to the weighted sum of the inputs of a neuron to
produce the output
● The majority of NNs use sigmoid functions
► Smooth, continuous, and monotonically increasing
(derivative is always positive)
► Bounded range – but never reaches max or min
■ Consider "ON" to be slightly less than the max and "OFF"
to be slightly greater than the min


Activation Functions

● The most common sigmoid function used is the logistic
function
► f(x) = 1 / (1 + e^(-x))
► The calculation of derivatives is important for neural
networks, and the logistic function has a very nice
derivative
■ f'(x) = f(x)(1 - f(x))
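
A small sketch of the logistic function and its derivative as given above; the sample x values are arbitrary:

```python
# Logistic function f(x) = 1/(1 + e^-x) and its derivative f'(x) = f(x)(1 - f(x)).
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    fx = logistic(x)
    return fx * (1.0 - fx)

# The derivative is always positive, so the function is monotonically increasing.
for x in (-4.0, 0.0, 4.0):
    print(f"f({x}) = {logistic(x):.4f}, f'({x}) = {logistic_derivative(x):.4f}")
```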

Where do weights come from?

• The weights in a neural network are the most
important factor in determining its function.
• Training is the act of presenting the network
with some sample data and modifying the
weights to better approximate the desired
function.


There are two main types of training

► Supervised Training
■ Supplies the neural network with inputs and
the desired outputs
■ Response of the network to the inputs is
measured
■ The weights are modified to reduce the
difference between the actual and desired
outputs

2nd Type of Training

• Unsupervised Training
– Only supplies inputs
– The neural network adjusts its own weights so that
similar inputs cause similar outputs
– The network identifies the patterns and differences in the
inputs without any external assistance
• Epoch
– One iteration through the process of providing the network
with an input and updating the network's weights
– Typically many epochs are required to train the neural
network


Perceptron

● It was the first neural network with the ability to learn
● Made up of only input neurons and output neurons
● Input neurons typically have two states: ON and OFF
● Output neurons use a simple threshold activation function
● In basic form, can only solve linearly separable problems
► Limited applications

How does a Perceptron learn?

• Uses supervised training
• If the output is not correct, the weights are
adjusted according to the formula:
w_i ← w_i + rate · (target − output) · x_i

Multilayer feed-forward networks

● Most common neural network
● An extension of the perceptron
► Multiple layers
■ The addition of one or more "hidden" layers between the
input and output layers
► Activation function is not simply a threshold
■ Usually a sigmoid function
► A general function approximator
■ Not limited to linear problems
● Information flows in one direction
► The outputs of one layer act as inputs to the next layer

Example
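
The original slide showed the example as a figure. As a stand-in, here is a minimal sketch of a forward pass through one hidden layer; the layer sizes, weights, and biases are made up:

```python
# Forward pass: outputs of one layer act as inputs to the next.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of `weights` holds one neuron's incoming weights.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

inputs = [0.8, 0.2]
hidden = layer(inputs, [[0.5, -0.4], [0.3, 0.9]], [0.1, -0.1])  # hidden layer
outputs = layer(hidden, [[1.2, -0.7]], [0.05])                   # output layer
print(outputs)
```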


Backpropagation

● Most common method of obtaining the many
weights in the network
● A form of supervised training
● The basic backpropagation algorithm is based on
minimizing the error of the network using the
derivatives of the error function
► Simple
► Slow
► Prone to local minima issues

Backpropagation

● Most common measure of error is the mean
square error: E = (target − output)²
● Partial derivatives of the error with respect to the
weights, ∂E/∂w, drive the weight updates

Backpropagation

● Calculation of the derivatives flows backwards through
the network, hence the name backpropagation
● These derivatives point in the direction of the maximum
increase of the error function
● A small step (learning rate) in the opposite direction will
result in the maximum decrease of the (local) error
function: w_new = w_old − η · ∂E/∂w_old

● The learning rate is important
► Too small
■ Convergence extremely slow
► Too large
■ May not converge
● Momentum
► Tends to aid convergence
► Applies smoothed averaging to the change in weights:
Δw_new = −η · ∂E/∂w + μ · Δw_old

● Training is essentially minimizing the mean square
error function
► Key problem is avoiding local minima
► Traditional techniques for avoiding local minima:
■ Simulated annealing
– Perturb the weights in progressively smaller amounts
■ Genetic algorithms
– Use the weights as chromosomes
– Apply natural selection, mating, and mutations to
these chromosomes

Counterpropagation (CP) Networks

● Another multilayer feedforward network
● Up to 100 times faster than backpropagation
● Not as general as backpropagation
● Made up of three layers:
► Input
► Kohonen
► Grossberg (Output)


How do they work?

● Kohonen Layer:
► Neurons in the Kohonen layer sum all of the weighted
inputs received
► The neuron with the largest sum outputs a 1 and the
other neurons output 0
● Grossberg Layer:
► Each Grossberg neuron merely outputs the weight of the
connection between itself and the one active Kohonen
neuron

Why 2 different types of layers?

● More accurate representation of biological neural
networks
● Each layer has its own distinct purpose:
► Kohonen layer separates inputs into separate classes
■ Inputs in the same class will turn on the same Kohonen
neuron
► Grossberg layer adjusts weights to obtain acceptable
outputs for each class


Training a CP Network

● Training the Kohonen layer
► Uses unsupervised training
► Input vectors are often normalized
► The one active Kohonen neuron updates its weights
according to the formula: w_new = w_old + α (x − w_old)
► The weights of the connections are being modified to more
closely match the values of the inputs
■ At the end of training, the weights will approximate the
average value of the inputs in that class
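
A minimal sketch of one Kohonen training step under the rule above (the standard competitive rule, reconstructed from the slide's description); the weight vectors and learning rate α are made up:

```python
# Winner-take-all update: the neuron with the largest weighted sum
# moves its weights toward the (normalized) input vector.
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def train_step(weights, x, alpha=0.3):
    x = normalize(x)  # input vectors are often normalized
    sums = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    k = max(range(len(sums)), key=sums.__getitem__)  # the one active neuron
    # Only the winner updates, moving toward the input values.
    weights[k] = [w + alpha * (xi - w) for w, xi in zip(weights[k], x)]
    return k

weights = [normalize([1.0, 0.1]), normalize([0.1, 1.0])]
print(train_step(weights, [0.9, 0.2]))  # neuron 0 wins and adapts
```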

Training a CP Network

● Training the Grossberg layer
► Uses supervised training
► Weight update algorithm is similar to that used in
backpropagation


Hidden Layers and Neurons

● For most problems, one hidden layer is sufficient
● Two hidden layers are required when the function is
discontinuous
● The number of neurons is very important:
► Too few
■ Underfit the data – the NN can't learn the details
► Too many
■ Overfit the data – the NN learns the insignificant details
► Start small and increase the number until satisfactory
results are obtained

Overfitting


How is the training set chosen?

● Overfitting can also occur if a "good" training set is
not chosen
● What constitutes a "good" training set?
► Samples must represent the general population
► Samples must contain members of each class
► Samples in each class must contain a wide range of
variations or noise effects

Size of the training set

● The size of the training set is related to the
number of hidden neurons
► E.g. 10 inputs, 5 hidden neurons, 2 outputs:
11(5) + 6(2) = 67 weights (variables)
► If only 10 training samples are used to determine these
weights, the network will end up being overfit
■ Any solution found will be specific to the 10 training
samples
■ Analogous to having 10 equations with 67 unknowns – you
can come up with a specific solution, but you can't find the
general solution with the given information
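
The count generalizes; a tiny sketch of the formula behind 67 (one weight per input plus a bias for each hidden neuron, one weight per hidden neuron plus a bias for each output):

```python
# Total number of weights in a one-hidden-layer network with biases.
def num_weights(n_inputs, n_hidden, n_outputs):
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

print(num_weights(10, 5, 2))  # 11*5 + 6*2 = 67
```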


Training & Verification

● The set of all known samples is broken into two
orthogonal (independent) sets:
► Training set
■ A group of samples used to train the neural network
► Testing set
■ A group of samples used to test the performance of the
neural network
■ Used to estimate the error rate

Verification

● Provides an unbiased test of the quality of the network
● Common error is to "test" the neural network using
the same samples that were used to train the
neural network
► The network was optimized on these samples, and will
obviously perform well on them
► Doesn't give any indication as to how well the network
will be able to classify inputs that weren't in the training
set


Verification

● Various metrics can be used to grade the performance of
the neural network based upon the results of the testing
set
► Mean square error, SNR, etc.
● Resampling is an alternative method of estimating the
error rate of the neural network
► Basic idea is to iterate the training and testing
procedures multiple times
► Two main techniques are used:
■ Cross-Validation
■ Bootstrapping
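
A minimal sketch of k-fold cross-validation, one such resampling technique; `train` and `error_rate` are hypothetical stand-ins for whatever training and scoring routines are in use:

```python
# Hold out a different fold each iteration and average the error estimates.
def cross_validate(samples, k, train, error_rate):
    fold_size = len(samples) // k
    errors = []
    for i in range(k):
        test = samples[i * fold_size:(i + 1) * fold_size]
        training = samples[:i * fold_size] + samples[(i + 1) * fold_size:]
        model = train(training)                  # train on k-1 folds
        errors.append(error_rate(model, test))   # test on the held-out fold
    return sum(errors) / k                       # averaged error estimate
```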

Applications of Neural Networks


• Prediction – weather, stocks, disease

• Classification – financial risk assessment, image processing

• Data association – Text Recognition (OCR)

• Data conceptualization – Customer purchasing habits

• Filtering – Normalizing telephone signals (static)


Overview
• Advantages
– Adapt to unknown situations
– Robustness: fault tolerance due to network redundancy
– Autonomous learning and generalization

• Disadvantages
– Not exact
– Large complexity of the network structure

Conclusions

● The toy examples confirmed the basic operation of
neural networks and also demonstrated their ability to
learn the desired function and generalize when needed
● The ability of neural networks to learn and generalize,
in addition to their wide range of applicability, makes
them very powerful tools


Artificial Intelligence for Data Mining

• Neural networks are useful for data mining and decision-
support applications.
• People are good at generalizing from experience.
• Computers excel at following explicit instructions over and
over.
• Neural networks bridge this gap by modeling, on a computer,
the neural behavior of human brains.


Neural Network Characteristics

• Neural networks are useful for pattern recognition or
data classification, through a learning process.
• Neural networks simulate biological systems, where
learning involves adjustments to the synaptic
connections between neurons.


Anatomy of a Neural Network

• Neural Networks map a set of input nodes to a set of
output nodes
• The number of inputs/outputs is variable
• The network itself is composed of an arbitrary number
of nodes with an arbitrary topology

[Figure: Input 0, Input 1, ..., Input n → Neural Network → Output 0, Output 1, ..., Output m]


Biological Background

• A neuron: a many-inputs / one-output unit
• Output can be excited or not excited
• Incoming signals from other neurons determine if the
neuron shall excite ("fire")
• Output subject to attenuation in the synapses, which
are junction parts of the neuron



Learning Performance

• Supervised
– Need to be trained ahead of time with lots of data
• Unsupervised networks adapt to the input
– Applications in clustering and reducing dimensionality
– Learning may be very slow
– No help from the outside
– No training data, no information available on the desired output
– Learning by doing
– Used to pick out structure in the input:
■ Clustering
■ Compression


Genetic Algorithms

• Optimization search algorithms.
• Create an initial feasible solution and iteratively
create new "better" solutions.
• Based on natural evolution and survival of the fittest.
• Must represent a solution as an individual.
• Individual: a string I = I1, I2, …, In where each Ij is
drawn from a given alphabet A.
• Each character Ij is called a gene.
• Population: a set of individuals.


Genetic Algorithms

• A Genetic Algorithm (GA) is a computational
model consisting of five parts (sketched below):
– A starting set of individuals, P.
– Crossover: a technique to combine two parents
to create offspring.
– Mutation: randomly change an individual.
– Fitness: determine the best individuals.
– An algorithm which iteratively applies the crossover and
mutation techniques to P, using the fitness function to
determine the best individuals in P to keep.
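
A minimal sketch of this five-part model in Python; the bit-string alphabet, the "count the 1s" fitness function, and all parameters are illustrative choices, not from the slides:

```python
# Tiny GA: population of bit strings, crossover, mutation, fitness-based survival.
import random

ALPHABET = "01"
LENGTH, POP_SIZE, GENERATIONS = 12, 20, 50

def fitness(ind):                     # determine the best individuals
    return ind.count("1")

def crossover(a, b):                  # combine two parents into an offspring
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(ind, p=0.05):              # randomly change genes
    return "".join(random.choice(ALPHABET) if random.random() < p else g
                   for g in ind)

# Starting set of individuals, P.
population = ["".join(random.choice(ALPHABET) for _ in range(LENGTH))
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]          # survival of the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(max(population, key=fitness))
```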

Genetic Algorithm


GA Advantages/Disadvantages
• Advantages
– Easily parallelized
• Disadvantages
– Difficult to understand and explain to end users.
– Abstraction of the problem and method to
represent individuals is quite difficult.
– Determining fitness function is difficult.
– Determining how to perform crossover and
mutation is difficult.
