Institute of Business Informatics
Supervised Learning
Uwe Lämmel
www.wi.hs-wismar.de/~laemmel
U.laemmel@wi.hs-wismar.de
Neural Networks
Idea
Artificial Neuron & Network
Supervised Learning
Unsupervised Learning
Data Mining & other Techniques
Supervised Learning
Feed-Forward Networks
Perceptron, Adaline, TLU
Multi-layer networks
Backpropagation Algorithm
Pattern recognition
Data preparation
Examples
Bank Customer
Customer Relationship
Connections
Feed-forward:
input layer -> hidden layer -> output layer
Feed-back / auto-associative:
from the (output) layer back to a previous (hidden/input) layer
all neurons fully connected to each other (Hopfield network)
Perceptron Adaline TLU
...
Papert, Minsky and the Perceptron: History
"Once upon a time two daughter sciences were born to the new science
of cybernetics.
One sister was natural, with features inherited from the study of the
brain, from the way nature does things.
The other was artificial, related from the beginning to the use of
computers.
But Snow White was not dead.
What Minsky and Papert had shown the world as proof was not the
heart of the princess; it was the heart of a pig."
Seymour Papert, 1988
Perception
perception: the first step of recognition
[figure: picture -> mapping layer (fixed 1-to-1 links) -> perception layer (trainable, fully connected)]
Perceptron
Input layer:
binary input, passed through, no trainable links
Propagation function: netj = Σi oi wij
Activation function: oj = aj = 1 if netj >= θj, 0 otherwise
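The propagation and activation rules above can be sketched in a few lines of Python; this is an illustrative sketch, and the function name, weights and threshold below are made up, not taken from the slides:

```python
def perceptron_output(inputs, weights, theta):
    """Threshold logic unit: output 1 iff the weighted input sum reaches theta."""
    net = sum(o * w for o, w in zip(inputs, weights))   # netj = sum of oi * wij
    return 1 if net >= theta else 0

# AND of two binary inputs, with illustrative weights w = (1, 1), theta = 1.5
print(perceptron_output([1, 1], [1, 1], 1.5))  # 1
print(perceptron_output([1, 0], [1, 1], 1.5))  # 0
```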
Linearly separable
Neuron j should be 0 iff neurons 1 and 2 have the same value (o1 = o2), otherwise 1 (XOR):
0·w1j + 0·w2j < θj   =>  θj > 0
0·w1j + 1·w2j >= θj  =>  w2j >= θj
1·w1j + 0·w2j >= θj  =>  w1j >= θj
1·w1j + 1·w2j < θj   =>  w1j + w2j < θj
these conditions cannot all hold at once: XOR is not linearly separable
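The contradiction can also be checked by brute force: a small grid search over weights and threshold (an illustrative sketch, not part of the slides; the grid bounds are arbitrary) finds a TLU for AND but none for XOR:

```python
def tlu(o1, o2, w1, w2, theta):
    """Threshold logic unit with two inputs."""
    return 1 if o1 * w1 + o2 * w2 >= theta else 0

def separable(target):
    """Search a coarse weight/threshold grid for a TLU computing `target`."""
    grid = [x / 2 for x in range(-8, 9)]   # -4.0 .. 4.0 in steps of 0.5
    for w1 in grid:
        for w2 in grid:
            for theta in grid:
                if all(tlu(a, b, w1, w2, theta) == target(a, b)
                       for a in (0, 1) for b in (0, 1)):
                    return True
    return False

print(separable(lambda a, b: a & b))  # True:  AND is linearly separable
print(separable(lambda a, b: a ^ b))  # False: XOR is not
```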
Linearly separable
[figure: the four input patterns in the (o1, o2) plane; no straight line separates the two classes]
Learning is easy
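For a single perceptron, learning really is easy: the classic perceptron learning rule shifts the weights and the threshold by the output error until all patterns are classified correctly. A minimal sketch, assuming that rule; the learning rate and the OR training set are illustrative choices:

```python
def train_perceptron(samples, eta=0.5, epochs=20):
    """Perceptron learning rule: move weights and threshold by the output error."""
    w1 = w2 = theta = 0.0
    for _ in range(epochs):
        for (x1, x2), t in samples:
            o = 1 if x1 * w1 + x2 * w2 >= theta else 0
            err = t - o
            w1 += eta * err * x1
            w2 += eta * err * x2
            theta -= eta * err      # the threshold moves opposite to the weights
    return w1, w2, theta

# learn the (linearly separable) OR function
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, theta = train_perceptron(or_samples)
print(w1, w2, theta)  # 0.5 0.5 0.5
```

This would fail for XOR, which is exactly the limit shown on the previous slides.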
Exercise
Decoding
input: binary code of a digit
output: unary representation, i.e. as many 1 digits as the digit represents:
5 : 11111
architecture:
Exercise
Decoding
input: Binary code of a digit
output: classification:
0 ~ 1st neuron, 1 ~ 2nd neuron, ..., 5 ~ 6th neuron, ...
architecture:
Exercises
Exercises
Multi-layer Perceptron
Feed-Forward Network
Evaluation of the net output in a feed-forward network
training pattern p:
input neurons Ni: oi = pi
hidden neurons Nj: netj = Σi oi wij, oj = actj
output neurons Nk: netk = Σj oj wjk, ok = actk
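The evaluation above (oi = pi at the input, netj/actj in the hidden layer, netk/actk at the output) can be written as a short forward pass. A sketch assuming the logistic activation function; the toy 2-2-1 weights are illustrative:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(pattern, w_in_hidden, w_hidden_out):
    """Propagate a pattern through input -> hidden -> output."""
    o_i = pattern                                              # oi = pi
    o_j = [logistic(sum(o * w for o, w in zip(o_i, weights)))  # netj -> actj
           for weights in w_in_hidden]
    o_k = [logistic(sum(o * w for o, w in zip(o_j, weights)))  # netk -> actk
           for weights in w_hidden_out]
    return o_k

# toy 2-2-1 network with illustrative weights
out = forward([1.0, 0.0], [[0.5, -0.5], [1.0, 1.0]], [[1.0, -1.0]])
print(out)
```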
Backpropagation-Learning
Algorithm
supervised learning
the error is a function of the weights wi:
E(W) = E(w1, w2, ..., wn)
we are looking for a minimal error
minimal error = hollow in the error surface
Backpropagation uses the gradient for weight adaptation
error curve
[figure: error surface over weight1 and weight2]
Problem
[figure: network with hidden layer, output and teaching output]
Gradient:
vector orthogonal to a surface, in the direction of the steepest slope
the derivative of a function in a certain direction is the projection of the gradient onto this direction
[figure: example of an error curve of a weight wi]
Example: Newton Approximation
calculation of the square root of a: find the root of f(x) = x² - a
tan α = f'(x) = 2x
tan α = f(x) / (x - x')
=> x' = x - f(x)/f'(x) = ½ (x + a/x)
example: f(x) = x² - 5, x0 = 2
x1 = ½ (x0 + 5/x0) = 2.25
x2 = ½ (x1 + 5/x1) ≈ 2.2361
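The iteration above in code; a sketch, where the function name and the fixed step count are arbitrary choices:

```python
def newton_sqrt(a, x=2.0, steps=10):
    """Newton's iteration x' = (x + a/x) / 2 for the root of f(x) = x^2 - a."""
    for _ in range(steps):
        x = (x + a / x) / 2
    return x

print(newton_sqrt(5))  # ~2.2360679...
```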
Backpropagation - Learning
gradient-descent algorithm
supervised learning:
the error signal is used for weight adaptation
error signal δ:
(teaching output - calculated output), if j is an output neuron
weighted sum of the error signals of the successor neurons, otherwise
weight adaptation: w'ij = wij + η oi δj
η: learning rate
δ: error signal
Standard-Backpropagation Rule
gradient descent: derivative of the activation function
logistic function: fLogistic(x) = 1 / (1 + e^(-x))
f'act(netj) = fact(netj) (1 - fact(netj)) = oj (1 - oj)
δj = oj (1 - oj) Σk δk wjk   if j is a hidden neuron
δj = oj (1 - oj) (tj - oj)   if j is an output neuron
w'ij = wij + η oi δj
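One update with these δ formulas can be traced on a minimal 1-1-1 network (one input, one hidden, one output neuron). This is an illustrative sketch: η, the input, the teaching output and the initial weights are made up; the final print shows the squared error shrinking after the update:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# illustrative 1-1-1 network: input i, hidden j, output k
eta, o_i, t_k = 0.5, 1.0, 1.0   # learning rate, input, teaching output
w_ij, w_jk = 0.3, 0.7           # made-up initial weights

o_j = logistic(o_i * w_ij)
o_k = logistic(o_j * w_jk)

delta_k = o_k * (1 - o_k) * (t_k - o_k)      # output neuron
delta_j = o_j * (1 - o_j) * delta_k * w_jk   # hidden neuron

w_jk = w_jk + eta * o_j * delta_k            # w'jk = wjk + eta * oj * deltak
w_ij = w_ij + eta * o_i * delta_j

o_k_new = logistic(logistic(o_i * w_ij) * w_jk)
print((t_k - o_k) ** 2, (t_k - o_k_new) ** 2)  # squared error shrinks
```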
Backpropagation
Examples:
XOR (Excel)
Bank Customer
Backpropagation - Problems
[figure: error curve with three problem regions A, B, C]
Backpropagation-Problems
A: flat plateau
weight adaptation is slow
finding a minimum takes a lot of time
C: leaving a minimum
if the modification in one training step is too high, the minimum can be lost
Solutions: looking at the values
Solution: Quickprop
assumption: the error curve is a quadratic function (parabola)
calculate the vertex of the parabola:
Δwij(t) = S(t) / (S(t-1) - S(t)) · Δwij(t-1)
Resilient Propagation (RPROP)
sign and size of the weight modification are calculated separately
bij(t): size of the modification
bij(t) = bij(t-1) · η+   if S(t-1) · S(t) > 0
bij(t) = bij(t-1) · η-   if S(t-1) · S(t) < 0
bij(t) = bij(t-1)        otherwise
η+ > 1: both ascents have the same sign -> bigger step
0 < η- < 1: the ascents differ -> smaller step
Δwij(t) = -bij(t)             if S(t-1) > 0 and S(t) > 0
Δwij(t) = +bij(t)             if S(t-1) < 0 and S(t) < 0
Δwij(t) = -Δwij(t-1)          if S(t-1) · S(t) < 0 (*)
Δwij(t) = -sgn(S(t)) · bij(t) otherwise
(*) S(t) is set to 0 (S(t) := 0), so at time t+1 the 4th case applies.
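The case distinctions above translate directly into code. A sketch for a single weight; the function name is made up, and the defaults η+ = 1.2, η- = 0.5 are common choices rather than values from the slide:

```python
def rprop_step(w, b_prev, dw_prev, s_prev, s, eta_plus=1.2, eta_minus=0.5):
    """One RPROP update for a single weight.

    s_prev, s: previous and current slope dE/dw; b_prev: previous step size;
    dw_prev: previous weight change.  Returns (w, b, dw, s) for the next step.
    """
    if s_prev * s > 0:            # ascents have the same sign: bigger step
        b = b_prev * eta_plus
    elif s_prev * s < 0:          # sign changed: smaller step
        b = b_prev * eta_minus
    else:
        b = b_prev

    if s_prev * s < 0:            # a minimum was overshot: revert the step
        dw = -dw_prev
        s = 0.0                   # (*) so the 4th case applies at time t+1
    elif s > 0:
        dw = -b
    elif s < 0:
        dw = b
    else:
        dw = 0.0                  # sgn(0) = 0: no change
    return w + dw, b, dw, s

# positive slope: step downhill by the step size b
w, b, dw, s = rprop_step(1.0, 0.1, 0.0, 0.0, 2.0)
print(w, b)
```

On a sign change the previous step is reverted and S(t) cleared, so the following call falls through to the -sgn(S(t))·bij(t) case, exactly as footnote (*) describes.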
Limits of the Learning Algorithm
Exercise - JavaNNS
Pattern Recognition
[figure: input layer -> 1st hidden layer -> 2nd hidden layer -> output layer]
Example: Pattern Recognition
Font Example
Exercise
load the network font_untrained
train the network, use various learning
algorithms:
(look at the SNNS documentation for the
parameters and their meaning)
Backpropagation: η = 2.0
Backpropagation with momentum: η = 0.8, μ = 0.6, c = 0.1
Quickprop: η = 0.1, mg = 2.0, n = 0.0001
Rprop: η = 0.6
use various values for
learning parameter, momentum, and noise:
learning parameter 0.2 0.3 0.5 1.0
Momentum 0.9 0.7 0.5 0.0
noise 0.0 0.1 0.2
Example: Bank Customer
objectives:
- prospects of better results
- adaptation to algorithms
- data reduction
- trouble shooting
methods:
- selection and integration
- completion
- transformation: normalization, coding, filter
Selection and Integration
Completion / Cleaning
missing values:
- ignore / omit the attribute
- add values: manually; a global constant ("missing value"); the average; a highly probable value
- remove the data set
noisy data
inconsistent data
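Replacing missing values by the attribute average, one of the options listed above, can be sketched as follows; the function name and the toy column are illustrative:

```python
def fill_missing_with_average(column):
    """Replace missing entries (None) by the average of the known values."""
    known = [v for v in column if v is not None]
    average = sum(known) / len(known)
    return [average if v is None else v for v in column]

print(fill_missing_with_average([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```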
Transformation
Normalization
Coding
Filter
Normalization of values
logarithmic normalization:
act = (ln(x) - ln(minValue)) / (ln(maxValue) - ln(minValue))
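The logarithmic normalization formula, transcribed directly into code; the function name and the min/max values are illustrative:

```python
import math

def log_normalize(x, min_value, max_value):
    """act = (ln(x) - ln(minValue)) / (ln(maxValue) - ln(minValue))."""
    return ((math.log(x) - math.log(min_value))
            / (math.log(max_value) - math.log(min_value)))

# maps [1, 1000] onto [0, 1], spreading out the small values
print(log_normalize(1, 1, 1000))   # 0.0
print(log_normalize(10, 1, 1000))  # ~1/3
```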
Binary Coding of nominal values I
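The slide title refers to binary coding of nominal values; the usual scheme is 1-of-n (one-hot) coding, one binary input neuron per category. A sketch; the function name and the example categories are illustrative:

```python
def one_hot(value, categories):
    """1-of-n (one-hot) binary coding of a nominal value."""
    return [1 if value == c else 0 for c in categories]

# e.g. a nominal attribute such as "credit history" (categories are made up)
print(one_hot("good", ["bad", "unknown", "good"]))  # [0, 0, 1]
```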
Bank Customer
input attributes: credit history, debt, collateral, income

                          will cancel   will not cancel
customer gets an offer          43.80             66.30
Goal Function: Lift

                  will cancel   will not cancel
gets an offer           43.80             66.30
gets no offer            0.00             72.00
Data
[figure: data table with 32 input columns; important results and missing values marked]
Feed-Forward Network: What to do?
gain:
additional income from the mailing action,
if the target group was chosen according to the analysis
Review: Students' Project
better results
wishes:
an engineering approach to data mining
real data for teaching purposes
Data Mining Cup 2007
Data
DMC2007
Optimization of Neural Networks
objectives
good results in an application:
better generalisation
(improve correctness)
faster processing of patterns
(improve efficiency)
good presentation of the results
(improve comprehension)
Ability to generalize
Development of an NN-application
Memory Capacity
the number of patterns a network can store without generalisation
Memory Capacity - Experiment
the output layer is a copy of the input layer
training set consists of n random patterns
error = 0: the network can store more than n patterns
error >> 0: the network cannot store n patterns
memory capacity: the n with error > 0, where the error is 0 for n-1 patterns and >> 0 for n+1 patterns
Layers Not fully Connected
connections:
new
removed
remaining
Summary
Feed-forward network
Perceptron (has limits)
Learning is Math
Backpropagation is a "backpropagation of error" algorithm
works like gradient descent
Activation functions: logistic, tanh
Application in Data Mining, Pattern Recognition
data preparation is important
Finding an appropriate architecture