
Business School

Institute of
Business Informatics

Supervised Learning
Uwe Lämmel
www.wi.hs-wismar.de/~laemmel
U.laemmel@wi.hs-wismar.de
1 Supervised Learning
Neural Networks

Idea
Artificial Neuron & Network
Supervised Learning
Unsupervised Learning
Data Mining - other Techniques

2 Supervised Learning
Supervised Learning

Feed-Forward Networks
Perceptron, AdaLinE, LTU
Multi-Layer networks
Backpropagation Algorithm
Pattern recognition
Data preparation
Examples
Bank Customer
Customer Relationship

3 Supervised Learning
Connections
Feed-forward:
input layer -> hidden layer -> output layer

Feed-back / auto-associative:
from (output) layer back to previous (hidden/input) layer
all neurons fully connected to each other
example: Hopfield network

4 Supervised Learning
Perceptron - AdaLinE - TLU

one layer of trainable links only

AdaLinE: adaptive linear element
TLU: threshold linear unit

a class of neural networks with a special architecture:

...

5 Supervised Learning
Papert, Minsky and Perceptron -
History
"Once upon a time two daughter sciences were born to the new science
of cybernetics.
One sister was natural, with features inherited from the study of the
brain, from the way nature does things.
The other was artificial, related from the beginning to the use of
computers.

But Snow White was not dead.
What Minsky and Papert had shown the world as proof was not the
heart of the princess; it was the heart of a pig."
Seymour Papert, 1988

6 Supervised Learning
Perception

perception: first step of recognition -
becoming aware of something via the senses

(figure: picture - mapping layer - output layer;
fixed 1-1 links; trainable, fully connected links)
7 Supervised Learning
Perceptron

Input layer
binary input, passed through,
no trainable links
Propagation function: net_j = Σ_i o_i·w_ij

Activation function:
o_j = a_j = 1 if net_j ≥ θ_j, 0 otherwise

A perceptron can learn every function
it can represent, and it does so in finite time.
(perceptron convergence theorem, F. Rosenblatt)
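A small worked example (added for illustration, not from the slide): with weights w_1j = w_2j = 1 and threshold θ_j = 1.5, such a neuron computes the logical AND of two binary inputs:

\mathrm{net}_j = o_1 + o_2, \qquad o_j = 1 \iff o_1 + o_2 \ge 1.5 \iff o_1 = o_2 = 1 .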

8 Supervised Learning
Linear separable

Neuron j should be 0
iff both neurons 1 and 2 have the same
value (o1 = o2), otherwise 1:

net_j = o1·w1j + o2·w2j, threshold θ_j

0·w1j + 0·w2j < θ_j
0·w1j + 1·w2j ≥ θ_j
1·w1j + 0·w2j ≥ θ_j
1·w1j + 1·w2j < θ_j
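A short derivation, added for clarity: the four conditions contradict each other, so no single layer of trainable weights can realize this function:

w_{1j} \ge \theta_j,\; w_{2j} \ge \theta_j \;\Rightarrow\; w_{1j} + w_{2j} \ge 2\theta_j > \theta_j \quad(\text{since } \theta_j > 0),\qquad \text{contradicting } w_{1j} + w_{2j} < \theta_j .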

9 Supervised Learning
Linear separable

(figure: points (0,0), (0,1), (1,0), (1,1) in the o1-o2 plane)

net_j = o1·w1j + o2·w2j
o1·w1j + o2·w2j = θ_j is a line in the 2-dim. space
the line would have to divide the plane so
that (0,1) and (1,0) lie in different half-planes
the network can not solve the problem:
a perceptron can represent only some functions
a neural network representing the XOR
function needs hidden neurons

10 Supervised Learning
Learning is easy

repeat
  for each input pattern do begin
    calculate output;
    for each j in OutputNeurons do
      if oj <> tj then
        if oj = 0 then        { output = 0, but 1 expected }
          for each i in InputNeurons do
            wij := wij + oi
        else                  { output = 1, but 0 expected }
          for each i in InputNeurons do
            wij := wij - oi;
  end
until desired behaviour
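A compact Java sketch of this learning rule (an illustration, not from the lecture; the class and method names such as Perceptron.trainStep are my own):

// Minimal perceptron learning sketch (illustration only).
public class Perceptron {
    double[][] w;      // w[i][j]: weight from input i to output j
    double[] theta;    // threshold per output neuron

    Perceptron(int nIn, int nOut) {
        w = new double[nIn][nOut];
        theta = new double[nOut];
    }

    int[] output(int[] o) {
        int[] out = new int[theta.length];
        for (int j = 0; j < theta.length; j++) {
            double net = 0.0;                      // net_j = sum_i o_i * w_ij
            for (int i = 0; i < o.length; i++) net += o[i] * w[i][j];
            out[j] = net >= theta[j] ? 1 : 0;      // threshold activation
        }
        return out;
    }

    // one learning step for a single pattern o with teaching output t
    void trainStep(int[] o, int[] t) {
        int[] out = output(o);
        for (int j = 0; j < t.length; j++) {
            if (out[j] == t[j]) continue;
            int sign = (out[j] == 0) ? +1 : -1;    // add o_i if 1 expected, subtract if 0 expected
            for (int i = 0; i < o.length; i++) w[i][j] += sign * o[i];
        }
    }
}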

11 Supervised Learning
Exercise

Decoding
input: binary code of a digit
output - unary representation:
as many 1-digits as the digit
represents, e.g. 5 -> 11111
architecture:

12 Supervised Learning
Exercise

Decoding
input: Binary code of a digit
output: classification:
0~ 1st Neuron, 1~ 2nd Neuron, ... 5~ 6th
Neuron, ...
architecture:

13 Supervised Learning
Exercises

1. Look at the EXCEL-file of the decoding problem


2. Implement (in PASCAL/Java)
a 4-10-Perceptron which transforms a binary
representation of a digit (0..9) into a decimal
number.
Implement the learning algorithm and train the
network.
3. Which task can be learned faster?
(Unary representation or classification)

14 Supervised Learning
Exercises

5. Develop a perceptron for the


recognition of digits 0..9. (pixel
representation)
input layer: 3x7-input neurons
Use the SNNS or JavaNNS
6. Can we recognize numbers greater
than 9 as well?
7. Develop a perceptron for the
recognition of capital letters. (input
layer 5x7)

15 Supervised Learning
multi-layer Perceptron

Overcomes the limits of a perceptron

several trainable layers

a two-layer perceptron can classify convex polygons
a three-layer perceptron can classify any sets

multi-layer perceptron = feed-forward network
= backpropagation network

16 Supervised Learning
Multi-layer feed-forward network

17 Supervised Learning
Feed-Forward Network

18 Supervised Learning
Evaluation of the net output in a feed-forward network
(training pattern p)

input layer (neurons N_i):      o_i = p_i
hidden layer(s) (neurons N_j):  net_j -> o_j = act_j
output layer (neurons N_k):     net_k -> o_k = act_k
19 Supervised Learning
Backpropagation-Learning
Algorithm
supervised learning
the error is a function of the weights w_i:
E(W) = E(w1, w2, ..., wn)
we are looking for a minimal error
minimal error = hollow in the error surface
backpropagation uses the gradient
for weight adaptation
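Written out (a common formulation; the exact error definition is not given on the slide), the error for one pattern and the gradient step are:

E(W) = \tfrac{1}{2}\sum_{k} (t_k - o_k)^2, \qquad \Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}}

with t_k the teaching output and η the learning rate.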

20 Supervised Learning
error curve

(figure: error surface over two weights, weight1 and weight2)

21 Supervised Learning
Problem

(figure: input layer - hidden layer - output layer, with teaching output)

error in the output layer:
difference between output and teaching output
error in a hidden layer: ?
22 Supervised Learning
Gradient descent

Gradient:
vector orthogonal to a surface,
pointing in the direction of the strongest slope
the derivative of a function in a certain direction
is the projection of the gradient onto this direction

(figure: example of an error curve of a weight w_i)

23 Supervised Learning
Example: Newton-Approximation

calculation of the square root of a: f(x) = x² - a, here f(x) = x² - 5

tan α = f'(x) = 2x
tan α = f(x) / (x - x')
=> x' = ½ (x + a/x)

x0 = 2
x1 = ½ (x0 + 5/x0) = 2.25
x2 = ½ (x1 + 5/x1) ≈ 2.2361

24 Supervised Learning
Backpropagation - Learning

gradient-descent algorithm
supervised learning:
an error signal is used for weight adaptation
error signal δ_j:
teaching output - calculated output, if j is an output neuron
weighted sum of the error signals of the successor neurons, if j is a hidden neuron

weight adaptation:
w'_ij = w_ij + η·o_i·δ_j
η: learning rate
δ_j: error signal

25 Supervised Learning
Standard-Backpropagation Rule
gradient descent needs the derivative of the activation function

logistic function:  f_log(x) = 1 / (1 + e^(-x))
derivative:  f'_act(net_j) = f_act(net_j)·(1 - f_act(net_j)) = o_j·(1 - o_j)

the error signal δ_j is therefore:
δ_j = o_j·(1 - o_j)·Σ_k δ_k·w_jk     if j is a hidden neuron
δ_j = o_j·(1 - o_j)·(t_j - o_j)      if j is an output neuron

weight adaptation:  w'_ij = w_ij + η·o_i·δ_j
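The rule above as one backpropagation training step in Java (a sketch assuming a single hidden layer and online learning; names and array layout are my own):

// One backpropagation training step for a net with one hidden layer (sketch).
public class Backprop {
    static double logistic(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Updates wIH (input->hidden) and wHO (hidden->output) in place for one pattern p
    // with teaching output t, using learning rate eta.
    static void trainStep(double[] p, double[] t,
                          double[][] wIH, double[][] wHO, double eta) {
        // forward pass
        double[] h = new double[wIH[0].length];
        for (int j = 0; j < h.length; j++) {
            double net = 0.0;
            for (int i = 0; i < p.length; i++) net += p[i] * wIH[i][j];
            h[j] = logistic(net);
        }
        double[] o = new double[wHO[0].length];
        for (int k = 0; k < o.length; k++) {
            double net = 0.0;
            for (int j = 0; j < h.length; j++) net += h[j] * wHO[j][k];
            o[k] = logistic(net);
        }
        // error signals: delta_k = o_k(1-o_k)(t_k-o_k), delta_j = o_j(1-o_j)*sum_k delta_k w_jk
        double[] deltaOut = new double[o.length];
        for (int k = 0; k < o.length; k++) deltaOut[k] = o[k] * (1 - o[k]) * (t[k] - o[k]);
        double[] deltaHid = new double[h.length];
        for (int j = 0; j < h.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < o.length; k++) sum += deltaOut[k] * wHO[j][k];
            deltaHid[j] = h[j] * (1 - h[j]) * sum;
        }
        // weight adaptation: w'_ij = w_ij + eta * o_i * delta_j
        for (int j = 0; j < h.length; j++)
            for (int k = 0; k < o.length; k++) wHO[j][k] += eta * h[j] * deltaOut[k];
        for (int i = 0; i < p.length; i++)
            for (int j = 0; j < h.length; j++) wIH[i][j] += eta * p[i] * deltaHid[j];
    }
}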

26 Supervised Learning
Backpropagation

Examples:
XOR (Excel)
Bank Customer

27 Supervised Learning
Backpropagation - Problems

(figure: error curve with three problem situations A, B and C)

28 Supervised Learning
Backpropagation-Problems

A: flat plateau
weight adaptation is slow
finding a minimum takes a lot of time

B: Oscillation in a narrow gorge


it jumps from one side to the other and back

C: leaving a minimum
if the modification in one training step is too high,
the minimum can be lost

29 Supervised Learning
Solutions: looking at the values

change the parameters of the logistic
function in order to get other values
the modification of a weight depends on the
output o_i of the predecessor neuron:
if o_i = 0, no modification will take place
if we use binary input we probably have a
lot of zero values: change [0,1] into a
symmetric interval such as [-1,1]
use another activation function, e.g. tanh,
and use values in [-1,1]

30 Supervised Learning
Solution: Quickprop
assumption: the error curve is a quadratic function (a parabola)
calculate the vertex of the parabola

Δw_ij(t) = ( S(t) / (S(t-1) - S(t)) ) · Δw_ij(t-1)

slope of the error curve:
S(t) = ∂E / ∂w_ij (t)
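As a small sketch of this update (my own; real implementations such as the one in SNNS additionally limit the step, e.g. by a maximum growth factor):

// Quickprop change of a single weight, directly from the formula above (sketch).
public class Quickprop {
    // slopePrev = S(t-1), slope = S(t), dwPrev = delta w_ij(t-1)
    static double weightChange(double slopePrev, double slope, double dwPrev) {
        return slope / (slopePrev - slope) * dwPrev;   // delta w_ij(t)
    }
}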

31 Supervised Learning
Resilient Propagation (RPROP)
sign and size of the weight modification are calculated
separately; b_ij(t) is the size (step width) of the modification:

            b_ij(t-1)·η⁺   if S(t-1)·S(t) > 0
b_ij(t) =   b_ij(t-1)·η⁻   if S(t-1)·S(t) < 0
            b_ij(t-1)      otherwise

η⁺ > 1:      both slopes have the same sign -> bigger step
0 < η⁻ < 1:  the slopes differ in sign -> smaller step

            -b_ij(t)             if S(t-1) > 0 and S(t) > 0
Δw_ij(t) =  +b_ij(t)             if S(t-1) < 0 and S(t) < 0
            -Δw_ij(t-1)          if S(t-1)·S(t) < 0   (*)
            -sgn(S(t))·b_ij(t)   otherwise

(*) in this case S(t) is set to 0 (S(t) := 0); at time t+1 the 4th case will be applied.
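The case distinction above as a per-weight RPROP step in Java (a sketch; the constants η⁺ = 1.2 and η⁻ = 0.5 are typical choices, not taken from the slide):

// One RPROP update for a single weight (sketch of the rules above).
public class Rprop {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;  // assumed typical values

    // state for one weight: step size b, previous slope sPrev, previous change dwPrev
    double b = 0.1, sPrev = 0.0, dwPrev = 0.0;

    // s = S(t) = dE/dw at time t; returns the weight change at time t
    double step(double s) {
        if (sPrev * s > 0) b *= ETA_PLUS;        // same sign: bigger step
        else if (sPrev * s < 0) b *= ETA_MINUS;  // sign change: smaller step

        double dw;
        if (sPrev * s < 0) {                     // case (*): revert the previous change
            dw = -dwPrev;
            sPrev = 0.0;                         // S(t) := 0, so the 4th case applies next
        } else {
            dw = -Math.signum(s) * b;            // covers cases 1, 2 and 4
            sPrev = s;
        }
        dwPrev = dw;
        return dw;
    }
}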

32 Supervised Learning
Limits of the Learning Algorithm

it is not a model for biological learning


no teaching output in natural learning
no such backward feedback in a natural neural network
(at least none has been discovered yet)
training of an ANN is rather time consuming

33 Supervised Learning
Exercise - JavaNNS

Implement a feed-forward network consisting of


2 input neurons, 2 hidden neurons and one output
neuron.
Train the network so that it simulates the XOR-
function.

Implement a 4-2-4-network, which works like the


identity function. (Encoder-Decoder-Network).
Try other versions: 4-3-4, 8-4-8, ...
What can you say about the training effort?

34 Supervised Learning
Pattern Recognition
(figure: input layer - 1st hidden layer - 2nd hidden layer - output layer)
35 Supervised Learning
Example: Pattern Recognition

JavaNNS example: Font

36 Supervised Learning
font Example

input = 24x24 pixel-array


output layer: 75 neurons, one neuron for each
character:
digits
letters (lower case, capital)
separators and operator characters
two hidden layers of 4x6 neurons each
all neurons of a row of the input layer are linked to
one neuron of the first hidden layer
all neurons of a column of the input layer are linked
to one neuron of the second hidden layer

37 Supervised Learning
Exercise
load the network font_untrained
train the network, use various learning
algorithms:
(look at the SNNS documentation for the
parameters and their meaning)
Backpropagation                  learning parameter = 2.0
Backpropagation with momentum    learning parameter = 0.8, mu = 0.6, c = 0.1
Quickprop                        learning parameter = 0.1, mg = 2.0, n = 0.0001
Rprop                            learning parameter = 0.6
use various values for
learning parameter, momentum, and noise:
learning parameter 0.2 0.3 0.5 1.0
Momentum 0.9 0.7 0.5 0.0
noise 0.0 0.1 0.2

38 Supervised Learning
Example: Bank Customer

A1: Credit history


A2: debt
A3: collateral
A4: income

network architecture depends on the coding of input and output


How can we code values like good, bad, 1, 2, 3, ...?
39 Supervised Learning
Data Pre-processing

objectives methods
prospects of better selection and
results integration
adaptation to algorithms completion
data reduction transformation
trouble shooting normalization
coding
filter

40 Supervised Learning
Selection and Integration

unification of data (different origins)


selection of attributes/features
reduction
omit obviously non-relevant data
all values are equal
key values
meaning not relevant
data protection

41 Supervised Learning
Completion / Cleaning

Missing values
ignore / omit attribute
add values
manual
global constant (missing
value)
average
highly probable value
remove data set
noisy data
inconsistent data

42 Supervised Learning
Transformation

Normalization
Coding
Filter

43 Supervised Learning
Normalization of values

normalization, equally distributed

in the range [0,1]
e.g. for the logistic function:
act = (x - minValue) / (maxValue - minValue)

in the range [-1,+1]
e.g. for the activation function tanh:
act = (x - minValue) / (maxValue - minValue) * 2 - 1

logarithmic normalization:
act = (ln(x) - ln(minValue)) / (ln(maxValue) - ln(minValue))
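The three variants as small Java helpers (illustration only; minValue and maxValue are taken from the training data):

// Normalization helpers for a single value x with known minValue/maxValue.
public class Normalize {
    static double toUnitRange(double x, double min, double max) {
        return (x - min) / (max - min);                 // [0,1], e.g. for the logistic function
    }
    static double toSymmetricRange(double x, double min, double max) {
        return (x - min) / (max - min) * 2.0 - 1.0;     // [-1,+1], e.g. for tanh
    }
    static double logNormalize(double x, double min, double max) {
        return (Math.log(x) - Math.log(min)) / (Math.log(max) - Math.log(min));
    }
}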

44 Supervised Learning
Binary Coding of nominal values I

no order relation, n values

n neurons,
each neuron represents one and only one value
example:
red, blue, yellow, white, black
1,0,0,0,0  0,1,0,0,0  0,0,1,0,0  ...
disadvantage:
n neurons necessary; lots of zeros in the input
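A small Java sketch of this 1-of-n coding (illustration; the class name OneHot is my own):

import java.util.List;

// 1-of-n coding of a nominal value: one neuron per possible value.
public class OneHot {
    static double[] encode(String value, List<String> allValues) {
        double[] code = new double[allValues.size()];
        int idx = allValues.indexOf(value);   // position of the value in the list
        if (idx >= 0) code[idx] = 1.0;        // exactly one 1, all other inputs stay 0
        return code;
    }
    public static void main(String[] args) {
        List<String> colours = List.of("red", "blue", "yellow", "white", "black");
        // "blue" -> [0.0, 1.0, 0.0, 0.0, 0.0]
        System.out.println(java.util.Arrays.toString(encode("blue", colours)));
    }
}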

45 Supervised Learning
Bank Customer
              credit history   debt   collateral   income
customer 1:   bad              high   adequate     3
customer 2:   good             low    adequate     2

Are these customers good ones?

46 Supervised Learning
Data Mining Cup
2002
The Problem: A Mailing Action
mailing action of a company:
special offer
estimated annual income per customer:

                     will cancel   will not cancel
gets an offer            43.80            66.30
gets no offer             0.00            72.00

given:
10,000 sets of customer data,
containing 1,000 cancellers (training)
problem:
a test set containing 10,000 customer data
Who will cancel? Whom to send an offer?
47 Supervised Learning
Mailing Action - Aim?

                     will cancel   will not cancel
gets an offer            43.80            66.30
gets no offer             0.00            72.00

no mailing action:
9,000 x 72.00 = 648,000
everybody gets an offer:
1,000 x 43.80 + 9,000 x 66.30 = 640,500
maximum (100% correct classification):
1,000 x 43.80 + 9,000 x 72.00 = 691,800

48 Supervised Learning
Goal Function: Lift

                     will cancel   will not cancel
gets an offer            43.80            66.30
gets no offer             0.00            72.00

basis: no mailing action: 9,000 x 72.00

goal = extra income of the mailing group M:
lift_M = 43.80·c_M + 66.30·nk_M - 72.00·nk_M
(c_M: cancellers in M, nk_M: non-cancellers in M)
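For illustration, the goal function as a small Java method (my own sketch; the figures are those from the table above):

// Extra income ("lift") of a mailing group, compared with no mailing action.
public class Lift {
    // cM: cancellers in the mailing group, nkM: non-cancellers in the mailing group
    static double lift(int cM, int nkM) {
        return 43.80 * cM + 66.30 * nkM - 72.00 * nkM;
    }
    public static void main(String[] args) {
        // everybody gets an offer: 1,000 cancellers, 9,000 loyal customers
        System.out.println(lift(1000, 9000));   // -7500.0 = 640,500 - 648,000
    }
}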

49 Supervised Learning
Data

(figure: data table with 32 input attributes, important result columns marked, and missing values)
50 Supervised Learning
Feed Forward Network What to do?

train the net with training set (10,000)


test the net using the test set ( another 10,000)
classify all 10,000 customers as cancellers or
loyal customers
evaluate the additional income
51 Supervised Learning
Results
data mining cup 2002 / neural network project 2004

gain:
additional income from the mailing action,
if the target group is chosen according to the analysis
52 Supervised Learning
Review Students Project

copy of the data mining cup

real data
known results
contest -> motivation, enthusiasm

better results

wishes:
engineering approach to data mining
real data for teaching purposes
53 Supervised Learning
Data Mining Cup 2007

started on April 10.


check-out couponing
Who will get a rebate coupon?
50,000 data sets for training

54 Supervised Learning
Data

55 Supervised Learning
DMC2007

~75% of the target outputs are N (no coupon)

i.e. a classification has to be correct for more than 75% of the data!!
first experiments: no success
deadline: May 31st

56 Supervised Learning
Optimization of Neural Networks

objectives
good results in an application:
better generalisation
(improve correctness)
faster processing of patterns
(improve efficiency)
good presentation of the results
(improve comprehension)

57 Supervised Learning
Ability to generalize

a trained net can classify data


(out of the same class as the learning data)
that it has never seen before
aim of every ANN development

network too large:


all training patterns are just memorized
no ability to generalize
network too small:
rules of pattern recognition can not be learned
(simple example: Perceptron and XOR)

58 Supervised Learning
Development of an NN-application

(figure: development cycle)
build a network architecture
input of training pattern -> calculate network output -> compare to teaching output
error is too high -> modify weights / change parameters, train again
quality is good enough -> use test set data:
evaluate output -> compare to teaching output
error is too high -> revise the network
quality is good enough -> done
59 Supervised Learning
Possible Changes
Architecture of NN
size of a network
shortcut connection
partially connected layers
remove/add links
receptive areas
Find the right parameter
values
learning parameter
size of layers
using genetic algorithms

60 Supervised Learning
Memory Capacity

Number of patterns
a network can store without generalisation

to figure out the memory capacity:

change the output layer: output layer = copy of the input layer

train the network with an increasing number of random patterns:
error becomes small: the network stores all patterns
error remains high: the network can not store all patterns
in between: memory capacity

61 Supervised Learning
Memory Capacity - Experiment

output layer is a copy of the input layer
training set consisting of n random patterns
error:
error = 0:
the network can store n patterns (possibly more)
error >> 0:
the network can not store n patterns
memory capacity is about n, if:
error > 0 for n patterns,
error = 0 for n-1 patterns, and
error >> 0 for n+1 patterns
62 Supervised Learning
Layers Not fully Connected

connections:
new
removed
remaining

partially connected (e.g. 75%)

remove links whose weight has stayed near 0 for
several training steps
build new connections (chosen at random)

63 Supervised Learning
Summary

Feed-forward network
Perceptron (has limits)
Learning is Math
Backpropagation is a "backpropagation of error"
algorithm
works like gradient descent
Activation functions: logistic, tanh
Application in Data Mining, Pattern Recognition
data preparation is important
Finding an appropriate Architecture

64 Supervised Learning
