Você está na página 1de 9

Support Vector Machine (2 classes non-linear boundary)

YIK LUN, KEI


allen29@ucla.edu
This paper is a lab from the book called An Introduction to Statistical Learning
with Applications in R. All R codes and comments below are belonged to the
book and authors.

SVM using a non-linear kernel

0
2
4

x[,2]

library(e1071)
set.seed(1)
x=matrix(rnorm(200*2),ncol=2)
x[1:100,]=x[1:100,]+2
x[101:150,]= x[101:150,] -2
y=c(rep(1,150),rep(2,50) )
dat=data.frame(x=x,y=as.factor(y))
plot(x, col=y)

0
x[,1]

train=sample(200,100)
svmfit =svm(y~.,data=dat[train,],kernel="radial",gamma =1,cost =1)
summary(svmfit)
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##

Call:
svm(formula = y ~ ., data = dat[train, ], kernel = "radial",
gamma = 1, cost = 1)
Parameters:
SVM-Type:
SVM-Kernel:
cost:
gamma:

C-classification
radial
1
1

Number of Support Vectors:

37

( 17 20 )
Number of Classes:

Levels:
1 2

plot(svmfit,dat[train,])

SVM classification plot


x
o
o o
oo o
oo
o oo o
xx
o
oo
x
x x
oo
o
x
xooxo ooo o o
x
x
xx o
x
xx
o
x
x
xx
oo
x
x
x
x
x x
xo
o
x oo o o
x
x o
x
x
x
oo o
o ooo o
o
x x
o oo o o

x.1

1
0
1
2
3 x

o
4

ox

oo

o o
0

x.2

svmfit =svm(y~., data=dat[train,],kernel="radial",gamma =1,cost=1e5)


plot(svmfit,dat[train,]) #irregular decision boundary will cause overfitting

SVM classification plot


o
o
o o
oo o
oo
o oo o
xx
o
oo
o
x x
oo
o
x
xooxo x oo o o
x
o x
xo o
xx
o
x
x
xx
oo
o
x
x
x
x x
x xo
o
o oo o o
x
o o
x
o
o
oo o
o ooo o
o
o o
o oo o o

x.1

1
0
1
2
3 o

o
4

o o

x.2

Cross-Validation (cost=1 and gamma=2)


set.seed(1)
tune.out=tune(svm , y~., data=dat[train ,], kernel ="radial",
ranges =list(cost=c(0.1 ,1 ,10 ,100 ,1000),
gamma=c(0.5,1,2,3,4,50) ))
summary(tune.out)
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##

oo

oo

Parameter tuning of 'svm':


- sampling method: 10-fold cross validation
- best parameters:
cost gamma
1
2
- best performance: 0.12
- Detailed performance results:
cost gamma error dispersion
1 1e-01
0.5 0.27 0.11595018
2 1e+00
0.5 0.13 0.08232726
4

##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##

3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

1e+01
1e+02
1e+03
1e-01
1e+00
1e+01
1e+02
1e+03
1e-01
1e+00
1e+01
1e+02
1e+03
1e-01
1e+00
1e+01
1e+02
1e+03
1e-01
1e+00
1e+01
1e+02
1e+03
1e-01
1e+00
1e+01
1e+02
1e+03

0.5
0.5
0.5
1.0
1.0
1.0
1.0
1.0
2.0
2.0
2.0
2.0
2.0
3.0
3.0
3.0
3.0
3.0
4.0
4.0
4.0
4.0
4.0
50.0
50.0
50.0
50.0
50.0

0.15
0.17
0.21
0.25
0.13
0.16
0.20
0.20
0.25
0.12
0.17
0.19
0.20
0.27
0.13
0.18
0.21
0.22
0.27
0.15
0.18
0.21
0.24
0.27
0.22
0.23
0.23
0.23

0.07071068
0.08232726
0.09944289
0.13540064
0.08232726
0.06992059
0.09428090
0.08164966
0.12692955
0.09189366
0.09486833
0.09944289
0.09428090
0.11595018
0.09486833
0.10327956
0.08755950
0.10327956
0.11595018
0.10801234
0.11352924
0.08755950
0.10749677
0.11595018
0.11352924
0.12516656
0.12516656
0.12516656

Predict
pred=predict(tune.out$best.model,newx=dat[-train ,])
table(true=dat[-train ,"y"], pred)
##
pred
## true 1 2
##
1 56 21
##
2 18 5

ROC Curve function


library (ROCR)
## Loading required package: gplots
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
5

##
##

lowess

rocplot =function(pred , truth , ...){


predob = prediction(pred,truth)
perf = performance (predob , "tpr", "fpr")
plot(perf ,...)}

ROC
svmfit.opt=svm(y~., data=dat[train,], kernel="radial",
gamma =2, cost=1, decision.values =T)
fitted=attributes(predict(svmfit.opt,dat[train,],
decision.values =TRUE))$decision.values
rocplot(fitted ,dat[train,"y"], main="Training Data")

0.8
0.6
0.4
0.2
0.0

True positive rate

1.0

Training Data

0.0

0.2

0.4

0.6

False positive rate

svmfit.flex=svm (y~., data=dat[train ,], kernel ="radial",


gamma =50, cost=1, decision.values =T)
fitted=attributes(predict(svmfit.flex,dat[train,],
decision.values =T))$decision.values
rocplot(fitted ,dat[train ,"y"],main="Training Data")

0.8

1.0

0.8
0.6
0.4
0.2
0.0

True positive rate

1.0

Training Data

0.0

0.2

0.4

0.6

0.8

False positive rate

ROC on testing data (gamma =2 is more accurate)


fitted =attributes(predict(svmfit.opt,dat[-train,],
decision.values =T))$decision.values
rocplot (fitted ,dat [-train ,"y"], main ="Test Data")

1.0

0.8
0.6
0.4
0.2
0.0

True positive rate

1.0

Test Data

0.0

0.2

0.4

0.6

False positive rate

fitted =attributes(predict(svmfit.flex ,dat[-train ,],


decision.values =T))$decision.values
rocplot (fitted ,dat [-train ,"y"], main ="Test Data")

0.8

1.0

0.8
0.6
0.4
0.2
0.0

True positive rate

1.0

Test Data

0.0

0.2

0.4

0.6

0.8

1.0

False positive rate

Reference:
James, Gareth, et al. An introduction to statistical learning. New
York: springer, 2013.