Support Vector Machine as a Classifier

The approach is illustrated with sample data from a telecom company. The dataset contains usage and demographic information for each customer, along with a flag indicating whether the customer churned. A support vector machine is used to build a classification model that identifies customer churn behavior. The following steps were used to build the model.

Tool: R
Package: e1071
Dataset: Churn.csv
1) The most common challenge faced while building a classification model is a low event rate, which makes it hard for any algorithm to train the model. To create a biased (oversampled) training set, a custom sampling function is used: pass the desired biased event percentage and the training percentage, and it returns a biased training dataset together with a test dataset that keeps the original event rate.

sampling <- function(trnpct = .5,   # fraction of data used for training
                     bsdpct = .15,  # target event rate in the biased training set
                     seed = 123)
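The logic behind such a sampling function can be sketched as follows. This is a minimal illustration in Python rather than the author's R code, and the helper name `biased_split` is hypothetical: the majority class in the training split is downsampled until events make up roughly the requested rate, while the test split is left at the original event rate.

```python
import random

def biased_split(rows, labels, trnpct=0.5, bsdpct=0.15, seed=123):
    """Split rows into a biased training set and an unbiased test set.

    Hypothetical sketch, not the tutorial's R function. The training
    split is downsampled on the majority class (label == 0) so that
    events (label == 1) make up roughly `bsdpct` of it; the test split
    keeps the original event rate.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * trnpct)
    train_idx, test_idx = idx[:cut], idx[cut:]

    events = [i for i in train_idx if labels[i] == 1]
    nonevents = [i for i in train_idx if labels[i] == 0]
    # Keep only enough non-events so events form ~bsdpct of the training data.
    keep = int(len(events) * (1 - bsdpct) / bsdpct)
    train_idx = events + nonevents[:keep]
    rng.shuffle(train_idx)

    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    return train, test
```

With a 5% event rate in the raw data, the returned training set sits near the requested 15% event rate while the test set retains the original distribution.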
2) The next important step in building an SVM is feature selection. The e1071 package provides wrapper functions to identify the most important variables. The most widely used algorithm for feature selection is Random Forest: it calculates the decrease in Gini impurity at each node and orders the variables by the mean decrease. Pass the model formula and the training dataset:

tunerf <- tune.randomForest(churn ~ ., data = data[[1]])
Variable          MeanDecreaseGini
Account_Length           14.353478
Int_l_Plan               30.177718
VMail_Plan                5.217211
VMail_Message            13.145090
Day_Mins                 56.048435
Day_Calls                15.106658
Day_Charge               61.052403
Eve_Mins                 25.277152
Eve_Calls                12.220649
Eve_Charge               24.576259
Night_Mins               16.589489
Night_Calls              12.827148
Night_Charge             16.169401
Intl_Mins                15.826774
Intl_Calls               17.165469
Intl_Charge              16.065042
CustServ_Calls           66.314815
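For context on what a MeanDecreaseGini value measures, the quantity the random forest accumulates at each split is the weighted drop in Gini impurity from parent node to child nodes. A minimal sketch of that calculation (Python, with hypothetical helper names; not part of the e1071 output):

```python
def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(parent, left, right):
    """Decrease in Gini impurity from splitting `parent` into
    `left` and `right` child nodes, weighted by child size."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))
```

A split that cleanly separates the classes, e.g. parent [1,1,0,0,0,0] into pure children [1,1] and [0,0,0,0], yields a decrease equal to the parent impurity (4/9 here); averaging these decreases over all splits on a variable, across all trees, gives its MeanDecreaseGini.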

3) SVM has many parameters to play with while building a model. To identify the best set of parameters, the steps are:
a. Build the model with both classification types (C-classification and nu-classification), which use different error functions during optimization.
b. Build the model using all four kinds of kernel functions (linear, polynomial, radial and sigmoid).
c. Pass ranges of values for cost, gamma and epsilon.
d. The following call iterates through all the ranges and picks the best set of parameters (cost, gamma and epsilon):

tunepara[[k]] <- tune(svm, train.x = formula, data = data[[1]],
                      ranges = list(cost = ranges[[1]], gamma = ranges[[2]],
                                    epsilon = ranges[[3]]),
                      kernel = kernel[j], type = type[i])
Parameter tuning of 'svm':
- sampling method: 10-fold cross validation
- best parameters:
    cost gamma epsilon
    0.25  0.25       0
- best performance: 0.2326566
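The grid search that tune() performs can be pictured as nested loops over the parameter ranges, keeping the combination with the lowest cross-validated error. A minimal Python sketch, where `cv_error` is a hypothetical stand-in for the real 10-fold cross-validation score:

```python
import itertools

def grid_search(cv_error, costs, gammas, epsilons):
    """Evaluate every (cost, gamma, epsilon) combination with the
    supplied error function and return the best one.

    `cv_error` stands in for a cross-validated misclassification
    rate; in the tutorial this role is played by e1071's tune().
    """
    best_params, best_err = None, float("inf")
    for c, g, e in itertools.product(costs, gammas, epsilons):
        err = cv_error(cost=c, gamma=g, epsilon=e)
        if err < best_err:
            best_params, best_err = (c, g, e), err
    return best_params, best_err
```

With a toy error surface that is minimized at cost = 0.25, gamma = 0.25, epsilon = 0, the search recovers exactly the kind of "best parameters" triple shown in the tuning output above.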

4) Build an SVM model for each type and kernel combination, using the optimized set of parameters:

model[[k]] <- svm(formula, data = train, type = type[i], kernel = kernel[j],
                  cost = tunepara[[k]][[1]]$cost,
                  gamma = tunepara[[k]][[1]]$gamma,
                  epsilon = tunepara[[k]][[1]]$epsilon)
Call:
svm(formula = formula, data = data[[1]], type = type[i], kernel = kernel[j],
    cost = tunepara[[k]][[1]]$cost, gamma = tunepara[[k]][[1]]$gamma,
    epsilon = tunepara[[k]][[1]]$epsilon)

Parameters:
   SVM-Type: nu-classification
 SVM-Kernel: linear
      gamma: 0.25
         nu: 0.5

Number of Support Vectors: 547

5) To identify the best model among the 8 models created (2 classification types x 4 kernels), sensitivity, specificity and accuracy are the most widely used metrics for SVM:

mresult$sensitivity <- round(mresult$TP/(mresult$TP + mresult$FN), 2)
mresult$specificity <- round(mresult$TN/(mresult$TN + mresult$FP), 2)
mresult$accuracy <- round((mresult$TP + mresult$TN)/
                          (mresult$TP + mresult$TN + mresult$FP + mresult$FN), 2)
