>
>
>
>
[Printed named numeric vector whose column layout was lost in extraction: 210 entries, row indices between 136 and 671, values between 0.03846154 and 0.85714286.]
> str(perf1)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "False positive rate"
  ..@ y.name      : chr "True positive rate"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:18] 0 0 0.0446 0.0706 0.0855 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:18] 0 0.0153 0.1679 0.2443 0.2748 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:18] Inf 0.857 0.85 0.833 0.818 ...
> # AUC computation
> # Note that this is an approximate calculation
> mean(sample(fit2.prob[,2],1000,replace=T) > sample(fit2.prob[,1],1000,replace=T))
[1] 0.089
> mean(sample(fit2.prob[,1],1000,replace=T) > sample(fit2.prob[,2],1000,replace=T))
[1] 0.888
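The two calls above estimate a probability by comparing randomly drawn pairs. A minimal sketch of the same idea applied to AUC follows. It assumes hypothetical scores and 0/1 labels generated here for illustration only (they are not the `fit2.prob` values from the session). AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, so it can be computed exactly over all pairs or approximated by sampling.

```r
# Hypothetical data, invented for illustration: 40 positives, 60 negatives
set.seed(42)
labels <- rep(c(1, 0), times = c(40, 60))
scores <- c(rnorm(40, mean = 1), rnorm(60, mean = 0))

pos <- scores[labels == 1]
neg <- scores[labels == 0]

# Exact AUC: share of (positive, negative) pairs ranked correctly,
# counting ties as one half
diffs <- outer(pos, neg, "-")
auc_exact <- mean((diffs > 0) + 0.5 * (diffs == 0))

# Monte Carlo approximation: compare randomly drawn pairs
auc_mc <- mean(sample(pos, 10000, replace = TRUE) >
               sample(neg, 10000, replace = TRUE))

print(c(exact = auc_exact, monte_carlo = auc_mc))
```

With more draws the sampled estimate should move closer to the exact pairwise value; the exact form is usually preferable when the data set is small enough to enumerate all pairs.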
> #----------------------------------------------------------------------------
> data = read.csv("/Users/michelesummo/Desktop/GestioneCSVs/credit.csv")
> set.seed(12345)
> rand.seq = runif(1000)
> str(rand.seq)
num [1:1000] 0.721 0.876 0.761 0.886 0.456 ...
> rand.seq.ordering = order(rand.seq)
> str(rand.seq.ordering)
int [1:1000] 14 448 697 32 196 83 119 602 443 945 ...
> credit.rand = data[rand.seq.ordering]
Error in `[.data.frame`(data, rand.seq.ordering) :
  undefined columns selected
> credit.rand = data[rand.seq.ordering,]
> credit.train = credit.rand[1:900,]
> credit.test = credit.rand[901:1000,]
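The steps above shuffle the rows with a random ordering and then cut the shuffled data into a 900-row training set and a 100-row test set. A minimal sketch of that pattern, assuming a made-up data frame `toy` (hypothetical, standing in for the credit data, which is not available here):

```r
set.seed(12345)

# Hypothetical stand-in for the 1000-row credit data
toy <- data.frame(id = 1:1000, x = rnorm(1000))

# order(runif(n)) yields a random permutation of the row indices
shuffled <- toy[order(runif(nrow(toy))), ]

# The comma matters: toy[idx, ] selects rows, toy[idx] selects columns
train <- shuffled[1:900, ]
test  <- shuffled[901:1000, ]

print(dim(train))
print(dim(test))
```

Indexing without the trailing comma is what produced the "undefined columns selected" error in the session, because a data frame indexed with a single vector is treated as a list of columns.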
> library(C50)
> credit.model = C5.0(credit.train[,-17],credit.train$default)
> summary(credit.model)
Call:
C5.0.default(x = credit.train[, -17], y = credit.train$default)

C5.0 [Release 2.07 GPL Edition]
-------------------------------
Class specified by attribute `outcome'
:
:...checking_balance = < 0 DM: no (1)
:
checking_balance in {> 200 DM,1 - 200 DM}: yes (5/1)
credit_history in {critical,good,poor}:
:...months_loan_duration <= 11: no (87/14)
months_loan_duration > 11:
:...savings_balance = > 1000 DM: no (13)
savings_balance in {< 100 DM,100 - 500 DM,500 - 1000 DM,unknown}:
:...checking_balance = > 200 DM:
:...dependents > 1: yes (3)
: dependents <= 1:
: :...credit_history in {good,poor}: no (23/3)
:
credit_history = critical:
:
:...amount <= 2337: yes (3)
:
amount > 2337: no (6)
checking_balance = 1 - 200 DM:
:...savings_balance = unknown: no (34/6)
: savings_balance in {< 100 DM,100 - 500 DM,500 - 1000 DM}:
: :...months_loan_duration > 45: yes (11/1)
:
months_loan_duration <= 45:
:
:...other_credit = store:
:
:...age <= 35: yes (4)
:
: age > 35: no (2)
:
other_credit = bank:
:
:...years_at_residence <= 1: no (3)
:
: years_at_residence > 1:
:
: :...existing_loans_count <= 1: yes (5)
:
:
existing_loans_count > 1:
:
:
:...percent_of_income <= 2: no (4/1)
:
:
percent_of_income > 2: yes (3)
:
other_credit = none:
:
:...job = unemployed: no (1)
:
job = management:
:
:...amount <= 7511: no (10/3)
:
: amount > 7511: yes (7)
:
job = unskilled: [S1]
:
job = skilled:
:
:...dependents <= 1: no (55/15)
:
dependents > 1:
:
:...age <= 34: no (3)
:
age > 34: yes (4)
checking_balance = < 0 DM:
:...job = management: no (26/6)
job = unemployed: yes (4/1)
job = unskilled:
:...employment_duration in {4 - 7 years,
: :
unemployed}: no (4)
: employment_duration = < 1 year:
: :...other_credit = bank: no (1)
: : other_credit in {none,store}: yes (11/2)
: employment_duration = > 7 years:
: :...other_credit in {bank,none}: no (5/1)
: : other_credit = store: yes (2)
: employment_duration = 1 - 4 years:
: :...age <= 39: no (14/3)
:
age > 39:
:
:...credit_history in {critical,good}: yes (3)
:
credit_history = poor: no (1)
job = skilled:
:...credit_history = poor:
:...savings_balance in {< 100 DM,100 - 500 DM,
: :
500 - 1000 DM}: yes (8)
: savings_balance = unknown: no (1)
credit_history = critical:
:...other_credit = store: no (0)
: other_credit = bank: yes (4)
: other_credit = none:
: :...savings_balance in {100 - 500 DM,
:
:
unknown}: no (1)
:
savings_balance = 500 - 1000 DM: yes (1)
:
savings_balance = < 100 DM:
:
:...months_loan_duration <= 13:
:
:...percent_of_income <= 3: yes (3)
:
: percent_of_income > 3: no (3/1)
:
months_loan_duration > 13:
:
:...amount <= 5293: no (10/1)
:
amount > 5293: yes (2)
credit_history = good:
:...existing_loans_count > 1: yes (5)
existing_loans_count <= 1:
:...other_credit = store: no (2)
other_credit = bank:
:...percent_of_income <= 2: yes (2)
: percent_of_income > 2: no (6/1)
other_credit = none: [S2]
SubTree [S1]
employment_duration in {< 1 year,1 - 4 years}: yes (11/3)
employment_duration in {> 7 years,4 - 7 years,unemployed}: no (8)
SubTree [S2]
savings_balance = 100 - 500 DM: yes (3)
savings_balance = 500 - 1000 DM: no (1)
savings_balance = unknown:
:...phone = no: yes (9/1)
: phone = yes: no (3/1)
savings_balance = < 100 DM:
:...percent_of_income <= 1: no (4)
percent_of_income > 1:
:...phone = yes: yes (10/1)
phone = no:
:...purpose in {business,car0,education,renovations}: yes (3)
purpose = car:
:...percent_of_income <= 3: no (2)
: percent_of_income > 3: yes (6/1)
purpose = furniture/appliances:
:...years_at_residence <= 1: no (4)
years_at_residence > 1:
:...housing = other: no (1)
housing = rent: yes (2)
housing = own:
:...amount <= 1778: no (3)
amount > 1778:
:...years_at_residence <= 3: yes (6)
years_at_residence > 3: no (3/1)
Evaluation on training data (900 cases):

        Decision Tree
      ----------------
      Size      Errors

        66  125(13.9%)   <<

       (a)   (b)    <-classified as
      ----  ----
       609    23    (a): class no
       102   166    (b): class yes

    Attribute usage:

    100.00%  checking_balance
     60.22%  credit_history
     53.22%  months_loan_duration
     49.44%  savings_balance
     30.89%  job
     25.89%  other_credit
     17.78%  dependents
      9.67%  existing_loans_count
      7.22%  percent_of_income
      6.67%  employment_duration
      5.78%  phone
      5.56%  amount
      3.78%  years_at_residence
      3.44%  age
      3.33%  purpose
      1.67%  housing
  Column Total |        73 |        27 |       100 |
---------------|-----------|-----------|-----------|
> credit.model.rules = C5.0(credit.train[,-17], credit.train$default, rules=T)
> str(credit.model.rules)
List of 16
 $ names       : chr "| Generated using R version 3.0.2 (2013-09-25)\n| on Ven Gen 03 22:41:52 2014\n| function call: makeNamesFile(x = x, y = y, lab"| __truncated__
 $ cost        : chr ""
 $ costMatrix  : NULL
 $ caseWeights : logi FALSE
 $ control     :List of 11
  ..$ subset         : logi TRUE
  ..$ bands          : num 0
  ..$ winnow         : logi FALSE
  ..$ noGlobalPruning: logi FALSE
  ..$ CF             : num 0.25
  ..$ minCases       : num 2
  ..$ fuzzyThreshold : logi FALSE
  ..$ sample         : num 0
  ..$ earlyStopping  : logi TRUE
  ..$ label          : chr "outcome"
  ..$ seed           : int 3455
 $ trials      : Named num [1:2] 1 1
  ..- attr(*, "names")= chr [1:2] "Requested" "Actual"
 $ rbm         : logi TRUE
 $ boostResults: NULL
 $ size        : int 21
 $ dims        : int [1:2] 900 16
 $ call        : language C5.0.default(x = credit.train[, -17], y = credit.train$default, rules = T)
 $ levels      : chr [1:2] "no" "yes"
 $ output      : chr "\nC5.0 [Release 2.07 GPL Edition] \tFri Jan 3 22:41:52 2014\n-------------------------------\n\nClass specified by attribute "| __truncated__
 $ tree        : chr ""
 $ predictors  : chr [1:16] "checking_balance" "months_loan_duration" "credit_history" "purpose" ...
 $ rules       : chr "id=\"See5/C5.0 2.07 GPL Edition 2014-01-03\"\nentries=\"1\"\nrules=\"21\" default=\"no\"\nconds=\"3\" cover=\"15\" ok=\"15\" li"| __truncated__
 - attr(*, "class")= chr "C5.0"
> library(hmeasure)
>
> pred = predict(credit.model.rules, credit.test)
> misclassCounts(pred, credit.true)
Error in as.array(true.class) : object 'credit.true' not found
> misclassCounts(pred, credit.test$default)
$conf.matrix
         pred.1 pred.0
actual.1      0      0
actual.0      0      0

$metrics
   ER Sens Spec Precision Recall TPR FPR   F Youden
1 NaN  NaN  NaN       NaN    NaN NaN NaN NaN    NaN

> misclassCounts(pred, credit.test$Default)
Error in as.array.default(true.class) :
  attempt to set an attribute on NULL
> str(pred)
Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 2 ...
> str(credit.test$default)
Factor w/ 2 levels "no","yes": 1 2 1 1 2 1 1 2 2 2 ...
> pred.unclass = unclass(pred) - 1
> true.unclass = unclass(credit.test$default) - 1
> misclassCounts(pred.unclass, true.unclass)
$conf.matrix
         pred.1 pred.0
actual.1     15     17
actual.0      7     61

$metrics
    ER    Sens      Spec Precision  Recall     TPR       FPR         F    Youden
1 0.24 0.46875 0.8970588 0.6818182 0.46875 0.46875 0.1029412 0.5555556 0.3658088
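Judging from the session above, `misclassCounts` returned an all-zero confusion matrix when given factors and gave sensible counts once both vectors were recoded to 0/1. The sketch below recomputes a few of the same metrics in base R directly from factor labels, as a cross-check. The helper `confusion_metrics` and the toy vectors are hypothetical; the toy vectors are built only to reproduce the four counts shown in the output above (15, 17, 7, 61).

```r
# Hypothetical helper: basic metrics from predicted and true labels
confusion_metrics <- function(pred, truth, positive = "yes") {
  p <- as.integer(pred == positive)   # recode to 0/1, as in the session
  t <- as.integer(truth == positive)
  TP <- sum(p == 1 & t == 1)
  FN <- sum(p == 0 & t == 1)
  FP <- sum(p == 1 & t == 0)
  TN <- sum(p == 0 & t == 0)
  c(ER        = (FP + FN) / length(t),
    Sens      = TP / (TP + FN),
    Spec      = TN / (TN + FP),
    Precision = TP / (TP + FP))
}

# Toy labels reproducing the counts above: TP = 15, FN = 17, FP = 7, TN = 61
truth <- factor(rep(c("yes", "yes", "no", "no"), times = c(15, 17, 7, 61)),
                levels = c("no", "yes"))
pred  <- factor(rep(c("yes", "no", "yes", "no"), times = c(15, 17, 7, 61)),
                levels = c("no", "yes"))

m <- confusion_metrics(pred, truth)
print(round(m, 7))
```

The values should agree with the `ER`, `Sens`, `Spec` and `Precision` columns printed by `misclassCounts` for the recoded vectors.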
> #------------------------------------------------------------------------
> ?classAgreement
No documentation for classAgreement in specified packages and libraries:
you could try ??classAgreement
> library(class)
> ?class
> class(iris3)
[1] "array"
> ?iris3
> train = rbind(iris3[1:25,,1],iris3[1:25,,2],iris3[1:25,,3])
> str(iris3)
num [1:50, 1:4, 1:3] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 3
..$ : NULL
..$ : chr [1:4] "Sepal L." "Sepal W." "Petal L." "Petal W."
..$ : chr [1:3] "Setosa" "Versicolor" "Virginica"
> str(iris3[1:25,,1])
num [1:25, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "Sepal L." "Sepal W." "Petal L." "Petal W."
> iris3[1:10,,1]
      Sepal L. Sepal W. Petal L. Petal W.
 [1,]      5.1      3.5      1.4      0.2
 [2,]      4.9      3.0      1.4      0.2
 [3,]      4.7      3.2      1.3      0.2
 [4,]      4.6      3.1      1.5      0.2
 [5,]      5.0      3.6      1.4      0.2
 [6,]      5.4      3.9      1.7      0.4
 [7,]      4.6      3.4      1.4      0.3
 [8,]      5.0      3.4      1.5      0.2
 [9,]      4.4      2.9      1.4      0.2
[10,]      4.9      3.1      1.5      0.1
> iris3[1:10,,1:""]
Error in 1:"" : NA/NaN argument
> iris3[1:10,,1:2]
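, , Setosa

      Sepal L. Sepal W. Petal L. Petal W.
 [1,]      5.1      3.5      1.4      0.2
 [2,]      4.9      3.0      1.4      0.2
 [3,]      4.7      3.2      1.3      0.2
 [4,]      4.6      3.1      1.5      0.2
 [5,]      5.0      3.6      1.4      0.2
 [6,]      5.4      3.9      1.7      0.4
 [7,]      4.6      3.4      1.4      0.3
 [8,]      5.0      3.4      1.5      0.2
 [9,]      4.4      2.9      1.4      0.2
[10,]      4.9      3.1      1.5      0.1

, , Versicolor

      Sepal L. Sepal W. Petal L. Petal W.
 [1,]      7.0      3.2      4.7      1.4
 [2,]      6.4      3.2      4.5      1.5
 [3,]      6.9      3.1      4.9      1.5
 [4,]      5.5      2.3      4.0      1.3
 [5,]      6.5      2.8      4.6      1.5
 [6,]      5.7      2.8      4.5      1.3
 [7,]      6.3      3.3      4.7      1.6
 [8,]      4.9      2.4      3.3      1.0
 [9,]      6.6      2.9      4.6      1.3
[10,]      5.2      2.7      3.9      1.4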
> str(train)
num [1:75, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "Sepal L." "Sepal W." "Petal L." "Petal W."
> test = rbind(iris3[26:50,,1],iris3[26:50,,2],iris3[26:50,,3])
> dim(train)
[1] 75 4
> dim(test)
[1] 75 4
> ?factor
> cl = factor(c(rep("S",25),rep("C",25),rep("V",25)))
> ?rep
> str(cl)
Factor w/ 3 levels "C","S","V": 2 2 2 2 2 2 2 2 2 2 ...
> ?knn
> pred_k3 = knn(train, test, cl, k=3, prob=F)
> pred_k3
 [1] S S S S S S S S S S S S S S S S S S S S S S S S S C C V C C C C C V C C C C C
[40] C C C C C C C C C C C V C C V V V V V C V V V V C V V V V V V V V V V V
Levels: C S V
> table(cl, pred_k3)
   pred_k3
cl   C  S  V
  C 23  0  2
  S  0 25  0
  V  4  0 21
> pred_k1 = knn(train, test, cl, k=1, prob=F)
> pred_k1
 [1] S S S S S S S S S S S S S S S S S S S S S S S S S C C C C C C C C V C C C C C
[40] C C C C C C C C C C C V V C V V V V V C V V V V C V V V V V V V V V V V
Levels: C S V
> table(cl,pred_k1)
   pred_k1
cl   C  S  V
  C 24  0  1
  S  0 25  0
  V  3  0 22
> ?diag
> # Accuracy
> sum(diag(table(cl,pred_k1))) / sum(table(cl,pred_k1))
[1] 0.9466667
> accuracy.vector = numeric(20)
> accuracy.vector
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> for(i in 1:20) {}
> for(i in 1:20) {
+   temp.pred = knn(train, test, cl, k=i, prob=F)
+   accuracy.vector[i] = sum(diag(table(cl,temp.pred))) / sum(table(cl,temp.pred))
+ }
> ?ts
> plot(ts(accuracy.vector), lwd=2)
> plot(ts(accuracy.vector), lwd=2)
> which.max(accuracy.vector)
[1] 2
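The loop above scans k from 1 to 20 and keeps the accuracy for each value. A compact sketch of the same scan is shown below, rebuilding `train`, `test` and `cl` as in the session so that it runs on its own. The helper name `accuracy_for_k` is an assumption introduced here, and the seed is set because `knn` breaks ties at random, so results may differ slightly from the session.

```r
library(class)  # provides knn()

# Same split as in the session: first 25 rows of each species for training
train <- rbind(iris3[1:25, , 1], iris3[1:25, , 2], iris3[1:25, , 3])
test  <- rbind(iris3[26:50, , 1], iris3[26:50, , 2], iris3[26:50, , 3])
cl    <- factor(c(rep("S", 25), rep("C", 25), rep("V", 25)))

set.seed(1)  # knn() resolves ties randomly

# Hypothetical helper: share of test rows classified correctly for a given k
accuracy_for_k <- function(k) mean(knn(train, test, cl, k = k) == cl)

accuracy.vector <- vapply(1:20, accuracy_for_k, numeric(1))
best_k <- which.max(accuracy.vector)  # first k reaching the maximum

print(round(accuracy.vector, 4))
print(best_k)
```

Since k is chosen on the same test set that is used to report accuracy, the best value is likely to be somewhat optimistic; a separate validation split or cross-validation would give a fairer estimate.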
> #------------------GOING BACK---------------------------------------
> ts
function (data = NA, start = 1, end = numeric(), frequency = 1,
    deltat = 1, ts.eps = getOption("ts.eps"), class = if (nseries >
        1) c("mts", "ts", "matrix") else "ts", names = if (!is.null(dimnames(data))) colnames(data) else paste("Series",
        seq(nseries)))
{
    if (is.data.frame(data))
        data <- data.matrix(data)
    if (is.matrix(data)) {
        nseries <- ncol(data)
        ndata <- nrow(data)
        dimnames(data) <- list(NULL, names)
    }
    else {
        nseries <- 1
        ndata <- length(data)
    }
    if (ndata == 0)
        stop("'ts' object must have one or more observations")
    if (missing(frequency))
        frequency <- 1/deltat
    else if (missing(deltat))
        deltat <- 1/frequency
    if (frequency > 1 && abs(frequency - round(frequency)) <
        ts.eps)
        frequency <- round(frequency)
    if (length(start) > 1L) {
        start <- start[1L] + (start[2L] - 1)/frequency
    }
    if (length(end) > 1L) {
        end <- end[1L] + (end[2L] - 1)/frequency
    }
    if (missing(end))
        end <- start + (ndata - 1)/frequency
    else if (missing(start))
        start <- end - (ndata - 1)/frequency
    if (start > end)
        stop("'start' cannot be after 'end'")
[Entries 1 to 366 of a printed numeric vector were lost in extraction; only the entries from 367 onward survive.]
[367] 1.000000e+00 9.999999e-01 1.000000e+00 3.898518e-06 9.999998e-01 1.000000e+00
[373] 1.351664e-05 2.015980e-01 1.040805e-07 1.385894e-06 1.000000e+00 8.304392e-05
[379] 1.060101e-07 1.060101e-07 1.000000e+00 9.999526e-01 1.141910e-04 9.999997e-01
[385] 2.080808e-04 9.116228e-03 9.999523e-01 9.999759e-01 3.524888e-06 1.000000e+00
[391] 1.533452e-01 1.000000e+00 4.328582e-06 9.811857e-01 8.691326e-01 9.999998e-01
[397] 9.999954e-01 2.139179e-01 9.997807e-01 4.562277e-08 1.073977e-05 6.795599e-08
[403] 9.760270e-01 3.088151e-08 4.754954e-07 1.060101e-07 9.468313e-01 9.742596e-08
[409] 9.999982e-01 1.506409e-08 6.856126e-07 1.000000e+00 7.531440e-08 1.545591e-01
[415] 1.000000e+00 9.999999e-01 1.781027e-05 1.000000e+00 1.000000e+00 1.000000e+00
[421] 1.849641e-01 9.999972e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
[427] 1.000000e+00 1.995846e-05 1.000000e+00 9.999997e-01 8.245870e-05 1.000000e+00
[433] 6.792897e-07 5.122147e-06 2.615216e-08
> predB.priori = predict( model.nb.laplaceB, HouseVotes84, type='raw')[,1]
> prioriBoth = cbind(predA.priori, predB.priori)
> temp = HMeasure(HouseVotes84$Class, predBoth)
Class labels have been switched from (democrat,republican) to (0,1)
Error in rownames(scores) <- NULL : object 'predBoth' not found
> temp = HMeasure(HouseVotes84$Class, prioriBoth)
Class labels have been switched from (democrat,republican) to (0,1)
Warning messages:
1: In HMeasure.single(y = true.class, s = s, classifier.name = name.now, :
ROC curve of predA.priori mostly lying under the diagonal. Switching scores.
2: In HMeasure.single(y = true.class, s = s, classifier.name = name.now, :
ROC curve of predB.priori mostly lying under the diagonal. Switching scores.
> temp$Measures
NULL
> temp$Measure
NULL
> temp$measures
NULL
> temp$metrics
                     H      Gini       AUC      AUCH        KS        MER
predA.priori 0.7847836 0.9474764 0.9737382 0.9777956 0.8406902 0.08965517
predB.priori 0.7841794 0.9465400 0.9732700 0.9777622 0.8406902 0.08965517
                    MWL Spec.Sens95 Sens.Spec95         ER      Sens      Spec
predA.priori 0.07552913   0.8838951   0.8273810 0.09655172 0.9226190 0.8913858
predB.priori 0.07552913   0.8801498   0.8214286 0.09655172 0.9285714 0.8876404
             Precision    Recall       TPR       FPR         F    Youden  TP FP
predA.priori 0.8423913 0.9226190 0.9226190 0.1086142 0.8806818 0.8140048 155 29
predB.priori 0.8387097 0.9285714 0.9285714 0.1123596 0.8813559 0.8162119 156 30
              TN FN
predA.priori 238 13
predB.priori 237 12
> plotROC(temp, lwd=2)
Error in plotROC(temp, lwd = 2) : unused argument (lwd = 2)
> plotROC(temp)
> # Exercise
> nrow(HouseVotes84)
[1] 435
> ?sample
> set.seed(1234)
> randList = sample(nrow(HouseVotes84),round(nrow(HouseVotes84)*0.75), replace=T)
> # Sampling is done with replacement so that each draw
> # is probabilistically independent of the others
> randList
  [1]  50 271 266 272 375 279   5 102 290 224 302 238 123 402 128 365 125 117  82
 [20] 102 138 132  70  18  96 353 229 398 362  20 199 116 133 221  79 331  88 113
 [39] 432 352 241 282 136 271 144 219 295 211 107 333  33 135 313 220  67 220 215
 [58] 327  76 370 377  19 138   6 104 308 135 222  23 246  53 389   7 341  40 226
 [77] 168  31 140 291 403 206  63 237  86 391 170 136  70 390  73 392  59  58  46
 [96] 223 131  12 135 323  16 246 122  89  59 142  68  57 190  17 311  44 414  53
[115]  96 398 412 122  54 347 324 399 433 410 212 124 110 219 217 139 419 276  56
[134] 185 398 204 396 261 275 379 219 428 142 210 156 273 323 247 427 251 191 100
[153]  36 370 103 430 262 435 164 242 187 251 189  98  37 278 188  32 350 142 330
[172] 255 309 186 150 331 185 244  51 132 209 151 262  34 416  10 367 276 135 324
[191] 278 432  56 385 353 358 364 319 428 279 288 230 139 335 229 319 134 176  89
[210] 429 247 122  81 330 247 406 278 305 209 370 184  14 113 146  59 218 349 147
[229] 222 216 347 247  47 352 247  93 327 134 213 431 185 107  95 300 427 208 337
[248] 250 421 347 232 260 115 122  29 245 115   2 257 227 368  13 261 117  53  44
[267] 326   7  22 326 156 331 164 348  12 221 358 237 117 150 161 187 400 342 321
[286] 123 199 126 303 358 285 180 415 106 265 330 302  51 277 135 154 427 235 194
[305] 413 197  83 432 239 335 398 297 178 178  64  86  84 178 152 364  87 375 173
[324]  67 148 160
> randList = sample(nrow(HouseVotes84),round(nrow(HouseVotes84)*0.75))
> randList
  [1] 186  81 285 398 317 380 409  84 202 165 160  12 394 174 403 115 217 429 155
 [20] 130  15 277 381  19  83 305  54 290 407 384 241 296 197 309   2 224 184 132
 [39] 332 388 261  93 322 433 382 104 342 190 119 153 293  39 162 252 113  79   1
 [58]  41  82  44 258  73 368 353 257 274 271 405  62 338 207 196   7 133 248 150
 [77] 272 278 205  68 270  33 220 147 273 283 366  76 301 145 177 267  46 141 226
 [96] 402 117  36 315 356 412 143 413  92 372 259 346 306  90 215 124 304 238 191
[115] 262 350 323 425 314 176 358 198 192  11  80 373 340 404 231 209 348 185 422
[134] 415 333 269 118 417  53  58 383 116 399 168 378 428 379 276 420 280 146 362
[153] 357 291 148 157 324 390  38  21 210 193 328  43 163 365 246 411 316 327 161
[172]  23 318 286 175 351 222  13 303 419 393 211 354 201 266 319 232 352  28 370
[191] 281  22 263 434 427  60  87 221 134 255  10 235 245  56  59 144 423 183  27
[210] 225 361 341 167 287  85 135 212  48 369  94 179 396 260 194  55  47 166  35
[229] 371 391 102   9 127 292  32 199 136 172 218 170 298 387 282 424 120 386 414
[248] 126 410  72 313  78 101 149 152   3 103 418 363  64 275 364 426 106 173 360
[267] 288 228 240 347 431 330 320  20 343 247 243  89 336  61 131 253 385 156 406
[286] 121 265 279 140 249 421  96 112  69 432 345 216 114 395 203  66 200  42 311
[305] 339 430  57 435 236 171 349 105 206 284 321 264 181 110 125 122 400 310 308
[324]  91 227 187
> sample(2,round(nrow(HouseVotes84)*0.75))
Error in sample.int(x, size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'
> # If replace == FALSE, each number is drawn without being put back,
> # i.e. sampling without replacement
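The difference between the two sampling modes can be checked directly. The sketch below assumes a population size of 435 (the row count reported for `HouseVotes84` above) and uses only base R; the variable names are illustrative.

```r
set.seed(1234)
n <- 435                 # population size, as reported by nrow(HouseVotes84)
k <- round(n * 0.75)     # number of draws

with_repl    <- sample(n, k, replace = TRUE)  # indices may repeat
without_repl <- sample(n, k)                  # every index appears at most once

# anyDuplicated() returns 0 when no value repeats
print(anyDuplicated(with_repl) > 0)
print(anyDuplicated(without_repl) == 0)

# Asking for more draws than the population holds only works with replacement
res <- tryCatch(sample(2, k), error = function(e) "error")
print(res)
```

For a train/test split the draw without replacement is the one that matters, since `HouseVotes84[-randList, ]` is only a clean complement of the training rows when no index is repeated.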
> train = HouseVotes84[randList,]
> dim(train)
[1] 326 17
> test = HouseVotes84[-randList,]
> model.nb.laplace = naiveBayes(Class~., data=train, laplace=1)
> model.nb = naiveBayes(Class~., data=train)
> predL = predict(model.nb.laplace, test)
> pred = predict(model.nb, test)
> table(predL, test$Class)
predL        democrat republican
  democrat         65          2
  republican        9         33
> table(pred, test$Class)
pred         democrat republican
  democrat         65          2
  republican        9         33
> predL.post = predict(model.nb, test, type='raw')[,2]
> predL.post = predict(model.nb.laplace, test, type='raw')[,2]
> pred.post = predict(model.nb, test, type='raw')[,2]
> predBoth = cbind(pred.post, predL.post)
> HMeasure(test$Class, predBoth)
Class labels have been switched from (democrat,republican) to (0,1)
$metrics
                   H      Gini       AUC      AUCH        KS        MER        MWL
pred.post  0.7762976 0.9212355 0.9606178 0.9714286 0.8498069 0.08256881 0.06548270
predL.post 0.7672269 0.9204633 0.9602317 0.9704633 0.8362934 0.08256881 0.07137446
           Spec.Sens95 Sens.Spec95        ER      Sens      Spec Precision
pred.post    0.8783784   0.7714286 0.1009174 0.9428571 0.8783784 0.7857143
predL.post   0.8648649   0.7714286 0.1009174 0.9428571 0.8783784 0.7857143
              Recall       TPR       FPR         F    Youden TP FP TN FN
pred.post  0.9428571 0.9428571 0.1216216 0.8571429 0.8212355 33  9 65  2
predL.post 0.9428571 0.9428571 0.1216216 0.8571429 0.8212355 33  9 65  2
[85] 0.48571429 0.51428571 0.54285714 0.57142857 0.60000000 0.71428571 0.74285714
[92] 0.74285714 0.77142857 0.82857143 0.85714286 0.88571429 0.94285714 1.00000000
[99] 1.00000000
attr(,"data")[[1]]$G0
[1] 0.00000000 0.00000000 0.01351351 0.04054054 0.05405405 0.12162162 0.32432432
[8] 1.00000000
attr(,"data")[[1]]$G1
[1] 0.0000000 0.2571429 0.5428571 0.7714286 0.8571429 0.9714286 1.0000000 1.0000000
attr(,"data")[[1]]$cost
[1] 0.0000000 1.0000000 0.9090909 0.8000000 0.7500000 0.4444444 0.0625000 0.0000000
[9] 1.0000000
attr(,"data")[[1]]$pi1
[1] 0.3211009
attr(,"data")[[1]]$pi0
[1] 0.6788991
attr(,"data")[[1]]$n0
[1] 74
attr(,"data")[[1]]$n1
[1] 35
attr(,"data")[[1]]$n
[1] 109
attr(,"data")[[1]]$hc
[1] 8
attr(,"data")[[1]]$s.class0
 [1] 4.525125e-13 4.525125e-13 4.797781e-13 7.320922e-13 2.353211e-12 2.538304e-12
 [7] 2.538304e-12 3.890551e-12 3.981558e-12 3.998903e-12 4.106566e-12 4.106566e-12
[13] 4.410442e-12 4.410442e-12 4.410442e-12 6.208715e-12 6.575199e-12 9.505316e-12
[19] 1.194853e-11 2.243127e-11 2.243127e-11 3.482690e-11 4.050324e-11 4.089769e-11
[25] 4.289404e-11 4.479269e-11 5.331871e-11 6.240569e-11 9.287757e-11 1.664516e-10
[31] 2.325143e-10 3.366018e-10 5.209834e-10 5.321965e-10 5.433921e-10 5.973297e-10
[37] 6.948256e-10 9.688177e-10 1.405080e-09 1.790492e-09 2.117662e-09 3.718155e-09
[43] 7.238779e-09 1.149933e-08 1.315558e-08 2.526723e-08 3.708444e-08 4.065557e-08
[49] 8.021760e-08 1.403882e-07 5.827140e-07 6.235422e-07 8.439995e-07 9.584138e-07
[55] 1.689265e-06 1.857594e-06 2.285757e-06 1.084362e-05 7.030967e-05 2.889456e-04
[61] 3.405700e-04 1.281181e-03 3.005914e-02 3.577871e-02 2.012428e-01 8.722791e-01
[67] 9.666022e-01 9.780211e-01 9.829004e-01 9.942232e-01 9.993197e-01 9.999979e-01
[73] 9.999992e-01 1.000000e+00
attr(,"data")[[1]]$s.class1
 [1] 3.172690e-07 3.002413e-01 8.159505e-01 9.033260e-01 9.460419e-01 9.989276e-01
 [7] 9.990692e-01 9.991937e-01 9.999690e-01 9.999746e-01 9.999922e-01 9.999931e-01
[13] 9.999948e-01 9.999955e-01 9.999989e-01 9.999991e-01 9.999994e-01 9.999996e-01
[19] 9.999996e-01 9.999998e-01 9.999999e-01 9.999999e-01 9.999999e-01 9.999999e-01
[25] 9.999999e-01 9.999999e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
[31] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
attr(,"data")[[1]]$severity.ratio
[1] 0.472973
attr(,"data")[[2]]
attr(,"data")[[2]]$F0
[1] 0.00000000 0.02702703
1
[8] 0.12162162 0.13513514
4
[15] 0.25675676 0.28378378
5
[22] 0.36486486 0.37837838
5
[29] 0.45945946 0.47297297
4
[36] 0.55405405 0.56756757
4
[43] 0.64864865 0.66216216
2
[50] 0.72972973 0.74324324
1
[57] 0.82432432 0.83783784
8
[64] 0.89189189 0.89189189
5
[71] 0.94594595 0.94594595
6
[78] 0.95945946 0.95945946
9
[85] 0.98648649 0.98648649
9
[92] 1.00000000 1.00000000
0
[99] 1.00000000
attr(,"data")[[2]]$F1
 [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [8] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[15] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[22]
0
[29]
0
[36]
0
[43]
3
[50]
3
[57]
9
[64]
4
[71]
1
[78]
6
[85]
4
[92]
0
[99]
attr(,"data")[[2]]$G0
[1] 0.00000000 0.00000000 0.01351351 0.04054054 0.05405405 0.13513514 0.32432432
[8] 1.00000000
attr(,"data")[[2]]$G1
[1] 0.0000000 0.2571429 0.5428571 0.7714286 0.8571429 0.9714286 1.0000000 1.0000000
attr(,"data")[[2]]$cost
[1] 0.00000000 1.00000000 0.90909091 0.80000000 0.75000000 0.40000000 0.06666667
[8] 0.00000000 1.00000000
attr(,"data")[[2]]$pi1
[1] 0.3211009
attr(,"data")[[2]]$pi0
[1] 0.6788991
attr(,"data")[[2]]$n0
[1] 74
attr(,"data")[[2]]$n1
[1] 35
attr(,"data")[[2]]$n
[1] 109
attr(,"data")[[2]]$hc
[1] 8
attr(,"data")[[2]]$s.class0
 [1] 2.338869e-12 2.338869e-12 2.477128e-12 3.764024e-12 1.165847e-11 1.255347e-11
 [7] 1.255347e-11 1.915777e-11 1.928659e-11 1.985773e-11 2.020274e-11 2.020274e-11
[13] 2.163481e-11 2.163481e-11 2.163481e-11 3.064158e-11 3.164844e-11 4.560274e-11
[19] 5.799568e-11 1.065829e-10 1.065829e-10 1.644633e-10 1.903462e-10 1.912963e-10
[25] 1.966008e-10 2.060753e-10 2.447647e-10 2.906770e-10 3.176506e-10 7.595068e-10
[31] 1.062645e-09 1.496789e-09 1.704934e-09 2.289815e-09 2.326324e-09 2.329976e-09
[37] 2.556589e-09 3.098267e-09 5.994529e-09 7.519696e-09 8.922027e-09 1.564241e-08
[43] 3.056522e-08 3.991274e-08 4.626319e-08 1.029253e-07 1.198746e-07 1.434890e-07
[49] 3.123092e-07 3.968584e-07 1.628488e-06 1.710560e-06 2.293669e-06 2.571179e-06
[55] 4.585425e-06 6.496091e-06 8.132989e-06 2.811818e-05 1.753507e-04 7.893108e-04
[61] 8.899171e-04 3.838173e-03 5.997919e-02 7.178916e-02 3.368645e-01 8.746494e-01
[67] 9.667372e-01 9.876162e-01 9.903742e-01 9.966558e-01 9.995828e-01 9.999973e-01
[73] 9.999989e-01 9.999999e-01
attr(,"data")[[2]]$s.class1
 [1] 8.935902e-07 3.167905e-01 8.125632e-01 9.046617e-01 9.453324e-01 9.987978e-01
 [7] 9.989291e-01 9.991017e-01 9.999700e-01 9.999730e-01 9.999902e-01 9.999915e-01
[13] 9.999935e-01 9.999944e-01 9.999986e-01 9.999988e-01 9.999992e-01 9.999995e-01
[19] 9.999995e-01 9.999997e-01 9.999999e-01 9.999999e-01 9.999999e-01 9.999999e-01
[25] 9.999999e-01 9.999999e-01 9.999999e-01 1.000000e+00 1.000000e+00 1.000000e+00
[31] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
attr(,"data")[[2]]$severity.ratio
[1] 0.472973
attr(,"class")
[1] "hmeasure"
> measure = HMeasure(test$Class, predBoth)
Class labels have been switched from (democrat,republican) to (0,1)
> measure$metrics
                   H      Gini       AUC      AUCH        KS        MER        MWL
pred.post  0.7762976 0.9212355 0.9606178 0.9714286 0.8498069 0.08256881 0.06548270
predL.post 0.7672269 0.9204633 0.9602317 0.9704633 0.8362934 0.08256881 0.07137446
           Spec.Sens95 Sens.Spec95        ER      Sens      Spec Precision
pred.post    0.8783784   0.7714286 0.1009174 0.9428571 0.8783784 0.7857143
predL.post   0.8648649   0.7714286 0.1009174 0.9428571 0.8783784 0.7857143
              Recall       TPR       FPR         F    Youden TP FP TN FN
pred.post  0.9428571 0.9428571 0.1216216 0.8571429 0.8212355 33  9 65  2
predL.post 0.9428571 0.9428571 0.1216216 0.8571429 0.8212355 33  9 65  2
> plotROC(measure)
> ?plotROC
> function accuracy(table) {
Error: unexpected symbol in "function accuracy"
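The error comes from the syntax: R has no `function name(args)` declaration form, so a function is created as an anonymous `function(...)` expression and then assigned to a name. A minimal sketch of an accuracy helper written that way follows. The argument is named `tab` here (an assumption, chosen to avoid shadowing the built-in `table`), and the example matrix copies the counts of the k = 1 confusion table from earlier in the session.

```r
# Accuracy of a square confusion table: correct predictions over all predictions
accuracy <- function(tab) {
  sum(diag(tab)) / sum(tab)
}

# Counts copied from the table(cl, pred_k1) output earlier in the session
tab <- matrix(c(24,  0,  1,
                 0, 25,  0,
                 3,  0, 22),
              nrow = 3, byrow = TRUE,
              dimnames = list(cl = c("C", "S", "V"),
                              pred_k1 = c("C", "S", "V")))

print(accuracy(tab))
```

The result should match the value obtained earlier with `sum(diag(table(cl,pred_k1))) / sum(table(cl,pred_k1))`.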