
Wrapper
Wrappers can be employed to extend integrated learners (http://www.rdocumentation.org/packages/mlr/functions/makeLearner.html) with new functionality. The broad scope of operations and methods which are implemented as wrappers underlines the flexibility of the wrapping approach:

Data preprocessing (../preproc/index.html)
Imputation (../impute/index.html)
Bagging (../bagging/index.html)
Tuning (../tune/index.html)
Feature selection (../feature_selection/index.html)
Cost-sensitive classification (../cost_sensitive_classif/index.html)
Over- and undersampling (../over_and_undersampling/index.html) for imbalanced classification problems
Multiclass extension (http://www.rdocumentation.org/packages/mlr/functions/makeMulticlassWrapper.html) for binary-class learners
Multilabel classification (../multilabel/index.html)

All these operations and methods have a few things in common: First, they all wrap around mlr (http://www.rdocumentation.org/packages/mlr/) learners (http://www.rdocumentation.org/packages/mlr/functions/makeLearner.html) and return a new learner; therefore, learners can be wrapped multiple times. Second, they are implemented using a train (pre-model hook) and a predict (post-model hook) method.
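For example, wrappers can be stacked. A minimal sketch (the imputation settings here are purely illustrative assumptions, not part of this tutorial's example):

library(mlr)
# Start with a plain rpart learner ...
lrn = makeLearner("classif.rpart")
# ... wrap it with median imputation for numeric features ...
lrn = makeImputeWrapper(lrn, classes = list(numeric = imputeMedian()))
# ... and wrap the result again with bagging. Each call returns a new Learner.
lrn = makeBaggingWrapper(lrn, bw.iters = 10)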

Example: Bagging wrapper


In this section we use the bagging wrapper as an example to create a random forest which supports weights. To achieve that, we combine several decision trees from the rpart (http://www.rdocumentation.org/packages/rpart/) package to build our own custom random forest.

First, we create a weighted toy task.

data(iris)
task = makeClassifTask(data = iris, target = "Species", weights = as.integer(iris$Species))

Next, we use makeBaggingWrapper (http://www.rdocumentation.org/packages/mlr/functions/makeBaggingWrapper.html) to create the base learners and the bagged learner.
We choose to set equivalents of ntree (100 base learners) and mtry (proportion of randomly selected features).

base.lrn = makeLearner("classif.rpart")
wrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)
print(wrapped.lrn)
#> Learner classif.rpart.bagged from package rpart
#> Type: classif
#> Name: ; Short name:
#> Class: BaggingWrapper
#> Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights,featimp
#> Predict-Type: response
#> Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5

As we can see in the output, the wrapped learner inherited all properties from the base learner; in particular, the "weights" property is still present. We can use this newly constructed learner like any base learner, i.e. we can use it in train (http://www.rdocumentation.org/packages/mlr/functions/train.html), benchmark (http://www.rdocumentation.org/packages/mlr/functions/benchmark.html), resample (http://www.rdocumentation.org/packages/mlr/functions/resample.html), etc.
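For instance, a minimal sketch of plugging the wrapped learner into resample (the 10-fold CV is chosen here purely for illustration):

# The bagged learner behaves like any other learner in resampling.
cv.desc = makeResampleDesc("CV", iters = 10)
r = resample(wrapped.lrn, task, cv.desc, measures = mmce)

Below, we instead compare the base learner and the bagged learner directly with benchmark: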


benchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))


#> Task: iris, Learner: classif.rpart
#> [Resample] cross-validation iter 1:
#> mmce.test.mean= 0.2
#> [Resample] cross-validation iter 2:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 3:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 4:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 5:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 6:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 7:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 8:
#> mmce.test.mean=0.133
#> [Resample] cross-validation iter 9:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 10:
#> mmce.test.mean=0.0667
#> [Resample] Aggr. Result: mmce.test.mean=0.0667
#> Task: iris, Learner: classif.rpart.bagged
#> [Resample] cross-validation iter 1:
#> mmce.test.mean= 0.2
#> [Resample] cross-validation iter 2:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 3:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 4:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 5:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 6:
#> mmce.test.mean=0.0667
#> [Resample] cross-validation iter 7:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 8:
#> mmce.test.mean=0.133
#> [Resample] cross-validation iter 9:
#> mmce.test.mean= 0
#> [Resample] cross-validation iter 10:
#> mmce.test.mean=0.0667
#> [Resample] Aggr. Result: mmce.test.mean=0.06
#> task.id learner.id mmce.test.mean
#> 1 iris classif.rpart 0.06666667
#> 2 iris classif.rpart.bagged 0.06000000

So far we are quite happy with our new learner. But we hope for better performance by tuning some hyperparameters of both the decision trees and the bagging wrapper. Let's have a look at the available hyperparameters of the fused learner:

getParamSet(wrapped.lrn)
#> Type len Def Constr Req Tunable Trafo
#> bw.iters integer - 10 1 to Inf - TRUE -
#> bw.replace logical - TRUE - - TRUE -
#> bw.size numeric - - 0 to 1 - TRUE -
#> bw.feats numeric - 0.667 0 to 1 - TRUE -
#> minsplit integer - 20 1 to Inf - TRUE -
#> minbucket integer - - 1 to Inf - TRUE -
#> cp numeric - 0.01 0 to 1 - TRUE -
#> maxcompete integer - 4 0 to Inf - TRUE -
#> maxsurrogate integer - 5 0 to Inf - TRUE -
#> usesurrogate discrete - 2 0,1,2 - TRUE -
#> surrogatestyle discrete - 0 0,1 - TRUE -
#> maxdepth integer - 30 1 to 30 - TRUE -
#> xval integer - 10 0 to Inf - FALSE -
#> parms untyped - - - - TRUE -

We choose to tune the parameters minsplit and bw.feats for the mmce (../measures/index.html) using a random search
(http://www.rdocumentation.org/packages/mlr/functions/TuneControl.html) in a 3-fold CV:


ctrl = makeTuneControlRandom(maxit = 10)
rdesc = makeResampleDesc("CV", iters = 3)
par.set = makeParamSet(
  makeIntegerParam("minsplit", lower = 1, upper = 10),
  makeNumericParam("bw.feats", lower = 0.25, upper = 1)
)
tuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)
print(tuned.lrn)
#> Learner classif.rpart.bagged.tuned from package rpart
#> Type: classif
#> Name: ; Short name:
#> Class: TuneWrapper
#> Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass,featimp
#> Predict-Type: response
#> Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5

Calling the train method of the newly constructed learner performs the following steps:

1. The tuning wrapper sets parameters for the underlying model in slot $next.learner and calls its train method.
2. The next learner is the bagging wrapper. The passed-down argument bw.feats is used in the bagging wrapper's training function, while the argument minsplit gets passed down to its $next.learner. The bagging wrapper's training function calls the base learner bw.iters times and stores the resulting models.
3. The bagged models are evaluated using the mean mmce (../measures/index.html) (the default aggregation for this performance measure) and new parameters are selected using the tuning method.
4. This is repeated until the tuner terminates. The output is a tuned bagged learner (the nested learner structure is sketched below).
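The nesting described in these steps can be inspected directly on the learner object; a small sketch (the class names correspond to the print output shown above):

class(tuned.lrn)                            # the TuneWrapper sits on the outside
class(tuned.lrn$next.learner)               # its $next.learner is the BaggingWrapper
class(tuned.lrn$next.learner$next.learner)  # which in turn wraps the plain classif.rpart learner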

lrn = train(tuned.lrn, task = task)


#> [Tune] Started tuning learner classif.rpart.bagged for parameter set:
#> Type len Def Constr Req Tunable Trafo
#> minsplit integer - - 1 to 10 - TRUE -
#> bw.feats numeric - - 0.25 to 1 - TRUE -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> [Tune-x] 1: minsplit=5; bw.feats=0.935
#> [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 2: minsplit=9; bw.feats=0.675
#> [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 3: minsplit=2; bw.feats=0.847
#> [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 4: minsplit=4; bw.feats=0.761
#> [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 5: minsplit=6; bw.feats=0.338
#> [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min
#> [Tune-x] 6: minsplit=1; bw.feats=0.637
#> [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 7: minsplit=1; bw.feats=0.998
#> [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 8: minsplit=4; bw.feats=0.698
#> [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 9: minsplit=3; bw.feats=0.836
#> [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min
#> [Tune-x] 10: minsplit=10; bw.feats=0.529
#> [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min
#> [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467
print(lrn)
#> Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper
#> Trained on: task.id = iris; obs = 150; features = 4
#> Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5
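Note that the printed hyperparameters are still the values set before tuning. The parameter settings actually selected by the tuner are stored inside the trained model and can be extracted, e.g. with getTuneResult (a minimal sketch, assuming this accessor from mlr):

res = getTuneResult(lrn)
res$x  # the selected hyperparameter values (here minsplit and bw.feats)
res$y  # the corresponding tuning performance (mmce.test.mean)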
