Second, we want to know how many trees
are enough for the Eval to converge.
From Figure 3, we can see that the Eval is
almost the same for forests with 100, 200,
and 400 trees. Thus, a random forest with
more than 100 trees and 128 splits per node
is the best configuration for our data. The
best Eval achieved by the random forest is 0.123857.
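The convergence check above can be sketched as a simple sweep over the number of trees; this is only an illustration using scikit-learn and a synthetic dataset (`make_classification`) in place of our real data, and plain validation accuracy in place of the Eval metric.

```python
# Sketch: sweep the number of trees and watch the validation score level off.
# Assumes scikit-learn; make_classification stands in for the real data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for n_trees in (50, 100, 200, 400):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_tr, y_tr)
    scores[n_trees] = rf.score(X_va, y_va)

for n_trees, acc in scores.items():
    print(n_trees, acc)
```

With enough trees the score changes very little between successive settings, which is the convergence behaviour Figure 3 shows for 100, 200, and 400 trees.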
3.2
GBDT
For GBDT, there are two important parameters: one is the maximum number of
leaves per tree, and the other is the number
of trees. From Figures 4 and 5, we
can see that the best parameters are 32 leaves
per tree and 16 trees in total. The best
Eval achieved by GBDT is 0.122872.
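These two knobs map directly onto standard GBDT implementations; the sketch below uses scikit-learn's `GradientBoostingClassifier` on synthetic stand-in data (the actual experiments may have used a different GBDT library).

```python
# Sketch: the two GBDT parameters discussed above, on stand-in data.
# Assumes scikit-learn; make_classification replaces the real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

# Best setting per Figures 4 and 5: 32 leaves per tree, 16 trees in total.
gbdt = GradientBoostingClassifier(max_leaf_nodes=32, n_estimators=16,
                                  random_state=0)
gbdt.fit(X_tr, y_tr)
acc = gbdt.score(X_va, y_va)
print(acc)
```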
3.3
Neural Network
3.4
Model Comparison
3.4.1
Best Model
In the sections above, we have shown the experimental results of three machine learning
algorithms with various sets of parameters,
and the results indicate that the random
forest has the best performance, with
gradient boosted decision trees performing almost
as well. Therefore, we
choose the random forest, and the voting ensemble
of the random forest and gradient boosted decision trees, as our best two models for both
track 1 and track 2. The detailed parameters of
our best random forest classifier are shown
in Table 2. As for the voting classifier, we
reused the result of the previous model and
made it vote with our best GBDT classifier,
whose parameters are documented in
Table 3. The voting method is
soft voting [2], which simply averages
the predicted probabilities in the binary classification.
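Soft voting over two binary classifiers reduces to averaging their positive-class probabilities and thresholding; a minimal sketch, again assuming scikit-learn and synthetic stand-in data:

```python
# Sketch of soft voting [2]: average the two models' predicted probabilities
# for the binary task and threshold at 0.5. Stand-in data, not our real set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
gbdt = GradientBoostingClassifier(max_leaf_nodes=32, n_estimators=16,
                                  random_state=0).fit(X_tr, y_tr)

# Soft voting: mean of the positive-class probabilities of both models.
p_vote = (rf.predict_proba(X_va)[:, 1] + gbdt.predict_proba(X_va)[:, 1]) / 2
y_pred = (p_vote >= 0.5).astype(int)
acc = (y_pred == y_va).mean()
print(acc)
```

Because the RF is fitted once and reused, the voting step adds only the GBDT training cost on top of the stored RF, matching how we reused the previous model's result.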
3.4.2
Efficiency

model   Eval       running time
RF      0.123857   28m17s
GBDT    0.122872   25s
NN      0.130037   1m8s
[Table 2: parameters of our best random forest classifier. The column labels were lost in extraction; the surviving values are 1, 128, 40, 500, Bagging, and the fragment "store a RF model".]

[Table 3: parameters of our best GBDT classifier. The column labels were lost in extraction; the surviving values are 32, 1, 0.1, and 16 (per Section 3.2, 32 is the number of leaves per tree and 16 the number of trees).]

Team Work

Name                  (lost)   (lost)
feature engineering   0.7      0.3
model tuning          0.5      0.5

(The member names in the "Name" row were lost in extraction.)

References