
CDS503: Machine Learning

Lab 4 Extra: Weka Experimenter


Overview
We have been running one experiment at a time using Weka Explorer. Weka Experimenter
enables us to create, run, modify, and analyse experiments in a more convenient manner than is
possible when processing the schemes individually. For example, we can create an experiment
that runs several schemes against a series of datasets and then analyse the results to determine
if one of the schemes is (statistically) better than the other schemes.
Let’s design a small experiment to evaluate a suite of standard classification algorithms on the
iris dataset.
1. Close the Weka Explorer.
2. Click the “Experimenter” button on the Weka GUI Chooser to launch the Weka Experiment
Environment.

3. Click “New” to start a new experiment.


4. In the “Experiment Type” pane change the “Number of folds” from “10” to “5”.

5. In the “Datasets” pane, click “Add new…” and select the iris.arff data set.
6. In the “Algorithms” pane, click “Add new…” and add the following 6 classification algorithms
(click the “Choose” button on the weka.gui.GenericObjectEditor dialog box to select each
algorithm):

• rules.ZeroR (simply predicts the majority class – usually called the majority baseline)
• bayes.NaiveBayes
• bayes.BayesNet
• functions.SMO
• lazy.IBk
• trees.J48
7. Select IBk in the list of algorithms and click the “Edit selected…” button.
8. Change “KNN” from “1” to “3” and click the “OK” button to save the settings.
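For reference, the same suite can be assembled programmatically with the Weka Java API. This is a minimal sketch assuming Weka 3.8 on the classpath; the class name LabClassifiers is ours, not part of Weka:

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.J48;

public class LabClassifiers {
    /** The six schemes from step 6, with IBk's KNN set to 3 as in step 8. */
    public static Classifier[] suite() {
        IBk knn = new IBk();
        knn.setKNN(3);               // equivalent to editing "KNN" from 1 to 3 in the GUI
        return new Classifier[] {
            new ZeroR(),             // majority-class baseline
            new NaiveBayes(),
            new BayesNet(),
            new SMO(),               // Weka's SVM implementation
            knn,
            new J48()                // C4.5 decision tree
        };
    }
}
```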

9. Click on “Run” at the top of the window to open the Run tab and click the “Start” button to run
the experiment. The experiment should complete in just a few seconds.
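Conceptually, one repetition of this experiment is 5-fold cross-validation of each classifier on the data. A minimal sketch of a single repetition with the Weka Java API follows; it reuses the LabClassifiers suite from the sketch above, the “iris.arff” path is an assumption, and note that the Experimenter actually repeats this 10 times with different seeds by default:

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunIrisExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");  // adjust path to your copy
        data.setClassIndex(data.numAttributes() - 1);   // class is the last attribute

        for (Classifier c : LabClassifiers.suite()) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 5, new Random(1));  // one 5-fold run
            System.out.printf("%-12s %6.2f%%%n",
                    c.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```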

10. Click on “Analyse” to open the Analyse tab. Click the “Experiment” button to load the results
from the experiment.

11. Click the “Perform test” button to perform a pairwise test comparing all of the results to the
results for ZeroR.

The test output has three parts: the test configuration information, the statistical test (T-test)
results, and the key to the result sets.

What does the test output mean?


Test configuration information

• Tester: The statistical test used to compare the machine learning results (the corrected
paired T-test by default)
• Analyzing: The performance metric the statistical tests are run on (percent_correct here)
• Datasets: Number of datasets (we have only one data set, iris.arff)
• Resultsets: Number of result sets (we added 6 algorithms, so we have 6 result sets)
• Confidence: The significance level used by the T-test (0.05, two-tailed)
Statistical test results
The matrix shows the average percent_correct for each result set. For example, result set (1)
shows an average percentage of correctly classified instances (accuracy) of 33.33%; result set
(1) is used as the baseline that every other result set is compared against. The “v” beside result
sets (2) to (6) means that the accuracies of result sets (2), (3), (4), (5) and (6) are significantly
better than the baseline result (1). The symbol “v” means significantly better and the symbol “*”
means significantly worse; if neither appears beside a result set’s percent_correct, that result
set is not significantly better or worse than result set (1).
The parenthesised counts below the percent_correct numbers are another way to present the
significance information. The (x/y/z) entries mean: x = the number of datasets on which this
result set is significantly better than the baseline; y = the number on which the difference is not
significant; z = the number on which it is significantly worse. For example, (1/0/0) for result set
(2) means it is significantly better than baseline result set (1) on one dataset, shows no
significant difference on none, and is significantly worse on none.
The (50) printed after the dataset name is the number of individual results being averaged: by
default the Experimenter repeats the 5-fold cross-validation 10 times, giving 10 × 5 = 50 results
per result set.
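For reference, the default tester here is Weka’s “Paired T-Tester (corrected)”, which implements the corrected resampled t-test of Nadeau and Bengio: the variance of the per-result differences is inflated to account for the overlapping training sets of cross-validation. A minimal sketch of the statistic it computes, not Weka’s actual code:

```java
/**
 * Corrected resampled t statistic (Nadeau & Bengio), the statistic behind
 * Weka's default "Paired T-Tester (corrected)". A sketch, not Weka's code.
 *
 * @param diffs          per-result accuracy differences between two schemes
 *                       (e.g. 50 values from 10 runs of 5-fold CV)
 * @param testTrainRatio n_test / n_train; 1/4 for 5-fold cross-validation
 */
static double correctedT(double[] diffs, double testTrainRatio) {
    int k = diffs.length;
    double mean = 0.0;
    for (double d : diffs) mean += d;
    mean /= k;
    double var = 0.0;                       // sample variance of the differences
    for (double d : diffs) var += (d - mean) * (d - mean);
    var /= (k - 1);
    // A plain paired t-test would divide by sqrt(var / k); the correction
    // inflates the variance to account for overlapping training sets.
    return mean / Math.sqrt((1.0 / k + testTrainRatio) * var);
}
```

Weka compares |t| with the two-tailed critical value of the t-distribution with k − 1 degrees of freedom at the configured significance level to decide whether to print a “v” or “*”.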
Key to result sets
The key maps each numbered result set, (1) through (6), to the algorithm (and its settings) that
produced it.
The Experimenter analysis can compare many schemes across one or more datasets on a
single performance metric, whereas the Explorer shows more detailed performance metrics
(including per-class precision and recall and the confusion matrix) for only a single scheme at a
time.
Results
The results suggest that the SVM (SMO) achieved the highest average accuracy.
12. Since SMO achieved the best performance, we can also compare SMO to every other result
set to see whether its performance is significantly better than the rest. Click the “Select” button
next to “Test base”, choose “functions.SMO” and click “Select” to set the new test base. Click
the “Perform test” button again to perform the new analysis.

Although the results for SMO look better, the analysis suggests that the differences between
these results and the results from all of the other algorithms (except ZeroR) are not statistically
significant.
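This step re-runs the same pairwise test with SMO in the baseline column. As a toy usage of the correctedT sketch above (the difference values below are hypothetical, made up for illustration, and not taken from the experiment):

```java
// Hypothetical per-result accuracy differences (SMO minus J48), made up
// for illustration only; in practice collect these from the experiment.
double[] diffs = { 2.0, 0.0, -2.0, 4.0, 2.0, 0.0, 2.0, -2.0, 0.0, 2.0 };
double t = correctedT(diffs, 0.25);   // 5-fold CV: test/train ratio = 1/4
// With k = 10 values, compare |t| against the two-tailed t critical value
// with 9 degrees of freedom (about 2.26 at the 0.05 level); a smaller |t|
// means the difference is not statistically significant.
```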
