
A comparison of methods for Multiclass Minimal Complexity Machines

Mayank Sharma
Department of Electrical Engineering
Indian Institute of Technology, Delhi
Hauz Khas, New Delhi, India 110016
Email: eez142368@iitd.ac.in / mayankiitkgp10@gmail.com

Jayadeva
Department of Electrical Engineering
Indian Institute of Technology, Delhi
Hauz Khas, New Delhi, India 110016
Email: jayadeva@ee.iitd.ac.in

Abstract—Computational learning theory indicates that a small VC dimension leads to good generalization and sparsity. However, the exact relation between the VC dimension and the defining variables of a classifier is usually abstruse and rarely tractable. Recently, it has been shown how to learn a binary classifier by minimizing a tight bound on its VC dimension. Experimental results have shown that the resulting Minimal Complexity Machine (MCM) uses fewer support vectors than Support Vector Machines (SVMs) and leads to better generalization. In this paper, we extend the MCM to the general case when multiple classes are present. We explore the use of the MCM in multiclass methods such as One v/s One, One v/s Rest and Directed Acyclic Graphs, and demonstrate the advantage of MCMs over SVMs on a number of benchmark datasets.

I. INTRODUCTION

The learning capacity of a machine is determined using its Vapnik-Chervonenkis (VC) dimension, which plays a key role in determining machine complexity and generalization ability. This is of significance when multiple models of a learning machine are employed to perform the task at hand, as in the case of multi-class classification. Support Vector Machines (SVMs) [1] have conventionally been employed to perform such tasks; however, their VC dimension may be unbounded [2]. Specifically, the total risk R(\alpha) associated with a learning machine is the sum of the empirical and structural risks, as given by Eqns. (1)-(2):

R_{empirical}(\alpha) = \frac{1}{2N} \sum_{i=1}^{N} |y_i - f(x_i, \alpha)|   (1)

R(\alpha) \leq R_{empirical}(\alpha) + \sqrt{\frac{h\left(\log\frac{2N}{h} + 1\right) - \log\frac{\eta}{4}}{N}}   (2)

where (x_i, y_i) denote the N training samples and corresponding labels, f(x_i, \alpha) denotes the output of a learning machine with parameters \alpha on an input sample x_i, h denotes the VC dimension, and 0 \leq \eta \leq 1. Burges [2, Sec. 6, Th. 3, 5] shows that SVMs can have unbounded, and possibly infinite, VC dimension. In this context, Jayadeva proposed the Minimal Complexity Machine (MCM) [3], which bounds the VC dimension, thereby providing better generalization and low-complexity learning.

The MCM (and the SVM) were originally formulated for binary classification. Multiclass classification using these methods is still an area of active research. Various methods have been proposed for multiclass classification using SVMs and are studied extensively in [4], [5]. In this paper we focus on three of these methods, namely One-v/s-One, One-v/s-Rest and Directed Acyclic Graphs. We compare the performance of these three methods, using both the MCM and the SVM, on various datasets from the UCI Machine Learning Repository [6] and from [7]. We establish the use of the DAG with the MCM as an efficient classifier for the task of multi-class classification.

The rest of the paper is organized as follows. Section II gives a brief introduction to Support Vector Machines. Section III describes the MCM for binary classification. Section IV briefly discusses the various multi-class classification methods. In Section V we present the comparison of these methods for multi-class classification. Finally, discussions and conclusions are presented in Section VI.

II. SUPPORT VECTOR MACHINES

Support vector machines are among the most widely used hyperplane classifiers. The SVM learns a hyperplane in the input or feature space such that all points are at a (geometric margin) distance of at least 1 from the hyperplane. The training data D is a set of N points in d dimensions of the form

D = \{ (x_i, y_i) \mid x_i \in \mathbb{R}^d, \; y_i \in \{-1, +1\} \}   (3)

The soft-margin SVM optimization problem can be written as

\min_{w, b, q} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} q_i   (4)

subject to:

y_i \left( w^T \phi(x_i) + b \right) + q_i \geq 1   (5)
q_i \geq 0   (6)
where w \in \mathbb{R}^d are the weights, b \in \mathbb{R} is the bias term, C is a hyper-parameter which balances the error term against the importance of the weights in the minimization, the q_i are slack variables, and \phi(\cdot) maps inputs to the feature space. This Quadratic Programming Problem (QPP) becomes easier to solve when converted to its dual. The dual has box constraints, which can be solved using Sequential Minimal Optimization (SMO) as proposed by Platt [8]. The dual of the minimization problem is given by:

\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j k(x_i, x_j) - \sum_{i=1}^{N} \alpha_i   (7)

subject to:

0 \leq \alpha_i \leq C   (8)
\sum_{i=1}^{N} \alpha_i y_i = 0   (9)

where the \alpha_i are the dual variables and k(x_i, x_j) = \phi(x_i)^T \phi(x_j) is the kernel function for a given pair (x_i, x_j). SVMs were motivated by the work of Vapnik [9] and his colleagues on the theory of generalization and the complexity of learning machines. It is well known that the capacity of a learning machine can be measured by its Vapnik-Chervonenkis (VC) dimension, which can be used to estimate a probabilistic upper bound on the test error of a classifier. A small VC dimension generally leads to good generalization and low test error rates [9].
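For reference, the short Python sketch below trains a soft-margin RBF-kernel SVM of the form (4)-(6). It is only an illustration; the experiments reported later in this paper used LIBSVM and Matlab, and the toy data, C and gamma values here are placeholders. scikit-learn's SVC wraps LIBSVM, which solves the dual (7)-(9) with an SMO-style decomposition.

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary problem: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Soft-margin SVM with an RBF kernel; SVC solves the dual (7)-(9)
# via LIBSVM's SMO-style decomposition. C and gamma are illustrative.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

# Number of support vectors and decision values for a few points.
print("support vectors per class:", clf.n_support_)
print("decision values:", clf.decision_function(X[:3]))
```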
III. MINIMAL COMPLEXITY MACHINES

Inspired by the celebrated tutorial of Burges [2], Jayadeva [3] formulated a novel algorithm called the Minimal Complexity Machine (MCM). The objective function of the MCM bounds the VC dimension linearly from both above and below, leading to a simple Linear Programming Problem (LPP). Vapnik [9] showed that the VC dimension of the set of all gap-tolerant classifiers with margin r \geq r_{\min} is bounded by

1 + \min\left( \frac{R^2}{r_{\min}^2}, \; d \right)   (10)

where R denotes the radius of the smallest sphere enclosing all the training samples. Geometrically, the MCM finds a hyperplane such that all points are at a distance of at least 1 and at most h from it. An alternative interpretation can be stated in terms of the convex hulls of the two classes: following the construction given in [10] for the SVM, a similar analogy can be formulated for the MCM in the linearly separable case. The MCM finds a hyperplane that is as far as possible from the nearest points of the two convex hulls, while all points of the convex hulls remain within a distance h of the hyperplane.

The formulation of the soft-margin MCM, as stated in [3], is

\min_{w, b, h, q} \; h + C \sum_{i=1}^{N} q_i   (11)

subject to:

y_i \left( w^T \phi(x_i) + b \right) + q_i \geq 1   (12)
y_i \left( w^T \phi(x_i) + b \right) + q_i \leq h   (13)
q_i \geq 0   (14)

where h constitutes a tight (exact) bound on the VC dimension; all other parameters are as previously defined. An exact bound implies that h and the VC dimension are close to one another. Setting \phi(x_i) = x_i converts this formulation into the linear MCM.

Since the image vectors \phi(x_i) form an over-complete basis of the empirical feature space in which w lies, the Representer Theorem, as discussed in [11] and [12], allows us to write

w = \sum_{i=1}^{N} \lambda_i \phi(x_i)   (15)

where the \lambda_i are coefficients for each input vector transformed to the feature space. Thus the kernel version of the MCM can be written as

\min_{\lambda, b, h, q} \; h + C \sum_{i=1}^{N} q_i   (16)

subject to:

y_i \left( \sum_{j=1}^{N} \lambda_j k(x_i, x_j) + b \right) + q_i \geq 1   (17)
y_i \left( \sum_{j=1}^{N} \lambda_j k(x_i, x_j) + b \right) + q_i \leq h   (18)
q_i \geq 0   (19)
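Since (16)-(19) is a linear program, it can be handed to any off-the-shelf LP solver. The sketch below is a minimal Python illustration using scipy.optimize.linprog rather than the authors' Matlab implementation; the helper names (rbf_kernel, train_mcm, mcm_decision), the variable ordering and the default C and gamma are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

def rbf_kernel(X, gamma):
    # Pairwise squared distances -> RBF Gram matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def train_mcm(X, y, C=1.0, gamma=0.5):
    """Solve the kernel MCM LP (16)-(19). Variables: [lambda (N), b, h, q (N)]."""
    N = len(y)
    K = rbf_kernel(X, gamma)
    yK = y[:, None] * K                      # row i is y_i * K_i

    # Objective: h + C * sum(q).
    c = np.concatenate([np.zeros(N + 1), [1.0], C * np.ones(N)])

    # (17): -(y_i (K lambda)_i + y_i b + q_i) <= -1
    A1 = np.hstack([-yK, -y[:, None], np.zeros((N, 1)), -np.eye(N)])
    b1 = -np.ones(N)
    # (18): y_i (K lambda)_i + y_i b + q_i - h <= 0
    A2 = np.hstack([yK, y[:, None], -np.ones((N, 1)), np.eye(N)])
    b2 = np.zeros(N)

    bounds = [(None, None)] * (N + 1) + [(0, None)] + [(0, None)] * N
    res = linprog(c, A_ub=np.vstack([A1, A2]), b_ub=np.concatenate([b1, b2]),
                  bounds=bounds, method="highs")
    lam, b, h = res.x[:N], res.x[N], res.x[N + 1]
    return lam, b, h

def mcm_decision(X_train, lam, b, x, gamma=0.5):
    # f(x) = sum_j lambda_j k(x, x_j) + b
    d2 = np.sum((X_train - x) ** 2, axis=1)
    return float(np.exp(-gamma * d2) @ lam + b)
```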
IV. ONE V/S REST, ONE V/S ONE AND DAG METHODS FOR SVM AND MCM

Since the SVM and the MCM are rather similar in their primal forms, differing by one additional constraint and the VC dimension term, we describe the procedures for the SVM; they are applied to the MCM in the same manner.

A. One v/s Rest

One v/s Rest (OVR) is perhaps the earliest multiclass approach for SVMs. Let k be the number of classes; the OVR method constructs one SVM per class, i.e., k SVMs in total. The i-th SVM is trained with all examples of the i-th class labeled positive and the rest labeled negative. After solving the k SVMs or MCMs, we obtain k decision functions:

(w^1)^T \phi(x) + b^1   (20)
\vdots   (21)
(w^k)^T \phi(x) + b^k   (22)

To classify a new test point x \in \mathbb{R}^d, we output the class whose decision function is largest:

\text{class}(x) = \arg\max_{i=1,\dots,k} \left( (w^i)^T \phi(x) + b^i \right)   (23)
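For concreteness, a minimal one-vs-rest wrapper is sketched below. It reuses the hypothetical train_mcm and mcm_decision helpers from the MCM sketch in Section III; any binary MCM or SVM trainer exposing a real-valued decision function could be substituted. This is an illustration, not the implementation used for the experiments.

```python
import numpy as np

def train_ovr(X, y, classes, C=1.0, gamma=0.5):
    # One binary MCM per class: class c vs. the rest (labels +1 / -1).
    models = {}
    for c in classes:
        y_bin = np.where(y == c, 1.0, -1.0)
        models[c] = train_mcm(X, y_bin, C=C, gamma=gamma)   # (lam, b, h)
    return models

def predict_ovr(models, X_train, x, gamma=0.5):
    # Eq. (23): pick the class whose decision function is largest.
    scores = {c: mcm_decision(X_train, lam, b, x, gamma)
              for c, (lam, b, h) in models.items()}
    return max(scores, key=scores.get)
```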
1) Implementation Issues: When the OVR approach is used for multiclass classification, the binary sub-problems become unbalanced. To address this issue, we follow the method used in LIBSVM [13] and penalize the errors of the two classes separately. The modified objective functions for the SVM and the MCM are, respectively:

For the SVM,

\min_{w, b, q} \; \frac{1}{2}\|w\|^2 + C_1 \sum_{i: y_i = +1} q_i + C_2 \sum_{i: y_i = -1} q_i   (24)

subject to:

y_i \left( w^T \phi(x_i) + b \right) + q_i \geq 1   (26)
q_i \geq 0   (27)

Equivalently, for the MCM,

\min_{\lambda, b, h, q} \; h + C_1 \sum_{i: y_i = +1} q_i + C_2 \sum_{i: y_i = -1} q_i   (28)

subject to:

y_i \left( \sum_{j=1}^{N} \lambda_j k(x_i, x_j) + b \right) + q_i \geq 1   (30)
y_i \left( \sum_{j=1}^{N} \lambda_j k(x_i, x_j) + b \right) + q_i \leq h   (31)
q_i \geq 0   (32)
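The paper states only that LIBSVM's method for handling imbalance is followed; how C_1 and C_2 are set is not specified. The helper below is therefore just one plausible choice, scaling each class's penalty inversely with its frequency, and shows how the scalar penalty in the earlier MCM LP sketch could be replaced by a per-sample cost vector to realise (28)-(32).

```python
import numpy as np

def per_sample_costs(y_bin, C=1.0):
    """Per-sample penalties C1 (positives) and C2 (negatives), scaled
    inversely with class frequency; an assumed rebalancing heuristic."""
    n_pos = np.sum(y_bin == 1)
    n_neg = np.sum(y_bin == -1)
    C1 = C * len(y_bin) / (2.0 * n_pos)
    C2 = C * len(y_bin) / (2.0 * n_neg)
    return np.where(y_bin == 1, C1, C2)

# In the MCM LP sketch of Section III, the objective term `C * np.ones(N)`
# would be replaced by `per_sample_costs(y, C)` to obtain (28)-(32).
```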
B. One v/s One

One v/s One (OVO) is another multiclass classification technique; it was introduced in [14] and first used with SVMs in [15]. In this method we construct k(k-1)/2 classifiers, each trained on the data from one pair of classes. To classify a test point x we use a voting strategy: for each classifier we check the sign of its output and increment the counter of the class it predicts. At the end, we predict the class with the highest number of votes; this approach is also called Max-Wins. In case of a tie, we select the class with the smaller index.
C. Directed Acyclic Graph

Directed Acyclic Graph SVM, or DAGSVM [16], has the same training paradigm as the OVO method: in the training phase, the DAG solves k(k-1)/2 SVMs. For testing, however, it arranges them in a rooted binary DAG with k(k-1)/2 internal nodes and k leaves. For each test sample, the binary decision function at the current node is evaluated and the graph is traversed left or right depending on the output value; the prediction is made at the leaf node that is finally reached. A sketch of this test-time traversal is given below.
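The following is a minimal sketch of the test-time DDAG evaluation, written in the list-elimination view in which each node compares the two extreme remaining classes and discards one of them. It reuses the hypothetical pairwise models from the OVO sketch above and is illustrative rather than the authors' code.

```python
def predict_dag(models, classes, x, gamma=0.5):
    """Evaluate a Decision DAG: start with all k classes and, at each of the
    k-1 nodes on the evaluation path, discard one class until one remains."""
    remaining = sorted(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]          # current node tests a vs. b
        X_ab, (lam, bias, h) = models[(a, b)]
        if mcm_decision(X_ab, lam, bias, x, gamma) >= 0:
            remaining.pop()                         # a wins: eliminate b
        else:
            remaining.pop(0)                        # b wins: eliminate a
    return remaining[0]
```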
There have been several works in the literature based on the DAG approach to multi-class classification. Large margin DAGs (Decision DAGs) were proposed by Platt et al. [16]. Adaptive DAGs have been proposed by Kijsirikul and Ussivakul [17], [18], while hierarchical DAGs have been presented by Schwenker [19], [20], Suzuki et al. [21] and Vural et al. [22], among others. Binary decision trees have been used for multi-class SVMs by Madzarov [23], while variants using genetic algorithms have been proposed by Lorena et al. [24], Lian et al. [25] and Liu et al. [26]. The DAG approach has also been developed for multi-class classification in the fuzzy domain by Midelfart et al. [27] and Feng et al. [28].

Many practical problems to which machine learning has been applied are multi-class in nature, as more than two classes need to be predicted from the data. Multi-class strategies such as the DAG therefore find many applications, a few of which include protein structure prediction [29], [30], [31], document and text categorization [32], [33], action recognition [34], land cover classification [35], expression [36] and gesture [37] detection, seizure detection [38], ECG classification [39], and diagnosis and detection problems [40], [41], [42]. In the following sub-section, we provide an analysis of the generalization ability of the DAG approach to motivate its use for multi-class classification.

1) Analysis of Generalization Ability: Here we state, without proof, two theorems from [16], which imply that we can control the capacity of Decision DAGs (DDAGs) by enlarging their margins. Both the OVO and OVR classification methods suffer from the problem of ambiguous regions, whereas a Decision DAG partitions the input space into polytopic regions, each of which is mapped to a leaf node and assigned to a specific class. Hence, in general, the performance of DDAGs is better than that of the OVO and OVR multiclass classification methods.

Theorem 1: Suppose we are able to classify a random sample of N labeled examples using a perceptron DDAG on k classes containing M decision nodes with margins r_i at node i. Then, with probability greater than 1 - \delta, the generalization error is less than

\frac{130 R^2}{N} \left( D \log(4eN) \log(4N) + \log \frac{2(2N)^M}{\delta} \right)   (33)

where D = \sum_{i=1}^{M} \frac{1}{r_i^2} and R is the radius of a ball containing the support of the distribution. Platt et al. [16] further discuss that the only margins that should matter are those relative to the boundaries of the cell to which a given training point is assigned, whereas the bound in Theorem 1 depends on all the margins in the graph. Thus, it can be expected that a DDAG whose j-node margins are large will accurately identify class j, even though other nodes do not have large margins.

Theorem 2: Suppose we are able to correctly distinguish class j from the other classes in a random sample of N examples with a DDAG G over k classes containing M decision nodes with margins r_i at node i. Then, with probability 1 - \delta,

\epsilon_j(G) \leq \frac{130 R^2}{N} \left( D \log(4eN) \log(4N) + \log \frac{2(2N)^{k-1}}{\delta} \right)   (34)

where D = \sum_{i \in j\text{-nodes}} \frac{1}{r_i^2} and R is the radius of a ball containing the support of the distribution.
V. RESULTS

In this section we present our results on several UCI multiclass datasets [6] and datasets from [7], for both the MCM and the SVM. All computational experiments were carried out on a 2.30 GHz Intel Core i5 (2nd generation) processor with 6 GB RAM. For the SVM we used the standard LIBSVM [13] package, and the multiclass MCM code was written in Matlab. As a pre-processing step, we first divide the data into 5 folds, keeping the same ratio of samples from each class in each fold, and then normalize the data to have zero mean and unit variance. We compared our multiclass MCM results with the SVM, testing both algorithms on the same folds. For the purpose of demonstration we used the RBF kernel K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}. A sketch of this evaluation protocol is given below. Table I, which follows the sketch, describes the datasets used in the experiments in terms of the number of samples and features, indicated in the column titled "Size" as [samples x features]; the last column gives the number of classes in each dataset.
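The snippet below sketches this evaluation protocol in Python (the reported experiments used LIBSVM and Matlab): stratified 5-fold splits, zero-mean and unit-variance scaling, and an RBF-kernel classifier. Fitting the scaler on the training fold only, and the particular C and gamma values, are assumptions of the sketch.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_rbf_svm(X, y, C=1.0, gamma=0.5, n_splits=5, seed=0):
    """Stratified 5-fold accuracy (mean and std, %), mirroring the protocol above."""
    accs = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        scaler = StandardScaler().fit(X[train_idx])        # zero mean, unit variance
        X_tr, X_te = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y[train_idx])
        accs.append(clf.score(X_te, y[test_idx]))
    return 100 * np.mean(accs), 100 * np.std(accs)
```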
TABLE I
DESCRIPTION OF DATASETS USED FOR EXPERIMENTS

S.No. (Dataset ID) | Dataset       | Size     | Classes
1                  | Breast        | 106 x 9  | 6
2                  | User Modeling | 403 x 5  | 4
3                  | Ecoli         | 336 x 7  | 6
4                  | Glass         | 214 x 9  | 6
5                  | Iris          | 150 x 4  | 3
6                  | libras        | 360 x 90 | 15
7                  | Seeds         | 210 x 7  | 3
8                  | Wine          | 178 x 13 | 3
9                  | Vowel         | 528 x 10 | 11
10                 | Vehicle       | 846 x 18 | 4
11                 | SVM guide 2   | 391 x 20 | 3
12                 | SVM guide 4   | 300 x 10 | 6
A. Results for Linear MCM and SVM

This section presents the results for the multi-class datasets using the linear MCM and SVM. The results for the OVR strategy are shown in Table II. One can observe that the linear MCM accuracies are better than those of the SVM for only three datasets, which suggests that the OVR strategy need not necessarily give good generalization on multi-class datasets when using the linear MCM.

TABLE II
COMPARISON OF ACCURACIES (%) OF LINEAR MCM AND SVM FOR OVR MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM          | SVM
1          | Breast        | 60.41 ± 6.34 | 68.74 ± 6.82
2          | User Modeling | 86.59 ± 3.47 | 87.07 ± 2.69
3          | Ecoli         | 84.80 ± 2.59 | 87.48 ± 4.38
4          | Glass         | 63.59 ± 9.15 | 63.33 ± 7.06
5          | Iris          | 93.33 ± 4.71 | 96.67 ± 4.08
7          | Seeds         | 96.19 ± 1.30 | 95.24 ± 0.01
8          | Wine          | 98.30 ± 2.54 | 97.17 ± 2.02
10         | Vehicle       | 78.94 ± 4.00 | 78.83 ± 2.46
11         | SVM guide 2   | 79.04 ± 3.56 | 77.24 ± 4.94
12         | SVM guide 4   | 73.00 ± 2.13 | 79.75 ± 1.00

We now compare the performance of the OVO strategy on the multiclass datasets using the linear SVM and MCM classifiers. The results are shown in Table III. It can be observed that, when using the OVO strategy, the linear MCM performs better than the SVM on 5 of the 10 datasets considered, while the SVM performs better than the MCM on only 2. This indicates that the MCM performs relatively better than the SVM under the OVO strategy.

TABLE III
COMPARISON OF ACCURACIES (%) OF LINEAR SVM AND MCM FOR OVO MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM           | SVM
1          | Breast        | 72.22 ± 11.03 | 69.82 ± 6.95
2          | User Modeling | 88.07 ± 3.46  | 88.80 ± 3.53
3          | Ecoli         | 87.19 ± 3.82  | 87.18 ± 3.38
4          | Glass         | 61.65 ± 6.49  | 63.47 ± 6.18
5          | Iris          | 97.33 ± 4.34  | 97.33 ± 4.34
7          | Seeds         | 97.14 ± 1.99  | 96.19 ± 2.12
8          | Wine          | 98.87 ± 1.54  | 98.31 ± 1.53
10         | Vehicle       | 79.07 ± 1.64  | 79.30 ± 2.63
11         | SVM guide 2   | 82.10 ± 2.88  | 82.10 ± 2.33
12         | SVM guide 4   | 81.39 ± 7.56  | 81.03 ± 8.19

As the key objective of this paper is to evaluate the DAG strategy for multiclass classification, we now present the results of using the DAG approach with the linear SVM and MCM; these are shown in Table IV. It can be observed that the MCM outperforms the SVM on 6 of the 10 datasets. Even for the remaining 4 datasets, where the SVM accuracies are slightly higher, the standard deviations obtained across the five folds using the MCM are lower than those of the SVM. These results indicate that the DAG approach for multiclass classification performs better with the MCM than with the DAG-SVM.

TABLE IV
COMPARISON OF ACCURACIES (%) OF LINEAR SVM AND MCM FOR DAG MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM          | SVM
1          | Breast        | 67.14 ± 5.17 | 68.74 ± 6.82
2          | User Modeling | 86.34 ± 2.48 | 87.07 ± 2.69
3          | Ecoli         | 86.59 ± 4.32 | 87.48 ± 4.38
4          | Glass         | 63.54 ± 5.10 | 63.33 ± 7.06
5          | Iris          | 97.33 ± 4.34 | 96.67 ± 4.08
7          | Seeds         | 97.14 ± 1.99 | 95.71 ± 1.99
8          | Wine          | 98.87 ± 1.54 | 97.19 ± 1.96
10         | Vehicle       | 78.12 ± 2.09 | 78.83 ± 2.46
11         | SVM guide 2   | 82.10 ± 2.88 | 81.34 ± 3.57
12         | SVM guide 4   | 82.72 ± 8.86 | 79.75 ± 1.00

B. Results for Kernel MCM and SVM

We now present the results for the multiclass kernel MCM and SVM formulations using the OVR, OVO and DAG classification strategies. The results for the OVR approach with the kernel SVM and MCM are shown in Table V.

TABLE V
COMPARISON OF ACCURACIES (%) OF KERNEL SVM AND MCM FOR OVR MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM          | SVM
1          | Breast        |              |
2          | User Modeling |              |
3          | Ecoli         | 87.18 ± 5.94 | 85.09 ± 5.40
4          | Glass         | 67.72 ± 5.55 | 65.91 ± 3.07
5          | Iris          | 96.66 ± 3.33 | 96.00 ± 2.78
6          | libras        | 81.21 ± 9.56 | 82.82 ± 8.42
7          | Seeds         | 95.23 ± 1.68 |
8          | Wine          | 97.20 ± 2.78 | 97.19 ± 1.96
9          | Vowel         | 74.81 ± 5.64 |
10         | Vehicle       | 81.19 ± 4.64 |

From these results it can be seen that the kernel MCM performs better than the SVM for only 4 of the 10 datasets.

The results for the OVO strategy on the multi-class datasets using the kernel SVM and MCM are presented in Table VI.

TABLE VI
COMPARISON OF ACCURACIES (%) OF KERNEL SVM AND MCM FOR OVO MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM          | SVM
1          | Breast        | 65.41 ± 8.40 | 65.27 ± 1.01
2          | User Modeling | 89.54 ± 4.33 | 90.08 ± 1.50
3          | Ecoli         | 87.78 ± 3.95 | 87.16 ± 5.89
4          | Glass         | 68.66 ± 8.12 | 71.47 ± 4.64
5          | Iris          | 96.67 ± 4.08 | 96.00 ± 2.78
6          | libras        | 83.42 ± 9.21 | 87.22 ± 7.78
7          | Seeds         | 96.19 ± 2.19 | 96.19 ± 1.30
8          | Wine          | 97.76 ± 3.62 | 97.74 ± 1.26
9          | Vowel         | 76.57 ± 4.81 | 81.24 ± 4.16
10         | Vehicle       | 83.79 ± 2.74 | 84.62 ± 3.86

Finally, the results using the DAG approach on the multi-class datasets for the kernel SVM and MCM are presented in Table VII. One can observe that the kernel MCM outperforms the SVM on 6 of the 10 datasets when using the DAG approach. This establishes the better generalization ability of the DAG approach over OVO and OVR for multi-class classification using the MCM with an RBF kernel.

TABLE VII
COMPARISON OF ACCURACIES (%) OF KERNEL SVM AND MCM FOR DAG MULTICLASS CLASSIFICATION

Dataset ID | Dataset Name  | MCM          | SVM
1          | Breast        | 65.05 ± 5.11 | 65.05 ± 5.11
2          | User Modeling | 91.05 ± 3.47 | 90.55 ± 1.99
3          | Ecoli         | 88.80 ± 1.05 | 87.16 ± 5.89
4          | Glass         | 67.31 ± 3.33 | 71.47 ± 4.64
5          | Iris          | 97.33 ± 2.78 | 96.00 ± 2.78
6          | libras        | 83.93 ± 5.25 | 87.25 ± 7.10
7          | Seeds         | 96.66 ± 1.30 | 95.23 ± 1.68
8          | Wine          | 97.80 ± 2.81 | 97.74 ± 1.26
9          | Vowel         | 71.15 ± 0.01 | 81.24 ± 3.79
10         | Vehicle       | 81.91 ± 2.38 | 81.19 ± 4.64
VI. CONCLUSION

We see that the kernel DAG MCM outperforms the OVO and OVR multiclass classification methods on a majority of the datasets. For the linear case, the OVO MCM and the DAG MCM have similar performance. In this paper we have dealt with datasets of small size in order to demonstrate that the MCM outperforms the SVM for various multiclass approaches. Future work involves dealing with datasets of larger size by modifying the present MCM formulation to handle them. As the current MCM formulation uses linear programming to solve the optimization problem, work is also in progress to develop a distributed version of the MCM.
REFERENCES

[1] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[2] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[3] Jayadeva, "Learning a hyperplane classifier by minimizing an exact bound on the VC dimension," Neurocomputing, vol. 149, pp. 683–689, 2015.
[4] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[5] A. C. Lorena, A. C. De Carvalho, and J. M. Gama, "A review on the combination of binary classifiers in multiclass problems," Artificial Intelligence Review, vol. 30, no. 1-4, pp. 19–37, 2008.
[6] C. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998.
[7] C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., "A practical guide to support vector classification," 2003.
[8] J. Platt et al., "Fast training of support vector machines using sequential minimal optimization," Advances in Kernel Methods: Support Vector Learning, vol. 3, 1999.
[9] V. N. Vapnik and V. Vapnik, Statistical Learning Theory. Wiley, New York, 1998, vol. 1.
[10] K. P. Bennett and E. J. Bredensteiner, "Duality and geometry in SVM classifiers," in ICML, 2000, pp. 57–64.
[11] B. Scholkopf, R. Herbrich, and A. J. Smola, "A generalized representer theorem," in Computational Learning Theory. Springer, 2001, pp. 416–426.
[12] F. Dinuzzo and B. Scholkopf, "The representer theorem for Hilbert spaces: a necessary and sufficient condition," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 189–196. [Online]. Available: http://papers.nips.cc/paper/4841-the-representer-theorem-for-hilbert-spaces-a-necessary
[13] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[14] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: a stepwise procedure for building and training a neural network," in Neurocomputing. Springer, 1990, pp. 41–50.
[15] J. Friedman, "Another approach to polychotomous classification," Dept. Statist., Stanford Univ., Stanford, CA, USA, Tech. Rep., 1996.
[16] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, "Large margin DAGs for multiclass classification," in NIPS, vol. 12, 1999, pp. 547–553.
[17] B. Kijsirikul and N. Ussivakul, "Multiclass support vector machines using adaptive directed acyclic graph," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2002), 2002, pp. 980–985.
[18] B. Kijsirikul, N. Ussivakul, and S. Meknavin, "Adaptive directed acyclic graphs for multiclass classification," in PRICAI, vol. 2002. Springer, 2002, pp. 158–168.
[19] F. Schwenker, "Hierarchical support vector machines for multi-class pattern recognition," in KES, 2000, pp. 561–565.
[20] F. Schwenker and G. Palm, "Tree-structured support vector machines for multi-class pattern recognition," in Multiple Classifier Systems. Springer, 2001, pp. 409–417.
[21] J. Suzuki, T. Hirao, Y. Sasaki, and E. Maeda, "Hierarchical directed acyclic graph kernel: Methods for structured natural language data," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Volume 1. Association for Computational Linguistics, 2003, pp. 32–39.
[22] V. Vural and J. G. Dy, "A hierarchical method for multi-class support vector machines," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 105.
[23] G. Madzarov, D. Gjorgjevikj, and I. Chorbev, "A multi-class SVM classifier utilizing binary decision tree," Informatica, vol. 33, no. 2, 2009.
[24] A. C. Lorena et al., "An hybrid GA/SVM approach for multiclass classification with directed acyclic graphs," in Advances in Artificial Intelligence, SBIA 2004. Springer, 2004, pp. 366–375.
[25] K. Lian, J.-G. Huang, H.-J. Wang, and B. Long, "Study on a GA-based SVM decision-tree multi-classification strategy," Acta Electronica Sinica, vol. 8, p. 006, 2008.
[26] S. Liu, J. Yun, and P. Chen, "A new GA-based decision search for DAG-SVM," in Computer Application and System Modeling (ICCASM), 2010 International Conference on, vol. 2. IEEE, 2010, pp. V2-650.
[27] H. Midelfart and J. Komorowski, "A rough set framework for learning in a directed acyclic graph," in Rough Sets and Current Trends in Computing. Springer, 2002, pp. 144–155.
[28] J. Feng, Y. Yang, and J. Fan, "Fuzzy multi-class SVM classifier based on optimal directed acyclic graph using in similar handwritten Chinese characters recognition," in Advances in Neural Networks, ISNN 2005. Springer, 2005, pp. 875–880.
[29] M. N. Nguyen and J. C. Rajapakse, "Multi-class support vector machines for protein secondary structure prediction," Genome Informatics, vol. 14, pp. 218–227, 2003.
[30] H. Kim and H. Park, "Protein secondary structure prediction based on an improved support vector machines approach," Protein Engineering, vol. 16, no. 8, pp. 553–560, 2003.
[31] K. Sato, T. Mituyama, K. Asai, and Y. Sakakibara, "Directed acyclic graph kernels for structural RNA analysis," BMC Bioinformatics, vol. 9, no. 1, p. 318, 2008.
[32] P.-Y. Hao, J.-H. Chiang, and Y.-K. Tu, "Hierarchically SVM classification based on support vector clustering method and its application to document categorization," Expert Systems with Applications, vol. 33, no. 3, pp. 627–635, 2007.
[33] A. Sun and E.-P. Lim, "Hierarchical text classification and evaluation," in Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. IEEE, 2001, pp. 521–528.
[34] L. Wang and H. Sahbi, "Directed acyclic graph kernels for action recognition," in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 3168–3175.
[35] L.-M. He, F.-S. Kong, and Z.-Q. Shen, "Multiclass SVM based land cover classification with multisource data," in Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, vol. 6. IEEE, 2005, pp. 3541–3545.
[36] Y. Luo, Y. Cui, and Y. Zhang, "Facial expression recognition based on improved DAGSVM," in International Symposium on Optoelectronic Technology and Application 2014. International Society for Optics and Photonics, 2014, pp. 930126–930126.
[37] C. J. L. X. Z. Yi and L. Yuan, "Improved DAGSVM hand gesture recognition approach," Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 5, p. 017, 2013.
[38] A. M. Murugavel and S. Ramakrishnan, "Tree based wavelet transform and DAG SVM for seizure detection," Signal & Image Processing, vol. 3, no. 1, p. 115, 2012.
[39] I. Saini, D. Singh, and A. Khosla, "Electrocardiogram beat classification using empirical mode decomposition and multiclass directed acyclic graph support vector machine," Computers & Electrical Engineering, vol. 40, no. 5, pp. 1774–1787, 2014.
[40] H. Keskes and A. Braham, "DAG SVM and pitch synchronous wavelet transform for induction motor diagnosis," in Power Electronics, Machines and Drives (PEMD 2014), 7th IET International Conference on. IET, 2014, pp. 1–6.
[41] H. Keskes and A. Braham, "Recursive undecimated wavelet packet transform and DAG SVM for induction motor diagnosis," 2015.
[42] J. Lu, M. Lin, Y. Huang, and X. Kong, "A high-accuracy algorithm for surface defect detection of steel based on DAG-SVM," Sensors & Transducers, vol. 157, no. 10, p. 412, 2013.