
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2735996, IEEE Transactions on Big Data

IEEE TRANSACTIONS ON BIG DATA, VOL. *, NO. *, * 201*

Noise-resistant Statistical Traffic Classification


Binfeng Wang, Jun Zhang, Member, IEEE, Zili Zhang, Lei Pan, Member, IEEE, Yang Xiang, Senior
Member, IEEE, and Dawen Xia

Abstract—Network traffic classification plays a significant role in cyber security applications and management scenarios. Conventional statistical classification techniques rely on the assumption that clean labelled samples are available for building classification models. However, in the big data era, mislabelled training data commonly exist due to the introduction of new applications and lack of knowledge. Existing statistical traffic classification techniques do not address the problem of mislabelled training data, so their performance becomes poor in the presence of such data. To meet this challenge, this paper proposes a new scheme, Noise-resistant Statistical Traffic Classification (NSTC), which incorporates the techniques of noise elimination and reliability estimation into traffic classification. NSTC estimates the reliability of the remaining training data before it builds a robust traffic classifier. Through a number of traffic classification experiments on two real-world traffic data sets, the results show that the new NSTC scheme can effectively address the problem of mislabelled training data. Compared with the state-of-the-art methods, NSTC can significantly improve the classification performance in the context of big unclean data.

Index Terms—Traffic classification, cyber security, machine learning.

I. INTRODUCTION

TRAFFIC classification is a fundamental tool for modern cyber security management [1]. For example, network administrators can apply traffic classification technologies to obtain the current network status, in particular the critical applications, services and user behaviors such as daily usage and anomalous behaviors. Traffic classification is usually used to achieve quality of service (QoS), i.e., various applications are assigned different priorities with appropriate levels of Internet resources. For cyber security, traffic classification can aid in quickly detecting network intrusions [2] such as Denial of Service attacks [3]. In the last decade, the technology of traffic classification has drawn increasing attention from academia and practitioners. CISCO has incorporated the traffic classification technology into its recent network devices. The number of published research papers has increased dramatically after 2005 [5].

With the emergence of more and more new applications, the traditional traffic classification techniques are facing significant challenges. The classic technique identifies the origin application of network traffic according to the port number in the packet header, e.g., port 80 is associated with the HTTP application. The assumption is that each network application uses a distinct port number assigned by IANA. Unfortunately, many applications are using dynamic ports and even other applications' port numbers for certain reasons, which makes the port-based technique ineffective. To address these problems, the payload-based technique was proposed to apply deep packet inspection (DPI) to identify the applications according to specific content patterns in the payload of IP packets, called application signatures [4]. Today most business systems apply the payload-based technique to classify network traffic. However, the payload-based technique cannot deal with applications with encrypted payload and has a privacy issue caused by DPI. Recently, the research community has focused on the statistical traffic classification technique that does not inspect the content of packet payload. This technique extracts a set of statistical features from traffic flows and employs machine learning (ML) [6] for application identification. In a feature space, a traffic class consists of all traffic flows generated by an application (or a type of applications), and traffic classification becomes a classic multi-class classification problem.

Considering the real-world scenario of big traffic data, this paper addresses a new problem of mislabelled training samples in the area of statistical traffic classification. With more new applications emerging in our daily life, the composition of network traffic becomes more complex than ever. In practice, due to carelessness or lack of knowledge, mislabelled samples will be present in the training data. The existing methods do not consider the presence of such noisy data, so their classification performance is severely compromised. The major contributions of this work are:

• We develop a new system, Noise-resistant Statistical Traffic Classification (NSTC), to address the problem of mislabelled training samples.
• We propose a noise tolerant method to filter the noisy training samples so that the reliable training samples will be kept for training traffic classifiers.
• We present a mathematical proof to justify our approach and set up experiments for performance evaluation.

Performance evaluation of the NSTC scheme is carried out on two real-world Internet traffic data sets. The results show that NSTC significantly outperforms the state-of-the-art traffic classification methods in the context of big unclean data.

The remainder of the paper is structured as follows.

Manuscript received * *, *; revised * *, *. This work was supported by the National Natural Science Foundation of China (No. 61401371). (Corresponding author: Zili Zhang and Jun Zhang.)
B. Wang is with the College of Computer and Information Science & College of Software, Southwest University, Chongqing 400715, China. E-mail: wbf sm1989@163.com
J. Zhang, L. Pan, and Y. Xiang are with the School of Information Technology, Deakin University, Geelong, VIC 3216, Australia. E-mail: {jun.zhang, l.pan, yang}@deakin.edu.au.
Z. Zhang is with the College of Computer and Information Science & College of Software, Southwest University, Chongqing 400715, China, and the School of Information Technology, Deakin University, Geelong, VIC 3216, Australia. E-mail: zhangzl@swu.edu.cn.
D. Xia is with the School of Information Engineering, Guizhou Minzu University, Guiyang 550025, China. E-mail: gzmy xdw1982@163.com.

2332-7790 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Section II reviews related works on statistical traffic classification. Section III presents the details of noise-resistant statistical traffic classification. Section IV reports the experiments and results. Finally, Section V concludes this paper.

II. RELATED WORK

This section presents a review on statistical traffic classification that can address problems including dynamic ports, encrypted applications and protection of user privacy.

The supervised traffic classification methods produce a decision-making model employing supervised training data, where the supervised training data [7] are labelled according to different applications ahead of time. A classifier is trained in the feature space using the training data set and applied to classify new network traffic.

Many classical supervised algorithms have been applied to identify various network applications. These methods generally use sufficient supervised training data. In some early work, Naïve Bayes techniques [8] with kernel estimation and fast correlation-based filter were applied to leverage statistical features to address the problems that payload-based traffic classification suffers from. To solve the problem of requiring full packets/payloads for classification, Auld and Moore [9] employed the Bayesian neural network technique and used features derived from packet headers to obtain a high accuracy. Later, Este et al. [10] presented an approach that applied support vector machine (SVM) techniques to solve multi-class traffic classification and developed an optimization algorithm for the circumstance of few training samples.

For the problem of real-time classification, Hullár et al. [11] used only the first few bytes of the first few packets and employed a Markov model to recognize Peer-to-Peer (P2P) applications. Nguyen et al. [12] combined short sub-flows from the last N packets to improve the performance of real-time classification. Bermolen et al. [13] proposed a new methodology based on SVM to effectively identify P2P streaming applications in a short time. To address the problem that supervised methods are sensitive to the size of training data, Zhang et al. [14] incorporated correlated information and Bayes theory into the classification process. Glata et al. [15] proposed a classification scheme that does not require the training procedure. Their results show that the main sources of one-way traffic derive from malicious scanning, peer-to-peer applications and outages. Jin et al. [16] developed a lightweight modular architecture combining a couple of linear binary classifiers to improve the classification performance. Callado et al. [17] introduced a new set of methodologies for generic combination of traffic identification and provided a recommendation for using the combination algorithms. Xie et al. [18] adapted subspace clustering to identify the traffic of each application in isolation and improved the performance of one-class classification.

To reduce the redundant features, Liu et al. [36] proposed a class-oriented feature selection approach combining a proposed local metric and an existing global metric. The approach applies the weighted symmetric uncertainty strategy to remove the redundant features in each feature subset. Zhang et al. [37] proposed a hybrid feature selection approach, filtering most of the features with the WSU metric and using a wrapper method to select features for a specific classifier with the Area Under ROC Curve (AUC) metric. Furthermore, to overcome the impact of dynamic traffic flows on feature selection, they proposed a "Selects the Robust and Stable Features (SRSF)" algorithm based on the results achieved by WSU AUC. Ambusaidi et al. [38] demonstrated that the removal of redundant features significantly improved the intrusion detection rate. To identify both optimal and stable features, Fahad et al. [39] proposed a Global Optimization Approach (GOA), relying on a multi-criterion fusion-based feature selection technique and an information theoretic method. To characterise application behaviour at the early stage, Huang et al. [40] developed the statistical attributes of the first few application interaction rounds for each flow from an application layer perspective and proposed an "APPlication Round method (APPR)" algorithm to identify network application traffic.

To reduce the congestion and the high client-to-relay ratio, Al Sabah et al. [19] proposed to define classes of service for Tor's traffic and map each application class to its appropriate QoS requirement. When the characteristics of the network traffic change, the accuracy of classification will degrade. Wang et al. [20] proposed an adjustable traffic classification system using the ensemble classification technique and a change detection method to improve accuracy with relatively shorter updating time. Grimaudo et al. [21] pushed forward the adoption of behavioral classifiers by engineering a hierarchical classifier that allows proper classification of network traffic into more than twenty fine-grained classes. Nguyen et al. [22] used several statistics derived from sub-flows to achieve automated QoS management and augmented training data sets to maintain timely and continuous traffic classification. Jaber et al. [23] proposed a new online method that combines the statistical and host-based approaches in order to construct a robust and precise method for early Internet traffic identification.

To mitigate the problem that unlabelled traffic samples in the training data set affect the classification performance, some methods were proposed to automatically label the training data. Erman et al. [24] combined unsupervised and supervised methods for identifying applications. They first employed a clustering algorithm to partition a training data set that is composed of scarce labeled flows and abundant unlabelled flows. Second, they used the available labeled flows to obtain a mapping from the clusters to the different known classes. Li et al. [25] applied a semi-supervised SVM method to identify applications of network traffic. Their method only required a few labeled samples and improved the classification performance. Zhang et al. [5] proposed to extract the unknown traffic samples from mass network traffic and built a robust traffic classifier. Their experiments show reduced impact of unknown applications in the process of classification. Wang et al. [34] proposed to combine flow clustering based on application signatures. To solve the issue of mapping from flow clusters to real applications, Erman et al. [35] integrated a set of supervised training data with unsupervised learning. However, it was still difficult to map a great number of clusters


to real applications. Additionally, the process of mapping would produce a large proportion of "unknown" clusters. Chao et al. [26] utilized the first few packets of a traffic flow to detect small network traffic.

To compare the similarities between data samples, clustering algorithms have been used in [27]. McGregor et al. [28] extracted simple statistical characteristics of network traffic, performed EM clustering, and manually matched the resulting clusters with network applications. This approach was later extended by applying other clustering algorithms including K-means [29], DBSCAN [30] and Fuzzy C-means [31]. To avoid the manual matching of clusters and applications, Bernaille et al. [32] used a payload analysis tool and relied on the first five packets of a TCP connection to identify the clusters. In the case of online classification, the method could map available traffic to the nearest cluster according to the Euclidean distance. Erman et al. [33] compared the performance of three unsupervised clustering algorithms: K-means, DBSCAN and AutoClass. The empirical research shows that a high-quality result of recognizing network traffic can be obtained when the number of clusters is larger than the number of real applications.

In summary, the existing work related to network traffic classification can be used to construct real-time classifiers. These classifiers can achieve good performance if there are many clean training samples. However, a mechanism is needed to effectively handle mislabelled training samples. The next section will present such a mechanism based on the existing work.

III. PROPOSED SCHEME

As discussed in the above section, conventional statistical traffic classification methods assume that the training data are clean, with little consideration of noisy data. However, mislabelled samples often exist in the training data due to various reasons, which may significantly affect the classification performance. This section presents a Noise-resistant Statistical Traffic Classification (NSTC) scheme to address the problem of mislabelled training data.

A. System Model

Fig. 1 illustrates the system model of the proposed NSTC scheme. NSTC applies a novel approach to effectively combine noise elimination and reliability estimation into the classification process. In the phase of processing traffic flows, the system captures IP packets crossing a target network and constructs traffic flows according to the header information of the IP packets. For the classification purpose, a set of flow statistical features are extracted and discretized to represent the traffic flows. NSTC employs a set of classifiers to determine whether a training sample is noise or not. That is, a clean sample will be recognized by most classifiers as "clean" data, and a noisy sample will be recognized as "unclean" data. NSTC uses a weighting scheme to select the clean samples to be placed into the training data set. NSTC employs a robust traffic classifier which can be used for online traffic identification.

Fig. 1. NSTC system model.

B. Noise Identification and Noise Tolerant Classification

This step aims to remove the bad samples from the training set while conserving most of the good samples. Given the labelled training data {T1, ..., Tm} for m classes, we identify the mislabelled samples by combining multiple classification algorithms [41] and using a consensus filtering strategy.

We employ n classical learning algorithms with good performance in traffic classification. We feed each algorithm with the training data containing unclean samples to train n traffic classifiers {f1, ..., fn}. We nominate each classifier as a noise identifier. Let us take a classifier fi and a labeled flow x as an example to explain the noise identification:

  di = 1, if the prediction of fi(x) does NOT match x's label;
  di = 0, otherwise,                                          (1)

where di is the result of noise identification produced by classifier fi. Specifically, "1" indicates a positive result, i.e., flow x is identified as noise by classifier fi, and "0" indicates a negative result.

In practice, many mislabelled samples may not be identified by all classifiers. To further handle these noisy samples, we propose a new reliability estimation method, which estimates how likely a training sample is mislabelled. This likelihood is associated with a noise value assigned to each sample. The training data and their noise values will be used to build a noise tolerant classifier.

We apply an ensemble-based method to estimate the noise value for each training sample. A sum function is used to aggregate the prediction values {di} to determine the noise value. For a training sample x in the training data, we can obtain its noise value L as follows,

  L = ∑_{i=1}^{n} di, where 0 ≤ L ≤ n.                        (2)
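The consensus filtering of Eqs. (1) and (2) can be sketched as follows. This is an illustrative Python mock-up, not the paper's Java/WEKA implementation: the three base classifiers, the synthetic two-class "flow" data and the 10% label-flip rate are all assumptions made for the example.

```python
# Sketch of noise identification via consensus filtering (Eqs. (1)-(2)).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes stand in for flow-feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(4.0, 1.0, (200, 4))])
y = np.array([0] * 200 + [1] * 200)

# Mislabel 10% of the training samples to simulate noisy labels.
noisy = rng.choice(len(y), size=40, replace=False)
y_noisy = y.copy()
y_noisy[noisy] = 1 - y_noisy[noisy]

# n classifiers trained on the unclean data (Section III-B).
classifiers = [GaussianNB(), KNeighborsClassifier(5),
               DecisionTreeClassifier(max_depth=3)]
preds = np.array([clf.fit(X, y_noisy).predict(X) for clf in classifiers])

# d_i = 1 when classifier f_i disagrees with the given label (Eq. (1));
# the noise value L sums the d_i over all classifiers (Eq. (2)).
d = (preds != y_noisy).astype(int)   # shape (n, samples)
L = d.sum(axis=0)                    # 0 <= L <= n

# Consensus filtering: drop samples flagged as noise by every classifier.
kept = L < len(classifiers)
print(f"flagged by all classifiers: {(~kept).sum()} of {len(y)}")
```

With well-separated classes, the flipped samples receive high noise values L because all three classifiers tend to predict the true class rather than the corrupted label.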


Algorithm 1: Noise-resistant statistical traffic classification.
Require: mislabelled training data set {T1, ..., Tm}
Ensure: robust training data set T′
 1: T ← T1 ∪ ... ∪ Tm
 2: Based on the training set T, use n algorithms to train n classifiers, {f1(x), ..., fn(x)}
 3: for traffic class i = 1 to m do
 4:   for j = 1 to ||Ti|| do
 5:     Take a sample xj from Ti
 6:     Set L = 0
 7:     for classifier l = 1 to n do
 8:       if xj is identified as noise by fl(x) then
 9:         L++
10:       end if
11:     end for
12:     if L ≠ n then
13:       Put xj into training cluster CiL
14:     end if
15:   end for
16: end for
17: for k = 0 to n − 1 do
18:   Obtain the weight value wk
19:   Randomly select (wk × 100) percent of the samples from Cik and put the selected samples into T′
20: end for

Fig. 2. Performance Benefit Illustration.

The value of L indicates how many classifiers identify this sample as noise. For each traffic class, we group the training data into n training clusters according to their noise values. The samples in the same training cluster have the same noise value,

  Ti = Ci0 ∪ Ci1 ∪ ... ∪ Ci(n−1).                             (3)

For example, Cik consists of the training samples with the noise value of k in the i-th traffic class. Different training clusters have different noise levels, which are determined by k. Considering the different noise levels, we propose a function to compute the weights that will be used for training a robust classifier,

  wk = ((n − k)/n)^k,                                          (4)

where wk is determined by the noise level of a training cluster. We sort the training data according to the weights and train a robust classifier for traffic classification. Specifically, for the i-th traffic class:

1) From each training cluster Cik, randomly select hk samples such that

  hk = ||Cik|| × wk;                                           (5)

2) Merge the selected samples from all training clusters to form the new training set Ti′.

For example, if wk = 1, we will select all samples from the corresponding training cluster. These samples have very high reliability because each sample in this cluster is identified by all classifiers as non-noise. In another example, if wk = 0.01, we will select 1 percent of the samples from the training cluster, because the majority of the samples in this set should be treated as noisy data. If wk = 0, we will select none of the samples from this cluster, because this cluster contains only noise identified by all the classifiers.

This likelihood estimation method can effectively alleviate the noise influence through the noise tolerant process of combining reliability estimation and training sampling. The result is that reliable training samples properly contribute more than the less reliable samples to the training procedure. To the best of the authors' knowledge, this is the first method to use a sampling strategy to address the problem of mislabelled training data. Finally, we apply the Random Forest (RF) algorithm to build a robust classification model with the new training data set. RF has demonstrated good generalisation capability and excellent classification performance in the previous work on network traffic classification [5].

C. Theoretical Benefit Justification

We study the performance benefit of the new NSTC scheme through a theoretical analysis in a binary classification case. Our theoretical analysis is based on Bayesian theory and statistical theory, and can be extended to multi-class classification in a straightforward manner.

Suppose that we have two classes of network traffic — class I and class II. Since this is the first time NSTC is investigated from a theoretical point of view, normal distributions are used in this study,

  p0(x) ∼ N(μ0, σ0²)
  p1(x) ∼ N(μ1, σ²)
  p2(x) ∼ N(μ2, σ²).

p0(x) represents the probability density function (PDF) of class II. p1(x) represents the PDF of class I, which is estimated by using the unclean training data. p2(x) represents the PDF of class I, which is estimated after using our proposed NSTC. The sizes of the two classes are similar. The curve of p1(x) is closer to p0(x) than p2(x) because some samples of class II are mislabelled to class I. To simplify the analysis, we assume that p1(x) and p2(x) have the same variance, σ². Fig. 2 illustrates the PDF plots.
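The weighting and sampling steps (Eqs. (4)-(5), lines 17-19 of Algorithm 1) can be sketched as below. The cluster contents are hypothetical, and the helper names (`weight`, `sample_training_set`) are ours, not from the paper.

```python
# Sketch of the reliability weighting and sampling step of NSTC.
import random

def weight(k: int, n: int) -> float:
    """Eq. (4): w_k = ((n - k) / n) ** k for a cluster with noise value k."""
    return ((n - k) / n) ** k

def sample_training_set(clusters, n, seed=0):
    """Build the robust training set T' by drawing h_k = ||C_ik|| * w_k
    samples from each noise cluster C_ik (Eq. (5))."""
    rng = random.Random(seed)
    selected = []
    for k, cluster in enumerate(clusters):   # k = 0 .. n-1
        h_k = round(len(cluster) * weight(k, n))
        selected.extend(rng.sample(cluster, h_k))
    return selected

n = 3  # number of base classifiers
# Clusters C_i0..C_i2 for one traffic class (noise value = # classifiers flagging).
clusters = [list(range(100)), list(range(100, 130)), list(range(130, 140))]

print([round(weight(k, n), 3) for k in range(n)])   # [1.0, 0.667, 0.111]
robust = sample_training_set(clusters, n)
print(len(robust))                                  # 100 + 20 + 1 = 121
```

Note how the weight decays quickly with the noise value k: a cluster flagged by no classifier is kept whole, while clusters flagged by more classifiers contribute only a small random fraction.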


The goal of this analysis is to show that we can reduce the classification error rate by removing the mislabelled training data. According to Bayesian theory, for the distributions p0(x) and p1(x), the optimal decision point is at x1, where

  p0(x1) = p1(x1).                                             (6)

The probability of the minimum classification error is

  Pe1 = ∫_{−∞}^{x1} p0(t) dt + ∫_{x1}^{+∞} p1(t) dt
      = 1 + Φ((x1 − μ0)/σ0) − Φ((x1 − μ1)/σ),                  (7)

where Φ(x) is the cumulative distribution function (CDF) of the standard normal distribution,

  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.                    (8)

Similarly, for the distributions p0(x) and p2(x), the optimal decision point is at x2, where

  p0(x2) = p2(x2).                                             (9)

The probability of the minimum classification error is

  Pe2 = 1 + Φ((x2 − μ0)/σ0) − Φ((x2 − μ2)/σ).                  (10)

The difference between Pe1 and Pe2 is

  Pe1 − Pe2 = [Φ((x1 − μ0)/σ0) − Φ((x2 − μ0)/σ0)]
            + [Φ((x2 − μ2)/σ) − Φ((x1 − μ1)/σ)].               (11)

We would like to show that the difference between Pe1 and Pe2 is positive. We can see that (Pe1 − Pe2) has two components according to Equation (11). We then need to show that each part is positive.

Let us check the first part of (Pe1 − Pe2). In this case study, the graph of p1(x) is closer to p0(x) than p2(x), as shown in Fig. 2, so we can obtain the relative position of the two decision points, i.e.,

  x1 > x2.                                                     (12)

Considering σ0 > 0, we have

  (x1 − μ0)/σ0 > (x2 − μ0)/σ0.                                 (13)

Since the CDF, Φ(x), is a monotonically increasing function, we obtain

  Φ((x1 − μ0)/σ0) > Φ((x2 − μ0)/σ0).                           (14)

The above formula is equivalent to

  Φ((x1 − μ0)/σ0) − Φ((x2 − μ0)/σ0) > 0.                       (15)

That is, we have proven that the first part of (Pe1 − Pe2) is positive.

Now, we work on the second part of (Pe1 − Pe2). In this case, as shown in Fig. 2, the graph of p2(x) can be treated as the horizontal translation of the graph of p1(x) to the left by (μ1 − μ2) units. Therefore, ∃α that satisfies the following two equations,

  α − x2 = μ1 − μ2,                                            (16)
  p1(α) = p2(x2).                                              (17)

When x < μ0, p0(x) is a monotonically increasing function. In this case, both x1 and x2 lie to the left of μ0, so that x1 < μ0 and x2 < μ0 hold. According to Equation (12), x1 > x2, so we can obtain

  p0(x1) > p0(x2).                                             (18)

From Equations (6), (9) and (18), we get

  p1(x1) > p2(x2).                                             (19)

Based on Equations (17) and (19), we have

  p1(x1) > p1(α).                                              (20)

As shown in Fig. 2, x1 and α lie to the right of μ1, i.e., x1 > μ1 and α > μ1. When x > μ1, p1(x) is a monotonically decreasing function. As a result, we obtain

  x1 < α.                                                      (21)

From Equations (16) and (21), we get

  x1 − x2 < μ1 − μ2.                                           (22)

We then reorganize Equation (22) to obtain

  x2 − μ2 > x1 − μ1.                                           (23)

Because σ > 0, we have

  (x2 − μ2)/σ > (x1 − μ1)/σ.                                   (24)

Since Φ(x) is a monotonically increasing function, we obtain

  Φ((x2 − μ2)/σ) − Φ((x1 − μ1)/σ) > 0.                         (25)

Now, we have proven that the second part of (Pe1 − Pe2) is positive.

Finally, from Equations (11), (15) and (25), we obtain

  Pe1 − Pe2 > 0,                                               (26)

that is,

  Pe2 < Pe1.                                                   (27)

This means that the classification error when using NSTC is less than that of the original classification method. The reason is that NSTC can address the problem of mislabelled training data.

IV. PERFORMANCE EVALUATION

To evaluate the correctness of the NSTC method, we choose to use two network traffic data sets for experiments and compare the performance of the NSTC method and other traffic classification methods. Our results show that the NSTC method is resilient to mislabelled data.
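As a numeric sanity check of this derivation (not part of the paper's proof), the sketch below locates the decision points of Eqs. (6) and (9) by bisection and evaluates the error probabilities of Eqs. (7) and (10). The Gaussian parameters are arbitrary choices satisfying μ2 < μ1 < μ0, i.e., p1 closer to p0 than p2.

```python
# Numeric check of Section III-C: removing label noise shifts class I's
# estimated PDF away from class II and lowers the minimum classification error.
import math

def phi(x):
    """Standard normal CDF, Eq. (8), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def decision_point(mu0, sigma0, mu, sigma):
    """Bisection for the density crossing between mu and mu0 (Eqs. (6), (9))."""
    lo, hi = mu, mu0
    for _ in range(100):
        mid = (lo + hi) / 2
        if pdf(mid, mu, sigma) > pdf(mid, mu0, sigma0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def min_error(mu0, sigma0, mu, sigma):
    """Eqs. (7) and (10): Pe = 1 + Phi((x*-mu0)/sigma0) - Phi((x*-mu)/sigma)."""
    x = decision_point(mu0, sigma0, mu, sigma)
    return 1.0 + phi((x - mu0) / sigma0) - phi((x - mu) / sigma)

mu0, sigma0 = 4.0, 1.0            # class II
mu1, mu2, sigma = 1.5, 1.0, 1.0   # class I: before / after noise removal

pe1 = min_error(mu0, sigma0, mu1, sigma)
pe2 = min_error(mu0, sigma0, mu2, sigma)
print(f"Pe1 = {pe1:.4f}, Pe2 = {pe2:.4f}")   # Pe2 < Pe1, matching Eq. (27)
```

With equal variances the densities cross at the midpoint of the two means, so the decision point moves left after noise removal and the overlap area, i.e., the error, shrinks.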


Fig. 3. Network class distributions of the two data sets: (a) ToN data set, (b) ISP data set.

A. Data and Experiments TABLE I


T HE 20 STATISTICAL FEATURES OF NETWORK FLOWS TRANSFERRED
To conduct network traffic classification experiments, we BETWEEN SERVER AND CLIENT [5], [14]
choose to use two publicly available and popular data sets
— ToN data set [5] and ISP data set [14]. The ToN data set Type of features Feature description Number
[5] is made of multiple heterogeneous real-world traces, with Packets Number of packets transferred in 2
each direction
the intention of minimizing the data bias. The ToN data set
Bytes Volume of bytes transferred in 2
consists of approximately 638,000 traffic flows scattered in 11 each direction
major traffic classes. The ISP data set [14] is a trace captured Packet Size Min., Max., Mean and Std Dev. 8
using a passive probe at a 100-Mbps Ethernet edge link from of packet size in each direction
an Internet service provider located in Australia. Full packet Inter-Packet Min., Max., Mean and Std Dev. 8
payloads are preserved in the collection without any filtering Time of packet time in each direction
Total 20
or packet loss. The ISP data set consists of approximately
200,000 flows randomly sampled from 11 major classes. Fig. 3 shows the network class distribution of the ToN and ISP data sets. Many network traffic classes are the same across the two data sets, such as HTTP, BT, SMTP, POP3, SSH, SSL3, DNS and FTP.

To represent the traffic flows, 20 statistical features are extracted from the traffic flows [5], [14], as listed in TABLE I. The traffic flows can be divided into two parts according to the network flow directions, that is, client-to-server and server-to-client. Then, we use feature selection [42] to remove irrelevant and redundant features. Due to the different natures of the data, we obtained different features for the two data sets:

• For the ToN data set, the selected nine features are the client-to-server number of packets, client-to-server maximum packet bytes, client-to-server minimum packet bytes, client-to-server average packet bytes, the standard deviation of client-to-server packet bytes, client-to-server minimum inter-packet time, server-to-client number of packets, server-to-client maximum packet bytes, and server-to-client minimum packet bytes.
• For the ISP data set, the selected six features are the client-to-server maximum packet bytes, client-to-server minimum inter-packet time, server-to-client number of packets, server-to-client maximum packet bytes, the standard deviation of server-to-client packet bytes, and server-to-client minimum inter-packet time.

Each data set is partitioned into two non-overlapping parts: the training pool and the testing pool. During each experiment, we randomly selected 5% of the data from the training pool to construct the training set for classifier training. To simulate the problem of mislabelled training data, we randomly selected a certain number of training samples and reassigned them to incorrect class labels. Consequently, the training pool consists of clean data and noisy data. We train the classifiers with data in the training pool and test the performance using the data in the testing pool. We compare the proposed NSTC method to four state-of-the-art traffic classification methods, which are suggested by the previous work [5], [14], [43]. All methods are implemented in Java with the WEKA software suite [44]. We use two common metrics to measure the classification performance, overall accuracy and F-measure [14], [43]:

• Overall accuracy is the ratio of the number of correctly classified flows to the total number of testing flows. It measures the accuracy of a classifier on the whole testing data.
• F-measure is calculated by

    F-measure = (2 × precision × recall) / (precision + recall),    (28)

where precision is the ratio of correctly classified flows over all predicted flows in a class, and recall is the ratio of correctly classified flows over all ground-truth flows in a class. F-measure indicates the per-class performance.

B. Results and Analysis

To comprehensively evaluate the proposed NSTC scheme, we compare the obtained experiment results in terms of overall performance and per-class performance.
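Both metrics can be computed directly from predicted and ground-truth flow labels. The short Python sketch below is an illustration only (the paper's experiments are implemented in Java with WEKA); it computes the overall accuracy and the per-class F-measure of Eq. (28) on a toy set of labels:

```python
def overall_accuracy(y_true, y_pred):
    """Ratio of correctly classified flows to all testing flows."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f_measure(y_true, y_pred, cls):
    """Per-class F-measure as in Eq. (28): 2PR / (P + R)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)  # flows predicted into cls
    actual = sum(t == cls for t in y_true)     # ground-truth flows of cls
    if tp == 0:
        return 0.0
    precision = tp / predicted
    recall = tp / actual
    return 2 * precision * recall / (precision + recall)

# Toy example with three flow classes.
y_true = ["HTTP", "HTTP", "DNS", "FTP", "DNS", "HTTP"]
y_pred = ["HTTP", "DNS",  "DNS", "FTP", "DNS", "HTTP"]
print(overall_accuracy(y_true, y_pred))                  # 5 of 6 correct
print(round(f_measure(y_true, y_pred, "HTTP"), 3))
```

Note that precision is taken over all flows predicted into the class and recall over all ground-truth flows of the class, matching the definitions above.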


[Fig. 4. Overall-accuracy vs noise-ratio: overall accuracy (%) of SVM, NN, RF and NSTC for 20–50% mislabelled training data; (a) ToN data set, (b) ISP data set.]

[Fig. 5. Overall-accuracy vs training-size: overall accuracy (%) of SVM, NN, RF and NSTC for 10–100% of the pre-organised training set; (a) ToN data set, (b) ISP data set.]

To observe the impact of noisy samples, we vary the number of noisy samples with respect to a fixed number of training samples, and the number of training samples with a fixed number of noisy samples, respectively. We report the average results over 100 iterations of our experiments.

1) Overall Performance: The overall performance is evaluated in terms of average accuracy against various sizes of noisy data and training data. Fig. 4 shows that an increasing density of noisy samples negatively affects the overall classification accuracy on the two data sets; Fig. 5 shows that an increasing number of training samples with a fixed number of mislabelled samples positively improves the overall classification accuracy on the two data sets. According to our experiment results, the NSTC scheme outperforms the well-known classifiers including the support vector machine (SVM), nearest neighbour (NN) and random forest (RF) [5], [14], [43].

In Fig. 4, our proposed NSTC scheme consistently outperforms the other methods as the portion of noisy samples in the training data increases. The overall accuracy of the NSTC scheme is higher than that of the second best method, RF, by 3 to 17 percent for the ToN data set and 2 to 7 percent for the ISP data set. For instance, when the training data have 40 percent noise, the overall accuracy of NSTC is higher than that of RF by 10 percent for the ToN data set and by 7 percent for the ISP data set, respectively. RF is the second best method, which is significantly better than SVM and NN. The results show that the NSTC method can effectively improve the classification accuracy by aggregating the techniques of eliminating and tolerating noise. In addition, the trend shows a decline of the overall accuracy when the noise ratio increases. The gap between the NSTC method and the other methods becomes bigger on the two data sets as the noise ratio increases. These results show that the proposed NSTC method has superior performance in the presence of a high noise percentage.

With respect to different training data sizes, as shown in Fig. 5, the NSTC method is also superior to the other methods on the two data sets. We change the training data size and keep the same noise ratio by randomly selecting samples from the pre-organised training set. The size varies from 10 to 100 percent of the pre-organised training set.
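The noise-injection protocol described above (flip a given fraction of training labels to a random wrong class, train, measure the test accuracy, and average over repeated runs) can be sketched as follows. The nearest-centroid learner and the synthetic one-dimensional data are illustrative stand-ins, not the WEKA classifiers or traffic features used in the paper:

```python
import random

def inject_label_noise(labels, classes, ratio, rng):
    """Relabel a `ratio` fraction of randomly chosen samples with a wrong class."""
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), int(ratio * len(noisy))):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Train a tiny 1-D nearest-centroid classifier and return its test accuracy."""
    centroids = {c: sum(x for x, y in zip(train_x, train_y) if y == c) /
                    train_y.count(c) for c in set(train_y)}
    predictions = [min(centroids, key=lambda c: abs(x - centroids[c]))
                   for x in test_x]
    return sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)

rng = random.Random(0)
# Synthetic 1-D "flow features": class A clustered near 0, class B near 10.
train_x = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]
train_y = ["A"] * 100 + ["B"] * 100
test_x = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(10, 1) for _ in range(50)]
test_y = ["A"] * 50 + ["B"] * 50

results = {}
for ratio in (0.2, 0.5):
    runs = [nearest_centroid_accuracy(
                train_x, inject_label_noise(train_y, ("A", "B"), ratio, rng),
                test_x, test_y)
            for _ in range(100)]            # average over 100 repetitions
    results[ratio] = sum(runs) / len(runs)
    print(f"noise ratio {ratio:.0%}: mean accuracy {results[ratio]:.3f}")
```

Even this toy learner degrades sharply once half of the labels are flipped, which is the regime in which the paper's comparisons are run.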


[Fig. 6. F-measure on the ToN data set: F-measure (%) of SVM, NN, RF and NSTC for 20–50% mislabelled training data, per class; (a) FTP, (b) HTTP, (c) IMAP, (d) POP3, (e) RAZOR, (f) SSH, (g) SSL3, (h) BT, (i) DNS, (j) SMTP, (k) SMALL.]

The overall accuracy of the NSTC method is consistently higher than that of the second best method, RF, by up to 9 percent for the ToN data set and 10 percent for the ISP data set, respectively. RF's performance is much higher than that of SVM and NN. For instance, when the size of the training data reaches 70 percent of the pre-organised training set, the overall accuracy of NSTC is higher than that of the second best method, RF, by approximately 8 percent for the ToN data set and 10 percent for the ISP data set. The results confirm the effectiveness of the proposed NSTC method, which is robust against changes in training data size. The overall accuracy of all classifiers increases when more training data are used. In particular, increasing the training size can immediately improve the overall accuracy of the NSTC method in the presence of unclean training samples. For example, when the training size reaches 60 percent, the overall accuracy of the NSTC method is improved by 15 percent for the ToN data set and by 20 percent for the ISP data set, respectively. On the contrary, there is inconsistent and sometimes little improvement for the other three methods when the training data are increased.


[Fig. 7. F-measure on ISP data set: F-measure (%) of SVM, NN, RF and NSTC for 20–50% mislabelled training data, per class; (a) BT, (b) DNS, (c) FTP, (d) HTTP, (e) IMAP, (f) MSN, (g) POP3, (h) SMTP, (i) SSH, (j) SSL3, (k) XMPP.]

2) Per-Class Performance: We use F-measure to assess the per-class performance of traffic classification. We compare the performance of the NSTC scheme with the other methods, including SVM, NN and RF. To evaluate the impact of mislabelled training samples, we change the noise ratio of the training data set from 20 percent to 50 percent.

Fig. 6 shows the F-measure of each class in the ToN data set. The performance of NSTC is consistently better than that of the other three methods, no matter what the noise ratio is. For example, SSH is the easiest class for the conventional methods, but NSTC can further improve the performance by 4 to 12 percent over that of the second best method, RF. FTP is not easy to classify for the conventional methods, but NSTC can improve its F-measure by up to 20 percent. Though the SMALL class cannot be identified easily by NSTC, NSTC's performance is still the highest. For the POP3 and SSH classes, NSTC's F-measure can reach over 90 percent and is not significantly affected by the noise ratio. The results demonstrate that the proposed NSTC method can improve the F-measure of all classes in the ToN data set.
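NSTC's per-class gains stem from eliminating and tolerating mislabelled samples. The paper's exact elimination and reliability-estimation procedures are not reproduced here, but the general family they belong to, classification filtering, is easy to sketch: partition the training data into folds, train a classifier on the other folds, and flag samples whose labels disagree with the classifier's prediction as suspected noise. The tiny nearest-centroid learner and the synthetic data below are hypothetical stand-ins:

```python
import random

def nearest_centroid_fit(xs, ys):
    """Return a 1-D nearest-centroid predictor trained on (xs, ys)."""
    centroids = {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
                 for c in set(ys)}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def filter_noise(xs, ys, folds=3, rng=None):
    """Classification filter: keep only samples whose label agrees with a
    classifier trained on the remaining folds."""
    rng = rng or random.Random(0)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    keep = set()
    for f in range(folds):
        held_out = idx[f::folds]
        train = [i for i in idx if i not in held_out]
        predict = nearest_centroid_fit([xs[i] for i in train],
                                       [ys[i] for i in train])
        keep.update(i for i in held_out if predict(xs[i]) == ys[i])
    return sorted(keep)

rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(60)] + [rng.gauss(10, 1) for _ in range(60)]
ys = ["A"] * 60 + ["B"] * 60
for i in rng.sample(range(120), 30):       # mislabel 25% of the training data
    ys[i] = "B" if ys[i] == "A" else "A"

clean_idx = filter_noise(xs, ys)
flagged = len(xs) - len(clean_idx)
print(f"flagged {flagged} of {len(xs)} samples as suspected noise")
```

On well-separated classes such a filter flags almost exactly the injected mislabelled samples; NSTC additionally estimates the reliability of the remaining samples rather than trusting the filter outright.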


That is, the techniques of eliminating and tolerating noise can effectively address the problem of mislabelled training data from the per-class perspective.

Fig. 7 shows the F-measure of each class in the ISP data set. NSTC consistently achieves the highest F-measure for each class. Here, we divide all the network classes into three categories, i.e., easy, average and hard, according to the performance of the conventional methods:

• BT, POP3, SMTP and SSH are in the easy category. Although the space for improvement is small, NSTC can further improve the performance by up to 15 percent. For instance, the improvement is 12 percent for POP3 when the training data include 50 percent noise.
• DNS, HTTP, IMAP, MSN, SSL3 and XMPP belong to the average category, where NSTC significantly improves the F-measure. For example, for XMPP, the F-measure of NSTC is higher than that of the second best method, RF, by about 20 percent when the noise ratio is 25 percent.
• FTP is in the hard category, where the F-measure of the conventional methods is much lower than 50 percent. NSTC can still improve its performance by about 5 percent.

Hence, the above results confirm the effectiveness of the proposed NSTC method.

3) Further Evaluation: The use of correlated traffic flows can further improve the traffic classification performance [14]. We are inspired by the idea of Traffic Classification using Correlation (TCC) [14] to incorporate traffic flow correlation into NSTC. To evaluate the effectiveness of the improved NSTC, we conduct a number of tests and compare the results of the improved NSTC and TCC in the presence of unclean training samples.

[Fig. 8. Overall Accuracy of TCC vs NSTC: overall accuracy (%) of TCC and the improved NSTC for 20–50% mislabelled training data on the ToN data set.]

Fig. 8 depicts the overall accuracy of NSTC and TCC on the ToN data set. Please note that the results on the ISP data set are similar. The results show that the improved NSTC has much better overall accuracy than TCC. For instance, when the training data contain 30 percent noise, the overall accuracy of NSTC is higher than that of TCC by 10 percent. When the percentage of mislabelled data is increased to 50 percent, TCC's accuracy drops to about 75%, while NSTC's accuracy is still over 80%. The results suggest that TCC struggles to handle mislabelled training data and that flow correlation can be used to further improve NSTC's performance.

Fig. 10 reports the F-measure of NSTC and TCC on the ToN data set. The F-measure of NSTC significantly outperforms that of TCC in each class, no matter what the noise ratio is. For example, in the RAZOR class, NSTC's F-measure is higher than TCC's by about 20 percent. In the DNS class, the difference is about 15 percent. In some classes, such as HTTP, SSL3 and SMTP, TCC's F-measure is very good, but NSTC can further improve it. NSTC delivers consistent and reliable results, which confirms its capability of addressing mislabelled training data.

[Fig. 9. Computation time: classification time (s) of SVM, NN and NSTC for 20–50% mislabelled training data.]

We also evaluate the classification time of three methods, SVM, NN and NSTC. The comparative results of classification time are listed in Fig. 9. The listed computation time excludes the time used during the preprocessing step, when the noisy samples are injected into the training data. In this comparison, we focus on classification time, which is crucial for online traffic classification. As shown in Fig. 9, NSTC is significantly faster than SVM and NN. More specifically, NSTC extends an RF classifier for traffic classification, and the noise ratio does not affect the classification procedure. Therefore, NSTC has high efficiency and is suitable for online traffic classification.

V. CONCLUSION

This paper presented a real-world challenge: network traffic classification struggles to perform well in the presence of mislabelled samples in the training data. That is, when mislabelled traffic samples are present, conventional traffic classification methods cannot sustain their performance. We proposed a novel traffic classification method, noise-resistant statistical traffic classification (NSTC), which can identify noisy examples and tolerate suspected noisy samples. We provided an empirical and theoretical study to demonstrate the performance benefit of the new NSTC method compared to the existing methods.


[Fig. 10. F-measure of TCC vs NSTC: F-measure (%) of TCC and the improved NSTC for 20–50% mislabelled training data on the ToN data set, per class; (a) FTP, (b) HTTP, (c) IMAP, (d) POP3, (e) RAZOR, (f) SSH, (g) SSL3, (h) BT, (i) DNS, (j) SMTP, (k) SMALL.]

The experiments and results show that NSTC delivers consistently superior performance to other traffic classification schemes in the presence of unclean training data.

REFERENCES

[1] T. T. T. Nguyen and G. Armitage, "A survey of techniques for Internet traffic classification using machine learning," IEEE Commun. Surveys Tuts., vol. 10, no. 4, pp. 56–76, 4th Quart., 2008.
[2] S. Mavoungou, G. Kaddoum, M. Taha, and G. Matar, "Survey on threats and attacks on mobile networks," IEEE Access, vol. 4, pp. 4543–4572, 2016.
[3] Y. Xiang, W. Zhou, and M. Guo, "Flexible deterministic packet marking: An IP traceback system to find the real source of attacks," IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 4, pp. 567–580, Apr. 2009.
[4] M. A. Ashraf, H. Jamal, S. A. Khan, Z. Ahmed, and M. I. Baig, "A heterogeneous service-oriented deep packet inspection and analysis framework for traffic-aware network management and security systems," IEEE Access, vol. 4, pp. 5918–5936, 2016.
[5] J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, "Robust network traffic classification," IEEE/ACM Trans. Netw., vol. 23, no. 4, pp. 1257–1270, Aug. 2015.
[6] S. Tatinati, K. C. Veluvolu, and W. T. Ang, "Multistep prediction of physiological tremor based on machine learning for robotics assisted microsurgery," IEEE Trans. Cybern., vol. 45, no. 2, pp. 328–339, Feb. 2015.


[7] D. Kelly and B. Caulfield, "Pervasive sound sensing: A weakly supervised training approach," IEEE Trans. Cybern., vol. 46, no. 1, pp. 123–135, Jan. 2015.
[8] A. W. Moore and D. Zuev, "Internet traffic classification using Bayesian analysis techniques," in Proc. SIGMETRICS, vol. 33, pp. 50–60, Jun. 2005.
[9] T. Auld, A. Moore, and S. Gull, "Bayesian neural networks for Internet traffic classification," IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 223–239, Jan. 2007.
[10] A. Este, F. Gringoli, and L. Salgarelli, "Support vector machines for TCP traffic classification," Comput. Netw., vol. 53, no. 14, pp. 2476–2490, 2009.
[11] B. Hullár, S. Laki, and A. Gyorgy, "Early identification of peer-to-peer traffic," in Proc. IEEE Int. Conf. Commun., pp. 1–6, 2011.
[12] T. T. T. Nguyen and G. Armitage, "Training on multiple sub-flows to optimize the use of machine learning classifiers in real-world IP networks," in Proc. 31st IEEE Conf. Local Comput. Netw., pp. 369–376, 2006.
[13] P. Bermolen, M. Mellia, M. Meo, D. Rossi, and S. Valenti, "Abacus: Accurate behavioral classification of P2P-TV traffic," Comput. Netw., vol. 55, no. 6, pp. 1394–1411, 2011.
[14] J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, "Network traffic classification using correlation information," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 1, pp. 104–117, Jan. 2013.
[15] E. Glatz and X. Dimitropoulos, "Classifying Internet one-way traffic," in Proc. ACM SIGMETRICS/PERFORMANCE Joint Int. Conf. Meas. Model. Comput. Syst., pp. 417–418, 2012.
[16] Y. Jin, N. Duffield, J. Erman, P. Haffner, S. Sen, and Z.-L. Zhang, "A modular machine learning system for flow-level traffic classification in large networks," ACM Trans. Knowl. Discov. Data, vol. 6, no. 1, pp. 4:1–4:34, Mar. 2012.
[17] A. Callado, J. Kelner, D. Sadok, C. Alberto Kamienski, and S. Fernandes, "Better network traffic identification through the independent combination of techniques," J. Netw. Comput. Appl., vol. 33, no. 4, pp. 433–446, 2010.
[18] G. Xie, M. Iliofotou, R. Keralapura, M. Faloutsos, and A. Nucci, "SubFlow: Towards practical flow-level traffic classification," in Proc. IEEE INFOCOM, pp. 2541–2545, 2012.
[19] M. AlSabah, K. Bauer, and I. Goldberg, "Enhancing Tor's performance using real-time traffic classification," in Proc. ACM Conf. Computer and Comm. Security, pp. 73–84, 2012.
[20] R. Wang, L. Shi, and B. Jennings, "Ensemble classifier for traffic in presence of changing distributions," in Proc. IEEE Symposium Comput. Commun., pp. 629–635, 2013.
[21] L. Grimaudo, M. Mellia, and E. Baralis, "Hierarchical learning for fine grained Internet traffic classification," in Proc. IEEE Int. Conf. IWCMC, pp. 463–468, 2012.
[22] T. T. T. Nguyen, G. Armitage, P. Branch, and S. Zander, "Timely and continuous machine-learning-based classification for interactive IP traffic," IEEE/ACM Trans. Netw., vol. 20, no. 6, pp. 1880–1894, Dec. 2012.
[23] M. Jaber, R. G. Cascella, and C. Barakat, "Using host profiling to refine statistical application identification," in Proc. IEEE INFOCOM, pp. 2746–2750, 2012.
[24] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, "Semi-supervised network traffic classification," SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, pp. 369–370, 2007.
[25] X. Li, F. Qi, D. Xu, and X. Qiu, "An Internet traffic classification method based on semi-supervised support vector machine," in Proc. IEEE ICC, pp. 1–5, 2011.
[26] S. C. Chao, K. C. J. Lin, and M. S. Chen, "Flow classification for software-defined data centers using stream mining," IEEE Trans. Services Computing, vol. PP, no. 99, pp. 1–1, 2016.
[27] C. L. Liu, W. H. Hsaio, C. H. Lee, and F. S. Gou, "Semi-supervised linear discriminant clustering," IEEE Trans. Cybern., vol. 44, no. 7, pp. 989–1000, Jul. 2014.
[28] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, "Flow clustering using machine learning techniques," in Proc. PAM Workshop, France, pp. 205–214, Apr. 2004.
[29] J. Erman, A. Mahanti, and M. Arlitt, "Internet traffic identification using machine learning," in Proc. 49th IEEE GLOBECOM, pp. 1–6, Dec. 2006.
[30] J. Erman, M. Arlitt, and A. Mahanti, "Traffic classification using clustering algorithms," in Proc. ACM SIGCOMM Workshop, pp. 281–286, 2006.
[31] D. Liu and C. Lung, "P2P traffic identification and optimization using fuzzy c-means clustering," in Proc. IEEE Int. Conf. Fuzzy Syst., pp. 2245–2252, 2011.
[32] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, "Traffic classification on the fly," SIGCOMM Comput. Commun. Rev., vol. 36, no. 2, pp. 23–26, Apr. 2006.
[33] J. Erman, M. Arlitt, and A. Mahanti, "Traffic classification using clustering algorithms," in Proc. ACM SIGCOMM, pp. 281–286, 2006.
[34] Y. Wang, Y. Xiang, and S.-Z. Yu, "An automatic application signature construction system for unknown traffic," Concurrency Computat. Pract. Exper., vol. 22, pp. 1927–1944, 2010.
[35] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, "Offline/realtime traffic classification using semi-supervised learning," Perform. Eval., vol. 64, no. 9, pp. 1194–1213, Oct. 2007.
[36] Z. Liu, R. Wang, M. Tao, and X. Cai, "A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion," Neurocomputing, vol. 168, pp. 365–381, 2015.
[37] H. Zhang, G. Lu, M. T. Qassrawi, Y. Zhang, and X. Yu, "Feature selection for optimizing traffic classification," Comput. Commun., vol. 35, no. 12, pp. 1457–1471, 2012.
[38] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, "Building an intrusion detection system using a filter-based feature selection algorithm," IEEE Trans. Computers, vol. 65, no. 10, pp. 2986–2998, Oct. 2016.
[39] A. Fahad, Z. Tari, I. Khalil, A. Almalawi, and A. Y. Zomaya, "An optimal and stable feature selection approach for traffic classification based on multi-criterion fusion," Future Gen. Comput. Syst., vol. 36, pp. 156–169, 2014.
[40] N. F. Huang, G. Y. Jai, H. C. Chao, Y. J. Tzang, and H. Y. Chang, "Application traffic classification at the early stage by characterizing application rounds," Inform. Sci., vol. 232, pp. 130–142, 2013.
[41] V. Soto, S. García-Moratilla, G. Martínez-Muñoz, D. Hernández-Lobato, and A. Suárez, "A double pruning scheme for boosting ensembles," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2682–2695, Dec. 2014.
[42] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learning Res., vol. 3, pp. 1157–1182, 2003.
[43] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, "Internet traffic classification demystified: Myths, caveats, and the best practices," in Proc. ACM CoNEXT, pp. 1–12, 2008.
[44] Weka 3: Data mining software in Java. http://www.cs.waikato.ac.nz/ml/weka
