A Mobile Messaging Apps: Service Usage Classification To Internet Traffic and Encryption System

IDL - International Digital Library Of
Technology & Research

Volume 1, Issue 2, Mar 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
A Mobile Messaging Apps: Service Usage

Classification to Internet Traffic and
Encryption System
Ms. SHRUTHI N 1, Dr. K THIPPESWAMY 2
Dept. of Computer Science
1
MTech, Student VTU PG Center, Mysuru, India
2
Guide & Head of Department VTU PG Center, Mysuru, India
SURVEY PAPER 2 INTRODUCTION:

1 ABSTRACT: The rapid adoption of mobile Recent years have witnessed the increased popularity
messaging Apps has enabled us to collect massive of mobile messaging Apps, such as WeChat and
amount of encrypted Internet traffic of mobile WhatsApp. Indeed, messaging Apps have become the
messaging. The classification of this traffic into hubs for most activities of mobile users. For example,
different types of in-App service usages can help for messaging Apps help people text each another, share
intelligent network management, such as managing photos, chat, and engage in commercial activities such
network bandwidth budget and providing quality of as paying bills, booking tickets and shopping. Mobile
services. Traditional approaches for classification of companies monetize their services in messaging Apps.
Internet traffic rely on packet inspection, such as Therefore, service usage analytics in messaging Apps
parsing HTTP headers. However, messaging Apps are becomes critical for business, because it can help
increasingly using secure protocols, such as HTTPS understand in-App behaviors of end users, and thus
and SSL, to transmit data. This imposes significant enables a variety of applications. For instance, it
challenges on the performances of service usage provides in-depth insights into end users and App
classification by packet inspection. To this end, in this performances, enhances user experiences, and
paper, we investigate how to exploit encrypted Internet increases engagement, conversions and monetization.
traffic for classifying in-App usages. Specifically, we There are four modules in our system including traffic
develop a system, named CUMMA, for classifying segmentation, traffic feature extraction, service usage
service usages of mobile messaging Apps by jointly prediction, and outlier detection and handling.
modeling user behavioral patterns, network traffic Specifically, we first built a data collection platform to
characteristics, and temporal dependencies. Along this collect the traffic-flows of in-App usages and the
line, we first segment Internet traffic from traffic- corresponding usage types reported by mobile users.
flows into sessions with a number of dialogs in a We then hierarchically segment these traffic from
hierarchical way. Also, we extract the discriminative traffic-flows to sessions to dialogs where each is
features of traffic data from two perspectives: (i) assumed to be of individual usage or mixed usages.
packet length and (ii) time delay. Next, we teach a Also, we extracted the packet length related features
service usage predictor to classify these segmented and the time delay related features from traffic-flows
dialogs into single-type usages or outliers. In addition, to prepare the training data. In addition, we learned
we design a clustering Hidden Markov Model (HMM) service usage classifiers to classify these segmented
based method to detect mixed dialogs from outliers dialogs. Moreover, we detected the anomalous dialogs
and decompose mixed dialogs into sub-dialogs of with mixed usages and segmented these mixed dialogs
single-type usage. Indeed, CUMMA enables mobile into multiple sub-dialogs of single type usage. Finally,
analysts to identify service usages and analyze end- the experimental results on real world WeChat and
user in-App behaviors even for encrypted Internet WhatsApp traffic data demonstrate the performances
traffic. of the proposed method. With this system, we showed
that the valuable applications for in-App usage
analytics can be enabled to score quality of
IDL - International Digital Library 1|P a g e Copyright@IDL-2017


experiences, profile user behaviors and enhance (2) Determine if attacks are generalizable across
customer care. Traditional methods for traffic services, and
classification rely on packet inspection by analyzing (3) Calculate the potential costs of protecting against
the TCP or UDP port numbers of an IP packet or this leakage.
reconstructing protocol signatures in its payload For Specifically, we focus our analysis on the Apple
example, an IP packet usually has five tuples of iMessage service and show that it is possible to reveal
protocol types, source address, source port, destination information about the device operating system, fine-
address and destination port. People estimate the usage grained user actions, the language of the messages,
types of traffic by assuming that messaging Apps and even the approximate message length with
consistently transmit data using the same port numbers accuracy exceeding 96%, as shown in the summary
which are visible in the TCP and UDP headers. provided in Table 1. In addition, we demonstrate that
However, there are emerging challenges for inspecting these attacks are applicable to many other popular
IP packet content. For example, messaging Apps are messaging services, such as WhatsApp, Viber, and
increasingly using unpredictable port numbers. Also, Telegram, because they target deterministic
customers may encrypt the content of packets. In relationships between user actions and the resultant
addition, governments have imposed privacy encrypted packets that exist regardless of the
regulations which limit the ability of third parties to underlying encryption methods or network protocols
lawfully inspect packet contents. Moreover, many used. Our analysis of countermeasures shows that the
mobile apps use the Secure Sockets Layer (SSL) and attacks can be completely mitigated by adding random
its successor Transport Layer Security (TLS) as a Padding to the messages, but at a cost of over 300%
building block for encrypted communications. overhead, which translates to at least a terabyte of
extra data per day for the service providers. Overall,
3 SURVEY these attacks could impact over a billion users across
3.1 Traffic Analysis of Encrypted Messaging the globe and the high level of accuracy that we
Services: Apple iMessage and Beyond This survey demonstrate in our experiments means that they
refer thats the course of the past decade instant represent realistic threats to privacy, particularly given
messaging services Have gone from a niche recent revelations about widespread metadata
application used on desktop Computers to the most collection by government agencies.
prevalent form of communication in the world, due in
large part to the growth of Internet enabled phones and 3.2 Real-time Classification for Encrypted Traffic
tablets. Messaging services, like Apple Message and Survey of Classifying network flows by their
Whats App, handle tens of billions of messages each application type is the backbone of many crucial
day from an international user base of over one billion network monitoring and controlling tasks. Basic
people. Given the volume of messages traversing these network management functions such as billing, quality
services and ongoing concerns over widespread of service, network equipment optimization, security
eavesdropping of Internet communications, it is not and trend analyzers, are all based on the ability to
surprising that privacy has been an important topic for accurately classify network traffic into the right
both the users and service providers. To protect user corresponding application. Historically, one of the
privacy, these messaging services offer transport layer most common forms of traffic classification has been
encryption technologies to protect messages in transit, the port-based classification, which makes use of the
and some services, like iMessage and Telegram, offer port numbers employed by the application on the
end-to-end encryption to ensure that not even the transport layer. However, many modern applications
providers themselves can eavesdrop on the messages. use dynamic ports negotiation making port-based
As previous experience with Voice-over-IP and classification ineffective with accuracy ranges
encrypted HTTP tunnels have shown us, however, the between 30% and 70%. The next step in the evolution
use of state-of-the-art encryption technologies is no of classification techniques was Deep Packet
guarantee of privacy for the underlying message Inspection (DPI) or payload-based classification. DPI
content. In this paper, we analyze the network traffic requires the inspection ofthe packets payload. The
of popular encrypted messaging services to classifier extracts the application payload from the
(1) Understand the breadth and depth of their TCP/UDP packet and searches for a signature that can
information leakage, identify the flow type. Signatures usually include a
sequence of bytes/strings and offsets that are unique to


the application and characterize it. DPI is widely used Most papers usually focus on either traffic flow
by todays traffic classifier vendors. It is very accurate reassembly or traffic classification and identification,
but suffers from a number of drawbacks. Recently, we But not on their combination. This paper describes the
have witnessed a dramatic growth in the variety of architecture of a real time Internet traffic classifier
network applications. Some of these applications are monitor for use in corporate networks. It also
transmitted in an encrypted manner, posing a great evaluates different machine learning methods for
challenge to the DPI paradigm. Such applications may network traffic classification. The classifier monitor is
choose that encryption both for security and to avoid based on concept of bidirectional flow. This means
detection. Common P2P applications such as that the fundamental object to be classified in a
BitTorrent and eMule have recently added encryption determined pattern is the traffic flow, either complete
capabilities (primarily to avoid detection). As a or as sub flow. A flow is defined by one or more
significant share of the total bandwidth is occupied by packets between a host pair with the same quintuple:
P2P applications and since current DPI based source and destination IP address, source and
classifiers must see the packets payload, encryption destination ports and protocol type (ICMP, TCP,
may become a real threat for ISPs in the near future. UDP).
The inability of port-based and payload-based analysis
to deal with the wide range of new applications and 3.4 On different ways to classify Internet traffic: a
techniques used in order to avoid detection has short review of selected publications Internet traffic
motivated the study of other classification techniques. classification or identification is the act of matching IP
Two examples include behavior based classification Packets to the application that generated them. Traffic
and classification based on a combination of learning classification is important for managing computer
theory and statistics. networks: for example, it is used for traffic shaping,
policy routing, and packet filtering. From business
3.3 ITCM: Real Time Internet Traffic Classifier point of view, it provides valuable marketing
Monitor The Internet traffic is changing information via customer profiling, whereas scientific
continuously and this contribute to difficult the and government agencies employ it to identify global
characterization of network behavior and structure. Internet trends. Given just a single IP packet it is
Massive games and cloud and grid services increase difficult to classify it there is no application name in
every day their percentage participation in total the protocol headers. In the past, the service port
network traffic. Traffic monitoring Systems generally number was used for discriminating the traffic class,
make use of flow information. Examples are Net Flow but this became ineffective in the early 2000s due to
or IETF IPFIX, which defines a standard to exporting peer-to-peer (P2P) traffic. Another popular and de
flow information by routers and switches. Such facto standard classification method is Deep Packet
systems are widely used in network service providers Inspection (DPI): pattern matching on full packet
and corporations to gain knowledge about critical contents. Despite being accurate, it is computationally
business Applications, analyze communication expensive and brings privacy concerns. Moreover,
patterns prevalent in traffic, collect data for account, or traffic encryption makes DPI increasingly irrelevant.
detect anomalous traffic patterns. A vital issue for Instead, novel classifiers investigate groups of packets
corporations and ISPs (Internet Service Providers) is in order to find distinguishing features of entire
to identify traffic application types which are application protocols. Usually, a flow of packets is
transmitted on their networks. Pattern recognition and statistically summarized (e.g. by average packet size
machine learning models have given significant and inter packet arrival time) and the resultant feature
attention to semi supervised learning. In network vector is classified using Machine Learning (ML) (e.g.
traffic areas, encryption and processing restrictions, Neural Network or Support Vector Machine). Such
protocol obfuscation and use of ephemeral ports make methods are largely resistant to misuse of the port
the task of construct classification models difficult. number and to encryption: the overall behavior of a
The large amount of Internet traffic flowing through particular protocol or host is examined instead of
networks makes the use of approaches that combine seeking for a strict match in a single packet. Latest
labeled and unlabeled data to construct accurate methods tackle the problem of classification from
classifiers suitable. There are a large number of papers many perspectives: counting packets, analyzing the
in the traffic monitoring and traffic classification area. DNS context, adopting multi classification, and more.
Our Multilevel Traffic Classification project


(MuTriCs) develops an algorithm that combines encrypt their data before storing on cloud. Various
different methods to increase classification encryption schemes are available with some
completeness and accuracy. The aim of this work is to advantages and disadvantages. Encryption schemes
discuss diversity in classification methods. This survey include, AES, ECC, RSA, RC4 etc.
also share our findings on the quality of traffic
classification papers. For the review, we selected 3.6 Issues and Future Directions in Traffic
publications that: Classification The variety and complexity of modern
(a) Present differentiated methods, Internet traffic exceeds anything imagined by the
(b) Were published recently (2009-2012), and original designers of the underlying Internet
(c) Are interesting in our opinion. Comparing with architecture. As the Internet becomes our most critical
existing surveys namely and our paper focuses on communications infrastructure, service providers
different time span. We review newer works that were attempt to retrofit functionality, including security,
not mentioned in these studies: they represent novel reliability, privacy, and multiple service qualities,
developments in traffic classification. intobest effort architecture originally intended to
support a research environment. In order to prioritize,
3.5 Secure Classification Of Encrypted Cloud protect, or prevent certain traffic, providers need to
Storage By Using SVM Todays digital system implement technology for traffic classification:
allows a users to store and retrieve their data. In fact associating traffic flows with the applications or
user can store their data on remote servers and perform application types that generated them. When the focus
various operations on that data. These remote servers is on detecting specific applications (e.g., Skype), the
are managed by third party people; this is known as term traffic identification is sometimes used. Despite
cloud service providers in cloud computing area. the increasing dependence on the Internet, there is
Formally cloud computing is defined as, a type of Essentially no scientifically reproducible body of
computing based on Internet-based that allows a research on global Internet traffic characteristics due
sharing of processing resources and data on demand. to the sensitivity of and typical restrictions on sharing
Computing resources includes networks, servers, traffic data. Despite these constraints, security
storage, applications and services). Cloud system is concerns and economic realities have motivated recent
cost effective, flexible and reducing the administration advances in traffic classification capabilities.
overhead of various organizations also provide the Situational awareness of traffic is essential to
various operations over data stored in remote servers. prevention, mitigation, and response to new forms of
Cloud storage online space where you can store your malware, which can suddenly and rapidly threaten
data or keep backup of files, through cloud service legitimate service on network links. Arguably as
providers by selecting some data storage package. At important, the high cost of deploying and operating
this point, cloud storage is physical storage including, Internet infrastructure compels providers to
hard drives or sub flash devices etc. With this storage continually seek ways to optimize their network
system, security and reliability is become a engineering or otherwise increase return on capital
challenging issues. It is very important to keep data investments, including application based service
secure and original. Data should not be modified. The differentiation and content-sensitive pricing.
confidential data store at cloud might be leak or Acknowledged their general lack of accuracy and
modified by third party unauthorized entities like applicability, whereas others took a narrower focus,
hackers. To keep security of data over cloud, various taxonomizing and reviewing documented machine-
techniques are available such as, learning approaches for IP traffic classification. In this
Encryption: It is the technique, used to hide survey we studied a critical but constructive analysis
the original information using various of the field of Internet traffic classification, focusing
encryption algorithms. on major obstacles to progress and suggestions for
Authentication: In this technique, every valid overcoming them.
user has its own user name and password.
Authorization: In this technique, only valid 4 CONCLUSION
Authenticated users can be able to access the cloud In this survey, we will conclude a system for
Storage data. In this paper, to maintain the security of classifying service usages using encrypted Internet
data stored on cloud storage, we have focused on the traffic in mobile messaging Apps by jointly modeling
encryption technique. In this technique, the users can behavior structure, network traffic characteristics, and


temporal dependencies. There are four modules in our
system including traffic segmentation, traffic feature 6] F. Liu, W. K. Ng and W. Zhang, "Encrypted SVM
extraction, service usage prediction, and outlier for Outsourced Data Mining," Cloud Computing
detection and handling. Specifically, we first built a (CLOUD), 2015 IEEE 8th International Conference
data collection platform to collect the traffic-flows of on, New York City, NY, 2015, pp. 1085-1092.
in App usages and the corresponding usage types
reported by mobile users. We then hierarchically 7] Y. Rahulamathavan, R. C. W. Phan, S. Veluru, K.
segment these traffic from traffic-flows to sessions to Cumanan and M. Rajarajan, "Privacy-Preserving
dialogs where each is assumed to be of individual Multi-Class Support Vector Machine for Outsourcing
usage or mixed usages. Also, we extracted the packet the Data Classification in Cloud," in IEEE
length related features and the time delay related Transactions on Dependable and Secure Computing,
features from traffic-flows to prepare the training data. vol. 11, no. 5, pp. 467-479, Sept.-Oct. 2014.
In addition, we learned service usage classifiers to
classify these segmented dialogs. Moreover, we 8] J. Xu, W. Zhang, C. Yang, J. Xu and N. Yu, "Two-
detected the anomalous dialogs with mixed usages and Step-Ranking Secure Multi-Keyword Search over
segmented these mixed dialogs into multiple sub- Encrypted Cloud Data," Cloud and Service
dialogs of single type usage. Finally, the experimental Computing (CSC), 2012 International Conference on,
results on real world WeChat and WhatsApp traffic Shanghai, 2012, pp. 124-130.
data demonstrate the performances of the proposed
method. With this survey, we showed that the valuable 9] H. Hu, J. Xu, C. Ren and B. Choi, "Processing
applications for in-App usage analytics can be enabled private queries over untrusted data cloud through
to score quality of experiences, profile user behaviors privacy homomorphism," Data Engineering (ICDE),
and enhance customer care. 2011 IEEE 27th International Conference on,
Hannover, 2011, pp. 601-612.
OTHER REFERENCES
1] M. Kumar, J. Meena, R. Singh and M. Vardhan, 10] Y. Huang, J. Katz and D. Evans, "Quid-Pro-Quo-
"Data outsourcing: A threat to confidentiality, tocols: Strengthening Semi-honest Protocols with
integrity, and availability," Green Computing and Dual Execution," Security and Privacy (SP), 2012
Internet of Things (ICGCIoT), 2015 International IEEE Symposium on, San Francisco, CA, 2012, pp.
Conference on, Noida, 2015, pp. 1496-1501. 272-284.
2] B. K. Samanthula, Y. Elmehdwi and W. Jiang, "k-

Nearest Neighbor Classification over Semantically
Secure Encrypted Relational Data," in IEEE
Transactions on Knowledge and Data Engineering,
vol. 27, no. 5, pp. 1261-1273, May 1 2015.
3] S. Thurner, M. Grn, S. Schmitt and H. Baier,

"Improving the Detection of Encrypted Data on
Storage Devices," IT Security Incident Management &
IT Forensics (IMF), 2015 Ninth International
Conference on, Magdeburg, 2015, pp. 26-39.
4] Y. Fu; H. Xiong; X. Lu; J. Yang; C. Chen, "Service

Usage Classification with Encrypted Internet Traffic
in Mobile Messaging Apps," in IEEE Transactions on
Mobile computing, vol.PP, no.99, pp.1-1
5] M. D. Singh, P. R. Krishna and A. Saxena, "A

privacy preserving Jaccard similarity function for
mining encrypted data," TENCON 2009 - 2009 IEEE
Region 10 Conference, Singapore, 2009, pp. 1-4.

A Mobile Messaging Apps: Service Usage Classification To Internet Traffic and Encryption System

Enviado por

Dados do documento

Título original

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

A Mobile Messaging Apps: Service Usage Classification To Internet Traffic and Encryption System

Enviado por

Direitos autorais:

Formatos disponíveis

IDL - International Digital Library Of

Technology & Research

International e-Journal For Technology And Research-2017