
An Approach for Profiling Distributed Applications Through Network Traffic Analysis
Thiago Vieira
tpbv@cin.ufpe.br
http://github.com/tpbvieira
CIn – UFPE

Outline

Motivation and Problem Statement
Background and Related Work
Profiling Distributed Applications Through Deep Packet Inspection
Evaluating MapReduce for Network Traffic Analysis
Conclusion

Motivation and Problem Statement


Motivation

Low intrusion, high accuracy and fast results: how can all three be achieved?

Motivation

Protocols, flows, throughput and load distribution


Motivation


Online classification
Application state evaluation = more processing

Motivation

MapReduce restrictions
MapReduce for DPI?

Problem Statement

How can the large volume of application traffic in a data center be evaluated?
Can MapReduce express DPI algorithms?
What is the processing capacity of MapReduce for profiling application traffic?

Problem Statement

Our goals:

To propose a solution for profiling application traffic in a data center

To analyse the processing capacity of MapReduce for profiling applications through network traffic analysis

Background and Related Work

Background

MapReduce

[Figure: MapReduce input divided into blocks]

Background

Hadoop-specific input data

New input type and extensions

Splitting

Specific configurations

Related Work

Lee et al. (2011)

Hadoop-based tool for processing packet traces of network traffic

Packet-level evaluation

Input → Blocks → Packets → Map

Cannot evaluate more than one packet per MapReduce iteration

Inefficient for DPI
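
The limitation above can be illustrated with a minimal sketch. It is neither the tool of Lee et al. nor the implementation proposed here: the `Packet` record and both map functions are hypothetical and simplified. A map function that receives a single packet can only emit per-packet facts, while a map function that receives all packets of a block can group them by flow and rebuild a message that spans several packets, which is what DPI needs.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified packet record (not a real pcap structure).
    flow: tuple      # e.g. (source, source port, destination, destination port)
    seq: int         # position of this segment within the flow
    payload: bytes


def map_per_packet(packet: Packet):
    """Packet-level map: sees one packet, so it can only emit per-packet facts."""
    yield packet.flow, len(packet.payload)


def map_per_block(packets: Iterable[Packet]):
    """Block-level map: groups the packets of a block by flow and rebuilds
    the payload, so a message spanning several packets can be inspected."""
    flows = defaultdict(list)
    for p in packets:
        flows[p.flow].append(p)
    for flow, segments in flows.items():
        ordered = sorted(segments, key=lambda s: s.seq)
        yield flow, b"".join(s.payload for s in ordered)


# Two out-of-order segments of the same (invented) flow.
block = [
    Packet(("a", 1, "b", 2), 1, b"LLO"),
    Packet(("a", 1, "b", 2), 0, b"HE"),
]
per_packet = [kv for p in block for kv in map_per_packet(p)]
per_block = list(map_per_block(block))
```

In this sketch `per_packet` only contains payload lengths, whereas `per_block` contains the reassembled message for the flow.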

Profiling Distributed Applications Through Deep Packet Inspection

Proposal



A solution for profiling application traffic
Monitoring plans
Cumulative traces for offline processing
MapReduce

Processing capacity and scalability

Splitting approach


Architecture


Evaluation Methodology

Methodology → GQM (Basili et al., 1994) and the systematic approach to performance evaluation (Jain, 1991)
Goal → To evaluate the effectiveness and scalability of MapReduce for profiling application traffic
AppAnalyzer and AppParser

Evaluation Methodology

Q1 → Can MapReduce express DPI
algorithms and extract application
indicators from the network traffic of
distributed applications?
Q2 → Is the completion time scalability
of MapReduce for DPI proportional to
the number of worker nodes?
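
One way to make Q2 measurable is sketched below. It assumes the usual definitions of speed-up and efficiency relative to a baseline cluster. The function names and the completion times are invented for illustration and are not results from this work. Scalability is proportional to the number of worker nodes when efficiency stays close to 1.

```python
def speedup(t_base: float, t_n: float) -> float:
    """Speed-up of a run relative to the baseline completion time."""
    return t_base / t_n


def efficiency(t_base: float, t_n: float, n_base: int, n: int) -> float:
    """Speed-up divided by the growth in worker nodes.
    A value of 1.0 means completion time scales proportionally."""
    return speedup(t_base, t_n) / (n / n_base)


# Hypothetical completion times in seconds, keyed by number of worker nodes.
times = {2: 600.0, 4: 400.0, 8: 320.0}

eff_4 = efficiency(times[2], times[4], 2, 4)   # 1.5 / 2
eff_8 = efficiency(times[2], times[8], 2, 8)   # 1.875 / 4
```

With these invented times, efficiency drops as nodes are added, which is the pattern a "not proportional" answer to Q2 would show.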



Experiment Setup



Traffic from a JXTA-based backup system
Virtual machines on Amazon EC2
Indicators:

Round-trip time

Connection requests per unit of time

Messages received per unit of time
Job completion time was measured
Hadoop default configuration
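
The three indicators can be sketched in map/reduce style as follows. This is an illustration under stated assumptions, not the AppAnalyzer code: the event tuples, the event kinds, the one-second window and the definition of round-trip time as the delay between a connection request and its matching response are all hypothetical.

```python
from collections import Counter

# Hypothetical parsed events: (timestamp in seconds, event kind, connection id).
events = [
    (0.10, "conn_request", "c1"),
    (0.35, "conn_response", "c1"),
    (0.40, "message", "c1"),
    (1.20, "conn_request", "c2"),
    (1.50, "message", "c1"),
    (1.70, "conn_response", "c2"),
]


def map_counts(event, window=1.0):
    """Map: emit ((time window, kind), 1) for requests and messages."""
    ts, kind, _ = event
    if kind in ("conn_request", "message"):
        yield (int(ts // window), kind), 1


def reduce_counts(pairs):
    """Reduce: sum the values emitted for each (time window, kind) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def round_trip_times(events):
    """Assumed RTT: delay between a request and its matching response."""
    started, rtts = {}, {}
    for ts, kind, conn in sorted(events):
        if kind == "conn_request":
            started[conn] = ts
        elif kind == "conn_response" and conn in started:
            rtts[conn] = ts - started.pop(conn)
    return rtts


counts = reduce_counts(kv for e in events for kv in map_counts(e))
rtts = round_trip_times(events)
```

Counting per time window fits a plain map and reduce, while the round-trip time needs the request and the response of a connection to be seen together.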

Results

The results reject the null hypothesis H0num.indct


Results

The results reject the null hypothesis H0scale.prop and confirm the alternative hypothesis H1scale.prop

Possible Threats to Validity

Small cluster and input size

According to real traces, the majority of jobs are small and run on a small number of nodes

Fair Scheduler

Discussion

MapReduce can express DPI algorithms
The completion time scalability is not
proportional to node addition
Node addition provides different gains
according to the cluster and input size
More evaluations are necessary

To understand the MapReduce behaviour

To determine how MapReduce can be configured for network traffic analysis

Evaluating MapReduce for Network Traffic Analysis

Motivation

The kind of workload and input type submitted to MapReduce affects its performance

Lack of evaluations of MapReduce for network traffic analysis

Specific configuration for better performance

Evaluation Methodology

Methodology → Systematic approach to
performance evaluation (Jain, 1991)
Goal → To investigate the behaviour of
MapReduce phases, its scalability and
the speed-up achieved for packet level
analysis and DPI



Experiment Setup



Previous network traces, application and indicators
Network traces for packet-level analysis
Physical machines
Drivers

Distributed vs non-distributed

P3, CountUp and JXTA
Indicators extracted from Hadoop logs by Hadoop-Analyzer

Results


Discussion

Block size affects completion time

Bigger blocks for smaller clusters

68MB better than 128MB
Processing capacity per input size

Better for bigger data

More efficient to accumulate data

Input Size + Block Size→ Pools
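
The interplay between input size, block size and cluster size can be sketched with simple arithmetic. It assumes one map task per block, a hypothetical default of two map slots per node and invented sizes, and the function names are illustrative only. Map tasks run in waves when there are more tasks than slots.

```python
import math


def map_tasks(input_mb: float, block_mb: float) -> int:
    """Assumes one map task per input block."""
    return math.ceil(input_mb / block_mb)


def map_waves(input_mb: float, block_mb: float, nodes: int,
              slots_per_node: int = 2) -> int:
    """Rounds of map tasks needed when all slots run in parallel.
    slots_per_node is an assumed per-node setting, not a measured value."""
    slots = nodes * slots_per_node
    return math.ceil(map_tasks(input_mb, block_mb) / slots)


# Invented example: 1024 MB of traces on clusters of 4 and 8 nodes.
tasks_64 = map_tasks(1024, 64)          # 16 map tasks
tasks_128 = map_tasks(1024, 128)        # 8 map tasks
waves_64_4 = map_waves(1024, 64, 4)     # 16 tasks on 8 slots
waves_128_4 = map_waves(1024, 128, 4)   # 8 tasks on 8 slots
waves_128_8 = map_waves(1024, 128, 8)   # 8 tasks on 16 slots, half idle
```

In this example doubling the nodes from 4 to 8 with 128 MB blocks leaves the number of waves unchanged, which is one way added nodes can bring no gain for a given input and block size.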


Discussion

Scalability

No relevant gains in some cases

Execution waves
P3 performs better than CountUpDriver

Local concurrency
Phase predominance

Map and Shuffle

Optimizations: Shuffle Start


Possible Threats to Validity

P3 version

February 2012

Binary release without complete source code


Conclusion


MapReduce can express DPI algorithms
MapReduce scalability is not proportional to node addition
It is necessary to avoid wasting resources
Input, block and cluster size affect completion time
Map-intensive jobs
It is possible to configure and choose the best resource allocation

Contributions



An approach to implement DPI algorithms through MapReduce
The parser JNetPCAP-JXTA
The Hadoop-Analyzer
Characterization of MapReduce phases for packet-level analysis and DPI
Description of the processing capacity and scalability of MapReduce for packet-level analysis and DPI

Contributions

Demonstration of the speed-up obtained with MapReduce for DPI
Papers:

Vieira, T., Soares, P., Machado, M., Assad, R., and Garcia, V. Measuring Distributed Applications Through MapReduce and Traffic Analysis. In IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 2012.
Vieira, T., Soares, P., Machado, M., Assad, R., and Garcia, V. Evaluating Performance of Distributed Systems with MapReduce and Network Traffic Analysis. In The Seventh International Conference on Software Engineering Advances (ICSEA), 2012.

Future Work

To evaluate all components of the proposed architecture
A technique for efficient evaluation of applications through network traffic analysis

Thank you!
