
An Approach for Profiling Distributed Applications Through Network Traffic Analysis

Thiago Vieira
tpbv@cin.ufpe.br
http://github.com/tpbvieira
CIn UFPE

Outline

Motivation and Problem Statement
Background and Related Work
Profiling Distributed Applications Through Deep Packet Inspection
Evaluating MapReduce for Network Traffic Analysis
Conclusion

Motivation and Problem Statement

Motivation

Low intrusion, high accuracy and fast results? How?

Motivation

Protocols, flows, throughput and load distribution

Motivation

Online classification
Application state evaluation = more processing

Motivation

MapReduce restrictions
MapReduce for DPI?

Problem Statement

How can the large volume of application traffic in a data center be evaluated?
Can MapReduce express DPI algorithms?
What is the processing capacity of MapReduce for profiling application traffic?

Problem Statement

Our goal:

To propose a solution for profiling application traffic in a data center
To analyse the processing capacity of MapReduce for profiling applications through network traffic analysis

Background and Related Work

Background

MapReduce
[Figure: MapReduce input divided into blocks]

Background

Hadoop-specific input data

New input type and extensions
Splitting
Specific configurations

Related Work

Lee et al. (2011)

Hadoop-based packet trace processing tool for network traffic
Packet-level evaluation
Input → Blocks → Packets → Map
Cannot evaluate more than one packet per MapReduce iteration
Inefficient for DPI


Profiling Distributed Applications Through Deep Packet Inspection

Proposal

A solution for profiling application traffic
Monitoring plans
Cumulative traces for offline processing
MapReduce

Processing capacity and scalability
Splitting approach

Architecture


Evaluation Methodology

Methodology: GQM (Basili et al., 1994) and the systematic approach to performance evaluation (Jain, 1991)
Goal: to evaluate the effectiveness and scalability of MapReduce for profiling application traffic
AppAnalyzer and AppParser

Evaluation Methodology

Q1: Can MapReduce express DPI algorithms and extract application indicators from the network traffic of distributed applications?
Q2: Is the completion time scalability of MapReduce for DPI proportional to the number of worker nodes?
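
One way to make Q2 concrete is to compare the measured speed-up against the increase in cluster size. The sketch below assumes a dictionary of job completion times keyed by node count; the numbers are invented for illustration and are not results from this work. An efficiency of 1.0 would mean completion time scales proportionally to the number of nodes.

```python
def speedup(baseline_time, time_n):
    """Ratio between the baseline completion time and the measured one."""
    return baseline_time / time_n


def efficiency(baseline_time, time_n, nodes, baseline_nodes=1):
    """Speed-up divided by the growth in node count (1.0 = proportional)."""
    return speedup(baseline_time, time_n) / (nodes / baseline_nodes)


def scaling_report(times):
    """Map node count -> (speed-up, efficiency) relative to the smallest cluster."""
    base_nodes = min(times)
    base_time = times[base_nodes]
    return {
        n: (speedup(base_time, t), efficiency(base_time, t, n, base_nodes))
        for n, t in sorted(times.items())
    }


# Hypothetical completion times in seconds, keyed by number of worker nodes.
completion_times = {2: 600.0, 4: 400.0, 8: 300.0}
report = scaling_report(completion_times)
```

In this made-up example, doubling from 2 to 4 nodes yields a speed-up of 1.5 (efficiency 0.75), so the scalability would not be proportional.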


Experiment Setup

Traffic from a JXTA-based backup system
Virtual machines on Amazon EC2
Indicators:

Round-trip time
Connection requests per unit of time
Messages received per unit of time

Job completion time was measured
Hadoop default configuration
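
As a rough illustration of how indicators like these could be derived once the traffic has been parsed, the sketch below works on hypothetical event tuples `(timestamp, kind, message_id)`. The event format, the `"request"`/`"response"` labels and the one-second window are assumptions made for the example; they do not describe the output of AppParser or the JXTA parser.

```python
from collections import Counter

# Hypothetical parsed events: (timestamp_seconds, kind, message_id).
EVENTS = [
    (0.10, "request", 1),
    (0.35, "response", 1),
    (0.50, "request", 2),
    (1.20, "response", 2),
    (1.40, "request", 3),
]


def round_trip_times(events):
    """Pair each response with its request and return message_id -> RTT."""
    sent = {}
    rtts = {}
    for ts, kind, msg_id in sorted(events):
        if kind == "request":
            sent[msg_id] = ts
        elif kind == "response" and msg_id in sent:
            rtts[msg_id] = ts - sent.pop(msg_id)
    return rtts


def count_per_window(events, kind, window=1.0):
    """Count events of one kind per time window (window index -> count)."""
    return Counter(int(ts // window) for ts, k, _ in events if k == kind)


rtts = round_trip_times(EVENTS)
requests_per_second = count_per_window(EVENTS, "request")
responses_per_second = count_per_window(EVENTS, "response")
```

Request 3 has no response in the sample, so it contributes to the request count but not to the round-trip times.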

Results

Rejected the null hypothesis H0_num.indct
Rejected the null hypothesis H0_scale.prop and confirmed the alternative hypothesis H1_scale.prop

Possible Threats to Validity

Small cluster and input size

According to real traces, the majority of jobs are small and executed on a small number of nodes
Fair Scheduler

Discussion

MapReduce can express DPI algorithms
The completion time scalability is not proportional to node addition
Node addition provides different gains according to the cluster and input size
More evaluations are necessary

To understand the MapReduce behaviour
To determine how MapReduce can be configured for network traffic analysis

Evaluating MapReduce for Network Traffic Analysis

Motivation

The kind of workload and input type submitted to MapReduce impacts its performance
Lack of evaluation of MapReduce for network traffic analysis
Specific configuration for better performance

Evaluation Methodology

Methodology: systematic approach to performance evaluation (Jain, 1991)
Goal: to investigate the behaviour of MapReduce phases, its scalability and the speed-up achieved for packet-level analysis and DPI


Experiment Setup

Previous network traces, application and indicators
Network traces for packet-level analysis
Physical machines
Drivers

Distributed vs. non-distributed
P3, CountUp and JXTA

Indicators extracted from Hadoop logs by Hadoop-Analyzer

Results


Discussion

Block size impacts completion time

Bigger blocks for smaller clusters
68MB better than 128MB

Processing capacity per input size

Better for bigger data
More efficient to accumulate data

Input Size + Block Size Pools

Discussion

Scalability

No relevant gains in some cases
Execution waves

P3 performs better than CountUpDriver

Local concurrency

Phase predominance

Map and Shuffle
Optimizations: Shuffle Start
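
The link between execution waves and the lack of gains from some node additions can be sketched with simple arithmetic. The model below is a simplification under stated assumptions: one map task per input block, the same number of map slots on every node, and tasks of equal duration. The input size, block size and slot count are hypothetical values chosen for the example, not the configuration used in the experiments.

```python
MB = 1024 * 1024


def map_tasks(input_bytes, block_bytes):
    """Number of map tasks, assuming one task per block (ceiling division)."""
    return (input_bytes + block_bytes - 1) // block_bytes


def execution_waves(tasks, nodes, slots_per_node):
    """Rounds needed to run all tasks when every slot runs one task at a time."""
    slots = nodes * slots_per_node
    return (tasks + slots - 1) // slots


# Hypothetical job: 1 GB of traces, 64 MB blocks, 2 map slots per node.
tasks = map_tasks(1024 * MB, 64 * MB)
waves_by_nodes = {n: execution_waves(tasks, n, 2) for n in (4, 6, 8)}
```

Under these assumptions, going from 4 to 6 nodes leaves the job at two waves, so the map phase would finish no sooner; only at 8 nodes does it drop to a single wave.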


Possible Threats to Validity

P3 version

February 2012
Binary version without the complete source code

Conclusion


MapReduce can express DPI algorithms
MapReduce scalability is not proportional to node addition
It is necessary to avoid wasting resources
Input, block and cluster size impact completion time
Map-intensive jobs
It is possible to configure and choose the best resource allocation

Contributions

An approach to implement DPI algorithms through MapReduce
The JNetPCAP-JXTA parser
The Hadoop-Analyzer
Characterization of MapReduce phases for packet-level analysis and DPI
Description of the processing capacity and scalability of MapReduce for packet-level analysis and DPI

Contributions

Showed the speed-up obtained with MapReduce for DPI
Papers:

Vieira, T., Soares, P., Machado, M., Assad, R., and Garcia, V. Measuring Distributed Applications Through MapReduce and Traffic Analysis. In IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 2012.
Vieira, T., Soares, P., Machado, M., Assad, R., and Garcia, V. Evaluating Performance of Distributed Systems with MapReduce and Network Traffic Analysis. In The Seventh International Conference on Software Engineering Advances (ICSEA), 2012.

Future Work

To evaluate all components of the proposed architecture
To develop a technique for efficient evaluation of applications through network traffic analysis

Thank you!
