To facilitate research on this problem, we make the following suppositions regarding Hadoop's characteristics:
2.1. Suppose the communication cost among slave node machines under the Hadoop platform can be ignored. The Hadoop architecture adopts data-locality storage, in which computation occurs on the data-storage node whenever possible, and keeps two data replicas, on the same rack and on the nearest rack respectively, to avoid the cost of shuttling large amounts of data [2, 3, 7, 12].
2.2. Suppose the slave node machines have the same architecture under the Hadoop platform. In a heterogeneous setting, different processor-core architectures would increase the complexity of the experiment and of the research model, as well as hardware incompatibility [2, 3].
Figure 2. Computing Model of the System

Fig. 2 demonstrates the improvement of the MapReduce model: two modules of pheromone updating from the ACO algorithm (local updating and global updating) are inserted into the master node and the slave nodes respectively.
3. Algorithm design and analysis
Under the above suppositions, a typical job scheduling problem is described as follows: n jobs need to be dispatched onto n node machines, where each node machine processes only one job and each job is executed by only one node machine. Different dispatch plans therefore incur different execution costs and resource consumption, and job scheduling consists in finding a plan that completes the jobs smoothly while ensuring availability, reliability, and optimality.
Under the same processor-performance conditions, job complexity is the key factor influencing processing time: the higher a job's complexity, the more processing time it requires, and conversely a simple job needs less processing time [9]. Different client requests, such as ftp, mail, http, and upload, therefore need different processing times.
Definition 1. Cost matrix C_{n×n} = {C_ij | C_ij ∈ C_{n×n}, C_ij ≥ 0, i = 1,…,n; j = 1,…,n}, where C_ij stands for the processing cost for the i-th node machine to complete the j-th job. The value of each element in the matrix is derived from the requested job's complexity and the processor's performance.
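To make Definition 1 concrete, here is a small sketch. The derivation cost = job complexity / processor speed is an assumption for illustration only, since the text says only that C_ij is derived from job complexity and processor performance:

```python
# Illustrative construction of the cost matrix C of Definition 1.
# The formula cost = complexity / speed is an assumed derivation for the sketch.
n = 4
complexity = [3.0, 1.0, 2.0, 4.0]   # complexity of job j
speed = [1.0, 2.0, 1.0, 4.0]        # relative performance of machine i

# C[i][j] >= 0: processing cost for machine i to complete job j
C = [[complexity[j] / speed[i] for j in range(n)] for i in range(n)]

def plan_cost(C, machine_of_job):
    """Total cost of a dispatch plan that runs job j on machine machine_of_job[j]."""
    return sum(C[machine_of_job[j]][j] for j in range(len(machine_of_job)))

print(plan_cost(C, [0, 1, 2, 3]))   # one one-to-one dispatch plan
print(plan_cost(C, [3, 2, 1, 0]))   # another plan, with a different total cost
```

Each one-to-one dispatch plan is a permutation, and job scheduling amounts to searching for the permutation with the lowest total cost.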
Definition 2. Pheromone matrix T_{n×n} = {T_ij | T_ij ∈ T_{n×n}, i = 1,…,n; j = 1,…,n}, where T_ij stands for the pheromone value associated with dispatching the j-th job to the i-th node machine.
[Figure 2 labels: User, Broker, Job Dispatcher, JobRecycler, Cloud Computing Resource Content, ACO Local Pheromone Update, ACO Global Pheromone Update, Master, Slave]
then set D_min = D_min^(Nc);
Nc = Nc + 1;
until Nc ≥ Ncmax.
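The loop fragment above (D_min, Nc, Ncmax) can be fleshed out into a minimal, self-contained sketch of ACO-based job dispatch. The parameter values, the selection probability rule, and the pheromone-update formulas below are conventional ACO choices assumed for illustration, not the authors' exact design:

```python
import random

# Minimal illustrative ACO sketch for dispatching n jobs onto n node machines.
# Names D_min, Nc, Ncmax follow the loop fragment in the text; alpha, beta,
# rho and the update rules are conventional assumptions for this sketch.
random.seed(1)
n = 5
C = [[random.uniform(1.0, 10.0) for _ in range(n)] for _ in range(n)]  # cost matrix C_ij
T = [[1.0] * n for _ in range(n)]  # pheromone matrix T_ij, uniform at start
alpha, beta, rho = 1.0, 2.0, 0.1   # pheromone weight, heuristic weight, evaporation rate

def build_plan():
    """One ant assigns each job j to a still-free machine i, chosen with
    probability proportional to T[i][j]**alpha * (1 / C[i][j])**beta."""
    free = list(range(n))
    plan = []
    for j in range(n):
        weights = [T[i][j] ** alpha * (1.0 / C[i][j]) ** beta for i in free]
        i = random.choices(free, weights=weights)[0]
        plan.append(i)
        free.remove(i)
    return plan

def plan_cost(plan):
    return sum(C[plan[j]][j] for j in range(n))

D_min, best = float("inf"), None
Nc, Ncmax, n_ants = 0, 50, 10
while Nc < Ncmax:
    for _ in range(n_ants):
        plan = build_plan()
        d = plan_cost(plan)
        if d < D_min:              # keep the best plan found so far: D_min = D_min^(Nc)
            D_min, best = d, plan
    for i in range(n):             # global update: evaporate all pheromone ...
        for j in range(n):
            T[i][j] *= 1.0 - rho
    for j, i in enumerate(best):   # ... then reinforce the edges of the best plan
        T[i][j] += 1.0 / D_min
    Nc += 1                        # Nc = Nc + 1, until Nc >= Ncmax
print(D_min, best)
```

In the paper's architecture the local update would run on the slave nodes and the global update on the master; here both are folded into one loop for brevity.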
4. Experiment deployment and analysis
The whole experiment is divided into two parts: the first identifies the ACO algorithm's static and dynamic dispatch performance; the second compares the workload-balance performance and other QoS indexes of the ACO algorithm against Hadoop's built-in FIFO.
The first part of the experiment is implemented with Matlab 7.0 on the Windows XP platform. The static sub-step dispatches 20 jobs to 10 slave node machines; once the submitted jobs are completed, another 20 jobs are dispatched to the slave node machines, and the two scheduling matrices are analyzed. The dynamic job scheduling dispatches 20 jobs to 10 slave node machines; while the submitted jobs are being processed, a second batch of 20 jobs is submitted to the master node machine for dynamic dispatch, and the resulting job scheduling matrix is analyzed.
Table 1. Experiment results of static scheduling

  Experiment no.   Sum of two loop counts   Sum of two execution times
  1                17                       27 ms
  2                16                       26 ms
  3                15                       25 ms
Table 2. Experiment results of dynamic scheduling

  Experiment no.   Loop count   Execution time
  1                8            14 ms
  2                8            14 ms
  3                7            12 ms
Because static scheduling is divided into two phases, its total execution time is approximately twice that of dynamic scheduling, and its total loop count is approximately twice as well, so we conclude that the ACO algorithm is better suited to continual small-granularity jobs, where it saves processing time. From the viewpoint of optimization, static scheduling tends to find locally optimal results, while dynamic scheduling tends to find globally optimal results over the whole data scope. A forecast can thus be made: as the scale of cloud computing and the number of submitted jobs grow, the ACO algorithm may achieve global optimization in long-running situations with large amounts of submitted jobs. This experiment, after all, used a relatively short time slot.
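The "approximately twice" relation can be checked directly from the numbers in Tables 1 and 2:

```python
# Ratios of the averages in Table 1 (static) and Table 2 (dynamic).
static_loops = [17, 16, 15]      # Table 1: sum of two loop counts
static_time_ms = [27, 26, 25]    # Table 1: sum of two execution times
dynamic_loops = [8, 8, 7]        # Table 2: loop counts
dynamic_time_ms = [14, 14, 12]   # Table 2: execution times

def avg(xs):
    return sum(xs) / len(xs)

loop_ratio = avg(static_loops) / avg(dynamic_loops)
time_ratio = avg(static_time_ms) / avg(dynamic_time_ms)
print(round(loop_ratio, 2), round(time_ratio, 2))
```

The loop-count ratio comes out near 2.1 and the execution-time ratio near 2.0, consistent with the claim.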
The second part of the experiment is as follows. Experiment platform: Ubuntu 9.10 OS, 10 Pentium IV slave node machines, and 1 high-performance master node machine. The first experiment executes dynamic scheduling of 100 jobs with the Hadoop 0.20.2 built-in FIFO algorithm; the second executes dynamic scheduling of 100 jobs with the ACO algorithm under the Eclipse platform. The job quantity is then scaled out to compare the two algorithms' QoS indexes: job execution time, WAET, job losing rate, and workload rate.
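The chosen QoS indexes can be made concrete with a small sketch. The job records below are invented; since WAET is not expanded in this text, it is read here, as an assumption, as the average execution time of completed jobs, and a job whose waiting time exceeds its deadline counts as lost:

```python
# Illustrative computation of the compared QoS indexes (made-up job records;
# the reading of "WAET" as average execution time is an assumption).
jobs = [
    # (waiting_time_ms, execution_time_ms, deadline_ms)
    (2, 10, 50), (5, 12, 50), (60, 15, 50), (8, 9, 50),
]

completed = [j for j in jobs if j[0] <= j[2]]   # waiting stayed within deadline
lost = [j for j in jobs if j[0] > j[2]]         # deadline exceeded -> job lost

losing_rate = len(lost) / len(jobs)
waet = sum(j[1] for j in completed) / len(completed)  # assumed reading of WAET

# Workload rate of one slave machine: busy time over the observation window.
busy_ms, window_ms = 750, 1000
workload_rate = busy_ms / window_ms

print(losing_rate, waet, workload_rate)
```

These are the quantities plotted in Figs. 3 and 4 for FIFO and ACO as the job quantity scales.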
From the left of Fig. 3, we can see that the execution times of the two algorithms are approximately equal when the quantity of jobs is around 300. As the quantity of jobs increases, the advantage of the ACO algorithm becomes pronounced.
ACO Algorithm-based Parallel Job Scheduling Investigation on Hadoop
Hengliang Shi, Guangyi Bai, Zhenmin Tang
International Journal of Digital Content Technology and its Applications. Volume 5, Number 7, July 2011

From the middle of Fig. 3, we can see that a single job's WAET decreases as the number of jobs increases. We can also find that ACO's performance is better than FIFO's, for its WAET is lower than FIFO's.

Figure 3. Jobs Execution Time, WAET, and Losing Rate Comparison of FIFO and ACO

From the right of Fig. 3, we test the job losing rates of FIFO and ACO respectively. The FIFO line rises early and then drops, which is caused by the difference between the resource scale and the requirement scale: as soon as the waiting time of many jobs exceeds their deadlines, these jobs are lost. The ACO line, by contrast, keeps dropping and becomes steady when the quantity of submitted jobs reaches 1000.

We randomly select the 1#, 3#, and 7# slave node machines as our test targets. Fig. 4 demonstrates the workload rates of the 1#, 3#, and 7# slave node machines respectively. Statistically, on the FIFO line there are 3 sampling values of the 1# slave node machine above 80%, 3 points of 3#, and 2 points of 7#, while no sampling point of the ACO line exceeds 80%. Meanwhile, there is 1 point below 20% on the FIFO line, and none on the ACO line.

Figure 4. 1#, 3#, 7# Slave Node Machine Workload Rates Respectively

The above facts demonstrate that the ACO algorithm is better suited to situations with large amounts of little-granularity jobs, while FIFO keeps a certain advantage for a few large-granularity concurrent jobs. At the same time, the ACO algorithm easily obtains a globally optimized job dispatch matrix, whereas FIFO only gets a locally optimal job scheduling table.

5. Conclusion

This paper first analyzes the characteristics of application scenes under the cloud computing model, then identifies the shortcomings of Hadoop's built-in FIFO algorithm in dealing with large amounts of little-granularity concurrent jobs, and innovatively proposes the ACO algorithm to cope with these problems under the Hadoop platform. It compares the experimental results of static scheduling against dynamic scheduling, and also compares the QoS indexes of ACO against FIFO. From these experimental data, we can infer that ACO has great advantages in coping with large amounts of little-granularity concurrent jobs: it shortens response time, improves throughput, and easily obtains the optimal job dispatch matrix.