
ISSN 2278-3091
Volume 4, No. 2, March - April 2015

Vikas Shinde, International Journal of Advanced Trends in Computer Science and Engineering, 4(2), March - April 2015, 36 - 43

International Journal of Advanced Trends in Computer Science and Engineering

Available Online at http://www.warse.org/ijatcse/static/pdf/file/ijatcse06422015.pdf

Evaluation of Parallel Processing Systems through Queuing Model


Vikas Shinde
Department of Applied Mathematics,
Madhav Institute of Technology & Science, Gwalior-India

ABSTRACT

Jackson queueing networks have been widely used to model and analyze the performance of complex parallel systems. In this investigation, an M/G/1 queueing system is used to model a parallel processing system that is expandable in both the vertical and the horizontal direction. A closed-form solution is determined for the system performance metrics, such as processor waiting time and system processing power.

Keywords: Queueing Network, Massive Parallel Processing, Shared Memory, Waiting Time.

1. INTRODUCTION

Parallel processing in computer systems has been widely studied due to its significant role in the day-by-day faster computing of jobs. As parallel computing systems proliferate, the need for effective performance evaluation grows and queueing techniques become ever more important. In fact, the performance of such systems depends on the hardware resources (CPU, memory, etc.), on the software (system programs, compilers, etc.), and on the organization and management of these resources. In view of the increasing complexity of computing systems, it is more and more difficult to predict their performance indices based on analytical queueing models. In such models, it is convenient to represent the resources as servers and the programs as customers. A model of a parallel processing system is a system which is expandable in a vertical and horizontal manner and can be treated as a cluster serving a single queue of waiting jobs. A job is modeled as a sequence of independent stages which must be processed, where the number of processors desired by the job in each stage may be different. If, for some stage, the job in service requires fewer processors than the system provides, then the job will occupy the processors according to its need and the other processors will be idle for that stage. If, for some other stage, the job in service requires more processors than the system provides, then it will use all the processors in the system for an extended period of time such that the total work served in that stage is conserved.

Many researchers have extensively investigated parallel processing systems via queue-theoretic approaches. Al-Saqabi et al. [1] established a distributed scheduling algorithm that tracks the available workstations, i.e. the workstations not being used by their owners in networks, and acts upon those workstations by scheduling processes of parallel applications onto them. Guan and Cheung [2] constructed a massively parallel processing system, which has drawn a lot of attention to an important feature affecting the performance and characteristics of the architecture: the interconnection of multiple processors.


Jean-Marie et al. [5] introduced a hybrid analytical approach using techniques from the theories of both stochastic task graphs and queueing networks. Jozwiak and Jan [6] discussed a quality-driven, model-based multiprocessor accelerator design method that adequately addresses the architecture design issues of hardware multiprocessors for modern, highly demanding embedded applications. Jan and Jozwiak [7] studied communication architectures for massively parallel hardware multiprocessors. A systematic framework and a corresponding methodology for workload modeling of parallel systems were proposed by Kotsis [8]. Mohapatra et al. [10] proposed a structure in which the processors are divided into groups, or clusters, organized in several stages. Maheshwari and Shen [11] established a clustering algorithm wherein all the clusters have a balanced amount of computation load and there is only one communication path between any pair of clusters. Nassar [12] evaluated the throughput of several multibus systems as a discrete-time Markov chain under different working conditions. Reijns [13] considered the delay effect caused by memory interference in a parallel processing system with shared memory, implemented using a machine repair queueing model. Tomic [14] gave the matrix representation of the linear evolution operator of a certain class of parallel processing systems, which can be used effectively as a performance prediction tool for modern parallel processing systems. Wasserman et al. [15] studied the problem of dynamic allocation of the resources of a general parallel processing system comprised of several heterogeneous processors.

The rest of the paper is organized as follows. The model description is given in section 2. Section 3 describes the governing equations and the performance analysis. The conclusion is presented in section 4.

2. MODEL DESCRIPTION

Every computer consists of a set of processors (CPUs) P1, P2, P3, ..., Pn and m ≥ 0 shared memory units M1, M2, M3, ..., Mm that communicate via an interconnection network N, as illustrated in figure 1. The memory units constitute a global main memory that provides a convenient message depository for processor-to-processor communication. A system with this arrangement is called a shared memory computer. A global shared memory can be a serious bottleneck, particularly when the processors share large amounts of information, since normally only one processor can access a given memory module at a time. If the processors have their own local memories, then the global memory can be reduced in size, or even eliminated completely. To separate the functions of processing and memory, a CPU with no associated main memory, but with other temporary storage units such as register files and caches, is referred to as a processing element (PE).
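As a rough numerical illustration of the memory-contention bottleneck mentioned above (an added sketch, not part of the original analysis), a single memory module can be treated as an M/M/1 queue fed by n processors; the per-processor request rate lam and the module service rate mu below are assumed example values.

# Illustrative sketch: contention at a single shared memory module,
# modeled as an M/M/1 queue (only one processor can be served at a time).
def memory_module_delay(n_processors: int, lam: float, mu: float) -> float:
    """Mean time a request spends at the module (waiting plus service)."""
    total_rate = n_processors * lam          # aggregate request rate at the module
    rho = total_rate / mu                    # module utilization
    if rho >= 1.0:
        raise ValueError("module is saturated (rho >= 1)")
    return 1.0 / (mu - total_rate)           # M/M/1 sojourn time

# Example: 8 processors, 0.1 requests per time unit each, service rate 1.0.
# The delay grows rapidly as the aggregate request rate approaches the service rate.
print(memory_module_delay(8, 0.1, 1.0))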


Figure 1: Shared Memory (processing elements PE1, PE2, PE3, ..., PEn connected to the memory M through the interconnection network N)


The basic cluster is shown in figure 2. Each processing unit has a local memory for its own computation, and there is a shared memory for facilitating the communication between the processors. A horizontal communication network (HCN) is used for transmitting data between the processors and the shared memory. Moreover, the basic cluster includes a unit for I/O operations and a unit for supervising and managing the processors. A vertical communication network (VCN) is used for transmitting control signals and for vertical expansion of the system.

The basic cluster can be expanded in two ways: (i) by increasing the number of processing units, or (ii) by using several basic clusters with one additional memory that is shared by those clusters. In the resulting two-stage system, it must be noted that in the second level of the system there is an HCN that connects the VCN of each basic cluster to SM2. The units that are located inside the basic clusters are indicated by (SM1, HCN1, ...), and the units that are located outside of the clusters are indicated by (SM2, HCN2, ...).

Figure 2: Basic Cluster (processors P1, P2, ..., PN with local memories LM1, LM2, ..., LMN, a shared memory SM1, an I/O unit and a Manager unit, connected by the Horizontal Communication Network and the Vertical Communication Network)


Figure 3: Two Stage System (basic clusters, each with its SM1, HCN1, VCN1, I/O and Manager units, connected through Horizontal Communication Network 2 and Vertical Communication Network 2 to the second-level shared memory SM2)
This method can expand the system vertically, constructing an s-stage system. A cluster in the ith stage of the s-stage system is depicted in figure 4. Here a cluster includes some processing clusters (PCs), one I/O cluster and one managing cluster. There are two interconnection networks, HCNi and VCNi, which transmit data inside and outside of the clusters, respectively. Such systems can be expanded vertically by increasing the number of stages, or horizontally by increasing the number of PCs in each level. In a system based on this multistage clustering structure (MSCS), if the number of PCs that make up a cluster is equal for all clusters of the ith stage, the system is known as homogeneous at level i. If the system is homogeneous at all levels, it is called homogeneous; on the other hand, if it is non-homogeneous in at least one stage, it is recognized as non-homogeneous or heterogeneous.
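This homogeneity condition can be stated compactly. The following minimal sketch is an added illustration (not from the paper) and assumes the system is described by a list giving, for each stage, the PC counts of its clusters.

# Hypothetical helper: decide whether an MSCS is homogeneous.
# stages[i] is the list of PC counts of the clusters in stage i.
from typing import List

def homogeneous_at_level(cluster_sizes: List[int]) -> bool:
    """A stage is homogeneous if every cluster has the same number of PCs."""
    return len(set(cluster_sizes)) <= 1

def system_is_homogeneous(stages: List[List[int]]) -> bool:
    """The system is homogeneous only if every stage is homogeneous."""
    return all(homogeneous_at_level(stage) for stage in stages)

# Example: stage 1 has three clusters of 4 PCs, stage 2 has clusters of 4 and 5 PCs,
# so the system is heterogeneous (non-homogeneous in at least one stage).
print(system_is_homogeneous([[4, 4, 4], [4, 5]]))   # False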


Figure 4: Cluster in the ith stage of an s-stage system (processing clusters PCi-1, I/O cluster I/OCi-1 and managing cluster MCi-1 connected to SMi through Horizontal Communication Network i and to Vertical Communication Network i; the vertical and horizontal expansion paths are indicated)


3. THE PERFORMANCE ANALYSIS

For evaluating the performance of the system, consider a system constructed as a homogeneous MSCS. In this system any processor performs a piece of the main program, which is called the processor's job. During the job execution, it is probable that a job needs to communicate with the other jobs. Therefore several queues can be constructed, one for each interconnection network and each basic cluster. Consider the following assumptions for analyzing the system:

- The processors themselves generate the inter-job communication requests.
- The time between two consecutive requests of a processor is exponentially distributed with parameter λ.
- Ci is the number of PCs in the ith stage of the system, and C0 is the number of processors in each basic cluster.
- The access time to the shared memories in the ith stage is exponentially distributed with parameter μmi.
- The destination of each request is uniformly distributed between the processors' jobs, and the probability of an outgoing request from the ith stage is denoted by Pi.
- The service times of the interconnection networks in the ith stage are exponentially distributed with parameters μhi and μvi for HCNi and VCNi, respectively.
- Conflicts over the memory modules and the HCN and VCN interconnection networks are resolved by queueing centers, each modeled as an M/G/1 queue.
- Requesting processors must wait until they are offered service as per the above scheme, and during the waiting period they cannot generate any other request.
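For later reference, these assumptions can be gathered in a single parameter record. The sketch below is an added illustration only; the field names (lam, P, C, mu_h, mu_v, mu_m) and the example values are assumptions of this example, not notation fixed by the paper.

# Hypothetical container for the model parameters used in Section 3.
from dataclasses import dataclass
from typing import List

@dataclass
class MSCSParams:
    lam: float          # request rate of a single processor (exponential inter-request times)
    P: List[float]      # P[i-1]: probability Pi that a request leaves stage i for the outer cluster
    C: List[int]        # C[0]: processors per basic cluster; C[i]: PCs in stage i
    mu_h: List[float]   # service rates of HCNi
    mu_v: List[float]   # service rates of VCNi
    mu_m: List[float]   # service rates of the shared memories SMi

# Example instance for a 3-stage homogeneous system (all values assumed).
params = MSCSParams(lam=0.05,
                    P=[0.3, 0.2, 0.0],   # the last stage sends nothing outward
                    C=[8, 4, 2],
                    mu_h=[2.0, 1.5, 1.0],
                    mu_v=[2.0, 1.5, 1.0],
                    mu_m=[1.0, 1.0, 1.0])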
In the parallel processing system, the input rate of each stage must be computed, and the queueing problem is analyzed by developing the M/G/1 model. For analyzing the design of MPPs with a large number of units, the computational effort for the corresponding closed queueing network would be very large. We therefore apply a queueing network methodology for analyzing the closed queueing network and determine the input rate of each service center as a function of the input rate of the previous center. This technique can reduce the calculation and simulation time.

Figure 5: Multi-stage cluster MPPs with an s-stage system (the queueing network of service centers HCN1, VCN1, SM1, HCN2, VCN2, SM2, ..., HCNs, SMs, with service rates μhi, μvi, μmi and routing probabilities Pi and 1 - Pi)


As shown in figure 5, all the requests departing from HCNi pass through SMi with probability one. Therefore, we compute the input request rates of the VCNs and HCNs. A processor's requests are directed to the service centers HCN1 and VCN1 with probabilities (1 - P1) and P1, respectively. If the request rate of a processor is λ, the input rates of HCN1 and VCN1 originating from that processor are λ(1 - P1) and λP1. Since there are (C0 - 1) other processors in each basic cluster, the requests arriving at HCN1 and VCN1 from the other processors in the same cluster, which contribute to λh1 and λv1, will be λ(1 - P1)(C0 - 1) and λP1(C0 - 1), respectively. So the total request rates of the processors arriving at the service centers in the first stage can be computed by the following equations:


λv1 = P1 λ (C0 - 1) + λ P1 = C0 λ P1                                   (1)

λm1 = λh1 = (1 - P1) λ (C0 - 1) + λ (1 - P1) = C0 λ (1 - P1)           (2)

The input request rate at the ith stage from each PC is λv(i-1), so

λvi = Pi λv(i-1) (C(i-1) - 1) + Pi λv(i-1) = C(i-1) Pi λv(i-1)         (3)

λmi = λhi = (1 - Pi) λv(i-1) (C(i-1) - 1) + (1 - Pi) λv(i-1) = C(i-1) (1 - Pi) λv(i-1)    (4)

In the last stage there is no request for an outer cluster, so that

λvs = 0                                                                (5)

λms = λhs = C(s-1) (1 - Ps) λv(s-1) + C(s-1) Ps λv(s-1) = C(s-1) λv(s-1)    (6)
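A minimal sketch of how equations (1)-(6) propagate the request rates stage by stage is given below. This is illustrative code with assumed example values; the list indices follow the equations, with stage 1 fed by the processor request rate λ.

# Hypothetical implementation of equations (1)-(6): input rates of the
# service centers at each stage, driven by the processor request rate lam.
from typing import List, Tuple

def stage_rates(lam: float, P: List[float], C: List[int]) -> Tuple[List[float], List[float]]:
    """Return (lam_v, lam_h) for stages 1..s; lam_m equals lam_h at every stage.
    P[i-1] is the outgoing probability Pi, and C[i-1] is C(i-1) in the equations."""
    s = len(P)
    lam_v = [0.0] * (s + 1)                      # lam_v[i] = lambda_{v,i}; index 0 unused
    lam_h = [0.0] * (s + 1)                      # lam_h[i] = lambda_{h,i} = lambda_{m,i}
    lam_v[1] = C[0] * lam * P[0]                 # equation (1)
    lam_h[1] = C[0] * lam * (1.0 - P[0])         # equation (2)
    for i in range(2, s):                        # equations (3) and (4)
        lam_v[i] = C[i - 1] * P[i - 1] * lam_v[i - 1]
        lam_h[i] = C[i - 1] * (1.0 - P[i - 1]) * lam_v[i - 1]
    if s >= 2:                                   # equations (5) and (6) for the last stage
        lam_v[s] = 0.0
        lam_h[s] = C[s - 1] * lam_v[s - 1]
    return lam_v[1:], lam_h[1:]

# Example with assumed values: 3 stages, 8 processors per basic cluster.
print(stage_rates(0.05, [0.3, 0.2, 0.0], [8, 4, 2]))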

Now consider the M/G/1 model to calculate the queue length at each node for all stages; the average number of waiting processors in the system can then be computed. Using the Pollaczek-Khinchine formula gives

L = ρ + (ρ² + λ² σs²) / (2 (1 - ρ))                                    (7)

where ρ is the utilization of the service center and σs² is the variance of its service time. The waiting processors are not able to generate requests, so in this situation the effective processor request rate is lower than the nominal rate; the effective request rate decreases in the same ratio as the fraction of processors that remain active in the system. L and the effective rate λ are therefore calculated successively until their changes in two consecutive steps become negligible.
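The Pollaczek-Khinchine expression (7) and the fixed-point idea just described can be sketched as follows. This is illustrative only; the convergence tolerance and the way the effective rate is scaled by the fraction of non-waiting processors are assumptions of this example, not taken from the paper.

# Illustrative sketch (not the paper's exact algorithm): Pollaczek-Khinchine
# mean queue length for an M/G/1 center, and a fixed-point iteration for the
# effective request rate when waiting processors stop generating requests.
def pk_queue_length(lam: float, mean_s: float, var_s: float) -> float:
    """Equation (7): L = rho + (rho**2 + lam**2 * var_s) / (2 * (1 - rho))."""
    rho = lam * mean_s
    if rho >= 1.0:
        raise ValueError("unstable center (rho >= 1)")
    return rho + (rho * rho + lam * lam * var_s) / (2.0 * (1.0 - rho))

def effective_rate(lam: float, n_proc: int, mean_s: float, var_s: float,
                   tol: float = 1e-9, max_iter: int = 1000) -> float:
    """Assumed scheme: scale lam by the fraction of processors not waiting,
    and iterate until successive values change negligibly."""
    lam_eff = lam
    for _ in range(max_iter):
        L = pk_queue_length(n_proc * lam_eff, mean_s, var_s)   # waiting + in service
        new_lam_eff = lam * max(n_proc - L, 0.0) / n_proc
        if abs(new_lam_eff - lam_eff) < tol:
            break
        lam_eff = new_lam_eff
    return lam_eff

# Example with assumed values: 8 processors, exponential-like service (var = mean**2).
print(effective_rate(lam=0.05, n_proc=8, mean_s=1.0, var_s=1.0))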

After calculating the effective request rate, the waiting time can be determined by Little's formula as

L = λ W, or W = L / λ = 1/μ + (ρ² + λ² σs²) / (2 λ (1 - ρ))            (8)

Here Pvi, Pmi and Phi are the probabilities that a processor request is directed to VCNi, SMi and HCNi, respectively, computed by the following product-type solution:

Pvi = Π(j=0 to i-1) P(j+1)                                             (9)

Pmi = Phi = ((1 - Pi) / Pi) Π(j=0 to i-1) P(j+1)                       (10)

By determining the average waiting time of a processor for each communication request, the processor utilization can be obtained as

PU = 1 / (1 + λ W)                                                     (11)

The total processing power of the system (TPP) is obtained by considering the single processor power (SPP). Thus

TPP = SPP Σi PUi                                                       (12)

where the sum runs over the processors in the system.
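A small numerical sketch of (11) and (12) follows; it is illustrative only. The waiting time W would in practice come from equation (8), and the per-processor values used here are assumed example numbers.

# Hypothetical use of equations (11) and (12): processor utilization from the
# average waiting time per request, and total processing power of the system.
from typing import List

def processor_utilization(lam: float, W: float) -> float:
    """Equation (11) as reconstructed here: PU = 1 / (1 + lam * W)."""
    return 1.0 / (1.0 + lam * W)

def total_processing_power(utilizations: List[float], spp: float) -> float:
    """Equation (12): TPP = SPP * sum of the processor utilizations."""
    return spp * sum(utilizations)

# Example with assumed values: every processor sees W = 2.5 time units per request.
pu = processor_utilization(lam=0.05, W=2.5)           # about 0.889
print(total_processing_power([pu] * 8, spp=1.0))      # about 7.11 for 8 identical processors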

4. CONCLUSIONS

In this investigation, the performance of a parallel processing system has been modeled as a sequence of stages, each of which requires a certain integral number of processors for a certain interval of time. This work proposed a new structure and developed an analytical model for a massive parallel processing system based on queueing theory. The system performance metrics may provide insights to system designers and decision makers to improve the system at optimal cost.

REFERENCES

1. Al-Saqabi, K., Sarwar, S. and Saleh, K.: Distributed gang scheduling in networks of heterogeneous workstations, J. Computer Communications, Vol. 20, No. 5, pp. 338-348. (1997)
2. Guan, H. and Cheung, T.-Y.: Efficient approaches for constructing a massively parallel processing system, J. of Systems Architecture, Vol. 46, No. 13, pp. 1185-1190. (2000)
3. Hayes, J. P.: Computer Architecture and Organization, McGraw-Hill. (1998)
4. Hwang, K. H. and Xu, Z.: Scalable Parallel Computing, McGraw-Hill. (1998)
5. Jean-Marie, A., Lefebvre-Barbaroux, S. and Liu, Z.: An analytical approach to the master-slave computational models, J. Parallel Computing, Vol. 24, No. 5-6, pp. 841-862. (1998)
6. Jozwiak, L. and Jan, Y.: Design of massively parallel hardware multi-processors for highly demanding embedded applications, J. of Microprocessors and Microsystems, Vol. 37, pp. 1155-1172. (2013)
7. Jan, Y. and Jozwiak, L.: Scalable communication architectures for massively parallel hardware multi-processors, J. of Parallel and Distributed Computing, Vol. 72, pp. 1450-1463. (2012)
8. Kotsis, G.: A systematic approach for workload modeling for parallel processing systems, J. Parallel Computing, Vol. 22, No. 13, pp. 1771-1787. (1997)
9. Kleinrock, L.: Queueing Systems, Vol. II: Computer Applications, New York: Wiley. (1975)
10. Mohapatra, P., Das, C. R. and Feng, T. Y.: Performance analysis of cluster-based multiprocessors, IEEE Trans. on Computers, Vol. 43, pp. 109-114. (1994)
11. Maheshwari, P. and Shen, H.: An efficient clustering algorithm for partitioning parallel programs, J. Parallel Computing, Vol. 24, No. 5-6, pp. 893-909. (1998)
12. Nassar, H.: A Markov model for multibus multiprocessor systems under asynchronous operation, J. Information Processing Letters, Vol. 54, No. 1, pp. 11-16. (1995)
13. Reijns, G. L. and van Gemund, A. J. C.: Analysis of a shared-memory multiprocessor via a novel queueing model, J. of Systems Architecture, Vol. 45, No. 14, pp. 1189-1193. (1999)
14. Tomic, D.: Spectral performance evaluation of parallel processing systems, J. Parallel Computing, Vol. 13, No. 1, pp. 25-38. (2002)
15. Wasserman, K. M., Michailidis, G. and Bambos, N.: Optimal processor allocation to differentiated job flows, J. Performance Evaluation, Vol. 63, No. 1, pp. 1-14. (2006)
