PART-1
Answer any one full question.
1) Give Flynn's classification of various computer architectures. Clearly explain the features
of each with conceptual diagrams.
(10 Marks)
[Figure: a single CU, PU and MU with I/O, linked by IS and DS]
MU = memory unit
IS = instruction stream
DS = data stream
SIMD:
[Figure: the CU sends the IS to PE1 ... PEn; each PE exchanges DS with its local memory LM1 ... LMn; the program and the data sets are loaded from the host]
PE = Processing Elements
LM = Local Memory
Fig 1b: SIMD architecture (with distributed memory)
3. MIMD (multiple instruction streams over multiple data streams):
The term parallel computer is reserved for MIMD machines, as shown in Fig 1c.
[Fig 1c: MIMD architecture: CU1 ... CUn each send an IS to PU1 ... PUn; the PUs exchange DS with the shared memory; I/O]
4. MISD (multiple instruction streams over a single data stream):
The same data stream flows through an array of processors executing different
instruction streams.
[Figure: CU1, CU2 ... CUn each send an IS to PU1, PU2 ... PUn; one DS passes from the memory (program and data) through the PUs in turn; I/O]
Of the four machine models, most parallel computers assume the MIMD model
for general-purpose computations.
The SIMD and MISD models are more suitable for special-purpose computations.
Therefore MIMD is the most popular model, SIMD next, and MISD is the
least popular model.
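The four classes above differ only in how many instruction streams and data streams are active. A minimal Python sketch of that rule follows; the function name `flynn_class` is hypothetical and is used only for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given stream counts.

    Assumption: "single" means exactly one stream and "multiple"
    means two or more.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 8), (4, 1), (4, 4)]:
        print(streams, flynn_class(*streams))
```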
2) a) A 40 MHz processor executes 200000 instructions with the following
instruction mix and the CPI needed for each instruction type:
Instruction type      CPI    Instruction count
Integer arithmetic           60%
Data transfer                18%
Floating point               12%
Control transfer             10%
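MIPS rate:
$\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{I_c}{I_c \times \text{CPI} \times \tau \times 10^6} = \dfrac{1}{3.14 \times \frac{1}{40 \times 10^6} \times 10^6} = 12.7388$ MIPS.
Execution time:
$T = I_c \times \text{CPI} \times \tau = 200 \times 10^3 \times 3.14 \times \dfrac{1}{40 \times 10^6}$
$T = 15.7$ msec.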
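The two results can be checked numerically. The sketch below uses the values from the worked solution (clock rate 40 MHz, effective CPI 3.14, 200000 instructions); the function names are hypothetical.

```python
def mips_rate(clock_hz: float, cpi: float) -> float:
    """MIPS = Ic / (T * 10^6) = f / (CPI * 10^6), since T = Ic * CPI / f."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """T = Ic * CPI * tau, where tau = 1 / f is the cycle time in seconds."""
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    f = 40e6        # 40 MHz clock
    cpi = 3.14      # effective CPI taken from the worked solution
    ic = 200_000    # instruction count
    print(f"MIPS rate      = {mips_rate(f, cpi):.4f}")
    print(f"Execution time = {execution_time(ic, cpi, f) * 1e3:.1f} msec")
```

The printed MIPS rate is rounded rather than truncated, so the last digit can differ from the 12.7388 quoted above.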
2) b) Differentiate between implicit and explicit parallelism with a neat sketch.
(5 Marks)
Sol:
Implicit parallelism:
An implicit approach uses a conventional language, such as C, Fortran, Lisp or Pascal,
to write the source program.
The sequentially coded source program is translated into parallel object code by a
parallelizing compiler.
As shown in Fig 5 (a), this compiler must be able to detect parallelism and assign
target machine resources.
This compiler approach has been applied in programming shared memory
multiprocessors.
This approach requires less effort on the part of the programmer.
[Fig 5 (a): Programmer -> Source code written in sequential languages (C, Fortran, Lisp, or Pascal) -> Parallelizing compiler -> Parallel object code -> Execution by runtime system]
multistage network.
UMA model is suitable for general purpose, time sharing application by multiple
users.
Coordination of parallel events, synchronization and communication among
processors are done through shared variables.
When all the processors have equal access to all the peripherals, the system is
called a symmetric multiprocessor.
In this case all the processors are equally capable of running the executive programs.
In an asymmetric multiprocessor, only one or a subset of the processors is
executive-capable.
The remaining processors have no I/O capability and thus are called attached
processors.
An executive or a master processor can execute the OS and handle I/O.
Attached processors execute user codes under the supervision of the master processor.
[Figure: processors P1, P2, ..., Pn connected to I/O and to the shared memory modules SM1, ..., SMn]
Fig 2: The UMA multiprocessor model.
ii)
multiprocessor system.
In this case there are three memory access patterns. They are
a. Local memory access (fastest).
b. Global memory access.
c. Remote memory access (slowest).
In this model processors are divided into several clusters.
Each cluster is itself a UMA or a NUMA multiprocessor.
The clusters are connected to global shared memory modules. The entire system is
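[Figure: processors P1, P2, ... with local memories LM1, LM2, ..., LMn on an interconnection network; hierarchical cluster model with Cluster 1 ... Cluster N, each holding processors (P) and cluster shared memories (CSM) joined by a cluster interconnection network (CIN), all connected to global shared memory (GSM) modules]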
pipelines.
If the instruction is decoded as a vector operation, it will be sent to the vector
control unit. This control unit will supervise the flow of vector data between the
main memory and the vector functional pipelines. The vector data flow is
coordinated by the control unit. A number of vector functional pipelines may be
built into a vector processor.
In a vector supercomputer, the vector processor can be built on one of two
architectures, namely
1. Register-to-register architecture
2. Memory-to-memory architecture
Register-to-register architecture:
Here vector registers are used to hold the vector operands, intermediate and
vector registers.
All vector registers are programmable in user instructions.
Each vector register is equipped with a component counter which keeps track
Memory-to-memory architecture:
In this architecture, the vector operands and intermediate results are copied directly
into the memory, and they are retrieved from the memory as and when required.
[Figure: vector supercomputer block diagram with host computer, mass storage, I/O (user), main memory (program and data), scalar instructions, vector processor, and the scalar and vector instruction and data paths]
(5+5Marks)
Sol:
a) There are 5 types of data dependencies. They are as follows:
(1) Flow dependence:
A statement S2 is flow-dependent on the statement S1 if an execution path exists
(3) Output dependence:
Two statements are output dependent if they produce the same output variable.
Ex:
S1: load R1, A
S2: move R1, R3
(4) I/O dependence:
Read and write are I/O statements. I/O dependence occurs not because the
same variable is involved but because the same file is referenced by both I/O
statements.
(5) Unknown dependence:
The dependence relation between two statements cannot be determined in the
following situations.
The subscript of a variable is itself subscripted.
The subscript does not contain the loop index variable.
A variable appears more than once with subscripts having different coefficients of
the loop variable.
The subscript is nonlinear in the loop index variable.
When one or more of these conditions exists, a conservative assumption is to
claim unknown dependence among the statements involved.
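When the read and write sets of two statements are known exactly, the flow and output dependences described above can be detected with set intersections. The Python sketch below assumes straight-line code in which S1 executes before S2. It also reports antidependence (S2 overwrites something S1 reads), which is the standard companion of the two types. The function name `classify_dependences` is hypothetical.

```python
def classify_dependences(s1_reads, s1_writes, s2_reads, s2_writes):
    """Return the data dependences of S2 on S1, assuming S1 runs first.

    Each argument is a set of variable or register names.
    """
    kinds = []
    if s1_writes & s2_reads:
        kinds.append("flow")    # S1 produces a value that S2 consumes
    if s1_reads & s2_writes:
        kinds.append("anti")    # S2 overwrites a value that S1 reads
    if s1_writes & s2_writes:
        kinds.append("output")  # both statements write the same variable
    return kinds


if __name__ == "__main__":
    # S1: load R1, A  (reads A, writes R1)
    # S2: move R1, R3 (reads R3, writes R1; destination-first convention assumed)
    print(classify_dependences({"A"}, {"R1"}, {"R3"}, {"R1"}))
```

For subscripted variables that fall under the unknown-dependence cases, the sets are not known exactly, and the conservative choice is to report a dependence.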
PART-3
Answer any Two full questions.
6) Trace the following program to detect the parallelism using Bernstein's conditions.
P1: C = D x E
P2: M = G + C
P3: A = B + C
P4: C = L + M
P5: F = G / E
Assume that each step requires one cycle to execute and two adders are available.
Compare serial and parallel execution of the above program.
(10 Marks)
Sol: Bernstein revealed a set of conditions based on which two processes can execute in
parallel.
P1, P2 - processes
I1, I2 - input sets
O1, O2 - output sets
P1 || P2 if and only if
$I_1 \cap O_2 = \emptyset$
$I_2 \cap O_1 = \emptyset$
$O_1 \cap O_2 = \emptyset$
P1 || P2 || . . . . || Pk if and only if Pi || Pj for all $i \neq j$.
[Fig (a): Sequential execution in 5 steps]
P1 || P5, P2 || P3, P2 || P5, P4 || P5, P5 || P3
Collectively, P2 || P3 || P5 because P2 || P3, P2 || P5 and P3 || P5.
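The pairwise results can be reproduced mechanically. The sketch below encodes the input and output sets of P1 to P5 as read off the program and applies the three Bernstein conditions to every pair. The names `PROCESSES`, `bernstein_parallel` and `parallel_pairs` are hypothetical.

```python
from itertools import combinations

# (input set, output set) for each process, read off the program text.
PROCESSES = {
    "P1": ({"D", "E"}, {"C"}),  # C = D x E
    "P2": ({"G", "C"}, {"M"}),  # M = G + C
    "P3": ({"B", "C"}, {"A"}),  # A = B + C
    "P4": ({"L", "M"}, {"C"}),  # C = L + M
    "P5": ({"G", "E"}, {"F"}),  # F = G / E
}


def bernstein_parallel(p, q):
    """True if I1 & O2, I2 & O1 and O1 & O2 are all empty."""
    (i1, o1), (i2, o2) = p, q
    return not (i1 & o2) and not (i2 & o1) and not (o1 & o2)


def parallel_pairs(processes):
    """List every pair of process names that satisfies Bernstein's conditions."""
    return [
        (a, b)
        for a, b in combinations(processes, 2)
        if bernstein_parallel(processes[a], processes[b])
    ]


if __name__ == "__main__":
    for a, b in parallel_pairs(PROCESSES):
        print(f"{a} || {b}")
```

The check is pairwise only. A group such as P2 || P3 || P5 is parallel when every pair inside it passes, as stated above.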
7) Explain hardware and software parallelism with an example.
(10 Marks)
Sol:
Hardware parallelism:
This refers to parallelism defined by machine architecture and hardware multiplicity.
One way to characterize the parallelism is by the number of instruction issues per
machine cycle.
If a processor issues k instructions per machine cycle, it is called a k-issue processor.
A conventional processor takes one or more machine cycles to issue a single instruction.
Such processors are called one-issue machines, with a single instruction pipeline in the processor.
A multiprocessor system built with n k-issue processors should be able to handle a
maximum of nk threads of instructions simultaneously.
Software parallelism:
It is defined by the control and data dependences of programs.
The degree of parallelism is revealed in the program profile or in the program flow graph.
parallelism.
Assume two multiplier units, two add/subtract units and a 2-issue processor in which
one memory access (load/store) and one arithmetic operation can execute simultaneously.
Calculate the average hardware parallelism.
[Fig (a): Software parallelism: loads L1, L2, L3, L4; three 1-cycle levels with 4, 2 and 2 operations; result B]
[Fig (b): Hardware parallelism: the loads (L2, L3, L4 visible) and the arithmetic operations scheduled cycle by cycle on the 2-issue processor; results A and B]
7 cycles and 8 operations
Hardware parallelism = 8/7 = 1.14 instructions/cycle
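Average parallelism is the operation count divided by the cycle count. The sketch below applies that to both figures: the software graph shows 4 + 2 + 2 operations over 3 cycles, and the hardware schedule takes 7 cycles for the same 8 operations. The function name `average_parallelism` is hypothetical.

```python
def average_parallelism(operations: int, cycles: int) -> float:
    """Average number of operations completed per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return operations / cycles


if __name__ == "__main__":
    software = average_parallelism(4 + 2 + 2, 3)  # levels of the software graph
    hardware = average_parallelism(8, 7)          # 2-issue processor schedule
    print(f"Software parallelism = {software:.2f} instructions/cycle")
    print(f"Hardware parallelism = {hardware:.2f} instructions/cycle")
```

The gap between the two values shows the mismatch between the parallelism available in the program and what the 2-issue processor can exploit.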
8) Explain how grain packing can be done to compute the sum of the 4 elements in the
resulting product matrix C = A x B, where A and B are 2x2 matrices. Assume the grain size
for multiplication is 101 and the grain size for addition is 8.
(10 Marks)
Sol:
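Grain size of multiplication = 101
Grain size of addition = 8
$C = A \times B$, where
$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$, $B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$, $C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$
[Figure: program graph of the multiplication and addition nodes feeding the final SUM]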
Grain size of U = 210
Grain size of V = 210
Grain size of W = 210
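One reading of the value 210 is that each packed grain combines the two multiplications and the one addition needed for a single element of C, since C_ij = A_i1 x B_1j + A_i2 x B_2j. The sketch below checks only that arithmetic. This reading, and the names `MULTIPLY_GRAIN`, `ADD_GRAIN` and `packed_grain_size`, are assumptions made for illustration and are not taken from the figure.

```python
MULTIPLY_GRAIN = 101  # grain size of one multiplication (from the question)
ADD_GRAIN = 8         # grain size of one addition (from the question)


def packed_grain_size(multiplies: int, adds: int) -> int:
    """Grain size of a node that packs the given fine-grain operations."""
    return multiplies * MULTIPLY_GRAIN + adds * ADD_GRAIN


if __name__ == "__main__":
    # Assumed packing: one element of C = two multiplications + one addition.
    print(packed_grain_size(2, 1))
```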