Let the three processors be labelled as P1, P2 and P3. The process being run by each
processor including procedures for acquiring and releasing a lock L is as follows.
{{LL} SC} LD ST ST
Protocol steps for each type of access:
• read hit: none
• read miss (compulsory): read request; mem to requesting cache; ack
• read miss (due to invalidation): read request; owner cache to mem; mem to requesting cache; ack
• write hit: invalidate; ack
• write miss (compulsory): write request; mem to requesting cache; ack
• write miss (due to invalidation): write request; owner cache to mem; mem to requesting cache; ack
[4 marks for protocol steps]
Performance can be improved by the following changes.
• A better coherence protocol, such as the Berkeley or the Illinois protocol, may be used.
• A more relaxed memory model, for example TSO or Processor Consistency, may be used.
[2 marks]
3. Consider a two stage dynamic network shown in the figure with each stage consisting of k
cross-bar switches of size k × k. Is it a blocking network or non-blocking network?
Illustrate your answer.
Suppose each input to the 1st stage has same arrival rate of messages with Poisson
distribution. There are buffers to queue up the requests at the inputs of the 1st stage but no
buffers elsewhere. Derive an expression for the throughput of this network, assuming that all
messages are of same size.
[Figure: two stage network, with switches numbered 1, 2, ..., k in the 1st stage and in the 2nd stage]
(12)
Solution:
The network can connect any input to any output individually, but blocking may occur when
several messages have to be routed at the same time, so it is a blocking network. Let <i, j>
denote the jth port (on the input or output side, as the case may be) of the ith switch (in the
1st stage or the 2nd stage, as the case may be). The route from <i1, j1> at the primary input to
<i2, j2> at the primary output has to go through <i1, i2> at the 1st stage output and <i2, i1> at
the 2nd stage input. This requires the i1th switch in the 1st stage to switch from j1 to i2 and
the i2th switch in the 2nd stage to switch from i1 to j2. Because this is the only path between
the two switches, the route conflicts with every other message that needs routing from <i1, x>
to <i2, y>, or from <x, y> to <i2, j2>, for any x and y.
[4 marks for this part]
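The conflict rule in the solution above can be illustrated with a small sketch. The function names `link_used` and `conflicts` are hypothetical, chosen only for this illustration; a message is modelled as a pair of <switch, port> tuples, and the sketch assumes exactly one link from each 1st stage switch to each 2nd stage switch, as the route description implies.

```python
def link_used(src, dst):
    """Inter-stage link taken by a message from src = <i1, j1> to dst = <i2, j2>.

    The link is identified by (i1, i2): output port i2 of 1st stage switch i1,
    which feeds input port i1 of 2nd stage switch i2.
    """
    (i1, _j1), (i2, _j2) = src, dst
    return (i1, i2)


def conflicts(msg_a, msg_b):
    """True if two distinct messages cannot be routed in the same cycle."""
    (src_a, dst_a), (src_b, dst_b) = msg_a, msg_b
    same_link = link_used(src_a, dst_a) == link_used(src_b, dst_b)
    same_output = dst_a == dst_b  # both want the same primary output
    return same_link or same_output


# Distinct inputs and distinct outputs, yet both need the link from
# 1st stage switch 1 to 2nd stage switch 2, so one of them is blocked.
a = ((1, 1), (2, 1))
b = ((1, 2), (2, 2))
# Same source switch but a different destination switch: no conflict with a.
c = ((1, 2), (3, 2))

print(conflicts(a, b), conflicts(a, c))
```

Messages `a` and `b` use different input ports and different output ports, so a non-blocking network would carry both; here they collide on the shared inter-stage link.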
Now let r be the probability of arrival of a message at any input of the 1st stage in one cycle,
where one cycle corresponds to the service time of a message (assumed to be a constant because
of uniform message size).
Probability of i simultaneous messages arriving at the inputs of a switch in the 1st stage
= q(i) = $\binom{k}{i} r^i (1 - r)^{k-i}$
Expected no. of requests accepted out of these i requests
= E(i) = $k \left[ 1 - \left( \frac{k-1}{k} \right)^{i} \right]$
Throughput at the input of the 1st stage = $k^2 r$
Expected throughput at the output of the 1st stage
= $k \sum_{i=0}^{k} E(i)\, q(i) = k^2 \left[ 1 - \left( 1 - \frac{r}{k} \right)^{k} \right]$
[3 marks for these basic expressions]
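The closed form for the 1st stage output follows from the binomial theorem. The intermediate steps are not spelled out in the solution; the following is a sketch of them, using only q(i) and E(i) as defined above.

```latex
\begin{aligned}
\sum_{i=0}^{k} E(i)\, q(i)
  &= k \sum_{i=0}^{k} \binom{k}{i} r^{i} (1-r)^{k-i}
     \left[ 1 - \left( \frac{k-1}{k} \right)^{i} \right] \\
  &= k \left[ 1 - \sum_{i=0}^{k} \binom{k}{i}
     \left( \frac{r(k-1)}{k} \right)^{i} (1-r)^{k-i} \right] \\
  &= k \left[ 1 - \left( \frac{r(k-1)}{k} + 1 - r \right)^{k} \right]
   = k \left[ 1 - \left( 1 - \frac{r}{k} \right)^{k} \right]
\end{aligned}
```

Multiplying by the k switches of the 1st stage gives the factor k^2 in the expected throughput.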
Let us use s to denote $1 - \left( 1 - \frac{r}{k} \right)^{k}$.
Throughput at the output of the 1st stage = $k^2 s$
= Throughput at the input of the second stage.
By similar arguments, throughput at the output of the 2nd stage
= $k^2 \left[ 1 - \left( 1 - \frac{s}{k} \right)^{k} \right]$
[3 marks for extending the analysis to 2 stages]
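As a numerical check, the expressions above (without resubmission) can be evaluated directly. This is a minimal sketch: the function names are hypothetical, and r is taken as the per-cycle arrival probability at each input, as defined in the solution.

```python
from math import comb


def q(i, k, r):
    """Probability that i of the k inputs of one switch carry a message."""
    return comb(k, i) * r**i * (1 - r) ** (k - i)


def expected_accepted(i, k):
    """E(i): expected number of requests accepted out of i, for a k x k switch."""
    return k * (1 - ((k - 1) / k) ** i)


def stage_rate(r, k):
    """Closed form s = 1 - (1 - r/k)^k: per-output rate of one stage."""
    return 1 - (1 - r / k) ** k


def stage_rate_by_sum(r, k):
    """Same quantity from the explicit sum over i, divided by the k outputs."""
    return sum(expected_accepted(i, k) * q(i, k, r) for i in range(k + 1)) / k


def two_stage_throughput(r, k):
    """k^2 [1 - (1 - s/k)^k], with s the output rate of the 1st stage."""
    s = stage_rate(r, k)
    return k**2 * stage_rate(s, k)


if __name__ == "__main__":
    k, r = 4, 0.5
    print(stage_rate(r, k), stage_rate_by_sum(r, k))
    print(two_stage_throughput(r, k), "out of", k**2 * r, "offered")
```

The two values printed on the first line should agree to within rounding, which confirms that the sum and the closed form describe the same quantity.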
This does not include the effects of resubmission (resulting from conflicts and rejections) and
the queuing. The effect of resubmissions is to increase the throughput at the input of the
network. Queuing affects the delay but not the throughput.
Let us say r effectively increases to r', where $r' = \frac{r}{r + P_A (1 - r)}$.
Then s increases to s', where $s' = 1 - \left( 1 - \frac{r'}{k} \right)^{k}$, and the overall
throughput increases to $k^2 \left[ 1 - \left( 1 - \frac{s'}{k} \right)^{k} \right]$.
Here $P_A = \frac{1}{r} \left[ 1 - \left( 1 - \frac{s'}{k} \right)^{k} \right]$.
The queuing buffers allow this throughput to be maintained. In the absence of buffers, the
message sources would have slowed down.
[2 marks for accounting for resubmissions]
4. Show a systolic array to multiply two band matrices a and b, where each matrix has a band
of width 3 (aij = 0 and bij = 0 for i < j-1 or i > j+1). The dimension of each matrix is n × n.
Find the number of steps required for computation, starting from the time when the first
element of each matrix enters the systolic array.
(6)
Solution:
Each matrix has only 3 non-zero diagonals. Therefore, the systolic array to compute the
product will have only 3 × 3 = 9 processing elements as shown.
[Figure: systolic array of 3 × 3 processing elements]
The position of various matrix elements at T = 0 when the first elements of each matrix enter the
systolic array is as shown next.
[Figures: positions of the matrix elements in the systolic array at T = 0, T = 1 and T = 3]
The first element of the result, C11, comes out at T = 3. The next main diagonal element, C22,
comes out at T = 6. Successive main diagonal elements thus emerge 3 steps apart, so the last
element Cnn comes out at T = 3n.