1. What is pipelining?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution, with each instruction in a different stage of its execution at any given time.
9. What is the impact of hazards on a pipeline, and how will you overcome
each of the three different hazards?
Hazards limit the performance of the pipeline, and each of the three kinds of
hazards has to be handled separately. Structural hazards can be eliminated by
adding more hardware resources, data hazards by data forwarding, and control
hazards by early branch evaluation and branch prediction.
12. What is the effect of pipelining on latency and throughput of the machine?
Pipelining improves the throughput of the entire workload, but it does not reduce
the latency of a single task. The potential speedup of a pipeline is equal to the
number of pipeline stages.
18. What is the difference between Big Endian and Little Endian?
This classification arises from the two different orders in which a computer can
store the bytes of a multi-byte number. In the Little Endian format, the least
significant byte of the number is stored at the lowest address and the most
significant byte at the highest address. In the Big Endian format, the most
significant byte is stored at the lowest address and the least significant byte at
the highest address. In short, in Little Endian the LSB comes first, and in Big
Endian the MSB comes first.
19. List computers where Little Endian and Big Endian are used.
Intel processors in PCs use Little Endian, and most UNIX machines are
Big Endian.
25. What is the speed up of a pipeline and what is the effect of an unbalanced
pipeline?
The speedup of a pipeline is defined as the ratio of the time taken to execute the
instructions on an unpipelined processor to the time taken to execute the same
instructions on a pipelined processor. The ideal speedup of a pipeline is equal to
its number of stages. If the delays through the pipeline stages are unbalanced,
the speedup of the pipeline decreases, since the clock period is set by the slowest
stage. Creating a pipeline with balanced stages is therefore one of the most
difficult design tasks.
The above diagram is the basic schematic of a von Neumann processor. The
processor is the active part of the computer, responsible for data manipulation
and decision making. The processor is made of two components, the datapath and
the control. The datapath is the hardware that performs all the operations, and
the control is the hardware that tells the datapath what to do.
27. What is the difference between single cycle and multi cycle implementation
of a data path?
In a single-cycle implementation, the clock period is determined by the longest
instruction to be executed. The load instruction takes five steps to execute, so
the clock period is set by the load instruction; when we execute an R-type
instruction, which requires only four steps, part of the cycle is idle and time is
wasted.
In a multi-cycle implementation, the clock period is determined by the longest
single step of an instruction, not by the longest instruction itself. The CPI is
exactly one for the single-cycle implementation and greater than one for the
multi-cycle implementation. Even so, the multi-cycle implementation has better
overall performance, because its much shorter clock period more than compensates
for the higher CPI.
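The trade-off above can be illustrated numerically. The step time and instruction mix below are my own illustrative assumptions, not values from the notes:

```python
# Illustrative comparison of single-cycle vs. multi-cycle timing.
# Assumed numbers (not from the notes): each step takes 200 ps; the mix is
# 25% loads (5 steps), 10% stores (4 steps), 52% R-type (4 steps),
# 13% branches (3 steps).
STEP_TIME_PS = 200
mix = {"load": (0.25, 5), "store": (0.10, 4),
       "rtype": (0.52, 4), "branch": (0.13, 3)}

# Single cycle: the clock must fit the longest instruction (the 5-step load),
# and the CPI is exactly 1.
single_cycle_clock = STEP_TIME_PS * max(steps for _, steps in mix.values())
single_cycle_time = single_cycle_clock * 1.0

# Multi cycle: the clock fits one step; the CPI is the average step count.
multi_cycle_cpi = sum(frac * steps for frac, steps in mix.values())
multi_cycle_time = STEP_TIME_PS * multi_cycle_cpi

print(single_cycle_time)  # 1000.0 ps per instruction
print(multi_cycle_time)   # roughly 824 ps per instruction
```

Despite a CPI above 4, the multi-cycle machine finishes the average instruction sooner, because no instruction pays for the full load latency.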
44. Assuming ideal conditions, if I have a pipelined machine with n stages, how
fast does a pipelined instruction execute compared to the same instruction on
an identical machine that is not pipelined?
Ideally, instructions complete n times faster on the pipelined machine. The
latency of a single instruction is not reduced, but the throughput is n times
higher.
45. We have a non-pipelined computer that previously took 2.3 microseconds to
execute an instruction, and now it has a pipeline with three stages that take
.5, .8, and .9 microseconds each. What is the speedup?
The pipeline clock period is set by the slowest stage, 0.9 microseconds, so the
speedup is 2.3 / 0.9, which is approximately 2.56.
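The same calculation can be written as a small helper; the function name is mine, and the balanced-stage comparison is an added illustration:

```python
def pipeline_speedup(unpipelined_time, stage_delays):
    """Speedup when the pipeline clock is set by the slowest stage."""
    return unpipelined_time / max(stage_delays)

# Values from the question: 2.3 us unpipelined; stages of .5, .8 and .9 us.
print(round(pipeline_speedup(2.3, [0.5, 0.8, 0.9]), 2))  # 2.56

# With three perfectly balanced stages the ideal 3x speedup is reached.
print(round(pipeline_speedup(2.3, [2.3 / 3] * 3), 2))  # 3.0
```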
51. What are the differences between write-through and write-back?
The write-back scheme can improve performance and is faster than write-through,
since write-through writes every store to memory, which takes time and slows
down the processor. Write-back is more complex to implement than write-through.
Since write-back requires few writes to the next memory level, it uses less
memory bandwidth, whereas write-through uses much more memory bandwidth.
54. Instead of just 5-8 pipe stages, why not have, say, a pipeline with 50 pipe
stages?
The main reason a 50-stage pipeline is not adopted is that it would require a
large amount of hardware resources and hence increase the area occupied. In
addition, the instruction cache would have to be very large to keep the pipeline
full and to minimize stalling.
59. What is the difference between 1-way, 2-way, 4-way and 8-way set-associative
caches in an 8-block cache memory?
A 1-way set-associative cache is a direct-mapped structure. With 8 blocks, an
8-way set-associative cache is a fully associative structure. A 2-way
set-associative cache has 2 blocks in each set, and a 4-way set-associative
cache has 4 blocks in each set.
65. How will you calculate the offset, index and the tag for a block?
Offset = log2(block size in bytes)
Index = log2(number of blocks / associativity)
Tag size = address size - offset - index
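These formulas can be checked with a short helper (the function name is mine; it assumes a byte-addressable machine with power-of-two sizes):

```python
from math import log2

def cache_fields(address_bits, cache_bytes, block_bytes, associativity):
    """Split an address into (tag, index, offset) bit widths.
    Assumes a byte-addressable machine and power-of-two sizes."""
    offset = int(log2(block_bytes))
    num_blocks = cache_bytes // block_bytes
    index = int(log2(num_blocks // associativity))
    tag = address_bits - index - offset
    return tag, index, offset

# The two-way cache from question 67: 256 lines x 32 bytes, 32-bit addresses.
print(cache_fields(32, 256 * 32, 32, 2))  # (20, 7, 5)
# The direct-mapped cache of the same size.
print(cache_fields(32, 256 * 32, 32, 1))  # (19, 8, 5)
```

The two calls reproduce the tag/index/offset splits worked out in question 67.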
67. Consider two equally sized caches, one of which is direct mapped and the
other is two way set-associative. There are 256 lines, with 8 words per line,
and 4 bytes per word. The machine has 32-bit addresses and is byte
addressable, with a word of 4 bytes. How many bits are used for the tag,
index, and offset?
In both caches we need 5 bits for the offset since there are 8 words/line x 4
bytes/word = 32 bytes/line. In the two-way set-associative cache, we have 128
available lines that we wish to index into, which will require 7 bits. This leaves 32
- 7 - 5 = 20 bits for the tag. The direct-mapped cache will have 256 lines, which
results in 8 index bits and 19 tag bits.
68. 32 KB 4-way set-associative data cache array with 32 byte line sizes
How many sets?
How many index bits, offset bits, tag bits?
How large is the tag array?
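The notes give no answer for question 68. Assuming 32-bit addresses (as in question 67, though not stated here), the formulas from question 65 give 256 sets, 8 index bits, 5 offset bits and 19 tag bits, so the tag array holds 1024 x 19 = 19456 bits (about 2.4 KB):

```python
from math import log2

# Question 68 worked out, assuming 32-bit addresses (an assumption,
# not stated in the notes).
CACHE_BYTES = 32 * 1024  # 32 KB data array
LINE_BYTES = 32
WAYS = 4
ADDRESS_BITS = 32

num_lines = CACHE_BYTES // LINE_BYTES               # 1024 lines in total
num_sets = num_lines // WAYS                        # 256 sets
offset_bits = int(log2(LINE_BYTES))                 # 5
index_bits = int(log2(num_sets))                    # 8
tag_bits = ADDRESS_BITS - index_bits - offset_bits  # 19
tag_array_bits = num_lines * tag_bits               # one tag per line

print(num_sets, index_bits, offset_bits, tag_bits, tag_array_bits)
# 256 8 5 19 19456
```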
69. What are the different ways to speed up access to the main memory?
The two typical ways of speeding up access to the main memory are:
Use wider memory to provide more bytes at a time
Use independent memory banks to allow multiple independent accesses.
70. What are the methods to improve the miss penalty of a cache?
The various methods are as follows:
Give priority to reads over writes
Use a second level cache
72. What is LRU and where is it used and where is it not used?
LRU refers to the least recently used replacement scheme: the block replaced is
the one that has been unused for the longest time. It is not used (or not
meaningful) in a direct-mapped cache, because on a miss the requested block can
go into only one position, so there is no choice to make. In a fully associative
cache the requested block can go anywhere, so maintaining LRU information over
all blocks is impractical. LRU is therefore used mainly in set-associative
caches.
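A minimal sketch of LRU bookkeeping for a single set, using Python's OrderedDict; the class and method names are illustrative:

```python
from collections import OrderedDict

class LRUSet:
    """One set of a set-associative cache with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data; order tracks recency

    def access(self, tag):
        """Return True on a hit, False on a miss (which fills the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# [False, False, True, False, False] -- accessing 3 evicts 2, the LRU block
```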
73. Explain how a cache hit or miss is determined, given a memory address?
A memory address is broken up into three parts: the tag, the index and the
offset, so we first compute the offset, index and tag from the address. The
index selects one set in the cache, and we compare the tag of each line in that
set with the tag of our address. If one of them matches, we have a cache hit;
otherwise, we have a cache miss.
74. Are you familiar with the term snooping?
Snooping is the process whereby each cache watches (snoops on) the shared bus;
when a value that has a copy in the local cache is modified elsewhere, the copy
in the local cache is also updated.
91. What is the relation between performance and the execution time of a
machine?
Performance is the reciprocal of execution time, so they are inversely
proportional.
94. What are the various programs for measuring the performance?
The various programs for measuring performance are as follows:
Real Applications
Modified applications
Kernels
Toy Benchmarks
Synthetic Benchmarks
1. What is scoreboarding?
Scoreboarding is a technique that allows pipelined instructions to execute out
of order when there are sufficient resources and no data dependencies. A
centralized table keeps track of the status of instructions, functional units
and registers. Instructions are executed when they are ready and stalled if
hazards exist.
Topics
1. Cache and Virtual memory
2. Pipelining/single and multi cycle
3. Dynamic scheduling
4. Branch prediction
5. Number systems
6. Parallelism/exceptions and interrupts