Escolar Documentos
Profissional Documentos
Cultura Documentos
2010
COMPUTER ARCHITECTURE
CE2010
BK
TP.HCM
Nam Ho
dce
2010
Chapter 1
Computer Abstraction and
Technology - Exercises
Adapted from Computer Organization and Design, 4th
Edition, Patterson & Hennessy, 2008
dce
2010
Instruction count
MIPS =
Execution time 106
Instruction count
Clock rate
=
=
6
Instruction count CPI
CPI
10
6
10
Clock rate
dce
2010
Using MIPS
There are three problems with MIPS:
MIPS specifies the instruction execution rate but
not the capabilities of the instructions
MIPS varies between programs on the same
computer
MIPS can vary inversely with performance (see
the next example)
dce
2010
Exercise 1
Consider the machine with the following three instruction
classes and CPI:
CPI for this instruction class
A
B
C
1
2
3
CPI
Assume that the machines clock rate is 500 MHz. Which code
sequence will execute faster according to execution time ?
According to MIPS?
dce
2010
Exercise 1
Using the formula: CPU clock cycles
n
Compiler 1:
CPU clock cycles = (5 x 1 + 1 x 2 + 1 x 3) x 109 = 10 x 109 cycles
Execution time = (10 x 109)/500 x 106 = 20 seconds
Compiler 2:
CPU clock cycles = (10 x 1 + 1 x 2 + 1 x 3) x 109 = 15 x 109 cycles
Execution time = (15 x 109)/(500 x 106) = 30 seconds
dce
2010
Exercise 1
Instruction count
MIPS =
Execution Time 106
Compiler 1:
Compiler 2:
(5 + 1 + 1) 109
MIPS =
= 350
6
20 10
(10 + 1 + 1) 109
MIPS =
= 400
6
30 10
dce
2010
Exercise 2
PowerPC G4 can execute 4 instructions/cycle
with a clock rate 867 MHz. What is the MIPS
rate ?
Answer:
CPI = 1/4
MIPS = 1/(CPI x Cycle Time x 106)
MIPS = Clock Rate/CPI x 106
MIPS = 4 x 867 = 3468
dce
2010
Speedup
Tbefore improved
= --------------Timproved
Timproved =
Taffected
+ Tunaffected
improvement factor
dce
2010
Exercise 3
Store
Load
Branch
Instructions
500
50
100
50
CPI
10
dce
2010
Exercise 4
Speedup
Tbefore improved
= ---------------
Timproved
Taffected
Timproved =
+ Tunaffected
improvement factor
Computer Architecture Chapter 1
11
dce
2010
Exercise 5
A common transformation required in graphics processors is
square root. Implementations of floating-point (FP) square
root vary significantly in performance, especially among
processors designed for graphics.
Suppose FP square root (FPSQR) is responsible for 20% of
the execution time of a critical graphics benchmark. One
proposal is to enhance the FPSQR hardware and speed up
this operation by a factor of 10.
The other alternative is just to try to make all FP instructions
in the graphics processor run faster by a factor of 1.6; FP
instructions are responsible for half of the execution time for
the application. Compare these two design alternatives.
12
dce
2010
Exercise 5
SpeedupFPSQR
1
=
= 1.22
=
0.2 0.82
(1 0.2) +
10
SpeedupFP =
1
=
= 1.23
0.5 0.8125
(1 0.5) +
1.6
13
dce
2010
Geometric Mean
GM =
i =1
14
dce
2010
Geometric Mean
15
dce
2010
Exercise 6
Show that the Geometric Mean doesnt matter
which computer is used for normalization.
16
dce
2010
Fallacy: MFLOPS
MFLOPS: Million Floating-point Operations
Per Second
Fallacy: MFLOPS is a consistent and useful
measure of performance.
For example, the Cray C90 has no divide
instruction, while the Intel Pentium has divide,
square root, sine, and cosine
17
dce
Exercise 7
2010
Program a:
FPop = 106 0.4 = 4 105, clockcylesfp = CPI No. FP instr. = 4 105
Tfp = 4 105 0.33 109 = 1.32 104 then MFLOPS = 3.03 103
Program b:
FPop = 3 106 0.4 = 1.2 106, clockcylesfp = CPI No. FP instr. =
0.70 1.2 106
Tfp = 0.84 106 0.33 109 = 2.77 104 then MFLOPS = 4.33 103
Computer Architecture Chapter 1
18